Hello,
There are two forms of parallel_for_work_group. One takes only the number of work-groups, as you’ve pointed out, and the other takes both the number of work-groups and the size of each work-group.
The first form:
void parallel_for_work_group(
range<dimensions> numWorkGroups,
WorkgroupFunctionType kernelFunc)
and the second form:
void parallel_for_work_group(
range<dimensions> numWorkGroups,
range<dimensions> workGroupSize,
WorkgroupFunctionType kernelFunc)
Within parallel_for_work_group you typically nest a call to parallel_for_work_item, a member function of the group object passed to your work-group function (as your example does). parallel_for_work_item also has two forms, and one of them lets you specify the number of work-items in the work-group. (Strictly speaking, that is a logical work-group size in SYCL, but I don’t think the distinction matters for your question.)
First form:
void parallel_for_work_item(workItemFunctionT func) const
and the second form:
void parallel_for_work_item(range<dimensions> logicalRange,
workItemFunctionT func) const;
If you combine the two forms that do not define a work-group size, then you’re correct that it isn’t clear what the work-group size (and hence the total number of work-items globally) should be. The spec makes this combination undefined behavior precisely because it’s ambiguous. The definition of the parallel_for_work_item form that does not take a logical range says:
It is undefined behavior for this member function to be invoked from within the parallel_for_work_group form that does not define work-group size, because then the number of work-items that should execute the code is not defined. It is expected that this form of parallel_for_work_item is invoked within the parallel_for_work_group form that specifies the size of a workgroup.
So I think the answer to your question is that you need to define a work-group size somewhere: either in parallel_for_work_group or in the nested parallel_for_work_item calls. Leaving it unspecified in both places is undefined behavior, because the global amount of work is then undefined, which I think is the problem that prompted your post.
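You can specify the work-group size in both parallel_for_work_group and parallel_for_work_item, and those sizes can be different. The size given to parallel_for_work_group is the physical work-group size (the work-items actually launched), while the range given to parallel_for_work_item is a logical size that the implementation maps onto those physical work-items. That’s where the SYCL concept of “physical” versus “logical” work-group sizes starts to matter, and it can be quite powerful.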