@bosilca bosilca commented Sep 10, 2024

The idea is the following:

  • Task incarnations (aka BODY) can be marked with the "batch" property, which lets the runtime hand the task the entire list of ready tasks of the execution stream instead of extracting only the head.
  • This list of ready tasks is in fact a ring, which the kernel can trim and divide into the tasks to be batched and the rest. The batch group is submitted for execution (user responsibility), while the remaining tasks are added back to the stream's pending list in the order in which they appeared in the ring. This mechanism also allows the user to reorder the tasks based on user-level criteria.
  • The kernel also needs to provide a callback in the gpu_task complete_stage, so that the runtime can call the specialized function that completes all batched tasks.
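
To make the three points above concrete, here is a minimal, self-contained sketch of the idea in plain C. It is not the PaRSEC implementation: `task_t`, `sketch_gpu_task_t`, `split_batch`, `complete_batch` and the `kernel_id` grouping criterion are all hypothetical names invented for illustration. It only shows a ring of ready tasks being divided into a batch and an order-preserving remainder, plus a completion callback that walks the whole batch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a runtime task (NOT the PaRSEC type). */
typedef struct task_s {
    struct task_s *next, *prev; /* circular, doubly linked ring */
    int kernel_id;              /* assumed batching criterion */
    int done;                   /* set by the completion callback */
} task_t;

/* Hypothetical stand-in for a GPU task carrying a batch and its callback. */
typedef struct {
    task_t *batch;
    void (*complete_stage)(task_t *batch);
} sketch_gpu_task_t;

/* Link an array of n tasks into a ring and return its head. */
static task_t *ring_from_array(task_t *tasks, size_t n) {
    if (n == 0) return NULL;
    for (size_t i = 0; i < n; i++) {
        tasks[i].next = &tasks[(i + 1) % n];
        tasks[i].prev = &tasks[(i + n - 1) % n];
    }
    return &tasks[0];
}

/* Detach t from the ring; return the head of what remains (NULL if empty). */
static task_t *ring_remove(task_t *head, task_t *t) {
    task_t *new_head = head;
    if (t->next == t) {
        new_head = NULL;
    } else {
        t->prev->next = t->next;
        t->next->prev = t->prev;
        if (head == t) new_head = t->next;
    }
    t->next = t->prev = t; /* t is now a singleton ring */
    return new_head;
}

/* Append the singleton t at the tail of the ring; return the ring head. */
static task_t *ring_push_back(task_t *head, task_t *t) {
    if (head == NULL) { t->next = t->prev = t; return t; }
    t->prev = head->prev;
    t->next = head;
    head->prev->next = t;
    head->prev = t;
    return head;
}

/* Count the tasks in a ring. */
static size_t ring_length(const task_t *head) {
    size_t n = 0;
    if (head == NULL) return 0;
    const task_t *t = head;
    do { n++; t = t->next; } while (t != head);
    return n;
}

/* Divide *ring into a batch (up to max_batch tasks sharing the head's
 * kernel_id) and the rest. On return *ring holds the rest, in the order
 * the tasks had in the original ring; the batch ring is returned. */
static task_t *split_batch(task_t **ring, size_t max_batch) {
    task_t *batch = NULL, *rest = NULL, *head = *ring;
    if (head == NULL) return NULL;
    int kernel = head->kernel_id;
    size_t taken = 0;
    while (head != NULL) {
        task_t *t = head;
        head = ring_remove(head, t);
        if (t->kernel_id == kernel && taken < max_batch) {
            batch = ring_push_back(batch, t);
            taken++;
        } else {
            rest = ring_push_back(rest, t);
        }
    }
    *ring = rest;
    return batch;
}

/* Completion callback: mark every task of the batch as done. */
static void complete_batch(task_t *batch) {
    if (batch == NULL) return;
    task_t *t = batch;
    do { t->done = 1; t = t->next; } while (t != batch);
}
```

In this sketch the caller would store the returned batch and `complete_batch` in a `sketch_gpu_task_t`, push the remaining ring back to its pending list, and invoke `complete_stage` once the batched kernel has finished. A reordering policy would simply change the order in which tasks are appended to the two rings.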

The idea is the following:
- task incarnations (aka BODY) can be marked with the "batch" property
  allowing the runtime to provide the task with the entire list of ready
  tasks of the execution stream instead of just extracting the head.
- this list of ready tasks is in fact a ring, that can then be trimmed
  by the kernel and divided into batch and the rest. The rest of the
  tasks will be left in the ring, while the batch group will be
  submitted for execution.
- the kernel also needs to provide a callback into the gpu_task
  complete_stage, such that the runtime can call the specialized
  function able to complete all batched tasks.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
@bosilca bosilca force-pushed the topic/batched_tasks branch from fffc3ec to 9998554 Compare September 10, 2024 05:33
@abouteiller abouteiller self-requested a review October 11, 2024 15:38
* from the task ring, and singleton it or replace it with the aggregated tasks
* as necessary.
*/
goto move_forward_with_this_task;

Can we avoid using a goto here?

  if( NULL != type_property) {
- if (!strcasecmp(type_property->expr->jdf_var, "cuda")
+ if (!strncasecmp(type_property->expr->jdf_var, "cuda", 4) /* for batched */

This looks like a leftover from a prior iteration of the patchset that used type=cuda_batched instead of adding a new batched property.

I assume the expectation is that batched and non-batched CUDA bodies can coexist. Did you test that this works?
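
For readers following along, the difference between the two comparisons in the hunk can be sketched as below. The helper names `is_cuda_exact` and `is_cuda_prefix` are hypothetical and exist only for this illustration; `strcasecmp` and `strncasecmp` are the POSIX functions from `<strings.h>`. The prefix form accepts any type string that starts with "cuda", which is what a `cuda_batched` type would have needed, but it is looser than necessary once batching is a separate property.

```c
#include <stdbool.h>
#include <strings.h> /* strcasecmp, strncasecmp (POSIX) */

/* Exact, case-insensitive match: accepts only "cuda" in any letter case. */
static bool is_cuda_exact(const char *type) {
    return 0 == strcasecmp(type, "cuda");
}

/* Case-insensitive match on the first 4 characters only: accepts "cuda",
 * but also "cuda_batched" or any other string beginning with "cuda". */
static bool is_cuda_prefix(const char *type) {
    return 0 == strncasecmp(type, "cuda", 4);
}
```

With the exact match, a body typed `cuda_batched` would not be recognized as a CUDA body; with the prefix match, every type beginning with "cuda" would be.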
