This is an archive of the discontinued LLVM Phabricator instance.

[GPGPU] Synchronize after each kernel, not each copy out
Closed, Public

Authored by grosser on Aug 18 2017, 3:49 AM.

Details

Summary

This change reduces the overall number of synchronize calls for kernels with
a lot of output data, at the cost of additional synchronize calls for kernels
launched in sequence without any device-to-host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.
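
The tradeoff can be illustrated with a small counting model. This is only a sketch and not Polly's code: it assumes the old placement issues one synchronize per device-to-host copy and the new placement issues one per kernel launch. The function names and the "K"/"C" encoding are hypothetical.

```python
# Hypothetical model, not Polly's implementation: count synchronize calls
# under the two placement policies described in the summary.
# "K" stands for a kernel launch, "C" for a device-to-host copy.


def syncs_after_each_copy(ops):
    """Assumed old placement: one synchronize per device-to-host copy."""
    return sum(1 for op in ops if op == "C")


def syncs_after_each_kernel(ops):
    """Assumed new placement: one synchronize per kernel launch."""
    return sum(1 for op in ops if op == "K")


# One kernel that writes four output arrays: the new placement needs fewer syncs.
many_outputs = ["K", "C", "C", "C", "C"]

# Three kernels launched in sequence with a single copy at the end:
# the new placement needs more syncs.
chained_kernels = ["K", "K", "K", "C"]

for name, ops in [("many_outputs", many_outputs),
                  ("chained_kernels", chained_kernels)]:
    print(name, syncs_after_each_copy(ops), syncs_after_each_kernel(ops))
```

In this model the first schedule drops from four synchronize calls to one, while the second rises from one to three, which matches the two cases weighed in the summary.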

Although the reduction above would be motivation enough, this is also
a step towards enabling ppcg to not compute to-device and from-device copy
calls at all. Skipping them would be incorrect if we still relied on these
calls to place our synchronization statements.

Event Timeline

grosser created this revision. Aug 18 2017, 3:49 AM
bollu accepted this revision. Aug 18 2017, 4:53 AM

Does this patch depend on some other patch? If so, please record this information by creating a Parent Revision (Edit Related Revisions -> Edit Parent Revision). If some other patch depends on this one, then make this a parent of that patch. This helps when looking up revisions later on.

Other than that, LGTM.

This revision is now accepted and ready to land. Aug 18 2017, 4:53 AM
This revision was automatically updated to reflect the committed changes.