This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] New reduction scheme for team reductions
ClosedPublic

Authored by gtbercea on Feb 19 2019, 2:55 PM.

Details

Summary

This patch adds a more sophisticated team reduction scheme to the OpenMP libomptarget-nvptx runtime.

The scheme uses a fixed-size global memory buffer whose length can be adjusted via a compiler flag:

-fopenmp-cuda-teams-reduction-recs-num=1024

The global buffer is a structure of arrays, with one array for each reduction variable. Each array holds 1024 elements by default, and this length is controlled by the flag above.
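The following is a minimal Python model of a structure-of-arrays buffer. It assumes two hypothetical reduction variables: an integer sum and a float maximum. The class name `TeamsReductionBuffer`, the `recs_num` parameter, and the rule that team `t` folds its values into record `t % recs_num` are illustrative assumptions, not the libomptarget-nvptx implementation. The default of 1024 only mirrors the flag's default.

```python
from dataclasses import dataclass, field

# Hypothetical default mirroring -fopenmp-cuda-teams-reduction-recs-num=1024.
DEFAULT_RECS_NUM = 1024


@dataclass
class TeamsReductionBuffer:
    """Structure of arrays: one fixed-length array per reduction variable.

    This sketch assumes two reduction variables, an integer sum and a float max.
    """
    recs_num: int = DEFAULT_RECS_NUM
    sums: list = field(init=False)
    maxes: list = field(init=False)

    def __post_init__(self) -> None:
        # Each array starts at the identity value of its reduction operator.
        self.sums = [0] * self.recs_num
        self.maxes = [float("-inf")] * self.recs_num

    def store(self, team_id: int, partial_sum: int, partial_max: float) -> None:
        # Assumed slot mapping: teams share records modulo the buffer length,
        # so a reused record combines values instead of overwriting them.
        slot = team_id % self.recs_num
        self.sums[slot] += partial_sum
        self.maxes[slot] = max(self.maxes[slot], partial_max)

    def finalize(self) -> tuple:
        # Reduce every record of every array into the final values.
        return sum(self.sums), max(self.maxes)
```

Each array is initialized to its operator's identity, so records that no team touches do not change the final result. This holds even when there are fewer teams than records.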

Values in the buffer are processed by the last team to finish executing the body of the target region.
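A shared completion counter is one way to model how the last team gets selected. Each team publishes its partial value and then takes a ticket from the counter. Only the team holding the final ticket reduces the buffer. The sketch below is a hypothetical host-side simulation: threads stand in for teams, and a `threading.Lock` stands in for a device-side atomic increment. The class and method names are invented for illustration.

```python
import threading
from typing import List, Optional


class TeamsReduction:
    """Hypothetical model of the last-team-reduces scheme."""

    def __init__(self, num_teams: int, recs_num: int = 1024) -> None:
        self.num_teams = num_teams
        self.recs_num = recs_num
        self.buffer: List[int] = [0] * recs_num  # fixed-size global buffer
        self.finished = 0  # completion counter shared by all teams
        self.lock = threading.Lock()  # stands in for a device atomic
        self.last_team: Optional[int] = None
        self.result: Optional[int] = None

    def team_done(self, team_id: int, partial: int) -> None:
        with self.lock:
            # Publish this team's partial value, then take a ticket.
            self.buffer[team_id % self.recs_num] += partial
            ticket = self.finished
            self.finished += 1
        if ticket == self.num_teams - 1:
            # Only the holder of the final ticket reduces the buffer.
            self.last_team = team_id
            self.result = sum(self.buffer)


if __name__ == "__main__":
    red = TeamsReduction(num_teams=64, recs_num=8)
    teams = [threading.Thread(target=red.team_done, args=(t, t)) for t in range(64)]
    for th in teams:
        th.start()
    for th in teams:
        th.join()
    print(red.result)  # prints 2016, i.e. 0 + 1 + ... + 63
```

Each team stores its value before it takes a ticket, and both steps happen under the same lock. As a result, the team holding the final ticket observes every other team's contribution. On a GPU, the same ordering would need an atomic operation plus appropriate memory fences, which this sketch does not model.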

In addition to supporting the new flag, the compiler must also emit special functions that reduce the intermediate reduction values. These compiler changes will be added in a separate patch following this one.
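The need for compiler-emitted helpers plausibly comes from the runtime being generic: it cannot know the types or operators of a particular `reduction` clause. The sketch below illustrates this split under a hypothetical callback interface. A type-agnostic runtime loop receives an initializer and a combiner generated for one target region. The names `runtime_teams_reduce`, `emitted_init`, and `emitted_combine` are invented here and do not correspond to the functions Clang actually emits. The `team_id % recs_num` slot mapping is likewise an assumption.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def runtime_teams_reduce(
    partials: List[Record],
    recs_num: int,
    init_fn: Callable[[], Record],
    combine_fn: Callable[[Record, Record], None],
) -> Record:
    """Type-agnostic runtime loop; all knowledge of the reduction
    variables lives in the compiler-supplied callbacks."""
    buffer = [init_fn() for _ in range(recs_num)]
    # Each team folds its private values into one record of the fixed-size buffer.
    for team_id, private in enumerate(partials):
        combine_fn(buffer[team_id % recs_num], private)
    # A final pass folds every record into a single result.
    result = init_fn()
    for record in buffer:
        combine_fn(result, record)
    return result


# Hypothetical "emitted" helpers for reduction(+:sum) reduction(max:mx).
def emitted_init() -> Record:
    return {"sum": 0.0, "mx": float("-inf")}


def emitted_combine(dst: Record, src: Record) -> None:
    dst["sum"] += src["sum"]
    dst["mx"] = max(dst["mx"], src["mx"])
```

With this split, the runtime loop stays identical for every target region, and only the small per-region helpers differ.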

Diff Detail

Repository
rOMP OpenMP

Event Timeline

gtbercea created this revision on Feb 19 2019, 2:55 PM.
gtbercea edited the summary of this revision on Feb 19 2019, 2:55 PM.
This revision was accepted and marked ready to land on Feb 20 2019, 6:26 AM.
This revision was automatically updated to reflect the committed changes.