RequiresDataSharing was always 0, resulting in dead code in the device runtime library.
Change looks great to me.
Rolling the leading-whitespace reduction in nvptx_target_parallel_reduction_codegen.cpp into this patch might be contentious, so I've added a couple more reviewers to see whether anyone would prefer that part split out. I'll accept in a day or so if there are no comments on the whitespace.
This is interesting. This turns out to be the only call to RootS(), so removing it cascades through a bunch of other code that this patch also removes.