This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Added a feature to use short pointers for const/local/shared AS.
ClosedPublic

Authored by tra on Apr 26 2018, 3:18 PM.

Details

Summary

Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has substantial performance impact
on kernels that uses shared memory for intermediary results.

The feature is disabled by default.

Diff Detail

Repository
rL LLVM

Event Timeline

tra created this revision.Apr 26 2018, 3:18 PM

Ben, are you comfortable reviewing this?

llvm/lib/Target/NVPTX/NVPTX.td
78 ↗(On Diff #144215)

Should we s/short/32-bit/, or do you want to leave open the possibility of using shorter pointers?

llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
738 ↗(On Diff #144215)

s/ShortPointer /ShortPointers /?

Same below.

tra updated this revision to Diff 144228.Apr 26 2018, 3:37 PM

UseShortPointer -> UseShortPointers

tra updated this revision to Diff 144230.Apr 26 2018, 3:41 PM

Updated feature description.

tra marked an inline comment as done.Apr 26 2018, 3:42 PM
tra added inline comments.
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
738 ↗(On Diff #144215)

In this case we're dealing with just one pointer, while the subtarget can use any number of them. :-)
Oh, well, plural it is.

bkramer added inline comments.Apr 27 2018, 2:09 AM
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
855 ↗(On Diff #144230)

There are 4 identical instances of the pointer size computation in this patch, it should really be pulled into its own function.

You can also consider getting the pointer size from the DataLayout instead. Might be cleaner.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
1255 ↗(On Diff #144230)

Why is this needed? The pointer returned by getPointerTy should be right wrt data layout.

tra updated this revision to Diff 144393.Apr 27 2018, 1:39 PM

Use DataLayout to obtain pointer size.

tra marked 2 inline comments as done.Apr 27 2018, 1:41 PM
tra added inline comments.Apr 27 2018, 1:43 PM
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
855 ↗(On Diff #144230)

That's indeed the right thing to do here. I've updated the patch to use DataLayout for determining pointer size.

bkramer accepted this revision.Apr 27 2018, 2:00 PM

lgtm

This revision is now accepted and ready to land.Apr 27 2018, 2:00 PM
arsenm added a subscriber: arsenm.Apr 27 2018, 2:19 PM

NVPTX wasn't already doing this? As this is a module property, a subtarget feature can't be used for this. For example what happens for a global value?

tra added a comment.Apr 27 2018, 2:30 PM

NVPTX wasn't already doing this? As this is a module property, a subtarget feature can't be used for this. For example what happens for a global value?

No, NVPTX is always using 64-bit pointers for all address spaces and that's one of the common situations where it gives NVCC an edge.

It's not clear what concerns you about global values. Could you elaborate or give me an example?

In D46147#1081660, @tra wrote:

NVPTX wasn't already doing this? As this is a module property, a subtarget feature can't be used for this. For example what happens for a global value?

No, NVPTX is always using 64-bit pointers for all address spaces and that's one of the common situations where it gives NVCC an edge.

It's not clear what concerns you about global values. Could you elaborate or give me an example?

The subtarget is a function level property. The pointer size / datalayout is a module property and can't be different between functions. If you have multiple functions with different pointer sizes in a module (or no functions at all), then what pointer size ends up used for a global? The global itself has a size, and any ConstantExpr pointers in its initializer will also need to have a size.

tra added a comment.Apr 27 2018, 3:08 PM

The subtarget is a function level property. The pointer size / datalayout is a module property and can't be different between functions. If you have multiple functions with different pointer sizes in a module (or no functions at all), then what pointer size ends up used for a global? The global itself has a size, and any ConstantExpr pointers in its initializer will also need to have a size.

I think I see. So, someone could add "target-attributes" to a function and that may potentially override the features. It does not harm us in this case as we only set datalayout once when it's called during creation NVPTXTargetMachine so only the features supplied on command line will be affected. That said, I do see your point that subtarget is not the right place for this option. After a bit of digging I've found a curious comment in LLVMTargetMachine, so this approach is both used in other places and considered problemetic.

// FIXME: Having an MCSubtargetInfo on the target machine is a hack due
// to some backends having subtarget feature dependent module level
// code generation. This is similar to the hack in the AsmPrinter for
// module level assembly etc.
STI = TheTarget.createMCSubtargetInfo(getTargetTriple().str(), getTargetCPU(),
                                      getTargetFeatureString());

One way out of this is to add a separate command-line option to control this. Is there a better way to do it?

tra updated this revision to Diff 144412.Apr 27 2018, 3:39 PM

Moved control of the feature from subtarget feature to --nvptx-short-ptr command-line option.

tra added a comment.Apr 27 2018, 3:42 PM

@arsenm Will this do?

Would it be possible to get rid of the flag entirely and make this controlled only by the data layout? That's defined on the module level.

Would it be possible to get rid of the flag entirely and make this controlled only by the data layout? That's defined on the module level.

Yes, ultimately the switch is just changing the datalayout. If this is just a short term thing for migration I think a global flag is fine. If you want to be able to swap these properly long term, this is a different arch name / TargetMachine.

tra added a comment.May 3 2018, 1:47 PM

Would it be possible to get rid of the flag entirely and make this controlled only by the data layout? That's defined on the module level.

Yes, ultimately the switch is just changing the datalayout. If this is just a short term thing for migration I think a global flag is fine. If you want to be able to swap these properly long term, this is a different arch name / TargetMachine.

I'm still not quite sure how clang's setting of data layout in the module plays with LLVM's. When they are out of sync, LLVM throws an error which suggests that we currently can't just have arbitrary datalayout specified by the module, it must also match LLVM's idea of the data layout used by particular back-end. At least, that's my understanding how things currently work. I might be wrong.

In any case, I do want the short pointers to eventually become the default. However, it's a somewhat risky change, so I need an escape hatch to be able to test the changes without breaking everyone. I know of at least one issue in the split-GEP pass (will send a patch soon). There are likely more.

arsenm added a comment.May 3 2018, 1:59 PM
In D46147#1087005, @tra wrote:

Would it be possible to get rid of the flag entirely and make this controlled only by the data layout? That's defined on the module level.

Yes, ultimately the switch is just changing the datalayout. If this is just a short term thing for migration I think a global flag is fine. If you want to be able to swap these properly long term, this is a different arch name / TargetMachine.

I'm still not quite sure how clang's setting of data layout in the module plays with LLVM's. When they are out of sync, LLVM throws an error which suggests that we currently can't just have arbitrary datalayout specified by the module, it must also match LLVM's idea of the data layout used by particular back-end. At least, that's my understanding how things currently work. I might be wrong.

In any case, I do want the short pointers to eventually become the default. However, it's a somewhat risky change, so I need an escape hatch to be able to test the changes without breaking everyone. I know of at least one issue in the split-GEP pass (will send a patch soon). There are likely more.

You need to repeat the datalayout in clang as the target expects

This revision was automatically updated to reflect the committed changes.