This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Stop adding CUDA features twice
ClosedPublic

Authored by jhuber6 on Jun 28 2022, 12:24 PM.

Details

Summary

We currently call the addNVPTXFeatures function in two places, inside
of the CUDA Toolchain and inside of Clang in the standard entry point.
We normally add features to the job in Clang, so the call inside of the
CUDA toolchain is redundant and results in +ptx features being added.
Since we remove this call, we no longer will have a cached CUDA
installation so we will usually create it twice.

Diff Detail

Event Timeline

jhuber6 created this revision.Jun 28 2022, 12:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 12:24 PM
jhuber6 requested review of this revision.Jun 28 2022, 12:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 12:24 PM
tra added a comment.Jun 28 2022, 12:39 PM

we no longer will have a cached CUDA installation so we will usually create it twice.

Does that result in extra output in case we find an unexpected CUDA version, or when compiler is run with -v ?

We may want to wrap installation detector creation into some sort of singleton creation function.

we no longer will have a cached CUDA installation so we will usually create it twice.

Does that result in extra output in case we find an unexpected CUDA version, or when compiler is run with -v ?

We may want to wrap installation detector creation into some sort of singleton creation function.

We already create another one for the call coming from Clang, this patch just gets rid of the other call. I think the printouts you're talking about come from the variable in the CUDA toolchain specifically. Here we simply create one to get the version and throw it away. It's not ideal to do the same work twice, so we could wrap this into some singleton interface. Maybe a static optional value inside of the Toolchains/Cuda.cpp file wither a getter that returns a reference to it. Though I don't think this is likely to be a bottleneck.

tra added a comment.Jun 28 2022, 1:14 PM

we no longer will have a cached CUDA installation so we will usually create it twice.

Does that result in extra output in case we find an unexpected CUDA version, or when compiler is run with -v ?

We may want to wrap installation detector creation into some sort of singleton creation function.

We already create another one for the call coming from Clang, this patch just gets rid of the other call. I think the printouts you're talking about come from the variable in the CUDA toolchain specifically. Here we simply create one to get the version and throw it away. It's not ideal to do the same work twice, so we could wrap this into some singleton interface. Maybe a static optional value inside of the Toolchains/Cuda.cpp file wither a getter that returns a reference to it. Though I don't think this is likely to be a bottleneck.

We already heard complaints that searching for CUDA installation in multiple places does add a measurable delay when the search hits NFS-mounted directories.

Replacing uses of CudaInstallation with a getter function returning a reference to a singleton would be great.

We already heard complaints that searching for CUDA installation in multiple places does add a measurable delay when the search hits NFS-mounted directories.

Replacing uses of CudaInstallation with a getter function returning a reference to a singleton would be great.

Sounds good, I'll make a patch for it. Considering that this one doesn't change that this one doesn't change anything we don't already do on that front is it good to land separately?

tra accepted this revision.Jun 28 2022, 1:52 PM

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

This revision is now accepted and ready to land.Jun 28 2022, 1:52 PM

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

Yeah, we have some existing tests that check for including the target features at least once. I felt like there was no need to include a test for what was more or less an oversight

tra added a comment.Jun 28 2022, 2:48 PM

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

Yeah, we have some existing tests that check for including the target features at least once. I felt like there was no need to include a test for what was more or less an oversight

The test helps a lot to illustrate what the patch does. There are enough moving parts in the driver that, while I do believe that what the patch description says is intended to be true, I would like to see specific evidence that it's indeed the case.
Think of it as a test case that should've been added when we've started passing the feature flags.

This revision was automatically updated to reflect the committed changes.