We currently call the addNVPTXFeatures function in two places, inside
of the CUDA Toolchain and inside of Clang in the standard entry point.
We normally add features to the job in Clang, so the call inside of the
CUDA toolchain is redundant and results in +ptx features being added.
Since we remove this call, we no longer will have a cached CUDA
installation so we will usually create it twice.
Details
- Reviewers
jdoerfert tra yaxunl - Commits
- rG56ab966a04dd: [CUDA] Stop adding CUDA features twice
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
we no longer will have a cached CUDA installation so we will usually create it twice.
Does that result in extra output in case we find an unexpected CUDA version, or when compiler is run with -v ?
We may want to wrap installation detector creation into some sort of singleton creation function.
We already create another one for the call coming from Clang, this patch just gets rid of the other call. I think the printouts you're talking about come from the variable in the CUDA toolchain specifically. Here we simply create one to get the version and throw it away. It's not ideal to do the same work twice, so we could wrap this into some singleton interface. Maybe a static optional value inside of the Toolchains/Cuda.cpp file wither a getter that returns a reference to it. Though I don't think this is likely to be a bottleneck.
We already heard complaints that searching for CUDA installation in multiple places does add a measurable delay when the search hits NFS-mounted directories.
Replacing uses of CudaInstallation with a getter function returning a reference to a singleton would be great.
Sounds good, I'll make a patch for it. Considering that this one doesn't change that this one doesn't change anything we don't already do on that front is it good to land separately?
Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.
Yeah, we have some existing tests that check for including the target features at least once. I felt like there was no need to include a test for what was more or less an oversight
The test helps a lot to illustrate what the patch does. There are enough moving parts in the driver that, while I do believe that what the patch description says is intended to be true, I would like to see specific evidence that it's indeed the case.
Think of it as a test case that should've been added when we've started passing the feature flags.