This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add amdgpu-unify-metadata pass
ClosedPublic

Authored by rampitec on Oct 7 2016, 2:50 PM.

Details

Summary

Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates semantic problem because we cannot tell which version of OpenCL the composite module conforms.

Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several Mb when linked against our OpenCL library. Lastly, such long lists obscure reading of dumped IR.

The pass unifies metadata after linking.

Ideally we would like to run this as a last step during linking, but such interface is not available for a target. Therefor it is run as a first step in the optimizer as guided by the AMDGPUTargetMachine::addEarlyAsPossiblePasses(). There is a drawback, passes added this way go to a function pass manager, so while pass shall be a ModulePass by the nature it is converted to a function pass to work on the function's parent. Note, second and further invocations for other functions do nothing because metadata is already unified. In the future we may consider to convert it back to a ModulePass.

For the OpenCL version we could use two modes as guided by the last argument of unifyVersionMD() - pick largest or pick the first one. In general largest may be more correct, but in reality first is a right one with our library. The library is built as OpenCL 2.0 although it does not use specific features in a calls to functions which are not 2.0 specific. The first value in the list is always user's kernel module version, so we can rely on that first value. We may reconsider this in the future, but that would also require us to split library into 1.2 and 2.0 portions.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec updated this revision to Diff 73987.Oct 7 2016, 2:50 PM
rampitec retitled this revision from to [AMDGPU] Add amdgpu-unify-metadata pass.
rampitec updated this object.
rampitec added reviewers: tstellarAMD, vpykhtin.
rampitec set the repository for this revision to rL LLVM.
rampitec added a project: Restricted Project.
rampitec added a subscriber: llvm-commits.

I think Matt had a patch to do this in common code.

arsenm edited edge metadata.Oct 11 2016, 10:22 PM

This should be handled by the generic linker, this isn't an AMDGPU specific issue. I had this https://reviews.llvm.org/D20582 (but there are more comments not visible in phabricator). An RFC should be posted for whether all named metadata should behave like a set

This should be handled by the generic linker, this isn't an AMDGPU specific issue. I had this https://reviews.llvm.org/D20582 (but there are more comments not visible in phabricator). An RFC should be posted for whether all named metadata should behave like a set

I see. D20582 only covers llvm.ident. The same happens to opencl.ocl.version, spir.version and others. Notably opencl and spir version handling are specific to the target stack. Like in our case we cannot use maximum version, but have to take version from the kernel module. Some other implementation probably would want different behavior. So in my opinion D20582 is a good move, but does not replace this change.

tstellarAMD added a comment.EditedOct 12 2016, 12:14 PM

This should be handled by the generic linker, this isn't an AMDGPU specific issue. I had this https://reviews.llvm.org/D20582 (but there are more comments not visible in phabricator). An RFC should be posted for whether all named metadata should behave like a set

I see. D20582 only covers llvm.ident. The same happens to opencl.ocl.version, spir.version and others. Notably opencl and spir version handling are specific to the target stack. Like in our case we cannot use maximum version, but have to take version from the kernel module. Some other implementation probably would want different behavior. So in my opinion D20582 is a good move, but does not replace this change.

Take a look at the proposed solution here: https://marc.info/?l=llvm-commits&m=147370670326170&w=2 This is what Matt was referring to when he mentioned doing a RFC on the list.

Take a look at the proposed solution here: https://marc.info/?l=llvm-commits&m=147370670326170&w=2 This is what Matt was referring to when he mentioned doing a RFC on the list.

That's an interesting idea, but a long haul. Even then I do not see how would it resolve problem with choosing a right version from the proposed vector of values.
I mean the patch here is not ideal as well, but it resolves problem for now. I guess when an RFC is created and implemented we would still need something like this to make a target choice.

For the OpenCL version metadata, is there anything in the backend that uses it?

For the OpenCL version metadata, is there anything in the backend that uses it?

One thing is that OpenCL 2.0 supports non-uniform workgroup sizes, and it is non-uniform by default. A query to get_local_size and similar become surprisingly long expansion and you cannot really constant fold a known wg size any longer. Under OCL 1.2 you can optimize it freely.

For the OpenCL version metadata, is there anything in the backend that uses it?

The runtime metadata relies on it to know the OpenCL version.

It seems like merging the OpenCl metadata would have to be done in an AMDGPU backend pass, since it uses rules specific to our target. However, I think de-duplicating generic metadata like .ident should be handled in a target independent way.

lib/Target/AMDGPU/AMDGPUUnifyMetadata.cpp
105 ↗(On Diff #73987)

Would it be better to use Twine here?

It seems like merging the OpenCl metadata would have to be done in an AMDGPU backend pass, since it uses rules specific to our target. However, I think de-duplicating generic metadata like .ident should be handled in a target independent way.

I agree. When D20582 is submitted we can remove one line from this change.

lib/Target/AMDGPU/AMDGPUUnifyMetadata.cpp
105 ↗(On Diff #73987)

I have to modify the string each iteration (potentially). That is not how Twine does work, it is basically not created to be modified later. In particular is does not have operator=(), only constructor.

vpykhtin added inline comments.Dec 2 2016, 4:49 AM
test/CodeGen/AMDGPU/unify-metadata.ll
20 ↗(On Diff #73987)

How about adding test for deduplicating extension strings?

rampitec updated this revision to Diff 80126.Dec 2 2016, 1:35 PM

Changed source to conform to SPIR specification - opencl.* named metadata all contain a single metadata reference, which is itself a list of values.
Added opencl.used.extensions test.
Changed loops to C++11 range syntax.

rampitec marked 3 inline comments as done.Dec 2 2016, 1:36 PM
vpykhtin accepted this revision.Dec 8 2016, 4:27 AM
vpykhtin edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Dec 8 2016, 4:27 AM
This revision was automatically updated to reflect the committed changes.