Page MenuHomePhabricator

[amdgpu] Default to code object v3
ClosedPublic

Authored by JonChesterfield on Dec 14 2020, 4:15 PM.

Details

Summary

[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end

Diff Detail

Event Timeline

JonChesterfield requested review of this revision.Dec 14 2020, 4:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2020, 4:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2020, 5:01 PM
  • another part of docs
t-tye added inline comments.Dec 14 2020, 5:06 PM
llvm/docs/AMDGPUUsage.rst
2873–2874

Move this to the "Code Object V4 Metadata" section.

JonChesterfield marked an inline comment as done.Dec 14 2020, 5:08 PM
t-tye accepted this revision.Dec 14 2020, 5:09 PM

LGTM

This revision is now accepted and ready to land.Dec 14 2020, 5:09 PM
This revision was landed with ongoing or failed builds.Dec 14 2020, 5:11 PM
This revision was automatically updated to reflect the committed changes.

This seems to break check-clang everywhere: http://45.33.8.238/linux/35295/step_7.txt

PTAL, and if it takes a while to investigate, please revert for now.

yaxunl reopened this revision.EditedDec 14 2020, 8:15 PM

@JonChesterfield Did this patch pass ePSDB in gerrit? Better do that before committing it to trunk since we don't know if math libs are compatible with this patch. Also you need to fix lit tests.

This revision is now accepted and ready to land.Dec 14 2020, 8:15 PM
yaxunl requested changes to this revision.Dec 14 2020, 8:20 PM
This revision now requires changes to proceed.Dec 14 2020, 8:20 PM

Thanks! Just saw the CI emails come through. I didn't realise code object version was hardcoded across various tests. That's a weird thing to do when it has no effect on the generated IR.

@yaxunl This would never pass the gerrit testing. AMD internal has moved to code object v4 so changing the default to v3 will break internal tests that assume that. The hsa runtime that can load v4 objects hasn't shipped, either in rocm or as source in github, and the backend that generates v4 object files also hasn't shipped, so defaulting to v4 in the trunk front end is a poor choice.

Thanks! Just saw the CI emails come through. I didn't realise code object version was hardcoded across various tests. That's a weird thing to do when it has no effect on the generated IR.

@yaxunl This would never pass the gerrit testing. AMD internal has moved to code object v4 so changing the default to v3 will break internal tests that assume that. The hsa runtime that can load v4 objects hasn't shipped, either in rocm or as source in github, and the backend that generates v4 object files also hasn't shipped, so defaulting to v4 in the trunk front end is a poor choice.

Can they use rocm release branches of llvm/clang with the corresponding rocm release? As llvm/clang trunk is like the development branch, it is understandable that they may contain new changes that may temporarily not working with previous rocm releases.

JonChesterfield added a comment.EditedDec 15 2020, 3:12 AM

Can they use rocm release branches of llvm/clang with the corresponding rocm release? As llvm/clang trunk is like the development branch, it is understandable that they may contain new changes that may temporarily not working with previous rocm releases.

Code object v4 does not work with any rocm release, or with code published that is not in a release. As far as open source goes, it exists only as some documentation and the change to clang that decided it was the right default to use.

That change turned out to be largely harmless as the back end uses v3 despite the docs and clang -### claiming it will use v4.

Depending on a library that doesn't exist is absolutely not 'understandable'. I believe this patch undoes a mistaken step in that direction and that the hip tests are also in error.

  • Remove hard coded version numbers in tests

@yaxunl Please see Gerrit 456139 for a close approximation to the test changes here. I don't think hip should be hard coding a version number in tests that don't care about it, so would like to move trunk and internal to a regex.

Thanks for fixing the lit tests. Using regex is the right choice.

Do we have a plan about how to merge this to amd-stg-open? Will it cause ePSDB to fail? Do you have a follow up patch to make amd-stg-open happy?

Thanks.

JonChesterfield added a comment.EditedDec 16 2020, 7:50 AM

Thanks for fixing the lit tests. Using regex is the right choice.

Do we have a plan about how to merge this to amd-stg-open? Will it cause ePSDB to fail? Do you have a follow up patch to make amd-stg-open happy?

Thanks.

Extracted as D93398. Suggest we land that, then it flows into gerrit (where it won't break anything), and rebase this patch to reduce the noise.

This will need to be merged carefully into amd-stg-open. No follow up patch necessary, the trick would be to not land this on amd-stg-open if we want to keep v4 as the default.

Note that none of this would be necessary if trunk clang hadn't been prematurely updated to default to a code object version that llvm trunk doesn't use.

yaxunl accepted this revision.Dec 16 2020, 11:10 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Dec 16 2020, 11:10 AM
This revision was landed with ongoing or failed builds.Dec 17 2020, 8:10 AM
This revision was automatically updated to reflect the committed changes.