
[AMDGPU] gfx90a support
ClosedPublic

Authored by rampitec on Feb 17 2021, 2:53 PM.


Event Timeline

rampitec created this revision.Feb 17 2021, 2:53 PM
rampitec requested review of this revision.Feb 17 2021, 2:53 PM
Herald added a project: Restricted Project.Feb 17 2021, 2:53 PM
Herald added subscribers: MaskRay, wdng.
This revision is now accepted and ready to land.Feb 17 2021, 3:59 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project.Feb 17 2021, 4:02 PM
Herald added a subscriber: cfe-commits.
tra added a subscriber: tra.Feb 17 2021, 4:31 PM

This is a pretty huge patch, with no details in the commit log.

One hour between sending the patch out and landing it is not sufficient for anyone to meaningfully review the patch, and there is no mention of the review being done anywhere else.

While the code only changes the AMDGPU back-end, that does not mean the patch should just be rubber-stamped.

In D96906#2570086, @tra wrote:

> This is a pretty huge patch, with no details in the commit log.
>
> One hour between sending the patch out and landing it is not sufficient for anyone to meaningfully review the patch, and there is no mention of the review being done anywhere else.
>
> While the code only changes the AMDGPU back-end, that does not mean the patch should just be rubber-stamped.

It's a year of work that necessarily happened downstream. Every line of it was reviewed and tested downstream. I understand no one can reasonably review something this big, but I cannot break it into small patches after a year of changes and fixes. Not that I had much choice.

tra added a comment.Feb 17 2021, 4:52 PM

> It's a year of work that necessarily happened downstream. Every line of it was reviewed and tested downstream. I understand no one can reasonably review something this big, but I cannot break it into small patches after a year of changes and fixes. Not that I had much choice.

The point is that nobody upstream even got a chance to chime in.

According to https://llvm.org/docs/CodeReview.html

> We expect significant patches to be reviewed before being committed.

This patch certainly qualifies as significant.

foad added a subscriber: foad.Feb 18 2021, 7:57 AM
foad added inline comments.
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll
23

CO-V3 isn't tested by any RUN line. I think FileCheck might complain about this in future.

41

UNPACKED-TID isn't tested by any RUN line.
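For context: a FileCheck prefix only does anything if some RUN line passes it via -check-prefixes. A hypothetical sketch of the failure mode (not the actual test file; the instruction names and CHECK lines are illustrative only, and newer FileCheck versions can diagnose unused prefixes):

```llvm
; RUN: llc -march=amdgcn -mcpu=gfx90a < %s | FileCheck -check-prefixes=GCN,PACKED-TID %s
; No RUN line passes UNPACKED-TID, so its CHECK lines below are silently dead.

; GCN-LABEL: {{^}}test_kernel:
; PACKED-TID: s_endpgm
; UNPACKED-TID: s_endpgm
define amdgpu_kernel void @test_kernel() {
  ret void
}
```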

kzhuravl added inline comments.Feb 18 2021, 8:49 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll
23

Thanks for spotting this. I think the issue was introduced in one of the merges from trunk. Fixed in https://reviews.llvm.org/D96967

41

Thanks for spotting this. I think the issue was introduced in one of the merges from trunk. Fixed in https://reviews.llvm.org/D96967

arsenm added inline comments.Feb 18 2021, 1:46 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

This is a problem because I've removed forAllLanes.

This is a hack; we should be using a different register class for cases that don't support a given subregister index, rather than scanning for an example non-reserved register.

rampitec added inline comments.Feb 18 2021, 1:49 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

This would be massive duplication of all instructions with such operands, wouldn't it?

> The point is that nobody upstream even got a chance to chime in.

We are and will be taking care of any feedback provided in this review post-commit.

> The point is that nobody upstream even got a chance to chime in.
>
> We are and will be taking care of any feedback provided in this review post-commit.

To be fair to @rampitec, it was not his desire to push this up as one big patch. We needed this upstreamed, and he was given no time to break it up into reasonably sized pieces. If it appears to be his doing or his intent, it should not. There have been a couple of comments; I believe most have been addressed, and comments will continue to be addressed.

tra added a comment.Feb 18 2021, 4:12 PM

> The point is that nobody upstream even got a chance to chime in.
>
> We are and will be taking care of any feedback provided in this review post-commit.
>
> To be fair to @rampitec, it was not his desire to push this up as one big patch. We needed this upstreamed, and he was given no time to break it up into reasonably sized pieces. If it appears to be his doing or his intent, it should not. There have been a couple of comments; I believe most have been addressed, and comments will continue to be addressed.

Who landed the change is not particularly important.

What does matter is to make sure that shortcutting the review does not become a regular occurrence.

Stuff happens. Sometimes the standard rules may need to be bent. However, it should not be a unilateral decision by the committing side.

A better way to handle this would have been to send the patch for review a few days early (you presumably had most or all of these changes ready by then), provide details describing the changes (a single subject line falls a bit short), and outline your situation, explaining why the patch can't be split and reviewed pre-commit. If the changes are indeed well reviewed downstream, it would probably not pose much of a challenge to get the patch approved. If the patch does need further cleanups, at least we would have a reasonable idea of how invasive they would be and could make an informed call.

"Commit now, ask for forgiveness later" is not among the LLVM contribution guidelines, not for large patches. At the very minimum it should've been publicly discussed before the fact.

clang/include/clang/Driver/Options.td
3097–3101

We have BoolFOption to generate -fsomething and -fno-something

llvm/include/llvm/BinaryFormat/ELF.h
737

Nit: This looks odd. GFX90A does not need to be in the middle of the list. It makes it somewhat confusing to tell which ID is really the last. The _LAST enum says it's GFX90A, but it's not the last item of the list. There are already out-of-name-order GPUs at the end of the list, so putting 90A at the end would probably be a better choice. At least we'd still have the numeric values in order. Right now the list is ordered neither by the name nor by the value.

There's also a question of whether something needs to be done about the missing values 0x3c..0x3e. Presumably the _FIRST.._LAST enums specify the range we'll use to iterate over the GPU IDs. Do we handle the missing values correctly?

Looks like it's benign at the moment as we're only using it to return amdgcn triple in ELFObjectFile.h. I'd add the placeholder enums for the reserved/unused values within the range.

> The point is that nobody upstream even got a chance to chime in.
>
> We are and will be taking care of any feedback provided in this review post-commit.
>
> To be fair to @rampitec, it was not his desire to push this up as one big patch. We needed this upstreamed, and he was given no time to break it up into reasonably sized pieces. If it appears to be his doing or his intent, it should not. There have been a couple of comments; I believe most have been addressed, and comments will continue to be addressed.

"We needed this upstream" is a business issue on AMD's side, not an issue for the LLVM project. In general the expectation is that code is reviewed according to the guidelines, and a single reviewer with one (small) patch that wasn't a revert doesn't feel like sufficient review for something of this size. For something this size I'd have expected Matt to at least be on the reviewer line, and that also wasn't done. This feels like an abuse of the review system and probably should be reverted.

Thanks.

-eric

kzhuravl added inline comments.Feb 18 2021, 5:04 PM
llvm/include/llvm/BinaryFormat/ELF.h
737
kzhuravl added inline comments.Feb 18 2021, 5:20 PM
clang/include/clang/Driver/Options.td
3097–3101

The reason we use m_amdgpu_Features_Group is that it gets transformed to target features (-mtgsplit to +tgsplit, -mno-tgsplit to -tgsplit). For example, the tgsplit target feature in the AMDGPU backend:

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td#L157

Does BoolFOption get translated to target features as well?
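For reference, the tgsplit subtarget feature behind those flags looks roughly like this in AMDGPU.td (paraphrased from the link above; the field values may differ slightly from trunk):

```tablegen
// SubtargetFeature<name, field, value, description>: clang's -mtgsplit /
// -mno-tgsplit driver flags become +tgsplit / -tgsplit in the target
// feature string, which toggles this feature.
def FeatureTgSplit : SubtargetFeature<"tgsplit",
  "EnableTgSplit",
  "true",
  "Enable threadgroup split execution"
>;
```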

rampitec added inline comments.Feb 18 2021, 5:23 PM
clang/include/clang/Driver/Options.td
3097–3101

We could probably create a similar BoolMOption. This is not only about tgsplit; there are plenty of such binary options around.

kzhuravl added inline comments.Feb 18 2021, 5:23 PM
clang/include/clang/Driver/Options.td
3097–3101

Quickly glancing over Options.td, BoolFOption is in f_group and does not get automatically converted to target features.

kzhuravl added inline comments.Feb 18 2021, 5:26 PM
clang/include/clang/Driver/Options.td
3097–3101

Agreed, seems like a good choice given there are BoolFOption and BoolGOption.

kzhuravl added inline comments.Feb 18 2021, 5:37 PM
clang/include/clang/Driver/Options.td
3097–3101

But it will still need to be in m_amdgpu_Features_Group because of https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/AMDGPU.cpp#L403, unless we want to switch away from that. So maybe make the group an optional, last template parameter to BoolMOption?

kzhuravl added inline comments.Feb 18 2021, 5:39 PM
clang/include/clang/Driver/Options.td
3097–3101
rampitec added inline comments.Feb 18 2021, 6:09 PM
clang/include/clang/Driver/Options.td
3097–3101

Looks like it. It is not just the same thing as BoolFOption or BoolGOption.

kzhuravl added inline comments.Feb 18 2021, 6:20 PM
clang/include/clang/Driver/Options.td
3097–3101

@tra , do you have any suggestion based on the comments above? thanks.

foad added a comment.Feb 19 2021, 3:56 AM

> The point is that nobody upstream even got a chance to chime in.
>
> We are and will be taking care of any feedback provided in this review post-commit.
>
> To be fair to @rampitec, it was not his desire to push this up as one big patch. We needed this upstreamed, and he was given no time to break it up into reasonably sized pieces. If it appears to be his doing or his intent, it should not. There have been a couple of comments; I believe most have been addressed, and comments will continue to be addressed.
>
> "We needed this upstream" is a business issue on AMD's side, not an issue for the LLVM project. In general the expectation is that code is reviewed according to the guidelines, and a single reviewer with one (small) patch that wasn't a revert doesn't feel like sufficient review for something of this size. For something this size I'd have expected Matt to at least be on the reviewer line, and that also wasn't done. This feels like an abuse of the review system and probably should be reverted.
>
> Thanks.
>
> -eric

I'd appreciate it if you could find a solution that does not involve reverting and reapplying later, as this would triple the amount of churn we get downstream. (I realise LLVM policy is not to care about downstream, but I thought I'd plead my case anyway!)

rampitec added inline comments.Feb 19 2021, 11:05 AM
clang/include/clang/Driver/Options.td
3097–3101

I have created a new BoolMOption here: D97069. Not sure it saves much code, but let's see if we like it.

arsenm added inline comments.Feb 19 2021, 2:33 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

Ideally, yes. We can still use register classes for this, with special care to make sure we never end up with the unaligned virtual registers in the wrong contexts.

The less that's tracked by the instruction definitions, the more special-case code we have to write. I've been thinking of swapping out the entire MCInstrDesc table per subtarget to make this easier, although that may be a painful change.

rampitec added inline comments.Feb 19 2021, 2:41 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

I do not see how it can be less code. You would need to duplicate all VALU pseudos, not just the real instructions, which means every time you write something like AMDGPU::FLAT_LOAD_DWORDX2 in the source you would have to write an if. For every VALU instruction.

arsenm added inline comments.Feb 19 2021, 3:25 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

It's less code because the code that's already there is supposed to rely on the static operand definitions. Every time we want to deviate from those, we end up writing manual code in the verifier and fixing up things here and there that differ.

The point of swapping out the table would be to eliminate all the VALU pseudos. We would just have the same enum values referring to different physical instruction definitions.

rampitec added inline comments.Feb 19 2021, 6:08 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

This makes sense, although as you said it is also quite painful, and to me it also sounds like a hack. There is still a lot of legalization needed even with this approach: every time you hit an instruction not supported by a target, you will need to do something about it, in the worst case expanding it. Sounds like another year of work, especially when you look at highly specialized ASICs which can do this but cannot do that, and you have a lot of them.

rampitec added inline comments.Feb 19 2021, 10:47 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

JBTW, it will not help anyway, not for this problem. You may create an operand of a different RC, or you may just reserve every other register like I did; the net result will be the same: you will end up using a prohibited register. Imagine you are using an RC to which only even tuples are added, and then you use the sub1_sub2 subreg of it. RA will happily allocate a forbidden register, just like it does now. To me it is an RA bug in the first place to allocate a reserved register.

The only thing which could help is another register info without odd wide subregs, but you cannot achieve that just by duplicating instruction definitions; for that you would need to duplicate the register info as well. This is almost a new BE.

arsenm added inline comments.Feb 23 2021, 11:53 AM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

It's not a hack, this is how operand classes are intended to work. You wouldn't be producing these instructions on targets that don't support them (ideally we would also have a verifier for this, which is another area where subtarget handling is weak).

The point is to not reserve them. References to unaligned registers can exist, they just can't be used in the context of a real machine operand. D97316 switches to using dedicated classes for alignment (the further cleanup would be to have this come directly from the instruction definitions instead of fixing them up after isel).

rampitec added inline comments.Feb 23 2021, 12:40 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

> It's not a hack, this is how operand classes are intended to work. You wouldn't be producing these instructions on targets that don't support them (ideally we would also have a verifier for this, which is another area where subtarget handling is weak).
>
> The point is to not reserve them. References to unaligned registers can exist, they just can't be used in the context of a real machine operand. D97316 switches to using dedicated classes for alignment (the further cleanup would be to have this come directly from the instruction definitions instead of fixing them up after isel).

I have replied in D97316, but I do not believe it will help as is. It will run into exactly the same issue as the reserved-registers approach and their subregs.

rampitec added inline comments.Feb 23 2021, 1:21 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
191–199

I have checked the commit. This code is not even related to VGPR alignment; it was about SGPRs, in fact:

> If stack is present, 4 low SGPRs are reserved. The pass was using just the first register as a representative for the RC to check for reserved subregs. The first register was reserved, and the pass failed to find a valid lane split.

The testcase is mem_clause_sreg256_used_stack from memory_clause.mir.

rampitec marked 2 inline comments as done.Mar 1 2021, 9:06 AM
foad added inline comments.Mar 29 2021, 11:30 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
100

Spelling "sequence".

rampitec marked an inline comment as done.Mar 29 2021, 12:23 PM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
100