Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.
Details
Diff Detail
Event Timeline
lib/Target/AMDGPU/FLATInstructions.td | ||
---|---|---|
179–182 | Put these on the following line and indent | |
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
48–52 | I'd rather not have this option. We have an already excessively long list of band-aids to avoid updating tests | |
101–102 | I think you could break this with a shader using an inreg/sgpr argument | |
108–111 | This is assuming too much about the REG_SEQUENCE, so at least needs an assert. It could be a wider reg_sequence | |
113–118 | You don't need these. We'll never emit a physical register operand like this. I might worry about seeing a subreg index on the use operand though | |
126 | de Morgan this | |
208 | Comment not needed | |
218–219 | Range loop | |
test/CodeGen/AMDGPU/conv2d-saddr.ll | ||
13 ↗ | (On Diff #168164) | Move the triple to the command line and drop this |
16 ↗ | (On Diff #168164) | This testcase is way too big. Most of the tests should be only a handful of instructions. There should be ones stressing the immediate limits. I also don't see ones for the atomics that the comments say work |
Matt, thanks for review, I have addressed most of the issues although the inreg/sgpr one may require some more investigation on my end.
see my responses to each of your review comments.
A revision to this patch will be uploaded in the next hour or so...
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
48–52 | option removed, and tests altered to not reference the now non existent flag. | |
101–102 | I tried to construct a test that exercises this path, its in the next revision of this patch in a test named global-saddr-misc.ll | |
108–111 | assert added | |
113–118 | changed to check for subreg. | |
test/CodeGen/AMDGPU/conv2d-saddr.ll | ||
16 ↗ | (On Diff #168164) | reduced test case to 47 lines total. |
I am planning to add another lit test for coverage of all the Global load/store opcodes.
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
102 | assert seems like overkill, so I am going to change this to continue; | |
test/CodeGen/AMDGPU/global-saddr-misc.ll | ||
10 ↗ | (On Diff #168524) | Reduced version of this test in the works, next patch rev. |
added a load/store globals test case (.mir)
updated global-saddr-misc.ll per offline discussion.
changed assert to simpler if (check) continue.
I took a look and (1) seems OK, though Matt should opine as well (2) minor nit: you could run instnamer on global-saddr-misc.ll to tidy up the %var<numbering>
Thanks Mark , the instnamer suggestion is a good idea. Completed locally, waiting on further review comments from Matt.
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
168–173 | Braces for both half | |
test/CodeGen/AMDGPU/conv2d-saddr.ll | ||
16 ↗ | (On Diff #168164) | This is still too big. You only need not much more than load, a store, and a GEP in each function |
test/CodeGen/AMDGPU/global-saddr-misc.ll | ||
7–11 ↗ | (On Diff #168781) | You don't need the complicated vector stuff (this is also broken since addrspace(1) is 64-bit). The pointer argument needs to be the direct argument here. It won't be since now there will end up being a zext to the pointer size, so this isn't testing what it's supposed to |
test/CodeGen/AMDGPU/global-saddr-misc.ll | ||
---|---|---|
7–11 ↗ | (On Diff #168781) | I also realized the case I was worried about here probably can't happen |
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
168–173 | I thought the coding standard was no braces for single statement 'if' and 'else' , is it not ? NewGlob = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcd)); if (HasVdst) NewGlob->addOperand(MF, MI.getOperand(0)); |
test/CodeGen/AMDGPU/global-saddr-misc.ll | ||
---|---|---|
7–11 ↗ | (On Diff #168781) | seems like I should remove this test now. Originally I added it based on a concern you expressed. |
test/CodeGen/AMDGPU/conv2d-saddr.ll | ||
---|---|---|
16 ↗ | (On Diff #168164) | is this comment about only needing "load store gep per function? intended for a different test? This test, conv2d-saddr.ll only has one function. perhaps you were thinking of global-saddr-atomics.ll ? |
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
168–173 | It was spread over multiple lines but I don't se that in the diff now | |
206 | Braces | |
test/CodeGen/AMDGPU/global-saddr-misc.ll | ||
7–11 ↗ | (On Diff #168781) | It would still be good to have to make sure this actually works, but you need to completely replace the test body to be simpler and avoid these issues. Also I wouldn't split these tests up arbitrarily like this and merge them into one |
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
191 | This is probably just noise for general debugging |
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
191 | It is indeed for general debugging. | |
206 | This is at the moment a single statement loop, I would expect it to become compound at some point at which braces would be inserted. | |
test/CodeGen/AMDGPU/global-saddr-misc.ll | ||
7–11 ↗ | (On Diff #168781) | I think part of your your suggestion is to reduce the current number of new tests down to 1 or 2. I would propose changing these
into a single .mir coverage test
into a single test |
Cleaned up global-saddr.ll test, reducing number of functions, and use addrspace(1) i64
You probably need to extend the pass to support subregs, not necessarily in the same patch.
But you definitely need a mir test with subregs used with either operands.
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
10 | The name of the pass is too opaque and gives no hint what pass actually doing. I think it needs to have something about "flat" in the name. | |
118 | Can be subreg. | |
123 | Can be subreg. |
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
10 | It shouldn't. This should be a general pass that isn't specifically for this one thing. There may be other SelectionDAG workarounds we want to put in here |
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
10 | Ok. |
Rebased with latest llvm, pushed up to trigger some AMD internal Jenkins testing.
Will work on the subregs suggestions next.
lib/Target/AMDGPU/SIFixupVectorISel.cpp | ||
---|---|---|
126 | All of these places listing SReg_64* should probably just avoid it since we have so many variants. Something like |
Per review comments, now using getRegBitWidth(),
which exposed a missing case in getRegBitWidth() for AMDGPU::SReg_64_XEXECRegClassID
rL347008: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Put these on the following line and indent