Page MenuHomePhabricator

[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST

Authored by ronlieb on Oct 3 2018, 1:57 PM.


Group Reviewers
Restricted Project

Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.

Diff Detail

Event Timeline

ronlieb created this revision.Oct 3 2018, 1:57 PM
arsenm added inline comments.Oct 4 2018, 2:27 AM

Put these on the following line and indent


I'd rather not have this option. We have an already excessively long list of band-aids to avoid updating tests


I think you could break this with a shader using an inreg/sgpr argument


This is assuming too much about the REG_SEQUENCE, so at least needs an assert. It could be a wider reg_sequence


You don't need these. We'll never emit a physical register operand like this. I might worry about seeing a subreg index on the use operand though


de Morgan this


Comment not needed


Range loop

13 ↗(On Diff #168164)

Move the triple to the command line and drop this

16 ↗(On Diff #168164)

This testcase is way too big. Most of the tests should be only a handful of instructions. There should be ones stressing the immediate limits. I also don't see ones for the atomics that the comments say work

ronlieb marked 10 inline comments as done.Oct 5 2018, 1:32 PM

Matt, thanks for review, I have addressed most of the issues although the inreg/sgpr one may require some more investigation on my end.
see my responses to each of your review comments.
A revision to this patch will be uploaded in the next hour or so...


option removed, and tests altered to not reference the now non existent flag.


I tried to construct a test that exercises this path, its in the next revision of this patch in a test named global-saddr-misc.ll


assert added


changed to check for subreg.

16 ↗(On Diff #168164)

reduced test case to 47 lines total.

ronlieb updated this revision to Diff 168524.Oct 5 2018, 1:43 PM
ronlieb marked 5 inline comments as done.

Revised with various changes to reflect review comments.

I am planning to add another lit test for coverage of all the Global load/store opcodes.


assert seems like overkill, so I am going to change this to
if (DefInst->getNumOperands() != 5)

10 ↗(On Diff #168524)

Reduced version of this test in the works, next patch rev.

ronlieb updated this revision to Diff 168576.Oct 6 2018, 1:28 PM

added a load/store globals test case (.mir)
updated global-saddr-misc.ll per offline discussion.
changed assert to simpler if (check) continue.

I took a look and (1) seems OK, though Matt should opine as well (2) minor nit: you could run instnamer on global-saddr-misc.ll to tidy up the %var<numbering>

Thanks Mark , the instnamer suggestion is a good idea. Completed locally, waiting on further review comments from Matt.

ran instnamer on global-saddr-misc.ll

ronlieb updated this revision to Diff 168781.Oct 9 2018, 5:22 AM

updated global-saddr-misc.ll with instnamer changes

arsenm added inline comments.Oct 9 2018, 7:26 AM

Braces for both half

16 ↗(On Diff #168164)

This is still too big. You only need not much more than load, a store, and a GEP in each function

7–11 ↗(On Diff #168781)

You don't need the complicated vector stuff (this is also broken since addrspace(1) is 64-bit). The pointer argument needs to be the direct argument here. It won't be since now there will end up being a zext to the pointer size, so this isn't testing what it's supposed to

arsenm added inline comments.Oct 9 2018, 7:33 AM
7–11 ↗(On Diff #168781)

I also realized the case I was worried about here probably can't happen

ronlieb added inline comments.Oct 9 2018, 7:45 AM

I thought the coding standard was no braces for single statement 'if' and 'else' , is it not ?
I personally like braces, however staying with convention seems best.
I could also refactor it a little bit and avoid the whole topic...

NewGlob = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcd));
if (HasVdst)
  NewGlob->addOperand(MF, MI.getOperand(0));
ronlieb added inline comments.Oct 9 2018, 8:09 AM
7–11 ↗(On Diff #168781)

seems like I should remove this test now. Originally I added it based on a concern you expressed.
Matt, do you have a preference regarding this test?

ronlieb added inline comments.Oct 9 2018, 9:13 AM
16 ↗(On Diff #168164)

is this comment about only needing "load store gep per function? intended for a different test? This test, conv2d-saddr.ll only has one function. perhaps you were thinking of global-saddr-atomics.ll ?

ronlieb updated this revision to Diff 168823.Oct 9 2018, 10:18 AM

Cleaned up an if-else statement
Slightly reduced global-saddr-misc.ll

arsenm added inline comments.Oct 9 2018, 7:13 PM

It was spread over multiple lines but I don't se that in the diff now



7–11 ↗(On Diff #168781)

It would still be good to have to make sure this actually works, but you need to completely replace the test body to be simpler and avoid these issues. Also I wouldn't split these tests up arbitrarily like this and merge them into one

arsenm added inline comments.Oct 9 2018, 7:14 PM

This is probably just noise for general debugging

ronlieb added inline comments.Oct 9 2018, 7:50 PM

It is indeed for general debugging.
Is there a specific request implied by your comment?
would you prefer this be removed?


This is at the moment a single statement loop, I would expect it to become compound at some point at which braces would be inserted.
See example: line 163

7–11 ↗(On Diff #168781)

I think part of your your suggestion is to reduce the current number of new tests down to 1 or 2.

I would propose changing these
test/CodeGen/AMDGPU/conv2d-saddr.ll (47 lines)
test/CodeGen/AMDGPU/global-load_stores.mir (86 lines)
test/CodeGen/AMDGPU/global-saddr-atomics.ll (359 lines)
test/CodeGen/AMDGPU/global-saddr-misc.ll (13 lines)
test/CodeGen/AMDGPU/global-saddr-offsets.ll (60 lines)

  1. fold these test/CodeGen/AMDGPU/global-load_stores.mir (86 lines) test/CodeGen/AMDGPU/global-saddr-atomics.ll (359 lines)

into a single .mir coverage test
test/CodeGen/AMDGPU/global-load-stores-atomics.mir (86 lines)

  1. fold these test/CodeGen/AMDGPU/conv2d-saddr.ll (47 lines) test/CodeGen/AMDGPU/global-saddr-misc.ll (13 lines) test/CodeGen/AMDGPU/global-saddr-offsets.ll (60 lines)

into a single test

ronlieb updated this revision to Diff 169041.Oct 10 2018, 10:28 AM

Consolidated 5 tests into 2.
A few minor cleanups on some spacing and indentation.

ronlieb updated this revision to Diff 169141.Oct 10 2018, 6:19 PM

Cleaned up global-saddr.ll test, reducing number of functions, and use addrspace(1) i64

You probably need to extend the pass to support subregs, not necessarily in the same patch.
But you definitely need a mir test with subregs used with either operands.


The name of the pass is too opaque and gives no hint what pass actually doing. I think it needs to have something about "flat" in the name.


Can be subreg.


Can be subreg.

arsenm added inline comments.Oct 22 2018, 12:43 PM

It shouldn't. This should be a general pass that isn't specifically for this one thing. There may be other SelectionDAG workarounds we want to put in here

rampitec added inline comments.Oct 22 2018, 1:18 PM


ronlieb updated this revision to Diff 173850.Nov 13 2018, 7:18 AM

Rebased with latest llvm, pushed up to trigger some AMD internal Jenkins testing.
Will work on the subregs suggestions next.

ronlieb updated this revision to Diff 173972.Nov 13 2018, 6:13 PM

Added additional comments and some code related to subregs.

ronlieb marked 9 inline comments as done.Nov 13 2018, 6:18 PM
arsenm added inline comments.Nov 14 2018, 9:18 AM

All of these places listing SReg_64* should probably just avoid it since we have so many variants. Something like
AMDGPU::getRegBitWidth() == 64 && TRI->isSGPRClass(), although maybe we should have an isSGPR64 or something

ronlieb updated this revision to Diff 174085.Nov 14 2018, 12:45 PM

Per review comments, now using getRegBitWidth(),
which exposed a missing case in getRegBitWidth() for AMDGPU::SReg_64_XEXECRegClassID

ronlieb marked an inline comment as done.Nov 14 2018, 12:47 PM
This revision is now accepted and ready to land.Nov 15 2018, 10:21 AM
arsenm accepted this revision.Nov 15 2018, 11:07 AM


ronlieb closed this revision.Nov 16 2018, 2:08 PM

rL347008: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST