This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Enable TableGen'd instruction selector
ClosedPublic

Authored by tstellar on Apr 23 2018, 3:13 PM.

Diff Detail

Event Timeline

tstellar created this revision.Apr 23 2018, 3:13 PM
arsenm added inline comments.Apr 23 2018, 4:31 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1578

Should this be the allocatable SReg_32_XM0?

1581

Same for XEXEC

test/CodeGen/AMDGPU/GlobalISel/inst-select-or.mir
5

should be able to drop this if the calling convention doesn't matter

arsenm added inline comments.Apr 23 2018, 4:32 PM
lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
440

Should the default be selectImpl?

tstellar updated this revision to Diff 143843.Apr 24 2018, 5:20 PM

Address code review comments.

tstellar marked 2 inline comments as done.Apr 24 2018, 5:27 PM
tstellar added inline comments.
lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
440

Eventually yes, but enabling it for everything now could lead to crashes and miscompiles that will prevent the SelectionDAG fallback from activating. I would like to get a full piglit run working with GISel for simple shaders and SelectionDAG for everything else before we make selectImpl() the default.

tstellar updated this revision to Diff 145852.May 8 2018, 8:23 PM

Update TableGen patterns so that they will automatically fold immediates into
the instruction.

arsenm added inline comments.May 8 2018, 10:34 PM
lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
478

We don't fold immediates in the DAG selector now, so why do this here?

nhaehnle added inline comments.May 9 2018, 1:09 AM
lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
478

In fact, I remember there are some edge cases where folding immediates directly doesn't work due to moveToVALU. Basically, if you initially generate an instruction like S_ADD with an immediate and that instruction then gets moved to VALU it becomes a VOP3-encoded V_ADD which cannot have an immediate (because of the restriction that instructions can be at most 64 bits long), leading to machine instruction verifier errors.

The question is whether the same issue may apply here when going through GlobalISel in corner cases, even if we try to be better at selection the correct ALU directly. I think it makes sense to be a bit conservative initially.

tstellar added inline comments.May 9 2018, 7:21 AM
lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
478

We don't fold immediates in the DAG selector now, so why do this here?

Because we can? Ideally doing folding in during ISel would allow us to eliminate or greatly simplify the SIFoldOperands pass, but admittedly, global-isel is so new I don't know if that would be possible. I can go either way on folding during ISel, but it would at least be nice to have a target feature to enable this.

How do you feel about selecting directly to _e32 variants when we can instead of _e64, which also something that global-isel is doing?

tstellar added inline comments.May 9 2018, 7:43 AM
lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
478

Because we can? Ideally doing folding in during ISel would allow us to eliminate or greatly simplify the SIFoldOperands pass, but admittedly, global-isel is so new I don't know if that would be possible. I can go either way on folding during ISel, but it would at least be nice to have a target feature to enable this.

Actually, thinking about this more I would prefer to have global-isel do immediate folding by default with a target option to disable it. global-isel is still very experimental for AMDGPU, and I would like to take advantage of as many new features as possible while it's still in the experimental state. This will also help us learn what's possible and what isn't.

478

In theory global-isel won't just be better at selecting the correct ALU, it will *always* select the correct ALU, so we shouldn't have to worry about corner cases like this, but this is all in theory until we actually have a more complete global-isel implementation.

Hmm, you may have a point about trying to push the boundaries of what's possible by folding immediately. It would certainly be nice if it was possible.

arsenm added inline comments.May 9 2018, 11:03 AM
lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
478

I don't particularly like either. I don't think doing these during selection eliminates the need to do these things later. Other passes will be introducing and changing instructions, which will introduce new places these need to happen. It's true most of the places that need this now are junk created by SIFixSGPRCopies hacks.

I also think for good debug info constants are supposed to be materialized into a register before use. There are also multiple use and code size considerations when an immediate is used multiple times, and it's probably not appropriate for a single instruction pattern to be considering those, and a proper optimization pass would have to do it.

There is some stuff that has become problematic we do now in SIFoldOperands (the way clamp folding is implemented for example starts to break down with packed operations) that would be better to move when we have more semantically pure pseudoinstructions. Similarly, SDWA would probably be easier do handle during isel (although it has some of the other code size consideration issues as immediates)

Doing folding (especially for encoding shrinking) must be done later, since things like frame index elimination introduce new constants to deal with very late.

tstellar updated this revision to Diff 146056.May 9 2018, 8:20 PM

I've dropped the immediate folding for now. This is blocking the rest of
my global isel patches, so I would rather just get this simple version
committed. We can always bring back the immediate folding if we think
it will be useful.

arsenm accepted this revision.May 10 2018, 5:09 AM

LGTM. At this point I would be more interested in what source and dest modifier folding looks like in global isel

This revision is now accepted and ready to land.May 10 2018, 5:09 AM
This revision was automatically updated to reflect the committed changes.