Page MenuHomePhabricator

[AMDGPU] Fix a weird WWM intrinsic issue.
ClosedPublic

Authored by sheredom on Dec 21 2018, 7:08 AM.

Details

Summary

I found a really strange WWM issue through a very convoluted shader that essentially boils down to a bug in SIInstrInfo where canReadVGPR did not correctly identify that WWM is like a copy and can have a VGPR as its source.

Diff Detail

Event Timeline

sheredom created this revision.Dec 21 2018, 7:08 AM

Interesting.

The only user of canReadVGPR is addUsersToMoveToVALUWorklist. Since the intended semantics of canReadVGPR aren't at all clear from the name, might I suggesting folding it into its only user?

Following the chain further down, getOpRegClass also doesn't have particularly clear semantics. It's at least documented in the method's comment, but still: a function which returns an instruction's opcode's intrinsic regclass, but then falls back to the regclass that just happens to be there seems a bit misguided to me.

I think getOpRegClass should at most be a convenience function that returns the opcode's regclass if it exists, or nullptr otherwise. From a quick glance, it looks like most (all?) users should be fine with either using an opcode's intrinsic regclass (e.g. in getOpSize) or, more commonly, with just taking the MachineOperand's register class.

In the case of addUsersToMoveToVALUWorklist, the code should just check the users' opcode's intrinsic regclass; if it exists, add the user based on whether the regclass of the opcode has VGPRs (if it doesn't, we need to move-to-VALU, i.e. add to worklist). If it doesn't exist, IMO we should just add to the worklist unconditionally, and handle the fallout (if any) in the main moveToVALU method. That way, the more complex logic is centered in one place.

I'd appreciate if you could help cleanup the technical debt here on those points...

getOpRegClass behaves the way it does because otherwise it will fail on any generic or variadic instructions, e.g. copy or a call instruction

sheredom updated this revision to Diff 180461.Jan 7 2019, 4:50 AM

Removed canReadVGPR as it had only a single callsite.

@nhaehnle I've removed canReadVGPR as it only had the single callsite - but given @arsenm's comment I did not change getOpRegClass. I also don't feel super comfortable doing the change you suggested to addUsersToMoveToVALUWorklist as I'm worried that there will be non-LLVM-tested paths that I could trip up on with ease. I'd rather ship the commit as is if y'all are ok with it.

LGTM but I know hardly anything about WWM

test/CodeGen/AMDGPU/fix-wwm-vgpr-copy.ll
5

Move to -mtriple and remove the triple/datalayout

sheredom updated this revision to Diff 183822.Jan 28 2019, 4:10 AM

Fixed review comments.

sheredom marked an inline comment as done.Jan 28 2019, 4:10 AM
tpr accepted this revision.Jan 29 2019, 3:20 AM

LGTM.

I was thinking it is a bit odd that we're having to handle WQM and WWM in this way even though they are target instructions. However I think it is ok because they act like the non-target COPY instruction in that they do not specify a register class for operands.

This revision is now accepted and ready to land.Jan 29 2019, 3:20 AM
This revision was automatically updated to reflect the committed changes.