I found a really strange WWM issue through a very convoluted shader that essentially boils down to a bug in SIInstrInfo where canReadVGPR did not correctly identify that WWM is like a copy and can have a VGPR as its source.
Details
Diff Detail
Event Timeline
Interesting.
The only user of canReadVGPR is addUsersToMoveToVALUWorklist. Since the intended semantics of canReadVGPR aren't at all clear from the name, might I suggesting folding it into its only user?
Following the chain further down, getOpRegClass also doesn't have particularly clear semantics. It's at least documented in the method's comment, but still: a function which returns an instruction's opcode's intrinsic regclass, but then falls back to the regclass that just happens to be there seems a bit misguided to me.
I think getOpRegClass should at most be a convenience function that returns the opcode's regclass if it exists, or nullptr otherwise. From a quick glance, it looks like most (all?) users should be fine with either using an opcode's intrinsic regclass (e.g. in getOpSize) or, more commonly, with just taking the MachineOperand's register class.
In the case of addUsersToMoveToVALUWorklist, the code should just check the users' opcode's intrinsic regclass; if it exists, add the user based on whether the regclass of the opcode has VGPRs (if it doesn't, we need to move-to-VALU, i.e. add to worklist). If it doesn't exist, IMO we should just add to the worklist unconditionally, and handle the fallout (if any) in the main moveToVALU method. That way, the more complex logic is centered in one place.
I'd appreciate if you could help cleanup the technical debt here on those points...
getOpRegClass behaves the way it does because otherwise it will fail on any generic or variadic instructions, e.g. copy or a call instruction
@nhaehnle I've removed canReadVGPR as it only had the single callsite - but given @arsenm's comment I did not change getOpRegClass. I also don't feel super comfortable doing the change you suggested to addUsersToMoveToVALUWorklist as I'm worried that there will be non-LLVM-tested paths that I could trip up on with ease. I'd rather ship the commit as is if y'all are ok with it.
LGTM but I know hardly anything about WWM
test/CodeGen/AMDGPU/fix-wwm-vgpr-copy.ll | ||
---|---|---|
5 | Move to -mtriple and remove the triple/datalayout |
LGTM.
I was thinking it is a bit odd that we're having to handle WQM and WWM in this way even though they are target instructions. However I think it is ok because they act like the non-target COPY instruction in that they do not specify a register class for operands.
Move to -mtriple and remove the triple/datalayout