Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Thanks for finally doing this. Does this fix infinite loops? I didn't see one in the new tests, and I think there are a couple of XFAILed ones right now.
I think we should put an assert somewhere that the immediate for the branch target isn't overflowing. I'm worried that we could end up hitting that limit someday.
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
42 ↗ | (On Diff #46051) | Extra space before ( |
lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp | ||
113 ↗ | (On Diff #46051) | Indentation looks off |
115 ↗ | (On Diff #46051) | You should use TII->isSOPP(*I) here. I think there should be an assert that I->isBranch() |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1168 ↗ | (On Diff #46051) | It seems like there should be an assert or something that this is actually a branch intrinsic |
1169 ↗ | (On Diff #46051) | Typo leaglize |
1172–1177 ↗ | (On Diff #46051) | The SetCC in the if should be folded into a single assert, or this whole block be put under an #ifndef NDEBUG block |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
2913–2914 ↗ | (On Diff #46051) | I think this could use a better name. How about something like |
2915–2916 ↗ | (On Diff #46051) | We should probably have a verifier check for this |
2924–2925 ↗ | (On Diff #46051) | Why is this limited to branches? What if you had another user like a s_cselect? I would expect any user to need processing |
lib/Target/AMDGPU/SIInstrInfo.td | ||
252 ↗ | (On Diff #46051) | Extra space |
855 ↗ | (On Diff #46051) | Commented out code |
lib/Target/AMDGPU/SIInstructions.td | ||
465 ↗ | (On Diff #46051) | Looks unnecessary |
3118–3119 ↗ | (On Diff #46051) | This looks like the workaround for a tablegen bug with multiple outputs to reg_sequence. Maybe note that in a FIXME? |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
413 ↗ | (On Diff #46051) | This should be the last in the array. The more common types should be first |
test/CodeGen/AMDGPU/si-annotate-cf.ll | ||
43–46 ↗ | (On Diff #46051) | Why this test change and the others like it? |
test/CodeGen/AMDGPU/uniform-cfg.ll | ||
362 ↗ | (On Diff #46051) | I think there should be a test where an icmp is the branch condition in a different block (there might be on here that I missed). One with the icmp used as the local branch's condition and another block's as well. There are a few other perverse cases that might be nice to test, like branch condition on inline asm output. |
Is it this hardware bug handled properly?
Description: VCCZ bit corrupted by SMRD
Scope: SI+CI
Solution: Anytime a Scalar Memory read is issued, the shader compiler must assume that VCCZ is no longer valid AND cannot be loaded with a new value until all of the scalar memory reads have completed and returned their data, as guaranteed by inserting an S_WAITCNT LGKM_CNT(0) before issuing an instruction which updates VCCZ.
lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp | ||
---|---|---|
115 ↗ | (On Diff #46051) | You mean the assert should be inside the branch? |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
2924–2925 ↗ | (On Diff #46051) | If we don't limit this to branch instructions, then any instructions in the block which read scc will be moved to VALU, which we don't want. For most instructions the decision to move them to the VALU is based on explicit uses. This function needs to handle the cases where changing and implicit uses would force the instruction to go to the VALU. |
test/CodeGen/AMDGPU/si-annotate-cf.ll | ||
43–46 ↗ | (On Diff #46051) | Most of the tests with branching assume divergent branches, but with these changes most branches are uniform, so I forced branches to be divergent in places where the test assumes divergence. |
lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp | ||
---|---|---|
115 ↗ | (On Diff #46051) | Actually since this is from getFirstTerminator, I think the MachineVerifier would already catch this so it's not important |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
2924–2925 ↗ | (On Diff #46051) | But if the SCC def instruction was moved, any non-branch users would also need to be moved. We don't want all SCC users, just the ones until the next SCC def which is what this loop does. I don't think being implicit avoids the problem of needing to move the users. Also I think this function should be only called for isDef() below. I think this works because the def check is still done for the original def instruction, but is more work to follow. This will end up getting called twice if you happen to have an addc to branch on overflow which has both a use and a def, and it will end up moving more instructions from the use. The loop should also probably start at SCCDefInst + 1 also. |