Joe_Nash rampitec piotr jpages
- rGcfb7ffdec0eb: [AMDGPU] New AMDGPUInsertDelayAlu pass
For tests with generated checks: if the only difference was adding a bunch of s_delay_alu instructions then I regenerated the checks. If it was more disruptive (e.g. if it meant that shared prefixes could no longer be used) then I added -amdgpu-enable-delay-alu=0 to the GFX11 RUN lines instead.
Should check LLVM_ENABLE_DUMP instead
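For reference, a minimal sketch of the guard this comment is asking for. It assumes the original code gated its dump helpers on NDEBUG alone. `DelayInfo`, `Cycles`, and `print` are hypothetical names for illustration, not taken from the patch, and real LLVM code would normally print to a raw_ostream and mark the method with LLVM_DUMP_METHOD rather than use std::ostream.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a per-instruction record kept by the pass.
struct DelayInfo {
  unsigned Cycles = 0;

  // Always available: formats the record onto any stream.
  void print(std::ostream &OS) const { OS << "DelayInfo(" << Cycles << ")"; }

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  // Only compiled in asserts builds, or when dumping is explicitly enabled,
  // so release builds that define LLVM_ENABLE_DUMP still get dump().
  void dump() const {
    print(std::cerr);
    std::cerr << '\n';
  }
#endif
};
```

Checking both macros means a release build configured with dumping enabled keeps the helper, which a plain NDEBUG check would strip.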
Bad autos, I'm not even sure what the type is supposed to be here
Can merge these into one LLVM_DEBUG. Also should use printMBBReference
This is D128313 (except for SI_RETURN_TO_EPILOG which I'm not sure about since it's a bit weird)
Should extract this into a separate predicate function
Why does the opcode need special casing here? Why not every tied operand?
Done, but the single LLVM_DEBUG looks uglier to me.
Can update this when that patch lands.
Because the hardware does not see a RAW dependency here:
    v_mov_b32 v0, 0
    v_writelane_b32 v0, s0, 0
At the thread level there is no dependency. The MIR instruction uses a tied read-write operand for v0, which I suppose is a wave-level representation where the read represents the lanes that are not modified.
Other instructions with tied operands (like V_MAC) represent a real read of a VGPR in all active lanes, so we do want to model a delay for them.
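To make the distinction concrete, here is a small standalone model of the check, not the code from the patch. The `Opcode` enum, the `Operand` and `Inst` structs, and the functions `ignoresTiedRead` and `hasRAWDependency` are all hypothetical. Registers are plain integers, with 0 standing for v0 and 100 for s0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcodes standing in for the real AMDGPU ones.
enum class Opcode { V_MOV_B32, V_WRITELANE_B32, V_MAC_F32 };

// One register operand of an instruction.
struct Operand {
  unsigned Reg; // register number in this toy model
  bool IsDef;   // true for a written register, false for a read
  bool IsTied;  // true for a read that is tied to a def
};

struct Inst {
  Opcode Op;
  std::vector<Operand> Ops;
};

// The predicate suggested in the review: for which opcodes is the tied read
// only a wave-level artifact rather than a real per-lane read?
bool ignoresTiedRead(Opcode Op) { return Op == Opcode::V_WRITELANE_B32; }

// Returns true if Consumer really reads a register that Producer writes.
bool hasRAWDependency(const Inst &Producer, const Inst &Consumer) {
  for (const Operand &Use : Consumer.Ops) {
    if (Use.IsDef)
      continue; // only reads can form a RAW dependency
    if (Use.IsTied && ignoresTiedRead(Consumer.Op))
      continue; // writelane: the tied read is not seen by the hardware
    for (const Operand &Def : Producer.Ops)
      if (Def.IsDef && Def.Reg == Use.Reg)
        return true;
  }
  return false;
}
```

With this model, v_mov_b32 v0 followed by v_writelane_b32 v0 reports no dependency, while v_mov_b32 v0 followed by a V_MAC that accumulates into v0 does, matching the two cases described above.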