Details
- Reviewers: Joe_Nash, rampitec, piotr, jpages
- Group Reviewers: Restricted Project
- Commits: rGcfb7ffdec0eb: [AMDGPU] New AMDGPUInsertDelayAlu pass
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
For tests with generated checks: if the only difference was the addition of s_delay_alu instructions, I regenerated the checks. If the change was more disruptive (e.g. if it meant that shared check prefixes could no longer be used), I added -amdgpu-enable-delay-alu=0 to the GFX11 RUN lines instead.
llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp
Line(s) | Comment
---|---
163 | Should check LLVM_ENABLE_DUMP instead.
218 | Bad autos; I'm not even sure what the type is supposed to be here.
302–303 | Can merge these into one LLVM_DEBUG. Also, should use printMBBReference.
314–321 | This is D128313 (except for SI_RETURN_TO_EPILOG, which I'm not sure about since it's a bit weird).
324–329 | Should extract this into a separate predicate function.
331–333 | Ditto.
346–349 | Why does the opcode need special casing here? Why not every tied operand?
386–387 | Merge these into one LLVM_DEBUG.
llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp
Line(s) | Reply
---|---
302–303 | Done, but the single LLVM_DEBUG looks uglier to me.
314–321 | Can update this when that patch lands.
346–349 | Because the hardware does not see a RAW dependency here: in `v_mov_b32 v0, 0` followed by `v_writelane_b32 v0, s0, 0`, there is no dependency at the thread level. The MIR instruction uses a tied read-write operand for v0, which I suppose is a wave-level representation where the read represents the lanes that are not modified. Other instructions with tied operands (like V_MAC) represent a real read of a VGPR in all active lanes, so we do want to model a delay for them.
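The distinction in the reply to lines 346–349 can be modeled per lane. A minimal sketch under stated assumptions: a toy wave of 32 lanes, integer arithmetic standing in for the float multiply-accumulate, and hypothetical `TiedKind` tags and `tiedUseIsRealRead` predicate that only mirror the special case being discussed (they are not the pass's actual code):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy register file for one wave: one 32-bit value per lane. A wave size of
// 32 is an assumption made for illustration.
constexpr std::size_t kLanes = 32;
using VReg = std::array<std::uint32_t, kLanes>;

// v_writelane_b32 v0, s0, Lane modeled in place: only lane `Lane` is written,
// and no lane's old value feeds into any computation. The tied MIR operand
// only stands for the lanes that are left unchanged.
void writelane(VReg &V, std::uint32_t S, std::size_t Lane) { V[Lane] = S; }

// V_MAC-style multiply-accumulate (integers for simplicity): every lane reads
// its own old accumulator value, which is a real per-lane RAW dependency.
void mac(VReg &Acc, const VReg &A, const VReg &B) {
  for (std::size_t I = 0; I < kLanes; ++I)
    Acc[I] = A[I] * B[I] + Acc[I];
}

// Hypothetical tags and predicate: a tied use counts as a read for delay
// purposes unless the instruction is a writelane.
enum class TiedKind { WriteLane, MacLike };
bool tiedUseIsRealRead(TiedKind K) { return K != TiedKind::WriteLane; }
```

In this model, `writelane` never reads `V`, while `mac` reads `Acc` in every lane. That is why only the latter needs a modeled delay after a preceding VALU write.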