Page MenuHomePhabricator

[AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization
Needs ReviewPublic

Authored by critson on Wed, Jan 13, 7:06 PM.

Details

Summary

Frame-base materialization we may insert vector instructions
before EXEC is initialised. Fix this by moving lowering of
llvm.amdgcn.init.exec post-RA.
Also remove SI_INIT_EXEC_LO pseudo as this is not necessary.

Based on code by Ruiling Song.

Diff Detail

Event Timeline

critson created this revision.Wed, Jan 13, 7:06 PM
critson requested review of this revision.Wed, Jan 13, 7:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Jan 13, 7:06 PM

This looks good to me, but I'm not very familiar with LLVM.

arsenm added inline comments.Thu, Jan 14, 8:19 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1887–1893

We shouldn't need to do ad-hoc use/def searches. Can you do this pre-RA, post-SSA like SILowerControlFlow?

critson added inline comments.Thu, Jan 14, 4:09 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1887–1893

We definitely could do.
Do you have an opinion about what pass it should go in?
It looks to me that we would need to add one.

ruiling added inline comments.Thu, Jan 14, 10:11 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1916

In case the FirstMI is not null, which means a COPY instruction to copy from input register was not eliminated for some reason, we need to move the COPY instruction to the beginning of the block and insert exec initialization right after that. Because the frame-base materialization instructions may be inserted before the COPY instruction. We need to make sure the exec initialization always be inserted at the entry of the block.

critson added inline comments.Thu, Jan 14, 10:51 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1916

I only partially agree with this.
In theory the input to the intrinsic can be any operation -- unless we refine its definition (which I am not against doing).
So in the general case we cannot move this operation before the instruction which generates its input.
If we are saying that the input to this register is always a live-in, then we shouldn't be moving the COPY, rather we should be using the source of the COPY as the source of this operation directly.

ruiling added inline comments.Fri, Jan 15, 12:52 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1916

what I said is the most common case at MachineIR. a Copy from funtion input, then init.exec.from.input with the Copy destination. I think no frontend will generate the argument to init.exec.with.input with arbitray operation. I am not saying moving operation before the instruction that generates the input to the operation. The general idea is we need to ensure the instructions to set the EXEC should be before any vector instruction. Based on this, we should move all instructions in the define-use chain of the input to init.exec.with.input to block entry to be before all vector instructions. In practice, your suggestion to use the Copy' source should work because that's the only usage the frontend will generate. The existing implementation before your change already assume Copy is the only possible operation that generates input to this intrinsic.

BTW I felt the input in the name of llvm.amdgcn.init.exec.from.input some kind of means the argument is the function input argument in LLVM IR. I think we can clarify that the value to this intrinsic should be an function input argument. That is the only requirement I can see currently from frontend. So that we don't need to consider various hard to solve situations. @mareko What do you think?

BTW I felt the input in the name of llvm.amdgcn.init.exec.from.input some kind of means the argument is the function input argument in LLVM IR. I think we can clarify that the value to this intrinsic should be an function input argument. That is the only requirement I can see currently from frontend. So that we don't need to consider various hard to solve situations. @mareko What do you think?

Yes, the input can only be an SGPR function argument.

ruiling added inline comments.Mon, Jan 18, 4:44 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1887–1893

We definitely could do.
Do you have an opinion about what pass it should go in?
It looks to me that we would need to add one.

Can we just implement this in SILowerControlFlow?