Spill2Reg can now emit spill and reload instructions.
On its own, this patch does not yet generate correct code, because it does not keep track of live registers.
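
The reason liveness matters can be shown with a small sketch: if the vector register chosen as the spill destination still holds a live value, the spill clobbers that value. The helper below is purely illustrative and is not taken from the patch; `pickFreeVectorReg`, the register numbering, and the idea of passing the live set in as a `std::set` are all assumptions made for the example.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper (not part of the patch): choose the first candidate
// vector register that is not live between the spill and its reloads.
// Returns false when every candidate is live, in which case the value
// would have to stay spilled to the stack.
bool pickFreeVectorReg(const std::vector<unsigned> &Candidates,
                       const std::set<unsigned> &LiveRegs,
                       unsigned &Reg) {
  for (unsigned C : Candidates) {
    if (LiveRegs.count(C) == 0) {
      Reg = C; // Safe: nothing live is overwritten.
      return true;
    }
  }
  return false;
}
```

Without such a check, any register the pass picks might be in use, which is why the summary says the generated code is not yet correct.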
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/Spill2Reg.cpp | ||
---|---|---|
235 | Is this assuming you can only spill one register to one vector register? What if you can place multiple values in different subregisters? |
llvm/lib/CodeGen/Spill2Reg.cpp | ||
---|---|---|
235 | Yes, for now we can only spill one register to the first lane of one vector register. The reason is that spilling to any lane other than the first one on x86 requires the PINSR/PEXTR instructions instead of MOVD, and those have a higher latency and use more uops. But yeah, I think it is still worth extending it to spill to more lanes in the future. Here is the relevant data from Agner Fog's instruction tables: |

Spill-to-reg instruction | uops fused domain | uops unfused domain | uops each port | latency | throughput |
---|---|---|---|---|---|
MOVD mm/x r32/64 | 1 | 1 | p5 | 2 | 1 |
MOVD r32/64 mm/x | 1 | 1 | p0 | 2 | 1 |
PINSRD/Q x,r,i | 2 | 2 | 2p5 | 3 | 2 |
PEXTRB/W/D/Q r,x,i | 2 | 2 | p0 p5 | 3 | 1 |

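
To make the tradeoff concrete, the figures quoted above can be plugged into a small cost comparison. This is only an illustrative sketch, not code from the patch: `InstrCost`, `spillCost`, `reloadCost`, and the round-trip helpers are hypothetical names, the four-lane limit is an assumption for 32-bit lanes, and adding the two latencies is a simplification that treats the spill and reload as one dependent chain.

```cpp
#include <cassert>

// Hypothetical cost record. The numbers come from the table quoted above.
struct InstrCost {
  const char *Name;
  int Uops;    // fused-domain uops
  int Latency; // cycles
};

// GPR -> vector register. Lane 0 can use MOVD; other lanes need PINSRD/Q.
InstrCost spillCost(unsigned Lane) {
  assert(Lane < 4 && "sketch assumes four 32-bit lanes");
  return Lane == 0 ? InstrCost{"MOVD x, r", 1, 2}
                   : InstrCost{"PINSRD/Q x, r, i", 2, 3};
}

// Vector register -> GPR. Lane 0 can use MOVD; other lanes need PEXTRD/Q.
InstrCost reloadCost(unsigned Lane) {
  assert(Lane < 4 && "sketch assumes four 32-bit lanes");
  return Lane == 0 ? InstrCost{"MOVD r, x", 1, 2}
                   : InstrCost{"PEXTRD/Q r, x, i", 2, 3};
}

// Latency of one spill followed by one dependent reload.
int roundTripLatency(unsigned Lane) {
  return spillCost(Lane).Latency + reloadCost(Lane).Latency;
}

// Total fused-domain uops for one spill plus one reload.
int roundTripUops(unsigned Lane) {
  return spillCost(Lane).Uops + reloadCost(Lane).Uops;
}
```

Under these assumptions a lane-0 round trip costs 2 uops and 4 cycles, while any other lane costs 4 uops and 6 cycles, which is the motivation for restricting the first version to lane 0.
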
llvm/lib/CodeGen/Spill2Reg.cpp | ||
---|---|---|
235 | The reason I ask is because I'm very interested in using something more like this for AMDGPU. We currently have 2 custom, convoluted mechanisms for handling "spills" to registers. I'm wondering if we could adapt this pass to one of them, but it would require a broader notion of how/where the registers can be spilled (and there might be some additional liveness hazards) |
llvm/lib/CodeGen/Spill2Reg.cpp | ||
---|---|---|
235 | Yeah that would make sense. I think we can make code generation a bit more sophisticated than it is now and have some target specific components decide which register and lane we should use on each spill. |
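
One way to picture such a target-specific component is a small policy interface that hands out a (register, lane) pair per spill. Everything in the sketch below is hypothetical: `SpillLocation`, `SpillLocationPolicy`, and the two example policies are made-up names for illustration and do not exist in LLVM or in this patch, and liveness is deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical description of where a spilled value is placed.
struct SpillLocation {
  unsigned VecReg; // illustrative register number
  unsigned Lane;
};

// Hypothetical target hook: decides the register and lane for each spill.
class SpillLocationPolicy {
public:
  virtual ~SpillLocationPolicy() = default;
  // Returns false when no location is left.
  virtual bool allocate(SpillLocation &Loc) = 0;
};

// Mirrors the current behavior described above: one value per register,
// always in lane 0.
class FirstLaneOnlyPolicy : public SpillLocationPolicy {
  std::vector<unsigned> Regs;
  std::size_t Next = 0;

public:
  explicit FirstLaneOnlyPolicy(std::vector<unsigned> R) : Regs(std::move(R)) {}
  bool allocate(SpillLocation &Loc) override {
    if (Next >= Regs.size())
      return false;
    Loc = SpillLocation{Regs[Next++], 0};
    return true;
  }
};

// A broader policy: fill every lane of a register before moving on.
class MultiLanePolicy : public SpillLocationPolicy {
  std::vector<unsigned> Regs;
  unsigned LanesPerReg;
  std::size_t Next = 0;

public:
  MultiLanePolicy(std::vector<unsigned> R, unsigned Lanes)
      : Regs(std::move(R)), LanesPerReg(Lanes) {
    assert(Lanes > 0 && "need at least one lane per register");
  }
  bool allocate(SpillLocation &Loc) override {
    if (Next >= Regs.size() * LanesPerReg)
      return false;
    Loc = SpillLocation{Regs[Next / LanesPerReg],
                        static_cast<unsigned>(Next % LanesPerReg)};
    ++Next;
    return true;
  }
};
```

A real hook would also need to account for liveness and per-lane instruction cost, which this sketch leaves out.
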
llvm/lib/CodeGen/Spill2Reg.cpp | ||
---|---|---|
235 | I don't think either will be much use to you. One of the mechanisms doesn't really use vector registers in the same sense as a subregister, and also relies on reserved registers |
llvm/test/CodeGen/X86/spill2reg_simple_2.mir | ||
---|---|---|
3 | All test cases use the option -spill2reg-mem-instrs=0, which looks to me more like a debugging option. A practical value that has a positive performance impact should be larger than 0. Could you add a test case for that? |
llvm/test/CodeGen/X86/spill2reg_simple_2.mir | ||
---|---|---|
3 | Yeah, this basically disables the heuristic so that we can check the functionality even in small tests. |