This patch also supports lowering global addresses.
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp:123
Because pcalau12i always produces a result with the lower 12 bits cleared, isn't ori more suitable for combining in the lower 12 bits?
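llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp:123
The linker adjusts the %pc_hi20 value of the pcalau12i instruction according to whether bit 11 (the sign bit of the low 12-bit part) is set, so the sign-extending addi.d still produces the correct address. Using addi.d also allows the low part to be folded into the offset of a subsequent memory access. For example:

    pcalau12i $a1, %pc_hi20(G)
    addi.d    $a2, $a1, %pc_lo12(G)
    ld.w      $a1, $a2, 0

becomes:

    pcalau12i $a1, %pc_hi20(G)
    ld.w      $a2, $a1, %pc_lo12(G)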
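To make the sign-extension argument concrete, the C++ sketch below models only the address arithmetic. It assumes the psABI-style rounding for %pc_hi20 (adding 0x800 to the symbol address before taking its 4 KiB page) and ignores relocation addends; the helper names (`pcHi20Page`, `viaAddi`, `viaOri`) are hypothetical and this is not linker or LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of pcalau12i + {addi.d, ori} address arithmetic.
// Assumption: %pc_hi20(S) rounds S by +0x800 before taking its 4 KiB page,
// so the page compensates for a negative (sign-extended) low part.

// Sign-extend the low 12 bits of v, as addi.d and ld.w do with their si12.
int64_t signExtend12(uint64_t v) {
  int64_t lo = static_cast<int64_t>(v & 0xfff);
  return lo >= 0x800 ? lo - 0x1000 : lo;
}

// Page offset (hi20 << 12) that pcalau12i adds to the page of PC.
uint64_t pcHi20Page(uint64_t S, uint64_t pc) {
  const uint64_t pageMask = ~UINT64_C(0xfff);
  uint64_t delta = ((S + 0x800) & pageMask) - (pc & pageMask);
  // A signed 20-bit page count gives pcalau12i a reach of about +/-2 GiB.
  assert(static_cast<int64_t>(delta) >= -INT64_C(0x80000000) &&
         static_cast<int64_t>(delta) <= INT64_C(0x7ffff000));
  return delta;
}

// Value left in the destination register by pcalau12i.
uint64_t pcalau12i(uint64_t S, uint64_t pc) {
  return (pc & ~UINT64_C(0xfff)) + pcHi20Page(S, pc);
}

// pcalau12i + addi.d: the low 12 bits are added as a signed immediate.
uint64_t viaAddi(uint64_t S, uint64_t pc) {
  return pcalau12i(S, pc) + static_cast<uint64_t>(signExtend12(S));
}

// pcalau12i + ori: the low 12 bits are OR-ed in, zero-extended.
uint64_t viaOri(uint64_t S, uint64_t pc) {
  return pcalau12i(S, pc) | (S & 0xfff);
}
```

For S = 0x12345900 and PC = 0x10000, `viaAddi` returns S, whereas `viaOri` returns 0x12346900: the rounded page already accounts for the 0x900 low part being read as -0x700. When bit 11 is clear (e.g. S = 0x12345100) both agree. Under this rounding, addi.d is the matching instruction, and the same signed low part is what can become a load offset.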
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll:30
Switch to opaque pointers.
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp:122
    SDValue AddrHi(DAG.getMachineNode(LoongArch::PCALAU12I, DL, Ty, GA), 0);
While I'm not entirely sure about the new pcalau12i + addi sequence for materializing symbol addresses, it looks good regardless, as do the other test cases.
Do you think a test exercising a very large getelementptr offset is worthwhile? LGTM otherwise.
Thanks!
The large-offset case is already covered by previous patches: such offsets are materialized with immediate-load sequences.
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll:330
Based on the previous discussion, should we move ori to the end of the long immediate-load sequence and change it to addi.d where possible, so that a peephole optimization could combine addi.d and ld into one instruction?
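The proposed reordering can be sketched by modelling each instruction on 64-bit values. The semantics below are paraphrased from the LoongArch ISA (lu12i.w sign-extends its 32-bit result, lu32i.d replaces bits 63:32 with its sign-extended si20, lu52i.d replaces bits 63:52); the helper names are hypothetical and this is an illustration, not LLVM's materialization code. The current order puts ori second; the hypothetical order builds a page-aligned base first and ends with a signed addi.d, which is the step a peephole could fold into a following ld offset.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of v to 64 bits.
int64_t sext(uint64_t v, unsigned bits) {
  uint64_t m = UINT64_C(1) << (bits - 1);
  v &= (UINT64_C(1) << bits) - 1;
  return static_cast<int64_t>((v ^ m) - m);
}

// Simplified instruction models (hypothetical helpers, not an ISA reference).
uint64_t lu12i_w(uint64_t si20) {
  return static_cast<uint64_t>(sext(si20 << 12, 32));
}
uint64_t ori(uint64_t rj, uint64_t ui12) { return rj | (ui12 & 0xfff); }
uint64_t lu32i_d(uint64_t rd, uint64_t si20) {
  // Keep bits 31:0; bits 63:32 become sign-extended si20.
  return (rd & 0xffffffff) | (static_cast<uint64_t>(sext(si20, 20)) << 32);
}
uint64_t lu52i_d(uint64_t rj, uint64_t si12) {
  return (rj & ((UINT64_C(1) << 52) - 1)) | ((si12 & 0xfff) << 52);
}
uint64_t addi_d(uint64_t rj, uint64_t si12) {
  return rj + static_cast<uint64_t>(sext(si12, 12));
}

// Current order: lu12i.w; ori; lu32i.d; lu52i.d.
uint64_t loadImmCurrent(uint64_t imm) {
  uint64_t r = lu12i_w((imm >> 12) & 0xfffff);
  r = ori(r, imm & 0xfff);
  r = lu32i_d(r, (imm >> 32) & 0xfffff);
  r = lu52i_d(r, imm >> 52);
  return r;
}

// Hypothetical order: lu12i.w; lu32i.d; lu52i.d build `base`, and the signed
// `lo` is left for a final addi.d (or an ld offset).
struct BaseAndOffset {
  uint64_t base;
  int64_t lo;
};

BaseAndOffset loadImmAddiLast(uint64_t imm) {
  int64_t lo = sext(imm, 12);                     // addi.d sign-extends
  uint64_t hi = imm - static_cast<uint64_t>(lo);  // absorbs the carry
  assert((hi & 0xfff) == 0);
  uint64_t r = lu12i_w((hi >> 12) & 0xfffff);
  r = lu32i_d(r, (hi >> 32) & 0xfffff);
  r = lu52i_d(r, hi >> 52);
  return {r, lo};
}
```

For 0x123456789abcd900, the hypothetical order yields base 0x123456789abce000 and lo = -0x700. When bit 11 is set, the upper parts must absorb a carry from the low part. Whether either order is faster depends on macro-op fusion, which this model does not capture.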
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll:330
In principle this can be implemented, but it seems useful in only very few scenarios (perhaps only when accessing a constant memory address). I'm not entirely sure, though.
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll:330
I think this depends on the microarchitecture. If lu12i.w; ori; lu32i.d; lu52i.d sequences, or variations of them, are macro-op fused, then breaking away from the current pattern could instead harm performance. OTOH it could be a net benefit, but as @SixWeining pointed out, this case of accessing absolute addresses is probably too rare to justify special-casing.
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll:330
Then I'll withdraw the proposal, at least for now.
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll:330
I tested some very simple cases on a 3A5000LL. It seems you can move ori after lu32i.d or lu52i.d, or change it to addi.d or even xori, without any performance loss. My test cases may be too simplistic to reflect the performance of real applications, though.
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll:330
Note that I don't know whether there's macro-op fusion of this sort on 3A5000 in the first place; if this fusion actually doesn't happen on 3A5000 then there should be no performance loss whatsoever. We need input from the Loongson hardware team to be sure.