This is an archive of the discontinued LLVM Phabricator instance.

[LoongArch] Add codegen support for load/store operations
ClosedPublic

Authored by wangleiat on Jun 23 2022, 5:04 AM.

Details

Summary

This patch also support lowering global addresses.

Diff Detail

Event Timeline

wangleiat created this revision.Jun 23 2022, 5:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 5:04 AM
wangleiat requested review of this revision.Jun 23 2022, 5:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 5:04 AM
xen0n added inline comments.Jun 23 2022, 8:58 AM
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
123

Because pcalau12i always produces results with the lower 12 bits clean, isn't ori more suitable for concatenating the lower bits?

wangleiat added inline comments.Jun 23 2022, 8:28 PM
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
123

The linker will corrects the pcalau12i instruction based on whether bit 12 is 1.
The advantage of this is that there is a chance to combine the addi.{d,w} instruction with the load/store instruction.

example:

pcalau12i $a1, %pc_hi20(G)
addi.d $a2, $a1, %pc_lo12(G)
ld.w $a1, $a2, 0
=>
pcalau128 $a1, %pc_hi20(G)
ld.w $a2, $a1, %pc_lo12(G)
MaskRay added inline comments.Jun 26 2022, 4:52 PM
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
29

Switch to opaque pointers.

MaskRay added inline comments.Jun 26 2022, 7:12 PM
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
122

SDValue AddrHi(DAG.getMachineNode(LoongArch::PCALAU12I, DL, Ty, GA), 0);

wangleiat updated this revision to Diff 440102.Jun 26 2022, 8:42 PM

Update test to use opaque pointers

wangleiat added inline comments.Jun 26 2022, 8:50 PM
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
122

thanks!

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
29

thanks!

PING
Do you have any comments on this series of patches?

xen0n accepted this revision.Jul 1 2022, 6:37 AM

While I'm not very sure about the new pcalau12i + addi symbol address materialization, they look good regardless, so are the other test cases.

Do you think a test exercising very large offset for getelementptr is worthwhile? LGTM otherwise.

This revision is now accepted and ready to land.Jul 1 2022, 6:37 AM

While I'm not very sure about the new pcalau12i + addi symbol address materialization, they look good regardless, so are the other test cases.

Do you think a test exercising very large offset for getelementptr is worthwhile? LGTM otherwise.

Thanks!
The large offset situation can be covered in previous patches, offsets will be converted to immediate loads.

xry111 added inline comments.Jul 1 2022, 7:48 PM
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330

Based on the previous discussion, should we move "ori" to the last instruction in the long immediate load sequence and change it to "addi.d" if possible, so a peephole optimization would be able to combine "addi.d" and "ld" into one instruction?

SixWeining added inline comments.Jul 1 2022, 8:51 PM
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330

In terms of principle this can be implemented. But seems that this is only suitable for very few scenarios (maybe only when accessing a constant memory address). I'm not very sure.

xen0n added a comment.Jul 2 2022, 12:54 AM

While I'm not very sure about the new pcalau12i + addi symbol address materialization, they look good regardless, so are the other test cases.

Do you think a test exercising very large offset for getelementptr is worthwhile? LGTM otherwise.

Thanks!
The large offset situation can be covered in previous patches, offsets will be converted to immediate loads.

Okay. No need to additionally cover it here then.

xen0n added inline comments.Jul 2 2022, 12:58 AM
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330

I think this depends on the micro-architecture. If lu12i.w; ori; lu32i.d; lu52i.d sequences or variations of it are macro-op-fusioned, then breaking away from the current pattern could instead harm performance. OTOH it could be a net benefit but as @SixWeining pointed out this case of accessing absolute addresses is probably too rare to justify a special-case.

xry111 added inline comments.Jul 2 2022, 4:01 AM
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330

Then I'll withdraw the proposal, at least for now.

xry111 added inline comments.Jul 2 2022, 6:42 PM
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330

I tested some very simple cases on a 3A5000LL. It seems you can move ori after lu32i.d or lu52i.d, or change it to addi.d or even xori w/o performance loss. Maybe my case is too noob and can't reflect the performance of real apps though.

xen0n added inline comments.Jul 2 2022, 8:08 PM
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330

Note that I don't know whether there's macro-op fusion of this sort on 3A5000 in the first place; if this fusion actually doesn't happen on 3A5000 then there should be no performance loss whatsoever. We need input from the Loongson hardware team to be sure.

This revision was landed with ongoing or failed builds.Jul 4 2022, 8:59 PM
This revision was automatically updated to reflect the committed changes.