Page MenuHomePhabricator

[RISCV] Constant materialisation for RV64I
ClosedPublic

Authored by asb on Oct 6 2018, 3:29 AM.

Details

Summary

This commit introduces support for materialising 64-bit constants for RV64I,
making use of the RISCVMatInt::generateInstSeq helper in order to share logic
for immediate materialisation with the MC layer (where it's used for the li
pseudoinstruction).

test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit
constant tests. It would be preferable if anyext constant returns were sign
rather than zero extended (see PR39092). This patch simply adds an explicit
signext to the returns in imm.ll.

Further optimisations for constant materialisation are possible, most notably
for mask-like values which can be generated my loading -1 and shifting right.
A future patch will standardise on the C++ codepath for immediate selection on
RV32 as well as RV64, and then add further such optimisations to
RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for
codegen and li expansion.

Diff Detail

Repository
rL LLVM

Event Timeline

asb created this revision.Oct 6 2018, 3:29 AM
psnobl added a subscriber: psnobl.Oct 9 2018, 12:58 AM
psnobl added a comment.Oct 9 2018, 1:18 AM

Wouldn't it better to put more complicated constants (requiring 4 and more instructions to materialize) in constant pool and load them from there?

Wouldn't it better to put more complicated constants (requiring 4 and more instructions to materialize) in constant pool and load them from there?

gcc does that, at some point. It depends. If you can guarantee a cache hit for the load then it can be better. If you can't then you could be looking at a big slowdown. Typical processors go to a lot of trouble to prefetch the instruction stream. Eight instructions and 20 to 24 bytes to load an eight byte literal isn't too horrible. The load from a pool approach is going to generally take two 4 byte instructions (auipc/ld or lui/ld) plus the 8 byte literal itself, for a total of 16 bytes. Plus, it might cost anything from half a dozen clock cycles to several hundred clock cycles to get the cache line. Well, let's say on average it's going to be in L2 cache. It's not a clear win. And the worst case is awful. All to save 4 to 8 bytes of code and maybe four clock cycles in the best case.

Oh, and by the way, if the user really wants to use a load, they can easily force that by writing a global const (or single element initialized array to be really sure).

psnobl added a comment.Oct 9 2018, 2:38 AM

Well you can save more in the total size if the constant is needed in more than one place in the application (and linker can merge the CP entries) but I see your point about possible cache miss. It's not a clear win, as you say.

asb updated this revision to Diff 169200.Oct 11 2018, 6:24 AM

Updated patch to add a TODO to indicate that it may sometimes be preferable to load from the constant pool. As Bruce points out, this isn't always a clear win.

If people are happy with this approach, my intent with this patchset is to land something that generates "reasonable" code and leave tuning such as constpool vs materialisation for future patches.

sabuasal edited reviewers, added: sabuasal; removed: sameer.abuasal.Oct 22 2018, 4:43 PM
asb added a comment.Nov 15 2018, 2:21 AM

Confirming this patch still applies cleanly against current LLVM HEAD. Ping?

sabuasal accepted this revision.Nov 15 2018, 2:48 PM

LGTM!

This revision is now accepted and ready to land.Nov 15 2018, 2:48 PM
This revision was automatically updated to reflect the committed changes.