Page MenuHomePhabricator

[RISCV] implement li pseudo instruction
ClosedPublic

Authored by niosHD on Jan 11 2018, 6:52 AM.

Details

Summary

The implementation follows the MIPS backend and expands the
pseudo instruction directly during asm parsing. As the result, only
real MC instructions are emitted to the MCStreamer. Additionally,
PseudoLI instructions are emitted during codegen. The actual
expansion to real instructions is performed during MI to MC lowering
and is similar to the expansion performed by the GNU Assembler.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
asb added inline comments.Jan 17 2018, 5:55 AM
test/MC/RISCV/rvi-aliases-valid.s
36

Yes, I see your concern. My main problem is that when there was just CHECK-INST and CHECK-ALIAS it was fairly obvious what the different check lines meant. A comment in the file that explains the different check lines might make it easier on the reader. I suppose CHECK-EXPAND might be a little more descriptive, seeing as we're verifying that the pseudoinstruction expands to the expected multi-instruction sequence?

niosHD updated this revision to Diff 130438.EditedJan 18 2018, 10:09 AM
niosHD marked 8 inline comments as done.
niosHD edited the summary of this revision. (Show Details)

Addresses all of Alex's comments (thank you) and integrates PseudoLI emission into CodeGen.

More comments are welcome. Especially opinions about the correct location (and name) for the emitLoadImm method which is currently simply copied to both users.

Razer6 added a subscriber: Razer6.Jan 19 2018, 6:56 AM
asb added a comment.Jan 23 2018, 2:44 AM

Addresses all of Alex's comments (thank you) and integrates PseudoLI emission into CodeGen.

More comments are welcome. Especially opinions about the correct location (and name) for the emitLoadImm method which is currently simply copied to both users.

It's not immediately obvious exactly where it should go. It looks like Mips just ended up duplicating logic. Finding somewhere for it in RISCVDesc seems like it would be reasonable. Does anyone have else have any thoughts?

niosHD updated this revision to Diff 132139.Jan 31 2018, 4:39 AM
niosHD edited the summary of this revision. (Show Details)

Moved the shared emitLoadImm methods to a free function into the RISCVDesc library. Additionally, the Size of the pseudo instruction has been set to 8 (worst case for RV32) to ensure proper branch relaxation. Finally, the pattern for 32-bit immediate integers has been updated to generate PseudoLI and the tests have been updated accordingly.

I am going to add support for 64-bit constants next in order to get rid of the WIP tag. Nevertheless, comments are of course still welcome.

asb added a comment.Feb 1 2018, 5:29 AM

I'm liking the look of this, looking forward to giving a final review when you're happy to remove the WIP tag. Thanks!

apazos added inline comments.Feb 1 2018, 10:48 AM
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
602

You changed getSTI() -> STI, was it intentional?

lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp
35

extra {}

39

extra {}

lib/Target/RISCV/RISCVAsmPrinter.cpp
74

can't we return the new instruction from this function and reuse the EmitToStreamer call below. This way we reduce the places to insert compression calls, when instruction compression at MC level is enabled.

niosHD added a comment.Feb 2 2018, 7:33 AM

Thank you Ana for your comments!

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
602

No, good catch, although I am not sure if it is better to use getSTI compared to directly accessing STI. (Probably a matter of taste.)

On a closer look, actually, the whole MCSubtargetInfo operand of processInstruction seems to be redundant and can be removed given that we can access the STI within the method as well.

Still, is it preferred to access STI via getSTI in the AsmParser?

lib/Target/RISCV/RISCVAsmPrinter.cpp
74

Theoretically yes, but isn't compression done in the EmitInstruction of the streamer?

The code here is basically a custom MI to MC lowering. It uses the same EmitInstruction function which is also used by the generated emitPseudoExpansionLowering internally. Maybe I miss something but assuming that the MC compression works in conjunction with pseudo expansion I expect that it also works for the current code.

niosHD updated this revision to Diff 133023.Feb 6 2018, 9:06 AM
niosHD marked 2 inline comments as done.
niosHD retitled this revision from [RISCV] [WIP] implement li pseudo instruction to [RISCV] implement li pseudo instruction.
niosHD edited the summary of this revision. (Show Details)

Added support for handling 64-bit immediate values.

I think this is ready for review.

niosHD updated this revision to Diff 133026.Feb 6 2018, 9:15 AM

Fixed some typos in the comments.

I just stumbled across a difference between the binutils assembler and my current li implementation regarding accepted immediate values.

The following snippet shows the issue:

% cat li.S                                                                      
li t0, 0x80000000
li t1, -2147483648
li t2, 3147483648
li t3, -3147483648

% riscv32-unknown-elf-gcc -o li.o -c li.S && riscv32-unknown-elf-objdump -d li.o
[...]
00000000 <.text>:
   0:   800002b7                lui     t0,0x80000
   4:   80000337                lui     t1,0x80000
   8:   bb9ad3b7                lui     t2,0xbb9ad
   c:   a0038393                addi    t2,t2,-1536 # 0xbb9aca00
  10:   44653e37                lui     t3,0x44653
  14:   600e0e13                addi    t3,t3,1536 # 0x44653600

While it may be reasonable to accept the first three li instructions, accepting the fourth one definitely does not feel correct. It looks to me as if the immediate verification of the binutils assembler accepts everything that can theoretically be represented as 32-bit value, potentially even as purely negative number. My current implementation verifies that the immediate is a 32-bit signed integer and therefore only accepts the second li instruction in the above example. Should we also be more relaxed regarding immediate verification or should this be considered as binutils bug?

niosHD updated this revision to Diff 135655.Feb 23 2018, 9:31 AM

Addressed the discovered defect regarding the immediate of the li instruction. In RV32 mode we now accept either a signed or an unsigned 32-bit value. In RV64 mode we accept basically everything that fits into 64-bit.

Missing testcase for "li a0, foo".

lib/Target/RISCV/RISCVInstrInfo.td
401

I'm not sure it's a good idea to make code generation use this pseudo-instruction; you'll miss optimization opportunities, like MachineCSE of lui instructions.

test/MC/RISCV/rv64i-aliases-valid.s
94

This seems a little unfortunate... given you can load an arbitrary 32-bit immediate in two instructions, you should be able to load a 64-bit immediate in six instructions ("hi << 32 | lo"). But I guess that requires a second register?

niosHD updated this revision to Diff 135877.Feb 26 2018, 2:31 AM

Thank you for your comments Eli!

  • Added li t4, foo test and fixed error message for RV64.
lib/Target/RISCV/RISCVInstrInfo.td
401

Indeed, me neither. I also raised this concern in one of our weekly sync up calls and the consensus was to go with the Pseudo instruction for now. However, I am definitely not opposed to expand the respective immediate loads early into machine instructions.

test/MC/RISCV/rv64i-aliases-valid.s
94

Correct, with a second register, 6 instructions would be sufficient. Unfortunately, using a second register is, at least for the assembler, not an option. On the other hand, during codegen I think we should invest these two (virtual) registers. Additionally, in the long term, loading the constant from a constant pool should be evaluated given that it could be even more efficient. (assuming RV64I: 64-bit constant + 1 load + at most 2 instructions for the address calculation)

asb added a subscriber: kparzysz.Feb 27 2018, 11:12 AM

Hi Mario, sorry for the delay in more review comments. The vast majority of my comments are very minor nits - I think the main one is to take a closer look at the comments for emitRISCVLoadImm. Cleaning that up should make it easier to review that bit of logic. Thanks!

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
965

It would be nice to add a comment documenting the purpose of processInstruction

977

You might as well use SignExtend64 from MathExtras here.

lib/Target/RISCV/MCTargetDesc/CMakeLists.txt
8

Sort alphabetically

lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp
26

LLVM coding standards suggest just using static for single functions https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

30

Can we not avoid this and use findFirstSet from MathExtras? Unless I'm missing something, you could just mask out the first 12 bits at the call-site and so avoid the need for the 'StartOffset' parameter.

53

No need to mask the value passed to SignExtend64. Although it does no harm, I'd recommend changing to SignExtend64<12>(Value).

54

Just unsigned is more usual in the LLVM tree

63

LLVM is somewhat conservative when it comes to the use of auto. Given that there's not much saving in space, I'd be explicit and use unsigned here.

71

Should have something like && "Target must be 64-bit to support a >32-bit constant" or whatever phrasing you prefer

73–74

The comment describing how emitting 32-bit constants works was fantastic - it would be nice to expand this comment to a similar level of detail. The comment doesn't quite seem to match the behaviour either, as in the implementation emitRISCVLoadImm is called recursively before emitting any other instructions.

77

Do you rely on this being an arithmetic right shift? I'm not 100% sure if the C++ standard guarantees that.

lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.h
24

Write this as unsigned DestReg.

lib/Target/RISCV/RISCVInstrInfo.td
401

If I recall correctly, @kparzysz reported that based on his experience there was probably little to gain.

test/MC/RISCV/rv64i-aliases-valid.s
94

For what it's worth, the RV64I codegen patches (not yet merged) do just use two registers and six instructions - but this is done in a dumb way that fails to recogise cases where <6 instructions can be used. Fully agree that it will be worth looking at using the constant pool

niosHD added a comment.Mar 7 2018, 8:26 AM

Hi Alex, thank you very much for your comments!

I will address all of them in my next revision. Unfortunately, I am really busy at the moment and will not be able to join the sync up call tomorrow. However, I expect that I can provide the new revision at the beginning of next week.

Best,
Mario

lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp
30

I will have a look, with an additional if at the call site it should probably work. If I remember correctly, having this function is more or less an remainder of an older revision of the patch where Value was checked for 0 and -1.

73–74

I will try to expand/improve the comment to make it clearer. The basic idea was to convey that, at this point, it is already fixed that an ADDI is going to be emitted (hence scheduled). The actual emission, on the other hand, is performed after the recursive all returns.

However, I obviously failed at expressing this and will try again. ;)

77

Yes, I rely on that and it seems indeed not guaranteed by the standard. I can add an additional SignExtend64 call to make it clear. However, the implementation of SignExtend64 relies right shifts being arithmetic too.

niosHD updated this revision to Diff 138535.Mar 15 2018, 5:27 AM
niosHD marked 10 inline comments as done.

I rebased the patch and addressed all comments. Thank you again for the feedback.

asb accepted this revision.Mar 22 2018, 5:34 AM

Thanks Mario. I think this is looking good to land now.

Are you planning a follow-up patch that will show li in disassembly and for generated assembly in simple cases? (matching binutils more closely).

I haven't looked into it more closely, but I do note a minor codegen change for float-mem.ll which results in an extra instruction:

 ; Ensure that 1 is added to the high 20 bits if bit 11 of the low part is 1
 define float @flw_fsw_constant(float %a) nounwind {
 ; RV32IF-LABEL: flw_fsw_constant:
 ; RV32IF:       # %bb.0:
 ; RV32IF-NEXT:    fmv.w.x ft0, a0
 ; RV32IF-NEXT:    lui a0, 912092
-; RV32IF-NEXT:    flw ft1, -273(a0)
+; RV32IF-NEXT:    addi a0, a0, -273
+; RV32IF-NEXT:    flw ft1, 0(a0)
 ; RV32IF-NEXT:    fadd.s ft0, ft0, ft1
-; RV32IF-NEXT:    fsw ft0, -273(a0)
+; RV32IF-NEXT:    fsw ft0, 0(a0)
 ; RV32IF-NEXT:    fmv.x.w a0, ft0
 ; RV32IF-NEXT:    ret
   %1 = inttoptr i32 3735928559 to float*
   %2 = load volatile float, float* %1
   %3 = fadd float %a, %2
   store float %3, float* %1
   ret float %3
 }
This revision is now accepted and ready to land.Mar 22 2018, 5:34 AM
niosHD marked 5 inline comments as done.Mar 23 2018, 7:23 AM
In D41949#1045516, @asb wrote:

Thanks Mario. I think this is looking good to land now.

Perfect, thank you for the great feedback!

Are you planning a follow-up patch that will show li in disassembly and for generated assembly in simple cases? (matching binutils more closely).

Yes, I will look into it. Doing the same as binutils should be reasonable simple.

I haven't looked into it more closely, but I do note a minor codegen change for float-mem.ll which results in an extra instruction:

Good catch, I missed that codegen change. Seems like the ADDI was previously merged into the FLW. Given that we can solely use PseudoLI for constants we probably only miss a simplification pattern or a simple peephole optimisation. I can have a look but given that I currently do not use floating point instructions it may take some time.

asb added a comment.Mar 23 2018, 7:37 AM
In D41949#1045516, @asb wrote:

I haven't looked into it more closely, but I do note a minor codegen change for float-mem.ll which results in an extra instruction:

Good catch, I missed that codegen change. Seems like the ADDI was previously merged into the FLW. Given that we can solely use PseudoLI for constants we probably only miss a simplification pattern or a simple peephole optimisation. I can have a look but given that I currently do not use floating point instructions it may take some time.

I'm mainly surprised that we're seeing this codegen change for floating point but not integer loads/stores. I'll try to take a closer look at it before committing, but it's not something that should block this patch anyway.

zzheng added a subscriber: zzheng.Apr 12 2018, 3:54 PM
niosHD updated this revision to Diff 142385.Apr 13 2018, 6:03 AM

Rebased on master as Mandeep requested via email .

Currently there are two open "problems" with this patch:

  1. The doPeepholeLoadStoreADDI peephole optimisation can currently not deal with the PseudoLI instruction which results in the codegen regression that Alex already detected (see test/CodeGen/RISCV/mem.ll, test/CodeGen/RISCV/fload-mem.ll, test/CodeGen/RISCV/double-mem.ll). I am not sure yet if it is better to extend the current optimisation or to introduce a new one given that it requires to update the memory instruction as well as the PseudoLI instruction.
  2. The compression support, that was in the meantime landed, is not yet integrated into the RISCVMCPseudoExpansion. I did a quick experiment and it seems to be easy though. Should I add it to this patch or post a new one?

Best,
Mario

niosHD updated this revision to Diff 142423.Apr 13 2018, 9:14 AM

Extended peephole optimisation to fix introduced codegen regression.

asb added a comment.Apr 13 2018, 9:47 AM

I'd do the compressed changes in a different patch. Thanks for updating the peephole RISCVISelDAGToDAG, I'll review that bit ASAP and then commit. At a first look, it seems to handle this exactly as I would expect.

sabuasal added inline comments.Apr 13 2018, 6:52 PM
lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp
45

Hi Mario @niosHD ,

Thanks for the patch, this looks nice.

just a small note about your comment addressing compression in case you want to update it in the future like @asb suggested.

Since you are calling your function (emitRISCVLoadImm) from the InstPrinter (RISCVAsmPrinter::EmitInstruction) the standard way to Emit the Instructoin is by calling EmitToStreamer in your AsmPrinter.

In other back-ends this will call ( AsmPrinter::EmitToStreamer), In RISCV, we define our own EmitToStreamer all what you have to do to support compression is calling your AsmPrinter->EmitToStreamer().

sabuasal added inline comments.Apr 13 2018, 6:58 PM
lib/Target/RISCV/RISCVAsmPrinter.cpp
74

I believe I addressed this in my other comment but I actually just saw this comment you had!

The way "emitPseudoExpansionLowering" emits the instruction is "EmitToStreamer(OutStreamer, TmpInst);". This way it preserves any behavior in the XXXASMPrinter it is called from. You can check that in any inc file "XXXXGenMCPseudoLowering.inc"

In D41949#1067297, @asb wrote:

I'd do the compressed changes in a different patch. Thanks for updating the peephole RISCVISelDAGToDAG, I'll review that bit ASAP and then commit. At a first look, it seems to handle this exactly as I would expect.

Agreed! I will add compression in a different patch. Considering the inline discussion with Ana and Sameer, any opinion on what is the cleanest way to add compression?

lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp
45

Hi Sameer @sabuasal,

thank you for the hint but I do not think that calling AsmPrinter::EmitToStreamer is easily possible. emitRISCVLoadImm takes an MCStreamer as input because it is available in both, the RISCVAsmParser and the RISCVAsmPrinter, where it is called from.

My current compression prototype therefore simple adds the same compression code that has been added to the RISCVAsmParser and the RISCVAsmPrinter to emitRISCVLoadImm (via a static helper function in RISCVMCPseudoExpansion.cpp). However, I am not particularly fond of this duplication and am open for alternative ideas.

lib/Target/RISCV/RISCVAsmPrinter.cpp
74

(see above)

Returning the instructions, as Ana suggested in the first comment, would be an alternative to adding compression to the RISCVMCPseudoExpansion. However, I am still not sure if it is idiomatic for the llvm code base to return a list of instructions from such a function. Further opinions are welcome!

asb added a comment.Apr 17 2018, 6:42 AM

Thanks again Mario. I've reviewed the new RISCVISelDAGToDAG changes and just have a minor comment. This is also looking good when testing with the torture suite. I'll commit this as soon as you can confirm my minor query.

I'll think more about compression handling. If you already have something that works, it might be worth just posting that so we have something concrete to discuss.

lib/Target/RISCV/RISCVISelDAGToDAG.cpp
196 ↗(On Diff #142423)

Perhaps I'm missing something obvious, but shouldn't this be 'Hi52'?

niosHD updated this revision to Diff 142773.Apr 17 2018, 7:30 AM

Updated patch to fix variable names.

niosHD marked an inline comment as done.Apr 17 2018, 7:33 AM
In D41949#1069751, @asb wrote:

I'll think more about compression handling. If you already have something that works, it might be worth just posting that so we have something concrete to discuss.

Great, I will post it and then we can continue the discussion in the new review thread.

lib/Target/RISCV/RISCVISelDAGToDAG.cpp
196 ↗(On Diff #142423)

Indeed, good catch! This is a stupid copy and paste error which, to my embarrassment, originates from the new emitRISCVLoadImm function...

I'll fix this immediately and refresh the patch.

This revision was automatically updated to reflect the committed changes.

Hi,

This patch causes repeated LUI generation for the following test case :

void foo (int num, int* addr) {

addr[0] = num*4097;
addr[1] = num*4098;
addr[2] = num*4099;
addr[3] = num*4100;

}

Without patch:

1.	lui       a0, 1
2.	addi      a3, a0, 2
3.	mul       a3, a1, a3
4.	sw         a3, 4(a2)
5.	addi      a3, a0, 1
6.	mul       a3, a1, a3
7.	sw         a3, 0(a2)
8.	addi      a3, a0, 3
9.	mul       a3, a1, a3
10.	sw         a3, 8(a2)
11.	addi      a0, a0, 4
12.	mul       a0, a1, a0
13.	sw         a0, 12(a2)
14.	ret

with patch:

1.	lui        a0, 1
2.	addi       a0, a0, 2
3.	mul        a0, a1, a0
4.	sw a0, 4(a2)
5.	lui       a0, 1           repeated lui!
6.	addi       a0, a0, 1
7.	mul        a0, a1, a0
8.	sw a0, 0(a2)
9.	 lui       a0, 1   ---> repeated load
10.	addi       a0, a0, 3
11.	mul        a0, a1, a0
12.	sw a0, 8(a2)
13.	lui        a0, 1      ---> repeated load
14.	addi       a0, a0, 4
15.	mul        a0, a1, a0
16.	sw a0, 12(a2)
17.	ret

I think this is bacuse we are hiding the %hi part f the immediate from the Selection Dag so it doesn't optimize it away.

Sorry for the late reply. I thought you were going to hold off committing this till the compression issue is addressed.

Hi Sameer,

thank you for reporting the issue. Eli already predicted that we loose CSE for LUI due to the use of the pseudo instruction. I was therefore already kind of expecting a missed optimisation of this form. On the plus side, considering that we only emit 32-bit constants in the codegen path, I am pretty confident that the LUI duplication is as bad as it gets. Still, we definitely should fix this issue.

Unfortunately, I do not see an easy fix as long as we stick with emitting pseudo instructions during codegen. When I introduced this in January it was a pure win given that it improved code quality and de-duplicated code. However, this may has to be revisited now given that the backend has been improved considerably in the meantime. To be perfectly honest, I would probably take a step backward in this situation and remove the automatic emission of the PseudoLI instruction from the codegen path again. I still think it is in the long run desirable to share the calculation of the individual immediate values and shift constants (if needed) between the codegen path and the MC layer. However, the current approach does not seem to be the right one. What do you guys think?

asb added a comment.Apr 18 2018, 2:51 AM

I'm just working through this now. I'll play around with the options and update this thread within the next few hours.

asb added a comment.Apr 18 2018, 9:22 AM

Ok, I've had a good think about this issue. I was slightly over-eager in committing this last night. Something like PseudoLI seems necessary for the more complex materialisation logic required for 64-bit immediates in RV64, but we can do without for RV32. I've weighed up whether to revert and revise, or to make changes post commit

I suggest the following:

  • I'll update test/CodeGen/RISCV/imm.ll so it contains tests for imm32_hi20_only which are the primary benefit of this patch for codegen
  • I'll improve testing of the codegen->compression path so that we have tests that would pick up any change in the status of compression of code sequences for materialising constants
  • I'll add a test for common subexpression elimination of code sequences for materializing constants, similar to Sameer's example
  • I'll restore the previous imm32 pattern, add a new pattern for immediates with lo12 == 0 and remove the pattern using PseudoLI. This retains the codegen improvements and will fix the issues with compressed instruction emission.
  • I'll revert the RISCVISelDAGToDAG changes as that code path is now dead. I think it's still worth having PseudoLI available to the backend as it's already an improvement over the old movImm32 code.

I have patches written for all the above, and will get going with committing them.

Thanks Mario and Alex for the patch and addressing the code size concerns.
We see code size increase < 1%, but one particular test SPEC2006/bzip2 is 12%.

asb added a comment.Apr 18 2018, 1:59 PM

In the end I decided to temporarily revert this patch. I've committed patches that fill in the holes identified in testing, and added the straight-forward patch for imm32 with lo12=0.

  • Better tests for imm32_hi20_only (rL330274)
  • Improved testing of the codegen -> compression codepath (rL330288)
  • Test for constant subexpression elimination when materialising immediates (rL330291)
  • Add pattern for immediates with zero for the lower 12 bits (rL330293)

I think the best path forwards for this patch is to start by getting a version of it landed that only adds support in the MC layer for li. Would you be happy to make that change Mario? Apologies for the hassle.

As you say, it would be good to share logic between MC and codegen for materialising constants, especially the more complex logic for 64-bit constants. But let's get the MC bit landed and then we can revisit.

Thank you Mario for all of your work on this, and thanks Ana and Sameer fo the feedback.