The implementation follows the MIPS backend and expands the
pseudo instruction directly during asm parsing. As the result, only
real MC instructions are emitted to the MCStreamer. Additionally,
PseudoLI instructions are emitted during codegen. The actual
expansion to real instructions is performed during MI to MC lowering
and is similar to the expansion performed by the GNU Assembler.
Details
Diff Detail
Event Timeline
Hi Mario - as this is marked in WIP I've added a few initial comments rather than given a 100% complete review.
It's a shame we need to have RISCVInstrInfo::movImm32 as well as the expansion introduced here - but of course one produces MachineInstr and the other produces MCInstr.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1012–1014 | Locals are normally capitalised in the LLVM coding style. | |
lib/Target/RISCV/RISCVInstrFormats.td | ||
108–109 | Do you think having these properties inferred might be a big 'magic'? I'm not really decided one way or another myself, but it does seem a bit non-obvious. | |
test/MC/RISCV/rvi-aliases-valid.s | ||
36 | Having CHECK-INST and CHECK-ALIAS makes sense in this file. Adding in CHECK to the mix makes it a little confusing. Maybe the li cases that don't 'round-trip' belong in a separate test file? e.g. to match gnu as behaviour we'd expect li x3, 0x80 to be printed by objdump, but li x4, 0x800 would be expanded to lui+addiw and will never appear in objdump output. |
Hi Alex, thank you for your comments!
As mentioned two weeks ago, I also think that it would be nice if we can share the code that synthesizes immediates between the assembler and codegen. I plan to experiment with getting this working in the next week. The idea that I plan to investigate is to delay the generation of the immediates until the MC layer is reached. Basically, emitting PseudoLI machine instructions in RISCVInstrInfo::movImm32 and perform the actual expansion manually during MI to MC lowering.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1012–1014 | Right, I really need read the coding style again. Will be fixed in the next iteration. | |
lib/Target/RISCV/RISCVInstrFormats.td | ||
108–109 | Well, the motivation for the magic was simply to keep the patch minimal. I at first intended to introduce a new AsmPseudo class. However, I decided against it because deriving from Pseudo did not feel particularly clean given that a real AsmPseudo is typically not a CodeGenPseudo. With the mental model that a Pseudo is either a AsmPseudo or a CodeGenPseudo, inferring the type depending on opcodestr is not that bad. Anyway, I got the code wrong. The isAsmParserOnly assignment should be negated (i.e., !if(!eq(opcodestr,""), 0, 1)). Regarding alternatives, for me, the cleanest option would be to introduce a new PseudoBaseClass and derive a AsmPseudo and CodeGenPseudo from it. (Naming suggestions are welcome.) | |
test/MC/RISCV/rvi-aliases-valid.s | ||
36 | If you are not strongly against it I would prefer to keep all pseudo instruction test cases together in the respective ...aliases-valid and ...aliases-invalid files. We already have many files in the RISCV MC test directory and I am hesitant to add even more without real need. I expect that the remaining pseudo instructions most likely will not properly roundtrip either and certainly do not want to add new test files for every individual instruction. Also, as you noted, some li instructions will roundtrip depending on the specified immediate. Splitting the files based on this property would introduce even more test files given that RV32 and RV64 need separate tests too. |
Yes, I think using PseudoLI in codegen could make sense, and as you say this could allow the reuse of a common helper function.
lib/Target/RISCV/RISCVInstrFormats.td | ||
---|---|---|
108–109 | Could you just override isCodeGenOnly/isAsmParserOnly when necessary: let isAsmParserOnly = 1 in def FooInst : Pseudo<....> |
test/MC/RISCV/rvi-aliases-valid.s | ||
---|---|---|
36 | Yes, I see your concern. My main problem is that when there was just CHECK-INST and CHECK-ALIAS it was fairly obvious what the different check lines meant. A comment in the file that explains the different check lines might make it easier on the reader. I suppose CHECK-EXPAND might be a little more descriptive, seeing as we're verifying that the pseudoinstruction expands to the expected multi-instruction sequence? |
Addresses all of Alex's comments (thank you) and integrates PseudoLI emission into CodeGen.
More comments are welcome. Especially opinions about the correct location (and name) for the emitLoadImm method which is currently simply copied to both users.
It's not immediately obvious exactly where it should go. It looks like Mips just ended up duplicating logic. Finding somewhere for it in RISCVDesc seems like it would be reasonable. Does anyone have else have any thoughts?
Moved the shared emitLoadImm methods to a free function into the RISCVDesc library. Additionally, the Size of the pseudo instruction has been set to 8 (worst case for RV32) to ensure proper branch relaxation. Finally, the pattern for 32-bit immediate integers has been updated to generate PseudoLI and the tests have been updated accordingly.
I am going to add support for 64-bit constants next in order to get rid of the WIP tag. Nevertheless, comments are of course still welcome.
I'm liking the look of this, looking forward to giving a final review when you're happy to remove the WIP tag. Thanks!
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
623–625 | You changed getSTI() -> STI, was it intentional? | |
lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp | ||
35 | extra {} | |
39 | extra {} | |
lib/Target/RISCV/RISCVAsmPrinter.cpp | ||
85 | can't we return the new instruction from this function and reuse the EmitToStreamer call below. This way we reduce the places to insert compression calls, when instruction compression at MC level is enabled. |
Thank you Ana for your comments!
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
623–625 | No, good catch, although I am not sure if it is better to use getSTI compared to directly accessing STI. (Probably a matter of taste.) On a closer look, actually, the whole MCSubtargetInfo operand of processInstruction seems to be redundant and can be removed given that we can access the STI within the method as well. Still, is it preferred to access STI via getSTI in the AsmParser? | |
lib/Target/RISCV/RISCVAsmPrinter.cpp | ||
85 | Theoretically yes, but isn't compression done in the EmitInstruction of the streamer? The code here is basically a custom MI to MC lowering. It uses the same EmitInstruction function which is also used by the generated emitPseudoExpansionLowering internally. Maybe I miss something but assuming that the MC compression works in conjunction with pseudo expansion I expect that it also works for the current code. |
Added support for handling 64-bit immediate values.
I think this is ready for review.
I just stumbled across a difference between the binutils assembler and my current li implementation regarding accepted immediate values.
The following snippet shows the issue:
% cat li.S li t0, 0x80000000 li t1, -2147483648 li t2, 3147483648 li t3, -3147483648 % riscv32-unknown-elf-gcc -o li.o -c li.S && riscv32-unknown-elf-objdump -d li.o [...] 00000000 <.text>: 0: 800002b7 lui t0,0x80000 4: 80000337 lui t1,0x80000 8: bb9ad3b7 lui t2,0xbb9ad c: a0038393 addi t2,t2,-1536 # 0xbb9aca00 10: 44653e37 lui t3,0x44653 14: 600e0e13 addi t3,t3,1536 # 0x44653600
While it may be reasonable to accept the first three li instructions, accepting the fourth one definitely does not feel correct. It looks to me as if the immediate verification of the binutils assembler accepts everything that can theoretically be represented as 32-bit value, potentially even as purely negative number. My current implementation verifies that the immediate is a 32-bit signed integer and therefore only accepts the second li instruction in the above example. Should we also be more relaxed regarding immediate verification or should this be considered as binutils bug?
Addressed the discovered defect regarding the immediate of the li instruction. In RV32 mode we now accept either a signed or an unsigned 32-bit value. In RV64 mode we accept basically everything that fits into 64-bit.
Missing testcase for "li a0, foo".
lib/Target/RISCV/RISCVInstrInfo.td | ||
---|---|---|
433 | I'm not sure it's a good idea to make code generation use this pseudo-instruction; you'll miss optimization opportunities, like MachineCSE of lui instructions. | |
test/MC/RISCV/rv64i-aliases-valid.s | ||
94 | This seems a little unfortunate... given you can load an arbitrary 32-bit immediate in two instructions, you should be able to load a 64-bit immediate in six instructions ("hi << 32 | lo"). But I guess that requires a second register? |
Thank you for your comments Eli!
- Added li t4, foo test and fixed error message for RV64.
lib/Target/RISCV/RISCVInstrInfo.td | ||
---|---|---|
433 | Indeed, me neither. I also raised this concern in one of our weekly sync up calls and the consensus was to go with the Pseudo instruction for now. However, I am definitely not opposed to expand the respective immediate loads early into machine instructions. | |
test/MC/RISCV/rv64i-aliases-valid.s | ||
94 | Correct, with a second register, 6 instructions would be sufficient. Unfortunately, using a second register is, at least for the assembler, not an option. On the other hand, during codegen I think we should invest these two (virtual) registers. Additionally, in the long term, loading the constant from a constant pool should be evaluated given that it could be even more efficient. (assuming RV64I: 64-bit constant + 1 load + at most 2 instructions for the address calculation) |
Hi Mario, sorry for the delay in more review comments. The vast majority of my comments are very minor nits - I think the main one is to take a closer look at the comments for emitRISCVLoadImm. Cleaning that up should make it easier to review that bit of logic. Thanks!
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
994 | It would be nice to add a comment documenting the purpose of processInstruction | |
1006 | You might as well use SignExtend64 from MathExtras here. | |
lib/Target/RISCV/MCTargetDesc/CMakeLists.txt | ||
9 | Sort alphabetically | |
lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp | ||
27 | LLVM coding standards suggest just using static for single functions https://llvm.org/docs/CodingStandards.html#anonymous-namespaces | |
31 | Can we not avoid this and use findFirstSet from MathExtras? Unless I'm missing something, you could just mask out the first 12 bits at the call-site and so avoid the need for the 'StartOffset' parameter. | |
54 | No need to mask the value passed to SignExtend64. Although it does no harm, I'd recommend changing to SignExtend64<12>(Value). | |
55 | Just unsigned is more usual in the LLVM tree | |
64 | LLVM is somewhat conservative when it comes to the use of auto. Given that there's not much saving in space, I'd be explicit and use unsigned here. | |
72 | Should have something like && "Target must be 64-bit to support a >32-bit constant" or whatever phrasing you prefer | |
74–75 | The comment describing how emitting 32-bit constants works was fantastic - it would be nice to expand this comment to a similar level of detail. The comment doesn't quite seem to match the behaviour either, as in the implementation emitRISCVLoadImm is called recursively before emitting any other instructions. | |
78 | Do you rely on this being an arithmetic right shift? I'm not 100% sure if the C++ standard guarantees that. | |
lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.h | ||
25 | Write this as unsigned DestReg. | |
lib/Target/RISCV/RISCVInstrInfo.td | ||
433 | If I recall correctly, @kparzysz reported that based on his experience there was probably little to gain. | |
test/MC/RISCV/rv64i-aliases-valid.s | ||
94 | For what it's worth, the RV64I codegen patches (not yet merged) do just use two registers and six instructions - but this is done in a dumb way that fails to recogise cases where <6 instructions can be used. Fully agree that it will be worth looking at using the constant pool |
Hi Alex, thank you very much for your comments!
I will address all of them in my next revision. Unfortunately, I am really busy at the moment and will not be able to join the sync up call tomorrow. However, I expect that I can provide the new revision at the beginning of next week.
Best,
Mario
lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp | ||
---|---|---|
31 | I will have a look, with an additional if at the call site it should probably work. If I remember correctly, having this function is more or less an remainder of an older revision of the patch where Value was checked for 0 and -1. | |
74–75 | I will try to expand/improve the comment to make it clearer. The basic idea was to convey that, at this point, it is already fixed that an ADDI is going to be emitted (hence scheduled). The actual emission, on the other hand, is performed after the recursive all returns. However, I obviously failed at expressing this and will try again. ;) | |
78 | Yes, I rely on that and it seems indeed not guaranteed by the standard. I can add an additional SignExtend64 call to make it clear. However, the implementation of SignExtend64 relies right shifts being arithmetic too. |
Thanks Mario. I think this is looking good to land now.
Are you planning a follow-up patch that will show li in disassembly and for generated assembly in simple cases? (matching binutils more closely).
I haven't looked into it more closely, but I do note a minor codegen change for float-mem.ll which results in an extra instruction:
; Ensure that 1 is added to the high 20 bits if bit 11 of the low part is 1 define float @flw_fsw_constant(float %a) nounwind { ; RV32IF-LABEL: flw_fsw_constant: ; RV32IF: # %bb.0: ; RV32IF-NEXT: fmv.w.x ft0, a0 ; RV32IF-NEXT: lui a0, 912092 -; RV32IF-NEXT: flw ft1, -273(a0) +; RV32IF-NEXT: addi a0, a0, -273 +; RV32IF-NEXT: flw ft1, 0(a0) ; RV32IF-NEXT: fadd.s ft0, ft0, ft1 -; RV32IF-NEXT: fsw ft0, -273(a0) +; RV32IF-NEXT: fsw ft0, 0(a0) ; RV32IF-NEXT: fmv.x.w a0, ft0 ; RV32IF-NEXT: ret %1 = inttoptr i32 3735928559 to float* %2 = load volatile float, float* %1 %3 = fadd float %a, %2 store float %3, float* %1 ret float %3 }
Perfect, thank you for the great feedback!
Are you planning a follow-up patch that will show li in disassembly and for generated assembly in simple cases? (matching binutils more closely).
Yes, I will look into it. Doing the same as binutils should be reasonable simple.
I haven't looked into it more closely, but I do note a minor codegen change for float-mem.ll which results in an extra instruction:
Good catch, I missed that codegen change. Seems like the ADDI was previously merged into the FLW. Given that we can solely use PseudoLI for constants we probably only miss a simplification pattern or a simple peephole optimisation. I can have a look but given that I currently do not use floating point instructions it may take some time.
I'm mainly surprised that we're seeing this codegen change for floating point but not integer loads/stores. I'll try to take a closer look at it before committing, but it's not something that should block this patch anyway.
Rebased on master as Mandeep requested via email .
Currently there are two open "problems" with this patch:
- The doPeepholeLoadStoreADDI peephole optimisation can currently not deal with the PseudoLI instruction which results in the codegen regression that Alex already detected (see test/CodeGen/RISCV/mem.ll, test/CodeGen/RISCV/fload-mem.ll, test/CodeGen/RISCV/double-mem.ll). I am not sure yet if it is better to extend the current optimisation or to introduce a new one given that it requires to update the memory instruction as well as the PseudoLI instruction.
- The compression support, that was in the meantime landed, is not yet integrated into the RISCVMCPseudoExpansion. I did a quick experiment and it seems to be easy though. Should I add it to this patch or post a new one?
Best,
Mario
I'd do the compressed changes in a different patch. Thanks for updating the peephole RISCVISelDAGToDAG, I'll review that bit ASAP and then commit. At a first look, it seems to handle this exactly as I would expect.
lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp | ||
---|---|---|
45 | Hi Mario @niosHD , Thanks for the patch, this looks nice. just a small note about your comment addressing compression in case you want to update it in the future like @asb suggested. Since you are calling your function (emitRISCVLoadImm) from the InstPrinter (RISCVAsmPrinter::EmitInstruction) the standard way to Emit the Instructoin is by calling EmitToStreamer in your AsmPrinter. In other back-ends this will call ( AsmPrinter::EmitToStreamer), In RISCV, we define our own EmitToStreamer all what you have to do to support compression is calling your AsmPrinter->EmitToStreamer(). |
lib/Target/RISCV/RISCVAsmPrinter.cpp | ||
---|---|---|
85 | I believe I addressed this in my other comment but I actually just saw this comment you had! The way "emitPseudoExpansionLowering" emits the instruction is "EmitToStreamer(OutStreamer, TmpInst);". This way it preserves any behavior in the XXXASMPrinter it is called from. You can check that in any inc file "XXXXGenMCPseudoLowering.inc" |
Agreed! I will add compression in a different patch. Considering the inline discussion with Ana and Sameer, any opinion on what is the cleanest way to add compression?
lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp | ||
---|---|---|
45 | Hi Sameer @sabuasal, thank you for the hint but I do not think that calling AsmPrinter::EmitToStreamer is easily possible. emitRISCVLoadImm takes an MCStreamer as input because it is available in both, the RISCVAsmParser and the RISCVAsmPrinter, where it is called from. My current compression prototype therefore simple adds the same compression code that has been added to the RISCVAsmParser and the RISCVAsmPrinter to emitRISCVLoadImm (via a static helper function in RISCVMCPseudoExpansion.cpp). However, I am not particularly fond of this duplication and am open for alternative ideas. | |
lib/Target/RISCV/RISCVAsmPrinter.cpp | ||
85 | (see above) Returning the instructions, as Ana suggested in the first comment, would be an alternative to adding compression to the RISCVMCPseudoExpansion. However, I am still not sure if it is idiomatic for the llvm code base to return a list of instructions from such a function. Further opinions are welcome! |
Thanks again Mario. I've reviewed the new RISCVISelDAGToDAG changes and just have a minor comment. This is also looking good when testing with the torture suite. I'll commit this as soon as you can confirm my minor query.
I'll think more about compression handling. If you already have something that works, it might be worth just posting that so we have something concrete to discuss.
lib/Target/RISCV/RISCVISelDAGToDAG.cpp | ||
---|---|---|
196 | Perhaps I'm missing something obvious, but shouldn't this be 'Hi52'? |
Great, I will post it and then we can continue the discussion in the new review thread.
lib/Target/RISCV/RISCVISelDAGToDAG.cpp | ||
---|---|---|
196 | Indeed, good catch! This is a stupid copy and paste error which, to my embarrassment, originates from the new emitRISCVLoadImm function... I'll fix this immediately and refresh the patch. |
Hi,
This patch causes repeated LUI generation for the following test case :
void foo (int num, int* addr) {
addr[0] = num*4097; addr[1] = num*4098; addr[2] = num*4099; addr[3] = num*4100;
}
Without patch:
1. lui a0, 1 2. addi a3, a0, 2 3. mul a3, a1, a3 4. sw a3, 4(a2) 5. addi a3, a0, 1 6. mul a3, a1, a3 7. sw a3, 0(a2) 8. addi a3, a0, 3 9. mul a3, a1, a3 10. sw a3, 8(a2) 11. addi a0, a0, 4 12. mul a0, a1, a0 13. sw a0, 12(a2) 14. ret
with patch:
1. lui a0, 1 2. addi a0, a0, 2 3. mul a0, a1, a0 4. sw a0, 4(a2) 5. lui a0, 1 repeated lui! 6. addi a0, a0, 1 7. mul a0, a1, a0 8. sw a0, 0(a2) 9. lui a0, 1 ---> repeated load 10. addi a0, a0, 3 11. mul a0, a1, a0 12. sw a0, 8(a2) 13. lui a0, 1 ---> repeated load 14. addi a0, a0, 4 15. mul a0, a1, a0 16. sw a0, 12(a2) 17. ret
I think this is bacuse we are hiding the %hi part f the immediate from the Selection Dag so it doesn't optimize it away.
Sorry for the late reply. I thought you were going to hold off committing this till the compression issue is addressed.
Hi Sameer,
thank you for reporting the issue. Eli already predicted that we loose CSE for LUI due to the use of the pseudo instruction. I was therefore already kind of expecting a missed optimisation of this form. On the plus side, considering that we only emit 32-bit constants in the codegen path, I am pretty confident that the LUI duplication is as bad as it gets. Still, we definitely should fix this issue.
Unfortunately, I do not see an easy fix as long as we stick with emitting pseudo instructions during codegen. When I introduced this in January it was a pure win given that it improved code quality and de-duplicated code. However, this may has to be revisited now given that the backend has been improved considerably in the meantime. To be perfectly honest, I would probably take a step backward in this situation and remove the automatic emission of the PseudoLI instruction from the codegen path again. I still think it is in the long run desirable to share the calculation of the individual immediate values and shift constants (if needed) between the codegen path and the MC layer. However, the current approach does not seem to be the right one. What do you guys think?
I'm just working through this now. I'll play around with the options and update this thread within the next few hours.
Ok, I've had a good think about this issue. I was slightly over-eager in committing this last night. Something like PseudoLI seems necessary for the more complex materialisation logic required for 64-bit immediates in RV64, but we can do without for RV32. I've weighed up whether to revert and revise, or to make changes post commit
I suggest the following:
- I'll update test/CodeGen/RISCV/imm.ll so it contains tests for imm32_hi20_only which are the primary benefit of this patch for codegen
- I'll improve testing of the codegen->compression path so that we have tests that would pick up any change in the status of compression of code sequences for materialising constants
- I'll add a test for common subexpression elimination of code sequences for materializing constants, similar to Sameer's example
- I'll restore the previous imm32 pattern, add a new pattern for immediates with lo12 == 0 and remove the pattern using PseudoLI. This retains the codegen improvements and will fix the issues with compressed instruction emission.
- I'll revert the RISCVISelDAGToDAG changes as that code path is now dead. I think it's still worth having PseudoLI available to the backend as it's already an improvement over the old movImm32 code.
I have patches written for all the above, and will get going with committing them.
Thanks Mario and Alex for the patch and addressing the code size concerns.
We see code size increase < 1%, but one particular test SPEC2006/bzip2 is 12%.
In the end I decided to temporarily revert this patch. I've committed patches that fill in the holes identified in testing, and added the straight-forward patch for imm32 with lo12=0.
- Better tests for imm32_hi20_only (rL330274)
- Improved testing of the codegen -> compression codepath (rL330288)
- Test for constant subexpression elimination when materialising immediates (rL330291)
- Add pattern for immediates with zero for the lower 12 bits (rL330293)
I think the best path forwards for this patch is to start by getting a version of it landed that only adds support in the MC layer for li. Would you be happy to make that change Mario? Apologies for the hassle.
As you say, it would be good to share logic between MC and codegen for materialising constants, especially the more complex logic for 64-bit constants. But let's get the MC bit landed and then we can revisit.
Thank you Mario for all of your work on this, and thanks Ana and Sameer fo the feedback.
You changed getSTI() -> STI, was it intentional?