This patch enables the machine outliner for RISC-V and adds the necessary logic for checking whether sequences can be safely outlined, and describing how they should be outlined. Outlined functions are called using the register t0 (x5) as the return address register, which must be available for an occurrence of a sequence to be safely outlined.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
492 | AArch64 has more checks, some of which seem like could be relevant for us. Checking for section markings, link once, etc. | |
502 | If we are only going to support one possible register for now, shouldn't it be the one least likely to already be in use? Wouldn't that be t6 (x31)? | |
517 | This might be pessimistic when the instructions are compressible. Suddenly the outlined amount vs the overhead math might not quite be realistic. I think it's worth devising a test to try to explore how much of a problem that is, and if we can compensate for it by, say, tweaking the overhead numbers, or determining if the instruction is likely to be compressible, or being conservative regarding the instruction size when targeting RVC. |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
517 | @efriedma I was assuming it wouldn't be possible to make getInstSizeInBytes fully account for compression due to relaxations and/or compression being done by the linker, but if that is possible (or at least a good enough approximation) then that sounds like a good solution. |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
502 | Good point about RV32E. I guess you can either always use t2 or check the target and use t6 when available, falling back to t2 otherwise. If that's not trivial it might be worth checking how hard it would be to dynamically choose the register, like AArch64 does IIRC. |
llvm/test/CodeGen/RISCV/machineoutliner.ll | ||
---|---|---|
1 ↗ | (On Diff #215075) | I recommend writing a .mir test for this instead of a .ll test. (e.g, use -stop-before=machine-outliner -simplify-mir) That would make the test resilient against other code generation changes, and make it easier to test the instruction sequences you want to test. |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
517 | The linker only changes the size of calls and global address computation, currently. I mean, I guess it's theoretically possible for it to compress other branches, but that isn't implemented, as far as I know. So it should be possible to accurately compute the size of almost all compressible instructions. LLVM currently doesn't do any RISCV compression computation before the assembler runs, but it was only implemented that way to simplify the implementation; it could be done earlier, if it would be useful. |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
492 | I've added the basic checks here, I will look into adding tests for this. | |
llvm/test/CodeGen/RISCV/machineoutliner.ll | ||
1 ↗ | (On Diff #215075) | I haven't used the tool in this way before, I've been trying to replicate the command used in the AArch64 run line, but I'm still getting 'error: expected top-level entity' when running the check. |
llvm/test/CodeGen/RISCV/machineoutliner.ll | ||
---|---|---|
1 ↗ | (On Diff #215075) | "expected top-level entity" comes from the IR parser, so your testcase probably isn't getting treated as MIR at all. MIR testcases have to end with a ".mir" extension (or you can use "-x mir"). |
Updated the test to an MIR-based test, and added checks for the cases where outlining cannot be done.
llvm/test/CodeGen/RISCV/machineoutliner.mir | ||
---|---|---|
2–3 | You probably want -verify-machineinstrs on each of these. | |
8–12 | I don't think you actually need the body of the IR functions here. You can make each function void, and just return: e.g. define i32 @outline_0(i32 %a, i32 %b) #0 { ret void } If it weren't for the section checks, you wouldn't need the IR at all. It only needs to be there when you look back at the IR in some way in the pass. | |
60 | I'm pretty sure you can remove this. | |
62 | This isn't checked/used by the outliner, so you can remove it from the IR. | |
77 | If you do remove the bulk of the IR code, you'll need to rename the BB labels to bb.0. | |
82 | It would be good to check the contents of the outlined functions as well. E.g. OUTLINED_FUNCTION_0 should contain the ORI/ADDI/AND sequence. (It's possible that utils/update_mir_test_checks.py can generate all of the checks you want here, but I'm not sure if it can handle functions being created.) |
I've been testing this on the Embench set of benchmarks (https://github.com/embench/embench-iot), and it appears that some programs cause execution failures with this enabled, which is a little concerning... I'm going to look into this and then also update the tests.
Don't allow return statements in outlined sequences until tail calls to outlined functions are properly implemented. Improve logic for checking if X5 is available for use at the insertion point of an outlined sequence, and update the test to be pure MIR.
llvm/test/CodeGen/RISCV/machineoutliner.mir | ||
---|---|---|
2–3 | I think there is some issue with the generation of MIR, at least for RISC-V... If I do IR -> MIR -> Assembly with -verify-machineinstrs, I get the error: 'Function has NoVRegs property but there are VReg operands'. I don't get it for IR -> Assembly with -verify-machineinstrs, and I'm not at all sure what to do about it. | |
82 |
I'm not sure I can either! Inserting check lines at the end of the file causes parsing errors. |
llvm/test/CodeGen/RISCV/machineoutliner.mir | ||
---|---|---|
2–3 | The verifier should tell you which pass is causing this to happen, and in which function. What is probably happening is that it's because you're using -enable-machine-outliner. This is probably running passes you don't expect. Since this output was generated by -stop-before=machine-outliner, these passes are probably doing weird things. What you probably want is one of these:
I think that using -run-pass is usually better for these kinds of tests. The reason being that it localizes the test to a single pass, rather than several. The outliner is an extremely late pass though, so using -start-before is probably fine. If you use -run-pass, you'll have to rewrite your CHECK lines as MIR though. |
Refactor the MIR test case again:
- use -run-pass=machine-outliner to only test this pass (required removing the no-machineoutliner checks, they are redundant anyway)
- add riscv64 checks for completeness?
- add -verify-machineinstrs
Two small nits, other than that it's looking pretty good.
I presume you got to the bottom of the exception handling issues you saw, and they are now solved.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
522 | It would be useful to include the ABI name of x5 in this comment too. | |
524 | Can you rename this to CannotInsertCall, I kept missing the t when reviewing, which inverts its meaning. |
We discussed this at the RISC-V meeting on 19 Sept 2019. @apazos says she is willing to test this on SPEC.
I have run a couple of standard workloads like SPEC.
There is no correctness issue when enabling the MO feature (except for spec2000/twolf which fails with/without MO).
The MO code size gains without compression are up to 7%. With compression enabled, most of the code size gain is gone, and I see more code size increase.
It is possible it has to do with the SequenceSize we are estimating.
The reason we enable compression late is to have all paths - coming from codegen (LLVM IR), parsing, assembling .s or inline asm - go through the same mechanism for compression.
This is similar/compatible with GCC behavior, which relies on the external assembler to implement compression.
We can better estimate SequenceSize by checking if an instruction is compressable. We can modify the tablegen backend for compression emitter to generate a function that does the isCompressable check, but instead of using MCInst and MCOperands for the checks, we need to use MachineInstr and MachineOperand types. I will try this solution. Another alternative is to compress LLVM IR code before running the machine outliner.
Hi Ana, thank you for this analysis, interestingly I still found significant benefits with compressed instructions enabled. It's worth noting that I was using Embench, which targets the embedded use case.
I'm not sure I quite follow what you mean when you say 'Another alternative is to compress LLVM IR code before running the machine outliner', But did you make any progress with the 'isCompressible' method?
Hi Lewis,
Here is the change: https://reviews.llvm.org/D68290
With this change I was able to remove the code size degradation I had observed.
Please try it out.
Lewis, this patch LGTM. You can go ahead and merge it.
To improve code size, I have updated the additional patch in https://reviews.llvm.org/D68290. No change is required in machine outliner code.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
502 | ra and t0 (x1 and x5) have special functionality in implementations with a return-address stack; jr t6 will be treated as a general indirect branch, not a return, and is much more likely to mispredict. So this should probably prefer t0 whenever possible. | |
549 | Tangentially related, but in many of the cases where outlining makes sense, it would also make sense to generate 4-byte jal t0, label instructions; lld and ld.bfd would need to be taught to generate thunks for out of range jal (the Go linker already supports them). | |
553 | Note that jr <reg> is a compressible instruction regardless of the register used. |
I'm seeing some excess failures from this. However, the failures I've looked into so far seem to be timeouts from compilation being too slow. I'll hold off on committing until I've triaged all of them.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
553 | Thanks, I'll try and incorporate this too. |
I need to investigate some failures with this patch whereby sections of code which reference different temporary symbols still get outlined, with only one reference being used in the outlined function.
IE the following sequences should definitely not be outlined:
lui a0, %hi(.LCPI1_0) addi a0, a0, %lo(.LCPI1_0) ... ... lui a0, %hi(.LCPI2_0) addi a0, a0, %lo(.LCPI2_0) ... ... lui a0, %hi(.LCPI3_0) addi a0, a0, %lo(.LCPI3_0) ...
Rebased. Added special case for size of jr t0 with compression enabled. Fixed failure in GCC testsuite.
The failure was caused by machine operands referencing constant pool indexes being outlined, even though these indexes are materialized as local labels. This was fixed by simply checking MO.isCPI().
Lewis, I tested the latest version of this patch (with and without https://reviews.llvm.org/D68290), I don't see any issue. LGTM.
I'm doing a final rebase and check on this before merging. I have a failure in the GCC testsuite which I'm triaging.
So the failure in the GCC testsuite occurs due to the following test:
/* PR rtl-optimization/48141 */ /* { dg-do compile } */ /* { dg-options "-O" } */ #define A i = 0; #define B A A A A A A A A A A #define C B B B B B B B B B B #define D C C C C C C C C C C #define E D D D D D D D D D D int foo (void) { volatile int i = 0; E E E E E E E E E E E return 0; }
Which generates a huge amount of work for the machine outliner algorithm to process. I get a segmentation fault when the machine outliner attempts to iterate over the tree to add suffix indices.
Obviously it would be nice for the compiler not to segfault and I'll carry on looking into this further. However I don't see that this indicates a problem with the RISC-V backend specific logic added by this patch, so if there's no objection I may go ahead with merging this later.
Can you open a bug for the machine outliner and maybe contact Jessica, may be she can help fix this quickly.
Sorry for the delay. Finding a testcase for X86 is proving extremely time consuming, although I have been able to prove to myself that the same situation is indeed possible. I just don't know enough about the X86 backend to invoke it.
So I think it is easier if I commit the RISC-V implementation, and we can then discuss how to approach the testcase which we already have.
Thanks Lewis for providing the test case. A bug can be opened for the machine outliner with that example. In my opinion you can go ahead and merge this patch.
https://bugs.llvm.org/show_bug.cgi?id=44344
Pinging @paquette perhaps you have some insight into whether there's anything that can be done to avoid this issue (also discussed above).
AArch64 has more checks, some of which seem like could be relevant for us. Checking for section markings, link once, etc.