This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable the machine outliner for RISC-V
ClosedPublic

Authored by lewis-revill on Aug 14 2019, 3:53 AM.

Details

Summary

This patch enables the machine outliner for RISC-V and adds the necessary logic for checking whether sequences can be safely outlined, and describing how they should be outlined. Outlined functions are called using the register t0 (x5) as the return address register, which must be available for an occurrence of a sequence to be safely outlined.

Diff Detail

Event Timeline

lewis-revill created this revision.Aug 14 2019, 3:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2019, 3:53 AM
luismarques requested changes to this revision.Aug 16 2019, 10:36 AM
luismarques added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
492

AArch64 has more checks, some of which seem like could be relevant for us. Checking for section markings, link once, etc.

502

If we are only going to support one possible register for now, shouldn't it be the one least likely to already be in use? Wouldn't that be t6 (x31)?

517

This might be pessimistic when the instructions are compressible. Suddenly the outlined amount vs the overhead math might not quite be realistic. I think it's worth devising a test to try to explore how much of a problem that is, and if we can compensate for it by, say, tweaking the overhead numbers, or determining if the instruction is likely to be compressible, or being conservative regarding the instruction size when targeting RVC.

This revision now requires changes to proceed.Aug 16 2019, 10:36 AM
efriedma added inline comments.Aug 16 2019, 1:37 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
517

Wouldn't you want getInstSizeInBytes to account for compression for other reasons, anyway? For example, to avoid triggering branch relaxation when it isn't necessary.

524

RISCV::PseudoRET = 8 bytes.

I don't understand the math here.

luismarques added inline comments.Aug 16 2019, 3:58 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
517

@efriedma I was assuming it wouldn't be possible to make getInstSizeInBytes fully account for compression due to relaxations and/or compression being done by the linker, but if that is possible (or at least a good enough approximation) then that sounds like a good solution.

lewis-revill marked 2 inline comments as done.Aug 19 2019, 4:48 AM
lewis-revill added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
502

That's a sensible suggestion, I was using t0 to match how the save/restore libcalls behave so I presumed there was a good reason for using t0. Don't we also need to think about RV32E here though?

524

Good catch this is definitely an error.

luismarques marked an inline comment as done.Aug 19 2019, 5:02 AM
luismarques added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
502

Good point about RV32E. I guess you can either always use t2 or check the target and use t6 when available, falling back to t2 otherwise. If that's not trivial it might be worth checking how hard it would be to dynamically choose the register, like AArch64 does IIRC.

paquette added inline comments.Aug 20 2019, 9:07 AM
llvm/test/CodeGen/RISCV/machineoutliner.ll
1 ↗(On Diff #215075)

I recommend writing a .mir test for this instead of a .ll test. (e.g, use -stop-before=machine-outliner -simplify-mir)

That would make the test resilient against other code generation changes, and make it easier to test the instruction sequences you want to test.

efriedma added inline comments.Aug 20 2019, 12:36 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
517

The linker only changes the size of calls and global address computation, currently. I mean, I guess it's theoretically possible for it to compress other branches, but that isn't implemented, as far as I know. So it should be possible to accurately compute the size of almost all compressible instructions.

LLVM currently doesn't do any RISCV compression computation before the assembler runs, but it was only implemented that way to simplify the implementation; it could be done earlier, if it would be useful.

Rebased and addressed a couple of comments.

lewis-revill marked 3 inline comments as done.Aug 22 2019, 2:43 AM
lewis-revill added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
492

I've added the basic checks here, I will look into adding tests for this.

llvm/test/CodeGen/RISCV/machineoutliner.ll
1 ↗(On Diff #215075)

I haven't used the tool in this way before, I've been trying to replicate the command used in the AArch64 run line, but I'm still getting 'error: expected top-level entity' when running the check.

efriedma added inline comments.Aug 22 2019, 2:06 PM
llvm/test/CodeGen/RISCV/machineoutliner.ll
1 ↗(On Diff #215075)

"expected top-level entity" comes from the IR parser, so your testcase probably isn't getting treated as MIR at all.

MIR testcases have to end with a ".mir" extension (or you can use "-x mir").

Updated the test to an MIR-based test, and added checks for the cases where outlining cannot be done.

paquette added inline comments.Sep 3 2019, 10:17 AM
llvm/test/CodeGen/RISCV/machineoutliner.mir
2–3

You probably want -verify-machineinstrs on each of these.

8–12

I don't think you actually need the body of the IR functions here. You can make each function void, and just return:

e.g.

define i32 @outline_0(i32 %a, i32 %b) #0 { ret void }

If it weren't for the section checks, you wouldn't need the IR at all. It only needs to be there when you look back at the IR in some way in the pass.

60

I'm pretty sure you can remove this.

62

This isn't checked/used by the outliner, so you can remove it from the IR.

77

If you do remove the bulk of the IR code, you'll need to rename the BB labels to bb.0.

82

It would be good to check the contents of the outlined functions as well.

E.g. OUTLINED_FUNCTION_0 should contain the ORI/ADDI/AND sequence.

(It's possible that utils/update_mir_test_checks.py can generate all of the checks you want here, but I'm not sure if it can handle functions being created.)

lewis-revill planned changes to this revision.Sep 11 2019, 2:32 AM
lewis-revill marked an inline comment as done.

I've been testing this on the Embench set of benchmarks (https://github.com/embench/embench-iot), and it appears that some programs cause execution failures with this enabled, which is a little concerning... I'm going to look into this and then also update the tests.

Don't allow return statements in outlined sequences until tail calls to outlined functions are properly implemented. Improve logic for checking if X5 is available for use at the insertion point of an outlined sequence, and update the test to be pure MIR.

lewis-revill marked 3 inline comments as done.Sep 11 2019, 9:28 AM
lewis-revill added inline comments.
llvm/test/CodeGen/RISCV/machineoutliner.mir
2–3

I think there is some issue with the generation of MIR, at least for RISC-V... If I do IR -> MIR -> Assembly with -verify-machineinstrs, I get the error: 'Function has NoVRegs property but there are VReg operands'. I don't get it for IR -> Assembly with -verify-machineinstrs, and I'm not at all sure what to do about it.

82

(It's possible that utils/update_mir_test_checks.py can generate all of the checks you want here, but I'm not sure if it can handle functions being created.)

I'm not sure I can either! Inserting check lines at the end of the file causes parsing errors.

paquette added inline comments.Sep 11 2019, 11:08 AM
llvm/test/CodeGen/RISCV/machineoutliner.mir
2–3

The verifier should tell you which pass is causing this to happen, and in which function.

What is probably happening is that it's because you're using -enable-machine-outliner. This is probably running passes you don't expect. Since this output was generated by -stop-before=machine-outliner, these passes are probably doing weird things.

What you probably want is one of these:

  • -run-pass=machine-outliner: Only run the outliner, and produce MIR output
  • -start-before=machine-outliner: Run the outliner and all following passes, and produce assembly output

I think that using -run-pass is usually better for these kinds of tests. The reason being that it localizes the test to a single pass, rather than several. The outliner is an extremely late pass though, so using -start-before is probably fine.

If you use -run-pass, you'll have to rewrite your CHECK lines as MIR though.

Refactor the MIR test case again:

  • use -run-pass=machine-outliner to only test this pass (required removing the no-machineoutliner checks, they are redundant anyway)
  • add riscv64 checks for completeness?
  • add -verify-machineinstrs

Two small nits, other than that it's looking pretty good.

I presume you got to the bottom of the exception handling issues you saw, and they are now solved.

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
522

It would be useful to include the ABI name of x5 in this comment too.

524

Can you rename this to CannotInsertCall, I kept missing the t when reviewing, which inverts its meaning.

Updated to address comments.

lenary accepted this revision.Sep 17 2019, 7:12 AM

LGTM. Please wait for a review from @asb before landing this.

lenary added a comment.EditedSep 19 2019, 8:22 AM

We discussed this at the RISC-V meeting on 19 Sept 2019. @apazos says she is willing to test this on SPEC.

I have run a couple of standard workloads like SPEC.
There is no correctness issue when enabling the MO feature (except for spec2000/twolf which fails with/without MO).
The MO code size gains without compression are up to 7%. With compression enabled, most of the code size gain is gone, and I see more code size increase.
It is possible it has to do with the SequenceSize we are estimating.
The reason we enable compression late is to have all paths - coming from codegen (LLVM IR), parsing, assembling .s or inline asm - go through the same mechanism for compression.
This is similar/compatible with GCC behavior, which relies on the external assembler to implement compression.
We can better estimate SequenceSize by checking if an instruction is compressable. We can modify the tablegen backend for compression emitter to generate a function that does the isCompressable check, but instead of using MCInst and MCOperands for the checks, we need to use MachineInstr and MachineOperand types. I will try this solution. Another alternative is to compress LLVM IR code before running the machine outliner.

I have run a couple of standard workloads like SPEC.
There is no correctness issue when enabling the MO feature (except for spec2000/twolf which fails with/without MO).
The MO code size gains without compression are up to 7%. With compression enabled, most of the code size gain is gone, and I see more code size increase.
It is possible it has to do with the SequenceSize we are estimating.
The reason we enable compression late is to have all paths - coming from codegen (LLVM IR), parsing, assembling .s or inline asm - go through the same mechanism for compression.
This is similar/compatible with GCC behavior, which relies on the external assembler to implement compression.
We can better estimate SequenceSize by checking if an instruction is compressable. We can modify the tablegen backend for compression emitter to generate a function that does the isCompressable check, but instead of using MCInst and MCOperands for the checks, we need to use MachineInstr and MachineOperand types. I will try this solution. Another alternative is to compress LLVM IR code before running the machine outliner.

Hi Ana, thank you for this analysis, interestingly I still found significant benefits with compressed instructions enabled. It's worth noting that I was using Embench, which targets the embedded use case.

I'm not sure I quite follow what you mean when you say 'Another alternative is to compress LLVM IR code before running the machine outliner', But did you make any progress with the 'isCompressible' method?

Add missing bailout condition after potentially removing candidate locations.

apazos added a comment.Oct 1 2019, 4:03 PM

Hi Lewis,
Here is the change: https://reviews.llvm.org/D68290
With this change I was able to remove the code size degradation I had observed.
Please try it out.

apazos added a comment.Oct 7 2019, 1:57 PM

Lewis, this patch LGTM. You can go ahead and merge it.

To improve code size, I have updated the additional patch in https://reviews.llvm.org/D68290. No change is required in machine outliner code.

lewis-revill retitled this revision from [RFC/WIP][RISCV] Enable the machine outliner for RISC-V to [RISCV] Enable the machine outliner for RISC-V.

Rebased prior to commit; Will run this through the testsuite once more first.

sorear added a subscriber: sorear.Oct 9 2019, 2:54 AM
sorear added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
502

ra and t0 (x1 and x5) have special functionality in implementations with a return-address stack; jr t6 will be treated as a general indirect branch, not a return, and is much more likely to mispredict. So this should probably prefer t0 whenever possible.

549

Tangentially related, but in many of the cases where outlining makes sense, it would also make sense to generate 4-byte jal t0, label instructions; lld and ld.bfd would need to be taught to generate thunks for out of range jal (the Go linker already supports them).

553

Note that jr <reg> is a compressible instruction regardless of the register used.

lewis-revill added a comment.EditedOct 10 2019, 7:59 AM

Rebased prior to commit; Will run this through the testsuite once more first.

I'm seeing some excess failures from this. However, the failures I've looked into so far seem to be timeouts from compilation being too slow. I'll hold off on committing until I've triaged all of them.

lewis-revill added inline comments.Oct 10 2019, 8:01 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
553

Thanks, I'll try and incorporate this too.

I need to investigate some failures with this patch whereby sections of code which reference different temporary symbols still get outlined, with only one reference being used in the outlined function.

IE the following sequences should definitely not be outlined:

	lui	a0, %hi(.LCPI1_0)
	addi	a0, a0, %lo(.LCPI1_0)
        ...

...
	lui	a0, %hi(.LCPI2_0)
	addi	a0, a0, %lo(.LCPI2_0)
        ...

...
	lui	a0, %hi(.LCPI3_0)
	addi	a0, a0, %lo(.LCPI3_0)
        ...

Rebased. Added special case for size of jr t0 with compression enabled. Fixed failure in GCC testsuite.

The failure was caused by machine operands referencing constant pool indexes being outlined, even though these indexes are materialized as local labels. This was fixed by simply checking MO.isCPI().

apazos added a comment.Dec 3 2019, 4:39 PM

Lewis, I tested the latest version of this patch (with and without https://reviews.llvm.org/D68290), I don't see any issue. LGTM.

I'm doing a final rebase and check on this before merging. I have a failure in the GCC testsuite which I'm triaging.

So the failure in the GCC testsuite occurs due to the following test:

/* PR rtl-optimization/48141 */
/* { dg-do compile } */
/* { dg-options "-O" } */

#define A i = 0;
#define B A A A A A A A A A A
#define C B B B B B B B B B B
#define D C C C C C C C C C C
#define E D D D D D D D D D D

int
foo (void)
{
  volatile int i = 0;
  E E E E E E E E E E E
  return 0;
}

Which generates a huge amount of work for the machine outliner algorithm to process. I get a segmentation fault when the machine outliner attempts to iterate over the tree to add suffix indices.

Obviously it would be nice for the compiler not to segfault and I'll carry on looking into this further. However I don't see that this indicates a problem with the RISC-V backend specific logic added by this patch, so if there's no objection I may go ahead with merging this later.

Can you open a bug for the machine outliner and maybe contact Jessica, may be she can help fix this quickly.

Can you open a bug for the machine outliner and maybe contact Jessica, may be she can help fix this quickly.

Sorry for the delay. Finding a testcase for X86 is proving extremely time consuming, although I have been able to prove to myself that the same situation is indeed possible. I just don't know enough about the X86 backend to invoke it.

So I think it is easier if I commit the RISC-V implementation, and we can then discuss how to approach the testcase which we already have.

Thanks Lewis for providing the test case. A bug can be opened for the machine outliner with that example. In my opinion you can go ahead and merge this patch.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 19 2019, 8:43 AM
This revision was automatically updated to reflect the committed changes.

https://bugs.llvm.org/show_bug.cgi?id=44344

Pinging @paquette perhaps you have some insight into whether there's anything that can be done to avoid this issue (also discussed above).