Store rlwinm Rx, Ry, 32, 0, 31 as rlwinm Rx, Ry, 0, 0, 31 and store
rldicl Rx, Ry, 64, 0 as rldicl Rx, Ry, 0, 0. Otherwise SH field is overflow
and fails assertion in assembly printing stage.
Details
- Reviewers
power-llvm-team hfinkel echristo nemanjai stefanp lei jsji - Group Reviewers
Restricted Project - Commits
- rGc7be06797436: [PowerPC] Fix SH field overflow issue
rL373519: [PowerPC] Fix SH field overflow issue
Diff Detail
Event Timeline
The fix looks good to me.
However, I feel that the test is over-specified. What you are looking for is a rotate or a shift with an immediate of zero and to make sure that we can safely print those kinds of instructions to assembly. I feel that using the update_llc_test_checks.py script only produces a test that will fail when other changes are made later on and forcing the developer of that changeset to regenerate your test.
llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll | ||
---|---|---|
5 ↗ | (On Diff #218023) | Should add a comment here to say what you are testing. |
11 ↗ | (On Diff #218023) | So, if I understand correctly, this is the instruction that we had trouble printing. slwi r5, r3, 0 I think the test should just look for this instruction and make sure that the zero is there correctly. The test as it stands is over-specified. The tool to generate tests is nice but it creates overhead for the future when scheduling or register allocation changes slightly then tests like this one would need to be updated. |
47 ↗ | (On Diff #218023) | Same goes for this. You can simplify the checks quite a bit here. |
I think this also LGTM overall.
llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll | ||
---|---|---|
2 ↗ | (On Diff #218252) | We don't need this line, do we? |
Can you please use MIR test instead. Thanks.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3593–3594 | Comments should be updated too. | |
llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll | ||
3 ↗ | (On Diff #218252) | Remove this line as well. |
15 ↗ | (On Diff #218252) | run-pass may not trigger the assert, as we don't call the InstPrinters. $ llc -O3 -stop-before ppc-mi-peepholes ../llvm-project/llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll -o t.mir $ llc -O3 -start-before ppc-mi-peepholes t.mir llc: .../llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp:327: void llvm::PPCInstPrinter::printU5ImmOperand(const llvm::MCInst*, unsigned int, llvm::raw_ostream&): Assertion `Value <= 31 && "Invalid u5imm argument!"' failed. Stack dump: 0. Program arguments: llc -O3 -start-before ppc-mi-peepholes t.mir 1. Running pass 'Function Pass Manager' on module 't.mir'. 2. Running pass 'Linux PPC Assembly Printer' on function '@special_right_shift32_0' ... Aborted (core dumped) |
Thanks for working on MIR test, I think we can reduce it a little further.
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir | ||
---|---|---|
1 ↗ | (On Diff #220612) | Add -mtriple please, don't rely on the target triple in optional embedded LLVM IR module. |
10 ↗ | (On Diff #220612) | We know what exactly we want to test in MIR, so why don't we reduce this MIR test further ? $ cat sh-overflow.mir --- name: special_right_shift32_0 liveins: - { reg: '$x3'} - { reg: '$x4'} tracksRegLiveness: true body: | bb.0.entry: liveins: $r3, $r4 renamable $r4 = LI 0 renamable $r3 = SRW renamable $r3, renamable $r4 BLR8 implicit $lr8, implicit $rm, implicit $x3 ... |
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir | ||
---|---|---|
10 ↗ | (On Diff #220612) | Multiple passes after ppc-mi-peepholes relies on SSA form. Apparently, the test case is not in SSA form so we can't just use it. |
I think the test can be simplified more, like what @jsji has mentioned and I think it should not be related to SSA form. Test is kinda like
# RUN: llc -O3 -mtriple=powerpc64le-unknown-linux-gnu -stop-after=ppc-pre-emit-peephole -verify-machineinstrs %s -o - | FileCheck %s --- name: foo alignment: 4 tracksRegLiveness: true body: | bb.0.entry: liveins: $r3 renamable $r4 = LI 0 renamable $r5 = SRW renamable $r3, renamable killed $r4 $r3 = COPY renamable killed $r5 BLR implicit $lr, implicit $rm, implicit $r3 ...
We can just let llc -stop-after=ppc-pre-emit-peephole here since we have known the cause. Another nit, an NFC patch to pre-commit the test would also be preferable.
You are right, we should make sure we meet SSA constraint when writing the MIR for passes before RA.
However, this does NOT prevent us from further reducing the testcase.
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir | ||
---|---|---|
10 ↗ | (On Diff #220612) | Good point, sorry, I did not check SSA constraint for MIR carefully. How about something like these? $ cat t2.mir --- name: special_right_shift32_0 liveins: - { reg: '$r4'} - { reg: '$r5'} tracksRegLiveness: true body: | bb.0.entry: liveins: $r5, $r4 renamable $r4 = LI 0 renamable $r3 = SRW killed renamable $r5, killed renamable $r4, implicit-def $x3 BLR8 implicit $lr8, implicit $rm, implicit killed $x3 ... $ cat t3.mir --- name: special_right_shift64_0 liveins: - { reg: '$r4'} - { reg: '$x5'} tracksRegLiveness: true body: | bb.0.entry: liveins: $r4, $x5 renamable $r4 = LI 0 renamable $x3 = SRD killed renamable $x5, killed renamable $r4 BLR8 implicit $lr8, implicit $rm, implicit killed $x3 ... |
BTW: isSSA() in MIRParser will always return true for MIRs that does NOT have virtual regs,
I think this is a limitation for now, we might want to extend it to support MIR with all HWregs,
but for now, we should check the code manually and carefully. Thanks for pointing out.
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir | ||
---|---|---|
10 ↗ | (On Diff #220612) | Thanks for your test case. Both of your test cases works. One concern is that they are non-SSA MIR tests but we apply some SSA MIR passes (e.g., Live Variable Analysis) on them. I come up with reduced SSA MIR tests, how do you think? |
I am OK with you using virtual regs, and the reduced case looks mostly good to me, but I think we should try to avoid rely on other opts before peephole.
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir | ||
---|---|---|
10 ↗ | (On Diff #220612) | Why they are non-SSA? |
llvm/test/CodeGen/PowerPC/sh-overflow.mir | ||
1 | -start-after=phi-node-elimination ? Why not start-before=ppc-mi-peepholes? Does this imply that we rely on some other passes to generate the necessary input? Can you come up with one that works with either start-before=ppc-mi-peepholes or -start-after ppc-mi-peepholes -ppc-late-peephole ? | |
50 | nit: alignment and maxAlignment seems random here, any reason you want to use 1 here? |
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir | ||
---|---|---|
10 ↗ | (On Diff #220612) | I don't understand this question. SSA == Static Single Assignment form. Since r4 is live-in and defined in the entry block, it is not SSA. But this is really arguing semantics. I haven't checked, but it is entirely possible that the MachineRegisterInfo does not compute the defs and uses of physical registers when consuming MIR. So if we are providing an MIR test case that we are passing to pre-RA passes, we should write them without the registers already allocated. |
llvm/test/CodeGen/PowerPC/sh-overflow.mir | ||
1 | Yes, let's change the test case to have 2 RUN lines. One starts before MI Peephole. The other starts after it. The checks should be the same for both as we should do the right thing in both places. Also, please do not talk about the "current state of things" or "what the patch fixes" in the source/tests. These statements make sense only in the context of the patch now, but later down the road, such statements are meaningless. It should suffice to say something like: |
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir | ||
---|---|---|
10 ↗ | (On Diff #220612) | Ah, yes, I forgot to remove the live-in for $r4 here. Thanks @nemanjai . Yes, agree that if we are providing an MIR test to pre-RA passes, we should try our best to write them without the register already allocated. I think we shouldn't assume that a testcase without virtual register is definitely NOT SSA. |
llvm/test/CodeGen/PowerPC/sh-overflow.mir | ||
1 | Good point. |
llvm/test/CodeGen/PowerPC/sh-overflow.mir | ||
---|---|---|
50 | The special_right_shift32_0 is derived from unsigned int test(unsigned int a, unsigned int b) { return a >> b; } generated by clang --target=powerpc-unknown-unknown. In contrast, the special_right_shift64_0 is derived from unsigned long test(unsigned long a, unsigned long b) { return a >> b; } generated by clang --target=powerpc64-unknown-unknown. I just leave alignment and maxAlignment as it is. What alignment and maxAlignment do you think it should be? |
llvm/test/CodeGen/PowerPC/sh-overflow.mir | ||
---|---|---|
50 | Thanks for explanation. I don't have specific number in mind, just wondering why it is 1? since it is smaller than 2 above. |
llvm/test/CodeGen/PowerPC/sh-overflow.mir | ||
---|---|---|
1 | Given the MIR: $ cat special_right_shift32_0.mir --- name: test alignment: 2 tracksRegLiveness: true registers: - { id: 0, class: gprc } - { id: 1, class: gprc } - { id: 2, class: gprc } liveins: - { reg: '$r3', virtual-reg: '%0' } machineFunctionInfo: {} body: | bb.0.entry: liveins: $r3 %1:gprc = LI 0 %0:gprc = COPY $r3 %2:gprc = SRW %0, %1 $r3 = COPY %2 BLR implicit $lr, implicit $rm, implicit $r3 ... $ llc -O3 -mtriple=powerpc64-unknown-linux-gnu -start-before=ppc-mi-peepholes special_right_shift32_0.mir -o special_right_shift32_0.before.s $ llc -O3 -mtriple=powerpc64-unknown-linux-gnu -start-after=ppc-mi-peepholes -ppc-late-peephole special_right_shift32_0.mir -o special_right_shift32_0.after.s $ diff special_right_shift32_0.before.s special_right_shift32_0.after.s 15a16 > slwi 3, 3, 0 All the assembly in special_right_shift32_0.before.s is optimized out and it contains only blr[1]. In contrast, special_right_shift32_0.after.s has expected output[2]. That is, we get different results for the 2 RUN lines. [1] $ cat special_right_shift32_0.before.s .text .file "special_right_shift32_0.mir" .globl test # -- Begin function test .p2align 2 .type test,@function .section .opd,"aw",@progbits test: # @test .p2align 3 .quad .Lfunc_begin0 .quad .TOC.@tocbase .quad 0 .text .Lfunc_begin0: .cfi_startproc # %bb.0: # %entry blr .long 0 .quad 0 .Lfunc_end0: .size test, .Lfunc_end0-.Lfunc_begin0 .cfi_endproc # -- End function .section ".note.GNU-stack","",@progbits [2] $ cat special_right_shift32_0.after.s .text .file "special_right_shift32_0.mir" .globl test # -- Begin function test .p2align 2 .type test,@function .section .opd,"aw",@progbits test: # @test .p2align 3 .quad .Lfunc_begin0 .quad .TOC.@tocbase .quad 0 .text .Lfunc_begin0: .cfi_startproc # %bb.0: # %entry slwi 3, 3, 0 blr .long 0 .quad 0 .Lfunc_end0: .size test, .Lfunc_end0-.Lfunc_begin0 .cfi_endproc # -- End function .section ".note.GNU-stack","",@progbits | |
50 | According to https://llvm.org/docs/MIRLangRef.html#simplifying-mir-files: The whole frameInfo section is often unnecessary if there is no special frame usage in the function. Would remove the whole frameInfo section and leave alignment: 2 there for both cases. |
LGTM. Thanks for the patience during review.
llvm/test/CodeGen/PowerPC/sh-overflow.mir | ||
---|---|---|
20 | This might be a little confusing -- why avoiding transform into a form in ISA 3.0b , which normally is supposed to be correct? yes, I think you are trying to express that ISA 3.0b also have the bug when describing extended mnemonics of srwi and srdi. In that case, I think we should contact the ISA team to fix it. We don't need to emphasis that in the comments here. |
Comments should be updated too.