This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix SH field overflow issue
ClosedPublic

Authored by Yi-Hong.Lyu on Aug 29 2019, 9:56 PM.

Details

Summary

Store rlwinm Rx, Ry, 32, 0, 31 as rlwinm Rx, Ry, 0, 0, 31 and store
rldicl Rx, Ry, 64, 0 as rldicl Rx, Ry, 0, 0. Otherwise SH field is overflow
and fails assertion in assembly printing stage.

Diff Detail

Repository
rL LLVM

Event Timeline

Yi-Hong.Lyu created this revision.Aug 29 2019, 9:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2019, 9:57 PM

The fix looks good to me.
However, I feel that the test is over-specified. What you are looking for is a rotate or a shift with an immediate of zero and to make sure that we can safely print those kinds of instructions to assembly. I feel that using the update_llc_test_checks.py script only produces a test that will fail when other changes are made later on and forcing the developer of that changeset to regenerate your test.

llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll
5 ↗(On Diff #218023)

Should add a comment here to say what you are testing.

11 ↗(On Diff #218023)

So, if I understand correctly, this is the instruction that we had trouble printing.

slwi r5, r3, 0

I think the test should just look for this instruction and make sure that the zero is there correctly. The test as it stands is over-specified. The tool to generate tests is nice but it creates overhead for the future when scheduling or register allocation changes slightly then tests like this one would need to be updated.
I think a set of simpler checks might work better. Look for the label (CHECK-LABEL) as you have done. Then look for the slwi with the zero making sure you use some kind of wildcard for the register numbers. I would usually also look for the blr but you don't have one for this case so that's fine.

47 ↗(On Diff #218023)

Same goes for this. You can simplify the checks quite a bit here.

stefanp requested changes to this revision.Aug 30 2019, 1:01 PM
This revision now requires changes to proceed.Aug 30 2019, 1:01 PM
Yi-Hong.Lyu edited the summary of this revision. (Show Details)Aug 31 2019, 2:27 PM

Address Stefan's comments

Yi-Hong.Lyu marked 3 inline comments as done.Sep 8 2019, 8:28 PM

ping

LGTM. I'll let Stefan give the final ack though.

stefanp accepted this revision.Sep 12 2019, 7:10 AM

Sorry it took me so long to get back to this.
LGTM.

This revision is now accepted and ready to land.Sep 12 2019, 7:10 AM
amyk added a subscriber: amyk.Sep 12 2019, 7:23 PM

I think this also LGTM overall.

llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll
2 ↗(On Diff #218252)

We don't need this line, do we?

jsji requested changes to this revision.Sep 13 2019, 8:23 AM
jsji added a reviewer: Restricted Project.

Can you please use MIR test instead. Thanks.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3592 ↗(On Diff #218252)

Comments should be updated too.

llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll
3 ↗(On Diff #218252)

Remove this line as well.

15 ↗(On Diff #218252)

run-pass may not trigger the assert, as we don't call the InstPrinters.
But I think you should still be able to run MIR test and trigger the assert with start-before.
eg:

$ llc -O3 -stop-before ppc-mi-peepholes ../llvm-project/llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll -o t.mir
$ llc -O3 -start-before ppc-mi-peepholes t.mir
llc: .../llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp:327: void llvm::PPCInstPrinter::printU5ImmOperand(const llvm::MCInst*, unsigned int, llvm::raw_ostream&): Assertion `Value <= 31 && "Invalid u5imm argument!"' failed.
Stack dump:
0.      Program arguments: llc -O3 -start-before ppc-mi-peepholes t.mir 
1.      Running pass 'Function Pass Manager' on module 't.mir'.
2.      Running pass 'Linux PPC Assembly Printer' on function '@special_right_shift32_0'
...
Aborted (core dumped)
This revision now requires changes to proceed.Sep 13 2019, 8:23 AM

Address Jinsong's comment

Yi-Hong.Lyu marked 6 inline comments as done.Sep 17 2019, 10:12 PM
Yi-Hong.Lyu added inline comments.
llvm/test/CodeGen/PowerPC/convert-rr-to-ri-instrs.ll
2 ↗(On Diff #218252)

No longer an issue since MIR doesn't have such field

3 ↗(On Diff #218252)

No longer an issue since MIR doesn't have such field

jsji requested changes to this revision.Sep 18 2019, 11:16 AM

Thanks for working on MIR test, I think we can reduce it a little further.

llvm/test/CodeGen/PowerPC/SH-field-overflow.mir
1 ↗(On Diff #220612)

Add -mtriple please, don't rely on the target triple in optional embedded LLVM IR module.

10 ↗(On Diff #220612)

We know what exactly we want to test in MIR, so why don't we reduce this MIR test further ?
eg: a MIR with following lines should be sufficient for 32 bit, you can add another module for 64 bits, and that should be all.

$ cat sh-overflow.mir
---
name:            special_right_shift32_0
liveins:
  - { reg: '$x3'}
  - { reg: '$x4'}
tracksRegLiveness: true
body:             |
  bb.0.entry:
    liveins: $r3, $r4
  
    renamable $r4 = LI 0
    renamable $r3 = SRW renamable $r3, renamable $r4
    BLR8 implicit $lr8, implicit $rm, implicit $x3
...
This revision now requires changes to proceed.Sep 18 2019, 11:16 AM
Yi-Hong.Lyu marked 4 inline comments as done.Sep 22 2019, 1:19 PM
Yi-Hong.Lyu added inline comments.
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir
10 ↗(On Diff #220612)

Multiple passes after ppc-mi-peepholes relies on SSA form. Apparently, the test case is not in SSA form so we can't just use it.

Yi-Hong.Lyu marked an inline comment as done.

Address Jinsong's comments

Yi-Hong.Lyu marked an inline comment as done.Sep 22 2019, 1:30 PM
lkail added a subscriber: lkail.EditedSep 22 2019, 8:16 PM

I think the test can be simplified more, like what @jsji has mentioned and I think it should not be related to SSA form. Test is kinda like

# RUN: llc -O3 -mtriple=powerpc64le-unknown-linux-gnu -stop-after=ppc-pre-emit-peephole -verify-machineinstrs %s -o - | FileCheck %s
---                                                                                                                                                                           
name: foo                                                                                                                                                                   
alignment: 4                                                                                                                                                                  
tracksRegLiveness: true                                                                                                                                                       
body: |                                                                                                                                                                       
  bb.0.entry:                                                                                                                                                                 
    liveins: $r3                                                                                                                                                              
    renamable $r4 = LI 0                                                                                                                                                      
    renamable $r5 = SRW renamable $r3, renamable killed $r4                                                                                                                   
    $r3 = COPY renamable killed $r5                                                                                                                                           
    BLR implicit $lr, implicit $rm, implicit $r3                                                                                                                              
...

We can just let llc -stop-after=ppc-pre-emit-peephole here since we have known the cause. Another nit, an NFC patch to pre-commit the test would also be preferable.

jsji added a comment.Sep 23 2019, 8:39 AM

You are right, we should make sure we meet SSA constraint when writing the MIR for passes before RA.

However, this does NOT prevent us from further reducing the testcase.

llvm/test/CodeGen/PowerPC/SH-field-overflow.mir
10 ↗(On Diff #220612)

Good point, sorry, I did not check SSA constraint for MIR carefully.
My example above is mostly just to demo the idea of how to reduce the case

How about something like these?

$ cat t2.mir
---
name:            special_right_shift32_0
liveins:
  - { reg: '$r4'}
  - { reg: '$r5'}
tracksRegLiveness: true
body:             |
  bb.0.entry:
    liveins: $r5, $r4
  
    renamable $r4 = LI 0
    renamable $r3 = SRW killed renamable $r5, killed renamable $r4, implicit-def $x3
    BLR8 implicit $lr8, implicit $rm, implicit killed $x3
...
$ cat t3.mir
---
name:            special_right_shift64_0
liveins:
  - { reg: '$r4'}
  - { reg: '$x5'}
tracksRegLiveness: true
body:             |
  bb.0.entry:
    liveins: $r4, $x5
  
    renamable $r4 = LI 0
    renamable $x3 = SRD killed renamable $x5, killed renamable $r4
    BLR8 implicit $lr8, implicit $rm, implicit killed $x3
...
jsji added a comment.Sep 23 2019, 8:43 AM

You are right, we should make sure we meet SSA constraint when writing the MIR for passes before RA.

BTW: isSSA() in MIRParser will always return true for MIRs that does NOT have virtual regs,
I think this is a limitation for now, we might want to extend it to support MIR with all HWregs,
but for now, we should check the code manually and carefully. Thanks for pointing out.

Reduce the MIR test cases (Address Jinsong's comments)

Yi-Hong.Lyu marked 2 inline comments as done.Sep 23 2019, 12:12 PM
Yi-Hong.Lyu added inline comments.
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir
10 ↗(On Diff #220612)

Thanks for your test case. Both of your test cases works. One concern is that they are non-SSA MIR tests but we apply some SSA MIR passes (e.g., Live Variable Analysis) on them. I come up with reduced SSA MIR tests, how do you think?

jsji added a comment.Sep 23 2019, 2:06 PM

I am OK with you using virtual regs, and the reduced case looks mostly good to me, but I think we should try to avoid rely on other opts before peephole.

llvm/test/CodeGen/PowerPC/SH-field-overflow.mir
10 ↗(On Diff #220612)

Why they are non-SSA?

llvm/test/CodeGen/PowerPC/sh-overflow.mir
1 ↗(On Diff #221381)

-start-after=phi-node-elimination ? Why not start-before=ppc-mi-peepholes? Does this imply that we rely on some other passes to generate the necessary input?

Can you come up with one that works with either start-before=ppc-mi-peepholes or -start-after ppc-mi-peepholes -ppc-late-peephole ?

50 ↗(On Diff #221381)

nit: alignment and maxAlignment seems random here, any reason you want to use 1 here?

nemanjai added inline comments.Sep 24 2019, 3:33 AM
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir
10 ↗(On Diff #220612)

I don't understand this question. SSA == Static Single Assignment form. Since r4 is live-in and defined in the entry block, it is not SSA.

But this is really arguing semantics. I haven't checked, but it is entirely possible that the MachineRegisterInfo does not compute the defs and uses of physical registers when consuming MIR. So if we are providing an MIR test case that we are passing to pre-RA passes, we should write them without the registers already allocated.

llvm/test/CodeGen/PowerPC/sh-overflow.mir
1 ↗(On Diff #221381)

Yes, let's change the test case to have 2 RUN lines. One starts before MI Peephole. The other starts after it. The checks should be the same for both as we should do the right thing in both places.

Also, please do not talk about the "current state of things" or "what the patch fixes" in the source/tests. These statements make sense only in the context of the patch now, but later down the road, such statements are meaningless. It should suffice to say something like:
"Ensure we do not attempt to transform this into srwi $r3, $r3, 0 in the form specified by ISA 3.0b (rlwinm $r3, $r3, 32 - 0, 0, 31)"

jsji added inline comments.Sep 24 2019, 7:16 AM
llvm/test/CodeGen/PowerPC/SH-field-overflow.mir
10 ↗(On Diff #220612)

Ah, yes, I forgot to remove the live-in for $r4 here. Thanks @nemanjai .

Yes, agree that if we are providing an MIR test to pre-RA passes, we should try our best to write them without the register already allocated.

I think we shouldn't assume that a testcase without virtual register is definitely NOT SSA.
Although it is indeed really error prone without the MIRParse isSSA() check support,
and so should be avoided if possible. Thanks.

llvm/test/CodeGen/PowerPC/sh-overflow.mir
1 ↗(On Diff #221381)

Good point.

Yi-Hong.Lyu marked 2 inline comments as done.Sep 27 2019, 10:03 AM
Yi-Hong.Lyu added inline comments.
llvm/test/CodeGen/PowerPC/sh-overflow.mir
50 ↗(On Diff #221381)

The special_right_shift32_0 is derived from

unsigned int test(unsigned int a, unsigned int b) {
  return a >> b;
}

generated by clang --target=powerpc-unknown-unknown. In contrast, the special_right_shift64_0 is derived from

unsigned long test(unsigned long a, unsigned long b) {
  return a >> b;
}

generated by clang --target=powerpc64-unknown-unknown. I just leave alignment and maxAlignment as it is. What alignment and maxAlignment do you think it should be?

jsji added inline comments.Sep 27 2019, 3:16 PM
llvm/test/CodeGen/PowerPC/sh-overflow.mir
50 ↗(On Diff #221381)

Thanks for explanation. I don't have specific number in mind, just wondering why it is 1? since it is smaller than 2 above.

Yi-Hong.Lyu marked 2 inline comments as done.Sep 30 2019, 12:37 AM
Yi-Hong.Lyu added inline comments.
llvm/test/CodeGen/PowerPC/sh-overflow.mir
1 ↗(On Diff #221381)

Given the MIR:

$ cat special_right_shift32_0.mir
---
name:            test
alignment:       2
tracksRegLiveness: true
registers:
  - { id: 0, class: gprc }
  - { id: 1, class: gprc }
  - { id: 2, class: gprc }
liveins:
  - { reg: '$r3', virtual-reg: '%0' }
machineFunctionInfo: {}
body:             |
  bb.0.entry:
    liveins: $r3

    %1:gprc = LI 0
    %0:gprc = COPY $r3
    %2:gprc = SRW %0, %1
    $r3 = COPY %2
    BLR implicit $lr, implicit $rm, implicit $r3

...
$ llc -O3 -mtriple=powerpc64-unknown-linux-gnu -start-before=ppc-mi-peepholes special_right_shift32_0.mir -o special_right_shift32_0.before.s
$ llc -O3 -mtriple=powerpc64-unknown-linux-gnu -start-after=ppc-mi-peepholes -ppc-late-peephole special_right_shift32_0.mir -o special_right_shift32_0.after.s
$ diff special_right_shift32_0.before.s special_right_shift32_0.after.s
15a16
>       slwi 3, 3, 0

All the assembly in special_right_shift32_0.before.s is optimized out and it contains only blr[1]. In contrast, special_right_shift32_0.after.s has expected output[2]. That is, we get different results for the 2 RUN lines.

[1]

$ cat special_right_shift32_0.before.s
        .text
        .file   "special_right_shift32_0.mir"
        .globl  test                    # -- Begin function test
        .p2align        2
        .type   test,@function
        .section        .opd,"aw",@progbits
test:                                   # @test
        .p2align        3
        .quad   .Lfunc_begin0
        .quad   .TOC.@tocbase
        .quad   0
        .text
.Lfunc_begin0:
        .cfi_startproc
# %bb.0:                                # %entry
        blr
        .long   0
        .quad   0
.Lfunc_end0:
        .size   test, .Lfunc_end0-.Lfunc_begin0
        .cfi_endproc
                                        # -- End function

        .section        ".note.GNU-stack","",@progbits

[2]

$ cat special_right_shift32_0.after.s
        .text
        .file   "special_right_shift32_0.mir"
        .globl  test                    # -- Begin function test
        .p2align        2
        .type   test,@function
        .section        .opd,"aw",@progbits
test:                                   # @test
        .p2align        3
        .quad   .Lfunc_begin0
        .quad   .TOC.@tocbase
        .quad   0
        .text
.Lfunc_begin0:
        .cfi_startproc
# %bb.0:                                # %entry
        slwi 3, 3, 0
        blr
        .long   0
        .quad   0
.Lfunc_end0:
        .size   test, .Lfunc_end0-.Lfunc_begin0
        .cfi_endproc
                                        # -- End function

        .section        ".note.GNU-stack","",@progbits
50 ↗(On Diff #221381)

According to https://llvm.org/docs/MIRLangRef.html#simplifying-mir-files:

The whole frameInfo section is often unnecessary if there is no special frame usage in the function.

Would remove the whole frameInfo section and leave alignment: 2 there for both cases.

jsji added inline comments.Sep 30 2019, 8:32 AM
llvm/test/CodeGen/PowerPC/sh-overflow.mir
1 ↗(On Diff #221381)

I believe we do difference opts depends on PostRA, that maybe why you can not get exact results with above two RUN line.

50 ↗(On Diff #221381)

OK for me. Thanks.

Address Jonsong and Nemanja's comments

Yi-Hong.Lyu marked 7 inline comments as done.Oct 2 2019, 7:15 AM
jsji accepted this revision as: jsji.Oct 2 2019, 7:29 AM

LGTM. Thanks for the patience during review.

llvm/test/CodeGen/PowerPC/sh-overflow.mir
19 ↗(On Diff #222829)

This might be a little confusing -- why avoiding transform into a form in ISA 3.0b , which normally is supposed to be correct?

yes, I think you are trying to express that ISA 3.0b also have the bug when describing extended mnemonics of srwi and srdi. In that case, I think we should contact the ISA team to fix it. We don't need to emphasis that in the comments here.

jsji accepted this revision.Oct 2 2019, 7:29 AM
This revision is now accepted and ready to land.Oct 2 2019, 7:29 AM
This revision was automatically updated to reflect the committed changes.