Page MenuHomePhabricator

[PPC] Do not emit extswsli in 32BIT mode when using -mcpu=pwr9
ClosedPublic

Authored by ZarkoCA on Sep 2 2020, 10:46 AM.

Details

Summary

It looks like in some circumstances when compiling with -mcpu=pwr9 we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.

This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node. Adding a check for whether we are in PPC64 before that fixes the issue.

Diff Detail

Event Timeline

ZarkoCA created this revision.Sep 2 2020, 10:46 AM
ZarkoCA requested review of this revision.Sep 2 2020, 10:46 AM
Xiangling_L added inline comments.Sep 18 2020, 11:59 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16327–16328

It seems you can move Subtarget.isPPC64() here instead. Since the following code returns SDValue() for 32bit no matter what.

llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
11

The failing reason has nothing to do with endian but only related to 32bit or 64bit mode. So I am suggesting you can test -mtriple=powerpc and -mtriple=powerpc64 only

18

We can simplify the testcase like the following:
define void @a(i32 %add, i64* %sum_a) {
entry:

%conv = sext i32 %add to i64
%shl = shl nsw i64 %conv, 8
store i64 %shl, i64* %sum_a, align 8
ret void

}

ZarkoCA marked 3 inline comments as done.Sep 21 2020, 5:13 AM
ZarkoCA added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16327–16328

It is better to add it there.

llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
11

Good point, we only care about 32/64bit.

18

That's much better, thanks.

ZarkoCA updated this revision to Diff 293144.Sep 21 2020, 5:15 AM
ZarkoCA marked 3 inline comments as done.

Moved !Subtarget.isPPC64() check earlier alongside other checks.

Reworked test case.

Does PPCMIPeephole::combineSEXTAndSHL have a similar problem that needs to be addressed?

Does PPCMIPeephole::combineSEXTAndSHL have a similar problem that needs to be addressed?

I thought so, but it looks like that function is only called when PPCMIPeephole::simplifyCode sees an RLDICR Opcode which is a 64Bit only instruction. So it looks like we avoid this problem there.

Xiangling_L added inline comments.Sep 23 2020, 7:58 AM
llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
18

It seems, you missed matching the third part of srawi instruction. Any reason of doing so?

19

I am suggesting to use eg. [[REG3:[0-9]+]] to match register, use {{[0-9]+}} to match numbers. Or since 8 is hardcoded in the IR, we don't need to worry about this value will change in the assembly .

For example,

rotlwi [[REG3:[0-9]+]], [[REG2]], 8

Does PPCMIPeephole::combineSEXTAndSHL have a similar problem that needs to be addressed?

I thought so, but it looks like that function is only called when PPCMIPeephole::simplifyCode sees an RLDICR Opcode which is a 64Bit only instruction. So it looks like we avoid this problem there.

Thanks for looking into it.

llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
16

Is the . character treated as a regex? Neither 32-bit or 64-bit assembly will produce .a in the assembly, both produce a:, but the test works, so I am clearly missing something.

19

+1, and same for the ME/MB fields of the rlwimi instruction.

23

Real minor nit, but there is a trailing space at the end of this line.

ZarkoCA updated this revision to Diff 293927.Sep 23 2020, 8:01 PM

Updated test case.

ZarkoCA marked 5 inline comments as done.Sep 23 2020, 8:11 PM
ZarkoCA added inline comments.
llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
16

I changed it to foo and removed the period for clarity. It's strange that it wasn't caught.

18

I wanted to only check the register usage but leaving out the SH value makes it more confusing so I added it.

19

Thanks for catching that. I left the actual values for the shifts and the mask end/begin, and used variables for the registers in the 32BIT case. Is that better?

nemanjai accepted this revision.Sep 24 2020, 7:50 AM

This LGTM. My comment isn't really meant to require you to change the test, just to indicate that there are a lot of moving parts that are very difficult to account for completely. So if update_llc_test_checks.py works, you're better off just using that.

llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
2

I assume you are not using the script to produce the checks here because it doesn't work with 32-bit PPC codegen or something along those lines. If it does work, it would be preferable to use it.

For example, you have the srawi necessarily preceding the rotlwi but they are independent. So you go through a fair bit of trouble to keep the register allocation flexible, but the instruction order could change. And technically, there is no requirement that the target of the slwi is the same as the source.
Plus the stores are missing so you are not testing that the sign bits are ending up in the right place.

This revision is now accepted and ready to land.Sep 24 2020, 7:50 AM
ZarkoCA updated this revision to Diff 294123.Sep 24 2020, 11:44 AM
ZarkoCA marked 2 inline comments as done.

Generated test case using update_llc_test_checks.py, added stores and used DAG instead of NEXT when appropriate.

Xiangling_L added inline comments.Fri, Sep 25, 10:51 AM
llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
22

It seems, [[REG3]] does not need to be the same register as the one used in rotlwi?

23

I am wondering why we don't use regex to match register 4 here?

28

Same question as above.

ZarkoCA marked 4 inline comments as done.Fri, Sep 25, 1:01 PM
ZarkoCA added inline comments.
llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
2

Thanks for the review, I added the stores and tried to address your points.

ZarkoCA updated this revision to Diff 294408.Fri, Sep 25, 1:01 PM
ZarkoCA marked an inline comment as done.

Fixed register usage.

Xiangling_L added inline comments.Fri, Sep 25, 4:54 PM
llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
27

Sorry, I should've mentioned this earlier, we want to follow the same rules here for 64bit as above 32bit cases.
For example:

; 64BIT:               extswsli [[REG1:[0-9]+]], {{REG2:[0-9]+}}, 8
ZarkoCA added inline comments.Fri, Sep 25, 8:09 PM
llvm/test/CodeGen/PowerPC/ppc-32bit-shift.ll
27

Does REG2 need to be matched anywhere?

ZarkoCA updated this revision to Diff 295287.Wed, Sep 30, 7:56 AM
ZarkoCA edited the summary of this revision. (Show Details)

Used REG variables in 64Bit part of test case.

ZarkoCA updated this revision to Diff 295288.Wed, Sep 30, 7:59 AM
ZarkoCA edited the summary of this revision. (Show Details)

Code changes didn't get included in last diff.

This revision was landed with ongoing or failed builds.Wed, Sep 30, 8:07 AM
This revision was automatically updated to reflect the committed changes.