This is an archive of the discontinued LLVM Phabricator instance.

[PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC.
ClosedPublic

Authored by timshen on May 10 2017, 3:23 PM.

Details

Summary

Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc.,
because those are not defined for b > sizeof(a) * 8, even after some of
the combiners run.

However, PPCISD::SHL defines that behavior (as the instructions themselves).
Move the combination to the backend.

The tests in shift_mask.ll still pass.

Event Timeline

timshen created this revision.May 10 2017, 3:23 PM
efriedma edited edge metadata.May 10 2017, 3:31 PM

Please update the comment in PPCISelLowering.h

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
13006

Duplicated code; can you refactor these?

timshen updated this revision to Diff 98548.May 10 2017, 3:55 PM

Document PPCISD::SHL/SRA/SRL, and factor out common code.

timshen marked an inline comment as done.May 10 2017, 3:56 PM

Looks fine, but someone who knows PPC better should approve.

llvm/lib/Target/PowerPC/PPCISelLowering.h
124

Maybe leave the comment noting that 32-bit shifts are modulo 64?

timshen updated this revision to Diff 98564.May 10 2017, 5:05 PM

Add back the comment on scalar types to PPCISD::SHL/SRA/SRL. Verified the behavior by looking at Power ISA sld/srd/srad instructions.

timshen updated this revision to Diff 98566.May 10 2017, 5:10 PM

32-bit instructions like slw has a subtly different behavior. Make the documentation refer back to the ISA, since it's the intention.

timshen added inline comments.May 10 2017, 5:13 PM
llvm/lib/Target/PowerPC/PPCISelLowering.h
124

Documented.

It's actually more subtle than "modulo 64". FWIW:

For (shl (i32 a), b):

if ((b % 64) >= 32) return 0;
return a << (b % 32)
timshen updated this revision to Diff 98655.May 11 2017, 10:30 AM

Update comments for accuracy.

iteratee added inline comments.May 11 2017, 11:21 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12957

That's a typo.

llvm/lib/Target/PowerPC/PPCISelLowering.h
128

"For vector types, only the last n bits are used."

llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

Can these patterns go in a separate patch? They only seem partially related.

timshen added inline comments.May 11 2017, 11:26 AM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

They can, the problem is that there is no way to test that patch, since no one generates PPCshl on vector until this patch.

Do you think it'd be ok to have a separate patch being unable to test?

iteratee added inline comments.May 11 2017, 11:36 AM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

If you add the patterns, they should be generated. That was what happened when I added the vector shift patterns for v1i128

timshen added inline comments.May 11 2017, 12:40 PM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

The patterns will be generated, but they will match nothing at the time (therefore not easy to test).

This is because no one generate the SDNodes "PPCshl", aka PPCISD::SHL, for vector operations, without the changes in stripModuloOnShift().

iteratee added inline comments.May 11 2017, 1:11 PM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

OK, why do you need a separate node for shift?

timshen added inline comments.May 11 2017, 2:07 PM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990
  1. It already exists for scalar types. I simply extend it for vector types as well.
  2. They are different because shl (ISD::SHL) has UB, but PPCshl (PPCISD::SHL) doesn't.

For example, (shl (i32 a), b) is UB when b >= 32. (PPCshl (i32 a), b), however, performs a << (b % 32) on vector types, as the instructions do too.

See discussion here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170508/452111.html

iteratee added inline comments.May 11 2017, 2:42 PM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

OK, I get why you're doing it in dagcombiner, but why can't you just produce the instructions directly? Why do you need the intermediate nodes?

timshen added inline comments.May 11 2017, 3:02 PM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

Because I don't want to manually lower it to 8 * 2 = 16 instructions in C++ code. I prefer to write .td patterns for that.

Without PPCISD::SHL I either write C++, or write pattern to match ISD::SHL. But ISD::SHL doesn't have the semantic we want.

timshen updated this revision to Diff 98689.May 11 2017, 3:23 PM

Fix comments.

iteratee accepted this revision.May 11 2017, 3:54 PM

Looks fine with the comments I gave addressed.

llvm/lib/Target/PowerPC/PPCInstrAltivec.td
990

OK. I get the reasoning behind delaying the output.

1099

Minor point, the patterns above are grouped by type and then by PPCxxx vs xxx
Can you group these the same?

This revision is now accepted and ready to land.May 11 2017, 3:54 PM
timshen added inline comments.May 11 2017, 4:06 PM
llvm/lib/Target/PowerPC/PPCInstrAltivec.td
1099

Actually this is consistent - the ordering is (operation, PPC or not, types).

This revision was automatically updated to reflect the committed changes.