This is an archive of the discontinued LLVM Phabricator instance.

[Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
ClosedPublic

Authored by lebedev.ri on Jun 28 2019, 4:14 PM.

Details

Summary

Four things here:

  1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
  2. Unban power-of-two divisors. I don't see any reason why they should be illegal.
    • There is no ban in Hacker's Delight
    • I think the ban came from the same bug that caused the miscompile - in floor((2^W - 1) / D) we were dividing by D0 instead of D, and we were ensuring that D0 is not 1, which made sense.
  3. Unban 1 divisors. I no longer believe Hacker's Delight actually says that the fold is invalid for D = 0. Further considerations:
    • We know that
      • (X u% 1) == 0 can be constant-folded to 1,
      • (X u% 1) != 0 can be constant-folded to 0,
    • Also, we know that
      • X u<= -1 can be constant-folded to 1,
      • X u> -1 can be constant-folded to 0,
    • https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
    • We know will end up with the following: (setule/setugt (rotr (mul N, P), K), Q)
    • Therefore, for given new DAG nodes and comparison predicates (ule/ugt), we will still produce the correct answer if: Q is a all-ones constant; and both P and K are *anything* other than undef.
    • The fold will indeed produce Q = all-ones.
  4. Try to re-splat the P and K vectors - we don't care about their values for the lines where divisor was 1.

Diff Detail

Event Timeline

lebedev.ri created this revision.Jun 28 2019, 4:14 PM

It really bothered me, so here's alive proofs for powers-of-two:

$ ~/src/alive2/build-Clang-release/alive /tmp/test.ll 
OMP: Info #270: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Processing /tmp/test.ll..

----------------------------------------
Name: power-of-two
  %o0 = shl i16 1, %K ; so %o0 is a power-of-two
  %o1 = urem i16 %X, %o0
  %r = icmp eq i16 %o1, 0
=>
  %o0 = shl i16 1, %K ; so %o0 is a power-of-two
  %o1 = urem i16 %X, %o0
  %n0 = mul i16 %X, 1 ; multiplicative inverse of d0=1 is 1
  %n1 = fshr i16 %n0, %n0, %K
  %n2 = udiv i16 -1, %o0
  %r = icmp ule i16 %n1, %n2

Done: 1
Optimization is correct!

----------------------------------------
Name: power-of-two
  %o0 = shl i16 1, %K ; so %o0 is a power-of-two
  %o1 = urem i16 %X, %o0
  %r = icmp ne i16 %o1, 0
=>
  %o0 = shl i16 1, %K ; so %o0 is a power-of-two
  %o1 = urem i16 %X, %o0
  %n0 = mul i16 %X, 1 ; multiplicative inverse of d0=1 is 1
  %n1 = fshr i16 %n0, %n0, %K
  %n2 = udiv i16 -1, %o0
  %r = icmp ugt i16 %n1, %n2

Done: 1
Optimization is correct!

----------------------------------------
Name: one
  %o1 = urem i16 %X, 1
  %r = icmp eq i16 %o1, 0
=>
  %o1 = urem i16 %X, 1
  %n0 = mul i16 %X, 0
  %n1 = fshr i16 %n0, %n0, 0
  %r = icmp ule i16 %n1, -1

Done: 1
Optimization is correct!

----------------------------------------
Name: one
  %o1 = urem i16 %X, 1
  %r = icmp ne i16 %o1, 0
=>
  %o1 = urem i16 %X, 1
  %n0 = mul i16 %X, 0
  %n1 = fshr i16 %n0, %n0, 0
  %r = icmp ugt i16 %n1, -1

Done: 1
Optimization is correct!
RKSimon added inline comments.Jul 2 2019, 3:44 AM
include/llvm/CodeGen/SelectionDAG.h
1588 ↗(On Diff #207171)

This comment doesn't really grok well, if I've understood correctly it should be something like:

"If Values contains values that either match the predicate and the non-matching values are the same 'splat' values, then replace all of them with the 'splat' value."

But I'm not sure that's much better.

include/llvm/CodeGen/TargetLowering.h
4058 ↗(On Diff #207171)

N0 -> REMNode?

4064 ↗(On Diff #207171)

These look like a separate NFC change?

lib/CodeGen/SelectionDAG/TargetLowering.cpp
4472 ↗(On Diff #207171)

Separate NFC change?

4485 ↗(On Diff #207171)

Separate NFC change?

4496 ↗(On Diff #207171)

REMNode.getValueType() ?

4594 ↗(On Diff #207171)

Is this code really necessary? Can't you just use SetCC/Select like we do in the BuildDIV methods? SimplifyDemandedBits/SimplifyDemandedVectorElts are more likely to deal with it then.

lebedev.ri added inline comments.Jul 2 2019, 5:08 AM
lib/CodeGen/SelectionDAG/TargetLowering.cpp
4594 ↗(On Diff #207171)

This is absolutely *NOT* *necessary*.
The fold is still fully correct without this if (HadOneDivisor) block
(and without corresponding if (D.isOneValue()) block).

There is no correctness reason to add any extra nodes if we had 1 divisor,
therefore less nodes == better.

The only thing this does is what it looks like - for the channel where the divisor was 1,
we don't care what multiplier and shift amounts will be, therefore if we can make those
constant vectors a splats, we should be better off that way.

I can totally drop this splatting :)

lebedev.ri updated this revision to Diff 207555.Jul 2 2019, 6:53 AM
lebedev.ri marked 7 inline comments as done.

Precommitted NFC cleanup parts.

include/llvm/CodeGen/SelectionDAG.h
1588 ↗(On Diff #207171)

Yeah, it's jumbled. Maybe this is better?

Did you run test suite/ctmark? All fine?

Did you run test suite/ctmark? All fine?

I did not. Here i'm reasonably confident with my alive exercises.

All fine?

Did you run test suite/ctmark? All fine?

I did not.

Err, what i was trying to say was: i certainly did not run ctmark.
I certainly did run test-suite for earlier patches, but i don't recall if i did run it here.

Did you run test suite/ctmark? All fine?

Here i'm reasonably confident with my alive exercises.

This still stands

All fine?

RKSimon added inline comments.Jul 14 2019, 7:44 AM
include/llvm/CodeGen/SelectionDAG.h
1588 ↗(On Diff #207171)

Cheers - instead of adding this to SelectionDAG, please can you just put it TargetLowering.cpp as a static helper for prepareUREMEqFold?

lib/CodeGen/SelectionDAG/TargetLowering.cpp
4569 ↗(On Diff #207555)

REMNode.getOperand()

lebedev.ri marked 4 inline comments as done.
lebedev.ri changed the repository for this revision from rL LLVM to rG LLVM Github Monorepo.

Thanks for taking a look!
Rebased, addressed review notes.

A couple of minors

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4468

I haven't used find_if_not before - so this is guaranteed to return nullptr for no match?

4521

REMNode.getValueType() ?

lebedev.ri added inline comments.Jul 18 2019, 4:05 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4468

Whoops, neither llvm wrappers nor the STL versions return nullptr here, this should be checking for != Values.end()
Thanks!

lebedev.ri marked 3 inline comments as done.

Address review notes.
While turnVectorIntoSplatVector() change is not NFC, i don't believe that issue can be triggered through it's only user.

RKSimon accepted this revision.Jul 20 2019, 7:58 AM

LGTM with one minor

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4468

Better to use "auto SplatValue"? I'd expect the return type to be some kind of iterator type not a regular pointer.

This revision is now accepted and ready to land.Jul 20 2019, 7:58 AM
This revision was automatically updated to reflect the committed changes.