This is an archive of the discontinued LLVM Phabricator instance.

[KnownBits] Implement accurate unsigned and signed max and min
ClosedPublic

Authored by foad on Sep 2 2020, 8:40 AM.

Diff Detail

Event Timeline

foad created this revision.Sep 2 2020, 8:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 2 2020, 8:40 AM
foad requested review of this revision.Sep 2 2020, 8:40 AM

@regehr do you know anyone who could help me verify that this implementation is sound and precise? I have only exhaustively tested it up to 7 bits.

lebedev.ri added inline comments.Sep 2 2020, 9:10 AM
llvm/unittests/Support/KnownBitsTest.cpp
103

I have only exhaustively tested it up to 7 bits.

For one-time testing you could bump this all the way to 16,
but i think 7 was plenty enough to catch any correctness issues.

111–114

i would think this should start from

KnownBits KnownUMax(KnownAnd) = 0;
KnownBits KnownUMin(KnownAnd) = -1;
KnownBits KnownSMax(KnownAnd) = INT_MIN;
KnownBits KnownSMin(KnownAnd) = INT_MAX;

Can you replace the implementations in SelectionDAG::computeKnownBits with these?

foad added inline comments.Sep 2 2020, 11:31 AM
llvm/unittests/Support/KnownBitsTest.cpp
103

The time is exponential in the number of bits. 6 bits took about 1 second. 7 bits took about 20 seconds. I doubt I would make it as far as 16 bits.

111–114

No, the way these values are initialized and updated with &= as we discover each real result, is independent of which function we're testing.

nikic added inline comments.Sep 2 2020, 11:40 AM
llvm/include/llvm/Support/KnownBits.h
178

const APInt &?

llvm/lib/Support/KnownBits.cpp
113

How does this differ from LHS.getMinValue().uge(RHS.getMaxValue())?

nikic added inline comments.Sep 2 2020, 11:45 AM
llvm/lib/Support/KnownBits.cpp
113

Generally I think this code should be structured something like this:

if (LHS.getMinValue().uge(RHS.getMaxValue()))
  return LHS;
if (RHS.getMinValue().uge(LHS.getMaxValue()))
  return RHS;

// If the result of the umax is LHS then it must be greater than or equal to
// the minimum possible value of RHS. Likewise for RHS. Any known bits that
// are common to these two values are also known in the result.
KnownBits L = LHS.makeGE(RHS.getMinValue());
KnownBits R = RHS.makeGE(LHS.getMinValue());
return KnownBits(L.Zero & R.Zero, L.One & R.One);
foad updated this revision to Diff 289526.Sep 2 2020, 12:03 PM

Rebase.

foad added a comment.Sep 2 2020, 12:04 PM

Can you replace the implementations in SelectionDAG::computeKnownBits with these?

Yes, that's the idea, and the same in GlobalISel and in ValueTracking.

foad added inline comments.Sep 2 2020, 12:07 PM
llvm/include/llvm/Support/KnownBits.h
178

Could do, but then I'd need to copy it in the implementation before calling clearLowBits. I don't know which is best.

foad updated this revision to Diff 289535.Sep 2 2020, 12:15 PM

Simplify the implementation of KnownBits::umax.

llvm/lib/Support/KnownBits.cpp
113

Thanks! Can you simplify makeGE for me too? :)

Harbormaster completed remote builds in B70443: Diff 289535.

Can you replace the implementations in SelectionDAG::computeKnownBits with these?

Yes, that's the idea, and the same in GlobalISel and in ValueTracking.

Sorry, what I meant was are there any codegen test diffs if you replace SelectionDAG::computeKnownBits with this implementation?

llvm/include/llvm/Support/KnownBits.h
178

Its more common to use the const & approach

foad added a comment.Sep 3 2020, 9:52 AM

Can you replace the implementations in SelectionDAG::computeKnownBits with these?

Yes, that's the idea, and the same in GlobalISel and in ValueTracking.

Sorry, what I meant was are there any codegen test diffs if you replace SelectionDAG::computeKnownBits with this implementation?

No, using it in SelectionDAG doesn't affect any existing tests.

foad updated this revision to Diff 289750.Sep 3 2020, 10:01 AM

Use const APInt &.

Can you replace the implementations in SelectionDAG::computeKnownBits with these?

Yes, that's the idea, and the same in GlobalISel and in ValueTracking.

Sorry, what I meant was are there any codegen test diffs if you replace SelectionDAG::computeKnownBits with this implementation?

No, using it in SelectionDAG doesn't affect any existing tests.

I think the question is, is this implementation expected to be identical to those, or better?
Regardless, i believe all those implementations should be replaced to use this in *this* patch,
to give better testing coverage.

foad added a comment.Sep 3 2020, 10:12 AM

Can you replace the implementations in SelectionDAG::computeKnownBits with these?

Yes, that's the idea, and the same in GlobalISel and in ValueTracking.

Sorry, what I meant was are there any codegen test diffs if you replace SelectionDAG::computeKnownBits with this implementation?

No, using it in SelectionDAG doesn't affect any existing tests.

I think the question is, is this implementation expected to be identical to those, or better?
Regardless, i believe all those implementations should be replaced to use this in *this* patch,
to give better testing coverage.

The new implementation is supposed to be strictly better than the existing implementations -- though not necessarily in cases that occur very often in practice.

foad updated this revision to Diff 289771.Sep 3 2020, 11:21 AM

Use it in SelectionDAG, GlobalISel and ValueTracking.

lebedev.ri accepted this revision.Sep 3 2020, 11:38 AM

Thanks.
Unless others have more comments (@nikic?), i think this is good.

This revision is now accepted and ready to land.Sep 3 2020, 11:38 AM
nikic accepted this revision.Sep 3 2020, 1:01 PM

LG as well.

llvm/lib/Support/KnownBits.cpp
89

Personally I'd write this as unsigned N = (Zero | Val).countLeadingOnes() (with the comment something like "determine number of known leading bits"). Okay either way though.

nikic added inline comments.Sep 3 2020, 1:03 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3431 ↗(On Diff #289771)

Maybe drop this early return. Even if one operand is unknown we can determine something here.

foad added inline comments.Sep 3 2020, 2:07 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3431 ↗(On Diff #289771)

Yes, well spotted, but I am planning to do that in a separate patch because it affects some codegen tests.

foad updated this revision to Diff 289944.Sep 4 2020, 7:20 AM

Simplify makeGE.

nikic added a comment.Sep 7 2020, 1:27 PM

Contrary to my expectations, this change had a small but measurable compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=04ea680a8ccc4f9a4d7333cd712333960348c35b&to=5350e1b5096aa4707aa525baf7398d93b4a4f1a5&stat=instructions

Unfortunately, I have no idea why. I looked at callgrind profiles and the extra instructions (consistently) end up inside a DenseMap method (without changes to number of calls). Can't explain why that would happen, and the diff is not large enough to bother looking further.

I did end up applying this small cleanup while looking over the code again: https://github.com/llvm/llvm-project/commit/ddab4cd83ea31141aaada424dccf94278482ee88

foad added a comment.Sep 8 2020, 2:39 AM

Contrary to my expectations, this change had a small but measurable compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=04ea680a8ccc4f9a4d7333cd712333960348c35b&to=5350e1b5096aa4707aa525baf7398d93b4a4f1a5&stat=instructions

Unfortunately, I have no idea why. I looked at callgrind profiles and the extra instructions (consistently) end up inside a DenseMap method (without changes to number of calls). Can't explain why that would happen, and the diff is not large enough to bother looking further.

I did end up applying this small cleanup while looking over the code again: https://github.com/llvm/llvm-project/commit/ddab4cd83ea31141aaada424dccf94278482ee88

Thanks. I wonder whether the number of calls to KnownBits and/or APInt constructors really affects compile time, as long as the APInts are all <= 64 bits (so they don't have to allocate any extra storage). If it does make a difference then KnownBits.h could probably be improved quite a lot e.g. by using rvalue references to avoid lots of copies.

Is there an easy way for me to reproduce one of the compile-time-tracker's tests on my own machine? E.g. compiling "sqlite3" under perf stat?

nikic added a comment.Sep 8 2020, 9:52 AM

Thanks. I wonder whether the number of calls to KnownBits and/or APInt constructors really affects compile time, as long as the APInts are all <= 64 bits (so they don't have to allocate any extra storage). If it does make a difference then KnownBits.h could probably be improved quite a lot e.g. by using rvalue references to avoid lots of copies.

Ah no, this was just intended as cleanup. It doesn't have a practical impact on compile-time.

Is there an easy way for me to reproduce one of the compile-time-tracker's tests on my own machine? E.g. compiling "sqlite3" under perf stat?

What I do is run ninja sqlite3 -v (or whatever the benchmark is) in test-suite and then prepend perf stat or valgrind --tool=callgrind before the printed compilation command. Results in something like...

valgrind --tool=callgrind /path/to/llvm-project/build/bin/clang -DNDEBUG -O3 -w -Werror=date-time -save-stats=obj -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DSQLITE_OMIT_LOAD_EXTENSION=1 -DSQLITE_THREADSAFE=0 -I. -save-stats=obj -MD -MT MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -MF MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o.d -o MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -c ../MultiSource/Applications/sqlite3/sqlite3.c