Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
@regehr do you know anyone who could help me verify that this implementation is sound and precise? I have only exhaustively tested it up to 7 bits.
llvm/unittests/Support/KnownBitsTest.cpp, line 103:
For one-time testing you could bump this all the way to 16,

llvm/unittests/Support/KnownBitsTest.cpp, lines 111–114:
I would think this should start from

    KnownBits KnownUMax(KnownAnd) = 0;
    KnownBits KnownUMin(KnownAnd) = -1;
    KnownBits KnownSMax(KnownAnd) = INT_MIN;
    KnownBits KnownSMin(KnownAnd) = INT_MAX;
llvm/unittests/Support/KnownBitsTest.cpp, line 103:
The time is exponential in the number of bits: 6 bits took about 1 second and 7 bits took about 20 seconds, so I doubt I would make it as far as 16 bits.

llvm/unittests/Support/KnownBitsTest.cpp, lines 111–114:
No, the way these values are initialized and updated with &= as we discover each real result is independent of which function we're testing.
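To make that accumulation pattern concrete, here is a minimal sketch of the idea being described; the helper name, the plain loops, and the use of ArrayRef are mine for illustration, not the actual unit-test code:

    #include "llvm/ADT/APInt.h"
    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    // Start with every bit "known" in both directions, then intersect in each
    // concrete result: a bit stays known only if it is the same in all results.
    static KnownBits exhaustiveUMax(unsigned Bits, ArrayRef<APInt> LHSVals,
                                    ArrayRef<APInt> RHSVals) {
      KnownBits Known(Bits);
      Known.Zero.setAllBits();
      Known.One.setAllBits();
      for (const APInt &A : LHSVals)
        for (const APInt &B : RHSVals) {
          APInt Res = APIntOps::umax(A, B);
          Known.One &= Res;   // stays known-one only if set in every result
          Known.Zero &= ~Res; // stays known-zero only if clear in every result
        }
      return Known;
    }

Swapping APIntOps::umax for umin/smax/smin is the only per-operation change, which is why the initialization does not depend on which function is being tested.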
llvm/lib/Support/KnownBits.cpp, line 113:
Generally I think this code should be structured something like this:

    if (LHS.getMinValue().uge(RHS.getMaxValue()))
      return LHS;
    if (RHS.getMinValue().uge(LHS.getMaxValue()))
      return RHS;

    // If the result of the umax is LHS then it must be greater than or equal to
    // the minimum possible value of RHS. Likewise for RHS. Any known bits that
    // are common to these two values are also known in the result.
    KnownBits L = LHS.makeGE(RHS.getMinValue());
    KnownBits R = RHS.makeGE(LHS.getMinValue());
    return KnownBits(L.Zero & R.Zero, L.One & R.One);
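For a hand-worked illustration of the early-return case in that suggestion (the concrete values are mine, not from the review):

    #include "llvm/ADT/APInt.h"
    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    void earlyReturnExample() {
      // LHS: bit 3 known one, bits 0-1 known zero -> possible values {8, 12}.
      // RHS: bit 2 known one, bit 3 known zero    -> possible values {4..7}.
      KnownBits LHS(4), RHS(4);
      LHS.One = APInt(4, 0b1000);
      LHS.Zero = APInt(4, 0b0011);
      RHS.One = APInt(4, 0b0100);
      RHS.Zero = APInt(4, 0b1000);
      // LHS.getMinValue() is 8 and RHS.getMaxValue() is 7, so the umax is
      // always LHS's value and the first early return fires, giving back LHS.
      KnownBits Res = KnownBits::umax(LHS, RHS);
      (void)Res;
    }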
llvm/include/llvm/Support/KnownBits.h, line 178:
Could do, but then I'd need to copy it in the implementation before calling clearLowBits. I don't know which is best.
Simplify the implementation of KnownBits::umax.
llvm/lib/Support/KnownBits.cpp, line 113:
Thanks! Can you simplify makeGE for me too? :)
Sorry, what I meant was: are there any codegen test diffs if you replace SelectionDAG::computeKnownBits with this implementation?
llvm/include/llvm/Support/KnownBits.h, line 178:
It's more common to use the const & approach.
I think the question is: is this implementation expected to be identical to those, or better?
Regardless, I believe all of those implementations should be replaced with this one in *this* patch, to give better test coverage.
The new implementation is supposed to be strictly better than the existing implementations, though the cases where it is more precise may not occur very often in practice.
LG as well.
llvm/lib/Support/KnownBits.cpp, line 89:
Personally I'd write this as unsigned N = (Zero | Val).countLeadingOnes() (with the comment something like "determine number of known leading bits"). Okay either way though.
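Putting together the pieces that appear in this review (the makeGE call in the umax suggestion, the copy-before-clearLowBits remark, and the countLeadingOnes suggestion above), makeGE presumably ends up looking roughly like the sketch below; this is a reconstruction for illustration and the committed version may differ in detail:

    KnownBits KnownBits::makeGE(const APInt &Val) const {
      // Determine the number of known leading bits: leading positions where
      // either our value is known zero or Val has a one.
      unsigned N = (Zero | Val).countLeadingOnes();

      // In each of those leading positions, if Val has a 1 then our value must
      // also have a 1, otherwise it could not be >= Val.
      APInt MaskedVal(Val);
      MaskedVal.clearLowBits(getBitWidth() - N);
      return KnownBits(Zero & ~MaskedVal, One | MaskedVal);
    }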
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp, line 3434:
Maybe drop this early return. Even if one operand is unknown we can determine something here.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp, line 3434:
Yes, well spotted, but I am planning to do that in a separate patch because it affects some codegen tests.
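For what it's worth, here is a small illustration of the kind of information that is still available when one operand is completely unknown (the values are chosen by me, not taken from the patch):

    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    void unknownOperandExample() {
      KnownBits LHS(8), RHS(8); // RHS stays completely unknown.
      LHS.One.setBit(7);        // LHS is known to be >= 0x80.
      KnownBits Res = KnownBits::umax(LHS, RHS);
      // Res has bit 7 known one: umax(LHS, RHS) >= LHS >= 0x80, so the top bit
      // of the result is set no matter what RHS turns out to be.
      (void)Res;
    }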
Contrary to my expectations, this change had a small but measurable compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=04ea680a8ccc4f9a4d7333cd712333960348c35b&to=5350e1b5096aa4707aa525baf7398d93b4a4f1a5&stat=instructions
Unfortunately, I have no idea why. I looked at callgrind profiles, and the extra instructions (consistently) end up inside a DenseMap method, with no change in the number of calls. I can't explain why that would happen, and the diff is not large enough to bother looking into it further.
I did end up applying this small cleanup while looking over the code again: https://github.com/llvm/llvm-project/commit/ddab4cd83ea31141aaada424dccf94278482ee88
Thanks. I wonder whether the number of calls to KnownBits and/or APInt constructors really affects compile time, as long as the APInts are all <= 64 bits (so they don't have to allocate any extra storage). If it does make a difference then KnownBits.h could probably be improved quite a lot e.g. by using rvalue references to avoid lots of copies.
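As a purely hypothetical sketch of the rvalue-reference idea (this is not existing KnownBits API; the helper name and the operation are mine for illustration), the point would be to reuse an argument's APInt storage instead of building fresh APInts for the result, which only matters once the width exceeds 64 bits and APInt heap-allocates:

    #include <utility>
    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    // Intersect the known bits of two values, mutating the rvalue argument in
    // place so no new APInt storage is allocated for the result.
    static KnownBits intersectKnown(KnownBits &&LHS, const KnownBits &RHS) {
      LHS.Zero &= RHS.Zero;
      LHS.One &= RHS.One;
      return std::move(LHS);
    }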
Is there an easy way for me to reproduce one of the compile-time-tracker's tests on my own machine? E.g. compiling "sqlite3" under perf stat?
Ah no, this was just intended as cleanup. It doesn't have a practical impact on compile-time.
> Is there an easy way for me to reproduce one of the compile-time-tracker's tests on my own machine? E.g. compiling "sqlite3" under perf stat?
What I do is run ninja sqlite3 -v (or whatever the benchmark is) in test-suite and then prepend perf stat or valgrind --tool=callgrind before the printed compilation command. Results in something like...
valgrind --tool=callgrind /path/to/llvm-project/build/bin/clang -DNDEBUG -O3 -w -Werror=date-time -save-stats=obj -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DSQLITE_OMIT_LOAD_EXTENSION=1 -DSQLITE_THREADSAFE=0 -I. -save-stats=obj -MD -MT MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -MF MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o.d -o MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -c ../MultiSource/Applications/sqlite3/sqlite3.c
const APInt &?