If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
Details
- Reviewers
RKSimon efriedma nickdesaulniers - Commits
- rG1fa8fd4c33cb: Recommit "[TargetLowering][RISCV][X86] Support even divisors in…
rGf6a7b4782090: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
rGd4facda414b6: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7261 | Would we benefit at all from creating a ISD::FSHL node here? |
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7261 | We don't use FSHL/FSHR in ExpandShiftByConstant so I think we should be ok. Looks like DAGCombiner is matching it to FSHL for X86. |
FYI, I'm seeing some failures from this internally in some tests that do str <-> floating point conversion. I'll try to reduce a test case now; I'm not sure yet if this patch is doing something wrong.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7302–7303 | Should this not be using the original divisor (whether by shifting back by TrailingZeros or making a copy you use elsewhere instead of doing an lshrInPlace)? |
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7302–7303 | Yes! Thank you, that's probably the bug. |
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7302–7303 | Well I think that just exposed worse problems in the algorithm. The multiplicative inverse doesn't exist if Divisor is even. It needs to be coprime with 1<<BitWidth. But they will both have 2 as a common factor. |
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7302–7303 | After the subtraction, you have something like "udiv exact X, Y". So you can use the same algorithm as BuildExactSDIV (i.e. shift right X and Y, then multiply X by the inverse of Y). |
Small-ish repro which uses absl:
#include <iostream> #include "absl/strings/str_format.h" int main() { std::cout << absl::StrFormat("%.1g", 1e+15) << "\n"; std::cout << absl::StrFormat("%.1g", 1e+20) << "\n"; std::cout << absl::StrFormat("%.1g", 1e+25) << "\n"; std::cout << absl::StrFormat("%.1g", 1e+30) << "\n"; std::cout << absl::StrFormat("%.1g", 1e+35) << "\n"; }
Before, it prints them as-is. After, it prints just the first three and then crashes:
1e+15 3e+28 3e+35 assert.h assertion failed at absl/strings/internal/str_format/float_conversion.cc:1006 in void absl::str_format_internal::(anonymous namespace)::Buffer::push_front(char): begin > data
I made a godbolt link, but trunk there hasn't caught up to this commit yet: https://godbolt.org/z/zbM1McdzM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7302–7303 | I already shifted the dividend and the divisor to get the remainder. I think I should subtract the uncorrected remainder from the shifted dividend and do the multiplicative inverse to calculate the quotient on that. Then correct the remainder with the part shifted off earlier. |
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
7302–7303 | Yes, that's equivalent. |
Change how quotient is calculated. Still need to clean up some comments and do some testing.
I patched this in, and my failing repros from before are now passing, so LGTM from me. Thanks!
This caused miscompilations for me on 32 bit architectures (i386 and arm - while it didn't seem to trigger any such issues on 64 bit). I don't know offhand what/where the issue is yet though - does that aspect ring a bell for anyone, or do I need to dig further to see what's going on? (I guess it'd be best to revert this for now?)
Thanks for the revert!
I didn’t have time to narrow it down further yet, but it can be reproduced with these steps:
git clone git://source.ffmpeg.org/ffmpeg cd ffmpeg ./configure --cc=“clang -target …” --samples=$(pwd)/../samples make -j$(nproc) make fate-rsync make fate-sub-lrc-remux
I don’t know yet which object files contain the breakage here.
For this particular testcase, the changed behaviour is in libavformat/lrcenc.o.
The changed behaviour can be seen in https://martin.st/temp/lrcenc-preproc.c, compiled with clang -target i686-w64-mingw32 -c lrcenc-preproc.c -O2. (I haven't traced the code to see exactly what inputs makes what difference in behaviour though.)
Thanks. I found one bug in how the remainder was calculated. I inadvertently copied the LSBs from after a shift instead of from before it. I recommited that earlier today. Hopefully it hasn't caused any additional failures?
No, this time around it seems to have run successfully everywhere in my tests. Thanks!
Might be helpful to mention the even->odd case here as well?