This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
ClosedPublic

Authored by craig.topper on Oct 9 2022, 11:15 AM.

Details

Summary

If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Diff Detail

Event Timeline

craig.topper created this revision.Oct 9 2022, 11:15 AM
craig.topper requested review of this revision.Oct 9 2022, 11:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 9 2022, 11:15 AM
RKSimon added inline comments.Oct 10 2022, 5:21 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7261

Would we benefit at all from creating a ISD::FSHL node here?

craig.topper added inline comments.Oct 10 2022, 9:15 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7261

We don't use FSHL/FSHR in ExpandShiftByConstant so I think we should be ok. Looks like DAGCombiner is matching it to FSHL for X86.

RKSimon accepted this revision.Oct 10 2022, 10:25 AM

LGTM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7175

Might be helpful to mention the even->odd case here as well?

7261

OK, I have a vague memory of trying to get the legalizers to use funnel-shifts for those cases in the past - I can't remember why though!

This revision is now accepted and ready to land.Oct 10 2022, 10:25 AM
This revision was landed with ongoing or failed builds.Oct 10 2022, 11:02 AM
This revision was automatically updated to reflect the committed changes.

FYI, I'm seeing some failures from this internally in some tests that do str <-> floating point conversion. I'll try to reduce a test case now; I'm not sure yet if this patch is doing something wrong.

craig.topper reopened this revision.Oct 10 2022, 2:59 PM
This revision is now accepted and ready to land.Oct 10 2022, 2:59 PM
craig.topper planned changes to this revision.Oct 10 2022, 2:59 PM
jrtc27 added a comment.EditedOct 10 2022, 3:02 PM
This comment has been deleted.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303

Should this not be using the original divisor (whether by shifting back by TrailingZeros or making a copy you use elsewhere instead of doing an lshrInPlace)?

craig.topper added inline comments.Oct 10 2022, 3:03 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303

Yes! Thank you, that's probably the bug.

craig.topper added inline comments.Oct 10 2022, 3:35 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303

Well I think that just exposed worse problems in the algorithm. The multiplicative inverse doesn't exist if Divisor is even. It needs to be coprime with 1<<BitWidth. But they will both have 2 as a common factor.

efriedma added inline comments.Oct 10 2022, 4:18 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303

After the subtraction, you have something like "udiv exact X, Y". So you can use the same algorithm as BuildExactSDIV (i.e. shift right X and Y, then multiply X by the inverse of Y).

FYI, I'm seeing some failures from this internally in some tests that do str <-> floating point conversion. I'll try to reduce a test case now; I'm not sure yet if this patch is doing something wrong.

Small-ish repro which uses absl:

#include <iostream>
#include "absl/strings/str_format.h"

int main() {
  std::cout << absl::StrFormat("%.1g", 1e+15) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+20) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+25) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+30) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+35) << "\n";
}

Before, it prints them as-is. After, it prints just the first three and then crashes:

1e+15
3e+28
3e+35
assert.h assertion failed at absl/strings/internal/str_format/float_conversion.cc:1006 in void absl::str_format_internal::(anonymous namespace)::Buffer::push_front(char): begin > data

I made a godbolt link, but trunk there hasn't caught up to this commit yet: https://godbolt.org/z/zbM1McdzM

craig.topper added inline comments.Oct 10 2022, 8:07 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303

I already shifted the dividend and the divisor to get the remainder. I think I should subtract the uncorrected remainder from the shifted dividend and do the multiplicative inverse to calculate the quotient on that. Then correct the remainder with the part shifted off earlier.

efriedma added inline comments.Oct 10 2022, 9:13 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303

Yes, that's equivalent.

Change how quotient is calculated. Still need to clean up some comments and do some testing.

This revision is now accepted and ready to land.Oct 10 2022, 9:32 PM

I patched this in, and my failing repros from before are now passing, so LGTM from me. Thanks!

craig.topper planned changes to this revision.Oct 11 2022, 4:04 PM
This revision was not accepted when it landed; it landed in state Changes Planned.Oct 22 2022, 11:47 PM
This revision was automatically updated to reflect the committed changes.

This caused miscompilations for me on 32 bit architectures (i386 and arm - while it didn't seem to trigger any such issues on 64 bit). I don't know offhand what/where the issue is yet though - does that aspect ring a bell for anyone, or do I need to dig further to see what's going on? (I guess it'd be best to revert this for now?)

Thanks for the revert!

I didn’t have time to narrow it down further yet, but it can be reproduced with these steps:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc=“clang -target …” --samples=$(pwd)/../samples
make -j$(nproc)
make fate-rsync
make fate-sub-lrc-remux

I don’t know yet which object files contain the breakage here.

Thanks for the revert!

I didn’t have time to narrow it down further yet, but it can be reproduced with these steps:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc=“clang -target …” --samples=$(pwd)/../samples
make -j$(nproc)
make fate-rsync
make fate-sub-lrc-remux

I don’t know yet which object files contain the breakage here.

For this particular testcase, the changed behaviour is in libavformat/lrcenc.o.

The changed behaviour can be seen in https://martin.st/temp/lrcenc-preproc.c, compiled with clang -target i686-w64-mingw32 -c lrcenc-preproc.c -O2. (I haven't traced the code to see exactly what inputs makes what difference in behaviour though.)

Thanks for the revert!

I didn’t have time to narrow it down further yet, but it can be reproduced with these steps:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc=“clang -target …” --samples=$(pwd)/../samples
make -j$(nproc)
make fate-rsync
make fate-sub-lrc-remux

I don’t know yet which object files contain the breakage here.

For this particular testcase, the changed behaviour is in libavformat/lrcenc.o.

The changed behaviour can be seen in https://martin.st/temp/lrcenc-preproc.c, compiled with clang -target i686-w64-mingw32 -c lrcenc-preproc.c -O2. (I haven't traced the code to see exactly what inputs makes what difference in behaviour though.)

Thanks. I found one bug in how the remainder was calculated. I inadvertently copied the LSBs from after a shift instead of from before it. I recommited that earlier today. Hopefully it hasn't caused any additional failures?

Thanks. I found one bug in how the remainder was calculated. I inadvertently copied the LSBs from after a shift instead of from before it. I recommited that earlier today. Hopefully it hasn't caused any additional failures?

No, this time around it seems to have run successfully everywhere in my tests. Thanks!