This is an archive of the discontinued LLVM Phabricator instance.

[BasicAA] Be more careful with modulo ops on VariableGEPIndex.
ClosedPublic

Authored by fhahn on Mar 26 2021, 9:58 AM.

Details

Summary

(V * Scale) % X may not produce the same result for any possible value
of V, e.g. if the multiplication overflows. This means we currently
incorrectly determine NoAlias in some cases.

This patch adjusts the code linarizing GEPs to try to track if the
modulo semantics are preserved. There might still be GEPs and cases
where we miss, but it should address a few critical cases.

Diff Detail

Event Timeline

fhahn created this revision.Mar 26 2021, 9:58 AM
fhahn requested review of this revision.Mar 26 2021, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2021, 9:58 AM

I'm somewhat confused by the implementation approach in this patch. I think there's really two orthogonal cases: The first is if all operations are non-wrapping, in which case we can optimize for arbitrary modulos. The second is that we can always use power of two factors from the GCD, because arithmetic is over a power of two field.

I think handling for these two cases should be split. The first part can be handled via flag, and the second part can be handled by only extracting the power of two part if the flag is not set. I think this should both make the semantics of the flag clear and be more precise.

fhahn updated this revision to Diff 335931.Apr 7 2021, 2:40 PM

Rebased on top of latest changes.

I'm somewhat confused by the implementation approach in this patch. I think there's really two orthogonal cases: The first is if all operations are non-wrapping, in which case we can optimize for arbitrary modulos. The second is that we can always use power of two factors from the GCD, because arithmetic is over a power of two field.

I think handling for these two cases should be split. The first part can be handled via flag, and the second part can be handled by only extracting the power of two part if the flag is not set. I think this should both make the semantics of the flag clear and be more precise.

I'm not sure where the logic should be split? I tried to just have a may-wrap flag and then check if Scale is a power-of-2 when the scale is used for the modulo computations. But I think that wouldn't correctly handle cases like (a + x) * 2^y, if the multiply operand gets distributed. Or were you thinking about something else?

nikic added inline comments.Apr 7 2021, 3:01 PM
llvm/test/Analysis/BasicAA/gep-modulo.ll
135

This is an example of the case I have in mind: This can be NoAlias (https://alive2.llvm.org/ce/z/yYFAAz) even though the mul+sub can wrap. Despite the wrapping, we are still guaranteed that the power-of-2 part of the GCD still holds.

nikic added inline comments.Apr 7 2021, 3:14 PM
llvm/test/Analysis/BasicAA/gep-modulo.ll
135

Eh, maybe I'm wrong here regarding the general case. I'll have to think about it more carefully.

fhahn added a comment.Apr 8 2021, 1:54 AM
llvm/test/Analysis/BasicAA/gep-modulo.ll
135

I think it may hold for cases where we compute % power-of-2, but not if the operand is not a power-of-2? I'm not too familiar with the code that actually uses the modulo, but if that's the case we may be able to use this? If so, we should be able to do so by adjusting the check to
if (!DecompGEP1.VarIndices[i].PreservesModulo && !Scale.isPowerOf2())

fhahn added a comment.Apr 8 2021, 2:48 AM

The impact of this seems quite limited. Below the AA stats for MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto:

aa.NumMayAlias
Program                                        base      basicaa   diff 
 test-suite...urce/Applications/aha/aha.test   271.00    292.00     7.7%
 test-suite...rks/FreeBench/pifft/pifft.test   14990.00  15689.00   4.7%
 test-suite...CFP2006/433.milc/433.milc.test   13793.00  14243.00   3.3%
 test-suite...-dbl/LinearDependence-dbl.test   522.00    525.00     0.6%
 test-suite...T2000/300.twolf/300.twolf.test   45884.00  46038.00   0.3%
 test-suite...TimberWolfMC/timberwolfmc.test   49643.00  49798.00   0.3%
 test-suite...-flt/LinearDependence-flt.test   1216.00   1219.00    0.2%
 test-suite...CFP2000/177.mesa/177.mesa.test   67899.00  68059.00   0.2%
 test-suite.../CINT2006/429.mcf/429.mcf.test   1134.00   1136.00    0.2%
 test-suite.../CINT2000/181.mcf/181.mcf.test   1173.00   1175.00    0.2%
 test-suite...libquantum/462.libquantum.test   3498.00   3502.00    0.1%
 test-suite...ications/JM/lencod/lencod.test   293097.00 293418.00  0.1%
 test-suite...nal/skidmarks10/skidmarks.test   16112.00  16126.00   0.1%
 test-suite...006/453.povray/453.povray.test   120925.00 121011.00  0.1%
 test-suite.../CINT2000/254.gap/254.gap.test   67251.00  67293.00   0.1%
 Geomean difference                                                 0.5%

aa.NumMustAlias
Program                                        base     basicaa  diff 
 test-suite...CFP2006/433.milc/433.milc.test   1932.00  1901.00  -1.6%
 test-suite.../Benchmarks/Bullet/bullet.test   25887.00 25875.00 -0.0%
 test-suite...CFP2000/177.mesa/177.mesa.test   8657.00  8653.00  -0.0%
 test-suite...006/453.povray/453.povray.test   28386.00 28374.00 -0.0%
 Geomean difference                                              -0.4%

aa.NumNoAlias
Program                                        base       basicaa    diff 
 test-suite...urce/Applications/aha/aha.test   4913.00    4832.00    -1.6%
 test-suite...rks/FreeBench/pifft/pifft.test   96670.00   95195.00   -1.5%
 test-suite...CFP2006/433.milc/433.milc.test   96144.00   95308.00   -0.9%
 test-suite...-dbl/LinearDependence-dbl.test   3488.00    3461.00    -0.8%
 test-suite...-flt/LinearDependence-flt.test   8855.00    8828.00    -0.3%
 test-suite...TimberWolfMC/timberwolfmc.test   306639.00  305968.00  -0.2%
 test-suite...T2000/300.twolf/300.twolf.test   458839.00  458169.00  -0.1%
 test-suite...000/183.equake/183.equake.test   75059.00   75009.00   -0.1%
 test-suite...libquantum/462.libquantum.test   7910.00    7906.00    -0.1%
 test-suite...CFP2000/177.mesa/177.mesa.test   607413.00  607138.00  -0.0%
 test-suite...oxyApps-C/miniAMR/miniAMR.test   120107.00  120076.00  -0.0%
 test-suite.../CINT2000/254.gap/254.gap.test   193489.00  193447.00  -0.0%
 test-suite.../CINT2000/181.mcf/181.mcf.test   10422.00   10420.00   -0.0%
 test-suite.../CINT2006/429.mcf/429.mcf.test   12104.00   12102.00   -0.0%
 test-suite...ications/JM/lencod/lencod.test   2077915.00 2077590.00 -0.0%
 Geomean difference                                                  -0.2%

And the size impact (binary changes to 9 out of 237 binaries)

Same hash: 228 (filtered out)
Remaining: 9
Metric: size.__text

Program                                        base     basicaa  diff
 test-suite...arks/mafft/pairlocalalign.test    233639   233927   0.1%
 test-suite...-dbl/LinearDependence-dbl.test    58195    58243    0.1%
 test-suite.../Benchmarks/Bullet/bullet.test    309159   309175   0.0%
 test-suite...006/453.povray/453.povray.test    1152978  1152994  0.0%
 test-suite...CFP2000/177.mesa/177.mesa.test    694753   694753   0.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test    3238353  3238353  0.0%
 test-suite.../CINT2000/176.gcc/176.gcc.test    1429678  1429678  0.0%
 test-suite...CFP2006/433.milc/433.milc.test    119989   119861  -0.1%
 test-suite...-flt/LinearDependence-flt.test    54951    54743   -0.4%

There's no difference with special handling for % power-of-2.

nikic added a comment.Apr 22 2021, 1:06 PM

Okay, I've finally gotten around to looking into this in more detail. Because thinking about this really fries my brain, I ended up modelling this in alive2 (hopefully correctly...)

https://alive2.llvm.org/ce/z/HYBxGs This is the general case and shows that we only need nsw flags on the arithmetic, rather than both nuw and nsw. This makes sense, as all the arithmetic involved is signed.

https://alive2.llvm.org/ce/z/qXicaB This shows that it is indeed safe to always take the power-of-2 portion of the scale, even if the arithmetic is not nsw.

fhahn updated this revision to Diff 351122.Jun 10 2021, 3:56 AM

Okay, I've finally gotten around to looking into this in more detail. Because thinking about this really fries my brain, I ended up modelling this in alive2 (hopefully correctly...)

https://alive2.llvm.org/ce/z/HYBxGs This is the general case and shows that we only need nsw flags on the arithmetic, rather than both nuw and nsw. This makes sense, as all the arithmetic involved is signed.

https://alive2.llvm.org/ce/z/qXicaB This shows that it is indeed safe to always take the power-of-2 portion of the scale, even if the arithmetic is not nsw.

Thanks for checking with Alive. I've updated the code now to just track the NSW flag for LinearExpression/VariableGEPIndex and added a power-of-2 check for the scale.

nikic added inline comments.Jun 27 2021, 12:50 PM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
313

Probably doesn't really matter, but this should be true. Note that the expression here is Val * 0 + Const.

377–378

This should be E.IsNSW &= NSW, to keep it as a plain NSW flag without further semantics.

1156

What I had in mind here is something like this:

APInt Scale = DecompGEP1.VarIndices[i].Scale;
if (!DecompGEP1.VarIndices[i].IsNSW)
  Scale = APInt::getOneBitSet(
      Scale.getBitWidth(), Scale.countTrailingZeros());

And then continue with the previous implementation. This preserves a power of two factor, even if the whole scale is not power of two.

1720

This doesn't appear to be tested.

llvm/test/Analysis/BasicAA/gep-alias.ll
164 ↗(On Diff #351122)

With the suggested change to the power of two handling, this change shouldn't be needed anymore.

fhahn updated this revision to Diff 354914.Jun 28 2021, 8:45 AM
fhahn marked an inline comment as done.

Addressed comments, thanks!

fhahn added inline comments.Jun 28 2021, 8:47 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
313

Ah right, Val is multiplied by 0 here. Updated.

377–378

Yes, I also removed the outdated comment.

1156

Updated, thanks!

1720

I added a test in ef78325c1033 and also removed the outdated comment.

llvm/test/Analysis/BasicAA/gep-alias.ll
164 ↗(On Diff #351122)

Correct, I removed the changes, thanks!

nikic accepted this revision.Jun 28 2021, 8:54 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jun 28 2021, 8:54 AM

I’ve bisected a miscompilation to this commit, FYI. Looking into a repro now.

The problem appears with https://martin.st/temp/pixlet-preproc.c, compiled with “clang -target aarch64-w64-mingw32 -c -O3 pixlet-preproc.c”. I haven’t pinpointed exactly what changes and whether that’s wrong or if the source itself relies on something undefined though, or if there’s some strict aliasing violation.

fhahn added a comment.Jun 30 2021, 3:06 AM

The problem appears with https://martin.st/temp/pixlet-preproc.c, compiled with “clang -target aarch64-w64-mingw32 -c -O3 pixlet-preproc.c”. I haven’t pinpointed exactly what changes and whether that’s wrong or if the source itself relies on something undefined though, or if there’s some strict aliasing violation.

Thanks for the heads-up!

I suspect that setting Scale = APInt::getOneBitSet(Scale.getBitWidth(), may be at fault here. With the change below, which makes BasicAA more conservative for that case, I don't get any differences with your reproducer. Could you try if that fixes the end-to-end miscomputation?

diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index da489b8d457f..81823b7ea778 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -1149,8 +1149,7 @@ AliasResult BasicAAResult::aliasGEP(
     for (unsigned i = 0, e = DecompGEP1.VarIndices.size(); i != e; ++i) {
       APInt Scale = DecompGEP1.VarIndices[i].Scale;
       if (!DecompGEP1.VarIndices[i].IsNSW)
-        Scale = APInt::getOneBitSet(Scale.getBitWidth(),
-                                    Scale.countTrailingZeros());
+        GCD = APInt(Scale.getBitWidth(), 1);

       if (i == 0)
         GCD = Scale.abs();
fhahn added a comment.Jun 30 2021, 3:16 AM

The problem appears with https://martin.st/temp/pixlet-preproc.c, compiled with “clang -target aarch64-w64-mingw32 -c -O3 pixlet-preproc.c”. I haven’t pinpointed exactly what changes and whether that’s wrong or if the source itself relies on something undefined though, or if there’s some strict aliasing violation.

If you manage to reduce it further down, that would be very helpful. Unfortunately it looks like the only difference comes from the MachineScheduler, so it is a bit tricky to extract a nice test case.

I've also bisected a miscompile to this patch. I don't know at all what is happening but the suggested fix

-        Scale = APInt::getOneBitSet(Scale.getBitWidth(),
-                                    Scale.countTrailingZeros());
+        GCD = APInt(Scale.getBitWidth(), 1);

makes the miscompile go away.

fhahn added a comment.Jun 30 2021, 5:42 AM

I've also bisected a miscompile to this patch. I don't know at all what is happening but the suggested fix

Any chance you could share the reproducer?

I've also bisected a miscompile to this patch. I don't know at all what is happening but the suggested fix

Any chance you could share the reproducer?

I'll see. It's for my out-of-tree target so it takes a bit of fiddling to extract something that could be useful to you.

The problem appears with https://martin.st/temp/pixlet-preproc.c, compiled with “clang -target aarch64-w64-mingw32 -c -O3 pixlet-preproc.c”. I haven’t pinpointed exactly what changes and whether that’s wrong or if the source itself relies on something undefined though, or if there’s some strict aliasing violation.

Thanks for the heads-up!

I suspect that setting Scale = APInt::getOneBitSet(Scale.getBitWidth(), may be at fault here. With the change below, which makes BasicAA more conservative for that case, I don't get any differences with your reproducer. Could you try if that fixes the end-to-end miscomputation?

Thanks! That does indeed fix my issue, and it also fixes another failed testcase that I hadn’t inspected closer yet.

The problem appears with https://martin.st/temp/pixlet-preproc.c, compiled with “clang -target aarch64-w64-mingw32 -c -O3 pixlet-preproc.c”. I haven’t pinpointed exactly what changes and whether that’s wrong or if the source itself relies on something undefined though, or if there’s some strict aliasing violation.

If you manage to reduce it further down, that would be very helpful. Unfortunately it looks like the only difference comes from the MachineScheduler, so it is a bit tricky to extract a nice test case.

FWIW, if you’ve got access to aarch64 linux, I would expect that you can reproduce the same issue there. (I haven’t verified myself yet.) To try it out at runtime, you should be able to do this:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc=clang --samples=/path/to/empty/dir
make fate-rsync # sync data samples for running tests, into the path specified above
make -j$(nproc) fate-pixlet-rgb fate-twinvq

The miscompilation happens in object files libavcodec/pixlet.o and libavcodec/twinvq.o.

(Sorry I'm not of more assistance in narrowing down the actual change itself at the moment.)

Ok, after another look, I think the problem was that setting Scale as in this patch may turn Scale from a negative to a positive number, which in turn means that AllNonNegative may be set incorrectly, which should be independent of the scale used for the GCD.

I pushed a test for that scenario (returned noalias based on all indices being non-negative) in f4ea6531e677. And a fix to avoid nuking the sign of the original Scale: e6d22d0174e0 . This is likely only temporary, as relying on the sign of the scale is also problematic when the operations wrap.

@nikic it would be great if you could take a quick look at the commits. I'll probably post a follow-up in the next few days for review.

@mstorsjo @uabelho could you check if e6d22d0174e0 fixes the issues you are seeing? I checked with @mstorsjo's example and with the new patch we get the same results as with D99424 reverted.

@mstorsjo @uabelho could you check if e6d22d0174e0 fixes the issues you are seeing? I checked with @mstorsjo's example and with the new patch we get the same results as with D99424 reverted.

Thanks! It does seem like that commit fixes the issues I was observing in both those two testcases.

Ok, after another look, I think the problem was that setting Scale as in this patch may turn Scale from a negative to a positive number, which in turn means that AllNonNegative may be set incorrectly, which should be independent of the scale used for the GCD.

I pushed a test for that scenario (returned noalias based on all indices being non-negative) in f4ea6531e677. And a fix to avoid nuking the sign of the original Scale: e6d22d0174e0 . This is likely only temporary, as relying on the sign of the scale is also problematic when the operations wrap.

@nikic it would be great if you could take a quick look at the commits. I'll probably post a follow-up in the next few days for review.

@mstorsjo @uabelho could you check if e6d22d0174e0 fixes the issues you are seeing? I checked with @mstorsjo's example and with the new patch we get the same results as with D99424 reverted.

It helps in my case. The miscompile goes away with e6d22d0174e0. Thanks!