Diff Detail
Event Timeline
This looks okay on the surface, but this needs at least a spot check of llvm-test-suite IR diffs to get an idea of how much this affects codegen. If there are no substantial regressions, let's go with it. Otherwise we may need to think harder about how we can mitigate regressions.
The one case I'm probably worried about most are checks of the form %p == null, where replacing %p with null is not generally legal because it has nullary provenance. Replacing two arbitrary pointers likely isn't super important, but replacement with null might be.
Note that it *is* possible to perform the replacement for uses that are provenance-independent, in particular inside icmp instructions. Whether we need to make that distinction depends on how the fallout looks like.
@nikic
I was looking at https://llvm.org/devmtg/2018-04/slides/Lopes-Sbirlea-Pointers,%20Alias%20and%20ModRef%20Analyses.pdf page 15, which says that replacement with null ptr is safe. So I did two runs, one where we do pointer replacement only when they have same underlying objects, and the another where we also replace pointers with null.
Here are the diffs for llvm-test-suite bitcode:
https://github.com/UsmanNadeem/vigilant-octo-journey/commit/125698f591df1f8de81ad4119b0838e135a18af8
and
https://github.com/UsmanNadeem/vigilant-octo-journey/commit/a778479334852e7f05b5a248da4473106aa986fb
here is some data:
Tests: 2432 Metric: gvn.NumGVNInstr Program gvn.NumGVNInstr base sameunderlying-and-nullptr sameunderlying diff MultiSource/Benchmarks/Olden/tsp/tsp 35.00 21.00 21.00 66.7% MultiSourc...s/Prolangs-C++/objects/objects 16.00 16.00 14.00 14.3% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 7857.00 6964.00 6936.00 13.3% MultiSource/Benchmarks/McCat/09-vor/vor 59.00 55.00 54.00 9.3% MicroBench...CALS/SubsetBRawLoops/lcalsBRaw 1267.00 1363.00 1324.00 7.6% MicroBench...ubsetBLambdaLoops/lcalsBLambda 1270.00 1366.00 1327.00 7.6% MicroBench...ubsetALambdaLoops/lcalsALambda 1350.00 1446.00 1407.00 7.1% MicroBench...CALS/SubsetARawLoops/lcalsARaw 1321.00 1414.00 1375.00 7.0% MicroBench...ubsetCLambdaLoops/lcalsCLambda 1370.00 1466.00 1427.00 7.0% MicroBench...CALS/SubsetCRawLoops/lcalsCRaw 1393.00 1489.00 1450.00 6.9% MicroBench...teralFiltering/BilateralFilter 69.00 72.00 72.00 4.3% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 6769.00 7044.00 7038.00 4.1% MicroBench.../ImageProcessing/Dither/Dither 75.00 78.00 78.00 4.0% MultiSourc...ks/FreeBench/analyzer/analyzer 29.00 29.00 30.00 3.4% SingleSour...ks/Misc-C++/stepanov_container 577.00 558.00 558.00 3.4% Geomean difference 0.5% gvn.NumGVNInstr run base sameunderlying-and-nullptr sameunderlying diff count 334.000000 334.000000 334.000000 334.000000 mean 376.790419 376.655689 375.895210 0.005643 std 1264.258518 1254.702622 1253.774169 0.039618 min 1.000000 1.000000 1.000000 0.000000 25% 9.000000 9.000000 9.000000 0.000000 50% 29.000000 29.000000 29.000000 0.000000 75% 125.000000 125.000000 125.000000 0.000000 max 10671.000000 10671.000000 10671.000000 0.666667 Tests: 2432 Metric: gvn.NumGVNLoad Program gvn.NumGVNLoad base sameunderlying-and-nullptr sameunderlying diff ultiSource/Applications/SPASS/SPASS 398.00 394.00 394.00 1.0% ultiSource/Applications/kimwitu++/kc 204.00 202.00 202.00 1.0% ultiSource...-ProxyApps-C++/PENNANT/PENNANT 149.00 150.00 150.00 0.7% icroBenchm...ubsetBLambdaLoops/lcalsBLambda 207.00 207.00 208.00 0.5% icroBenchm...CALS/SubsetBRawLoops/lcalsBRaw 209.00 209.00 210.00 0.5% icroBenchm...CALS/SubsetARawLoops/lcalsARaw 226.00 226.00 227.00 0.4% icroBenchm...ubsetALambdaLoops/lcalsALambda 230.00 230.00 231.00 0.4% icroBenchm...ubsetCLambdaLoops/lcalsCLambda 231.00 231.00 232.00 0.4% icroBenchm...CALS/SubsetCRawLoops/lcalsCRaw 244.00 244.00 245.00 0.4% ultiSource.../Applications/JM/ldecod/ldecod 646.00 645.00 645.00 0.2% ultiSource.../DOE-ProxyApps-C++/CLAMR/CLAMR 934.00 933.00 933.00 0.1% ultiSource...nchmarks/tramp3d-v4/tramp3d-v4 978.00 977.00 977.00 0.1% ultiSource...ring-flt/LoopRestructuring-flt 1.00 1.00 1.00 0.0% ultiSource...ring-dbl/LoopRestructuring-dbl 1.00 1.00 1.00 0.0% ultiSource...marks/Trimaran/enc-pc1/enc-pc1 2.00 2.00 2.00 0.0% Geomean difference 0.0% gvn.NumGVNLoad run base sameunderlying-and-nullptr sameunderlying diff count 216.000000 216.000000 216.000000 216.000000 mean 92.689815 92.652778 92.680556 0.000265 std 283.845653 283.785744 283.798757 0.001286 min 1.000000 1.000000 1.000000 0.000000 25% 2.000000 2.000000 2.000000 0.000000 50% 10.000000 10.000000 10.000000 0.000000 75% 44.000000 44.000000 44.000000 0.000000 max 2516.000000 2516.000000 2516.000000 0.010152 Tests: 2432 Metric: gvn.NumGVNPRE Program gvn.NumGVNPRE base sameunderlying-and-nullptr sameunderlying diff SingleSour.../Benchmarks/Misc-C++-EH/spirit 5.00 2.00 2.00 150.0% SingleSour...ks/Misc-C++/stepanov_container 22.00 19.00 19.00 15.8% MultiSourc...-ProxyApps-C++/PENNANT/PENNANT 12.00 11.00 11.00 9.1% MultiSourc...e/Applications/ClamAV/clamscan 109.00 107.00 107.00 1.9% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 377.00 372.00 372.00 1.3% Bitcode/Be...hmarks/Halide/blur/halide_blur 4.00 4.00 4.00 0.0% MultiSourc...s/Prolangs-C++/objects/objects 1.00 1.00 1.00 0.0% MultiSourc...olangs-C/unix-smail/unix-smail 8.00 8.00 8.00 0.0% MultiSourc...s/Prolangs-C/football/football 6.00 6.00 6.00 0.0% MultiSourc...s/Prolangs-C/compiler/compiler 2.00 2.00 2.00 0.0% MultiSourc...marks/Prolangs-C/bison/mybison 19.00 19.00 19.00 0.0% MultiSourc...Prolangs-C/assembler/assembler 1.00 1.00 1.00 0.0% MultiSourc...rolangs-C/archie-client/archie 2.00 2.00 2.00 0.0% MultiSourc...chmarks/Prolangs-C/agrep/agrep 17.00 17.00 17.00 0.0% MultiSourc...gs-C/TimberWolfMC/timberwolfmc 49.00 49.00 49.00 0.0% Geomean difference 1.1% gvn.NumGVNPRE run base sameunderlying-and-nullptr sameunderlying diff count 111.000000 110.000000 110.000000 111.000000 mean 20.684685 20.654545 20.654545 0.016044 std 46.993226 46.829857 46.829857 0.143183 min 1.000000 1.000000 1.000000 0.000000 25% 2.000000 2.000000 2.000000 0.000000 50% 5.000000 5.000000 5.000000 0.000000 75% 16.000000 16.500000 16.500000 0.000000 max 377.000000 372.000000 372.000000 1.500000 Tests: 2432 Metric: gvn.NumGVNBlocks Program gvn.NumGVNBlocks base sameunderlying-and-nullptr sameunderlying diff MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 69.00 68.00 68.00 1.5% MultiSourc...enchmarks/mafft/pairlocalalign 578.00 578.00 578.00 0.0% SingleSour...enchmarks/BenchmarkGame/n-body 15.00 15.00 15.00 0.0% SingleSour...chmarks/BenchmarkGame/fannkuch 14.00 14.00 14.00 0.0% SingleSour...arks/BenchmarkGame/Large/fasta 2.00 2.00 2.00 0.0% SingleSour...arks/Adobe-C++/stepanov_vector 4.00 4.00 4.00 0.0% SingleSour...Adobe-C++/stepanov_abstraction 4.00 4.00 4.00 0.0% SingleSour...++/simple_types_loop_invariant 2.00 2.00 2.00 0.0% SingleSour.../simple_types_constant_folding 126.00 126.00 126.00 0.0% SingleSour...nchmarks/Adobe-C++/loop_unroll 8.00 8.00 8.00 0.0% SingleSour...arks/Adobe-C++/functionobjects 3.00 3.00 3.00 0.0% MultiSourc...++11/frame_layout/frame_layout 2.00 2.00 2.00 0.0% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 503.00 503.00 503.00 0.0% MultiSource/Benchmarks/nbench/nbench 48.00 48.00 48.00 0.0% MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode 53.00 53.00 53.00 0.0% Geomean difference 0.0% gvn.NumGVNBlocks run base sameunderlying-and-nullptr sameunderlying diff count 291.000000 291.000000 291.000000 291.000000 mean 47.079038 47.075601 47.075601 0.000051 std 130.828991 130.828426 130.828426 0.000862 min 1.000000 1.000000 1.000000 0.000000 25% 2.000000 2.000000 2.000000 0.000000 50% 8.000000 8.000000 8.000000 0.000000 75% 50.000000 50.000000 50.000000 0.000000 max 1476.000000 1476.000000 1476.000000 0.014706 Tests: 2432 Metric: gvn.NumGVNSimpl Program gvn.NumGVNSimpl base sameunderlying-and-nullptr sameunderlying diff MultiSourc...s/Prolangs-C++/objects/objects 4.00 4.00 2.00 100.0% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 3739.00 3002.00 2984.00 25.3% MicroBench.../ImageProcessing/Dither/Dither 34.00 39.00 39.00 14.7% MicroBench...ubsetBLambdaLoops/lcalsBLambda 415.00 474.00 434.00 14.2% MicroBench...CALS/SubsetBRawLoops/lcalsBRaw 415.00 474.00 434.00 14.2% MicroBench...CALS/SubsetCRawLoops/lcalsCRaw 421.00 480.00 440.00 14.0% MicroBench...ubsetCLambdaLoops/lcalsCLambda 426.00 485.00 445.00 13.8% MicroBench...ubsetALambdaLoops/lcalsALambda 430.00 489.00 449.00 13.7% MicroBench...CALS/SubsetARawLoops/lcalsARaw 412.00 468.00 428.00 13.6% MicroBench...teralFiltering/BilateralFilter 38.00 43.00 43.00 13.2% MultiSource/Applications/kimwitu++/kc 702.00 786.00 786.00 12.0% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 2460.00 2740.00 2734.00 11.4% SingleSource/Benchmarks/Misc-C++/bigfib 53.00 59.00 59.00 11.3% MultiSource/Benchmarks/McCat/09-vor/vor 10.00 10.00 9.00 11.1% MultiSourc...ks/FreeBench/analyzer/analyzer 15.00 15.00 16.00 6.7% Geomean difference 0.9% gvn.NumGVNSimpl run base sameunderlying-and-nullptr sameunderlying diff count 311.000000 311.000000 311.000000 311.000000 mean 155.244373 155.379421 154.565916 0.010764 std 511.943543 502.146410 501.382767 0.063660 min 1.000000 1.000000 1.000000 0.000000 25% 6.000000 6.000000 6.000000 0.000000 50% 17.000000 17.000000 17.000000 0.000000 75% 59.000000 60.500000 60.500000 0.000000 max 4872.000000 4871.000000 4871.000000 1.000000 Tests: 2432 Metric: gvn.NumGVNEqProp Program gvn.NumGVNEqProp base sameunderlying-and-nullptr sameunderlying diff MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 66.00 8.00 4.00 1550.0% SingleSour.../Benchmarks/Misc-C++-EH/spirit 49.00 3.00 3.00 1533.3% SingleSour...ks/Misc-C++/stepanov_container 107.00 12.00 11.00 872.7% SingleSource/Benchmarks/Misc-C++/bigfib 18.00 2.00 2.00 800.0% MultiSourc.../DOE-ProxyApps-C++/HPCCG/HPCCG 6.00 1.00 1.00 500.0% MultiSourc.../Benchmarks/McCat/15-trie/trie 6.00 6.00 1.00 500.0% MultiSourc...hmarks/MallocBench/cfrac/cfrac 6.00 6.00 1.00 500.0% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 874.00 175.00 169.00 417.2% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 222.00 75.00 43.00 416.3% MultiSource/Benchmarks/McCat/09-vor/vor 14.00 3.00 366.7% MultiSource/Applications/kimwitu++/kc 293.00 72.00 63.00 365.1% MultiSourc...-ProxyApps-C++/PENNANT/PENNANT 55.00 12.00 12.00 358.3% MicroBench...ubsetCLambdaLoops/lcalsCLambda 122.00 70.00 37.00 229.7% MicroBench...ubsetBLambdaLoops/lcalsBLambda 122.00 70.00 37.00 229.7% MicroBench...CALS/SubsetBRawLoops/lcalsBRaw 122.00 70.00 37.00 229.7% Geomean difference 37.7% gvn.NumGVNEqProp run base sameunderlying-and-nullptr sameunderlying diff count 147.000000 133.000000 126.000000 147.000000 mean 44.442177 33.511278 30.103175 0.809993 std 104.076048 69.462614 60.728356 2.227728 min 1.000000 1.000000 1.000000 0.000000 25% 2.000000 2.000000 2.000000 0.000000 50% 9.000000 9.000000 8.000000 0.000000 75% 36.500000 34.000000 34.000000 0.431444 max 874.000000 509.000000 373.000000 15.500000 Tests: 2432 Metric: gvn.NumPRELoad Program gvn.NumPRELoad base sameunderlying-and-nullptr sameunderlying diff ultiSource/Benchmarks/Olden/tsp/tsp 11.00 7.00 7.00 57.1% ultiSource/Benchmarks/McCat/09-vor/vor 10.00 8.00 8.00 25.0% ultiSource...nchmarks/tramp3d-v4/tramp3d-v4 1133.00 1061.00 1061.00 6.8% ultiSource/Applications/d/make_dparser 191.00 192.00 192.00 0.5% ultiSource...sumer-typeset/consumer-typeset 727.00 726.00 726.00 0.1% ultiSource...VC/Expansion-dbl/Expansion-dbl 6.00 6.00 6.00 0.0% ultiSource/Benchmarks/Ptrdist/ft/ft 3.00 3.00 3.00 0.0% ultiSource/Benchmarks/Ptrdist/ks/ks 3.00 3.00 3.00 0.0% ultiSource...Benchmarks/Ptrdist/yacr2/yacr2 35.00 35.00 35.00 0.0% ultiSource...Benchmarks/SciMark2-C/scimark2 11.00 11.00 11.00 0.0% ultiSource...ontrolFlow-dbl/ControlFlow-dbl 7.00 7.00 7.00 0.0% ultiSource...ontrolFlow-flt/ControlFlow-flt 7.00 7.00 7.00 0.0% ultiSource...lds-dbl/CrossingThresholds-dbl 1.00 1.00 1.00 0.0% ultiSource...lds-flt/CrossingThresholds-flt 1.00 1.00 1.00 0.0% icroBenchmarks/Builtins/Int128/Builtins 12.00 12.00 12.00 0.0% Geomean difference 0.4% gvn.NumPRELoad run base sameunderlying-and-nullptr sameunderlying diff count 194.000000 194.000000 194.000000 194.000000 mean 74.097938 73.695876 73.695876 0.004618 std 186.856476 184.798491 184.798491 0.044925 min 1.000000 1.000000 1.000000 0.000000 25% 2.000000 2.000000 2.000000 0.000000 50% 7.500000 7.000000 7.000000 0.000000 75% 42.000000 42.000000 42.000000 0.000000 max 1133.000000 1061.000000 1061.000000 0.571429 Tests: 2432 Metric: gvn.NumPRELoad Program gvn.NumPRELoad base sameunderlying-and-nullptr sameunderlying diff ultiSource/Benchmarks/Olden/tsp/tsp 11.00 7.00 7.00 57.1% ultiSource/Benchmarks/McCat/09-vor/vor 10.00 8.00 8.00 25.0% ultiSource...nchmarks/tramp3d-v4/tramp3d-v4 1133.00 1061.00 1061.00 6.8% ultiSource/Applications/d/make_dparser 191.00 192.00 192.00 0.5% ultiSource...sumer-typeset/consumer-typeset 727.00 726.00 726.00 0.1% ultiSource...VC/Expansion-dbl/Expansion-dbl 6.00 6.00 6.00 0.0% ultiSource/Benchmarks/Ptrdist/ft/ft 3.00 3.00 3.00 0.0% ultiSource/Benchmarks/Ptrdist/ks/ks 3.00 3.00 3.00 0.0% ultiSource...Benchmarks/Ptrdist/yacr2/yacr2 35.00 35.00 35.00 0.0% ultiSource...Benchmarks/SciMark2-C/scimark2 11.00 11.00 11.00 0.0% ultiSource...ontrolFlow-dbl/ControlFlow-dbl 7.00 7.00 7.00 0.0% ultiSource...ontrolFlow-flt/ControlFlow-flt 7.00 7.00 7.00 0.0% ultiSource...lds-dbl/CrossingThresholds-dbl 1.00 1.00 1.00 0.0% ultiSource...lds-flt/CrossingThresholds-flt 1.00 1.00 1.00 0.0% icroBenchmarks/Builtins/Int128/Builtins 12.00 12.00 12.00 0.0% Geomean difference 0.4% gvn.NumPRELoad run base sameunderlying-and-nullptr sameunderlying diff count 194.000000 194.000000 194.000000 194.000000 mean 74.097938 73.695876 73.695876 0.004618 std 186.856476 184.798491 184.798491 0.044925 min 1.000000 1.000000 1.000000 0.000000 25% 2.000000 2.000000 2.000000 0.000000 50% 7.500000 7.000000 7.000000 0.000000 75% 42.000000 42.000000 42.000000 0.000000 max 1133.000000 1061.000000 1061.000000 0.571429 Tests: 2432 Metric: gvn.NumPRELoopLoad Program gvn.NumPRELoopLoad base sameunderlying-and-nullptr sameunderlying diff MicroBench...ubsetALambdaLoops/lcalsALambda 1.00 1.00 1.00 0.0% MultiSourc...marks/Prolangs-C++/simul/simul 1.00 1.00 1.00 0.0% MultiSourc...chmarks/Prolangs-C/agrep/agrep 4.00 4.00 4.00 0.0% MultiSourc...marks/Prolangs-C/bison/mybison 17.00 17.00 17.00 0.0% MultiSourc...chmarks/Prolangs-C/cdecl/cdecl 1.00 1.00 1.00 0.0% MultiSourc...s/Prolangs-C/compiler/compiler 1.00 1.00 1.00 0.0% MultiSourc...s/Prolangs-C/football/football 1.00 1.00 1.00 0.0% MultiSourc...Prolangs-C/simulator/simulator 1.00 1.00 1.00 0.0% MultiSourc...hmarks/Ptrdist/anagram/anagram 1.00 1.00 1.00 0.0% MultiSourc...Benchmarks/Ptrdist/yacr2/yacr2 4.00 4.00 4.00 0.0% MultiSourc...enchmarks/mafft/pairlocalalign 5.00 5.00 5.00 0.0% MultiSourc...iabench/g721/g721encode/encode 1.00 1.00 1.00 0.0% MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode 1.00 1.00 1.00 0.0% MultiSource/Benchmarks/nbench/nbench 1.00 1.00 1.00 0.0% SingleSour...nchmarks/Adobe-C++/loop_unroll 130.00 130.00 130.00 0.0% Geomean difference 0.0% gvn.NumPRELoopLoad run base sameunderlying-and-nullptr sameunderlying diff count 45.000000 45.000000 45.000000 45.0 mean 15.711111 15.711111 15.711111 0.0 std 37.860163 37.860163 37.860163 0.0 min 1.000000 1.000000 1.000000 0.0 25% 1.000000 1.000000 1.000000 0.0 50% 1.000000 1.000000 1.000000 0.0 75% 6.000000 6.000000 6.000000 0.0 max 168.000000 168.000000 168.000000 0.0
The one case I'm probably worried about most are checks of the form %p == null, where replacing %p with null is not generally legal because it has nullary provenance. Replacing two arbitrary pointers likely isn't super important, but replacement with null might be.
I was looking at https://llvm.org/devmtg/2018-04/slides/Lopes-Sbirlea-Pointers,%20Alias%20and%20ModRef%20Analyses.pdf page 15, which says that replacement with null ptr is safe. So I did two runs, one where we do pointer replacement only when they have same underlying objects, and the another where we also replace pointers with null.
The validity of this transformation depends on the definition of a null pointer which is in a gray area.
The slide is based on the following memory model: (1) we can safely replace p with inttoptr (ptrtoint p). (2) null pointer is equivalent to inttoptr 0 (or inttoptr c for some special, fixed constant).
In this model, replacing p with null is fine.
I think we can talk about this in the LLVM discourse group. The numbers look informative and people would like to see them & suggest their ideas.
Thanks a lot for the detailed analysis!
I believe the currently accepted interpretation of null in LLVM is that it has nullary provenance. For example, we rely on this here: https://github.com/llvm/llvm-project/blob/46cdf7f0991280a3601a777a80fbc460196fb033/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L1064 This folds load (gep null, x) to unreachable. Of course, we also fold inttoptr 0 to null, which is inconsistent with that interpretation.
As @nlopes suggested, this should probably be discussed on discourse. Giving null universal provenance seems wrong from a principled perspective, but it may well be the most pragmatic option. (I don't think we rely on nullary provenance in many places, not sure there are any beyond the one I shared, and for that one we can probably recover most practically useful cases with an inbounds requirement.)
Here are the diffs for llvm-test-suite bitcode:
https://github.com/UsmanNadeem/vigilant-octo-journey/commit/125698f591df1f8de81ad4119b0838e135a18af8
and
https://github.com/UsmanNadeem/vigilant-octo-journey/commit/a778479334852e7f05b5a248da4473106aa986fb
Oh, that's a nice way to analyze differences. How did you produce the output?
I looked through the diffs a bit, and while most of them are immaterial diffs (just replacing one pointer with another without further simplification), it looks like there are also many genuine regressions in optimization quality. The first one is https://github.com/UsmanNadeem/vigilant-octo-journey/commit/a778479334852e7f05b5a248da4473106aa986fb#diff-429f0aa0b366d469c8f0fe4aaded9e87e83898118f192b6741caf4100e4e7792. I think the logic here is that we have a icmp eq ptr %link.112821, %48 condition on the edge, and then check
%313 = phi ptr [ %.pre12846, %for.end11932.loopexit ], [ %link.112821, %if.end11920 ] %cmp11936 = icmp ne ptr %313, %48
so if we replace %link.112821 with %48 here, we see that the condition is always true when coming from %if.end11920 and that allows us to thread the branch. This is a legal optimization, because icmp is provenance independent.
From a quick glance, the same basic pattern also occurs in many other IR diffs, so it seems like this is something important to preserve, though I'm not entirely sure what the best way to only propagating the equality in provenance-independent contexts would be (doing a recursive user walk seems rather iffy, but I don't have a better idea).
I compiled the test suite using -O3 with the base toolchain and made a list of files where GVN did pointer replacements by adding some debug output in that function. Copied the compile commands for those files from the make output and then used those to dump the IR for the base toolchain and toolchains with various patches.
Discourse link: https://discourse.llvm.org/t/restricting-equality-propagation-for-pointers-whats-the-best-way-how-to-deal-with-regressions/68619
llvm/lib/Transforms/Scalar/GVN.cpp | ||
---|---|---|
2562 | As @nikic suggested in the Discourse thread, do you want to fix patchAndReplaceAllUsesWith so that if it is a pointer it allows replacing uses at instructions that are unaware of provenance? I wonder how things will go. | |
llvm/test/Transforms/GVN/condprop.ll | ||
591 | Nit: could you avoid using br i1 undef? This is UB and there is this side project: https://llvm.org/OpenProjects.html#remove_ub_tests |
- Added a new method replaceDominatedUsesWithIf() that takes a callback.
- Updated canReplacePointersIfEqual() to return true for the following:
- Uses in ICmpInst and PtrToIntInst
- Replacements with null
- When underlying objects are the same
- Update tests.
- New LLVM test suite bitcode diff using O3 (assembly files were more difficult to analyze): https://github.com/UsmanNadeem/vigilant-octo-journey/commit/65a770a65eca4ed2cb37f6ff6a36ec6193fe8b73
Not sure about the changes in the assume-equal.ll test.
llvm/lib/Transforms/Scalar/GVN.cpp | ||
---|---|---|
2562 | Maybe in a separate patch? I tried doing assembly files but the changes were more extensive due to register assignment changes. |
Thanks! Yes, indeed there are still many files having diffs. But the number shrunk to ~1/5, which seems nice!
I looked at the diffs, and I could see a few replacements in the past that seem buggy...
In 82.ll:
%call103 = tail call ptr @encodingLine(ptr noundef %messageIn) #17 %22 = load ptr, ptr %t_next, align 8, !tbaa !30 %cmp105 = icmp eq ptr %call103, %22 br i1 %cmp105, label %if.then107, label %do.cond if.then107: %23 = load ptr, ptr %22, align 8, !tbaa !28 // Replacing %22 with %call103 is illegal
encodingLine does not have a function body as well as meaningful attributes, so it can return any pointer. This patch fixes the bug, which is good.
In 73.ll: I think we can make an improvement.
%call.i = tail call noalias dereferenceable_or_null(64) ptr @malloc(i64 noundef 64) #35 %tobool.not = icmp eq ptr %call.i, null ... cond.end.i.i: // At this point %call.i isn't null. %5 = load ptr, ptr %arrayidx6.i.i, align 8, !tbaa !5 %cmp.i.i = icmp eq ptr %5, %call.i br i1 %cmp.i.i, label %_Z15yy_flush_bufferP15yy_buffer_state.exit.thread.i,.... _Z15yy_flush_bufferP15yy_buffer_state.exit.thread.i: ; preds = %cond.end.i.i %yy_n_chars.i.i.i = getelementptr inbounds %struct.yy_buffer_state, ptr %5, i64 0, i32 4 // <- This can be replaced with 'gep inbounds %call.i, 0, 4'. %6 = load i32, ptr %yy_n_chars.i.i.i, align 4, !tbaa !13
Replacing p at load (gep inbounds p, ofs) with q at is fine if (1) q points to the beginning address of a live object (e.g. unfreed malloc), and (2) ofs is non-negative.
To guarantee liveness (1), you will probably need to check whether q hasn't been escaped, because an unknown function call may deallocate it.
My reasoning is like this:
- If p was a freed (dead) pointer, load would have already been UB, so q can be anything.
- If p was an unfreed (live) pointer, q must also be a live object because otherwise the replacement will make load introduce UB. If p and q are both live:
- If p and q pointed to the same object: replacement is trivially correct
- If p and q pointed to different objects: since p == q was true, the two different objects must be adjacent in their memory layout, and one of q or p must point to the beginning address of a live object. 73.ll corresponds to q pointing to 'the beginning address of a live object' case.
I hope I didn't miss something obvious in 73.ll & this replacement helps.
llvm/lib/Analysis/Loads.cpp | ||
---|---|---|
726 | Could you elaborate a bit about isa<PHINode>(User) condition? BB1: %phi1 = phi [%p, <pred>], .. br BB2 BB2: %phi2 = phi [%phi1, <BB1>], .. store i32 0, ptr %phi2 |
llvm/lib/Analysis/Loads.cpp | ||
---|---|---|
726 | I was trying to handle something like the example below. I guess I didn't think about the cases where the second phi's use could have illegal replacements. Not sure what the best way to handle this would be. BB1: %phi1 = phi [%p, <pred>], .. %cond = icmp eq ptr %phi1 , %q <<---- This case br BB2 BB2: %phi2 = phi [%phi1, <BB1>], .. store i32 0, ptr %phi2 |
Hi, I am currently occupied with something else. I will be back by today and continue reviewing this.
Regarding the updated experimental results, I think people might have thoughts about them. I will leave a comment there.
llvm/include/llvm/Transforms/Utils/Local.h | ||
---|---|---|
416 | You'll want to use clang-format. https://llvm.org/docs/Contributing.html#format-patches | |
llvm/lib/Analysis/Loads.cpp | ||
726 | I think dealing with this case is tricky. BB1: %phi1 = phi [%p, <pred>], .. %phi1_copy = phi [%p, <pred>], .. <<-- we can replace %p with %q here %cond = icmp eq ptr %phi1_copy , %q <<---- ... because %phi1_copy's only use is this icmp. br BB2 BB2: %phi2 = phi [%phi1, <BB1>], .. store i32 0, ptr %phi2 |
llvm/test/Transforms/GVN/assume-equal.ll | ||
---|---|---|
56 ↗ | (On Diff #501008) | Instead of the negative checks (which may pass for other reasons) I'd use a positive one with the new output. |
Okay, the test diffs look much better now. Some interesting remaining cases:
- 10.ll: This is another jump threading failure. The problem here is that the phi nodes are used in icmps, but not *only* in icmps, so we can't simply replace the operand there. I'm not sure what to do about this one. We could obviously duplicate the phi node and replace its non-provenance uses, but that's not going to be profitable in general. Possibly there is some specialized jump threading support one could add for this, but this is technically tricky, especially as threading can't use the dominator tree. LVI currently only tracks equality to constants.
- 101.ll (sink) and 103.ll (hoist): Here we have something like if (p == q) { store v, p } else { store v, q } where previously store v, q could be sunk/hoisted out of it. This isn't legal anymore. This is probably unrecoverable at the IR level, but maybe the machine layer handles it (I didn't check).
Overall I'd be willing to try landing this and seeing whether any significant performance issues arise.
llvm/include/llvm/Transforms/Utils/Local.h | ||
---|---|---|
416 | The callback doesn't need From and To -- if it needs them, it can capture them from context. Instead of User * we should pass Use &, which is more general. (E.g. if some operands are provenance-dependent, but others aren't -- I have future ptr_provenance support in mind here). | |
llvm/lib/Analysis/Loads.cpp | ||
789 | I think we should split this up into two APIs: One should stay as previous any just answer the question "can this always be replaced?" and the other should only take a Use and answer the question "is this provenance-independent?" We don't need to perform some of these checks for every single use, and some of the things (e.g. addition to leader table) can only be done if all uses can be replaced. | |
llvm/lib/Transforms/Scalar/GVN.cpp | ||
2328 | I don't think we can do this unless it's legal to unconditionally replace the pointers. |
+1 for landing this patch if there is no significant input from the discourse thread by next Monday (Mar. 13).
- Guard addToLeaderTable with the pointer replacement check.
- Create two APIs
- bool canReplacePointersIfEqual(const Value *From, const Value *To);
- bool canReplacePointersIfEqualInUse(const Use &U, const Value *To);
- For the callback in replaceDominatedUsesWithIf I changed the from argument with a Use. But I still need the to value in the callback.
- Clang format
- Update tests
llvm/include/llvm/Transforms/Utils/Local.h | ||
---|---|---|
416 |
It is already clang-formatted, not sure why the formatting is now different. | |
416 |
I replaced From with Use &, but I still needed the To param. Can you look at loads.cpp to see if it makes sense to use both the use and the to value? | |
llvm/lib/Analysis/Loads.cpp | ||
726 | I fixed the code and added a test case in a previous revision. I think fixing this will need more extensive changes than can be easily done in replaceDominatedUsesWith | |
789 | I created an isPointerUseReplacable(const Use &U but made it a static function which can be exposed in the future. | |
llvm/lib/Transforms/Scalar/GVN.cpp | ||
2562 | I have guarded addToLeaderTable with the check so IIUC this is not needed any more? |
llvm/lib/Analysis/Loads.cpp | ||
---|---|---|
787 | This API should not contain any Use-based reasoning, it should be just the isPointerAlwaysReplacable() part. The leader table replacement isn't valid even if all uses of this particular pointer are fine, as we may end up performing an indirect replacement on additional users. | |
llvm/test/Transforms/GVN/assume-equal.ll | ||
24 ↗ | (On Diff #505980) | Hm, these regressions look problematic. These assumptions are emitted by clang in EmitVTableAssumptionLoad() specifically to allow these replacements. This is going to lose devirtualization support for these cases. We probably need to carve out another exception for this case. Possibly we should just keep the existing exception for constants (i.e. allow if they are deferencable) to make this work. |
Added an exception for constant dereferenceable pointers and changed canReplacePointersIfEqual to not look at the Uses.
To get the dereferenceability of the pointer, Instead of passing the DL around I am trying to get it from the value, let me know if it looks okay.
You'll want to use clang-format. https://llvm.org/docs/Contributing.html#format-patches