PR41688
This is a good idea in theory, but I'm somewhat concerned that this will backfire in practice because assumes often pessimize optimization. For example LLVM has a really bad offender where it inserts assumes for alignment annotations during inlining -- thankfully there's an option to disable it, because it's a major perf regression otherwise.
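For reference, the alignment case looks roughly like this in the IR: to preserve an align attribute across inlining, the inliner has to materialize a whole chain of instructions just to feed the assume (a sketch; value names are made up):

  %ptrint = ptrtoint i8* %ptr to i64    ; exists only to feed the assume
  %maskedptr = and i64 %ptrint, 15      ; align 16 -> mask of 15
  %maskcond = icmp eq i64 %maskedptr, 0
  call void @llvm.assume(i1 %maskcond)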
This feels like it should be less problematic though.
Pessimization of the final produced code, or of LLVM itself?
> This feels like it should be less problematic though.
Of the final produced code. I think it's mainly because assumes cause one-use checks to fail.
Oh, good point. It would make sense to not count @llvm.assume() during use-checking, but then this isn't just about the call void @llvm.assume() itself, but about all the instructions that would be dropped when the call is dropped during back-end lowering...
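A minimal IR sketch of both issues (function and value names hypothetical):

  define i32 @f(i32 %a, i32 %b) {
    %x = add i32 %a, %b           ; some intermediate value
    %c = icmp ne i32 %x, 0        ; exists only to feed the assume
    call void @llvm.assume(i1 %c)
    %d = sdiv i32 100, %x         ; without the assume chain, %d would be
    ret i32 %d                    ; the only use of %x; with it, %x has two
  }                               ; uses and hasOneUse()-guarded folds fail
  declare void @llvm.assume(i1)

When the assume is dropped during lowering, %c becomes dead as well, but by then the one-use-gated folds have already been missed.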
Since llvm.assume is not really a true use, the oneUse check fails? Then we should really fix it to not count such uses.
Any advice about next steps for this patch?
If the inserted llvm.assume is a big problem and "fixing" the oneUse match check is not the proper solution, then this patch is dead..
In general, automatically inserting llvm.assume must be done with extreme care, as it can pessimize optimizations. We try to do this only in cases where the user has specifically indicated that there is some information relevant to optimization that is worth preserving. In this case, it might be worth doing, because if (x) unreachable is the recommended "assume" pattern for use with GCC.
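Roughly, in IR terms (a sketch; function and block names made up), the frontend emits that pattern as a conditional branch to an unreachable block:

  define void @g(i1 %cond) {
  entry:
    br i1 %cond, label %cont, label %dead
  dead:
    unreachable
  cont:
    ret void
  }

and this patch would fold it to:

  define void @g(i1 %cond) {
  entry:
    call void @llvm.assume(i1 %cond)
    br label %cont
  cont:
    ret void
  }
  declare void @llvm.assume(i1)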
Possibly. But more fixes are needed as preconditions to enable this, as you mentioned here (fixing the oneUse check). Without them, we cannot push this further.
Just to be clear, I didn't mean to say that we shouldn't do this, just that it is not guaranteed to be a win in the end. If it's possible to do some end-to-end performance testing with this patch to make sure that it doesn't come with unexpected regressions, I see no reason not to move forward with this change. The problem with assumes is a long-standing one and I don't think it has any simple solution.
$ ../test-suite/utils/compare.py baseline.json
Tests: 1163
Metric: exec_time
Program baseline
LCALS/Subs...aw.test:BM_MAT_X_MAT_RAW/44217 868459.97
LCALS/Subs...test:BM_MAT_X_MAT_LAMBDA/44217 563058.65
ImageProce...t:BENCHMARK_GAUSSIAN_BLUR/1024 99965.67
ImageProce...HMARK_ANISTROPIC_DIFFUSION/256 77470.40
harris/har...est:BENCHMARK_HARRIS/2048/2048 52498.97
ImageProce...MARK_BICUBIC_INTERPOLATION/256 44359.86
LCALS/Subs...Raw.test:BM_MAT_X_MAT_RAW/5001 25499.45
ImageProce...st:BENCHMARK_GAUSSIAN_BLUR/512 24021.84
LCALS/Subs....test:BM_MAT_X_MAT_LAMBDA/5001 20025.00
ImageProce...HMARK_ANISTROPIC_DIFFUSION/128 18171.17
harris/har...est:BENCHMARK_HARRIS/1024/1024 15093.33
ImageProce...ate.test:BENCHMARK_DILATE/1024 12321.46
ImageProce...ARK_BILINEAR_INTERPOLATION/256 11589.59
ImageProce...MARK_BICUBIC_INTERPOLATION/128 10496.46
LCALS/Subs...Raw.test:BM_HYDRO_2D_RAW/44217 8448.96
baseline
count 1147.000000
mean 1727.619647
std 30862.929090
min 0.000000
25% 0.004000
50% 0.008800
75% 3.645378
max 868459.973000
$ ../test-suite/utils/compare.py patch.json
Tests: 1163
Metric: exec_time
Program patch
LCALS/Subs...test:BM_MAT_X_MAT_LAMBDA/44217 648771.10
LCALS/Subs...aw.test:BM_MAT_X_MAT_RAW/44217 549594.39
ImageProce...t:BENCHMARK_GAUSSIAN_BLUR/1024 112422.28
ImageProce...HMARK_ANISTROPIC_DIFFUSION/256 91073.89
harris/har...est:BENCHMARK_HARRIS/2048/2048 52924.20
ImageProce...MARK_BICUBIC_INTERPOLATION/256 50989.07
ImageProce...st:BENCHMARK_GAUSSIAN_BLUR/512 26065.95
LCALS/Subs...Raw.test:BM_MAT_X_MAT_RAW/5001 23588.53
ImageProce...HMARK_ANISTROPIC_DIFFUSION/128 21120.28
LCALS/Subs....test:BM_MAT_X_MAT_LAMBDA/5001 20901.71
ImageProce...ARK_BILINEAR_INTERPOLATION/256 15573.54
harris/har...est:BENCHMARK_HARRIS/1024/1024 14383.34
ImageProce...ate.test:BENCHMARK_DILATE/1024 12928.17
ImageProce...MARK_BICUBIC_INTERPOLATION/128 10768.69
LCALS/Subs...Raw.test:BM_HYDRO_2D_RAW/44217 9542.02
patch
count 1147.000000
mean 1571.500067
std 25580.275818
min 0.000000
25% 0.004000
50% 0.008900
75% 3.815259
max 648771.096000
You can use something like the command below to get a comparison between 2 sets of runs. Also, doing single runs only will probably result in quite noisy results.
./test-suite/utils/compare.py -m exec_time patch.json vs master.json
Not sure why this crashes for me:
/home/xbolva00/.local/lib/python2.7/site-packages/scipy/stats/stats.py:308: RuntimeWarning: divide by zero encountered in log
log_a = np.log(np.array(a, dtype=dtype))
/home/xbolva00/.local/lib/python2.7/site-packages/numpy/core/_methods.py:75: RuntimeWarning: invalid value encountered in reduce
ret = umr_sum(arr, axis, dtype, out, keepdims)
Anyway, I used --filter-short.. I ran and compared it 3x..
https://pastebin.com/kjFBv2ZP
Yes, a bit noisy, but I think there are some stable improvements..
These results look pretty noisy, so it's hard to say. Maybe someone who already has a good benchmarking setup (maybe @dmgreen?) could run some tests?
Independently of that, could you please precommit your new test, as well as the regenerated test checks? Especially for the LoopVectorize/if-pred-stores.ll test it's not clear what has actually changed.
Rebased on the updated test checks..
Some tests are failing:
Failing Tests (4):
  LLVM :: CodeGen/Hexagon/bit-visit-flowq.ll
  LLVM :: CodeGen/Hexagon/rdf-ignore-undef.ll
  LLVM :: CodeGen/Hexagon/reg-scavengebug.ll
  LLVM :: CodeGen/Hexagon/regalloc-block-overlap.ll
The root cause: the tests contain "br undef" and this is now optimized away. Not sure how to make progress here..
> The root cause: the tests contain "br undef" and this is now optimized away. Not sure how to make progress here..
This is a common problem with bugpoint-reduced tests. You'll have to update these tests (as well as if-pred-stores.ll), either by replacing unreachables with something else (say, a dummy return) or by replacing the undef branch with a proper condition (e.g. pass in a bool as an arg), while trying to preserve the original intention of the test.
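For example (a hypothetical reduced test), instead of

  define void @test() {
  entry:
    br i1 undef, label %bb1, label %bb2
  bb1:
    ret void
  bb2:
    ret void
  }

thread a real condition through as an argument so the CFG shape the test depends on survives:

  define void @test(i1 %cond) {
  entry:
    br i1 %cond, label %bb1, label %bb2
  bb1:
    ret void
  bb2:
    ret void
  }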
This LGTM. Please watch the LNT bots carefully and revert if we find performance regressions. A fear of performance regressions is specifically why we didn't do this when the assumption facility was first introduced.
Hello. Looks fine from the tests I ran. For code size too, which can be a good way to get a low-noise indication of how things change.
Thank you!
Committed.
Please also see https://bugs.llvm.org/show_bug.cgi?id=42019 if you have any ideas how to handle intrinsics in use-checking code.