
[SimplifyCFG] Added condition assumption for unreachable blocks
ClosedPublic

Authored by xbolva00 on May 1 2019, 3:40 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

xbolva00 created this revision. · May 1 2019, 3:40 PM
Herald added a project: Restricted Project. · May 1 2019, 3:40 PM
xbolva00 updated this revision to Diff 197663. · May 1 2019, 4:00 PM

Update old tests

This comment was removed by xbolva00.

Please precommit the check-line regeneration

nikic added a subscriber: nikic. · May 2 2019, 12:44 AM

This is a good idea in theory, but I'm somewhat concerned that this will backfire in practice because assumes often pessimize optimization. For example LLVM has a really bad offender where it inserts assumes for alignment annotations during inlining -- thankfully there's an option to disable it, because it's a major perf regression otherwise.

This feels like it should be less problematic though.

> This is a good idea in theory, but I'm somewhat concerned that this will backfire in practice because assumes often pessimize optimization. For example LLVM has a really bad offender where it inserts assumes for alignment annotations during inlining -- thankfully there's an option to disable it, because it's a major perf regression otherwise.

Pessimization of the final produced code, or of LLVM itself?

> This feels like it should be less problematic though.

nikic added a comment. · May 2 2019, 12:59 AM

> This is a good idea in theory, but I'm somewhat concerned that this will backfire in practice because assumes often pessimize optimization. For example LLVM has a really bad offender where it inserts assumes for alignment annotations during inlining -- thankfully there's an option to disable it, because it's a major perf regression otherwise.

> Pessimization of the final produced code, or of the llvm itself?

Of the final produced code. I think it's mainly because assumes cause one-use checks to fail.

> This is a good idea in theory, but I'm somewhat concerned that this will backfire in practice because assumes often pessimize optimization. For example LLVM has a really bad offender where it inserts assumes for alignment annotations during inlining -- thankfully there's an option to disable it, because it's a major perf regression otherwise.

> Pessimization of the final produced code, or of the llvm itself?

> Of the final produced code. I think it's mainly because assumes cause one-use checks to fail.

Oh, good point. It would make sense to not count @llvm.assume() during use-checking,
but then this isn't just about the call void @llvm.assume(), but about all the instructions
that would be dropped when the call is dropped during back-end lowering...

xbolva00 added a comment (edited). · May 2 2019, 1:10 AM

Since llvm.assume is not really a true use, the oneUse check fails? Then we should really fix it to not count such uses.

xbolva00 added a comment (edited). · May 2 2019, 10:36 AM

It would be good if anyone familiar with LLVM's PatternMatch could fix m_OneUse to ignore llvm.assume.

Then this patch would not cause any problems or regressions.

@spatel @RKSimon

> It would be good if anybody familiar with llvm patternmatch could fix m_OneUse to ignore llvm.assume.

> This is a good idea in theory, but I'm somewhat concerned that this will backfire in practice because assumes often pessimize optimization. For example LLVM has a really bad offender where it inserts assumes for alignment annotations during inlining -- thankfully there's an option to disable it, because it's a major perf regression otherwise.

> Pessimization of the final produced code, or of the llvm itself?

> Of the final produced code. I think it's mainly because assumes cause one-use checks to fail.

> Oh, good point. It would make sense to not count @llvm.assume() during use-checking,
> but then this isn't just about the call void @llvm.assume(), but about all the instructions
> that would be dropped when the call is dropped during back-end lowering...

I.e. just ignoring the @llvm.assume() itself is only the tip of the iceberg.

> Then this patch will not cause any problems or regressions..

> @spatel @RKSimon

> It would be good if anybody familiar with llvm patternmatch could fix m_OneUse to ignore llvm.assume.

> This is a good idea in theory, but I'm somewhat concerned that this will backfire in practice because assumes often pessimize optimization. For example LLVM has a really bad offender where it inserts assumes for alignment annotations during inlining -- thankfully there's an option to disable it, because it's a major perf regression otherwise.

> Pessimization of the final produced code, or of the llvm itself?

> Of the final produced code. I think it's mainly because assumes cause one-use checks to fail.

> Oh, good point. It would make sense to not count @llvm.assume() during use-checking,
> but then this isn't just about the call void @llvm.assume(), but about all the instructions
> that would be dropped when the call is dropped during back-end lowering...

> I.e. just ignoring only the @llvm.assume() itself is only tip of the iceberg.

I agree that this is a (much) bigger problem than fixing the pattern matcher. I have no instinct or data to say how much inserted assumes can or do harm optimization, but this is a long-standing problem with that implementation. @hfinkel or @reames may have a better idea.

Any advice on next steps for this patch?

If inserted llvm.assume is a big problem and "fixing" the oneUse check is not a proper solution, then this patch is dead.

> Any advices about next steps for this patch?

> if inserted llvm.assume is a big problem and “fixing” oneUse match check is not proper solution, then this patch is dead..

In general, automatically inserting llvm.assume must be done with extreme care, as it can pessimize optimizations. We try to do this only in cases where the user has specifically indicated that there is some information relevant to optimization that is worth preserving. In this case, it might be worth doing, because if (x) unreachable is the recommended "assume" pattern for use with GCC.

xbolva00 abandoned this revision. · May 15 2019, 1:29 PM

> Any advices about next steps for this patch?

> if inserted llvm.assume is a big problem and “fixing” oneUse match check is not proper solution, then this patch is dead..

> In general, automatically llvm.assume much be done with extreme care, as it can pessimize optimizations. We try do this only in cases where the user as specifically indicated that there is some information relevant to optimization that is worth preserving.

> In this case, it might be worth doing, because the if (x) unreachable is the recommended "assume" pattern for use with GCC.

It looked like this patch was moving in the right direction?

Possibly. But more fixes are needed as preconditions to enable this, as you mentioned here (fix the oneUse check). Without them, we cannot push this further.

xbolva00 reclaimed this revision. · May 15 2019, 1:57 PM
nikic added a comment. · May 15 2019, 2:07 PM

Just to be clear, I didn't mean to say that we shouldn't do this, just that it is not guaranteed to be a win in the end. If it's possible to do some end-to-end performance testing with this patch to make sure that it doesn't come with unexpected regressions, I see no reason not to move forward with this change. The problem with assumes is a long-standing one and I don't think it has any simple solution.

> Just to be clear, I didn't mean to say that we shouldn't do this, just that it is not guaranteed to be a win in the end. If it's possible to do some end-to-end performance testing with this patch to make sure that it doesn't come with unexpected regressions, I see no reason not to move forward with this change. The problem with assumes is a long-standing one and I don't think it has any simple solution.

That was my take, too.

$ ../test-suite/utils/compare.py baseline.json
Tests: 1163
Metric: exec_time

Program baseline
LCALS/Subs...aw.test:BM_MAT_X_MAT_RAW/44217 868459.97
LCALS/Subs...test:BM_MAT_X_MAT_LAMBDA/44217 563058.65
ImageProce...t:BENCHMARK_GAUSSIAN_BLUR/1024 99965.67
ImageProce...HMARK_ANISTROPIC_DIFFUSION/256 77470.40
harris/har...est:BENCHMARK_HARRIS/2048/2048 52498.97
ImageProce...MARK_BICUBIC_INTERPOLATION/256 44359.86
LCALS/Subs...Raw.test:BM_MAT_X_MAT_RAW/5001 25499.45
ImageProce...st:BENCHMARK_GAUSSIAN_BLUR/512 24021.84
LCALS/Subs....test:BM_MAT_X_MAT_LAMBDA/5001 20025.00
ImageProce...HMARK_ANISTROPIC_DIFFUSION/128 18171.17
harris/har...est:BENCHMARK_HARRIS/1024/1024 15093.33
ImageProce...ate.test:BENCHMARK_DILATE/1024 12321.46
ImageProce...ARK_BILINEAR_INTERPOLATION/256 11589.59
ImageProce...MARK_BICUBIC_INTERPOLATION/128 10496.46
LCALS/Subs...Raw.test:BM_HYDRO_2D_RAW/44217 8448.96

baseline

count 1147.000000
mean 1727.619647
std 30862.929090
min 0.000000
25% 0.004000
50% 0.008800
75% 3.645378
max 868459.973000
$ ../test-suite/utils/compare.py patch.json
Tests: 1163
Metric: exec_time

Program patch
LCALS/Subs...test:BM_MAT_X_MAT_LAMBDA/44217 648771.10
LCALS/Subs...aw.test:BM_MAT_X_MAT_RAW/44217 549594.39
ImageProce...t:BENCHMARK_GAUSSIAN_BLUR/1024 112422.28
ImageProce...HMARK_ANISTROPIC_DIFFUSION/256 91073.89
harris/har...est:BENCHMARK_HARRIS/2048/2048 52924.20
ImageProce...MARK_BICUBIC_INTERPOLATION/256 50989.07
ImageProce...st:BENCHMARK_GAUSSIAN_BLUR/512 26065.95
LCALS/Subs...Raw.test:BM_MAT_X_MAT_RAW/5001 23588.53
ImageProce...HMARK_ANISTROPIC_DIFFUSION/128 21120.28
LCALS/Subs....test:BM_MAT_X_MAT_LAMBDA/5001 20901.71
ImageProce...ARK_BILINEAR_INTERPOLATION/256 15573.54
harris/har...est:BENCHMARK_HARRIS/1024/1024 14383.34
ImageProce...ate.test:BENCHMARK_DILATE/1024 12928.17
ImageProce...MARK_BICUBIC_INTERPOLATION/128 10768.69
LCALS/Subs...Raw.test:BM_HYDRO_2D_RAW/44217 9542.02

patch

count 1147.000000
mean 1571.500067
std 25580.275818
min 0.000000
25% 0.004000
50% 0.008900
75% 3.815259
max 648771.096000

fhahn added a subscriber: fhahn. · May 19 2019, 2:35 PM

You can use something like the command below to get a comparison between 2 sets of runs. Also, doing single runs only will probably result in quite noisy results.

./test-suite/utils/compare.py -m exec_time patch.json vs master.json

> You can use something like the command below to get a comparison between 2 sets of runs. Also, doing single runs only will probably result in quite noisy results.

> ./test-suite/utils/compare.py -m exec_time patch.json vs master.json

Not sure why this crashes for me:
/home/xbolva00/.local/lib/python2.7/site-packages/scipy/stats/stats.py:308: RuntimeWarning: divide by zero encountered in log

log_a = np.log(np.array(a, dtype=dtype))

/home/xbolva00/.local/lib/python2.7/site-packages/numpy/core/_methods.py:75: RuntimeWarning: invalid value encountered in reduce

ret = umr_sum(arr, axis, dtype, out, keepdims)

Anyway, I used --filter-short. I ran and compared it 3x:
https://pastebin.com/kjFBv2ZP

Yes, a bit noisy, but I think there are some stable improvements.

Ping

I think the results are okay.

nikic added a subscriber: dmgreen. · May 25 2019, 3:00 AM

These results look pretty noisy, so it's hard to say. Maybe someone who already has a good benchmarking setup (maybe @dmgreen?) could run some tests?

nikic added a comment.May 25 2019, 3:03 AM

Independently of that, could you please precommit your new test, as well as the regenerated test checks? Especially for the LoopVectorize/if-pred-stores.ll test it's not clear what has actually changed.

RKSimon edited reviewers, added: hfinkel, reames; removed: RKSimon. · May 25 2019, 5:04 AM
xbolva00 updated this revision to Diff 201398. · May 25 2019, 6:26 AM

Rebased on updated test checks.

Some tests are failing
Failing Tests (4):

LLVM :: CodeGen/Hexagon/bit-visit-flowq.ll
LLVM :: CodeGen/Hexagon/rdf-ignore-undef.ll
LLVM :: CodeGen/Hexagon/reg-scavengebug.ll
LLVM :: CodeGen/Hexagon/regalloc-block-overlap.ll

The root cause: the tests contain "br undef", which is now optimized away. Not sure how to proceed here.

nikic added a comment.May 25 2019, 6:43 AM

> The root cause: tests contain "br undef" and this is now optimized away. Not sure how to progress here..

This is a common problem with bugpoint reduced tests. You'll have to update these tests (as well as if-pred-stores.ll) either by replacing unreachables with something else (say a dummy return), or replacing the undef branch with a proper condition (e.g. pass in a bool as arg), while trying to preserve the original intention of the test.

hfinkel accepted this revision. · May 25 2019, 1:41 PM

This LGTM. Please watch the LNT bots carefully and revert if we find performance regressions. A fear of performance regressions is specifically why we didn't do this when the assumption facility was first introduced.

This revision is now accepted and ready to land. · May 25 2019, 1:41 PM

Thanks!

I will wait a few days in case @dmgreen wants to do more benchmarks.

Hello. Looks fine from the tests I ran. For code size too, which can be a good way to get a low-noise indication of how things change.

This revision was automatically updated to reflect the committed changes.

Thank you!

Committed.

Please also see https://bugs.llvm.org/show_bug.cgi?id=42019 if you have any ideas how to handle intrinsics in use-checking code.