Page MenuHomePhabricator

[MachineCSE] Extend CSE heuristic
Needs ReviewPublic

Authored by piotr on Jun 27 2019, 1:31 AM.

Details

Summary

Extend CSE heuristic, if all uses of a value are in a BB
whose only successor contains the new use.

Diff Detail

Event Timeline

piotr created this revision.Jun 27 2019, 1:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2019, 1:31 AM
arsenm added inline comments.Jul 1 2019, 7:50 AM
lib/CodeGen/MachineCSE.cpp
505–513

I'm not sure the phis only part is needed. It would make sense to me to check the successor blocks if it's trivial (i.e. the successor only has the one predecessor here)

test/CodeGen/AMDGPU/cse-phi-incoming-val.ll
14

There should probably be additional MIR tests for this. I wouldn't trust that the control flow block layout we emit know will remain constant

piotr updated this revision to Diff 207498.Jul 2 2019, 3:20 AM

V2: Removing condition that BB has to contain phis only. Added a MIR test.

piotr marked 2 inline comments as done.Jul 2 2019, 3:24 AM
piotr added inline comments.
lib/CodeGen/MachineCSE.cpp
505–513

Good point, I removed the condition that BB has to contain phis only. I am checking that the BB has only one successor. However, that successor might have more than one predecessor, as the pattern I am matching here is a block containing phi nodes, and such blocks typically have more than one predecessor.

test/CodeGen/AMDGPU/cse-phi-incoming-val.ll
14

Added a MIR test.

piotr updated this revision to Diff 207500.Jul 2 2019, 3:25 AM

V3: Updating commit message.

piotr edited the summary of this revision. (Show Details)Jul 2 2019, 3:27 AM

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

piotr added a comment.Jul 8 2019, 4:26 PM

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

Hi Anton, Yes, I have just run regression testing on the test-suite and the tests pass ("Expected Passes : 917").

arsenm added inline comments.Jul 8 2019, 4:34 PM
lib/CodeGen/MachineCSE.cpp
505–513

I'm not sure if this should include the phi check or not. More sample cases stressing the different numbers of successors and other instructions would be good?

test/CodeGen/AMDGPU/cse-phi-incoming-val.mir
43–52

Can you trim some of these instructions?

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

Hi Anton, Yes, I have just run regression testing on the test-suite and the tests pass ("Expected Passes : 917").

Hmm, I'm talking about performance benchmarking, from llvm Test Suite: https://llvm.org/docs/TestingGuide.html#test-suite.

piotr added a comment.Jul 10 2019, 4:59 AM

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

Hi Anton, Yes, I have just run regression testing on the test-suite and the tests pass ("Expected Passes : 917").

Hmm, I'm talking about performance benchmarking, from llvm Test Suite: https://llvm.org/docs/TestingGuide.html#test-suite.

Sorry, I referred to the entire test-suite. For MicroBenchmarks only (llvm-lit -v -j 1 -o patch.json MicroBenchmarks) I get the following detailed results:

$python3 ../../test-suite/utils/compare.py --filter-short orig.json patch.json
Warning: 'test-suite :: MicroBenchmarks/XRay/FDRMode/fdrmode-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/FDRMode/fdrmode-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' has No metrics!
Tests: 248
Metric: exec_time

Program orig patch diff
test-suite...mbda.test:BM_PIC_2D_LAMBDA/171 3.18 0.88 -72.2%
test-suite...alsCRaw.test:BM_PIC_2D_RAW/171 1.77 0.89 -49.8%
test-suite...Raw.test:BM_MAT_X_MAT_RAW/5001 4520.23 6770.15 49.8%
test-suite...aw.test:BM_MULADDSUB_RAW/44217 47.74 69.22 45.0%
test-suite...lsBRaw.test:BM_INIT3_RAW/44217 33.57 47.99 42.9%
test-suite...ambda.test:BM_EOS_LAMBDA/44217 46.45 58.30 25.5%
test-suite...ENCHMARK_BILATERAL_FILTER/16/2 22.35 27.97 25.2%
test-suite...ENCHMARK_BILATERAL_FILTER/16/4 68.97 85.68 24.2%
test-suite...ENCHMARK_BILATERAL_FILTER/32/2 103.48 127.84 23.5%
test-suite...ENCHMARK_BILATERAL_FILTER/32/4 381.09 468.47 22.9%
test-suite...mbda.test:BM_ICCG_LAMBDA/44217 32.80 39.22 19.6%
test-suite...st:BM_INT_PREDICT_LAMBDA/44217 366.07 434.41 18.7%
test-suite...est:BM_INT_PREDICT_LAMBDA/5001 14.69 16.82 14.5%
test-suite...BRaw.test:BM_IF_QUAD_RAW/44217 148.05 168.47 13.8%
test-suite....test:BENCHMARK_HARRIS/256/256 389.53 346.90 -10.9%
Geomean difference nan%
orig patch diff
count 232.000000 232.000000 232.000000
mean 2081.547998 2091.565842 0.007603
std 11692.576779 11651.097661 0.091756
min 0.030994 0.030717 -0.722465
25% 3.198364 2.967566 -0.012795
50% 36.165086 39.408759 -0.000642
75% 318.340952 318.455341 0.009107
max 115341.083667 113080.746500 0.497745

piotr added inline comments.Jul 10 2019, 5:04 AM
lib/CodeGen/MachineCSE.cpp
505–513

My feeling is that without the phi check this heuristic would be too generic and will cause regressions, but can double check. The aim of this change is to target phi nodes re-using the same arguments.
As for the second part, do you mean to add more tests?

test/CodeGen/AMDGPU/cse-phi-incoming-val.mir
43–52

Will do.

piotr updated this revision to Diff 210522.Jul 18 2019, 3:19 AM

V4: Simplified the condition and test.

piotr retitled this revision from [MachineCSE] Improve CSE on phi node incoming value to [MachineCSE] Extend CSE heuristic.Jul 18 2019, 3:20 AM
piotr edited the summary of this revision. (Show Details)
arsenm added inline comments.Jul 18 2019, 8:02 AM
lib/CodeGen/MachineCSE.cpp
505–507

I'm having a bit too much trouble following the heuristic with the two loops. Can you split the first loop into a function to return a possible alternative block? The second loop would then remain as-is, except checking UseMI.getParent() == OrigParent || == AlternativeParent

507–512

Can you re-flow this loop into separate BBUses and !BBUses case?

piotr updated this revision to Diff 210802.Jul 19 2019, 5:02 AM
piotr edited the summary of this revision. (Show Details)

V5: Refactored code.

piotr marked 2 inline comments as done.Mon, Jul 29, 3:57 AM

Ping

arsenm added a comment.Mon, Aug 5, 7:38 AM

Mostly LGTM, except for the empty block case I'm worried about

lib/CodeGen/MachineCSE.cpp
422

Extra empty line

test/CodeGen/AMDGPU/cse-phi-incoming-val.mir
73

Can you add another test with a totally empty successor block? I think those should also be ignored