This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add test
ClosedPublic

Authored by piotr on Jun 27 2019, 1:31 AM.

Details

Summary

Add test checking that the redundant immediate MOV instruction
(by-product of handling phi nodes) is not found in the generated code.

Diff Detail

Event Timeline

piotr created this revision.Jun 27 2019, 1:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2019, 1:31 AM
arsenm added inline comments.Jul 1 2019, 7:50 AM
lib/CodeGen/MachineCSE.cpp
482–490

I'm not sure the phis only part is needed. It would make sense to me to check the successor blocks if it's trivial (i.e. the successor only has the one predecessor here)

test/CodeGen/AMDGPU/cse-phi-incoming-val.ll
14

There should probably be additional MIR tests for this. I wouldn't trust that the control flow block layout we emit know will remain constant

piotr updated this revision to Diff 207498.Jul 2 2019, 3:20 AM

V2: Removing condition that BB has to contain phis only. Added a MIR test.

piotr marked 2 inline comments as done.Jul 2 2019, 3:24 AM
piotr added inline comments.
lib/CodeGen/MachineCSE.cpp
482–490

Good point, I removed the condition that BB has to contain phis only. I am checking that the BB has only one successor. However, that successor might have more than one predecessor, as the pattern I am matching here is a block containing phi nodes, and such blocks typically have more than one predecessor.

test/CodeGen/AMDGPU/cse-phi-incoming-val.ll
14

Added a MIR test.

piotr updated this revision to Diff 207500.Jul 2 2019, 3:25 AM

V3: Updating commit message.

piotr edited the summary of this revision. (Show Details)Jul 2 2019, 3:27 AM

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

piotr added a comment.Jul 8 2019, 4:26 PM

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

Hi Anton, Yes, I have just run regression testing on the test-suite and the tests pass ("Expected Passes : 917").

arsenm added inline comments.Jul 8 2019, 4:34 PM
lib/CodeGen/MachineCSE.cpp
482–490

I'm not sure if this should include the phi check or not. More sample cases stressing the different numbers of successors and other instructions would be good?

test/CodeGen/AMDGPU/cse-phi-incoming-val.mir
42–51

Can you trim some of these instructions?

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

Hi Anton, Yes, I have just run regression testing on the test-suite and the tests pass ("Expected Passes : 917").

Hmm, I'm talking about performance benchmarking, from llvm Test Suite: https://llvm.org/docs/TestingGuide.html#test-suite.

piotr added a comment.Jul 10 2019, 4:59 AM

Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.

Hi Anton, Yes, I have just run regression testing on the test-suite and the tests pass ("Expected Passes : 917").

Hmm, I'm talking about performance benchmarking, from llvm Test Suite: https://llvm.org/docs/TestingGuide.html#test-suite.

Sorry, I referred to the entire test-suite. For MicroBenchmarks only (llvm-lit -v -j 1 -o patch.json MicroBenchmarks) I get the following detailed results:

$python3 ../../test-suite/utils/compare.py --filter-short orig.json patch.json
Warning: 'test-suite :: MicroBenchmarks/XRay/FDRMode/fdrmode-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/FDRMode/fdrmode-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' has No metrics!
Tests: 248
Metric: exec_time

Program orig patch diff
test-suite...mbda.test:BM_PIC_2D_LAMBDA/171 3.18 0.88 -72.2%
test-suite...alsCRaw.test:BM_PIC_2D_RAW/171 1.77 0.89 -49.8%
test-suite...Raw.test:BM_MAT_X_MAT_RAW/5001 4520.23 6770.15 49.8%
test-suite...aw.test:BM_MULADDSUB_RAW/44217 47.74 69.22 45.0%
test-suite...lsBRaw.test:BM_INIT3_RAW/44217 33.57 47.99 42.9%
test-suite...ambda.test:BM_EOS_LAMBDA/44217 46.45 58.30 25.5%
test-suite...ENCHMARK_BILATERAL_FILTER/16/2 22.35 27.97 25.2%
test-suite...ENCHMARK_BILATERAL_FILTER/16/4 68.97 85.68 24.2%
test-suite...ENCHMARK_BILATERAL_FILTER/32/2 103.48 127.84 23.5%
test-suite...ENCHMARK_BILATERAL_FILTER/32/4 381.09 468.47 22.9%
test-suite...mbda.test:BM_ICCG_LAMBDA/44217 32.80 39.22 19.6%
test-suite...st:BM_INT_PREDICT_LAMBDA/44217 366.07 434.41 18.7%
test-suite...est:BM_INT_PREDICT_LAMBDA/5001 14.69 16.82 14.5%
test-suite...BRaw.test:BM_IF_QUAD_RAW/44217 148.05 168.47 13.8%
test-suite....test:BENCHMARK_HARRIS/256/256 389.53 346.90 -10.9%
Geomean difference nan%
orig patch diff
count 232.000000 232.000000 232.000000
mean 2081.547998 2091.565842 0.007603
std 11692.576779 11651.097661 0.091756
min 0.030994 0.030717 -0.722465
25% 3.198364 2.967566 -0.012795
50% 36.165086 39.408759 -0.000642
75% 318.340952 318.455341 0.009107
max 115341.083667 113080.746500 0.497745

piotr added inline comments.Jul 10 2019, 5:04 AM
lib/CodeGen/MachineCSE.cpp
482–490

My feeling is that without the phi check this heuristic would be too generic and will cause regressions, but can double check. The aim of this change is to target phi nodes re-using the same arguments.
As for the second part, do you mean to add more tests?

test/CodeGen/AMDGPU/cse-phi-incoming-val.mir
42–51

Will do.

piotr updated this revision to Diff 210522.Jul 18 2019, 3:19 AM

V4: Simplified the condition and test.

piotr retitled this revision from [MachineCSE] Improve CSE on phi node incoming value to [MachineCSE] Extend CSE heuristic.Jul 18 2019, 3:20 AM
piotr edited the summary of this revision. (Show Details)
arsenm added inline comments.Jul 18 2019, 8:02 AM
lib/CodeGen/MachineCSE.cpp
482–484

I'm having a bit too much trouble following the heuristic with the two loops. Can you split the first loop into a function to return a possible alternative block? The second loop would then remain as-is, except checking UseMI.getParent() == OrigParent || == AlternativeParent

484–489

Can you re-flow this loop into separate BBUses and !BBUses case?

piotr updated this revision to Diff 210802.Jul 19 2019, 5:02 AM
piotr edited the summary of this revision. (Show Details)

V5: Refactored code.

piotr marked 2 inline comments as done.Jul 29 2019, 3:57 AM

Ping

arsenm added a comment.Aug 5 2019, 7:38 AM

Mostly LGTM, except for the empty block case I'm worried about

lib/CodeGen/MachineCSE.cpp
422

Extra empty line

test/CodeGen/AMDGPU/cse-phi-incoming-val.mir
74

Can you add another test with a totally empty successor block? I think those should also be ignored

Any update on this?

piotr added a comment.Aug 30 2019, 6:02 AM

Thank you for bearing with me, I have been away.

Now, when I re-checked the original case from D63709 this morning (cse-phi-incoming-val.ll) I noticed that the extra mov is not present, even without my patch. It turns out that D63198 (which was pushed recently) made the original issue go away. I guess, the fact that the issue was in the structurizer kind of explains the observation that other targets seemed to not suffer from similar problems.

Anyway, I intend to just push cse-phi-incoming-val.ll as part of the current review.

piotr updated this revision to Diff 218123.Aug 30 2019, 9:58 AM

Reverting the patch and leaving the test only as it passes without the change in code.

piotr retitled this revision from [MachineCSE] Extend CSE heuristic to [AMDGPU] Add test.Aug 30 2019, 9:59 AM
piotr edited the summary of this revision. (Show Details)
arsenm accepted this revision.Aug 30 2019, 10:05 AM

LGTM

This revision is now accepted and ready to land.Aug 30 2019, 10:05 AM
This revision was automatically updated to reflect the committed changes.