Add test checking that the redundant immediate MOV instruction
(by-product of handling phi nodes) is not found in the generated code.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/MachineCSE.cpp | ||
---|---|---|
482–490 ↗ | (On Diff #206794) | I'm not sure the phis only part is needed. It would make sense to me to check the successor blocks if it's trivial (i.e. the successor only has the one predecessor here) |
test/CodeGen/AMDGPU/cse-phi-incoming-val.ll | ||
13 ↗ | (On Diff #206794) | There should probably be additional MIR tests for this. I wouldn't trust that the control flow block layout we emit know will remain constant |
lib/CodeGen/MachineCSE.cpp | ||
---|---|---|
482–490 ↗ | (On Diff #206794) | Good point, I removed the condition that BB has to contain phis only. I am checking that the BB has only one successor. However, that successor might have more than one predecessor, as the pattern I am matching here is a block containing phi nodes, and such blocks typically have more than one predecessor. |
test/CodeGen/AMDGPU/cse-phi-incoming-val.ll | ||
13 ↗ | (On Diff #206794) | Added a MIR test. |
Hi @piotr, have you tried regression benchmarking with this patch? For instance, test suite.
Hi Anton, Yes, I have just run regression testing on the test-suite and the tests pass ("Expected Passes : 917").
lib/CodeGen/MachineCSE.cpp | ||
---|---|---|
482–490 ↗ | (On Diff #206794) | I'm not sure if this should include the phi check or not. More sample cases stressing the different numbers of successors and other instructions would be good? |
test/CodeGen/AMDGPU/cse-phi-incoming-val.mir | ||
42–51 ↗ | (On Diff #207500) | Can you trim some of these instructions? |
Hmm, I'm talking about performance benchmarking, from llvm Test Suite: https://llvm.org/docs/TestingGuide.html#test-suite.
Sorry, I referred to the entire test-suite. For MicroBenchmarks only (llvm-lit -v -j 1 -o patch.json MicroBenchmarks) I get the following detailed results:
$python3 ../../test-suite/utils/compare.py --filter-short orig.json patch.json
Warning: 'test-suite :: MicroBenchmarks/XRay/FDRMode/fdrmode-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/FDRMode/fdrmode-bench.test' has No metrics!
Warning: 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' has No metrics!
Tests: 248
Metric: exec_time
Program orig patch diff
test-suite...mbda.test:BM_PIC_2D_LAMBDA/171 3.18 0.88 -72.2%
test-suite...alsCRaw.test:BM_PIC_2D_RAW/171 1.77 0.89 -49.8%
test-suite...Raw.test:BM_MAT_X_MAT_RAW/5001 4520.23 6770.15 49.8%
test-suite...aw.test:BM_MULADDSUB_RAW/44217 47.74 69.22 45.0%
test-suite...lsBRaw.test:BM_INIT3_RAW/44217 33.57 47.99 42.9%
test-suite...ambda.test:BM_EOS_LAMBDA/44217 46.45 58.30 25.5%
test-suite...ENCHMARK_BILATERAL_FILTER/16/2 22.35 27.97 25.2%
test-suite...ENCHMARK_BILATERAL_FILTER/16/4 68.97 85.68 24.2%
test-suite...ENCHMARK_BILATERAL_FILTER/32/2 103.48 127.84 23.5%
test-suite...ENCHMARK_BILATERAL_FILTER/32/4 381.09 468.47 22.9%
test-suite...mbda.test:BM_ICCG_LAMBDA/44217 32.80 39.22 19.6%
test-suite...st:BM_INT_PREDICT_LAMBDA/44217 366.07 434.41 18.7%
test-suite...est:BM_INT_PREDICT_LAMBDA/5001 14.69 16.82 14.5%
test-suite...BRaw.test:BM_IF_QUAD_RAW/44217 148.05 168.47 13.8%
test-suite....test:BENCHMARK_HARRIS/256/256 389.53 346.90 -10.9%
Geomean difference nan%
orig patch diff
count 232.000000 232.000000 232.000000
mean 2081.547998 2091.565842 0.007603
std 11692.576779 11651.097661 0.091756
min 0.030994 0.030717 -0.722465
25% 3.198364 2.967566 -0.012795
50% 36.165086 39.408759 -0.000642
75% 318.340952 318.455341 0.009107
max 115341.083667 113080.746500 0.497745
lib/CodeGen/MachineCSE.cpp | ||
---|---|---|
482–490 ↗ | (On Diff #206794) | My feeling is that without the phi check this heuristic would be too generic and will cause regressions, but can double check. The aim of this change is to target phi nodes re-using the same arguments. |
test/CodeGen/AMDGPU/cse-phi-incoming-val.mir | ||
42–51 ↗ | (On Diff #207500) | Will do. |
lib/CodeGen/MachineCSE.cpp | ||
---|---|---|
482–484 ↗ | (On Diff #210522) | I'm having a bit too much trouble following the heuristic with the two loops. Can you split the first loop into a function to return a possible alternative block? The second loop would then remain as-is, except checking UseMI.getParent() == OrigParent || == AlternativeParent |
484–489 ↗ | (On Diff #210522) | Can you re-flow this loop into separate BBUses and !BBUses case? |
Thank you for bearing with me, I have been away.
Now, when I re-checked the original case from D63709 this morning (cse-phi-incoming-val.ll) I noticed that the extra mov is not present, even without my patch. It turns out that D63198 (which was pushed recently) made the original issue go away. I guess, the fact that the issue was in the structurizer kind of explains the observation that other targets seemed to not suffer from similar problems.
Anyway, I intend to just push cse-phi-incoming-val.ll as part of the current review.
Reverting the patch and leaving the test only as it passes without the change in code.