This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917
Details
- Reviewers
- RKSimon - MatzeB - spatel - craig.topper - aemerson - qcolombet - atrick - rtereshin 
- Commits
- rZORG3b8eade8a420: [MIR] Add simple PRE pass to MachineCSE
 rZORGea3d1df9d6a7: [MIR] Add simple PRE pass to MachineCSE
 rG3b8eade8a420: [MIR] Add simple PRE pass to MachineCSE
 rGea3d1df9d6a7: [MIR] Add simple PRE pass to MachineCSE
 rG9c20156de39b: [MIR] Add simple PRE pass to MachineCSE
 rL359870: [MIR] Add simple PRE pass to MachineCSE
Diff Detail
Event Timeline
FYI: Generally accepted ping frequency is once per week.
| lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 54 | Our general style is not to use line continuations like this, but instead rely on adjacent string concatenation, so this would be: STATISTIC(NumPREs,      "Number of partial redundant expression"
                            " transformed to fully redundant");(where the starting quotes are vertically aligned) | |
| 754 | Comments should be full sentences and end with a period (or other punctuation). Also, please explain *why* the conditions are stronger in this case. | |
| 790 | It looks like this could be: if (!PREMap.count(MI)) {
  PREMap[MI] = MBB;
  continue;
}and then you don't need the else and extra associated indentation. | |
| 812 | It would be better to not create, and then delete, instructions speculatively. isProfitableToCSE does look at the parent blocks of the instructions, but we could abstract that by: 
 | |
Have you measured the compile-time impact?
| lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 755 | I think it would be useful to elaborate on under what conditions we create these PRE instructions that are not later CSEd. Do you know how often that happens (I assume that you can figure this out by looking at the statistics and comparing runs where this happens vs. where it doesn't)? | |
What do you mean by compile-time impact? I've run test suite of course, compared with baseline.
The results for exec_time and size..text metrics are like here:
> ~/llvm/test-suite/utils/compare.py  --filter-short -m exec_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json 
...
Program                                         results_rel_base  results_rel_exp3 diff  
                                                                                        
MicroBench...RayFDRMultiThreaded/threads:32   1055.94           931.21            -11.8%
Bitcode/Be...rid/halide_bilateral_grid.test    30.26             26.70            -11.8%
MultiSourc...mbolics-flt/Symbolics-flt.test     1.10              0.98            -10.9%
MicroBench...XRayFDRMultiThreaded/threads:2   497.85            547.92            10.1% 
MicroBench...lcalsCRaw.test:BM_EOS_RAW/5001    10.33              9.49            -8.2% 
MicroBench...bda.test:BM_PIC_1D_LAMBDA/5001    75.75             81.43             7.5% 
MicroBench...XRayFDRMultiThreaded/threads:8   1029.95           1100.16            6.8% 
MicroBench...XRayFDRMultiThreaded/threads:4   626.46            667.17             6.5% 
MicroBench...ambda.test:BM_EOS_LAMBDA/44217    87.21             92.66             6.2% 
MultiSourc...flt/LoopRestructuring-flt.test     3.76              3.53            -6.2% 
MicroBench...calsCRaw.test:BM_EOS_RAW/44217    85.94             90.96             5.8% 
MicroBench...Lambda.test:BM_ADI_LAMBDA/5001   128.27            135.35             5.5% 
MicroBench...est:BM_ReturnNeverInstrumented     1.60              1.68             5.3% 
MicroBench....test:BM_MULADDSUB_LAMBDA/5001    14.14             14.84             4.9% 
SingleSour...ncils/fdtd-apml/fdtd-apml.test     0.63              0.66             4.4%
...
> ~/llvm/test-suite/utils/compare.py  --filter-short -m size..text ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json 
...
Program                                         results_rel_base  results_rel_exp3 diff 
                                                                                       
SingleSour...marks/Shootout/Shootout-random   658.00            642.00            -2.4%
SingleSour...ce/UnitTests/SignlessTypes/rem   4834.00           4930.00            2.0%
SingleSour...ootout-C++/Shootout-C++-random   818.00            802.00            -2.0%
MultiSource/Benchmarks/Olden/mst/mst          2658.00           2706.00            1.8%
SingleSource/Benchmarks/Misc/ffbench          2322.00           2290.00           -1.4%
MultiSource/Benchmarks/NPB-serial/is/is       4946.00           4898.00           -1.0%
MultiSource/Benchmarks/McCat/18-imp/imp       13266.00          13138.00          -1.0%
MultiSourc...rks/McCat/03-testtrie/testtrie   1810.00           1794.00           -0.9%
MultiSourc...chmarks/McCat/01-qbsort/qbsort   2066.00           2050.00           -0.8%
MultiSourc...iabench/g721/g721encode/encode   7394.00           7346.00           -0.6%
SingleSource/Benchmarks/Misc/fbench           2546.00           2562.00            0.6%
SingleSour...ce/Benchmarks/Misc/ReedSolomon   11170.00          11234.00           0.6%
SingleSource/Benchmarks/Stanford/Towers       2802.00           2786.00           -0.6%
MultiSourc...e/Applications/SIBsim4/SIBsim4   47666.00          47938.00           0.6%
MultiSourc...OE-ProxyApps-C/miniGMG/miniGMG   76258.00          76674.00           0.5%Although I'm not sure exec_time is good metrics. I've also tried compile_time, never used it before:
~/llvm/test-suite/utils/compare.py  --filter-short -m compile_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json 
...
Program                                         results_rel_base  results_rel_exp3 diff 
                                                                                       
SingleSour.../Benchmarks/Misc-C++/Large/ray     0.68              0.74             7.6%
MultiSourc...Prolangs-C/simulator/simulator     1.56              1.64             5.7%
MultiSource/Benchmarks/McCat/18-imp/imp         0.82              0.78            -5.3%
MultiSourc...OE-ProxyApps-C/RSBench/rsbench     0.88              0.84            -4.6%
MultiSource/Benchmarks/Ptrdist/bc/bc            2.42              2.52             4.1%
SingleSour...enchmarks/CoyoteBench/fftbench     1.08              1.03            -4.1%
SingleSour...UnitTests/Vectorizer/gcc-loops     1.42              1.48             3.9%
MultiSourc...chmarks/Prolangs-C++/city/city     3.20              3.08            -3.9%
MultiSource/Benchmarks/sim/sim                  0.72              0.75             3.9%
MultiSourc...nchmarks/FreeBench/pifft/pifft     2.56              2.65             3.6%
MultiSource/Applications/spiff/spiff            1.58              1.64             3.5%
MultiSourc...lds-flt/CrossingThresholds-flt     3.80              3.94             3.5%
MultiSourc...s/Prolangs-C/compiler/compiler     0.84              0.81            -3.3%
MultiSourc...rks/Prolangs-C++/employ/employ     0.74              0.71            -3.3%
SingleSour...ks/Misc-C++/stepanov_container     3.01              3.11             3.2%
...Also one can see many tests increased their machine-cse.NumCSEs statistics (meaning that PRE step created common subexpressions eliminated by CSE):
> ~/llvm/test-suite/utils/compare.py  --filter-short -m machine-cse.NumCSEs ~/llvm/test-suite-results_deb_base.json ~/llvm/test-suite-results/results_deb_exp.json
...
Program                                         results_deb_base  results_deb_exp diff   
                                                                                        
SingleSour...ce/UnitTests/SignlessTypes/rem     1.00             19.00           1800.0%
MultiSource/Benchmarks/llubenchmark/llu         1.00             10.00           900.0% 
MultiSource/Benchmarks/NPB-serial/is/is         1.00              9.00           800.0% 
SingleSource/Benchmarks/Misc/fbench             1.00              7.00           600.0% 
MultiSourc...OE-ProxyApps-C/XSBench/XSBench     2.00              8.00           300.0% 
SingleSource/Benchmarks/Misc/ffbench            2.00              8.00           300.0% 
SingleSour...chmarks/Misc-C++/stepanov_v1p2     1.00              3.00           200.0% 
MultiSourc...arks/FreeBench/distray/distray     7.00             20.00           185.7% 
MicroBench.../ImageProcessing/Dither/Dither     3.00              8.00           166.7% 
MultiSourc...ench/telecomm-FFT/telecomm-fft     5.00             13.00           160.0% 
MultiSourc...enchmarks/mafft/pairlocalalign   482.00            1214.00          151.9% 
MultiSourc...aran/netbench-url/netbench-url     3.00              7.00           133.3% 
MultiSourc...Benchmarks/SciMark2-C/scimark2     6.00             13.00           116.7% 
MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode    57.00            115.00           101.8% 
MultiSourc...chmarks/McCat/01-qbsort/qbsort     2.00              4.00           100.0%
...| lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 755 | Thanks for pointing this out, I've cleaned this place. I've removed needless checks like MI->isReturn() after comparing statistics (no change). Mostly, these checks are implicitly done while CSE work (not inside isCSECandidate()). I've left aux checks of virtual registers though -- they are also implicitly done. I've also updated comment. This looks artificially, but I'm going to refactor this place in the third (and last) step of this work -- merging PRE and CSE, so we need no different checks. | |
| lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 800 | Are these calls expensive? SingleSour.../Benchmarks/Misc-C++/Large/ray 0.68 0.74 7.6% That 7% increase in compile time, and a few other others, look significant. (Maybe there's something you can cache?) | |
| lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 800 | I believe caching this gives nothing, cause two common subexpressions unlikely belong to the same basic blocks pair. 7% are only for one test among hundreds, there are also -5% and so on. I'm to run benchmarks again to compare results. | |
Hi @hfinkel, 
Unfortunately, the results of compile_time are unstable and not reproducible. Below are two different runs.
The first run:
> ~/llvm/test-suite/utils/compare.py  --filter-short -m compile_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp.json 
Tests: 1163
Short Running: 1010 (filtered out)
Remaining: 153
Metric: compile_time
Program                                         results_rel_base  results_rel_exp diff 
                                                                                      
MultiSourc...OE-ProxyApps-C/XSBench/XSBench     0.74              0.79            6.5%
MicroBench...y/ReturnReference/retref-bench     1.24              1.17           -6.1%
SingleSour...rks/Misc-C++/Large/sphereflake     0.71              0.68           -4.5%
MultiSourc...ps-C++/HACCKernels/HACCKernels     0.72              0.76            4.4%
MultiSourc...chmarks/Prolangs-C/gnugo/gnugo     1.40              1.34           -4.3%
MultiSourc...rks/Prolangs-C++/employ/employ     0.76              0.73           -4.2%
MicroBench...arks/ImageProcessing/Blur/blur     1.61              1.55           -4.0%
MultiSourc...rks/Trimaran/enc-3des/enc-3des     0.62              0.65            3.8%
MultiSourc...rks/mediabench/gsm/toast/toast     2.29              2.38            3.7%
MultiSourc...Benchmarks/SciMark2-C/scimark2     0.79              0.82            3.6%
MicroBench.../ImageProcessing/Dilate/Dilate     1.32              1.36            2.7%
MicroBench...oopInterchange/LoopInterchange     0.74              0.72           -2.7%
MultiSourc...s/Prolangs-C/unix-tbl/unix-tbl     2.22              2.16           -2.7%
MultiSourc...rolangs-C/archie-client/archie     1.20              1.17           -2.7%
MultiSourc...marks/Prolangs-C/loader/loader     0.62              0.60           -2.6%
       results_rel_base  results_rel_exp        diff
count  153.000000        152.000000       152.000000
mean   7.235216          7.270763        -0.000904  
std    12.322023         12.328571        0.016021  
min    0.600000          0.604000        -0.061093  
25%    1.516000          1.535000        -0.007504  
50%    3.784000          3.800000         0.000000  
75%    5.568000          5.581000         0.006387  
max    88.724000         88.972000        0.064516and the second run:
> ~/llvm/test-suite/utils/compare.py  --filter-short -m compile_time ~/llvm/test-suite-results/results_rel_base2.json ~/llvm/test-suite-results/results_rel_exp2.json 
Tests: 1163
Short Running: 1009 (filtered out)
Remaining: 154
Metric: compile_time
Program                                         results_rel_base2  results_rel_exp2 diff 
                                                                                        
ultiSource/Benchmarks/McCat/09-vor/vor          0.66               0.62            -6.6%
ultiSource...Benchmarks/SciMark2-C/scimark2     0.80               0.75            -5.5%
ultiSource...rolangs-C/archie-client/archie     1.16               1.22             5.2%
ultiSource...OE-ProxyApps-C/XSBench/XSBench     0.79               0.76            -4.1%
ultiSource/Benchmarks/McCat/18-imp/imp          0.80               0.83             4.0%
ultiSource...OE-ProxyApps-C/RSBench/rsbench     0.87               0.90             3.7%
ultiSource...rks/Trimaran/enc-3des/enc-3des     0.66               0.63            -3.7%
ultiSource...hmarks/MallocBench/cfrac/cfrac     1.64               1.70             3.6%
icroBenchm.../ImageProcessing/Dilate/Dilate     1.37               1.32            -3.5%
ultiSource...ALAC/decode/alacconvert-decode     2.41               2.34            -3.0%
ultiSource...chmarks/Prolangs-C++/city/city     3.22               3.13            -3.0%
ultiSource...xyApps-C/Pathfinder/PathFinder     1.77               1.72            -2.9%
ultiSource...s/Prolangs-C/compiler/compiler     0.86               0.84            -2.8%
ultiSource...rks/Prolangs-C++/employ/employ     0.73               0.71            -2.7%
ultiSource/Applications/sgefa/sgefa             0.99               1.02             2.4%
       results_rel_base2  results_rel_exp2        diff
count  152.000000         153.000000        151.000000
mean   7.284842           7.236418         -0.002551  
std    12.346005          12.320806         0.015407  
min    0.600000           0.604000         -0.066265  
25%    1.567000           1.504000         -0.009324  
50%    3.794000           3.776000         -0.001112  
75%    5.609000           5.584000          0.005669  
max    89.004000          89.036000         0.051724Also, I don't think the heavy functions here (like findNearestCommonDominator() and isPotentiallyReachable()) can significant impact compile time -- they are called rarely only for two BBs with common subexpressions, which is unusual. Taking the machine-cse.NumCSE statistics above as example, one can make an estimation of no more than 10-100 calls of these functions for the benchs from test suite. Also, these heavy calls are already cached in a sense, they are using prebuilt Machine Dominator Tree.
LGTM - based on the improved codegen and the negligible effect on compile time. Thanks.
| llvm/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 784 ↗ | (On Diff #196406) | Very minor - but why can't the ++I be moved inside the for-loop increment statement? | 
| llvm/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 784 ↗ | (On Diff #196406) | It can, I'm to fix it, thanks. | 
| llvm/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 784 ↗ | (On Diff #196406) | This one too? | 
| llvm/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 784 ↗ | (On Diff #196406) | Hmm, looks like I've eventually fixed L506 -- the original loop inside ProcessBlockCSE() instead of ProcessBlockPRE(). | 
This is causing incorrect code generation for this piece of IR:
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-arm-none-eabi"
@var = hidden local_unnamed_addr global i32 0, align 4
@_ZTIi = external dso_local constant i8*
declare dso_local void @_Z2fnv() local_unnamed_addr #1
declare dso_local i32 @__gxx_personality_v0(...)
declare i32 @llvm.eh.typeid.for(i8*) #2
declare dso_local i8* @__cxa_begin_catch(i8*) local_unnamed_addr
declare dso_local void @__cxa_end_catch() local_unnamed_addr
define hidden i32 @_Z7examplev() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
entry:
  invoke void @_Z2fnv()
          to label %try.cont unwind label %lpad
lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 }
          catch i8* bitcast (i8** @_ZTIi to i8*)
          catch i8* null
  %1 = extractvalue { i8*, i32 } %0, 0
  %2 = extractvalue { i8*, i32 } %0, 1
  %3 = tail call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*))
  %matches = icmp eq i32 %2, %3
  %4 = tail call i8* @__cxa_begin_catch(i8* %1)
  %5 = load i32, i32* @var, align 4
  br i1 %matches, label %catch1, label %catch
catch1:                                           ; preds = %lpad
  %or3 = or i32 %5, 4
  store i32 %or3, i32* @var, align 4
  tail call void @__cxa_end_catch()
  br label %try.cont
try.cont:                                         ; preds = %entry, %catch1, %catch
  %6 = load i32, i32* @var, align 4
  ret i32 %6
catch:                                            ; preds = %lpad
  %or = or i32 %5, 8
  store i32 %or, i32* @var, align 4
  tail call void @__cxa_end_catch()
  br label %try.cont
}As part of the accesses to var an ADRP instruction is being generated and MachineCSE is hoisting it to the entry block, but it gets hoisted to after the call to fn so when we catch an exception the ADRP hasn't been executed so the loads and stores use an undefined base register.
Hi @john.brawn , thanks, I'm looking into it. At first glance, this instruction should have special flag to being non-hoistable.
The issue actually is not with this instruction, but it's related to exception handling. Hoisted instruction is inserted before getFirstTerminator(), but there could be EH_LABEL's which are not terminators, but could change control flow.
I'm to fix it.
Hi @john.brawn , here is the fix, could you please look to it: https://reviews.llvm.org/D63148
| llvm/trunk/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 814 ↗ | (On Diff #197945) | Do we need to enhance the algorithm to consider more about register pressure on the profit calculation? I'm afraid there is performance drop when the register pressure is heavy. | 
| llvm/trunk/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 814 ↗ | (On Diff #197945) | Hi @LuoYuanke, yes, this could be the case. Actually this commit doesn't change profit calculation, ProcessBlockPRE() uses the same isProfitableToCommit() function as ProcessBlockCSE() uses. Do you have concrete test cases where register pressure increases? | 
| llvm/trunk/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 814 ↗ | (On Diff #197945) | CSE only eliminate MI, but current PRE insert MI to the common dominated block, so the instruction is hoisted to dominated block. It increase register pressure more than CSE. | 
| llvm/trunk/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 814 ↗ | (On Diff #197945) | Hi @LuoYuanke, actually PRE doesn't hoist instruction, but _duplicates_ it, real hoisting is made by CSE when it _eliminates_ instruction. I do believe that CSE should consider register pressure while eliminating instruction. Have you tried MachineLICMBase::isProfitableToHoist() with your test case? We can adopt and insert it to CSE. | 
| llvm/trunk/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 814 ↗ | (On Diff #197945) | Hi @anton-afanasyev, sorry for reply late. I'm trying to figure out this issue, but it seems complex for me. Can you run SPEC cpu2017/500.perlbench_r? There is some significant performance drop on X86 with the patch. If you have resource to run the benchmark, could you help to look into the issue? | 
@anton-afanasyev
Hi, 
Did you look into the SPEC cpu2017/500.perlbench_r issue? There is some significant performance drop on X86 with the patch. I ask you to revert the patch first, and when the SPEC2017 regression is fixed, we can submit the patch again. How do you think?
Hi @LuoYuanke, I'm sorry, it looks like my previous answer to you remained unsubmitted.
Here it is:
Hi @LuoYuanke , I have only access to SPEC cpu2006, I see 400.perlbench there. Could it be the case you know regressed sample from cpu2006?
Also, I have prepared new patch for review: https://reviews.llvm.org/D63934 -- I believe it can potentially decrease reg pressure.
At the moment, I believe that another patch (from Kai Luo) https://reviews.llvm.org/D64394 fixes this regression.
| llvm/trunk/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 807 ↗ | (On Diff #197945) | Hi @anton-afanasyev , I have a concern here that CFG of LLVM IR might not be equivalent to CFG of Machine IR. | 
@anton-afanasyev
Hi,
Do you have any performance data for the patch? I'd like to know what benchmark has performance gain with your patch. https://reviews.llvm.org/D64394 fixed perlbench regression, but I wonder what the performance gain do we achieve with the 2 patch?
Hi @LuoYuanke, I've benchmarked the first patch and posted results in two posts started from here: https://reviews.llvm.org/D56772#1376284.
But you are right: with the second patch the performance gain may be eliminated! I'm to measure second patch effect.
| llvm/trunk/lib/CodeGen/MachineCSE.cpp | ||
|---|---|---|
| 807 ↗ | (On Diff #197945) | Hi @lkail, yes this is possible case. But that is not an issue actually, for this case in place it would be just non-optimization. PRE will create instruction, but if CSE doesn't eliminate original instructions, than "hoisted" instruction will be deleted by Remove Dead Machine Instructions pass later. | 
Hi @LuoYuanke, I've benchmarked the effect of the revertion my and @lkail patches.
The benchmark showed some increase of the exec_time:
~/llvm/test-suite/utils/compare.py --filter-short -m exec_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_base2.json vs ~/llvm/test-suite-results/results_rel_exp.json ~/llvm/test-suite-results/results_rel_exp2.json Program lhs rhs diff test-suite...Raw.test:BM_MULADDSUB_RAW/5001 13.96 17.35 24.3% test-suite...XRayFDRMultiThreaded/threads:4 568.16 634.38 11.7% test-suite...RayFDRMultiThreaded/threads:32 858.02 944.45 10.1% test-suite...XRayFDRMultiThreaded/threads:2 464.93 501.92 8.0% test-suite...algebra/kernels/symm/symm.test 14.00 15.12 7.9% test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217 39.41 37.34 -5.2% test-suite...test:BM_MULADDSUB_LAMBDA/44217 141.97 148.38 4.5% test-suite...Lambda.test:BM_EOS_LAMBDA/5001 9.88 9.48 -4.0% test-suite...ambda.test:BM_EOS_LAMBDA/44217 89.08 85.78 -3.7% test-suite...s/Halide/blur/halide_blur.test 1.77 1.83 3.6% test-suite...BenchmarkGame/Large/fasta.test 0.72 0.70 -3.3% test-suite...lcalsCRaw.test:BM_EOS_RAW/5001 9.86 9.54 -3.3% test-suite...calsCRaw.test:BM_EOS_RAW/44217 88.90 86.00 -3.3% test-suite...XRayFDRMultiThreaded/threads:8 983.92 1014.32 3.1% test-suite...RayFDRMultiThreaded/threads:16 979.55 1008.68 3.0%
So I'd like to leave these changes unreverted and to proceed with particular patches (like this one https://reviews.llvm.org/D63934). Please tell if you have any objections.
@anton-afanasyev, I have no objection. Thank you for the effort to improve the performance.
Our general style is not to use line continuations like this, but instead rely on adjacent string concatenation, so this would be:
STATISTIC(NumPREs, "Number of partial redundant expression" " transformed to fully redundant");(where the starting quotes are vertically aligned)