This is yet another attempt to eliminate unnecessary loads of immediates in cases where the value is already known from the preceding comparison (https://reviews.llvm.org/D98905, https://reviews.llvm.org/D100039).
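To illustrate the idea at the source level, here is a minimal sketch and not the patch itself. The function names and the constant 42 are hypothetical. It assumes a select whose "true" value is the same immediate that the preceding comparison tested for equality; on that path the compared register already holds the immediate, so it does not need to be loaded again.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical immediate used by both the compare and the select.
constexpr int64_t Imm = 42;

// Original form: the immediate is materialized again for the select,
// even though the compare has just established that X == Imm.
int64_t selectWithImmediate(int64_t X, int64_t Y) {
  return (X == Imm) ? Imm : Y;
}

// Rewritten form: on the "equal" path X is known to hold Imm, so the
// select can reuse X instead of loading the immediate.
int64_t selectWithRegister(int64_t X, int64_t Y) {
  return (X == Imm) ? X : Y;
}
```

Both functions return the same value for every input, which is the property the rewrite relies on.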
SystemZ:
- Added isSelect flag on LOCHIMux and LOCGHI.
- Implemented analyzeSelect() and optimizeSelect() for them.
TargetInstrInfo - analyzeSelect() and optimizeSelect():
Changed the handling of optimizeSelect() so that a target can return a modified instruction, in which case it is *not* deleted.
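The changed contract can be sketched with a toy model. This is not LLVM code: Instr, Block, optimizeSelectHook and runOnSelect are hypothetical stand-ins, and it assumes that "modified in place" is signalled by the hook returning the original instruction itself.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Toy opcodes, hypothetical and unrelated to any real target.
enum Opcode { SelectInPlace, SelectReplace, Rewritten, Other };

struct Instr { Opcode Op; };
using Block = std::list<Instr>;

// Stand-in for a target hook: returns nullptr if nothing was done, the
// original instruction if it was modified in place, or a newly inserted
// instruction that replaces the original.
Instr *optimizeSelectHook(Block &BB, Block::iterator MI) {
  if (MI->Op == SelectInPlace) {
    MI->Op = Rewritten;               // modify the select in place
    return &*MI;
  }
  if (MI->Op == SelectReplace)
    return &*BB.insert(MI, Instr{Rewritten}); // new instruction before MI
  return nullptr;
}

// Stand-in for the caller: erase the original only when the hook
// returned a different instruction.
bool runOnSelect(Block &BB, Block::iterator MI) {
  Instr *New = optimizeSelectHook(BB, MI);
  if (!New)
    return false;
  if (New != &*MI)
    BB.erase(MI);                     // original was replaced, delete it
  return true;
}
```

In this model the caller compares the returned pointer against the original to decide whether to erase; the actual patch may signal the in-place case differently.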
If, as it appears to me, PeepholeOptimizer.cpp is the only user of these hooks (and there are no downstream out-of-tree targets that have requested them), maybe we could merge the two? It seems this could more or less be a single 'optimizeSelect()' method, as there appears to be no use for the arguments to analyzeSelect().
If the arguments to analyzeSelect() do need to be filled out, the current patch makes sense, since it does the careful analysis in that method. Otherwise it is wasteful, as the analysis has to be redone in optimizeSelect(); it would probably be better to return true from analyzeSelect() for the interesting opcodes and then do the work in optimizeSelect().
Benchmarks:
I tried four combinations of two options: "single use of compare operand" and "find LHIMux/LGHI via MRI if not found locally" (experimental options in the patch):
master <> "multiple users" + "only cases with local LHIMux/LGHI"
lhi  : 225040 222044 -2996
lghi : 445603 444910  -693
lr   :  61869  62276  +407
lgr  : 853946 854211  +265
...

master <> "single user" + "only cases with local LHIMux/LGHI"
lhi  : 225040 222702 -2338
lghi : 445603 445263  -340
lgr  : 853946 854083  +137
lr   :  61869  61928   +59
...

master <> "multiple users" + "use MRI to find LHIMux/LGHI"
lhi  : 225040 220319 -4721
lghi : 445603 443104 -2499
lr   :  61869  62808  +939
lgr  : 853946 854436  +490
...

master <> "single user" + "use MRI to find LHIMux/LGHI"
lhi  : 225040 221788 -3252
lghi : 445603 443556 -2047
lgr  : 853946 854352  +406
lr   :  61869  61942   +73
...

Initial measurements do not show any significant performance changes either way...