This is an archive of the discontinued LLVM Phabricator instance.

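Paths

Table of Contents

llvm/include/llvm/Transforms/Scalar/TreeHeightReduction.h
llvm/lib/Passes/PassBuilder.cpp
llvm/lib/Passes/PassBuilderPipelines.cpp
llvm/lib/Passes/PassRegistry.def
llvm/lib/Transforms/Scalar/CMakeLists.txt
llvm/lib/Transforms/Scalar/TreeHeightReduction.cpp
llvm/test/Other/new-pm-defaults.ll
llvm/test/Other/new-pm-thinlto-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
llvm/test/Transforms/TreeHeightReduction/floating-point-add-only.ll
llvm/test/Transforms/TreeHeightReduction/floating-point-add-with-constant.ll
llvm/test/Transforms/TreeHeightReduction/floating-point-mult-only.ll
llvm/test/Transforms/TreeHeightReduction/floating-point-sub-only.ll
llvm/test/Transforms/TreeHeightReduction/fp16-add-with-constant.ll
llvm/test/Transforms/TreeHeightReduction/fp16-add.ll
llvm/test/Transforms/TreeHeightReduction/fp16-mult.ll
llvm/test/Transforms/TreeHeightReduction/fp16-sub.ll
llvm/test/Transforms/TreeHeightReduction/integer-add-only.ll
llvm/test/Transforms/TreeHeightReduction/integer-add-with-constant.ll
llvm/test/Transforms/TreeHeightReduction/integer-mult-only.ll
llvm/test/Transforms/TreeHeightReduction/integer-sub-only.ll
llvm/test/Transforms/TreeHeightReduction/leaf-num-check.ll
llvm/test/Transforms/TreeHeightReduction/long-double-add-with-constant.ll
llvm/test/Transforms/TreeHeightReduction/long-double-add.ll
llvm/test/Transforms/TreeHeightReduction/long-double-mult.ll
llvm/test/Transforms/TreeHeightReduction/long-double-sub.ll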

Differential D132828

Add new optimization pass of Tree Height Reduction
Abandoned, Public

Authored by kawashima-fj on Aug 29 2022, 12:44 AM.

Details

Reviewers

hfinkel
xbolva00
fhahn
jdoerfert
spatel
RKSimon

Summary

The tree height reduction optimization increases instruction-level parallelism (ILP) by reordering the operations in a loop so that the operation tree is as shallow as possible.

For example, consider the following code.

for (int i = 0; i < N; ++i)
  x[i] = a0[i] + a1[i] + a2[i] + a3[i] + a4[i] + a5[i] + a6[i] + a7[i];

This is equivalent to the following code. Each addition depends on the result of the preceding addition.

for (int i = 0; i < N; ++i)
  x[i] = ((((((a0[i] + a1[i]) + a2[i]) + a3[i]) + a4[i]) + a5[i]) + a6[i]) + a7[i];

Tree height reduction transforms it to the following code. Additions in innermost parentheses can be executed in parallel.

for (int i = 0; i < N; ++i)
  x[i] = ((a0[i] + a1[i]) + (a2[i] + a3[i])) + ((a4[i] + a5[i]) + (a6[i] + a7[i]));
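
As a rough illustration of the effect (not code from this patch), the sketch below compares the length of the dependency chain before and after the transformation. The helper names `chainHeight` and `balancedHeight` are hypothetical, and the model assumes every operand is ready at the same time and only counts dependent operations.

```cpp
#include <cassert>

// Number of dependent operations on the critical path of a left-leaning
// chain ((a0 + a1) + a2) + ... over NumLeaves operands.
unsigned chainHeight(unsigned NumLeaves) {
  return NumLeaves < 2 ? 0 : NumLeaves - 1;
}

// Number of dependent operations on the critical path of a balanced tree
// over NumLeaves operands, i.e. ceil(log2(NumLeaves)).
unsigned balancedHeight(unsigned NumLeaves) {
  unsigned Height = 0;
  for (unsigned Capacity = 1; Capacity < NumLeaves; Capacity *= 2)
    ++Height;
  return Height;
}
```

For the eight operands above, the chain has height 7 and the balanced tree has height 3. With three operands both shapes have height 2, so at least four operands are needed before the reordering can shorten the critical path.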

The implemented algorithm is based on the following paper:

Katherine Coons, Warren Hunt, Bertrand A. Maher, Doug Burger, Kathryn S. McKinley. Optimal Huffman Tree-Height Reduction for Instruction-Level Parallelism.
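
The Huffman-style idea can be sketched as follows. This is a simplified stand-alone model, not the implementation in this patch: `huffmanRootReadyTime` is a hypothetical helper, every operation is assumed to have the same latency, and operands are described only by the cycle at which they become available.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the cycle at which the root of the operation
// tree becomes available when the two earliest-ready values are always
// combined first. ReadyTimes holds the cycle at which each leaf is ready;
// OpLatency is the latency of one binary operation.
unsigned huffmanRootReadyTime(const std::vector<unsigned> &ReadyTimes,
                              unsigned OpLatency) {
  if (ReadyTimes.empty())
    return 0;

  // Min-heap ordered by ready time.
  std::priority_queue<unsigned, std::vector<unsigned>, std::greater<unsigned>>
      Worklist(ReadyTimes.begin(), ReadyTimes.end());

  while (Worklist.size() > 1) {
    Worklist.pop(); // Earliest-ready value; its partner is at least as late.
    unsigned Later = Worklist.top();
    Worklist.pop();
    // The combined node is ready OpLatency cycles after its later input.
    Worklist.push(Later + OpLatency);
  }
  return Worklist.top();
}
```

With eight leaves that are all ready at cycle 0 and a latency of 1, the result is 3, the same as a balanced tree. With ready times {0, 0, 0, 5} the result is 6, whereas pairing the late operand with an early one in a strictly balanced tree would give 7; this is the case where a latency-aware ordering helps.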

Applicable conditions

This patch adds the tree height reduction pass to the default optimization pipeline, but the pass is disabled by default. To enable it, pass the -enable-int-thr option for integer operations (add, mul, and, or, and xor) and the -enable-fp-thr option for floating-point operations (fadd and fmul). For floating-point operations, the fadd/fmul instructions must also carry the reassoc and nsz flags.

You can enable it from Clang with:

clang -O1 -fassociative-math -fno-signed-zeros -mllvm -enable-int-thr -mllvm -enable-fp-thr

Or, simply:

clang -Ofast -mllvm -enable-int-thr -mllvm -enable-fp-thr

The pass is applied only to operations in innermost loops.
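
The reassociation requirement for floating-point operations exists because floating-point addition is not associative. A minimal sketch, with hypothetical function names and operand values chosen only to make the rounding visible:

```cpp
#include <cassert>

// Left-to-right evaluation, as the source expression is written.
float sumChain(float A, float B, float C, float D) {
  return ((A + B) + C) + D;
}

// Reassociated form, as tree height reduction would group the operands.
float sumBalanced(float A, float B, float C, float D) {
  return (A + B) + (C + D);
}
```

Called with (1e20f, 1.0f, -1e20f, 1.0f) and compiled without fast-math options, `sumChain` returns 1 and `sumBalanced` returns 0, because each 1.0f is absorbed when it is added to a value of magnitude 1e20. The results differ, so the compiler may only regroup the operations when the instructions explicitly permit it.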

Performance

I ran the C/C++ benchmarks in SPECspeed 2017 on a Fujitsu A64FX processor, which has two pipelines each for integer operations and for SIMD/FP operations. 600.perlbench_s and 619.lbm_s improved by 3%. The other benchmarks (602, 605, 620, 623, 625, 631, 641, 644, 657) were within 1% up or down. To emphasize the performance improvement, the number of OpenMP threads was limited to one in these runs.

Relation to D67383

This patch is an updated version of D67383. The author @masakazu.ueno was my colleague. I took over his patch.

The following comments in D67383 are addressed in this patch.

This patch also has the following updates.

Adapt to the latest main branch (new pass manager, opaque pointer, Apache license, ...)
Remove the requirement of full fast-math flags
Fix bugs
Simplify the code
etc.

Future work

Cost estimation is not implemented yet. As @hfinkel said, this optimization has no positive effect if the target processor cannot exploit ILP. I want to add a cost model so that the optimization is enabled automatically when it is profitable.

I also want to add a Clang option to enable/disable it.

These will be addressed in another patch.

Discussion

I want comments about the following points.

This optimization is applied only to instructions in innermost loops because that is how D67383 was implemented. Should we extend it to instructions in all basic blocks?
Where is the best position for this pass in the default optimization pipeline? I put it after the loop unrolling pass because, if a loop has a reduction, the unrolled reduction can then also be optimized by this pass.
As explained in the future work above, I want to add a decision on whether to apply the optimization. What information should this decision be based on? The issue width in a machine model?
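
To make the issue-width question concrete, here is a toy model (an assumption for discussion, not an LLVM machine model and not part of this patch). It assumes unit latency, that any two available values may be combined, and that at most `IssueWidth` operations start per cycle. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Cycles needed to reduce NumLeaves ready operands to a single value after
// reassociation, with unit latency and IssueWidth operations per cycle.
unsigned reassociatedCycles(unsigned NumLeaves, unsigned IssueWidth) {
  assert(IssueWidth > 0 && "need at least one pipeline");
  unsigned Available = NumLeaves;
  unsigned Cycles = 0;
  while (Available > 1) {
    unsigned Issued = std::min(IssueWidth, Available / 2);
    Available -= Issued; // Each operation consumes two values, produces one.
    ++Cycles;
  }
  return Cycles;
}

// Cycles for the original chain: each operation waits for the previous one.
unsigned chainCycles(unsigned NumLeaves) {
  return NumLeaves < 2 ? 0 : NumLeaves - 1;
}

// Hypothetical profitability check based only on the issue width.
bool looksProfitable(unsigned NumLeaves, unsigned IssueWidth) {
  return reassociatedCycles(NumLeaves, IssueWidth) < chainCycles(NumLeaves);
}
```

In this model, eight leaves take 7 cycles with one pipeline (no gain over the chain), 4 cycles with two pipelines, and 3 cycles with four. A real decision would also need per-instruction latencies and register pressure, which this sketch ignores.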

Event Timeline

kawashima-fj created this revision. Aug 29 2022, 12:44 AM

Herald added a project: Restricted Project. Aug 29 2022, 12:44 AM

Herald added subscribers: ormris, wenlei, steven_wu and 2 others.

kawashima-fj requested review of this revision. Aug 29 2022, 12:44 AM

Herald added a reviewer: jdoerfert. Aug 29 2022, 12:44 AM

Herald added subscribers: llvm-commits, sstefan1.

kawashima-fj mentioned this in D67383: Add new optimization pass of Tree-Height-Reduction. Aug 29 2022, 12:47 AM

xbolva00 added reviewers: spatel, RKSimon. Aug 29 2022, 1:33 AM

Harbormaster completed remote builds in B183871: Diff 456269. Aug 29 2022, 2:21 AM

Maybe we can also add a reverse mode to help GPUs or low-end CPUs reduce register pressure.
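
The register-pressure concern can be illustrated with a Sethi-Ullman-style estimate. This is a stand-alone sketch under simplifying assumptions (every leaf must first be loaded into a register, no spilling, no reuse of leaf values); the `Expr` type and all function names are hypothetical and unrelated to the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Minimal expression tree: a node without children is a leaf operand.
struct Expr {
  std::unique_ptr<Expr> Left, Right;
};

std::unique_ptr<Expr> leaf() { return std::make_unique<Expr>(); }

std::unique_ptr<Expr> add(std::unique_ptr<Expr> L, std::unique_ptr<Expr> R) {
  auto N = std::make_unique<Expr>();
  N->Left = std::move(L);
  N->Right = std::move(R);
  return N;
}

// Estimated number of registers needed to evaluate E without spilling.
unsigned registersNeeded(const Expr &E) {
  if (!E.Left || !E.Right)
    return 1;
  unsigned L = registersNeeded(*E.Left);
  unsigned R = registersNeeded(*E.Right);
  return L == R ? L + 1 : std::max(L, R);
}

// Left-leaning chain ((a0 + a1) + a2) + ...
std::unique_ptr<Expr> buildChain(unsigned NumLeaves) {
  auto Root = leaf();
  for (unsigned I = 1; I < NumLeaves; ++I)
    Root = add(std::move(Root), leaf());
  return Root;
}

// Balanced tree over NumLeaves operands.
std::unique_ptr<Expr> buildBalanced(unsigned NumLeaves) {
  if (NumLeaves <= 1)
    return leaf();
  unsigned Half = NumLeaves / 2;
  return add(buildBalanced(NumLeaves - Half), buildBalanced(Half));
}
```

For eight operands the chain needs 2 registers under this estimate and the balanced tree needs 4, which is the trade-off a reverse mode would exploit.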

kawashima-fj removed a subscriber: masakazu.ueno. Aug 29 2022, 3:56 AM

bmahjour added a subscriber: bmahjour. Aug 29 2022, 6:24 AM

Thanks for updating the patch! Have you considered implementing this as a MachineFunctionPass instead of an LLVM IR pass? Doing the transformation on MachineIR would allow for more precise cost estimates, including more accurate information about register usage, selected instructions and processor resource usage. MachineCombiner.cpp might be an interesting example to look at for similar (although simpler) transformations with relatively accurate uarch-driven cost-modeling.

I ran the C/C++ benchmarks in SPECspeed 2017 on a Fujitsu A64FX processor, which has two pipelines each for integer operations and for SIMD/FP operations. 600.perlbench_s and 619.lbm_s improved by 3%. The other benchmarks (602, 605, 620, 623, 625, 631, 641, 644, 657) were within 1% up or down. To emphasize the performance improvement, the number of OpenMP threads was limited to one in these runs.

It might be interesting to also run SPECrate instead of just running speed with one thread?

In D132828#3757715, @fhahn wrote:

Have you considered implementing this as a MachineFunctionPass instead of an LLVM IR pass?

No. That is only because the original patch D67383 was implemented as an LLVM IR pass and I'm not familiar with MachineFunctionPass.

Doing the transformation on MachineIR would allow for more precise cost estimates, including more accurate information about register usage, selected instructions and processor resource usage. MachineCombiner.cpp might be an interesting example to look at for similar (although simpler) transformations with relatively accurate uarch-driven cost-modeling.

Thanks for your advice. It seems to have an advantage, but porting to MachineFunctionPass may take me some time. I'll look into it.

It might be interesting to also run SPECrate instead of just running speed with one thread?

OK. I'll do that.

mingmingl added a subscriber: mingmingl. Oct 2 2022, 12:12 AM

kawashima-fj mentioned this in D138107: [AArch64][MachineCombiner] Update isAssociativeAndCommutative. Nov 16 2022, 2:23 AM

In D132828#3757973, @kawashima-fj wrote:

In D132828#3757715, @fhahn wrote:

Have you considered implementing this as a MachineFunctionPass instead of an LLVM IR pass?

No. That is only because the original patch D67383 was implemented as an LLVM IR pass and I'm not familiar with MachineFunctionPass.

Doing the transformation on MachineIR would allow for more precise cost estimates, including more accurate information about register usage, selected instructions and processor resource usage. MachineCombiner.cpp might be an interesting example to look at for similar (although simpler) transformations with relatively accurate uarch-driven cost-modeling.

Thanks for your advice. It seems to have an advantage, but porting to MachineFunctionPass may take me some time. I'll look into it.

Thank you very much for the information. I found that MachineCombiner.cpp does similar transformations, and that adding the missing opcodes to an existing function achieves a similar result for AArch64. The patch is D138107. Please review it. I am abandoning this patch.

@kawashima-fj, I think it is a good optimization pass. I want to integrate it into my product. Is there any license issue since it is not taken by LLVM? Thanks

In D132828#3941665, @seanxilinx wrote:

@kawashima-fj, I think it is a good optimization pass. I want to integrate it into my product. Is there any license issue since it is not taken by LLVM? Thanks

@seanxilinx No, there is no license issue. This patch was posted here under the same license as LLVM (Apache License v2.0 with LLVM Exceptions).

As I explained in D138107, D132828 and D138107 achieve a similar transformation. However, if the LLVM community wants both, I can continue this patch.

In D132828#3942506, @kawashima-fj wrote:

In D132828#3941665, @seanxilinx wrote:

@kawashima-fj, I think it is a good optimization pass. I want to integrate it into my product. Is there any license issue since it is not taken by LLVM? Thanks

@seanxilinx No, there is no license issue. This patch was posted here under the same license as LLVM (Apache License v2.0 with LLVM Exceptions).

As I explained in D138107, D132828 and D138107 achieve a similar transformation. However, if the LLVM community wants both, I can continue this patch.

Thanks @kawashima-fj for your contribution.

llvm/lib/Transforms/Scalar/TreeHeightReduction.cpp
462	For the root of the tree, we may not need to check whether it is a branch candidate (isBranchCandidate).

Revision Contents

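Path	Size
llvm/include/llvm/Transforms/Scalar/TreeHeightReduction.h	25 lines
llvm/lib/Passes/PassBuilder.cpp	1 line
llvm/lib/Passes/PassBuilderPipelines.cpp	6 lines
llvm/lib/Passes/PassRegistry.def	1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt	1 line
llvm/lib/Transforms/Scalar/TreeHeightReduction.cpp	726 lines
llvm/test/Other/new-pm-defaults.ll	3 lines
llvm/test/Other/new-pm-thinlto-defaults.ll	3 lines
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll	3 lines
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll	3 lines
llvm/test/Transforms/TreeHeightReduction/floating-point-add-only.ll	105 lines
llvm/test/Transforms/TreeHeightReduction/floating-point-add-with-constant.ll	109 lines
llvm/test/Transforms/TreeHeightReduction/floating-point-mult-only.ll	105 lines
llvm/test/Transforms/TreeHeightReduction/floating-point-sub-only.ll	105 lines
llvm/test/Transforms/TreeHeightReduction/fp16-add-with-constant.ll	55 lines
llvm/test/Transforms/TreeHeightReduction/fp16-add.ll	53 lines
llvm/test/Transforms/TreeHeightReduction/fp16-mult.ll	53 lines
llvm/test/Transforms/TreeHeightReduction/fp16-sub.ll	105 lines
llvm/test/Transforms/TreeHeightReduction/integer-add-only.ll	209 lines
llvm/test/Transforms/TreeHeightReduction/integer-add-with-constant.ll	217 lines
llvm/test/Transforms/TreeHeightReduction/integer-mult-only.ll	209 lines
llvm/test/Transforms/TreeHeightReduction/integer-sub-only.ll	209 lines
llvm/test/Transforms/TreeHeightReduction/leaf-num-check.ll	69 lines
llvm/test/Transforms/TreeHeightReduction/long-double-add-with-constant.ll	55 lines
llvm/test/Transforms/TreeHeightReduction/long-double-add.ll	53 lines
llvm/test/Transforms/TreeHeightReduction/long-double-mult.ll	53 lines
llvm/test/Transforms/TreeHeightReduction/long-double-sub.ll	105 lines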

Diff 456269

llvm/include/llvm/Transforms/Scalar/TreeHeightReduction.h

This file was added.

//===- TreeHeightReduction.h - Minimize the height of an operation tree ---===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_SCALAR_TREEHEIGHTREDUCTION_H
#define LLVM_TRANSFORMS_SCALAR_TREEHEIGHTREDUCTION_H

#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"

namespace llvm {

class TreeHeightReductionPass : public PassInfoMixin<TreeHeightReductionPass> {
public:
  PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
                        LoopStandardAnalysisResults &AR, LPMUpdater &U);
};

} // namespace llvm

#endif // LLVM_TRANSFORMS_SCALAR_TREEHEIGHTREDUCTION_H

llvm/lib/Passes/PassBuilder.cpp

...
 #include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
 #include "llvm/Transforms/Scalar/SimplifyCFG.h"
 #include "llvm/Transforms/Scalar/Sink.h"
 #include "llvm/Transforms/Scalar/SpeculativeExecution.h"
 #include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/Transforms/Scalar/StructurizeCFG.h"
 #include "llvm/Transforms/Scalar/TLSVariableHoist.h"
 #include "llvm/Transforms/Scalar/TailRecursionElimination.h"
+#include "llvm/Transforms/Scalar/TreeHeightReduction.h"
 #include "llvm/Transforms/Scalar/WarnMissedTransforms.h"
 #include "llvm/Transforms/Utils/AddDiscriminators.h"
 #include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
 #include "llvm/Transforms/Utils/BreakCriticalEdges.h"
 #include "llvm/Transforms/Utils/CanonicalizeAliases.h"
 #include "llvm/Transforms/Utils/CanonicalizeFreezeInLoops.h"
 #include "llvm/Transforms/Utils/Debugify.h"
 #include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
...

llvm/lib/Passes/PassBuilderPipelines.cpp

...
 #include "llvm/Transforms/Scalar/NewGVN.h"
 #include "llvm/Transforms/Scalar/Reassociate.h"
 #include "llvm/Transforms/Scalar/SCCP.h"
 #include "llvm/Transforms/Scalar/SROA.h"
 #include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
 #include "llvm/Transforms/Scalar/SimplifyCFG.h"
 #include "llvm/Transforms/Scalar/SpeculativeExecution.h"
 #include "llvm/Transforms/Scalar/TailRecursionElimination.h"
+#include "llvm/Transforms/Scalar/TreeHeightReduction.h"
 #include "llvm/Transforms/Scalar/WarnMissedTransforms.h"
 #include "llvm/Transforms/Utils/AddDiscriminators.h"
 #include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
 #include "llvm/Transforms/Utils/CanonicalizeAliases.h"
 #include "llvm/Transforms/Utils/InjectTLIMappings.h"
 #include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"
 #include "llvm/Transforms/Utils/Mem2Reg.h"
 #include "llvm/Transforms/Utils/NameAnonGlobals.h"
... PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   OptimizePM.addPass(LoopDistributePass());

   // Populates the VFABI attribute with the scalar-to-vector mappings
   // from the TargetLibraryInfo.
   OptimizePM.addPass(InjectTLIMappings());

   addVectorPasses(Level, OptimizePM, /* IsFullLTO */ false);

+  // Increase instruction-level parallelism by reordering associative and
+  // commutative operations in a loop. Putting this pass after the loop
+  // unrolling pass will produce a better effect if the loop has reductions.
+  OptimizePM.addPass(createFunctionToLoopPassAdaptor(TreeHeightReductionPass()));
+
   // LoopSink pass sinks instructions hoisted by LICM, which serves as a
   // canonicalization pass that enables other optimizations. As a result,
   // LoopSink pass needs to be a very late IR pass to avoid undoing LICM
   // result too early.
   OptimizePM.addPass(LoopSinkPass());

   // And finally clean up LCSSA form before generating code.
   OptimizePM.addPass(InstSimplifyPass());
...

llvm/lib/Passes/PassRegistry.def

...
 LOOP_PASS("print<iv-users>", IVUsersPrinterPass(dbgs()))
 LOOP_PASS("print<loopnest>", LoopNestPrinterPass(dbgs()))
 LOOP_PASS("print<loop-cache-cost>", LoopCachePrinterPass(dbgs()))
 LOOP_PASS("loop-predication", LoopPredicationPass())
 LOOP_PASS("guard-widening", GuardWideningPass())
 LOOP_PASS("loop-bound-split", LoopBoundSplitPass())
 LOOP_PASS("loop-reroll", LoopRerollPass())
 LOOP_PASS("loop-versioning-licm", LoopVersioningLICMPass())
+LOOP_PASS("tree-height-reduction", TreeHeightReductionPass())
 #undef LOOP_PASS

 #ifndef LOOP_PASS_WITH_PARAMS
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)
 #endif
 LOOP_PASS_WITH_PARAMS("simple-loop-unswitch",
                       "SimpleLoopUnswitchPass",
                       [](std::pair<bool, bool> Params) {
...

llvm/lib/Transforms/Scalar/CMakeLists.txt

... add_llvm_component_library(LLVMScalarOpts
   SimpleLoopUnswitch.cpp
   SimplifyCFGPass.cpp
   Sink.cpp
   SpeculativeExecution.cpp
   StraightLineStrengthReduce.cpp
   StructurizeCFG.cpp
   TailRecursionElimination.cpp
   TLSVariableHoist.cpp
+  TreeHeightReduction.cpp
   WarnMissedTransforms.cpp

   ADDITIONAL_HEADER_DIRS
   ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
   ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Scalar

   DEPENDS
   intrinsics_gen
...

llvm/lib/Transforms/Scalar/TreeHeightReduction.cpp

This file was added.

//===- TreeHeightReduction.cpp - Minimize the height of an operation tree -===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass implements a tree-height-reduction pass.
//
// Tree height reduction is an optimization to increase instruction level
// parallelism by transforming an operation tree like the following.
//
//         Before                                      After
//
//                    _ N1 _                  _________ N1 _________
//                   |      |                |                      |
//                 _ N2 _   L1           ___ N2 ___             ___ N3 ___
//                |      |              |          |           |          |
//              _ N3 _   L2   --->    _ N4 _     _ N5 _      _ N6 _     _ N7 _
//             |      |              |      |   |      |    |      |   |      |
//           _ N4 _   L3             L1     L2  L3     L4   L5     L6  L7     L8
//          |      |
//        _ N5 _   L4
//       |      |
//     _ N6 _   L5
//    |      |
//  _ N7 _   L6
// |      |
// L7     L8
//
//===----------------------------------------------------------------------===//
//
// The algorithm of tree height reduction is based on the paper:
//   Katherine Coons, Warren Hunt, Bertrand A. Maher, Doug Burger,
//   Kathryn S. McKinley.
//   Optimal Huffman Tree-Height Reduction for Instruction-Level Parallelism.
//

#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/TreeHeightReduction.h"
#include <queue>
#include <set> // std::set is used for GeneratedInsts below.
#include <utility>
#include <vector>

using namespace llvm;

#define DEBUG_TYPE "tree-height-reduction"

// Whether to apply tree height reduction to integer instructions.
static cl::opt<bool> EnableIntTHR(
    "enable-int-thr",
    cl::desc("Enable tree height reduction to integer instructions"),
    cl::init(false));

// Whether to apply tree height reduction to floating-point instructions.
static cl::opt<bool> EnableFpTHR(
    "enable-fp-thr",
    cl::desc("Enable tree height reduction to floating-point instructions"),
    cl::init(false));

// Minimum number of leaves to apply tree height reduction.
// Tree height reduction has effect only if the number of leaves is 4 or more.
static cl::opt<unsigned> MinLeaves(
    "thr-min-leaves",
    cl::desc("Minimum number of leaves to apply tree height reduction "
             "(default = 4)"),
    cl::init(4));

namespace {
class Node {
public:
  explicit Node(Value *V)
      : Inst(nullptr), DefValue(V), Parent(nullptr), Left(nullptr),
        Right(nullptr), Latency(0), TotalCost(0) {
    if (Instruction *I = dyn_cast<Instruction>(V)) {
      Inst = I;
    }
  }

  /// Set the parent node of this node.
  void setParent(Node *P) { Parent = P; }

  /// Set the left node of this node.
  void setLeft(Node *L) { Left = L; }

  /// Set the right node of this node.
  void setRight(Node *R) { Right = R; }

  /// Set the latency of this node.
  void setLatency(InstructionCost L) { Latency = L; }

  /// Set the total cost of this node.
  void setTotalCost(InstructionCost C) { TotalCost = C; }

  /// Get the original instruction of this node.
  Instruction *getOrgInst() const {
    assert(Inst && "Inst should not be nullptr.");
    return Inst;
  }

  /// Get the defined value of this node.
  Value *getDefinedValue() const {
    assert(DefValue && "DefValue should not be nullptr");
    return DefValue;
  }

  /// Get the parent node of this node.
  Node *getParent() const { return Parent; }

  /// Get the left node of this node.
  Node *getLeft() const { return Left; }

  /// Get the right node of this node.
  Node *getRight() const { return Right; }

  /// Get the latency of this node.
  InstructionCost getLatency() const { return Latency; }

  /// Get the total cost of this node.
  InstructionCost getTotalCost() const { return TotalCost; }

  /// Return true if this node is a branch (including a root).
  bool isBranch() const { return Left != nullptr && Right != nullptr; }

  /// Return true if this node is a pure leaf.
  bool isLeaf() const { return !isBranch(); }

  /// Return true if this node is a root.
  bool isRoot() const { return Parent == nullptr; }

  /// Return true if this node is considered as a leaf node.
  /// Tree height reduction can be applied only to nodes whose operation codes
  /// are same. In addition, IR flags like 'nuw' must be same to preserve them.
  /// So it is necessary to consider a node whose operation code or IR flags
  /// are different from its parent's ones as a leaf node.
  bool isConsideredAsLeaf() const {
    if (isLeaf())
      return true;
    if (isRoot())
      return false;
    if (getOrgInst()->getOpcode() != getParent()->getOrgInst()->getOpcode() ||
        !getOrgInst()->hasSameSubclassOptionalData(getParent()->getOrgInst()))
      return true;
    return false;
  }

  /// Update tree's latency under this node.
  void updateTreeLatency(TargetTransformInfo *TTI);

  /// Update the latency of this node.
  void updateNodeLatency(TargetTransformInfo *TTI);

  /// Update the latency of this node using the left or right node.
  void updateLeftOrRightNodeLatency(TargetTransformInfo *TTI, Node *SubNode);

private:
  /// Original instruction of this node.
  Instruction *Inst;

  /// Defined value of this node.
  /// This may be a constant value.
  Value *DefValue;

  /// Parent node of this node.
  Node *Parent;
  /// Left node and right node of this node.
  Node *Left, *Right;

  /// Instruction latency.
  InstructionCost Latency;
  /// Total cost of nodes under this node.
  InstructionCost TotalCost;
};

class TreeHeightReduction {
public:
  enum struct InstTy { INTEGER, FLOATING_POINT };

  TreeHeightReduction(InstTy Ty, TargetTransformInfo *TTI,
                      OptimizationRemarkEmitter *ORE)
      : TargetInstTy(Ty), TTI(TTI), ORE(ORE) {}

  /// Apply tree height reduction to one basic block.
  bool runOnBasicBlock(BasicBlock *BB);

private:
  /// Type of instruction included in the operation tree.
  InstTy TargetInstTy;

  TargetTransformInfo *TTI;
  OptimizationRemarkEmitter *ORE;

  /// Return true if 'I' is a target instruction of tree height reduction.
  bool isTHRTargetInst(Instruction *I) const;

  /// Construct an operation tree from the value 'V'.
  Node *constructTree(Value *V, BasicBlock *BB);

  /// Destruct an operation tree constructed by constructTree.
  void destructTree(Node *N);

  /// Collect original instructions to be erased from the basic block.
  void collectInstsToBeErasedFrom(Node *N, std::vector<Instruction *> &Insts);

  /// Apply tree height reduction to the tree 'N'.
  /// Returned value is a tree to which tree height reduction is applied.
  Node *applyTreeHeightReduction(Node *N, bool isLeft);

  /// Collect leaf nodes and reusable branch nodes under the node 'N'.
  void collectLeavesAndReusableBranches(Node *N, std::vector<Node *> &Leaves,
                                        std::vector<Node *> &ReusableBranches);

  /// Construct an optimized subtree by applying tree height reduction.
  Node *constructOptimizedSubtree(std::vector<Node *> &Leaves,
                                  std::vector<Node *> &ReusableBranches);

  /// Combine two leaf elements, create a branch from them, and put it into
  /// 'Leaves'.
  void combineLeaves(std::vector<Node *> &Leaves, Node *Op1, Node *Op2,
                     std::vector<Node *> &ReusableBranches);

  /// Create IRs for the new tree.
  void createIRs(Node *Root, std::set<Instruction *> &GeneratedInsts);

  /// Create a new instruction for the node 'N' with operands 'Op1' and 'Op2'.
  Value *createInst(IRBuilder<> &Builder, Node *N, Value *Op1, Value *Op2);

  /// Return true if 'I' is a root candidate.
  bool isRootCandidate(Instruction &I) const {
    if (!isTHRTargetInst(&I))
      return false;
    for (unsigned i = 0; i < I.getNumOperands(); ++i)
      if (isBranchCandidate(I.getOperand(i)))
        return true;
    return false;
  }

  /// Return true if 'Op' is a branch candidate.
  bool isBranchCandidate(Value *Op) const {
    assert(Op && "Operand should not be nullptr");
    if (!Op->hasOneUse())
      return false;
    if (Instruction *I = dyn_cast<Instruction>(Op))
      return isTHRTargetInst(I);
    return false;
  }

  void printTree(raw_ostream &OS, Node *N, const int Indent) const {
    if (N->isLeaf())
      return;

    for (int i = 0; i < Indent; ++i)
      OS << " ";
    N->getOrgInst()->dump();

    for (int i = 0; i < Indent + 4; ++i)
      OS << " ";
    OS << "Latency: " << N->getLatency();
    OS << ", TotalCost: " << N->getTotalCost() << "\n";

    printTree(OS, N->getLeft(), Indent + 2);
    printTree(OS, N->getRight(), Indent + 2);
  }

  void printLeaves(raw_ostream &OS, std::vector<Node *> &Leaves,
                   bool isBefore) const {
    if (isBefore)
      OS << " --- Before ---\n";
    else
      OS << " --- After ---\n";
    for (auto *Node : Leaves)
      printTree(OS, Node, 2);
  }
};
} // end anonymous namespace

// BFS: Breadth First Search
static std::vector<Node *> getNodesByBFS(Node *N) {
  std::vector<Node *> Nodes = {N};

  // Nodes grows while it is traversed, so index it instead of using iterators.
  for (unsigned i = 0; i < Nodes.size(); ++i) {
    Node *CurNode = Nodes[i];
    if (CurNode->isBranch()) {
      Nodes.push_back(CurNode->getLeft());
      Nodes.push_back(CurNode->getRight());
    }
  }

  // Return by value; an explicit std::move here would inhibit copy elision.
  return Nodes;
}

void Node::updateTreeLatency(TargetTransformInfo *TTI) {
  std::vector<Node *> Nodes = getNodesByBFS(this);
  // Update node latency by bottom-up order.
  for (auto Begin = Nodes.rbegin(), End = Nodes.rend(); Begin != End; ++Begin) {
    Node *CurNode = *Begin;
    CurNode->updateNodeLatency(TTI);
  }
}

void Node::updateNodeLatency(TargetTransformInfo *TTI) {
  setLatency(0);
  setTotalCost(0);

  if (isLeaf())
    return;

  // Tree height reduction minimizes the weighted sum of heights.
  // The latency of each instruction is used as the height of each node.
  // This makes an optimal operation tree even if operation types (add, mul,
  // etc.) are mixed in the tree.

  updateLeftOrRightNodeLatency(TTI, getLeft());
  updateLeftOrRightNodeLatency(TTI, getRight());

  Instruction *Inst = getOrgInst();
  InstructionCost InstLatency =
      TTI->getInstructionCost(Inst, TargetTransformInfo::TCK_Latency);
  // To balance the operation tree by sorting nodes by latency even if the
  // latency is unknown, set a valid latency.
  if (!InstLatency.isValid())
    InstLatency = InstructionCost(1);
  setLatency(getLatency() + InstLatency);
  setTotalCost(getTotalCost() + InstLatency);
}

void Node::updateLeftOrRightNodeLatency(TargetTransformInfo *TTI,
                                        Node *SubNode) {
  assert(SubNode && "Left or right node should not be nullptr.");
  const InstructionCost SubNodeLatency = SubNode->getLatency();
  if (SubNodeLatency > getLatency())
    setLatency(SubNodeLatency);
  setTotalCost(getTotalCost() + SubNode->getTotalCost());
}

static void eraseOrgInsts(std::vector<Instruction *> &Insts) {
  for (auto *I : Insts)
    I->eraseFromParent();
}

static bool isProfitableToApply(Node *Root) {
  std::vector<Node *> Nodes = getNodesByBFS(Root);
  unsigned NumLeaves = 0;

  for (auto *N : Nodes) {
    if (N->isLeaf())
      ++NumLeaves;
    if (NumLeaves >= MinLeaves)
      return true;
  }

  return false;
}

				bool TreeHeightReduction::runOnBasicBlock(BasicBlock *BB) {
				bool Applied = false;
				std::set<Instruction *> GeneratedInsts;

				// Search target instructions in the basic block reversely.
				for (auto Begin = BB->rbegin(); Begin != BB->rend(); ++Begin) {
				Instruction &I = *Begin;
				if (GeneratedInsts.count(&I) == 1)
				continue;
				if (!isRootCandidate(I))
				continue;

				// Construct an original operation tree from the root instruction.
				Node *OrgTree = constructTree(&I, BB);

				if (!isProfitableToApply(OrgTree)) {
				destructTree(OrgTree);
				continue;
				}

				std::vector<Instruction *> OrgInsts;
				collectInstsToBeErasedFrom(OrgTree, OrgInsts);

				// Apply tree height reduction to the original tree 'OrgTree'
				// and create a new tree 'ReducedTree'.
				Node *ReducedTree = applyTreeHeightReduction(OrgTree, true);
				assert(ReducedTree->getOrgInst() == OrgTree->getOrgInst() &&
				"OrgInst of ReducedTree and OrgTree should be same.");

				// Create IRs from the tree to which tree height reduction was applied.
				createIRs(ReducedTree, GeneratedInsts);

				// The following optimization message output process must be called
				// before 'eraseOrgInsts' because 'I' is erased there.
				ORE->emit([&]() {
				return OptimizationRemark(DEBUG_TYPE, "TreeHeightReduction", &I)
				<< "reduced tree height";
				});

				--Begin;
				eraseOrgInsts(OrgInsts);
				Applied = true;

				// 'OrgTree' and 'ReducedTree' share memory, so it is enough to release
				// memory of 'ReducedTree'.
				destructTree(ReducedTree);
				}

				return Applied;
				}

				template <typename T> static bool isUsedAtCmpInst(Instruction *I) {
				for (User *U : I->users()) {
				Instruction *UserInst = dyn_cast<Instruction>(U);
				if (UserInst && isa<T>(*UserInst))
				return true;
				}
				return false;
				}

				// Return true if 'I' is an integer type and is a target instruction of tree
				// height reduction.
				static bool isIntgerInstTHRTarget(Instruction *I) {
				if (!I->getType()->isIntegerTy())
				return false;
				// 'I' which is used at ICmpInst may be an induction variable.
				if (isUsedAtCmpInst<ICmpInst>(I))
				return false;
				// Add, Mul, And, Or, or Xor
				return I->isCommutative() && I->isAssociative();
				}

				// Return true if 'I' is a floating-point type and is a target instruction
				// of tree height reduction.
				static bool isFpInstTHRTarget(Instruction *I) {
				if (!I->getType()->isFloatingPointTy())
				return false;
				// FAdd or FMul with reassoc
				return I->isCommutative() && I->isAssociative();
				}

				bool TreeHeightReduction::isTHRTargetInst(Instruction *I) const {
				switch (TargetInstTy) {
				case InstTy::INTEGER:
				return isIntgerInstTHRTarget(I);
				case InstTy::FLOATING_POINT:
				return isFpInstTHRTarget(I);
				default:
				return false;
				}
				}

				Node *TreeHeightReduction::constructTree(Value *V, BasicBlock *BB) {
				if (!isBranchCandidate(V))
				seanxilinx (inline comment): For the root of the tree, we may not need to check whether it is isBranchCandidate or not.
				return new Node(V);

				Instruction *I = dyn_cast<Instruction>(V);
				assert(I && "Instruction should not be nullptr.");
				assert(I->getNumOperands() == 2 && "The number of operands should be 2.");

				if (I->getParent() != BB)
				return new Node(V);

				Node *Parent = new Node(V);

				Value *LeftOp = I->getOperand(0);
				Node *Left = constructTree(LeftOp, BB);
				Parent->setLeft(Left);
				Left->setParent(Parent);

				Value *RightOp = I->getOperand(1);
				Node *Right = constructTree(RightOp, BB);
				Parent->setRight(Right);
				Right->setParent(Parent);

				return Parent;
				}

				void TreeHeightReduction::destructTree(Node *N) {
				std::vector<Node *> Nodes = getNodesByBFS(N);
				for (auto *CurNode : Nodes)
				delete CurNode;
				}

				void TreeHeightReduction::collectInstsToBeErasedFrom(
				Node *N, std::vector<Instruction *> &Insts) {
				std::vector<Node *> Nodes = getNodesByBFS(N);
				for (auto *CurNode : Nodes)
				// Instructions belonging to leaf nodes should be saved.
				if (!CurNode->isLeaf())
				Insts.push_back(CurNode->getOrgInst());
				}

				Node *TreeHeightReduction::applyTreeHeightReduction(Node *N, bool isLeft) {
				// Postorder depth-first search.
				if (!N->isBranch())
				return N;
				applyTreeHeightReduction(N->getLeft(), true);
				applyTreeHeightReduction(N->getRight(), false);

				// Save original parent information.
				Node *Parent = N->getParent();

				std::vector<Node *> Leaves;
				// 'ReusableBranches' holds branch nodes which are reused when updating
				// parent and child node's relationship in constructOptimizedSubtree().
				// By doing so, the amount of memory used can be reduced.
				std::vector<Node *> ReusableBranches;
				collectLeavesAndReusableBranches(N, Leaves, ReusableBranches);

				Node *NewNode = constructOptimizedSubtree(Leaves, ReusableBranches);
				NewNode->setParent(Parent);
				if (Parent) {
				if (isLeft)
				Parent->setLeft(NewNode);
				else
				Parent->setRight(NewNode);
				}
				// Return value has a meaning only if 'Parent' is nullptr because
				// this means 'Node' is a root node.
				return NewNode;
				}

				void TreeHeightReduction::collectLeavesAndReusableBranches(
				Node *N, std::vector<Node *> &Leaves,
				std::vector<Node *> &ReusableBranches) {
				std::vector<Node *> Worklist = {N};

				// NOTE: Don't use 'getNodesAndLeavesByBFS'! Like that function, the following
				// processing collects all leaves and all branches by BFS, but it is
				// different from that function because this is BFS with a condition.
				for (unsigned i = 0; i < Worklist.size(); ++i) {
				Node *CurNode = Worklist[i];
				if (CurNode->isConsideredAsLeaf()) {
				Leaves.push_back(CurNode);
				} else {
				ReusableBranches.push_back(CurNode);
				Worklist.push_back(CurNode->getLeft());
				Worklist.push_back(CurNode->getRight());
				}
				}
				}

				Node *TreeHeightReduction::constructOptimizedSubtree(
				std::vector<Node *> &Leaves, std::vector<Node *> &ReusableBranches) {
				while (Leaves.size() > 1) {
				llvm::stable_sort(Leaves, [](Node *LHS, Node *RHS) -> bool {
				if (LHS->getLatency() != RHS->getLatency())
				return LHS->getLatency() < RHS->getLatency();
				if (LHS->getTotalCost() != RHS->getTotalCost())
				return LHS->getTotalCost() < RHS->getTotalCost();
				return false;
				});
				LLVM_DEBUG(printLeaves(dbgs(), Leaves, true));

				Node *Op1 = Leaves[0], *Op2 = Leaves[1];
				Leaves.erase(Leaves.begin(), Leaves.begin() + 2);
				combineLeaves(Leaves, Op1, Op2, ReusableBranches);

				LLVM_DEBUG(printLeaves(dbgs(), Leaves, false));
				}

				return Leaves.front();
				}

				void TreeHeightReduction::combineLeaves(std::vector<Node *> &Leaves, Node *Op1,
				Node *Op2,
				std::vector<Node *> &ReusableBranches) {
				Node *N = ReusableBranches.back();
				ReusableBranches.pop_back();

				// Update the parent-child relationship.
				N->setLeft(Op1);
				N->setRight(Op2);
				Op1->setParent(N);
				Op2->setParent(N);

				N->updateTreeLatency(TTI);
				Leaves.push_back(N);
				}

				static std::vector<Node *> getBranches(Node *N) {
				std::vector<Node *> Nodes = getNodesByBFS(N);

				// Exclude leaves.
				for (auto Iter = Nodes.begin(); Iter != Nodes.end(); ++Iter)
				if ((*Iter)->isLeaf()) {
				Nodes.erase(Iter);
				--Iter;
				}

				return std::move(Nodes);
				}

				static void setLeftAndRightOperand(Value *&LeftOp, Value *&RightOp, Node *N,
				std::queue<Value *> &Ops) {
				auto frontPopVal = [](std::queue<Value *> &Ops) -> Value * {
				Value *V = Ops.front();
				Ops.pop();
				return V;
				};

				Node *L = N->getLeft();
				Node *R = N->getRight();
				if (L->isLeaf() && R->isLeaf()) {
				// Both the left child and the right child of 'N' are leaves.
				LeftOp = L->getDefinedValue();
				RightOp = R->getDefinedValue();
				} else {
				assert(!Ops.empty());
				if (L->isLeaf()) {
				// The left child is a leaf and the right child is a branch.
				LeftOp = L->getDefinedValue();
				RightOp = frontPopVal(Ops);
				} else if (R->isLeaf()) {
				// The left child is a branch and the right child is a leaf.
				LeftOp = frontPopVal(Ops);
				RightOp = R->getDefinedValue();
				} else {
				// Both the left child and the right child of 'N' are branches.
				LeftOp = frontPopVal(Ops);
				RightOp = frontPopVal(Ops);
				}
				}
				}

				void TreeHeightReduction::createIRs(Node *Root,
				std::set<Instruction *> &GeneratedInsts) {
				Instruction *RootInst = Root->getOrgInst();
				IRBuilder<> Builder(RootInst);

				std::vector<Node *> Nodes = getBranches(Root);
				std::queue<Value *> Ops;
				while (!Nodes.empty()) {
				Node *N = Nodes.back();

				Value *LeftOp = nullptr, *RightOp = nullptr;
				setLeftAndRightOperand(LeftOp, RightOp, N, Ops);

				Value *NewNodeValue = createInst(Builder, N, LeftOp, RightOp);
				Ops.push(NewNodeValue);
				GeneratedInsts.insert(dyn_cast<Instruction>(NewNodeValue));

				Nodes.pop_back();
				}

				assert(Ops.size() == 1 && "The size of queue should be 1.");
				Value *NewRootValue = Ops.front();
				GeneratedInsts.insert(dyn_cast<Instruction>(NewRootValue));
				RootInst->replaceAllUsesWith(NewRootValue);
				}

				Value *TreeHeightReduction::createInst(IRBuilder<> &Builder, Node *N,
				Value *Op1, Value *Op2) {
				Value *V;
				switch (N->getOrgInst()->getOpcode()) {
				case Instruction::Add:
				V = Builder.CreateAdd(Op1, Op2, Twine("thr.add"));
				break;
				case Instruction::Mul:
				V = Builder.CreateMul(Op1, Op2, Twine("thr.mul"));
				break;
				case Instruction::And:
				V = Builder.CreateAnd(Op1, Op2, Twine("thr.and"));
				break;
				case Instruction::Or:
				V = Builder.CreateOr(Op1, Op2, Twine("thr.or"));
				break;
				case Instruction::Xor:
				V = Builder.CreateXor(Op1, Op2, Twine("thr.xor"));
				break;
				case Instruction::FAdd:
				V = Builder.CreateFAdd(Op1, Op2, Twine("thr.fadd"));
				break;
				case Instruction::FMul:
				V = Builder.CreateFMul(Op1, Op2, Twine("thr.fmul"));
				break;
				default:
				llvm_unreachable("Unexpected operation code");
				}

				// Take over the original instruction IR flags.
				// V may have been folded to Constant if Op1 and Op2 are both Constant.
				if (Instruction *NewInst = dyn_cast<Instruction>(V))
				NewInst->copyIRFlags(N->getDefinedValue());

				return V;
				}

				PreservedAnalyses TreeHeightReductionPass::run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR,
				LPMUpdater &U) {
				// Tree height reduction is applied only to inner-most loops.
				if (!L.isInnermost())
				return PreservedAnalyses::all();

				OptimizationRemarkEmitter ORE(L.getHeader()->getParent());
				bool Changed = false;
				auto &LoopBlocks = L.getBlocksVector();

				if (EnableIntTHR) {
				auto THR = TreeHeightReduction(TreeHeightReduction::InstTy::INTEGER,
				&AR.TTI, &ORE);
				for (auto *BB : LoopBlocks)
				Changed |= THR.runOnBasicBlock(BB);
				}

				if (EnableFpTHR) {
				auto THR = TreeHeightReduction(TreeHeightReduction::InstTy::FLOATING_POINT,
				&AR.TTI, &ORE);
				for (auto *BB : LoopBlocks)
				Changed |= THR.runOnBasicBlock(BB);
				}

				if (!Changed)
				return PreservedAnalyses::all();
				return getLoopPassPreservedAnalyses();
				}
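
The greedy pairing done by constructOptimizedSubtree() and combineLeaves() above can be illustrated outside LLVM. The following is a minimal sketch, not the patch's code: SketchNode, reducedLatency, and chainLatency are hypothetical names, every operation is assumed to have the same latency OpLatency, and only the critical-path latency is tracked (the patch additionally breaks ties by total cost and queries TargetTransformInfo for latencies).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the patch's Node: only the critical-path
// latency of the subtree is kept.
struct SketchNode {
  int Latency;
};

// Repeatedly combine the two lowest-latency operands, in the spirit of
// constructOptimizedSubtree(). OpLatency is the assumed latency of one
// binary operation.
int reducedLatency(std::vector<SketchNode> Leaves, int OpLatency) {
  assert(!Leaves.empty());
  while (Leaves.size() > 1) {
    std::stable_sort(Leaves.begin(), Leaves.end(),
                     [](const SketchNode &L, const SketchNode &R) {
                       return L.Latency < R.Latency;
                     });
    SketchNode Op1 = Leaves[0], Op2 = Leaves[1];
    Leaves.erase(Leaves.begin(), Leaves.begin() + 2);
    // The combined node finishes after its slower operand plus one operation.
    Leaves.push_back({std::max(Op1.Latency, Op2.Latency) + OpLatency});
  }
  return Leaves.front().Latency;
}

// Latency of the original left-leaning chain ((a0 + a1) + a2) + ...
int chainLatency(const std::vector<SketchNode> &Leaves, int OpLatency) {
  assert(!Leaves.empty());
  int Latency = Leaves.front().Latency;
  for (std::size_t I = 1; I < Leaves.size(); ++I)
    Latency = std::max(Latency, Leaves[I].Latency) + OpLatency;
  return Latency;
}
```

For eight zero-latency leaves and an OpLatency of 1, chainLatency gives 7 and reducedLatency gives 3, which corresponds to the ((a0 + a1) + (a2 + a3)) + ((a4 + a5) + (a6 + a7)) shape from the summary.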

llvm/test/Other/new-pm-defaults.ll

	; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LICMPass
	; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
				; CHECK-O-NEXT: Running pass: LoopSimplifyPass
				; CHECK-O-NEXT: Running pass: LCSSAPass
				; CHECK-O-NEXT: Running pass: TreeHeightReductionPass
	; CHECK-O-NEXT: Running pass: LoopSinkPass
	; CHECK-O-NEXT: Running pass: InstSimplifyPass
	; CHECK-O-NEXT: Running pass: DivRemPairsPass
	; CHECK-O-NEXT: Running pass: TailCallElimPass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-EP-OPTIMIZER-LAST: Running pass: NoOpModulePass
	; CHECK-HOT-COLD-SPLIT-NEXT: Running pass: HotColdSplittingPass
	; CHECK-IR-OUTLINER-NEXT: Running pass: IROutlinerPass

llvm/test/Other/new-pm-thinlto-defaults.ll

	; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LICMPass
	; CHECK-POSTLINK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
				; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
				; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
				; CHECK-POSTLINK-O-NEXT: Running pass: TreeHeightReductionPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSinkPass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstSimplifyPass
	; CHECK-POSTLINK-O-NEXT: Running pass: DivRemPairsPass
	; CHECK-POSTLINK-O-NEXT: Running pass: TailCallElimPass
	; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-POSTLINK-O-NEXT: Running pass: GlobalDCEPass
	; CHECK-POSTLINK-O-NEXT: Running pass: ConstantMergePass
	; CHECK-POSTLINK-O-NEXT: Running pass: CGProfilePass

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

	; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LICMPass
	; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
				; CHECK-O-NEXT: Running pass: LoopSimplifyPass
				; CHECK-O-NEXT: Running pass: LCSSAPass
				; CHECK-O-NEXT: Running pass: TreeHeightReductionPass
	; CHECK-O-NEXT: Running pass: LoopSinkPass
	; CHECK-O-NEXT: Running pass: InstSimplifyPass
	; CHECK-O-NEXT: Running pass: DivRemPairsPass
	; CHECK-O-NEXT: Running pass: TailCallElimPass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: GlobalDCEPass
	; CHECK-O-NEXT: Running pass: ConstantMergePass
	; CHECK-O-NEXT: Running pass: CGProfilePass

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

	; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LICMPass
	; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
				; CHECK-O-NEXT: Running pass: LoopSimplifyPass
				; CHECK-O-NEXT: Running pass: LCSSAPass
				; CHECK-O-NEXT: Running pass: TreeHeightReductionPass
	; CHECK-O-NEXT: Running pass: LoopSinkPass
	; CHECK-O-NEXT: Running pass: InstSimplifyPass
	; CHECK-O-NEXT: Running pass: DivRemPairsPass
	; CHECK-O-NEXT: Running pass: TailCallElimPass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: GlobalDCEPass
	; CHECK-O-NEXT: Running pass: ConstantMergePass
	; CHECK-O-NEXT: Running pass: CGProfilePass

llvm/test/Transforms/TreeHeightReduction/floating-point-add-only.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_float(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz float %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz float %[[V2]], %[[V3]]
				; CHECK-NEXT: fadd reassoc nsz float %[[V4]], %[[V5]]
				define void @add_float(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds float, ptr %B, i64 %indvars.iv
				%0 = load float, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds float, ptr %C, i64 %indvars.iv
				%1 = load float, ptr %arrayidx.2, align 4
				%2 = fadd reassoc nsz float %1, %0
				%arrayidx.3 = getelementptr inbounds float, ptr %D, i64 %indvars.iv
				%3 = load float, ptr %arrayidx.3, align 4
				%4 = fadd reassoc nsz float %2, %3
				%arrayidx.4 = getelementptr inbounds float, ptr %E, i64 %indvars.iv
				%5 = load float, ptr %arrayidx.4, align 4
				%6 = fadd reassoc nsz float %4, %5
				%arrayidx.5 = getelementptr inbounds float, ptr %F, i64 %indvars.iv
				%7 = load float, ptr %arrayidx.5, align 4
				%8 = fadd reassoc nsz float %6, %7
				%arrayidx.6 = getelementptr inbounds float, ptr %G, i64 %indvars.iv
				%9 = load float, ptr %arrayidx.6, align 4
				%10 = fadd reassoc nsz float %8, %9
				%arrayidx.7 = getelementptr inbounds float, ptr %H, i64 %indvars.iv
				%11 = load float, ptr %arrayidx.7, align 4
				%12 = fadd reassoc nsz float %10, %11
				%arrayidx.8 = getelementptr inbounds float, ptr %I, i64 %indvars.iv
				%13 = load float, ptr %arrayidx.8, align 4
				%14 = fadd reassoc nsz float %12, %13
				%arrayidx.9 = getelementptr inbounds float, ptr %A, i64 %indvars.iv
				store float %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_double(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz double %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz double %[[V2]], %[[V3]]
				; CHECK-NEXT: fadd reassoc nsz double %[[V4]], %[[V5]]
				define void @add_double(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds double, ptr %B, i64 %indvars.iv
				%0 = load double, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds double, ptr %C, i64 %indvars.iv
				%1 = load double, ptr %arrayidx.2, align 4
				%2 = fadd reassoc nsz double %1, %0
				%arrayidx.3 = getelementptr inbounds double, ptr %D, i64 %indvars.iv
				%3 = load double, ptr %arrayidx.3, align 4
				%4 = fadd reassoc nsz double %2, %3
				%arrayidx.4 = getelementptr inbounds double, ptr %E, i64 %indvars.iv
				%5 = load double, ptr %arrayidx.4, align 4
				%6 = fadd reassoc nsz double %4, %5
				%arrayidx.5 = getelementptr inbounds double, ptr %F, i64 %indvars.iv
				%7 = load double, ptr %arrayidx.5, align 4
				%8 = fadd reassoc nsz double %6, %7
				%arrayidx.6 = getelementptr inbounds double, ptr %G, i64 %indvars.iv
				%9 = load double, ptr %arrayidx.6, align 4
				%10 = fadd reassoc nsz double %8, %9
				%arrayidx.7 = getelementptr inbounds double, ptr %H, i64 %indvars.iv
				%11 = load double, ptr %arrayidx.7, align 4
				%12 = fadd reassoc nsz double %10, %11
				%arrayidx.8 = getelementptr inbounds double, ptr %I, i64 %indvars.iv
				%13 = load double, ptr %arrayidx.8, align 4
				%14 = fadd reassoc nsz double %12, %13
				%arrayidx.9 = getelementptr inbounds double, ptr %A, i64 %indvars.iv
				store double %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}
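
The tests above carry the reassoc flag on every fadd because regrouping floating-point additions can change the result, so the transformation is only valid when reassociation is explicitly allowed. A minimal sketch in plain C++ of that effect follows; sumChained and sumRegrouped are hypothetical helpers, and the code is assumed to be compiled without fast-math style options.

```cpp
#include <cassert>

// Sums three floats as a left-leaning chain: (A + B) + C.
float sumChained(float A, float B, float C) { return (A + B) + C; }

// Sums the same three floats with the other grouping: A + (B + C).
float sumRegrouped(float A, float B, float C) { return A + (B + C); }
```

With A = 1e20f, B = -1e20f, and C = 1.0f, the chained grouping yields 1.0f while the regrouped one yields 0.0f, because 1.0f is absorbed when it is added to -1e20f first.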

llvm/test/Transforms/TreeHeightReduction/floating-point-add-with-constant.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_float_with_constant(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz float {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz float {{.*}}, 1.000000e+01
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz float %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = fadd reassoc nsz float %[[V3]], %[[V4]]
				; CHECK-NEXT: fadd reassoc nsz float %[[V5]], %[[V6]]
				define void @add_float_with_constant(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds float, ptr %B, i64 %indvars.iv
				%0 = load float, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds float, ptr %C, i64 %indvars.iv
				%1 = load float, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds float, ptr %D, i64 %indvars.iv
				%2 = load float, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds float, ptr %E, i64 %indvars.iv
				%3 = load float, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds float, ptr %F, i64 %indvars.iv
				%4 = load float, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds float, ptr %G, i64 %indvars.iv
				%5 = load float, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds float, ptr %H, i64 %indvars.iv
				%6 = load float, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds float, ptr %I, i64 %indvars.iv
				%7 = load float, ptr %arrayidx.8, align 4
				%8 = fadd reassoc nsz float %0, 1.000000e+01
				%9 = fadd reassoc nsz float %8, %1
				%10 = fadd reassoc nsz float %9, %2
				%11 = fadd reassoc nsz float %10, %3
				%12 = fadd reassoc nsz float %11, %4
				%13 = fadd reassoc nsz float %12, %5
				%14 = fadd reassoc nsz float %13, %6
				%15 = fadd reassoc nsz float %14, %7
				%arrayidx.9 = getelementptr inbounds float, ptr %A, i64 %indvars.iv
				store float %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_double_with_constant(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz double {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz double {{.*}}, 1.000000e+01
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz double %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = fadd reassoc nsz double %[[V3]], %[[V4]]
				; CHECK-NEXT: fadd reassoc nsz double %[[V5]], %[[V6]]
				define void @add_double_with_constant(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds double, ptr %B, i64 %indvars.iv
				%0 = load double, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds double, ptr %C, i64 %indvars.iv
				%1 = load double, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds double, ptr %D, i64 %indvars.iv
				%2 = load double, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds double, ptr %E, i64 %indvars.iv
				%3 = load double, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds double, ptr %F, i64 %indvars.iv
				%4 = load double, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds double, ptr %G, i64 %indvars.iv
				%5 = load double, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds double, ptr %H, i64 %indvars.iv
				%6 = load double, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds double, ptr %I, i64 %indvars.iv
				%7 = load double, ptr %arrayidx.8, align 4
				%8 = fadd reassoc nsz double %0, 1.000000e+01
				%9 = fadd reassoc nsz double %8, %1
				%10 = fadd reassoc nsz double %9, %2
				%11 = fadd reassoc nsz double %10, %3
				%12 = fadd reassoc nsz double %11, %4
				%13 = fadd reassoc nsz double %12, %5
				%14 = fadd reassoc nsz double %13, %6
				%15 = fadd reassoc nsz double %14, %7
				%arrayidx.9 = getelementptr inbounds double, ptr %A, i64 %indvars.iv
				store double %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/floating-point-mult-only.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_float(
				; CHECK: %[[V0:.*]] = fmul reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fmul reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fmul reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fmul reassoc nsz float {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fmul reassoc nsz float %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fmul reassoc nsz float %[[V2]], %[[V3]]
				; CHECK-NEXT: fmul reassoc nsz float %[[V4]], %[[V5]]
				define void @add_float(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds float, ptr %B, i64 %indvars.iv
				%0 = load float, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds float, ptr %C, i64 %indvars.iv
				%1 = load float, ptr %arrayidx.2, align 4
				%2 = fmul reassoc nsz float %1, %0
				%arrayidx.3 = getelementptr inbounds float, ptr %D, i64 %indvars.iv
				%3 = load float, ptr %arrayidx.3, align 4
				%4 = fmul reassoc nsz float %2, %3
				%arrayidx.4 = getelementptr inbounds float, ptr %E, i64 %indvars.iv
				%5 = load float, ptr %arrayidx.4, align 4
				%6 = fmul reassoc nsz float %4, %5
				%arrayidx.5 = getelementptr inbounds float, ptr %F, i64 %indvars.iv
				%7 = load float, ptr %arrayidx.5, align 4
				%8 = fmul reassoc nsz float %6, %7
				%arrayidx.6 = getelementptr inbounds float, ptr %G, i64 %indvars.iv
				%9 = load float, ptr %arrayidx.6, align 4
				%10 = fmul reassoc nsz float %8, %9
				%arrayidx.7 = getelementptr inbounds float, ptr %H, i64 %indvars.iv
				%11 = load float, ptr %arrayidx.7, align 4
				%12 = fmul reassoc nsz float %10, %11
				%arrayidx.8 = getelementptr inbounds float, ptr %I, i64 %indvars.iv
				%13 = load float, ptr %arrayidx.8, align 4
				%14 = fmul reassoc nsz float %12, %13
				%arrayidx.9 = getelementptr inbounds float, ptr %A, i64 %indvars.iv
				store float %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_double(
				; CHECK: %[[V0:.*]] = fmul reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fmul reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fmul reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fmul reassoc nsz double {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fmul reassoc nsz double %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fmul reassoc nsz double %[[V2]], %[[V3]]
				; CHECK-NEXT: fmul reassoc nsz double %[[V4]], %[[V5]]
				define void @add_double(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds double, ptr %B, i64 %indvars.iv
				%0 = load double, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds double, ptr %C, i64 %indvars.iv
				%1 = load double, ptr %arrayidx.2, align 4
				%2 = fmul reassoc nsz double %1, %0
				%arrayidx.3 = getelementptr inbounds double, ptr %D, i64 %indvars.iv
				%3 = load double, ptr %arrayidx.3, align 4
				%4 = fmul reassoc nsz double %2, %3
				%arrayidx.4 = getelementptr inbounds double, ptr %E, i64 %indvars.iv
				%5 = load double, ptr %arrayidx.4, align 4
				%6 = fmul reassoc nsz double %4, %5
				%arrayidx.5 = getelementptr inbounds double, ptr %F, i64 %indvars.iv
				%7 = load double, ptr %arrayidx.5, align 4
				%8 = fmul reassoc nsz double %6, %7
				%arrayidx.6 = getelementptr inbounds double, ptr %G, i64 %indvars.iv
				%9 = load double, ptr %arrayidx.6, align 4
				%10 = fmul reassoc nsz double %8, %9
				%arrayidx.7 = getelementptr inbounds double, ptr %H, i64 %indvars.iv
				%11 = load double, ptr %arrayidx.7, align 4
				%12 = fmul reassoc nsz double %10, %11
				%arrayidx.8 = getelementptr inbounds double, ptr %I, i64 %indvars.iv
				%13 = load double, ptr %arrayidx.8, align 4
				%14 = fmul reassoc nsz double %12, %13
				%arrayidx.9 = getelementptr inbounds double, ptr %A, i64 %indvars.iv
				store double %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}
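
The CHECK lines in this file match a balanced tree over eight leaves: four multiplications on leaf pairs, two that combine those results, and one final multiplication. The sketch below is only an illustration of that shape in Python, not the pass's implementation; `linear_chain` and `balanced_tree` are hypothetical helper names, and the leaf values are made up.

```python
from functools import reduce
import operator


def linear_chain(leaves, op):
    """Left-to-right chain ((l0 op l1) op l2) ...; returns (value, height)."""
    return reduce(op, leaves), len(leaves) - 1


def balanced_tree(leaves, op):
    """Combine adjacent pairs level by level; returns (value, height)."""
    level, height = list(leaves), 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired element moves up unchanged
            nxt.append(level[-1])
        level, height = nxt, height + 1
    return level[0], height


# Eight leaves, mirroring the eight loads in the test's loop body.
leaves = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
chain_value, chain_height = linear_chain(leaves, operator.mul)
tree_value, tree_height = balanced_tree(leaves, operator.mul)
print(chain_height, tree_height)
```

Both forms use seven multiplications; only the longest dependency chain shrinks. For general floating-point inputs the two groupings can round differently, which is why the test IR carries the `reassoc` flag.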

llvm/test/Transforms/TreeHeightReduction/floating-point-sub-only.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @sub_float(
				; CHECK: %[[V0:.*]] = fsub reassoc nsz float {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = fsub reassoc nsz float %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = fsub reassoc nsz float %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = fsub reassoc nsz float %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = fsub reassoc nsz float %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = fsub reassoc nsz float %[[V4]], {{.*}}
				; CHECK: fsub reassoc nsz float %[[V5]], {{.*}}
				define void @sub_float(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds float, ptr %B, i64 %indvars.iv
				%0 = load float, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds float, ptr %C, i64 %indvars.iv
				%1 = load float, ptr %arrayidx.2, align 4
				%2 = fsub reassoc nsz float %0, %1
				%arrayidx.3 = getelementptr inbounds float, ptr %D, i64 %indvars.iv
				%3 = load float, ptr %arrayidx.3, align 4
				%4 = fsub reassoc nsz float %2, %3
				%arrayidx.4 = getelementptr inbounds float, ptr %E, i64 %indvars.iv
				%5 = load float, ptr %arrayidx.4, align 4
				%6 = fsub reassoc nsz float %4, %5
				%arrayidx.5 = getelementptr inbounds float, ptr %F, i64 %indvars.iv
				%7 = load float, ptr %arrayidx.5, align 4
				%8 = fsub reassoc nsz float %6, %7
				%arrayidx.6 = getelementptr inbounds float, ptr %G, i64 %indvars.iv
				%9 = load float, ptr %arrayidx.6, align 4
				%10 = fsub reassoc nsz float %8, %9
				%arrayidx.7 = getelementptr inbounds float, ptr %H, i64 %indvars.iv
				%11 = load float, ptr %arrayidx.7, align 4
				%12 = fsub reassoc nsz float %10, %11
				%arrayidx.8 = getelementptr inbounds float, ptr %I, i64 %indvars.iv
				%13 = load float, ptr %arrayidx.8, align 4
				%14 = fsub reassoc nsz float %12, %13
				%arrayidx.9 = getelementptr inbounds float, ptr %A, i64 %indvars.iv
				store float %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @sub_double(
				; CHECK: %[[V0:.*]] = fsub reassoc nsz double {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = fsub reassoc nsz double %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = fsub reassoc nsz double %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = fsub reassoc nsz double %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = fsub reassoc nsz double %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = fsub reassoc nsz double %[[V4]], {{.*}}
				; CHECK: fsub reassoc nsz double %[[V5]], {{.*}}
				define void @sub_double(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds double, ptr %B, i64 %indvars.iv
				%0 = load double, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds double, ptr %C, i64 %indvars.iv
				%1 = load double, ptr %arrayidx.2, align 4
				%2 = fsub reassoc nsz double %0, %1
				%arrayidx.3 = getelementptr inbounds double, ptr %D, i64 %indvars.iv
				%3 = load double, ptr %arrayidx.3, align 4
				%4 = fsub reassoc nsz double %2, %3
				%arrayidx.4 = getelementptr inbounds double, ptr %E, i64 %indvars.iv
				%5 = load double, ptr %arrayidx.4, align 4
				%6 = fsub reassoc nsz double %4, %5
				%arrayidx.5 = getelementptr inbounds double, ptr %F, i64 %indvars.iv
				%7 = load double, ptr %arrayidx.5, align 4
				%8 = fsub reassoc nsz double %6, %7
				%arrayidx.6 = getelementptr inbounds double, ptr %G, i64 %indvars.iv
				%9 = load double, ptr %arrayidx.6, align 4
				%10 = fsub reassoc nsz double %8, %9
				%arrayidx.7 = getelementptr inbounds double, ptr %H, i64 %indvars.iv
				%11 = load double, ptr %arrayidx.7, align 4
				%12 = fsub reassoc nsz double %10, %11
				%arrayidx.8 = getelementptr inbounds double, ptr %I, i64 %indvars.iv
				%13 = load double, ptr %arrayidx.8, align 4
				%14 = fsub reassoc nsz double %12, %13
				%arrayidx.9 = getelementptr inbounds double, ptr %A, i64 %indvars.iv
				store double %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}
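
The `fsub` CHECK lines in this file expect each result to feed the next subtraction, so the chain is expected to stay in its original order. A plausible reason, offered here as an assumption rather than something the patch states at this point, is that subtraction is not associative: regrouping operands pairwise changes the value. A small Python sketch with hypothetical helper names and made-up values:

```python
def chain_sub(leaves):
    """Evaluate ((l0 - l1) - l2) - ... left to right."""
    acc = leaves[0]
    for x in leaves[1:]:
        acc -= x
    return acc


def naive_pairwise_sub(leaves):
    """Regroup as (l0 - l1) - (l2 - l3) ...; NOT equivalent to the chain."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [level[i] - level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired element moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


leaves = [8, 1, 1, 1, 1, 1, 1, 1]
print(chain_sub(leaves), naive_pairwise_sub(leaves))
```

The two results differ, so a subtraction chain cannot be balanced by regrouping alone.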

llvm/test/Transforms/TreeHeightReduction/fp16-add-with-constant.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_half_with_constant(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz half {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz half {{.*}}, 0xH4900
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz half %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = fadd reassoc nsz half %[[V3]], %[[V4]]
				; CHECK-NEXT: fadd reassoc nsz half %[[V5]], %[[V6]]
				define void @add_half_with_constant(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds half, ptr %B, i64 %indvars.iv
				%0 = load half, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds half, ptr %C, i64 %indvars.iv
				%1 = load half, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds half, ptr %D, i64 %indvars.iv
				%2 = load half, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds half, ptr %E, i64 %indvars.iv
				%3 = load half, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds half, ptr %F, i64 %indvars.iv
				%4 = load half, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds half, ptr %G, i64 %indvars.iv
				%5 = load half, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds half, ptr %H, i64 %indvars.iv
				%6 = load half, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds half, ptr %I, i64 %indvars.iv
				%7 = load half, ptr %arrayidx.8, align 4
				%8 = fadd reassoc nsz half %0, 0xH4900
				%9 = fadd reassoc nsz half %8, %1
				%10 = fadd reassoc nsz half %9, %2
				%11 = fadd reassoc nsz half %10, %3
				%12 = fadd reassoc nsz half %11, %4
				%13 = fadd reassoc nsz half %12, %5
				%14 = fadd reassoc nsz half %13, %6
				%15 = fadd reassoc nsz half %14, %7
				%arrayidx.9 = getelementptr inbounds half, ptr %A, i64 %indvars.iv
				store half %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}
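
With the constant, this test has nine leaves (eight loads plus `0xH4900`). The CHECK lines match eight `fadd` instructions whose longest dependency chain, V0 to V1 to V5 to the final add, has length four. The following sketch, using hypothetical helper names, computes the operation count and the lower bound on tree height for a two-operand operator, which agrees with those numbers:

```python
def op_count(num_leaves):
    """A binary operator tree over n leaves always has n - 1 internal nodes."""
    return num_leaves - 1


def min_tree_height(num_leaves):
    """Smallest h with 2**h >= num_leaves, i.e. ceil(log2(num_leaves))."""
    return (num_leaves - 1).bit_length()


for n in (8, 9):
    print(n, op_count(n), min_tree_height(n))
```

Integer `bit_length` is used instead of `math.log2` to avoid any floating-point rounding in the ceiling.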

llvm/test/Transforms/TreeHeightReduction/fp16-add.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_half(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz half %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz half %[[V2]], %[[V3]]
				; CHECK-NEXT: fadd reassoc nsz half %[[V4]], %[[V5]]
				define void @add_half(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds half, ptr %B, i64 %indvars.iv
				%0 = load half, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds half, ptr %C, i64 %indvars.iv
				%1 = load half, ptr %arrayidx.2, align 4
				%2 = fadd reassoc nsz half %1, %0
				%arrayidx.3 = getelementptr inbounds half, ptr %D, i64 %indvars.iv
				%3 = load half, ptr %arrayidx.3, align 4
				%4 = fadd reassoc nsz half %2, %3
				%arrayidx.4 = getelementptr inbounds half, ptr %E, i64 %indvars.iv
				%5 = load half, ptr %arrayidx.4, align 4
				%6 = fadd reassoc nsz half %4, %5
				%arrayidx.5 = getelementptr inbounds half, ptr %F, i64 %indvars.iv
				%7 = load half, ptr %arrayidx.5, align 4
				%8 = fadd reassoc nsz half %6, %7
				%arrayidx.6 = getelementptr inbounds half, ptr %G, i64 %indvars.iv
				%9 = load half, ptr %arrayidx.6, align 4
				%10 = fadd reassoc nsz half %8, %9
				%arrayidx.7 = getelementptr inbounds half, ptr %H, i64 %indvars.iv
				%11 = load half, ptr %arrayidx.7, align 4
				%12 = fadd reassoc nsz half %10, %11
				%arrayidx.8 = getelementptr inbounds half, ptr %I, i64 %indvars.iv
				%13 = load half, ptr %arrayidx.8, align 4
				%14 = fadd reassoc nsz half %12, %13
				%arrayidx.9 = getelementptr inbounds half, ptr %A, i64 %indvars.iv
				store half %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/fp16-mult.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_half(
				; CHECK: %[[V0:.*]] = fmul reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fmul reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fmul reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fmul reassoc nsz half {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fmul reassoc nsz half %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fmul reassoc nsz half %[[V2]], %[[V3]]
				; CHECK-NEXT: fmul reassoc nsz half %[[V4]], %[[V5]]
				define void @add_half(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds half, ptr %B, i64 %indvars.iv
				%0 = load half, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds half, ptr %C, i64 %indvars.iv
				%1 = load half, ptr %arrayidx.2, align 4
				%2 = fmul reassoc nsz half %1, %0
				%arrayidx.3 = getelementptr inbounds half, ptr %D, i64 %indvars.iv
				%3 = load half, ptr %arrayidx.3, align 4
				%4 = fmul reassoc nsz half %2, %3
				%arrayidx.4 = getelementptr inbounds half, ptr %E, i64 %indvars.iv
				%5 = load half, ptr %arrayidx.4, align 4
				%6 = fmul reassoc nsz half %4, %5
				%arrayidx.5 = getelementptr inbounds half, ptr %F, i64 %indvars.iv
				%7 = load half, ptr %arrayidx.5, align 4
				%8 = fmul reassoc nsz half %6, %7
				%arrayidx.6 = getelementptr inbounds half, ptr %G, i64 %indvars.iv
				%9 = load half, ptr %arrayidx.6, align 4
				%10 = fmul reassoc nsz half %8, %9
				%arrayidx.7 = getelementptr inbounds half, ptr %H, i64 %indvars.iv
				%11 = load half, ptr %arrayidx.7, align 4
				%12 = fmul reassoc nsz half %10, %11
				%arrayidx.8 = getelementptr inbounds half, ptr %I, i64 %indvars.iv
				%13 = load half, ptr %arrayidx.8, align 4
				%14 = fmul reassoc nsz half %12, %13
				%arrayidx.9 = getelementptr inbounds half, ptr %A, i64 %indvars.iv
				store half %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/fp16-sub.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @sub_half(
				; CHECK: %[[V0:.*]] = fsub reassoc nsz half {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = fsub reassoc nsz half %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = fsub reassoc nsz half %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = fsub reassoc nsz half %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = fsub reassoc nsz half %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = fsub reassoc nsz half %[[V4]], {{.*}}
				; CHECK: fsub reassoc nsz half %[[V5]], {{.*}}
				define void @sub_half(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds half, ptr %B, i64 %indvars.iv
				%0 = load half, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds half, ptr %C, i64 %indvars.iv
				%1 = load half, ptr %arrayidx.2, align 4
				%2 = fsub reassoc nsz half %0, %1
				%arrayidx.3 = getelementptr inbounds half, ptr %D, i64 %indvars.iv
				%3 = load half, ptr %arrayidx.3, align 4
				%4 = fsub reassoc nsz half %2, %3
				%arrayidx.4 = getelementptr inbounds half, ptr %E, i64 %indvars.iv
				%5 = load half, ptr %arrayidx.4, align 4
				%6 = fsub reassoc nsz half %4, %5
				%arrayidx.5 = getelementptr inbounds half, ptr %F, i64 %indvars.iv
				%7 = load half, ptr %arrayidx.5, align 4
				%8 = fsub reassoc nsz half %6, %7
				%arrayidx.6 = getelementptr inbounds half, ptr %G, i64 %indvars.iv
				%9 = load half, ptr %arrayidx.6, align 4
				%10 = fsub reassoc nsz half %8, %9
				%arrayidx.7 = getelementptr inbounds half, ptr %H, i64 %indvars.iv
				%11 = load half, ptr %arrayidx.7, align 4
				%12 = fsub reassoc nsz half %10, %11
				%arrayidx.8 = getelementptr inbounds half, ptr %I, i64 %indvars.iv
				%13 = load half, ptr %arrayidx.8, align 4
				%14 = fsub reassoc nsz half %12, %13
				%arrayidx.9 = getelementptr inbounds half, ptr %A, i64 %indvars.iv
				store half %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @sub_double(
				; CHECK: %[[V0:.*]] = fsub reassoc nsz double {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = fsub reassoc nsz double %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = fsub reassoc nsz double %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = fsub reassoc nsz double %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = fsub reassoc nsz double %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = fsub reassoc nsz double %[[V4]], {{.*}}
				; CHECK: fsub reassoc nsz double %[[V5]], {{.*}}
				define void @sub_double(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds double, ptr %B, i64 %indvars.iv
				%0 = load double, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds double, ptr %C, i64 %indvars.iv
				%1 = load double, ptr %arrayidx.2, align 4
				%2 = fsub reassoc nsz double %0, %1
				%arrayidx.3 = getelementptr inbounds double, ptr %D, i64 %indvars.iv
				%3 = load double, ptr %arrayidx.3, align 4
				%4 = fsub reassoc nsz double %2, %3
				%arrayidx.4 = getelementptr inbounds double, ptr %E, i64 %indvars.iv
				%5 = load double, ptr %arrayidx.4, align 4
				%6 = fsub reassoc nsz double %4, %5
				%arrayidx.5 = getelementptr inbounds double, ptr %F, i64 %indvars.iv
				%7 = load double, ptr %arrayidx.5, align 4
				%8 = fsub reassoc nsz double %6, %7
				%arrayidx.6 = getelementptr inbounds double, ptr %G, i64 %indvars.iv
				%9 = load double, ptr %arrayidx.6, align 4
				%10 = fsub reassoc nsz double %8, %9
				%arrayidx.7 = getelementptr inbounds double, ptr %H, i64 %indvars.iv
				%11 = load double, ptr %arrayidx.7, align 4
				%12 = fsub reassoc nsz double %10, %11
				%arrayidx.8 = getelementptr inbounds double, ptr %I, i64 %indvars.iv
				%13 = load double, ptr %arrayidx.8, align 4
				%14 = fsub reassoc nsz double %12, %13
				%arrayidx.9 = getelementptr inbounds double, ptr %A, i64 %indvars.iv
				store double %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/integer-add-only.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-int-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_i8(
				; CHECK: %[[V0:.*]] = add i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = add i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = add i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i8 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = add i8 %[[V2]], %[[V3]]
				; CHECK-NEXT: add i8 %[[V4]], %[[V5]]
				define void @add_i8(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i8, ptr %B, i64 %indvars.iv
				%0 = load i8, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i8, ptr %C, i64 %indvars.iv
				%1 = load i8, ptr %arrayidx.2, align 1
				%2 = add i8 %1, %0
				%arrayidx.3 = getelementptr inbounds i8, ptr %D, i64 %indvars.iv
				%3 = load i8, ptr %arrayidx.3, align 1
				%4 = add i8 %2, %3
				%arrayidx.4 = getelementptr inbounds i8, ptr %E, i64 %indvars.iv
				%5 = load i8, ptr %arrayidx.4, align 1
				%6 = add i8 %4, %5
				%arrayidx.5 = getelementptr inbounds i8, ptr %F, i64 %indvars.iv
				%7 = load i8, ptr %arrayidx.5, align 1
				%8 = add i8 %6, %7
				%arrayidx.6 = getelementptr inbounds i8, ptr %G, i64 %indvars.iv
				%9 = load i8, ptr %arrayidx.6, align 1
				%10 = add i8 %8, %9
				%arrayidx.7 = getelementptr inbounds i8, ptr %H, i64 %indvars.iv
				%11 = load i8, ptr %arrayidx.7, align 1
				%12 = add i8 %10, %11
				%arrayidx.8 = getelementptr inbounds i8, ptr %I, i64 %indvars.iv
				%13 = load i8, ptr %arrayidx.8, align 1
				%14 = add i8 %12, %13
				%arrayidx.9 = getelementptr inbounds i8, ptr %A, i64 %indvars.iv
				store i8 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_i16(
				; CHECK: %[[V0:.*]] = add i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = add i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = add i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i16 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = add i16 %[[V2]], %[[V3]]
				; CHECK-NEXT: add i16 %[[V4]], %[[V5]]
				define void @add_i16(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i16, ptr %B, i64 %indvars.iv
				%0 = load i16, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i16, ptr %C, i64 %indvars.iv
				%1 = load i16, ptr %arrayidx.2, align 1
				%2 = add i16 %1, %0
				%arrayidx.3 = getelementptr inbounds i16, ptr %D, i64 %indvars.iv
				%3 = load i16, ptr %arrayidx.3, align 1
				%4 = add i16 %2, %3
				%arrayidx.4 = getelementptr inbounds i16, ptr %E, i64 %indvars.iv
				%5 = load i16, ptr %arrayidx.4, align 1
				%6 = add i16 %4, %5
				%arrayidx.5 = getelementptr inbounds i16, ptr %F, i64 %indvars.iv
				%7 = load i16, ptr %arrayidx.5, align 1
				%8 = add i16 %6, %7
				%arrayidx.6 = getelementptr inbounds i16, ptr %G, i64 %indvars.iv
				%9 = load i16, ptr %arrayidx.6, align 1
				%10 = add i16 %8, %9
				%arrayidx.7 = getelementptr inbounds i16, ptr %H, i64 %indvars.iv
				%11 = load i16, ptr %arrayidx.7, align 1
				%12 = add i16 %10, %11
				%arrayidx.8 = getelementptr inbounds i16, ptr %I, i64 %indvars.iv
				%13 = load i16, ptr %arrayidx.8, align 1
				%14 = add i16 %12, %13
				%arrayidx.9 = getelementptr inbounds i16, ptr %A, i64 %indvars.iv
				store i16 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_i32(
				; CHECK: %[[V0:.*]] = add i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = add i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = add i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i32 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = add i32 %[[V2]], %[[V3]]
				; CHECK-NEXT: add i32 %[[V4]], %[[V5]]
				define void @add_i32(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i32, ptr %B, i64 %indvars.iv
				%0 = load i32, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i32, ptr %C, i64 %indvars.iv
				%1 = load i32, ptr %arrayidx.2, align 1
				%2 = add i32 %1, %0
				%arrayidx.3 = getelementptr inbounds i32, ptr %D, i64 %indvars.iv
				%3 = load i32, ptr %arrayidx.3, align 1
				%4 = add i32 %2, %3
				%arrayidx.4 = getelementptr inbounds i32, ptr %E, i64 %indvars.iv
				%5 = load i32, ptr %arrayidx.4, align 1
				%6 = add i32 %4, %5
				%arrayidx.5 = getelementptr inbounds i32, ptr %F, i64 %indvars.iv
				%7 = load i32, ptr %arrayidx.5, align 1
				%8 = add i32 %6, %7
				%arrayidx.6 = getelementptr inbounds i32, ptr %G, i64 %indvars.iv
				%9 = load i32, ptr %arrayidx.6, align 1
				%10 = add i32 %8, %9
				%arrayidx.7 = getelementptr inbounds i32, ptr %H, i64 %indvars.iv
				%11 = load i32, ptr %arrayidx.7, align 1
				%12 = add i32 %10, %11
				%arrayidx.8 = getelementptr inbounds i32, ptr %I, i64 %indvars.iv
				%13 = load i32, ptr %arrayidx.8, align 1
				%14 = add i32 %12, %13
				%arrayidx.9 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
				store i32 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_i64(
				; CHECK: %[[V0:.*]] = add i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = add i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = add i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i64 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = add i64 %[[V2]], %[[V3]]
				; CHECK-NEXT: add i64 %[[V4]], %[[V5]]
				define void @add_i64(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i64, ptr %B, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i64, ptr %C, i64 %indvars.iv
				%1 = load i64, ptr %arrayidx.2, align 1
				%2 = add i64 %1, %0
				%arrayidx.3 = getelementptr inbounds i64, ptr %D, i64 %indvars.iv
				%3 = load i64, ptr %arrayidx.3, align 1
				%4 = add i64 %2, %3
				%arrayidx.4 = getelementptr inbounds i64, ptr %E, i64 %indvars.iv
				%5 = load i64, ptr %arrayidx.4, align 1
				%6 = add i64 %4, %5
				%arrayidx.5 = getelementptr inbounds i64, ptr %F, i64 %indvars.iv
				%7 = load i64, ptr %arrayidx.5, align 1
				%8 = add i64 %6, %7
				%arrayidx.6 = getelementptr inbounds i64, ptr %G, i64 %indvars.iv
				%9 = load i64, ptr %arrayidx.6, align 1
				%10 = add i64 %8, %9
				%arrayidx.7 = getelementptr inbounds i64, ptr %H, i64 %indvars.iv
				%11 = load i64, ptr %arrayidx.7, align 1
				%12 = add i64 %10, %11
				%arrayidx.8 = getelementptr inbounds i64, ptr %I, i64 %indvars.iv
				%13 = load i64, ptr %arrayidx.8, align 1
				%14 = add i64 %12, %13
				%arrayidx.9 = getelementptr inbounds i64, ptr %A, i64 %indvars.iv
				store i64 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}
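
The integer tests in this file use plain `add` with no `nsw`/`nuw` flags, so the additions wrap modulo 2^n. Wrapping addition is associative, which means the chain and the balanced tree produce the same bits even when intermediate sums overflow. A Python sketch for the `i8` case, with hypothetical helper names and arbitrary values chosen to overflow:

```python
def add_i8(a, b):
    """Wrapping 8-bit addition, like LLVM's `add i8` without nsw/nuw."""
    return (a + b) & 0xFF


def chain_add_i8(leaves):
    """Left-to-right chain, as in the test's input IR."""
    acc = leaves[0]
    for x in leaves[1:]:
        acc = add_i8(acc, x)
    return acc


def balanced_add_i8(leaves):
    """Pairwise tree, the shape the CHECK lines match."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [add_i8(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired element moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


leaves = [200, 100, 50, 25, 250, 7, 129, 64]  # several partial sums exceed 255
print(chain_add_i8(leaves), balanced_add_i8(leaves))
```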

llvm/test/Transforms/TreeHeightReduction/integer-add-with-constant.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-int-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_with_constant_i8(
				; CHECK: %[[V0:.*]] = add i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i8 {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = add i8 {{.*}}, 10
				; CHECK-NEXT: %[[V3:.*]] = add i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = add i8 %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = add i8 %[[V3]], %[[V4]]
				; CHECK-NEXT: add i8 %[[V5]], %[[V6]]
				define void @add_with_constant_i8(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i8, ptr %B, i64 %indvars.iv
				%0 = load i8, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i8, ptr %C, i64 %indvars.iv
				%1 = load i8, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds i8, ptr %D, i64 %indvars.iv
				%2 = load i8, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds i8, ptr %E, i64 %indvars.iv
				%3 = load i8, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds i8, ptr %F, i64 %indvars.iv
				%4 = load i8, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds i8, ptr %G, i64 %indvars.iv
				%5 = load i8, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds i8, ptr %H, i64 %indvars.iv
				%6 = load i8, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds i8, ptr %I, i64 %indvars.iv
				%7 = load i8, ptr %arrayidx.8, align 4
				%8 = add i8 %0, 10
				%9 = add i8 %8, %1
				%10 = add i8 %9, %2
				%11 = add i8 %10, %3
				%12 = add i8 %11, %4
				%13 = add i8 %12, %5
				%14 = add i8 %13, %6
				%15 = add i8 %14, %7
				%arrayidx.9 = getelementptr inbounds i8, ptr %A, i64 %indvars.iv
				store i8 %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_with_constant_i16(
				; CHECK: %[[V0:.*]] = add i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i16 {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = add i16 {{.*}}, 10
				; CHECK-NEXT: %[[V3:.*]] = add i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = add i16 %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = add i16 %[[V3]], %[[V4]]
				; CHECK-NEXT: add i16 %[[V5]], %[[V6]]
				define void @add_with_constant_i16(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i16, ptr %B, i64 %indvars.iv
				%0 = load i16, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i16, ptr %C, i64 %indvars.iv
				%1 = load i16, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds i16, ptr %D, i64 %indvars.iv
				%2 = load i16, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds i16, ptr %E, i64 %indvars.iv
				%3 = load i16, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds i16, ptr %F, i64 %indvars.iv
				%4 = load i16, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds i16, ptr %G, i64 %indvars.iv
				%5 = load i16, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds i16, ptr %H, i64 %indvars.iv
				%6 = load i16, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds i16, ptr %I, i64 %indvars.iv
				%7 = load i16, ptr %arrayidx.8, align 4
				%8 = add i16 %0, 10
				%9 = add i16 %8, %1
				%10 = add i16 %9, %2
				%11 = add i16 %10, %3
				%12 = add i16 %11, %4
				%13 = add i16 %12, %5
				%14 = add i16 %13, %6
				%15 = add i16 %14, %7
				%arrayidx.9 = getelementptr inbounds i16, ptr %A, i64 %indvars.iv
				store i16 %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_with_constant_i32(
				; CHECK: %[[V0:.*]] = add i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i32 {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = add i32 {{.*}}, 10
				; CHECK-NEXT: %[[V3:.*]] = add i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = add i32 %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = add i32 %[[V3]], %[[V4]]
				; CHECK-NEXT: add i32 %[[V5]], %[[V6]]
				define void @add_with_constant_i32(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i32, ptr %B, i64 %indvars.iv
				%0 = load i32, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i32, ptr %C, i64 %indvars.iv
				%1 = load i32, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds i32, ptr %D, i64 %indvars.iv
				%2 = load i32, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds i32, ptr %E, i64 %indvars.iv
				%3 = load i32, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds i32, ptr %F, i64 %indvars.iv
				%4 = load i32, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds i32, ptr %G, i64 %indvars.iv
				%5 = load i32, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds i32, ptr %H, i64 %indvars.iv
				%6 = load i32, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds i32, ptr %I, i64 %indvars.iv
				%7 = load i32, ptr %arrayidx.8, align 4
				%8 = add i32 %0, 10
				%9 = add i32 %8, %1
				%10 = add i32 %9, %2
				%11 = add i32 %10, %3
				%12 = add i32 %11, %4
				%13 = add i32 %12, %5
				%14 = add i32 %13, %6
				%15 = add i32 %14, %7
				%arrayidx.9 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
				store i32 %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @add_with_constant_i64(
				; CHECK: %[[V0:.*]] = add i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add i64 {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = add i64 {{.*}}, 10
				; CHECK-NEXT: %[[V3:.*]] = add i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = add i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = add i64 %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = add i64 %[[V3]], %[[V4]]
				; CHECK-NEXT: add i64 %[[V5]], %[[V6]]
				define void @add_with_constant_i64(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i64, ptr %B, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i64, ptr %C, i64 %indvars.iv
				%1 = load i64, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds i64, ptr %D, i64 %indvars.iv
				%2 = load i64, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds i64, ptr %E, i64 %indvars.iv
				%3 = load i64, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds i64, ptr %F, i64 %indvars.iv
				%4 = load i64, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds i64, ptr %G, i64 %indvars.iv
				%5 = load i64, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds i64, ptr %H, i64 %indvars.iv
				%6 = load i64, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds i64, ptr %I, i64 %indvars.iv
				%7 = load i64, ptr %arrayidx.8, align 4
				%8 = add i64 %0, 10
				%9 = add i64 %8, %1
				%10 = add i64 %9, %2
				%11 = add i64 %10, %3
				%12 = add i64 %11, %4
				%13 = add i64 %12, %5
				%14 = add i64 %13, %6
				%15 = add i64 %14, %7
				%arrayidx.9 = getelementptr inbounds i64, ptr %A, i64 %indvars.iv
				store i64 %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/integer-mult-only.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-int-thr < %s | FileCheck %s

				; CHECK-LABEL: @mul_i8(
				; CHECK: %[[V0:.*]] = mul i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = mul i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = mul i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = mul i8 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = mul i8 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = mul i8 %[[V2]], %[[V3]]
				; CHECK-NEXT: mul i8 %[[V4]], %[[V5]]
				define void @mul_i8(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i8, ptr %B, i64 %indvars.iv
				%0 = load i8, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i8, ptr %C, i64 %indvars.iv
				%1 = load i8, ptr %arrayidx.2, align 1
				%2 = mul i8 %1, %0
				%arrayidx.3 = getelementptr inbounds i8, ptr %D, i64 %indvars.iv
				%3 = load i8, ptr %arrayidx.3, align 1
				%4 = mul i8 %2, %3
				%arrayidx.4 = getelementptr inbounds i8, ptr %E, i64 %indvars.iv
				%5 = load i8, ptr %arrayidx.4, align 1
				%6 = mul i8 %4, %5
				%arrayidx.5 = getelementptr inbounds i8, ptr %F, i64 %indvars.iv
				%7 = load i8, ptr %arrayidx.5, align 1
				%8 = mul i8 %6, %7
				%arrayidx.6 = getelementptr inbounds i8, ptr %G, i64 %indvars.iv
				%9 = load i8, ptr %arrayidx.6, align 1
				%10 = mul i8 %8, %9
				%arrayidx.7 = getelementptr inbounds i8, ptr %H, i64 %indvars.iv
				%11 = load i8, ptr %arrayidx.7, align 1
				%12 = mul i8 %10, %11
				%arrayidx.8 = getelementptr inbounds i8, ptr %I, i64 %indvars.iv
				%13 = load i8, ptr %arrayidx.8, align 1
				%14 = mul i8 %12, %13
				%arrayidx.9 = getelementptr inbounds i8, ptr %A, i64 %indvars.iv
				store i8 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @mul_i16(
				; CHECK: %[[V0:.*]] = mul i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = mul i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = mul i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = mul i16 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = mul i16 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = mul i16 %[[V2]], %[[V3]]
				; CHECK-NEXT: mul i16 %[[V4]], %[[V5]]
				define void @mul_i16(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i16, ptr %B, i64 %indvars.iv
				%0 = load i16, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i16, ptr %C, i64 %indvars.iv
				%1 = load i16, ptr %arrayidx.2, align 1
				%2 = mul i16 %1, %0
				%arrayidx.3 = getelementptr inbounds i16, ptr %D, i64 %indvars.iv
				%3 = load i16, ptr %arrayidx.3, align 1
				%4 = mul i16 %2, %3
				%arrayidx.4 = getelementptr inbounds i16, ptr %E, i64 %indvars.iv
				%5 = load i16, ptr %arrayidx.4, align 1
				%6 = mul i16 %4, %5
				%arrayidx.5 = getelementptr inbounds i16, ptr %F, i64 %indvars.iv
				%7 = load i16, ptr %arrayidx.5, align 1
				%8 = mul i16 %6, %7
				%arrayidx.6 = getelementptr inbounds i16, ptr %G, i64 %indvars.iv
				%9 = load i16, ptr %arrayidx.6, align 1
				%10 = mul i16 %8, %9
				%arrayidx.7 = getelementptr inbounds i16, ptr %H, i64 %indvars.iv
				%11 = load i16, ptr %arrayidx.7, align 1
				%12 = mul i16 %10, %11
				%arrayidx.8 = getelementptr inbounds i16, ptr %I, i64 %indvars.iv
				%13 = load i16, ptr %arrayidx.8, align 1
				%14 = mul i16 %12, %13
				%arrayidx.9 = getelementptr inbounds i16, ptr %A, i64 %indvars.iv
				store i16 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @mul_i32(
				; CHECK: %[[V0:.*]] = mul i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = mul i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = mul i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = mul i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = mul i32 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = mul i32 %[[V2]], %[[V3]]
				; CHECK-NEXT: mul i32 %[[V4]], %[[V5]]
				define void @mul_i32(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i32, ptr %B, i64 %indvars.iv
				%0 = load i32, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i32, ptr %C, i64 %indvars.iv
				%1 = load i32, ptr %arrayidx.2, align 1
				%2 = mul i32 %1, %0
				%arrayidx.3 = getelementptr inbounds i32, ptr %D, i64 %indvars.iv
				%3 = load i32, ptr %arrayidx.3, align 1
				%4 = mul i32 %2, %3
				%arrayidx.4 = getelementptr inbounds i32, ptr %E, i64 %indvars.iv
				%5 = load i32, ptr %arrayidx.4, align 1
				%6 = mul i32 %4, %5
				%arrayidx.5 = getelementptr inbounds i32, ptr %F, i64 %indvars.iv
				%7 = load i32, ptr %arrayidx.5, align 1
				%8 = mul i32 %6, %7
				%arrayidx.6 = getelementptr inbounds i32, ptr %G, i64 %indvars.iv
				%9 = load i32, ptr %arrayidx.6, align 1
				%10 = mul i32 %8, %9
				%arrayidx.7 = getelementptr inbounds i32, ptr %H, i64 %indvars.iv
				%11 = load i32, ptr %arrayidx.7, align 1
				%12 = mul i32 %10, %11
				%arrayidx.8 = getelementptr inbounds i32, ptr %I, i64 %indvars.iv
				%13 = load i32, ptr %arrayidx.8, align 1
				%14 = mul i32 %12, %13
				%arrayidx.9 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
				store i32 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @mul_i64(
				; CHECK: %[[V0:.*]] = mul i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = mul i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = mul i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = mul i64 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = mul i64 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = mul i64 %[[V2]], %[[V3]]
				; CHECK-NEXT: mul i64 %[[V4]], %[[V5]]
				define void @mul_i64(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i64, ptr %B, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx.1, align 1
				%arrayidx.2 = getelementptr inbounds i64, ptr %C, i64 %indvars.iv
				%1 = load i64, ptr %arrayidx.2, align 1
				%2 = mul i64 %1, %0
				%arrayidx.3 = getelementptr inbounds i64, ptr %D, i64 %indvars.iv
				%3 = load i64, ptr %arrayidx.3, align 1
				%4 = mul i64 %2, %3
				%arrayidx.4 = getelementptr inbounds i64, ptr %E, i64 %indvars.iv
				%5 = load i64, ptr %arrayidx.4, align 1
				%6 = mul i64 %4, %5
				%arrayidx.5 = getelementptr inbounds i64, ptr %F, i64 %indvars.iv
				%7 = load i64, ptr %arrayidx.5, align 1
				%8 = mul i64 %6, %7
				%arrayidx.6 = getelementptr inbounds i64, ptr %G, i64 %indvars.iv
				%9 = load i64, ptr %arrayidx.6, align 1
				%10 = mul i64 %8, %9
				%arrayidx.7 = getelementptr inbounds i64, ptr %H, i64 %indvars.iv
				%11 = load i64, ptr %arrayidx.7, align 1
				%12 = mul i64 %10, %11
				%arrayidx.8 = getelementptr inbounds i64, ptr %I, i64 %indvars.iv
				%13 = load i64, ptr %arrayidx.8, align 1
				%14 = mul i64 %12, %13
				%arrayidx.9 = getelementptr inbounds i64, ptr %A, i64 %indvars.iv
				store i64 %14, ptr %arrayidx.9, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/integer-sub-only.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-int-thr < %s | FileCheck %s

				; CHECK-LABEL: @sub_i8(
				; CHECK: %[[V0:.*]] = sub i8 {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = sub i8 %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = sub i8 %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = sub i8 %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = sub i8 %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = sub i8 %[[V4]], {{.*}}
				; CHECK: sub i8 %[[V5]], {{.*}}
				define void @sub_i8(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i8, ptr %B, i64 %indvars.iv
				%0 = load i8, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i8, ptr %C, i64 %indvars.iv
				%1 = load i8, ptr %arrayidx.2, align 4
				%2 = sub i8 %0, %1
				%arrayidx.3 = getelementptr inbounds i8, ptr %D, i64 %indvars.iv
				%3 = load i8, ptr %arrayidx.3, align 4
				%4 = sub i8 %2, %3
				%arrayidx.4 = getelementptr inbounds i8, ptr %E, i64 %indvars.iv
				%5 = load i8, ptr %arrayidx.4, align 4
				%6 = sub i8 %4, %5
				%arrayidx.5 = getelementptr inbounds i8, ptr %F, i64 %indvars.iv
				%7 = load i8, ptr %arrayidx.5, align 4
				%8 = sub i8 %6, %7
				%arrayidx.6 = getelementptr inbounds i8, ptr %G, i64 %indvars.iv
				%9 = load i8, ptr %arrayidx.6, align 4
				%10 = sub i8 %8, %9
				%arrayidx.7 = getelementptr inbounds i8, ptr %H, i64 %indvars.iv
				%11 = load i8, ptr %arrayidx.7, align 4
				%12 = sub i8 %10, %11
				%arrayidx.8 = getelementptr inbounds i8, ptr %I, i64 %indvars.iv
				%13 = load i8, ptr %arrayidx.8, align 4
				%14 = sub i8 %12, %13
				%arrayidx.9 = getelementptr inbounds i8, ptr %A, i64 %indvars.iv
				store i8 %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @sub_i16(
				; CHECK: %[[V0:.*]] = sub i16 {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = sub i16 %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = sub i16 %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = sub i16 %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = sub i16 %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = sub i16 %[[V4]], {{.*}}
				; CHECK: sub i16 %[[V5]], {{.*}}
				define void @sub_i16(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i16, ptr %B, i64 %indvars.iv
				%0 = load i16, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i16, ptr %C, i64 %indvars.iv
				%1 = load i16, ptr %arrayidx.2, align 4
				%2 = sub i16 %0, %1
				%arrayidx.3 = getelementptr inbounds i16, ptr %D, i64 %indvars.iv
				%3 = load i16, ptr %arrayidx.3, align 4
				%4 = sub i16 %2, %3
				%arrayidx.4 = getelementptr inbounds i16, ptr %E, i64 %indvars.iv
				%5 = load i16, ptr %arrayidx.4, align 4
				%6 = sub i16 %4, %5
				%arrayidx.5 = getelementptr inbounds i16, ptr %F, i64 %indvars.iv
				%7 = load i16, ptr %arrayidx.5, align 4
				%8 = sub i16 %6, %7
				%arrayidx.6 = getelementptr inbounds i16, ptr %G, i64 %indvars.iv
				%9 = load i16, ptr %arrayidx.6, align 4
				%10 = sub i16 %8, %9
				%arrayidx.7 = getelementptr inbounds i16, ptr %H, i64 %indvars.iv
				%11 = load i16, ptr %arrayidx.7, align 4
				%12 = sub i16 %10, %11
				%arrayidx.8 = getelementptr inbounds i16, ptr %I, i64 %indvars.iv
				%13 = load i16, ptr %arrayidx.8, align 4
				%14 = sub i16 %12, %13
				%arrayidx.9 = getelementptr inbounds i16, ptr %A, i64 %indvars.iv
				store i16 %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @sub_i32(
				; CHECK: %[[V0:.*]] = sub i32 {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = sub i32 %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = sub i32 %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = sub i32 %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = sub i32 %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = sub i32 %[[V4]], {{.*}}
				; CHECK: sub i32 %[[V5]], {{.*}}
				define void @sub_i32(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i32, ptr %B, i64 %indvars.iv
				%0 = load i32, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i32, ptr %C, i64 %indvars.iv
				%1 = load i32, ptr %arrayidx.2, align 4
				%2 = sub i32 %0, %1
				%arrayidx.3 = getelementptr inbounds i32, ptr %D, i64 %indvars.iv
				%3 = load i32, ptr %arrayidx.3, align 4
				%4 = sub i32 %2, %3
				%arrayidx.4 = getelementptr inbounds i32, ptr %E, i64 %indvars.iv
				%5 = load i32, ptr %arrayidx.4, align 4
				%6 = sub i32 %4, %5
				%arrayidx.5 = getelementptr inbounds i32, ptr %F, i64 %indvars.iv
				%7 = load i32, ptr %arrayidx.5, align 4
				%8 = sub i32 %6, %7
				%arrayidx.6 = getelementptr inbounds i32, ptr %G, i64 %indvars.iv
				%9 = load i32, ptr %arrayidx.6, align 4
				%10 = sub i32 %8, %9
				%arrayidx.7 = getelementptr inbounds i32, ptr %H, i64 %indvars.iv
				%11 = load i32, ptr %arrayidx.7, align 4
				%12 = sub i32 %10, %11
				%arrayidx.8 = getelementptr inbounds i32, ptr %I, i64 %indvars.iv
				%13 = load i32, ptr %arrayidx.8, align 4
				%14 = sub i32 %12, %13
				%arrayidx.9 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
				store i32 %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @sub_i64(
				; CHECK: %[[V0:.*]] = sub i64 {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = sub i64 %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = sub i64 %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = sub i64 %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = sub i64 %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = sub i64 %[[V4]], {{.*}}
				; CHECK: sub i64 %[[V5]], {{.*}}
				define void @sub_i64(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i64, ptr %B, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i64, ptr %C, i64 %indvars.iv
				%1 = load i64, ptr %arrayidx.2, align 4
				%2 = sub i64 %0, %1
				%arrayidx.3 = getelementptr inbounds i64, ptr %D, i64 %indvars.iv
				%3 = load i64, ptr %arrayidx.3, align 4
				%4 = sub i64 %2, %3
				%arrayidx.4 = getelementptr inbounds i64, ptr %E, i64 %indvars.iv
				%5 = load i64, ptr %arrayidx.4, align 4
				%6 = sub i64 %4, %5
				%arrayidx.5 = getelementptr inbounds i64, ptr %F, i64 %indvars.iv
				%7 = load i64, ptr %arrayidx.5, align 4
				%8 = sub i64 %6, %7
				%arrayidx.6 = getelementptr inbounds i64, ptr %G, i64 %indvars.iv
				%9 = load i64, ptr %arrayidx.6, align 4
				%10 = sub i64 %8, %9
				%arrayidx.7 = getelementptr inbounds i64, ptr %H, i64 %indvars.iv
				%11 = load i64, ptr %arrayidx.7, align 4
				%12 = sub i64 %10, %11
				%arrayidx.8 = getelementptr inbounds i64, ptr %I, i64 %indvars.iv
				%13 = load i64, ptr %arrayidx.8, align 4
				%14 = sub i64 %12, %13
				%arrayidx.9 = getelementptr inbounds i64, ptr %A, i64 %indvars.iv
				store i64 %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/leaf-num-check.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-int-thr < %s | FileCheck %s

				; CHECK-LABEL: @leaf_num_is_3(
				; CHECK: %[[V0:.*]] = add nsw i32 {{.*}}, {{.*}}
				; CHECK: add nsw i32 %[[V0]], {{.*}}
				define void @leaf_num_is_3(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i32, ptr %B, i64 %indvars.iv
				%0 = load i32, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i32, ptr %C, i64 %indvars.iv
				%1 = load i32, ptr %arrayidx.2, align 4
				%2 = add nsw i32 %1, %0
				%arrayidx.3 = getelementptr inbounds i32, ptr %D, i64 %indvars.iv
				%3 = load i32, ptr %arrayidx.3, align 4
				%4 = add nsw i32 %2, %3
				%arrayidx.4 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
				store i32 %4, ptr %arrayidx.1, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @leaf_num_is_4(
				; CHECK: %[[V0:.*]] = add nsw i32 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = add nsw i32 {{.*}}, {{.*}}
				; CHECK-NEXT: add nsw i32 %[[V0]], %[[V1]]
				define void @leaf_num_is_4(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds i32, ptr %B, i64 %indvars.iv
				%0 = load i32, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds i32, ptr %C, i64 %indvars.iv
				%1 = load i32, ptr %arrayidx.2, align 4
				%2 = add nsw i32 %1, %0
				%arrayidx.3 = getelementptr inbounds i32, ptr %D, i64 %indvars.iv
				%3 = load i32, ptr %arrayidx.3, align 4
				%4 = add nsw i32 %2, %3
				%arrayidx.4 = getelementptr inbounds i32, ptr %E, i64 %indvars.iv
				%5 = load i32, ptr %arrayidx.4, align 4
				%6 = add nsw i32 %4, %5
				%arrayidx.5 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
				store i32 %6, ptr %arrayidx.5, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}
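
One way to read the two cases in leaf-num-check.ll: with three leaves the expected output is still a chain, because any binary tree over three operands has height 2, while with four leaves a balanced tree has height 2 instead of 3. The C sketch below only illustrates that height arithmetic. It is not the pass's actual cost model, and the function names `chain_height`, `balanced_height`, and `rebalancing_helps` are hypothetical.

```c
#include <assert.h>

/* Height of a left-leaning chain ((a0 op a1) op a2) ... over `leaves`
   operands: every operation waits on the result of the previous one. */
static unsigned chain_height(unsigned leaves) {
    assert(leaves >= 1);
    return leaves - 1;
}

/* Height of a balanced binary tree over the same operands,
   i.e. ceil(log2(leaves)), computed without floating point. */
static unsigned balanced_height(unsigned leaves) {
    assert(leaves >= 1);
    unsigned height = 0;
    unsigned capacity = 1; /* number of leaves a tree of this height can hold */
    while (capacity < leaves) {
        capacity *= 2;
        ++height;
    }
    return height;
}

/* Hypothetical profitability check: rebalancing is worthwhile only if it
   shortens the dependency chain. */
static int rebalancing_helps(unsigned leaves) {
    return balanced_height(leaves) < chain_height(leaves);
}
```

Under this reading, the nine-leaf case in integer-add-with-constant.ll (eight loads plus the constant 10) drops from height 8 to height 4, which matches the depth of the expected `add` tree in its CHECK lines.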

llvm/test/Transforms/TreeHeightReduction/long-double-add-with-constant.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_fp128_with_constant(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz fp128 {{.*}}, %[[V0]]
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz fp128 {{.*}}, 0xL00000000000000004002400000000000
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz fp128 %[[V1]], %[[V2]]
				; CHECK-NEXT: %[[V6:.*]] = fadd reassoc nsz fp128 %[[V3]], %[[V4]]
				; CHECK-NEXT: fadd reassoc nsz fp128 %[[V5]], %[[V6]]
				define void @add_fp128_with_constant(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds fp128, ptr %B, i64 %indvars.iv
				%0 = load fp128, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds fp128, ptr %C, i64 %indvars.iv
				%1 = load fp128, ptr %arrayidx.2, align 4
				%arrayidx.3 = getelementptr inbounds fp128, ptr %D, i64 %indvars.iv
				%2 = load fp128, ptr %arrayidx.3, align 4
				%arrayidx.4 = getelementptr inbounds fp128, ptr %E, i64 %indvars.iv
				%3 = load fp128, ptr %arrayidx.4, align 4
				%arrayidx.5 = getelementptr inbounds fp128, ptr %F, i64 %indvars.iv
				%4 = load fp128, ptr %arrayidx.5, align 4
				%arrayidx.6 = getelementptr inbounds fp128, ptr %G, i64 %indvars.iv
				%5 = load fp128, ptr %arrayidx.6, align 4
				%arrayidx.7 = getelementptr inbounds fp128, ptr %H, i64 %indvars.iv
				%6 = load fp128, ptr %arrayidx.7, align 4
				%arrayidx.8 = getelementptr inbounds fp128, ptr %I, i64 %indvars.iv
				%7 = load fp128, ptr %arrayidx.8, align 4
				%8 = fadd reassoc nsz fp128 %0, 0xL00000000000000004002400000000000
				%9 = fadd reassoc nsz fp128 %8, %1
				%10 = fadd reassoc nsz fp128 %9, %2
				%11 = fadd reassoc nsz fp128 %10, %3
				%12 = fadd reassoc nsz fp128 %11, %4
				%13 = fadd reassoc nsz fp128 %12, %5
				%14 = fadd reassoc nsz fp128 %13, %6
				%15 = fadd reassoc nsz fp128 %14, %7
				%arrayidx.9 = getelementptr inbounds fp128, ptr %A, i64 %indvars.iv
				store fp128 %15, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/long-double-add.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_fp128(
				; CHECK: %[[V0:.*]] = fadd reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fadd reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fadd reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fadd reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fadd reassoc nsz fp128 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fadd reassoc nsz fp128 %[[V2]], %[[V3]]
				; CHECK-NEXT: fadd reassoc nsz fp128 %[[V4]], %[[V5]]
				define void @add_fp128(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds fp128, ptr %B, i64 %indvars.iv
				%0 = load fp128, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds fp128, ptr %C, i64 %indvars.iv
				%1 = load fp128, ptr %arrayidx.2, align 4
				%2 = fadd reassoc nsz fp128 %1, %0
				%arrayidx.3 = getelementptr inbounds fp128, ptr %D, i64 %indvars.iv
				%3 = load fp128, ptr %arrayidx.3, align 4
				%4 = fadd reassoc nsz fp128 %2, %3
				%arrayidx.4 = getelementptr inbounds fp128, ptr %E, i64 %indvars.iv
				%5 = load fp128, ptr %arrayidx.4, align 4
				%6 = fadd reassoc nsz fp128 %4, %5
				%arrayidx.5 = getelementptr inbounds fp128, ptr %F, i64 %indvars.iv
				%7 = load fp128, ptr %arrayidx.5, align 4
				%8 = fadd reassoc nsz fp128 %6, %7
				%arrayidx.6 = getelementptr inbounds fp128, ptr %G, i64 %indvars.iv
				%9 = load fp128, ptr %arrayidx.6, align 4
				%10 = fadd reassoc nsz fp128 %8, %9
				%arrayidx.7 = getelementptr inbounds fp128, ptr %H, i64 %indvars.iv
				%11 = load fp128, ptr %arrayidx.7, align 4
				%12 = fadd reassoc nsz fp128 %10, %11
				%arrayidx.8 = getelementptr inbounds fp128, ptr %I, i64 %indvars.iv
				%13 = load fp128, ptr %arrayidx.8, align 4
				%14 = fadd reassoc nsz fp128 %12, %13
				%arrayidx.9 = getelementptr inbounds fp128, ptr %A, i64 %indvars.iv
				store fp128 %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/long-double-mult.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @add_fp128(
				; CHECK: %[[V0:.*]] = fmul reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V1:.*]] = fmul reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V2:.*]] = fmul reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V3:.*]] = fmul reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK-NEXT: %[[V4:.*]] = fmul reassoc nsz fp128 %[[V0]], %[[V1]]
				; CHECK-NEXT: %[[V5:.*]] = fmul reassoc nsz fp128 %[[V2]], %[[V3]]
				; CHECK-NEXT: fmul reassoc nsz fp128 %[[V4]], %[[V5]]
				define void @add_fp128(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds fp128, ptr %B, i64 %indvars.iv
				%0 = load fp128, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds fp128, ptr %C, i64 %indvars.iv
				%1 = load fp128, ptr %arrayidx.2, align 4
				%2 = fmul reassoc nsz fp128 %1, %0
				%arrayidx.3 = getelementptr inbounds fp128, ptr %D, i64 %indvars.iv
				%3 = load fp128, ptr %arrayidx.3, align 4
				%4 = fmul reassoc nsz fp128 %2, %3
				%arrayidx.4 = getelementptr inbounds fp128, ptr %E, i64 %indvars.iv
				%5 = load fp128, ptr %arrayidx.4, align 4
				%6 = fmul reassoc nsz fp128 %4, %5
				%arrayidx.5 = getelementptr inbounds fp128, ptr %F, i64 %indvars.iv
				%7 = load fp128, ptr %arrayidx.5, align 4
				%8 = fmul reassoc nsz fp128 %6, %7
				%arrayidx.6 = getelementptr inbounds fp128, ptr %G, i64 %indvars.iv
				%9 = load fp128, ptr %arrayidx.6, align 4
				%10 = fmul reassoc nsz fp128 %8, %9
				%arrayidx.7 = getelementptr inbounds fp128, ptr %H, i64 %indvars.iv
				%11 = load fp128, ptr %arrayidx.7, align 4
				%12 = fmul reassoc nsz fp128 %10, %11
				%arrayidx.8 = getelementptr inbounds fp128, ptr %I, i64 %indvars.iv
				%13 = load fp128, ptr %arrayidx.8, align 4
				%14 = fmul reassoc nsz fp128 %12, %13
				%arrayidx.9 = getelementptr inbounds fp128, ptr %A, i64 %indvars.iv
				store fp128 %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

llvm/test/Transforms/TreeHeightReduction/long-double-sub.ll

This file was added.

				; RUN: opt -S -passes=tree-height-reduction -enable-fp-thr < %s | FileCheck %s

				; CHECK-LABEL: @sub_fp128(
				; CHECK: %[[V0:.*]] = fsub reassoc nsz fp128 {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = fsub reassoc nsz fp128 %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = fsub reassoc nsz fp128 %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = fsub reassoc nsz fp128 %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = fsub reassoc nsz fp128 %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = fsub reassoc nsz fp128 %[[V4]], {{.*}}
				; CHECK: fsub reassoc nsz fp128 %[[V5]], {{.*}}
				define void @sub_fp128(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds fp128, ptr %B, i64 %indvars.iv
				%0 = load fp128, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds fp128, ptr %C, i64 %indvars.iv
				%1 = load fp128, ptr %arrayidx.2, align 4
				%2 = fsub reassoc nsz fp128 %0, %1
				%arrayidx.3 = getelementptr inbounds fp128, ptr %D, i64 %indvars.iv
				%3 = load fp128, ptr %arrayidx.3, align 4
				%4 = fsub reassoc nsz fp128 %2, %3
				%arrayidx.4 = getelementptr inbounds fp128, ptr %E, i64 %indvars.iv
				%5 = load fp128, ptr %arrayidx.4, align 4
				%6 = fsub reassoc nsz fp128 %4, %5
				%arrayidx.5 = getelementptr inbounds fp128, ptr %F, i64 %indvars.iv
				%7 = load fp128, ptr %arrayidx.5, align 4
				%8 = fsub reassoc nsz fp128 %6, %7
				%arrayidx.6 = getelementptr inbounds fp128, ptr %G, i64 %indvars.iv
				%9 = load fp128, ptr %arrayidx.6, align 4
				%10 = fsub reassoc nsz fp128 %8, %9
				%arrayidx.7 = getelementptr inbounds fp128, ptr %H, i64 %indvars.iv
				%11 = load fp128, ptr %arrayidx.7, align 4
				%12 = fsub reassoc nsz fp128 %10, %11
				%arrayidx.8 = getelementptr inbounds fp128, ptr %I, i64 %indvars.iv
				%13 = load fp128, ptr %arrayidx.8, align 4
				%14 = fsub reassoc nsz fp128 %12, %13
				%arrayidx.9 = getelementptr inbounds fp128, ptr %A, i64 %indvars.iv
				store fp128 %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; CHECK-LABEL: @sub_double(
				; CHECK: %[[V0:.*]] = fsub reassoc nsz double {{.*}}, {{.*}}
				; CHECK: %[[V1:.*]] = fsub reassoc nsz double %[[V0]], %{{.*}}
				; CHECK: %[[V2:.*]] = fsub reassoc nsz double %[[V1]], {{.*}}
				; CHECK: %[[V3:.*]] = fsub reassoc nsz double %[[V2]], {{.*}}
				; CHECK: %[[V4:.*]] = fsub reassoc nsz double %[[V3]], {{.*}}
				; CHECK: %[[V5:.*]] = fsub reassoc nsz double %[[V4]], {{.*}}
				; CHECK: fsub reassoc nsz double %[[V5]], {{.*}}
				define void @sub_double(ptr noalias %A, ptr noalias %B, ptr noalias %C, ptr noalias %D, ptr noalias %E, ptr noalias %F, ptr noalias %G, ptr noalias %H, ptr noalias %I, i32 %N) norecurse nounwind {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %preh, label %for.end

				preh: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %preh
				%indvars.iv = phi i64 [ 0, %preh ], [ %indvars.iv.next, %for.body ]
				%arrayidx.1 = getelementptr inbounds double, ptr %B, i64 %indvars.iv
				%0 = load double, ptr %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds double, ptr %C, i64 %indvars.iv
				%1 = load double, ptr %arrayidx.2, align 4
				%2 = fsub reassoc nsz double %0, %1
				%arrayidx.3 = getelementptr inbounds double, ptr %D, i64 %indvars.iv
				%3 = load double, ptr %arrayidx.3, align 4
				%4 = fsub reassoc nsz double %2, %3
				%arrayidx.4 = getelementptr inbounds double, ptr %E, i64 %indvars.iv
				%5 = load double, ptr %arrayidx.4, align 4
				%6 = fsub reassoc nsz double %4, %5
				%arrayidx.5 = getelementptr inbounds double, ptr %F, i64 %indvars.iv
				%7 = load double, ptr %arrayidx.5, align 4
				%8 = fsub reassoc nsz double %6, %7
				%arrayidx.6 = getelementptr inbounds double, ptr %G, i64 %indvars.iv
				%9 = load double, ptr %arrayidx.6, align 4
				%10 = fsub reassoc nsz double %8, %9
				%arrayidx.7 = getelementptr inbounds double, ptr %H, i64 %indvars.iv
				%11 = load double, ptr %arrayidx.7, align 4
				%12 = fsub reassoc nsz double %10, %11
				%arrayidx.8 = getelementptr inbounds double, ptr %I, i64 %indvars.iv
				%13 = load double, ptr %arrayidx.8, align 4
				%14 = fsub reassoc nsz double %12, %13
				%arrayidx.9 = getelementptr inbounds double, ptr %A, i64 %indvars.iv
				store double %14, ptr %arrayidx.9, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}


Diff 456269
