This is an archive of the discontinued LLVM Phabricator instance.

Differential D159151

[flang][hlfir] Expand hlfir.assign's with scalar RHS.
Closed, Public

Authored by vzakhari on Aug 29 2023, 5:53 PM.

Details

Reviewers

tblah
jeanPerier

Commits

rGe60dc8ed7eec: [flang][hlfir] Expand hlfir.assign's with scalar RHS.

Summary

Expanding hlfir.assign's with scalar RHS late in the MLIR optimization
pipeline allows LLVM to recognize most of them as simple memset loops.
This is especially important for small LHS arrays, because
the assign loop nest may be completely unrolled, enabling more value
propagation.
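
As a rough illustration of the summary, the sketch below writes a scalar-to-array assignment as an explicit loop nest over a small fixed-size array and compares it with a flat fill. It is not code from the patch: the names `Array6x6`, `broadcastAssign` and `filled` are hypothetical, and the 6x6 shape is borrowed from the example in the patch's doc comment.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed shape, borrowed from the 6x6 example in the patch.
constexpr std::size_t kRows = 6;
constexpr std::size_t kCols = 6;
using Array6x6 = std::array<float, kRows * kCols>;

// Element-by-element form of "array = scalar". Storage is treated as
// column-major (Fortran order), so the inner loop walks memory contiguously.
inline void broadcastAssign(Array6x6 &lhs, float value) {
  for (std::size_t j = 0; j < kCols; ++j)
    for (std::size_t i = 0; i < kRows; ++i)
      lhs[i + j * kRows] = value;
}

// Reference result: a flat fill over the same storage.
inline Array6x6 filled(float value) {
  Array6x6 a{};
  std::fill(a.begin(), a.end(), value);
  return a;
}
```

Because the trip counts are compile-time constants and the stores are contiguous, a loop nest of this shape is the kind an optimizer can usually turn into a fill or unroll completely, which is the effect the summary is after.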

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vzakhari created this revision. Aug 29 2023, 5:53 PM

Herald added a project: Restricted Project. Aug 29 2023, 5:53 PM

Herald added subscribers: bzcheeseman, mehdi_amini, rriddle, jdoerfert.

vzakhari requested review of this revision. Aug 29 2023, 5:53 PM

Herald added a subscriber: stephenneuendorffer. Aug 29 2023, 5:53 PM

This addresses one performance problem in Polyhedron/fatigue2.

Another problem is in assignments like this:

subroutine test(x)
  real :: x(:,:)
  real :: y(3,3)
  y(:,:) = x(:,:)
end subroutine test

So we have to optimize hlfir.assign where the RHS is not an elemental but a "variable". We just need to prove that the LHS and RHS do not conflict. I am going to work on this next.
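
To make the "do not conflict" condition concrete, here is a small C++ sketch of why overlap matters for an array-to-array assignment. It is only an analogy: a compiler pass would have to establish the absence of overlap statically (for example through alias analysis), whereas this sketch checks address ranges at run time. The helpers `mayOverlap` and `assignArray` are hypothetical names, not part of flang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true when the element ranges [a, a + n) and
// [b, b + m) share at least one byte of storage.
inline bool mayOverlap(const float *a, std::size_t n,
                       const float *b, std::size_t m) {
  const auto aBegin = reinterpret_cast<std::uintptr_t>(a);
  const auto bBegin = reinterpret_cast<std::uintptr_t>(b);
  const auto aEnd = aBegin + n * sizeof(float);
  const auto bEnd = bBegin + m * sizeof(float);
  return aBegin < bEnd && bBegin < aEnd;
}

// Fortran semantics: the whole RHS is evaluated before the LHS is modified.
// A plain element loop honors that only when the ranges are disjoint;
// otherwise the RHS is copied to a temporary first.
inline void assignArray(float *lhs, const float *rhs, std::size_t n) {
  if (!mayOverlap(lhs, n, rhs, n)) {
    for (std::size_t i = 0; i < n; ++i)
      lhs[i] = rhs[i];
    return;
  }
  std::vector<float> tmp(rhs, rhs + n);
  for (std::size_t i = 0; i < n; ++i)
    lhs[i] = tmp[i];
}
```

In the Fortran example above, `y` is a local array and `x` is a dummy argument, so the two cannot share storage and the direct loop would be the expected outcome.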

flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
433	@tblah I wanted to check with you before making this change. If you are extending the pass for cam, then I would like to postpone it. Please let me know.

Harbormaster completed remote builds in B255661: Diff 554537. Aug 29 2023, 9:13 PM

vzakhari added a child revision: D159246: [flang][hlfir] Expand array hlfir.assign's. Aug 30 2023, 6:13 PM

LGTM, thanks for this!

flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
433	I have no changes to this pass in progress so feel free to go ahead with changes. I wonder if running it on hlfir.assign might be premature because we might find optimizable bufferizations which don't involve a hlfir.assign (although I don't have any in mind currently). But yeah I like the idea of having some central heuristic for deciding which pattern to apply.

tblah accepted this revision. Aug 31 2023, 3:01 AM

This revision is now accepted and ready to land. Aug 31 2023, 3:01 AM

vzakhari added inline comments. Aug 31 2023, 8:09 AM

flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
433	Thanks! I think I will investigate other benchmarks' performance and postpone the reordering. I will proceed with it if nothing new appears.

Closed by commit rGe60dc8ed7eec: [flang][hlfir] Expand hlfir.assign's with scalar RHS. (authored by vzakhari). Aug 31 2023, 8:48 AM

This revision was automatically updated to reflect the committed changes.

vzakhari added a commit: rGe60dc8ed7eec: [flang][hlfir] Expand hlfir.assign's with scalar RHS.

Revision Contents

Path	Size
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp	66 lines
flang/test/HLFIR/opt-scalar-assign.fir	121 lines

Diff 555066

flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp

(352 lines not shown)
builder.create<hlfir::AssignOp>(
/*keep_lhs_length_if_realloc=*/false, match->assign.getTemporaryLhs());

rewriter.eraseOp(match->assign);
rewriter.eraseOp(match->destroy);
rewriter.eraseOp(elemental);
return mlir::success();
}

		/// Expand hlfir.assign of a scalar RHS to array LHS into a loop nest
		/// of element-by-element assignments:
		/// hlfir.assign %cst to %0 : f32, !fir.ref<!fir.array<6x6xf32>>
		/// into:
		/// fir.do_loop %arg0 = %c1 to %c6 step %c1 unordered {
		/// fir.do_loop %arg1 = %c1 to %c6 step %c1 unordered {
		/// %1 = hlfir.designate %0 (%arg1, %arg0) :
		/// (!fir.ref<!fir.array<6x6xf32>>, index, index) -> !fir.ref<f32>
		/// hlfir.assign %cst to %1 : f32, !fir.ref<f32>
		/// }
		/// }
		class BroadcastAssignBufferization
		: public mlir::OpRewritePattern<hlfir::AssignOp> {
		private:
		public:
		using mlir::OpRewritePattern<hlfir::AssignOp>::OpRewritePattern;

		mlir::LogicalResult
		matchAndRewrite(hlfir::AssignOp assign,
		mlir::PatternRewriter &rewriter) const override;
		};

		mlir::LogicalResult BroadcastAssignBufferization::matchAndRewrite(
		hlfir::AssignOp assign, mlir::PatternRewriter &rewriter) const {
		if (assign.isAllocatableAssignment())
		return rewriter.notifyMatchFailure(assign, "AssignOp may imply allocation");

		mlir::Value rhs = assign.getRhs();
		if (!fir::isa_trivial(rhs.getType()))
		return rewriter.notifyMatchFailure(
		assign, "AssignOp's RHS is not a trivial scalar");

		hlfir::Entity lhs{assign.getLhs()};
		if (!lhs.isArray())
		return rewriter.notifyMatchFailure(assign,
		"AssignOp's LHS is not an array");

		mlir::Type eleTy = lhs.getFortranElementType();
		if (!fir::isa_trivial(eleTy))
		return rewriter.notifyMatchFailure(
		assign, "AssignOp's LHS data type is not trivial");

		mlir::Location loc = assign->getLoc();
		fir::FirOpBuilder builder(rewriter, assign.getOperation());
		builder.setInsertionPoint(assign);
		lhs = hlfir::derefPointersAndAllocatables(loc, builder, lhs);
		mlir::Value shape = hlfir::genShape(loc, builder, lhs);
		llvm::SmallVector<mlir::Value> extents =
		hlfir::getIndexExtents(loc, builder, shape);
		hlfir::LoopNest loopNest =
		hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
		builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
		auto arrayElement =
		hlfir::getElementAt(loc, builder, lhs, loopNest.oneBasedIndices);
		builder.create<hlfir::AssignOp>(loc, rhs, arrayElement);
		rewriter.eraseOp(assign);
		return mlir::success();
		}

class OptimizedBufferizationPass
: public hlfir::impl::OptimizedBufferizationBase<
OptimizedBufferizationPass> {
public:
void runOnOperation() override {
mlir::func::FuncOp func = getOperation();
mlir::MLIRContext *context = &getContext();

mlir::GreedyRewriteConfig config;
// Prevent the pattern driver from merging blocks
config.enableRegionSimplification = false;

mlir::RewritePatternSet patterns(context);
		// TODO: right now the patterns are non-conflicting,
		// but it might be better to run this pass on hlfir.assign
		// operations and decide which transformation to apply
		// at one place (e.g. we may use some heuristics and
		// choose different optimization strategies).
		// This requires small code reordering in ElementalAssignBufferization.
patterns.insert<ElementalAssignBufferization>(context);
		patterns.insert<BroadcastAssignBufferization>(context);

if (mlir::failed(mlir::applyPatternsAndFoldGreedily(
func, std::move(patterns), config))) {
mlir::emitError(func.getLoc(),
"failure in HLFIR optimized bufferization");
signalPassFailure();
}
}
};
} // namespace

std::unique_ptr<mlir::Pass> hlfir::createOptimizedBufferizationPass() {
return std::make_unique<OptimizedBufferizationPass>();
}
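
The legality checks at the top of BroadcastAssignBufferization::matchAndRewrite can be read as a short decision list. The standalone sketch below models that list without any MLIR dependency. `AssignInfo` and `whyNotExpandable` are hypothetical names introduced for illustration; only the order of the checks and the failure messages are taken from the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical summary of the properties the pattern inspects. The real
// pattern queries them from hlfir::AssignOp and hlfir::Entity.
struct AssignInfo {
  bool isAllocatableAssignment; // "realloc" form, may (re)allocate the LHS
  bool rhsIsTrivialScalar;      // e.g. the f32, i32, logical, complex tests
  bool lhsIsArray;
  bool lhsElementIsTrivial;
};

// Returns the reason the expansion is rejected, or std::nullopt when the
// assignment may be expanded into a loop nest.
inline std::optional<std::string> whyNotExpandable(const AssignInfo &info) {
  if (info.isAllocatableAssignment)
    return std::string("AssignOp may imply allocation");
  if (!info.rhsIsTrivialScalar)
    return std::string("AssignOp's RHS is not a trivial scalar");
  if (!info.lhsIsArray)
    return std::string("AssignOp's LHS is not an array");
  if (!info.lhsElementIsTrivial)
    return std::string("AssignOp's LHS data type is not trivial");
  return std::nullopt;
}
```

Under this reading, an assignment carrying the `realloc` attribute is rejected by the first check regardless of the other properties.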

flang/test/HLFIR/opt-scalar-assign.fir

This file was added.

				// Test optimized bufferization for hlfir.assign with scalar RHS.
				// RUN: fir-opt --opt-bufferization %s | FileCheck %s

				func.func @_QPtest1() {
				%cst = arith.constant 0.000000e+00 : f32
				%c11 = arith.constant 11 : index
				%c13 = arith.constant 13 : index
				%0 = fir.alloca !fir.array<11x13xf32> {bindc_name = "x", uniq_name = "_QFtest1Ex"}
				%1 = fir.shape %c11, %c13 : (index, index) -> !fir.shape<2>
				%2:2 = hlfir.declare %0(%1) {uniq_name = "_QFtest1Ex"} : (!fir.ref<!fir.array<11x13xf32>>, !fir.shape<2>) -> (!fir.ref<!fir.array<11x13xf32>>, !fir.ref<!fir.array<11x13xf32>>)
				hlfir.assign %cst to %2#0 : f32, !fir.ref<!fir.array<11x13xf32>>
				return
				}
				// CHECK-LABEL: func.func @_QPtest1() {
				// CHECK: %[[VAL_0:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_1:.*]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[VAL_2:.*]] = arith.constant 11 : index
				// CHECK: %[[VAL_3:.*]] = arith.constant 13 : index
				// CHECK: %[[VAL_4:.*]] = fir.alloca !fir.array<11x13xf32> {bindc_name = "x", uniq_name = "_QFtest1Ex"}
				// CHECK: %[[VAL_5:.*]] = fir.shape %[[VAL_2]], %[[VAL_3]] : (index, index) -> !fir.shape<2>
				// CHECK: %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_4]](%[[VAL_5]]) {uniq_name = "_QFtest1Ex"} : (!fir.ref<!fir.array<11x13xf32>>, !fir.shape<2>) -> (!fir.ref<!fir.array<11x13xf32>>, !fir.ref<!fir.array<11x13xf32>>)
				// CHECK: fir.do_loop %[[VAL_7:.*]] = %[[VAL_0]] to %[[VAL_3]] step %[[VAL_0]] unordered {
				// CHECK: fir.do_loop %[[VAL_8:.*]] = %[[VAL_0]] to %[[VAL_2]] step %[[VAL_0]] unordered {
				// CHECK: %[[VAL_9:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_8]], %[[VAL_7]]) : (!fir.ref<!fir.array<11x13xf32>>, index, index) -> !fir.ref<f32>
				// CHECK: hlfir.assign %[[VAL_1]] to %[[VAL_9]] : f32, !fir.ref<f32>
				// CHECK: }
				// CHECK: }
				// CHECK: return
				// CHECK: }

				func.func @_QPtest2(%arg0: !fir.box<!fir.array<?x?xi32>> {fir.bindc_name = "x"}) {
				%c0_i32 = arith.constant 0 : i32
				%0:2 = hlfir.declare %arg0 {uniq_name = "_QFtest2Ex"} : (!fir.box<!fir.array<?x?xi32>>) -> (!fir.box<!fir.array<?x?xi32>>, !fir.box<!fir.array<?x?xi32>>)
				hlfir.assign %c0_i32 to %0#0 : i32, !fir.box<!fir.array<?x?xi32>>
				return
				}
				// CHECK-LABEL: func.func @_QPtest2(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.box<!fir.array<?x?xi32>> {fir.bindc_name = "x"}) {
				// CHECK: %[[VAL_1:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_2:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_3:.*]] = arith.constant 0 : i32
				// CHECK: %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_0]] {uniq_name = "_QFtest2Ex"} : (!fir.box<!fir.array<?x?xi32>>) -> (!fir.box<!fir.array<?x?xi32>>, !fir.box<!fir.array<?x?xi32>>)
				// CHECK: %[[VAL_5:.*]]:3 = fir.box_dims %[[VAL_4]]#0, %[[VAL_2]] : (!fir.box<!fir.array<?x?xi32>>, index) -> (index, index, index)
				// CHECK: %[[VAL_6:.*]]:3 = fir.box_dims %[[VAL_4]]#0, %[[VAL_1]] : (!fir.box<!fir.array<?x?xi32>>, index) -> (index, index, index)
				// CHECK: fir.do_loop %[[VAL_7:.*]] = %[[VAL_1]] to %[[VAL_6]]#1 step %[[VAL_1]] unordered {
				// CHECK: fir.do_loop %[[VAL_8:.*]] = %[[VAL_1]] to %[[VAL_5]]#1 step %[[VAL_1]] unordered {
				// CHECK: %[[VAL_9:.*]] = hlfir.designate %[[VAL_4]]#0 (%[[VAL_8]], %[[VAL_7]]) : (!fir.box<!fir.array<?x?xi32>>, index, index) -> !fir.ref<i32>
				// CHECK: hlfir.assign %[[VAL_3]] to %[[VAL_9]] : i32, !fir.ref<i32>
				// CHECK: }
				// CHECK: }
				// CHECK: return
				// CHECK: }

				func.func @_QPtest4(%arg0: !fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>> {fir.bindc_name = "x"}) {
				%true = arith.constant true
				%0:2 = hlfir.declare %arg0 {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QFtest4Ex"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>)
				%1 = fir.convert %true : (i1) -> !fir.logical<4>
				%2 = fir.load %0#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>
				hlfir.assign %1 to %2 : !fir.logical<4>, !fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>
				return
				}
				// CHECK-LABEL: func.func @_QPtest4(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>> {fir.bindc_name = "x"}) {
				// CHECK: %[[VAL_1:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_2:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_3:.*]] = arith.constant true
				// CHECK: %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QFtest4Ex"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>)
				// CHECK: %[[VAL_5:.*]] = fir.convert %[[VAL_3]] : (i1) -> !fir.logical<4>
				// CHECK: %[[VAL_6:.*]] = fir.load %[[VAL_4]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>>
				// CHECK: %[[VAL_7:.*]]:3 = fir.box_dims %[[VAL_6]], %[[VAL_2]] : (!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>, index) -> (index, index, index)
				// CHECK: fir.do_loop %[[VAL_8:.*]] = %[[VAL_1]] to %[[VAL_7]]#1 step %[[VAL_1]] unordered {
				// CHECK: %[[VAL_9:.*]]:3 = fir.box_dims %[[VAL_6]], %[[VAL_2]] : (!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>, index) -> (index, index, index)
				// CHECK: %[[VAL_10:.*]] = arith.subi %[[VAL_9]]#0, %[[VAL_1]] : index
				// CHECK: %[[VAL_11:.*]] = arith.addi %[[VAL_8]], %[[VAL_10]] : index
				// CHECK: %[[VAL_12:.*]] = hlfir.designate %[[VAL_6]] (%[[VAL_11]]) : (!fir.box<!fir.ptr<!fir.array<?x!fir.logical<4>>>>, index) -> !fir.ref<!fir.logical<4>>
				// CHECK: hlfir.assign %[[VAL_5]] to %[[VAL_12]] : !fir.logical<4>, !fir.ref<!fir.logical<4>>
				// CHECK: }
				// CHECK: return
				// CHECK: }

				func.func @_QPtest3(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {fir.bindc_name = "x"}) {
				%c0_i32 = arith.constant 0 : i32
				%0:2 = hlfir.declare %arg0 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest3Ex"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
				hlfir.assign %c0_i32 to %0#0 realloc : i32, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
				return
				}
				// CHECK-LABEL: func.func @_QPtest3(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {fir.bindc_name = "x"}) {
				// CHECK: %[[VAL_1:.*]] = arith.constant 0 : i32
				// CHECK: %[[VAL_2:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest3Ex"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
				// CHECK: hlfir.assign %[[VAL_1]] to %[[VAL_2]]#0 realloc : i32, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
				// CHECK: return
				// CHECK: }

				func.func @_QPtest5(%arg0: !fir.ref<!fir.array<77x!fir.complex<4>>> {fir.bindc_name = "x"}) {
				%cst = arith.constant 0.000000e+00 : f32
				%c77 = arith.constant 77 : index
				%0 = fir.shape %c77 : (index) -> !fir.shape<1>
				%1:2 = hlfir.declare %arg0(%0) {uniq_name = "_QFtest5Ex"} : (!fir.ref<!fir.array<77x!fir.complex<4>>>, !fir.shape<1>) -> (!fir.ref<!fir.array<77x!fir.complex<4>>>, !fir.ref<!fir.array<77x!fir.complex<4>>>)
				%2 = fir.undefined !fir.complex<4>
				%3 = fir.insert_value %2, %cst, [0 : index] : (!fir.complex<4>, f32) -> !fir.complex<4>
				%4 = fir.insert_value %3, %cst, [1 : index] : (!fir.complex<4>, f32) -> !fir.complex<4>
				hlfir.assign %4 to %1#0 : !fir.complex<4>, !fir.ref<!fir.array<77x!fir.complex<4>>>
				return
				}
				// CHECK-LABEL: func.func @_QPtest5(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<!fir.array<77x!fir.complex<4>>> {fir.bindc_name = "x"}) {
				// CHECK: %[[VAL_1:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_2:.*]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[VAL_3:.*]] = arith.constant 77 : index
				// CHECK: %[[VAL_4:.*]] = fir.shape %[[VAL_3]] : (index) -> !fir.shape<1>
				// CHECK: %[[VAL_5:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_4]]) {uniq_name = "_QFtest5Ex"} : (!fir.ref<!fir.array<77x!fir.complex<4>>>, !fir.shape<1>) -> (!fir.ref<!fir.array<77x!fir.complex<4>>>, !fir.ref<!fir.array<77x!fir.complex<4>>>)
				// CHECK: %[[VAL_6:.*]] = fir.undefined !fir.complex<4>
				// CHECK: %[[VAL_7:.*]] = fir.insert_value %[[VAL_6]], %[[VAL_2]], [0 : index] : (!fir.complex<4>, f32) -> !fir.complex<4>
				// CHECK: %[[VAL_8:.*]] = fir.insert_value %[[VAL_7]], %[[VAL_2]], [1 : index] : (!fir.complex<4>, f32) -> !fir.complex<4>
				// CHECK: fir.do_loop %[[VAL_9:.*]] = %[[VAL_1]] to %[[VAL_3]] step %[[VAL_1]] unordered {
				// CHECK: %[[VAL_10:.*]] = hlfir.designate %[[VAL_5]]#0 (%[[VAL_9]]) : (!fir.ref<!fir.array<77x!fir.complex<4>>>, index) -> !fir.ref<!fir.complex<4>>
				// CHECK: hlfir.assign %[[VAL_8]] to %[[VAL_10]] : !fir.complex<4>, !fir.ref<!fir.complex<4>>
				// CHECK: }
				// CHECK: return
				// CHECK: }
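
The CHECK lines for _QPtest4 show one extra step compared with the other tests: because the LHS is a pointer, its lower bound may differ from 1, so the one-based loop index is shifted by (lower bound - 1) before hlfir.designate is applied. The sketch below reproduces that arithmetic in plain C++. The `Dim` struct and both functions are hypothetical and only loosely modeled on the lower bound and extent values read with fir.box_dims in the test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional descriptor: lower bound and extent.
struct Dim {
  long lowerBound;
  long extent;
};

// Map a one-based loop index to an index in the array's own bounds,
// mirroring the arith.subi / arith.addi pair in the CHECK lines:
// index = loopIndex + (lowerBound - 1).
inline long toArrayIndex(long oneBased, const Dim &dim) {
  return oneBased + (dim.lowerBound - 1);
}

// Assign `value` to every element and return the array indices that were
// designated, in iteration order.
inline std::vector<long> broadcast(std::vector<int> &storage, const Dim &dim,
                                   int value) {
  std::vector<long> visited;
  for (long i = 1; i <= dim.extent; ++i) {
    const long idx = toArrayIndex(i, dim);
    // Storage offset is the distance from the lower bound.
    storage[static_cast<std::size_t>(idx - dim.lowerBound)] = value;
    visited.push_back(idx);
  }
  return visited;
}
```

For an array declared with bounds -2:1, the loop runs from 1 to 4 and designates the indices -2, -1, 0 and 1.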