This is an archive of the discontinued LLVM Phabricator instance.

[flang][hlfir] Hoist forall bounds computation when possible
ClosedPublic

Authored by jeanPerier on May 22 2023, 8:34 AM.

Details

Reviewers

tblah
vzakhari
clementval

Commits

rG96a003b9bf79: [flang][hlfir] Hoist forall bounds computation when possible

Summary

When inner forall bound computations do not depend on previous
forall indices, they can be hoisted.
This is possible because:

- Bound computations are required to be pure, so evaluating them only once is possible.
- If a bound computation depends on a value previously assigned, the forall scheduling analysis creates a different run for it: the assignment impacting the bound's value is not part of the current loop nest.
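
The hoisting condition can be illustrated with a minimal standalone sketch. It is not the patch's code: `ToyOp` and `canHoistBound` are hypothetical names, and values are modelled as plain strings rather than `mlir::Value`. Like the check in this revision, it only looks at the direct operands of the operations in the bound region.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for one operation of a bound region: the name of its result
// and the names of the values it reads. Hypothetical type, not an MLIR class.
struct ToyOp {
  std::string result;
  std::vector<std::string> operands;
};

// Returns true when no operation of the bound region reads a forall index
// directly, in which case the bound may be computed once, above the loops.
bool canHoistBound(const std::vector<ToyOp> &boundOps,
                   const std::set<std::string> &forallIndices) {
  for (const ToyOp &op : boundOps) {
    bool readsIndex = std::any_of(
        op.operands.begin(), op.operands.end(),
        [&](const std::string &v) { return forallIndices.count(v) != 0; });
    if (readsIndex)
      return false;
  }
  return true;
}
```

With `i` as the only forall index, a bound that reads only `n` is reported as hoistable, while a bound that reads `i` is not.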

The reason this optimization is done at this point, and not as part of
a generic loop hoisting optimization, is that having all the loop
bound computations hoisted allows allocating simple temporary
storage. The number of iterations can be pre-computed and used as the
extent for the temporary.
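
As a rough illustration of why hoisted bounds help, the following sketch computes the extent of such a temporary from loop-invariant bounds. The names `LoopBounds`, `tripCount`, and `temporaryExtent` are hypothetical and do not come from the patch. The trip count uses the usual Fortran formula max((ub - lb + step) / step, 0), and the sketch ignores forall masks, so it gives an upper bound when a mask is present.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hoisted (loop-invariant) bounds of one forall level. Hypothetical type.
struct LoopBounds {
  std::int64_t lb;
  std::int64_t ub;
  std::int64_t step;
};

// Number of iterations of a loop running from lb to ub by step;
// zero when the range is empty.
std::int64_t tripCount(const LoopBounds &b) {
  assert(b.step != 0 && "step must be non-zero");
  std::int64_t n = (b.ub - b.lb + b.step) / b.step;
  return n > 0 ? n : 0;
}

// Total number of iterations of a nest whose bounds were all hoisted:
// the product of the per-level trip counts.
std::int64_t temporaryExtent(const std::vector<LoopBounds> &nest) {
  std::int64_t extent = 1;
  for (const LoopBounds &b : nest)
    extent *= tripCount(b);
  return extent;
}
```

Because every bound is known before entering the outermost loop, the extent is a single value and the temporary can be allocated once, up front.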

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jeanPerier created this revision. May 22 2023, 8:34 AM

Herald added a project: Restricted Project. May 22 2023, 8:34 AM

Herald added subscribers: sunshaoce, mehdi_amini, jdoerfert.

jeanPerier requested review of this revision. May 22 2023, 8:34 AM

LGTM, thanks!

This revision is now accepted and ready to land. May 22 2023, 9:04 AM

Harbormaster completed remote builds in B233588: Diff 524326. May 22 2023, 9:13 AM

vzakhari accepted this revision. May 22 2023, 12:05 PM

Closed by commit rG96a003b9bf79: [flang][hlfir] Hoist forall bounds computation when possible (authored by jeanPerier). May 23 2023, 12:19 AM

This revision was automatically updated to reflect the committed changes.

jeanPerier added a commit: rG96a003b9bf79: [flang][hlfir] Hoist forall bounds computation when possible.

Revision Contents

Path	Size
flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp	98 lines
flang/test/HLFIR/order_assignments/forall-codegen-no-conflict.fir	18 lines

Diff 524586

flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp

[... 135 lines not shown ...]	private:
bool		bool
isRequiredInCurrentRun(hlfir::OrderedAssignmentTreeOpInterface node) const;		isRequiredInCurrentRun(hlfir::OrderedAssignmentTreeOpInterface node) const;

/// Generate a scalar value yielded by an ordered assignment tree region.		/// Generate a scalar value yielded by an ordered assignment tree region.
/// If the value was not saved in a previous run, this clone the region		/// If the value was not saved in a previous run, this clone the region
/// code, except the final yield, at the current execution point.		/// code, except the final yield, at the current execution point.
/// If the value was saved in a previous run, this fetches the saved value		/// If the value was saved in a previous run, this fetches the saved value
/// from the temporary storage and returns the value.		/// from the temporary storage and returns the value.
mlir::Value generateYieldedScalarValue(mlir::Region &region);		/// Inside Forall, the value will be hoisted outside of the forall loops if
		/// it does not depend on the forall indices.
		/// An optional type can be provided to get a value from a specific type
		/// (the cast will be hoisted if the computation is hoisted).
		mlir::Value generateYieldedScalarValue(
		mlir::Region &region,
		std::optional<mlir::Type> castToType = std::nullopt);

/// Generate an entity yielded by an ordered assignment tree region, and		/// Generate an entity yielded by an ordered assignment tree region, and
/// optionally return the (uncloned) yield if there is any clean-up that		/// optionally return the (uncloned) yield if there is any clean-up that
/// should be done after using the entity. Like, generateYieldedScalarValue,		/// should be done after using the entity. Like, generateYieldedScalarValue,
/// this will return the saved value if the region was saved in a previous		/// this will return the saved value if the region was saved in a previous
/// run.		/// run.
std::pair<mlir::Value, std::optional<hlfir::YieldOp>>		std::pair<mlir::Value, std::optional<hlfir::YieldOp>>
generateYieldedEntity(mlir::Region &region);		generateYieldedEntity(mlir::Region &region,
		std::optional<mlir::Type> castToType = std::nullopt);

/// If \p maybeYield is present and has a clean-up, generate the clean-up		/// If \p maybeYield is present and has a clean-up, generate the clean-up
/// at the current insertion point (by cloning).		/// at the current insertion point (by cloning).
void generateCleanupIfAny(std::optional<hlfir::YieldOp> maybeYield);		void generateCleanupIfAny(std::optional<hlfir::YieldOp> maybeYield);

/// Generate a masked entity. This can only be called when whereLoopNest was		/// Generate a masked entity. This can only be called when whereLoopNest was
/// set (When an hlfir.where is being visited).		/// set (When an hlfir.where is being visited).
/// This method returns the scalar element (that may have been previously		/// This method returns the scalar element (that may have been previously
[... 49 lines not shown ...]	if (auto *body = node.getSubTreeRegion()) {
hlfir::ElseWhereOp>([&](auto concreteOp) { post(concreteOp); })		hlfir::ElseWhereOp>([&](auto concreteOp) { post(concreteOp); })
.Default([](auto) {});		.Default([](auto) {});
}		}
}		}
}		}

void OrderedAssignmentRewriter::pre(hlfir::ForallOp forallOp) {		void OrderedAssignmentRewriter::pre(hlfir::ForallOp forallOp) {
/// Create a fir.do_loop given the hlfir.forall control values.		/// Create a fir.do_loop given the hlfir.forall control values.
mlir::Value rawLowerBound =
generateYieldedScalarValue(forallOp.getLbRegion());
mlir::Location loc = forallOp.getLoc();
mlir::Type idxTy = builder.getIndexType();		mlir::Type idxTy = builder.getIndexType();
mlir::Value lb = builder.createConvert(loc, idxTy, rawLowerBound);		mlir::Location loc = forallOp.getLoc();
mlir::Value rawUpperBound =		mlir::Value lb = generateYieldedScalarValue(forallOp.getLbRegion(), idxTy);
generateYieldedScalarValue(forallOp.getUbRegion());		mlir::Value ub = generateYieldedScalarValue(forallOp.getUbRegion(), idxTy);
mlir::Value ub = builder.createConvert(loc, idxTy, rawUpperBound);
mlir::Value step;		mlir::Value step;
if (forallOp.getStepRegion().empty()) {		if (forallOp.getStepRegion().empty()) {
		auto insertionPoint = builder.saveInsertionPoint();
		if (!constructStack.empty())
		builder.setInsertionPoint(constructStack[0]);
step = builder.createIntegerConstant(loc, idxTy, 1);		step = builder.createIntegerConstant(loc, idxTy, 1);
		if (!constructStack.empty())
		builder.restoreInsertionPoint(insertionPoint);
} else {		} else {
step = generateYieldedScalarValue(forallOp.getStepRegion());		step = generateYieldedScalarValue(forallOp.getStepRegion(), idxTy);
step = builder.createConvert(loc, idxTy, step);
}		}
auto doLoop = builder.create<fir::DoLoopOp>(loc, lb, ub, step);		auto doLoop = builder.create<fir::DoLoopOp>(loc, lb, ub, step);
builder.setInsertionPointToStart(doLoop.getBody());		builder.setInsertionPointToStart(doLoop.getBody());
mlir::Value oldIndex = forallOp.getForallIndexValue();		mlir::Value oldIndex = forallOp.getForallIndexValue();
mlir::Value newIndex =		mlir::Value newIndex =
builder.createConvert(loc, oldIndex.getType(), doLoop.getInductionVar());		builder.createConvert(loc, oldIndex.getType(), doLoop.getInductionVar());
mapper.map(oldIndex, newIndex);		mapper.map(oldIndex, newIndex);
constructStack.push_back(doLoop);		constructStack.push_back(doLoop);
[... 11 lines not shown ...]	mlir::Value indexVar =
builder.createTemporary(loc, intTy, forallIndexOp.getName());		builder.createTemporary(loc, intTy, forallIndexOp.getName());
mlir::Value newVal = mapper.lookupOrDefault(forallIndexOp.getIndex());		mlir::Value newVal = mapper.lookupOrDefault(forallIndexOp.getIndex());
builder.createStoreWithConvert(loc, newVal, indexVar);		builder.createStoreWithConvert(loc, newVal, indexVar);
mapper.map(forallIndexOp, indexVar);		mapper.map(forallIndexOp, indexVar);
}		}

void OrderedAssignmentRewriter::pre(hlfir::ForallMaskOp forallMaskOp) {		void OrderedAssignmentRewriter::pre(hlfir::ForallMaskOp forallMaskOp) {
mlir::Location loc = forallMaskOp.getLoc();		mlir::Location loc = forallMaskOp.getLoc();
mlir::Value mask = generateYieldedScalarValue(forallMaskOp.getMaskRegion());		mlir::Value mask = generateYieldedScalarValue(forallMaskOp.getMaskRegion(),
mask = builder.createConvert(loc, builder.getI1Type(), mask);		builder.getI1Type());
auto ifOp = builder.create<fir::IfOp>(loc, std::nullopt, mask, false);		auto ifOp = builder.create<fir::IfOp>(loc, std::nullopt, mask, false);
builder.setInsertionPointToStart(&ifOp.getThenRegion().front());		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
constructStack.push_back(ifOp);		constructStack.push_back(ifOp);
}		}

void OrderedAssignmentRewriter::post(hlfir::ForallMaskOp forallMaskOp) {		void OrderedAssignmentRewriter::post(hlfir::ForallMaskOp forallMaskOp) {
assert(!constructStack.empty() && "must contain an ifop");		assert(!constructStack.empty() && "must contain an ifop");
builder.setInsertionPointAfter(constructStack.pop_back_val());		builder.setInsertionPointAfter(constructStack.pop_back_val());
[... 76 lines not shown ...]
void OrderedAssignmentRewriter::post(hlfir::ElseWhereOp elseWhereOp) {		void OrderedAssignmentRewriter::post(hlfir::ElseWhereOp elseWhereOp) {
// Exit ifOp that was created for the elseWhereOp mask, if any.		// Exit ifOp that was created for the elseWhereOp mask, if any.
if (elseWhereOp.getMaskRegion().empty())		if (elseWhereOp.getMaskRegion().empty())
return;		return;
assert(!constructStack.empty() && "must contain a fir.if");		assert(!constructStack.empty() && "must contain a fir.if");
builder.setInsertionPointAfter(constructStack.pop_back_val());		builder.setInsertionPointAfter(constructStack.pop_back_val());
}		}

		/// Is this value a Forall index?
		/// Forall index are block arguments of hlfir.forall body, or the result
		/// of hlfir.forall_index.
		static bool isForallIndex(mlir::Value value) {
		if (auto blockArg = mlir::dyn_cast<mlir::BlockArgument>(value)) {
		if (mlir::Block *block = blockArg.getOwner())
		return block->isEntryBlock() &&
		mlir::isa_and_nonnull<hlfir::ForallOp>(block->getParentOp());
		return false;
		}
		return value.getDefiningOp<hlfir::ForallIndexOp>();
		}

std::pair<mlir::Value, std::optional<hlfir::YieldOp>>		std::pair<mlir::Value, std::optional<hlfir::YieldOp>>
OrderedAssignmentRewriter::generateYieldedEntity(mlir::Region &region) {		OrderedAssignmentRewriter::generateYieldedEntity(
		mlir::Region &region, std::optional<mlir::Type> castToType) {
// TODO: if the region was saved, use that instead of generating code again.		// TODO: if the region was saved, use that instead of generating code again.
if (whereLoopNest.has_value()) {		if (whereLoopNest.has_value()) {
mlir::Location loc = region.getParentOp()->getLoc();		mlir::Location loc = region.getParentOp()->getLoc();
return {generateMaskedEntity(loc, region), std::nullopt};		return {generateMaskedEntity(loc, region), std::nullopt};
}		}
assert(region.hasOneBlock() && "region must contain one block");		assert(region.hasOneBlock() && "region must contain one block");
// Clone all operations except the final hlfir.yield.		auto oldYield = mlir::dyn_cast_or_null<hlfir::YieldOp>(
		region.back().getOperations().back());
		assert(oldYield && "region computing entities must end with a YieldOp");
mlir::Block::OpListType &ops = region.back().getOperations();		mlir::Block::OpListType &ops = region.back().getOperations();

		// Inside Forall, scalars that do not depend on forall indices can be hoisted
		// here because their evaluation is required to only call pure procedures, and
		// if they depend on a variable previously assigned to in a forall assignment,
		// this assignment must have been scheduled in a previous run. Hoisting of
		// scalars is done here to help creating simple temporary storage if needed.
		// Inner forall bounds can often be hoisted, and this allows computing the
		// total number of iterations to create temporary storages.
		bool hoistComputation = false;
		if (fir::isa_trivial(oldYield.getEntity().getType()) &&
		!constructStack.empty()) {
		hoistComputation = true;
		for (mlir::Operation &op : ops)
		if (llvm::any_of(op.getOperands(), [](mlir::Value value) {
		return isForallIndex(value);
		})) {
		hoistComputation = false;
		break;
		}
		}
		auto insertionPoint = builder.saveInsertionPoint();
		if (hoistComputation)
		builder.setInsertionPoint(constructStack[0]);

		// Clone all operations except the final hlfir.yield.
assert(!ops.empty() && "yield block cannot be empty");		assert(!ops.empty() && "yield block cannot be empty");
auto end = ops.end();		auto end = ops.end();
for (auto opIt = ops.begin(); std::next(opIt) != end; ++opIt)		for (auto opIt = ops.begin(); std::next(opIt) != end; ++opIt)
(void)builder.clone(*opIt, mapper);		(void)builder.clone(*opIt, mapper);
auto oldYield = mlir::dyn_cast_or_null<hlfir::YieldOp>(
region.back().getOperations().back());
assert(oldYield && "region computing scalar must end with a YieldOp");
// Get the value for the yielded entity, it may be the result of an operation		// Get the value for the yielded entity, it may be the result of an operation
// that was cloned, or it may be the same as the previous value if the yield		// that was cloned, or it may be the same as the previous value if the yield
// operand was created before the ordered assignment tree.		// operand was created before the ordered assignment tree.
mlir::Value newEntity = mapper.lookupOrDefault(oldYield.getEntity());		mlir::Value newEntity = mapper.lookupOrDefault(oldYield.getEntity());
		if (castToType.has_value())
		newEntity =
		builder.createConvert(newEntity.getLoc(), *castToType, newEntity);

		if (hoistComputation) {
		// Hoisted trivial scalars clean-up can be done right away, the value is
		// in registers.
		generateCleanupIfAny(oldYield);
		builder.restoreInsertionPoint(insertionPoint);
		return {newEntity, std::nullopt};
		}
if (oldYield.getCleanup().empty())		if (oldYield.getCleanup().empty())
return {newEntity, std::nullopt};		return {newEntity, std::nullopt};
return {newEntity, oldYield};		return {newEntity, oldYield};
}		}

mlir::Value		mlir::Value OrderedAssignmentRewriter::generateYieldedScalarValue(
OrderedAssignmentRewriter::generateYieldedScalarValue(mlir::Region &region) {		mlir::Region &region, std::optional<mlir::Type> castToType) {
auto [value, maybeYield] = generateYieldedEntity(region);		auto [value, maybeYield] = generateYieldedEntity(region, castToType);
assert(fir::isa_trivial(value.getType()) && "not a trivial scalar value");		assert(fir::isa_trivial(value.getType()) && "not a trivial scalar value");
generateCleanupIfAny(maybeYield);		generateCleanupIfAny(maybeYield);
return value;		return value;
}		}

mlir::Value		mlir::Value
OrderedAssignmentRewriter::generateMaskedEntity(MaskedArrayExpr &maskedExpr) {		OrderedAssignmentRewriter::generateMaskedEntity(MaskedArrayExpr &maskedExpr) {
assert(whereLoopNest.has_value() && "must be inside WHERE loop nest");		assert(whereLoopNest.has_value() && "must be inside WHERE loop nest");
[... 311 lines not shown ...]

flang/test/HLFIR/order_assignments/forall-codegen-no-conflict.fir

[... 18 lines not shown ...]	func.func @test_simple(%x: !fir.ref<!fir.array<10xi32>>) {
}		}
return		return
}		}
// CHECK-LABEL: func.func @test_simple(		// CHECK-LABEL: func.func @test_simple(
// CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<!fir.array<10xi32>>) {		// CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<!fir.array<10xi32>>) {
// CHECK: %[[VAL_1:.*]] = arith.constant 1 : index		// CHECK: %[[VAL_1:.*]] = arith.constant 1 : index
// CHECK: %[[VAL_2:.*]] = arith.constant 10 : index		// CHECK: %[[VAL_2:.*]] = arith.constant 10 : index
// CHECK: %[[VAL_3:.*]] = arith.constant 1 : index		// CHECK: %[[VAL_3:.*]] = arith.constant 1 : index
// CHECK: fir.do_loop %[[VAL_4:.*]] = %[[VAL_1]] to %[[VAL_2]] step %[[VAL_3]] {		// CHECK: %[[VAL_4:.*]] = arith.constant 42 : i32
// CHECK: %[[VAL_5:.*]] = arith.constant 42 : i32		// CHECK: fir.do_loop %[[VAL_5:.*]] = %[[VAL_1]] to %[[VAL_2]] step %[[VAL_3]] {
// CHECK: %[[VAL_6:.*]] = hlfir.designate %[[VAL_0]] (%[[VAL_4]]) : (!fir.ref<!fir.array<10xi32>>, index) -> !fir.ref<i32>		// CHECK: %[[VAL_6:.*]] = hlfir.designate %[[VAL_0]] (%[[VAL_5]]) : (!fir.ref<!fir.array<10xi32>>, index) -> !fir.ref<i32>
// CHECK: hlfir.assign %[[VAL_5]] to %[[VAL_6]] : i32, !fir.ref<i32>		// CHECK: hlfir.assign %[[VAL_4]] to %[[VAL_6]] : i32, !fir.ref<i32>
// CHECK: }		// CHECK: }

func.func @test_index(%x: !fir.ref<!fir.array<10xi32>>) {		func.func @test_index(%x: !fir.ref<!fir.array<10xi32>>) {
hlfir.forall lb {		hlfir.forall lb {
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
hlfir.yield %c1 : index		hlfir.yield %c1 : index
} ub {		} ub {
%c10 = arith.constant 10 : index		%c10 = arith.constant 10 : index
[... 78 lines not shown ...]
// CHECK: %[[VAL_14:.*]] = hlfir.designate %[[VAL_7]]#0 (%[[VAL_13]]) : (!fir.box<!fir.array<?xf32>>, i64) -> !fir.ref<f32>		// CHECK: %[[VAL_14:.*]] = hlfir.designate %[[VAL_7]]#0 (%[[VAL_13]]) : (!fir.box<!fir.array<?xf32>>, i64) -> !fir.ref<f32>
// CHECK: %[[VAL_15:.*]] = fir.load %[[VAL_14]] : !fir.ref<f32>		// CHECK: %[[VAL_15:.*]] = fir.load %[[VAL_14]] : !fir.ref<f32>
// CHECK: %[[VAL_16:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_13]]) : (!fir.box<!fir.array<?xf32>>, i64) -> !fir.ref<f32>		// CHECK: %[[VAL_16:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_13]]) : (!fir.box<!fir.array<?xf32>>, i64) -> !fir.ref<f32>
// CHECK: hlfir.assign %[[VAL_15]] to %[[VAL_16]] : f32, !fir.ref<f32>		// CHECK: hlfir.assign %[[VAL_15]] to %[[VAL_16]] : f32, !fir.ref<f32>
// CHECK: }		// CHECK: }
// CHECK: %[[VAL_17:.*]] = fir.convert %[[VAL_5]] : (i64) -> index		// CHECK: %[[VAL_17:.*]] = fir.convert %[[VAL_5]] : (i64) -> index
// CHECK: %[[VAL_18:.*]] = fir.convert %[[VAL_4]] : (i64) -> index		// CHECK: %[[VAL_18:.*]] = fir.convert %[[VAL_4]] : (i64) -> index
// CHECK: %[[VAL_19:.*]] = arith.constant 1 : index		// CHECK: %[[VAL_19:.*]] = arith.constant 1 : index
// CHECK: fir.do_loop %[[VAL_20:.*]] = %[[VAL_17]] to %[[VAL_18]] step %[[VAL_19]] {
// CHECK: %[[VAL_21:.*]] = fir.convert %[[VAL_20]] : (index) -> i64
// CHECK: %[[VAL_22:.*]] = fir.convert %[[VAL_5]] : (i64) -> index		// CHECK: %[[VAL_22:.*]] = fir.convert %[[VAL_5]] : (i64) -> index
// CHECK: %[[VAL_23:.*]] = fir.convert %[[VAL_4]] : (i64) -> index		// CHECK: %[[VAL_23:.*]] = fir.convert %[[VAL_4]] : (i64) -> index
// CHECK: %[[VAL_24:.*]] = arith.constant 1 : index		// CHECK: %[[VAL_24:.*]] = arith.constant 1 : index
		// CHECK: fir.do_loop %[[VAL_20:.*]] = %[[VAL_17]] to %[[VAL_18]] step %[[VAL_19]] {
		// CHECK: %[[VAL_21:.*]] = fir.convert %[[VAL_20]] : (index) -> i64
// CHECK: fir.do_loop %[[VAL_25:.*]] = %[[VAL_22]] to %[[VAL_23]] step %[[VAL_24]] {		// CHECK: fir.do_loop %[[VAL_25:.*]] = %[[VAL_22]] to %[[VAL_23]] step %[[VAL_24]] {
// CHECK: %[[VAL_26:.*]] = fir.convert %[[VAL_25]] : (index) -> i64		// CHECK: %[[VAL_26:.*]] = fir.convert %[[VAL_25]] : (index) -> i64
// CHECK: %[[VAL_27:.*]] = arith.subi %[[VAL_3]], %[[VAL_21]] : i64		// CHECK: %[[VAL_27:.*]] = arith.subi %[[VAL_3]], %[[VAL_21]] : i64
// CHECK: %[[VAL_28:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_27]]) : (!fir.box<!fir.array<?xf32>>, i64) -> !fir.ref<f32>		// CHECK: %[[VAL_28:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_27]]) : (!fir.box<!fir.array<?xf32>>, i64) -> !fir.ref<f32>
// CHECK: %[[VAL_29:.*]] = fir.load %[[VAL_28]] : !fir.ref<f32>		// CHECK: %[[VAL_29:.*]] = fir.load %[[VAL_28]] : !fir.ref<f32>
// CHECK: %[[VAL_30:.*]] = hlfir.designate %[[VAL_8]]#0 (%[[VAL_21]], %[[VAL_26]]) : (!fir.box<!fir.array<?x?xf32>>, i64, i64) -> !fir.ref<f32>		// CHECK: %[[VAL_30:.*]] = hlfir.designate %[[VAL_8]]#0 (%[[VAL_21]], %[[VAL_26]]) : (!fir.box<!fir.array<?x?xf32>>, i64, i64) -> !fir.ref<f32>
// CHECK: hlfir.assign %[[VAL_29]] to %[[VAL_30]] : f32, !fir.ref<f32>		// CHECK: hlfir.assign %[[VAL_29]] to %[[VAL_30]] : f32, !fir.ref<f32>
// CHECK: }		// CHECK: }
[... 38 lines not shown ...]
// CHECK: %[[VAL_3:.*]] = arith.constant 10 : i64		// CHECK: %[[VAL_3:.*]] = arith.constant 10 : i64
// CHECK: %[[VAL_4:.*]] = arith.constant 1 : i64		// CHECK: %[[VAL_4:.*]] = arith.constant 1 : i64
// CHECK: %[[VAL_5:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "mask"} : (!fir.box<!fir.array<?x!fir.logical<4>>>) -> (!fir.box<!fir.array<?x!fir.logical<4>>>, !fir.box<!fir.array<?x!fir.logical<4>>>)		// CHECK: %[[VAL_5:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "mask"} : (!fir.box<!fir.array<?x!fir.logical<4>>>) -> (!fir.box<!fir.array<?x!fir.logical<4>>>, !fir.box<!fir.array<?x!fir.logical<4>>>)
// CHECK: %[[VAL_6:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "x"} : (!fir.box<!fir.array<?x?xf32>>) -> (!fir.box<!fir.array<?x?xf32>>, !fir.box<!fir.array<?x?xf32>>)		// CHECK: %[[VAL_6:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "x"} : (!fir.box<!fir.array<?x?xf32>>) -> (!fir.box<!fir.array<?x?xf32>>, !fir.box<!fir.array<?x?xf32>>)
// CHECK: %[[VAL_7:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "y"} : (!fir.box<!fir.array<?x?xf32>>) -> (!fir.box<!fir.array<?x?xf32>>, !fir.box<!fir.array<?x?xf32>>)		// CHECK: %[[VAL_7:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "y"} : (!fir.box<!fir.array<?x?xf32>>) -> (!fir.box<!fir.array<?x?xf32>>, !fir.box<!fir.array<?x?xf32>>)
// CHECK: %[[VAL_8:.*]] = fir.convert %[[VAL_4]] : (i64) -> index		// CHECK: %[[VAL_8:.*]] = fir.convert %[[VAL_4]] : (i64) -> index
// CHECK: %[[VAL_9:.*]] = fir.convert %[[VAL_3]] : (i64) -> index		// CHECK: %[[VAL_9:.*]] = fir.convert %[[VAL_3]] : (i64) -> index
// CHECK: %[[VAL_10:.*]] = arith.constant 1 : index		// CHECK: %[[VAL_10:.*]] = arith.constant 1 : index
		// CHECK: %[[VAL_16:.*]] = fir.convert %[[VAL_4]] : (i64) -> index
		// CHECK: %[[VAL_18:.*]] = arith.constant 1 : index
// CHECK: fir.do_loop %[[VAL_11:.*]] = %[[VAL_8]] to %[[VAL_9]] step %[[VAL_10]] {		// CHECK: fir.do_loop %[[VAL_11:.*]] = %[[VAL_8]] to %[[VAL_9]] step %[[VAL_10]] {
// CHECK: %[[VAL_12:.*]] = fir.convert %[[VAL_11]] : (index) -> i64		// CHECK: %[[VAL_12:.*]] = fir.convert %[[VAL_11]] : (index) -> i64
// CHECK: %[[VAL_13:.*]] = hlfir.designate %[[VAL_5]]#0 (%[[VAL_12]]) : (!fir.box<!fir.array<?x!fir.logical<4>>>, i64) -> !fir.ref<!fir.logical<4>>		// CHECK: %[[VAL_13:.*]] = hlfir.designate %[[VAL_5]]#0 (%[[VAL_12]]) : (!fir.box<!fir.array<?x!fir.logical<4>>>, i64) -> !fir.ref<!fir.logical<4>>
// CHECK: %[[VAL_14:.*]] = fir.load %[[VAL_13]] : !fir.ref<!fir.logical<4>>		// CHECK: %[[VAL_14:.*]] = fir.load %[[VAL_13]] : !fir.ref<!fir.logical<4>>
// CHECK: %[[VAL_15:.*]] = fir.convert %[[VAL_14]] : (!fir.logical<4>) -> i1		// CHECK: %[[VAL_15:.*]] = fir.convert %[[VAL_14]] : (!fir.logical<4>) -> i1
// CHECK: fir.if %[[VAL_15]] {		// CHECK: fir.if %[[VAL_15]] {
// CHECK: %[[VAL_16:.*]] = fir.convert %[[VAL_4]] : (i64) -> index
// CHECK: %[[VAL_17:.*]] = fir.convert %[[VAL_12]] : (i64) -> index		// CHECK: %[[VAL_17:.*]] = fir.convert %[[VAL_12]] : (i64) -> index
// CHECK: %[[VAL_18:.*]] = arith.constant 1 : index
// CHECK: fir.do_loop %[[VAL_19:.*]] = %[[VAL_16]] to %[[VAL_17]] step %[[VAL_18]] {		// CHECK: fir.do_loop %[[VAL_19:.*]] = %[[VAL_16]] to %[[VAL_17]] step %[[VAL_18]] {
// CHECK: %[[VAL_20:.*]] = fir.convert %[[VAL_19]] : (index) -> i64		// CHECK: %[[VAL_20:.*]] = fir.convert %[[VAL_19]] : (index) -> i64
// CHECK: %[[VAL_21:.*]] = hlfir.designate %[[VAL_7]]#0 (%[[VAL_12]], %[[VAL_20]]) : (!fir.box<!fir.array<?x?xf32>>, i64, i64) -> !fir.ref<f32>		// CHECK: %[[VAL_21:.*]] = hlfir.designate %[[VAL_7]]#0 (%[[VAL_12]], %[[VAL_20]]) : (!fir.box<!fir.array<?x?xf32>>, i64, i64) -> !fir.ref<f32>
// CHECK: %[[VAL_22:.*]] = fir.load %[[VAL_21]] : !fir.ref<f32>		// CHECK: %[[VAL_22:.*]] = fir.load %[[VAL_21]] : !fir.ref<f32>
// CHECK: %[[VAL_23:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_12]], %[[VAL_20]]) : (!fir.box<!fir.array<?x?xf32>>, i64, i64) -> !fir.ref<f32>		// CHECK: %[[VAL_23:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_12]], %[[VAL_20]]) : (!fir.box<!fir.array<?x?xf32>>, i64, i64) -> !fir.ref<f32>
// CHECK: hlfir.assign %[[VAL_22]] to %[[VAL_23]] : f32, !fir.ref<f32>		// CHECK: hlfir.assign %[[VAL_22]] to %[[VAL_23]] : f32, !fir.ref<f32>
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }