This is an archive of the discontinued LLVM Phabricator instance.

Differential D158597

[flang][LoopVersioning] support fir.array_coor
ClosedPublic

Authored by tblah on Aug 23 2023, 5:03 AM.

Details

Reviewers

Leporacanthicus
kiranchandramohan
vzakhari

Commits

rGad9af7de90d2: [flang][LoopVersioning] support fir.array_coor

Summary

This is the last piece required for the loop versioning patch to work on
code lowered via HLFIR. With this patch, HLFIR performance on SPEC2017
roms is now similar to that of the FIR lowering.

Adding support for fir.array_coor means that many more loops will be
versioned, even in the FIR lowering. So far, these additional versioned
loops have not had a noticeable impact on performance in the benchmarks
I tried, but I expect they would speed up some programs in which a newly
versioned loop happens to be the hot code.
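As a rough plain-C++ analogy (not the FIR the pass actually emits), versioning a loop means guarding two copies of it with a run-time check. In the test further down, the check compares the dimension-0 byte stride returned by `fir.box_dims` with the element size (8 bytes for `f64`); only when they match does the cloned loop index the data as a flat contiguous array. The `StridedView` struct and `sumVersioned` function are hypothetical stand-ins for a Fortran descriptor (`fir.box`) and a versioned summation loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a rank-1 Fortran descriptor (fir.box): a base
// address, an extent, and the distance between elements in bytes.
struct StridedView {
  const char *base;
  std::size_t extent;
  std::size_t strideBytes;
};

// Sum all elements, choosing at run time between two versions of the loop.
double sumVersioned(const StridedView &a) {
  assert((a.base != nullptr || a.extent == 0) && "null data with elements");
  double sum = 0.0;
  if (a.strideBytes == sizeof(double)) {
    // Versioned copy: the data is contiguous, so index it as a flat array.
    const double *p = reinterpret_cast<const double *>(a.base);
    for (std::size_t i = 0; i < a.extent; ++i)
      sum += p[i];
  } else {
    // Original loop: honour the run-time stride on every access.
    for (std::size_t i = 0; i < a.extent; ++i) {
      double v;
      std::memcpy(&v, a.base + i * a.strideBytes, sizeof v);
      sum += v;
    }
  }
  return sum;
}
```

The original strided loop is kept as the fallback, which corresponds to the `else` branch of the `fir.if` in the CHECK lines of the test.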

The main difference between fir.array_coor and fir.coordinate_of is
that fir.coordinate_of uses zero-based indices, whereas fir.array_coor
uses the indices as written in the Fortran program (starting from 1 by
default, but also supporting non-default lower bounds). I opted to
transform fir.array_coor operations into fir.coordinate_of operations
because this allows both to share the same offset calculation logic.
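The index adjustment this reuse relies on can be sketched in plain C++: for each dimension, a Fortran index becomes a zero-based index by subtracting that dimension's lower bound, which is what `getIndex` in the diff does with an `arith.subi`. The function name `toZeroBased` is hypothetical and only illustrates the arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert Fortran-style indices (as used by fir.array_coor) into the
// zero-based indices that fir.coordinate_of expects:
//   index_0 = index - lowerBound   (per dimension)
std::vector<std::int64_t>
toZeroBased(const std::vector<std::int64_t> &indices,
            const std::vector<std::int64_t> &lowerBounds) {
  assert(indices.size() == lowerBounds.size() && "one lower bound per dim");
  std::vector<std::int64_t> zeroBased(indices.size());
  for (std::size_t d = 0; d < indices.size(); ++d)
    zeroBased[d] = indices[d] - lowerBounds[d];
  return zeroBased;
}
```

For example, with `real :: a(2:10, -3:4)`, the element `a(5, 0)` maps to the zero-based indices `(3, 3)`.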

The tricky bit of this patch is getting the correct lower bounds for the
array operand, which are subtracted from the fir.array_coor indices to
get zero-based indices. So far as I can tell, the FIR lowering always
provides lower bounds (shift) information in the shape operand of the
fir.array_coor when non-default lower bounds are used. When none is
given, I originally tried falling back to reading the lower bounds from
the box, but this led to a miscompilation in SPEC2017 cam4. Therefore
the pass instead assumes that if it cannot find an SSA value for the
shift information, the default lower bound (1) should be used.
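The rule the pass settles on (in the final revision, after cases 2 and 3 discussed below were removed) can be summarised with a small model: if the shape operand of the fir.array_coor is a fir.shift or fir.shapeshift, its origin for the dimension is the lower bound; in every other case (a plain fir.shape, or no shape operand at all) the lower bound is assumed to be 1, and the box is never consulted. The `ShapeOperand` struct and `ShapeKind` enum are hypothetical stand-ins for the FIR operations, not the FIR API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the shape operand attached to a fir.array_coor.
enum class ShapeKind { None, Shape, Shift, ShapeShift };

struct ShapeOperand {
  ShapeKind kind = ShapeKind::None;
  std::vector<std::int64_t> origins; // only meaningful for Shift/ShapeShift
};

// Like tryGetLowerBoundsFromShapeLike in the diff: only fir.shift and
// fir.shapeshift carry lower bounds.
std::optional<std::int64_t> tryGetLowerBound(const ShapeOperand &s,
                                             unsigned dim) {
  if (s.kind == ShapeKind::Shift || s.kind == ShapeKind::ShapeShift) {
    assert(dim < s.origins.size());
    return s.origins[dim];
  }
  return std::nullopt;
}

// Never read the lower bound from the box; fall back to the Fortran default.
std::int64_t lowerBoundOrDefault(const ShapeOperand &s, unsigned dim) {
  return tryGetLowerBound(s, dim).value_or(1);
}
```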

I suspect the incorrect lower bounds in the box in the FIR lowering were
already a known issue (see https://reviews.llvm.org/D158119).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tblah created this revision. Aug 23 2023, 5:03 AM

Herald added projects: Restricted Project, Restricted Project. Aug 23 2023, 5:03 AM

Herald added a subscriber: mehdi_amini.

tblah requested review of this revision. Aug 23 2023, 5:03 AM

Herald added a subscriber: jdoerfert. Aug 23 2023, 5:03 AM

Harbormaster completed remote builds in B254304: Diff 552663. Aug 23 2023, 5:26 AM

Sorry, I will need more time to review this thoroughly.

So far as I can tell, the FIR lowering will always provide lower bounds (shift) information in the shape operand to the fir.array_coor when non-default lower bounds are used. If none is given, I originally tried falling back to reading lower bounds from the box, but this led to miscompilation in SPEC2017 cam4. Therefore the pass instead assumes that if it can't already find an SSA value for the shift information, the default lower bound (1) should be used.

I believe the assumption is correct: fir.array_coor without a shift/shapeshift operand implies the default lower bound of 1. I think this comes from the code in the pre-codegen lowering of fir.array_coor and in further codegen. I think the fir.array_coor description might be missing this.

As to the incorrect bounds in the box, I wonder how this is possible given the changes in D158119. Would it be easy for you to put your changes for "falling back to reading lower bounds from the box" under an option so that I can run more testing and see if the issue appears on smaller tests? I do not want to deal with the cam4 miscompare...

vzakhari added inline comments. Aug 24 2023, 4:54 PM

flang/lib/Optimizer/Transforms/LoopVersioning.cpp
Line 145: I think cases 2 and 3 should not be here. According to `ArrayCoorOp` pre-codegen and `XArrayCoorOp` codegen, if there is no shift, then the lower bounds are always 1. I would suggest following the same logic here for consistency.
Line 252: Please add braces for the else-if clause.

Matt added a subscriber: Matt. Aug 28 2023, 1:33 PM

In D158597#4612550, @vzakhari wrote:

As to the incorrect bounds in the box, I wonder how this is possible given the changes in D158119. Will it be easy for you to put your changes for "falling back to reading lower bounds from the box" under an option so that I can run more testing and see if the issue appears on smaller tests? I do not want to deal with cam4 miscompare...

Yes, after D158119 I saw no issue on cam4 with the HLFIR lowering. The issue was with the FIR lowering.

In D158597#4625208, @tblah wrote:

Yes, after D158119 I saw no issue on cam4 with the HLFIR lowering. The issue was with the FIR lowering.

I see. Then please ignore my request.

vzakhari added inline comments. Aug 30 2023, 8:32 PM

flang/lib/Optimizer/Transforms/LoopVersioning.cpp
Line 145: Hi @tblah, I just want to make sure we are on the same page. I think cases 2 and 3 need to be removed for correctness, though I do not have an example where this might cause a problem. Otherwise, the patch looks good to me!

tblah marked 2 inline comments as done. Aug 31 2023, 3:05 AM

tblah added inline comments.

flang/lib/Optimizer/Transforms/LoopVersioning.cpp
Line 145: Hi, yeah, I'm working on it. I've had a few days off recently, and all the SPEC runs take ages. I don't think this should be a correctness issue: if we have incorrect lower bounds information in the IR, then that is a bug. But I'm happy to remove it as long as the HLFIR flow still works.

Changes: removed attempts to find lower bounds by tracing the memref argument through IR

Harbormaster completed remote builds in B255972: Diff 554970. Aug 31 2023, 4:27 AM

LGTM. Thanks, Tom.

This revision is now accepted and ready to land. Sep 1 2023, 6:37 AM

Thank you, Tom!

This revision was landed with ongoing or failed builds. Sep 4 2023, 3:43 AM

Closed by commit rGad9af7de90d2: [flang][LoopVersioning] support fir.array_coor (authored by tblah).

This revision was automatically updated to reflect the committed changes.

tblah added a commit: rGad9af7de90d2: [flang][LoopVersioning] support fir.array_coor.

Revision Contents

Path                                                Size
flang/lib/Optimizer/Transforms/LoopVersioning.cpp   89 lines
flang/test/Transforms/loop-versioning.fir           696 lines

Diff 552663

flang/lib/Optimizer/Transforms/LoopVersioning.cpp

@@ first 113 unchanged lines not shown @@
 }

 /// normalize a value (removing fir.declare and fir.rebox) so that we can
 /// more conveniently spot values which came from function arguments
 static mlir::Value normaliseVal(mlir::Value val) {
   return unwrapFirDeclare(unwrapReboxOp(val));
 }

+/// some FIR operations accept a fir.shape, a fir.shift or a fir.shapeshift.
+/// fir.shift and fir.shapeshift allow us to extract lower bounds
+/// if lowerbounds cannot be found, return nullptr
+static mlir::Value tryGetLowerBoundsFromShapeLike(mlir::Value shapeLike,
+                                                  unsigned dim) {
+  mlir::Value lowerBound{nullptr};
+  if (auto shift = shapeLike.getDefiningOp<fir::ShiftOp>())
+    lowerBound = shift.getOrigins()[dim];
+  if (auto shapeShift = shapeLike.getDefiningOp<fir::ShapeShiftOp>())
+    lowerBound = shapeShift.getOrigins()[dim];
+  return lowerBound;
+}
+
+/// attempt to get the array lower bounds of dimension dim of the memref
+/// argument to a fir.array_coor op
+/// 0 <= dim < rank
+/// May return nullptr if no lower bounds can be determined
+static mlir::Value getLowerBound(fir::ArrayCoorOp coop, unsigned dim) {
+  // 1) try to get from the shape argument to fir.array_coor
+  if (mlir::Value shapeLike = coop.getShape())
+    if (mlir::Value lb = tryGetLowerBoundsFromShapeLike(shapeLike, dim))
+      return lb;
+
+  // 2) if we get the memref from a rebox op, that might have a shape argument
+  if (auto rebox = coop.getMemref().getDefiningOp<fir::ReboxOp>())
+    if (mlir::Value shapeLike = rebox.getShape())
+      if (mlir::Value lb = tryGetLowerBoundsFromShapeLike(shapeLike, dim))
+        return lb;
+
+  // 3) if we get the memref from a fir.declare, that might have a shape
+  // argument
+  if (auto declare =
+          unwrapReboxOp(coop.getMemref()).getDefiningOp<fir::DeclareOp>())
+    if (mlir::Value shapeLike = declare.getShape())
+      if (mlir::Value lb = tryGetLowerBoundsFromShapeLike(shapeLike, dim))
+        return lb;
+
+  // It is important not to try to read the lower bound from the box, because
+  // in the FIR lowering, boxes will sometimes contain incorrect lower bound
+  // information
+
+  // out of ideas
+  return {};
+}
+
+/// gets the i'th index from array coordinate operation op
+/// dim should range between 0 and rank - 1
+static mlir::Value getIndex(fir::FirOpBuilder &builder, mlir::Operation *op,
+                            unsigned dim) {
+  if (fir::CoordinateOp coop = mlir::dyn_cast<fir::CoordinateOp>(op))
+    return coop.getCoor()[dim];
+
+  fir::ArrayCoorOp coop = mlir::dyn_cast<fir::ArrayCoorOp>(op);
+  assert(coop &&
+         "operation must be either fir.coordinate_of or fir.array_coor");
+
+  // fir.coordinate_of indices start at 0: adjust these indices to match by
+  // subtracting the lower bound
+  mlir::Value index = coop.getIndices()[dim];
+  mlir::Value lb = getLowerBound(coop, dim);
+  if (!lb)
+    // assume a default lower bound of one
+    lb = builder.createIntegerConstant(coop.getLoc(), index.getType(), 1);
+
+  // index_0 = index - lb;
+  if (lb.getType() != index.getType())
+    lb = builder.createConvert(coop.getLoc(), index.getType(), lb);
+  return builder.create<mlir::arith::SubIOp>(coop.getLoc(), index, lb);
+}
+
 void LoopVersioningPass::runOnOperation() {
   LLVM_DEBUG(llvm::dbgs() << "=== Begin " DEBUG_TYPE " ===\n");
   mlir::func::FuncOp func = getOperation();

   /// @c ArgInfo
   /// A structure to hold an argument, the size of the argument and dimension
   /// information.
   struct ArgInfo {
@@ 38 unchanged lines not shown @@ struct OpsWithArgs {
     mlir::SmallVector<ArgInfo, 4> argsAndDims;
   };
   // Now see if those arguments are used inside any loop.
   mlir::SmallVector<OpsWithArgs, 4> loopsOfInterest;

   func.walk([&](fir::DoLoopOp loop) {
     mlir::Block &body = *loop.getBody();
     mlir::SmallVector<ArgInfo, 4> argsInLoop;
-    body.walk([&](fir::CoordinateOp op) {
+    body.walk([&](mlir::Operation *op) {
+      // support either fir.array_coor or fir.coordinate_of
+      if (auto arrayCoor = mlir::dyn_cast<fir::ArrayCoorOp>(op)) {
+        // no support currently for sliced arrays
+        if (arrayCoor.getSlice())
+          return;
+      } else if (!mlir::isa<fir::CoordinateOp>(op))
+        return;
+
       // The current operation could be inside another loop than
       // the one we're currently processing. Skip it, we'll get
       // to it later.
       if (op->getParentOfType<fir::DoLoopOp>() != loop)
         return;
       mlir::Value operand = op->getOperand(0);
       for (auto a : argsOfInterest) {
         if (a.arg == normaliseVal(operand)) {
@@ 73 unchanged lines not shown @@ for (auto &arg : op.argsAndDims) {
       auto elementType = fir::unwrapSeqOrBoxedSeqType(arg.arg.getType());
       mlir::Type arrTy = fir::SequenceType::get(newShape, elementType);
       mlir::Type boxArrTy = fir::BoxType::get(arrTy);
       mlir::Type refArrTy = builder.getRefType(arrTy);
       auto carg = builder.create<fir::ConvertOp>(loc, boxArrTy, arg.arg);
       auto caddr = builder.create<fir::BoxAddrOp>(loc, refArrTy, carg);
       auto insPt = builder.saveInsertionPoint();
       // Use caddr instead of arg.
-      clonedLoop->walk([&](fir::CoordinateOp coop) {
+      clonedLoop->walk([&](mlir::Operation *coop) {
+        if (!mlir::isa<fir::CoordinateOp, fir::ArrayCoorOp>(coop))
+          return;
         // Reduce the multi-dimensioned index to a single index.
         // This is required because fir arrays do not support multiple dimensions
         // with unknown dimensions at compile time.
         // We then calculate the multidimensional array like this:
         // arr(x, y, z) becomes arr(z * stride(2) + y * stride(1) + x)
         // where stride is the distance between elements in the dimensions
         // 0, 1 and 2 or x, y and z.
         if (coop->getOperand(0) == arg.arg && coop->getOperands().size() >= 2) {
           builder.setInsertionPoint(coop);
           mlir::Value totalIndex;
           for (unsigned i = arg.rank - 1; i > 0; i--) {
-            // Operand(1) = array; Operand(2) = index1; Operand(3) = index2
             mlir::Value curIndex =
-                builder.createConvert(loc, idxTy, coop->getOperand(i + 1));
+                builder.createConvert(loc, idxTy, getIndex(builder, coop, i));
             // Multiply by the stride of this array. Later we'll divide by the
             // element size.
             mlir::Value scale =
                 builder.createConvert(loc, idxTy, arg.dims[i].getResult(2));
             curIndex =
                 builder.create<mlir::arith::MulIOp>(loc, scale, curIndex);
             totalIndex = (totalIndex) ? builder.create<mlir::arith::AddIOp>(
                                             loc, curIndex, totalIndex)
                                       : curIndex;
           }
           // This is the lowest dimension - which doesn't need scaling
           mlir::Value finalIndex =
-              builder.createConvert(loc, idxTy, coop->getOperand(1));
+              builder.createConvert(loc, idxTy, getIndex(builder, coop, 0));
           if (totalIndex) {
             assert(llvm::isPowerOf2_32(arg.size) &&
                    "Expected power of two here");
             unsigned bits = llvm::Log2_32(arg.size);
             mlir::Value elemShift =
                 builder.createIntegerConstant(loc, idxTy, bits);
             totalIndex = builder.create<mlir::arith::AddIOp>(
                 loc,
@@ remaining 53 unchanged lines not shown @@
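The linearization performed in the cloned loop can be checked by hand with a small model. Following the comments in the diff, each index is first made zero-based; every dimension above the first is multiplied by its byte stride (result 2 of `fir.box_dims`), the sum is shifted right by log2 of the element size to turn bytes into elements, and the zero-based index of the first dimension is added unscaled. This appears to match the `arith.muli`, `arith.shrsi` (by 3 for 8-byte `f64`) and `arith.addi` sequence in the `sum2d` CHECK lines of the test. The function `linearIndex` is a hypothetical illustration, and, like the pass, it assumes the element size is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// zeroBased[d]   : zero-based index in dimension d
// strideBytes[d] : byte distance between neighbours in dimension d
//                  (strideBytes[0] is unused: the versioned loop only runs
//                   when it equals elemSize)
// elemSize       : element size in bytes, assumed to be a power of two
std::int64_t linearIndex(const std::vector<std::int64_t> &zeroBased,
                         const std::vector<std::int64_t> &strideBytes,
                         std::int64_t elemSize) {
  assert(!zeroBased.empty() && zeroBased.size() == strideBytes.size());
  assert(elemSize > 0 && (elemSize & (elemSize - 1)) == 0 &&
         "Expected power of two here");
  // log2(elemSize): the shift that turns a byte offset into an element offset.
  unsigned bits = 0;
  while ((std::int64_t{1} << bits) < elemSize)
    ++bits;

  // Dimensions rank-1 down to 1: accumulate stride * index as a byte offset.
  std::int64_t totalBytes = 0;
  for (std::size_t d = zeroBased.size() - 1; d > 0; --d)
    totalBytes += strideBytes[d] * zeroBased[d];

  // Shift bytes down to elements, then add dimension 0 unscaled.
  return (totalBytes >> bits) + zeroBased[0];
}
```

For a contiguous 3 x 4 `f64` array (byte strides 8 and 24), the Fortran element `a(3, 4)` has zero-based indices `(2, 3)` and maps to element 11 of the flat array, i.e. 2 + 3 * 3.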

flang/test/Transforms/loop-versioning.fir

	Show First 20 Lines • Show All 516 Lines • ▼ Show 20 Lines
	// CHECK: %{{.*}}= fir.load %[[COORD2]] : !fir.ref<f64>			// CHECK: %{{.*}}= fir.load %[[COORD2]] : !fir.ref<f64>
	// CHECK: fir.result %{{.}}, %{{.}}			// CHECK: fir.result %{{.}}, %{{.}}
	// CHECK: }			// CHECK: }
	// CHECK fir.result %[[LOOP_RES2]]#0, %[[LOOP_RES2]]#1			// CHECK fir.result %[[LOOP_RES2]]#0, %[[LOOP_RES2]]#1
	// CHECK: }			// CHECK: }
	// CHECK: fir.store %[[IF_RES]]#1 to %{{.*}}			// CHECK: fir.store %[[IF_RES]]#1 to %{{.*}}
	// CHECK: return			// CHECK: return

				// test sum1d with hlfir lowering
				func.func @_QPsum1d(%arg0: !fir.box<!fir.array<?xf64>> {fir.bindc_name = "a"}, %arg1: !fir.ref<i32> {fir.bindc_name = "n"}) {
				%c1 = arith.constant 1 : index
				%cst = arith.constant 0.000000e+00 : f64
				%0 = fir.declare %arg0 {uniq_name = "_QFsum1dEa"} : (!fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>>
				%1 = fir.rebox %0 : (!fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>>
				%2 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFsum1dEi"}
				%3 = fir.declare %2 {uniq_name = "_QFsum1dEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%4 = fir.declare %arg1 {uniq_name = "_QFsum1dEn"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%5 = fir.alloca f64 {bindc_name = "sum", uniq_name = "_QFsum1dEsum"}
				%6 = fir.declare %5 {uniq_name = "_QFsum1dEsum"} : (!fir.ref<f64>) -> !fir.ref<f64>
				fir.store %cst to %6 : !fir.ref<f64>
				%7 = fir.load %4 : !fir.ref<i32>
				%8 = fir.convert %7 : (i32) -> index
				%9 = fir.convert %c1 : (index) -> i32
				%10:2 = fir.do_loop %arg2 = %c1 to %8 step %c1 iter_args(%arg3 = %9) -> (index, i32) {
				fir.store %arg3 to %3 : !fir.ref<i32>
				%11 = fir.load %6 : !fir.ref<f64>
				%12 = fir.load %3 : !fir.ref<i32>
				%13 = fir.convert %12 : (i32) -> i64
				%14 = fir.array_coor %1 %13 : (!fir.box<!fir.array<?xf64>>, i64) -> !fir.ref<f64>
				%15 = fir.load %14 : !fir.ref<f64>
				%16 = arith.addf %11, %15 fastmath<contract> : f64
				fir.store %16 to %6 : !fir.ref<f64>
				%17 = arith.addi %arg2, %c1 : index
				%18 = fir.load %3 : !fir.ref<i32>
				%19 = arith.addi %18, %9 : i32
				fir.result %17, %19 : index, i32
				}
				fir.store %10#1 to %3 : !fir.ref<i32>
				return
				}
				// CHECK-LABEL: func.func @_QPsum1d(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.box<!fir.array<?xf64>> {fir.bindc_name = "a"},
				// CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "n"}) {
				// CHECK: %[[VAL_2:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_3:.*]] = arith.constant 0.000000e+00 : f64
				// CHECK: %[[VAL_4:.*]] = fir.declare %[[VAL_0]] {uniq_name = "_QFsum1dEa"} : (!fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>>
				// CHECK: %[[VAL_5:.*]] = fir.rebox %[[VAL_4]] : (!fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>>
				// CHECK: %[[VAL_6:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFsum1dEi"}
				// CHECK: %[[VAL_7:.*]] = fir.declare %[[VAL_6]] {uniq_name = "_QFsum1dEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_8:.*]] = fir.declare %[[VAL_1]] {uniq_name = "_QFsum1dEn"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_9:.*]] = fir.alloca f64 {bindc_name = "sum", uniq_name = "_QFsum1dEsum"}
				// CHECK: %[[VAL_10:.*]] = fir.declare %[[VAL_9]] {uniq_name = "_QFsum1dEsum"} : (!fir.ref<f64>) -> !fir.ref<f64>
				// CHECK: fir.store %[[VAL_3]] to %[[VAL_10]] : !fir.ref<f64>
				// CHECK: %[[VAL_11:.*]] = fir.load %[[VAL_8]] : !fir.ref<i32>
				// CHECK: %[[VAL_12:.*]] = fir.convert %[[VAL_11]] : (i32) -> index
				// CHECK: %[[VAL_13:.*]] = fir.convert %[[VAL_2]] : (index) -> i32
				// CHECK: %[[VAL_14:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_15:.*]]:3 = fir.box_dims %[[VAL_5]], %[[VAL_14]] : (!fir.box<!fir.array<?xf64>>, index) -> (index, index, index)
				// CHECK: %[[VAL_16:.*]] = arith.constant 8 : index
				// CHECK: %[[VAL_17:.*]] = arith.cmpi eq, %[[VAL_15]]#2, %[[VAL_16]] : index
				// CHECK: %[[VAL_18:.*]]:2 = fir.if %[[VAL_17]] -> (index, i32) {
				// CHECK: %[[VAL_19:.*]] = fir.convert %[[VAL_5]] : (!fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>>
				// CHECK: %[[VAL_20:.*]] = fir.box_addr %[[VAL_19]] : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
				// CHECK: %[[VAL_21:.]]:2 = fir.do_loop %[[VAL_22:.]] = %[[VAL_2]] to %[[VAL_12]] step %[[VAL_2]] iter_args(%[[VAL_23:.*]] = %[[VAL_13]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_23]] to %[[VAL_7]] : !fir.ref<i32>
				// CHECK: %[[VAL_24:.*]] = fir.load %[[VAL_10]] : !fir.ref<f64>
				// CHECK: %[[VAL_25:.*]] = fir.load %[[VAL_7]] : !fir.ref<i32>
				// CHECK: %[[VAL_26:.*]] = fir.convert %[[VAL_25]] : (i32) -> i64
				// CHECK: %[[VAL_27:.*]] = arith.constant 1 : i64
				// CHECK: %[[VAL_28:.*]] = arith.subi %[[VAL_26]], %[[VAL_27]] : i64
				// CHECK: %[[VAL_29:.*]] = fir.convert %[[VAL_28]] : (i64) -> index
				// CHECK: %[[VAL_30:.*]] = fir.coordinate_of %[[VAL_20]], %[[VAL_29]] : (!fir.ref<!fir.array<?xf64>>, index) -> !fir.ref<f64>
				// CHECK: %[[VAL_31:.*]] = fir.load %[[VAL_30]] : !fir.ref<f64>
				// CHECK: %[[VAL_32:.*]] = arith.addf %[[VAL_24]], %[[VAL_31]] fastmath<contract> : f64
				// CHECK: fir.store %[[VAL_32]] to %[[VAL_10]] : !fir.ref<f64>
				// CHECK: %[[VAL_33:.*]] = arith.addi %[[VAL_22]], %[[VAL_2]] : index
				// CHECK: %[[VAL_34:.*]] = fir.load %[[VAL_7]] : !fir.ref<i32>
				// CHECK: %[[VAL_35:.*]] = arith.addi %[[VAL_34]], %[[VAL_13]] : i32
				// CHECK: fir.result %[[VAL_33]], %[[VAL_35]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_36:.*]]#0, %[[VAL_36]]#1 : index, i32
				// CHECK: } else {
				// CHECK: %[[VAL_37:.]]:2 = fir.do_loop %[[VAL_38:.]] = %[[VAL_2]] to %[[VAL_12]] step %[[VAL_2]] iter_args(%[[VAL_39:.*]] = %[[VAL_13]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_39]] to %[[VAL_7]] : !fir.ref<i32>
				// CHECK: %[[VAL_40:.*]] = fir.load %[[VAL_10]] : !fir.ref<f64>
				// CHECK: %[[VAL_41:.*]] = fir.load %[[VAL_7]] : !fir.ref<i32>
				// CHECK: %[[VAL_42:.*]] = fir.convert %[[VAL_41]] : (i32) -> i64
				// CHECK: %[[VAL_43:.*]] = fir.array_coor %[[VAL_5]] %[[VAL_42]] : (!fir.box<!fir.array<?xf64>>, i64) -> !fir.ref<f64>
				// CHECK: %[[VAL_44:.*]] = fir.load %[[VAL_43]] : !fir.ref<f64>
				// CHECK: %[[VAL_45:.*]] = arith.addf %[[VAL_40]], %[[VAL_44]] fastmath<contract> : f64
				// CHECK: fir.store %[[VAL_45]] to %[[VAL_10]] : !fir.ref<f64>
				// CHECK: %[[VAL_46:.*]] = arith.addi %[[VAL_38]], %[[VAL_2]] : index
				// CHECK: %[[VAL_47:.*]] = fir.load %[[VAL_7]] : !fir.ref<i32>
				// CHECK: %[[VAL_48:.*]] = arith.addi %[[VAL_47]], %[[VAL_13]] : i32
				// CHECK: fir.result %[[VAL_46]], %[[VAL_48]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_49:.*]]#0, %[[VAL_49]]#1 : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_50:.*]]#1 to %[[VAL_7]] : !fir.ref<i32>
				// CHECK: return
				// CHECK: }

				// test sum2d with hlfir lowering
				func.func @_QPsum2d(%arg0: !fir.box<!fir.array<?x?xf64>> {fir.bindc_name = "a"}, %arg1: !fir.ref<i32> {fir.bindc_name = "nx"}, %arg2: !fir.ref<i32> {fir.bindc_name = "ny"}) {
				%c1 = arith.constant 1 : index
				%cst = arith.constant 0.000000e+00 : f64
				%0 = fir.declare %arg0 {uniq_name = "_QFsum2dEa"} : (!fir.box<!fir.array<?x?xf64>>) -> !fir.box<!fir.array<?x?xf64>>
				%1 = fir.rebox %0 : (!fir.box<!fir.array<?x?xf64>>) -> !fir.box<!fir.array<?x?xf64>>
				%2 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFsum2dEi"}
				%3 = fir.declare %2 {uniq_name = "_QFsum2dEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%4 = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFsum2dEj"}
				%5 = fir.declare %4 {uniq_name = "_QFsum2dEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%6 = fir.declare %arg1 {uniq_name = "_QFsum2dEnx"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%7 = fir.declare %arg2 {uniq_name = "_QFsum2dEny"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%8 = fir.alloca f64 {bindc_name = "sum", uniq_name = "_QFsum2dEsum"}
				%9 = fir.declare %8 {uniq_name = "_QFsum2dEsum"} : (!fir.ref<f64>) -> !fir.ref<f64>
				fir.store %cst to %9 : !fir.ref<f64>
				%10 = fir.load %6 : !fir.ref<i32>
				%11 = fir.convert %10 : (i32) -> index
				%12 = fir.convert %c1 : (index) -> i32
				%13:2 = fir.do_loop %arg3 = %c1 to %11 step %c1 iter_args(%arg4 = %12) -> (index, i32) {
				fir.store %arg4 to %3 : !fir.ref<i32>
				%14 = fir.load %7 : !fir.ref<i32>
				%15 = fir.convert %14 : (i32) -> index
				%16:2 = fir.do_loop %arg5 = %c1 to %15 step %c1 iter_args(%arg6 = %12) -> (index, i32) {
				fir.store %arg6 to %5 : !fir.ref<i32>
				%20 = fir.load %9 : !fir.ref<f64>
				%21 = fir.load %5 : !fir.ref<i32>
				%22 = fir.convert %21 : (i32) -> i64
				%23 = fir.load %3 : !fir.ref<i32>
				%24 = fir.convert %23 : (i32) -> i64
				%25 = fir.array_coor %1 %22, %24 : (!fir.box<!fir.array<?x?xf64>>, i64, i64) -> !fir.ref<f64>
				%26 = fir.load %25 : !fir.ref<f64>
				%27 = arith.addf %20, %26 fastmath<contract> : f64
				fir.store %27 to %9 : !fir.ref<f64>
				%28 = arith.addi %arg5, %c1 : index
				%29 = fir.load %5 : !fir.ref<i32>
				%30 = arith.addi %29, %12 : i32
				fir.result %28, %30 : index, i32
				}
				fir.store %16#1 to %5 : !fir.ref<i32>
				%17 = arith.addi %arg3, %c1 : index
				%18 = fir.load %3 : !fir.ref<i32>
				%19 = arith.addi %18, %12 : i32
				fir.result %17, %19 : index, i32
				}
				fir.store %13#1 to %3 : !fir.ref<i32>
				return
				}
				// CHECK-LABEL: func.func @_QPsum2d(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.box<!fir.array<?x?xf64>> {fir.bindc_name = "a"},
				// CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "nx"},
				// CHECK-SAME: %[[VAL_2:.*]]: !fir.ref<i32> {fir.bindc_name = "ny"}) {
				// CHECK: %[[VAL_3:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_4:.*]] = arith.constant 0.000000e+00 : f64
				// CHECK: %[[VAL_5:.*]] = fir.declare %[[VAL_0]] {uniq_name = "_QFsum2dEa"} : (!fir.box<!fir.array<?x?xf64>>) -> !fir.box<!fir.array<?x?xf64>>
				// CHECK: %[[VAL_6:.*]] = fir.rebox %[[VAL_5]] : (!fir.box<!fir.array<?x?xf64>>) -> !fir.box<!fir.array<?x?xf64>>
				// CHECK: %[[VAL_7:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFsum2dEi"}
				// CHECK: %[[VAL_8:.*]] = fir.declare %[[VAL_7]] {uniq_name = "_QFsum2dEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_9:.*]] = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFsum2dEj"}
				// CHECK: %[[VAL_10:.*]] = fir.declare %[[VAL_9]] {uniq_name = "_QFsum2dEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_11:.*]] = fir.declare %[[VAL_1]] {uniq_name = "_QFsum2dEnx"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_12:.*]] = fir.declare %[[VAL_2]] {uniq_name = "_QFsum2dEny"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_13:.*]] = fir.alloca f64 {bindc_name = "sum", uniq_name = "_QFsum2dEsum"}
				// CHECK: %[[VAL_14:.*]] = fir.declare %[[VAL_13]] {uniq_name = "_QFsum2dEsum"} : (!fir.ref<f64>) -> !fir.ref<f64>
				// CHECK: fir.store %[[VAL_4]] to %[[VAL_14]] : !fir.ref<f64>
				// CHECK: %[[VAL_15:.*]] = fir.load %[[VAL_11]] : !fir.ref<i32>
				// CHECK: %[[VAL_16:.*]] = fir.convert %[[VAL_15]] : (i32) -> index
				// CHECK: %[[VAL_17:.*]] = fir.convert %[[VAL_3]] : (index) -> i32
				// CHECK: %[[VAL_18:.]]:2 = fir.do_loop %[[VAL_19:.]] = %[[VAL_3]] to %[[VAL_16]] step %[[VAL_3]] iter_args(%[[VAL_20:.*]] = %[[VAL_17]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_20]] to %[[VAL_8]] : !fir.ref<i32>
				// CHECK: %[[VAL_21:.*]] = fir.load %[[VAL_12]] : !fir.ref<i32>
				// CHECK: %[[VAL_22:.*]] = fir.convert %[[VAL_21]] : (i32) -> index
				// CHECK: %[[VAL_23:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_24:.*]]:3 = fir.box_dims %[[VAL_6]], %[[VAL_23]] : (!fir.box<!fir.array<?x?xf64>>, index) -> (index, index, index)
				// CHECK: %[[VAL_25:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_26:.*]]:3 = fir.box_dims %[[VAL_6]], %[[VAL_25]] : (!fir.box<!fir.array<?x?xf64>>, index) -> (index, index, index)
				// CHECK: %[[VAL_27:.*]] = arith.constant 8 : index
				// CHECK: %[[VAL_28:.*]] = arith.cmpi eq, %[[VAL_24]]#2, %[[VAL_27]] : index
				// CHECK: %[[VAL_29:.*]]:2 = fir.if %[[VAL_28]] -> (index, i32) {
				// CHECK: %[[VAL_30:.*]] = fir.convert %[[VAL_6]] : (!fir.box<!fir.array<?x?xf64>>) -> !fir.box<!fir.array<?xf64>>
				// CHECK: %[[VAL_31:.*]] = fir.box_addr %[[VAL_30]] : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
				// CHECK: %[[VAL_32:.]]:2 = fir.do_loop %[[VAL_33:.]] = %[[VAL_3]] to %[[VAL_22]] step %[[VAL_3]] iter_args(%[[VAL_34:.*]] = %[[VAL_17]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_34]] to %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_35:.*]] = fir.load %[[VAL_14]] : !fir.ref<f64>
				// CHECK: %[[VAL_36:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_37:.*]] = fir.convert %[[VAL_36]] : (i32) -> i64
				// CHECK: %[[VAL_38:.*]] = fir.load %[[VAL_8]] : !fir.ref<i32>
				// CHECK: %[[VAL_39:.*]] = fir.convert %[[VAL_38]] : (i32) -> i64
				// CHECK: %[[VAL_40:.*]] = arith.constant 1 : i64
				// CHECK: %[[VAL_41:.*]] = arith.subi %[[VAL_39]], %[[VAL_40]] : i64
				// CHECK: %[[VAL_42:.*]] = fir.convert %[[VAL_41]] : (i64) -> index
				// CHECK: %[[VAL_43:.*]] = arith.muli %[[VAL_26]]#2, %[[VAL_42]] : index
				// CHECK: %[[VAL_44:.*]] = arith.constant 1 : i64
				// CHECK: %[[VAL_45:.*]] = arith.subi %[[VAL_37]], %[[VAL_44]] : i64
				// CHECK: %[[VAL_46:.*]] = fir.convert %[[VAL_45]] : (i64) -> index
				// CHECK: %[[VAL_47:.*]] = arith.constant 3 : index
				// CHECK: %[[VAL_48:.*]] = arith.shrsi %[[VAL_43]], %[[VAL_47]] : index
				// CHECK: %[[VAL_49:.*]] = arith.addi %[[VAL_48]], %[[VAL_46]] : index
				// CHECK: %[[VAL_50:.*]] = fir.coordinate_of %[[VAL_31]], %[[VAL_49]] : (!fir.ref<!fir.array<?xf64>>, index) -> !fir.ref<f64>
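				// Note: the versioned branch above is taken only when the byte stride of
				// dimension 0 equals 8 (the size of an f64), i.e. the array is contiguous in
				// its first dimension, so elements can be addressed through a 1-D view of the
				// box. The two fir.array_coor indices are made zero-based by subtracting the
				// default lower bound of 1, and the element offset is
				//   offset = ((idx2 - 1) * byte_stride(dim 1)) / 8 + (idx1 - 1)
				// where the division by 8 is emitted as an arithmetic right shift by 3.
				// For example, for a contiguous 10x5 array, byte_stride(dim 1) is 80, so
				// (idx1, idx2) = (3, 2) gives offset = 80 / 8 + 2 = 12.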
				// CHECK: %[[VAL_51:.*]] = fir.load %[[VAL_50]] : !fir.ref<f64>
				// CHECK: %[[VAL_52:.*]] = arith.addf %[[VAL_35]], %[[VAL_51]] fastmath<contract> : f64
				// CHECK: fir.store %[[VAL_52]] to %[[VAL_14]] : !fir.ref<f64>
				// CHECK: %[[VAL_53:.*]] = arith.addi %[[VAL_33]], %[[VAL_3]] : index
				// CHECK: %[[VAL_54:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_55:.*]] = arith.addi %[[VAL_54]], %[[VAL_17]] : i32
				// CHECK: fir.result %[[VAL_53]], %[[VAL_55]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_56:.*]]#0, %[[VAL_56]]#1 : index, i32
				// CHECK: } else {
				// CHECK: %[[VAL_57:.*]]:2 = fir.do_loop %[[VAL_58:.*]] = %[[VAL_3]] to %[[VAL_22]] step %[[VAL_3]] iter_args(%[[VAL_59:.*]] = %[[VAL_17]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_59]] to %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_60:.*]] = fir.load %[[VAL_14]] : !fir.ref<f64>
				// CHECK: %[[VAL_61:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_62:.*]] = fir.convert %[[VAL_61]] : (i32) -> i64
				// CHECK: %[[VAL_63:.*]] = fir.load %[[VAL_8]] : !fir.ref<i32>
				// CHECK: %[[VAL_64:.*]] = fir.convert %[[VAL_63]] : (i32) -> i64
				// CHECK: %[[VAL_65:.*]] = fir.array_coor %[[VAL_6]] %[[VAL_62]], %[[VAL_64]] : (!fir.box<!fir.array<?x?xf64>>, i64, i64) -> !fir.ref<f64>
				// CHECK: %[[VAL_66:.*]] = fir.load %[[VAL_65]] : !fir.ref<f64>
				// CHECK: %[[VAL_67:.*]] = arith.addf %[[VAL_60]], %[[VAL_66]] fastmath<contract> : f64
				// CHECK: fir.store %[[VAL_67]] to %[[VAL_14]] : !fir.ref<f64>
				// CHECK: %[[VAL_68:.*]] = arith.addi %[[VAL_58]], %[[VAL_3]] : index
				// CHECK: %[[VAL_69:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_70:.*]] = arith.addi %[[VAL_69]], %[[VAL_17]] : i32
				// CHECK: fir.result %[[VAL_68]], %[[VAL_70]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_71:.*]]#0, %[[VAL_71]]#1 : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_72:.*]]#1 to %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_73:.*]] = arith.addi %[[VAL_19]], %[[VAL_3]] : index
				// CHECK: %[[VAL_74:.*]] = fir.load %[[VAL_8]] : !fir.ref<i32>
				// CHECK: %[[VAL_75:.*]] = arith.addi %[[VAL_74]], %[[VAL_17]] : i32
				// CHECK: fir.result %[[VAL_73]], %[[VAL_75]] : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_76:.*]]#1 to %[[VAL_8]] : !fir.ref<i32>
				// CHECK: return
				// CHECK: }

				// test sum3d with hlfir lowering
				func.func @_QPsum3d(%arg0: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "a"}, %arg1: !fir.ref<i32> {fir.bindc_name = "nx"}, %arg2: !fir.ref<i32> {fir.bindc_name = "ny"}, %arg3: !fir.ref<i32> {fir.bindc_name = "nz"}) {
				%c0 = arith.constant 0 : index
				%c1 = arith.constant 1 : index
				%cst = arith.constant 0.000000e+00 : f64
				%0 = fir.declare %arg0 {uniq_name = "_QFsum3dEa"} : (!fir.box<!fir.array<?x?x?xf64>>) -> !fir.box<!fir.array<?x?x?xf64>>
				%1 = fir.rebox %0 : (!fir.box<!fir.array<?x?x?xf64>>) -> !fir.box<!fir.array<?x?x?xf64>>
				%2 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFsum3dEi"}
				%3 = fir.declare %2 {uniq_name = "_QFsum3dEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%4 = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFsum3dEj"}
				%5 = fir.declare %4 {uniq_name = "_QFsum3dEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%6 = fir.alloca i32 {bindc_name = "k", uniq_name = "_QFsum3dEk"}
				%7 = fir.declare %6 {uniq_name = "_QFsum3dEk"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%8 = fir.declare %arg1 {uniq_name = "_QFsum3dEnx"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%9 = fir.declare %arg2 {uniq_name = "_QFsum3dEny"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%10 = fir.declare %arg3 {uniq_name = "_QFsum3dEnz"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%11 = fir.alloca f64 {bindc_name = "sum", uniq_name = "_QFsum3dEsum"}
				%12 = fir.declare %11 {uniq_name = "_QFsum3dEsum"} : (!fir.ref<f64>) -> !fir.ref<f64>
				fir.store %cst to %12 : !fir.ref<f64>
				%13 = fir.load %10 : !fir.ref<i32>
				%14 = fir.convert %13 : (i32) -> index
				%15 = fir.convert %c1 : (index) -> i32
				%16:2 = fir.do_loop %arg4 = %c1 to %14 step %c1 iter_args(%arg5 = %15) -> (index, i32) {
				fir.store %arg5 to %7 : !fir.ref<i32>
				%17 = fir.load %9 : !fir.ref<i32>
				%18 = fir.convert %17 : (i32) -> index
				%19:2 = fir.do_loop %arg6 = %c1 to %18 step %c1 iter_args(%arg7 = %15) -> (index, i32) {
				fir.store %arg7 to %5 : !fir.ref<i32>
				%23 = fir.load %8 : !fir.ref<i32>
				%24 = fir.convert %23 : (i32) -> index
				%25 = fir.convert %c0 : (index) -> i32
				%26:2 = fir.do_loop %arg8 = %c0 to %24 step %c1 iter_args(%arg9 = %25) -> (index, i32) {
				fir.store %arg9 to %3 : !fir.ref<i32>
				%30 = fir.load %12 : !fir.ref<f64>
				%31 = fir.load %3 : !fir.ref<i32>
				%32 = fir.convert %31 : (i32) -> i64
				%33 = fir.load %5 : !fir.ref<i32>
				%34 = fir.convert %33 : (i32) -> i64
				%35 = fir.load %7 : !fir.ref<i32>
				%36 = fir.convert %35 : (i32) -> i64
				%37 = fir.array_coor %1 %32, %34, %36 : (!fir.box<!fir.array<?x?x?xf64>>, i64, i64, i64) -> !fir.ref<f64>
				%38 = fir.load %37 : !fir.ref<f64>
				%39 = arith.addf %30, %38 fastmath<contract> : f64
				fir.store %39 to %12 : !fir.ref<f64>
				%40 = arith.addi %arg8, %c1 : index
				%41 = fir.load %3 : !fir.ref<i32>
				%42 = arith.addi %41, %15 : i32
				fir.result %40, %42 : index, i32
				}
				fir.store %26#1 to %3 : !fir.ref<i32>
				%27 = arith.addi %arg6, %c1 : index
				%28 = fir.load %5 : !fir.ref<i32>
				%29 = arith.addi %28, %15 : i32
				fir.result %27, %29 : index, i32
				}
				fir.store %19#1 to %5 : !fir.ref<i32>
				%20 = arith.addi %arg4, %c1 : index
				%21 = fir.load %7 : !fir.ref<i32>
				%22 = arith.addi %21, %15 : i32
				fir.result %20, %22 : index, i32
				}
				fir.store %16#1 to %7 : !fir.ref<i32>
				return
				}
				// CHECK-LABEL: func.func @_QPsum3d(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.box<!fir.array<?x?x?xf64>> {fir.bindc_name = "a"},
				// CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "nx"},
				// CHECK-SAME: %[[VAL_2:.*]]: !fir.ref<i32> {fir.bindc_name = "ny"},
				// CHECK-SAME: %[[VAL_3:.*]]: !fir.ref<i32> {fir.bindc_name = "nz"}) {
				// CHECK: %[[VAL_4:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_5:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_6:.*]] = arith.constant 0.000000e+00 : f64
				// CHECK: %[[VAL_7:.*]] = fir.declare %[[VAL_0]] {uniq_name = "_QFsum3dEa"} : (!fir.box<!fir.array<?x?x?xf64>>) -> !fir.box<!fir.array<?x?x?xf64>>
				// CHECK: %[[VAL_8:.*]] = fir.rebox %[[VAL_7]] : (!fir.box<!fir.array<?x?x?xf64>>) -> !fir.box<!fir.array<?x?x?xf64>>
				// CHECK: %[[VAL_9:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFsum3dEi"}
				// CHECK: %[[VAL_10:.*]] = fir.declare %[[VAL_9]] {uniq_name = "_QFsum3dEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_11:.*]] = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFsum3dEj"}
				// CHECK: %[[VAL_12:.*]] = fir.declare %[[VAL_11]] {uniq_name = "_QFsum3dEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_13:.*]] = fir.alloca i32 {bindc_name = "k", uniq_name = "_QFsum3dEk"}
				// CHECK: %[[VAL_14:.*]] = fir.declare %[[VAL_13]] {uniq_name = "_QFsum3dEk"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_15:.*]] = fir.declare %[[VAL_1]] {uniq_name = "_QFsum3dEnx"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_16:.*]] = fir.declare %[[VAL_2]] {uniq_name = "_QFsum3dEny"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_17:.*]] = fir.declare %[[VAL_3]] {uniq_name = "_QFsum3dEnz"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_18:.*]] = fir.alloca f64 {bindc_name = "sum", uniq_name = "_QFsum3dEsum"}
				// CHECK: %[[VAL_19:.*]] = fir.declare %[[VAL_18]] {uniq_name = "_QFsum3dEsum"} : (!fir.ref<f64>) -> !fir.ref<f64>
				// CHECK: fir.store %[[VAL_6]] to %[[VAL_19]] : !fir.ref<f64>
				// CHECK: %[[VAL_20:.*]] = fir.load %[[VAL_17]] : !fir.ref<i32>
				// CHECK: %[[VAL_21:.*]] = fir.convert %[[VAL_20]] : (i32) -> index
				// CHECK: %[[VAL_22:.*]] = fir.convert %[[VAL_5]] : (index) -> i32
				// CHECK: %[[VAL_23:.*]]:2 = fir.do_loop %[[VAL_24:.*]] = %[[VAL_5]] to %[[VAL_21]] step %[[VAL_5]] iter_args(%[[VAL_25:.*]] = %[[VAL_22]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_25]] to %[[VAL_14]] : !fir.ref<i32>
				// CHECK: %[[VAL_26:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
				// CHECK: %[[VAL_27:.*]] = fir.convert %[[VAL_26]] : (i32) -> index
				// CHECK: %[[VAL_28:.*]]:2 = fir.do_loop %[[VAL_29:.*]] = %[[VAL_5]] to %[[VAL_27]] step %[[VAL_5]] iter_args(%[[VAL_30:.*]] = %[[VAL_22]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_30]] to %[[VAL_12]] : !fir.ref<i32>
				// CHECK: %[[VAL_31:.*]] = fir.load %[[VAL_15]] : !fir.ref<i32>
				// CHECK: %[[VAL_32:.*]] = fir.convert %[[VAL_31]] : (i32) -> index
				// CHECK: %[[VAL_33:.*]] = fir.convert %[[VAL_4]] : (index) -> i32
				// CHECK: %[[VAL_34:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_35:.*]]:3 = fir.box_dims %[[VAL_8]], %[[VAL_34]] : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
				// CHECK: %[[VAL_36:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_37:.*]]:3 = fir.box_dims %[[VAL_8]], %[[VAL_36]] : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
				// CHECK: %[[VAL_38:.*]] = arith.constant 2 : index
				// CHECK: %[[VAL_39:.*]]:3 = fir.box_dims %[[VAL_8]], %[[VAL_38]] : (!fir.box<!fir.array<?x?x?xf64>>, index) -> (index, index, index)
				// CHECK: %[[VAL_40:.*]] = arith.constant 8 : index
				// CHECK: %[[VAL_41:.*]] = arith.cmpi eq, %[[VAL_35]]#2, %[[VAL_40]] : index
				// CHECK: %[[VAL_42:.*]]:2 = fir.if %[[VAL_41]] -> (index, i32) {
				// CHECK: %[[VAL_43:.*]] = fir.convert %[[VAL_8]] : (!fir.box<!fir.array<?x?x?xf64>>) -> !fir.box<!fir.array<?xf64>>
				// CHECK: %[[VAL_44:.*]] = fir.box_addr %[[VAL_43]] : (!fir.box<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>>
				// CHECK: %[[VAL_45:.*]]:2 = fir.do_loop %[[VAL_46:.*]] = %[[VAL_4]] to %[[VAL_32]] step %[[VAL_5]] iter_args(%[[VAL_47:.*]] = %[[VAL_33]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_47]] to %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_48:.*]] = fir.load %[[VAL_19]] : !fir.ref<f64>
				// CHECK: %[[VAL_49:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_50:.*]] = fir.convert %[[VAL_49]] : (i32) -> i64
				// CHECK: %[[VAL_51:.*]] = fir.load %[[VAL_12]] : !fir.ref<i32>
				// CHECK: %[[VAL_52:.*]] = fir.convert %[[VAL_51]] : (i32) -> i64
				// CHECK: %[[VAL_53:.*]] = fir.load %[[VAL_14]] : !fir.ref<i32>
				// CHECK: %[[VAL_54:.*]] = fir.convert %[[VAL_53]] : (i32) -> i64
				// CHECK: %[[VAL_55:.*]] = arith.constant 1 : i64
				// CHECK: %[[VAL_56:.*]] = arith.subi %[[VAL_54]], %[[VAL_55]] : i64
				// CHECK: %[[VAL_57:.*]] = fir.convert %[[VAL_56]] : (i64) -> index
				// CHECK: %[[VAL_58:.*]] = arith.muli %[[VAL_39]]#2, %[[VAL_57]] : index
				// CHECK: %[[VAL_59:.*]] = arith.constant 1 : i64
				// CHECK: %[[VAL_60:.*]] = arith.subi %[[VAL_52]], %[[VAL_59]] : i64
				// CHECK: %[[VAL_61:.*]] = fir.convert %[[VAL_60]] : (i64) -> index
				// CHECK: %[[VAL_62:.*]] = arith.muli %[[VAL_37]]#2, %[[VAL_61]] : index
				// CHECK: %[[VAL_63:.*]] = arith.addi %[[VAL_62]], %[[VAL_58]] : index
				// CHECK: %[[VAL_64:.*]] = arith.constant 1 : i64
				// CHECK: %[[VAL_65:.*]] = arith.subi %[[VAL_50]], %[[VAL_64]] : i64
				// CHECK: %[[VAL_66:.*]] = fir.convert %[[VAL_65]] : (i64) -> index
				// CHECK: %[[VAL_67:.*]] = arith.constant 3 : index
				// CHECK: %[[VAL_68:.*]] = arith.shrsi %[[VAL_63]], %[[VAL_67]] : index
				// CHECK: %[[VAL_69:.*]] = arith.addi %[[VAL_68]], %[[VAL_66]] : index
				// CHECK: %[[VAL_70:.*]] = fir.coordinate_of %[[VAL_44]], %[[VAL_69]] : (!fir.ref<!fir.array<?xf64>>, index) -> !fir.ref<f64>
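				// Note: in the 3-D case the byte offsets of the two outer dimensions are
				// summed before the shift, so the element offset is
				//   offset = (((idx2 - 1) * byte_stride(dim 1)
				//              + (idx3 - 1) * byte_stride(dim 2)) >> 3) + (idx1 - 1)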
				// CHECK: %[[VAL_71:.*]] = fir.load %[[VAL_70]] : !fir.ref<f64>
				// CHECK: %[[VAL_72:.*]] = arith.addf %[[VAL_48]], %[[VAL_71]] fastmath<contract> : f64
				// CHECK: fir.store %[[VAL_72]] to %[[VAL_19]] : !fir.ref<f64>
				// CHECK: %[[VAL_73:.*]] = arith.addi %[[VAL_46]], %[[VAL_5]] : index
				// CHECK: %[[VAL_74:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_75:.*]] = arith.addi %[[VAL_74]], %[[VAL_22]] : i32
				// CHECK: fir.result %[[VAL_73]], %[[VAL_75]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_76:.*]]#0, %[[VAL_76]]#1 : index, i32
				// CHECK: } else {
				// CHECK: %[[VAL_77:.*]]:2 = fir.do_loop %[[VAL_78:.*]] = %[[VAL_4]] to %[[VAL_32]] step %[[VAL_5]] iter_args(%[[VAL_79:.*]] = %[[VAL_33]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_79]] to %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_80:.*]] = fir.load %[[VAL_19]] : !fir.ref<f64>
				// CHECK: %[[VAL_81:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_82:.*]] = fir.convert %[[VAL_81]] : (i32) -> i64
				// CHECK: %[[VAL_83:.*]] = fir.load %[[VAL_12]] : !fir.ref<i32>
				// CHECK: %[[VAL_84:.*]] = fir.convert %[[VAL_83]] : (i32) -> i64
				// CHECK: %[[VAL_85:.*]] = fir.load %[[VAL_14]] : !fir.ref<i32>
				// CHECK: %[[VAL_86:.*]] = fir.convert %[[VAL_85]] : (i32) -> i64
				// CHECK: %[[VAL_87:.*]] = fir.array_coor %[[VAL_8]] %[[VAL_82]], %[[VAL_84]], %[[VAL_86]] : (!fir.box<!fir.array<?x?x?xf64>>, i64, i64, i64) -> !fir.ref<f64>
				// CHECK: %[[VAL_88:.*]] = fir.load %[[VAL_87]] : !fir.ref<f64>
				// CHECK: %[[VAL_89:.*]] = arith.addf %[[VAL_80]], %[[VAL_88]] fastmath<contract> : f64
				// CHECK: fir.store %[[VAL_89]] to %[[VAL_19]] : !fir.ref<f64>
				// CHECK: %[[VAL_90:.*]] = arith.addi %[[VAL_78]], %[[VAL_5]] : index
				// CHECK: %[[VAL_91:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_92:.*]] = arith.addi %[[VAL_91]], %[[VAL_22]] : i32
				// CHECK: fir.result %[[VAL_90]], %[[VAL_92]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_93:.*]]#0, %[[VAL_93]]#1 : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_94:.*]]#1 to %[[VAL_10]] : !fir.ref<i32>
				// CHECK: %[[VAL_95:.*]] = arith.addi %[[VAL_29]], %[[VAL_5]] : index
				// CHECK: %[[VAL_96:.*]] = fir.load %[[VAL_12]] : !fir.ref<i32>
				// CHECK: %[[VAL_97:.*]] = arith.addi %[[VAL_96]], %[[VAL_22]] : i32
				// CHECK: fir.result %[[VAL_95]], %[[VAL_97]] : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_98:.*]]#1 to %[[VAL_12]] : !fir.ref<i32>
				// CHECK: %[[VAL_99:.*]] = arith.addi %[[VAL_24]], %[[VAL_5]] : index
				// CHECK: %[[VAL_100:.*]] = fir.load %[[VAL_14]] : !fir.ref<i32>
				// CHECK: %[[VAL_101:.*]] = arith.addi %[[VAL_100]], %[[VAL_22]] : i32
				// CHECK: fir.result %[[VAL_99]], %[[VAL_101]] : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_102:.*]]#1 to %[[VAL_14]] : !fir.ref<i32>
				// CHECK: return
				// CHECK: }

				// test non-default lower bounds are handled correctly
				func.func @_QPlbounds_repro(%arg0: !fir.box<!fir.array<?x?x?xf32>> {fir.bindc_name = "u"}, %arg1: !fir.ref<i32> {fir.bindc_name = "ims"}, %arg2: !fir.ref<i32> {fir.bindc_name = "jms"}, %arg3: !fir.ref<i32> {fir.bindc_name = "kms"}, %arg4: !fir.ref<i32> {fir.bindc_name = "ips"}, %arg5: !fir.ref<i32> {fir.bindc_name = "ipe"}, %arg6: !fir.ref<i32> {fir.bindc_name = "jps"}, %arg7: !fir.ref<i32> {fir.bindc_name = "jpe"}, %arg8: !fir.ref<i32> {fir.bindc_name = "kps"}, %arg9: !fir.ref<i32> {fir.bindc_name = "kpe"}) {
				%c1_i32 = arith.constant 1 : i32
				%c1 = arith.constant 1 : index
				%0 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlbounds_reproEi"}
				%1 = fir.declare %0 {uniq_name = "_QFlbounds_reproEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%2 = fir.declare %arg1 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEims"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%3 = fir.declare %arg5 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEipe"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%4 = fir.declare %arg4 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEips"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%5 = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFlbounds_reproEj"}
				%6 = fir.declare %5 {uniq_name = "_QFlbounds_reproEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%7 = fir.declare %arg2 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEjms"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%8 = fir.declare %arg7 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEjpe"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%9 = fir.declare %arg6 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEjps"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%10 = fir.alloca i32 {bindc_name = "k", uniq_name = "_QFlbounds_reproEk"}
				%11 = fir.declare %10 {uniq_name = "_QFlbounds_reproEk"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%12 = fir.declare %arg3 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEkms"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%13 = fir.declare %arg9 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEkpe"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%14 = fir.declare %arg8 {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEkps"} : (!fir.ref<i32>) -> !fir.ref<i32>
				%15 = fir.alloca f32 {bindc_name = "vmax", uniq_name = "_QFlbounds_reproEvmax"}
				%16 = fir.declare %15 {uniq_name = "_QFlbounds_reproEvmax"} : (!fir.ref<f32>) -> !fir.ref<f32>
				%17 = fir.load %12 : !fir.ref<i32>
				%18 = fir.convert %17 : (i32) -> index
				%19 = fir.load %2 : !fir.ref<i32>
				%20 = fir.convert %19 : (i32) -> index
				%21 = fir.load %7 : !fir.ref<i32>
				%22 = fir.convert %21 : (i32) -> index
				%23 = fir.shift %18, %20, %22 : (index, index, index) -> !fir.shift<3>
				%24 = fir.declare %arg0(%23) {fortran_attrs = #fir.var_attrs<intent_in>, uniq_name = "_QFlbounds_reproEu"} : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>) -> !fir.box<!fir.array<?x?x?xf32>>
				%25 = fir.rebox %24(%23) : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>) -> !fir.box<!fir.array<?x?x?xf32>>
				%26 = fir.array_coor %25(%23) %c1, %c1, %c1 : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>, index, index, index) -> !fir.ref<f32>
				%27 = fir.load %26 : !fir.ref<f32>
				fir.store %27 to %16 : !fir.ref<f32>
				%28 = fir.load %9 : !fir.ref<i32>
				%29 = fir.convert %28 : (i32) -> index
				%30 = fir.load %8 : !fir.ref<i32>
				%31 = arith.subi %30, %c1_i32 : i32
				%32 = fir.convert %31 : (i32) -> index
				%33 = fir.convert %29 : (index) -> i32
				%34:2 = fir.do_loop %arg10 = %29 to %32 step %c1 iter_args(%arg11 = %33) -> (index, i32) {
				fir.store %arg11 to %6 : !fir.ref<i32>
				%35 = fir.load %4 : !fir.ref<i32>
				%36 = fir.convert %35 : (i32) -> index
				%37 = fir.load %3 : !fir.ref<i32>
				%38 = fir.convert %37 : (i32) -> index
				%39 = fir.convert %36 : (index) -> i32
				%40:2 = fir.do_loop %arg12 = %36 to %38 step %c1 iter_args(%arg13 = %39) -> (index, i32) {
				fir.store %arg13 to %1 : !fir.ref<i32>
				%45 = fir.load %14 : !fir.ref<i32>
				%46 = fir.convert %45 : (i32) -> index
				%47 = fir.load %13 : !fir.ref<i32>
				%48 = arith.subi %47, %c1_i32 : i32
				%49 = fir.convert %48 : (i32) -> index
				%50 = fir.convert %46 : (index) -> i32
				%51:2 = fir.do_loop %arg14 = %46 to %49 step %c1 iter_args(%arg15 = %50) -> (index, i32) {
				fir.store %arg15 to %11 : !fir.ref<i32>
				%56 = fir.load %11 : !fir.ref<i32>
				%57 = fir.convert %56 : (i32) -> i64
				%58 = fir.load %1 : !fir.ref<i32>
				%59 = fir.convert %58 : (i32) -> i64
				%60 = fir.load %6 : !fir.ref<i32>
				%61 = fir.convert %60 : (i32) -> i64
				%62 = fir.array_coor %25(%23) %57, %59, %61 : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>, i64, i64, i64) -> !fir.ref<f32>
				%63 = fir.load %62 : !fir.ref<f32>
				%64 = fir.load %16 : !fir.ref<f32>
				%65 = arith.cmpf ogt, %63, %64 : f32
				fir.if %65 {
				%70 = fir.load %11 : !fir.ref<i32>
				%71 = fir.convert %70 : (i32) -> i64
				%72 = fir.load %1 : !fir.ref<i32>
				%73 = fir.convert %72 : (i32) -> i64
				%74 = fir.load %6 : !fir.ref<i32>
				%75 = fir.convert %74 : (i32) -> i64
				%76 = fir.array_coor %25(%23) %71, %73, %75 : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>, i64, i64, i64) -> !fir.ref<f32>
				%77 = fir.load %76 : !fir.ref<f32>
				fir.store %77 to %16 : !fir.ref<f32>
				} else {
				}
				%66 = arith.addi %arg14, %c1 : index
				%67 = fir.convert %c1 : (index) -> i32
				%68 = fir.load %11 : !fir.ref<i32>
				%69 = arith.addi %68, %67 : i32
				fir.result %66, %69 : index, i32
				}
				fir.store %51#1 to %11 : !fir.ref<i32>
				%52 = arith.addi %arg12, %c1 : index
				%53 = fir.convert %c1 : (index) -> i32
				%54 = fir.load %1 : !fir.ref<i32>
				%55 = arith.addi %54, %53 : i32
				fir.result %52, %55 : index, i32
				}
				fir.store %40#1 to %1 : !fir.ref<i32>
				%41 = arith.addi %arg10, %c1 : index
				%42 = fir.convert %c1 : (index) -> i32
				%43 = fir.load %6 : !fir.ref<i32>
				%44 = arith.addi %43, %42 : i32
				fir.result %41, %44 : index, i32
				}
				fir.store %34#1 to %6 : !fir.ref<i32>
				return
				}
				// CHECK-LABEL: func.func @_QPlbounds_repro(
				// CHECK-SAME: %[[VAL_0:.*]]: !fir.box<!fir.array<?x?x?xf32>> {fir.bindc_name = "u"},
				// CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<i32> {fir.bindc_name = "ims"},
				// CHECK-SAME: %[[VAL_2:.*]]: !fir.ref<i32> {fir.bindc_name = "jms"},
				// CHECK-SAME: %[[VAL_3:.*]]: !fir.ref<i32> {fir.bindc_name = "kms"},
				// CHECK-SAME: %[[VAL_4:.*]]: !fir.ref<i32> {fir.bindc_name = "ips"},
				// CHECK-SAME: %[[VAL_5:.*]]: !fir.ref<i32> {fir.bindc_name = "ipe"},
				// CHECK-SAME: %[[VAL_6:.*]]: !fir.ref<i32> {fir.bindc_name = "jps"},
				// CHECK-SAME: %[[VAL_7:.*]]: !fir.ref<i32> {fir.bindc_name = "jpe"},
				// CHECK-SAME: %[[VAL_8:.*]]: !fir.ref<i32> {fir.bindc_name = "kps"},
				// CHECK-SAME: %[[VAL_9:.*]]: !fir.ref<i32> {fir.bindc_name = "kpe"}) {
				// CHECK: %[[VAL_10:.*]] = arith.constant 1 : i32
				// CHECK: %[[VAL_11:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_12:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlbounds_reproEi"}
				// CHECK: %[[VAL_13:.*]] = fir.declare %[[VAL_12]] {uniq_name = "_QFlbounds_reproEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_14:.*]] = fir.declare %[[VAL_1]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEims"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_15:.*]] = fir.declare %[[VAL_5]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEipe"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_16:.*]] = fir.declare %[[VAL_4]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEips"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_17:.*]] = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFlbounds_reproEj"}
				// CHECK: %[[VAL_18:.*]] = fir.declare %[[VAL_17]] {uniq_name = "_QFlbounds_reproEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_19:.*]] = fir.declare %[[VAL_2]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEjms"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_20:.*]] = fir.declare %[[VAL_7]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEjpe"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_21:.*]] = fir.declare %[[VAL_6]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEjps"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_22:.*]] = fir.alloca i32 {bindc_name = "k", uniq_name = "_QFlbounds_reproEk"}
				// CHECK: %[[VAL_23:.*]] = fir.declare %[[VAL_22]] {uniq_name = "_QFlbounds_reproEk"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_24:.*]] = fir.declare %[[VAL_3]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEkms"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_25:.*]] = fir.declare %[[VAL_9]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEkpe"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_26:.*]] = fir.declare %[[VAL_8]] {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEkps"} : (!fir.ref<i32>) -> !fir.ref<i32>
				// CHECK: %[[VAL_27:.*]] = fir.alloca f32 {bindc_name = "vmax", uniq_name = "_QFlbounds_reproEvmax"}
				// CHECK: %[[VAL_28:.*]] = fir.declare %[[VAL_27]] {uniq_name = "_QFlbounds_reproEvmax"} : (!fir.ref<f32>) -> !fir.ref<f32>
				// CHECK: %[[VAL_29:.*]] = fir.load %[[VAL_24]] : !fir.ref<i32>
				// CHECK: %[[VAL_30:.*]] = fir.convert %[[VAL_29]] : (i32) -> index
				// CHECK: %[[VAL_31:.*]] = fir.load %[[VAL_14]] : !fir.ref<i32>
				// CHECK: %[[VAL_32:.*]] = fir.convert %[[VAL_31]] : (i32) -> index
				// CHECK: %[[VAL_33:.*]] = fir.load %[[VAL_19]] : !fir.ref<i32>
				// CHECK: %[[VAL_34:.*]] = fir.convert %[[VAL_33]] : (i32) -> index
				// CHECK: %[[VAL_35:.*]] = fir.shift %[[VAL_30]], %[[VAL_32]], %[[VAL_34]] : (index, index, index) -> !fir.shift<3>
				// CHECK: %[[VAL_36:.*]] = fir.declare %[[VAL_0]](%[[VAL_35]]) {fortran_attrs = {{.*}}, uniq_name = "_QFlbounds_reproEu"} : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>) -> !fir.box<!fir.array<?x?x?xf32>>
				// CHECK: %[[VAL_37:.*]] = fir.rebox %[[VAL_36]](%[[VAL_35]]) : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>) -> !fir.box<!fir.array<?x?x?xf32>>
				// CHECK: %[[VAL_38:.*]] = fir.array_coor %[[VAL_37]](%[[VAL_35]]) %[[VAL_11]], %[[VAL_11]], %[[VAL_11]] : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>, index, index, index) -> !fir.ref<f32>
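				// Note: this fir.array_coor is not inside a loop, so it is left unchanged.
				// In this test only the innermost loop (over k) is versioned; the loops over
				// j and i keep their original form.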
				// CHECK: %[[VAL_39:.*]] = fir.load %[[VAL_38]] : !fir.ref<f32>
				// CHECK: fir.store %[[VAL_39]] to %[[VAL_28]] : !fir.ref<f32>
				// CHECK: %[[VAL_40:.*]] = fir.load %[[VAL_21]] : !fir.ref<i32>
				// CHECK: %[[VAL_41:.*]] = fir.convert %[[VAL_40]] : (i32) -> index
				// CHECK: %[[VAL_42:.*]] = fir.load %[[VAL_20]] : !fir.ref<i32>
				// CHECK: %[[VAL_43:.*]] = arith.subi %[[VAL_42]], %[[VAL_10]] : i32
				// CHECK: %[[VAL_44:.*]] = fir.convert %[[VAL_43]] : (i32) -> index
				// CHECK: %[[VAL_45:.*]] = fir.convert %[[VAL_41]] : (index) -> i32
				// CHECK: %[[VAL_46:.*]]:2 = fir.do_loop %[[VAL_47:.*]] = %[[VAL_41]] to %[[VAL_44]] step %[[VAL_11]] iter_args(%[[VAL_48:.*]] = %[[VAL_45]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_48]] to %[[VAL_18]] : !fir.ref<i32>
				// CHECK: %[[VAL_49:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
				// CHECK: %[[VAL_50:.*]] = fir.convert %[[VAL_49]] : (i32) -> index
				// CHECK: %[[VAL_51:.*]] = fir.load %[[VAL_15]] : !fir.ref<i32>
				// CHECK: %[[VAL_52:.*]] = fir.convert %[[VAL_51]] : (i32) -> index
				// CHECK: %[[VAL_53:.*]] = fir.convert %[[VAL_50]] : (index) -> i32
				// CHECK: %[[VAL_54:.*]]:2 = fir.do_loop %[[VAL_55:.*]] = %[[VAL_50]] to %[[VAL_52]] step %[[VAL_11]] iter_args(%[[VAL_56:.*]] = %[[VAL_53]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_56]] to %[[VAL_13]] : !fir.ref<i32>
				// CHECK: %[[VAL_57:.*]] = fir.load %[[VAL_26]] : !fir.ref<i32>
				// CHECK: %[[VAL_58:.*]] = fir.convert %[[VAL_57]] : (i32) -> index
				// CHECK: %[[VAL_59:.*]] = fir.load %[[VAL_25]] : !fir.ref<i32>
				// CHECK: %[[VAL_60:.*]] = arith.subi %[[VAL_59]], %[[VAL_10]] : i32
				// CHECK: %[[VAL_61:.*]] = fir.convert %[[VAL_60]] : (i32) -> index
				// CHECK: %[[VAL_62:.*]] = fir.convert %[[VAL_58]] : (index) -> i32
				// CHECK: %[[VAL_63:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_64:.*]]:3 = fir.box_dims %[[VAL_37]], %[[VAL_63]] : (!fir.box<!fir.array<?x?x?xf32>>, index) -> (index, index, index)
				// CHECK: %[[VAL_65:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_66:.*]]:3 = fir.box_dims %[[VAL_37]], %[[VAL_65]] : (!fir.box<!fir.array<?x?x?xf32>>, index) -> (index, index, index)
				// CHECK: %[[VAL_67:.*]] = arith.constant 2 : index
				// CHECK: %[[VAL_68:.*]]:3 = fir.box_dims %[[VAL_37]], %[[VAL_67]] : (!fir.box<!fir.array<?x?x?xf32>>, index) -> (index, index, index)
				// CHECK: %[[VAL_69:.*]] = arith.constant 4 : index
				// CHECK: %[[VAL_70:.*]] = arith.cmpi eq, %[[VAL_64]]#2, %[[VAL_69]] : index
				// CHECK: %[[VAL_71:.*]]:2 = fir.if %[[VAL_70]] -> (index, i32) {
				// CHECK: %[[VAL_72:.*]] = fir.convert %[[VAL_37]] : (!fir.box<!fir.array<?x?x?xf32>>) -> !fir.box<!fir.array<?xf32>>
				// CHECK: %[[VAL_73:.*]] = fir.box_addr %[[VAL_72]] : (!fir.box<!fir.array<?xf32>>) -> !fir.ref<!fir.array<?xf32>>
				// CHECK: %[[VAL_74:.*]]:2 = fir.do_loop %[[VAL_75:.*]] = %[[VAL_58]] to %[[VAL_61]] step %[[VAL_11]] iter_args(%[[VAL_76:.*]] = %[[VAL_62]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_76]] to %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_77:.*]] = fir.load %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_78:.*]] = fir.convert %[[VAL_77]] : (i32) -> i64
				// CHECK: %[[VAL_79:.*]] = fir.load %[[VAL_13]] : !fir.ref<i32>
				// CHECK: %[[VAL_80:.*]] = fir.convert %[[VAL_79]] : (i32) -> i64
				// CHECK: %[[VAL_81:.*]] = fir.load %[[VAL_18]] : !fir.ref<i32>
				// CHECK: %[[VAL_82:.*]] = fir.convert %[[VAL_81]] : (i32) -> i64
				// CHECK: %[[VAL_83:.*]] = fir.convert %[[VAL_34]] : (index) -> i64
				// CHECK: %[[VAL_84:.*]] = arith.subi %[[VAL_82]], %[[VAL_83]] : i64
				// CHECK: %[[VAL_85:.*]] = fir.convert %[[VAL_84]] : (i64) -> index
				// CHECK: %[[VAL_86:.*]] = arith.muli %[[VAL_68]]#2, %[[VAL_85]] : index
				// CHECK: %[[VAL_87:.*]] = fir.convert %[[VAL_32]] : (index) -> i64
				// CHECK: %[[VAL_88:.*]] = arith.subi %[[VAL_80]], %[[VAL_87]] : i64
				// CHECK: %[[VAL_89:.*]] = fir.convert %[[VAL_88]] : (i64) -> index
				// CHECK: %[[VAL_90:.*]] = arith.muli %[[VAL_66]]#2, %[[VAL_89]] : index
				// CHECK: %[[VAL_91:.*]] = arith.addi %[[VAL_90]], %[[VAL_86]] : index
				// CHECK: %[[VAL_92:.*]] = fir.convert %[[VAL_30]] : (index) -> i64
				// CHECK: %[[VAL_93:.*]] = arith.subi %[[VAL_78]], %[[VAL_92]] : i64
				// CHECK: %[[VAL_94:.*]] = fir.convert %[[VAL_93]] : (i64) -> index
				// CHECK: %[[VAL_95:.*]] = arith.constant 2 : index
				// CHECK: %[[VAL_96:.*]] = arith.shrsi %[[VAL_91]], %[[VAL_95]] : index
				// CHECK: %[[VAL_97:.*]] = arith.addi %[[VAL_96]], %[[VAL_94]] : index
				// CHECK: %[[VAL_98:.*]] = fir.coordinate_of %[[VAL_73]], %[[VAL_97]] : (!fir.ref<!fir.array<?xf32>>, index) -> !fir.ref<f32>
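				// Note: these fir.array_coor operations carry a fir.shift operand, so the
				// lower bounds kms, ims and jms (the fir.shift operands) are subtracted from
				// the indices (k, i, j) instead of the default 1. The element type is f32,
				// so the contiguity test compares the dimension 0 byte stride against 4 and
				// the combined byte offset is shifted right by 2:
				//   offset = (((i - ims) * byte_stride(dim 1)
				//              + (j - jms) * byte_stride(dim 2)) >> 2) + (k - kms)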
				// CHECK: %[[VAL_99:.*]] = fir.load %[[VAL_98]] : !fir.ref<f32>
				// CHECK: %[[VAL_100:.*]] = fir.load %[[VAL_28]] : !fir.ref<f32>
				// CHECK: %[[VAL_101:.*]] = arith.cmpf ogt, %[[VAL_99]], %[[VAL_100]] : f32
				// CHECK: fir.if %[[VAL_101]] {
				// CHECK: %[[VAL_102:.*]] = fir.load %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_103:.*]] = fir.convert %[[VAL_102]] : (i32) -> i64
				// CHECK: %[[VAL_104:.*]] = fir.load %[[VAL_13]] : !fir.ref<i32>
				// CHECK: %[[VAL_105:.*]] = fir.convert %[[VAL_104]] : (i32) -> i64
				// CHECK: %[[VAL_106:.*]] = fir.load %[[VAL_18]] : !fir.ref<i32>
				// CHECK: %[[VAL_107:.*]] = fir.convert %[[VAL_106]] : (i32) -> i64
				// CHECK: %[[VAL_108:.*]] = fir.convert %[[VAL_34]] : (index) -> i64
				// CHECK: %[[VAL_109:.*]] = arith.subi %[[VAL_107]], %[[VAL_108]] : i64
				// CHECK: %[[VAL_110:.*]] = fir.convert %[[VAL_109]] : (i64) -> index
				// CHECK: %[[VAL_111:.*]] = arith.muli %[[VAL_68]]#2, %[[VAL_110]] : index
				// CHECK: %[[VAL_112:.*]] = fir.convert %[[VAL_32]] : (index) -> i64
				// CHECK: %[[VAL_113:.*]] = arith.subi %[[VAL_105]], %[[VAL_112]] : i64
				// CHECK: %[[VAL_114:.*]] = fir.convert %[[VAL_113]] : (i64) -> index
				// CHECK: %[[VAL_115:.*]] = arith.muli %[[VAL_66]]#2, %[[VAL_114]] : index
				// CHECK: %[[VAL_116:.*]] = arith.addi %[[VAL_115]], %[[VAL_111]] : index
				// CHECK: %[[VAL_117:.*]] = fir.convert %[[VAL_30]] : (index) -> i64
				// CHECK: %[[VAL_118:.*]] = arith.subi %[[VAL_103]], %[[VAL_117]] : i64
				// CHECK: %[[VAL_119:.*]] = fir.convert %[[VAL_118]] : (i64) -> index
				// CHECK: %[[VAL_120:.*]] = arith.constant 2 : index
				// CHECK: %[[VAL_121:.*]] = arith.shrsi %[[VAL_116]], %[[VAL_120]] : index
				// CHECK: %[[VAL_122:.*]] = arith.addi %[[VAL_121]], %[[VAL_119]] : index
				// CHECK: %[[VAL_123:.*]] = fir.coordinate_of %[[VAL_73]], %[[VAL_122]] : (!fir.ref<!fir.array<?xf32>>, index) -> !fir.ref<f32>
				// CHECK: %[[VAL_124:.*]] = fir.load %[[VAL_123]] : !fir.ref<f32>
				// CHECK: fir.store %[[VAL_124]] to %[[VAL_28]] : !fir.ref<f32>
				// CHECK: } else {
				// CHECK: }
				// CHECK: %[[VAL_125:.*]] = arith.addi %[[VAL_75]], %[[VAL_11]] : index
				// CHECK: %[[VAL_126:.*]] = fir.convert %[[VAL_11]] : (index) -> i32
				// CHECK: %[[VAL_127:.*]] = fir.load %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_128:.*]] = arith.addi %[[VAL_127]], %[[VAL_126]] : i32
				// CHECK: fir.result %[[VAL_125]], %[[VAL_128]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_129:.*]]#0, %[[VAL_129]]#1 : index, i32
				// CHECK: } else {
				// CHECK: %[[VAL_130:.*]]:2 = fir.do_loop %[[VAL_131:.*]] = %[[VAL_58]] to %[[VAL_61]] step %[[VAL_11]] iter_args(%[[VAL_132:.*]] = %[[VAL_62]]) -> (index, i32) {
				// CHECK: fir.store %[[VAL_132]] to %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_133:.*]] = fir.load %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_134:.*]] = fir.convert %[[VAL_133]] : (i32) -> i64
				// CHECK: %[[VAL_135:.*]] = fir.load %[[VAL_13]] : !fir.ref<i32>
				// CHECK: %[[VAL_136:.*]] = fir.convert %[[VAL_135]] : (i32) -> i64
				// CHECK: %[[VAL_137:.*]] = fir.load %[[VAL_18]] : !fir.ref<i32>
				// CHECK: %[[VAL_138:.*]] = fir.convert %[[VAL_137]] : (i32) -> i64
				// CHECK: %[[VAL_139:.*]] = fir.array_coor %[[VAL_37]](%[[VAL_35]]) %[[VAL_134]], %[[VAL_136]], %[[VAL_138]] : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>, i64, i64, i64) -> !fir.ref<f32>
				// CHECK: %[[VAL_140:.*]] = fir.load %[[VAL_139]] : !fir.ref<f32>
				// CHECK: %[[VAL_141:.*]] = fir.load %[[VAL_28]] : !fir.ref<f32>
				// CHECK: %[[VAL_142:.*]] = arith.cmpf ogt, %[[VAL_140]], %[[VAL_141]] : f32
				// CHECK: fir.if %[[VAL_142]] {
				// CHECK: %[[VAL_143:.*]] = fir.load %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_144:.*]] = fir.convert %[[VAL_143]] : (i32) -> i64
				// CHECK: %[[VAL_145:.*]] = fir.load %[[VAL_13]] : !fir.ref<i32>
				// CHECK: %[[VAL_146:.*]] = fir.convert %[[VAL_145]] : (i32) -> i64
				// CHECK: %[[VAL_147:.*]] = fir.load %[[VAL_18]] : !fir.ref<i32>
				// CHECK: %[[VAL_148:.*]] = fir.convert %[[VAL_147]] : (i32) -> i64
				// CHECK: %[[VAL_149:.*]] = fir.array_coor %[[VAL_37]](%[[VAL_35]]) %[[VAL_144]], %[[VAL_146]], %[[VAL_148]] : (!fir.box<!fir.array<?x?x?xf32>>, !fir.shift<3>, i64, i64, i64) -> !fir.ref<f32>
				// CHECK: %[[VAL_150:.*]] = fir.load %[[VAL_149]] : !fir.ref<f32>
				// CHECK: fir.store %[[VAL_150]] to %[[VAL_28]] : !fir.ref<f32>
				// CHECK: } else {
				// CHECK: }
				// CHECK: %[[VAL_151:.*]] = arith.addi %[[VAL_131]], %[[VAL_11]] : index
				// CHECK: %[[VAL_152:.*]] = fir.convert %[[VAL_11]] : (index) -> i32
				// CHECK: %[[VAL_153:.*]] = fir.load %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_154:.*]] = arith.addi %[[VAL_153]], %[[VAL_152]] : i32
				// CHECK: fir.result %[[VAL_151]], %[[VAL_154]] : index, i32
				// CHECK: }
				// CHECK: fir.result %[[VAL_155:.*]]#0, %[[VAL_155]]#1 : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_156:.*]]#1 to %[[VAL_23]] : !fir.ref<i32>
				// CHECK: %[[VAL_157:.*]] = arith.addi %[[VAL_55]], %[[VAL_11]] : index
				// CHECK: %[[VAL_158:.*]] = fir.convert %[[VAL_11]] : (index) -> i32
				// CHECK: %[[VAL_159:.*]] = fir.load %[[VAL_13]] : !fir.ref<i32>
				// CHECK: %[[VAL_160:.*]] = arith.addi %[[VAL_159]], %[[VAL_158]] : i32
				// CHECK: fir.result %[[VAL_157]], %[[VAL_160]] : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_161:.*]]#1 to %[[VAL_13]] : !fir.ref<i32>
				// CHECK: %[[VAL_162:.*]] = arith.addi %[[VAL_47]], %[[VAL_11]] : index
				// CHECK: %[[VAL_163:.*]] = fir.convert %[[VAL_11]] : (index) -> i32
				// CHECK: %[[VAL_164:.*]] = fir.load %[[VAL_18]] : !fir.ref<i32>
				// CHECK: %[[VAL_165:.*]] = arith.addi %[[VAL_164]], %[[VAL_163]] : i32
				// CHECK: fir.result %[[VAL_162]], %[[VAL_165]] : index, i32
				// CHECK: }
				// CHECK: fir.store %[[VAL_166:.*]]#1 to %[[VAL_18]] : !fir.ref<i32>
				// CHECK: return
				// CHECK: }
	} // End module