This is an archive of the discontinued LLVM Phabricator instance.

[flang][LoopVersioning] support fir.array_coor
ClosedPublic

Authored by tblah on Aug 23 2023, 5:03 AM.

Details

Summary

This is the last piece required for the loop versioning patch to work on
code lowered via HLFIR. With this patch, HLFIR performance on spec2017
roms is now similar to the FIR lowering.

Adding support for fir.array_coor means that many more loops will be
versioned, even in the FIR lowering. So far as I have seen, these do not
seem to have an impact on performance for the benchmarks I tried, but I
expect it would speed up some programs, if the loop being versioned
happened to be the hot code.

The main difference between fir.array_coor and fir.coordinate_of is
that fir.coordinate_of uses zero-based indices, whereas fir.array_coor
uses the indices as specified in the Fortran program (starting from 1 by
default, but also supporting non default lower bounds). I opted to
transform fir.array_coor operations into fir.coordinate_of operations
because this allows both to share the same offset calculation logic.

The tricky bit of this patch is getting the correct lower bounds for the
array operand to subtract from the fir.array_coor indices to get a
zero-based indices. So far as I can tell, the FIR lowering will always
provide lower bounds (shift) information in the shape operand to the
fir.array_coor when non-default lower bounds are used. If none is given,
I originally tried falling back to reading lower bounds from the box,
but this led to misscompilation in SPEC2017 cam4. Therefore the pass
instead assumes that if it can't already find an SSA value for the shift
information, the default lower bound (1) should be used.

A suspect the incorrect lower bounds in the box for the FIR lowering was
already a known issue (see https://reviews.llvm.org/D158119).

Diff Detail

Event Timeline

tblah created this revision.Aug 23 2023, 5:03 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 23 2023, 5:03 AM
Herald added a subscriber: mehdi_amini. · View Herald Transcript
tblah requested review of this revision.Aug 23 2023, 5:03 AM

Sorry, I will need more time to review this thoroughly.

So far as I can tell, the FIR lowering will always provide lower bounds (shift) information in the shape operand to the fir.array_coor when non-default lower bounds are used. If none is given, I originally tried falling back to reading lower bounds from the box, but this led to misscompilation in SPEC2017 cam4. Therefore the pass instead assumes that if it can't already find an SSA value for the shift information, the default lower bound (1) should be used.

I believe the assumption is correct: fir.array_coor without a shift/shapeshift operand implies default lbound-1. I think this comes from the code in pre-cg lowering of fir.array_coor and further codegen. I think fir.array_coor description might be missing this.

As to the incorrect bounds in the box, I wonder how this is possible given the changes in D158119. Will it be easy for you to put your changes for "falling back to reading lower bounds from the box" under an option so that I can run more testing and see if the issue appears on smaller tests? I do not want to deal with cam4 miscompare...

vzakhari added inline comments.Aug 24 2023, 4:54 PM
flang/lib/Optimizer/Transforms/LoopVersioning.cpp
145

I think cases 2 and 3 should not be here. According to ArrayCoorOp pre-codegen and XArrayCoorOp codegen, if there is no shift, then the lower bounds are always 1. I would suggest following the same logic here for consistency.

246

Please add braces for the else-if clause.

Matt added a subscriber: Matt.Aug 28 2023, 1:33 PM
tblah added a comment.Aug 29 2023, 8:20 AM

Sorry, I will need more time to review this thoroughly.

So far as I can tell, the FIR lowering will always provide lower bounds (shift) information in the shape operand to the fir.array_coor when non-default lower bounds are used. If none is given, I originally tried falling back to reading lower bounds from the box, but this led to misscompilation in SPEC2017 cam4. Therefore the pass instead assumes that if it can't already find an SSA value for the shift information, the default lower bound (1) should be used.

I believe the assumption is correct: fir.array_coor without a shift/shapeshift operand implies default lbound-1. I think this comes from the code in pre-cg lowering of fir.array_coor and further codegen. I think fir.array_coor description might be missing this.

As to the incorrect bounds in the box, I wonder how this is possible given the changes in D158119. Will it be easy for you to put your changes for "falling back to reading lower bounds from the box" under an option so that I can run more testing and see if the issue appears on smaller tests? I do not want to deal with cam4 miscompare...

Yes after D158119, I saw no issue on cam4 with HLFIR lowering. The issue was with the FIR lowering.

Sorry, I will need more time to review this thoroughly.

So far as I can tell, the FIR lowering will always provide lower bounds (shift) information in the shape operand to the fir.array_coor when non-default lower bounds are used. If none is given, I originally tried falling back to reading lower bounds from the box, but this led to misscompilation in SPEC2017 cam4. Therefore the pass instead assumes that if it can't already find an SSA value for the shift information, the default lower bound (1) should be used.

I believe the assumption is correct: fir.array_coor without a shift/shapeshift operand implies default lbound-1. I think this comes from the code in pre-cg lowering of fir.array_coor and further codegen. I think fir.array_coor description might be missing this.

As to the incorrect bounds in the box, I wonder how this is possible given the changes in D158119. Will it be easy for you to put your changes for "falling back to reading lower bounds from the box" under an option so that I can run more testing and see if the issue appears on smaller tests? I do not want to deal with cam4 miscompare...

Yes after D158119, I saw no issue on cam4 with HLFIR lowering. The issue was with the FIR lowering.

I see. Then please ignore my request.

vzakhari added inline comments.Aug 30 2023, 8:32 PM
flang/lib/Optimizer/Transforms/LoopVersioning.cpp
145

Hi @tblah, I just want to make sure we are on the same page. I think 2 and 3 needs to be removed for correctness, though, I do not have an example where this might cause a problem. Otherwise, the patch looks good to me!

tblah marked 2 inline comments as done.Aug 31 2023, 3:05 AM
tblah added inline comments.
flang/lib/Optimizer/Transforms/LoopVersioning.cpp
145

Hi yeah I'm working on it. I've had a few days off recently and all the spec runs take ages.

I think this shouldn't be a correctness issue - if we have incorrect lower bounds information in the IR then that is a bug. But I'm happy to remove it so long as the HLFIR flow still works.

tblah updated this revision to Diff 554970.Aug 31 2023, 4:04 AM

Changes: removed attempts to find lower bounds by tracing the memref argument through IR

kiranchandramohan accepted this revision.Sep 1 2023, 6:37 AM

LGTM. Thanks Tom.

This revision is now accepted and ready to land.Sep 1 2023, 6:37 AM
vzakhari accepted this revision.Sep 1 2023, 9:04 AM

Thank you, Tom!

This revision was landed with ongoing or failed builds.Sep 4 2023, 3:43 AM
This revision was automatically updated to reflect the committed changes.