This is an archive of the discontinued LLVM Phabricator instance.

[flang][hlfir] `hlfir.char_extremum` op definition and codegen
ClosedPublic

Authored by cabreraam on Jun 8 2023, 3:15 PM.

Details

Summary

This patch adds an hlfir operation called char_extremum, which takes the
lexicographic comparison between a variadic number (minimum of 2 arguments) of
characters.

Discussion for this work can be found in the draft revision found
here. The reason I'm not promoting that draft to
a true patch for review was because I needed to separate out the op
definition/codegen and lowering as two separate patches, as preferred by
@jeanPerier.

Diff Detail

Event Timeline

cabreraam created this revision.Jun 8 2023, 3:15 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
cabreraam requested review of this revision.Jun 8 2023, 3:15 PM
cabreraam updated this revision to Diff 529820.Jun 8 2023, 9:38 PM

Buildbot was failing on a test. I forgot to change tmp to .tmp.char_extremum in
all cases.

tblah added a comment.Jun 10 2023, 9:22 AM

Thank you for your work on this.

I'm worried this could generate a significant amount of code if the intrinsic is used regularly. I guess you decided not to implement this as a runtime function (instead of always generating code inline) because of the variable number of arguments?

Thank you for your work on this.

I'm worried this could generate a significant amount of code if the intrinsic is used regularly. I guess you decided not to implement this as a runtime function (instead of always generating code inline) because of the variable number of arguments?

Thanks for the feedback @tblah

I actually do generate a call to a runtime function -- _FortranACharacterCompareScalar1 -- through the call to the fir::runtime::genCharCompare function. Is this what you meant?

As I'm still new-ish to this flavor of contribution, it's not immediately obvious to me what "a significant amount of code is", so if you and others feel this way, I am open to suggestions about different ways to proceed.

Though the way I implement variadic arguments introduces additional lines of MLIR for every argument, my sense is that there is "a lot" of code generated just because there is a fair bit of stuff happening:
+ variadic arguments
+ finding the lexicographically largest/smallest character
+ finding the largest character w.r.t. size
+ allocating a new buffer for the result
+ populating that buffer using a runtime call to memmove

I may be slow to respond until Monday, but I'm interested in your feedback!

tschuett added inline comments.
flang/include/flang/Optimizer/Builder/Character.h
48

You could use an ArrayRef instead of the SmallVector.

73

ArrayRef?

flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
1486

Newline?

tblah added a comment.EditedJun 11 2023, 12:45 PM

Thank you for your work on this.

I'm worried this could generate a significant amount of code if the intrinsic is used regularly. I guess you decided not to implement this as a runtime function (instead of always generating code inline) because of the variable number of arguments?

Thanks for the feedback @tblah

I actually do generate a call to a runtime function -- _FortranACharacterCompareScalar1 -- through the call to the fir::runtime::genCharCompare function. Is this what you meant?

As I'm still new-ish to this flavor of contribution, it's not immediately obvious to me what "a significant amount of code is", so if you and others feel this way, I am open to suggestions about different ways to proceed.

Though the way I implement variadic arguments introduces additional lines of MLIR for every argument, my sense is that there is "a lot" of code generated just because there is a fair bit of stuff happening:
+ variadic arguments
+ finding the lexicographically largest/smallest character
+ finding the largest character w.r.t. size
+ allocating a new buffer for the result
+ populating that buffer using a runtime call to memmove

I may be slow to respond until Monday, but I'm interested in your feedback!

Thanks for your thoughts.

Yeah it might be fine, I was just wondering if there was a particular reason you decided to do it this way. Most of the intrinsics are fully implemented in the runtime, so the only generated code will be for the function call. This means that the implementation of the intrinsic function is only stored once in memory instead of being inlined at every call site.

There is the simplify intrinsics pass, which can cause some frequently used intrinsics to be inlined. That does give performance gains without excessively bloating binaries so this might be fine. Do you have any programs in mind which use these intrinsics a lot? My comment might be a case of premature optimization.

If there's a good reason this can't be done in the runtime (e.g. there's no convenient way to support varadic arguments) then I think your approach makes sense too.

cabreraam added inline comments.Jun 12 2023, 11:13 AM
flang/include/flang/Optimizer/Builder/Character.h
73

I can make this change, no problem. Is there a reason why ArrayRef is preferable to SmallVector, aside from continuity in other parts of `Character.{h,cpp}. Is there a performance benefit?

tschuett added inline comments.Jun 13 2023, 1:48 AM
flang/include/flang/Optimizer/Builder/Character.h
73

There is no performance difference. There is just precedence for ArrayRef.

tblah added a comment.Jun 14 2023, 8:33 AM

Thank you for your work on this.

I'm worried this could generate a significant amount of code if the intrinsic is used regularly. I guess you decided not to implement this as a runtime function (instead of always generating code inline) because of the variable number of arguments?

Thanks for the feedback @tblah

I actually do generate a call to a runtime function -- _FortranACharacterCompareScalar1 -- through the call to the fir::runtime::genCharCompare function. Is this what you meant?

As I'm still new-ish to this flavor of contribution, it's not immediately obvious to me what "a significant amount of code is", so if you and others feel this way, I am open to suggestions about different ways to proceed.

Though the way I implement variadic arguments introduces additional lines of MLIR for every argument, my sense is that there is "a lot" of code generated just because there is a fair bit of stuff happening:
+ variadic arguments
+ finding the lexicographically largest/smallest character
+ finding the largest character w.r.t. size
+ allocating a new buffer for the result
+ populating that buffer using a runtime call to memmove

I may be slow to respond until Monday, but I'm interested in your feedback!

Thanks for your thoughts.

Yeah it might be fine, I was just wondering if there was a particular reason you decided to do it this way. Most of the intrinsics are fully implemented in the runtime, so the only generated code will be for the function call. This means that the implementation of the intrinsic function is only stored once in memory instead of being inlined at every call site.

There is the simplify intrinsics pass, which can cause some frequently used intrinsics to be inlined. That does give performance gains without excessively bloating binaries so this might be fine. Do you have any programs in mind which use these intrinsics a lot? My comment might be a case of premature optimization.

If there's a good reason this can't be done in the runtime (e.g. there's no convenient way to support varadic arguments) then I think your approach makes sense too.

Having thought about this more and discussed with @cabreraam, I don't think there is a simple enough way to pass varadic arguments to the runtime for this to be worth trying. We can fix this if/when we have a concrete case where the generated code is too bloated. Some other intrinsic functions with variable numbers of arguments seem to produce multiple runtime calls using an accumulator variable. This is roughly what this patch already does (although there is also the calculation of the return type).

Thanks for your work on this. The patch looks good to me once the existing comments are addressed.

cabreraam updated this revision to Diff 531968.Jun 15 2023, 7:07 PM

Updating D152474: [flang][hlfir] hlfir.char_extremum op definition and codegen

Addressing some nits from the patch

cabreraam marked 3 inline comments as done.Jun 15 2023, 7:09 PM

Thank you for your work on this.

I'm worried this could generate a significant amount of code if the intrinsic is used regularly. I guess you decided not to implement this as a runtime function (instead of always generating code inline) because of the variable number of arguments?

Thanks for the feedback @tblah

I actually do generate a call to a runtime function -- _FortranACharacterCompareScalar1 -- through the call to the fir::runtime::genCharCompare function. Is this what you meant?

As I'm still new-ish to this flavor of contribution, it's not immediately obvious to me what "a significant amount of code is", so if you and others feel this way, I am open to suggestions about different ways to proceed.

Though the way I implement variadic arguments introduces additional lines of MLIR for every argument, my sense is that there is "a lot" of code generated just because there is a fair bit of stuff happening:
+ variadic arguments
+ finding the lexicographically largest/smallest character
+ finding the largest character w.r.t. size
+ allocating a new buffer for the result
+ populating that buffer using a runtime call to memmove

I may be slow to respond until Monday, but I'm interested in your feedback!

Thanks for your thoughts.

Yeah it might be fine, I was just wondering if there was a particular reason you decided to do it this way. Most of the intrinsics are fully implemented in the runtime, so the only generated code will be for the function call. This means that the implementation of the intrinsic function is only stored once in memory instead of being inlined at every call site.

There is the simplify intrinsics pass, which can cause some frequently used intrinsics to be inlined. That does give performance gains without excessively bloating binaries so this might be fine. Do you have any programs in mind which use these intrinsics a lot? My comment might be a case of premature optimization.

If there's a good reason this can't be done in the runtime (e.g. there's no convenient way to support varadic arguments) then I think your approach makes sense too.

Having thought about this more and discussed with @cabreraam, I don't think there is a simple enough way to pass varadic arguments to the runtime for this to be worth trying. We can fix this if/when we have a concrete case where the generated code is too bloated. Some other intrinsic functions with variable numbers of arguments seem to produce multiple runtime calls using an accumulator variable. This is roughly what this patch already does (although there is also the calculation of the return type).

Thanks for your work on this. The patch looks good to me once the existing comments are addressed.

Comments have been addressed! Thanks for the feedback, again, @tblah, @jeanPerier, and @tschuett

flang/include/flang/Optimizer/Builder/Character.h
73

Took care of it!

cabreraam updated this revision to Diff 532697.Jun 19 2023, 9:45 AM
cabreraam marked an inline comment as done.

Forgot newline for HLFIROps.cpp

Thanks, I have a small nit, this looks good to me otherwise. Please run clang-format on flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp (see pre-merge buildbot clang-format failure).

flang/lib/Optimizer/Builder/Character.cpp
797

nit: "not known at compile time" would be clearer than "undefined".

cabreraam updated this revision to Diff 533614.Jun 22 2023, 8:06 AM
cabreraam marked an inline comment as done.

Updating D152474: [flang][hlfir] hlfir.char_extremum op definition and codegen

cabreraam updated this revision to Diff 533624.Jun 22 2023, 8:24 AM

Updating D152474: [flang][hlfir] hlfir.char_extremum op definition and codegen

Updating D152474: [flang][hlfir] hlfir.char_extremum op definition and codegen

cabreraam added inline comments.Jun 22 2023, 10:14 AM
flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
1486

Interestingly, adding a newline here was causing the build to fail because of clang-format.

jeanPerier accepted this revision.Jun 23 2023, 12:42 AM

LGTM, Thanks!

This revision is now accepted and ready to land.Jun 23 2023, 12:42 AM