Page MenuHomePhabricator

[flang] Add Count to simplified intrinsics

Authored by SBallantyne on Jan 20 2023, 6:35 AM.



This patch adds a simplfiied version of count for the simplify intrinsics pass, allowing the function to be inlined.

This was done specifically to help improve performance for exchange2, and provides a ~12% performance increase.

Diff Detail

Event Timeline

SBallantyne created this revision.Jan 20 2023, 6:35 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 20 2023, 6:35 AM
SBallantyne requested review of this revision.Jan 20 2023, 6:35 AM
SBallantyne edited the summary of this revision. (Show Details)Jan 20 2023, 7:11 AM
SBallantyne added reviewers: DavidTruby, tblah.

Looks OK in general, just some minor tidy-up suggestions.


Need to document the new argument.


Nit: Did you clang-format this? I'm suspicious the elementType is sticking out further than most other things...


Spurious whitespace change.


Since this is the same as resultType, you don't need to fetch it again.


You'll want a return here. Mostly to match the rest of the code...

Minor changes to formating and documentation

SBallantyne marked 5 inline comments as done.Jan 23 2023, 4:24 AM
SBallantyne added inline comments.

This is after clang-format, and if i move elementType to another line on its own clang-format will put it back to the previous line.


This is in a different function and doesn't have the resultType variable available to it.

vzakhari added inline comments.Jan 23 2023, 9:29 AM

Since _FortranACount does not have the element type in its name, we have to add it explicitly, otherwise, all calls will map to the same simplified version regardless of the element type. Please add a test with different logical kinds.

vzakhari added inline comments.Jan 23 2023, 1:09 PM

It looks like you assume that logical TRUE is 1 - I am not sure if this is correct. I would rather reproduce the logic from flang/runtime/tools.h, and assume that everything that is not 0 is TRUE:

// Utility for dealing with elemental LOGICAL arguments
inline bool IsLogicalElementTrue(
    const Descriptor &logical, const SubscriptValue at[]) {
  // A LOGICAL value is false if and only if all of its bytes are zero.
  const char *p{logical.Element<char>(at)};
  for (std::size_t j{logical.ElementBytes()}; j-- > 0; ++p) {
    if (*p) {
      return true;
  return false;
SBallantyne added inline comments.Jan 24 2023, 3:28 AM

That is intended, the compiler will deal with different sized logicals by converting them to logical<4> before passing to count, for runtime or simplified. I can still add the test if needed, but this code isn't responsible for handling this, and if that area breaks regular count will also break.

vzakhari added inline comments.Jan 24 2023, 8:37 AM

Thank you for the clarification! I did not know that we always create a copy for the Count call. I think this is a performance issue, since we do not really need this copy. I would suggest fixing the mangling in this commit, and getting rid of the copy in another commit.

SBallantyne marked an inline comment as done.
SBallantyne edited the summary of this revision. (Show Details)

Change to test for 0 in simplified count rather than add logicals

SBallantyne added inline comments.Jan 25 2023, 5:37 AM

I agree that it is wasteful to convert everything to logical<4> but i think that including the kind of logical in the call name and getting rid of the conversion would be better kept in the same commit, rather than split over two commits - it doesn't cause any issues with this current version and wouldn't provide any benefit yet either.

vzakhari added inline comments.Jan 25 2023, 1:09 PM

I think one64 and zero64 need to be swapped.


Ok, works for me.

In addition, it does not make much sense to mangle the logical versions with FMF, since they do not have any FP operations. I think we'd better fix this in the current commit.

Change naming scheme to not include fast math strings in logical functions

SBallantyne added inline comments.Jan 26 2023, 3:13 AM

The function does work as it is currently, though it is strange that the ordering here is swapped compared to the mlir generated:

%7 = %6, %c0_i64_1, %c1_i64 : i64

Changing it around does swap the values in the mlir, leading to the wrong result being calculated

builder.create<mlir::arith::SelectOp>(loc, compare, zero64, one64);
%7 = %6, %c1_i64, %c0_i64_1 : i64

Fix zero64/one64 values being swapped around


Nevermind, i somehow didn't notice that one64 was declared using 0 and zero64 was delcared using 1

Leporacanthicus accepted this revision.Jan 26 2023, 11:53 AM

I think this is good to go in now, subject to @vzakhari not finding anything else wrong.

This revision is now accepted and ready to land.Jan 26 2023, 11:53 AM
vzakhari accepted this revision.Jan 26 2023, 9:34 PM

Thank you for the changes. Just one minor thing.


nit: the final mlir::Twine{} is redundant.

Remove uneeded mlir::twine{}

This revision was landed with ongoing or failed builds.Jan 27 2023, 8:30 AM
This revision was automatically updated to reflect the committed changes.