This is an archive of the discontinued LLVM Phabricator instance.

[flang] Add & use a better visit()
ClosedPublic

Authored by klausler on Mar 24 2022, 3:54 PM.

Details

Summary

Adds flang/include/flang/Common/visit.h, which defines
a Fortran::common::visit() template function that is a drop-in
replacement for std::visit(). Modifies most use sites in
the front-end and runtime to use common::visit().

The C++ standard mandates that std::visit() have O(1) execution
time, which forces implementations to build dispatch tables.
This new common::visit() is O(log2 N) in the number of alternatives
in a variant<>, but that N tends to be small and so this change
produces a fairly significant improvement in compiler build
memory requirements, a 5-10% improvement in compiler build time,
and a small improvement in compiler execution time.

Building with -DFLANG_USE_STD_VISIT causes common::visit()
to be an alias for std::visit().

Calls to common::visit() with multiple variant arguments
are referred to std::visit(), pending further work.

Diff Detail

Event Timeline

klausler created this revision.Mar 24 2022, 3:54 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: jdoerfert. · View Herald Transcript
klausler requested review of this revision.Mar 24 2022, 3:54 PM
klausler updated this revision to Diff 418069.Mar 24 2022, 4:09 PM

Rebased to current HEAD.

klausler updated this revision to Diff 418103.Mar 24 2022, 6:38 PM
klausler edited the summary of this revision. (Show Details)

Scrub some changes that snuck in from another patch.

jeanPerier requested changes to this revision.Mar 25 2022, 12:55 AM

LGTM

This revision now requires changes to proceed.Mar 25 2022, 12:55 AM
jeanPerier accepted this revision.Mar 25 2022, 12:55 AM

Wrong button, approved.

This revision is now accepted and ready to land.Mar 25 2022, 12:55 AM
This revision was landed with ongoing or failed builds.Mar 25 2022, 1:15 PM
This revision was automatically updated to reflect the committed changes.

Thanks for this patch. I will need to revert it for now as it's causing multiple buildbots to fail:

All of these buildbots run on AArch64 and use Clang. I can confirm that there are no issues when:

  • building on X86 (tested with GCC 9.3 and Clang 12)
  • building on AArch64 with GCC (tested with GCC 11)

AFAIK, there's no pre-merge CI for AArch64, but I can test this manually for you once a fix is available. One other option is to change this to "opt-in", but I'm not a fan.

The change could have been disabled easily in one place with a #define -- it was protected by #ifdefs. You should not have reverted all of it without consulting me. I will not waste any more time trying to push it into llvm.

rovka added a subscriber: rovka.Mar 29 2022, 1:41 AM

The change could have been disabled easily in one place with a #define -- it was protected by #ifdefs. You should not have reverted all of it without consulting me. I will not waste any more time trying to push it into llvm.

@klausler: I can't speak for Andrzej, but I think his revert was conforming to the (admittedly aggressive) LLVM revert policy. A revert is not meant as an affront, and in general it's preferred to revert and recommit with a fix than to commit just a fix later on (this makes it easier for people to backport patches into e.g. the release branch or downstream branches without having to chase follow-up fixes). We also try really hard to bring the buildbots back to green as soon as possible, because if someone else breaks something while the bots are already red, they won't get a notification and it will be much more difficult to track down the issue (it basically means the person who committed the second red patch won't be aware that they broke something, and it will be down to the buildbot owner to notice that the CI is red and waste a lot of time trying to figure out what's going on; this also tends to get a nasty snowball effect). Because of this urgency it's considered good practice to revert someone else's patch if they don't do it themselves after some arbitrary amount of time (and especially if you know they're on a different timezone and suspect they might be offline).

TL;DR: I think this is just a misunderstanding, please don't abandon the patch because of it. My personal impression is that Andrzej was acting in good faith here, according to the norms and habits of the wider LLVM community.

TL;DR: I think this is just a misunderstanding, please don't abandon the patch because of it. My personal impression is that Andrzej was acting in good faith here, according to the norms and habits of the wider LLVM community.

The patch is not required for correctness, and I've already spent too much of my limited time on it.

Thanks Diana, this was indeed reverted in good faith. I appreciate that this can be frustrating, but with such a high traffic and so many developers working on this project, reverting is the most reliable way to address such issues in a timely fashion. @klausler, please don't abandon this patch. Given the time difference between us, it can be tricky to coordinate (IIUC, that's one of the reasons for the revert policy to be so liberal).

We did some more digging and it looks like a Clang 13 issue. On AArch64, both Clang 12 and Clang 14 are fine. I believe that @Leporacanthicus was also able to reproduce this with Clang 13 on X86.

We did some more digging and it looks like a Clang 13 issue. On AArch64, both Clang 12 and Clang 14 are fine. I believe that @Leporacanthicus was also able to reproduce this with Clang 13 on X86.

I have indeed reproduced this, both an older Clang-14 (from early November, not picked for any other reason than "I happen to have it compiled") and a Clang-13 from about a year back (again, not a specific release, just "a build I had sitting around"). I stopped compiling at around 25-30 minutes of CPU time in PFTBuilder.cpp.

I suspect this is a bug in Clang (or performance issue), exposed by this change. I ran perf record clang++ ..., but it's not showing an obvious "it hangs in one function" - it may well still hang, just that it's calling many different functions within the loop that it is stuck in. I have not tried to debug using for example gdb. This has been a background task while I was working on other things, trying to help Andrzej understand the issue.

Changing the logic to disable the new visit function works fine, so one option might be to commit this with it default disabled, and add some CMake code to enable it where it is beneficial (or not exposing the bug at least). I'm not entirely sure what those conditions are, however.

I have left my machine to compile that file over night, to see if it's a genuine hang or just "takes a very long time" - if it's not finished by tomorrow morning, I'm definitely calling it a hang. I will report back tomorrow on the result of that.

I have left my machine to compile that file over night, to see if it's a genuine hang or just "takes a very long time" - if it's not finished by tomorrow morning, I'm definitely calling it a hang. I will report back tomorrow on the result of that.

15h29m of CPU time later, and it still hasn't finished. I call that a hang... :)

Quick look in the debugger, it seems to be stuck in llvm::InlinerPass::run, but I've not spent any effort on figuring out why/reduce the problem