This is an archive of the discontinued LLVM Phabricator instance.

InstrProf - don't emit 64 bit atomic operations on 32 bit platforms
Needs ReviewPublic

Authored by SeanMollet on Jun 6 2023, 6:21 PM.

Details

Summary

Please forgive any mistakes in form, this is my first contribution to LLVM.

Starting with this commit: 61ba2e3996120a08deef823dccd7e8d8cd9c4332
Profiling was implemented with an intrinsic and lowering for it was consolidated in InstrProfiling.cpp

Optional lowering of the counter increment via (faster) atomic operations was also added at this time. Per the notes from the original commit, the primary goal here was consolidating the manual lowering of the profiling code into one place so it could be hand optimized.

As best I can tell, later on, c3ddc13d7d631a29b55f82c7cc7a9008bf89b1f4 then sets DoCounterPromotion = true unless O0.

This breaks compilation with the profiler turned on for platforms that don't support 64 bit atomic operations.

Since the atomic operations are an optimization anyway and are only useful if the hardware has support, my patch checks for that support (in an admittedly naive way, but it should be sufficient) and disables the atomic optimizations if they're not going to work.

The resulting output of the compiler (including the profiling) works correctly on my target mipsel platform. It also produces correct output on x86_64, aarch64 and arm.

The existing tests for the profiler will adequately cover my changes as well. Prior to my change, both the atomic and non atomic variations of the code were used even within the same job at various places.

Alina, you approved several commits for this section of code, so I'm assuming you know it well. I'm open to suggestions if you have a better approach. Thanks.

Diff Detail

Event Timeline

SeanMollet created this revision.Jun 6 2023, 6:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 6:21 PM
SeanMollet requested review of this revision.Jun 6 2023, 6:21 PM

Phabricator doesn't seem to have the commits from pre GIT days, so here's a link to the original commit that introduced the code in question:

https://github.com/SeanMollet/llvm-project/commit/61ba2e3996120a08deef823dccd7e8d8cd9c4332

The commit that added the lowering to atomics is: https://reviews.llvm.org/D34085

ellis added a comment.Jun 6 2023, 7:30 PM

It would be good to test that atomics are not used for targets that don't support them. I think it would look similar to llvm/test/Instrumentation/InstrProfiling/atomic-updates.ll.

Another option would be to throw an error if the user attempts to pass one of the atomic flags (e.g., -instrprof-atomic-counter-update-all, -atomic-first-counter, ...) on targets that don't support atomics. Or maybe we should at least report a warning? I'm not sure what the precedent is.

I'm also adding other reviewers that might have more context.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
454

I think this can be refactored to

if (!targetSupportsAtomic())
  return false;
SeanMollet updated this revision to Diff 529270.EditedJun 7 2023, 6:15 AM

Updated counter_promo tests to include a target triple that supports the atomic profiler functionality they're testing.
Added an atomic-updates-disabled test with a target triple of a device that doesn't support atomic profiling.
atomic-updates.ll already specified an appropriate target triple that supports atomic profiling.
Made @ellis suggested refactor.

It would be good to test that atomics are not used for targets that don't support them. I think it would look similar to llvm/test/Instrumentation/InstrProfiling/atomic-updates.ll.

Another option would be to throw an error if the user attempts to pass one of the atomic flags (e.g., -instrprof-atomic-counter-update-all, -atomic-first-counter, ...) on targets that don't support atomics. Or maybe we should at least report a warning? I'm not sure what the precedent is.

I'm also adding other reviewers that might have more context.

I added your suggested test and suggested refactor. Good ideas on both, thanks.

As I mentioned, this is my first LLVM contribution, so I'm not familiar with the guiding philosophy. Feel free to advise me if my thought process is off base. I generally prefer the path of least resistance/least change to APIs.

The difficulty with throwing an error is that atomics are now turned on by default. So, it's not the client asking for them in most cases, it's being done internally. Those options aren't exposed by the API and the profiler is used quite extensively in Rust. So, changing the behavior to require passing the flags would require what I suspect is significant rework here and there for no real benefit. Building for a target that supports the atomics, this works exactly as it always did and requires no changes downstream. Building for a target that doesn't support them used to fail, this makes it work. Minimal changes, everything works as expected.

I wouldn't object to a warning, since I don't think that would impair any functionality. I'm happy to add one if I can impose on somebody for a sample of the right way to throw one.

For target without hw instruction for fetch add, intrinsic like __sync_fetch_and_add() can be used instead -- it will be slow but workable. Why disabling it?

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
456

why disabling counter promotion? It is a different from atomic counter update.

For target without hw instruction for fetch add, intrinsic like __sync_fetch_and_add() can be used instead -- it will be slow but workable. Why disabling it?

Because the whole "lowering to an atomic operation" thing exists for the purpose of improving performance, A slow replacement doesn't accomplish that, so why add more code and potential bugs when there's no benefit?

Regarding improving performance, it is more about improving the optimized build (profile-use) performance instead of the performance of the instrumented build. This is because the atomic update allows more precise counter update in multi-threaded programs.

Regarding improving performance, it is more about improving the optimized build (profile-use) performance instead of the performance of the instrumented build. This is because the atomic update allows more precise counter update in multi-threaded programs.

I agree that it's about improving the profiler performance. The full software approach would improve precision while profling multi-threaded programs at the cost of substantially slowing down the resulting program and potentially impacting the timing.

In general, the targets this change would apply to are older or embedded, slow and usually single core systems. In my particular case, I'm working on a 600 Mhz embedded Mips system. It's already slow and the code isn't very threaded. Given the choice, I'm perfectly happy to give up some accuracy for faster profiling.

If I was building something for a server or modern laptop/desktop/mobile phone, I'd make the opposite choice. But, they all support hardware atomics, so it's moot.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
456

Because it also sometimes lowers to an atomic counter update. As an alternative, it could be re-worked to not do that, but then it doesn't provide meaningful benefit.

It might be better to handle this at the start of InstrProfiling::run (to override the Atomic related options with a proper warning).

Another thing is that what if an user wants to enable the emulated atomic update even when it is slow? You may want to introduce another internal option to control that : e.g. useAtomicEmulation (default to false). If it is true, do not override the option even when the target does not have hardware impl.

It might be better to handle this at the start of InstrProfiling::run (to override the Atomic related options with a proper warning).

I agree that a warning is a good idea. Can you point me to a good sample of the right/correct/prefered way to word, format and output one?

Another thing is that what if an user wants to enable the emulated atomic update even when it is slow? You may want to introduce another internal option to control that : e.g. useAtomicEmulation (default to false). If it is true, do not override the option even when the target does not have hardware impl.

That's a reasonable approach. Is it actually necessary now? Considering that currently profiling isn't available at all on the targets in question, and I'm apparently the first person to notice that in the 9 years since the original change; I don't think the group of people that care is very large. If somebody asks for it in the future, I'll write it.

There is an example in InstrProfiling.cpp about missing debug information warning.

For the emulation feature, it is ok to delay it -- but please do a TODO as a comment.

There is an example in InstrProfiling.cpp about missing debug information warning.

For the emulation feature, it is ok to delay it -- but please do a TODO as a comment.

Perfect, thanks.

One other question, I want to set a flag so as to not spam the user with warnings. Is there a standard/expected way to do that in LLVM?

ellis added a comment.Jun 20 2023, 4:06 PM

There is an example in InstrProfiling.cpp about missing debug information warning.

For the emulation feature, it is ok to delay it -- but please do a TODO as a comment.

Yeah I kind of regret adding this warning as it is. It is definitely pretty spammy and the only way to suppress it is to suppress all backend warnings.

There is an example in InstrProfiling.cpp about missing debug information warning.

For the emulation feature, it is ok to delay it -- but please do a TODO as a comment.

Yeah I kind of regret adding this warning as it is. It is definitely pretty spammy and the only way to suppress it is to suppress all backend warnings.

I was thinking a static flag that gets set on the first warning and then skip subsequent ones. I could wrap that other warning as well. Just didn't want to go randomly adding static variables since I have no idea about the policies surrounding things like that.

ellis added a comment.Jun 20 2023, 5:13 PM

There is an example in InstrProfiling.cpp about missing debug information warning.

For the emulation feature, it is ok to delay it -- but please do a TODO as a comment.

Yeah I kind of regret adding this warning as it is. It is definitely pretty spammy and the only way to suppress it is to suppress all backend warnings.

I was thinking a static flag that gets set on the first warning and then skip subsequent ones. I could wrap that other warning as well. Just didn't want to go randomly adding static variables since I have no idea about the policies surrounding things like that.

Even with a static flag, that doesn't stop the warning from showing up from each clang invocation if you aren't using LTO.

By the way, why not simply use -fprofile-update=single (which I think will set Options.Atomic = false) for your use case? Now that I look closely, all the atomic flags seem to be false by default. Where are you seeing atomic instructions emitted for you?

My understanding is that atomic update is enabled in some shared build environment thus causing problems for the 32 bit target. If that is not the case, it is perhaps better to use -fprofile-update option to control the behavior as suggested by Ellis.

Even with a static flag, that doesn't stop the warning from showing up from each clang invocation if you aren't using LTO.

True. At least it would only be once per file. Without a flag, it would throw at least twice per file. I don't know the call pattern into PGO, so maybe more than that. At least twice though, which seems worth a flag. Open to suggestions, I only want to do the "right" thing..

By the way, why not simply use -fprofile-update=single (which I think will set Options.Atomic = false) for your use case? Now that I look closely, all the atomic flags seem to be false by default. Where are you seeing atomic instructions emitted for you?

Forgive me if I get the exact wording of this wrong, this is the my first foray into LLVM.

Because the intrinsics aren't being emitted by the backend. They're being converted to an atomic directly by InstrProfiling.cpp which doesn't look at that option.

PassBuilderPipelines.cpp line 787 effectively sets the default for InstrProfiling.cpp to true. DoCounterPromotion=true uses atomics.

// Add the profile lowering pass.
InstrProfOptions Options;
if (!ProfileFile.empty())
  Options.InstrProfileOutput = ProfileFile;
// Do counter promotion at Level greater than O0.
Options.DoCounterPromotion = true;
Options.UseBFIInPromotion = IsCS;
MPM.addPass(InstrProfiling(Options, IsCS));

My understanding is that atomic update is enabled in some shared build environment thus causing problems for the 32 bit target. If that is not the case, it is perhaps better to use -fprofile-update option to control the behavior as suggested by Ellis.

It's enabled at any optimization level greater than O0 (I pasted the offending code in the previous reply). As far as I can tell, profile-update has no effect on it.

ellis added a comment.Jun 20 2023, 6:08 PM

Even with a static flag, that doesn't stop the warning from showing up from each clang invocation if you aren't using LTO.

True. At least it would only be once per file. Without a flag, it would throw at least twice per file. I don't know the call pattern into PGO, so maybe more than that. At least twice though, which seems worth a flag. Open to suggestions, I only want to do the "right" thing..

By the way, why not simply use -fprofile-update=single (which I think will set Options.Atomic = false) for your use case? Now that I look closely, all the atomic flags seem to be false by default. Where are you seeing atomic instructions emitted for you?

Forgive me if I get the exact wording of this wrong, this is the my first foray into LLVM.

Because the intrinsics aren't being emitted by the backend. They're being converted to an atomic directly by InstrProfiling.cpp which doesn't look at that option.

PassBuilderPipelines.cpp line 787 effectively sets the default for InstrProfiling.cpp to true. DoCounterPromotion=true uses atomics.

// Add the profile lowering pass.
InstrProfOptions Options;
if (!ProfileFile.empty())
  Options.InstrProfileOutput = ProfileFile;
// Do counter promotion at Level greater than O0.
Options.DoCounterPromotion = true;
Options.UseBFIInPromotion = IsCS;
MPM.addPass(InstrProfiling(Options, IsCS));

I still don't see how DoCounterPromotion=true emits atomics. As far as I can tell, all atomic instructions emitted in InstrProfiling.cpp are guarded behind an atomic flag that defaults to false.

I still don't see how DoCounterPromotion=true emits atomics. As far as I can tell, all atomic instructions emitted in InstrProfiling.cpp are guarded behind an atomic flag that defaults to false.

I honestly couldn't figure it out either. If I change that to =false or comment that line out, they don't get emitted. Somehow AtomicCounterUpdatePromoted is getting set to true.

The offending emitter is InstrProfiling.cpp line 198:

Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr, LiveInValue,
                        MaybeAlign(),
                        AtomicOrdering::SequentiallyConsistent);

I added the check around line 743 as well for sanity reasons.

I still don't see how DoCounterPromotion=true emits atomics. As far as I can tell, all atomic instructions emitted in InstrProfiling.cpp are guarded behind an atomic flag that defaults to false.

I honestly couldn't figure it out either. If I change that to =false or comment that line out, they don't get emitted. Somehow AtomicCounterUpdatePromoted is getting set to true.

The offending emitter is InstrProfiling.cpp line 198:

Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr, LiveInValue,
                        MaybeAlign(),
                        AtomicOrdering::SequentiallyConsistent);

I added the check around line 743 as well for sanity reasons.

I think we should first figure out why that's the case. In any case, I don't think this change is the right approach. If we wanted to use atomics depending on whether the target supports them or not, we'll need to postpone the lowering until later in the backend.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1390

I don't think this is correct, there are many 64-bit targets that don't have atomics, for example RV64 without the A extension, or even ARMv8-A. Conversely, there are 32-bit targets that support atomics such as Armv8-M.

SeanMollet marked an inline comment as done.Jun 20 2023, 7:31 PM

I don't think this change is the right approach.

As opposed to the current approach of just emitting them all the time with reckless abandon? This is a sanity check to eliminate targets that are almost certain to fail.

If we wanted to use atomics depending on whether the target supports them or not, we'll need to postpone the lowering until later in the backend.

I agree that either this or checking with the target class (which as far as I can tell isn't accessible from InstrProfiler) is the ideal solution. But, doing either is well beyond my experience with LLVM and available time.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1390

Then RV64 without A and ARMv8-A should also be excluded. I don't believe there's a way to get sub-target details from inside this class. Do you know of a way?

Best I can tell, all it has is the triplet.

Note: the bar is not whether a target supports atomic operations. InstrProfiler.cpp emits 64 bit atomic operations, which based on the manual, I do not believe Armv8-M supports.

Ref the Armv8-M architecture reference manual:

https://developer.arm.com/documentation/ddi0553/latest/

Page 252 lists the available Load-Acquire/Store-Release instructions which do not include 64-bit variants.

I still don't see how DoCounterPromotion=true emits atomics. As far as I can tell, all atomic instructions emitted in InstrProfiling.cpp are guarded behind an atomic flag that defaults to false.

I honestly couldn't figure it out either. If I change that to =false or comment that line out, they don't get emitted. Somehow AtomicCounterUpdatePromoted is getting set to true.

The offending emitter is InstrProfiling.cpp line 198:

Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr, LiveInValue,
                        MaybeAlign(),
                        AtomicOrdering::SequentiallyConsistent);

This is guarded by option atomic-counter-update-promoted option which is false by default. Can you debug your build see how it is set to true?

I added the check around line 743 as well for sanity reasons.

OK, I got brought back to this by a comment on my downstream rust issue.

@Zalathar kindly pointed out the commit on the rust side that is turning this on (blindly).

Turn on:
https://github.com/rust-lang/rust/pull/111469/files

compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp

    if (InstrProfileOutput) {
      Options.InstrProfileOutput = InstrProfileOutput;
    }
    // cargo run tests in multhreading mode by default
    // so use atomics for coverage counters
    Options.Atomic = true;
    MPM.addPass(InstrProfiling(Options, false));
  }
);

I'm going to investigate down there and see if there's a way for it to be smarter about turning this on. With that said though, I'm still of the mind that a sanity check in LLVM for obvious failures isn't a bad idea. Thoughts?