Page MenuHomePhabricator

Fix atomic loads and stores of 64-bit values in non-default address spaces on ARM
AbandonedPublic

Authored by koutheir on Jun 14 2021, 1:43 PM.

Details

Summary

Performing an atomic load or store of a 64-bit value, using an address outside the default address space, triggers an assertion failure on ARM.

The ARM back-end is missing an address space cast that is required in this situation.

This patch assumes that ARM synchronization semantics are the same irrespective of the address space.

Diff Detail

Event Timeline

koutheir created this revision.Jun 14 2021, 1:43 PM
koutheir requested review of this revision.Jun 14 2021, 1:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2021, 1:44 PM
koutheir retitled this revision from Fix atomic loads and stores of 64-bit integers in non-default address spaces on ARM. Performing an atomic load or store of a 64-bit value, using an address outside the default address space, triggers an assertion failure on ARM. The ARM back-end... to Fix atomic loads and stores of 64-bit values in non-default address spaces on ARM.Jun 14 2021, 1:45 PM
koutheir edited the summary of this revision. (Show Details)
koutheir edited the summary of this revision. (Show Details)

Hi Koutheir,

I'm not an expert on atomics or the various uses of address spaces in the wild, but I'm curious as to what your use case is.

cheers,
--renato

llvm/lib/Target/ARM/ARMISelLowering.cpp
19658

This doesn't sound like the right solution.

Maybe I'm misunderstanding what you're trying to do, but if the address space is different because it's in a different virtual environment (encapsulation, security, hardware purposes), then casting it to the default address space won't point to the "right" element.

19700

Copy and paste introduce errors. Please, factor this in a new anonymous namespace helper.

I'm not an expert on atomics or the various uses of address spaces in the wild, but I'm curious as to what your use case is.

The use of a different address space in my case is used to distinguish "pointers to native memory" from "pointers to managed objects" in a Java virtual machine.

This doesn't sound like the right solution.

Maybe I'm misunderstanding what you're trying to do, but if the address space is different because it's in a different virtual environment (encapsulation, security, hardware purposes), then casting it to the default address space won't point to the "right" element.

I agree with your concern, and that's the reason why I explicitly stated my assumption:

This patch assumes that ARM synchronization semantics are the same irrespective of the address space.

In my use case, the address space cast works as expected.

I thought my assumption was reasonable because that is the same assumption made when performing atomic loads and stores on other integers with lower bit widths, e.g., i32, i16, i8. For instance, replacing all i64 with i32 in the minimal test below works perfectly without this patch. This patch simply makes the same operation work for i64 values.

I think that atomic loads and stores from non-default address spaces should behave uniformly irrespective of the integer bit width (as long as it is supported by the processor). This means choosing:

  1. Either they behave the same way irrespective of the address space, or
  2. They are legal only on the default address space.

The current situation is somewhere in between, and this patch brings it completely to choice (1).

llvm/lib/Target/ARM/ARMISelLowering.cpp
19700

OK. I'll do that.

I'm not an expert on atomics or the various uses of address spaces in the wild,...

@rengolin: I used llvm/CODE_OWNERS.TXT in order to find out which people should be invited to review this patch. However, I might have missed other people who deal more with atomic operations on ARM. Could you please invite these persons if you know them?

koutheir updated this revision to Diff 352131.Jun 15 2021, 7:18 AM

The use of a different address space in my case is used to distinguish "pointers to native memory" from "pointers to managed objects" in a Java virtual machine.

I see. I imagine to help with garbage collecting.

This patch assumes that ARM synchronization semantics are the same irrespective of the address space.

I believe this is true, but that's not what I was concerned about.

I think that atomic loads and stores from non-default address spaces should behave uniformly irrespective of the integer bit width (as long as it is supported by the processor).

I also agree with this statement, so we're on the same page.

My worry wasn't about the semantics of the atomic load/store instructions, but about the introduction of the unchecked CreateAddrSpaceCast in there.

Basically, what that's saying is that *all* access to *any* address space is the same as an access through the default address space.

It may be true for your case, but not necessarily others. Your code would make it true for all.

I looked at other targets and none of them assume this is true, except the GPU targets on very specific conditions (I'm, assuming ABI related).

This looks to me like the wrong place to fix it. It'd probably be a matter for the middle-end casting the address space to default when performing the atomic load?

Basically, what that's saying is that *all* access to *any* address space is the same as an access through the default address space.

It may be true for your case, but not necessarily others. Your code would make it true for all.

Yes, see choice (1) above.

I looked at other targets and none of them assume this is true, except the GPU targets on very specific conditions (I'm, assuming ABI related).

If I understand well, you rather opt for choice (2): Atomic loads and stores are legal only on the default address space.

If that is the case, then I think we should explicitly check for this, and report an error instead of:

  • Silently generating code for i8, i16 and i32.
  • Crashing due to an assertion failure for i64.

This choice also makes sense, and I'm ready to implement it. Please confirm that this is your actual opinion.

This looks to me like the wrong place to fix it. It'd probably be a matter for the middle-end casting the address space to default when performing the atomic load?

That would work around the issue by avoiding the problematic code path. I still think LLVM should provide a consistent behavior irrespective of the integer bit width.

koutheir updated this revision to Diff 352142.Jun 15 2021, 8:04 AM
koutheir updated this revision to Diff 352145.Jun 15 2021, 8:06 AM

If I understand well, you rather opt for choice (2): Atomic loads and stores are legal only on the default address space.

Not quite. Address spaces in LLVM are a somewhat opaque concept, and can mean anything. I'm just not sure they are exchangeable like you propose on all cases.

For example, the LangRef says of addrspacecast: "It can be a no-op cast or a complex value modification, depending on the target and the address space pair."

My assumption was that the back-end should not see address spaces in that way, that they would have been converted by some earlier pass to checks and bounds, but I may be wrong.

How are you generating that IR? Is that something that comes from Clang or another official front-end?

If that is the case, then I think we should explicitly check for this, and report an error instead of:

  • Silently generating code for i8, i16 and i32.
  • Crashing due to an assertion failure for i64.

I agree this isn't correct.

I looked at your test and it doesn't generate LDEX/STREX for 32-bits, but LDR/STR followed by a DMB. No address space conversions are needed either way.

The crash only happens with 64-bit types because there's an i8* conversion to map i64 into {i32, i32}, so it's accidental. Changing the code to accept any addrspace in that cast is fixing the wrong thing.

The only non-GPU test I could find that has "store atomic ... addrspace" is test/CodeGen/X86/2020_12_02_decrementing_loop.ll which the end result is an ud2 (ie. undefined instruction, ie. crash).

I'm not sure the back-end is ready to consume that kind of pattern.

Not quite. Address spaces in LLVM are a somewhat opaque concept, and can mean anything. I'm just not sure they are exchangeable like you propose on all cases.

Would this qualify atomic operations in non-default address spaces as undefined behavior?

How are you generating that IR? Is that something that comes from Clang or another official front-end?

The IR is coming from our custom port of the LLVM back-end of GraalVM to ARMv7-A.

I agree this isn't correct.
...
The crash only happens with 64-bit types because there's an i8* conversion to map i64 into {i32, i32}, so it's accidental. Changing the code to accept any addrspace in that cast is fixing the wrong thing.

How should we correct this?

I'm not sure the back-end is ready to consume that kind of pattern.

Does this mean we should keep the current state as is, and simply never feed the ARM back-end this kind of instructions?

Those are great questions! :)

Would this qualify atomic operations in non-default address spaces as undefined behavior?

No. Atomic operations are a red-herring.

The problem here is that a particular handling of 64-bit atomic operations on Arm are creating a pointer bit-cast that fails because they're in different address spaces.

Since implicit bit-casting pointers across address spaces are not allowed, this code in the Arm back-end is "faulty".

Perhaps the fix is to create an i8* on the address space of the original pointer?

The IR is coming from our custom port of the LLVM back-end of GraalVM to ARMv7-A.

I suspected as much.

Does this mean we should keep the current state as is, and simply never feed the ARM back-end this kind of instructions?

Perhaps. You can lower the address space semantics (checks and cast to the default address space) before it gets to the back-end.

This fixes your problem. It doesn't fix the bug you found, though.

I created a bug for the back-end part of things: https://bugs.llvm.org/show_bug.cgi?id=50721.

Feel free to subscribe, or even send an email to llvm-dev to discuss and hope someone who understands more about address spaces in LLVM chimes in.

Meanwhile, you can avoid the problem by converting address space semantics before it gets to the back-end.

If I understand well, you rather opt for choice (2): Atomic loads and stores are legal only on the default address space.

Not quite. Address spaces in LLVM are a somewhat opaque concept, and can mean anything. I'm just not sure they are exchangeable like you propose on all cases.

They are target defined. You're only doing this for this one target, so you can make them mean whatever you want them to mean

They are target defined. You're only doing this for this one target, so you can make them mean whatever you want them to mean

Right, I'm just not sure they mean what this patch wants them to mean for address space usage (indistinguishable for load/store purposes) on all uses and variations of Arm uarch/extensions.

I'm not finding many examples of address space cast codegen in Clang/LLVM other than GPU code.

I know CHERI uses address space for their capabilities and I don't think it would work to just cast it to non-capability world like that.

The Morello (AArch64 + CHERI) implementation uses AS 200 for capabilities and does different instruction selection for loads and stores (including atomics) depending on the address space.

The back end should never see IR that contains address spaces that don't exist for that back end. I'd expect that code wanting to use AS1 to indicate GC pointers should rewrite them to AS0 immediately before codegen.

The Morello (AArch64 + CHERI) implementation uses AS 200 for capabilities and does different instruction selection for loads and stores (including atomics) depending on the address space.

The back end should never see IR that contains address spaces that don't exist for that back end. I'd expect that code wanting to use AS1 to indicate GC pointers should rewrite them to AS0 immediately before codegen.

Thanks @theraven, that's what I was thinking.

Regardless of the semantics of the back-end bitcast in question, @koutheir, implementing the GC logic on your side is probably the right solution anyway.

And how should the crash be fixed? Or shouldn't it?

Back ends are expected to crash when given invalid IR.

Back ends are expected to crash when given invalid IR.

Well, we could make the crash more meaningful with an assert, but yeah, crash.

If we decide to do anything, it will be covered in the bug I opened.

This review can be closed.

Thanks for bringing this to our attention.

To elaborate: The set of possible invalid IR is huge and back ends are permitted to assume that they are only ever provided with valid IR, as per the informal contract between the back end and the rest of the compiler (including requiring function arguments to be arranged in a way that the back end's ABI-lowering logic understands). Ensuring that the back end is only ever given IR that it knows how to consume is the responsibility of whatever is invoking the back end. In an ideal world, every construct that a front end + optimiser pipeline may feed to a back end would have a test in tests/Target/{back end} that would check that it correctly lowers the IR.

I don't have any objection to the ARM back ends generating an error if they encounter a pointer in an unknown address space, but I also don't object if that error is an assertion failure or an llvm::unreachable, and it's also fine for it to crash because there are a load of other cases of IR that you can feed into a back end to make it crash if you try hard enough.

Back ends are expected to crash when given invalid IR.

Well, this marks this patch as invalid, and the crash as "by design".

Can someone close this review as "Won't fix"?

Can someone close this review as "Won't fix"?

You can "abandon" the patch. This isn't a bug, so "won't fix" doesn't apply. :D

Please select action "Resign revision" (or something similar, I can't remember the actual text).

koutheir abandoned this revision.Jun 23 2021, 10:12 AM