This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Follow AACPS for preserving number of loads/stores of volatile bit-fields
ClosedPublic

Authored by dnsampaio on Sep 10 2019, 7:44 AM.

Details

Summary

Following the AAPCS, every store to a volatile bit-field requires to generate one load of that field, even if all the bits are going to be replaced.
This patch allows the user to opt-in in following such rule, whenever the a.

AAPCS Release 2019Q1.1 (https://static.docs.arm.com/ihi0042/g/aapcs32.pdf)
section 8.1 Data Types, page 35, paragraph: Volatile bit-fields – preserving number and width of container accesses

When a volatile bit-field is written, and its container does not overlap with any non-bit-field member, its
container must be read exactly once and written exactly once using the access width appropriate to the
type of the container. The two accesses are not atomic.

Event Timeline

dnsampaio created this revision.Sep 10 2019, 7:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 10 2019, 7:44 AM
dnsampaio added a comment.EditedSep 10 2019, 7:48 AM

This patch could hack clang to generate an extra load. However, my knowledge in the clang code base is not extensive. How could we ensure that the width of loads and stores are the size of the container, and that they don't overlap non-bitfields?

lebedev.ri resigned from this revision.Sep 10 2019, 7:52 AM
lebedev.ri added a reviewer: jfb.

I'm not sure why i'm added as a reviewer here.

That being said i don't see why that load is needed there.
As i read it, the document only states that if load is needed,
then it must be a single load, likewise for store.
I.e. i'm not sure why it requires a load when it isn't needed.
Unlike atomics, volatiles don't enforce any ordering.

@ostannard might prove me wrong, but according to the AACPS:

When a volatile bit-field is written, and its container does not overlap with any non-bit-field member, its
container must be read exactly once and written exactly once using the access width appropriate to the
type of the container. The two accesses are not atomic.

This rule does not define that the load is done if required. It states that it will be read once. It even gives the example that an increment will always perform two reads and one write, bitwidth agnostic. It writes just after:

Note: Note the volatile access rules apply even when the width and alignment of the bit-field imply that
the access could be achieved more efficiently using a narrower type. For a write operation the read must
always occur even if the entire contents of the container will be replaced.

The rationale is to provide a uniform behavior for volatile bitfields independent of their width (as far they do not overlap with non-bitfields).

@ostannard might prove me wrong, but according to the AACPS:

When a volatile bit-field is written, and its container does not overlap with any non-bit-field member, its
container must be read exactly once and written exactly once using the access width appropriate to the
type of the container. The two accesses are not atomic.

This rule does not define that the load is done if required. It states that it will be read once. It even gives the example that an increment will always perform two reads and one write, bitwidth agnostic. It writes just after:

While it can be read as "just always read volatile bitfield before overwriting it", i'm not sure that makes any sense.
Why exactly would you want do to that load, if you don't use it's results? What side-effects does it cause?

What it should be saying is: "when you have a bitfield, and you want to update several of it's fields but not all,
first do one read to get the old contents, then merge old-new data, and then perform a single store replacing the entire contents at once".
That is the only sane behavior it can require.

Note: Note the volatile access rules apply even when the width and alignment of the bit-field imply that
the access could be achieved more efficiently using a narrower type. For a write operation the read must
always occur even if the entire contents of the container will be replaced.

The rationale is to provide a uniform behavior for volatile bitfields independent of their width (as far they do not overlap with non-bitfields).

jfb added a subscriber: rjmccall.Sep 10 2019, 8:59 AM
jfb added inline comments.Sep 10 2019, 9:01 AM
clang/test/CodeGen/aapcs-bitfield.c
543

These are just extra loads? Why?

554

Why isn't this load sufficient?

dnsampaio marked 4 inline comments as done.EditedSep 11 2019, 1:55 AM

Hi @jfb. In a example such as:

struct { int a : 1; int b : 16; } S;
extern int expensive_computaion(int v);
void foo(volatile S* s){
  s->b = expensive_computation(s->b);
}

There is no guarantee that s->a is not modified during a expensive computation, so it must be obtained just before writing the s->b value, as a and b share the same memory position. This is already done by llvm. Indeed, the exact output would be

define void @foo(%struct.S* %S) local_unnamed_addr #0 {
entry:
  %0 = bitcast %struct.S* %S to i32*
  %bf.load = load volatile i32, i32* %0, align 4
  %bf.shl = shl i32 %bf.load, 15
  %bf.ashr = ashr i32 %bf.shl, 16
  %call = tail call i32 @expensive_computation(i32 %bf.ashr) #2
  %bf.load1 = load volatile i32, i32* %0, align 4 ; <<<== Here it obtains the value to s->a to restore it.
  %bf.value = shl i32 %call, 1
  %bf.shl2 = and i32 %bf.value, 131070
  %bf.clear = and i32 %bf.load1, -131071
  %bf.set = or i32 %bf.clear, %bf.shl2
  store volatile i32 %bf.set, i32* %0, align 4
  ret void
}

These extra loads here are required to make uniform the number of times the volatile bitfield is read, independent if they share or not memory with other data.

We could have it under a flag, such as -faacps-volatilebitfield, disabled by default.

The other point not conformant to the AACPS is the width of the loads/stores used to obtain bitfields. They should be the width of the container, if that would not overlap with non-bitfields. Do you have any idea where that could be computed? I imagine that would be when computing the alignment of the elements of the structure, where I can check if the performing the entire container width load would overlap with other elements. Could you point me where that is done?

clang/test/CodeGen/aapcs-bitfield.c
543

Yes, these are just extra loads. As the AACPS describes, every write requires to perform a load as well, even if all bits of the volatile bitfield is going to be replaced.

554

Technically speaking, that is the load for reading the bitfield, not the load required when writing it.

dnsampaio marked 2 inline comments as done.Sep 11 2019, 1:56 AM

The exact sequence of volatile accesses is observable behavior, and it's the ABI's right to define correct behavior for compliant implementations, so we do need to do this.

Diogo, IRGen breaks up bit-field storage units in CGRecordLowering::accumulateBitFields. I'm sure there's some reasonable way to add target-specific adjustments to that so that we break things up by storage unit.

rsmith added a subscriber: rsmith.Sep 12 2019, 6:10 PM

The exact sequence of volatile accesses is observable behavior, and it's the ABI's right to define correct behavior for compliant implementations, so we do need to do this.

I'm not convinced; I think the AAPCS is overstepping its bounds here by trying to specify this. Whether you get a volatile load for a full-width bitfield access doesn't seem to have anything to do with ABI; implementations that differ on this detail will be able to produce code that links together and works together just fine. I think this is instead a question of what guarantees the implementation wants to give about how it treats volatile operations and how it lowers them to hardware. That should be specified, sure (and in fact C++ at least requires us to document what we do), but it's not ABI any more than (for example) the choice of whether a volatile access to a local variable must actually issue load / store instructions -- or which instructions must be produced -- is ABI. I have no particular opinion on whether we should make this change -- volatile bit-fields are pretty weird and broken regardless. But I don't think we should view it as being part of the ABI, and just as with the question of whether plain bit-fields are signed, we should tell the psABI folks that the question of what constitutes a volatile access and the semantics of such an access is not ABI and not up to them.

(Similar example: on Windows x86 targets, volatile accesses are atomic by default (controlled by compiler switch). clang-cl follows cl in this regard, but we don't consider it to be part of the ABI, and in the regular clang driver, we do not treat volatile accesses as atomic when targeting Windows x86.)

I have to say that I disagree. The ABI certainly doesn't get to control the language and define what constitutes a volatile access, but when the language has decided that a volatile access is actually performed, I think ABIs absolutely ought to define how they should be compiled. Volatile accesses are quite unusual in that they are used for a purpose — memory-mapped I/O — that is extremely dependent for correctness on the exact instructions used to implement it. IIRC there are architectures that for whatever reason provide two different load instructions where only one of those instructions actually performs memory-mapped I/O correctly. Certainly the exact width of load or store is critical for correctness. So while I have certainly seen ABI overreach before and pushed back on it, for this one case I think ABI involvement is totally appropriate, and in fact I wish more ABIs specified how they expect volatile accesses to be performed.

It is, of course, somewhat unfortunate that a corner-case use case like memory-mapped I/O has been privileged with such a core position in the language, but that's how it is.

I do think that AAPCS's specific request that the compiler emit spurious loads when storing to volatile bit-fields is a bad idea, and I think it would be fine to push back on that part, and perhaps also on the idea that compound assignments and increments should load twice.

Indeed our main concern is regarding the access widths of loads. As mentioned by @rjmccall, most volatile bitfields are used to perform memory mapped I/O, and some hardware only support them with a specific access width.
The spurious load I am more than glad to leave it disable behind a command flag, so it will only appear if the user requests it. See that volatile accesses might have side effects, and for example, an I/O read counter holding an odd number could define that the data is still being processed.

jfb added a comment.Sep 13 2019, 8:46 AM

Indeed our main concern is regarding the access widths of loads. As mentioned by @rjmccall, most volatile bitfields are used to perform memory mapped I/O, and some hardware only support them with a specific access width.
The spurious load I am more than glad to leave it disable behind a command flag, so it will only appear if the user requests it. See that volatile accesses might have side effects, and for example, an I/O read counter holding an odd number could define that the data is still being processed.

Are the cases being addressed in the PR actually relevant to real MMIO, or is this patch following the letter of AAPCS which doesn't actually matter?

In D67399#1669568, @jfb wrote:

Indeed our main concern is regarding the access widths of loads. As mentioned by @rjmccall, most volatile bitfields are used to perform memory mapped I/O, and some hardware only support them with a specific access width.
The spurious load I am more than glad to leave it disable behind a command flag, so it will only appear if the user requests it. See that volatile accesses might have side effects, and for example, an I/O read counter holding an odd number could define that the data is still being processed.

Are the cases being addressed in the PR actually relevant to real MMIO, or is this patch following the letter of AAPCS which doesn't actually matter?

Again, I think AAPCS is well within its rights to say that certain volatile accesses should be performed with loads and stores of certain widths. If low-level programmers cannot use bit-fields today with memory-mapped I/O because they cannot trust compilers to produce reasonable accesses, that is a legitimate concern for ABI authors and a legitimate bug for compiler maintainers.

In D67399#1669568, @jfb wrote:

Indeed our main concern is regarding the access widths of loads. As mentioned by @rjmccall, most volatile bitfields are used to perform memory mapped I/O, and some hardware only support them with a specific access width.
The spurious load I am more than glad to leave it disable behind a command flag, so it will only appear if the user requests it. See that volatile accesses might have side effects, and for example, an I/O read counter holding an odd number could define that the data is still being processed.

Are the cases being addressed in the PR actually relevant to real MMIO, or is this patch following the letter of AAPCS which doesn't actually matter?

Again, I think AAPCS is well within its rights to say that certain volatile accesses should be performed with loads and stores of certain widths. If low-level programmers cannot use bit-fields today with memory-mapped I/O because they cannot trust compilers to produce reasonable accesses, that is a legitimate concern for ABI authors and a legitimate bug for compiler maintainers.

I have no objection to the direction in this patch. I agree that it's important to have a specification that covers this, and while I still think that this has nothing to do with ABI as I believe the term is normally understood, treating it as platform-dependent and specifying it in the same place as the platform ABI might be the most reasonable option. Clearly the AAPCS is more than a Procedure Call Standard, which is fine.

dnsampaio retitled this revision from [ARM] Follow AACPS standard for volatile bitfields to [ARM] Follow AACPS for preserving number of loads/stores of volatile bit-fields.Jan 30 2020, 9:06 AM
dnsampaio edited the summary of this revision. (Show Details)
dnsampaio updated this revision to Diff 241486.Jan 30 2020, 9:06 AM
  • Added flag to allow user to opt-in
jfb accepted this revision.Jan 30 2020, 9:39 AM

I'm happy with this change since it's opt-in. Thanks!

This revision is now accepted and ready to land.Jan 30 2020, 9:39 AM
dnsampaio updated this revision to Diff 243102.Feb 7 2020, 2:09 AM

Revordered some tests adding and definition of the "isAAPCS" function from patch D72932

This revision was automatically updated to reflect the committed changes.