This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Follow AACPS standard for volatile bit-fields access width
ClosedPublic

Authored by stuij on Jan 17 2020, 9:06 AM.

Details

Summary

This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the widht of their
declarative type. In such case:

struct S1 {
  short a : 1;
}

should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.

struct S2 {
  short a;
  int  b : 16;
}

Accessing S2.b would also access S2.a.

The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:

This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.

I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:

  • it won't overlap with any other non-bit-field member
  • we only access memory inside the bounds of the record
  • avoid overlapping zero-length bit-fields.

Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.

Diff Detail

Event Timeline

dnsampaio created this revision.Jan 17 2020, 9:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2020, 9:06 AM

Why are you doing this in CodeGen, rather than adjusting the existing layout code in CGRecordLowering? Doing it this way will result in AdjustAAPCSBitfieldLValue being called for every access to the bitfield, rather than just once. This is probably more fragile too, because it's spreading the logic across multiple parts of the codebase, and has to undo some of the layout done by CGRecordLowering.

Why are you doing this in CodeGen, rather than adjusting the existing layout code in CGRecordLowering? Doing it this way will result in AdjustAAPCSBitfieldLValue being called for every access to the bit-field, rather than just once. This is probably more fragile too, because it's spreading the logic across multiple parts of the codebase, and has to undo some of the layout done by CGRecordLowering.

Only at CodeGen we definitely know if an access is volatile, due casts to volatile.
About undoing what the CGRecordLowering does, that can't be avoided. We only can compute if a given offset and width will touch outside the bit-field when all fields and padding are defined.
We could add special volatile access fields into CGBitFieldInfo and compute them at the end for CGRecordLowering::lower, but these fields would only be relevant to ARM. And they will need to be computed even if there are any volatile accesses.
I have no strong opinion about which is better, I can add the fields if that suites better.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 21 2020, 7:34 AM
This revision was automatically updated to reflect the committed changes.
dnsampaio reopened this revision.Jan 21 2020, 7:36 AM

Sorry, submitted using ide by mistake. Already reverted it.

This comment was removed by dnsampaio.
dnsampaio added a comment.EditedJan 22 2020, 2:01 AM

Why are you doing this in CodeGen, rather than adjusting the existing layout code in CGRecordLowering? Doing it this way will result in AdjustAAPCSBitfieldLValue being called for every access to the bitfield, rather than just once. This is probably more fragile too, because it's spreading the logic across multiple parts of the codebase, and has to undo some of the layout done by CGRecordLowering.

@ostannard Indeed, after looking at the CGRecordLayout I am not kind for doing this change. It will require changing all possible initializations, with a sensible value. And either way, codegen will also need to change, to define the address, it must know if it is volatile or not. I don't believe that recomputing it a every access would pose much of an overhead, perhaps we can leave it as a TODO to move it there?

It will require changing all possible initializations, with a sensible value.

Where are you seeing multiple initializations? It looks like all of the logic for struct layout happens in CGRecordLowering::lower, we might need to add a pass there to calculate the valid access widths for bitfields.

And either way, codegen will also need to change, to define the address, it must know if it is volatile or not.

The difference is that codegen only needs to change to check if the access is volatile, and select the correct offset and size already calculated during struct layout.

perhaps we can leave it as a TODO to move it there?

I'm less concerned about the performance impact of this, and more about making this code harder to understand by spreading the logic across multiple different phases of compilation. This has been broken for years, I'd rather we fix it properly, than add a TODO which will never get done.

dnsampaio updated this revision to Diff 241421.Jan 30 2020, 5:28 AM
  • Moved computation of volatile accesses to the record layout builder
dnsampaio updated this revision to Diff 241477.Jan 30 2020, 8:35 AM
  • Do not generate special volatile access if the record alignment is smaller than the bit-field declared type
This comment was removed by dnsampaio.
dnsampaio updated this revision to Diff 242823.Feb 6 2020, 12:56 AM
  • Removed test
  • Added clear at the end of run as well, to clear waste
  • Moved clearing to a more sensible position
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2020, 12:56 AM
dnsampaio planned changes to this revision.Feb 6 2020, 1:20 AM

Updated wrong patch here.

dnsampaio updated this revision to Diff 243197.Feb 7 2020, 9:34 AM

Added opt-out flag

I think I've spotted a bug in the ABI spec, which you've faithfully implemented here. I don't know of any other compiler which has implemented this ABI change yet, so it's probably worth seeing if we can get the spec fixed.

The intention of the ABI is to avoid conflicting with the C/C++11 spec, which requires access to one "memory location" to not write to any adjacent memory locations. However, the wording of the ABI does not take into account zero-sized bitfields, which are defined in C/C++ to start a new memory location. For example:

struct foo {
  int a : 8;
  char  : 0;
  int b : 8;
};

Here, C/C++ says that a and b are separate memory locations, so it must be possible to read and write them from two threads. However, the ABI does not have the special case for zero-sized bitfields, so requires access to both fields to use 32-bit loads and stores, which overlap.

I've raised a ticket (internal to Arm) to consider changing the ABI to match C/C++ in this case.

clang/include/clang/Basic/CodeGenOptions.def
409

Why is this a negative option, when the one above is positive?

clang/include/clang/Driver/Options.td
2397

Command-line options are in kebab-case, so this should be something like fno-aapcs-bitfield-width. This also applies to the fAAPCSBitfieldLoad option above, assuming it's not too late to change that.

Please also add a positive version of this option (i.e. faapcs-bitfield-width).

2399

s/Does/Do/

s/declarative/container/

clang/lib/CodeGen/CGRecordLayout.h
101

It doesn't look like this is ever called with different volatile/non-volatile values, so I don't think we need the extra parameters.

clang/lib/CodeGen/CGRecordLayoutBuilder.cpp
531

Space in "itscontainer".

534

Space in "itscontainer".

535

Space in "thetype".

547

_register_ alignment?

548

Comments end in a full stop (multiple times in this function).

647

Unnecessary formatting change.

dnsampaio updated this revision to Diff 243855.Feb 11 2020, 6:55 AM
  • Add test that the volatile access does not cross a border defined by a zero lenght bit-field, defined by C11, avoiding race-conditions
dnsampaio updated this revision to Diff 243858.Feb 11 2020, 7:08 AM
dnsampaio marked 5 inline comments as done.
  • Fixing comments and removing some tests changes not required

Hi @ostannard, thanks for your review.
I updated the patch so it won't act when the computed volatile bit-field access will overlap a zero length bit-field, avoiding the conflict. We can update it accordingly to future versions of the AAPCS if required.

dnsampaio updated this revision to Diff 276034.Jul 7 2020, 6:17 AM

Rebased.
Now the AAPCS explicitly avoids conflicts with the C11, by not imposing any restriction
when the natural container will overlap a zero lenght bit-field:
https://github.com/ARM-software/abi-aa/commit/2334fc7611ede31b33e314ddd0dc90579015b322
Both 32 and 64 bit versions were updated on the same way.

dnsampaio edited the summary of this revision. (Show Details)Jul 17 2020, 2:25 AM
dnsampaio edited the summary of this revision. (Show Details)

You haven't addressed my earlier inline comments.

dnsampaio marked 6 inline comments as done.Jul 17 2020, 8:02 AM

Indeed not all of them. Fixed this time.

clang/include/clang/Basic/CodeGenOptions.def
409

The enforcing of number of accesses would not be accepted if it was not an opt-in option. This one I expect it should be accepted with a single opt-out option.

dnsampaio updated this revision to Diff 278793.Jul 17 2020, 8:54 AM
dnsampaio marked an inline comment as done.

Fixed remaining of inline comments

Ping ... ping...

stuij added a subscriber: stuij.Aug 28 2020, 7:16 AM

@ostannard: pinging on behalf of @dnsampaio. The changes still apply cleanly.

ostannard added inline comments.Sep 7 2020, 6:07 AM
clang/include/clang/Basic/CodeGenOptions.def
409

My problem is with the name of the option (adding an extra negative just makes things more confusing), not with the default value. This could just be called AAPCSBitfieldWidth, (unless you think the Force is adding something), and default to true father than false.

clang/include/clang/Driver/Options.td
2397

This still needs a positive version.

clang/lib/CodeGen/CGRecordLayout.h
83

These doc comments are copied from the ones above, they need changing.

clang/lib/CodeGen/CGRecordLayoutBuilder.cpp
279

Returning void is confusing (yes I know it was already there), this should be a separate return; statement.

289

Same here.

537

s/disable/disabled/

stuij commandeered this revision.Sep 8 2020, 7:52 AM
stuij added a reviewer: dnsampaio.

Commandeering as I've made some changes to the patch.

stuij updated this revision to Diff 290485.Sep 8 2020, 7:53 AM

addressed review comment

stuij marked 6 inline comments as done.Sep 8 2020, 7:55 AM
stuij added inline comments.
clang/include/clang/Basic/CodeGenOptions.def
409

done

clang/include/clang/Driver/Options.td
2397

done

This revision is now accepted and ready to land.Sep 8 2020, 8:26 AM
This revision was landed with ongoing or failed builds.Sep 8 2020, 9:50 AM
This revision was automatically updated to reflect the committed changes.
stuij marked 2 inline comments as done.
stuij reopened this revision.Oct 7 2020, 4:46 AM

Reopening as this commit made clang/test/CodeGen/volatile.c fail on Arm/AArch64 buildbot hosts.

This revision is now accepted and ready to land.Oct 7 2020, 4:46 AM
stuij updated this revision to Diff 296644.Oct 7 2020, 4:47 AM

After committing this patch, clang/test/CodeGen/volatile.c failed on Arm/AArch64 buildbot hosts. The reason for this is that %itanium_abi_triple, a run line Lit target triple substitution at the top of the file, is filled in with the host arch triple. For example: aarch64-unknown-linux-gnu. I've amended the tests to take into account the changes to code generation in this patch.

ostannard added inline comments.Oct 8 2020, 6:13 AM
clang/test/CodeGen/volatile.c
1–2

I think it would be better to change this test to use explicit triples, so that we're always testing both the ARM and non-ARM behaviour, regardless of the default triple.

stuij updated this revision to Diff 297584.Oct 12 2020, 7:40 AM

addressed review comment to hardwire non-MS target platforms

stuij updated this revision to Diff 297585.Oct 12 2020, 7:45 AM

removed clang/test/CodeGen/lit.local.cfg.py

This revision was landed with ongoing or failed builds.Oct 13 2020, 2:32 AM
This revision was automatically updated to reflect the committed changes.
clang/lib/Frontend/CompilerInvocation.cpp