This is an archive of the discontinued LLVM Phabricator instance.


Differential D16586

Make clang AAPCS compliant w.r.t volatile bitfield accesses
AbandonedPublic

Authored by rmaprath on Jan 26 2016, 9:01 AM.

Details

Reviewers

rengolin
rjmccall
olista01

Summary

Let's consider the following plain struct:

struct S1 {
  char a;
  short b : 8;
};

According to the AAPCS, the bitfield 'b' _can_ be accessed by loading its
"container" (a short in this case) and then using bit-shifting operations to
extract the actual bitfield value. However, for plain (non-volatile) bitfields,
the AAPCS does not mandate this, so compilers are free to use whatever type of
access they think is best. armcc tends to use wider accesses even when narrower
accesses are possible (like in the above example, b can be byte-loaded), whereas
clang and gcc tend to issue narrow loads/stores where available.
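
To make the two access styles concrete, here is a minimal C sketch (not part of the patch) that reads 'b' from S1 both ways, assuming a little-endian byte layout. The helper names `extract_field16`, `read_b_narrow` and `read_b_wide_le` are hypothetical, and the struct is modelled as two raw bytes so the example does not depend on the host compiler's bit-field layout.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extending extraction of a `width`-bit field
   (1 <= width <= 16) whose lowest bit sits `lsb` bits above the least
   significant bit of a 16-bit container. */
int extract_field16(uint16_t container, unsigned lsb, unsigned width) {
    uint16_t mask = (uint16_t)((1u << width) - 1u);
    unsigned raw = (unsigned)((container >> lsb) & mask);
    unsigned sign = 1u << (width - 1);
    /* Flip the sign bit, then subtract it, to sign-extend portably. */
    return (int)(raw ^ sign) - (int)sign;
}

/* Narrow access: touch only the byte that holds 'b' (byte 1 of S1). */
int read_b_narrow(const unsigned char *s) {
    return extract_field16(s[1], 0, 8);
}

/* Wide access: assemble the half-word as a little-endian 16-bit load would
   see it, then shift and mask out bits [8, 16). Byte 0 ('a') is read too. */
int read_b_wide_le(const unsigned char *s) {
    uint16_t container = (uint16_t)(s[0] | (s[1] << 8));
    return extract_field16(container, 8, 8);
}
```

Both reads return the same value; they differ only in how many bytes of the struct are touched.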

But things get tricky when volatile bitfields are involved:

struct S2 {
  char a;
  volatile short b : 8;
};

Now the AAPCS mandates that 'b' must be accessed through loads/stores of widths
as appropriate to the container type (Section 7.1.7.5). To complicate matters
further, AAPCS allows overlapping of bitfields with regular fields
(Section 7.1.7.4). What this means is that the storage units of 'a' and 'b'
are overlapping, where 'a' occupies the first byte of the struct and 'b' lies
on the first half-word of the struct. Loading 'b' will load 'a' as well since
'b' is loaded as a short.

Currently, clang will byte-load 'b' and is therefore not AAPCS compliant. The
purpose of this patch is to make clang respect these access widths of volatile
bitfields.

One way to fix this problem is to teach clang the concept of overlapping storage
units. This would require an overhaul of clang's structure layout code, as the
idea of distinct storage units is deeply embedded in clang's structure layout
routines. In this patch, I take the view that this confusion is only a matter of
abstraction: whether the above struct's storage is viewed as two consecutive
bytes (this is how clang lays out the struct) or as a single half-word (this is
how armcc lays out the struct) is irrelevant, because the actual data of the
fields will always be at the same offset from the base of the struct. This
difference of abstraction only matters when we generate code to access the
relevant fields. We can therefore intercept those loads/stores in clang and
adjust the accesses so that they are AAPCS compliant.
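
The adjustment boils down to a few lines of bit arithmetic. The following C sketch is a simplified model of it, not the patch code: `bf_access`, `widen_access` and `keep_mask` are hypothetical names, all offsets are kept in bits, and `abs_offset` is assumed to be the field's bit offset from the start of the struct before any big-endian adjustment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the bit-field access parameters (all in bits). */
struct bf_access {
    uint32_t storage_offset; /* offset of the container from the struct base */
    uint32_t storage_size;   /* width of the container */
    uint32_t offset;         /* position of the field inside the container,
                                counted from the least significant bit */
    uint32_t size;           /* width of the field */
};

/* Recompute the access for a naturally aligned container of
   `container_size` bits (assumed to be a power of two, at most 32).
   Returns false when one aligned container cannot cover the whole field,
   e.g. for some packed structs; the caller then keeps the default access. */
bool widen_access(uint32_t abs_offset, uint32_t size, uint32_t container_size,
                  bool big_endian, struct bf_access *out) {
    uint32_t member = abs_offset & (container_size - 1);
    if (member + size > container_size)
        return false;
    if (big_endian)
        member = container_size - (member + size);
    out->storage_offset = abs_offset & ~(container_size - 1);
    out->storage_size = container_size;
    out->offset = member;
    out->size = size;
    return true;
}

/* Mask that keeps every container bit except the field's own bits; a store
   ANDs the loaded container with this before ORing in the new value. */
uint32_t keep_mask(const struct bf_access *a) {
    uint32_t field = (a->size == 32) ? UINT32_MAX
                                     : ((UINT32_C(1) << a->size) - 1);
    uint32_t all = (a->storage_size == 32)
                       ? UINT32_MAX
                       : ((UINT32_C(1) << a->storage_size) - 1);
    return all & ~(field << a->offset);
}
```

For a 5-bit field at bit 8 of a 32-bit container, this gives keep masks of 0xffffe0ff (little-endian) and 0xff07ffff (big-endian), the values listed for st5a in aapcs-bitfield.c.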

Event Timeline

rmaprath updated this revision to Diff 46003.Jan 26 2016, 9:01 AM

rmaprath retitled this revision from to Make clang AAPCS compliant w.r.t volatile bitfield accesses.

rmaprath updated this object.

rmaprath added reviewers: rjmccall, jmolloy, rengolin, olista01.

rmaprath added a subscriber: cfe-commits.

Herald added a subscriber: aemerson.Jan 26 2016, 9:01 AM

Well, that's certainly an interesting ABI rule.

A few high-level notes:

1. AAPCS requires the bit-field to be loaded on a store, even if the store fills the entire container; that doesn't seem to be implemented in your patch.

2. Especially because of #1, let's not do this unless the l-value is actually volatile. The ABI rule here is arguably actively wrong for non-volatile cases, e.g. struct { volatile char c; short s : 8; }.

3. Instead of using string comparisons all over the place, please make this a flag on the CG TargetInfo or something.
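
As a rough illustration of the first note above, here is a C sketch of a store that always loads the container first, even when the field covers all 16 bits. `reg16`, `reg_load`, `reg_store` and `store_field_aapcs` are made-up names for this example; the struct merely counts accesses so that the extra load is observable.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit memory-mapped register that counts how
   often it is read and written. */
struct reg16 {
    uint16_t value;
    unsigned loads;
    unsigned stores;
};

uint16_t reg_load(struct reg16 *r) {
    r->loads++;
    return r->value;
}

void reg_store(struct reg16 *r, uint16_t v) {
    r->stores++;
    r->value = v;
}

/* Store `val` into the `width`-bit field starting at bit `lsb`
   (lsb + width <= 16). The container is loaded unconditionally, which is
   the behaviour the review comment asks for. */
void store_field_aapcs(struct reg16 *r, unsigned lsb, unsigned width,
                       uint16_t val) {
    uint16_t field = (uint16_t)(((UINT32_C(1) << width) - 1) << lsb);
    uint16_t old = reg_load(r); /* performed even when width == 16 */
    uint16_t merged =
        (uint16_t)((old & ~field) | ((uint32_t)val << lsb & field));
    reg_store(r, merged);
}
```

A full-width store such as `store_field_aapcs(&r, 0, 16, 0xf)` leaves `r.loads == 1` and `r.stores == 1`, whereas a store that skips the load when the field fills the container would leave `r.loads == 0`.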

lib/CodeGen/CGExpr.cpp
1761	This alignment computation is wrong; you need to be basing this on the alignment of the base. It would be easier to do that in the formation of the LValue in the first place in EmitLValueForField, and then you won't need to modify as many of these uses.

rmaprath added inline comments.Jan 27 2016, 9:49 AM

lib/CodeGen/CGExpr.cpp
1761	I'm wondering if it would be possible to do all the adjustments (not just setting the bitfield LValue alignment) within EmitLValueForField(). It would certainly simplify the code a lot (e.g. no need to dissect the load/store GEP as I currently do in AdjustAAPCSBitfieldAccess(); I can intercept the original GEP when it is being constructed). One problem I have is that, as part of the adjustments, I have to override the CGBitFieldInfo of the bitfield LValue. But LValue holds a pointer to the CGBitFieldInfo, so I'd have to allocate a new CGBitFieldInfo instance and pass it into the LValue::MakeBitfield() method. Not sure if that is a good idea (who'd free it?). If you have any suggestions, please let me know. Adjusting alignments in EmitLValueForField() and then again adjusting other bitfield access parameters in the load/store methods would be a bit messy, I imagine. Thanks for all the feedback!

Addressing review comments by @rjmccall:

Moved all the AAPCS specific tweaks to EmitLValueForField(); this simplified the patch a lot (now there is no mucking about with a de-constructed GEP at load/store points). In order to do this, the LValue definition had to be updated so that it gets initialized with its own copy of CGBitFieldInfo.

A few high-level notes:
AAPCS requires the bit-field to be loaded on a store, even if the store fills the entire container; that doesn't seem to be implemented in your patch.

Fixed.

Especially because of #1, let's not do this unless the l-value is actually volatile. The ABI rule here is arguably actively wrong for non-volatile cases, e.g. struct { volatile char c; short s : 8; }.

The AAPCS also says (Section 7.1.7.5):

"If the container of a non-volatile bit-field overlaps a volatile bit-field then it is undefined whether access to the non volatile field will cause the volatile field to be accessed."

So it looks like doing this for normal fields is still within the bounds of AAPCS. In fact, armcc loads the volatile 'c' in your example when 's' is accessed. But I agree that limiting this to volatile cases is a good idea; we are still AAPCS compliant and there are fewer things to break.

Instead of using string comparisons all over the place, please make this a flag on the CG TargetInfo or something.

Factored this out into a small utility function.

Ping?

rsmith added a subscriber: rsmith.Feb 1 2016, 5:00 PM

rsmith added inline comments.

test/CodeGen/aapcs-bitfield.c
312–317	This violates the C and C++ object models by creating a data race on `m->y.a` that was not present in the source code. A store to a bit-field cannot write to bytes that are not part of the same sequence of bit-field members. If this ABI really requires that (and supports multi-threaded systems), it is not a correct ABI for C11 nor C++11. (This leaves open the question of which standard we should follow...)

rmaprath added inline comments.Feb 2 2016, 1:07 AM

test/CodeGen/aapcs-bitfield.c
312–317	Hi Richard,

Thank you for this, I didn't know about this restriction in the C11/C++11 standards. The AAPCS is indeed at odds with the standards in this case. For a simpler example, consider:

struct foo {
  char a;
  volatile short b : 8;
};

void foo(struct foo *p) {
  p->b = 0xFF;
}

This store will cause 'a' to be written as well according to the AAPCS. The conflicting sections of the standards are:

AAPCS: 7.1.7.4 Combining bit-field and non-bit-field members (+ 7.1.7.5 - volatile bitfield access)
C++11: 1.7 The C++ memory model
C11: 3.14 memory location

I will take this up with the AAPCS authors.
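
The hazard can be replayed on a single thread. The sketch below is an illustration only, assuming a little-endian byte layout; `foo_bytes`, `load_container`, `store_container` and `replay_lost_update` are hypothetical names. It interleaves a plain byte store to 'a' with a half-word read-modify-write of 'b' and shows that the update to 'a' is lost.

```c
#include <stdint.h>

/* struct foo { char a; volatile short b : 8; } modelled as two raw bytes:
   byte[0] is 'a', byte[1] is 'b' (little-endian view of the half-word). */
struct foo_bytes {
    uint8_t byte[2];
};

uint16_t load_container(const struct foo_bytes *p) {
    return (uint16_t)(p->byte[0] | (p->byte[1] << 8));
}

void store_container(struct foo_bytes *p, uint16_t v) {
    p->byte[0] = (uint8_t)(v & 0xff);
    p->byte[1] = (uint8_t)(v >> 8);
}

/* Replays one problematic interleaving: writer B starts a half-word
   read-modify-write of 'b', writer A updates 'a' in between, and B's store
   then writes the stale copy of 'a' back. Returns the final value of 'a'. */
uint8_t replay_lost_update(struct foo_bytes *p, uint8_t new_a, uint8_t new_b) {
    uint16_t snapshot = load_container(p); /* B: load the container */
    p->byte[0] = new_a;                    /* A: p->a = new_a */
    uint16_t merged = (uint16_t)((snapshot & 0x00ff) | (new_b << 8));
    store_container(p, merged);            /* B: store the container */
    return p->byte[0];
}
```

Starting from `a == 1`, replaying with `new_a == 2` returns 1: the byte-sized store made by writer A has been overwritten even though the source code of writer B never mentions 'a'.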

I'm not the right person to review this.

rmaprath abandoned this revision.May 4 2016, 6:49 AM

dnsampaio mentioned this in D72932: [ARM] Follow AACPS standard for volatile bit-fields access width.Jan 17 2020, 9:06 AM

dnsampaio mentioned this in rG6a24339a4524: [ARM] Follow AACPS standard for volatile bit-fields access width.Jan 21 2020, 7:35 AM

stuij mentioned this in rG514df1b2bb1e: [ARM] Follow AACPS standard for volatile bit-fields access width.Sep 8 2020, 9:50 AM

stuij mentioned this in rG208987844ffa: [ARM] Follow AACPS standard for volatile bit-fields access width.Oct 13 2020, 2:32 AM

Revision Contents

Path                           Size
lib/CodeGen/CGExpr.cpp         97 lines
lib/CodeGen/CGValue.h          12 lines
lib/CodeGen/CodeGenFunction.h  6 lines
test/CodeGen/aapcs-bitfield.c  353 lines

Diff 46264

lib/CodeGen/CGExpr.cpp

Context not available.
	Loc);	Loc);
	}	}

		// Helper method to check if the underlying ABI is AAPCS
		static bool isAAPCS(const TargetInfo &TargetInfo) {
		return TargetInfo.getABI().startswith("aapcs");
		}

	/// EmitIgnoredExpr - Emit code to compute the specified expression,	/// EmitIgnoredExpr - Emit code to compute the specified expression,
	/// ignoring the result.	/// ignoring the result.
	void CodeGenFunction::EmitIgnoredExpr(const Expr *E) {	void CodeGenFunction::EmitIgnoredExpr(const Expr *E) {
Context not available.
	llvm::Value *MaskedVal = SrcVal;	llvm::Value *MaskedVal = SrcVal;

	// See if there are other bits in the bitfield's storage we'll need to load	// See if there are other bits in the bitfield's storage we'll need to load
	// and mask together with source before storing.	// and mask together with source before storing. Note that AAPCS dictates
	if (Info.StorageSize != Info.Size) {	// volatile bitfield stores must always load the bitfield container, even
	assert(Info.StorageSize > Info.Size && "Invalid bitfield size.");	// when the bitfield fills the entire container.
		if (Info.StorageSize != Info.Size ||
		(isAAPCS(CGM.getTarget()) && Dst.isVolatileQualified())) {
		assert(Info.StorageSize >= Info.Size && "Invalid bitfield size.");
	llvm::Value *Val =	llvm::Value *Val =
	Builder.CreateLoad(Ptr, Dst.isVolatileQualified(), "bf.load");	Builder.CreateLoad(Ptr, Dst.isVolatileQualified(), "bf.load");

Context not available.
	return CGF.Builder.CreateStructGEP(base, idx, offset, field->getName());	return CGF.Builder.CreateStructGEP(base, idx, offset, field->getName());
	}	}

		// AAPCS requires bitfield accesses to be performed using the natural alignment /
		// width of the container type. This requirement is at odds with how clang works
		// with record types. For example, consider:
		//   struct s {
		//     char a;
		//     short b : 8;
		//   };
		// Here, clang treats 'a' and 'b' as two separate (consecutive) storage units
		// with each having a size of 8-bits. This means an access to the bitfield 'b'
		// will use an i8* instead of an i16* (type of the container) as required by the
		// AAPCS. In other words, AAPCS allows the storage of 'a' and 'b' to overlap
		// (7.1.7.4); a load of 'b' will load 'a' as well, which is then masked out as
		// necessary. Modifying clang's abstraction of structures to allow overlapping
		// fields is quite intrusive, so we have resorted to the following tweak,
		// where we intercept bitfield loads / stores and adjust them so that they are
		// AAPCS compliant. Note that both clang and AAPCS already agree on the layout
		// of structs; it's only the access width / alignment that needs fixing.
		Address CodeGenFunction::AdjustAAPCSBitfieldLValue(Address Base,
		QualType FieldType,
		CGBitFieldInfo &Info,
		bool &Adjusted) {
		llvm::Type *ResLTy = ConvertTypeForMem(FieldType);

		// CGRecordLowering::setBitFieldInfo() pre-adjusts the bitfield offsets for
		// big-endian targets, but it assumes a container of width Info.StorageSize.
		// Since AAPCS uses a different container size (width of the type), we first
		// undo that calculation here and redo it once the bitfield offset within the
		// new container is calculated
		if (CGM.getTypes().getDataLayout().isBigEndian())
		Info.Offset = Info.StorageSize - (Info.Offset + Info.Size);

		// Offset to the bitfield from the beginning of the struct
		uint32_t AbsoluteOffset = getContext().toBits(Info.StorageOffset) +
		Info.Offset;

		// Container size is the width of the bitfield type
		uint32_t ContainerSize = ResLTy->getPrimitiveSizeInBits();

		// Offset within the container
		uint32_t MemberOffset = AbsoluteOffset & (ContainerSize - 1);

		// Bail out if an aligned load of the container cannot cover the entire
		// bitfield. This can happen, for example, if the bitfield is part of a
		// packed struct. AAPCS does not define access rules for such cases, so we
		// let clang follow its own rules.
		if (MemberOffset + Info.Size > ContainerSize)
		return Base;

		// Re-adjust offsets for big-endian targets
		if (CGM.getTypes().getDataLayout().isBigEndian())
		MemberOffset = ContainerSize - (MemberOffset + Info.Size);

		// No turning back after this point
		Adjusted = true;

		// Calculate the new bitfield access parameters
		Info.StorageOffset =
		getContext().toCharUnitsFromBits(AbsoluteOffset & ~(ContainerSize - 1));
		Info.StorageSize = ContainerSize;
		Info.Offset = MemberOffset;

		// GEP into the bitfield container. Here we essentially treat the Base as a
		// pointer to a block of containers and index into it appropriately
		return Builder.CreateConstInBoundsGEP(
		Builder.CreateElementBitCast(Base, ResLTy),
		AbsoluteOffset / ContainerSize,
		getContext().toCharUnitsFromBits(ContainerSize));
		}

	LValue CodeGenFunction::EmitLValueForField(LValue base,	LValue CodeGenFunction::EmitLValueForField(LValue base,
	const FieldDecl *field) {	const FieldDecl *field) {
	AlignmentSource fieldAlignSource =	AlignmentSource fieldAlignSource =
Context not available.
	if (field->isBitField()) {	if (field->isBitField()) {
	const CGRecordLayout &RL =	const CGRecordLayout &RL =
	CGM.getTypes().getCGRecordLayout(field->getParent());	CGM.getTypes().getCGRecordLayout(field->getParent());
	const CGBitFieldInfo &Info = RL.getBitFieldInfo(field);	CGBitFieldInfo Info = RL.getBitFieldInfo(field);
	Address Addr = base.getAddress();	Address Addr = base.getAddress();
	unsigned Idx = RL.getLLVMFieldNo(field);	unsigned Idx = RL.getLLVMFieldNo(field);
		QualType fieldType =
		field->getType().withCVRQualifiers(base.getVRQualifiers());

		bool Adjusted = false;
		if (isAAPCS(CGM.getTarget()) && fieldType.isVolatileQualified()) {
		Addr = AdjustAAPCSBitfieldLValue(Addr, fieldType, Info, Adjusted);
		if (Adjusted)
		return LValue::MakeBitfield(Addr, Info, fieldType, fieldAlignSource);
		}

	if (Idx != 0)	if (Idx != 0)
	// For structs, we GEP to the field that the record layout suggests.	// For structs, we GEP to the field that the record layout suggests.
	Addr = Builder.CreateStructGEP(Addr, Idx, Info.StorageOffset,	Addr = Builder.CreateStructGEP(Addr, Idx, Info.StorageOffset,
Context not available.
	if (Addr.getElementType() != FieldIntTy)	if (Addr.getElementType() != FieldIntTy)
	Addr = Builder.CreateElementBitCast(Addr, FieldIntTy);	Addr = Builder.CreateElementBitCast(Addr, FieldIntTy);

	QualType fieldType =
	field->getType().withCVRQualifiers(base.getVRQualifiers());
	return LValue::MakeBitfield(Addr, Info, fieldType, fieldAlignSource);	return LValue::MakeBitfield(Addr, Info, fieldType, fieldAlignSource);
	}	}

Context not available.

lib/CodeGen/CGValue.h

Context not available.
	#include "llvm/IR/Value.h"	#include "llvm/IR/Value.h"
	#include "llvm/IR/Type.h"	#include "llvm/IR/Type.h"
	#include "Address.h"	#include "Address.h"
		#include "CGRecordLayout.h"

	namespace llvm {	namespace llvm {
	class Constant;	class Constant;
Context not available.
	namespace clang {	namespace clang {
	namespace CodeGen {	namespace CodeGen {
	class AggValueSlot;	class AggValueSlot;
	struct CGBitFieldInfo;

	/// RValue - This trivial value class is used to represent the result of an	/// RValue - This trivial value class is used to represent the result of an
	/// expression that is evaluated. It can be one of three things: either a	/// expression that is evaluated. It can be one of three things: either a
Context not available.

	// ExtVector element subset: V.xyx	// ExtVector element subset: V.xyx
	llvm::Constant *VectorElts;	llvm::Constant *VectorElts;

	// BitField start bit and size
	const CGBitFieldInfo *BitFieldInfo;
	};	};

		// BitField start bit and size
		CGBitFieldInfo BitFieldInfo;

	QualType Type;	QualType Type;

	// 'const' is unused here	// 'const' is unused here
Context not available.
	llvm::Value *getBitFieldPointer() const { assert(isBitField()); return V; }	llvm::Value *getBitFieldPointer() const { assert(isBitField()); return V; }
	const CGBitFieldInfo &getBitFieldInfo() const {	const CGBitFieldInfo &getBitFieldInfo() const {
	assert(isBitField());	assert(isBitField());
	return *BitFieldInfo;	return BitFieldInfo;
	}	}

	// global register lvalue	// global register lvalue
Context not available.
	LValue R;	LValue R;
	R.LVType = BitField;	R.LVType = BitField;
	R.V = Addr.getPointer();	R.V = Addr.getPointer();
	R.BitFieldInfo = &Info;	R.BitFieldInfo = Info;
	R.Initialize(type, type.getQualifiers(), Addr.getAlignment(), alignSource);	R.Initialize(type, type.getQualifiers(), Addr.getAlignment(), alignSource);
	return R;	return R;
	}	}
Context not available.

lib/CodeGen/CodeGenFunction.h

Context not available.
	/// \brief Emit code for sections directive.	/// \brief Emit code for sections directive.
	OpenMPDirectiveKind EmitSections(const OMPExecutableDirective &S);	OpenMPDirectiveKind EmitSections(const OMPExecutableDirective &S);

		/// \brief Perform AAPCS specific tweaks on bitfield accesses.
		Address AdjustAAPCSBitfieldLValue(Address Base,
		QualType FieldType,
		CGBitFieldInfo &Info,
		bool &Adjusted);

	public:	public:

	//===--------------------------------------------------------------------===//	//===--------------------------------------------------------------------===//
Context not available.

test/CodeGen/aapcs-bitfield.c

This file was added.

				// RUN: %clang_cc1 -triple armv8-none-linux-eabi %s -emit-llvm -o - -O3 \
				// RUN: | FileCheck %s -check-prefix=LE -check-prefix=CHECK
				// RUN: %clang_cc1 -triple armebv8-none-linux-eabi %s -emit-llvm -o - -O3 \
				// RUN: | FileCheck %s -check-prefix=BE -check-prefix=CHECK

				// CHECK: %struct.st1 = type { i8, i8 }
				// CHECK: %struct.st2 = type { i16, [2 x i8] }
				// CHECK: %struct.st3 = type { i16, i8 }
				// CHECK: %struct.st4 = type { i16, i8, i8 }
				// CHECK: %struct.st5b = type { i8, [3 x i8], %struct.st5a }
				// CHECK: %struct.st5a = type { i8, i8, [2 x i8] }
				// CHECK: %struct.st6 = type { i16, [2 x i8] }

				// A simple volatile bitfield, should be accessed through an i16*
				struct st1 {
				// Expected masks (0s mark the bit-field):
				// le: 0xff80 (-128)
				// be: 0x01ff (511)
				volatile short c : 7;
				};

				// CHECK-LABEL: st1_check_load
				int st1_check_load(struct st1 *m) {
				// LE: %[[PTR1:.*]] = bitcast %struct.st1* %m to i16*
				// LE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR1]], align 2
				// LE-NEXT: %[[CLR1:.*]] = shl i16 %[[LD]], 9
				// LE-NEXT: %[[CLR2:.*]] = ashr exact i16 %[[CLR1]], 9
				// LE-NEXT: %[[SEXT:.*]] = sext i16 %[[CLR2]] to i32
				// LE-NEXT: ret i32 %[[SEXT]]

				// BE: %[[PTR1:.*]] = bitcast %struct.st1* %m to i16*
				// BE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR1]], align 2
				// BE-NEXT: %[[CLR:.*]] = ashr i16 %[[LD]], 9
				// BE-NEXT: %[[SEXT:.*]] = sext i16 %[[CLR]] to i32
				// BE-NEXT: ret i32 %[[SEXT]]
				return m->c;
				}

				// CHECK-LABEL: st1_check_store
				void st1_check_store(struct st1 *m) {
				// LE: %[[PTR1:.*]] = bitcast %struct.st1* %m to i16*
				// LE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR1]], align 2
				// LE-NEXT: %[[CLR:.*]] = and i16 %[[LD]], -128
				// LE-NEXT: %[[SET:.*]] = or i16 %[[CLR]], 1
				// LE-NEXT: store volatile i16 %[[SET]], i16* %[[PTR1]], align 2

				// BE: %[[PTR1:.*]] = bitcast %struct.st1* %m to i16*
				// BE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR1]], align 2
				// BE-NEXT: %[[CLR:.*]] = and i16 %[[LD]], 511
				// BE-NEXT: %[[SET:.*]] = or i16 %[[CLR]], 512
				// BE-NEXT: store volatile i16 %[[SET]], i16* %[[PTR1]], align 2
				m->c = 1;
				}

				// 'c' should land straight after 'b' and should be accessed through an i16*
				struct st2 {
				int b : 10;
				// Expected masks (0s mark the bit-field):
				// le: 0x03ff (1023)
				// be: 0xffc0 (-64)
				volatile short c : 6;
				};

				// CHECK-LABEL: st2_check_load
				int st2_check_load(struct st2 *m) {
				// LE: %[[PTR:.*]] = getelementptr inbounds %struct.st2, %struct.st2* %m, i32 0, i32 0
				// LE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR]], align 4
				// LE-NEXT: %[[CLR:.*]] = ashr i16 %[[LD]], 10
				// LE-NEXT: %[[SEXT:.*]] = sext i16 %[[CLR]] to i32
				// LE-NEXT: ret i32 %[[SEXT]]

				// BE: %[[PTR:.*]] = getelementptr inbounds %struct.st2, %struct.st2* %m, i32 0, i32 0
				// BE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR]], align 4
				// BE-NEXT: %[[CLR1:.*]] = shl i16 %[[LD]], 10
				// BE-NEXT: %[[CLR2:.*]] = ashr exact i16 %[[CLR1]], 10
				// BE-NEXT: %[[SEXT:.*]] = sext i16 %[[CLR2]] to i32
				// BE-NEXT: ret i32 %[[SEXT]]
				return m->c;
				}

				// CHECK-LABEL: st2_check_store
				void st2_check_store(struct st2 *m) {
				// LE: %[[PTR:.*]] = getelementptr inbounds %struct.st2, %struct.st2* %m, i32 0, i32 0
				// LE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR]], align 4
				// LE-NEXT: %[[CLR:.*]] = and i16 %[[LD]], 1023
				// LE-NEXT: %[[SET:.*]] = or i16 %[[CLR]], 1024
				// LE-NEXT: store volatile i16 %[[SET]], i16* %[[PTR]], align 4

				// BE: %[[PTR:.*]] = getelementptr inbounds %struct.st2, %struct.st2* %m, i32 0, i32 0
				// BE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR]], align 4
				// BE-NEXT: %[[CLR:.*]] = and i16 %[[LD]], -64
				// BE-NEXT: %[[SET:.*]] = or i16 %[[CLR]], 1
				// BE-NEXT: store volatile i16 %[[SET]], i16* %[[PTR]], align 4
				m->c = 1;
				}

				// 'c' should land into the third byte and should be accessed through an i16*
				struct st3 {
				int a : 10;
				// Expected masks (0s mark the bit-field):
				// le: 0xff80 (-128)
				// be: 0x01ff (511)
				volatile short c : 7;
				};

				// CHECK-LABEL: st3_check_load
				int st3_check_load(struct st3 *m) {
				// LE: %[[PTR1:.*]] = getelementptr inbounds %struct.st3, %struct.st3* %m, i32 0, i32 0
				// LE-NEXT: %[[PTR2:.*]] = getelementptr inbounds i16, i16* %[[PTR1]], i32 1
				// LE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR2]], align 2
				// LE-NEXT: %[[CLR1:.*]] = shl i16 %[[LD]], 9
				// LE-NEXT: %[[CLR2:.*]] = ashr exact i16 %[[CLR1]], 9
				// LE-NEXT: %[[SEXT:.*]] = sext i16 %[[CLR2]] to i32
				// LE-NEXT: ret i32 %[[SEXT]]

				// BE: %[[PTR1:.*]] = getelementptr inbounds %struct.st3, %struct.st3* %m, i32 0, i32 0
				// BE-NEXT: %[[PTR2:.*]] = getelementptr inbounds i16, i16* %[[PTR1]], i32 1
				// BE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR2]], align 2
				// BE-NEXT: %[[CLR:.*]] = ashr i16 %[[LD]], 9
				// BE-NEXT: %[[SEXT:.*]] = sext i16 %[[CLR]] to i32
				// BE-NEXT: ret i32 %[[SEXT]]
				return m->c;
				}

				// CHECK-LABEL: st3_check_store
				void st3_check_store(struct st3 *m) {
				// LE: %[[PTR1:.*]] = getelementptr inbounds %struct.st3, %struct.st3* %m, i32 0, i32 0
				// LE-NEXT: %[[PTR2:.*]] = getelementptr inbounds i16, i16* %[[PTR1]], i32 1
				// LE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR2]], align 2
				// LE-NEXT: %[[CLR:.*]] = and i16 %[[LD]], -128
				// LE-NEXT: %[[SET:.*]] = or i16 %[[CLR]], 1
				// LE-NEXT: store volatile i16 %[[SET]], i16* %[[PTR2]], align 2

				// BE: %[[PTR1:.*]] = getelementptr inbounds %struct.st3, %struct.st3* %m, i32 0, i32 0
				// BE-NEXT: %[[PTR2:.*]] = getelementptr inbounds i16, i16* %[[PTR1]], i32 1
				// BE-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR2]], align 2
				// BE-NEXT: %[[CLR:.*]] = and i16 %[[LD]], 511
				// BE-NEXT: %[[SET:.*]] = or i16 %[[CLR]], 512
				// BE-NEXT: store volatile i16 %[[SET]], i16* %[[PTR2]], align 2
				m->c = 1;
				}

				// Overlapping (access) of volatile bitfields and normal fields
				struct st4 {
				// Occupies the first two bytes
				// le: 0xfffff000 (-4096)
				// be: 0x000fffff (1048575)
				volatile int a : 12;
				// Occupies the third byte
				char b;
				// Occupies the last byte
				// le: 0xe0ffffff (-520093697)
				// be: 0xffffff07 (-249)
				volatile int c : 5;
				};

				// CHECK-LABEL: st4_check_load
				int st4_check_load(struct st4 *m) {
				// LE: %[[PTR1:.*]] = bitcast %struct.st4* %m to i32*
				// LE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// LE-NEXT: %[[CLR1:.*]] = shl i32 %[[LD]], 20
				// LE-NEXT: %[[CLR2:.*]] = ashr exact i32 %[[CLR1]], 20

				// BE: %[[PTR1:.*]] = bitcast %struct.st4* %m to i32*
				// BE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// BE-NEXT: %[[CLR1:.*]] = ashr i32 %[[LD]], 20
				int x = m->a;

				// LE: %[[PTR2:.*]] = getelementptr inbounds %struct.st4, %struct.st4* %m, i32 0, i32 1
				// LE-NEXT: %[[LD2:.*]] = load i8, i8* %[[PTR2]], align 2{{.*}}
				// LE-NEXT: %[[SEXT:.*]] = sext i8 %[[LD2]] to i32
				// LE-NEXT: %[[RES1:.*]] = add nsw i32 %[[CLR2]], %[[SEXT]]

				// BE: %[[PTR2:.*]] = getelementptr inbounds %struct.st4, %struct.st4* %m, i32 0, i32 1
				// BE-NEXT: %[[LD2:.*]] = load i8, i8* %[[PTR2]], align 2{{.*}}
				// BE-NEXT: %[[SEXT:.*]] = sext i8 %[[LD2]] to i32
				// BE-NEXT: %[[RES1:.*]] = add nsw i32 %[[SEXT]], %[[CLR1]]
				x += m->b;

				// LE: %[[LD3:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// LE-NEXT: %[[CLR3:.*]] = shl i32 %[[LD3]], 3
				// LE-NEXT: %[[CLR4:.*]] = ashr i32 %[[CLR3]], 27
				// LE-NEXT: %[[RES2:.*]] = add nsw i32 %[[RES1]], %[[CLR4]]

				// BE: %[[LD3:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// BE-NEXT: %[[CLR3:.*]] = shl i32 %[[LD3]], 24
				// BE-NEXT: %[[CLR4:.*]] = ashr i32 %[[CLR3]], 27
				// BE-NEXT: %[[RES2:.*]] = add nsw i32 %[[RES1]], %[[CLR4]]
				x += m->c;

				// LE: ret i32 %[[RES2]]
				// BE: ret i32 %[[RES2]]
				return x;
				}

				// CHECK-LABEL: st4_check_store
				void st4_check_store(struct st4 *m) {
				// LE: %[[PTR1:.*]] = bitcast %struct.st4* %m to i32*
				// LE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// LE-NEXT: %[[CLR:.*]] = and i32 %[[LD]], -4096
				// LE-NEXT: %[[SET:.*]] = or i32 %[[CLR]], 1
				// LE-NEXT: store volatile i32 %[[SET]], i32* %[[PTR1]], align 4

				// BE: %[[PTR1:.*]] = bitcast %struct.st4* %m to i32*
				// BE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// BE-NEXT: %[[CLR:.*]] = and i32 %[[LD]], 1048575
				// BE-NEXT: %[[SET:.*]] = or i32 %[[CLR]], 1048576
				// BE-NEXT: store volatile i32 %[[SET]], i32* %[[PTR1]], align 4
				m->a = 1;

				// LE: %[[PTR2:.*]] = getelementptr inbounds %struct.st4, %struct.st4* %m, i32 0, i32 1
				// LE-NEXT: store i8 2, i8* %[[PTR2]], align 2{{.*}}
				// BE: %[[PTR2:.*]] = getelementptr inbounds %struct.st4, %struct.st4* %m, i32 0, i32 1
				// BE-NEXT: store i8 2, i8* %[[PTR2]], align 2{{.*}}
				m->b = 2;

				// LE: %[[LD2:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// LE-NEXT: %[[CLR2:.*]] = and i32 %[[LD2]], -520093697
				// LE-NEXT: %[[SET2:.*]] = or i32 %[[CLR2]], 50331648
				// LE-NEXT: store volatile i32 %[[SET2]], i32* %[[PTR1]], align 4

				// BE: %[[LD2:.*]] = load volatile i32, i32* %[[PTR1]], align 4
				// BE-NEXT: %[[CLR2:.*]] = and i32 %[[LD2]], -249
				// BE-NEXT: %[[SET2:.*]] = or i32 %[[CLR2]], 24
				// BE-NEXT: store volatile i32 %[[SET2]], i32* %[[PTR1]], align 4
				m->c = 3;
				}

				// Nested structs and volatile bitfields
				struct st5a {
				// Occupies the first byte
				char a;
				// Occupies the second byte
				// le: 0xffffe0ff (-7937)
				// be: 0xff07ffff (-16252929)
				volatile int b : 5;
				};

				struct st5b {
				// Occupies the first byte
				char x;
				// Comes after 3 bytes of padding
				struct st5a y;
				};

				// CHECK-LABEL: st7_check_load
				int st7_check_load(struct st5b *m) {
				// LE: %[[PTR:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 0
				// LE-NEXT: %[[LD:.*]] = load i8, i8* %[[PTR]], align 4{{.*}}
				// LE-NEXT: %[[SEXT:.*]] = sext i8 %[[LD]] to i32

				// BE: %[[PTR:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 0
				// BE-NEXT: %[[LD:.*]] = load i8, i8* %[[PTR]], align 4{{.*}}
				// BE-NEXT: %[[SEXT:.*]] = sext i8 %[[LD]] to i32
				int r = m->x;

				// LE: %[[PTR1:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 2
				// LE-NEXT: %[[PTR2:.*]] = getelementptr inbounds %struct.st5a, %struct.st5a* %[[PTR1]], i32 0, i32 0
				// LE-NEXT: %[[LD1:.*]] = load i8, i8* %[[PTR2]], align 4{{.*}}
				// LE-NEXT: %[[SEXT1:.*]] = sext i8 %[[LD1]] to i32
				// LE-NEXT: %[[RES:.*]] = add nsw i32 %[[SEXT1]], %[[SEXT]]

				// BE: %[[PTR1:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 2
				// BE-NEXT: %[[PTR2:.*]] = getelementptr inbounds %struct.st5a, %struct.st5a* %[[PTR1]], i32 0, i32 0
				// BE-NEXT: %[[LD1:.*]] = load i8, i8* %[[PTR2]], align 4{{.*}}
				// BE-NEXT: %[[SEXT1:.*]] = sext i8 %[[LD1]] to i32
				// BE-NEXT: %[[RES:.*]] = add nsw i32 %[[SEXT1]], %[[SEXT]]
				r += m->y.a;

				// LE: %[[PTR3:.*]] = bitcast %struct.st5a* %[[PTR1]] to i32*
				// LE-NEXT: %[[LD2:.*]] = load volatile i32, i32* %[[PTR3]], align 4
				// LE-NEXT: %[[CLR:.*]] = shl i32 %[[LD2]], 19
				// LE-NEXT: %[[CLR1:.*]] = ashr i32 %[[CLR]], 27
				// LE-NEXT: %[[RES1:.*]] = add nsw i32 %[[RES]], %[[CLR1]]

				// BE: %[[PTR3:.*]] = bitcast %struct.st5a* %[[PTR1]] to i32*
				// BE-NEXT: %[[LD2:.*]] = load volatile i32, i32* %[[PTR3]], align 4
				// BE-NEXT: %[[CLR:.*]] = shl i32 %[[LD2]], 8
				// BE-NEXT: %[[CLR1:.*]] = ashr i32 %[[CLR]], 27
				// BE-NEXT: %[[RES1:.*]] = add nsw i32 %[[RES]], %[[CLR1]]
				r += m->y.b;

				// LE: ret i32 %[[RES1]]
				// BE: ret i32 %[[RES1]]
				return r;
				}

				// CHECK-LABEL: st7_check_store
				void st7_check_store(struct st5b *m) {
				// LE: %[[PTR1:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 0
				// LE-NEXT: store i8 1, i8* %[[PTR1]], align 4{{.*}}

				// BE: %[[PTR1:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 0
				// BE-NEXT: store i8 1, i8* %[[PTR1]], align 4{{.*}}
				m->x = 1;

				// LE: %[[PTR2:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 2
				// LE-NEXT: %[[PTR3:.*]] = getelementptr inbounds %struct.st5a, %struct.st5a* %[[PTR2]], i32 0, i32 0
				// LE-NEXT: store i8 2, i8* %[[PTR3]], align 4{{.*}}

				// BE: %[[PTR2:.*]] = getelementptr inbounds %struct.st5b, %struct.st5b* %m, i32 0, i32 2
				// BE-NEXT: %[[PTR3:.*]] = getelementptr inbounds %struct.st5a, %struct.st5a* %[[PTR2]], i32 0, i32 0
				// BE-NEXT: store i8 2, i8* %[[PTR3]], align 4{{.*}}
				m->y.a = 2;

				// LE: %[[PTR3:.*]] = bitcast %struct.st5a* %[[PTR2]] to i32*
				// LE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR3]], align 4
				// LE-NEXT: %[[CLR:.*]] = and i32 %[[LD]], -7937
				// LE-NEXT: %[[SET:.*]] = or i32 %[[CLR]], 768
				// LE-NEXT: store volatile i32 %[[SET]], i32* %[[PTR3]], align 4

				// BE: %[[PTR3:.*]] = bitcast %struct.st5a* %[[PTR2]] to i32*
				// BE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR3]], align 4
				// BE-NEXT: %[[CLR:.*]] = and i32 %[[LD]], -16252929
				// BE-NEXT: %[[SET:.*]] = or i32 %[[CLR]], 1572864
				// BE-NEXT: store volatile i32 %[[SET]], i32* %[[PTR3]], align 4
				m->y.b = 3;
				}

				// Check overflowing assignments to volatile bitfields
				struct st6 {
				volatile unsigned f : 16;
				};

				// CHECK-LABEL: st6_check_assignment
				int st6_check_assignment(struct st6 *m) {
				// LE: %[[PTR:.*]] = bitcast %struct.st6* %m to i32*
				// LE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR]], align 4
				// LE-NEXT: %[[SET:.*]] = or i32 %[[LD]], 65535
				// LE-NEXT: store volatile i32 %[[SET]], i32* %[[PTR]], align 4
				// LE-NEXT: ret i32 65535

				// BE: %[[PTR:.*]] = bitcast %struct.st6* %m to i32*
				// BE-NEXT: %[[LD:.*]] = load volatile i32, i32* %[[PTR]], align 4
				// BE-NEXT: %[[SET:.*]] = or i32 %[[LD]], -65536
				// BE-NEXT: store volatile i32 %[[SET]], i32* %[[PTR]], align 4
				// BE-NEXT: ret i32 65535
				return m->f = 0xffffffff;
				}

				// Check that volatile bitfield stores load the container, even when the
				// bitfield occupy the entire container
				struct st9 {
				volatile short a : 16;
				};

				// CHECK-LABEL: st9_check_full_volatile_store
				void st9_check_full_volatile_store(struct st9 *m) {
				// CHECK: %[[PTR:.*]] = getelementptr inbounds %struct.st9, %struct.st9* %m, i32 0, i32 0
				// CHECK-NEXT: %[[LD:.*]] = load volatile i16, i16* %[[PTR]], align 2
				// CHECK-NEXT: store volatile i16 15, i16* %[[PTR]], align 2
				m->a = 0xf;
				}