This is an archive of the discontinued LLVM Phabricator instance.

[Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type
ClosedPublic

Authored by wmi on Aug 9 2017, 5:04 PM.

Details

Summary

Currently, all consecutive bitfields are wrapped into a single large integer unless an unnamed zero-sized bitfield separates them. This patch makes a bitfield a separate memory location if it has the width of a legal integer type and its bit offset is naturally aligned for that type. This can significantly improve the access efficiency of such bitfields.

https://reviews.llvm.org/D30416 tries to achieve the same goal in LLVM, but it is much more difficult there because it has to deal with IR that has been significantly transformed by earlier optimizations. The patch here does the work in Clang instead.

With the patch, we can remove most of D30416 except the illegal memory access shrinking.
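
For illustration only, a minimal sketch of the kind of field the patch targets; the struct and field names here are made up, not taken from the patch:

struct S {
  unsigned long f0:16;
  unsigned long f1:16;
  unsigned long f2:32;  // 32 bits wide, bit offset 32: legal width and naturally aligned,
                        // so with the patch it can be accessed as a plain 32-bit location
                        // instead of through a 64-bit load/mask/shift that also covers f0 and f1.
};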

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.Aug 9 2017, 5:04 PM
wmi edited the summary of this revision. (Show Details)Aug 9 2017, 5:05 PM
wmi edited the summary of this revision. (Show Details)Aug 9 2017, 5:13 PM
chandlerc edited edge metadata.Aug 9 2017, 7:42 PM

This has been discussed before and I still pretty strongly disagree with it.

This cripples the ability of TSan to find race conditions between accesses to consecutive bitfields -- and these bugs have actually come up.

We also have had cases in the past where LLVM missed significant bitfield combines and optimizations due to loading them as separate integers. Those would become problems again, and I think they would be even harder to solve than narrowing the access is going to be because we will have strictly less information to work with.

Ultimately, while I understand the appeal of this approach, I don't think it is correct and I think we should instead figure out how to optimize these memory accesses well in LLVM. That approach will have the added benefit of optimizing cases where the user has manually used a large integer to simulate bitfields, and making combining and canonicalization substantially easier.
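
As a hedged illustration of the "manual large integer" pattern mentioned above (the names and constants are invented for this sketch): user code like the following packs fields into a plain integer and extracts them with shifts and masks, and an IR-level approach would optimize it and real bitfields uniformly, while a frontend-only change would not help it.

#include <cstdint>

uint64_t packed;  // hand-rolled "bitfield" storage

uint32_t get_mid16() {
  return (packed >> 16) & 0xffff;                      // extract bits 16..31
}

void set_mid16(uint32_t v) {
  packed = (packed & ~(0xffffULL << 16)) |
           (static_cast<uint64_t>(v & 0xffff) << 16);  // update bits 16..31
}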

wmi added a comment.Aug 9 2017, 10:42 PM

This has been discussed before and I still pretty strongly disagree with it.

This cripples the ability of TSan to find race conditions between accesses to consecutive bitfields -- and these bugs have actually come up.

I guess you mean that simultaneous accesses to different bitfields in the same consecutive run can cause a data race. Because the order of bitfields within a run is implementation-defined, TSan may miss reporting such a race with this change, but the same race could still be exposed by a different compiler.

This can be solved by detecting TSan mode in the code: if TSan is enabled, we can stop splitting the bitfields.

We also have had cases in the past where LLVM missed significant bitfield combines and optimizations due to loading them as separate integers. Those would become problems again, and I think they would be even harder to solve than narrowing the access is going to be because we will have strictly less information to work with.

How about only separating legal-integer-width bitfields at the beginning of a consecutive run? Then it won't hinder bitfield combines. This way, it can still help in some cases, including the important case we saw.

Ultimately, while I understand the appeal of this approach, I don't think it is correct and I think we should instead figure out how to optimize these memory accesses well in LLVM. That approach will have the added benefit of optimizing cases where the user has manually used a large integer to simulate bitfields, and making combining and canonicalization substantially easier.

wmi updated this revision to Diff 110641.Aug 10 2017, 3:34 PM

Don't separate a bitfield in the middle of a run, because that could hinder the combining of bitfield accesses. Only separate a bitfield at the beginning of a run.

wmi added a comment.Aug 10 2017, 3:45 PM

In the last update I limited the bitfield separation to the beginning of a run, so no bitfield combining will be blocked.

I think I need to explain more about why I changed direction and started to work on the problem in the frontend. Keeping the information by generating a wide type and letting an LLVM pass do the narrowing originally looked like a good solution to me. However, I saw in real cases that the narrowing approach at a late LLVM stage has more difficulties than I originally thought. Some of them are solved, but at the cost of code complexity; others are more difficult.

  • Store-forwarding issue: to extract legal-integer-width bitfields from the large integer generated by the frontend, we need to split both the stores and the loads related to those bitfields. If a store is narrowed but a load is not, the load may be wider than the store, the target may have difficulty doing store-to-load forwarding, and performance suffers (see the sketch after this list). Note that we found cases where the related load and store are in different functions, so when deciding whether to narrow a store, we may have no idea whether the related load will be narrowed. If we cannot know that all the related loads will be narrowed, it is better not to narrow the store.
  • After instcombine, some bitfield access information is lost. The case we saw is:

    unsigned long bf1 : 16;
    unsigned long bf2 : 16;
    unsigned long bf3 : 16;
    unsigned long bf4 : 8;

    bool cond = "bf3 == 0 && bf4 == -1";

    Before instcombine, bf3 and bf4 are extracted from an i64 separately. From the extraction code pattern we can tell that bf3 is a 16-bit access and bf4 is an 8-bit access. After instcombine, bf3 and bf4 are merged into a single 24-bit access: the comparison above is changed to extract 24 bits from the i64 (%bf.load = wide load i64, %extract = and %bf.load, 0xffffff00000000) and compare %extract with 0xffff00000000. The information that there were two legal-integer bitfield accesses is lost, and we won't narrow the load here.

    Because we cannot split the load here, we hit the store-forwarding issue.
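
A minimal, purely illustrative sketch of the store-forwarding hazard from the first bullet, written as equivalent C++ on the underlying 64-bit word; the types, the byte offset, and the little-endian layout are assumptions based on the bf1..bf4 example above, not code from the patch:

#include <cstdint>
#include <cstring>

uint64_t word;  // stands in for the large integer wrapping bf1..bf4

void write_bf3(uint16_t v) {
  // Narrowed store: only the 16 bits of bf3 (byte offset 4 on little-endian)
  // are written, i.e. a store i16 in IR.
  std::memcpy(reinterpret_cast<char *>(&word) + 4, &v, sizeof(v));
}

uint64_t read_all() {
  // Non-narrowed load: the whole 64-bit word is read (load i64). On many
  // targets the earlier 16-bit store cannot be forwarded to this wider load,
  // so the load stalls until the store drains from the store buffer.
  return word;
}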

That is why I am exploring the bitfield access issue in multiple directions.

wmi updated this revision to Diff 112271.Aug 22 2017, 6:31 PM

Try another idea suggested by David.

All the bitfields in a single run are still wrapped inside a large integer according to CGBitFieldInfo. For bitfields that have legal integer types and are naturally aligned, we change the access manner when we generate the load/store in LLVM IR for the bitfield (in EmitLoadOfBitfieldLValue and EmitStoreThroughBitfieldLValue). All the other bitfields are still accessed using a wide load/store plus masking operations. Here is an example:

class A {
  unsigned long f1:2;
  unsigned long f2:6;
  unsigned long f3:8;
  unsigned long f4:4;
};
A a;

f1, f2, f3, and f4 will still be wrapped as a large integer. f1, f2, and f4 will have the same access code as before. f3 will be accessed as if it were a separate unsigned char variable.

In this way, we reduce the chance of blocking bitfield access combining: the a.f1 and a.f4 accesses can be combined as long as no a.f3 access stands between them. We will generate two fewer instructions for foo and one more instruction for goo below, so it is better, but not perfect.

void foo (unsigned long n1, unsigned long n4, unsigned long n3) {
  a.f1 = n1;
  a.f4 = n4;
  a.f3 = n3;
}

void goo (unsigned long n1, unsigned long n4, unsigned long n3) {
  a.f1 = n1;
  a.f3 = n3;    // a.f3 will still block the combining of a.f1 and a.f4 because a.f3 is accessed independently.
  a.f4 = n4;
}
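
For what it's worth, a rough C++ rendering of the access pattern described above for foo; the masks, shift amounts, and the little-endian byte index are assumptions derived from A's layout (f1 at bits 0..1, f2 at 2..7, f3 at 8..15, f4 at 16..19), not the actual IR the diff emits:

#include <cstdint>

uint64_t storage;  // stands in for the wrapped run f1..f4 of 'a'

void foo_equivalent(uint64_t n1, uint64_t n4, uint64_t n3) {
  // a.f1 = n1; and a.f4 = n4; remain wide read-modify-write operations on the
  // 64-bit run, so they can be merged into a single load/store pair.
  uint64_t w = storage;
  w = (w & ~0x3ULL) | (n1 & 0x3);                  // f1: bits 0..1
  w = (w & ~(0xFULL << 16)) | ((n4 & 0xF) << 16);  // f4: bits 16..19
  storage = w;
  // a.f3 = n3; becomes an independent byte-sized access (bits 8..15).
  reinterpret_cast<uint8_t *>(&storage)[1] = static_cast<uint8_t>(n3);
}

In goo, the byte-sized a.f3 access sits between the two wide read-modify-writes, which is why it still blocks their combining.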

I'm really not a fan of the degree of complexity and subtlety that this introduces into the frontend, all to allow particular backend optimizations.

I feel like this is Clang working around a fundamental deficiency in LLVM and we should instead find a way to fix this in LLVM itself.

As has been pointed out before, user code can synthesize large integers that small bit sequences are extracted from, and Clang and LLVM should handle those just as well as actual bitfields.

Can we see how far we can push the LLVM side before we add complexity to Clang here? I understand that there remain challenges to LLVM's stuff, but I don't think those challenges make *all* of the LLVM improvements off the table, I don't think we've exhausted all ways of improving the LLVM changes being proposed, and I think we should still land all of those and re-evaluate how important these issues are when all of that is in place.

wmi updated this revision to Diff 116232.Sep 21 2017, 11:29 AM

Changes following the discussion:

  • Put the bitfield-split logic under an option, off by default.
  • When a sanitizer is enabled, the option for bitfield splitting is ignored and a warning is emitted.

In addition, a test is added.

You seem to be only changing the behavior for the "separatable" fields, but I suspect you want to change the behavior for the others too. The bitfield would be decomposed into shards, separated by the naturally-sized-and-aligned fields. Each access only loads its shard. For example, in your test case you have:

struct S3 {
  unsigned long f1:14;
  unsigned long f2:18;
  unsigned long f3:32;
};

and you test that, with this option, loading/storing to a3.f3 only accesses the specific 4 bytes composing f3. But if you load f1 or f2, we're still loading all 8 bytes, right? I think we should only load/store the lower 4 bytes when we access a3.f1 and/or a3.f2.

Otherwise, you can again end up with the narrow-store/wide-load problem for nearby fields under a different set of circumstances.
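
To make the shard idea concrete, here is a hedged sketch of the decomposition being suggested for S3, written as equivalent C++; the little-endian layout, shard boundaries, and function names are assumptions for illustration, not part of the review:

#include <cstdint>
#include <cstring>

uint64_t s3_storage;  // stands in for the 8 bytes of an S3 object

// Shard 0: bytes 0..3 hold f1 (bits 0..13) and f2 (bits 14..31).
// Shard 1: bytes 4..7 hold f3 (bits 32..63).

uint32_t load_f2_sharded() {
  // Only the lower 4-byte shard is read; f3's shard is untouched.
  uint32_t lo;
  std::memcpy(&lo, &s3_storage, sizeof(lo));
  return lo >> 14;  // f2 is the top 18 bits of the shard
}

void store_f3_sharded(uint32_t v) {
  // Only the upper 4-byte shard is written.
  std::memcpy(reinterpret_cast<char *>(&s3_storage) + 4, &v, sizeof(v));
}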

include/clang/Driver/Options.td
1039

I'm not opposed to -fsplit-bitfields, but I'd prefer if we find something more self-explanatory. It's not really clear what "splitting a bitfield" means. Maybe?

-fsplit-bitfield-accesses
-fdecomposed-bitfield-accesses
-fsharded-bitfield-accesses
-ffine-grained-bitfield-accesses

(I think that I prefer -ffine-grained-bitfield-accesses, although it's the longest)

1041

How about?

Use separate access for bitfields with legal widths and alignments.

I don't think that "in LLVM" is needed here (or we could put "in LLVM" on an awful lot of these options).

lib/CodeGen/CGExpr.cpp
1679 ↗(On Diff #116232)

var -> variable

wmi added a comment.Sep 26 2017, 4:20 PM

You seem to be only changing the behavior for the "separatable" fields, but I suspect you want to change the behavior for the others too. The bitfield would be decomposed into shards, separated by the naturally-sized-and-aligned fields. Each access only loads its shard. For example, in your test case you have:

struct S3 {
  unsigned long f1:14;
  unsigned long f2:18;
  unsigned long f3:32;
};

and you test that, with this option, loading/storing to a3.f3 only accesses the specific 4 bytes composing f3. But if you load f1 or f2, we're still loading all 8 bytes, right? I think we should only load/store the lower 4 bytes when we access a3.f1 and/or a3.f2.

This is intentional. If struct S3 were like the following:

struct S3 {
  unsigned long f1:14;
  unsigned long f2:32;
  unsigned long f3:18;
};

and if there is no write of a.f2 between a.f1 and a.f3, the loads of a.f1 and a.f3 can still be shared. The intent is to keep the combining opportunity as much as possible while reducing the cost of accessing the naturally-sized-and-aligned fields.

Otherwise, you can again end up with the narrow-store/wide-load problem for nearby fields under a different set of circumstances.

Good catch. It is indeed possible to hit that problem. Considering the big performance impact and the triaging difficulty of store-forwarding problems, I will sacrifice the combining opportunity above and take the suggestion just as you describe.

Thanks,
Wei.

include/clang/Driver/Options.td
1039

Ok.

1041

Sure.

wmi updated this revision to Diff 116896.Sep 27 2017, 3:54 PM

Address Hal's comment: separate bitfields into shards delimited by the naturally-sized-and-aligned fields.

hfinkel added inline comments.Oct 5 2017, 3:34 PM
lib/CodeGen/CGRecordLayoutBuilder.cpp
411

betterBeSingleFieldRun -> IsBetterAsSingleFieldRun

455

The logic here is not obvious; can you please add a comment? SingleFieldRun here is only not equal to betterBeSingleFieldRun(Field) if we've skipped zero-length bitfields, right? Please explain what's going on, and also please make sure there's a test case.

wmi marked an inline comment as done.Oct 5 2017, 6:22 PM
wmi added inline comments.
lib/CodeGen/CGRecordLayoutBuilder.cpp
455

I restructured the code a little bit and hope the logic is clearer now. I already have a test case added for it.

wmi updated this revision to Diff 117947.Oct 5 2017, 6:22 PM

Address Hal's comment.

hfinkel added inline comments.Oct 7 2017, 11:08 PM
include/clang/Basic/DiagnosticDriverKinds.td
335

with a sanitizer

include/clang/Driver/Options.td
1041

access -> accesses

1044

Use large-integer access for consecutive bitfield runs.

include/clang/Frontend/CodeGenOptions.def
182

These lines are too long.

lib/CodeGen/CGRecordLayoutBuilder.cpp
449

Why do you have the IsBetterAsSingleFieldRun(Run) check here, where we'll evaluate it multiple times (for all of the fields in the run)? Can't you check the predicate above directly?

// Any non-zero-length bitfield can start a new run.
if (Field->getBitWidthValue(Context) != 0 &&
     !IsBetterAsSingleFieldRun(Field)) {
  Run = Field;
  StartBitOffset = getFieldBitOffset(*Field);
  ...
wmi updated this revision to Diff 118181.Oct 8 2017, 10:16 PM
wmi marked 5 inline comments as done.

Address Hal's comments.

hfinkel accepted this revision.Oct 12 2017, 9:35 AM

LGTM

include/clang/Frontend/CodeGenOptions.def
182

finegrained -> fine-grained

(I suppose we're not sticking to 80 cols in this file anyway)

This revision is now accepted and ready to land.Oct 12 2017, 9:35 AM
This revision was automatically updated to reflect the committed changes.
chill added a subscriber: chill.Aug 21 2019, 8:18 AM

Shouldn't we disable OPT_ffine_grained_bitfield_accesses only if TSAN is active?

Herald added a project: Restricted Project.Aug 21 2019, 8:18 AM
wmi added a comment.Aug 22 2019, 3:13 PM

Shouldn't we disable OPT_ffine_grained_bitfield_accesses only if TSAN is active?

I don't remember why it is disabled for all sanitizer modes. It seems you are right that disabling the option is only necessary for TSan. Do you have an actual need for the option to work under the other sanitizer modes?

chill added a comment.Aug 23 2019, 2:05 AM
In D36562#1641930, @wmi wrote:

Shouldn't we disable OPT_ffine_grained_bitfield_accesses only if TSAN is active?

I don't remember why it is disabled for all sanitizer modes. It seems you are right that disabling the option is only necessary for TSan. Do you have an actual need for the option to work under the other sanitizer modes?

Well, yes and no. We have the option enabled by default, and it causes a warning when we use it together with -fsanitize=memtag (we aren't really concerned with other sanitizers). That warning broke a few builds (e.g. CMake doing tests and not wanting to see *any* diagnostics). We can work around that in a number of ways, e.g. we can leave the default off for AArch64.

I'd prefer, though, to have an upstream solution if that's considered beneficial for all LLVM users, and this seems like such a case: let anyone use the option with sanitizers, unless it's known that some sanitizer's utility is negatively affected (as with TSan).

wmi added a comment.Aug 25 2019, 10:37 PM
In D36562#1641930, @wmi wrote:

Shouldn't we disable OPT_ffine_grained_bitfield_accesses only if TSAN is active?

I don't remember why it is disabled for all sanitizer modes. It seems you are right that disabling the option is only necessary for TSan. Do you have an actual need for the option to work under the other sanitizer modes?

Well, yes and no. We have the option enabled by default, and it causes a warning when we use it together with -fsanitize=memtag (we aren't really concerned with other sanitizers). That warning broke a few builds (e.g. CMake doing tests and not wanting to see *any* diagnostics). We can work around that in a number of ways, e.g. we can leave the default off for AArch64.

I'd prefer, though, to have an upstream solution if that's considered beneficial for all LLVM users, and this seems like such a case: let anyone use the option with sanitizers, unless it's known that some sanitizer's utility is negatively affected (as with TSan).

Thanks for providing the background in detail. I sent out a patch for it: https://reviews.llvm.org/D66726