This is an archive of the discontinued LLVM Phabricator instance.

IRGen: Add optnone attribute on function during O0
ClosedPublic

Authored by mehdi_amini on Jan 6 2017, 11:31 AM.

Details

Summary

Amongst other things, this will help LTO correctly handle/honor files
compiled with -O0, which helps when debugging failures.
It also seems in line with how we handle other options, like how
-fno-inline adds the appropriate attribute as well.
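As a rough illustration of what this amounts to at the IR level, here is a minimal sketch (not the actual patch; the real change goes through CodeGenModule's existing attribute logic and has to respect source-level optnone/always_inline):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

// Sketch only: at -O0, mark a function definition so that later (LTO)
// pipelines leave it alone. In LLVM IR, optnone is only valid together
// with noinline, so both attributes are added.
static void markFunctionAsOptNone(llvm::Function &F) {
  F.addFnAttr(llvm::Attribute::OptimizeNone);
  F.addFnAttr(llvm::Attribute::NoInline);
}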

Diff Detail

Repository
rL LLVM

Event Timeline

mehdi_amini retitled this revision from to IRGen: Add optnone attribute on function during O0.
mehdi_amini updated this object.
mehdi_amini added reviewers: chandlerc, rsmith.
mehdi_amini added subscribers: cfe-commits, dexonsmith.
mehdi_amini edited edge metadata.Jan 6 2017, 11:33 AM

Remove spurious change

Maybe instead, pass a flag to enable setting optnone on everything when the driver sees -O0 -flto? The patch as-is obviously has a massive testing cost, and it's easy to imagine people being tripped up by this in the future.

Maybe instead, pass a flag to enable setting optnone on everything when the driver sees -O0 -flto?

I'm not fond of this: limiting the discrepancy between LTO and non-LTO reduces LTO-specific bugs and reduces the maintenance burden of LTO.

The patch as-is obviously has a massive testing cost, and it's easy to imagine people being tripped up by this in the future.

Can you clarify what massive testing cost you're referring to?

The patch as-is obviously has a massive testing cost, and it's easy to imagine people being tripped up by this in the future.

Can you clarify what massive testing cost you're referring to?

Well, you just had to modify around 50 tests, and I'd expect some future tests to have to deal with it too. Maybe "massive" is overstating it but it seemed like an unusually large number.

I don't know that just slapping the option on all these tests is really the most appropriate fix, either, in some cases. I'll look at it more.

The patch as-is obviously has a massive testing cost, and it's easy to imagine people being tripped up by this in the future.

Can you clarify what massive testing cost you're referring to?

Well, you just had to modify around 50 tests, and I'd expect some future tests to have to deal with it too. Maybe "massive" is overstating it but it seemed like an unusually large number.

There are two things:

  • tests are modified: when adding a new option, that does not seem unusual to me
  • the impact on future testing: I still don't see any of this future "testing cost" you're referring to right now.

I don't know that just slapping the option on all these tests is really the most appropriate fix, either, in some cases. I'll look at it more.

For instance, the ARM tests rely on piping the output of clang to mem2reg to clean up the IR and have simpler check patterns (I assume). This is obviously not compatible with optnone.
On the other hand, I don't want to update the check lines for the > 20000 lines in clang/test/CodeGen/aarch64-neon-intrinsics.c just to save passing an option.
It's likely that some of these tests could have their check lines adapted, but I didn't see much interest in doing this.

mehdi_amini updated this revision to Diff 83433.Jan 6 2017, 2:42 PM
mehdi_amini edited edge metadata.

Fix minsize issue (conditional was reversed)

mehdi_amini updated this revision to Diff 83441.Jan 6 2017, 3:04 PM
mehdi_amini edited edge metadata.

Fix one more conflict with always_inline, and change some test check lines

The patch as-is obviously has a massive testing cost, and it's easy to imagine people being tripped up by this in the future.

Can you clarify what massive testing cost you're referring to?

Well, you just had to modify around 50 tests, and I'd expect some future tests to have to deal with it too. Maybe "massive" is overstating it but it seemed like an unusually large number.

There are two things:

  • tests are modified: when adding a new option, that does not seem unusual to me

50 seems rather more than usual, but whatever. Granted it's not hundreds.

  • the impact on future testing: I still don't see any of this future "testing cost" you're referring to right now.

Maybe I worry too much.

I am getting a slightly different set of test failures than you did though. I get these failures:
CodeGen/aarch64-neon-extract.c
CodeGen/aarch64-poly128.c
CodeGen/arm-neon-shifts.c
CodeGen/arm64-crc32.c

And I don't get these failures:
CodeGenCXX/apple-kext-indirect-virtual-dtor-call.cpp
CodeGenCXX/apple-kext-no-staticinit-section.cpp
CodeGenCXX/debug-info-global-ctor-dtor.cpp

clang/lib/CodeGen/CodeGenModule.cpp
900 ↗(On Diff #83391)

I'd set ShouldAddOptNone = false here, as it's already explicit.

clang/test/CodeGen/aarch64-neon-2velem.c
1 ↗(On Diff #83441)

Option specified twice.

chandlerc added inline comments.Jan 6 2017, 3:37 PM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
760–762 ↗(On Diff #83441)

At the point where we are doing 3 coupled calls in numerous places, we should add some routine to do this... Maybe we should have done that when I added the noinline bit.

I don't have a good idea of where best to do this -- as part of or as an alternative to SetInternalFunctionAttributes? Something else?

I'm imagining something like SetAlwaysInlinedRuntimeFunctionAttributes or something. Need a clang IRGen person to help push the organization in the right direction.
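A rough sketch of the kind of wrapper being suggested (the name is taken from the comment above; exactly which coupled calls it would fold together is an assumption here, not something the patch defines):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

// Hypothetical helper for internal runtime functions that must always be
// inlined: drop the -O0 noinline/optnone markers and force always_inline.
static void setAlwaysInlinedRuntimeFunctionAttributes(llvm::Function &Fn) {
  Fn.removeFnAttr(llvm::Attribute::NoInline);
  Fn.removeFnAttr(llvm::Attribute::OptimizeNone);
  Fn.addFnAttr(llvm::Attribute::AlwaysInline);
}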

clang/lib/CodeGen/CodeGenModule.cpp
899–900 ↗(On Diff #83441)

Unrelated (and unnecessary) formatting change?

910–912 ↗(On Diff #83441)

Is this still at all correct? Why? It seems pretty confusing, especially in conjunction with the code below.

I think this may force you to either:
a) stop early-marking of the -Os and -Oz flags with these attributes (early: prior to calling this routine) and handle all of the -O-flag-synthesized attributes here, or
b) set optnone for -O0 where we set optsize for -Os and friends, and then remove it where necessary here.

I don't have any strong opinion about a vs. b.

962 ↗(On Diff #83441)

Why is optnone incompatible with *cold*...?

892 ↗(On Diff #83391)

attributes prevents -> attributes prevent

Actually, what do you mean by attributes here? Or should this comment instead go below, where we start to branch on the actual 'hasAttr' calls?

After reading below, I understand better. Maybe:

// Track whether we need to add the optnone LLVM attribute,
// starting with the default for this optimization level.
probinson added inline comments.Jan 6 2017, 4:03 PM
clang/lib/CodeGen/CodeGenModule.cpp
962 ↗(On Diff #83441)

Because cold implies OptimizeForSize (just above this). I take no position on whether that is reasonable.

mehdi_amini updated this revision to Diff 83459.Jan 6 2017, 4:25 PM
mehdi_amini edited edge metadata.

Address comments: reorganize the way ShouldAddOptNone is handled, hopefully making it easier to track.

Also, after talking with Chandler on IRC, the source attribute "cold" does
not add the LLVM IR attribute "optsize" at O0 anymore; we add "optnone" instead.

mehdi_amini marked 6 inline comments as done.Jan 6 2017, 4:27 PM
mehdi_amini added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
760–762 ↗(On Diff #83441)

Yes, some refactoring of all this custom handling would be welcome. I'll take any pointers on how to do it in clang (I'm not familiar enough with clang).

clang/lib/CodeGen/CodeGenModule.cpp
910–912 ↗(On Diff #83441)

I believe it is still correct: during Os/Oz we reach this point and figure out that there is __attribute__((optnone)) in the *source* (not -O0); we remove the attributes and nothing changes. Did I miss something?

962 ↗(On Diff #83441)

The source attribute "Cold" adds llvm::Attribute::OptimizeForSize even at O0 right now, I changed this and now we emit optnone at O0 in this case.

892 ↗(On Diff #83391)

Actually I instead moved it all together.

probinson added inline comments.Jan 6 2017, 4:39 PM
clang/lib/CodeGen/CodeGenModule.cpp
896 ↗(On Diff #83459)

Period at the end of a comment.

900 ↗(On Diff #83459)

This block is redundant now? The same things are added in the next if block.

mehdi_amini marked 2 inline comments as done.Jan 6 2017, 4:44 PM
mehdi_amini added inline comments.
clang/lib/CodeGen/CodeGenModule.cpp
900 ↗(On Diff #83459)

Oh right! Will remove, thanks!

mehdi_amini updated this revision to Diff 83468.Jan 6 2017, 5:06 PM
mehdi_amini edited edge metadata.

Address Paul's comments (remove redundant block and add period at end of comment)

probinson added inline comments.Jan 6 2017, 5:18 PM
clang/lib/CodeGen/CodeGenModule.cpp
910–912 ↗(On Diff #83441)

Hmmm the Os/Oz attributes are added in CGCall.cpp, and are guarded with a check on the presence of the Optnone source attribute, so if the Optnone source attribute is present we should never see these. And Os/Oz set OptimizationLevel to 2, which is not zero, so we won't come through here for ShouldAddOptNone reasons either.
Therefore these 'remove' calls should be no-ops and could be removed. (For paranoia you could turn them into asserts, and do some experimenting to see whether I'm confused about how this all fits together.)

mehdi_amini added inline comments.Jan 6 2017, 6:12 PM
clang/lib/CodeGen/CodeGenModule.cpp
910–912 ↗(On Diff #83441)

The verifier already complains if we get this wrong, and indeed it complains if I remove these.
See clang/test/CodeGen/attr-func-def.c:

int foo1(int);

int foo2(int a) {
  return foo1(a + 2);
}

__attribute__((optnone))
int foo1(int a) {
    return a + 1;
}

Here we have the optnone attribute on the definition but not on the declaration, and the check you're mentioning in CGCall.cpp only applies to the declaration.

mehdi_amini updated this revision to Diff 83480.Jan 6 2017, 6:15 PM
mehdi_amini edited edge metadata.

Forgot to update test/CodeGen/attr-naked.c

chandlerc added inline comments.Jan 6 2017, 6:52 PM
clang/lib/CodeGen/CodeGenModule.cpp
910–912 ↗(On Diff #83441)

This is all still incredibly confusing code.

I think what would make me happy with this is to have a separate section for each mutually exclusive group of LLVM attributes added to the function. So:

// Add the relevant optimization level to the LLVM function.
if (...) {
  B.addAttribute(llvm::Attribute::OptimizeNone);
  F.removeFnAttr(llvm::Attribute::OptimizeForSize);
  ...
} else if (...) {
  B.addAttribute(llvm::Attribute::OptimizeForSize);
} else if (...) {
  ...
}

// Add the inlining control attributes.
if (...) {
  <whatever to set NoInline>
} else if (...) {
  <whatever to set AlwaysInline>
} else if (...) {
  <whatever to set inlinehint>
}

// Add specific semantic attributes such as 'naked' and 'cold'.
if (D->hasAttr<NakedAttr>()) {
  B.addAttribute(...::Naked);
}
if (D->hasAttr<ColdAttr>()) {
  ...
}

Even though this means testing the Clang-level attributes multiple times, I think it'll be much less confusing to read and update. We're actually already really close; we just need to hoist the non-inlining bits of optnone out, sink the naked attribute down, and hoist the cold sizeopt up.

mehdi_amini added inline comments.Jan 6 2017, 9:00 PM
clang/lib/CodeGen/CodeGenModule.cpp
910–912 ↗(On Diff #83441)

Since you answered below the example I gave above, I just want to be sure you understand that the attributes for the *declarations* are not even handled in the same file, right?
The "state machine" is cross-TU here, and it seems to me that what you're describing would require some refactoring between CGCall.cpp and CodeGenModule.cpp.

Over the weekend I had a thought: Why is -O0 so special here? That is, after going to all this trouble to propagate -O0 to LTO, how does this generalize to propagating -O1 or any other specific -O option? (Maybe this question would be better dealt with on the dev list...)

Over the weekend I had a thought: Why is -O0 so special here? That is, after going to all this trouble to propagate -O0 to LTO, how does this generalize to propagating -O1 or any other specific -O option? (Maybe this question would be better dealt with on the dev list...)

O0 is "special" like Os and Oz because we have an attribute for it and passes "know" how to handle this attribute.
I guess no-one cares enough about O1/O2/O3 to find a solution for these (in the context of LTO, I don't really care about O1/O2).
It is likely that Og would need a special treatment at some point, maybe with a new attribute as well, to inhibit optimization that can't preserve debug info properly.

probinson added a comment.EditedJan 9 2017, 11:41 AM

Over the weekend I had a thought: Why is -O0 so special here? That is, after going to all this trouble to propagate -O0 to LTO, how does this generalize to propagating -O1 or any other specific -O option? (Maybe this question would be better dealt with on the dev list...)

O0 is "special" like Os and Oz because we have an attribute for it and passes "know" how to handle this attribute.
I guess no-one cares enough about O1/O2/O3 to find a solution for these (in the context of LTO, I don't really care about O1/O2).
It is likely that Og would need a special treatment at some point, maybe with a new attribute as well, to inhibit optimization that can't preserve debug info properly.

"I don't care" doesn't seem like much of a principle.

Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended. I don't think -c -O0 should get this not-entirely-O0-like behavior.

Over the weekend I had a thought: Why is -O0 so special here? That is, after going to all this trouble to propagate -O0 to LTO, how does this generalize to propagating -O1 or any other specific -O option? (Maybe this question would be better dealt with on the dev list...)

O0 is "special" like Os and Oz because we have an attribute for it and passes "know" how to handle this attribute.
I guess no-one cares enough about O1/O2/O3 to find a solution for these (in the context of LTO, I don't really care about O1/O2).
It is likely that Og would need a special treatment at some point, maybe with a new attribute as well, to inhibit optimization that can't preserve debug info properly.

"I don't care" doesn't seem like much of a principle.

Long version is: "There is no use-case, no users, so I don't have much motivation to push it forward just for the sake of completeness". Does that sound like enough of a principle?

Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.

Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.

I don't think -c -O0 should get this not-entirely-O0-like behavior.

What is "not-entirely"? And why do you think that?

"I don't care" doesn't seem like much of a principle.

Long version is: "There is no use-case, no users, so I don't have much motivation to push it forward just for the sake of completeness". Does that sound like enough of a principle?

No. You still need to have adequate justification for your use case, which I think you do not.

Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.

Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.

IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.

In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.) But if your build system makes that easy, you can just as easily remove -flto as add -O0 and thus get the result you want without trying to pass conflicting options to the compiler. Or spending time implementing this patch.

I don't think -c -O0 should get this not-entirely-O0-like behavior.

What is "not-entirely"? And why do you think that?

"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.

In my experience, modifying source

Note that the source modification consists of adding #pragma clang optimize off to the top of the file. It is not a complicated thing.
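For illustration, the per-file opt-out looks like this (the function is just a placeholder, not something from the patch):

// Every function defined after this pragma is treated as if it carried
// __attribute__((optnone)).
#pragma clang optimize off

int slow_but_debuggable(int x) {  // placeholder function
  return x + 1;
}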

"I don't care" doesn't seem like much of a principle.

Long version is: "There is no use-case, no users, so I don't have much motivation to push it forward just for the sake of completeness". Does that sound like enough of a principle?

No. You still need to have adequate justification for your use case, which I think you do not.

I don't follow your logic.
IIUC, you asked about "why not support O1/O2/O3"; how is *not supporting* those because they're not useful / have no use-case related to whether "supporting O0 is useful"?

Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.

Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.

IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.

You're advocating for *rejecting* -O0-built modules at link time? We'd still need to detect this, though. The status quo isn't acceptable.

Also, that's not practical: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.

In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.)

Static library, separated projects, etc.
We have tons of users...

I don't think -c -O0 should get this not-entirely-O0-like behavior.

What is "not-entirely"? And why do you think that?

"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.

That's orthogonal: you're saying we are not handling it correctly yet, I'm just moving toward *fixing* all these.

Also, that's not practical: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.

Also: LTO is required for some features like CFI. There are users who want CFI+O0 during development (possibly for debugging a subcomponent of the app).

"I don't care" doesn't seem like much of a principle.

Long version is: "There is no use-case, no users, so I don't have much motivation to push it forward just for the sake of completeness". Does that sound like enough of a principle?

No. You still need to have adequate justification for your use case, which I think you do not.

I don't follow your logic.
IIUC, you asked about "why not support O1/O2/O3"; how is *not supporting* those because they're not useful / have no use-case related to whether "supporting O0 is useful"?

Upfront, it seemed peculiar to handle only one optimization level. After more thought, the whole idea of mixing -O0 and LTO seems wrong. Sorry, should have signaled that I had changed my mind about it.

Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.

Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.

IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.

You're advocating for *rejecting* -O0-built modules at link time? We'd still need to detect this, though. The status quo isn't acceptable.
Also, that's not practical: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.

No, I'm saying they are conflicting options on the same Clang command line.
As long as your linker can handle foo.o and bar.bc on the same command line, not a problem. (If your linker can't handle that, fix the linker first.)

In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.)

Static library, separated projects, etc.
We have tons of users...

Still waiting. Your up-front use case was about de-optimizing a module to assist debugging it within an LTO-built application, not building entire projects one way versus another. If that is not actually your use case, you need to start over with the correct description.

I don't think -c -O0 should get this not-entirely-O0-like behavior.

What is "not-entirely"? And why do you think that?

"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.

That's orthogonal: you're saying we are not handling it correctly yet, I'm just moving toward *fixing* all these.

It's not orthogonal; that's exactly how 'optnone' behaves today. If you have proposed a redesign of how to mix optnone and non-optnone functions in the same compilation unit, in some way other than what's done today, I am not aware of it; can you point to your proposal?

Also, that's not practical: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.

Also: LTO is required for some features like CFI. There are users who want CFI+O0 during development (possibly for debugging a subcomponent of the app).

Sorry, you lost me. CFI is part of DWARF and we do DWARF perfectly well without LTO (and at O0).

"I don't care" doesn't seem like much of a principle.

Long version is: "There is no use-case, no users, so I don't have much motivation to push it forward just for the sake of completeness". Does that sound like enough of a principle?

No. You still need to have adequate justification for your use case, which I think you do not.

I don't follow your logic.
IIUC, you asked about "why not support O1/O2/O3"; how is *not supporting* those because they're not useful / have no use-case related to whether "supporting O0 is useful"?

Upfront, it seemed peculiar to handle only one optimization level. After more thought, the whole idea of mixing -O0 and LTO seems wrong. Sorry, should have signaled that I had changed my mind about it.

You just haven't articulated 1) why it is wrong and 2) what we should do about it.

Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.

Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.

IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.

You're advocating for *rejecting* -O0-built modules at link time? We'd still need to detect this, though. The status quo isn't acceptable.
Also, that's not practical: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.

No, I'm saying they are conflicting options on the same Clang command line.
As long as your linker can handle foo.o and bar.bc on the same command line, not a problem. (If your linker can't handle that, fix the linker first.)

You just wrote above that "mixing -O0 and LTO" is wrong; *if* I were to agree with you at some point, then I'd make it a hard error.

In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.)

Static library, separated projects, etc.
We have tons of users...

Still waiting.

Waiting for what?
We have use-cases, I gave you a few (vendor static libraries are one). Again, if you think it is wrong to support O0 and LTO, then please elaborate.

I don't think -c -O0 should get this not-entirely-O0-like behavior.

What is "not-entirely"? And why do you think that?

"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.

That's orthogonal: you're saying we are not handling it correctly yet, I'm just moving toward *fixing* all these.

It's not orthogonal; that's exactly how 'optnone' behaves today. If you have proposed a redesign of how to mix optnone and non-optnone functions in the same compilation unit, in some way other than what's done today, I am not aware of it; can you point to your proposal?

I don't follow: IMO if I generate a module with optnone and pipe it to opt -O3 I expect no function IR to be touched. If it is not the case it is a bug.

Sorry, you lost me. CFI is part of DWARF and we do DWARF perfectly well without LTO (and at O0).

This CFI: http://clang.llvm.org/docs/ControlFlowIntegrity.html

I don't follow: IMO if I generate a module with optnone and pipe it to opt -O3 I expect no function IR to be touched. If it is not the case it is a bug.

Your opinion and expectation are not supported by the IR spec. Optnone skips "most" optimization passes. It is not practical (or was not, at the time) to make the -O3 pipeline behave exactly the same as the -O0 pipeline, and also not actually necessary to support the purpose for which 'optnone' was invented.

If you have a goal of making 'optnone' functions use the actual -O0 pipeline, while non-optnone functions use the optimizing pipeline, more power to you and you will need to take up that particular design challenge with Chandler first.

You just wrote above that "mixing -O0 and LTO" is wrong; *if* I were to agree with you at some point, then I'd make it a hard error.

Yes, I was not clear that I meant that -O0 -flto on the same clang command line just seems nonsensical. "Optimize my program without optimizing it" forsooth.

Upfront, it seemed peculiar to handle only one optimization level. After more thought, the whole idea of mixing -O0 and LTO seems wrong. Sorry, should have signaled that I had changed my mind about it.

You just haven't articulated 1) why it is wrong and 2) what we should do about it.

"Optimize without optimizing" really? Does not sound confused to you? Persuade me why it makes sense.

If it doesn't make sense, then yes making the -O0 -flto combination an error would be the right path.

Unless you are taking the position that -flto doesn't mean "use LTO" and instead means something else, like "emit bitcode" in which case you should be advocating to change the name of the option to say what it means.

In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.)

Static library, separated projects, etc.
We have tons of users...

Still waiting.

Waiting for what?
We have use-cases, I gave you a few (vendor static libraries are one). Again, if you think it is wrong to support O0 and LTO, then please elaborate.

Your original use-case described debugging a module in an application. You claimed it was simpler to change the build options for a module than to change the source, and I am still waiting to hear how/why that is simpler.

Your subsequent use cases are about entire sub-projects, which is entirely different and orthogonal to where you started. Please elaborate on the original use case.

Basically, I don't see why having clang always emit a real .o at -O0 would be a problem.
I haven't gotten through the other-CFI documentation yet though.

I don't follow: IMO if I generate a module with optnone and pipe it to opt -O3 I expect no function IR to be touched. If it is not the case it is a bug.

Your opinion and expectation are not supported by the IR spec. Optnone skips "most" optimization passes. It is not practical (or was not, at the time) to make the -O3 pipeline behave exactly the same as the -O0 pipeline, and also not actually necessary to support the purpose for which 'optnone' was invented.

If you have a goal of making 'optnone' functions use the actual -O0 pipeline, while non-optnone functions use the optimizing pipeline, more power to you and you will need to take up that particular design challenge with Chandler first.

Oh, maybe you are thinking of eliminating the -O0 pipeline? Because if -O0 implies optnone then it's kinda-sorta the same thing as the optimization pipeline operating on nothing but optnone functions? I'd think that would make -O0 compilations slow down, which would not be a feature.

Actually, as mentioned before, I could be fine with making O0 incompatible with LTO; however, a security feature like CFI (or other sorts of whole-program analyses/instrumentations) requires LTO.

Actually, as mentioned before, I could be fine with making O0 incompatible with LTO; however, a security feature like CFI (or other sorts of whole-program analyses/instrumentations) requires LTO.

Well, "requires LTO" is overstating the case, AFAICT from the link you gave me. Doesn't depend on optimization at all. It depends on some interprocedural analyses given some particular scope/visibility boundary, which it is convenient to define as a set of linked bitcode modules, that by some happy chance is the same set of linked bitcode modules that LTO will operate on.

If it's important to support combining a bitcode version of my-application with your-bitcode-library for this CFI or whatever, and you also want to let me have my-application be unoptimized while your-bitcode-library gets optimized, NOW we have a use-case. (Maybe that's what you had in mind earlier, but for some reason I wasn't able to extract that out of any prior comments. No matter.)

I'm now thinking along the lines of a -foptimize-off flag (bikesheds welcome) which would set the default for the pragma to 'off'. How is that different than what you wanted for -O0? It is defined in terms of an existing pragma, which is WAY easier to explain and WAY easier to implement. And, it still lets us say that -c -O0 -flto is a mistake, if that seems like a useful thing to say.

Does that seem reasonable? Fit your understanding of the needs?

I'm now thinking along the lines of a -foptimize-off flag (bikesheds welcome) which would set the default for the pragma to 'off'. How is that different than what you wanted for -O0? It is defined in terms of an existing pragma, which is WAY easier to explain and WAY easier to implement. And, it still lets us say that -c -O0 -flto is a mistake, if that seems like a useful thing to say.

Well, -O0 actually meaning "disable optimization", I found it "way easier" to handle everything the same way (pragma, command line, etc.). I kind of find it confusing for the user to have to differentiate -O0 from -foptimize-off. What is supposed to change between the two?

I'm now thinking along the lines of a -foptimize-off flag (bikesheds welcome) which would set the default for the pragma to 'off'. How is that different than what you wanted for -O0? It is defined in terms of an existing pragma, which is WAY easier to explain and WAY easier to implement. And, it still lets us say that -c -O0 -flto is a mistake, if that seems like a useful thing to say.

Well, -O0 actually meaning "disable optimization", I found it "way easier" to handle everything the same way (pragma, command line, etc.). I kind of find it confusing for the user to have to differentiate -O0 from -foptimize-off. What is supposed to change between the two?

There is a pedantic difference, rooted in the still-true factoid that O0 != optnone.
If we redefine LTO as "Link Time Operation" (rather than Optimization; see my reply to Duncan) then -O0 -flto is no longer an oxymoron, but using the attribute to imply the optimization level is still not good fidelity to what the user asked for.

chandlerc edited edge metadata.Jan 10 2017, 12:42 AM

I'm now thinking along the lines of a -foptimize-off flag (bikesheds welcome) which would set the default for the pragma to 'off'. How is that different than what you wanted for -O0? It is defined in terms of an existing pragma, which is WAY easier to explain and WAY easier to implement. And, it still lets us say that -c -O0 -flto is a mistake, if that seems like a useful thing to say.

Well, -O0 actually meaning "disable optimization", I found it "way easier" to handle everything the same way (pragma, command line, etc.). I kind of find it confusing for the user to have to differentiate -O0 from -foptimize-off. What is supposed to change between the two?

There is a pedantic difference, rooted in the still-true factoid that O0 != optnone.
If we redefine LTO as "Link Time Operation" (rather than Optimization; see my reply to Duncan) then -O0 -flto is no longer an oxymoron, but using the attribute to imply the optimization level is still not good fidelity to what the user asked for.

I have to say, I don't understand the confusion or problem here...

For me, the arguments you're raising against -O0 and -flto don't hold up on closer inspection:

  • O0 != optnone: correct. But this is only visible in LTO. And in LTO, Os != optsize, and Oz != minsize. But we use optsize and minsize to communicate between the compilation and the LTO step to the best of our ability the intent of the programmer. It appears we can use optnone exactly the same way here.
  • optnone isn't *really* no optimizations: clearly this is true, but then neither is -O0. We run the always inliner, a couple of other passes, and we run several parts of the code generator's optimizer. I understand why optnone deficiencies (i.e., too many optimizations) might be frustrating, but having *more users* seems likely to make this *better*.
  • There is no use case for -O0 + -flto: I really don't understand this. CFI and other whole program analysis or semantic transformations (*not* optimizations) require LTO but not any particular pipeline. And I *really* want the ability to bisect files going into an LTO build to chase miscompiles. There are large systems built to manipulate flags that are much more efficient and accessible than modifying source code. It seems an entirely reasonable (and quite low cost) feature. The fact that the LTO acronym stands for Link Time Optimization seems like a relatively unimportant thing. It is just an acronym and a name. We shouldn't let it preclude interesting use cases.

But all of this seems like an attempt to argue "you are wrong to have your use case". I personally find that an unproductive line of discussion. I would suggest instead we look at this differently:

For example, you might ask: could we find some other way to solve the problem you are trying to solve here? Suggesting an alternative approach would seem constructive. So far, all we've got is modify source code, but I think that there is a clear explanation of why that doesn't address the particular use case.

You might also ask: is supporting this feature a reasonable maintenance burden for Clang to address the use case? That seems like a productive discussion. For example, I *am* concerned about the increasing attribute noise at -O0. I don't think it is something to be dismissed. However, given the options we have today, it seems like the most effective way to address this use case and I don't have any better ideas to solve the problems Mehdi is solving here.

But I'm also not one of the most active maintainers writing patches, fixing bugs, and improving the IRGen layer. So ultimately, I defer on the maintenance issue to those maintainers.

For me, the arguments you're raising against -O0 and -flto don't hold up on closer inspection:

  • O0 != optnone: correct. But this is only visible in LTO. And in LTO, Os != optsize, and Oz != minsize. But we use optsize and minsize to communicate between the compilation and the LTO step to the best of our ability the intent of the programmer. It appears we can use optnone exactly the same way here.

If the design decision is that relevant optimization controls are propagated into bitcode as function attributes, I grumble but concede it will do something similar to what was requested.

It does bother me that we keep finding things that LTO needs to know but which it does not know because it runs in a separate phase of the workflow. I hope it is not a serious problem to ask "is there a more sensible way to fix this?" Maybe I'm not so good at expressing that so it comes out as a question rather than an objection, but that's basically what it is.

This design decision leaves -O1/-Og needing yet another attribute, when we get around to that, but I suppose Og would not have the interaction-with-other-attributes problems that optnone has.

  • optnone isn't *really* no optimizations: clearly this is true, but then neither is -O0. We run the always inliner, a couple of other passes, and we run several parts of the code generator's optimizer. I understand why optnone deficiencies (i.e., too many optimizations) might be frustrating, but having *more users* seems likely to make this *better*.

We have picked all the low-hanging fruit there, and probably some medium-hanging fruit. Mehdi did have the misunderstanding that optnone == -O0, and I think that was worth correcting.

  • There is no use case for -O0 + -flto:

The email thread has an exchange between Duncan and me, where I accept the use case.

But all of this seems like an attempt to argue "you are wrong to have your use case". I personally find that an unproductive line of discussion.

Not saying it was *wrong*, just that the description did not convey adequate justification. Listing a few project types does not constitute a use case. We did get to one, eventually, and it even involved differences in optimization levels.

For example, you might ask: could we find some other way to solve the problem you are trying to solve here?

There is another way to make use of the attribute, which I think will be more robust:

Have Sema pretend the pragma is in effect at all times, at -O0. Then all the existing conflict detection/resolution logic Just Works, and there's no need to spend 4 lines of code hoping to replicate the correct conditions in CodeGenModule.

Because Sema does not have a handle on CodeGenOptions and therefore does not a-priori know the optimization level, probably the right thing to do is move the flag to LangOpts and set it under the correct conditions in CompilerInvocation. It wouldn't be the first codegen-like option in LangOpts.
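A toy sketch of that direction, with hypothetical names (nothing here is part of the patch): derive the "pragma is always in effect" default once, where both the -O level and the language options are known, and let Sema's existing conflict logic consume it.

// Standalone model of the suggested wiring; all names are hypothetical.
struct LangOptionsSketch {
  bool OptimizeOffByDefault = false;  // "#pragma clang optimize off" everywhere
};

static void applyO0Default(LangOptionsSketch &LangOpts, unsigned OptLevel) {
  // Would be done in CompilerInvocation, which sees the -O level.
  if (OptLevel == 0)
    LangOpts.OptimizeOffByDefault = true;
}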

  • optnone isn't *really* no optimizations: clearly this is true, but then neither is -O0. We run the always inliner, a couple of other passes, and we run several parts of the code generator's optimizer. I understand why optnone deficiencies (i.e., too many optimizations) might be frustrating, but having *more users* seems likely to make this *better*.

We have picked all the low-hanging fruit there, and probably some medium-hanging fruit. Mehdi did have the misunderstanding that optnone == -O0, and I think that was worth correcting.

As it stands right now, there hasn't been any correction.
I still consider the fact that optnone doesn't produce the "same" result as O0 (modulo corner cases around merging global variables, for instance) a bug that needs to be fixed.

(Disabling passes at O0 for compile-time reasons remains a compile-time improvement; I never suggested we stop doing this...)

As it stands right now, there hasn't been any correction.
I still consider the fact that optnone doesn't produce the "same" result as O0 (modulo corner cases around merging global variables, for instance) a bug that needs to be fixed.

Why? That's not the purpose of optnone. You've already admitted there are some differences. Why are other differences important?

As it stands right now, there hasn't been any correction.
I still consider the fact that optnone doesn't produce the "same" result as O0 (modulo corner cases around merging global variables, for instance) a bug that needs to be fixed.

Why?

Why not? What's the alternative?
If we want to support -O0 -flto, and optnone is the way to convey this to the optimizer, I don't see the alternative.

probinson added a comment.EditedJan 10 2017, 11:45 AM

If we want to support -O0 -flto, and optnone is the way to convey this to the optimizer, I don't see the alternative.

optsize != -Os (according to Chandler)
minsize != -Oz (according to Chandler)
optnone != -O0 (according to both me and Chandler)

optnone is not "the way to convey (-O0) to the optimizer." Please get that misunderstanding out of your head. Clang handles -O0 by creating a short, minimalist pipeline, and running everything through it. Clang handles -O2 by creating a fuller optimization pipeline (with not just more, but *different* passes), and functions with 'optnone' skip many of the passes in the pipeline.

These are architecturally different processes, you are not going to be able to make 'optnone' behave exactly like -O0 without major redesign of how the pipelines work.
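For reference, the way a pass honors optnone today is simply an early bail-out on the attribute; a minimal sketch, not tied to any particular pass:

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

// A pass that respects optnone checks the attribute up front and leaves the
// function untouched; it does not switch to a different (-O0-like) pipeline.
static bool shouldSkipFunction(const llvm::Function &F) {
  return F.hasFnAttribute(llvm::Attribute::OptimizeNone);
}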

If we want to support -O0 -flto, and optnone is the way to convey this to the optimizer, I don't see the alternative.

optsize != -Os (according to Chandler)
minsize != -Oz (according to Chandler)
optnone != -O0 (according to both me and Chandler)

Of course, but that's just an implementation limitation, mostly for historical reasons I believe, not by design. That does not have to be set in stone, and I'm giving you the direction with respect to LTO in particular here: these attributes should be able to behave the same way as the corresponding '-O' command line.

optnone is not "the way to convey (-O0) to the optimizer." Please get that misunderstanding out of your head. Clang handles -O0 by creating a short, minimalist pipeline, and running everything through it. Clang handles -O2 by creating a fuller optimization pipeline, and functions with 'optnone' skip many of the passes in the pipeline.

Don't get me wrong: I believe I have a very good understanding of how the optimizer pipeline is set up and how the passes operate with respect to the attributes.
And it is because I understand the deficiencies (and how they are an issue with LTO) that I'm aligning all of this toward a consistent/coherent expected result for the users.

These are architecturally different processes, you are not going to be able to make 'optnone' behave exactly like -O0 without major redesign of how the pipelines work.

I'd disagree with your estimation of "major". It's not gonna be tomorrow, sure, but it does not have to be. The most difficult part will be the interprocedural ones, but there are not that many.

If we want to support -O0 -flto, and optnone is the way to convey this to the optimizer, I don't see the alternative.

optsize != -Os (according to Chandler)
minsize != -Oz (according to Chandler)
optnone != -O0 (according to both me and Chandler)

Of course, but that's just an implementation limitation, mostly for historical reasons I believe, not by design.

There is certainly a lot of history here influencing this, but I think there is also a fundamental difference. The flag is a request from the user to behave in a particular way. The LLVM attribute is a tool we use in some cases to try to satisfy that request.

When we're not doing LTO, it is easier to satisfy the requests of '-O' flags. The fact that we happen to not use attributes to do it today is just an implementation detail.

When we are doing LTO, satisfying different requests is hard. We should do our best, but I think it is reasonable to explain to the user that "with LTO, we can't fail to optimize with the Wombat optimization because of <reasons> when one file requests -O0 and another requests -O2". Same thing for the other levels.

This seems precisely analogous to the fact that even when the user requests -O0, we will do some inlining. Why? Because we *have to* for semantic reasons.

So I think what Mehdi is driving at is that if '-O0 -flto' has a mismatch from '-O0' in terms of what users expect, we should probably try to fix that. I'd suggest that there may be fundamental things we can't fix and that is OK. I don't think this is unprincipled either, we're doing the best we can to honor the user's request.

The other thing that might help is to point out that there *are* principles behind these flags. Unlike the differences between -O[123], all of -O0, -Os, and -Oz have non-threshold semantic implications. So with this change, I think we will have *all* the -O flags covered, because I view '-O[123]' as a single semantic space with a threshold modifier that we *don't* need to communicate to LTO. We model that state as the absence of any attribute. And -O0, -Os, and -Oz have dedicated attributes.

If we ever want to really push on -Og, that might indeed require an attribute to distinguish it.

optnone is not "the way to convey (-O0) to the optimizer."

So, I view '-O0' as a request from the programmer to turn off the optimizer to the extent possible and give them as naive, minimally transformed a representation of the code as possible.

And based on that, I view optnone as a tool to implement precisely these semantics at a fine granularity and with survivability across bitcode roundtrip.

It just isn't the *only* tool, and sometimes we can use an easier (and cheaper to Mehdi's compile time point) tool.

I think the text spec'ing optnone in the LLVM LangRef needs to be updated to reflect this, though. Currently it says:

This function attribute indicates that most optimization passes will skip this function, with the exception of interprocedural optimization passes.

This is demonstrably false:

% ag OptimizeNone lib/Transforms/IPO
lib/Transforms/IPO/ForceFunctionAttrs.cpp
47:      .Case("optnone", Attribute::OptimizeNone)

lib/Transforms/IPO/Inliner.cpp
813:    if (F.hasFnAttribute(Attribute::OptimizeNone))

lib/Transforms/IPO/InferFunctionAttrs.cpp
30:    if (F.isDeclaration() && !F.hasFnAttribute((Attribute::OptimizeNone)))

lib/Transforms/IPO/FunctionAttrs.cpp
1056:    if (F.hasFnAttribute(Attribute::OptimizeNone)) {
1137:    if (!F || F->hasFnAttribute(Attribute::OptimizeNone)) {

I'll send a patch.

I guess I'm getting irritated because people are trying to tell me what optnone means. I know what it means; I spent probably a whole year pushing to get it adopted.

Optnone means: When you are running optimizations, try not to optimize this part, if you can.

That's it. That's *all*. It has never meant anything else. Telling me different means you misunderstand, and trying to persuade me that *I* misunderstand is going to be a waste of time and effort.

I fully understand that this is not the definition of optnone that you *want*. Please feel free to propose a redefinition. But don't go telling me that the thing you *want* is what the thing already *is* and that any difference is a bug.

% ag OptimizeNone lib/Transforms/IPO
lib/Transforms/IPO/ForceFunctionAttrs.cpp
47:      .Case("optnone", Attribute::OptimizeNone)

This is implementing a debugging option, not skipping a pass.

lib/Transforms/IPO/Inliner.cpp
813:    if (F.hasFnAttribute(Attribute::OptimizeNone))

This is declining to operate on a callee, not skipping a pass. Given that optnone is supposed to be paired with noinline, wouldn't this be redundant?

lib/Transforms/IPO/InferFunctionAttrs.cpp
30:    if (F.isDeclaration() && !F.hasFnAttribute((Attribute::OptimizeNone)))

lib/Transforms/IPO/FunctionAttrs.cpp
1056:    if (F.hasFnAttribute(Attribute::OptimizeNone)) {
1137:    if (!F || F->hasFnAttribute(Attribute::OptimizeNone)) {

This is SCC stuff, and I don't really understand what it is trying to do. I wonder whether 'noinline' ought to be the relevant attribute, though.

@rsmith could you say whether it seems reasonable to have a LangOpts flag that basically means "pragma clang optimize off is always in effect." I think it would make the other optnone-related logic simpler. It would not be the only sort-of-codegen-related flag in LangOpts (e.g. the PIC/PIE stuff).

There is another way to make use of the attribute, which I think will be more robust:

Have Sema pretend the pragma is in effect at all times, at -O0. Then all the existing conflict detection/resolution logic Just Works, and there's no need to spend 4 lines of code hoping to replicate the correct conditions in CodeGenModule.

Because Sema does not have a handle on CodeGenOptions and therefore does not a-priori know the optimization level, probably the right thing to do is move the flag to LangOpts and set it under the correct conditions in CompilerInvocation. It wouldn't be the first codegen-like option in LangOpts.

Ping :)

To clarify my understanding of this thread, it seems like there are three ways forward here:

  1. To have -O0 add optnone to the generated functions (enabling some degree of lack of optimization of those functions even when used with -flto)
  2. To have -O0 -flto essentially turn off LTO (so that we get unoptimized objects directly for things we're debugging)
  3. Add a separate flag to make optnone the default

(1) is this patch. The disadvantage of (2) is that it also precludes CFI (and other whole-program transformations). This seems highly unfortunate at best and a non-starter in the general case. The disadvantage of (3) is that it might seem confusing to users (i.e. how to explain the difference between -O0 and -foptimize-off?) and is an unnecessary exposure of implementation details to users. On this point I agree.

It is true that -O0 != optnone, in a technical sense, but in the end, both are best effort. Moreover, there is a tradeoff between disabling optimization of the functions you don't want to optimize and keeping the remainder of the code as similar as possible to how it would be if everything were being optimized. What optnone provides seems like a reasonable point in that tradeoff space. I think that we should move forward with this approach.

Also note that @chandlerc in r290398 made clang add "noinline" on every function at O0 by default, which seems very similar to what I'm doing here.

We're still waiting for @rsmith to comment on whether it'd be better to have a LangOpts flag that basically means "pragma clang optimize off is always in effect" and have Sema pretend the pragma is in effect at all times at -O0.

Just to be explicit, I agree with Hal's summary. This seems like the right engineering tradeoff and I don't find anything particularly unsatisfying about it.

Also note that @chandlerc in r290398 made clang add "noinline" on every function at O0 by default, which seems very similar to what I'm doing here.

We're still waiting for @rsmith to comment on whether it'd be better to have a LangOpts flag that basically means "pragma clang optimize off is always in effect" and have Sema pretend the pragma is in effect at all times at -O0.

FWIW, I have no real opinion about this side of it, I see it more as a detail of how Clang wants to implement this kind of thing.

We're still waiting for @rsmith to comment on whether it'd be better to have a LangOpts flag that basically means "pragma clang optimize off is always in effect" and have Sema pretend the pragma is in effect at all times at -O0.

FWIW, I have no real opinion about this side of it, I see it more as a detail of how Clang wants to implement this kind of thing.

That was my suggestion as it seemed like this patch is essentially replicating the attribute-conflict detection logic that's in place for attributes specified in the source. And we do like to say DRY.
But I won't insist; the patch can proceed as far as I'm concerned.

MatzeB added a subscriber: MatzeB.May 25 2017, 2:07 PM

FWIW, I think this makes sense.
Moving O0 and optnone closer together seems sensible, even though -O3 with an optnone function indeed gives you different results today.
We are basically maintaining two things for the same "do not optimize" goal.
This obviously won't make O0 and optnone the same in today's pass managers, but it is a step in the right direction.

dexonsmith accepted this revision.May 25 2017, 4:38 PM

Actually, looking through the comments, it appears that everyone (eventually) agreed with the approach in the patch. I agree too. LGTM.

Mehdi, are you able to rebase and commit, or should someone take over?

This revision is now accepted and ready to land.May 25 2017, 4:38 PM
This revision was automatically updated to reflect the committed changes.