This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- lib/CodeGen/TargetInfo.cpp
- test/CodeGen/vector.c
- test/CodeGen/x86_32-arguments-darwin.c
- test/CodeGen/x86_32-arguments-linux.c
- test/CodeGen/x86_32-m64.c

Differential D59744

Fix i386 ABI "__m64" type bug
Needs Revision · Public

Authored by wxiao3 on Mar 23 2019, 9:08 PM.

Details

Reviewers

annita.zhang
LuoYuanke
smaslov
craig.topper
hjl.tools
RKSimon
rnk
andreadb
gbedwell
rjmccall
krytarowski
mgorny
joerg

Commits

rGfbfee60c3263: [X86] [ABI] Fix i386 ABI "__m64" type bug
rC363116: [X86] [ABI] Fix i386 ABI "__m64" type bug
rL363116: [X86] [ABI] Fix i386 ABI "__m64" type bug

Summary

According to the System V i386 ABI, __m64 parameters and return values are passed in MMX registers. However, the current implementation treats __m64 as i64, so parameters are passed on the stack and return values are returned in EDX and EAX.

This patch fixes the bug (https://bugs.llvm.org/show_bug.cgi?id=41029) for Linux
and NetBSD.

Diff Detail

Event Timeline

wxiao3 created this revision. Mar 23 2019, 9:08 PM

Herald added a project: Restricted Project. Mar 23 2019, 9:08 PM

Herald added a subscriber: cfe-commits.

craig.topper added reviewers: RKSimon, rnk. Mar 23 2019, 9:14 PM

wxiao3 edited the summary of this revision. Mar 24 2019, 1:10 AM

RKSimon added reviewers: andreadb, gbedwell. Mar 24 2019, 3:40 AM

Dear reviewers, any comments?

rnk added inline comments. Apr 4 2019, 2:49 PM

lib/CodeGen/TargetInfo.cpp
9488–9489	The Sys V rules apply to every non-Windows OS, not just Linux. I think you should add the parameter regardless of the OS.

RKSimon added inline comments. Apr 8 2019, 2:12 AM

test/CodeGen/x86_32-mmx-linux.c
2 ↗	(On Diff #192021)	Test on more triples, and add the test file to trunk with the current codegen so that this patch shows the delta.

wxiao3 updated this revision to Diff 195096. Apr 14 2019, 10:58 PM

wxiao3 marked 2 inline comments as done.

RKSimon added inline comments. Apr 15 2019, 10:19 AM

test/CodeGen/x86_32-m64-darwin.c
1 ↗	(On Diff #195096)	You should be able to merge all of these triples into the same test file, each with its own RUN: line. You will need to add a FileCheck prefix, something like:
| FileCheck %s --check-prefixes=CHECK,DARWIN
| FileCheck %s --check-prefixes=CHECK,IAMCU
| FileCheck %s --check-prefixes=CHECK,LINUX
| FileCheck %s --check-prefixes=CHECK,WIN32

wxiao3 updated this revision to Diff 195291. Apr 15 2019, 7:24 PM

wxiao3 marked an inline comment as done.

One last style comment from me, but we need somebody more familiar with the different ABIs to give final approval.

lib/CodeGen/TargetInfo.cpp
1416	superfluous braces?

wxiao3 updated this revision to Diff 195659. Apr 17 2019, 6:25 PM

wxiao3 updated this revision to Diff 195660. Apr 17 2019, 6:33 PM

wxiao3 marked an inline comment as done.

rnk added subscribers: dexonsmith, rjmccall. Apr 22 2019, 5:27 PM

rnk added inline comments.

lib/CodeGen/TargetInfo.cpp
919	I think looking at the LLVM type to decide how something should be passed is a bad pattern to follow. We should look at the clang AST to decide how things will be passed, not LLVM types. Would that be complicated? Are there aggregate types that end up getting passed directly in MMX registers?
9490	I think this needs to preserve existing behavior for Darwin and PS4 based on comments from @rjmccall and @dexonsmith in D60748.

wxiao3 marked 2 inline comments as done. Apr 27 2019, 8:08 AM

wxiao3 added inline comments.

lib/CodeGen/TargetInfo.cpp
919	For the 32-bit x86 target, no aggregate types end up being passed in MMX registers. The only type passed in MMX registers is __m64, which is defined in the header file (mmintrin.h): typedef long long __m64 __attribute__((__vector_size__(8), __aligned__(8))); Yes, it would be good to define __m64 as a builtin type and handle it at the AST level, but I'm afraid that won't be trivial work. Since GCC also handles __m64 the same way Clang currently does, can we just keep the current implementation as it is?
9490	ok, I will follow it.
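For reference, the __m64 typedef quoted above can be reproduced with the GCC/Clang vector extensions. The sketch below redeclares it under a hypothetical name (`my_m64`, so that it does not clash with `<mmintrin.h>`) next to a four-lane `uint16_t` vector. It shows that both are 8-byte vectors with integer elements and share the same 8-byte representation. It assumes a GCC- or Clang-compatible compiler.

```c
#include <stdint.h>
#include <string.h>

/* Same definition as the __m64 typedef in mmintrin.h, under a
   hypothetical name so it does not clash with the real header. */
typedef long long my_m64 __attribute__((__vector_size__(8), __aligned__(8)));

/* A four-lane uint16_t vector built with the generic vector extension. */
typedef uint16_t u16x4 __attribute__((vector_size(8)));

/* Reinterpret the 8 bytes of a u16x4 as a my_m64 (a bit copy, not a
   lane-by-lane value conversion). */
static my_m64 as_m64(u16x4 v) {
    my_m64 out;
    memcpy(&out, &v, sizeof out);
    return out;
}
```

Both types occupy a single 8-byte slot. Whether an ABI treats them alike depends on how it classifies "8-byte integer vectors", which is the question the later discussion turns on.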

rnk added inline comments. May 1 2019, 1:09 PM

lib/CodeGen/TargetInfo.cpp
919	That's not quite what I'm suggesting. I'm saying that IsX86_MMXType should take a QualType parameter, and it should check if that qualtype looks like the __m64 vector type, instead of converting the QualType to llvm::Type and then checking if the llvm::Type is a 64-bit vector. Does that seem reasonable? See the code near the call site conditionalized on IsDarwinVectorABI which already has similar logic.

RKSimon resigned from this revision. May 2 2019, 3:15 AM

wxiao3 updated this revision to Diff 199233. May 13 2019, 3:37 AM

wxiao3 marked an inline comment as done. May 13 2019, 3:46 AM

wxiao3 added inline comments.

lib/CodeGen/TargetInfo.cpp
919	Yes, it's unnecessary to convert QualType to llvm::Type just to check for the __m64 vector type. Since it's very simple to check for __m64 based on QualType, given the preceding type check if (const VectorType *VT = RetTy->getAs<VectorType>()), I have removed the utility function IsX86_MMXType.
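As a rough, self-contained model of the check described here (not Clang's code, which operates on `QualType`/`VectorType`), the hypothetical `VecDesc` struct below stands in for a vector type's element kind, element width, and lane count. `looks_like_m64` accepts exactly the 64-bit vectors with integer elements. That set includes `<1 x i64>`, the shape of __m64 itself. The older `IsX86_MMXType` in Diff 195291 excluded `<1 x i64>` through its `getScalarSizeInBits() != 64` condition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vector type: element kind, element width
   in bits, and number of lanes. */
typedef struct {
    bool elem_is_integer;
    unsigned elem_bits;
    unsigned lanes;
} VecDesc;

/* Model of the check: a vector whose total size is 64 bits and whose
   elements are integers. <1 x i64> (__m64), <2 x i32>, <4 x i16> and
   <8 x i8> all qualify; float vectors and other sizes do not. */
static bool looks_like_m64(VecDesc v) {
    return v.elem_is_integer && v.elem_bits * v.lanes == 64;
}
```

With this model, a `uint16_t` vector of four lanes qualifies just as __m64 does.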

wxiao3 added a reviewer: rjmccall. Jun 1 2019, 7:50 PM

wxiao3 added a reviewer: krytarowski.

wxiao3 updated this revision to Diff 202579. Jun 1 2019, 7:53 PM

wxiao3 edited the summary of this revision.

Hi all,

In the latest version, I have made the following changes based on your comments:

- Only apply the fix to Linux, where many libraries are built by GCC.
- Avoid converting the QualType to llvm::Type and then checking whether the llvm::Type is a 64-bit vector, which is unnecessary and inefficient. I have also removed the utility function IsX86_MMXType, since it's very simple to check for the __m64 type based on QualType.

Ok for merge now?

The SysV ABI is used not only by Linux but by most non-Windows systems (BSDs, Haiku, ...).

Since other systems (e.g. Darwin, PS4 and FreeBSD) don't currently want to spend any effort dealing with the ramifications of ABI breaks (as discussed in https://reviews.llvm.org/D60748), I only fix the bug for Linux. If another system wants the fix, all that's needed is to add a flag (like "IsLinuxABI") to enable it.

In D59744#1527412, @wxiao3 wrote:

Since other systems (e.g. Darwin, PS4 and FreeBSD) don't currently want to spend any effort dealing with the ramifications of ABI breaks (as discussed in https://reviews.llvm.org/D60748), I only fix the bug for Linux. If another system wants the fix, all that's needed is to add a flag (like "IsLinuxABI") to enable it.

CC @mgorny and @joerg - do we want this for NetBSD?

In D59744#1529182, @krytarowski wrote:

CC @mgorny and @joerg - do we want this for NetBSD?

Probably yes. FWICS, gcc uses %mm0 and %mm1 on NetBSD while clang doesn't.

In D59744#1529218, @mgorny wrote:

Probably yes. FWICS, gcc uses %mm0 and %mm1 on NetBSD while clang doesn't.

Unless Joerg objects, @wxiao3, please enable it on NetBSD as well. Personally, though, I would enable it unconditionally for all SysV ABIs.

I think MMX code is obscure enough at this point that it doesn't matter much either way. Even less across DSO boundaries. That's why I don't really care either way.

mgorny added inline comments. Jun 6 2019, 6:33 AM

lib/CodeGen/TargetInfo.cpp
1013	Maybe replace the two booleans with something alike `IsPassInMMXRegABI`? And while at it, include NetBSD there, please.

wxiao3 updated this revision to Diff 203492. Jun 6 2019, 9:36 PM

wxiao3 edited the summary of this revision.

wxiao3 added a reviewer: mgorny.

wxiao3 added a reviewer: joerg.

Thanks for the suggestions!
I have updated the patch accordingly.

Ok for merge now?

rjmccall added inline comments. Jun 10 2019, 4:15 PM

lib/CodeGen/TargetInfo.cpp
1013	`CGT` is a member variable, so you can just query the target fresh in your `isPassInMMXRegABI` method. The check upfront for a 64-bit vector type should keep this well out of the fast path.

Thanks for the suggestions!
I have updated it.

Ok now?

Minor comments, then LGTM.

lib/CodeGen/TargetInfo.cpp
1102	"The System V i386 psABI requires __m64 to be passed in MMX registers. Clang historically had a bug where it failed to apply this rule, and some platforms (e.g. Darwin, PS4, and FreeBSD) have opted to maintain compatibility with the old Clang behavior, so we only apply it on platforms that have specifically requested it (currently just Linux and NetBSD)."
1415	Indentation on the continuation line.
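The suggested source comment applies the psABI rule only on platforms that have opted in, currently Linux and NetBSD. Combined with the earlier advice to do the cheap 64-bit vector test before querying the target, the gating can be modelled as below. The enum values and function names are hypothetical placeholders, not Clang's API, and `OTHER_RULES` simply means "whatever the existing classification decides".

```c
#include <stdbool.h>

/* Hypothetical subset of target operating systems. */
typedef enum { OS_LINUX, OS_NETBSD, OS_FREEBSD, OS_DARWIN, OS_PS4 } TargetOS;

typedef enum {
    PASS_IN_MMX, /* System V psABI rule: pass/return in an MMX register */
    OTHER_RULES  /* fall through to the existing (historical) lowering */
} VectorLowering;

/* Platforms that have explicitly requested the psABI behaviour. */
static bool is_pass_in_mmx_reg_abi(TargetOS os) {
    return os == OS_LINUX || os == OS_NETBSD;
}

/* Cheap type test first, so the target query stays off the common path. */
static VectorLowering classify_vector(bool elem_is_integer,
                                      unsigned size_bits, TargetOS os) {
    if (elem_is_integer && size_bits == 64 && is_pass_in_mmx_reg_abi(os))
        return PASS_IN_MMX;
    return OTHER_RULES;
}
```

Because `&&` short-circuits, the OS query runs only for 64-bit integer vectors, which matches the point that the upfront size check keeps this out of the fast path.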

This revision is now accepted and ready to land. Jun 11 2019, 9:56 AM

Thanks for the comments!
Updated for landing.

pengfei added a subscriber: pengfei. Jun 11 2019, 6:10 PM

Closed by commit rL363116: [X86] [ABI] Fix i386 ABI "__m64" type bug (authored by pengfei). Jun 11 2019, 6:50 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Jun 11 2019, 6:50 PM

Herald added a subscriber: llvm-commits.

I tried replying on the cfe-commits email, but both Pengfei and Wei's email addresses seem to bounce, so replying here instead:

This broke Chromium for 32-bit Linux:
https://bugs.chromium.org/p/chromium/issues/detail?id=974542#c5

It's not clear what's happening yet, but I wanted to give a heads-up.

Since this changes ABI behaviour, it would probably be worth mentioning in docs/ReleaseNotes.rst too.

@hans

Please make sure all Chromium libraries for 32-bit Linux follow the System V ABI (i.e., __m64 is passed in an MMX register). I suspect there is some hand-written assembly code in your libraries that does not follow the ABI.

In D59744#1547445, @wxiao3 wrote:

@hans

Please make sure all Chromium libraries for 32-bit Linux follow the System V ABI (i.e., __m64 is passed in an MMX register). I suspect there is some hand-written assembly code in your libraries that does not follow the ABI.

That's likely true, but also not very helpful, since the ABI implications here are pretty big (see the comments on the Chromium bug). It's also currently impossible to write an assembly function that works with both trunk clang and clang 8.0.0, which makes it difficult to update compilers independently of changing the code. Clang generally tries to be ABI-compatible with itself. This should probably support the existing -fclang-abi-compat= flag at least (and have a release notes entry); possibly there should be a dedicated flag for this.

I have created a patch for you: https://reviews.llvm.org/D63473
Is it ok?

In D59744#1547445, @wxiao3 wrote:

@hans

Please make sure all Chromium libraries for 32-bit Linux follow the System V ABI (i.e., __m64 is passed in an MMX register). I suspect there is some hand-written assembly code in your libraries that does not follow the ABI.

We still don't have the root cause, but the library in question (Skia) doesn't have much assembly code. After your patch, %st0 (which aliases with %mm0) gets clobbered if a function returns a 4 x u16 vector. Skia tries to work around this by force-inlining such functions, but we're still seeing functions where %mm0 gets used. We believe this is the cause, but I'm still trying to figure out where the remaining %mm0 uses come from.

Hey folks, I'm the Skia point of contact on this, and "luckily" the person who wrote all the code that got us into this mess. Let me cross post a couple questions I've had from the Chromium bug over here where folks might know the answer...

Now that Clang's decided to match GCC's behavior of using mm0 to pass around 8-byte vectors on x86-32, is there any way to use 8-byte vector types safely any more? I don't really have the full context of this Clang change, but is it maybe a good idea applied to too many types? I notice the change mentions m64, but here I'm using uint16_t ext_vector_type(4) exclusively, never m64 or even an 8x8 vector... can we just squint and say u16x4 and m64 aren't the same, passing m64 according to the ABI but vector extensions however we were doing it before? Or can we work out some sort of ABI that preserves st0/mm0? I think we're finding that even with forced-inlining, at -O0 we still end up getting u16x4 values stored in mm0 briefly (kind of roundabout through xmm registers and the stack once or twice too).

In short, should working with 4x u16 be safe on x86-32 and there's a bug / undefined behavior in my code leading to this mm0/st0 clobber, or is this just actually not really spec'd to work?

In D59744#1548540, @mtklein wrote:

Hey folks, I'm the Skia point of contact on this, and "luckily" the person who wrote all the code that got us into this mess. Let me cross post a couple questions I've had from the Chromium bug over here where folks might know the answer...

Now that Clang's decided to match GCC's behavior of using mm0 to pass around 8-byte vectors on x86-32, is there any way to use 8-byte vector types safely any more? I don't really have the full context of this Clang change, but is it maybe a good idea applied to too many types? I notice the change mentions m64, but here I'm using uint16_t ext_vector_type(4) exclusively, never m64 or even an 8x8 vector... can we just squint and say u16x4 and m64 aren't the same, passing m64 according to the ABI but vector extensions however we were doing it before?

__m64 is of course defined using the compiler's vector extensions. More importantly, GCC also has those vector extensions (or at least some of them), and my understanding is that GCC is interpreting the ABI's __m64 to mean "all 8-byte vectors" (which seems quite reasonable to me), and that's what Clang needs to stay compatible with on systems where GCC is the system compiler.

Now, we could theoretically use a different ABI rule for vectors defined with Clang-specific extensions, but that seems like it would cause quite a few problems of its own.

In short, should working with 4x u16 be safe on x86-32 and there's a bug / undefined behavior in my code leading to this mm0/st0 clobber, or is this just actually not really spec'd to work?

It's always possible that there's a bug in the compiler, but the most likely thing is that you have assembly code that's not obeying the ABI in some way.

Now, we could theoretically use a different ABI rule for vectors defined with Clang-specific extensions, but that seems like it would cause quite a few problems of its own.

I think we can't reasonably impose this ABI rule on vectors defined with ext_vector_type: that makes it impossible to build portable OpenCL code for 32-bit x86, given the side-effects of introducing any use of the x86_mmx type. So that leaves us with two options: make vector_size and ext_vector_type incompatible, or revert this patch and intentionally remain ABI-incompatible with gcc. (I guess we could theoretically try to separate out a special case for OpenCL instead, but that seems even more fragile.)

Being ABI-incompatible is obviously inconvenient if you're writing code using MMX types/intrinsics, but using MMX intrinsics is sort of "at your own risk" anyway, given neither LLVM nor gcc properly manages the state of the MMX/x87 register file.

In D59744#1548919, @efriedma wrote:

I think we can't reasonably impose this ABI rule on vectors defined with ext_vector_type: that makes it impossible to build portable OpenCL code for 32-bit x86, given the side-effects of introducing any use of the x86_mmx type.

Sorry, I've remained somewhat intentionally ignorant of the issues here. Are you saying that using MMX in LLVM requires source-level workarounds in some way, and so we can't lower portable code to use MMX because that code will (reasonably) lack those workarounds? If that's true, then fixing that seems like a blocker to landing this patch; it is better to be ABI-non-compliant than to produce broken code.

Are you saying that using MMX in LLVM requires source-level workarounds in some way, and so we can't lower portable code to use MMX because that code will (reasonably) lack those workarounds?

Yes.

The x86 architecture requires that a program execute an "emms" instruction between any MMX instructions and any x87 instructions. Otherwise, the x87 instructions will produce nonsense results. LLVM and other compilers never insert emms automatically. This is partly historical, but also because emms can be expensive on Intel chips. Instead, the user is expected to call _mm_empty() in appropriate places.
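Here is a minimal sketch of the discipline described above. It assumes an x86 target and a compiler that ships `<mmintrin.h>` (GCC or Clang). The MMX work is finished and `_mm_empty()` is called before the function returns, so that later x87 code sees an empty register stack. That later code could be `long double` arithmetic, or any floating-point return on i386. The function names are illustrative only.

```c
#include <mmintrin.h>

/* Adds four 16-bit lanes with MMX intrinsics, extracts the low 32 bits,
   then executes EMMS via _mm_empty() before any x87 code can run. */
static int add_pi16_low32(void) {
    __m64 a = _mm_set_pi16(4, 3, 2, 1);     /* lanes (low..high): 1, 2, 3, 4 */
    __m64 b = _mm_set_pi16(40, 30, 20, 10); /* lanes: 10, 20, 30, 40 */
    __m64 sum = _mm_add_pi16(a, b);         /* lanes: 11, 22, 33, 44 */
    int low = _mm_cvtsi64_si32(sum);        /* lane0 | (lane1 << 16) */
    _mm_empty();                            /* mark x87 registers empty again */
    return low;
}

/* x87-based arithmetic that could misbehave if it ran with the tag word
   still marked "full" by earlier MMX instructions. */
static long double scale(long double x) { return x * 2.0L; }
```

The ordering is the important part: every MMX value is consumed before `_mm_empty()`, and no x87 operation is placed between the MMX instructions and the EMMS.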

To allow users to generate arbitrary vector IR without tripping over this, LLVM does not lower vector IR to MMX instructions. Instead, it generates MMX instructions only for operations that use the special type x86_mmx. If any instruction or argument has a result or operand of type x86_mmx in LLVM IR, the user must explicitly execute emms (@llvm.x86.mmx.emms() in IR, _mm_empty() in C) between that instruction and any code that might use x87 registers. "Between" isn't really sound, because the emms intrinsic doesn't reliably prevent code motion of floating-point operations, but it works well enough in practice. (See also https://bugs.llvm.org/show_bug.cgi?id=35982 .)

On the clang side, without this patch, we only generate code using the x86_mmx type in a couple places: _mm_* calls, and inline asm with an MMX operand. If the user does not use either of those, there will never be any values of type x86_mmx, so there will never be any MMX instructions, and we avoid the whole mess. 64-bit vector operations get lowered to SSE2 instructions instead (or scalarized).

This patch introduces a new place where clang will generate the type x86_mmx: for call arguments and return values. This means more places where the user is required to write _mm_empty() to get correct behavior.

Ah, thank you for that explanation. That's got to be exactly what we're tripping over in Chromium / Skia.

Thank you. So it sounds like this patch needs to be reverted, and the correct version of it will have to insert these intrinsic calls in four places:

- before translating vector arguments to MMX type before calls that pass __m64 arguments,
- after translating MMX parameters to vector type in functions that receive __m64 parameters,
- before translating vector results to MMX type in functions that return __m64, and
- after translating MMX results to vector type after calls that return __m64.

Will that be sufficient to satisfy LLVM?

If we're going to insert emms instructions automatically, it doesn't really make sense to do it in the frontend; the backend could figure out the most efficient placement itself. (See lib/Target/X86/X86VZeroUpper.cpp, which implements similar logic for AVX.) The part I'd be worried about is the potential performance hit from calling emms in places where other compilers wouldn't, for code using MMX intrinsics.

In D59744#1549229, @efriedma wrote:

If we're going to insert emms instructions automatically, it doesn't really make sense to do it in the frontend; the backend could figure out the most efficient placement itself. (See lib/Target/X86/X86VZeroUpper.cpp, which implements similar logic for AVX.) The part I'd be worried about is the potential performance hit from calling emms in places where other compilers wouldn't, for code using MMX intrinsics.

It would certainly be simpler for the frontend if the backend did this — in fact, even if the "frontend" was going to do it, I would have suggested doing it as a pass over the emitted IR rather than a special case in IRGen. Anyway, I'm open to any reasonable option; at this point, I'm just laying out the basic requirements for getting this patch back in, because the current patch is invalid given LLVM's current requirements.

I'm just laying out the basic requirements for getting this patch back in, because the current patch is invalid given LLVM's current requirements.

Yes, I'm on the same page.

Can anyone provide some small reproducer code for the issue that Chromium / Skia tripped over?

In D59744#1549675, @wxiao3 wrote:

Can anyone provide some small reproducer code for the issue that Chromium / Skia tripped over?

Sorry, I don't have a small repro yet. I'm still working on finding out exactly what's happening in Chromium, but it's a large test. It's easy to find where the x87 state gets clobbered after your change, but I haven't found what code was depending on that state yet.

I've raised https://bugs.llvm.org/show_bug.cgi?id=42319, which suggests creating an EMMS insertion pass.

In D59744#1549746, @hans wrote:

Sorry, I don't have a small repro yet. I'm still working on finding out exactly what's happening in Chromium, but it's a large test. It's easy to find where the x87 state gets clobbered after your change, but I haven't found what code was depending on that state yet.

Oh, I thought the problem was just that the registers alias, not that the whole x87 state gets messed up by mmx instructions. Here's a simple repro:

$ cat /tmp/a.c
#include <stdint.h>
#include <stdio.h>

#ifdef __clang__
typedef uint16_t __attribute__((ext_vector_type(4))) V;
#else
typedef uint16_t V __attribute__ ((vector_size (4*sizeof(uint16_t))));
#endif

V f() {
  V v = { 1,2,3,4 };
  return v;
}

double d() { return 3.14; }

int main() {
  f();
  printf("%lf\n", d());
  return 0;
}

$ bin/clang -m32 -O0 /tmp/a.c && ./a.out
-nan

Before your change, it prints 3.140000.

Chromium was previously working around this problem in gcc by force-inlining f() into main(). That doesn't work with Clang because it touches %mm0 even after inlining.

I've reverted in r363790 until a solution can be found.

RKSimon reopened this revision. Jun 19 2019, 4:39 AM

This revision is now accepted and ready to land. Jun 19 2019, 4:39 AM

RKSimon requested changes to this revision. Jun 19 2019, 4:39 AM

This revision now requires changes to proceed. Jun 19 2019, 4:39 AM

$ bin/clang -m32 -O0 /tmp/a.c && ./a.out
-nan
Before your change, it prints 3.140000.

I looked through the Intel manual to understand what's happening in detail:

When we return from f() with the new ABI, we write to the %mm0 register, and as a side effect:

(9.5.1) After each MMX instruction, the entire x87 FPU tag word is set to valid (00B).

What does that mean?

(8.1.7) "The x87 FPU uses the tag values to detect stack overflow and underflow conditions (see Section 8.5.1.1)"

(8.5.1.1) "Stack overflow — An instruction attempts to load a non-empty x87 FPU register from memory. A non-empty register is defined as a register containing a zero (tag value of 01), a valid value (tag value of 00), or a special value (tag value of 10).

When the x87 FPU detects stack overflow or underflow, it sets the IE flag (bit 0) and the SF flag (bit 6) in the x87 FPU status word to 1. It then sets condition-code flag C1 (bit 9) in the x87 FPU status word to 1 if stack overflow occurred or to 0 if stack underflow occurred.
If the invalid-operation exception is masked, the x87 FPU returns the floating point, integer, or packed decimal integer indefinite value to the destination operand, depending on the instruction being executed. This value overwrites the destination register or memory location specified by the instruction."

Okay, so essentially any MMX instruction marks the x87 register stack as full, and when we try to store into it in d() with "fldl" we get a stack overflow, and because the exception is masked, it stores "the floating point indefinite value" into the register, which is what we end up printing.
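The reading of the manual above can be condensed into a toy simulation. It models only the rules quoted here and is not a description of the hardware beyond them. There are eight 2-bit tags (00 valid, 11 empty). An MMX operation sets every tag to valid, and EMMS sets every tag back to empty. An FLD-like push decrements TOP and overflows if the destination register is not empty, storing a NaN in place of the "indefinite" value when the exception is masked. All names are hypothetical, and `double` stands in for the 80-bit registers.

```c
#include <math.h>
#include <stdbool.h>

/* Tag encodings from the manual text quoted above; 11B means "empty". */
enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

typedef struct {
    unsigned char tag[8]; /* one 2-bit tag per physical register */
    double reg[8];        /* register contents (double, not 80-bit) */
    unsigned top;         /* physical index of ST(0) */
} ToyX87;

/* Start from an empty register stack. */
static void toy_reset(ToyX87 *f) {
    for (int i = 0; i < 8; i++) f->tag[i] = TAG_EMPTY;
    f->top = 0;
}

/* Any MMX instruction: the whole tag word becomes "valid" (00B). */
static void toy_mmx_op(ToyX87 *f) {
    for (int i = 0; i < 8; i++) f->tag[i] = TAG_VALID;
}

/* EMMS: the whole tag word becomes "empty" (11B). */
static void toy_emms(ToyX87 *f) {
    for (int i = 0; i < 8; i++) f->tag[i] = TAG_EMPTY;
}

/* FLD-like push: decrement TOP (mod 8). If the target register is not
   empty, this is a stack overflow; with the exception masked, a NaN
   (standing in for the indefinite value) is stored instead of v. */
static bool toy_fld(ToyX87 *f, double v) {
    f->top = (f->top + 7) % 8;
    bool overflow = f->tag[f->top] != TAG_EMPTY;
    f->reg[f->top] = overflow ? NAN : v;
    f->tag[f->top] = overflow ? TAG_SPECIAL : TAG_VALID;
    return overflow;
}

/* Mirrors the repro: optionally touch MMX (as returning from f() does
   with the patch), optionally run EMMS, then load a double (as d() does).
   Returns the value left in ST(0). */
static double load_after(bool touch_mmx, bool run_emms, double v) {
    ToyX87 f;
    toy_reset(&f);
    if (touch_mmx) toy_mmx_op(&f);
    if (run_emms) toy_emms(&f);
    toy_fld(&f, v);
    return f.reg[f.top];
}
```

In this model, `load_after(true, false, 3.14)` yields a NaN, like the `-nan` printed by the repro. Inserting an EMMS, as in `load_after(true, true, 3.14)`, gives back 3.14.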

At least I finally think I understand what's going on :-)

-O0 always inline isn't working because the frontend is emitting a store of vector type to memory then a load of x86_mmx to do the type coercion. The caller does the opposite to coerce back from mmx. This -O0 pipeline isn't capable of getting rid of these redundant store/load pairs. We might have a better chance if we just emitted bitcasts.

Revision Contents

Path                                      Size
lib/CodeGen/TargetInfo.cpp                33 lines
test/CodeGen/vector.c                     2 lines
test/CodeGen/x86_32-arguments-darwin.c    4 lines
test/CodeGen/x86_32-arguments-linux.c     4 lines
test/CodeGen/x86_32-m64.c                 25 lines

Diff 195291

lib/CodeGen/TargetInfo.cpp


(910 unchanged lines not shown)	ABIArgInfo PNaClABIInfo::classifyReturnType(QualType RetTy) const {
if (const EnumType *EnumTy = RetTy->getAs<EnumType>())		if (const EnumType *EnumTy = RetTy->getAs<EnumType>())
RetTy = EnumTy->getDecl()->getIntegerType();		RetTy = EnumTy->getDecl()->getIntegerType();

return (RetTy->isPromotableIntegerType() ? ABIArgInfo::getExtend(RetTy)		return (RetTy->isPromotableIntegerType() ? ABIArgInfo::getExtend(RetTy)
: ABIArgInfo::getDirect());		: ABIArgInfo::getDirect());
}		}

/// IsX86_MMXType - Return true if this is an MMX type.		/// IsX86_MMXType - Return true if this is an MMX type.
bool IsX86_MMXType(llvm::Type *IRType) {		bool IsX86_MMXType(llvm::Type *IRType) {
// Return true if the type is an MMX type <2 x i32>, <4 x i16>, or <8 x i8>.		// Return true if the type is an MMX type <1 x i64>, <2 x i32>, <4 x i16>,
		// or <8 x i8>.
return IRType->isVectorTy() && IRType->getPrimitiveSizeInBits() == 64 &&		return IRType->isVectorTy() && IRType->getPrimitiveSizeInBits() == 64 &&
cast<llvm::VectorType>(IRType)->getElementType()->isIntegerTy() &&		cast<llvm::VectorType>(IRType)->getElementType()->isIntegerTy();
IRType->getScalarSizeInBits() != 64;
}		}

static llvm::Type* X86AdjustInlineAsmType(CodeGen::CodeGenFunction &CGF,		static llvm::Type* X86AdjustInlineAsmType(CodeGen::CodeGenFunction &CGF,
StringRef Constraint,		StringRef Constraint,
llvm::Type* Ty) {		llvm::Type* Ty) {
bool IsMMXCons = llvm::StringSwitch<bool>(Constraint)		bool IsMMXCons = llvm::StringSwitch<bool>(Constraint)
.Cases("y", "&y", "^Ym", true)		.Cases("y", "&y", "^Ym", true)
.Default(false);		.Default(false);
(73 unchanged lines not shown)	class X86_32ABIInfo : public SwiftABIInfo {

static const unsigned MinABIStackAlignInBytes = 4;		static const unsigned MinABIStackAlignInBytes = 4;

bool IsDarwinVectorABI;		bool IsDarwinVectorABI;
bool IsRetSmallStructInRegABI;		bool IsRetSmallStructInRegABI;
bool IsWin32StructABI;		bool IsWin32StructABI;
bool IsSoftFloatABI;		bool IsSoftFloatABI;
bool IsMCUABI;		bool IsMCUABI;
unsigned DefaultNumRegisterParameters;		unsigned DefaultNumRegisterParameters;
		bool IsMMXEnabled;

static bool isRegisterSize(unsigned Size) {		static bool isRegisterSize(unsigned Size) {
return (Size == 8 || Size == 16 || Size == 32 || Size == 64);		return (Size == 8 || Size == 16 || Size == 32 || Size == 64);
}		}

bool isHomogeneousAggregateBaseType(QualType Ty) const override {		bool isHomogeneousAggregateBaseType(QualType Ty) const override {
// FIXME: Assumes vectorcall is in use.		// FIXME: Assumes vectorcall is in use.
return isX86VectorTypeForVectorCall(getContext(), Ty);		return isX86VectorTypeForVectorCall(getContext(), Ty);
(43 unchanged lines not shown)
public:		public:

void computeInfo(CGFunctionInfo &FI) const override;		void computeInfo(CGFunctionInfo &FI) const override;
Address EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,		Address EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
QualType Ty) const override;		QualType Ty) const override;

X86_32ABIInfo(CodeGen::CodeGenTypes &CGT, bool DarwinVectorABI,		X86_32ABIInfo(CodeGen::CodeGenTypes &CGT, bool DarwinVectorABI,
bool RetSmallStructInRegABI, bool Win32StructABI,		bool RetSmallStructInRegABI, bool Win32StructABI,
unsigned NumRegisterParameters, bool SoftFloatABI)		unsigned NumRegisterParameters, bool SoftFloatABI,
		bool MMXEnabled)
: SwiftABIInfo(CGT), IsDarwinVectorABI(DarwinVectorABI),		: SwiftABIInfo(CGT), IsDarwinVectorABI(DarwinVectorABI),
IsRetSmallStructInRegABI(RetSmallStructInRegABI),		IsRetSmallStructInRegABI(RetSmallStructInRegABI),
IsWin32StructABI(Win32StructABI),		IsWin32StructABI(Win32StructABI),
IsSoftFloatABI(SoftFloatABI),		IsSoftFloatABI(SoftFloatABI),
IsMCUABI(CGT.getTarget().getTriple().isOSIAMCU()),		IsMCUABI(CGT.getTarget().getTriple().isOSIAMCU()),
DefaultNumRegisterParameters(NumRegisterParameters) {}		DefaultNumRegisterParameters(NumRegisterParameters),
		IsMMXEnabled(MMXEnabled) {}

bool shouldPassIndirectlyForSwift(ArrayRef<llvm::Type*> scalars,		bool shouldPassIndirectlyForSwift(ArrayRef<llvm::Type*> scalars,
bool asReturnValue) const override {		bool asReturnValue) const override {
// LLVM's x86-32 lowering currently only assigns up to three		// LLVM's x86-32 lowering currently only assigns up to three
// integer registers and three fp registers. Oddly, it'll use up to		// integer registers and three fp registers. Oddly, it'll use up to
// four vector registers for vectors, but those can overlap with the		// four vector registers for vectors, but those can overlap with the
// scalar registers.		// scalar registers.
return occupiesMoreThan(CGT, scalars, /*total*/ 3);		return occupiesMoreThan(CGT, scalars, /*total*/ 3);
}		}

bool isSwiftErrorInRegister() const override {		bool isSwiftErrorInRegister() const override {
// x86-32 lowering does not support passing swifterror in a register.		// x86-32 lowering does not support passing swifterror in a register.
return false;		return false;
}		}
};		};

class X86_32TargetCodeGenInfo : public TargetCodeGenInfo {		class X86_32TargetCodeGenInfo : public TargetCodeGenInfo {
public:		public:
X86_32TargetCodeGenInfo(CodeGen::CodeGenTypes &CGT, bool DarwinVectorABI,		X86_32TargetCodeGenInfo(CodeGen::CodeGenTypes &CGT, bool DarwinVectorABI,
bool RetSmallStructInRegABI, bool Win32StructABI,		bool RetSmallStructInRegABI, bool Win32StructABI,
		rjmccall (not done): "The System V i386 psABI requires __m64 to be passed in MMX registers. Clang historically had a bug where it failed to apply this rule, and some platforms (e.g. Darwin, PS4, and FreeBSD) have opted to maintain compatibility with the old Clang behavior, so we only apply it on platforms that have specifically requested it (currently just Linux and NetBSD)."
unsigned NumRegisterParameters, bool SoftFloatABI)		unsigned NumRegisterParameters, bool SoftFloatABI,
		bool MMXEnabled = false)
: TargetCodeGenInfo(new X86_32ABIInfo(		: TargetCodeGenInfo(new X86_32ABIInfo(
CGT, DarwinVectorABI, RetSmallStructInRegABI, Win32StructABI,		CGT, DarwinVectorABI, RetSmallStructInRegABI, Win32StructABI,
NumRegisterParameters, SoftFloatABI)) {}		NumRegisterParameters, SoftFloatABI, MMXEnabled)) {}

static bool isStructReturnInRegABI(		static bool isStructReturnInRegABI(
const llvm::Triple &Triple, const CodeGenOptions &Opts);		const llvm::Triple &Triple, const CodeGenOptions &Opts);

void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,		void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule &CGM) const override;		CodeGen::CodeGenModule &CGM) const override;

int getDwarfEHStackPointer(CodeGen::CodeGenModule &CGM) const override {		int getDwarfEHStackPointer(CodeGen::CodeGenModule &CGM) const override {
(290 unchanged lines not shown)	if (IsDarwinVectorABI) {
if ((Size == 8 || Size == 16 || Size == 32) ||		if ((Size == 8 || Size == 16 || Size == 32) ||
(Size == 64 && VT->getNumElements() == 1))		(Size == 64 && VT->getNumElements() == 1))
return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(),		return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(),
Size));		Size));

return getIndirectReturnResult(RetTy, State);		return getIndirectReturnResult(RetTy, State);
}		}

		if (IsMMXEnabled && IsX86_MMXType(CGT.ConvertType(RetTy))) {
		return ABIArgInfo::getDirect(llvm::Type::getX86_MMXTy(getVMContext()));
		rjmccall (not done): Indentation on the continuation line.
		}
		RKSimon (done): Superfluous braces?

return ABIArgInfo::getDirect();		return ABIArgInfo::getDirect();
}		}

if (isAggregateTypeForABI(RetTy)) {		if (isAggregateTypeForABI(RetTy)) {
if (const RecordType *RT = RetTy->getAs<RecordType>()) {		if (const RecordType *RT = RetTy->getAs<RecordType>()) {
// Structures with flexible arrays are always indirect.		// Structures with flexible arrays are always indirect.
if (RT->getDecl()->hasFlexibleArrayMember())		if (RT->getDecl()->hasFlexibleArrayMember())
return getIndirectReturnResult(RetTy, State);		return getIndirectReturnResult(RetTy, State);
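A minimal sketch, assuming only the branches visible in the return-type hunk above and vectors of at most 64 bits (any Darwin cases for larger vectors hidden by the diff view are not modeled). `ReturnLowering`, `VectorInfo`, `fitsDarwinIntegerReturn`, and `classifyVectorReturn` are hypothetical names, not Clang's API; the `isMMXType` field stands in for `IsX86_MMXType(CGT.ConvertType(RetTy))`.

```cpp
#include <cstdint>

// Hypothetical model (not Clang's API) of the vector branch of
// X86_32ABIInfo::classifyReturnType as shown in this revision, restricted
// to vectors of at most 64 bits.
enum class ReturnLowering {
  IntegerOfSize,  // ABIArgInfo::getDirect(iN), N = vector size in bits
  Indirect,       // getIndirectReturnResult(RetTy, State)
  X86MMX,         // ABIArgInfo::getDirect(x86_mmx), added by this patch
  DirectVector    // ABIArgInfo::getDirect() with the natural vector type
};

struct VectorInfo {
  std::uint64_t sizeBits;  // getContext().getTypeSize(RetTy)
  unsigned numElements;    // VT->getNumElements()
  bool isMMXType;          // assumed result of IsX86_MMXType(...)
};

constexpr bool fitsDarwinIntegerReturn(VectorInfo v) {
  // Fits a general-purpose register, or is 64 bits with a single element.
  return v.sizeBits == 8 || v.sizeBits == 16 || v.sizeBits == 32 ||
         (v.sizeBits == 64 && v.numElements == 1);
}

constexpr ReturnLowering classifyVectorReturn(VectorInfo v,
                                              bool isDarwinVectorABI,
                                              bool isMMXEnabled) {
  // Darwin: integer in registers if it fits, otherwise returned indirectly.
  // Non-Darwin: x86_mmx when MMX is enabled and the type is an MMX type,
  // otherwise the natural vector type.
  return isDarwinVectorABI
             ? (fitsDarwinIntegerReturn(v) ? ReturnLowering::IntegerOfSize
                                           : ReturnLowering::Indirect)
             : (isMMXEnabled && v.isMMXType ? ReturnLowering::X86MMX
                                            : ReturnLowering::DirectVector);
}
```

Treating `__m64` (a one-element 64-bit vector in Clang's `<mmintrin.h>`) as an MMX type, the model gives `x86_mmx` for a Linux-like configuration, an `i64` integer for Darwin, and a direct `<1 x i64>` when MMX is disabled, which lines up with the `ret` expectations in the new x86_32-m64.c test.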
(288 unchanged lines not shown)	if (const VectorType *VT = Ty->getAs<VectorType>()) {
if (IsDarwinVectorABI) {		if (IsDarwinVectorABI) {
uint64_t Size = getContext().getTypeSize(Ty);		uint64_t Size = getContext().getTypeSize(Ty);
if ((Size == 8 || Size == 16 || Size == 32) ||		if ((Size == 8 || Size == 16 || Size == 32) ||
(Size == 64 && VT->getNumElements() == 1))		(Size == 64 && VT->getNumElements() == 1))
return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(),		return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(),
Size));		Size));
}		}

if (IsX86_MMXType(CGT.ConvertType(Ty)))		if (IsX86_MMXType(CGT.ConvertType(Ty))) {
		if (IsMMXEnabled)
		return ABIArgInfo::getDirect(llvm::Type::getX86_MMXTy(getVMContext()));
return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(), 64));		return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(),64));
		}

return ABIArgInfo::getDirect();		return ABIArgInfo::getDirect();
}		}


if (const EnumType *EnumTy = Ty->getAs<EnumType>())		if (const EnumType *EnumTy = Ty->getAs<EnumType>())
Ty = EnumTy->getDecl()->getIntegerType();		Ty = EnumTy->getDecl()->getIntegerType();
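The argument hunk above can be modeled the same way. This is a minimal sketch with hypothetical names (`ArgLowering`, `VectorArg`, `takesDarwinIntegerPath`, `classifyVectorArg`); `isMMXType` again stands in for `IsX86_MMXType(CGT.ConvertType(Ty))`. Unlike the return path, the Darwin branch here only handles register-sized and single-element vectors and otherwise falls through to the MMX check.

```cpp
#include <cstdint>

// Hypothetical model (not Clang's API) of the vector branch of
// X86_32ABIInfo::classifyArgumentType as shown in this revision.
enum class ArgLowering {
  IntegerOfSize,  // getDirect(iN), N = vector size in bits (Darwin branch)
  X86MMX,         // getDirect(x86_mmx): passed in an MMX register
  Int64,          // getDirect(i64): the lowering used before this patch
  DirectVector    // getDirect() with the natural vector type
};

struct VectorArg {
  std::uint64_t sizeBits;  // getContext().getTypeSize(Ty)
  unsigned numElements;    // VT->getNumElements()
  bool isMMXType;          // stands in for IsX86_MMXType(CGT.ConvertType(Ty))
};

constexpr bool takesDarwinIntegerPath(VectorArg v, bool isDarwinVectorABI) {
  return isDarwinVectorABI &&
         (v.sizeBits == 8 || v.sizeBits == 16 || v.sizeBits == 32 ||
          (v.sizeBits == 64 && v.numElements == 1));
}

constexpr ArgLowering classifyVectorArg(VectorArg v, bool isDarwinVectorABI,
                                        bool isMMXEnabled) {
  // Order mirrors the diff: Darwin integer coercion first, then the MMX
  // check (x86_mmx if enabled, else i64), then the natural vector type.
  return takesDarwinIntegerPath(v, isDarwinVectorABI)
             ? ArgLowering::IntegerOfSize
         : v.isMMXType
             ? (isMMXEnabled ? ArgLowering::X86MMX : ArgLowering::Int64)
             : ArgLowering::DirectVector;
}
```

Because this revision computes `EnableMMX` as true on Darwin, a multi-element 64-bit integer vector (modeled as `{64, 2, true}`) comes out as `X86MMX` there. That appears to be what the updated `x86_mmx %a2.coerce` line in x86_32-arguments-darwin.c reflects, and it is the kind of Darwin behavior change the review asks to avoid.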

(7,745 unchanged lines not shown)	case llvm::Triple::x86: {
bool RetSmallStructInRegABI =		bool RetSmallStructInRegABI =
X86_32TargetCodeGenInfo::isStructReturnInRegABI(Triple, CodeGenOpts);		X86_32TargetCodeGenInfo::isStructReturnInRegABI(Triple, CodeGenOpts);
bool IsWin32FloatStructABI = Triple.isOSWindows() && !Triple.isOSCygMing();		bool IsWin32FloatStructABI = Triple.isOSWindows() && !Triple.isOSCygMing();

if (Triple.getOS() == llvm::Triple::Win32) {		if (Triple.getOS() == llvm::Triple::Win32) {
return SetCGInfo(new WinX86_32TargetCodeGenInfo(		return SetCGInfo(new WinX86_32TargetCodeGenInfo(
Types, IsDarwinVectorABI, RetSmallStructInRegABI,		Types, IsDarwinVectorABI, RetSmallStructInRegABI,
IsWin32FloatStructABI, CodeGenOpts.NumRegisterParameters));		IsWin32FloatStructABI, CodeGenOpts.NumRegisterParameters));
} else {		} else {
		// System V i386 ABI requires __m64 value passing by MMX registers.
		rnk (done): The Sys V rules apply to every non-Windows OS, not just Linux. I think you should add the parameter regardless of the OS.
		bool EnableMMX = getContext().getTargetInfo().getABI() != "no-mmx";
		rnk (not done): I think this needs to preserve existing behavior for Darwin and PS4 based on comments from @rjmccall and @dexonsmith in D60748.
		wxiao3 (author, done): OK, I will follow it.
return SetCGInfo(new X86_32TargetCodeGenInfo(		return SetCGInfo(new X86_32TargetCodeGenInfo(
Types, IsDarwinVectorABI, RetSmallStructInRegABI,		Types, IsDarwinVectorABI, RetSmallStructInRegABI,
IsWin32FloatStructABI, CodeGenOpts.NumRegisterParameters,		IsWin32FloatStructABI, CodeGenOpts.NumRegisterParameters,
CodeGenOpts.FloatABI == "soft"));		CodeGenOpts.FloatABI == "soft", EnableMMX));
}		}
}		}

case llvm::Triple::x86_64: {		case llvm::Triple::x86_64: {
StringRef ABI = getTarget().getABI();		StringRef ABI = getTarget().getABI();
X86AVXABILevel AVXLevel =		X86AVXABILevel AVXLevel =
(ABI == "avx512"		(ABI == "avx512"
? X86AVXABILevel::AVX512		? X86AVXABILevel::AVX512
(137 unchanged lines not shown)
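The last TargetInfo.cpp hunk and the review thread differ on which i386 targets should opt in. A minimal sketch contrasting the two policies, under stated assumptions: `OS`, `mmxEnabledThisRevision`, and `mmxEnabledRequested` are hypothetical; `abiIsNoMMX` stands for `getTargetInfo().getABI() == "no-mmx"`; and the Win32 case assumes that, since the Win32 branch does not pass the new argument, it keeps the constructor's default of `false`.

```cpp
// Hypothetical OS tags for illustration; this is not llvm::Triple's API.
enum class OS { Linux, NetBSD, FreeBSD, Darwin, PS4, Win32 };

// Policy in this revision: every non-Win32 i386 target uses MMX registers
// for __m64 unless the target's ABI string is "no-mmx".
constexpr bool mmxEnabledThisRevision(OS os, bool abiIsNoMMX) {
  return os != OS::Win32 && !abiIsNoMMX;
}

// Policy described in the review: opt in only on Linux and NetBSD, keeping
// the historical Clang lowering on Darwin, PS4, FreeBSD and everything else.
// Carrying the "no-mmx" check over is an assumption of this sketch.
constexpr bool mmxEnabledRequested(OS os, bool abiIsNoMMX) {
  return (os == OS::Linux || os == OS::NetBSD) && !abiIsNoMMX;
}
```

The Darwin, PS4, and FreeBSD cases are where the two policies disagree; those are the targets the reviewers ask to leave on the old lowering.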

test/CodeGen/vector.c

	(72 unchanged lines not shown)
	// CHECK: icmp eq i32			// CHECK: icmp eq i32

	typedef int vec_int2 __attribute__((vector_size(8)));			typedef int vec_int2 __attribute__((vector_size(8)));
	vec_int2 lax_vector_compare2(long long x, vec_int2 y) {			vec_int2 lax_vector_compare2(long long x, vec_int2 y) {
	y = x == y;			y = x == y;
	return y;			return y;
	}			}

	// CHECK: define void @lax_vector_compare2(<2 x i32>* {{.*sret.*}}, i64 {{.*}}, i64 {{.*}})			// CHECK: define void @lax_vector_compare2(<2 x i32>* {{.*sret.*}}, i64 {{.*}}, x86_mmx {{.*}})
	// CHECK: icmp eq <2 x i32>			// CHECK: icmp eq <2 x i32>

test/CodeGen/x86_32-arguments-darwin.c

	(223 unchanged lines not shown)
	typedef int v4i32 __attribute__((__vector_size__(16)));			typedef int v4i32 __attribute__((__vector_size__(16)));

	// CHECK-LABEL: define <2 x i64> @f55(<4 x i32> %arg)			// CHECK-LABEL: define <2 x i64> @f55(<4 x i32> %arg)
	// PR8029			// PR8029
	v4i32 f55(v4i32 arg) { return arg+arg; }			v4i32 f55(v4i32 arg) { return arg+arg; }

	// CHECK-LABEL: define void @f56(			// CHECK-LABEL: define void @f56(
	// CHECK: i8 signext %a0, %struct.s56_0* byval align 4 %a1,			// CHECK: i8 signext %a0, %struct.s56_0* byval align 4 %a1,
	// CHECK: i64 %a2.coerce, %struct.s56_1* byval align 4,			// CHECK: x86_mmx %a2.coerce, %struct.s56_1* byval align 4,
	// CHECK: i64 %a4.coerce, %struct.s56_2* byval align 4,			// CHECK: i64 %a4.coerce, %struct.s56_2* byval align 4,
	// CHECK: <4 x i32> %a6, %struct.s56_3* byval align 16 %a7,			// CHECK: <4 x i32> %a6, %struct.s56_3* byval align 16 %a7,
	// CHECK: <2 x double> %a8, %struct.s56_4* byval align 16 %a9,			// CHECK: <2 x double> %a8, %struct.s56_4* byval align 16 %a9,
	// CHECK: <8 x i32> %a10, %struct.s56_5* byval align 4,			// CHECK: <8 x i32> %a10, %struct.s56_5* byval align 4,
	// CHECK: <4 x double> %a12, %struct.s56_6* byval align 4)			// CHECK: <4 x double> %a12, %struct.s56_6* byval align 4)

	// CHECK: call void (i32, ...) @f56_0(i32 1,			// CHECK: call void (i32, ...) @f56_0(i32 1,
	// CHECK: i32 %{{[^ ]*}}, %struct.s56_0* byval align 4 %{{[^ ]*}},			// CHECK: i32 %{{[^ ]*}}, %struct.s56_0* byval align 4 %{{[^ ]*}},
	// CHECK: i64 %{{[^ ]*}}, %struct.s56_1* byval align 4 %{{[^ ]*}},			// CHECK: x86_mmx %{{[^ ]*}}, %struct.s56_1* byval align 4 %{{[^ ]*}},
	// CHECK: i64 %{{[^ ]*}}, %struct.s56_2* byval align 4 %{{[^ ]*}},			// CHECK: i64 %{{[^ ]*}}, %struct.s56_2* byval align 4 %{{[^ ]*}},
	// CHECK: <4 x i32> %{{[^ ]*}}, %struct.s56_3* byval align 16 %{{[^ ]*}},			// CHECK: <4 x i32> %{{[^ ]*}}, %struct.s56_3* byval align 16 %{{[^ ]*}},
	// CHECK: <2 x double> %{{[^ ]*}}, %struct.s56_4* byval align 16 %{{[^ ]*}},			// CHECK: <2 x double> %{{[^ ]*}}, %struct.s56_4* byval align 16 %{{[^ ]*}},
	// CHECK: <8 x i32> {{[^ ]*}}, %struct.s56_5* byval align 4 %{{[^ ]*}},			// CHECK: <8 x i32> {{[^ ]*}}, %struct.s56_5* byval align 4 %{{[^ ]*}},
	// CHECK: <4 x double> {{[^ ]*}}, %struct.s56_6* byval align 4 %{{[^ ]*}})			// CHECK: <4 x double> {{[^ ]*}}, %struct.s56_6* byval align 4 %{{[^ ]*}})
	// CHECK: }			// CHECK: }
	//			//
	// <rdar://problem/7964854> [i386] clang misaligns long double in structures			// <rdar://problem/7964854> [i386] clang misaligns long double in structures
	(95 unchanged lines not shown)

test/CodeGen/x86_32-arguments-linux.c

	// RUN: %clang_cc1 -w -fblocks -triple i386-pc-linux-gnu -target-cpu pentium4 -emit-llvm -o %t %s			// RUN: %clang_cc1 -w -fblocks -triple i386-pc-linux-gnu -target-cpu pentium4 -emit-llvm -o %t %s
	// RUN: FileCheck < %t %s			// RUN: FileCheck < %t %s

	// CHECK-LABEL: define void @f56(			// CHECK-LABEL: define void @f56(
	// CHECK: i8 signext %a0, %struct.s56_0* byval align 4 %a1,			// CHECK: i8 signext %a0, %struct.s56_0* byval align 4 %a1,
	// CHECK: i64 %a2.coerce, %struct.s56_1* byval align 4,			// CHECK: x86_mmx %a2.coerce, %struct.s56_1* byval align 4,
	// CHECK: <1 x double> %a4, %struct.s56_2* byval align 4,			// CHECK: <1 x double> %a4, %struct.s56_2* byval align 4,
	// CHECK: <4 x i32> %a6, %struct.s56_3* byval align 4,			// CHECK: <4 x i32> %a6, %struct.s56_3* byval align 4,
	// CHECK: <2 x double> %a8, %struct.s56_4* byval align 4,			// CHECK: <2 x double> %a8, %struct.s56_4* byval align 4,
	// CHECK: <8 x i32> %a10, %struct.s56_5* byval align 4,			// CHECK: <8 x i32> %a10, %struct.s56_5* byval align 4,
	// CHECK: <4 x double> %a12, %struct.s56_6* byval align 4)			// CHECK: <4 x double> %a12, %struct.s56_6* byval align 4)

	// CHECK: call void (i32, ...) @f56_0(i32 1,			// CHECK: call void (i32, ...) @f56_0(i32 1,
	// CHECK: i32 %{{.*}}, %struct.s56_0* byval align 4 %{{[^ ]*}},			// CHECK: i32 %{{.*}}, %struct.s56_0* byval align 4 %{{[^ ]*}},
	// CHECK: i64 %{{[^ ]*}}, %struct.s56_1* byval align 4 %{{[^ ]*}},			// CHECK: x86_mmx %{{[^ ]*}}, %struct.s56_1* byval align 4 %{{[^ ]*}},
	// CHECK: <1 x double> %{{[^ ]*}}, %struct.s56_2* byval align 4 %{{[^ ]*}},			// CHECK: <1 x double> %{{[^ ]*}}, %struct.s56_2* byval align 4 %{{[^ ]*}},
	// CHECK: <4 x i32> %{{[^ ]*}}, %struct.s56_3* byval align 4 %{{[^ ]*}},			// CHECK: <4 x i32> %{{[^ ]*}}, %struct.s56_3* byval align 4 %{{[^ ]*}},
	// CHECK: <2 x double> %{{[^ ]*}}, %struct.s56_4* byval align 4 %{{[^ ]*}},			// CHECK: <2 x double> %{{[^ ]*}}, %struct.s56_4* byval align 4 %{{[^ ]*}},
	// CHECK: <8 x i32> %{{[^ ]*}}, %struct.s56_5* byval align 4 %{{[^ ]*}},			// CHECK: <8 x i32> %{{[^ ]*}}, %struct.s56_5* byval align 4 %{{[^ ]*}},
	// CHECK: <4 x double> %{{[^ ]*}}, %struct.s56_6* byval align 4 %{{[^ ]*}})			// CHECK: <4 x double> %{{[^ ]*}}, %struct.s56_6* byval align 4 %{{[^ ]*}})
	// CHECK: }			// CHECK: }
	//			//
	// <rdar://problem/7964854> [i386] clang misaligns long double in structures			// <rdar://problem/7964854> [i386] clang misaligns long double in structures
	(28 unchanged lines not shown)

test/CodeGen/x86_32-m64.c

This file was added.

				// RUN: %clang_cc1 -w -O2 -fblocks -triple i386-pc-linux-gnu -target-cpu pentium4 -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,LINUX
				// RUN: %clang_cc1 -w -O2 -fblocks -triple i386-apple-darwin9 -target-cpu yonah -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,DARWIN
				// RUN: %clang_cc1 -w -O2 -fblocks -triple i386-pc-elfiamcu -mfloat-abi soft -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,IAMCU
				// RUN: %clang_cc1 -w -O2 -fblocks -triple i386-pc-win32 -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,WIN32

				#include <mmintrin.h>
				__m64 m64;
				void callee(__m64 __m1, __m64 __m2);
				__m64 caller(__m64 __m1, __m64 __m2)
				{
				// LINUX-LABEL: define x86_mmx @caller(x86_mmx %__m1.coerce, x86_mmx %__m2.coerce)
				// LINUX: tail call void @callee(x86_mmx %__m2.coerce, x86_mmx %__m1.coerce)
				// LINUX: ret x86_mmx
				// DARWIN-LABEL: define i64 @caller(i64 %__m1.coerce, i64 %__m2.coerce)
				// DARWIN: tail call void @callee(i64 %__m2.coerce, i64 %__m1.coerce)
				// DARWIN: ret i64
				// IAMCU-LABEL: define <1 x i64> @caller(i64 %__m1.coerce, i64 %__m2.coerce)
				// IAMCU: tail call void @callee(i64 %__m2.coerce, i64 %__m1.coerce)
				// IAMCU: ret <1 x i64>
				// WIN32-LABEL: define dso_local <1 x i64> @caller(i64 %__m1.coerce, i64 %__m2.coerce)
				// WIN32: call void @callee(i64 %__m2.coerce, i64 %__m1.coerce)
				// WIN32: ret <1 x i64>
				callee(__m2, __m1);
				return m64;
				}