This is an archive of the discontinued LLVM Phabricator instance.

Paths

clang/lib/Basic/Targets/OSTargets.h
clang/lib/Basic/Targets/X86.h
clang/test/CodeGen/target-data.c
llvm/include/llvm/IR/AutoUpgrade.h
llvm/lib/IR/AutoUpgrade.cpp
llvm/lib/Target/X86/X86TargetMachine.cpp
llvm/test/Bitcode/upgrade-datalayout.ll
llvm/test/Bitcode/upgrade-datalayout3.ll
llvm/test/CodeGen/X86/atomic-unordered.ll
llvm/test/CodeGen/X86/bitcast-i256.ll
llvm/test/CodeGen/X86/catchpad-dynamic-alloca.ll
llvm/test/CodeGen/X86/implicit-null-check.ll
llvm/test/CodeGen/X86/legalize-shl-vec.ll
llvm/test/CodeGen/X86/osx-private-labels.ll
llvm/test/CodeGen/X86/scheduler-backtracking.ll
llvm/test/CodeGen/X86/setcc-wide-types.ll
llvm/test/CodeGen/X86/sret-implicit.ll
llvm/test/CodeGen/X86/statepoint-vector.ll
llvm/test/tools/llvm-lto2/X86/pipeline.ll
llvm/test/tools/llvm-lto2/X86/slp-vectorize-pm.ll
llvm/test/tools/llvm-lto2/X86/stats-file-option.ll
llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp

Differential D86310

[X86] Align i128 to 16 bytes in x86 datalayouts
ClosedPublic

Authored by hvdijk on Aug 20 2020, 10:54 AM.

Details

Reviewers

efriedma
echristo
spatel
wristow
craig.topper
rnk
tmgross

Commits

rGa21abc782a8e: [X86] Align i128 to 16 bytes in x86 datalayouts

Summary

This is an attempt at rebooting https://reviews.llvm.org/D28990

I've included AutoUpgrade changes that modify the data layout to satisfy the compatible layout check. This does mean that allocas, loads, stores, etc. in old IR will automatically get the new alignment.

This should fix PR46320.
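
As a rough illustration of the string-level part of such an upgrade, the sketch below adds an `i128:128` component to a data layout string that lacks one. The function name `upgrade_x86_datalayout` and the choice to append the component at the end are assumptions made for this example only; the actual logic lives in `UpgradeDataLayoutString()` in AutoUpgrade.cpp and may place the component elsewhere in the string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not LLVM's API: writes the upgraded layout into `out`.
 * Returns 0 on success, -1 if `out` is too small to hold the result. */
int upgrade_x86_datalayout(const char *layout, char *out, size_t out_size) {
    const char *component = "-i128:128";
    int written;

    if (strstr(layout, component) != NULL)
        written = snprintf(out, out_size, "%s", layout);  /* already upgraded */
    else
        written = snprintf(out, out_size, "%s%s", layout, component);

    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Changing the string is the easy part. The consequence described in the summary is that every type layout and default alignment derived from the string changes with it.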

Event Timeline

craig.topper created this revision. Aug 20 2020, 10:54 AM

Herald added a project: Restricted Project. Aug 20 2020, 10:54 AM

Herald added subscribers: nikic, dexonsmith, steven_wu and 3 others.

craig.topper requested review of this revision. Aug 20 2020, 10:54 AM

I'm afraid the AutoUpgrade component of this isn't compatible with existing IR without some additional work. I'm most concerned about cases like the following:

#pragma pack(8)
struct X { __int128 x; }; // Not a packed struct in IR because the native alignment is 8
struct Y { long long x; struct X y; }; // 24 bytes before autoupgrade, 32 bytes after
struct Y x;

On a related note, we need to add "Fn8" to the x86 datalayout at some point.
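
The size change in the struct Y example can be reproduced with a small model of how an unpacked struct is laid out. This is a sketch rather than LLVM code: `align_to` and `sizeof_struct_y` are names invented here, and the only input is the ABI alignment assumed for i128.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_to(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Size of an unpacked { i64, { i128 } } when i128 has ABI alignment i128_align. */
size_t sizeof_struct_y(size_t i128_align) {
    size_t offset = 0;
    size_t struct_align = 8;                     /* contributed by the i64 member */

    offset = align_to(offset, 8) + 8;            /* long long x */
    offset = align_to(offset, i128_align) + 16;  /* struct X y */
    if (i128_align > struct_align)
        struct_align = i128_align;

    return align_to(offset, struct_align);       /* tail padding */
}
```

With `i128_align` set to 8 the model gives 24 bytes, and with 16 it gives 32 bytes, matching the comments in the example.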

In D86310#2231136, @efriedma wrote:
I'm afraid the AutoUpgrade component of this isn't compatible with existing IR without some additional work. I'm most concerned about cases like the following:
#pragma pack(8)
struct X { __int128 x; }; // Not a packed struct in IR because the native alignment is 8
struct Y { long long x; struct X y; }; // 24 bytes before autoupgrade, 32 bytes after
struct Y x;
On a related note, we need to add "Fn8" to the x86 datalayout at some point.

I kind of feared that old IR was going to be a problem. Any thoughts on how to fix it? Do we need to visit every alloca/load/store/etc. that doesn't have explicit alignment and force it to the old alignment? Alternatively, could we skip the autoupgrade and weaken the compatible layout check somehow?

As far as I know, there are basically three categories of things that depend on the alignment of a type.

The default alignment of load/store/alloca. On trunk, load/store/alloca always have explicitly specified alignment in memory. That said, old bitcode doesn't have explicit alignment in some cases, and we currently run UpgradeDataLayoutString() before we actually parse the IR instructions.
The default alignment of global variables. Globals are allowed to have unspecified alignment, and the resulting alignment is implicitly computed by a sort of tricky algorithm. We could look into forcing it to be computed explicitly, but it's a lot of work because there are a lot of places in the code that create globals without specifying the alignment.
The layout of other types: for a struct that isn't packed, LLVM implicitly inserts padding to ensure it's aligned. To make this work correctly, you'd have to rewrite the types of every global/load/store/GEP/etc so they don't depend on the alignment of i128.

To autoupgrade correctly, we have to handle all three of those.

We can't just weaken the compatible datalayout check because the modules are actually incompatible, for the above reasons.

RKSimon added a reviewer: wristow. Sep 1 2020, 4:44 AM

There is a risk of bitcode incompatibilities with this change, but the code we generate now is already incompatible with GCC and results in crashes that way too. I don't think there's a perfect fix, and I'd like it if we could merge this. I came up with roughly the same patch today, based on current sources, to fix bug #50198 before finding this one.

llvm/lib/IR/AutoUpgrade.cpp
Line 4323: This needs to not be limited to `TT.isArch64Bit()`. i128 needs 16-byte alignment on all targets, and although clang disables `__int128` for X86, we still use it for lowering f128.

Herald added a subscriber: pengfei. May 4 2021, 12:52 PM

RKSimon edited reviewers, added: hvdijk; removed: RKSimon. May 4 2021, 2:32 PM

RKSimon added a subscriber: RKSimon.

craig.topper mentioned this in D115942: [X86][MS] Change the alignment of f80 to 16 bytes on Windows 32bits to match with ICC. Jan 6 2022, 1:53 PM

rnk added a subscriber: rnk. Jan 6 2022, 2:15 PM

In D86310#2231378, @efriedma wrote:

As far as I know, there are basically three categories of things that depend on the alignment of a type.

The default alignment of load/store/alloca. On trunk, load/store/alloca always have explicitly specified alignment in memory. That said, old bitcode doesn't have explicit alignment in some cases, and we currently run UpgradeDataLayoutString() before we actually parse the IR instructions.

The default alignment of global variables. Globals are allowed to have unspecified alignment, and the resulting alignment is implicitly computed by a sort of tricky algorithm. We could look into forcing it to be computed explicitly, but it's a lot of work because there are a lot of places in the code that create globals without specifying the alignment.

The layout of other types: for a struct that isn't packed, LLVM implicitly inserts padding to ensure it's aligned. To make this work correctly, you'd have to rewrite the types of every global/load/store/GEP/etc so they don't depend on the alignment of i128.

To autoupgrade correctly, we have to handle all three of those.

We can't just weaken the compatible datalayout check because the modules are actually incompatible, for the above reasons.

I think it's feasible for the autoupgrader to use the original data layout from the module to "freeze" the IR by converting all unpacked struct types in the module to packed types and assigning explicit alignments to all memory operations that lack them. If that's what's required to give us the flexibility to change the datalayout in the future, so be it, it's probably worth doing, and all other targets will benefit as well.

In D86310#2736983, @hvdijk wrote:

There is a risk of bitcode incompatibilities with this change, but the code we generate now is already incompatible with GCC and results in crashes that way too. I don't think there's a perfect fix, and I'd like it if we could merge this. I came up with roughly the same patch today, based on current sources, to fix bug #50198 before finding this one.

Who exactly generates GCC-incompatible code, clang, LLVM, or some other frontend? My understanding is that Clang handles most struct layout and alignment concerns in the frontend. The feature I'm not clear on is calling convention lowering, so when i128 is passed in memory, the LLVM data layout controls its alignment. However, I wonder if the alignstack() parameter attribute can be used to control this instead from the frontend:
https://llvm.org/docs/LangRef.html#parameter-attributes

Herald added a subscriber: ormris. Jan 6 2022, 2:15 PM

In D86310#3226142, @rnk wrote:

In D86310#2736983, @hvdijk wrote:

There is a risk of bitcode incompatibilities with this change, but the code we generate now is already incompatible with GCC and results in crashes that way too. I don't think there's a perfect fix, and I'd like it if we could merge this. I came up with roughly the same patch today, based on current sources, to fix bug #50198 before finding this one.

Who exactly generates GCC-incompatible code, clang, LLVM, or some other frontend? My understanding is that Clang handles most struct layout and alignment concerns in the frontend.

It's usually handled by clang, but when operations get lowered by LLVM to libcalls, it's coming from LLVM, and that's happening in the bug I referenced in that comment.

kkysen added a subscriber: kkysen. Nov 27 2022, 11:44 PM

Herald added a project: Restricted Project. Nov 27 2022, 11:44 PM

Herald added subscribers: pcwang-thead, StephenFan.

What is the current status of this patch? Are the reviewers here OK with this fix in general but just need to see changes to autoupgrade?

@craig.topper or @hvdijk since you worked on it, are you interested in doing these changes, or is this patch in need of new authors?

Herald added a subscriber: wangpc. Jul 10 2023, 9:56 AM

In D86310#4485837, @tmgross wrote:

What is the current status of this patch? Are the reviewers here OK with this fix in general but just need to see changes to autoupgrade?

@craig.topper or @hvdijk since you worked on it, are you interested in doing these changes, or is this patch in need of new authors?

I'm no longer working on X86 so I won't be able to work on it.

In D86310#4485837, @tmgross wrote:

What is the current status of this patch? Are the reviewers here OK with this fix in general but just need to see changes to autoupgrade?

@craig.topper or @hvdijk since you worked on it, are you interested in doing these changes, or is this patch in need of new authors?

The TT.isArch64Bit() thing I commented on is something I could change, if desired, but from my perspective, the suggested changes to the upgrade mechanism are an unreasonable amount of work considering the benefit is that it keeps already broken code equally broken, so I am not planning on working on that, sorry.

dexonsmith removed a subscriber: dexonsmith. Jul 10 2023, 1:29 PM

Thank you Craig and Harald for getting back so quick. I suppose that leaves it up to what level of AutoUpgrade changes would be accepted at a minimum.

@efriedma would you consider the changes suggested by @hvdijk sufficient under any circumstances or would you still insist on fully compatible AutoUpgrade, given the above discussion?

In D86310#3226142, @rnk wrote:

Who exactly generates GCC-incompatible code, clang, LLVM, or some other frontend? My understanding is that Clang handles most struct layout and alignment concerns in the frontend. The feature I'm not clear on is calling convention lowering, so when i128 is passed in memory, the LLVM data layout controls its alignment. However, I wonder if the alignstack() parameter attribute can be used to control this instead from the frontend:
https://llvm.org/docs/LangRef.html#parameter-attributes

Old question but just to add some more context - LLVM is generating code that is incorrect for the Linux ABI (16-byte alignment is required, LLVM produces 8-byte alignment) but the Clang frontend patches this in a way that "mostly works". It does not always work, such as in the bug that Harald linked at https://bugs.llvm.org/show_bug.cgi?id=50198, which segfaults with the most recent LLVM versions but is OK with GCC. This is pretty bad because it means that any frontend has to provide a workaround just to make LLVM do the mostly correct (but still not fully correct) thing.

This came into relevance recently because we are revisiting the issue in Rust. I think we are pretty close to providing a hack solution like Clang does, but LLVM is objectively wrong here so there are going to be things that just don't work correctly for anybody until this gets fixed. There is some thorough discussion on our related issue, around this comment https://github.com/rust-lang/rust/issues/54341#issuecomment-1064729606.

Note that a fix for this was landed at some point but got reverted, https://reviews.llvm.org/D28990. @echristo as you were the reviewer there, do you maybe have anything to add about the proposed fix here?

A thought occurs: in older versions of LLVM, the data layout mechanism worked differently and permitted targets to declare that they supported multiple different data layout strings, by overriding isCompatibleDataLayout. This mechanism was removed in D67631. If we reinstate that, we can have the X86 target declare that it "supports" data layout strings with and without the -i128:128, where by "supports", I mean the code continues to not generally work in the same way it does not generally work now, but the specific limited cases that do work continue to work exactly the same ABI-incompatible way. This would have the same result of bug-for-bug compatibility with existing modules, but in what I suspect would be a significantly simpler way than by going through the module and adding explicit alignments everywhere. While I would still prefer to give up on that compatibility, if it is a hard requirement, and if this would be an alternative way of achieving it, I might possibly be able to update this patch to do just that. Would this be acceptable?
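
To make the proposal concrete, here is a minimal sketch of a compatibility check that treats two layout strings as equivalent when they differ only in the `-i128:128` component. It is an assumption-laden illustration: `strip_i128` and `is_compatible_datalayout` are hypothetical C helpers, not the signature or behaviour of the removed `isCompatibleDataLayout` hook, and the 256-byte buffers are an arbitrary limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies `layout` into `out` without its "-i128:128" component, if present.
 * Returns false if `out` is too small. */
static bool strip_i128(const char *layout, char *out, size_t out_size) {
    static const char component[] = "-i128:128";
    const size_t comp_len = sizeof component - 1;
    const char *hit = layout;

    /* Find the component, making sure it ends at a '-' or at the end of the string. */
    while ((hit = strstr(hit, component)) != NULL) {
        char next = hit[comp_len];
        if (next == '-' || next == '\0')
            break;
        hit++;
    }

    size_t prefix_len = hit ? (size_t)(hit - layout) : strlen(layout);
    const char *rest = hit ? hit + comp_len : "";
    if (prefix_len + strlen(rest) + 1 > out_size)
        return false;
    memcpy(out, layout, prefix_len);
    strcpy(out + prefix_len, rest);
    return true;
}

/* Hypothetical check: accept layouts that match once "-i128:128" is ignored. */
bool is_compatible_datalayout(const char *target_layout, const char *module_layout) {
    char a[256], b[256];
    if (!strip_i128(target_layout, a, sizeof a) ||
        !strip_i128(module_layout, b, sizeof b))
        return false;
    return strcmp(a, b) == 0;
}
```

A check like this only decides whether a module is accepted. It does nothing to reconcile the different i128 alignments that the two strings imply.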

In D86310#4495825, @hvdijk wrote:

A thought occurs: in older versions of LLVM, the data layout mechanism worked differently and permitted targets to declare that they supported multiple different data layout strings, by overriding isCompatibleDataLayout. This mechanism was removed in D67631. If we reinstate that, we can have the X86 target declare that it "supports" data layout strings with and without the -i128:128, where by "supports", I mean the code continues to not generally work in the same way it does not generally work now, but the specific limited cases that do work continue to work exactly the same ABI-incompatible way. This would have the same result of bug-for-bug compatibility with existing modules, but in what I suspect would be a significantly simpler way than by going through the module and adding explicit alignments everywhere. While I would still prefer to give up on that compatibility, if it is a hard requirement, and if this would be an alternative way of achieving it, I might possibly be able to update this patch to do just that. Would this be acceptable?

The main problem with that is that we can't have multiple data layouts for one module, so linking old and new bitcode together would fail. But maybe that's exactly what we want -- after all, it is incompatible. Even if we "correctly" upgraded to preserve behavior of the old bitcode, it would still be incompatible with the new bitcode if i128 crosses the ABI boundary (explicitly or implicitly).

In D86310#4496582, @nikic wrote:

The main problem with that is that we can't have multiple data layouts for one module, so linking old and new bitcode together would fail.

Good point, but it's worth pointing out that this only applies to linking in the LLVM IR sense. Linking in the ELF object file sense would work exactly as it would with the explicit alignments added everywhere, as ELF object files do not contain that data layout string. Linking in the LLVM IR sense is what happens with clang -flto though.

But maybe that's exactly what we want -- after all, it is incompatible. Even if we "correctly" upgraded to preserve behavior of the old bitcode, it would still be incompatible with the new bitcode if i128 crosses the ABI boundary (explicitly or implicitly).

Yeah, that is a tricky question to answer. Let's say this change goes into LLVM 17, so LLVM 17 X86 data layouts include i128:128, and nothing is changed for LLVM 16. Let's also say we have a program made up of two source files, a.c, and b.c. Then:

clang-16 -c -flto a.c b.c && clang-17 a.o b.o should ideally be accepted and would behave in the same way as clang-16 -c a.c b.c && clang-16 a.o b.o.
clang-16 -c -flto a.c && clang-17 -c -flto b.c && clang-17 a.o b.o should ideally be rejected if both a.o and b.o use i128, but possibly accepted otherwise?
clang-16 -c a.c && clang-17 -c b.c && clang-17 a.o b.o cannot be detected as an error if i128 is used in both, but will not behave sensibly.

I am not sure there is a simple solution there that covers all of it.

@efriedma would you consider the changes suggested by @hvdijk sufficient under any circumstances or would you still insist on fully compatible AutoUpgrade, given the above discussion?

If the requirement is "we can mix old and new IR", we have to do it correctly, to the extent old versions of clang do it correctly.

If we're willing to refuse to compile old IR and/or refuse to LTO together old and new IR, there are other possible solutions. I'm not sure what workflows depend on having working autoupgrade.

I see two ways forward here:

Autoupgrade modules with old datalayout strings by increasing the alignment of i128 & co. This will change LLVM IR struct layouts, argument alignments, etc. As far as native ABI boundaries are concerned, this should be "more correct": Clang explicitly applies alignstack attributes to increase the alignment of i128 arguments, and adds padding to structs to align i128. As far as IR ABI boundaries within LTO are concerned, it is ABI compatible with IR modules.
Freeze the ABI of the old module during autoupgrade. Replace all struct types with equivalent packed structs and explicit padding. Apply explicit alignments to all i128 loads and stores. Apply explicit alignstack(8) attributes to all i128 arguments.

I think 1 is better than 2. The only problem that approach 2 solves is to ensure that a non-clang frontend using i128 is ABI compatible with old versions of that same frontend (think Rust). Given that most non-clang frontends want the bug fix (ABI break), who exactly is asking for this level of IR ABI stability? Maybe I'm missing something, but after skimming over this review again, I think the existing autoupgrade approach is probably good enough. Can we add a release note or something and leave it at that?
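
For reference, the struct part of approach 2 amounts to something like the following sketch: walk the fields of an unpacked struct using the old alignments and emit a packed field list in which every gap becomes an explicit padding member. The types `field_t` and `packed_t` and the function `freeze_layout` are invented for this illustration and do not correspond to any LLVM API.

```c
#include <stddef.h>

typedef struct { size_t size; size_t align; } field_t;     /* original field, old ABI alignment */
typedef struct { size_t size; int is_padding; } packed_t;  /* member of the frozen packed struct */

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_to(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Rewrites an unpacked field list into a packed one with explicit padding.
 * Returns the number of entries written to `out`, or 0 if `cap` is too small. */
size_t freeze_layout(const field_t *fields, size_t n, packed_t *out, size_t cap) {
    size_t offset = 0, count = 0, max_align = 1;

    for (size_t i = 0; i < n; i++) {
        size_t aligned = align_to(offset, fields[i].align);
        if (aligned > offset) {                   /* gap before this field */
            if (count == cap) return 0;
            out[count++] = (packed_t){ aligned - offset, 1 };
        }
        if (count == cap) return 0;
        out[count++] = (packed_t){ fields[i].size, 0 };
        offset = aligned + fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }

    size_t total = align_to(offset, max_align);   /* tail padding */
    if (total > offset) {
        if (count == cap) return 0;
        out[count++] = (packed_t){ total - offset, 1 };
    }
    return count;
}
```

For a one-byte field followed by an i128 at the old 8-byte alignment, this yields a 1-byte field, 7 bytes of padding, and the 16-byte field, so the offsets no longer depend on the data layout. The explicit alignments on loads, stores, and arguments mentioned in approach 2 would still have to be added separately.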

In D86310#4498551, @rnk wrote:

Given that most non-clang frontends want the bug fix (ABI break), who exactly is asking for this level of IR ABI stability?

You were, I thought, or at least that's how I interpreted your earlier comment. :) If we're now all in agreement that that level of ABI stability is not needed, I can update this patch to address the comment that I had left (it should not be limited to 64-bit, it's needed for all X86). I'll probably be able to find time for this in the weekend.

The only problem that approach 2 solves is to ensure that a non-clang frontend using i128

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

In D86310#4498575, @efriedma wrote:

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

clang bases it on the data layout, so when the change here is applied, clang already generates correct IR for that example without further changes (using %struct.Y = type <{ i64, %struct.X }>). Unless you were using that as an example of when using old clang to generate LLVM IR, and new LLVM to produce machine code, would break?

In D86310#4498721, @hvdijk wrote:

In D86310#4498575, @efriedma wrote:

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

clang bases it on the data layout, so when the change here is applied, clang already generates correct IR for that example without further changes (using %struct.Y = type <{ i64, %struct.X }>). Unless you were using that as an example of when using old clang to generate LLVM IR, and new LLVM to produce machine code, would break?

I only meant to dispute the assertion that ABI compatibility with old IR is only a problem for non-clang frontends.

In D86310#4498575, @efriedma wrote:

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

Right, so we'd break LTO of packed structs with i128 members.

I still think we're overthinking this, and letting our ABI compat concerns block us from making progress. Maybe we could do something as simple as erroring out from the auto-upgrader when the module being upgraded has a struct whose layout would change as a result of the increased i128 alignment, while remaining bitcode compatible in the vast majority of cases.

As for the longer term solution to this problem, instead of permitting mixed data layouts or data layout customization, IMO LLVM structs should explicitly encode field offsets. LLVM would still have APIs to assist frontends with producing semi-C-compatible struct layouts, in so much as we do today.

I'm not personally involved with any workflows that care about autoupgrade, so I'm not really invested in ensuring it's stable. If everyone agrees the impact is small enough that we're willing to just break autoupgrade to the extent it's relevant, I'll withdraw my objection.

As for the longer term solution to this problem, instead of permitting mixed data layouts or data layout customization, IMO LLVM structs should explicitly encode field offsets. LLVM would still have APIs to assist frontends with producing semi-C-compatible struct layouts, in so much as we do today.

I agree with the general sentiment that we want less dependence on alignment specified in the datalayout. Not sure about the exact design of that for structs... if IR moves in the direction people are proposing, the notion of a "struct" with a memory layout will likely go away altogether. (If we remove struct types from global variables and GEPs, there's very little left that actually cares about the layout of structs in memory.)

In D86310#4499095, @rnk wrote:

I still think we're overthinking this, and letting our ABI compat concerns block us from making progress. Maybe we could do something as simple as erroring out from the auto-upgrader when the module being upgraded has a struct whose layout would change as a result of the increased i128 alignment, while remaining bitcode compatible in the vast majority of cases.

This seems like a reasonable path forward, avoiding any concerns about IR mismatches while alerting users to the change. I would have to imagine there aren't all that many users that (1) don't use clang or another frontend that has to deal with this somehow, (2) use these types, (3) completely rely on autoupgrade.

Any i128 use, not just structs, _could_ be checked to catch mismatches like #50198 or the below example (more info on that in the github link I sent above), but this would affect clang users as well.

void i128_val_in_0_perturbed_small(
  uint8_t arg0, 
  __int128_t arg1,
  __int128_t arg2, 
  __int128_t arg3,
  __int128_t arg4, 
  float arg5
);

As a legacy OS provider on a platform that needs/requires ABI compatibility, I don't like the direction this is going. Like @rnk, I think having MORE control over struct layout is better than less. I'm adapting non-traditional languages to LLVM which allow very explicit control over layout of fields in structs. I have system-provided headers and data structures that have been the same since 1977. Fortunately, none contain i128 (or f128) sized items but I'm watching closely about any undermining of data layout control. This area of layout control (both with fields in structures and variables in sections) has been our biggest challenge with getting OpenVMS running on x86 using LLVM. I really don't want to be locked into an older version of the backend out of concerns about ABI reshuffling. We guarantee that previously compiled images continue to execute forever and that you can mix/match objects across versions. You can compile one file today and link it against an object library (provided by a 3rd party vendor) that was compiled 5 years ago with older compilers and it will work as intended.

@JohnReagan That is a valid concern, and I hope it reassures you that if things were working before, I would never be on board with this change. For example, it would generally be better if long double were 8-byte-aligned, but the x86 32-bit ABI specifies that it is 4-byte-aligned, and that is set in stone. I would be against any change in LLVM's ABI that changed their alignment, even if it would speed up code. I still occasionally run 20-year-old binaries, myself, that are dynamically linked to shared object files built with current compilers. Compatibility matters, I would not be on board with a change that breaks things like that. But that is not what is happening here. For i128, what clang implemented matched GCC, what LLVM implemented deviated from GCC/clang, but LLVM assumed that its implementation actually did match GCC/clang and code crashed as a result. This change would make it so that LLVM starts to also match GCC/clang, to change things from something that doesn't work to something that does work, and because things crash in current LLVM, I do not believe there can be much code out there that relies on the current behaviour. As you say, you aren't using i128/f128 yourself either. I hope that when I can update the patch, you can check that this does not cause problems for you.

@craig.topper Just to make sure, are you okay with me 'commandeering' this change and updating it?

In D86310#4501170, @hvdijk wrote:

For example, it would generally be better if long double were 8-byte-aligned, but the x86 32-bit ABI specifies that it is 4-byte-aligned, and that is set in stone. I would be against any change in LLVM's ABI that changed their alignment, even if it would speed up code.

That may be your view, but other users rely on the -malign-double flag (D19734) to get the new behavior, despite the ABI concerns. Specifically, it mattered for users passing structs from CPU to GPU, because the GPU doesn't tolerate misaligned doubles well. With that in mind, I wouldn't describe this ABI rule as being "set in stone", but I understand your perspective.

Returning to the patch at hand, it sounds like we have consensus that the next step is to teach auto-upgrade to traverse the module looking for uses of a particular type in structs and IR. That logic could be reused in the future to solve similar problems when we need to adjust the layout of exotic types.

In D86310#4501240, @rnk wrote:

In D86310#4501170, @hvdijk wrote:

For example, it would generally be better if long double were 8-byte-aligned, but the x86 32-bit ABI specifies that it is 4-byte-aligned, and that is set in stone. I would be against any change in LLVM's ABI that changed their alignment, even if it would speed up code.

That may be your view, but other users rely on the -malign-double flag (D19734) to get the new behavior, despite the ABI concerns. Specifically, it mattered for users passing structs from CPU to GPU, because the GPU doesn't tolerate misaligned doubles well. With that in mind, I wouldn't describe this ABI rule as being "set in stone", but I understand your perspective.

As long as it is an option, it is fine, that will not cause compatibility issues.

Returning to the patch at hand, it sounds like we have consensus that the next step is to teach auto-upgrade to traverse the module looking for uses of a particular type in structs and IR. That logic could be reused in the future to solve similar problems when we need to adjust the layout of exotic types.

That is not my understanding of the consensus that we have, that is something that you asked for, then asked who asked for it, and are now again asking for. I do not see anyone else having asked for this, and I repeat that I think it is an unreasonable amount of work.

@craig.topper Just to make sure, are you okay with me 'commandeering' this change and updating it?

Yes. Thanks for taking it on.

hvdijk commandeered this revision. Jul 16 2023, 5:38 AM

hvdijk edited reviewers, added: craig.topper; removed: hvdijk.

Rebased on current LLVM. No longer limited to 64 bit mode. 32 bit mode was taken out in D28990 because it was not believed to be required for compatibility, but as PR50198 shows, it does cause a compatibility issue to leave that as is.

This diff is uploaded without full context, because full context causes it to grow beyond what Phabricator allows in an upload.

I will write about the current state of compatibility in more detail shortly.

Herald added a project: Restricted Project. Jul 16 2023, 5:44 AM

THE CURRENT STATE

LLVM

LLVM permits i128 and fp128 in both x86 and x64. (When I write "x86", throughout this comment, I mean 32-bit x86.)
For x86, it aligns i128 to 4 bytes. It aligns fp128 to 16 bytes, except for Intel MCU. It calls libgcc functions for fp128, but not for i128. It uses i128 in these fp128 calls, which gets passed in memory.
For x64, it aligns i128 to 8 bytes. It aligns fp128 to 16 bytes. It calls libgcc functions for both i128 and fp128, and uses i128 in these fp128 calls. Arguments to libgcc functions are passed in registers, with the exception of __udivmodti4 and __udivmodti4, which are currently not used by LLVM.

GCC

x86: GCC does not permit __int128/_BitInt(128). GCC does permit _Float128/__float128 and aligns it to 16 bytes, except for Intel MCU.
x64: GCC does not permit _BitInt(128). It permits __int128 and _Float128/__float128 and aligns them both to 16 bytes.

clang

x86: clang permits _BitInt(128) and aligns it to 4 bytes. It permits __float128 and aligns it to 16 bytes, except for Intel MCU. It does not permit __int128 or _Float128. It maps _BitInt(128) to LLVM i128 and __float128 to LLVM fp128.
x64: clang permits _BitInt(128), __int128, and __float128. It aligns _BitInt(128) to 8 bytes, and __int128 and __float128 to 16 bytes. It does not permit _Float128. It maps _BitInt(128) and __int128 to LLVM i128, and __float128 to LLVM fp128.

MSVC

MSVC does not support __int128/_BitInt(128)/_Float128/__float128 in either x86 or x64.

Compatibility between LLVM and GCC

For x86, the current i128 handling is compatible. The alignment to 4 byte boundaries causes no compatibility issues because nothing else supports i128.
For x86, the current fp128 handling is incompatible. The use of i128 with lower alignment in a call into libgcc breaks compatibility.

For x64, the current i128 handling is compatible but fragile. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. If support for __udivmodti4 and __udivmodti4 were to be added in the future, the current i128 handling would be wrong.
For x64, the current fp128 handling is compatible. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. No other libgcc functions use pointers.

Compatibility between clang and GCC

For both x86 and x64, for all types supported by both clang and GCC, they agree on alignment. The handling is compatible. For _BitInt(128), although not yet implemented in GCC, the x86-64 psABI has been changed to require that this be aligned like i64 (https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/32) and this is what GCC is implementing too (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). clang agrees with it. The i386 psABI has not yet been changed, but it has been said that it will follow the x86-64 psABI.
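
The frontend side of this claim is easy to probe locally. The sketch below is an illustration only: `int128_alignment` is a name made up here, and `__SIZEOF_INT128__` is the predefined macro that GCC and clang provide on targets where `__int128` exists. It reports the alignment the C compiler itself uses for `__int128`.

```c
#include <stddef.h>

/* Returns the compiler's alignment for __int128, or 0 if the type is unavailable
 * (for example on 32-bit x86, where neither GCC nor clang permits it). */
size_t int128_alignment(void) {
#ifdef __SIZEOF_INT128__
    return _Alignof(__int128);
#else
    return 0;
#endif
}
```

On x64, both compilers are described above as giving 16 here. The 8-byte figure discussed in this review belongs to LLVM's i128 in the data layout, not to the C-level type.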

Compatibility between clang and LLVM

When generating LLVM IR, where LLVM's native alignment is different, this is worked around, e.g. by inserting dummy fields into structures to add padding. The handling is compatible.

Rust

Rust permits i128 and u128 in both x86 and x64. It translates this into LLVM i128. Per https://rust-lang.github.io/rfcs/1504-int128.html, this is intended to match clang's __int128. Because LLVM's i128 alignment is different from clang's __int128, it instead actually matches clang's _BitInt(128).

ISSUES

As far as I can tell, the compatibility issues in the current version of LLVM are: the fp128 handling in x86, potentially the i128 handling in x64, and the i128 handling in Rust.

For the fp128 handling in LLVM for x86, it is required that LLVM align these to 16 bytes.

For the i128 handling in LLVM for x64, it is not currently required that LLVM align these to 16 bytes, but it will be required in the future if __udivmodti4 and __udivmodti4 are added.

For the i128 handling in Rust, it assumes that LLVM's i128 matches clang's __int128, which it currently does not.

QUESTIONS

Is the behaviour of LLVM, clang, GCC, and MSVC indeed as I described, or did I make a mistake anywhere?

Am I correct in how these three issues affect compatibility?

Are there issues related to i128 alignment in current LLVM beyond what I have written here?

Note that in this particular comment, none of it is intended to provide any argument as to whether or not the patch should be applied. This comment is only intended to get clarity on the current state.

Edit: The constant alignment in bug 46320 is definitely an issue as well.

Harbormaster completed remote builds in B245676: Diff 540796.Jul 16 2023, 6:38 AM

In D86310#4504168, @hvdijk wrote:

THE CURRENT STATE
[...]
Compatibility between LLVM and GCC

For x86, the current i128 handling is compatible. The alignment to 8 byte boundaries causes no compatibility issues because nothing else supports i128.
For x86, the current fp128 handling is incompatible. The use of i128 with lower alignment in a call into libgcc breaks compatibility.

For x64, the current i128 handling is compatible but fragile. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. If support for __udivmodti4 and __udivmodti4 were to be added in the future, the current i128 handling would be wrong.
For x64, the current fp128 handling is compatible. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. No other libgcc functions use pointers.

Is the compatibility note here only meant to address calls between LLVM and GCC, not generated code? Because of course, struct layout and pass-in-memory function calls are incompatible.

ISSUES

As far as I can tell, the compatibility issues in the current version of LLVM are: the fp128 handling in x86, potentially the i128 handling in x64, and the i128 handling in Rust.

Rust just uses LLVM's i128 value directly so it doesn't necessarily need to be called out on its own (think we are in agreement here, just clarifying)

[...]
QUESTIONS

Is the behaviour of LLVM, clang, GCC, and MSVC indeed as I described, or did I make a mistake anywhere?

I believe that MSVC is in general ambiguous about these details on types that it does not support, but I would assume that being consistent with the Linux ABI is preferred and probably what MSVC would choose if they ever do decide on a specification for this type (unless LLVM has contact with Microsoft that may be able to clarify? They make no guarantees against breaking things in any case.)

[...]

It probably makes sense to have reasoning for choosing the selected behavior and having something specific to test against, so I'll link what I know.

From AMD64 ABI Draft 0.99.7 (2014):

[paraphrased from Figure 3.1]
type - sizeof - alignment - AMD64 architecture
long - 8 - 8 - signed eightbyte [I included this in the table for the below reference]
__int128 - 16 - 16 - signed sixteenbyte
signed __int128 - 16 - 16 - signed sixteenbyte
long double - 16 - 16 - 80-bit extended (IEEE-754)
__float128 - 16 - 16 - 128-bit extended (IEEE-754)
[...]
The __int128 type is stored in little-endian order in memory, i.e., the 64
low-order bits are stored at a lower address than the 64 high-order bits
[...]
Arguments of type __int128 offer the same operations as INTEGERs,
yet they do not fit into one general purpose register but require two registers.
For classification purposes __int128 is treated as if it were implemented
as:
typedef struct {
    long low, high;
} __int128;
with the exception that arguments of type __int128 that are stored in
memory must be aligned on a 16-byte boundary
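To make the quoted rule concrete, here is a small sketch, assuming a little-endian x86-64 target with a compiler that provides __int128. The struct name and helper functions are hypothetical, and uint64_t stands in for the ABI's long. It checks that the low 64 bits sit at the lower address, and that the two-field struct on its own needs only 8-byte alignment while __int128 needs 16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the ABI's classification struct (hypothetical name);
   uint64_t replaces long so the field width is explicit. */
struct two_halves {
    uint64_t low, high;
};

/* Returns 1 if the low 64 bits of an __int128 are stored at the lower address. */
int low_half_stored_first(void) {
    const uint64_t lo = 0x0706050403020100ULL;
    const uint64_t hi = 0x0f0e0d0c0b0a0908ULL;
    unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;

    struct two_halves parts;
    static_assert(sizeof parts == sizeof v, "both views are 16 bytes");
    memcpy(&parts, &v, sizeof v);   /* reinterpret the bytes without aliasing issues */
    return parts.low == lo && parts.high == hi;
}

/* Returns 1 if the struct needs 8-byte alignment while __int128 needs 16. */
int alignment_differs(void) {
    return _Alignof(struct two_halves) == 8 && _Alignof(__int128) == 16;
}
```

The alignment difference is the "exception" the ABI text calls out: classification follows the struct, but in-memory placement does not.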

K1OM agrees https://www.intel.com/content/dam/develop/external/us/en/documents/k1om-psabi-1-0.pdf
These types don't seem to be mentioned anywhere in i386 1997 https://www.sco.com/developers/devspecs/abi386-4.pdf
Also not in MIPS RISC 1996 https://math-atlas.sourceforge.net/devel/assembly/mipsabi32.pdf
MIPSpro64 doesn't mention 128-bit integers but does mention 128-bit floats. From page 24 https://math-atlas.sourceforge.net/devel/assembly/mipsabi64.pdf

Quad-precision floating point parameters (C long double or Fortran REAL*16) are
always 16-byte aligned. This requires that they be passed in even-odd floating point
register pairs, even if doing so requires skipping a register parameter and/or a
64-bit save area slot. [The 32-bit ABI does not consider long double parameters,
since they were not supported.]

From PPC64 section 3.1.4 https://math-atlas.sourceforge.net/devel/assembly/PPC-elf64abi-1.7.pdf:

[paraphrased from table]
type - sizeof - alignment
__int128_t - 16 - quadword
__uint128_t - 16 - quadword
long double - 16 - quadword

z/Arch: this is the only target that clang seems to align to 8, see [1]. Also from 1.1.2.4 in https://github.com/IBM/s390x-abi/releases/tag/v1.6:

[paraphrased from table]
type - size (bytes) - alignment
__int128 - 16 - 8
signed __int128 - 16 - 8
long double - 16 - 8

[1]: https://reviews.llvm.org/D130900

In D86310#4516184, @tmgross wrote:

Is the compatibility note here only meant to address calls between LLVM and GCC, not generated code? Because of course, struct layout and pass-in-memory function calls are incompatible.

There should be no compatibility issue there between GCC and clang in most cases, because clang ensures __int128 is aligned to 16 bytes everywhere, even if the LLVM data layout specifies lower alignment. clang's __int128 and LLVM's i128 play by different rules, currently. This change would make them play by the same rules.

Rust just uses LLVM's i128 value directly so it doesn't necessarily need to be called out on its own (think we are in agreement here, just clarifying)

I do think it needs to be called out on its own: Rust makes its i128 match LLVM's i128, while assuming it's matching clang's __int128.

I believe that MSVC is in general ambiguous about these details on types that it does not support, but I would assume that being consistent with the Linux ABI is preferred and probably what MSVC would choose if they ever do decide on a specification for this type (unless LLVM has contact with Microsoft that may be able to clarify? They make no guarantees against breaking things in any case.)

I would assume the same unless we hear otherwise.

Thanks for the references to the ABIs for __int128. It's good to know that if we decide LLVM's i128 should match that, we can make the same change everywhere.

Here's confirmation that _BitInt(128) should be 8-byte aligned and not 16 (so, different from __int128) from https://gitlab.com/x86-psABIs/x86-64-ABI:

• For N > 64, they are treated as struct of 64-bit integer chunks. The number of
chunks is the smallest number that can contain the type. _BitInt(N) types are
byte-aligned to 64 bits. The size of these types is the smallest multiple of the 64-bit
chunks greater than or equal to N.
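The arithmetic in that rule can be sketched as follows. This is only an illustration of the quoted text for N > 64, not code from any compiler, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of _BitInt(n) for n > 64 under the quoted rule:
   the smallest whole number of 64-bit chunks that can hold n bits. */
size_t bitint_size_bytes(unsigned n) {
    assert(n > 64);                        /* the quoted rule only covers N > 64 */
    size_t chunks = ((size_t)n + 63) / 64; /* round up to whole chunks */
    return chunks * 8;
}

/* Alignment in bytes for n > 64 under the quoted rule: always 64 bits. */
size_t bitint_align_bytes(unsigned n) {
    assert(n > 64);
    return 8;
}
```

For N = 128 this gives a size of 16 bytes with 8-byte alignment, which is where _BitInt(128) and __int128 part ways.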

Just FYI. There are a few reports about the compatibility issues, e.g., #41784. There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

In D86310#4516876, @pengfei wrote:

Just FYI. There are a few reports about the compatibility issues, e.g., #41784.

Thanks. This is a case where clang breaks up __int128 into 2x i64 before it gets to LLVM. It is therefore not affected by this patch. Your other link also references #20283, which is the same issue of clang breaking __int128 into 2x i64.

Although this patch will not fix those issues, it may make it easier to fix them later on: it will give clang the ability to use LLVM's i128 type rather than trying to emulate it.

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

In D86310#4516911, @hvdijk wrote:

In D86310#4516876, @pengfei wrote:

Just FYI. There are a few reports about the compatibility issues, e.g., #41784.

Thanks. This is a case where clang breaks up __int128 into 2x i64 before it gets to LLVM. It is therefore not affected by this patch. Your other link also references #20283, which is the same issue of clang breaking __int128 into 2x i64.

Although this patch will not fix those issues, it may make it easier to fix them later on: it will give clang the ability to use LLVM's i128 type rather than trying to emulate it.

Does this happen on the clang side or the LLVM side? I built rustc against LLVM with your patch ([link to source](llvm.org/docs/LangRef.html#floating-point-types)) and it makes rustc compatible with clang (progress!) but it still seems not compatible with GCC. That is, after the patch rustc now seems to have an identical calling behavior to clang, so I'm thinking that maybe this behavior lies somewhere in LLVM and not the frontend?

Quick ABI check that demonstrates this https://github.com/tgross35/quick-abi-check, the outputs of note (clang-new is built with this patch):

# all caller-foo-callee-samefoo work fine

+ ./bins/caller-gcc-callee-gcc
caller cc: gcc 11.3.0
caller align i128 16
caller arg0 244
caller argval 0xf0e0d0c0b0a09080706050403020100
caller arg15 123456.125000
callee cc: gcc 11.3.0
callee arg0 244
callee arg1 0xf0e0d0c0b0a09080706050403020100
callee arg2 0xf0e0d0c0b0a09080706050403020100
callee arg3 0xf0e0d0c0b0a09080706050403020100
callee arg4 0xf0e0d0c0b0a09080706050403020100
callee arg15 123456.125000

# between clang and gcc arg3+ seem to flip the word order?
+ ./bins/caller-gcc-callee-clang-old
caller cc: gcc 11.3.0
caller align i128 16
caller arg0 244
caller argval 0xf0e0d0c0b0a09080706050403020100
caller arg15 123456.125000
callee cc: clang 14.0.0 
callee arg0 244
callee arg1 0xf0e0d0c0b0a09080706050403020100
callee arg2 0xf0e0d0c0b0a09080706050403020100
callee arg3 0x7060504030201000000000000000000
callee arg4 0x7060504030201000f0e0d0c0b0a0908
callee arg15 123456.125000

+ ./bins/caller-gcc-callee-clang-new
caller cc: gcc 11.3.0
caller align i128 16
caller arg0 244
caller argval 0xf0e0d0c0b0a09080706050403020100
caller arg15 123456.125000
callee cc: clang 17.0.0 (git@github.com:tgross35/llvm-project.git 1733d949633a61cd0213f63e22d461a39e798946)
callee arg0 244
callee arg1 0xf0e0d0c0b0a09080706050403020100
callee arg2 0xf0e0d0c0b0a09080706050403020100
callee arg3 0x7060504030201000000000000000000
callee arg4 0x7060504030201000f0e0d0c0b0a0908
callee arg15 123456.125000

I think this patch can stand on its own even if it doesn't fix the above, but I'm just trying to get a better idea of where it's coming from if anyone knows more details.

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

I reached out to the person who authored that. We will follow up with the mailing list to at least make a mild push for _BitInt(128) to take the alignment of __int128 (imagine the stackoverflow confusion if they aren't the same). However, I'm in agreement with the above comment - that is a separate concern from the behavior here since LLVM already complies with the bitint spec.

In D86310#4519549, @tmgross wrote:

Does this happen on the clang side or the LLVM side?

Definitely on the clang side, but...

I built rustc against LLVM with your patch ([link to source](llvm.org/docs/LangRef.html#floating-point-types)) and it makes rustc compatible with clang (progress!) but it still seems not compatible with GCC. That is, after the patch rustc now seems to have an identical calling behavior to clang, so I'm thinking that maybe this behavior lies somewhere in LLVM and not the frontend?

...this suggests that potentially the same thing that clang is doing, LLVM is also doing independently. In which case, maybe it would be better to fix that at the same time: if we decide that LLVM's i128 should match __int128, I'd rather have a single change to the ABI to make it match __int128, rather than incremental changes, because incremental changes make it more likely that someone is going to be using the intermediate version and relying on its ABI. It'll be a little while before I can look into this but I'll try to come up with a patch to apply on top of this if no one else gets to it first.

Quick ABI check that demonstrates this https://github.com/tgross35/quick-abi-check, the outputs of note (clang-new is built with this patch):

Thanks, this is useful as an extra test.

@nikic posted a patch that fixes the register passing at https://reviews.llvm.org/D158169. I think that patch plus this one should resolve all the problems we have

In D86310#4595996, @tmgross wrote:

@nikic posted a patch that fixes the register passing at https://reviews.llvm.org/D158169. I think that patch plus this one should resolve all the problems we have

Thanks for the link, that will save a lot of time. I don't think it will resolve all the problems, but that it's a significant additional step in the right direction. We also need to make clang use i128 for __int128 in order to actually use this fixed calling convention.

Is clang still doing something wrong? From my testing, it seems like clang and GCC now agree with each other, I am not sure what would still be incorrect

In D86310#4596712, @tmgross wrote:

Is clang still doing something wrong? From my testing, it seems like clang and GCC now agree with each other, I am not sure what would still be incorrect

My understanding is that the code clang generates for __int128 will still allow it to be passed half-in-register, half-in-memory, exactly what D158169 sets out to fix, because D158169 only fixes it for LLVM's i128 which clang bypasses.

In D86310#4596730, @hvdijk wrote:

My understanding is that the code clang generates for __int128 will still allow it to be passed half-in-register, half-in-memory, exactly what D158169 sets out to fix, because D158169 only fixes it for LLVM's i128 which clang bypasses.

I think that D158169 seems to have fixed clang as well; after applying both patches, clang, gcc, and rustc all seem to agree. On the readme for https://github.com/tgross35/quick-abi-check, look at the tests i128-caller-gcc-callee-clang-old (args don't align), i128-caller-gcc-callee-clang-new (args are the same), and i128-caller-gcc-callee-rustc (args are the same). Also, the full ABI checker seems to say everything is in order (https://github.com/rust-lang/rust/pull/113880#issuecomment-1683021483; not sure why it says "4 failed" at the end, but I think it's a bug since no tests actually show as failed).

Does this all seem correct? As far as I can tell, it seems like with both patches these issues should be resolved.

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches? I cannot reproduce that failure for some reason, but it would likely make a good run-pass test.

These two patches do not seem to fix varargs segfaulting, as documented in https://bugs.llvm.org/show_bug.cgi?id=19909 (testing with this code https://godbolt.org/z/WeE7TvrGe) so it seems like that will need a separate fix.

In D86310#4596841, @tmgross wrote:

I think that D158169 seems to have fixed clang as well; after applying both patches, clang gcc and rustc all seem to agree.

Interesting. I cannot see how it would, I may be missing something; I will check when I am able.

In D86310#4596932, @tmgross wrote:

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches?

Yes, it was (at least it was at the time that I initially commented).

I cannot reproduce that failure for some reason, but it would likely make a good run-pass test.

It's reproducible online, https://godbolt.org/z/j918EeoMv, it would be interesting to know why it does not fail for you.

These two patches do not seem to fix varargs segfaulting, as documented in https://bugs.llvm.org/show_bug.cgi?id=19909 (testing with this code https://godbolt.org/z/WeE7TvrGe) so it seems like that will need a separate fix.

Thanks, and clang appears to avoid the use of the LLVM va_arg instruction here; we'll have to make sure to adapt that example to the LLVM IR equivalent that does use va_arg to make sure that's tested as well, and fixed if needed.

In D86310#4597359, @hvdijk wrote:

I cannot reproduce that failure for some reason, but it would likely make a good run-pass test.

It's reproducible online, https://godbolt.org/z/j918EeoMv, it would be interesting to know why it does not fail for you.

I tested both on my machine and in a container (debian docker image, then installing clang and gcc-multilib only) and can't get it to reproduce. Weird.

$ clang --version
Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ cat stack-fail.c
int main(void) {
  long double ld = 0;
  __float128 f128 = ld;
  int i = f128;
  return i;
}
$ clang -m32 stack-fail.c && ./a.out && echo ok
ok

Attachments: stack-fail.s (1 KB), stack-fail.ll (1 KB)

The IR looks about the same as on godbolt but maybe the attributes are affecting something. It's probably still doing the wrong thing, just not segfaulting for whatever reason

These two patches do not seem to fix varargs segfaulting, as documented in https://bugs.llvm.org/show_bug.cgi?id=19909 (testing with this code https://godbolt.org/z/WeE7TvrGe) so it seems like that will need a separate fix.

Thanks, and clang appears to avoid the use of the LLVM va_arg instruction here; we'll have to make sure to adapt that example to the LLVM IR equivalent that does use va_arg to make sure that's tested as well, and fixed if needed.

Are you comfortable landing these two patches without fixing varargs, since it seems like those need separate work? Not sure if llvm follows kernel conventions but assuming yes and assuming you are OK with however D158169 seems to fix clang, you can add Tested-by: Trevor Gross <tmgross@umich.edu>: to the best of my knowledge the alignment, ABI, and general interop problems on x86_64 have been resolved for both LLVM and Clang.

tmgross mentioned this in D158169: [X86] Fix i128 argument passing under SysV ABI.Aug 17 2023, 7:52 PM

slanterns added a subscriber: slanterns.Aug 19 2023, 11:17 AM

In D86310#4597359, @hvdijk wrote:

In D86310#4596841, @tmgross wrote:

I think that D158169 seems to have fixed clang as well; after applying both patches, clang gcc and rustc all seem to agree.

Interesting. I cannot see how it would, I may be missing something; I will check when I am able.

D158169 landed today. I confirmed that the current main (with D158169) makes Clang <-> GCC work, but LLVM still fails without this patch.

Doesn't clang just wind up going through the same tablegen as LLVM, so it makes sense that both would be fixed?

In D86310#4596932, @tmgross wrote:

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches?

Yes, it was (at least it was at the time that I initially commented).

You mean this patch only right - how does that work? Looking closer at your comments there, it doesn't seem like i128 changes would affect anything if the f128 return alignment is the source of the problem.

h-vetinari added a subscriber: h-vetinari.Aug 22 2023, 12:21 AM

In D86310#4605475, @tmgross wrote:

In D86310#4597359, @hvdijk wrote:

In D86310#4596841, @tmgross wrote:

I think that D158169 seems to have fixed clang as well; after applying both patches, clang gcc and rustc all seem to agree.

Interesting. I cannot see how it would, I may be missing something; I will check when I am able.

D158169 landed today. I confirmed that the current main (with D158169) makes Clang <-> GCC work, but LLVM still fails without this patch.

I had hoped to avoid the piecewise ABI breakage, but with that already having landed, we already have that anyway, so I no longer see a reason to delay this until we can also fix va_arg.

Doesn't clang just wind up going through the same tablegen as LLVM, so it makes sense that both would be fixed?

Actually able to look into this now again, and yes, it does. I was sure I'd seen clang expand __int128 so that at the LLVM level, there was no longer any i128, but it does not happen here, and because it does not happen here, this patch does fix it.

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches?

Yes, it was (at least it was at the time that I initially commented).

You mean this patch only right - how does that work? Looking closer at your comments there, it doesn't seem like i128 changes would affect anything if the f128 return alignment is the source of the problem.

See the source code comment I quoted in https://bugs.llvm.org/show_bug.cgi?id=50198#c3: "If the target does not have native f128 support, expand it to i128 and we will be generating soft float library calls." This applies to x86. f128 is expanded to i128, so any changes to the alignment for i128 automatically apply to f128 as well.

In D86310#4516911, @hvdijk wrote:

In D86310#4516876, @pengfei wrote:

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

The discussion has since moved to the list (https://groups.google.com/g/x86-64-abi/c/-JeR9HgUU20) and it seems as if the alignment of __int128 is fixed, no changes are planned there; if anything changes, it will be the alignment of _BitInt(128), and that will be independent of this patch.

Based on this, I now do think again the right course of action is to just commit this. It still applies to current LLVM without changes, and passes tests.

The point that is still contentious is the handling of IR generated from older versions of LLVM that do not have this patch. Personally, I feel that D158169 being accepted already answered how to handle this. D158169 clearly broke the ABI in LLVM: code generated with the current version of LLVM is not binary compatible with code generated with older versions of LLVM. But that is considered acceptable when the code generated by these older versions of LLVM was buggy and we have no reason to expect that there is code out there that relies on that bug remaining unfixed. The same logic applies here.

See the source code comment I quoted in https://bugs.llvm.org/show_bug.cgi?id=50198#c3: "If the target does not have native f128 support, expand it to i128 and we will be generating soft float library calls." This applies to x86. f128 is expanded to i128, so any changes to the alignment for i128 automatically apply to f128 as well.

Thank you for the explanation, that makes sense.

In D86310#4516911, @hvdijk wrote:

In D86310#4516876, @pengfei wrote:

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

The discussion has since moved to the list (https://groups.google.com/g/x86-64-abi/c/-JeR9HgUU20) and it seems as if the alignment of __int128 is fixed, no changes are planned there; if anything changes, it will be the alignment of _BitInt(128), and that will be independent of this patch.

Agreed; LLVM is doing the wrong thing with i128 and the correct thing for _BitInt(128), so _BitInt has no bearing on this change.

Based on this, I now do think again the right course of action is to just commit this. It still applies to current LLVM without changes, and passes tests.

The point that is still contentious is the handling of IR generated from older versions of LLVM that do not have this patch. Personally, I feel that D158169 being accepted already answered how to handle this. D158169 clearly broke the ABI in LLVM: code generated with the current version of LLVM is not binary compatible with code generated with older versions of LLVM. But that is considered acceptable when the code generated by these older versions of LLVM was buggy and we have no reason to expect that there is code out there that relies on that bug remaining unfixed. The same logic applies here.

Also agreed with this. I think the consensus on this thread is in agreement with the current patch too. Looking forward to the land :)

Is this just waiting for a review?

In D86310#4648446, @tmgross wrote:

Is this just waiting for a review?

Yes, I think so. Valid concerns over compatibility were raised, but now that strict compatibility with i128 has already been broken anyway, I no longer believe there is any reason not to just apply this as is, preferably soon so as to minimise the time that we have the current partially changed i128 ABI, to minimise the chance that people will start to rely on that somewhere.

I'm happy to sign off on the x86-64 part here, but I'm less sure about x86-32. If I understood correctly, the i128 alignment is raised there exclusively to fix the "f128 legalized to i128 libcall" case -- is there any other ABI requirement for i128 alignment on x86-32? Is raising i128 alignment the right way to fix an f128 issue?

In D86310#4648634, @nikic wrote:

I'm happy to sign off on the x86-64 part here, but I'm less sure about x86-32. If I understood correctly, the i128 alignment is raised there exclusively to fix the "f128 legalized to i128 libcall" case -- is there any other ABI requirement for i128 alignment on x86-32? Is raising i128 alignment the right way to fix an f128 issue?

GCC does not support __int128 on x86-32, but clang has the -fforce-enable-int128 option, and when that is used, it gives it the same 16-byte alignment that it does on x86-64, so even ignoring the _Float128 issue, I think the change is right for x86-32.

In D86310#4648646, @hvdijk wrote:

In D86310#4648634, @nikic wrote:

I'm happy to sign off on the x86-64 part here, but I'm less sure about x86-32. If I understood correctly, the i128 alignment is raised there exclusively to fix the "f128 legalized to i128 libcall" case -- is there any other ABI requirement for i128 alignment on x86-32? Is raising i128 alignment the right way to fix an f128 issue?

GCC does not support __int128 on x86-32, but clang has the -fforce-enable-int128 option, and when that is used, it gives it the same 16-byte alignment that it does on x86-64, so even ignoring the _Float128 issue, I think the change is right for x86-32.

Okay, that's a compelling argument.

llvm/lib/IR/AutoUpgrade.cpp
5233	I don't think this will work for the 32-bit targets that don't have `-i64:64`.

hvdijk added inline comments.Sep 20 2023, 10:03 AM

llvm/lib/IR/AutoUpgrade.cpp
5233	Oh, you're right, thanks. That was intentional but wrong: there is a test that we do not upgrade data layout strings that do not look sufficiently close to valid, and this was intended to address that. But this also avoids it for data layout strings that do need upgrading. I'll have to figure out how to handle both; will update when I know how.

I do not think there is a sensible way to keep upgrade-datalayout2.ll working, with the way the upgrade logic is structured, and we should rethink that test. The change here intends to insert -i128:128- into x86 data layouts that do not have it. The goal of upgrade-datalayout2.ll is to test that data layouts that are not valid x86 data layouts do not get upgraded. However, I see no sensible logic by which we can say that in this particular case, we should not add it.

What's more, none of the data layout upgrades *ever* checked that the data layout was a valid x86 data layout, not even D67631 which added this test: it is easy to construct data layout strings that are not valid x86 data layout strings, that would already be upgraded by that very first version of UpgradeDataLayoutString, despite what the test claimed to check. So if we regard it as a bug to upgrade invalid target data layout strings, this is a pre-existing bug. Alternatively, we can choose to not regard it as a bug, and instead say the test is invalid. I do not know the rationale here, but given that it was explicitly said to be intended to work this way, I am on the side of seeing it as a pre-existing bug. One that is nearly impossible to fix in the current structure.

Now that there can only be one valid data layout string per target, if it is intended that UpgradeDataLayoutString only upgrade target-valid data layout strings, it is a bug for UpgradeDataLayoutString to ever produce anything other than 1) its input or 2) the target's one valid data layout string. This allows a much simpler implementation that completely fixes the bug, but is too big to be part of this change. I would like to propose that in this change, we change UpgradeDataLayoutString to insert -i128:128- including in that one test, and we XFAIL upgrade-datalayout2.ll since the uncovered bug is not actually a new bug. In a followup PR, I can then restructure the UpgradeDataLayoutString logic by removing the function entirely and instead having target functions to check whether a given data layout string is a valid historic data layout string for the target that should be upgraded, and if so, simply clobbering the data layout string with what the target reports is the correct data layout string. (Edit: Proof of concept: https://github.com/hvdijk/llvm-project/commit/14e7f5dd2b8de862773b0700bde483bd722e4ad5. This does not update the tests that actually rely on invalid data layout strings being upgraded, and does not merge this new computeX86DataLayout with the existing computeDataLayout in X86TargetMachine.cpp, but should make it clear it can be done without difficulty.)

Does that seem reasonable? Am I overlooking anything that would make this a non-option? Are there good alternatives that I am not seeing right now?

Regarding upgrade-datalayout2.ll, I don't think we need to be too constrained by it. @akhuang , do you recall why you added it?

In other words, I think your direction is reasonable, we should go forward with this.

Updated AutoUpgrade.cpp to also upgrade 32-bit data layout strings.
Marked upgrade-datalayout2.ll as XFAIL with an explanation.
Added upgrade-datalayout5.ll to check that we upgrade 32-bit data layout strings.
Removed the test in DataLayoutUpgradeTest.cpp asserting that e-p:32:32 is not upgraded; it now gets upgraded to e-p:32:32-i128:128.

In D86310#4652817, @rnk wrote:

Regarding upgrade-datalayout2.ll, I don't think we need to be too constrained by it. @akhuang , do you recall why you added it?

In other words, I think your direction is reasonable, we should go forward with this.

Thanks, I have now done this. The same problem turned out to also exist in DataLayoutUpgradeTest.cpp, where XFAIL is not an option, so I removed that one. That is not ideal but I do not see what alternative we have.

hvdijk added inline comments. Oct 8 2023, 6:24 PM

llvm/lib/IR/AutoUpgrade.cpp
5233	This should now be fixed. X86 data layout strings always have their components in the same order, `mpifnaS`, where some may be omitted. I make use of this by looking for any leading `-m`/`-p`/`-i` components and inserting `-i128:128` after the last of those.
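
The insertion rule described in this comment can be sketched without any LLVM dependencies. This is an illustration under stated assumptions, not the committed AutoUpgrade.cpp code: the function name `insertI128Alignment` is hypothetical, and the sketch assumes a little-endian layout whose first component is `e`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: insert "i128:128" after the last leading m/p/i
// component, relying on the fixed component order "mpifnaS".
inline std::string insertI128Alignment(const std::string &DL) {
  // Leave strings alone if they already specify an i128 alignment.
  if (DL.find("-i128:") != std::string::npos)
    return DL;

  // Split the layout into its '-'-separated components.
  std::vector<std::string> Parts;
  std::size_t Start = 0;
  while (true) {
    std::size_t Dash = DL.find('-', Start);
    if (Dash == std::string::npos) {
      Parts.push_back(DL.substr(Start));
      break;
    }
    Parts.push_back(DL.substr(Start, Dash - Start));
    Start = Dash + 1;
  }

  // Assumption: only little-endian layouts starting with "e" are handled.
  if (Parts[0] != "e")
    return DL;

  // Skip the leading m, p and i components.
  std::size_t I = 1;
  while (I < Parts.size() && !Parts[I].empty() &&
         (Parts[I][0] == 'm' || Parts[I][0] == 'p' || Parts[I][0] == 'i'))
    ++I;
  Parts.insert(Parts.begin() + I, "i128:128");

  // Join the components back together.
  std::string Result = Parts[0];
  for (std::size_t J = 1; J < Parts.size(); ++J)
    Result += "-" + Parts[J];
  return Result;
}
```

For the 32-bit case raised earlier in the review, `e-p:32:32` has no `-i64:64` to anchor on, yet the rule still applies: the new component lands after `p:32:32`, giving `e-p:32:32-i128:128`.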

Harbormaster completed remote builds in B257789: Diff 557645. Oct 8 2023, 6:54 PM

I re-read the code review, and I think most folks are in favor of this change, but I may have missed some. Many concerns were raised, so please wait for approval from @efriedma as well before landing.

This revision is now accepted and ready to land. Oct 9 2023, 1:28 PM

Given the complexity here, I agree this is probably the best we can reasonably do. Code and test changes LGTM.

That said, this is missing a release note.

In D86310#4653537, @efriedma wrote:

Given the complexity here, I agree this is probably the best we can reasonably do. Code and test changes LGTM.

That said, this is missing a release note.

Thanks, added a release note. I'm not sure how much detail release notes usually go into, can shorten it or add to it if you like.

Explicitly still ok with this as well. Thanks for continuing here. :)

Harbormaster completed remote builds in B257796: Diff 557655. Oct 9 2023, 6:10 PM

Tested that this patch applied on top of main fixes all i128 ABI issues among gcc, clang, and rustc. Probably would be good to add https://bugs.llvm.org/show_bug.cgi?id=50198 to the test suite if it isn't there already.

Thanks for sticking with this Harald!

In D86310#4653550, @tmgross wrote:

Probably would be good to add https://bugs.llvm.org/show_bug.cgi?id=50198 to the test suite if it isn't there already.

That test would not work as an LLVM test directly, but we do already have lit tests that cover it; the test changes in here show the fixed alignment of f128 too.

This was ready to push pending @efriedma's approval, who rightly pointed out a release note was missing but it was otherwise okay. With the release note now added, I think that there is nothing stopping this from being pushed, so I intend to do so once I am able to rebase one hopefully last time and re-run tests to verify no new tests have been added that also require an update. Thanks for the feedback, everyone.

Closed by commit rGa21abc782a8e: [X86] Align i128 to 16 bytes in x86 datalayouts (authored by hvdijk). Oct 11 2023, 2:24 AM

This revision was automatically updated to reflect the committed changes.

hvdijk added a commit: rGa21abc782a8e: [X86] Align i128 to 16 bytes in x86 datalayouts.

The buildbot found one more test that needed updating, that was disabled on my system. Created https://github.com/llvm/llvm-project/pull/68781 for that.

GitHub <noreply@github.com> mentioned this in rG20799fd57bcb: [OCaml][test] Use correct data layout string. (#68781). Oct 11 2023, 3:12 AM

leonardchan mentioned this in rG2ae3a7123048: Fix minimal-throw-catch.ll on x86 mac. Oct 12 2023, 1:35 PM

Hi there,

This change seems to be causing assertion failure in clang when a struct contains a _BitInt with length longer than 128 - https://godbolt.org/z/4jTrW4fcP .

In D86310#4656024, @Fznamznon wrote:

Hi there,

This change seems to be causing assertion failure in clang when a struct contains a _BitInt with length longer than 128 - https://godbolt.org/z/4jTrW4fcP .

Thanks for the report. This is a pre-existing problem for i128:128 targets -- the same assertion failure can be seen, without this change, for the same program with e.g. --target=aarch64 -Xclang -fexperimental-max-bitint-width=1024 -- so I don't think it's a problem in this change directly, but considering X86 is the only target that enables >128-bit bit integers by default, it became far more visible after this change and because of that, more important to fix. This is related to an open ABI issue, #60925, and I have added a comment there.

Revision Contents

Path                                                  Size

clang/lib/Basic/Targets/OSTargets.h                   2 lines
clang/lib/Basic/Targets/X86.h                         8 lines
clang/test/CodeGen/target-data.c                      4 lines
llvm/include/llvm/IR/AutoUpgrade.h                    2 lines
llvm/lib/IR/AutoUpgrade.cpp                           41 lines
llvm/lib/Target/X86/X86TargetMachine.cpp              5 lines
llvm/test/Bitcode/upgrade-datalayout.ll               4 lines
llvm/test/Bitcode/upgrade-datalayout3.ll              4 lines
llvm/test/CodeGen/X86/atomic-unordered.ll             18 lines
llvm/test/CodeGen/X86/bitcast-i256.ll                 2 lines
llvm/test/CodeGen/X86/catchpad-dynamic-alloca.ll      2 lines
llvm/test/CodeGen/X86/implicit-null-check.ll          16 lines
llvm/test/CodeGen/X86/legalize-shl-vec.ll             28 lines
llvm/test/CodeGen/X86/osx-private-labels.ll           2 lines
llvm/test/CodeGen/X86/scheduler-backtracking.ll       100 lines
llvm/test/CodeGen/X86/setcc-wide-types.ll             58 lines
llvm/test/CodeGen/X86/sret-implicit.ll                4 lines
llvm/test/CodeGen/X86/statepoint-vector.ll            4 lines
llvm/test/tools/llvm-lto2/X86/pipeline.ll             2 lines
llvm/test/tools/llvm-lto2/X86/slp-vectorize-pm.ll     2 lines
llvm/test/tools/llvm-lto2/X86/stats-file-option.ll    2 lines
llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp      16 lines

Diff 286858

clang/lib/Basic/Targets/OSTargets.h

Show First 20 Lines • Show All 783 Lines • ▼ Show 20 Lines	NaClTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
this->LongDoubleFormat = &llvm::APFloat::IEEEdouble();		this->LongDoubleFormat = &llvm::APFloat::IEEEdouble();
if (Triple.getArch() == llvm::Triple::arm) {		if (Triple.getArch() == llvm::Triple::arm) {
// Handled in ARM's setABI().		// Handled in ARM's setABI().
} else if (Triple.getArch() == llvm::Triple::x86) {		} else if (Triple.getArch() == llvm::Triple::x86) {
this->resetDataLayout("e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-"		this->resetDataLayout("e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-"
"i64:64-n8:16:32-S128");		"i64:64-n8:16:32-S128");
} else if (Triple.getArch() == llvm::Triple::x86_64) {		} else if (Triple.getArch() == llvm::Triple::x86_64) {
this->resetDataLayout("e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-"		this->resetDataLayout("e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-"
"i64:64-n8:16:32:64-S128");		"i64:64-i128:128-n8:16:32:64-S128");
} else if (Triple.getArch() == llvm::Triple::mipsel) {		} else if (Triple.getArch() == llvm::Triple::mipsel) {
// Handled on mips' setDataLayout.		// Handled on mips' setDataLayout.
} else {		} else {
assert(Triple.getArch() == llvm::Triple::le32);		assert(Triple.getArch() == llvm::Triple::le32);
this->resetDataLayout("e-p:32:32-i64:64");		this->resetDataLayout("e-p:32:32-i64:64");
}		}
}		}
};		};
▲ Show 20 Lines • Show All 84 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/X86.h

Show First 20 Lines • Show All 646 Lines • ▼ Show 20 Lines	X86_64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
PtrDiffType = IsX32 ? SignedInt : SignedLong;		PtrDiffType = IsX32 ? SignedInt : SignedLong;
IntPtrType = IsX32 ? SignedInt : SignedLong;		IntPtrType = IsX32 ? SignedInt : SignedLong;
IntMaxType = IsX32 ? SignedLongLong : SignedLong;		IntMaxType = IsX32 ? SignedLongLong : SignedLong;
Int64Type = IsX32 ? SignedLongLong : SignedLong;		Int64Type = IsX32 ? SignedLongLong : SignedLong;
RegParmMax = 6;		RegParmMax = 6;

// Pointers are 32-bit in x32.		// Pointers are 32-bit in x32.
resetDataLayout(IsX32 ? "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-"		resetDataLayout(IsX32 ? "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-"
"i64:64-f80:128-n8:16:32:64-S128"		"i64:64-i128:128-f80:128-n8:16:32:64-S128"
: IsWinCOFF ? "e-m:w-p270:32:32-p271:32:32-p272:64:"		: IsWinCOFF ? "e-m:w-p270:32:32-p271:32:32-p272:64:"
"64-i64:64-f80:128-n8:16:32:64-S128"		"64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
: "e-m:e-p270:32:32-p271:32:32-p272:64:"		: "e-m:e-p270:32:32-p271:32:32-p272:64:"
"64-i64:64-f80:128-n8:16:32:64-S128");		"64-i64:64-i128:128-f80:128-n8:16:32:64-S128");

// Use fpret only for long double.		// Use fpret only for long double.
RealTypeUsesObjCFPRet = (1 << TargetInfo::LongDouble);		RealTypeUsesObjCFPRet = (1 << TargetInfo::LongDouble);

// Use fp2ret for _Complex long double.		// Use fp2ret for _Complex long double.
ComplexLongDoubleUsesFP2Ret = true;		ComplexLongDoubleUsesFP2Ret = true;

// Make __builtin_ms_va_list available.		// Make __builtin_ms_va_list available.
▲ Show 20 Lines • Show All 175 Lines • ▼ Show 20 Lines
public:		public:
DarwinX86_64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)		DarwinX86_64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
: DarwinTargetInfo<X86_64TargetInfo>(Triple, Opts) {		: DarwinTargetInfo<X86_64TargetInfo>(Triple, Opts) {
Int64Type = SignedLongLong;		Int64Type = SignedLongLong;
// The 64-bit iOS simulator uses the builtin bool type for Objective-C.		// The 64-bit iOS simulator uses the builtin bool type for Objective-C.
llvm::Triple T = llvm::Triple(Triple);		llvm::Triple T = llvm::Triple(Triple);
if (T.isiOS())		if (T.isiOS())
UseSignedCharForObjCBool = false;		UseSignedCharForObjCBool = false;
resetDataLayout("e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:"		resetDataLayout("e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:"
"16:32:64-S128");		"16:32:64-S128");
}		}

bool handleTargetFeatures(std::vector<std::string> &Features,		bool handleTargetFeatures(std::vector<std::string> &Features,
DiagnosticsEngine &Diags) override {		DiagnosticsEngine &Diags) override {
if (!DarwinTargetInfo<X86_64TargetInfo>::handleTargetFeatures(Features,		if (!DarwinTargetInfo<X86_64TargetInfo>::handleTargetFeatures(Features,
Diags))		Diags))
return false;		return false;
▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

clang/test/CodeGen/target-data.c

	Show All 14 Lines
	// I686-CYGWIN: target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:32-n8:16:32-a:0:32-S32"			// I686-CYGWIN: target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:32-n8:16:32-a:0:32-S32"

	// RUN: %clang_cc1 -triple i686-pc-macho -emit-llvm -o - %s | \			// RUN: %clang_cc1 -triple i686-pc-macho -emit-llvm -o - %s | \
	// RUN: FileCheck --check-prefix=I686-MACHO %s			// RUN: FileCheck --check-prefix=I686-MACHO %s
	// I686-MACHO: target datalayout = "e-m:o-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"			// I686-MACHO: target datalayout = "e-m:o-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"

	// RUN: %clang_cc1 -triple x86_64-unknown-unknown -emit-llvm -o - %s | \			// RUN: %clang_cc1 -triple x86_64-unknown-unknown -emit-llvm -o - %s | \
	// RUN: FileCheck --check-prefix=X86_64 %s			// RUN: FileCheck --check-prefix=X86_64 %s
	// X86_64: target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			// X86_64: target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"

	// RUN: %clang_cc1 -triple xcore-unknown-unknown -emit-llvm -o - %s | \			// RUN: %clang_cc1 -triple xcore-unknown-unknown -emit-llvm -o - %s | \
	// RUN: FileCheck --check-prefix=XCORE %s			// RUN: FileCheck --check-prefix=XCORE %s
	// XCORE: target datalayout = "e-m:e-p:32:32-i1:8:32-i8:8:32-i16:16:32-i64:32-f64:32-a:0:32-n32"			// XCORE: target datalayout = "e-m:e-p:32:32-i1:8:32-i8:8:32-i16:16:32-i64:32-f64:32-a:0:32-n32"

	// RUN: %clang_cc1 -triple sparc-sun-solaris -emit-llvm -o - %s | \			// RUN: %clang_cc1 -triple sparc-sun-solaris -emit-llvm -o - %s | \
	// RUN: FileCheck %s --check-prefix=SPARC-V8			// RUN: FileCheck %s --check-prefix=SPARC-V8
	// SPARC-V8: target datalayout = "E-m:e-p:32:32-i64:64-f128:64-n32-S64"			// SPARC-V8: target datalayout = "E-m:e-p:32:32-i64:64-f128:64-n32-S64"
	▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
	// PS3: target datalayout = "E-m:e-p:32:32-i64:64-n32:64"			// PS3: target datalayout = "E-m:e-p:32:32-i64:64-n32:64"

	// RUN: %clang_cc1 -triple i686-nacl -o - -emit-llvm %s | \			// RUN: %clang_cc1 -triple i686-nacl -o - -emit-llvm %s | \
	// RUN: FileCheck %s -check-prefix=I686-NACL			// RUN: FileCheck %s -check-prefix=I686-NACL
	// I686-NACL: target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-n8:16:32-S128"			// I686-NACL: target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-n8:16:32-S128"

	// RUN: %clang_cc1 -triple x86_64-nacl -o - -emit-llvm %s | \			// RUN: %clang_cc1 -triple x86_64-nacl -o - -emit-llvm %s | \
	// RUN: FileCheck %s -check-prefix=X86_64-NACL			// RUN: FileCheck %s -check-prefix=X86_64-NACL
	// X86_64-NACL: target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-n8:16:32:64-S128"			// X86_64-NACL: target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n8:16:32:64-S128"

	// RUN: %clang_cc1 -triple arm-nacl -o - -emit-llvm %s | \			// RUN: %clang_cc1 -triple arm-nacl -o - -emit-llvm %s | \
	// RUN: FileCheck %s -check-prefix=ARM-NACL			// RUN: FileCheck %s -check-prefix=ARM-NACL
	// ARM-NACL: target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S128"			// ARM-NACL: target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S128"

	// RUN: %clang_cc1 -triple mipsel-nacl -o - -emit-llvm %s | \			// RUN: %clang_cc1 -triple mipsel-nacl -o - -emit-llvm %s | \
	// RUN: FileCheck %s -check-prefix=MIPS-NACL			// RUN: FileCheck %s -check-prefix=MIPS-NACL
	// MIPS-NACL: target datalayout = "e-m:m-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64"			// MIPS-NACL: target datalayout = "e-m:m-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64"
	▲ Show 20 Lines • Show All 153 Lines • Show Last 20 Lines

llvm/include/llvm/IR/AutoUpgrade.h

Show First 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	inline bool mayBeOldLoopAttachmentTag(StringRef Name) {
return Name.startswith("llvm.vectorizer.");		return Name.startswith("llvm.vectorizer.");
}		}

/// Upgrade the loop attachment metadata node.		/// Upgrade the loop attachment metadata node.
MDNode *upgradeInstructionLoopAttachment(MDNode &N);		MDNode *upgradeInstructionLoopAttachment(MDNode &N);

/// Upgrade the datalayout string by adding a section for address space		/// Upgrade the datalayout string by adding a section for address space
/// pointers.		/// pointers.
std::string UpgradeDataLayoutString(StringRef DL, StringRef Triple);		std::string UpgradeDataLayoutString(std::string DL, StringRef Triple);

/// Upgrade attributes that changed format or kind.		/// Upgrade attributes that changed format or kind.
void UpgradeAttributes(AttrBuilder &B);		void UpgradeAttributes(AttrBuilder &B);

} // End llvm namespace		} // End llvm namespace

#endif		#endif

llvm/lib/IR/AutoUpgrade.cpp

Show First 20 Lines • Show All 4,301 Lines • ▼ Show 20 Lines	MDNode *llvm::upgradeInstructionLoopAttachment(MDNode &N) {
SmallVector<Metadata *, 8> Ops;		SmallVector<Metadata *, 8> Ops;
Ops.reserve(T->getNumOperands());		Ops.reserve(T->getNumOperands());
for (Metadata *MD : T->operands())		for (Metadata *MD : T->operands())
Ops.push_back(upgradeLoopArgument(MD));		Ops.push_back(upgradeLoopArgument(MD));

return MDTuple::get(T->getContext(), Ops);		return MDTuple::get(T->getContext(), Ops);
}		}

std::string llvm::UpgradeDataLayoutString(StringRef DL, StringRef TT) {		std::string llvm::UpgradeDataLayoutString(std::string DLStr,
StringRef AddrSpaces = "-p270:32:32-p271:32:32-p272:64:64";		StringRef TripleStr) {

		Triple TT(TripleStr);

// If X86, and the datalayout matches the expected format, add pointer size		// We only have upgrades for X86.
// address spaces to the datalayout.		if (!TT.isX86())
if (!Triple(TT).isX86() || DL.contains(AddrSpaces))		return DLStr;
return std::string(DL);
		StringRef DL = DLStr;

		// We have two cases to handle. Missing address spaces and missing i128
		// alignment. We'll handle them separately.
		if (TT.isArch64Bit() && !DL.contains("-i128:128")) {
		[Inline comment] hvdijk (author): This needs to not be limited to `TT.isArch64Bit()`. i128 needs 16-byte alignment on all targets, and although clang disables `__int128` for X86, we still use it for lowering f128.
		auto I = DL.find("-i64:64-");
		if (I != StringRef::npos) {
		// Insert just before the - at the end of the string we matched.
		DLStr = (DL.take_front(I + 7) + "-i128:128" + DL.drop_front(I + 7)).str();
		DL = DLStr;
		}
		}

		StringRef AddrSpaces = "-p270:32:32-p271:32:32-p272:64:64";
		if (!DL.contains(AddrSpaces)) {
SmallVector<StringRef, 4> Groups;		SmallVector<StringRef, 4> Groups;
Regex R("(e-m:[a-z](-p:32:32)?)(-[if]64:.*$)");		Regex R("(e-m:[a-z](-p:32:32)?)(-[if]64:.*$)");
if (!R.match(DL, &Groups))		if (R.match(DL, &Groups)) {
return std::string(DL);		DLStr = (Groups[1] + AddrSpaces + Groups[3]).str();
		DL = DLStr;
		}
		}

return (Groups[1] + AddrSpaces + Groups[3]).str();		return DLStr;
}		}

void llvm::UpgradeAttributes(AttrBuilder &B) {		void llvm::UpgradeAttributes(AttrBuilder &B) {
StringRef FramePointer;		StringRef FramePointer;
if (B.contains("no-frame-pointer-elim")) {		if (B.contains("no-frame-pointer-elim")) {
// The value can be "true" or "false".		// The value can be "true" or "false".
for (const auto &I : B.td_attrs())		for (const auto &I : B.td_attrs())
if (I.first == "no-frame-pointer-elim")		if (I.first == "no-frame-pointer-elim")
Show All 14 Lines	if (B.contains("null-pointer-is-valid")) {
bool NullPointerIsValid = false;		bool NullPointerIsValid = false;
for (const auto &I : B.td_attrs())		for (const auto &I : B.td_attrs())
if (I.first == "null-pointer-is-valid")		if (I.first == "null-pointer-is-valid")
NullPointerIsValid = I.second == "true";		NullPointerIsValid = I.second == "true";
B.removeAttribute("null-pointer-is-valid");		B.removeAttribute("null-pointer-is-valid");
if (NullPointerIsValid)		if (NullPointerIsValid)
B.addAttribute(Attribute::NullPointerIsValid);		B.addAttribute(Attribute::NullPointerIsValid);
}		}
}		}
		[Inline comment] nikic: I don't think this will work for the 32-bit targets that don't have `-i64:64`.
		[Inline comment] hvdijk (author): Oh, you're right, thanks. That was intentional but wrong: there is a test that we do not upgrade data layout strings that do not look sufficiently close to valid, and this was intended to address that. But this also avoids it for data layout strings that do need upgrading. I'll have to figure out how to handle both; will update when I know how.
		[Inline comment] hvdijk (author): This should now be fixed. X86 data layout strings always have their components in the same order, `mpifnaS`, where some may be omitted. I make use of this by looking for any leading `-m`/`-p`/`-i` components and inserting `-i128:128` after the last of those.

llvm/lib/Target/X86/X86TargetMachine.cpp

Show First 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	static std::string computeDataLayout(const Triple &TT) {
// Some ABIs align 64 bit integers and doubles to 64 bits, others to 32.		// Some ABIs align 64 bit integers and doubles to 64 bits, others to 32.
if (TT.isArch64Bit() || TT.isOSWindows() || TT.isOSNaCl())		if (TT.isArch64Bit() || TT.isOSWindows() || TT.isOSNaCl())
Ret += "-i64:64";		Ret += "-i64:64";
else if (TT.isOSIAMCU())		else if (TT.isOSIAMCU())
Ret += "-i64:32-f64:32";		Ret += "-i64:32-f64:32";
else		else
Ret += "-f64:32:64";		Ret += "-f64:32:64";

		// 128 bit integers are always aligned to 128 bits, but only 64-bit matters,
		// because __int128 is only supported on 64-bit targets.
		if (TT.isArch64Bit())
		Ret += "-i128:128";

// Some ABIs align long double to 128 bits, others to 32.		// Some ABIs align long double to 128 bits, others to 32.
if (TT.isOSNaCl() || TT.isOSIAMCU())		if (TT.isOSNaCl() || TT.isOSIAMCU())
; // No f80		; // No f80
else if (TT.isArch64Bit() || TT.isOSDarwin())		else if (TT.isArch64Bit() || TT.isOSDarwin())
Ret += "-f80:128";		Ret += "-f80:128";
else		else
Ret += "-f80:32";		Ret += "-f80:32";

▲ Show 20 Lines • Show All 442 Lines • Show Last 20 Lines

llvm/test/Bitcode/upgrade-datalayout.ll

	; Test to make sure datalayout is automatically upgraded.			; Test to make sure datalayout is automatically upgraded.
	;			;
	; RUN: llvm-as %s -o - | llvm-dis - | FileCheck %s			; RUN: llvm-as %s -o - | llvm-dis - | FileCheck %s

	target datalayout = "e-m:e-p:32:32-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p:32:32-i64:64-i128:128-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; CHECK: target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			; CHECK: target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"

llvm/test/Bitcode/upgrade-datalayout3.ll

	; Test to make sure datalayout is automatically upgraded.			; Test to make sure datalayout is automatically upgraded.
	;			;
	; RUN: llvm-as %s -o - | llvm-dis - | FileCheck %s			; RUN: llvm-as %s -o - | llvm-dis - | FileCheck %s

	target datalayout = "e-m:w-p:32:32-i64:64-f80:32-n8:16:32-S32"			target datalayout = "e-m:w-p:32:32-i64:64-i128:128-f80:32-n8:16:32-S32"
	target triple = "i686-pc-windows-msvc"			target triple = "i686-pc-windows-msvc"

	; CHECK: target datalayout = "e-m:w-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:32-n8:16:32-S32"			; CHECK: target datalayout = "e-m:w-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:32-n8:16:32-S32"

llvm/test/CodeGen/X86/atomic-unordered.ll

	Show First 20 Lines • Show All 317 Lines • ▼ Show 20 Lines
	; CHECK-O0-NEXT: .cfi_def_cfa_offset 64			; CHECK-O0-NEXT: .cfi_def_cfa_offset 64
	; CHECK-O0-NEXT: movq %rdi, %rax			; CHECK-O0-NEXT: movq %rdi, %rax
	; CHECK-O0-NEXT: movl $32, %ecx			; CHECK-O0-NEXT: movl $32, %ecx
	; CHECK-O0-NEXT: leaq {{[0-9]+}}(%rsp), %rdx			; CHECK-O0-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
	; CHECK-O0-NEXT: xorl %r8d, %r8d			; CHECK-O0-NEXT: xorl %r8d, %r8d
	; CHECK-O0-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill			; CHECK-O0-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
	; CHECK-O0-NEXT: movq %rcx, %rdi			; CHECK-O0-NEXT: movq %rcx, %rdi
	; CHECK-O0-NEXT: movl %r8d, %ecx			; CHECK-O0-NEXT: movl %r8d, %ecx
	; CHECK-O0-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill			; CHECK-O0-NEXT: movq %rax, (%rsp) # 8-byte Spill
	; CHECK-O0-NEXT: callq __atomic_load			; CHECK-O0-NEXT: callq __atomic_load
	; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rax			; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rax
	; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rcx			; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rcx
	; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rdx			; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rdx
	; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rsi			; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rsi
	; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdi # 8-byte Reload			; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdi # 8-byte Reload
	; CHECK-O0-NEXT: movq %rsi, 24(%rdi)			; CHECK-O0-NEXT: movq %rsi, 24(%rdi)
	; CHECK-O0-NEXT: movq %rdx, 16(%rdi)			; CHECK-O0-NEXT: movq %rdx, 16(%rdi)
	; CHECK-O0-NEXT: movq %rcx, 8(%rdi)			; CHECK-O0-NEXT: movq %rcx, 8(%rdi)
	; CHECK-O0-NEXT: movq %rax, (%rdi)			; CHECK-O0-NEXT: movq %rax, (%rdi)
	; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload			; CHECK-O0-NEXT: movq (%rsp), %rax # 8-byte Reload
	; CHECK-O0-NEXT: addq $56, %rsp			; CHECK-O0-NEXT: addq $56, %rsp
	; CHECK-O0-NEXT: .cfi_def_cfa_offset 8			; CHECK-O0-NEXT: .cfi_def_cfa_offset 8
	; CHECK-O0-NEXT: retq			; CHECK-O0-NEXT: retq
	;			;
	; CHECK-O3-LABEL: load_i256:			; CHECK-O3-LABEL: load_i256:
	; CHECK-O3: # %bb.0:			; CHECK-O3: # %bb.0:
	; CHECK-O3-NEXT: pushq %rbx			; CHECK-O3-NEXT: pushq %rbx
	; CHECK-O3-NEXT: .cfi_def_cfa_offset 16			; CHECK-O3-NEXT: .cfi_def_cfa_offset 16
	Show All 16 Lines
	; CHECK-O3-NEXT: retq			; CHECK-O3-NEXT: retq
	%v = load atomic i256, i256* %ptr unordered, align 16			%v = load atomic i256, i256* %ptr unordered, align 16
	ret i256 %v			ret i256 %v
	}			}

	define void @store_i256(i256* %ptr, i256 %v) {			define void @store_i256(i256* %ptr, i256 %v) {
	; CHECK-O0-LABEL: store_i256:			; CHECK-O0-LABEL: store_i256:
	; CHECK-O0: # %bb.0:			; CHECK-O0: # %bb.0:
	; CHECK-O0-NEXT: subq $40, %rsp			; CHECK-O0-NEXT: subq $56, %rsp
	; CHECK-O0-NEXT: .cfi_def_cfa_offset 48			; CHECK-O0-NEXT: .cfi_def_cfa_offset 64
	; CHECK-O0-NEXT: xorl %eax, %eax			; CHECK-O0-NEXT: xorl %eax, %eax
	; CHECK-O0-NEXT: leaq {{[0-9]+}}(%rsp), %r9			; CHECK-O0-NEXT: leaq {{[0-9]+}}(%rsp), %r9
	; CHECK-O0-NEXT: movq %rsi, {{[0-9]+}}(%rsp)			; CHECK-O0-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
	; CHECK-O0-NEXT: movq %rdx, {{[0-9]+}}(%rsp)			; CHECK-O0-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
	; CHECK-O0-NEXT: movq %rcx, {{[0-9]+}}(%rsp)			; CHECK-O0-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
	; CHECK-O0-NEXT: movq %r8, {{[0-9]+}}(%rsp)			; CHECK-O0-NEXT: movq %r8, {{[0-9]+}}(%rsp)
	; CHECK-O0-NEXT: movl $32, %ecx			; CHECK-O0-NEXT: movl $32, %ecx
	; CHECK-O0-NEXT: movq %rdi, (%rsp) # 8-byte Spill			; CHECK-O0-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
	; CHECK-O0-NEXT: movq %rcx, %rdi			; CHECK-O0-NEXT: movq %rcx, %rdi
	; CHECK-O0-NEXT: movq (%rsp), %rsi # 8-byte Reload			; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rsi # 8-byte Reload
	; CHECK-O0-NEXT: movq %r9, %rdx			; CHECK-O0-NEXT: movq %r9, %rdx
	; CHECK-O0-NEXT: movl %eax, %ecx			; CHECK-O0-NEXT: movl %eax, %ecx
	; CHECK-O0-NEXT: callq __atomic_store			; CHECK-O0-NEXT: callq __atomic_store
	; CHECK-O0-NEXT: addq $40, %rsp			; CHECK-O0-NEXT: addq $56, %rsp
	; CHECK-O0-NEXT: .cfi_def_cfa_offset 8			; CHECK-O0-NEXT: .cfi_def_cfa_offset 8
	; CHECK-O0-NEXT: retq			; CHECK-O0-NEXT: retq
	;			;
	; CHECK-O3-LABEL: store_i256:			; CHECK-O3-LABEL: store_i256:
	; CHECK-O3: # %bb.0:			; CHECK-O3: # %bb.0:
	; CHECK-O3-NEXT: subq $40, %rsp			; CHECK-O3-NEXT: subq $40, %rsp
	; CHECK-O3-NEXT: .cfi_def_cfa_offset 48			; CHECK-O3-NEXT: .cfi_def_cfa_offset 48
	; CHECK-O3-NEXT: movq %rdi, %rax			; CHECK-O3-NEXT: movq %rdi, %rax
	; CHECK-O3-NEXT: movq %r8, {{[0-9]+}}(%rsp)			; CHECK-O3-NEXT: movq %r8, {{[0-9]+}}(%rsp)
	; CHECK-O3-NEXT: movq %rcx, {{[0-9]+}}(%rsp)			; CHECK-O3-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
	; CHECK-O3-NEXT: movq %rdx, {{[0-9]+}}(%rsp)			; CHECK-O3-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
	; CHECK-O3-NEXT: movq %rsi, {{[0-9]+}}(%rsp)			; CHECK-O3-NEXT: movq %rsi, (%rsp)
	; CHECK-O3-NEXT: leaq {{[0-9]+}}(%rsp), %rdx			; CHECK-O3-NEXT: movq %rsp, %rdx
	; CHECK-O3-NEXT: movl $32, %edi			; CHECK-O3-NEXT: movl $32, %edi
	; CHECK-O3-NEXT: movq %rax, %rsi			; CHECK-O3-NEXT: movq %rax, %rsi
	; CHECK-O3-NEXT: xorl %ecx, %ecx			; CHECK-O3-NEXT: xorl %ecx, %ecx
	; CHECK-O3-NEXT: callq __atomic_store			; CHECK-O3-NEXT: callq __atomic_store
	; CHECK-O3-NEXT: addq $40, %rsp			; CHECK-O3-NEXT: addq $40, %rsp
	; CHECK-O3-NEXT: .cfi_def_cfa_offset 8			; CHECK-O3-NEXT: .cfi_def_cfa_offset 8
	; CHECK-O3-NEXT: retq			; CHECK-O3-NEXT: retq
	store atomic i256 %v, i256* %ptr unordered, align 16			store atomic i256 %v, i256* %ptr unordered, align 16
	▲ Show 20 Lines • Show All 2,340 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/bitcast-i256.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,-slow-unaligned-mem-32 | FileCheck %s --check-prefix=FAST			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,-slow-unaligned-mem-32 | FileCheck %s --check-prefix=FAST
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+slow-unaligned-mem-32 | FileCheck %s --check-prefix=SLOW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+slow-unaligned-mem-32 | FileCheck %s --check-prefix=SLOW

	define i256 @foo(<8 x i32> %a) {			define i256 @foo(<8 x i32> %a) {
	; FAST-LABEL: foo:			; FAST-LABEL: foo:
	; FAST: # %bb.0:			; FAST: # %bb.0:
	; FAST-NEXT: movq %rdi, %rax			; FAST-NEXT: movq %rdi, %rax
	; FAST-NEXT: vmovups %ymm0, (%rdi)			; FAST-NEXT: vmovups %ymm0, (%rdi)
	; FAST-NEXT: vzeroupper			; FAST-NEXT: vzeroupper
	; FAST-NEXT: retq			; FAST-NEXT: retq
	;			;
	; SLOW-LABEL: foo:			; SLOW-LABEL: foo:
	; SLOW: # %bb.0:			; SLOW: # %bb.0:
	; SLOW-NEXT: movq %rdi, %rax			; SLOW-NEXT: movq %rdi, %rax
	; SLOW-NEXT: vextractf128 $1, %ymm0, 16(%rdi)			; SLOW-NEXT: vextractf128 $1, %ymm0, 16(%rdi)
	; SLOW-NEXT: vmovups %xmm0, (%rdi)			; SLOW-NEXT: vmovaps %xmm0, (%rdi)
	; SLOW-NEXT: vzeroupper			; SLOW-NEXT: vzeroupper
	; SLOW-NEXT: retq			; SLOW-NEXT: retq
	%r = bitcast <8 x i32> %a to i256			%r = bitcast <8 x i32> %a to i256
	ret i256 %r			ret i256 %r
	}			}

llvm/test/CodeGen/X86/catchpad-dynamic-alloca.ll

	Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines

	catch.switch:			catch.switch:
	%cs = catchswitch within none [label %catch.pad] unwind to caller			%cs = catchswitch within none [label %catch.pad] unwind to caller
	}			}

	; CHECK-LABEL: $handlerMap$0$test2:			; CHECK-LABEL: $handlerMap$0$test2:
	; CHECK: .long 0			; CHECK: .long 0
	; CHECK-NEXT: .long 0			; CHECK-NEXT: .long 0
	; CHECK-NEXT: .long 8			; CHECK-NEXT: .long 16

llvm/test/CodeGen/X86/implicit-null-check.ll

Show First 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	not_null:
ret i8 %t		ret i8 %t
}		}

define i256 @imp_null_check_load_i256(i256* %x) {		define i256 @imp_null_check_load_i256(i256* %x) {
; CHECK-LABEL: imp_null_check_load_i256:		; CHECK-LABEL: imp_null_check_load_i256:
; CHECK: ## %bb.0: ## %entry		; CHECK: ## %bb.0: ## %entry
; CHECK-NEXT: movq %rdi, %rax		; CHECK-NEXT: movq %rdi, %rax
; CHECK-NEXT: Ltmp3:		; CHECK-NEXT: Ltmp3:
; CHECK-NEXT: movq (%rsi), %rcx ## on-fault: LBB5_1		; CHECK-NEXT: movaps (%rsi), %xmm0 ## on-fault: LBB5_1
; CHECK-NEXT: ## %bb.2: ## %not_null		; CHECK-NEXT: ## %bb.2: ## %not_null
; CHECK-NEXT: movq 8(%rsi), %rdx		; CHECK-NEXT: movaps 16(%rsi), %xmm1
; CHECK-NEXT: movq 16(%rsi), %rdi		; CHECK-NEXT: movaps %xmm1, 16(%rax)
; CHECK-NEXT: movq 24(%rsi), %rsi		; CHECK-NEXT: movaps %xmm0, (%rax)
; CHECK-NEXT: movq %rsi, 24(%rax)
; CHECK-NEXT: movq %rdi, 16(%rax)
; CHECK-NEXT: movq %rdx, 8(%rax)
; CHECK-NEXT: movq %rcx, (%rax)
; CHECK-NEXT: retq		; CHECK-NEXT: retq
; CHECK-NEXT: LBB5_1: ## %is_null		; CHECK-NEXT: LBB5_1: ## %is_null
; CHECK-NEXT: movq $0, 24(%rax)		; CHECK-NEXT: xorps %xmm0, %xmm0
; CHECK-NEXT: movq $0, 16(%rax)		; CHECK-NEXT: movaps %xmm0, 16(%rax)
; CHECK-NEXT: movq $0, 8(%rax)		; CHECK-NEXT: movq $0, 8(%rax)
; CHECK-NEXT: movq $42, (%rax)		; CHECK-NEXT: movq $42, (%rax)
; CHECK-NEXT: retq		; CHECK-NEXT: retq

entry:		entry:
%c = icmp eq i256* %x, null		%c = icmp eq i256* %x, null
br i1 %c, label %is_null, label %not_null, !make.implicit !0		br i1 %c, label %is_null, label %not_null, !make.implicit !0

▲ Show 20 Lines • Show All 410 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/legalize-shl-vec.ll

	[42 lines not shown]
	; X32-NEXT: retl $4			; X32-NEXT: retl $4
	;			;
	; X64-LABEL: test_shl:			; X64-LABEL: test_shl:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi
	; X64-NEXT: shldq $2, %rcx, %rdx			; X64-NEXT: shldq $2, %rdx, %rcx
	; X64-NEXT: shldq $2, %rdi, %rcx			; X64-NEXT: shldq $2, %rdi, %rdx
	; X64-NEXT: shldq $2, %r9, %rdi			; X64-NEXT: shldq $2, %r9, %rdi
	; X64-NEXT: shlq $63, %rsi			; X64-NEXT: shlq $63, %rsi
	; X64-NEXT: shlq $2, %r9			; X64-NEXT: shlq $2, %r9
	; X64-NEXT: movq %rdx, 56(%rax)			; X64-NEXT: movq %rcx, 56(%rax)
	; X64-NEXT: movq %rcx, 48(%rax)			; X64-NEXT: movq %rdx, 48(%rax)
	; X64-NEXT: movq %rdi, 40(%rax)			; X64-NEXT: movq %rdi, 40(%rax)
	; X64-NEXT: movq %r9, 32(%rax)			; X64-NEXT: movq %r9, 32(%rax)
	; X64-NEXT: movq %rsi, 24(%rax)			; X64-NEXT: movq %rsi, 24(%rax)
	; X64-NEXT: xorps %xmm0, %xmm0			; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: movaps %xmm0, (%rax)			; X64-NEXT: movaps %xmm0, (%rax)
	; X64-NEXT: movq $0, 16(%rax)			; X64-NEXT: movq $0, 16(%rax)
	; X64-NEXT: retq			; X64-NEXT: retq
	%Amt = insertelement <2 x i256> <i256 1, i256 2>, i256 255, i32 0			%Amt = insertelement <2 x i256> <i256 1, i256 2>, i256 255, i32 0
	[72 lines not shown]
	;			;
	; X64-LABEL: test_srl:			; X64-LABEL: test_srl:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi
	; X64-NEXT: shrdq $4, %rsi, %r9			; X64-NEXT: shrdq $4, %rsi, %r9
	; X64-NEXT: shrdq $4, %rcx, %rsi			; X64-NEXT: shrdq $4, %rdx, %rsi
				; X64-NEXT: shrdq $4, %rcx, %rdx
	; X64-NEXT: shrq $63, %r8			; X64-NEXT: shrq $63, %r8
	; X64-NEXT: shrdq $4, %rdx, %rcx			; X64-NEXT: shrq $4, %rcx
	; X64-NEXT: shrq $4, %rdx			; X64-NEXT: movq %rcx, 56(%rdi)
	; X64-NEXT: movq %rdx, 56(%rdi)			; X64-NEXT: movq %rdx, 48(%rdi)
	; X64-NEXT: movq %rcx, 48(%rdi)
	; X64-NEXT: movq %rsi, 40(%rdi)			; X64-NEXT: movq %rsi, 40(%rdi)
	; X64-NEXT: movq %r9, 32(%rdi)			; X64-NEXT: movq %r9, 32(%rdi)
	; X64-NEXT: movq %r8, (%rdi)			; X64-NEXT: movq %r8, (%rdi)
	; X64-NEXT: xorps %xmm0, %xmm0			; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: movaps %xmm0, 16(%rdi)			; X64-NEXT: movaps %xmm0, 16(%rdi)
	; X64-NEXT: movq $0, 8(%rdi)			; X64-NEXT: movq $0, 8(%rdi)
	; X64-NEXT: retq			; X64-NEXT: retq
	%Amt = insertelement <2 x i256> <i256 3, i256 4>, i256 255, i32 0			%Amt = insertelement <2 x i256> <i256 3, i256 4>, i256 255, i32 0
	[72 lines not shown]
	;			;
	; X64-LABEL: test_sra:			; X64-LABEL: test_sra:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi
	; X64-NEXT: shrdq $6, %rsi, %r9			; X64-NEXT: shrdq $6, %rsi, %r9
	; X64-NEXT: shrdq $6, %rcx, %rsi			; X64-NEXT: shrdq $6, %rdx, %rsi
				; X64-NEXT: shrdq $6, %rcx, %rdx
	; X64-NEXT: sarq $63, %r8			; X64-NEXT: sarq $63, %r8
	; X64-NEXT: shrdq $6, %rdx, %rcx			; X64-NEXT: sarq $6, %rcx
	; X64-NEXT: sarq $6, %rdx			; X64-NEXT: movq %rcx, 56(%rdi)
	; X64-NEXT: movq %rdx, 56(%rdi)			; X64-NEXT: movq %rdx, 48(%rdi)
	; X64-NEXT: movq %rcx, 48(%rdi)
	; X64-NEXT: movq %rsi, 40(%rdi)			; X64-NEXT: movq %rsi, 40(%rdi)
	; X64-NEXT: movq %r9, 32(%rdi)			; X64-NEXT: movq %r9, 32(%rdi)
	; X64-NEXT: movq %r8, 24(%rdi)			; X64-NEXT: movq %r8, 24(%rdi)
	; X64-NEXT: movq %r8, 16(%rdi)			; X64-NEXT: movq %r8, 16(%rdi)
	; X64-NEXT: movq %r8, 8(%rdi)			; X64-NEXT: movq %r8, 8(%rdi)
	; X64-NEXT: movq %r8, (%rdi)			; X64-NEXT: movq %r8, (%rdi)
	; X64-NEXT: retq			; X64-NEXT: retq
	%Amt = insertelement <2 x i256> <i256 5, i256 6>, i256 255, i32 0			%Amt = insertelement <2 x i256> <i256 5, i256 6>, i256 255, i32 0
	%Out = ashr <2 x i256> %In, %Amt			%Out = ashr <2 x i256> %In, %Amt
	ret <2 x i256> %Out			ret <2 x i256> %Out
	}			}

llvm/test/CodeGen/X86/osx-private-labels.ll

	[30 lines not shown]

	@private5 = private unnamed_addr constant i64 42			@private5 = private unnamed_addr constant i64 42
	; CHECK: .section __TEXT,__literal8,8byte_literals			; CHECK: .section __TEXT,__literal8,8byte_literals
	; CHECK-NEXT: .p2align 3			; CHECK-NEXT: .p2align 3
	; CHECK-NEXT: L_private5:			; CHECK-NEXT: L_private5:

	@private6 = private unnamed_addr constant i128 42			@private6 = private unnamed_addr constant i128 42
	; CHECK: .section __TEXT,__literal16,16byte_literals			; CHECK: .section __TEXT,__literal16,16byte_literals
	; CHECK-NEXT: .p2align 3			; CHECK-NEXT: .p2align 4
	; CHECK-NEXT: L_private6:			; CHECK-NEXT: L_private6:

	%struct._objc_class = type { i8* }			%struct._objc_class = type { i8* }
	@private7 = private global %struct._objc_class* null, section "__OBJC,__cls_refs,literal_pointers,no_dead_strip"			@private7 = private global %struct._objc_class* null, section "__OBJC,__cls_refs,literal_pointers,no_dead_strip"
	; CHECK: .section __OBJC,__cls_refs,literal_pointers,no_dead_strip			; CHECK: .section __OBJC,__cls_refs,literal_pointers,no_dead_strip
	; CHECK: .p2align 3			; CHECK: .p2align 3
	; CHECK: L_private7:			; CHECK: L_private7:

	[41 lines not shown]
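
Note on the `@private6` check in osx-private-labels.ll above: the operand of `.p2align` is the base-2 logarithm of the alignment in bytes, so the change from `.p2align 3` to `.p2align 4` corresponds to the i128 constant moving from 8-byte to 16-byte alignment. A minimal sketch of that conversion follows; `p2alignExponent` is a hypothetical helper written for illustration and is not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): convert a byte alignment into the
// exponent that an assembler `.p2align` directive takes as its operand.
unsigned p2alignExponent(std::uint64_t alignBytes) {
    // Alignments are assumed to be non-zero powers of two.
    assert(alignBytes != 0 && (alignBytes & (alignBytes - 1)) == 0 &&
           "alignment must be a power of two");
    unsigned exponent = 0;
    while ((std::uint64_t{1} << exponent) != alignBytes)
        ++exponent;
    return exponent;
}
```

With this helper, an 8-byte alignment maps to exponent 3 and a 16-byte alignment maps to exponent 4, matching the old and new CHECK lines.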

llvm/test/CodeGen/X86/scheduler-backtracking.ll

[246 lines not shown]
; LIN-NEXT: retq
%f = select i1 %t, i256 undef, i256 %c		%f = select i1 %t, i256 undef, i256 %c
ret i256 %f		ret i256 %f
}		}

define i256 @test2(i256 %a) nounwind {		define i256 @test2(i256 %a) nounwind {
; ILP-LABEL: test2:		; ILP-LABEL: test2:
; ILP: # %bb.0:		; ILP: # %bb.0:
; ILP-NEXT: movq %rdi, %rax		; ILP-NEXT: movq %rdi, %rax
; ILP-NEXT: xorl %edi, %edi		; ILP-NEXT: xorps %xmm0, %xmm0
		; ILP-NEXT: movaps %xmm0, 16(%rdi)
		; ILP-NEXT: xorl %r9d, %r9d
; ILP-NEXT: movq %rsi, %r11		; ILP-NEXT: movq %rsi, %r11
; ILP-NEXT: negq %r11		; ILP-NEXT: negq %r11
; ILP-NEXT: movl $0, %r10d		; ILP-NEXT: movl $0, %r10d
; ILP-NEXT: sbbq %rdx, %r10		; ILP-NEXT: sbbq %rdx, %r10
; ILP-NEXT: movl $0, %r9d		; ILP-NEXT: movl $0, %edi
; ILP-NEXT: sbbq %rcx, %r9		; ILP-NEXT: sbbq %rcx, %rdi
; ILP-NEXT: sbbq %r8, %rdi		; ILP-NEXT: sbbq %r8, %r9
; ILP-NEXT: andq %rcx, %r9		; ILP-NEXT: andq %r8, %r9
; ILP-NEXT: bsrq %r9, %rcx		; ILP-NEXT: bsrq %r9, %r8
; ILP-NEXT: xorq $63, %rcx
; ILP-NEXT: andq %r8, %rdi
; ILP-NEXT: bsrq %rdi, %r8
; ILP-NEXT: andq %rdx, %r10		; ILP-NEXT: andq %rdx, %r10
; ILP-NEXT: bsrq %r10, %rdx		; ILP-NEXT: bsrq %r10, %rdx
; ILP-NEXT: xorq $63, %r8		; ILP-NEXT: xorq $63, %r8
		; ILP-NEXT: andq %rcx, %rdi
		; ILP-NEXT: bsrq %rdi, %rcx
		; ILP-NEXT: xorq $63, %rcx
; ILP-NEXT: addq $64, %rcx		; ILP-NEXT: addq $64, %rcx
; ILP-NEXT: testq %rdi, %rdi		; ILP-NEXT: testq %r9, %r9
; ILP-NEXT: movq $0, 24(%rax)
; ILP-NEXT: movq $0, 16(%rax)
; ILP-NEXT: movq $0, 8(%rax)		; ILP-NEXT: movq $0, 8(%rax)
; ILP-NEXT: cmovneq %r8, %rcx		; ILP-NEXT: cmovneq %r8, %rcx
; ILP-NEXT: xorq $63, %rdx		; ILP-NEXT: xorq $63, %rdx
; ILP-NEXT: andq %rsi, %r11		; ILP-NEXT: andq %rsi, %r11
; ILP-NEXT: movl $127, %r8d		; ILP-NEXT: movl $127, %r8d
; ILP-NEXT: bsrq %r11, %rsi		; ILP-NEXT: bsrq %r11, %rsi
; ILP-NEXT: cmoveq %r8, %rsi		; ILP-NEXT: cmoveq %r8, %rsi
; ILP-NEXT: xorq $63, %rsi		; ILP-NEXT: xorq $63, %rsi
; ILP-NEXT: addq $64, %rsi		; ILP-NEXT: addq $64, %rsi
; ILP-NEXT: testq %r10, %r10		; ILP-NEXT: testq %r10, %r10
; ILP-NEXT: cmovneq %rdx, %rsi		; ILP-NEXT: cmovneq %rdx, %rsi
; ILP-NEXT: subq $-128, %rsi		; ILP-NEXT: subq $-128, %rsi
; ILP-NEXT: orq %r9, %rdi		; ILP-NEXT: orq %r9, %rdi
; ILP-NEXT: cmovneq %rcx, %rsi		; ILP-NEXT: cmovneq %rcx, %rsi
; ILP-NEXT: movq %rsi, (%rax)		; ILP-NEXT: movq %rsi, (%rax)
; ILP-NEXT: retq		; ILP-NEXT: retq
;		;
; HYBRID-LABEL: test2:		; HYBRID-LABEL: test2:
; HYBRID: # %bb.0:		; HYBRID: # %bb.0:
; HYBRID-NEXT: movq %rdi, %rax		; HYBRID-NEXT: movq %rdi, %rax
		; HYBRID-NEXT: xorps %xmm0, %xmm0
		; HYBRID-NEXT: movaps %xmm0, 16(%rdi)
; HYBRID-NEXT: xorl %r9d, %r9d		; HYBRID-NEXT: xorl %r9d, %r9d
; HYBRID-NEXT: movq %rsi, %r11		; HYBRID-NEXT: movq %rsi, %r11
; HYBRID-NEXT: negq %r11		; HYBRID-NEXT: negq %r11
; HYBRID-NEXT: movl $0, %r10d		; HYBRID-NEXT: movl $0, %r10d
; HYBRID-NEXT: sbbq %rdx, %r10		; HYBRID-NEXT: sbbq %rdx, %r10
; HYBRID-NEXT: movl $0, %edi		; HYBRID-NEXT: movl $0, %edi
; HYBRID-NEXT: sbbq %rcx, %rdi		; HYBRID-NEXT: sbbq %rcx, %rdi
; HYBRID-NEXT: sbbq %r8, %r9		; HYBRID-NEXT: sbbq %r8, %r9
[16 lines not shown]
; HYBRID-NEXT: xorq $63, %rsi		; HYBRID-NEXT: xorq $63, %rsi
; HYBRID-NEXT: addq $64, %rsi		; HYBRID-NEXT: addq $64, %rsi
; HYBRID-NEXT: testq %r10, %r10		; HYBRID-NEXT: testq %r10, %r10
; HYBRID-NEXT: cmovneq %rdx, %rsi		; HYBRID-NEXT: cmovneq %rdx, %rsi
; HYBRID-NEXT: subq $-128, %rsi		; HYBRID-NEXT: subq $-128, %rsi
; HYBRID-NEXT: orq %r9, %rdi		; HYBRID-NEXT: orq %r9, %rdi
; HYBRID-NEXT: cmovneq %rcx, %rsi		; HYBRID-NEXT: cmovneq %rcx, %rsi
; HYBRID-NEXT: movq %rsi, (%rax)		; HYBRID-NEXT: movq %rsi, (%rax)
; HYBRID-NEXT: movq $0, 24(%rax)
; HYBRID-NEXT: movq $0, 16(%rax)
; HYBRID-NEXT: movq $0, 8(%rax)		; HYBRID-NEXT: movq $0, 8(%rax)
; HYBRID-NEXT: retq		; HYBRID-NEXT: retq
;		;
; BURR-LABEL: test2:		; BURR-LABEL: test2:
; BURR: # %bb.0:		; BURR: # %bb.0:
; BURR-NEXT: movq %rdi, %rax		; BURR-NEXT: movq %rdi, %rax
		; BURR-NEXT: xorps %xmm0, %xmm0
		; BURR-NEXT: movaps %xmm0, 16(%rdi)
; BURR-NEXT: xorl %r9d, %r9d		; BURR-NEXT: xorl %r9d, %r9d
; BURR-NEXT: movq %rsi, %r11		; BURR-NEXT: movq %rsi, %r11
; BURR-NEXT: negq %r11		; BURR-NEXT: negq %r11
; BURR-NEXT: movl $0, %r10d		; BURR-NEXT: movl $0, %r10d
; BURR-NEXT: sbbq %rdx, %r10		; BURR-NEXT: sbbq %rdx, %r10
; BURR-NEXT: movl $0, %edi		; BURR-NEXT: movl $0, %edi
; BURR-NEXT: sbbq %rcx, %rdi		; BURR-NEXT: sbbq %rcx, %rdi
; BURR-NEXT: sbbq %r8, %r9		; BURR-NEXT: sbbq %r8, %r9
[16 lines not shown]
; BURR-NEXT: xorq $63, %rsi		; BURR-NEXT: xorq $63, %rsi
; BURR-NEXT: addq $64, %rsi		; BURR-NEXT: addq $64, %rsi
; BURR-NEXT: testq %r10, %r10		; BURR-NEXT: testq %r10, %r10
; BURR-NEXT: cmovneq %rdx, %rsi		; BURR-NEXT: cmovneq %rdx, %rsi
; BURR-NEXT: subq $-128, %rsi		; BURR-NEXT: subq $-128, %rsi
; BURR-NEXT: orq %r9, %rdi		; BURR-NEXT: orq %r9, %rdi
; BURR-NEXT: cmovneq %rcx, %rsi		; BURR-NEXT: cmovneq %rcx, %rsi
; BURR-NEXT: movq %rsi, (%rax)		; BURR-NEXT: movq %rsi, (%rax)
; BURR-NEXT: movq $0, 24(%rax)
; BURR-NEXT: movq $0, 16(%rax)
; BURR-NEXT: movq $0, 8(%rax)		; BURR-NEXT: movq $0, 8(%rax)
; BURR-NEXT: retq		; BURR-NEXT: retq
;		;
; SRC-LABEL: test2:		; SRC-LABEL: test2:
; SRC: # %bb.0:		; SRC: # %bb.0:
; SRC-NEXT: movq %rdi, %rax		; SRC-NEXT: movq %rdi, %rax
; SRC-NEXT: xorl %edi, %edi		; SRC-NEXT: xorl %edi, %edi
; SRC-NEXT: movq %rsi, %r11		; SRC-NEXT: movq %rsi, %r11
[21 lines not shown]
; SRC-NEXT: cmovneq %r8, %rsi		; SRC-NEXT: cmovneq %r8, %rsi
; SRC-NEXT: xorq $63, %rsi		; SRC-NEXT: xorq $63, %rsi
; SRC-NEXT: addq $64, %rsi		; SRC-NEXT: addq $64, %rsi
; SRC-NEXT: testq %r10, %r10		; SRC-NEXT: testq %r10, %r10
; SRC-NEXT: cmovneq %rcx, %rsi		; SRC-NEXT: cmovneq %rcx, %rsi
; SRC-NEXT: subq $-128, %rsi		; SRC-NEXT: subq $-128, %rsi
; SRC-NEXT: orq %r9, %rdi		; SRC-NEXT: orq %r9, %rdi
; SRC-NEXT: cmovneq %rdx, %rsi		; SRC-NEXT: cmovneq %rdx, %rsi
		; SRC-NEXT: xorps %xmm0, %xmm0
		; SRC-NEXT: movaps %xmm0, 16(%rax)
; SRC-NEXT: movq %rsi, (%rax)		; SRC-NEXT: movq %rsi, (%rax)
; SRC-NEXT: movq $0, 24(%rax)
; SRC-NEXT: movq $0, 16(%rax)
; SRC-NEXT: movq $0, 8(%rax)		; SRC-NEXT: movq $0, 8(%rax)
; SRC-NEXT: retq		; SRC-NEXT: retq
;		;
; LIN-LABEL: test2:		; LIN-LABEL: test2:
; LIN: # %bb.0:		; LIN: # %bb.0:
; LIN-NEXT: movq %rdi, %rax		; LIN-NEXT: movq %rdi, %rax
		; LIN-NEXT: xorps %xmm0, %xmm0
		; LIN-NEXT: movaps %xmm0, 16(%rdi)
; LIN-NEXT: movq %rsi, %rdi		; LIN-NEXT: movq %rsi, %rdi
; LIN-NEXT: negq %rdi		; LIN-NEXT: negq %rdi
; LIN-NEXT: andq %rsi, %rdi		; LIN-NEXT: andq %rsi, %rdi
; LIN-NEXT: bsrq %rdi, %rsi		; LIN-NEXT: bsrq %rdi, %rsi
; LIN-NEXT: movl $127, %edi		; LIN-NEXT: movl $127, %edi
; LIN-NEXT: cmovneq %rsi, %rdi		; LIN-NEXT: cmovneq %rsi, %rdi
; LIN-NEXT: xorq $63, %rdi		; LIN-NEXT: xorq $63, %rdi
; LIN-NEXT: addq $64, %rdi		; LIN-NEXT: addq $64, %rdi
[17 lines not shown]
; LIN-NEXT: bsrq %r9, %rdi		; LIN-NEXT: bsrq %r9, %rdi
; LIN-NEXT: xorq $63, %rdi		; LIN-NEXT: xorq $63, %rdi
; LIN-NEXT: testq %r9, %r9		; LIN-NEXT: testq %r9, %r9
; LIN-NEXT: cmoveq %rcx, %rdi		; LIN-NEXT: cmoveq %rcx, %rdi
; LIN-NEXT: orq %rsi, %r9		; LIN-NEXT: orq %rsi, %r9
; LIN-NEXT: cmoveq %rdx, %rdi		; LIN-NEXT: cmoveq %rdx, %rdi
; LIN-NEXT: movq %rdi, (%rax)		; LIN-NEXT: movq %rdi, (%rax)
; LIN-NEXT: movq $0, 8(%rax)		; LIN-NEXT: movq $0, 8(%rax)
; LIN-NEXT: movq $0, 16(%rax)
; LIN-NEXT: movq $0, 24(%rax)
; LIN-NEXT: retq		; LIN-NEXT: retq
%b = sub i256 0, %a		%b = sub i256 0, %a
%c = and i256 %b, %a		%c = and i256 %b, %a
%d = call i256 @llvm.ctlz.i256(i256 %c, i1 false)		%d = call i256 @llvm.ctlz.i256(i256 %c, i1 false)
ret i256 %d		ret i256 %d
}		}

define i256 @test3(i256 %n) nounwind {		define i256 @test3(i256 %n) nounwind {
; ILP-LABEL: test3:		; ILP-LABEL: test3:
; ILP: # %bb.0:		; ILP: # %bb.0:
		; ILP-NEXT: pushq %rbx
; ILP-NEXT: movq %rdi, %rax		; ILP-NEXT: movq %rdi, %rax
; ILP-NEXT: xorl %r10d, %r10d		; ILP-NEXT: xorps %xmm0, %xmm0
		; ILP-NEXT: movaps %xmm0, 16(%rdi)
		; ILP-NEXT: xorl %edi, %edi
; ILP-NEXT: movq %rsi, %r9		; ILP-NEXT: movq %rsi, %r9
; ILP-NEXT: negq %r9		; ILP-NEXT: negq %r9
		; ILP-NEXT: movl $0, %r10d
		; ILP-NEXT: sbbq %rdx, %r10
; ILP-NEXT: movl $0, %r11d		; ILP-NEXT: movl $0, %r11d
; ILP-NEXT: sbbq %rdx, %r11		; ILP-NEXT: sbbq %rcx, %r11
; ILP-NEXT: movl $0, %edi		; ILP-NEXT: sbbq %r8, %rdi
; ILP-NEXT: sbbq %rcx, %rdi		; ILP-NEXT: notq %r8
; ILP-NEXT: sbbq %r8, %r10		; ILP-NEXT: andq %rdi, %r8
		; ILP-NEXT: bsrq %r8, %rbx
		; ILP-NEXT: notq %rdx
		; ILP-NEXT: andq %r10, %rdx
; ILP-NEXT: notq %rcx		; ILP-NEXT: notq %rcx
; ILP-NEXT: andq %rdi, %rcx		; ILP-NEXT: andq %r11, %rcx
		; ILP-NEXT: bsrq %rdx, %r10
		; ILP-NEXT: xorq $63, %rbx
; ILP-NEXT: bsrq %rcx, %rdi		; ILP-NEXT: bsrq %rcx, %rdi
; ILP-NEXT: notq %rdx
; ILP-NEXT: andq %r11, %rdx
; ILP-NEXT: xorq $63, %rdi		; ILP-NEXT: xorq $63, %rdi
; ILP-NEXT: notq %r8
; ILP-NEXT: andq %r10, %r8
; ILP-NEXT: bsrq %r8, %r10
; ILP-NEXT: xorq $63, %r10
; ILP-NEXT: addq $64, %rdi		; ILP-NEXT: addq $64, %rdi
; ILP-NEXT: bsrq %rdx, %r11
; ILP-NEXT: notq %rsi		; ILP-NEXT: notq %rsi
; ILP-NEXT: testq %r8, %r8		; ILP-NEXT: testq %r8, %r8
; ILP-NEXT: movq $0, 24(%rax)
; ILP-NEXT: movq $0, 16(%rax)
; ILP-NEXT: movq $0, 8(%rax)		; ILP-NEXT: movq $0, 8(%rax)
; ILP-NEXT: cmovneq %r10, %rdi		; ILP-NEXT: cmovneq %rbx, %rdi
; ILP-NEXT: xorq $63, %r11		; ILP-NEXT: xorq $63, %r10
; ILP-NEXT: andq %r9, %rsi		; ILP-NEXT: andq %r9, %rsi
; ILP-NEXT: movl $127, %r9d		; ILP-NEXT: movl $127, %ebx
; ILP-NEXT: bsrq %rsi, %rsi		; ILP-NEXT: bsrq %rsi, %rsi
; ILP-NEXT: cmoveq %r9, %rsi		; ILP-NEXT: cmoveq %rbx, %rsi
; ILP-NEXT: xorq $63, %rsi		; ILP-NEXT: xorq $63, %rsi
; ILP-NEXT: addq $64, %rsi		; ILP-NEXT: addq $64, %rsi
; ILP-NEXT: testq %rdx, %rdx		; ILP-NEXT: testq %rdx, %rdx
; ILP-NEXT: cmovneq %r11, %rsi		; ILP-NEXT: cmovneq %r10, %rsi
; ILP-NEXT: subq $-128, %rsi		; ILP-NEXT: subq $-128, %rsi
; ILP-NEXT: orq %rcx, %r8		; ILP-NEXT: orq %r8, %rcx
; ILP-NEXT: cmovneq %rdi, %rsi		; ILP-NEXT: cmovneq %rdi, %rsi
; ILP-NEXT: movq %rsi, (%rax)		; ILP-NEXT: movq %rsi, (%rax)
		; ILP-NEXT: popq %rbx
; ILP-NEXT: retq		; ILP-NEXT: retq
;		;
; HYBRID-LABEL: test3:		; HYBRID-LABEL: test3:
; HYBRID: # %bb.0:		; HYBRID: # %bb.0:
; HYBRID-NEXT: pushq %rbx		; HYBRID-NEXT: pushq %rbx
; HYBRID-NEXT: movq %rdi, %rax		; HYBRID-NEXT: movq %rdi, %rax
		; HYBRID-NEXT: xorps %xmm0, %xmm0
		; HYBRID-NEXT: movaps %xmm0, 16(%rdi)
; HYBRID-NEXT: xorl %edi, %edi		; HYBRID-NEXT: xorl %edi, %edi
; HYBRID-NEXT: movq %rsi, %r9		; HYBRID-NEXT: movq %rsi, %r9
; HYBRID-NEXT: negq %r9		; HYBRID-NEXT: negq %r9
; HYBRID-NEXT: movl $0, %r10d		; HYBRID-NEXT: movl $0, %r10d
; HYBRID-NEXT: sbbq %rdx, %r10		; HYBRID-NEXT: sbbq %rdx, %r10
; HYBRID-NEXT: movl $0, %r11d		; HYBRID-NEXT: movl $0, %r11d
; HYBRID-NEXT: sbbq %rcx, %r11		; HYBRID-NEXT: sbbq %rcx, %r11
; HYBRID-NEXT: sbbq %r8, %rdi		; HYBRID-NEXT: sbbq %r8, %rdi
[20 lines not shown]
; HYBRID-NEXT: xorq $63, %rsi		; HYBRID-NEXT: xorq $63, %rsi
; HYBRID-NEXT: addq $64, %rsi		; HYBRID-NEXT: addq $64, %rsi
; HYBRID-NEXT: testq %rdx, %rdx		; HYBRID-NEXT: testq %rdx, %rdx
; HYBRID-NEXT: cmovneq %rbx, %rsi		; HYBRID-NEXT: cmovneq %rbx, %rsi
; HYBRID-NEXT: subq $-128, %rsi		; HYBRID-NEXT: subq $-128, %rsi
; HYBRID-NEXT: orq %r8, %rcx		; HYBRID-NEXT: orq %r8, %rcx
; HYBRID-NEXT: cmovneq %rdi, %rsi		; HYBRID-NEXT: cmovneq %rdi, %rsi
; HYBRID-NEXT: movq %rsi, (%rax)		; HYBRID-NEXT: movq %rsi, (%rax)
; HYBRID-NEXT: movq $0, 24(%rax)
; HYBRID-NEXT: movq $0, 16(%rax)
; HYBRID-NEXT: movq $0, 8(%rax)		; HYBRID-NEXT: movq $0, 8(%rax)
; HYBRID-NEXT: popq %rbx		; HYBRID-NEXT: popq %rbx
; HYBRID-NEXT: retq		; HYBRID-NEXT: retq
;		;
; BURR-LABEL: test3:		; BURR-LABEL: test3:
; BURR: # %bb.0:		; BURR: # %bb.0:
; BURR-NEXT: pushq %rbx		; BURR-NEXT: pushq %rbx
; BURR-NEXT: movq %rdi, %rax		; BURR-NEXT: movq %rdi, %rax
		; BURR-NEXT: xorps %xmm0, %xmm0
		; BURR-NEXT: movaps %xmm0, 16(%rdi)
; BURR-NEXT: xorl %edi, %edi		; BURR-NEXT: xorl %edi, %edi
; BURR-NEXT: movq %rsi, %r9		; BURR-NEXT: movq %rsi, %r9
; BURR-NEXT: negq %r9		; BURR-NEXT: negq %r9
; BURR-NEXT: movl $0, %r10d		; BURR-NEXT: movl $0, %r10d
; BURR-NEXT: sbbq %rdx, %r10		; BURR-NEXT: sbbq %rdx, %r10
; BURR-NEXT: movl $0, %r11d		; BURR-NEXT: movl $0, %r11d
; BURR-NEXT: sbbq %rcx, %r11		; BURR-NEXT: sbbq %rcx, %r11
; BURR-NEXT: sbbq %r8, %rdi		; BURR-NEXT: sbbq %r8, %rdi
[20 lines not shown]
; BURR-NEXT: xorq $63, %rsi		; BURR-NEXT: xorq $63, %rsi
; BURR-NEXT: addq $64, %rsi		; BURR-NEXT: addq $64, %rsi
; BURR-NEXT: testq %rdx, %rdx		; BURR-NEXT: testq %rdx, %rdx
; BURR-NEXT: cmovneq %rbx, %rsi		; BURR-NEXT: cmovneq %rbx, %rsi
; BURR-NEXT: subq $-128, %rsi		; BURR-NEXT: subq $-128, %rsi
; BURR-NEXT: orq %r8, %rcx		; BURR-NEXT: orq %r8, %rcx
; BURR-NEXT: cmovneq %rdi, %rsi		; BURR-NEXT: cmovneq %rdi, %rsi
; BURR-NEXT: movq %rsi, (%rax)		; BURR-NEXT: movq %rsi, (%rax)
; BURR-NEXT: movq $0, 24(%rax)
; BURR-NEXT: movq $0, 16(%rax)
; BURR-NEXT: movq $0, 8(%rax)		; BURR-NEXT: movq $0, 8(%rax)
; BURR-NEXT: popq %rbx		; BURR-NEXT: popq %rbx
; BURR-NEXT: retq		; BURR-NEXT: retq
;		;
; SRC-LABEL: test3:		; SRC-LABEL: test3:
; SRC: # %bb.0:		; SRC: # %bb.0:
; SRC-NEXT: movq %rdi, %rax		; SRC-NEXT: movq %rdi, %rax
; SRC-NEXT: movq %rsi, %r9		; SRC-NEXT: movq %rsi, %r9
[26 lines not shown]
; SRC-NEXT: cmovneq %r10, %rsi		; SRC-NEXT: cmovneq %r10, %rsi
; SRC-NEXT: xorq $63, %rsi		; SRC-NEXT: xorq $63, %rsi
; SRC-NEXT: addq $64, %rsi		; SRC-NEXT: addq $64, %rsi
; SRC-NEXT: testq %rdx, %rdx		; SRC-NEXT: testq %rdx, %rdx
; SRC-NEXT: cmovneq %r9, %rsi		; SRC-NEXT: cmovneq %r9, %rsi
; SRC-NEXT: subq $-128, %rsi		; SRC-NEXT: subq $-128, %rsi
; SRC-NEXT: orq %rcx, %r8		; SRC-NEXT: orq %rcx, %r8
; SRC-NEXT: cmovneq %rdi, %rsi		; SRC-NEXT: cmovneq %rdi, %rsi
		; SRC-NEXT: xorps %xmm0, %xmm0
		; SRC-NEXT: movaps %xmm0, 16(%rax)
; SRC-NEXT: movq %rsi, (%rax)		; SRC-NEXT: movq %rsi, (%rax)
; SRC-NEXT: movq $0, 24(%rax)
; SRC-NEXT: movq $0, 16(%rax)
; SRC-NEXT: movq $0, 8(%rax)		; SRC-NEXT: movq $0, 8(%rax)
; SRC-NEXT: retq		; SRC-NEXT: retq
;		;
; LIN-LABEL: test3:		; LIN-LABEL: test3:
; LIN: # %bb.0:		; LIN: # %bb.0:
; LIN-NEXT: movq %rdi, %rax		; LIN-NEXT: movq %rdi, %rax
		; LIN-NEXT: xorps %xmm0, %xmm0
		; LIN-NEXT: movaps %xmm0, 16(%rdi)
; LIN-NEXT: movq %rsi, %rdi		; LIN-NEXT: movq %rsi, %rdi
; LIN-NEXT: negq %rdi		; LIN-NEXT: negq %rdi
; LIN-NEXT: notq %rsi		; LIN-NEXT: notq %rsi
; LIN-NEXT: andq %rdi, %rsi		; LIN-NEXT: andq %rdi, %rsi
; LIN-NEXT: bsrq %rsi, %rsi		; LIN-NEXT: bsrq %rsi, %rsi
; LIN-NEXT: movl $127, %edi		; LIN-NEXT: movl $127, %edi
; LIN-NEXT: cmovneq %rsi, %rdi		; LIN-NEXT: cmovneq %rsi, %rdi
; LIN-NEXT: xorq $63, %rdi		; LIN-NEXT: xorq $63, %rdi
[21 lines not shown]
; LIN-NEXT: bsrq %r8, %rdi		; LIN-NEXT: bsrq %r8, %rdi
; LIN-NEXT: xorq $63, %rdi		; LIN-NEXT: xorq $63, %rdi
; LIN-NEXT: testq %r8, %r8		; LIN-NEXT: testq %r8, %r8
; LIN-NEXT: cmoveq %rdx, %rdi		; LIN-NEXT: cmoveq %rdx, %rdi
; LIN-NEXT: orq %rcx, %r8		; LIN-NEXT: orq %rcx, %r8
; LIN-NEXT: cmoveq %rsi, %rdi		; LIN-NEXT: cmoveq %rsi, %rdi
; LIN-NEXT: movq %rdi, (%rax)		; LIN-NEXT: movq %rdi, (%rax)
; LIN-NEXT: movq $0, 8(%rax)		; LIN-NEXT: movq $0, 8(%rax)
; LIN-NEXT: movq $0, 16(%rax)
; LIN-NEXT: movq $0, 24(%rax)
; LIN-NEXT: retq		; LIN-NEXT: retq
%m = sub i256 -1, %n		%m = sub i256 -1, %n
%x = sub i256 0, %n		%x = sub i256 0, %n
%y = and i256 %x, %m		%y = and i256 %x, %m
%z = call i256 @llvm.ctlz.i256(i256 %y, i1 false)		%z = call i256 @llvm.ctlz.i256(i256 %y, i1 false)
ret i256 %z		ret i256 %z
}		}

[335 lines not shown]

llvm/test/CodeGen/X86/setcc-wide-types.ll

[614 lines not shown]
}		}

; This test models the expansion of 'memcmp(a, b, 32) != 0'		; This test models the expansion of 'memcmp(a, b, 32) != 0'
; if we allowed 2 pairs of 16-byte loads per block.		; if we allowed 2 pairs of 16-byte loads per block.

define i32 @ne_i128_pair(i128* %a, i128* %b) {		define i32 @ne_i128_pair(i128* %a, i128* %b) {
; SSE2-LABEL: ne_i128_pair:		; SSE2-LABEL: ne_i128_pair:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movdqu (%rdi), %xmm0		; SSE2-NEXT: movdqa (%rdi), %xmm0
; SSE2-NEXT: movdqu 16(%rdi), %xmm1		; SSE2-NEXT: movdqa 16(%rdi), %xmm1
; SSE2-NEXT: movdqu (%rsi), %xmm2		; SSE2-NEXT: pcmpeqb 16(%rsi), %xmm1
; SSE2-NEXT: pcmpeqb %xmm0, %xmm2		; SSE2-NEXT: pcmpeqb (%rsi), %xmm0
; SSE2-NEXT: movdqu 16(%rsi), %xmm0		; SSE2-NEXT: pand %xmm1, %xmm0
; SSE2-NEXT: pcmpeqb %xmm1, %xmm0
; SSE2-NEXT: pand %xmm2, %xmm0
; SSE2-NEXT: pmovmskb %xmm0, %ecx		; SSE2-NEXT: pmovmskb %xmm0, %ecx
; SSE2-NEXT: xorl %eax, %eax		; SSE2-NEXT: xorl %eax, %eax
; SSE2-NEXT: cmpl $65535, %ecx # imm = 0xFFFF		; SSE2-NEXT: cmpl $65535, %ecx # imm = 0xFFFF
; SSE2-NEXT: setne %al		; SSE2-NEXT: setne %al
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: ne_i128_pair:		; SSE41-LABEL: ne_i128_pair:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: movdqu (%rdi), %xmm0		; SSE41-NEXT: movdqa (%rdi), %xmm0
; SSE41-NEXT: movdqu 16(%rdi), %xmm1		; SSE41-NEXT: movdqa 16(%rdi), %xmm1
; SSE41-NEXT: movdqu (%rsi), %xmm2		; SSE41-NEXT: pxor 16(%rsi), %xmm1
; SSE41-NEXT: pxor %xmm0, %xmm2		; SSE41-NEXT: pxor (%rsi), %xmm0
; SSE41-NEXT: movdqu 16(%rsi), %xmm0		; SSE41-NEXT: por %xmm1, %xmm0
; SSE41-NEXT: pxor %xmm1, %xmm0
; SSE41-NEXT: por %xmm2, %xmm0
; SSE41-NEXT: xorl %eax, %eax		; SSE41-NEXT: xorl %eax, %eax
; SSE41-NEXT: ptest %xmm0, %xmm0		; SSE41-NEXT: ptest %xmm0, %xmm0
; SSE41-NEXT: setne %al		; SSE41-NEXT: setne %al
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVXANY-LABEL: ne_i128_pair:		; AVXANY-LABEL: ne_i128_pair:
; AVXANY: # %bb.0:		; AVXANY: # %bb.0:
; AVXANY-NEXT: vmovdqu (%rdi), %xmm0		; AVXANY-NEXT: vmovdqa (%rdi), %xmm0
; AVXANY-NEXT: vmovdqu 16(%rdi), %xmm1		; AVXANY-NEXT: vmovdqa 16(%rdi), %xmm1
; AVXANY-NEXT: vpxor 16(%rsi), %xmm1, %xmm1		; AVXANY-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
; AVXANY-NEXT: vpxor (%rsi), %xmm0, %xmm0		; AVXANY-NEXT: vpxor (%rsi), %xmm0, %xmm0
; AVXANY-NEXT: vpor %xmm1, %xmm0, %xmm0		; AVXANY-NEXT: vpor %xmm1, %xmm0, %xmm0
; AVXANY-NEXT: xorl %eax, %eax		; AVXANY-NEXT: xorl %eax, %eax
; AVXANY-NEXT: vptest %xmm0, %xmm0		; AVXANY-NEXT: vptest %xmm0, %xmm0
; AVXANY-NEXT: setne %al		; AVXANY-NEXT: setne %al
; AVXANY-NEXT: retq		; AVXANY-NEXT: retq
%a0 = load i128, i128* %a		%a0 = load i128, i128* %a
[11 lines not shown]
}		}

; This test models the expansion of 'memcmp(a, b, 32) == 0'		; This test models the expansion of 'memcmp(a, b, 32) == 0'
; if we allowed 2 pairs of 16-byte loads per block.		; if we allowed 2 pairs of 16-byte loads per block.

define i32 @eq_i128_pair(i128* %a, i128* %b) {		define i32 @eq_i128_pair(i128* %a, i128* %b) {
; SSE2-LABEL: eq_i128_pair:		; SSE2-LABEL: eq_i128_pair:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movdqu (%rdi), %xmm0		; SSE2-NEXT: movdqa (%rdi), %xmm0
; SSE2-NEXT: movdqu 16(%rdi), %xmm1		; SSE2-NEXT: movdqa 16(%rdi), %xmm1
; SSE2-NEXT: movdqu (%rsi), %xmm2		; SSE2-NEXT: pcmpeqb 16(%rsi), %xmm1
; SSE2-NEXT: pcmpeqb %xmm0, %xmm2		; SSE2-NEXT: pcmpeqb (%rsi), %xmm0
; SSE2-NEXT: movdqu 16(%rsi), %xmm0		; SSE2-NEXT: pand %xmm1, %xmm0
; SSE2-NEXT: pcmpeqb %xmm1, %xmm0
; SSE2-NEXT: pand %xmm2, %xmm0
; SSE2-NEXT: pmovmskb %xmm0, %ecx		; SSE2-NEXT: pmovmskb %xmm0, %ecx
; SSE2-NEXT: xorl %eax, %eax		; SSE2-NEXT: xorl %eax, %eax
; SSE2-NEXT: cmpl $65535, %ecx # imm = 0xFFFF		; SSE2-NEXT: cmpl $65535, %ecx # imm = 0xFFFF
; SSE2-NEXT: sete %al		; SSE2-NEXT: sete %al
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: eq_i128_pair:		; SSE41-LABEL: eq_i128_pair:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: movdqu (%rdi), %xmm0		; SSE41-NEXT: movdqa (%rdi), %xmm0
; SSE41-NEXT: movdqu 16(%rdi), %xmm1		; SSE41-NEXT: movdqa 16(%rdi), %xmm1
; SSE41-NEXT: movdqu (%rsi), %xmm2		; SSE41-NEXT: pxor 16(%rsi), %xmm1
; SSE41-NEXT: pxor %xmm0, %xmm2		; SSE41-NEXT: pxor (%rsi), %xmm0
; SSE41-NEXT: movdqu 16(%rsi), %xmm0		; SSE41-NEXT: por %xmm1, %xmm0
; SSE41-NEXT: pxor %xmm1, %xmm0
; SSE41-NEXT: por %xmm2, %xmm0
; SSE41-NEXT: xorl %eax, %eax		; SSE41-NEXT: xorl %eax, %eax
; SSE41-NEXT: ptest %xmm0, %xmm0		; SSE41-NEXT: ptest %xmm0, %xmm0
; SSE41-NEXT: sete %al		; SSE41-NEXT: sete %al
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVXANY-LABEL: eq_i128_pair:		; AVXANY-LABEL: eq_i128_pair:
; AVXANY: # %bb.0:		; AVXANY: # %bb.0:
; AVXANY-NEXT: vmovdqu (%rdi), %xmm0		; AVXANY-NEXT: vmovdqa (%rdi), %xmm0
; AVXANY-NEXT: vmovdqu 16(%rdi), %xmm1		; AVXANY-NEXT: vmovdqa 16(%rdi), %xmm1
; AVXANY-NEXT: vpxor 16(%rsi), %xmm1, %xmm1		; AVXANY-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
; AVXANY-NEXT: vpxor (%rsi), %xmm0, %xmm0		; AVXANY-NEXT: vpxor (%rsi), %xmm0, %xmm0
; AVXANY-NEXT: vpor %xmm1, %xmm0, %xmm0		; AVXANY-NEXT: vpor %xmm1, %xmm0, %xmm0
; AVXANY-NEXT: xorl %eax, %eax		; AVXANY-NEXT: xorl %eax, %eax
; AVXANY-NEXT: vptest %xmm0, %xmm0		; AVXANY-NEXT: vptest %xmm0, %xmm0
; AVXANY-NEXT: sete %al		; AVXANY-NEXT: sete %al
; AVXANY-NEXT: retq		; AVXANY-NEXT: retq
%a0 = load i128, i128* %a		%a0 = load i128, i128* %a
[515 lines not shown]
; ANY-NEXT: retq
%a2 = add i256 %a, 1		%a2 = add i256 %a, 1
%r = icmp eq i256 %a2, %b		%r = icmp eq i256 %a2, %b
ret i1 %r		ret i1 %r
}		}

define i1 @eq_i512_op(i512 %a, i512 %b) {		define i1 @eq_i512_op(i512 %a, i512 %b) {
; ANY-LABEL: eq_i512_op:		; ANY-LABEL: eq_i512_op:
; ANY: # %bb.0:		; ANY: # %bb.0:
; ANY-NEXT: movq {{[0-9]+}}(%rsp), %r10
; ANY-NEXT: movq {{[0-9]+}}(%rsp), %rax		; ANY-NEXT: movq {{[0-9]+}}(%rsp), %rax
		; ANY-NEXT: movq {{[0-9]+}}(%rsp), %r10
; ANY-NEXT: addq $1, %rdi		; ANY-NEXT: addq $1, %rdi
; ANY-NEXT: adcq $0, %rsi		; ANY-NEXT: adcq $0, %rsi
; ANY-NEXT: adcq $0, %rdx		; ANY-NEXT: adcq $0, %rdx
; ANY-NEXT: adcq $0, %rcx		; ANY-NEXT: adcq $0, %rcx
; ANY-NEXT: adcq $0, %r8		; ANY-NEXT: adcq $0, %r8
; ANY-NEXT: adcq $0, %r9		; ANY-NEXT: adcq $0, %r9
; ANY-NEXT: adcq $0, %r10		; ANY-NEXT: adcq $0, %r10
; ANY-NEXT: adcq $0, %rax		; ANY-NEXT: adcq $0, %rax
[79 lines not shown]
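
Note on `ne_i128_pair` and `eq_i128_pair` in setcc-wide-types.ll above: the test comments describe them as modelling `memcmp(a, b, 32)` compared against zero using two pairs of 16-byte loads. The IR bodies are mostly elided in this view, so the following is only a sketch of the assumed shape, chosen to be consistent with the `pxor`/`por`/`ptest` sequence in the SSE41 checks. The names `u128`, `load128`, and `ne32` are hypothetical, and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cstring>

// `unsigned __int128` is a GCC/Clang extension, used here to mirror i128.
using u128 = unsigned __int128;

// Load 16 bytes without assuming any alignment (memcpy is alignment-safe).
static u128 load128(const unsigned char *p) {
    u128 v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical model of `memcmp(a, b, 32) != 0`: XOR each pair of 16-byte
// blocks, OR the two differences together, and test the result against zero.
bool ne32(const void *a, const void *b) {
    const auto *pa = static_cast<const unsigned char *>(a);
    const auto *pb = static_cast<const unsigned char *>(b);
    u128 diffLo = load128(pa) ^ load128(pb);
    u128 diffHi = load128(pa + 16) ^ load128(pb + 16);
    return (diffLo | diffHi) != 0;
}
```

The sketch loads through `memcpy`, so it makes no alignment assumption. The tests instead load through `i128*` with no explicit alignment, which is why the expected instructions change from `movdqu` to `movdqa` once the datalayout gives i128 a 16-byte ABI alignment.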

llvm/test/CodeGen/X86/sret-implicit.ll

	[19 lines not shown]
	; X86: retl			; X86: retl

	define i256 @sret_demoted() {			define i256 @sret_demoted() {
	ret i256 0			ret i256 0
	}			}

	; X64-LABEL: sret_demoted			; X64-LABEL: sret_demoted
	; X64-DAG: movq %rdi, %rax			; X64-DAG: movq %rdi, %rax
	; X64-DAG: movq $0, (%rdi)			; X64-DAG: xorps %xmm0, %xmm0
				; X64-DAG: movaps %xmm0, 16(%rdi)
				; X64-DAG: movaps %xmm0, (%rdi)
	; X64: retq			; X64: retq

	; X86-LABEL: sret_demoted			; X86-LABEL: sret_demoted
	; X86: movl 4(%esp), %eax			; X86: movl 4(%esp), %eax
	; X86: movl $0, (%eax)			; X86: movl $0, (%eax)
	; X86: retl			; X86: retl

llvm/test/CodeGen/X86/statepoint-vector.ll

	[116 lines not shown]
	; the moment, this simply means spilling them, but there's a potential			; the moment, this simply means spilling them, but there's a potential
	; optimization for values representable as sext(Con64).			; optimization for values representable as sext(Con64).
	define void @test5() gc "statepoint-example" {			define void @test5() gc "statepoint-example" {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: subq $40, %rsp			; CHECK-NEXT: subq $40, %rsp
	; CHECK-NEXT: .cfi_def_cfa_offset 48			; CHECK-NEXT: .cfi_def_cfa_offset 48
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: movups %xmm0, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm0, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq $-1, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq $-1, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movq $-1, {{[0-9]+}}(%rsp)
				; CHECK-NEXT: movq $-1, (%rsp)
	; CHECK-NEXT: callq do_safepoint			; CHECK-NEXT: callq do_safepoint
	; CHECK-NEXT: .Ltmp4:			; CHECK-NEXT: .Ltmp4:
	; CHECK-NEXT: addq $40, %rsp			; CHECK-NEXT: addq $40, %rsp
	; CHECK-NEXT: .cfi_def_cfa_offset 8			; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%safepoint_token = call token (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () @do_safepoint, i32 0, i32 0, i32 0, i32 0) ["deopt" (i128 0, i128 -1)]			%safepoint_token = call token (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () @do_safepoint, i32 0, i32 0, i32 0, i32 0) ["deopt" (i128 0, i128 -1)]
	ret void			ret void
	[74 lines not shown]

llvm/test/tools/llvm-lto2/X86/pipeline.ll

	[9 lines not shown]
	; RUN: -r %t1.bc,patatino,px -opt-pipeline loweratomic \			; RUN: -r %t1.bc,patatino,px -opt-pipeline loweratomic \
	; RUN: -aa-pipeline basic-aa			; RUN: -aa-pipeline basic-aa
	; RUN: llvm-dis < %t.o.0.4.opt.bc | FileCheck %s --check-prefix=CUSTOM			; RUN: llvm-dis < %t.o.0.4.opt.bc | FileCheck %s --check-prefix=CUSTOM

	; Try the new pass manager LTO default pipeline (make sure the option			; Try the new pass manager LTO default pipeline (make sure the option
	; is accepted).			; is accepted).
	; RUN: llvm-lto2 run %t1.bc -o %t.o -use-new-pm -r %t1.bc,patatino,px			; RUN: llvm-lto2 run %t1.bc -o %t.o -use-new-pm -r %t1.bc,patatino,px

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @patatino() {			define void @patatino() {
	fence seq_cst			fence seq_cst
	ret void			ret void
	}			}

	; CUSTOM: define void @patatino() {			; CUSTOM: define void @patatino() {
	[16 lines not shown]
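
Note on the `target datalayout` change in pipeline.ll above: the only difference between the old and new strings is the added `i128:128` component, which sets the ABI alignment of i128 to 128 bits (16 bytes). The sketch below shows one way to check a datalayout string for such an entry. It is a deliberately simplified illustration and not LLVM's `DataLayout` parser; `explicitIntAbiAlignBits` is a hypothetical name, and a return value of 0 only means that no explicit entry is present.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not LLVM's parser): return the ABI alignment in bits
// from an explicit `i<bits>:<abi>` component of a datalayout string, or 0 if
// the string has no such component.
unsigned explicitIntAbiAlignBits(const std::string &layout, unsigned bits) {
    const std::string prefix = "i" + std::to_string(bits) + ":";
    std::istringstream in(layout);
    std::string spec;
    // Datalayout components are separated by '-'.
    while (std::getline(in, spec, '-')) {
        if (spec.compare(0, prefix.size(), prefix) == 0)
            return static_cast<unsigned>(std::stoul(spec.substr(prefix.size())));
    }
    return 0;
}
```

Applied to the two strings in this test, the helper finds no explicit i128 entry in the old layout and 128 in the new one, while the `i64:64` entry is present in both.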

llvm/test/tools/llvm-lto2/X86/slp-vectorize-pm.ll

	[20 lines not shown]
	; CHECK-O1-SLP-NOT: Running pass: SLPVectorizerPass			; CHECK-O1-SLP-NOT: Running pass: SLPVectorizerPass
	; CHECK-O2-SLP: Running pass: SLPVectorizerPass			; CHECK-O2-SLP: Running pass: SLPVectorizerPass
	; CHECK-O3-SLP: Running pass: SLPVectorizerPass			; CHECK-O3-SLP: Running pass: SLPVectorizerPass
	; CHECK-O0-LPV-NOT: = !{!"llvm.loop.isvectorized", i32 1}			; CHECK-O0-LPV-NOT: = !{!"llvm.loop.isvectorized", i32 1}
	; CHECK-O1-LPV-NOT: = !{!"llvm.loop.isvectorized", i32 1}			; CHECK-O1-LPV-NOT: = !{!"llvm.loop.isvectorized", i32 1}
	; CHECK-O2-LPV: = !{!"llvm.loop.isvectorized", i32 1}			; CHECK-O2-LPV: = !{!"llvm.loop.isvectorized", i32 1}
	; CHECK-O3-LPV: = !{!"llvm.loop.isvectorized", i32 1}			; CHECK-O3-LPV: = !{!"llvm.loop.isvectorized", i32 1}

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define i32 @foo(i32* %a) {			define i32 @foo(i32* %a) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	(14 lines not shown)

llvm/test/tools/llvm-lto2/X86/stats-file-option.ll

	; REQUIRES: asserts			; REQUIRES: asserts

	; RUN: llvm-as < %s > %t1.bc			; RUN: llvm-as < %s > %t1.bc

	; Try to save statistics to file.			; Try to save statistics to file.
	; RUN: llvm-lto2 run %t1.bc -o %t.o -r %t1.bc,patatino,px -stats-file=%t2.stats			; RUN: llvm-lto2 run %t1.bc -o %t.o -r %t1.bc,patatino,px -stats-file=%t2.stats
	; RUN: FileCheck --input-file=%t2.stats %s			; RUN: FileCheck --input-file=%t2.stats %s

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @patatino() {			define void @patatino() {
	fence seq_cst			fence seq_cst
	ret void			ret void
	}			}

	; CHECK: {			; CHECK: {
	; CHECK: "asm-printer.EmittedInsts":			; CHECK: "asm-printer.EmittedInsts":
	; CHECK: }			; CHECK: }


	; Try to save statistics to an invalid file.			; Try to save statistics to an invalid file.
	; RUN: not llvm-lto2 run %t1.bc -o %t.o -r %t1.bc,patatino,px \			; RUN: not llvm-lto2 run %t1.bc -o %t.o -r %t1.bc,patatino,px \
	; RUN: -stats-file=%t2/foo.stats 2>&1 | FileCheck --check-prefix=ERROR %s			; RUN: -stats-file=%t2/foo.stats 2>&1 | FileCheck --check-prefix=ERROR %s
	; ERROR: LTO::run failed: {{[Nn]}}o such file or directory			; ERROR: LTO::run failed: {{[Nn]}}o such file or directory

llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp

	(9 lines not shown)
	#include "gtest/gtest.h"			#include "gtest/gtest.h"

	using namespace llvm;			using namespace llvm;

	namespace {			namespace {

	TEST(DataLayoutUpgradeTest, ValidDataLayoutUpgrade) {			TEST(DataLayoutUpgradeTest, ValidDataLayoutUpgrade) {
	std::string DL1 =			std::string DL1 =
	UpgradeDataLayoutString("e-m:e-p:32:32-i64:64-f80:128-n8:16:32:64-S128",			UpgradeDataLayoutString("e-m:e-p:32:32-i64:64-i128:128-f80:128-n8:16:32:64-S128",
	"x86_64-unknown-linux-gnu");			"x86_64-unknown-linux-gnu");
	std::string DL2 = UpgradeDataLayoutString(			std::string DL2 = UpgradeDataLayoutString(
	"e-m:w-p:32:32-i64:64-f80:32-n8:16:32-S32", "i686-pc-windows-msvc");			"e-m:w-p:32:32-i64:64-i128:128-f80:32-n8:16:32-S32", "i686-pc-windows-msvc");
	std::string DL3 = UpgradeDataLayoutString("e-m:o-i64:64-i128:128-n32:64-S128",			std::string DL3 = UpgradeDataLayoutString("e-m:o-i64:64-i128:128-n32:64-S128",
	"x86_64-apple-macosx");			"x86_64-apple-macosx");
	EXPECT_EQ(DL1, "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64"			EXPECT_EQ(DL1, "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64"
	"-f80:128-n8:16:32:64-S128");			"-i128:128-f80:128-n8:16:32:64-S128");
	EXPECT_EQ(DL2, "e-m:w-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64"			EXPECT_EQ(DL2, "e-m:w-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64"
	"-f80:32-n8:16:32-S32");			"-i128:128-f80:32-n8:16:32-S32");
	EXPECT_EQ(DL3, "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128"			EXPECT_EQ(DL3, "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128"
	"-n32:64-S128");			"-n32:64-S128");
	}			}

	TEST(DataLayoutUpgradeTest, NoDataLayoutUpgrade) {			TEST(DataLayoutUpgradeTest, NoDataLayoutUpgrade) {
	std::string DL1 = UpgradeDataLayoutString(			std::string DL1 = UpgradeDataLayoutString(
	"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32"			"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32"
	"-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"			"-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
	"-n8:16:32:64-S128",			"-n8:16:32:64-S128",
	"x86_64-unknown-linux-gnu");			"x86_64-unknown-linux-gnu");
	std::string DL2 = UpgradeDataLayoutString("e-p:32:32", "i686-apple-darwin9");			std::string DL2 = UpgradeDataLayoutString("e-p:32:32", "i686-apple-darwin9");
	std::string DL3 = UpgradeDataLayoutString("e-m:e-i64:64-n32:64",			std::string DL3 = UpgradeDataLayoutString("e-m:e-i64:64-n32:64",
	"powerpc64le-unknown-linux-gnu");			"powerpc64le-unknown-linux-gnu");
	std::string DL4 =			std::string DL4 =
	UpgradeDataLayoutString("e-m:o-i64:64-i128:128-n32:64-S128", "aarch64--");			UpgradeDataLayoutString("e-m:o-i64:64-i128:128-n32:64-S128", "aarch64--");
	EXPECT_EQ(DL1, "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64"			EXPECT_EQ(DL1, "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128"
	"-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64"			"-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64"
	"-f80:128:128-n8:16:32:64-S128");			"-f80:128:128-n8:16:32:64-S128");
	EXPECT_EQ(DL2, "e-p:32:32");			EXPECT_EQ(DL2, "e-p:32:32");
	EXPECT_EQ(DL3, "e-m:e-i64:64-n32:64");			EXPECT_EQ(DL3, "e-m:e-i64:64-n32:64");
	EXPECT_EQ(DL4, "e-m:o-i64:64-i128:128-n32:64-S128");			EXPECT_EQ(DL4, "e-m:o-i64:64-i128:128-n32:64-S128");
	}			}

	TEST(DataLayoutUpgradeTest, EmptyDataLayout) {			TEST(DataLayoutUpgradeTest, EmptyDataLayout) {
	std::string DL1 = UpgradeDataLayoutString("", "x86_64-unknown-linux-gnu");			std::string DL1 = UpgradeDataLayoutString("", "x86_64-unknown-linux-gnu");
	std::string DL2 = UpgradeDataLayoutString(			std::string DL2 = UpgradeDataLayoutString(
	"e-m:e-p:32:32-i64:64-f80:128-n8:16:32:64-S128", "");			"e-m:e-p:32:32-i64:64-i128:128-f80:128-n8:16:32:64-S128", "");
	EXPECT_EQ(DL1, "");			EXPECT_EQ(DL1, "");
	EXPECT_EQ(DL2, "e-m:e-p:32:32-i64:64-f80:128-n8:16:32:64-S128");			EXPECT_EQ(DL2, "e-m:e-p:32:32-i64:64-i128:128-f80:128-n8:16:32:64-S128");
	}			}

	} // end namespace			} // end namespace

Diff 286858