This is an archive of the discontinued LLVM Phabricator instance.

[X86] Align i128 to 16 bytes in x86 datalayouts
ClosedPublic

Authored by hvdijk on Aug 20 2020, 10:54 AM.

Details

Summary

This is an attempt at rebooting https://reviews.llvm.org/D28990

I've included AutoUpgrade changes to modify the data layout to satisfy the compatible layout check. But this does mean alloca, loads, stores, etc in old IR will automatically get this new alignment.

This should fix PR46320.

Diff Detail

Event Timeline

craig.topper created this revision.Aug 20 2020, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2020, 10:54 AM
craig.topper requested review of this revision.Aug 20 2020, 10:54 AM

I'm afraid the AutoUpgrade component of this isn't compatible with existing IR without some additional work. I'm most concerned about cases like the following:

#pragma pack(8)
struct X { __int128 x; }; // Not a packed struct in IR because the native alignment is 8
struct Y { long long x; struct X y; }; // 24 bytes before autoupgrade, 32 bytes after
struct Y x;

On a related note, we need to add "Fn8" to the x86 datalayout at some point.

I'm afraid the AutoUpgrade component of this isn't compatible with existing IR without some additional work. I'm most concerned about cases like the following:

#pragma pack(8)
struct X { __int128 x; }; // Not a packed struct in IR because the native alignment is 8
struct Y { long long x; struct X y; }; // 24 bytes before autoupgrade, 32 bytes after
struct Y x;

On a related note, we need to add "Fn8" to the x86 datalayout at some point.

I kind of feared that old IR was going to be a problem. Any thoughts on how to fix it? Do we need to visit every alloca/load/store/etc that don't have explicit alignment and force them to the old alignment? Alternatively, could we skip the autoupgrade and weaken the compatible layout check somehow?

As far as I know, there are basically three categories of things that depend on the alignment of a type.

  1. The default alignment of load/store/alloca. On trunk, load/store/alloca always have explicitly specified alignment in memory. That said, old bitcode doesn't have explicit alignment in some cases, and we currently run UpgradeDataLayoutString() before we actually parse the IR instructions.
  2. The default alignment of global variables. Globals are allowed to have unspecified alignment, and the resulting alignment is implicitly computed by a sort of tricky algorithm. We could look into forcing it to be computed explicitly, but it's a lot of work because there are a lot of places in the code that create globals without specifying the alignment.
  3. The layout of other types: for a struct that isn't packed, LLVM implicitly inserts padding to ensure it's aligned. To make this work correctly, you'd have to rewrite the types of every global/load/store/GEP/etc so they don't depend on the alignment of i128.

To autoupgrade correctly, we have to handle all three of those.

We can't just weaken the compatible datalayout check because the modules are actually incompatible, for the above reasons.

hvdijk added a subscriber: hvdijk.May 4 2021, 12:52 PM

There is a risk of bitcode incompatibilities with this change, but we already have that the code we generate now is incompatible with GCC and results in crashes that way too, I don't think there's a perfect fix, I'd like it if we could merge this. I came up with roughly the same patch today, based on current sources, to fix bug #50198 before finding this one.

llvm/lib/IR/AutoUpgrade.cpp
5158

This needs to not be limited to TT.isArch64Bit(). i128 needs 16-byte alignment on all targets, and although clang disables __int128 for X86, we still use it for lowering f128.

RKSimon edited reviewers, added: hvdijk; removed: RKSimon.May 4 2021, 2:32 PM
RKSimon added a subscriber: RKSimon.
rnk added a subscriber: rnk.Jan 6 2022, 2:15 PM
rnk added a comment.Jan 6 2022, 2:15 PM

As far as I know, there are basically three categories of things that depend on the alignment of a type.

  1. The default alignment of load/store/alloca. On trunk, load/store/alloca always have explicitly specified alignment in memory. That said, old bitcode doesn't have explicit alignment in some cases, and we currently run UpgradeDataLayoutString() before we actually parse the IR instructions.
  2. The default alignment of global variables. Globals are allowed to have unspecified alignment, and the resulting alignment is implicitly computed by a sort of tricky algorithm. We could look into forcing it to be computed explicitly, but it's a lot of work because there are a lot of places in the code that create globals without specifying the alignment.
  3. The layout of other types: for a struct that isn't packed, LLVM implicitly inserts padding to ensure it's aligned. To make this work correctly, you'd have to rewrite the types of every global/load/store/GEP/etc so they don't depend on the alignment of i128.

To autoupgrade correctly, we have to handle all three of those.

We can't just weaken the compatible datalayout check because the modules are actually incompatible, for the above reasons.

I think it's feasible for the autoupgrader to use the original data layout from the module to "freeze" the IR by converting all unpacked struct types in the module to packed types and assigning explicit alignments to all memory operations that lack them. If that's what's required to give us the flexibility to change the datalayout in the future, so be it, it's probably worth doing, and all other targets will benefit as well.

There is a risk of bitcode incompatibilities with this change, but we already have that the code we generate now is incompatible with GCC and results in crashes that way too, I don't think there's a perfect fix, I'd like it if we could merge this. I came up with roughly the same patch today, based on current sources, to fix bug #50198 before finding this one.

Who exactly generates GCC-incompatible code, clang, LLVM, or some other frontend? My understanding is that Clang handles most struct layout and alignment concerns in the frontend. The feature I'm not clear on is calling convention lowering, so when i128 is passed in memory, the LLVM data layout controls its alignment. However, I wonder if the alignstack() parameter attribute can be used to control this instead from the frontend:
https://llvm.org/docs/LangRef.html#parameter-attributes

hvdijk added a comment.Jan 6 2022, 2:41 PM
In D86310#3226142, @rnk wrote:

There is a risk of bitcode incompatibilities with this change, but we already have that the code we generate now is incompatible with GCC and results in crashes that way too, I don't think there's a perfect fix, I'd like it if we could merge this. I came up with roughly the same patch today, based on current sources, to fix bug #50198 before finding this one.

Who exactly generates GCC-incompatible code, clang, LLVM, or some other frontend? My understanding is that Clang handles most struct layout and alignment concerns in the frontend.

It's usually handled by clang, but when operations get lowered by LLVM to libcalls, it's coming from LLVM, and that's happening in the bug I referenced in that comment.

kkysen added a subscriber: kkysen.Nov 27 2022, 11:44 PM

What is the current status of this patch? Are the reviewers here OK with this fix in general but just need to see changes to autoupgrade?

@craig.topper or @hvdijk since you worked on it, are you interested in doing these changes, or is this patch in need of new authors?

What is the current status of this patch? Are the reviewers here OK with this fix in general but just need to see changes to autoupgrade?

@craig.topper or @hvdijk since you worked on it, are you interested in doing these changes, or is this patch in need of new authors?

I'm no longer working on X86 so I won't be able to work on it.

What is the current status of this patch? Are the reviewers here OK with this fix in general but just need to see changes to autoupgrade?

@craig.topper or @hvdijk since you worked on it, are you interested in doing these changes, or is this patch in need of new authors?

The TT.isArch64Bit() thing I commented on is something I could change, if desired, but from my perspective, the suggested changes to the upgrade mechanism are an unreasonable amount of work considering the benefit is that it keeps already broken code equally broken, so I am not planning on working on that, sorry.

Thank you Craig and Harald for getting back so quick. I suppose that leaves it up to what level of AutoUpgrade changes would be accepted at a minimum.

@efriedma would you consider the changes suggested by @hvdijk sufficient under any circumstances or would you still insist on fully compatible AutoUpgrade, given the above discussion?

In D86310#3226142, @rnk wrote:

Who exactly generates GCC-incompatible code, clang, LLVM, or some other frontend? My understanding is that Clang handles most struct layout and alignment concerns in the frontend. The feature I'm not clear on is calling convention lowering, so when i128 is passed in memory, the LLVM data layout controls its alignment. However, I wonder if the alignstack() parameter attribute can be used to control this instead from the frontend:
https://llvm.org/docs/LangRef.html#parameter-attributes

Old question but just to add some more context - LLVM is generating code that is incorrect for the linux ABI (16-byte alignment is required, LLVM produces 8-byte alignment) but the Clang frontend patches this in a way that "mostly works". It does not always work, such as in the bug that Herald linked at https://bugs.llvm.org/show_bug.cgi?id=50198, which segfaults with the mostt recent LLVM versions but is OK with GCC. This is pretty bad because it means that any frontend has to provide a workaround just to make LLVM do the mostly correct (but still not fully correct) thing.

This came into relevance recently because we are revisiting the issue in Rust. I think we are pretty close to providing a hack solution like Clang does, but LLVM is objectively wrong here so there are going to be things that just don't work correctly for anybody until this gets fixed. There is some thorough discussion on our related issue, around this comment https://github.com/rust-lang/rust/issues/54341#issuecomment-1064729606.

Note that a fix for this was landed at some point but got reverted, https://reviews.llvm.org/D28990. @echristo as you were the reviewer there, do you maybe have anything to add about the proposed fix here?

A thought occurs: in older versions of LLVM, the data layout mechanism worked differently and permitted targets to declare that they supported multiple different data layout strings, by overriding isCompatibleDataLayout. This mechanism was removed in D67631. If we reinstate that, we can have the X86 target declare that it "supports" data layout strings with and without the -i128:128, where by "supports", I mean the code continues to not generally work in the same way it does not generally work now, but the specific limited cases that do work continue to work exactly the same ABI-incompatible way. This would have the same result of bug-for-bug compatibility with existing modules, but in what I suspect would be a significantly simpler way than by going through the module and adding explicit alignments everywhere. While I would still prefer to give up on that compatibility, if it is a hard requirement, and if this would be an alternative way of achieving it, I might possibly be able to update this patch to do just that. Would this be acceptable?

nikic added a comment.Jul 13 2023, 2:45 AM

A thought occurs: in older versions of LLVM, the data layout mechanism worked differently and permitted targets to declare that they supported multiple different data layout strings, by overriding isCompatibleDataLayout. This mechanism was removed in D67631. If we reinstate that, we can have the X86 target declare that it "supports" data layout strings with and without the -i128:128, where by "supports", I mean the code continues to not generally work in the same way it does not generally work now, but the specific limited cases that do work continue to work exactly the same ABI-incompatible way. This would have the same result of bug-for-bug compatibility with existing modules, but in what I suspect would be a significantly simpler way than by going through the module and adding explicit alignments everywhere. While I would still prefer to give up on that compatibility, if it is a hard requirement, and if this would be an alternative way of achieving it, I might possibly be able to update this patch to do just that. Would this be acceptable?

The main problem with that is that we can't have multiple data layouts for one module, so linking old and new bitcode together would fail. But maybe that's exactly what we want -- after all, it is incompatible. Even if we "correctly" upgraded to preserve behavior of the old bitcode, it would still be incompatible with the new bitcode if i128 crosses the ABI boundary (explicitly or implicitly).

hvdijk added a comment.EditedJul 13 2023, 9:45 AM

The main problem with that is that we can't have multiple data layouts for one module, so linking old and new bitcode together would fail.

Good point, but it's worth pointing out that this only applies to linking in the LLVM IR sense. Linking in the ELF object file sense would work exactly as it would with the explicit alignments added everywhere, as ELF object files do not contain that data layout string. Linking in the LLVM IR sense is what happens with clang -flto though.

But maybe that's exactly what we want -- after all, it is incompatible. Even if we "correctly" upgraded to preserve behavior of the old bitcode, it would still be incompatible with the new bitcode if i128 crosses the ABI boundary (explicitly or implicitly).

Yeah, that is a tricky question to answer. Let's say this change goes into LLVM 17, so LLVM 17 X86 data layouts include i128:128, and nothing is changed for LLVM 16. Let's also say we have a program made up of two source files, a.c, and b.c. Then:

  • clang-16 -c -flto a.c b.c && clang-17 a.o b.o should ideally be accepted and would behave in the same way as clang-16 -c a.c b.c && clang-16 a.o b.o.
  • clang-16 -c -flto a.c && clang-17 -c -flto b.c && clang-17 a.o b.o should ideally be rejected if both a.o nor b.o use i128, but possibly accepted otherwise?
  • clang-16 -c a.c && clang-17 -c b.c && clang-17 a.o b.o cannot be detected as an error if i128 is used in both, but will not behave sensibly.

I am not sure there is a simple solution there that covers all of it.

@efriedma would you consider the changes suggested by @hvdijk sufficient under any circumstances or would you still insist on fully compatible AutoUpgrade, given the above discussion?

If the requirement is "we can mix old and new IR", we have to do it correctly, to the extent old versions of clang do it correctly.

If we're willing to refuse to compile old IR and/or refuse to LTO together old and new IR, there are other possible solutions. I'm not sure what workflows depend on having working autoupgrade.

rnk added a comment.Jul 13 2023, 11:25 AM

I see two ways forward here:

  1. Autoupgrade modules with old datalayout strings by increasing the alignment of i128 & co. This will change LLVM IR struct layouts, argument alignments, etc. As far as native ABI boundaries are concerned, this should be "more correct": Clang explicitly applies alignstack attributes to increase the alignment of i128 arguments, and adds padding to structs to align i128. As far as IR ABI boundaries within LTO are concerned, it is ABI compatible with IR modules.
  2. Freeze the ABI of the old module during autoupgrade. Replace all struct types with equivalent packed structs and explicit padding. Apply explicit alignments to all i128 loads and stores. Apply explicit alignstack(8) attributes to all i128 arguments.

I think 1 is better than 2. The only problem that approach 2 solves is to ensure that a non-clang frontend using i128 is ABI compatible with old versions of that same frontend (think Rust). Given that most non-clang frontends want the bug fix (ABI break), who exactly is asking for this level of IR ABI stability? Maybe I'm missing something, but after skimming over this review again, I think the existing autoupgrade approach is probably good enough. Can we add a release note or something and leave it at that?

In D86310#4498551, @rnk wrote:

Given that most non-clang frontends want the bug fix (ABI break), who exactly is asking for this level of IR ABI stability?

You were, I thought, or at least that's how I interpreted your earlier comment. :) If we're now all in agreement that that level of ABI stability is not needed, I can update this patch to address the comment that I had left (it should not be limited to 64-bit, it's needed for all X86). I'll probably be able to find time for this in the weekend.

The only problem that approach 2 solves is to ensure that a non-clang frontend using i128

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

hvdijk added a comment.EditedJul 13 2023, 12:23 PM

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

clang bases it on the data layout, so when the change here is applied, clang already generates correct IR for that example without further changes (using %struct.Y = type <{ i64, %struct.X }>). Unless you were using that as an example of when using old clang to generate LLVM IR, and new LLVM to produce machine code, would break?

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

clang bases it on the data layout, so when the change here is applied, clang already generates correct IR for that example without further changes (using %struct.Y = type <{ i64, %struct.X }>). Unless you were using that as an example of when using old clang to generate LLVM IR, and new LLVM to produce machine code, would break?

I only meant to dispute the assertion that ABI compatibility with old IR is only a problem for non-clang frontends.

rnk added a comment.Jul 13 2023, 2:41 PM

https://reviews.llvm.org/D86310#2231136 has an example where IR generated by clang breaks.

Right, so we'd break LTO of packed structs with i128 members.

I still think we're overthinking this, and letting our ABI compat concerns block us from making progress. Maybe we could do something as simple as erroring out from the auto-upgrader when the module being upgraded has a struct whose layout would change as a result of the increased i128 alignment, while remaining bitcode compatible in the vast majority of cases.

As for the longer term solution to this problem, instead of permitting mixed data layouts of data layout customization, IMO LLVM structs should explicitly encode field offsets. LLVM would still have APIs to assist frontends with producing semi-C-compatible struct layouts, in so much as we do today.

I'm not personally involved with any workflows that care about autoupgrade, so I'm not really invested in ensuring it's stable. If everyone agrees the impact is small enough that we're willing to just break autoupgrade to the extent it's relevant, I'll withdraw my objection.

As for the longer term solution to this problem, instead of permitting mixed data layouts of data layout customization, IMO LLVM structs should explicitly encode field offsets. LLVM would still have APIs to assist frontends with producing semi-C-compatible struct layouts, in so much as we do today.

I agree with the general sentiment that we want less dependence on alignment specified in the datalayout. Not sure about the exact design of that for structs... if IR moves in the direction people are proposing, the notion of a "struct" with a memory layout will likely go away altogether. (If we remove struct types form global variables and GEPs, there's very little left that actually cares about the layout of structs in memory.)

In D86310#4499095, @rnk wrote:

I still think we're overthinking this, and letting our ABI compat concerns block us from making progress. Maybe we could do something as simple as erroring out from the auto-upgrader when the module being upgraded has a struct whose layout would change as a result of the increased i128 alignment, while remaining bitcode compatible in the vast majority of cases.

This seems like a reasonable path forward, avoiding any concerns about IR mismatches while alerting users to the change. I would have to imagine there aren't all that many users that (1) don't use clang or another frontend that has to deal with this somehow, (2) use these types, (3) completely rely on autoupgrade.

Any i128 use, not just structs, _could_ be checked to catch mismatches like #50198 or the below example (more info on that in the github link I sent above), but this would affect clang users as well.

void i128_val_in_0_perturbed_small(
  uint8_t arg0, 
  __int128_t arg1,
  __int128_t arg2, 
  __int128_t arg3,
  __int128_t arg4, 
  float arg5
);

As a legacy OS provider on a platform that needs/requires ABI compatibility, I don't like the direction this is going. Like @rnk, I would having MORE control over struct layout is better than less. I'm adapting non-traditional languages to LLVM which allow very explicit control over layout of fields in structs. I have system-provided headers and data structures that have been the same since 1977. Fortunately, none contain i128 (or f128) sized items but I'm watching closely about any undermining of data layout control. This area of layout control (both with fields in structures and variables in sections) has been our biggest challenge with getting OpenVMS running on x86 using LLVM. I really don't want to be locked into a older version of the backend out of concerns about ABI reshuffling. We guarantee that previously compiled images continue to execute forever and that you can mix/match objects across versions. You can compile one file today and link it against an object library (provided by a 3rd party vendor) that was compiled 5 years ago with older compilers and it will work as intended.

@JohnReagan That is a valid concern, and I hope it reassures you that if things were working before, I would never be on board with this change. For example, it would generally be better if long double were 8-byte-aligned, but the x86 32-bit ABI specifies that it is 4-byte-aligned, and that is set in stone. I would be against any change in LLVM's ABI that changed their alignment, even if it would speed up code. I still occasionally run 20-year-old binaries, myself, that are dynamically linked to shared object files built with current compilers. Compatibility matters, I would not be on board with a change that breaks things like that. But that is not what is happening here. For i128, what clang implemented matched GCC, what LLVM implemented deviated from GCC/clang, but LLVM assumed that its implementation actually did match GCC/clang and code crashed as a result. This change would make it so that LLVM starts to also match GCC/clang, to change things from something that doesn't work to something that does work, and because things crash in current LLVM, I do not believe there can be much code out there that relies on the current behaviour. As you say, you aren't using i128/f128 yourself either. I hope that when I can update the patch, you can check that this does not cause problems for you.

@craig.topper Just to make sure, are you okay with me 'commandeering' this change and updating it?

rnk added a comment.Jul 14 2023, 9:00 AM

For example, it would generally be better if long double were 8-byte-aligned, but the x86 32-bit ABI specifies that it is 4-byte-aligned, and that is set in stone. I would be against any change in LLVM's ABI that changed their alignment, even if it would speed up code.

That may be your view, but other users rely on the -malign-double flag (D19734) to get the new behavior, despite the ABI concerns. Specifically, it mattered for users passing structs from CPU to GPU, because the GPU doesn't tolerate misaligned doubles well. With that in mind, I wouldn't describe this ABI rule as being "set in stone", but I understand your perspective.

Returning to the patch at hand, it sounds like we have consensus that the next step is to teach auto-upgrade to traverse the module looking for uses of a particular type in structs and IR. That logic could be reused in the future to solve similar problems when we need to adjust the layout of exotic types.

In D86310#4501240, @rnk wrote:

For example, it would generally be better if long double were 8-byte-aligned, but the x86 32-bit ABI specifies that it is 4-byte-aligned, and that is set in stone. I would be against any change in LLVM's ABI that changed their alignment, even if it would speed up code.

That may be your view, but other users rely on the -malign-double flag (D19734) to get the new behavior, despite the ABI concerns. Specifically, it mattered for users passing structs from CPU to GPU, because the GPU doesn't tolerate misaligned doubles well. With that in mind, I wouldn't describe this ABI rule as being "set in stone", but I understand your perspective.

As long as it is an option, it is fine, that will not cause compatibility issues.

Returning to the patch at hand, it sounds like we have consensus that the next step is to teach auto-upgrade to traverse the module looking for uses of a particular type in structs and IR. That logic could be reused in the future to solve similar problems when we need to adjust the layout of exotic types.

That is not my understanding of the consensus that we have, that is something that you asked for, then asked who asked for it, and are now again asking for. I do not see anyone else having asked for this, and I repeat that I think it is an unreasonable amount of work.

@craig.topper Just to make sure, are you okay with me 'commandeering' this change and updating it?

Yes. Thanks for taking it on.

hvdijk commandeered this revision.Jul 16 2023, 5:38 AM
hvdijk edited reviewers, added: craig.topper; removed: hvdijk.
hvdijk updated this revision to Diff 540796.Jul 16 2023, 5:44 AM
hvdijk retitled this revision from [X86] Align i128 to 16 bytes in x86-64 datalayout to [X86] Align i128 to 16 bytes in x86 datalayouts.

Rebased on current LLVM. No longer limited to 64 bit mode. 32 bit mode was taken out in D28990 because it was not believed to be required for compatibility, but as PR50198 shows, it does cause a compatibility issue to leave that as is.

This diff is uploaded without full context, because full context causes it to grow beyond what Phabricator allows in an upload.

I will write about the current state of compatibility in more detail shortly.

Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2023, 5:44 AM
hvdijk added a comment.EditedJul 16 2023, 6:22 AM

THE CURRENT STATE

LLVM

LLVM permits i128 and fp128 in both x86 and x64. (When I write "x86", throughout this comment, I mean 32-bit x86.)
For x86, it aligns i128 to 4 bytes. It aligns fp128 to 16 bytes, except for Intel MCU. It calls libgcc functions for fp128, but not for i128. It uses i128 in these fp128 calls, which gets passed in memory.
For x64, it aligns i128 to 8 bytes. It aligns fp128 to 16 bytes. It calls libgcc functions for both i128 and fp128, and uses i128 in these fp128 calls. It uses i128 in these fp128 calls. Arguments to libgcc functions are passed in registers, with the exception of _ _udivmodti4 and _ _udivmodti4 which are currently not used by LLVM.

GCC

x86: GCC does not permit _ _int128/_BitInt(128). GCC does permit _Float128/_ _float128 and aligns it to 16 bytes, except for Intel MCU.
x64: GCC does not permit _BitInt(128). It permits _ _int128 and _Float128/_ _float128 and aligns them both to 16 bytes.

clang

x86: clang permits _BitInt(128) and aligns it to 4 bytes. It permits _ _float128 and aligns it to 16 bytes, except for Intel MCU. It does not permit _ _int128 or _Float128. It maps _BitInt(128) to LLVM i128 and _ _float128 to LLVM fp128.
x64: clang permits _BitInt(128), _ _int128, and _ _float128. It aligns _BitInt(128) to 8 bytes, and _ _int128 and _ _float128 to 16 bytes. It does not permit _Float128. It maps _BitInt(128) and _ _int128 to LLVM i128, and _ _float128 to LLVM fp128.

MSVC

MSVC does not support _ _int128/_BitInt(128)/_Float128/_ _float128 in either x86 or x64.

Compatibility between LLVM and GCC

For x86, the current i128 handling is compatible. The alignment to 8 byte boundaries causes no compatibility issues because nothing else supports i128.
For x86, the current fp128 handling is incompatible. The use of i128 with lower alignment in a call into libgcc breaks compatibility.

For x64, the current i128 handling is compatible but fragile. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. If support for _ _udivmodti4 and _ _udivmodti4 were to be added in the future, the current i128 handling would be wrong.
For x64, the current fp128 handling is compatible. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. No other libgcc functions use pointers.

Compatibility between clang and GCC

For both x86 and x64, for all types supported by both clang and GCC, they agree on alignment. The handling is compatible. For _BitInt(128), although not yet implemented in GCC, the x86-64 psABI has been changed to require that this be aligned like i64 (https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/32) and this is what GCC is implementing too (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). clang agrees with it. The i386 psABI has not yet been changed but has been said will follow the x86-64 psABI.

Compatibility between clang and LLVM

When generating LLVM IR, where LLVM's native alignment is different, this is worked around, e.g. by inserting dummy fields into structures to add padding. The handling is compatible.

Rust

Rust permits i128 and u128 in both x86 and x64. It translates this into LLVM i128. Per https://rust-lang.github.io/rfcs/1504-int128.html, this is intended to match clang's _ _int128. Because LLVM's i128 alignment is different from clang's _ _int128, it instead actually matches clang's _BitInt(128).

ISSUES

As far as I can tell, the compatibility issues in the current version of LLVM are: the fp128 handling in x86, potentially the i128 handling in x64, and the i128 handling in Rust.

For the fp128 handling in LLVM for x86, it is required that LLVM align these to 16 bytes.

For the i128 handling in LLVM for x64, it is not currently required that LLVM align these to 16 bytes, but it will be required in the future if _ _udivmodti4 and _ _udivmodti4 are added.

For the i128 handling in Rust, it assumes that LLVM's i128 matches clang's _ _int128, which it currently does not.

QUESTIONS

Is the behaviour of LLVM, clang, GCC, and MSVC indeed as I described, or did I make a mistake anywhere?

Am I correct in how these three issues affect compatibility?

Are there issues related to i128 alignment in current LLVM beyond what I have written here?

Note that in this particular comment, none of it is intended to provide any argument as to whether or not the patch should be applied. This comment is only intended to get clarity on the current state.

Edit: The constant alignment in bug 46320 is definitely an issue as well.

THE CURRENT STATE
[...]
Compatibility between LLVM and GCC

For x86, the current i128 handling is compatible. The alignment to 8 byte boundaries causes no compatibility issues because nothing else supports i128.
For x86, the current fp128 handling is incompatible. The use of i128 with lower alignment in a call into libgcc breaks compatibility.

For x64, the current i128 handling is compatible but fragile. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. If support for _ _udivmodti4 and _ _udivmodti4 were to be added in the future, the current i128 handling would be wrong.
For x64, the current fp128 handling is compatible. The alignment to 8 byte boundaries causes no compatibility issue because all calls into libgcc pass values in registers. No other libgcc functions use pointers.

Is the compatibility note here only meant to address calls between LLVM and GCC, not generated code? Because of course, struct layout and pass-in-memory function calls are incompatible.

ISSUES

As far as I can tell, the compatibility issues in the current version of LLVM are: the fp128 handling in x86, potentially the i128 handling in x64, and the i128 handling in Rust.

Rust just uses LLVM's i128 value directly so it doesn't necessarily need to be called out on its own (think we are in agreement here, just clarifying)

[...]
QUESTIONS

Is the behaviour of LLVM, clang, GCC, and MSVC indeed as I described, or did I make a mistake anywhere?

I believe that MSVC is in general ambiguous about these details on types that it does not support, but I would assume that being consistent with the Linux ABI is preferred and probably what MSVC would choose if they ever do decide on a specification for this type (unless LLVM has contact with Microsoft that may be able to clarify? They make no guarantees against breaking things in any case.)

[...]

It probably makes sense to have reasoning for choosing the selected behavior and having something specific to test against, so I'll link what I know.

[paraphrased from Figure 3.1]
type - sizeof - alignment - AMD64 architecture
long - 8 - 8 - signed eightbyte [I included this in the table for the below reference]
__int128 - 16 - 16 - signed sixteenbyte
signed __int128 - 16 - 16 - signed sixteenbyte
long double - 16 - 16 - 80-bit extended (IEEE-754)
__float128 - 16 - 16 - 128-bit extended (IEEE-754)
[...]
The __int128 type is stored in little-endian order in memory, i.e., the 64
low-order bits are stored at a a lower address than the 64 high-order bits
[...]
Arguments of type __int128 offer the same operations as INTEGERs,
yet they do not fit into one general purpose register but require two registers.
For classification purposes __int128 is treated as if it were implemented
as:

typedef struct {
    long low, high;
} __int128;

with the exception that arguments of type __int128 that are stored in
memory must be aligned on a 16-byte boundary

Quad-precision floating point parameters (C long double or Fortran REAL*16) are
always 16-byte aligned. This requires that they be passed in even-odd floating point
register pairs, even if doing so requires skipping a register parameter and/or a
64-bit save area slot. [The 32-bit ABI does not consider long double parameters,
since they were not supported.]

[paraphrased from table]
type - sizeof - alignment
__int128_t - 16 - quadword
__uint128_t - 16 - quadword
long double - 16 - quadword

[paraphrased from table]
type - size (bytes) - alignment
__int128 - 16 - 8
signed __int128 - 16 - 8
long double - 16 - 8

[1]: https://reviews.llvm.org/D130900

hvdijk added a comment.EditedJul 19 2023, 1:56 PM

Is the compatibility note here only meant to address calls between LLVM and GCC, not generated code? Because of course, struct layout and pass-in-memory function calls are incompatible.

There should be no compatibility issue there between GCC and clang in most cases, because clang ensures __int128 is aligned to 16 bytes everywhere, even if the LLVM data layout specifies lower alignment. clang's __int128 and LLVM's i128 play by different rules, currently. This change would make them play by the same rules.

Rust just uses LLVM's i128 value directly so it doesn't necessarily need to be called out on its own (think we are in agreement here, just clarifying)

I do think it needs to be called out on its own: Rust makes its i128 match LLVM's i128, while assuming it's matching clang's __int128.

I believe that MSVC is in general ambiguous about these details on types that it does not support, but I would assume that being consistent with the Linux ABI is preferred and probably what MSVC would choose if they ever do decide on a specification for this type (unless LLVM has contact with Microsoft that may be able to clarify? They make no guarantees against breaking things in any case.)

I would assume the same unless we hear otherwise.

Thanks for the references to the ABIs for __int128. It's good to know that if we decide LLVM's i128 should match that, we can make the same change everywhere.

Here's confirmation that _BitInt(128) should be 8-byte aligned and not 16 (so, different from __int128) from https://gitlab.com/x86-psABIs/x86-64-ABI:

• For N > 64, they are treated as struct of 64-bit integer chunks. The number of
chunks is the smallest number that can contain the type. _BitInt(N) types are
byte-aligned to 64 bits. The size of these types is the smallest multiple of the 64-bit
chunks greater than or equal to N.

Just FYI. There are a few reports about the compatibility issues, e.g., #41784. There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

Just FYI. There are a few reports about the compatibility issues, e.g., #41784.

Thanks. This is a case where clang breaks up __int128 into 2x i64 before it gets to LLVM. It is therefore not affected by this patch. Your other link also references #20283, which is the same issue of clang breaking __int128 into 2x i64.

Although this patch will not fix those issues, it may make it easier to fix them later on: it will give clang the ability to use LLVM's i128 type rather than trying to emulate it.

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

Just FYI. There are a few reports about the compatibility issues, e.g., #41784.

Thanks. This is a case where clang breaks up __int128 into 2x i64 before it gets to LLVM. It is therefore not affected by this patch. Your other link also references #20283, which is the same issue of clang breaking __int128 into 2x i64.

Although this patch will not fix those issues, it may make it easier to fix them later on: it will give clang the ability to use LLVM's i128 type rather than trying to emulate it.>

Does this happen on the clang side or the LLVM side? I built rustc against LLVM with your patch ([link to source](llvm.org/docs/LangRef.html#floating-point-types)) and it makes rustc compatible with clang (progress!) but it still seems not compatible with GCC. That is, after the patch rustc now seems to have an identical calling behavior to clang, so I'm thinking that maybe this behavior lies somewhere in LLVM and not the frontend?

Quick ABI check that demonstrates this https://github.com/tgross35/quick-abi-check, the outputs of note (clang-new is built with this patch):

# all caller-foo-callee-samefoo work fine

+ ./bins/caller-gcc-callee-gcc
caller cc: gcc 11.3.0
caller align i128 16
caller arg0 244
caller argval 0xf0e0d0c0b0a09080706050403020100
caller arg15 123456.125000
callee cc: gcc 11.3.0
callee arg0 244
callee arg1 0xf0e0d0c0b0a09080706050403020100
callee arg2 0xf0e0d0c0b0a09080706050403020100
callee arg3 0xf0e0d0c0b0a09080706050403020100
callee arg4 0xf0e0d0c0b0a09080706050403020100
callee arg15 123456.125000

# between clang and gcc arg3+ seem to flip he word order?
+ ./bins/caller-gcc-callee-clang-old
caller cc: gcc 11.3.0
caller align i128 16
caller arg0 244
caller argval 0xf0e0d0c0b0a09080706050403020100
caller arg15 123456.125000
callee cc: clang 14.0.0 
callee arg0 244
callee arg1 0xf0e0d0c0b0a09080706050403020100
callee arg2 0xf0e0d0c0b0a09080706050403020100
callee arg3 0x7060504030201000000000000000000
callee arg4 0x7060504030201000f0e0d0c0b0a0908
callee arg15 123456.125000

+ ./bins/caller-gcc-callee-clang-new
caller cc: gcc 11.3.0
caller align i128 16
caller arg0 244
caller argval 0xf0e0d0c0b0a09080706050403020100
caller arg15 123456.125000
callee cc: clang 17.0.0 (git@github.com:tgross35/llvm-project.git 1733d949633a61cd0213f63e22d461a39e798946)
callee arg0 244
callee arg1 0xf0e0d0c0b0a09080706050403020100
callee arg2 0xf0e0d0c0b0a09080706050403020100
callee arg3 0x7060504030201000000000000000000
callee arg4 0x7060504030201000f0e0d0c0b0a0908
callee arg15 123456.125000

I think this patch can stand on its own even if it doesn't fix the above, but I'm just trying to get a better idea of where it's coming from if anyone knows more details.

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

I reached out to the person who authored that. We will follow up with the mailing list to at least make a mild push for _BitInt(128) to take the alignment of __int128 (imagine the stackoverflow confusion if they aren't the same). However, I'm in agreement with the above comment - that is a separate concern from the behavior here since LLVM already complies with the bitint spec.

Does this happen on the clang side or the LLVM side?

Definitely on the clang side, but...

I built rustc against LLVM with your patch ([link to source](llvm.org/docs/LangRef.html#floating-point-types)) and it makes rustc compatible with clang (progress!) but it still seems not compatible with GCC. That is, after the patch rustc now seems to have an identical calling behavior to clang, so I'm thinking that maybe this behavior lies somewhere in LLVM and not the frontend?

...this suggests that potentially the same thing that clang is doing, LLVM is also doing independently. In which case, maybe it would be better to fix that at the same time: if we decide that LLVM's i128 should match __int128, I'd rather have a single change to the ABI to make it match __int128, rather than incremental changes, because incremental changes make it more likely that someone is going to be using the intermediate version and relying on its ABI. It'll be a little while before I can look into this but I'll try to come up with a patch to apply on top of this if no one else gets to it first.

Quick ABI check that demonstrates this https://github.com/tgross35/quick-abi-check, the outputs of note (clang-new is built with this patch):

Thanks, this is useful as an extra test.

@nikic posted a patch that fixes the register passing at https://reviews.llvm.org/D158169. I think that patch plus this one should resolve all the problems we have

@nikic posted a patch that fixes the register passing at https://reviews.llvm.org/D158169. I think that patch plus this one should resolve all the problems we have

Thanks for the link, that will save a lot of time. I don't think it will resolve all the problems, but that it's a significant additional step in the right direction. We also need to make clang use i128 for __int128 in order to actually use this fixed calling convention.

Is clang still doing something wrong? From my testing, it seems like clang and GCC now agree with each other, I am not sure what would still be incorrect

Is clang still doing something wrong? From my testing, it seems like clang and GCC now agree with each other, I am not sure what would still be incorrect

My understanding is that the code clang generates for __int128 will still allow it to be passed half-in-register, half-in-memory, exactly what D158169 sets out to fix, because D158169 only fixes it for LLVM's i128 which clang bypasses.

My understanding is that the code clang generates for __int128 will still allow it to be passed half-in-register, half-in-memory, exactly what D158169 sets out to fix, because D158169 only fixes it for LLVM's i128 which clang bypasses.

I think that D158169 seems to have fixed clang as well; after applying both patches, clang gcc and rustc all seem to agree. On the readme for https://github.com/tgross35/quick-abi-check look at the tests i128-caller-gcc-callee-clang-old (args don't align) i128-caller-gcc-callee-clang-new (args are the same) and i128-caller-gcc-callee-rustc (args are the same). Also the full ABI checker seems to say everything is in order (https://github.com/rust-lang/rust/pull/113880#issuecomment-1683021483 not sure why it says "4 failed" at the end, but I think it's a bug since no tests actually show failed).

Does this all seem correct? As far as I can tell it seems like with both patchs these issues should be resolved.

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches? I cannot reproduce that failure for some reason, but it would likely make a good run-pass test.

These two patches do not seem to fix varargs segfaulting, as documented in https://bugs.llvm.org/show_bug.cgi?id=19909 (testing with this code https://godbolt.org/z/WeE7TvrGe) so it seems like that will need a separate fix.

hvdijk added a comment.EditedAug 17 2023, 5:12 PM

I think that D158169 seems to have fixed clang as well; after applying both patches, clang gcc and rustc all seem to agree.

Interesting. I cannot see how it would, I may be missing something; I will check when I am able.

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches?

Yes, it was (at least it was at the time that I initially commented).

I cannot reproduce that failure for some reason, but it would likely make a good run-pass test.

It's reproducible online, https://godbolt.org/z/j918EeoMv, it would be interesting to know why it does not fail for you.

These two patches do not seem to fix varargs segfaulting, as documented in https://bugs.llvm.org/show_bug.cgi?id=19909 (testing with this code https://godbolt.org/z/WeE7TvrGe) so it seems like that will need a separate fix.

Thanks, and clang appears to avoid the use of the LLVM va_arg instruction here; we'll have to make sure to adapt that example to the LLVM IR equivalent that does use va_arg to make sure that's tested as well, and fixed if needed.

I cannot reproduce that failure for some reason, but it would likely make a good run-pass test.

It's reproducible online, https://godbolt.org/z/j918EeoMv, it would be interesting to know why it does not fail for you.

I tested both on my machine and in a container (debian docker image, then installing clang and gcc-multilib only) and can't get it to reproduce. Weird.

$ clang --version
Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ cat stack-fail.c
int main(void) {
  long double ld = 0;
  __float128 f128 = ld;
  int i = f128;
  return i;
}
$ clang -m32 stack-fail.c && ./a.out && echo ok
ok


The IR looks about the same as on godbolt but maybe the attributes are affecting something. It's probably still doing the wrong thing, just not segfaulting for whatever reason

These two patches do not seem to fix varargs segfaulting, as documented in https://bugs.llvm.org/show_bug.cgi?id=19909 (testing with this code https://godbolt.org/z/WeE7TvrGe) so it seems like that will need a separate fix.

Thanks, and clang appears to avoid the use of the LLVM va_arg instruction here; we'll have to make sure to adapt that example to the LLVM IR equivalent that does use va_arg to make sure that's tested as well, and fixed if needed.

Are you comfortable landing these two patches without fixing varargs, since it seems like those need separate work? Not sure if llvm follows kernel conventions but assuming yes and assuming you are OK with however D158169 seems to fix clang, you can add Tested-by: Trevor Gross <tmgross@umich.edu>: to the best of my knowledge the alignment, ABI, and general interop problems on x86_64 have been resolved for both LLVM and Clang.

I think that D158169 seems to have fixed clang as well; after applying both patches, clang gcc and rustc all seem to agree.

Interesting. I cannot see how it would, I may be missing something; I will check when I am able.

D158169 landed today, I confirmed that the current main (with D158169) makes Clang <-> GCC works but LLVM still fails without this patch.

Doesn't clang just wind up going through the same tablegen as LLVM, so it makes sense that both would be fixed?

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches?

Yes, it was (at least it was at the time that I initially commented).

You mean this patch only right - how does that work? Looking closer at your comments there, it doesn't seem like i128 changes would affect anything if the f128 return alignment is the source of the problem.

I think that D158169 seems to have fixed clang as well; after applying both patches, clang gcc and rustc all seem to agree.

Interesting. I cannot see how it would, I may be missing something; I will check when I am able.

D158169 landed today, I confirmed that the current main (with D158169) makes Clang <-> GCC works but LLVM still fails without this patch.

I had hoped to avoid the piecewise ABI breakage, but with that already having landed, we already have that anyway, so I no longer see a reason to delay this until we can also fix va_arg.

Doesn't clang just wind up going through the same tablegen as LLVM, so it makes sense that both would be fixed?

Actually able to look into this now again, and yes, it does. I was sure I'd seen clang expand __int128 so that at the LLVM level, there was no longer any i128, but it does not happen here, and because it does not happen here, this patch does fix it.

Was your failure in https://bugs.llvm.org/show_bug.cgi?id=50198 fixed with these patches?

Yes, it was (at least it was at the time that I initially commented).

You mean this patch only right - how does that work? Looking closer at your comments there, it doesn't seem like i128 changes would affect anything if the f128 return alignment is the source of the problem.

See the source code comment I quoted in https://bugs.llvm.org/show_bug.cgi?id=50198#c3: "If the target does not have native f128 support, expand it to i128 and we will be generating soft float library calls." This applies to x86. f128 is expanded to i128, so any changes to the alignment for i128 automatically apply to f128 as well.

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

The discussion has since moved to the list (https://groups.google.com/g/x86-64-abi/c/-JeR9HgUU20) and it seems as if the alignment of __int128 is fixed, no changes are planned there; if anything changes, it will be the alignment of _BitInt(128), and that will be independent of this patch.

Based on this, I now do think again the right course of action is to just commit this. It still applies to current LLVM without changes, and passes tests.

The point that is still contentious is the handling of IR generated from older versions of LLVM that do not have this patch. Personally, I feel that D158169 being accepted already answered how to handle this. D158169 clearly broke the ABI in LLVM: code generated with the current version of LLVM is not binary compatible with code generated with older versions of LLVM. But that is considered acceptable when the code generated by these older versions of LLVM was buggy and we have no reason to expect that there is code out there that relies on that bug remaining unfixed. The same logic applies here.

See the source code comment I quoted in https://bugs.llvm.org/show_bug.cgi?id=50198#c3: "If the target does not have native f128 support, expand it to i128 and we will be generating soft float library calls." This applies to x86. f128 is expanded to i128, so any changes to the alignment for i128 automatically apply to f128 as well.

Thank you for the explanation, that makes sense.

There's also concern about the alignment difference between _BitInt(128) and __int128, see #60925

That references https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/11, where the answer four months ago was basically "it's probably already too late for that" with a suggestion to try and post on the mailing list to try and convince others that this was important enough to do. Nothing was posted to the mailing list, and by now GCC has started implementing what the ABI specifies (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989). I think we would need an extraordinary rationale if we want to convince others that the ABI should be changed.

The discussion has since moved to the list (https://groups.google.com/g/x86-64-abi/c/-JeR9HgUU20) and it seems as if the alignment of __int128 is fixed, no changes are planned there; if anything changes, it will be the alignment of _BitInt(128), and that will be independent of this patch.

Agreed; LLVM is doing the wrong thing with i128 and the correct thing for _BitInt(128), so _BitInt has no bearing on this change.

Based on this, I now do think again the right course of action is to just commit this. It still applies to current LLVM without changes, and passes tests.

The point that is still contentious is the handling of IR generated from older versions of LLVM that do not have this patch. Personally, I feel that D158169 being accepted already answered how to handle this. D158169 clearly broke the ABI in LLVM: code generated with the current version of LLVM is not binary compatible with code generated with older versions of LLVM. But that is considered acceptable when the code generated by these older versions of LLVM was buggy and we have no reason to expect that there is code out there that relies on that bug remaining unfixed. The same logic applies here.

Also agreed with this, I think concensus on this thread seems to be in agreement with the current patch too. Looking forward to the land :)

Is this just waiting for a review?

Is this just waiting for a review?

Yes, I think so. Valid concerns over compatibility were raised, but now that strict compatibility with i128 has already been broken anyway, I no longer believe there is any reason not to just apply this as is, preferably soon so as to minimise the time that we have the current partially changed i128 ABI, to minimise the chance that people will start to rely on that somewhere.

I'm happy to sign off on the x86-64 part here, but I'm less sure about x86-32. If I understood correctly, the i128 alignment is raised there exclusively to fix the "f128 legalized to i128 libcall" case -- is there any other ABI requirement for i128 alignment on x86-32? Is raising i128 alignment the right way to fix an f128 issue?

I'm happy to sign off on the x86-64 part here, but I'm less sure about x86-32. If I understood correctly, the i128 alignment is raised there exclusively to fix the "f128 legalized to i128 libcall" case -- is there any other ABI requirement for i128 alignment on x86-32? Is raising i128 alignment the right way to fix an f128 issue?

GCC does not support __int128 on x86-32, but clang has the -fforce-enable-int128 option, and when that is used, it gives it the same 16-byte alignment that it does on x86-64, so even ignoring the _Float128 issue, I think the change is right for x86-32.

nikic added a comment.Sep 20 2023, 3:48 AM

I'm happy to sign off on the x86-64 part here, but I'm less sure about x86-32. If I understood correctly, the i128 alignment is raised there exclusively to fix the "f128 legalized to i128 libcall" case -- is there any other ABI requirement for i128 alignment on x86-32? Is raising i128 alignment the right way to fix an f128 issue?

GCC does not support __int128 on x86-32, but clang has the -fforce-enable-int128 option, and when that is used, it gives it the same 16-byte alignment that it does on x86-64, so even ignoring the _Float128 issue, I think the change is right for x86-32.

Okay, that's a compelling argument.

llvm/lib/IR/AutoUpgrade.cpp
5233

I don't think this will work for the 32-bit targets that don't have -i64:64.

hvdijk added inline comments.Sep 20 2023, 10:03 AM
llvm/lib/IR/AutoUpgrade.cpp
5233

Oh, you're right, thanks. That was intentional but wrong: there is a test that we do not upgrade data layout strings that do not look sufficiently close to valid, and this was intended to address that. But this also avoids it for data layout strings that do need upgrading. I'll have to figure out how to handle both; will update when I know how.

hvdijk added a comment.EditedSep 21 2023, 5:38 PM

I do not think there is a sensible way to keep upgrade-datalayout2.ll working, with the way the upgrade logic is structured, and we should rethink that test. The change here intends to insert -i128:128- into x86 data layouts that do not have it. The goal of upgrade-datalayout2.ll is to test that data layouts that are not valid x86 data layouts do not get upgraded. However, I see no sensible logic by which we can say that in this particular case, we should not add it.

What's more, none of the data layout upgrades *ever* checked that the data layout was a valid x86 data layout, not even D67631 which added this test: it is easy to construct data layout strings that are not valid x86 data layout strings, that would already be upgraded by that very first version of UpgradeDataLayoutString, despite what the test claimed to check. So if we regard it as a bug to upgrade invalid target data layout strings, this is a pre-existing bug. Alternatively, we can choose to not regard it as a bug, and instead say the test is invalid. I do not know the rationale here, but given that it was explicitly said to be intended to work this way, I am on the side of seeing it as a pre-existing bug. One that is nearly impossible to fix in the current structure.

Now that there can only be one valid data layout string per target, if it is intended that UpgradeDataLayoutString only upgrade target-valid data layout strings, it is a bug for UpgradeDataLayoutString to ever produce anything other than 1) its input or 2) the target's one valid data layout string. This allows a much simpler implementation that completely fixes the bug, but is too big to be part of this change. I would like to propose that in this change, we change UpgradeDataLayoutString to insert -i128:128- including in that one test, and we XFAIL upgrade-datalayout2.ll since the uncovered bug is not actually a new bug. In a followup PR, I can then restructure the UpgradeDataLayoutString logic by removing the function entirely and instead having target functions to check whether a given data layout string is a valid historic data layout string for the target that should be upgraded, and if so, simply clobbering the data layout string with what the target reports is the correct data layout string. (Edit: Proof of concept: https://github.com/hvdijk/llvm-project/commit/14e7f5dd2b8de862773b0700bde483bd722e4ad5. This does not update the tests that actually rely on invalid data layout strings being upgraded, and does not merge this new computeX86DataLayout with the existing computeDataLayout in X86TargetMachine.cpp, but should make it clear it can be done without difficulty.)

Does that seem reasonable? Am I overlooking anything that would make this a non-option? Are there good alternatives that I am not seeing right now?

rnk added a subscriber: akhuang.Oct 3 2023, 2:01 PM

Regarding upgrade-datalayout2.ll, I don't think we need to be too constrained by it. @akhuang , do you recall why you added it?

In other words, I think your direction is reasonable, we should go forward with this.

hvdijk updated this revision to Diff 557645.Oct 8 2023, 6:20 PM
  • Updated AutoUpgrade.cpp to also upgrade 32-bit data layout strings.
  • Marked upgrade-datalayout2.ll as XFAIL with an explanation.
  • Added upgrade-datalayout5.ll to check that we upgrade 32-bit data layout strings.
  • Removed a test from DataLayoutUpgradeTest.cpp that we do not upgrade e-p:32:32, which now gets upgraded to e-p:32:32-i128:128.
In D86310#4652817, @rnk wrote:

Regarding upgrade-datalayout2.ll, I don't think we need to be too constrained by it. @akhuang , do you recall why you added it?

In other words, I think your direction is reasonable, we should go forward with this.

Thanks, I have now done this. The same problem turned out to also exist in DataLayoutUpgradeTest.cpp, where XFAIL is not an option, so I removed that one. That is not ideal but I do not see what alternative we have.

hvdijk added inline comments.Oct 8 2023, 6:24 PM
llvm/lib/IR/AutoUpgrade.cpp
5233

This should now be fixed. X86 data layout strings always have their components in the same order, mpifnaS, where some may be omitted. I make use of this by looking for any leading -m/-p/-i components and inserting -i128:128 after the last of those.

rnk accepted this revision.Oct 9 2023, 1:28 PM

I re-read the code review, and I think most folks are in favor of this change, but I may have missed some. Many concerns were raised, so please wait for approval from @efriedma as well before landing.

This revision is now accepted and ready to land.Oct 9 2023, 1:28 PM

Given the complexity here, I agree this is probably the best we can reasonably do. Code and test changes LGTM.

That said, this is missing a release note.

hvdijk updated this revision to Diff 557655.Oct 9 2023, 4:36 PM

Given the complexity here, I agree this is probably the best we can reasonably do. Code and test changes LGTM.

That said, this is missing a release note.

Thanks, added a release note. I'm not sure how much detail release notes usually go into, can shorten it or add to it if you like.

echristo accepted this revision.Oct 9 2023, 5:07 PM

Explicitly still ok with this as well. Thanks for continuing here. :)

tmgross accepted this revision.Oct 9 2023, 9:15 PM

Tested that this patch applied on top of main fixes all i128 ABI issues among gcc, clang, and rustc. Probably would be good to add https://bugs.llvm.org/show_bug.cgi?id=50198 to the test suite if it isn't there already.

Thanks for sticking with this Harald!

Probably would be good to add https://bugs.llvm.org/show_bug.cgi?id=50198 to the test suite if it isn't there already.

That test would not work as an LLVM test directly, but we do already have lit tests that cover that, the test changes in here show the fixed alignment of f128 too.

This was ready to push pending @efriedma's approval, who rightly pointed out a release note was missing but it was otherwise okay. With the release note now added, I think that there is nothing stopping this from being pushed, so I intend to do so once I am able to rebase one hopefully last time and re-run tests to verify no new tests have been added that also require an update. Thanks for the feedback, everyone.

This revision was automatically updated to reflect the committed changes.

The buildbot found one more test that needed updating, that was disabled on my system. Created https://github.com/llvm/llvm-project/pull/68781 for that.

Hi there,

This change seems to be causing assertion failure in clang when a struct contains a _BitInt with length longer than 128 - https://godbolt.org/z/4jTrW4fcP .

hvdijk added a comment.EditedNov 2 2023, 12:27 PM

Hi there,

This change seems to be causing assertion failure in clang when a struct contains a _BitInt with length longer than 128 - https://godbolt.org/z/4jTrW4fcP .

Thanks for the report. This is a pre-existing problem for i128:128 targets -- the same assertion failure can be seen, without this change, for the same program with e.g. --target=aarch64 -Xclang -fexperimental-max-bitint-width=1024 -- so I don't think it's a problem in this change directly, but considering X86 is the only target that enables >128-bit bit integers by default, it became far more visible after this change and because of that, more important to fix. This is related to an open ABI issue, #60925, and I have added a comment there.