This is an archive of the discontinued LLVM Phabricator instance.

[IR] Support importing modules with invalid data layouts.
ClosedPublic

Authored by jsilvanus on Jan 4 2023, 6:39 AM.

Details

Summary

Use the existing mechanism to change the data layout using callbacks.

Before this patch, we had a callback type DataLayoutCallbackTy that receives
a single StringRef specifying the target triple, and optionally returns
the data layout string to be used. Module loaders (both IR and BC) then
apply the callback to potentially override the module's data layout,
after first having imported and parsed the data layout from the file.

We can't do the same to fix invalid data layouts, because the import will already
fail, before the callback has a chance to fix it.
Instead, module loaders now tentatively parse the data layout into a string,
wait until the target triple has been parsed, apply the override callback
to the imported string and only then parse the tentative string as a data layout.

Moreover, add the old data layout string S as second argument to the callback,
in addition to the already existing target triple argument.
S is either the default data layout string in case none is specified, or the data
layout string specified in the module, possibly after auto-upgrades (for the BitcodeReader).
This allows callbacks to inspect the old data layout string,
and fix it instead of setting a fixed data layout.

Also allow to pass data layout override callbacks to lazy bitcode module
loader functions.

Diff Detail

Event Timeline

jsilvanus created this revision.Jan 4 2023, 6:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 6:39 AM
jsilvanus requested review of this revision.Jan 4 2023, 6:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 6:39 AM
jsilvanus updated this revision to Diff 486273.Jan 4 2023, 6:53 AM

Add lit test case.

For context, this prepares requiring naturally aligned i8 as discussed in https://discourse.llvm.org/t/status-of-overaligned-i8/66913/

This will render some existing modules (e.g. DXIL) invalid, but some users still need to import such modules.

The changes look good from my side. Can you add a unittest that converts an invalid DataLayout string to something legal?

llvm/include/llvm/Bitcode/BitcodeReader.h
38–39

Instead of “old” data layout string, maybe something like “data layout string from the input”?

arsenm added a subscriber: arsenm.Jan 5 2023, 6:43 AM

Is DataLayout compatibility still determined with string equality and not semantic equality?

Is DataLayout compatibility still determined with string equality and not semantic equality?

What exactly do you mean?
The callback mechanism changed in this patch was and is string-based, and the parsing of the string into a DL happens after the callback.

jsilvanus updated this revision to Diff 487390.Jan 9 2023, 5:10 AM
Address review feedback

 - improve comment on callback args
 - add callback unit test
nikic added a reviewer: nikic.Jan 10 2023, 8:53 AM
nikic added a subscriber: nikic.
nikic added inline comments.
llvm/lib/AsmParser/LLParser.cpp
384

It would be better to go through DataLayout::parse here as well, and report without location. setDataLayout will cause a fatal error for invalid data layouts, I believe.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
4210

High level question: Why is the existing UpgradeDataLayoutString mechanigm not sufficient?

jsilvanus updated this revision to Diff 487853.Jan 10 2023, 9:48 AM
jsilvanus marked an inline comment as done.

Address review feedback: Handle invalid data layouts gracefully

jsilvanus marked an inline comment as done.Jan 10 2023, 9:49 AM

Thanks for reviewing @nikic. Comments inline.

llvm/lib/AsmParser/LLParser.cpp
384

That's correct. I've changed it to always use DataLayout::parse and return an error instead of running into a fatal error.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
4210

In the case this is intended for (requiring natural alignment for i8), it is true that there is a straightforward fix of a broken data layout string: Just replace i8s alignment with natural alignment.

But depending on how i8 is used in the module, this can break semantics, requiring nontrivial fixup (e.g. adding manual padding). This does not seem to be necessary for DXIL though.

I wanted to avoid auto-upgrade (silently?) breaking a module, and adding the manual padding as part of auto-upgrade seems excessive, at least given that we're not aware of any cases actually requiring it.

On the other hand, letting the user fix the layout string avoids that issue of auto-upgrade breaking a module. If the user is sure that i8 alignment does not matter, the callback provides the option to fix the DL string.

Is DataLayout compatibility still determined with string equality and not semantic equality?

What exactly do you mean?
The callback mechanism changed in this patch was and is string-based, and the parsing of the string into a DL happens after the callback.

I mean llvm-link complains if you link modules with different datalayouts, but it does dumb string comparison so if you reorder fields (or rely on defaulted fields) it prints a warning

nikic accepted this revision.Jan 11 2023, 7:52 AM

LGTM

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
4210

I see. If this is intended for manual user intervention, then that makes sense.

This revision is now accepted and ready to land.Jan 11 2023, 7:52 AM
This revision was landed with ongoing or failed builds.Jan 12 2023, 1:12 AM
This revision was automatically updated to reflect the committed changes.
jsilvanus marked an inline comment as not done.
cmtice added a subscriber: cmtice.Jan 12 2023, 9:31 PM

Just a heads up: It looks like this change is causing many of our internal tests to fail. I'm working on trying to create a reproducer for you.