This is an archive of the discontinued LLVM Phabricator instance.

Infer alignment of loads with unspecified alignment in IR/bitcode parsing.

Authored by efriedma on Apr 17 2020, 2:04 PM.



For IR generated by a compiler, this is really simple: you just take the datalayout from the beginning of the file, and apply it to all the IR later in the file. For optimization testcases that don't care about the datalayout, this is also really simple: we just use the default datalayout.

The complexity here comes from the fact that some LLVM tools allow overriding the datalayout: some tools have an explicit flag for this, some tools will infer a datalayout based on the code generation target. Supporting this properly required plumbing through a bunch of new machinery: we want to allow overriding the datalayout after the datalayout is parsed from the file, but before we use any information from it. Therefore, IR/bitcode parsing now has a callback to allow tools to compute the datalayout at the appropriate time.

Not sure if I covered all the LLVM tools that want to use the callback. (clang? lli? Misc IR manipulation tools like llvm-link?). But this is at least enough for all the LLVM regression tests, and IR without a datalayout is not something frontends should generate.

With this patch, we impose one other new restriction: all "target triple", "target datalayout", and "source_file" directives have to be at the beginning of the file. This should be true for all IR generated by LLVM, but hand-written IR tests sometimes stick them in weird places.

This change had some sort of weird effects for certain CodeGen regression tests: if the datalayout is overridden with a datalayout with a different program or stack address space, we now parse IR based on the overridden datalayout, instead of the one written in the file (or the default one, if none is specified). This broke a few AVR tests, and one AMDGPU test.

Outside the CodeGen tests I mentioned, the test changes are all just fixing CHECK lines and moving around datalayout lines in weird places.

Not sure if I should try to split this. The code changes are small enough, but the test changes are a bit large to review. I guess I could change some of the tests to explicitly specify alignment, and commit that separately?

Once this lands, it should be easy to rebase and land D77454, to completely eliminate loads with unspecified alignment from IR.

Diff Detail

Event Timeline

efriedma created this revision.Apr 17 2020, 2:04 PM
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
mehdi_amini added inline comments.Apr 17 2020, 5:08 PM

The type for the callback is strange to me, this allows basically any modification of the Module during the callback, can't we just pass an Optional<DataLayout> override_datalayout in the parsing functions?


How is it related to the data layout thing?

efriedma marked 2 inline comments as done.Apr 17 2020, 5:35 PM
efriedma added inline comments.

llc needs access to the target triple from the IR file to decide which datalayout to use. Hence the callback API: we need to decide the datalayout after we parse the triple, but before we parse any function definitions.


Upgrading debug info encounters a similar sort of issue with the datalayout as computing the alignment: if we try to upgrade debug info with the wrong datalayout, we get the wrong result. llc was using the UpgradeDebugInfo boolean to delay upgrading debug info until after it computed the correct datalayout. With this patch, llc doesn't need that special case anymore.

Given that, there isn't really any legitimate use for disabling UpgradeDebugInfo anymore. So I decided it makes sense to split apart the API, so the normal entry points for IR parsing don't expose this weird edge case of intentionally creating invalid bitcode files. I added the entry point parseAssemblyFileWithIndexNoUpgradeDebugInfo, and I split the debug-upgrading part of "-disable-verify" into a separate option.

Maybe I should do something more minimal in this patch?

dexonsmith added inline comments.

It would be great for @aprantl to review at least this part of the patch.

@efriedma, I'm curious if this refactoring would be easy/reasonable to split out and commit first?

mehdi_amini added inline comments.Apr 17 2020, 6:02 PM

In this case can you make it a callback that would be something like DataLayout callback(Triple&)?

aprantl added inline comments.Apr 17 2020, 6:17 PM

There is a stale doxygen comment here now.


Given that, there isn't really any legitimate use for disabling UpgradeDebugInfo anymore.

IIRC I originally added this option for the sole reason of testing the upgrade behavior, as shown by the doxygen comment:

/// \param UpgradeDebugInfo Run UpgradeDebugInfo, which runs the Verifier.
///                         This option should only be set to false by llvm-as
///                         for use inside the LLVM testuite!

The intended behavior is that if you run llvm-as --disable-verify (something that should not be done outside of the testsuite) this flag would be set to false.

See for the original context.

efriedma marked an inline comment as done.Apr 17 2020, 6:39 PM
efriedma added inline comments.

I guess I could split out the change to add this flag into a separate patch. I could split out the actual change to set the alignment of LoadInst into a separate patch. I guess I could also split out the part of the IR parsing change that requires the datalayout at the beginning of the file. I don't think any of those changes help make the patch more comprehensible, but they would make it smaller.

I can't really split the IR parsing API rewrite in any reasonable way.

efriedma updated this revision to Diff 258875.Apr 20 2020, 5:42 PM
efriedma marked an inline comment as done.

Rebased. Changed the callback API per review comment.

mehdi_amini added inline comments.Apr 20 2020, 8:26 PM

Nit: function argument shouldn't take a string by copy, ideally should StringRef.


Can you update the doxygen? (same for the other functions)

Thanks, this will be very useful for our out-of-tree CHERI changes. We have some tests where we would like to pass the datalayout on the command line using a lit substitution instead of hardcoding it in the test files. I can't remember exactly what problems were caused by setting the datalayout late, but I think it was related to errors due to function forward declarations being inserted in the wrong address space.

Just one general comment: I wonder if would make more sense to pass the Module* to the callback and let the tool perform any changes there? This would make it more flexible if we want to allow early overriding other aspects on the command line other than the datalayout.
Should also make the callbacks a bit shorter:
For example the llvm-as one could them be:

auto SetDataLayout = [](llvm::Module*) {
  if (!ClDataLayout.empty())

This doc comment should be removed since the parameter is gone now.

efriedma updated this revision to Diff 259058.Apr 21 2020, 10:56 AM

Address review comments.

Just one general comment: I wonder if would make more sense to pass the Module* to the callback and let the tool perform any changes there?

This was the original revision, I'm against doing this: I think the contract should be left as narrow as possible to express the intent of the API here. This is to express that this is not a hook to "hack whatever you want there" kind of callback.

mehdi_amini added inline comments.Apr 21 2020, 8:44 PM

I think a better default is an empty function_ref instead of a lambda: I suspect you can write it this way DataLayoutCallbackTy DataLayoutCallback = {}


Are you making invalid bitcode that was valid before here?
I'd rather avoid doing this, can we check if (SeenFirstFunctionBody && DataLayoutCallback)?
(which also implies to not provide a callback from any tool when not needed (and make sure the default argument for the callback is everywhere an empty function_ref)


Is there valid bitcode out-there that could be hit by this?

efriedma marked 3 inline comments as done.Apr 21 2020, 9:36 PM
efriedma added inline comments.

I didn't realize function_ref was allowed to be null.

I guess that's simpler in some sense, but I'm not sure it's worthwhile; possibly-null values make the documentation more complicated. Not a big deal either way, though.


Yes, this is rejecting bitcode that LLVM currently accepts. No version of LLVM has ever emitted bitcode that violates this constraint, though; LLVM always emits the datalayout early. In theory, I guess if someone had a custom bitcode emitter that emitted bitcode in some really weird order, it could be an issue.

I'm not sure what we would do if we want to "handle" this; if we see a datalayout after we've already read IR instructions, we would have to go back and fix/reparse already parsed IR. I guess we could, but I'm not really enthusiastic about writing bitcode testcases by hand to test that codepath.


No, it's illegal to load a value of unsized type. The verifier has an equivalent check; we're just moving that check earlier, so we can resolve the alignment before calling the LoadInst constructor.

mehdi_amini accepted this revision.Apr 21 2020, 9:43 PM

LGTM, but it'd be nice to have another approval as well.


if we see a datalayout after we've already read IR instructions, we would have to go back and fix/reparse already parsed IR

This would already be an issue with the current parser though?
If you don't mind committing these two lines right now since it seems like a defensive measure in the reader that is valid to do independently of this patch (assuming I understand correctly here?)

This revision is now accepted and ready to land.Apr 21 2020, 9:43 PM
efriedma marked an inline comment as done.Apr 22 2020, 1:24 PM
efriedma added inline comments.

The current parser doesn't really query the datalayout. (Well, except in the cases where it does, for the program/stack address spaces. But that doesn't affect mainstream targets.)

Yes, sure, I can commit this separately.

efriedma updated this revision to Diff 259382.Apr 22 2020, 1:57 PM

Rebase. Add missing handling for atomic loads (which doesn't really do anything on its own, but it's necessary for D77454).

nikic added a subscriber: nikic.May 1 2020, 2:49 AM
efriedma updated this revision to Diff 263824.May 13 2020, 12:27 PM

Rebased. Split off D79900