This is an archive of the discontinued LLVM Phabricator instance.

LLParser: add an argument for overriding data layout and do not check alloca addr space
ClosedPublic

Authored by yaxunl on Jan 8 2018, 12:25 PM.

Details

Summary

Sometimes users do not specify data layout in LLVM assembly and let llc set the data layout by target triple after loading the LLVM assembly.

Currently the parser checks alloca address space no matter whether the LLVM assembly contains data layout definition, which causes false alarm since the default data layout does not contain the correct alloca address space.

The parser also calls verifier to check debug info and updating invalid debug info. Currently there is no way to let the verifier to check debug info only. If the verifier finds non-debug-info issues the parser will fail.

For llc, the fix is to remove the check of alloca addr space in the parser and disable updating debug info, and defer the updating of debug info and verification to be after setting data layout of the IR by target.

For other llvm tools, since they do not override data layout by target but instead can override data layout by a command line option, an argument for overriding data layout is added to the parser. In cases where data layout overriding is necessary for the parser, the data layout can be provided by command line.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl created this revision.Jan 8 2018, 12:25 PM
arsenm added inline comments.Jan 8 2018, 12:32 PM
lib/AsmParser/LLParser.cpp
6181–6185 ↗(On Diff #128967)

The idea was to remove this check from the parser entirely

lib/IR/Verifier.cpp
3210 ↗(On Diff #128967)

s/stack/DataLayout alloca/

yaxunl added inline comments.Jan 8 2018, 12:36 PM
lib/AsmParser/LLParser.cpp
6181–6185 ↗(On Diff #128967)

Removing this check will cause some lit tests to fail since the verifier output different error messages.

The error message from the parser is better since it contains line/column number. Should we keep the check if the assembly contains data layout definition?

lib/IR/Verifier.cpp
3210 ↗(On Diff #128967)

will fix.

arsenm added inline comments.Jan 8 2018, 12:39 PM
lib/AsmParser/LLParser.cpp
6181–6185 ↗(On Diff #128967)

I think that's less important. The verifier error will dump the instruction so it's not difficult to find anyway. I think changing the error depending on whether the datalayout is in the module already would be inconsistent

yaxunl added inline comments.Jan 8 2018, 12:41 PM
lib/AsmParser/LLParser.cpp
6181–6185 ↗(On Diff #128967)

OK. Will remove this and update lit tests.

yaxunl updated this revision to Diff 128971.Jan 8 2018, 12:59 PM
yaxunl edited the summary of this revision. (Show Details)

Removed check of alloca addr space from LLParser.

arsenm added inline comments.Jan 8 2018, 1:30 PM
include/llvm/IR/Verifier.h
101 ↗(On Diff #128971)

I'm not sure I understand why this option is needed or to UpgradeDebugInfo. I would expect the datalayout to be set already by the time any verification is done

yaxunl added inline comments.Jan 8 2018, 1:35 PM
include/llvm/IR/Verifier.h
101 ↗(On Diff #128971)

LLParser calls UpgradeDebugInfo, which calls verifyModule. At this point the datalayout has not been set yet. llc sets datalayout by target after parsing the IR.

arsenm added inline comments.Jan 9 2018, 7:53 AM
include/llvm/IR/Verifier.h
101 ↗(On Diff #128971)

Why does UpgradeDebugInfo need a special second verify run? Is there an issue with just removing that and relying on the normal verify pass?

yaxunl added inline comments.Jan 9 2018, 8:13 AM
include/llvm/IR/Verifier.h
101 ↗(On Diff #128971)

UpgradeDebugInfo checks if the module only contains debug info error or any other error. If the module only contains debug info error, then it reports the error. If the debug info version is outdated, it deletes the debug info. If the module contains other error, it will report fatal error.

LLParser calls UpgradeDebugInfo mainly to report debug info error and delete outdated debug info. However since UpgradeDeugInfo may be called by other functions which expect it to abort on invalid module.

yaxunl updated this revision to Diff 129499.Jan 11 2018, 12:37 PM
yaxunl edited the summary of this revision. (Show Details)

Removed module verification from UpgradeDebugInfo. UpgradeDebugInfo only checks debug info.

yaxunl added inline comments.Jan 12 2018, 6:20 AM
include/llvm/IR/Verifier.h
101 ↗(On Diff #128971)

I have updated the patch so that UpgradeDebugInfo only checks debug info. It still needs to check debug info because it needs to delete invalid debug info. Now it does not check non-debug-info issues. Please take a look. Thanks.

yaxunl retitled this revision from LLParser: Do not check alloca addr space if the assembly does not contain data layout definition to LLParser: do not verify LLVM module.Jan 16 2018, 2:24 PM
yaxunl added reviewers: manmanren, aprantl.
yaxunl edited the summary of this revision. (Show Details)Jan 16 2018, 2:33 PM
arsenm added inline comments.Jan 22 2018, 10:14 AM
lib/IR/Verifier.cpp
214–216 ↗(On Diff #129499)

This seems to just be stopping all error messages?

test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll
1 ↗(On Diff #129499)

This is a test regression

yaxunl added inline comments.Jan 22 2018, 10:40 AM
lib/IR/Verifier.cpp
214–216 ↗(On Diff #129499)

No. This will only stop checking of non-debug info and will not stop checking debug info. Since checking of debug info does not use this function but uses DebugInfoCheckFailed.

test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll
1 ↗(On Diff #129499)

This test should pass since it only contains invalid debug info. After the parser removes the debug info, the IR is valid. There is a bug in verifier mistakenly treated debug info error as non-debug info error. (See change in Verifier.cpp line 4492 for the fix). After that bug is fixed, this test should pass.

arsenm added inline comments.Jan 22 2018, 11:34 AM
test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll
1 ↗(On Diff #129499)

Can you fix this in a separate patch then?

yaxunl added inline comments.Jan 22 2018, 11:37 AM
test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll
1 ↗(On Diff #129499)

will do.

yaxunl added inline comments.Jan 22 2018, 12:02 PM
test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll
1 ↗(On Diff #129499)

Separated the fix to https://reviews.llvm.org/D42391

yaxunl updated this revision to Diff 130942.Jan 22 2018, 12:05 PM

Separated a bug fix about verifier incorrectly treating debug info error as non-debug-info error to another review.

yaxunl updated this revision to Diff 131346.Jan 24 2018, 1:59 PM

Add a command line option to llvm-as to allow set datalayout. Opt already has such option but has a bug. Fixed the bug so that it works properly.
Add lit tests for llvm-as/opt.

yaxunl updated this revision to Diff 131362.Jan 24 2018, 2:54 PM

Add back a change dropped by accident.

aprantl added inline comments.Jan 24 2018, 3:25 PM
lib/IR/Verifier.cpp
3239 ↗(On Diff #131362)

Please move this change into a separate review, it looks like an orthogonal change and just makes this patch harder to understand.

214–216 ↗(On Diff #129499)

Can you please elaborate on this? I'm afraid I didn't understand your explanation.

yaxunl added inline comments.Jan 25 2018, 7:48 AM
lib/IR/Verifier.cpp
3239 ↗(On Diff #131362)

will do.

214–216 ↗(On Diff #129499)

The verifier checks two types of issues:

  1. debug info issues
  1. non-debug info issues

with this change, when TreatBrokenDebugInfoAsError is set, the verifier will ignore non-debug-info issues and only check debug info issues.

This is necessary for LLParser. Because LLParser only wants to check debug info issues so that it can drop invalid debug info.

Checking of non-debug-info issues is deferred after parsing.

yaxunl updated this revision to Diff 131464.Jan 25 2018, 8:39 AM

Separate the diagnostic error message change to https://reviews.llvm.org/D42543

aprantl added inline comments.Jan 25 2018, 9:28 AM
lib/IR/Verifier.cpp
214–216 ↗(On Diff #129499)

In what situation would LLParser want to not verify the non-metadata part of the IR?

I think I need some more context here. Is it accurate description that you want to be able to have LLParser accept malformed/underspecified address spaces on allocas? Why do you want to do this? Is this for writing testcases only? Both llc and opt already take an option to disable the verifier for the purpose of writing testcases. Why does this not work for you?

Note that in production environments (think: non-asserts builds of LLVM) we *always* want/must run the verifier on any external input. This is very important for correctness and even for security (imagine LLVM as a JIT compiler for example).

I think I need some more context here. Is it accurate description that you want to be able to have LLParser accept malformed/underspecified address spaces on allocas? Why do you want to do this? Is this for writing testcases only? Both llc and opt already take an option to disable the verifier for the purpose of writing testcases. Why does this not work for you?

The problem is tools like llc/opt can set the triple/datalayout from the command line. The parser is only capable of validating based on whatever IR is in the parsed module, so there is an inconsistency in what is considered valid IR until the datalayout is set. The verifier needs to be run as a separate step from the parsing, so clients would need to be responsible for running the verifier after it's set the datalayout. This is an issue for any datalayout dependent properties checked by the verifier, which as far as I can tell today is the alloca address space, and non-integral pointer address spaces.

It sounds like what we really want is a --disable-datalayout-verification option that specifically just disables that one check?

It sounds like what we really want is a --disable-datalayout-verification option that specifically just disables that one check?

I don't think this should be an option. We want the tools to work correctly with the verifier without manual intervention. Alternatively the parser should accept the datalayout and ignore whatever is parsed in the module?

You are saying that if a datalayout is specified on the command line, it should override what is in the file? That sounds very reasonable to me.

You are saying that if a datalayout is specified on the command line, it should override what is in the file? That sounds very reasonable to me.

Yes. I think an earlier version of the patch did something like that, but it seemed cleaner to me to separate the verifier from the initial IR parsing

Wouldn't it be better to move the M->setDataLayout(ClDataLayout); to before we run the Verifier? It seems like we should verify the module in the state that it will be eventually consumed.

Can we reach consensus on how to fix this issue?

Letting LLParser accept an optional datalayout parameter and use it to override the datalayout in LLVM IR is simpler and more straightforward. There will be no change in Verifier.

Wouldn't it be better to move the M->setDataLayout(ClDataLayout); to before we run the Verifier? It seems like we should verify the module in the state that it will be eventually consumed.

Doing that solely is not enough for fixing the issue, since LLParser already calls Verifier before that by itself. Unless we provide LLParser a correct datalayout to override the datalayout from the IR, the verification inside the LLParser itself will still fail.

LLParser already calls Verifier before that by itself

I wonder if that is actually necessary. Are there any code paths that invoke LLParser and don't run the Verifier afterwards? If not, then it would make sense to change that.

LLParser already calls Verifier before that by itself

I wonder if that is actually necessary. Are there any code paths that invoke LLParser and don't run the Verifier afterwards? If not, then it would make sense to change that.

There are some llvm tools invoke LLParser but do not run the Verifier afterwards, e.g. lli, llvm-cat, llvm-diff, etc.

Thanks for checking!

Based on this, we could either

  • make it the responsibility of the Caller of LLParser to run the Verifier and update those tools to run the Verifier.
  • or pass the override datalayout into LLParser so it can inject it before running the Verifier.

I personally don't like the idea of relaxing the Verifier / making it verify something that is not what is being fed into the later stages. (Feel free to try and convince me otherwise though :-)

Thanks for checking!

Based on this, we could either

  • make it the responsibility of the Caller of LLParser to run the Verifier and update those tools to run the Verifier.
  • or pass the override datalayout into LLParser so it can inject it before running the Verifier.

I personally don't like the idea of relaxing the Verifier / making it verify something that is not what is being fed into the later stages. (Feel free to try and convince me otherwise though :-)

If we don't want to modify Verifier, then the only choice is to add an option to LLParser to allow datalayout to be overridden. Then the verifier will see a valid IR.

By adding the option to LLParser, all the tools automatically get an option to override the datalayout, then minimal change is needed.

I tried to add a parameter to LLParser for overriding datalayout. However I encountered another issue for llc.

If target is not specified on llc command line, llc needs to parse the LLVM assembly to get the target. To parse the LLVM assembly, llc needs to pass the datalayout to LLParser first. However, it does not know the datalayout since it does not know the target.

It seems we have to go back the current approach, that is, let LLParser only checks debug info and ignores non-debug info. For those llvm tools which does not explicitly verify module after parsing, I can add explicit verification after parsing.

If target is not specified on llc command line, llc needs to parse the LLVM assembly to get the target. To parse the LLVM assembly, llc needs to pass the datalayout to LLParser first. However, it does not know the datalayout since it does not know the target.
It seems we have to go back the current approach, that is, let LLParser only checks debug info and ignores non-debug info. For those llvm tools which does not explicitly verify module after parsing, I can add explicit verification after parsing.

What about the other option: Don't run the verifier in LLParser and instead ensure that every client of LLParser runs the verifier?
We could even have the LLParser return a wrapper akin to

class UnverifiedModule {
  Module *M;
public:
  Module *verify();
}

to force clients to verify.

If target is not specified on llc command line, llc needs to parse the LLVM assembly to get the target. To parse the LLVM assembly, llc needs to pass the datalayout to LLParser first. However, it does not know the datalayout since it does not know the target.
It seems we have to go back the current approach, that is, let LLParser only checks debug info and ignores non-debug info. For those llvm tools which does not explicitly verify module after parsing, I can add explicit verification after parsing.

What about the other option: Don't run the verifier in LLParser and instead ensure that every client of LLParser runs the verifier?
We could even have the LLParser return a wrapper akin to

class UnverifiedModule {
  Module *M;
public:
  Module *verify();
}

to force clients to verify.

The parser needs to drop the invalid debug info. To do that it needs to call the verifier to check whether the debug info is valid. Do we also want to move dropping invalid debug info out of parser?

You are right. My mental model of how this works was a bit out of date. We separated verification and debug info stripping recently. Let's scrap that idea.

Let's go back to your previous statement:

If target is not specified on llc command line, llc needs to parse the LLVM assembly to get the target. To parse the LLVM assembly, llc needs to pass the datalayout to LLParser first. However, it does not know the datalayout since it does not know the target.

Can you elaborate this? The problem is when only a datalayout but no target is specified on the commandline? Could you pass the datalayout as a string and have LLParser parse it?

You are right. My mental model of how this works was a bit out of date. We separated verification and debug info stripping recently. Let's scrap that idea.

Let's go back to your previous statement:

If target is not specified on llc command line, llc needs to parse the LLVM assembly to get the target. To parse the LLVM assembly, llc needs to pass the datalayout to LLParser first. However, it does not know the datalayout since it does not know the target.

Can you elaborate this? The problem is when only a datalayout but no target is specified on the commandline? Could you pass the datalayout as a string and have LLParser parse it?

llc is different from opt or llvm-as beause llc always override the datalayout of the input module with the datalayout of the target. Because llc always creates TargetMachine, it knows the datalayout string by target. Therefore llc is not supposed to get datalayout string by command line option. Instead, it is supposed to get datalayout string by target.

If the user does not specify target on llc command line, llc is supposed to get the target from LLVM assembly, then use it to get the datalayout string through TargetMachine. However, that means it has to parse the LLVM assembly first, but it cannot. Because to parse the LLVM assembly it needs to know the datalayout string first.

Of course, if we specify datalayout string by a command line option we can bypass this issue. If this does not sounds like a redundant option.

If the user does not specify target on llc command line, llc is supposed to get the target from LLVM assembly, then use it to get the datalayout string through TargetMachine. However, that means it has to parse the LLVM assembly first, but it cannot. Because to parse the LLVM assembly it needs to know the datalayout string first.

I'm probably missing something obvious here. The case where llc gets invoked without a target on the command line must be working today, doesn't it? Why can't we fall back to that code path?

If the user does not specify target on llc command line, llc is supposed to get the target from LLVM assembly, then use it to get the datalayout string through TargetMachine. However, that means it has to parse the LLVM assembly first, but it cannot. Because to parse the LLVM assembly it needs to know the datalayout string first.

I'm probably missing something obvious here. The case where llc gets invoked without a target on the command line must be working today, doesn't it? Why can't we fall back to that code path?

This patch is trying to let llc load an llvm module which does not contain datalayout. In most cases, the parser is able to do that even though there is no datalayout in the module.

However, in some rare cases, the parser needs to know the datalayout to be able to parse the module. Therefore we discussed and concluded that we need to inject datalayout when parsing the module.

Let's consider the situation where llc command line does not contain target.

The old code path just parses the LLVM assembly and gets the target from the assembly. It can do that because it does not need to inject the datalayout.

However, now we need to inject datalayout when parsing, then we cannot since we don't know datalayout unless we parse the module to know the target first.

If the user does not specify target on llc command line, llc is supposed to get the target from LLVM assembly, then use it to get the datalayout string through TargetMachine. However, that means it has to parse the LLVM assembly first, but it cannot. Because to parse the LLVM assembly it needs to know the datalayout string first.

I'm probably missing something obvious here. The case where llc gets invoked without a target on the command line must be working today, doesn't it? Why can't we fall back to that code path?

This patch is trying to let llc load an llvm module which does not contain datalayout. In most cases, the parser is able to do that even though there is no datalayout in the module.

However, in some rare cases, the parser needs to know the datalayout to be able to parse the module. Therefore we discussed and concluded that we need to inject datalayout when parsing the module.

Let's consider the situation where llc command line does not contain target.

The old code path just parses the LLVM assembly and gets the target from the assembly. It can do that because it does not need to inject the datalayout.

However, now we need to inject datalayout when parsing, then we cannot since we don't know datalayout unless we parse the module to know the target first.

My suggestion is

  1. Introduce a function parseIRFileWithoutValidation, which does not drop invalid debug info and does not verify module.
  2. Add an optional DataLayoutString argument to parseIRFile

parseIRFile will call parseIRFileWithoutValidation, then inject datalayout, then drop invalid debug info and verify the module.

only llc calls parseIRFileWithoutValidation, it will inject datalayout, drop invalid debuginfo and verify module by itself.

All other tools call parseIRFile. They either inject datalayout by command line option, or does not inject datalayout.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 26 2018, 1:46 PM
This revision was automatically updated to reflect the committed changes.
aprantl reopened this revision.Jan 26 2018, 1:52 PM

I'm very sorry, I accidentally pasted a wrong review URL into a commit message!

If the user does not specify target on llc command line, llc is supposed to get the target from LLVM assembly, then use it to get the datalayout string through TargetMachine. However, that means it has to parse the LLVM assembly first, but it cannot. Because to parse the LLVM assembly it needs to know the datalayout string first.

I'm probably missing something obvious here. The case where llc gets invoked without a target on the command line must be working today, doesn't it? Why can't we fall back to that code path?

This patch is trying to let llc load an llvm module which does not contain datalayout. In most cases, the parser is able to do that even though there is no datalayout in the module.

However, in some rare cases, the parser needs to know the datalayout to be able to parse the module. Therefore we discussed and concluded that we need to inject datalayout when parsing the module.

Let's consider the situation where llc command line does not contain target.

The old code path just parses the LLVM assembly and gets the target from the assembly. It can do that because it does not need to inject the datalayout.

However, now we need to inject datalayout when parsing, then we cannot since we don't know datalayout unless we parse the module to know the target first.

My suggestion is

  1. Introduce a function parseIRFileWithoutValidation, which does not drop invalid debug info and does not verify module.
  1. Add an optional DataLayoutString argument to parseIRFile

parseIRFile will call parseIRFileWithoutValidation, then inject datalayout, then drop invalid debug info and verify the module.

only llc calls parseIRFileWithoutValidation, it will inject datalayout, drop invalid debuginfo and verify module by itself.

All other tools call parseIRFile. They either inject datalayout by command line option, or does not inject datalayout.

That sounds good to me, thanks!

aprantl accepted this revision.Jan 26 2018, 2:01 PM

Landed with Jim's changes as r323565.

This revision is now accepted and ready to land.Jan 26 2018, 2:01 PM
aprantl closed this revision.Jan 26 2018, 2:01 PM
yaxunl updated this revision to Diff 131652.Jan 26 2018, 2:01 PM

Re-upload the patch.

aprantl reopened this revision.Jan 26 2018, 2:08 PM
This revision is now accepted and ready to land.Jan 26 2018, 2:08 PM
yaxunl updated this revision to Diff 131991.Jan 30 2018, 10:14 AM
yaxunl retitled this revision from LLParser: do not verify LLVM module to LLParser: add an argument for overriding data layout and do not check alloca addr space.
yaxunl edited the summary of this revision. (Show Details)

Revised as discussed with Adrian.

It is even simpler than discussed since ParseIRFile etc. already have an argument to disable updating debug info. Therefore no new functions need to be added. Only need to add an argument for overriding data layout.

aprantl accepted this revision.Jan 30 2018, 10:33 AM

Thanks, LGTM!

lib/CodeGen/MIRParser/MIRParser.cpp
240 ↗(On Diff #131991)

Perhaps comment what false means?

yaxunl added inline comments.Jan 30 2018, 10:42 AM
lib/CodeGen/MIRParser/MIRParser.cpp
240 ↗(On Diff #131991)

will add /*UpgradeDebugInfo=*/. This is used only by llc, so it is OK to do so since we upgrade debug info explicitly later.

This revision was automatically updated to reflect the committed changes.