This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Don't generate invalid modules if "LTOPostLink" MD already exists
AbandonedPublic

Authored by Pierre-vh on Dec 12 2022, 1:12 AM.

Details

Summary

Prevents the LTO library from generating an invalid module when the
LTOPostlink MD is already present (which was added in D63932).

Diff Detail

Event Timeline

Pierre-vh created this revision.Dec 12 2022, 1:12 AM
Pierre-vh requested review of this revision.Dec 12 2022, 1:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 1:12 AM

Can you indicate what is your motivation for such change? From the original change, the flag is to indicate whether the post link optimization is done or not, and overwriting it is just hiding the actual problem of post link is run twice (so perhaps one of the run is done without the full information, thus the result is not correct).

Can you indicate what is your motivation for such change? From the original change, the flag is to indicate whether the post link optimization is done or not, and overwriting it is just hiding the actual problem of post link is run twice (so perhaps one of the run is done without the full information, thus the result is not correct).

I'm not sure if running the post-link optimization twice is an issue in itself. I assumed it wasn't (I don't see why running the passes multiple times would be an issue?) and that's why I proposed this change, but if we determine that running post-link optimizations twice is an issue/something undesirable, then it's fine for me to put a diagnostic or something else (e.g. skip the pipeline if the flag is present) instead.
For instance, a case where this came up involved re-running IR obtained from -save-temps through LLD, which crashed due to the verification error (duplicate flag).

The motivation is just to prevent an avoidable verifier error. Whether we do it by adding the flag more carefully, or by checking for its presence earlier and exiting more elegantly, I don't mind

(cc @arsenm)

Pierre-vh updated this revision to Diff 486184.Jan 4 2023, 12:34 AM

Ping + rebase

Ping - can someone please take a look?

Gentle ping :)
Can someone please take a quick look?

My understanding of the flag is preventing you from:

  • run the devirtual pass
  • add some new function
  • run the devirtual pass again

The LTO devirtualization passes require visibility for all the functions when the passes are run.

My understanding of the flag is preventing you from:

  • run the devirtual pass
  • add some new function
  • run the devirtual pass again

The LTO devirtualization passes require visibility for all the functions when the passes are run.

Can the pass cause a miscompilation or even crash if it's run twice?
If yes, then this patch is indeed not the right solution and we should probably emit an error instead.
If no, then I don't think this patch is a bad idea to prevent the crash, but maybe we should print a warning/remark if the flag is already set?

I am not the expert to answer that question. I will think this is an assertion-like behavior. It tells you as a compiler developer that your pipeline configuration is wrong in a sense that you should add all the functions in before running post link passes. You should delay the post link pass when you hit this error, rather than disable the check without a compelling reason.

For example, the legacy LTO pipeline used to hit the same error, when you run ld -r (it is the linking option in ld64 that produces another object file for later usage), then link with other bitcode object in the final link. The fix is to not run post link pass during ld -r so it can be run in the end once.

I am not the expert to answer that question. I will think this is an assertion-like behavior. It tells you as a compiler developer that your pipeline configuration is wrong in a sense that you should add all the functions in before running post link passes. You should delay the post link pass when you hit this error, rather than disable the check without a compelling reason.

For example, the legacy LTO pipeline used to hit the same error, when you run ld -r (it is the linking option in ld64 that produces another object file for later usage), then link with other bitcode object in the final link. The fix is to not run post link pass during ld -r so it can be run in the end once.

I think this violates a core tenet of being a modular, reusable IR. It shouldn't be wrong to run a pass twice. It certainly shouldn't error by way of verifier error

I think this violates a core tenet of being a modular, reusable IR. It shouldn't be wrong to run a pass twice. It certainly shouldn't error by way of verifier error

There are plenty of verifiers that will error on wrong usage of IR. And there is a reason why the ModuleFlag has a Behavior field and it will just error out when it sees incompatible values. This is conceptually not different.

There are two parts of the changes in this commit. First is to add an API to allow overwriting module flag, which is generally fine. The second part is that LTOPostLink should be overwritten with new value. I think that is against the reason why it is set the way it is now. I am against it because:

  • You didn't provide reason why it should not be a fix on the toolchain setup, like how legacy LTO API did. Any insights into the actual problem you run into will be helpful to evaluate the situation.
  • I think overwrite it basically kills it function. In that case, it should just be removed.

This flag just seems to be used to tell GlobalDCE it's running Post-Link. It's not used to guard against multiple runs of the pass.
I also agree with Matt that running a pass or even a pipeline twice shouldn't result in a verifier error or an assertion failure. If the workfllow is indeed bad, then diagnostics are a much better way to communicate to the user. (Or just skipping the pipeline + emit a warning on the second run if it's harmless)

In this particular case I see no compelling argument why this pipeline can't run multiple times on the same output, so it should work. Whether it should work _and_ run the passes a second time is another question (instead of this, we can add another flag to skip the pipeline on the second+ run if there's a good reason to)

arsenm added inline comments.Jan 30 2023, 3:31 AM
llvm/lib/LTO/LTOCodeGenerator.cpp
621–622

Do you really need a new method? Can you just use ModFlagBehavior::Override?

Pierre-vh marked an inline comment as done.Jan 30 2023, 6:55 AM
Pierre-vh added inline comments.
llvm/lib/LTO/LTOCodeGenerator.cpp
621–622

I tried using 'addModuleFlag` with override and got module flag identifiers must be unique (or of 'require' type)

Note that the new function is just for convenience. I could also just call setModuleFlag and create a ConstantInt/ConstantAsMetadata here but it'd be uglier

Pierre-vh marked an inline comment as done.Feb 8 2023, 12:42 AM

Gentle ping :)

I think this violates a core tenet of being a modular, reusable IR. It shouldn't be wrong to run a pass twice. It certainly shouldn't error by way of verifier error

There are plenty of verifiers that will error on wrong usage of IR. And there is a reason why the ModuleFlag has a Behavior field and it will just error out when it sees incompatible values. This is conceptually not different.

This is conceptually very different. Verifier errors are for malformed IR by construction. It's for catching API errors and bugs in compiler passes, not tool usage. By running any combination of tools or passes, it should not be possible to hit a verifier error.

I don't really see the reason to have the verified check this at all, can we just drop it? I can see the use of emitting an informative module flag but this doesn't need to be semantically enforced. It is still possible to introduce new code after this point in ways that aren't wrong.

steven_wu resigned from this revision.Feb 28 2023, 9:33 AM

From the semantic meaning of this module flag as described in D63932, and from its usage, it seems it should not block IR from being sent through the LTO pipeline multiple times.

Adding a new interface works, but if you do that please add a test of it to llvm/unittests/IR/ModuleTest.cpp (see where the other setModuleFlag interface is tested there). The other alternative would be to simply guard the addModuleFlag with a check of getModuleFlag("LTOPostLink") returning null (and if non-null assert that the value is '1').

After looking at the use of this module flag again, I think it would actually be better to remove it completely and replace it by passing a flag down to the GlobalDCEPass constructor when it is called from PassBuilder::buildLTODefaultPipeline (looks like there are a few calls there). That is the more typical way of communicating this information.

Pierre-vh updated this revision to Diff 533907.Jun 23 2023, 2:36 AM

Removing new APIs, use existing ones

I'm not sure about removing the flag yet, it's present in a lot of tests and the intention with this patch is to be almost a NFC and just fix a edge case compiler crash

arsenm added inline comments.Jun 23 2023, 6:15 AM
llvm/test/LTO/ARM/lto-linking-metadata-already-present.ll
1

Might as well use llvm-as for this part

Removing new APIs, use existing ones

I'm not sure about removing the flag yet, it's present in a lot of tests and the intention with this patch is to be almost a NFC and just fix a edge case compiler crash

All but 2 of the test uses are just incidental. Only 2 tests fail when this is removed - one is obsolete if the flag is removed (it was specifically just to test that the flag gets generated), and the other just needs to switch to using the LTO default pipeline. I have a small fix to GlobalDCEPass to get this passed down from the pass manager, I'll send it for review once I rerun all the tests.

Removing new APIs, use existing ones

I'm not sure about removing the flag yet, it's present in a lot of tests and the intention with this patch is to be almost a NFC and just fix a edge case compiler crash

All but 2 of the test uses are just incidental. Only 2 tests fail when this is removed - one is obsolete if the flag is removed (it was specifically just to test that the flag gets generated), and the other just needs to switch to using the LTO default pipeline. I have a small fix to GlobalDCEPass to get this passed down from the pass manager, I'll send it for review once I rerun all the tests.

D153655

Can the summary be changed to summarize the useful prior discussions?

Can the summary be changed to summarize the useful prior discussions?

Are you talking about this patch (which is about to be obsolete after D153655), or D153655? If the latter, I did summarize the issue there in its summary, but pointed back to this one for full context.

Pierre-vh abandoned this revision.Jun 26 2023, 12:10 AM