This is an archive of the discontinued LLVM Phabricator instance.

[Driver] Re-run lld with --reproduce when it crashes
ClosedPublic

Authored by abrachet on Feb 18 2022, 8:58 PM.

Details

Summary

This was discussed on https://discourse.llvm.org/t/rfc-generating-lld-reproducers-on-crashes/58071/12

When lld crashes, or errors when -gen-reproducer=error and -fcrash-diagnostics=all clang will re-run lld with --reproduce=$temp_file for easily reproducing the crash/error.

Diff Detail

Event Timeline

abrachet requested review of this revision.Feb 18 2022, 8:58 PM
abrachet created this revision.

Thanks. I think this is more appropriate than D102892.
I am thinking that whether this should be opt-in, possibly with an environment variable or option.

clang/include/clang/Driver/Driver.h
511

Remove generateLinkerDiagnostics -.
https://llvm.org/docs/CodingStandards.html#commenting "Don’t duplicate function or class name at the beginning of the comment"

There is currently no testing because, although there is testing for generateCompilationDiagnostics, it is harder to force lld to crash, even if we added a way to crash lld on purpose, the test would need to depend on lld. Looking for suggestions if anyone has any.

I think you add test coverage in LLD, through a environment variable that triggers a crash in any of the drivers & --reproduce to ensure the archive was created.

clang/lib/Driver/Driver.cpp
1550

Instead of adding generateLinkerDiagnostics above, would it be possible to leverage the same codepath as when Clang crashes, ie. initCompilationForDiagnostics() and then add --reproduce when building linker options, based on C.isForDiagnostics()? (and use C.addTempFile())

abrachet updated this revision to Diff 410644.Feb 22 2022, 2:15 PM

Add -fcrash_diagnostics which can accept a level to determine if lld should be re-run with --reproduce

abrachet marked an inline comment as done.Feb 22 2022, 2:23 PM

There is currently no testing because, although there is testing for generateCompilationDiagnostics, it is harder to force lld to crash, even if we added a way to crash lld on purpose, the test would need to depend on lld. Looking for suggestions if anyone has any.

I think you add test coverage in LLD, through a environment variable that triggers a crash in any of the drivers & --reproduce to ensure the archive was created.

lld has tests for --reproduce already. Even if I added an environment variable to crash lld, the test would depend on lld being built, adding a big dependency to clang's tests. Do we think that's worth it?

Thanks. I think this is more appropriate than D102892.
I am thinking that whether this should be opt-in, possibly with an environment variable or option.

Sounds good, how does the current flag look?

clang/lib/Driver/Driver.cpp
1550

I was stuck on this for a while, you're right that it would be better to do it this way. Ultimately I ended up deciding it wasn't worth the increase in complexity to make that possible.

Right now generateCompilationDiagnostics chooses it's jobs based on input files so with clang source.c it doesn't get run. Also currently non preprocess-able sources get filtered out and a job isn't created for them. Changing those ended up being a big hassle and I was worried would add edge cases that we're going to be difficult to think of to test.

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 9:13 AM

There is currently no testing because, although there is testing for generateCompilationDiagnostics, it is harder to force lld to crash, even if we added a way to crash lld on purpose, the test would need to depend on lld. Looking for suggestions if anyone has any.

I think you add test coverage in LLD, through a environment variable that triggers a crash in any of the drivers & --reproduce to ensure the archive was created.

lld has tests for --reproduce already. Even if I added an environment variable to crash lld, the test would depend on lld being built, adding a big dependency to clang's tests. Do we think that's worth it?

It would be a interesting addition to bind LLVM_LINKER_IS_LLD to a LIT test "feature" -- in a follow-up patch if you're willing to? (see config.available_features.add('z3') and LLVM_WITH_Z3 in clang/test/lit.cfg.py for an example)

clang/docs/UsersManual.rst
670

-fcrash-diagnostics (dash not underscore)

675

Shouldn't we be more explicit and set the option to be "compiler"?

clang/include/clang/Driver/Compilation.h
228

Should probably remove (see other comments)

clang/include/clang/Driver/Driver.h
511

s/lld/LLD and please add missing period.

clang/lib/Driver/Driver.cpp
1507

Uppercase Level.

1521

Do you actually need the redirects here? I figure that if LLD re-runs, we don't need to emit the stdout/stderr once again. Actually calling Compilation::initCompilationForDiagnostics() would disable that.

1554

I wonder if you can move this call to C.initCompilationForDiagnostics() just a bit above if (IsLLD) to be able to call Driver::GetNamedOutputPath in generateLinkerDiagnostics? This will prevent the temp files cleanup.

abrachet updated this revision to Diff 415265.Mar 14 2022, 4:20 PM

Add tests when building with lld

abrachet marked 7 inline comments as done.Mar 14 2022, 4:28 PM

There is currently no testing because, although there is testing for generateCompilationDiagnostics, it is harder to force lld to crash, even if we added a way to crash lld on purpose, the test would need to depend on lld. Looking for suggestions if anyone has any.

I think you add test coverage in LLD, through a environment variable that triggers a crash in any of the drivers & --reproduce to ensure the archive was created.

lld has tests for --reproduce already. Even if I added an environment variable to crash lld, the test would depend on lld being built, adding a big dependency to clang's tests. Do we think that's worth it?

It would be a interesting addition to bind LLVM_LINKER_IS_LLD to a LIT test "feature" -- in a follow-up patch if you're willing to? (see config.available_features.add('z3') and LLVM_WITH_Z3 in clang/test/lit.cfg.py for an example)

Ah thanks for the tip. I've done it when "lld" is in LLVM_ENABLE_PROJECTS because I need to have a way to crash it, which I have added to lld.

clang/docs/UsersManual.rst
675

Sure

clang/lib/Driver/Driver.cpp
1521

You're right I've just sent output to /dev/null. I've moved the code bellow the initCompilationForDiagnostics call. However the redirects would normally happen in BuildJobs/ExecuteJobs but I'm not using that here. So I just specified the same redirects as in initCompilationForDiagnostics

1554

Done. I pulled out the specific code I'm interested in from GetNamedOutputPath into Driver::CreateTempFile because it manually chooses the suffix with types::getTypeTempSuffix(JA.getType(), IsCLMode()); which will never be "tar"

abrachet updated this revision to Diff 415285.Mar 14 2022, 5:36 PM
abrachet marked 3 inline comments as done.

Forgot to add test file to the diff

aganea added a comment.EditedMar 15 2022, 7:55 AM

Thanks for making the changes!

You could probably split this in 3 incremental commits, to save on debugging-after-commit & reverts if things go wrong, but it's up to you!

  • One commit for the LLD addition and a very simple test in lld/test/ that triggers config->forceCrash.
  • One for "Clang using LLD" LIT option, and a simple test such as clang++ main.cpp -### and check if LLD is there. Then ensuring the bots that have LLD actually are passing the test.
  • One for extending fcrash-diagnostics= which is the main purpose of this patch.
clang/lib/Driver/Driver.cpp
1635

TmpName?

lld/ELF/Driver.cpp
1208 ↗(On Diff #415265)

You could do config->forceCrash = getenv("LLD_CRASH");. I would add a very simple test in LLD as well.

lld/ELF/Options.td
718 ↗(On Diff #415265)

I think it'd be better if wasn't an explicit option, but a environment variable instead. @MaskRay an opinion on this?

abrachet updated this revision to Diff 415520.Mar 15 2022, 11:28 AM

Split up into 3 separate patches. See D121724 and D121725

abrachet updated this revision to Diff 435931.EditedJun 10 2022, 8:46 AM
abrachet edited the summary of this revision. (Show Details)

This no longer depends of previous dependent revisions because I found an easier way to check if clang is being built with lld, and since -gen-reproducer changes landed, there is no need to depend on lld crashing, just that we can make an error.

rnk added a comment.Jul 6 2022, 11:59 AM

Regarding environment variables, all flags are accessible via the environment if your build system propagates CCC_OVERRIDE_OPTIONS:
https://twitter.com/kastiglione/status/1334348223452876800?lang=en

IMO we ought to document this hidden feature. I've mentioned this feature to various folks and some of them say we shouldn't document it because it's a giant hack, but I think it's a great feature.

On the patch at hand, this sounds good to me, we should do it. :) I agree we can't do this by default. It seems OK to generate a ~10MB pre-processed source file in /tmp every time we crash, but a multi-GB archive for every linker crash would quickly fill up most disks.

Ping. We recently ran into a linker crash and this would have been useful :)

aganea accepted this revision.Aug 1 2022, 10:28 AM

LGTM!

This revision is now accepted and ready to land.Aug 1 2022, 10:28 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 1:01 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MaskRay added a comment.EditedAug 1 2022, 1:11 PM

LGTM

(Note: This is a patch which I think would benefit from a second look and 2.5h from approval to push was too short for others to react on this patch.)

LGTM

(Note: This is a patch which I think would benefit from a second look and 2.5h from approval to push was too short for others to react on this patch.)

Ack, I'll give more time in the future.

thakis added a subscriber: thakis.Aug 1 2022, 1:57 PM

This breaks tests on mac and win:
http://45.33.8.238/macm1/41545/step_7.txt
http://45.33.8.238/win/63579/step_7.txt

Please take a look and revert for now if it takes a while to fix.

This breaks tests on mac and win:
http://45.33.8.238/macm1/41545/step_7.txt
http://45.33.8.238/win/63579/step_7.txt

Please take a look and revert for now if it takes a while to fix.

Should be fixed in https://github.com/llvm/llvm-project/commit/9028966a717239cf586d6c4f606965d444176dc4

lld has tests for --reproduce already. Even if I added an environment variable to crash lld, the test would depend on lld being built, adding a big dependency to clang's tests. Do we think that's worth it?

There is now cross-project-tests which would seem to be the appropriate place for a test that wants to run both clang and lld.

thakis added inline comments.Aug 2 2022, 10:02 AM
clang/lib/Driver/Driver.cpp
1638

This does the wrong thing on Windows, I think (the flag is spelled /reproduce:path there I think?)

There is now cross-project-tests which would seem to be the appropriate place for a test that wants to run both clang and lld.

Thanks for pointing that out. I'll look into moving the test there

clang/lib/Driver/Driver.cpp
1638

Ah yeah good catch. Unfortunately this wasn't caught upstream because I quickly had to change to explicit target because *-ps4 target doesn't support -fuse-ld and there didn't already exist a PS4 lit feature that I could mark as unsupported. This was actually why you saw test failures on non linux platforms.

I'll make a new review that correctly uses /reproduce:. Thanks again

there didn't already exist a PS4 lit feature that I could mark as unsupported.

? I believe UNSUPPORTED: ps4 should work, because UNSUPPORTED will look at the default target triple.

I noticed that when using -fcrash-diagnostics=all and the linker crashes, clang creates both preprocessor and linker repros:

clang: note: diagnostic msg: /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/main-241332.c
clang: note: diagnostic msg: /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/linker-crash-945b07.tar
clang: note: diagnostic msg: /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/main-241332.sh

Is that intentional? I would've expected to only get the linker repro when the linker crashes.

I noticed that when using -fcrash-diagnostics=all and the linker crashes, clang creates both preprocessor and linker repros:

clang: note: diagnostic msg: /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/main-241332.c
clang: note: diagnostic msg: /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/linker-crash-945b07.tar
clang: note: diagnostic msg: /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/main-241332.sh

Is that intentional? I would've expected to only get the linker repro when the linker crashes.

I didn't really make a conscious decision one way or the other. I guess reproing all jobs leading to a crash somewhat makes sense, though it also doesn't seem that helpful to get the preprocessed source file. I'll send out a patch to fix this

I didn't really make a conscious decision one way or the other. I guess reproing all jobs leading to a crash somewhat makes sense, though it also doesn't seem that helpful to get the preprocessed source file. I'll send out a patch to fix this

Great, thanks :)

For context, we're trying to use this flag In Production to try and debug a linker crash, but making the compiler repro failed for some reason – and then we wondered why that step has to happen at all :)

Great, thanks :)

For context, we're trying to use this flag In Production to try and debug a linker crash, but making the compiler repro failed for some reason – and then we wondered why that step has to happen at all :)

Nice, I'm happy folks are using (or trying to use) this in production :). We've found that it has saved us quite a bit of time.

D137289 is up to stop generating the preprocessed sources for linker crashes