This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Harmonize Split DWARF options with llc
ClosedPublic

Authored by aaronpuchert on Mar 21 2019, 3:21 PM.

Details

Summary

With Split DWARF the resulting object file (then called skeleton CU)
contains the file name of another ("DWO") file with the debug info.
This can be a problem for remote compilation, as it will contain the
name of the file on the compilation server, not on the client.

To use Split DWARF with remote compilation, one needs to either

  • make sure only relative paths are used, and mirror the build directory structure of the client on the server,
  • inject the desired file name on the client directly.

Since llc already supports the latter solution, we're just copying that
over. We allow setting the actual output filename separately from the
value of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU.

Fixes PR40276.

Diff Detail

Event Timeline

aaronpuchert created this revision.Mar 21 2019, 3:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2019, 3:21 PM

Pleasue include mention of the bug (PR40276) in the commit message & clarify that while this is useful for some remote compilation models, it's not strictly necessary/the only way to do it (a remote compilation model that keeps relative paths and uses compilation-dir instead can work without this attribute)

Use llvm-dwarfdump rather than llvm-objdump to dump the contents of the debug_info section and test the dwo_name there (rather than dumping hex) & probably the objdump part of the test isn't needed? (& I guess there's an LLVM patch to add the rest of this functionality?)

aaronpuchert edited the summary of this revision. (Show Details)Mar 21 2019, 5:40 PM

Use llvm-dwarfdump rather than llvm-objdump to dump the contents of the debug_info section and test the dwo_name there (rather than dumping hex)

I didn't know about llvm-dwarfdump, I wondered why llvm-objdump wouldn't pretty-print the debug info as objdump does. That makes a lot of sense then.

probably the objdump part of the test isn't needed?

Actually I only need to check that the DWO file is there (under the proper file name). Any ideas?

(& I guess there's an LLVM patch to add the rest of this functionality?)

What do we have to do there? Should we add the option to llc as well?

aaronpuchert edited the summary of this revision. (Show Details)Mar 21 2019, 6:06 PM

Use llvm-dwarfdump to inspect debug info, remove unneeded flags.

Use llvm-dwarfdump rather than llvm-objdump to dump the contents of the debug_info section and test the dwo_name there (rather than dumping hex)

I didn't know about llvm-dwarfdump, I wondered why llvm-objdump wouldn't pretty-print the debug info as objdump does. That makes a lot of sense then.

probably the objdump part of the test isn't needed?

Actually I only need to check that the DWO file is there (under the proper file name). Any ideas?

Ah, fair. You could actually test the dwo_name is accurate in the .dwo file (I added the dwo_name to the .dwo file so that multi-level dwp error messages could be more informative)

(& I guess there's an LLVM patch to add the rest of this functionality?)

What do we have to do there? Should we add the option to llc as well?

Yep - pretty much anything in MCOptions should be surfaced through llc for llvm-level testing.

Ah, fair. You could actually test the dwo_name is accurate in the .dwo file (I added the dwo_name to the .dwo file so that multi-level dwp error messages could be more informative)

Ok, I'll just check the dwo_name for both files then.

(& I guess there's an LLVM patch to add the rest of this functionality?)

What do we have to do there? Should we add the option to llc as well?

Yep - pretty much anything in MCOptions should be surfaced through llc for llvm-level testing.

I was about to add this when I discovered that there is such an option already. The command line clang -cc1 -split-dwarf-file a -fsplit-dwarf-dwo-name-attr b corresponds to llc -split-dwarf-output a -split-dwarf-file b. That seems a bit unfortunate, especially the different meanings of -split-dwarf-file. Should/can we strive to emulate the behavior of llc in Clang instead? That would perhaps require changes to the behavior of -split-dwarf-file. Here's how the options behave with llc:

Option-split-dwarf-file a
No splitSplit DI in same object, set DW_AT_GNU_dwo_name = a, no extra file
-split-dwarf-output bNo splitSplit DI, set DW_AT_GNU_dwo_name = a, separate file b

Currently, Clang's -enable-split-dwarf[=split] -split-dwarf-file a does the same as -split-dwarf-output a -split-dwarf-file a in llc, and -enable-split-dwarf=single -split-dwarf-file a does the same as -split-dwarf-file a in llc.

Ok, here is an idea. We introduce -split-dwarf-output in Clang instead of -fsplit-dwarf-dwo-name-attr. If given, it overrides the output file name for the Split DWARF file, which we otherwise take from -split-dwarf-file. The option is obviously incompatible with -enable-split-dwarf=single, so we will disallow that. This should be backwards-compatible, and bring the behavior of llc and clang -cc1 closer together. What do you think?

Ok, here is an idea. We introduce -split-dwarf-output in Clang instead of -fsplit-dwarf-dwo-name-attr. If given, it overrides the output file name for the Split DWARF file, which we otherwise take from -split-dwarf-file. The option is obviously incompatible with -enable-split-dwarf=single, so we will disallow that. This should be backwards-compatible, and bring the behavior of llc and clang -cc1 closer together. What do you think?

Sure, I think the naming's a bit weird (but hard to come up with good names for any of this) - but consistency seems like a good first pass at least, given we're halfway there anyway.

aaronpuchert planned changes to this revision.Apr 14 2019, 6:29 PM

Sure, I think the naming's a bit weird (but hard to come up with good names for any of this)

Agreed, -split-dwarf-output is pretty clear, but -split-dwarf-file could be both the actual filename or the attribute.

There is test/CodeGen/split-debug-filename.c checking that we emit !DICompileUnit({{.*}}, splitDebugFilename: "foo.dwo" into the IR under some circumstances, but I'm not sure why. It seems to be ignored by llc. What's the intended use for this? Your commit message in rC301063 suggests that this is needed for implicit modules. How does this work? I'm asking of course because I'm not sure whether we might need to emit both file names there.

Sure, I think the naming's a bit weird (but hard to come up with good names for any of this)

Agreed, -split-dwarf-output is pretty clear, but -split-dwarf-file could be both the actual filename or the attribute.

There is test/CodeGen/split-debug-filename.c checking that we emit !DICompileUnit({{.*}}, splitDebugFilename: "foo.dwo" into the IR under some circumstances, but I'm not sure why. It seems to be ignored by llc.

For implicit modular debug info I think it probably isn't ignored. See DwarfDebug.cpp:~630.

What's the intended use for this? Your commit message in rC301063 suggests that this is needed for implicit modules. How does this work? I'm asking of course because I'm not sure whether we might need to emit both file names there.

So implicit modules can use a sort of pseudo-split DWARF. The .pcm (module file) itself is an object file in this mode - with the split DWARF and a section containing the AST bitcode. Then in the normal source files that use the .pcm, they get the usual non-split DWARF, plus skeleton CUs that reference the .pcm file (& contain some other attributes for regenerating it in case it's been cleaned up from the module cache). This is why the file name can be (& has to be) passed down through the IR - it /can/ be, because this compilation isn't choosing tnhe file name (so there's no issue with the .dwo file name changing between IR generation and object generation (as there is with LTO - where the compilation doesn't know the final object name, only the linker knows that)) and it /must/ be done this way due to multiple skeleton CUs if the main CU references multiple .pcm debug info.

Introduce -split-dwarf-output instead to mirror the behavior of llc, which already allows to set the attribute separately from the output file name.

aaronpuchert retitled this revision from [Driver] Allow setting the DWO name DWARF attribute separately to [Clang] Harmonize Split DWARF options with llc.Apr 22 2019, 3:53 PM
aaronpuchert edited the summary of this revision. (Show Details)
aaronpuchert added inline comments.
lib/CodeGen/BackendUtil.cpp
1345

@pcc Your documentation for DwoPath suggests that this should be the actual output filename. However, the test that you added together with this line in rC333677 doesn't fail whatever garbage I write into that field here. What can I add to that so that it fails when we don't do the right thing here?

Adapt one more test case.

Ping. I'm sorry that the change turned out so big, but note that it doesn't change the behavior of any non-cc1 flags.

Might be easier as a few patches - renaming the existing option, adding the new one, then removing the single split dwarf flag handling in favor of implying that by the absence of an output file name. (if I'm reading what this patch does)

include/clang/Basic/CodeGenOptions.h
185–188

Comments are a bit inconsistent - "filename for the split debug info" versus "name for the split debug info file" - makes it perhaps a bit harder to see what the important difference is between these two values.

It might be helpful to clarify "used in the skeleton CU" as "Used as the dwo_name in the DWARF" versus "the name of the file to write the .dwo sections to" or something like that?

lib/CodeGen/BackendUtil.cpp
847

Why did this add a check for EnableSplitDwarf here that wasn't there before?

aaronpuchert marked an inline comment as done.Jun 3 2019, 7:47 AM

Might be easier as a few patches - renaming the existing option, adding the new one, then removing the single split dwarf flag handling in favor of implying that by the absence of an output file name.

No problem, makes sense to me. I'll see if it's possible to separate these changes.

if I'm reading what this patch does

Yes, that seems to be how llc behaves, and I want to copy that.

lib/CodeGen/BackendUtil.cpp
847

I just changed the order, the check was there as (CodeGenOpts.getSplitDwarfMode() == CodeGenOptions::SplitFileFission) in the second line of the if.

aaronpuchert planned changes to this revision.Jun 3 2019, 7:47 AM
aaronpuchert added a comment.EditedJun 5 2019, 9:24 AM

Might be easier as a few patches - renaming the existing option, adding the new one, then removing the single split dwarf flag handling in favor of implying that by the absence of an output file name.

No problem, makes sense to me. I'll see if it's possible to separate these changes.

I have managed to split this into commits that build and test fine own their own, but there are temporary inconsistencies. If I rename -split-dwarf-file to -split-dwarf-output first, then I have to use that option for -enable-split-dwarf=single, and then I reintroduce -split-dwarf-file and have to change it back. It's not a huge problem, but it might feel a bit weird. On the other hand, the changes are indeed smaller and easier to review. I will push them soon.

Split from other changes as suggested. A predecessor is in D63130, and a successor will come soon.

aaronpuchert edited the summary of this revision. (Show Details)Jun 11 2019, 4:21 AM

Correct an oversight.

aaronpuchert marked 2 inline comments as done.Jun 11 2019, 2:12 PM

Might be easier as a few patches - renaming the existing option,

D63130

adding the new one,

This change.

then removing the single split dwarf flag handling in favor of implying that by the absence of an output file name. (if I'm reading what this patch does)

D63167

dblaikie accepted this revision.Jun 11 2019, 5:56 PM

Looks good!

This revision is now accepted and ready to land.Jun 11 2019, 5:56 PM
aaronpuchert added inline comments.
lib/CodeGen/BackendUtil.cpp
1345

@pcc Could you (or perhaps @tejohnson) comment on what the intended behavior is here, and how I can change the test so that it fails when I do the wrong thing? Is this the name of the file we write the split debug info to, or is it the value we use for the DW_AT_[GNU_]dwo_name attribute in the skeleton CU?

tejohnson added inline comments.Jun 12 2019, 6:38 AM
lib/CodeGen/BackendUtil.cpp
1345

It is the name of the file the split debug info is written to. If you test by changing the file name given to the -split-dwarf-file option in test/CodeGen/thinlto-split-dwarf.c, make sure you clean up the old one in your test output directory. When I tested it just now, it initially still passed because I had an old one sitting in the output directory from a prior run. Once I removed that it failed with the new name (without changing the corresponding llvm-readobj invocation).

aaronpuchert added inline comments.Jun 12 2019, 12:33 PM
lib/CodeGen/BackendUtil.cpp
1345

Thanks, I didn't consider that. I wasn't even aware that test output is persisted.

It seems DwoPath is used both as output filename and as value for DW_AT_[GNU_]dwo_name in the skeleton CU. llc has separate options for both: -split-dwarf-file for the attribute and -split-dwarf-output for the output filename. I want to do the same for Clang. Then we should probably separate them for LTO, too.

What is the use case for -split-dwarf-file with an individual thin backend? Is that used for distributed/remote builds? Also is there any non-cc1 way to use it? There is --plugin-opt=dwo_dir=... for the LTO linker plugin, but that seems to go a different route.

tejohnson added inline comments.Jun 12 2019, 12:50 PM
lib/CodeGen/BackendUtil.cpp
1345

This is the path taken for distributed ThinLTO, the plugin-opt passed to linker are for in-process ThinLTO. While -split-dwarf-file is a cc1 option, it is normally set automatically from the filename under -gsplit-dwarf (see where it is set in Clang::ConstructJob).

aaronpuchert added inline comments.Jun 12 2019, 1:22 PM
lib/CodeGen/BackendUtil.cpp
1345

Ok, then it makes sense to do the changes for LTO as well.

You're right that -split-dwarf-file is produced for -gsplit-dwarf, but not with LTO, right? When I set -gsplit-dwarf on a vanilla ThinLTO build, then the flag is just ignored and I'm getting the debug info in the final executables/shared objects. (At least with Clang 8.)

I'm asking because this changes the cc1-interface, and while I have adapted Clang::ConstructJob, I'm not sure how distributed ThinLTO works, and whether there have to be changes.

If it works like test/CodeGen/thinlto-split-dwarf.c, using cc1-options, then users might have to adapt after this change.

Make sure the flags have the same meaning for LTO. Also slightly reworded the HelpText.

Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2019, 4:50 PM

Sorry for making the inline comments disappear, but I had to switch to the big repository.

@tejohnson Could you have a look at the LTO-related changes (BackendUtil.cpp + all files with LTO in the name)?

lgtm for the LTO bits. Suggestion below for comment.

llvm/include/llvm/LTO/Config.h
92 ↗(On Diff #204661)

Probably needs a comment similar to the one below about being for running individual backends directly. Otherwise (for in process ThinLTO) we use the DwoDir.

aaronpuchert added inline comments.Jun 14 2019, 7:53 AM
llvm/include/llvm/LTO/Config.h
92 ↗(On Diff #204661)

Sure. Any objections to the following wording?

/// The name for the split debug info file used for the DW_AT_[GNU_]dwo_name
/// attribute in the skeleton CU. This should generally only be used when
/// running an individual backend directly via thinBackend(), as otherwise
/// all objects would use the same .dwo file. Not used as output path.
tejohnson added inline comments.Jun 14 2019, 9:54 AM
llvm/include/llvm/LTO/Config.h
92 ↗(On Diff #204661)

Looks good.

This revision was automatically updated to reflect the committed changes.
aaronpuchert marked 3 inline comments as done.
svenvh added a subscriber: svenvh.Jun 21 2019, 3:33 AM

As a followup to r363496, I've added llvm-dwarfdump as a clang test dependency in r364021.