This is an archive of the discontinued LLVM Phabricator instance.

[Driver] -gsplit-dwarf: Produce .dwo regardless of -gN for -fthinlto-index=
ClosedPublic

Authored by MaskRay on Jan 13 2021, 7:34 PM.

Details

Summary

-g is an IR generation option while -gsplit-dwarf is an object file generation option.
For -gsplit-dwarf in the backend phase of a distributed ThinLTO (-fthinlto-index=) which does object file generation and no IR generation, -g should not be needed.

This patch makes -fthinlto-index= -gsplit-dwarf emit .dwo even in the absence of -g.
This should fix https://crbug.com/1158215 after D80391.

// Distributed ThinLTO usage
clang -g -O2 -c -flto=thin -fthin-link-bitcode=a.indexing.o a.c
clang -g -O2 -c -flto=thin -fthin-link-bitcode=b.indexing.o b.c
clang -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/' -Wl,--thinlto-object-suffix-replace='.indexing.o;.o' a.indexing.o b.indexing.o
clang -gsplit-dwarf -O2 -c -fthinlto-index=lto/a.o.thinlto.bc a.o -o lto/a.o
clang -gsplit-dwarf -O2 -c -fthinlto-index=lto/b.o.thinlto.bc b.o -o lto/b.o
clang -fuse-ld=lld @a.rsp -o exe

Note: for implicit regular/Thin LTO, .dwo emission works without this patch:
clang -flto=thin -gsplit-dwarf a.o b.o passes -plugin-opt=dwo_dir= to the linker.
The linker forwards the option to LTO. LTOBackend.cpp emits $dwo_dir/[0123...].dwo.

Diff Detail

Event Timeline

MaskRay created this revision.Jan 13 2021, 7:34 PM
MaskRay requested review of this revision.Jan 13 2021, 7:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2021, 7:34 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MaskRay edited the summary of this revision. (Show Details)Jan 13 2021, 7:35 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay updated this revision to Diff 316558.Jan 13 2021, 7:40 PM

improve code comment

In implicit thinlto this seemed to work for me without any changes:

$ clang++-tot -flto=thin test.cpp -g -c
$ clang++-tot -flto=thin -fuse-ld=lld -gsplit-dwarf test.o && llvm-dwarfdump-tot a.out
a.out:  file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000002c, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000030)

0x0000000b: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_GNU_dwo_name        ("a.out_dwo/1.dwo")
              DW_AT_GNU_dwo_id  (0xf29563d7c812deae)
              DW_AT_low_pc      (0x0000000000201730)
              DW_AT_high_pc     (0x0000000000201738)
              DW_AT_GNU_addr_base       (0x00000000)

Any idea why that worked without the need for a special case/changes? Might be nice if both these use cases (implicit and explicit thinlto) somehow managed to use more common code generation options. But I guess that's too different - probably some generic "take all the clang flags and pass them through to the linker to pass back to the compiler", which isn't really relevant to this... well, maybe (if they're being passed back to the compiler in implicit thin lto, how does implicit thin lto work correctly without the -g flag?)

MaskRay added a comment.EditedJan 13 2021, 8:31 PM

In implicit thinlto this seemed to work for me without any changes:

$ clang++-tot -flto=thin test.cpp -g -c
$ clang++-tot -flto=thin -fuse-ld=lld -gsplit-dwarf test.o && llvm-dwarfdump-tot a.out
a.out:  file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000002c, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000030)

0x0000000b: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_GNU_dwo_name        ("a.out_dwo/1.dwo")
              DW_AT_GNU_dwo_id  (0xf29563d7c812deae)
              DW_AT_low_pc      (0x0000000000201730)
              DW_AT_high_pc     (0x0000000000201738)
              DW_AT_GNU_addr_base       (0x00000000)

Any idea why that worked without the need for a special case/changes? Might be nice if both these use cases (implicit and explicit thinlto) somehow managed to use more common code generation options. But I guess that's too different - probably some generic "take all the clang flags and pass them through to the linker to pass back to the compiler", which isn't really relevant to this... well, maybe (if they're being passed back to the compiler in implicit thin lto, how does implicit thin lto work correctly without the -g flag?)

Interesting. D44788 let Clang driver pass -plugin-opt=dwo_dir= to the linker.
The linker forwards the option to LTO. LTOBackend.cpp emits $dwo_dir/[01234].dwo.

There are .dwo files but the file names are a bit less ideal. For distributed ThinLTO, I think specifying the dwo filename explicitly is better.

dblaikie accepted this revision.Jan 13 2021, 8:38 PM

Looks alright. I wouldn't mind knowing a bit more about how these things interact if different types of input files are listed together in a single command line, though.

This revision is now accepted and ready to land.Jan 13 2021, 8:38 PM
MaskRay edited the summary of this revision. (Show Details)Jan 13 2021, 8:48 PM
MaskRay retitled this revision from [Driver] -gsplit-dwarf: Produce .dwo regardless of -gN if -fthinlto-index= is specified to [Driver] -gsplit-dwarf: Produce .dwo regardless of -gN for -fthinlto-index=.Jan 13 2021, 8:53 PM
MaskRay edited the summary of this revision. (Show Details)
This revision was landed with ongoing or failed builds.Jan 13 2021, 9:02 PM
This revision was automatically updated to reflect the committed changes.
tejohnson added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
3758

Better to simply check if the input type is IR/BC via types::isLLVMIR(Input.getType()). Technically you don't need ThinLTO to do a clang compile with bitcode input. Looks like Input can be passed down from the caller.

dblaikie added inline comments.Jan 13 2021, 10:45 PM
clang/lib/Driver/ToolChains/Clang.cpp
3758

Good point! +1 to that.

MaskRay added inline comments.Jan 13 2021, 11:03 PM
clang/lib/Driver/ToolChains/Clang.cpp
3758

That currently does not work:
clang -g -c -flto a.c; fclang -gsplit-dwarf -g a.o -o a
there is no .dwo

tejohnson added inline comments.Jan 13 2021, 11:08 PM
clang/lib/Driver/ToolChains/Clang.cpp
3758

I assume you mean clang and not "fclang"?

Try with "-x ir" like "clang -gsplit-dwarf -g -x ir a.o -o a". I thought that the -x ir was needed when passing a .o file that was actually bitcode?

Can we not just make -gsplit-dwarf set the DwarfFissionKind without also looking at other things? That way, we don't have to worry about detecting the right mix of -g, -g2, -fthinlto-index, -x ir, bitcode input, etc. If there is some reason why this doesn't work, I would also like it to be expressed in a comment in the code so that someone doesn't come along and simplify/break the code later on.

Can we not just make -gsplit-dwarf set the DwarfFissionKind without also looking at other things? That way, we don't have to worry about detecting the right mix of -g, -g2, -fthinlto-index, -x ir, bitcode input, etc. If there is some reason why this doesn't work, I would also like it to be expressed in a comment in the code so that someone doesn't come along and simplify/break the code later on.

D94655

inglorion added inline comments.Jan 14 2021, 12:35 PM
clang/lib/Driver/ToolChains/Clang.cpp
3758

I thought that the -x ir was needed when passing a .o file that was actually bitcode?

I think that's a linker invocation, not a codegen invocation.

I would have spelled it

clang -fuse-ld=lld -flto -gsplit-dwarf -g a.o -o a

And, for me at least, it does use debug fission:

objdump -g a | grep dwo                                    
    DW_AT_GNU_dwo_name DW_FORM_strp
    DW_AT_GNU_dwo_id   DW_FORM_data8
    <14>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x29): a_dwo/0.dwo
    <18>   DW_AT_GNU_dwo_id  : 0x24fab96c73f98143
  0x00000020 2f746d70 2f747374 00615f64 776f2f30 /tmp/tst.a_dwo/0
  0x00000030 2e64776f 00                         .dwo.
dblaikie added inline comments.Jan 14 2021, 1:02 PM
clang/lib/Driver/ToolChains/Clang.cpp
3758

They're somewhat different things.

clang -gsplit-dwarf -g -x ir a.o -o a

With a single file, this is the same as the lto invocation - but with multiple files this is different, because it compiles the IR to object code, then links, rather than linking then compiling to object code.
The easier way to write "Compile to object code" includes things like:

clang -c -gsplit-dwarf a.bc
clang -c -gsplit-dwarf a.ll

for bitcode IR and textual IR respectively.

But yes, all these forms, whatever user-driven invocation causes IR->object code generation to occur, should accept and respect -gsplit-dwarf. And I think hopefully that's the case after/with D94655