This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Support symbol loading for debug fission
ClosedPublic

Authored by wlei on Dec 17 2021, 2:59 PM.

Details

Summary

Support to load debug info from dwarf split file, like .dwo, .dwp files. Leverage the getNonSkeletonUnitDIE(false) API to achieve this.

Add test cause to make sure all the ranges is well retrieved by the loader.

Diff Detail

Event Timeline

wlei created this revision.Dec 17 2021, 2:59 PM
wlei requested review of this revision.Dec 17 2021, 2:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 17 2021, 2:59 PM
wlei edited the summary of this revision. (Show Details)Dec 17 2021, 3:06 PM
wlei added reviewers: ayermolo, hoy, wenlei.
ayermolo added inline comments.Dec 18 2021, 7:53 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
638

Right now it will always use info from DWO sections, even if -fsplit-dwarf-inlining is set. I wonder if it's worth checking that if children exist use them. I think with split dwarf enabled -fsplit-dwarf-inlining is only option that will add children to Skeleton CU.

657

The path can be either relative or absolute.

wenlei added inline comments.Dec 19 2021, 12:01 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
655–657

There's no debug info update involved here, is this message a copy paste?

And this should probably be a warning?

wlei updated this revision to Diff 395603.Dec 20 2021, 11:19 PM

addressing reviewers' feedback

Add --dwp=DWPPath to specify dwo/dwp file path.

Add more testcase(-gsplit-dwarf=single, -gsplit-dwarf=split, -fsplit-dwarf-inlining)

llvm/tools/llvm-profgen/ProfiledBinary.cpp
638

Yeah, I tried -fsplit-dwarf-inlining and it indeed add the children to Skeleton CU, then we will get two same function symbols, one from main binary, one from dwo section. It seems both have enough info, probably it's still safer to use the union set of them, so removed this check this condition. Even we have the duplication, the loader will just use the first one.

Added a test for this.

657

The path can be either relative or absolute.

I see, updated the comments, it appears that symbolizer requires the DWP file path, so added a cmd flag for it((BOLT also have it).

There's no debug info update involved here, is this message a copy paste?

Removed, thanks. Yeah, I copied the getNonSkeletonUnitDIE part of code from BOLT code.

And this should probably be a warning?

Changed to warning,

ayermolo added inline comments.Dec 23 2021, 10:08 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
48

There is one DWP file. There could be multiple .o/.dwo files in multiple directories. I don't think this option makes sense for the latter. The path to those should be encoded in DWARF CUs.
Also for DWP it's not exactly a path, but fully resolved path+filename
auto Obj = object::ObjectFile::createObjectFile(

this->DWPName.empty()
    ? (DObj->getFileName() + ".dwp").toStringRef(DWPName)
    : StringRef(this->DWPName));
wlei updated this revision to Diff 396056.Dec 23 2021, 11:05 AM

add warning for .o/.dwo file path used as DWP path

wlei added inline comments.Dec 23 2021, 11:07 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
48

Thanks! I see, while I was creating the test cases, I found that ".o and .dwo" were all encoded in absolute paths, then the tests will fail in others' machine, so I used this "-dwp" as a workaround to find the .o/.dwo file.
For this reason, how about that we give warning while using "-dwp" for .o and .dwo path?

Also for DWP it's not exactly a path, but fully resolved path+filename

I see, so even if the DWP path is missing, it can search the dwp in the same directory of obj file with ".dwp" suffix. Updated its description.

ayermolo added inline comments.Dec 23 2021, 11:23 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
48

Warning seems like an overkill. Presumably people know what they are doing. I don't have a strong opinion one way or the other.

wlei updated this revision to Diff 396057.Dec 23 2021, 11:26 AM

removed the warning for dwp

ayermolo added inline comments.Dec 23 2021, 11:32 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
638

I wonder if this even justifies a warning? With Debug Fission this warning will always be printed. It implies that something is wrong with DebugInfo in binary.

llvm/tools/llvm-profgen/ProfiledBinary.h
294

Maybe make this pass by reference? Since we don't expect it this be nullptr.

ayermolo accepted this revision.Dec 23 2021, 11:32 AM

Couple minor things, but looks good otherwise.
Thank you for working on this.

This revision is now accepted and ready to land.Dec 23 2021, 11:32 AM
wlei added inline comments.Dec 23 2021, 11:52 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
48

Sounds good, it seems most of time o/.dwo files are not visible to user, removed the warning.

638

Good point, warning removed.

llvm/tools/llvm-profgen/ProfiledBinary.h
294

Good point, fixed.

wlei updated this revision to Diff 396068.Dec 23 2021, 11:52 AM

Updating D115973: [llvm-profgen] Support symbol loading for debug fission

wlei added a comment.Dec 23 2021, 12:01 PM

Couple minor things, but looks good otherwise.
Thank you for working on this.

Thanks for your guidance and review!

hoy added a comment.Jan 2 2022, 9:36 PM

Thanks for bringing up the debug fission support.

llvm/test/tools/llvm-profgen/split-dwarf.test
3

A dumb question. In the single mode, where does the dwo object live? Is it in the immediate .o and all such .o should be saved for debugging?

13

Add a test case for no --dwp?

llvm/tools/llvm-profgen/ProfiledBinary.cpp
650

What is a DWO unit checked against here? Is it an indirection to the real compilation unit?

ayermolo added inline comments.Jan 4 2022, 9:52 AM
llvm/test/tools/llvm-profgen/split-dwarf.test
3

Yes, in single mode dwo sections remain in the lo files.

wlei updated this revision to Diff 397452.Jan 4 2022, 8:21 PM

add test case for -fsplit-dwarf-inlining without --dwp

llvm/test/tools/llvm-profgen/split-dwarf.test
3

Yeah, I think when used in the real service, they're all packaged in dwp file, here it's only for testing.

13

Good point, added. Without -dwp, it will only load the inliner function.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
650

I think it's an internal check whether the current parsed result is a DWO unit.
The getNonSkeletonUnitDIE is expected to get a DWOUnit, if the parsing failed and not a DWOUnit, it's likely due to the incorrect dwo file path.

ayermolo added inline comments.Jan 4 2022, 9:34 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
650

Right. It's a bit weird implementation. If it fails to parse and produce DWO Die it invokes getUnitDIE and returns a Die for Skeleton CU.

hoy added inline comments.Jan 5 2022, 10:01 AM
llvm/test/tools/llvm-profgen/split-dwarf.test
13

Thanks for adding the test. Per the dwo switch description, should the dwo file be loaded if it is not specified and placed alongside with the binary?

When not specified, it will be <binary>.dwp in the same directory as the main binary.

I guess it'll look for split-dwarf-split-inlining.perfbin.dwo instead of split-dwarf-split-inlining.dwo?

ayermolo added inline comments.Jan 5 2022, 10:05 AM
llvm/test/tools/llvm-profgen/split-dwarf.test
13

.dwo/.o files don't need to be next to the binary, and there is no assumption that they will be. In fact in production environment they are not. The fully resolved path is specified in DWARF CU. If .dwo/.o files are not there then it's a warning/error. Depending on context.

hoy added inline comments.Jan 5 2022, 10:09 AM
llvm/test/tools/llvm-profgen/split-dwarf.test
13

Yeah, but in this case we have them side by side. I guess it's an issue of the file naming that resulted in the dwo file not loaded. Should we change the arg description to not support the automatic load?

wlei added inline comments.Jan 5 2022, 10:47 AM
llvm/test/tools/llvm-profgen/split-dwarf.test
13

Yeah, for .dwo/.o files, if not specified --dwp, it will used the path encoded in the CU and it's an absolute path. In this test, the path is pointed to a directory in my machine, so it will fail for others.

For .dwp file, if not specified --dwp, it will be <binary>.dwp in the same directory.

In fact, --dwp is designed only to be used for .dwp file. In real case, we won't expect -dwp used for .dwo/.o files, here only work around for the testing.

hoy accepted this revision.Jan 5 2022, 11:13 AM
hoy added inline comments.
llvm/test/tools/llvm-profgen/split-dwarf.test
13

I see. I misunderstood dwo files as dwp files. It's clear to me now. Can you please add the clarification in the description of the dwp switch? LGTM otherwise.

dblaikie added inline comments.Jan 5 2022, 12:13 PM
llvm/test/tools/llvm-profgen/split-dwarf.test
13

Other tests for Split DWARF that need to test .dwo access use binaries/objects created with -fdebug-compilation-dir (or -fdebug-prefix-map) to avoid baking in an absolute path. The test case can copy the .dwo to a known temp directory, have the test cd into that directory (or a parent, if -fdebug-compilation-dir is made to be a parent of some subdir the .dwo is in) and then the tool should be able to resolve the relative path.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
650

We can change APIs if they don't behave in a reasonable way.

LLVM's symbolizer APIs have to look through from skeleton units to split units - We probably should refactor/reuse that code here, if possible?

wenlei added inline comments.Jan 5 2022, 3:16 PM
llvm/test/tools/llvm-profgen/split-dwarf.test
13

It'd be good if we can have relative path encoded in CU with -fdebug-compilation-dir/-fdebug-prefix-map, so split unit can be loaded following that path without needing to cd into a directory.. Otherwise, the --dwp workaround is probably fine though I also find it a bit confusing to pass .o/.dwo to --dwp

llvm/tools/llvm-profgen/ProfiledBinary.cpp
618–620

typo: Misssing->Missing

dblaikie added inline comments.Jan 5 2022, 7:30 PM
llvm/test/tools/llvm-profgen/split-dwarf.test
13

Not sure I follow - you can use a relative path with -fdebug-compilation-dir/-fdebug-prefix-map, but I think for lit tests you'd still need to cd into the test directory because I don't think lit guarantees which directory is the cwd of the test at the start.

I'd rather not implement (or test using a quirk/unintended feature, even if it doesn't require explicit code to support the use case) a semi-confusing --dwp flag that supports dwo files for testing when there's an alternative (^) that we use for all other dwo testing

MaskRay requested changes to this revision.Jan 7 2022, 4:43 PM
MaskRay added a subscriber: MaskRay.

Prebuilt .o files are strongly discouraged. They are opaque and difficult to update / difficult to compare across versions.
It's recommended to use yaml2obj to generate .o files.

This revision now requires changes to proceed.Jan 7 2022, 4:43 PM

Prebuilt .o files are strongly discouraged. They are opaque and difficult to update / difficult to compare across versions.
It's recommended to use yaml2obj to generate .o files.

Not sure that the DWARF yaml is especially readable - I think we usually just use raw assembly for this sort of thing? But yeah, we don't tend to check in object files anymore.

In 2021-01, https://lists.llvm.org/pipermail/llvm-dev/2021-January/148048.html ([RFC] Cross-project lit test suite) discussed some pain in cross-project debug info testing. Perhaps @jhenderson has some idea for the clang usage in the test.

llvm/test/tools/llvm-profgen/split-dwarf.test
19
20

Add -NEXT: whenever applicable to have a better FileCheck diagnostic in case of an error. Tests should be written in the mind that others may need to update them.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
618–620
635–643

In 2021-01, https://lists.llvm.org/pipermail/llvm-dev/2021-January/148048.html ([RFC] Cross-project lit test suite) discussed some pain in cross-project debug info testing. Perhaps @jhenderson has some idea for the clang usage in the test.

I've not dug into this patch at all, but in general terms, the way I see it, you have about three options to craft DWARF in test objects these days:

  1. yaml2obj with DWARF, if the support is sufficiently mature for your use-case, and the output is easy enough to read - one advantage with DWARF yaml2obj is that you can omit many of the required fields, unlike assembly, so depending on the complexity you need, this may be a better option for long-term maintenance.
  2. Directly write assembly, and then use llvm-mc to compile it into the object file you want. If necessary, you might generate this assembly via clang, and then document the input source, the command line, the version of clang used to generate it, and any post-generation modifications you made. (Other variations might be to compile from IR or other similar input).
  3. Add the test to cross-project-tests somewhere, and then use clang directly within the test (together with a REQUIRES: clang directive). However, this should really only be done if either of the above don't sensibly work OR you are testing the interaction between clang (explicitly clang, not the LLVM backend, for which you can use other tools like llc etc) and llvm-profdata.
wlei updated this revision to Diff 407665.Feb 10 2022, 1:38 PM

addressing reviewers' feedback.

wlei added a comment.Feb 10 2022, 1:40 PM

Sorry for the late reply, I'm taking time off recently.

@MaskRay @dblaikie @jhenderson Thanks for your helpful feedbacks!

It's recommended to use yaml2obj to generate .o files.

! In D115973#3228784, @MaskRay wrote:

In 2021-01, https://lists.llvm.org/pipermail/llvm-dev/2021-January/148048.html ([RFC] Cross-project lit test suite) discussed some pain in cross-project debug info testing. Perhaps @jhenderson has some idea for the clang usage in the test.

I've updated all binaries using yaml2obj to generate the .o/.dwo/executable binary. I assume the .o files you mentioned also include the executable binary. As llvm-profgen always require an executable binary(--binary=<binary>), here I converted all the .o/.dwo/exe to yaml format.

In 2021-01, https://lists.llvm.org/pipermail/llvm-dev/2021-January/148048.html ([RFC] Cross-project lit test suite) discussed some pain in cross-project debug info testing. Perhaps @jhenderson has some idea for the clang usage in the test.

I've not dug into this patch at all, but in general terms, the way I see it, you have about three options to craft DWARF in test objects these days:

  1. yaml2obj with DWARF, if the support is sufficiently mature for your use-case, and the output is easy enough to read - one advantage with DWARF yaml2obj is that you can omit many of the required fields, unlike assembly, so depending on the complexity you need, this may be a better option for long-term maintenance.
  2. Directly write assembly, and then use llvm-mc to compile it into the object file you want. If necessary, you might generate this assembly via clang, and then document the input source, the command line, the version of clang used to generate it, and any post-generation modifications you made. (Other variations might be to compile from IR or other similar input).
  3. Add the test to cross-project-tests somewhere, and then use clang directly within the test (together with a REQUIRES: clang directive). However, this should really only be done if either of the above don't sensibly work OR you are testing the interaction between clang (explicitly clang, not the LLVM backend, for which you can use other tools like llc etc) and llvm-profdata.

Thanks for the detailed instructions. Updated the test using yaml2obj.

Not sure I follow - you can use a relative path with -fdebug-compilation-dir/-fdebug-prefix-map, but I think for lit tests you'd still need to cd into the test directory because I don't think lit guarantees which directory is the cwd of the test at the start.
I'd rather not implement (or test using a quirk/unintended feature, even if it doesn't require explicit code to support the use case) a semi-confusing --dwp flag that supports dwo files for testing when there's an alternative (^) that we use for all other dwo testing

Thanks! Updated the test using -fdebug-compilation-dir=., now --dwp is only for dwp file. Yeah, it have to cd into the test directory to make the lit work.

llvm/test/tools/llvm-profgen/split-dwarf.test
19

Fixed, thanks!

20

Fixed, thanks!

llvm/tools/llvm-profgen/ProfiledBinary.cpp
618–620

Fixed, thanks!

635–643

Fixed, thanks!

wlei added a comment.Feb 14 2022, 10:42 AM

Ping.. @MaskRay Could you take a further look? thanks!

Did you read jhenderson's point?

  1. yaml2obj with DWARF, if the support is sufficiently mature for your use-case, and the output is easy enough to read - one advantage with DWARF yaml2obj is that you can omit many of the required fields, unlike assembly, so depending on the complexity you need, this may be a better option for long-term maintenance.

The tests are clumsy. I don't think we should land the change as is.
split-dwarf-split.exe.yaml has nearly 800 lines and includes many tags which are not needed for the purpose of the test. For example, all the STT_SECTION symbols, .eh_frame, .init_array, and .dynsym can be deleted.
The .debug_* sections are opaque and difficult to maintain.
If the readability of split-dwarf-split.exe.yaml cannot be improved after some cleanups, try leaving a small stub test and consider adding the fullblown test (simplification still required) in cross-project-tests.

wlei added a comment.Feb 14 2022, 1:50 PM

Did you read jhenderson's point?

  1. yaml2obj with DWARF, if the support is sufficiently mature for your use-case, and the output is easy enough to read - one advantage with DWARF yaml2obj is that you can omit many of the required fields, unlike assembly, so depending on the complexity you need, this may be a better option for long-term maintenance.

The tests are clumsy. I don't think we should land the change as is.
split-dwarf-split.exe.yaml has nearly 800 lines and includes many tags which are not needed for the purpose of the test. For example, all the STT_SECTION symbols, .eh_frame, .init_array, and .dynsym can be deleted.
The .debug_* sections are opaque and difficult to maintain.
If the readability of split-dwarf-split.exe.yaml cannot be improved after some cleanups, try leaving a small stub test and consider adding the fullblown test (simplification still required) in cross-project-tests.

@MaskRay Thanks for your feedbacks! I did try to reduce the *.exe.yaml file but found that it's hard to do this because there're a lot of dependences, I think it's also less productive for people in the future to reproduce that.

The .debug_* sections are opaque and difficult to maintain.

Yeah, but the test is tested against those debug sections, I guess using assembly should have the same issue. Considering this, it seems the only way to leave for cross-project-tests?

try leaving a small stub test and consider adding the fullblown test (simplification still required) in cross-project-tests.

Could you elaborate more on this? The exe binary is the root dependence of those tests. Regarding the stub test, do you mean leave a .c source file in llvm/test/.., then move the whole test file into cross-project-tests?

I did also play with the cross-project-tests, but it seems it isolates the test from the llvm tool(llvm-profgen here). Before we can just ninja check-llvm-tools, but now we need to run more cmds. You see, in llvm-symbolizer test https://github.com/llvm/llvm-project/tree/main/llvm/test/tools/llvm-symbolizer/Inputs, there're still many exe for split dwarf. This is exactly the same to this patch, I'm curious is there any concerns not moving llvm-symbolizer test into cross-project-tests? cc @jhenderson @dblaikie

Bit of history - way back in the day, we used checked in binaries (objects or executables) to test binary utilities like llvm-symbolizer (the thinking being that that was the input to the tool, we didn't want the test to cover more/unrelated tools (like assemblers) so that they were isolated and wouldn't fail for unrelated reasons (like the assembler being broken)) This made tests hard to read, etc.

Eventually (I think with lld testing, maybe before) that softened to "let's check in assembly and depend on the LLVM integrated assembler to assemble those into binaries and test the resulting binaries" - the thinking being that the assembler is simple/robust enough, and the readability/maintainability improvement for the tests was worth the small chance of unrelated failures.

But it's not easy to hand-craft DWARF assembly (or some other object file features) - so they're still kind of read-only tests, usually generated from some source code provided in a comment or in a side file (like one in Inputs, that isn't actually read by any test, but exists there for documentation and if the test assembly needs to be modified/regenerated).

There's also yaml2obj - which I think lets you make somewhat hand-writable DWARF for some DWARF features at least and more object/non-DWARF features - which, if possible, would be preferred and it'd be great if it were improved to the point where lots of these object utility tests could be written in terms of hand-crafted/directly editable yaml2obj inputs - but I don't think it has enough feature coverage for that to entirely be the case yet. Probably more "you could generate an object file, obj2yaml it, then maybe it'd be useful for minor edits/easier to read than raw assembly, but you'd probably still want to include the source/compilation command in a comment or side file for posterity/documentation/etc".

You see, in llvm-symbolizer test https://github.com/llvm/llvm-project/tree/main/llvm/test/tools/llvm-symbolizer/Inputs, there're still many exe for split dwarf. This is exactly the same to this patch, I'm curious is there any concerns not moving llvm-symbolizer test into cross-project-tests? cc @jhenderson @dblaikie

Yeah, cross-project-tests in my fairly stern opinion, should not be a substitute for in-tree tests. All functionality should include in-subproject testing where the functionality resides (if you touch clang source in a way that's observable (ie: not NFC) - then there should be a clang test that goes with it). cross-project-tests can be used for integration. (eg: if you test that clang generates this IR, and you test that LLVM optimizes that IR to something important - you could have a cross-project-test that verifies that compiling some source code with clang generates that important assembly, so that the handshake between clang and LLVM doesn't get out of sync/lost).

wlei added a comment.Feb 14 2022, 4:48 PM

Bit of history - way back in the day, we used checked in binaries (objects or executables) to test binary utilities like llvm-symbolizer (the thinking being that that was the input to the tool, we didn't want the test to cover more/unrelated tools (like assemblers) so that they were isolated and wouldn't fail for unrelated reasons (like the assembler being broken)) This made tests hard to read, etc.

Eventually (I think with lld testing, maybe before) that softened to "let's check in assembly and depend on the LLVM integrated assembler to assemble those into binaries and test the resulting binaries" - the thinking being that the assembler is simple/robust enough, and the readability/maintainability improvement for the tests was worth the small chance of unrelated failures.

But it's not easy to hand-craft DWARF assembly (or some other object file features) - so they're still kind of read-only tests, usually generated from some source code provided in a comment or in a side file (like one in Inputs, that isn't actually read by any test, but exists there for documentation and if the test assembly needs to be modified/regenerated).

There's also yaml2obj - which I think lets you make somewhat hand-writable DWARF for some DWARF features at least and more object/non-DWARF features - which, if possible, would be preferred and it'd be great if it were improved to the point where lots of these object utility tests could be written in terms of hand-crafted/directly editable yaml2obj inputs - but I don't think it has enough feature coverage for that to entirely be the case yet. Probably more "you could generate an object file, obj2yaml it, then maybe it'd be useful for minor edits/easier to read than raw assembly, but you'd probably still want to include the source/compilation command in a comment or side file for posterity/documentation/etc".

You see, in llvm-symbolizer test https://github.com/llvm/llvm-project/tree/main/llvm/test/tools/llvm-symbolizer/Inputs, there're still many exe for split dwarf. This is exactly the same to this patch, I'm curious is there any concerns not moving llvm-symbolizer test into cross-project-tests? cc @jhenderson @dblaikie

Yeah, cross-project-tests in my fairly stern opinion, should not be a substitute for in-tree tests. All functionality should include in-subproject testing where the functionality resides (if you touch clang source in a way that's observable (ie: not NFC) - then there should be a clang test that goes with it). cross-project-tests can be used for integration. (eg: if you test that clang generates this IR, and you test that LLVM optimizes that IR to something important - you could have a cross-project-test that verifies that compiling some source code with clang generates that important assembly, so that the handshake between clang and LLVM doesn't get out of sync/lost).

@dblaikie This's very informative, I have learned more contexts, thanks!

I forgot to provide the special context about this testing: the purpose of this testing is to test whether llvm-profgen can use split dwarf binary as input to generate a non-empty profile, it doesn't really care about the source code and it doesn't intend to test any debug info features or the integration(clang or lld), we should have other dedicated testings for them. Hence, to narrow the scope, it just used a very simple source code(with building instructions in comments), it's very unlikely to break any debug info features or integration, I have never got any related failures in other tests of llvm-profgen before. So I lean toward keeping it into the in-tree tests.

Also as shown in this version, it seems it doesn't improve the readability of the test using yaml2obj or assembly. While testing llvm-profgen, it's actually designed for read-only. People reproducing this need to call the clang cmd offline to generate the obj/exe, call obj2yaml, then spend some time to reduce the file, which seems inconvenient.

@MaskRay I do appreciate your feedback very much, do you accept using the pre-built obj/exe for this test considering the aforementioned points?

Bit of history - way back in the day, we used checked in binaries (objects or executables) to test binary utilities like llvm-symbolizer (the thinking being that that was the input to the tool, we didn't want the test to cover more/unrelated tools (like assemblers) so that they were isolated and wouldn't fail for unrelated reasons (like the assembler being broken)) This made tests hard to read, etc.

Eventually (I think with lld testing, maybe before) that softened to "let's check in assembly and depend on the LLVM integrated assembler to assemble those into binaries and test the resulting binaries" - the thinking being that the assembler is simple/robust enough, and the readability/maintainability improvement for the tests was worth the small chance of unrelated failures.

But it's not easy to hand-craft DWARF assembly (or some other object file features) - so they're still kind of read-only tests, usually generated from some source code provided in a comment or in a side file (like one in Inputs, that isn't actually read by any test, but exists there for documentation and if the test assembly needs to be modified/regenerated).

There's also yaml2obj - which I think lets you make somewhat hand-writable DWARF for some DWARF features at least and more object/non-DWARF features - which, if possible, would be preferred and it'd be great if it were improved to the point where lots of these object utility tests could be written in terms of hand-crafted/directly editable yaml2obj inputs - but I don't think it has enough feature coverage for that to entirely be the case yet. Probably more "you could generate an object file, obj2yaml it, then maybe it'd be useful for minor edits/easier to read than raw assembly, but you'd probably still want to include the source/compilation command in a comment or side file for posterity/documentation/etc".

You see, in llvm-symbolizer test https://github.com/llvm/llvm-project/tree/main/llvm/test/tools/llvm-symbolizer/Inputs, there're still many exe for split dwarf. This is exactly the same to this patch, I'm curious is there any concerns not moving llvm-symbolizer test into cross-project-tests? cc @jhenderson @dblaikie

Yeah, cross-project-tests in my fairly stern opinion, should not be a substitute for in-tree tests. All functionality should include in-subproject testing where the functionality resides (if you touch clang source in a way that's observable (ie: not NFC) - then there should be a clang test that goes with it). cross-project-tests can be used for integration. (eg: if you test that clang generates this IR, and you test that LLVM optimizes that IR to something important - you could have a cross-project-test that verifies that compiling some source code with clang generates that important assembly, so that the handshake between clang and LLVM doesn't get out of sync/lost).

@dblaikie This's very informative, I have learned more contexts, thanks!

I forgot to provide the special context about this testing: the purpose of this testing is to test whether llvm-profgen can use split dwarf binary as input to generate a non-empty profile, it doesn't really care about the source code and it doesn't intend to test any debug info features or the integration(clang or lld), we should have other dedicated testings for them. Hence, to narrow the scope, it just used a very simple source code(with building instructions in comments), it's very unlikely to break any debug info features or integration, I have never got any related failures in other tests of llvm-profgen before. So I lean toward keeping it into the in-tree tests.

Also as shown in this version, it seems it doesn't improve the readability of the test using yaml2obj or assembly. While testing llvm-profgen, it's actually designed for read-only. People reproducing this need to call the clang cmd offline to generate the obj/exe, call obj2yaml, then spend some time to reduce the file, which seems inconvenient.

I've missed a step here - it'd still be desirable, if possible, to write the test with something like yaml2obj to make the input human readable so it would be easier to understand the test - the actual inputs and expected behavior. Though the current yaml isn't a great representation of that (I think the current yaml in this patch is probably worse than assembly - but there are maybe better options): you can see how to write DWARF in yaml in this test file: llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml - I'm not sure it's still /ideal/ for writing test cases, so it'd be interesting to see how it looks for an issue like this. (it might not/probably doesn't support things like dwp indexes... and having those as raw hex strings doesn't sound ideal, but I guess we don't have anything that generates annotated/legible assembly for them either... )

But, yeah, if all the tools we have at the moment don't make for a moderately readable textual form, I wouldn't fundamentally object to checked in object files and source code/repro steps. Possible that the missing element right now is textual/readable dwp support, so maybe that's the limit/justification for using checked in binaries for this issue.

Pinging @Higuoxing, as he was the one who actually did a lot of the more recent yaml2obj DWARF implementation and may have some feedback on how best to reduce things.

Picking up on a couple of different points:

As @dblaikie mentioned, cross-project-tests is primarily for integration testing between LLVM components. One use case that is on the boundary line is llvm-symbolizer and other tool testing that need to consume a linked binary for their test case, as you can't use LLD outside of the cross-project-tests and the lld project. yaml2obj could be used for generating the test input objects in such cases, but as already noted, it may not be practical for some cases to reduce the YAML down to a usable document.

Regarding checked-in binaries as test inputs: I'd always prefer to avoid this, even if the only alternative is using an opaque assembly or YAML file as input, with instructions on how to generate it (specifically the input source, the compilation/linking commands, and the version of the tool(s) used in generating it). The reason for this is repository size: git isn't very good at keeping the size of committed binaries to a minimum, meaning that every binary ever checked into the repo, even if not still in use contributes to the size of people's local git repositories. It gets worse when that binary gets updated too. Another advantage with the checked-in source approach is that it's easy to get diffs between versions: say a tweak was made to the test input, it is usually fairly straightforward to see the differences that such a tweak cause from a diff.

Pinging @Higuoxing, as he was the one who actually did a lot of the more recent yaml2obj DWARF implementation and may have some feedback on how best to reduce things.

Picking up on a couple of different points:

As @dblaikie mentioned, cross-project-tests is primarily for integration testing between LLVM components. One use case that is on the boundary line is llvm-symbolizer and other tool testing that need to consume a linked binary for their test case, as you can't use LLD outside of the cross-project-tests and the lld project. yaml2obj could be used for generating the test input objects in such cases, but as already noted, it may not be practical for some cases to reduce the YAML down to a usable document.

Regarding checked-in binaries as test inputs: I'd always prefer to avoid this, even if the only alternative is using an opaque assembly or YAML file as input, with instructions on how to generate it (specifically the input source, the compilation/linking commands, and the version of the tool(s) used in generating it). The reason for this is repository size:

Is this about file size in general (I'd expect an object file to be smaller than the equivalent yaml or assembly) - or it's about git and non-textual files? (I guess you're saying that it stores textual file updates as diffs rather than whole copies, but binary files it doesn't know how to encode diffs so it encodes the whole file each time?)

wlei updated this revision to Diff 409098.Feb 15 2022, 4:31 PM

remove *.yaml temporarily, only leave split-dwarf-split.dwo.yaml for easy review

wlei added a comment.Feb 15 2022, 4:35 PM

Pinging @Higuoxing, as he was the one who actually did a lot of the more recent yaml2obj DWARF implementation and may have some feedback on how best to reduce things.

Picking up on a couple of different points:

As @dblaikie mentioned, cross-project-tests is primarily for integration testing between LLVM components. One use case that is on the boundary line is llvm-symbolizer and other tool testing that need to consume a linked binary for their test case, as you can't use LLD outside of the cross-project-tests and the lld project. yaml2obj could be used for generating the test input objects in such cases, but as already noted, it may not be practical for some cases to reduce the YAML down to a usable document.

Regarding checked-in binaries as test inputs: I'd always prefer to avoid this, even if the only alternative is using an opaque assembly or YAML file as input, with instructions on how to generate it (specifically the input source, the compilation/linking commands, and the version of the tool(s) used in generating it). The reason for this is repository size:

Is this about file size in general (I'd expect an object file to be smaller than the equivalent yaml or assembly) - or it's about git and non-textual files? (I guess you're saying that it stores textual file updates as diffs rather than whole copies, but binary files it doesn't know how to encode diffs so it encodes the whole file each time?)

Yeah, object file is actually smaller than the yaml file if yaml file can't be reduced, take split-dwarf-split.dwo for example: dwo file 1.5K vs yaml file 2.6K. I think if it's read-only(like this testing), it doesn't make any difference for git diff.

But I understand yaml file is good for readability for sure, I'm trying to reduce it..

Do you know how to reduce the field of split-dwarf-split.dwo.yaml? see the patch, I found the four fields(.debug_str.dwo, .debug_str_offsets.dwo, .debug_info.dwo, .debug_abbrev.dwo) are must-have(removing will cause test failed), but apparently there're hex string in the Content which are not ideal.

I run obj2yaml split-dwarf-split.dwo > split-dwarf-split.dwo.yaml to get the yaml file, do I miss anything? (I assume /llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml is also created by obj2yaml) or currently obj2yaml/yaml2obj lack to support *.dwo(as you said "it might not/probably doesn't support things like dwp indexes")?

Is this about file size in general (I'd expect an object file to be smaller than the equivalent yaml or assembly) - or it's about git and non-textual files? (I guess you're saying that it stores textual file updates as diffs rather than whole copies, but binary files it doesn't know how to encode diffs so it encodes the whole file each time?)

My understanding (which admittedly may be incorrect, as I'm by no means a git expert), is as you describe in the parentheses, so yeah, I'm talking about the git repository (specifically the .git folder).

Do you know how to reduce the field of split-dwarf-split.dwo.yaml? see the patch, I found the four fields(.debug_str.dwo, .debug_str_offsets.dwo, .debug_info.dwo, .debug_abbrev.dwo) are must-have(removing will cause test failed), but apparently there're hex string in the Content which are not ideal.

I run obj2yaml split-dwarf-split.dwo > split-dwarf-split.dwo.yaml to get the yaml file, do I miss anything? (I assume /llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml is also created by obj2yaml) or currently obj2yaml/yaml2obj lack to support *.dwo(as you said "it might not/probably doesn't support things like dwp indexes")?

I wouldn't be surprised if .dwo sections aren't supported yet - I know that @Higuoxing didn't finish everything. That being said, I think it's unlikely to be all that tricky to implement, if you wanted to do some experiments, by basing the implementation on the existing debug section implementations.

wlei updated this revision to Diff 409841.Feb 17 2022, 7:23 PM

Simplify source code

wlei added a comment.Feb 17 2022, 11:47 PM

Is this about file size in general (I'd expect an object file to be smaller than the equivalent yaml or assembly) - or it's about git and non-textual files? (I guess you're saying that it stores textual file updates as diffs rather than whole copies, but binary files it doesn't know how to encode diffs so it encodes the whole file each time?)

My understanding (which admittedly may be incorrect, as I'm by no means a git expert), is as you describe in the parentheses, so yeah, I'm talking about the git repository (specifically the .git folder).

Do you know how to reduce the field of split-dwarf-split.dwo.yaml? see the patch, I found the four fields(.debug_str.dwo, .debug_str_offsets.dwo, .debug_info.dwo, .debug_abbrev.dwo) are must-have(removing will cause test failed), but apparently there're hex string in the Content which are not ideal.

I run obj2yaml split-dwarf-split.dwo > split-dwarf-split.dwo.yaml to get the yaml file, do I miss anything? (I assume /llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml is also created by obj2yaml) or currently obj2yaml/yaml2obj lack to support *.dwo(as you said "it might not/probably doesn't support things like dwp indexes")?

Thanks for the discussion. I didn't chose to move this testing to cross-project-tests because this doesn't intend to test out-tree components(clang or lld) at all, it just aims to test if debuginfo can be loaded(doesn't matter the content) by llvm-profgen. I just simplified the source code of testing, the dwo file size is much small(800Byte) and the dwo.yaml file size is actually bigger 1.7K. Also I think this feature(testing if debuginfo can be loaded) is unlikely to be frequently modified to affect the repo size.

I wouldn't be surprised if .dwo sections aren't supported yet - I know that @Higuoxing didn't finish everything. That being said, I think it's unlikely to be all that tricky to implement, if you wanted to do some experiments, by basing the implementation on the existing debug section implementations.

As you mentioned, the human readable section of dwo debug_* sections are unsupported for now and I also found some unsupported sections in the linked binary which is required by this testing. I feel it would be a separate task to implement dwo2yaml feature, it seems kind of counter-productive to block a work by the unsupported infra of test.

Would it be good to check in with the binary or raw yaml for now and I will fix this once dwo2yaml is ready?

Seems like you've accidentally added an empty file to cross-project-tests?

llvm/test/tools/llvm-profgen/Inputs/split-dwarf-split.dwo.yaml
2

Add a comment to the start or end of this file explaining how to regenerate this YAML (as described earlier - original source, commands and versions of tools used), so that it can be easily updated in the future.

But, yeah, if all the tools we have at the moment don't make for a moderately readable textual form, I wouldn't fundamentally object to checked in object files and source code/repro steps. Possible that the missing element right now is textual/readable dwp support, so maybe that's the limit/justification for using checked in binaries for this issue.

Agreed with @dblaikie.

IIUC the reasons/goals yaml was suggested are: 1) human readable, editable test; 2) smaller in file size. But it turns out in this particular case for dwo/dwp, yaml achieves neither of the two. I understand that there's an eventual state we want to be in, but it has a dependency on better tooling support which doesn't exist yet.

Given what we have right now, why do we still want to push for yaml in this particular case as it achieves none of the two goals? I think it makes more sense to go with binary for this case, and we can regenerate test inputs when yaml tool has better support - better means actually achieving its goals.

As for using cross-project-test, I feel that is hack - there's no integration across projects involved here, so the test doesn't belong there.

WDYT @jhenderson?

Currently, yaml2obj doesn't support generating split dwarf object files. I can help add support for some of the dwo sections but it needs several weeks? I guess. I proposed D120184 for adding support for emitting/dumping the .debug_str.dwo section.

But, yeah, if all the tools we have at the moment don't make for a moderately readable textual form, I wouldn't fundamentally object to checked in object files and source code/repro steps. Possible that the missing element right now is textual/readable dwp support, so maybe that's the limit/justification for using checked in binaries for this issue.

Agreed with @dblaikie.

IIUC the reasons/goals yaml was suggested are: 1) human readable, editable test; 2) smaller in file size. But it turns out in this particular case for dwo/dwp, yaml achieves neither of the two. I understand that there's an eventual state we want to be in, but it has a dependency on better tooling support which doesn't exist yet.

Given what we have right now, why do we still want to push for yaml in this particular case as it achieves none of the two goals? I think it makes more sense to go with binary for this case, and we can regenerate test inputs when yaml tool has better support - better means actually achieving its goals.

As noted above, the file size issue isn't about the physical size on disk of the file in the checked out testsuite, but rather the potentially long-term impact on the .git folder size. This latter impact grows significantly every time the file gets modified, as I understand it (caveat: I haven't researched this and am only going on what others have told me), as the entire file is stored there, rather than a diff. Based on the sizes specified above, it would only take approximately 2 later updates to the binary to ensure it has a permanent impact on repo size that is greater than the YAML file does, even if only a single byte has changed. You'd also still need to check in source and instructions to allow regenerating the binary in the future (this latter point is required regardless of the approach is needed, of course, until tooling is sufficiently good. One advantage with the YAML approach is that you can store this source and instructions in the same file, rather than separate, where you coudl end up with a dead references (e.g. the binary is later replaced with a hand-crafted YAML, whilst the source and instructions weren't deleted).

wlei updated this revision to Diff 410625.Feb 22 2022, 1:07 PM

Addressing jhenderson's feedbacks: remove the emptyt file(added by mistake) in cross-project-test, add build instructions in the comments.

jhenderson added inline comments.Feb 23 2022, 12:35 AM
llvm/test/tools/llvm-profgen/Inputs/split-dwarf-single.o.yaml
14

Also include the obj2yaml command for full reproducible steps.

wlei updated this revision to Diff 410737.Feb 23 2022, 12:49 AM

addressing feedback: add obj2yaml cmd.

Nothing more from me at this point, thanks. (Not approving as I don't know this code at all).

wlei added a comment.Feb 23 2022, 2:03 AM

Nothing more from me at this point, thanks. (Not approving as I don't know this code at all).

I'm sure the only left concern was about the way of testing. Thanks for your review and suggestion on the testing. @jhenderson

wenlei accepted this revision.Feb 23 2022, 9:29 AM

lgtm

This revision was not accepted when it landed; it landed in state Needs Review.Feb 23 2022, 9:41 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.