Support to load debug info from dwarf split file, like .dwo, .dwp files. Leverage the getNonSkeletonUnitDIE(false) API to achieve this.
Add test cause to make sure all the ranges is well retrieved by the loader.
Differential D115973
[llvm-profgen] Support symbol loading for debug fission wlei on Dec 17 2021, 2:59 PM. Authored by
Details Support to load debug info from dwarf split file, like .dwo, .dwp files. Leverage the getNonSkeletonUnitDIE(false) API to achieve this. Add test cause to make sure all the ranges is well retrieved by the loader.
Diff Detail
Event Timeline
Comment Actions addressing reviewers' feedback Add --dwp=DWPPath to specify dwo/dwp file path. Add more testcase(-gsplit-dwarf=single, -gsplit-dwarf=split, -fsplit-dwarf-inlining)
Comment Actions Thanks for bringing up the debug fission support.
Comment Actions add test case for -fsplit-dwarf-inlining without --dwp
Comment Actions Prebuilt .o files are strongly discouraged. They are opaque and difficult to update / difficult to compare across versions. Comment Actions Not sure that the DWARF yaml is especially readable - I think we usually just use raw assembly for this sort of thing? But yeah, we don't tend to check in object files anymore. Comment Actions In 2021-01, https://lists.llvm.org/pipermail/llvm-dev/2021-January/148048.html ([RFC] Cross-project lit test suite) discussed some pain in cross-project debug info testing. Perhaps @jhenderson has some idea for the clang usage in the test.
Comment Actions I've not dug into this patch at all, but in general terms, the way I see it, you have about three options to craft DWARF in test objects these days:
Comment Actions Sorry for the late reply, I'm taking time off recently. @MaskRay @dblaikie @jhenderson Thanks for your helpful feedbacks!
I've updated all binaries using yaml2obj to generate the .o/.dwo/executable binary. I assume the .o files you mentioned also include the executable binary. As llvm-profgen always require an executable binary(--binary=<binary>), here I converted all the .o/.dwo/exe to yaml format. Thanks for the detailed instructions. Updated the test using yaml2obj.
Thanks! Updated the test using -fdebug-compilation-dir=., now --dwp is only for dwp file. Yeah, it have to cd into the test directory to make the lit work.
Comment Actions Did you read jhenderson's point?
The tests are clumsy. I don't think we should land the change as is. Comment Actions @MaskRay Thanks for your feedbacks! I did try to reduce the *.exe.yaml file but found that it's hard to do this because there're a lot of dependences, I think it's also less productive for people in the future to reproduce that.
Yeah, but the test is tested against those debug sections, I guess using assembly should have the same issue. Considering this, it seems the only way to leave for cross-project-tests?
Could you elaborate more on this? The exe binary is the root dependence of those tests. Regarding the stub test, do you mean leave a .c source file in llvm/test/.., then move the whole test file into cross-project-tests? I did also play with the cross-project-tests, but it seems it isolates the test from the llvm tool(llvm-profgen here). Before we can just ninja check-llvm-tools, but now we need to run more cmds. You see, in llvm-symbolizer test https://github.com/llvm/llvm-project/tree/main/llvm/test/tools/llvm-symbolizer/Inputs, there're still many exe for split dwarf. This is exactly the same to this patch, I'm curious is there any concerns not moving llvm-symbolizer test into cross-project-tests? cc @jhenderson @dblaikie Comment Actions Bit of history - way back in the day, we used checked in binaries (objects or executables) to test binary utilities like llvm-symbolizer (the thinking being that that was the input to the tool, we didn't want the test to cover more/unrelated tools (like assemblers) so that they were isolated and wouldn't fail for unrelated reasons (like the assembler being broken)) This made tests hard to read, etc. Eventually (I think with lld testing, maybe before) that softened to "let's check in assembly and depend on the LLVM integrated assembler to assemble those into binaries and test the resulting binaries" - the thinking being that the assembler is simple/robust enough, and the readability/maintainability improvement for the tests was worth the small chance of unrelated failures. But it's not easy to hand-craft DWARF assembly (or some other object file features) - so they're still kind of read-only tests, usually generated from some source code provided in a comment or in a side file (like one in Inputs, that isn't actually read by any test, but exists there for documentation and if the test assembly needs to be modified/regenerated). There's also yaml2obj - which I think lets you make somewhat hand-writable DWARF for some DWARF features at least and more object/non-DWARF features - which, if possible, would be preferred and it'd be great if it were improved to the point where lots of these object utility tests could be written in terms of hand-crafted/directly editable yaml2obj inputs - but I don't think it has enough feature coverage for that to entirely be the case yet. Probably more "you could generate an object file, obj2yaml it, then maybe it'd be useful for minor edits/easier to read than raw assembly, but you'd probably still want to include the source/compilation command in a comment or side file for posterity/documentation/etc".
Yeah, cross-project-tests in my fairly stern opinion, should not be a substitute for in-tree tests. All functionality should include in-subproject testing where the functionality resides (if you touch clang source in a way that's observable (ie: not NFC) - then there should be a clang test that goes with it). cross-project-tests can be used for integration. (eg: if you test that clang generates this IR, and you test that LLVM optimizes that IR to something important - you could have a cross-project-test that verifies that compiling some source code with clang generates that important assembly, so that the handshake between clang and LLVM doesn't get out of sync/lost). Comment Actions @dblaikie This's very informative, I have learned more contexts, thanks! I forgot to provide the special context about this testing: the purpose of this testing is to test whether llvm-profgen can use split dwarf binary as input to generate a non-empty profile, it doesn't really care about the source code and it doesn't intend to test any debug info features or the integration(clang or lld), we should have other dedicated testings for them. Hence, to narrow the scope, it just used a very simple source code(with building instructions in comments), it's very unlikely to break any debug info features or integration, I have never got any related failures in other tests of llvm-profgen before. So I lean toward keeping it into the in-tree tests. Also as shown in this version, it seems it doesn't improve the readability of the test using yaml2obj or assembly. While testing llvm-profgen, it's actually designed for read-only. People reproducing this need to call the clang cmd offline to generate the obj/exe, call obj2yaml, then spend some time to reduce the file, which seems inconvenient. @MaskRay I do appreciate your feedback very much, do you accept using the pre-built obj/exe for this test considering the aforementioned points? Comment Actions I've missed a step here - it'd still be desirable, if possible, to write the test with something like yaml2obj to make the input human readable so it would be easier to understand the test - the actual inputs and expected behavior. Though the current yaml isn't a great representation of that (I think the current yaml in this patch is probably worse than assembly - but there are maybe better options): you can see how to write DWARF in yaml in this test file: llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml - I'm not sure it's still /ideal/ for writing test cases, so it'd be interesting to see how it looks for an issue like this. (it might not/probably doesn't support things like dwp indexes... and having those as raw hex strings doesn't sound ideal, but I guess we don't have anything that generates annotated/legible assembly for them either... ) But, yeah, if all the tools we have at the moment don't make for a moderately readable textual form, I wouldn't fundamentally object to checked in object files and source code/repro steps. Possible that the missing element right now is textual/readable dwp support, so maybe that's the limit/justification for using checked in binaries for this issue. Comment Actions Pinging @Higuoxing, as he was the one who actually did a lot of the more recent yaml2obj DWARF implementation and may have some feedback on how best to reduce things. Picking up on a couple of different points: As @dblaikie mentioned, cross-project-tests is primarily for integration testing between LLVM components. One use case that is on the boundary line is llvm-symbolizer and other tool testing that need to consume a linked binary for their test case, as you can't use LLD outside of the cross-project-tests and the lld project. yaml2obj could be used for generating the test input objects in such cases, but as already noted, it may not be practical for some cases to reduce the YAML down to a usable document. Regarding checked-in binaries as test inputs: I'd always prefer to avoid this, even if the only alternative is using an opaque assembly or YAML file as input, with instructions on how to generate it (specifically the input source, the compilation/linking commands, and the version of the tool(s) used in generating it). The reason for this is repository size: git isn't very good at keeping the size of committed binaries to a minimum, meaning that every binary ever checked into the repo, even if not still in use contributes to the size of people's local git repositories. It gets worse when that binary gets updated too. Another advantage with the checked-in source approach is that it's easy to get diffs between versions: say a tweak was made to the test input, it is usually fairly straightforward to see the differences that such a tweak cause from a diff. Comment Actions Is this about file size in general (I'd expect an object file to be smaller than the equivalent yaml or assembly) - or it's about git and non-textual files? (I guess you're saying that it stores textual file updates as diffs rather than whole copies, but binary files it doesn't know how to encode diffs so it encodes the whole file each time?) Comment Actions Yeah, object file is actually smaller than the yaml file if yaml file can't be reduced, take split-dwarf-split.dwo for example: dwo file 1.5K vs yaml file 2.6K. I think if it's read-only(like this testing), it doesn't make any difference for git diff. But I understand yaml file is good for readability for sure, I'm trying to reduce it.. Do you know how to reduce the field of split-dwarf-split.dwo.yaml? see the patch, I found the four fields(.debug_str.dwo, .debug_str_offsets.dwo, .debug_info.dwo, .debug_abbrev.dwo) are must-have(removing will cause test failed), but apparently there're hex string in the Content which are not ideal. I run obj2yaml split-dwarf-split.dwo > split-dwarf-split.dwo.yaml to get the yaml file, do I miss anything? (I assume /llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml is also created by obj2yaml) or currently obj2yaml/yaml2obj lack to support *.dwo(as you said "it might not/probably doesn't support things like dwp indexes")? Comment Actions My understanding (which admittedly may be incorrect, as I'm by no means a git expert), is as you describe in the parentheses, so yeah, I'm talking about the git repository (specifically the .git folder). I wouldn't be surprised if .dwo sections aren't supported yet - I know that @Higuoxing didn't finish everything. That being said, I think it's unlikely to be all that tricky to implement, if you wanted to do some experiments, by basing the implementation on the existing debug section implementations. Comment Actions Thanks for the discussion. I didn't chose to move this testing to cross-project-tests because this doesn't intend to test out-tree components(clang or lld) at all, it just aims to test if debuginfo can be loaded(doesn't matter the content) by llvm-profgen. I just simplified the source code of testing, the dwo file size is much small(800Byte) and the dwo.yaml file size is actually bigger 1.7K. Also I think this feature(testing if debuginfo can be loaded) is unlikely to be frequently modified to affect the repo size.
As you mentioned, the human readable section of dwo debug_* sections are unsupported for now and I also found some unsupported sections in the linked binary which is required by this testing. I feel it would be a separate task to implement dwo2yaml feature, it seems kind of counter-productive to block a work by the unsupported infra of test. Would it be good to check in with the binary or raw yaml for now and I will fix this once dwo2yaml is ready? Comment Actions Seems like you've accidentally added an empty file to cross-project-tests?
Comment Actions
Agreed with @dblaikie. IIUC the reasons/goals yaml was suggested are: 1) human readable, editable test; 2) smaller in file size. But it turns out in this particular case for dwo/dwp, yaml achieves neither of the two. I understand that there's an eventual state we want to be in, but it has a dependency on better tooling support which doesn't exist yet. Given what we have right now, why do we still want to push for yaml in this particular case as it achieves none of the two goals? I think it makes more sense to go with binary for this case, and we can regenerate test inputs when yaml tool has better support - better means actually achieving its goals. As for using cross-project-test, I feel that is hack - there's no integration across projects involved here, so the test doesn't belong there. WDYT @jhenderson? Comment Actions Currently, yaml2obj doesn't support generating split dwarf object files. I can help add support for some of the dwo sections but it needs several weeks? I guess. I proposed D120184 for adding support for emitting/dumping the .debug_str.dwo section. Comment Actions As noted above, the file size issue isn't about the physical size on disk of the file in the checked out testsuite, but rather the potentially long-term impact on the .git folder size. This latter impact grows significantly every time the file gets modified, as I understand it (caveat: I haven't researched this and am only going on what others have told me), as the entire file is stored there, rather than a diff. Based on the sizes specified above, it would only take approximately 2 later updates to the binary to ensure it has a permanent impact on repo size that is greater than the YAML file does, even if only a single byte has changed. You'd also still need to check in source and instructions to allow regenerating the binary in the future (this latter point is required regardless of the approach is needed, of course, until tooling is sufficiently good. One advantage with the YAML approach is that you can store this source and instructions in the same file, rather than separate, where you coudl end up with a dead references (e.g. the binary is later replaced with a hand-crafted YAML, whilst the source and instructions weren't deleted). Comment Actions Addressing jhenderson's feedbacks: remove the emptyt file(added by mistake) in cross-project-test, add build instructions in the comments.
Comment Actions Nothing more from me at this point, thanks. (Not approving as I don't know this code at all). Comment Actions I'm sure the only left concern was about the way of testing. Thanks for your review and suggestion on the testing. @jhenderson |
Also include the obj2yaml command for full reproducible steps.