This adds inline stack frames for symbolizing on Windows.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This change causes about 30 asan tests to fail, so I still have to fix those.
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
---|---|---|
59 | I think this sometimes returns true incorrectly, seems like it happens when there are multiple ChangeCodeLength annotations. Maybe I'm interpreting those incorrectly? |
This looks pretty good.
- Are the sources for the binary test inputs around somewhere so they could be recreated if necessary someday?
- The test turns a small stack into file+line numbers. Is that sufficient or should there also be one to get function name+offsets?
- Most of the formatting nits listed by lint are legit and should be fixed before committing.
I'll be curious to hear what you figure out about the ASAN tests.
[It's so weird how methods likes getSymbolById return std::unique_ptrs. That's not a bug in this patch, just an unintuitive interface design that we probably inherited from the original DIA wrappers. Every time I see it, though, I end up chasing down calls to make sure it's correct.]
llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h | ||
---|---|---|
134 | I don't see anything explicit in LLVM's style guidelines, but I think it's common to segregate the data members and methods whenever possible. So perhaps the parseSectionContribes() should be moved up. | |
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
59 | I don't know much about these annotations. Can these be nested? For example if foo runs from f0 to f1 and the annotations also describe another subrange inside of f0..f1, then I would think Found would have to be a counter rather than a bool. If it's 0, you're not inside any range. If it's 2, you're inside two nested ranges. If they don't nest, then I don't see a problem. |
-Add cpp source for the test binaries
-Fixed some of the lint errors (mostly adding consts), but some of the formatting ones conflict with git clang-format.
Doesn't it also have function names? The asan tests will also be providing some test coverage for this.
I think most of them are probably windows tests that don't expect inline stack frames, but I haven't actually fixed any of them yet.
There are also some non windows-specific tests that are failing because the windows inline stack frames are apparently different from the linux ones (e.g. compiler-rt/test/asan/TestCases/calloc-overflow.cpp). Seems like the first line on the linux stack trace is a macro, and in the windows stack trace the first line is a function that was in the macro.
Ok, as far as I can tell, all of the asan tests are failing for the same reason-- the symbolizer now outputs an extra line for __sanitizer::BufferedStackTrace::Unwind.
#0 0x7ff6f9fa7e64 in __sanitizer::BufferedStackTrace::Unwind C:\src\llvm-project\compiler-rt\lib\sanitizer_common\sanitizer_stacktrace.h:124 #1 0x7ff6f9fa7e64 in malloc C:\src\llvm-project\compiler-rt\lib\asan\asan_malloc_win.cpp:98
on the failing asan tests: I assume the symbolizer is doing the correct thing (since the native and DIA implementations do the same thing), so I think I'll just update the tests to accept the extra line. I was wondering why we don't get this line on Linux, though, so I looked at where the address is in the calloc function in assembly, and it seems to be in a different part of the function on Windows and Linux.
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
---|---|---|
59 | I figured out why I was getting different results with DIA, and it turned out to be unrelated to the inline code. (I wasn't specificying PDB_SymType::Function when searching for the parent function, and I guess my implementation for findSymbolByAddress is different). |
I guess I misspoke when we chatted, I think we need to avoid these extra frames. Here's the line of code that I think is executing:
https://github.com/llvm/llvm-project/blob/d784f7406911c4fb6bc559320f7f9ff134be7ff5/compiler-rt/lib/asan/asan_stack.h#L45
stack.Unwind(StackTrace::GetCurrentPc(), \ GET_CURRENT_FRAME(), nullptr, fast, max_size); \
Here's what I think might be happening: in the MS C++ ABI, argument evaluation has to be right-to-left. On Linux, it is left to right. So, on Linux, we get assembly that looks like this:
callq GetCurrentPC movq $0, %rdi # set up other args ... # inlined callsite for Unwind
But on Windows, that's rearranged like so:
movq $0, %rcx # set up other args callq GetCurrentPc # set up rightmost arg last # inlined call site for Unwind
I guess to sort it all out, the thing to do is to get the annotated assembly produced by clang-cl, and look at the assembly stream. It will have the human-readable .cv_loc directives to help us work out where to apply the fix. We could, for example, subtract one from the result of GetCurrentPc().
Hm, ok. Here's the assembly around GetCurrentPc. I haven't looked too far into it, but I guess I'll put it here for now.
On Linux:
.LBB4_9: callq _ZN11__sanitizer10StackTrace12GetCurrentPcEv@PLT .Ltmp72: movq %rax, %r12 movq _ZN11__sanitizer21common_flags_dont_useE@GOTPCREL(%rip), %rax movb 34(%rax), %bl callq _ZN6__asan20GetMallocContextSizeEv@PLT .Ltmp73: .loc 2 0 3 is_stmt 0 # llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:0:3 xorl %ecx, %ecx .Ltmp74: .loc 3 116 31 is_stmt 1 # llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_stacktrace.h:116:31
On Windows:
.LBB9_4: # %if.else #DEBUG_VALUE: calloc:size <- $r15 #DEBUG_VALUE: calloc:nmemb <- $rdi .cv_loc 16 2 114 0 # C:/src/llvm-project/compiler-rt/lib/asan/asan_malloc_win.cpp:114:0 mov sil, byte ptr [rip + "?common_flags_dont_use@__sanitizer@@3UCommonFlags@1@A"+34] lea r14, [rbp - 128] call "?GetCurrentPc@StackTrace@__sanitizer@@SA_KXZ" xor ecx, ecx .Ltmp70: #DEBUG_VALUE: Unwind:this <- [DW_OP_constu 80, DW_OP_minus, DW_OP_stack_value] $rbp #DEBUG_VALUE: Unwind:context <- 0 #DEBUG_VALUE: Unwind:request_fast <- [DW_OP_LLVM_convert 1 7, DW_OP_LLVM_convert 8 7, DW_OP_stack_value] undef #DEBUG_VALUE: Unwind:max_depth <- $ebx #DEBUG_VALUE: Unwind:pc <- $rax #DEBUG_VALUE: Unwind:bp <- $r14 .cv_inline_site_id 19 within 16 inlined_at 2 114 0 .cv_loc 19 3 116 0 # C:/src/llvm-project/compiler-rt/lib\sanitizer_common/sanitizer_stacktrace.h:116:0
It looks like the first frame on Windows doesn't even point to GetCurrentPc; it points to the call to UnwindImpl.
Well, from reading the code, I would expect PC to be the return address of GetCurrentPc, which points to the xor %ecx, %ecx instruction, which is not in the inline call site. The .cv_inline_site_id directive happens at the next instruction. Maybe there is a bug in the way this code compares against inline site begin/end markers.
It's also possible that LLVM's own inline line annotations aren't correct. One thing you could try to debug this is to load up windbg, run an asan test, set a breakpoint on GetCurrentPc, take a stack trace from inside it, and see what windbg says. If it lists the UnwindImpl frame, then it must be a bug in LLVM's inline line table annotations, not your code.
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
---|---|---|
70 | I wonder if this should be > instead of >=. Consider the case of: callq somewhere .cv_loc ... inline location nop # anything Otherwise, this seems like it should work to me. |
The result of GetCurrentPc does point to the GetCurrentPc return address. If I give that address to llvm-symbolizer it correctly gets calloc as the function. (And the call stack from the VS debugger looks correct as well). So as far as I can tell, LLVM's inline line annotations are correct and the symbolizer is doing the right thing.
However the first address in the stack trace is not that address.
I looked a bit more in the debugger yesterday asan's stack trace code, and stack traces are being created here. I guess it searches for the result of GetCurrentPc in the stack trace, but for some reason it's not there and the closest thing is the UnwindImpl return address?
Wow, that is some interesting code. :) That explains what is going on: we get the current PC, then we do a stack unwind later (inside the inline call frame), then we take the PC, and truncate the stack trace with it. The return address in the stack trace actually *is* inside the inlined call site, so this code never would've worked with inline line info.
I think this can be fixed by modifying the stack trace in place, after popping off the extra stack frames. After this line:
https://github.com/llvm/llvm-project/blob/1882568fcb08ed8af689f13826cc7e84c3c84e33/compiler-rt/lib/sanitizer_common/sanitizer_unwind_win.cpp#L39
... overwrite the first PC in the stack trace with pc. It could work. :)
I'd like to be able to test this code more thoroughly. Take a look at llvm/test/DebugInfo/symbolize-paths.s. Can we construct a similar test with an assembly file? What would it take to get that set up? That would allow us to write tests with really fine grained control over the .cv_loc positioning, so we could test all the edge cases for what happens when you symbolize a PC that falls in a gap in an inline call site.
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
---|---|---|
70 | Any thoughts on this? |
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
---|---|---|
70 | Hm, the things I've seen are something like func: .cv_inline_site_id 2 ... .cv_loc 2 ... movl ... # inlined code addl ... # inlined code .cv_loc 1 nop # some other instructions and the address of the movl would be the starting offset of the S_INLINESITE. So the < would be non-inclusive but not the >=. |
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
---|---|---|
70 | Got it, makes sense. I think I was thinking about return addresses, where the debugger is stopped after the executing instruction, and it wants to report the line info for the previous instruction. |
I added an assembly test and removed the other test in llvm/test/tools, since it pretty much does the same thing.
I'm not sure how much there really is to test in terms of edge cases; I just put in one nested inline site and some line and file changes.
llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp | ||
---|---|---|
566 | fyi, I removed this bit of code since it was buggy. It basically searches for line numbers in the next compile unit if we haven't exceeded the length. I was just trying to match DIA's behavior here, but I don't think we ever use this |
lgtm, but please address the commented out code before landing.
Sorry for the delay, I wrote all these comments, and then didn't hit send.
llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h | ||
---|---|---|
133 | You know, we shouldn't actually need to create this intermediate data structure. The section contributions in the PDB are sorted by ascending RVA. It should be possible to binary search them directly with std::lower_bound without building a new interval map data structure. But, I see you are just moving this code around, so this can be a follow-up optimization. | |
llvm/include/llvm/DebugInfo/PDB/Native/SymbolCache.h | ||
40 | All the const/mutable changes make sense, the cache is "notionally" const. Looking things up fills the cache, but doesn't affect results. | |
llvm/lib/DebugInfo/PDB/Native/NativeFunctionSymbol.cpp | ||
113–114 | Getting here requires doing O(log(#inputsections)) work, which is pretty quick. This loop here is O(#symbols in function). Maybe one day the lookup will need to be faster or be cached, but I don't see any obvious way to do that today. | |
llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp | ||
74 | Please remove the commented out part. | |
566 | Makes sense. |
llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h | ||
---|---|---|
133 | Hm, yeah, I'll look at it later. |
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 | After updated the dependencies (adding a dependency on llvm-symbolizer from lld's tests so this test could run 27e73816d6f9a7e627db73c445c4329db2ecfeaf ) that got me looking at this/thinking: If this is the first use of llvm-symbolizer in lld, maybe it's out of place here? Indeed this patch made no changes to lld, so it seems unsuitable that tests be added to lld - changes to llvm should be tested from within llvm. For ELF, at least, we're leaning towards writing hand-crafted assembly and the assembling that (with llvm-mc) and running llvm-symbolizer on the assembled file. If that model would work for COFF that'd be great - but otherwise it is acceptable to include source and repro steps in a file, and checkin a binary file for running llvm-symbolizer over. |
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 | IMO it is important for test readability that we start with assembly, not a checked in binary file. It allows us to come up with creative .cv_loc transitions from one instruction to the next, and validate that we get the right source location at each instruction boundary. There is prior art for using llvm-mc to produce an object file in llvm and then using llvm-symbolizer on that object file, but it's impossible to do the same for COFF. llvm-symbolizer expects to operate on a PDB file. The only tool capable of making a PDB from an object right now is LLD. While it's unfortunate that the test lives in the wrong repo, the great increase in testability makes it worth it to me. Debug info is historically undertested or only tested via interactive debugger integration tests. I think there is a huge amount of value to this level (medium size integration?) of testing. |
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 |
I agree that it's important - but to me, not at the cost of leaving the code untested in LLVM. That code is tested in the repository where it's committed seems like a fairly core value of the LLVM project in my experience.
Not sure I follow here - a checked in binary, built from source including creative .cv_loc transitions, etc, could have the same test coverage, right?
Yeah, I still have sort of mixed feelings about that - but generally accepted the tradeoff (of increasing the size of the code underneath the test - including all of llvm-mc, rather than only the dumping/symbolizing code itself) in test maintainability is worthwhile, given the fairly limited complexity of functionality in llvm-mc that's required for most of these tests. (mostly don't care about eh_frame, so the line table generation is about the worst part of that - the rest of the DWARF is generated using basic assembly directives so the probability of bugs in llvm-mc messing with the tests is relatively low)
Yeah, I thought that might be the case hence the caveat that maybe a checked in binary would be needed.
I'm not sure it's any less testable, though - generating and checking in a binary doesn't seem like a huge hurdle to writing tests here. It was done in the past & while moving to assembly based tests has some value add, I don't think the checked in binaries testing libDebugInfo have been a major burden on LLVM development/maintenance.
I don't think this test is adding significantly more/different testing than I think would be reasonably requested otherwise. An LLVM test with a checked in binary and an lld test using assembly and a debug info dump of some kind (if that isn't already covered - I'd have assumed it was already covered when the relevant functionality was added to lld in other/previous patches?).
I'm not sure I see this particular test providing a lot more value over testing the components separately. And even then, I think it's important to have the isolated testing in addition to any end-to-end testing. There's the debuginfo-tests directory for more end-to-end testing, if that's desired (not sure if/how well that fits/is usable for COFF/PDB testing, though - could be worth expanding to support if it's not already testable there). Could have a full end-to-end test there, source code down to PDBs and symbolized. (might even be able to be portable between DWARF and PDB?) |
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 |
We could also add a binary for testing in the llvm-symbolizer tests, where the other tests are. I didn't consider the fact that the code is untested in LLVM. I agree that the assembly test doesn't provide any extra coverage that a checked in binary can't; I think the main upside of including the assembly test in lld is so that the test is easier to understand/modify. But maybe it's not too difficult to just include the assembly file in the llvm test inputs directory and re-build the binary whenever the test needs to be changed. |
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 |
Agreed that being able to write the assembly is beneficial (though even the DWARF tests from assembly aren't all that maintainable - maybe the yaml2obj will eventually make DWARF more human writable) - but hopefully not too bad to write a test like the many other llvm-symbolizer tests that are using checked in binaries. (I'd argue, though not as firmly as I am about there being some testing in llvm itself, that once such testing is added, the lld test should probably be removed - as we tend not to intentionally implement end-to-end tests like this in LLVM - maybe something really end-to-end in the debuginfo-tests repository would be suitable) Broader/side question: Is it possible to implement pdbs in assembly? Are they "just" a COFF file with particular bytes in them? Are those bytes reasonable for a human to write, or do they, for instance, contain hash tables or other things that would be hard to construct by hand? |
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 |
We did host an intern whose project goal was to make CodeView assembly tests easier to write and maintain. The same could be done for DWARF. We made several directives to support a lot of the hard-to-write-by-hand parts into MC, the assembler. For example, .cv_inline_line_table is a directive that generates these wacky state machine opcodes. I suppose for DWARF we are more constrained by what gas can do.
I think debuginfo-tests aren't really the right place for this. They mostly focus on testing integration with interactive debuggers. To my knowledge, we aren't running debuginfo-tests continuously on Windows yet. The setup and configuration was sufficiently difficult that it fell off the top of the stack.
Nope, they are not COFF files. PDBs are a kind of "multistream file" (MSF). We do have the means to make PDB files from yaml, though, so we could test things that way. However, the inline line tables, the thing under test in this case, are not represented symbolically, I believe they are just strings of hex bytes. We could make more testing tools to make PDBs, but at a certain point you're going to apply COFF relocations, and now you have yourself a second linker. I think if we want to make llvm-symbolizer more testable from LLVM, the way to go is to:
|
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 |
Just chiming in that @Higuoxing spent this year's GSOC period significantly improving DWARF support in yaml2obj and to a lesser extent obj2yaml. I don't think it's covered every case yet, but it's certainly possible to write many more tests using YAML than it was before (see llvm\test\tools\yaml2obj\ELF\DWARF for a number of tests that demonstrate the behaviour that has been added), and with greater ease, as some of the information at least is now automatically derived by yaml2obj without needing to be explicitly specified in the YAML itself. Unfortunately, that doesn't help with COFF/PDB of course. |
lld/test/COFF/symbolizer-inline.s | ||
---|---|---|
3–5 |
I think we can implement assembly extensions without necessarily mandating that we only accept ones that gas accepts - especially if they're more for our own test usage (though I think yaml2obj is somewhat nicer/keeps these internal test utilities separate - since we'd have no real intent of using these assembly directives in the wild).
Perhaps - I'd be happy to widen the use case of debuginfo-tests for other things like symbolizing. And if the existing interactive debugger parts weren't Windows suitable/of interest, I'd be OK with it being possible to run those symbolizer bits without the interactive debugger bits. But would provide a space for a full clang+lld+llvm-symbolizer test cases, which currently can't be done in either the clang or lld repository, for instance.
Any chance of improving the yaml support so those can be written symbolically? (not sure where that sits compared to other things being suggested/considered)
Yeah, not sure that'd be the best bang-for-buck.
Yeah, +1 for that direction from me, at least.
|
(Taking out-of-line since the comment thread has rather diverged from the original thing)
Perhaps - I'd be happy to widen the use case of debuginfo-tests for other things like symbolizing. And if the existing interactive debugger parts weren't Windows suitable/of interest, I'd be OK with it being possible to run those symbolizer bits without the interactive debugger bits. But would provide a space for a full clang+lld+llvm-symbolizer test cases, which currently can't be done in either the clang or lld repository, for instance.
We've internally been considering a need for an "integration" lit-based testsuite that is able to test the interactions in tools that would otherwise not really fit in one or the other location. We have our own downstream non-lit based test repository, but for some tests this is too heavyweight, and it would be nice to test things more locally. Examples include showing, for example, that tools can consume debug data produced by clang (note that because defaults might change as to what version of DWARF is emitted, an llvm-mc assembly or YAML based test here wouldn't be appropriate), or similarly for relocations being handled by the downstream tools. The intent here is not to replace targeted lit tests, but rather to show that end-to-end behaviour is still sensible.
Maybe this idea could be useful upstream? It would need the ability to create a lit-based testsuite, probably as a separate top-level directory in the llvm-project tree, which can use components from all projects. We'd want REQUIRES support that could be used to say REQUIRES: clang or similar so that users could still run the subset of tests that work with the projects they build.
Yeah - I'd probably first suggest that such tests could go into debuginfo-tests, though if it's meant to extend to things other than debug info - yeah, another top level repo might be suitable. Maybe even the test-suite itself, though that's a bit of a dark place and maybe a distinct place with narrower, pure lit based testing would be beneficial.
An integration style testing for llvm-symbolizer (both clang and lld are allowed) sounds good to me. The current llvm-symbolizer tests for ELF mostly focus on relocatable objects. There are very few shared object/executable targeted tests.
I've made a note on our internal issue tracker item for our integration suite idea, to keep an eye on this. I don't if or when we'll get around to it, so if somebody else wants to get the ball rolling, just let me know what people have done and I can update our tracker accordingly, to make sure we don't duplicate effort. I guess the first stage would be to propose and RFC.
A couple of notes in case anybody picks this up: 1) @probinson mentioned on our internal issue tracker that there was an end-to-end round table discussion that touched on this topic at the 2019 developers meeting, so there's probably some wider interest outside those involved in this discussion. 2) I think it needs to be integrated with the check-all target, so that people and bots will run it properly.
After updated the dependencies (adding a dependency on llvm-symbolizer from lld's tests so this test could run 27e73816d6f9a7e627db73c445c4329db2ecfeaf ) that got me looking at this/thinking: If this is the first use of llvm-symbolizer in lld, maybe it's out of place here?
Indeed this patch made no changes to lld, so it seems unsuitable that tests be added to lld - changes to llvm should be tested from within llvm. For ELF, at least, we're leaning towards writing hand-crafted assembly and the assembling that (with llvm-mc) and running llvm-symbolizer on the assembled file. If that model would work for COFF that'd be great - but otherwise it is acceptable to include source and repro steps in a file, and checkin a binary file for running llvm-symbolizer over.