- User Since
- Jul 8 2015, 10:26 AM (309 w, 3 d)
Fri, Jun 11
I think this is moving in the right direction! However, the way ThreadTrace is specified still raises the same scalability concerns (see inline). The "instruction index" concept doesn't seem sufficient. Decoding needs to be done in two stages: a separate "trace index" concept would help with that. In the first stage, lldb identifies which subset of the trace it needs decoded by specifying start/stop trace indices. This should be fast, as counting the number of available trace bytes/"chunks" is O(1). Nothing needs to be decoded at this point. The second stage starts after a trace subset is decoded, and is where instruction indices are useful.
Thu, Jun 10
Thanks for sharing this! Specifically re: the -fprofile-generate v. MIP comparison for code coverage, perhaps it would be more fair to compare against -finstrument-function-entry-bare. The latter only adds one call per function, whereas -fprofile-generate can add instrument a function in more than one place.
@ggeorgakoudis apologies again for the delay here.
Wed, Jun 9
Tue, Jun 8
I'm quite concerned about the design decision to represent a trace as a vector of TraceInstruction. This bakes considerations that appear to be specific to Facebook's usage of IntelPT way too deep into lldb's APIs, and isn't workable for our use cases.
Thanks, excited to see this work progress!
Wed, Jun 2
If llvm stopped appending __profd_foo to llvm.compiler.used, there might not be anything left to prevent the optimizer from discarding it. The current scheme of having the linker ignore N_NO_DEAD_STRIP if S_ATTR_LIVE_SUPPORT is in effect seems like a neat way to solve the issue. Is there a tidy alternative?
While I haven't spotted any issues with this patch, I feel it'd be better for someone with more familiarity with COFF linking to offer a second +1.
Thanks for adding these excellent tests. Lgtm as well.
Thanks Adrian. + 1 from me as well, FTR.
Tue, May 25
Mon, May 24
Thu, May 20
Thanks, lgtm as well. On Darwin, the __llvm_prf_data section is marked with S_ATTR_LIVE_SUPPORT to allow the linker to dead strip functions even if they are pointed-to by a profd global. Removing the reference altogether should yield even better size benefits.
Wed, May 19
Tue, May 18
Mon, May 17
Address review feedback.
Fri, May 14
Thanks for the review!
Address code review feedback.
Great - I just wanted to understand the motivation. Thanks for the cleanup!
May 13 2021
Is there a planned change to the use-after-return instrumentation that would regress this test as-is?
May 12 2021
May 11 2021
Just so I'm clear, are you saying that every Counter and Expression generated is guaranteed to be added to the coverage map (so there are no index gaps), even if the counter is part of an unreachable (and therefore eliminated) branch/block?
Thanks for the context. I haven't looked at the Rust FE's coverage implementation, so I'm not familiar with any tradeoffs inherent to the implementation approach. One thing you wrote does concern me:
May 7 2021
Please take a look at this patch by Michael Daniels that implements a similar feature:
Thanks for digging into this. Could you explain why it's necessary to use more coverage counters than there are regions? If there's no longer a reserved counter (counter #0), then are counters N >= Regions.size() somehow redundant (and if so, are they worth deduplicating in the Rust FE)?
Apr 29 2021
Since this is a fairly impactful driver change, I'd suggest sending a heads-up about it to llvm-dev and adding a release note for extra visibility.
Apr 26 2021
Apr 22 2021
Context: In D59417 (see also llvm.org/PR37964), there was some debate about whether it's worthwhile to assign a line 0 location to a phi instruction which would otherwise not have a location. The consensus seemed to be: no, it's not worth it. So D75242 was spun off to suppress the false positive in debugify. It looks like D75242 had the unexpected/unfortunate side-effect of introducing a new FP: any unique, preserved DebugLoc attached to a phi instruction because "missing".
@phosek I apologize for the delay in reviewing this.
This looks great, thanks.
Apr 21 2021
Thanks for doing this!
Apr 20 2021
Apr 16 2021
for example it makes possible to extract the same region with different exclusions without creating another CodeExtractor instance,
Apr 9 2021
Apr 8 2021
If you'd like, you can pass -debugify-level=locations to elide those dbg.value calls.
Apr 1 2021
I think then you would somehow need multiple -compilation-dir options if e.g. you have paths with different levels of ../ nesting, e.g. ../../foo.c and ../../../../bar.c. That sounds like it could become hard to manage.
This patch let llvm-cov construct output directory with the to part of -path-equivalence=from,to as prefix
Mar 29 2021
Please add a test. I think I've reproduced the bug with this:
Mar 25 2021
Mar 23 2021
Thanks for explaining. I'd suggest making ExcludedAggArgs part of a CodeExtractor instances internal state: e.g. the client may call CE.addArgExludedFromAggregate(Value *) some number of times before CE.extractCodeRegion(). This way, the client doesn't need to maintain a SetVector, and the rest of the interface isn't polluted with an option that's specific to the AggregateArg case.
Thanks for bearing with me :). The memory profile is really helpful.
Mar 22 2021
This should prevent the MemoryBuffer API from allocating a buffer for the object file, just to add a '\0' at the end. If the OS can just mmap() the buffer readonly, that should be essentially free.
Yes, I have tried disabling the RequiresNullTerminator bit, but it didn't help. Actually, the readonly buffer itself was fine. There are some places reading and storing data from them.
Have you determined why opening a readonly binary costs memory?
Sorry about that. The file was removed in 8248dd91d7f042893d4a605e98d19cb1b89a44d4.
Mar 19 2021
Add asserts checking that the library paths exist, and restrict the test to Darwin. See https://bugs.llvm.org/show_bug.cgi?id=49656
- Delete an unused '#undef DLOPEN_OPTIONS'
- Define RTLD_LAZY in the expression as suggested
Mar 18 2021
- Make the test generic (not Darwin-specific)
- Add a test. It's Darwin-specific as I couldn't work out how to pass the .dylib path to the linker in a platform-agnostic way.
- Trim the comment.
The best way I can think of to test this is to:
This needs a test.
Mar 16 2021
Mar 10 2021
Use a helper function, respect the force argument.
Thanks for the detailed explanation of the -fprofile-list workflow; given the difference constraints, this patch lgtm. Please document the divergent behavior re: no .profraw file when #counters == 0 for non-MachO in the clang docs.
Mar 9 2021
@ributzka may have stronger thoughts about when -fprofile-instr-generate must imply that a known set of symbols appear with external visibility. Up until now, the answer has been "always", and this is what tapi enforces for MachO. It's awkward to have this be inconsistent between MachO/ELF, but if there's a compelling performance reason then I think it's fine.
Mar 4 2021
It might help to have a bot online first to avoid shipping a regressed libprofile for RISCV, but I don't want to hold up work that depends on this. I'm not sure what the right call is, cc'ing @ro and @eugenis who have past experience (D40943)
It's great to see this. Will these tests get picked up by an existing RISCV bot?