Page MenuHomePhabricator

[lldb-server] jThreadsInfo returns stack memory
Needs RevisionPublic

Authored by jarin on Feb 11 2020, 5:28 AM.

Details

Summary

This patch adds parts of the stack that should be useful for unwinding to the jThreadsInfo reply from lldb-server. We return the top of the stack (12 words), and we also try to walk the frame pointer linked list and return the memory containing frame pointer and return address pairs. The idea is to cover the cases with and without frame pointer omission.

Here are some questions:

  • Does this approach sound reasonable?
  • How do we test this?
  • Is it fine if we do not handle the big-endian and 32-bit word cases? (There we will be basically never generate the frame list.)

Diff Detail

Event Timeline

jarin created this revision.Feb 11 2020, 5:28 AM

Does this approach sound reasonable?

I think this is ok, in principle. It might be nice to include some numbers, like how much this increases the jThreadsInfo packet size vs. how much time or packets does this save us.

How do we test this?

I am imagining a two-pronged test strategy. One test, where we do a high-level check on the syntax of the packet, and cross-check the consistency of the data against regular memory reads. And another test, which creates "interesting" stack(s) via inline asm, and where we can make stronger assertions about the data returned.

Is it fine if we do not handle the big-endian and 32-bit word cases? (There we will be basically never generate the frame list.)

We should definitely do 32-bit. It shouldn't be that hard -- just use process->GetArchitecture().GetAddressByteSize() instead of sizeof(addr_t). I have a feeling this code might actually work on big-endian (if we assume the lldb-server and inferior have the same endianness), but it may be better to read the data via DataExtractor::GetAddress, as that will also handle the 32-bit case for you.

Another case where this won't work is architectures with upward-growing stacks -- but I don't think we need to handle that, as lldb does not even have a way to describe this.

Nice improvement. debugserver does a similar thing in our jThreadsInfo implementation as well as our ? packet response (T05, etc) where we include the current & caller stack frame, assuming the frame pointer and pc are saved to the stack,

1581545891.235902071 < 627> read packet: $T05thread:d4414d;threads:d4414d;thread-pcs:7fff7149ffd0;00:0000000000000000;01:28f9bfeffe7f0000;02:0200000000000000; [...] memory:0x7ffeefbfe4a0=d0e4bfeffe7f0000f20d2971ff7f0000;memory:0x7ffeefbfe4d0=f0e4bfeffe7f000066cf2971ff7f0000;#00

This gives the thread plans enough information that we can eliminate extra reads while instruction stepping. We try to do a full stack walk using the same "follow the $fp chain" algorithm for the jThreadsInfo packet for all threads.

It's interesting to send up the entire stack frame contents for frame 0. There's a good chance we'll need to fetch some/all of this for the selected frame (the frame the user is looking at), but it's probably not helping anything for the other threads.

As Pavel was suggesting, I'd get the size of the inferior's pointers from lldb, and I would read the bytes for the saved frame pointer into a buffer, wrap it in a DataExtractor, and let it do the correct byte swapping.

We'll still have an assumption that the stack moves downward, and that the saved pc and frame pointer are at the start of every stack frame. It's fine for x86/arm, but we probably have support for an architecture where this isn't true by this point, I wouldn't be surprised. We'll stop on the first memory read, so if we get back a bogus spilled-fp value, we will stop scanning, so I wouldn't expect it to cause problems, but those targets may not benefit from this. (debugserver only works on x86/arm, so it didn't have to figure this out.)

Thank you for the feedback! I am a bit busy with other things ATM, but I should be able to get back to this next week.

I should give credit where it's due - Pavel pointed me to the debug-server code and I took the idea from there. We do not always have the frame pointer list, so I took the hybrid approach of copying the top stack section from each thread, too.

As for some preliminary numbers, with 8ms RTT we see thread list update time in our UI improve from 800ms to 250ms in an Unreal engine sample. I will get more numbers (packet sizes) when I hack on this again next week.

BTW, a the lldb/examples/python/gdbremote.py script for these kinds of things measurements. It takes a lldb log file produced like so: log enable gdb-remote packets -T -f /tmp/gdb.log, and it gives you some statistics about the various packets present there:

#------------------------------------------------------------
# Packet                   Time (sec)  Percent Count  Latency
#------------------------- ----------- ------- ------ -------
                         x    0.003926  37.15%     22  0.000178
             qRegisterInfo    0.001462  13.83%    135  0.000011
                         A    0.001419  13.43%      1  0.001419
...

I'd be interested in seeing is how much does this extra work increase the latency of the jThreadsInfo packet.

To be honest, I'm thinking of stealing your idea of sending the $fp - $sp memory region for frame 0 in debugserver, at least for the currently selected thread. I might add a check that it's not a huge amount of bytes in case there's a giant buffer or something that we may not need to fetch.

jarin updated this revision to Diff 250951.Mar 17 2020, 5:41 PM
  • Added a test that checks consistency of thread info's memory chunks with the actual memory.
  • Using DataExtractor to extract pointers with the right endian-ness and the right size.

Thanks. Could you also add the other kind of test (the one inline asm) I mentioned. In an ideal world we'd have a test case for every boundary condition, but we're pretty far from that right now. Even so, one test case like that would be nice.

I am imagining the inferior doing something like:

movabsq $0xdead, %rax
pushq %rax ; fake pc
leaq 0x1000(%rsp), %rbp ; larger than kMaxFrameSize
pushq %rbp
movq %rsp, %rbp
pushq $1 ; fake frame contents
pushq $2
pushq $3

incq %rax
push %rax; second fake pc
pushq %rbp
movq %rsp, %rbp
pushq $4 ; fake frame contents
pushq $5
pushq $6
int3

and then the test would assert that the result contains the entirety of the first fake frame, the bp+pc of the second fake frame, and then would stop due to hitting the kMaxFrameSize boundary.

lldb/packages/Python/lldbsuite/test/tools/lldb-server/TestGdbRemoteThreadsInStopReply.py
373 ↗(On Diff #250951)

Also assert that you have at least one chunk here?

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
594–597

I think it would be cleaner if this returned the json::Object as a return value (much like GetStackMemoryAsJSON returns a json::Array).

627

You will need to "handle" the error inside the Expected object here. The object asserts (at runtime) that you do that at the right time. (http://llvm.org/docs/ProgrammersManual.html#recoverable-errors).

That said, if you're not going to do anything with the error, then maybe the GetRegisterValue doesn't really need to return an Expected (maybe Optional<RegisterValue>)

641

we usually use arrays of *u*int8_t to represent uninterpreted data.

671

It is probably going to be the same, but you might as well use process.GetArchitecture().GetByteOrder(), since you have it handy.

jarin updated this revision to Diff 251181.Mar 18 2020, 2:56 PM
jarin marked 5 inline comments as done.

Addressed reviewer comments in the code, but still have no clue how to write the test.

jarin added a comment.Mar 18 2020, 3:34 PM

Thanks. Could you also add the other kind of test (the one inline asm) I mentioned. In an ideal world we'd have a test case for every boundary condition, but we're pretty far from that right now. Even so, one test case like that would be nice.

Pavel, thank you for the careful review! I still do not quite understand how the asm test should look like and where it should live. Are you asking for creating a new directory with a compiled program below and building the machinery to stop at the right place and check the stack? If yes, that sounds like a fairly time-consuming undertaking, and I am afraid I cannot invest much more time into this. Perhaps it is better if we stick this patch only into our lldb branch, this one should be pretty easy for us to rebase.

As for the timings with local lldb on Linux, I see the time for jThreadsInfo of 100 threads with fairly shallow stacks (10 entries or so) taking 20ms locally, as opposed to 4ms before this change. The jThreadsInfo reply size is 80kB, up from 11kB. While this seems excessive, I do not see any extra memory traffic for getting a thread list with the patch, whereas without this patch we get over 100 roundtrips (each is at least 512 bytes of payload) to display the top of the stack of each thread.

I am imagining the inferior doing something like:

movabsq $0xdead, %rax
pushq %rax ; fake pc
leaq 0x1000(%rsp), %rbp ; larger than kMaxFrameSize
pushq %rbp
movq %rsp, %rbp
pushq $1 ; fake frame contents
pushq $2
pushq $3

incq %rax
push %rax; second fake pc
pushq %rbp
movq %rsp, %rbp
pushq $4 ; fake frame contents
pushq $5
pushq $6
int3

and then the test would assert that the result contains the entirety of the first fake frame, the bp+pc of the second fake frame, and then would stop due to hitting the kMaxFrameSize boundary.

lldb/packages/Python/lldbsuite/test/tools/lldb-server/TestGdbRemoteThreadsInStopReply.py
373 ↗(On Diff #250951)

I have asserted that for each thread now.

Thanks. Could you also add the other kind of test (the one inline asm) I mentioned. In an ideal world we'd have a test case for every boundary condition, but we're pretty far from that right now. Even so, one test case like that would be nice.

Pavel, thank you for the careful review! I still do not quite understand how the asm test should look like and where it should live. Are you asking for creating a new directory with a compiled program below and building the machinery to stop at the right place and check the stack? If yes, that sounds like a fairly time-consuming undertaking, and I am afraid I cannot invest much more time into this.

Yes, that is kind of what I am proposing, but I don't think it should be that hard. The creation of a new directory is necessary because of how our test suite expects there to be one inferior per directory, and we want to build a new inferior here. The "inferior would basically consist of a single main function containing something like the inline asm I mentioned previously (it probably doesn't compile right now, but I hope it shows the general idea). "Stopping at the right place" is implemented via the int3 opcode. Then the test would just run the program, wait for it to stop, and examine the jThreadsInfo response. It should expect to see one thread containing two (three?) memory records. The only tricky part would be validating the contents of those records, as they will not have fully static data due to the unpredictability of the %rsp value. But the test could compare that to the value of %rsp it gets through the p packet. Or maybe the inferior could align the stack at say 1MB boundary, and then the test could verify that rsp/rbp have the expected values modulo 1MB.

Perhaps it is better if we stick this patch only into our lldb branch, this one should be pretty easy for us to rebase.

That's something you'll have to decide. I'm still very much interested in having this patch, but I am also trying to maintain a higher bar for test coverage. I see this as sort of similar to the register reading tests we added recently (lldb/test/Shell/Register. Before that, we only had tests which try to read all registers and expect to get /a/ value. That's sort of similar to what your first test does -- it is somewhat useful, and it is cross-platform, but it doesn't really check all that much. Also, if we ever find a bug in this code, it will be impossible to write a regression test for it using that method.

This other test would be more similar to the "new" register tests. They are a bit trickier to write, but they enable you to write stronger assertions about what the code is actually supposed to do -- not just in the "normal" case, but also in various boundary conditions.

As for the timings with local lldb on Linux, I see the time for jThreadsInfo of 100 threads with fairly shallow stacks (10 entries or so) taking 20ms locally, as opposed to 4ms before this change. The jThreadsInfo reply size is 80kB, up from 11kB. While this seems excessive, I do not see any extra memory traffic for getting a thread list with the patch, whereas without this patch we get over 100 roundtrips (each is at least 512 bytes of payload) to display the top of the stack of each thread.

Thanks, for checking this out. 20ms for 100 threads doesn't sound too bad. I am guessing the statistics would look better if you also showed how many packets this saved. I would expect a net saving from all the memory read packets we can avoid sending now.

jarin updated this revision to Diff 251512.EditedMar 19 2020, 5:06 PM

Adding a tighter x64 test as suggested by labath@.

labath accepted this revision.Mar 20 2020, 1:11 AM

This is great. The new test looks even better than I had hoped for. Thanks for sticking with this.

lldb/packages/Python/lldbsuite/test/tools/lldb-server/threads-info/main.cpp
1 ↗(On Diff #251512)

I don't believe this include is needed.

3 ↗(On Diff #251512)

Maybe a short intro (here or on the test) which states that this constructs some artificial stack frames to test the frame-pointer following code in lldb-server.

This revision is now accepted and ready to land.Mar 20 2020, 1:11 AM
jarin updated this revision to Diff 251714.Mar 20 2020, 12:01 PM
jarin marked 3 inline comments as done.

Addressed comments.

jarin added a comment.Mar 21 2020, 3:57 AM

Regarding the packet savings - there are still things that worry me.

First of all, when lldb CLI stops on a breakpoint, it will first unwind top of the stack of each thread as part of ThreadList::ShouldStop. This sends lots of "x" packets to lldb-server and only then issues jThreadInfo, which then replies with the stack memory uselessly (with my patch). I am yet to investigate why the CLI does that.

My patch saves messages when we start stepping after the breakpoint - in my example with a hundred threads, when we stop on a breakpoint, take a step and then do "thread list", there are no additional 'x' packets sent to lldb-server for the "thread list" command; without my patch there is a packet per-thread. In our VS extension, this seems to help quite a bit, but I am not so sure about the CLI.

It also feels like jThreadsInfo might not the best place to send the stacks - I am wondering whether jstopinfo in the vCont/c packet response would be a better place (even though the encoding of that is quite terrible). What do you think?

lldb/packages/Python/lldbsuite/test/tools/lldb-server/threads-info/main.cpp
1 ↗(On Diff #251512)

Good catch, copypasta :-/

Regarding the packet savings - there are still things that worry me.

First of all, when lldb CLI stops on a breakpoint, it will first unwind top of the stack of each thread as part of ThreadList::ShouldStop. This sends lots of "x" packets to lldb-server and only then issues jThreadInfo, which then replies with the stack memory uselessly (with my patch). I am yet to investigate why the CLI does that.

I haven't looked into this, but this sound like the difference between a "private stop" and a "public stop". When you do a source-level "next" command, lldb will put breakpoints on branches/function calls, run to those, then instruction step those instructions, etc. Each time it stops at these breakpoints or instruction-steps. Each of these stops are called private stops - the UI layer (lldb command line driver or the SB API driver) never hear about these. When we've completed executing the instructions for that source line, then we declare this to be a "public stop" - this is when lldb driver or SB API driver, and the user, are notified that execution has stopped.

jThreadInfo is probably only retrieved when there is a public stop. debugserver puts two words of the current stack frame and the caller's stack frame in the questionmark (T05 etc) packet along with register contents for the frame that stopped. The stepping thread plans need to know the current stack frame and the caller stack frame, to be sure that you didn't just step out of a function call or something.

For all the threads that aren't stepping, lldb should only need to know the current pc value - in the questionmark stop packet I also include the thread-pcs: list of pc values for each thread so we don't query them.

Look over the packet log while thinking about the difference between private stops and public stops and I think it might be easier to understand what's going on.

My patch saves messages when we start stepping after the breakpoint - in my example with a hundred threads, when we stop on a breakpoint, take a step and then do "thread list", there are no additional 'x' packets sent to lldb-server for the "thread list" command; without my patch there is a packet per-thread. In our VS extension, this seems to help quite a bit, but I am not so sure about the CLI.

It also feels like jThreadsInfo might not the best place to send the stacks - I am wondering whether jstopinfo in the vCont/c packet response would be a better place (even though the encoding of that is quite terrible). What do you think?

Regarding the packet savings - there are still things that worry me.

First of all, when lldb CLI stops on a breakpoint, it will first unwind top of the stack of each thread as part of ThreadList::ShouldStop. This sends lots of "x" packets to lldb-server and only then issues jThreadInfo, which then replies with the stack memory uselessly (with my patch). I am yet to investigate why the CLI does that.

I haven't looked into this, but this sound like the difference between a "private stop" and a "public stop". When you do a source-level "next" command, lldb will put breakpoints on branches/function calls, run to those, then instruction step those instructions, etc. Each time it stops at these breakpoints or instruction-steps. Each of these stops are called private stops - the UI layer (lldb command line driver or the SB API driver) never hear about these. When we've completed executing the instructions for that source line, then we declare this to be a "public stop" - this is when lldb driver or SB API driver, and the user, are notified that execution has stopped.

To give a quick example with some random binary I had sitting around,

4   	int main()
5   	{

-> 6 std::vector<uint64_t> a;

7   	    a.push_back (0x0000000100000000);

(lldb) tar mod dump line-table a.cc
Line table for /tmp/a.cc in `a.out
0x0000000100001220: /tmp/a.cc:5
0x000000010000122f: /tmp/a.cc:6:27
0x0000000100001245: /tmp/a.cc:7:18
0x0000000100001251: /tmp/a.cc:7:7

(lldb) dis -s 0x000000010000122f -e 0x100001245
a.out`main:
-> 0x10000122f <+15>: movq %rax, %rdi

0x100001232 <+18>: movq   %rax, -0x68(%rbp)
0x100001236 <+22>: callq  0x100001350               ; std::__1::vector<unsigned long long, std::__1::allocator<unsigned long long> >::vector at vector:497
0x10000123b <+27>: movabsq $0x100000000, %rax        ; imm = 0x100000000

(lldb)

this source-level step will take 5 stops. first go to the CALLQ, then instruction-step in, then set a breakpoint on the return address and continue, then stepi to the first instruction of line 7. Let's look at what llvm does after it's instruction-stepped into the CALLQ,

< 18> send packet: $Z0,10000123b,1#fc
< 6> read packet: $OK#00
< 5> send packet: $c#63
< 624> read packet: $T05thread:5ada71;threads:5ada71;thread-pcs:10000123b;00:58fabfeffe7f0000;01:0000000000000000;02:80fbbfeffe7f0000;03:90fabfeffe7f0000;04:58fabfeffe7f0000;05:58fabfeffe7f0000;06:60fabfeffe7f0000;07:e0f9bfeffe7f0000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;0f:0000000000000000;10:3b12000001000000;11:0202000000000000;12:2b00000000000000;13:0000000000000000;14:0000000000000000;metype:6;mecount:2;medata:2;medata:0;memory:0x7ffeefbffa60=70fabfeffe7f0000fdd7f765ff7f0000;memory:0x7ffeefbffa70=00000000000000000100000000000000;#00
< 18> send packet: $z0,10000123b,1#1c
< 6> read packet: $OK#00
< 18> send packet: $vCont;s:5ada71#b5
< 624> read packet: $T05thread:5ada71;threads:5ada71;thread-pcs:100001245;00:0000000001000000;01:0000000000000000;02:80fbbfeffe7f0000;03:90fabfeffe7f0000;04:58fabfeffe7f0000;05:58fabfeffe7f0000;06:60fabfeffe7f0000;07:e0f9bfeffe7f0000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;0f:0000000000000000;10:4512000001000000;11:0202000000000000;12:2b00000000000000;13:0000000000000000;14:0000000000000000;metype:6;mecount:2;medata:1;medata:0;memory:0x7ffeefbffa60=70fabfeffe7f0000fdd7f765ff7f0000;memory:0x7ffeefbffa70=00000000000000000100000000000000;#00
< 16> send packet: $jThreadsInfo#c1
< 889> read packet: $[{"tid":5954161,"metype":6,"medata":[1,0],"reason":"exception","qaddr":4295843648,"associated_with_dispatch_queue":true,"dispatch_queue_t":140735606695552,"qname":"com.apple.main-thread","qkind":"serial","qserialnum":1,"registers":{"0":"0000000001000000","1":"0000000000000000","[...]

So we set a breakpoint and continued to it to get out of the function call, then we cleared the breakpoint, stepi'ed to get to the first instruction of line 7 and now we've got a public stop and we issue a jThreadsInfo to get more expensive / exhaustive information about all threads, not just the thread that hit the breakpoint.

You can see debugserver providing enough of the stack memory in the memory: field in the questionmark (T05) packet so that lldb can walk the stack one frame and be sure of where it is.

(and if you're still seeing mystery reads, put a breakpoint on ProcessGDBRemote::DoReadMemory and see who is doing it)

jarin added a comment.Mar 24 2020, 8:59 AM

(and if you're still seeing mystery reads, put a breakpoint on ProcessGDBRemote::DoReadMemory and see who is doing it)

Thanks for the great explanations! I did put a breakpoint into DoReadMemory, and the unwinder doing the reads for each thread (called from https://github.com/llvm/llvm-project/blob/master/lldb/source/Target/ThreadList.cpp#L349). If I understand correctly, the Thread::WillStop/SelectMostRelevantFrame should care only about the current PC of each thread, but not about the stack contents. Let me double check that.

jarin added a comment.Mar 31 2020, 1:49 AM

(and if you're still seeing mystery reads, put a breakpoint on ProcessGDBRemote::DoReadMemory and see who is doing it)

Thanks for the great explanations! I did put a breakpoint into DoReadMemory, and the unwinder doing the reads for each thread (called from https://github.com/llvm/llvm-project/blob/master/lldb/source/Target/ThreadList.cpp#L349). If I understand correctly, the Thread::WillStop/SelectMostRelevantFrame should care only about the current PC of each thread, but not about the stack contents. Let me double check that.

Looking at this in more detail, Thread::WillStop will unwind more than one frame (for each thread), the stack trace is below. I guess the surprising bit is that UnwindLLDB::UpdateUnwindPlanForFirstFrameIfInvalid (frame #11) always unwinds one more frame. Is that expected?

#0: process_gdb_remote::ProcessGDBRemote::DoReadMemory() at ProcessGDBRemote.cpp:2694:54
#1: Process::ReadMemoryFromInferior() at Process.cpp:2092:21
#2: Process::ReadMemoryFromInferior() at Process.cpp:2081
#3: MemoryCache::Read() at Memory.cpp:229:69
#4: RegisterContext::ReadRegisterValueFromMemory() at RegisterContext.cpp:337:31
#5: RegisterContextUnwind::ReadRegisterValueFromRegisterLocation() at RegisterContextUnwind.cpp:1040:15
#6: RegisterContextUnwind::ReadGPRValue() at RegisterContextUnwind.cpp:2008:44
#7: RegisterContextUnwind::InitializeNonZerothFrame() at RegisterContextUnwind.cpp:287:20
#8: RegisterContextUnwind::RegisterContextUnwind() at RegisterContextUnwind.cpp:70:29
#9: UnwindLLDB::GetOneMoreFrame() at UnwindLLDB.cpp:129:77
#10: UnwindLLDB::AddOneMoreFrame() at UnwindLLDB.cpp:332:32
#11: UnwindLLDB::UpdateUnwindPlanForFirstFrameIfInvalid() at UnwindLLDB.cpp:305:18
#12: UnwindLLDB::AddFirstFrame() at UnwindLLDB.cpp:99:41
#13: UnwindLLDB::AddFirstFrame() at UnwindLLDB.cpp:70
#14: UnwindLLDB::DoGetFrameInfoAtIndex() at UnwindLLDB.cpp:395:23
#15: StackFrameList::GetFramesUpTo() at Unwind.h:53:33
#16: StackFrameList::GetFrameAtIndex() at StackFrameList.cpp:660:16
#17: Thread::SelectMostRelevantFrame() at Thread.cpp:612:52
#18: Thread::WillStop() at Thread.cpp:634:26
#19: ThreadList::ShouldStop() at ThreadList.cpp:349:26
#20: Process::ShouldBroadcastEvent() at Process.cpp:3427:50
#21: Process::HandlePrivateEvent() at Process.cpp:3652:53
#22: Process::RunPrivateStateThread() at Process.cpp:3846:25

jarin added a comment.Apr 2 2020, 6:47 AM

Pavel, could you land this for me?

This revision was automatically updated to reflect the committed changes.
omjavaid reopened this revision.Apr 7 2020, 3:34 AM
omjavaid added a subscriber: omjavaid.

This patch breaks on TestMemoryCache.py lldb AArch64/Linux. we ll have to revert it untill fixed

FAIL: test_memory_cache_dwarf (TestMemoryCache.MemoryCacheTestCase)

Test the MemoryCache class with a sequence of 'memory read' and 'memory write' operations.

Traceback (most recent call last):

File "/home/omair.javaid/work/lldb/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1751, in test_method
  return attrvalue(self)
File "/home/omair.javaid/work/lldb/llvm-project/lldb/test/API/functionalities/memory/cache/TestMemoryCache.py", line 62, in test_memory_cache
  self.assertEquals(0x000000AA, int(line.split(':')[1], 0))

AssertionError: 170 != 66

Config=aarch64-/home/omair.javaid/work/lldb/build/arm64/bin/clang-11

FAIL: test_memory_cache_dwo (TestMemoryCache.MemoryCacheTestCase)

Test the MemoryCache class with a sequence of 'memory read' and 'memory write' operations.

Traceback (most recent call last):

File "/home/omair.javaid/work/lldb/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1751, in test_method
  return attrvalue(self)
File "/home/omair.javaid/work/lldb/llvm-project/lldb/test/API/functionalities/memory/cache/TestMemoryCache.py", line 62, in test_memory_cache
  self.assertEquals(0x000000AA, int(line.split(':')[1], 0))

AssertionError: 170 != 66

Config=aarch64-/home/omair.javaid/work/lldb/build/arm64/bin/clang-11

This revision is now accepted and ready to land.Apr 7 2020, 3:34 AM
omjavaid requested changes to this revision.Apr 7 2020, 3:35 AM
This revision now requires changes to proceed.Apr 7 2020, 3:35 AM
jarin added a comment.Apr 7 2020, 6:09 AM

Looking at the code for flushing L1 cache, it appears broken. I am guessing that is the reason for the failure of my patch on the bots.

Here is the L1 flushing code (after checking L1 is not empty).

AddrRange flush_range(addr, size);
BlockMap::iterator pos = m_L1_cache.upper_bound(addr);
if (pos != m_L1_cache.begin()) {
  --pos;
}
while (pos != m_L1_cache.end()) {
  AddrRange chunk_range(pos->first, pos->second->GetByteSize());
  if (!chunk_range.DoesIntersect(flush_range))
    break;
  pos = m_L1_cache.erase(pos);
}

For instance, let the cache contains two chunks (10, 10) and (30, 10). Let us flush (25, 10). Then the while loop chunk_range will be (10, 10), which is not intersecting with (25, 10), so we do not flush anything. This is clearly wrong because (30, 10) should be flushed.

I am wondering whether something like this would be correct (I am still a bit worried about the case of overlapping things in L1).

AddrRange flush_range(addr, size);
BlockMap::iterator pos = m_L1_cache.upper_bound(addr);
if (pos != m_L1_cache.begin()) {
  // If we are not in the beginning, the previous range might be also intersecting.
  BlockMap::iterator previous = pos;
  previous--;
  AddrRange previous_range(previous->first, previous->second->GetByteSize()) 
  if (!previous_range.DoesIntersect(flush_range))
    pos = m_L1_cache.erase(previous);
}
while (pos != m_L1_cache.end()) {
  AddrRange chunk_range(pos->first, pos->second->GetByteSize());
  if (!chunk_range.DoesIntersect(flush_range))
    break;
  pos = m_L1_cache.erase(pos);
}