This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Make MemoryCache::Read more resilient
ClosedPublic

Authored by bulbazord on Mar 8 2023, 3:57 PM.

Details

Summary

MemoryCache::Read is not resilient to partial reads when reading memory
chunks less than or equal in size to L2 cache lines. There have been
attempts in the past to fix this but nothing really solved the root of
the issue.

I first created a test exercising MemoryCache's implementation and
documenting how I believe MemoryCache::Read should behave. I then
rewrote the implementation of MemoryCache::Read as needed to make sure
that the different scenarios behaved correctly.

rdar://105407095

Diff Detail

Event Timeline

bulbazord created this revision.Mar 8 2023, 3:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 3:57 PM
bulbazord requested review of this revision.Mar 8 2023, 3:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 3:57 PM
bulbazord edited the summary of this revision. (Show Details)Mar 8 2023, 4:02 PM
JDevlieghere added inline comments.Mar 8 2023, 4:16 PM
lldb/source/Target/Memory.cpp
155–156

This isn't used until line 180. I'd move it down, closer to where it is being used.

155–157

Instead of describing the algorithm here, would it make sense to break this up and put it above the relevant code below? It seems like it matches pretty well with the code structure. Looking at the signature of FindEntryThatContains and the fact that it doesn't take the size, I assume it's because we only check the start address?

161–163

What would a more thorough check look like? Or phrased differently: what is the current check missing?

165

Should the error mention that if failed due to the address overlapping with an invalid range? Is an invalid range something that is meaningful to the user?

202–204

Why can't we read from the process? Same question below.

mib added inline comments.Mar 8 2023, 4:47 PM
lldb/source/Target/Memory.cpp
199–200

nit: Is this necessary ?

200

IIUC, now that the read succeeded, you're shifting the current cache line base address to point to the next cache line base address, so you can continue reading if necessary. I had troubles understanding the point of this line before reading the rest of the code so either this should move closer to where it's used or at least it should have a comment explaining what it's doing.

201–202

Could use a comment explaining what we're doing here.

206

Shouldn't this be the size of the cached line ?

There was a fix that was never submitted for Google Stadia for the memory cache here:

https://github.com/googlestadia/vsi-lldb/tree/master/patches/llvm-project

Might be worth checking what they did to ensure we have all of the same abilities.

lldb/source/Target/Memory.cpp
202–204

Because if we did a memory request before from a valid "curr_cache_line_base_addr", and we got back fewer bytes that requested, then the bytes won't be available later right?

205–207

Not on the FIXME: We can't really check this near the beginning, because this happens for each cache line we as we advance the "curr_cache_line_base_addr" right?

One thing to note about this code is that we might need to read at most 2 cache lines for any requests that make it to this code since we check above for "if (dst_len > m_L2_cache_line_byte_size)..." and use the L1 cache if that is true. So we know that we will read at most 2 cache lines depending on the offset. Might be nice to read the 2 cache lines in one memory access below if possible, and then make two cache entries with the result, but it will be either one cache line read, or two

260–261

If we don't read an entire cache line, should we populate this into the L1 cache instead? It might make the logic for accessing data in the L2 cache a bit simpler?

263

Should we just create a DataBufferSP right away here instead of creating a unique pointer and releasing it later?

bulbazord added inline comments.Mar 9 2023, 11:25 AM
lldb/source/Target/Memory.cpp
155–157

Good point!

165

I don't think the concept of "invalid range" is meaningful to the user right now. I'm pretty sure we only use it to prevent us from reading __PAGEZERO on apple platforms.

200

I should have paid closer attention to this line. We should only be moving the cache line base address to the next cache line if we're going to continue reading. Will refactor.

202–204

If we've hit this code path then we have previously read from the process and got back fewer bytes than a cache line fits. For example, maybe a cache line is 512 bytes and when we performed the read we got back 502 bytes for some reason. If we're trying to read the last 10 bytes of that line, that's just not available, so we bail out. We could try to protect against this in a number of ways, like if we get back fewer bytes than we wanted initially then maybe we can retry the read before caching the line, or if the line isn't filled out maybe can try to read the inferior one more time or something.

Ultimately, I want MemoryCache to be prepared for reads to be incomplete and guard against touching memory that we don't have.

205–207

If I'm understanding what you mean correctly, I think we can check this near the beginning. We have all the information we need to do safety checks before we even start reading anything, I believe...?

Also, because we read at most 2 cache lines, I can probably get rid of the loop and just do 2 sequential reads...

206

To be honest, I'm somewhat sure that this line actually could be return 0;. I'm pretty sure that we only will hit this on the first cache line read. If we read a second cache line, we should always be starting at the beginning of the cache line... I'll probably refactor this.

260–261

That might not be a bad idea actually. I'll try it and see how much it simplifies the logic.

263

That sounds like a smarter move.

One other optimization we can do is if we read from the process memory and it returns that is read zero bytes, right now we add the range we were trying to read into the m_invalid_ranges member variable. So lets say we were trying to read the range [0x1000-0x2000) on a mac. We will fail to read this due to __PAGEZERO, but I believe we currently add this range to the m_invalid_ranges. But we could ask about this memory region from the process and realize we can actually add [0x0-0x100000000) to the m_invalid_ranges. That might help avoid multiple bad reads from a large area that isn't mapped.

I would suggest checking the google stadia patch for the L1 and L2 caches:

https://github.com/googlestadia/vsi-lldb/blob/master/patches/llvm-project/0019-lldb-Fix-incorrect-L1-inferior-memory-cache-flushing.patch

Just to see how they did things.

I would suggest checking the google stadia patch for the L1 and L2 caches:

https://github.com/googlestadia/vsi-lldb/blob/master/patches/llvm-project/0019-lldb-Fix-incorrect-L1-inferior-memory-cache-flushing.patch

Just to see how they did things.

I looked at this patch earlier. They're modifying code that I'm not touching and these 2 patches can be applied separately without conflict. It may be worth trying to apply this fix in a follow-up commit.

bulbazord updated this revision to Diff 503942.Mar 9 2023, 2:56 PM
  • Addressed reviewer feedback
  • Simplified the case where we use L2 cache lines
bulbazord marked 11 inline comments as done.Mar 9 2023, 2:57 PM
bulbazord added inline comments.
lldb/source/Target/Memory.cpp
260–261

It didn't make things any simpler so I didn't change it. Good idea though.

clayborg added inline comments.Mar 13 2023, 10:50 AM
lldb/source/Target/Memory.cpp
133–135

remove braces for single line if statement per llvm coding guidelines

203–204

move these two lines below the 2 if statements below that return early?

232–233

Is this an error here? We already got something from the first read and we are just returning partial data, do we need an error? If we fail the first read, then this is an error.

bulbazord added inline comments.Mar 13 2023, 3:38 PM
lldb/source/Target/Memory.cpp
232–233

If the second cache line you read is in an invalid range, maybe the user would want some feedback about why it was a partial read. It's a detectable condition. Maybe we shouldn't set an error string though, idk.

bulbazord updated this revision to Diff 504873.Mar 13 2023, 3:39 PM

Address small nits from Greg

bulbazord marked 2 inline comments as done.Mar 13 2023, 3:39 PM
JDevlieghere accepted this revision.Mar 13 2023, 4:44 PM

LGTM if @clayborg is happy

This revision is now accepted and ready to land.Mar 13 2023, 4:44 PM
This revision was automatically updated to reflect the committed changes.