This is an archive of the discontinued LLVM Phabricator instance.

[lldb] (Partially) enable formatting of utf strings before the program is started
ClosedPublic

Authored by labath on Nov 3 2021, 5:54 AM.

Details

Summary

The StringPrinter class was using a Process instance to read memory.
This automatically prevented it from working before starting the
program.

This patch changes the class to use the Target object for reading
memory, as targets are always available. This required moving
ReadStringFromMemory from Process to Target.

This is sufficient to make frame/target variable work, but further
changes are necessary for the expression evaluator. Preliminary analysis
indicates the failures are due to the expression result ValueObjects
failing to provide an address, presumably because we're operating on
file addresses before starting. I haven't looked into what would it take
to make that work.

Diff Detail

Event Timeline

labath requested review of this revision.Nov 3 2021, 5:54 AM
labath created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2021, 5:54 AM
labath added a comment.Nov 3 2021, 5:55 AM

bug reported in PR45856.

ljmf00 added a subscriber: Geod24.Nov 3 2021, 6:46 AM
ljmf00 added a comment.Nov 3 2021, 6:53 AM

bug reported in PR45856.

Thanks for your time fixing this. So, for better understanding, does the infrastructure behind LLDB, Target and Process have similar APIs but Process is just for the running state?

labath added a comment.Nov 3 2021, 9:30 AM

bug reported in PR45856.

Thanks for your time fixing this. So, for better understanding, does the infrastructure behind LLDB, Target and Process have similar APIs but Process is just for the running state?

Not exactly. It's true that a Process represents the running state of the application, and that both of them can read memory (process always reads live memory while a target can optionally return data from the object file), but that's pretty much where the similarities end. Process' methods generally concern the running of the application, so you have things that deal with stdio, signals, threads, etc. OTOH, a target works with the object files, and the debug info contained therein (breakpoints, modules, symbols, ...).

Jim, could you take look at this?

jingham requested changes to this revision.Nov 15 2021, 5:00 PM

It makes sense for the string formatters to try reading from the target if there's no process. So the general idea is fine.

But one of the other major differences between the Target before a process has been launched and attached to; and its state after attachment; is that before running there isn't a unique mapping from a scalar address (lldb::addr_t) to a specific byte in a specific Object File. For instance, on the systems where lldb can figure it out the dependent libraries before execution, lldb will determine & load into the Target not just the binary handed to it by the "file" command, but also the closure of that binary's direct dependents. That's handy, it means that you can inspect disassembly, set breakpoints, etc. seeing all the libraries your binary depends on before launch. However, on many systems libraries aren't given unique load addresses. They generally just get set to load at 0x0, and the loader is the one that gives them all unique slots in the memory space when the process runs. So Modules in the Target might and often do overlap each other in the "before running" address space.

When you are asking a process for memory, there's only one value at any given address, so all those API's can take an lldb::addr_t, the Process will always be able to unambiguously turn that back into either a plain address (if the memory is on the heap or stack) or a section relative address if the address is in TEXT/DATA, etc. of some loaded binary. But when you start asking the Target questions, you can't rely on this fact.

The lldb_private::Address class handles that by recording the address as section in a Module + offset, inherently picking the right binary. So using Addresses we can always record the user's intentions correctly. For instance, if you get the Address from a ValueObject representing a global variable which is in the DATA section of a target, that Address will record both the data section, and the variable's offset in that sections. But for that to work, you have to be careful to preserve the Address throughout the process, and never go back & forth from lldb_private::Address to lldb::addr_t until you actually have to request bytes of something (process, file, etc). Otherwise you might start out trying to read a string in library A, but end up reading the memory for it from library B...

Preserving the Address as long as is practicable is actually a general good practice in lldb. Even if you have a live process, when you go from section-relative Address to addr_t and back to Address, we have to re-lookup that address in the section load list, which is inefficient.

You make this mistake a few places in this patch, which I've pointed out in the detailed comments.

lldb/include/lldb/Target/Target.h
1028

ReadCStringFromMemory (both flavors) set "force_live_memory" to be true. Note that doesn't mean "don't read from the target" it means "always read from the process first." It does fall back to the Target if there's no process, AND if your Address is section relative.

OTOH, you make it a default argument and then set it to "false" - the opposite behavior. Is there a reason for that? That seems confusing.

force_live_memory is really an efficiency thing. It will only make a difference if the data to be read is in a read-only section of a binary, in which case it will read from there. lldb always used to prefer reading locally if it can, but if someone has been able to flip permissions and overwrite read-only DATA through some kind of attack, you wouldn't be able to see that if this is the default. So in actual fact, we pass force_live_memory -> true most of the places we use these Target Memory Reading API's.

So it seems better to follow ReadCStringFromMemory and just force this to true.

lldb/source/Plugins/Language/CPlusPlus/CxxStringTypes.cpp
58

This should take some form of lldb_private::Address not an lldb::addr_t or it won't work pre-run. See the overall comment for details.

lldb/source/Target/Target.cpp
1911

Process::ReadStringFromMemory is a pretty old function, and hasn't been fully converted to llvm style, which prefers to check things like dst != nullptr up front and do an early return if anything fails. That reduces indentation, and spaces are a precious commodity if you restrict yourself to 80 characters on a line.

Also, the error message was originally pretty useless, but there's no reason to propagate that now that you've touched it ;-)

1922

This is wrong. If the incoming address was section relative, you've thrown that information away. Just pass Target::ReadMemory the full Address (addr), it will know what to do with it. And you can use:

addr.Slide(bytes_read);

instead of lines 1951-1952 to update the address.

1924

This is a seemingly arbitrary choice.

From what I can tell you are copying this from GetCStringFromMemory, so the decision to do this isn't yours...

IIUC the whole point of this paging so we don't read a big chunk of memory and then find the terminator a couple of characters in. The optimal size for that doesn't have to line up with the process cache size, but I don't see why you wouldn't use that if it was handy.

Anyway, having this here with no comment is confusing. Either repeat the original justification or get the process cache line size if there's a process.

lldb/test/API/lang/cpp/char8_t/TestCxxChar8_t.py
25

It's a shame we don't have SBTarget.GetValueForVariablePath() - the equivalent of SBFrame.GetValueForVariablePath. If we did we could make a global variable version of lldbtest.expect_var_path, which would make this test more precise. But we don't...

29

Rather than a TODO, it's better to put in a test here, then expect it to fail. That way if someone comes upon this comment later on, they won't have to guess what didn't work for you. And if somebody fixes this while doing something else, the unexpected pass will let them know.

This revision now requires changes to proceed.Nov 15 2021, 5:00 PM
labath marked 7 inline comments as done.Nov 16 2021, 2:34 AM

Thanks for the review.

lldb/include/lldb/Target/Target.h
1028

I changed the value to true.

However, I've have copied this pattern from ReadMemory, ReadScalarIntegerFromMemory, etc. Should we be setting the value to true for those functions as well?

lldb/source/Plugins/Language/CPlusPlus/CxxStringTypes.cpp
58

I believe I've found the right incantation to do that. Please have a look at the new version.

lldb/source/Target/Target.cpp
1922

I've removed it. BTW, this was cargo-culted from Target::ReadCStringFromMemory, so it has the same bug. I'll create a patch to just delete that whole function

1924

I've created a helper function which returns a (properly alinged/rounded) size if we have a process, and a fixed value if we don't. Let me know what you think.

labath updated this revision to Diff 387543.Nov 16 2021, 2:34 AM
labath marked 4 inline comments as done.

New version.

Something seems to have happened to reviews.llvm.org. I can sort of see patches, but the interface isn't responding, and I can't actually make comments there, so we'll try this way instead.

Your changes look good, the "ReasonableSizeForReading" reasonable.

About the "force_live_memory" thing. As I understand it, this is really one of those "why we can't have nice things" deals that I think we haven't fully embraced. It would be very nice to not have to go to the device for read-only loaded sections, since in some situations it can be much faster. OTOH, the security folks are right, silently suppressing these differences hides something that, while it almost never happens, matters a lot when it does. There's no other reason that I can think of to choose to read read-only sections from memory rather than from the binary.

Maybe this is more the sort of thing that should be a policy controlled by a Target setting instead of variables passed around and set different ways by different functions? That way security conscious folks can force the always read behavior, and everybody else can work a little faster?

Anyway, the patch looks good, and if I could get Phabricator to allow me to check "Accept Revision" and click the Submit button, I would do so.

Jim

This revision was not accepted when it landed; it landed in state Needs Review.Nov 18 2021, 5:48 AM
This revision was automatically updated to reflect the committed changes.