This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Fix (unintentional) recursion in CommandObjectRegexCommand
ClosedPublic

Authored by JDevlieghere on Feb 17 2022, 4:43 PM.

Details

Summary

Jim noticed that the regex command is unintentionally recursive:

Let's use the following command regex as an example:

(lldb) com regex humm 's/([^ ]+) ([^ ]+)/p %1 %2 %1 %2/'

If we call it with arguments foo bar, thing behave as expected:

(lldb) humm foo bar
(...)
foo bar foo bar

However, if we include %2 in the arguments, things break down:

(lldb) humm fo%2o bar
(...)
fobaro bar fobaro bar

The problem is that the implementation of the substitution is too naive. It substitutes the %1 token into the target template in place, then does the %2 substitution starting with the resultant string. So if the previous substitution introduced a %2 token, it would get processed in the second sweep, etc.

This patch addresses the issue by walking the command once and substituting the % variables in place. I didn't really trust myself so I also wrote a few unit tests to make sure I didn't miss any edge cases.

(lldb) humm fo%2o bar
(...)
fo%2o bar fo%2o bar

rdar://81236994

Diff Detail

Event Timeline

JDevlieghere created this revision.Feb 17 2022, 4:43 PM
JDevlieghere requested review of this revision.Feb 17 2022, 4:43 PM
JDevlieghere edited the summary of this revision. (Show Details)
mib added inline comments.Feb 17 2022, 5:06 PM
lldb/source/Commands/CommandObjectRegexCommand.cpp
40

Are we assuming that the number following % will always have a single digit ?

52

TIL but I think the exact formula is floor(log10(num))+2 according to this https://stackoverflow.com/questions/61707231/why-is-log-base-10-used-in-this-code-to-convert-int-to-string

I don't know if this formula has a name but if it does it would be great to add it as a comment above.

JDevlieghere edited the summary of this revision. (Show Details)Feb 17 2022, 5:07 PM
JDevlieghere marked an inline comment as done.
JDevlieghere added inline comments.
lldb/source/Commands/CommandObjectRegexCommand.cpp
40

Added a comment to make it clear that atoi will parse until it encounters a character that's not a digit.

52

The formula you linked is to compute the size in bytes for the number, not the string length. Take 1 for example: floor(log10(1)) + 2 = 0 + 2 = 2.

clayborg added inline comments.Feb 17 2022, 11:15 PM
lldb/source/Commands/CommandObjectRegexCommand.cpp
13

I cringe every time I see an old C header being imported as C++ because of the huge amounts of junk it causes the DWARF to incur with many import declarations inside the std namespace for any types that are defined... If we import <math.h> we don't end up with all that. We don't have to change it, but as you are aware dsymutil keeps every type that is defined in these header files because of this extra fluff in the DWARF.

41

We don't want negative numbers to succeed here. Might be better to use strtoul()

51

Update to use strtoul "end" parameter

We also have StringExtractor that we could call StringExtractor::GetU32(...) on. We would need to add a new method like:

bool StringExtractor::SkipTo(char ch);

And then we could call StringExtractor::GetU32(..) if that returns true. The current string position only gets updated if the number extraction succeeds. This uses strtoul being the scenes:

uint32_t StringExtractor::GetU32(uint32_t fail_value, int base);
lldb/source/Commands/CommandObjectRegexCommand.cpp
13

If we end up using strtoul as mentioned below, then:
#include <stdlib.h>

labath added a subscriber: labath.Feb 18 2022, 12:48 AM

Wouldn't want to miss a good bikeshed. Logarithms are cool, but I think something like this (untested) snippet would be simpler

Input.split(Parts, '%');
Result << Parts[0];
for (llvm::StringRef Part : drop_begin(Parts)) {
  if (!Part.consumeInteger(10, Idx) && Idx < Replacements.size())
    Result << Replacements[Idx];
  else
    Result << '%';
  Result << Part;
}

Use Pavel's not-as-cool solution.

clayborg added inline comments.Feb 20 2022, 4:27 PM
lldb/source/Commands/CommandObjectRegexCommand.h
47

Do we want a "Expected<std::string>" as the result? If any index that follows a '%' 'that is 1 or higher could require the index be valid within "replacements" or we would return an error like "%4 is out of range, not enough arguments specified". Seems weird to just leave it in the resulting string as that is most certainly not what the user would expect? I might be what we used to do, but since we are improving things here, is seems like this would be easy to do

clayborg added inline comments.Feb 20 2022, 4:28 PM
lldb/source/Commands/CommandObjectRegexCommand.h
47

And also, do we really want to print "%0" if we have one in the string? Maybe an error for that as well?

JDevlieghere marked 2 inline comments as done.
  • Allow using %0
  • Return an error when using an out of range index
JDevlieghere marked 2 inline comments as done.Feb 21 2022, 9:28 AM
mib accepted this revision.Feb 23 2022, 8:48 AM

LGTM !

This revision is now accepted and ready to land.Feb 23 2022, 8:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2022, 12:34 PM