Page MenuHomePhabricator

[lldb] Stop searching for a symbol in a pdb by regex

Authored by asmith on Dec 11 2017, 12:48 PM.



It was possible when searching for a symbol by regex in a pdb that an invalid regex would cause an exception on Windows. This updates the code to avoid throwing an exception.

When fixing the exception it was decided there is no reason to search for a symbol in a pdb by regex. To support this, SymbolFilePDB::FindTypes() now only searches for types by name and no longer calls FindTypesByRegEx().

Diff Detail


Event Timeline

asmith created this revision.Dec 11 2017, 12:48 PM
zturner added inline comments.Dec 11 2017, 12:59 PM

I can see two possible problems here.

  1. Now if it is a valid regex, it's going to compile it twice, hurting performance (not sure by how much, it could be negligible)
  1. Whether or not it's valid is determined by asking LLDB's regex engine, but it's then compiled with C++ STL's engine. It's possible they won't agree on whether or not a regex is valid.

You mentioned that it throws an exception on an invalid regex. What actually happens though? Because we compile with exception support disabled. Does it terminate the program?

At a minimum, the validity check and the find function should agree on the regex engine. If the only way to make that work is to change from std::regex to lldb_private::RegularExpression as the matching engine, then I guess we have to do that.

asmith added inline comments.Dec 11 2017, 1:09 PM

The find_first could be removed depending on the cost of lldb_private::RegularExpression()

We should have a dedicated API that actually searches for types using a lldb_private::RegularExpression. Why do we even try to create a regular expression in SymbolFilePDB::FindTypes()? Was it used in testing? No API in LLDB currently expects this and all other API in LLDB has a separate function call that uses lldb_private::RegularExpression. We can add one if needed to the lldb_private::SymbolFile API. I don't like API that magically tries to look at something and tries to compile a regular expression on each invocation. Can we change this?

asmith added inline comments.Dec 11 2017, 2:04 PM

How about this?

  lldb_private::RegularExpression name_regex(name_str);
  if (name_regex.IsValid())
    FindTypesByRegex(name_regex, max_matches, types); // pass regex here
    FindTypesByName(name_str, max_matches, types);
  return types.GetSize();

void SymbolFilePDB::FindTypesByRegex(lldb_private::RegularExpression &regex,
                                     uint32_t max_matches,
                                     lldb_private::TypeMap &types) {
asmith added inline comments.Dec 13 2017, 10:27 AM

Our experience on Windows is that when lldb is built with exceptions disabled, std::regex segfaults on an invalid regex causing lldb to terminate.

The test case will reproduce the failure or an example from the lldb console (if you had the corresponding changes required to do the symbol lookup):

image lookup -n function_name

asmith updated this revision to Diff 127255.Dec 16 2017, 12:23 PM

This changes SymbolFilePDB::FindTypesByRegex () to take an lldb_private::RegularExpression as the argument and removes the use of the STL regex.

asmith marked an inline comment as done.Dec 16 2017, 12:23 PM
zturner accepted this revision.Dec 16 2017, 4:18 PM

lgtm, if you have some time in the future for further improvements Greg's suggestion would be a good way to make this better.

This revision is now accepted and ready to land.Dec 16 2017, 4:18 PM
clayborg added inline comments.Dec 18 2017, 9:44 AM

I would rather not sniff the string passed in to see if this could be a regex and add a new FindTypesByRegex() call to lldb_private::SymbolFile that anyone can use. Do we have PDB tests that are using this functionality?

asmith added inline comments.Dec 18 2017, 8:47 PM

The error can be reproduced with:

image lookup -t char*

clayborg added inline comments.Dec 19 2017, 8:38 AM

You should make the RegularExpression object first and then check for errors. If there are errors, the type lookup should happen by normal name. This is again why I don't like us sniffing for a regular expression. If there is an error, like when trying to lookup "char*" as mentioned by Aaron, would you expect to see an error message saying "char*" isn't a valid regex? Then the user thinks that the type lookup always takes a regular expression which is not the case, just something the PDB plugin added for some reason. I do believe part of this patch should be adding the lldb_private::SymbolFile::FindTypesByRegex(...) to the base class and fixing this issue by removing the regex sniffing code since it is fragile at best (how are we doing to lookup a template type without triggering the regex issues?).

zturner added inline comments.Dec 20 2017, 8:03 AM

The reason I don't think it's appropriate for this patch is because nothing currently calls that method. So something much higher level would then have to be updated to call this new method, which might even entail adding a new command line option. For now, just fixing a crash is sufficient.

If you load a exe file that has a PDB file, people can currently run:

(lldb) type lookup "char*"

If no testing is using the regular expression stuff, then just pull it out. Do we have unit tests that depend on this working? If not, lets just pull it out from the SymbolFilePDB::FindTypes() and we can leave the fix in that uses the RegularExpression class so it won't crash in the future?

asmith updated this revision to Diff 127803.Dec 20 2017, 5:05 PM
asmith retitled this revision from [lldb] Check that a regex is valid before searching by regex for a symbol in a pdb. to [lldb] Stop searching for a symbol in a pdb by regex.
asmith edited the summary of this revision. (Show Details)
zturner accepted this revision.Dec 20 2017, 9:50 PM
asmith closed this revision.Dec 21 2017, 9:27 PM
labath added a subscriber: labath.Dec 22 2017, 2:01 AM

This is making the windows unit tests fail: If I understand correctly, this changes FindTypes to not search by regex.
If that's the case, then the SymbolFilePDBTests::TestRegexNameMatch needs to be updated to not expect that .* will find a match.
Also, while technically not failing, the SymbolFilePDBTests::TestMaxMatches also looks weird in this new world, and probably needs updating.

Could you look into those?