This is an archive of the discontinued LLVM Phabricator instance.

Be consistent when adding names and mangled names to the index
Needs ReviewPublic

Authored by clayborg on May 31 2019, 3:31 PM.

Details

Summary

We were being very inconsistent with the way we were adding names to the indexes.

For DW_TAG_subprogram and DW_TAG_inlined_subroutine we would not add the mangled name if the we didn't have a simple name. We would only add the mangled name if started with '_' (even if it matches the simple name that started with '_') or if the name differed if the mangled name didn't start with '_'.

For DW_TAG_array_type, DW_TAG_base_type, DW_TAG_class_type, DW_TAG_constant, DW_TAG_enumeration_type, DW_TAG_string_type, DW_TAG_structure_type, DW_TAG_subroutine_type, DW_TAG_typedef, DW_TAG_union_type, and DW_TAG_unspecified_type we would add the simple name and the mangled name if either existed, even if they were the same..

For DW_TAG_variable we would only follow the same rule as the DW_TAG_subprogram and DW_TAG_inlined_subroutine (so no simple name meant we wouldn't add the mangled name).

Not sure if this was on purpose or not, but wanted to post this diff for discussion on what the right thing to do should be.

The new way adds a helper function called AddMangled that takes both the name and mangled name and:

  • returns false if mangled is NULL or of name == mangled (same string in .debug_str)
  • returns true if name is NULL so we add the mangled name even if we don't have a simple name
  • returns true if the first character of mangled is different than the first character in simple name (not just blindly accepting any mangled name that starts with '_' like before)
  • return true if name differs from mangled using strcmp as a last resort

We might think about adding a feature where if there is a mangled name and no simple name, chop up the demangled name like we do in the ObjectFile plug-ins when we see mangled names so we extract the basename out and then could use this as the simple name. This might help expressions find things correctly by basename

Diff Detail

Event Timeline

clayborg created this revision.May 31 2019, 3:31 PM
aprantl added inline comments.May 31 2019, 3:34 PM
source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
114

Even in a static function I feel weird comparing two char* by pointer value. Can you use llvm::StringRefs instead? then we don't need to call the unsafe strcmp either and just trust that operator== is implemented efficiently.

clayborg edited the summary of this revision. (Show Details)May 31 2019, 3:36 PM

Being consistent definitely sounds like a good idea. Since this does change behavior somewhat, I'm wondering whether it would make sense to add a test here. The thing that's not clear to me is whether this change in behavior is only theoretical, or if it can also happen in a real-world scenario. We can always hand-construct an DIE which will have a linkage name, but no "normal" name, but I don't see how would this ever happen in practice. How did you initially discover this inconsistency?

We might think about adding a feature where if there is a mangled name and no simple name, chop up the demangled name like we do in the ObjectFile plug-ins when we see mangled names so we extract the basename out and then could use this as the simple name. This might help expressions find things correctly by basename

My worry about that is two-fold:

  • it adds a demangling step (a pretty expensive operation) to something that is already the biggest bottleneck in loading DWARF
  • this isn't something that we would do when using accelerator tables, so this will result in inconsistent behavior in these two cases

To understand the tradeoffs here, it would be good to understand under what circumstances can this situation occur...

source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
114

Since this is *the* hottest piece of code when not using the accelerator tables, I think it makes sense to spend some time to squeeze every last bit of performace from it. The problem I see with constructing a StringRef is that it will force a linear scan of the c string regardless of the result of comparison. Since (I expect) the most comon result of this function will be "true", it means we will end up adding this string to the lookup tables, which will mean another linear scan due to the construction of ConstString.

Though, I guess that means that if the most frequent (>95%) result here is "unequal", we can just optimistically ConstString-ify both names, and rely on the ConstString operator== to do the fast comparison. That may result in less overall branches in the code and faster execution, but this is just a speculation on my part.

Being consistent definitely sounds like a good idea. Since this does change behavior somewhat, I'm wondering whether it would make sense to add a test here. The thing that's not clear to me is whether this change in behavior is only theoretical, or if it can also happen in a real-world scenario. We can always hand-construct an DIE which will have a linkage name, but no "normal" name, but I don't see how would this ever happen in practice. How did you initially discover this inconsistency?

I added asserts in the else clause if code like:

if (name) {
  ...
} else {
  assert(mangled_str == nullptr);
}

And I got the assert. So this is a real issue.

We might think about adding a feature where if there is a mangled name and no simple name, chop up the demangled name like we do in the ObjectFile plug-ins when we see mangled names so we extract the basename out and then could use this as the simple name. This might help expressions find things correctly by basename

My worry about that is two-fold:

  • it adds a demangling step (a pretty expensive operation) to something that is already the biggest bottleneck in loading DWARF
  • this isn't something that we would do when using accelerator tables, so this will result in inconsistent behavior in these two cases

To understand the tradeoffs here, it would be good to understand under what circumstances can this situation occur...

Fine not doing this yet.

I will post a new diff with a simpler solution that maintains performance to address Adrian's question of using StringRef.

clayborg updated this revision to Diff 204104.Jun 11 2019, 10:35 AM

Changed to use ConstString in AddMangled which simplifies the logic quite a bit. I verified performance didn't regress.

clayborg updated this revision to Diff 204116.Jun 11 2019, 11:08 AM

Remove commented out code.

Being consistent definitely sounds like a good idea. Since this does change behavior somewhat, I'm wondering whether it would make sense to add a test here. The thing that's not clear to me is whether this change in behavior is only theoretical, or if it can also happen in a real-world scenario. We can always hand-construct an DIE which will have a linkage name, but no "normal" name, but I don't see how would this ever happen in practice. How did you initially discover this inconsistency?

I added asserts in the else clause if code like:

if (name) {
  ...
} else {
  assert(mangled_str == nullptr);
}

And I got the assert. So this is a real issue.

Ok, so in that case, can you add a test for this? It may be easiest to hand-craft some DIEs with appropriate attributes. Then you could either use lldb-test symbols to dump the indexes, or lldb-test symbols -find=variable --name=foo to actually query them.