Page MenuHomePhabricator

Don't type-erase the SymbolContextItem enum
ClosedPublic

Authored by zturner on Oct 23 2018, 11:54 AM.

Details

Summary

When we get the resolve_scope parameter from the SB API, it's a uint32_t. We then pass it through all of LLDB this way, as a uint32. This is unfortunate, because it means the user of an API never actually knows what they're dealing with. We can call it something like resolve_scope and have comments saying "this is a value from the SymbolContextItem enumeration, but it makes more sense to just have it actually *be* the correct type in the actual C++ type system to begin with. This way the person reading the code just knows what it is.

There is one thing here which I'm a little uncertain about, which is that I included a file from llvm llvm/ADT/BitmaskEnum.h from lldb/lldb-enumerations.h, which is a public header.` This is only done for convenience, and has two effects.

  1. It allows us to use bitwise operations on enums so we don't have to write things like Foo x = Foo(eFoo1 | eFoo2)
  2. More subtly, it inserts an enumerator into the enum. But (!) it doesn't change the value of any existing enumerators and it does so with a name that won't cause any clashes, so the important thing is that it shouldn't cause any name clashes.

Putting this all together, I think this should be an ABI-compatible change as far as SWIG is concerned, and I can't see any differences on my end. But if anyone can see any reason why we should be wary of this, speak up.

Assuming all goes well with this patch, I plan to repeat the same thing with SymbolFile::GetTypes which currently has a uint32_t types_mask

Diff Detail

Repository
rLLDB LLDB

Event Timeline

zturner created this revision.Oct 23 2018, 11:54 AM

It would be great not to have to use comments to express what these values mean. OTOH, having to put in static casts whenever you want to or values together would be a pain, so if there's no way to do it without magic, I'm a little less enthused...

But on the magic: It looks like BitMaskEnum.h pulls in MathTypes.h which pulls in Compiler.h which then pulls in llvm-config.h. You are supposed to be able to build tools on top of lldb with just the headers that go in LLDB.framework, you are not required to have the full source tree for an lldb build. I don't want to impose that restriction without very good reason, and fixing this wart doesn't rise to that level. Anyway, so if we want to include BitMaskEnum.h we would be required to ship a bunch of llvm headers (including some build produced ones IIUC). I don't think that's a very good idea.

It also looks to me like the value of the largest item + 1 gets baked into the operator overloading code. What would happen if we decided to add an element to the enum?

It would be great not to have to use comments to express what these values mean. OTOH, having to put in static casts whenever you want to or values together would be a pain, so if there's no way to do it without magic, I'm a little less enthused...

But on the magic: It looks like BitMaskEnum.h pulls in MathTypes.h which pulls in Compiler.h which then pulls in llvm-config.h. You are supposed to be able to build tools on top of lldb with just the headers that go in LLDB.framework, you are not required to have the full source tree for an lldb build. I don't want to impose that restriction without very good reason, and fixing this wart doesn't rise to that level. Anyway, so if we want to include BitMaskEnum.h we would be required to ship a bunch of llvm headers (including some build produced ones IIUC). I don't think that's a very good idea.

It also looks to me like the value of the largest item + 1 gets baked into the operator overloading code. What would happen if we decided to add an element to the enum?

It's not the largest item +1, it's actually just that the largest item gets a second internal name. If you add a new element to the enum you need to just make sure you update the LLVM_MARK_AS_BITMASK_ENUM() to contain the new largest value.

Anyway, your point about llvm-config.h is a good one, so what I can perhaps do instead is make something called LLDB_DEFINE_BITMASK_ENUM(EnumName) which basically just defines the overloads for a particular enum, then we can have it be a two step process. 1. Make the enum, 2. Call LLDB_DEFINE_BITMASK_ENUM(Enum). This way there's no external header dependencies. I'll upload a new patch shortly.

That would be great!

zturner updated this revision to Diff 170737.Oct 23 2018, 1:09 PM

Remove the reference to llvm/ADT/BitmaskEnu.h and define the operators ourselves. Also, removed a few casts that got left in.

That looks good to me, though you should wait for Greg to weigh in. He might notice something I missed.

Hi Greg, any thoughts on this?

As long as Swig is happy and the ABI doesn't change I am ok with this. Will we see the variables better when debugging? Or is this solely so the SymbolContextItem type doesn't disappear from the debug info?

As long as Swig is happy and the ABI doesn't change I am ok with this. Will we see the variables better when debugging? Or is this solely so the SymbolContextItem type doesn't disappear from the debug info?

We will see variables better when debugging too.

As long as Swig is happy and the ABI doesn't change I am ok with this. Will we see the variables better when debugging? Or is this solely so the SymbolContextItem type doesn't disappear from the debug info?

We will see variables better when debugging too.

(That was actually my primary motivation)

So far as I can tell, this patch will make lookup of exact types faster for PDB, but because of the way DWARF debug_names tables are constructed, I don't think there's any way we can do the same thing for DWARF.

But unless I'm misunderstanding the patch, this doesn't change correctness of the lookup (except for fixing "type lookup 'struct Foo'"). Did I miss something?

Jim

So far as I can tell, this patch will make lookup of exact types faster for PDB, but because of the way DWARF debug_names tables are constructed, I don't think there's any way we can do the same thing for DWARF.

But unless I'm misunderstanding the patch, this doesn't change correctness of the lookup (except for fixing "type lookup 'struct Foo'"). Did I miss something?

Jim

That's the other patch. This patch is NFC and just makes debugging nicer because you can see enum values in your debugger as rich enumerators. But for the other patch, if what you said is correct, then I suppose that's correct. I asked Eric Christopher and he said he thought (but wasn't 100% sure) that types were hashed in the accelerator tables by their full name and not their base name. If that is true then it could make exact matches faster. But if it's incorrect then yes, it wouldn't be able to make exact matches faster in DWARF.

Ah, right... Too many patches (a good problem!)

The standard as I read it says that the name entry points into the general string table, but doesn't specify which entry it points to. However, the current DWARF debug_info doesn't ever emit a string for the fully qualified name, so you would have to construct it specially to have an exact name to refer to. I also couldn't see anything that said specifically whether the DW_AT_name for a type has to be the full name or a base name, it just says:

If a name has been given to the structure, union, or class in the source program, then the corresponding structure type, union type, or class type entry has a DW_AT_name attribute whose value is a null-terminated string containing the type name.

But it would bloat the name tables to use the full name, and since you can reconstruct it from the context it's not needed... So I've only seen base names in the name entry for types in the debug_info.

Anyway, current clang for -gdwarf-5 produces:

Bucket 1 [
  Name 2 {
    Hash: 0xB887389
    String: 0x000000c3 "Foo"
    Entry @ 0x92 {
      Abbrev: 0x39
      Tag: DW_TAG_namespace
      DW_IDX_die_offset: 0x0000004c
    }
  }
]
Bucket 2 [
  Name 3 {
    Hash: 0xB8860BA
    String: 0x000000c7 "Bar"
    Entry @ 0x9b {
      Abbrev: 0x13
      Tag: DW_TAG_structure_type
      DW_IDX_die_offset: 0x0000004e
    }
  }
  Name 4 {
    Hash: 0x7C9A7F6A
    String: 0x000000b5 "main"
    Entry @ 0xa4 {
      Abbrev: 0x2E
      Tag: DW_TAG_subprogram
      DW_IDX_die_offset: 0x00000026
    }
  }

For:

namespace Foo
{

struct Bar
{
  int First;
};

}

int
main()
{

Foo::Bar mine = {10};
return mine.First;

}

Jim

I guess the question is, How is that hash and the bucket computed? If it's
based on the full name, then you should be able to get fast exact lookup.
If it's based on the based name, then it will indeed be slow.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 25 2018, 1:47 PM
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state Needs Review.Oct 25 2018, 1:47 PM
This revision was automatically updated to reflect the committed changes.