Page MenuHomePhabricator

Names for structs are held on the Context, not the Module. Move getTypeByName from Module to Type taking a Context parameter.
ClosedPublic

Authored by nicholas on Apr 23 2020, 11:06 PM.

Details

Summary

Creating a named struct requires only a Context and a name, but looking up a struct by name requires a Module. The method on Module merely accesses the LLVMContextImpl and no data from the module itself, so this patch moves getTypeByName to a static method on StructType that takes a Context and a name.

There's a small number of users of this function, they are all updated.

This updates the C API adding a new method LLVMGetTypeByName2, naming suggestions welcome. I considered LLVMGetTypeByNameInContext, but the LLVM*InContext pattern is used to differentiate between methods that work on the global context and those that don't, and that isn't the case here.

Diff Detail

Event Timeline

nicholas created this revision.Apr 23 2020, 11:06 PM
nicholas retitled this revision from Names for structs held are on the Context, not the Module. Move getTypeByName off Module. to Names for structs are held on the Context, not the Module. Move getTypeByName from Module to Type taking a Context parameter..Apr 28 2020, 9:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2020, 9:50 AM

I think the new location is reasonable, I don't know about our C API policy though. Two questions below.

llvm/include/llvm-c/Core.h
873
  1. Should we use the deprecated attribute here?
  2. We cannot use attribute((overrload)) can we?
nickwasmer added inline comments.
llvm/include/llvm-c/Core.h
873
  1. There aren't any other uses of attribute deprecated, or any GNU-style attribute in the C API header files. At a guess, this might be for compatibility with non-GNU compilers.
  1. We can't. It would change the symbol name of both of them, which would be an ABI break for the existing LLVMGetTypeByName.
jdoerfert added inline comments.Apr 30 2020, 12:56 PM
llvm/include/llvm-c/Core.h
873
  1. Unclear if that is really a problem but OK.
  2. Makes sense.
CodaFi added a subscriber: CodaFi.Jun 9 2020, 12:39 PM

C API changes LGTM. My one request is to update the echo llvm-c-test to use the new API.

jdoerfert accepted this revision.Jun 10 2020, 1:41 PM

LLVM changes LGTM.

This revision is now accepted and ready to land.Jun 10 2020, 1:41 PM
nicholas updated this revision to Diff 272617.Jun 22 2020, 11:03 PM
nicholas added a project: Restricted Project.

Use LLVMGetTypeByName2 in llvm-c-test echo test too.

nicholas added inline comments.Jun 23 2020, 11:04 AM
llvm/include/llvm-c/Core.h
630

clang-tidy readability-identifier-naming doesn't like 'LLVMGetTypeByName2'. Should I ignore this because it's standard style for the LLVM C API? Or what should I change it to?

This revision was landed with ongoing or failed builds.Nov 30 2020, 11:35 AM
This revision was automatically updated to reflect the committed changes.

@nicholas, I missed this when it was reviewed, but seeing it now that you've landed it.

Can you help me understand if there's a deeper motivation than the one in the patch description? I ask because I would have envisioned eventually moving these "named" types to the module (i.e., keep the old API before your change, but change the implementation to make sense of it).

In particular, it would be nice if LLVMContext were restricted to types / values / metadata that are purely functional, and anything non-functional or open to interpretation is put on the Module.

  • For values and metadata, this requires some significant work (I have some concrete ideas of how, but it seems quite tangential to this patch).
  • For types, this could be done as a very simple follow-up to the opaque pointer work (@dblaikie @arsenm, not sure if you have thoughts). The Type class would be purely functional, and the struct names would move to a symbol table in each Module. The name is not inherent to the type, it's just a handle that some module wants to use to refer to it (the type itself can be shared between modules even if they disagree about what to call it).

One benefit of this hypothetical change is that we could change everything in LLVMContext to be immutable, potentially using lock-free data structures, etc.; I think that would open up some interesting possibilities.

@nicholas, I missed this when it was reviewed, but seeing it now that you've landed it.

Can you help me understand if there's a deeper motivation than the one in the patch description? I ask because I would have envisioned eventually moving these "named" types to the module (i.e., keep the old API before your change, but change the implementation to make sense of it).

My motivation really is only what's stated in the description, I was asked about this by an author of an LLVM wrapper for Rust, all the other APIs didn't need a Module and this one did. If I remember right, LLVM pre-3.0 used to work as you describe, at which point the current system where distinct names of structs imply distinct struct types was introduced leading to a major simplification / performance improvement in forward declarations and module linking. See the first bullet under "Internal API Changes" in https://releases.llvm.org/3.0/docs/ReleaseNotes.html . My justification for this patch is that it's just cleaning up this one spot that was missed way back in the 2.9 -> 3.0 cycle.

In particular, it would be nice if LLVMContext were restricted to types / values / metadata that are purely functional, and anything non-functional or open to interpretation is put on the Module.

  • For values and metadata, this requires some significant work (I have some concrete ideas of how, but it seems quite tangential to this patch).
  • For types, this could be done as a very simple follow-up to the opaque pointer work (@dblaikie @arsenm, not sure if you have thoughts). The Type class would be purely functional, and the struct names would move to a symbol table in each Module. The name is not inherent to the type, it's just a handle that some module wants to use to refer to it (the type itself can be shared between modules even if they disagree about what to call it).

I don't have much to add here, I'd simply recommend that you have a plan for dealing with the problems of type merging and refinement, structs whose members are pointers to that same struct (detect and reject creating a struct that holds itself as a member, %0 = type { \0 }), etc., before making the change.

I'm not familiar with the opaque pointer work and how it affects the types and context.

One benefit of this hypothetical change is that we could change everything in LLVMContext to be immutable, potentially using lock-free data structures, etc.; I think that would open up some interesting possibilities.

Offhand it's not clear to me that the LLVMContext becomes immutable? It still has to construct those pure-functional objects exactly once to ensure they're pointer-comparable, and that's a mutation? Perhaps you can do that without a lock, and that's close enough?

llvm/include/llvm-c/Core.h
630

clang-tidy readability-identifier-naming doesn't like 'LLVMGetTypeByName2'. Should I ignore this because it's standard style for the LLVM C API? Or what should I change it to?

It appears that this blocks buildbot due to the clang-tidy error. I'll try suppressing with NOLINT now, but it'd be nice if the automated checks were aware of the C API.

My justification for this patch is that it's just cleaning up this one spot that was missed way back in the 2.9 -> 3.0 cycle.

Fair enough, makes sense to me.

In particular, it would be nice if LLVMContext were restricted to types / values / metadata that are purely functional, and anything non-functional or open to interpretation is put on the Module.

  • For values and metadata, this requires some significant work (I have some concrete ideas of how, but it seems quite tangential to this patch).
  • For types, this could be done as a very simple follow-up to the opaque pointer work (@dblaikie @arsenm, not sure if you have thoughts). The Type class would be purely functional, and the struct names would move to a symbol table in each Module. The name is not inherent to the type, it's just a handle that some module wants to use to refer to it (the type itself can be shared between modules even if they disagree about what to call it).

I don't have much to add here, I'd simply recommend that you have a plan for dealing with the problems of type merging and refinement, structs whose members are pointers to that same struct (detect and reject creating a struct that holds itself as a member, %0 = type { \0 }), etc., before making the change.

I'm not familiar with the opaque pointer work and how it affects the types and context.

Yeah, that's exactly what's fixed by the opaque pointer work: it turns the type graph into a DAG. In your example, the type wasn't holding itself as a member (that's not possible), it was holding a pointer to itself as a member. Once pointers are opaque, then it's just holding a pointer, no cycles anymore.

One benefit of this hypothetical change is that we could change everything in LLVMContext to be immutable, potentially using lock-free data structures, etc.; I think that would open up some interesting possibilities.

Offhand it's not clear to me that the LLVMContext becomes immutable? It still has to construct those pure-functional objects exactly once to ensure they're pointer-comparable, and that's a mutation? Perhaps you can do that without a lock, and that's close enough?

Sorry, the objects held by the LLVMContext would be immutable; adding things to it could certainly race; but yeah, I think you can do that without a lock.

! In D78793#2424013, @dexonsmith wrote:
In particular, it would be nice if LLVMContext were restricted to types / values / metadata that are purely functional, and anything non-functional or open to interpretation is put on the Module.

  • For values and metadata, this requires some significant work (I have some concrete ideas of how, but it seems quite tangential to this patch).
  • For types, this could be done as a very simple follow-up to the opaque pointer work (@dblaikie @arsenm, not sure if you have thoughts). The Type class would be purely functional, and the struct names would move to a symbol table in each Module. The name is not inherent to the type, it's just a handle that some module wants to use to refer to it (the type itself can be shared between modules even if they disagree about what to call it).

I don't have much to add here, I'd simply recommend that you have a plan for dealing with the problems of type merging and refinement, structs whose members are pointers to that same struct (detect and reject creating a struct that holds itself as a member, %0 = type { \0 }), etc., before making the change.

I'm not familiar with the opaque pointer work and how it affects the types and context.

Yeah, that's exactly what's fixed by the opaque pointer work: it turns the type graph into a DAG. In your example, the type wasn't holding itself as a member (that's not possible), it was holding a pointer to itself as a member. Once pointers are opaque, then it's just holding a pointer, no cycles anymore.

My initial reaction, that sounds like a great proposal!

Let me just think out loud, so to type. It's been a while since I last thought hard about the consequences of changes to the LLVM IR. Back In the 2.x series, LLVM had an OpaqueType that could take the place of any type. It was something that might be resolved as part of linking, and this made module linking pretty complicated as opaques got resolved (not to mention the rest of LLVM had to deal with opaques though really except it didn't really inside function bodies). OpaqueType was restricted to StructType by 2.9 in preparation for the 3.0 change. I want to make sure that having a replacement for OpaqueType that's restricted to pointers doesn't reintroduce most of the same problems. After all, pointers aren't the only type that have elements, why not array types? Array types can have struct elements and struct elements can have array types, so we'll still need to do a recursive check that a type is acyclic when constructed. That doesn't sound too bad, though I think it will have to be O(n^2) unless you can get it in the right data representation to use Tarjan's algorithm which will probably be tricky. It seems that the big difference here (vs. LLVM 2.9) is that we can have opaque pointers propagated through the whole IR including all the instructions that operate on pointers. Then when linking we match up types by structure and we never have to do any type refinement. The remaining challenge here is forward declared structs, we want those to match up by name during module linking, but is there any reason that can't be handled the same as current LLVM does it? Same name in two modules, if both have bodies one gets renamed, otherwise they're the same type. The name can then live on the module and not the context. StructType::setBody can still be used to take a struct from body-less to one with a body, though it still has to check for non-pointer recursive types on construction. Ah, but you never need to because the only way to get a legal cycle is through a pointer anyhow! So you still need empty-body structs for the forward declaration case (as in, the original C code used a forward-declared struct -- note that struct Foo; struct Foo f(); returning a non-pointer forward declared struct is valid in C) to handle module linking, but that's the only case, in all other cases you can pass in all fields at construction time. So you don't really need setBody at all, the module linker just has to change the types from the opaque one to the one with a body when moving the values.

Ok, I like this plan. Best of luck!

! In D78793#2424013, @dexonsmith wrote:
In particular, it would be nice if LLVMContext were restricted to types / values / metadata that are purely functional, and anything non-functional or open to interpretation is put on the Module.

  • For values and metadata, this requires some significant work (I have some concrete ideas of how, but it seems quite tangential to this patch).
  • For types, this could be done as a very simple follow-up to the opaque pointer work (@dblaikie @arsenm, not sure if you have thoughts). The Type class would be purely functional, and the struct names would move to a symbol table in each Module. The name is not inherent to the type, it's just a handle that some module wants to use to refer to it (the type itself can be shared between modules even if they disagree about what to call it).

I don't have much to add here, I'd simply recommend that you have a plan for dealing with the problems of type merging and refinement, structs whose members are pointers to that same struct (detect and reject creating a struct that holds itself as a member, %0 = type { \0 }), etc., before making the change.

I'm not familiar with the opaque pointer work and how it affects the types and context.

Yeah, that's exactly what's fixed by the opaque pointer work: it turns the type graph into a DAG. In your example, the type wasn't holding itself as a member (that's not possible), it was holding a pointer to itself as a member. Once pointers are opaque, then it's just holding a pointer, no cycles anymore.

My initial reaction, that sounds like a great proposal!

Let me just think out loud, so to type. It's been a while since I last thought hard about the consequences of changes to the LLVM IR. Back In the 2.x series, LLVM had an OpaqueType that could take the place of any type. It was something that might be resolved as part of linking, and this made module linking pretty complicated as opaques got resolved (not to mention the rest of LLVM had to deal with opaques though really except it didn't really inside function bodies). OpaqueType was restricted to StructType by 2.9 in preparation for the 3.0 change. I want to make sure that having a replacement for OpaqueType that's restricted to pointers doesn't reintroduce most of the same problems. After all, pointers aren't the only type that have elements, why not array types? Array types can have struct elements and struct elements can have array types, so we'll still need to do a recursive check that a type is acyclic when constructed. That doesn't sound too bad, though I think it will have to be O(n^2) unless you can get it in the right data representation to use Tarjan's algorithm which will probably be tricky. It seems that the big difference here (vs. LLVM 2.9) is that we can have opaque pointers propagated through the whole IR including all the instructions that operate on pointers. Then when linking we match up types by structure and we never have to do any type refinement. The remaining challenge here is forward declared structs, we want those to match up by name during module linking, but is there any reason that can't be handled the same as current LLVM does it? Same name in two modules, if both have bodies one gets renamed, otherwise they're the same type. The name can then live on the module and not the context. StructType::setBody can still be used to take a struct from body-less to one with a body, though it still has to check for non-pointer recursive types on construction. Ah, but you never need to because the only way to get a legal cycle is through a pointer anyhow! So you still need empty-body structs for the forward declaration case (as in, the original C code used a forward-declared struct -- note that struct Foo; struct Foo f(); returning a non-pointer forward declared struct is valid in C) to handle module linking, but that's the only case, in all other cases you can pass in all fields at construction time. So you don't really need setBody at all, the module linker just has to change the types from the opaque one to the one with a body when moving the values.

Ok, I like this plan. Best of luck!

Thanks for thinking it through! I also appreciate hearing about the the pre-3.0 days; I wasn't aware of that history (and how we're revisiting similar solutions).

Note that if you want more discussion of the opaque pointers, David gave an LLVM talk a while ago (https://www.youtube.com/watch?v=OWgWDx_gB1I). He left a few exercises for the viewer/listener which I think are slowly being worked through (I've been following it mostly from a distance, but it appears to be getting close).

Thanks for thinking it through! I also appreciate hearing about the the pre-3.0 days; I wasn't aware of that history (and how we're revisiting similar solutions).

Yeah, appreciate the thoughts/context. (I came in /just/ after the 3.0 days - some of my first patches were cleaning up/removing "const" on "const Type" because Type was now immutable)

Note that if you want more discussion of the opaque pointers, David gave an LLVM talk a while ago (https://www.youtube.com/watch?v=OWgWDx_gB1I). He left a few exercises for the viewer/listener which I think are slowly being worked through (I've been following it mostly from a distance, but it appears to be getting close).

Yep, that's about the size of it. Few folks have chipped in here and there - glad to help/might be able to pick up some of it myself here and there.