This is an archive of the discontinued LLVM Phabricator instance.

Include getting generated struct offsets in CodegenABITypes
ClosedPublic

Authored by mppf on Oct 2 2017, 11:29 AM.

Details

Summary

This change adds a new function, CodeGen::getFieldNumber, that
enables a user of clang's code generation to get the field number
in a generated LLVM IR struct that corresponds to a particular field
in a C struct.

It is important to expose this information in Clang's code generation
interface because there is no reasonable way for users of Clang's code
generation to get this information. In particular:

  1. LLVM struct types do not include field names.
  2. Clang adds a non-trivial amount of logic to the code generation of LLVM IR types for structs, in particular to handle padding and bit fields.

Diff Detail

Repository
rL LLVM

Event Timeline

mppf created this revision.Oct 2 2017, 11:29 AM
mppf added a comment.Oct 2 2017, 11:35 AM

This in some ways is a philosophical continuation of https://reviews.llvm.org/D35180 .

I'm working on a programming language that uses Clang to create seamless C interoperability. Of of the language features is to create "extern records", which are just structs that have a definition in C code. The language needs to be able to code generate element access to particular structure fields by name, and without this patch, I need to use clang-internal headers to do so.

mppf removed a subscriber: cfe-commits.Oct 2 2017, 11:42 AM
mppf added a subscriber: cfe-commits.
aprantl edited edge metadata.Oct 3 2017, 8:58 AM

This needs some kind of testcase. Either a unittest or a test-only executable similar to, for example, clang-index-test.

mppf updated this revision to Diff 117579.Oct 3 2017, 2:06 PM
  • Add a test of getFieldNumber (and others)
rsmith edited edge metadata.Oct 3 2017, 3:03 PM

I have no philosophical objection to exposing this, but defer to @rjmccall on that. As is usual with our C++ API, there's no guarantee that we will keep this working in future versions. (Eg, if we move to an IR representation using something non-GEPable to represent a source-level class type, we would remove this function as it would be meaningless.)

This interface does not seem to be sufficient to support bit-fields. You should also indicate *which* record layout (complete object type or base subobject type) the field number is for. I don't think there's any guarantee that the same indexes work in both.

rjmccall added inline comments.Oct 3 2017, 7:08 PM
include/clang/CodeGen/CodeGenABITypes.h
81 โ†—(On Diff #117400)

"Given a non-bitfield struct field, return its index within the elements of the struct's converted type."

83 โ†—(On Diff #117400)

I think "getLLVMFieldNumber" is probably the right name for this.

mppf updated this revision to Diff 117665.Oct 4 2017, 6:51 AM
  • Improve comments, rename to getLLVMFieldNumber
mppf added a comment.Oct 4 2017, 7:04 AM

Thanks for the feedback!

"Given a non-bitfield struct field, return its index within the elements of the struct's converted type."

Done

I think "getLLVMFieldNumber" is probably the right name for this.

Great suggestion, done.

This interface does not seem to be sufficient to support bit-fields.

Right, supporting bit fields would make it more complex & isn't something I need right now anyway. With rjmccal's suggested comment, it's clear that bit fields won't work with this function. (Given a non-bitfield struct field...)

You should also indicate *which* record layout (complete object type or base subobject type) the field number is for. I don't think there's any guarantee that the same indexes work in both.

I added a comment about this - indicating that inherited fields are not supported by this function. My use case is just for C structs at the moment and I'm not actually sure what we'd need to do for the interface to support building GEPs for inherited fields. I'm sure it would complicate the API. I think we should leave that complexity to another function or a future function (when somebody actually needs it).

(Eg, if we move to an IR representation using something non-GEPable to represent a source-level class type, we would remove this function as it would be meaningless.)

In that event, I'd just want to have a way to build the appropriate field access. (e.g. a function to build an appropriate GEP; or a function that adds appropriate instructions to an IRBuilder and returns a Value* that points to the field, say). I'm open to trying to go directly to such an interface but I'd need significant help in designing/implementing it. (The interface I've proposed here meets my needs and I don't think I know enough about clang to design/implement a better one without help).

In D38473#888159, @mppf wrote:

You should also indicate *which* record layout (complete object type or base subobject type) the field number is for. I don't think there's any guarantee that the same indexes work in both.

I added a comment about this - indicating that inherited fields are not supported by this function. My use case is just for C structs at the moment and I'm not actually sure what we'd need to do for the interface to support building GEPs for inherited fields. I'm sure it would complicate the API. I think we should leave that complexity to another function or a future function (when somebody actually needs it).

I don't think your added comment is related to my request. There are two (potentially) different LLVM types for a C++ class type in general, one for the case where the object is a base class subobject (excluding vbases, and where the tail padding could be reused) and one for where it is a complete object (including vbases, and where the tail padding is known to be owned by the object). I don't know whether we guarantee that field indexes for one layout can be used for the other layout; in particular, I'm concerned that one could be a packed struct when the other isn't, and thus the field indexes could differ. (I think we generally use the complete object type within IR generation, even though that may seem like an odd choice.)

But perhaps we do guarantee the common prefixes are the same (@rjmccall: do we?), or we only ever expose one of the two layouts to code outside Clang, and in either case this wouldn't matter.

(Eg, if we move to an IR representation using something non-GEPable to represent a source-level class type, we would remove this function as it would be meaningless.)

In that event, I'd just want to have a way to build the appropriate field access. (e.g. a function to build an appropriate GEP; or a function that adds appropriate instructions to an IRBuilder and returns a Value* that points to the field, say). I'm open to trying to go directly to such an interface but I'd need significant help in designing/implementing it. (The interface I've proposed here meets my needs and I don't think I know enough about clang to design/implement a better one without help).

Given the generality of kinds of lvalues and rvalues that we support, this would mean exposing a significant chunk of the CodeGen interface, I think -- probably much more than we would be comfortable exposing.

rjmccall edited edge metadata.Oct 4 2017, 7:32 PM
In D38473#888159, @mppf wrote:

You should also indicate *which* record layout (complete object type or base subobject type) the field number is for. I don't think there's any guarantee that the same indexes work in both.

I added a comment about this - indicating that inherited fields are not supported by this function. My use case is just for C structs at the moment and I'm not actually sure what we'd need to do for the interface to support building GEPs for inherited fields. I'm sure it would complicate the API. I think we should leave that complexity to another function or a future function (when somebody actually needs it).

I don't think your added comment is related to my request. There are two (potentially) different LLVM types for a C++ class type in general, one for the case where the object is a base class subobject (excluding vbases, and where the tail padding could be reused) and one for where it is a complete object (including vbases, and where the tail padding is known to be owned by the object). I don't know whether we guarantee that field indexes for one layout can be used for the other layout; in particular, I'm concerned that one could be a packed struct when the other isn't, and thus the field indexes could differ. (I think we generally use the complete object type within IR generation, even though that may seem like an odd choice.)

But perhaps we do guarantee the common prefixes are the same (@rjmccall: do we?), or we only ever expose one of the two layouts to code outside Clang, and in either case this wouldn't matter.

I think we should probably just document that they're only valid in the complete-object type, which also, yes, happens to be the only type we expose outside of Clang because it's the type returned by convertTypeForMemory.

(Eg, if we move to an IR representation using something non-GEPable to represent a source-level class type, we would remove this function as it would be meaningless.)

In that event, I'd just want to have a way to build the appropriate field access. (e.g. a function to build an appropriate GEP; or a function that adds appropriate instructions to an IRBuilder and returns a Value* that points to the field, say). I'm open to trying to go directly to such an interface but I'd need significant help in designing/implementing it. (The interface I've proposed here meets my needs and I don't think I know enough about clang to design/implement a better one without help).

Given the generality of kinds of lvalues and rvalues that we support, this would mean exposing a significant chunk of the CodeGen interface, I think -- probably much more than we would be comfortable exposing.

I thought about making this sort of comment and decided that if we added more non-GEPable kinds of fields, we could always just put them after bit-fields in the exceptions list.

mppf updated this revision to Diff 117806.Oct 5 2017, 6:18 AM
  • Update comment for "complete object type"
mppf added a comment.Oct 5 2017, 6:19 AM

@rjmccall , @rsmith - I added a comment about complete object type. I think I've addressed all the concerns at this point, but speak up if I'm misunderstanding.

I don't have commit access so I'll need help committing it. Thanks for your help!

rsmith added a comment.Oct 5 2017, 1:45 PM

Looks good to me.

mppf added a comment.Oct 10 2017, 7:25 AM

Since I don't have commit access, could somebody commit this for me please? Thanks.

@rjmccall Do you have any more comments?

I can commit this for you if John is happy with it.

rjmccall accepted this revision.Oct 10 2017, 4:15 PM

LGTM.

This revision is now accepted and ready to land.Oct 10 2017, 4:15 PM
This revision was automatically updated to reflect the committed changes.