This is an archive of the discontinued LLVM Phabricator instance.

[docs] Add page on opaque pointer types
ClosedPublic

Authored by aeubanks on May 11 2021, 5:24 PM.

Diff Detail

Event Timeline

aeubanks created this revision.May 11 2021, 5:24 PM
aeubanks requested review of this revision.May 11 2021, 5:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2021, 5:24 PM

Looks like a good start - might benefit from a smidge more structure (see comment)

llvm/docs/OpaquePointers.rst
31–34

This is the first paragraph that seems to assume the existence/description/knowledge of the future state (opaque pointers) but nothing before this paragraph seems to have described that future, so the motivation for the suggestion to frontends to keep track of pointee types on their own is missing due to that.

Might be worth separate headings for "Motivation", "Solution", and "fallout"/necessary changes (as either prerequisite cleanup in LLVM itself, or as "this is the sort of thing you need to do now that you're trying to integrate your LLVM-using code with a new version of LLVM that lacks pointee types")

aeubanks updated this revision to Diff 344665.May 11 2021, 10:25 PM

more structure

Matt added a subscriber: Matt.May 12 2021, 2:32 AM
nlopes added a subscriber: nlopes.May 12 2021, 3:11 AM

Can you clarify if this only for data type or also for function types pls?
(data and function pointer types don't have the same bit-width on all architectures)

Can you clarify if this only for data type or also for function types pls?
(data and function pointer types don't have the same bit-width on all architectures)

This should apply to all pointer types.
Do you have an example in LLVM of where this matters for function vs data?
In TargetMachine I only see the pointer size differences per address space, not function vs data. TargetMachine::getPointerSize()/DataLayout::getPointerSize().

aeubanks updated this revision to Diff 344890.May 12 2021, 11:42 AM

add section on what frontends need to do

Can you clarify if this only for data type or also for function types pls?
(data and function pointer types don't have the same bit-width on all architectures)

This should apply to all pointer types.
Do you have an example in LLVM of where this matters for function vs data?
In TargetMachine I only see the pointer size differences per address space, not function vs data. TargetMachine::getPointerSize()/DataLayout::getPointerSize().

I'm not an expert on the backend stuff. I just wanted to get it documented, while we are at it, whether LLVM supports architectures with different data/function pointer sizes or not.
I guess the answer that you point at, is that one needs to use different address spaces if working with such architectures. I would be grateful if you could document this here as well.

llvm/docs/OpaquePointers.rst
128

this sounds like you can't use GEP anymore. I believe you mean certain ways of building GEPs have to be replaced? Maybe give an example of old/new API?

aeubanks updated this revision to Diff 344896.May 12 2021, 11:56 AM

give example on what instruction builder APIs to avoid

Can you clarify if this only for data type or also for function types pls?
(data and function pointer types don't have the same bit-width on all architectures)

This should apply to all pointer types.
Do you have an example in LLVM of where this matters for function vs data?
In TargetMachine I only see the pointer size differences per address space, not function vs data. TargetMachine::getPointerSize()/DataLayout::getPointerSize().

I'm not an expert on the backend stuff. I just wanted to get it documented, while we are at it, whether LLVM supports architectures with different data/function pointer sizes or not.
I guess the answer that you point at, is that one needs to use different address spaces if working with such architectures. I would be grateful if you could document this here as well.

I'm also not an expert on backend stuff, some clarification from somebody more knowledgeable would be great. Although documenting things like using different address spaces in those cases seems unrelated, that wouldn't change (assuming that's true) with opaque pointers, and the first section does mention that anything with address spaces is unchanged.

Can you clarify if this only for data type or also for function types pls?
(data and function pointer types don't have the same bit-width on all architectures)

This should apply to all pointer types.
Do you have an example in LLVM of where this matters for function vs data?
In TargetMachine I only see the pointer size differences per address space, not function vs data. TargetMachine::getPointerSize()/DataLayout::getPointerSize().

I'm not an expert on the backend stuff. I just wanted to get it documented, while we are at it, whether LLVM supports architectures with different data/function pointer sizes or not.
I guess the answer that you point at, is that one needs to use different address spaces if working with such architectures. I would be grateful if you could document this here as well.

Yeah, that seems out of scope for this document. https://llvm.org/docs/LangRef.html#pointer-type documents that pointers to functions and pointers to other things are all roughly the same sort of thing - and we're changing all of them over to the opaque pointer type. & other wording generally treats all pointers as having the same size, etc. I think that's best left elsewhere than this doc if there's no interesting distinction between function pointers and data pointers currently (which it doesn't seem like there is).

dblaikie accepted this revision.May 13 2021, 1:26 PM

This looks alright to me as a starting point - welcome to wait a bit/seek other feedback/approvals (especially if there are other outstanding comments) - but also should be easy to address in post-commit too.

This revision is now accepted and ready to land.May 13 2021, 1:26 PM
dexonsmith accepted this revision.May 13 2021, 2:27 PM

LGTM too. This is great to have; thanks for writing this up!

This revision was automatically updated to reflect the committed changes.

Yeah, that seems out of scope for this document. https://llvm.org/docs/LangRef.html#pointer-type documents that pointers to functions and pointers to other things are all roughly the same sort of thing - and we're changing all of them over to the opaque pointer type. & other wording generally treats all pointers as having the same size, etc. I think that's best left elsewhere than this doc if there's no interesting distinction between function pointers and data pointers currently (which it doesn't seem like there is).

Note that we have support for different code and data representations with the P flag in the DataLayout. We can also differentiate between pointers to globals with G and there's an under-review patch for a specified address space for allocas. I think it's worth highlighting in this document that address spaces are the recommended way of representing different kinds of pointers (with potentially different sizes, access to different types of memory, representations of null, and so on). We retain the distinction between pointer types where the semantics matter for lowering, we lose the distinction when the distinction doesn't convey any useful meaning.

llvm/docs/OpaquePointers.rst
18

I don't think this needs stating, especially given how important address spaces are in motivating this work: The need to cast to i8* makes a lot of code need to be AS-aware if integrates with code that uses address spaces, whereas with opaque pointers you don't need the bitcasts and so you don't need to care about the pointer AS.

Yeah, that seems out of scope for this document. https://llvm.org/docs/LangRef.html#pointer-type documents that pointers to functions and pointers to other things are all roughly the same sort of thing - and we're changing all of them over to the opaque pointer type. & other wording generally treats all pointers as having the same size, etc. I think that's best left elsewhere than this doc if there's no interesting distinction between function pointers and data pointers currently (which it doesn't seem like there is).

Note that we have support for different code and data representations with the P flag in the DataLayout. We can also differentiate between pointers to globals with G and there's an under-review patch for a specified address space for allocas. I think it's worth highlighting in this document that address spaces are the recommended way of representing different kinds of pointers (with potentially different sizes, access to different types of memory, representations of null, and so on). We retain the distinction between pointer types where the semantics matter for lowering, we lose the distinction when the distinction doesn't convey any useful meaning.

This is good information to know (I didn't know about it before), but IMO address spaces should have their own documentation (that I'd be happy to link to here). This seems orthogonal to opaque pointers.

Yeah, that seems out of scope for this document. https://llvm.org/docs/LangRef.html#pointer-type documents that pointers to functions and pointers to other things are all roughly the same sort of thing - and we're changing all of them over to the opaque pointer type. & other wording generally treats all pointers as having the same size, etc. I think that's best left elsewhere than this doc if there's no interesting distinction between function pointers and data pointers currently (which it doesn't seem like there is).

Note that we have support for different code and data representations with the P flag in the DataLayout. We can also differentiate between pointers to globals with G and there's an under-review patch for a specified address space for allocas. I think it's worth highlighting in this document that address spaces are the recommended way of representing different kinds of pointers (with potentially different sizes, access to different types of memory, representations of null, and so on). We retain the distinction between pointer types where the semantics matter for lowering, we lose the distinction when the distinction doesn't convey any useful meaning.

This is good information to know (I didn't know about it before), but IMO address spaces should have their own documentation (that I'd be happy to link to here). This seems orthogonal to opaque pointers.

Probably address spaces should have their own document (for now you could point at the LangRef documentation on addrspace), but if this is going to be a common question, it might be worth answering here as well; and precisely defining how opaque "opaque" is seems on topic (e.g., @theraven's comment about retaining differences for lowering semantics).