This is an archive of the discontinued LLVM Phabricator instance.

[docs][OpaquePtr] Add detail to motivations behind opaque pointers
ClosedPublic

Authored by aeubanks on May 24 2022, 10:58 AM.

Details

Reviewers

rnk
nikic

Group Reviewers

Restricted Project

Commits

rG47bfc365fc84: [docs][OpaquePtr] Add detail to motivations behind opaque pointers

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aeubanks created this revision. May 24 2022, 10:58 AM

Herald added a project: Restricted Project. May 24 2022, 10:58 AM

aeubanks requested review of this revision. May 24 2022, 10:58 AM

Herald added a subscriber: llvm-commits.

aeubanks added reviewers: Restricted Project, rnk. May 24 2022, 10:58 AM

Harbormaster completed remote builds in B166089: Diff 431727. May 24 2022, 11:59 AM

If you want to add any of the history, looks like this is my first email proposing the direction: https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html

reference post from 2015

Harbormaster completed remote builds in B166292: Diff 432024. May 25 2022, 9:59 AM

rnk added inline comments. May 25 2022, 11:59 AM

llvm/docs/OpaquePointers.rst
51–101	I believe I provided this wording suggestion, but I think it needs work. I did a bit of digging, and the original 2003 publication is explicit that the types were included with the intention that they would support optimization:

"The architecture that we propose is based on a new language-independent low-level code representation that preserves important type information from the source code. ... However, the link-time optimizer can only perform meaningful optimizations on the program if it has enough high-level information about the program to prove that aggressive optimizations are safe. Because of this, the low-level code representation is typed (using a language-independent constructive type system) and directly exposes information about structure and array accesses to the optimizer. ...."

Originally, LLVM was a research project with a goal of enabling fancy optimizations (see the DSA paper). As LLVM evolved into a production compiler, the community started to realize that the LLVM struct type system, or at least the way llvm-gcc used it, couldn't really be used as a sound basis for alias analysis. The DSA alias analysis was removed from LLVM in 2006.

So with that in mind, here's a wording suggestion:

LLVM's type system was originally designed to support high-level optimization. However, years of LLVM implementation experience have demonstrated that the current pointee type system design does not effectively support optimization. Memory optimization algorithms, such as SROA, GVN, and AA, generally need to look through LLVM's struct types and reason about the underlying memory offsets. The community realized that pointee types are hindering LLVM development, rather than helping it.

Pointee types provide some value to frontends because the IR verifier uses types to detect straightforward type confusion bugs. However, frontends also have to deal with the complexity of inserting bitcasts everywhere that they might be required. The current community consensus is that the costs of pointee types outweigh the benefits, and that they should be removed.

update

+@nikic

This revision is now accepted and ready to land. Jun 15 2022, 4:21 PM

Harbormaster completed remote builds in B170146: Diff 437388. Jun 15 2022, 5:40 PM

Not familiar with the historical context, but looks fine to me :)

This revision was landed with ongoing or failed builds. Jun 16 2022, 10:17 AM

Closed by commit rG47bfc365fc84: [docs][OpaquePtr] Add detail to motivations behind opaque pointers (authored by aeubanks).

This revision was automatically updated to reflect the committed changes.

aeubanks added a commit: rG47bfc365fc84: [docs][OpaquePtr] Add detail to motivations behind opaque pointers.

foad added a subscriber: foad. Jun 17 2022, 2:40 AM

foad added inline comments.

llvm/docs/OpaquePointers.rst
55	"hindrance"

Revision Contents

Path: llvm/docs/OpaquePointers.rst

Size: 100 lines

Diff 437592

llvm/docs/OpaquePointers.rst

	===============			===============
	Opaque Pointers			Opaque Pointers
	===============			===============

	The Opaque Pointer Type			The Opaque Pointer Type
	=======================			=======================

	Traditionally, LLVM IR pointer types have contained a pointee type. For example,			Traditionally, LLVM IR pointer types have contained a pointee type. For example,
	``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However,			``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However,
	due to a lack of pointee type semantics and various issues with having pointee			due to a lack of pointee type semantics and various issues with having pointee
	types, there is a desire to remove pointee types from pointers.			types, there is a desire to remove pointee types from pointers.

	The opaque pointer type project aims to replace all pointer types containing			The opaque pointer type project aims to replace all pointer types containing
	pointee types in LLVM with an opaque pointer type. The new pointer type is			pointee types in LLVM with an opaque pointer type. The new pointer type is
	tentatively represented textually as ``ptr``.			represented textually as ``ptr``.

				Some instructions still need to know what type to treat the memory pointed to by
				the pointer as. For example, a load needs to know how many bytes to load from
				memory and what type to treat the resulting value as. In these cases,
				instructions themselves contain a type argument. For example the load
				instruction from older versions of LLVM

				.. code-block:: llvm

				load i64* %p

				becomes

				.. code-block:: llvm

				load i64, ptr %p

	Address spaces are still used to distinguish between different kinds of pointers			Address spaces are still used to distinguish between different kinds of pointers
	where the distinction is relevant for lowering (e.g. data vs function pointers			where the distinction is relevant for lowering (e.g. data vs function pointers
	have different sizes on some architectures). Opaque pointers are not changing			have different sizes on some architectures). Opaque pointers are not changing
	anything related to address spaces and lowering. For more information, see			anything related to address spaces and lowering. For more information, see
	`DataLayout <LangRef.html#langref-datalayout>`_. Opaque pointers in non-default			`DataLayout <LangRef.html#langref-datalayout>`_. Opaque pointers in non-default
	address space are spelled ``ptr addrspace(N)``.			address space are spelled ``ptr addrspace(N)``.

				This was proposed all the way back in
				`2015 <https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html>`_.

	Issues with explicit pointee types			Issues with explicit pointee types
	==================================			==================================

	LLVM IR pointers can be cast back and forth between pointers with different			LLVM IR pointers can be cast back and forth between pointers with different
	pointee types. The pointee type does not necessarily represent the actual			pointee types. The pointee type does not necessarily represent the actual
	underlying type in memory. In other words, the pointee type carries no real			underlying type in memory. In other words, the pointee type carries no real
	semantics.			semantics.

	Lots of operations do not actually care about the underlying type. These			Historically LLVM was some sort of type-safe subset of C. Having pointee types
	operations, typically intrinsics, usually end up taking an ``i8*``. This causes			provided an extra layer of checks to make sure that the Clang frontend matched
	lots of redundant no-op bitcasts in the IR to and from a pointer with a			its frontend values/operations with the corresponding LLVM IR. However, as other
	different pointee type. The extra bitcasts take up space and require extra work			languages like C++ adopted LLVM, the community realized that pointee types were
	to look through in optimizations. And more bitcasts increase the chances of			more of a hinderance for LLVM development and that the extra type checking with
	incorrect bitcasts, especially in regards to address spaces.			some frontends wasn't worth it.

	Some instructions still need to know what type to treat the memory pointed to by			LLVM's type system was `originally designed
	the pointer as. For example, a load needs to know how many bytes to load from			<https://llvm.org/pubs/2003-05-01-GCCSummit2003.html>` to support high-level
	memory. In these cases, instructions themselves contain a type argument. For			optimization. However, years of LLVM implementation experience have demonstrated
	example the load instruction from older versions of LLVM			that the pointee type system design does not effectively support
				optimization. Memory optimization algorithms, such as SROA, GVN, and AA,
	.. code-block:: llvm			generally need to look through LLVM's struct types and reason about the
				underlying memory offsets. The community realized that pointee types hinder LLVM
	load i64* %p			development, rather than helping it. Some of the initially proposed high-level
				optimizations have evolved into `TBAA
	becomes			<https://llvm.org/docs/LangRef.html#tbaa-metadata>` due to limitations with
				representing higher-level language information directly via SSA values.
	.. code-block:: llvm
				Pointee types provide some value to frontends because the IR verifier uses types
	load i64, ptr %p			to detect straightforward type confusion bugs. However, frontends also have to
				deal with the complexity of inserting bitcasts everywhere that they might be
	A nice analogous transition that happened earlier in LLVM is integer signedness.			required. The community consensus is that the costs of pointee types
	There is no distinction between signed and unsigned integer types, rather the			outweight the benefits, and that they should be removed.
	integer operations themselves contain what to treat the integer as. Initially,
	LLVM IR distinguished between unsigned and signed integer types. The transition			Many operations do not actually care about the underlying type. These
	from manifesting signedness in types to instructions happened early on in LLVM's			operations, typically intrinsics, usually end up taking an arbitrary pointer
	life to the betterment of LLVM IR.			type ``i8*`` and sometimes a size. This causes lots of redundant no-op bitcasts
				in the IR to and from a pointer with a different pointee type.

				No-op bitcasts take up memory/disk space and also take up compile time to look
				through. However, perhaps the biggest issue is the code complexity required to
				deal with bitcasts. When looking up through def-use chains for pointers it's
				easy to forget to call `Value::stripPointerCasts()` to find the true underlying
				pointer obfuscated by bitcasts. And when looking down through def-use chains
				passes need to iterate through bitcasts to handle uses. Removing no-op pointer
				bitcasts prevents a category of missed optimizations and makes writing LLVM
				passes a little bit easier.

				Fewer no-op pointer bitcasts also reduces the chances of incorrect bitcasts in
				regards to address spaces. People maintaining backends that care a lot about
				address spaces have complained that frontends like Clang often incorrectly
				bitcast pointers, losing address space information.

				An analogous transition that happened earlier in LLVM is integer signedness.
				Currently there is no distinction between signed and unsigned integer types, but
				rather each integer operation (e.g. add) contains flags to signal how to treat
				the integer. Previously LLVM IR distinguished between unsigned and signed
				integer types and ran into similar issues of no-op casts. The transition from
				manifesting signedness in types to instructions happened early on in LLVM's
				timeline to make LLVM easier to work with.

	Opaque Pointers Mode			Opaque Pointers Mode
	====================			====================

	During the transition phase, LLVM can be used in two modes: In typed pointer			During the transition phase, LLVM can be used in two modes: In typed pointer
	mode all pointer types have a pointee type and opaque pointers cannot be used.			mode all pointer types have a pointee type and opaque pointers cannot be used.
	In opaque pointers mode (the default), all pointers are opaque. The opaque			In opaque pointers mode (the default), all pointers are opaque. The opaque
	pointer mode can be disabled using ``-opaque-pointers=0`` in			pointer mode can be disabled using ``-opaque-pointers=0`` in