This is an archive of the discontinued LLVM Phabricator instance.

Differential D135158

[DataLayout] Introduce DataLayout::getPointerIntegralSize(AS)
Needs Review · Public

Authored by arichardson on Oct 4 2022, 7:32 AM.

Details

Reviewers

nikic
theraven
jrtc27
reames
mkazantsev

Summary

This function can be used to retrieve the number of bits that can be used
for arithmetic in a given address space (i.e. the range of the address
space). For most (all?) in-tree targets this should not make any difference,
but differentiating between the size of a pointer in bits and the address
range is extremely important e.g. for CHERI-enabled targets, where pointers
carry additional metadata such as bounds and permissions and only a subset
of the pointer bits is used as the address. This could also benefit other
users of non-integral pointers but I am not familiar with any of those
backends. In the out-of-tree CHERI target, we use the index width of the
datalayout to identify the address range (and I believe this was also
the intent when the index type was introduced in D42123).
This commit is just a clarification of the LangRef as well as
the introduction of new functions that more clearly explain the
intended usage. In the future, if we end up supporting a target
with index width != integral range, we could add a new datalayout
component.

I am not sure whether getPointerIntegralSize() is the best name for this
accessor. I also considered names such as getPointerArithmeticRange() and
getPointerAddressRange().
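
To make the distinction concrete, here is a minimal self-contained C++ sketch. It does not use LLVM: `PointerSpec`, `pointerSize` and `pointerIntegralSize` are hypothetical stand-ins that mirror the two widths stored per address space and the byte rounding done by the accessors in this patch. The 128/64 split is the CHERI example from the header comment in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for one pointer entry of a data layout (not LLVM's class).
struct PointerSpec {
  uint32_t TypeBitWidth;  // full in-memory width of the pointer
  uint32_t IndexBitWidth; // width used for address arithmetic
};

// Round a bit count up to a whole number of bytes.
constexpr uint32_t bitsToBytes(uint32_t Bits) { return (Bits + 7) / 8; }

// Size of the whole pointer in bytes, metadata bits included.
constexpr uint32_t pointerSize(const PointerSpec &P) {
  return bitsToBytes(P.TypeBitWidth);
}

// Size of the address portion in bytes. As in the patch, this is assumed
// to be the same as the index width.
constexpr uint32_t pointerIntegralSize(const PointerSpec &P) {
  return bitsToBytes(P.IndexBitWidth);
}

// Usage: a CHERI-like layout versus a conventional 64-bit layout.
inline void demoPointerSizes() {
  const PointerSpec Cheri{128, 64};
  const PointerSpec Plain{64, 64};
  assert(pointerSize(Cheri) == 16 && pointerIntegralSize(Cheri) == 8);
  assert(pointerSize(Plain) == pointerIntegralSize(Plain));
}
```

On a conventional target the two accessors agree, which is why the summary expects no change for in-tree targets.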

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

arichardson created this revision. Oct 4 2022, 7:32 AM

Herald added a project: Restricted Project. Oct 4 2022, 7:32 AM

Herald added subscribers: jdoerfert, hiraditya.

arichardson requested review of this revision. Oct 4 2022, 7:32 AM

Herald added a project: Restricted Project. Oct 4 2022, 7:32 AM

Herald added a subscriber: llvm-commits.

I don't really understand what the distinction between the index size and the integral size is supposed to be. Can you please give some examples where these two quantities would differ?

I would be extremely wary of adding a third dimension to pointer sizes. The index size is already enough of a burden to deal with, and we've only recently approached something resembling correct use of index sizes.

In D135158#3833549, @nikic wrote:

I don't really understand what the distinction between the index size and the integral size is supposed to be. Can you please give some examples where these two quantities would differ?

I would be extremely wary of adding a third dimension to pointer sizes. The index size is already enough of a burden to deal with, and we've only recently approached something resembling correct use of index sizes.

Ah, in that case I would be more than happy to drop this distinction and just clarify the datalayout that the index size is the size of the address (which is what we do for CHERI already).
I added this new type based on your comment in D99660:

The index type is used for GEP index and casting a pointer to the index type doesn't make a whole lot of sense to me. Do you have any LangRef wording or other usages that would show that this is a sensible thing to do?

Would a clarification to the LangRef be sufficient?

Basically, the problem I'm trying to solve here is that code introducing ptrtoint currently uses DL.getPointerSize() to obtain the integer type, but for us this should be the address size (i.e. 64 for ELFCLASS64).
We don't want ptrtoint to give us i128 types, since only 64 of those bits are meaningful (but using ptrtoint is perfectly fine, so the current non-integral constraints are too restrictive).

arichardson added a child revision: D99660: Use DL.getIndexType() in Value::getPointerAlignment(). Oct 4 2022, 7:54 AM

Avoid adding a new datalayout component and only add the function

arichardson edited the summary of this revision. Oct 4 2022, 8:13 AM

Harbormaster completed remote builds in B190217: Diff 465019. Oct 4 2022, 8:45 AM

But I thought we banned ptrtoint for nonintegral pointers

CHERI capabilities aren't non-integral. Converting a capability to an integer gives you the address, discarding the metadata (which is also stable*), which is as well-defined as it is on normal architectures. Non-integral pointers are weird unstable things where ptrtoint on the same thing can give different results and so you can't introduce new instructions during optimisation (but so long as you know what you're doing it's fine to create them in the first place in the frontend).

* Technically not true when revocation is involved if you inspect the metadata of a capability referring to an object past the end of its lifetime, but that is highly UB and constrained in terms of what you can observe.

Is this needed for anything but the one usage in D99660? We could just stop using ptrtoint there and just explicitly check for the nullptr and inttoptr cases, which is all this handles in practice. This overly general constant expression based code has already caused enough complications in the past, so I'm happy to drop it.

I don't think producing a ptr2int out of thin air (without a known result type) is common, and should probably be avoided in general. The main case where we currently introduce ptr2int is when type punning through memory, and in that case the load/store type determines the used type.

In D135158#3834627, @nikic wrote:

Is this needed for anything but the one usage in D99660? We could just stop using ptrtoint there and just explicitly check for the nullptr and inttoptr cases, which is all this handles in practice. This overly general constant expression based code has already caused enough complications in the past, so I'm happy to drop it.

I don't think producing a ptr2int out of thin air (without a known result type) is common, and should probably be avoided in general. The main case where we currently introduce ptr2int is when type punning through memory, and in that case the load/store type determines the used type.

There are also a few other cases where code needs the address range instead of the pointer size, but that does not matter for most targets. I'll upload a few more patches that show where this function is needed for CHERI.

CHERI capabilities aren't non-integral. Converting a capability to an integer gives you the address, discarding the metadata (which is also stable*), which is as well-defined as it is on normal architectures.

Note that this is purely for historical reasons: Non-integral pointers were not available when we started. I worked with the folks pushing NI pointers to make sure that they'd work with CHERI and the plan was always to move over to NI at some point.

I overloaded the semantics of ptrtoint as get-address as a quick hack to make things work but it's definitely not the right thing to do and we spent several years working with other folks with similar requirements to upstream features that would let us move away from this. The behaviour of CHERI's ptrtoint is confusing to optimisers. I believe that the correct solution (which, I vaguely recall, I wrote in a roadmap doc before I left the CL) is:

1. Make CHERI use NI pointers.
2. Make clang generate get-address intrinsics instead of ptrtoint.
3. Help fix any optimisations that try to introduce ptrtoint for NI pointers.
4. Make the CHERI back ends reject PTRTOINT DAG nodes.

CHERI LLVM IR should only ever have get-address and set-address intrinsics, no inttoptr and ptrtoint instructions. This should improve optimisations for things that round-trip addresses through pointers because ptrtoint is treated as an escape for alias analysis, whereas get-address is not and set-address is explicitly a provenance-carrying operation that works directly with alias analysis. This structure is also desirable for Rust's Strict Provenance model, even with non-CHERI targets.
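
As a rough illustration of the get-address / set-address split described above, the following C++ sketch models a capability as an address plus opaque metadata. This is a toy model under stated assumptions: `Capability`, `getAddress` and `setAddress` are hypothetical names, not the actual CHERI intrinsics, and real capabilities use a compressed encoding of bounds and permissions.

```cpp
#include <cassert>
#include <cstdint>

// Toy 128-bit capability: 64 address bits plus 64 bits of opaque metadata.
struct Capability {
  uint64_t Address;
  uint64_t Metadata;
};

// Hypothetical get-address: yields only the address bits.
inline uint64_t getAddress(const Capability &C) { return C.Address; }

// Hypothetical set-address: derives a new capability from an existing one,
// so the result keeps the metadata (provenance) of its source.
inline Capability setAddress(const Capability &C, uint64_t NewAddress) {
  return Capability{NewAddress, C.Metadata};
}

// Address arithmetic done on the integer, then re-attached to the source.
inline Capability advance(const Capability &C, uint64_t Offset) {
  return setAddress(C, getAddress(C) + Offset);
}

// Usage: the round trip changes the address and nothing else.
inline void demoRoundTrip() {
  const Capability C{0x1000, 0xABCD};
  const Capability D = advance(C, 16);
  assert(getAddress(D) == 0x1010);
  assert(D.Metadata == C.Metadata);
}
```

The point of the sketch is only that the integer produced by get-address is address-sized (64 bits here), while the operation that rebuilds a pointer always names the capability it derives from.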

This is also the approach that I've recommended to the embecosm folks working on preparing CHERI LLVM for upstreaming. It would be great if more people could join those calls.

Except then you can't do unsigned long x = (unsigned long)&y as intrinsics are not constant expressions. Non-integral pointers are too strict for CHERI as things stand, we want a subset of their behaviour.

In D135158#3836839, @jrtc27 wrote:

Except then you can't do unsigned long x = (unsigned long)&y as intrinsics are not constant expressions. Non-integral pointers are too strict for CHERI as things stand, we want a subset of their behaviour.

You can, however, do it as a constant address-space cast followed by a ptrtoint, which would make lowering easier in the back end because a constant address-space cast of a global should give a non-capability relocation.

Not sure I can give any useful feedback here.

Revision Contents

Path                                           Size
llvm/docs/LangRef.rst                          8 lines
llvm/include/llvm/IR/DataLayout.h              26 lines
llvm/lib/IR/DataLayout.cpp                     6 lines
llvm/test/Assembler/invalid-datalayout25.ll    6 lines
llvm/test/Assembler/invalid-datalayout26.ll    6 lines

Diff 465019

llvm/docs/LangRef.rst

Show First 20 Lines • Show All 2,748 Lines • ▼ Show 20 Lines	``G<address space>``
LLVM passes).		LLVM passes).
``A<address space>``		``A<address space>``
Specifies the address space of objects created by '``alloca``'.		Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.		Defaults to the default address space of 0.
``p[n]:<size>:<abi>[:<pref>][:<idx>]``		``p[n]:<size>:<abi>[:<pref>][:<idx>]``
This specifies the size of a pointer and its ``<abi>`` and		This specifies the size of a pointer and its ``<abi>`` and
``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional		``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the		and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
index that used for address calculation. If not		index that used for address calculation. For targets that include additional
specified, the default index size is equal to the pointer size. All sizes		non-integral bits in the pointer representation (e.g. fat pointer metadata),
are in bits. The address space, ``n``, is optional, and if not specified,		``<idx>`` also specifies the number of integral bits, i.e. the address space
		range. If not specified, the default index size is equal to the pointer
		size. The address space, ``n``, is optional, and if not specified,
denotes the default address space 0. The value of ``n`` must be		denotes the default address space 0. The value of ``n`` must be
in the range [1,2^23).		in the range [1,2^23).
``i<size>:<abi>[:<pref>]``		``i<size>:<abi>[:<pref>]``
This specifies the alignment for an integer type of a given bit		This specifies the alignment for an integer type of a given bit
``<size>``. The value of ``<size>`` must be in the range [1,2^23).		``<size>``. The value of ``<size>`` must be in the range [1,2^23).
``<pref>`` is optional and defaults to ``<abi>``.		``<pref>`` is optional and defaults to ``<abi>``.
``v<size>:<abi>[:<pref>]``		``v<size>:<abi>[:<pref>]``
This specifies the alignment for a vector type of a given bit		This specifies the alignment for a vector type of a given bit
▲ Show 20 Lines • Show All 23,049 Lines • Show Last 20 Lines

llvm/include/llvm/IR/DataLayout.h

Show First 20 Lines • Show All 367 Lines • ▼ Show 20 Lines	public:
/// Layout pointer alignment		/// Layout pointer alignment
Align getPointerABIAlignment(unsigned AS) const;		Align getPointerABIAlignment(unsigned AS) const;

/// Return target's alignment for stack-based pointers		/// Return target's alignment for stack-based pointers
/// FIXME: The defaults need to be removed once all of		/// FIXME: The defaults need to be removed once all of
/// the backends/clients are updated.		/// the backends/clients are updated.
Align getPointerPrefAlignment(unsigned AS = 0) const;		Align getPointerPrefAlignment(unsigned AS = 0) const;

/// Layout pointer size in bytes, rounded up to a whole		/// Layout pointer size in bytes, rounded up to a whole number of bytes. The
/// number of bytes.		/// difference between this function and getPointerIntegralSize() is this one
		/// returns the size of the entire pointer type (this includes metadata bits
		/// for fat pointers) and the latter only returns the number of address bits.
		/// \sa DataLayout::getPointerIntegralSize
/// FIXME: The defaults need to be removed once all of		/// FIXME: The defaults need to be removed once all of
/// the backends/clients are updated.		/// the backends/clients are updated.
unsigned getPointerSize(unsigned AS = 0) const;		unsigned getPointerSize(unsigned AS = 0) const;

		/// Returns the integral size of a pointer in a given address space in bytes.
		/// For targets that store bits in pointers that are not part of the address,
		/// this returns the number of bits that can be manipulated using operations
		/// that change the address (e.g. addition/subtraction).
		/// For example, a 64-bit CHERI-enabled target has 128-bit pointers of which
		/// only 64 are used to represent the address and the remaining ones are used
		/// for metadata such as bounds and access permissions. In this case
		/// getPointerSize() returns 16, but getPointerIntegralSize() returns 8.
		/// \sa DataLayout::getPointerSize
		unsigned getPointerIntegralSize(unsigned AS) const;

/// Returns the maximum index size over all address spaces.		/// Returns the maximum index size over all address spaces.
unsigned getMaxIndexSize() const;		unsigned getMaxIndexSize() const;

// Index size in bytes used for address calculation,		// Index size in bytes used for address calculation,
/// rounded up to a whole number of bytes.		/// rounded up to a whole number of bytes.
unsigned getIndexSize(unsigned AS) const;		unsigned getIndexSize(unsigned AS) const;

/// Return the address spaces containing non-integral pointers. Pointers in		/// Return the address spaces containing non-integral pointers. Pointers in
Show All 18 Lines	public:

/// Layout pointer size, in bits		/// Layout pointer size, in bits
/// FIXME: The defaults need to be removed once all of		/// FIXME: The defaults need to be removed once all of
/// the backends/clients are updated.		/// the backends/clients are updated.
unsigned getPointerSizeInBits(unsigned AS = 0) const {		unsigned getPointerSizeInBits(unsigned AS = 0) const {
return getPointerAlignElem(AS).TypeBitWidth;		return getPointerAlignElem(AS).TypeBitWidth;
}		}

		unsigned getPointerIntegralSizeInBits(unsigned AS) const {
		// Currently, this returns the same value as getIndexSizeInBits() as this
		// is correct for all currently known LLVM targets. If another target is
		// added that has pointer size != pointer range != GEP index width, we can
		// add a new datalayout field for pointer integral range.
		return getPointerAlignElem(AS).IndexBitWidth;
		}

/// Returns the maximum index size over all address spaces.		/// Returns the maximum index size over all address spaces.
unsigned getMaxIndexSizeInBits() const {		unsigned getMaxIndexSizeInBits() const {
return getMaxIndexSize() * 8;		return getMaxIndexSize() * 8;
}		}

/// Size in bits of index used for address calculation in getelementptr.		/// Size in bits of index used for address calculation in getelementptr.
unsigned getIndexSizeInBits(unsigned AS) const {		unsigned getIndexSizeInBits(unsigned AS) const {
return getPointerAlignElem(AS).IndexBitWidth;		return getPointerAlignElem(AS).IndexBitWidth;
▲ Show 20 Lines • Show All 300 Lines • Show Last 20 Lines

llvm/lib/IR/DataLayout.cpp

Show First 20 Lines • Show All 352 Lines • ▼ Show 20 Lines	case 'p': {
// Now read the index. It is the second optional parameter here.		// Now read the index. It is the second optional parameter here.
if (!Rest.empty()) {		if (!Rest.empty()) {
if (Error Err = ::split(Rest, ':', Split))		if (Error Err = ::split(Rest, ':', Split))
return Err;		return Err;
if (Error Err = getInt(Tok, IndexSize))		if (Error Err = getInt(Tok, IndexSize))
return Err;		return Err;
if (!IndexSize)		if (!IndexSize)
return reportError("Invalid index size of 0 bytes");		return reportError("Invalid index size of 0 bytes");
		if (IndexSize > PointerMemSize)
		return reportError("Index size cannot be larger than pointer size");
}		}
}		}
if (Error Err = setPointerAlignmentInBits(		if (Error Err = setPointerAlignmentInBits(
AddrSpace, assumeAligned(PointerABIAlign),		AddrSpace, assumeAligned(PointerABIAlign),
assumeAligned(PointerPrefAlign), PointerMemSize, IndexSize))		assumeAligned(PointerPrefAlign), PointerMemSize, IndexSize))
return Err;		return Err;
break;		break;
}		}
▲ Show 20 Lines • Show All 335 Lines • ▼ Show 20 Lines
Align DataLayout::getPointerPrefAlignment(unsigned AS) const {		Align DataLayout::getPointerPrefAlignment(unsigned AS) const {
return getPointerAlignElem(AS).PrefAlign;		return getPointerAlignElem(AS).PrefAlign;
}		}

unsigned DataLayout::getPointerSize(unsigned AS) const {		unsigned DataLayout::getPointerSize(unsigned AS) const {
return divideCeil(getPointerAlignElem(AS).TypeBitWidth, 8);		return divideCeil(getPointerAlignElem(AS).TypeBitWidth, 8);
}		}

		unsigned DataLayout::getPointerIntegralSize(unsigned AS) const {
		return divideCeil(getPointerIntegralSizeInBits(AS), 8);
		}

unsigned DataLayout::getMaxIndexSize() const {		unsigned DataLayout::getMaxIndexSize() const {
unsigned MaxIndexSize = 0;		unsigned MaxIndexSize = 0;
for (auto &P : Pointers)		for (auto &P : Pointers)
MaxIndexSize =		MaxIndexSize =
std::max(MaxIndexSize, (unsigned)divideCeil(P.TypeBitWidth, 8));		std::max(MaxIndexSize, (unsigned)divideCeil(P.TypeBitWidth, 8));

return MaxIndexSize;		return MaxIndexSize;
}		}
▲ Show 20 Lines • Show All 293 Lines • Show Last 20 Lines

llvm/test/Assembler/invalid-datalayout25.ll

This file was added.

				; RUN: not llvm-as < %s 2>&1 | FileCheck %s

				target datalayout = "p0:32:32:32:64"

				; CHECK: error: index size cannot be larger than pointer size

llvm/test/Assembler/invalid-datalayout26.ll

This file was added.

				; RUN: not llvm-as < %s 2>&1 | FileCheck %s

				target datalayout = "p0:32:32:32:0"

				; CHECK: error: Invalid index size of 0 bytes
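
The two layout strings in these tests can be checked against the rule the patch adds with a small standalone C++ sketch. It is not LLVM's parser: `checkPointerSpec` is a hypothetical helper that only splits a `p[n]:<size>:<abi>[:<pref>][:<idx>]` entry and applies the zero and index-larger-than-pointer checks, reusing the error strings from the DataLayout.cpp hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Toy validator for one pointer entry of a datalayout string.
// Returns an empty string on success, otherwise an error message.
inline std::string checkPointerSpec(const std::string &Spec) {
  if (Spec.empty() || Spec[0] != 'p')
    return "not a pointer specification";

  // Split on ':'; the first field is "p" plus the optional address space.
  std::vector<std::string> Fields;
  std::stringstream Stream(Spec);
  for (std::string Field; std::getline(Stream, Field, ':');)
    Fields.push_back(Field);
  if (Fields.size() < 3 || Fields.size() > 5)
    return "malformed pointer specification";

  // Parse <size>, <abi>, and the optional <pref> and <idx> as integers.
  std::vector<uint64_t> Values;
  for (std::size_t I = 1; I < Fields.size(); ++I) {
    const std::string &F = Fields[I];
    if (F.empty() || F.find_first_not_of("0123456789") != std::string::npos)
      return "not a number";
    Values.push_back(std::stoull(F));
  }

  // The index size defaults to the pointer size when <idx> is omitted.
  const uint64_t PointerSize = Values[0];
  const uint64_t IndexSize = Values.size() == 4 ? Values[3] : PointerSize;
  if (IndexSize == 0)
    return "Invalid index size of 0 bytes";
  if (IndexSize > PointerSize)
    return "Index size cannot be larger than pointer size";
  return "";
}

// Usage: the two test inputs are rejected, a CHERI-like entry is accepted.
inline void demoPointerSpecs() {
  assert(!checkPointerSpec("p0:32:32:32:64").empty());
  assert(!checkPointerSpec("p0:32:32:32:0").empty());
  assert(checkPointerSpec("p200:128:128:128:64").empty());
}
```

The address space number 200 in the accepted entry is an arbitrary illustrative value.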
