This is an archive of the discontinued LLVM Phabricator instance.

[IR] Introduce a non-integral pointer type
ClosedPublic

Authored by sanjoy on Jul 18 2016, 4:38 PM.

Details

Summary

This change adds a ni:<addrspace> specifier in the datalayout string to
denote pointers in the given address space as "non-integral", and adds some
typing rules these special pointers.

Diff Detail

Repository
rL LLVM

Event Timeline

sanjoy updated this revision to Diff 64418.Jul 18 2016, 4:38 PM
sanjoy retitled this revision from to [IR] Introduce a precise GC reference type.
sanjoy updated this object.
sanjoy added a subscriber: llvm-commits.

This probably does not need saying, but this is just the very first step in the plan mentioned in http://lists.llvm.org/pipermail/llvm-dev/2016-July/102466.html, other steps will follow once this lands.

For people seeing this for the first time, the design discussion starts at http://lists.llvm.org/pipermail/llvm-dev/2016-July/102161.html

sanjoy updated this revision to Diff 64613.Jul 19 2016, 4:59 PM
  • review
sanjoy retitled this revision from [IR] Introduce a precise GC reference type to [IR] Introduce a non-integral pointer type.Jul 22 2016, 11:24 AM
sanjoy updated this object.

Also ping!

(I've changed the specification to not be tied to GC semantics as discussed on the llvm-dev thread)

arsenm added a subscriber: arsenm.Jul 22 2016, 11:52 AM
arsenm added inline comments.
lib/IR/Verifier.cpp
2413–2415 ↗(On Diff #64613)

What about vector of pointers?

2442–2444 ↗(On Diff #64613)

Ditto

sanjoy added inline comments.Jul 22 2016, 12:02 PM
lib/IR/Verifier.cpp
2413–2415 ↗(On Diff #64613)

Won't getScalarType take care of that? I'll add a test case.

2442–2444 ↗(On Diff #64613)

Won't getScalarType take care of that? I'll add a test case.

sanjoy updated this revision to Diff 65125.Jul 22 2016, 12:45 PM
  • review
  • move test case
  • review
arsenm added inline comments.Jul 22 2016, 1:26 PM
docs/LangRef.rst
559–561 ↗(On Diff #65125)

What consequence does the changing pointer value have on the optimizer? I'm interested in this for fat pointers which will not change value during runtime

sanjoy added inline comments.Jul 22 2016, 1:33 PM
docs/LangRef.rst
559–561 ↗(On Diff #65125)

That is the justification for disallowing integer <-> pointer conversions (directly via cast instructions or via memory). If fat pointers have a stable bitwise representation, then why not represent them as normal pointers?

arsenm added inline comments.Jul 22 2016, 1:46 PM
docs/LangRef.rst
559–561 ↗(On Diff #65125)

The pointer arithmetic does not behave exactly like a 128-bit integer. Doing pointer arithmetic such as a 128-bit add would be incorrect, and we don't want CodeGenPrepare etc. to be allowed to decompose it into integer operations.

The high 64-bits are constant "metadata" bits for the memory operation. The low 64-bits behave like a pointer. I think what we need is something that would only ever have the pointer manipulated through GEP, and then special addressing mode matching in the backend (since the fat pointer should almost always be treated as a constant with a separate index operand)

sanjoy added inline comments.Jul 26 2016, 12:23 PM
docs/LangRef.rst
559–561 ↗(On Diff #65125)

Okay, that sounds reasonable. We have a similar restriction, in that GEPs have to stay GEPs and can't be decomposed into integer arithmetic. However, we also want to allow things like alignment, which only really make sense if the pointer has an integral representation. What do you think of changing the language to say "does not have a useful integral representation"? "useful" is deliberately vague here, with the understanding that we disallow some integer-like operations / properties, but allow others.

sanjoy added inline comments.Jul 26 2016, 12:26 PM
docs/LangRef.rst
559–561 ↗(On Diff #65125)

we also want to allow things like alignment

Just to be explicit: I meant we also want to allow specifying things like alignment on these non-integral pointers

arsenm added inline comments.Jul 26 2016, 12:26 PM
docs/LangRef.rst
559–561 ↗(On Diff #65125)

That seems ok. The low bits are pointer like, so alignment still would make sense for our case

arsenm added inline comments.Jul 26 2016, 12:27 PM
docs/LangRef.rst
559–561 ↗(On Diff #65125)

Maybe better would be a target dependent representation? Saying not useful seems restrictive to backend code that wants to rely on target specific assumptions

sanjoy updated this revision to Diff 65580.Jul 26 2016, 1:14 PM
  • update docs as per dicussion with @arsenm
arsenm added inline comments.Jul 28 2016, 11:46 AM
docs/LangRef.rst
1854–1855 ↗(On Diff #65580)

Maybe mention this isn't allowed for AS 0?

include/llvm/IR/DataLayout.h
206 ↗(On Diff #65580)

Only one non-integral is supported? I would expect it to support specifying any address space (like the pointer bit widths). I'm not sure we really need more than one, but 2 or 3 seems likely.

arsenm added inline comments.Jul 28 2016, 11:49 AM
include/llvm/IR/DataLayout.h
206 ↗(On Diff #65580)

If the one restriction remains it at least needs a mention in the langref

sanjoy updated this revision to Diff 65978.Jul 28 2016, 1:19 PM

address review

sanjoy marked 2 inline comments as done.Jul 28 2016, 1:20 PM

Addressed review

arsenm added inline comments.Jul 28 2016, 1:38 PM
include/llvm/IR/DataLayout.h
150 ↗(On Diff #65978)

Seems like it should be a SmallSet?

sanjoy added inline comments.Jul 28 2016, 1:42 PM
include/llvm/IR/DataLayout.h
150 ↗(On Diff #65978)

Since this isn't going to change after creation, how about a sorted SmallVector? That way we'll avoid the isSmall check in isNonIntegralPointerType and will also be able to use a binary search instead of a linear search.

arsenm added inline comments.Jul 28 2016, 1:46 PM
include/llvm/IR/DataLayout.h
150 ↗(On Diff #65978)

I don't forsee this ever being larger than 3 or 4, so the linear search a SmallSet does for the small case is probably better

sanjoy added inline comments.Jul 28 2016, 1:50 PM
include/llvm/IR/DataLayout.h
150 ↗(On Diff #65978)

I don't forsee this ever being larger than 3 or 4, so the linear search a SmallSet does for the small case is probably better

Then why bother with a SmallSet at all? If you're worried about worst-case situations where we have millions of non-integral address spaces, then a sorted vector seems better than a SmallSet.

arsenm added inline comments.Jul 28 2016, 1:53 PM
include/llvm/IR/DataLayout.h
150 ↗(On Diff #65978)

I suppose there's not much difference

sanjoy updated this revision to Diff 65996.Jul 28 2016, 2:14 PM
  • bump the small vector size, as discussed with @arsenm on IRC
arsenm accepted this revision.Jul 28 2016, 4:33 PM
arsenm added a reviewer: arsenm.

LGTM

This revision is now accepted and ready to land.Jul 28 2016, 4:33 PM
This revision was automatically updated to reflect the committed changes.