This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/docs/
-
docs/
2/9
LangRef.rst

Differential D104547

[langref] attempt to clarify semantics of inttoptr/ptrtoint for non-integral types
ClosedPublic

Authored by reames on Jun 18 2021, 9:34 AM.

Download Raw Diff

Details

Reviewers

efriedma
lebedev.ri
apilipenko
mkazantsev
samparker

Commits

rGf74bb95bbe4d: [langref] attempt to clarify semantics of inttoptr/ptrtoint for non-integral…

Summary

In review discussion on D104322, Eli and Roman quite reasonable raised concerns about the LangRef not really providing a precise definition for inttoptr/ptrtoint on non-integral types. These had previously been disallowed, but I'd pragmatically allowed them in ac81cb7e6. This is my attempt to improve the situation.

Diff Detail

Unit TestsFailed

	Time	Test
	2,020 ms	x64 debian > libFuzzer.libFuzzer::dataflow.test
	120,640 ms	x64 debian > libFuzzer.libFuzzer::only-some-bytes-fork.test
	7,570 ms	x64 debian > libFuzzer.libFuzzer::only-some-bytes.test

Event Timeline

reames created this revision.Jun 18 2021, 9:34 AM

Herald added subscribers: jdoerfert, jfb, bollu and 2 others. · View Herald TranscriptJun 18 2021, 9:34 AM

reames requested review of this revision.Jun 18 2021, 9:34 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2021, 9:34 AM

reames added a reviewer: samparker.Jun 18 2021, 9:34 AM

Thank you.

reames mentioned this in D103656: [ScalarEvolution] Ensure backedge-taken counts are not pointers..Jun 18 2021, 10:00 AM

mkazantsev added inline comments.Jun 18 2021, 11:51 AM

llvm/docs/LangRef.rst
620	Does this also apply to such things as unrolling and unswitching? Sounds over-restrictive.

mkazantsev added inline comments.Jun 18 2021, 11:52 AM

llvm/docs/LangRef.rst
608	What about same cast of the same operand executed twice (e.g. in a loop)? Is it possible that it produces different values twice, and therefore LICM of the cast is illegal?

Harbormaster completed remote builds in B109950: Diff 353029.Jun 19 2021, 4:43 AM

reames added inline comments.Jun 21 2021, 9:35 AM

llvm/docs/LangRef.rst
608	They could produce two different values, but unless fenced (in some unspecified implementation dependent manner), they're allowed to return a single value and thus LICM is legal.
620	This is a good question. I had not intended to disallow unrolling, but I can see how the current wording reads that way. If I tweaked the wording to say that the optimizer isn't allowed to insert new dynamic occurrences of the cast, would that address the concern?

mkazantsev added inline comments.Jun 21 2021, 11:39 PM

llvm/docs/LangRef.rst
620	I guess what we want is smth like "for any possible execution path, we cannot increase the number of executed casts".

mkazantsev added inline comments.Jun 21 2021, 11:41 PM

llvm/docs/LangRef.rst
620	yea, word "dynamic" is the same thing, I guess.

So, I missed the original discussion of this, but I'm somewhat concerned about this direction. We very heavily rely on non-integral pointers in Julia. I'm concerned that removing the verifier rules will allow non-sound inttoptr/ptrtoint transformations to sneak in undetected. In general, I'm ok with the semantics proposed in this revision, but I would be much happier if they were disallowed entirely. Perhaps it is time to allow specifying more extensive address space attributes in the IR. For example, there is the constant discussion of whether geps may be commuted across addrspacecasts, which is just very end-user dependent. And now we have this possible distinction between "hard" and "soft" non-integral pointers. I think just letting frontends specify more precise semantics for their non-integral pointers may help alleviate them being pulled in so many different directions. Lastly, this may not be for the revision, but I take it the main motivation for this change is to be able to compute offsets between pointers to the same underlying object? I'm sympathetic to that use case, but could we just have a version of sub that did that directly? These offsets are well defined and stable, whereas allowing ptrtoint to give incompletely defined volatile answers seems very likely to wreak havoc.

vchuravy added a subscriber: vchuravy.Jun 23 2021, 10:57 AM

Add word "dynamic" to address review comment.

In D104547#2836163, @loladiro wrote:

So, I missed the original discussion of this, but I'm somewhat concerned about this direction. We very heavily rely on non-integral pointers in Julia. I'm concerned that removing the verifier rules will allow non-sound inttoptr/ptrtoint transformations to sneak in undetected. In general, I'm ok with the semantics proposed in this revision, but I would be much happier if they were disallowed entirely. Perhaps it is time to allow specifying more extensive address space attributes in the IR. For example, there is the constant discussion of whether geps may be commuted across addrspacecasts, which is just very end-user dependent. And now we have this possible distinction between "hard" and "soft" non-integral pointers. I think just letting frontends specify more precise semantics for their non-integral pointers may help alleviate them being pulled in so many different directions. Lastly, this may not be for the revision, but I take it the main motivation for this change is to be able to compute offsets between pointers to the same underlying object? I'm sympathetic to that use case, but could we just have a version of sub that did that directly? These offsets are well defined and stable, whereas allowing ptrtoint to give incompletely defined volatile answers seems very likely to wreak havoc.

I wish we had well defined semantics for address space casts. If we did, we could achieve reasonable well define semantics for NI conversion with addrspacecast + ptrtoint of integral pointer. We don't. Worse, I see little evidence of progress in that direction.

On the pointer subtraction question, I'm increasingly seeing that as a valueable IR construct for optimization quality reasons. It doesn't really solve anything for NI types though, so this is mostly a red herring.

The two root issues which caused me to relax the rules were:

Every actual user I know of - with maybe you as an exception - had removed the verfier rules. As such, we were specifying a language that literally no one used or tested.
Dead code. The optimizer is generally free to emit arbitrarily broken code along dynamically dead paths. As a specific example, if a non-integral pointer's base is proven null (which is zero, even for NI pointers), we can have a NI pointer with a provably integral value. This can happen both directly, and indirectly via LIV and constant ranges. I don't remember the exact test case, but we did see real cases where the optimizer was exploiting undef in a way which produced ptrtoint along a dead path which was really hard to argue we shouldn't.

I'm happy to hear proposals for other approaches, but I really don't see any which are reasonably pragmatic.

Harbormaster completed remote builds in B111400: Diff 355061.Jun 28 2021, 5:28 PM

mkazantsev added inline comments.Jun 28 2021, 10:10 PM

llvm/docs/LangRef.rst
623	I think it should clearly be stated what happens if a new cast is inserted (for some reason) and the corresponding rules are not fulfuilled. Does a new cast returns undef, or some other cast (e.g. that goes after it) returns undef, of the entire program has undefined behavior? It's not clear to me how exactly it may break.

mkazantsev added inline comments.Jun 28 2021, 10:17 PM

llvm/docs/LangRef.rst
623	It is also not clear from the text if it's safe to insert new dynamic casts if their results are not used. Their result is state-dependent - fine. I don't care what they return. But if they don't change the state, they should not have impact on any further casts. So it's not obvious why it should be illegal to make such insertion.

The more I think of it, the more unclear the attempt to prohibit insertion of new casts becomes. I understand the goal, but the wording here leaves too many room for false interpretations.

I believe that what you really wanted to say is that the result of any cast may change whenever the external state changes, and therefore we should not introduce new uses from casts from potential other states into where was uses of different state.

Simple example of (meaningless and merciless) demon optimization.

Old code:

%x = ptrtoint %p to i32

New code:

%i1 = ptrtoint %p to i32
%i2 = ptrtoint %p to i32
%x = select undef, %i1, %i2

We have introduced new dynamically executed cast. We have not introduced new dynamically used value. The transform looks meaningless, but should be absolutely legal from perspective that casts are state-dependent.

It might not be legal if we did something like

%i1 = ptrtoint %p to i32
<fence>
%i2 = ptrtoint %p to i32
%x = select undef, %i1, %i2

In this case, at the point of <fence> the outer state may change, and therefor %x may be a value from either one or another state, which assumes different cast results.

I think we should not have statements like "inserting cast instructions is illegal", no matter what "illegal" means. We should instead clearly state that smth like:

If a state is changed at some point, then all usages of casts that were executed before this point become undefined values. The exact bit pattern seen by user in the same state from a cast from the same state is a stable implementation-defined value. Same casts of same pointers in same state always produce the same value.

And I think it directly breaches SSA semantics of this operation. Looks like it simply needs a notion that it reads external (state) memory,

jdoerfert added inline comments.Jun 28 2021, 10:48 PM

llvm/docs/LangRef.rst
626	Drive by: It states here that we should not introduce new p2i casts, which I think we should avoid in general. That said, the reality is (or at least was) that we introduce p2i casts in some situations. Are we planning to introduce checks on the pointer "kind" or eliminate the introduction of those casts?

Max, I see your comment, but I'm not going to act on them. Your commenting on the section which is discussing implications, not specification. The specification wording is two paragraphs earlier.

I'm now asking for a LGTM. I think this is worthwhile, but this has hit the point of perfection being the enemy of the good in the review discussion. I don't plan on further iterating this - I really don't care enough - so from a practical perspective, we either take the improvement or I abandon the review and move on. I'll leave it up to reviewers which they'd desire.

I see your point Philip. Actually I think I understand what is meant here, and the problem is with word chosing. Just be aware that the opts which literally correspond this specification should also be LGTMed. ;)

This revision is now accepted and ready to land.Jul 8 2021, 12:08 AM

Closed by commit rGf74bb95bbe4d: [langref] attempt to clarify semantics of inttoptr/ptrtoint for non-integral… (authored by reames). · Explain WhyJul 12 2021, 8:49 AM

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGf74bb95bbe4d: [langref] attempt to clarify semantics of inttoptr/ptrtoint for non-integral….

vtjnash mentioned this in D128670: [SimplifyCFG] teach simplifycfg not to introduce ptrtoint for NI pointers.Jun 27 2022, 10:39 AM

vchuravy mentioned this in rG3a8d7fe20199: [SimplifyCFG] teach simplifycfg not to introduce ptrtoint for NI pointers.Aug 15 2022, 12:12 PM

Revision Contents

Path

Size

llvm/

docs/

LangRef.rst

28 lines

Diff 353029

llvm/docs/LangRef.rst

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 595 Lines • ▼ Show 20 Lines
	considered experimental at this time.			considered experimental at this time.

	LLVM IR optionally allows the frontend to denote pointers in certain address			LLVM IR optionally allows the frontend to denote pointers in certain address
	spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.			spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
	Non-integral pointer types represent pointers that have an unspecified bitwise			Non-integral pointer types represent pointers that have an unspecified bitwise
	representation; that is, the integral representation may be target dependent or			representation; that is, the integral representation may be target dependent or
	unstable (not backed by a fixed integer).			unstable (not backed by a fixed integer).

	``inttoptr`` and ``ptrtoint`` instructions converting integers to non-integral			``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
	pointer types or vice versa are implementation defined, and subject to likely			integral (i.e. normal) pointers in that they convert integers to and from
	future revision in semantics. Vector versions of said instructions are as well.			corresponding pointer types, but there are additional implications to be
	Users of non-integral-pointer types are advised not to design around current			aware of. Because the bit-representation of a non-integral pointer may
	semantics as they may very well change in the nearish future.			not be stable, two identical casts of the same operand may or may not
				mkazantsevUnsubmitted Not Done Reply Inline Actions What about same cast of the same operand executed twice (e.g. in a loop)? Is it possible that it produces different values twice, and therefore LICM of the cast is illegal? mkazantsev: What about same cast of the same operand executed twice (e.g. in a loop)? Is it possible that…
				reamesAuthorUnsubmitted Done Reply Inline Actions They could produce two different values, but unless fenced (in some unspecified implementation dependent manner), they're allowed to return a single value and thus LICM is legal. reames: They could produce two different values, but unless fenced (in some unspecified implementation…
				return the same value. Said differently, the conversion to or from the
				non-integral type depends on environmental state in an implementation
				defined manner.

				If the frontend wishes to observe a particular value following a cast, the
				generated IR must fence with the underlying environment in an implementation
				defined manner. (In practice, this tends to require ``noinline`` routines for
				such operations.)

				From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
				non-integral types are analogous to ones on integral types with one
				key exception: the optimizer may not, in general, insert new occurrences of
				mkazantsevUnsubmitted Not Done Reply Inline Actions Does this also apply to such things as unrolling and unswitching? Sounds over-restrictive. mkazantsev: Does this also apply to such things as unrolling and unswitching? Sounds over-restrictive.
				reamesAuthorUnsubmitted Done Reply Inline Actions This is a good question. I had not intended to disallow unrolling, but I can see how the current wording reads that way. If I tweaked the wording to say that the optimizer isn't allowed to insert new dynamic occurrences of the cast, would that address the concern? reames: This is a good question. I had not intended to disallow unrolling, but I can see how the…
				mkazantsevUnsubmitted Not Done Reply Inline Actions I guess what we want is smth like "for any possible execution path, we cannot increase the number of executed casts". mkazantsev: I guess what we want is smth like "for any possible execution path, we cannot increase the…
				mkazantsevUnsubmitted Not Done Reply Inline Actions yea, word "dynamic" is the same thing, I guess. mkazantsev: yea, word "dynamic" is the same thing, I guess.
				such casts. If a new cast is inserted, the optimizer would need to either
				ensure that a) all possible values are valid, or b) appropriate fencing is
				inserted. Since the appropriate fencing is implementation defined, the
				mkazantsevUnsubmitted Not Done Reply Inline Actions I think it should clearly be stated what happens if a new cast is inserted (for some reason) and the corresponding rules are not fulfuilled. Does a new cast returns undef, or some other cast (e.g. that goes after it) returns undef, of the entire program has undefined behavior? It's not clear to me how exactly it may break. mkazantsev: I think it should clearly be stated what happens if a new cast is inserted (for some reason)…
				mkazantsevUnsubmitted Not Done Reply Inline Actions It is also not clear from the text if it's safe to insert new dynamic casts if their results are not used. Their result is state-dependent - fine. I don't care what they return. But if they don't change the state, they should not have impact on any further casts. So it's not obvious why it should be illegal to make such insertion. mkazantsev: It is also not clear from the text if it's safe to insert new dynamic casts if their results…
				optimizer can't do the latter. The former is challenging as many commonly
				expected properties, such as ``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for
				non-integral types.
				jdoerfertUnsubmitted Not Done Reply Inline Actions Drive by: It states here that we should not introduce new p2i casts, which I think we should avoid in general. That said, the reality is (or at least was) that we introduce p2i casts in some situations. Are we planning to introduce checks on the pointer "kind" or eliminate the introduction of those casts? jdoerfert: Drive by: It states here that we should not introduce new p2i casts, which I think we should…

	.. _globalvars:			.. _globalvars:

	Global Variables			Global Variables
	----------------			----------------

	Global variables define regions of memory allocated at compilation time			Global variables define regions of memory allocated at compilation time
	instead of run-time.			instead of run-time.
	▲ Show 20 Lines • Show All 21,900 Lines • Show Last 20 Lines