This is an archive of the discontinued LLVM Phabricator instance.

Differential D64258

[InferFuncAttributes] extend 'dereferenceable' attribute based on loads
Abandoned · Public

Authored by spatel on Jul 5 2019, 11:22 AM.

Details

Reviewers

reames
hfinkel
RKSimon
jdoerfert

Summary

This is similar to a 'nonnull' patch:
D27855
...that is still off by default because of C problems.

For this patch, the motivating case is shown in PR21780:
https://bugs.llvm.org/show_bug.cgi?id=21780

We are trying to preserve the ability of SLP (D64142) and/or the backend (D64205) to create a vector load even after some other pass like InstCombine has deleted scalar instructions by using demanded elements analysis. We do that by collecting all guaranteed accesses from a given pointer argument and creating a known dereferenceable byte range from those.
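The range computation described above can be sketched as follows. This is a minimal model under stated assumptions, not the patch's code: `AccessMap` and `knownDereferenceableBytes` are hypothetical names, offsets are in bytes relative to the pointer argument, and every recorded access is assumed to be guaranteed to execute.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model (not the patch's data structure): byte offset from the
// pointer argument -> size in bytes of an access guaranteed to execute there.
using AccessMap = std::map<int64_t, uint64_t>;

// Returns how many bytes, starting at the argument itself (offset 0), are
// covered without a gap by the guaranteed accesses.
uint64_t knownDereferenceableBytes(const AccessMap &Accesses) {
  uint64_t End = 0; // one past the last contiguous byte found so far
  // std::map iterates in ascending offset order.
  for (const auto &Entry : Accesses) {
    int64_t Offset = Entry.first;
    uint64_t Size = Entry.second;
    assert(Size > 0 && "an access must touch at least one byte");
    if (Offset < 0)
      continue; // conservatively ignore accesses that start below the argument
    if (static_cast<uint64_t>(Offset) > End)
      break; // gap: later bytes are not known to be contiguous with offset 0
    uint64_t AccessEnd = static_cast<uint64_t>(Offset) + Size;
    if (AccessEnd > End)
      End = AccessEnd;
  }
  return End;
}
```

With four 4-byte loads at offsets 0, 4, 8 and 12, the function returns 16, which corresponds to marking the argument `dereferenceable(16)`. That fact would survive even if a later pass deletes some of the scalar loads.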

There's an alternate proposal to do something similar but more involved in:
D37579
...but that seems to have stalled.

And if I'm interpreting the comments there correctly, this is an implementation of a suggestion from @reames :
"...we can prove that the loads post dominate the entry to the function and could update the argument with the existing dereferenceability attribute. This might be an alternate approach and separately worth implementation."

Event Timeline

spatel created this revision. Jul 5 2019, 11:22 AM

Herald added a project: Restricted Project. Jul 5 2019, 11:22 AM

Herald added subscribers: hiraditya, mcrosier.

Patch updated:
I missed diffs in some existing over-reaching clang and AMDGPU tests. These regression tests should not be testing the entire optimization pipeline, but I adjusted the assertions to make them pass.

Herald added subscribers: nhaehnle, jvesely. Jul 5 2019, 12:36 PM

The direction of this makes total sense and we will need it. However, this shouldn't be here (wrt. the file/pass).

Assuming we want this right now, it should live in FunctionAttrs.cpp. Assuming we want to do it "right", it should become part of the Attributor framework.

The early prototype of the "deref-or-null" abstract attribute already had this functionality, see https://reviews.llvm.org/D59202#C1381429NL1995, and the test case https://reviews.llvm.org/D59202#change-FJbHx7N4s6ye . For the new Attributor, dereferenceable-or-null has not yet been ported and the transfer of "close by information" is not part of the new model. Both things are going to change soon.

In D64258#1571981, @jdoerfert wrote:

The direction of this makes total sense and we will need it. However, this shouldn't be here (wrt. the file/pass).

Assuming we want this right now, it should live in FunctionAttrs.cpp. Assuming we want to do it "right", it should become part of the Attributor framework.

The early prototype of the "deref-or-null" abstract attribute already had this functionality, see https://reviews.llvm.org/D59202#C1381429NL1995, and the test case https://reviews.llvm.org/D59202#change-FJbHx7N4s6ye . For the new Attributor, dereferenceable-or-null has not yet been ported and the transfer of "close by information" is not part of the new model. Both things are going to change soon.

Thanks for taking a look! I was just about to add you as a reviewer. I know you are working on a major overhaul of this functionality, but I have not gotten a chance to look at those patches.
The reason I did not put this into FunctionAttrs.cpp is because that's currently too late to catch the motivating example from PR21780 using the default opt pass pipeline. Ie, -instcombine runs before -functionattrs and kills the loads before we have a chance to update the arguments. I would like to get this in soon to make the next clang major release, so this seemed like the patch of least resistance. :)
If there's a better way, I can certainly try to make it happen.

In D64258#1572004, @spatel wrote:

Thanks for taking a look! I was just about to add you as a reviewer. I know you are working on a major overhaul of this functionality, but I have not gotten a chance to look at those patches.
The reason I did not put this into FunctionAttrs.cpp is because that's currently too late to catch the motivating example from PR21780 using the default opt pass pipeline. Ie, -instcombine runs before -functionattrs and kills the loads before we have a chance to update the arguments. I would like to get this in soon to make the next clang major release, so this seemed like the patch of least resistance. :)
If there's a better way, I can certainly try to make it happen.

Thanks for thinking of me ;) And again, I think this is an important change we need!

The Attributor is in tree and, if enabled, it is run very early (as I very very strongly believe it should). I think we can get the Attributor enabled for the next release (maybe with a low iteration count and restrictions on the attributes we derive). Now there are two missing parts to get this functionality into the Attributor in a decent way:

1) A generic way to "look around for existing information" (more on this below).
2) The abstract attribute for dereferenceability(_or_null) that makes use of 1) and potentially performs usual deduction.

Implementing 2) is fairly easy. It should not take long to create the boilerplate if we only want to rely on the deduction through 1). Also, the logic is already in this patch (and the old prototype).
Regarding 1):
I was going to work on this once I found some free cycles but I could do it now if we decide to go this way. The idea is that you specify a program point PP (=instruction) and a callback. The callback is then automatically applied to all instructions that have to be executed when PP is also reached, either before or after. I would like this to be an abstract interface from the get-go but I am also willing to provide the interface and the initial implementation that will at least suffice for this use case. It should then be used from the AbstractAttribute::initialize and AbstractAttribute::updateImpl methods of the abstract attribute for the dereferenceable attribute (and others later as well).

P.S. You should be aware of the change to dereferenceability that is going to happen very soon, see D61652 and D63243 (I'm still fixing that one).
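The "program point plus callback" idea from the comment above can be illustrated with a deliberately small model. This is a sketch under assumptions, not the Attributor interface: it handles a single straight-line block only, and `Inst`, `MayNotReturn` and `forEachMustBeExecutedWith` are hypothetical names. A real implementation would also have to reason about control flow between blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical instruction model for one straight-line block.
struct Inst {
  std::string Name;
  bool MayNotReturn; // e.g. a call that might throw or never return
};

// Applies Visit to every instruction that must execute whenever the
// instruction at index PP is reached, both before and after PP.
void forEachMustBeExecutedWith(const std::vector<Inst> &Block, size_t PP,
                               const std::function<void(const Inst &)> &Visit) {
  assert(PP < Block.size() && "program point must be inside the block");
  // Reaching PP implies everything earlier in the block already executed.
  for (size_t I = 0; I < PP; ++I)
    Visit(Block[I]);
  // From PP onward, each instruction is reached in turn; stop after the first
  // one that may not transfer execution to its successor.
  for (size_t I = PP; I < Block.size(); ++I) {
    Visit(Block[I]);
    if (Block[I].MayNotReturn)
      break;
  }
}
```

A dereferenceability deduction would pass a callback that records loads and stores through the pointer of interest. The walk stops at a call that may not return because nothing after such a call is guaranteed to execute.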

In D64258#1572056, @jdoerfert wrote:

Thanks for thinking of me ;) And again, I think this is an important change we need!

The Attributor is in tree and, if enabled, it is run very early (as I very very strongly believe it should). I think we can get the Attributor enabled for the next release (maybe with a low iteration count and restrictions on the attributes we derive). Now there are two missing parts to get this functionality into the Attributor in a decent way:

1) A generic way to "look around for existing information" (more on this below).
2) The abstract attribute for dereferenceability(_or_null) that makes use of 1) and potentially performs usual deduction.

Implementing 2) is fairly easy. It should not take long to create the boilerplate if we only want to rely on the deduction through 1). Also, the logic is already in this patch (and the old prototype).
Regarding 1):
I was going to work on this once I found some free cycles but I could do it now if we decide to go this way. The idea is that you specify a program point PP (=instruction) and a callback. The callback is then automatically applied to all instructions that have to be executed when PP is also reached, either before or after. I would like this to be an abstract interface from the get-go but I am also willing to provide the interface and the initial implementation that will at least suffice for this use case. It should then be used from the AbstractAttribute::initialize and AbstractAttribute::updateImpl methods of the abstract attribute for the dereferenceable attribute (and others later as well).

P.S. You should be aware of the change to dereferenceability that is going to happen very soon, see D61652 and D63243 (I'm still fixing that one).

Thanks for the links. I'm still trying to digest where we stand currently and weigh the timing/risk/effort.

Attributor is in trunk, but it is not enabled by default? Or there are no transforms implemented that run with the default opt pipeline?
The current branch date for clang 9 is July 18 (10 days from today). That seems very tight to implement this based on the patch reviews that I skimmed. Ie, there's still a lot of back-and-forth going on in those reviews.

IMO, this patch carries significant risk alone. I'm basing that on the fact that D27855 still isn't enabled by default and the related D29999 was reverted because it caused crashing (I still haven't tracked down why).
So -- unless it would mean significantly more work to have this in trunk and then ported to the new and better way -- it is less overall work/risk to proceed here as an intermediate step since I already created this patch.

If there's nothing majorly wrong here, we can get several days of bot/fuzz/real-world testing on this code before the release. We are going to deprecate InferFunctionAttrs and its existing transform as part of switching to Attributor anyway, right? I can mark this with 'FIXME' and try to help with the porting if I've assessed that correctly.

hfinkel added inline comments. Jul 8 2019, 3:09 PM

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
38	Why not also stores (or AtomicCmpXchg or AtomicRMW)?
42	This logic seems unnecessarily limited. Why not use GetPointerBaseWithConstantOffset?
120	Does this do the right thing if the lowest GEP index is negative? You could skip the negative ones first? As a general point, I think that you want logic here mirroring (a subset of) what's in isOverwrite in DeadStoreElimination.cpp

spatel marked 3 inline comments as done. Jul 9 2019, 8:31 AM

spatel added inline comments.

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
38	Oversight on my part - tunnel vision based on the motivating cases. I've never done anything with the atomic opcodes, so I didn't remember them. For stores, how would dereferenceable aid optimization? Ok if we make this a TODO enhancement/follow-up?
42	Another oversight on my part. I wasn't thinking about cases with pointer casts, so that made the logic simpler. Will change, but hoping a partial implementation is good enough for an initial patch (see next comment).
120	I don't think there was miscompile potential there, but that was accidental. I didn't think about negative offsets. Will fix. I'd like to make the DSE-like enhancement to support arbitrary-sized sub-ranges (via pointer casts) a follow-up, so this patch doesn't get too complicated.

Patch updated:

Used GetPointerBaseWithConstantOffset() to allow more complex pattern matching.
But limited that matching to cases where the argument and access have the same size to reduce complexity.
Generalized variable names and comments to allow less churn for follow-up enhancements.
Added tests with multiple dereferenceable arguments, pointer casts, and negative offsets.

Herald added a subscriber: jfb. Jul 9 2019, 8:38 AM

Passing-by thought:
Is the Attributor already being run in the pipeline?
Is it in the state where it should be extended instead of adding more backlog for porting into it?

I do not want to block this patch but I still believe that this is the wrong way to go (middle/long term). The fact that we need to put this not in FunctionAttrs.cpp, where the other deductions live, but in InferFunctionAttrs.cpp, where we so far only annotated library functions, should be a first sign. Also, the functionality here is only one way to deduce dereferenceable, arguably, you want all the ways together such that they can benefit from each other.

In D64258#1576166, @lebedev.ri wrote:

Passing-by thought:
Is the Attributor already being run in the pipeline?
Is it in the state where it should be extended instead of adding more backlog for porting into it?

The pass is always "run" in the pipeline but by default the Attributor object will not be created. The cmd line flag attributor-disable defaults to true for now.

I think we can reasonably extend it and we are making progress getting rid of the backlog. Multiple deductions got in already and if one doesn't depend on any AbstractAttribute stuck in the pipeline (which you can choose not to in the beginning), one can easily get it in.

Going the Attributor route:
I'll upload a prototype of a very generic visitor that can be used to inspect all instructions that "must be executed with" a given one. That functionality is useful on its own and for various attributes so I will first finish the visitor, then we need an interface for AbstractAttributes, and then we can add deductions based on instructions that are executed with.

In D64258#1576314, @jdoerfert wrote:

I do not want to block this patch but I still believe that this is the wrong way to go (middle/long term). The fact that we need to put this not in FunctionAttrs.cpp, where the other deductions live, but in InferFunctionAttrs.cpp, where we so far only annotated library functions, should be a first sign. Also, the functionality here is only one way to deduce dereferenceable, arguably, you want all the ways together such that they can benefit from each other.

I don't disagree with the long-term view. My thoughts about the risk/reward are in my comment from yesterday. I'd like to get this in for the practical/short-term advantage to the clang 9.0 release.

hfinkel added inline comments. Jul 9 2019, 11:58 AM

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
38	If you store to an address, then you know that it is dereferenceable. The point is not to aid the optimization of the store, but to aid the optimization of later loads. I'd like to see stores handled here - if we're going to find problems with this, that will make it more likely that we'll find them quickly. Also, you must omit volatile loads and stores (IIRC, our semantics mean that volatile loads/stores won't imply dereferenceability for the non-volatile accesses).

Mostly comments to improve this. Two required changes.

Maybe we could mention that this logic should, or better will, move into the Attributor framework somewhere?

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
42	"Style": Is this the only use of the "match" function? If so, why not do the (in the middle-end) more familiar pattern of `dyn_cast` and `getPointerOperand`?
49	I don't think you can get a `nullptr` back.
59	Maybe a bit more general, something like:
	bits = DL.getTypeSizeInBits(ArgTy->getType()->getPointerElementType());
	// Round down to the nearest multiple of 8; dereferenceable attributes use bytes.
	bits = bits - bits % 8;
	if (!bits) continue;
	(`GEPOperator::accumulateConstantOffset` uses `DL.getTypeAllocSize`, which we could probably use as well.)
72	Two ideas: 1) We could track only the minimum and maximum `ByteOffset` values, iff we know you cannot "jump" between allocations. 2) Regardless of 1), we could use the maximum `ByteOffset` value we found for an `inbounds` GEP as a lower bound for the `dereferenceable` bytes. `inbounds` GEPs should not allow any "jumping" or starting outside the object.
134	Wrt. the above changes it would probably be: `MaxOffset * EltSize / 8`.
140	You need to remove `deref_or_null` as well.
llvm/test/Transforms/InferFunctionAttrs/dereferenceable.ll
114–115	volatile should not cause `deref`, I think this was said: This means the compiler may not use a volatile operation to prove a non-volatile access to that address has defined behavior.
173	Yes ;)

spatel mentioned this in rL365636: [InferFunctionAttrs] add/adjust tests for dereferenceable; NFC. Jul 10 2019, 7:42 AM

spatel mentioned this in rG9cd82a4fbd2d: [InferFunctionAttrs] add/adjust tests for dereferenceable; NFC.

In D64258#1577635, @jdoerfert wrote:

Mostly comments to improve this. Two required changes.

Thanks!

Maybe we could mention that this logic should, or better will, move into the Attributor framework somewhere?

Yes - I'll add a FIXME to the top of this file, so we know the whole thing should go away.

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
42	Yes - that's the only use of match. Since we're going to support stores now, it will go away. Note: 'match' is the more familiar pattern to me because that's used throughout instcombine.
49	Ah, I misunderstood the API - I thought if there's no constant offset, the base should be returned as nullptr. Independent of this patch: add a line to the documentation comment to make the behavior clear as part of the cleanup in D64468?
llvm/test/Transforms/InferFunctionAttrs/dereferenceable.ll
173	I've left this as a TODO for now because it's highly unusual to see non-power-of-2 bitwidths. We're going to rewrite the code fairly soon, and we'll have a code comment and this test in place to remind us that we can make the logic more flexible.

jdoerfert mentioned this in D64468: Replace three "strip & accumulate" implementations with a single one. Jul 10 2019, 9:20 AM

Fwiw, I don't need all the proposed improvements. Getting this in with one or more FIXMEs is fine with me.

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
49	"Independent of this patch: add a line to the documentation comment to make the behavior clear as part of the cleanup in D64468?" Done, take a look if that is what you wanted. (The wrappers are "not well" documented but only the base function is.)
llvm/test/Transforms/InferFunctionAttrs/dereferenceable.ll
173	Fine with me.

Patch updated:

Allow stores to have the same inferences as loads. This exposed more clang test failures, so those diffs are included.
Don't infer anything from volatile (non-simple) memory accesses.
There was a bug in how we dealt with isGuaranteedToTransferExecutionToSuccessor(), so added an assert and a test with a function call to verify that.
Added code/test for replacing DereferenceableOrNull attribute.
Added FIXME comment to indicate that this pass should be subsumed by Attributor.

Right now the control flow isn't clever, but I wonder if, as this analysis becomes more powerful, it'll have to act differently when -fno-delete-null-pointer-checks is specified? Is there a simple test that you can add to make sure null pointer checks don't cause false assumptions whenever this optimization becomes smarter?

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
38	It does seem like you want to handle non-volatile atomic load / store, as well as cmpxchg and RMW.

In D64258#1578820, @jfb wrote:

Right now the control flow isn't clever, but I wonder if, as this analysis becomes more powerful, it'll have to act differently when -fno-delete-null-pointer-checks is specified? Is there a simple test that you can add to make sure null pointer checks don't cause false assumptions whenever this optimization becomes smarter?

I hadn't seen that flag before. In IR we have this translation of the minimal test in clang/test/CodeGen/nonnull.c:

define void @foo(i32* nocapture dereferenceable(4) %x) #0 {
  store i32 0, i32* %x, align 4, !tbaa !3
  ret void
}

attributes #0 = { ... "null-pointer-is-valid"="true" ... }

This pass/patch doesn't act on the dereferenceable attribute; it just adds it. So some other pass would be at risk if it tries a transform based on "dereferenceable" without checking the "null-pointer-is-valid" function attribute or "nonnull" argument attribute? Argument::hasNonNullAttr() seems safe.
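The check that a consumer of `dereferenceable` would need can be modeled like this. It is a simplified sketch, not LLVM's code: `PointerArg`, `nullMayBeValid` and `isKnownNonNull` are hypothetical names, and treating null as possibly valid in non-zero address spaces is an assumption about how LLVM defines that case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a pointer argument and its function.
struct PointerArg {
  uint64_t DereferenceableBytes; // 0 if there is no dereferenceable attribute
  bool HasNonNullAttr;           // explicit "nonnull" argument attribute
  unsigned AddressSpace;
  bool NullPointerIsValid;       // function has "null-pointer-is-valid"="true"
};

// Assumption: null can be a usable address if the function says so, or in any
// address space other than 0.
bool nullMayBeValid(const PointerArg &A) {
  return A.NullPointerIsValid || A.AddressSpace != 0;
}

// dereferenceable(N) only implies non-null when null is not a valid address.
bool isKnownNonNull(const PointerArg &A) {
  if (A.HasNonNullAttr)
    return true;
  return A.DereferenceableBytes > 0 && !nullMayBeValid(A);
}
```

For the `@foo` example above, the argument is `dereferenceable(4)` but the function carries "null-pointer-is-valid"="true", so this model does not conclude that `%x` is non-null.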

spatel marked 2 inline comments as done. Jul 10 2019, 11:28 AM

spatel added inline comments.

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp
38	Ok - one more TODO. :)

Patch updated:
Add TODO code comment about using "isSimple()" and add test with an atomic load.
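The two access filters discussed in this thread can be contrasted in a small sketch. The types and function names are hypothetical. The first predicate models the "simple accesses only" behavior the patch updates describe, and the second models the follow-up the reviewers ask for, where atomics count but volatile accesses never do.

```cpp
#include <cassert>

// Hypothetical model of a memory access through the pointer argument.
enum class AccessKind { Load, Store, AtomicRMW, CmpXchg };

struct MemAccess {
  AccessKind Kind;
  bool IsVolatile;
  bool IsAtomic;
};

// Conservative policy: only non-volatile, non-atomic loads and stores are
// used to infer dereferenceability.
bool countsConservative(const MemAccess &A) {
  bool IsLoadOrStore =
      A.Kind == AccessKind::Load || A.Kind == AccessKind::Store;
  return IsLoadOrStore && !A.IsVolatile && !A.IsAtomic;
}

// Suggested follow-up policy: any non-volatile access counts, including
// atomic loads/stores, cmpxchg and RMW operations.
bool countsSuggested(const MemAccess &A) { return !A.IsVolatile; }
```

Under this model an unordered atomic load is rejected by the conservative filter but accepted by the suggested one, which is the gap the TODO records.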

Are there ever cases where you know something is always dereferenceable based on where the memory lives? For example, globals can always be dereferenced, so can the stack (but I don't think the stack exists here). Are there other locations?

How do you deal with address spaces? Should you only infer this attribute for address space 0?

Also, would it make sense to separate readable from writable? We currently have this bug where LLVM will promote all const static globals to rodata, and sometimes generate atomic cmpxchg to them (e.g. because we're trying to load a 128-bit value). Similarly, we might want to honor R / W memory protection in general. Right now dereferenceable just means "you can load from this", because we can't speculate most stores.

In D64258#1579146, @jfb wrote:

Are there ever cases where you know something is always dereferenceable based on where the memory lives? For example, globals can always be dereferenced, so can the stack (but I don't think the stack exists here). Are there other locations?

Probably, and the Attributor implementation can do deduction of that (also the stack case). However, dereferenceable_globally is not available yet (D61652). For now, we cannot distinguish between "globally dereferenceable" and "locally dereferenceable".

How do you deal with address spaces? Should you only infer this attribute for address space 0?

I do think all address spaces are fine. Must accesses should always imply dereferenceability, afaik, except if they are volatile.

Also, would it make sense to separate readable from writable? We currently have this bug where LLVM will promote all const static globals to rodata, and sometimes generate atomic cmpxchg to them (e.g. because we're trying to load a 128-bit value). Similarly, we might want to honor R / W memory protection in general. Right now dereferenceable just means "you can load from this", because we can't speculate most stores.

I do not understand the problem but I have the feeling this is an orthogonal issue.

Also, would it make sense to separate readable from writable? We currently have this bug where LLVM will promote all const static globals to rodata, and sometimes generate atomic cmpxchg to them (e.g. because we're trying to load a 128-bit value). Similarly, we might want to honor R / W memory protection in general. Right now dereferenceable just means "you can load from this", because we can't speculate most stores.

I do not understand the problem but I have the feeling this is an orthogonal issue.

mprotect can make memory readable but not writable, or writable but not readable... or neither. What does dereferenceable mean when faced with this fact? Further, what happens to dereferenceable when mprotect is called (any opaque function could call it)? I don't think this is an orthogonal problem at all.

In D64258#1579214, @jfb wrote:

Also, would it make sense to separate readable from writable? We currently have this bug where LLVM will promote all const static globals to rodata, and sometimes generate atomic cmpxchg to them (e.g. because we're trying to load a 128-bit value). Similarly, we might want to honor R / W memory protection in general. Right now dereferenceable just means "you can load from this", because we can't speculate most stores.

I do not understand the problem but I have the feeling this is an orthogonal issue.

mprotect can make memory readable but not writable, or writable but not readable... or neither. What does dereferenceable mean when faced with this fact? Further, what happens to dereferenceable when mprotect is called (any opaque function could call it)? I don't think this is an orthogonal problem at all.

So, I guess what the above means is "dereferenceable" is too coarse grained. We have "global dereferenceability" that cannot be changed, and we have "local dereferenceability" that can be changed, e.g., through calls to free, realloc, or mprotect. From accesses we can only deduce "local dereferenceability". Now, that is why we need D61652, or more precisely, D63243. After those changes land, the reasoning introduced in this patch should be fine; before that, it is as broken as Clang is when it emits dereferenceable for arguments passed by reference. (The logic above, with the same problems and more, is also used in ArgumentPromotion right now...).

In D64258#1579308, @jdoerfert wrote:

So, I guess what the above means is "dereferenceable" is too coarse grained. We have "global dereferenceability" that cannot be changed, and we have "local dereferenceability" that can be changed, e.g., through calls to free, realloc, or mprotect. From accesses we can only deduce "local dereferenceability". Now, that is why we need D61652, or more precisely, D63243. After those changes land, the reasoning introduced in this patch should be fine; before that, it is as broken as Clang is when it emits dereferenceable for arguments passed by reference. (The logic above, with the same problems and more, is also used in ArgumentPromotion right now...).

So we are saying that the current attribute is too vague/broken to be useful? Ie, this patch must be abandoned?

In D64258#1579501, @spatel wrote:

So we are saying that the current attribute is too vague/broken to be useful? Ie, this patch must be abandoned?

The current situation is broken but works "so far". Adding this will expose the broken part further, that is the reason why I started to fix the situation in the first place ;)
That being said, I think the patch is "fine", at the latest after D63243 is in.

Now for timeline, in case you want to avoid exposing the problem with this patch:
I hope to land D61652 this week, after I can test the latest update of D63243 which depends on nosync. D63243 would then land a week or so later, giving front-end people time to update from dereferenceable to dereferenceable_globally if they actually want that behavior.

spatel mentioned this in D64432: [InstCombine] try to narrow a truncated load. Jul 12 2019, 10:30 AM

uenoku added a subscriber: uenoku. Jul 13 2019, 1:18 AM

uenoku mentioned this in D64876: [Attributor] Deduce "dereferenceable" attribute. Jul 17 2019, 11:38 AM

Quick update: I finally put the skeleton of what I wanted to use in the Attributor online: D64975. The bridge to the Attributor is still missing but the dereferenceable deduction and propagation part is already there: D64876

uenoku mentioned this in rL366788: [Attributor] Deduce "dereferenceable" attribute. Jul 23 2019, 1:19 AM

uenoku mentioned this in rG19c07afe17fc: [Attributor] Deduce "dereferenceable" attribute.

uenoku mentioned this in D65402: [Attributor][MustExec] Deduce dereferenceable and nonnull attribute using MustBeExecutedContextExplorer. Jul 29 2019, 9:43 AM

dtemirbulatov added a subscriber: dtemirbulatov. Sep 2 2019, 9:48 AM

uenoku mentioned this in rL374063: [Attributor][MustExec] Deduce dereferenceable and nonnull attribute using…. Oct 8 2019, 8:26 AM

uenoku mentioned this in rG96e6ce4cd361: [Attributor][MustExec] Deduce dereferenceable and nonnull attribute using….

Where are we with this patch vs. the Attributor? Are we close to the Attributor being able to perform the equivalent of some of the improvements to the test cases?

Attributor catches some of the cases but it doesn't create the access map (see getArgToOffsetsMap). @uenoku is that correct?

In D64258#1759021, @jdoerfert wrote:

Attributor catches some of the cases but it doesn't create the access map (see getArgToOffsetsMap). @uenoku is that correct?

Right. I'll try to create the access map.

uenoku mentioned this in D70714: [Attributor] Deduce dereferenceable based on accessed bytes map. Nov 26 2019, 5:25 AM

I uploaded a patch (D70714). As far as I can see, almost all the cases are covered.

In D64258#1760106, @uenoku wrote:

I uploaded a patch (D70714). As far as I can see, almost all the cases are covered.

Thanks! The new patch appears to do everything that this patch tried to do and much more, so I'm happy to abandon and continue discussion on the new patch.

There's 1 new test that I was planning to add based on feedback here:

; TODO: We should allow inference for atomic (but not volatile) ops.

define void @atomic_is_alright(i16* %ptr) {
; CHECK-LABEL: @atomic_is_alright(i16* %ptr)
  %arrayidx0 = getelementptr i16, i16* %ptr, i64 0
  %arrayidx1 = getelementptr i16, i16* %ptr, i64 1
  %arrayidx2 = getelementptr i16, i16* %ptr, i64 2
  %t0 = load atomic i16, i16* %arrayidx0 unordered, align 2
  %t1 = load i16, i16* %arrayidx1
  %t2 = load i16, i16* %arrayidx2
  ret void
}

...so I can commit that as-is to trunk as a baseline test, and then you can rebase D70714 as needed.

In D64258#1756898, @RKSimon wrote:

Where are we with this patch vs. the Attributor? Are we close to the Attributor being able to perform the equivalent of some of the improvements to the test cases?

I am wondering whether the Attributor will be enabled in the next release.

spatel mentioned this in rG2bd252ea8941: [InferFuncAttributes][Attributor] add tests for 'dereferenceable'; NFC. Nov 26 2019, 6:15 AM

There was 1 more test added here that I missed in the last comment:

define void @more_bytes_and_not_null(i32* dereferenceable_or_null(8) %ptr) {

Committed that and:

define void @atomic_is_alright(i16* %ptr) {

rG2bd252ea8941

So let's abandon this and find out when Attributor will be enabled in D70714.
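As a reminder of how this patch treats a parameter that already carries `dereferenceable` or `dereferenceable_or_null`, here is a small standalone model of the update rule from the diff. `ParamAttrs` and `updateDereferenceable` are hypothetical names; the rule itself (replace only when more bytes were inferred, and drop `dereferenceable_or_null` at the same time) follows the code in `inferDereferenceableFromMemoryAccesses`.

```cpp
#include <cstdint>

// Hypothetical model of one pointer parameter's attributes; 0 means the
// attribute is absent.
struct ParamAttrs {
  uint64_t Dereferenceable;
  uint64_t DereferenceableOrNull;
};

// Returns true if the attributes changed. As in the diff, only the existing
// 'dereferenceable' byte count is compared against the inferred one, and
// 'dereferenceable_or_null' is removed whenever the attribute is replaced.
bool updateDereferenceable(ParamAttrs &P, uint64_t InferredBytes) {
  if (P.Dereferenceable >= InferredBytes)
    return false;
  P.Dereferenceable = InferredBytes;
  P.DereferenceableOrNull = 0;
  return true;
}
```

The byte counts a caller passes in are whatever the access analysis produced; the sketch makes no claim about the expected output of the committed tests.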

spatel abandoned this revision. Nov 26 2019, 6:20 AM

uenoku mentioned this in rG6c742fdbf48e: [Attributor] Deduce dereferenceable based on accessed bytes map. Nov 28 2019, 11:00 PM

Revision Contents

Path                                                          Size
clang/test/CodeGen/ms-intrinsics.c                            8 lines
clang/test/CodeGen/ms-x86-intrinsics.c                        4 lines
clang/test/CodeGen/systemz-inline-asm.c                       2 lines
clang/test/CodeGenOpenCL/kernels-have-spir-cc-by-default.cl   8 lines
llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp                160 lines
llvm/test/CodeGen/AMDGPU/inline-attr.ll                       2 lines
llvm/test/Transforms/InferFunctionAttrs/dereferenceable.ll    75 lines

Diff 209033

clang/test/CodeGen/ms-intrinsics.c

	// CHECK: ret i32 [[RESULT]]			// CHECK: ret i32 [[RESULT]]
	// CHECK: }			// CHECK: }

	char test_iso_volatile_load8(char volatile *p) { return __iso_volatile_load8(p); }			char test_iso_volatile_load8(char volatile *p) { return __iso_volatile_load8(p); }
	short test_iso_volatile_load16(short volatile *p) { return __iso_volatile_load16(p); }			short test_iso_volatile_load16(short volatile *p) { return __iso_volatile_load16(p); }
	int test_iso_volatile_load32(int volatile *p) { return __iso_volatile_load32(p); }			int test_iso_volatile_load32(int volatile *p) { return __iso_volatile_load32(p); }
	__int64 test_iso_volatile_load64(__int64 volatile *p) { return __iso_volatile_load64(p); }			__int64 test_iso_volatile_load64(__int64 volatile *p) { return __iso_volatile_load64(p); }

	// CHECK: define{{.}}i8 @test_iso_volatile_load8(i8{{[a-z_ ]*}}%p)			// CHECK: define{{.}}i8 @test_iso_volatile_load8(i8{{[a-z0-9_() ]*}}%p)
	// CHECK: = load volatile i8, i8* %p			// CHECK: = load volatile i8, i8* %p
	// CHECK: define{{.}}i16 @test_iso_volatile_load16(i16{{[a-z_ ]*}}%p)			// CHECK: define{{.}}i16 @test_iso_volatile_load16(i16{{[a-z0-9_() ]*}}%p)
	// CHECK: = load volatile i16, i16* %p			// CHECK: = load volatile i16, i16* %p
	// CHECK: define{{.}}i32 @test_iso_volatile_load32(i32{{[a-z_ ]*}}%p)			// CHECK: define{{.}}i32 @test_iso_volatile_load32(i32{{[a-z0-9_() ]*}}%p)
	// CHECK: = load volatile i32, i32* %p			// CHECK: = load volatile i32, i32* %p
	// CHECK: define{{.}}i64 @test_iso_volatile_load64(i64{{[a-z_ ]*}}%p)			// CHECK: define{{.}}i64 @test_iso_volatile_load64(i64{{[a-z0-9_() ]*}}%p)
	// CHECK: = load volatile i64, i64* %p			// CHECK: = load volatile i64, i64* %p

	void test_iso_volatile_store8(char volatile *p, char v) { __iso_volatile_store8(p, v); }			void test_iso_volatile_store8(char volatile *p, char v) { __iso_volatile_store8(p, v); }
	void test_iso_volatile_store16(short volatile *p, short v) { __iso_volatile_store16(p, v); }			void test_iso_volatile_store16(short volatile *p, short v) { __iso_volatile_store16(p, v); }
	void test_iso_volatile_store32(int volatile *p, int v) { __iso_volatile_store32(p, v); }			void test_iso_volatile_store32(int volatile *p, int v) { __iso_volatile_store32(p, v); }
	void test_iso_volatile_store64(__int64 volatile *p, __int64 v) { __iso_volatile_store64(p, v); }			void test_iso_volatile_store64(__int64 volatile *p, __int64 v) { __iso_volatile_store64(p, v); }

	// CHECK: define{{.}}void @test_iso_volatile_store8(i8{{[a-z_ ]}}%p, i8 {{[a-z_ ]}}%v)			// CHECK: define{{.}}void @test_iso_volatile_store8(i8{{[a-z_ ]}}%p, i8 {{[a-z_ ]}}%v)

clang/test/CodeGen/ms-x86-intrinsics.c

	// CHECK-X64-LABEL: define dso_local i64 @test__umulh(i64 %a, i64 %b)			// CHECK-X64-LABEL: define dso_local i64 @test__umulh(i64 %a, i64 %b)
	// CHECK-X64: = mul nuw i128 %			// CHECK-X64: = mul nuw i128 %

	__int64 test_mul128(__int64 Multiplier,			__int64 test_mul128(__int64 Multiplier,
	__int64 Multiplicand,			__int64 Multiplicand,
	__int64 *HighProduct) {			__int64 *HighProduct) {
	return _mul128(Multiplier, Multiplicand, HighProduct);			return _mul128(Multiplier, Multiplicand, HighProduct);
	}			}
	// CHECK-X64-LABEL: define dso_local i64 @test_mul128(i64 %Multiplier, i64 %Multiplicand, i64{{[a-z_ ]}}%HighProduct)			// CHECK-X64-LABEL: define dso_local i64 @test_mul128(i64 %Multiplier, i64 %Multiplicand, i64{{[a-z0-9()_ ]}}%HighProduct)
	// CHECK-X64: = sext i64 %Multiplier to i128			// CHECK-X64: = sext i64 %Multiplier to i128
	// CHECK-X64: = sext i64 %Multiplicand to i128			// CHECK-X64: = sext i64 %Multiplicand to i128
	// CHECK-X64: = mul nsw i128 %			// CHECK-X64: = mul nsw i128 %
	// CHECK-X64: store i64 %			// CHECK-X64: store i64 %
	// CHECK-X64: ret i64 %			// CHECK-X64: ret i64 %

	unsigned __int64 test_umul128(unsigned __int64 Multiplier,			unsigned __int64 test_umul128(unsigned __int64 Multiplier,
	unsigned __int64 Multiplicand,			unsigned __int64 Multiplicand,
	unsigned __int64 *HighProduct) {			unsigned __int64 *HighProduct) {
	return _umul128(Multiplier, Multiplicand, HighProduct);			return _umul128(Multiplier, Multiplicand, HighProduct);
	}			}
	// CHECK-X64-LABEL: define dso_local i64 @test_umul128(i64 %Multiplier, i64 %Multiplicand, i64{{[a-z_ ]}}%HighProduct)			// CHECK-X64-LABEL: define dso_local i64 @test_umul128(i64 %Multiplier, i64 %Multiplicand, i64{{[a-z0-9()_ ]}}%HighProduct)
	// CHECK-X64: = zext i64 %Multiplier to i128			// CHECK-X64: = zext i64 %Multiplier to i128
	// CHECK-X64: = zext i64 %Multiplicand to i128			// CHECK-X64: = zext i64 %Multiplicand to i128
	// CHECK-X64: = mul nuw i128 %			// CHECK-X64: = mul nuw i128 %
	// CHECK-X64: store i64 %			// CHECK-X64: store i64 %
	// CHECK-X64: ret i64 %			// CHECK-X64: ret i64 %

	unsigned __int64 test__shiftleft128(unsigned __int64 l, unsigned __int64 h,			unsigned __int64 test__shiftleft128(unsigned __int64 l, unsigned __int64 h,
	unsigned char d) {			unsigned char d) {

clang/test/CodeGen/systemz-inline-asm.c

[lines elided; enclosing context: double test_f64(double f, double g) {]
return f;		return f;
// CHECK-LABEL: define double @test_f64(double %f, double %g)		// CHECK-LABEL: define double @test_f64(double %f, double %g)
// CHECK: call double asm "adbr $0, $2", "=f,0,f"(double %f, double %g)		// CHECK: call double asm "adbr $0, $2", "=f,0,f"(double %f, double %g)
}		}

long double test_f128(long double f, long double g) {		long double test_f128(long double f, long double g) {
asm("axbr %0, %2" : "=f" (f) : "0" (f), "f" (g));		asm("axbr %0, %2" : "=f" (f) : "0" (f), "f" (g));
return f;		return f;
// CHECK: define void @test_f128(fp128* noalias nocapture sret [[DEST:%.]], fp128 nocapture readonly, fp128* nocapture readonly)		// CHECK: define void @test_f128(fp128* noalias nocapture sret dereferenceable(16) [[DEST:%.]], fp128 nocapture readonly dereferenceable(16), fp128* nocapture readonly dereferenceable(16))
// CHECK: %f = load fp128, fp128* %0		// CHECK: %f = load fp128, fp128* %0
// CHECK: %g = load fp128, fp128* %1		// CHECK: %g = load fp128, fp128* %1
// CHECK: [[RESULT:%.*]] = tail call fp128 asm "axbr $0, $2", "=f,0,f"(fp128 %f, fp128 %g)		// CHECK: [[RESULT:%.*]] = tail call fp128 asm "axbr $0, $2", "=f,0,f"(fp128 %f, fp128 %g)
// CHECK: store fp128 [[RESULT]], fp128* [[DEST]]		// CHECK: store fp128 [[RESULT]], fp128* [[DEST]]
}		}

clang/test/CodeGenOpenCL/kernels-have-spir-cc-by-default.cl

[lines elided; enclosing context: typedef struct test_struct {]
short elementG;		short elementG;
double elementH;		double elementH;
} test_struct;		} test_struct;

kernel void test_single(int_single input, global int* output) {		kernel void test_single(int_single input, global int* output) {
// CHECK: spir_kernel		// CHECK: spir_kernel
// AMDGCN: define amdgpu_kernel void @test_single		// AMDGCN: define amdgpu_kernel void @test_single
// CHECK: struct.int_single* nocapture {{.*}} byval(%struct.int_single)		// CHECK: struct.int_single* nocapture {{.*}} byval(%struct.int_single)
// CHECK: i32* nocapture %output		// CHECK: i32* nocapture dereferenceable(4) %output
output[0] = input.a;		output[0] = input.a;
}		}

kernel void test_pair(int_pair input, global int* output) {		kernel void test_pair(int_pair input, global int* output) {
// CHECK: spir_kernel		// CHECK: spir_kernel
// AMDGCN: define amdgpu_kernel void @test_pair		// AMDGCN: define amdgpu_kernel void @test_pair
// CHECK: struct.int_pair* nocapture {{.*}} byval(%struct.int_pair)		// CHECK: struct.int_pair* nocapture {{.*}} byval(%struct.int_pair)
// CHECK: i32* nocapture %output		// CHECK: i32* nocapture dereferenceable(8) %output
output[0] = (int)input.a;		output[0] = (int)input.a;
output[1] = (int)input.b;		output[1] = (int)input.b;
}		}

kernel void test_kernel(test_struct input, global int* output) {		kernel void test_kernel(test_struct input, global int* output) {
// CHECK: spir_kernel		// CHECK: spir_kernel
// AMDGCN: define amdgpu_kernel void @test_kernel		// AMDGCN: define amdgpu_kernel void @test_kernel
// CHECK: struct.test_struct* nocapture {{.*}} byval(%struct.test_struct)		// CHECK: struct.test_struct* nocapture {{.*}} byval(%struct.test_struct)
// CHECK: i32* nocapture %output		// CHECK: i32* nocapture dereferenceable(32) %output
output[0] = input.elementA;		output[0] = input.elementA;
output[1] = input.elementB;		output[1] = input.elementB;
output[2] = (int)input.elementC;		output[2] = (int)input.elementC;
output[3] = (int)input.elementD;		output[3] = (int)input.elementD;
output[4] = (int)input.elementE;		output[4] = (int)input.elementE;
output[5] = (int)input.elementF;		output[5] = (int)input.elementF;
output[6] = (int)input.elementG;		output[6] = (int)input.elementG;
output[7] = (int)input.elementH;		output[7] = (int)input.elementH;
};		};

void test_function(int_pair input, global int* output) {		void test_function(int_pair input, global int* output) {
// CHECK-NOT: spir_kernel		// CHECK-NOT: spir_kernel
// AMDGCN-NOT: define amdgpu_kernel void @test_function		// AMDGCN-NOT: define amdgpu_kernel void @test_function
// CHECK: i64 %input.coerce0, i64 %input.coerce1, i32* nocapture %output		// CHECK: i64 %input.coerce0, i64 %input.coerce1, i32* nocapture dereferenceable(8) %output
output[0] = (int)input.a;		output[0] = (int)input.a;
output[1] = (int)input.b;		output[1] = (int)input.b;
}		}

llvm/lib/Transforms/IPO/InferFunctionAttrs.cpp

	//===- InferFunctionAttrs.cpp - Infer implicit function attributes --------===//			//===- InferFunctionAttrs.cpp - Infer implicit function attributes --------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/Transforms/IPO/InferFunctionAttrs.h"			#include "llvm/Transforms/IPO/InferFunctionAttrs.h"
	#include "llvm/Analysis/TargetLibraryInfo.h"			#include "llvm/Analysis/TargetLibraryInfo.h"
				#include "llvm/Analysis/ValueTracking.h"
	#include "llvm/IR/Function.h"			#include "llvm/IR/Function.h"
	#include "llvm/IR/LLVMContext.h"			#include "llvm/IR/LLVMContext.h"
	#include "llvm/IR/Module.h"			#include "llvm/IR/Module.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	#include "llvm/Transforms/Utils/BuildLibCalls.h"			#include "llvm/Transforms/Utils/BuildLibCalls.h"
	using namespace llvm;			using namespace llvm;

				// TODO: Could use an LLVM set container, but requires sorting?
				using SetOfOffsets = std::set<int64_t>;
				using ArgToOffsetsMap = SmallDenseMap<Argument *, SetOfOffsets>;

	#define DEBUG_TYPE "inferattrs"			#define DEBUG_TYPE "inferattrs"

	static bool inferAllPrototypeAttributes(Module &M,			// FIXME: This entire pass should be deprecated by making the "Attributor" pass
	const TargetLibraryInfo &TLI) {			// handle these kinds of inferences.

				static void getArgToOffsetsMap(Function &F, ArgToOffsetsMap &ArgOffsetMap) {
				// To apply a dereferenceable attribute to an argument based on a memory
				// access in the function, the access must be guaranteed to execute every time
				// the function is called.
				// Conservatively, only check for memory ops in the entry block that are
				// guaranteed to execute.
				// TODO: This could be enhanced by testing if a memory access post-dominates
				// the entry block (walking to/from the load). We can also check if a
				// block is guaranteed to transfer execution to another block.
				const DataLayout &DL = F.getParent()->getDataLayout();
				Inline comment, hfinkel (Not Done): Why not also stores (or AtomicCmpXchg or AtomicRMW)?
				Inline comment, spatel (Done): Oversight on my part - tunnel vision based on the motivating cases. I've never done anything with the atomic opcodes, so I didn't remember them. For stores, how would dereferenceable aid optimization? Ok if we make this a TODO enhancement/follow-up?
				Inline comment, hfinkel (Not Done): If you store to an address, then you know that it is dereferenceable. The point is not to aid the optimization of the store, but to aid the optimization of later loads. I'd like to see stores handled here - if we're going to find problems with this, that will make it more likely that we'll find them quickly. Also, you must omit volatile loads and stores (IIRC, our semantics mean that volatile loads/stores won't imply dereferenceability for the non-volatile accesses).
				Inline comment, jfb (Done): It does seem like you want to handle non-volatile atomic load / store, as well as cmpxchg and RMW.
				Inline comment, spatel (Done): Ok - one more TODO. :)
				BasicBlock &Entry = F.getEntryBlock();
				for (Instruction &I : Entry) {
				// Analyze pointer operands of any load/store instruction.
				// TODO: Allow cmpxchg and atomicrmw opcodes.
				Inline comment, hfinkel (Not Done): This logic seems unnecessarily limited. Why not use GetPointerBaseWithConstantOffset?
				Inline comment, spatel (Done): Another oversight on my part. I wasn't thinking about cases with pointer casts, so that made the logic simpler. Will change, but hoping a partial implementation is good enough for an initial patch (see next comment).
				Inline comment, jdoerfert (Done): "Style": Is this the only use of the "match" function? If so, why not do the (in the middle-end) more familiar pattern of `dyn_cast` and `getPointerOperand`?
				Inline comment, spatel (Done): Yes - that's the only use of match. Since we're going to support stores now, it will go away. Note: 'match' is the more familiar pattern to me because that's used throughout instcombine.
				// TODO: "isSimple()" excludes atomic ops, but some subset of those should
				// be allowed.
				Value *PtrOp = nullptr;
				switch (I.getOpcode()) {
				case Instruction::Load: {
				auto *Load = cast<LoadInst>(&I);
				if (Load->isSimple())
				Inline comment, jdoerfert (Done): I don't think you can get a `nullptr` back.
				Inline comment, spatel (Done): Ah, I misunderstood the API - I thought if there's no constant offset, the base should be returned as nullptr. Independent of this patch: add a line to the documentation comment to make the behavior clear as part of the cleanup in D64468?
				Inline comment, jdoerfert (Not Done):
				> Independent of this patch: add a line to the documentation comment to make the behavior clear as part of the cleanup in D64468?
				Done, take a look if that is what you wanted. (The wrappers are "not well" documented but only the base function is.)
				PtrOp = Load->getPointerOperand();
				break;
				}
				case Instruction::Store: {
				auto *Store = cast<StoreInst>(&I);
				if (Store->isSimple())
				PtrOp = Store->getPointerOperand();
				break;
				}
				default:
				Inline comment, jdoerfert (Not Done): Maybe a bit more general, something like:
				    bits = DL.getTypeSizeInBits(ArgTy->getType()->getPointerElementType());
				    // Round down to the nearest multiple of 8; dereferenceable attributes use bytes.
				    bits = bits - bits % 8;
				    if (!bits) continue;
				(`GEPOperator::accumulateConstantOffset` uses `DL.getTypeAllocSize`, which we could probably use as well.)
				break;
				}
				if (!PtrOp) {
				if (!isGuaranteedToTransferExecutionToSuccessor(&I))
				return;
				continue;
				}
				assert(isGuaranteedToTransferExecutionToSuccessor(&I) &&
				"Expected simple memory access to transfer execution");

				// Decompose the pointer into base (which must be a function argument) and
				// offset. Ignore negative offsets because the dereferenceable range must
				// begin at the argument.
				Inline comment, jdoerfert (Not Done): Two ideas:
				1) We could only track minimum + maximum `ByteOffset` values, iff we know you cannot "jump" between allocations.
				2) Regardless of 1), we could use the maximum `ByteOffset` value we found for an `inbounds` GEP as a lower bound for the `dereferenceable` bytes. `inbounds` GEPs should not allow any "jumping" or starting outside the object.
				int64_t ByteOffset;
				Value *Base = GetPointerBaseWithConstantOffset(PtrOp, ByteOffset, DL);
				auto *Arg = dyn_cast<Argument>(Base);
				if (!Arg || ByteOffset < 0)
				continue;

				// Make sure we have a pointer to a type that is a multiple of 8-bit bytes
				// because the 'dereferenceable' attribute range is specified using bytes.
				// TODO: We can handle weird bitwidths by rounding down.
				assert(Arg->getType()->isPointerTy() && "Unexpected non-pointer type");
				Type *ArgEltType = cast<PointerType>(Arg->getType())->getElementType();
				unsigned ArgSizeInBits = ArgEltType->getPrimitiveSizeInBits();
				if (!ArgSizeInBits || ArgSizeInBits % 8 != 0)
				continue;

				// TODO: This restriction can be removed, but that will make the range
				// calculation more complicated. Instead of only tracking whole number
				// offsets from the base, we have to track individual offsets and
				// ranges (fractional and multiple offsets are possible via casts).
				assert(isa<PointerType>(PtrOp->getType()) && "Expected pointer type");
				Type *AccessType = cast<PointerType>(PtrOp->getType())->getElementType();
				unsigned AccessSizeInBits = AccessType->getPrimitiveSizeInBits();
				if (AccessSizeInBits != ArgSizeInBits)
				continue;

				assert((ByteOffset % (AccessSizeInBits / 8)) == 0 &&
				"Unexpected address offset calculation");
				SetOfOffsets &OffsetsForArg = ArgOffsetMap[Arg];
				OffsetsForArg.insert(ByteOffset / (AccessSizeInBits / 8));
				}
				}

				static bool inferDereferenceableFromMemoryAccesses(Function &F) {
				ArgToOffsetsMap ArgOffsetMap;
				getArgToOffsetsMap(F, ArgOffsetMap);
	bool Changed = false;			bool Changed = false;

	for (Function &F : M.functions())			// For any pointer argument that we matched with memory accesses...
	// We only infer things using the prototype and the name; we don't need			for (auto &ArgAndOffsetPair : ArgOffsetMap) {
	// definitions.			Argument *Arg = ArgAndOffsetPair.getFirst();
	if (F.isDeclaration() && !F.hasOptNone())			SetOfOffsets &Offsets = ArgAndOffsetPair.getSecond();
	Changed |= inferLibFuncAttributes(F, TLI);
				// Determine how many consecutive memory accesses that we found. The set is
				// sorted, so as soon as we miss an offset from the pointer, we are done.
				// We do not know if a chunk of memory is dereferenceable without an access.
				// TODO: See size limitation in getArgToOffsetsMap(). If we allow varying
				// sizes of accesses from an argument, this will not be valid.
				int64_t MaxOffset = 0;
				Inline comment, hfinkel (Not Done): Does this do the right thing if the lowest GEP index is negative? You could skip the negative ones first? As a general point, I think that you want logic here mirroring (a subset of) what's in isOverwrite in DeadStoreElimination.cpp.
				Inline comment, spatel (Done): I don't think there was miscompile potential there, but that was accidental. I didn't think about negative offsets. Will fix. I'd like to make the DSE-like enhancement to support arbitrary-sized sub-ranges (via pointer casts) a follow-up, so this patch doesn't get too complicated.
				for (int64_t Offset : Offsets) {
				if (Offset != MaxOffset)
				break;
				++MaxOffset;
				}
				// If there was no access directly from this pointer argument, give up.
				// TODO: We could extend an existing known dereferenceable argument with
				// extra bytes even if there are missing leading chunks.
				if (!MaxOffset)
				continue;

				auto *PtrTy = cast<PointerType>(Arg->getType());
				unsigned EltSize = PtrTy->getElementType()->getPrimitiveSizeInBits();
				uint64_t DerefBytes = MaxOffset * (EltSize / 8);
				Inline comment, jdoerfert (Not Done): Wrt. the above changes it would probably be: `MaxOffset * EltSize / 8`.

				// Replace existing dereferenceable attributes if we determined that more
				// bytes are always accessed.
				unsigned ArgNumber = Arg->getArgNo();
				if (F.getParamDereferenceableBytes(ArgNumber) < DerefBytes) {
				F.removeParamAttr(ArgNumber, Attribute::Dereferenceable);
				Inline comment, jdoerfert (Done): You need to remove `deref_or_null` as well.
				F.removeParamAttr(ArgNumber, Attribute::DereferenceableOrNull);
				F.addDereferenceableParamAttr(ArgNumber, DerefBytes);
				Changed = true;
				}
				}

				return Changed;
				}

				static bool inferAttributes(Module &M, const TargetLibraryInfo &TLI) {
				bool Changed = false;

				for (Function &F : M.functions()) {
				if (F.hasOptNone())
				continue;
				// For libfunc attributes, we infer things using the prototype and the name.
				// For other attributes, we need to look at the function definition.
				if (F.isDeclaration())
				Changed |= inferLibFuncAttributes(F, TLI);
				else
				Changed |= inferDereferenceableFromMemoryAccesses(F);
				}
	return Changed;			return Changed;
	}			}

	PreservedAnalyses InferFunctionAttrsPass::run(Module &M,			PreservedAnalyses InferFunctionAttrsPass::run(Module &M,
	ModuleAnalysisManager &AM) {			ModuleAnalysisManager &AM) {
	auto &TLI = AM.getResult<TargetLibraryAnalysis>(M);			// If we may have changed fundamental function attributes, clear analyses.

	if (!inferAllPrototypeAttributes(M, TLI))
	// If we didn't infer anything, preserve all analyses.			// If we didn't infer anything, preserve all analyses.
	return PreservedAnalyses::all();			auto &TLI = AM.getResult<TargetLibraryAnalysis>(M);
				return inferAttributes(M, TLI) ? PreservedAnalyses::none()
	// Otherwise, we may have changed fundamental function attributes, so clear			: PreservedAnalyses::all();
	// out all the passes.
	return PreservedAnalyses::none();
	}			}

	namespace {			namespace {
	struct InferFunctionAttrsLegacyPass : public ModulePass {			struct InferFunctionAttrsLegacyPass : public ModulePass {
	static char ID; // Pass identification, replacement for typeid			static char ID; // Pass identification, replacement for typeid
	InferFunctionAttrsLegacyPass() : ModulePass(ID) {			InferFunctionAttrsLegacyPass() : ModulePass(ID) {
	initializeInferFunctionAttrsLegacyPassPass(			initializeInferFunctionAttrsLegacyPassPass(
	*PassRegistry::getPassRegistry());			*PassRegistry::getPassRegistry());
	}			}

	void getAnalysisUsage(AnalysisUsage &AU) const override {			void getAnalysisUsage(AnalysisUsage &AU) const override {
	AU.addRequired<TargetLibraryInfoWrapperPass>();			AU.addRequired<TargetLibraryInfoWrapperPass>();
	}			}

	bool runOnModule(Module &M) override {			bool runOnModule(Module &M) override {
	if (skipModule(M))			if (skipModule(M))
	return false;			return false;

	auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();			auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
	return inferAllPrototypeAttributes(M, TLI);			return inferAttributes(M, TLI);
	}			}
	};			};
	}			}

	char InferFunctionAttrsLegacyPass::ID = 0;			char InferFunctionAttrsLegacyPass::ID = 0;
	INITIALIZE_PASS_BEGIN(InferFunctionAttrsLegacyPass, "inferattrs",			INITIALIZE_PASS_BEGIN(InferFunctionAttrsLegacyPass, "inferattrs",
	"Infer set function attributes", false, false)			"Infer set function attributes", false, false)
	INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)			INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
	INITIALIZE_PASS_END(InferFunctionAttrsLegacyPass, "inferattrs",			INITIALIZE_PASS_END(InferFunctionAttrsLegacyPass, "inferattrs",
	"Infer set function attributes", false, false)			"Infer set function attributes", false, false)

	Pass *llvm::createInferFunctionAttrsLegacyPass() {			Pass *llvm::createInferFunctionAttrsLegacyPass() {
	return new InferFunctionAttrsLegacyPass();			return new InferFunctionAttrsLegacyPass();
	}			}
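The collection step in `getArgToOffsetsMap` can be summarized with the standalone sketch below. It is a simplified model under stated assumptions, not the patch itself: `Inst` and `collectOffsets` are hypothetical names, each modeled instruction already knows its constant byte offset from a single argument of interest, and misaligned offsets are skipped here where the patch asserts instead.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified view of one entry-block instruction.
struct Inst {
  bool IsSimpleAccess;     // simple load/store based on the argument of interest
  int64_t ByteOffset;      // constant byte offset from that argument
  unsigned AccessBits;     // size of the accessed type in bits
  bool TransfersExecution; // guaranteed to reach the next instruction
};

// Walk the entry block and record element offsets that are accessed on every
// call. Stop at the first instruction that may not transfer execution to its
// successor, because later accesses are no longer guaranteed to run.
std::set<int64_t> collectOffsets(const std::vector<Inst> &Entry,
                                 unsigned ArgEltBits) {
  std::set<int64_t> Offsets;
  // The attribute is expressed in bytes, so only whole-byte element types.
  if (ArgEltBits == 0 || ArgEltBits % 8 != 0)
    return Offsets;
  const int64_t EltBytes = ArgEltBits / 8;
  for (const Inst &I : Entry) {
    if (!I.IsSimpleAccess) {
      if (!I.TransfersExecution)
        break;
      continue;
    }
    // The dereferenceable range must begin at the argument, and this sketch
    // (like the patch) only handles accesses of the argument's element size.
    if (I.ByteOffset < 0 || I.AccessBits != ArgEltBits)
      continue;
    if (I.ByteOffset % EltBytes != 0)
      continue;
    Offsets.insert(I.ByteOffset / EltBytes);
  }
  return Offsets;
}
```

The resulting set is sorted, so the later range computation can simply count consecutive offsets starting from 0.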

llvm/test/CodeGen/AMDGPU/inline-attr.ll

	; RUN: opt -mtriple=amdgcn--amdhsa -S -O3 -enable-unsafe-fp-math %s | FileCheck -check-prefix=GCN -check-prefix=UNSAFE %s			; RUN: opt -mtriple=amdgcn--amdhsa -S -O3 -enable-unsafe-fp-math %s | FileCheck -check-prefix=GCN -check-prefix=UNSAFE %s
	; RUN: opt -mtriple=amdgcn--amdhsa -S -O3 -enable-no-nans-fp-math %s | FileCheck -check-prefix=GCN -check-prefix=NONANS %s			; RUN: opt -mtriple=amdgcn--amdhsa -S -O3 -enable-no-nans-fp-math %s | FileCheck -check-prefix=GCN -check-prefix=NONANS %s
	; RUN: opt -mtriple=amdgcn--amdhsa -S -O3 -enable-no-infs-fp-math %s | FileCheck -check-prefix=GCN -check-prefix=NOINFS %s			; RUN: opt -mtriple=amdgcn--amdhsa -S -O3 -enable-no-infs-fp-math %s | FileCheck -check-prefix=GCN -check-prefix=NOINFS %s

	; GCN: define float @foo(float %x) local_unnamed_addr #0 {			; GCN: define float @foo(float %x) local_unnamed_addr #0 {
	; GCN: define amdgpu_kernel void @caller(float addrspace(1)* nocapture %p) local_unnamed_addr #1 {			; GCN: define amdgpu_kernel void @caller(float addrspace(1)* nocapture dereferenceable(4) %p) local_unnamed_addr #1 {
	; GCN: %mul.i = fmul float %load, 1.500000e+01			; GCN: %mul.i = fmul float %load, 1.500000e+01

	; UNSAFE: attributes #0 = { norecurse nounwind readnone "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" }			; UNSAFE: attributes #0 = { norecurse nounwind readnone "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" }
	; UNSAFE: attributes #1 = { nofree norecurse nounwind "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" }			; UNSAFE: attributes #1 = { nofree norecurse nounwind "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" }

	; NOINFS: attributes #0 = { norecurse nounwind readnone "no-infs-fp-math"="true" }			; NOINFS: attributes #0 = { norecurse nounwind readnone "no-infs-fp-math"="true" }
	; NOINFS: attributes #1 = { nofree norecurse nounwind "less-precise-fpmad"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="false" "unsafe-fp-math"="false" }			; NOINFS: attributes #1 = { nofree norecurse nounwind "less-precise-fpmad"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="false" "unsafe-fp-math"="false" }


llvm/test/Transforms/InferFunctionAttrs/dereferenceable.ll

; RUN: opt < %s -inferattrs -S | FileCheck %s		; RUN: opt < %s -inferattrs -S | FileCheck %s
		; RUN: opt < %s -passes=inferattrs -S | FileCheck %s

; Determine dereference-ability before unused loads get deleted:		; Determine dereference-ability before unused loads get deleted:
; https://bugs.llvm.org/show_bug.cgi?id=21780		; https://bugs.llvm.org/show_bug.cgi?id=21780

define <4 x double> @PR21780(double* %ptr) {		define <4 x double> @PR21780(double* %ptr) {
; CHECK-LABEL: @PR21780(double* %ptr)		; CHECK-LABEL: @PR21780(double* dereferenceable(32) %ptr)
; GEP of index 0 is simplified away.		; GEP of index 0 is simplified away.
%arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1		%arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1
%arrayidx2 = getelementptr inbounds double, double* %ptr, i64 2		%arrayidx2 = getelementptr inbounds double, double* %ptr, i64 2
%arrayidx3 = getelementptr inbounds double, double* %ptr, i64 3		%arrayidx3 = getelementptr inbounds double, double* %ptr, i64 3

%t0 = load double, double* %ptr, align 8		%t0 = load double, double* %ptr, align 8
%t1 = load double, double* %arrayidx1, align 8		%t1 = load double, double* %arrayidx1, align 8
%t2 = load double, double* %arrayidx2, align 8		%t2 = load double, double* %arrayidx2, align 8
%t3 = load double, double* %arrayidx3, align 8		%t3 = load double, double* %arrayidx3, align 8

%vecinit0 = insertelement <4 x double> undef, double %t0, i32 0		%vecinit0 = insertelement <4 x double> undef, double %t0, i32 0
%vecinit1 = insertelement <4 x double> %vecinit0, double %t1, i32 1		%vecinit1 = insertelement <4 x double> %vecinit0, double %t1, i32 1
%vecinit2 = insertelement <4 x double> %vecinit1, double %t2, i32 2		%vecinit2 = insertelement <4 x double> %vecinit1, double %t2, i32 2
%vecinit3 = insertelement <4 x double> %vecinit2, double %t3, i32 3		%vecinit3 = insertelement <4 x double> %vecinit2, double %t3, i32 3
%shuffle = shufflevector <4 x double> %vecinit3, <4 x double> %vecinit3, <4 x i32> <i32 0, i32 0, i32 2, i32 2>		%shuffle = shufflevector <4 x double> %vecinit3, <4 x double> %vecinit3, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
ret <4 x double> %shuffle		ret <4 x double> %shuffle
}		}

; Unsimplified, but still valid. Also, throw in some bogus arguments.		; Unsimplified, but still valid. Also, throw in a bogus argument and a store argument.

define void @gep0(i8* %unused, i8* %other, i8* %ptr) {		define void @gep0(i8* %unused, i8* %other, i8* %ptr) {
; CHECK-LABEL: @gep0(i8* %unused, i8* %other, i8* %ptr)		; CHECK-LABEL: @gep0(i8* %unused, i8* dereferenceable(1) %other, i8* dereferenceable(3) %ptr)
%arrayidx0 = getelementptr i8, i8* %ptr, i64 0		%arrayidx0 = getelementptr i8, i8* %ptr, i64 0
%arrayidx1 = getelementptr i8, i8* %ptr, i64 1		%arrayidx1 = getelementptr i8, i8* %ptr, i64 1
%arrayidx2 = getelementptr i8, i8* %ptr, i64 2		%arrayidx2 = getelementptr i8, i8* %ptr, i64 2
%t0 = load i8, i8* %arrayidx0		%t0 = load i8, i8* %arrayidx0
%t1 = load i8, i8* %arrayidx1		%t1 = load i8, i8* %arrayidx1
%t2 = load i8, i8* %arrayidx2		%t2 = load i8, i8* %arrayidx2
store i8 %t2, i8* %other		store i8 %t2, i8* %other
ret void		ret void
}		}

; Order of accesses does not change computation.		; Order of accesses does not change computation.
; Multiple arguments may be dereferenceable.		; Multiple arguments may be dereferenceable.

define void @ordering(i8* %ptr1, i32* %ptr2) {		define void @ordering(i8* %ptr1, i32* %ptr2) {
; CHECK-LABEL: @ordering(i8* %ptr1, i32* %ptr2)		; CHECK-LABEL: @ordering(i8* dereferenceable(3) %ptr1, i32* dereferenceable(8) %ptr2)
%a20 = getelementptr i32, i32* %ptr2, i64 0		%a20 = getelementptr i32, i32* %ptr2, i64 0
%a12 = getelementptr i8, i8* %ptr1, i64 2		%a12 = getelementptr i8, i8* %ptr1, i64 2
%t12 = load i8, i8* %a12		%t12 = load i8, i8* %a12
%a11 = getelementptr i8, i8* %ptr1, i64 1		%a11 = getelementptr i8, i8* %ptr1, i64 1
%t20 = load i32, i32* %a20		%t20 = load i32, i32* %a20
%a10 = getelementptr i8, i8* %ptr1, i64 0		%a10 = getelementptr i8, i8* %ptr1, i64 0
%t10 = load i8, i8* %a10		%t10 = load i8, i8* %a10
%t11 = load i8, i8* %a11		%t11 = load i8, i8* %a11
; [13 lines collapsed in the diff view; context: exit:]
%arrayidx1 = getelementptr i8, i8* %ptr, i64 1		%arrayidx1 = getelementptr i8, i8* %ptr, i64 1
%arrayidx2 = getelementptr i8, i8* %ptr, i64 2		%arrayidx2 = getelementptr i8, i8* %ptr, i64 2
%t0 = load i8, i8* %arrayidx0		%t0 = load i8, i8* %arrayidx0
%t1 = load i8, i8* %arrayidx1		%t1 = load i8, i8* %arrayidx1
%t2 = load i8, i8* %arrayidx2		%t2 = load i8, i8* %arrayidx2
ret void		ret void
}		}

; Not in entry block and not guaranteed to execute.		; Negative test - not in entry block and not guaranteed to execute.

define void @not_entry_not_guaranteed_to_execute(i8* %ptr, i1 %cond) {		define void @not_entry_not_guaranteed_to_execute(i8* %ptr, i1 %cond) {
; CHECK-LABEL: @not_entry_not_guaranteed_to_execute(i8* %ptr, i1 %cond)		; CHECK-LABEL: @not_entry_not_guaranteed_to_execute(i8* %ptr, i1 %cond)
entry:		entry:
br i1 %cond, label %loads, label %exit		br i1 %cond, label %loads, label %exit
loads:		loads:
%arrayidx0 = getelementptr i8, i8* %ptr, i64 0		%arrayidx0 = getelementptr i8, i8* %ptr, i64 0
%arrayidx1 = getelementptr i8, i8* %ptr, i64 1		%arrayidx1 = getelementptr i8, i8* %ptr, i64 1
%arrayidx2 = getelementptr i8, i8* %ptr, i64 2		%arrayidx2 = getelementptr i8, i8* %ptr, i64 2
%t0 = load i8, i8* %arrayidx0		%t0 = load i8, i8* %arrayidx0
%t1 = load i8, i8* %arrayidx1		%t1 = load i8, i8* %arrayidx1
%t2 = load i8, i8* %arrayidx2		%t2 = load i8, i8* %arrayidx2
ret void		ret void
exit:		exit:
ret void		ret void
}		}

; The last load may not execute, so dereferenceable bytes only cover the first two loads.		; The last load may not execute, so dereferenceable bytes only cover the first two loads.

define void @partial_in_entry(i16* %ptr, i1 %cond) {		define void @partial_in_entry(i16* %ptr, i1 %cond) {
; CHECK-LABEL: @partial_in_entry(i16* %ptr, i1 %cond)		; CHECK-LABEL: @partial_in_entry(i16* dereferenceable(4) %ptr, i1 %cond)
entry:		entry:
%arrayidx0 = getelementptr i16, i16* %ptr, i64 0		%arrayidx0 = getelementptr i16, i16* %ptr, i64 0
%arrayidx1 = getelementptr i16, i16* %ptr, i64 1		%arrayidx1 = getelementptr i16, i16* %ptr, i64 1
%arrayidx2 = getelementptr i16, i16* %ptr, i64 2		%arrayidx2 = getelementptr i16, i16* %ptr, i64 2
%t0 = load i16, i16* %arrayidx0		%t0 = load i16, i16* %arrayidx0
%t1 = load i16, i16* %arrayidx1		%t1 = load i16, i16* %arrayidx1
br i1 %cond, label %loads, label %exit		br i1 %cond, label %loads, label %exit
loads:		loads:
%t2 = load i16, i16* %arrayidx2		%t2 = load i16, i16* %arrayidx2
ret void		ret void
exit:		exit:
ret void		ret void
}		}

; The volatile load can't be used to prove a non-volatile access is allowed.		; The volatile load can't be used to prove a non-volatile access is allowed.
; The 2nd and 3rd loads may never execute.		; The 2nd and 3rd loads may never execute.

define void @volatile_is_not_dereferenceable(i16* %ptr) {		define void @volatile_is_not_dereferenceable(i16* %ptr) {
; CHECK-LABEL: @volatile_is_not_dereferenceable(i16* %ptr)		; CHECK-LABEL: @volatile_is_not_dereferenceable(i16* %ptr)
		jdoerfert: volatile should not cause `deref`. I think this was said: "This means the compiler may not use a volatile operation to prove a non-volatile access to that address has defined behavior."
%arrayidx0 = getelementptr i16, i16* %ptr, i64 0		%arrayidx0 = getelementptr i16, i16* %ptr, i64 0
%arrayidx1 = getelementptr i16, i16* %ptr, i64 1		%arrayidx1 = getelementptr i16, i16* %ptr, i64 1
%arrayidx2 = getelementptr i16, i16* %ptr, i64 2		%arrayidx2 = getelementptr i16, i16* %ptr, i64 2
%t0 = load volatile i16, i16* %arrayidx0		%t0 = load volatile i16, i16* %arrayidx0
%t1 = load i16, i16* %arrayidx1		%t1 = load i16, i16* %arrayidx1
%t2 = load i16, i16* %arrayidx2		%t2 = load i16, i16* %arrayidx2
ret void		ret void
}		}

		; TODO: We should allow inference for atomic (but not volatile) ops.

		define void @atomic_is_alright(i16* %ptr) {
		; CHECK-LABEL: @atomic_is_alright(i16* %ptr)
		%arrayidx0 = getelementptr i16, i16* %ptr, i64 0
		%arrayidx1 = getelementptr i16, i16* %ptr, i64 1
		%arrayidx2 = getelementptr i16, i16* %ptr, i64 2
		%t0 = load atomic i16, i16* %arrayidx0 unordered, align 2
		%t1 = load i16, i16* %arrayidx1
		%t2 = load i16, i16* %arrayidx2
		ret void
		}

declare void @may_not_return()		declare void @may_not_return()

define void @not_guaranteed_to_transfer_execution(i16* %ptr) {		define void @not_guaranteed_to_transfer_execution(i16* %ptr) {
; CHECK-LABEL: @not_guaranteed_to_transfer_execution(i16* %ptr)		; CHECK-LABEL: @not_guaranteed_to_transfer_execution(i16* dereferenceable(2) %ptr)
%arrayidx0 = getelementptr i16, i16* %ptr, i64 0		%arrayidx0 = getelementptr i16, i16* %ptr, i64 0
%arrayidx1 = getelementptr i16, i16* %ptr, i64 1		%arrayidx1 = getelementptr i16, i16* %ptr, i64 1
%arrayidx2 = getelementptr i16, i16* %ptr, i64 2		%arrayidx2 = getelementptr i16, i16* %ptr, i64 2
%t0 = load i16, i16* %arrayidx0		%t0 = load i16, i16* %arrayidx0
call void @may_not_return()		call void @may_not_return()
%t1 = load i16, i16* %arrayidx1		%t1 = load i16, i16* %arrayidx1
%t2 = load i16, i16* %arrayidx2		%t2 = load i16, i16* %arrayidx2
ret void		ret void
}		}
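
The tests above (`partial_in_entry`, `volatile_is_not_dereferenceable`, `not_guaranteed_to_transfer_execution`) all depend on which accesses are guaranteed to execute once the function is entered. A minimal Python sketch of that scan follows. It is an illustration only, not the patch's C++ code: the `Inst` record and `guaranteed_accesses` are hypothetical names, and the model assumes every access has already been reduced to a byte offset from the pointer argument.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Inst:
    """Hypothetical, simplified model of one instruction in the entry block."""
    kind: str                     # "load", "store", "call", "br" or "ret"
    offset: Optional[int] = None  # constant byte offset from the argument, None if variable
    size: int = 0                 # bytes accessed
    volatile: bool = False
    may_not_return: bool = False  # e.g. a call to an unknown function


def guaranteed_accesses(entry_block: List[Inst]) -> List[Tuple[int, int]]:
    """Collect (offset, size) pairs for accesses that must execute on entry."""
    accesses = []
    for inst in entry_block:
        if inst.kind in ("load", "store"):
            if inst.volatile:
                # Assumption taken from the test comments: a volatile access
                # proves nothing and later accesses may never execute.
                break
            if inst.offset is not None:
                accesses.append((inst.offset, inst.size))
            # A variable offset is skipped, but the scan continues.
        elif inst.kind == "call" and inst.may_not_return:
            break  # execution may never reach the following instructions
        elif inst.kind in ("br", "ret"):
            break  # this sketch only looks at the entry block
    return accesses
```

For `@not_guaranteed_to_transfer_execution`, only the first `i16` load precedes the call, which matches the expected `dereferenceable(2)`.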

; We must have consecutive accesses.		; We must have consecutive accesses.

define void @variable_gep_index(i8* %unused, i8* %ptr, i64 %variable_index) {		define void @variable_gep_index(i8* %unused, i8* %ptr, i64 %variable_index) {
; CHECK-LABEL: @variable_gep_index(i8* %unused, i8* %ptr, i64 %variable_index)		; CHECK-LABEL: @variable_gep_index(i8* %unused, i8* dereferenceable(1) %ptr, i64 %variable_index)
%arrayidx1 = getelementptr i8, i8* %ptr, i64 %variable_index		%arrayidx1 = getelementptr i8, i8* %ptr, i64 %variable_index
%arrayidx2 = getelementptr i8, i8* %ptr, i64 2		%arrayidx2 = getelementptr i8, i8* %ptr, i64 2
%t0 = load i8, i8* %ptr		%t0 = load i8, i8* %ptr
%t1 = load i8, i8* %arrayidx1		%t1 = load i8, i8* %arrayidx1
%t2 = load i8, i8* %arrayidx2		%t2 = load i8, i8* %arrayidx2
ret void		ret void
}		}

; Deal with >1 GEP index.		; TODO: Deal with >1 GEP index.

define void @multi_index_gep(<4 x i8>* %ptr) {		define void @multi_index_gep(<4 x i8>* %ptr) {
; CHECK-LABEL: @multi_index_gep(<4 x i8>* %ptr)		; CHECK-LABEL: @multi_index_gep(<4 x i8>* %ptr)
%arrayidx00 = getelementptr <4 x i8>, <4 x i8>* %ptr, i64 0, i64 0		%arrayidx00 = getelementptr <4 x i8>, <4 x i8>* %ptr, i64 0, i64 0
%t0 = load i8, i8* %arrayidx00		%t0 = load i8, i8* %arrayidx00
ret void		ret void
}		}

; Could round weird bitwidths down?		; TODO: Could round weird bitwidths down?
		jdoerfert: Yes ;)
		spatel: I've left this as a TODO for now because it's highly unusual to see non-power-of-2 bitwidths. We're going to rewrite the code fairly soon, and we'll have a code comment and this test in place to remind us that we can make the logic more flexible.
		jdoerfert: Fine with me.

define void @not_byte_multiple(i9* %ptr) {		define void @not_byte_multiple(i9* %ptr) {
; CHECK-LABEL: @not_byte_multiple(i9* %ptr)		; CHECK-LABEL: @not_byte_multiple(i9* %ptr)
%arrayidx0 = getelementptr i9, i9* %ptr, i64 0		%arrayidx0 = getelementptr i9, i9* %ptr, i64 0
%t0 = load i9, i9* %arrayidx0		%t0 = load i9, i9* %arrayidx0
ret void		ret void
}		}

; Missing direct access from the pointer.		; Negative test - missing direct access from the pointer.

define void @no_pointer_deref(i16* %ptr) {		define void @no_pointer_deref(i16* %ptr) {
; CHECK-LABEL: @no_pointer_deref(i16* %ptr)		; CHECK-LABEL: @no_pointer_deref(i16* %ptr)
%arrayidx1 = getelementptr i16, i16* %ptr, i64 1		%arrayidx1 = getelementptr i16, i16* %ptr, i64 1
%arrayidx2 = getelementptr i16, i16* %ptr, i64 2		%arrayidx2 = getelementptr i16, i16* %ptr, i64 2
%t1 = load i16, i16* %arrayidx1		%t1 = load i16, i16* %arrayidx1
%t2 = load i16, i16* %arrayidx2		%t2 = load i16, i16* %arrayidx2
ret void		ret void
}		}

; Out-of-order is ok, but missing access concludes dereferenceable range.		; Out-of-order is ok, but missing access concludes dereferenceable range.

define void @non_consecutive(i32* %ptr) {		define void @non_consecutive(i32* %ptr) {
; CHECK-LABEL: @non_consecutive(i32* %ptr)		; CHECK-LABEL: @non_consecutive(i32* dereferenceable(8) %ptr)
%arrayidx1 = getelementptr i32, i32* %ptr, i64 1		%arrayidx1 = getelementptr i32, i32* %ptr, i64 1
%arrayidx0 = getelementptr i32, i32* %ptr, i64 0		%arrayidx0 = getelementptr i32, i32* %ptr, i64 0
%arrayidx3 = getelementptr i32, i32* %ptr, i64 3		%arrayidx3 = getelementptr i32, i32* %ptr, i64 3
%t1 = load i32, i32* %arrayidx1		%t1 = load i32, i32* %arrayidx1
%t0 = load i32, i32* %arrayidx0		%t0 = load i32, i32* %arrayidx0
%t3 = load i32, i32* %arrayidx3		%t3 = load i32, i32* %arrayidx3
ret void		ret void
}		}
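
The expected byte counts in `@PR21780`, `@no_pointer_deref` and `@non_consecutive` follow from one rule: the range starts at offset 0 and grows only while accesses are contiguous. The Python sketch below illustrates that rule under stated assumptions. `dereferenceable_bytes` is a hypothetical helper, not the patch's implementation; it takes guaranteed accesses as `(byte_offset, byte_size)` pairs and assumes whole-byte sizes. It is also more general than the tested behavior, since it accepts mixed access sizes, which the diff still lists as a TODO in `@bitcast_different_sizes`.

```python
from typing import Iterable, Tuple


def dereferenceable_bytes(accesses: Iterable[Tuple[int, int]]) -> int:
    """Length of the contiguous accessed range that starts at offset 0.

    Returns 0 when nothing at offset 0 is accessed, meaning no attribute
    can be inferred.
    """
    end = 0
    # Sorting makes the result independent of the order of the accesses.
    for offset, size in sorted(accesses):
        if offset > end:
            break  # gap: later accesses cannot extend the range
        end = max(end, offset + size)
    return end


# @PR21780: four consecutive doubles
print(dereferenceable_bytes([(0, 8), (8, 8), (16, 8), (24, 8)]))
# @non_consecutive: i32 at indexes 1, 0, 3 (index 2 is missing)
print(dereferenceable_bytes([(4, 4), (0, 4), (12, 4)]))
```

A negative offset, as in `@negative_offset`, sorts first and contributes nothing, because the attribute describes a length from the pointer rather than an arbitrary range.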

; Improve on existing dereferenceable attribute.		; Improve on existing dereferenceable attribute.

define void @more_bytes(i32* dereferenceable(8) %ptr) {		define void @more_bytes(i32* dereferenceable(8) %ptr) {
; CHECK-LABEL: @more_bytes(i32* dereferenceable(8) %ptr)		; CHECK-LABEL: @more_bytes(i32* dereferenceable(16) %ptr)
		%arrayidx3 = getelementptr i32, i32* %ptr, i64 3
		%arrayidx1 = getelementptr i32, i32* %ptr, i64 1
		%arrayidx0 = getelementptr i32, i32* %ptr, i64 0
		%arrayidx2 = getelementptr i32, i32* %ptr, i64 2
		%t3 = load i32, i32* %arrayidx3
		%t1 = load i32, i32* %arrayidx1
		%t2 = load i32, i32* %arrayidx2
		%t0 = load i32, i32* %arrayidx0
		ret void
		}

		; Improve on existing dereferenceable_or_null attribute.

		define void @more_bytes_and_not_null(i32* dereferenceable_or_null(8) %ptr) {
		; CHECK-LABEL: @more_bytes_and_not_null(i32* dereferenceable(16) %ptr)
%arrayidx3 = getelementptr i32, i32* %ptr, i64 3		%arrayidx3 = getelementptr i32, i32* %ptr, i64 3
%arrayidx1 = getelementptr i32, i32* %ptr, i64 1		%arrayidx1 = getelementptr i32, i32* %ptr, i64 1
%arrayidx0 = getelementptr i32, i32* %ptr, i64 0		%arrayidx0 = getelementptr i32, i32* %ptr, i64 0
%arrayidx2 = getelementptr i32, i32* %ptr, i64 2		%arrayidx2 = getelementptr i32, i32* %ptr, i64 2
%t3 = load i32, i32* %arrayidx3		%t3 = load i32, i32* %arrayidx3
%t1 = load i32, i32* %arrayidx1		%t1 = load i32, i32* %arrayidx1
%t2 = load i32, i32* %arrayidx2		%t2 = load i32, i32* %arrayidx2
%t0 = load i32, i32* %arrayidx0		%t0 = load i32, i32* %arrayidx0
ret void		ret void
}		}

; But don't pessimize existing dereferenceable attribute.		; Negative test - don't pessimize existing dereferenceable attribute.

define void @better_bytes(i32* dereferenceable(100) %ptr) {		define void @better_bytes(i32* dereferenceable(100) %ptr) {
; CHECK-LABEL: @better_bytes(i32* dereferenceable(100) %ptr)		; CHECK-LABEL: @better_bytes(i32* dereferenceable(100) %ptr)
%arrayidx3 = getelementptr i32, i32* %ptr, i64 3		%arrayidx3 = getelementptr i32, i32* %ptr, i64 3
%arrayidx1 = getelementptr i32, i32* %ptr, i64 1		%arrayidx1 = getelementptr i32, i32* %ptr, i64 1
%arrayidx0 = getelementptr i32, i32* %ptr, i64 0		%arrayidx0 = getelementptr i32, i32* %ptr, i64 0
%arrayidx2 = getelementptr i32, i32* %ptr, i64 2		%arrayidx2 = getelementptr i32, i32* %ptr, i64 2
%t3 = load i32, i32* %arrayidx3		%t3 = load i32, i32* %arrayidx3
%t1 = load i32, i32* %arrayidx1		%t1 = load i32, i32* %arrayidx1
%t2 = load i32, i32* %arrayidx2		%t2 = load i32, i32* %arrayidx2
%t0 = load i32, i32* %arrayidx0		%t0 = load i32, i32* %arrayidx0
ret void		ret void
}		}
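
The three tests above (`@more_bytes`, `@more_bytes_and_not_null`, `@better_bytes`) describe how an inferred byte count combines with an attribute that is already present. The following Python sketch shows one way to express that merge. `merge_dereferenceable` and its tuple encoding are hypothetical and not taken from the patch. The case where an existing `dereferenceable_or_null` count is larger than the inferred one is not covered by the tests, so the sketch conservatively leaves the attribute unchanged there.

```python
from typing import Optional, Tuple

# (attribute name, byte count), or None when the argument has no attribute.
Attr = Optional[Tuple[str, int]]


def merge_dereferenceable(existing: Attr, inferred_bytes: int) -> Attr:
    """Combine an existing attribute with a newly inferred byte count."""
    if inferred_bytes == 0:
        return existing  # nothing was inferred
    if existing is None:
        return ("dereferenceable", inferred_bytes)
    kind, existing_bytes = existing
    if kind == "dereferenceable":
        # Never pessimize: keep whichever count is larger.
        return ("dereferenceable", max(existing_bytes, inferred_bytes))
    # kind == "dereferenceable_or_null"
    if inferred_bytes > existing_bytes:
        # A guaranteed access supplies a larger count and drops "or null".
        return ("dereferenceable", inferred_bytes)
    # Assumption: not exercised by the tests, so keep the original attribute.
    return existing
```

With 16 inferred bytes, `dereferenceable(8)` and `dereferenceable_or_null(8)` both become `dereferenceable(16)`, while `dereferenceable(100)` is left alone.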

		; Peeking through same-size-element bitcast is supported.

define void @bitcast(i32* %arg) {		define void @bitcast(i32* %arg) {
; CHECK-LABEL: @bitcast(i32* %arg)		; CHECK-LABEL: @bitcast(i32* dereferenceable(8) %arg)
%ptr = bitcast i32* %arg to float*		%ptr = bitcast i32* %arg to float*
%arrayidx0 = getelementptr float, float* %ptr, i64 0		%arrayidx0 = getelementptr float, float* %ptr, i64 0
%arrayidx1 = getelementptr float, float* %ptr, i64 1		%arrayidx1 = getelementptr float, float* %ptr, i64 1
%t0 = load float, float* %arrayidx0		%t0 = load float, float* %arrayidx0
%t1 = load float, float* %arrayidx1		%t1 = load float, float* %arrayidx1
ret void		ret void
}		}

		; TODO: Enhance to allow arbitrary sub-ranges.

define void @bitcast_different_sizes(double* %arg1, i8* %arg2) {		define void @bitcast_different_sizes(double* %arg1, i8* %arg2) {
; CHECK-LABEL: @bitcast_different_sizes(double* %arg1, i8* %arg2)		; CHECK-LABEL: @bitcast_different_sizes(double* %arg1, i8* %arg2)
%ptr1 = bitcast double* %arg1 to float*		%ptr1 = bitcast double* %arg1 to float*
%a10 = getelementptr float, float* %ptr1, i64 0		%a10 = getelementptr float, float* %ptr1, i64 0
%a11 = getelementptr float, float* %ptr1, i64 1		%a11 = getelementptr float, float* %ptr1, i64 1
%a12 = getelementptr float, float* %ptr1, i64 2		%a12 = getelementptr float, float* %ptr1, i64 2
%ld10 = load float, float* %a10		%ld10 = load float, float* %a10
%ld11 = load float, float* %a11		%ld11 = load float, float* %a11
%ld12 = load float, float* %a12		%ld12 = load float, float* %a12

%ptr2 = bitcast i8* %arg2 to i64*		%ptr2 = bitcast i8* %arg2 to i64*
%a20 = getelementptr i64, i64* %ptr2, i64 0		%a20 = getelementptr i64, i64* %ptr2, i64 0
%a21 = getelementptr i64, i64* %ptr2, i64 1		%a21 = getelementptr i64, i64* %ptr2, i64 1
%ld20 = load i64, i64* %a20		%ld20 = load i64, i64* %a20
%ld21 = load i64, i64* %a21		%ld21 = load i64, i64* %a21
ret void		ret void
}		}

		; The attribute has a length, not a range, so can't represent this better.

define void @negative_offset(i32* %arg) {		define void @negative_offset(i32* %arg) {
; CHECK-LABEL: @negative_offset(i32* %arg)		; CHECK-LABEL: @negative_offset(i32* dereferenceable(4) %arg)
%ptr = bitcast i32* %arg to float*		%ptr = bitcast i32* %arg to float*
%arrayidx0 = getelementptr float, float* %ptr, i64 0		%arrayidx0 = getelementptr float, float* %ptr, i64 0
%arrayidx1 = getelementptr float, float* %ptr, i64 -1		%arrayidx1 = getelementptr float, float* %ptr, i64 -1
%t0 = load float, float* %arrayidx0		%t0 = load float, float* %arrayidx0
%t1 = load float, float* %arrayidx1		%t1 = load float, float* %arrayidx1
ret void		ret void
}		}

		; Simple store accesses allow inferring too.

define void @stores(i32* %arg) {		define void @stores(i32* %arg) {
; CHECK-LABEL: @stores(i32* %arg)		; CHECK-LABEL: @stores(i32* dereferenceable(8) %arg)
%ptr = bitcast i32* %arg to float*		%ptr = bitcast i32* %arg to float*
%arrayidx0 = getelementptr float, float* %ptr, i64 0		%arrayidx0 = getelementptr float, float* %ptr, i64 0
%arrayidx1 = getelementptr float, float* %ptr, i64 1		%arrayidx1 = getelementptr float, float* %ptr, i64 1
store float 1.0, float* %arrayidx0		store float 1.0, float* %arrayidx0
store float 2.0, float* %arrayidx1		store float 2.0, float* %arrayidx1
ret void		ret void
}		}

		; Combinations of load/store can be used together.

define void @load_store(i32* %arg) {		define void @load_store(i32* %arg) {
; CHECK-LABEL: @load_store(i32* %arg)		; CHECK-LABEL: @load_store(i32* dereferenceable(8) %arg)
%ptr = bitcast i32* %arg to float*		%ptr = bitcast i32* %arg to float*
%arrayidx0 = getelementptr float, float* %ptr, i64 0		%arrayidx0 = getelementptr float, float* %ptr, i64 0
%arrayidx1 = getelementptr float, float* %ptr, i64 1		%arrayidx1 = getelementptr float, float* %ptr, i64 1
%t1 = load float, float* %arrayidx0		%t1 = load float, float* %arrayidx0
store float 2.0, float* %arrayidx1		store float 2.0, float* %arrayidx1
ret void		ret void
}		}