This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/Transforms/IPO/Attributor.h
llvm/lib/Transforms/IPO/Attributor.cpp
llvm/test/Transforms/Attributor/undefined_behavior.ll

Differential D71435

[Attributor] Function level undefined behavior attribute
Closed, Public

Authored by baziotis on Dec 12 2019, 2:08 PM.


Details

Reviewers

jdoerfert
sstefan1

Commits

rG58f324a468ff: [Attributor] Function level undefined behavior attribute

Summary

_Eventually_, this attribute will be assigned to a function if it contains undefined behavior. As a first small step, I tried to make it loop through the load instructions in a function. (Eventually, the plan is to check whether a load instruction causes undefined behavior, e.g., because it dereferences a null pointer. Also eventually, this won't happen in initialize() but in updateImpl().)

Note: This is my first LLVM and Attributor review, hence this patch doesn't really do anything yet; I'm just trying to get the initial details down.
Any help / advice / proposal is highly appreciated.

Edit: Correction: The attribute won't be assigned to functions (yet). Its purpose is to be integrated in AAIsDead.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

baziotis created this revision. Dec 12 2019, 2:08 PM

Herald added a project: Restricted Project. Dec 12 2019, 2:08 PM

Herald added subscribers: llvm-commits, hiraditya.

I'm a little dubious about the use case of such an attribute; perhaps you can add it to the patch's description.

Thanks for working on this. I left some comments and look forward to the updateImpl code :)

llvm/include/llvm/Transforms/IPO/Attributor.h
1710	In comments, please write "undefined behavior" as two words. Remove the "AA" in the method names: "isKnownUndefinedBehavior" or "isKnownToCauseUB". Return the state value, not `true`/`false`. See the other `isKnownXXXXX` functions around here.
llvm/lib/Transforms/IPO/Attributor.cpp
2001	I'm unsure what the IPos does here. Probably copy&paste. Note that `I` is known to be a `Load` here, so `ImmutableCallSite(&I)` will always result in an "unusable" call site, i.e., one that evaluates to `false` if converted to bool.
2025	We need to track statistics differently. Maybe we want to track two lists, one for assumed-live UB loads and one for assumed-live non-UB loads. We can then track the number of UB loads using something like this (sorry for the template mess): STATS_DECL(UndefinedBehaviorInstruction, Instruction, "Number of instructions known to have UB"); BUILD_STAT_NAME(UndefinedBehaviorInstruction, Instruction) += AssumedUBInstructions.size();
2027	My proposal going forward: Keep a list of loads that are not assumed to cause UB. Initially that list is empty. Every time updateImpl is called, iterate over all assumed-live loads (as you do above) and: skip the ones already in the list of loads not assumed to cause UB; for the others, check whether we can continue to assume that executing them causes UB, initially by asking if the pointer is `null` or `undef`. If this fails, add them to the list. If the list changed, the internal state changed and we have to tell the Attributor.
2036	Let's not handle call sites for now and just give up on them.
5618	The wording is a bit off. Maybe say something like: `Every function might contain instructions that cause "undefined-behavior"`
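The updateImpl proposal above (comment at 2027) can be sketched outside LLVM as follows. This is a toy model under stated assumptions: `ToyLoad`, `ToyUBAnalysis`, `ToyChangeStatus` and `runToFixpoint` are hypothetical names, not Attributor APIs, and whether a pointer is null or undef is a fixed flag here rather than something queried from other abstract attributes. It shows why the procedure terminates: the set of loads no longer assumed to cause UB only grows and is bounded by the number of loads.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a load instruction: we only model whether its
// pointer operand is currently known to be a constant null (or undef).
struct ToyLoad {
  bool PtrIsNullOrUndef;
};

// Loosely mirrors the idea of the Attributor's ChangeStatus, not its API.
enum class ToyChangeStatus { Unchanged, Changed };

struct ToyUBAnalysis {
  // Loads that are no longer assumed to cause UB. Starts empty, only grows.
  std::unordered_set<const ToyLoad *> NoUBLoads;

  // One update step over all assumed-live loads.
  ToyChangeStatus update(const std::vector<ToyLoad> &AssumedLiveLoads) {
    std::size_t PrevSize = NoUBLoads.size();
    for (const ToyLoad &L : AssumedLiveLoads) {
      // Skip loads we already gave up on.
      if (NoUBLoads.count(&L))
        continue;
      // Can we keep assuming that executing this load is UB? If not, record it.
      if (!L.PtrIsNullOrUndef)
        NoUBLoads.insert(&L);
    }
    assert(NoUBLoads.size() >= PrevSize && "the set must only grow");
    // A changed set means the internal state changed; report it.
    return NoUBLoads.size() == PrevSize ? ToyChangeStatus::Unchanged
                                        : ToyChangeStatus::Changed;
  }

  bool isAssumedToCauseUB(const ToyLoad &L) const {
    return NoUBLoads.count(&L) == 0;
  }
};

// Re-run update() until nothing changes; returns the number of rounds.
int runToFixpoint(ToyUBAnalysis &UB, const std::vector<ToyLoad> &Loads) {
  int Rounds = 0;
  ToyChangeStatus S;
  do {
    S = UB.update(Loads);
    ++Rounds;
  } while (S == ToyChangeStatus::Changed);
  return Rounds;
}
```

With loads `{null, non-null, null}` the first round moves the non-null load into `NoUBLoads` and reports a change; the second round changes nothing, so iteration stops after two rounds with the two null loads still assumed to cause UB.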

In D71435#1782577, @lebedev.ri wrote:

I'm a little dubious about the use case of such an attribute; perhaps you can add it to the patch's description.

Let me answer this one as I proposed this project.

The Attributor uses liveness extensively to improve results. I mean, basically all results are "conditional" in the sense SCCP is "conditional".
Liveness in SCCP, and so far in the Attributor, is mostly derived by not following assumed dead branches. Having the ability to reason about UB will allow us to augment this.

Take the following constructed example (https://godbolt.org/z/ZcGWiC)

extern int Condition;

int extern_fn(void);

static void internal_fn(int *a) {
  if (Condition) {
    *a = 1;
    extern_fn();
  }
}

void entry() {
  internal_fn(0);
}

While the store through the null pointer is eventually removed and replaced by a trap, the conditional is kept. Both the remaining conditional and the trap are actually harmful for optimization purposes.
I know this is a constructed example, but we see a lot of known UB (not necessarily load-related) after inlining and constant propagation, which would be very valuable for pruning infeasible paths.
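The path-pruning idea can be illustrated with a toy model. This is only a sketch of the reasoning, not how AAIsDead or this patch implements it: `ToySuccessor` and `liveSuccessors` are hypothetical, and each successor is reduced to a single flag saying whether every execution entering it must reach an instruction assumed to cause UB. In the example above, after `internal_fn(0)` is specialized, the `then` block stores through null before calling `extern_fn`, so a well-defined execution never takes that edge and the conditional could fold away.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of a branch successor: does every execution entering
// it reach an instruction assumed to cause UB (e.g. a store through a
// constant null pointer that is executed before any call that might not
// return)?
struct ToySuccessor {
  bool MustExecuteUB;
};

// Returns the indices of successors a well-defined execution may still take.
// Successors that must execute UB are treated as dead, so a conditional
// branch with exactly one such successor behaves like an unconditional
// branch to the other one.
std::vector<std::size_t>
liveSuccessors(const std::vector<ToySuccessor> &Succs) {
  std::vector<std::size_t> Live;
  for (std::size_t I = 0; I < Succs.size(); ++I)
    if (!Succs[I].MustExecuteUB)
      Live.push_back(I);
  return Live;
}
```

The "must reach" condition matters: if the UB instruction came after `extern_fn()`, and that call might never return, the block could be entered without UB, and it would have to stay live.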

I'm unsure what the IPos does here. Probably copy&paste. Note that I is known to be a Load here, so ImmutableCallSite(&I) will always result in an "unusable" call site, i.e., one that evaluates to false if converted to bool.

Oh yes of course, it was totally because of copy paste.

We need to track statistics differently. Maybe we want to track two lists, one for assumed-live UB loads and one for assumed-live non-UB loads. We can then track the number of UB loads using something like this (sorry for the template mess)

I have no idea how statistics are used or tracked, but unfortunately this code doesn't work. I assume that AssumedUBInstructions is the list you're referring to in the proposed method for updateImpl(), so I'll try to update this when that is ready.

Made AAUB function-only (i.e., no call site variant, like AAReachability).
Small fixes.

I'll come back with tests and updateImpl() (probably in reverse order, as otherwise tests are useless).

baziotis edited the summary of this revision. Dec 15 2019, 6:31 AM

This is a high-level overview of the algorithm. noUBLoads grows monotonically and is bounded above by the number of loads, so this procedure terminates (note, though, that for the time being we can't derive a pessimistic fixpoint).
A note on the data structure: I assume LLVM has its own version of unordered_set, but I don't know its internals, and unordered_set seemed the most applicable choice here.
Feel free to propose LLVM alternatives.

Also, now that we have the noUBLoads, I'll add the stats.

You should either diff against the proper trunk revision you work on or merge all your commits into a single one. Right now you always only upload a diff against the last version of the patch.

llvm/lib/Transforms/IPO/Attributor.cpp
1990	I would have taken a `SmallPtrSet<Instruction *, 8>`. Please start variable names with an upper case letter.
2001	FWIW: Branching on an uninitialized local variable is UB.
2008	Nit: Cant -> Cannot No braces if there is only a single stmt.

jdoerfert added inline comments. Dec 15 2019, 11:36 AM

llvm/lib/Transforms/IPO/Attributor.cpp
2001	For now just check if the pointer is a constant null and if null is not dereferenceable

You should either diff against the proper trunk revision you work on or merge all your commits into a single one. Right now you always only upload a diff against the last version of the patch.

My bad, I thought this is what I _had to_ do.

llvm/lib/Transforms/IPO/Attributor.cpp
2000	I couldn't find a way to save the entry so that we don't have to search a second time (i.e., like with the unordered_map).

riccibruno added a subscriber: riccibruno. Dec 15 2019, 2:58 PM

jdoerfert added inline comments. Dec 15 2019, 6:29 PM

llvm/lib/Transforms/IPO/Attributor.cpp
2000	Search the second what? `if (MySmallPtrSet.count(&I))` and `MySmallPtrSet.insert(&I)` should work just fine. Nit: I would not do the `Iref` and `I = &Iref` thing but just use the reference (called `I`) and the `&` as needed.

baziotis marked 2 inline comments as done. Dec 15 2019, 6:42 PM

baziotis added inline comments.

llvm/lib/Transforms/IPO/Attributor.cpp
2001	> FWIW: Branching on an uninitialized local variable is UB.
Yes, thanks; it was intended to signify that "this is not complete yet and has to be filled in".
2001	> For now just check if the pointer is a constant null and if null is not dereferenceable
I understood the first part, but I lost you at the second. Did you maybe mean "If it is not constant null, check if it is not dereferenceable (i.e. getAAFor<Dereferenceable>)"?

baziotis marked an inline comment as done. Dec 15 2019, 6:51 PM

baziotis added inline comments.

llvm/lib/Transforms/IPO/Attributor.cpp
2000	> Search the second what? if (MySmallPtrSet.count(&I)) and MySmallPtrSet.insert(&I) should work just fine.
Yes, but `count` pretty much does what `find` already does, right? What I meant: imagine our data structure is implemented as a linear-probing hash table. Then you would do one search with `find` (or `count`) and another with `insert`, but it would be the _same_ search. With an `unordered_map` we can avoid that by caching the result of the lookup, like this: `Instruction *&entry = map[I]; if (!entry) { /* Does not exist, so insert it. */ entry = ...; }`. But I don't know if this is possible with `std::set`, let alone `SmallPtrSet`. It's not that important, I guess.
> Nit: I would not do the Iref and I = &Iref thing but just use the reference (called I) and the & as needed.
Noted :)
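The double lookup can usually be avoided because `insert` itself reports whether the element was new: `std::unordered_set::insert` returns a `std::pair<iterator, bool>`, and to our knowledge `llvm::SmallPtrSet::insert` returns a pair of the same shape. A minimal sketch using only the standard library (`countFirstVisits` is a hypothetical helper) that replaces the count-then-insert pattern with a single call:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Counts how many distinct pointers appear in Ptrs. Instead of calling
// count() and then insert() (two lookups), it uses the bool returned by
// insert(), which is true only when the element was not present before.
int countFirstVisits(const std::vector<const int *> &Ptrs) {
  std::unordered_set<const int *> Seen;
  int FirstVisits = 0;
  for (const int *P : Ptrs) {
    if (!Seen.insert(P).second)
      continue; // Already processed: skip it.
    ++FirstVisits;
  }
  return FirstVisits;
}
```

For maps, `operator[]` (as in the snippet above) or `try_emplace` gives the same single-lookup behavior while also yielding a reference to the mapped value.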

jdoerfert added inline comments. Dec 15 2019, 9:26 PM

llvm/lib/Transforms/IPO/Attributor.cpp
2001	> Did you maybe mean "If it is not constant null, check if it is not dereferenceable (i.e. getAAFor<Dereferenceable>)"?
No, AADereferenceable will not tell you if it is "not dereferenceable", only if it is. What I tried to say, in more words: Do not assume that NULL cannot be dereferenced, and thus that a load from NULL is UB for sure. NULL can be a valid pointer; it just happens not to be one on your OS. To determine whether NULL can be dereferenced, use the `NullPointerIsDefined` helper function; you can find some uses of it in the Attributor.cpp file already.
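The resulting rule ("a load is UB only if its pointer is a constant null and null is not defined here") can be modeled in isolation. The sketch below is a simplified model of our reading of `llvm::NullPointerIsDefined(F, AS)` at the time of this review: null counts as a valid address if the function carries the string attribute `"null-pointer-is-valid"="true"`, or if the pointer lives in a non-zero address space. `ToyFunction`, `nullPointerIsDefinedModel` and `loadIsAssumedUB` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for llvm::Function: only string attributes modeled.
struct ToyFunction {
  std::map<std::string, std::string> Attrs;
};

// Simplified model: null is a valid address if the function opts in with
// "null-pointer-is-valid"="true" (the value must be exactly "true"), or if
// the pointer is in a non-zero address space.
bool nullPointerIsDefinedModel(const ToyFunction *F, unsigned AddrSpace) {
  if (F) {
    auto It = F->Attrs.find("null-pointer-is-valid");
    if (It != F->Attrs.end() && It->second == "true")
      return true;
  }
  return AddrSpace != 0;
}

// A load is assumed to cause UB only if its pointer operand is a constant
// null AND null is not a valid address for this function and address space.
bool loadIsAssumedUB(const ToyFunction &F, bool PtrIsConstantNull,
                     unsigned AddrSpace) {
  if (!PtrIsConstantNull)
    return false; // Not handled (yet): treated as not UB.
  return !nullPointerIsDefinedModel(&F, AddrSpace);
}
```

The requirement that the attribute value be exactly `"true"` matches what the author later ran into when writing the `null-pointer-is-valid` test.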

No, AADereferenceable will not tell you if it is "not dereferenceable", only if it is.

Yes, my bad, since the number of dereferenceable bytes only ever increases.

What I tried to say in more words:

Do not assume that NULL cannot be dereferenced, and thus that a load from NULL is UB for sure. NULL can be a valid pointer; it just happens not to be one on your OS. To determine whether NULL can be dereferenced, use the NullPointerIsDefined helper function; you can find some uses of it in the Attributor.cpp file already.

Oh yes, ok, got it.

One case in which a load causes UB: its pointer operand is constant null and null is not defined for the target.
Stats using NoUBLoads.size()

Removed braces for single statement ifs

Almost there. Once a manifest method (see below) is there to act on the information we should be able to test it. So in addition to the inlined comments we need test cases now.

llvm/include/llvm/Transforms/IPO/Attributor.h
1699	Since we start with a function version only, we probably need to add an `llvm::Instruction` operand here. While it makes sense if every execution of a function causes UB, the initial implementation will answer the questions for a single (load) instruction.
llvm/lib/Transforms/IPO/Attributor.cpp
1995	Please document what is collected in here. Maybe make it not accessible for other classes, e.g., move it to the end into a `private:` part of the struct declaration.
2023	You can assume `I` to be in a function, thus `I.getFunction()` is not null. If you write the code with early exits, you can save multiple levels of nesting and make it generally easier to read.
2040	Let's also add a manifest method to replace the loads that are still considered UB with an undef (for now). That should allow us to test this.
2051	This was my bad, as I proposed this kind of statistic tracking, but this counts the opposite of what the text says. You can keep a different set with the ones that are still considered UB which you update in the updateImpl as well. That set can also be used in the manifest method.

baziotis marked an inline comment as done. Dec 16 2019, 2:29 PM

baziotis added inline comments.

llvm/lib/Transforms/IPO/Attributor.cpp
2040	The whole instruction? Like replace `%a = load i32, i32* null` with `undef`. I don't think this can happen. Maybe I misunderstood and you meant replace it with this: `%a = load i32, i32* undef` (i.e. replace the pointer operand value).

baziotis marked an inline comment as done. Dec 16 2019, 3:00 PM

baziotis added inline comments.

llvm/include/llvm/Transforms/IPO/Attributor.h
1699	You mean to add a (method) overload for `isKnownToCauseUB`, maybe pure `virtual` that is implemented in `AAUndefinedBehaviorImpl`? It can check if it is a `Load` and if so, check if it is in `UBLoads`.

jdoerfert added inline comments. Dec 16 2019, 3:43 PM

llvm/include/llvm/Transforms/IPO/Attributor.h
1699	Exactly. For now those are probably the only two methods we need. I mean, we do not actually use the boolean state that is currently returned by `isAssumedToCauseUB`.
llvm/lib/Transforms/IPO/Attributor.cpp
2040	Your first interpretation was what I meant. Replace the load (instruction) with undef. We actually want to be more aggressive but we will need that step either way.

baziotis marked an inline comment as done. Dec 16 2019, 4:49 PM

baziotis added inline comments.

llvm/lib/Transforms/IPO/Attributor.cpp
2040	Sorry but I couldn't find a way to do that. I've seen in the language reference that one can do this: `%some_name = undef` and so I guess this is what we want, but I couldn't produce it. The closest thing I found is `deleteAfterManifest` which replaces _its uses_ with `undef` but will also delete the instruction. I guess we actually need the instruction (or, its replaced counterpart) so that the info of undefined behavior stays.

jdoerfert added inline comments. Dec 16 2019, 4:56 PM

llvm/lib/Transforms/IPO/Attributor.cpp
2040	First, no worries, ask if you need to find an API. The code below was obviously not tested but should do what I described earlier: `I.replaceAllUsesWith(UndefValue::get(I.getType()))` You can also use `deleteAfterManifest`, actually you probably should use that instead. Or we directly go one step further and use `llvm::changeToUnreachable` (as AAIsDead does). P.S. I don't think we can do `%some_name = undef` though, the lang ref might need an update there.

baziotis marked an inline comment as done. Dec 16 2019, 4:57 PM

baziotis added inline comments.

llvm/lib/Transforms/IPO/Attributor.cpp
2040	Well, I initially assumed that `%c = undef` can't appear anywhere, but I changed my mind because of the LangRef: https://llvm.org/docs/LangRef.html#undefined-values. However, it seems that one can't actually write this (maybe only LLVM itself can produce it when passing IR from one pass to another? I have no idea; it's obscure).

baziotis marked an inline comment as done. Dec 16 2019, 5:15 PM

baziotis added inline comments.

llvm/lib/Transforms/IPO/Attributor.cpp
2040	> First, no worries, ask if you need to find an API. I don't think we can do %some_name = undef though, the lang ref might need an update there.
:) I was searching for half an hour. Indeed, I had tried both `replaceAllUsesWith` alone (which replaced all uses but left the instruction as is) and `deleteAfterManifest` (which did the same and also deleted the instruction; that makes sense, since `deleteAfterManifest` pretty much does `replaceAllUsesWith` + `eraseFromParent`), but found no way to do that. Can we somehow submit a patch for the LangRef? I just tried `changeToUnreachable`, which eliminates the whole block :)
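Why `changeToUnreachable` appears to "eliminate the whole block": as we understand its documented behavior, it inserts an `unreachable` before the given instruction, making that instruction and the rest of the block dead. The toy model below only mimics that effect on a list of instruction strings; `ToyBasicBlock` and `changeToUnreachableModel` are hypothetical, and the real helper also does CFG bookkeeping (for example, updating PHI nodes in former successors) that this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: a list of instruction texts, terminator last.
using ToyBasicBlock = std::vector<std::string>;

// Models the effect of changing instruction Idx to unreachable: the
// instruction and everything after it are dropped, and the block now ends
// in "unreachable".
void changeToUnreachableModel(ToyBasicBlock &BB, std::size_t Idx) {
  assert(Idx < BB.size() && "instruction must be in the block");
  BB.resize(Idx); // Drop the UB instruction and all later instructions.
  BB.push_back("unreachable");
}
```

If the UB load is the first instruction of its block, only `unreachable` survives, which is what makes it look as if the whole block disappeared.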

Manifest method that changes live UB loads to unreachable.
Sets for both UB loads and non-UB loads.
Stats that use UB loads.
Method that checks if a specific (load) instruction is considered UB.

Note: Clang format did its job a little bit too well and replaced 2 parts that are not related to this patch. Should I remove them?

Now all that is left is a test file with positive and negative tests so we can see it works and not accidentally break it in the future.

In D71435#1787126, @baziotis wrote:

Manifest method that changes live UB loads to unreachable.

Sets for both UB loads and non-UB loads.

Stats that use UB loads.

Method that checks if a specific (load) instruction is considered UB.

The method needs to be isAssumedToCauseUB not isKnownToCauseUB because we don't know until a fixpoint is reached.
Also remove the ones that do not take an instruction for now. They are confusing and not needed until they actually return the proper value, I mean true if the function *always* exhibits UB.

Note: Clang format did its job a little bit too well and replaced 2 parts that are not related to this patch. Should I remove them?

Two options: 1) Remove them from the diff. 2) Commit them first.

Since I will probably have to actually push the diff (as you do not have push access) I will do a clang format of the file first.
It also helps to only use the clang format diff script during development. Though, the Attributor files should always be formatted which is why option 2) above is a proper way to resolve this.

llvm/lib/Transforms/IPO/Attributor.cpp
1998	Nit: This is the same as not overriding the method in the first place.
2032	Nit: Add a TODO above explaining that we should not only look at loads and also expand it to more than constant null values eventually. Nit: Empty lines, e.g., before comments, help (me) to read code.
2041	Nit: `UBLoads.count(I)` is shorter, also in some other places.
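The isAssumed/isKnown distinction comes from the Attributor's optimistic fixpoint iteration: the "assumed" value starts at the optimistic end and may only be lowered, while the "known" value starts at the pessimistic end and may only be raised, and the two meet at a fixpoint. A rough model, assuming a two-bit state loosely patterned on `BooleanState` (the `ToyBooleanState` type and its members are hypothetical, not LLVM's definitions):

```cpp
#include <cassert>

// Hypothetical simplified boolean abstract state.
struct ToyBooleanState {
  bool Known = false;  // Pessimistic start: nothing proven yet.
  bool Assumed = true; // Optimistic start: assume the property holds.

  bool getKnown() const { return Known; }
  bool getAssumed() const { return Assumed; }

  // The optimistic assumption could not be maintained: fall back to what
  // is known.
  void indicatePessimisticFixpoint() { Assumed = Known; }

  // The assumption survived fixpoint iteration: it becomes known.
  void indicateOptimisticFixpoint() { Known = Assumed; }
};
```

Before either fixpoint is reached, only `getAssumed()` carries the current belief, which is why a query made during iteration is phrased as `isAssumedToCauseUB` rather than `isKnownToCauseUB`.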

In D71435#1787162, @jdoerfert wrote:

Now all that is left is a test file with positive and negative tests so we can see it works and not accidentally break it in the future.

Yes, finishing minor details and they come next.

The method needs to be isAssumedToCauseUB not isKnownToCauseUB because we don't know until a fixpoint is reached.

Indeed, it was not intended.

Also remove the ones that do not take an instruction for now. They are confusing and not needed until they actually return the proper value, I mean true if the function *always* exhibits UB.

Noted.

Two options: 1) Remove them from the diff. 2) Commit them first.

Since I will probably have to actually push the diff (as you do not have push access) I will do a clang format of the file first.
It also helps to only use the clang format diff script during development.

I didn't know about the diff option, thanks.

Though, the Attributor files should always be formatted which is why option 2) above is a proper way to resolve this.

I'm lost :) Should I commit them and upload diffs on top of that (local) commit, or should I not commit them, run clang-format on the diff, and let you do the formatting when you push it?

I'm lost :) Should I commit them and upload diffs on top of that (local) commit, or should I not commit them, run clang-format on the diff, and let you do the formatting when you push it?

I just formatted the Attributor file. Once this change is rebased these differences will disappear.

I just formatted the Attributor file. Once this change is rebased these differences will disappear.

Thank you.

Fixed minor details here and there. I'll come back tomorrow with tests.

uenoku added a subscriber: uenoku. Dec 17 2019, 10:45 AM

Added basic tests. There's a FIXME because, with the "null-pointer-is-valid" attribute on the function, NullPointerIsDefined should return true, so the instruction should not be put in UBLoads and the code should not be made unreachable.

Apparently, I hadn't looked at the implementation of NullPointerIsDefined that I referenced. The attribute has to be set to "true", and then it works fine.

Thanks! LGTM. I can commit this for you if you want. Maybe update the commit message first.

We can do follow up patches for other instructions/situations now. Some ideas:

Look at stores and other memory accesses.
Look at branches and other control flow instructions.
Look at attribute violations, e.g., null is passed but the nonnull attribute is present.
Look not only for null but also for undef, maybe even known dangling pointers (via another attribute).
Use AAValueSimplify to get an assumed simplified value that we check (that would force us to look at assumed but not known UB instructions every updateImpl call!)
Implement the function UB analysis that checks if an instruction that is in the "must-be-executed-context" of the function entry is assumed to have UB.

Let me know if you plan to tackle any/all of them.

This revision is now accepted and ready to land. Dec 18 2019, 1:43 PM

In D71435#1790358, @jdoerfert wrote:

Thanks! LGTM. I can commit this for you if you want. Maybe update the commit message first.

Thank you! Of course, you may commit it and update the commit message as you wish.

We can do follow up patches for other instructions/situations now. Some ideas:

Look at stores and other memory accesses.

Look at branches and other control flow instructions.

Look at attribute violations, e.g., null is passed but the nonnull attribute is present.

Look not only for null but also for undef, maybe even known dangling pointers (via another attribute).

Use AAValueSimplify to get an assumed simplified value that we check (that would force us to look at assumed but not known UB instructions every updateImpl call!)

Implement the function UB analysis that checks if an instruction that is in the "must-be-executed-context" of the function entry is assumed to have UB.

Let me know if you plan to tackle any/all of them.

I'd totally like to tackle all of them; it's pretty interesting already! But I'd rather not take on an (implicit) assignment for any of them, as my time is constrained right now.
I'll definitely start with the ones that seem more approachable, like stores and other memory accesses, branches, and attribute violations.

Closed by commit rG58f324a468ff: [Attributor] Function level undefined behavior attribute (authored by jdoerfert). Dec 24 2019, 5:24 PM

This revision was automatically updated to reflect the committed changes.

Herald added a reviewer: sstefan1. Dec 24 2019, 5:24 PM

baziotis retitled this revision from [WIP] [Attributor] Function level undefined behavior attribute to [Attributor] Function level undefined behavior attribute. Jan 7 2020, 3:00 AM

Revision Contents

Path	Size
llvm/include/llvm/Transforms/IPO/Attributor.h	26 lines
llvm/lib/Transforms/IPO/Attributor.cpp	97 lines
llvm/test/Transforms/Attributor/undefined_behavior.ll	38 lines

Diff 235242

llvm/include/llvm/Transforms/IPO/Attributor.h

(1,680 unchanged lines not shown; enclosing scope: struct AAWillReturn)
  /// Create an abstract attribute view for the position \p IRP.
  static AAWillReturn &createForPosition(const IRPosition &IRP, Attributor &A);

  /// Unique ID (due to the unique address)
  static const char ID;
};

/// An abstract attribute for undefined behavior.
struct AAUndefinedBehavior
    : public StateWrapper<BooleanState, AbstractAttribute>,
      public IRPosition {
  AAUndefinedBehavior(const IRPosition &IRP) : IRPosition(IRP) {}

  /// Return true if "undefined behavior" is assumed.
  bool isAssumedToCauseUB() const { return getAssumed(); }

  /// Return true if "undefined behavior" is assumed for a specific instruction.
  virtual bool isAssumedToCauseUB(Instruction *I) const = 0;

  /// Return true if "undefined behavior" is known.
  bool isKnownToCauseUB() const { return getKnown(); }

  /// Return an IR position, see struct IRPosition.
  const IRPosition &getIRPosition() const override { return *this; }

  /// Create an abstract attribute view for the position \p IRP.
  static AAUndefinedBehavior &createForPosition(const IRPosition &IRP,
                                                Attributor &A);

  /// Unique ID (due to the unique address)
  static const char ID;
};

/// An abstract interface to determine reachability of point A to B.
struct AAReachability : public StateWrapper<BooleanState, AbstractAttribute>,
                        public IRPosition {
  AAReachability(const IRPosition &IRP) : IRPosition(IRP) {}

  /// Returns true if 'From' instruction is assumed to reach, 'To' instruction.
  /// Users should provide two positions they are interested in, and the class
  /// determines (and caches) reachability.
(456 unchanged lines not shown)

llvm/lib/Transforms/IPO/Attributor.cpp

(1,981 unchanged lines not shown)
    return clampStateAndIndicateChange(
        getState(),
        static_cast<const AANoRecurse::StateType &>(FnAA.getState()));
  }

  /// See AbstractAttribute::trackStatistics()
  void trackStatistics() const override { STATS_DECLTRACK_CS_ATTR(norecurse); }
};

/// -------------------- Undefined-Behavior Attributes ------------------------

struct AAUndefinedBehaviorImpl : public AAUndefinedBehavior {
  AAUndefinedBehaviorImpl(const IRPosition &IRP) : AAUndefinedBehavior(IRP) {}

  /// See AbstractAttribute::updateImpl(...).
  ChangeStatus updateImpl(Attributor &A) override {
    size_t PrevSize = NoUBLoads.size();

    // TODO: We should not only check for load instructions.
    auto InspectLoadForUB = [&](Instruction &I) {
      // Skip instructions that are already saved.
      if (NoUBLoads.count(&I) || UBLoads.count(&I))
        return true;

      Value *PtrOp = cast<LoadInst>(&I)->getPointerOperand();

      // A load is considered UB only if it dereferences a constant
      // null pointer.
		jdoerfertUnsubmitted Not Done Reply Inline Actions Nit: Cant -> Cannot No braces if there is only a single stmt. jdoerfert: Nit: Cant -> Cannot No braces if there is only a single stmt.
		if (!isa<ConstantPointerNull>(PtrOp)) {
		NoUBLoads.insert(&I);
		return true;
		}
		Type *PtrTy = PtrOp->getType();

		// Because we only consider loads inside functions,
		// assume that a parent function exists.
		const Function *F = I.getFunction();

		// A dereference on constant null is only considered UB
		// if null dereference is _not_ defined for the target platform.
		// TODO: Expand it to not only check constant values.
		if (!llvm::NullPointerIsDefined(F, PtrTy->getPointerAddressSpace()))
		UBLoads.insert(&I);
		jdoerfertUnsubmitted Not Done Reply Inline Actions You can assume `I` to be in a function, thus `I.getFunction()` is not null. If you write the code with early exists you can save multiple levels of nesting and make it generally easier to read. jdoerfert: You can assume `I` to be in a function, thus `I.getFunction()` is not null. If you write the…
		else
		NoUBLoads.insert(&I);
		jdoerfertUnsubmitted Not Done Reply Inline Actions We need to track statistics differently. Maybe we want to track two lists, one for assumed live and UB loads one for assumed live and non-UB loads. We can track the number of UB loads then by using something like this (sorry for the template mess) STATS_DECL(UndefinedBehaviorInstruction, Instruction, "Number of instructions known to have UB"); BUILD_STAT_NAME(UndefinedBehaviorInstruction, Instruction) += AssumedUBInstructions.size(); jdoerfert: We need to track statistics differently. Maybe we want to track two lists, one for assumed live…
		return true;
		};
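To make the decision in `InspectLoadForUB` concrete outside of LLVM, here is a small self-contained model of the check discussed in the thread above. It is a sketch, not Attributor code: `FunctionModel`, `LoadModel`, `nullPointerIsDefined` and `loadIsUB` are hypothetical names, and the model assumes that `llvm::NullPointerIsDefined(F, AS)` returns true when the function carries `"null-pointer-is-valid"="true"` or when `AS` is not the default address space 0.

```cpp
// Hypothetical, self-contained model of the "load from constant null is UB"
// decision. None of these types or functions exist in LLVM.

// Stand-in for llvm::Function: only the bit the check needs.
struct FunctionModel {
  // Models the function attribute "null-pointer-is-valid"="true".
  bool NullPointerIsValid = false;
};

// Stand-in for the pointer operand of a load.
struct LoadModel {
  bool PtrIsConstantNull; // pointer operand is a ConstantPointerNull
  unsigned AddrSpace;     // address space of the pointer type
};

// Models the assumed behavior of llvm::NullPointerIsDefined(F, AS):
// null is a valid address if the function opts in, or if the pointer
// is in a non-default address space.
bool nullPointerIsDefined(const FunctionModel &F, unsigned AddrSpace) {
  if (F.NullPointerIsValid)
    return true;
  return AddrSpace != 0;
}

// A load is assumed to cause UB only if it dereferences a constant null
// pointer and null is not a valid address for that function/address space.
bool loadIsUB(const FunctionModel &F, const LoadModel &L) {
  if (!L.PtrIsConstantNull)
    return false; // Mirrors the patch's TODO: non-constant pointers are not handled.
  return !nullPointerIsDefined(F, L.AddrSpace);
}
```

Under these assumptions, a constant-null load in a plain function is classified as UB, while the same load in a function marked `"null-pointer-is-valid"="true"` (as in the `@null_pointer_is_defined` test) or in a non-zero address space is not.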
jdoerfert: My proposal going forward: keep a list of loads that are not assumed to cause UB. Initially that list is empty. Every time updateImpl is called, iterate over all assumed-live loads (as you do above) and:
- skip the ones already in the list of loads not assumed to cause UB;
- check whether we can continue to assume that executing them would cause UB, initially by asking if the pointer is `null` or `undef`. If this fails, add them to the list.
If the list changed, the internal state changed and we have to tell the Attributor.

    A.checkForAllInstructions(InspectLoadForUB, *this, {Instruction::Load});
    if (PrevSize != NoUBLoads.size())
      return ChangeStatus::CHANGED;
    return ChangeStatus::UNCHANGED;
jdoerfert: Nit: Add a TODO above explaining that we should eventually look at more than loads and also go beyond constant null values. Nit: Empty lines, e.g., before comments, help (me) to read code.
  }

  bool isAssumedToCauseUB(Instruction *I) const override {
    return UBLoads.count(I);
jdoerfert: Let's not handle call sites for now and just give up on them.
  }

  ChangeStatus manifest(Attributor &A) override {
    if (!UBLoads.size())
jdoerfert: Let's also add a manifest method to replace the loads that are still considered UB with an undef (for now). That should allow us to test this.
baziotis: The whole instruction? Like replacing `%a = load i32, i32* null` with `undef`? I don't think this can happen. Maybe I misunderstood and you meant replacing it with `%a = load i32, i32* undef` (i.e. replacing the pointer operand value).
jdoerfert: Your first interpretation was what I meant: replace the load (instruction) with undef. We actually want to be more aggressive, but we will need that step either way.
baziotis: Sorry, but I couldn't find a way to do that. I've seen in the language reference that one can write `%some_name = undef`, so I guess this is what we want, but I couldn't produce it. The closest thing I found is `deleteAfterManifest`, which replaces _its uses_ with `undef` but also deletes the instruction. I guess we actually need the instruction (or its replaced counterpart) so that the information about the undefined behavior is kept.
jdoerfert: First, no worries, ask if you need to find an API. The code below was obviously not tested but should do what I described earlier: `I.replaceAllUsesWith(UndefValue::get(I.getType()))`. You can also use `deleteAfterManifest`; actually, you probably should use that instead. Or we go one step further directly and use `llvm::changeToUnreachable` (as AAIsDead does). P.S. I don't think we can do `%some_name = undef` though; the LangRef might need an update there.
baziotis: > First, no worries, ask if you need to find an API.
> I don't think we can do %some_name = undef though, the lang ref might need an update there.
:) I was searching for half an hour. I had tried both `replaceAllUses` alone (which replaced all uses but left the instruction as is) and `deleteAfterManifest` (which did the same plus deleting the instruction; that makes sense, since `deleteAfterManifest` does pretty much `replaceAllUses` + `eraseFromParent`), but found no way to do it. Can we somehow submit a patch for the LangRef? I just tried `changeToUnreachable`, which eliminates the whole block :)
baziotis: Well, I initially assumed that `%c = undef` can't appear anywhere, but I changed my mind because of the LangRef: https://llvm.org/docs/LangRef.html#undefined-values. However, it seems that one can't actually write this (maybe only LLVM itself can produce it when passing IR from one pass to another? I've no idea, it's obscure).
      return ChangeStatus::UNCHANGED;
jdoerfert: Nit: `UBLoads.count(I)` is shorter, also in some other places.
    for (Instruction *I : UBLoads)
      changeToUnreachable(I, /* UseLLVMTrap */ false);
    return ChangeStatus::CHANGED;
  }

  /// See AbstractAttribute::getAsStr()
  const std::string getAsStr() const override {
    return getAssumed() ? "undefined-behavior" : "no-ub";
  }

jdoerfert: This was my bad, as I proposed this kind of statistic tracking, but this counts the opposite of what the text says. You can keep a separate set with the loads that are still considered UB, which you update in updateImpl as well. That set can also be used in the manifest method.
protected:
  // A set of all the (live) load instructions that _are_ assumed to cause UB.
  SmallPtrSet<Instruction *, 8> UBLoads;

private:
  // A set of all the (live) load instructions that are _not_ assumed to cause
  // UB.
  // Note: The correctness of the procedure depends on the fact that this
  // set stops changing after some point. "Change" here means that the size
  // of the set changes. The size of this set is monotonically increasing
  // (we only add items to it) and is upper bounded by the number of load
  // instructions in the processed function (we can never save more elements
  // in this set than this number). Hence, the size of this set, at some
  // point, will stop increasing, effectively reaching a fixpoint.
  SmallPtrSet<Instruction *, 8> NoUBLoads;
};
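The fixpoint argument in the `NoUBLoads` comment, together with the update scheme proposed in the review, can be illustrated with a toy model. This is a hypothetical sketch, not Attributor code: loads are integer IDs, `ChangeStatusModel` stands in for `ChangeStatus`, and the `MayNotBeUB` callback stands in for the per-load check. It follows the reviewer's proposal (loads still assumed UB are re-checked every round), which differs slightly from the patch's lambda, where a load already in either set is skipped.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Hypothetical model of the optimistic UB fixpoint; not Attributor code.
enum class ChangeStatusModel { CHANGED, UNCHANGED };

struct UBStateModel {
  std::vector<int> Loads;  // all (assumed live) loads, identified by ID
  std::set<int> UBLoads;   // loads still assumed to cause UB
  std::set<int> NoUBLoads; // loads no longer assumed to cause UB

  // One updateImpl-style round. MayNotBeUB(L) stands in for the per-load
  // check (e.g. "pointer is not constant null, or null is defined").
  ChangeStatusModel update(const std::function<bool(int)> &MayNotBeUB) {
    const std::size_t PrevSize = NoUBLoads.size();
    for (int L : Loads) {
      if (NoUBLoads.count(L))
        continue; // once "not UB", a load stays there (monotonic)
      if (MayNotBeUB(L)) {
        UBLoads.erase(L);
        NoUBLoads.insert(L);
      } else {
        UBLoads.insert(L); // keep assuming UB optimistically
      }
    }
    // Only growth of NoUBLoads counts as a state change.
    return NoUBLoads.size() != PrevSize ? ChangeStatusModel::CHANGED
                                        : ChangeStatusModel::UNCHANGED;
  }
};

// Repeats update() until nothing changes; returns the number of rounds.
// Terminates because NoUBLoads grows monotonically and is bounded by the
// number of distinct loads.
int runToFixpoint(UBStateModel &S,
                  const std::function<bool(int)> &MayNotBeUB) {
  int Rounds = 1;
  while (S.update(MayNotBeUB) == ChangeStatusModel::CHANGED)
    ++Rounds;
  return Rounds;
}
```

With a fixed callback, as here, the fixpoint is reached after two rounds: one that moves loads out of the UB set and one that reports no change. Once the per-load check consults other abstract attributes, its answer may change between rounds, so the round count is not fixed; the monotonicity bound is what guarantees termination.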

struct AAUndefinedBehaviorFunction final : AAUndefinedBehaviorImpl {
  AAUndefinedBehaviorFunction(const IRPosition &IRP)
      : AAUndefinedBehaviorImpl(IRP) {}

  /// See AbstractAttribute::trackStatistics()
  void trackStatistics() const override {
    STATS_DECL(UndefinedBehaviorInstruction, Instruction,
               "Number of instructions known to have UB");
    BUILD_STAT_NAME(UndefinedBehaviorInstruction, Instruction) +=
        UBLoads.size();
  }
};

/// ------------------------ Will-Return Attributes ----------------------------

// Helper function that checks whether a function has any cycle.
// TODO: Replace with more efficient code
static bool containsCycle(Function &F) {
  SmallPtrSet<BasicBlock *, 32> Visited;

  // Traverse BB by dfs and check whether successor is already visited.
[3,520 unchanged lines not shown; the following lines are inside void Attributor::identifyDefaultAbstractAttributes(Function &F)]
  // Check for dead BasicBlocks in every function.
  // We need dead instruction detection because we do not want to deal with
  // broken IR in which SSA rules do not apply.
  getOrCreateAAFor<AAIsDead>(FPos);

  // Every function might be "will-return".
  getOrCreateAAFor<AAWillReturn>(FPos);

  // Every function might contain instructions that cause "undefined behavior".
jdoerfert: The wording is a bit off. Maybe say something like: `Every function might contain instructions that cause "undefined-behavior"`
  getOrCreateAAFor<AAUndefinedBehavior>(FPos);

  // Every function can be nounwind.
  getOrCreateAAFor<AANoUnwind>(FPos);

  // Every function might be marked "nosync"
  getOrCreateAAFor<AANoSync>(FPos);

  // Every function might be "no-free".
  getOrCreateAAFor<AANoFree>(FPos);
[288 unchanged lines not shown]

const char AAReturnedValues::ID = 0;
const char AANoUnwind::ID = 0;
const char AANoSync::ID = 0;
const char AANoFree::ID = 0;
const char AANonNull::ID = 0;
const char AANoRecurse::ID = 0;
const char AAWillReturn::ID = 0;
const char AAUndefinedBehavior::ID = 0;
const char AANoAlias::ID = 0;
const char AAReachability::ID = 0;
const char AANoReturn::ID = 0;
const char AAIsDead::ID = 0;
const char AADereferenceable::ID = 0;
const char AAAlign::ID = 0;
const char AANoCapture::ID = 0;
const char AAValueSimplify::ID = 0;
[106 unchanged lines not shown]
CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AANoCapture)

CREATE_ALL_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAValueSimplify)
CREATE_ALL_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAIsDead)
CREATE_ALL_ABSTRACT_ATTRIBUTE_FOR_POSITION(AANoFree)

CREATE_FUNCTION_ONLY_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAHeapToStack)
CREATE_FUNCTION_ONLY_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAReachability)
CREATE_FUNCTION_ONLY_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAUndefinedBehavior)

CREATE_NON_RET_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAMemoryBehavior)

#undef CREATE_FUNCTION_ONLY_ABSTRACT_ATTRIBUTE_FOR_POSITION
#undef CREATE_FUNCTION_ABSTRACT_ATTRIBUTE_FOR_POSITION
#undef CREATE_NON_RET_ABSTRACT_ATTRIBUTE_FOR_POSITION
#undef CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION
#undef CREATE_ALL_ABSTRACT_ATTRIBUTE_FOR_POSITION
#undef SWITCH_PK_CREATE
#undef SWITCH_PK_INV

INITIALIZE_PASS_BEGIN(AttributorLegacyPass, "attributor",
                      "Deduce and propagate attributes", false, false)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(AttributorLegacyPass, "attributor",
                    "Deduce and propagate attributes", false, false)

llvm/test/Transforms/Attributor/undefined_behavior.ll

This file was added.

; RUN: opt --attributor --attributor-disable=false -S < %s | FileCheck %s --check-prefix=ATTRIBUTOR

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

; Test cases specifically designed for the "undefined behavior" abstract function attribute.
; We want to verify that whenever undefined behavior is assumed, the code becomes unreachable.
; We use FIXME's to indicate problems and missing attributes.

; ATTRIBUTOR: define void @wholly_unreachable()
; ATTRIBUTOR-NEXT: unreachable
define void @wholly_unreachable() {
  %a = load i32, i32* null
  ret void
}

; ATTRIBUTOR: define void @single_bb_unreachable(i1 %cond)
; ATTRIBUTOR-NEXT: br i1 %cond, label %t, label %e
; ATTRIBUTOR-EMPTY:
; ATTRIBUTOR-NEXT: t:
; ATTRIBUTOR-NEXT: unreachable
; ATTRIBUTOR-EMPTY:
; ATTRIBUTOR-NEXT: e:
; ATTRIBUTOR-NEXT: ret void
define void @single_bb_unreachable(i1 %cond) {
  br i1 %cond, label %t, label %e
t:
  %b = load i32, i32* null
  br label %e
e:
  ret void
}

; ATTRIBUTOR: define void @null_pointer_is_defined()
; ATTRIBUTOR-NEXT: %a = load i32, i32* null
define void @null_pointer_is_defined() "null-pointer-is-valid"="true" {
  %a = load i32, i32* null
  ret void
}
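The CHECK lines in this test follow from how `manifest` uses `changeToUnreachable`: per its documentation, it inserts an `unreachable` before the given instruction and makes that instruction and the rest of its block dead. That is also why, as noted in the review, it appeared to eliminate the whole block when the UB load was the block's first instruction. The toy model below illustrates only that truncation; `BlockModel` and `changeToUnreachableModel` are hypothetical names, and the real helper's extra work (updating PHI nodes in successor blocks, optionally emitting a call to `llvm.trap`) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of a basic block: one string per instruction.
using BlockModel = std::vector<std::string>;

// Models the truncation done by changeToUnreachable(I, /*UseLLVMTrap=*/false):
// the instruction at Idx and everything after it in the block are removed,
// and the block is terminated with `unreachable`. PHI updates in successor
// blocks and the optional llvm.trap call are not modelled.
void changeToUnreachableModel(BlockModel &BB, std::size_t Idx) {
  assert(Idx < BB.size() && "Idx must name an instruction in the block");
  BB.erase(BB.begin() + static_cast<std::ptrdiff_t>(Idx), BB.end());
  BB.push_back("unreachable");
}
```

Applied to the entry block of `@wholly_unreachable` (load at index 0), the block collapses to a single `unreachable`; applied to block `t` of `@single_bb_unreachable`, the `br label %e` is dropped as well, matching the expected `t:` / `unreachable` CHECK lines.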

Revision Contents

Diff 235242

llvm/include/llvm/Transforms/IPO/Attributor.h

llvm/lib/Transforms/IPO/Attributor.cpp

llvm/test/Transforms/Attributor/undefined_behavior.ll
