This is an archive of the discontinued LLVM Phabricator instance.

[Attributor] Function level undefined behavior attribute
ClosedPublic

Authored by baziotis on Dec 12 2019, 2:08 PM.

Details

Summary

_Eventually_, this attribute will be assigned to a function if it contains undefined behavior. As a first small step, I tried to make it loop through the load instructions in a function (eventually, the plan is to check if a load instructions causes undefined behavior, because e.g. dereferences a null pointer - Also eventually, this won't happen in initialize() but in updateImpl()).

Note: This is my first LLVM and Attributor review hence this patch doesn't really do anything yet, I'm just trying to get the initial details down.
Any help / advice / proposal is highly appreciated.

Edit: Correction: The attribute won't be assigned to functions (yet). Its purpose is to be integrated in AAIsDead.

Diff Detail

Event Timeline

baziotis created this revision.Dec 12 2019, 2:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2019, 2:08 PM

I'm a little dubious on the use-case of such an attribute, perhaps you can add that into patch's description.

Thanks for working on this. I left some comments and look forward to the updateImpl code :)

llvm/include/llvm/Transforms/IPO/Attributor.h
1710

In comments two words please "undefined behavior"

Remove the "AA" in the method names: "isKnownUndefinedBehavior" or "isKnownToCauseUB".

Return the state value not true/false. See other isKnownXXXXX functions around here.

llvm/lib/Transforms/IPO/Attributor.cpp
2001

I'm unsure what the IPos does here. Probably copy&paste. Note that I is known to be a Load here, so ImmutableCallSite(&I) will always result in a "unusable" call site, e.g., one that evaluates to false if converted to bool.

2025

We need to track statistics differently. Maybe we want to track two lists, one for assumed live and UB loads one for assumed live and non-UB loads. We can track the number of UB loads then by using something like this (sorry for the template mess)

STATS_DECL(UndefinedBehaviorInstruction, Instruction, "Number of instructions known to have UB");
BUILD_STAT_NAME(UndefinedBehaviorInstruction, Instruction) += AssumedUBInstructions.size();
2027

My proposal going forward:

Keep a list of loads that are *not* assumed to cause UB. Initially that list is empty.
Every updateImpl is called you iterate over all assumed live loads (as you do above) and:

  • skip ones we have in the list of *not* assumed UB causing
  • check if we can assume we can continue to assume execution would cause UB, initially by asking if the pointer is null or undef. If this fails add them to the list.

If the list changed the internal state changed and we have to tell the Attributor.

2036

Let's not handle call sites for now and just give up on them.

5618

The wording is a bit off. Maybe say something like: Every function might contain instructions that cause "undefined-behavior"

I'm a little dubious on the use-case of such an attribute, perhaps you can add that into patch's description.

Let me answer this one as I proposed this project.

The Attributor uses liveness extensively to improve results. I mean, basically all results are "conditional" in the sense SCCP is "conditional".
Liveness in SCCP, and so far in the Attributor, is mostly derived by not following assumed dead branches. Having the ability to reason about UB will allow us to augment this.

Take the following derived example (https://godbolt.org/z/ZcGWiC)

extern int Condition;

int extern_fn(void);

static void internal_fn(int *a) {
  if (Condition) {
    *a = 1;
    extern_fn();
  }
}

void entry() {
  internal_fn(0);
}

While we remove the store to the nullptr and add a trap eventually, we keep the conditional and we introduce a trap. Both are actually harmful for optimization purposes.
I know this is constructed example but we see a lot of known UB (not necessarily load related) after inlining and constant prop which would be very valuable to prune infeasible paths.

I'm unsure what the IPos does here. Probably copy&paste. Note that I is known to be a Load here, so ImmutableCallSite(&I) will always result in a "unusable" call site, e.g., one that evaluates to false if converted to bool.

Oh yes of course, it was totally because of copy paste.

We need to track statistics differently. Maybe we want to track two lists, one for assumed live and UB loads one for assumed live and non-UB loads. We can track the number of UB loads then by using something like this (sorry for the template mess)

I've no idea how statistics are used / tracked. But this code unfortunately doesn't work. I assume that the AssumedUBInstructions is the list you're
referring in the proposed method for updateImpl(), so I'll try to update this when that is ready.

baziotis updated this revision to Diff 233705.Dec 12 2019, 4:29 PM
  • Made AAUB function-only (i.e. no call-site, as AAReachability).
  • Small fixes.

I'll come back with tests and updateImpl() (probably in reverse order, as otherwise tests are useless).

baziotis edited the summary of this revision. (Show Details)Dec 15 2019, 6:31 AM
baziotis updated this revision to Diff 233964.EditedDec 15 2019, 6:39 AM

This is a high level overview of the algorithm. noUBLoads is monotonically increasing and upper bounded thus this procedure will end (note that for the time being though,
we can't derive pessimistic fixpoint).
A note on the DS: I assume LLVM has its own version of unordered_set but I don't know its internals and this seemed to be the most applicable in this case.
Feel free to propose LLVM alternatives.

Also, now that we have the noUBLoads, I'll add the stats.

You should either diff against the proper trunk revision you work on or merge all your commits into a single one. Right now you always only upload a diff against the last version of the patch.

llvm/lib/Transforms/IPO/Attributor.cpp
1990

I would have taken a SmallPtrSet<Instruction *, 8>.

Please start variable names with an upper case letter.

2001

FWIW: Branching on an uninitialized local variable is UB.

2008

Nit: Cant -> Cannot

No braces if there is only a single stmt.

jdoerfert added inline comments.Dec 15 2019, 11:36 AM
llvm/lib/Transforms/IPO/Attributor.cpp
2001

For now just check if the pointer is a constant null and if null is not dereferenceable

You should either diff against the proper trunk revision you work on or merge all your commits into a single one. Right now you always only upload a diff against the last version of the patch.

My bad, I thought this is what I _had to_ do.

llvm/lib/Transforms/IPO/Attributor.cpp
2000

I couldn't find a way to save the entry so that we don't have to search the second (i.e. like in the unordered_map).

jdoerfert added inline comments.Dec 15 2019, 6:29 PM
llvm/lib/Transforms/IPO/Attributor.cpp
2000

Search the second what?

if (MySmallPtrSet.count(&I))

and

MySmallPtrSet.insert(&I)

should work just fine.

Nit: I would not do the Iref and I = &Iref thing but just use the reference (called I) and the & as needed.

baziotis marked 2 inline comments as done.Dec 15 2019, 6:42 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2001

FWIW: Branching on an uninitialized local variable is UB

Yes, thanks, it was intended to sort of signify that "this is not complete yet, it has to be filled".

2001

For now just check if the pointer is a constant null and if null is not dereferenceable

I was assuming the first one but in the second one I lost it.
Did you maybe mean "If it is not constant null, check if it is not dereferenceable (i.e. getAAFor<Dereferenceable>)"?

baziotis marked an inline comment as done.Dec 15 2019, 6:51 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2000

Search the second what?

if (MySmallPtrSet.count(&I))

and

MySmallPtrSet.insert(&I)

should work just fine.

Yes, but pretty much count would achieve what find achieves already right?
What I meant was that imagine that our DS is implemented as a linear probing hash table. Then, you would do one search with find (or count) and then another with insert, but it would be the _same_ search. In an unordered_map we can avoid that by caching the search in find, like this:

Instruction *&entry = map[I];
if (!entry) {
  // Does not exist, so insert it.
  entry = ...;
}

But I don't if this is possible on std::set let alone SmallPtrSet. It's not that important I guess.

Nit: I would not do the Iref and I = &Iref thing but just use the reference (called I) and the & as needed.

Noted :)

jdoerfert added inline comments.Dec 15 2019, 9:26 PM
llvm/lib/Transforms/IPO/Attributor.cpp
2001

Did you maybe mean "If it is not constant null, check if it is not dereferenceable (i.e. getAAFor<Dereferenceable>)"?

No, AADerferenceable will not tell you if it is "not dereferenceanle" but only if it is.

What I tried to say in more words:

Do not assume NULL cannot be dereferenced, thus that a load from NULL would be UB for sure. NULL can be a valid pointer, it just happens to be one that is different on your OS. In order to determine if NULL can be dereferenced, use the NullPointerIsDefined helper function, you can find some uses of it in the Attributor.cpp file already.

No, AADerferenceable will not tell you if it is "not dereferenceanle" but only if it is.

Yes, my bad. Since the number of dereferenceable is only increasing.

What I tried to say in more words:

Do not assume NULL cannot be dereferenced, thus that a load from NULL would be UB for sure. NULL can be a valid pointer, it just happens to be one that is different on your OS. In order to determine if NULL can be dereferenced, use the NullPointerIsDefined helper function, you can find some uses of it in the Attributor.cpp file already.

Oh yes, ok, got it.

baziotis updated this revision to Diff 234033.Dec 16 2019, 5:03 AM
  • One case for a load to cause UB: It is constant null and null is not defined for the target.
  • Stats using NoUBLoads.size()
baziotis updated this revision to Diff 234037.Dec 16 2019, 5:40 AM

Removed braces for single statement ifs

Almost there. Once a manifest method (see below) is there to act on the information we should be able to test it. So in addition to the inlined comments we need test cases now.

llvm/include/llvm/Transforms/IPO/Attributor.h
1699

Since we start with a function version only, we probably need to add an llvm::Instruction operand here.
While it makes sense if *every* execution of a function causes UB, the initial implementation will
answer the questions for a single (load) instruction.

llvm/lib/Transforms/IPO/Attributor.cpp
1995

Please document what is collected in here. Maybe make it not accessible for other classes, e.g., move it to the end into a private: part of the struct declaration.

2023

You can assume I to be in a function, thus I.getFunction() is not null.

If you write the code with early exists you can save multiple levels of nesting and make it generally easier to read.

2040

Let's also add a manifest method to replace the loads that are still considered UB with an undef (for now). That should allow us to test this.

2051

This was my bad, as I proposed this kind of statistic tracking, but this counts the opposite of what the text says. You can keep a different set with the ones that are still considered UB which you update in the updateImpl as well. That set can also be used in the manifest method.

baziotis marked an inline comment as done.Dec 16 2019, 2:29 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2040

The whole instruction? Like replace %a = load i32, i32* null with undef. I don't think this can happen. Maybe I misunderstood and you meant replace it with this: %a = load i32, i32* undef (i.e. replace the pointer operand value).

baziotis marked an inline comment as done.Dec 16 2019, 3:00 PM
baziotis added inline comments.
llvm/include/llvm/Transforms/IPO/Attributor.h
1699

You mean to add a (method) overload for isKnownToCauseUB, maybe pure virtual that is implemented in AAUndefinedBehaviorImpl? It can check if it is a Load and if so, check if it is in UBLoads.

jdoerfert added inline comments.Dec 16 2019, 3:43 PM
llvm/include/llvm/Transforms/IPO/Attributor.h
1699

Exactly. For now that is probably the only two methods we need. I mean we do not actually use the boolean state right now that is returned in the current isAssumedToCauseUB.

llvm/lib/Transforms/IPO/Attributor.cpp
2040

Your first interpretation was what I meant. Replace the load (instruction) with undef. We actually want to be more aggressive but we will need that step either way.

baziotis marked an inline comment as done.Dec 16 2019, 4:49 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2040

Sorry but I couldn't find a way to do that. I've seen in the language reference that one can do this: %some_name = undef and so I guess this is what we want, but I couldn't produce it. The closest thing I found is deleteAfterManifest which replaces _its uses_ with undef but will also delete the instruction. I guess we actually need the instruction (or, its replaced counterpart) so that the info of undefined behavior stays.

jdoerfert added inline comments.Dec 16 2019, 4:56 PM
llvm/lib/Transforms/IPO/Attributor.cpp
2040

First, no worries, ask if you need to find an API. The code below was obviously not tested but should do what I described earlier:

I.replaceAllUsesWith(UndefValue::get(I.getType()))

You can also use deleteAfterManifest, actually you probably should use that instead. Or we directly go one step further and use llvm::changeToUnreachable (as AAIsDead does).

P.S.
I don't think we can do %some_name = undef though, the lang ref might need an update there.

baziotis marked an inline comment as done.Dec 16 2019, 4:57 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2040

Well, I was assuming initially that %c = undef can't appear anywhere, but I changed my mind because of the ref: https://llvm.org/docs/LangRef.html#undefined-values. However, it seems that one can't actually write this (only maybe LLVM from one pass to another ? I've no idea, it's obscure).

baziotis marked an inline comment as done.Dec 16 2019, 5:15 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2040

First, no worries, ask if you need to find an API.
I don't think we can do %some_name = undef though, the lang ref might need an update there.

:) I was searching for half an hour. Indeed, I had tried both replaceAllUses alone (which indeed replaced all uses but left the instruction as is), deleteAfterManifest (which did the previous thing + deleting the instruction - makes sense since deleteAfterManifest does pretty much replaceAllUses + eraseFromParent) but no way to do that. Can we somehow submit a patch for the LanguageRef ?

I just tried changeToUnreachable which eliminates the whole block :)

baziotis updated this revision to Diff 234201.Dec 16 2019, 5:35 PM
  • manifest that changes to uncreachable live UB loads.
  • Set both for UB loads and non-UB loads.
  • Stats that use UB loads.
  • Method that checks if a specific (load) instruction is considered UB.

Note: Clang format did its job a little bit too well and replaced 2 parts that are not related to this patch. Should I remove them?

Now all that is left is a test file with positive and negative tests so we can see it works and not accidentally break it in the future.

  • manifest that changes to uncreachable live UB loads.
  • Set both for UB loads and non-UB loads.
  • Stats that use UB loads.
  • Method that checks if a specific (load) instruction is considered UB.

The method needs to be isAssumedToCauseUB not isKnownToCauseUB because we don't know until a fixpoint is reached.
Also remove the ones that do not take an instruction for now. They are confusing and not needed until they actually return the proper value, I mean true if the function *always* exhibits UB.

Note: Clang format did its job a little bit too well and replaced 2 parts that are not related to this patch. Should I remove them?

Two options: 1) Remove them from the diff. 2) Commit them first.

Since I will probably have to actually push the diff (as you do not have push access) I will do a clang format of the file first.
It also helps to only use the clang format diff script during development. Though, the Attributor files should always be formatted which is why option 2) above is a proper way to resolve this.

llvm/lib/Transforms/IPO/Attributor.cpp
1998

Nit: This is the same as not overriding the method in the first place.

2032

Nit: Add a TODO above explaining that we should not only look at loads and also expand it to more than *constant* null values eventually.
Nit: Empty lines, e.g., before comments, help (me) to read code.

2041

Nit: UBLoads.count(I) is shorter, also in some other places.

Now all that is left is a test file with positive and negative tests so we can see it works and not accidentally break it in the future.

Yes, finishing minor details and they come next.

The method needs to be isAssumedToCauseUB not isKnownToCauseUB because we don't know until a fixpoint is reached.

Indeed, it was not intended.

Also remove the ones that do not take an instruction for now. They are confusing and not needed until they actually return the proper value, I mean true if the function *always* exhibits UB.

Noted.

Two options: 1) Remove them from the diff. 2) Commit them first.

Since I will probably have to actually push the diff (as you do not have push access) I will do a clang format of the file first.
It also helps to only use the clang format diff script during development.

I didn't know about the diff option, thanks.

Though, the Attributor files should always be formatted which is why option 2) above is a proper way to resolve this.

I lost it.. :) Should I commit them and upload diffs over that (local) commit or should I just not commit them and run
clang-format on the diff and let you do the format when you push it?

I lost it.. :) Should I commit them and upload diffs over that (local) commit or should I just not commit them and run
clang-format on the diff and let you do the format when you push it?

I just formatted the Attributor file. Once this change is rebased these differences will disappear.

baziotis updated this revision to Diff 234214.Dec 16 2019, 7:37 PM

I just formatted the Attributor file. Once this change is rebased these differences will disappear.

Thank you.

Fixed minor details here and there. I'll come back tomorrow with tests.

uenoku added a subscriber: uenoku.Dec 17 2019, 10:45 AM
baziotis updated this revision to Diff 234428.EditedDec 17 2019, 5:25 PM

Added basic tests. There's a FIXME because with "null-pointer-is-valid` attribute in the function, nullPointerIsDefined should return true and hence not put the instruction in UBLoads and not make the code unreachable.

baziotis updated this revision to Diff 234555.EditedDec 18 2019, 8:59 AM

Apparently, I didn't look the implementation of nullPointerIsDefined that I referenced. I have to set the attribute equal to "true" and it works fine.

jdoerfert accepted this revision.Dec 18 2019, 1:43 PM

Thanks! LGTM. I can commit this for you if you want. Maybe update the commit message first.

We can do follow up patches for other instructions/situations now. Some ideas:

  • Look at stores and other memory accesses.
  • Look at branches and other control flow instructions.
  • Look at attribute violations, e.g., null is passed but the nonnull attribute is present.
  • Look not only for null but also for undef, maybe even known dangling pointers (via another attribute).
  • Use AAValueSimplify to get an assumed simplified value that we check (that would force us to look at assumed but not known UB instructions every updateImpl call!)
  • Implement the function UB analysis that checks if an instruction that is in the "must-be-executed-context" of the function entry is assumed to have UB.

Let me know if you plan to tackle any/all of them.

This revision is now accepted and ready to land.Dec 18 2019, 1:43 PM

Thanks! LGTM. I can commit this for you if you want. Maybe update the commit message first.

Thank you! Of course, you may commit it and update the commit message as you wish.

We can do follow up patches for other instructions/situations now. Some ideas:

  • Look at stores and other memory accesses.
  • Look at branches and other control flow instructions.
  • Look at attribute violations, e.g., null is passed but the nonnull attribute is present.
  • Look not only for null but also for undef, maybe even known dangling pointers (via another attribute).
  • Use AAValueSimplify to get an assumed simplified value that we check (that would force us to look at assumed but not known UB instructions every updateImpl call!)
  • Implement the function UB analysis that checks if an instruction that is in the "must-be-executed-context" of the function entry is assumed to have UB.

Let me know if you plan to tackle any/all of them.

I'd totally like to tackle all of them, it's pretty interesting already! But it's better to not get a (implicit) assignment in any of them as time is constrained right now.
Definitely I'll start with which seem more approachable like stores and other memory accesses, branches and attribute violations.

This revision was automatically updated to reflect the committed changes.
baziotis retitled this revision from [WIP] [Attributor] Function level undefined behavior attribute to [Attributor] Function level undefined behavior attribute.Jan 7 2020, 3:00 AM