This is an archive of the discontinued LLVM Phabricator instance.

Fold compares irrespective of whether allocation can be elided
ClosedPublic

Authored by anna on Apr 26 2016, 11:29 AM.

Details

Summary

When a non-escaping pointer is compared to a global value, the comparison can be folded even if the corresponding malloc/allocation call cannot be elided.
We need to make sure the global value is not null, since comparisons to null cannot be folded.

In the future, we should also handle cases where the comparison instruction dominates the pointer escape.
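
A minimal sketch of the pattern being described (the function name here is illustrative, not taken from the patch; @gp and the !nonnull load mirror the test cases discussed below). The malloc cannot be elided because %m is still used, yet the compare folds because %m never escapes:

@gp = external global i32*, align 8

declare i8* @malloc(i64)

define i1 @compare_nonescaping_malloc() {
  %m = call i8* @malloc(i64 4)
  %bc = bitcast i8* %m to i32*
  ; the loaded value is known non-null via !nonnull, so folding is legal
  %lgp = load i32*, i32** @gp, align 8, !nonnull !0
  %cmp = icmp eq i32* %bc, %lgp   ; folds to false: %m never escapes
  store i8 0, i8* %m              ; %m is still used, so the malloc stays
  ret i1 %cmp
}

!0 = !{}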

Diff Detail

Repository
rL LLVM

Event Timeline

anna updated this revision to Diff 55049. Apr 26 2016, 11:29 AM
anna retitled this revision from to Fold compares irrespective of whether allocation can be elided.
anna updated this object.
anna set the repository for this revision to rL LLVM.
anna added a reviewer: sanjoy.
anna added a subscriber: llvm-commits.
anna added a reviewer: majnemer.
sanjoy requested changes to this revision. Apr 26 2016, 12:02 PM
sanjoy edited edge metadata.

I think we cannot fold comparisons with values loaded from globals, since the global we're loading from could contain null. Maybe we need to use !nonnull metadata?
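
For reference, !nonnull on a load is spelled as an empty metadata node attached to the instruction, which is the form the later diffs and test cases in this review use:

%lgp = load i32*, i32** @gp, align 8, !nonnull !0

!0 = !{}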

lib/Analysis/CaptureTracking.cpp
312 ↗(On Diff #55049)

Nit: I'd rephrase this as "Comparison against value stored in global variable"

lib/Analysis/InstructionSimplify.cpp
2091 ↗(On Diff #55049)

Nit: use nullptr

This revision now requires changes to proceed. Apr 26 2016, 12:02 PM
anna updated this revision to Diff 55084. Apr 26 2016, 1:43 PM
anna updated this object.
anna edited edge metadata.
anna marked 2 inline comments as done.

Fixes the issue Sanjoy pointed out: avoid folding the compare when a null value is loaded from the global.

majnemer added inline comments. Apr 26 2016, 1:50 PM
lib/Analysis/CaptureTracking.cpp
313 ↗(On Diff #55084)

Please format this appropriately.

lib/Analysis/InstructionSimplify.cpp
2087–2092 ↗(On Diff #55084)

Why not use isKnownNonNull or isKnownNonNullAt?

anna updated this revision to Diff 55093. Apr 26 2016, 2:28 PM
anna edited edge metadata.
anna marked 2 inline comments as done.
sanjoy requested changes to this revision. Apr 27 2016, 1:36 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Analysis/CaptureTracking.cpp
312 ↗(On Diff #55093)

Please add a sentence or two here on why this is correct, i.e. you're
optimistically assuming that the pointer does not escape, and given
that it does not escape, its value could not have been "guessed
into existence" and separately stored in a global variable.

I.e. we're avoiding situations like this:

store <some value> to <global>
%val = load <global>
if (%val == %ptr)
  escape(%val)

since <some value> cannot be equal to %ptr if %ptr does not escape.

lib/Analysis/InstructionSimplify.cpp
2087 ↗(On Diff #55093)

s/Comparisons/comparisons and s/NULL/null. Can you also make the description one continuous paragraph?

2100 ↗(On Diff #55093)

I think you need to pass true for both ReturnCaptures and StoreCaptures. Can you also add a test case that would've caught this?

This revision now requires changes to proceed. Apr 27 2016, 1:36 PM
anna marked 2 inline comments as done. Apr 28 2016, 9:36 AM
anna added inline comments.
lib/Analysis/InstructionSimplify.cpp
2100 ↗(On Diff #55093)

I had ReturnCaptures as false because of this case: the compare dominates the ret statement that returns the pointer. We can fold the compare since the pointer has not escaped at the time of the compare.

In case the compare does not dominate the ret statement, if we fold the cmp and some optimization later on reorders the statements such that the ret no longer returns the right value, we have incorrect functionality in the IR.
However, the test case should be such that the returned value is different, and the only statement causing the pointer to be captured should be the ret. If there were other statements causing the pointer to be captured, PointerMayBeCaptured() would have returned true and we would not fold.

Test case I wrote for the case where the cmp for %m does not dominate the ret:

define i8* @compare_ret_escape(i8* %c) {

  %m = call i8* @malloc(i64 4)
  %n = call i8* @malloc(i64 4)
  %cmp = icmp eq i8* %n, %c
  br i1 %cmp, label %retst, label %chk

retst:
  ret i8* %m

chk:
  %bc = bitcast i8* %m to i32*
  %lgp = load i32*, i32** @gp, align 8, !nonnull !0
  %cmp2 = icmp eq i32* %bc, %lgp
  br i1 %cmp2, label %retst, label %chk2

chk2:
  ret i8* %n
}

Right now the %cmp2 folds to false, and it looks right. Is there a case where we really need ReturnCaptures to be true?

anna added inline comments. Apr 28 2016, 10:47 AM
lib/Analysis/InstructionSimplify.cpp
2100 ↗(On Diff #55093)

Also, just to add to this, the third case is when the cmp post-dominates the ret. In all cases, the important point is that the ret is a terminator instruction that exits the function, so there is no way to go from the ret to the cmp.

sanjoy added inline comments. Apr 28 2016, 2:38 PM
lib/Analysis/InstructionSimplify.cpp
2100 ↗(On Diff #55093)

You're right in that the optimizer needs to preserve single-threaded
data dependencies. But I think there can be problems with
cross-thread data dependencies:

define i8* @f() {
entry:
  %ptr = call i8* @malloc()
  %global = load atomic i8*, i8** @global_var
  %cmp = icmp ne i8* %global, %ptr
  if (%cmp)
    *%global = 42;
  else
    *%global = 50;
  ret i8* %ptr
}

define void @g() {
  %val2 = call i8* @f()
  store atomic i8* %val2, i8** @global_var_2  ;; noalias with @global_var
  ret void
}

opt folds %cmp in @f to true, and then inlines @f into @g:

define void @g() {
entry:
  %ptr = call i8* @malloc()
  %global = load atomic i8*, i8** @global_var
  *%global = 42;
  store atomic i8* %ptr, i8** @global_var_2  ;; noalias with @global_var
  ret void
}

and re-orders:

define void @g() {
entry:
  %ptr = call i8* @malloc()
  store atomic i8* %ptr, i8** @global_var_2  ;; noalias with @global_var
  %global = load atomic i8*, i8** @global_var
  *%global = 42;
  ret void
}

The atomic markers on the loads and stores mean that these memory
accesses are visible to other threads (without atomic, sharing
memory across threads is undefined behavior), but they do not imply
any specific ordering.

Let's say we also have a thread doing this:

define void @another_thread() {
  %val = load atomic i8*, i8** @global_var_2
  store atomic i8* %val, i8** @global_var
  ret void
}

The original program would only ever write 50 through the pointer stored in
@global_var if that pointer was the one returned by the @malloc,
so you could later do this safely:

%val = load i8*, i8** @global_var
if (*%val != 50)
  // no need to free %val

If another thread reads (atomically) from @global_var_2 and writes
that value to @global_var, then in the transformed program the above
invariant is no longer true, since you could have this trace:

define void @g() {
entry:
  %ptr = call i8* @malloc()
  store atomic i8* %ptr, i8** @global_var_2  ;; noalias with @global_var

              %val = load atomic i8*, i8** @global_var_2
              store atomic i8* %val, i8** @global_var

  %global = load atomic i8*, i8** @global_var
  *%global = 42;
  ret void
}

and write 42 through a malloc'ed pointer stored in @global_var (an
illegal state according to the original program).

Now I think we can get around this by restricting the load instruction
to be non-atomic, but for now I'd say it is safer to just bail out if
we see stores or returns.

anna updated this revision to Diff 55584. Apr 29 2016, 5:25 AM
anna edited edge metadata.
anna marked 2 inline comments as done.
sanjoy requested changes to this revision. May 2 2016, 11:19 AM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Analysis/CaptureTracking.cpp
312 ↗(On Diff #55584)

Is this comment block wrapped to 80 chars?

313 ↗(On Diff #55584)

Nit: "its"

314 ↗(On Diff #55584)

Nit: I'd remove the "If the pointer escapes ..." bit. I think the first sentence is enough, and the "captured = true" bit can be misleading since there isn't a bool capture in this function.

lib/Analysis/InstructionSimplify.cpp
2100 ↗(On Diff #55584)

You can use CmpInst::isFalseWhenEqual here, but did you intend to put this under the previous Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE check? Right now you're handling inequalities too, I think.

Inequalities are tricky, since it is difficult to justify folding %newPtr ULE -1 to false, for instance.
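
To illustrate with a hypothetical negative test (not part of the patch as reviewed here): the address order of a fresh allocation relative to another pointer is unknowable, so an ordered compare must survive simplification:

  %m = call i8* @malloc(i64 4)
  %bc = bitcast i8* %m to i32*
  %lgp = load i32*, i32** @gp, align 8, !nonnull !0
  %cmp = icmp ult i32* %bc, %lgp   ; must not fold: address order is unknown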

This revision now requires changes to proceed. May 2 2016, 11:19 AM
anna updated this revision to Diff 55865. May 2 2016, 12:34 PM
anna edited edge metadata.
anna marked 2 inline comments as done.
anna marked 2 inline comments as done.
anna added inline comments.
lib/Analysis/InstructionSimplify.cpp
2100 ↗(On Diff #55584)

I see your point. Although we can never know the value of %newPtr when it is dynamically allocated, there *might be* some inequalities that cannot automatically be folded to false, such as checking that the value is ULE or UGE a 'boundary' value.

sanjoy accepted this revision. May 2 2016, 2:00 PM
sanjoy edited edge metadata.

This LGTM, but can you please add a test for ne specifically and one or two negative tests for inequality?
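
For instance, a hypothetical ne test would mirror the existing eq case but fold to true instead of false:

  %m = call i8* @malloc(i64 4)
  %bc = bitcast i8* %m to i32*
  %lgp = load i32*, i32** @gp, align 8, !nonnull !0
  %cmp = icmp ne i32* %bc, %lgp   ; folds to true: %m never escapes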

lib/Analysis/InstructionSimplify.cpp
3046 ↗(On Diff #55865)

This line is too long. I think just changing Constant to auto will fix it, otherwise please wrap.

This revision is now accepted and ready to land. May 2 2016, 2:00 PM
anna updated this revision to Diff 55991. May 3 2016, 7:18 AM
anna edited edge metadata.
anna marked an inline comment as done.

Added more tests for the compare as ne, and for scenarios where we cannot fold the comparison.

anna updated this object. May 3 2016, 7:38 AM
This revision was automatically updated to reflect the committed changes.