This is an archive of the discontinued LLVM Phabricator instance.

Calculate __builtin_object_size when pointer depends on a condition
ClosedPublic

Authored by spetrovic on Mar 24 2016, 4:35 AM.

Details

Summary

This patch fixes calculating of builtin_object_size if it depends on condition. Before this patch compiler didn't know how to calculate the object size when it finds condition that cannot be eliminated. This patch enables calculating of builtin_object_size even in case when condition cannot be eliminated by choosing minimum or maximum value as result from condition. Choosing minimum or maximum value from condition is based on second argument of __builtin_object_size function.

Diff Detail

Repository
rL LLVM

Event Timeline

spetrovic updated this revision to Diff 51533.Mar 24 2016, 4:35 AM
spetrovic retitled this revision from to Calculate __builtin_object_size when pointer depends on a condition.
spetrovic updated this object.
spetrovic set the repository for this revision to rL LLVM.
spetrovic added subscribers: ivanbaev, rankov.

Hi! Thanks for doing this.

include/llvm/Analysis/MemoryBuiltins.h
36 ↗(On Diff #51533)

Style nit: This should be CamelCase, not all caps.

39 ↗(On Diff #51533)

These identifiers seem really general to be putting them into the llvm namespace. Can we make this an enum class instead? Or can we prefix them all with something like OSM_?

137 ↗(On Diff #51533)

Please document what Mode does here.

lib/Analysis/MemoryBuiltins.cpp
559 ↗(On Diff #51533)

Do we not need to handle Phis, as well?

Consider:

void sideeffect();
int foo(int N) {
  char Buf[100];
  char *Ptr = Buf;
  if (N > 0) {
    sideeffect();
    Ptr = Buf + 50;
  }
  return __builtin_object_size(Ptr, 0);
}

...Which turns into:

define i32 @foo(i32 %N) {
entry:
  %Buf = alloca [100 x i8], align 16
  %0 = getelementptr inbounds [100 x i8], [100 x i8]* %Buf, i64 0, i64 0
  %cmp = icmp sgt i32 %N, 0
  br i1 %cmp, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  tail call void (...) @sideeffect() #4
  %add.ptr = getelementptr inbounds [100 x i8], [100 x i8]* %Buf, i64 0, i64 50
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  %Ptr.0 = phi i8* [ %add.ptr, %if.then ], [ %0, %entry ]
  %1 = call i64 @llvm.objectsize.i64.p0i8(i8* %Ptr.0, i1 false)
  %conv = trunc i64 %1 to i32
  ret i32 %conv
}
567 ↗(On Diff #51533)

This seems like it would fail given:

int foo(int N) {
  char Big[20];
  char Small[10];
  char *Ptr = N ? Big + 10 : Small;
  return __bos(Ptr);
}

...And it looks like this was a bug in the original implementation, too. Yay bugs. :)

If you'd like to fix it, feel free; otherwise, I'll do it after your patch lands. If you decide to fix it, please add a test-case that doesn't go through CGP.

571 ↗(On Diff #51533)

Is there a reason we can't just compare TrueSide.first and FalseSide.first directly?

571 ↗(On Diff #51533)

Do we really want to ignore offsets here? It seems that doing would cause this to break on code like:

int foo(int N) {
  char Small[10];
  char Large[20];

  char *Ptr = N ? Small : Large + 19;
  return __bos(Ptr);
}
575 ↗(On Diff #51533)

Same comments as the Mode == MIN conditional above

lib/CodeGen/CodeGenPrepare.cpp
1788 ↗(On Diff #51533)

Style nit: Add another space of indentation please

1792 ↗(On Diff #51533)

Can we just say Mode == MIN in the ternary here?

test/CodeGen/X86/builtin-condition.ll
1 ↗(On Diff #51533)

I recommend using opt instead, so we don't have to compare against asm. Given that our test cases should be simple, you can probably get away with changing the run line to:

; RUN: opt -codegenprepare -S < %s | FileCheck %s

...And it will output LLVM IR with all of the objectsize calls lowered for you :)

34 ↗(On Diff #51533)

Can we add a test-case for something like

int foo(int N) {
  char Small[10];
  char Large[20];

  char *Ptr = N ? Small : Large + 19;
  return __bos(Ptr);
}

Too, please?

spetrovic updated this revision to Diff 51643.Mar 25 2016, 9:34 AM
spetrovic edited edge metadata.
spetrovic marked 7 inline comments as done.

Comments addressed. I was thinking to resolve handling Phis and including offset in calculating min and max values (for select and for phi) in next patch if it is OK with you.

I was thinking to resolve handling Phis and including offset in calculating min and max values (for select and for phi) in next patch if it is OK with you.

Sorry, "next patch" meaning the next revision to this review, or a different patch after this is committed?

If the former: sounds good!

If the latter: I'm fine if you want to commit Phi handling separately, but the offset issue seems like it would introduce a bug that breaks existing, correct code. I don't think that would be good. :)

include/llvm/Analysis/MemoryBuiltins.h
139 ↗(On Diff #51643)

Typo: "correspondig"

Ok, I will resolve offset problem in this revision. Handling Phis will be different patch after this is committed.
So, I have question about offset problem. For this example:

int foo(int N) {

char Small[10];
char Large[20];

char *Ptr = N ? Small : Large - 19;
return __builtin_object_size(Ptr,0);

}

int main() {

size_t ret;
ret = foo(0);
printf("\n%d\n", ret);
return 0;

}

gcc gives 39 as result, I don't think that is correct value, what is your opinion about this case and similar cases ?

In clang's lib/AST/ExprConstant.cpp, it seems that we always return 0 if
our offset from the object's start is negative (or if the offset is greater
than the size of the object). I think this behavior is sane, and that it
would be odd if LLVM and clang acted differently here, so I'd just hand
back 0 bytes if our offset is negative.

spetrovic updated this revision to Diff 52197.Mar 31 2016, 6:52 AM

Comments addressed.

spetrovic added inline comments.Mar 31 2016, 7:02 AM
lib/Analysis/MemoryBuiltins.cpp
574 ↗(On Diff #52197)

What do you think if we return unknown value if one of sides (true or false) has offset out of bounds, or you have some advice to return something else? Second proposal is to return some invalid value in SizeOffsetType object?

lib/Analysis/MemoryBuiltins.cpp
574 ↗(On Diff #52197)

What do you think if we return unknown value if one of sides (true or false) has offset out of bounds

I think that the goal of this code is to basically turn:

__builtin_object_size(a ? b : c, N);

into

max(__builtin_object_size(b, N), __builtin_object_size(c, N));

(or min, if N is 2 or 3).

If this is correct, I would expect foo below to return 3 if it isn't inlined:

int foo(int i) {
  char c[3];
  return __builtin_object_size(i ? c : c + 5, 0);
}

...because __bos(c, 0) should hand back 3, __bos(c + 5, 0) should hand back 0, and max(3, 0) is 3. So, I don't see how returning unknown makes sense.

spetrovic updated this revision to Diff 52803.Apr 6 2016, 9:12 AM

Comments addressed.

Thanks for fixing the offset bug! Just a few rather small comments, and I think we're good.

lib/Analysis/MemoryBuiltins.cpp
390 ↗(On Diff #52803)

Is there a reason this method needs to exist on the visitor? Either way, please take SizeOffsetType by const&.

391 ↗(On Diff #52803)

Nit: Data.second.isNegative()

579 ↗(On Diff #52803)

Nit: The braces serve no purpose, and I don't think they were here before. Please remove them. :)

585 ↗(On Diff #52803)

And here

spetrovic marked 4 inline comments as done.Apr 7 2016, 7:34 AM
spetrovic added inline comments.
lib/Analysis/MemoryBuiltins.cpp
390 ↗(On Diff #52803)

This part of code is called few times, and maybe will be called and for phi node case too, so I think it's OK to put it in a method.

spetrovic updated this revision to Diff 52918.Apr 7 2016, 8:05 AM
spetrovic marked an inline comment as done.

Comments addressed.

lib/Analysis/MemoryBuiltins.cpp
390 ↗(On Diff #52918)

...But

  1. It uses absolutely nothing from ObjectSizeOffsetVisitor
  2. It's presumably useless outside of MemoryBuiltins.cpp

Because of point 1, it should ideally be a free function. Because of point 2, it should ideally be local to MemoryBuiltins.cpp. So I'm not seeing how it makes more sense to make this a public method.

Sorry for the multiple comments -- one more thing:

test/CodeGen/X86/builtin-condition.ll
1 ↗(On Diff #52918)

Nit: This should probably go into test/Transforms/CodeGenPrepare, now that it's no longer generating x86. :)

spetrovic updated this revision to Diff 53016.Apr 8 2016, 5:38 AM
spetrovic marked 3 inline comments as done.
george.burgess.iv edited edge metadata.

This LGTM with two more comments. Thanks so much for the patch! :)

include/llvm/Analysis/MemoryBuiltins.h
146 ↗(On Diff #53016)

This has no users outside of MemoryBuiltins.cpp, so we can remove this.

lib/Analysis/MemoryBuiltins.cpp
367 ↗(On Diff #53016)

Please make this static APInt getSizeWithOverflow(const SizeOffsetType &Data) {, and move it above llvm::getObjectSize (so llvm::getObjectSize can use it :) ).

This revision is now accepted and ready to land.Apr 8 2016, 10:42 AM
nlopes added a subscriber: nlopes.Apr 9 2016, 10:18 AM
nlopes added inline comments.
lib/Analysis/MemoryBuiltins.cpp
559 ↗(On Diff #53016)

We used to handle PHIs, but then we had a compile-time regression because alias analysis got too slow. We may introduce a switch, though, to be more or less aggressive.

spetrovic updated this revision to Diff 53208.Apr 11 2016, 3:52 AM
spetrovic edited edge metadata.
spetrovic marked 2 inline comments as done.

Comments addressed. Thank you for review.

This revision was automatically updated to reflect the committed changes.