This is an archive of the discontinued LLVM Phabricator instance.

[ubsan] Detect UB loads from bitfields
ClosedPublic

Authored by vsk on Feb 27 2017, 12:00 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

vsk created this revision.Feb 27 2017, 12:00 PM
arphaman added inline comments.Mar 6 2017, 9:29 AM
test/CodeGenCXX/ubsan-bitfields.cpp
21 ↗(On Diff #89914)

Can we avoid the check if the bitfield is 2 bits wide?

test/CodeGenObjC/ubsan-bool.m
25 ↗(On Diff #89914)

One unrelated thing that I noticed in the IR, the zexts to i64 are emitted before the branch, even though they are used only in the invalid_value blocks. I know that the optimizer can move those anyway, but would there any be any benefit in moving them into the blocks at the frontend IRGen level?

26 ↗(On Diff #89914)

Is it possible to avoid the check here since the bitfield is just one bit wide?

vsk added a comment.Mar 6 2017, 6:19 PM

Thanks for the feedback!

test/CodeGenCXX/ubsan-bitfields.cpp
21 ↗(On Diff #89914)

I don't think so, because if the user memset()'s the memory backing the struct, we could still load a value outside of {1, 2, 3} from the bitfield.

test/CodeGenObjC/ubsan-bool.m
25 ↗(On Diff #89914)

I'm not sure. Maybe it would make the live range of the zext smaller at -O0? I'll look into this and see if there's a real perf impact.

26 ↗(On Diff #89914)

No, a user could do:

struct S1 s;
s.b1 = 1;
if (s.b1 == 1) // evaluates to false, s.b1 is negative

With this patch we get:

runtime error: load of value -1, which is not a valid value for type 'BOOL' (aka 'signed char')

I think this might miss loads from bitfield ivars. Also, what about the conversion that happens for properties whose backing ivar is a bitfield? (or does that happen in the runtime? can't remember)

test/CodeGenObjC/ubsan-bool.m
25 ↗(On Diff #89914)

@arphaman if it can be done easily/eleganly during IRGen, then this is a compile-duration win.

26 ↗(On Diff #89914)

What if BOOL is an unsigned char, or a char that's not signed?

vsk added a comment.Mar 7 2017, 9:40 AM

I think this might miss loads from bitfield ivars.

I'll add a test that shows that this case is covered.

Also, what about the conversion that happens for properties whose backing ivar is a bitfield? (or does that happen in the runtime? can't remember)

I'll also check that synthesized getters have the range check.

test/CodeGenObjC/ubsan-bool.m
26 ↗(On Diff #89914)

Good point, we don't need to emit the range check if the bitfield is an unsigned, 1-bit wide BOOL. Would you be OK with me tackling that in a follow-up patch?

vsk updated this revision to Diff 90869.Mar 7 2017, 9:42 AM
  • Improve objc test coverage per Jon's suggestions.
vsk added inline comments.Mar 7 2017, 9:45 AM
test/CodeGenCXX/ubsan-bitfields.cpp
21 ↗(On Diff #89914)

Sorry, I misread your question as 'if the enum can be represented in 2 bits'. You're right, the check can be skipped if the _bitfield_ is 2 bits wide. I think this can be handled along with the 1-bit unsigned BOOL case in a follow up.

jroelofs accepted this revision.Mar 7 2017, 10:01 PM

LGTM

test/CodeGenObjC/ubsan-bool.m
50 ↗(On Diff #90869)

hmmm. just occurred to me that this value is always going to be 0xff or 0x0, so it might be useful if there were also a frontend warning that complains about signed bitfield bools (maybe something useful for another patch).

26 ↗(On Diff #89914)

No problem, sounds good to me.

This revision is now accepted and ready to land.Mar 7 2017, 10:01 PM
filcab added inline comments.Mar 8 2017, 7:47 AM
test/CodeGenObjC/ubsan-bool.m
50 ↗(On Diff #90869)

I'm more ok with this, assuming it makes sense in ObjC.

26 ↗(On Diff #89914)

I don't like generating an error here.
-1 is a perfectly valid BOOL (signed char) value and it's a perfectly valid value to have in a (signed) 1-wide bitfield.

vsk added inline comments.Mar 8 2017, 8:56 AM
test/CodeGenObjC/ubsan-bool.m
50 ↗(On Diff #90869)

Yes, we have an internal feature request for such a warning.

26 ↗(On Diff #89914)

True/false should be the only values loaded from a boolean. Since we made ubsan stricter about the range of BOOL (r289290), the check has caught a lot of bugs in our software. Specifically, it's helped root out buggy BOOL initialization and portability problems (the signedness of BOOL is not consistent on Apple platforms). There have been no false positives so far.

Is this an issue you think needs to be addressed in this patch?

filcab edited edge metadata.Mar 8 2017, 9:18 AM

LGTM since my issue is only an issue on ObjC platforms and it seems those are the semantics it should have there.

test/CodeGenObjC/ubsan-bool.m
26 ↗(On Diff #89914)

It looks weird that we would disallow using a 1-bit (signed) bitfield for storing BOOL. But since this will only be a problem on obj-c, I don't think it will be a problem in general. If you're saying those are the semantics, then I'm ok with it.
P.S: If it documented that the only possible values for BOOL are YES and NO (assuming they are 1 and 0)?
If so, I wonder if it's planned to document that you can't store a YES on a BOOL bitfield with a width of 1, since it looks like a weird case.

vsk added a comment.Mar 8 2017, 9:25 AM

LGTM since my issue is only an issue on ObjC platforms and it seems those are the semantics it should have there.

Thanks for the review.

P.S: If it documented that the only possible values for BOOL are YES and NO (assuming they are 1 and 0)?

I think this more or less counts: https://developer.apple.com/reference/objectivec/objective_c_runtime/boolean_values

If so, I wonder if it's planned to document that you can't store a YES on a BOOL bitfield with a width of 1, since it looks like a weird case.

I will ask around. There is a warning about signedness issues here: https://developer.apple.com/reference/objectivec/bool?language=objc

This revision was automatically updated to reflect the committed changes.