We frequently run into user bugs caused by UB loads of out-of-range values from enum / BOOL bitfields. Teach UBSan to diagnose the issue.
Details
- Reviewers
- filcab - jroelofs - arphaman - rsmith 
- Commits
- rG129edab12593: Retry: [ubsan] Detect UB loads from bitfields
 rG5c13623a69ba: [ubsan] Detect UB loads from bitfields
 rC297389: Retry: [ubsan] Detect UB loads from bitfields
 rC297298: [ubsan] Detect UB loads from bitfields
 rL297389: Retry: [ubsan] Detect UB loads from bitfields
 rL297298: [ubsan] Detect UB loads from bitfields
Diff Detail
Event Timeline
| test/CodeGenCXX/ubsan-bitfields.cpp | ||
|---|---|---|
| 21 | Can we avoid the check if the bitfield is 2 bits wide? | |
| test/CodeGenObjC/ubsan-bool.m | ||
| 25 | One unrelated thing that I noticed in the IR, the zexts to i64 are emitted before the branch, even though they are used only in the invalid_value blocks. I know that the optimizer can move those anyway, but would there any be any benefit in moving them into the blocks at the frontend IRGen level? | |
| 26 | Is it possible to avoid the check here since the bitfield is just one bit wide? | |
Thanks for the feedback!
| test/CodeGenCXX/ubsan-bitfields.cpp | ||
|---|---|---|
| 21 | I don't think so, because if the user memset()'s the memory backing the struct, we could still load a value outside of {1, 2, 3} from the bitfield. | |
| test/CodeGenObjC/ubsan-bool.m | ||
| 25 | I'm not sure. Maybe it would make the live range of the zext smaller at -O0? I'll look into this and see if there's a real perf impact. | |
| 26 | No, a user could do: struct S1 s; s.b1 = 1; if (s.b1 == 1) // evaluates to false, s.b1 is negative With this patch we get: runtime error: load of value -1, which is not a valid value for type 'BOOL' (aka 'signed char') | |
I think this might miss loads from bitfield ivars. Also, what about the conversion that happens for properties whose backing ivar is a bitfield? (or does that happen in the runtime? can't remember)
| test/CodeGenObjC/ubsan-bool.m | ||
|---|---|---|
| 25 | @arphaman if it can be done easily/eleganly during IRGen, then this is a compile-duration win. | |
| 26 | What if BOOL is an unsigned char, or a char that's not signed? | |
I'll add a test that shows that this case is covered.
Also, what about the conversion that happens for properties whose backing ivar is a bitfield? (or does that happen in the runtime? can't remember)
I'll also check that synthesized getters have the range check.
| test/CodeGenObjC/ubsan-bool.m | ||
|---|---|---|
| 26 | Good point, we don't need to emit the range check if the bitfield is an unsigned, 1-bit wide BOOL. Would you be OK with me tackling that in a follow-up patch? | |
| test/CodeGenCXX/ubsan-bitfields.cpp | ||
|---|---|---|
| 21 | Sorry, I misread your question as 'if the enum can be represented in 2 bits'. You're right, the check can be skipped if the _bitfield_ is 2 bits wide. I think this can be handled along with the 1-bit unsigned BOOL case in a follow up. | |
| test/CodeGenObjC/ubsan-bool.m | ||
|---|---|---|
| 26 | True/false should be the only values loaded from a boolean. Since we made ubsan stricter about the range of BOOL (r289290), the check has caught a lot of bugs in our software. Specifically, it's helped root out buggy BOOL initialization and portability problems (the signedness of BOOL is not consistent on Apple platforms). There have been no false positives so far. Is this an issue you think needs to be addressed in this patch? | |
| 67 | Yes, we have an internal feature request for such a warning. | |
LGTM since my issue is only an issue on ObjC platforms and it seems those are the semantics it should have there.
| test/CodeGenObjC/ubsan-bool.m | ||
|---|---|---|
| 26 | It looks weird that we would disallow using a 1-bit (signed) bitfield for storing BOOL. But since this will only be a problem on obj-c, I don't think it will be a problem in general. If you're saying those are the semantics, then I'm ok with it. | |
Thanks for the review.
P.S: If it documented that the only possible values for BOOL are YES and NO (assuming they are 1 and 0)?
I think this more or less counts: https://developer.apple.com/reference/objectivec/objective_c_runtime/boolean_values
If so, I wonder if it's planned to document that you can't store a YES on a BOOL bitfield with a width of 1, since it looks like a weird case.
I will ask around. There is a warning about signedness issues here: https://developer.apple.com/reference/objectivec/bool?language=objc
Can we avoid the check if the bitfield is 2 bits wide?