This is an archive of the discontinued LLVM Phabricator instance.

[clang] Fix crash caused by PseudoObjectExprBitfields::NumSubExprs overflow
ClosedPublic

Authored by yronglin on Jul 9 2023, 4:26 AM.

Details

Summary

This patch makes the bit-fields wider, and also implement a small optimization for PseudoObjectExprBitfields, when there is no result in PseudoObjectExpr, we use 32 bits to store the number of subexpressions, otherwise, we use 16 bits to store the number of subexpressions, and use 16 bits to store the result indexes.

Fixes https://github.com/llvm/llvm-project/issues/63169

Diff Detail

Event Timeline

yronglin created this revision.Jul 9 2023, 4:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2023, 4:26 AM
yronglin requested review of this revision.Jul 9 2023, 4:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2023, 4:26 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
yronglin edited the summary of this revision. (Show Details)Jul 9 2023, 4:29 AM
yronglin added reviewers: rsmith, rjmccall, shafik.
yronglin added a reviewer: aaron.ballman.

I'm not sure if we should limit the value of NumSubExprs when build PseudoObjectExpr for __builtin_dump_struct, This cost too much memory when the nested members of the record are very deep and the num of member is very large.

yronglin edited the summary of this revision. (Show Details)Jul 9 2023, 4:38 AM

In general, I think this is a good approach. However, it sort of kicks the can down the road a bit; we will still overflow the member if there are enough fields. Would it make sense to also add a diagnostic to Sema so that overflow with the widened fields is diagnosed rather than causing a crash?

clang/include/clang/AST/Stmt.h
596–597
clang/test/SemaCXX/builtin-dump-struct.cpp
163
yronglin updated this revision to Diff 538709.Jul 10 2023, 9:55 AM

Address Aaron's comments.

yronglin updated this revision to Diff 538712.Jul 10 2023, 10:04 AM
yronglin marked 2 inline comments as done.

Update comments in PseudoObjectExpr.

In general, I think this is a good approach. However, it sort of kicks the can down the road a bit; we will still overflow the member if there are enough fields. Would it make sense to also add a diagnostic to Sema so that overflow with the widened fields is diagnosed rather than causing a crash?

Thanks you for take a look!

Would it make sense to also add a diagnostic to Sema so that overflow with the widened fields is diagnosed rather than causing a crash?

Ah, I think your are right. It doesn't make sense, As the comments for PseudoObjectExprBitfields says These don't need to be particularly wide, because they're strictly limited by the forms of expressions we permit. I think developers who use PseudoObjectExprBitfields need to be more careful.

We had that discussion in the bug report. Let's just increase the widths unconditionally; this is not such a common expression class that we need to worry about using an extra word.

yronglin added a comment.EditedJul 11 2023, 4:08 AM

We had that discussion in the bug report. Let's just increase the widths unconditionally; this is not such a common expression class that we need to worry about using an extra word.

Thanks for your review!

Let's just increase the widths unconditionally

Do you mean:

class PseudoObjectExprBitfields {
    friend class ASTStmtReader; // deserialization
    friend class PseudoObjectExpr;

    unsigned : NumExprBits;
    unsigned NumSubExprs : 16;
    unsigned ResultIndex : 16;
  };

There are 14 padding bits we can not use it to extent NumSubExprs and ResultIndex which after unsigned : NumExprBits

yronglin updated this revision to Diff 539152.Jul 11 2023, 9:30 AM

Address John's comment.

rjmccall added inline comments.Jul 11 2023, 10:17 AM
clang/include/clang/AST/Stmt.h
596–597

Please remove the comment, which is incorrect. Otherwise, I think this is fine.

aaron.ballman accepted this revision.Jul 11 2023, 11:28 AM

LGTM, thank you for the fix!

This revision is now accepted and ready to land.Jul 11 2023, 11:28 AM
yronglin updated this revision to Diff 539493.Jul 12 2023, 4:24 AM

Address comment

yronglin marked an inline comment as done.Jul 12 2023, 4:25 AM
yronglin added inline comments.
clang/include/clang/AST/Stmt.h
596–597

Thanks, removed.

yronglin marked an inline comment as done.Jul 12 2023, 4:26 AM

Thanks for your review! @aaron.ballman @rjmccall

Wait for CI green

dblaikie added inline comments.Jul 24 2023, 7:59 PM
clang/include/clang/AST/Stmt.h
596–597

Could/should we add some error checking in the ctor to assert that we don't overflow these longer values/just hit the bug later on?

(& could we use unsigned short here rather than bitfields?)

aaron.ballman added inline comments.Jul 25 2023, 6:11 AM
clang/include/clang/AST/Stmt.h
596–597

We've already got them packed in with other bit-fields from the expression bits, so I think it's reasonable to continue the pattern of using bit-fields (that way we don't accidentally end up with padding between the unnamed bits at the start and the named bits in this object).

I think adding some assertions would not be a bad idea as a follow-up.

dblaikie added inline comments.Jul 25 2023, 3:33 PM
clang/include/clang/AST/Stmt.h
596–597

Maybe some unconditional (rather than only in asserts builds) error handling? (report_fatal_error, if this is low priority enough to not have an elegant failure mode, but something where we don't just overflow and carry on would be good... )

dblaikie added inline comments.Jul 31 2023, 9:52 AM
clang/include/clang/AST/Stmt.h
596–597

Ping on this? I worry this code has just punted the same bug further down, but not plugged the hole/ensured we don't overflow on novel/larger inputs.

yronglin marked 4 inline comments as done.Aug 8 2023, 7:47 AM
yronglin added inline comments.
clang/include/clang/AST/Stmt.h
596–597

Sorry for the late reply, I was looking through the emails and found this. I agree add some assertions to check the value is a good idea, It's easy to help people catch bugs, at least with when -DLLVM_ENABLE_ASSERTIONS=ON, and I'm glad to work on it, but one thing that worries me is that, in ASTReader, we access this field directly, not through the constructor or accessor, and we have to add assertions everywhere. https://github.com/llvm/llvm-project/blob/05b4310c8aec7a050574277ced08a0ab86b27681/clang/lib/Serialization/ASTReaderStmt.cpp#L1382

aaron.ballman added inline comments.Aug 8 2023, 9:24 AM
clang/include/clang/AST/Stmt.h
596–597

I don't think we have to add too many assertions. As best I can tell, we'll need one in each of the PseudoObjectExpr constructors and one in ASTStmtReader::VisitPseudoObjectExpr(), but those are the only places we assign a value into the bit-field. Three assertions isn't a lot, but if we're worried, we could add a setter method that does the assertion and use the setter in all three places.

dblaikie added inline comments.Aug 9 2023, 10:52 AM
clang/include/clang/AST/Stmt.h
596–597

My concern wasn't (well, wasn't entirely) about adding more assertions - but about having a reliable error here. The patch only makes the sizes larger, but doesn't have a hard-stop in case those sizes are exceeded again (which, admittedly, is much harder to do - maybe it's totally unreachable now, for all practical purposes?)

I suspect with more carefully constructed recursive inputs could still reach the higher limit & I think it'd be good to fail hard in that case in some way? (it's probably rare enough that a report_fatal_error would be not-the-worst-thing-ever)

But good assertions would be nice too (the old code only failed when you hit /exactly/ on just the overflow value, and any more than that the wraparound would not crash/fail, but misbehave) - I did add the necessary assertion to ArrayRef (begin <= end) which would've helped detect this more reliably, but some assert checking for overflow in the ctor would be good too (with all the usual nuance/care in checking for overflow) - unless we're going to make that into a fatal or other real error.

yronglin marked 3 inline comments as done.
yronglin added inline comments.
clang/include/clang/AST/Stmt.h
596–597

Sorry for the very late reply. I have no preference between assertion and llvm_unreachable, if error then fail fast is looks good. I have a patch D158296 to add assertion.

dblaikie added inline comments.Aug 18 2023, 12:12 PM
clang/include/clang/AST/Stmt.h
596–597

Thanks for the assertions - though they still haven't met my main concern that this should have a hard failure even in a non-assertions build.

I know we don't have a perfect plan/policy for these sort of "run out of resources/hit a representational limit" issues (at least I don't think we do... do we, @aaron.ballman ? I know we have some limits (recursion, template expansion, etc) but they're fairly specific/aren't about every possible case of integer overflow in some representational element, etc) but we've seen this one is pretty reachable.

Here's a test case that would still trigger the assertion, and trigger UB in a non-assertions build:

truct t1 { };
template<typename T1>
struct templ {
    T1 v1;
    T1 v2;
    T1 v3;
    T1 v4;
};

struct t2 {
  templ<templ<templ<templ<templ<templ<t1>>>>>> c0;
  templ<templ<templ<templ<templ<templ<t1>>>>>> c1;
  templ<templ<templ<templ<templ<templ<t1>>>>>> c2;
};

void aj(...);
void f1(t2 w) { __builtin_dump_struct(&w, aj); }

(used templates to pack this a bit more densely than the original test case) - the sizeof the struct is certainly a bit outlandish (~12kbytes) bit not, I think, totally unreasonable?

yronglin marked an inline comment as done.Aug 19 2023, 8:33 AM
yronglin added inline comments.
clang/include/clang/AST/Stmt.h
596–597

Thanks for your example. I have three ways:

  1. use llvm_unreachable to emit a hard failure but not an assertion.
  2. extend these two field to 32-bit unsigned, it's may big enough.
  3. limit the functionality of __builtin_dump_struct, if there are too many fields in a struct, the part exceeding the limit will not be output, and replaced with ...(maybe).

WDYT? You guys are expert in clang, and I would like to wait for your guidance :)

yronglin marked an inline comment as done.Aug 19 2023, 8:36 AM
yronglin added inline comments.Aug 19 2023, 8:42 AM
clang/include/clang/AST/Stmt.h
596–597
aaron.ballman added inline comments.Aug 21 2023, 9:20 AM
clang/include/clang/AST/Stmt.h
596–597

I know we don't have a perfect plan/policy for these sort of "run out of resources/hit a representational limit" issues (at least I don't think we do... do we, @aaron.ballman ? I know we have some limits (recursion, template expansion, etc) but they're fairly specific/aren't about every possible case of integer overflow in some representational element, etc) but we've seen this one is pretty reachable.

Correct, we don't have a general mechanism for handling resource limits, we mostly play whack-a-mole when we run into them. So having a general utility that can work for other bit-fields would be really nice. However, __builtin_dump_struct is a debugging interface and not something we expect users to call particularly often, so some sharp edges with the interface are not the end of the world IMO.

limit the functionality of __builtin_dump_struct, if there are too many fields in a struct, the part exceeding the limit will not be output, and replaced with ...(maybe).

I think this is basically what @dblaikie was asking for -- if there are too many fields in the structure, either give a diagnostic that the structure is too complex for us to dump, or cut off the output somewhere sensible, etc. @dblaikie, do you have a preference for diagnostic vs prettier output? Ideally, I lean towards prettier output, but at the same time, that might be a bigger ask than what you were after.

dblaikie added inline comments.Aug 23 2023, 4:36 PM
clang/include/clang/AST/Stmt.h
596–597

Yeah, I don't have major/heroic requirements here - just that we don't crash on these inputs. An error, wider integers, both, etc, it's all good to me.

just an error seems fine to me - as you say, it's basically a debugging interface, people shouldn't be making this some loadbearing/totally generic part of an API or somesuch.

aaron.ballman added inline comments.Aug 24 2023, 6:10 AM
clang/include/clang/AST/Stmt.h
596–597

Thank you! I'm also fine with going with just an error.

yronglin marked 2 inline comments as done.Aug 24 2023, 9:40 AM
yronglin added inline comments.
clang/include/clang/AST/Stmt.h
596–597

I'm fine too! But I have some concerns, the BuiltinDumpStructGenerator create call of print functions. but the call of print functions not only for field, but also for { and '}', so I have no idea to describe the problem in the diagnostic message, 'too many fields in struct' doesn't feel right to me. Do you have any other good idea? I'd be happy to continue cook D158296

yronglin marked an inline comment as done.Aug 24 2023, 11:09 AM
aaron.ballman added inline comments.Aug 25 2023, 5:16 AM
clang/include/clang/AST/Stmt.h
596–597

Hmm, I think I'd be fine with something along the lines of: class|struct|union 'blah' is too complex to dump and leave "too complex" vague (because this is a debugging interface and it involves some quite questionably-sized objects, I think it's fine for the QoI bar to be lower for this diagnostic). WDYT?

dblaikie added inline comments.Aug 25 2023, 2:09 PM
clang/include/clang/AST/Stmt.h
596–597

"too complex" seems like totally the right phrasing to me

yronglin marked 2 inline comments as done.Aug 30 2023, 10:36 AM
yronglin added inline comments.
clang/include/clang/AST/Stmt.h
596–597

Thanks for your suggestions, I have updated D158296