This fixes https://bugs.llvm.org/show_bug.cgi?id=51652.
The idea is to bump all the stat fields to 64-bit wide unsigned integers. I've confirmed this resolves the use case for chromium.
Differential D109217
[llvm-dwarfdump] Fix unsigned overflow when calculating stats djtodoro on Sep 3 2021, 2:44 AM. Authored by
Details This fixes https://bugs.llvm.org/show_bug.cgi?id=51652. The idea is to bump all the stat fields to 64-bit wide unsigned integers. I've confirmed this resolves the use case for chromium.
Diff Detail Event Timeline
Comment Actions Seems like a generally reasonabel direction forward.
Comment Actions Could something ala: Comment Actions Probably enough to write the portable code (so it works on MSVC too, etc) and let the compiler optimize it. At least for this saturation code, Clang produces the same with or without the intrinsic (& GCC produces something else entirely - to itself and to clang): https://godbolt.org/z/jb3nnaKvT Comment Actions Perhaps. (though anything that asserts in debug mode should basically be UB in non-debug mode, in my opinion - if you aren't testing/using the functionality, it shouldn't be defined) For ints we've got UBSan, but unsigned ints are defined on wrap. There's no unsigned type that's UB on overflow - certainly might be nice to have them to clarify the difference between a think you want to do weird bitfiddling with and expect all the overflow, etc, and a thing that's meant to do maths and where sanitizers could diagnose overflow, etc. But I think a couple of manual overflow checks here is probably OK - might be worth putting it in a generic function and applying it to all the statistics to make things more robust/generic. Comment Actions
-fsanitize=unsigned-integer-overflow ? Comment Actions Oh, right, we do have that :) (but no doubt LLVM isn't remotely clean of failures for it) Comment Actions
Comment Actions Just one more inline question from me, but I will defer to the other reviewers for the rest & approval. Thanks for fixing this.
Comment Actions Let me revisit the saturating integer without asserts. If you print 5 for an uint32_t, you will never know whether it overflowed never or 10 times (in release mode). A saturating integer will print 5 or max int (saturated). Even an SaturatingUint<uint64_t> shouldn't yield too much overhead. Comment Actions Hmm, is there an implementation of the SaturatingUint, or do we need to implement such type? Comment Actions Not at the moment. I just wanted to pitch the idea of having a saturating integer in LLVM. Comment Actions I think we can implement it here, and it will be useful/safe. Comment Actions It seems like the down side of using asserts to detect the "overflow" (the patch's current approach) is that release-config users may still get misleading stats due to wrapping. Implementing a saturating int in and of itself doesn't seem like a full solution, since a user may not notice that the stat is the saturated value, or even know that it's special, especially if the stats are consumed by another tool. IMO when the stat cannot be computed properly - however the detection is implemented, either with saturating ints, checks like the ones in the asserts, or something else - a good solution for users would be to print a message, and either skip printing the "bad" stats or all of them. That would be consistent for all build configurations and avoid hiding the issue. What do you think? Sorry if I'm just stating the obvious!
Comment Actions You will always know whether it is max int or max int because of saturating behaviour. class SatUint32 { uint32_t value; bool overflowed; } Comment Actions The saturating integer class would use the builtins I mentioned above to perform arithmetic operations on value and detect overflow and set overflow to true. Comment Actions I'd just saturate to max int, and use the max int value to indicate overflow. Shaving one value off to represent the overflow state seems fine to me. Comment Actions That is fine be me. I guess the point is a save way to collect statistics and give guidance to users when the results could be bad, in release and debug mode. I would argue that saturating integers are different and maybe more precise solution than going from uint32_t to uint64_t ... Comment Actions Well, both, probably - support use cases that weren't supported before (binaries that were too large to fit in the existing stats) and, separately/additionally, some way of reporting overflow rather than reporting bogus values.
Comment Actions
"#call site DIEs": N (llvm-dwarfdump: warning: this field overflows), Comment Actions Maybe we could render it symbolically and just say: "#call site DIEs": >= 9223372036854775807 But yeah, maybe the warning is more suitable, not sure - I'll leave it up to you folks to decide what's best. Comment Actions I'd prefer the warning since it will be easier when parsing the JSON data from utilities such as llvm-locstats.
Comment Actions I think this is reasonable, out of curiosity, would there be a benefit to using APInt? Probably not because 64 bits is already huge... Comment Actions I guess we can use APInt as well, but the 64 bits seem enough for now. A challenge would be to teach all the front-ends how to parse the big numbers (>64bits), but it is achievable. |
I think it'd be worth CHECKing the specific/full syntax, rather than just "this warning text appears somewhere in the output" - since we're specifically putting it in the output in a particular place.
Hmm, that raises a question: is this warning going to stderr, but the actual stats output is to stdout? If so then I think that's a different problem. Maybe that answers one of my other questions though when I suggested printing the output as "> max int" - https://reviews.llvm.org/D109217#3012175 - is that what you meant in this comment? That the value in the JSON data doesn't have any mention of the warning/overflow, and only has the saturated integer value?
I worry that's error-prone though, since the value is incorrect and a tool might not be aware of that. So I think it may be valuable to ensure we don't encode a valid value/something that could be mistaken for a valid value in the field when it's overflowed?
JSON supports the value being a string, I assume - so perhaps a string representation of ">= max int" or "overflowed" or something would be suitable?