DIEnumerator stores an APInt as of April 2020, so now we don't need to
truncate the enumerator value to 64 bits. Fixes assertions during IRGen.
Split from D105320, thanks to Matheus Izvekov for the test case and
report.
Differential D106585
Fix clang debug info irgen of i128 enums rnk on Jul 22 2021, 12:46 PM. Authored by
Details DIEnumerator stores an APInt as of April 2020, so now we don't need to Split from D105320, thanks to Matheus Izvekov for the test case and
Diff Detail
Event TimelineComment Actions Small nit but otherwise LGTM :)
Comment Actions Preserving existing behavior sounds OK - maybe some comment about that it might be good to remove so the next person who looks at it knows there's something not-quite-fully-reasoned here (& the comment about GCC's representation choice still seems off - GCC does use the signedness of the enumerators, not only the enumeration).
Comment Actions @rnk - any thoughts on the comments? (the existing comment about GCC's emission seems incorrect to me (or subtly correct, but probably easy to misunderstand) - GCC does use different DWARF encodings for negative enumerators, depending on the enumerator) And seems like a comment about how this is maybe confusing/in need of some more touch-up might be handy for the next reader? (specifically that the current DWARF backend ignores the "isUnsigned" flag entirely, and relies on the signedness of the enumerator's underlying type instead - so maybe we could remove the isUnsigned flag?) Comment Actions I wasn't aware that the backend ignored this info. If that's the case, then I think we can carry the signedness over from the enumerator, update the enum2 IRgen test, and someone can update the DWARF backend later if they want to match GCC more closely. I think C and GCC's behavior is driven by a desire to make enumerators behave as close as possible to an integer literal macro definition, which could explain the per-enumerator signedness, similar to how certain ways of spelling a literal affect its type. Comment Actions Yeah, perhaps that's the motivation, though it seems like somewhat of a lost cause, I think? Since there are lots of spellings of an enumerator value where this property wouldn't be preserved (a call to a constexpr function, for instance) - so it seems to me that the value would be better off using the domain of the enumeration rather than something about how an enumerator was spelled. Ah well, hopefully this discussion here will be findable if/when someone wants to look into it more in the future. Thanks! Comment Actions I'm curious: can you share an example dwarfdump diff that shows what is non-deterministic? Comment Actions Reviewing the code, I don't see any obvious sources of non-determinism (hash iteration). The only source I can imagine is uninitialized memory usage in the APInt. Let me know if this can be repro'd somehow. Comment Actions Actually, there are some remaining cases not covered by that. I'm trying to produce a small reproducer. Comment Actions Yes, cherry-picking rG3c47c5ca13b8a502de3272e8105548715947b7a8, rGbadcd585897253e94b7b2d4e6f9f430a2020d642 and reverting the changes to clang/lib/CodeGen/CGDebugInfo.cpp from rG323049329939becf690adbeeff9f5f7e219075ec and rGf9f56488e02d1c09a9cd4acde61ce1c712e71405 fixes the non-determinism in Firefox with clang 13.0.0. Comment Actions So far, what I've found is that in some cases, DIEnumerator::get returns the same node for similar enumerator values in different enums that happen to have the same name and value (even if their bitwidth differs), but sometimes not. DIEnumerator::get (Name={Data@0x5575ee17ea38="nsNumberControlFrame_id", Length=23}, IsUnsigned: optimized out, Value={U={VAL=50, pVal=0x32}, BitWidth=8}, Context@0x5575e8305da0.pImpl=0x5575e8307230) = 0x55760563a708 DIEnumerator::get (Name={Data@0x5575ee17ea38="nsNumberControlFrame_id", Length=23}, IsUnsigned: optimized out, Value={U={VAL=50, pVal=0x32}, BitWidth=32}, Context@0x5575e8305da0.pImpl=0x5575e8307230) = 0x5576067e7148 In another compilation of the same file with the same flags: DIEnumerator::get (Name={Data@0x561c659e0cc8="nsNumberControlFrame_id", Length=23}, IsUnsigned: optimized out, Value={U={VAL=50, pVal=0x32}, BitWidth=8}, Context@0x561c5fb68da0.pImpl=0x561c5fb6a230) = 0x561c7ce9dbf8 DIEnumerator::get (Name={Data@0x561c659e0cc8="nsNumberControlFrame_id", Length=23}, IsUnsigned: optimized out, Value={U={VAL=50, pVal=0x32}, BitWidth=32}, Context@0x561c5fb68da0.pImpl=0x561c5fb6a230) = 0x561c7ce9dbf8 (it's worth noting that both enums have, like, 155 items, all of which are identical (except for their bitwidth), and only one of them is deduplicated this way, sometimes) Comment Actions Reduced testcase: enum FrameIID { nsBox_id, nsIFrame_id, nsHTMLFramesetBlankFrame_id, nsHTMLFramesetBorderFrame_id, }; enum class ClassID : unsigned char { nsBox_id, nsIFrame_id, nsHTMLFramesetBlankFrame_id, nsHTMLFramesetBorderFrame_id, }; extern void foo(FrameIID, ClassID); void bar(FrameIID f, ClassID c) { foo(f, c); } Compile with clang++ -o test.ll -S -emit-llvm -g test.cpp a number of times, and observe that sometimes the output is different (about 1 in 20 on my machine). Add items to the enums to make it more frequent. Comment Actions This usage of isSameValue seems suspicious: https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/LLVMContextImpl.h#L389 It seems to allow the possibility that APInts of differing bitwidth compare equal, but the hash value incorporates the bitwidth, so they may be inserted into differing hash buckets. Comment Actions yup, also checking the bit width makes the non-determinism go away, I'll send out a patch |
Is the value already signed appropriately?
Removing this line of code (and the bool IsSigned variable, so it doesn't break -Werror=unused-variable) doesn't cause any tests to fail, that I can see.