This is an archive of the discontinued LLVM Phabricator instance.

[lldb][TypeSystemClang] Deduce lldb::eEncodingUint for unsigned enum types
ClosedPublic

Authored by Michael137 on Sep 22 2022, 3:41 PM.

Details

Summary

The motivating issue was the following:

$ cat main.cpp
enum class EnumVals : uint16_t {
    VAL1 = 0
};

struct Foo {
    EnumVals b1 : 4;
};

int main() {
    Foo f{.b1 = (EnumVals)8};

    return 0; // Break here
}

(lldb) script
>>> lldb.frame.FindVariable("f").GetChildMemberWithName("b1").GetValueAsUnsigned()
4294967288

In the above example we observe an unsigned integer wrap-around
because we sign-extended the bit-field's underlying Scalar value
before casting it to an unsigned integer. The sign extension occurs
because we don't correctly set APSInt::IsUnsigned == true when
extracting the value from memory (in Value::ResolveValue). The reason
sign extension causes the wrap-around is that the value we're assigning
to the bit-field is out of range when interpreted as signed, which
causes Scalar::sext to left-fill the Scalar with 1s.
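The arithmetic behind the observed 4294967288 can be sketched in a few lines of Python (the `sign_extend` helper is illustrative, not lldb code): the 4-bit pattern 0b1000 sign-extends to -8, which reinterpreted as an unsigned 32-bit value is 2**32 - 8.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide bit pattern to a full Python integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

# The 4-bit field holds the pattern 0b1000. Sign extension treats the
# top bit as a sign bit, yielding -8; reinterpreted as an unsigned
# 32-bit value that is 2**32 - 8.
extended = sign_extend(0b1000, 4)
print(extended)          # -8
print(extended % 2**32)  # 4294967288 -- the value GetValueAsUnsigned() returned
```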

This patch corrects GetEncoding to account for unsigned enum types.
With this change the Scalar would be zero-extended instead.

This is mainly a convenience fix which well-formed code wouldn't
encounter.
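The shape of the fix can be sketched as follows (hypothetical helper; the real change lives in TypeSystemClang::GetEncoding, and the constant names mirror lldb's lldb::eEncodingSint/eEncodingUint): the reported encoding should follow the signedness of the enum's underlying integer type rather than defaulting to signed.

```python
# Hypothetical sketch of the encoding decision for enum types. Before
# the patch, enum-typed values were effectively treated as signed; the
# fix consults the signedness of the enum's underlying integer type.
eEncodingSint = "eEncodingSint"
eEncodingUint = "eEncodingUint"

def encoding_for_enum(underlying_is_unsigned: bool) -> str:
    # EnumVals in the reproducer has uint16_t as its fixed underlying
    # type, so it should report an unsigned encoding and be
    # zero-extended when extracted from memory.
    return eEncodingUint if underlying_is_unsigned else eEncodingSint

print(encoding_for_enum(True))  # eEncodingUint
```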

rdar://99785324

Diff Detail

Event Timeline

Michael137 created this revision.Sep 22 2022, 3:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 22 2022, 3:41 PM
Michael137 requested review of this revision.Sep 22 2022, 3:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 22 2022, 3:41 PM
Michael137 added a comment.EditedSep 22 2022, 3:42 PM

Wasn't sure how to properly test this since the original reproducer technically relies on implementation-defined behaviour (i.e., initialising a bitfield with an out-of-range value). Suggestions are welcome

Michael137 edited the summary of this revision. (Show Details)Sep 22 2022, 3:46 PM
Michael137 edited the summary of this revision. (Show Details)

Wasn't sure how to properly test this since the original reproducer technically relies on implementation-defined behaviour (i.e., initialising a bitfield with an out-of-range value). Suggestions are welcome

Is this implementation-defined behaviour at least consistent for clang across platforms? I'm guessing most of the time the test suite programs are built with clang, though you can change that with a CMake variable. Perhaps you could do something like @skipifnotclang in that case.

It wouldn't need to be the same across platforms either, really. You can always @skipifnotarchwhatever, and as long as it's tested somewhere that's fine. Of course, clang could change its mind; we mitigate that by making sure the test would fail if it does, then updating it as needed.

  • Add API test

It wouldn't need to be the same across platforms either, really. You can always @skipifnotarchwhatever, and as long as it's tested somewhere that's fine. Of course, clang could change its mind; we mitigate that by making sure the test would fail if it does, then updating it as needed.

Fair enough, I added a test for this. Can skip/fix if compilers ever change their mind

labath added a subscriber: labath.Sep 23 2022, 2:55 AM

Wasn't sure how to properly test this since the original reproducer technically relies on implementation-defined behaviour (i.e., initialising a bitfield with an out-of-range value). Suggestions are welcome

I'm probably missing something, but what exactly is undefined about that test program? The number eight fits comfortably in four bits, and afaik it is a valid value for the EnumVals type because it has a fixed underlying type.

Other than that, this seems fine to me.

Wasn't sure how to properly test this since the original reproducer technically relies on implementation-defined behaviour (i.e., initialising a bitfield with an out-of-range value). Suggestions are welcome

I'm probably missing something, but what exactly is undefined about that test program? The number eight fits comfortably in four bits, and afaik it is a valid value for the EnumVals type because it has a fixed underlying type.

Other than that, this seems fine to me.

Ah yes, absolutely true! Misinterpreted it from some of the other reproducers.
It's enough for the value we assign (here 8) to be out of range of the bit-field (if it were signed). Since we previously treated
all bit-fields of enum type as signed, that would incorrectly left-fill with 1s.
E.g., if the bit-field were of width 16, then 32768 would sign-extend incorrectly. I'll update the test description accordingly
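The width-16 case mentioned above can be checked with a quick sketch: 32768 == 0x8000 sets the top bit of a 16-bit field, so a signed interpretation left-fills with 1s while an unsigned one preserves the value.

```python
# 32768 == 0x8000 sets the top bit of a 16-bit field. Interpreted as
# signed, sign extension left-fills the upper bits with 1s; interpreted
# as unsigned it is zero-extended and survives intact.
raw = 32768
signed = raw - 2**16 if raw & 0x8000 else raw
print(signed)          # -32768 (signed interpretation)
print(signed % 2**64)  # 18446744073709518848 (sign-extended, read as uint64)
print(raw)             # 32768 (zero-extended, as the fix produces)
```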

  • Fix test description
  • Reword commit message
labath accepted this revision.Sep 23 2022, 3:23 AM

Ok, sounds good then.

This revision is now accepted and ready to land.Sep 23 2022, 3:23 AM
werat added a subscriber: werat.Sep 23 2022, 4:56 AM

On a somewhat related note, is it expected that GetValueAsUnsigned() performs an overflow as int32 if the type is smaller than int64?

❯ cat overflow.cc 
#include <stdint.h>
int main() {
  int8_t i8 = -1;
  int32_t i32 = -1;
  int64_t i64 = -1;
  return 0;
}

(lldb) script

>>> lldb.frame.FindVariable("i8").GetValueAsUnsigned()
4294967295
>>> lldb.frame.FindVariable("i32").GetValueAsUnsigned()
4294967295
>>> lldb.frame.FindVariable("i64").GetValueAsUnsigned()
18446744073709551615
>>>

lol, I knew about the last two parts (not that I agree with them, but I think we have about an equal amount of code which relies on it, and that which tries to work around it), but the first one is really weird. I think we have invented a middle ground between sign- and zero-extension.
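The "middle ground" observation can be made concrete with a sketch: the int8_t result matches sign extension to 32 bits followed by zero extension to 64 bits, which is neither pure sign extension nor pure zero extension.

```python
# The int8_t case behaves as if -1 were sign-extended to 32 bits and
# then zero-extended to 64 bits -- a hybrid of the two schemes.
i8 = -1
print(i8 % 2**32)  # 4294967295: matches GetValueAsUnsigned() for int8_t and int32_t
print(i8 % 2**64)  # 18446744073709551615: matches the int64_t case
```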

werat added a comment.Sep 26 2022, 5:08 AM

lol, I knew about the last two parts (not that I agree with them, but I think we have about an equal amount of code which relies on it, and that which tries to work around it), but the first one is really weird. I think we have invented a middle ground between sign- and zero-extension.

haha, so this means there's no chance of fixing this? I have a workaround for this as well -- https://github.com/google/lldb-eval/blob/04a73616c012c3dac7fb11206511bd2a9fe16db4/lldb-eval/value.cc#L146
I can live with that, but the current behaviour does look like a bug.

lol, I knew about the last two parts (not that I agree with them, but I think we have about an equal amount of code which relies on it, and that which tries to work around it), but the first one is really weird. I think we have invented a middle ground between sign- and zero-extension.

haha, so this means there's no chance of fixing this? I have a workaround for this as well -- https://github.com/google/lldb-eval/blob/04a73616c012c3dac7fb11206511bd2a9fe16db4/lldb-eval/value.cc#L146
I can live with that, but the current behaviour does look like a bug.

That does look like a bug, thanks for reporting. I was going to open an issue and take a look at it this week.

shafik added a subscriber: shafik.Sep 27 2022, 7:10 PM

Note, in both C and C++, converting -1 to unsigned will always result in the maximum unsigned value, e.g.:

#include <iostream>
#include <cstdint>

int main() {
  int8_t i8 = -1;
  int32_t i32 = -1;

  unsigned x = i8;
  std::cout << x << "\n";

  x = i32;
  std::cout << x << "\n";
}

output:

4294967295
4294967295
werat added a comment.Sep 28 2022, 7:28 AM

Note, in both C and C++ converting a -1 to unsigned will always result in the max unsigned value e.g.:

When you convert to "unsigned" then yeah, but here the API is uint64_t GetValueAsUnsigned(). I would expect this to be the equivalent of converting to uint64_t, not unsigned.