Page MenuHomePhabricator

[CodeGen] Make uninitialized Lvalue bit-field stores poison compatible
AcceptedPublic

Authored by jmciver on Jun 24 2022, 12:39 AM.

Details

Summary

Poison compatibility is provided by emitting a freeze instruction, post load,
for all Lvalue bit-field store operations. This solution is indiscriminate to
the initialization state and thus the freeze instruction may hide poison during
subsequent re-assignments.

This patch implements the first proposed solution from the following RFC:
https://discourse.llvm.org/t/rfc-making-bit-field-codegen-poison-compatible/63250

The test-suite is passing; my current setup is not idea for timing information.

Diff Detail

Event Timeline

jmciver created this revision.Jun 24 2022, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2022, 12:39 AM
jmciver edited the summary of this revision. (Show Details)Jun 24 2022, 1:01 AM
jmciver published this revision for review.Jun 24 2022, 1:42 AM
jmciver edited the summary of this revision. (Show Details)
jmciver added reviewers: rjmccall, eli.friedman.
jmciver added a subscriber: nlopes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2022, 1:43 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
rjmccall accepted this revision.Jun 24 2022, 6:10 AM

Seems right to me.

This revision is now accepted and ready to land.Jun 24 2022, 6:10 AM

Under this scheme, is it illegal to link together object files built with -ffine-grained-bitfield-accesses and object files built with -fno-fine-grained-bitfield-accesses?

Do we want to add a temporary option to control this, to make it easier for people to benchmark any performance differences?

Under this scheme, is it illegal to link together object files built with -ffine-grained-bitfield-accesses and object files built with -fno-fine-grained-bitfield-accesses?

No, you can still link those. There's no ABI change nor any interference at IR level.
Thanks for the pointer, I wasn't aware of that option. What we can do is to optimize away the freezes with -ffine-grained-bitfield-accesses. The freeze is only needed when a struct field is shared between more than one bitfield.

Regarding perf, it's looking good, but @jmciver will post here a summary later.

No, you can still link those. There's no ABI change nor any interference at IR level.

The scenario I was thinking of with -ffine-grained-bitfield-accesses is something like the following:

File A:

struct X { int a : 8; int b : 24; };
void f(struct X*);
int g() {
    struct X x;
    x.a = 10;
    f(&x);
    return x.a;
}

File B:

struct X { int a : 8; int b : 24; };
void f(struct X* x) {
    x->b = 10;
}

If both files are compiled -ffine-grained-bitfield-accesses, the fields don't overlap. If both files are compiled with -fno-fine-grained-bitfield-accesses, the assignment in file A freezes both fields of "x". If file A is compiled with -ffine-grained-bitfield-accesses, but file B is not, f() corrupts the field "a", so g() returns poison (if I'm not missing anything?).

No, you can still link those. There's no ABI change nor any interference at IR level.

The scenario I was thinking of with -ffine-grained-bitfield-accesses is something like the following:

File A:

struct X { int a : 8; int b : 24; };
void f(struct X*);
int g() {
    struct X x;
    x.a = 10;
    f(&x);
    return x.a;
}

File B:

struct X { int a : 8; int b : 24; };
void f(struct X* x) {
    x->b = 10;
}

If both files are compiled -ffine-grained-bitfield-accesses, the fields don't overlap. If both files are compiled with -fno-fine-grained-bitfield-accesses, the assignment in file A freezes both fields of "x". If file A is compiled with -ffine-grained-bitfield-accesses, but file B is not, f() corrupts the field "a", so g() returns poison (if I'm not missing anything?).

You are right, thanks! f() corrupts a because f assumes the fields were both initialized or neither of them was initialized.
The fix is not trivial though, because -ffine-grained-bitfield-accesses would have to freeze the adjacent fields.
This only matters if the IRs are linked together with IPO. Otherwise, at object level it doesn't really matter.

Do you think we can get away by just documenting the incompatibility of doing IPO with files compiled with different -ffine-grained-bitfield-accesses flags?

This only matters if the IRs are linked together with IPO. Otherwise, at object level it doesn't really matter.

Right.

Do you think we can get away by just documenting the incompatibility of doing IPO with files compiled with different -ffine-grained-bitfield-accesses flags?

This seems like something that would be very easy to get wrong; I'd prefer not to trust that users know what their build systems are doing.

We could use a module flag to forbid linking together modules with different settings. Alternatively, we could mess with the code we generate with -ffine-grained-bitfield-accesses: use some intrinsic to freeze the adjacent bitfields, so a bitfield store has the same semantics with or without -ffine-grained-bitfield-accesses.

Is there no way in LLVM to load partially-initialized memory and then freeze it that will preserve the parts of the value that are initialized? Because that seems like something that LLVM needs to be providing.

Is there no way in LLVM to load partially-initialized memory and then freeze it that will preserve the parts of the value that are initialized? Because that seems like something that LLVM needs to be providing.

It's currently possible to load <32 x i1>, freeze the result, and bitcast it to i32. But the backend is likely to turn that into something messy. Otherwise, no, there isn't really any way to load a partially poisoned value as an i32.

This is loosely related to https://discourse.llvm.org/t/a-memory-model-for-llvm-ir-supporting-limited-type-punning/61948 .

Is there no way in LLVM to load partially-initialized memory and then freeze it that will preserve the parts of the value that are initialized? Because that seems like something that LLVM needs to be providing.

It's currently possible to load <32 x i1>, freeze the result, and bitcast it to i32. But the backend is likely to turn that into something messy. Otherwise, no, there isn't really any way to load a partially poisoned value as an i32.

Is that seen as a defect or as for some reason desirable? Because I worry that optimizer people are talking themselves into something that would be a truly beautiful model if only there were a frontend that could emit code for it.

Doesn't this make it e.g. illegal to use large integer types to do a memcpy of anything that might contain uninitialized padding?

Is that seen as a defect or as for some reason desirable? Because I worry that optimizer people are talking themselves into something that would be a truly beautiful model if only there were a frontend that could emit code for it.

Doesn't this make it e.g. illegal to use large integer types to do a memcpy of anything that might contain uninitialized padding?

Yes, it does make that illegal. I think there's still significant gap here.

To allow implementing memcpy using large integers, given we have poison, basically, the possible solutions I know of are: