This is an archive of the discontinued LLVM Phabricator instance.

[Mem2Reg] Use poison instead of undef for read of uninitialized memory
Changes PlannedPublic

Authored by nikic on Jun 21 2021, 8:02 AM.

Details

Summary

During mem2reg, make loads from uninitialized memory return a poison value rather than an undef value. Also clarify the alloca documentation in this regard.

Diff Detail

Event Timeline

nikic created this revision.Jun 21 2021, 8:02 AM
nikic requested review of this revision.Jun 21 2021, 8:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2021, 8:02 AM

This is a big change because AFAIK clang intentionally encodes the value of uninitialized variables in a benign way.
For example: https://godbolt.org/z/EYjasxM1M is optimized to 0 even if it is undef ^ undef = undef. If poison is used, this won't work.
I think that to switch to poison reasonable performance benefit should be shown. I have a patch that makes poison propagate better, which will help this change.
BTW, do you have any idea about the failures happening at clang unit tests?

I really look forward to the day that we can commit this patch! 😀 That's the goal, to get rid of undef.

However, there's still one issue we need to solve first: bit-fields. The lowering of bitfield stores in clang is something like:

v = load word
v' = mask v + combine with bitfield value to be stored
store v'

On the first bit-field store the memory is uninitialized but it is still combined with the value to be stored. If we change uninitialized memory to be poison, as this patch does, clang's lowering becomes wrong.
Two fixes are possible:

  1. For bitfields on the stack, initialize the value to zero [possible optimization?]
  2. For other bitfields, freeze the value after loading. This is sufficient to tame the initial poison from uninitialized memory. Subsequent loads don't need freeze (if clang can prove they touch a bitfield that has been initialized before) as even if some value becomes poison it's not problematic. In C/C++ if a value becomes poison then the program has already executed UB, so we can taint the whole word without a problem.

TL;DR: we need to add a freeze after load in clang's bitfield lowering before this patch can go in. @nikic do you have bandwidth for this work?
After that, I'm all for it! But we should probably send an email to the ML so that folks working on other frontends notice the IR semantics change.

This is a big change because AFAIK clang intentionally encodes the value of uninitialized variables in a benign way.
For example: https://godbolt.org/z/EYjasxM1M is optimized to 0 even if it is undef ^ undef = undef. If poison is used, this won't work.

I'm not sure what you're getting at here. This isn't a difference between undef and poison. It's perfectly legal to fold "poison ^ poison -> 0". And we probably should, just to avoid the hassle involved in diagnosing the issues that would result from changing this fold.

@aqjune I don't think performance is really relevant here: It's a necessary step to remove undef in the future. I think this is one of the main ways left in which we introduce undef in IR.

@nlopes Ugh, this is ugly. The fact that clang is using this also means that we can't just ignore bitcode autoupgrade. Adding a freeze on every load seems like a really big hammer, but I'm not sure what else could be done.

@nlopes Ugh, this is ugly. The fact that clang is using this also means that we can't just ignore bitcode autoupgrade. Adding a freeze on every load seems like a really big hammer, but I'm not sure what else could be done.

It would be just loads for bitfields. They are not common, so I guess it's ok.

There are technically better solutions, but require more functionality in the IR. But I don't think it's worth it for such an uncommon feature.
A better solution come to mind:
Introduce "really packed" structs, where we push all the ABI work to the frontend. LLVM's packed structs still go through the ABI code and may get extra padding, while we would need something that bypasses the padding so we could encode bitfields precisely.
This solution doesn't require any freezing, but it's more complicated: requires additional IR features, backend work, etc. So the load+freeze for bitfields sounds better to me.

Or, can we temporarily mark an alloca with !has_bitfield if it had bitfields in C & make mem2reg use poison only when uninitialized memory is read, for a while?

Or, can we temporarily mark an alloca with !has_bitfield if it had bitfields in C & make mem2reg use poison only when uninitialized memory is read, for a while?

Well, that doesn't help much. It's just a band-aid..
I would rather initialize all alloca'd bitfields with zero instead. Zero folds trivially with the bit-masking operations from bit-fields, so it's unlikely to disrupt anything.

nikic planned changes to this revision.Jul 7 2021, 12:51 PM

Removing this from the review queue -- this change by itself is clearly not sufficient.