This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Zero-extend the compare operand for ATOMIC_CMP_SWAP
ClosedPublic

Authored by nemanjai on Jan 9 2018, 2:27 AM.

Details

Summary

This fixes https://bugs.llvm.org/show_bug.cgi?id=35812.

As it turns out, the PPC back end has a long-standing bug where the atomic compare and swap operation on sub-word sizes will do the wrong thing if the old value (and the comparand) is negative. The issue is that the old value we load will be zero-extended and the value we're comparing it to might be sign or zero extended. This is broken both with the old way of loading/masking the value as well as with the new way (using the lharx/lbarx that were introduced in ISA 2.07).

This patch simply zero extends the input value if it isn't already guaranteed to be zero extended.

I initially posted a fix for this in legalization, but there was at least one target that preferred that this be handled separately in each target rather than in target-independent legalization.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai created this revision.Jan 9 2018, 2:27 AM

My plan is to write a test case in projects/test-suite that will test these sub-word atomic compare-and-swap operations with negative values in various contexts (compare input is constant, parameter, loaded value, computed value). If there are no objections to this of course.

efriedma added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
8819 ↗(On Diff #129050)

Can you simplify this using MaskedValueIsZero?

With this approach, I'm a little worried it will break in the future. For stores, DAGCombine calls SimplifyDemandedBits to simplify the input; if someone added the same functionality for atomic operations, it would break this check.

With this approach, I'm a little worried it will break in the future. For stores, DAGCombine calls SimplifyDemandedBits to simplify the input; if someone added the same functionality for atomic operations, it would break this check.

Do you mean that the general approach of updating the nodes during custom legalization is fragile or just because I used my own check for already zero-extended values? I am assuming that if I use MaskedValueIsZero() it should be safe since if this code can't prove the high bits aren't set, neither will SimplifyDemandedBits so it shouldn't undo what this code does.

lib/Target/PowerPC/PPCISelLowering.cpp
8819 ↗(On Diff #129050)

Thank you. I was looking for something with similar semantics but I was assuming it would have a name suggestive of zero-extension. This looks like it does exactly what I want.

nemanjai updated this revision to Diff 129279.Jan 10 2018, 9:01 AM

Updated to use MaskedValueIsZero() instead of a custom function. Thanks for the suggestion Eli.

Do you mean that the general approach of updating the nodes during custom legalization is fragile

Yes. The problem is that you're essentially attaching additional semantics to a target-independent opcode (specifically, that the "unused" bits of the ATOMIC_CMP_SWAP must be zero). DAGCombine doesn't understand this, and can therefore break your assumptions. This has come up before in other contexts (e.g. D31331).

The clean approach is to introduce a target-specific node (e.g. PPCISD::ATOMIC_CMP_SWAP), and custom-lower ISD::ATOMIC_CMP_SWAP to PPCISD::ATOMIC_CMP_SWAP; that way, it's clear your node has different semantics.

lib/Target/PowerPC/PPCISelLowering.cpp
8876 ↗(On Diff #129279)

Do you really need to custom-lower ATOMIC_CMP_SWAP_WITH_SUCCESS, given that we're just going to call into the custom lowering code again with the generated ATOMIC_CMP_SWAP?

Do you mean that the general approach of updating the nodes during custom legalization is fragile

Yes. The problem is that you're essentially attaching additional semantics to a target-independent opcode (specifically, that the "unused" bits of the ATOMIC_CMP_SWAP must be zero). DAGCombine doesn't understand this, and can therefore break your assumptions. This has come up before in other contexts (e.g. D31331).

The clean approach is to introduce a target-specific node (e.g. PPCISD::ATOMIC_CMP_SWAP), and custom-lower ISD::ATOMIC_CMP_SWAP to PPCISD::ATOMIC_CMP_SWAP; that way, it's clear your node has different semantics.

OK, I tend to opt for not introducing target-specific stuff that is opaque to the infrastructure, but if it has to be done, it has to be done. I'll add the PPCISD versions and update the patch asap since I'd certainly like to merge this back into 6.0 as well.

lib/Target/PowerPC/PPCISelLowering.cpp
8876 ↗(On Diff #129279)

Well, unless your patch or Uli's (D41798 or D38215) to zero-extend the operand to the generated SETCC lands, I need to do it for both, don't I? I certainly continue to get failures if I don't do this for both nodes without one of those patches.

efriedma added inline comments.Jan 10 2018, 1:04 PM
lib/Target/PowerPC/PPCISelLowering.cpp
8876 ↗(On Diff #129279)

Oh, yes, you're right. Hopefully we can get that landed soon, though.

nemanjai added inline comments.Jan 10 2018, 1:09 PM
lib/Target/PowerPC/PPCISelLowering.cpp
8876 ↗(On Diff #129279)

OK. So I'll update this to only handle ATOMIC_CMP_SWAP assuming that the other patch will be committed soon. When they land, I'll ask for both to be merged in 6.0. Does that sound like a reasonable plan?

efriedma added inline comments.Jan 10 2018, 1:16 PM
lib/Target/PowerPC/PPCISelLowering.cpp
8876 ↗(On Diff #129279)

Yes, sounds good.

nemanjai updated this revision to Diff 129368.Jan 10 2018, 4:29 PM

Legalize by producing PPC-specific atomic compare and swap ISD nodes.

Please note that this fix now no longer completely fixes the listed PR. It requires D41798 to land in order to completely fix that bug.

This revision is now accepted and ready to land.Jan 11 2018, 11:47 AM
This revision was automatically updated to reflect the committed changes.