This is an archive of the discontinued LLVM Phabricator instance.

[LegalizeDAG] Fix ATOMIC_CMP_SWAP_WITH_SUCCESS legalization.
ClosedPublic

Authored by efriedma on Jan 5 2018, 5:36 PM.

Details

Summary

The code wasn't zero-extending correctly, so the comparison could spuriously fail.

Inspired by D41791.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Jan 5 2018, 5:36 PM

I should start by saying that I think this patch seems to me rather uncontroversial, but I don't think it is adequate to address the original problem (explanation follows).

Shouldn't an equivalent thing be done for the ATOMIC_CMP_SWAP that is generated (since it will also have an underlying comparison)? What I see in the test case attached to the original patch is this:

t19: i32,i1,ch = AtomicCmpSwapWithSuccess<Volatile LDST2[@Glob]> t2, GlobalAddress:i64<i16* @Glob> 0, Constant:i32<-3344>, Constant:i32<222>

becomes the following after operation legalization:

t23: i32,ch = AtomicCmpSwap<Volatile LDST2[@Glob]> t2, t32, Constant:i32<-3344>, Constant:i32<222>
t25: i32 = AssertZext t23, ValueType:ch:i16
t29: i1 = setcc t25, Constant:i32<62192>, seteq:ch

So the operand to SETCC is zero-extended as desired, but the compare operand to the ATOMIC_CMP_SWAP is sign-extended (which in the test case I have included with the other patch will store the wrong value).

On the other hand, with the original patch the entire problem is resolved and I see:

t6: i16,i1,ch = AtomicCmpSwapWithSuccess<Volatile LDST2[@Glob]> t2, GlobalAddress:i64<i16* @Glob> 0, Constant:i16<-3344>, Constant:i16<222>

becomes the following after type legalization:

t21: i32,i1,ch = AtomicCmpSwapWithSuccess<Volatile LDST2[@Glob]> t2, GlobalAddress:i64<i16* @Glob> 0, Constant:i32<62192>, Constant:i32<222>

which will then be expanded into:

t25: i32,ch = AtomicCmpSwap<Volatile LDST2[@Glob]> t2, t32, Constant:i32<62192>, Constant:i32<222>
t27: i32 = AssertZext t25, ValueType:ch:i16
t29: i1 = setcc t27, Constant:i32<62192>, seteq:ch

Namely, doing this in type legalization will zero-extend the compare operand for the node before we get to the expansion, so that the SETCC/ATOMIC_CMP_SWAP nodes generated for the expansion will have their compare operand zero-extended consistently. However, doing so would also zero-extend the compare operand to the ATOMIC_CMP_SWAP_WITH_SUCCESS node even if the operation is custom-legalized and although I don't imagine any target relies on this not happening, I suppose it's possible that some target is.

In any case, this patch does not sufficiently address the motivating issue for the original patch (from: https://bugs.llvm.org/show_bug.cgi?id=35812).

This patch seems to be identical to the one I proposed here:
https://reviews.llvm.org/D38215

(While as mentioned there, this is no longer needed on SystemZ, I still think this is a correct change.)

As to Nemanja's comments, in the default expansion of ATOMIC_CMP_SWAP_WITH_SUCCESS, the result of an ATOMIC_CMP_SWAP call is compared with the compare value. Since the result of ATOMIC_CMP_SWAP may be differently extended depending on the platform (this is what TLI.getExtendForAtomicOps is all about), the compare value must be extended in the same way. E.g. on Mips, where TLI.getExtendForAtomicOps is SIGN_EXTEND, the compare value must likewise be always sign-extended, or else the comparison will fail.

Given that which extension is correct depends on the platform, it doesn't seem to make much sense to me to do an unconditional zero-extension of the compare value in common code ahead of time.

In any case, I'm not sure why the *input* to ATOMIC_CMP_SWAP should be extended in any way in the first place: the back-end gets told the actual type explicitly, and if any extension is necessary due to the particular way a back-end implements the operation, the back-end can just do that extension itself.

This patch seems to be identical to the one I proposed here:
https://reviews.llvm.org/D38215

(While as mentioned there, this is no longer needed on SystemZ, I still think this is a correct change.)

As to Nemanja's comments, in the default expansion of ATOMIC_CMP_SWAP_WITH_SUCCESS, the result of an ATOMIC_CMP_SWAP call is compared with the compare value. Since the result of ATOMIC_CMP_SWAP may be differently extended depending on the platform (this is what TLI.getExtendForAtomicOps is all about), the compare value must be extended in the same way. E.g. on Mips, where TLI.getExtendForAtomicOps is SIGN_EXTEND, the compare value must likewise be always sign-extended, or else the comparison will fail.

Given that which extension is correct depends on the platform, it doesn't seem to make much sense to me to do an unconditional zero-extension of the compare value in common code ahead of time.

Fair enough. I agree.

In any case, I'm not sure why the *input* to ATOMIC_CMP_SWAP should be extended in any way in the first place: the back-end gets told the actual type explicitly, and if any extension is necessary due to the particular way a back-end implements the operation, the back-end can just do that extension itself.

I'm not sure what you mean here. The i8 and i16 types are not legal on PPC. As such, type legalization will undoubtedly expand operands of those types. If you look at the debug dumps I posted above, you'll notice that pre-type-legalization, the inputs are i16 and post-type-legalization, they're i32.
I think the gist of the issue here is that type legalization should do the right thing - operation legalization is probably later than the optimal place for this. Although, I agree that the operands should probably be extended according to TLI.getExtendForAtomicOps().

In any case, I'm not sure why the *input* to ATOMIC_CMP_SWAP should be extended in any way in the first place: the back-end gets told the actual type explicitly, and if any extension is necessary due to the particular way a back-end implements the operation, the back-end can just do that extension itself.

I'm not sure what you mean here. The i8 and i16 types are not legal on PPC. As such, type legalization will undoubtedly expand operands of those types. If you look at the debug dumps I posted above, you'll notice that pre-type-legalization, the inputs are i16 and post-type-legalization, they're i32.
I think the gist of the issue here is that type legalization should do the right thing - operation legalization is probably later than the optimal place for this. Although, I agree that the operands should probably be extended according to TLI.getExtendForAtomicOps().

Sure, the operands will have been any-extended from i8/i16 to i32. But the target knows that this happened, because it knows that this is a i8/i16 ATOMIC_CMP_SWAP (via getMemoryVT). In that case, if it actually requires a particular sign- or zero-extended version of the original i8/i16 constant, it can still do the appropriate in-reg extension from the any-extended i32 value it got.

Sure, the operands will have been any-extended from i8/i16 to i32. But the target knows that this happened, because it knows that this is a i8/i16 ATOMIC_CMP_SWAP (via getMemoryVT). In that case, if it actually requires a particular sign- or zero-extended version of the original i8/i16 constant, it can still do the appropriate in-reg extension from the any-extended i32 value it got.

Oh OK, of course. I mentioned in the original patch that I can certainly fix this in the PPC back end. I imagine that other back ends have done the same thing one way or another (or have a bug in this they don't know about just as PPC does). However, I ultimately don't see the utility in type-legalization ignoring how the target wants these inputs extended and doing an any-extend when the target is going to have to pick one down the line. And the target has already stated how it wants atomics extended in that TLI hook.

Of course, if there's a good reason to keep these as any-extends in type-legalization, I don't mind fixing it in the back end.

Sure, the operands will have been any-extended from i8/i16 to i32. But the target knows that this happened, because it knows that this is a i8/i16 ATOMIC_CMP_SWAP (via getMemoryVT). In that case, if it actually requires a particular sign- or zero-extended version of the original i8/i16 constant, it can still do the appropriate in-reg extension from the any-extended i32 value it got.

Oh OK, of course. I mentioned in the original patch that I can certainly fix this in the PPC back end. I imagine that other back ends have done the same thing one way or another (or have a bug in this they don't know about just as PPC does). However, I ultimately don't see the utility in type-legalization ignoring how the target wants these inputs extended and doing an any-extend when the target is going to have to pick one down the line. And the target has already stated how it wants atomics extended in that TLI hook.

But that's not what the TLI hook says as I understand it; the TLI hook simply allows the back-end to inform common code how the result of a sub-word ATOMIC_CMP_SWAP will have been extended (either by hardware or by the back-end specific implementation). It doesn't say anything about inputs.

And it's not clear that inputs really need to be extended in any particular way anyway -- if the back-end has sub-word instructions, those will likely ignore high bits anyway; and if the back-end creates a compare-and-swap loop on the surrounding word, it may be able (like SystemZ) to implement the necessary comparisons without explicit extensions. So IMO this is best left to each target.

But that's not what the TLI hook says as I understand it; the TLI hook simply allows the back-end to inform common code how the result of a sub-word ATOMIC_CMP_SWAP will have been extended (either by hardware or by the back-end specific implementation). It doesn't say anything about inputs.

I have a similar understanding of the hook - i.e. how the results of atomic operations are extended. I guess I just see ATOMIC_CMP_SWAP as one where the input should conform to this as well as it is really a load-compare-store operation, so its input will be compared to a value that is loaded atomically. Namely, it's input should have the same high bits as the result of ATOMIC_LOAD.

And it's not clear that inputs really need to be extended in any particular way anyway -- if the back-end has sub-word instructions, those will likely ignore high bits anyway; and if the back-end creates a compare-and-swap loop on the surrounding word, it may be able (like SystemZ) to implement the necessary comparisons without explicit extensions. So IMO this is best left to each target.

If a target has sub-word instructions for all three operations, I suppose it doesn't care if the loaded value and the comparand have different high bits. In all other cases, the target ultimately has to do the work of ensuring the high bits are set the same way for both values at some point. It seemed to me that legalization would be a logical place for ensuring that work is done, but of course since there are objections, I'll just fix that in the PPC target.

As far as this patch is concerned (or D38215), I think it should still be committed since it truly appears to be the right thing to do. And for the one I originally posted, I'll abandon the approach of fixing this in legalization and I'll fix it in the PPC back end.

The PPC-specific part I discussed with Uli is in D41856.

I'll commit D41856 and between that patch and this one, we'll fix https://bugs.llvm.org/show_bug.cgi?id=35812. Once we get both committed, I'll ask for a merge into 6.0.

hfinkel accepted this revision.Jan 16 2018, 9:39 AM

LGTM

This revision is now accepted and ready to land.Jan 16 2018, 9:39 AM
This revision was automatically updated to reflect the committed changes.