This is an archive of the discontinued LLVM Phabricator instance.

[LV] Don't predicate uniform mem op stores unneccessarily
ClosedPublic

Authored by reames on Jul 27 2022, 7:42 AM.

Details

Summary

We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible.

Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would *not* be legal to make this same change for merely "uniform" operations.

Diff Detail

Event Timeline

reames created this revision.Jul 27 2022, 7:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 7:42 AM
reames requested review of this revision.Jul 27 2022, 7:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 7:42 AM

; if the address is accessed at least once, we know the instruction doesn't need predicated. This change just extends it to handle uniform mem op stores as well.

I might be missing something, but I am not sure if this logic can be directly extended to stores. I think the reasoning for uniform loads was that they load the same value in each iteration, so it is sufficient to load it once (and with tail folding it is still guaranteed to execute once). But uniform stores (as per isUniformMemOp) will store to the same address, but not necessarily the same value in each iteration.

With tail folding, if the stores to the same address are executed unconditionally, the final stored value will be the one from the last lane of the vector iteration. But couldn't that lane be masked out and the value from last active lane should be the final stored value? AFAICT this is happening in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll?

; if the address is accessed at least once, we know the instruction doesn't need predicated. This change just extends it to handle uniform mem op stores as well.

I might be missing something, but I am not sure if this logic can be directly extended to stores. I think the reasoning for uniform loads was that they load the same value in each iteration, so it is sufficient to load it once (and with tail folding it is still guaranteed to execute once). But uniform stores (as per isUniformMemOp) will store to the same address, but not necessarily the same value in each iteration.

With tail folding, if the stores to the same address are executed unconditionally, the final stored value will be the one from the last lane of the vector iteration. But couldn't that lane be masked out and the value from last active lane should be the final stored value? AFAICT this is happening in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll?

You're completely correct, and this patch is wrong and about to be abandoned. I'd gotten myself fixated on the speculation safety aspect here, and had not considered the predication required to ensure the right *value* was written. Grrr... At least restricting the original patch to be correct should be simple, so this isn't too big a waste of time.

Sorry for the wasted time, and thank you for pointing out my missing the obvious.

reames abandoned this revision.Jul 27 2022, 10:21 AM

As pointed out by reviewer, patch is unsound.

reames updated this revision to Diff 448101.Jul 27 2022, 11:08 AM
reames edited the summary of this revision. (Show Details)

Resurrect review for the corrected version of same logic - realized this was still worth splitting off even with the fix.

david-arm accepted this revision.Jul 28 2022, 12:54 AM

LGTM! The logic looks sounds to me now.

This revision is now accepted and ready to land.Jul 28 2022, 12:54 AM
This revision was landed with ongoing or failed builds.Jul 28 2022, 8:55 AM
This revision was automatically updated to reflect the committed changes.