This is an archive of the discontinued LLVM Phabricator instance.

Convert a masked.gather of at most one element to a masked.load
AbandonedPublic

Authored by reames on Apr 22 2019, 11:57 AM.

Details

Summary

Split off of D59703. If we have a gather whose mask provably has at most one lane enabled, it is equivalent to a single-element masked.load from the corresponding address.
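As a rough sketch of the transform being proposed (this is illustrative IR, not the literal output of the patch; the function names and lane choice are hypothetical), a gather whose mask can only have lane 2 set would become a single-element masked.load of the lane-2 pointer, merged back into the passthrough value:

```llvm
declare <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*>, i32, <4 x i1>, <4 x double>)
declare <1 x double> @llvm.masked.load.v1f64.p0v1f64(<1 x double>*, i32, <1 x i1>, <1 x double>)

; Before: only lane 2 of the mask can possibly be set.
define <4 x double> @before(<4 x double*> %ptrs, <4 x i1> %mask, <4 x double> %passthru) {
  %onelane = and <4 x i1> %mask, <i1 false, i1 false, i1 true, i1 false>
  %res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 8, <4 x i1> %onelane, <4 x double> %passthru)
  ret <4 x double> %res
}

; After (sketch): a <1 x double> masked.load guarded by the lane-2 mask bit.
define <4 x double> @after(<4 x double*> %ptrs, <4 x i1> %mask, <4 x double> %passthru) {
  %ptr = extractelement <4 x double*> %ptrs, i64 2
  %vptr = bitcast double* %ptr to <1 x double>*
  %bit = extractelement <4 x i1> %mask, i64 2
  %mbit = insertelement <1 x i1> undef, i1 %bit, i64 0
  %pt = extractelement <4 x double> %passthru, i64 2
  %ptv = insertelement <1 x double> undef, double %pt, i64 0
  %ld = call <1 x double> @llvm.masked.load.v1f64.p0v1f64(<1 x double>* %vptr, i32 8, <1 x i1> %mbit, <1 x double> %ptv)
  %elt = extractelement <1 x double> %ld, i64 0
  %res = insertelement <4 x double> %passthru, double %elt, i64 2
  ret <4 x double> %res
}
```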

Diff Detail

Event Timeline

reames created this revision.Apr 22 2019, 11:57 AM
Herald added a project: Restricted Project.Apr 22 2019, 11:57 AM
reames marked an inline comment as done.Apr 22 2019, 12:29 PM
reames added inline comments.
test/Transforms/InstCombine/masked_intrinsics.ll
211

Note: This result shows a missed scalarization opportunity.

In the generic case, we're creating a masked load of a scalar (although, as noted inline, there's apparently no support or test for that pattern). But I don't know what the codegen for a <1 x X> masked load would be (do we cmp/br around the load since it's not speculatable?). Someone else with masked-load lowering knowledge (@craig.topper, @efriedma?) should also review this.

If the motivating case really is a constant mask, then I think it's best to limit this patch to that pattern alone and create the scalar load directly.
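For the constant-mask case suggested here, the gather could be reduced to a plain scalar load plus an insertelement, with no masked intrinsic at all. A hedged sketch of what that output might look like (the exact form is an assumption, not taken from the patch):

```llvm
; Constant mask with exactly lane 1 set: the load is unconditional,
; so a scalar load is legal and the masked intrinsic disappears.
define <4 x double> @gather_lane1_scalarized(<4 x double*> %ptrs, <4 x double> %passthru) {
  %ptr = extractelement <4 x double*> %ptrs, i64 1
  %elt = load double, double* %ptr, align 4
  %res = insertelement <4 x double> %passthru, double %elt, i64 1
  ret <4 x double> %res
}
```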

lib/Transforms/InstCombine/InstCombineCalls.cpp
1234

'countPopulation() == 1' - more direct translation of the code comment?

1251–1252

Change 'auto' to 'Value' for all of these Builder calls to avoid ambiguity.

test/Transforms/InstCombine/masked_intrinsics.ll
204–206

It's fine to show the expected follow-on transform, but we should have a minimal test for this transform alone:

define <4 x double> @gather_lane0(<4 x double*> %ptrs, <4 x i1> %mask, <4 x double> %passthru) {
  %mask_with_at_most_1_bit_set = and <4 x i1> %mask, <i1 true, i1 false, i1 false, i1 false>
  %res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 4, <4 x i1> %mask_with_at_most_1_bit_set, <4 x double> %passthru)
  ret <4 x double> %res
}

But there's not enough analysis power in possiblyDemandedEltsInMask() to get this?

Could at least reduce the test to this?

define <4 x double> @gather_lane1(<4 x double*> %ptrs, <4 x double> %passthru) {
  %res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 4, <4 x i1> <i1 false, i1 true, i1 false, i1 false>, <4 x double> %passthru)
  ret <4 x double> %res
}

> In the generic case, we're creating a masked load of a scalar (although as noted inline, there's apparently no support or test for that pattern). But I don't know what the codegen for a <1 x X> masked load would be (do we cmp/br around the load since it's not speculatable?). Someone else with masked load lowering knowledge (@craig.topper @efriedma ?) should also review this.

A <1 x X> masked load should get scalarized to a compare-and-branch sequence by the ScalarizeMaskedMemIntrinsic pass. This was done because the type legalizer wants to scalarize 1x vectors, so I have X86's target transform info report 1x vectors as illegal for masked loads. Not sure if it would make sense to custom widen that instead.
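Roughly, the compare-and-branch sequence described above would look like the following (an illustrative sketch of the pass's output, not its literal result; block and value names are hypothetical):

```llvm
; ScalarizeMaskedMemIntrinsic-style expansion of a <1 x double> masked.load:
; test the single mask bit, branch around the load, merge with a phi.
define <1 x double> @scalarized_v1_masked_load(<1 x double>* %p, <1 x i1> %mask, <1 x double> %passthru) {
entry:
  %bit = extractelement <1 x i1> %mask, i64 0
  br i1 %bit, label %cond.load, label %else

cond.load:
  %sptr = bitcast <1 x double>* %p to double*
  %val = load double, double* %sptr, align 8
  %vec = insertelement <1 x double> %passthru, double %val, i64 0
  br label %else

else:
  %res = phi <1 x double> [ %vec, %cond.load ], [ %passthru, %entry ]
  ret <1 x double> %res
}
```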

RKSimon added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
25

include order?

reames abandoned this revision.Oct 15 2021, 11:58 AM

Abandoning an old review I'm not going to return to any time soon.