This is an archive of the discontinued LLVM Phabricator instance.

[TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits
Closed · Public

Authored by lebedev.ri on Apr 5 2022, 4:38 PM.

Details

Summary

E.g. in

%i0 = zext <2 x i8> %x to <2 x i16>
%i1 = bitcast <2 x i16> %i0 to <4 x i8>

the known-zero bits of %i0 are 0xFF00 (the upper half of every element is known zero),
but no whole elements are known to be zero. For %i1, we don't know anything about zero bits,
but the elements under the 0b1010 mask (i.e. the odd elements) are known to be zero.

However, we didn't perform this propagation from known source bits to known-zero elements.

I think I wrote that right?
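
For illustration, the propagation described in the summary can be sketched outside of LLVM as follows. This is a minimal standalone sketch, not the code from the patch: the helper name knownZeroEltsAfterNarrowingBitcast is hypothetical, plain 64-bit masks stand in for APInt/KnownBits, and a little-endian element layout is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not an LLVM API.
// SrcKnownZero[i] holds the bits known to be zero in source element i.
// Assumes a little-endian layout: sub-element J of source element I occupies
// bits [J*DstEltBits, (J+1)*DstEltBits) and becomes destination element
// I*Scale + J after the narrowing bitcast.
uint64_t knownZeroEltsAfterNarrowingBitcast(
    const std::vector<uint64_t> &SrcKnownZero, unsigned SrcEltBits,
    unsigned DstEltBits) {
  assert(DstEltBits > 0 && DstEltBits < 64 && SrcEltBits <= 64 &&
         SrcEltBits % DstEltBits == 0 && "expected a narrowing bitcast");
  const unsigned Scale = SrcEltBits / DstEltBits;
  assert(SrcKnownZero.size() * Scale <= 64 && "result mask must fit in 64 bits");

  const uint64_t SubMask = (uint64_t{1} << DstEltBits) - 1;
  uint64_t DstKnownZeroElts = 0;
  for (std::size_t I = 0; I != SrcKnownZero.size(); ++I) {
    for (unsigned J = 0; J != Scale; ++J) {
      // A destination element is known zero only if every one of its bits
      // is known zero in the corresponding slice of the source element.
      const uint64_t SubBits = (SrcKnownZero[I] >> (J * DstEltBits)) & SubMask;
      if (SubBits == SubMask)
        DstKnownZeroElts |= uint64_t{1} << (I * Scale + J);
    }
  }
  return DstKnownZeroElts;
}
```

With the values from the summary, two i16 elements whose known-zero bits are 0xFF00 each yield the element mask 0b1010 for the resulting <4 x i8>.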

Event Timeline

lebedev.ri created this revision. Apr 5 2022, 4:38 PM
Herald added a project: Restricted Project. Apr 5 2022, 4:38 PM
lebedev.ri requested review of this revision. Apr 5 2022, 4:38 PM
lebedev.ri edited the summary of this revision. Apr 5 2022, 4:47 PM
RKSimon added inline comments. Apr 6 2022, 1:51 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
2771

Known is only guaranteed to be correct for the requested SrcDemandedBits/SrcDemandedElts - is that going to work in a different way to the DemandedElts check below? Sorry I haven't investigated properly yet, but we might have to check that we were demanding those src bits?

lebedev.ri added inline comments. Apr 6 2022, 2:04 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
2754–2759

SrcDemandedBits was just synthesized from the DemandedElts, so I don't see any need to check the former?
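
To make "synthesized from the DemandedElts" concrete, here is a rough standalone sketch of one way such a mask can be derived. It is an illustration under stated assumptions, not the code in TargetLowering.cpp: the helper name srcDemandedBitsFromDemandedElts is hypothetical, 64-bit masks replace APInt, and a little-endian layout is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API.
// DemandedElts has one bit per destination (narrow) element. The result is
// the union, over all demanded destination elements, of the bit range that
// each one occupies inside its source (wide) element.
uint64_t srcDemandedBitsFromDemandedElts(uint64_t DemandedElts,
                                         unsigned NumDstElts,
                                         unsigned SrcEltBits,
                                         unsigned DstEltBits) {
  assert(DstEltBits > 0 && DstEltBits < 64 && SrcEltBits <= 64 &&
         SrcEltBits % DstEltBits == 0 && "expected a narrowing bitcast");
  assert(NumDstElts <= 64 && "element mask must fit in 64 bits");
  const unsigned Scale = SrcEltBits / DstEltBits;
  const uint64_t SubMask = (uint64_t{1} << DstEltBits) - 1;

  uint64_t SrcDemandedBits = 0;
  for (unsigned I = 0; I != NumDstElts; ++I)
    if ((DemandedElts >> I) & 1)
      // Little-endian assumption: element I sits at sub-position I % Scale.
      SrcDemandedBits |= SubMask << ((I % Scale) * DstEltBits);
  return SrcDemandedBits;
}
```

Because the bit mask is a pure function of the element mask in this sketch, any destination element that is demanded automatically has all of its source bits demanded.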

RKSimon added inline comments. Apr 6 2022, 3:29 AM
llvm/test/CodeGen/X86/slow-pmulld.ll
264

regression?

lebedev.ri added inline comments. Apr 6 2022, 3:29 AM
llvm/test/CodeGen/X86/slow-pmulld.ll
284

With AVX1, we can only broadcast i32 load to XMM/YMM, and i64 to YMM,
but with AVX2 we can broadcast i8/i16/i32 load to XMM/YMM.
Is lowerBuildVectorAsBroadcast() intentionally not doing that,
because such i8/i16 broadcasts are slow, or is that a bug?

lebedev.ri added inline comments. Apr 6 2022, 3:32 AM
llvm/test/CodeGen/X86/slow-pmulld.ll
284

Correction: "but with AVX2 we can broadcast i8/i16/i32 load to XMM/YMM" should read
"but with AVX2 we can broadcast i8/i16/i32/i64 load to XMM/YMM".

RKSimon accepted this revision. Apr 6 2022, 3:40 AM

LGTM - cheers

llvm/test/CodeGen/X86/slow-pmulld.ll
284

It's one of the many annoyances of lowering constant broadcasts that I mentioned on https://github.com/llvm/llvm-project/issues/54743 - I think this is because AVX512 doesn't have many ops that do i8/i16 broadcast-memory folds? Let's accept it for now.

This revision is now accepted and ready to land. Apr 6 2022, 3:40 AM
lebedev.ri added inline comments. Apr 6 2022, 3:45 AM
llvm/test/CodeGen/X86/slow-pmulld.ll
284

I have a patch, but it shows a number of load folding failures instead :S

LGTM - cheers

Okay then, thank you for the review!

This revision was landed with ongoing or failed builds. Apr 6 2022, 4:19 AM
This revision was automatically updated to reflect the committed changes.