This is an archive of the discontinued LLVM Phabricator instance.

[ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
ClosedPublic

Authored by RKSimon on Oct 14 2018, 8:47 AM.

Details

Summary

As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for generic vector CTPOP expansion (P32655) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering......

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Oct 14 2018, 8:47 AM
RKSimon updated this revision to Diff 169609.Oct 14 2018, 11:13 AM

Turns out the CTTZ custom lowering was already using this pattern.

samparker added inline comments.Oct 15 2018, 3:23 AM
lib/Target/ARM/ARMISelLowering.cpp
5483

For the 64-bit vector case, couldn't we use vpadd instead? We don't care about signed/unsigned, but we'd have to know that the wide result isn't necessary too - which I expect is fine for most bit counting cases.

RKSimon added inline comments.Oct 15 2018, 4:10 AM
lib/Target/ARM/ARMISelLowering.cpp
5483

Sorry, I don't quite understand - please can you show in the test codegen what you're trying to achieve?

samparker added inline comments.Oct 15 2018, 5:25 AM
lib/Target/ARM/ARMISelLowering.cpp
5483

Sorry, that would have been more clear and would have prevented me from asking in the first place! I thought you could use vpadd instead of vpaddl because I didn't realise the output vector properties.

This revision is now accepted and ready to land.Oct 15 2018, 5:30 AM
This revision was automatically updated to reflect the committed changes.