This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] Legalize ctpop s128
Closed, Public

Authored by jroelofs on Jul 21 2021, 1:27 PM.

Details

Summary

~Marking as "WIP" because the code generated for the zext i32 -> i128 is less than ideal, which reflects poorly on ctpop 256.~

Event Timeline

jroelofs created this revision. Jul 21 2021, 1:27 PM
jroelofs requested review of this revision. Jul 21 2021, 1:27 PM
Herald added a project: Restricted Project. Jul 21 2021, 1:27 PM

Do you have any idea of how to improve the zext behaviour?

I was thinking about making narrowScalar do: zext(add(trunc(ctpop(hi)), trunc(ctpop(lo)))), with the hope that some combine folds the inner trunc(zext(ctpop(x))) => ctpop(x), but I haven't tried this yet.
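
The arithmetic behind that proposal can be sketched in plain C++. This is only an illustration of the identity ctpop(x) = ctpop(hi) + ctpop(lo) with the add done in a narrow type, not the GlobalISel legalizer code from this patch; `U128`, `ctpop64`, and `ctpop128` are hypothetical names invented for the example.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical stand-in for an s128 value: two 64-bit halves.
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Population count of one 64-bit half. The result is in 0..64,
// so it fits comfortably in a narrow (32-bit) type.
static uint32_t ctpop64(uint64_t x) {
  return static_cast<uint32_t>(std::bitset<64>(x).count());
}

// Mirrors zext(add(trunc(ctpop(hi)), trunc(ctpop(lo)))):
// the add happens in 32 bits, and only the final sum is
// zero-extended back to the 128-bit result type.
static U128 ctpop128(U128 x) {
  uint32_t narrow_sum = ctpop64(x.hi) + ctpop64(x.lo); // at most 128
  return U128{narrow_sum, 0};                          // zero-extend
}
```

Because the sum never exceeds 128, the high half of the result is always zero, which is why a single zero-extend at the end is enough.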

For the s32 -> s64 case, that should be folded to a G_ZEXT by D106768

Ah, but these are going to be assigned to the FPR banks. We might be able to recognize the extend-to-s128 pattern and select the optimal code during selection.

jroelofs updated this revision to Diff 361810. Jul 26 2021, 2:39 PM
jroelofs retitled this revision from WIP: [AArch64][GlobalISel] Legalize ctpop s128 to [AArch64][GlobalISel] Legalize ctpop s128.
jroelofs edited the summary of this revision.

Rebased.

Also added zext(add(trunc(ctpop(lo)), trunc(ctpop(hi)))) narrowing of the add to improve things for ctpop 256.
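
As a rough illustration of why narrowing the add helps ctpop 256, the sketch below keeps every intermediate sum in 32 bits and zero-extends only once at the end. It is plain C++ rather than the legalizer code in this revision, and `Limbs256`, `ctpop128Narrow`, and `ctpop256` are hypothetical names used only here.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

// Hypothetical stand-in for an s256 value: four 64-bit limbs,
// least significant limb first.
using Limbs256 = std::array<uint64_t, 4>;

// Population count of a single 64-bit limb (0..64).
static uint32_t ctpop64(uint64_t x) {
  return static_cast<uint32_t>(std::bitset<64>(x).count());
}

// Count for one 128-bit half, kept as a narrow 32-bit value
// instead of being widened back to 128 bits.
static uint32_t ctpop128Narrow(uint64_t lo, uint64_t hi) {
  return ctpop64(lo) + ctpop64(hi); // at most 128
}

// All adds are 32-bit; the only widening step is the final
// zero-extend of the sum (at most 256) into the low limb.
static Limbs256 ctpop256(const Limbs256 &x) {
  uint32_t sum = ctpop128Narrow(x[0], x[1]) + ctpop128Narrow(x[2], x[3]);
  Limbs256 result = {{sum, 0, 0, 0}};
  return result;
}
```

Had each 128-bit half been widened before the outer add, that add would itself be a 128-bit operation needing further narrowing; keeping the sums narrow avoids that extra work.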

aemerson accepted this revision. Jul 26 2021, 3:08 PM

LGTM. I don't think we need to have perfect codegen first time if we're adding support from scratch.

This revision is now accepted and ready to land. Jul 26 2021, 3:08 PM
This revision was landed with ongoing or failed builds. Jul 26 2021, 4:34 PM
This revision was automatically updated to reflect the committed changes.