~Marking as "WIP" because the code generated for the zext i32 -> i128 is less than ideal, which reflects poorly on ctpop 256.~
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
I was thinking about making narrowScalar emit zext(add(trunc(ctpop(hi)), trunc(ctpop(lo)))), hoping that some combine folds the inner trunc(zext(ctpop(x))) => ctpop(x), but I haven't tried this yet.
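The narrowing above is value-preserving because the popcount of a k-bit half is at most k, so truncating each half's count loses nothing, and the sum of the two counts (at most the full width) only needs to fit in the narrow add type. The following is a minimal sketch of that arithmetic, assuming Python integers stand in for unsigned LLVM scalars; `narrowed_ctpop` and its `add_bits` parameter are hypothetical names for illustration, not GlobalISel's `LegalizerHelper` API.

```python
def mask(bits: int) -> int:
    """All-ones value of the given width."""
    return (1 << bits) - 1


def ctpop(x: int, bits: int) -> int:
    """Population count of x viewed as an unsigned `bits`-wide scalar."""
    return bin(x & mask(bits)).count("1")


def narrowed_ctpop(x: int, wide_bits: int, add_bits: int) -> int:
    """Hypothetical model of zext(add(trunc(ctpop(hi)), trunc(ctpop(lo)))).

    The wide value is split into two halves, each half is counted separately,
    the counts are truncated to `add_bits`, added at that width, and the sum
    is zero-extended back (a no-op on a non-negative Python int).
    """
    # The total count can reach wide_bits, so the narrow add type must hold it.
    if wide_bits > mask(add_bits):
        raise ValueError("add type too narrow to hold the population count")
    half = wide_bits // 2
    lo = x & mask(half)
    hi = (x >> half) & mask(half)
    # trunc: lossless because each half's count is at most `half`.
    hi_cnt = ctpop(hi, half) & mask(add_bits)
    lo_cnt = ctpop(lo, half) & mask(add_bits)
    # add performed at the narrow width; cannot wrap given the check above.
    return (hi_cnt + lo_cnt) & mask(add_bits)
```

For example, with a 32-bit add type a 256-bit count never exceeds 256, so only the final zext touches the wide type, whereas an 8-bit add type (maximum 255) would be too narrow for the all-ones 256-bit value.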
Ah, but these are going to be assigned to the FPR banks. We might be able to recognize the extend-to-s128 pattern and select the optimal code during instruction selection.
Rebased.
Also added zext(add(trunc(ctpop(lo)), trunc(ctpop(hi)))) narrowing of the add to improve codegen for the 256-bit ctpop.
LGTM. I don't think we need perfect codegen the first time around if we're adding support from scratch.