This is an archive of the discontinued LLVM Phabricator instance.

[CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic
ClosedPublic

Authored by RosieSumpter on Jun 9 2021, 4:01 AM.

Details

Summary

Added a case for CTPOP to AArch64TTIImpl::getIntrinsicInstrCost so that
the cost estimate matches the codegen in
test/CodeGen/AArch64/arm64-vpopcnt.ll

Diff Detail

Event Timeline

RosieSumpter created this revision.Jun 9 2021, 4:01 AM
RosieSumpter requested review of this revision.Jun 9 2021, 4:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2021, 4:01 AM
SjoerdMeijer added inline comments.Jun 9 2021, 4:19 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
281

From a quick look, lookup tables here are more rare, while the it seems a lot more prevalent in the x86 implementations. But I quite like this little table here, it's accurate and easy to read, so have no problem with it.

289

Nit: you don't need the curly brackets for an if with 1 statement.

289

Do we have a test case where the table look up fails?

llvm/test/Analysis/CostModel/AArch64/ctpop.ll
151

Does this really have a cost of only 2?

I guess it's 2 because LT.first * Entry->Cost = 2 * 1 = 2

So I am wondering if it is easier to expand the table with vector types, and if the lookup fails let it fall back to the generic function that will hopefully then assign a high cost for it.

RosieSumpter added inline comments.Jun 9 2021, 6:36 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
289

Good point, I'll expand the table then the large vector cases won't be found in the table.

llvm/test/Analysis/CostModel/AArch64/ctpop.ll
151

I did try doing a check of the codegen for the large vector cases and they did seem to be double the cost, but I understand it is probably safer to expand the table so I'm happy to do that

Table now contains the vectors <= 128 bits

RosieSumpter added inline comments.Jun 9 2021, 9:47 AM
llvm/test/Analysis/CostModel/AArch64/ctpop.ll
151

The costs for the large vectors are all now 4 as they fall through to a CustomCost of LT.first*2 in BasicTTIImpl.h. Is this what we want?

dmgreen added a subscriber: Rin.Jun 9 2021, 10:27 AM
dmgreen added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
280

It looks like this file doesn't contain all of the cases. As far as I understand, this is how it works:

  • v8i8 and v8i16 are legal, so 1 instruction. Fantastic!
  • v4i16 and v8i16 are converted to "v16i8 ctpop + addp". So cost 2
  • v2i32 and v4i32 are converted to "v16i8 ctpop + addp + addp". So cost 3
  • v1i64 and v2i64 are converted to "v16i8 ctpop + addp + addp + addp". So cost 4

Those are all good. For scalar, as opposed to vector, there is no good instruction though. The generation also looks pretty terrible at the moment.
So a i8 becomes "and 0xff; expensive-mov; v8i8 cnt; addlv; expensive-mov". The others looks equally expensive too. I would guess a cost of 5 would make sense?

Everything else would be legalized to one of those types. So you can probably use "auto LT = TLI->getTypeLegalizationCost(DL, RetTy);" and Use LT.first for the type in the table lookup, and return Entry->Cost * LT.second, to include the cost of legalization (how many different vectors there will be).

282

The table can also be easier to read if it's not formatted.
And maybe in a different order would help.

CarolineConcatto added inline comments.
llvm/test/Analysis/CostModel/AArch64/ctpop.ll
64

nit: I don't know why ';' was added here in all functions, but I don't think you should use this patch to remove them.

RosieSumpter added inline comments.Jun 10 2021, 2:20 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
280

Thanks for the help with this, I've had another go at it. Only thing I'm not sure about now is that e.g. v2i16 is promoted to v2i32 so with LT.first * Entry->Cost it gets a cost of 3, but checking codegen it seems there are 5 instructions. Something like (2 * Entry->Cost - 1) would give the right numbers, but is this the correct thing to do?

  • Corrected the scalar costs
  • Adjusted the table
  • Return LT.first * Entry->Cost or LT.first*Entry->Cost + 1 depending on whether vector integer type is promoted
dmgreen accepted this revision.Jun 11 2021, 1:00 AM

Looks sensible to me, thanks

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
293

I may be simpler to write it as this, without the if:

// Extra cost of +1 when illegal vector types are legalized by promoting
// the integer type.
int ExtraCost = MTy.getScalarSizeInBits() != RetTy->getScalarSizeInBits() ? 1 : 0;
return LT.first * Entry->Cost + ExtraCost;
llvm/test/Analysis/CostModel/AArch64/ctpop.ll
64

If this is what the update script produces, I'm OK with the extra lines being removed. Either here or in a separate NFC patch.

This revision is now accepted and ready to land.Jun 11 2021, 1:00 AM
This revision was landed with ongoing or failed builds.Jun 11 2021, 3:23 AM
This revision was automatically updated to reflect the committed changes.
RosieSumpter marked 2 inline comments as done.