This is an archive of the discontinued LLVM Phabricator instance.

[Lanai] implement wide immediate support
ClosedPublic

Authored by q3k on Jul 29 2021, 10:47 AM.

Details

Summary

This fixes LanaiTTIImpl::getIntImmCost to return valid costs for i128
(and wider) values. Previously any immediate wider than
64 bits would cause Lanai llc to crash.

A regression test is also added that exercises this functionality.

Diff Detail

Event Timeline

q3k created this revision.Jul 29 2021, 10:47 AM
q3k requested review of this revision.Jul 29 2021, 10:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2021, 10:47 AM
q3k updated this revision to Diff 362826.Jul 29 2021, 10:57 AM

Attempting to stack changes.

jpienaar added inline comments.Aug 2 2021, 9:57 AM
llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
63

Wouldn't this say that returning > 64 is cheaper than returning 32 bits?

q3k added inline comments.Aug 2 2021, 12:46 PM
llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
63

As far as I understand, this affects only constant hoisting, and returning TCC_Free means “don't hoist this constant”, which is the default behavior unless implemented otherwise anyway. IIUC, this means worst case we're leaving possible optimizations for i128+ on the table but aren't making things worse for narrower types. There is also precedent to returning _Free when deilng with wide immediates in SystemZTTIImpl::getIntImmCost, which is in fact where I've lifted this from. :) Same for X86TTIImpl::getIntImmCost.

However, I can instead attempt to properly implement the cost here. As far as I understand, we want to return the cost of materializing loading the constant as an immediate, a.k.a. the number of instructions it takes to load a given constant into registers. For anything wider than 32 bit we could implement the line 71-74 calculation in a loop for each 32-bit chunk of the immediate. This is what RISC-V and AArch64 do.

Alternatively we could simplify it to (BitSize >> 3) * TTC_Basic (8 for i128, 16 for i256...), reflecting the worst case mov+or materialization for each 32 bits, ignoring the no-low-bits set optimization from lines 72-73. This is in line with 4 * TTC_Basic currently returned for i64.

WDYT?

jpienaar accepted this revision.Aug 4 2021, 11:22 AM
jpienaar added inline comments.
llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
63

That makes sense, OK, lets keep it as is. And can refine if needed.

This revision is now accepted and ready to land.Aug 4 2021, 11:22 AM
q3k added a comment.Aug 5 2021, 2:25 AM

Thanks! I'll also need someone to submit/commit/land this for me. Is this something you can do, @jpienaar? Or should I be asking someone else?

q3k added a subscriber: MaskRay.Sep 9 2021, 9:58 AM

Friendly ping, need someone to actually submit this now... :) @jpienaar? @MaskRay?

This revision was automatically updated to reflect the committed changes.