This is an archive of the discontinued LLVM Phabricator instance.

[LV] Improve register pressure estimate at high VFs
ClosedPublic

Authored by peterwaller-arm on May 18 2022, 12:04 PM.

Details

Summary

Previously, getRegUsageForType was implemented using
getTypeLegalizationCost. getRegUsageForType is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type. However, getTypeLegalizationCost currently only appears to
understand splitting and not scalarization, so significantly
underestimates the register requirements.

Instead, use getNumRegisters, which understands when scalarization
can occur (via computeRegisterProperties).

This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer previously ends up costing an v128i1 as 2 v64i*
registers where it actually occupies 128 i32 registers.

I'm sending this patch early for comment, I'm still doing some sanity checking
with LNT. I note that getRegisterClassForType appears to return VectorRC even
though the type in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 12:04 PM
Herald added a subscriber: ctetreau. · View Herald Transcript
peterwaller-arm requested review of this revision.May 18 2022, 12:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 12:04 PM
paulwalker-arm added inline comments.May 18 2022, 3:56 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
384–385

I lack some historical knowledge here but I agree it does look like the current implementation is answering the wrong question here.

Assuming others agree with the intent of the change I'm thinking the function definition should also be changed. Returning InstructionCost seems wrong and likely just the result of the original call to getTypeLegalizationCost(). I think unsigned is more representative of the function's intent.

  • Address Paul's suggestion: make getRegUsageForType return unsigned and simplify.
peterwaller-arm added inline comments.
llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll
36

I took a look at making this report the correct RC in D125956.

sdesmalen accepted this revision.May 19 2022, 2:55 AM

This change looks right to me.

I do think there is a related issue exposed by your test. The cost-model for the PHI node is not accurate, because it suggests the cost of the PHI itself is 0 but because the copy is being scalarized in CopyToReg/CopyFromReg the cost is actually really high. Perhaps we can update getCFInstrCost with some checks to see if the vector type requires Promotion + Splitting and if so, return a scalarization cost for the PHI node instead. We can also just fix the codegen itself of course, but I suspect that may be more work.

This revision is now accepted and ready to land.May 19 2022, 2:55 AM

I think for the case of D118979 it makes sense to prevent maximizing the vector bandwidth for fixed-length sve. The larger vectors will already be wide enough and as far as I understand they don't benefit from the wider types in the same way that NEON does.

It sounds like this will be useful in either case though. The same thing could potentially happen with 128xi1 types. And it looks like it is used in the interleaving factor calculations.

llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll
15

I don't think this need the target-feature=+neon

paulwalker-arm accepted this revision.May 19 2022, 10:04 AM
peterwaller-arm marked an inline comment as done.
  • Remove target-features test attribute per review and rebase.

I'm in the process of final-check-and-submit now, but may get interrupted.

This revision was landed with ongoing or failed builds.May 23 2022, 1:01 AM
This revision was automatically updated to reflect the committed changes.

it introduced a regression here:

https://github.com/llvm/llvm-project/issues/56374

Thanks for the report, should be fixed in rGc146af3f469adde04f7adb126e7d7b7b7047c88c, please ping again if not, I don't have gcc-6 to hand to test with.