This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][SLP] Inflate insert/extract costs on very small vectors
Abandoned · Public

Authored by reames on Jun 2 2023, 8:01 AM.

Details

Summary

This change is a bit of a hack, but I've run out of better ideas. Basically, I'm adding a fudge factor to the cost of insertelement and extractelement operations at very small VLs. This has the effect of making vectorization of partially vectorizable sub-trees appear less profitable to SLP. The result is that we vectorize significantly fewer small trees when SLP is enabled.

Note that because this penalty *isn't* being applied to loads and stores, we will still vectorize a VL=2 tree (or a VL=2 subtree with a wider root) if the sub-tree is fully vectorizable.
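
For readers skimming the archive, the shape of the change is roughly as follows. This is an illustrative sketch, not the actual diff: the helper name addSmallVLPenalty, the element-count threshold, and the addend are placeholders standing in for the logic added to RISCVTTIImpl::getVectorInstrCost.

    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/Support/InstructionCost.h"
    using namespace llvm;

    // Illustrative only: layer a fixed penalty on top of the existing
    // insertelement/extractelement cost when the vector is very small, so
    // that partially vectorizable SLP trees look less profitable.
    static InstructionCost addSmallVLPenalty(InstructionCost Cost, Type *Val) {
      if (auto *FVTy = dyn_cast<FixedVectorType>(Val))
        if (FVTy->getNumElements() <= 2) // "very small VL" threshold (placeholder)
          Cost += 2;                     // addend value questioned in review below
      return Cost;
    }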

Here's the impact on namd from spec 2017.

Before:

vsetivli : 2181 total
1: 135
2: 1411
3: 21
4: 579
8: 35
vsetvli : 735 total
zero, zero: 687
reg, zero : 48

After:

vsetivli : 1360 total
1: 73
2: 737
3: 11
4: 505
8: 34
vsetvli : 257 total
zero, zero: 191
reg, zero : 66

This interface isn't solely used by SLP, but it's close. There's one use in CodeGenPrepare which will cause CGP to be slightly more aggressive about speculating unused lanes, and one use in LV which will bias us away from uniform stores at very small VLs. Within SLP, this is mostly used to compute build_vector and extract costs. As the SLP test diff shows, this sometimes results in odd choices - e.g. why does increasing the scalarization cost result in fewer sub-vector extracts? - but on the whole it clearly decreases vectorization at small VLs.

For context, I've been trying to improve the vector codegen at small VLs for the last few weeks (and @luke has been helping), but we don't seem to be making significant progress. My goal with this patch is basically to sidestep that work, be able to enable SLP by default, and *then* return to trying to hammer on small VL codegen.

Diff Detail

Event Timeline

reames created this revision. Jun 2 2023, 8:01 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jun 2 2023, 8:01 AM
reames requested review of this revision. Jun 2 2023, 8:01 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jun 2 2023, 8:01 AM
arcbbb added inline comments. Jun 3 2023, 9:51 AM
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
1526

Is it needed for CostKind == TTI::TCK_CodeSize?

I was thinking the total cost was composed of a vmv instruction (BaseCost) and a vslide instruction (SlideCost). IIUC, BaseCost includes the vector-scalar communication cost, and the addend 2 here accounts for this.
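
For context on the terms in this comment, the existing insert/extract cost in RISCVTargetTransformInfo.cpp is composed roughly as sketched below (simplified, with illustrative values; the real code handles more cases and cost kinds, and the same includes as the earlier sketch are assumed):

    // Simplified sketch of how the existing insert/extract element cost is
    // built up; the patch then adds its small-VL addend on top of this.
    static InstructionCost roughInsertExtractCost(unsigned Index) {
      InstructionCost BaseCost = 1;                    // scalar<->vector move (vmv.x.s / vmv.s.x)
      InstructionCost SlideCost = Index == 0 ? 0 : 1;  // vslidedown/vslideup to reach the lane
      return BaseCost + SlideCost;
    }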

arcbbb added inline comments. Jun 3 2023, 9:58 AM
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
1526

Considering that the addend accounts for the communication cost, it may not be necessary to impose limitations on the VF.

reames added inline comments. Jun 5 2023, 8:42 AM
llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll
203

For a bit of context since this confused me at first.

We don't actually have a vector lowering for exp. As a result, @llvm.exp.v2f32 is getting scalarized by the backend, and the cost model knows this. This is why this change to extract/insert element is seemingly affecting extract subvector costs. It isn't; it's, correctly, affecting the perceived cost of the vector exp op.
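
To make that concrete, the perceived cost of the scalarized call decomposes roughly as below (illustrative arithmetic only, not the cost model's actual code or numbers):

    // Rough shape of the cost of a scalarized <2 x float> llvm.exp call: two
    // scalar expf calls, plus an extract to feed each lane to the call and an
    // insert to rebuild the result vector. Raising the insert/extract cost
    // therefore directly raises the perceived cost of the "vector" exp.
    static int scalarizedExpCost(int ScalarCallCost, int ExtractCost,
                                 int InsertCost) {
      const int NumElts = 2; // <2 x float>
      return NumElts * (ScalarCallCost + ExtractCost + InsertCost);
    }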

reames added inline comments. Jun 5 2023, 8:45 AM
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
1526

This change is not about modeling the cost of the instruction - that should already be accounted for in the existing code. It's about artificially biasing the cost so as to influence the primary consumer against unprofitable decisions.

If you believe BaseCost is too low, please feel free to suggest an alternate patch with a better explanation of why you believe that's the case.

ABataev added inline comments. Jun 5 2023, 9:32 AM
llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll
203

Then it requires changes in the node building algorithm. Need to check that the vector node is legal/will be vectorized; if it is not legal/will be scalarized, build a gather node instead of a vector node.

reames added inline comments. Jun 5 2023, 9:38 AM
llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll
203

Not sure what you meant here. Can you expand/clarify? In particular, was this meant as a requested change to this review? Or as a side comment?

ABataev added inline comments. Jun 5 2023, 9:43 AM
llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll
203

It is the correct fix. Need to change/fix the build-node algorithm. Need to check if it is legal to build a vector node (like in your example for expf). If it is not, change the node to a gather node, not a vector node.

ping

@ABataev To confirm, your last comment was about potential future SLP work, and non-blocking for this change?

@ABataev ping?

Check D154738

I went and dug through this patch. I really don't think your patch and mine are solving the same problem at all. Yours appears to be focused on whether we should have vectorized an operation based on its legality. Mine is focused on biasing vectorization away from *legal* but *unprofitable* partial vectorization scenarios. There's definitely an interaction here, but the approach is quite different.

They are solving the same problem. Just your patch prevents vectorization at all because of the high cost of insert/extract instructions, while mine limits the graph to the nodes which won't be scalarized later. Thus, it simply throws scalarized vector instructions out of the graph and does not even try to vectorize them or estimate their vectorization cost (which is not vectorization, but scalarization instead).

reames abandoned this revision. Aug 4 2023, 11:38 AM

Abandoning a change which is stuck in review.

In related activity, no one has screamed loudly about regressions since SLP was enabled. I've also found another set of opportunities to improve insertelement lowering. Putting those two together with the stalled review progress, this isn't important enough to continue with.