When computing the scalar cost of a predicated block, the loop vectorizer by default treats the predication as a sign of if-conversion and divides the block's scalar cost by 2, assuming the block executes only half the time. This makes no sense, however, when the predication was introduced only to tail-predicate the loop.
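To make the difference concrete, here is a minimal standalone sketch of the halving heuristic. It is not the actual LoopVectorize.cpp code: `BlockInfo`, `needsPredication`, `isConditionallyExecuted`, and both `scalarBlockCost*` functions are hypothetical names invented for illustration. The only thing taken from the summary above is the divide-by-2 behaviour.

```cpp
// Hypothetical stand-in for what a cost model knows about a basic block.
// None of these names come from LLVM; they exist only for this sketch.
struct BlockInfo {
  bool ConditionalInOriginalLoop; // guarded by a condition in the source loop
  bool MaskedForTailFolding;      // predicated only because the tail is folded
};

// Broad query: true for either reason a block might end up predicated.
constexpr bool needsPredication(const BlockInfo &B) {
  return B.ConditionalInOriginalLoop || B.MaskedForTailFolding;
}

// Narrow query: true only when the block really is conditionally executed.
constexpr bool isConditionallyExecuted(const BlockInfo &B) {
  return B.ConditionalInOriginalLoop;
}

// Assumed probability model: a conditional block runs half the time.
constexpr unsigned ReciprocalBlockProb = 2;

// Behaviour described as the default: any predication halves the scalar cost.
constexpr unsigned scalarBlockCostOld(unsigned RawCost, const BlockInfo &B) {
  return needsPredication(B) ? RawCost / ReciprocalBlockProb : RawCost;
}

// Intended behaviour: halve only blocks that are conditional in the scalar
// loop; a block that is merely tail-predicated runs on every iteration.
constexpr unsigned scalarBlockCostNew(unsigned RawCost, const BlockInfo &B) {
  return isConditionallyExecuted(B) ? RawCost / ReciprocalBlockProb : RawCost;
}
```

With a raw cost of 10, a block that is predicated only for tail folding costs 5 under the old rule and 10 under the new one, while a genuinely conditional block costs 5 under both. Underestimating the scalar cost in the first case makes the scalar loop look cheaper than it is when it is compared against the vector loop.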
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can you add the test separately and then just have the diff of the cost-model change in this patch?
llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll

| Line | Comment |
|---|---|
| 1 | Are the lines actually auto-generated? They look like just a subset of the debug output. |
| 8 | nit: are dso_local, nocapture, readnone and local_unnamed_addr actually needed? |
| 103 | Are all those attributes needed? Can you limit them to the minimum required? |
Yep, that's the plan. It will need a number of other patches though. I think one for adding a cost to predicated blocks is important, along with the one for costing masked loads more correctly under MVE.
LGTM, thanks!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

| Line | Comment |
|---|---|
| 3 | It might be worth adding a note here that we intentionally do not use LoopVectorizationCostModel::blockNeedsPredication, since it also returns true if predication is required due to folding the tail by masking. |
Thanks. We need to be careful with this as it has the potential to cause some regressions, especially as we here (in MVE land) make heavy use of predication nowadays. I am working through some of the details.
@dmgreen I'm not sure this patch is the real root cause of the loop vectorizer crash in https://bugs.llvm.org/show_bug.cgi?id=48564, but reverting it eliminates the crash.
I would appreciate it if you could look into the bug.
Will do. I expected some fallout from this one.
Thanks for the report. I'll try to take a look.