This is an archive of the discontinued LLVM Phabricator instance.

[LV] Create new vector loop preheader so it contains vectorizer generated code only.
Needs ReviewPublic

Authored by ebrevnov on Dec 10 2019, 3:47 AM.

Details

Summary

This change is motivated by https://reviews.llvm.org/D71053 where we add suport for evaluating overhead of runtime checks.
Currently we may reuse existing block for vector preheader. Thus we endup with instructions not related to vectorization in vector preheader. With this change we will reuse existing preheader only if it has single instruction (should be a branch) and will generate new preheader in all other cases. That makes it easier to read the IR. What more importantly it enables us to evaluate cost of vector prehader in a more convinient way.

Event Timeline

ebrevnov created this revision.Dec 10 2019, 3:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2019, 3:47 AM

This change introduces a number of new branches in the code. While not necessarily the hottest code, it may prove significant on short loops, especially nested loops. Without evidence in the form of benchmarks etc. it's hard to justify.

Furthermore, I'm not sure I agree with the statement that this makes the analysis somehow simpler. Do you have an explicit goal in mind?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2687

Are you sure this is semantically valid on all targets?

This change introduces a number of new branches in the code. While not necessarily the hottest code, it may prove significant on short loops, especially nested loops. Without evidence in the form of benchmarks etc. it's hard to justify.

These extra branches are very short leaved and optimized out by SimplifyCFG which happens just 3 passes after the vectorizer. Thus I don't believe it can cause any harm to the performance. I double check that LLVM's test-suite has no regressions.

Furthermore, I'm not sure I agree with the statement that this makes the analysis somehow simpler. Do you have an explicit goal in mind?

This is definitely a subjective thing as there is no any formal metric to measure. IMHO having vectorizer generated code clearly separated from the rest makes IR more readable. It becomes easy to see whole structure of the vectorizer generated code. Having sad that it's surely not the main motivation for the change. The aim here is to simplify next patch (https://reviews.llvm.org/D71053) which benefits from being able to observe vectorizer generated code without doing extra book keeping.

ebrevnov marked an inline comment as done.Dec 10 2019, 9:02 PM
ebrevnov added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2687

That doesn't change existing semantics. It simply allows not to generate trivially dead code because otherwise there is an unconditional branch to vector loop.

These extra branches are very short leaved and optimized out by SimplifyCFG which happens just 3 passes after the vectorizer.

Sounds reasonable.

Thus I don't believe it can cause any harm to the performance. I double check that LLVM's test-suite has no regressions.

Regression in performance or not showing any of those branches in asm output?

Unfortunately, the test-suite benchmark is not a complete set of relevant programs. :(

This is definitely a subjective thing as there is no any formal metric to measure. IMHO having vectorizer generated code clearly separated from the rest makes IR more readable. It becomes easy to see whole structure of the vectorizer generated code.

Just to keep in mind, in the past, steps towards generating "better looking" IR have almost always lead us to poorer codegen. Usually, we consider this to be a weak reason, if one at all, and one that could have unintended consequences.

So, if your patch has other properties, it's best to focus on them.

Having sad that it's surely not the main motivation for the change. The aim here is to simplify next patch (https://reviews.llvm.org/D71053) which benefits from being able to observe vectorizer generated code without doing extra book keeping.

It'd be interesting to know if this patch would also make other analises easier, too. That way, even if your following patch doesn't land, this one can benefit other parts of the vectoriser.

Can you think of some existing infrastructure in the vectoriser that would be easier with this change?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2687

Ah, my bad. I didn't see the condition invertion.