This change is motivated by https://reviews.llvm.org/D71053 where we add suport for evaluating overhead of runtime checks.
Currently we may reuse existing block for vector preheader. Thus we endup with instructions not related to vectorization in vector preheader. With this change we will reuse existing preheader only if it has single instruction (should be a branch) and will generate new preheader in all other cases. That makes it easier to read the IR. What more importantly it enables us to evaluate cost of vector prehader in a more convinient way.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 42639 Build 43166: arc lint + arc unit
Event Timeline
This change introduces a number of new branches in the code. While not necessarily the hottest code, it may prove significant on short loops, especially nested loops. Without evidence in the form of benchmarks etc. it's hard to justify.
Furthermore, I'm not sure I agree with the statement that this makes the analysis somehow simpler. Do you have an explicit goal in mind?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2687 | Are you sure this is semantically valid on all targets? |
These extra branches are very short leaved and optimized out by SimplifyCFG which happens just 3 passes after the vectorizer. Thus I don't believe it can cause any harm to the performance. I double check that LLVM's test-suite has no regressions.
Furthermore, I'm not sure I agree with the statement that this makes the analysis somehow simpler. Do you have an explicit goal in mind?
This is definitely a subjective thing as there is no any formal metric to measure. IMHO having vectorizer generated code clearly separated from the rest makes IR more readable. It becomes easy to see whole structure of the vectorizer generated code. Having sad that it's surely not the main motivation for the change. The aim here is to simplify next patch (https://reviews.llvm.org/D71053) which benefits from being able to observe vectorizer generated code without doing extra book keeping.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2687 | That doesn't change existing semantics. It simply allows not to generate trivially dead code because otherwise there is an unconditional branch to vector loop. |
Sounds reasonable.
Thus I don't believe it can cause any harm to the performance. I double check that LLVM's test-suite has no regressions.
Regression in performance or not showing any of those branches in asm output?
Unfortunately, the test-suite benchmark is not a complete set of relevant programs. :(
This is definitely a subjective thing as there is no any formal metric to measure. IMHO having vectorizer generated code clearly separated from the rest makes IR more readable. It becomes easy to see whole structure of the vectorizer generated code.
Just to keep in mind, in the past, steps towards generating "better looking" IR have almost always lead us to poorer codegen. Usually, we consider this to be a weak reason, if one at all, and one that could have unintended consequences.
So, if your patch has other properties, it's best to focus on them.
Having sad that it's surely not the main motivation for the change. The aim here is to simplify next patch (https://reviews.llvm.org/D71053) which benefits from being able to observe vectorizer generated code without doing extra book keeping.
It'd be interesting to know if this patch would also make other analises easier, too. That way, even if your following patch doesn't land, this one can benefit other parts of the vectoriser.
Can you think of some existing infrastructure in the vectoriser that would be easier with this change?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2687 | Ah, my bad. I didn't see the condition invertion. |
Are you sure this is semantically valid on all targets?