The patch fixes bug 26957, enabling unroll by non-power-of-2 counts if they are explicitly set in "pragma unroll".
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
736–739 ↗ | (On Diff #50782) | Why do we need to move this? |
lib/Transforms/Utils/LoopUnrollRuntime.cpp | ||
354–355 ↗ | (On Diff #50782) | This change is unnecessary. |
365–367 ↗ | (On Diff #50782) | What is the final expression for ModVal? From the code it looks like ModVal = ((BECount % Count) + 1) % Count Do we really need double urem here? If so, could you please add a comment explaining why we need exactly this? |
368 ↗ | (On Diff #50782) | Nitpick: dot missing at the end. |
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
736–739 ↗ | (On Diff #50782) | This is better place for the check. Previously "nounroll" attribute was even if we failed to unroll loop. There are plenty of checks inside UnrollLoop that can stop unrolling. |
lib/Transforms/Utils/LoopUnrollRuntime.cpp | ||
354–355 ↗ | (On Diff #50782) | As now the comment is inside "if" it is shifted on 2 spaces. That causes longest string in it to become 81 symbols. So I just make it shorter than 81. |
365–367 ↗ | (On Diff #50782) | Potentially BECount + 1 can unsigned overflow, while (BECount % Count) +1 can not. That is why double URem is used. I can try to simplify this to something like this ModVal = ((Count - ModValAdd) >> (ModVal->getSclarType()->getPrimitiveSizeInBits() - 1) & ModValAdd ``` or to select here or somewhere in further combiners. However non-power-of-2 is not frequent case as for runtime unrolling it comes only from user specified pragma. |
Minor drive-by comment.
lib/Transforms/Utils/LoopUnrollRuntime.cpp | ||
---|---|---|
365–367 ↗ | (On Diff #50782) | Alternatively, you could have ModVal as BECount == -1 ? <constant expr> : ((BECount +nuw 1) % Count. |
lib/Transforms/Utils/LoopUnrollRuntime.cpp | ||
---|---|---|
365–367 ↗ | (On Diff #50886) | Yes. That is what I mean by select. I would prefer combiner to decide the correct replacement (maybe based on some architecture properties). The simplification of TripCount % Count to TripCount & (Count - 1) when Count is power-of-2 is pretty obvious and architecture independent. This one is more complicated: ((BECount % Count) + 1) % Count. |
Hi,
This patch LGTM, thanks for working on this!
Michael
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
736–739 ↗ | (On Diff #50886) | But maybe it's not a bad idea to have a 'nounroll' attribute on a loop that we cannot unroll:) Otherwise, next time we'll just probably spend some time trying again. |
lib/Transforms/Utils/LoopUnrollRuntime.cpp | ||
354–355 ↗ | (On Diff #50886) | Oh, makes sense. Phabricator doesn't show whitespace changes, so I missed the change in indentation. |
365–367 ↗ | (On Diff #50886) | Thanks for the explanation, it's clear now. I think both BECount == -1 ? <constant expr> : ((BECount +nuw 1) % Count and ((BECount % Count) + 1) % Count are good options here. |