I wanted to be able to do some partial unrolling on my target, as well as limit the unroll counts for full unrolling. This meant the behavior I wanted (which seemed quite reasonable to ask for!) looks like this:
- If doing full unrolling, use threshold X, and don't go over A iterations unrolled.
- If we don't meet the requirements for full unrolling, use threshold Y and don't go over A iterations unrolled. Also, pick an unroll count that evenly divides the trip count.
- Don't do runtime unrolling.
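As a rough illustration of the policy above (a standalone sketch, not code from this patch or from LLVM), the selection logic might look like the following. The `UnrollLimits` struct, its field names, and `chooseUnrollCount` are hypothetical; the two caps are kept as separate fields so a target could set them independently, and the size model (unrolled size = count times loop size) is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical limits for illustration only; this is not an LLVM struct.
struct UnrollLimits {
  unsigned FullThreshold;    // "threshold X": size budget for full unrolling
  unsigned FullMaxCount;     // cap on iterations for full unrolling ("A")
  unsigned PartialThreshold; // "threshold Y": size budget for partial unrolling
  unsigned PartialMaxCount;  // cap on iterations for partial unrolling
};

// Returns the unroll count: TripCount means full unrolling, 1 means no
// unrolling. TripCount == 0 stands for "unknown at compile time".
unsigned chooseUnrollCount(unsigned TripCount, unsigned LoopSize,
                           const UnrollLimits &L) {
  assert(LoopSize > 0 && "loop size must be positive");

  // No runtime unrolling: an unknown trip count is left alone.
  if (TripCount == 0)
    return 1;

  // Full unrolling: both the iteration cap and the size budget must hold.
  unsigned long long FullSize =
      static_cast<unsigned long long>(TripCount) * LoopSize;
  if (TripCount <= L.FullMaxCount && FullSize <= L.FullThreshold)
    return TripCount;

  // Partial unrolling: clamp to the iteration cap and the size budget...
  unsigned Count = std::min(TripCount, L.PartialMaxCount);
  Count = std::min(Count, L.PartialThreshold / LoopSize);

  // ...then step down until the count evenly divides the trip count.
  while (Count > 1 && TripCount % Count != 0)
    --Count;

  return Count == 0 ? 1 : Count;
}
```

With limits of `{100, 8, 60, 8}`, a loop of size 10 with trip count 4 is fully unrolled, while the same loop with trip count 10 is too long for full unrolling and falls back to a partial count of 5, the largest value within both partial limits that divides 10.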
Unfortunately, I ran into three problems, which this patch fixes (comments welcome if there are better ways to fix them).
- There's no way to limit the number of iterations for full unrolling -- only for partial/runtime unrolling. So I added that.
- A bug in partial unrolling causes it to skip reducing the count to one that evenly divides the trip count if the PartialThreshold is already met. So I fixed that. I'm not sure if this bug can trigger without the first change above, though.
- A bug in partial unrolling causes it to ignore MaxCount, even though MaxCount says it applies to everything but full unrolling. So I fixed that.
(Use case: our target, a GPU, can [in TTI] roughly estimate the number of high-latency operations, like loads and texture reads, and use that number to judge how much unrolling is worthwhile. But to do that, we need to be able to put a cap on full unrolling separate from the overall cost threshold.)