Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow. This gives us a 32-bit multiply instead of an
emulated 64-bit multiply in the generated PTX assembly.
Details
- Reviewers: jlebar
- Commits:
  - rGaa92cae14e99: [BypassSlowDivision] Improve our handling of divisions by constants
  - rGeda7a86d42ff: [BypassSlowDivision] Improve our handling of divisions by constants
  - rL319677: [BypassSlowDivision] Improve our handling of divisions by constants
  - rL314253: [BypassSlowDivision] Improve our handling of divisions by constants
Event Timeline
lib/Transforms/Utils/BypassSlowDivision.cpp | ||
---|---|---|
359 | The divisor also stays a constant in the other two cases, so is this really the thing that makes us want to do this transformation but not the other ones when the divisor is a constant? |
lib/Transforms/Utils/BypassSlowDivision.cpp | ||
---|---|---|
359 | The way I worked this out in my head is that, for the optimization to be worth it, the perf improvement from doing a shorter op must outweigh the perf regression due to the added control flow. When the op is a divide, the assumption is that this tradeoff is worth it; when the op is a multiply (division by a constant), it is not. However, if we can narrow the op (divide or multiply) without any control flow, then there is no tradeoff: we should always narrow the op. |
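To make the "no control flow" case from the reply above concrete, here is a minimal C++ sketch of the equivalence the narrowing relies on. It is not the code in BypassSlowDivision.cpp; the names `kDivisor`, `div_wide`, and `div_narrow` are hypothetical, and the sketch assumes the dividend is known to fit in 32 bits (here because it is zero-extended from a `uint32_t`) and that the constant divisor also fits in 32 bits.

```cpp
#include <cstdint>

// Hypothetical constant divisor; assumed to fit in 32 bits.
constexpr uint64_t kDivisor = 7;

// The division as written at the source level: a 64-bit unsigned divide
// by a constant. Without narrowing, a backend would typically lower this
// to a 64-bit multiply by a magic constant.
uint64_t div_wide(uint32_t x) {
  return static_cast<uint64_t>(x) / kDivisor;
}

// The narrowed form: both operands are known to fit in 32 bits, so the
// divide can be done in 32 bits and the result zero-extended. No runtime
// check (and therefore no branch) is needed to prove this is safe.
uint64_t div_narrow(uint32_t x) {
  return static_cast<uint64_t>(x / static_cast<uint32_t>(kDivisor));
}
```

Both functions return the same value for every `uint32_t` input, which is why this case has no tradeoff: the narrower op is obtained without paying for a branch. The cases that do need a runtime check on the operands are the ones where the constant-divisor bail-out still makes sense.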