Currently the MaxUsesToExplore limit only applies to the number of users
per value, not the total number of users to explore.
The current limit of 20 pessimizes IR with opaque pointers in some
cases. Without opaque pointers, we have deeper pointer def-use chains in
general due to extra bitcasts and geps for structs with index 0.
With opaque pointers the def-use chains are not as deep but wider, because
the bitcasts and zero-index geps are gone.
To improve the situation for opaque pointers, this patch does 2 things:
- Apply the limit to the total number of uses visited. From the wording in the description of the option it seems like this may be the original intention. With the current implementation we could still end up walking a lot of uses.
- Increase the limit to 100. This is quite arbitrary, but enables a good number of additional optimizations.
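
To make the difference between the two interpretations of the limit concrete, here is a minimal sketch on a toy def-use graph. It is not the code touched by this patch and does not use LLVM's Value/Use classes; `ToyGraph`, `WalkResult`, `walkPerValueLimit`, `walkTotalLimit` and `makeWideGraph` are hypothetical names made up for illustration, and the exact bail-out condition in the real implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy def-use graph: Users[V] lists the values that use value V.
// Illustration only; this is not LLVM's Value/Use representation.
struct ToyGraph {
  std::vector<std::vector<std::size_t>> Users;
};

// Whether the walk ran to completion, and how many uses it looked at.
struct WalkResult {
  bool Completed;
  std::size_t UsesVisited;
};

// Per-value interpretation: bail out only if a single value has more than
// Limit users. The total number of uses walked is unbounded.
WalkResult walkPerValueLimit(const ToyGraph &G, std::size_t Root,
                             std::size_t Limit) {
  assert(Root < G.Users.size() && "root must be a valid value index");
  std::vector<std::size_t> Worklist{Root};
  std::unordered_set<std::size_t> Visited{Root};
  std::size_t UsesVisited = 0;
  while (!Worklist.empty()) {
    std::size_t V = Worklist.back();
    Worklist.pop_back();
    if (G.Users[V].size() > Limit)
      return {false, UsesVisited};
    for (std::size_t U : G.Users[V]) {
      ++UsesVisited;
      if (Visited.insert(U).second)
        Worklist.push_back(U);
    }
  }
  return {true, UsesVisited};
}

// Total interpretation: bail out once Limit uses have been visited overall.
WalkResult walkTotalLimit(const ToyGraph &G, std::size_t Root,
                          std::size_t Limit) {
  assert(Root < G.Users.size() && "root must be a valid value index");
  std::vector<std::size_t> Worklist{Root};
  std::unordered_set<std::size_t> Visited{Root};
  std::size_t UsesVisited = 0;
  while (!Worklist.empty()) {
    std::size_t V = Worklist.back();
    Worklist.pop_back();
    for (std::size_t U : G.Users[V]) {
      if (UsesVisited == Limit)
        return {false, UsesVisited};
      ++UsesVisited;
      if (Visited.insert(U).second)
        Worklist.push_back(U);
    }
  }
  return {true, UsesVisited};
}

// Shallow but wide graph: value 0 has Width users, and each of those has
// Width leaf users of its own.
ToyGraph makeWideGraph(std::size_t Width) {
  ToyGraph G;
  G.Users.resize(1 + Width + Width * Width);
  std::size_t NextLeaf = 1 + Width;
  for (std::size_t I = 0; I < Width; ++I) {
    G.Users[0].push_back(1 + I);
    for (std::size_t J = 0; J < Width; ++J)
      G.Users[1 + I].push_back(NextLeaf++);
  }
  return G;
}
```

On `makeWideGraph(10)` with a limit of 20, the per-value walk never trips the limit (no value has more than 10 users) and ends up visiting all 110 uses, while the total-limit walk stops after 20. This is the "could still end up walking a lot of uses" case from the first bullet.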
Those adjustments have a noticeable compile-time impact though. Part of
that is likely due to the additional transformations they enable
(conversely, the current baseline misses optimizations after switching to
opaque pointers). Compile-time changes for different limits:
Limit=100:
- NewPM-O3: +0.15%
- NewPM-ReleaseThinLTO: +0.86%
- NewPM-ReleaseLTO-g: +0.44%
Limit=60:
- NewPM-O3: +0.14%
- NewPM-ReleaseThinLTO: +0.41%
- NewPM-ReleaseLTO-g: +0.21%
Limit=40:
- NewPM-O3: +0.11%
- NewPM-ReleaseThinLTO: +0.12%
- NewPM-ReleaseLTO-g: +0.09%
I'll add a test once we converge on an approach. I'd be more than happy to
discuss alternatives as well.
Does moving this after Visited.insert() below help at all? I'd think that going through the worklist below is the expensive part, not constructing the worklist.