User Details
- User Since
- Jun 27 2023, 10:34 PM (22 w, 4 d)
Sep 24 2023
Sep 21 2023
Polish comments.
Sep 20 2023
Keep order when updating dom-tree.
Sep 19 2023
Polish comment and fix when to print log.
Reduce irrelevant formatting
Sep 8 2023
Sep 6 2023
ping.
Sep 4 2023
Insert use function into jump-table-islands.ll to avoid unrelated transform.
Sep 1 2023
Simplify LazyGetIndex function, and ready to be reviewed.
Drop unnecessary formatting.
Aug 31 2023
Remove unnecessary helper function
Aug 28 2023
LGTM. May commit it if no changes are requested by others in one day.
Aug 12 2023
ping.
Aug 7 2023
Eliminate boundary-check.
Aug 4 2023
Update tests and FIXME: Boundary-check
Aug 2 2023
Aug 1 2023
Already limit it to one variable. But I don't know why it is unlikely to be useful. From my perspective For multi-dimensional arrays, it's possible to have more than one index variable.
Complete basically correct fold.
Jul 31 2023
Update tests.
Update tests
Extends foldCmpLoadFromIndexedGlobal to fold.
Extend foldCmpLoadFromIndexedGlobal and use load-cmp.ll as test file.
Update AArch64's codegen tests.
Jul 30 2023
[Clang-format]
It seems perform correctly and well in most cases.
Jul 29 2023
Nice work to me! Just some nits here.
Can you create two patches to better show the difference before/after transform?
The first patch includes your unoptimized testcases without the fold. (Your first commit)
The second patch includes your optimized testcases and your code for the fold. (Your second commit, which is based on your first commit)
And then let the first patch be the parent revision of the second patch through "Edit Related Revisions".
Update codesize-loop.ll
Add test for generalization.
- Add complete positive tests
- Fix bugs about preserving domtree
Jul 28 2023
- Refactor TryToSimplifyUncondBranchFromEmptyBlock to merge phis between BB and Succ with a common predecessor. Logic correctly embedded (OK in Local tests), but not efficient currently.
- A regression found, in switch-simplify-crash2.ll. But InstCombine would solve it.
- Testcases not added now, incoming. And ARM/jump-islands-table.ll doesn't test well still.
Jul 27 2023
Add -arm-atomic-cfg-tidy=0 option.
Sure, something like FoldTwoEntryPHINode/HoistThenElseToIf may miss/delay. Or some Phi to Select may be delayed/sunk. I haven't constructed counter-examples here. (Maybe it's my intuitive fault, I actually constructed some examples optimized well).
I think the probitability can be tested by the existing testcases here. I'll try to generalize TryToSimplifyUncondBranchFromEmptyBlock to perform the tests.
I think even if we don't always perform the transform, we should still express it generically, and then have a profitability heuristic on top: E.g. say that the one common predecessor must have a switch terminator. That way we can clearly separate correctness from profitability requirements.
I agree with you. The generalization is indeed necessary. It may take some time to complete it. At that time can we determine what profitability heuristic shoud be.
Edit: Just for reference, this is an example of the non-switch CFG I have in mind where such a transform should be possible: https://gist.github.com/nikic/8f22ef4e366fd3041c7e150c228f9673
Thanks! That could serve as a testcase to be expanded/reduced.
Emphasize ordering.
Jul 26 2023
Omit jump-table-islands.ll test temporarily. (Sorry, I'm not familiar with ARM and unable to update it)
Contact related committers to update if this patch is approved.
Jul 25 2023
- Fix the nits
- Change the order of checks to increase efficiency.
Here is an optimized example related to the lookup table optimization in the issue in summary:
add 1
Jul 24 2023
- Fix the nits
- Clear the logic
Thanks for your review! I understand your framing and it clear my mind. This framing is much general, not just for switch. And as you analyse, actually, a large part of the logic of this patch is copied from TryToSimplifyUncondBranchFromEmptyBlock, specified for switch. I restrict this fold in a narrow situation. Though this transformation is not directly related to switch, but I didn't apply such mergence to all BasicBlocks, since it's not always worthwhile as you say.
- Add m_OneUse guard to zext(icmp).
Jul 23 2023
- Fix all the nits.
- Extract the logic of canMergePhisOfCase
Maybe you need to create two patch.
The first patch includes your unoptimized testcases without the fold. (Your first commit)
The second patch includes optimized testcases and the code for the fold. (Your second commit, which is based on your first commit)
And then you should let the first patch be the parent revision of the second patch through "Edit Related Revisions".
- Apply m_OneUse guard to lshr.
- Fix nits
ping.
Jul 22 2023
Fix the bugs of merging the phis from the BasicBlock out of switch.
Jul 21 2023
- Preserve domtree
Avoid infinite loops
- Avoid infinite loops
Jul 20 2023
Jul 18 2023
Produce zext(bitwise(icmp.icmp))
Jul 16 2023
ping.
Limit the fold for multiuse cases
Jul 14 2023
Add xor tests
Jul 13 2023
I update the diff to ensure that we fold exactly what can be folded by foldXorOfICmps and foldAndOrOfICmps.
Jul 12 2023
[clang-format]