Judging by the existing comments, this was the intention, but the
transform never actually checked if the existing phi's would be removed.
See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where
this causes much worse code generation on AMDGPU.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 42207 Build 42622: arc lint + arc unit
Event Timeline
I'm not sure if these patches conflict, but D71164 is also trying to fix a bug in this code.
I agree that we shouldn't be increasing instruction count, especially PHI count.
llvm/test/Transforms/InstCombine/pr44242.ll | ||
---|---|---|
1 | Please autogenerate and precommit tests. |
llvm/test/Transforms/InstCombine/pr44242.ll | ||
---|---|---|
1 | Sorry, but what exactly do you mean? |
Change looks fine to me. I'd suggest to add test coverage for a few more of the cases you check. In particular, right now you only cover the case where the outer phi has extra uses, but not the case where an inner phi has them.
llvm/test/Transforms/InstCombine/pr44242.ll | ||
---|---|---|
1 | You can use utils/update_test_checks.py to generate the expected output. Then you should first commit just the test (with generated output without your patch). The patch will then only show the IR diff in the test, making is clearer what actually changed. |
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp | ||
---|---|---|
2203 | I think we should be doing those checks here. |
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp | ||
---|---|---|
2203 | The OldPhiNodes.count(PHI) == 0 part of the check wouldn't work correctly at this points -- it needs all PHIs to be collected first. |
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp | ||
---|---|---|
2203 | Right. But the rest can be done here, no? |
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp | ||
---|---|---|
2203 | While it could be done, I don't think there'd be much benefit. We'd have to iterate all the users a second time just to check the phi nodes. The current implementation is also nicely symmetric between the checking loop (added in this patch) and the replacement loop (preexisting), and I think it's worth keeping that symmetry. |
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp | ||
---|---|---|
2203 | I don't have a strong opinion *here*, but in general the earlier we error-out |
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp | ||
---|---|---|
2203 | I'd agree with @nikic here that this sounds like a premature optimization. It would split up the two parts which must be in sync into three, and break the symmetry between them, making it even harder to verify that the already-complex transform is correct, all for a theoretical performance benefit. I can do it if you require it, but otherwise I'd rather not. |
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp | ||
---|---|---|
2300–2310 | In a Release build I get "warning: unused variable" for TyA and TyB. |
- Update precommitted test in this commit.
- Make the test match the original bug better. It turns out that in my attempt to make it harder for other transforms to happen beforehand and break the test, I accidentally made another transform kick in which broke it anyways. Use this form with a loop which should hopefully defeat other optimizations better.
llvm/test/Transforms/InstCombine/pr44242.ll | ||
---|---|---|
1 | Hopefully I did that right. |
I just asked earlier this week to get my commit rights back, but I haven't heard back right away, so you can commit it if I don't get it first.
I think we should be doing those checks here.