Now that PR33325 is fixed, this should always improve the generated code.
Details
- Reviewers
spatel hfinkel nemanjai timshen - Commits
- rG10003e31f47c: [MergeICmps] Re-commit rL324317 "Enable the MergeICmps Pass by default."
rG7d09780fa205: [MergeICmps] Enable the MergeICmps Pass by default.
rL324465: [MergeICmps] Re-commit rL324317 "Enable the MergeICmps Pass by default."
rL324317: [MergeICmps] Enable the MergeICmps Pass by default.
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 14546 Build 14546: arc lint + arc unit
Event Timeline
Can you create at least one llc test each for x86 and PPC, so we see the result of merging + expansion together? This could just be copying existing tests from test/Transforms/MergeICmps?
Done. Can you double check the PPC test ? It does make sense to me, but I'm not an expert.
Thanks. This test looks fine to me as PPC given the 'align 4' on the loads. I recommend checking these tests in now, so we just see the improvements from enabling MergeICmps when this patch is committed.
For reference, the current PPC codegen is something like this:
# %bb.0: # %entry lwz 5, 0(3) lwz 6, 0(4) cmplw 5, 6 bne 0, .LBB0_2 # %bb.1: # %land.rhs.i lwz 3, 4(3) lwz 4, 4(4) cmpw 3, 4 b .LBB0_3 .LBB0_2: crxor 2, 2, 2 .LBB0_3: # %opeq1.exit li 3, 0 li 4, 1 isel 3, 4, 3, 2 blr
My concern is that I don't see consideration for misalignment when combining to larger ops. We generally don't care on x86, but it could be a problem for some PPC? This may be a non-issue if some other pass is expected to split misaligned memops later.
Thanks!
Thinking about this a bit more, since MergeICmps and MemCmpExpansion use the same TLI hook, this shouldn't be a problem unless there's already a bug in PPC's enabling of that hook. Given that MemCmpExpansion was originally written with a PPC target in mind, there's not much risk of that.
So I think this patch is fine now, but let's give someone who's closer to current-day PPC metal a chance to reply.
The PPC code looks way nicer. The alignment is fine as well - unaligned scalar loads are cheap on modern PPC HW and the loads are in bounds. Of course, I have no comment on the X86 code as I don't know enough about the target.
LGTM.
Just wanted to also mention that this passes a double bootstrap build/lit/lnt on PPC.
FWIW, this broke compilation of Qt for x86 (at least on windows, and the reduced example also triggers for other x86 targets), see PR36557 for details. Do you have time to look at it sometime soon, or is it appropriate to revert this until that is sorted out?