- Eliminate redundant COPYs from the same register & subregister pair.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This patch enhances the peephole-opt to fix the redundant copy issues once to be fixed in D87556. With the enhancement, we could remove that redundant COPY locally. Test cases are revised due to the code quality improvement or change. Fortunately, AMDGPU and ARM tests need addressing that difference.
llvm/test/CodeGen/Thumb2/mve-vcvt16.ll | ||
---|---|---|
21–30 | The code sequence is totally different. But, based on my understanding ARM ISA, they are equivalent. The previous one will copy q0 to q2 and convert s8~s11 (alias to q2) into s0~s7 (alias to q0 and q1) as the return value. The new one firstly convert s0~s3 (alias to q0 as the input) to s4~s11 (alias to q1 and q2) followed by moving q2 to q0 to form the return pair of q0 and q1. Please let me know whether they are really equivalent. |
llvm/test/CodeGen/Thumb2/mve-vcvt16.ll | ||
---|---|---|
21–30 | Yeah sounds fine. |
Hi @hliao,
I must be missing something, but it feels to me that this patch is actually making the situation worse.
Could you look at my example inlined below and explain how it would still work with this patch?
Cheers,
-Quentin
llvm/lib/CodeGen/PeepholeOptimizer.cpp | ||
---|---|---|
1409 | I am not convinced this patch is an improvement at removing redundant copy instructions. It seems to me we would only consider redundant copies for matching sub register whereas previously we would have propagated whole copies. %1 = COPY %0 %2 = COPY %0:sub1 We would have replaced %2 with %1:sub1, but unless I miss something, with this new patch, we won't see %2 as a redundant copy anymore. |
The comment seems outdated, if my understanding is right, and even the original code cannot perform that change since, once the 2nd COPY with same source is found in L1407, the check @ L1419 just skips that earlier as the 1st COPY has no subreg and the 2nd COPY has sub1.
The comment seems outdated, if my understanding is right, and even the original code cannot perform that change since, once the 2nd COPY with same source is found in L1407, the check @ L1419 just skips that earlier as the 1st COPY has no subreg and the 2nd COPY has sub1.
Good point!
Now, I am wondering why is this change not just NFC then?
It does improve the original by considering the COPY from the same subreg.
BTW, the comment seems ambiguous. It may read that that subreg extract won't be eliminated but handled somewhere else. @arsenm any comment?
It does improve the original by considering the COPY from the same subreg.
I see. Previously we would have stopped on the first non-subreg copy.
Makes sense.
Thanks for taking the time to explain it.
Make sure the commit message explains all this. LGTM otherwise.
Cheers,
-Quentin
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll | ||
---|---|---|
301–303 | I'm surprised there are so many changes here, but I guess it's probably papering over a number of combines that are missing |
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll | ||
---|---|---|
301–303 | for this 3 global isel tests, I compared the new assembly against the old one, all the redundant copies are removed and overall code lengths are reduced. |
I am not convinced this patch is an improvement at removing redundant copy instructions.
It seems to me we would only consider redundant copies for matching sub register whereas previously we would have propagated whole copies.
E.g., previously for
We would have replaced %2 with %1:sub1, but unless I miss something, with this new patch, we won't see %2 as a redundant copy anymore.