This is an archive of the discontinued LLVM Phabricator instance.

[PeepholeOptimizer] Enhance the redundant COPY elimination.
ClosedPublic

Authored by hliao on Sep 18 2020, 1:25 PM.

Details

Summary
  • Eliminate redundant COPYs from the same register & subregister pair.

Diff Detail

Event Timeline

hliao created this revision.Sep 18 2020, 1:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2020, 1:25 PM
hliao requested review of this revision.Sep 18 2020, 1:25 PM
hliao added a comment.Sep 18 2020, 1:29 PM

This patch enhances the peephole-opt to fix the redundant copy issues once to be fixed in D87556. With the enhancement, we could remove that redundant COPY locally. Test cases are revised due to the code quality improvement or change. Fortunately, AMDGPU and ARM tests need addressing that difference.

hliao added inline comments.Sep 18 2020, 1:34 PM
llvm/test/CodeGen/Thumb2/mve-vcvt16.ll
21–30

The code sequence is totally different. But, based on my understanding ARM ISA, they are equivalent. The previous one will copy q0 to q2 and convert s8~s11 (alias to q2) into s0~s7 (alias to q0 and q1) as the return value. The new one firstly convert s0~s3 (alias to q0 as the input) to s4~s11 (alias to q1 and q2) followed by moving q2 to q0 to form the return pair of q0 and q1. Please let me know whether they are really equivalent.

dmgreen added inline comments.Sep 18 2020, 2:02 PM
llvm/test/CodeGen/Thumb2/mve-vcvt16.ll
21–30

Yeah sounds fine.

hliao updated this revision to Diff 293155.Sep 21 2020, 6:32 AM

Rebase.

qcolombet requested changes to this revision.Sep 21 2020, 10:58 AM

Hi @hliao,

I must be missing something, but it feels to me that this patch is actually making the situation worse.

Could you look at my example inlined below and explain how it would still work with this patch?

Cheers,
-Quentin

llvm/lib/CodeGen/PeepholeOptimizer.cpp
1409

I am not convinced this patch is an improvement at removing redundant copy instructions.

It seems to me we would only consider redundant copies for matching sub register whereas previously we would have propagated whole copies.
E.g., previously for

%1 = COPY %0
%2 = COPY %0:sub1

We would have replaced %2 with %1:sub1, but unless I miss something, with this new patch, we won't see %2 as a redundant copy anymore.

This revision now requires changes to proceed.Sep 21 2020, 10:58 AM

Hi @hliao,

I must be missing something, but it feels to me that this patch is actually making the situation worse.

Could you look at my example inlined below and explain how it would still work with this patch?

Cheers,
-Quentin

The comment seems outdated, if my understanding is right, and even the original code cannot perform that change since, once the 2nd COPY with same source is found in L1407, the check @ L1419 just skips that earlier as the 1st COPY has no subreg and the 2nd COPY has sub1.

The comment seems outdated, if my understanding is right, and even the original code cannot perform that change since, once the 2nd COPY with same source is found in L1407, the check @ L1419 just skips that earlier as the 1st COPY has no subreg and the 2nd COPY has sub1.

Good point!

Now, I am wondering why is this change not just NFC then?

The comment seems outdated, if my understanding is right, and even the original code cannot perform that change since, once the 2nd COPY with same source is found in L1407, the check @ L1419 just skips that earlier as the 1st COPY has no subreg and the 2nd COPY has sub1.

Good point!

Now, I am wondering why is this change not just NFC then?

It does improve the original by considering the COPY from the same subreg.

BTW, the comment seems ambiguous. It may read that that subreg extract won't be eliminated but handled somewhere else. @arsenm any comment?

qcolombet accepted this revision.Sep 21 2020, 12:12 PM

It does improve the original by considering the COPY from the same subreg.

I see. Previously we would have stopped on the first non-subreg copy.

Makes sense.

Thanks for taking the time to explain it.

Make sure the commit message explains all this. LGTM otherwise.

Cheers,
-Quentin

This revision is now accepted and ready to land.Sep 21 2020, 12:12 PM
arsenm accepted this revision.Sep 22 2020, 6:36 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
301

I'm surprised there are so many changes here, but I guess it's probably papering over a number of combines that are missing

hliao added inline comments.Sep 22 2020, 7:02 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
301

for this 3 global isel tests, I compared the new assembly against the old one, all the redundant copies are removed and overall code lengths are reduced.

This revision was automatically updated to reflect the committed changes.