This is an archive of the discontinued LLVM Phabricator instance.

[PPC] Eliminate more compare instructions using record-form operation
ClosedPublic

Authored by inouehrs on Feb 16 2017, 9:44 PM.

Details

Summary

PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass.
This patch improves this optimization to eliminate more compare instructions in two types of common case.

comparison against a constant 1 or -1

The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant.
This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0".
With this patch, compare can be eliminated in the following code sequence, as an example.

uint64_t a, b;
if ((a | b) & 0x8000000000000000ull) { ... }
else { ... }

andi for 32-bit comparison on PPC64

Since record-form instructions execute 64-bit signed comparison and so we have limitation in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check.

%1 = and i32 %a, 10
%2 = icmp ne i32 %1, 0
br i1 %2, label %foo, label %bar

In this simple example, LLVM generates andi. + cmplwi + beq on PPC64.
This patch make it possible to eliminate the cmplwi for this case.
I added andi. for optimization targets if it is safe to do so.

Diff Detail

Event Timeline

inouehrs created this revision.Feb 16 2017, 9:44 PM
nemanjai added inline comments.Feb 22 2017, 6:25 AM
lib/Target/PowerPC/PPCInstrInfo.cpp
1601

I didn't really go through this function to really understand the purpose for this check, but presumably all these additional instruction opcodes are used for something. If the opcodes you're adding are used for something there should be associated test cases for all of the additional ones. If they're not used at all, then they shouldn't be added. For example, if they're just being added here and something additional needs to be done to exploit them, we should not add them but rather put in a FIXME to add them once we can also add the necessary support.

Thank you for the comment.
As you said, I did not confirm the cases for added opcodes other than andi.. So I will remove them.
I thought it is safe since they are aliases for special cases of rlwinm, which is already included as a target of optimization.

nemanjai edited edge metadata.Feb 23 2017, 2:49 PM

Thank you for the comment.
As you said, I did not confirm the cases for added opcodes other than andi.. So I will remove them.
I thought it is safe since they are aliases for special cases of rlwinm, which is already included as a target of optimization.

They may very well be safe. In fact, I assume they are since I'm sure you've run fairly thorough testing with this patch. But it would be good to understand the impact and try to write test cases that exercise these.

hfinkel edited edge metadata.Feb 23 2017, 3:02 PM

Thank you for the comment.
As you said, I did not confirm the cases for added opcodes other than andi.. So I will remove them.
I thought it is safe since they are aliases for special cases of rlwinm, which is already included as a target of optimization.

They may very well be safe. In fact, I assume they are since I'm sure you've run fairly thorough testing with this patch. But it would be good to understand the impact and try to write test cases that exercise these.

FWIW, it might be easier to write an MIR test for these than IR-level tests.

inouehrs updated this revision to Diff 89622.EditedFeb 24 2017, 1:38 AM
inouehrs edited the summary of this revision. (Show Details)

I eliminated andis. and rlwinm variants from optimization targets for safety.

inouehrs added inline comments.Feb 24 2017, 1:41 AM
lib/Target/PowerPC/PPCInstrInfo.cpp
1601

I eliminated instructions other than andi. Maybe I can add test cases, but anyway, I have never observed actual occurrences of these sequences. So, I disabled them for safety.

nemanjai requested changes to this revision.May 14 2017, 2:54 AM

The extension to the zero-extending opcodes and to comparison to 1/-1 (which are just "or-equal" comparisons against zero) seems perfectly fine to me. My concern is with the safety of optimizing the situation where the machine instruction feeding the compare resides in a different basic block than the compare. We are effectively hoisting the def of the CR field to a predecessor block, which would be fine if we add the respective vreg copies, etc. But the code that is added does not appear to do that. So what I'd like to see in the code (or comment) is how this issue is addressed.

lib/Target/PowerPC/PPCInstrInfo.cpp
1635

This change doesn't seem meaningful but it's harmless, so I guess it's OK.

1663

It isn't clear from this code how it addresses the concern in the comment. Besides, the comment should be updated if this code alleviates the cross-call concern.
Of course, I haven't checked all the code in this function but this code does not appear to do anything to address a situation where there might be a call between the instruction that will be converted to record form and the actual use of the CR field.

This revision now requires changes to proceed.May 14 2017, 2:54 AM
inouehrs updated this revision to Diff 99121.May 16 2017, 2:38 AM
inouehrs edited edge metadata.

Updated based on Nemanja's comments. Also rebased to the latest code base.

inouehrs added inline comments.May 16 2017, 2:43 AM
lib/Target/PowerPC/PPCInstrInfo.cpp
1635

In the first patch, I was reusing FoundUse outside the loop and so I moved the definition of FoundUse before the loop. The current version does not use this outside loop.

1663

I modified the code to be conservative on checking MI and CmpInstr are in the same BB. Now, the optimization on the comparison against 1 or -1 is applied only when Value != 0 and MI and CmpInstr are in the same BB.

jtony edited edge metadata.May 16 2017, 6:58 AM

In the end of the description, you mentioned "In this simple example, LLVM generates andi. + cmplwi + beq on PPC64" And after optimization, Does it become just andi. + beq? If not, can you put what is the actually assembly generated after your optimization in the description also? thanks!

Yes, it becomes just andi. + beq with this patch.
Without opt

	andi. 3, 3, 10
	cmplwi	 3, 0
	beq	 0, .LBB2_2

With opt

	andi. 3, 3, 10
	beq	 0, .LBB2_2
nemanjai accepted this revision.May 16 2017, 10:42 AM

LGTM with the minor nit about the comment addressed on the commit.

lib/Target/PowerPC/PPCInstrInfo.cpp
1675

Complete sentences with capitalization, punctuation, etc. in the comments please.

This revision is now accepted and ready to land.May 16 2017, 10:42 AM
inouehrs updated this revision to Diff 99252.May 17 2017, 12:13 AM

Updated comments. NFC from the previous patch.

timshen added inline comments.
lib/Target/PowerPC/PPCInstrInfo.cpp
1691

The name is confusing in this scope. If Value != 0, then it literally means The immediate is not zero.

Combined with the next comment, I suggest to use a better name - Success is even better IMO.

Alternatively you can wrap up this scope into a function/lambda, and use the return value. That will eliminate the Success variable even further.

1766

If we use (Sub && Value == 0) here, the scope of NonZeroImmed can be limited to the previous small scope else if (Value != 0) {, right?

It seems better to have fewer states in the larger scope.

timshen removed a subscriber: timshen.
inouehrs updated this revision to Diff 99396.May 17 2017, 11:13 PM

addressed timshen's advice on the name and scope of non-zero immediate flag

inouehrs marked 6 inline comments as done.May 17 2017, 11:19 PM
inouehrs added inline comments.
lib/Target/PowerPC/PPCInstrInfo.cpp
1691

I changed the name to Success and made its scope local.

1766

When SrcReg2 != 0, Value is not used and so not initialized.
To avoid this problem, I initialized Value with zero in analyzeCompare above.

inouehrs marked 2 inline comments as done.May 20 2017, 1:29 AM

@timshen Do you have further advice/comments? Thanks!

timshen edited edge metadata.May 20 2017, 1:55 PM

@timshen Do you have further advice/comments? Thanks!

LGTM.

Thanks!

This revision was automatically updated to reflect the committed changes.