This is an archive of the discontinued LLVM Phabricator instance.

[X86] Optimize umax(X,1)
ClosedPublic

Authored by kazu on Feb 21 2023, 12:16 AM.

Details

Summary

Without this patch:

%cond = call i32 @llvm.umax.i32(i32 %X, i32 1)

is compiled as:

83 ff 02                   cmp    $0x2,%edi
b8 01 00 00 00             mov    $0x1,%eax
0f 43 c7                   cmovae %edi,%eax

With this patch, the compiler generates:

89 f8                      mov    %edi,%eax
83 ff 01                   cmp    $0x1,%edi
83 d0 00                   adc    $0x0,%eax

saving 3 bytes. We should be able to save 5 bytes in larger functions
where the mov is unnecessary.

This patch converts the specific cmov pattern to cmp $1 followed by
adc $0.

This patch partially fixes:

https://github.com/llvm/llvm-project/issues/60374

The LLVM IR optimizer is yet to canonicalize max expressions to
actual @llvm.umax.

Diff Detail

Event Timeline

kazu created this revision.Feb 21 2023, 12:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2023, 12:16 AM
kazu requested review of this revision.Feb 21 2023, 12:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2023, 12:16 AM
kazu retitled this revision from [X86] Optimize umax(X,1) (NFC) to [X86] Optimize umax(X,1).Feb 21 2023, 7:13 AM

Slightly unrelated, is it possible to get the codegen to be:

	xorl	%eax, %eax
	cmp	$0x1, %edi
	adc	%edi, %eax

?

If so that would be preferable (except in register pressure corner case b.c eax and edi now have slight live-range overlap).
On target like ICX it saves a full instruction b.c no move-elim.

This doesn't handle i8 and i16 the same way because truncates get in the way, right? Would it be better to match this as "umax(x, 1)" (before it becomes x86-specific instructions)? Or could we peek through a truncate and still generate the adc for i8/i16?

This doesn't handle i8 and i16 the same way because truncates get in the way, right? Would it be better to match this as "umax(x, 1)" (before it becomes x86-specific instructions)? Or could we peek through a truncate and still generate the adc for i8/i16?

I think there is potentially a bit of danger of transforming umin as other patterns seem more likely to look for umin than the alternative (select or x86adc).

goldstein.w.n added inline comments.Feb 21 2023, 12:26 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
47277

Can it also match X86ISD::CMP?

47286

Can ADC valuetype by different than comparison valuetype?

This doesn't handle i8 and i16 the same way because truncates get in the way, right? Would it be better to match this as "umax(x, 1)" (before it becomes x86-specific instructions)? Or could we peek through a truncate and still generate the adc for i8/i16?

I think there is potentially a bit of danger of transforming umin as other patterns seem more likely to look for umin than the alternative (select or x86adc).

Looking a little more, think you're right, best to do this in LowerSELECT before the X86ISD::CMOV logic.

Do we have i8/i16 test coverage anywhere?

Incidently, I'm guessing the vector equivalent code will be pretty poor as well when a umax op isn't available (v2i64 pre-AVX512 etc).

Incidently, I'm guessing the vector equivalent code will be pretty poor as well when a umax op isn't available (v2i64 pre-AVX512 etc).

Yes, I posted an example of that in https://github.com/llvm/llvm-project/issues/60374 :
https://alive2.llvm.org/ce/z/didmjC

If there's no legal/custom umax, we probably want to convert to setcc + add. Once that's done, we can take another step to canonicalize the IR icmp+zext+add sequence to umax.
There's a sibling pattern with umin:
https://alive2.llvm.org/ce/z/cXL_d9
...and related signed patterns too.

kazu added a comment.Feb 25 2023, 11:07 PM

Slightly unrelated, is it possible to get the codegen to be:

	xorl	%eax, %eax
	cmp	$0x1, %edi
	adc	%edi, %eax

?

If so that would be preferable (except in register pressure corner case b.c eax and edi now have slight live-range overlap).
On target like ICX it saves a full instruction b.c no move-elim.

I've never thought of that. Very interesting. The comparison of the two approaches is such a close call. I have a slight preference to the mov-cmp-adc route. If we assume that users typically do not care about the value of x past umax(x,1), which I haven't surveyed in real-world applications, we shouldn't have to make a copy of the source operand with a mov, especially in large enough functions where the calling conventions (edi->eax) aren't much of a constraint. Friendliness under high register pressure along with the 6-byte encoding (without a mov) is a plus.

kazu added a comment.Feb 25 2023, 11:35 PM

Do we have i8/i16 test coverage anywhere?

I just checked in 86bd9c984154d625392bfab05541bbe9ee18b6ab. This patch being reviewed here does not handle i8/i16 yet.

Do we have i8/i16 test coverage anywhere?

I just checked in 86bd9c984154d625392bfab05541bbe9ee18b6ab. This patch being reviewed here does not handle i8/i16 yet.

Think might make sense to move this to LowerSELECT so it can handle i8/i16. Also makes more sense to just lower this directly to adc/sbb rather than through cmovcc.

kazu updated this revision to Diff 502353.Mar 3 2023, 10:55 PM

Add support for i8 and i16 by seeing through ISD::TRUNCATE.

kazu added a comment.Mar 3 2023, 10:59 PM

Please take a look.

I settled on the idea of seeing trough ISD::TRUNCATE. I tried sending ISD::UMAX of MVT::i8 through MVT::i64 to LowerMINMAX with setOperationAction, but I ended up seeing various differences in the generated code even without my actual change for the optimization.

RKSimon added inline comments.Mar 5 2023, 5:47 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47439

Cond.hasOneUse() will only check the sub flags output has one use - what about the sub result? Cond->hasOneUse()?

47443

(style) auto *Sub1C =

kazu updated this revision to Diff 502472.Mar 5 2023, 1:53 PM

Use auto and Cond->hasOneUse().

kazu marked 2 inline comments as done.Mar 5 2023, 1:54 PM

Please take a look. Thanks!

RKSimon accepted this revision.Mar 6 2023, 3:06 AM

LGTM cheers

This revision is now accepted and ready to land.Mar 6 2023, 3:06 AM
spatel accepted this revision.Mar 6 2023, 5:00 AM
This revision was automatically updated to reflect the committed changes.