This is an archive of the discontinued LLVM Phabricator instance.

[CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion
ClosedPublic

Authored by spatel on Jun 7 2017, 11:08 AM.

Details

Summary

The test diff for PowerPC is minimal, but for x86, there's a substantial difference because branches are assumed cheap and SDAG can't optimize across blocks. Instead of this:

_cmp_eq8:
	movq	(%rdi), %rax
	cmpq	(%rsi), %rax
	je	LBB23_1
## BB#2:                                ## %res_block
	movl	$1, %ecx
	jmp	LBB23_3
LBB23_1:
	xorl	%ecx, %ecx
LBB23_3:                                ## %endblock
	xorl	%eax, %eax
	testl	%ecx, %ecx
	sete	%al
	retq

We get this:

cmp_eq8:   
	movq	(%rdi), %rcx
	xorl	%eax, %eax
	cmpq	(%rsi), %rcx
	sete	%al
	retq

And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable CGP memcmp() expansion for x86. Ie, we'll bypass the power-of-2 special cases currently optimized in SDAG because we can lower the IR produced here optimally.

Diff Detail

Event Timeline

spatel created this revision.Jun 7 2017, 11:08 AM
courbet edited edge metadata.Jun 7 2017, 12:07 PM

LGTM, you might want to wait for other comments as I'm new here :)

lib/CodeGen/CodeGenPrepare.cpp
1703

I think the comment should also explain why in this case (only one block) we don't want to abort everything right now and let the SDAG do the lowering (IIUC, something along the lines of "in that case, we still want to do the memcmp expansion here because this code handles vector expansions better").

1729

The debug location is never used; this looks like a remnant of some previous code before refactoring. It looks like the intent was to use that builder in member functions. Now the builder is recreated every time (see e.g. MemCmpExpansion::emitLoadCompareByteBlock, line 1750). Builder should be made a member (maybe in another revision).

courbet accepted this revision.Jun 7 2017, 12:08 PM
This revision is now accepted and ready to land.Jun 7 2017, 12:08 PM
spatel added a comment.Jun 7 2017, 2:07 PM

LGTM, you might want to wait for other comments as I'm new here :)

Thanks for the quick review! I'll give the other reviewers a little more time to comment in case they see any problems/improvements.

lib/CodeGen/CodeGenPrepare.cpp
1703

Sure. I think there will be scalar expansions that are also better handled here if we lift MemCmpNumLoadsPerBlock above '1'. It could have been done in the SDAG, but now that we have this general infrastructure here, I think it's better to keep the memcmp expansion together.

1729

This hasn't changed since the initial commit of D28637 / rL304313, but I agree we should clean it up. I'll make that a follow-up step.

spatel updated this revision to Diff 101903.Jun 8 2017, 6:48 AM

Patch updated - no code changes, but:

  1. Added comment about overlapped responsibility with the DAG for lowering the single-block case.
  2. Rebased to r304974. The PPC test now avoids cmp/isel altogether. I think this was either D33718 / D33720. Thanks, @nemanjai!
This revision was automatically updated to reflect the committed changes.