This is an archive of the discontinued LLVM Phabricator instance.

[CGP] use subtract of cmps for result of memcmp expansion
ClosedPublic

Authored by spatel on Jun 30 2017, 1:20 PM.

Details

Summary

There are some regressions for PPC here (even if we keep the 'subf', the 'extsw' is unnecessary, isn't it?). I'm hoping we can address that as a follow-up rather than make this change dependent on fixing that.

As noted in the code comment, transforming this in the other direction might require a separate transform here in CGP given the block-at-a-time DAG constraint.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Jun 30 2017, 1:20 PM
efriedma added inline comments.Jul 6 2017, 5:22 PM
test/CodeGen/X86/memcmp.ll
38 ↗(On Diff #104912)

Potential alternative sequence: instead of xorl+cmpw+seta+sbbl, just generate a "subl", like we do for length 1.

spatel added inline comments.Jul 7 2017, 8:53 AM
test/CodeGen/X86/memcmp.ll
38 ↗(On Diff #104912)

Ah, of course. I've been trying to find a way to use subtract in the larger cases, but this is straightforward for i8/i16. I'll update the patch.

spatel updated this revision to Diff 105649.Jul 7 2017, 9:08 AM

Patch updated:
Bypass the compares for i8/i16:
http://rise4fun.com/Alive/7sl

nemanjai edited edge metadata.Jul 10 2017, 10:08 PM

We have patches that will eventually land that improve our handling of zero-extending i1 in the PPC back end so I'm not overly concern about the slightly poorer code here. However, I'd like to make sure I understand the reason for this change.

  1. On targets that will expand selects to branches, we want to avoid introducing selects to reduce branching
  2. Recreating a select in the DAG is possible and getting rid of the select in the DAG is impossible on targets that expand it to branches
  3. Making this choice conditional on something like an IR-level version of canInsertSelect() (maybe something called shouldInsertSelect()...) introduces unnecessary complexity

Does that evaluation roughly cover the motivation for this patch?

test/CodeGen/PowerPC/memcmp.ll
15 ↗(On Diff #105649)

I wouldn't worry about this sign extension. This is something that happens all over the PPC back end and we have a patch in the pipeline that will eliminate it.

We have patches that will eventually land that improve our handling of zero-extending i1 in the PPC back end so I'm not overly concern about the slightly poorer code here. However, I'd like to make sure I understand the reason for this change.

  1. On targets that will expand selects to branches, we want to avoid introducing selects to reduce branching
  2. Recreating a select in the DAG is possible and getting rid of the select in the DAG is impossible on targets that expand it to branches
  3. Making this choice conditional on something like an IR-level version of canInsertSelect() (maybe something called shouldInsertSelect()...) introduces unnecessary complexity

Does that evaluation roughly cover the motivation for this patch?

That's correct, although if pre-DAG select-to-branch transforms are a concern, then this is just a tiny example of that.

I admit that when I saw the branchy output on some of the x86 tests, I thought this was happening already, but it turns out those branches are created later. When I realized we could create the memcmp result with a subtract, however, I thought that would translate to better codegen for both PPC and x86 without introducing more DAG transforms.

It turns out we'll have to add those transforms anyway if the related llvm-dev thread ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ) continues in its current direction. So the practical motivation is that we're producing better code for memcmp() for x86 (and PPC if we avoid the isels?) without waiting for that extra DAG step to improve codegen.

I almost forgot another hidden motivation that I need to mention in case we ever come back to the idea for x86 or other targets try to hack this: I was looking to use vector-length types in this expansion on x86 for 16-byte and larger memcmp. Because of existing DAG transforms, the x86 codegen would become a strange mix of scalar and vector code with the select-based IR in this expansion...and that would have required several steps to undo. Using subtract-of-cmps avoided that. :)
See: https://bugs.llvm.org/show_bug.cgi?id=33329 for detailed analysis.

RKSimon added a subscriber: RKSimon.

This makes sense to me if the PowerPC guys are happy.

RKSimon edited edge metadata.Jul 20 2017, 4:16 AM

This makes sense to me if the PowerPC guys are happy.

@nemanjai @echristo @hfinkel Are you guys happy with this approach?

This makes sense to me if the PowerPC guys are happy.

@nemanjai @echristo @hfinkel Are you guys happy with this approach?

Seems perfectly fine to me. Let's wait and see if Eric or Hal have any further objections.

nemanjai accepted this revision.Jul 27 2017, 1:55 AM

Since there have not been any concerns voiced, I think this patch can proceed.

This revision is now accepted and ready to land.Jul 27 2017, 1:55 AM
This revision was automatically updated to reflect the committed changes.