This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
ClosedPublic

Authored by courbet on Oct 24 2017, 4:59 AM.

Details

Reviewers
spatel
nemanjai
Summary
  • Targets that want to support memcmp expansions now return the list of supported load sizes.
  • Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For examples, this is not the case for x86(32bit)+sse2.

Fixes PR34887.

Event Timeline

courbet created this revision.Oct 24 2017, 4:59 AM
courbet updated this revision to Diff 120049.Oct 24 2017, 4:59 AM
  • clang-format
Harbormaster completed remote builds in B11427: Diff 120049.
spatel added inline comments.Oct 24 2017, 8:37 AM
include/llvm/Analysis/TargetTransformInfo.h
557–564

We should avoid using different vocabulary in this API than what is in the expansion code. Instead of 'IsThreeWay' here and other places, we can use 'IsZeroCmp'?

lib/Target/X86/X86TargetTransformInfo.cpp
2555

AVX512 needs to be a 'TODO' in this patch. We're going to need to add more tests (64- and 128-byte), and the DAG is not prepared for those patterns yet.

2556–2557

This isn't correct (or at least it doesn't match what the DAG handles optimally). I've added extra runs to memcmp.ll, so we can see what happens for SSE1/AVX1 vs. SSE2/AVX2.

test/CodeGen/X86/memcmp.ll
57

Apologies for all this noise. I updated the script again at rL316443 to avoid the 'ret' diffs.
I also added extra run lines for SSE1 and AVX1 at rL316446, so you'll need to rebase.

courbet updated this revision to Diff 120787.Oct 30 2017, 3:04 AM
courbet marked 2 inline comments as done.

Address review comments.

courbet updated this revision to Diff 120788.Oct 30 2017, 3:08 AM

Update MergeICMps after switch for IsThreeWay -> IsEquality.

courbet added inline comments.Oct 30 2017, 3:12 AM
lib/Target/X86/X86TargetTransformInfo.cpp
2556–2557

Right. I was basing this on getRegisterBitWidth), but now I see that the DAG can do something else for the sake of performance. So I switched to 16B for >=SSE2 and 32B for >= AVX2.
Thanks for the tests.

spatel accepted this revision.Oct 30 2017, 6:44 AM

LGTM - thanks! See inline comment for a nit.

lib/Target/X86/X86TargetTransformInfo.cpp
2541–2542

This isn't strictly a question of bswap availability - see the suggested code in:
https://bugs.llvm.org/show_bug.cgi?id=33329

This revision is now accepted and ready to land.Oct 30 2017, 6:44 AM
courbet updated this revision to Diff 120814.Oct 30 2017, 6:57 AM
courbet marked an inline comment as done.
  • Add a link to the bug in the comment.

Thanks for the review.

spatel closed this revision.Nov 2 2017, 7:44 AM

Marking as closed with rL316905 (I assume it was just an oversight that the commit didn't include a link to this review).