This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine, ARM] Convert vld1 to llvm load
ClosedPublic

Authored by labrinea on Apr 30 2018, 10:06 AM.

Details

Summary

This patch converts a vector load intrinsic into a simple llvm load instruction. This is beneficial when the underlying object being addressed comes from a constant, since we get constant-folding for free.

Diff Detail

Event Timeline

labrinea created this revision.Apr 30 2018, 10:06 AM

I think you need an endianness check somewhere?

Nevermind, you don't need an endianness check, I think; you're not changing the number of elements, so it should have the same semantics. (See also https://llvm.org/docs/BigEndianNEON.html .)

javed.absar added inline comments.Apr 30 2018, 2:47 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
870

There is a check of IntrAlign, but then IntrAlign is still used otherwise? Maybe it will be cleaner to not use 'auto'

2924

It might be cleaner to extract MemAlign in the called function

I'd like to run some performance tests before this gets merged; I'll try to get back to you sometime this week.

labrinea added inline comments.May 1 2018, 4:10 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
870

Using auto as it's inferred by the dynamic cast.

2924

I was thinking that I'll have to pass three pointers as parameters to make that happen, which doesn't sound very optimal performance wise.

spatel added a subscriber: spatel.May 1 2018, 2:28 PM
spatel added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
867

The name makes this look like a generic helper, but it's only applicable to neon, right?

Why is it sitting in the middle of a pile of x86 code?

870
871–872

This doesn't make sense when IntrAlign is not a ConstantInt. Please add a test like this:

define <2 x i64> @vld1_2x64(i8* %ptr, i32 %x) {
  %vld1 = call <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8* %ptr, i32 %x)
  ret <2 x i64> %vld1
}

...or if that's not supposed to be allowed, then we don't need a dyn_cast.

The dynamic cast check for the alignment parameter of the intrinsic is necessary. I could bail the optimization if it's not constant and let the backend crash later on. There are plenty of intrinsics where some arguments need to be constant, but IR has no way to enforce that. Clang will guarantee for it. The rest of the suggestions sound sensible. I'll update my patch accordingly.

labrinea updated this revision to Diff 144874.May 2 2018, 6:32 AM
spatel added inline comments.May 2 2018, 6:49 AM
test/Transforms/InstCombine/ARM/vld1.ll
2

Please auto-generate the assertions using:
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

The RUN line should redirect from the input file rather than taking a param:
; RUN: opt < %s -instcombine -S | FileCheck %s

labrinea updated this revision to Diff 144989.May 3 2018, 1:56 AM

Changes to the test file:

  • Made the RUN line redirect the input file to opt.
  • Added a NOTE to enable autogenerated assertions.
spatel added a comment.May 3 2018, 6:40 AM
  • Added a NOTE to enable autogenerated assertions.

Sorry if this wasn't clear: my suggestion was to actually run the script, not pretend to run the script. You shouldn't need any manually-generated 'CHECK' lines in the test file. Let me know if you have problems/questions using the script.

@spatel, how cool! Thanks for pointing that out. I wasn't aware of that script. I'll update my test shortly.

labrinea updated this revision to Diff 145022.May 3 2018, 8:12 AM

I've actually used the utils/update_test_checks.py to auto-generate Filecheck assertions for the unit test. @spatel, this script is very practical but I am bit sceptical about it. We must be very careful when using it. The checks may not impose the desired compiler behaviour when auto-generated.

spatel added a comment.May 3 2018, 9:01 AM

I've actually used the utils/update_test_checks.py to auto-generate Filecheck assertions for the unit test. @spatel, this script is very practical but I am bit sceptical about it. We must be very careful when using it. The checks may not impose the desired compiler behaviour when auto-generated.

Thanks. Can you explain your concern? This is off-topic for the patch itself, so feel free to raise it on llvm-dev if that seems more appropriate.

I have no further comments for the patch, but I'll let @efriedma comment/approve once the perf results are available.

Sure! Thanks again for the review :)

@efriedma, ping. Any perf results on this?

Sorry about the delay. I ran some benchmarks internally, looks like it's perfomance-neutral, so we're fine there.

test/Transforms/InstCombine/ARM/vld1.ll
25

Alignment on an LLVM IR load instruction is in bytes, not bits.

labrinea updated this revision to Diff 147970.May 22 2018, 2:49 AM

Using getLimitedValue() instead of getZExtValue() for the ConstantInt representing the memory alignment of the load instruction. Updated the tests: alignment in now expressed in bytes instead of bits.

efriedma added inline comments.May 23 2018, 12:08 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1402

For safety, probably should verify the alignment is a power of two.

labrinea updated this revision to Diff 149097.May 30 2018, 5:39 AM

Bail the optimization if the memory alignment is not power of two. Added a test to cover this case.

This revision is now accepted and ready to land.May 30 2018, 12:15 PM
This revision was automatically updated to reflect the committed changes.