This patch converts a vector load intrinsic into a simple llvm load instruction. This is beneficial when the underlying object being addressed comes from a constant, since we get constant-folding for free.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Nevermind, you don't need an endianness check, I think; you're not changing the number of elements, so it should have the same semantics. (See also https://llvm.org/docs/BigEndianNEON.html .)
I'd like to run some performance tests before this gets merged; I'll try to get back to you sometime this week.
lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
932 ↗ | (On Diff #144577) | The name makes this look like a generic helper, but it's only applicable to neon, right? Why is it sitting in the middle of a pile of x86 code? |
935 ↗ | (On Diff #144577) | LLVM preference is to use 'auto *' for pointers: |
936–937 ↗ | (On Diff #144577) | This doesn't make sense when IntrAlign is not a ConstantInt. Please add a test like this: define <2 x i64> @vld1_2x64(i8* %ptr, i32 %x) { %vld1 = call <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8* %ptr, i32 %x) ret <2 x i64> %vld1 } ...or if that's not supposed to be allowed, then we don't need a dyn_cast. |
The dynamic cast check for the alignment parameter of the intrinsic is necessary. I could bail the optimization if it's not constant and let the backend crash later on. There are plenty of intrinsics where some arguments need to be constant, but IR has no way to enforce that. Clang will guarantee for it. The rest of the suggestions sound sensible. I'll update my patch accordingly.
test/Transforms/InstCombine/ARM/vld1.ll | ||
---|---|---|
1 ↗ | (On Diff #144874) | Please auto-generate the assertions using: The RUN line should redirect from the input file rather than taking a param: |
Changes to the test file:
- Made the RUN line redirect the input file to opt.
- Added a NOTE to enable autogenerated assertions.
Sorry if this wasn't clear: my suggestion was to actually run the script, not pretend to run the script. You shouldn't need any manually-generated 'CHECK' lines in the test file. Let me know if you have problems/questions using the script.
@spatel, how cool! Thanks for pointing that out. I wasn't aware of that script. I'll update my test shortly.
I've actually used the utils/update_test_checks.py to auto-generate Filecheck assertions for the unit test. @spatel, this script is very practical but I am bit sceptical about it. We must be very careful when using it. The checks may not impose the desired compiler behaviour when auto-generated.
Thanks. Can you explain your concern? This is off-topic for the patch itself, so feel free to raise it on llvm-dev if that seems more appropriate.
I have no further comments for the patch, but I'll let @efriedma comment/approve once the perf results are available.
Sorry about the delay. I ran some benchmarks internally, looks like it's perfomance-neutral, so we're fine there.
test/Transforms/InstCombine/ARM/vld1.ll | ||
---|---|---|
24 ↗ | (On Diff #145022) | Alignment on an LLVM IR load instruction is in bytes, not bits. |
Using getLimitedValue() instead of getZExtValue() for the ConstantInt representing the memory alignment of the load instruction. Updated the tests: alignment in now expressed in bytes instead of bits.
lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1402 ↗ | (On Diff #147970) | For safety, probably should verify the alignment is a power of two. |
Bail the optimization if the memory alignment is not power of two. Added a test to cover this case.