This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][ARM] Match interleaved memory accesses into ldN/stN/vldN/vstN intrinsics.
ClosedPublic

Authored by HaoLiu on Jun 18 2015, 3:08 AM.

Details

Summary

Hi,

I refactored the patch in D10335 according to the comments. The main concern is to share as much code as possible.

This patch adds a new pass called InterleavedAccessPass in the lib/CodeGen. It constains the code about identifying interleaved accesses. As we can not share the code about creating target specific intrinsics, I put such code in the target backends.

Review please.

Thanks,
-Hao

Diff Detail

Event Timeline

HaoLiu updated this revision to Diff 27923.Jun 18 2015, 3:08 AM
HaoLiu retitled this revision from to [AArch64][ARM] Match interleaved memory accesses into ldN/stN/vldN/vstN intrinsics..
HaoLiu updated this object.
HaoLiu edited the test plan for this revision. (Show Details)
HaoLiu added a subscriber: Unknown Object (MLST).
sbaranga added inline comments.
lib/Target/ARM/ARMTargetMachine.cpp
41

Would it be better to only have one switch in the interleave pass instead of having a separate switch in each backend?

The pass could return when executing runOnFunction if the option is not enabled.

Updated a new patch refactored according to Silviu's comment.

Review please.

Thanks,
-Hao

lib/Target/ARM/ARMTargetMachine.cpp
41

Reasonable.

rengolin added inline comments.Jun 19 2015, 6:13 AM
lib/CodeGen/InterleavedAccessPass.cpp
114

If the mask index can't be negative, why use ArrayRef<int>?

135

Checking for all factors "up to" in isDeInterleaveMaskOfFactor() is redundant with this line.

Though, I see that you're using it in other functions that may need that functionality.

Not sure how to split this, but it looks inefficient...

rengolin added inline comments.Jun 19 2015, 7:43 AM
lib/CodeGen/InterleavedAccessPass.cpp
223

CGP?

Updated a patch according to Renato's comments.

Review please.

Thanks,
-Hao

lib/CodeGen/InterleavedAccessPass.cpp
114

It could be negative. When a mask is undef, it is -1.
Here we only compare non-negative masks and ignore undef masks.

135

I merged isDeInterleaveMask() and isDeInterleaveMaskOfFactor() into one function isDeInterleaveMask().

223

Fixed.

ping...

Thanks,
-Hao

LGTM from me, but I think Michael also wanted to have a look and perhaps Renato has further comments?

rengolin added inline comments.Jun 24 2015, 8:48 AM
lib/CodeGen/InterleavedAccessPass.cpp
135

Hum, I'm still seeing isDeInterleaveMaskOfFactor in the latest patch...

mzolotukhin edited edge metadata.Jun 24 2015, 11:59 AM

Hi Hao,

The code generally looks fine, but I have a question regarding lowerInterleavedStore (please see inline).

Thanks,
Michael

lib/CodeGen/InterleavedAccessPass.cpp
33–34

How would IR look for 4 vectors? Will we have a shuffle of shuffles?

57–58

Do these names comply with the coding standards?

241–243

Will it work for Factor != 2? If not, and other factors aren't supported for now, please add an explicit assert and TODO for it. If yes, should we also check the other shuffles?

lib/Target/AArch64/AArch64TargetTransformInfo.cpp
415

Nitpick: I'd prefer comparing with 2 and 4, instead of 1 and 5. I.e.

if (Factor >= 2 && Factor <= 4)

Also, could we somehow reuse MIN_FACTOR and MAX_FACTOR from InterleavedAccessPass.cpp here? Having the same constants in different places will lead to bugs in future.

lib/Transforms/Vectorize/LoopVectorize.cpp
142 ↗(On Diff #28203)

This change doesn't belong here and anyway needs a separate discussion.

HaoLiu edited edge metadata.

Hi,

I refactored the patch according to Michael's comments. As well as inline comments to answer questions from Renato and Michael.

Review please.

Thanks,
-Hao

lib/CodeGen/InterleavedAccessPass.cpp
33–34

Will have a shuffle.
E.g. An interleaved store of factor 4.

%i.vec = shuffle <8 x i32> %v0, <8 x i32> %v2, <0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15>
store <16 x i32> %i.vec, <16 x i32>* %ptr

%v0 and %v2 could be concatenated from other small vectors, like:

%v0 = shuffle <4 x i32> %A, <4 x i32> %B, <0, 1, 2, 3, 4, 5, 6, 7>
%v1 = shuffle <4 x i32> %C, <4 x i32> %D, <0, 1, 2, 3, 4, 5, 6, 7>

but we only need to check the last shuffle with the RE-interleaved mask.

135

Sorry. I misunderstood and misleaded. I merged isReInterleaveMask() and isReInterleaveMaskOfFactor().

I cannot merge isDeInterleaveMask() and isDeInterleaveMaskOfFactor(), which are both used in lowerInterleavedLoad(). The former is used to check and find an interleave factor. The later is only used to check whether the given mask is the DE-interleaved of the given factor.

241–243

Yes, it will work for other factor.
As the previous example of factor 4, we only need to check the last shuffle with RE-interleaved mask.

lib/Target/AArch64/AArch64TargetTransformInfo.cpp
415

I refactored to add a hook called getMaxSupportedInterleaveFactor(), which is used to share the maximum factor supported by a target.
No need to get the minimum factor, which is always 2.

lib/Transforms/Vectorize/LoopVectorize.cpp
142 ↗(On Diff #28203)

Yes. Forgot to clean.

rengolin added inline comments.Jun 25 2015, 4:28 AM
lib/CodeGen/InterleavedAccessPass.cpp
135

Right, I thought it was weird that you had merged them. :)

rengolin accepted this revision.Jun 25 2015, 5:06 AM
rengolin edited edge metadata.

Thanks Hao, LGTM.

This revision is now accepted and ready to land.Jun 25 2015, 5:06 AM
mzolotukhin accepted this revision.Jun 25 2015, 5:31 PM
mzolotukhin edited edge metadata.

Hi Hao,

Thanks for the explanation!

The code looks good to me now. I'd also recommend to check-in shared part and the target implementations in separate commits - by the logic everything should work fine even without them, right? (and that's what will happen on other targets)

Best regards,
Michael

test/CodeGen/AArch64/aarch64-interleaved-accesses.ll
14–16

s/delat/delta/

test/CodeGen/ARM/arm-interleaved-accesses.ll
14–16

s/delat/delta/ ?

This revision was automatically updated to reflect the committed changes.

Hi

I committed this patch separately in r240751, r240754 and r240755. Thanks again for your review comments.

As interleaved access is disabled by default in LoopVectorize, currently this pass is also disabled by default. The next step is to enable it by default. My dear colleague (probably Silviu) will do benchmarking work and enable it by default later.

Thanks,
-Hao

MatzeB added a subscriber: MatzeB.Jan 30 2017, 12:56 PM

Is there a reason this was implemented as an IR pass and not at the SelectionDAG level?

Is there a reason this was implemented as an IR pass and not at the SelectionDAG level?

Wow, this brings back memories, but unfortunately not the reason for an additional pass. I think it was something to do with being after the register allocation, though I may be misquoting someone.