This is an archive of the discontinued LLVM Phabricator instance.

[ARM64/AArch64] Port NEON post-increment load/store with 2/3/4 vectors to ARM64 backend
Needs ReviewPublic

Authored by HaoLiu on May 5 2014, 1:18 AM.

Details

Reviewers
t.p.northover
Summary

Hi Tim and other reviewers,

This patch ports all NEON post-increment load/store with 2/3/4 vectors to ARM64 backend, including following post increment instructions:
LD1 - Load multiple 1-element structures to two, three or four consecutive registers
LD2 - Load multiple 2-element structures to two consecutive registers
LD3 - Load multiple 3-element structures to three consecutive registers
LD4 - Load multiple 4-element structures to four consecutive registers
LD2 - Load single 2-element structure to one lane of two consecutive registers
LD3 - Load single 3-element structure to one lane of three consecutive registers
LD4 - Load single 4-element structure to one lane of four consecutive registers
LD2R - Load single 2-element structure and replicate to all lanes of two registers
LD3R - Load single 3-element structure and replicate to all lanes of three registers
LD4R - Load single 4-element structure and replicate to all lanes of four registers
ST1 - Store multiple 1-element structures from two, three or four consecutive registers
ST2 - Store multiple 2-element structures from two consecutive registers
ST3 - Store multiple 3-element structures from three consecutive registers
ST4 - Store multiple 4-element structures from four consecutive registers
ST2 - Store single 2-element structure from one lane of two consecutive registers
ST3 - Store single 3-element structure from one lane of three consecutive registers
ST4 - Store single 4-element structure from one lane of four consecutive registers

BTW, I just think the implementation in ARM64DAGToDAGISel::Select has some redundancy. Every time for an intrinsic/ISDNode, it compare to 12 types from v16i8 to v2f64 and call corresponding select function such as SelectLoad, SelectStore. If we call SelectLoad/SelectStore directly and compare the types inside, we can reduce some code. Anyway, it just something about code structure and has nothing to do with correctness. I don't modify it. I just use the same way to call SelectPostLoad as call SelectLoad.

Code review, please.

Thanks,
-Hao

Diff Detail

Event Timeline

HaoLiu updated this revision to Diff 9064.May 5 2014, 1:18 AM
HaoLiu retitled this revision from to [ARM64/AArch64] Port NEON post-increment load/store with 2/3/4 vectors to ARM64 backend.
HaoLiu updated this object.
HaoLiu edited the test plan for this revision. (Show Details)
HaoLiu added a reviewer: t.p.northover.
HaoLiu added a subscriber: Unknown Object (MLST).
t.p.northover edited edge metadata.May 6 2014, 2:36 AM

Hi Hao,

Thanks for working on this. I've got some comments, mostly extremely minor nits (I can't see anything actually wrong with the code).

I do agree about that repeated type checking. It's rather untidy, but it's in the style of the surrounding code so we can probably deal with that separately.

Cheers.

Tim.

lib/Target/ARM64/ARM64ISelDAGToDAG.cpp
1068

Commented out code.

1219

Wouldn't this be neater as a for loop?

static unsigned QSubs[] = { ARM64::qsub0, ARM64::qsub1, ARM64::qsub2, ARM64::qsub3 };
for (int i = 0; i < NumVecs; ++i) {
  SDValue NV = CurDAG->getTargetExtractSubReg(QSubs[i], dl, WideVT, SuperReg);
  if (Narrow)
    NV = NarrowVector(NV, *CurDAG);
  ReplaceUses(SDValue(N, i), NV);
}

(Untested).

lib/Target/ARM64/ARM64ISelLowering.cpp
7017–7019

Functions usually start with a lower-case letter, and indentation is a bit wonky here.

Hi Tim,

This patch has been refactored according your comments and has been committed in http://llvm.org/viewvc/llvm-project?view=revision&revision=208284.

Thanks,
-Hao

lib/Target/ARM64/ARM64ISelLowering.cpp