This is an archive of the discontinued LLVM Phabricator instance.

[Matrix] Use aarch64.udot for 4x4 tiling for i8 matrixes (WIP).
Needs Review · Public

Authored by fhahn on Apr 6 2020, 7:14 AM.

This revision needs review, but there are no reviewers specified.

Details

Reviewers: None

Summary

This patch processes matrix multiplies of i8 matrixes in 4x4 tiles and uses
aarch64.udot to compute the result of each 4x4 multiply.
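
To make the tiling concrete, here is a minimal scalar sketch of what one udot does and how four of them could produce a 4x4 result tile. The names `udotModel` and `multiplyTile` are hypothetical, and the operand arrangement (left operand rows against a broadcast right operand column) is one possible scheme, not necessarily the one this patch emits.

```cpp
#include <array>
#include <cstdint>

using Vec16u8 = std::array<std::uint8_t, 16>;
using Vec4u32 = std::array<std::uint32_t, 4>;

// Scalar model of a vector UDOT: each of the four 32-bit lanes accumulates
// the dot product of the matching group of four unsigned bytes.
Vec4u32 udotModel(Vec4u32 acc, const Vec16u8 &a, const Vec16u8 &b) {
  for (int lane = 0; lane < 4; ++lane)
    for (int k = 0; k < 4; ++k)
      acc[lane] += std::uint32_t(a[4 * lane + k]) * std::uint32_t(b[4 * lane + k]);
  return acc;
}

// rowsOfLHS: row i of the 4x4 left operand occupies bytes [4i, 4i+4).
// colsOfRHS: column j of the 4x4 right operand occupies bytes [4j, 4j+4).
// Returns the 4x4 result tile, one vector per result column.
std::array<Vec4u32, 4> multiplyTile(const Vec16u8 &rowsOfLHS,
                                    const Vec16u8 &colsOfRHS) {
  std::array<Vec4u32, 4> result{};
  for (int j = 0; j < 4; ++j) {
    // Repeat column j of the right operand in all four lanes.
    Vec16u8 broadcast{};
    for (int lane = 0; lane < 4; ++lane)
      for (int k = 0; k < 4; ++k)
        broadcast[4 * lane + k] = colsOfRHS[4 * j + k];
    // Lane i now yields dot(row i of LHS, column j of RHS).
    result[j] = udotModel(Vec4u32{}, rowsOfLHS, broadcast);
  }
  return result;
}
```

In this model a 4x4 tile costs four udot operations, one per result column. Larger matrices would accumulate into the same `acc` vectors while stepping through the shared dimension four elements at a time.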

This patch lowers store(matrix.multiply(transpose(load()), load())) as
described above. As the first operand is transposed, we can access the
rows of the transposed operand by loading the columns of the original
load directly.
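
The following sketch illustrates why the transpose never has to be materialized, assuming both operands are stored column-major (the default layout for the matrix intrinsics). The `ColMajorU8` type and `multiplyTransposedLHS` function are hypothetical names used only for illustration, and the code assumes `A.rows == B.rows`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Column-major matrix of unsigned bytes: element (r, c) is data[c * rows + r].
struct ColMajorU8 {
  std::size_t rows, cols;
  std::vector<std::uint8_t> data;
  std::uint8_t at(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
  const std::uint8_t *column(std::size_t c) const { return data.data() + c * rows; }
};

// Computes transpose(A) * B without building the transpose.
// Assumes A.rows == B.rows. The result is column-major with A.cols rows
// and B.cols columns.
std::vector<std::uint32_t> multiplyTransposedLHS(const ColMajorU8 &A,
                                                 const ColMajorU8 &B) {
  std::vector<std::uint32_t> C(A.cols * B.cols, 0);
  for (std::size_t j = 0; j < B.cols; ++j) {
    const std::uint8_t *bCol = B.column(j);
    for (std::size_t i = 0; i < A.cols; ++i) {
      // Row i of transpose(A) is column i of A, which is contiguous in memory.
      const std::uint8_t *aCol = A.column(i);
      std::uint32_t sum = 0;
      for (std::size_t k = 0; k < A.rows; ++k)
        sum += std::uint32_t(aCol[k]) * std::uint32_t(bCol[k]);
      C[j * A.cols + i] = sum;
    }
  }
  return C;
}
```

Every result element is a dot product of two contiguous byte runs, which is the shape of input a dot-product instruction consumes four bytes at a time.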

Note that @llvm.matrix.multiply does not make a distinction between
unsigned & signed multiplication for integers and this patch arbitrarily
uses udot. We probably have to add integer multiply variants for signed &
unsigned in the future. Also, the way this is currently integrated needs
a bit more work. It would probably be good to expose a hook where
targets can be queried about which kernels can be implemented efficiently
on the target.
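
A small sketch of why the signed/unsigned choice matters once the bytes are widened before multiplying, as the dot-product instructions do. `dot4Unsigned` and `dot4Signed` are hypothetical helpers that model the widening step of a single udot or sdot lane; they are not part of the patch.

```cpp
#include <cstdint>

// Dot product of four bytes, widening each byte as unsigned (udot-like).
std::int64_t dot4Unsigned(const std::uint8_t a[4], const std::uint8_t b[4]) {
  std::int64_t sum = 0;
  for (int k = 0; k < 4; ++k)
    sum += std::int64_t(a[k]) * std::int64_t(b[k]);
  return sum;
}

// Same bit patterns, but each byte is reinterpreted as a signed i8 before
// widening (sdot-like). The uint8_t -> int8_t conversion wraps modulo 2^8
// on all mainstream compilers and is guaranteed to do so from C++20 on.
std::int64_t dot4Signed(const std::uint8_t a[4], const std::uint8_t b[4]) {
  std::int64_t sum = 0;
  for (int k = 0; k < 4; ++k)
    sum += std::int64_t(std::int8_t(a[k])) * std::int64_t(std::int8_t(b[k]));
  return sum;
}
```

For the bytes `{0xFF, 1, 0, 0}` and `{2, 1, 0, 0}` the unsigned variant yields 511 while the signed variant yields -1. The two only agree when every byte is below 128, so an intrinsic that does not record signedness cannot pick the right instruction on its own.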

Finally, the shuffles generated for the current lowering seem to
generate awful code for now, but the main goal of the patch is to
illustrate how target specific instructions can be used when lowering
matrix intrinsics.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	160 ms	lldb-unit.Host/_/HostTests::Unknown Unit Message ("")

Event Timeline

fhahn created this revision. Apr 6 2020, 7:14 AM

Herald added a project: Restricted Project. Apr 6 2020, 7:14 AM

Herald added subscribers: danielkiss, tschuett, hiraditya, kristof.beyls.

Add tests to illustrate the generated IR.

fhahn mentioned this in D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass. Apr 6 2020, 7:35 AM

fhahn added a parent revision: D77550: [Matrix] Add TileInfo abstraction for tiled matrix code-gen.

Harbormaster failed remote builds in B51941: Diff 255328! Apr 6 2020, 8:06 AM

Harbormaster failed remote builds in B51935: Diff 255322! Apr 6 2020, 8:38 AM

LuoYuanke added a subscriber: LuoYuanke. Apr 7 2020, 5:49 AM

I shared the WIP patch to illustrate how matrix intrinsics could be lowered using target intrinsics. I won't have time to work on this in the near future, but if anyone would be interested in picking this up in the meantime, that would be great :)

Thanks Florian, we are happy to pick this up.
+ @samparker, @dmgreen

Small update to preserve loop info.

Harbormaster failed remote builds in B52369: Diff 256037! Apr 8 2020, 9:45 AM

+1 on seeing similar efforts on *all* matrix intrinsics, like transpose

I've just put up D81308, which uses the same approach to generate loops for the regular tiled matrix multiplication. I'll work towards getting the initial infrastructure in place, then the target specific follow-ups should be more straightforward.

Rebase on current trunk.

Harbormaster completed remote builds in B70651: Diff 289931. Sep 4 2020, 7:28 AM

Revision Contents

	Path	Size
	llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp	117 lines
	llvm/test/Transforms/LowerMatrixIntrinsics/aarch64-udot-4x4.ll	130 lines
	llvm/test/Transforms/LowerMatrixIntrinsics/aarch64-udot-8x8.ll