This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [VectorOps] Improve scatter/gather CPU performance
ClosedPublic

Authored by aartbik on Jul 22 2020, 4:52 PM.

Details

Summary

Replaced the linearized address computation with the proper LLVM way of
defining a vector of base + indices in SIMD style. This yields
much better code. Some prototype results from microbenchmarking
sparse matrix x vector with 50% sparsity (about 2-3x faster):

GFLOPS     LINEARIZED       IMPROVED
           sdot   saxpy     sdot   saxpy
16x16      1.6    1.4       4.4    2.1
32x32      1.7    1.6       5.8    5.9
64x64      1.7    1.7       6.4    6.4
128x128    1.7    1.7       5.9    5.9
256x256    1.6    1.6       6.1    6.0
512x512    1.4    1.4       4.9    4.7
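
For readers unfamiliar with the two addressing styles named in the summary, here is a minimal scalar sketch in C++. It rests on an assumption about what "linearized" means: that each lane's address is built with integer arithmetic (base as an integer plus index times element size), whereas the "base + indices" form keeps typed pointer arithmetic per lane. The names `kLanes`, `linearizedAddrs`, `baseIndexAddrs`, and `gather` are hypothetical and are not taken from the patch. The sketch only shows that both forms name the same addresses and what a masked gather computes; it does not reproduce the lowering or the benchmark.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical vector width, chosen only for illustration.
constexpr std::size_t kLanes = 4;

using Indices = std::array<std::int64_t, kLanes>;
using Pointers = std::array<const float*, kLanes>;

// Assumed "linearized" style: convert the base pointer to an integer,
// add index * sizeof(element) per lane, and convert back to a pointer.
Pointers linearizedAddrs(const float* base, const Indices& idx) {
  Pointers out{};
  const auto b = reinterpret_cast<std::uintptr_t>(base);
  for (std::size_t i = 0; i < kLanes; ++i) {
    const auto offset = static_cast<std::uintptr_t>(idx[i]) * sizeof(float);
    out[i] = reinterpret_cast<const float*>(b + offset);
  }
  return out;
}

// "Base + indices" style: typed pointer arithmetic per lane, the scalar
// analogue of indexing one base pointer with a vector of indices.
Pointers baseIndexAddrs(const float* base, const Indices& idx) {
  Pointers out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = base + idx[i];
  return out;
}

// Scalar model of a masked gather: enabled lanes load base[idx[i]],
// disabled lanes keep the corresponding pass-through value.
std::array<float, kLanes> gather(const float* base, const Indices& idx,
                                 const std::array<bool, kLanes>& mask,
                                 const std::array<float, kLanes>& passThru) {
  std::array<float, kLanes> out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = mask[i] ? base[idx[i]] : passThru[i];
  return out;
}
```

Both address functions return identical pointers for in-bounds, non-negative indices. Under the assumption above, the difference lies only in how much structure the compiler backend can see when selecting hardware gather and scatter instructions.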

Event Timeline

aartbik created this revision. Jul 22 2020, 4:52 PM
This revision is now accepted and ready to land. Jul 22 2020, 10:48 PM
This revision was automatically updated to reflect the committed changes.