
[ARM][BFloat] Implement bf16 get/set_lane without casts to i16 vectors

Authored by miyuki on Jun 19 2020, 10:05 AM.



Currently, in order to extract an element from a bf16 vector, we cast
the vector to an i16 vector, perform the extraction, and cast the result to
bfloat. This behavior was copied from the old fp16 implementation.
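As a rough illustration of the difference, the two extraction strategies can be modelled in portable C. This is only a sketch under stated assumptions: portable C has no bfloat type, so `bf16_model` and `bf16x4_model` are hypothetical stand-ins that hold raw 16-bit patterns, and the function names are invented for this example. It is not the code from arm_neon.h or from this patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: a bf16 value is modelled as its raw 16-bit
   pattern, and a 4-lane vector as four such values. */
typedef struct { uint16_t bits; } bf16_model;
typedef struct { bf16_model lane[4]; } bf16x4_model;

/* Old approach: reinterpret the vector as an i16 vector, extract the
   element, then reinterpret the i16 element as bfloat. */
static bf16_model get_lane_via_i16(bf16x4_model v, int i) {
    uint16_t as_i16[4];
    bf16_model result;
    uint16_t elt;

    memcpy(as_i16, &v, sizeof as_i16);   /* "cast" vector to i16 vector */
    elt = as_i16[i];                     /* extract the i16 element */
    memcpy(&result, &elt, sizeof result);/* "cast" element back to bfloat */
    return result;
}

/* Approach named in the title: extract the bf16 element directly,
   with no intermediate i16 vector. */
static bf16_model get_lane_direct(bf16x4_model v, int i) {
    return v.lane[i];
}
```

Both functions return the same bit pattern for every lane. The direct form simply has no reinterpreting steps around the extraction, which is the property the patch title refers to.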

This patch prepares for a subsequent patch that aims to achieve optimal
code generation for the lane copying intrinsics. LLVM fails to fold certain
combinations of bitcast, insertelement, extractelement and shufflevector
instructions, which leads to the generation of suboptimal code.

Event Timeline

miyuki created this revision. (Jun 19 2020, 10:05 AM)
stuij accepted this revision. (Jun 22 2020, 9:44 AM)

LGTM. Thanks!

This revision is now accepted and ready to land. (Jun 22 2020, 9:44 AM)
This revision was automatically updated to reflect the committed changes.