
[X86] Pass to transform tdpbf16ps intrinsics to scalar operation.
Closed · Public

Authored by yubing on Feb 4 2021, 11:35 PM.

Details

Summary

In the previous patch, https://reviews.llvm.org/D93594, we scalarized only tilezero, tileload, tilestore and tiledpbssd. This patch additionally scalarizes the tdpbf16ps intrinsic.
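As background, the following is a rough scalar C++ reference for what a scalarized tdpbf16ps has to compute. It is modelled on Intel's published description of TDPBF16PS, not on this patch's code. The function names (`bf16ToFloat`, `tdpbf16psRef`) and the flat row-major array layout are hypothetical. The hardware's rounding and denormal handling are not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: widen one bfloat16 (raw bits) to float. A bf16 value is
// the upper 16 bits of an IEEE-754 binary32, so shifting left by 16 is exact.
inline float bf16ToFloat(uint16_t Bits) {
  uint32_t Wide = static_cast<uint32_t>(Bits) << 16;
  float F;
  std::memcpy(&F, &Wide, sizeof(F));
  return F;
}

// Hypothetical scalar reference for C += A * B in the style of TDPBF16PS.
// Assumed layouts (illustrative only):
//   A: M rows x K dwords, each dword = two bf16, stored as M x (2*K) uint16_t
//   B: K rows x N dwords (pair-interleaved), stored as K x (2*N) uint16_t
//   C: M x N floats, updated in place
// Rounding and denormal behaviour of the real instruction are NOT modelled.
void tdpbf16psRef(int M, int N, int K, std::vector<float> &C,
                  const std::vector<uint16_t> &A,
                  const std::vector<uint16_t> &B) {
  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n) {
      float Acc = C[m * N + n];
      for (int k = 0; k < K; ++k) {
        // Each dword step contributes two bf16 products: even and odd halves.
        Acc += bf16ToFloat(A[m * 2 * K + 2 * k]) *
               bf16ToFloat(B[k * 2 * N + 2 * n]);
        Acc += bf16ToFloat(A[m * 2 * K + 2 * k + 1]) *
               bf16ToFloat(B[k * 2 * N + 2 * n + 1]);
      }
      C[m * N + n] = Acc;
    }
}
```

For example, with M = N = K = 1, A = {1.0, 2.0} and B = {3.0, 4.0} (bf16 bits 0x3F80, 0x4000, 0x4040, 0x4080) and C = {0.5}, the result is 0.5 + 1*3 + 2*4 = 11.5.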

Diff Detail

Event Timeline

yubing created this revision. · Feb 4 2021, 11:35 PM
yubing requested review of this revision. · Feb 4 2021, 11:35 PM
Herald added projects: Restricted Project, Restricted Project. · Feb 4 2021, 11:35 PM
yubing updated this revision to Diff 321675. · Feb 5 2021, 1:31 AM

Rebase and add a test case for the dpbf16ps intrinsic.

yubing edited the summary of this revision.
pengfei added inline comments. · Feb 9 2021, 1:45 AM
llvm/test/CodeGen/X86/AMX/amx-low-intrinsics.ll
Lines 213–214

Can we use a shuffle instruction?
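The question is whether a shuffle can replace the bf16-to-float conversion. Because a bf16 value is the upper half of an IEEE binary32 float, interleaving each bf16 lane with a zero i16 lane and reinterpreting each pair as one 32-bit lane yields the float directly. This is similar to a shufflevector followed by a bitcast in IR. The C++ sketch below only emulates that idea. The helper names and the mask are illustrative assumptions, not the IR the patch emits. The sketch also assumes a little-endian layout, as on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical emulation of LLVM's shufflevector on two equal-length i16
// vectors: mask index i < Len selects V1[i], otherwise V2[i - Len].
// The result has as many lanes as the mask.
std::vector<uint16_t> shuffle(const std::vector<uint16_t> &V1,
                              const std::vector<uint16_t> &V2,
                              const std::vector<int> &Mask) {
  std::vector<uint16_t> Out;
  const int Len = static_cast<int>(V1.size());
  for (int Idx : Mask)
    Out.push_back(Idx < Len ? V1[Idx] : V2[Idx - Len]);
  return Out;
}

// Hypothetical helper: widen Len bf16 lanes to Len floats by placing a zero
// i16 below each bf16 lane, then reinterpreting the i16 pairs as 32-bit floats.
// Assumes little-endian memory order (true on x86): the lower-indexed i16 of
// each pair becomes the low half of the 32-bit value.
std::vector<float> widenBF16ViaShuffle(const std::vector<uint16_t> &BF16) {
  const int Len = static_cast<int>(BF16.size());
  std::vector<uint16_t> Zeros(Len, 0);
  std::vector<int> Mask;
  for (int i = 0; i < Len; ++i) {
    Mask.push_back(Len + i); // low half: a zero taken from the second operand
    Mask.push_back(i);       // high half: the original bf16 bits
  }
  std::vector<uint16_t> Interleaved = shuffle(BF16, Zeros, Mask);
  std::vector<float> Out(Len);
  std::memcpy(Out.data(), Interleaved.data(), Len * sizeof(float));
  return Out;
}
```

On a big-endian target the two mask entries per lane would have to be swapped.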

yubing updated this revision to Diff 325170. · Feb 20 2021, 1:16 AM
yubing edited the summary of this revision.

Address comments above and refactor some code

yubing marked an inline comment as done. · Feb 20 2021, 1:18 AM
yubing updated this revision to Diff 325683. · Feb 22 2021, 10:34 PM

Rebase and add a testcase.

yubing updated this revision to Diff 325685. · Feb 22 2021, 10:57 PM

Fix incorrect naming of dpbf16's basic blocks.

yubing updated this revision to Diff 325688. · Feb 22 2021, 11:06 PM

Modify some comments

yubing updated this revision to Diff 330898. · Mar 16 2021, 1:17 AM

Just a rebase.

pengfei added inline comments. · Mar 16 2021, 2:53 AM
llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp
Lines 362–363

Can we create vecC with <256 x float>?

Line 385

Better to use EltCF32 or CF32.

Line 392

ditto

Line 393

Better to define a variable for it and reuse.

Lines 418–423

Would it be more concise to use the following?

template <Intrinsic::ID IntrID>
typename std::enable_if_t<
    IntrID == Intrinsic::x86_tdpbssd_internal ||
    IntrID == Intrinsic::x86_tdpbf16ps_internal, bool>
lowerTileDP(Instruction *TileDP);
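The suggested signature can be tried out in isolation. In this sketch, a hypothetical `IntrinsicID` enum stands in for `Intrinsic::ID`. The function takes no `Instruction *` and returns a string instead of `bool`, purely so the two instantiations can be told apart. This is not the code that landed.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for llvm::Intrinsic::ID (illustration only).
enum class IntrinsicID { tdpbssd_internal, tdpbf16ps_internal, tilezero_internal };

// One template instead of two near-identical lowering functions. The
// enable_if_t return type only exists for the two dot-product IDs, so any
// other ID causes a substitution failure and leaves no viable function.
template <IntrinsicID IntrID>
std::enable_if_t<IntrID == IntrinsicID::tdpbssd_internal ||
                     IntrID == IntrinsicID::tdpbf16ps_internal,
                 std::string>
lowerTileDP() {
  // Only the per-element arithmetic would differ between the two cases.
  if (IntrID == IntrinsicID::tdpbssd_internal)
    return "signed i8 dot product, i32 accumulate";
  return "bf16 dot product, f32 accumulate";
}
```

Calling `lowerTileDP<IntrinsicID::tilezero_internal>()` would fail to compile. The branch on `IntrID` is a plain `if`, which is valid C++14 as long as both branches compile. In C++17, `if constexpr` would be the alternative.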
yubing added inline comments. · Mar 18 2021, 10:30 PM
llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp
Lines 362–363

In fact, we are trying to find a bitcast whose operand is <256 x i32>, as shown at line 229.
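The `<256 x i32>` type follows from the tile geometry. A full AMX tile is 16 rows of 64 bytes, which is 1024 bytes or 256 dwords. Reading one of those dwords as a float only changes its type, not its bits. The sketch below models this with a hypothetical `TileI32` array and `memcpy`-based reinterpretation. It illustrates the typing question in this thread and is not code from the pass.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// A full tile: 16 rows x 64 bytes = 1024 bytes = 256 32-bit lanes.
constexpr int TileLanes = 16 * 64 / 4;
static_assert(TileLanes == 256, "a full tile holds 256 dwords");

// Hypothetical model of the <256 x i32> view of a tile.
using TileI32 = std::array<uint32_t, TileLanes>;

// Read one i32 lane as float, analogous to an i32 -> float bitcast in IR:
// the bit pattern is unchanged, only the type differs.
inline float laneAsFloat(const TileI32 &T, int Lane) {
  float F;
  std::memcpy(&F, &T[Lane], sizeof(F));
  return F;
}

// Write a float result back into the i32-typed tile without changing its bits.
inline void storeFloatLane(TileI32 &T, int Lane, float F) {
  std::memcpy(&T[Lane], &F, sizeof(F));
}
```

Keeping the accumulator i32-typed lets the pass match the existing `<256 x i32>` bitcast, and each lane can still be treated as a float at the point of the arithmetic.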

yubing updated this revision to Diff 331766. · Mar 18 2021, 10:31 PM

Address Pengfei's comments.

pengfei accepted this revision. · Mar 19 2021, 5:21 AM

LGTM.

This revision is now accepted and ready to land. · Mar 19 2021, 5:21 AM