This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GISel] Update `isCanonicalized`
ClosedPublic

Authored by Pierre-vh on Sep 29 2022, 1:30 AM.

Details

Summary

Recognize more opcodes in the function.
Fixes some regressions introduced in D134857 for fdiv.f16 too.

Depends on D134857

Diff Detail

Event Timeline

Pierre-vh created this revision.Sep 29 2022, 1:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 1:30 AM
Pierre-vh requested review of this revision.Sep 29 2022, 1:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 1:30 AM
foad added a comment.Sep 29 2022, 1:47 AM

Fixes some regressions introduced in D134857 for fdiv.f16 too.

I don't see any real regressions in D134857, just cases where we optimize the "ieee" cases but not the "flush" cases.

foad added inline comments.Sep 29 2022, 1:51 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10202

It seems wrong for FNEG to fall into the "supportsMinMaxDenormModes" check below.

Pierre-vh updated this revision to Diff 463804.Sep 29 2022, 2:14 AM
Pierre-vh marked an inline comment as done.

Fix unary ops
Also fixes the foldable-fneg test (no longer needs pre/post check lines)

Fixes some regressions introduced in D134857 for fdiv.f16 too.

I don't see any real regressions in D134857, just cases where we optimize the "ieee" cases but not the "flush" cases.

Yes, that's what I meant - we had a split in the ieee/flush codegen but this fixes it :p

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10202

Indeed it was. Fixed it and also added the remaining unary operations while I was at it.

foad accepted this revision.Sep 29 2022, 2:29 AM

LGTM. (I'm not 100% sure about G_FREM but I assume it expands to something that will flush denormals if appropriate?)

This revision is now accepted and ready to land.Sep 29 2022, 2:29 AM

LGTM. (I'm not 100% sure about G_FREM but I assume it expands to something that will flush denormals if appropriate?)

I think it's fine as this mimics the DAG's implementation. I can still remove FREM if you're not sure about it as I don't really need it added right now (I just added it for completeness).

foad added a comment.Sep 29 2022, 3:20 AM

LGTM. (I'm not 100% sure about G_FREM but I assume it expands to something that will flush denormals if appropriate?)

I think it's fine as this mimics the DAG's implementation. I can still remove FREM if you're not sure about it as I don't really need it added right now (I just added it for completeness).

I agree it's fine. It looks like it normally expands to some combination of div/mul/trunc/fma, i.e. normal fp instructions which will canonicalize.

Pierre-vh updated this revision to Diff 463833.Sep 29 2022, 3:50 AM

Rebase on top of D134857

arsenm accepted this revision.Sep 29 2022, 5:43 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10270–10271

I just realized this is broken since it was copied from the DAG with FP value types. In another patch can you explicitly handle the custom intrinsics?

Plus this isn't handling some of the less common generic opcodes, like all the different rounding functions

I'm not sure I understand, which intrinsics need to be handled? The ones in INTRINSIC_WO_CHAIN in the DAG version?
And what are the opcodes you're talking about? Things like CVT_F32?

I'm not sure I understand, which intrinsics need to be handled? The ones in INTRINSIC_WO_CHAIN in the DAG version?
And what are the opcodes you're talking about? Things like CVT_F32?

Some of them are obscured by custom nodes in the DAG. Some examples are AMDGPUISD::FMAD_FTZ, AMDGPUISD::RCP (corresponding to llvm.amdgcn.rcp), and the cases in INTRINSIC_WO_CHAIN. Plus this is missing a number of the generic opcodes like G_FSQRT and G_FFLOOR

Pierre-vh updated this revision to Diff 463863.Sep 29 2022, 6:13 AM

Handle more opcode/intrinsics

Pierre-vh requested review of this revision.Sep 29 2022, 6:13 AM
arsenm added inline comments.Sep 29 2022, 6:47 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10195

G_FRINT, G_NEARBYINT, G_INTRINSIC_FPTRUNC_ROUND, G_INTRINSIC_TRUNC, G_INTRINSIC_ROUNDEVEN,

10245

amdgcn_sqrt, fmed3, fmad_ftz, sin, cos, log, log_clamp

10253

cubema, cubesc, cubetc?

10260

Should do separately and also introduce for the DAG path, but we're missing some of the newer operations like _amdgcn_cvt_f32_bf8

Pierre-vh updated this revision to Diff 463881.Sep 29 2022, 6:53 AM
Pierre-vh marked 4 inline comments as done.

Add more missing instructions/intrinsics

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10245

fmad_ftz was already there, log doesn't seem to exit as an intrinsic but it's an instruction so I it + the variants:

case AMDGPU::G_FLOG:
case AMDGPU::G_FLOG2:
case AMDGPU::G_FLOG10:
arsenm added inline comments.Sep 29 2022, 8:28 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10224–10225

Also should handle G_FMINNUM/G_FMAXNUM. Also, we should really stop ascribing target instruction behavior to the generic opcodes. This is a separate patch though since this is a big mess

10270–10271

Should default to return false

Pierre-vh marked 2 inline comments as done.

Comments

Pierre-vh planned changes to this revision.Sep 30 2022, 3:45 AM

need a few more items

Pierre-vh updated this revision to Diff 464230.Sep 30 2022, 5:33 AM

Add BUILD_VECTOR

arsenm accepted this revision.Sep 30 2022, 5:45 AM

LGTM

This revision is now accepted and ready to land.Sep 30 2022, 5:45 AM
This revision was landed with ongoing or failed builds.Sep 30 2022, 7:13 AM
This revision was automatically updated to reflect the committed changes.