This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Match med3 for (max (min ..))
ClosedPublic

Authored by Pierre-vh on Mar 2 2023, 6:42 AM.

Details

Summary

We previously only matched (min (max ...))

Depends on D144728

Diff Detail

Event Timeline

Pierre-vh created this revision.Mar 2 2023, 6:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 6:42 AM
Pierre-vh requested review of this revision.Mar 2 2023, 6:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 6:42 AM
Pierre-vh added inline comments.Mar 2 2023, 6:48 AM
llvm/test/CodeGen/AMDGPU/saddsat.ll
17

Seems like med3 cannot take immediate operands. Should a pattern be added to fall back to v_min/v_max if both operands are ext'd constants?

28

Another example, this time it's 2 more instructions so it's a bit worse even.

foad added inline comments.Mar 2 2023, 7:00 AM
llvm/test/CodeGen/AMDGPU/saddsat.ll
17

VOP3 instructions couldn't take a literal operand until gfx10.

28

The extra one is because you're using v_med3_i32 for a 16-bit operation. v_med3_i16 was introduced in gfx9.

foad added a comment.Mar 2 2023, 7:02 AM

Is there a reason this is implemented in C++ instead of instruction selection patterns?

Is there a reason this is implemented in C++ instead of instruction selection patterns?

It's a DAG combine, it isn't creating a V_MED3 directly but the AMDGPUsmed3 DAG op so it can be matched by other ISel pattern
Can we write those in TableGen? I thought we couldn't

Is there a reason this is implemented in C++ instead of instruction selection patterns?

It's a DAG combine, it isn't creating a V_MED3 directly but the AMDGPUsmed3 DAG op so it can be matched by other ISel pattern
Can we write those in TableGen? I thought we couldn't

I think the constant value case would just be more annoying to handle in tablegen, but I think it's possible. If there's value in other patterns checking the med3, it makes sense to keep as a combine

llvm/test/CodeGen/AMDGPU/saddsat.ll
28

This should have been filtered out by the hasMed3_16 check

Pierre-vh added inline comments.Mar 5 2023, 11:50 PM
llvm/test/CodeGen/AMDGPU/saddsat.ll
28

Should we just not do the combine if med3_16 is not available, instead of using the 32-bit version then?

arsenm added inline comments.Mar 6 2023, 4:20 AM
llvm/test/CodeGen/AMDGPU/saddsat.ll
28

Yes, because you'll always have to materialize the constants and extend the value. The 32-bit version might be better in cases where the inputs have repeated med3 uses but that seems unlikely

Pierre-vh updated this revision to Diff 502598.Mar 6 2023, 5:00 AM
Pierre-vh marked 4 inline comments as done.

Remove ext to i32 if med3_16 is not available

arsenm accepted this revision.Mar 6 2023, 5:05 AM

LGTM with nit

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10625

Probably fine if both constants have multiple uses already, not worth it for this particular patch

10753

No else after return

This revision is now accepted and ready to land.Mar 6 2023, 5:05 AM
Pierre-vh updated this revision to Diff 502601.Mar 6 2023, 5:07 AM
Pierre-vh marked an inline comment as done.

Remove else after return

This revision was landed with ongoing or failed builds.Mar 7 2023, 2:14 AM
This revision was automatically updated to reflect the committed changes.