This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GFX10] Disabled v_movrel*[sdwa|dpp] opcodes in codegen
ClosedPublic

Authored by dp on Nov 18 2019, 8:07 AM.

Details

Summary

These opcodes use indirect register addressing, so they need special handling by codegen, which is currently missing.

Diff Detail

Event Timeline

dp created this revision.Nov 18 2019, 8:07 AM
Herald added a project: Restricted Project.Nov 18 2019, 8:07 AM

Looks mostly good, but can you split this change into one that relates to DPP and another that disables asm-only instructions?

dp updated this revision to Diff 229860.Nov 18 2019, 8:37 AM
dp edited the summary of this revision. (Show Details)

Separated dpp combiner changes to D70402

vpykhtin added inline comments.Nov 18 2019, 10:01 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6332

Is there any way to mark these instructions in the td files?

dp marked 2 inline comments as done.Nov 18 2019, 10:56 AM
dp added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6332

I thought about it. Yes, it is possible, but it would not make the code more readable overall.
Labelling these opcodes in td would make the code in this file cleaner, but it would require more changes elsewhere.

Overall I think this case is very special and warrants a special solution. If we face similar issues in the future (ones that need more cases in the switch below), we may create a flag for this purpose. I'm not sure it is necessary for MOVREL*.

rampitec added inline comments.Nov 18 2019, 12:42 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6332

Is it the same as isAsmParserOnly in td? If so, shouldn't it be easy to mark them there?
On the other hand, if this needs an extra TSFlags bit, it is not worth it, as those bits are scarce.

Actually, what makes them risky is the implicit use of M0, so they can be folded around an M0 definition. Isn't it cleaner to check for the implicit use in the SDWA and DPP combiners and disable the combining on those grounds, rather than excluding the instructions from codegen completely?

dp marked an inline comment as done.Nov 18 2019, 2:12 PM

Actually, what makes them risky is the implicit use of M0, so they can be folded around an M0 definition. Isn't it cleaner to check for the implicit use in the SDWA and DPP combiners and disable the combining on those grounds, rather than excluding the instructions from codegen completely?

Maybe. But I do not see how codegen can handle these instructions without knowing the actual dst and src registers. To support the _dpp and _sdwa variants, codegen needs the same (or similar) hacks as those implemented for v_movreld_b32.

In D70400#1750597, @dp wrote:

Actually, what makes them risky is the implicit use of M0, so they can be folded around an M0 definition. Isn't it cleaner to check for the implicit use in the SDWA and DPP combiners and disable the combining on those grounds, rather than excluding the instructions from codegen completely?

Maybe. But I do not see how codegen can handle these instructions without knowing the actual dst and src registers. To support the _dpp and _sdwa variants, codegen needs the same (or similar) hacks as those implemented for v_movreld_b32.

Hmm. I think you are right:

v1 = v_and_b32 v2, 0xf
v3 = v_movrels_b32 v1

Means: v3 = v1[m0], same as v3 = (v1 & 0xf)[m0]
After sdwa conversion it would be: v3 = v2[m0] & 0xf

Not exactly the same thing.

This revision is now accepted and ready to land.Nov 18 2019, 2:30 PM
vpykhtin accepted this revision.Nov 19 2019, 8:33 AM

LGTM.

This revision was automatically updated to reflect the committed changes.