Page MenuHomePhabricator

[AMDGPU][MC][GFX8+] Correct v_cndmask modifiers
ClosedPublic

Authored by dp on Oct 13 2022, 11:48 AM.

Details

Summary

Correct v_cndmask_b32 to support abs/neg modifiers in dpp/sdwa/e64 variants.
Correct v_cndmask_b16 for proper disassembly of abs/neg modifiers in e64_dpp variants.

See this bug for more information.

Diff Detail

Event Timeline

dp created this revision.Oct 13 2022, 11:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 11:48 AM
dp requested review of this revision.Oct 13 2022, 11:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 11:48 AM
This revision is now accepted and ready to land.Oct 13 2022, 11:56 AM

I guess CNDMASK_B16 should be a True16 instruction. I shall plan to do that. With that in mind, this patch appears even less parameterized for (i32 vs i16) input types than before. Is there any way to get some of that back? IE before this patch CNDMASK_B16_T16 wouldn't need to override InsDPP, but after this it does.

dp added a comment.Oct 14 2022, 4:20 AM

I guess CNDMASK_B16 should be a True16 instruction. I shall plan to do that. With that in mind, this patch appears even less parameterized for (i32 vs i16) input types than before. Is there any way to get some of that back? IE before this patch CNDMASK_B16_T16 wouldn't need to override InsDPP, but after this it does.

I assume that True16 instructions may use first 128 VGPRs only. v_cndmask_b16 is a native VOP3 instruction, it does not have such limitations. Also, it has no e32_dpp variant, so I do not understand why InsDPP may need a correction. What am I missing here?

Joe_Nash accepted this revision.Oct 14 2022, 6:23 AM
In D135900#3858246, @dp wrote:

I guess CNDMASK_B16 should be a True16 instruction. I shall plan to do that. With that in mind, this patch appears even less parameterized for (i32 vs i16) input types than before. Is there any way to get some of that back? IE before this patch CNDMASK_B16_T16 wouldn't need to override InsDPP, but after this it does.

I assume that True16 instructions may use first 128 VGPRs only. v_cndmask_b16 is a native VOP3 instruction, it does not have such limitations. Also, it has no e32_dpp variant, so I do not understand why InsDPP may need a correction. What am I missing here?

Ok, sorry, I missed the fact that v_cndmask_b16 is a native VOP3 instruction. LGTM!

This revision was automatically updated to reflect the committed changes.