This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix for folding v2.16 literals.
ClosedPublic

Authored by dfukalov on Sep 4 2020, 12:55 PM.

Details

Summary

It was found some packed immediate operands (e.g. <half 1.0, half 2.0>) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Diff Detail

Event Timeline

dfukalov created this revision.Sep 4 2020, 12:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 4 2020, 12:55 PM
dfukalov requested review of this revision.Sep 4 2020, 12:55 PM
rampitec added inline comments.Sep 4 2020, 2:09 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
195–196

Seems like you still need to check if it is an inline literal, not just foldable. If it is foldable but not inline you can do it only with VOP3 literals available, in which case you do not need to play these games with op_sel at all.

Then I also do not see where the check for vop3 literals is done here. Imagine you would get one and produce op_sel version of something foldable but not inline on GFX9.

dfukalov added inline comments.Sep 5 2020, 3:29 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
195–196

This case is about op_sel folding optimisation only (condition in line 193 checks the instruction has SIInstrFlags::IsPacked flag so it is VOP3). If a literal is not foldable with op_sel, it is processed by code below.

rampitec added inline comments.Sep 8 2020, 10:09 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
195–196

It is not sufficient to check for packed. Assume that literal is 0x200. It is foldable according to your check because high half is zero. Then the code as written will fold it. Same for 0x02000000 because low half is zero. But this is only legal to do on GFX10 which hasVOP3Literal(). As far as I understand it will also fold it on GFX9 where it is not legal. I do not see this legality check here.

dfukalov added inline comments.Sep 8 2020, 11:45 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
195–196

Actually we are here in function that updates operands, since they are in FoldList. Your examples are not placed in candidates list for gfx900. They fails in last tryAddToFoldList() from foldOperand(). It checks isOperandLegal() and fails to use other options.

There are a lot of tests (with different resulting ISA for GFX9 and GFX10) for symmetrical literals (hi16bit == lo16bit) in the test file named "immv216.ll". These literals pass the checks in discussed lines too.

Additionally, "pk_max_f16_literal.ll" test contains check that <half 0xH0000, half 0xH41C8> and <half 0xH41C8, half 0xH0> literals folded for GFX10, but not for GFX9.

Should I duplicate second tests into immv216.ll with, probably, other packed instructions?

rampitec added inline comments.Sep 8 2020, 12:06 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
195–196

It seems to be working, but I want to understand how. Could you please find out at which point this is rejected on GFX9?

dfukalov added inline comments.Sep 9 2020, 7:29 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
195–196

For the case you described, on GFX9:
Firstly foldOperand() fall through its body just to last tryAddToFoldList() at line 914. Then in line 335 we go into if(!TII->isOperandLegal()) and try to check other options (MAC operations, SETREG instruction). They are all not possible in our case so then we try to commute operands and check this variant (lines 373-424). This approach doesn't make folding possible too.

So tryAddToFoldList() doesn't add the instruction to FoldList and returns with false at line 427. And then we do not reach tryAddToFoldList() call at line 1286 within foreach(FoldList).

For the difference, on GFX10 tryAddToFoldList() doesn't got into if(!TII->isOperandLegal()) at line 335 but just arrives to appendFoldCandidate() at line 457 and add the instruction to FoldList.

rampitec accepted this revision.Sep 9 2020, 9:58 AM

OK, thanks!

This revision is now accepted and ready to land.Sep 9 2020, 9:58 AM
This revision was landed with ongoing or failed builds.Sep 9 2020, 3:39 PM
This revision was automatically updated to reflect the committed changes.