User Details
- User Since
- Feb 21 2018, 6:39 AM (228 w, 20 h)
Yesterday
Tue, Jul 5
op_sel value is not being propagated from src_modifiers to the op_sel operand
I was convinced that op_sel value is not important for instruction printing. Non dpp version does not even have op_sel operand and still gets printed correctly, helper prints op_sel[a,b,c] from src_modifiers.
I missed to test assembler, missing part is setting op_sel bit in src_modifiers, done via cvtVOP3P.
I will double check both.
Mon, Jul 4
May 13 2022
May 11 2022
May 10 2022
Can you also add comment that test changes for some memory instructions are caused by subtarget feature that allows offset folding (to be precise not having feature that forbids offset folding). Maybe separate these tests to another patch.
LGTM also, I am just double checking tests, might add a few more comments soon.
Apr 21 2022
Feb 22 2022
Jan 19 2022
There should be no reason to look through copies here, switch to getVRegDef.
Jan 17 2022
I don't think this is the best place to perform this combine since there are some regressions in regbank-combiner.
Also I expect that it will be covered by D116441 during inst-selection.
Jan 14 2022
Jan 13 2022
Jan 12 2022
Previous version already had llvm-ir test with fma pattern covered in matchCombineFAddFMulToFMadOrFMA.
Cover remaining potential errors when MRI.getVRegDef(Reg)->getOperand(0).getReg() is used. Adding targeted mir tests for all these cases.
I don't have llvm-ir tests for these since they seem to combine before legalizer, unmerges that causes bad combine are generated in legalizer.
Jan 11 2022
TODO: do the same for remaining fma combines.
Jan 10 2022
The G_UNMERGE_VALUES of the G_SEXT of the G_BUILD_VECTOR would introduce a G_SEXT for each of the scalars.
Is there a reason why we don't have simpler pattern where conversion opcode of merge like opcode is not used by G_UNMERGE_VALUES (in this case only G_SEXT of the G_BUILD_VECTOR -> G_BUILD_VECTOR of G_SEXT for each of the scalars).
Dec 30 2021
Use ThreeOpFrag for integer min3 and max3.
Floating point min3 and max3 use regular pattern since Tablegen does not support complex pattern (VOP3Mods) inside PatFrag which would count sgpr operands. There should not be many cases where this would make a big difference since sgpr float operands come as function arguments or uniform loads, there are no uniform versions of floating point instructions.
There should be no constant bus restriction errors (sdag also takes care of cst_bus_restrictions when it prints mir from selected DAG)
- integer min/max3 ThreeOpFrag handles cst_bus_restrictions
- float min/max3
- sdag deals with cst_bus_restrictions when it prints mir from selected DAG
- globalisel all float min/max3 operands are set to vgpr by regbankselect
Improve test coverage and add precommit for new tests.
Dec 27 2021
Dec 23 2021
Dec 21 2021
Address comments, fix more spelling mistakes.
Cleanup.
Rebase and address comments.
Dec 16 2021
Dec 15 2021
Rebase and ping.
Rebase and ping.
Rebase.
Dec 3 2021
Rebase and ping.
Dec 2 2021
Cache SIInstrInfo in the helper.
Dec 1 2021
Rebase and ping.
Nov 30 2021
Use same prefix for gfx9 and gfx10 in mir test.
Make all operands vgpr for med3. Find already existing copies to vgpr for constants.
Nov 29 2021
Nov 18 2021
Oct 5 2021
Afaik %3:_(<2 x s32>), %4:_(<2 x s32>) = G_UNMERGE_VALUES %0:_(<2 x s64>) is not yet defined to work like that and should probably be forbidden to make such instruction in builder. Only allow scalar split and vector split to elements or sub-vectors.
Desired unmerge works if done step by step:
%5:_(s64), %6:_(s64) = G_UNMERGE_VALUES %0:_(<2 x s64>) %7:_(s32), %8:_(s32) = G_UNMERGE_VALUES %5:_(s64) %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %6:_(s64) %3:_(<2 x s32>) = G_BUILD_VECTOR %5:_(s64), %7:_(s32) %4:_(<2 x s32>) = G_BUILD_VECTOR %6:_(s64), %8:_(s32)
Similar for merge: merge s64 elements first then build <2 x s64> vector.
But <2 x s32> instructions will be scalarized, bit shift packing into <2 x s16> is most probably slower.
Why not go with scalarize at the start?
Sep 21 2021
Haven't finished grokking this yet, some comments so far.
Do you have some test cases you are interested in? High level summary would be that by doing unmerge to each element before changing number of elements in a vector we are able to reference each element using vreg. This should be enough since we don't deal with anything smaller. This fixes main problem I see in combines "referencing an element in subvector" required by "merge subvector with subvector(or single element) of another subvector".
Sep 20 2021
Addressed review comments.
Remove min3 and max3 combine from sdag path, use isel patterns. I am not sure about this for sdag, generated code looks worse. I could move this to the post-regbankcombiner for global-isel.
Notable test changes:
- ieee = true : quieting(fcanonicalize) of operands that go in min3/max3. This is because min/max legalization happens, originally combine would make min3/max3 before legalizing.
- uniform ctlz/cttz lowering : isel selects uniform min/max thus no min3/max3. SI Fix SGPR copies turns this into v_min/v_max and it looks like a regression.
Thanks for the review and compile time performance testing.
Dag does this in dag-combiner. For GlobalISel combine would be in regbankcombiner (we need banks to not combine for uniform min and max) which runs just before isel. And since it is simple to match I did this in tablegen.
I will check if this works for sdag without regressions.
Sep 17 2021
Ping.
Remove comment about constant splat with NaN value. Seems unnecessary to mention it since we match all constants.
Ping.
Sep 16 2021
Most common case where we end up with dead instrs in output is unmerge with dead deaf
legalize unmerge using lower(unmerge is in inst list):
Legalizing: %36:_(s16), %37:_(s16) = G_UNMERGE_VALUES %17:_(<2 x s16>)
%37 is dead here, lower creates instrs and artifacts but we are now going through instrlist
.. .. New MI: %49:_(s32) = G_BITCAST %17:_(<2 x s16>)
.. .. New MI: %36:_(s16) = G_TRUNC %49:_(s32)
.. .. New MI: %50:_(s32) = G_LSHR %49:_, %47:_(s32)
.. .. New MI: %37:_(s16) = G_TRUNC %50:_(s32)
Sep 15 2021
Ping. Also see build_vector combines starting with D109240.
Rebase and ping.
Brief summary:
Add support for g_fconstant matching and option to return APFloat.
Rename getConstantVRegValWithLookThrough into get{I|F|Any}ConstantVRegValWithLookThrough. Remove HandleFConstant argument, use function with appropriate name instead.
Add matchers for Reg and APInt/Float: g_constant m_GCst(&ValueAndVReg) and g_fconstant m_GFCst(&FPValueAndVReg)
Sep 14 2021
Sep 10 2021
Move to amdgpu post-legalizer combine.