This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Add isel patterns for min3 and max3
Needs ReviewPublic

Authored by Petar.Avramovic on Sep 17 2021, 8:26 AM.

Details

Reviewers
foad
arsenm

Diff Detail

Event Timeline

Petar.Avramovic requested review of this revision.Sep 17 2021, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 17 2021, 8:26 AM
foad added a comment.Sep 20 2021, 2:55 AM

AMDGPU/GlobalISel: Add min3 and max3 combines

These are not combines, and it looks like they work for SelectionDAG as well as GlobalISel?

foad added a comment.Sep 20 2021, 2:56 AM

If they do work for SelectionDAG as well, maybe move your new test out of the GlobalISel/ directory and run it for both instruction selectors?

Petar.Avramovic retitled this revision from AMDGPU/GlobalISel: Add min3 and max3 combines to AMDGPU/GlobalISel: Add isel patterns for min3 and max3.Sep 20 2021, 2:59 AM
Petar.Avramovic edited the summary of this revision. (Show Details)

Dag does this in dag-combiner. For GlobalISel combine would be in regbankcombiner (we need banks to not combine for uniform min and max) which runs just before isel. And since it is simple to match I did this in tablegen.
I will check if this works for sdag without regressions.

Remove min3 and max3 combine from sdag path, use isel patterns. I am not sure about this for sdag, generated code looks worse. I could move this to the post-regbankcombiner for global-isel.
Notable test changes:

  • ieee = true : quieting(fcanonicalize) of operands that go in min3/max3. This is because min/max legalization happens, originally combine would make min3/max3 before legalizing.
  • uniform ctlz/cttz lowering : isel selects uniform min/max thus no min3/max3. SI Fix SGPR copies turns this into v_min/v_max and it looks like a regression.

Rebase and ping.

My main worry here would be the constant bus restriction handling. Can you add tests for every permutation of SGPR and VGPR operands?

My main worry here would be the constant bus restriction handling. Can you add tests for every permutation of SGPR and VGPR operands?

I'm pretty sure these already exist for the DAG path, should be able to just copy them

foad added a comment.Dec 16 2021, 4:05 AM

As discussed offline, can min3/max3 use the existing ThreeOpFrag that is already used for other cases like add3 and or3?

As a separate improvement, SITargetLowering::reassociateScalarOps is currently only used for add but could also be used for and/or/xor/min/max/...

foad added inline comments.Dec 16 2021, 4:55 AM
llvm/test/CodeGen/AMDGPU/ctlz.ll
579–580

Regression here.

llvm/test/CodeGen/AMDGPU/max3.ll
264

Please precommit new tests like this.

Use ThreeOpFrag for integer min3 and max3.
Floating point min3 and max3 use regular pattern since Tablegen does not support complex pattern (VOP3Mods) inside PatFrag which would count sgpr operands. There should not be many cases where this would make a big difference since sgpr float operands come as function arguments or uniform loads, there are no uniform versions of floating point instructions.
There should be no constant bus restriction errors (sdag also takes care of cst_bus_restrictions when it prints mir from selected DAG)

  • integer min/max3 ThreeOpFrag handles cst_bus_restrictions
  • float min/max3
    • sdag deals with cst_bus_restrictions when it prints mir from selected DAG
    • globalisel all float min/max3 operands are set to vgpr by regbankselect

Improve test coverage and add precommit for new tests.

Herald added a project: Restricted Project. · View Herald TranscriptOct 27 2022, 9:35 AM
Herald added a subscriber: kosarev. · View Herald Transcript