This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Added optimization patterns with Zbb extension
Needs ReviewPublic

Authored by iabg-sc on Sep 7 2022, 5:18 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

Patterns were added to substitute comparison and logic operations with min and logic operations.
Pattern.1
i = a < c
j = b < c
res = i or j
changes to:
m = min(a, b)
res = m < c

Pattern.2
i = a >= c
j = b >= c
res = i and j // negation of the result from Pattern.1
changes to:
m = min(a, b)
tmp = m < c
res = tmp xor 1

Pattern.2 is similar to Pattern.1 except there is no sgeu instruction and result has to be inverted with xor 1.

This patch can resolve this issue: https://github.com/llvm/llvm-project/issues/56518

Diff Detail

Event Timeline

iabg-sc created this revision.Sep 7 2022, 5:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 5:18 AM
iabg-sc requested review of this revision.Sep 7 2022, 5:18 AM
iabg-sc edited the summary of this revision. (Show Details)Sep 7 2022, 5:24 AM
craig.topper added a subscriber: craig.topper.EditedSep 7 2022, 9:00 AM

This seems more suited to a DAGCombine. We need to check the use count of the setccs to make sure we aren't increasing the number of instructions.

It can also allow the xori to be optimized away in some cases.

This patch specific for RISCV. There is no guarantee that min/max will work faster than two comparisons on other archs.
I am not sure how count of the instructions can be increased as pattern explicitly specifies changes from three instructions to two (or three) instructions. Could you please give an example?
xori can be optimized later, but it requires more complex analysis than I suggest in this patch.
Right now I am trying to write self-expanding patterns in TableGen to support most cases.

iabg-sc edited the summary of this revision. (Show Details)Sep 8 2022, 2:57 AM

This patch specific for RISCV. There is no guarantee that min/max will work faster than two comparisons on other archs.

I meant RISC-V DAGCombine in RISCVISelLowering not the generic DAGCombiner.

I am not sure how count of the instructions can be increased as pattern explicitly specifies changes from three instructions to two (or three) instructions. Could you please give an example?

If the setccs have users other than the or/and, then they'll get selected by your pattern as well as the setcc only patterns to satisify the other users. Overall this would result in 4(2 from your pattern and 1 for each setcc) instructions being created instead of 3 if we had selected the OR and setcc separately.

xori can be optimized later, but it requires more complex analysis than I suggest in this patch.

We already have a combine to fold the xori into a beqz or bnez by inverting the branch condition. We also have several combines that can pull the xori through a later or/and using deMorgan's law to move it closer to a branch.

Right now I am trying to write self-expanding patterns in TableGen to support most cases.

craig.topper added inline comments.Sep 8 2022, 8:57 AM
llvm/test/CodeGen/RISCV/minu_xori.ll
5

The DAG for this before isel looks like

SelectionDAG has 16 nodes:                                                       
  t0: ch = EntryToken                                                            
  t2: i64,ch = CopyFromReg t0, Register:i64 %0                                   
          t4: i64,ch = CopyFromReg t0, Register:i64 %1                           
        t22: i64 = setcc t4, t2, setult:ch                                       
          t6: i64,ch = CopyFromReg t0, Register:i64 %2                           
        t19: i64 = setcc t6, t2, setult:ch                                       
      t26: i64 = or t22, t19                                                     
    t27: i64 = xor t26, Constant:i64<1>                                          
  t13: ch,glue = CopyToReg t0, Register:i64 $x10, t27                            
  t14: ch = RISCVISD::RET_FLAG t13, Register:i64 $x10, t13:1

There is no AND, the XOR with 1 was already created.