HomePhabricator

[TableGen][GlobalISel] Optimize MatchTable for faster instruction selection

Description

[TableGen][GlobalISel] Optimize MatchTable for faster instruction selection

  • Context ***

Prior to this patchw, the table generated for matching instruction was
straight forward but highly inefficient.

Basically, each pattern generates its own set of self contained checks
and actions.
E.g., TableGen generated:
First pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDrr
Second pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDri
// Third pattern
CheckNumOperand 3
CheckOpcode G_SUB
...
Build SUBrr

  • Problem ***

Because of that generation, a *lot* of check were redundant between each
pattern and were checked every single time until we reach the pattern
that matches.
E.g., Taking the previous table, let say we are matching a G_SUB, that
means we were going to check all the rules for G_ADD before looking at
the G_SUB rule. In particular we are going to do:
check 3 operands; PASS
check G_ADD; FAIL
; Next rule
check 3 operands; PASS (but we already knew that!)
check G_ADD; FAIL (well it is still not true)
; Next rule
check 3 operands; PASS (really!!)
check G_SUB; PASS (at last :P)

  • Proposed Solution ***

This patch introduces a concept of group of rules (GroupMatcher) that
share some predicates and only get checked once for the whole group.

This patch only creates groups with one nesting level. Conceptually
there is nothing preventing us for having deeper nest level. However,
the current implementation is not smart enough to share the recording
(aka capturing) of values. That limits its ability to do more sharing.

For the given example the current patch will generate:
// First group
CheckOpcode G_ADD

First pattern
CheckNumOperand 3
...
Build ADDrr
Second pattern
CheckNumOperand 3
...
Build ADDri

// Second group
CheckOpcode G_SUB

// Third pattern
CheckNumOperand 3
...
Build SUBrr

But if we allowed several nesting level, it could create a sub group
for the checknumoperand 3.
(We would need to call optimizeRules on the rules within a group.)

  • Result ***

With only one level of nesting, the instruction selection pass is up
to 4x faster. For instance, one instruction now takes 500 checks,
instead of 24k! With more nesting we could get in the tens I believe.

Differential Revision: https://reviews.llvm.org/D39034

rdar://problem/34670699

Details