Hi,
This is a WIP, if people like it, I can add comments and stuff and push behind an experimental option (or just use it by default). This is the full diff, in particular, I've squashed the whole history to show the full picture.
- Context ***
Right now, the table generated for matching instruction is straight forward but highly inefficient. Basically, each pattern generates its own set of self contained checks and actions.
E.g., TableGen generates:
First pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDrr
Second pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDri
// Third pattern
CheckNumOperand 3
CheckOpcode G_SUB
...
Build SUBrr
- Problem ***
Because of that generation, a *lot* of check are redundant between each pattern and are checked every single time until we reach the pattern that matches.
E.g., Taking the previous table, let say we are matching a G_SUB, that means we are going to check all the rules for G_ADD before looking at the G_SUB rule. In particular we are going to do:
check 3 operands; PASS
check G_ADD; FAIL
; Next rule
check 3 operands; PASS (but we already knew that!)
check G_ADD; FAIL (well it is still not true
; Next rule
check 3 operands; PASS (really!!)
check G_SUB; PASS (at last :P)
- Proposed Solution ***
This patch introduced a concept of group of rules (GroupMatcher) that share some predicates and only get checked once for the whole group.
As a proof of concept this patch only creates groups with one nesting level, but really there is nothing preventing it to do more.
In particular, for the given example it would generate:
// First group
CheckOpcode G_ADD
// First pattern CheckNumOperand 3 ... Build ADDrr // Second pattern CheckNumOperand 3 ... Build ADDri
// Second group
CheckOpcode G_SUB
// Third pattern CheckNumOperand 3 ... Build SUBrr
But it could also create a sub group for the checknumoperand 3 (we would just need to add a OptimizeRules on the rules within a group).
- Result ***
With only one level of nesting, the instruction selection pass was up to 4x faster. For instance, one instruction takes 500 checks, instead of 24k! With more nesting we could get in the tens I believe.
What do people think?
Cheers,
-Quentin