This patch decreases time spent by GlobalISel in its InstructionSelect pass by roughly 60% for -O0 builds for large inputs as measured on sqlite3-amalgamation (http://sqlite.org/download.html) targeting AArch64.
The performance improvement is achieved solely by reducing the number of matching GIM_* opcodes executed by the MatchTable's interpreter during the selection by approx. a factor of 30, which also brings contribution of this particular part of the selection process to the overall runtime of InstructionSelect pass down from approx. 60-70% to 5-7%, thus making further improvements in this particular direction not very profitable.
The patch loosely depends on Testgen patches and relies on additional testing provided by it (see https://reviews.llvm.org/D43962)