Turns out creating matchers with combineOr isn't very efficient as we have to build matcher objects for both sides of the OR. Those objects aren't free, the trees usually contain several objects that contain a reference to a Value *, ConstantInt *, APInt * or some such thing. The compiler isn't always willing to inline all the matcher code to get rid of these member variables. Thus we end up loads and stores of these variables.
Using combineOR ends up creating two complete copies of the tree and the associated stores. I believe we're also paying for the opcode check twice.
This patch adds a commutable mode to several of the matcher objects as a bool template parameter that can be used to enable commutable support directly in the match functions of the corresponding objects. This avoids the duplicate object creation and the opcode checks.
This shows about an ~7-8k reduction in the opt binary size on my local build.