I noticed this pattern appearing when running the Bullet physics engine on Node.js. Folding away the xor looks beneficial across different architectures and runtimes. Measured speedups:
| Benchmark | MacBook M1 (Node.js) | MacBook M1 (Wasmtime) | Ryzen 3 (Node.js) |
|---|---|---|---|
| Bullet | 1.4% | 0.5% | 1% |
| Adobe | 2.4% | 0.8% | 2.3% |
I have also performed this transformation directly in V8, and the numbers correlate.
This looks generally useful. Can you move it to WebAssemblyInstrInfo.td and add a TODO about using it in more places?