Add an optimization that does CSE in a group of similar GEPs.
This optimization merges the common part of a group of GEPs, so we can compute
each pointer address by adding a simple offset to the common part.
The optimization is currently only enabled for the NVPTX backend, where it has
a large payoff on some benchmarks.
Patch by Jingyue Wu.