To allow broadcast loads of a non-zeroth vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load at an adjusted address. Unfortunately, we weren't ensuring that the new load respected the same dependencies as the original load.
This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.
Bug found during internal testing.
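For readers less familiar with chain dependencies, here is a toy model of the problem and the fix. It is not LLVM code: `Node`, `ToyDAG`, `replaceChainUses`, `orderedAfter` and `buildScenario` are hypothetical names made up for illustration, and the sketch assumes the TokenFactor joins the chains of both the old and the new load before the old load's chain users are redirected to it.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DAG node (hypothetical, NOT LLVM's SDNode).
struct Node {
  std::string name;
  std::vector<Node*> chainOperands;  // nodes that must be ordered before this one
};

struct ToyDAG {
  std::vector<std::unique_ptr<Node>> nodes;

  Node* make(std::string name, std::vector<Node*> chains = {}) {
    nodes.push_back(
        std::make_unique<Node>(Node{std::move(name), std::move(chains)}));
    return nodes.back().get();
  }

  // Redirect every chain use of `from` to `to`, leaving `skip` untouched so
  // that the token factor keeps its own reference to the old load.
  void replaceChainUses(Node* from, Node* to, Node* skip) {
    for (auto& n : nodes) {
      if (n.get() == skip) continue;
      std::replace(n->chainOperands.begin(), n->chainOperands.end(), from, to);
    }
  }
};

// True if `later` is transitively chained after `earlier`.
bool orderedAfter(const Node* later, const Node* earlier) {
  for (const Node* op : later->chainOperands)
    if (op == earlier || orderedAfter(op, earlier)) return true;
  return false;
}

struct Scenario {
  Node* oldLoad;
  Node* newLoad;
  Node* store;
};

// A store is chained after the old load. The broadcast lowering then creates
// a new load that only inherits the old load's input chain.
Scenario buildScenario(ToyDAG& dag, bool applyFix) {
  Node* entry = dag.make("entry");
  Node* oldLoad = dag.make("old_load", {entry});
  Node* store = dag.make("store", {oldLoad});
  Node* newLoad = dag.make("new_load", {entry});
  if (applyFix) {
    // Join both load chains and make the old load's chain users depend on
    // the join, so they are ordered after the new load as well.
    Node* tf = dag.make("token_factor", {oldLoad, newLoad});
    dag.replaceChainUses(oldLoad, tf, tf);
    assert(orderedAfter(store, newLoad));
  }
  return {oldLoad, newLoad, store};
}
```

Without the fix, nothing in the toy graph orders the store after the new load, so a scheduler would be free to move the store ahead of it. With the fix, the store is chained after both loads through the token factor.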
Why do we need the TokenFactor? The load is only used in the shuffle, right? So why can't we just replace Ld with V completely?
Or do you prefer not to trust the load being DCE'd?