A uniform load is one that loads from the same address in every lane. As currently implemented, the cost model treats such a load as a single scalar load plus a broadcast, but the actual lowering replicates the load once per lane.
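To make the mismatch concrete, here is a minimal standalone C++ model of the two strategies. It is only an illustrative sketch, not LLVM code: `VF`, `Lowered`, `replicatePerLane`, and `loadAndBroadcast` are hypothetical names, and the vectorization factor of 4 is an assumption.

```cpp
#include <array>
#include <cstddef>

// Assumed vectorization factor for this sketch.
constexpr std::size_t VF = 4;

// Result of "lowering" one uniform load: the per-lane values and the
// number of scalar loads that were issued to produce them.
struct Lowered {
  std::array<int, VF> lanes;
  std::size_t loads;
};

// Models the lowering described above: one scalar load per lane,
// even though every lane reads the same address.
Lowered replicatePerLane(const int *addr) {
  Lowered r{};
  for (std::size_t lane = 0; lane < VF; ++lane) {
    r.lanes[lane] = *addr;
    ++r.loads;
  }
  return r;
}

// Models what the cost model assumes: a single scalar load whose
// result is broadcast to all lanes.
Lowered loadAndBroadcast(const int *addr) {
  Lowered r{};
  const int value = *addr;
  ++r.loads;
  r.lanes.fill(value);
  return r;
}
```

Both functions produce the same lane values; they differ only in the number of loads issued (`VF` versus 1), which is the gap between the lowering and the cost model.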
This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but its real purpose is to pave the way for a follow-up change that will generalize our uniformity logic.
The VPReplicateRecipe contains an IsUniform flag. I think it should be possible to pass the flag through from the recipe to scalarizeInstruction. Ideally, the recipes should contain all the information required for code generation, so that code generation does not have to be tied directly to the cost model.