I have not found a way to expose a difference for this patch in a test because it only triggers for a one-use load. However, this is the code that was adapted into D118376 and caused miscompiles. The new code pattern is the same as what we do in narrowExtractedVectorLoad(), which reduces the load width for a subvector extract.
This removes seemingly unnecessary manual worklist management and fixes the chain updating via:

  SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
                                                     SDValue NewMemOpChain) {
    assert(isa<MemSDNode>(NewMemOpChain) && "Expected a memop node");
    assert(NewMemOpChain.getValueType() == MVT::Other && "Expected a token VT");
    // The new memory operation must have the same position as the old load in
    // terms of memory dependency. Create a TokenFactor for the old load and new
    // memory operation and update uses of the old load's output chain to use that
    // TokenFactor.
    if (OldChain == NewMemOpChain || OldChain.use_empty())
      return NewMemOpChain;

    SDValue TokenFactor = getNode(ISD::TokenFactor, SDLoc(OldChain), MVT::Other,
                                  OldChain, NewMemOpChain);
    ReplaceAllUsesOfValueWith(OldChain, TokenFactor);
    UpdateNodeOperands(TokenFactor.getNode(), OldChain, NewMemOpChain);
    return TokenFactor;
  }
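
To illustrate the chain update outside of LLVM, here is a minimal standalone sketch of the same three steps on a toy graph. Everything in it (`Node`, `ToyDAG`, `create`, `hasUses`, `replaceAllUsesWith`) is a hypothetical stand-in invented for this example and is not the SelectionDAG API; in particular, it treats a node and its output chain as the same thing, which the real code does not.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DAG node; only chain operands are modeled.
struct Node {
  std::string name;
  std::vector<Node *> operands;
};

// Toy stand-in for the DAG; owns every node it creates.
struct ToyDAG {
  std::vector<std::unique_ptr<Node>> nodes;

  Node *create(const std::string &name, std::vector<Node *> ops = {}) {
    nodes.push_back(std::make_unique<Node>(Node{name, std::move(ops)}));
    return nodes.back().get();
  }

  // True if any node in the graph uses n as an operand.
  bool hasUses(const Node *n) const {
    for (const auto &user : nodes)
      for (const Node *op : user->operands)
        if (op == n)
          return true;
    return false;
  }

  // Redirect every use of 'from' to 'to', including uses inside 'to' itself.
  void replaceAllUsesWith(Node *from, Node *to) {
    for (auto &user : nodes)
      for (Node *&op : user->operands)
        if (op == from)
          op = to;
  }

  // Mirrors the shape of makeEquivalentMemoryOrdering() on the toy graph.
  Node *makeEquivalentMemoryOrdering(Node *oldChain, Node *newChain) {
    if (oldChain == newChain || !hasUses(oldChain))
      return newChain;

    // Step 1: join the old and new chains.
    Node *tokenFactor = create("TokenFactor", {oldChain, newChain});
    // Step 2: users of the old chain now depend on both memory operations.
    // This also rewrites the TokenFactor's own first operand to itself.
    replaceAllUsesWith(oldChain, tokenFactor);
    // Step 3: undo that self-reference by restoring the intended operands.
    tokenFactor->operands = {oldChain, newChain};
    return tokenFactor;
  }
};
```

In this sketch, a store chained on the old load ends up chained on the TokenFactor, so it stays ordered after both the old load and the narrowed one. Step 3 is the toy counterpart of the UpdateNodeOperands() call: without it, the TokenFactor would list itself as an operand.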