This effectively relands r308322 / D35067, but it sidesteps the PR33914 regression by increasing the load count for memcmp() only when the caller cares solely about equality (not about which operand is greater or lesser).
This patch also generalizes combineVectorSizedSetCCEquality() to handle the nontrivial results produced by the memcmp() expansion pass.
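
To illustrate the equality-only distinction, here is a minimal C sketch. The function names (equal16, equal16_expanded, less16) and the 16-byte size are hypothetical and chosen only for illustration; the hand-written expansion shows the general load-and-compare idea and is not the actual output of the expansion pass.

```c
#include <stdint.h>
#include <string.h>

/* Equality-only use: the memcmp() result is only tested against zero,
 * so the sign of the result (which operand is greater) is irrelevant. */
int equal16(const void *a, const void *b) {
    return memcmp(a, b, 16) == 0;
}

/* Hand-written sketch (an assumption, not compiler output) of how an
 * equality-only comparison can be done with a few wide loads: XOR the
 * corresponding chunks, OR the differences together, and test for zero.
 * No byte ordering is needed because only "any difference?" matters. */
int equal16_expanded(const void *a, const void *b) {
    uint64_t a0, a1, b0, b1;
    memcpy(&a0, a, 8);
    memcpy(&a1, (const char *)a + 8, 8);
    memcpy(&b0, b, 8);
    memcpy(&b1, (const char *)b + 8, 8);
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}

/* Ordering use: the sign of the result is observed, so the comparison
 * must locate the first differing byte. This is the case the patch
 * leaves at the existing load count. */
int less16(const void *a, const void *b) {
    return memcmp(a, b, 16) < 0;
}
```

Only callers shaped like equal16() are affected by the increased load count; callers shaped like less16() are not.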