While testing D52351 (which stops adding the operand scalarization overhead cost when the target keeps addresses in GPRs), I discovered that some vector loads now get a cost of zero. This happens because the scalar load can be folded into its user, e.g. as one of the operands of an add. The problem is that this folding is only possible in the scalar version; it cannot happen once the load is vectorized.
I think the simplest solution is to not pass the instruction pointer to getMemoryOpCost() from getMemInstScalarizationCost(). The SystemZ implementation only considers folding the load into its user when that pointer is passed.
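To illustrate the idea, here is a minimal, self-contained C++ toy model of the proposed fix. It is not LLVM's TargetTransformInfo API: `Inst`, `getMemoryOpCost`, `getMemInstScalarizationCost` and `getMemInstScalarizationCostBuggy` below are hypothetical, simplified stand-ins that drop the real parameters (opcode, type, alignment, address space) and ignore address computation and insert/extract overhead. The mock cost hook treats a load as free only when it is given the instruction and the load's user could fold it, roughly matching the SystemZ behavior described above.

```cpp
// Hypothetical stand-in for an instruction: we only record whether the
// (scalar) user of this load could fold it as a memory operand.
struct Inst {
  bool UserCanFoldLoad;
};

// Hypothetical mock of a target memory-op cost hook. If the instruction is
// provided and its user can fold the load, the load is considered free.
// Without the instruction, a separate load is assumed.
unsigned getMemoryOpCost(const Inst *I = nullptr) {
  if (I && I->UserCanFoldLoad)
    return 0; // load folded into its user
  return 1;   // separate load instruction
}

// Fixed variant: do not pass the instruction. The loads are scalar, but
// their user becomes a vector instruction after vectorization, so the
// scalar loads cannot be folded into it.
unsigned getMemInstScalarizationCost(const Inst &I, unsigned VF) {
  (void)I; // intentionally not forwarded to the cost query
  return VF * getMemoryOpCost(/*I=*/nullptr);
}

// Buggy variant: forwarding the instruction lets the target assume the
// scalar-only folding, so the scalarized loads can end up costing zero.
unsigned getMemInstScalarizationCostBuggy(const Inst &I, unsigned VF) {
  return VF * getMemoryOpCost(&I);
}
```

With a foldable load and VF = 4, the buggy variant reports a cost of 0, while the fixed variant reports 4, one load per scalarized lane.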
I think it would be good to add a comment explaining why no instruction pointer is passed. Otherwise, LGTM.