If a read access earlier in the loop has already been given a prefetch, and a store to the same address is seen later, that prefetch should be upgraded to a 'write' prefetch.
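A minimal sketch of this upgrade step, under stated assumptions. Everything here is hypothetical and not taken from the actual pass: the `Prefetch` record, the `addAccess` helper, identifying an address by a `BaseId` plus a constant byte `Offset`, and the rule that accesses to the same base within one cache line share a single prefetch. `UpgradeDist = 0` restricts the upgrade to an identical address (PD == 0); a nonzero value would allow a wider window between the load and the store.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical record for one emitted prefetch (not LLVM's actual class).
struct Prefetch {
  int64_t BaseId;  // hypothetical id of the address expression's base
  int64_t Offset;  // constant byte offset from that base
  bool Writes;     // true => emit a write (store) prefetch
};

// Hypothetical helper, called for each load or store seen in the loop.
// Assumption: an access within LineSize bytes of an existing prefetch on the
// same base is covered by it. A store only upgrades that prefetch to 'write'
// if |PD| <= UpgradeDist; UpgradeDist == 0 means "identical address only".
void addAccess(std::vector<Prefetch> &Prefetches, int64_t BaseId,
               int64_t Offset, bool IsStore, int64_t LineSize,
               int64_t UpgradeDist = 0) {
  for (Prefetch &P : Prefetches) {
    if (P.BaseId != BaseId)
      continue;
    int64_t PD = Offset - P.Offset;
    if (std::llabs(PD) < LineSize) {
      // Already covered by P, so no new prefetch is emitted.
      if (IsStore && std::llabs(PD) <= UpgradeDist)
        P.Writes = true;  // upgrade the earlier read prefetch
      return;
    }
  }
  // Not covered: emit a new prefetch, 'write' only if this access is a store.
  Prefetches.push_back({BaseId, Offset, IsStore});
}

// Usage: a load followed by a store to the same address (PD == 0).
void demo() {
  std::vector<Prefetch> P;
  addAccess(P, /*BaseId=*/1, /*Offset=*/0, /*IsStore=*/false, /*LineSize=*/256);
  addAccess(P, 1, 0, true, 256);
  assert(P.size() == 1 && P[0].Writes);  // one prefetch, now a write
}
```

With the default `UpgradeDist = 0`, a store at offset 8 behind a load at offset 0 stays covered by the existing read prefetch but does not upgrade it, which is the conservative behavior the patch aims for.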
I believe one should be careful not to emit store prefetches without reasonable certainty that they benefit the targeted cache line, so this patch only does the upgrade when the store's address is identical to the load's (PD == 0). However, I also experimented with allowing different distances between the load and the store addresses. On SystemZ (with '-min-prefetch-stride=128 -loop-prefetch-writes') I see the following prefetch counts:
                                       read / write prefetches
(unpatched)                            1606 / 601
patch (same address)                   1503 / 704
patch, but within Cache Line Size      1421 / 786
patch, but within CLS / 16             1459 / 748
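A quick arithmetic check of the SystemZ counts above, written as a small C++ sketch (the `Row` struct and the helper names are only for illustration). Every configuration emits 2207 prefetches in total, and the number of additional write prefetches equals the number of read prefetches that disappeared: 103 for the same-address case, 185 within CLS, and 147 within CLS / 16.

```cpp
#include <cassert>

// Counts copied from the SystemZ measurements above (read / write prefetches).
struct Row {
  const char *Config;
  int Read;
  int Write;
};

const Row Rows[] = {
    {"unpatched", 1606, 601},
    {"patch (same address)", 1503, 704},
    {"patch, within CLS", 1421, 786},
    {"patch, within CLS / 16", 1459, 748},
};

// Total prefetches emitted in one configuration.
int total(const Row &R) { return R.Read + R.Write; }

// Write prefetches gained compared with the unpatched build.
int extraWrites(const Row &R) { return R.Write - Rows[0].Write; }

// Read prefetches lost compared with the unpatched build.
int fewerReads(const Row &R) { return Rows[0].Read - R.Read; }

// Sanity checks: the total is unchanged and every lost read prefetch
// reappears as a write prefetch.
void checkCounts() {
  for (const Row &R : Rows) {
    assert(total(R) == 2207);
    assert(extraWrites(R) == fewerReads(R));
  }
}
```

This is consistent with the patch converting existing read prefetches into write prefetches rather than emitting additional ones.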
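In summary, this seems to be worthwhile and safe even when restricted to the identical-address case. Note that the total stays at 2207 prefetches in every configuration, so the patch only turns read prefetches into write prefetches and never adds new ones. Perhaps allowing a distance of up to CLS / 16 would also be beneficial?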