DSE currently only removes dead stores within the same block, unless they can be guaranteed to be loop invariant. This patch expands that to any stores that are in the same loop, at the same loop level. I believe this still accounts for the cases where AA/MSSA will not handle aliasing between loops, while allowing dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless.
The test case this helps is from code like this, which can come up in certain matrix operations:
  for (i = ..) {
    dst[i] = 0;
    for (j = ..)
      dst[i] += src[i*n+j];
  }
After LICM, this becomes:
  for (i = ..) {
    dst[i] = 0;
    sum = 0;
    for (j = ..)
      sum += src[i*n+j];
    dst[i] = sum;
  }
The first store is dead, but is not currently removed.
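For reference, the pattern can be written out as a small C sketch. This is not the patch's test case, and the real transformation happens on LLVM IR rather than on C source. The function names, the `int` element type, the explicit bounds `m` and `n`, and the `restrict` qualifiers (standing in for the no-alias facts LICM needs) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Source form: dst[i] is the accumulator, so every inner iteration
   loads and stores dst[i]. */
void row_sums_source(int *restrict dst, const int *restrict src,
                     size_t m, size_t n) {
    for (size_t i = 0; i < m; i++) {
        dst[i] = 0;
        for (size_t j = 0; j < n; j++)
            dst[i] += src[i * n + j];
    }
}

/* Shape after LICM promotes the accumulator to a scalar. Both stores
   to dst[i] sit in the outer loop at the same loop level, but in
   different blocks because the inner loop lies between them. */
void row_sums_after_licm(int *restrict dst, const int *restrict src,
                         size_t m, size_t n) {
    for (size_t i = 0; i < m; i++) {
        dst[i] = 0; /* dead: overwritten later in the same iteration */
        int sum = 0;
        for (size_t j = 0; j < n; j++)
            sum += src[i * n + j];
        dst[i] = sum;
    }
}

/* What is left once the dead store is removed. */
void row_sums_after_dse(int *restrict dst, const int *restrict src,
                        size_t m, size_t n) {
    for (size_t i = 0; i < m; i++) {
        int sum = 0;
        for (size_t j = 0; j < n; j++)
            sum += src[i * n + j];
        dst[i] = sum;
    }
}

/* Usage: all three variants must agree on a small 2x3 input. */
void check_variants(void) {
    const int src[6] = {1, 2, 3, 4, 5, 6};
    int a[2], b[2], c[2];
    row_sums_source(a, src, 2, 3);
    row_sums_after_licm(b, src, 2, 3);
    row_sums_after_dse(c, src, 2, 3);
    for (size_t k = 0; k < 2; k++)
        assert(a[k] == b[k] && b[k] == c[k]);
}
```

Nothing reads `dst[i]` between the two stores in `row_sums_after_licm`, and both execute once per outer iteration, which is why the first store can be dropped without changing the result.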