Not really a review, but after removing this code I see slight improvements in
binary size on CTMark. It looks like, with the "ptr" patches, MemCpyOptPass is
capable of handling this.
That's interesting. If MemCpyOpt has picked up the slack and clang can be simpler here, then let's remove clang's "intelligence". I do want to make sure that it handles all the cases this code handles, though, and that it does indeed generate great code on both x86-64 and ARM64. I think we need to create a list of test cases and make sure MemCpyOpt covers them. I'd also like to check that performance is still the same (not just code size).
I would still keep scalar stores (as you did above), and maybe update canDoSingleStore to also handle stores to a struct that only has a single element.
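A rough sketch of the kind of inputs such a test list might contain. The function and struct names are hypothetical, not taken from this review. Each function initializes a local in a different shape (all-zero, mostly-zero, non-uniform constant, single-element struct). The idea would be to compile it with `clang -O3 -S -emit-llvm` (and `-Os`) with and without the change, and compare the emitted stores/memset/memcpy. The comments only say what one might hope to see; they are not verified codegen. Uninitialized locals under `-ftrivial-auto-var-init=pattern`/`zero` would need separate cases.

```c
#include <stddef.h>

/* Hypothetical test inputs; names are illustrative only. */
struct Single { int x; };   /* struct with a single element */
struct Big { int v[16]; };  /* 64 bytes: large enough that clang may use memset/memcpy */

/* Sum the bytes of an object. noinline (a Clang/GCC attribute) keeps the
   local initializations visible in the emitted IR instead of folding them. */
__attribute__((noinline)) static int byte_sum(const void *p, size_t n) {
    const unsigned char *b = p;
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += b[i];
    return s;
}

/* All-zero aggregate: one would hope for a single memset. */
int init_all_zero(void) {
    struct Big b = {0};
    return byte_sum(&b, sizeof b);
}

/* Mostly zero with one non-zero element: e.g. memset plus one store. */
int init_mostly_zero(void) {
    struct Big b = {.v = {[7] = 1}};
    return byte_sum(&b, sizeof b);
}

/* Non-uniform constant: e.g. a memcpy from a global or a few stores. */
int init_pattern(void) {
    struct Big b = {.v = {1, 2, 3, 4}};
    return byte_sum(&b, sizeof b);
}

/* Single-element struct: ideally a single scalar store. */
int init_single(void) {
    struct Single s = {42};
    return byte_sum(&s, sizeof s);
}
```

The byte sums only check that each variant still produces the right values. The interesting comparison is in the IR.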
I will. I may also measure compilation time.
So far I have tried -O3 and -Os.
I also tried Chromium on arm, arm64, and x86_64: no binary-size regression without autoinit, or with zero or pattern init.