This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: use non-zero memset when possible for automatic variables
ClosedPublic

Authored by jfb on Jul 24 2018, 5:16 PM.

Details

Summary

Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where a memset of a non-zero byte pattern would be better than a memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.
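
For illustration only (this example is not taken from the patch's test file), the kind of code this targets is an automatic variable whose initializer repeats a single non-zero byte:

  void use(unsigned char *);

  void f(void) {
    // Previously clang would typically synthesize a 16-byte constant global
    // and memcpy it into 'a'. Since every byte of the initializer is 0xFF,
    // it can instead emit something like memset(a, 0xFF, 16), which needs
    // no global and touches less memory.
    unsigned char a[16] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                           0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    use(a);
  }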

This code could definitely be more clever, but I'm not sure it's worth it. In particular, we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same value as for bzero seems conservatively OK for now.
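
As a sketch of the histogram idea above (illustrative only, not code from this patch), one could scan the initializer's byte image, find the dominant byte value, and count how many bytes would still need individual stores after a memset of that value:

  #include <array>
  #include <cstddef>
  #include <cstdint>
  #include <utility>

  // Sketch only: given the raw byte image of an initializer, find the most
  // common byte value and how many bytes differ from it. A caller could then
  // memset the dominant byte and patch up the few outliers with individual
  // stores, much like the existing bzero path tolerates a handful of
  // trailing stores.
  std::pair<std::uint8_t, std::size_t> dominantByte(const std::uint8_t *Bytes,
                                                    std::size_t Size) {
    std::array<std::size_t, 256> Histogram{}; // zero-initialized counts
    for (std::size_t I = 0; I != Size; ++I)
      ++Histogram[Bytes[I]];

    unsigned Best = 0;
    for (unsigned V = 1; V != 256; ++V)
      if (Histogram[V] > Histogram[Best])
        Best = V;

    // {byte to memset, number of bytes that would still need stores}
    return {static_cast<std::uint8_t>(Best), Size - Histogram[Best]};
  }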

rdar://problem/42563091

Diff Detail

Repository
rL LLVM

Event Timeline

jfb created this revision. Jul 24 2018, 5:16 PM
Quuxplusone added inline comments.
test/CodeGen/init.c
202 ↗(On Diff #157176)

Drive-by suggestion: If you make this struct S { union U u; short s; }; then you'll also be testing the case of "padding between struct fields", which is otherwise untested here.
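
For context on why padding matters here (a generic illustration, not the suggested test itself): padding bytes between fields are part of the object's storage, so they also have to match the repeated byte for a single memset to cover the whole variable. On a common 32/64-bit ABI where int is 4 bytes and 4-byte aligned:

  #include <cstddef>

  // Generic illustration only.
  struct Padded {
    char c; // 1 byte, then (on typical ABIs) 3 bytes of padding before 'i'
    int i;  // 4 bytes, 4-byte aligned on most platforms
  };

  // Assumption: a typical ABI; this is not a portability guarantee.
  static_assert(sizeof(Padded) == 8, "expects 3 padding bytes after 'c'");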

bogner accepted this revision. Jul 24 2018, 5:52 PM
bogner added a subscriber: bogner.

Seems straightforward and correct to me.

lib/CodeGen/CGDecl.cpp
956–957 ↗(On Diff #157176)

Probably makes sense to swap the order of these or give the enum class a smaller underlying type than int.
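
As a sketch of what the second option amounts to (the enum name here is hypothetical, not the one in the patch):

  #include <cstdint>

  // The explicit underlying type keeps the enum class from defaulting to
  // int-sized storage.
  enum class InitKind : std::uint8_t { Zero, Pattern };

  static_assert(sizeof(InitKind) == 1, "one byte instead of sizeof(int)");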

996–998 ↗(On Diff #157176)

Very much a nitpick, but this would be slightly easier to follow if written without the negation, with the positive case first.
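
Concretely (with hypothetical helper names, not the patch's actual code), the nitpick is about preferring the second form over the first:

  // Stubs for illustration only; the two functions below behave identically,
  // but the second reads without a negation.
  void emitMemset() {}
  void emitMemcpyFromGlobal() {}

  void beforeStyle(bool canUseMemset) {
    if (!canUseMemset)
      emitMemcpyFromGlobal();
    else
      emitMemset();
  }

  void afterStyle(bool canUseMemset) {
    if (canUseMemset)
      emitMemset();
    else
      emitMemcpyFromGlobal();
  }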

This revision is now accepted and ready to land. Jul 24 2018, 5:52 PM
jfb updated this revision to Diff 157191. Jul 24 2018, 9:22 PM
jfb marked 2 inline comments as done.
  • Use short to test padding between array elements.
  • Define enum class storage type; swap order of if / else to make it more readable.
jfb marked an inline comment as done. Jul 24 2018, 9:22 PM

Addressed all comments.

lib/CodeGen/CGDecl.cpp
956–957 ↗(On Diff #157176)

I defined the enum class' storage type as uint8_t.

This revision was automatically updated to reflect the committed changes.

I'm curious: isn't this the kind of optimization we should expect LLVM to provide?

jfb added a comment. Jul 31 2018, 2:19 PM

I'm curious: isn't this the kind of optimization we should expect LLVM to provide?

Maybe? It seems obvious to do here, since we know we'll probably want to be doing it, and I have another patch I'm working on which will make it that much more obviously useful to have here. The middle end can definitely figure it out, but that just seems like more work, done later, and in the meantime we'd be carrying around more IR than needed.

In D49771#1183380, @jfb wrote:

I'm curious: isn't this the kind of optimization we should expect LLVM to provide?

Maybe? It seems obvious to do here, since we know we'll probably want to be doing it, and I have another patch I'm working on which will make it that much more obviously useful to have here. The middle end can definitely figure it out, but that just seems like more work, done later, and in the meantime we'd be carrying around more IR than needed.

I'm not asking where it is easier to do, but where it makes the most sense :)
Doing this in LLVM generally means catching more patterns (e.g. after inlining), and also catching them from multiple frontends. So in general I'm worried when I see optimizations implemented in the frontend instead of the middle end.

jfb added a comment. Jul 31 2018, 4:07 PM
In D49771#1183380, @jfb wrote:

I'm curious: isn't this the kind of optimization we should expect LLVM to provide?

Maybe? It seems obvious to do here, since we know we'll probably want to be doing it, and I have another patch I'm working on which will make it that much more obviously useful to have here. The middle end can definitely figure it out, but that just seems like more work, done later, and in the meantime we'd be carrying around more IR than needed.

I'm not asking where it is easier to do, but where it makes the most sense :)

What I mean by "easy" is: we know we're likely to want this type of code, and there's not much pattern recognition needed on our part here. Were we to wait, we'd need to do more work. I believe this statement will become truer over time.

Doing this in LLVM generally means catching more patterns (e.g. after inlining), and also catching them from multiple frontends. So in general I'm worried when I see optimizations implemented in the frontend instead of the middle end.

Agreed, LLVM could also do it, and it would likely be useful to do so. I'm worried, however, about generating a bunch more code than needed from clang in the hopes that the compiler will clean it up later.

I'm worried, however, about generating a bunch more code than needed from clang in the hopes that the compiler will clean it up later.

Isn't that a strong design component of clang/LLVM? Clang does not try to generate "smart" code and instead leaves it up to LLVM to clean it up.

jfb added a comment. Jul 31 2018, 10:15 PM

I'm worried, however, about generating a bunch more code than needed from clang in the hopes that the compiler will clean it up later.

Isn't that a strong design component of clang/LLVM? Clang does not try to generate "smart" code and instead leaves it up to LLVM to clean it up.

The code around this one, and the lack of such code in LLVM, seem to disagree. :-)

joerg added a subscriber: joerg. Aug 1 2018, 10:19 AM

There are two different considerations here:
(1) Create less target code
(2) Create less IR

If this code can significantly reduce the amount of IR, it can be useful in general. That's why the existing memset logic is helpful.