D29668 enabled to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as it often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced the upper bits are filled with garbage and produce code misbehaving at runtime.
Details
Diff Detail
Event Timeline
I don't see why this optimization should be declared illegal. Why isn't it the caller's responsibility to store these narrow integers canonically? C doesn't have narrow integers, but GCC doesn't do this. If a variable is passed in memory, it just uses the memory that was passed into it.
llvm/test/CodeGen/X86/arg-copy-elide.ll | ||
---|---|---|
80 | We don't want this, do we? Seems like it would be a regression. |
AFAICS it's the callee's responsibility to chop the extra bits off.
In this (compile at -O0) example you can see that @expect is correctly truncating the i3 value passed in-register, but fails to do so for the third argument that's passed in-memory.
Thinking about this more, I think this is a good change. Currently we use the zext and sext attributes to document if the high bits of an argument are initialized, and if so, to what. If those attributes aren't present, we shouldn't do copy elision. I have a minor concern that needs to be addressed, though.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
9824 | Don't we still need this check? I don't see any other code that checks that the allocated type and argument type match. Otherwise we could try to copy elide an i32 stored into an i64 alloca, for example. Please add a test for that, sorry for not already having one. | |
llvm/test/CodeGen/X86/arg-copy-elide.ll | ||
80 | This code turns out to not be representative of what clang generates, so I don't think we need to worry about it. See this bool example here: void escape(bool *); void bool_arg(bool b) { escape(&b); } We fail to copy elide because the argument is i1, but the alloca is i8. I think this will be true generally for non-byte-sized integers, so there is really no harm to this change to copy elision. |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
9824 | In tryToElideArgumentCopy there's a check between the object size pointed by the old and new frame indices, in case of a partially initialized i64 being forwarded as a i32 the check will fail. |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
9824 | I think we should keep the store size vs. alloca size check here. If I were debugging this code later, I would be confused as to how an alloca like this became a copy elision candidate that gets rejected later in tryToElideArgumentCopy. |
Don't we still need this check? I don't see any other code that checks that the allocated type and argument type match. Otherwise we could try to copy elide an i32 stored into an i64 alloca, for example. Please add a test for that, sorry for not already having one.