This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add support for strlcpy folding
ClosedPublic

Authored by msebor on Jul 27 2022, 2:54 PM.

Details

Summary

The middle end support for strlcpy consists of transforming calls to __strlcpy_chk to the function itself if the buffer overflow checking overhead can be avoided. This change extends the support to also optimize a subset of calls to the function itself, similar to snprintf.

The strlcpy and strlcat functions have both recently been added to the next revision of POSIX (see issue 986). If/when this proposed enhancement is accepted I'll look into adding minimal support for strlcat as well.

Diff Detail

Event Timeline

msebor created this revision.Jul 27 2022, 2:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 2:54 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
msebor requested review of this revision.Jul 27 2022, 2:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 2:54 PM

Ping: Does anyone have any comments on this enhancement?

For additional motivation behind this change besides the recent inclusion of the functions in POSIX see also the 1999 paper strlcpy and strlcat — Consistent, Safe, String Copy and Concatenation.

efriedma added inline comments.Aug 10 2022, 1:55 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
699

strlcpy accesses the second argument no matter what the size is (so it can compute the return value).

726

>=?

msebor updated this revision to Diff 453112.Aug 16 2022, 1:44 PM
msebor marked an inline comment as done.

Revision 1 of the patch with the following changes:

  • Annotate source argument dereferenceable regardless of the bound.
  • Use getConstantStringInfo instead of GetStringLength to detect unterminated arrays.
  • Add tests.
msebor added inline comments.Aug 16 2022, 1:45 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
699

You're absolutely right, thanks!

726

That would work; at the same time, I was uneasy about GetStringLength not differentiating between an unterminated array and a string of the same length. I'd prefer to avoid assuming the source array is nul terminated, even though it's required to be. So I've replaced the call to GetStringLength with getConstantStringInfo and handled that case.

This revision is now accepted and ready to land.Aug 16 2022, 2:02 PM
This revision was landed with ongoing or failed builds.Aug 16 2022, 3:44 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Aug 16 2022, 3:55 PM

This doesn't build on Mac: http://45.33.8.238/macm1/42588/step_4.txt

Please take a look and revert for now if it takes a while to fix.

Whoops! It's a template argument deduction failure due to a mismatch between size_t and uint64_t in ILP32. Sorry about that! It's fixed in rGa7a1be11e69d.

joanahalili added a subscriber: joanahalili.EditedAug 31 2022, 4:31 AM

This might break the following opensource lib https://git.kernel.org/pub/scm/network/iproute2/iproute2.git

In https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/configure#n445

The compilation for the following code no longer fails when setting -Wno-error-implicit-function-declaration

#include <string.h>
int main(int argc, char **argv) {
	char dst[10];
	strlcpy(dst, "test", sizeof(dst));
	return 0;
}

The behaviour difference between clang 14.0, clang trunk and gcc: https://godbolt.org/z/hKETMoEd6

This might break the following opensource lib https://git.kernel.org/pub/scm/network/iproute2/iproute2.git

In https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/configure#n445

The compilation for the following code no longer fails when setting -Wno-error-implicit-function-declaration

#include <string.h>
int main(int argc, char **argv) {
	char dst[10];
	strlcpy(dst, "test", sizeof(dst));
	return 0;
}

The behaviour difference between clang 14.0, clang trunk and gcc: https://godbolt.org/z/hKETMoEd6

The configuration test assumes the function call isn't optimized when it's compiled with optimization enabled. A safer way to test for the presence of a function in a library than calling it without relying on the effect of the call in any way is to use its address in some nontrivial way, e.g., by printing it, or assigning it to a volatile-quialified pointer.