This is an archive of the discontinued LLVM Phabricator instance.

[BuildLibCalls] Add noundef to standard I/O functions
ClosedPublic

Authored by aqjune on Aug 5 2020, 12:31 PM.

Details

Summary

This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.

With this patch, the functions cannot return undef or poison anymore as well.
According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used at branch condition).

— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).

In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.

Diff Detail

Event Timeline

aqjune created this revision.Aug 5 2020, 12:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2020, 12:31 PM
aqjune requested review of this revision.Aug 5 2020, 12:31 PM
aqjune edited the summary of this revision. (Show Details)
aqjune edited the summary of this revision. (Show Details)
jdoerfert added inline comments.Aug 5 2020, 8:32 PM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
509

I'd split it into noundef and other attributes. noundef is so new that we should keep that separate, having the others in a single patch is fine with me though.

aqjune added inline comments.Aug 8 2020, 11:44 PM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
509

Hi,
These attributes existed in the past - they are copied from lines 514-515.
This change was made because noundef is not added to frexp/frexpl/frexpf in this patch, which can possibly make those unspeculatable.

jdoerfert accepted this revision.Aug 9 2020, 10:12 AM

LGTM, assuming the fixes to the failing tests are straight forward.

llvm/lib/Transforms/Utils/BuildLibCalls.cpp
509

I see.

This revision is now accepted and ready to land.Aug 9 2020, 10:12 AM
aqjune updated this revision to Diff 284246.Aug 9 2020, 6:58 PM

Update PR3589-freestanding-libcalls.c

Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2020, 6:58 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
This revision was landed with ongoing or failed builds.Aug 9 2020, 6:58 PM
This revision was automatically updated to reflect the committed changes.
aqjune added a comment.Aug 9 2020, 7:07 PM

If things go well, I'll add noundef to other library functions such as utime, setenv, unsetenv, .... as well.
For malloc/calloc/realloc/free, I'll think more about them. I guess it is safe to tag noundef to free's operand, but for malloc I'm not 100% sure...

If things go well, I'll add noundef to other library functions such as utime, setenv, unsetenv, .... as well.
For malloc/calloc/realloc/free, I'll think more about them. I guess it is safe to tag noundef to free's operand, but for malloc I'm not 100% sure...

What I think is helpful is the operand of a free and the return value of a {m,c,re}alloc. Arguably, you can give the latter anything but what falls out is better not undef or poison ;)

aqjune added a comment.Aug 9 2020, 8:59 PM

What about undef or poison is given to malloc? If it should raise UB, the size argument and returned pointer should be noundef.

What about undef or poison is given to malloc? If it should raise UB, the size argument and returned pointer should be noundef.

It is unclear to me if we want to forbid undef to be passed to malloc. It makes practical sense but not from a semantic perspective.
However, even if you pass undef you should not get undef back. So the return should never be undef for sure. If it doesn, how could you ever deal with it, given that a branch on the result would be UB :D

What about undef or poison is given to malloc? If it should raise UB, the size argument and returned pointer should be noundef.

It is unclear to me if we want to forbid undef to be passed to malloc. It makes practical sense but not from a semantic perspective.
However, even if you pass undef you should not get undef back. So the return should never be undef for sure. If it doesn, how could you ever deal with it, given that a branch on the result would be UB :D

Makes sense.... and labeling the returned pointer of malloc as noundef will be useful in various places, e.g. proving safety of loop unswitching on icmp(malloc(p), null).
Give me some time, because I'd like to play with this by implementing this on Alive2 and see how it goes when running unit tests.