This is an archive of the discontinued LLVM Phabricator instance.

Do not warn about format strings that are indexed string literals.
ClosedPublic

Authored by meikeb on Aug 23 2016, 3:52 PM.

Details

Summary

The warning for a format string not being a string literal and therefore
being potentially insecure is overly strict for indices into string
literals. This fix checks if the index into the string literal is
precomputable. If that's the case it will check if the suffix of that
string literal is a valid format string string literal. It will still
issue the aforementioned warning for out of range indices into the
string literal.

Diff Detail

Repository
rL LLVM

Event Timeline

meikeb updated this revision to Diff 69053.Aug 23 2016, 3:52 PM
meikeb retitled this revision from to Do not warn about format strings that are indexed string literals..
meikeb updated this object.
meikeb added a reviewer: rsmith.
meikeb added a subscriber: cfe-commits.
srhines added inline comments.Aug 23 2016, 4:57 PM
lib/Sema/SemaChecking.cpp
3856 ↗(On Diff #69053)

It might be good to mention that Offset now goes back to the caller to allow for checking of string literal suffixes.

3900 ↗(On Diff #69053)

separately

meikeb updated this revision to Diff 69057.Aug 23 2016, 5:07 PM

Fix typo.

meikeb marked an inline comment as done.Aug 23 2016, 5:09 PM
meikeb added inline comments.
lib/Sema/SemaChecking.cpp
3856 ↗(On Diff #69057)

I'm not sure if that should be mentioned here because it is a very high level comment and the suffix of a string literal is a string literal itself.

rsmith edited edge metadata.Aug 23 2016, 7:15 PM

Please also handle expressions of the form &"str"[i]. We warn (under -Wstring-plus-int) for "str" + i and suggest rewriting into that form, so we should also handle that form here.

lib/Sema/SemaChecking.cpp
3864 ↗(On Diff #69057)

Why is this passed by reference? It's just an input, not an output, right?

4062–4067 ↗(On Diff #69057)

This doesn't seem like it preserves enough information for the downstream code to give correct caret diagnostics pointing at locations within the string. It seems like it would be extremely complicated to maintain the necessary invariants to make that work (you'd need to create a fake string literal source buffer so that the StringLiteralParser can reparse it, for whichever of the string literal tokens the offset ends up within). Have you looked at how much work it'd be to feed a starting offset into CheckFormatString instead?

4089–4090 ↗(On Diff #69057)

What happens if one of these expressions is value-dependent? The evaluator can crash or assert if given a value-dependent expression. If we don't defer these checks in dependent contexts, you'll need to handle that possibility somehow.

Example:

template<int N> void f(const char *p) {
  printf("blah blah %s" + N, p);
}
4097–4100 ↗(On Diff #69057)

This will assert if the result doesn't fit into 64 bits, and it's not guaranteed to (if one of the operands was cast to __int128, for instance). You could use getLimitedValue instead, with some suitable limit.

meikeb added inline comments.Aug 26 2016, 11:31 AM
lib/Sema/SemaChecking.cpp
4089–4090 ↗(On Diff #69057)

I think I don't understand what you are trying to tell me. Especially the example you provided does just fine and behaves as I expected. As far as I followed EvaluateAsInt it does not assert but returns false if we don't get a constexpr here. We warn under -Wformat-nonliteral for value-dependent string literals.

Could you explain this more or provide an example that triggers an assert or explain what behavior is wrong regarding the provided example? Thanks!

rsmith added inline comments.Sep 7 2016, 5:48 PM
lib/Sema/SemaChecking.cpp
4089–4090 ↗(On Diff #69057)

We should not warn for that example, since (for instance) calling f<0> is fine (we should warn for f<11>, though, since it has no format specifiers).

While EvaluateAsInt happens to not assert for that particular value-dependent input, it does assert for some other value-dependent cases. It's not easy for me to find you such a case, because Clang is currently careful to never call this function on a value-dependent expression, but perhaps this will trigger an assert:

struct S { constexpr S(int n) : n(n) {} int n; };
template<int N> void f(const char *p) {
  printf("blah blah %s" + S(N).n, p);
}
meikeb updated this revision to Diff 71043.Sep 12 2016, 12:43 PM
meikeb edited edge metadata.

Fix the mentioned issues.

meikeb marked 4 inline comments as done.Sep 12 2016, 12:46 PM
meikeb added inline comments.
lib/Sema/SemaChecking.cpp
4166–4167 ↗(On Diff #71043)

I hope this additional check fixes this issue. As far as I read the code there were none such asserts in isIntegerConstantExpr(). Thanks for explaining this!

rsmith added inline comments.Sep 12 2016, 1:07 PM
lib/Sema/SemaChecking.cpp
3842–3843 ↗(On Diff #71043)

I can't tell from this declaration what this function is for -- what does "reckon up" mean?

3880 ↗(On Diff #71043)

Typo "retruning"

3891–3892 ↗(On Diff #71043)

I think you can simplify this to return FExpr->getString().drop_front(Offset);

4143 ↗(On Diff #71043)

You should presumably also do this if Offset is >= the length of the string literal (we want printf and friends to at least find the trailing nul byte).

4148–4149 ↗(On Diff #71043)

Does this need to be heap-allocated?

4166–4167 ↗(On Diff #71043)

OK, I've now checked and the value-dependent case is handled up on line 3953. So just calling EvaluateAsInt here is fine after all.

meikeb updated this revision to Diff 71090.Sep 12 2016, 6:37 PM
meikeb marked an inline comment as done.

Fix mentioned issues.

meikeb marked 6 inline comments as done.Sep 12 2016, 6:41 PM

Thanks for taking the time and doing these great reviews! Really appreciated!

lib/Sema/SemaChecking.cpp
4143–4150 ↗(On Diff #71090)

The = case is part of a different warning. It's checked in CheckFormatString.

My comment is mostly naming considerations to improve clarity. I do have concerns though about the unhandled else path.

lib/Sema/SemaChecking.cpp
3842 ↗(On Diff #71090)

Is "computeStringLiteralOffset" or "calculate..." a better name here?

3844 ↗(On Diff #71090)

Is "Operand" better than "Addend"? In particular, there is the possibility that we do subtraction of the value instead of addition, so "Addend" makes it a bit confusing. Of course, I then would expect "OperandIsRight" instead of "AddendIsRight" too.

3852 ↗(On Diff #71090)

Align -> Canonicalize or Adjust

This is confusing here as "Align" already has too many overloaded meanings in programming (that could be relevant to bitwidths). Canonicalize or Adjust don't have this problem.

3860 ↗(On Diff #71090)

Ov -> Overflow

3865 ↗(On Diff #71090)

What happens if someone passes something that isn't caught by these two cases? Should we be returning an indicator that the calculation failed? If not, should we assert here?

3867 ↗(On Diff #71090)

2 places to fix: "a offset" -> "an offset"

meikeb updated this revision to Diff 71199.Sep 13 2016, 10:43 AM
meikeb marked 7 inline comments as done.

Fix typos and add assert to sum up offset helper.

I explained why I chose the names that you commented on. Feel free to add your thoughts if you still think another name would be more fitting.

lib/Sema/SemaChecking.cpp
3842 ↗(On Diff #71090)

I thought about that but decided to go with sumUp because compute or calculate sounds like this function would do what we actually do what the caller of this function does (computing the offset). This is just a nice helper to sum up the offset we already have with another piece of offset.

3844 ↗(On Diff #71090)

Clang summarizes sub and add as "additive" operands. This is why I think this is fitting. Operand is misleading because it includes a lot more operands than add and sub imo.

3860 ↗(On Diff #71090)

I named that in compliance with clang naming. E.g. sadd_ov. It is common in this file to abbreviate variable names with 1-3 characters.

rsmith accepted this revision.Sep 13 2016, 4:47 PM
rsmith edited edge metadata.

This basically looks fine to me now. I'm not 100% sold on sumUpStringLiteralOffset being the best name, but I think we have better things to worry about, and it's good enough. Just a couple of minor style issues then I think this is good to go.

lib/Sema/SemaChecking.cpp
3866–3867 ↗(On Diff #71199)

Rather than assert(false && XXX);, use either llvm_unreachable(XXX) or change the previous case to be:

else {
  assert(AddendIsRight && BinOpKind == BO_Sub &&
         "operator must be ...");
4150–4151 ↗(On Diff #71199)

You can write this more simply as

FormatStringLiteral FStr(StrE, Offset.sextOrTrunc(64).getSExtValue());
This revision is now accepted and ready to land.Sep 13 2016, 4:47 PM
meikeb updated this revision to Diff 71284.Sep 13 2016, 6:18 PM
meikeb edited edge metadata.

Try to improve offset sum helper function name and fix style issues.

meikeb marked 2 inline comments as done.Sep 13 2016, 6:20 PM
rsmith added inline comments.Sep 13 2016, 6:25 PM
lib/Sema/SemaChecking.cpp
3864–3867 ↗(On Diff #71284)

The suggestion was to remove the condition in the else if and put the assertion inside its body, rather than duplicating it here:

if (BinOpKind == BO_Add)
  // handle add
else {
  assert(it's a subtract);
  // handle sub
}
meikeb updated this revision to Diff 71286.Sep 13 2016, 6:44 PM

Correct assert position in offset sum helper function.

meikeb marked an inline comment as done.Sep 13 2016, 6:45 PM
meikeb added inline comments.
lib/Sema/SemaChecking.cpp
3864–3867 ↗(On Diff #71286)

This is way better. Thank you.

meikeb marked an inline comment as done.Sep 14 2016, 12:56 PM

Thanks for reviewing my patch! It would be great if someone could submit this patch for me.

I will take care of submitting this. Thanks for the reviews everybody!

This revision was automatically updated to reflect the committed changes.
meikeb updated this object.Sep 14 2016, 2:33 PM