The warning for a format string not being a string literal and therefore
being potentially insecure is overly strict for indices into string
literals. This fix checks if the index into the string literal is
precomputable. If that's the case it will check if the suffix of that
string literal is a valid format string string literal. It will still
issue the aforementioned warning for out of range indices into the
string literal.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3856 ↗ | (On Diff #69057) | I'm not sure if that should be mentioned here because it is a very high level comment and the suffix of a string literal is a string literal itself. |
Please also handle expressions of the form &"str"[i]. We warn (under -Wstring-plus-int) for "str" + i and suggest rewriting into that form, so we should also handle that form here.
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3864 ↗ | (On Diff #69057) | Why is this passed by reference? It's just an input, not an output, right? |
4062–4067 ↗ | (On Diff #69057) | This doesn't seem like it preserves enough information for the downstream code to give correct caret diagnostics pointing at locations within the string. It seems like it would be extremely complicated to maintain the necessary invariants to make that work (you'd need to create a fake string literal source buffer so that the StringLiteralParser can reparse it, for whichever of the string literal tokens the offset ends up within). Have you looked at how much work it'd be to feed a starting offset into CheckFormatString instead? |
4089–4090 ↗ | (On Diff #69057) | What happens if one of these expressions is value-dependent? The evaluator can crash or assert if given a value-dependent expression. If we don't defer these checks in dependent contexts, you'll need to handle that possibility somehow. Example: template<int N> void f(const char *p) { printf("blah blah %s" + N, p); } |
4097–4100 ↗ | (On Diff #69057) | This will assert if the result doesn't fit into 64 bits, and it's not guaranteed to (if one of the operands was cast to __int128, for instance). You could use getLimitedValue instead, with some suitable limit. |
lib/Sema/SemaChecking.cpp | ||
---|---|---|
4089–4090 ↗ | (On Diff #69057) | I think I don't understand what you are trying to tell me. Especially the example you provided does just fine and behaves as I expected. As far as I followed EvaluateAsInt it does not assert but returns false if we don't get a constexpr here. We warn under -Wformat-nonliteral for value-dependent string literals. Could you explain this more or provide an example that triggers an assert or explain what behavior is wrong regarding the provided example? Thanks! |
lib/Sema/SemaChecking.cpp | ||
---|---|---|
4089–4090 ↗ | (On Diff #69057) | We should not warn for that example, since (for instance) calling f<0> is fine (we should warn for f<11>, though, since it has no format specifiers). While EvaluateAsInt happens to not assert for that particular value-dependent input, it does assert for some other value-dependent cases. It's not easy for me to find you such a case, because Clang is currently careful to never call this function on a value-dependent expression, but perhaps this will trigger an assert: struct S { constexpr S(int n) : n(n) {} int n; }; template<int N> void f(const char *p) { printf("blah blah %s" + S(N).n, p); } |
lib/Sema/SemaChecking.cpp | ||
---|---|---|
4166–4167 ↗ | (On Diff #71043) | I hope this additional check fixes this issue. As far as I read the code there were none such asserts in isIntegerConstantExpr(). Thanks for explaining this! |
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3842–3843 ↗ | (On Diff #71043) | I can't tell from this declaration what this function is for -- what does "reckon up" mean? |
3880 ↗ | (On Diff #71043) | Typo "retruning" |
3891–3892 ↗ | (On Diff #71043) | I think you can simplify this to return FExpr->getString().drop_front(Offset); |
4143 ↗ | (On Diff #71043) | You should presumably also do this if Offset is >= the length of the string literal (we want printf and friends to at least find the trailing nul byte). |
4148–4149 ↗ | (On Diff #71043) | Does this need to be heap-allocated? |
4166–4167 ↗ | (On Diff #71043) | OK, I've now checked and the value-dependent case is handled up on line 3953. So just calling EvaluateAsInt here is fine after all. |
Thanks for taking the time and doing these great reviews! Really appreciated!
lib/Sema/SemaChecking.cpp | ||
---|---|---|
4143–4150 ↗ | (On Diff #71090) | The = case is part of a different warning. It's checked in CheckFormatString. |
My comment is mostly naming considerations to improve clarity. I do have concerns though about the unhandled else path.
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3842 ↗ | (On Diff #71090) | Is "computeStringLiteralOffset" or "calculate..." a better name here? |
3844 ↗ | (On Diff #71090) | Is "Operand" better than "Addend"? In particular, there is the possibility that we do subtraction of the value instead of addition, so "Addend" makes it a bit confusing. Of course, I then would expect "OperandIsRight" instead of "AddendIsRight" too. |
3852 ↗ | (On Diff #71090) | Align -> Canonicalize or Adjust This is confusing here as "Align" already has too many overloaded meanings in programming (that could be relevant to bitwidths). Canonicalize or Adjust don't have this problem. |
3860 ↗ | (On Diff #71090) | Ov -> Overflow |
3865 ↗ | (On Diff #71090) | What happens if someone passes something that isn't caught by these two cases? Should we be returning an indicator that the calculation failed? If not, should we assert here? |
3867 ↗ | (On Diff #71090) | 2 places to fix: "a offset" -> "an offset" |
I explained why I chose the names that you commented on. Feel free to add your thoughts if you still think another name would be more fitting.
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3842 ↗ | (On Diff #71090) | I thought about that but decided to go with sumUp because compute or calculate sounds like this function would do what we actually do what the caller of this function does (computing the offset). This is just a nice helper to sum up the offset we already have with another piece of offset. |
3844 ↗ | (On Diff #71090) | Clang summarizes sub and add as "additive" operands. This is why I think this is fitting. Operand is misleading because it includes a lot more operands than add and sub imo. |
3860 ↗ | (On Diff #71090) | I named that in compliance with clang naming. E.g. sadd_ov. It is common in this file to abbreviate variable names with 1-3 characters. |
This basically looks fine to me now. I'm not 100% sold on sumUpStringLiteralOffset being the best name, but I think we have better things to worry about, and it's good enough. Just a couple of minor style issues then I think this is good to go.
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3866–3867 ↗ | (On Diff #71199) | Rather than assert(false && XXX);, use either llvm_unreachable(XXX) or change the previous case to be: else { assert(AddendIsRight && BinOpKind == BO_Sub && "operator must be ..."); |
4150–4151 ↗ | (On Diff #71199) | You can write this more simply as FormatStringLiteral FStr(StrE, Offset.sextOrTrunc(64).getSExtValue()); |
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3864–3867 ↗ | (On Diff #71284) | The suggestion was to remove the condition in the else if and put the assertion inside its body, rather than duplicating it here: if (BinOpKind == BO_Add) // handle add else { assert(it's a subtract); // handle sub } |
lib/Sema/SemaChecking.cpp | ||
---|---|---|
3864–3867 ↗ | (On Diff #71286) | This is way better. Thank you. |
Thanks for reviewing my patch! It would be great if someone could submit this patch for me.