This is an archive of the discontinued LLVM Phabricator instance.

Relax constexpr rules to improve __builtin_object_size's accuracy
ClosedPublic

Authored by george.burgess.iv on Aug 19 2015, 3:32 PM.

Details

Summary

(Hoping the formatting works as one would expect)

Motivating examples:
Pre-patch:

__builtin_object_size((char*)&foo, 0) != __builtin_object_size(&foo, 0) // if __builtin_object_size(&foo, 0) != -1
__builtin_object_size(&foo[1].bar[-1].baz, 1) == -1. // Always.

Post-patch:
Both act as one would expect. This was accomplished by making three changes:

  • Adding a flag to PointerExprEvaluator that makes it more accepting of reinterpret_casts.
  • Making array index/pointer offset less coupled in PointerExprEvaluator (we now carry around an extra Offset field that denotes how far we are away from an object boundary).
  • Adding an OutermostMemberEvaluator that ignores foo[1].bar[-1] in foo[1].bar[-1].baz, and is more relaxed with casts/pointer arithmetic/etc. (Not 100% sold on the name)

Diff Detail

Event Timeline

george.burgess.iv retitled this revision from to Relax constexpr rules to improve __builtin_object_size's accuracy.
george.burgess.iv updated this object.
george.burgess.iv added a reviewer: rsmith.
george.burgess.iv added a subscriber: cfe-commits.
vsk added a subscriber: vsk.Aug 19 2015, 4:19 PM

Just a minor nitpick.

lib/AST/ExprConstant.cpp
51

Do you need <iostream>?

Removed advanced debugging tools (iostream)

lib/AST/ExprConstant.cpp
51

Artifact of print debugging. Thanks for catching that :)

rsmith added inline comments.Aug 20 2015, 5:11 PM
lib/AST/ExprConstant.cpp
4763

This should be handled as an EvaluationMode.

4879–4907

We should not do this except in your special new mode, and I'm not entirely convinced we should support this case at all. Just discarding the designator (as we would do anyway in constant-folding mode) seems like an acceptable response here. We really don't know what subobject is being referenced any more.

6355–6356

I think handling this through the normal pointer evaluator, but with a different EvaluationMode, is a better approach. (We need some representation for an LValueBase to represent "unknown base", but everything else should fall through pretty naturally.)

lib/AST/ExprConstant.cpp
4763

Works for me.

4879–4907

We need some way to support BOS((char*)&Foo.Bar + 1, N), so this needs to go somewhere. I originally had it in its own method that was super similar to this one, and the duplication made me sad. Will re-duplicate and see how that looks.

FYI, if you'll look at the diff at L194 in p2-0x.cpp, we have an interesting feature in the non-patched impl where B *b = (B*)((A*)&b + 1) is at an offset of sizeof(A), but Designator says it's at b[1]. I assumed this behavior is a bug, and this kinda-fixes it. If it's intended behavior, I'll absolutely find a better way to handle this.

6355–6356

Sounds great to me -- I thought another visitor seemed heavyweight, but didn't consider adding another EvaluationMode. Will look into swapping approaches.

Mostly redid the patch; removed OutermostExprEvaluator, moved evaluation intent into EvaluationMode, made it so the more interesting rulebreaking that __builtin_object_size allows is only allowed when evaluating __builtin_object_size.

Not entirely sold on adding 4 EvaluationModes, but we need to know two bits of information, both of which impact how we evaluate things:

  • Whether we're evaluating an actual ConstantExpression or a PotentialConstantExpression (if we always assume the former, we get memory corruption by reaching into CallStackFrame::Arguments sometimes; if we assume the latter, attribute(enable_if) breaks).
  • If we're allowed to assume that the base of a MemberExpr is valid.

Also, I'm not 100% on calling LValue.set(MaybeAnRValue) if we are assuming that the base of a MemberExpr is valid. However, in such a case, I would think that it's safe to assume that the caller doesn't care about the base. Plus, it's In The Documentation(TM) that you shouldn't rely on Result.getLValueBase() being a valid base when the EvalMode would allow this assumption to be made.

Talked with Richard, and we both agree that adding 4 EvaluationModes is too much for the use case. So, we decided to add a flag to LValue to denote that the LValueBase is invalid. This allows us to get by with just 2 new EvaluationModes, which is much more acceptable.

LValueBases can only be invalid if you're using one of the shiny new EvaluationModes.

rsmith added inline comments.Aug 31 2015, 7:47 PM
lib/AST/ExprConstant.cpp
479

constexpr is a keyword and an adjective. The noun phrase you want here is "constant expressions".

498–502

I think you should only have one mode here, and it should not be a "checking potential constant expression" mode. (Then, if evaluation of a __builtin_object_size call fails from a "checking potential constant expression" mode, we should treat it as a potential success -- that is, return false with no diagnostic.)

It looks like the only place this would go wrong is the assertion in CheckLValueConstantExpression, but you can update that assertion to also allow this one case.

866

I'm not convinced this is a good idea. There are two cases:

  1. We're looking for the complete object (type is 0 or 2). We don't need this change: the right thing to do is discard the subobject designator and just track the offset, which is what we'd do anyway and all we need.
  1. We're looking for the specified subobject (type is 1 or 3). This change seems wrong: we no longer know which the specified subobject is, because the code computed an offset that left the subobject we thought we were in.
1983–1986

Same comment as for the previous case: I don't think this is a good idea. We can track the offset in this case, but we can't reasonably keep the designator valid.

4003–4008

This doesn't make much sense in ExprEvaluatorBase. I think you should only need the change to LValueExprEvaluatorBase::VisitMemberExpr -- it's only lvalue member expressions that should get this behavior.

4626–4632

Why do you hate comments? :)

4879–4907

We need some way to support BOS((char*)&Foo.Bar + 1, N)

Why? This seems like a pretty canonical example of "we do not know which subobject is being referenced any more", and thus we should give up for types 1 and 3 here.

4879–4907

[...] we have an interesting feature in the non-patched impl where B *b = (B*)((A*)&b + 1) is at an offset of sizeof(A), but Designator says it's at b[1]

That certainly seems like a bug. We should have marked the designator as invalid at the pointer arithmetic because it wasn't a pointer to the most derived type. (Are you sure it wasn't your changes to adjustIndex that caused this?)

4980–4987

You need to set the designator to invalid here, or you break the invariant of the LValue and further calculations on it will result in wrong offsets (or maybe assertions). As with the other two cases, I don't think it's reasonable to claim we can track the subobject once this has happened.

6299

Lowercase first letter for function names is preferred in new code.

6331–6336

Not really related to the current patch, but this check seems wrong for the (Type & 1) == 0 case: if the designator is invalid, we don't care what MostDerivedType is, we only care about the type of the Base and the BaseOffset (and now, having a valid base).

6403

We're supposed to return the number of usable bytes; I don't see why we should care if we can't fit an integral number of the pointee type in those bytes.

george.burgess.iv marked 16 inline comments as done.

Addressed all feedback; backed out support for more questionable features (negative indices, non-object-boundary offsets, etc), added logic to ignore pointless casts, and added tests.

We're down to one new EvalMode (which is only used on Type=1|3).

rsmith accepted this revision.Sep 4 2015, 11:47 AM
rsmith edited edge metadata.

This essentially looks fine. Let me know if you want me to take another look once you've fixed the ignorePointerCastsAndParens bug with derived-to-base conversions, otherwise go ahead.

lib/AST/ExprConstant.cpp
499

Maybe rename to OffsetFold or DesignatorFold?

6263–6264

I don't think this is quite right: you should only skip past casts that don't change the pointer value. In particular, this check will step past derived-to-base pointer conversions, which may require an adjustment to the pointer.

This revision is now accepted and ready to land.Sep 4 2015, 11:47 AM
george.burgess.iv marked 2 inline comments as done.

r246877. Thanks for the review!

(Also: Forgot to submit comments with the most recent revision, so you're getting them all now. Sorry. :) )

lib/AST/ExprConstant.cpp
498–502

Migrated to EM_ConstantFold (if Type=0|2) or EM_ConstantExpressionConstantFold (if Type=1|3)

499

DesignatorFold it is -- thanks for the suggestions.

866

SGTM. Backed it out.

4626–4632

They were far too concise and up-to-date for my taste.

4879–4913

Why? This seems like a pretty canonical example of "we do not know which subobject is being referenced any more", and thus we should give up for types 1 and 3 here.

SGTM.

That certainly seems like a bug...

Discussed in person. Was a misunderstanding on my part.

6263–6264

Thanks for catching that -- I really need to get to know C++ better :)

Updated if (Cast == nullptr) to if (Cast == nullptr || Cast->getCastKind() == CK_DerivedToBase).

6299

Thanks for the heads-up :)

6331–6336

We now no longer directly deal with this case, so I'm closing this.

6403

Note: Code is gone, so this no longer applies. Original response below

The only reason we care is because of how we compute the end offset. Currently, we do EndOffset = SizeOfPointee * NumberOfArraySlotsUntilTheEnd. This works fine when we're on an object boundary, but when we're not, the array index = floor(OffsetFromArrayStart / SizeOfPointee). So, without this line, EndOffset = 11 in:

int16_t shorts[5];
__builtin_object_size((char*)&shorts + 1, 1);

I'm happy to phrase it differently in the code, but if we decide to allow off-object-boundary offsets, then we need to find some way to support it.

This commit seems to cause miscompile in LNT testsuite with -O0 and -O3
http://lab.llvm.org:8080/green/job/perf_o0g_run/7070/warnings2Result/new/
http://lab.llvm.org:8080/green/job/perf_o3lto_run/15591/warnings2Result/new/

Looks like there is an undefined in LNT testsuite, here is the relevant code for consumer-lame:

typedef struct                                                                  
{                                                                               
  int used;                                                                     
  int valid;                                                                    
  char title[31];                                                               
  char artist[31];                                                              
  char album[31];                                                               
  char year[5];                                                                 
  char comment[31];                                                             
  char tagtext[128];                                                            
  char genre[1];                                                                
  unsigned char track;                                                          
                                                                                
}   ID3TAGDATA; 
void id3_inittag(ID3TAGDATA *tag) {                                             
  ...
  strcpy( tag->genre, "ÿ"); /* unset genre */      
  ...                             
}

Here is the suggested change:

diff --git a/MultiSource/Benchmarks/MiBench/consumer-lame/id3tag.c b/MultiSource/Benchmarks/MiBench/consumer-lame/id3tag.c
index e24a966..23f2b86 100644
--- a/MultiSource/Benchmarks/MiBench/consumer-lame/id3tag.c
+++ b/MultiSource/Benchmarks/MiBench/consumer-lame/id3tag.c
@@ -34,7 +34,7 @@ void id3_inittag(ID3TAGDATA *tag) {
        strcpy( tag->album, "");
        strcpy( tag->year, "");    
        strcpy( tag->comment, "");
-       strcpy( tag->genre, "<FF>");    /* unset genre */
+       tag->genre[0] = '<FF>'; /* unset genre */
        tag->track = 0;
 
        tag->valid = 0;         /* not ready for writing*/

However, I have trouble understand the code in consumer-typeset. It crashes in z07.c:138
By the looks of it, the code is trying to copy the include path (specified on command line), which is setup to be "$PATH_TO_TEST_SUITE/MultiSource/Benchmarks/MiBench/consumer-typeset/data/include" into unsigned char ostring[4]. From the look of the code, it is trying to utilize all the size in OBJECT to store the entire string. I don't know if that is fixable. Let me know if you have any better idea of how to fix that.