This is an archive of the discontinued LLVM Phabricator instance.

[esan|cfrag] Add counters for struct array accesses
ClosedPublic

Authored by zhaoqin on Jun 21 2016, 8:26 PM.

Details

Summary

Adds one counter to the struct counter array for counting struct
array accesses.

Adds instrumentation to insert counter update for struct array
accesses.

Diff Detail

Repository
rL LLVM

Event Timeline

zhaoqin updated this revision to Diff 61493.Jun 21 2016, 8:26 PM
zhaoqin retitled this revision from to [esan|cfrag] Add counters for struct array accesses.
zhaoqin updated this object.
zhaoqin added a reviewer: aizatsky.
zhaoqin added subscribers: vitalybuka, zhaoqin, kcc and 3 others.
zhaoqin updated this revision to Diff 61499.Jun 21 2016, 8:53 PM

update comment to explain why we append the counter array.

zhaoqin updated this revision to Diff 61558.Jun 22 2016, 8:42 AM

Rename FieldCounters to Counters

aizatsky requested changes to this revision.Jun 23 2016, 2:52 PM
aizatsky edited edge metadata.
aizatsky added inline comments.
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
174 ↗(On Diff #61558)

It is not clear to me what you are getting by appending counter to the end rather than adding a fixed field to the struct. Wouldn't the code become simpler?

This revision now requires changes to proceed.Jun 23 2016, 2:52 PM
zhaoqin added inline comments.Jun 23 2016, 6:42 PM
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
174 ↗(On Diff #61558)

As I explained in the comment below, I was trying to avoid creating additional variable name for the new counter, so reduce link time, which deduplicates/merges counters based on their names.

zhaoqin added inline comments.Jun 24 2016, 9:20 AM
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
174 ↗(On Diff #61558)

To be clear, the struct instance is created for each compilation/translation unit (e.g., x264.cc).
The counter (and all the struct field counters) is for each structure (e.g., struct x264_t). We rely on special name and weak linkage to merge the counters of the same struct from different CUs at link time.
I could create separate counter with special name for each case, but I think an array of counter seems more reasonable, it is also easy to extend without breaking the binary compatibility in the future.
I will try to make the code more clear on that.

dberlin added inline comments.
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
738 ↗(On Diff #61558)

So it looks like you just ignore any negative indices?
(but you don't mark them ignored)

Note that both

 %a = getelementptr inbounds {[2 x i8], [2 x i8]}, {[2 x i8], [2 x i8]}* %p, i32 0, i32 0, i32 1
%b = getelementptr inbounds {[2 x i8], [2 x i8]}, {[2 x i8], [2 x i8]}* %p, i32 0, i32 1, i32 -1

are valid :)

zhaoqin updated this object.Jul 1 2016, 8:26 AM
zhaoqin edited edge metadata.
zhaoqin marked an inline comment as done.Jul 1 2016, 8:29 AM
zhaoqin added inline comments.
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
738 ↗(On Diff #61558)

As you can see, the Idx here is to get from a struct field, if (isa<StructType>(Ty))
I do not think a negative index for struct field indexing is valid.

zhaoqin marked an inline comment as done.Jul 1 2016, 8:29 AM
zhaoqin updated this revision to Diff 62496.Jul 1 2016, 8:30 AM
zhaoqin edited edge metadata.

refact StructInfo

I do not think a negative index for struct field indexing is valid.

You are incorrect :)

FWIW, this is a common mistake.
Please remember that LLVM level structs and GEP do not have the same rules
and pretty much no relation to C-level structs.

GEP is just an indexing operation. Unlike C, i can form pointers to
wherever i want, and even if i mark them inbounds, your only guarantee is
that they are in the object somewhere, you have no guarantee on whether i
use positive or negative gep indices to generate those pointers and either
way is just as valid.
As mentioned, there are plenty of cases frontends will generate negative
gep indices for normal source level field accesses of C-level structs.

zhaoqin updated this revision to Diff 62513.Jul 1 2016, 11:06 AM
zhaoqin edited edge metadata.

use getSExtValue instead of getZExtValue for GetElementPtr instr.

I do not think a negative index for struct field indexing is valid.

You are incorrect :)

I see, thanks! Update CL.

FWIW, this is a common mistake.
Please remember that LLVM level structs and GEP do not have the same rules
and pretty much no relation to C-level structs.

GEP is just an indexing operation. Unlike C, i can form pointers to
wherever i want, and even if i mark them inbounds, your only guarantee is
that they are in the object somewhere, you have no guarantee on whether i
use positive or negative gep indices to generate those pointers and either
way is just as valid.
As mentioned, there are plenty of cases frontends will generate negative
gep indices for normal source level field accesses of C-level structs.

I understand that GEP is just an indexing operation, which I treat it like LEA instr in x86.
One thing I do not understand is how the negative index would work for a struct type.
For array, because we know the element size, we can use positive or negative indexing to calculate an address/offset from the base.
However for struct, if the index is not within the struct field range, i.e., negative or too large, how does the compiler know how the offset/address is calculated?
For example, if the index is -1, how does the compiler decide the first negative struct field size?
I would expect the compiler will generate an error on fail to compute the offset.
Will there be a default size if no field size information available for the out of range index?

You are right. What will happen is it will not look like a structure field
access, and will use i8 or something so it can do the calculation.
I'll paste a clang generated example in a bit

aizatsky accepted this revision.Jul 1 2016, 2:44 PM
aizatsky edited edge metadata.

Much simpler now. Thanks.

LG modulo Daniel's comments.

This revision is now accepted and ready to land.Jul 1 2016, 2:44 PM

You are right. What will happen is it will not look like a structure field
access, and will use i8 or something so it can do the calculation.
I'll paste a clang generated example in a bit

I just tried a manual test example as below:

%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }

define i32* @foo(%struct.ST* %s) nounwind uwtable readnone optsize ssp {
entry:

%arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 0, i32 -1
ret i32* %arrayidx

}

which I tried gep on struct.ST with a negative field.
And I get the error that invalid getelementptr indices:

$ /usr/local/google/home/zhaoqin/Workspace/LLVM/builds/build_x64_rel.git/./bin/opt < /usr/local/google/home/zhaoqin/Workspace/LLVM/llvm.git/test/Instrumentation/EfficiencySanitizer/struct_field_gep.ll -esan -esan-cache-frag -S
/usr/local/google/home/zhaoqin/Workspace/LLVM/builds/build_x64_rel.git/./bin/opt: <stdin>:28:50: error: invalid getelementptr indices

%arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 0, i32 -1
                                                                               ^

Even without esan instrumentation, the same error.

/usr/local/google/home/zhaoqin/Workspace/LLVM/builds/build_x64_rel.git/./bin/opt < /usr/local/google/home/zhaoqin/Workspace/LLVM/llvm.git/test/Instrumentation/EfficiencySanitizer/struct_field_gep.ll -S/usr/local/google/home/zhaoqin/Workspace/LLVM/builds/build_x64_rel.git/./bin/opt: <stdin>:28:50: error: invalid getelementptr indices

%arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 0, i32 -1
                                               ^
This revision was automatically updated to reflect the committed changes.