Page MenuHomePhabricator

[OpenMP][InstrProfiling] Fix a missing instr profiling counter
ClosedPublic

Authored by lxfind on Mar 6 2021, 8:40 PM.

Details

Summary

When emitting a function body there needs to be a instr profiling counter emitted. Otherwise instr profiling won't work for this function.

Diff Detail

Event Timeline

lxfind created this revision.Mar 6 2021, 8:40 PM
lxfind requested review of this revision.Mar 6 2021, 8:40 PM
Herald added a project: Restricted Project. · View Herald Transcript

I know nearly nothing about OpenMP..

You need a %clang_cc1 -disable-llvm-passes test to demonstrate the codegen change (call void @llvm.instrprof.increment({{.*}})). Oh, it is unfortunate that there isn't sufficient llvm.instrprof.increment test under clang/test

lxfind updated this revision to Diff 328839.Mar 6 2021, 10:38 PM

add test

MaskRay accepted this revision.EditedMar 9 2021, 5:04 PM
MaskRay added a subscriber: vsk.

LG, but I hope an OpenMP expert can take a look. Perhaps @vsk can comment on where the tests should be improved (currently we hardly have any @llvm.instrprof.increment IR test for clang -fprofile-instr-generate).

clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
17

Nit: append a trailing ( so that it's clear the instruction is llvm.instrprof.increment, not llvm.instrprof.increment.step.

This revision is now accepted and ready to land.Mar 9 2021, 5:04 PM
lxfind updated this revision to Diff 329517.Mar 9 2021, 8:03 PM

address comment

Thanks. I am landing it.
But feel free to comment here if anything isn't right. @ABataev

There is a problem. We actually do not emit S here directly, instead, we use CodeGen lambdas, which may not be equal to S, in some cases S is nullptr here. It may result in not quite accurate results.

There is a problem. We actually do not emit S here directly, instead, we use CodeGen lambdas, which may not be equal to S, in some cases S is nullptr here. It may result in not quite accurate results.

Thanks for the note!
I don't really know anything about OMP though, not sure how to handle it.
Would you mind taking a look at this issue? Feel free to send a different patch!

There is a problem. We actually do not emit S here directly, instead, we use CodeGen lambdas, which may not be equal to S, in some cases S is nullptr here. It may result in not quite accurate results.

Thanks for the note!
I don't really know anything about OMP though, not sure how to handle it.
Would you mind taking a look at this issue? Feel free to send a different patch!

Could you create a PR for the problem so we could track it?

There is a problem. We actually do not emit S here directly, instead, we use CodeGen lambdas, which may not be equal to S, in some cases S is nullptr here. It may result in not quite accurate results.

Thanks for the note!
I don't really know anything about OMP though, not sure how to handle it.
Would you mind taking a look at this issue? Feel free to send a different patch!

Could you create a PR for the problem so we could track it?

https://bugs.llvm.org/show_bug.cgi?id=49521

vsk added a comment.Mar 10 2021, 12:00 PM

LG, but I hope an OpenMP expert can take a look. Perhaps @vsk can comment on where the tests should be improved (currently we hardly have any @llvm.instrprof.increment IR test for clang -fprofile-instr-generate).

This doesn't necessarily indicate a gap in test coverage. There are plenty of tests under clang/test/Profile that cover IR generation for profiling instrumentation.

lxfind abandoned this revision.Mar 25 2021, 10:23 AM

@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because of data inconsistency.

@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because of data inconsistency.

Cannot answer right now. It would be much easier to fix this if you could provide additional info about what tests are exactly failed, what are the constructs that do not support it, etc.

@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because of data inconsistency.

Cannot answer right now. It would be much easier to fix this if you could provide additional info about what tests are exactly failed, what are the constructs that do not support it, etc.

Yes the whole pipeline is a bit long and complex, so I don't have an exact repro in hand because it requires source code and run it.

But let me try to explain what happened in my observation. There are two sections that are related to this issue in the binary, the IPSK_covfun section that contains the function records, and the IPSK_name section that contains the list of all function names. The issue here is that some OMP functions that are found in the IPSK_covfun section are not found in the IPSK_name section.

The records in IPSK_covfun are generated like this:

Whenever CodeGenFunction is generating code for any function, it will first call the CodeGenFunction::GenerateCode() function, in which it will call PGO.assignRegionCounters(GD, CurFn);: https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenFunction.cpp#L1329

From there, it will call emitCounterRegionMapping:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenPGO.cpp#L819

which will then call: CGM.getCoverageMapping()->addFunctionMappingRecord:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenPGO.cpp#L890

which will eventually add this function to a FunctionRecords:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CoverageMappingGen.cpp#L1655

So all above is to show that every function will eventually be added to FunctionRecords, unless in the case where a function is explicitly marked as unused. The FunctionRecords will eventually be all written into the IPSK_covfun section in the binary.

The names in IPSK_name section are generated like this:
Within InstrProfiling.cpp (https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp), it collects all the function names referenced in all instrumentation counter increments instructions:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp#L920
Basically for every InstrProfIncrementInst, it adds the function name referenced in it to a list ReferencedNames.
Then in the end this ReferencedNames is written into the IPSK_name section.

Now you can see that for the OMP functions that don't have any counters, they are still added to FunctionRecords, but not added to ReferencedNames, because they are not referenced by any InstrProfIncrementInst.
During the running of llvm-cov, when reading the list of function records, it will attempt to look up the name of the function from the function name list:
https://github.com/llvm/llvm-project/blob/b9ff67099ad6da931976e66f1510c5af2558a86e/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L560

And it will not be able to find it for the OMP case, so it will return an error.

Overall this is very complex and a bit fragile to me. For instance, we probably could have detected the error much earlier during Instrumentation pass in LLVM, that some function records' names are not in the name list. Or we could simply construct the list of function names based on the function records. But currently these two are generated independently.
cc @MaskRay and @vsk, maybe they have thoughts on this.

@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because of data inconsistency.

Cannot answer right now. It would be much easier to fix this if you could provide additional info about what tests are exactly failed, what are the constructs that do not support it, etc.

Yes the whole pipeline is a bit long and complex, so I don't have an exact repro in hand because it requires source code and run it.

But let me try to explain what happened in my observation. There are two sections that are related to this issue in the binary, the IPSK_covfun section that contains the function records, and the IPSK_name section that contains the list of all function names. The issue here is that some OMP functions that are found in the IPSK_covfun section are not found in the IPSK_name section.

The records in IPSK_covfun are generated like this:

Whenever CodeGenFunction is generating code for any function, it will first call the CodeGenFunction::GenerateCode() function, in which it will call PGO.assignRegionCounters(GD, CurFn);: https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenFunction.cpp#L1329

From there, it will call emitCounterRegionMapping:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenPGO.cpp#L819

which will then call: CGM.getCoverageMapping()->addFunctionMappingRecord:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenPGO.cpp#L890

which will eventually add this function to a FunctionRecords:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CoverageMappingGen.cpp#L1655

So all above is to show that every function will eventually be added to FunctionRecords, unless in the case where a function is explicitly marked as unused. The FunctionRecords will eventually be all written into the IPSK_covfun section in the binary.

The names in IPSK_name section are generated like this:
Within InstrProfiling.cpp (https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp), it collects all the function names referenced in all instrumentation counter increments instructions:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp#L920
Basically for every InstrProfIncrementInst, it adds the function name referenced in it to a list ReferencedNames.
Then in the end this ReferencedNames is written into the IPSK_name section.

Now you can see that for the OMP functions that don't have any counters, they are still added to FunctionRecords, but not added to ReferencedNames, because they are not referenced by any InstrProfIncrementInst.
During the running of llvm-cov, when reading the list of function records, it will attempt to look up the name of the function from the function name list:
https://github.com/llvm/llvm-project/blob/b9ff67099ad6da931976e66f1510c5af2558a86e/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L560

And it will not be able to find it for the OMP case, so it will return an error.

Overall this is very complex and a bit fragile to me. For instance, we probably could have detected the error much earlier during Instrumentation pass in LLVM, that some function records' names are not in the name list. Or we could simply construct the list of function names based on the function records. But currently these two are generated independently.
cc @MaskRay and @vsk, maybe they have thoughts on this.

Ok, now I see. I think you can restore this patch, it should be enough to fix the problem. Just check that S is not nullptr before calling CGF.incrementProfileCounter(S);.

lxfind updated this revision to Diff 333417.Mar 25 2021, 1:50 PM

check null on S

This revision is now accepted and ready to land.Mar 25 2021, 1:50 PM
This revision was landed with ongoing or failed builds.Mar 25 2021, 1:52 PM
This revision was automatically updated to reflect the committed changes.