Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The error handling additions look good to me, but I don't have the background to lgtm InstrProfWriter::validateRecord. cc @xur.
I'm not sure llvm-profdata is good place to fix. I would rather to fix this in the memop size transformation -- to detect this in the heuristic.
At the same, I'm interested to know what is the cause of this. Should we do something in profile runtime to prevent this.
I would rather to fix this in the memop size transformation -- to detect this in the heuristic.
I think it would be valuable to guard against this pattern within the transform, but I'm concerned that putting the error in the pass would make it harder to debug this issue. If we detect this in llvm-profdata, the user is more likely to have/provide the profraw files causing this behavior.
Should we do something in profile runtime to prevent this.
Do you think llvm-profdata could be causing this issue? I was thinking that profdata might be merging two profiles incorrectly. Either way, having the profraw files would be valuable for debugging...
Sorry for the delay.
I replied to the comments on Dec 17 via email. But it did not come through. I don't know the reason. I put my comments here.
I think it would be valuable to guard against this pattern within the transform, but I'm concerned that putting the error in the pass would make it harder to debug this issue. If we detect this in llvm-profdata, the user is more likely to have/provide the profraw files causing this behavior.
I don't think putting this into transformation will make it more difficult to debug. There are ready many legality heuristics there.
To me, the merge operation should be a straightforward translation from raw format to index format. We should not put filters there.
A warning is OK but not a filter of profile records.
Should we do something in profile runtime to prevent this.
Do you think llvm-profdata could be causing this issue? I was thinking that profdata might be merging two profiles incorrectly. Either way, having the profraw files would be valuable for debugging...
There are chances of llvm-prodate bugs but more chances in the compiler-rt runtime.
Providing the raw profile and the index profile will be very helpful for the debuging.
I agree with Rong here -- the semantic related error checking belongs to the consumer/client, not in the tool.
In profile use pass, the error message can be more meaningful. On the other hand, finding the root cause of it will be more valuable.
- Add in-pass profile data check
- Change error to warning in llvm-profdata
- Update tests
I think the patch is going the right direction. Some minor comments:
(1) If we have two entries with the same value (other than 0), will we have the same issue? I think we will have same invalid switch statement. So can we extend this patch to handle this? That should be a small change.
(2) can we change "invalid merge" to "invalid profile"? This does not look like a merge issue -- at least from the test case where the problem is from the raw profile. This points to profile runtime issue. I think invalid profile is more appropriate here.
- Update in-pass check to detect all duplicates
- Remove duplicate from MemOp pass test case
- Change warning to "Invalid profile"
llvm/lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp | ||
---|---|---|
342 | This initial value mean leads to false warning. Use -1. |
The transformation part looks good to me.
Func validateRecord() still checks entries with 0 size only.
But it's OK, as the transformation will catch bad profiles.
I'll approve this patch. Sorry for taking so long.
This initial value mean leads to false warning. Use -1.