This is an archive of the discontinued LLVM Phabricator instance.

[ProfileData] Sort ProfilingData by hash
AbandonedPublic

Authored by Hahnfeld on Feb 28 2019, 8:39 AM.

Details

Summary

This is expected by instr-remap.test and currently fails when building
LLVM with reverse-iteration: Use std::map instead of SmallDenseMap
which has no guarantees about ordering.

Diff Detail

Event Timeline

Hahnfeld created this revision.Feb 28 2019, 8:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2019, 8:39 AM
davidxl accepted this revision.Feb 28 2019, 8:50 AM

lgtm

This revision is now accepted and ready to land.Feb 28 2019, 8:50 AM

This does look better than the D58385.
But i wonder if this is too strict and is notably slower?
Have you considered only imposing the ordering/sorting where it matters,
i.e. i think where FunctionData is used in InstrProfWriter::writeImpl() and InstrProfWriter::writeText()?

The number of functions with name conflicts (and requires) using secondary map is very few, so the performance should not be affected.

Have you considered only imposing the ordering/sorting where it matters,
i.e. i think where FunctionData is used in InstrProfWriter::writeImpl() and InstrProfWriter::writeText()?

No. But AFAICS you cannot sort a SmallDenseMap, so we'd need to create another std::map just for sorting?

The secondary map 'ProfileData' is needed occasionally (to use cfg hash). doing sorting with std::map is not needed strictly speaking, but it has the nice property of having fixed order.

The secondary map 'ProfileData' is needed occasionally (to use cfg hash). doing sorting with std::map is not needed strictly speaking, but it has the nice property of having fixed order.

Yeah I know, the other approach is making the test resilient against such changes in ordering: D58385

Sorry, I just saw this now. I had a patch to fix this issue (but I didn't get time to follow-up on it). See D57986.

instr-remap.test is failing because FunctionData is iterated in InstrProfWriter to output function counts, etc. But for two functions with the same name the iteration order is not defined if we use a DenseMap. One way to resolve this is to use an ordered container like MapVector.

Another way to resolve this is to sort FunctionData before iteration. In fact, that is what the reviwers of D57986 suggested.

mgrang added inline comments.Feb 28 2019, 10:38 AM
llvm/include/llvm/ProfileData/InstrProfWriter.h
36

I think llvm prefers not to rely on external/STL containers. I would use a MapVector here.

Hahnfeld marked 2 inline comments as done.Feb 28 2019, 10:49 AM

I don't care how this is solved, but it needs to be solved! And it's not getting easier with everyone saying something different

llvm/include/llvm/ProfileData/InstrProfWriter.h
36
$ git grep "std::map" -- llvm/include/ llvm/lib/ | wc -l
426

How about we keep the discussion on the original patch (D57986)? I think we explored the problem space a fair bit there & I'll advocate for the same thing here as I did there: Sorting before emission is likely more efficient than keeping a continuously sorted container, if there's a separate "build" and "iterate" phase you can sort between. At least that's my understanding. std::map and MapVector would both grow memory usage, probably not prohibitively, but a fair bit.

Especially if these are just small clusters of collisions - they can be put in a small container, sorted, emitted, then thrown away - better than changing the whole long-lived/larger data structure.

How about we keep the discussion on the original patch (D57986)? I think we explored the problem space a fair bit there & I'll advocate for the same thing here as I did there: Sorting before emission is likely more efficient than keeping a continuously sorted container, if there's a separate "build" and "iterate" phase you can sort between. At least that's my understanding. std::map and MapVector would both grow memory usage, probably not prohibitively, but a fair bit.

Especially if these are just small clusters of collisions - they can be put in a small container, sorted, emitted, then thrown away - better than changing the whole long-lived/larger data structure.

I have added sort-before-iteration for FuncData in D57986.

Hahnfeld abandoned this revision.Mar 1 2019, 1:20 AM
Hahnfeld marked an inline comment as done.