Fix for the problem shown in D135459.
Repository: rG LLVM Github Monorepo

Event Timeline
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

526: Avoid C-style casts. It should work as `hash_value(ArrayRef<ArgInfo>(Info.Args));`

528: This is not correct: if the hash values clash, two different specialisations will be regarded as the same and one will be discarded.

529: Discarding duplicates should happen before we decide which of the possible specialisations to perform. For example, in `calculateGains` the `MaxClonedThreshold` limit is applied to a list which potentially contains duplicates.

534: This looks like a big redundancy: we traverse all the call sites of F for each specialisation, while all we need is to traverse all the call sites of F once and redirect each call site to the correct specialisation.

539: So the idea here is, when specialising on the second parameter only, to turn

    void g(int x, int y) { ... g(x, y); ... }
    ...
    g(1, 2);

into

    void g.1(int x, /* unused */ int y) { ... g.1(x, 2); ... }
    ...
    g.1(1, 2);

But the same ought to happen for:

    void g(int x, int y) { ... g(x, 2); ... }
    ...
    g(1, 2);

and it looks to me the test will miss it, because it will compare `2` (from the call argument) to `y` in the cloned function. (Similar issue in the original code as well.)
llvm/include/llvm/Transforms/Utils/SCCPSolver.h

54: I don't really see the point in creating the temporary pair. `return hash_combine(Info.Formal, Info.Actual)`
This adds just a 0.03% geomean overhead in terms of instruction count for CTMark (llvm-test-suite, O3 + NewPM) over the parent revision D126455 (with specialization of functions enabled).
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

82 ↗ (On Diff #475864): This has to be a DenseSet. std::set has logarithmic complexity of operations.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

408–409: Here evaluation of the gain occurs even if the specialization is a duplicate.

726–727: (nit) This function returns `Clone` twice, once as a return value and once as an out argument. A cleaner design would be to just return the value, as the function does not depend on the `Clone` member variable and need not know anything about it.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

82 ↗ (On Diff #475864): DenseSet is more expensive to iterate over. I have another patch which sorts the specializations globally (across the module, not per function), where I keep the specializations sorted in a std::multiset and use std::merge to unite the two sets. That is relatively cheap, as it does not copy objects but shuffles pointers around. Also, that multiset is iterated a lot more than the set I am adding here. The complexity is logarithmic in the number of call sites, still not a huge number in my opinion.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

408–409: This is already happening, but good point, there's room for improvement.

726–727: I cached the Clone pointer in the SpecInfo for the parent revision D126455, so that updateCallSites() does not rely on two vectors (clones, specializations) being in sync. So you are suggesting not to set the member variable here, but after returning from this function?
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

82 ↗ (On Diff #475864): This should be a comparison function which induces strict weak ordering on the elements.
llvm/lib/Transforms/IPO/SCCP.cpp

46 ↗ (On Diff #481216): oops