It is 'on' by default.
Diff Detail
Event Timeline
please make it false by default since the phase is not really working (compilation times out)
Can we actually add an equivalent clang debug tuning option called "-fvar-tracking" similar to the gcc one here as well?
-eric
I kept it 'on' as it works in most of the cases. I will try to submit a patch by EoD that fixes the issue.
I noticed with a.ll, the test case from https://llvm.org/bugs/show_bug.cgi?id=26055, that there are too many DEBUG_VALUE instructions getting inserted. Statistics showed >6L instructions getting inserted:
616584 live-debug-values - Number of DBG_VALUE instructions inserted
One reason is the presence of too many basic blocks within the function. With so many blocks, the convergence takes really long time. I thought of ignoring functions with more basic blocks. But instead, I have added a simple heuristic to stop convergence after certain number of iterations. Currently the max iterations is arbitrary (chosen to be 10).
Note that test case in https://llvm.org/bugs/show_bug.cgi?id=26055 is produced from auto-generated code, so its BB structure should be very regular, and you can come up with rather precise estimate of how many passes you will need to converge. Does current bound (10) make compile time reasonable for that case?
-fvar-tracking: yes, please! It would be nice to be able to toggle this pass from frontend.
I have a number of comments regarding the code as well, but I will rather see quick fixes (or revert) committed ASAP: this is hurting us pretty badly, as we can't build several projects with optimization and debug info enabled.
Okay, i'm confused.
(Note: I helped write var-tracking for GCC).
Let's ignore the inefficient data structure usage here (ie appending lists instead of using constant time splices, using lists instead of hash maps, etc).
The reason your convergence is slow, is, AFAICT, because you are using a random basic block order for processing the dataflow problem.
You should be walking blocks in reverse postorder, which is the optimal order for this problem, and will significantly speed up convergence.
I'm also going to take a pass at writing comments on the algorithmic issues here.
lib/CodeGen/LiveDebugValues.cpp | ||
---|---|---|
299 | Can you please explain, precisely, what this accomplishes? | |
323 | This looks like another N^2 part. | |
390 | Please use a SmallPtrSet<BasicBlock *, 16> Visited to to track whether you visited a block or not, don't go searching through the worklist. |
lib/CodeGen/LiveDebugValues.cpp | ||
---|---|---|
83 | There is no reason for this to be a global variable. | |
85 | Ditto. | |
374 | You really need to be iterating in reverse postorder. One trivial way to do that is to use the ordering and two worklists. std::priority_queue <unsigned int> worklist; std::priority_queue <unsigned int> pending; DenseMap<MachineBasicBlock *, unsigned int> BBToOrder; DenseMap<unsigned int, MachineBasicBlock *> OrderToBB ReversePostOrderTraversal<MachineBasicBlock *> RPOT(entryblock); unsigned int RPONumber = 0; for (auto I = RPOT.begin(), auto E = RPOT.end(); I != E; ++I) { BBToOrder[*I] = RPONumber; OrderToBB[RPONumber] = *I; worklist.push_back(RPONumber); ++RPONumber; } while (changed) { <go through worklist, use OrderToBB[I] to get BB from worklist> put necessary successors on pending using BBToOrder. std::swap(worklist, pending) Clear pending. } |
Thanks @dberlin for the suggestions and working on it (http://reviews.llvm.org/D16039)
There is no reason for this to be a global variable.
Please just make your transfer functions return bools and | them together.