User Details
- User Since
- Aug 13 2013, 3:10 PM (388 w, 2 d)
Oct 29 2019
Dec 14 2018
Dec 6 2018
I'm not too familiar with the history of OptimizePHIs but some things feel odd to me here:
- The comments in the pass make it seem like it is designed to catch dataflow around loops, yet the motivation for your changes appears to be simpler phis without loops involved.
- I keep wondering why we bother with COPYs here at all. If the COPYs are really trivial COPYs between the same register classes then we really should just remove them when we are still in MachineSSA. Do you know how they are created? Maybe you can simply stop their creation instead?
- Skipping COPYs like this is also a bit inconsequential and will miss several cases: You could have more than 1 COPY in a row or switch between register classes. The PeepholeOptimizer pass already handles these more complex cases, so maybe we rather should look for running OptimizePHIs after the Peephole Optimizer so those COPYs are optimized? (Not sure though if there are other negative effects of running OptimizePHIs later or PeepholeOpt earlier).
Dec 2 2018
Seems useful. Please implement consistently for all stop/start args and document it somewhere.
- Address review feedback
I'm currently less active in LLVM and will not have the time to review this in depth. Some comments:
I certainly was always annoyed that bugpoint wouldn't pick up the tools in the same build directory forcing me to manually specify paths to the tools.
However unfortunately this isn't going far enough and will still search the tools in PATH first, which can still be annoying as in my experience you can often have llvm developer tools installed by your distribution/package managers etc. in your PATH.
Nov 9 2018
Some notes on what is left here:
- Change allocation direction to go from end to begin of a basic block.
- Change debug value handling: The tricky thing with the changed direction is that we dealing with dangling DBG_VALUE instructions (they have a vreg use after the last non-debug use of a vreg) gets trickier. They are rare but occur in several unittests so we now track them separately in the code and perform a forward walk each time after the vreg is allocation by the real last use to see whether the register reached the dangling DBG_VALUE.
- This also simplifies DBG_VALUE handling, to always referenvce stack slots in case of spills as we now place spills immediately adjacent to the definition of a value/register.
- Generally rewrote allocateInstruction to be hopefolly somewhat simpler
- Added some heuristic for tricky inline assembly early clobber situations: If we detect a tricky situation now, we will take all def and early-clobber like operands and sort them by the size of the register class so that we can assign tricky cases more consistently and are more likely to be successfull (this was necessary to fix some unittests that were lucky with the old allocation algorithm but unlucky with the new one. With the new heuristic we should be less dependent on luck).
- Simpler patch now that more things have been split up into separate review and NFC patches.
This is part of my rewrite regallocfast series. See also D52010
This is part of my rewrite regallocfast series. See also D52010
This is part of my regallocfast rewrite series, see also D52010
Maybe you can give it at least some cursory review?
Nov 8 2018
Take my "meh" as LGTM if nobody objects in the next couple days.
The whole registry system isn't used much. The whole machine seems a little overengineered in retrospect (there only ever is a single listener for example). But making it typesafe seems sensible.
Nov 7 2018
And fix one more variable name typo (sorry for all the updates).
And fix one more variable name typo (sorry for all the updates).
Tweak some more comments.
- Added two more tests
- Revamped explanation of the transformation. It's better but I fear still very hard to crasp. Oh well it's as good as I can make it right now...
@mstorsjo could you check if this fixes your original problem?
After pondering about this again, this should be the correct fix.
Nov 6 2018
I extracted a couple of obvious cleanups to reduce the code churn here. They landed in rL346287, rL346288, rL346289, rL346296, rL346297, rL346298
(will update and split the diff here soon)
@mips target owners: Please take a look at the unittests where I had to use -rafast-ignore-missing-defs to workaround invalid machine code. Those tests use a virtual register before it is defined (the old regalloc fast used to just ignore this in some cases).
- Adapted all the unit tests
- Slight tuning of register hinting code
- Improved allocation heuristic for livethrough and earlyclobber registers, allowing us to allocate more inline assembly constructs consistently (rather than the previous allocator doing it by luck).
It seems to have a bug now where the first time a physical register is used, it immediately sets kill on it even though there are further uses of the same def
Nov 5 2018
hmm this is still buggy...
Putting this up for early review, still need to create a minimal testcase.
Unfortunately, it doesn't seem to fix the problem reported in that PR. But maybe it gets us closer to being able to fix it?
Nov 1 2018
I'm not opposed to this change. But out of curiosity:
Would you be in a position to mark your extra operands as explicit operands since the machine does appear to be reading/writing them in your case? (Maybe you have to mark the instruction as variadic, or I would be open to invent a new MCInstrDesc flag if that helps...)
Oct 31 2018
LGTM. Looks sensible to me, nitpicks below.
LGTM, thanks for pushing this! Some of the changes here look like things that were just waiting to become an actual problem.
Oct 30 2018
- Use empty implementation from https://reviews.llvm.org/D53909
- Add a test (I wish writing minimal debug info tests was simpler than this...)
- Add a simple unittest
- Go with the adl_begin() == adl_end() implementation for now. We can always switch to more fine grained overloading when we actually deem it necessary for performance. And after all we will probably just replace it with std::empty() anyway once we use C++17 in llvm.
Oct 23 2018
I'm now depending on this for a change I'm working on rather than figuring out how to fix kill flag handling
I'm not sure what exactly you mean by that. But just as a warning: This patch will lead to more kill flags being added than before, but it does not guarantee that every last use has a kill flag (especially vregs live across blocks will still lack kill flags).
Updated version
ping
- I am slowly adatpting all the tests to the changes here; I have a commit fixing ~100 of them but there is still maybe 30-40 to go.
- I believe I have the first two asserts fixed locally here.
- The 3rd assert is picking up invalid MIR (a value is used before it is defined), my local version now has a commandline flag that allows us to disable that assert for the couple tests that show invalid MIR (planning to file PRs for the target owners to fix, so we can remove the commandline flag once the last test is fixed).
Oct 22 2018
The code in PEI::replaceFrameIndices also seems sketchy to me as in case of nested callframes it would set InsideCallSequence to false after the inner ADJCALLSTACKUP (though at a first glance I wasn't sure yet why that variable exists at all, and why we would not track sp adjustments outside of callframe setups...)
What about MachineVerifier::verifyStackFrame()? That currently rejects any function with nested callframe setups. You will probably see your testcases fail when running with -verify-machineinstrs.
Please try whether using DoNotOptimize as suggested works. With that addressed LGTM.
Oct 21 2018
Can you explain what you would use per-thread command line options for?
Intuitively I would not expect actual commandline users wanting to set options per thread. If you need it to tweak compiler behavior then it might be better to find ways to encode the information in TargetOptions.h, function attributes or similar, so we have a streamlined way of setting them independently of programmatically modifying commandline options.
Oct 16 2018
Did you make sure this typically finished in 0.5-1s of time?
Oct 15 2018
LGTM
Oct 11 2018
- drop accidental reformatting of existing code from the patch.
noticed that the test needs to operate on i64 to have any effect (as all i32 values are reported as TCC_Free anyway)
- This is not inspired by some benchmark, but from me needing stable tests in an upcoming change making DAGCombiner slightly less aggressive with OpaqueConstants. It does however seem like the right thing to do in general.
This is currently not motivated by a benchmark but preparation for an upcoming commit of mine where I don't want ConstantHoisting to screw with dev/rem and mess up testcases.
Oct 8 2018
As far as I understand MachineMemoryOperands:
Oct 5 2018
Sorry for slow response. LGTM, some nitpicks below but feel free to fix testcases and nitpicks at your own discretion.
Landed in rL343874
Oct 4 2018
That explains why I couldn't find those line tables in the executables :)
Is it possible that your test harness is removing .dSYMs? .. ah, no, what's happening is that clang declines to add a dsymutil step when it's being used to link .o's into an executable.
The actual instructions are exactly the same, at least in the couple examples I picked.
Oct 3 2018
Recommitted as rL343895 now, let's hope it passes the tests this time.
Comparing two test-suite builds gives me:
Oct 2 2018
+CC dsanders
Feel free to revert for now, but are you sure it is actually this commit?