- User Since
- Dec 2 2012, 11:45 PM (281 w, 2 d)
Feb 26 2018
This looks fine to me based on a quick review. I don't know if @fhahn or @MatzeB still want to weigh in. Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.
Feb 21 2018
Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have it's own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?
So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.
On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().
Feb 20 2018
Thanks for CC'ing me, but I can't comment on whether this is the right heuristic since I haven't done this level of tuning in a long time. Deferring to @MatzeB.
Feb 14 2018
I can confirm that the purpose of ItinRW is to avoid the need to define InstRW for a subtarget when the target already had itinerary classes. (Things would be *a lot* simpler if all the subtargets used the same style of machine description.)
Nov 22 2017
Obviously I meant "With MicroOpsBufferSize >1 (OOO)" above.
Think of CurrCycle as the cycle in which an instruction is issued.
Sep 7 2017
Note that you can get the stack trace just call RunSignalHandlers before calling exit(1)...
Sep 6 2017
Please add a comment that abort() also causes any signal handlers registered with LLVM such as PrettyStackTrace to run, which is important for compiler debugging.
Aug 16 2017
Andy: Do the comments now look ok?
Aug 7 2017
It sounds to me like you want a "MI-Sched Mode" convenience flag that does more than what it claims to do at face value, rather than needing to specify all the individual flags for SelectionDAG heuristic, enabling MI-Sched, coalescing heuristic. I'm not going to nit pick on the flag names because I don't have any stake it in. But I think you need to discuss this on llvm-dev and argue it out taking into consideration all of the goals:
I don't understand the motivation here. The purpose of command line flags is to test functionality in isolation. If you want a test case to control both SelectionDag and MISched functionality, then you use two command line flags.
Jul 28 2017
Jul 20 2017
Jun 27 2017
Jun 12 2017
Jun 5 2017
As a generally good direction I like the idea of moving LFTR, and eventually even IV widening, forward to the LSR stage of the pipeline.
So, I'm ok with it as long as you do the normal amount of benchmark validation and @mzolotukhin doesn't see any immediate performance issues with this approach.
May 9 2017
I'm haven't looked at this code in a long time, and I have no idea if it makes sense to have an Untyped MVT at this point in the code. But... your fix looks harmless.
May 5 2017
Can you collect spill code stats and benchmark scores from the test suite?
Can you get someone to run benchmarks on arm64 as well?
May 2 2017
The time to shrink live ranges is after all the global machine code
motion is done if the goal is handle as many cases as possible.
I think the algorithm needs to be explained. It seems reasonable to argue that single def/use live ranges should be kept short. But is there any reason at all to run this before other IR passes? It seems natural to run this after machine LICM and Sinking.
Ideally there should be a separate pass that runs on the SSA machine code (before register coalescing) to minimize register pressure and hide latency for chains of loads or FP ops. It should work across calls and loop boundaries. You could even do a kind of poor-man's cyclic scheduling this way.
Why are the adds "sunk down" in the first place? Is this reassociation at work?
May 1 2017
My suggestion was this: LFTR is not necessary this early in the pipeline if it's only purpose is converting a 'cmp slt' to 'cmp eq'. As long as SCEV can compute the trip count, the loop test shouldn't matter. If we really need to convert the loop test, it could be done during LSR and we could drop the 'nw' flags at that point.
Apr 24 2017
Yeah, it seems like you should have the positive version of this test.
Apr 22 2017
Apr 21 2017
Neat. I wouldn't usually add complexity for an unexpected situation, but this is purely a generalization.
Apr 10 2017
This is probably fine, but I can't look at it closely at the moment. @MatzeB did some work in this area recently.
Mar 31 2017
I don't remember (or never understood) how all this code works. But the fix looks reasonable.
Mar 26 2017
LGTM. I think you should comment in TargetSchedule.td that SingleIssue is an alias for Begin/EndGroup.
Mar 25 2017
That looks good. But this version of the patch no longer has the SingleIssue flag support in the SubtargetEmitter. I thought you wanted that so the feature was more self-documenting. Your previous implementation looked ok to me. I just don't think you need any flag in the implementation of the model (MCSchedule.h).
@qcolombet, thanks for the analysis. It would be even better if we had a fix to the SCEVExpander's pathological behavior. Now I'm afraid this patch just hides the problem for this particular category of test case.
Mar 22 2017
I don't think you need the SingleIssue flag in MCSchedule.h any more.
Mar 17 2017
I just looked at the MachineScheduler implementation. I see this TODO:
/// TODO: Also check whether the SU must start a new group.
I think a SingleIssue flag would be a good user option. Representing such a flag in the tables generated by the machine model (MCSchedClassDesc) would be redundant. Either begin/endgroup already does the same thing, or it's a bug that should be fixed.
Mar 16 2017
I was referring to the same example but missed that there were two stores. So FindLoopCounter is working as intended.
Ok. I thought the problem was caused by converting the loop test to a post-increment compare. Is that a problem?
When I talk about creating or removing an IV, I'm talking about the strongly connected phi cycle. I'm not referring to any "derived IV" stemming from values that depend on the IV.
Mar 15 2017
Materializing a new IV isn't a good canonicalization. It may block other loop transformations and there's no guarantee that LSR will undo it. General rule: never pessimize your loop and expect LSR to rewrite it. LSR only works well on a small subset of loops and essentially gives up in a lot of cases. I agree the complexity smells bad. I don't have a better suggestion at the moment.
I see this transform as a "canonicalization" type transform (supposed to make later loop opts easier), but I did not add it. Maybe @atrick can shed some light here on the initial motivation? I'm most certainly in favor of deleting code where possible.
Mar 14 2017
This seems reasonable to me. Hopefully the linker will strip utilities like this for build targets that don't need them.
Mar 9 2017
Separate processor resources should be defined to model issue resources rather than trying to work around it by consuming functional units. The processor resources were always intended to be used for both issue resources and functional units. It just isn't well documented or easily discoverable. It might makes sense to have either a single issue flag or a separate list of issue resources in the tablegen machine model. My argument is that the scheduler itself already supports this functionality.
Mar 8 2017
That said, if there are multiple target maintainers who think adding a special SingleIssue case would be much better than the alternatives, then it may be worth adding.
I think what Matthias was getting at is that you can add a new ProcResource to model issue constraints. If you have a dual issue machine then any instruction that can't be grouped would simply consume both issue resources. Another option might be to use the begin/end group flags.
LSR is fundamentally combinatorial. It relies on arbitrary pruning. I don't like it, but that's the way it is, and we need to guard against pathological cases. I just wonder why you complete bypass the reassociation code instead of simply limiting the number of expressions that participate.
Feb 28 2017
I haven't fully reviewed the code, but I'm really glad you were able to express the range without extending its width. Thanks for doing this.
Feb 7 2017
Jan 20 2017
Note that adjustSchedDependency is defined as updating the latency. It's very important for target hooks not to mutate data structures people's backs.
Jan 18 2017
I think you should find a way to do this without calling shouldScheduleAdjacent on every DAG node. It's fine to say you've tested compilation time but what really matters here are the pathological cases with very large blocks. It's rare for instructions in the middle of blocks to have fusion opportunities, so it's wrong to introduce this potential cost for all blocks.
Jan 15 2017
Jan 13 2017
Sounds good to me. I don't think you need to add a command line flag.
Jan 12 2017
Rather than a command line option, can you find the recursion limit hit by the LLVM test suite (and SPEC if you have access) and set the limit a bit higher with sufficient comments?
Jan 9 2017
Yep, this could all be done in the target. SDep::Cluster is effectively a scheduler API for the subtarget to use as it wishes.
Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this a rare opportunity.
Looks reasonable to me.
Jan 3 2017
Dec 12 2016
How do you know this won't move division that used to be outside the loop (possibly multiple levels) inside the innermost loop?
Dec 9 2016
Sounds great, but I 'm unavailable for code reviews for about the next month.
Nov 22 2016
This change is consistent with the original intention of the design.
Nov 8 2016
Nov 4 2016
Oct 31 2016
Thanks. This is great, as long as you tested all in-tree targets.
Oct 18 2016
Oct 17 2016
On x86 it's meant to be a heuristic. If you think all subtargets should instead force macro-fusion before scheduling (and have benchmarks to prove it) then we should delete the code that implements the heuristic.
Doesn't this force macro fusion for all targets/subtargets? If we wanted to do that, we wouldn't need the cluster edge and scheduler heuristic anymore. Shouldn't there be a TII->forceMacroFusion() option?
Sep 28 2016
Marking a DAG node as "dead" for the purpose of scheduling should be an easy thing to do, relative to supporting instruction bundles. But adding the extra DAG edges is also a fine solution, just not quite as direct.
Sep 12 2016
It would be great to fix the example shown in the summary. However, I'm concerned that the proposed heuristic could hurt codegen in other cases.The cases where this heuristic generates worse code should be identified to determine whether this is the right strategy.
Does this patch still need review, or has it been subsumed by other changes?
Aug 18 2016
Whatever works for you. I was thinking custom schedulers would use composition of utilities rather than inheriting the whole set of heuristics. It comes down to whether you want someone tuning the GenericScheduler to impact performance on your target. They will have no way to evaluate it.
Jun 23 2016
Jun 21 2016
Frankly I don't understand what MID.getNumDefs() has to do with anything here. So your change looks ok to me. But I'm not clear on how the register form of the instruction ends up with multiple defs--I guess it already had multiple defs. Can you show an example of that in the form of machine instrs?
Jun 16 2016
Awesome. Proper DFS is how I've done it in the past.
TL;DR: The DAG builder is never supposed to form "insane" DAGs.
Actually, I'm sure fix #1 works in this case, I'm just not sure it won't regress anything else.
This is definitely a case worth fixing.
Matthias investigated this very recently, I think he should make a decision on the patch.
Jun 13 2016
Of course loop rotation and jump threading are both forms of tail-duplication. So what! That doesn't mean they should be the same pass. These things are done for different reasons. Loop rotation's job is to put loops into a canonical form for the sake of all other downstream loop passes.
May 17 2016
With SmallPtrSet you don't have to worry much about the overhead of the visited set. But I understand keeping the code super simple in situations like this when you're really just pattern matching your bounds checks.
That really is a low threshold :) LLVM likes to use #6 to limit search depth in other places.
For the record, even for an experimental pass, I'm objecting to O(2^n) recursion not O(n^2).
May 16 2016
IMO this seems as good a form as any for being in-tree, and I think it makes sense to have in-tree given that it's really part of the implementation of the guard intrinsic. You could "hide" the code in other passes, but it's still specific to your compiler until others start using the intrinsic. This way, the experimental feature is self-contained.
I did a fairly quick review. Your pass looks pretty great.
May 13 2016
Adding TTI hooks in LSR is fine with me. increasedLSRFormulaCost seems a little specific though for a TTI hook.
May 10 2016
Just chatted with Matthias. This patch as fine as-is--it preserves behavior. Matthias will add a comment explaining the pre-existing behavior.
May 2 2016
Adding these asserts seems harmless.
Apr 25 2016
Apr 23 2016
It would be nice to get this feature in. It's encouraging that misched=shuffle passes.
Apr 21 2016
I think this basic idea makes a lot of sense. Good work.