- User Since
- Dec 2 2012, 11:45 PM (407 w, 2 d)
Nov 11 2019
Oct 7 2019
I forgot to link the commit message back to this review.
Closing with commit 861b371e1386b6ac1069c9aa3050f7074fb64516
Oct 2 2019
I guess this is just making the custom scheduling logic more closely match the generic scheduler? Makes sense to me.
Sep 12 2019
Sep 9 2019
Hmm... I guess the intent of the original design is not to overload the table even if it's in small mode. I'm not sure that makes sense, but I'll accept it as status quo. Anyway, it does sort of make sense that we can compress the small table by removing tombstones.
Sep 6 2019
Can this problem be fixed without losing the invariant that memory is never allocated until NumBuckets > InlineBuckets, regardless of the load? I don't like the removal of the fast path if (AtLeast < InlineBuckets) and the load doesn't matter at all in small mode!
e.g. If I ask to reserve 100 out of 128 entries, it should not attempt to copy all the entries into temporary storage.
Aug 20 2019
For the record, I see two failures after updating LLVM
Failing Tests (2):
LLVM :: CodeGen/PowerPC/aix-xcoff-basic.ll LLVM :: CodeGen/PowerPC/aix-xcoff-common.ll
I removed the extra if-block that I accidentally created.
I added a cast to fix a type mismatch that I meant to include in the last patch.
Aug 16 2019
@rsmith, you were the last developer to make salient changes here. Since you're already familiar, any chance you can review this? If not, I'll broadcast a plea.
Aug 14 2019
Aug 5 2019
Aug 2 2019
I'm abandoning this. Adding a member to PointerLikeTypeTraits is going to be a merge problem for some LLVM projects. I'm going back to the simpler way of converting to PtrToIntPair to an opaque pointers to check whether it can be an element in TinyPtrVector now that I've convinced myself that the simple way is still correct.
This breaks the clang build because there are a handful of PointerLikeTypeTraits specializations that specify higher alignment:
Mar 31 2019
@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?
Mar 26 2019
I scanned the code. It seems reasonable to me. Thanks for all the other good review feedback.
Jan 24 2019
Thanks for doing this.
Sep 6 2018
It's great that you have a reproducer. Sad that SCEV still hasn't been fixed. As I mentioned, I wasn't able to easily reproduce the problem from source. I know that, back in 2011, the issue was only exposed because LCSSA form was blocking some obvious SCEV optimization.
Sep 5 2018
Is the "original test case" you are mentioning present somewhere in the unit tests? If no, I'd appreciate if you could check it in.
Sep 4 2018
The bitcast bailout (implemented in 2009) (https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/ScalarEvolution.cpp#L6451)
@sanjoy may have some insight as to why SCEV is now safe. If we know this why can't happen, you can completely remove isValidRewrite and don't need to assert.
isValidRewrite is a hack that wasn't supposed to live this long.
Aug 17 2018
Looks fine to me.
May 23 2018
LGTM, ignore my previous comment and sorry for the delay.
May 18 2018
May 15 2018
Just to be clear, you aren't actually using PostRASchedulerList are you?
May 14 2018
We didn't support cross-call scheduling last I looked at this. Hopefully @MatzeB can verify that your fix is correct.
That's interesting. @MatzeB is this the right fix? Is regalloc supposed to be adding implicit defs in front of the uses, and don't we/shouldn't we have a general API to handle this?
Fine with me. I don't know if it's possible anyone still uses ScoreBoardHazardRecognizer with SelectionDAG scheduler, but they really need to stop doing that.
May 13 2018
ScheduleDAGMI is defined in MachineScheduler.h. Maybe there's some other reason the API should take ScheduleDAG, but I'm not aware of it. I think that base class is only to share implementation across SelectionDAG and MI scheduling.
If getAA may return NULL, that should be mentioned in the comment.
May 11 2018
May 7 2018
May 3 2018
Apr 30 2018
These classes are not meant to be inherited from, and inheritance is generally terrible way to achieve code reuse across components. The principle behind the scheduling interfaces is to avoid implicit coupling between the target code and machine independent code.
Feb 26 2018
This looks fine to me based on a quick review. I don't know if @fhahn or @MatzeB still want to weigh in. Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.
Feb 21 2018
Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have it's own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?
So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.
On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().
Feb 20 2018
Thanks for CC'ing me, but I can't comment on whether this is the right heuristic since I haven't done this level of tuning in a long time. Deferring to @MatzeB.
Feb 14 2018
I can confirm that the purpose of ItinRW is to avoid the need to define InstRW for a subtarget when the target already had itinerary classes. (Things would be *a lot* simpler if all the subtargets used the same style of machine description.)
Nov 22 2017
Obviously I meant "With MicroOpsBufferSize >1 (OOO)" above.
Think of CurrCycle as the cycle in which an instruction is issued.
Sep 7 2017
Note that you can get the stack trace just call RunSignalHandlers before calling exit(1)...
Sep 6 2017
Please add a comment that abort() also causes any signal handlers registered with LLVM such as PrettyStackTrace to run, which is important for compiler debugging.
Aug 16 2017
Andy: Do the comments now look ok?
Aug 7 2017
It sounds to me like you want a "MI-Sched Mode" convenience flag that does more than what it claims to do at face value, rather than needing to specify all the individual flags for SelectionDAG heuristic, enabling MI-Sched, coalescing heuristic. I'm not going to nit pick on the flag names because I don't have any stake it in. But I think you need to discuss this on llvm-dev and argue it out taking into consideration all of the goals:
I don't understand the motivation here. The purpose of command line flags is to test functionality in isolation. If you want a test case to control both SelectionDag and MISched functionality, then you use two command line flags.
Jul 28 2017
Jul 20 2017
Jun 27 2017
Jun 12 2017
Jun 5 2017
As a generally good direction I like the idea of moving LFTR, and eventually even IV widening, forward to the LSR stage of the pipeline.
So, I'm ok with it as long as you do the normal amount of benchmark validation and @mzolotukhin doesn't see any immediate performance issues with this approach.
May 9 2017
I'm haven't looked at this code in a long time, and I have no idea if it makes sense to have an Untyped MVT at this point in the code. But... your fix looks harmless.
May 5 2017
Can you collect spill code stats and benchmark scores from the test suite?
Can you get someone to run benchmarks on arm64 as well?
May 2 2017
The time to shrink live ranges is after all the global machine code
motion is done if the goal is handle as many cases as possible.
I think the algorithm needs to be explained. It seems reasonable to argue that single def/use live ranges should be kept short. But is there any reason at all to run this before other IR passes? It seems natural to run this after machine LICM and Sinking.
Ideally there should be a separate pass that runs on the SSA machine code (before register coalescing) to minimize register pressure and hide latency for chains of loads or FP ops. It should work across calls and loop boundaries. You could even do a kind of poor-man's cyclic scheduling this way.
Why are the adds "sunk down" in the first place? Is this reassociation at work?
May 1 2017
My suggestion was this: LFTR is not necessary this early in the pipeline if it's only purpose is converting a 'cmp slt' to 'cmp eq'. As long as SCEV can compute the trip count, the loop test shouldn't matter. If we really need to convert the loop test, it could be done during LSR and we could drop the 'nw' flags at that point.
Apr 24 2017
Yeah, it seems like you should have the positive version of this test.
Apr 22 2017
Apr 21 2017
Neat. I wouldn't usually add complexity for an unexpected situation, but this is purely a generalization.
Apr 10 2017
This is probably fine, but I can't look at it closely at the moment. @MatzeB did some work in this area recently.
Mar 31 2017
I don't remember (or never understood) how all this code works. But the fix looks reasonable.
Mar 26 2017
LGTM. I think you should comment in TargetSchedule.td that SingleIssue is an alias for Begin/EndGroup.
Mar 25 2017
That looks good. But this version of the patch no longer has the SingleIssue flag support in the SubtargetEmitter. I thought you wanted that so the feature was more self-documenting. Your previous implementation looked ok to me. I just don't think you need any flag in the implementation of the model (MCSchedule.h).
@qcolombet, thanks for the analysis. It would be even better if we had a fix to the SCEVExpander's pathological behavior. Now I'm afraid this patch just hides the problem for this particular category of test case.
Mar 22 2017
I don't think you need the SingleIssue flag in MCSchedule.h any more.
Mar 17 2017
I just looked at the MachineScheduler implementation. I see this TODO:
/// TODO: Also check whether the SU must start a new group.
I think a SingleIssue flag would be a good user option. Representing such a flag in the tables generated by the machine model (MCSchedClassDesc) would be redundant. Either begin/endgroup already does the same thing, or it's a bug that should be fixed.
Mar 16 2017
I was referring to the same example but missed that there were two stores. So FindLoopCounter is working as intended.
Ok. I thought the problem was caused by converting the loop test to a post-increment compare. Is that a problem?
When I talk about creating or removing an IV, I'm talking about the strongly connected phi cycle. I'm not referring to any "derived IV" stemming from values that depend on the IV.
Mar 15 2017
Materializing a new IV isn't a good canonicalization. It may block other loop transformations and there's no guarantee that LSR will undo it. General rule: never pessimize your loop and expect LSR to rewrite it. LSR only works well on a small subset of loops and essentially gives up in a lot of cases. I agree the complexity smells bad. I don't have a better suggestion at the moment.
I see this transform as a "canonicalization" type transform (supposed to make later loop opts easier), but I did not add it. Maybe @atrick can shed some light here on the initial motivation? I'm most certainly in favor of deleting code where possible.
Mar 14 2017
This seems reasonable to me. Hopefully the linker will strip utilities like this for build targets that don't need them.
Mar 9 2017
Separate processor resources should be defined to model issue resources rather than trying to work around it by consuming functional units. The processor resources were always intended to be used for both issue resources and functional units. It just isn't well documented or easily discoverable. It might makes sense to have either a single issue flag or a separate list of issue resources in the tablegen machine model. My argument is that the scheduler itself already supports this functionality.
Mar 8 2017
That said, if there are multiple target maintainers who think adding a special SingleIssue case would be much better than the alternatives, then it may be worth adding.
I think what Matthias was getting at is that you can add a new ProcResource to model issue constraints. If you have a dual issue machine then any instruction that can't be grouped would simply consume both issue resources. Another option might be to use the begin/end group flags.
LSR is fundamentally combinatorial. It relies on arbitrary pruning. I don't like it, but that's the way it is, and we need to guard against pathological cases. I just wonder why you complete bypass the reassociation code instead of simply limiting the number of expressions that participate.
Feb 28 2017
I haven't fully reviewed the code, but I'm really glad you were able to express the range without extending its width. Thanks for doing this.
Feb 7 2017
Jan 20 2017
Note that adjustSchedDependency is defined as updating the latency. It's very important for target hooks not to mutate data structures people's backs.
Jan 18 2017
I think you should find a way to do this without calling shouldScheduleAdjacent on every DAG node. It's fine to say you've tested compilation time but what really matters here are the pathological cases with very large blocks. It's rare for instructions in the middle of blocks to have fusion opportunities, so it's wrong to introduce this potential cost for all blocks.