Rebase on top of precommitted tests.
Extend ByteProvider / VectorOffset handling to support vector scalar types wider than 1 byte.
Thu, Sep 29
Add pattern to select V_AND v1, 0xffff0000 in the case where buildvector produces bits V1.hi : 0
Wed, Sep 28
One small point in favor of BFI is that the bitmask you need is more likely to be CSEable for unrelated uses.
Use V_BFI for V.hi : V.low. This allows a bitmask that is more likely to be reused by other instructions (0xffff vs. 0x7060100), potentially enabling other optimizations (e.g. CSE).
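To make the bitmask argument concrete, here is a rough Python model of a bitfield insert. It assumes V_BFI_B32 computes (S0 & S1) | (~S0 & S2); the helper names `bfi` and `pack_hi_lo` and the sample register values are made up for illustration and are not taken from the patch.

```python
MASK32 = 0xFFFFFFFF

def bfi(mask, a, b):
    """Bitfield insert: bits of `a` where `mask` is 1, bits of `b` elsewhere."""
    return ((mask & a) | (~mask & b)) & MASK32

def pack_hi_lo(hi_src, lo_src):
    """Keep the upper 16 bits of hi_src and the lower 16 bits of lo_src."""
    # 0xffff is the only constant needed, and it is a common mask elsewhere.
    return bfi(0x0000FFFF, lo_src, hi_src)

packed = pack_hi_lo(0xAAAA1111, 0xBBBB2222)
print(hex(packed))
```

The only constant involved is 0xffff, which is the kind of value other instructions in the same function are likely to need as well.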
Tue, Sep 27
Can't you use v_alignbit for all the cases where you need the upper 16 bits of one register and the lower 16 bits of the other? It should be smaller than v_perm because the shift amount (16) is an inline constant.
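As a sanity check on that suggestion, the sketch below models an align-bit operation in Python. It assumes v_alignbit_b32 shifts the 64-bit concatenation {S0, S1} right by the low 5 bits of the shift operand and keeps the low 32 bits; the function name and sample values are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def alignbit(s0, s1, shift):
    """Shift the 64-bit value {s0, s1} right and keep the low 32 bits."""
    combined = ((s0 & MASK32) << 32) | (s1 & MASK32)
    return (combined >> (shift & 31)) & MASK32

# With a shift of 16 the result holds the lower half of s0 in its upper
# 16 bits and the upper half of s1 in its lower 16 bits.
result = alignbit(0xAAAA1111, 0xBBBB2222, 16)
print(hex(result))
```

Under this model a shift of 16 covers the "lower half of one register, upper half of the other" case without needing a 32-bit selector constant.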
Precommit generated test + Rebase
I'm still not confident in my understanding of the various index values even with the added code comments.
I'll try to step through some of these tests in the debugger to get a better idea, but it would be good if another reviewer can have a look too for a second opinion.
replace setOffset with addToOffset
Mon, Sep 26
Fri, Sep 23
Address review comments.
Fix attributes in test
Precommit generated tests pack.v2f16.ll, rebase
Address review comments (remove unnecessary "Root" parameter).
Thu, Sep 22
Address review comments.
Hey Matt, thanks for the comments. I'll address them soon, for now I'll add you as reviewer.
Wed, Sep 21
Sure - I didn't look at the diffs closely, but I don't object to improving the SDAG implementation. Just wanted to let you know that there are potential other places to try this kind of transform.
Mon, Sep 19
Address review comments -- update usage of Optional API.
LLVM has gone back and forth on this. There was a general load combine pass for IR, but it was removed because it interfered with other transforms in IR. So we started hacking away at codegen instead, but there are programs where doing the transform in codegen is too late to get the optimal results. So we have some limited transforms in the vectorization passes, and now we're trying to reintroduce load combining as a canonicalization (but in very limited cases and gated by target-specific legality checks).
Both SLP and VectorCombine should try to make patterns like this better in IR, so there might be some target cost/legality checks that need adjusting.
There's also an in-progress patch for -aggressive-instcombine that could be relevant:
Would it be better to transform this before codegen?
Thu, Sep 15
Address review comments
Add comment for new function
Wed, Sep 14
A few optimizations to improve general performance, and the performance of the lower bound (LB) computation.
Tue, Sep 13
Address Review Comments
Mon, Sep 12
Avoid a very small number of calls to calculateLowerBound
Fix incomplete comment
Adding reviewers for increased perspective.
Fri, Sep 9
Check D133593, it tries to address a similar problem, but with SCC.
Aug 19 2022
Just a couple nitpicks
Code review: https://reviews.llvm.org/D130729
Aug 17 2022
Does anyone have any concerns about this patch?
Hey Austin --
Aug 16 2022
Not sure how much effort you are willing to put into the exact algorithm. But maybe you can improve the performance by adding some lower bounds on future costs to improve pruning?
More specifically, when evaluating a partial solution S with cost C, and comparing it against the currently best solution BS with cost BC, you currently prune if C > BC.
I'm proposing to compute some lower bound LC on the cost of the decisions that have not yet been made (e.g. adding the costs of cheapest assignments), and pruning if C + LC > BC.
I did not review the algorithm in enough detail to make a more concrete proposal, and it could very well be that computing a non-trivial lower bound (i.e. one that is not zero) is difficult.
However, in similar problems the technique above often helps a lot.
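The pruning rule proposed above can be sketched on a toy problem. The example below is not the PipelineSolver; it uses a small assignment problem (each item gets a distinct slot) as a stand-in, and the lower bound LC is simply the sum of each remaining item's cheapest cost, ignoring conflicts. All names are hypothetical, and costs are assumed to be non-negative.

```python
def solve(cost, use_lower_bound=True):
    """Branch and bound over assignments of item i to a distinct slot."""
    n = len(cost)
    # suffix_min[i] = sum of cheapest costs for items i..n-1 (a valid LC).
    suffix_min = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(cost[i])

    best = {"cost": float("inf"), "assign": None, "nodes": 0}

    def search(i, used, c, assign):
        best["nodes"] += 1
        lc = suffix_min[i] if use_lower_bound else 0
        if c + lc > best["cost"]:      # prune: C + LC > BC
            return
        if i == n:
            if c < best["cost"]:
                best["cost"], best["assign"] = c, list(assign)
            return
        for slot in range(n):
            if slot in used:
                continue
            used.add(slot)
            assign.append(slot)
            search(i + 1, used, c + cost[i][slot], assign)
            assign.pop()
            used.discard(slot)

    search(0, set(), 0, [])
    return best

cost = [[4, 2, 8], [4, 3, 7], [3, 1, 6]]
with_lb = solve(cost, use_lower_bound=True)
without_lb = solve(cost, use_lower_bound=False)
print(with_lb["cost"], with_lb["nodes"], without_lb["nodes"])
```

Both runs return the same optimal cost. The run with the bound never visits more nodes than the run without it, since LC only adds pruning opportunities.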
Aug 10 2022
Replace timeout feature with deterministic max branches explored feature, and properly handle early termination condition.
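A branch budget stops the search at the same point on every run, which a wall-clock timeout cannot guarantee. The sketch below shows the idea on a toy assignment problem. It is not the PipelineSolver code, and `solve_with_budget` and `max_branches` are hypothetical names.

```python
def solve_with_budget(cost, max_branches):
    """Exact search that gives up after exploring max_branches branches."""
    n = len(cost)
    state = {"best_cost": float("inf"), "best": None,
             "branches": 0, "exhausted": False}

    def search(i, used, c, assign):
        if state["exhausted"]:
            return
        if i == n:
            if c < state["best_cost"]:
                state["best_cost"], state["best"] = c, list(assign)
            return
        for slot in range(n):
            if slot in used:
                continue
            if state["branches"] >= max_branches:
                state["exhausted"] = True   # early termination
                return
            state["branches"] += 1
            new_cost = c + cost[i][slot]
            if new_cost >= state["best_cost"]:
                continue                     # cannot beat the best so far
            used.add(slot)
            assign.append(slot)
            search(i + 1, used, new_cost, assign)
            assign.pop()
            used.discard(slot)

    search(0, set(), 0, [])
    return state

cost = [[4, 2, 8], [4, 3, 7], [3, 1, 6]]
full = solve_with_budget(cost, max_branches=1000)
limited = solve_with_budget(cost, max_branches=3)
print(full["best_cost"], limited["best_cost"], limited["exhausted"])
```

When the budget runs out, the caller still gets the best complete solution found so far, plus a flag saying the result may not be optimal.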
Aug 9 2022
Add some features to help performance / usability of exact PipelineSolver, including:
- Timeout feature (& corresponding CLI option)
- Guiding heuristic, choosing the fit with fewest missed edges first (& corresponding CLI option)
- Run the greedy algorithm before exact to improve pruning (useful if not using cost heuristic)
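Running a greedy pass first gives the exact search an initial best cost to prune against. The sketch below illustrates this on a toy assignment problem; it is a stand-in under assumed names (`greedy`, `exact`) and does not reflect the actual PipelineSolver interfaces.

```python
def greedy(cost):
    """Give each item its cheapest still-free slot, in order."""
    used, assign, total = set(), [], 0
    for row in cost:
        slot = min((s for s in range(len(row)) if s not in used),
                   key=lambda s: row[s])
        used.add(slot)
        assign.append(slot)
        total += row[slot]
    return total, assign

def exact(cost, seed_cost=float("inf"), seed_assign=None):
    """Exhaustive search, pruned against the best known cost."""
    n = len(cost)
    best = {"cost": seed_cost, "assign": seed_assign, "nodes": 0}

    def search(i, used, c, assign):
        best["nodes"] += 1
        if c >= best["cost"]:          # cannot improve on the best so far
            return
        if i == n:
            best["cost"], best["assign"] = c, list(assign)
            return
        for slot in range(n):
            if slot not in used:
                used.add(slot)
                assign.append(slot)
                search(i + 1, used, c + cost[i][slot], assign)
                assign.pop()
                used.discard(slot)

    search(0, set(), 0, [])
    return best

cost = [[9, 2, 7], [6, 4, 3], [5, 8, 1]]
greedy_cost, greedy_assign = greedy(cost)
seeded = exact(cost, greedy_cost, greedy_assign)
unseeded = exact(cost)
print(greedy_cost, seeded["cost"], seeded["nodes"], unseeded["nodes"])
```

If the greedy result is already optimal, the seeded search simply confirms it and returns the greedy assignment unchanged.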
Aug 8 2022
Aug 5 2022
Address Review comments.
Aug 4 2022
Move codegen tests to CodeGen, add IR test for InferAddressSpace flat_atomic.
Rework approach of fix.
Aug 3 2022
Hey Matt, Jay,
Aug 2 2022
Address Review Comments.
Aug 1 2022
Thanks! I like the idea behind the greedy solver. Not sure about SchedGroupSU. Maybe just a map between SUs and lists of schedgroups? I think trying to track sched_group_barriers by their order and assigning that an index is a bit confusing.
Jul 29 2022
Jul 28 2022
Remove unnecessary local var
Jul 26 2022
Jul 14 2022
Address review comments.
Jul 13 2022
Include test which minimally reproduces the SmallVector error reported.
Jul 6 2022
Remove unnecessary debug code.
Addressed review comments.
Jul 1 2022
Hey Austin -- I like the removal of canAddMIs. In the original design, I was leaving open the possibility for users to pass in canAddMIs rather than a mask / SchedGroup name, but it looks like this isn't the direction we're going, and having the classification functions defined in a general canAddMI makes things easier.
Jun 30 2022
Remove accidental files
Run instnamer on testfile, explicitly use "source" (RRList) scheduler for InstSelection Scheduler in test.
Jun 28 2022
Jun 27 2022
Broke up logic in ScheduleDAGFast CheckForLiveRegDef to remove redundancy.
Ported over to phab review to land in Trunk. Addressed the requests in initial review, renamed test file to better align with naming of previous test.