- User Since
- Aug 18 2016, 4:39 AM (143 w, 5 d)
Ping. @Gerolf, are you happy with the update?
Sun, May 19
You can use something like the command below to get a comparison between 2 sets of runs. Also, doing single runs only will probably result in quite noisy results.
I've verified this does not change CodeGen for the test-suite + various external suites. Slightly positive impact on compile time (geomean compile time changes by -0.1% for test-suite + SPEC & co, with -O1 on X86).
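For reference, a geomean figure like this can be derived from repeated runs roughly as follows. This is only a minimal sketch, not the tooling used for the numbers above: the benchmark names, timings, and the `geomean_change` helper are hypothetical. It takes the median of several runs per benchmark to dampen noise, then averages the per-benchmark ratios geometrically.

```python
import math
import statistics


def geomean_change(baseline, patched):
    """Geometric mean of patched/baseline ratios, minus 1.

    Both arguments map a benchmark name to a list of timings from repeated
    runs. The median of each list is used so one noisy run does not dominate.
    A negative result means the patched build is faster on average.
    """
    ratios = []
    # Only compare benchmarks present in both sets of runs.
    for name in sorted(baseline.keys() & patched.keys()):
        base = statistics.median(baseline[name])
        new = statistics.median(patched[name])
        ratios.append(new / base)
    if not ratios:
        raise ValueError("no common benchmarks to compare")
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(log_mean) - 1.0


# Hypothetical compile times in seconds, three runs per benchmark.
baseline = {"bench_a": [10.0, 10.2, 9.9], "bench_b": [4.0, 4.1, 4.0]}
patched = {"bench_a": [9.8, 9.9, 10.0], "bench_b": [4.0, 3.9, 4.0]}

change = geomean_change(baseline, patched)
print(f"geomean change: {change:+.2%}")
```

With only a single run per benchmark the median is just that run, which is why single runs tend to give noisy comparisons.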
Remove unused MarkDirty.
Wed, May 15
As a way to test this on a larger number of cases, could you add it to the default pass pipeline (maybe at multiple points) and bootstrap clang with it (or build the test-suite & co)?
LGTM. It would be great to have Cortex-M4 use the MachineScheduler in the non -Oz cases. Let's keep track of the issues (I expect more to surface) and go from there.
Tue, May 14
The change in the AArch64 assembly looks good to me.
@sanjoy it would be great if you could have a quick look and check if the new approach makes sense to you.
Thanks for the excellent suggestions! I've updated the text to use the GEP-based description, as the GEP documentation spells out the behavior with respect to invalid pointers in detail. I've also incorporated Hal's comments.
Ping. Ayal, does this address your comments appropriately?
This looks like an interesting case! I think there are 2 things going wrong (see input to the scheduler below)
Mon, May 13
LGTM. I've also added a few other reviewers, in case they have additional thoughts.
Fri, May 10
Thu, May 9
Change behavior from undefined to poison when passing masks that zero out relevant bits of a pointer, thanks @aqjune
Wed, May 8
Add missing attributes to @init argument.
I went with adding a pointer-mask intrinsic, as suggested.
Tue, May 7
Use InstBB instead of I->getParent(). Thanks, I was looking at the wrong line :)
Filter operands for getIntrinsicCallCost as well.
Split out moving various functions to LoopVectorizationCostModel to D61638.
Mon, May 6
Remove the restriction to single-use PHIs. We can replace such PHIs with their incoming
value iff all uses are in the loop exit block or in the outer loop header, provided
the incoming value is defined in the inner loop header (reduction PHIs).
Sun, May 5
Add test case without loop exit, clang-format-diff
Fri, May 3
Limit this patch to visitTokenFactor, limiting the number of operands to inline to 2048 nodes.
Push even element checks to the impacted functions.
Thu, May 2
Relax limits to 1000 nodes to explore. Further experiments showed that these
larger limits are still sufficient to bound the quadratic compile-time behavior.
Wed, May 1
As per feedback here & on llvm-commits it seems like this is not the right thing at the moment. Thanks for taking a look!
Thanks for taking a look so quickly! I'll look into how best to set up DYLD_LIBRARY_PATH in that case. I am after a more lightweight solution than CLANG_ENABLE_BOOTSTRAP, i.e. creating/configuring the separate build stages manually.
Tue, Apr 30
I plan to add this on top of the integer range support patches, so I am abandoning this for now.