sanwou01 (Sanne Wouda)
Senior Software Engineer

Projects

User does not belong to any projects.

User Details

User Since
Jan 12 2017, 6:15 AM (350 w, 1 d)

Recent Activity

Jun 22 2021

sanwou01 accepted D103816: [SimpleLoopUnswich] Fix a bug on ComputeUnswitchedCost with partial unswitch.

Thanks, LGTM!

Jun 22 2021, 7:04 AM · Restricted Project
sanwou01 added a comment to D103816: [SimpleLoopUnswich] Fix a bug on ComputeUnswitchedCost with partial unswitch.

Sorry for the inconvenience, let me add a comment to the commit message.

Jun 22 2021, 5:22 AM · Restricted Project
sanwou01 added a comment to D103816: [SimpleLoopUnswich] Fix a bug on ComputeUnswitchedCost with partial unswitch.

I think I managed to figure out what's going on here in the end, but it would help to explain what the bug is in the commit message, something like:

Jun 22 2021, 3:50 AM · Restricted Project
sanwou01 added inline comments to D103816: [SimpleLoopUnswich] Fix a bug on ComputeUnswitchedCost with partial unswitch.
Jun 22 2021, 3:45 AM · Restricted Project

May 17 2021

sanwou01 added a comment to D102279: [InstCombine] Support one-hot merge for logical and/or.

@nikic this fixed the regression, thank you!

May 17 2021, 4:20 AM · Restricted Project
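For reference, the identity behind a "one-hot merge" is that two single-bit tests on the same value can be folded into one mask-and-compare. The C sketch below checks it exhaustively for 8-bit values. The function names are hypothetical, and the sketch does not reproduce the exact IR patterns InstCombine matches (in particular the select-based "logical" and/or forms this patch adds support for).

```c
#include <stdbool.h>
#include <stdint.h>

/* Two separate single-bit tests, written with short-circuit ||, which is
   what a "logical" or looks like before it is turned into a plain or. */
static bool either_bit_clear_separate(uint32_t x, uint32_t m1, uint32_t m2) {
    return (x & m1) == 0 || (x & m2) == 0;
}

/* Merged form: one mask and one compare. Valid when m1 and m2 each have
   exactly one bit set ("one-hot"); they may even be the same bit. */
static bool either_bit_clear_merged(uint32_t x, uint32_t m1, uint32_t m2) {
    uint32_t m = m1 | m2;
    return (x & m) != m;
}

/* The "and" dual: both bits set. */
static bool both_bits_set_separate(uint32_t x, uint32_t m1, uint32_t m2) {
    return (x & m1) != 0 && (x & m2) != 0;
}

static bool both_bits_set_merged(uint32_t x, uint32_t m1, uint32_t m2) {
    uint32_t m = m1 | m2;
    return (x & m) == m;
}

/* Exhaustively compare separate and merged forms for all 8-bit x and all
   pairs of bit positions 0..7. */
static bool one_hot_merge_holds(void) {
    for (uint32_t x = 0; x < 256; x++)
        for (int a = 0; a < 8; a++)
            for (int b = 0; b < 8; b++) {
                uint32_t m1 = 1u << a, m2 = 1u << b;
                if (either_bit_clear_separate(x, m1, m2) !=
                    either_bit_clear_merged(x, m1, m2))
                    return false;
                if (both_bits_set_separate(x, m1, m2) !=
                    both_bits_set_merged(x, m1, m2))
                    return false;
            }
    return true;
}
```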

May 11 2021

sanwou01 updated subscribers of D101191: [InstCombine] Fully disable select to and/or i1 folding.

Hi, we've got a ~6% regression in SPEC INT 2006 462.libquantum on AArch64 (both -flto and -Ofast) that bisects to this change. See https://godbolt.org/z/dq98Gqqxn for a reproducer (-fno-vectorize is not strictly necessary, but it does make the difference easier to spot). @dmgreen mentioned to me that we could probably fix this up in the AArch64 backend, but a fix in the mid-end might be more generally useful too.

May 11 2021, 9:31 AM · Restricted Project, Restricted Project

Apr 22 2021

sanwou01 added a comment to D100381: [RFC] Improve loop distribute cost model.

Thanks for the comments. I'm happy to leave LoopAccessAnalysis alone for now and focus on Loop Distribute's cost model. I was hoping that a cost model tweak or two might enable some more loop distribution in TSVC, but it looks like that isn't the case.

Apr 22 2021, 4:40 AM · Restricted Project, Restricted Project

Apr 21 2021

sanwou01 added a comment to D100381: [RFC] Improve loop distribute cost model.

Looking at TSVC a bit, @xbolva00:

  • s221 won't distribute because the second read of a[i] is removed by EarlyCSE, so there is no unique load instruction for a second loop. For this loop I'm not convinced that distribution is likely to help performance; it's a trade-off between (some) vectorization and re-loading both a[i] and d[i].
  • s222 also gets mangled by EarlyCSE, but the result would still be distributable if it weren't for the order of the stores to e[i] and a[i]. This runs into a limitation of LoopAccessAnalysis, which can't reorder instructions. Perhaps it could help to do a bit of scheduling on IR?
  • s2275, as mentioned above, runs into another LoopAccessAnalysis limitation: it only handles innermost loops. I'm not sure how easy it would be (if it is possible at all) to lift that restriction.
Apr 21 2021, 5:09 AM · Restricted Project, Restricted Project
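To make the s221 trade-off concrete, here is a minimal C sketch of distributing a loop with the general shape described above: one statement with no loop-carried dependence, followed by a recurrence that reads a[i] and d[i] again. The loop and function names are hypothetical, written for illustration rather than copied from TSVC. After distribution, the first loop is trivially vectorizable, but the second loop has to re-load a[i] and d[i] from memory, which is the cost weighed against the vectorization gain.

```c
#include <stddef.h>

/* Fused loop: the first statement has no loop-carried dependence,
   the second is a recurrence on b that also uses a[i] and d[i]. */
static void fused(size_t n, float *a, float *b, const float *c, const float *d) {
    for (size_t i = 1; i < n; i++) {
        a[i] += c[i] * d[i];
        b[i] = b[i - 1] + a[i] + d[i];
    }
}

/* Distributed: the first loop can now be vectorized on its own; the second
   keeps the serial recurrence and must re-load a[i] and d[i]. */
static void distributed(size_t n, float *a, float *b, const float *c, const float *d) {
    for (size_t i = 1; i < n; i++)
        a[i] += c[i] * d[i];
    for (size_t i = 1; i < n; i++)
        b[i] = b[i - 1] + a[i] + d[i];
}
```

Both versions perform the same operations in the same order for each element, so they produce the same results; only the memory traffic and the vectorization opportunities differ.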

Apr 20 2021

sanwou01 retitled D99596: [LoopDist] Distribute vectorizable loops from [RFC] [LoopDist] Distribute vectorizable loops to [LoopDist] Distribute vectorizable loops.
Apr 20 2021, 9:15 AM · Restricted Project, Restricted Project
sanwou01 added a comment to D99596: [LoopDist] Distribute vectorizable loops.

Now there are no differences in which loops get distributed in the test suite and SPEC before and after the patch, as intended.

Apr 20 2021, 9:15 AM · Restricted Project, Restricted Project
sanwou01 updated the diff for D99596: [LoopDist] Distribute vectorizable loops.

Rebased, and addressed the discrepancy in which loops get distributed. The difference hinges on loops that contain backward dependences which the loop vectorizer can handle but which would frustrate loop distribution. In that case, we don't distribute the loop and leave it to the loop vectorizer.

Apr 20 2021, 9:11 AM · Restricted Project, Restricted Project

Apr 15 2021

sanwou01 added a comment to D100381: [RFC] Improve loop distribute cost model.

IIRC SPEC was neutral except for the expected gain on hmmer, but I'll re-run them with the current patch.

Apr 15 2021, 6:38 AM · Restricted Project, Restricted Project

Apr 14 2021

sanwou01 added a comment to D100381: [RFC] Improve loop distribute cost model.

We also have some target-specific heuristics for loop-distribute, which focus on the number of memory streams a CPU can handle, IIRC. I haven't got around to posting them upstream yet. Let me go back and look at those heuristics.

Apr 14 2021, 6:28 AM · Restricted Project, Restricted Project
sanwou01 added reviewers for D99596: [LoopDist] Distribute vectorizable loops: lebedev.ri, nikic, davide.
Apr 14 2021, 6:26 AM · Restricted Project, Restricted Project
sanwou01 added a comment to D99596: [LoopDist] Distribute vectorizable loops.

Just by looking at this patch, I find it a bit difficult to get an overview of all the moving parts involved. E.g., this probably makes sense:

Loop distribute bails out early if a loop is already vectorizable.

but by not doing this, do we remove opportunities for the vectoriser? Perhaps the easiest thing is to get some perf numbers on the table?

Then, we can think about the cost-model too, and see if we can create some ideas about that. This pass is not enabled by default, so if perf numbers are okay and we don't make (downstream) users of this pass unhappy, it looks like a good step forward to me, but some ideas about steps after that would be good.

Apr 14 2021, 5:31 AM · Restricted Project, Restricted Project

Apr 13 2021

sanwou01 added a comment to D100381: [RFC] Improve loop distribute cost model.

That certainly looks encouraging. Just some remarks about the perf numbers first. I've found that the LLVM test suite can be quite noisy, and you certainly need to restrict it to the CTMark subset to get more meaningful numbers. Just checking, did you do this? Looking at all the tests, I think I see more than I would expect from CTMark, but I could be wrong. How about a SPEC run for something that runs a bit longer?

Apr 13 2021, 7:17 AM · Restricted Project, Restricted Project
sanwou01 accepted D93762: SCCP: Refactor SCCPSolver.

Since this is mostly just moving pre-existing code, I think it's fine to address the style issues in a separate NFC commit. I believe all comments have been addressed, but please wait a day or two before committing in case there is anything else. LGTM.

Apr 13 2021, 6:34 AM · Restricted Project
sanwou01 requested review of D100381: [RFC] Improve loop distribute cost model.
Apr 13 2021, 5:56 AM · Restricted Project, Restricted Project

Apr 12 2021

sanwou01 added a comment to D99790: [CGCall] Annotate `this` argument with alignment.

+1 on eagerly awaiting a fix. A 3% regression on astar (AArch64, LTO) bisects to @lebedev.ri's revert: https://reviews.llvm.org/rG6270b3a1eafaba4279e021418c5a2c5a35abc002 .

Apr 12 2021, 10:14 AM · Restricted Project, Restricted Project
sanwou01 added a comment to D93762: SCCP: Refactor SCCPSolver.

Thanks! This seems to be moving in the right direction. Now that SCCPInstVisitor is separate, its declaration can move into SCCPSolver.cpp entirely.

Apr 12 2021, 8:51 AM · Restricted Project

Apr 7 2021

sanwou01 accepted D100033: [NPM] Fix typo in isLTOPreLink for loop rotate.

Good spot! Thanks!

Apr 7 2021, 6:59 AM · Restricted Project

Apr 1 2021

sanwou01 added a comment to D93762: SCCP: Refactor SCCPSolver.

I think the issue here is that quite a number of implementation details are leaking into the header file where they don't belong because they are not part of the public interface. You're right that code using the header file doesn't have access to these by virtue of being private, but this does not make for a very readable header file, IMO.

Apr 1 2021, 9:39 AM · Restricted Project

Mar 30 2021

sanwou01 requested review of D99596: [LoopDist] Distribute vectorizable loops.
Mar 30 2021, 7:36 AM · Restricted Project, Restricted Project

Mar 8 2021

sanwou01 committed rG05a6e2eb9a41: [InstCombine] Add a combine for a shuffle of similar bitcasts (authored by sanwou01).
[InstCombine] Add a combine for a shuffle of similar bitcasts
Mar 8 2021, 8:37 AM
sanwou01 committed rG5e963a24415e: Rehome an orphaned comment [NFC] (authored by sanwou01).
Rehome an orphaned comment [NFC]
Mar 8 2021, 8:37 AM
sanwou01 closed D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.
Mar 8 2021, 8:37 AM · Restricted Project

Mar 5 2021

sanwou01 added a comment to D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.

@spatel @lebedev.ri thanks for the comments so far. Any other comments, or is this okay as is?

Mar 5 2021, 8:45 AM · Restricted Project

Mar 1 2021

sanwou01 updated the diff for D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.

Add a few more tests and some further test reduction.

Mar 1 2021, 8:27 AM · Restricted Project
sanwou01 added a comment to D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.

I'd like to see some test improvements:

  1. There are no tests with extra uses on bitcasts
  2. There are no tests with something like
%xb = bitcast <2 x half> %x to <2 x i16>
%yb = bitcast <2 x bfloat> %y to <2 x i16>
%r = shufflevector <2 x i16> %xb, <2 x i16> %yb, <4 x i32> <i32 3, i32 2, i32 1, i32 0>

I suspect this will miscompile?

  3. Some tests have unneeded stuff. They should only contain 2 bitcasts and a shufflevector (and a ret), nothing more.
Mar 1 2021, 6:32 AM · Restricted Project
sanwou01 retitled D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts from [InstCombine] Add a combine for a shuffle of identical bitcasts to [InstCombine] Add a combine for a shuffle of similar bitcasts.
Mar 1 2021, 5:12 AM · Restricted Project
sanwou01 updated the diff for D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.

Address comments.

Mar 1 2021, 5:12 AM · Restricted Project
sanwou01 added a comment to D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.

We intentionally do not create new shuffle masks in instcombine because we can't guarantee that codegen can lower arbitrary masks efficiently, but this patch seems fine since it just re-uses the existing mask.
If there is motivation to handle casts of different-sized elements (and therefore requires a new mask), you might look at building on VectorCombine::foldBitcastShuf(). We use the cost model there to avoid creating unsupported shuffles.

Mar 1 2021, 4:31 AM · Restricted Project
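Re-using the existing mask works because a bitcast between vector types with the same number of elements (and therefore the same element width) acts lane by lane, so it commutes with any shuffle of those lanes. Below is a minimal C model of that, assuming 32-bit IEEE floats and using hypothetical helper names. Arrays stand in for <2 x float> and <2 x i32>. This is an illustration of the equivalence, not InstCombine's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Model a <2 x float> -> <2 x i32> bitcast. Both element types are 32 bits
   wide (assumed), so lane i of the result is exactly the bits of lane i. */
static void bitcast_f32x2_to_i32x2(const float in[2], uint32_t out[2]) {
    memcpy(out, in, 2 * sizeof(float));
}

/* Model shufflevector on two 2-element vectors with a 4-element mask:
   indices 0-1 select from v1, indices 2-3 select from v2. */
static void shuffle_u32(const uint32_t v1[2], const uint32_t v2[2],
                        const int mask[4], uint32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = mask[i] < 2 ? v1[mask[i]] : v2[mask[i] - 2];
}

static void shuffle_f32(const float v1[2], const float v2[2],
                        const int mask[4], float out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = mask[i] < 2 ? v1[mask[i]] : v2[mask[i] - 2];
}

/* Returns 1 if shuffle(bitcast x, bitcast y, mask) has the same bits as
   bitcast(shuffle(x, y, mask)), using the same mask on both sides. */
static int shuffle_of_bitcasts_matches(const float x[2], const float y[2],
                                       const int mask[4]) {
    uint32_t xb[2], yb[2], lhs[4], rhs[4];
    float s[4];
    bitcast_f32x2_to_i32x2(x, xb);
    bitcast_f32x2_to_i32x2(y, yb);
    shuffle_u32(xb, yb, mask, lhs); /* shuffle of two bitcasts */
    shuffle_f32(x, y, mask, s);     /* shuffle first...        */
    memcpy(rhs, s, sizeof rhs);     /* ...then a single bitcast */
    return memcmp(lhs, rhs, sizeof lhs) == 0;
}
```

With different element widths, the lanes no longer line up, so an equivalent shuffle would need a new mask. That is the case the comment suggests handling via VectorCombine::foldBitcastShuf() instead.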

Feb 24 2021

sanwou01 added inline comments to D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.
Feb 24 2021, 9:43 AM · Restricted Project
sanwou01 added reviewers for D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts: spatel, dmgreen, SjoerdMeijer, fhahn.
Feb 24 2021, 9:24 AM · Restricted Project
sanwou01 requested review of D97397: [InstCombine] Add a combine for a shuffle of similar bitcasts.
Feb 24 2021, 9:22 AM · Restricted Project

Feb 18 2021

sanwou01 closed D96694: Use LoopRotate PrepareForLTO stage in NPM.

Committed as https://reviews.llvm.org/rG93d9a4c95aff . Looks like I forgot to add the "Differential Revision:" line :(

Feb 18 2021, 5:56 AM · Restricted Project

Feb 17 2021

sanwou01 committed rG93d9a4c95aff: Use LoopRotate PrepareForLTO stage in NPM (authored by sanwou01).
Use LoopRotate PrepareForLTO stage in NPM
Feb 17 2021, 6:07 AM
sanwou01 added a comment to D96694: Use LoopRotate PrepareForLTO stage in NPM.

I've adopted your suggestions, thanks!

Feb 17 2021, 6:05 AM · Restricted Project

Feb 15 2021

sanwou01 added reviewers for D96694: Use LoopRotate PrepareForLTO stage in NPM: dmgreen, fhahn.
Feb 15 2021, 1:41 AM · Restricted Project
sanwou01 requested review of D96694: Use LoopRotate PrepareForLTO stage in NPM.
Feb 15 2021, 1:40 AM · Restricted Project

Feb 8 2021

sanwou01 added a comment to D93762: SCCP: Refactor SCCPSolver.

This looks good to me. @fhahn are you happy with the suggested approach to tighten up the SCCPSolver class in subsequent patches, when function inlining starts using it?

Feb 8 2021, 11:28 AM · Restricted Project
sanwou01 added a comment to D93764: [LoopUnswitch] Implement first version of partial unswitching..

Maybe I'm missing something, but this change doesn't seem to be effective anymore after the new pass manager switcheroo. Did this pass not get ported, in favour of SimpleLoopUnswitch? This now shows up as a regression relative to (what will be) LLVM 12.

Feb 8 2021, 9:58 AM · Restricted Project

Feb 4 2021

sanwou01 added a comment to D93762: SCCP: Refactor SCCPSolver.

This looks like a fine first step to separate SCCPSolver from SCCP. I agree with @fhahn that the SCCPSolver class should be improved by pulling out implementation details into (separate, hidden) classes, but this is probably best reviewed in separate patches. At least it looks like the public interface of SCCPSolver is already restricted to what is needed by SCCP.

Feb 4 2021, 6:52 AM · Restricted Project

Feb 3 2021

sanwou01 added a comment to D88471: [Passes] Run peeling as part of simple/full loop unrolling..

Hey, we've run into a case where this patch causes a dead loop to appear which doesn't subsequently get removed. It's not a huge deal (we're seeing some great speed-ups from this patch too!), but a niggle that might be nice to address. Reproducer:

Feb 3 2021, 8:53 AM · Restricted Project

Jan 27 2021

sanwou01 added a comment to D93838: [SCCP] Add Function Specialization pass.

As the approach here is pretty much identical to D36432, do you have any plans for the cost model for this pass? The earlier proposal seems to have been abandoned due to a lack of good heuristics to trade off between the (potentially huge) increase in code size (as well as compile time) and performance improvements (which may or may not arise due to later optimizations). Perhaps it would help to look at what trade-offs GCC makes? Specifically, it would be good to have an idea of how you would implement getSpecializationCost and getSpecializationBonus.

Jan 27 2021, 6:13 AM · Restricted Project, Restricted Project

Jan 18 2021

sanwou01 accepted D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

Looks good to me!

Jan 18 2021, 9:49 AM · Restricted Project
sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

[...]

Yes, I think the only thing outstanding with respect to testing is the omnetpp_r failure. Can you confirm if that is caused by the patch or not?

And any review of the patch would be appreciated of course!

Jan 18 2021, 2:09 AM · Restricted Project

Jan 15 2021

sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

So on SPEC 2006 this fixes astar (+9.6%) as well as shakes things up enough to "fix" h264ref (+9.0%, see D93946). Other notable changes are libquantum (+3.2%) and omnetpp (-2.8%). Geomean is +1.5%.

Thank you very much for running the numbers!

Just to double check, positive here means good (I assume increase in score)?

Jan 15 2021, 3:19 AM · Restricted Project

Jan 14 2021

sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

So on SPEC 2006 this fixes astar (+9.6%) as well as shakes things up enough to "fix" h264ref (+9.0%, see D93946). Other notable changes are libquantum (+3.2%) and omnetpp (-2.8%). Geomean is +1.5%.

Jan 14 2021, 9:40 AM · Restricted Project
sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

Turns out the failing benchmarks are due to a miscompilation with -g3 (which we add to profiled runs). The patch does seem to make that miscompilation more likely. I'll try to reduce that separately, but at least I'll have some performance numbers shortly.

Jan 14 2021, 7:50 AM · Restricted Project
sanwou01 added a comment to D93946: [FuncAttrs] Infer noreturn.

Could you do some initial investigation or come up with a repro? Maybe an optimization remark that doesn't fire after this change?

Yes, I will do an initial investigation, hopefully resulting in a reproducer. It is always a bit of a challenge with LTO benchmarks, but the regression is large enough that it should be easy to spot.

Jan 14 2021, 5:53 AM · Restricted Project

Jan 12 2021

sanwou01 added a comment to D93946: [FuncAttrs] Infer noreturn.

Could you do some initial investigation or come up with a repro? Maybe an optimization remark that doesn't fire after this change?

Jan 12 2021, 10:12 AM · Restricted Project
sanwou01 added a comment to D93946: [FuncAttrs] Infer noreturn.

Hi folks, I'm afraid an 8% regression in h264ref (SPEC INT 2006) bisects to this patch. This is on AArch64 with -flto on a Neoverse-N1 based system, but I wouldn't be surprised if this showed up elsewhere too.

Jan 12 2021, 7:50 AM · Restricted Project

Jan 8 2021

sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

Yep, definitely a problem on our end; I've got similar symptoms in other runs. Sorry about the noise, and I will try to get some perf data once we sort this out.

Jan 8 2021, 6:24 AM · Restricted Project
sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

Our flags are -mcpu=native -O3 -fomit-frame-pointer -flto on a Neoverse-N1 based system, so -mcpu=neoverse-n1 should do the same thing.

Jan 8 2021, 4:13 AM · Restricted Project

Jan 7 2021

sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

@fhahn I'm afraid I'm seeing runtime and verification errors on perlbench, gcc, gobmk, and astar in SPEC INT 2006. I'm guessing something later in the pipe really doesn't like it when certain loops aren't rotated?

Jan 7 2021, 9:26 AM · Restricted Project
sanwou01 added a comment to D94232: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP)..

Thanks, Florian. I'll give the patch a run through our benchmarking.

Jan 7 2021, 6:39 AM · Restricted Project
sanwou01 added a comment to D89896: Add loop distribution to the LTO pipeline.

@pzheng I don't have perf data for Cortex-A57, but I would certainly expect to see an uplift there too.

Jan 7 2021, 1:34 AM · Restricted Project

Jan 6 2021

sanwou01 added a comment to D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline.

I had a first look and it seems like there are some bad interactions with the LTO pipeline. The problem in this case is the following: we now rotate a loop just before the vectorizer, which requires duplicating a function call into the preheader when compiling the individual files. But this then stops inlining during LTO. I'll look into whether we should avoid rotating such loops in the 'prepare-for-lto' stage.

Jan 6 2021, 3:02 AM · Restricted Project
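For context, loop rotation turns a top-tested loop into a guarded bottom-tested one by copying the header's exit test into the preheader. The minimal C sketch below uses hypothetical names to show why that duplicates a call that lives in the header. The number of calls executed stays the same, but the number of call sites in the code doubles. The sketch only shows the duplication, not its effect on the LTO inliner.

```c
/* Hypothetical stand-in for the call in the loop header; counts calls. */
static int calls_to_keep_going;
static int keep_going(int i, int n) {
    calls_to_keep_going++;
    return i < n;
}

/* Before rotation: the header evaluates the condition (one call site). */
static int sum_unrotated(int n) {
    int sum = 0, i = 0;
    while (keep_going(i, n)) {
        sum += i;
        i++;
    }
    return sum;
}

/* After rotation: the condition is checked once in the guard (preheader)
   and again at the bottom of the loop, giving two call sites. */
static int sum_rotated(int n) {
    int sum = 0, i = 0;
    if (keep_going(i, n)) {
        do {
            sum += i;
            i++;
        } while (keep_going(i, n));
    }
    return sum;
}
```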

Dec 15 2020

sanwou01 added a comment to D89896: Add loop distribution to the LTO pipeline.

Hi, our flags are "-Ofast -flto -mcpu=native -fomit-frame-pointer", where "native" is a Neoverse-N1 system. Let me know if that helps.

Dec 15 2020, 1:52 AM · Restricted Project

Dec 1 2020

sanwou01 added a comment to D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline.

Just wanted to add to @fhahn that we're seeing this too, as an even more notable 8% regression (same benchmark, different hardware). Florian's explanation matches what I'm seeing, but I hadn't had a chance to confirm what was happening.

Dec 1 2020, 2:10 AM · Restricted Project

Nov 10 2020

sanwou01 committed rGdd03881bd504: Add loop distribution to the LTO pipeline (authored by sanwou01).
Add loop distribution to the LTO pipeline
Nov 10 2020, 4:04 AM
sanwou01 closed D89896: Add loop distribution to the LTO pipeline.
Nov 10 2020, 4:04 AM · Restricted Project

Nov 9 2020

sanwou01 committed rGf4f256fb7b7e: Reland "Precommit LTO pipeline test" (authored by sanwou01).
Reland "Precommit LTO pipeline test"
Nov 9 2020, 3:37 AM

Nov 6 2020

sanwou01 added a comment to D88126: [Machinesink] add more profitable pattern if target bb register pressure is not too high.

Hi, this is regressing a few internal workloads (physics simulations, AArch64) by a few percent. Did you do any performance measurements for this change?

Nov 6 2020, 7:16 AM · Restricted Project

Nov 3 2020

sanwou01 reopened D89896: Add loop distribution to the LTO pipeline.

Reverted because the new test was failing on a bunch of buildbots. I'll try again tomorrow; it looks like the other pipeline tests manage to work around the issue.

Nov 3 2020, 1:03 PM · Restricted Project
sanwou01 added a reverting change for rG5a72a1623e4a: Precommit LTO pipeline test: rGe969ab43202e: Revert "Precommit LTO pipeline test".
Nov 3 2020, 11:30 AM
sanwou01 committed rGe969ab43202e: Revert "Precommit LTO pipeline test" (authored by sanwou01).
Revert "Precommit LTO pipeline test"
Nov 3 2020, 11:30 AM
sanwou01 added a reverting change for rG6e80318eecde: Add loop distribution to the LTO pipeline: rG2ec26d3a2315: Revert "Add loop distribution to the LTO pipeline".
Nov 3 2020, 11:29 AM
sanwou01 committed rG2ec26d3a2315: Revert "Add loop distribution to the LTO pipeline" (authored by sanwou01).
Revert "Add loop distribution to the LTO pipeline"
Nov 3 2020, 11:29 AM
sanwou01 added a reverting change for D89896: Add loop distribution to the LTO pipeline: rG2ec26d3a2315: Revert "Add loop distribution to the LTO pipeline".
Nov 3 2020, 11:29 AM · Restricted Project
sanwou01 committed rG6e80318eecde: Add loop distribution to the LTO pipeline (authored by sanwou01).
Add loop distribution to the LTO pipeline
Nov 3 2020, 10:54 AM
sanwou01 committed rG5a72a1623e4a: Precommit LTO pipeline test (authored by sanwou01).
Precommit LTO pipeline test
Nov 3 2020, 10:54 AM
sanwou01 closed D89896: Add loop distribution to the LTO pipeline.
Nov 3 2020, 10:54 AM · Restricted Project

Oct 21 2020

sanwou01 added a comment to D89896: Add loop distribution to the LTO pipeline.

@SjoerdMeijer yeaahhh these pipeline tests are a bit of a pain, but nothing some big-brain sed scripting can't solve.

Oct 21 2020, 1:06 PM · Restricted Project
sanwou01 updated the diff for D89896: Add loop distribution to the LTO pipeline.

Added LTO pipeline test

Oct 21 2020, 1:04 PM · Restricted Project
sanwou01 added reviewers for D89896: Add loop distribution to the LTO pipeline: SjoerdMeijer, dmgreen, anemet, efriedma.
Oct 21 2020, 10:19 AM · Restricted Project
sanwou01 requested review of D89896: Add loop distribution to the LTO pipeline.
Oct 21 2020, 10:12 AM · Restricted Project

Sep 29 2020

sanwou01 abandoned D88423: Fix llvm-link assert failure in BitCodeWriter.
Sep 29 2020, 2:09 AM · Restricted Project
sanwou01 added a comment to D88241: OpaquePtr: Add type to sret attribute.

Thanks @tpopp, that'll unblock all of us.

Sep 29 2020, 2:09 AM · Restricted Project
sanwou01 added inline comments to D88423: Fix llvm-link assert failure in BitCodeWriter.
Sep 29 2020, 1:52 AM · Restricted Project

Sep 28 2020

sanwou01 added inline comments to D88423: Fix llvm-link assert failure in BitCodeWriter.
Sep 28 2020, 12:28 PM · Restricted Project
sanwou01 added a comment to D88423: Fix llvm-link assert failure in BitCodeWriter.

I'm not sure I understand the reason behind moving the EnumerateType call from incorporateFunction to the ValueEnumerator constructor. We don't walk the attributes before that, do we?

Sep 28 2020, 11:54 AM · Restricted Project
sanwou01 closed D87231: [AArch64] Match pairwise add/fadd pattern.

Committed as d5fd3d9b903e

Sep 28 2020, 9:49 AM · Restricted Project
sanwou01 added reviewers for D88423: Fix llvm-link assert failure in BitCodeWriter: arsenm, t.p.northover, dblaikie, efriedma.
Sep 28 2020, 8:33 AM · Restricted Project
sanwou01 requested review of D88423: Fix llvm-link assert failure in BitCodeWriter.
Sep 28 2020, 8:32 AM · Restricted Project

Sep 22 2020

sanwou01 added a comment to D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling.

SPEC 2017 on AArch64 is neutral on the geomean. The only slight worry is omnetpp with a 1% regression, but this is balanced by a 0.8% improvement on mcf. Other changes are in the noise.

Sep 22 2020, 6:54 AM · Restricted Project, Restricted Project

Sep 18 2020

sanwou01 added a comment to D87188: [InstCombine] Canonicalize SPF to abs intrinc.

I know this has already been reverted, but just FYI: I've bisected a ~2% regression in SPEC2017 x264_r on AArch64 to this commit. Presumably this is due to the extra unrolling / cost-modelling issue already mentioned?

Sep 18 2020, 8:44 AM · Restricted Project, Restricted Project

Sep 17 2020

sanwou01 committed rGd5fd3d9b903e: [AArch64] Match pairwise add/fadd pattern (authored by sanwou01).
[AArch64] Match pairwise add/fadd pattern
Sep 17 2020, 8:28 AM
sanwou01 committed rG3ee87a976d52: Precommit test updates (authored by sanwou01).
Precommit test updates
Sep 17 2020, 8:28 AM
sanwou01 added inline comments to D87231: [AArch64] Match pairwise add/fadd pattern.
Sep 17 2020, 7:00 AM · Restricted Project
sanwou01 updated the diff for D87231: [AArch64] Match pairwise add/fadd pattern.

Fix for when there is no fp16 faddp + testing

Sep 17 2020, 6:56 AM · Restricted Project
sanwou01 accepted D87816: [clang] Fix incorrect call to TextDiagnostic::printDiagnosticMessage.

LGTM, thanks for fixing this! Could you wait a day or two before committing to allow others to comment?

Sep 17 2020, 1:42 AM · Restricted Project

Sep 16 2020

sanwou01 retitled D87231: [AArch64] Match pairwise add/fadd pattern from [AArch64] Match pairwise fadd pattern to [AArch64] Match pairwise add/fadd pattern.
Sep 16 2020, 5:52 AM · Restricted Project
sanwou01 updated the diff for D87231: [AArch64] Match pairwise add/fadd pattern.

Extend to f16, f32, f64 and i64

Sep 16 2020, 5:50 AM · Restricted Project
sanwou01 added inline comments to D87231: [AArch64] Match pairwise add/fadd pattern.
Sep 16 2020, 3:29 AM · Restricted Project
sanwou01 retitled D87231: [AArch64] Match pairwise add/fadd pattern from [AArch64] ExtractElement is free when combined with pairwise add to [AArch64] Match pairwise fadd pattern.
Sep 16 2020, 2:41 AM · Restricted Project
sanwou01 updated the diff for D87231: [AArch64] Match pairwise add/fadd pattern.

Rework to match faddp in AArch64 ISel lowering

Sep 16 2020, 2:41 AM · Restricted Project
sanwou01 added a comment to D87231: [AArch64] Match pairwise add/fadd pattern.

Thanks for the feedback. I agree that ideally we'd be generating reduction intrinsics in IR and matching that in the backends. I don't think the pairwise add can be represented with the current intrinsics though: we'd need a <2 x float> variant, or a predicated version of the <4 x float> intrinsic to do this for strict FP math, I believe.

Sep 16 2020, 2:39 AM · Restricted Project
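To illustrate the strict-FP point: the pattern being matched adds lanes 0 and 1, which is what AArch64's scalar FADDP computes on a two-lane source. For two lanes, the in-order sum and the pairwise sum are the same operation. With more lanes, an in-order reduction and a pairwise tree can round differently, so one can't be substituted for the other without reassociation. Below is a minimal C sketch with hypothetical names. The input values are arbitrary and were chosen only to expose the rounding difference.

```c
/* Scalar model of the matched pattern: extract lanes 0 and 1 and add them. */
static float add_lanes_0_1(const float v[4]) {
    return v[0] + v[1];
}

/* Strictly ordered (sequential) sum of four lanes, as strict FP requires. */
static float ordered_sum4(const float v[4]) {
    return ((v[0] + v[1]) + v[2]) + v[3];
}

/* Pairwise (tree) sum of four lanes: only legal if reassociation is allowed. */
static float pairwise_sum4(const float v[4]) {
    return (v[0] + v[1]) + (v[2] + v[3]);
}
```

With {1, 1e20, -1e20, 1}, the ordered sum absorbs the first 1 but keeps the last one, while the pairwise sum absorbs both.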

Sep 8 2020

sanwou01 added a comment to D87231: [AArch64] Match pairwise add/fadd pattern.

Thanks @spatel. You're right that we miss that pattern, but so does x86 currently, it seems (I don't read x86 very well, so I might be wrong). Using your faddp example:

Sep 8 2020, 2:34 AM · Restricted Project