sebpop (Sebastian Pop)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 4 2014, 10:15 AM (206 w, 6 d)

Recent Activity

Fri, Jul 20

sebpop accepted D49617: Early exit with cheaper checks.

looks good

Fri, Jul 20, 6:22 PM
sebpop accepted D49555: [GVNHoist] safeToHoistLdSt incorrectly checks whether a defining access dominates the insertion point.

looks good, thanks!

Fri, Jul 20, 6:17 PM

Thu, Jun 28

sebpop accepted D47893: Add a PhiValuesAnalysis pass to calculate the underlying values of phis.

Looks good to me. Thanks.

Thu, Jun 28, 5:18 AM

Mon, Jun 25

sebpop accepted D48481: [DA] Delinearise AddRecs if we can prove they don't wrap.

Looks good, please commit. Thanks!

Mon, Jun 25, 6:53 AM

Jun 22 2018

sebpop requested changes to D47893: Add a PhiValuesAnalysis pass to calculate the underlying values of phis.
Jun 22 2018, 1:10 PM

Jun 20 2018

sebpop accepted D45872: [DA] Enable -da-delinearize by default.

lgtm thanks!

Jun 20 2018, 9:01 AM

May 24 2018

sebpop added a comment to D24033: Convert clamp into fmaxnum/fminnum pairs..

In the following experiment, a positive number is an increase in performance;
the best score was taken out of 3 runs on a Firefly AArch64 A-72 device:

May 24 2018, 9:13 AM
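The revision title names the transformation being benchmarked here. As a rough illustration only (plain C rather than LLVM IR, and not the patch's code), the compare/select clamp and its max/min form can be written as below. The helpers max_num and min_num are hypothetical; they model the IEEE-754 maxNum/minNum behaviour that llvm.maxnum/llvm.minnum follow, where a single NaN operand yields the other operand.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helpers modelling IEEE-754 maxNum/minNum: when exactly
   one operand is NaN, the other operand is returned. */
static float max_num(float a, float b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a > b ? a : b;
}

static float min_num(float a, float b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a < b ? a : b;
}

/* Clamp written with compares and selects. */
float clamp_select(float x, float lo, float hi) {
    assert(lo <= hi);
    return x < lo ? lo : (x > hi ? hi : x);
}

/* The same clamp expressed as a max/min pair. */
float clamp_minmax(float x, float lo, float hi) {
    assert(lo <= hi);
    return min_num(max_num(x, lo), hi);
}
```

For ordinary inputs both functions agree. For x = NaN, however, clamp_select returns NaN (both compares are false) while clamp_minmax returns lo, so a rewrite of this kind presumably needs to know that the input cannot be NaN, for example from fast-math flags.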
sebpop added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

The experiment reports the best CPU2000 score out of 3 runs on the A-72 of a Firefly device.
A positive number is a better score.

May 24 2018, 9:12 AM

May 23 2018

sebpop updated the diff for D24033: Convert clamp into fmaxnum/fminnum pairs..

Updated patch. I will post perf numbers on some benchmarks with this patch.

May 23 2018, 9:32 AM
sebpop commandeered D24033: Convert clamp into fmaxnum/fminnum pairs..
May 23 2018, 9:30 AM

May 22 2018

sebpop added a comment to D46477: [AARCH64] Gang up loads and stores (for memcpy) for pairing..

You know you can either just use arc patch and automagically get a
nice commit message,
or at least manually add "Differential Revision: <link>", and the
review will get closed automatically?

May 22 2018, 3:27 PM
sebpop updated the diff for D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Following Eli's recommendation, the patch does not modify memmove.
I will post updated numbers on top of the improved code generation
for memcpy: https://reviews.llvm.org/rL332482

May 22 2018, 10:33 AM
sebpop closed D46477: [AARCH64] Gang up loads and stores (for memcpy) for pairing..

Committed in https://reviews.llvm.org/rL332482

May 22 2018, 10:28 AM
sebpop added a comment to D44564: [BasicAA] Use PhiValuesAnalysis if available when handling phi alias.

After some tinkering I've come up with the following solution (I have something that seems to work, but it needs cleaning up and testing):

  • Add a PhiAnalysis analysis pass which returns a PhiInfo
May 22 2018, 9:32 AM
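The quoted proposal, like the title of D47893, is about computing the underlying values of phis: the non-phi values a phi can ultimately take once chains of phis are looked through. A minimal C sketch of that computation over a hypothetical toy value graph follows (Value, underlying_values and the bitmask encoding are illustration only, not LLVM's PhiValues API). It is a depth-first walk that marks phis as visited so that loop-carried phi cycles terminate.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VALUES 16
#define MAX_INCOMING 4

/* Hypothetical toy IR value: either a phi with incoming values
   (given as indices into the value array) or a non-phi leaf. */
typedef struct {
    bool is_phi;
    int num_incoming;
    int incoming[MAX_INCOMING];
} Value;

/* Depth-first walk from value v. Phis are looked through; every
   non-phi value reached is recorded as bit v in *out. The visited
   array keeps the walk from looping forever on phi cycles. */
static void collect(const Value *vals, int v, bool *visited, unsigned *out) {
    if (visited[v])
        return;
    visited[v] = true;
    if (!vals[v].is_phi) {
        *out |= 1u << v;
        return;
    }
    for (int i = 0; i < vals[v].num_incoming; i++)
        collect(vals, vals[v].incoming[i], visited, out);
}

/* Returns the underlying (non-phi) values of `root` as a bitmask
   over value indices; n must not exceed MAX_VALUES. */
unsigned underlying_values(const Value *vals, int n, int root) {
    bool visited[MAX_VALUES] = {false};
    unsigned out = 0;
    assert(n <= MAX_VALUES && root >= 0 && root < n);
    collect(vals, root, visited, &out);
    return out;
}
```

A real analysis would also cache results and invalidate them when the IR changes; this sketch does neither.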

May 17 2018

sebpop added a comment to D46193: [LSR] Skip LSR if the cost of input is cheaper than LSR's solution.

I tried this patch on exynos-m3 and there are several benchmarks improving by about 5%.
Among those benchmarks are spec2000 188.ammp and 256.bzip2 that improve by 3%.
All performance degradations are within noise level.

May 17 2018, 2:00 PM

May 14 2018

sebpop accepted D46477: [AARCH64] Gang up loads and stores (for memcpy) for pairing..

LGTM with some minor changes.

May 14 2018, 2:17 PM
sebpop added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Looking at the generated code a bit, it looks like we do a really terrible job lowering memcpy; we don't form ldp/stp at all, ever. We should probably fix that before we mess with the threshold here; it could substantially change the codesize/performance impact of this change.

May 14 2018, 12:27 PM

May 11 2018

sebpop added a comment to D45821: [AArch64] improve code generation of vectors smaller than 64 bit.

I am rerunning the benchmarks with the patch applied on top of https://reviews.llvm.org/D46655 which fixes one of the problems exposed by this patch.

May 11 2018, 12:01 PM

May 9 2018

sebpop added a comment to D45821: [AArch64] improve code generation of vectors smaller than 64 bit.

I am rerunning the benchmarks with the patch applied on top of https://reviews.llvm.org/D46655 which fixes one of the problems exposed by this patch.

May 9 2018, 12:01 PM
sebpop added a comment to D46655: [AArch64] Improve single vector lane stores.

This fixes a perf regression we were seeing with generation of vectors smaller than 64 bit: https://reviews.llvm.org/D45821

May 9 2018, 11:58 AM
sebpop added inline comments to D46193: [LSR] Skip LSR if the cost of input is cheaper than LSR's solution.
May 9 2018, 11:48 AM

May 8 2018

sebpop added inline comments to D46477: [AARCH64] Gang up loads and stores (for memcpy) for pairing..
May 8 2018, 9:24 AM

Apr 27 2018

sebpop accepted D45695: [CodeGen] Use RegUnits to track register aliases (NFC).

looks good, thanks!

Apr 27 2018, 9:40 AM
sebpop added a comment to D46193: [LSR] Skip LSR if the cost of input is cheaper than LSR's solution.

I like this change, thanks for implementing it!

Apr 27 2018, 8:58 AM

Apr 19 2018

sebpop added inline comments to D45821: [AArch64] improve code generation of vectors smaller than 64 bit.
Apr 19 2018, 1:15 PM
sebpop updated the diff for D45821: [AArch64] improve code generation of vectors smaller than 64 bit.

Ran clang-format, added a test case, and fixed all failing "make check" tests.

Apr 19 2018, 1:06 PM
sebpop added a comment to D45821: [AArch64] improve code generation of vectors smaller than 64 bit.

I am adding test cases for the new vectorized types, and will update the patch shortly.

Apr 19 2018, 9:01 AM
sebpop created D45821: [AArch64] improve code generation of vectors smaller than 64 bit.
Apr 19 2018, 8:58 AM

Apr 5 2018

sebpop accepted D45287: [InstCombine] Properly change GEP type when reassociating loop invariant GEP chains.

Looks good. Thanks!

Apr 5 2018, 9:44 AM

Apr 4 2018

sebpop abandoned D45229: [MI-sched] schedule following instruction latencies.

Thanks @fhahn for the pointer: I'm closing this revision and will try to fix the problem with something similar to D38279.
I tried that code, and it does not modify the current behaviour of the scheduler.

Apr 4 2018, 1:29 PM
sebpop added a comment to D45229: [MI-sched] schedule following instruction latencies.
  • Are you sure you are using a good scheduling model?

I have seen badly scheduled code for -mcpu=cortex-a57 and the exynos-m* tunings.

Apr 4 2018, 11:34 AM

Apr 3 2018

sebpop created D45229: [MI-sched] schedule following instruction latencies.
Apr 3 2018, 2:21 PM
sebpop updated subscribers of D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

After helping the compiler with restrict, I see pretty good code generated for this example:

#include <string.h>

void fun(char * restrict in, char * restrict out) {
  memcpy(out, in, 100);
}

LLVM produces:

	ldp	q0, q1, [x0, #64]
	stp	q0, q1, [x1, #64]
	ldp	q0, q1, [x0, #32]
	stp	q0, q1, [x1, #32]
	ldp	q0, q1, [x0]
	ldr	w8, [x0, #96]
	str	w8, [x1, #96]
	stp	q0, q1, [x1]
	ret

And here is the testcase I was looking at before, which produces the mix of ldr/str:

#include <string.h>

void fun(char *in, char *out) {
  memcpy(out, in, 100);
}

Without restrict, in and out may alias, so the MI scheduler is unable to move an ldr past a str:

	ldr	w8, [x0, #96]
	str	w8, [x1, #96]
	ldr	q0, [x0, #80]
	str	q0, [x1, #80]
	ldr	q0, [x0, #64]
	str	q0, [x1, #64]
	ldr	q0, [x0, #48]
	str	q0, [x1, #48]
	ldr	q0, [x0, #32]
	str	q0, [x1, #32]
	ldr	q0, [x0, #16]
	str	q0, [x1, #16]
	ldr	q0, [x0]
	str	q0, [x1]
	ret

For this to work, the code generator expanding memcpy in getMemcpyLoadsAndStores()
needs to be amended to produce more than one ldr/str at a time.
The target should be able to specify the number of consecutive loads and stores to be produced.
For generic AArch64 that should be 2, so that we can produce an ldp; stp sequence.
For Exynos processors it should be a much higher number, like 8, as it is better to schedule all the loads together and all the stores together.

Apr 3 2018, 12:53 PM
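The grouping proposed in the comment above can be modelled outside the compiler. The following is a hypothetical C model, not the code of getMemcpyLoadsAndStores(). It splits a copy into chunks of decreasing width, then emits the loads of "group" consecutive chunks before their stores, and writes the resulting order as a string of ops such as L16@32 (a 16-byte load at offset 32) or S4@96 (a 4-byte store at offset 96). All names are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One piece of the copy: `width` bytes at byte `offset`. */
typedef struct { int offset; int width; } Chunk;

/* Split `size` bytes into chunks, widest first (16, 8, 4, 2, 1 bytes).
   Returns the number of chunks written to `out`. */
static int split_chunks(int size, Chunk *out, int max_chunks) {
    static const int widths[] = {16, 8, 4, 2, 1};
    int n = 0, off = 0;
    for (size_t w = 0; w < sizeof widths / sizeof widths[0]; w++) {
        while (size - off >= widths[w] && n < max_chunks) {
            out[n].offset = off;
            out[n].width = widths[w];
            off += widths[w];
            n++;
        }
    }
    return n;
}

/* Append one op such as "L16@32" (load) or "S4@96" (store) to buf. */
static void append_op(char *buf, size_t buflen, char kind, Chunk c) {
    size_t len = strlen(buf);
    if (len + 1 >= buflen)
        return; /* buffer full: drop the op */
    snprintf(buf + len, buflen - len, "%s%c%d@%d",
             len ? " " : "", kind, c.width, c.offset);
}

/* Emit a copy of `size` bytes, issuing the loads of `group`
   consecutive chunks before their stores. group == 1 gives the
   interleaved ldr/str order; group == 2 places two 16-byte loads
   next to each other so they can pair into an ldp (and the stores
   into an stp). */
void emit_memcpy_ops(int size, int group, char *buf, size_t buflen) {
    Chunk chunks[64];
    assert(group >= 1 && buflen > 0);
    int n = split_chunks(size, chunks, 64);
    buf[0] = '\0';
    for (int i = 0; i < n; i += group) {
        int end = i + group < n ? i + group : n;
        for (int j = i; j < end; j++)
            append_op(buf, buflen, 'L', chunks[j]);
        for (int j = i; j < end; j++)
            append_op(buf, buflen, 'S', chunks[j]);
    }
}
```

For the 100-byte copy above, group = 1 yields L16@0 S16@0 L16@16 S16@16 and so on, while group = 2 yields L16@0 L16@16 S16@0 S16@16 and so on, which is the order from which ldp/stp pairs can be formed. With group = 8, all seven loads come before all seven stores.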
sebpop accepted D45206: [LoopInterchange] Add remark for calls preventing interchanging..

LGTM. Thanks!

Apr 3 2018, 8:57 AM
sebpop accepted D45207: [LoopInterchange] Update tests so DA can handle access after D35430..

lgtm. Thanks!

Apr 3 2018, 8:56 AM

Mar 30 2018

sebpop added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Looking at the generated code a bit, it looks like we do a really terrible job lowering memcpy; we don't form ldp/stp at all, ever.

Mar 30 2018, 1:35 PM
sebpop added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Should we check for hasNEON() here? The generic code doesn't know AArch64 has ldp/stp, so we might want to be a little more aggressive to compensate.

Mar 30 2018, 11:38 AM
sebpop created D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Mar 30 2018, 9:19 AM

Mar 26 2018

sebpop closed D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.

Committed in https://reviews.llvm.org/rL328539

Mar 26 2018, 9:23 AM

Mar 23 2018

sebpop accepted D44758: [LSR] Allow giving priority to post-incrementing addressing modes.

The change looks good to me.
You may want to add a testcase to make sure this continues to work in the future.

Mar 23 2018, 4:31 PM

Mar 19 2018

sebpop added a comment to D38351: MIScheduler improved handling of copied physregs.

I tried this patch on aarch64 A72 firefly linux on a set of benchmarks.
Overall the performance degraded by 35% cumulatively (sum of all speedups and slowdowns).
There were 5 benchmarks that sped up by more than 1% and 12 that slowed down by >1%.
One benchmark slowed down by >10% and three by >5%.
I will investigate these slowdowns.

Mar 19 2018, 8:41 AM
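The cumulative figure above is described as the plain sum of all per-benchmark speedups and slowdowns. Here is a minimal sketch of that bookkeeping, with hypothetical names, assuming each input is one benchmark's percentage change where positive means faster:

```c
#include <assert.h>

/* Hypothetical summary of one benchmark run. */
typedef struct {
    double sum;  /* cumulative change: plain sum of all deltas */
    int faster;  /* benchmarks that improved by more than threshold */
    int slower;  /* benchmarks that regressed by more than threshold */
} PerfSummary;

/* deltas[i] is the percentage change of benchmark i (positive = faster);
   threshold is the noise cut-off in percent, e.g. 1.0. */
PerfSummary summarize(const double *deltas, int n, double threshold) {
    PerfSummary s = {0.0, 0, 0};
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        s.sum += deltas[i];
        if (deltas[i] > threshold)
            s.faster++;
        else if (deltas[i] < -threshold)
            s.slower++;
    }
    return s;
}
```

Because this is a plain sum, every benchmark carries equal weight and a single large regression can outweigh several small gains. It is also not the geometric mean of ratios that SPEC itself reports.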
sebpop added a comment to D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.

Ping.

Mar 19 2018, 8:17 AM

Mar 16 2018

sebpop added a comment to D41463: [CodeGen] Add a new pass for PostRA sink.

Each "{ instructions }" represents a packet of instructions.
Each packet executes in 1 cycle.
Overall, before the patch there were 3 packets, after the patch 4 packets.
On one of the paths we go from 2 cycles to 3 cycles.

Mar 16 2018, 11:34 AM
sebpop added a comment to D41463: [CodeGen] Add a new pass for PostRA sink.

The result takes one extra packet, which is a perf regression on Hexagon.
I think this is because the copies are sunk post-RA, and
there doesn't seem to be a propagation pass to remove the extra transfer r2=r0.

Mar 16 2018, 11:02 AM
sebpop added a comment to D41463: [CodeGen] Add a new pass for PostRA sink.

Krzysztof, here is the assembly before this patch:

Mar 16 2018, 10:57 AM
sebpop added a comment to D41463: [CodeGen] Add a new pass for PostRA sink.

I got more data for the benchmarks that slowed down, and I see that the variation is within the noise level.
Thanks for checking the performance on your side.

Mar 16 2018, 8:14 AM

Mar 15 2018

sebpop accepted D44177: [JumpThreading] use UnreachableBlocks to avoid unreachable regions.

LGTM. Thanks!

Mar 15 2018, 7:45 AM

Mar 14 2018

sebpop added a comment to D41463: [CodeGen] Add a new pass for PostRA sink.

I tried this patch on aarch64 A72 firefly linux on a set of benchmarks.
Overall the performance improved by 11% cumulatively (sum of all speedups and slowdowns).
7 benchmarks improved by more than 1% and 4 degraded by more than 1%: one by 4% and another by 3%.
I will investigate the 4% and 3% regressions.

Mar 14 2018, 2:06 PM
sebpop added a comment to D38351: MIScheduler improved handling of copied physregs.

I tried this patch on aarch64 A72 firefly linux on a set of benchmarks.
Overall the performance degraded by 35% cumulatively (sum of all speedups and slowdowns).
There were 5 benchmarks that sped up by more than 1% and 12 that slowed down by >1%.
One benchmark slowed down by >10% and three by >5%.
I will investigate these slowdowns.

Mar 14 2018, 1:30 PM

Mar 13 2018

sebpop accepted D41463: [CodeGen] Add a new pass for PostRA sink.

I think the current implementation is good: please commit.
Thanks for the explanations.

Mar 13 2018, 4:35 PM
sebpop added inline comments to D41463: [CodeGen] Add a new pass for PostRA sink.
Mar 13 2018, 11:37 AM
sebpop added a comment to D35430: DA: remove uses of GEP, only ask SCEV.

Requiring assertions for the tests is not too bad, although I think it would be better to use optimization remarks, which should be available in release mode too.

Mar 13 2018, 10:03 AM
sebpop updated subscribers of D38351: MIScheduler improved handling of copied physregs.
Mar 13 2018, 8:58 AM
sebpop updated subscribers of D41463: [CodeGen] Add a new pass for PostRA sink.
Mar 13 2018, 8:55 AM

Mar 12 2018

sebpop added inline comments to D41463: [CodeGen] Add a new pass for PostRA sink.
Mar 12 2018, 2:40 PM
sebpop closed D41360: [AARCH64] Enable fp16 data type for the Builtin in aarch64 only.

Committed in https://reviews.llvm.org/rL321301

Mar 12 2018, 12:23 PM
sebpop closed D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos.

Committed in https://reviews.llvm.org/rL320123

Mar 12 2018, 12:08 PM
sebpop accepted D44355: [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str.

I think this change is good.

Mar 12 2018, 8:51 AM
sebpop accepted D44361: [polly] Change std::sort to llvm::sort in response to r327219.

LGTM.

Mar 12 2018, 7:02 AM

Mar 9 2018

sebpop updated the diff for D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.
Mar 9 2018, 4:24 PM
sebpop updated the diff for D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.

Rewrote the patch as suggested by Eli.

Mar 9 2018, 1:58 PM
sebpop accepted D43323: [NFC] Consolidate six getPointerOperand() utility functions into one place.

LGTM.
Thanks!

Mar 9 2018, 12:27 PM

Mar 8 2018

sebpop added a comment to D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.

The 175.vpr result looks bad.

Mar 8 2018, 2:00 PM
sebpop updated the summary of D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.
Mar 8 2018, 1:48 PM
sebpop updated the diff for D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.
Mar 8 2018, 1:46 PM
sebpop updated the diff for D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.

Update patch on today's tree.
Fixed parentheses.

Mar 8 2018, 1:35 PM
sebpop commandeered D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.
Mar 8 2018, 1:34 PM
sebpop added a comment to D39906: [InstCombine] reassociate loop invariant GEP chains to enable LICM.

The motivation behind this patch is that it improves the performance
of gzip deflate by 10%: this pattern occurs in the hot spot of
longest_match() in zlib.

Mar 8 2018, 1:26 PM
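InstCombine performs this reassociation on GEP instructions in LLVM IR. The C below is only a source-level analogue showing why it helps LICM. The names window, best_len and scan_end are borrowed loosely from zlib's longest_match(), but the loop, the cands array and the function names are a hypothetical simplification, not zlib's code.

```c
#include <assert.h>

/* Before: each iteration forms (window + cand) + best_len. The
   loop-invariant offset best_len is applied last, so the invariant
   part of the address is buried inside a varying chain. */
int count_before(const unsigned char *window, const int *cands, int n,
                 int best_len, unsigned char scan_end) {
    int hits = 0;
    assert(n >= 0);
    for (int k = 0; k < n; k++) {
        const unsigned char *match = window + cands[k]; /* varies */
        if (match[best_len] == scan_end)                /* + invariant */
            hits++;
    }
    return hits;
}

/* After reassociation: (window + best_len) + cand. The inner sum is
   loop invariant, so LICM can compute it once before the loop. */
int count_after(const unsigned char *window, const int *cands, int n,
                int best_len, unsigned char scan_end) {
    int hits = 0;
    assert(n >= 0);
    const unsigned char *shifted = window + best_len; /* hoisted */
    for (int k = 0; k < n; k++)
        if (shifted[cands[k]] == scan_end)
            hits++;
    return hits;
}
```

Both functions read the same bytes; the only difference is which partial sum of the address is loop invariant and can therefore be hoisted.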

Mar 7 2018

sebpop accepted D44177: [JumpThreading] use UnreachableBlocks to avoid unreachable regions.

LGTM.

Mar 7 2018, 6:48 AM

Mar 5 2018

sebpop created D44118: [x86][AArch64] ask the target whether it has a vector blend instruction.
Mar 5 2018, 1:27 PM
sebpop added inline comments to D43973: [AArch64] define isExtractSubvectorCheap.
Mar 5 2018, 12:47 PM
sebpop added reviewers for D43973: [AArch64] define isExtractSubvectorCheap: evandro, javed.absar, kristof.beyls.
Mar 5 2018, 11:12 AM
sebpop added a comment to D43903: [AArch64] generate vuzp instead of mov.

Fixed in https://reviews.llvm.org/rL326722

Mar 5 2018, 9:41 AM
sebpop added a comment to D43903: [AArch64] generate vuzp instead of mov.

We are seeing this problem in testing too.
Sebastian, do you have time to work on this?

Mar 5 2018, 8:38 AM

Mar 1 2018

sebpop created D43973: [AArch64] define isExtractSubvectorCheap.
Mar 1 2018, 2:30 PM
sebpop added a comment to D42133: [AArch64] Improve code generation of constant vectors.

The patch looks good to me.
Thanks!

Mar 1 2018, 11:04 AM
sebpop closed D43903: [AArch64] generate vuzp instead of mov.

Committed with two more tests in https://reviews.llvm.org/rL326443

Mar 1 2018, 7:53 AM

Feb 28 2018

sebpop added a comment to D43903: [AArch64] generate vuzp instead of mov.

Was just thinking, though, that we probably need some negative tests where we expect the rewrite not to happen, e.g. a sequence that has all even values except one, if that makes sense.

Feb 28 2018, 5:07 PM
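The suggested negative test can be illustrated with a small shuffle-mask predicate. This assumes, based only on the revision title and not on the patch's code, that the rewrite recognises shuffles picking every even lane (UZP1) or every odd lane (UZP2) of two concatenated vectors. The function name is hypothetical; the -1-for-undefined-lane convention follows LLVM's shuffle masks.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if `mask` (n result lanes, each an index into the
   2n-lane concatenation of two source vectors, -1 = undefined lane)
   selects every even lane (which_result == 0, UZP1) or every odd
   lane (which_result == 1, UZP2). */
bool is_uzp_mask(const int *mask, int n, int which_result) {
    assert(which_result == 0 || which_result == 1);
    for (int i = 0; i < n; i++) {
        if (mask[i] < 0)
            continue; /* an undefined lane matches anything */
        if (mask[i] != 2 * i + which_result)
            return false;
    }
    return true;
}
```

A mask such as {0, 2, 5, 6}, all even indices except one, is exactly the kind of input a negative test would use to check that the rewrite does not fire.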
sebpop created D43903: [AArch64] generate vuzp instead of mov.
Feb 28 2018, 1:40 PM

Feb 15 2018

sebpop added a comment to D43323: [NFC] Consolidate six getPointerOperand() utility functions into one place.

I like this change. Please also check whether Polly needs a similar change.
Thanks!

Feb 15 2018, 5:31 PM
sebpop accepted D42717: [JumpThreading] sync DT for LVI analysis (PR 36133).

LGTM.

Feb 15 2018, 12:51 PM

Feb 7 2018

sebpop accepted D43007: Add missed PostDominatorTree analysis dependency to GVN hoist pass..

LGTM.

Feb 7 2018, 5:31 PM

Jan 29 2018

sebpop accepted D42601: [JumpThreading] NFC: Rename LoadInst variables.

Looks good, please commit.

Jan 29 2018, 8:54 AM

Jan 3 2018

sebpop accepted D40146: [JumpThreading] Preservation of DT and LVI across the pass.

Looks good to me. Please commit. Thanks Brian!

Jan 3 2018, 1:06 PM

Dec 21 2017

sebpop accepted D41453: [GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load in case of a loop.

This change looks good.

Dec 21 2017, 7:55 AM

Dec 18 2017

sebpop accepted D40146: [JumpThreading] Preservation of DT and LVI across the pass.

LGTM.
-fmodules can be dealt with in another patch.

Dec 18 2017, 9:54 AM

Dec 14 2017

sebpop accepted D41229: [SCEV] Fix the movement of insertion point in expander. PR35406..

LGTM

Dec 14 2017, 5:31 PM

Dec 6 2017

sebpop accepted D40146: [JumpThreading] Preservation of DT and LVI across the pass.

The patch looks good to me.
Please address the few inline comments and commit.

Dec 6 2017, 7:54 PM

Sep 8 2017

sebpop added a comment to D37528: [JumpThreading] Preserve DT and LVI across the pass..

@brzycki, one related thing: have you tried preserving postdominators as well? I'm not sure if that would be beneficial here, but in theory, it should be pretty simple, as you can pass exactly the same update sequences to the DominatorTree and PostDominatorTree.

I have not. I'll talk to @sebpop and see if he thinks this makes sense to include in this patch. Depending on our follow-on updates to JT we may need post dominators as well.

Even if you don't need it, it's trivial to preserve and probably worth doing.

Sep 8 2017, 6:54 AM

Jul 14 2017

sebpop updated the diff for D35430: DA: remove uses of GEP, only ask SCEV.
Jul 14 2017, 12:00 PM
sebpop created D35430: DA: remove uses of GEP, only ask SCEV.
Jul 14 2017, 11:59 AM

Apr 18 2017

sebpop accepted D32158: [GVNHoist] Mark GlobalsAA as preserved by GVNHoist..

LGTM. Thanks!

Apr 18 2017, 5:57 AM

Apr 10 2017

sebpop added a comment to D31035: [GVNHoist] Call isGuaranteedToTransferExecutionToSuccessor on each instruction.

LGTM.

Apr 10 2017, 1:06 PM

Apr 3 2017

sebpop accepted D31599: [CodeGen] Add Performance Monitor.

LGTM.

Apr 3 2017, 7:33 AM · Restricted Project
sebpop added a comment to D31599: [CodeGen] Add Performance Monitor.

Maybe we should just assert if this is not X86_64 for now?

Apr 3 2017, 7:13 AM · Restricted Project
sebpop added inline comments to D31599: [CodeGen] Add Performance Monitor.
Apr 3 2017, 6:52 AM · Restricted Project

Mar 16 2017

sebpop accepted D31035: [GVNHoist] Call isGuaranteedToTransferExecutionToSuccessor on each instruction.

LGTM.

Mar 16 2017, 11:31 AM

Mar 9 2017

sebpop added a comment to D30225: [LIR] re-enable generation of memmove with runtime checks.

Ping.

Mar 9 2017, 8:41 PM

Mar 2 2017

sebpop updated the diff for D30225: [LIR] re-enable generation of memmove with runtime checks.

Added check for HasMemmove.

Mar 2 2017, 11:39 AM
sebpop added a comment to D30225: [LIR] re-enable generation of memmove with runtime checks.

With the patch, compiling SPEC 2006 with "-mllvm -stats" reports:
Number of memcpy's formed from loop load+stores: 42

Before the patch, on SPEC 2006:
Number of memcpy's formed from loop load+stores: 98

This looks bad.

Mar 2 2017, 8:30 AM
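The revision title refers to generating memmove guarded by runtime checks. The exact checks D30225 emits are not shown here, so the following is only an assumed sketch of the idea in C. A forward byte-copy loop behaves like memmove whenever the destination does not start strictly inside the source range; otherwise the loop re-reads bytes it has already overwritten and must be kept. The name copy_forward is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch for the loop
       for (i = 0; i < n; i++) dst[i] = src[i];
   Returns 1 if the memmove path was taken, 0 if the loop was kept. */
int copy_forward(char *dst, const char *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    /* Compare addresses as integers: relational comparison of
       pointers into different objects is undefined in C. */
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    if (d <= s || d >= s + n) {
        /* dst does not start inside (src, src + n): the forward loop
           never reads a byte it has already written, so it is
           equivalent to memmove. */
        memmove(dst, src, n);
        return 1;
    }
    /* dst starts inside the source range: keep the original loop,
       whose result differs from memmove here. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return 0;
}
```

For example, copying "abcdefg" one byte to the left takes the memmove path, while copying it one byte to the right keeps the loop, which replicates the first byte across the range.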