anemet (Adam Nemet)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 21 2014, 12:07 PM (152 w, 4 d)

Recent Activity

Today

anemet accepted D34567: [opt-viewer] Remove positional arg checks (NFC).

LGTM, thanks!

Fri, Jun 23, 12:55 PM
anemet added a comment to D34564: [opt-viewer] Python 3 support in opt-stats.py.

This may be way less efficient for Python 2. You may want to follow the guidance from here:

Fri, Jun 23, 12:47 PM

Sat, Jun 10

anemet added inline comments to D34082: [Frontend] 'Show hotness' can be used with a sampling profile.
Sat, Jun 10, 11:24 AM
anemet added inline comments to D34082: [Frontend] 'Show hotness' can be used with a sampling profile.
Sat, Jun 10, 11:01 AM
anemet accepted D34081: [opt-viewer] Include default values in help output.

LGTM, thanks!

Sat, Jun 10, 10:38 AM

Mon, Jun 5

anemet committed rL304721: Handle non-unique edges in edge-dominance.
Mon, Jun 5, 9:27 AM
anemet closed D33584: Handle non-unique edges in edge-dominance by committing rL304721: Handle non-unique edges in edge-dominance.
Mon, Jun 5, 9:27 AM

Fri, Jun 2

anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
Fri, Jun 2, 1:15 PM
anemet accepted D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..

LGTM.

Fri, Jun 2, 12:52 PM
anemet accepted D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..

Thanks very much for rewriting the loop. This is way more intuitive now.

Fri, Jun 2, 9:49 AM
anemet added inline comments to D29402: [SLP] Initial rework for min/max horizontal reduction vectorization, NFC..
Fri, Jun 2, 8:07 AM

Thu, Jun 1

anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
Thu, Jun 1, 11:45 AM

Wed, May 31

anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
Wed, May 31, 8:35 PM

Tue, May 30

anemet retitled D33584: Handle non-unique edges in edge-dominance from Remove a quadratic behavior in assert-enabled builds to Handle non-unique edges in edge-dominance.
Tue, May 30, 5:37 PM
anemet updated the diff for D33584: Handle non-unique edges in edge-dominance.

This version handles edge-dominance in the presence of non-unique edges.

Tue, May 30, 5:36 PM

Fri, May 26

anemet committed rL304061: Rearrange Dom unittest to accommodate multiple tests.
Fri, May 26, 9:06 PM
anemet closed D33617: Rearrange Dom unittest to accommodate multiple tests by committing rL304061: Rearrange Dom unittest to accommodate multiple tests.
Fri, May 26, 9:06 PM
anemet committed rL304060: clang-format DomTree unittest.
Fri, May 26, 9:06 PM
anemet created D33617: Rearrange Dom unittest to accommodate multiple tests.
Fri, May 26, 4:57 PM
anemet added a comment to D33514: [WIP] Bug 32352 - Provide a way for OptimizationRemarkEmitter::allowExtraAnalysis to check if (specific) remarks are enabled.

This is going in the right direction, IMO. Would also be good to see the clang patch at some point. Thanks for tackling this!

Fri, May 26, 3:44 PM
anemet added a comment to D33584: Handle non-unique edges in edge-dominance.

I certainly agree that if we're not returning false for the non-unique edge case then that will cause bugs later on.

Can I add unit-tests for edge-domination somehow? I am trying to test this with allowing non-unique edges in GVN but that won't fly as regression test.

I'd just go with a regular C++ test case in unittests/

Fri, May 26, 3:18 PM
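
The point about non-unique edges in the comment above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not LLVM's implementation: `Cfg`, `countEdges`, and `edgeCanDominate` are hypothetical names, and the sketch covers only the duplicate-edge pre-check (an edge with a parallel twin cannot dominate anything on its own, so the query should answer false), not a full dominance computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG (hypothetical): Succs[B] lists the successor block indices of
// block B. The same successor may appear twice, e.g. a switch whose two
// cases branch to the same target; that models a non-unique edge.
using Cfg = std::vector<std::vector<int>>;

// Count how many parallel edges go from Start to End.
std::size_t countEdges(const Cfg &Succs, int Start, int End) {
  assert(Start >= 0 && static_cast<std::size_t>(Start) < Succs.size());
  const std::vector<int> &S = Succs[Start];
  return static_cast<std::size_t>(std::count(S.begin(), S.end(), End));
}

// Pre-check for an edge-dominance query: if Start -> End is not a unique
// edge, control can always reach End through the parallel edge instead,
// so the edge cannot dominate and the query should return false.
bool edgeCanDominate(const Cfg &Succs, int Start, int End) {
  return countEdges(Succs, Start, End) == 1;
}
```

A real query would go on to check ordinary block dominance once this pre-check passes.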
anemet added a comment to D33584: Handle non-unique edges in edge-dominance.

I certainly agree that if we're not returning false for the non-unique edge case then that will cause bugs later on.

Fri, May 26, 3:15 PM
anemet updated the summary of D33584: Handle non-unique edges in edge-dominance.
Fri, May 26, 1:33 PM
anemet updated the diff for D33584: Handle non-unique edges in edge-dominance.

Removed the asserts. As Danny put it:

Fri, May 26, 1:31 PM
anemet added a comment to D33584: Handle non-unique edges in edge-dominance.

@davide, I have an ll file for you. On my box, with an assert-enabled opt it gives:

Fri, May 26, 11:38 AM
anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
Fri, May 26, 11:00 AM
anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
Fri, May 26, 10:44 AM
anemet added a comment to D33584: Handle non-unique edges in edge-dominance.

As I said last year, I believe, we should just remove this assert.
It doesn't help anything. The callers literally can't handle it any better if they want real dominance answers.

Fri, May 26, 10:22 AM

Thu, May 25

anemet added a comment to D33584: Handle non-unique edges in edge-dominance.

Adding that to the testsuite, if it's not there, would be great.
If you can attach it here (or point to a location where I can fetch it), I'd like to run it under a profiler :)

Thu, May 25, 10:15 PM
anemet added a comment to D33584: Handle non-unique edges in edge-dominance.

Can you provide such a test?

Thu, May 25, 9:49 PM
anemet created D33584: Handle non-unique edges in edge-dominance.
Thu, May 25, 9:39 PM
anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
Thu, May 25, 2:26 PM
anemet committed rL303885: Disable two more flaky ASan wait* tests temporarily on Darwin.
Thu, May 25, 10:25 AM

May 24 2017

anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
May 24 2017, 10:20 AM
anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
May 24 2017, 10:12 AM
anemet added inline comments to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..
May 24 2017, 9:53 AM

May 23 2017

anemet committed rL303662: Disable flaky ASan tests temporarily on darwin.
May 23 2017, 10:51 AM

May 22 2017

anemet added inline comments to D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.
May 22 2017, 3:10 PM
anemet added inline comments to D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.
May 22 2017, 2:14 PM
anemet accepted D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.

This is a good idea. LGTM too.

May 22 2017, 2:09 PM
anemet added a comment to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..

Thanks for improving this!

May 22 2017, 8:55 AM

May 18 2017

anemet committed rL303402: Revert "[ADT] Fix some Clang-tidy modernize-use-using warnings; other minor….
May 18 2017, 8:10 PM

May 17 2017

anemet added a comment to D25517: [SLPVectorizer] Improved support of partial tree vectorization..

Please write a new patch with all these improvements and add me as a reviewer. Thanks.

May 17 2017, 10:36 PM

May 15 2017

anemet committed rL303116: [SLP] Enable 64-bit wide vectorization on AArch64.
May 15 2017, 2:28 PM
anemet closed D31965: [SLP] Enable 64-bit wide vectorization for Cyclone by committing rL303116: [SLP] Enable 64-bit wide vectorization on AArch64.
May 15 2017, 2:28 PM
anemet committed rL303094: Revert "[ClangD] Refactor clangd into separate components".
May 15 2017, 11:28 AM
anemet committed rL303093: Revert "Fix windows buildbots - missing include and namespace".
May 15 2017, 11:27 AM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Ping

May 15 2017, 9:24 AM

May 12 2017

anemet accepted D33146: CMake: Fix docs-llvm-man target when clang+llvm is in the same source tree.

Would be good to include the analysis for the failure in the log/description.

May 12 2017, 12:25 PM

May 11 2017

anemet added a comment to D32827: [AArch64] Correct lane zero optimization in insert/extract costs.

Thanks. I've added opt-remarks to SLP in rL302811. Hopefully you can use those for your analysis too.

May 11 2017, 10:21 AM
anemet committed rL302811: [SLP] Emit optimization remarks.
May 11 2017, 10:19 AM

May 10 2017

anemet added a comment to D25517: [SLPVectorizer] Improved support of partial tree vectorization..

Hi Alexey,

May 10 2017, 10:35 AM

May 9 2017

anemet updated the diff for D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Address Kristof's comments. Thanks, Kristof!

May 9 2017, 9:12 AM

May 8 2017

anemet added a comment to D32827: [AArch64] Correct lane zero optimization in insert/extract costs.

Hi Adam,

Actually, this is also true if the insert is fed by a load. In this case we can just directly load into the vector register. In my recent experience with SLP this seemed like a pretty important case which is changing with this patch. How does performance look?

That's right. That actually should be true for all load/insert sequences of legal types, not just ones that insert into lane zero - we should generate LD1s for all lanes. So I think that could be an additional optimization, probably in a separate patch?

May 8 2017, 2:32 PM
anemet updated the diff for D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Updated according to Kristof's idea: rather than whitelisting, blacklist subtargets (Qualcomm, Cavium) that haven't had a chance to benchmark this yet.

May 8 2017, 8:19 AM

May 3 2017

anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

The results for Exynos M1 and M2 are in and, except for a couple of workloads which improved between 2 and 5%, any difference in workloads was in the noise level with no significant regression.

IOW, it's OK for the Exynos subtargets.

May 3 2017, 1:46 PM
anemet added a comment to D32827: [AArch64] Correct lane zero optimization in insert/extract costs.

Hi Matt,

May 3 2017, 1:39 PM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

That sounds reasonable to me, but I would do it the other way around: enable it by default and explicitly disable it for the cores that we know have a chance of being evaluated and decided on later.
Otherwise, I'm afraid that we'll forever have an ever-growing whitelist of cores to enable this on, while it looks like the right thing to do in the end is to just enable it by default.

May 3 2017, 8:44 AM

May 2 2017

anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Thanks @evandro, let me know.

May 2 2017, 4:00 PM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Hey Matt,

May 2 2017, 3:44 PM

Apr 20 2017

anemet committed rL300858: Don't pass FPOpFusion::Strict to the backend.
Apr 20 2017, 10:34 AM
anemet closed D32301: Don't pass FPOpFusion::Strict to the backend by committing rL300858: Don't pass FPOpFusion::Strict to the backend.
Apr 20 2017, 10:25 AM
anemet created D32301: Don't pass FPOpFusion::Strict to the backend.
Apr 20 2017, 9:55 AM

Apr 13 2017

anemet committed rL300276: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16.
Apr 13 2017, 4:45 PM
anemet closed D32028: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16 by committing rL300276: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16.
Apr 13 2017, 4:45 PM
anemet created D32028: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16.
Apr 13 2017, 10:00 AM

Apr 12 2017

anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 12 2017, 12:18 PM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Hi Kristof,

Apr 12 2017, 8:25 AM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Rolling it out for Cyclone-only is just a way to get this going in a controllable manner. Other subtargets can roll this out as people find the time to benchmark and tune it.

Right. I'd like to at least try a more generic approach first, and only fall back to Cyclone-only if we get odd results on other cores.

Apr 12 2017, 8:13 AM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Hi Renato,

Apr 12 2017, 7:19 AM

Apr 11 2017

anemet added a comment to F3218284: beamformer.png.

This is actually showing the difference with 64-bit SLP enabled as prepared by opt-diff. New remarks are prefixed with '+'.

Apr 11 2017, 5:33 PM
anemet created D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.
Apr 11 2017, 5:28 PM
anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 11 2017, 10:17 AM
anemet accepted D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.

LGTM.

Apr 11 2017, 9:05 AM

Apr 10 2017

anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 10 2017, 2:49 PM
anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 10 2017, 12:24 PM

Apr 7 2017

anemet added inline comments to D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.
Apr 7 2017, 8:52 AM

Apr 6 2017

anemet added a comment to D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.

Sorry about the delay on this but I was working on something related for ARM that may benefit from this as well. What I need for ARM is something that can communicate to the SLPVectorizer that load-pair and store-pair (of two registers) is efficiently supported on the target. I am wondering if we can combine the two things if your new hook would take the type and the vectorization width.

What do you think?

Is this also in the context of scalarizing a load / store?

For SystemZ, a scalarized memory access will have to do VF memory operations, but there is no need to extract or insert any of the data elements, as there are vector element load/store instructions.

We have something like this on ARM too. ld1 can load any element of a vector (e.g. ld1.s {v1}[1], [x1] loads lane 1 of vector reg v1) and st1 can store any element. That said, ld1 is still a partial write of the vector register, so in terms of performance it's worse than a regular load, which is a full write. I think that modeling its cost as a load + insert (for non-zero-lane) is fairly accurate. Doesn't this match the situation on SystemZ?

As far as I know, there is no extra penalty on SystemZ for using a vector element load, so scalarizing a vector load will really cost e.g. 4 loads at VF 4. This should be better than doing 4 scalar loads and 4 inserts.

Apr 6 2017, 10:26 AM

Apr 5 2017

anemet added a comment to D31169: [DAGCombiner] Initial support for the fast-math flag contract.

The counterpart of fused multiply-and-sub was committed under rL299572.

Apr 5 2017, 11:15 AM
anemet committed rL299572: [DAGCombine] Support FMF contract in fused multiple-and-sub too.
Apr 5 2017, 11:11 AM
anemet committed rL299571: [DAGCombine] Remove commented-out code from r299096.
Apr 5 2017, 11:11 AM

Apr 4 2017

anemet added a comment to D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.

Sorry about the delay on this but I was working on something related for ARM that may benefit from this as well. What I need for ARM is something that can communicate to the SLPVectorizer that load-pair and store-pair (of two registers) is efficiently supported on the target. I am wondering if we can combine the two things if your new hook would take the type and the vectorization width.

What do you think?

Is this also in the context of scalarizing a load / store?

For SystemZ, a scalarized memory access will have to do VF memory operations, but there is no need to extract or insert any of the data elements, as there are vector element load/store instructions.

Apr 4 2017, 9:11 PM
anemet committed rL299488: Another attempt to fix the sphinx warning from r299470.
Apr 4 2017, 4:59 PM
anemet committed rL299481: Fix sphinx warning from r299470.
Apr 4 2017, 3:57 PM
anemet committed rL299470: Add #pragma clang fp.
Apr 4 2017, 2:31 PM
anemet closed D31276: Add #pragma clang fp by committing rL299470: Add #pragma clang fp.
Apr 4 2017, 2:31 PM
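
As a rough illustration of what the pragma added in rL299470 looks like in user code, the sketch below requests fast contraction for one function. `mulAdd` is a hypothetical name; the `contract(fast)` form follows Clang's documented `#pragma clang fp` syntax, and the `__clang__` guard is an assumption added here so that other compilers simply skip the pragma. Whether a fused multiply-add is actually emitted depends on the target.

```cpp
// Hypothetical helper: a * b + c is a candidate for fusing into one FMA.
double mulAdd(double a, double b, double c) {
#if defined(__clang__)
  // The pragma must come first in the compound statement; it then applies
  // to floating-point expressions within this scope.
#pragma clang fp contract(fast)
#endif
  return a * b + c;
}
```

Without the pragma, contraction behavior falls back to whatever the -ffp-contract setting selects for the translation unit.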
anemet committed rL299469: Set FMF for -ffp-contract=fast.
Apr 4 2017, 2:31 PM
anemet closed D31168: Set FMF for -ffp-contract=fast by committing rL299469: Set FMF for -ffp-contract=fast.
Apr 4 2017, 2:31 PM
anemet added a comment to D31168: Set FMF for -ffp-contract=fast.

Thanks, Aaron! @rjmccall, does it look good to you too?

Apr 4 2017, 8:05 AM

Apr 3 2017

anemet added a comment to D31276: Add #pragma clang fp.

This continues to look good to me with the new name.

Apr 3 2017, 10:15 PM
anemet updated the diff for D31168: Set FMF for -ffp-contract=fast.

Address John's comment.

Apr 3 2017, 10:28 AM
anemet added a comment to D31168: Set FMF for -ffp-contract=fast.

I may have missed earlier steps in this patch series. Why is this being done statefully and contextually in the IRBuilder instead of just applying the flag from the BinaryOperator to the instruction when building it? It's not like ScalarExprEmitter doesn't know that it's building an FMul.

Apr 3 2017, 10:13 AM

Apr 1 2017

anemet added a comment to D31168: Set FMF for -ffp-contract=fast.

Ping. This is the only patch in the series now that wasn't approved.

Apr 1 2017, 7:26 PM

Mar 31 2017

anemet committed rL299237: Improve DebugInfo/strip-loop-metadata.ll test.
Mar 31 2017, 11:03 AM

Mar 30 2017

anemet added a comment to D31169: [DAGCombiner] Initial support for the fast-math flag contract.

LGTM - see a couple of small nits for the PPC test comment.

Mar 30 2017, 12:06 PM
anemet committed rL299096: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 30 2017, 12:05 PM
anemet closed D31169: [DAGCombiner] Initial support for the fast-math flag contract by committing rL299096: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 30 2017, 12:05 PM
anemet updated the diff for D31169: [DAGCombiner] Initial support for the fast-math flag contract.

Add a new test requested by Sanjay.

Mar 30 2017, 9:44 AM
anemet added inline comments to D31169: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 30 2017, 9:42 AM
anemet updated the summary of D31168: Set FMF for -ffp-contract=fast.
Mar 30 2017, 9:04 AM