anemet (Adam Nemet)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 21 2014, 12:07 PM (148 w, 23 h)

Recent Activity

Today

anemet committed rL303662: Disable flaky ASan tests temporarily on darwin.
Tue, May 23, 10:51 AM

Yesterday

anemet added inline comments to D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.
Mon, May 22, 3:10 PM
anemet added inline comments to D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.
Mon, May 22, 2:14 PM
anemet accepted D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.

This is a good idea. LGTM too.

Mon, May 22, 2:09 PM
anemet added a comment to D33320: [SLP] Improve comments and naming of functions/variables/members, NFC..

Thanks for improving this!

Mon, May 22, 8:55 AM

Thu, May 18

anemet committed rL303402: Revert "[ADT] Fix some Clang-tidy modernize-use-using warnings; other minor….
Thu, May 18, 8:10 PM

Wed, May 17

anemet added a comment to D25517: [SLPVectorizer] Improved support of partial tree vectorization..

Please write a new patch with all these improvements and add me as a reviewer. Thanks.

Wed, May 17, 10:36 PM

Mon, May 15

anemet committed rL303116: [SLP] Enable 64-bit wide vectorization on AArch64.
Mon, May 15, 2:28 PM
anemet closed D31965: [SLP] Enable 64-bit wide vectorization for Cyclone by committing rL303116: [SLP] Enable 64-bit wide vectorization on AArch64.
Mon, May 15, 2:28 PM
anemet committed rL303094: Revert "[ClangD] Refactor clangd into separate components".
Mon, May 15, 11:28 AM
anemet committed rL303093: Revert "Fix windows buildbots - missing include and namespace".
Mon, May 15, 11:27 AM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Ping

Mon, May 15, 9:24 AM

Fri, May 12

anemet accepted D33146: CMake: Fix docs-llvm-man target when clang+llvm is in the same source tree.

Would be good to include the analysis for the failure in the log/description.

Fri, May 12, 12:25 PM

Thu, May 11

anemet added a comment to D32827: [AArch64] Correct lane zero optimization in insert/extract costs.

Thanks. I've added opt-remarks to SLP in rL302811. Hopefully you can use those for your analysis too.

Thu, May 11, 10:21 AM
anemet committed rL302811: [SLP] Emit optimization remarks.
Thu, May 11, 10:19 AM

Wed, May 10

anemet added a comment to D25517: [SLPVectorizer] Improved support of partial tree vectorization..

Hi Alexey,

Wed, May 10, 10:35 AM

Tue, May 9

anemet updated the diff for D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Address Kristof's comments. Thanks, Kristof!

Tue, May 9, 9:12 AM

Mon, May 8

anemet added a comment to D32827: [AArch64] Correct lane zero optimization in insert/extract costs.

Hi Adam,

Actually, this is also true if the insert is fed by a load. In this case we can just directly load into the vector register. In my recent experience with SLP this seemed like a pretty important case which is changing with this patch. How does performance look?

That's right. That actually should be true for all load/insert sequences of legal types, not just ones that insert into lane zero - we should generate LD1s for all lanes. So I think that could be an additional optimization, probably in a separate patch?

Mon, May 8, 2:32 PM
anemet updated the diff for D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Updated according to Kristof's idea: rather than whitelisting, blacklist the
subtargets (Qualcomm, Cavium) that haven't had a chance to benchmark this yet.

Mon, May 8, 8:19 AM
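
The opt-out approach described in this update can be sketched as follows. This is only an illustration under stated assumptions, not the code from D31965: the function name, the subtarget names, and the 128-bit fallback value are all hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical sketch: enable 64-bit wide vectorization by default and keep
// a short opt-out list for subtargets that have not been benchmarked yet.
// The names and the 128-bit fallback are assumptions for illustration only.
unsigned minVectorRegisterBits(const std::string &Subtarget) {
  static const std::set<std::string> OptedOut = {"hypothetical-core-a",
                                                 "hypothetical-core-b"};
  // Opted-out subtargets keep the wider minimum; everyone else gets 64 bits.
  return OptedOut.count(Subtarget) ? 128u : 64u;
}
```

With this shape the list can only shrink as subtargets are benchmarked, whereas an opt-in list would have to grow with every new core.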

Wed, May 3

anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

The results for Exynos M1 and M2 are in and, except for a couple of workloads which improved between 2 and 5%, any difference in workloads was in the noise level with no significant regression.

IOW, it's OK for the Exynos subtargets.

Wed, May 3, 1:46 PM
anemet added a comment to D32827: [AArch64] Correct lane zero optimization in insert/extract costs.

Hi Matt,

Wed, May 3, 1:39 PM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

That sounds reasonable to me, but I would do it the other way around: enable it by default and explicitly disable it for the cores that we know have a chance of being evaluated and decided on later.
Otherwise, I'm afraid that we'll forever have an ever-growing whitelist of cores to enable this on, while it looks like the right thing to do in the end is to just enable it by default.

Wed, May 3, 8:44 AM

Tue, May 2

anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Thanks @evandro, let me know.

Tue, May 2, 4:00 PM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Hey Matt,

Tue, May 2, 3:44 PM

Apr 20 2017

anemet committed rL300858: Don't pass FPOpFusion::Strict to the backend.
Apr 20 2017, 10:34 AM
anemet closed D32301: Don't pass FPOpFusion::Strict to the backend by committing rL300858: Don't pass FPOpFusion::Strict to the backend.
Apr 20 2017, 10:25 AM
anemet created D32301: Don't pass FPOpFusion::Strict to the backend.
Apr 20 2017, 9:55 AM

Apr 13 2017

anemet committed rL300276: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16.
Apr 13 2017, 4:45 PM
anemet closed D32028: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16 by committing rL300276: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16.
Apr 13 2017, 4:45 PM
anemet created D32028: [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16.
Apr 13 2017, 10:00 AM

Apr 12 2017

anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 12 2017, 12:18 PM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Hi Kristof,

Apr 12 2017, 8:25 AM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Rolling it out for Cyclone-only is just a way to get this going in a controllable manner. Other subtargets can roll this out as people find the time to benchmark and tune this.

Right. I'd like to at least try a more generic approach first, and only fall back to Cyclone-only if we get odd results on other cores.

Apr 12 2017, 8:13 AM
anemet added a comment to D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.

Hi Renato,

Apr 12 2017, 7:19 AM

Apr 11 2017

anemet added a comment to F3218284: beamformer.png.

This is actually showing the difference with 64-bit SLP enabled as prepared by opt-diff. New remarks are prefixed with '+'.

Apr 11 2017, 5:33 PM
anemet created D31965: [SLP] Enable 64-bit wide vectorization for Cyclone.
Apr 11 2017, 5:28 PM
anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 11 2017, 10:17 AM
anemet accepted D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.

LGTM.

Apr 11 2017, 9:05 AM

Apr 10 2017

anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 10 2017, 2:49 PM
anemet added inline comments to D31167: Use FPContractModeKind universally.
Apr 10 2017, 12:24 PM

Apr 7 2017

anemet added inline comments to D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.
Apr 7 2017, 8:52 AM

Apr 6 2017

anemet added a comment to D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.

Sorry about the delay on this but I was working on something related for ARM that may benefit from this as well. What I need for ARM is something that can communicate to the SLPVectorizer that load-pair and store-pair (of two registers) are efficiently supported on the target. I am wondering if we can combine the two things if your new hook would take the type and the vectorization width.

What do you think?

Is this also in the context of scalarizing a load / store?

For SystemZ, a scalarized memory access will have to do VF memory operations, but there is no need to extract or insert any of the data elements, as there are vector element load/store instructions.

We have something like this on ARM too. ld1 can load any element of a vector (e.g. ld1.s {v1}[1], [x1] loads lane 1 of vector reg v1) and st1 can store any element. That said, ld1 is still a partial write of the vector register so in terms of performance, it's worse than a regular store which is a full write. I think that modeling its cost as a load + insert (for non-zero-lane) is fairly accurate. Doesn't this match the situation on SystemZ?

As far as I know, SystemZ has no extra penalty for using a vector element load, so scalarizing a vector load really costs e.g. 4 loads at VF 4. This should be better than doing 4 scalar loads and 4 inserts.

Apr 6 2017, 10:26 AM
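
The cost comparison in this exchange can be sketched as a small model. This is a simplified illustration, not LLVM's TargetTransformInfo interface: the struct, its fields, and the unit costs used below are hypothetical, and the lane-zero special case mentioned for ARM is ignored.

```cpp
// Hypothetical per-target costs, for illustration only.
struct ScalarizationCosts {
  unsigned LoadCost;   // cost of one scalar or element load
  unsigned InsertCost; // cost of inserting one element into a vector
  bool HasElementLoad; // target can load directly into a vector lane at no extra cost
};

// Cost of scalarizing a vector load at vectorization factor VF.
unsigned scalarizedLoadCost(const ScalarizationCosts &T, unsigned VF) {
  unsigned Cost = VF * T.LoadCost; // one memory operation per element
  if (!T.HasElementLoad)
    Cost += VF * T.InsertCost;     // plus one insert per element
  return Cost;
}
```

With unit costs and VF 4, a target with element loads is charged 4, while a target modeled as load + insert is charged 8, which is the difference the comments above are discussing.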

Apr 5 2017

anemet added a comment to D31169: [DAGCombiner] Initial support for the fast-math flag contract.

The counterpart of fused multiply-and-sub was committed under rL299572.

Apr 5 2017, 11:15 AM
anemet committed rL299572: [DAGCombine] Support FMF contract in fused multiple-and-sub too.
Apr 5 2017, 11:11 AM
anemet committed rL299571: [DAGCombine] Remove commented-out code from r299096.
Apr 5 2017, 11:11 AM

Apr 4 2017

anemet added a comment to D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.

Sorry about the delay on this but I was working on something related for ARM that may benefit from this as well. What I need for ARM is something that can communicate to the SLPVectorizer that load-pair and store-pair (of two registers) are efficiently supported on the target. I am wondering if we can combine the two things if your new hook would take the type and the vectorization width.

What do you think?

Is this also in the context of scalarizing a load / store?

For SystemZ, a scalarized memory access will have to do VF memory operations, but there is no need to extract or insert any of the data elements, as there are vector element load/store instructions.

Apr 4 2017, 9:11 PM
anemet committed rL299488: Another attempt to fix the sphinx warning from r299470.
Apr 4 2017, 4:59 PM
anemet committed rL299481: Fix sphinx warning from r299470.
Apr 4 2017, 3:57 PM
anemet committed rL299470: Add #pragma clang fp.
Apr 4 2017, 2:31 PM
anemet closed D31276: Add #pragma clang fp by committing rL299470: Add #pragma clang fp.
Apr 4 2017, 2:31 PM
anemet committed rL299469: Set FMF for -ffp-contract=fast.
Apr 4 2017, 2:31 PM
anemet closed D31168: Set FMF for -ffp-contract=fast by committing rL299469: Set FMF for -ffp-contract=fast.
Apr 4 2017, 2:31 PM
anemet added a comment to D31168: Set FMF for -ffp-contract=fast.

Thanks, Aaron! @rjmccall, does it look good to you too?

Apr 4 2017, 8:05 AM

Apr 3 2017

anemet added a comment to D31276: Add #pragma clang fp.

This continues to look good to me with the new name.

Apr 3 2017, 10:15 PM
anemet updated the diff for D31168: Set FMF for -ffp-contract=fast.

Address John's comment.

Apr 3 2017, 10:28 AM
anemet added a comment to D31168: Set FMF for -ffp-contract=fast.

I may have missed earlier steps in this patch series. Why is this being done statefully and contextually in the IRBuilder instead of just applying the flag from the BinaryOperator to the instruction when building it? It's not like ScalarExprEmitter doesn't know that it's building an FMul.

Apr 3 2017, 10:13 AM

Apr 1 2017

anemet added a comment to D31168: Set FMF for -ffp-contract=fast.

Ping. This is the only patch in the series now that wasn't approved.

Apr 1 2017, 7:26 PM

Mar 31 2017

anemet committed rL299237: Improve DebugInfo/strip-loop-metadata.ll test.
Mar 31 2017, 11:03 AM

Mar 30 2017

anemet added a comment to D31169: [DAGCombiner] Initial support for the fast-math flag contract.

LGTM - see a couple of small nits for the PPC test comment.

Mar 30 2017, 12:06 PM
anemet committed rL299096: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 30 2017, 12:05 PM
anemet closed D31169: [DAGCombiner] Initial support for the fast-math flag contract by committing rL299096: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 30 2017, 12:05 PM
anemet updated the diff for D31169: [DAGCombiner] Initial support for the fast-math flag contract.

Add a new test requested by Sanjay.

Mar 30 2017, 9:44 AM
anemet added inline comments to D31169: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 30 2017, 9:42 AM
anemet updated the summary of D31168: Set FMF for -ffp-contract=fast.
Mar 30 2017, 9:04 AM
anemet updated the diff for D31168: Set FMF for -ffp-contract=fast.

Also add 'contract' for CompoundAssignOperator (+= and -=). For l-values,
these don't go through the expr-visitor in ScalarExprEmitter::Visit. This
again piggybacks on how debug-locations are added.

Mar 30 2017, 9:03 AM

Mar 29 2017

anemet added inline comments to D31167: Use FPContractModeKind universally.
Mar 29 2017, 8:17 PM
anemet updated the diff for D31169: [DAGCombiner] Initial support for the fast-math flag contract.

Address review comments. Also test with all targets enabled now. Sorry again
for the silly mistake.

Mar 29 2017, 5:01 PM
anemet added inline comments to D31169: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 29 2017, 3:45 PM
anemet added inline comments to D31169: [DAGCombiner] Initial support for the fast-math flag contract.
Mar 29 2017, 3:35 PM
anemet committed rL299033: Use FPContractModeKind universally.
Mar 29 2017, 3:06 PM
anemet committed rL299029: Revert "Use FPContractModeKind universally".
Mar 29 2017, 2:36 PM
anemet committed rL299027: Use FPContractModeKind universally.
Mar 29 2017, 1:52 PM
anemet closed D31167: Use FPContractModeKind universally by committing rL299027: Use FPContractModeKind universally.
Mar 29 2017, 1:52 PM
anemet added inline comments to D31167: Use FPContractModeKind universally.
Mar 29 2017, 1:31 PM
anemet added a comment to D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer.

Sorry about the delay on this but I was working on something related for ARM that may benefit from this as well. What I need for ARM is something that can communicate to the SLPVectorizer that load-pair and store-pair (of two registers) are efficiently supported on the target. I am wondering if we can combine the two things if your new hook would take the type and the vectorization width.

Mar 29 2017, 9:58 AM

Mar 28 2017

anemet added a comment to D31276: Add #pragma clang fp.

(Sorry about updating the title/description back and forth but arcanist is buggy)

Mar 28 2017, 5:31 PM
anemet retitled D31276: Add #pragma clang fp from Add #pragma clang fast_math to Add #pragma clang fp.
Mar 28 2017, 5:30 PM
anemet updated the diff for D31276: Add #pragma clang fp.

Rename pragma from #pragma clang fast_math contract_fast(on/off) to #pragma clang fp contract(on/fast/off).

Mar 28 2017, 5:29 PM
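
A minimal sketch of how the renamed pragma might be used, assuming a Clang build that includes this patch. The function name is hypothetical, and the preprocessor guard is there only so that other compilers skip the pragma.

```cpp
// Hypothetical example: allow the multiply and add in this one block to be
// contracted into a fused multiply-add. The pragma has to come first in the
// compound statement (or appear at file scope).
double mulAdd(double a, double b, double c) {
#if defined(__clang__)
#pragma clang fp contract(fast)
#endif
  return a * b + c;
}
```

Whether a fused instruction is actually emitted depends on the target and optimization level; the pragma only permits the contraction for the block it appears in.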
anemet retitled D31276: Add #pragma clang fp from Add #pragma clang fast_math to Add #pragma clang fp.
Mar 28 2017, 5:28 PM
anemet added a reviewer for D31167: Use FPContractModeKind universally: spatel.
Mar 28 2017, 5:25 PM
anemet added a reviewer for D31168: Set FMF for -ffp-contract=fast: spatel.
Mar 28 2017, 5:25 PM
anemet added a reviewer for D31276: Add #pragma clang fp: spatel.
Mar 28 2017, 5:24 PM
anemet added a comment to D31165: [SDAG] Add AllowContract to SNodeFlags.

@spatel, @arsenm, this is a trivial patch to expose the new FMF to SDAG. Since you guys looked at the rest of the LLVM patches in the set, would you mind reviewing this too?

I actually haven't looked at all of the patches. Can you add me to those?

Mar 28 2017, 4:59 PM
anemet committed rL298963: [SDAG] Remove -enable-fmf-dag.
Mar 28 2017, 4:58 PM
anemet committed rL298962: [SDAG] Handle VectorReduction in SDNodeFlags::intersectWith.
Mar 28 2017, 4:58 PM
anemet committed rL298961: [SDAG] Add AllowContract to SNodeFlags.
Mar 28 2017, 4:58 PM
anemet closed D31165: [SDAG] Add AllowContract to SNodeFlags by committing rL298961: [SDAG] Add AllowContract to SNodeFlags.
Mar 28 2017, 4:58 PM
anemet added reviewers for D31169: [DAGCombiner] Initial support for the fast-math flag contract: spatel, arsenm.

Does this look OK now?

Mar 28 2017, 2:17 PM
anemet added reviewers for D31165: [SDAG] Add AllowContract to SNodeFlags: spatel, arsenm.

@spatel, @arsenm, this is a trivial patch to expose the new FMF to SDAG. Since you guys looked at the rest of the LLVM patches in the set, would you mind reviewing this too?

Mar 28 2017, 2:15 PM
anemet committed rL298939: [IR] Add AllowContract to FastMathFlags.
Mar 28 2017, 1:24 PM
anemet closed D31164: [IR] Add AllowContract to FastMathFlags by committing rL298939: [IR] Add AllowContract to FastMathFlags.
Mar 28 2017, 1:24 PM

Mar 27 2017

anemet updated the diff for D31164: [IR] Add AllowContract to FastMathFlags.

Address Sanjay's comments

Mar 27 2017, 3:31 PM
anemet added inline comments to D31164: [IR] Add AllowContract to FastMathFlags.
Mar 27 2017, 3:25 PM
anemet updated the diff for D31169: [DAGCombiner] Initial support for the fast-math flag contract.

Fix typo

Mar 27 2017, 3:10 PM
anemet updated the diff for D31169: [DAGCombiner] Initial support for the fast-math flag contract.

Address Matt's comment

Mar 27 2017, 3:09 PM
anemet committed rL298877: Encapsulate FPOptions and use it consistently.
Mar 27 2017, 12:30 PM
anemet closed D31166: Encapsulate FPOptions and use it consistently by committing rL298877: Encapsulate FPOptions and use it consistently.
Mar 27 2017, 12:29 PM
anemet added a comment to D31169: [DAGCombiner] Initial support for the fast-math flag contract.

Ping

Mar 27 2017, 11:04 AM
anemet added a comment to D31165: [SDAG] Add AllowContract to SNodeFlags.

Ping

Mar 27 2017, 11:03 AM
anemet added a comment to D31164: [IR] Add AllowContract to FastMathFlags.

Ping

Mar 27 2017, 11:03 AM