
sheredom (Neil Henning)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 6 2013, 2:54 AM (271 w, 3 d)

All things performance at AMD: current focus - Vulkan!

Recent Activity

Mon, Feb 11

sheredom committed rG8c10fa1a903f: [AMDGPU] Fix DPP sequence in atomic optimizer. (authored by sheredom).
[AMDGPU] Fix DPP sequence in atomic optimizer.
Mon, Feb 11, 6:44 AM
sheredom committed rL353703: [AMDGPU] Fix DPP sequence in atomic optimizer..
[AMDGPU] Fix DPP sequence in atomic optimizer.
Mon, Feb 11, 6:43 AM
sheredom closed D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..
Mon, Feb 11, 6:43 AM · Restricted Project, Restricted Project
sheredom added a comment to D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..
In D57737#1392667, @tpr wrote:

But I still don't understand it:

  1. Why do you want an exclusive scan? Surely what you're trying to do is just "sum" up all lanes into lane 63, which is an inclusive scan.
Mon, Feb 11, 3:49 AM · Restricted Project, Restricted Project
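
The inclusive/exclusive distinction raised in the quoted question can be illustrated with a small sketch. This is plain Python over a list of per-lane values, not the DPP sequence from D57737, and the function names are made up for illustration. An inclusive scan leaves the total in the last lane. An exclusive scan gives each lane the sum of the lanes before it. If a single lane performed one combined atomic for the whole wavefront, each lane would plausibly need that exclusive prefix as its own offset; that motivation is an assumption here, not something stated in the thread.

```python
from itertools import accumulate


def inclusive_scan(lanes):
    """Lane i receives lanes[0] + ... + lanes[i]."""
    return list(accumulate(lanes))


def exclusive_scan(lanes):
    """Lane i receives lanes[0] + ... + lanes[i - 1]; lane 0 receives 0."""
    if not lanes:
        return []
    return [0] + inclusive_scan(lanes)[:-1]


# Hypothetical per-lane values for a 4-lane "wavefront".
lanes = [3, 1, 4, 1]
inclusive = inclusive_scan(lanes)  # last element is the total of all lanes
exclusive = exclusive_scan(lanes)  # each element is that lane's offset
```

The two results differ only by a shift: the last element of the inclusive scan is the full reduction, while the exclusive scan drops that total and starts from zero.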
sheredom added inline comments to D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..
Mon, Feb 11, 3:26 AM · Restricted Project, Restricted Project
sheredom updated the diff for D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..

Fix comment that should now say exclusive scan instead of inclusive scan.

Mon, Feb 11, 2:50 AM · Restricted Project, Restricted Project

Fri, Feb 8

sheredom updated the diff for D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..

Make the final readlane be in the WWM section too as per the review comments.

Fri, Feb 8, 2:06 AM · Restricted Project, Restricted Project

Thu, Feb 7

sheredom added inline comments to D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..
Thu, Feb 7, 5:02 AM · Restricted Project, Restricted Project

Wed, Feb 6

sheredom updated the diff for D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..

Updated to bring in an additional fix to remove read_register exec and replace it with a ballot.

Wed, Feb 6, 1:46 AM · Restricted Project, Restricted Project

Tue, Feb 5

sheredom created D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..
Tue, Feb 5, 1:42 AM · Restricted Project, Restricted Project

Tue, Jan 29

sheredom committed rL352500: [AMDGPU] Fix a weird WWM intrinsic issue..
[AMDGPU] Fix a weird WWM intrinsic issue.
Tue, Jan 29, 6:28 AM
sheredom closed D56002: [AMDGPU] Fix a weird WWM intrinsic issue..
Tue, Jan 29, 6:28 AM · Restricted Project

Mon, Jan 28

sheredom updated the diff for D56002: [AMDGPU] Fix a weird WWM intrinsic issue..

Fixed review comments.

Mon, Jan 28, 4:10 AM · Restricted Project

Jan 18 2019

sheredom committed rL351562: [AMDGPU] Add some missing always-uniform values..
[AMDGPU] Add some missing always-uniform values.
Jan 18 2019, 8:44 AM
sheredom closed D56845: [AMDGPU] Add some missing always-uniform values..
Jan 18 2019, 8:44 AM · Restricted Project
sheredom updated the diff for D56845: [AMDGPU] Add some missing always-uniform values..

Fix review comments.

Jan 18 2019, 8:44 AM · Restricted Project

Jan 17 2019

sheredom updated the diff for D56845: [AMDGPU] Add some missing always-uniform values..

Review comments.

Jan 17 2019, 9:52 AM · Restricted Project
sheredom updated the diff for D56845: [AMDGPU] Add some missing always-uniform values..

Also tag s.getpc as scalar, and add tests in the Analysis folder as suggested.

Jan 17 2019, 8:18 AM · Restricted Project
sheredom added a comment to D56845: [AMDGPU] Add some missing always-uniform values..

@arsenm agreed - not quite sure how to do tests for this? I started to look at how we test alias analysis, and they have a cl::opt to force print the debug info for that pass, and that is how our amdgpu-alias-analysis.ll tests the aliasing rules. How exactly to test that divergence analysis did something here?

Jan 17 2019, 5:57 AM · Restricted Project
sheredom created D56845: [AMDGPU] Add some missing always-uniform values..
Jan 17 2019, 4:10 AM · Restricted Project

Jan 10 2019

sheredom committed rL350838: [AMDGPU] Fix dwordx3/southern-islands failures..
[AMDGPU] Fix dwordx3/southern-islands failures.
Jan 10 2019, 8:25 AM
sheredom closed D56434: [AMDGPU] Fix dwordx3/southern-islands failures..
Jan 10 2019, 8:24 AM · Restricted Project
sheredom added a comment to D55444: AMDGPU: Fix DPP combiner.

@cwabbott I planned to do a followup once this DPP change had landed to add the missing dpp/codegen patterns to the atomic optimizer - so watch this space!

Jan 10 2019, 6:16 AM · Restricted Project, Restricted Project

Jan 9 2019

sheredom added inline comments to D56434: [AMDGPU] Fix dwordx3/southern-islands failures..
Jan 9 2019, 5:06 AM · Restricted Project
sheredom updated the diff for D56434: [AMDGPU] Fix dwordx3/southern-islands failures..

Fix review comment.

Jan 9 2019, 5:06 AM · Restricted Project

Jan 8 2019

sheredom added inline comments to D56434: [AMDGPU] Fix dwordx3/southern-islands failures..
Jan 8 2019, 5:37 AM · Restricted Project
sheredom updated the diff for D56434: [AMDGPU] Fix dwordx3/southern-islands failures..

Fix review comments.

Jan 8 2019, 5:32 AM · Restricted Project
sheredom created D56434: [AMDGPU] Fix dwordx3/southern-islands failures..
Jan 8 2019, 4:44 AM · Restricted Project

Jan 7 2019

sheredom added a comment to D56002: [AMDGPU] Fix a weird WWM intrinsic issue..

@nhaehnle I've removed canReadVGPR as it only had the single callsite - but given @arsenm's comment I did not change getOpRegClass. I also don't feel super comfortable doing the change you suggested to addUsersToMoveToVALUWorklist as I'm worried that there will be non-LLVM-tested paths that I could trip up on with ease. I'd rather ship the commit as is if y'all are ok with it.

Jan 7 2019, 4:52 AM · Restricted Project
sheredom updated the diff for D56002: [AMDGPU] Fix a weird WWM intrinsic issue..

Removed canReadVGPR as it had only a single callsite.

Jan 7 2019, 4:50 AM · Restricted Project

Dec 21 2018

sheredom created D56002: [AMDGPU] Fix a weird WWM intrinsic issue..
Dec 21 2018, 7:08 AM · Restricted Project

Dec 12 2018

sheredom committed rL348937: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..
[AMDGPU] Extend the SI Load/Store optimizer to combine more things.
Dec 12 2018, 8:18 AM
sheredom closed D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..
Dec 12 2018, 8:18 AM · Restricted Project

Dec 10 2018

sheredom committed rL348771: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..
[AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D.
Dec 10 2018, 8:42 AM
sheredom closed D55367: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..
Dec 10 2018, 8:42 AM · Restricted Project
sheredom updated the diff for D55367: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..

Made the l1 flush change happen for MESA3D too (like Nicolai asked for).

Dec 10 2018, 3:37 AM · Restricted Project

Dec 6 2018

sheredom created D55367: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..
Dec 6 2018, 5:47 AM · Restricted Project

Dec 4 2018

sheredom updated the diff for D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..

Fixed review comments:

Dec 4 2018, 2:12 AM · Restricted Project

Nov 8 2018

sheredom added inline comments to D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..
Nov 8 2018, 9:25 AM · Restricted Project

Nov 6 2018

sheredom updated the diff for D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..

We discussed this in an internal AMD meeting on Monday 5th November 2018, and came to the conclusion that even though I do want the scalar load combining to be brought upstream, it would be better as a separate change so that we can get broader testing across the users of our AMDGPU backend.

Nov 6 2018, 2:39 AM · Restricted Project

Nov 5 2018

sheredom committed rL346128: [AMDGPU] Fix the new atomic optimizer in pixel shaders..
[AMDGPU] Fix the new atomic optimizer in pixel shaders.
Nov 5 2018, 4:07 AM
sheredom closed D53930: [AMDGPU] Fix the new atomic optimizer in pixel shaders..
Nov 5 2018, 4:07 AM · Restricted Project
sheredom added inline comments to D53930: [AMDGPU] Fix the new atomic optimizer in pixel shaders..
Nov 5 2018, 4:05 AM · Restricted Project
sheredom updated the diff for D53930: [AMDGPU] Fix the new atomic optimizer in pixel shaders..

Fix review comments made by @nhaehnle

Nov 5 2018, 4:05 AM · Restricted Project

Nov 2 2018

sheredom created D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..
Nov 2 2018, 11:27 AM · Restricted Project
sheredom committed rL345962: [AMDGPU] UBSan bug fix for r345710.
[AMDGPU] UBSan bug fix for r345710
Nov 2 2018, 3:27 AM

Nov 1 2018

sheredom updated the diff for D53930: [AMDGPU] Fix the new atomic optimizer in pixel shaders..

Review fixes.

Nov 1 2018, 3:25 AM · Restricted Project

Oct 31 2018

sheredom created D53930: [AMDGPU] Fix the new atomic optimizer in pixel shaders..
Oct 31 2018, 6:46 AM · Restricted Project
sheredom committed rL345710: [AMDGPU] support image load/store a16.
[AMDGPU] support image load/store a16
Oct 31 2018, 3:38 AM
sheredom closed D53750: [AMDGPU] support image load/store a16.
Oct 31 2018, 3:38 AM · Restricted Project

Oct 29 2018

sheredom updated the diff for D53750: [AMDGPU] support image load/store a16.

Added an additional lit test for the dim variants (including the 2darraymsaa requested by @nhaehnle), and changed the naming of the a16.d16 tests to vNf16.

Oct 29 2018, 2:43 AM · Restricted Project

Oct 26 2018

sheredom created D53750: [AMDGPU] support image load/store a16.
Oct 26 2018, 2:24 AM · Restricted Project

Oct 19 2018

sheredom added a comment to D42885: [AMDGPU] intrintrics for byte/short load/store.

Maybe a dumb question - but why can't we just use the tbuffer load/store instead of these? It already upcasts for you (the zext/sext is built in depending on the nfmt I believe).

Oct 19 2018, 1:55 AM

Oct 10 2018

sheredom committed rL344128: Fix an ordering bug in the scalarizer..
Fix an ordering bug in the scalarizer.
Oct 10 2018, 2:29 AM
sheredom closed D52540: Fix an ordering bug in the scalarizer..
Oct 10 2018, 2:29 AM

Oct 8 2018

sheredom committed rL343973: [AMDGPU] Add an AMDGPU specific atomic optimizer..
[AMDGPU] Add an AMDGPU specific atomic optimizer.
Oct 8 2018, 8:51 AM
sheredom closed D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Oct 8 2018, 8:51 AM · Restricted Project
sheredom updated the diff for D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..

Rebased on top of tip to get the IRBuilder change in https://reviews.llvm.org/D52087, and incorporated the changes requested by @nhaehnle resulting from that.

Oct 8 2018, 4:49 AM · Restricted Project
sheredom committed rL343962: [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle..
[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
Oct 8 2018, 3:35 AM
sheredom closed D52087: [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle..
Oct 8 2018, 3:35 AM

Oct 5 2018

sheredom committed rL343842: Add missing period to comment to match style of file..
Add missing period to comment to match style of file.
Oct 5 2018, 2:42 AM

Sep 28 2018

sheredom added a comment to D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

Thanks - LGTM. As before, I'd prefer to have the baseline tests in place before the patch. Let me know if I should commit on your behalf.

Sep 28 2018, 7:30 AM
sheredom updated the diff for D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

Fix latest review comments.

Sep 28 2018, 7:22 AM
sheredom added inline comments to D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..
Sep 28 2018, 6:27 AM
sheredom retitled D52548: Stop instcombining propagating wider shufflevector arguments to predecessors. from Stop instcombining introducing undef's in div/rem instructions. to Stop instcombining propagating wider shufflevector arguments to predecessors..
Sep 28 2018, 6:12 AM
sheredom added inline comments to D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..
Sep 28 2018, 2:43 AM
sheredom updated the diff for D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

Move the shuffle widens check into Instructions.h, and call it increasesLength to match the existing changesLength call that was already on ShuffleVectorInst.

Sep 28 2018, 2:42 AM
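
As a rough illustration of the two checks named in this update, the sketch below models a shuffle by its source vector length and its mask. It is Python rather than the C++ in Instructions.h, and the function names and signatures are illustrative only. It assumes that a shuffle "changes length" when the mask has a different number of elements than the source vector, and "increases length" when the mask has more.

```python
def changes_length(num_source_elts, mask):
    """True when the shuffle result has a different element count than its source."""
    return len(mask) != num_source_elts


def increases_length(num_source_elts, mask):
    """True when the shuffle result is wider than its source."""
    return len(mask) > num_source_elts


# Hypothetical masks applied to a 4-element source vector.
same_width = [3, 2, 1, 0]       # permutation, length unchanged
narrowing = [0, 1]              # extracts a 2-element subvector
widening = [0, 1, 2, 3, 0, 1]   # produces a 6-element result
```

Under these assumptions every widening shuffle also changes length, but a narrowing shuffle changes length without increasing it, which is why a separate predicate is useful for the widening case.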
sheredom added inline comments to D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..
Sep 28 2018, 1:59 AM
sheredom updated the diff for D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

Changed the check to not ever push wider shufflevector stuff back onto predecessor instructions as per spatel's suggestion!

Sep 28 2018, 1:50 AM

Sep 27 2018

sheredom updated the diff for D52540: Fix an ordering bug in the scalarizer..

Incorporate a related test case that my approach also fixes from https://bugs.llvm.org/show_bug.cgi?id=28911

Sep 27 2018, 7:16 AM
sheredom added a comment to D52540: Fix an ordering bug in the scalarizer..

dstenb described a possible fix in a comment of PR28911 and we've been using that fix for a long time now for our out-of-tree target without problems.

Perhaps that fix is a hack and using RPOT is the proper way to deal with this, I've no idea. I just wanted to point out the possibility.

Sep 27 2018, 6:23 AM
sheredom added a comment to D52540: Fix an ordering bug in the scalarizer..

Is this the same problem as described in https://bugs.llvm.org/show_bug.cgi?id=28911 ?

Sep 27 2018, 5:30 AM
sheredom updated the diff for D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

Rebased on tip trunk.

Sep 27 2018, 2:07 AM

Sep 26 2018

sheredom updated the diff for D52556: Add a test case showing the instcombine fail from D52548.
  • Added a comment explaining each of the 3 variants for each of sdiv/srem/udiv/urem
  • Fixed the whitespace issue
  • Removed the FMF flags that were a legacy from the original shader this was reduced from
Sep 26 2018, 10:42 AM
sheredom updated the diff for D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

Removed the fdiv/frem as they weren't currently inst simplifying the bad behaviour, and used the utils script to update the test case.

Sep 26 2018, 8:47 AM
sheredom added a comment to D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

See https://reviews.llvm.org/D52556 for just the test case with the bad output expected.

Sep 26 2018, 8:20 AM
sheredom created D52556: Add a test case showing the instcombine fail from D52548.
Sep 26 2018, 8:20 AM
sheredom added a comment to D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..

Just to be clear you want me to commit the tests in a separate commit with the bad output first?

Sep 26 2018, 7:31 AM
sheredom created D52548: Stop instcombining propagating wider shufflevector arguments to predecessors..
Sep 26 2018, 6:36 AM
sheredom created D52540: Fix an ordering bug in the scalarizer..
Sep 26 2018, 3:06 AM

Sep 25 2018

sheredom added a comment to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..

What happens if a shader already does "if (threadID == 0) { do_atomic(); }"? Is the optimization skipped in this case?

Sep 25 2018, 1:55 AM · Restricted Project

Sep 24 2018

sheredom updated the diff for D52087: [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle..

Removed the no-args overload as per Nicolai's suggestion.

Sep 24 2018, 7:20 AM
sheredom added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Sep 24 2018, 7:20 AM · Restricted Project

Sep 14 2018

sheredom updated the diff for D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..

Added an interaction with the DominatorTree, so that if it's present in the PassManager we can preserve + update it.

Sep 14 2018, 6:25 AM · Restricted Project
sheredom added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Sep 14 2018, 4:38 AM · Restricted Project
sheredom created D52087: [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle..
Sep 14 2018, 4:35 AM
sheredom added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Sep 14 2018, 2:01 AM · Restricted Project

Sep 13 2018

sheredom added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Sep 13 2018, 6:12 AM · Restricted Project

Sep 12 2018

sheredom updated the diff for D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..

Update to include tip LLVM changes (DivergenceAnalysis -> LegacyDivergenceAnalysis, AMDGPUAS has become just an enum now).

Sep 12 2018, 2:44 AM · Restricted Project
sheredom created D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Sep 12 2018, 2:37 AM · Restricted Project

Sep 9 2016

sheredom added a comment to D21723: [RFC] Enhance synchscope representation.

This is something we (Codeplay) would like to see upstream; it is a much cleaner solution than the metadata workarounds everyone (including us) has been using to fix this.

Sep 9 2016, 2:20 AM

Apr 27 2016

sheredom closed D19478: Remove assert mandating you can only use SPIR target with OpenCL.

Thanks!

Apr 27 2016, 1:28 AM

Apr 26 2016

sheredom added a comment to D19478: Remove assert mandating you can only use SPIR target with OpenCL.

So we build a bunch of internal libraries in a mix of OpenCL and C++, and then link them all together to create SPIR libraries that can be fed to calls to clLinkProgram and linked against user kernels.

Apr 26 2016, 2:19 AM

Apr 25 2016

sheredom retitled D19478: Remove assert mandating you can only use SPIR target with OpenCL from to Remove assert mandating you can only use SPIR target with OpenCL.
Apr 25 2016, 3:38 AM

Jan 29 2015

sheredom updated the diff for D7245: Fix OpenCL 1.2 double as an optional core feature behaviour.

Ran clang-format on the file, and there was a TON of line changes out-with the patch. Extracted the formatted lines from my original patch only, and updated the patch.

Jan 29 2015, 4:58 AM
sheredom updated the test plan for D7245: Fix OpenCL 1.2 double as an optional core feature behaviour.
Jan 29 2015, 2:40 AM
sheredom retitled D7245: Fix OpenCL 1.2 double as an optional core feature behaviour from to Fix OpenCL 1.2 double as an optional core feature behaviour.
Jan 29 2015, 2:39 AM