steffenlarsen (Steffen Larsen)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 5 2021, 7:22 AM (11 w, 3 d)

Recent Activity

Today

steffenlarsen requested review of D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions.
Thu, Jun 24, 4:30 AM · Restricted Project, Restricted Project

May 13 2021

steffenlarsen added a comment to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

Do you have the ability to commit to LLVM? If not, I can land the patch on your behalf.

May 13 2021, 10:07 AM · Restricted Project, Restricted Project
steffenlarsen added a comment to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

@tra Thanks a ton for the review! This is my first LLVM patch so I only know as much as the Code Review documentation tells me. Is there a process for chasing up additional reviews?

May 13 2021, 9:37 AM · Restricted Project, Restricted Project

Apr 22 2021

steffenlarsen added a comment to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

Do you know if any existing code already uses the __nvvm_* builtins for cp.async? In other words, does nvcc already provide them, or are we free to name them as we wish? I do not see any relevant intrinsics mentioned in the NVVM IR spec (https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html), and I don't think NVCC's builtins are publicly documented anywhere.

Apr 22 2021, 2:06 AM · Restricted Project, Restricted Project

Apr 21 2021

steffenlarsen updated the diff for D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.
  • Removed integer type from builtin and intrinsic names.
    • Signedness in builtin and intrinsic names moved to operator name, i.e. umin and umax.
    • Removed redundant addition variant.
Apr 21 2021, 3:56 AM · Restricted Project, Restricted Project
steffenlarsen added inline comments to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.
Apr 21 2021, 1:57 AM · Restricted Project, Restricted Project

Apr 20 2021

steffenlarsen added inline comments to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.
Apr 20 2021, 10:18 AM · Restricted Project, Restricted Project

Apr 13 2021

steffenlarsen added a comment to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

Interesting. Reduction across lanes in a warp? If so, this is probably a way to handle the last-step reduction for OpenMP reductions.

Apr 13 2021, 2:29 AM · Restricted Project, Restricted Project

Apr 9 2021

steffenlarsen updated the diff for D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

Following changes:

Apr 9 2021, 7:54 AM · Restricted Project, Restricted Project
steffenlarsen added a comment to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

@tra Thank you for the feedback! I think I see what you're getting at, but I am not quite understanding how it would work for these builtins and intrinsics. I have added some comments to the corresponding feedback about my confusion and/or concerns.

Apr 9 2021, 6:33 AM · Restricted Project, Restricted Project

Apr 8 2021

steffenlarsen requested review of D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.
Apr 8 2021, 9:17 AM · Restricted Project, Restricted Project