Page MenuHomePhabricator

b-sumner (Brian Sumner)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 18 2016, 12:44 PM (353 w, 6 d)

Recent Activity

Fri, May 5

b-sumner added a comment to D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

I think we need to similarly remove a wavefrontsize64 function if compiling in wave32 mode.

I think this is a reasonable request but it's an extension to what this pass currently does.

  • Currently the pass removes functions that rely on features not supported by the -mcpu= GPU.
  • The extension would be to remove functions that rely on features not supported by the -mcpu= GPU as modified by any -mattr= options.

Does that make sense? If so can we please implement that in a generic way instead of having a special hack for wave32/wave4 features?

Fri, May 5, 10:38 AM · Restricted Project, Restricted Project

Apr 29 2023

b-sumner added a comment to D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

I think we need to similarly remove a wavefrontsize64 function if compiling in wave32 mode.

Apr 29 2023, 8:33 AM · Restricted Project, Restricted Project

Apr 27 2023

b-sumner added a comment to D149348: RFD: Do not CSE convergent calls in different basic blocks.

How confident are we in our ability to strip the convergent attribute off of functions that don't need it? Seems like this could cause performance regressions.

Apr 27 2023, 8:10 AM · Restricted Project, Restricted Project

Apr 25 2023

b-sumner added a comment to D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

I think a function with wavefrontsize32 is incompatible with a wavefrontsize64 compilation and similarly for a wavefrontsize64 function with a wavefrontsize32 compilation and we should take the usual steps for incompatible functions.

Apr 25 2023, 7:31 AM · Restricted Project, Restricted Project

Apr 21 2023

b-sumner added a comment to D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

Currently, wavfrontsize32 is being appended by device-libs to some functions for gfx9 targets.

Why? That seems wrong.

It is wrong but that’s how the library is currently structured. All code coexists in the same IR, and there are some wave32 only functions

But if you're compiling for gfx900 then this pass should remove any functions that are marked with +wavefrontsize32. It should not just remove the +wavefrontsize32 from the function.

Apr 21 2023, 7:55 AM · Restricted Project, Restricted Project

Apr 20 2023

b-sumner added a comment to D148796: [AMDGPU][GFX908] Add builtin support for global add atomic f16/f32.

We used to support it that way and decided just not doing it. It is very hard to explain why a supported atomic results in error. Someone who really needs it can use intrinsic.

Apr 20 2023, 12:13 PM · Restricted Project, Restricted Project, Restricted Project

Apr 4 2023

b-sumner added a comment to D147408: [AMDGPU] Iterative scan implementation for atomic optimizer..

Scalar branches may be the most expensive aspect of this algorithm

If not-taken conditional branches are cheap then we could do something like this. It only has one taken branch, when we have finished handling all the active lanes.

  // Inclusive plus-scan v0 into v1. Also leaves the result of the plus-reduction in s3.
  s_mov s0, exec
  s_mov s3, 0 // accumulator
// repeat this section 32 or 64 times:
  s_ff1 s1, s0 // find lowest remaining active lane
  s_cmp_eq s1, -1
  s_cbranch_scc1 end
  s_bitset0 s0, s1
  v_readlane s2, v0, s1
  s_add s3, s2
  v_writelane v1, s3, s1
// end of repeated section
end:
Apr 4 2023, 8:24 AM · Restricted Project, Restricted Project
b-sumner added inline comments to D147408: [AMDGPU] Iterative scan implementation for atomic optimizer..
Apr 4 2023, 7:22 AM · Restricted Project, Restricted Project

Mar 28 2023

b-sumner added a comment to D146840: [AMDGPU] Replace target feature for global fadd32.

No objection here.

Mar 28 2023, 2:43 PM · Restricted Project, Restricted Project, Restricted Project

Mar 22 2023

b-sumner added a comment to D146523: [AMDGPU]: Add new intrinsic llvm.amdgcn.convergent.copy.

It might be better to loop over only active lanes, found using ctz or clz builtins.

Mar 22 2023, 8:04 AM · Restricted Project, Restricted Project

Mar 21 2023

b-sumner added a comment to D146523: [AMDGPU]: Add new intrinsic llvm.amdgcn.convergent.copy.

FWIW, there is no desire to read from inactive lanes. The loop is supposed to only be reading from, and writing to, lanes that were active before the for loop is executed by a select single lane.

Then I'm back to not understanding what this convergent copy is for. I'd need to see a more complete example.

Mar 21 2023, 1:27 PM · Restricted Project, Restricted Project
b-sumner added a comment to D146523: [AMDGPU]: Add new intrinsic llvm.amdgcn.convergent.copy.

FWIW, there is no desire to read from inactive lanes. The loop is supposed to only be reading from, and writing to, lanes that were active before the for loop is executed by a select single lane.

Mar 21 2023, 1:10 PM · Restricted Project, Restricted Project

Mar 5 2023

b-sumner added a comment to D145343: [AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__`.

I think exposing whether or not the flag was used is weird/broken, as is including _OPTION in the name. Should just define to whether it's enabled or not

I agree. @b-sumner What do you think?

Mar 5 2023, 7:33 PM · Restricted Project, Restricted Project, Restricted Project

Feb 14 2023

b-sumner added a comment to D142507: [AMDGPU] Split dot7 feature.

My current understanding is the c-p will go into already forked clang-16, but not to rocm 5.4. So rocm device-libs will be accompanied by the older clang-16 w/o this and stay compatible. Someone building from scratch will use latest clang-16 and staging device-libs with this change. Do you think this will work?

Feb 14 2023, 3:48 PM · Restricted Project, Restricted Project, Restricted Project
b-sumner added a comment to D142507: [AMDGPU] Split dot7 feature.

I think unless conflicts arise creating an issue similar to this https://github.com/llvm/llvm-project/issues/60600 with the cherry-pick line set to this commit should be enough. (See also https://llvm.org/docs/GitHub.html).

I believe it will need D142407 to be cherry-picked as well to apply cleanly. Otherwise I do not expect conflicts. So the c-p need to go into release/16.x, right?
Let's wait for @b-sumner first anyway, he is maintaining device-lib.

Feb 14 2023, 3:15 PM · Restricted Project, Restricted Project, Restricted Project

Feb 9 2023

b-sumner added a comment to D143643: [AMDGPU] Remove unused ClangBuiltin definition for fmed3.

The device library is using the builtin, so I assume the build will fail when this change lands.

This doesn’t change the builtin at all

Feb 9 2023, 7:39 AM · Restricted Project, Restricted Project
b-sumner added a comment to D143643: [AMDGPU] Remove unused ClangBuiltin definition for fmed3.

Also, I see no median intrinsic described in the LLVM programming guide.

Feb 9 2023, 7:35 AM · Restricted Project, Restricted Project
b-sumner added a comment to D143643: [AMDGPU] Remove unused ClangBuiltin definition for fmed3.

The device library is using the builtin, so I assume the build will fail when this change lands.

Feb 9 2023, 7:33 AM · Restricted Project, Restricted Project

Jan 23 2023

b-sumner added a comment to D142407: [AMDGPU] Split dot8 feature.

No objections here...

Jan 23 2023, 3:38 PM · Restricted Project, Restricted Project, Restricted Project

Nov 22 2022

b-sumner added a comment to D138507: HIP: Directly use sqrt builtins instead of calling ocml (f32 case).

__builtin_sqrtf does not produce a correctly rounded result. I don't recommend this change.

Nov 22 2022, 8:58 AM · Restricted Project

Oct 28 2022

b-sumner added a comment to D136981: [HIP] add float to fp16 convert functions.

Thank you!

Oct 28 2022, 2:13 PM · Restricted Project, Restricted Project

Oct 12 2022

b-sumner added a comment to D135733: AMDGPU: Treat asm as a hazard for all register read-after-write hazards.

We're in a lose-lose situation here. No matter what we do, someone will complain. But we're are definitely on more solid footing by going for correctness.

Oct 12 2022, 12:46 PM · Restricted Project, Restricted Project

Oct 11 2022

b-sumner added a comment to D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang.

Different functions providing different behaviors can be handled at link time like any other function, instead of the same functions providing different behaviors per translation unit and requires cloning. The current scheme transfers complexity from the device library build system into the driver and user binaries

Oct 11 2022, 2:20 PM · Restricted Project, Restricted Project
b-sumner added a comment to D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang.

I don't like the fact that we need to have two different kinds of control constants, one per-TU and others per-link job. I'm wondering how difficult it would be to make the fast versions of the math calls use different entry points. That way we could handle this in the math header wrappers.

That's really how the C linkage model wants you to handle this. I also would like to have FP value tracking optimizations take care of the special cases in the library code

Oct 11 2022, 1:33 PM · Restricted Project, Restricted Project
b-sumner added a comment to D135614: [OpenMP][CUDA][AMDGPU] Accept case insensitive subarchitecture names.

Also, we may want to use uppercase for other purposes in the future.

Oct 11 2022, 8:18 AM · Restricted Project, Restricted Project, Restricted Project
b-sumner added a comment to D135614: [OpenMP][CUDA][AMDGPU] Accept case insensitive subarchitecture names.

I don't particularly see a need for this. I am not opposed to a "did you mean" in the error diagnostic.

Oct 11 2022, 8:17 AM · Restricted Project, Restricted Project, Restricted Project

Sep 22 2022

b-sumner added a comment to D134355: [AMDGPU] Emit module flag for all code object versions.

LGTM

Should the module flag name be amdgpu_code_object_version or amdhsa_code_object_version?

Good question.

@b-sumner Does code object version affects PAL? Thanks.

Sep 22 2022, 11:52 AM · Restricted Project, Restricted Project

Sep 16 2022

b-sumner added inline comments to D131560: AMDGPU: Improve atomicrmw fadd selection.
Sep 16 2022, 7:26 AM · Restricted Project, Restricted Project

Sep 15 2022

b-sumner added inline comments to D131560: AMDGPU: Improve atomicrmw fadd selection.
Sep 15 2022, 6:17 PM · Restricted Project, Restricted Project

Aug 19 2022

b-sumner added a comment to D132140: [AMDGPU] Add builtin s_sendmsg_rtn.

revised by Brian's comments

Aug 19 2022, 2:19 PM · Restricted Project, Restricted Project

Aug 18 2022

b-sumner added a comment to D132140: [AMDGPU] Add builtin s_sendmsg_rtn.

Following existing naming, it might make sense to rename "rtn_b32" --> "rtn" and "rtn_b64" --> "rtnl".

Aug 18 2022, 8:41 AM · Restricted Project, Restricted Project

Aug 5 2022

b-sumner added a comment to D131276: AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true.

I also think we're still missing a module flag to indicate the code object version

Aug 5 2022, 12:43 PM · Restricted Project, Restricted Project

Aug 4 2022

b-sumner added inline comments to D130729: [InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics.
Aug 4 2022, 2:00 PM · Restricted Project, Restricted Project

Jul 14 2022

b-sumner added a comment to D99071: [ASAN][AMDGPU] Add support for accesses to global and constant addrspaces.

Are declarations for @llvm.amdgcn.is.shared supposed to be showing up in my IR, even when my target triple is _not_ amdgcn-amd?

Jul 14 2022, 9:53 AM · Restricted Project, Restricted Project

Jun 24 2022

b-sumner added a comment to D128344: [AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map.

The "is_" prefix is not helpful. I suggest either "uses_dynamic_stack" or just "dynamic_stack".

Jun 24 2022, 8:02 AM · Restricted Project, Restricted Project

Jun 23 2022

b-sumner added inline comments to D128022: [HIP] add -fhip-kernel-arg-name.
Jun 23 2022, 10:02 AM · Restricted Project, Restricted Project

Jun 14 2022

b-sumner added a comment to D123693: Transform illegal intrinsics to V_ILLEGAL.

Thanks @bcahoon. If there were another available approach that always worked, we would use it.

Jun 14 2022, 7:02 AM · Restricted Project, Restricted Project

Jun 7 2022

b-sumner added a comment to D127241: [AMDGPU] gfx11 add bits to COMPUTE_PGM_RSRC3.

This should include updates to https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor if not planned for a different patch.

Jun 7 2022, 12:30 PM · Restricted Project, Restricted Project

Apr 27 2022

b-sumner added a comment to D124537: [AMDGPU][clang] Definition of gfx11 subtarget.

t-tye should review this too.

Apr 27 2022, 10:31 AM · Restricted Project, Restricted Project

Apr 25 2022

b-sumner added inline comments to D124387: AMDGPU: Fold out readfirstlane between vgpr to vgpr copies.
Apr 25 2022, 1:29 PM · Restricted Project, Restricted Project
b-sumner added inline comments to D124387: AMDGPU: Fold out readfirstlane between vgpr to vgpr copies.
Apr 25 2022, 9:42 AM · Restricted Project, Restricted Project

Apr 13 2022

b-sumner added a comment to D123693: Transform illegal intrinsics to V_ILLEGAL.

I think this is too focused. There are other image_sample_lz intrinsics and all of them potentially need to be replaced if following this approach.

Apr 13 2022, 10:50 AM · Restricted Project, Restricted Project

Apr 11 2022

b-sumner added inline comments to D123548: AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally.
Apr 11 2022, 4:47 PM · Restricted Project, Restricted Project

Apr 7 2022

b-sumner added a comment to D123346: AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5.

Looks fine to me!

Apr 7 2022, 7:35 PM · Restricted Project, Restricted Project

Mar 21 2022

b-sumner added a comment to D122190: [AMDGPU][LLD] Adding support for ABI version 5 option.

This change goes along with 1194b9cdda30d and should probably have been landed with it.

Mar 21 2022, 3:38 PM · Restricted Project, Restricted Project

Feb 18 2022

b-sumner added a comment to D120150: Constant folding of llvm.amdgcn.trig.preop.

Ran the test over the entire range of double floating-point.

Just curious: how long does it take to test all 2^64 inputs?

Feb 18 2022, 11:44 AM · Restricted Project, Restricted Project

Feb 16 2022

b-sumner added inline comments to D119027: [AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg.
Feb 16 2022, 1:20 PM · Restricted Project

Feb 2 2022

b-sumner added a comment to D118229: [AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers.

This change introduces the regression in OpenCL conformance test: basic - kernel_memory_alignment_local. Does it require any corresponding runtime changes?

Is clang correctly emitting the align attribute on all these arguments?

clang does not do anything special for alignment of pointer type kernel arg. It assumes the pointee alignment is default 1. https://godbolt.org/z/xs195rKoW

Question is: What OpenCL spec says about kernel arg pointee alignment? @b-sumner @Anastasia

It should be the ABI type alignment as was used before

I doubt OpenCL spec requires alignment of pointer-type kernel argument. I suspect it is part of our own undocumented ABI. If we implement this in clang, it probably goes to TargetABIInfo.

Yes it does, it requires natural alignment for all types which is what that conformance test is checking.

Feb 2 2022, 12:10 PM · Restricted Project

Dec 2 2021

b-sumner added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

Can we agree to drop the macro in LLVM 15 and note that in a comment or elsewhere?

Dec 2 2021, 10:11 AM · Restricted Project, Restricted Project
b-sumner added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

This is a flag-day change to the signatures of the LLVM intrinsics and the OpenCL builtins. Is that OK?

This breaks users' code. If we have to do this, at least let clang emit a pre-defined macro e.g. __amdgcn_bvh_use_vec3__=1 so that users can make their code work before and after the change.

I don't know anything about OpenCL macros. Is it good enough to put this in AMDGPUTargetInfo::getTargetDefines:

if (Opts.OpenCL)
  Builder.defineMacro("__amdgcn_bvh_use_vec3__");

Does it need tests, documentation, etc?

Dec 2 2021, 7:21 AM · Restricted Project, Restricted Project
b-sumner added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

This is a flag-day change to the signatures of the LLVM intrinsics and the OpenCL builtins. Is that OK?

This breaks users' code. If we have to do this, at least let clang emit a pre-defined macro e.g. __amdgcn_bvh_use_vec3__=1 so that users can make their code work before and after the change.

I do not think it's worth introducing a macro for this. Are there actually C users of these builtins?

Yes we have users who use these clang builtins. We have received quite a few complaints about making breaking API changes without a way to detect them in the program.

Dec 2 2021, 7:15 AM · Restricted Project, Restricted Project

Sep 27 2021

b-sumner added inline comments to D110468: [AMDGPU] Do not internalize ASan device library functions..
Sep 27 2021, 10:15 AM · Restricted Project
b-sumner added inline comments to D110468: [AMDGPU] Do not internalize ASan device library functions..
Sep 27 2021, 10:12 AM · Restricted Project

Jun 10 2021

b-sumner added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers..

I believe that for the purposes of detecting lane 0 mbcnt_lo is sufficient.

Jun 10 2021, 7:05 AM · Restricted Project

Jun 8 2021

b-sumner added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers..

Two approaches for limiting the stores to lane 0 of each wave:

  1. Write 1 to exec mask, store, and write -1 to exec mask. This works since the exec mask at the start of the wave when this happens is -1
  2. Check for lane == 0 and branch. The lane can be computed by a) wave64: builtin_amdgcn_mbcnt_hi(~0u, builtin_amdgcn_mbcnt_lo(~0u, 0u)) b) wave32: __builtin_amdgcn_mbcnt_lo(~0u, 0u)
Jun 8 2021, 7:24 AM · Restricted Project

May 13 2021

b-sumner added a comment to D102347: [AMDGPU] Only allow global fp atomics with unsafe option.

This looks good to me.

May 13 2021, 7:03 AM · Restricted Project

May 6 2021

b-sumner added inline comments to D102022: [AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32.
May 6 2021, 3:34 PM · Restricted Project, Restricted Project

Mar 19 2021

b-sumner added inline comments to D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer.
Mar 19 2021, 12:25 PM · Restricted Project
b-sumner added inline comments to D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer.
Mar 19 2021, 11:25 AM · Restricted Project

Jan 25 2021

b-sumner added a comment to D95391: AMDGPU: Add support for amdgpu-unsafe-fp-atomics attribute.

That is the safest thing to do. However, I'm now hearing some thinking that memory likely to be targeted by f.p. atomics is likely to be cached. But again, better to be safe at least until we're clear this will always be the case.

Jan 25 2021, 3:32 PM · Restricted Project

Jan 20 2021

b-sumner added a comment to D95102: AMDGPU: Remove v_rsq_f64 patterns.

Thanks. LGTM.

Jan 20 2021, 6:52 PM · Restricted Project
b-sumner added a comment to D95101: AMDGPU: Use more accurate fast f64 fdiv.

That sequence will give a very accurate result as long as overflow and underflow is avoided. LGTM.

Jan 20 2021, 6:50 PM · Restricted Project

Nov 5 2020

b-sumner added a comment to D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..

Should this also be IntrConvergent?

Nov 5 2020, 11:59 AM · Restricted Project, Restricted Project

Oct 22 2020

b-sumner added a comment to D89966: [HIP] Fix HIP rounding math intrinsics.

Looks good to me.

Oct 22 2020, 8:51 AM
b-sumner added inline comments to D89966: [HIP] Fix HIP rounding math intrinsics.
Oct 22 2020, 8:26 AM

Jul 9 2020

b-sumner added a comment to D82818: AMDGPU: Remove .value_type from kernel metadata.

Do we also want to remove it from v2 metadata?

Probably, but I looked briefly and didn't actually see the direct equivalent

Jul 9 2020, 12:25 PM · Restricted Project

May 28 2020

b-sumner added a comment to D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.

LGTM

May 28 2020, 5:38 PM · debug-info, Restricted Project
b-sumner added inline comments to D80702: [AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics.
May 28 2020, 10:23 AM · Restricted Project

May 7 2020

b-sumner added a comment to D79580: AMDGPU: Don't assert on unknown address spaces.

How do we get these address spaces? I'd rather error out.

You can use attribute((address_space)) or hand write IR. We should not error on valid IR. Someday a frontend may wish to track special information with a custom address space number, which we can handle as a global alias similar to how x86 accepts any arbitrary address space

It sounds dangerous. An unknown address space has unknown semantics. How could we assume anything about it?

It doesn't have unknown semantics, it has target defined semantics. We can interpret them however we want. At a minimum any pointer computation on the values should work, even if load and store don't select

Is there any particular motivation to add this support?

Just generally making a best effort attempt to handle any IR that passes the verifier. We already try to handle these in a variety of other places the same way, and theoretically someone could want to use it.

May 7 2020, 5:23 PM · Restricted Project

May 2 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

It takes more than support in the ISA for some features. The OpenCL driver may not want to support a given optional feature, e.g. images. I'm not opposed to defaults, but if the driver chooses to not support images, it needs to be able to prevent __IMAGE_SUPPORT__ from being defined. Conformance will fail if the runtime and compiler are not consistent.

The driver details should be captured by the the triple. If some weird driver decided to do something different, we would need to add a new triple for it. We don't have such a driver, so I don't see why worry about it. It's possible to work around with undef and redef in an implicitly included header. We need to fix properties of the driver based on the target to have perfectly matching offline compilation

I don't see anywhere in the triple talking about driver specific details, unless you would use the environment? That seems like overkill to me. But again, I'm not opposed to defaults, and as long as the driver can override them, this should be OK.

The OS is the driver. It doesn't need to specifically encode these details; the OS should encode properties of the driver environment. Anything using -amdhsa should be reporting image support

May 2 2020, 10:03 AM

Apr 22 2020

b-sumner added inline comments to D75917: Expose llvm fence instruction as clang intrinsic.
Apr 22 2020, 8:38 AM · Restricted Project

Apr 17 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

It takes more than support in the ISA for some features. The OpenCL driver may not want to support a given optional feature, e.g. images. I'm not opposed to defaults, but if the driver chooses to not support images, it needs to be able to prevent __IMAGE_SUPPORT__ from being defined. Conformance will fail if the runtime and compiler are not consistent.

The driver details should be captured by the the triple. If some weird driver decided to do something different, we would need to add a new triple for it. We don't have such a driver, so I don't see why worry about it. It's possible to work around with undef and redef in an implicitly included header. We need to fix properties of the driver based on the target to have perfectly matching offline compilation

Apr 17 2020, 10:14 AM

Apr 14 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

Apr 14 2020, 2:04 PM

Apr 13 2020

b-sumner added a comment to D77923: OpenCL: Fix some missing predefined macros.

In my opinion, for on-line compile for OpenCL, the platform is responsible for setting __OPENCL_VERSION__. Also, it should be the platform's choice as to how to respond the image support query and how __IMAGE_SUPPORT__ is set. For offline compile, it doesn't seem unreasonable to ask the developer to set these.

Apr 13 2020, 2:41 PM · Restricted Project

Apr 11 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

Apr 11 2020, 10:39 AM

Apr 8 2020

b-sumner added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

In addition to predefining __ATOMIC_RELAXED, etc., clang also predefines __OPENCL_MEMORY_SCOPE_WORK_ITEM and friends. So it doesn't really seem unreasonable for clang to also predefine its known syncscopes, and to require the argument to be one of those integers.

Apr 8 2020, 2:41 PM · Restricted Project

Apr 3 2020

b-sumner added a comment to D77390: Fix __builtin_amdgcn_workgroup_size_x/y/z return type.

LGTM

Apr 3 2020, 8:02 AM

Apr 2 2020

b-sumner added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

Please go ahead and update to a string for the scope.

Apr 2 2020, 9:12 AM · Restricted Project

Mar 25 2020

b-sumner added a comment to D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024.

Thanks. This looks fine to me.

Mar 25 2020, 2:38 PM · Restricted Project
b-sumner added inline comments to D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024.
Mar 25 2020, 12:59 PM · Restricted Project

Mar 20 2020

b-sumner added inline comments to D76356: [AMDGPU] Introduce more scratch registers in the ABI..
Mar 20 2020, 11:55 AM · Restricted Project

Mar 12 2020

b-sumner added a comment to D76107: AMDGPU: Don't handle kernarg.segment.ptr in functions.

I'm not aware of any valid explicit uses of this intrinsic by library or developer code, so this seems fine to me.

Mar 12 2020, 4:50 PM

Feb 20 2020

b-sumner added a comment to D74910: [OpenCL] Remove spurious atomic_fetch_min/max builtins.

I recall we agreed that conformance tests using mixed types were broken, so this change should be OK. Hopefully this will not affect users.

Feb 20 2020, 10:02 AM · Restricted Project

Feb 19 2020

b-sumner accepted D74807: Add cl_khr_mipmap_image_writes as supported to AMDGPU.

LGTM

Feb 19 2020, 9:03 AM · Restricted Project

Jan 7 2020

b-sumner added a comment to D71365: expand printf when compiling HIP to AMDGPU.

Should this be looking forward to also handling OpenCL, which does require vector support?

Jan 7 2020, 6:27 AM · Restricted Project, Restricted Project

Jan 6 2020

b-sumner added a comment to D71357: AMDGPU: Assume f32 denormals are enabled by default.

I am OK with proceeding here.

Jan 6 2020, 9:47 AM · Restricted Project
b-sumner added a comment to D71358: AMDGPU: Remove denormal subtarget features.

I am OK with proceeding here.

Jan 6 2020, 9:47 AM · Restricted Project

Dec 11 2019

b-sumner added a comment to D71358: AMDGPU: Remove denormal subtarget features.

How is it now achieved that we do not support denormals by default before GFX9?

This is a frontend decision. clang emits the attribute based on the subtarget preference

This is also how it always worked. The backend uniformly treated denormals as off by default

Dec 11 2019, 10:39 AM · Restricted Project

Dec 10 2019

b-sumner added a comment to D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set.

This looks OK to me, although tuning on correctly rounded division any time denorms are enabled is not actually required by OpenCL.

Dec 10 2019, 3:19 PM · Restricted Project

Aug 14 2019

b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Do we really need these to be "amdgpu" specific?

Aug 14 2019, 11:12 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me.

Aug 14 2019, 10:56 AM
b-sumner added a comment to D66198: AMDGPU: Add builtins for is_local/is_private.

Looks fine to me.

Aug 14 2019, 10:52 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me. Thanks!

I wonder if is.local should be is.shared because that's what getreg calls this

Aug 14 2019, 9:07 AM
b-sumner added a comment to D66198: AMDGPU: Add builtins for is_local/is_private.

Looks fine to me. Thanks!

Aug 14 2019, 8:09 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me. Thanks!

Aug 14 2019, 8:05 AM

Jun 14 2019

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

The planned change is to make the backend more conservative, so it shouldn't break other frontends

Jun 14 2019, 7:35 AM

Jun 10 2019

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

Jun 10 2019, 9:32 AM

May 31 2019

b-sumner added inline comments to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.
May 31 2019, 9:59 AM

Apr 25 2019

b-sumner added a comment to D61112: AMDGPU: Enable _Float16.

Looks good to me.

Apr 25 2019, 6:57 AM

Mar 18 2019

b-sumner added inline comments to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 5:16 PM · Restricted Project