Page MenuHomePhabricator

b-sumner (Brian Sumner)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 18 2016, 12:44 PM (252 w, 3 d)

Recent Activity

Thu, Jun 10

b-sumner added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers..

I believe that for the purposes of detecting lane 0 mbcnt_lo is sufficient.

Thu, Jun 10, 7:05 AM · Restricted Project

Tue, Jun 8

b-sumner added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers..

Two approaches for limiting the stores to lane 0 of each wave:

  1. Write 1 to exec mask, store, and write -1 to exec mask. This works since the exec mask at the start of the wave when this happens is zero
  2. Check for lane == 0 and branch. The lane can be computed by a) wave64: builtin_amdgcn_mbcnt_hi(~0u, builtin_amdgcn_mbcnt_lo(~0u, 0u)) b) wave32: __builtin_amdgcn_mbcnt_lo(~0u, 0u)
Tue, Jun 8, 7:24 AM · Restricted Project

May 13 2021

b-sumner added a comment to D102347: [AMDGPU] Only allow global fp atomics with unsafe option.

This looks good to me.

May 13 2021, 7:03 AM · Restricted Project

May 6 2021

b-sumner added inline comments to D102022: [AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32.
May 6 2021, 3:34 PM · Restricted Project, Restricted Project

Mar 19 2021

b-sumner added inline comments to D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer.
Mar 19 2021, 12:25 PM · Restricted Project
b-sumner added inline comments to D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer.
Mar 19 2021, 11:25 AM · Restricted Project

Jan 25 2021

b-sumner added a comment to D95391: AMDGPU: Add support for amdgpu-unsafe-fp-atomics attribute.

That is the safest thing to do. However, I'm now hearing some thinking that memory likely to be targeted by f.p. atomics is likely to be cached. But again, better to be safe at least until we're clear this will always be the case.

Jan 25 2021, 3:32 PM · Restricted Project

Jan 20 2021

b-sumner added a comment to D95102: AMDGPU: Remove v_rsq_f64 patterns.

Thanks. LGTM.

Jan 20 2021, 6:52 PM · Restricted Project
b-sumner added a comment to D95101: AMDGPU: Use more accurate fast f64 fdiv.

That sequence will give a very accurate result as long as overflow and underflow is avoided. LGTM.

Jan 20 2021, 6:50 PM · Restricted Project

Nov 5 2020

b-sumner added a comment to D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..

Should this also be IntrConvergent?

Nov 5 2020, 11:59 AM · Restricted Project, Restricted Project

Oct 22 2020

b-sumner added a comment to D89966: [HIP] Fix HIP rounding math intrinsics.

Looks good to me.

Oct 22 2020, 8:51 AM
b-sumner added inline comments to D89966: [HIP] Fix HIP rounding math intrinsics.
Oct 22 2020, 8:26 AM

Jul 9 2020

b-sumner added a comment to D82818: AMDGPU: Remove .value_type from kernel metadata.

Do we also want to remove it from v2 metadata?

Probably, but I looked briefly and didn't actually see the direct equivalent

Jul 9 2020, 12:25 PM · Restricted Project

May 28 2020

b-sumner added a comment to D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.

LGTM

May 28 2020, 5:38 PM · debug-info, Restricted Project
b-sumner added inline comments to D80702: [AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics.
May 28 2020, 10:23 AM · Restricted Project

May 7 2020

b-sumner added a comment to D79580: AMDGPU: Don't assert on unknown address spaces.

How do we get these address spaces? I'd rather error out.

You can use attribute((address_space)) or hand write IR. We should not error on valid IR. Someday a frontend may wish to track special information with a custom address space number, which we can handle as a global alias similar to how x86 accepts any arbitrary address space

It sounds dangerous. An unknown address space has unknown semantics. How could we assume anything about it?

It doesn't have unknown semantics, it has target defined semantics. We can interpret them however we want. At a minimum any pointer computation on the values should work, even if load and store don't select

Is there any particular motivation to add this support?

Just generally making a best effort attempt to handle any IR that passes the verifier. We already try to handle these in a variety of other places the same way, and theoretically someone could want to use it.

May 7 2020, 5:23 PM · Restricted Project

May 2 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

It takes more than support in the ISA for some features. The OpenCL driver may not want to support a given optional feature, e.g. images. I'm not opposed to defaults, but if the driver chooses to not support images, it needs to be able to prevent __IMAGE_SUPPORT__ from being defined. Conformance will fail if the runtime and compiler are not consistent.

The driver details should be captured by the the triple. If some weird driver decided to do something different, we would need to add a new triple for it. We don't have such a driver, so I don't see why worry about it. It's possible to work around with undef and redef in an implicitly included header. We need to fix properties of the driver based on the target to have perfectly matching offline compilation

I don't see anywhere in the triple talking about driver specific details, unless you would use the environment? That seems like overkill to me. But again, I'm not opposed to defaults, and as long as the driver can override them, this should be OK.

The OS is the driver. It doesn't need to specifically encode these details; the OS should encode properties of the driver environment. Anything using -amdhsa should be reporting image support

May 2 2020, 10:03 AM

Apr 22 2020

b-sumner added inline comments to D75917: Expose llvm fence instruction as clang intrinsic.
Apr 22 2020, 8:38 AM · Restricted Project

Apr 17 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

It takes more than support in the ISA for some features. The OpenCL driver may not want to support a given optional feature, e.g. images. I'm not opposed to defaults, but if the driver chooses to not support images, it needs to be able to prevent __IMAGE_SUPPORT__ from being defined. Conformance will fail if the runtime and compiler are not consistent.

The driver details should be captured by the the triple. If some weird driver decided to do something different, we would need to add a new triple for it. We don't have such a driver, so I don't see why worry about it. It's possible to work around with undef and redef in an implicitly included header. We need to fix properties of the driver based on the target to have perfectly matching offline compilation

Apr 17 2020, 10:14 AM

Apr 14 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

Apr 14 2020, 2:04 PM

Apr 13 2020

b-sumner added a comment to D77923: OpenCL: Fix some missing predefined macros.

In my opinion, for on-line compile for OpenCL, the platform is responsible for setting __OPENCL_VERSION__. Also, it should be the platform's choice as to how to respond the image support query and how __IMAGE_SUPPORT__ is set. For offline compile, it doesn't seem unreasonable to ask the developer to set these.

Apr 13 2020, 2:41 PM · Restricted Project

Apr 11 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

Apr 11 2020, 10:39 AM

Apr 8 2020

b-sumner added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

In addition to predefining __ATOMIC_RELAXED, etc., clang also predefines __OPENCL_MEMORY_SCOPE_WORK_ITEM and friends. So it doesn't really seem unreasonable for clang to also predefine its known syncscopes, and to require the argument to be one of those integers.

Apr 8 2020, 2:41 PM · Restricted Project

Apr 3 2020

b-sumner added a comment to D77390: Fix __builtin_amdgcn_workgroup_size_x/y/z return type.

LGTM

Apr 3 2020, 8:02 AM

Apr 2 2020

b-sumner added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

Please go ahead and update to a string for the scope.

Apr 2 2020, 9:12 AM · Restricted Project

Mar 25 2020

b-sumner added a comment to D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024.

Thanks. This looks fine to me.

Mar 25 2020, 2:38 PM · Restricted Project
b-sumner added inline comments to D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024.
Mar 25 2020, 12:59 PM · Restricted Project

Mar 20 2020

b-sumner added inline comments to D76356: [AMDGPU] Introduce more scratch registers in the ABI..
Mar 20 2020, 11:55 AM · Restricted Project

Mar 12 2020

b-sumner added a comment to D76107: AMDGPU: Don't handle kernarg.segment.ptr in functions.

I'm not aware of any valid explicit uses of this intrinsic by library or developer code, so this seems fine to me.

Mar 12 2020, 4:50 PM

Feb 20 2020

b-sumner added a comment to D74910: [OpenCL] Remove spurious atomic_fetch_min/max builtins.

I recall we agreed that conformance tests using mixed types were broken, so this change should be OK. Hopefully this will not affect users.

Feb 20 2020, 10:02 AM · Restricted Project

Feb 19 2020

b-sumner accepted D74807: Add cl_khr_mipmap_image_writes as supported to AMDGPU.

LGTM

Feb 19 2020, 9:03 AM · Restricted Project

Jan 7 2020

b-sumner added a comment to D71365: expand printf when compiling HIP to AMDGPU.

Should this be looking forward to also handling OpenCL, which does require vector support?

Jan 7 2020, 6:27 AM · Restricted Project, Restricted Project

Jan 6 2020

b-sumner added a comment to D71357: AMDGPU: Assume f32 denormals are enabled by default.

I am OK with proceeding here.

Jan 6 2020, 9:47 AM · Restricted Project
b-sumner added a comment to D71358: AMDGPU: Remove denormal subtarget features.

I am OK with proceeding here.

Jan 6 2020, 9:47 AM · Restricted Project

Dec 11 2019

b-sumner added a comment to D71358: AMDGPU: Remove denormal subtarget features.

How is it now achieved that we do not support denormals by default before GFX9?

This is a frontend decision. clang emits the attribute based on the subtarget preference

This is also how it always worked. The backend uniformly treated denormals as off by default

Dec 11 2019, 10:39 AM · Restricted Project

Dec 10 2019

b-sumner added a comment to D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set.

This looks OK to me, although tuning on correctly rounded division any time denorms are enabled is not actually required by OpenCL.

Dec 10 2019, 3:19 PM · Restricted Project

Aug 14 2019

b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Do we really need these to be "amdgpu" specific?

Aug 14 2019, 11:12 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me.

Aug 14 2019, 10:56 AM
b-sumner added a comment to D66198: AMDGPU: Add builtins for is_local/is_private.

Looks fine to me.

Aug 14 2019, 10:52 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me. Thanks!

I wonder if is.local should be is.shared because that's what getreg calls this

Aug 14 2019, 9:07 AM
b-sumner added a comment to D66198: AMDGPU: Add builtins for is_local/is_private.

Looks fine to me. Thanks!

Aug 14 2019, 8:09 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me. Thanks!

Aug 14 2019, 8:05 AM

Jun 14 2019

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

The planned change is to make the backend more conservative, so it shouldn't break other frontends

Jun 14 2019, 7:35 AM

Jun 10 2019

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

Jun 10 2019, 9:32 AM

May 31 2019

b-sumner added inline comments to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.
May 31 2019, 9:59 AM

Apr 25 2019

b-sumner added a comment to D61112: AMDGPU: Enable _Float16.

Looks good to me.

Apr 25 2019, 6:57 AM

Mar 18 2019

b-sumner added inline comments to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 5:16 PM · Restricted Project
b-sumner added a comment to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 4:14 PM · Restricted Project

Feb 6 2019

b-sumner added a comment to D57831: AMDGPU: set wchar_t and wint_t to be unsigned short on windows.

Maybe there are already other types like this, but it saddens me that an offline compiled code object could potentially not work properly if the application is using any of these types. Or should the runtime try to detect a problem using argument metadata?

Feb 6 2019, 11:34 AM

Jan 28 2019

b-sumner added a comment to D57340: AMDGPU: Add DS append/consume intrinsics.

I think it is perfectly reasonable to treat these as essentially relaxed-only atomic RMW operations and require the application to use fences or barriers if necessary. The ordering and scope are only needed if we ever need this operation to act as a non-relaxed atomic RMW.

Jan 28 2019, 10:15 AM

Nov 1 2018

b-sumner added a comment to D50633: [AMDGPU] Add new Mode Register pass.

Actually the conversions don't need non-default-rounded operations, nor are non-default-rounded arithmetic operations required by OpenCL. However, we've had requests to implement functions such as add_rtz(x,y) which computes x+y with round-to-zero rounding. Our competitors offer such functions, and we implemented them for HSAIL. So we are really trying to get back to parity with HSAIL.

Nov 1 2018, 12:40 PM
b-sumner added a comment to D50633: [AMDGPU] Add new Mode Register pass.

One thing we've wanted for compute for quite a while now is a way to request non-default-rounded add, sub, mul, div, fma, and sqrt. Assuming we ever figure out how to represent these in the IR, ideally without falling back on intrinsics, could this approach be used to implement and minimize the mode changes for those as well?

Nov 1 2018, 11:28 AM

Oct 16 2018

b-sumner added a comment to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.

Ping. There's quite a bit of interest in getting this exposed by clang.

Oct 16 2018, 10:11 AM

Sep 28 2018

b-sumner added inline comments to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Sep 28 2018, 3:47 PM

Aug 7 2018

b-sumner added a comment to D50376: AMDGPU: Fix enabling denormals by default on pre-VI targets.

This approach seems fine to me.

Aug 7 2018, 7:39 AM

Jul 10 2018

b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.

Current pattern matching does not support float element type yet, it will be supported next.

You are right, there is a typo in the summary. It should be:
fma((f16)S0.x, (f16)S1.x fma((f16)S0.y, (f16)S1.y, (f16)z)) -> ftrunc(fdot2(S0, S1, (f32)z))

Jul 10 2018, 12:41 PM
b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.

Jul 10 2018, 10:51 AM
b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

This operation only rounds a single time, and unfortunately always flushes f32 denorms. Thus this transformation should only be done when unsafe math is requested.

Jul 10 2018, 10:30 AM

Jun 27 2018

b-sumner added a comment to D48667: [HIP] Fix ordering of device-lib linking.

Thanks, looks good.

Jun 27 2018, 12:20 PM · Restricted Project, Restricted Project

Jun 22 2018

b-sumner added a comment to D48493: [HIP] Support flush denorms bitcode.

LGTM

Jun 22 2018, 10:44 AM · Restricted Project, Restricted Project

Jun 13 2018

b-sumner accepted D48094: [AMDGPU] Change enqueue kernel handle type.

Looks good to me.

Jun 13 2018, 8:08 AM

Jun 5 2018

b-sumner added inline comments to D47566: AMDHSA: Code object v3 updates.
Jun 5 2018, 5:46 AM

May 8 2018

b-sumner added a comment to D46601: [OpenCL] Fix typos in emitted enqueue kernel function names.

Thanks! Looks good to me.

May 8 2018, 1:52 PM

Mar 23 2018

b-sumner added a comment to D44718: [AMDGPU] Define code object identification string used in AMDHSA runtimes..

LGTM

Mar 23 2018, 1:06 PM
b-sumner added a comment to D44718: [AMDGPU] Define code object identification string used in AMDHSA runtimes..

LGTM, but I'd rather use '+' instead of ',' for the features.

Mar 23 2018, 12:26 PM

Mar 22 2018

b-sumner added a comment to D39739: [HCC] Add flag to Import Weak Functions in Function Importer.

Is first one encountered a poor design?

Mar 22 2018, 8:11 AM

Mar 2 2018

b-sumner added a comment to D43911: [AMDGPU] Clean up old address space mapping and fix constant address space value.

Looks fine to me.

Mar 2 2018, 12:23 PM

Feb 27 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 27 2018, 5:41 AM

Feb 26 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 26 2018, 3:57 PM

Feb 23 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 23 2018, 3:10 PM

Feb 16 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 16 2018, 3:18 PM

Feb 15 2018

b-sumner accepted D43340: Clean up AMDGCN tests.

Looks good to me.

Feb 15 2018, 11:04 AM

Feb 14 2018

b-sumner added inline comments to D43281: [AMDGPU] fixes for lds f32 builtins.
Feb 14 2018, 7:35 AM · Restricted Project

Jan 31 2018

b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

I think the purpose of this patch is to get a similar usage of clang as nvptx when compiling CUDA, i.e., using cuda as OS instead of using amdhsa as OS and amdgiz as environment. This is more convenient for CUDA application developers since they just need to swap nvptx with amdgcn.

This is a frontend driver question at most. The backend shouldn't need to be aware of this

There are various places in clang where selection is done based on OS==CUDA. If we don't use that OS, we need more complex logic in clang for such choices. I can try making changes to clang to make it work, but I suspect there may be places using OS==CUDA is necessary since it may be needed before parsing the language options.

Jan 31 2018, 9:06 AM

Jan 30 2018

b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

As I understand it, the option users pass to clang++ is --cuda-gpu-arch=<Arch>. Can't we arrange to generate the right triple if they use gfx900 or some other AMD target name for <Arch>?

Jan 30 2018, 3:32 PM
b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

You're using this just as an alias for AMDHSAOS. We shouldn't add something that behaves exactly the same

Jan 30 2018, 2:58 PM

Jan 29 2018

b-sumner added a comment to D42578: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions.

Should we expect that the last 3 arguments have any effect? Do we want to test to ensure they have the expected effects?

Jan 29 2018, 2:30 PM · Restricted Project
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

I believe there are 20 distinct samplers. I'd be more comfortable with this if you could arrange that __translate_sampler_initializer would be called with values in [0, 19] instead of [133, 322].

Jan 29 2018, 11:09 AM
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

I believe there are 20 distinct samplers. I'd be more comfortable with this if you could arrange that __translate_sampler_initializer would be called with values in [0, 19] instead of [133, 322].

Jan 29 2018, 11:02 AM

Jan 26 2018

b-sumner added a comment to D42596: AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace..

Can this be tested?

Jan 26 2018, 1:15 PM
b-sumner added a comment to D42578: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions.

Were you going to add min and max separately?

Jan 26 2018, 7:48 AM · Restricted Project

Jan 23 2018

b-sumner added a comment to D42383: [AMDGPU] fix LDS f32 intrinsics.

Looks good to me...

Jan 23 2018, 12:43 PM · Restricted Project

Jan 18 2018

b-sumner added a comment to D42257: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests for 64-bit atomic loads? Do we need tests for cases where we get buffer instead of flat instructions?

Jan 18 2018, 12:20 PM
b-sumner added a comment to D42256: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests of 64 bit atomic loads? What about cases where we get buffer instructions rather than flat instructions?

Jan 18 2018, 12:01 PM
b-sumner added a comment to D42255: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests of 64-bit atomic loads? Do we need tests where we should have buffer loads instead of a flat loads?

Jan 18 2018, 12:00 PM

Jan 9 2018

b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

This will break all existing implementations and is not backwards compatible. Is this extra diagnosis really worthwhile? Are developers complaining?

Jan 9 2018, 10:49 AM
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

What's the benefit of this change? Since this change will require all device libraries implementing __translate_sampler_initializer to change accordingly. We need a compelling reason.

Jan 9 2018, 8:09 AM

Dec 5 2017

b-sumner added a comment to D39739: [HCC] Add flag to Import Weak Functions in Function Importer.

The usual rule is to take the first weak definition encountered.

Dec 5 2017, 10:19 AM

Nov 28 2017

b-sumner added a comment to D40514: AMDGPU: Restrict ieee_mode to HSA..

I suppose if we have some other mechanism to ensure output modifiers are never used, then we could consider running with IEEE=0, but I think we need to continue running with IEEE=1 for compute. There may be users counting on current behavior.

The current behavior is broken for OpenCL so no one should depend on it.

Nov 28 2017, 2:02 PM
b-sumner added a comment to D40514: AMDGPU: Restrict ieee_mode to HSA..

IEEE mode disables output modifiers, which is good since output modifiers are not IEEE compatible and do not support output subnormal values.

Nov 28 2017, 6:13 AM

Nov 27 2017

b-sumner added a comment to D40051: AMDGPU: Rename Bonaire target to be gfx704; remove gfx800 and make Iceland and Tonga both use gfx802; update target feature handling.

Looks good to me.

Nov 27 2017, 2:16 PM

Nov 10 2017

b-sumner added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

We did not add new intrinsics. We just add support for new data types. In IntrinsicsAMDGCN.td, we have already defined
the data types as any_float which includes half types.

In other wordm llvm.amdgcn.image.load.v4f16 (for example) has already been declared in IntrinsicsAMDGCN.td. And this patch just needs to actually define (implement) it.

Nov 10 2017, 12:42 PM
b-sumner added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

Nov 10 2017, 12:18 PM
b-sumner added a comment to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGPU.td being updated?

Nov 10 2017, 12:16 PM

Oct 23 2017

b-sumner added a comment to D39186: LLD: Fix large integer implicitly truncated to unsigned type warning.

g++4.8.2 is complaining:

Oct 23 2017, 9:51 AM

Oct 17 2017

b-sumner added a comment to D39014: AMDGPU : Fix an error for the llvm.cttz implementation..

This passes my tests, including getting the correct answer for 0.

Oct 17 2017, 1:50 PM

Oct 6 2017

b-sumner accepted D38607: [AMDGPU] New 64 bit div/rem expansion.

Looks good to me.

Oct 6 2017, 7:19 AM

Oct 5 2017

b-sumner added inline comments to D38607: [AMDGPU] New 64 bit div/rem expansion.
Oct 5 2017, 4:55 PM

Sep 29 2017

b-sumner added a comment to D38421: Eliminate ftrunc if source is know to be rounded.

We could potentially update visitCEIL and visitFLOOR as well, and use the same opcode test in each, although I don't think such combinations are very likely.

Sep 29 2017, 1:32 PM

Sep 28 2017

b-sumner accepted D37568: [AMDGPU] Allow flexible register names in inline asm constraints.

LGTM. I think we can leave immediates to another patch.

Sep 28 2017, 11:04 AM