Page MenuHomePhabricator

b-sumner (Brian Sumner)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 18 2016, 12:44 PM (217 w, 6 d)

Recent Activity

Jul 9 2020

b-sumner added a comment to D82818: AMDGPU: Remove .value_type from kernel metadata.

Do we also want to remove it from v2 metadata?

Probably, but I looked briefly and didn't actually see the direct equivalent

Jul 9 2020, 12:25 PM · Restricted Project

May 28 2020

b-sumner added a comment to D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.

LGTM

May 28 2020, 5:38 PM · debug-info, Restricted Project
b-sumner added inline comments to D80702: [AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics.
May 28 2020, 10:23 AM · Restricted Project

May 7 2020

b-sumner added a comment to D79580: AMDGPU: Don't assert on unknown address spaces.

How do we get these address spaces? I'd rather error out.

You can use attribute((address_space)) or hand write IR. We should not error on valid IR. Someday a frontend may wish to track special information with a custom address space number, which we can handle as a global alias similar to how x86 accepts any arbitrary address space

It sounds dangerous. An unknown address space has unknown semantics. How could we assume anything about it?

It doesn't have unknown semantics, it has target defined semantics. We can interpret them however we want. At a minimum any pointer computation on the values should work, even if load and store don't select

Is there any particular motivation to add this support?

Just generally making a best effort attempt to handle any IR that passes the verifier. We already try to handle these in a variety of other places the same way, and theoretically someone could want to use it.

May 7 2020, 5:23 PM · Restricted Project

May 2 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

It takes more than support in the ISA for some features. The OpenCL driver may not want to support a given optional feature, e.g. images. I'm not opposed to defaults, but if the driver chooses to not support images, it needs to be able to prevent __IMAGE_SUPPORT__ from being defined. Conformance will fail if the runtime and compiler are not consistent.

The driver details should be captured by the the triple. If some weird driver decided to do something different, we would need to add a new triple for it. We don't have such a driver, so I don't see why worry about it. It's possible to work around with undef and redef in an implicitly included header. We need to fix properties of the driver based on the target to have perfectly matching offline compilation

I don't see anywhere in the triple talking about driver specific details, unless you would use the environment? That seems like overkill to me. But again, I'm not opposed to defaults, and as long as the driver can override them, this should be OK.

The OS is the driver. It doesn't need to specifically encode these details; the OS should encode properties of the driver environment. Anything using -amdhsa should be reporting image support

May 2 2020, 10:03 AM

Apr 22 2020

b-sumner added inline comments to D75917: Expose llvm fence instruction as clang intrinsic.
Apr 22 2020, 8:38 AM · Restricted Project

Apr 17 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

It takes more than support in the ISA for some features. The OpenCL driver may not want to support a given optional feature, e.g. images. I'm not opposed to defaults, but if the driver chooses to not support images, it needs to be able to prevent __IMAGE_SUPPORT__ from being defined. Conformance will fail if the runtime and compiler are not consistent.

The driver details should be captured by the the triple. If some weird driver decided to do something different, we would need to add a new triple for it. We don't have such a driver, so I don't see why worry about it. It's possible to work around with undef and redef in an implicitly included header. We need to fix properties of the driver based on the target to have perfectly matching offline compilation

Apr 17 2020, 10:14 AM

Apr 14 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

We don't need to worry about theoretical devices. We should know the properties of the driver from -amdhsa, -amdpal, -mesa3d

Apr 14 2020, 2:04 PM

Apr 13 2020

b-sumner added a comment to D77923: OpenCL: Fix some missing predefined macros.

In my opinion, for on-line compile for OpenCL, the platform is responsible for setting __OPENCL_VERSION__. Also, it should be the platform's choice as to how to respond the image support query and how __IMAGE_SUPPORT__ is set. For offline compile, it doesn't seem unreasonable to ask the developer to set these.

Apr 13 2020, 2:41 PM · Restricted Project

Apr 11 2020

b-sumner added a comment to D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension.

I don't think we can guarantee this is or will be supported on all devices. The language runtime makes this decision.

Apr 11 2020, 10:39 AM

Apr 8 2020

b-sumner added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

In addition to predefining __ATOMIC_RELAXED, etc., clang also predefines __OPENCL_MEMORY_SCOPE_WORK_ITEM and friends. So it doesn't really seem unreasonable for clang to also predefine its known syncscopes, and to require the argument to be one of those integers.

Apr 8 2020, 2:41 PM · Restricted Project

Apr 3 2020

b-sumner added a comment to D77390: Fix __builtin_amdgcn_workgroup_size_x/y/z return type.

LGTM

Apr 3 2020, 8:02 AM

Apr 2 2020

b-sumner added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

Please go ahead and update to a string for the scope.

Apr 2 2020, 9:12 AM · Restricted Project

Mar 25 2020

b-sumner added a comment to D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024.

Thanks. This looks fine to me.

Mar 25 2020, 2:38 PM · Restricted Project
b-sumner added inline comments to D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024.
Mar 25 2020, 12:59 PM · Restricted Project

Mar 20 2020

b-sumner added inline comments to D76356: [AMDGPU] Introduce more scratch registers in the ABI..
Mar 20 2020, 11:55 AM · Restricted Project

Mar 12 2020

b-sumner added a comment to D76107: AMDGPU: Don't handle kernarg.segment.ptr in functions.

I'm not aware of any valid explicit uses of this intrinsic by library or developer code, so this seems fine to me.

Mar 12 2020, 4:50 PM

Feb 20 2020

b-sumner added a comment to D74910: [OpenCL] Remove spurious atomic_fetch_min/max builtins.

I recall we agreed that conformance tests using mixed types were broken, so this change should be OK. Hopefully this will not affect users.

Feb 20 2020, 10:02 AM · Restricted Project

Feb 19 2020

b-sumner accepted D74807: Add cl_khr_mipmap_image_writes as supported to AMDGPU.

LGTM

Feb 19 2020, 9:03 AM · Restricted Project

Jan 7 2020

b-sumner added a comment to D71365: expand printf when compiling HIP to AMDGPU.

Should this be looking forward to also handling OpenCL, which does require vector support?

Jan 7 2020, 6:27 AM · Restricted Project, Restricted Project

Jan 6 2020

b-sumner added a comment to D71357: AMDGPU: Assume f32 denormals are enabled by default.

I am OK with proceeding here.

Jan 6 2020, 9:47 AM · Restricted Project
b-sumner added a comment to D71358: AMDGPU: Remove denormal subtarget features.

I am OK with proceeding here.

Jan 6 2020, 9:47 AM · Restricted Project

Dec 11 2019

b-sumner added a comment to D71358: AMDGPU: Remove denormal subtarget features.

How is it now achieved that we do not support denormals by default before GFX9?

This is a frontend decision. clang emits the attribute based on the subtarget preference

This is also how it always worked. The backend uniformly treated denormals as off by default

Dec 11 2019, 10:39 AM · Restricted Project

Dec 10 2019

b-sumner added a comment to D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set.

This looks OK to me, although tuning on correctly rounded division any time denorms are enabled is not actually required by OpenCL.

Dec 10 2019, 3:19 PM · Restricted Project

Aug 14 2019

b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Do we really need these to be "amdgpu" specific?

Aug 14 2019, 11:12 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me.

Aug 14 2019, 10:56 AM
b-sumner added a comment to D66198: AMDGPU: Add builtins for is_local/is_private.

Looks fine to me.

Aug 14 2019, 10:52 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me. Thanks!

I wonder if is.local should be is.shared because that's what getreg calls this

Aug 14 2019, 9:07 AM
b-sumner added a comment to D66198: AMDGPU: Add builtins for is_local/is_private.

Looks fine to me. Thanks!

Aug 14 2019, 8:09 AM
b-sumner added a comment to D66197: AMDGPU: Add intrinsics for address space identification.

Looks fine to me. Thanks!

Aug 14 2019, 8:05 AM

Jun 14 2019

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

The planned change is to make the backend more conservative, so it shouldn't break other frontends

Jun 14 2019, 7:35 AM

Jun 10 2019

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

Jun 10 2019, 9:32 AM

May 31 2019

b-sumner added inline comments to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.
May 31 2019, 9:59 AM

Apr 25 2019

b-sumner added a comment to D61112: AMDGPU: Enable _Float16.

Looks good to me.

Apr 25 2019, 6:57 AM

Mar 18 2019

b-sumner added inline comments to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 5:16 PM · Restricted Project
b-sumner added a comment to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 4:14 PM · Restricted Project

Feb 6 2019

b-sumner added a comment to D57831: AMDGPU: set wchar_t and wint_t to be unsigned short on windows.

Maybe there are already other types like this, but it saddens me that an offline compiled code object could potentially not work properly if the application is using any of these types. Or should the runtime try to detect a problem using argument metadata?

Feb 6 2019, 11:34 AM

Jan 28 2019

b-sumner added a comment to D57340: AMDGPU: Add DS append/consume intrinsics.

I think it is perfectly reasonable to treat these as essentially relaxed-only atomic RMW operations and require the application to use fences or barriers if necessary. The ordering and scope are only needed if we ever need this operation to act as a non-relaxed atomic RMW.

Jan 28 2019, 10:15 AM

Nov 1 2018

b-sumner added a comment to D50633: [AMDGPU] Add new Mode Register pass.

Actually the conversions don't need non-default-rounded operations, nor are non-default-rounded arithmetic operations required by OpenCL. However, we've had requests to implement functions such as add_rtz(x,y) which computes x+y with round-to-zero rounding. Our competitors offer such functions, and we implemented them for HSAIL. So we are really trying to get back to parity with HSAIL.

Nov 1 2018, 12:40 PM
b-sumner added a comment to D50633: [AMDGPU] Add new Mode Register pass.

One thing we've wanted for compute for quite a while now is a way to request non-default-rounded add, sub, mul, div, fma, and sqrt. Assuming we ever figure out how to represent these in the IR, ideally without falling back on intrinsics, could this approach be used to implement and minimize the mode changes for those as well?

Nov 1 2018, 11:28 AM

Oct 16 2018

b-sumner added a comment to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.

Ping. There's quite a bit of interest in getting this exposed by clang.

Oct 16 2018, 10:11 AM

Sep 28 2018

b-sumner added inline comments to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Sep 28 2018, 3:47 PM

Aug 7 2018

b-sumner added a comment to D50376: AMDGPU: Fix enabling denormals by default on pre-VI targets.

This approach seems fine to me.

Aug 7 2018, 7:39 AM

Jul 10 2018

b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.

Current pattern matching does not support float element type yet, it will be supported next.

You are right, there is a typo in the summary. It should be:
fma((f16)S0.x, (f16)S1.x fma((f16)S0.y, (f16)S1.y, (f16)z)) -> ftrunc(fdot2(S0, S1, (f32)z))

Jul 10 2018, 12:41 PM
b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.

Jul 10 2018, 10:51 AM
b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

This operation only rounds a single time, and unfortunately always flushes f32 denorms. Thus this transformation should only be done when unsafe math is requested.

Jul 10 2018, 10:30 AM

Jun 27 2018

b-sumner added a comment to D48667: [HIP] Fix ordering of device-lib linking.

Thanks, looks good.

Jun 27 2018, 12:20 PM · Restricted Project, Restricted Project

Jun 22 2018

b-sumner added a comment to D48493: [HIP] Support flush denorms bitcode.

LGTM

Jun 22 2018, 10:44 AM · Restricted Project, Restricted Project

Jun 13 2018

b-sumner accepted D48094: [AMDGPU] Change enqueue kernel handle type.

Looks good to me.

Jun 13 2018, 8:08 AM

Jun 5 2018

b-sumner added inline comments to D47566: AMDHSA: Code object v3 updates.
Jun 5 2018, 5:46 AM

May 8 2018

b-sumner added a comment to D46601: [OpenCL] Fix typos in emitted enqueue kernel function names.

Thanks! Looks good to me.

May 8 2018, 1:52 PM

Mar 23 2018

b-sumner added a comment to D44718: [AMDGPU] Define code object identification string used in AMDHSA runtimes..

LGTM

Mar 23 2018, 1:06 PM
b-sumner added a comment to D44718: [AMDGPU] Define code object identification string used in AMDHSA runtimes..

LGTM, but I'd rather use '+' instead of ',' for the features.

Mar 23 2018, 12:26 PM

Mar 22 2018

b-sumner added a comment to D39739: [HCC] Add flag to Import Weak Functions in Function Importer.

Is first one encountered a poor design?

Mar 22 2018, 8:11 AM

Mar 2 2018

b-sumner added a comment to D43911: [AMDGPU] Clean up old address space mapping and fix constant address space value.

Looks fine to me.

Mar 2 2018, 12:23 PM

Feb 27 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 27 2018, 5:41 AM

Feb 26 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 26 2018, 3:57 PM

Feb 23 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 23 2018, 3:10 PM

Feb 16 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 16 2018, 3:18 PM

Feb 15 2018

b-sumner accepted D43340: Clean up AMDGCN tests.

Looks good to me.

Feb 15 2018, 11:04 AM

Feb 14 2018

b-sumner added inline comments to D43281: [AMDGPU] fixes for lds f32 builtins.
Feb 14 2018, 7:35 AM · Restricted Project

Jan 31 2018

b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

I think the purpose of this patch is to get a similar usage of clang as nvptx when compiling CUDA, i.e., using cuda as OS instead of using amdhsa as OS and amdgiz as environment. This is more convenient for CUDA application developers since they just need to swap nvptx with amdgcn.

This is a frontend driver question at most. The backend shouldn't need to be aware of this

There are various places in clang where selection is done based on OS==CUDA. If we don't use that OS, we need more complex logic in clang for such choices. I can try making changes to clang to make it work, but I suspect there may be places using OS==CUDA is necessary since it may be needed before parsing the language options.

Jan 31 2018, 9:06 AM

Jan 30 2018

b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

As I understand it, the option users pass to clang++ is --cuda-gpu-arch=<Arch>. Can't we arrange to generate the right triple if they use gfx900 or some other AMD target name for <Arch>?

Jan 30 2018, 3:32 PM
b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

You're using this just as an alias for AMDHSAOS. We shouldn't add something that behaves exactly the same

Jan 30 2018, 2:58 PM

Jan 29 2018

b-sumner added a comment to D42578: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions.

Should we expect that the last 3 arguments have any effect? Do we want to test to ensure they have the expected effects?

Jan 29 2018, 2:30 PM · Restricted Project
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

I believe there are 20 distinct samplers. I'd be more comfortable with this if you could arrange that __translate_sampler_initializer would be called with values in [0, 19] instead of [133, 322].

Jan 29 2018, 11:09 AM
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

I believe there are 20 distinct samplers. I'd be more comfortable with this if you could arrange that __translate_sampler_initializer would be called with values in [0, 19] instead of [133, 322].

Jan 29 2018, 11:02 AM

Jan 26 2018

b-sumner added a comment to D42596: AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace..

Can this be tested?

Jan 26 2018, 1:15 PM
b-sumner added a comment to D42578: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions.

Were you going to add min and max separately?

Jan 26 2018, 7:48 AM · Restricted Project

Jan 23 2018

b-sumner added a comment to D42383: [AMDGPU] fix LDS f32 intrinsics.

Looks good to me...

Jan 23 2018, 12:43 PM · Restricted Project

Jan 18 2018

b-sumner added a comment to D42257: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests for 64-bit atomic loads? Do we need tests for cases where we get buffer instead of flat instructions?

Jan 18 2018, 12:20 PM
b-sumner added a comment to D42256: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests of 64 bit atomic loads? What about cases where we get buffer instructions rather than flat instructions?

Jan 18 2018, 12:01 PM
b-sumner added a comment to D42255: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests of 64-bit atomic loads? Do we need tests where we should have buffer loads instead of a flat loads?

Jan 18 2018, 12:00 PM

Jan 9 2018

b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

This will break all existing implementations and is not backwards compatible. Is this extra diagnosis really worthwhile? Are developers complaining?

Jan 9 2018, 10:49 AM
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

What's the benefit of this change? Since this change will require all device libraries implementing __translate_sampler_initializer to change accordingly. We need a compelling reason.

Jan 9 2018, 8:09 AM

Dec 5 2017

b-sumner added a comment to D39739: [HCC] Add flag to Import Weak Functions in Function Importer.

The usual rule is to take the first weak definition encountered.

Dec 5 2017, 10:19 AM

Nov 28 2017

b-sumner added a comment to D40514: AMDGPU: Restrict ieee_mode to HSA..

I suppose if we have some other mechanism to ensure output modifiers are never used, then we could consider running with IEEE=0, but I think we need to continue running with IEEE=1 for compute. There may be users counting on current behavior.

The current behavior is broken for OpenCL so no one should depend on it.

Nov 28 2017, 2:02 PM
b-sumner added a comment to D40514: AMDGPU: Restrict ieee_mode to HSA..

IEEE mode disables output modifiers, which is good since output modifiers are not IEEE compatible and do not support output subnormal values.

Nov 28 2017, 6:13 AM

Nov 27 2017

b-sumner added a comment to D40051: AMDGPU: Rename Bonaire target to be gfx704; remove gfx800 and make Iceland and Tonga both use gfx802; update target feature handling.

Looks good to me.

Nov 27 2017, 2:16 PM

Nov 10 2017

b-sumner added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

We did not add new intrinsics. We just add support for new data types. In IntrinsicsAMDGCN.td, we have already defined
the data types as any_float which includes half types.

In other wordm llvm.amdgcn.image.load.v4f16 (for example) has already been declared in IntrinsicsAMDGCN.td. And this patch just needs to actually define (implement) it.

Nov 10 2017, 12:42 PM
b-sumner added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

Nov 10 2017, 12:18 PM
b-sumner added a comment to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGPU.td being updated?

Nov 10 2017, 12:16 PM

Oct 23 2017

b-sumner added a comment to D39186: LLD: Fix large integer implicitly truncated to unsigned type warning.

g++4.8.2 is complaining:

Oct 23 2017, 9:51 AM

Oct 17 2017

b-sumner added a comment to D39014: AMDGPU : Fix an error for the llvm.cttz implementation..

This passes my tests, including getting the correct answer for 0.

Oct 17 2017, 1:50 PM

Oct 6 2017

b-sumner accepted D38607: [AMDGPU] New 64 bit div/rem expansion.

Looks good to me.

Oct 6 2017, 7:19 AM

Oct 5 2017

b-sumner added inline comments to D38607: [AMDGPU] New 64 bit div/rem expansion.
Oct 5 2017, 4:55 PM

Sep 29 2017

b-sumner added a comment to D38421: Eliminate ftrunc if source is know to be rounded.

We could potentially update visitCEIL and visitFLOOR as well, and use the same opcode test in each, although I don't think such combinations are very likely.

Sep 29 2017, 1:32 PM

Sep 28 2017

b-sumner accepted D37568: [AMDGPU] Allow flexible register names in inline asm constraints.

LGTM. I think we can leave immediates to another patch.

Sep 28 2017, 11:04 AM

Sep 21 2017

b-sumner added a comment to D37822: [OpenCL] Clean up and add missing fields for block struct.

Could you please explain a bit more why the alignment have to be put explicitly in the struct? I am just not very convinced this is general enough.

The captured variables are fields of the block literal struct. Due to alignment requirement of these fields, there is alignment requirement of
the block literal struct. The ISA of the block invoke function is generated with the assumption of these alignments. If the block literal is
allocated at a memory address not satisfying the alignment requirement, the kernel behavior is undefined.

Generally, __enqueue_kernel library function needs to prepare the kernel argument before launching the kernel. It usually does this by copying
the block literal to some buffer then pass the address of the buffer to the kernel. Then the address of the buffer has to satisfy the alignment
requirement.

If this block literal struct is not general enough, how about add another field as target reserved size, and leave the remaining space of header for
target specific use. And add a target hook to allow target fill the reserved space, e.g.

struct __opencl_block_literal {
  int total_size;
  int align;
  __generic void *invoke;
  int target_reserved_size; /* round up to 4 bytes */
  int target_reserved[];
  /* captures */
};

I like the idea of the target reserved part actually. But not sure how it could be used without adding any target specific methods?

If we decide to add target reserved fields, I can add target hooks to fill these fields. However I would suggest to leave this for future since I don't see there is need for other fields for now.

I could imagine it can be usefull for some vendor implementations.

However, I am still not clear why the alignment of this struct has to be different from any other struct Clang produces. Normally the alignment of objects have to be known during IR generation to put them correctly in the attributes of generated alloca, store and loads. But as a field inside struct I don't know how it can be useful. I would imagine enqueue_kernel would just operate on the block as if it would be an arbitrary buffer of data. Also would size of the struct not account for any padding to make sure the alignment can be deduced based on it correctly?

enqueue_kernel needs to pass the block struct to the kernel. Let's assume it does this by copying the block struct to a buffer. If enqueue_kernel does not know the alignment of the struct, it can only put it at an arbitrary address in the buffer. Then the kernel has to copy the struct to an aligned private memory and load the fields. However, if the enqueued_kernel knows the alignment of the struct, it can put it at an address satisfying the alignment. Then the kernel can load the fields directly from the buffer, skips the step of copying to an aligned private memory. Therefore, alignment of the block struct is usually a useful information for enqueue_kernel. I think that's why in the SPIRV spec OpEnqueueKernel requires an alignment operand for the block context.

Ok, I just think in C if you use malloc to obtain a pointer to some memory location it doesn't take any alignment information. Then you can use the pointer to copy any data including the struct into the location its pointed to. And the pointer can be used later on correctly. I think the alignment is deduced in this case from the type or the size of an object. Do you know where the alignment information is used for SPIRV call? Also how is the block represented in SPIRV?

Sep 21 2017, 11:27 AM

Sep 13 2017

b-sumner accepted D37703: [AMDGPU] Change addr space of clk_event_t, queue_t and reserve_id_t to global.

Looks good to me.

Sep 13 2017, 11:18 AM

Sep 11 2017

b-sumner added a comment to D37698: Allow target to decide when to cluster loads/stores in misched.

I'd suggest changing the name to doMemOpsHaveSameBase[Ptr]

Sep 11 2017, 10:38 AM

Sep 7 2017

b-sumner added a comment to D37568: [AMDGPU] Allow flexible register names in inline asm constraints.

The assembler accepts v[N] in addition to vN. I'm not sure if that is needed here.

Sep 7 2017, 12:18 PM

Sep 1 2017

b-sumner added a comment to D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ..
In D37348#859119, @wdng wrote:

Just add a custom lowering ISD:CTTZ to ISD::CTTZ_ZERO_UNDEF

Sep 1 2017, 1:20 PM
b-sumner added a comment to D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ..

I think the actual problem is the implementation of ISD::CTTZ not using v_ffbl and not this transformation.

Sep 1 2017, 9:32 AM

Aug 30 2017

b-sumner added inline comments to D36862: AMDGPU: Handle non-temporal loads and stores.
Aug 30 2017, 12:13 PM

Aug 29 2017

b-sumner added a comment to D36856: [AMDGPU] Use v_max_f* for fcanonicalize.

Looks fine to me; I suggested using max since it is faster in many cases.

Aug 29 2017, 1:20 PM

Aug 28 2017

b-sumner added inline comments to D36862: AMDGPU: Handle non-temporal loads and stores.
Aug 28 2017, 10:59 AM
b-sumner added inline comments to D36802: AMDGPU: Cleanup most of the macros.
Aug 28 2017, 9:52 AM

Aug 16 2017

b-sumner added inline comments to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Aug 16 2017, 2:23 PM

Aug 15 2017

b-sumner added inline comments to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Aug 15 2017, 4:15 PM