
b-sumner (Brian Sumner)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 18 2016, 12:44 PM (147 w, 5 d)

Recent Activity

Fri, Jun 14

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

The planned change is to make the backend more conservative, so it shouldn't break other frontends.

Fri, Jun 14, 7:35 AM

Mon, Jun 10

b-sumner added a comment to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.

We need to communicate with anyone generating IR to ensure this is being generated before we change the default. clang is only one of those generators. This change will also need to be documented in the usage document.

Mon, Jun 10, 9:32 AM

Fri, May 31

b-sumner added inline comments to D62739: AMDGPU: Always emit amdgpu-flat-work-group-size.
Fri, May 31, 9:59 AM

Apr 25 2019

b-sumner added a comment to D61112: AMDGPU: Enable _Float16.

Looks good to me.

Apr 25 2019, 6:57 AM

Mar 18 2019

b-sumner added inline comments to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 5:16 PM · Restricted Project
b-sumner added a comment to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 4:14 PM · Restricted Project

Feb 6 2019

b-sumner added a comment to D57831: AMDGPU: set wchar_t and wint_t to be unsigned short on windows.

Maybe there are already other types like this, but it saddens me that an offline compiled code object could potentially not work properly if the application is using any of these types. Or should the runtime try to detect a problem using argument metadata?

Feb 6 2019, 11:34 AM

Jan 28 2019

b-sumner added a comment to D57340: AMDGPU: Add DS append/consume intrinsics.

I think it is perfectly reasonable to treat these as essentially relaxed-only atomic RMW operations and require the application to use fences or barriers if necessary. The ordering and scope are only needed if we ever need this operation to act as a non-relaxed atomic RMW.
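The "relaxed RMW plus explicit fences" idea can be sketched with C11 atomics on the host. This is only an analogy under stated assumptions: `reserve_slot`, `publish`, and `try_read_last` are hypothetical names, and nothing here models the actual DS append/consume instructions or their return values.

```c
#include <assert.h>
#include <stdatomic.h>

/* Relaxed counter bump: the update is atomic, but it imposes no ordering
   on the surrounding loads and stores. */
unsigned reserve_slot(atomic_uint *counter) {
    return atomic_fetch_add_explicit(counter, 1u, memory_order_relaxed);
}

/* Writer: fill the reserved slot, then use an explicit release fence
   before raising the (relaxed) ready flag. */
void publish(int *slots, atomic_uint *counter, atomic_int *ready, int value) {
    unsigned slot = reserve_slot(counter);
    slots[slot] = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(ready, 1, memory_order_relaxed);
}

/* Reader: the acquire fence after observing the flag pairs with the
   writer's release fence. Returns 1 and stores the newest value on success. */
int try_read_last(const int *slots, atomic_uint *counter,
                  atomic_int *ready, int *out) {
    if (!atomic_load_explicit(ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    unsigned n = atomic_load_explicit(counter, memory_order_relaxed);
    if (n == 0)
        return 0;
    *out = slots[n - 1];
    return 1;
}
```

The ordering lives entirely in the fences, so the read-modify-write itself never needs an ordering or scope argument.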

Jan 28 2019, 10:15 AM

Nov 1 2018

b-sumner added a comment to D50633: [AMDGPU] Add new Mode Register pass.

Actually the conversions don't need non-default-rounded operations, nor are they required by OpenCL. However, we've had requests to implement functions such as add_rtz(x,y) which computes x+y with round-to-zero rounding. Our competitors offer such functions, and we implemented them for HSAIL. So we are really trying to get back to parity with HSAIL.
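For readers unfamiliar with the request, a host-side sketch of what `add_rtz(x, y)` computes may help. It assumes IEEE-754 binary32 `float`, the default round-to-nearest mode, and no fast-math style reassociation; it emulates the result in software and is not how a backend would implement it (hardware would switch the rounding mode instead).

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical emulation of add_rtz(x, y): x + y rounded toward zero. */
float add_rtz(float x, float y) {
    const int x_finite = (x >= -FLT_MAX && x <= FLT_MAX);
    const int y_finite = (y >= -FLT_MAX && y <= FLT_MAX);
    const float s = x + y;              /* sum rounded to nearest */

    if (s != s || !x_finite || !y_finite)
        return s;                       /* NaN or infinite operands: keep as is */
    if (s > FLT_MAX)
        return FLT_MAX;                 /* overflow saturates when rounding toward zero */
    if (s < -FLT_MAX)
        return -FLT_MAX;

    /* Knuth's TwoSum: err is the exact rounding error, so x + y == s + err. */
    const float bb = s - x;
    const float err = (x - (s - bb)) + (y - bb);

    /* Round-to-nearest moved away from zero exactly when err opposes s in sign. */
    if ((s > 0.0f && err < 0.0f) || (s < 0.0f && err > 0.0f)) {
        uint32_t bits;
        float r;
        memcpy(&bits, &s, sizeof bits);
        bits -= 1u;                     /* next representable value toward zero */
        memcpy(&r, &bits, sizeof r);
        return r;
    }
    return s;
}
```

For example, adding 1.5 * 2^-24 to 1.0f rounds up to the next float under round-to-nearest, while `add_rtz` returns 1.0f.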

Nov 1 2018, 12:40 PM
b-sumner added a comment to D50633: [AMDGPU] Add new Mode Register pass.

One thing we've wanted for compute for quite a while now is a way to request non-default-rounded add, sub, mul, div, fma, and sqrt. Assuming we ever figure out how to represent these in the IR, ideally without falling back on intrinsics, could this approach be used to implement and minimize the mode changes for those as well?

Nov 1 2018, 11:28 AM

Oct 16 2018

b-sumner added a comment to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.

Ping. There's quite a bit of interest in getting this exposed by clang.

Oct 16 2018, 10:11 AM

Sep 28 2018

b-sumner added inline comments to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Sep 28 2018, 3:47 PM

Aug 7 2018

b-sumner added a comment to D50376: AMDGPU: Fix enabling denormals by default on pre-VI targets.

This approach seems fine to me.

Aug 7 2018, 7:39 AM

Jul 10 2018

b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.

Current pattern matching does not support float element type yet, it will be supported next.

You are right, there is a typo in the summary. It should be:
fma((f16)S0.x, (f16)S1.x, fma((f16)S0.y, (f16)S1.y, (f16)z)) -> ftrunc(fdot2(S0, S1, (f32)z))

Jul 10 2018, 12:41 PM
b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.

Jul 10 2018, 10:51 AM
b-sumner added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

This operation only rounds a single time, and unfortunately always flushes f32 denorms. Thus this transformation should only be done when unsafe math is requested.

Jul 10 2018, 10:30 AM

Jun 27 2018

b-sumner added a comment to D48667: [HIP] Fix ordering of device-lib linking.

Thanks, looks good.

Jun 27 2018, 12:20 PM · Restricted Project, Restricted Project

Jun 22 2018

b-sumner added a comment to D48493: [HIP] Support flush denorms bitcode.

LGTM

Jun 22 2018, 10:44 AM · Restricted Project, Restricted Project

Jun 13 2018

b-sumner accepted D48094: [AMDGPU] Change enqueue kernel handle type.

Looks good to me.

Jun 13 2018, 8:08 AM

Jun 5 2018

b-sumner added inline comments to D47566: AMDHSA: Code object v3 updates.
Jun 5 2018, 5:46 AM

May 8 2018

b-sumner added a comment to D46601: [OpenCL] Fix typos in emitted enqueue kernel function names.

Thanks! Looks good to me.

May 8 2018, 1:52 PM

Mar 23 2018

b-sumner added a comment to D44718: [AMDGPU] Define code object identification string used in AMDHSA runtimes..

LGTM

Mar 23 2018, 1:06 PM
b-sumner added a comment to D44718: [AMDGPU] Define code object identification string used in AMDHSA runtimes..

LGTM, but I'd rather use '+' instead of ',' for the features.

Mar 23 2018, 12:26 PM

Mar 22 2018

b-sumner added a comment to D39739: [HCC] Add flag to Import Weak Functions in Function Importer.

Is first one encountered a poor design?

Mar 22 2018, 8:11 AM

Mar 2 2018

b-sumner added a comment to D43911: [AMDGPU] Clean up old address space mapping and fix constant address space value.

Looks fine to me.

Mar 2 2018, 12:23 PM

Feb 27 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 27 2018, 5:41 AM

Feb 26 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 26 2018, 3:57 PM

Feb 23 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 23 2018, 3:10 PM

Feb 16 2018

b-sumner added inline comments to D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn.
Feb 16 2018, 3:18 PM

Feb 15 2018

b-sumner accepted D43340: Clean up AMDGCN tests.

Looks good to me.

Feb 15 2018, 11:04 AM

Feb 14 2018

b-sumner added inline comments to D43281: [AMDGPU] fixes for lds f32 builtins.
Feb 14 2018, 7:35 AM · Restricted Project

Jan 31 2018

b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

I think the purpose of this patch is to get a similar usage of clang as nvptx when compiling CUDA, i.e., using cuda as OS instead of using amdhsa as OS and amdgiz as environment. This is more convenient for CUDA application developers since they just need to swap nvptx with amdgcn.

This is a frontend driver question at most. The backend shouldn't need to be aware of this.

There are various places in clang where selection is done based on OS==CUDA. If we don't use that OS, we need more complex logic in clang for such choices. I can try making changes to clang to make it work, but I suspect there may be places where using OS==CUDA is necessary, since it may be needed before parsing the language options.

Jan 31 2018, 9:06 AM

Jan 30 2018

b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

As I understand it, the option users pass to clang++ is --cuda-gpu-arch=<Arch>. Can't we arrange to generate the right triple if they use gfx900 or some other AMD target name for <Arch>?
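The suggestion amounts to deriving the triple from the architecture name. A minimal sketch of that idea follows, assuming a hypothetical helper `triple_for_gpu_arch` and a simple prefix test; it is not clang's driver logic, and the two triples are used only as familiar examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pick a target triple from the <Arch> string
   passed via --cuda-gpu-arch=<Arch>. Returns NULL for unknown names. */
const char *triple_for_gpu_arch(const char *arch) {
    assert(arch != NULL);
    if (strncmp(arch, "gfx", 3) == 0)   /* AMD names such as gfx900 */
        return "amdgcn-amd-amdhsa";
    if (strncmp(arch, "sm_", 3) == 0)   /* NVIDIA names such as sm_60 */
        return "nvptx64-nvidia-cuda";
    return NULL;
}
```

With a mapping like this, the user-facing option stays the same and no extra OS component is needed in the triple.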

Jan 30 2018, 3:32 PM
b-sumner added a comment to D42711: AMDGPU: Support target triple OS component cuda.

You're using this just as an alias for AMDHSAOS. We shouldn't add something that behaves exactly the same.

Jan 30 2018, 2:58 PM

Jan 29 2018

b-sumner added a comment to D42578: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions.

Should we expect that the last 3 arguments have any effect? Do we want to test to ensure they have the expected effects?

Jan 29 2018, 2:30 PM · Restricted Project
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

I believe there are 20 distinct samplers. I'd be more comfortable with this if you could arrange that __translate_sampler_initializer would be called with values in [0, 19] instead of [133, 322].
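The count of 20 follows from OpenCL's sampler properties: 5 addressing modes, 2 coordinate normalization settings, and 2 filter modes. One way to get a dense index in [0, 19] is sketched below; the enum names, their ordering, and `sampler_index` are assumptions made for illustration and do not reflect the encoding used by clang or any device library.

```c
#include <assert.h>

/* Hypothetical dense encodings of the OpenCL sampler properties. */
enum addr_mode {
    ADDR_NONE, ADDR_CLAMP_TO_EDGE, ADDR_CLAMP,
    ADDR_REPEAT, ADDR_MIRRORED_REPEAT, ADDR_COUNT
};
enum filter_mode { FILTER_NEAREST, FILTER_LINEAR, FILTER_COUNT };

/* Map (addressing, normalized coords, filter) to an index in [0, 19]. */
int sampler_index(enum addr_mode addr, int normalized, enum filter_mode filter) {
    assert((int)addr >= 0 && (int)addr < ADDR_COUNT);
    assert((int)filter >= 0 && (int)filter < FILTER_COUNT);
    return ((int)addr * 2 + (normalized ? 1 : 0)) * FILTER_COUNT + (int)filter;
}
```

A dense range lets a library implement the translation as a plain 20-entry table lookup.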

Jan 29 2018, 11:09 AM
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

I believe there are 20 distinct samplers. I'd be more comfortable with this if you could arrange that __translate_sampler_initializer would be called with values in [0, 19] instead of [133, 322].

Jan 29 2018, 11:02 AM

Jan 26 2018

b-sumner added a comment to D42596: AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace..

Can this be tested?

Jan 26 2018, 1:15 PM
b-sumner added a comment to D42578: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions.

Were you going to add min and max separately?

Jan 26 2018, 7:48 AM · Restricted Project

Jan 23 2018

b-sumner added a comment to D42383: [AMDGPU] fix LDS f32 intrinsics.

Looks good to me...

Jan 23 2018, 12:43 PM · Restricted Project

Jan 18 2018

b-sumner added a comment to D42257: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests for 64-bit atomic loads? Do we need tests for cases where we get buffer instead of flat instructions?

Jan 18 2018, 12:20 PM
b-sumner added a comment to D42256: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests of 64 bit atomic loads? What about cases where we get buffer instructions rather than flat instructions?

Jan 18 2018, 12:01 PM
b-sumner added a comment to D42255: AMDGPU/MemoryModel: Fix monotonic atomic loads.

Do we need tests of 64-bit atomic loads? Do we need tests where we should have buffer loads instead of flat loads?

Jan 18 2018, 12:00 PM

Jan 9 2018

b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

This will break all existing implementations and is not backwards compatible. Is this extra diagnosis really worthwhile? Are developers complaining?

Jan 9 2018, 10:49 AM
b-sumner added a comment to D41699: [OpenCL] Change sampler representation.

What's the benefit of this change? Since this change will require all device libraries implementing __translate_sampler_initializer to change accordingly, we need a compelling reason.

Jan 9 2018, 8:09 AM

Dec 5 2017

b-sumner added a comment to D39739: [HCC] Add flag to Import Weak Functions in Function Importer.

The usual rule is to take the first weak definition encountered.

Dec 5 2017, 10:19 AM

Nov 28 2017

b-sumner added a comment to D40514: AMDGPU: Restrict ieee_mode to HSA..

I suppose if we have some other mechanism to ensure output modifiers are never used, then we could consider running with IEEE=0, but I think we need to continue running with IEEE=1 for compute. There may be users counting on current behavior.

The current behavior is broken for OpenCL so no one should depend on it.

Nov 28 2017, 2:02 PM
b-sumner added a comment to D40514: AMDGPU: Restrict ieee_mode to HSA..

IEEE mode disables output modifiers, which is good since output modifiers are not IEEE compatible and do not support output subnormal values.

Nov 28 2017, 6:13 AM

Nov 27 2017

b-sumner added a comment to D40051: AMDGPU: Rename Bonaire target to be gfx704; remove gfx800 and make Iceland and Tonga both use gfx802; update target feature handling.

Looks good to me.

Nov 27 2017, 2:16 PM

Nov 10 2017

b-sumner added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

We did not add new intrinsics. We just added support for new data types. In IntrinsicsAMDGCN.td, we have already defined the data types as any_float which includes half types.

In other words, llvm.amdgcn.image.load.v4f16 (for example) has already been declared in IntrinsicsAMDGCN.td. And this patch just needs to actually define (implement) it.

Nov 10 2017, 12:42 PM
b-sumner added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

Nov 10 2017, 12:18 PM
b-sumner added a comment to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGPU.td being updated?

Nov 10 2017, 12:16 PM

Oct 23 2017

b-sumner added a comment to D39186: LLD: Fix large integer implicitly truncated to unsigned type warning.

g++4.8.2 is complaining:

Oct 23 2017, 9:51 AM

Oct 17 2017

b-sumner added a comment to D39014: AMDGPU : Fix an error for the llvm.cttz implementation..

This passes my tests, including getting the correct answer for 0.

Oct 17 2017, 1:50 PM

Oct 6 2017

b-sumner accepted D38607: [AMDGPU] New 64 bit div/rem expansion.

Looks good to me.

Oct 6 2017, 7:19 AM

Oct 5 2017

b-sumner added inline comments to D38607: [AMDGPU] New 64 bit div/rem expansion.
Oct 5 2017, 4:55 PM

Sep 29 2017

b-sumner added a comment to D38421: Eliminate ftrunc if source is know to be rounded.

We could potentially update visitCEIL and visitFLOOR as well, and use the same opcode test in each, although I don't think such combinations are very likely.

Sep 29 2017, 1:32 PM

Sep 28 2017

b-sumner accepted D37568: [AMDGPU] Allow flexible register names in inline asm constraints.

LGTM. I think we can leave immediates to another patch.

Sep 28 2017, 11:04 AM

Sep 21 2017

b-sumner added a comment to D37822: [OpenCL] Clean up and add missing fields for block struct.

Could you please explain a bit more why the alignment has to be put explicitly in the struct? I am just not very convinced this is general enough.

The captured variables are fields of the block literal struct. Due to the alignment requirements of these fields, the block literal struct itself has an alignment requirement. The ISA of the block invoke function is generated with the assumption of these alignments. If the block literal is allocated at a memory address not satisfying the alignment requirement, the kernel behavior is undefined.

Generally, the __enqueue_kernel library function needs to prepare the kernel argument before launching the kernel. It usually does this by copying the block literal to some buffer and then passing the address of the buffer to the kernel. The address of the buffer then has to satisfy the alignment requirement.

If this block literal struct is not general enough, how about adding another field for a target reserved size, and leaving the remaining space of the header for target-specific use? We could also add a target hook to allow the target to fill the reserved space, e.g.

struct __opencl_block_literal {
  int total_size;
  int align;
  __generic void *invoke;
  int target_reserved_size; /* round up to 4 bytes */
  int target_reserved[];
  /* captures */
};

I like the idea of the target reserved part actually. But not sure how it could be used without adding any target specific methods?

If we decide to add target reserved fields, I can add target hooks to fill these fields. However, I would suggest leaving this for the future since I don't see a need for other fields for now.

I could imagine it can be useful for some vendor implementations.

However, I am still not clear why the alignment of this struct has to be different from any other struct Clang produces. Normally the alignment of objects has to be known during IR generation to put it correctly in the attributes of generated alloca, store and loads. But as a field inside the struct I don't know how it can be useful. I would imagine enqueue_kernel would just operate on the block as if it were an arbitrary buffer of data. Also, would the size of the struct not account for any padding to make sure the alignment can be deduced based on it correctly?

enqueue_kernel needs to pass the block struct to the kernel. Let's assume it does this by copying the block struct to a buffer. If enqueue_kernel does not know the alignment of the struct, it can only put it at an arbitrary address in the buffer. Then the kernel has to copy the struct to aligned private memory and load the fields. However, if enqueue_kernel knows the alignment of the struct, it can put it at an address satisfying the alignment. Then the kernel can load the fields directly from the buffer, skipping the step of copying to aligned private memory. Therefore, the alignment of the block struct is usually useful information for enqueue_kernel. I think that's why in the SPIRV spec OpEnqueueKernel requires an alignment operand for the block context.

Ok, I just think in C if you use malloc to obtain a pointer to some memory location it doesn't take any alignment information. Then you can use the pointer to copy any data, including the struct, into the location it points to. And the pointer can be used later on correctly. I think the alignment is deduced in this case from the type or the size of an object. Do you know where the alignment information is used for the SPIRV call? Also, how is the block represented in SPIRV?
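The argument for carrying the alignment in the header can be made concrete with a small host-side sketch. It assumes a hypothetical `struct block_header` holding only the first two fields from the proposal, a power-of-two alignment, and a destination buffer whose base address is already suitably aligned; `align_up` and `place_block` are illustrative names, not part of any runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the leading fields of the block literal. */
struct block_header {
    int total_size;  /* size of header plus captures, in bytes */
    int align;       /* required alignment, a power of two */
};

/* Round offset up to the next multiple of align. */
size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Copy the block into buf at the first aligned offset at or after offset.
   Returns that offset, or (size_t)-1 if the block does not fit. */
size_t place_block(unsigned char *buf, size_t buf_size, size_t offset,
                   const void *block) {
    struct block_header hdr;
    memcpy(&hdr, block, sizeof hdr);
    size_t at = align_up(offset, (size_t)hdr.align);
    if (at > buf_size || (size_t)hdr.total_size > buf_size - at)
        return (size_t)-1;
    memcpy(buf + at, block, (size_t)hdr.total_size);
    return at;
}
```

Without the `align` field, the copying code sees only an opaque run of bytes and cannot choose `at`, which is the situation described in the reply above.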

Sep 21 2017, 11:27 AM

Sep 13 2017

b-sumner accepted D37703: [AMDGPU] Change addr space of clk_event_t, queue_t and reserve_id_t to global.

Looks good to me.

Sep 13 2017, 11:18 AM

Sep 11 2017

b-sumner added a comment to D37698: Allow target to decide when to cluster loads/stores in misched.

I'd suggest changing the name to doMemOpsHaveSameBase[Ptr]

Sep 11 2017, 10:38 AM

Sep 7 2017

b-sumner added a comment to D37568: [AMDGPU] Allow flexible register names in inline asm constraints.

The assembler accepts v[N] in addition to vN. I'm not sure if that is needed here.

Sep 7 2017, 12:18 PM

Sep 1 2017

b-sumner added a comment to D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ..
In D37348#859119, @wdng wrote:

Just add a custom lowering ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF

Sep 1 2017, 1:20 PM
b-sumner added a comment to D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ..

I think the actual problem is the implementation of ISD::CTTZ not using v_ffbl and not this transformation.
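The difference between the two nodes is only the zero input, which a short sketch makes visible. Here `ffbl_b32` is a software stand-in for a find-first-bit-low primitive, assumed to return all ones for a zero input; the function names are illustrative, not the backend's.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a find-first-bit-low primitive: index of the lowest set bit,
   or 0xFFFFFFFF when x == 0 (assumed behavior for this sketch). */
uint32_t ffbl_b32(uint32_t x) {
    if (x == 0)
        return UINT32_MAX;
    uint32_t n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* CTTZ_ZERO_UNDEF: the caller promises x != 0, so the primitive suffices. */
uint32_t cttz32_zero_undef(uint32_t x) {
    assert(x != 0);
    return ffbl_b32(x);
}

/* CTTZ: must return the bit width for zero, so clamp the primitive's result. */
uint32_t cttz32(uint32_t x) {
    uint32_t r = ffbl_b32(x);
    return r < 32u ? r : 32u;
}
```

The full operation costs one extra unsigned minimum on top of the primitive, and the zero case is the one worth testing explicitly.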

Sep 1 2017, 9:32 AM

Aug 30 2017

b-sumner added inline comments to D36862: AMDGPU: Handle non-temporal loads and stores.
Aug 30 2017, 12:13 PM

Aug 29 2017

b-sumner added a comment to D36856: [AMDGPU] Use v_max_f* for fcanonicalize.

Looks fine to me; I suggested using max since it is faster in many cases.

Aug 29 2017, 1:20 PM

Aug 28 2017

b-sumner added inline comments to D36862: AMDGPU: Handle non-temporal loads and stores.
Aug 28 2017, 10:59 AM
b-sumner added inline comments to D36802: AMDGPU: Cleanup most of the macros.
Aug 28 2017, 9:52 AM

Aug 16 2017

b-sumner added inline comments to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Aug 16 2017, 2:23 PM

Aug 15 2017

b-sumner added inline comments to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Aug 15 2017, 4:15 PM

Aug 11 2017

b-sumner added inline comments to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Aug 11 2017, 2:08 PM

Aug 9 2017

b-sumner added inline comments to D32896: [OpenCL] Make CLK_NULL_RESERVE_ID invalid reserve id..
Aug 9 2017, 3:09 PM

Aug 8 2017

b-sumner added inline comments to D36171: AMDGPU: Use direct struct returns.
Aug 8 2017, 2:46 PM
b-sumner added inline comments to D36171: AMDGPU: Use direct struct returns.
Aug 8 2017, 1:46 PM
b-sumner added a comment to D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes.

Hi Sam,

What do you think about implementing this optimization in a target-specific optimization pass? Since size/alignment is saved as a function parameter in LLVM IR, the optimization can be done in target-specific components without adding additional conditions to the generic library.

Thanks,
Alexey

Hi Alexey,

The optimization of the power-of-2 type size is implemented as a library function. Our backend lacks the capability to link in library code at ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen since Clang/llvm does not support target-specific pre-linking passes.

My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target might have special support for the power of 2 size, and even if there is such support, not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any chance we could try to extend the compilation flow such that this target-specific optimization could happen before the IR linking?

Aug 8 2017, 6:14 AM

Aug 7 2017

b-sumner added inline comments to D36171: AMDGPU: Use direct struct returns.
Aug 7 2017, 12:59 PM

Aug 4 2017

b-sumner added inline comments to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Aug 4 2017, 2:29 PM
b-sumner added a comment to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.

An update to docs/LangRef.rst is needed.

Aug 4 2017, 12:28 PM
b-sumner added inline comments to D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Aug 4 2017, 12:23 PM

Aug 3 2017

b-sumner added a comment to D23403: OpenCL atomics 1.2 2.0 conversion pre-link pass.

We don't have a complete replacement yet. We can drop this when the replacement is ready and passing tests.

Aug 3 2017, 5:15 PM · Restricted Project

Jul 25 2017

b-sumner added a comment to D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin.

There are other languages for heterogeneous compute that have scopes, although not exposed quite as explicitly as OpenCL. For example AMD's "HC" language. And any language making use of clang and targeting SPIR-V would likely use these builtins. I think a more generic prefix is appropriate, and "scoped" tells us exactly when these are needed.

But would those languages use the same language design for these scopes as OpenCL if they did expose them, as opposed to some more elaborate scoping specification? My objection is not that the concept is inherently OpenCL-specific, it's that the presentation in the language might be inherently OpenCL-specific, which makes staying in the opencl namespace prudent.

Jul 25 2017, 1:59 PM
b-sumner added a comment to D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin.

Can we drop the "opencl" part of the name and use something like __scoped_atomic_*? Also, it may not make sense to support non-constant scope here since we can't predict what other scopes may be added by other languages in the future.

We could use the approach of LangAS, i.e. we allow targets to map all language-specific scopes to target-specific scope names, since IR only cares about scope names, which are target specific. And this is what the current implementation does.

I have no objection to use the __scoped_atomic_ name. It is more general and extensible. John/Anastasia, any comments? Thanks.

I think I would prefer __opencl_atomic_* until we have some evidence that this concept is more general than just OpenCL.
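The LangAS-style mapping mentioned in this thread can be pictured as a small per-target table from language-level scopes to scope names. The sketch below is illustrative only: the enum, the helper `target_scope_name`, and the strings in the table are placeholders chosen for the example, not a statement of what any target actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical language-level scopes, ordered from narrowest to widest. */
enum lang_scope {
    SCOPE_WORK_ITEM, SCOPE_SUB_GROUP, SCOPE_WORK_GROUP,
    SCOPE_DEVICE, SCOPE_ALL_DEVICES, SCOPE_COUNT
};

/* Per-target table: language scope -> target scope name (placeholders).
   Returns NULL for values the target does not know about. */
const char *target_scope_name(enum lang_scope scope) {
    static const char *const names[SCOPE_COUNT] = {
        "singlethread", "wavefront", "workgroup", "agent", ""
    };
    if ((int)scope < 0 || (int)scope >= SCOPE_COUNT)
        return NULL;
    return names[scope];
}
```

The NULL branch hints at the concern about non-constant scopes: a value that is only known at run time cannot be checked against the table when the code is compiled.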

Jul 25 2017, 12:47 PM
b-sumner added a comment to D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin.

Can we drop the "opencl" part of the name and use something like __scoped_atomic_*? Also, it may not make sense to support non-constant scope here since we can't predict what other scopes may be added by other languages in the future.

Jul 25 2017, 11:32 AM
b-sumner added inline comments to D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin.
Jul 25 2017, 7:35 AM

Jun 9 2017

b-sumner added a comment to D34046: [AMDGPU] Add intrinsics for alignbit and alignbyte instructions.

This looks fine to me. I was concerned about how successful we would be pattern matching them.

Jun 9 2017, 7:27 AM

Apr 6 2017

b-sumner added a comment to D31779: AMDGPU: Replace fp16SrcZerosHighBits with a whitelist.

This patch fixes the broken tests.

Apr 6 2017, 12:21 PM

Mar 2 2017

b-sumner added a comment to D30551: [AMDGPU] Add builtin functions readlane ds_permute mov_dpp.

mov_dpp should be under the VI+ comment

Mar 2 2017, 3:10 PM

Feb 17 2017

b-sumner added a comment to D29958: AMDGPU : Replace FMAD with FMA when denormals are enabled..

v_mac_f32 always flushes subnormal inputs just like v_mad_f32

Feb 17 2017, 7:03 AM

Feb 15 2017

b-sumner added a comment to D29958: AMDGPU : Replace FMAD with FMA when denormals are enabled..
Feb 15 2017, 10:34 AM

Jan 3 2017

b-sumner added a comment to D27028: Add intrinsics for constrained floating point operations.

FWIW, rounding controls are needed for llvm.fma.*, llvm.fmuladd.*, and llvm.sqrt.*

Jan 3 2017, 10:33 AM

Oct 7 2016

b-sumner added a comment to D25343: [OpenCL] Mark group functions as convergent in opencl-c.h.

clang does not recognize convergent as a valid attribute. There was an attempt to add this, see https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22271.html but it hasn't had any result. Matt do you see "real uses" for this now?

Oct 7 2016, 7:14 AM

Sep 23 2016

b-sumner accepted D24865: AMDGPU/SI: Don't crash on anonymous GlobalValues .

Looks good to me.

Sep 23 2016, 10:10 AM
b-sumner added a comment to D24865: AMDGPU/SI: Don't crash on anonymous GlobalValues .

I'd replace argument name AsmPrinter with AP to match the existing style.

Sep 23 2016, 8:38 AM