rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (228 w, 4 d)

Recent Activity

Today

rampitec accepted D50983: AMDGPU: Partially move target handling code from clang to TargetParser.

LGTM

Tue, Aug 21, 3:11 AM
rampitec accepted D50984: AMDGPU: Move target code into TargetParser.

LGTM

Tue, Aug 21, 3:10 AM

Thu, Aug 16

rampitec accepted D50834: AMDGPU: Add feature for fast f32 denormals.

LGTM

Thu, Aug 16, 4:32 AM

Wed, Aug 15

rampitec accepted D50787: AMDGPU: Custom lower fexp.

LGTM

Wed, Aug 15, 11:55 PM
rampitec accepted D50756: AMDGPU: Improve extract_vector_elt reduction combine.

LGTM

Wed, Aug 15, 2:01 AM

Tue, Aug 14

rampitec accepted D50705: AMDGPU: Address todo for handling 1/(2 pi).

LGTM

Tue, Aug 14, 11:52 PM
rampitec accepted D50706: AMDGPU: Fold fneg into fmed3.

LGTM

Tue, Aug 14, 8:28 AM
rampitec added inline comments to D50705: AMDGPU: Address todo for handling 1/(2 pi).
Tue, Aug 14, 8:16 AM

Mon, Aug 13

rampitec accepted D50629: AMDGPU: Fix getInstSizeInBytes.

LGTM

Mon, Aug 13, 7:24 AM
rampitec accepted D50626: AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16.

LGTM

Mon, Aug 13, 3:03 AM
rampitec accepted D50624: AMDGPU: Stop producing icmp/fcmp intrinsics with invalid types.

LGTM

Mon, Aug 13, 2:22 AM

Sun, Aug 12

rampitec accepted D50600: AMDGPU: Use splat vectors for undefs when folding canonicalize.

LGTM except comment typo.

Sun, Aug 12, 12:29 AM
rampitec accepted D50567: AMDGPU: Fix packing undef parts of build_vector.

LGTM

Sun, Aug 12, 12:27 AM

Thu, Aug 9

rampitec accepted D50468: AMDGPU: More canonicalized operations.

LGTM

Thu, Aug 9, 12:27 AM

Wed, Aug 8

rampitec accepted D50400: AMDGPU: Error more gracefully on libcalls.

LGTM

Wed, Aug 8, 12:05 AM
rampitec accepted D50399: AMDGPU: Fix shifts for i128.

LGTM

Wed, Aug 8, 12:04 AM
rampitec accepted D50332: AMDGPU: Turn class x, p_zero|n_zero into fcmp oeq x, 0.

LGTM

Wed, Aug 8, 12:02 AM

Tue, Aug 7

rampitec accepted D50324: AMDGPU: Match isfinite pattern to class instructions.

LGTM

Tue, Aug 7, 3:22 AM

Mon, Aug 6

rampitec added inline comments to D50332: AMDGPU: Turn class x, p_zero|n_zero into fcmp oeq x, 0.
Mon, Aug 6, 9:05 AM
rampitec accepted D50319: AMDGPU: Fold v_lshl_or_b32 with 0 src0.

LGTM

Mon, Aug 6, 3:43 AM

Thu, Aug 2

rampitec accepted D50051: AMDGPU: Push fcanonicalize through partially constant build_vector.

Remove the now-unused NumOps variable. Otherwise LGTM.

Thu, Aug 2, 10:07 AM

Tue, Jul 31

rampitec accepted D49874: AMDGPU: Add clamp bit to dot intrinsics.

LGTM

Tue, Jul 31, 9:43 AM
rampitec accepted D50072: AMDGPU: Handle some vector operations in isCanonicalized.

LGTM

Tue, Jul 31, 9:40 AM
rampitec accepted D50069: AMDGPU: Improve hack for packing conversion ops.

LGTM

Tue, Jul 31, 9:19 AM
rampitec added inline comments to D50051: AMDGPU: Push fcanonicalize through partially constant build_vector.
Tue, Jul 31, 9:02 AM
rampitec accepted D50044: AMDGPU: Refactor fcanonicalize combine.

LGTM

Tue, Jul 31, 8:55 AM

Mon, Jul 30

rampitec accepted D50011: AMDGPU: Add clamp bit to dot builtins.

LGTM

Mon, Jul 30, 2:25 PM
rampitec accepted D49983: AMDGPU: Treat more custom operations as canonicalizing.

LGTM

Mon, Jul 30, 10:25 AM

Thu, Jul 26

rampitec accepted D49605: AMDGPU: Fix implementation of isCanonicalized.

LGTM

Thu, Jul 26, 12:20 PM
rampitec added inline comments to D49605: AMDGPU: Fix implementation of isCanonicalized.
Thu, Jul 26, 12:20 PM
rampitec accepted D49845: AMDGPU: Conversions always produce canonical results.

I think this is right about round and extend.

Thu, Jul 26, 12:05 PM
rampitec added inline comments to D49605: AMDGPU: Fix implementation of isCanonicalized.
Thu, Jul 26, 12:02 PM
rampitec accepted D49842: AMDGPU: Fix code size for return_to_epilog pseudo.

LGTM

Thu, Jul 26, 11:47 AM

Wed, Jul 25

rampitec committed rL337938: [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion.
[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion
Wed, Jul 25, 10:03 AM
rampitec closed D49761: [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion.
Wed, Jul 25, 10:02 AM
rampitec committed rL337936: Fix llvm::ComputeNumSignBits with some operations and llvm.assume.
Fix llvm::ComputeNumSignBits with some operations and llvm.assume
Wed, Jul 25, 9:39 AM
rampitec closed D49759: Fix llvm::ComputeNumSignBits with some operations and llvm.assume.
Wed, Jul 25, 9:39 AM

Tue, Jul 24

rampitec added a dependency for D49761: [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion: D49759: Fix llvm::ComputeNumSignBits with some operations and llvm.assume.
Tue, Jul 24, 3:01 PM
rampitec added a dependent revision for D49759: Fix llvm::ComputeNumSignBits with some operations and llvm.assume: D49761: [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion.
Tue, Jul 24, 3:01 PM
rampitec created D49761: [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion.
Tue, Jul 24, 3:00 PM
rampitec created D49759: Fix llvm::ComputeNumSignBits with some operations and llvm.assume.
Tue, Jul 24, 2:21 PM

Jul 20 2018

rampitec added inline comments to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.
Jul 20 2018, 1:32 PM
rampitec added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.

So, given the discussion, you seem to be missing a DAG.isKnownNeverNaN(Op) condition.

Jul 20 2018, 1:30 PM
rampitec added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.
  1. Should we care about sNaNs with FP exceptions disabled?

Yes. This has to work. The OpenCL conformance tests check for this.

Jul 20 2018, 1:16 PM
rampitec added a comment to D49605: AMDGPU: Fix implementation of isCanonicalized.

I think we need a diff against master.

Jul 20 2018, 12:10 PM
rampitec added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true, since sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with.

Do you mean we should quiet sNaNs with an fmin/fmax even if exception handling is off? My understanding was different. In fact, I thought we could completely ignore them with, say, fast math.

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default), v_min_f32 et al. return a qNaN if either input is an sNaN. If ieee_mode is off, they return the non-NaN operand, as if the sNaN were a qNaN. I think for what we actually want, ieee_mode is harmful, since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.

Jul 20 2018, 11:39 AM
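
The ieee_mode behavior described in the comment above can be illustrated with a small bit-level model. This is only a sketch under stated assumptions, not the hardware definition of v_min_f32: the function name `model_min_f32` and its helpers are hypothetical, values are passed as raw 32-bit patterns so that no sNaN is quieted by the host, and the handling of two NaN inputs and of signed zeros is a simplification.

```python
import struct

EXP_MASK = 0x7F800000   # binary32 exponent field
MANT_MASK = 0x007FFFFF  # binary32 mantissa field
QUIET_BIT = 0x00400000  # top mantissa bit: set for qNaN, clear for sNaN

def is_nan(bits):
    return (bits & EXP_MASK) == EXP_MASK and (bits & MANT_MASK) != 0

def is_snan(bits):
    return is_nan(bits) and (bits & QUIET_BIT) == 0

def to_float(bits):
    # Only called on non-NaN patterns in this sketch.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def model_min_f32(a, b, ieee_mode):
    """Hypothetical model of the min behavior described in the comment."""
    if ieee_mode:
        # An sNaN input yields a qNaN (here: the quieted input).
        for x in (a, b):
            if is_snan(x):
                return x | QUIET_BIT
    # Otherwise any NaN, signaling or quiet, is ignored in favor of
    # the other operand.
    if is_nan(a):
        return b
    if is_nan(b):
        return a
    return a if to_float(a) <= to_float(b) else b
```

With these assumptions, `model_min_f32(0x7F800001, 0x3F800000, True)` produces a quiet NaN pattern, while the same call with `ieee_mode=False` returns the bit pattern of 1.0. This is why a library fmin built on the ieee_mode instruction would have to quiet its inputs first.
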
rampitec added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.

First, I think the wrong diff is attached: the left side of the diff is not what is in trunk now.

Jul 20 2018, 11:02 AM

Jul 19 2018

rampitec added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.

At the very least, if it is knownNeverNan, then it cannot be a signaling NaN either.

Jul 19 2018, 4:20 PM
rampitec added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true, since sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with.

Jul 19 2018, 4:08 PM
rampitec added a reviewer for D49561: AMDGPU: Try to make isKnownNeverSNan more accurate: b-sumner.
Jul 19 2018, 4:05 PM
rampitec added inline comments to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.
Jul 19 2018, 12:48 PM
rampitec added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Jul 19 2018, 12:43 PM

Jul 18 2018

rampitec accepted D49516: [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers..

LGTM

Jul 18 2018, 5:17 PM

Jul 17 2018

rampitec added a reviewer for D49448: [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits: arsenm.
Jul 17 2018, 3:20 PM
rampitec accepted D49428: [LSV] Look through selects for consecutive addresses.

LGTM

Jul 17 2018, 3:18 PM

Jul 16 2018

rampitec accepted D49342: [LSV] Refactoring + supporting bitcasts to a type of different size.

LGTM

Jul 16 2018, 2:20 PM
rampitec added inline comments to D49342: [LSV] Refactoring + supporting bitcasts to a type of different size.
Jul 16 2018, 10:32 AM

Jul 13 2018

rampitec accepted D49146: [AMDGPU] Support a fdot2 pattern..

LGTM

Jul 13 2018, 3:05 PM
rampitec accepted D49308: AMDGPU: Use existing function to check for VGPRs.

LGTM

Jul 13 2018, 12:15 PM
rampitec accepted D49288: [AMDGPU] run post-RA hazard recognizer pass late.

LGTM

Jul 13 2018, 12:14 PM · Restricted Project
rampitec added inline comments to D49146: [AMDGPU] Support a fdot2 pattern..
Jul 13 2018, 10:36 AM
rampitec added inline comments to D49146: [AMDGPU] Support a fdot2 pattern..
Jul 13 2018, 10:14 AM
rampitec accepted D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

LGTM

Jul 13 2018, 9:47 AM
rampitec accepted D49287: AMDGPU: Break 64-bit arguments into 32-bit pieces.

LGTM

Jul 13 2018, 9:46 AM
rampitec added inline comments to D49288: [AMDGPU] run post-RA hazard recognizer pass late.
Jul 13 2018, 9:45 AM · Restricted Project

Jul 12 2018

rampitec accepted D49255: AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls.

LGTM

Jul 12 2018, 10:48 AM
rampitec accepted D49254: AMDGPU: Scalarize vector argument types to calls.

LGTM

Jul 12 2018, 10:47 AM
rampitec accepted D48978: AMDGPU: Fix handling of alignment padding in DAG argument lowering.

LGTM

Jul 12 2018, 10:44 AM

Jul 10 2018

rampitec added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

As far as I understand, it should also be legal with -mattr=-fp32-denormals,-fp64-fp16-denormals, i.e. when neither 32-bit nor 16-bit denormals are supported. Right? Not that it really helps in the real world.
Otherwise it should be legal if either the UnsafeAlgebra or the AllowContract flag is set on both FMA nodes.

Having the FMA node already guarantees that either the UnsafeAlgebra or the AllowContract flag is set on the FAdd/FMul nodes. We don't need to check them again during the FMA combine, right?

Jul 10 2018, 12:49 PM
rampitec added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

This operation only rounds a single time, and unfortunately always flushes f32 denorms. Thus this transformation should only be done when unsafe math is requested.

Jul 10 2018, 10:40 AM
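
To make "flushes f32 denorms" concrete, here is a minimal sketch of what flushing a binary32 denormal to zero means at the bit level. It does not model fdot2 itself. The helper name `flush_denormal_f32` is hypothetical, and preserving the sign of the flushed value is an assumption.

```python
def flush_denormal_f32(bits):
    """Return the 32-bit pattern with denormal values replaced by zero."""
    exponent = bits & 0x7F800000
    mantissa = bits & 0x007FFFFF
    if exponent == 0 and mantissa != 0:
        # Denormal: keep only the sign bit (assumed behavior).
        return bits & 0x80000000
    return bits
```

A result that would be a tiny nonzero denormal under full IEEE semantics becomes zero after flushing. That is why the comment ties the transformation to cases where denormals are not required.
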
rampitec added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

Does fdot2 perform rounding of intermediates?
Basically you start with two FMAs: FMA - Perform a * b + c with no intermediate rounding step. So the expression you are converting is quite fancy in terms of rounding:

Jul 10 2018, 9:56 AM
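
The point about FMA having no intermediate rounding step can be shown with a small numeric sketch. It uses Python doubles rather than the f16/f32 types under review, and it emulates a fused multiply-add by computing exactly with `fractions.Fraction` and rounding once. The function names are hypothetical.

```python
from fractions import Fraction

def fused_mul_add(a, b, c):
    # Exact rational arithmetic, then a single rounding to double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def unfused_mul_add(a, b, c):
    # The product is rounded to double before the addition.
    return a * b + c

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(fused_mul_add(a, b, c))    # keeps the -2**-60 term
print(unfused_mul_add(a, b, c))  # the product rounds to 1.0 first
```

Here the exact product is 1 - 2**-60, which rounds to 1.0 when the multiply is rounded separately, so the unfused result is 0.0, while the single-rounding result is -2**-60. Any rewrite of a chain of FMAs into another operation has to account for where such rounding steps occur.
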
rampitec added a reviewer for D49146: [AMDGPU] Support a fdot2 pattern.: b-sumner.
Jul 10 2018, 9:51 AM

Jul 9 2018

rampitec accepted D49035: AMDGPU: Force inlining if LDS global address is used.

LGTM

Jul 9 2018, 11:52 AM
rampitec added inline comments to D49035: AMDGPU: Force inlining if LDS global address is used.
Jul 9 2018, 10:37 AM
rampitec added a comment to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

Can you please add tests for <3 x i64> and <3 x double>?

Jul 9 2018, 10:17 AM

Jun 29 2018

rampitec accepted D48761: AMDGPU: Don't use struct type for argument layout.

LGTM

Jun 29 2018, 10:09 AM
rampitec committed rL335988: [AMDGPU] Enable LICM in the BE pipeline.
[AMDGPU] Enable LICM in the BE pipeline
Jun 29 2018, 9:31 AM
rampitec closed D48604: [AMDGPU] Enable LICM in the BE pipeline.
Jun 29 2018, 9:31 AM
rampitec added inline comments to D48761: AMDGPU: Don't use struct type for argument layout.
Jun 29 2018, 8:36 AM
rampitec added inline comments to D48761: AMDGPU: Don't use struct type for argument layout.
Jun 29 2018, 8:32 AM

Jun 28 2018

rampitec updated the diff for D48604: [AMDGPU] Enable LICM in the BE pipeline.

Moved LICM post SROA.

Jun 28 2018, 10:43 AM
rampitec added inline comments to D48604: [AMDGPU] Enable LICM in the BE pipeline.
Jun 28 2018, 10:32 AM
rampitec updated the diff for D48604: [AMDGPU] Enable LICM in the BE pipeline.

Rebase.

Jun 28 2018, 9:14 AM
rampitec committed rL335868: [AMDGPU] Early expansion of 32 bit udiv/urem.
[AMDGPU] Early expansion of 32 bit udiv/urem
Jun 28 2018, 9:04 AM
rampitec closed D48586: [AMDGPU] Early expansion of 32 bit udiv/urem.
Jun 28 2018, 9:04 AM
rampitec committed rL335866: [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16.
[AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16
Jun 28 2018, 8:29 AM
rampitec closed D48677: [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16.
Jun 28 2018, 8:29 AM

Jun 27 2018

rampitec updated the diff for D48586: [AMDGPU] Early expansion of 32 bit udiv/urem.

Rebase.

Jun 27 2018, 3:44 PM
rampitec added a comment to D48573: [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic.

Actually, according to the selection code, f16 mad does not support denormals either, so the intrinsic should work with f16 if that is correct.

Jun 27 2018, 3:05 PM
rampitec created D48677: [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16.
Jun 27 2018, 3:05 PM
rampitec accepted D48635: AMDGPU: Remove MFI::ABIArgOffset.

LGTM

Jun 27 2018, 11:50 AM
rampitec accepted D48630: AMDGPU: Fix assert on aggregate type kernel arguments.

LGTM

Jun 27 2018, 11:27 AM
rampitec accepted D48639: AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct.

LGTM

Jun 27 2018, 8:47 AM
rampitec committed rL335742: [AMDGPU] Convert rcp to rcp_iflag.
[AMDGPU] Convert rcp to rcp_iflag
Jun 27 2018, 8:38 AM
rampitec closed D48569: [AMDGPU] Convert rcp to rcp_iflag.
Jun 27 2018, 8:38 AM

Jun 26 2018

rampitec updated the diff for D48586: [AMDGPU] Early expansion of 32 bit udiv/urem.

Changed hi/lo split method as suggested.

Jun 26 2018, 2:42 PM
rampitec committed rL335654: [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic.
[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic
Jun 26 2018, 1:09 PM
rampitec closed D48573: [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic.
Jun 26 2018, 1:09 PM
rampitec added a comment to D48586: [AMDGPU] Early expansion of 32 bit udiv/urem.

Won't doing this break the case where both the div and rem are used, so the full expansion will be used twice?

Jun 26 2018, 12:40 PM
rampitec updated the diff for D48573: [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic.

Added tests with source modifiers.

Jun 26 2018, 12:38 PM