This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Analysis/
-
Analysis/
3
ValueTracking.cpp
-
test/Transforms/InstCombine/AMDGPU/
-
Transforms/
-
InstCombine/
-
AMDGPU/
-
freeze.ll

Differential D152798

[AMDGPU][ValueTracking] Handle amdgcn intrinsics that cannot create poison
Needs ReviewPublic

Authored by foad on Jun 13 2023, 3:38 AM.

Download Raw Diff

Details

Reviewers

nikic
nhaehnle
tsymalla
nlopes

Group Reviewers

Restricted Project

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision.Jun 13 2023, 3:38 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 3:38 AM

Herald added subscribers: StephenFan, kerbowa, hiraditya and 5 others. · View Herald Transcript

foad requested review of this revision.Jun 13 2023, 3:38 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 3:38 AM

Herald added subscribers: llvm-commits, wdng. · View Herald Transcript

Harbormaster completed remote builds in B238447: Diff 530849.Jun 13 2023, 3:38 AM

The nice thing to do would be to route this through a TTI hook.

Also I would argue that no target intrinsics can produce poison. Arithmetic poison ultimately exists because of the desire to tolerate different edge case handling on different targets, but target instructions should be well defined

llvm/lib/Analysis/ValueTracking.cpp
6419	Bunch more could be handled (I can’t name one that shouldn’t)

I'd appreciate some confirmation that this makes sense, for some or all of the cases. Here's my thinking:

case Intrinsic::amdgcn_fmed3:
case Intrinsic::amdgcn_fmul_legacy:
case Intrinsic::amdgcn_fract:
case Intrinsic::amdgcn_rcp:
case Intrinsic::amdgcn_rsq:

These are just standard arithmetic operations, so should be handled the same as generic arithmetic intrinsics like maxnum.

case Intrinsic::amdgcn_icmp:

This is like ballot, a convergent operation that depends on values in other lanes.

case Intrinsic::amdgcn_interp_p1:
case Intrinsic::amdgcn_interp_p1_f16:
case Intrinsic::amdgcn_interp_p2:
case Intrinsic::amdgcn_interp_p2_f16:

These ones load values from LDS as well as doing arithmetic, but we don't expose the load to the compiler so I don't think there's any reason to say that the loaded value could be poison.

case Intrinsic::amdgcn_mbcnt_hi:
case Intrinsic::amdgcn_mbcnt_lo:

The result of these depends on the lane ID of the executing lane.

case Intrinsic::amdgcn_readfirstlane:
case Intrinsic::amdgcn_readlane:

These depend on values from other lanes. For readlane we could define that the result is poison if the lane number argument is out of range, but AFAIK we do not do that - the spec is just that you get whatever the corresponding hardware instruction would return, and hardware instructions never return poison.

case Intrinsic::amdgcn_softwqm:
case Intrinsic::amdgcn_wqm:
case Intrinsic::amdgcn_wwm:

These ones return the input value unmodified.

In summary, for the intrinsics whose results depend on some extra state, there is no point in saying that that state could be poison, unless it is exposed as a regular IR value that the compiler can reason about.

foad added a parent revision: D152797: [AMDGPU][ValueTracking] Precommit tests for amdgcn intrinsics that cannot create poison.Jun 13 2023, 3:51 AM

Also I would argue that no target intrinsics can produce poison. Arithmetic poison ultimately exists because of the desire to tolerate different edge case handling on different targets, but target instructions should be well defined

That might work for creating poison, but what about propagating poison? Wouldn't it be beneficial to say that target arithmetic intrinsics still propagate poison like regular arithmetic does?

In D152798#4416991, @foad wrote:

Also I would argue that no target intrinsics can produce poison. Arithmetic poison ultimately exists because of the desire to tolerate different edge case handling on different targets, but target instructions should be well defined

That might work for creating poison, but what about propagating poison? Wouldn't it be beneficial to say that target arithmetic intrinsics still propagate poison like regular arithmetic does?

Yes.

I also realized we do have some sources of poison. The various intrinsics to access inputs return poison when called through functions with the corresponding "amdgpu-no..." attributes or incompatible calling conventions. I'd expect to cover every intrinsic which selects to a real instruction to be covered here

In D152798#4416957, @arsenm wrote:

The nice thing to do would be to route this through a TTI hook.

This would be a pretty big annoyance, because ValueTracking is used everywhere, and this would require adding a TTI dependency to pretty much all passes. The practical outcome would probably be that most callers just wouldn't pass TTI. As such, prefer the approach in this patch.

Also I would argue that no target intrinsics can produce poison. Arithmetic poison ultimately exists because of the desire to tolerate different edge case handling on different targets, but target instructions should be well defined

Just to give an obvious counter-example: Load-like intrinsics tend to produce poison (because there might be poison in memory).

In D152798#4417027, @nikic wrote:

Just to give an obvious counter-example: Load-like intrinsics tend to produce poison (because there might be poison in memory).

But that would be a property of the memory, not the operation itself. I also thought load poison was UB?

In D152798#4417027, @nikic wrote:

Just to give an obvious counter-example: Load-like intrinsics tend to produce poison (because there might be poison in memory).

I was wondering about that case myself. I don't see the benefit of saying that a load-like intrinsic can return poison, unless the compiler recognizes it as a load well enough to do things like forwarding a previously stored value through it.

In the AMDGPU backend we have lots of load-like intrinsics that are mostly opaque to the compiler, except that it knows they read from some address space.

arsenm added inline comments.Jun 13 2023, 4:20 AM

llvm/lib/Analysis/ValueTracking.cpp
6419	amdgcn_log, amdgcn_fmad_ftz, int_amdgcn_mul_i24, int_amdgcn_mul_u24, int_amdgcn_mulhi_i24, int_amdgcn_mulhi_u24, int_amdgcn_inverse_ballot, int_amdgcn_ballot, int_amdgcn_cvt_pk_u8_f32, int_amdgcn_mqsad_u32_u8, int_amdgcn_mqsad_pk_u16_u8, int_amdgcn_qsad_pk_u16_u8, int_amdgcn_sad_u16, int_amdgcn_sad_hi_u8, int_amdgcn_msad_u8, int_amdgcn_sad_u8, f int_amdgcn_lerp, int_amdgcn_sbfe, int_amdgcn_ubfe, int_amdgcn_mov_dpp, int_amdgcn_update_dpp, int_amdgcn_ds_permute, int_amdgcn_ds_bpermute, int_amdgcn_perm, int_amdgcn_permlane16, int_amdgcn_permlanex16, int_amdgcn_mov_dpp8, int_amdgcn_s_get_waveid_in_workgroup, int_amdgcn_permlane64, int_amdgcn_fdot2, int_amdgcn_fdot2_f16_f16, int_amdgcn_fdot2_bf16_bf16, int_amdgcn_fdot2_f32_bf16, int_amdgcn_sdot2, int_amdgcn_udot2, int_amdgcn_sdot4, int_amdgcn_udot4, int_amdgcn_sudot4, int_amdgcn_udot8, int_amdgcn_sudot8 All the MFMA, there are a bunch more cvt* int_amdgcn_fdiv_fast

In D152798#4417039, @foad wrote:

In D152798#4417027, @nikic wrote:

Just to give an obvious counter-example: Load-like intrinsics tend to produce poison (because there might be poison in memory).

I was wondering about that case myself. I don't see the benefit of saying that a load-like intrinsic can return poison, unless the compiler recognizes it as a load well enough to do things like forwarding a previously stored value through it.

Not sure I follow. Do you want to define your load intrinsics are performing an implicit freeze? You could do that, but I'm not sure it is desirable.

llvm/lib/Analysis/ValueTracking.cpp
6419	If you want to add that many intrinsics, it's probably better to add a new intrinsic property to the TableGen intrinsic definitions. That should also make it easier to see how complete the coverage is. (Easiest way to do that is probably a new function attribute.)

I was wondering about that case myself. I don't see the benefit of saying that a load-like intrinsic can return poison, unless the compiler recognizes it as a load well enough to do things like forwarding a previously stored value through it.

Not sure I follow. Do you want to define your load intrinsics are performing an implicit freeze? You could do that, but I'm not sure it is desirable.

I don't understand the big picture here: what possible benefit is there to telling the compiler that the value that is loaded can (or can't) be poison, when the compiler knows nothing else about the provenance of that value?

(By contrast, for a normal IR load, I can see that tracking the poison-ness of the loaded value might be useful, so that IR passes can add or remove store->load pairs without affecting poison-ness.)

there might be poison in memory

Perhaps https://llvm.org/docs/LangRef.html#poison-values should mention this interesting fact.

In D152798#4417179, @foad wrote:

I was wondering about that case myself. I don't see the benefit of saying that a load-like intrinsic can return poison, unless the compiler recognizes it as a load well enough to do things like forwarding a previously stored value through it.

Not sure I follow. Do you want to define your load intrinsics are performing an implicit freeze? You could do that, but I'm not sure it is desirable.

I don't understand the big picture here: what possible benefit is there to telling the compiler that the value that is loaded can (or can't) be poison, when the compiler knows nothing else about the provenance of that value?

(By contrast, for a normal IR load, I can see that tracking the poison-ness of the loaded value might be useful, so that IR passes can add or remove store->load pairs without affecting poison-ness.)

Okay, I think I get what you mean now. You want to say the load result is non-poison/undef, because the compiler will never replace the load with a literal undef/poison value due to lack of optimization support, and as such you can't introduce practical miscompiles with it. This makes me pretty uncomfortable, because you'd effectively be lying, just in a way that shouldn't matter. Maybe @nlopes has some insight on whether this could cause any specific issues.

In D152798#4417317, @nikic wrote:

In D152798#4417179, @foad wrote:

I was wondering about that case myself. I don't see the benefit of saying that a load-like intrinsic can return poison, unless the compiler recognizes it as a load well enough to do things like forwarding a previously stored value through it.

Not sure I follow. Do you want to define your load intrinsics are performing an implicit freeze? You could do that, but I'm not sure it is desirable.

I don't understand the big picture here: what possible benefit is there to telling the compiler that the value that is loaded can (or can't) be poison, when the compiler knows nothing else about the provenance of that value?

(By contrast, for a normal IR load, I can see that tracking the poison-ness of the loaded value might be useful, so that IR passes can add or remove store->load pairs without affecting poison-ness.)

Okay, I think I get what you mean now. You want to say the load result is non-poison/undef, because the compiler will never replace the load with a literal undef/poison value due to lack of optimization support, and as such you can't introduce practical miscompiles with it. This makes me pretty uncomfortable, because you'd effectively be lying, just in a way that shouldn't matter. Maybe @nlopes has some insight on whether this could cause any specific issues.

Right. I have a similar issue with the arguments passed into GPU kernels, which are launched by hardware:

define amdgpu_cs void @f(i32 %x, i32 %y) {
  ...
}

The compiler will never see IR for a call to this function. You're not allowed to call kernels from IR. The argument values %x and %y are set up by hardware, and I think I'd like to mark them as noundef so the compiler can assume they are never poison.

If a function call is guaranteed to never be inlined, then it's ok to mark the declaration as not returning undef/poison.
But you can't mark the function definition as such, since the function may in fact return a poison and then you end up with UB.

For AMD's special load intrinsics, it's fine to mark them as non-undef/poison as long as they are not lowered to vanilla load instructions in SelDag. As then the lowering would be wrong. You need to keep the façade until assembly time.

The semantics needs to be consistent throughout the pipeline. It's ok to "lie" about the semantics of something, as long as the compiler never gets to see the real thing. The real stuff can only meet at linker time. At that point there's no poison or undef values, and therefore your promises about a function/instruction not returning undef/poison are correct.

That's why you can give whatever semantics you want to declarations, as long as you only see the definition in assembly/SelDag and the semantics you promised correspond to the asm/SelDag thing.

My 2 cents:

For the interp intrinsic even the underlying memory is sort of known to never contain poison. The data ultimately comes from an export out of a vertex shader, and *that* data could be poison, but there's a lot of fixed function hardware machinery in-between and we can always pretend that this machinery has a "freeze" somewhere. (This prevents a hypothetical transform that constant-propagates poison from the vertex shader into the fragment shader, but I really don't think that such a transform is useful for performance.)

I'm less comfortable about saying that readlane cannot create poison. Reading from an out-of-range lane is basically like reading from an arbitrary uninitialized register, which is basically the ur-version of poison. I'm not sure that it matters, but it *may* matter via fun facts like: readlane is memory(none), but if you duplicate it and the source lane is out of range, you may get different results from the duplicates. Even readfirstlane is "interesting": the only way for readfirstlane to return poison is if the input is poison; however it is sufficient for the input *in a different thread* to be poison. There may be dragons there.

Apart from that, I'm generally in favor of this change.

If we are agreed that intrinsics that map down to a normal arithmetic instruction should propagate poison, then I will rework this patch with a new NoCreatePoison intrinsic attribute.

In D152798#4417753, @foad wrote:

If we are agreed that intrinsics that map down to a normal arithmetic instruction should propagate poison, then I will rework this patch with a new NoCreatePoison intrinsic attribute.

I think that's trivially true

Revision Contents

Path

Size

llvm/

lib/

Analysis/

ValueTracking.cpp

17 lines

test/

Transforms/

InstCombine/

AMDGPU/

freeze.ll

51 lines

Diff 530849

llvm/lib/Analysis/ValueTracking.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 6,394 Lines • ▼ Show 20 Lines	if (auto *II = dyn_cast<IntrinsicInst>(Op)) {
case Intrinsic::fptrunc_round:		case Intrinsic::fptrunc_round:
case Intrinsic::canonicalize:		case Intrinsic::canonicalize:
case Intrinsic::arithmetic_fence:		case Intrinsic::arithmetic_fence:
case Intrinsic::minnum:		case Intrinsic::minnum:
case Intrinsic::maxnum:		case Intrinsic::maxnum:
case Intrinsic::minimum:		case Intrinsic::minimum:
case Intrinsic::maximum:		case Intrinsic::maximum:
case Intrinsic::is_fpclass:		case Intrinsic::is_fpclass:
		case Intrinsic::amdgcn_fmed3:
		case Intrinsic::amdgcn_fmul_legacy:
		case Intrinsic::amdgcn_fract:
		case Intrinsic::amdgcn_icmp:
		case Intrinsic::amdgcn_interp_p1:
		case Intrinsic::amdgcn_interp_p1_f16:
		case Intrinsic::amdgcn_interp_p2:
		case Intrinsic::amdgcn_interp_p2_f16:
		case Intrinsic::amdgcn_mbcnt_hi:
		case Intrinsic::amdgcn_mbcnt_lo:
		case Intrinsic::amdgcn_rcp:
		case Intrinsic::amdgcn_readfirstlane:
		case Intrinsic::amdgcn_readlane:
		case Intrinsic::amdgcn_rsq:
		case Intrinsic::amdgcn_softwqm:
		case Intrinsic::amdgcn_wqm:
		case Intrinsic::amdgcn_wwm:
		arsenmUnsubmitted Not Done Reply Inline Actions Bunch more could be handled (I can’t name one that shouldn’t) arsenm: Bunch more could be handled (I can’t name one that shouldn’t)
		arsenmUnsubmitted Not Done Reply Inline Actions amdgcn_log, amdgcn_fmad_ftz, int_amdgcn_mul_i24, int_amdgcn_mul_u24, int_amdgcn_mulhi_i24, int_amdgcn_mulhi_u24, int_amdgcn_inverse_ballot, int_amdgcn_ballot, int_amdgcn_cvt_pk_u8_f32, int_amdgcn_mqsad_u32_u8, int_amdgcn_mqsad_pk_u16_u8, int_amdgcn_qsad_pk_u16_u8, int_amdgcn_sad_u16, int_amdgcn_sad_hi_u8, int_amdgcn_msad_u8, int_amdgcn_sad_u8, f int_amdgcn_lerp, int_amdgcn_sbfe, int_amdgcn_ubfe, int_amdgcn_mov_dpp, int_amdgcn_update_dpp, int_amdgcn_ds_permute, int_amdgcn_ds_bpermute, int_amdgcn_perm, int_amdgcn_permlane16, int_amdgcn_permlanex16, int_amdgcn_mov_dpp8, int_amdgcn_s_get_waveid_in_workgroup, int_amdgcn_permlane64, int_amdgcn_fdot2, int_amdgcn_fdot2_f16_f16, int_amdgcn_fdot2_bf16_bf16, int_amdgcn_fdot2_f32_bf16, int_amdgcn_sdot2, int_amdgcn_udot2, int_amdgcn_sdot4, int_amdgcn_udot4, int_amdgcn_sudot4, int_amdgcn_udot8, int_amdgcn_sudot8 All the MFMA, there are a bunch more cvt* int_amdgcn_fdiv_fast arsenm: amdgcn_log, amdgcn_fmad_ftz, int_amdgcn_mul_i24, int_amdgcn_mul_u24, int_amdgcn_mulhi_i24…
		nikicUnsubmitted Not Done Reply Inline Actions If you want to add that many intrinsics, it's probably better to add a new intrinsic property to the TableGen intrinsic definitions. That should also make it easier to see how complete the coverage is. (Easiest way to do that is probably a new function attribute.) nikic: If you want to add //that// many intrinsics, it's probably better to add a new intrinsic…
return false;		return false;
case Intrinsic::lround:		case Intrinsic::lround:
case Intrinsic::llround:		case Intrinsic::llround:
case Intrinsic::lrint:		case Intrinsic::lrint:
case Intrinsic::llrint:		case Intrinsic::llrint:
// If the value doesn't fit an unspecified value is returned (but this		// If the value doesn't fit an unspecified value is returned (but this
// is not poison).		// is not poison).
return false;		return false;
▲ Show 20 Lines • Show All 2,235 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/AMDGPU/freeze.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; RUN: opt -march=amdgcn -S -passes=instcombine < %s \| FileCheck %s			; RUN: opt -march=amdgcn -S -passes=instcombine < %s \| FileCheck %s

	define float @fmed3(float noundef %x, float noundef %y, float noundef %z) {			define float @fmed3(float noundef %x, float noundef %y, float noundef %z) {
	; CHECK-LABEL: define float @fmed3			; CHECK-LABEL: define float @fmed3
	; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]], float noundef [[Z:%.*]]) {			; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]], float noundef [[Z:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float [[Y]], float [[Z]])			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float [[Y]], float [[Z]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float %z)			%val = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float %z)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define float @fmul.legacy(float noundef %x, float noundef %y) {			define float @fmul.legacy(float noundef %x, float noundef %y) {
	; CHECK-LABEL: define float @fmul.legacy			; CHECK-LABEL: define float @fmul.legacy
	; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]]) {			; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.fmul.legacy(float [[X]], float [[Y]])			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.fmul.legacy(float [[X]], float [[Y]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.fmul.legacy(float %x, float %y)			%val = call float @llvm.amdgcn.fmul.legacy(float %x, float %y)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define float @fract(float noundef %x) {			define float @fract(float noundef %x) {
	; CHECK-LABEL: define float @fract			; CHECK-LABEL: define float @fract
	; CHECK-SAME: (float noundef [[X:%.*]]) {			; CHECK-SAME: (float noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.fract.f32(float [[X]])			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.fract.f32(float [[X]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.fract.f32(float %x)			%val = call float @llvm.amdgcn.fract.f32(float %x)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define i32 @icmp(i32 noundef %x, i32 noundef %y) {			define i32 @icmp(i32 noundef %x, i32 noundef %y) {
	; CHECK-LABEL: define i32 @icmp			; CHECK-LABEL: define i32 @icmp
	; CHECK-SAME: (i32 noundef [[X:%.]], i32 noundef [[Y:%.]]) {			; CHECK-SAME: (i32 noundef [[X:%.]], i32 noundef [[Y:%.]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.icmp.i32.i32(i32 [[X]], i32 [[Y]], i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.icmp.i32.i32(i32 [[X]], i32 [[Y]], i32 0)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.icmp.i32(i32 %x, i32 %y, i32 0)			%val = call i32 @llvm.amdgcn.icmp.i32(i32 %x, i32 %y, i32 0)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	define float @interp.p1(float noundef %x) {			define float @interp.p1(float noundef %x) {
	; CHECK-LABEL: define float @interp.p1			; CHECK-LABEL: define float @interp.p1
	; CHECK-SAME: (float noundef [[X:%.*]]) {			; CHECK-SAME: (float noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.interp.p1(float [[X]], i32 0, i32 0, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.interp.p1(float [[X]], i32 0, i32 0, i32 0)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.interp.p1(float %x, i32 0, i32 0, i32 0)			%val = call float @llvm.amdgcn.interp.p1(float %x, i32 0, i32 0, i32 0)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define float @interp.p1.f16(float noundef %x) {			define float @interp.p1.f16(float noundef %x) {
	; CHECK-LABEL: define float @interp.p1.f16			; CHECK-LABEL: define float @interp.p1.f16
	; CHECK-SAME: (float noundef [[X:%.*]]) {			; CHECK-SAME: (float noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.interp.p1.f16(float [[X]], i32 0, i32 0, i1 false, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.interp.p1.f16(float [[X]], i32 0, i32 0, i1 false, i32 0)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.interp.p1.f16(float %x, i32 0, i32 0, i1 false, i32 0)			%val = call float @llvm.amdgcn.interp.p1.f16(float %x, i32 0, i32 0, i1 false, i32 0)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define float @interp.p2(float noundef %x, float noundef %y) {			define float @interp.p2(float noundef %x, float noundef %y) {
	; CHECK-LABEL: define float @interp.p2			; CHECK-LABEL: define float @interp.p2
	; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]]) {			; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.interp.p2(float [[X]], float [[Y]], i32 0, i32 0, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.interp.p2(float [[X]], float [[Y]], i32 0, i32 0, i32 0)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.interp.p2(float %x, float %y, i32 0, i32 0, i32 0)			%val = call float @llvm.amdgcn.interp.p2(float %x, float %y, i32 0, i32 0, i32 0)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define half @interp.p2.f16(float noundef %x, float noundef %y) {			define half @interp.p2.f16(float noundef %x, float noundef %y) {
	; CHECK-LABEL: define half @interp.p2.f16			; CHECK-LABEL: define half @interp.p2.f16
	; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]]) {			; CHECK-SAME: (float noundef [[X:%.]], float noundef [[Y:%.]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.interp.p2.f16(float [[X]], float [[Y]], i32 0, i32 0, i1 false, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.interp.p2.f16(float [[X]], float [[Y]], i32 0, i32 0, i1 false, i32 0)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze half [[VAL]]			; CHECK-NEXT: ret half [[VAL]]
	; CHECK-NEXT: ret half [[FRZ]]
	;			;
	%val = call half @llvm.amdgcn.interp.p2.f16(float %x, float %y, i32 0, i32 0, i1 false, i32 0)			%val = call half @llvm.amdgcn.interp.p2.f16(float %x, float %y, i32 0, i32 0, i1 false, i32 0)
	%frz = freeze half %val			%frz = freeze half %val
	ret half %frz			ret half %frz
	}			}

	define i32 @mbcnt.hi(i32 noundef %x) {			define i32 @mbcnt.hi(i32 noundef %x) {
	; CHECK-LABEL: define i32 @mbcnt.hi			; CHECK-LABEL: define i32 @mbcnt.hi
	; CHECK-SAME: (i32 noundef [[X:%.*]]) {			; CHECK-SAME: (i32 noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[X]], i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[X]], i32 0)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.mbcnt.hi(i32 %x, i32 0)			%val = call i32 @llvm.amdgcn.mbcnt.hi(i32 %x, i32 0)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	define i32 @mbcnt.lo(i32 noundef %x) {			define i32 @mbcnt.lo(i32 noundef %x) {
	; CHECK-LABEL: define i32 @mbcnt.lo			; CHECK-LABEL: define i32 @mbcnt.lo
	; CHECK-SAME: (i32 noundef [[X:%.*]]) {			; CHECK-SAME: (i32 noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[X]], i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[X]], i32 0)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.mbcnt.hi(i32 %x, i32 0)			%val = call i32 @llvm.amdgcn.mbcnt.hi(i32 %x, i32 0)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	define float @rcp(float noundef %x) {			define float @rcp(float noundef %x) {
	; CHECK-LABEL: define float @rcp			; CHECK-LABEL: define float @rcp
	; CHECK-SAME: (float noundef [[X:%.*]]) {			; CHECK-SAME: (float noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rcp.f32(float [[X]])			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rcp.f32(float [[X]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.rcp.f32(float %x)			%val = call float @llvm.amdgcn.rcp.f32(float %x)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define i32 @readfirstlane(i32 noundef %x) {			define i32 @readfirstlane(i32 noundef %x) {
	; CHECK-LABEL: define i32 @readfirstlane			; CHECK-LABEL: define i32 @readfirstlane
	; CHECK-SAME: (i32 noundef [[X:%.*]]) {			; CHECK-SAME: (i32 noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[X]])			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[X]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.readfirstlane(i32 %x)			%val = call i32 @llvm.amdgcn.readfirstlane(i32 %x)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	define i32 @readlane(i32 noundef %x) {			define i32 @readlane(i32 noundef %x) {
	; CHECK-LABEL: define i32 @readlane			; CHECK-LABEL: define i32 @readlane
	; CHECK-SAME: (i32 noundef [[X:%.*]]) {			; CHECK-SAME: (i32 noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[X]], i32 7)			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[X]], i32 7)
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.readlane(i32 %x, i32 7)			%val = call i32 @llvm.amdgcn.readlane(i32 %x, i32 7)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	define float @rsq(float noundef %x) {			define float @rsq(float noundef %x) {
	; CHECK-LABEL: define float @rsq			; CHECK-LABEL: define float @rsq
	; CHECK-SAME: (float noundef [[X:%.*]]) {			; CHECK-SAME: (float noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rsq.f32(float [[X]])			; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rsq.f32(float [[X]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	; CHECK-NEXT: ret float [[FRZ]]
	;			;
	%val = call float @llvm.amdgcn.rsq.f32(float %x)			%val = call float @llvm.amdgcn.rsq.f32(float %x)
	%frz = freeze float %val			%frz = freeze float %val
	ret float %frz			ret float %frz
	}			}

	define i32 @softwqm(i32 noundef %x) {			define i32 @softwqm(i32 noundef %x) {
	; CHECK-LABEL: define i32 @softwqm			; CHECK-LABEL: define i32 @softwqm
	; CHECK-SAME: (i32 noundef [[X:%.*]]) {			; CHECK-SAME: (i32 noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.softwqm.i32(i32 [[X]])			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.softwqm.i32(i32 [[X]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.softwqm.i32(i32 %x)			%val = call i32 @llvm.amdgcn.softwqm.i32(i32 %x)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	define i32 @wqm(i32 noundef %x) {			define i32 @wqm(i32 noundef %x) {
	; CHECK-LABEL: define i32 @wqm			; CHECK-LABEL: define i32 @wqm
	; CHECK-SAME: (i32 noundef [[X:%.*]]) {			; CHECK-SAME: (i32 noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.wqm.i32(i32 [[X]])			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.wqm.i32(i32 [[X]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.wqm.i32(i32 %x)			%val = call i32 @llvm.amdgcn.wqm.i32(i32 %x)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	define i32 @wwm(i32 noundef %x) {			define i32 @wwm(i32 noundef %x) {
	; CHECK-LABEL: define i32 @wwm			; CHECK-LABEL: define i32 @wwm
	; CHECK-SAME: (i32 noundef [[X:%.*]]) {			; CHECK-SAME: (i32 noundef [[X:%.*]]) {
	; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.wwm.i32(i32 [[X]])			; CHECK-NEXT: [[VAL:%.*]] = call i32 @llvm.amdgcn.wwm.i32(i32 [[X]])
	; CHECK-NEXT: [[FRZ:%.*]] = freeze i32 [[VAL]]			; CHECK-NEXT: ret i32 [[VAL]]
	; CHECK-NEXT: ret i32 [[FRZ]]
	;			;
	%val = call i32 @llvm.amdgcn.wwm.i32(i32 %x)			%val = call i32 @llvm.amdgcn.wwm.i32(i32 %x)
	%frz = freeze i32 %val			%frz = freeze i32 %val
	ret i32 %frz			ret i32 %frz
	}			}

	declare float @llvm.amdgcn.fmed3.f32(float, float, float)			declare float @llvm.amdgcn.fmed3.f32(float, float, float)
	declare float @llvm.amdgcn.fmul.legacy(float, float)			declare float @llvm.amdgcn.fmul.legacy(float, float)
	Show All 15 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][ValueTracking] Handle amdgcn intrinsics that cannot create poisonNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 530849

llvm/lib/Analysis/ValueTracking.cpp

llvm/test/Transforms/InstCombine/AMDGPU/freeze.ll

[AMDGPU][ValueTracking] Handle amdgcn intrinsics that cannot create poison
Needs ReviewPublic