If f16 instructions are legal but there is no f16 med3, we can save
one instruction by expanding to a min/max sequence rather than
casting to f32 and casting back.
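One way to see where the saved instruction could come from: the median of three values can be written with two mins and two maxes (4 operations), whereas going through f32 needs an extend per operand, one f32 med3, and a truncate back (5 operations when all three operands actually need extending). The sketch below illustrates that identity, assuming no NaN inputs. It uses `float` as a stand-in for f16 because standard C++ before C++23 has no portable half type, and the function names are hypothetical, not the patch's code; the exact sequence the patch emits may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: median of three via a min/max sequence (4 operations).
// Assumes no NaN inputs; the hardware fmed3 has its own NaN handling.
float med3_via_minmax(float a, float b, float c) {
  float lo = std::fmin(a, b);  // 1: smaller of a and b
  float hi = std::fmax(a, b);  // 2: larger of a and b
  float t = std::fmin(hi, c);  // 3: c, capped at hi
  return std::fmax(lo, t);     // 4: raised to at least lo
}

// Hypothetical reference: sort the three values and take the middle one.
float med3_reference(float a, float b, float c) {
  float v[3] = {a, b, c};
  std::sort(v, v + 3);
  return v[1];
}
```

For example, `med3_via_minmax(3.0f, 1.0f, 2.0f)` computes `fmin(3, 1) = 1`, `fmax(3, 1) = 3`, `fmin(3, 2) = 2`, then `fmax(1, 2) = 2`, matching `med3_reference`; an `assert` over every permutation of a few distinct values is a quick way to check the two agree.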
| File | Line | Comment |
|---|---|---|
| llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp | 391–393 | |
| llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 5891 | nit: also add a complementary comment in the TableGen file (e.g. "TODO: match intrinsics; currently we replace the intrinsic in LegalizerInfo to work around it"), so that if we add intrinsic matching later, we don't forget to remove this workaround when updating the pattern. |
| llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 11139 | Here you check for f32 explicitly, but I think you don't enforce it in the GISel combine. Why? |
| llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 11139 | Reply: It doesn't matter much either way, since there are no f64 or vector versions of fmed3. |
LGTM. I don't see any regressions, but I can't comment on the codegen change, so if you want a second opinion on the codegen logic, I would ask another reviewer :)
This is causing:
FAIL: LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir (1 of 1)
******************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir' FAILED ********************
Script:
--
: 'RUN: at line 2'; /home/jayfoad2/llvm-release/bin/llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir -o - | /home/jayfoad2/llvm-release/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
--
Exit Code: 1
Command Output (stderr):
--
+ : 'RUN: at line 2'
+ /home/jayfoad2/llvm-release/bin/llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir -o -
+ /home/jayfoad2/llvm-release/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:23:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = nnan G_AMDGPU_CLAMP [[FMUL]]
^
<stdin>:151:30: note: scanning from here
%3:vgpr(s32) = G_FMUL %0, %2
^
<stdin>:151:30: note: with "FMUL" equal to "%3"
%3:vgpr(s32) = G_FMUL %0, %2
^
<stdin>:154:2: note: possible intended match here
%6:vgpr(s32) = COPY %5(s32)
^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:58:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s16) = nnan G_AMDGPU_CLAMP [[FMUL]]
^
<stdin>:269:30: note: scanning from here
%4:vgpr(s16) = G_FMUL %1, %3
^
<stdin>:269:30: note: with "FMUL" equal to "%4"
%4:vgpr(s16) = G_FMUL %1, %3
^
<stdin>:272:2: note: possible intended match here
%7:vgpr(s16) = COPY %6(s16)
^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:96:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMINNUM_IEEE]]
^
<stdin>:387:38: note: scanning from here
%4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
^
<stdin>:387:38: note: with "FMINNUM_IEEE" equal to "%4"
%4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
^
<stdin>:390:2: note: possible intended match here
%7:vgpr(s32) = COPY %6(s32)
^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:131:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
^
<stdin>:502:30: note: scanning from here
%3:vgpr(s32) = G_FMUL %0, %2
^
<stdin>:502:30: note: with "FMUL" equal to "%3"
%3:vgpr(s32) = G_FMUL %0, %2
^
<stdin>:505:2: note: possible intended match here
%6:vgpr(s32) = COPY %5(s32)
^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:245:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
^
<stdin>:849:30: note: scanning from here
%3:vgpr(s32) = G_FMUL %0, %2
^
<stdin>:849:30: note: with "FMUL" equal to "%3"
%3:vgpr(s32) = G_FMUL %0, %2
^
<stdin>:852:2: note: possible intended match here
%6:vgpr(s32) = COPY %5(s32)
^
Input file: <stdin>
Check file: /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
-dump-input=help explains the following input dump.
Input was:
<<<<<<
.
.
.
111: waveLimiter: false
112: hasSpilledSGPRs: false
113: hasSpilledVGPRs: false
114: scratchRSrcReg: '$private_rsrc_reg'
115: frameOffsetReg: '$fp_reg'
116: stackPtrOffsetReg: '$sp_reg'
117: bytesInStackArgArea: 0
118: returnsVoid: true
119: argumentInfo:
120: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
121: dispatchPtr: { reg: '$sgpr4_sgpr5' }
122: queuePtr: { reg: '$sgpr6_sgpr7' }
123: dispatchID: { reg: '$sgpr10_sgpr11' }
124: workGroupIDX: { reg: '$sgpr12' }
125: workGroupIDY: { reg: '$sgpr13' }
126: workGroupIDZ: { reg: '$sgpr14' }
127: LDSKernelId: { reg: '$sgpr15' }
128: implicitArgPtr: { reg: '$sgpr8_sgpr9' }
129: workItemIDX: { reg: '$vgpr31', mask: 1023 }
130: workItemIDY: { reg: '$vgpr31', mask: 1047552 }
131: workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
132: psInputAddr: 0
133: psInputEnable: 0
134: mode:
135: ieee: true
136: dx10-clamp: true
137: fp32-input-denormals: true
138: fp32-output-denormals: true
139: fp64-fp16-input-denormals: true
140: fp64-fp16-output-denormals: true
141: highBitsOf32BitAddress: 0
142: occupancy: 16
143: vgprForAGPRCopy: ''
144: body: |
145: bb.0:
146: liveins: $vgpr0
147:
148: %0:vgpr(s32) = COPY $vgpr0
149: %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00
150: %2:vgpr(s32) = COPY %1(s32)
151: %3:vgpr(s32) = G_FMUL %0, %2
next:23'0 X error: no match found
next:23'1 with "FMUL" equal to "%3"
152: %4:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
153: %5:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
154: %6:vgpr(s32) = COPY %5(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:23'2 ? possible intended match
155: %7:vgpr(s32) = COPY %4(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
156: %8:vgpr(s32) = nnan G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
157: $vgpr0 = COPY %8(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~
158:
next:23'0 ~
159: ...
next:23'0 ~~~~
160: ---
next:23'0 ~~~~
161: name: test_fmed3_f16_known_nnan_ieee_false
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
162: alignment: 1
163: exposesReturnsTwice: false
164: legalized: true
165: regBankSelected: true
166: selected: false
167: failedISel: false
168: tracksRegLiveness: true
169: hasWinCFI: false
170: callsEHReturn: false
171: callsUnwindInit: false
172: hasEHCatchret: false
173: hasEHScopes: false
174: hasEHFunclets: false
175: isOutlined: false
176: debugInstrRef: false
177: failsVerification: false
178: tracksDebugUserValues: false
179: registers:
180: - { id: 0, class: vgpr, preferred-register: '' }
181: - { id: 1, class: vgpr, preferred-register: '' }
182: - { id: 2, class: sgpr, preferred-register: '' }
183: - { id: 3, class: vgpr, preferred-register: '' }
184: - { id: 4, class: vgpr, preferred-register: '' }
185: - { id: 5, class: sgpr, preferred-register: '' }
186: - { id: 6, class: sgpr, preferred-register: '' }
187: - { id: 7, class: vgpr, preferred-register: '' }
188: - { id: 8, class: vgpr, preferred-register: '' }
189: - { id: 9, class: vgpr, preferred-register: '' }
190: - { id: 10, class: vgpr, preferred-register: '' }
191: liveins: []
192: frameInfo:
193: isFrameAddressTaken: false
194: isReturnAddressTaken: false
.
.
.
229: hasSpilledSGPRs: false
230: hasSpilledVGPRs: false
231: scratchRSrcReg: '$private_rsrc_reg'
232: frameOffsetReg: '$fp_reg'
233: stackPtrOffsetReg: '$sp_reg'
234: bytesInStackArgArea: 0
235: returnsVoid: true
236: argumentInfo:
237: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
238: dispatchPtr: { reg: '$sgpr4_sgpr5' }
239: queuePtr: { reg: '$sgpr6_sgpr7' }
240: dispatchID: { reg: '$sgpr10_sgpr11' }
241: workGroupIDX: { reg: '$sgpr12' }
242: workGroupIDY: { reg: '$sgpr13' }
243: workGroupIDZ: { reg: '$sgpr14' }
244: LDSKernelId: { reg: '$sgpr15' }
245: implicitArgPtr: { reg: '$sgpr8_sgpr9' }
246: workItemIDX: { reg: '$vgpr31', mask: 1023 }
247: workItemIDY: { reg: '$vgpr31', mask: 1047552 }
248: workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
249: psInputAddr: 0
250: psInputEnable: 0
251: mode:
252: ieee: false
253: dx10-clamp: true
254: fp32-input-denormals: true
255: fp32-output-denormals: true
256: fp64-fp16-input-denormals: true
257: fp64-fp16-output-denormals: true
258: highBitsOf32BitAddress: 0
259: occupancy: 16
260: vgprForAGPRCopy: ''
261: body: |
262: bb.0:
263: liveins: $vgpr0
264:
265: %0:vgpr(s32) = COPY $vgpr0
266: %1:vgpr(s16) = G_TRUNC %0(s32)
267: %2:sgpr(s16) = G_FCONSTANT half 0xH4000
268: %3:vgpr(s16) = COPY %2(s16)
269: %4:vgpr(s16) = G_FMUL %1, %3
next:58'0 X error: no match found
next:58'1 with "FMUL" equal to "%4"
270: %5:sgpr(s16) = G_FCONSTANT half 0xH3C00
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
271: %6:sgpr(s16) = G_FCONSTANT half 0xH0000
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
272: %7:vgpr(s16) = COPY %6(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:58'2 ? possible intended match
273: %8:vgpr(s16) = COPY %5(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
274: %9:vgpr(s16) = nnan G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %4(s16), %7(s16), %8(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
275: %10:vgpr(s32) = G_ANYEXT %9(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
276: $vgpr0 = COPY %10(s32)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~
277:
next:58'0 ~
278: ...
next:58'0 ~~~~
279: ---
next:58'0 ~~~~
280: name: test_fmed3_non_SNaN_input_ieee_true_dx10clamp_true
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
281: alignment: 1
282: exposesReturnsTwice: false
283: legalized: true
284: regBankSelected: true
285: selected: false
286: failedISel: false
287: tracksRegLiveness: true
288: hasWinCFI: false
289: callsEHReturn: false
290: callsUnwindInit: false
291: hasEHCatchret: false
292: hasEHScopes: false
293: hasEHFunclets: false
294: isOutlined: false
295: debugInstrRef: false
296: failsVerification: false
297: tracksDebugUserValues: false
298: registers:
299: - { id: 0, class: vgpr, preferred-register: '' }
300: - { id: 1, class: sgpr, preferred-register: '' }
301: - { id: 2, class: vgpr, preferred-register: '' }
302: - { id: 3, class: vgpr, preferred-register: '' }
303: - { id: 4, class: vgpr, preferred-register: '' }
304: - { id: 5, class: sgpr, preferred-register: '' }
305: - { id: 6, class: sgpr, preferred-register: '' }
306: - { id: 7, class: vgpr, preferred-register: '' }
307: - { id: 8, class: vgpr, preferred-register: '' }
308: - { id: 9, class: vgpr, preferred-register: '' }
309: liveins: []
310: frameInfo:
311: isFrameAddressTaken: false
312: isReturnAddressTaken: false
.
.
.
347: hasSpilledSGPRs: false
348: hasSpilledVGPRs: false
349: scratchRSrcReg: '$private_rsrc_reg'
350: frameOffsetReg: '$fp_reg'
351: stackPtrOffsetReg: '$sp_reg'
352: bytesInStackArgArea: 0
353: returnsVoid: true
354: argumentInfo:
355: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
356: dispatchPtr: { reg: '$sgpr4_sgpr5' }
357: queuePtr: { reg: '$sgpr6_sgpr7' }
358: dispatchID: { reg: '$sgpr10_sgpr11' }
359: workGroupIDX: { reg: '$sgpr12' }
360: workGroupIDY: { reg: '$sgpr13' }
361: workGroupIDZ: { reg: '$sgpr14' }
362: LDSKernelId: { reg: '$sgpr15' }
363: implicitArgPtr: { reg: '$sgpr8_sgpr9' }
364: workItemIDX: { reg: '$vgpr31', mask: 1023 }
365: workItemIDY: { reg: '$vgpr31', mask: 1047552 }
366: workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
367: psInputAddr: 0
368: psInputEnable: 0
369: mode:
370: ieee: true
371: dx10-clamp: true
372: fp32-input-denormals: true
373: fp32-output-denormals: true
374: fp64-fp16-input-denormals: true
375: fp64-fp16-output-denormals: true
376: highBitsOf32BitAddress: 0
377: occupancy: 16
378: vgprForAGPRCopy: ''
379: body: |
380: bb.0:
381: liveins: $vgpr0
382:
383: %0:vgpr(s32) = COPY $vgpr0
384: %1:sgpr(s32) = G_FCONSTANT float 1.000000e+01
385: %2:vgpr(s32) = G_FCANONICALIZE %0
386: %3:vgpr(s32) = COPY %1(s32)
387: %4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
next:96'0 X error: no match found
next:96'1 with "FMINNUM_IEEE" equal to "%4"
388: %5:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
389: %6:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
390: %7:vgpr(s32) = COPY %6(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:96'2 ? possible intended match
391: %8:vgpr(s32) = COPY %5(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
392: %9:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %4(s32), %7(s32), %8(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
393: $vgpr0 = COPY %9(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~
394:
next:96'0 ~
395: ...
next:96'0 ~~~~
396: ---
next:96'0 ~~~~
397: name: test_fmed3_maybe_SNaN_input_zero_third_operand_ieee_true_dx10clamp_true
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
398: alignment: 1
399: exposesReturnsTwice: false
400: legalized: true
401: regBankSelected: true
402: selected: false
403: failedISel: false
404: tracksRegLiveness: true
405: hasWinCFI: false
406: callsEHReturn: false
407: callsUnwindInit: false
408: hasEHCatchret: false
409: hasEHScopes: false
410: hasEHFunclets: false
411: isOutlined: false
412: debugInstrRef: false
413: failsVerification: false
414: tracksDebugUserValues: false
415: registers:
416: - { id: 0, class: vgpr, preferred-register: '' }
417: - { id: 1, class: sgpr, preferred-register: '' }
418: - { id: 2, class: vgpr, preferred-register: '' }
419: - { id: 3, class: vgpr, preferred-register: '' }
420: - { id: 4, class: sgpr, preferred-register: '' }
421: - { id: 5, class: sgpr, preferred-register: '' }
422: - { id: 6, class: vgpr, preferred-register: '' }
423: - { id: 7, class: vgpr, preferred-register: '' }
424: - { id: 8, class: vgpr, preferred-register: '' }
425: liveins: []
426: frameInfo:
427: isFrameAddressTaken: false
428: isReturnAddressTaken: false
429: hasStackMap: false
430: hasPatchPoint: false
.
.
.
462: waveLimiter: false
463: hasSpilledSGPRs: false
464: hasSpilledVGPRs: false
465: scratchRSrcReg: '$private_rsrc_reg'
466: frameOffsetReg: '$fp_reg'
467: stackPtrOffsetReg: '$sp_reg'
468: bytesInStackArgArea: 0
469: returnsVoid: true
470: argumentInfo:
471: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
472: dispatchPtr: { reg: '$sgpr4_sgpr5' }
473: queuePtr: { reg: '$sgpr6_sgpr7' }
474: dispatchID: { reg: '$sgpr10_sgpr11' }
475: workGroupIDX: { reg: '$sgpr12' }
476: workGroupIDY: { reg: '$sgpr13' }
477: workGroupIDZ: { reg: '$sgpr14' }
478: LDSKernelId: { reg: '$sgpr15' }
479: implicitArgPtr: { reg: '$sgpr8_sgpr9' }
480: workItemIDX: { reg: '$vgpr31', mask: 1023 }
481: workItemIDY: { reg: '$vgpr31', mask: 1047552 }
482: workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
483: psInputAddr: 0
484: psInputEnable: 0
485: mode:
486: ieee: true
487: dx10-clamp: true
488: fp32-input-denormals: true
489: fp32-output-denormals: true
490: fp64-fp16-input-denormals: true
491: fp64-fp16-output-denormals: true
492: highBitsOf32BitAddress: 0
493: occupancy: 16
494: vgprForAGPRCopy: ''
495: body: |
496: bb.0:
497: liveins: $vgpr0
498:
499: %0:vgpr(s32) = COPY $vgpr0
500: %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00
501: %2:vgpr(s32) = COPY %1(s32)
502: %3:vgpr(s32) = G_FMUL %0, %2
next:131'0 X error: no match found
next:131'1 with "FMUL" equal to "%3"
503: %4:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
504: %5:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
505: %6:vgpr(s32) = COPY %5(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:131'2 ? possible intended match
506: %7:vgpr(s32) = COPY %4(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
507: %8:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
508: $vgpr0 = COPY %8(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~
509:
next:131'0 ~
510: ...
next:131'0 ~~~~
511: ---
next:131'0 ~~~~
512: name: test_fmed3_f32_maybe_NaN_ieee_false
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
513: alignment: 1
514: exposesReturnsTwice: false
515: legalized: true
516: regBankSelected: true
517: selected: false
518: failedISel: false
519: tracksRegLiveness: true
520: hasWinCFI: false
521: callsEHReturn: false
522: callsUnwindInit: false
523: hasEHCatchret: false
524: hasEHScopes: false
525: hasEHFunclets: false
526: isOutlined: false
527: debugInstrRef: false
528: failsVerification: false
529: tracksDebugUserValues: false
530: registers:
531: - { id: 0, class: vgpr, preferred-register: '' }
532: - { id: 1, class: sgpr, preferred-register: '' }
533: - { id: 2, class: vgpr, preferred-register: '' }
534: - { id: 3, class: vgpr, preferred-register: '' }
535: - { id: 4, class: sgpr, preferred-register: '' }
536: - { id: 5, class: sgpr, preferred-register: '' }
537: - { id: 6, class: vgpr, preferred-register: '' }
538: - { id: 7, class: vgpr, preferred-register: '' }
539: - { id: 8, class: vgpr, preferred-register: '' }
540: liveins: []
541: frameInfo:
542: isFrameAddressTaken: false
543: isReturnAddressTaken: false
544: hasStackMap: false
545: hasPatchPoint: false
.
.
.
809: waveLimiter: false
810: hasSpilledSGPRs: false
811: hasSpilledVGPRs: false
812: scratchRSrcReg: '$private_rsrc_reg'
813: frameOffsetReg: '$fp_reg'
814: stackPtrOffsetReg: '$sp_reg'
815: bytesInStackArgArea: 0
816: returnsVoid: true
817: argumentInfo:
818: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
819: dispatchPtr: { reg: '$sgpr4_sgpr5' }
820: queuePtr: { reg: '$sgpr6_sgpr7' }
821: dispatchID: { reg: '$sgpr10_sgpr11' }
822: workGroupIDX: { reg: '$sgpr12' }
823: workGroupIDY: { reg: '$sgpr13' }
824: workGroupIDZ: { reg: '$sgpr14' }
825: LDSKernelId: { reg: '$sgpr15' }
826: implicitArgPtr: { reg: '$sgpr8_sgpr9' }
827: workItemIDX: { reg: '$vgpr31', mask: 1023 }
828: workItemIDY: { reg: '$vgpr31', mask: 1047552 }
829: workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
830: psInputAddr: 0
831: psInputEnable: 0
832: mode:
833: ieee: true
834: dx10-clamp: true
835: fp32-input-denormals: true
836: fp32-output-denormals: true
837: fp64-fp16-input-denormals: true
838: fp64-fp16-output-denormals: true
839: highBitsOf32BitAddress: 0
840: occupancy: 16
841: vgprForAGPRCopy: ''
842: body: |
843: bb.0:
844: liveins: $vgpr0
845:
846: %0:vgpr(s32) = COPY $vgpr0
847: %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00
848: %2:vgpr(s32) = COPY %1(s32)
849: %3:vgpr(s32) = G_FMUL %0, %2
next:245'0 X error: no match found
next:245'1 with "FMUL" equal to "%3"
850: %4:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
851: %5:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
852: %6:vgpr(s32) = COPY %5(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:245'2 ? possible intended match
853: %7:vgpr(s32) = COPY %4(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
854: %8:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
855: $vgpr0 = COPY %8(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~
856:
next:245'0 ~
857: ...
next:245'0 ~~~~
>>>>>>
--
********************
********************
Failed Tests (1):
LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
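The failing checks expect the regbank combiner to turn `llvm.amdgcn.fmed3` of a value and the constants 0.0 and 1.0 (in either order) into `G_AMDGPU_CLAMP`; the dump shows the intrinsic left in place instead. The fold relies on the fact that, for non-NaN x, med3(x, 0.0, 1.0) equals x clamped to [0.0, 1.0], and that the median does not depend on operand order. The minimal sketch below shows only that identity; `med3`, `clamp01` and `med3_const_is_clamp` are hypothetical names, and it deliberately ignores the NaN, IEEE-mode and DX10-clamp conditions that the test names (`known_nnan`, `ieee_false`, `dx10clamp_true`) indicate the real combine must respect.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: median of three in NaN-free min/max form.
float med3(float a, float b, float c) {
  return std::fmax(std::fmin(a, b), std::fmin(std::fmax(a, b), c));
}

// Hypothetical helper: clamp x to the unit interval [0.0, 1.0].
float clamp01(float x) {
  return std::fmin(std::fmax(x, 0.0f), 1.0f);
}

// True when med3 with constant bounds 0.0 and 1.0, in both operand orders,
// agrees with clamp01 for x (only meaningful for non-NaN x).
bool med3_const_is_clamp(float x) {
  return med3(x, 0.0f, 1.0f) == clamp01(x) && med3(x, 1.0f, 0.0f) == clamp01(x);
}
```

For instance, `assert(med3_const_is_clamp(2.0f))` holds because both `med3(2.0f, 0.0f, 1.0f)` and `clamp01(2.0f)` give 1.0, which is the case the `G_FMUL ... fmed3(%3, 0.0, 1.0)` tests above exercise.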