If f16 instructions are legal but there is no f16 med3, we can save
one instruction by expanding med3 into a min/max sequence instead of
casting to f32 and casting back.
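
As a rough illustration of the expansion described above, here is a minimal standalone sketch. It assumes the min/max sequence is the usual median-of-three identity, which takes four min/max operations; the f32 route would presumably need three extensions, one med3, and one truncation. The function names `med3_minmax`, `med3_reference`, and `self_check` are hypothetical and are not taken from the patch, and `std::min`/`std::max` do not model the hardware's NaN or IEEE-mode behaviour.

```cpp
#include <algorithm>
#include <cassert>

// Median of three written as a min/max sequence (four min/max operations).
// Assumption: no NaN inputs; NaN and IEEE-mode semantics are not modeled here.
static float med3_minmax(float a, float b, float c) {
  float lo = std::min(a, b);            // smaller of the first two operands
  float hi = std::max(a, b);            // larger of the first two operands
  return std::max(lo, std::min(hi, c)); // clamp c into [lo, hi]
}

// Reference implementation: sort the three values and take the middle one.
static float med3_reference(float a, float b, float c) {
  float v[3] = {a, b, c};
  std::sort(v, v + 3);
  return v[1];
}

// Exhaustively compare both implementations over a small set of inputs.
static void self_check() {
  const float vals[] = {-2.0f, 0.0f, 0.5f, 1.0f, 3.0f};
  for (float a : vals)
    for (float b : vals)
      for (float c : vals)
        assert(med3_minmax(a, b, c) == med3_reference(a, b, c));
}
```

Calling `med3_minmax(x, 0.0f, 1.0f)` clamps `x` to the range [0, 1], which is the pattern the clamp tests in this review exercise.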
llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp | |
---|---|
391–393 | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | |
5891 | nit: also add a complementary comment in the TableGen file (e.g. TODO: match intrinsics, currently we replace the intrinsic in LegalizerInfo to work around it). That way, if we add intrinsic matching later, we won't forget to remove this workaround when updating the pattern. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | |
11139 | Here you check for f32 explicitly, but I think the GISel combine doesn't enforce it. Why? |

llvm/lib/Target/AMDGPU/SIISelLowering.cpp | |
---|---|
11139 | It doesn't matter much either way since there are no f64 or vector versions of fmed3. |
LGTM. I don't see any regressions, but I can't comment on the codegen change, so if you want a second opinion on the codegen logic, I would ask another reviewer :)
This is causing:
```
FAIL: LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir (1 of 1)
******************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir' FAILED ********************
Script:
--
: 'RUN: at line 2'; /home/jayfoad2/llvm-release/bin/llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir -o - | /home/jayfoad2/llvm-release/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
--
Exit Code: 1

Command Output (stderr):
--
+ : 'RUN: at line 2'
+ /home/jayfoad2/llvm-release/bin/llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir -o -
+ /home/jayfoad2/llvm-release/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:23:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = nnan G_AMDGPU_CLAMP [[FMUL]]
 ^
<stdin>:151:30: note: scanning from here
%3:vgpr(s32) = G_FMUL %0, %2
 ^
<stdin>:151:30: note: with "FMUL" equal to "%3"
%3:vgpr(s32) = G_FMUL %0, %2
 ^
<stdin>:154:2: note: possible intended match here
%6:vgpr(s32) = COPY %5(s32)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:58:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s16) = nnan G_AMDGPU_CLAMP [[FMUL]]
 ^
<stdin>:269:30: note: scanning from here
%4:vgpr(s16) = G_FMUL %1, %3
 ^
<stdin>:269:30: note: with "FMUL" equal to "%4"
%4:vgpr(s16) = G_FMUL %1, %3
 ^
<stdin>:272:2: note: possible intended match here
%7:vgpr(s16) = COPY %6(s16)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:96:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMINNUM_IEEE]]
 ^
<stdin>:387:38: note: scanning from here
%4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
 ^
<stdin>:387:38: note: with "FMINNUM_IEEE" equal to "%4"
%4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
 ^
<stdin>:390:2: note: possible intended match here
%7:vgpr(s32) = COPY %6(s32)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:131:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
 ^
<stdin>:502:30: note: scanning from here
%3:vgpr(s32) = G_FMUL %0, %2
 ^
<stdin>:502:30: note: with "FMUL" equal to "%3"
%3:vgpr(s32) = G_FMUL %0, %2
 ^
<stdin>:505:2: note: possible intended match here
%6:vgpr(s32) = COPY %5(s32)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:245:16: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
 ^
<stdin>:849:30: note: scanning from here
%3:vgpr(s32) = G_FMUL %0, %2
 ^
<stdin>:849:30: note: with "FMUL" equal to "%3"
%3:vgpr(s32) = G_FMUL %0, %2
 ^
<stdin>:852:2: note: possible intended match here
%6:vgpr(s32) = COPY %5(s32)
 ^

Input file: <stdin>
Check file: /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir

-dump-input=help explains the following input dump.

Input was:
<<<<<<
.
.
.
111:   waveLimiter: false
112:   hasSpilledSGPRs: false
113:   hasSpilledVGPRs: false
114:   scratchRSrcReg: '$private_rsrc_reg'
115:   frameOffsetReg: '$fp_reg'
116:   stackPtrOffsetReg: '$sp_reg'
117:   bytesInStackArgArea: 0
118:   returnsVoid: true
119:   argumentInfo:
120:     privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
121:     dispatchPtr: { reg: '$sgpr4_sgpr5' }
122:     queuePtr: { reg: '$sgpr6_sgpr7' }
123:     dispatchID: { reg: '$sgpr10_sgpr11' }
124:     workGroupIDX: { reg: '$sgpr12' }
125:     workGroupIDY: { reg: '$sgpr13' }
126:     workGroupIDZ: { reg: '$sgpr14' }
127:     LDSKernelId: { reg: '$sgpr15' }
128:     implicitArgPtr: { reg: '$sgpr8_sgpr9' }
129:     workItemIDX: { reg: '$vgpr31', mask: 1023 }
130:     workItemIDY: { reg: '$vgpr31', mask: 1047552 }
131:     workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
132:   psInputAddr: 0
133:   psInputEnable: 0
134:   mode:
135:     ieee: true
136:     dx10-clamp: true
137:     fp32-input-denormals: true
138:     fp32-output-denormals: true
139:     fp64-fp16-input-denormals: true
140:     fp64-fp16-output-denormals: true
141:   highBitsOf32BitAddress: 0
142:   occupancy: 16
143:   vgprForAGPRCopy: ''
144: body: |
145:   bb.0:
146:     liveins: $vgpr0
147:
148:     %0:vgpr(s32) = COPY $vgpr0
149:     %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00
150:     %2:vgpr(s32) = COPY %1(s32)
151:     %3:vgpr(s32) = G_FMUL %0, %2
next:23'0 X error: no match found
next:23'1 with "FMUL" equal to "%3"
152:     %4:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
153:     %5:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
154:     %6:vgpr(s32) = COPY %5(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:23'2 ? possible intended match
155:     %7:vgpr(s32) = COPY %4(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
156:     %8:vgpr(s32) = nnan G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
157:     $vgpr0 = COPY %8(s32)
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~
158:
next:23'0 ~
159: ...
next:23'0 ~~~~
160: ---
next:23'0 ~~~~
161: name: test_fmed3_f16_known_nnan_ieee_false
next:23'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
162: alignment: 1
163: exposesReturnsTwice: false
164: legalized: true
165: regBankSelected: true
166: selected: false
167: failedISel: false
168: tracksRegLiveness: true
169: hasWinCFI: false
170: callsEHReturn: false
171: callsUnwindInit: false
172: hasEHCatchret: false
173: hasEHScopes: false
174: hasEHFunclets: false
175: isOutlined: false
176: debugInstrRef: false
177: failsVerification: false
178: tracksDebugUserValues: false
179: registers:
180:   - { id: 0, class: vgpr, preferred-register: '' }
181:   - { id: 1, class: vgpr, preferred-register: '' }
182:   - { id: 2, class: sgpr, preferred-register: '' }
183:   - { id: 3, class: vgpr, preferred-register: '' }
184:   - { id: 4, class: vgpr, preferred-register: '' }
185:   - { id: 5, class: sgpr, preferred-register: '' }
186:   - { id: 6, class: sgpr, preferred-register: '' }
187:   - { id: 7, class: vgpr, preferred-register: '' }
188:   - { id: 8, class: vgpr, preferred-register: '' }
189:   - { id: 9, class: vgpr, preferred-register: '' }
190:   - { id: 10, class: vgpr, preferred-register: '' }
191: liveins: []
192: frameInfo:
193:   isFrameAddressTaken: false
194:   isReturnAddressTaken: false
.
.
.
229:   hasSpilledSGPRs: false
230:   hasSpilledVGPRs: false
231:   scratchRSrcReg: '$private_rsrc_reg'
232:   frameOffsetReg: '$fp_reg'
233:   stackPtrOffsetReg: '$sp_reg'
234:   bytesInStackArgArea: 0
235:   returnsVoid: true
236:   argumentInfo:
237:     privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
238:     dispatchPtr: { reg: '$sgpr4_sgpr5' }
239:     queuePtr: { reg: '$sgpr6_sgpr7' }
240:     dispatchID: { reg: '$sgpr10_sgpr11' }
241:     workGroupIDX: { reg: '$sgpr12' }
242:     workGroupIDY: { reg: '$sgpr13' }
243:     workGroupIDZ: { reg: '$sgpr14' }
244:     LDSKernelId: { reg: '$sgpr15' }
245:     implicitArgPtr: { reg: '$sgpr8_sgpr9' }
246:     workItemIDX: { reg: '$vgpr31', mask: 1023 }
247:     workItemIDY: { reg: '$vgpr31', mask: 1047552 }
248:     workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
249:   psInputAddr: 0
250:   psInputEnable: 0
251:   mode:
252:     ieee: false
253:     dx10-clamp: true
254:     fp32-input-denormals: true
255:     fp32-output-denormals: true
256:     fp64-fp16-input-denormals: true
257:     fp64-fp16-output-denormals: true
258:   highBitsOf32BitAddress: 0
259:   occupancy: 16
260:   vgprForAGPRCopy: ''
261: body: |
262:   bb.0:
263:     liveins: $vgpr0
264:
265:     %0:vgpr(s32) = COPY $vgpr0
266:     %1:vgpr(s16) = G_TRUNC %0(s32)
267:     %2:sgpr(s16) = G_FCONSTANT half 0xH4000
268:     %3:vgpr(s16) = COPY %2(s16)
269:     %4:vgpr(s16) = G_FMUL %1, %3
next:58'0 X error: no match found
next:58'1 with "FMUL" equal to "%4"
270:     %5:sgpr(s16) = G_FCONSTANT half 0xH3C00
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
271:     %6:sgpr(s16) = G_FCONSTANT half 0xH0000
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
272:     %7:vgpr(s16) = COPY %6(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:58'2 ? possible intended match
273:     %8:vgpr(s16) = COPY %5(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
274:     %9:vgpr(s16) = nnan G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %4(s16), %7(s16), %8(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
275:     %10:vgpr(s32) = G_ANYEXT %9(s16)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
276:     $vgpr0 = COPY %10(s32)
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~
277:
next:58'0 ~
278: ...
next:58'0 ~~~~
279: ---
next:58'0 ~~~~
280: name: test_fmed3_non_SNaN_input_ieee_true_dx10clamp_true
next:58'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
281: alignment: 1
282: exposesReturnsTwice: false
283: legalized: true
284: regBankSelected: true
285: selected: false
286: failedISel: false
287: tracksRegLiveness: true
288: hasWinCFI: false
289: callsEHReturn: false
290: callsUnwindInit: false
291: hasEHCatchret: false
292: hasEHScopes: false
293: hasEHFunclets: false
294: isOutlined: false
295: debugInstrRef: false
296: failsVerification: false
297: tracksDebugUserValues: false
298: registers:
299:   - { id: 0, class: vgpr, preferred-register: '' }
300:   - { id: 1, class: sgpr, preferred-register: '' }
301:   - { id: 2, class: vgpr, preferred-register: '' }
302:   - { id: 3, class: vgpr, preferred-register: '' }
303:   - { id: 4, class: vgpr, preferred-register: '' }
304:   - { id: 5, class: sgpr, preferred-register: '' }
305:   - { id: 6, class: sgpr, preferred-register: '' }
306:   - { id: 7, class: vgpr, preferred-register: '' }
307:   - { id: 8, class: vgpr, preferred-register: '' }
308:   - { id: 9, class: vgpr, preferred-register: '' }
309: liveins: []
310: frameInfo:
311:   isFrameAddressTaken: false
312:   isReturnAddressTaken: false
.
.
.
347:   hasSpilledSGPRs: false
348:   hasSpilledVGPRs: false
349:   scratchRSrcReg: '$private_rsrc_reg'
350:   frameOffsetReg: '$fp_reg'
351:   stackPtrOffsetReg: '$sp_reg'
352:   bytesInStackArgArea: 0
353:   returnsVoid: true
354:   argumentInfo:
355:     privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
356:     dispatchPtr: { reg: '$sgpr4_sgpr5' }
357:     queuePtr: { reg: '$sgpr6_sgpr7' }
358:     dispatchID: { reg: '$sgpr10_sgpr11' }
359:     workGroupIDX: { reg: '$sgpr12' }
360:     workGroupIDY: { reg: '$sgpr13' }
361:     workGroupIDZ: { reg: '$sgpr14' }
362:     LDSKernelId: { reg: '$sgpr15' }
363:     implicitArgPtr: { reg: '$sgpr8_sgpr9' }
364:     workItemIDX: { reg: '$vgpr31', mask: 1023 }
365:     workItemIDY: { reg: '$vgpr31', mask: 1047552 }
366:     workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
367:   psInputAddr: 0
368:   psInputEnable: 0
369:   mode:
370:     ieee: true
371:     dx10-clamp: true
372:     fp32-input-denormals: true
373:     fp32-output-denormals: true
374:     fp64-fp16-input-denormals: true
375:     fp64-fp16-output-denormals: true
376:   highBitsOf32BitAddress: 0
377:   occupancy: 16
378:   vgprForAGPRCopy: ''
379: body: |
380:   bb.0:
381:     liveins: $vgpr0
382:
383:     %0:vgpr(s32) = COPY $vgpr0
384:     %1:sgpr(s32) = G_FCONSTANT float 1.000000e+01
385:     %2:vgpr(s32) = G_FCANONICALIZE %0
386:     %3:vgpr(s32) = COPY %1(s32)
387:     %4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
next:96'0 X error: no match found
next:96'1 with "FMINNUM_IEEE" equal to "%4"
388:     %5:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
389:     %6:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
390:     %7:vgpr(s32) = COPY %6(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:96'2 ? possible intended match
391:     %8:vgpr(s32) = COPY %5(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
392:     %9:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %4(s32), %7(s32), %8(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
393:     $vgpr0 = COPY %9(s32)
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~
394:
next:96'0 ~
395: ...
next:96'0 ~~~~
396: ---
next:96'0 ~~~~
397: name: test_fmed3_maybe_SNaN_input_zero_third_operand_ieee_true_dx10clamp_true
next:96'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
398: alignment: 1
399: exposesReturnsTwice: false
400: legalized: true
401: regBankSelected: true
402: selected: false
403: failedISel: false
404: tracksRegLiveness: true
405: hasWinCFI: false
406: callsEHReturn: false
407: callsUnwindInit: false
408: hasEHCatchret: false
409: hasEHScopes: false
410: hasEHFunclets: false
411: isOutlined: false
412: debugInstrRef: false
413: failsVerification: false
414: tracksDebugUserValues: false
415: registers:
416:   - { id: 0, class: vgpr, preferred-register: '' }
417:   - { id: 1, class: sgpr, preferred-register: '' }
418:   - { id: 2, class: vgpr, preferred-register: '' }
419:   - { id: 3, class: vgpr, preferred-register: '' }
420:   - { id: 4, class: sgpr, preferred-register: '' }
421:   - { id: 5, class: sgpr, preferred-register: '' }
422:   - { id: 6, class: vgpr, preferred-register: '' }
423:   - { id: 7, class: vgpr, preferred-register: '' }
424:   - { id: 8, class: vgpr, preferred-register: '' }
425: liveins: []
426: frameInfo:
427:   isFrameAddressTaken: false
428:   isReturnAddressTaken: false
429:   hasStackMap: false
430:   hasPatchPoint: false
.
.
.
462:   waveLimiter: false
463:   hasSpilledSGPRs: false
464:   hasSpilledVGPRs: false
465:   scratchRSrcReg: '$private_rsrc_reg'
466:   frameOffsetReg: '$fp_reg'
467:   stackPtrOffsetReg: '$sp_reg'
468:   bytesInStackArgArea: 0
469:   returnsVoid: true
470:   argumentInfo:
471:     privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
472:     dispatchPtr: { reg: '$sgpr4_sgpr5' }
473:     queuePtr: { reg: '$sgpr6_sgpr7' }
474:     dispatchID: { reg: '$sgpr10_sgpr11' }
475:     workGroupIDX: { reg: '$sgpr12' }
476:     workGroupIDY: { reg: '$sgpr13' }
477:     workGroupIDZ: { reg: '$sgpr14' }
478:     LDSKernelId: { reg: '$sgpr15' }
479:     implicitArgPtr: { reg: '$sgpr8_sgpr9' }
480:     workItemIDX: { reg: '$vgpr31', mask: 1023 }
481:     workItemIDY: { reg: '$vgpr31', mask: 1047552 }
482:     workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
483:   psInputAddr: 0
484:   psInputEnable: 0
485:   mode:
486:     ieee: true
487:     dx10-clamp: true
488:     fp32-input-denormals: true
489:     fp32-output-denormals: true
490:     fp64-fp16-input-denormals: true
491:     fp64-fp16-output-denormals: true
492:   highBitsOf32BitAddress: 0
493:   occupancy: 16
494:   vgprForAGPRCopy: ''
495: body: |
496:   bb.0:
497:     liveins: $vgpr0
498:
499:     %0:vgpr(s32) = COPY $vgpr0
500:     %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00
501:     %2:vgpr(s32) = COPY %1(s32)
502:     %3:vgpr(s32) = G_FMUL %0, %2
next:131'0 X error: no match found
next:131'1 with "FMUL" equal to "%3"
503:     %4:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
504:     %5:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
505:     %6:vgpr(s32) = COPY %5(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:131'2 ? possible intended match
506:     %7:vgpr(s32) = COPY %4(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
507:     %8:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
508:     $vgpr0 = COPY %8(s32)
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~
509:
next:131'0 ~
510: ...
next:131'0 ~~~~
511: ---
next:131'0 ~~~~
512: name: test_fmed3_f32_maybe_NaN_ieee_false
next:131'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
513: alignment: 1
514: exposesReturnsTwice: false
515: legalized: true
516: regBankSelected: true
517: selected: false
518: failedISel: false
519: tracksRegLiveness: true
520: hasWinCFI: false
521: callsEHReturn: false
522: callsUnwindInit: false
523: hasEHCatchret: false
524: hasEHScopes: false
525: hasEHFunclets: false
526: isOutlined: false
527: debugInstrRef: false
528: failsVerification: false
529: tracksDebugUserValues: false
530: registers:
531:   - { id: 0, class: vgpr, preferred-register: '' }
532:   - { id: 1, class: sgpr, preferred-register: '' }
533:   - { id: 2, class: vgpr, preferred-register: '' }
534:   - { id: 3, class: vgpr, preferred-register: '' }
535:   - { id: 4, class: sgpr, preferred-register: '' }
536:   - { id: 5, class: sgpr, preferred-register: '' }
537:   - { id: 6, class: vgpr, preferred-register: '' }
538:   - { id: 7, class: vgpr, preferred-register: '' }
539:   - { id: 8, class: vgpr, preferred-register: '' }
540: liveins: []
541: frameInfo:
542:   isFrameAddressTaken: false
543:   isReturnAddressTaken: false
544:   hasStackMap: false
545:   hasPatchPoint: false
.
.
.
809:   waveLimiter: false
810:   hasSpilledSGPRs: false
811:   hasSpilledVGPRs: false
812:   scratchRSrcReg: '$private_rsrc_reg'
813:   frameOffsetReg: '$fp_reg'
814:   stackPtrOffsetReg: '$sp_reg'
815:   bytesInStackArgArea: 0
816:   returnsVoid: true
817:   argumentInfo:
818:     privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
819:     dispatchPtr: { reg: '$sgpr4_sgpr5' }
820:     queuePtr: { reg: '$sgpr6_sgpr7' }
821:     dispatchID: { reg: '$sgpr10_sgpr11' }
822:     workGroupIDX: { reg: '$sgpr12' }
823:     workGroupIDY: { reg: '$sgpr13' }
824:     workGroupIDZ: { reg: '$sgpr14' }
825:     LDSKernelId: { reg: '$sgpr15' }
826:     implicitArgPtr: { reg: '$sgpr8_sgpr9' }
827:     workItemIDX: { reg: '$vgpr31', mask: 1023 }
828:     workItemIDY: { reg: '$vgpr31', mask: 1047552 }
829:     workItemIDZ: { reg: '$vgpr31', mask: 1072693248 }
830:   psInputAddr: 0
831:   psInputEnable: 0
832:   mode:
833:     ieee: true
834:     dx10-clamp: true
835:     fp32-input-denormals: true
836:     fp32-output-denormals: true
837:     fp64-fp16-input-denormals: true
838:     fp64-fp16-output-denormals: true
839:   highBitsOf32BitAddress: 0
840:   occupancy: 16
841:   vgprForAGPRCopy: ''
842: body: |
843:   bb.0:
844:     liveins: $vgpr0
845:
846:     %0:vgpr(s32) = COPY $vgpr0
847:     %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00
848:     %2:vgpr(s32) = COPY %1(s32)
849:     %3:vgpr(s32) = G_FMUL %0, %2
next:245'0 X error: no match found
next:245'1 with "FMUL" equal to "%3"
850:     %4:sgpr(s32) = G_FCONSTANT float 1.000000e+00
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
851:     %5:sgpr(s32) = G_FCONSTANT float 0.000000e+00
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
852:     %6:vgpr(s32) = COPY %5(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:245'2 ? possible intended match
853:     %7:vgpr(s32) = COPY %4(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
854:     %8:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
855:     $vgpr0 = COPY %8(s32)
next:245'0 ~~~~~~~~~~~~~~~~~~~~~~~
856:
next:245'0 ~
857: ...
next:245'0 ~~~~
>>>>>>

--

********************
********************
Failed Tests (1):
  LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
```