This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8
Closed · Public

Authored by arsenm on May 15 2023, 1:59 PM.

Details

Reviewers
foad
Pierre-vh
rampitec
b-sumner
Group Reviewers
Restricted Project
Summary

If we have legal f16 instructions but no f16 med3, we can save
one instruction by expanding out the min/max sequence compared
to casting to f32 and casting back.
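
For readers unfamiliar with the expansion, here is a minimal sketch of how a median-of-three can be written with only min/max operations. It is not taken from the patch: the helper names `med3ViaMinMax` and `clamp01` are hypothetical, the operand order used in the actual lowering may differ, and the sketch assumes no NaN inputs (the real combine has to respect the IEEE mode). It uses four min/max operations, which is presumably where the saving comes from, since the casted form needs three extends, one f32 med3, and one truncate.

```cpp
#include <cmath>

// Hypothetical helper (not from the patch): median of three values using
// only fmin/fmax. Assumes none of the inputs is NaN.
inline float med3ViaMinMax(float a, float b, float c) {
  float lo = std::fmin(a, b);  // smaller of a and b
  float hi = std::fmax(a, b);  // larger of a and b
  // Raise c to at least lo, then cap it at hi: the result is the median.
  return std::fmin(hi, std::fmax(lo, c));
}

// Hypothetical helper: clamping to [0, 1] is the special case
// med3(x, 0.0, 1.0), which is the shape the clamp combine looks for.
inline float clamp01(float x) {
  return med3ViaMinMax(x, 0.0f, 1.0f);
}
```

For example, `med3ViaMinMax(3.0f, 1.0f, 2.0f)` evaluates to `2.0f`, and `clamp01(2.5f)` evaluates to `1.0f`.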

Diff Detail

Event Timeline

arsenm created this revision. May 15 2023, 1:59 PM
Herald added a project: Restricted Project. May 15 2023, 1:59 PM
arsenm requested review of this revision. May 15 2023, 1:59 PM
Herald added a subscriber: wdng.
Pierre-vh added inline comments. May 16 2023, 12:16 AM
llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp
391–393
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
5891

nit: also add a complementary comment in the TableGen file (e.g. "TODO: match intrinsics; currently we replace the intrinsic in LegalizerInfo to work around it"). That way, if we add intrinsic matching later, we won't forget to remove this workaround when updating the pattern.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11139

Here you check for f32 explicitly, but I don't think the GISel combine enforces it. Why?

arsenm added inline comments. May 16 2023, 12:48 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11139

It doesn't matter much either way, since there are no f64 or vector versions of fmed3.

arsenm updated this revision to Diff 523721. May 19 2023, 3:33 AM
arsenm marked 2 inline comments as done.

Address comments

LGTM. I don't see any regressions, but I can't comment on the codegen change, so if you want a second opinion on the codegen logic, I would ask another reviewer :)

Pierre-vh accepted this revision as: Pierre-vh. May 22 2023, 7:01 AM
This revision is now accepted and ready to land. May 22 2023, 7:01 AM
foad added a comment. May 23 2023, 1:16 AM

This is causing:

FAIL: LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir (1 of 1)
******************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir' FAILED ********************
Script:
--
: 'RUN: at line 2';   /home/jayfoad2/llvm-release/bin/llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir -o - | /home/jayfoad2/llvm-release/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
--
Exit Code: 1

Command Output (stderr):
--
+ : 'RUN: at line 2'
+ /home/jayfoad2/llvm-release/bin/llc -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir -o -
+ /home/jayfoad2/llvm-release/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:23:16: error: CHECK-NEXT: expected string not found in input
 ; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = nnan G_AMDGPU_CLAMP [[FMUL]]
               ^
<stdin>:151:30: note: scanning from here
 %3:vgpr(s32) = G_FMUL %0, %2
                             ^
<stdin>:151:30: note: with "FMUL" equal to "%3"
 %3:vgpr(s32) = G_FMUL %0, %2
                             ^
<stdin>:154:2: note: possible intended match here
 %6:vgpr(s32) = COPY %5(s32)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:58:16: error: CHECK-NEXT: expected string not found in input
 ; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s16) = nnan G_AMDGPU_CLAMP [[FMUL]]
               ^
<stdin>:269:30: note: scanning from here
 %4:vgpr(s16) = G_FMUL %1, %3
                             ^
<stdin>:269:30: note: with "FMUL" equal to "%4"
 %4:vgpr(s16) = G_FMUL %1, %3
                             ^
<stdin>:272:2: note: possible intended match here
 %7:vgpr(s16) = COPY %6(s16)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:96:16: error: CHECK-NEXT: expected string not found in input
 ; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMINNUM_IEEE]]
               ^
<stdin>:387:38: note: scanning from here
 %4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
                                     ^
<stdin>:387:38: note: with "FMINNUM_IEEE" equal to "%4"
 %4:vgpr(s32) = G_FMINNUM_IEEE %2, %3
                                     ^
<stdin>:390:2: note: possible intended match here
 %7:vgpr(s32) = COPY %6(s32)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:131:16: error: CHECK-NEXT: expected string not found in input
 ; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
               ^
<stdin>:502:30: note: scanning from here
 %3:vgpr(s32) = G_FMUL %0, %2
                             ^
<stdin>:502:30: note: with "FMUL" equal to "%3"
 %3:vgpr(s32) = G_FMUL %0, %2
                             ^
<stdin>:505:2: note: possible intended match here
 %6:vgpr(s32) = COPY %5(s32)
 ^
/home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir:245:16: error: CHECK-NEXT: expected string not found in input
 ; CHECK-NEXT: [[AMDGPU_CLAMP:%[0-9]+]]:vgpr(s32) = G_AMDGPU_CLAMP [[FMUL]]
               ^
<stdin>:849:30: note: scanning from here
 %3:vgpr(s32) = G_FMUL %0, %2
                             ^
<stdin>:849:30: note: with "FMUL" equal to "%3"
 %3:vgpr(s32) = G_FMUL %0, %2
                             ^
<stdin>:852:2: note: possible intended match here
 %6:vgpr(s32) = COPY %5(s32)
 ^

Input file: <stdin>
Check file: /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
          111:  waveLimiter: false 
          112:  hasSpilledSGPRs: false 
          113:  hasSpilledVGPRs: false 
          114:  scratchRSrcReg: '$private_rsrc_reg' 
          115:  frameOffsetReg: '$fp_reg' 
          116:  stackPtrOffsetReg: '$sp_reg' 
          117:  bytesInStackArgArea: 0 
          118:  returnsVoid: true 
          119:  argumentInfo: 
          120:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
          121:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
          122:  queuePtr: { reg: '$sgpr6_sgpr7' } 
          123:  dispatchID: { reg: '$sgpr10_sgpr11' } 
          124:  workGroupIDX: { reg: '$sgpr12' } 
          125:  workGroupIDY: { reg: '$sgpr13' } 
          126:  workGroupIDZ: { reg: '$sgpr14' } 
          127:  LDSKernelId: { reg: '$sgpr15' } 
          128:  implicitArgPtr: { reg: '$sgpr8_sgpr9' } 
          129:  workItemIDX: { reg: '$vgpr31', mask: 1023 } 
          130:  workItemIDY: { reg: '$vgpr31', mask: 1047552 } 
          131:  workItemIDZ: { reg: '$vgpr31', mask: 1072693248 } 
          132:  psInputAddr: 0 
          133:  psInputEnable: 0 
          134:  mode: 
          135:  ieee: true 
          136:  dx10-clamp: true 
          137:  fp32-input-denormals: true 
          138:  fp32-output-denormals: true 
          139:  fp64-fp16-input-denormals: true 
          140:  fp64-fp16-output-denormals: true 
          141:  highBitsOf32BitAddress: 0 
          142:  occupancy: 16 
          143:  vgprForAGPRCopy: '' 
          144: body: | 
          145:  bb.0: 
          146:  liveins: $vgpr0 
          147:   
          148:  %0:vgpr(s32) = COPY $vgpr0 
          149:  %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00 
          150:  %2:vgpr(s32) = COPY %1(s32) 
          151:  %3:vgpr(s32) = G_FMUL %0, %2 
next:23'0                                   X error: no match found
next:23'1                                     with "FMUL" equal to "%3"
          152:  %4:sgpr(s32) = G_FCONSTANT float 1.000000e+00 
next:23'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          153:  %5:sgpr(s32) = G_FCONSTANT float 0.000000e+00 
next:23'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          154:  %6:vgpr(s32) = COPY %5(s32) 
next:23'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:23'2       ?                            possible intended match
          155:  %7:vgpr(s32) = COPY %4(s32) 
next:23'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          156:  %8:vgpr(s32) = nnan G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32) 
next:23'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          157:  $vgpr0 = COPY %8(s32) 
next:23'0      ~~~~~~~~~~~~~~~~~~~~~~~
          158:  
next:23'0      ~
          159: ... 
next:23'0      ~~~~
          160: --- 
next:23'0      ~~~~
          161: name: test_fmed3_f16_known_nnan_ieee_false 
next:23'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          162: alignment: 1 
          163: exposesReturnsTwice: false 
          164: legalized: true 
          165: regBankSelected: true 
          166: selected: false 
          167: failedISel: false 
          168: tracksRegLiveness: true 
          169: hasWinCFI: false 
          170: callsEHReturn: false 
          171: callsUnwindInit: false 
          172: hasEHCatchret: false 
          173: hasEHScopes: false 
          174: hasEHFunclets: false 
          175: isOutlined: false 
          176: debugInstrRef: false 
          177: failsVerification: false 
          178: tracksDebugUserValues: false 
          179: registers: 
          180:  - { id: 0, class: vgpr, preferred-register: '' } 
          181:  - { id: 1, class: vgpr, preferred-register: '' } 
          182:  - { id: 2, class: sgpr, preferred-register: '' } 
          183:  - { id: 3, class: vgpr, preferred-register: '' } 
          184:  - { id: 4, class: vgpr, preferred-register: '' } 
          185:  - { id: 5, class: sgpr, preferred-register: '' } 
          186:  - { id: 6, class: sgpr, preferred-register: '' } 
          187:  - { id: 7, class: vgpr, preferred-register: '' } 
          188:  - { id: 8, class: vgpr, preferred-register: '' } 
          189:  - { id: 9, class: vgpr, preferred-register: '' } 
          190:  - { id: 10, class: vgpr, preferred-register: '' } 
          191: liveins: [] 
          192: frameInfo: 
          193:  isFrameAddressTaken: false 
          194:  isReturnAddressTaken: false 
            .
            .
            .
          229:  hasSpilledSGPRs: false 
          230:  hasSpilledVGPRs: false 
          231:  scratchRSrcReg: '$private_rsrc_reg' 
          232:  frameOffsetReg: '$fp_reg' 
          233:  stackPtrOffsetReg: '$sp_reg' 
          234:  bytesInStackArgArea: 0 
          235:  returnsVoid: true 
          236:  argumentInfo: 
          237:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
          238:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
          239:  queuePtr: { reg: '$sgpr6_sgpr7' } 
          240:  dispatchID: { reg: '$sgpr10_sgpr11' } 
          241:  workGroupIDX: { reg: '$sgpr12' } 
          242:  workGroupIDY: { reg: '$sgpr13' } 
          243:  workGroupIDZ: { reg: '$sgpr14' } 
          244:  LDSKernelId: { reg: '$sgpr15' } 
          245:  implicitArgPtr: { reg: '$sgpr8_sgpr9' } 
          246:  workItemIDX: { reg: '$vgpr31', mask: 1023 } 
          247:  workItemIDY: { reg: '$vgpr31', mask: 1047552 } 
          248:  workItemIDZ: { reg: '$vgpr31', mask: 1072693248 } 
          249:  psInputAddr: 0 
          250:  psInputEnable: 0 
          251:  mode: 
          252:  ieee: false 
          253:  dx10-clamp: true 
          254:  fp32-input-denormals: true 
          255:  fp32-output-denormals: true 
          256:  fp64-fp16-input-denormals: true 
          257:  fp64-fp16-output-denormals: true 
          258:  highBitsOf32BitAddress: 0 
          259:  occupancy: 16 
          260:  vgprForAGPRCopy: '' 
          261: body: | 
          262:  bb.0: 
          263:  liveins: $vgpr0 
          264:   
          265:  %0:vgpr(s32) = COPY $vgpr0 
          266:  %1:vgpr(s16) = G_TRUNC %0(s32) 
          267:  %2:sgpr(s16) = G_FCONSTANT half 0xH4000 
          268:  %3:vgpr(s16) = COPY %2(s16) 
          269:  %4:vgpr(s16) = G_FMUL %1, %3 
next:58'0                                   X error: no match found
next:58'1                                     with "FMUL" equal to "%4"
          270:  %5:sgpr(s16) = G_FCONSTANT half 0xH3C00 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          271:  %6:sgpr(s16) = G_FCONSTANT half 0xH0000 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          272:  %7:vgpr(s16) = COPY %6(s16) 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:58'2       ?                            possible intended match
          273:  %8:vgpr(s16) = COPY %5(s16) 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          274:  %9:vgpr(s16) = nnan G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %4(s16), %7(s16), %8(s16) 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          275:  %10:vgpr(s32) = G_ANYEXT %9(s16) 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          276:  $vgpr0 = COPY %10(s32) 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~
          277:  
next:58'0      ~
          278: ... 
next:58'0      ~~~~
          279: --- 
next:58'0      ~~~~
          280: name: test_fmed3_non_SNaN_input_ieee_true_dx10clamp_true 
next:58'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          281: alignment: 1 
          282: exposesReturnsTwice: false 
          283: legalized: true 
          284: regBankSelected: true 
          285: selected: false 
          286: failedISel: false 
          287: tracksRegLiveness: true 
          288: hasWinCFI: false 
          289: callsEHReturn: false 
          290: callsUnwindInit: false 
          291: hasEHCatchret: false 
          292: hasEHScopes: false 
          293: hasEHFunclets: false 
          294: isOutlined: false 
          295: debugInstrRef: false 
          296: failsVerification: false 
          297: tracksDebugUserValues: false 
          298: registers: 
          299:  - { id: 0, class: vgpr, preferred-register: '' } 
          300:  - { id: 1, class: sgpr, preferred-register: '' } 
          301:  - { id: 2, class: vgpr, preferred-register: '' } 
          302:  - { id: 3, class: vgpr, preferred-register: '' } 
          303:  - { id: 4, class: vgpr, preferred-register: '' } 
          304:  - { id: 5, class: sgpr, preferred-register: '' } 
          305:  - { id: 6, class: sgpr, preferred-register: '' } 
          306:  - { id: 7, class: vgpr, preferred-register: '' } 
          307:  - { id: 8, class: vgpr, preferred-register: '' } 
          308:  - { id: 9, class: vgpr, preferred-register: '' } 
          309: liveins: [] 
          310: frameInfo: 
          311:  isFrameAddressTaken: false 
          312:  isReturnAddressTaken: false 
            .
            .
            .
          347:  hasSpilledSGPRs: false 
          348:  hasSpilledVGPRs: false 
          349:  scratchRSrcReg: '$private_rsrc_reg' 
          350:  frameOffsetReg: '$fp_reg' 
          351:  stackPtrOffsetReg: '$sp_reg' 
          352:  bytesInStackArgArea: 0 
          353:  returnsVoid: true 
          354:  argumentInfo: 
          355:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
          356:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
          357:  queuePtr: { reg: '$sgpr6_sgpr7' } 
          358:  dispatchID: { reg: '$sgpr10_sgpr11' } 
          359:  workGroupIDX: { reg: '$sgpr12' } 
          360:  workGroupIDY: { reg: '$sgpr13' } 
          361:  workGroupIDZ: { reg: '$sgpr14' } 
          362:  LDSKernelId: { reg: '$sgpr15' } 
          363:  implicitArgPtr: { reg: '$sgpr8_sgpr9' } 
          364:  workItemIDX: { reg: '$vgpr31', mask: 1023 } 
          365:  workItemIDY: { reg: '$vgpr31', mask: 1047552 } 
          366:  workItemIDZ: { reg: '$vgpr31', mask: 1072693248 } 
          367:  psInputAddr: 0 
          368:  psInputEnable: 0 
          369:  mode: 
          370:  ieee: true 
          371:  dx10-clamp: true 
          372:  fp32-input-denormals: true 
          373:  fp32-output-denormals: true 
          374:  fp64-fp16-input-denormals: true 
          375:  fp64-fp16-output-denormals: true 
          376:  highBitsOf32BitAddress: 0 
          377:  occupancy: 16 
          378:  vgprForAGPRCopy: '' 
          379: body: | 
          380:  bb.0: 
          381:  liveins: $vgpr0 
          382:   
          383:  %0:vgpr(s32) = COPY $vgpr0 
          384:  %1:sgpr(s32) = G_FCONSTANT float 1.000000e+01 
          385:  %2:vgpr(s32) = G_FCANONICALIZE %0 
          386:  %3:vgpr(s32) = COPY %1(s32) 
          387:  %4:vgpr(s32) = G_FMINNUM_IEEE %2, %3 
next:96'0                                           X error: no match found
next:96'1                                             with "FMINNUM_IEEE" equal to "%4"
          388:  %5:sgpr(s32) = G_FCONSTANT float 1.000000e+00 
next:96'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          389:  %6:sgpr(s32) = G_FCONSTANT float 0.000000e+00 
next:96'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          390:  %7:vgpr(s32) = COPY %6(s32) 
next:96'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:96'2       ?                            possible intended match
          391:  %8:vgpr(s32) = COPY %5(s32) 
next:96'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          392:  %9:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %4(s32), %7(s32), %8(s32) 
next:96'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          393:  $vgpr0 = COPY %9(s32) 
next:96'0      ~~~~~~~~~~~~~~~~~~~~~~~
          394:  
next:96'0      ~
          395: ... 
next:96'0      ~~~~
          396: --- 
next:96'0      ~~~~
          397: name: test_fmed3_maybe_SNaN_input_zero_third_operand_ieee_true_dx10clamp_true 
next:96'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          398: alignment: 1 
          399: exposesReturnsTwice: false 
          400: legalized: true 
          401: regBankSelected: true 
          402: selected: false 
          403: failedISel: false 
          404: tracksRegLiveness: true 
          405: hasWinCFI: false 
          406: callsEHReturn: false 
          407: callsUnwindInit: false 
          408: hasEHCatchret: false 
          409: hasEHScopes: false 
          410: hasEHFunclets: false 
          411: isOutlined: false 
          412: debugInstrRef: false 
          413: failsVerification: false 
          414: tracksDebugUserValues: false 
          415: registers: 
          416:  - { id: 0, class: vgpr, preferred-register: '' } 
          417:  - { id: 1, class: sgpr, preferred-register: '' } 
          418:  - { id: 2, class: vgpr, preferred-register: '' } 
          419:  - { id: 3, class: vgpr, preferred-register: '' } 
          420:  - { id: 4, class: sgpr, preferred-register: '' } 
          421:  - { id: 5, class: sgpr, preferred-register: '' } 
          422:  - { id: 6, class: vgpr, preferred-register: '' } 
          423:  - { id: 7, class: vgpr, preferred-register: '' } 
          424:  - { id: 8, class: vgpr, preferred-register: '' } 
          425: liveins: [] 
          426: frameInfo: 
          427:  isFrameAddressTaken: false 
          428:  isReturnAddressTaken: false 
          429:  hasStackMap: false 
          430:  hasPatchPoint: false 
            .
            .
            .
          462:  waveLimiter: false 
          463:  hasSpilledSGPRs: false 
          464:  hasSpilledVGPRs: false 
          465:  scratchRSrcReg: '$private_rsrc_reg' 
          466:  frameOffsetReg: '$fp_reg' 
          467:  stackPtrOffsetReg: '$sp_reg' 
          468:  bytesInStackArgArea: 0 
          469:  returnsVoid: true 
          470:  argumentInfo: 
          471:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
          472:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
          473:  queuePtr: { reg: '$sgpr6_sgpr7' } 
          474:  dispatchID: { reg: '$sgpr10_sgpr11' } 
          475:  workGroupIDX: { reg: '$sgpr12' } 
          476:  workGroupIDY: { reg: '$sgpr13' } 
          477:  workGroupIDZ: { reg: '$sgpr14' } 
          478:  LDSKernelId: { reg: '$sgpr15' } 
          479:  implicitArgPtr: { reg: '$sgpr8_sgpr9' } 
          480:  workItemIDX: { reg: '$vgpr31', mask: 1023 } 
          481:  workItemIDY: { reg: '$vgpr31', mask: 1047552 } 
          482:  workItemIDZ: { reg: '$vgpr31', mask: 1072693248 } 
          483:  psInputAddr: 0 
          484:  psInputEnable: 0 
          485:  mode: 
          486:  ieee: true 
          487:  dx10-clamp: true 
          488:  fp32-input-denormals: true 
          489:  fp32-output-denormals: true 
          490:  fp64-fp16-input-denormals: true 
          491:  fp64-fp16-output-denormals: true 
          492:  highBitsOf32BitAddress: 0 
          493:  occupancy: 16 
          494:  vgprForAGPRCopy: '' 
          495: body: | 
          496:  bb.0: 
          497:  liveins: $vgpr0 
          498:   
          499:  %0:vgpr(s32) = COPY $vgpr0 
          500:  %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00 
          501:  %2:vgpr(s32) = COPY %1(s32) 
          502:  %3:vgpr(s32) = G_FMUL %0, %2 
next:131'0                                  X error: no match found
next:131'1                                    with "FMUL" equal to "%3"
          503:  %4:sgpr(s32) = G_FCONSTANT float 0.000000e+00 
next:131'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          504:  %5:sgpr(s32) = G_FCONSTANT float 1.000000e+00 
next:131'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          505:  %6:vgpr(s32) = COPY %5(s32) 
next:131'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:131'2      ?                            possible intended match
          506:  %7:vgpr(s32) = COPY %4(s32) 
next:131'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          507:  %8:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32) 
next:131'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          508:  $vgpr0 = COPY %8(s32) 
next:131'0     ~~~~~~~~~~~~~~~~~~~~~~~
          509:  
next:131'0     ~
          510: ... 
next:131'0     ~~~~
          511: --- 
next:131'0     ~~~~
          512: name: test_fmed3_f32_maybe_NaN_ieee_false 
next:131'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          513: alignment: 1 
          514: exposesReturnsTwice: false 
          515: legalized: true 
          516: regBankSelected: true 
          517: selected: false 
          518: failedISel: false 
          519: tracksRegLiveness: true 
          520: hasWinCFI: false 
          521: callsEHReturn: false 
          522: callsUnwindInit: false 
          523: hasEHCatchret: false 
          524: hasEHScopes: false 
          525: hasEHFunclets: false 
          526: isOutlined: false 
          527: debugInstrRef: false 
          528: failsVerification: false 
          529: tracksDebugUserValues: false 
          530: registers: 
          531:  - { id: 0, class: vgpr, preferred-register: '' } 
          532:  - { id: 1, class: sgpr, preferred-register: '' } 
          533:  - { id: 2, class: vgpr, preferred-register: '' } 
          534:  - { id: 3, class: vgpr, preferred-register: '' } 
          535:  - { id: 4, class: sgpr, preferred-register: '' } 
          536:  - { id: 5, class: sgpr, preferred-register: '' } 
          537:  - { id: 6, class: vgpr, preferred-register: '' } 
          538:  - { id: 7, class: vgpr, preferred-register: '' } 
          539:  - { id: 8, class: vgpr, preferred-register: '' } 
          540: liveins: [] 
          541: frameInfo: 
          542:  isFrameAddressTaken: false 
          543:  isReturnAddressTaken: false 
          544:  hasStackMap: false 
          545:  hasPatchPoint: false 
            .
            .
            .
          809:  waveLimiter: false 
          810:  hasSpilledSGPRs: false 
          811:  hasSpilledVGPRs: false 
          812:  scratchRSrcReg: '$private_rsrc_reg' 
          813:  frameOffsetReg: '$fp_reg' 
          814:  stackPtrOffsetReg: '$sp_reg' 
          815:  bytesInStackArgArea: 0 
          816:  returnsVoid: true 
          817:  argumentInfo: 
          818:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
          819:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
          820:  queuePtr: { reg: '$sgpr6_sgpr7' } 
          821:  dispatchID: { reg: '$sgpr10_sgpr11' } 
          822:  workGroupIDX: { reg: '$sgpr12' } 
          823:  workGroupIDY: { reg: '$sgpr13' } 
          824:  workGroupIDZ: { reg: '$sgpr14' } 
          825:  LDSKernelId: { reg: '$sgpr15' } 
          826:  implicitArgPtr: { reg: '$sgpr8_sgpr9' } 
          827:  workItemIDX: { reg: '$vgpr31', mask: 1023 } 
          828:  workItemIDY: { reg: '$vgpr31', mask: 1047552 } 
          829:  workItemIDZ: { reg: '$vgpr31', mask: 1072693248 } 
          830:  psInputAddr: 0 
          831:  psInputEnable: 0 
          832:  mode: 
          833:  ieee: true 
          834:  dx10-clamp: true 
          835:  fp32-input-denormals: true 
          836:  fp32-output-denormals: true 
          837:  fp64-fp16-input-denormals: true 
          838:  fp64-fp16-output-denormals: true 
          839:  highBitsOf32BitAddress: 0 
          840:  occupancy: 16 
          841:  vgprForAGPRCopy: '' 
          842: body: | 
          843:  bb.0: 
          844:  liveins: $vgpr0 
          845:   
          846:  %0:vgpr(s32) = COPY $vgpr0 
          847:  %1:sgpr(s32) = G_FCONSTANT float 2.000000e+00 
          848:  %2:vgpr(s32) = COPY %1(s32) 
          849:  %3:vgpr(s32) = G_FMUL %0, %2 
next:245'0                                  X error: no match found
next:245'1                                    with "FMUL" equal to "%3"
          850:  %4:sgpr(s32) = G_FCONSTANT float 1.000000e+00 
next:245'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          851:  %5:sgpr(s32) = G_FCONSTANT float 0.000000e+00 
next:245'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          852:  %6:vgpr(s32) = COPY %5(s32) 
next:245'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:245'2      ?                            possible intended match
          853:  %7:vgpr(s32) = COPY %4(s32) 
next:245'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          854:  %8:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmed3), %3(s32), %6(s32), %7(s32) 
next:245'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          855:  $vgpr0 = COPY %8(s32) 
next:245'0     ~~~~~~~~~~~~~~~~~~~~~~~
          856:  
next:245'0     ~
          857: ... 
next:245'0     ~~~~
>>>>>>

--

********************
********************
Failed Tests (1):
  LLVM :: CodeGen/AMDGPU/GlobalISel/regbankcombiner-clamp-fmed3-const.mir