This is an archive of the discontinued LLVM Phabricator instance.

VCC encoding
wave32 test is failing because the V_DIV_SCALE instruction has no wave32 variant, so I can't print VCC_LO. Ideally I'd print the implicit def (so it's always correct and not hardcoded like that) but I'm not sure it's possible?

Harbormaster completed remote builds in B181499: Diff 452971. Aug 16 2022, 6:21 AM

I'll need some help in review. I think the VCC encoding is correct, but it should be expressed differently (perhaps by using the first 7 bits of HWEncoding?).
As for the wave32 test (having different asm depending on wave32/64), I'm not sure how to do that; this is where I'm currently stuck.

If we prohibit any SDST except VCC, it should also be prohibited in asm/disasm.

This is not the first instruction which can only have VCC as carry; see any VOP2be instruction, for example V_ADD_CO_U32.

It shall not have SDST at all and instead impdef VCC. The asm string operand is replaced in the real instruction depending on the wave size, see for example:

def _dpp_w32_gfx10 :
  Base_VOP2_DPP16<op, !cast<VOP2_DPP_Pseudo>(opName#"_dpp"), asmName> {
    string AsmDPP = !cast<VOP2_Pseudo>(opName#"_e32").Pfl.AsmDPP16;
    let AsmString = asmName # !subst("vcc", "vcc_lo", AsmDPP);
    let isAsmParserOnly = 1;
    let WaveSizePredicate = isWave32;
  }

Plus you will need to enforce the SDST encoding field to VCC for these instructions only; there is no need for a new SDstIsAlwaysVCC.
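
The `!subst` in the snippet above rewrites the asm string for the wave32 variant by replacing every occurrence of one substring with another. A rough C++ analogue, as a sketch only: `substAll` and the sample asm strings in the usage note are hypothetical and are not LLVM code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring TableGen's !subst: replace every occurrence
// of `from` in `value` with `to`.
std::string substAll(std::string value, const std::string &from,
                     const std::string &to) {
  assert(!from.empty() && "an empty pattern would never advance");
  std::size_t pos = 0;
  while ((pos = value.find(from, pos)) != std::string::npos) {
    value.replace(pos, from.size(), to);
    // Skip past the inserted text so the "vcc" inside "vcc_lo" is not
    // matched again.
    pos += to.size();
  }
  return value;
}
```

For instance, `substAll("$vdst, vcc, $src0, $src1", "vcc", "vcc_lo")` rewrites the operand list so that it names `vcc_lo`, which is what the wave32 real instruction is expected to print.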

This revision now requires changes to proceed. Aug 22 2022, 11:00 AM

In D131959#3740314, @rampitec wrote:
If we prohibit any SDST except VCC, it should also be prohibited in asm/disasm.

This is not the first instruction which can only have VCC as carry; see any VOP2be instruction, for example V_ADD_CO_U32.

It shall not have SDST at all and instead impdef VCC. The asm string operand is replaced in the real instruction depending on the wave size, see for example:
def _dpp_w32_gfx10 :
  Base_VOP2_DPP16<op, !cast<VOP2_DPP_Pseudo>(opName#"_dpp"), asmName> {
    string AsmDPP = !cast<VOP2_Pseudo>(opName#"_e32").Pfl.AsmDPP16;
    let AsmString = asmName # !subst("vcc", "vcc_lo", AsmDPP);
    let isAsmParserOnly = 1;
    let WaveSizePredicate = isWave32;
  }
Plus you will need to enforce the SDST encoding field to VCC for these instructions only; there is no need for a new SDstIsAlwaysVCC.

Will I need to create 2 variants of the instruction, a wave32/wave64 variant (e32/e64 if I understand correctly) for this to work?
Also for the asm/disasm, do I need to change the asmparser/add tests to verify everything other than VCC is rejected in the dst?

enforce SDST encoding field to VCC for these instructions only

How can I do this? Do I also follow the example of V_ADD_CO_U32?

In D131959#3740396, @Pierre-vh wrote:
In D131959#3740314, @rampitec wrote:
If we prohibit any SDST except VCC, it should also be prohibited in asm/disasm.

This is not the first instruction which can only have VCC as carry; see any VOP2be instruction, for example V_ADD_CO_U32.

It shall not have SDST at all and instead impdef VCC. The asm string operand is replaced in the real instruction depending on the wave size, see for example:
def _dpp_w32_gfx10 :
  Base_VOP2_DPP16<op, !cast<VOP2_DPP_Pseudo>(opName#"_dpp"), asmName> {
    string AsmDPP = !cast<VOP2_Pseudo>(opName#"_e32").Pfl.AsmDPP16;
    let AsmString = asmName # !subst("vcc", "vcc_lo", AsmDPP);
    let isAsmParserOnly = 1;
    let WaveSizePredicate = isWave32;
  }
Plus you will need to enforce the SDST encoding field to VCC for these instructions only; there is no need for a new SDstIsAlwaysVCC.
Will I need to create 2 variants of the instruction, a wave32/wave64 variant (e32/e64 if I understand correctly) for this to work?

You need a single pseudo, but 2 different real instructions with different WaveSizePredicate. V_ADD_CO_U32 is not a good example after all: it does that for sdwa/dpp, which this instruction does not have. You can pick a similar example from VOPCInstructions.td by looking for the _w32/_w64 variants.
Note, this instruction only has an _e64 variant.

Also for the asm/disasm, do I need to change the asmparser/add tests to verify everything other than VCC is rejected in the dst?

enforce SDST encoding field to VCC for these instructions only

How can I do this? Do I also follow the example of V_ADD_CO_U32?

Just enforce sdst field to !cast<int>(VCC.HWEncoding) (i.e. 0x6a) in the Real class.

Also note, existing mc tests shall fail when you do it. You will have to create w32/w64 tests for gfx10/11 and negative tests for non-vcc.

In D131959#3740590, @rampitec wrote:
In D131959#3740396, @Pierre-vh wrote:
In D131959#3740314, @rampitec wrote:
If we prohibit any SDST except VCC, it should also be prohibited in asm/disasm.

This is not the first instruction which can only have VCC as carry; see any VOP2be instruction, for example V_ADD_CO_U32.

It shall not have SDST at all and instead impdef VCC. The asm string operand is replaced in the real instruction depending on the wave size, see for example:
def _dpp_w32_gfx10 :
  Base_VOP2_DPP16<op, !cast<VOP2_DPP_Pseudo>(opName#"_dpp"), asmName> {
    string AsmDPP = !cast<VOP2_Pseudo>(opName#"_e32").Pfl.AsmDPP16;
    let AsmString = asmName # !subst("vcc", "vcc_lo", AsmDPP);
    let isAsmParserOnly = 1;
    let WaveSizePredicate = isWave32;
  }
Plus you will need to enforce the SDST encoding field to VCC for these instructions only; there is no need for a new SDstIsAlwaysVCC.
Will I need to create 2 variants of the instruction, a wave32/wave64 variant (e32/e64 if I understand correctly) for this to work?
You need a single pseudo, but 2 different real instructions with different WaveSizePredicate. V_ADD_CO_U32 is not a good example after all: it does that for sdwa/dpp, which this instruction does not have. You can pick a similar example from VOPCInstructions.td by looking for the _w32/_w64 variants.
Note, this instruction only has an _e64 variant.

Also for the asm/disasm, do I need to change the asmparser/add tests to verify everything other than VCC is rejected in the dst?

enforce SDST encoding field to VCC for these instructions only

How can I do this? Do I also follow the example of V_ADD_CO_U32?

Just enforce sdst field to !cast<int>(VCC.HWEncoding) (i.e. 0x6a) in the Real class.

I've written the following for one of the real instructions, and I get an encoding conflict (VCC/VCC_LO have the same encoding).
Am I doing this wrong?

Also, do I need to create a new "Real" class specifically for this, or should I just add a flag to the VOP_PROFILE & change the existing classes to create _w32/64 variants?
Thanks

multiclass VOP3be_VCCSDST_Real_vi<bits<10> op> {
  let sdst = !cast<int>(VCC_LO.HWEncoding), WaveSizePredicate = isWave32 in
  def _vi_w32 : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
                VOP3be_vi <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl>;

  let sdst = !cast<int>(VCC.HWEncoding), WaveSizePredicate = isWave64 in
  def _vi_w64 : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
                VOP3be_vi <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl>;
}

[build] Decoding Conflict:
[build] 		................................1101000111100000.1101010........
[build] 		................................1101000111100000................
[build] 		................................110100..........................
[build] 		................................................................
[build] 	V_DIV_SCALE_F32_vi_w32 ________________________________1101000111100000_1101010________
[build] 	V_DIV_SCALE_F32_vi_w64 ________________________________1101000111100000_1101010________
[build] Decoding Conflict:
[build] 		................................1101000111100001.1101010........
[build] 		................................1101000111100001................
[build] 		................................110100..........................
[build] 		................................................................
[build] 	V_DIV_SCALE_F64_vi_w32 ________________________________1101000111100001_1101010________
[build] 	V_DIV_SCALE_F64_vi_w64 ________________________________1101000111100001_1101010________
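
The conflicting rows above are bit-for-bit identical because VCC and VCC_LO share the hardware encoding 0x6a (0b1101010), so fixing `sdst` alone cannot tell the w32 and w64 variants apart. The sketch below rebuilds the fixed bits of the low dword. The field positions are read off the dump above (prefix in bits 31:26, 10-bit opcode in bits 25:16, `sdst` in bits 14:8), and the helper and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the thread: VCC is 0x6a, and VCC_LO shares that encoding.
constexpr std::uint32_t kVccEncoding = 0x6a;
constexpr std::uint32_t kVccLoEncoding = 0x6a;

// Hypothetical helper: fixed bits of the low dword of the VI VOP3b encoding,
// with vdst and the remaining don't-care bits left at zero.
std::uint32_t encodeVop3bLow(std::uint32_t op, std::uint32_t sdst) {
  assert(op < (1u << 10) && "opcode is a 10-bit field");
  assert(sdst < (1u << 7) && "sdst is a 7-bit field");
  const std::uint32_t prefix = 0x34; // 0b110100, bits 31:26 in the dump
  return (prefix << 26) | (op << 16) | (sdst << 8);
}
```

With opcode 0x1e0 (the F32 row in the dump), both `kVccEncoding` and `kVccLoEncoding` yield the same word, which matches the decoder's complaint that it cannot distinguish the two definitions.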

Adding isAsmParserOnly solves it, but now it's InstrInfo that fails:

[build] error: Multiple matches found for `V_DIV_SCALE_F32_e64', for the relation `getMCOpcodeGen', row fields ["v_div_scale_f32_e64"], column `["0"]'
[build] CurInstr: V_DIV_SCALE_F32_w64_gfx6_gfx7
[build] MatchInstr: V_DIV_SCALE_F32_w32_gfx6_gfx7
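
The error reports that two real instructions claim the same (row, column) slot in the `getMCOpcodeGen` relation. A toy model of that constraint, assuming the relation behaves like a map that allows one entry per slot; the `OpcodeRelation` class is hypothetical and is not how TableGen implements instruction mappings:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy relation: each (row, column) pair may map to exactly one real opcode.
class OpcodeRelation {
public:
  // Returns false when the slot is already taken, which corresponds to the
  // "Multiple matches found" error in the log.
  bool add(const std::string &row, int column, const std::string &real) {
    assert(!row.empty() && "row key must name a pseudo instruction");
    return table.emplace(std::make_pair(row, column), real).second;
  }

private:
  std::map<std::pair<std::string, int>, std::string> table;
};
```

In this model, registering both `V_DIV_SCALE_F32_w64_gfx6_gfx7` and `V_DIV_SCALE_F32_w32_gfx6_gfx7` under row `v_div_scale_f32_e64` and column 0 fails on the second insertion, since the two definitions differ only by wave size and wave size is not part of the key.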

I'm going to update the diff with my latest (non-working) draft. It's a bit messy so I'll look into cleaning it up further.

Pierre-vh updated this revision to Diff 454752. Aug 23 2022, 2:19 AM

Non-working draft/refactor

Harbormaster completed remote builds in B182785: Diff 454752. Aug 23 2022, 2:20 AM

In D131959#3742034, @Pierre-vh wrote:
Adding isAsmParserOnly solves it, but now it's InstrInfo that fails:
[build] error: Multiple matches found for `V_DIV_SCALE_F32_e64', for the relation `getMCOpcodeGen', row fields ["v_div_scale_f32_e64"], column `["0"]'
[build] CurInstr: V_DIV_SCALE_F32_w64_gfx6_gfx7
[build] MatchInstr: V_DIV_SCALE_F32_w32_gfx6_gfx7
I'm going to update the diff with my latest (non-working) draft. It's a bit messy so I'll look into cleaning it up further.

You should only define w32/w64 for subtargets which have it (i.e. gfx10+).

Non-working Draft.

In D131959#3743472, @rampitec wrote:
In D131959#3742034, @Pierre-vh wrote:
Adding isAsmParserOnly solves it, but now it's InstrInfo that fails:
[build] error: Multiple matches found for `V_DIV_SCALE_F32_e64', for the relation `getMCOpcodeGen', row fields ["v_div_scale_f32_e64"], column `["0"]'
[build] CurInstr: V_DIV_SCALE_F32_w64_gfx6_gfx7
[build] MatchInstr: V_DIV_SCALE_F32_w32_gfx6_gfx7
I'm going to update the diff with my latest (non-working) draft. It's a bit messy so I'll look into cleaning it up further.
You should only define w32/w64 for subtargets which have it (i.e. gfx10+).

Unfortunately I keep having the issue even after I removed the <gfx10 w32/w64 variants.

[build] error: Multiple matches found for `V_DIV_SCALE_F32_e64', for the relation `getMCOpcodeGen', row fields ["v_div_scale_f32_e64"], column `["6"]'
[build] V_DIV_SCALE_F32_w64_gfx10
[build] V_DIV_SCALE_F32_w32_gfx10

Maybe I need to create an InstAlias for wave32/wave64? Not sure how it works

Harbormaster completed remote builds in B183051: Diff 455121. Aug 24 2022, 3:02 AM

It compiles now, but it still does not print vcc_lo for wave32.ll and I'm not sure what else to try.
I spent a while comparing the detailed records of TableGen and found no major difference. I also tried
looking for example instructions that have this w32 variant (like v_cmp_lt) and found no special-casing anywhere in the codebase, so
I'm not sure what to do.

Removing VOP_Real from the w32/w64 variants.
Still not working

Various cleanups
vcc_lo/w32 variant is still not taken into account.

So far, if I understand correctly, my changes only affect the asm parser, but not yet the printer, correct? That's why I'm not seeing vcc_lo in the output.
It seems like some changes are needed on that front but I'm not sure where/how as it looks like this is the first VOP3 instruction to need special casing to print vcc/vcc_lo?

Maybe in needsImpliedVcc?

I also get an assertion failure currently

[build] llvm-mc: ../include/llvm/MC/MCInst.h:81: int64_t llvm::MCOperand::getImm() const: Assertion `isImm() && "This is not an immediate"' failed.
[build] PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
[build] Stack dump:
[build] 0.	Program arguments: /home/pvanhout/work/trunk/llvm-project/llvm/build/bin/llvm-mc -arch=amdgcn -mcpu=gfx900 -show-encoding /home/pvanhout/work/trunk/llvm-project/llvm/test/MC/AMDGPU/gfx9_asm_vop3_e64.s
[build]  #0 0x0000555ed26ecdf3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/Support/Unix/Signals.inc:569:13
[build]  #1 0x0000555ed26eb310 llvm::sys::RunSignalHandlers() /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/Support/Signals.cpp:104:18
[build]  #2 0x0000555ed26ed40f SignalHandler(int) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/Support/Unix/Signals.inc:407:1
[build]  #3 0x00007f9cdcf8d420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
[build]  #4 0x00007f9cdca2000b raise /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
[build]  #5 0x00007f9cdc9ff859 abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81:7
[build]  #6 0x00007f9cdc9ff729 get_sysdep_segment_value /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:509:8
[build]  #7 0x00007f9cdc9ff729 _nl_load_domain /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:970:34
[build]  #8 0x00007f9cdca10fd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
[build]  #9 0x0000555ed240aae5 (/home/pvanhout/work/trunk/llvm-project/llvm/build/bin/llvm-mc+0x812ae5)
[build] #10 0x0000555ed24079f9 llvm::AMDGPUInstPrinter::printInstruction(llvm::MCInst const*, unsigned long, llvm::MCSubtargetInfo const&, llvm::raw_ostream&) /home/pvanhout/work/trunk/llvm-project/llvm/build/lib/Target/AMDGPU/AMDGPUGenAsmWriter.inc:0:0
[build] #11 0x0000555ed2405659 llvm::AMDGPUInstPrinter::printInst(llvm::MCInst const*, unsigned long, llvm::StringRef, llvm::MCSubtargetInfo const&, llvm::raw_ostream&) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp:54:3
[build] #12 0x0000555ed2618ab4 llvm::MCTargetStreamer::prettyPrintAsm(llvm::MCInstPrinter&, unsigned long, llvm::MCInst const&, llvm::MCSubtargetInfo const&, llvm::raw_ostream&) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/MC/MCStreamer.cpp:1065:1
[build] #13 0x0000555ed25d66cf llvm::SmallVectorBase<unsigned long>::size() const /home/pvanhout/work/trunk/llvm-project/llvm/build/../include/llvm/ADT/SmallVector.h:77:32
[build] #14 0x0000555ed25d66cf llvm::SmallString<128u>::str() const /home/pvanhout/work/trunk/llvm-project/llvm/build/../include/llvm/ADT/SmallString.h:260:64
[build] #15 0x0000555ed25d66cf llvm::SmallString<128u>::operator llvm::StringRef() const /home/pvanhout/work/trunk/llvm-project/llvm/build/../include/llvm/ADT/SmallString.h:270:39
[build] #16 0x0000555ed25d66cf (anonymous namespace)::MCAsmStreamer::emitInstruction(llvm::MCInst const&, llvm::MCSubtargetInfo const&) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/MC/MCAsmStreamer.cpp:2297:24
[build] #17 0x0000555ed227b8fc (anonymous namespace)::AMDGPUAsmParser::MatchAndEmitInstruction(llvm::SMLoc, unsigned int&, llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand>>>&, llvm::MCStreamer&, unsigned long&, bool) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:0:9
[build] #18 0x0000555ed266d294 (anonymous namespace)::AsmParser::parseAndMatchAndEmitTargetInstruction((anonymous namespace)::ParseStatementInfo&, llvm::StringRef, llvm::AsmToken, llvm::SMLoc) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/MC/MCParser/AsmParser.cpp:2387:27
[build] #19 0x0000555ed2661e60 (anonymous namespace)::AsmParser::parseStatement((anonymous namespace)::ParseStatementInfo&, llvm::MCAsmParserSemaCallback*) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/MC/MCParser/AsmParser.cpp:2320:10
[build] #20 0x0000555ed265bbee (anonymous namespace)::AsmParser::Run(bool, bool) /home/pvanhout/work/trunk/llvm-project/llvm/build/../lib/MC/MCParser/AsmParser.cpp:1004:16
[build] #21 0x0000555ed2245247 AssembleInput(char const*, llvm::Target const*, llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo&, llvm::MCSubtargetInfo&, llvm::MCInstrInfo&, llvm::MCTargetOptions const&) /home/pvanhout/work/trunk/llvm-project/llvm/build/../tools/llvm-mc/llvm-mc.cpp:344:13
[build] #22 0x0000555ed224402c main /home/pvanhout/work/trunk/llvm-project/llvm/build/../tools/llvm-mc/llvm-mc.cpp:0:11
[build] #23 0x00007f9cdca01083 __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:342:3
[build] #24 0x0000555ed224214e _start (/home/pvanhout/work/trunk/llvm-project/llvm/build/bin/llvm-mc+0x64a14e)

Harbormaster completed remote builds in B183326: Diff 455521. Aug 25 2022, 4:13 AM

In D131959#3748543, @Pierre-vh wrote:

So far, if I understand correctly, my changes only affect the asm parser, but not yet the printer, correct? That's why I'm not seeing vcc_lo in the output.
It seems like some changes are needed on that front but I'm not sure where/how as it looks like this is the first VOP3 instruction to need special casing to print vcc/vcc_lo?

Maybe in needsImpliedVcc?

Yes, something along these lines and call printDefaultVccOperand. There are similar cases handled there but not 100% the same.
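
The idea discussed here is that the printer, not the instruction definition, chooses between `vcc` and `vcc_lo` based on the wave size. A minimal sketch of that selection, assuming the implied operand is printed right after `vdst`; the enum and both functions are hypothetical stand-ins and do not reproduce `AMDGPUInstPrinter`:

```cpp
#include <cassert>
#include <string>

enum class WaveSize { Wave32, Wave64 };

// Hypothetical stand-in for the wave-size-dependent default VCC name.
std::string defaultVccName(WaveSize ws) {
  return ws == WaveSize::Wave32 ? "vcc_lo" : "vcc";
}

// Prints "<mnemonic> <vdst>, <vcc or vcc_lo>, <srcs>".
std::string printWithImpliedVcc(const std::string &mnemonic,
                                const std::string &vdst,
                                const std::string &srcs, WaveSize ws) {
  assert(!mnemonic.empty() && "mnemonic must not be empty");
  return mnemonic + " " + vdst + ", " + defaultVccName(ws) + ", " + srcs;
}
```

The same instruction then prints with `vcc_lo` under wave32 and `vcc` under wave64 without any wave-size-specific asm string.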

Fix Parser/Printer

There is still a test (gfx10_asm_vop3.s) that I'm not sure how to update (so I deleted the checks for now)
I'd say the encoding needs to be updated everywhere, but should I duplicate the tests for w32/w64 then?
Or just use GFX10 check instead of W32/W64?
Moreover, there is no other test in that file that has a wavesize-dependent vcc/vcc_lo register usage so it'd
seem strange to leave div_scale there unless there's a pattern I'm not seeing.

Do I just not check for the "operands are not valid for this GPU or mode" errors and use GFX10 check, then update
all encodings as needed?

I feel like since another test already checks "operands are not valid for this GPU or mode", leaving the instructions
to just use VCC (or do half vcc_lo/half vcc) + using GFX10 check should be enough. What do you think?

Harbormaster completed remote builds in B183942: Diff 456362. Aug 29 2022, 10:15 AM

rampitec added inline comments. Aug 29 2022, 1:47 PM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1010 ↗	(On Diff #456362)	TRI->getVCC().
1012 ↗	(On Diff #456362)	Is this really needed?
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
936–950	Tabs.
961	MIB->getNextNode() looks suspicious.
962	TRI.getVCC().
llvm/test/CodeGen/AMDGPU/frem.ll
2302	What happened here?
llvm/test/MC/AMDGPU/gfx10_asm_vop3.s
6796	Replace s[0:1]/s0 with vcc/vcc_lo
llvm/test/MC/AMDGPU/gfx7_asm_vop3_e64.s
8815 ↗	(On Diff #456362)	Encoding is wrong.
llvm/test/MC/AMDGPU/vop3.s
411	Ditto.
llvm/test/MC/AMDGPU/wave32.s
386–388	Please keep checks order so it is easy to see changes.
llvm/test/MC/AMDGPU/wave_any.s
201–202	Ditto.

Fixing remaining issues.
I had to hack the AsmParser a bit to make it all fit together. I think an InstAlias could be used to do the w32/w64 -> normal instruction translation automatically, but I don't know how to use it properly.

Herald added a reviewer: andreadb. Aug 30 2022, 4:47 AM

Herald added a subscriber: gbedwell.

Pierre-vh added inline comments. Aug 30 2022, 4:47 AM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1012 ↗	(On Diff #456362)	Yes, otherwise a lot of tests fail with "Using an undefined physical register".
llvm/test/CodeGen/AMDGPU/frem.ll
2302	I think something is wrong with the way I replace `SDValue(N, 1)` in `SelectDIV_SCALE` but I'm not sure how to fix it. The `addLiveIn` is suspicious (I can't find any ISelDAGtoDAG impl that also uses it) but without it, it crashes.
llvm/test/MC/AMDGPU/wave32.s
386–388	I didn't change the order. Do you mean the -ERR check should always be at the bottom? (e.g. change GFX1064 to GFX1032 instead of adding the -ERR suffix & checking the error message)
llvm/test/MC/AMDGPU/wave_any.s
201–202	Is the check order the issue or the encoding? Encoding looks good to me
llvm/test/tools/llvm-mca/AMDGPU/gfx11-double.s
147 ↗	(On Diff #456614)	Not sure why this popped up, I don't really understand the message

Rebase

Waiting for feedback on:

- AsmParser: Do I need to move to an InstAlias? Would it work?
- "Truncated display due to cycle limit": Not sure where to start looking with that one.
- DAGISel impl: Seems like it makes codegen worse in some cases, but I haven't found a cleaner way to address the problem yet.

foad added inline comments. Aug 30 2022, 5:54 AM

llvm/test/tools/llvm-mca/AMDGPU/gfx11-double.s
147 ↗	(On Diff #456614)	Add `--timeline-max-cycles=0` to the RUN line? All the other files in this directory have it.

Adding --timeline-max-cycles=0

Pierre-vh marked an inline comment as done. Aug 30 2022, 6:16 AM

So for the weird codegen in frem.ll, the issue is from the addLiveIn call I think.
In fact my changes to the DAGISel aren't needed I believe, but in practice there are segfaults in the following case: When a DIV_FMAS uses the result of DIV_SCALE.
In such cases, the following happens:

t92: f32,i1 = DIV_SCALE nofpexcept t59, t57, t59

ISEL: Starting selection on root node: t102: f32 = DIV_FMAS nofpexcept t101, t97, t100, t92:1
ISEL: Starting pattern match
  Initial Opcode index to 456907
  TypeSwitch[f32] from 456914 to 456917
  Skipped scope entry (due to false predicate) at index 456919, continuing at 456953
Creating new node: t127: ch,glue = CopyToReg t0, Register:i1 $vcc_lo, t92:1
  Morphed node: t102: f32 = V_DIV_FMAS_F32_e64 nofpexcept TargetConstant:i32<0>, t101, TargetConstant:i32<0>, t97, TargetConstant:i32<0>, t100, TargetConstant:i1<0>, TargetConstant:i32<0>, t127:1
ISEL: Match complete!

Then the following MIR is emitted.

%23:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, %16:vgpr_32, 0, %17:vgpr_32, 0, %16:vgpr_32, 0, 0, implicit-def $vcc, implicit $mode, implicit $exec
%24:sreg_32 = COPY $vcc
$vcc_lo = COPY %24:sreg_32
%29:vgpr_32 = nofpexcept V_DIV_FMAS_F32_e64 0, killed %28:vgpr_32, 0, %22:vgpr_32, 0, %27:vgpr_32, 0, 0, implicit $mode, implicit $vcc, implicit $exec

And SIInstrInfo.cpp asserts when lowering the copy:

real copy:   renamable $vcc_lo = COPY $vcc
llc: ../lib/Target/AMDGPU/SIInstrInfo.cpp:762: virtual void llvm::SIInstrInfo::copyPhysReg(llvm::MachineBasicBlock &, MachineBasicBlock::iterator, const llvm::DebugLoc &, llvm::MCRegister, llvm::MCRegister, bool) const: Assertion `AMDGPU::VGPR_32RegClass.contains(SrcReg)' failed.

I feel like that copy shouldn't be there in the first place, but it seems like the DAGISel is adding it automatically.
The pattern used to select the FMAS is:

class DivFmasPat<ValueType vt, Instruction inst, Register CondReg> : GCNPat<
  (AMDGPUdiv_fmas (vt (VOP3Mods vt:$src0, i32:$src0_modifiers)),
                  (vt (VOP3Mods vt:$src1, i32:$src1_modifiers)),
                  (vt (VOP3Mods vt:$src2, i32:$src2_modifiers)),
                  (i1 CondReg)),
  (inst $src0_modifiers, $src0, $src1_modifiers, $src1, $src2_modifiers, $src2)
>;

let WaveSizePredicate = isWave64 in {
def : DivFmasPat<f32, V_DIV_FMAS_F32_e64, VCC>;
def : DivFmasPat<f64, V_DIV_FMAS_F64_e64, VCC>;
}

let WaveSizePredicate = isWave32 in {
def : DivFmasPat<f32, V_DIV_FMAS_F32_e64, VCC_LO>;
def : DivFmasPat<f64, V_DIV_FMAS_F64_e64, VCC_LO>;
}

Does anyone know where the issue lies? I'm a bit stuck there at the moment.
Is it the pattern that shouldn't be using VCC_LO like that? Or a missing case in SIInstrInfo.cpp ?
Maybe the register allocator should be more careful and transform the COPY into an EXTRACT_SUBREG when allocating %24 to vcc?
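
The MIR above ends up copying the 64-bit `$vcc` into its 32-bit low half `$vcc_lo`. A minimal sketch of the width mismatch, assuming a simplified register model; the `PhysReg` struct and both helpers are hypothetical, and this is not the logic of `copyPhysReg`:

```cpp
#include <cassert>

// Simplified register model: just a name and a width in bits.
struct PhysReg {
  const char *name;
  unsigned bits;
};

constexpr PhysReg VCC{"$vcc", 64};
constexpr PhysReg VCC_LO{"$vcc_lo", 32};

// A plain full-register copy only makes sense between equal widths.
bool isSameWidthCopy(PhysReg dst, PhysReg src) {
  assert(dst.bits != 0 && src.bits != 0 && "registers must have a width");
  return dst.bits == src.bits;
}

// The condition register that matches the wave size.
PhysReg vccForWave(bool isWave32) { return isWave32 ? VCC_LO : VCC; }
```

Under this model `$vcc_lo = COPY $vcc` is rejected, while a copy whose source is chosen with `vccForWave` always has matching widths.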

Harbormaster completed remote builds in B184150: Diff 456644. Aug 30 2022, 7:40 AM

In D131959#3758373, @Pierre-vh wrote:

So for the weird codegen in frem.ll, the issue is from the addLiveIn call I think.

I believe so too. You are defining the vcc, you should not need it live to define it. Much less live at a function's entry.

rampitec added inline comments. Aug 30 2022, 12:14 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll
255	What is this and why is it needed?
llvm/test/MC/AMDGPU/wave32.s
386–388	It was wave32 test. Now it is wave64 test and wave32 test is below. They are swapped.
llvm/test/tools/llvm-mca/AMDGPU/gfx11-double.s
147 ↗	(On Diff #456614)	Why did it change at all?

@nhaehnle, @foad sorry for the ping, but I see you've also contributed to DAGISel so I was wondering if you had any input regarding the issue highlighted above.
Do you think there's a problem in the DIV_FMAS pattern, SIInstrInfo or the DIV_SCALE lowering?

llvm/test/tools/llvm-mca/AMDGPU/gfx11-double.s
147 ↗	(On Diff #456614)	I don't think it changed, just content was added to it. I think the previous person that updated it didn't add the `--timeline-max-cycles=0` and instead removed the check line with the cycles limit. I think I can revert all these and the test should still pass, it'll just cover fewer instructions?

Fixing wave32.s, removing useless prefixes from constant-bus-restriction.ll

Harbormaster completed remote builds in B184372: Diff 456968. Aug 31 2022, 8:58 AM

rampitec added inline comments. Aug 31 2022, 10:12 AM

llvm/test/MC/AMDGPU/wave_any.s
207	It seems the intent of the test was to check that both wave32 and wave64 versions are accepted with both attributes set. This is lost now.
llvm/test/tools/llvm-mca/AMDGPU/gfx11-double.s
147 ↗	(On Diff #456614)	Then it does not belong to this patch.

Revert llvm-mca test changes + fix wave_any test

Harbormaster completed remote builds in B184519: Diff 457181. Aug 31 2022, 11:54 PM

So I think I found a solution that may work. The idea is just to:

Remove the SDag changes
- After digging deep, I found that the InstrEmitter is smart enough to infer that the returned i1 is the implicit def of the instruction, so no changes are needed there.
- However, it incorrectly assigns VReg_1 for the register class of the VCC copy it emits in InstrEmitter. This can be fixed in a pass.
Add a small pass that runs before FixSGPRCopies that fixes the COPY of the implicit def for V_DIV_SCALE. Something like "FixVCCImpDefCopy". The pass itself is minimal and the logic boils down to:

// Iterate over all instructions and check the opcode.
      case AMDGPU::V_DIV_SCALE_F32_e64:
      case AMDGPU::V_DIV_SCALE_F64_e64: {
        TII->fixImplicitOperands(MI);

        // Check for a COPY of VCC right after it and fix it too.
        auto NextI = std::next(I);
        if (NextI == MBB->end() || NextI->getOpcode() != AMDGPU::COPY ||
            NextI->getOperand(1).getReg() != AMDGPU::VCC)
          break;

        // Use the wave-size-appropriate source register and lane-mask class.
        if (TII->isWave32()) {
          NextI->getOperand(1).setReg(AMDGPU::VCC_LO);
          MRI->setRegClass(NextI->getOperand(0).getReg(), &AMDGPU::SReg_32_XM0_XEXECRegClass);
        } else {
          MRI->setRegClass(NextI->getOperand(0).getReg(), &AMDGPU::SReg_64_XEXECRegClass);
        }
        break;
      }
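
The fix-up described above can be modeled outside LLVM as a sketch. It assumes a basic block is just a list of (opcode, source register) pairs and it leaves out the register-class update; `ToyInst` and `fixVccCopies` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: only the fields the fix-up looks at.
struct ToyInst {
  std::string opcode;
  std::string srcReg;
};

// For every V_DIV_SCALE immediately followed by a COPY of $vcc, rewrite the
// COPY source to $vcc_lo when compiling for wave32.
// Returns the number of COPYs rewritten.
std::size_t fixVccCopies(std::vector<ToyInst> &block, bool isWave32) {
  std::size_t fixed = 0;
  for (std::size_t i = 0; i + 1 < block.size(); ++i) {
    const std::string &op = block[i].opcode;
    if (op != "V_DIV_SCALE_F32_e64" && op != "V_DIV_SCALE_F64_e64")
      continue;
    ToyInst &next = block[i + 1];
    if (next.opcode != "COPY" || next.srcReg != "$vcc")
      continue;
    if (isWave32) {
      next.srcReg = "$vcc_lo";
      ++fixed;
    }
  }
  return fixed;
}
```

In wave64 mode the function leaves the block untouched, which mirrors the `else` branch of the real snippet where only the register class changes.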

@foad, @rampitec do you think it's an acceptable solution? I can get all tests to pass with a seemingly normal codegen with that.

Add FixVCCImpDef pass as explained in previous comment

Herald added a subscriber: mgorny. Sep 1 2022, 6:28 AM

However, it incorrectly assigns VReg_1 for the register class of the VCC copy it emits in InstrEmitter.

Why is VReg_1 wrong? In some places that is the class we use for an SGPR (or SGPR pair) that represents an independent 1-bit value per lane. Why doesn't that work here?

In D131959#3763862, @foad wrote:

However, it incorrectly assigns VReg_1 for the register class of the VCC copy it emits in InstrEmitter.

Why is VReg_1 wrong? In some places that is the class we use for an SGPR (or SGPR pair) that represents an independent 1-bit value per lane. Why doesn't that work here?

It indeed seems to be used to represent an i1 when the value is divergent. I'm not sure why, but if I don't correct the regclass to an SReg, the LowerI1Copies pass cannot handle it.

Perhaps the issue lies in LowerI1Copies? I don't understand that pass well enough to know if the assertion is legitimate, or a sign of an unhandled case.
It seems to me that it shouldn't lower the VCC copy to a specific instruction, but rather only lower it when it's actually needed (when the value needs to be preserved).

[build] llc: llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp:713: bool (anonymous namespace)::SILowerI1Copies::lowerCopiesToI1(): Assertion `TII->getRegisterInfo().getRegSizeInBits(SrcReg, *MRI) == 32' failed.

[build]   LLVM :: CodeGen/AMDGPU/dpp64_combine.ll
[build]   LLVM :: CodeGen/AMDGPU/fcanonicalize-elimination.ll
[build]   LLVM :: CodeGen/AMDGPU/fdiv-nofpexcept.ll
[build]   LLVM :: CodeGen/AMDGPU/fdiv.f16.ll
[build]   LLVM :: CodeGen/AMDGPU/llvm.powi.ll
[build]   LLVM :: CodeGen/AMDGPU/rsq.ll
[build]   LLVM :: CodeGen/AMDGPU/sgpr-copy.ll
[build]   LLVM :: CodeGen/AMDGPU/si-sgpr-spill.ll
[build]   LLVM :: CodeGen/AMDGPU/wave32.ll

I think if you fix the COPY to refer to $vcc_lo instead of $vcc (for wave 32) then SILowerI1Copies will be happy, because $vcc_lo will satisfy the isLaneMaskReg test. But there should be no need to change the regclass of the COPY.

In D131959#3763894, @foad wrote:

I think if you fix the COPY to refer to $vcc_lo instead of $vcc (for wave 32) then SILowerI1Copies will be happy, because $vcc_lo will satisfy the isLaneMaskReg test. But there should be no need to change the regclass of the COPY.

It's what I was initially doing, but then it crashes on Wave64:

Process 397709 stopped
* thread #1, name = 'llc', stop reason = hit program assert
    frame #4: 0x0000555556f066c5 llc`(anonymous namespace)::SILowerI1Copies::lowerCopiesToI1(this=0x000055555f3a0ae0) at SILowerI1Copies.cpp:713:9
   710        assert(!MI.getOperand(1).getSubReg());
   711 
   712        if (!SrcReg.isVirtual() || (!isLaneMaskReg(SrcReg) && !isVreg1(SrcReg))) {
-> 713          assert(TII->getRegisterInfo().getRegSizeInBits(SrcReg, *MRI) == 32);
   714          unsigned TmpReg = createLaneMaskReg(*MF);
   715          BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_CMP_NE_U32_e64), TmpReg)
   716              .addReg(SrcReg)
(lldb) expr MI->dump()
  %7:sreg_64 = COPY $vcc
  Fix-it applied, fixed expression was: 
    MI.dump()
(lldb) expr TII->isWave32()
(bool) $0 = false

I checked with:

- Not forcing the regbank to SReg
- Adding v_cmp_ne_u64_e64 & supporting 64-bit sources in LowerI1Copies

It doesn't break codegen, and it no longer crashes, though we get weirdness like v_cmp_ne_u64_e64 vcc, vcc, 0 in the output because:

%14:sreg_32 = V_CMP_NE_U32_e64 0, $vcc_lo, implicit $exec
$vcc_lo = COPY killed %14:sreg_32

> $vcc_lo = COPY killed renamable $vcc_lo
Identity copy: $vcc_lo = COPY killed renamable $vcc_lo
  deleted.
renamable $vcc_lo = V_CMP_NE_U32_e64 0, $vcc_lo, implicit $exec

Well, I don't fully understand the condition on line 712:

if (!SrcReg.isVirtual() || (!isLaneMaskReg(SrcReg) && !isVreg1(SrcReg)))

I'm not sure what kind of copies from physical SGPRs this pass is (was) expecting to see here. I suppose an instruction like this could have two possible interpretations:

%0:vreg_1 = COPY $sgpr0

- Each bit of sgpr0 supplies an i1 value for the corresponding lane of the result.
- The whole of sgpr0 is either 0 or 1, and that value should be broadcast to every lane of the result.
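
The two readings of that COPY produce different lane masks for the same input. A small sketch for a 32-lane wave, assuming that "every lane" means every lane set in an `exec` mask passed in by the caller; both function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Interpretation 1: bit i of the SGPR is the i1 value of lane i, so the
// SGPR already is the lane mask.
std::uint32_t laneMaskPerBit(std::uint32_t sgpr) { return sgpr; }

// Interpretation 2: the SGPR holds a scalar 0 or 1 that is broadcast to
// every active lane.
std::uint32_t laneMaskBroadcast(std::uint32_t sgpr, std::uint32_t exec) {
  assert(sgpr <= 1 && "scalar boolean expected");
  return sgpr ? exec : 0u;
}
```

For `sgpr0 == 1` the first reading sets only lane 0, while the second sets every active lane, so a pass has to know which meaning the COPY carries.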

Stepping back a bit, it feels like you are running into problems because you have a def and a use of vcc, where one of them is implicit and the other is explicit. If def and use were both implicit or both explicit then I think things might work better. Is there a way you can change v_div_scale so that the vcc operand is explicit?

Harbormaster completed remote builds in B184563: Diff 457250. Sep 1 2022, 7:20 AM

In D131959#3763962, @foad wrote:
Well, I don't fully understand the condition on line 712:
if (!SrcReg.isVirtual() || (!isLaneMaskReg(SrcReg) && !isVreg1(SrcReg)))
I'm not sure what kind of copies from physical SGPRs this pass is (was) expecting to see here. I suppose an instruction like this could have two possible interpretations:
%0:vreg_1 = COPY $sgpr0
- Each bit of sgpr0 supplies an i1 value for the corresponding lane of the result.
- The whole of sgpr0 is either 0 or 1, and that value should be broadcast to every lane of the result.

Stepping back a bit, it feels like you are running into problems because you have a def and a use of vcc, where one of them is implicit and the other is explicit. If def and use were both implicit or both explicit then I think things might work better. Is there a way you can change v_div_scale so that the vcc operand is explicit?

I initially wanted to do just that, but IIRC @arsenm told me that it's not a good idea to force an output operand to be a specific register by constraining it with a RegClass; implicit definitions should be used instead. I also remember starting this patch out just like that, but I ran into a lot more issues, especially with the register allocator.

I quickly checked and check-llvm-codegen-amdgpu passes if I skip lowering Copies to i1 for VCC/VCC_LO. Should we just do that? Not sure if it makes much sense to lower such copies.
Then the pass would be very minimal, just fixing up the copy, but I think it might even be possible to remove it entirely in that case.

I quickly checked and check-llvm-codegen-amdgpu passes if I skip lowering Copies to i1 for VCC/VCC_LO. Should we just do that?

My gut feeling is that that's not safe because VCC/VCC_LO should be treated the same as any other SGPRs. But really I'm out of my depth here and I don't know what the correct solution is.

In D131959#3764069, @foad wrote:

I quickly checked and check-llvm-codegen-amdgpu passes if I skip lowering Copies to i1 for VCC/VCC_LO. Should we just do that?

My gut feeling is that that's not safe because VCC/VCC_LO should be treated the same as any other SGPRs. But really I'm out of my depth here and I don't know what the correct solution is.

I guess it depends on what the pass is trying to achieve. With my current understanding of how this all works, VCC is already a lane mask, and it's uniform across the wave since it's an SGPR under the hood, so it shouldn't need special treatment to be passed around, no?

The pass uses V_CMP_NE_U32 to do the lowering/copy, which does:

D.u64[threadId] = (S0 <> S1).

So for v_cmp_ne_u64_e64 vcc, vcc, 0, this would just set the mask to all ones or zeroes depending on whether VCC is all zeroes, no? Then it doesn't make sense because it doesn't COPY VCC but instead changes it.
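
To make that concern concrete, here is a minimal host-side sketch of the per-lane rule quoted above, D.u64[threadId] = (S0 <> S1), applied to a scalar source. It is a toy model rather than LLVM or hardware code: the function name `vCmpNeU32`, the explicit `exec` mask parameter, and the `waveSize` parameter are assumptions introduced for illustration, and encoding details are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of V_CMP_NE_U32 with a scalar (SGPR) first source.
// Every lane sees the same s0, so every active lane computes the same
// comparison result. `exec` is the mask of active lanes; inactive lanes
// contribute a 0 bit. Hypothetical helper, not an LLVM API.
uint64_t vCmpNeU32(uint32_t s0, uint32_t s1, uint64_t exec, unsigned waveSize) {
  assert(waveSize == 32 || waveSize == 64);
  uint64_t result = 0;
  for (unsigned lane = 0; lane < waveSize; ++lane) {
    bool active = (exec >> lane) & 1;
    if (active && s0 != s1)
      result |= uint64_t(1) << lane;
  }
  return result;
}
```

Under this model, with all 64 lanes active, `vCmpNeU32(0x5, 0, ~0ull, 64)` produces an all-ones mask rather than `0x5`, so the per-lane bits of the source are not preserved. That is the "changes it instead of copying it" behaviour described above.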

I'm also out of my depth here. I'm trying to piece this together, but my current understanding is that it doesn't make sense to do this lowering for copies of physical SGPRs to i1.

It is really overkill to have a separate pass just to fix the vcc copy for two instructions. What if you change the lowering instead? I believe it should be possible to work around it by not reading the second result of the lowered DIV_SCALE and instead gluing a CopyFromReg to it with the correct register, i.e. VCC or VCC_LO.

Stepping back a bit, it feels like you are running into problems because you have a def and a use of vcc, where one of them is implicit and the other is explicit. If def and use were both implicit or both explicit then I think things might work better. Is there a way you can change v_div_scale so that the vcc operand is explicit?

+ @jrbyrnes who did some work on implicit vs explicit uses of $scc in D128681.

Remove pass

I got rid of the pass. I tried using a glued node

CurDAG->SelectNodeTo(N, Opc, CurDAG->getVTList(VT, MVT::i1, MVT::Glue), Ops);
SDValue CopyFromReg = CurDAG->getCopyFromReg(CurDAG->getEntryNode(), SL, TRI->getVCC(), MVT::i1, SDValue(N, 2));
CurDAG->ReplaceAllUsesOfValueWith(SDValue(N, 1), CopyFromReg);

and that doesn't work: it doesn't crash, but it messes up scheduling too much and we end up with things like this:

	v_div_scale_f32 v2, vcc, v0, v1, v0
	s_mov_b64 s[0:1], vcc
	v_div_scale_f32 v3, vcc, v1, v1, v0
	s_mov_b64 vcc, s[0:1]

Instead of just swapping the div_scales.

If I fix it later in finalizeLowering when we change the implicit VCC to a VCC_LO, then it works fine.
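
One plausible reading of the listing above is that VCC is a single physical register, so a second write to it while the first value still has a pending reader forces that first value to be parked in another SGPR pair. The following is a rough sketch of that accounting only. The `Event` struct and `countSaves` function are hypothetical names for illustration, and this is not how the LLVM scheduler models the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One event in a straight-line schedule touching a single physical register.
struct Event {
  bool isDef; // true: writes the register; false: reads it
  int value;  // id of the value written or read (non-negative)
};

// Counts how many values must be saved elsewhere because the register is
// overwritten while the value it currently holds is still read later on.
int countSaves(const std::vector<Event> &schedule) {
  int saves = 0;
  int current = -1; // id of the value currently held, -1 if none
  for (std::size_t i = 0; i < schedule.size(); ++i) {
    assert(schedule[i].value >= 0 && "value ids are non-negative");
    if (!schedule[i].isDef)
      continue;
    if (current != -1) {
      // Is the value about to be overwritten still read later?
      for (std::size_t j = i + 1; j < schedule.size(); ++j) {
        if (!schedule[j].isDef && schedule[j].value == current) {
          ++saves;
          break;
        }
      }
    }
    current = schedule[i].value;
  }
  return saves;
}
```

In this toy model, the order def A, def B, use A, use B needs one save, while def A, use A, def B, use B needs none, which matches the intuition that reordering the two div_scales would avoid the s_mov_b64 pair.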

Harbormaster completed remote builds in B185797: Diff 458989. Sep 9 2022, 3:03 AM

In D131959#3779567, @Pierre-vh wrote:
I got rid of the pass. I tried using a glued node
CurDAG->SelectNodeTo(N, Opc, CurDAG->getVTList(VT, MVT::i1, MVT::Glue), Ops);
SDValue CopyFromReg = CurDAG->getCopyFromReg(CurDAG->getEntryNode(), SL, TRI->getVCC(), MVT::i1, SDValue(N, 2));
CurDAG->ReplaceAllUsesOfValueWith(SDValue(N, 1), CopyFromReg);
and that doesn't work: it doesn't crash, but it messes up scheduling too much and we end up with things like this:
	v_div_scale_f32 v2, vcc, v0, v1, v0
	s_mov_b64 s[0:1], vcc
	v_div_scale_f32 v3, vcc, v1, v1, v0
	s_mov_b64 vcc, s[0:1]
Instead of just swapping the div_scales.

If I fix it later in finalizeLowering when we change the implicit VCC to a VCC_LO, then it works fine.

Did you change AMDGPUdiv_scale to produce glue? So that you have something to glue this copy to?

+ @jrbyrnes who did some work on implicit vs explicit uses of $scc in D128681.

Haven't looked fully into this review, but the work in D128681 seems orthogonal to the issue here: the issue in that ticket involved not honoring live ranges. Glued copy nodes that wrote to a PhysReg didn't check if the copy would clobber a live range.

Anyway, are the vcc COPYs needed here? Maybe I don't understand something, but it would be nice to see isel not emit the COPY and just rely on the implicit def/use of $vcc.

Check D133593, it tries to address a similar problem, but with SCC.

Check D133593, it tries to address a similar problem, but with SCC.

Thanks Stas -- I played with D133593 a bit. Looks like if we extend that to support VCC, then we don't need to emit those COPYs which I think will clear things up a bit for this.

In D131959#3781870, @jrbyrnes wrote:

Check D133593, it tries to address a similar problem, but with SCC.

Thanks Stas -- I played with D133593 a bit. Looks like if we extend that to support VCC, then we don't need to emit those COPYs which I think will clear things up a bit for this.

What is the status on this diff then? Do I just need to rebase on top of D133593, extending it with VCC support, and it'll fix the issues?

In D131959#3780692, @rampitec wrote:
In D131959#3779567, @Pierre-vh wrote:
I got rid of the pass. I tried using a glued node
CurDAG->SelectNodeTo(N, Opc, CurDAG->getVTList(VT, MVT::i1, MVT::Glue), Ops);
SDValue CopyFromReg = CurDAG->getCopyFromReg(CurDAG->getEntryNode(), SL, TRI->getVCC(), MVT::i1, SDValue(N, 2));
CurDAG->ReplaceAllUsesOfValueWith(SDValue(N, 1), CopyFromReg);
and that doesn't work: it doesn't crash, but it messes up scheduling too much and we end up with things like this:
	v_div_scale_f32 v2, vcc, v0, v1, v0
	s_mov_b64 s[0:1], vcc
	v_div_scale_f32 v3, vcc, v1, v1, v0
	s_mov_b64 vcc, s[0:1]
Instead of just swapping the div_scales.

If I fix it later in finalizeLowering when we change the implicit VCC to a VCC_LO, then it works fine.
Did you change AMDGPUdiv_scale to produce glue? So that you have something to glue this copy to?

Isn't CurDAG->SelectNodeTo(N, Opc, CurDAG->getVTList(VT, MVT::i1, MVT::Glue), Ops); enough for that? Do I need to change the DAG Op definition itself?

In D131959#3783259, @Pierre-vh wrote:

Did you change AMDGPUdiv_scale to produce glue? So that you have something to glue this copy to?

Isn't CurDAG->SelectNodeTo(N, Opc, CurDAG->getVTList(VT, MVT::i1, MVT::Glue), Ops); enough for that? Do I need to change the DAG Op definition itself?

I think so, but I believe we have concluded that we do not need this workaround. We are pretty confident it works as is, and the workaround tends to be invasive. So I'd suggest shelving this just in case, abandoning the patch, and rejecting the ticket.

shelving for now

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/
AMDGPUInstructionSelector.cpp	31 lines
AsmParser/
AMDGPUAsmParser.cpp	53 lines
MCTargetDesc/
AMDGPUInstPrinter.cpp	16 lines
	13 lines
	5 lines
	94 lines
	11 lines
test/CodeGen/AMDGPU/GlobalISel/
constant-bus-restriction.ll	34 lines
	72 lines
	941 lines
	1446 lines
	78 lines
llvm.amdgcn.div.scale.ll	206 lines
fdiv-nofpexcept.ll	54 lines
fdiv.f64.ll	6 lines
frem.ll	1162 lines
inserted-wait-states.mir	2 lines
llvm.amdgcn.div.scale.ll	52 lines
llvm.powi.ll	12 lines
sched-crash-dbg-value.mir	10 lines
wave32.ll	12 lines
MC/AMDGPU/
	492 lines
	108 lines
	24 lines
	24 lines
	16 lines
Disassembler/AMDGPU/
gfx10-wave32.txt	12 lines
gfx10_vop3.txt	250 lines

Diff 458989

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

[...]	if (Ty == LLT::scalar(32))
Opc = AMDGPU::V_DIV_SCALE_F32_e64;		Opc = AMDGPU::V_DIV_SCALE_F32_e64;
else if (Ty == LLT::scalar(64))		else if (Ty == LLT::scalar(64))
Opc = AMDGPU::V_DIV_SCALE_F64_e64;		Opc = AMDGPU::V_DIV_SCALE_F64_e64;
else		else
return false;		return false;

// TODO: Match source modifiers.		// TODO: Match source modifiers.

const DebugLoc &DL = MI.getDebugLoc();
MachineBasicBlock *MBB = MI.getParent();

Register Numer = MI.getOperand(3).getReg();		Register Numer = MI.getOperand(3).getReg();
Register Denom = MI.getOperand(4).getReg();		Register Denom = MI.getOperand(4).getReg();
unsigned ChooseDenom = MI.getOperand(5).getImm();		unsigned ChooseDenom = MI.getOperand(5).getImm();

Register Src0 = ChooseDenom != 0 ? Numer : Denom;		Register Src0 = ChooseDenom != 0 ? Numer : Denom;

auto MIB = BuildMI(*MBB, &MI, DL, TII.get(Opc), Dst0)		MachineIRBuilder Builder(MI);
.addDef(Dst1)		auto MIB = Builder.buildInstr(Opc)
		.addDef(Dst0)
.addImm(0) // $src0_modifiers		.addImm(0) // $src0_modifiers
.addUse(Src0) // $src0		.addUse(Src0) // $src0
.addImm(0) // $src1_modifiers		.addImm(0) // $src1_modifiers
.addUse(Denom) // $src1		.addUse(Denom) // $src1
.addImm(0) // $src2_modifiers		.addImm(0) // $src2_modifiers
.addUse(Numer) // $src2		.addUse(Numer) // $src2
.addImm(0) // $clamp		.addImm(0) // $clamp
.addImm(0); // $omod		.addImm(0); // $omod

		if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) {
		return false;
		}
		rampitec: Tabs.

		MRI->replaceRegWith(Dst1, TRI.getVCC());
MI.eraseFromParent();		MI.eraseFromParent();
return constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI);		return true;
}		}

bool AMDGPUInstructionSelector::selectG_INTRINSIC(MachineInstr &I) const {		bool AMDGPUInstructionSelector::selectG_INTRINSIC(MachineInstr &I) const {
unsigned IntrinsicID = I.getIntrinsicID();		unsigned IntrinsicID = I.getIntrinsicID();
switch (IntrinsicID) {		switch (IntrinsicID) {
case Intrinsic::amdgcn_if_break: {		case Intrinsic::amdgcn_if_break: {
MachineBasicBlock *BB = I.getParent();		MachineBasicBlock *BB = I.getParent();
		rampitec: MIB->getNextNode() looks suspicious.

		rampitec: TRI.getVCC().
// FIXME: Manually selecting to avoid dealing with the SReg_1 trick		// FIXME: Manually selecting to avoid dealing with the SReg_1 trick
// SelectionDAG uses for wave32 vs wave64.		// SelectionDAG uses for wave32 vs wave64.
BuildMI(*BB, &I, I.getDebugLoc(), TII.get(AMDGPU::SI_IF_BREAK))		BuildMI(*BB, &I, I.getDebugLoc(), TII.get(AMDGPU::SI_IF_BREAK))
.add(I.getOperand(0))		.add(I.getOperand(0))
.add(I.getOperand(2))		.add(I.getOperand(2))
.add(I.getOperand(3));		.add(I.getOperand(3));

Register DstReg = I.getOperand(0).getReg();		Register DstReg = I.getOperand(0).getReg();

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

[...]	for (int J = 0; J < 3; ++J) {
if (ModOps[J] == AMDGPU::OpName::src0_modifiers &&		if (ModOps[J] == AMDGPU::OpName::src0_modifiers &&
(OpSel & (1 << 3)) != 0)		(OpSel & (1 << 3)) != 0)
ModVal |= SISrcMods::DST_OP_SEL;		ModVal |= SISrcMods::DST_OP_SEL;

Inst.getOperand(ModIdx).setImm(ModVal);		Inst.getOperand(ModIdx).setImm(ModVal);
}		}
}		}

		static bool HasImplicitSDSTVCC(unsigned Opcode) {
		switch (Opcode) {
		default:
		return false;
		case AMDGPU::V_DIV_SCALE_F32_e64_gfx11:
		case AMDGPU::V_DIV_SCALE_F32_e64_w32_gfx11:
		case AMDGPU::V_DIV_SCALE_F32_e64_w64_gfx11:
		case AMDGPU::V_DIV_SCALE_F64_e64_gfx11:
		case AMDGPU::V_DIV_SCALE_F64_e64_w32_gfx11:
		case AMDGPU::V_DIV_SCALE_F64_e64_w64_gfx11:
		case AMDGPU::V_DIV_SCALE_F32_gfx10:
		case AMDGPU::V_DIV_SCALE_F32_w32_gfx10:
		case AMDGPU::V_DIV_SCALE_F32_w64_gfx10:
		case AMDGPU::V_DIV_SCALE_F64_gfx10:
		case AMDGPU::V_DIV_SCALE_F64_w32_gfx10:
		case AMDGPU::V_DIV_SCALE_F64_w64_gfx10:
		case AMDGPU::V_DIV_SCALE_F32_gfx6_gfx7:
		case AMDGPU::V_DIV_SCALE_F64_gfx6_gfx7:
		case AMDGPU::V_DIV_SCALE_F32_vi:
		case AMDGPU::V_DIV_SCALE_F64_vi:
		return true;
		}
		}

void AMDGPUAsmParser::cvtVOP3(MCInst &Inst, const OperandVector &Operands,		void AMDGPUAsmParser::cvtVOP3(MCInst &Inst, const OperandVector &Operands,
OptionalImmIndexMap &OptionalIdx) {		OptionalImmIndexMap &OptionalIdx) {
unsigned Opc = Inst.getOpcode();		unsigned Opc = Inst.getOpcode();

unsigned I = 1;		unsigned I = 1;
const MCInstrDesc &Desc = MII.get(Inst.getOpcode());		const MCInstrDesc &Desc = MII.get(Inst.getOpcode());
for (unsigned J = 0; J < Desc.getNumDefs(); ++J) {		for (unsigned J = 0; J < Desc.getNumDefs(); ++J) {
((AMDGPUOperand &)*Operands[I++]).addRegOperands(Inst, 1);		((AMDGPUOperand &)*Operands[I++]).addRegOperands(Inst, 1);
}		}

if (AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0_modifiers) != -1) {		if (AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0_modifiers) != -1) {
		const bool UsesImplicitSDSTVCC = HasImplicitSDSTVCC(Inst.getOpcode());
// This instruction has src modifiers		// This instruction has src modifiers
for (unsigned E = Operands.size(); I != E; ++I) {		for (unsigned E = Operands.size(); I != E; ++I) {
AMDGPUOperand &Op = ((AMDGPUOperand &)*Operands[I]);		AMDGPUOperand &Op = ((AMDGPUOperand &)*Operands[I]);

		if (UsesImplicitSDSTVCC && Op.isReg() && I == 2 &&
		(Op.getReg() == AMDGPU::VCC || Op.getReg() == AMDGPU::VCC_LO)) {
		continue;
		}

if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {		if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {
Op.addRegOrImmWithFPInputModsOperands(Inst, 2);		Op.addRegOrImmWithFPInputModsOperands(Inst, 2);
} else if (Op.isImmModifier()) {		} else if (Op.isImmModifier()) {
OptionalIdx[Op.getImmTy()] = I;		OptionalIdx[Op.getImmTy()] = I;
} else if (Op.isRegOrImm()) {		} else if (Op.isRegOrImm()) {
Op.addRegOrImmOperands(Inst, 1);		Op.addRegOrImmOperands(Inst, 1);
} else {		} else {
llvm_unreachable("unhandled operand type");		llvm_unreachable("unhandled operand type");
[...]	if (Opc == AMDGPU::V_MAC_F32_e64_gfx6_gfx7 ||
Opc == AMDGPU::V_FMAC_F16_e64_gfx11) {		Opc == AMDGPU::V_FMAC_F16_e64_gfx11) {
auto it = Inst.begin();		auto it = Inst.begin();
std::advance(it, AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2_modifiers));		std::advance(it, AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2_modifiers));
it = Inst.insert(it, MCOperand::createImm(0)); // no modifiers for src2		it = Inst.insert(it, MCOperand::createImm(0)); // no modifiers for src2
++it;		++it;
// Copy the operand to ensure it's not invalidated when Inst grows.		// Copy the operand to ensure it's not invalidated when Inst grows.
Inst.insert(it, MCOperand(Inst.getOperand(0))); // src2 = dst		Inst.insert(it, MCOperand(Inst.getOperand(0))); // src2 = dst
}		}

		// Fix AsmParserOnly Opcodes to canonical opcodes.
		switch (Inst.getOpcode()) {
		default:
		break;
		case AMDGPU::V_DIV_SCALE_F32_e64_w32_gfx11:
		case AMDGPU::V_DIV_SCALE_F32_e64_w64_gfx11:
		Inst.setOpcode(AMDGPU::V_DIV_SCALE_F32_e64_gfx11);
		break;
		case AMDGPU::V_DIV_SCALE_F64_e64_w32_gfx11:
		case AMDGPU::V_DIV_SCALE_F64_e64_w64_gfx11:
		Inst.setOpcode(AMDGPU::V_DIV_SCALE_F64_e64_gfx11);
		break;
		case AMDGPU::V_DIV_SCALE_F32_w32_gfx10:
		case AMDGPU::V_DIV_SCALE_F32_w64_gfx10:
		Inst.setOpcode(AMDGPU::V_DIV_SCALE_F32_gfx10);
		break;
		case AMDGPU::V_DIV_SCALE_F64_w32_gfx10:
		case AMDGPU::V_DIV_SCALE_F64_w64_gfx10:
		Inst.setOpcode(AMDGPU::V_DIV_SCALE_F64_gfx10);
		break;
		}
}		}

void AMDGPUAsmParser::cvtVOP3(MCInst &Inst, const OperandVector &Operands) {		void AMDGPUAsmParser::cvtVOP3(MCInst &Inst, const OperandVector &Operands) {
OptionalImmIndexMap OptionalIdx;		OptionalImmIndexMap OptionalIdx;
cvtVOP3(Inst, Operands, OptionalIdx);		cvtVOP3(Inst, Operands, OptionalIdx);
}		}

void AMDGPUAsmParser::cvtVOP3P(MCInst &Inst, const OperandVector &Operands,		void AMDGPUAsmParser::cvtVOP3P(MCInst &Inst, const OperandVector &Operands,

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

[...]	if (Op.isReg()) {
}		}
} else if (Op.isExpr()) {		} else if (Op.isExpr()) {
const MCExpr *Exp = Op.getExpr();		const MCExpr *Exp = Op.getExpr();
Exp->print(O, &MAI);		Exp->print(O, &MAI);
} else {		} else {
O << "/INV_OP/";		O << "/INV_OP/";
}		}

// Print default vcc/vcc_lo operand of v_cndmask_b32_e32.		// Print default vcc/vcc_lo operand for specific instructions.
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
default: break;		default: break;
		case AMDGPU::V_DIV_SCALE_F32_e64_gfx11:
		case AMDGPU::V_DIV_SCALE_F64_e64_gfx11:
		case AMDGPU::V_DIV_SCALE_F32_gfx10:
		case AMDGPU::V_DIV_SCALE_F64_gfx10: {
		// Print vcc(_lo) after the vdst for V_DIV_SCALE on gfx10+.
		int VDstIdx =
		AMDGPU::getNamedOperandIdx(MI->getOpcode(), AMDGPU::OpName::vdst);
		assert(VDstIdx != -1);
		if ((int)OpNo == VDstIdx) {
		printDefaultVccOperand(false, STI, O);
		}
		break;
		}
case AMDGPU::V_CNDMASK_B32_e32_gfx10:		case AMDGPU::V_CNDMASK_B32_e32_gfx10:
case AMDGPU::V_ADD_CO_CI_U32_e32_gfx10:		case AMDGPU::V_ADD_CO_CI_U32_e32_gfx10:
case AMDGPU::V_SUB_CO_CI_U32_e32_gfx10:		case AMDGPU::V_SUB_CO_CI_U32_e32_gfx10:
case AMDGPU::V_SUBREV_CO_CI_U32_e32_gfx10:		case AMDGPU::V_SUBREV_CO_CI_U32_e32_gfx10:
case AMDGPU::V_CNDMASK_B32_dpp_gfx10:		case AMDGPU::V_CNDMASK_B32_dpp_gfx10:
case AMDGPU::V_ADD_CO_CI_U32_dpp_gfx10:		case AMDGPU::V_ADD_CO_CI_U32_dpp_gfx10:
case AMDGPU::V_SUB_CO_CI_U32_dpp_gfx10:		case AMDGPU::V_SUB_CO_CI_U32_dpp_gfx10:
case AMDGPU::V_SUBREV_CO_CI_U32_dpp_gfx10:		case AMDGPU::V_SUBREV_CO_CI_U32_dpp_gfx10:
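
The wave32 and wave64 asm strings in this revision differ only in how the implicit carry-out register is spelled (`vcc_lo` versus `vcc`). As a rough standalone illustration of that substitution, loosely mirroring the `!subst("vcc", "vcc_lo", ...)` idiom used in the TableGen changes of this revision, the sketch below rewrites an operand string for a given wave size. The helper names `vccNameFor`, `substituteVcc`, and `isIdentChar` are hypothetical, and the token-aware matching is a choice made for this sketch, not a description of what TableGen or the AMDGPU printer does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Assembler spelling of the condition-code register for a wave size.
std::string vccNameFor(unsigned waveSize) {
  assert(waveSize == 32 || waveSize == 64);
  return waveSize == 32 ? "vcc_lo" : "vcc";
}

// True for characters that can be part of an operand name like $src0 or vcc_lo.
static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// Replaces every standalone "vcc" token in `operands` with the spelling for
// `waveSize`. Tokens such as "vcc_lo" are left untouched.
std::string substituteVcc(const std::string &operands, unsigned waveSize) {
  const std::string from = "vcc";
  const std::string to = vccNameFor(waveSize);
  std::string out;
  std::size_t pos = 0;
  while (pos < operands.size()) {
    std::size_t hit = operands.find(from, pos);
    if (hit == std::string::npos) {
      out.append(operands, pos, std::string::npos);
      break;
    }
    out.append(operands, pos, hit - pos);
    std::size_t end = hit + from.size();
    bool startsToken = hit == 0 || !isIdentChar(operands[hit - 1]);
    bool endsToken = end == operands.size() || !isIdentChar(operands[end]);
    out += (startsToken && endsToken) ? to : from;
    pos = end;
  }
  return out;
}
```

For example, `substituteVcc("$vdst, vcc, $src0_modifiers", 32)` gives `"$vdst, vcc_lo, $src0_modifiers"`, while the wave64 call leaves the string unchanged. Matching whole tokens avoids turning an existing `vcc_lo` into `vcc_lo_lo`, which a plain substring replace would do.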

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

[...]	void SITargetLowering::finalizeLowering(MachineFunction &MF) const {

if (Info->getFrameOffsetReg() != AMDGPU::FP_REG)		if (Info->getFrameOffsetReg() != AMDGPU::FP_REG)
MRI.replaceRegWith(AMDGPU::FP_REG, Info->getFrameOffsetReg());		MRI.replaceRegWith(AMDGPU::FP_REG, Info->getFrameOffsetReg());

Info->limitOccupancy(MF);		Info->limitOccupancy(MF);

if (ST.isWave32() && !MF.empty()) {		if (ST.isWave32() && !MF.empty()) {
for (auto &MBB : MF) {		for (auto &MBB : MF) {
for (auto &MI : MBB) {		for (auto I = MBB.begin(); I != MBB.end(); ++I) {
		auto &MI = *I;
TII->fixImplicitOperands(MI);		TII->fixImplicitOperands(MI);

		if (MI.getOpcode() == AMDGPU::V_DIV_SCALE_F32_e64 ||
		MI.getOpcode() == AMDGPU::V_DIV_SCALE_F64_e64) {
		// Fixup adjacent copy of the VCC impdef so it's also VCC_LO.
		auto NextI = std::next(I);
		if (NextI != MBB.end() && NextI->getOpcode() == AMDGPU::COPY &&
		NextI->getOperand(1).getReg() == AMDGPU::VCC) {
		NextI->getOperand(1).setReg(AMDGPU::VCC_LO);
		}
		}
}		}
}		}
}		}

// FIXME: This is a hack to fixup AGPR classes to use the properly aligned		// FIXME: This is a hack to fixup AGPR classes to use the properly aligned
// classes if required. Ideally the register class constraints would differ		// classes if required. Ideally the register class constraints would differ
// per-subtarget, but there's no easy way to achieve that right now. This is		// per-subtarget, but there's no easy way to achieve that right now. This is
// not a problem for VGPRs because the correctly aligned VGPR class is implied		// not a problem for VGPRs because the correctly aligned VGPR class is implied

llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp

[...]	for (MachineInstr &MI : MBB) {
: &AMDGPU::SReg_64RegClass);		: &AMDGPU::SReg_64RegClass);
if (MI.getOpcode() == AMDGPU::IMPLICIT_DEF)		if (MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
continue;		continue;

DebugLoc DL = MI.getDebugLoc();		DebugLoc DL = MI.getDebugLoc();
Register SrcReg = MI.getOperand(1).getReg();		Register SrcReg = MI.getOperand(1).getReg();
assert(!MI.getOperand(1).getSubReg());		assert(!MI.getOperand(1).getSubReg());

		// VCC already represents a lane mask and doesn't need special lowering.
		if (SrcReg == AMDGPU::VCC_LO || SrcReg == AMDGPU::VCC) {
		continue;
		}

if (!SrcReg.isVirtual() || (!isLaneMaskReg(SrcReg) && !isVreg1(SrcReg))) {		if (!SrcReg.isVirtual() || (!isLaneMaskReg(SrcReg) && !isVreg1(SrcReg))) {
assert(TII->getRegisterInfo().getRegSizeInBits(SrcReg, *MRI) == 32);		assert(TII->getRegisterInfo().getRegSizeInBits(SrcReg, *MRI) == 32);
unsigned TmpReg = createLaneMaskReg(*MF);		unsigned TmpReg = createLaneMaskReg(*MF);
BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_CMP_NE_U32_e64), TmpReg)		BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_CMP_NE_U32_e64), TmpReg)
.addReg(SrcReg)		.addReg(SrcReg)
.addImm(0);		.addImm(0);
MI.getOperand(1).setReg(TmpReg);		MI.getOperand(1).setReg(TmpReg);
SrcReg = TmpReg;		SrcReg = TmpReg;

llvm/lib/Target/AMDGPU/VOP3Instructions.td

[...]	def VOP_F32_F32_F32_F32_VCC : VOPProfile<[f32, f32, f32, f32]> {
let HasExtDPP = 0;		let HasExtDPP = 0;
}		}
def VOP_F64_F64_F64_F64_VCC : VOPProfile<[f64, f64, f64, f64]> {		def VOP_F64_F64_F64_F64_VCC : VOPProfile<[f64, f64, f64, f64]> {
let Outs64 = (outs DstRC.RegClass:$vdst);		let Outs64 = (outs DstRC.RegClass:$vdst);
}		}
}		}

class VOP3b_Profile<ValueType vt> : VOPProfile<[vt, vt, vt, vt]> {		class VOP3b_Profile<ValueType vt> : VOPProfile<[vt, vt, vt, vt]> {
let Outs64 = (outs DstRC:$vdst, VOPDstS64orS32:$sdst);		let Outs64 = (outs DstRC:$vdst);
let Asm64 = "$vdst, $sdst, $src0_modifiers, $src1_modifiers, $src2_modifiers$clamp$omod";
		let Asm64 = "$vdst, vcc, $src0_modifiers, $src1_modifiers, $src2_modifiers$clamp$omod";
let IsSingle = 1;		let IsSingle = 1;
let HasExtVOP3DPP = 0;		let HasExtVOP3DPP = 0;
let HasExtDPP = 0;		let HasExtDPP = 0;
}		}

def VOP3b_F32_I1_F32_F32_F32 : VOP3b_Profile<f32>;		def VOP3b_F32_VCC_F32_F32_F32 : VOP3b_Profile<f32>;
def VOP3b_F64_I1_F64_F64_F64 : VOP3b_Profile<f64>;		def VOP3b_F64_VCC_F64_F64_F64 : VOP3b_Profile<f64>;

def VOP3b_I64_I1_I32_I32_I64 : VOPProfile<[i64, i32, i32, i64]> {		def VOP3b_I64_I1_I32_I32_I64 : VOPProfile<[i64, i32, i32, i64]> {
let HasClamp = 1;		let HasClamp = 1;

let IsSingle = 1;		let IsSingle = 1;
let Outs64 = (outs DstRC:$vdst, VOPDstS64orS32:$sdst);		let Outs64 = (outs DstRC:$vdst, VOPDstS64orS32:$sdst);
let Asm64 = "$vdst, $sdst, $src0, $src1, $src2$clamp";		let Asm64 = "$vdst, $sdst, $src0, $src1, $src2$clamp";
}		}
[...]

let SchedRW = [WriteDoubleAdd], FPDPRounding = 1 in {		let SchedRW = [WriteDoubleAdd], FPDPRounding = 1 in {
defm V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", VOP3_Profile<VOP_F64_F64_F64_F64>, AMDGPUdiv_fixup>;		defm V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", VOP3_Profile<VOP_F64_F64_F64_F64>, AMDGPUdiv_fixup>;
defm V_LDEXP_F64 : VOP3Inst <"v_ldexp_f64", VOP3_Profile<VOP_F64_F64_I32>, AMDGPUldexp>;		defm V_LDEXP_F64 : VOP3Inst <"v_ldexp_f64", VOP3_Profile<VOP_F64_F64_I32>, AMDGPUldexp>;
} // End SchedRW = [WriteDoubleAdd], FPDPRounding = 1		} // End SchedRW = [WriteDoubleAdd], FPDPRounding = 1
} // End isReMaterializable = 1		} // End isReMaterializable = 1


let mayRaiseFPException = 0 in { // Seems suspicious but manual doesn't say it does.		let Defs = [VCC], mayRaiseFPException = 0 in { // Seems suspicious but manual doesn't say it does.
let SchedRW = [WriteFloatFMA, WriteSALU] in		let SchedRW = [WriteFloatFMA, WriteSALU] in
defm V_DIV_SCALE_F32 : VOP3Inst_Pseudo_Wrapper <"v_div_scale_f32", VOP3b_F32_I1_F32_F32_F32> ;		defm V_DIV_SCALE_F32 : VOP3Inst_Pseudo_Wrapper <"v_div_scale_f32", VOP3b_F32_VCC_F32_F32_F32> ;

// Double precision division pre-scale.		// Double precision division pre-scale.
let SchedRW = [WriteDouble, WriteSALU], FPDPRounding = 1 in		let SchedRW = [WriteDouble, WriteSALU], FPDPRounding = 1 in
defm V_DIV_SCALE_F64 : VOP3Inst_Pseudo_Wrapper <"v_div_scale_f64", VOP3b_F64_I1_F64_F64_F64>;		defm V_DIV_SCALE_F64 : VOP3Inst_Pseudo_Wrapper <"v_div_scale_f64", VOP3b_F64_VCC_F64_F64_F64>;
} // End mayRaiseFPException = 0		} // End mayRaiseFPException = 0

let isReMaterializable = 1 in		let isReMaterializable = 1 in
defm V_MSAD_U8 : VOP3Inst <"v_msad_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		defm V_MSAD_U8 : VOP3Inst <"v_msad_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;

let Constraints = "@earlyclobber $vdst" in {		let Constraints = "@earlyclobber $vdst" in {
defm V_MQSAD_PK_U16_U8 : VOP3Inst <"v_mqsad_pk_u16_u8", VOP3_Profile<VOP_I64_I64_I32_I64, VOP3_CLAMP>>;		defm V_MQSAD_PK_U16_U8 : VOP3Inst <"v_mqsad_pk_u16_u8", VOP3_Profile<VOP_I64_I64_I32_I64, VOP3_CLAMP>>;
} // End Constraints = "@earlyclobber $vdst"		} // End Constraints = "@earlyclobber $vdst"
[...]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Target-specific instruction encodings.		// Target-specific instruction encodings.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// GFX11.		// GFX11.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		let AssemblerPredicate = isGFX11Only in
		multiclass VOP3be_VCCSDST_Real_gfx11<bits<10> op, string asmName> {
		defvar ps = !cast<VOP3_Pseudo>(NAME #"_e64");

		def _e64_gfx11 : VOP3_Real<ps, SIEncodingFamily.GFX11, asmName>,
		VOP3be_gfx11<op, ps.Pfl> {
		let IsSingle = ps.Pfl.IsSingle;
		let sdst = !cast<int>(VCC.HWEncoding);
		let AsmString = asmName # !subst(", vcc", "", ps.AsmOperands);
		}

		let isAsmParserOnly = 1 in {
		def _e64_w32_gfx11 : VOP3_Base<ps, asmName>, VOP3be_gfx11<op, ps.Pfl> {
		let sdst = !cast<int>(VCC_LO.HWEncoding);
		let WaveSizePredicate = isWave32;
		let AsmString = asmName # !subst("vcc", "vcc_lo", ps.AsmOperands);
		}

		def _e64_w64_gfx11 : VOP3_Base<ps, asmName>, VOP3be_gfx11<op, ps.Pfl> {
		let sdst = !cast<int>(VCC.HWEncoding);
		let WaveSizePredicate = isWave64;
		let AsmString = asmName # ps.AsmOperands;
		}
		}
		}

defm V_FMA_DX9_ZERO_F32 : VOP3_Real_with_name_gfx11<0x209, "V_FMA_LEGACY_F32", "v_fma_dx9_zero_f32">;		defm V_FMA_DX9_ZERO_F32 : VOP3_Real_with_name_gfx11<0x209, "V_FMA_LEGACY_F32", "v_fma_dx9_zero_f32">;
defm V_MAD_I32_I24 : VOP3_Realtriple_gfx11<0x20a>;		defm V_MAD_I32_I24 : VOP3_Realtriple_gfx11<0x20a>;
defm V_MAD_U32_U24 : VOP3_Realtriple_gfx11<0x20b>;		defm V_MAD_U32_U24 : VOP3_Realtriple_gfx11<0x20b>;
defm V_CUBEID_F32 : VOP3_Realtriple_gfx11<0x20c>;		defm V_CUBEID_F32 : VOP3_Realtriple_gfx11<0x20c>;
defm V_CUBESC_F32 : VOP3_Realtriple_gfx11<0x20d>;		defm V_CUBESC_F32 : VOP3_Realtriple_gfx11<0x20d>;
defm V_CUBETC_F32 : VOP3_Realtriple_gfx11<0x20e>;		defm V_CUBETC_F32 : VOP3_Realtriple_gfx11<0x20e>;
defm V_CUBEMA_F32 : VOP3_Realtriple_gfx11<0x20f>;		defm V_CUBEMA_F32 : VOP3_Realtriple_gfx11<0x20f>;
defm V_BFE_U32 : VOP3_Realtriple_gfx11<0x210>;		defm V_BFE_U32 : VOP3_Realtriple_gfx11<0x210>;
[...]
defm V_MAXMIN_F16 : VOP3_Realtriple_gfx11<0x260>;		defm V_MAXMIN_F16 : VOP3_Realtriple_gfx11<0x260>;
defm V_MINMAX_F16 : VOP3_Realtriple_gfx11<0x261>;		defm V_MINMAX_F16 : VOP3_Realtriple_gfx11<0x261>;
defm V_MAXMIN_U32 : VOP3_Realtriple_gfx11<0x262>;		defm V_MAXMIN_U32 : VOP3_Realtriple_gfx11<0x262>;
defm V_MINMAX_U32 : VOP3_Realtriple_gfx11<0x263>;		defm V_MINMAX_U32 : VOP3_Realtriple_gfx11<0x263>;
defm V_MAXMIN_I32 : VOP3_Realtriple_gfx11<0x264>;		defm V_MAXMIN_I32 : VOP3_Realtriple_gfx11<0x264>;
defm V_MINMAX_I32 : VOP3_Realtriple_gfx11<0x265>;		defm V_MINMAX_I32 : VOP3_Realtriple_gfx11<0x265>;
defm V_DOT2_F16_F16 : VOP3Dot_Realtriple_gfx11<0x266>;		defm V_DOT2_F16_F16 : VOP3Dot_Realtriple_gfx11<0x266>;
defm V_DOT2_BF16_BF16 : VOP3Dot_Realtriple_gfx11<0x267>;		defm V_DOT2_BF16_BF16 : VOP3Dot_Realtriple_gfx11<0x267>;
defm V_DIV_SCALE_F32 : VOP3be_Real_gfx11<0x2fc, "V_DIV_SCALE_F32", "v_div_scale_f32">;		defm V_DIV_SCALE_F32 : VOP3be_VCCSDST_Real_gfx11<0x2fc, "v_div_scale_f32">;
defm V_DIV_SCALE_F64 : VOP3be_Real_gfx11<0x2fd, "V_DIV_SCALE_F64", "v_div_scale_f64">;		defm V_DIV_SCALE_F64 : VOP3be_VCCSDST_Real_gfx11<0x2fd, "v_div_scale_f64">;
defm V_MAD_U64_U32_gfx11 : VOP3be_Real_gfx11<0x2fe, "V_MAD_U64_U32_gfx11", "v_mad_u64_u32">;		defm V_MAD_U64_U32_gfx11 : VOP3be_Real_gfx11<0x2fe, "V_MAD_U64_U32_gfx11", "v_mad_u64_u32">;
defm V_MAD_I64_I32_gfx11 : VOP3be_Real_gfx11<0x2ff, "V_MAD_I64_I32_gfx11", "v_mad_i64_i32">;		defm V_MAD_I64_I32_gfx11 : VOP3be_Real_gfx11<0x2ff, "V_MAD_I64_I32_gfx11", "v_mad_i64_i32">;
defm V_ADD_NC_U16 : VOP3Only_Realtriple_gfx11<0x303>;		defm V_ADD_NC_U16 : VOP3Only_Realtriple_gfx11<0x303>;
defm V_SUB_NC_U16 : VOP3Only_Realtriple_gfx11<0x304>;		defm V_SUB_NC_U16 : VOP3Only_Realtriple_gfx11<0x304>;
defm V_MUL_LO_U16 : VOP3Only_Realtriple_gfx11<0x305>;		defm V_MUL_LO_U16 : VOP3Only_Realtriple_gfx11<0x305>;
defm V_CVT_PK_I16_F32 : VOP3_Realtriple_gfx11<0x306>;		defm V_CVT_PK_I16_F32 : VOP3_Realtriple_gfx11<0x306>;
defm V_CVT_PK_U16_F32 : VOP3_Realtriple_gfx11<0x307>;		defm V_CVT_PK_U16_F32 : VOP3_Realtriple_gfx11<0x307>;
defm V_MAX_U16 : VOP3Only_Realtriple_gfx11<0x309>;		defm V_MAX_U16 : VOP3Only_Realtriple_gfx11<0x309>;
[...]	def _gfx10 :
let IsSingle = 1;		let IsSingle = 1;
}		}
}		}
multiclass VOP3be_Real_gfx10<bits<10> op> {		multiclass VOP3be_Real_gfx10<bits<10> op> {
def _gfx10 :		def _gfx10 :
VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.GFX10>,		VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.GFX10>,
VOP3be_gfx10<op, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;		VOP3be_gfx10<op, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;
}		}
		multiclass VOP3be_VCCSDST_Real_gfx10<bits<10> op, string asmName> {
		defvar ps = !cast<VOP3_Pseudo>(NAME #"_e64");

		def _gfx10 : VOP3_Real<ps, SIEncodingFamily.GFX10>,
		VOP3be_gfx10<op, ps.Pfl> {
		let IsSingle = ps.Pfl.IsSingle;
		let sdst = !cast<int>(VCC_LO.HWEncoding);
		let AsmString = asmName # !subst(", vcc", "", ps.AsmOperands);
		}

		let isAsmParserOnly = 1 in {
		def _w32_gfx10 : VOP3_Base<ps>, VOP3be_gfx10<op, ps.Pfl> {
		let sdst = !cast<int>(VCC_LO.HWEncoding);
		let WaveSizePredicate = isWave32;
		let AsmString = asmName # !subst("vcc", "vcc_lo", ps.AsmOperands);
		}

		def _w64_gfx10 : VOP3_Base<ps>, VOP3be_gfx10<op, ps.Pfl> {
		let sdst = !cast<int>(VCC.HWEncoding);
		let WaveSizePredicate = isWave64;
		let AsmString = asmName # ps.AsmOperands;
		}
		}
		}
multiclass VOP3Interp_Real_gfx10<bits<10> op> {		multiclass VOP3Interp_Real_gfx10<bits<10> op> {
def _gfx10 :		def _gfx10 :
VOP3_Real<!cast<VOP3_Pseudo>(NAME), SIEncodingFamily.GFX10>,		VOP3_Real<!cast<VOP3_Pseudo>(NAME), SIEncodingFamily.GFX10>,
VOP3Interp_gfx10<op, !cast<VOP3_Pseudo>(NAME).Pfl>;		VOP3Interp_gfx10<op, !cast<VOP3_Pseudo>(NAME).Pfl>;
}		}
multiclass VOP3OpSel_Real_gfx10<bits<10> op> {		multiclass VOP3OpSel_Real_gfx10<bits<10> op> {
def _gfx10 :		def _gfx10 :
VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.GFX10>,		VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.GFX10>,
[...]	def _gfx6_gfx7 :
VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.SI>,		VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.SI>,
VOP3e_gfx6_gfx7<op{8-0}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;		VOP3e_gfx6_gfx7<op{8-0}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;
}		}
multiclass VOP3be_Real_gfx6_gfx7<bits<10> op> {		multiclass VOP3be_Real_gfx6_gfx7<bits<10> op> {
def _gfx6_gfx7 :		def _gfx6_gfx7 :
VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.SI>,		VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.SI>,
VOP3be_gfx6_gfx7<op{8-0}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;		VOP3be_gfx6_gfx7<op{8-0}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;
}		}
		multiclass VOP3be_VCCSDST_Real_gfx6_gfx7<bits<10> op> {
		def _gfx6_gfx7 :
		VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.SI>,
		VOP3be_gfx6_gfx7<op{8-0}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl> {
		let sdst = !cast<int>(VCC.HWEncoding);
		}
		}
} // End AssemblerPredicate = isGFX6GFX7, DecoderNamespace = "GFX6GFX7"		} // End AssemblerPredicate = isGFX6GFX7, DecoderNamespace = "GFX6GFX7"

multiclass VOP3_Real_gfx6_gfx7_gfx10<bits<10> op> :		multiclass VOP3_Real_gfx6_gfx7_gfx10<bits<10> op> :
VOP3_Real_gfx6_gfx7<op>, VOP3_Real_gfx10<op>;		VOP3_Real_gfx6_gfx7<op>, VOP3_Real_gfx10<op>;

multiclass VOP3be_Real_gfx6_gfx7_gfx10<bits<10> op> :		multiclass VOP3be_Real_gfx6_gfx7_gfx10<bits<10> op> :
VOP3be_Real_gfx6_gfx7<op>, VOP3be_Real_gfx10<op>;		VOP3be_Real_gfx6_gfx7<op>, VOP3be_Real_gfx10<op>;

		multiclass VOP3be_VCCSDST_Real_gfx6_gfx7_gfx10<bits<10> op, string asmName> :
		VOP3be_VCCSDST_Real_gfx6_gfx7<op>, VOP3be_VCCSDST_Real_gfx10<op, asmName>;

defm V_LSHL_B64 : VOP3_Real_gfx6_gfx7<0x161>;		defm V_LSHL_B64 : VOP3_Real_gfx6_gfx7<0x161>;
defm V_LSHR_B64 : VOP3_Real_gfx6_gfx7<0x162>;		defm V_LSHR_B64 : VOP3_Real_gfx6_gfx7<0x162>;
defm V_ASHR_I64 : VOP3_Real_gfx6_gfx7<0x163>;		defm V_ASHR_I64 : VOP3_Real_gfx6_gfx7<0x163>;
defm V_MUL_LO_I32 : VOP3_Real_gfx6_gfx7<0x16b>;		defm V_MUL_LO_I32 : VOP3_Real_gfx6_gfx7<0x16b>;

defm V_MAD_LEGACY_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x140>;		defm V_MAD_LEGACY_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x140>;
defm V_MAD_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x141>;		defm V_MAD_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x141>;
defm V_MAD_I32_I24 : VOP3_Real_gfx6_gfx7_gfx10<0x142>;		defm V_MAD_I32_I24 : VOP3_Real_gfx6_gfx7_gfx10<0x142>;
defm V_MUL_LO_U32 : VOP3_Real_gfx6_gfx7_gfx10<0x169>;		defm V_MUL_LO_U32 : VOP3_Real_gfx6_gfx7_gfx10<0x169>;
defm V_MUL_HI_U32 : VOP3_Real_gfx6_gfx7_gfx10<0x16a>;		defm V_MUL_HI_U32 : VOP3_Real_gfx6_gfx7_gfx10<0x16a>;
defm V_MUL_HI_I32 : VOP3_Real_gfx6_gfx7_gfx10<0x16c>;		defm V_MUL_HI_I32 : VOP3_Real_gfx6_gfx7_gfx10<0x16c>;
defm V_DIV_FMAS_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x16f>;		defm V_DIV_FMAS_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x16f>;
defm V_DIV_FMAS_F64 : VOP3_Real_gfx6_gfx7_gfx10<0x170>;		defm V_DIV_FMAS_F64 : VOP3_Real_gfx6_gfx7_gfx10<0x170>;
defm V_MSAD_U8 : VOP3_Real_gfx6_gfx7_gfx10<0x171>;		defm V_MSAD_U8 : VOP3_Real_gfx6_gfx7_gfx10<0x171>;
defm V_MQSAD_PK_U16_U8 : VOP3_Real_gfx6_gfx7_gfx10<0x173>;		defm V_MQSAD_PK_U16_U8 : VOP3_Real_gfx6_gfx7_gfx10<0x173>;
defm V_TRIG_PREOP_F64 : VOP3_Real_gfx6_gfx7_gfx10<0x174>;		defm V_TRIG_PREOP_F64 : VOP3_Real_gfx6_gfx7_gfx10<0x174>;
defm V_DIV_SCALE_F32 : VOP3be_Real_gfx6_gfx7_gfx10<0x16d>;		defm V_DIV_SCALE_F32 : VOP3be_VCCSDST_Real_gfx6_gfx7_gfx10<0x16d, "v_div_scale_f32">;
defm V_DIV_SCALE_F64 : VOP3be_Real_gfx6_gfx7_gfx10<0x16e>;		defm V_DIV_SCALE_F64 : VOP3be_VCCSDST_Real_gfx6_gfx7_gfx10<0x16e, "v_div_scale_f64">;

// NB: Same opcode as v_mad_legacy_f32		// NB: Same opcode as v_mad_legacy_f32
let DecoderNamespace = "GFX10_B" in		let DecoderNamespace = "GFX10_B" in
defm V_FMA_LEGACY_F32 : VOP3_Real_gfx10<0x140>;		defm V_FMA_LEGACY_F32 : VOP3_Real_gfx10<0x140>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// GFX8, GFX9 (VI).		// GFX8, GFX9 (VI).
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME), SIEncodingFamily.VI>,
VOP3e_vi <op, !cast<VOP_Pseudo>(NAME).Pfl>;		VOP3e_vi <op, !cast<VOP_Pseudo>(NAME).Pfl>;
}		}

multiclass VOP3be_Real_vi<bits<10> op> {		multiclass VOP3be_Real_vi<bits<10> op> {
def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,		def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
VOP3be_vi <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl>;		VOP3be_vi <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl>;
}		}

		multiclass VOP3be_VCCSDST_Real_vi<bits<10> op> {
		def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
		VOP3be_vi <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl> {
		let sdst = !cast<int>(VCC.HWEncoding);
		}
		}

multiclass VOP3OpSel_Real_gfx9<bits<10> op> {		multiclass VOP3OpSel_Real_gfx9<bits<10> op> {
def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,		def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
VOP3OpSel_gfx9 <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl>;		VOP3OpSel_gfx9 <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl>;
}		}

multiclass VOP3OpSel_Real_gfx9_forced_opsel2<bits<10> op> {		multiclass VOP3OpSel_Real_gfx9_forced_opsel2<bits<10> op> {
def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,		def _vi : VOP3_Real<!cast<VOP_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
VOP3OpSel_gfx9 <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl> {		VOP3OpSel_gfx9 <op, !cast<VOP_Pseudo>(NAME#"_e64").Pfl> {
defm V_MED3_U32 : VOP3_Real_vi <0x1d8>;		defm V_MED3_U32 : VOP3_Real_vi <0x1d8>;
defm V_SAD_U8 : VOP3_Real_vi <0x1d9>;		defm V_SAD_U8 : VOP3_Real_vi <0x1d9>;
defm V_SAD_HI_U8 : VOP3_Real_vi <0x1da>;		defm V_SAD_HI_U8 : VOP3_Real_vi <0x1da>;
defm V_SAD_U16 : VOP3_Real_vi <0x1db>;		defm V_SAD_U16 : VOP3_Real_vi <0x1db>;
defm V_SAD_U32 : VOP3_Real_vi <0x1dc>;		defm V_SAD_U32 : VOP3_Real_vi <0x1dc>;
defm V_CVT_PK_U8_F32 : VOP3_Real_vi <0x1dd>;		defm V_CVT_PK_U8_F32 : VOP3_Real_vi <0x1dd>;
defm V_DIV_FIXUP_F32 : VOP3_Real_vi <0x1de>;		defm V_DIV_FIXUP_F32 : VOP3_Real_vi <0x1de>;
defm V_DIV_FIXUP_F64 : VOP3_Real_vi <0x1df>;		defm V_DIV_FIXUP_F64 : VOP3_Real_vi <0x1df>;
defm V_DIV_SCALE_F32 : VOP3be_Real_vi <0x1e0>;		defm V_DIV_SCALE_F32 : VOP3be_VCCSDST_Real_vi <0x1e0>;
defm V_DIV_SCALE_F64 : VOP3be_Real_vi <0x1e1>;		defm V_DIV_SCALE_F64 : VOP3be_VCCSDST_Real_vi <0x1e1>;
defm V_DIV_FMAS_F32 : VOP3_Real_vi <0x1e2>;		defm V_DIV_FMAS_F32 : VOP3_Real_vi <0x1e2>;
defm V_DIV_FMAS_F64 : VOP3_Real_vi <0x1e3>;		defm V_DIV_FMAS_F64 : VOP3_Real_vi <0x1e3>;
defm V_MSAD_U8 : VOP3_Real_vi <0x1e4>;		defm V_MSAD_U8 : VOP3_Real_vi <0x1e4>;
defm V_QSAD_PK_U16_U8 : VOP3_Real_vi <0x1e5>;		defm V_QSAD_PK_U16_U8 : VOP3_Real_vi <0x1e5>;
defm V_MQSAD_PK_U16_U8 : VOP3_Real_vi <0x1e6>;		defm V_MQSAD_PK_U16_U8 : VOP3_Real_vi <0x1e6>;
defm V_MQSAD_U32_U8 : VOP3_Real_vi <0x1e7>;		defm V_MQSAD_U32_U8 : VOP3_Real_vi <0x1e7>;

defm V_PERM_B32 : VOP3_Real_vi <0x1ed>;		defm V_PERM_B32 : VOP3_Real_vi <0x1ed>;
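For readers unfamiliar with the `!subst` calls in `VOP3be_VCCSDST_Real_gfx10`, the following is a minimal standalone TableGen sketch of the same idea: one operand string, rewritten depending on wave size. The `DemoAsm` class and the `DIV_SCALE_*` record names are hypothetical and are not part of the AMDGPU backend; only the `!subst` and `!if` operators are real TableGen features. It is an illustration under those assumptions, not the patch's implementation.

```tablegen
// Hypothetical demo class, not an AMDGPU backend class.
// Builds an asm string from a mnemonic and an operand string,
// spelling the carry register as vcc_lo when isWave32 is set.
class DemoAsm<string mnemonic, string operands, bit isWave32> {
  string AsmString = mnemonic #
      !if(isWave32,
          !subst("vcc", "vcc_lo", operands), // wave32: 32-bit half of VCC
          operands);                         // wave64: full 64-bit VCC
}

// Two hypothetical records sharing the same operand string.
def DIV_SCALE_W32 :
  DemoAsm<"v_div_scale_f32", " $vdst, vcc, $src0, $src1, $src2", 1>;
def DIV_SCALE_W64 :
  DemoAsm<"v_div_scale_f32", " $vdst, vcc, $src0, $src1, $src2", 0>;
```

Dumping the records with `llvm-tblgen` should show `vcc_lo` in the `AsmString` of `DIV_SCALE_W32` and `vcc` in that of `DIV_SCALE_W64`. Note that `!subst` replaces every occurrence of the target, so the pattern has to be chosen so that it matches only the operand being renamed.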

llvm/lib/Target/AMDGPU/VOPInstructions.td

class VOP3P_Pseudo <string opName, VOPProfile P, list<dag> pattern = []> :
let VOP3P = 1;		let VOP3P = 1;
}		}

class VOP_Real<VOP_Pseudo ps> {		class VOP_Real<VOP_Pseudo ps> {
Instruction Opcode = !cast<Instruction>(NAME);		Instruction Opcode = !cast<Instruction>(NAME);
bit IsSingle = ps.Pfl.IsSingle;		bit IsSingle = ps.Pfl.IsSingle;
}		}

class VOP3_Real <VOP_Pseudo ps, int EncodingFamily, string asm_name = ps.Mnemonic> :		class VOP3_Base <VOP_Pseudo ps, string asm_name = ps.Mnemonic> :
VOP_Real <ps>,		InstSI <ps.OutOperandList, ps.InOperandList, asm_name # ps.AsmOperands, []> {
InstSI <ps.OutOperandList, ps.InOperandList, asm_name # ps.AsmOperands, []>,
SIMCInstr <ps.PseudoInstr, EncodingFamily> {

let VALU = 1;		let VALU = 1;
let VOP3 = 1;		let VOP3 = 1;
let isPseudo = 0;		let isPseudo = 0;
let isCodeGenOnly = 0;		let isCodeGenOnly = 0;
let UseNamedOperandTable = 1;		let UseNamedOperandTable = 1;

// copy relevant pseudo op flags		// copy relevant pseudo op flags
let SchedRW = ps.SchedRW;		let SchedRW = ps.SchedRW;
let mayLoad = ps.mayLoad;		let mayLoad = ps.mayLoad;
let mayStore = ps.mayStore;		let mayStore = ps.mayStore;
let TRANS = ps.TRANS;		let TRANS = ps.TRANS;

VOPProfile Pfl = ps.Pfl;		VOPProfile Pfl = ps.Pfl;
}		}

		class VOP3_Real <VOP_Pseudo ps, int EncodingFamily, string asm_name = ps.Mnemonic> :
		VOP3_Base <ps, asm_name>,
		VOP_Real <ps>,
		SIMCInstr <ps.PseudoInstr, EncodingFamily>;

// XXX - Is there any reason to distinguish this from regular VOP3		// XXX - Is there any reason to distinguish this from regular VOP3
// here?		// here?
class VOP3P_Real<VOP_Pseudo ps, int EncodingFamily, string asm_name = ps.Mnemonic> :		class VOP3P_Real<VOP_Pseudo ps, int EncodingFamily, string asm_name = ps.Mnemonic> :
VOP3_Real<ps, EncodingFamily, asm_name> {		VOP3_Real<ps, EncodingFamily, asm_name> {

// The v_wmma pseudos have extra constraints that we do not want to impose on the real instruction.		// The v_wmma pseudos have extra constraints that we do not want to impose on the real instruction.
let Constraints = !if(!eq(!substr(ps.Mnemonic,0,6), "v_wmma"), "", ps.Constraints);		let Constraints = !if(!eq(!substr(ps.Mnemonic,0,6), "v_wmma"), "", ps.Constraints);
}		}
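The refactoring in this file moves the shared instruction fields into `VOP3_Base` and leaves `VOP3_Real` as the base plus the MC encoding mapping, so that assembler-only aliases can inherit the base without registering as a real encoding. A stripped-down sketch of that layering is shown here. All `Demo*` classes and `DEMO_*` records are hypothetical stand-ins, and the integer encoding family is a placeholder rather than a real `SIEncodingFamily` value.

```tablegen
// Hypothetical stand-in for a pseudo instruction record.
class DemoPseudo<string mnemonic, string operands> {
  string Mnemonic = mnemonic;
  string AsmOperands = operands;
}

// Shared part: fields needed by both encoded instructions and
// assembler-only aliases.
class DemoBase<DemoPseudo ps> {
  string AsmString = ps.Mnemonic # ps.AsmOperands;
}

// Encoded instruction: the base plus an encoding family tag.
class DemoReal<DemoPseudo ps, int family> : DemoBase<ps> {
  int EncodingFamily = family;
}

def DEMO_PSEUDO : DemoPseudo<"v_demo", " $vdst, vcc, $src0">;

// The real instruction carries the encoding family (placeholder value).
def DEMO_real : DemoReal<DEMO_PSEUDO, 1>;

// The alias reuses only the base and overrides the asm string.
def DEMO_w32 : DemoBase<DEMO_PSEUDO> {
  let AsmString = DEMO_PSEUDO.Mnemonic #
                  !subst("vcc", "vcc_lo", DEMO_PSEUDO.AsmOperands);
}
```

In this sketch `DEMO_w32` has no `EncodingFamily` field at all, which mirrors why the `_w32_gfx10` and `_w64_gfx10` definitions in the patch derive from `VOP3_Base` rather than `VOP3_Real`.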

llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX9 %s		; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX9 %s
; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX10PLUS,GFX10 %s		; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX10PLUS %s
; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX10PLUS,GFX11 %s		; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX10PLUS %s

; Make sure we don't violate the constant bus restriction		; Make sure we don't violate the constant bus restriction

define amdgpu_ps float @fmul_s_s(float inreg %src0, float inreg %src1) {		define amdgpu_ps float @fmul_s_s(float inreg %src0, float inreg %src1) {
; GFX9-LABEL: fmul_s_s:		; GFX9-LABEL: fmul_s_s:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: v_mov_b32_e32 v0, s3		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: v_mul_f32_e32 v0, s2, v0		; GFX9-NEXT: v_mul_f32_e32 v0, s2, v0
; GFX10PLUS-NEXT: ; return to shader part epilog
%result = select i1 %class, float 1.0, float 0.0		%result = select i1 %class, float 1.0, float 0.0
ret float %result		ret float %result
}		}

define amdgpu_ps float @div_scale_s_s_true(float inreg %src0, float inreg %src1) {		define amdgpu_ps float @div_scale_s_s_true(float inreg %src0, float inreg %src1) {
; GFX9-LABEL: div_scale_s_s_true:		; GFX9-LABEL: div_scale_s_s_true:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: v_mov_b32_e32 v0, s3		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: v_div_scale_f32 v0, s[0:1], s2, v0, s2		; GFX9-NEXT: v_div_scale_f32 v0, vcc, s2, v0, s2
; GFX9-NEXT: ; return to shader part epilog		; GFX9-NEXT: ; return to shader part epilog
;		;
; GFX10-LABEL: div_scale_s_s_true:		; GFX10PLUS-LABEL: div_scale_s_s_true:
; GFX10: ; %bb.0:		; GFX10PLUS: ; %bb.0:
; GFX10-NEXT: v_div_scale_f32 v0, s0, s2, s3, s2		; GFX10PLUS-NEXT: v_div_scale_f32 v0, vcc_lo, s2, s3, s2
; GFX10-NEXT: ; return to shader part epilog		; GFX10PLUS-NEXT: ; return to shader part epilog
;
; GFX11-LABEL: div_scale_s_s_true:
; GFX11: ; %bb.0:
; GFX11-NEXT: v_div_scale_f32 v0, null, s2, s3, s2
; GFX11-NEXT: ; return to shader part epilog
%div.scale = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %src0, float %src1, i1 true)		%div.scale = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %src0, float %src1, i1 true)
%result = extractvalue { float, i1 } %div.scale, 0		%result = extractvalue { float, i1 } %div.scale, 0
ret float %result		ret float %result
}		}

define amdgpu_ps float @div_scale_s_s_false(float inreg %src0, float inreg %src1) {		define amdgpu_ps float @div_scale_s_s_false(float inreg %src0, float inreg %src1) {
; GFX9-LABEL: div_scale_s_s_false:		; GFX9-LABEL: div_scale_s_s_false:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: v_mov_b32_e32 v0, s3		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: v_div_scale_f32 v0, s[0:1], v0, v0, s2		; GFX9-NEXT: v_div_scale_f32 v0, vcc, v0, v0, s2
; GFX9-NEXT: ; return to shader part epilog		; GFX9-NEXT: ; return to shader part epilog
;		;
; GFX10-LABEL: div_scale_s_s_false:		; GFX10PLUS-LABEL: div_scale_s_s_false:
; GFX10: ; %bb.0:		; GFX10PLUS: ; %bb.0:
; GFX10-NEXT: v_div_scale_f32 v0, s0, s3, s3, s2		; GFX10PLUS-NEXT: v_div_scale_f32 v0, vcc_lo, s3, s3, s2
; GFX10-NEXT: ; return to shader part epilog		; GFX10PLUS-NEXT: ; return to shader part epilog
;
; GFX11-LABEL: div_scale_s_s_false:
; GFX11: ; %bb.0:
; GFX11-NEXT: v_div_scale_f32 v0, null, s3, s3, s2
; GFX11-NEXT: ; return to shader part epilog
%div.scale = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %src0, float %src1, i1 false)		%div.scale = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %src0, float %src1, i1 false)
%result = extractvalue { float, i1 } %div.scale, 0		%result = extractvalue { float, i1 } %div.scale, 0
ret float %result		ret float %result
}		}

declare float @llvm.fma.f32(float, float, float) #0		declare float @llvm.fma.f32(float, float, float) #0
declare float @llvm.amdgcn.div.fmas.f32(float, float, float, i1) #1		declare float @llvm.amdgcn.div.fmas.f32(float, float, float, i1) #1
declare { float, i1 } @llvm.amdgcn.div.scale.f32(float, float, i1 immarg) #1		declare { float, i1 } @llvm.amdgcn.div.scale.f32(float, float, i1 immarg) #1
declare i1 @llvm.amdgcn.class.f32(float, i32) #1		declare i1 @llvm.amdgcn.class.f32(float, i32) #1

attributes #0 = { nounwind readnone speculatable willreturn }		attributes #0 = { nounwind readnone speculatable willreturn }
attributes #1 = { nounwind readnone speculatable }		attributes #1 = { nounwind readnone speculatable }
rampitec: What is this and why is it needed?

llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll

	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -denormal-fp-math=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -denormal-fp-math=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define half @v_fdiv_f16(half %a, half %b) {			define half @v_fdiv_f16(half %a, half %b) {
	; GFX6-IEEE-LABEL: v_fdiv_f16:			; GFX6-IEEE-LABEL: v_fdiv_f16:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_fdiv_f16:			; GFX6-FLUSH-LABEL: v_fdiv_f16:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
	}			}

	define half @v_fdiv_f16_ulp25(half %a, half %b) {			define half @v_fdiv_f16_ulp25(half %a, half %b) {
	; GFX6-IEEE-LABEL: v_fdiv_f16_ulp25:			; GFX6-IEEE-LABEL: v_fdiv_f16_ulp25:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_fdiv_f16_ulp25:			; GFX6-FLUSH-LABEL: v_fdiv_f16_ulp25:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
	}			}

	define half @v_rcp_f16(half %x) {			define half @v_rcp_f16(half %x) {
	; GFX6-IEEE-LABEL: v_rcp_f16:			; GFX6-IEEE-LABEL: v_rcp_f16:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, 1.0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, 1.0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v0, v0, v1
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1
	; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, v1			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_rcp_f16:			; GFX6-FLUSH-LABEL: v_rcp_f16:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v0, v0, v1
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
	}			}

	define half @v_rcp_f16_arcp(half %x) {			define half @v_rcp_f16_arcp(half %x) {
	; GFX6-IEEE-LABEL: v_rcp_f16_arcp:			; GFX6-IEEE-LABEL: v_rcp_f16_arcp:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, 1.0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, 1.0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v0, v0, v1
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1
	; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, v1			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_rcp_f16_arcp:			; GFX6-FLUSH-LABEL: v_rcp_f16_arcp:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v0, v0, v1
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
	}			}

	define half @v_rcp_f16_ulp25(half %x) {			define half @v_rcp_f16_ulp25(half %x) {
	; GFX6-IEEE-LABEL: v_rcp_f16_ulp25:			; GFX6-IEEE-LABEL: v_rcp_f16_ulp25:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, 1.0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, 1.0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v0, v0, v1
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1
	; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, v1			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_rcp_f16_ulp25:			; GFX6-FLUSH-LABEL: v_rcp_f16_ulp25:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v0, v0, v1
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v1, v0, v1
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
	}			}

	define half @v_fdiv_f16_arcp_ulp25(half %a, half %b) {			define half @v_fdiv_f16_arcp_ulp25(half %a, half %b) {
	; GFX6-IEEE-LABEL: v_fdiv_f16_arcp_ulp25:			; GFX6-IEEE-LABEL: v_fdiv_f16_arcp_ulp25:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_fdiv_f16_arcp_ulp25:			; GFX6-FLUSH-LABEL: v_fdiv_f16_arcp_ulp25:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
	[47 lines not shown]
	define <2 x half> @v_fdiv_v2f16(<2 x half> %a, <2 x half> %b) {			define <2 x half> @v_fdiv_v2f16(<2 x half> %a, <2 x half> %b) {
	; GFX6-IEEE-LABEL: v_fdiv_v2f16:			; GFX6-IEEE-LABEL: v_fdiv_v2f16:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, v2			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, v2
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v3, v3			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v3, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4
	; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5			; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6			; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7			; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7
	; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6			; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7			; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v3, v3, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1			; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_fdiv_v2f16:			; GFX6-FLUSH-LABEL: v_fdiv_v2f16:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, v2			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7			; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v3, v3			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v3, v3
	; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7			; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0			; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v3, v3, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v4, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6
	[166 lines not shown]
	define <2 x half> @v_fdiv_v2f16_ulp25(<2 x half> %a, <2 x half> %b) {			define <2 x half> @v_fdiv_v2f16_ulp25(<2 x half> %a, <2 x half> %b) {
	; GFX6-IEEE-LABEL: v_fdiv_v2f16_ulp25:			; GFX6-IEEE-LABEL: v_fdiv_v2f16_ulp25:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, v2			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, v2
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v3, v3			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v3, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4
	; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5			; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6			; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7			; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7
	; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6			; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7			; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v3, v3, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1			; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_fdiv_v2f16_ulp25:			; GFX6-FLUSH-LABEL: v_fdiv_v2f16_ulp25:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, v2			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7			; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v3, v3			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v3, v3
	; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7			; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0			; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v3, v3, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v4, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6
	[97 lines not shown]

	define <2 x half> @v_rcp_v2f16(<2 x half> %x) {			define <2 x half> @v_rcp_v2f16(<2 x half> %x) {
	; GFX6-IEEE-LABEL: v_rcp_v2f16:			; GFX6-IEEE-LABEL: v_rcp_v2f16:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, 1.0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, 1.0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v0, v0, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v0, v2
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v3, v0, v2			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v3, v0, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v1, v1, v2
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v1, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v1, v2
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, v2			; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, v2
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_rcp_v2f16:			; GFX6-FLUSH-LABEL: v_rcp_v2f16:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, s[4:5], v0, v0, v2			; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v0, v0, v2
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX6-FLUSH-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-FLUSH-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v4, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v4, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v3, v0, v2			; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v3, v0, v2
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v4			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v4
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v4, v1, v4			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v4, v1, v4
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v6, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v6, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v3, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v3, v6
	[90 lines not shown]

	define <2 x half> @v_rcp_v2f16_arcp(<2 x half> %x) {			define <2 x half> @v_rcp_v2f16_arcp(<2 x half> %x) {
	; GFX6-IEEE-LABEL: v_rcp_v2f16_arcp:			; GFX6-IEEE-LABEL: v_rcp_v2f16_arcp:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, 1.0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, 1.0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v0, v0, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v0, v2
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v3, v0, v2			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v3, v0, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v1, v1, v2
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v1, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v1, v2
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, v2			; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, v2
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_rcp_v2f16_arcp:			; GFX6-FLUSH-LABEL: v_rcp_v2f16_arcp:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, s[4:5], v0, v0, v2			; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v0, v0, v2
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX6-FLUSH-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-FLUSH-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v4, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v4, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v3, v0, v2			; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v3, v0, v2
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v4			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v4
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v4, v1, v4			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v4, v1, v4
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v6, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v6, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v3, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v3, v6
	[148 lines not shown]

	define <2 x half> @v_rcp_v2f16_ulp25(<2 x half> %x) {			define <2 x half> @v_rcp_v2f16_ulp25(<2 x half> %x) {
	; GFX6-IEEE-LABEL: v_rcp_v2f16_ulp25:			; GFX6-IEEE-LABEL: v_rcp_v2f16_ulp25:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, 1.0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, 1.0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v0, v0, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v0, v2
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v3, v0, v2			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v3, v0, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v1, v1, v2
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v1, v2			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v2, v1, v2
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, v2			; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, v2
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_rcp_v2f16_ulp25:			; GFX6-FLUSH-LABEL: v_rcp_v2f16_ulp25:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, s[4:5], v0, v0, v2			; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v0, v0, v2
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v2, v0, v2
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v4, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v3, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, -v3, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX6-FLUSH-NEXT: v_div_fmas_f32 v3, v3, v4, v6			; GFX6-FLUSH-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v4, 1.0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v4, 1.0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v3, v0, v2			; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v3, v0, v2
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v4			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v4
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v4, v1, v4			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v4, v1, v4
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v3, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v3, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v3, v6, v3, v3			; GFX6-FLUSH-NEXT: v_fma_f32 v3, v6, v3, v3
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v3			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v3
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v3, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v3, v6
	[... 118 lines not shown ...]
	define <2 x half> @v_fdiv_v2f16_arcp_ulp25(<2 x half> %a, <2 x half> %b) {			define <2 x half> @v_fdiv_v2f16_arcp_ulp25(<2 x half> %a, <2 x half> %b) {
	; GFX6-IEEE-LABEL: v_fdiv_v2f16_arcp_ulp25:			; GFX6-IEEE-LABEL: v_fdiv_v2f16_arcp_ulp25:
	; GFX6-IEEE: ; %bb.0:			; GFX6-IEEE: ; %bb.0:
	; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, v2			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v2, v2
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v3, v3			; GFX6-IEEE-NEXT: v_cvt_f32_f16_e32 v3, v3
	; GFX6-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4
	; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0			; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5			; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5			; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6			; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7			; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7
	; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6			; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7			; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0			; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
	; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v3, v3, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v2			; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v2
	; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1			; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0			; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-IEEE-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-IEEE-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v6, v5			; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v6, v5
	; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6			; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6
	; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1			; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1
	; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1			; GFX6-IEEE-NEXT: v_cvt_f16_f32_e32 v1, v1
	; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]			; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX6-FLUSH-LABEL: v_fdiv_v2f16_arcp_ulp25:			; GFX6-FLUSH-LABEL: v_fdiv_v2f16_arcp_ulp25:
	; GFX6-FLUSH: ; %bb.0:			; GFX6-FLUSH: ; %bb.0:
	; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, v2			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v2, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0			; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7			; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v1, v1
	; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v3, v3			; GFX6-FLUSH-NEXT: v_cvt_f32_f16_e32 v3, v3
	; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7			; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0			; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0			; GFX6-FLUSH-NEXT: v_cvt_f16_f32_e32 v0, v0
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v3, v3, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v2			; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v2
	; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1			; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v4, 1.0			; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4			; GFX6-FLUSH-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4			; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5			; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v2, v6, v5
	; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6			; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6
	[... 167 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f32.ll

[... 12 lines not shown ...]

; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=ieee -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-IEEE %s		; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=ieee -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-IEEE %s
; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-FLUSH %s		; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-FLUSH %s

define float @v_fdiv_f32(float %a, float %b) {		define float @v_fdiv_f32(float %a, float %b) {
; GFX6-IEEE-LABEL: v_fdiv_f32:		; GFX6-IEEE-LABEL: v_fdiv_f32:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX6-FLUSH-LABEL: v_fdiv_f32:		; GFX6-FLUSH-LABEL: v_fdiv_f32:
; GFX6-FLUSH: ; %bb.0:		; GFX6-FLUSH: ; %bb.0:
; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_fdiv_f32:		; GFX89-IEEE-LABEL: v_fdiv_f32:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4		; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3		; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3		; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-FLUSH-LABEL: v_fdiv_f32:		; GFX89-FLUSH-LABEL: v_fdiv_f32:
; GFX89-FLUSH: ; %bb.0:		; GFX89-FLUSH: ; %bb.0:
; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v4, v2		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v2, v4, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v3, v4		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_fdiv_f32:		; GFX10-IEEE-LABEL: v_fdiv_f32:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v2, s4, v1, v1, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3		; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5		; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
; GFX10-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4		; GFX10-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_fdiv_f32:		; GFX10-FLUSH-LABEL: v_fdiv_f32:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, s4, v1, v1, v0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, v0, v1, v0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, v0, v1, v0
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3
; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_fdiv_f32:		; GFX11-IEEE-LABEL: v_fdiv_f32:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v2, null, v1, v1, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3		; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX11-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5		; GFX11-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX11-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5		; GFX11-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4		; GFX11-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_fdiv_f32:		; GFX11-FLUSH-LABEL: v_fdiv_f32:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, null, v1, v1, v0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, v0, v1, v0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, v0, v1, v0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
[... 38 lines not shown ...]	; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn float %a, %b		%fdiv = fdiv afn float %a, %b
ret float %fdiv		ret float %fdiv
}		}

define float @v_fdiv_f32_ulp25(float %a, float %b) {		define float @v_fdiv_f32_ulp25(float %a, float %b) {
; GFX6-IEEE-LABEL: v_fdiv_f32_ulp25:		; GFX6-IEEE-LABEL: v_fdiv_f32_ulp25:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
[... 12 lines not shown ...]
; GCN-FLUSH-NEXT: v_rcp_f32_e32 v1, v1		; GCN-FLUSH-NEXT: v_rcp_f32_e32 v1, v1
; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1		; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1
; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0		; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0
; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_fdiv_f32_ulp25:		; GFX89-IEEE-LABEL: v_fdiv_f32_ulp25:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4		; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3		; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3		; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_fdiv_f32_ulp25:		; GFX10-IEEE-LABEL: v_fdiv_f32_ulp25:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v2, s4, v1, v1, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3		; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5		; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
[... 12 lines not shown ...]
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_fdiv_f32_ulp25:		; GFX11-IEEE-LABEL: v_fdiv_f32_ulp25:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v2, null, v1, v1, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3		; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
[... 23 lines not shown ...]	; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv float %a, %b, !fpmath !0		%fdiv = fdiv float %a, %b, !fpmath !0
ret float %fdiv		ret float %fdiv
}		}

define float @v_rcp_f32(float %x) {		define float @v_rcp_f32(float %x) {
; GFX6-IEEE-LABEL: v_rcp_f32:		; GFX6-IEEE-LABEL: v_rcp_f32:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v2, v1		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v2, v1
; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v2, v4, v2, v2		; GFX6-IEEE-NEXT: v_fma_f32 v2, v4, v2, v2
; GFX6-IEEE-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX6-IEEE-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v2, v4		; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v2, v4
; GFX6-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX6-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX6-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX6-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX6-FLUSH-LABEL: v_rcp_f32:		; GFX6-FLUSH-LABEL: v_rcp_f32:
; GFX6-FLUSH: ; %bb.0:		; GFX6-FLUSH: ; %bb.0:
; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-FLUSH-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v2, v1		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v2, v1
; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v2, v4, v2, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v2, v4, v2, v2
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v2, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v2, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX6-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_rcp_f32:		; GFX89-IEEE-LABEL: v_rcp_f32:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v3, v1		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v3, v1
; GFX89-IEEE-NEXT: v_fma_f32 v4, -v1, v3, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v4, -v1, v3, 1.0
; GFX89-IEEE-NEXT: v_fma_f32 v3, v4, v3, v3		; GFX89-IEEE-NEXT: v_fma_f32 v3, v4, v3, v3
; GFX89-IEEE-NEXT: v_mul_f32_e32 v4, v2, v3		; GFX89-IEEE-NEXT: v_mul_f32_e32 v4, v2, v3
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v2		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v2
; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v3, v4		; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v3, v4
; GFX89-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v2		; GFX89-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v2
; GFX89-IEEE-NEXT: v_div_fmas_f32 v1, v1, v3, v4		; GFX89-IEEE-NEXT: v_div_fmas_f32 v1, v1, v3, v4
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-FLUSH-LABEL: v_rcp_f32:		; GFX89-FLUSH-LABEL: v_rcp_f32:
; GFX89-FLUSH: ; %bb.0:		; GFX89-FLUSH: ; %bb.0:
; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-FLUSH-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v3, v1		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v3, v1
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v4, -v1, v3, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v4, -v1, v3, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v3, v4, v3, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v3, v4, v3, v3
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v3		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v2		; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v2
; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v3, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v3, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v2		; GFX89-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v2
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v3, v4		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v3, v4
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_rcp_f32:		; GFX10-IEEE-LABEL: v_rcp_f32:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v1, s4, v0, v0, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX10-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v2, v1		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v2, v1
; GFX10-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2
; GFX10-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2		; GFX10-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2
; GFX10-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4		; GFX10-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2
; GFX10-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4		; GFX10-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4
; GFX10-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3		; GFX10-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_rcp_f32:		; GFX10-FLUSH-LABEL: v_rcp_f32:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v1, s4, v0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v2, v1		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v2, v1
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v2		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v2
; GFX10-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX10-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_rcp_f32:		; GFX11-IEEE-LABEL: v_rcp_f32:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v1, null, v0, v0, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX11-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v2, v1		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v2, v1
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2		; GFX11-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2
; GFX11-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4		; GFX11-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2
; GFX11-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4		; GFX11-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3		; GFX11-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_rcp_f32:		; GFX11-FLUSH-LABEL: v_rcp_f32:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v1, null, v0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v2, v1		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v2, v1
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; [...]		; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv float 1.0, %x		%fdiv = fdiv float 1.0, %x
ret float %fdiv		ret float %fdiv
}		}

define float @v_rcp_f32_arcp(float %x) {		define float @v_rcp_f32_arcp(float %x) {
; GFX6-IEEE-LABEL: v_rcp_f32_arcp:		; GFX6-IEEE-LABEL: v_rcp_f32_arcp:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v2, v1		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v2, v1
; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v2, v4, v2, v2		; GFX6-IEEE-NEXT: v_fma_f32 v2, v4, v2, v2
; GFX6-IEEE-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX6-IEEE-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v2, v4		; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v2, v4
; GFX6-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX6-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX6-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX6-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX6-FLUSH-LABEL: v_rcp_f32_arcp:		; GFX6-FLUSH-LABEL: v_rcp_f32_arcp:
; GFX6-FLUSH: ; %bb.0:		; GFX6-FLUSH: ; %bb.0:
; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-FLUSH-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v2, v1		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v2, v1
; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v2, v4, v2, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v2, v4, v2, v2
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v2, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v2, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX6-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_rcp_f32_arcp:		; GFX89-IEEE-LABEL: v_rcp_f32_arcp:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v3, v1		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v3, v1
; GFX89-IEEE-NEXT: v_fma_f32 v4, -v1, v3, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v4, -v1, v3, 1.0
; GFX89-IEEE-NEXT: v_fma_f32 v3, v4, v3, v3		; GFX89-IEEE-NEXT: v_fma_f32 v3, v4, v3, v3
; GFX89-IEEE-NEXT: v_mul_f32_e32 v4, v2, v3		; GFX89-IEEE-NEXT: v_mul_f32_e32 v4, v2, v3
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v2		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v1, v4, v2
; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v3, v4		; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v3, v4
; GFX89-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v2		; GFX89-IEEE-NEXT: v_fma_f32 v1, -v1, v4, v2
; GFX89-IEEE-NEXT: v_div_fmas_f32 v1, v1, v3, v4		; GFX89-IEEE-NEXT: v_div_fmas_f32 v1, v1, v3, v4
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-FLUSH-LABEL: v_rcp_f32_arcp:		; GFX89-FLUSH-LABEL: v_rcp_f32_arcp:
; GFX89-FLUSH: ; %bb.0:		; GFX89-FLUSH: ; %bb.0:
; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-FLUSH-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v3, v1		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v3, v1
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v4, -v1, v3, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v4, -v1, v3, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v3, v4, v3, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v3, v4, v3, v3
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v3		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v2		; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v2
; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v3, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v3, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v2		; GFX89-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v2
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v3, v4		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v3, v4
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_rcp_f32_arcp:		; GFX10-IEEE-LABEL: v_rcp_f32_arcp:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v1, s4, v0, v0, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX10-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v2, v1		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v2, v1
; GFX10-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2
; GFX10-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2		; GFX10-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2
; GFX10-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4		; GFX10-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2
; GFX10-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4		; GFX10-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4
; GFX10-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3		; GFX10-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_rcp_f32_arcp:		; GFX10-FLUSH-LABEL: v_rcp_f32_arcp:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v1, s4, v0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v2, v1		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v2, v1
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v2		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v2
; GFX10-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX10-FLUSH-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_rcp_f32_arcp:		; GFX11-IEEE-LABEL: v_rcp_f32_arcp:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v1, null, v0, v0, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX11-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v2, v1		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v2, v1
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v3, -v1, v2, 1.0
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v2, v3, v2
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2		; GFX11-IEEE-NEXT: v_mul_f32_e32 v3, v4, v2
; GFX11-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4		; GFX11-IEEE-NEXT: v_fma_f32 v5, -v1, v3, v4
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v5, v2
; GFX11-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4		; GFX11-IEEE-NEXT: v_fma_f32 v1, -v1, v3, v4
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3		; GFX11-IEEE-NEXT: v_div_fmas_f32 v1, v1, v2, v3
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_rcp_f32_arcp:		; GFX11-FLUSH-LABEL: v_rcp_f32_arcp:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v1, null, v0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v1, vcc_lo, v0, v0, 1.0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, 1.0, v0, 1.0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v2, v1		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v2, v1
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v2, v4, v2
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; [...]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn float %a, %b, !fpmath !0		%fdiv = fdiv afn float %a, %b, !fpmath !0
ret float %fdiv		ret float %fdiv
}		}

define float @v_fdiv_f32_arcp_ulp25(float %a, float %b) {		define float @v_fdiv_f32_arcp_ulp25(float %a, float %b) {
; GFX6-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:		; GFX6-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
; [...]		; [...]
; GCN-FLUSH-NEXT: v_rcp_f32_e32 v1, v1		; GCN-FLUSH-NEXT: v_rcp_f32_e32 v1, v1
; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1		; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1
; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0		; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0
; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:		; GFX89-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v1, v1, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4		; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3		; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3		; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v1, v0
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:		; GFX10-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v2, s4, v1, v1, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3		; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5		; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
; [...]		; [...]
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v0, v1
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v0, v2, v0
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:		; GFX11-IEEE-LABEL: v_fdiv_f32_arcp_ulp25:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v2, null, v1, v1, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v1, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v1, v0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3		; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; [...]		; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv arcp float %a, %b, !fpmath !0		%fdiv = fdiv arcp float %a, %b, !fpmath !0
ret float %fdiv		ret float %fdiv
}		}

define <2 x float> @v_fdiv_v2f32(<2 x float> %a, <2 x float> %b) {		define <2 x float> @v_fdiv_v2f32(<2 x float> %a, <2 x float> %b) {
; GFX6-IEEE-LABEL: v_fdiv_v2f32:		; GFX6-IEEE-LABEL: v_fdiv_v2f32:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5
; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5		; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5
; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6		; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6
; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7		; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6
; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7		; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7
; GFX6-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1		; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v6, v5		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v6, v5
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v5, v6, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v5, v6, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v4, v4, v6, v6		; GFX6-IEEE-NEXT: v_fma_f32 v4, v4, v6, v6
; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v2, v4		; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v2, v4
; GFX6-IEEE-NEXT: v_fma_f32 v7, -v5, v6, v2		; GFX6-IEEE-NEXT: v_fma_f32 v7, -v5, v6, v2
; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6		; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v5, v6, v2		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v5, v6, v2
; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6		; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v6
; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1		; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v3, v1
; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX6-FLUSH-LABEL: v_fdiv_v2f32:		; GFX6-FLUSH-LABEL: v_fdiv_v2f32:
; GFX6-FLUSH: ; %bb.0:		; GFX6-FLUSH: ; %bb.0:
; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v5, v4
; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5		; GFX6-FLUSH-NEXT: v_fma_f32 v5, v7, v5, v5
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5
; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6		; GFX6-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6
; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7		; GFX6-FLUSH-NEXT: v_fma_f32 v7, v8, v5, v7
; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6		; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7
; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1		; GFX6-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v6, v5		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v6, v5
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0
; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1		; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v5, v6, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v4, -v5, v6, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v4, v4, v6, v6		; GFX6-FLUSH-NEXT: v_fma_f32 v4, v4, v6, v6
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v2, v4		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v6, v2, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v5, v6, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v7, -v5, v6, v2
; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6		; GFX6-FLUSH-NEXT: v_fma_f32 v6, v7, v4, v6
; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v5, v6, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v5, v6, v2
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v6		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v6
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1
; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_fdiv_v2f32:		; GFX89-IEEE-LABEL: v_fdiv_v2f32:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1		; GFX89-IEEE-NEXT: v_div_scale_f32 v5, vcc, v0, v2, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v6, v4
; GFX89-IEEE-NEXT: v_div_scale_f32 v7, s[4:5], v1, v3, v1		; GFX89-IEEE-NEXT: v_fma_f32 v7, -v4, v6, 1.0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v8, v4		; GFX89-IEEE-NEXT: v_fma_f32 v6, v7, v6, v6
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v9, v5		; GFX89-IEEE-NEXT: v_mul_f32_e32 v7, v5, v6
; GFX89-IEEE-NEXT: v_fma_f32 v10, -v4, v8, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v5
; GFX89-IEEE-NEXT: v_fma_f32 v8, v10, v8, v8		; GFX89-IEEE-NEXT: v_fma_f32 v7, v8, v6, v7
; GFX89-IEEE-NEXT: v_fma_f32 v11, -v5, v9, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v5
; GFX89-IEEE-NEXT: v_fma_f32 v9, v11, v9, v9		; GFX89-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v7
; GFX89-IEEE-NEXT: v_mul_f32_e32 v10, v6, v8		; GFX89-IEEE-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX89-IEEE-NEXT: v_mul_f32_e32 v11, v7, v9		; GFX89-IEEE-NEXT: v_div_scale_f32 v6, vcc, v1, v3, v1
; GFX89-IEEE-NEXT: v_fma_f32 v12, -v4, v10, v6
; GFX89-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v7
; GFX89-IEEE-NEXT: v_fma_f32 v10, v12, v8, v10
; GFX89-IEEE-NEXT: v_fma_f32 v11, v13, v9, v11
; GFX89-IEEE-NEXT: v_fma_f32 v4, -v4, v10, v6
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v7
; GFX89-IEEE-NEXT: v_div_fmas_f32 v4, v4, v8, v10
; GFX89-IEEE-NEXT: s_mov_b64 vcc, s[4:5]
; GFX89-IEEE-NEXT: v_div_fmas_f32 v5, v5, v9, v11
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v7, v5
		; GFX89-IEEE-NEXT: v_fma_f32 v8, -v5, v7, 1.0
		; GFX89-IEEE-NEXT: v_fma_f32 v7, v8, v7, v7
		; GFX89-IEEE-NEXT: v_mul_f32_e32 v8, v6, v7
		; GFX89-IEEE-NEXT: v_fma_f32 v9, -v5, v8, v6
		; GFX89-IEEE-NEXT: v_fma_f32 v8, v9, v7, v8
		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v5, v8, v6
		; GFX89-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v8
; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-FLUSH-LABEL: v_fdiv_v2f32:		; GFX89-FLUSH-LABEL: v_fdiv_v2f32:
; GFX89-FLUSH: ; %bb.0:		; GFX89-FLUSH: ; %bb.0:
; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-FLUSH-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v0, v2, v0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v0, v2, v0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v6, v4		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v6, v4
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v7, -v4, v6, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v7, -v4, v6, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v6, v7, v6, v6		; GFX89-FLUSH-NEXT: v_fma_f32 v6, v7, v6, v6
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v7, v5, v6		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v7, v5, v6
; GFX89-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v5
; GFX89-FLUSH-NEXT: v_fma_f32 v7, v8, v6, v7		; GFX89-FLUSH-NEXT: v_fma_f32 v7, v8, v6, v7
; GFX89-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v5
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v6, v7		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v6, v7
		; GFX89-FLUSH-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX89-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v1, v3, v1		; GFX89-FLUSH-NEXT: v_div_scale_f32 v6, vcc, v1, v3, v1
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v7, v5
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v7, v5
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v5, v7, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v5, v7, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v2, v2, v7, v7		; GFX89-FLUSH-NEXT: v_fma_f32 v2, v2, v7, v7
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v4, v6, v2		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v4, v6, v2
; GFX89-FLUSH-NEXT: v_fma_f32 v7, -v5, v4, v6		; GFX89-FLUSH-NEXT: v_fma_f32 v7, -v5, v4, v6
; GFX89-FLUSH-NEXT: v_fma_f32 v4, v7, v2, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v4, v7, v2, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v5, v4, v6		; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v5, v4, v6
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v5, v2, v4		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v5, v2, v4
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1
; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_fdiv_v2f32:		; GFX10-IEEE-LABEL: v_fdiv_v2f32:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v4, s4, v2, v2, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v5, s4, v3, v3, v1		; GFX10-IEEE-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v2, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v10, vcc_lo, v0, v2, v0		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v6, v4		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v4, v5, 1.0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v7, v5		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v5, v6, v5
; GFX10-IEEE-NEXT: v_fma_f32 v8, -v4, v6, 1.0		; GFX10-IEEE-NEXT: v_mul_f32_e32 v6, v7, v5
; GFX10-IEEE-NEXT: v_fma_f32 v9, -v5, v7, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v8, -v4, v6, v7
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v6		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v5
; GFX10-IEEE-NEXT: v_div_scale_f32 v8, s4, v1, v3, v1		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v4, v6, v7
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v7		; GFX10-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v6
; GFX10-IEEE-NEXT: v_mul_f32_e32 v9, v10, v6		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX10-IEEE-NEXT: v_mul_f32_e32 v11, v8, v7		; GFX10-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, v1, v3, v1
; GFX10-IEEE-NEXT: v_fma_f32 v12, -v4, v9, v10
; GFX10-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v8
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v9, v12, v6
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v11, v13, v7
; GFX10-IEEE-NEXT: v_fma_f32 v4, -v4, v9, v10
; GFX10-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v8
; GFX10-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v9
; GFX10-IEEE-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v11
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v6, v5
		; GFX10-IEEE-NEXT: v_fma_f32 v7, -v5, v6, 1.0
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v7, v6
		; GFX10-IEEE-NEXT: v_mul_f32_e32 v7, v8, v6
		; GFX10-IEEE-NEXT: v_fma_f32 v9, -v5, v7, v8
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v6
		; GFX10-IEEE-NEXT: v_fma_f32 v5, -v5, v7, v8
		; GFX10-IEEE-NEXT: v_div_fmas_f32 v5, v5, v6, v7
; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_fdiv_v2f32:		; GFX10-FLUSH-LABEL: v_fdiv_v2f32:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, s4, v2, v2, v0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v6, vcc_lo, v0, v2, v0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v6, vcc_lo, v0, v2, v0
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v5, v4		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v5, v4
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v7, v5		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v7, v5
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5
; GFX10-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6		; GFX10-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v7, v8, v5		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v7, v8, v5
; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6		; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v6, s4, v3, v3, v1
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v5, v6		; GFX10-FLUSH-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v6, v5
; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v3, v1		; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v3, v1
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v6, v5, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v4, -v5, v6, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v4, v5		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v6, v4, v6
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v5		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v6
; GFX10-FLUSH-NEXT: v_fma_f32 v7, -v6, v4, v2		; GFX10-FLUSH-NEXT: v_fma_f32 v7, -v5, v4, v2
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v7, v5		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v7, v6
; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v6, v4, v2		; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v5, v4, v2
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v5, v4		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v6, v4
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_fdiv_v2f32:		; GFX11-IEEE-LABEL: v_fdiv_v2f32:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v4, null, v2, v2, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v5, null, v3, v3, v1		; GFX11-IEEE-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v2, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v10, vcc_lo, v0, v2, v0		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v6, v4
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v7, v5
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v8, -v4, v6, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v6, -v4, v5, 1.0
; GFX11-IEEE-NEXT: v_fma_f32 v9, -v5, v7, 1.0		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v5, v6, v5
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_dual_fmac_f32 v6, v8, v6 :: v_dual_fmac_f32 v7, v9, v7		; GFX11-IEEE-NEXT: v_mul_f32_e32 v6, v7, v5
; GFX11-IEEE-NEXT: v_div_scale_f32 v8, s0, v1, v3, v1		; GFX11-IEEE-NEXT: v_fma_f32 v8, -v4, v6, v7
; GFX11-IEEE-NEXT: v_mul_f32_e32 v9, v10, v6		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v5
; GFX11-IEEE-NEXT: v_mul_f32_e32 v11, v8, v7		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v4, v6, v7
; GFX11-IEEE-NEXT: v_fma_f32 v12, -v4, v9, v10		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v6
; GFX11-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v8		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v9, v12, v6		; GFX11-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, v1, v3, v1
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v11, v13, v7
; GFX11-IEEE-NEXT: v_fma_f32 v4, -v4, v9, v10
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v8
; GFX11-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v9
; GFX11-IEEE-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v11
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v6, v5
		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-IEEE-NEXT: v_fma_f32 v7, -v5, v6, 1.0
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v6, v7, v6
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_mul_f32_e32 v7, v8, v6
		; GFX11-IEEE-NEXT: v_fma_f32 v9, -v5, v7, v8
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v6
		; GFX11-IEEE-NEXT: v_fma_f32 v5, -v5, v7, v8
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_div_fmas_f32 v5, v5, v6, v7
; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_fdiv_v2f32:		; GFX11-FLUSH-LABEL: v_fdiv_v2f32:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, null, v2, v2, v0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v6, vcc_lo, v0, v2, v0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v6, vcc_lo, v0, v2, v0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v5, v4		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v5, v4
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v7, -v4, v5, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v7, v5		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v7, v5
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5		; GFX11-FLUSH-NEXT: v_mul_f32_e32 v7, v6, v5
; GFX11-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6		; GFX11-FLUSH-NEXT: v_fma_f32 v8, -v4, v7, v6
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v7, v8, v5		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v7, v8, v5
; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6		; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v4, v7, v6
; GFX11-FLUSH-NEXT: s_denorm_mode 0		; GFX11-FLUSH-NEXT: s_denorm_mode 0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v6, null, v3, v3, v1		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7		; GFX11-FLUSH-NEXT: v_div_fmas_f32 v4, v4, v5, v7
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v5, v6		; GFX11-FLUSH-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_4) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX11-FLUSH-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_4) | instid1(VALU_DEP_1)
		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v6, v5
; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v3, v1		; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v1, v3, v1
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v6, v5, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v4, -v5, v6, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v4, v5		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v6, v4, v6
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v5		; GFX11-FLUSH-NEXT: v_mul_f32_e32 v4, v2, v6
; GFX11-FLUSH-NEXT: v_fma_f32 v7, -v6, v4, v2		; GFX11-FLUSH-NEXT: v_fma_f32 v7, -v5, v4, v2
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v4, v7, v5		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v4, v7, v6
; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v6, v4, v2		; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v5, v4, v2
; GFX11-FLUSH-NEXT: s_denorm_mode 0		; GFX11-FLUSH-NEXT: s_denorm_mode 0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v5, v4		; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v6, v4
; GFX11-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1		; GFX11-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v3, v1
; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv <2 x float> %a, %b		%fdiv = fdiv <2 x float> %a, %b
ret <2 x float> %fdiv		ret <2 x float> %fdiv
}		}

define <2 x float> @v_fdiv_v2f32_afn(<2 x float> %a, <2 x float> %b) {		define <2 x float> @v_fdiv_v2f32_afn(<2 x float> %a, <2 x float> %b) {
; GCN-LABEL: v_fdiv_v2f32_afn:		; GCN-LABEL: v_fdiv_v2f32_afn:
; [27 lines collapsed in the diff view]	; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn <2 x float> %a, %b		%fdiv = fdiv afn <2 x float> %a, %b
ret <2 x float> %fdiv		ret <2 x float> %fdiv
}		}

define <2 x float> @v_fdiv_v2f32_ulp25(<2 x float> %a, <2 x float> %b) {		define <2 x float> @v_fdiv_v2f32_ulp25(<2 x float> %a, <2 x float> %b) {
; GFX6-IEEE-LABEL: v_fdiv_v2f32_ulp25:		; GFX6-IEEE-LABEL: v_fdiv_v2f32_ulp25:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5
; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5		; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5
; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6		; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6
; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7		; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6
; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7		; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7
; GFX6-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1		; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v6, v5		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v6, v5
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v5, v6, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v5, v6, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v4, v4, v6, v6		; GFX6-IEEE-NEXT: v_fma_f32 v4, v4, v6, v6
; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v2, v4		; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v2, v4
; GFX6-IEEE-NEXT: v_fma_f32 v7, -v5, v6, v2		; GFX6-IEEE-NEXT: v_fma_f32 v7, -v5, v6, v2
; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6		; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
; [19 lines collapsed in the diff view]
; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3		; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3
; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v5, v0		; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v5, v0
; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v4, v1		; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v4, v1
; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_fdiv_v2f32_ulp25:		; GFX89-IEEE-LABEL: v_fdiv_v2f32_ulp25:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1		; GFX89-IEEE-NEXT: v_div_scale_f32 v5, vcc, v0, v2, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v6, v4
; GFX89-IEEE-NEXT: v_div_scale_f32 v7, s[4:5], v1, v3, v1		; GFX89-IEEE-NEXT: v_fma_f32 v7, -v4, v6, 1.0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v8, v4		; GFX89-IEEE-NEXT: v_fma_f32 v6, v7, v6, v6
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v9, v5		; GFX89-IEEE-NEXT: v_mul_f32_e32 v7, v5, v6
; GFX89-IEEE-NEXT: v_fma_f32 v10, -v4, v8, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v5
; GFX89-IEEE-NEXT: v_fma_f32 v8, v10, v8, v8		; GFX89-IEEE-NEXT: v_fma_f32 v7, v8, v6, v7
; GFX89-IEEE-NEXT: v_fma_f32 v11, -v5, v9, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v5
; GFX89-IEEE-NEXT: v_fma_f32 v9, v11, v9, v9		; GFX89-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v7
; GFX89-IEEE-NEXT: v_mul_f32_e32 v10, v6, v8		; GFX89-IEEE-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX89-IEEE-NEXT: v_mul_f32_e32 v11, v7, v9		; GFX89-IEEE-NEXT: v_div_scale_f32 v6, vcc, v1, v3, v1
; GFX89-IEEE-NEXT: v_fma_f32 v12, -v4, v10, v6
; GFX89-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v7
; GFX89-IEEE-NEXT: v_fma_f32 v10, v12, v8, v10
; GFX89-IEEE-NEXT: v_fma_f32 v11, v13, v9, v11
; GFX89-IEEE-NEXT: v_fma_f32 v4, -v4, v10, v6
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v7
; GFX89-IEEE-NEXT: v_div_fmas_f32 v4, v4, v8, v10
; GFX89-IEEE-NEXT: s_mov_b64 vcc, s[4:5]
; GFX89-IEEE-NEXT: v_div_fmas_f32 v5, v5, v9, v11
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v7, v5
		; GFX89-IEEE-NEXT: v_fma_f32 v8, -v5, v7, 1.0
		; GFX89-IEEE-NEXT: v_fma_f32 v7, v8, v7, v7
		; GFX89-IEEE-NEXT: v_mul_f32_e32 v8, v6, v7
		; GFX89-IEEE-NEXT: v_fma_f32 v9, -v5, v8, v6
		; GFX89-IEEE-NEXT: v_fma_f32 v8, v9, v7, v8
		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v5, v8, v6
		; GFX89-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v8
; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_fdiv_v2f32_ulp25:		; GFX10-IEEE-LABEL: v_fdiv_v2f32_ulp25:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v4, s4, v2, v2, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v5, s4, v3, v3, v1		; GFX10-IEEE-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v2, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v10, vcc_lo, v0, v2, v0		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v6, v4		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v4, v5, 1.0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v7, v5		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v5, v6, v5
; GFX10-IEEE-NEXT: v_fma_f32 v8, -v4, v6, 1.0		; GFX10-IEEE-NEXT: v_mul_f32_e32 v6, v7, v5
; GFX10-IEEE-NEXT: v_fma_f32 v9, -v5, v7, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v8, -v4, v6, v7
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v6		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v5
; GFX10-IEEE-NEXT: v_div_scale_f32 v8, s4, v1, v3, v1		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v4, v6, v7
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v7		; GFX10-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v6
; GFX10-IEEE-NEXT: v_mul_f32_e32 v9, v10, v6		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX10-IEEE-NEXT: v_mul_f32_e32 v11, v8, v7		; GFX10-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, v1, v3, v1
; GFX10-IEEE-NEXT: v_fma_f32 v12, -v4, v9, v10
; GFX10-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v8
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v9, v12, v6
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v11, v13, v7
; GFX10-IEEE-NEXT: v_fma_f32 v4, -v4, v9, v10
; GFX10-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v8
; GFX10-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v9
; GFX10-IEEE-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v11
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v6, v5
		; GFX10-IEEE-NEXT: v_fma_f32 v7, -v5, v6, 1.0
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v7, v6
		; GFX10-IEEE-NEXT: v_mul_f32_e32 v7, v8, v6
		; GFX10-IEEE-NEXT: v_fma_f32 v9, -v5, v7, v8
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v6
		; GFX10-IEEE-NEXT: v_fma_f32 v5, -v5, v7, v8
		; GFX10-IEEE-NEXT: v_div_fmas_f32 v5, v5, v6, v7
; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_fdiv_v2f32_ulp25:		; GFX10-FLUSH-LABEL: v_fdiv_v2f32_ulp25:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_cmp_lt_f32_e64 s4, 0x6f800000, |v2|		; GFX10-FLUSH-NEXT: v_cmp_lt_f32_e64 s4, 0x6f800000, |v2|
; [9 lines collapsed in the diff view]
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v5, v1		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v5, v1
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_fdiv_v2f32_ulp25:		; GFX11-IEEE-LABEL: v_fdiv_v2f32_ulp25:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v4, null, v2, v2, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v5, null, v3, v3, v1		; GFX11-IEEE-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v2, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v10, vcc_lo, v0, v2, v0		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v6, v4
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v7, v5
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v8, -v4, v6, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v6, -v4, v5, 1.0
; GFX11-IEEE-NEXT: v_fma_f32 v9, -v5, v7, 1.0		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v5, v6, v5
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_dual_fmac_f32 v6, v8, v6 :: v_dual_fmac_f32 v7, v9, v7		; GFX11-IEEE-NEXT: v_mul_f32_e32 v6, v7, v5
; GFX11-IEEE-NEXT: v_div_scale_f32 v8, s0, v1, v3, v1		; GFX11-IEEE-NEXT: v_fma_f32 v8, -v4, v6, v7
; GFX11-IEEE-NEXT: v_mul_f32_e32 v9, v10, v6		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v5
; GFX11-IEEE-NEXT: v_mul_f32_e32 v11, v8, v7		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v4, v6, v7
; GFX11-IEEE-NEXT: v_fma_f32 v12, -v4, v9, v10		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v6
; GFX11-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v8		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v9, v12, v6		; GFX11-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, v1, v3, v1
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v11, v13, v7
; GFX11-IEEE-NEXT: v_fma_f32 v4, -v4, v9, v10
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v8
; GFX11-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v9
; GFX11-IEEE-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v11
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v6, v5
		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-IEEE-NEXT: v_fma_f32 v7, -v5, v6, 1.0
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v6, v7, v6
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_mul_f32_e32 v7, v8, v6
		; GFX11-IEEE-NEXT: v_fma_f32 v9, -v5, v7, v8
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v6
		; GFX11-IEEE-NEXT: v_fma_f32 v5, -v5, v7, v8
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_div_fmas_f32 v5, v5, v6, v7
; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_fdiv_v2f32_ulp25:		; GFX11-FLUSH-LABEL: v_fdiv_v2f32_ulp25:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_cmp_lt_f32_e64 s0, 0x6f800000, |v2|		; GFX11-FLUSH-NEXT: v_cmp_lt_f32_e64 s0, 0x6f800000, |v2|
Show All 13 Lines	; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv <2 x float> %a, %b, !fpmath !0		%fdiv = fdiv <2 x float> %a, %b, !fpmath !0
ret <2 x float> %fdiv		ret <2 x float> %fdiv
}		}

define <2 x float> @v_rcp_v2f32(<2 x float> %x) {		define <2 x float> @v_rcp_v2f32(<2 x float> %x) {
; GFX6-IEEE-LABEL: v_rcp_v2f32:		; GFX6-IEEE-LABEL: v_rcp_v2f32:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v2, v4		; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v5, v2		; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v3, v5, v2		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX6-FLUSH-LABEL: v_rcp_v2f32:		; GFX6-FLUSH-LABEL: v_rcp_v2f32:
; GFX6-FLUSH: ; %bb.0:		; GFX6-FLUSH: ; %bb.0:
; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_rcp_v2f32:		; GFX89-IEEE-LABEL: v_rcp_v2f32:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX89-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX89-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], 1.0, v1, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v6, v2		; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v7, v3		; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-IEEE-NEXT: v_fma_f32 v8, -v2, v6, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-IEEE-NEXT: v_fma_f32 v6, v8, v6, v6		; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-IEEE-NEXT: v_fma_f32 v9, -v3, v7, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-IEEE-NEXT: v_fma_f32 v7, v9, v7, v7		; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX89-IEEE-NEXT: v_mul_f32_e32 v8, v4, v6		; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX89-IEEE-NEXT: v_mul_f32_e32 v9, v5, v7		; GFX89-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v1, 1.0
; GFX89-IEEE-NEXT: v_fma_f32 v10, -v2, v8, v4
; GFX89-IEEE-NEXT: v_fma_f32 v11, -v3, v9, v5
; GFX89-IEEE-NEXT: v_fma_f32 v8, v10, v6, v8
; GFX89-IEEE-NEXT: v_fma_f32 v9, v11, v7, v9
; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v8, v4
; GFX89-IEEE-NEXT: v_fma_f32 v3, -v3, v9, v5
; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v6, v8
; GFX89-IEEE-NEXT: s_mov_b64 vcc, s[4:5]
; GFX89-IEEE-NEXT: v_div_fmas_f32 v3, v3, v7, v9
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v5, v3
		; GFX89-IEEE-NEXT: v_fma_f32 v6, -v3, v5, 1.0
		; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v5, v5
		; GFX89-IEEE-NEXT: v_mul_f32_e32 v6, v4, v5
		; GFX89-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v4
		; GFX89-IEEE-NEXT: v_fma_f32 v6, v7, v5, v6
		; GFX89-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v4
		; GFX89-IEEE-NEXT: v_div_fmas_f32 v3, v3, v5, v6
; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-FLUSH-LABEL: v_rcp_v2f32:		; GFX89-FLUSH-LABEL: v_rcp_v2f32:
; GFX89-FLUSH: ; %bb.0:		; GFX89-FLUSH: ; %bb.0:
; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v4, v2		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v2, v4, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v3, v4		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
		; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v1, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v1, 1.0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v5, v3
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v5, v3
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v2, v2, v5, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v2, v2, v5, v5
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v2		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v2
; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v2, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v2, v5
; GFX89-FLUSH-NEXT: v_fma_f32 v3, -v3, v5, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v3, -v3, v5, v4
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v3, v2, v5		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v3, v2, v5
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_rcp_v2f32:		; GFX10-IEEE-LABEL: v_rcp_v2f32:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v2, s4, v0, v0, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX10-IEEE-NEXT: v_div_scale_f32 v3, s4, v1, v1, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, 1.0, v0, 1.0
; GFX10-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, 1.0, v0, 1.0		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v4, v2		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v5, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0		; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX10-IEEE-NEXT: v_fma_f32 v7, -v3, v5, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v4		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX10-IEEE-NEXT: v_div_scale_f32 v6, s4, 1.0, v1, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v5, v7, v5		; GFX10-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4
; GFX10-IEEE-NEXT: v_mul_f32_e32 v7, v8, v4		; GFX10-IEEE-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX10-IEEE-NEXT: v_mul_f32_e32 v9, v6, v5		; GFX10-IEEE-NEXT: v_div_scale_f32 v6, vcc_lo, 1.0, v1, 1.0
; GFX10-IEEE-NEXT: v_fma_f32 v10, -v2, v7, v8
; GFX10-IEEE-NEXT: v_fma_f32 v11, -v3, v9, v6
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v10, v4
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v9, v11, v5
; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v7, v8
; GFX10-IEEE-NEXT: v_fma_f32 v3, -v3, v9, v6
; GFX10-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v7
; GFX10-IEEE-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-IEEE-NEXT: v_div_fmas_f32 v3, v3, v5, v9
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v4, v3
		; GFX10-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v5, v4
		; GFX10-IEEE-NEXT: v_mul_f32_e32 v5, v6, v4
		; GFX10-IEEE-NEXT: v_fma_f32 v7, -v3, v5, v6
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v5, v7, v4
		; GFX10-IEEE-NEXT: v_fma_f32 v3, -v3, v5, v6
		; GFX10-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v5
; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_rcp_v2f32:		; GFX10-FLUSH-LABEL: v_rcp_v2f32:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, s4, v0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3
; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, s4, v1, v1, 1.0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v4		; GFX10-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v4, v3, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v4
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v3		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v4, v5, v2		; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v4
; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v4, v5, v2		; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_rcp_v2f32:		; GFX11-IEEE-LABEL: v_rcp_v2f32:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v2, null, v0, v0, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX11-IEEE-NEXT: v_div_scale_f32 v3, null, v1, v1, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, 1.0, v0, 1.0
; GFX11-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, 1.0, v0, 1.0		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v4, v2
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v5, v3
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX11-IEEE-NEXT: v_fma_f32 v7, -v3, v5, 1.0		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_dual_fmac_f32 v4, v6, v4 :: v_dual_fmac_f32 v5, v7, v5		; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX11-IEEE-NEXT: v_div_scale_f32 v6, s0, 1.0, v1, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX11-IEEE-NEXT: v_mul_f32_e32 v7, v8, v4		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX11-IEEE-NEXT: v_mul_f32_e32 v9, v6, v5		; GFX11-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
; GFX11-IEEE-NEXT: v_fma_f32 v10, -v2, v7, v8		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4
; GFX11-IEEE-NEXT: v_fma_f32 v11, -v3, v9, v6		; GFX11-IEEE-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v7, v10, v4		; GFX11-IEEE-NEXT: v_div_scale_f32 v6, vcc_lo, 1.0, v1, 1.0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v9, v11, v5
; GFX11-IEEE-NEXT: v_fma_f32 v2, -v2, v7, v8
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fma_f32 v3, -v3, v9, v6
; GFX11-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v7
; GFX11-IEEE-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v3, v3, v5, v9
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v4, v3
		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v4, v5, v4
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_mul_f32_e32 v5, v6, v4
		; GFX11-IEEE-NEXT: v_fma_f32 v7, -v3, v5, v6
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v5, v7, v4
		; GFX11-IEEE-NEXT: v_fma_f32 v3, -v3, v5, v6
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v5
; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_rcp_v2f32:		; GFX11-FLUSH-LABEL: v_rcp_v2f32:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, null, v0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3
; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX11-FLUSH-NEXT: s_denorm_mode 0		; GFX11-FLUSH-NEXT: s_denorm_mode 0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, null, v1, v1, 1.0		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v4		; GFX11-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_4) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_4) | instid1(VALU_DEP_1)
		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v4, v3, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v4
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v3		; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v4, v5, v2		; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v4
; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v4, v5, v2		; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX11-FLUSH-NEXT: s_denorm_mode 0		; GFX11-FLUSH-NEXT: s_denorm_mode 0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX11-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX11-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv <2 x float> <float 1.0, float 1.0>, %x		%fdiv = fdiv <2 x float> <float 1.0, float 1.0>, %x
ret <2 x float> %fdiv		ret <2 x float> %fdiv
}		}

define <2 x float> @v_rcp_v2f32_arcp(<2 x float> %x) {		define <2 x float> @v_rcp_v2f32_arcp(<2 x float> %x) {
; GFX6-IEEE-LABEL: v_rcp_v2f32_arcp:		; GFX6-IEEE-LABEL: v_rcp_v2f32_arcp:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-IEEE-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX6-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v4, v3
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX6-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v2, v4		; GFX6-IEEE-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v5, v2		; GFX6-IEEE-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX6-IEEE-NEXT: v_fma_f32 v2, -v3, v5, v2		; GFX6-IEEE-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX6-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX6-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX6-FLUSH-LABEL: v_rcp_v2f32_arcp:		; GFX6-FLUSH-LABEL: v_rcp_v2f32_arcp:
; GFX6-FLUSH: ; %bb.0:		; GFX6-FLUSH: ; %bb.0:
; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3		; GFX6-FLUSH-NEXT: v_fma_f32 v3, v5, v3, v3
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5		; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v3, v5
; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3		; GFX6-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0		; GFX6-FLUSH-NEXT: v_div_scale_f32 v2, vcc, 1.0, v1, 1.0
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0		; GFX6-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX6-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4		; GFX6-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX6-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2		; GFX6-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX6-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX6-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX6-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX6-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX6-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_rcp_v2f32_arcp:		; GFX89-IEEE-LABEL: v_rcp_v2f32_arcp:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX89-IEEE-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0		; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX89-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v0, 1.0		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], 1.0, v1, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v6, v2		; GFX89-IEEE-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v7, v3		; GFX89-IEEE-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-IEEE-NEXT: v_fma_f32 v8, -v2, v6, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-IEEE-NEXT: v_fma_f32 v6, v8, v6, v6		; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-IEEE-NEXT: v_fma_f32 v9, -v3, v7, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-IEEE-NEXT: v_fma_f32 v7, v9, v7, v7		; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX89-IEEE-NEXT: v_mul_f32_e32 v8, v4, v6		; GFX89-IEEE-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX89-IEEE-NEXT: v_mul_f32_e32 v9, v5, v7		; GFX89-IEEE-NEXT: v_div_scale_f32 v4, vcc, 1.0, v1, 1.0
; GFX89-IEEE-NEXT: v_fma_f32 v10, -v2, v8, v4
; GFX89-IEEE-NEXT: v_fma_f32 v11, -v3, v9, v5
; GFX89-IEEE-NEXT: v_fma_f32 v8, v10, v6, v8
; GFX89-IEEE-NEXT: v_fma_f32 v9, v11, v7, v9
; GFX89-IEEE-NEXT: v_fma_f32 v2, -v2, v8, v4
; GFX89-IEEE-NEXT: v_fma_f32 v3, -v3, v9, v5
; GFX89-IEEE-NEXT: v_div_fmas_f32 v2, v2, v6, v8
; GFX89-IEEE-NEXT: s_mov_b64 vcc, s[4:5]
; GFX89-IEEE-NEXT: v_div_fmas_f32 v3, v3, v7, v9
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v5, v3
		; GFX89-IEEE-NEXT: v_fma_f32 v6, -v3, v5, 1.0
		; GFX89-IEEE-NEXT: v_fma_f32 v5, v6, v5, v5
		; GFX89-IEEE-NEXT: v_mul_f32_e32 v6, v4, v5
		; GFX89-IEEE-NEXT: v_fma_f32 v7, -v3, v6, v4
		; GFX89-IEEE-NEXT: v_fma_f32 v6, v7, v5, v6
		; GFX89-IEEE-NEXT: v_fma_f32 v3, -v3, v6, v4
		; GFX89-IEEE-NEXT: v_div_fmas_f32 v3, v3, v5, v6
; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-FLUSH-LABEL: v_rcp_v2f32_arcp:		; GFX89-FLUSH-LABEL: v_rcp_v2f32_arcp:
; GFX89-FLUSH: ; %bb.0:		; GFX89-FLUSH: ; %bb.0:
; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, s[4:5], v0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v4, v2		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v4, v2
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v2, v4, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v5, -v2, v4, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v4, v5, v4, v4
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v3, v4		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v3, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v3
; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v4, v5
; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v3		; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v3
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, s[4:5], v1, v1, 1.0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
		; GFX89-FLUSH-NEXT: v_div_scale_f32 v3, vcc, v1, v1, 1.0
; GFX89-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v1, 1.0		; GFX89-FLUSH-NEXT: v_div_scale_f32 v4, vcc, 1.0, v1, 1.0
; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v5, v3
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX89-FLUSH-NEXT: v_rcp_f32_e32 v5, v3
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, 1.0		; GFX89-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, 1.0
; GFX89-FLUSH-NEXT: v_fma_f32 v2, v2, v5, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v2, v2, v5, v5
; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v2		; GFX89-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v2
; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v4
; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v2, v5		; GFX89-FLUSH-NEXT: v_fma_f32 v5, v6, v2, v5
; GFX89-FLUSH-NEXT: v_fma_f32 v3, -v3, v5, v4		; GFX89-FLUSH-NEXT: v_fma_f32 v3, -v3, v5, v4
; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0		; GFX89-FLUSH-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v3, v2, v5		; GFX89-FLUSH-NEXT: v_div_fmas_f32 v2, v3, v2, v5
; GFX89-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX89-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX89-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_rcp_v2f32_arcp:		; GFX10-IEEE-LABEL: v_rcp_v2f32_arcp:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v2, s4, v0, v0, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX10-IEEE-NEXT: v_div_scale_f32 v3, s4, v1, v1, 1.0		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, 1.0, v0, 1.0
; GFX10-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, 1.0, v0, 1.0		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v4, v2		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v5, v3		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0		; GFX10-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX10-IEEE-NEXT: v_fma_f32 v7, -v3, v5, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v4		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX10-IEEE-NEXT: v_div_scale_f32 v6, s4, 1.0, v1, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v5, v7, v5		; GFX10-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4
; GFX10-IEEE-NEXT: v_mul_f32_e32 v7, v8, v4		; GFX10-IEEE-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX10-IEEE-NEXT: v_mul_f32_e32 v9, v6, v5		; GFX10-IEEE-NEXT: v_div_scale_f32 v6, vcc_lo, 1.0, v1, 1.0
; GFX10-IEEE-NEXT: v_fma_f32 v10, -v2, v7, v8
; GFX10-IEEE-NEXT: v_fma_f32 v11, -v3, v9, v6
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v10, v4
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v9, v11, v5
; GFX10-IEEE-NEXT: v_fma_f32 v2, -v2, v7, v8
; GFX10-IEEE-NEXT: v_fma_f32 v3, -v3, v9, v6
; GFX10-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v7
; GFX10-IEEE-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-IEEE-NEXT: v_div_fmas_f32 v3, v3, v5, v9
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v4, v3
		; GFX10-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v4, v5, v4
		; GFX10-IEEE-NEXT: v_mul_f32_e32 v5, v6, v4
		; GFX10-IEEE-NEXT: v_fma_f32 v7, -v3, v5, v6
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v5, v7, v4
		; GFX10-IEEE-NEXT: v_fma_f32 v3, -v3, v5, v6
		; GFX10-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v5
; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_rcp_v2f32_arcp:		; GFX10-FLUSH-LABEL: v_rcp_v2f32_arcp:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, s4, v0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3
; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_scale_f32 v4, s4, v1, v1, 1.0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v3, v4		; GFX10-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX10-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0		; GFX10-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0
; GFX10-FLUSH-NEXT: s_denorm_mode 3		; GFX10-FLUSH-NEXT: s_denorm_mode 3
; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v4, v3, 1.0		; GFX10-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v4
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v3		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v4, v5, v2		; GFX10-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX10-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v4
; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v4, v5, v2		; GFX10-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX10-FLUSH-NEXT: s_denorm_mode 0		; GFX10-FLUSH-NEXT: s_denorm_mode 0
; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX10-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX10-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX10-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_rcp_v2f32_arcp:		; GFX11-IEEE-LABEL: v_rcp_v2f32_arcp:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v2, null, v0, v0, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX11-IEEE-NEXT: v_div_scale_f32 v3, null, v1, v1, 1.0		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, 1.0, v0, 1.0
; GFX11-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, 1.0, v0, 1.0		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(SKIP_2) \| instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) \| instskip(NEXT) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v4, v2
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v5, v3
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v6, -v2, v4, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v2, v3, 1.0
; GFX11-IEEE-NEXT: v_fma_f32 v7, -v3, v5, 1.0		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v3, v4, v3
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_1) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_dual_fmac_f32 v4, v6, v4 :: v_dual_fmac_f32 v5, v7, v5		; GFX11-IEEE-NEXT: v_mul_f32_e32 v4, v5, v3
; GFX11-IEEE-NEXT: v_div_scale_f32 v6, s0, 1.0, v1, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v6, -v2, v4, v5
; GFX11-IEEE-NEXT: v_mul_f32_e32 v7, v8, v4		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v4, v6, v3
; GFX11-IEEE-NEXT: v_mul_f32_e32 v9, v6, v5		; GFX11-IEEE-NEXT: v_fma_f32 v2, -v2, v4, v5
; GFX11-IEEE-NEXT: v_fma_f32 v10, -v2, v7, v8		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_2) \| instid1(VALU_DEP_3)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_div_fmas_f32 v2, v2, v3, v4
; GFX11-IEEE-NEXT: v_fma_f32 v11, -v3, v9, v6		; GFX11-IEEE-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v7, v10, v4		; GFX11-IEEE-NEXT: v_div_scale_f32 v6, vcc_lo, 1.0, v1, 1.0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v9, v11, v5
; GFX11-IEEE-NEXT: v_fma_f32 v2, -v2, v7, v8
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fma_f32 v3, -v3, v9, v6
; GFX11-IEEE-NEXT: v_div_fmas_f32 v2, v2, v4, v7
; GFX11-IEEE-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v3, v3, v5, v9
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) \| instskip(SKIP_2) \| instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v4, v3
		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-IEEE-NEXT: v_fma_f32 v5, -v3, v4, 1.0
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v4, v5, v4
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_mul_f32_e32 v5, v6, v4
		; GFX11-IEEE-NEXT: v_fma_f32 v7, -v3, v5, v6
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v5, v7, v4
		; GFX11-IEEE-NEXT: v_fma_f32 v3, -v3, v5, v6
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_div_fmas_f32 v3, v3, v4, v5
; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v3, v1, 1.0
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_rcp_v2f32_arcp:		; GFX11-FLUSH-LABEL: v_rcp_v2f32_arcp:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, null, v0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, v0, v0, 1.0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, vcc_lo, 1.0, v0, 1.0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(SKIP_3) \| instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(SKIP_3) \| instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v2		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v2
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v2, v3, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3		; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v4, v3
; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4		; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v2, v5, v4
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3
; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4		; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v2, v5, v4
; GFX11-FLUSH-NEXT: s_denorm_mode 0		; GFX11-FLUSH-NEXT: s_denorm_mode 0
; GFX11-FLUSH-NEXT: v_div_scale_f32 v4, null, v1, v1, 1.0		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_1) \| instid1(VALU_DEP_2)
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)
; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5
; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v3, v4		; GFX11-FLUSH-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v1, 1.0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_4) \| instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0		; GFX11-FLUSH-NEXT: v_div_fixup_f32 v0, v2, v0, 1.0
		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(SKIP_4) \| instid1(VALU_DEP_1)
		; GFX11-FLUSH-NEXT: v_rcp_f32_e32 v4, v3
; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0		; GFX11-FLUSH-NEXT: v_div_scale_f32 v2, vcc_lo, 1.0, v1, 1.0
; GFX11-FLUSH-NEXT: s_denorm_mode 3		; GFX11-FLUSH-NEXT: s_denorm_mode 3
; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff		; GFX11-FLUSH-NEXT: s_waitcnt_depctr 0xfff
; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v4, v3, 1.0		; GFX11-FLUSH-NEXT: v_fma_f32 v5, -v3, v4, 1.0
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v3, v5, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v4, v5, v4
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v3		; GFX11-FLUSH-NEXT: v_mul_f32_e32 v5, v2, v4
; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v4, v5, v2		; GFX11-FLUSH-NEXT: v_fma_f32 v6, -v3, v5, v2
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v3		; GFX11-FLUSH-NEXT: v_fmac_f32_e32 v5, v6, v4
; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v4, v5, v2		; GFX11-FLUSH-NEXT: v_fma_f32 v2, -v3, v5, v2
; GFX11-FLUSH-NEXT: s_denorm_mode 0		; GFX11-FLUSH-NEXT: s_denorm_mode 0
; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)		; GFX11-FLUSH-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v3, v5		; GFX11-FLUSH-NEXT: v_div_fmas_f32 v2, v2, v4, v5
; GFX11-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0		; GFX11-FLUSH-NEXT: v_div_fixup_f32 v1, v2, v1, 1.0
; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX11-FLUSH-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv arcp <2 x float> <float 1.0, float 1.0>, %x		%fdiv = fdiv arcp <2 x float> <float 1.0, float 1.0>, %x
ret <2 x float> %fdiv		ret <2 x float> %fdiv
}		}

define <2 x float> @v_rcp_v2f32_arcp_afn(<2 x float> %x) {		define <2 x float> @v_rcp_v2f32_arcp_afn(<2 x float> %x) {
; GCN-LABEL: v_rcp_v2f32_arcp_afn:		; GCN-LABEL: v_rcp_v2f32_arcp_afn:
; ... (137 lines not shown) ...	; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn <2 x float> %a, %b, !fpmath !0		%fdiv = fdiv afn <2 x float> %a, %b, !fpmath !0
ret <2 x float> %fdiv		ret <2 x float> %fdiv
}		}

define <2 x float> @v_fdiv_v2f32_arcp_ulp25(<2 x float> %a, <2 x float> %b) {		define <2 x float> @v_fdiv_v2f32_arcp_ulp25(<2 x float> %a, <2 x float> %b) {
; GFX6-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:		; GFX6-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:
; GFX6-IEEE: ; %bb.0:		; GFX6-IEEE: ; %bb.0:
; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0		; GFX6-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v7, -v4, v5, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5		; GFX6-IEEE-NEXT: v_fma_f32 v5, v7, v5, v5
; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5		; GFX6-IEEE-NEXT: v_mul_f32_e32 v7, v6, v5
; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6		; GFX6-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v6
; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7		; GFX6-IEEE-NEXT: v_fma_f32 v7, v8, v5, v7
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v6
; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7		; GFX6-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v7
; GFX6-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1		; GFX6-IEEE-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX6-IEEE-NEXT: v_rcp_f32_e32 v6, v5		; GFX6-IEEE-NEXT: v_rcp_f32_e32 v6, v5
; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX6-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1		; GFX6-IEEE-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1
; GFX6-IEEE-NEXT: v_fma_f32 v4, -v5, v6, 1.0		; GFX6-IEEE-NEXT: v_fma_f32 v4, -v5, v6, 1.0
; GFX6-IEEE-NEXT: v_fma_f32 v4, v4, v6, v6		; GFX6-IEEE-NEXT: v_fma_f32 v4, v4, v6, v6
; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v2, v4		; GFX6-IEEE-NEXT: v_mul_f32_e32 v6, v2, v4
; GFX6-IEEE-NEXT: v_fma_f32 v7, -v5, v6, v2		; GFX6-IEEE-NEXT: v_fma_f32 v7, -v5, v6, v2
; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6		; GFX6-IEEE-NEXT: v_fma_f32 v6, v7, v4, v6
; ... (19 lines not shown) ...
; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3		; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3
; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v5, v0		; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, v5, v0
; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v4, v1		; GCN-FLUSH-NEXT: v_mul_f32_e32 v1, v4, v1
; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GCN-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX89-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:		; GFX89-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:
; GFX89-IEEE: ; %bb.0:		; GFX89-IEEE: ; %bb.0:
; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX89-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX89-IEEE-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0		; GFX89-IEEE-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1		; GFX89-IEEE-NEXT: v_div_scale_f32 v5, vcc, v0, v2, v0
; GFX89-IEEE-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v6, v4
; GFX89-IEEE-NEXT: v_div_scale_f32 v7, s[4:5], v1, v3, v1		; GFX89-IEEE-NEXT: v_fma_f32 v7, -v4, v6, 1.0
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v8, v4		; GFX89-IEEE-NEXT: v_fma_f32 v6, v7, v6, v6
; GFX89-IEEE-NEXT: v_rcp_f32_e32 v9, v5		; GFX89-IEEE-NEXT: v_mul_f32_e32 v7, v5, v6
; GFX89-IEEE-NEXT: v_fma_f32 v10, -v4, v8, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v8, -v4, v7, v5
; GFX89-IEEE-NEXT: v_fma_f32 v8, v10, v8, v8		; GFX89-IEEE-NEXT: v_fma_f32 v7, v8, v6, v7
; GFX89-IEEE-NEXT: v_fma_f32 v11, -v5, v9, 1.0		; GFX89-IEEE-NEXT: v_fma_f32 v4, -v4, v7, v5
; GFX89-IEEE-NEXT: v_fma_f32 v9, v11, v9, v9		; GFX89-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v7
; GFX89-IEEE-NEXT: v_mul_f32_e32 v10, v6, v8		; GFX89-IEEE-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
; GFX89-IEEE-NEXT: v_mul_f32_e32 v11, v7, v9		; GFX89-IEEE-NEXT: v_div_scale_f32 v6, vcc, v1, v3, v1
; GFX89-IEEE-NEXT: v_fma_f32 v12, -v4, v10, v6
; GFX89-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v7
; GFX89-IEEE-NEXT: v_fma_f32 v10, v12, v8, v10
; GFX89-IEEE-NEXT: v_fma_f32 v11, v13, v9, v11
; GFX89-IEEE-NEXT: v_fma_f32 v4, -v4, v10, v6
; GFX89-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v7
; GFX89-IEEE-NEXT: v_div_fmas_f32 v4, v4, v8, v10
; GFX89-IEEE-NEXT: s_mov_b64 vcc, s[4:5]
; GFX89-IEEE-NEXT: v_div_fmas_f32 v5, v5, v9, v11
; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX89-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX89-IEEE-NEXT: v_rcp_f32_e32 v7, v5
		; GFX89-IEEE-NEXT: v_fma_f32 v8, -v5, v7, 1.0
		; GFX89-IEEE-NEXT: v_fma_f32 v7, v8, v7, v7
		; GFX89-IEEE-NEXT: v_mul_f32_e32 v8, v6, v7
		; GFX89-IEEE-NEXT: v_fma_f32 v9, -v5, v8, v6
		; GFX89-IEEE-NEXT: v_fma_f32 v8, v9, v7, v8
		; GFX89-IEEE-NEXT: v_fma_f32 v5, -v5, v8, v6
		; GFX89-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v8
; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX89-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX89-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:		; GFX10-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:
; GFX10-IEEE: ; %bb.0:		; GFX10-IEEE: ; %bb.0:
; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-IEEE-NEXT: v_div_scale_f32 v4, s4, v2, v2, v0		; GFX10-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v5, s4, v3, v3, v1		; GFX10-IEEE-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v2, v0
; GFX10-IEEE-NEXT: v_div_scale_f32 v10, vcc_lo, v0, v2, v0		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v6, v4		; GFX10-IEEE-NEXT: v_fma_f32 v6, -v4, v5, 1.0
; GFX10-IEEE-NEXT: v_rcp_f32_e32 v7, v5		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v5, v6, v5
; GFX10-IEEE-NEXT: v_fma_f32 v8, -v4, v6, 1.0		; GFX10-IEEE-NEXT: v_mul_f32_e32 v6, v7, v5
; GFX10-IEEE-NEXT: v_fma_f32 v9, -v5, v7, 1.0		; GFX10-IEEE-NEXT: v_fma_f32 v8, -v4, v6, v7
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v6		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v5
; GFX10-IEEE-NEXT: v_div_scale_f32 v8, s4, v1, v3, v1		; GFX10-IEEE-NEXT: v_fma_f32 v4, -v4, v6, v7
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v7		; GFX10-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v6
; GFX10-IEEE-NEXT: v_mul_f32_e32 v9, v10, v6		; GFX10-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX10-IEEE-NEXT: v_mul_f32_e32 v11, v8, v7		; GFX10-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, v1, v3, v1
; GFX10-IEEE-NEXT: v_fma_f32 v12, -v4, v9, v10
; GFX10-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v8
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v9, v12, v6
; GFX10-IEEE-NEXT: v_fmac_f32_e32 v11, v13, v7
; GFX10-IEEE-NEXT: v_fma_f32 v4, -v4, v9, v10
; GFX10-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v8
; GFX10-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v9
; GFX10-IEEE-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v11
; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX10-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
		; GFX10-IEEE-NEXT: v_rcp_f32_e32 v6, v5
		; GFX10-IEEE-NEXT: v_fma_f32 v7, -v5, v6, 1.0
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v6, v7, v6
		; GFX10-IEEE-NEXT: v_mul_f32_e32 v7, v8, v6
		; GFX10-IEEE-NEXT: v_fma_f32 v9, -v5, v7, v8
		; GFX10-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v6
		; GFX10-IEEE-NEXT: v_fma_f32 v5, -v5, v7, v8
		; GFX10-IEEE-NEXT: v_div_fmas_f32 v5, v5, v6, v7
; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX10-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX10-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-FLUSH-LABEL: v_fdiv_v2f32_arcp_ulp25:		; GFX10-FLUSH-LABEL: v_fdiv_v2f32_arcp_ulp25:
; GFX10-FLUSH: ; %bb.0:		; GFX10-FLUSH: ; %bb.0:
; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-FLUSH-NEXT: v_cmp_lt_f32_e64 s4, 0x6f800000, \|v2\|		; GFX10-FLUSH-NEXT: v_cmp_lt_f32_e64 s4, 0x6f800000, \|v2\|
; ... (9 lines not shown) ...
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v1, v3
; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v5, v1		; GFX10-FLUSH-NEXT: v_mul_f32_e32 v1, v5, v1
; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]		; GFX10-FLUSH-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:		; GFX11-IEEE-LABEL: v_fdiv_v2f32_arcp_ulp25:
; GFX11-IEEE: ; %bb.0:		; GFX11-IEEE: ; %bb.0:
; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-IEEE-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-IEEE-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-IEEE-NEXT: v_div_scale_f32 v4, null, v2, v2, v0		; GFX11-IEEE-NEXT: v_div_scale_f32 v4, vcc_lo, v2, v2, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v5, null, v3, v3, v1		; GFX11-IEEE-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v2, v0
; GFX11-IEEE-NEXT: v_div_scale_f32 v10, vcc_lo, v0, v2, v0		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(SKIP_2) \| instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) \| instskip(NEXT) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v5, v4
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v6, v4
; GFX11-IEEE-NEXT: v_rcp_f32_e32 v7, v5
; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
; GFX11-IEEE-NEXT: v_fma_f32 v8, -v4, v6, 1.0		; GFX11-IEEE-NEXT: v_fma_f32 v6, -v4, v5, 1.0
; GFX11-IEEE-NEXT: v_fma_f32 v9, -v5, v7, 1.0		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v5, v6, v5
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_1) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: v_dual_fmac_f32 v6, v8, v6 :: v_dual_fmac_f32 v7, v9, v7		; GFX11-IEEE-NEXT: v_mul_f32_e32 v6, v7, v5
; GFX11-IEEE-NEXT: v_div_scale_f32 v8, s0, v1, v3, v1		; GFX11-IEEE-NEXT: v_fma_f32 v8, -v4, v6, v7
; GFX11-IEEE-NEXT: v_mul_f32_e32 v9, v10, v6		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v6, v8, v5
; GFX11-IEEE-NEXT: v_mul_f32_e32 v11, v8, v7		; GFX11-IEEE-NEXT: v_fma_f32 v4, -v4, v6, v7
; GFX11-IEEE-NEXT: v_fma_f32 v12, -v4, v9, v10		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_2) \| instid1(VALU_DEP_3)
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)		; GFX11-IEEE-NEXT: v_div_fmas_f32 v4, v4, v5, v6
; GFX11-IEEE-NEXT: v_fma_f32 v13, -v5, v11, v8		; GFX11-IEEE-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v9, v12, v6		; GFX11-IEEE-NEXT: v_div_scale_f32 v8, vcc_lo, v1, v3, v1
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fmac_f32_e32 v11, v13, v7
; GFX11-IEEE-NEXT: v_fma_f32 v4, -v4, v9, v10
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_fma_f32 v5, -v5, v11, v8
; GFX11-IEEE-NEXT: v_div_fmas_f32 v4, v4, v6, v9
; GFX11-IEEE-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-IEEE-NEXT: v_div_fmas_f32 v5, v5, v7, v11
; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0		; GFX11-IEEE-NEXT: v_div_fixup_f32 v0, v4, v2, v0
; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_2)		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_rcp_f32_e32 v6, v5
		; GFX11-IEEE-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-IEEE-NEXT: v_fma_f32 v7, -v5, v6, 1.0
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v6, v7, v6
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_mul_f32_e32 v7, v8, v6
		; GFX11-IEEE-NEXT: v_fma_f32 v9, -v5, v7, v8
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_fmac_f32_e32 v7, v9, v6
		; GFX11-IEEE-NEXT: v_fma_f32 v5, -v5, v7, v8
		; GFX11-IEEE-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-IEEE-NEXT: v_div_fmas_f32 v5, v5, v6, v7
; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1		; GFX11-IEEE-NEXT: v_div_fixup_f32 v1, v5, v3, v1
; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]		; GFX11-IEEE-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-FLUSH-LABEL: v_fdiv_v2f32_arcp_ulp25:		; GFX11-FLUSH-LABEL: v_fdiv_v2f32_arcp_ulp25:
; GFX11-FLUSH: ; %bb.0:		; GFX11-FLUSH: ; %bb.0:
; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-FLUSH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-FLUSH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-FLUSH-NEXT: v_cmp_lt_f32_e64 s0, 0x6f800000, |v2|		; GFX11-FLUSH-NEXT: v_cmp_lt_f32_e64 s0, 0x6f800000, |v2|

llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll


; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=ieee -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s		; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=ieee -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s
; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s		; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -denormal-fp-math=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s

define double @v_fdiv_f64(double %a, double %b) {		define double @v_fdiv_f64(double %a, double %b) {
; GFX6-LABEL: v_fdiv_f64:		; GFX6-LABEL: v_fdiv_f64:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX6-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[0:1], v[2:3], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[2:3], v[0:1]
; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v5		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v5
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v11		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v11
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_fdiv_f64:		; GFX8-LABEL: v_fdiv_f64:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0		; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]		; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]		; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]		; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_fdiv_f64:		; GFX9-LABEL: v_fdiv_f64:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0		; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]		; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]		; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]		; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_fdiv_f64:		; GFX10-LABEL: v_fdiv_f64:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[4:5], s4, v[2:3], v[2:3], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]
; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_fdiv_f64:		; GFX11-LABEL: v_fdiv_f64:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[4:5], null, v[2:3], v[2:3], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn double %a, %b		%fdiv = fdiv afn double %a, %b
ret double %fdiv		ret double %fdiv
}		}

define double @v_fdiv_f64_ulp25(double %a, double %b) {		define double @v_fdiv_f64_ulp25(double %a, double %b) {
; GFX6-LABEL: v_fdiv_f64_ulp25:		; GFX6-LABEL: v_fdiv_f64_ulp25:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX6-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[0:1], v[2:3], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[2:3], v[0:1]
; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v5		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v5
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v11		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v11
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_fdiv_f64_ulp25:		; GFX8-LABEL: v_fdiv_f64_ulp25:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0		; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]		; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]		; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]		; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_fdiv_f64_ulp25:		; GFX9-LABEL: v_fdiv_f64_ulp25:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0		; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]		; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]		; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]		; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_fdiv_f64_ulp25:		; GFX10-LABEL: v_fdiv_f64_ulp25:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[4:5], s4, v[2:3], v[2:3], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]
; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_fdiv_f64_ulp25:		; GFX11-LABEL: v_fdiv_f64_ulp25:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[4:5], null, v[2:3], v[2:3], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX11-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv double %a, %b, !fpmath !0		%fdiv = fdiv double %a, %b, !fpmath !0
ret double %fdiv		ret double %fdiv
}		}

define double @v_rcp_f64(double %x) {		define double @v_rcp_f64(double %x) {
; GFX6-LABEL: v_rcp_f64:		; GFX6-LABEL: v_rcp_f64:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], 1.0, v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX6-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX6-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX6-NEXT: v_mov_b32_e32 v10, 0x3ff00000		; GFX6-NEXT: v_mov_b32_e32 v10, 0x3ff00000
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v10, v9		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v10, v9
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v3		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v3
; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX6-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]		; GFX6-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
; GFX6-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]		; GFX6-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
; GFX6-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]		; GFX6-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_rcp_f64:		; GFX8-LABEL: v_rcp_f64:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX8-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX8-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX8-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0
; GFX8-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0
; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]		; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]
; GFX8-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]		; GFX8-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]
; GFX8-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]		; GFX8-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]
; GFX8-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]		; GFX8-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_rcp_f64:		; GFX9-LABEL: v_rcp_f64:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX9-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX9-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX9-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0
; GFX9-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0
; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]		; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]
; GFX9-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]		; GFX9-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]
; GFX9-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]		; GFX9-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]
; GFX9-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]		; GFX9-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_rcp_f64:		; GFX10-LABEL: v_rcp_f64:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[2:3], s4, v[0:1], v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX10-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX10-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]		; GFX10-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
; GFX10-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]		; GFX10-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
; GFX10-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]		; GFX10-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_rcp_f64:		; GFX11-LABEL: v_rcp_f64:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[2:3], null, v[0:1], v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX11-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]		; GFX11-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
; GFX11-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]		; GFX11-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]		; GFX11-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv double 1.0, %x		%fdiv = fdiv double 1.0, %x
ret double %fdiv		ret double %fdiv
}		}

define double @v_rcp_f64_arcp(double %x) {		define double @v_rcp_f64_arcp(double %x) {
; GFX6-LABEL: v_rcp_f64_arcp:		; GFX6-LABEL: v_rcp_f64_arcp:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], 1.0, v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX6-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX6-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX6-NEXT: v_mov_b32_e32 v10, 0x3ff00000		; GFX6-NEXT: v_mov_b32_e32 v10, 0x3ff00000
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v10, v9		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v10, v9
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v3		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v3
; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX6-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]		; GFX6-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
; GFX6-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]		; GFX6-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
; GFX6-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]		; GFX6-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_rcp_f64_arcp:		; GFX8-LABEL: v_rcp_f64_arcp:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX8-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX8-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX8-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0
; GFX8-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0
; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]		; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]
; GFX8-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]		; GFX8-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]
; GFX8-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]		; GFX8-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]
; GFX8-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]		; GFX8-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_rcp_f64_arcp:		; GFX9-LABEL: v_rcp_f64_arcp:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX9-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX9-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX9-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0
; GFX9-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0
; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]		; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]
; GFX9-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]		; GFX9-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]
; GFX9-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]		; GFX9-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]
; GFX9-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]		; GFX9-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_rcp_f64_arcp:		; GFX10-LABEL: v_rcp_f64_arcp:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[2:3], s4, v[0:1], v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX10-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX10-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]		; GFX10-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
; GFX10-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]		; GFX10-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
; GFX10-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]		; GFX10-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_rcp_f64_arcp:		; GFX11-LABEL: v_rcp_f64_arcp:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[2:3], null, v[0:1], v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX11-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; [...]		; [...]
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv arcp afn double 1.0, %x		%fdiv = fdiv arcp afn double 1.0, %x
ret double %fdiv		ret double %fdiv
}		}

define double @v_rcp_f64_ulp25(double %x) {		define double @v_rcp_f64_ulp25(double %x) {
; GFX6-LABEL: v_rcp_f64_ulp25:		; GFX6-LABEL: v_rcp_f64_ulp25:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], 1.0, v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX6-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX6-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX6-NEXT: v_mov_b32_e32 v10, 0x3ff00000		; GFX6-NEXT: v_mov_b32_e32 v10, 0x3ff00000
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v10, v9		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v10, v9
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v3		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v3
; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX6-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX6-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX6-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]		; GFX6-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
; GFX6-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]		; GFX6-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
; GFX6-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]		; GFX6-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_rcp_f64_ulp25:		; GFX8-LABEL: v_rcp_f64_ulp25:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX8-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX8-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX8-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0
; GFX8-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0
; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]		; GFX8-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]
; GFX8-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]		; GFX8-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]
; GFX8-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]		; GFX8-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]
; GFX8-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]		; GFX8-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_rcp_f64_ulp25:		; GFX9-LABEL: v_rcp_f64_ulp25:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[2:3], s[4:5], v[0:1], v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX9-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX9-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX9-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, 1.0, v[0:1], 1.0
; GFX9-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[2:3], v[4:5], 1.0
; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]		; GFX9-NEXT: v_fma_f64 v[4:5], v[4:5], v[8:9], v[4:5]
; GFX9-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]		; GFX9-NEXT: v_mul_f64 v[8:9], v[6:7], v[4:5]
; GFX9-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]		; GFX9-NEXT: v_fma_f64 v[2:3], -v[2:3], v[8:9], v[6:7]
; GFX9-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]		; GFX9-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[8:9]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_rcp_f64_ulp25:		; GFX10-LABEL: v_rcp_f64_ulp25:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[2:3], s4, v[0:1], v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX10-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX10-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX10-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]		; GFX10-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
; GFX10-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]		; GFX10-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
; GFX10-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]		; GFX10-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[2:3], v[0:1], 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_rcp_f64_ulp25:		; GFX11-LABEL: v_rcp_f64_ulp25:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[2:3], null, v[0:1], v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, 1.0, v[0:1], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]		; GFX11-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]		; GFX11-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
; [...]		; [...]
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn double %a, %b, !fpmath !0		%fdiv = fdiv afn double %a, %b, !fpmath !0
ret double %fdiv		ret double %fdiv
}		}

define double @v_fdiv_f64_arcp_ulp25(double %a, double %b) {		define double @v_fdiv_f64_arcp_ulp25(double %a, double %b) {
; GFX6-LABEL: v_fdiv_f64_arcp_ulp25:		; GFX6-LABEL: v_fdiv_f64_arcp_ulp25:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX6-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[0:1], v[2:3], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[2:3], v[0:1]
; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v5		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v5
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v11		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v11
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_fdiv_f64_arcp_ulp25:		; GFX8-LABEL: v_fdiv_f64_arcp_ulp25:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0		; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]		; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]		; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]		; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_fdiv_f64_arcp_ulp25:		; GFX9-LABEL: v_fdiv_f64_arcp_ulp25:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[2:3], v[2:3], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0		; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]		; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]		; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]		; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_fdiv_f64_arcp_ulp25:		; GFX10-LABEL: v_fdiv_f64_arcp_ulp25:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[4:5], s4, v[2:3], v[2:3], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]
; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_fdiv_f64_arcp_ulp25:		; GFX11-LABEL: v_fdiv_f64_arcp_ulp25:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[4:5], null, v[2:3], v[2:3], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[2:3], v[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]		; GFX11-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]		; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]		; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[2:3], v[0:1]
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv arcp double %a, %b, !fpmath !0		%fdiv = fdiv arcp double %a, %b, !fpmath !0
ret double %fdiv		ret double %fdiv
}		}

define <2 x double> @v_fdiv_v2f64(<2 x double> %a, <2 x double> %b) {		define <2 x double> @v_fdiv_v2f64(<2 x double> %a, <2 x double> %b) {
; GFX6-LABEL: v_fdiv_v2f64:		; GFX6-LABEL: v_fdiv_v2f64:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX6-NEXT: v_div_scale_f64 v[14:15], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX6-NEXT: v_div_scale_f64 v[14:15], vcc, v[0:1], v[4:5], v[0:1]
; GFX6-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]		; GFX6-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX6-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[0:1], v[4:5], v[0:1]		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v5, v9
; GFX6-NEXT: v_rcp_f64_e32 v[16:17], v[14:15]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v15
		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0		; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v19
; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]		; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v5, v9
; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0		; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]		; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX6-NEXT: v_fma_f64 v[12:13], -v[14:15], v[16:17], 1.0		; GFX6-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v7, v15		; GFX6-NEXT: v_fma_f64 v[14:15], -v[8:9], v[12:13], v[14:15]
; GFX6-NEXT: v_fma_f64 v[12:13], v[16:17], v[12:13], v[16:17]		; GFX6-NEXT: v_div_fmas_f64 v[8:9], v[14:15], v[10:11], v[12:13]
; GFX6-NEXT: v_mul_f64 v[16:17], v[18:19], v[10:11]		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX6-NEXT: v_fma_f64 v[18:19], -v[8:9], v[16:17], v[18:19]		; GFX6-NEXT: v_div_scale_f64 v[16:17], vcc, v[2:3], v[6:7], v[2:3]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[14:15], v[12:13], 1.0		; GFX6-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX6-NEXT: v_div_fmas_f64 v[10:11], v[18:19], v[10:11], v[16:17]		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v7, v11
; GFX6-NEXT: v_fma_f64 v[8:9], v[12:13], v[8:9], v[12:13]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v3, v17
; GFX6-NEXT: v_div_scale_f64 v[12:13], s[6:7], v[2:3], v[6:7], v[2:3]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[10:11], v[4:5], v[0:1]
; GFX6-NEXT: v_mul_f64 v[16:17], v[12:13], v[8:9]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v3, v13
; GFX6-NEXT: v_fma_f64 v[18:19], -v[14:15], v[16:17], v[12:13]
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: s_nop 1		; GFX6-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
; GFX6-NEXT: v_div_fmas_f64 v[8:9], v[18:19], v[8:9], v[16:17]		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[8:9], v[6:7], v[2:3]		; GFX6-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX6-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX6-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX6-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX6-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX6-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
		; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_fdiv_v2f64:		; GFX8-LABEL: v_fdiv_v2f64:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX8-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX8-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX8-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX8-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX8-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]
; GFX8-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX8-NEXT: v_fma_f64 v[14:15], -v[8:9], v[10:11], 1.0
; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX8-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX8-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
; GFX8-NEXT: v_div_scale_f64 v[18:19], vcc, v[0:1], v[4:5], v[0:1]		; GFX8-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX8-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX8-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX8-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX8-NEXT: v_mul_f64 v[16:17], v[18:19], v[12:13]
; GFX8-NEXT: v_fma_f64 v[8:9], -v[8:9], v[16:17], v[18:19]
; GFX8-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[2:3], v[6:7], v[2:3]
; GFX8-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[16:17]
; GFX8-NEXT: s_mov_b64 vcc, s[4:5]
; GFX8-NEXT: v_mul_f64 v[20:21], v[18:19], v[14:15]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX8-NEXT: v_fma_f64 v[10:11], -v[10:11], v[20:21], v[18:19]		; GFX8-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX8-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[20:21]		; GFX8-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX8-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]
		; GFX8-NEXT: v_fma_f64 v[16:17], -v[10:11], v[12:13], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]
		; GFX8-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]
		; GFX8-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]
		; GFX8-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]
; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_fdiv_v2f64:		; GFX9-LABEL: v_fdiv_v2f64:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX9-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX9-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX9-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]
; GFX9-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX9-NEXT: v_fma_f64 v[14:15], -v[8:9], v[10:11], 1.0
; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX9-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
; GFX9-NEXT: v_div_scale_f64 v[18:19], vcc, v[0:1], v[4:5], v[0:1]		; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX9-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX9-NEXT: v_mul_f64 v[16:17], v[18:19], v[12:13]
; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[16:17], v[18:19]
; GFX9-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[2:3], v[6:7], v[2:3]
; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[16:17]
; GFX9-NEXT: s_mov_b64 vcc, s[4:5]
; GFX9-NEXT: v_mul_f64 v[20:21], v[18:19], v[14:15]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX9-NEXT: v_fma_f64 v[10:11], -v[10:11], v[20:21], v[18:19]		; GFX9-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX9-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[20:21]		; GFX9-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX9-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]
		; GFX9-NEXT: v_fma_f64 v[16:17], -v[10:11], v[12:13], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]
		; GFX9-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]
		; GFX9-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]
		; GFX9-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]
; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_fdiv_v2f64:		; GFX10-LABEL: v_fdiv_v2f64:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[8:9], s4, v[4:5], v[4:5], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[4:5], v[4:5], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[10:11], s4, v[6:7], v[6:7], v[2:3]		; GFX10-NEXT: v_div_scale_f64 v[14:15], vcc_lo, v[0:1], v[4:5], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[20:21], vcc_lo, v[0:1], v[4:5], v[0:1]		; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX10-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX10-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX10-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX10-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[12:13], v[14:15]
; GFX10-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[12:13]
; GFX10-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[6:7], v[6:7], v[2:3]
; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX10-NEXT: v_div_scale_f64 v[16:17], vcc_lo, v[2:3], v[6:7], v[2:3]
; GFX10-NEXT: v_div_scale_f64 v[16:17], s4, v[2:3], v[6:7], v[2:3]
; GFX10-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]
; GFX10-NEXT: v_mul_f64 v[18:19], v[20:21], v[12:13]
; GFX10-NEXT: v_mul_f64 v[22:23], v[16:17], v[14:15]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[18:19], v[20:21]
; GFX10-NEXT: v_fma_f64 v[10:11], -v[10:11], v[22:23], v[16:17]
; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[18:19]
; GFX10-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[22:23]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
		; GFX10-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
		; GFX10-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX10-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX10-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX10-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_fdiv_v2f64:		; GFX11-LABEL: v_fdiv_v2f64:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[8:9], null, v[4:5], v[4:5], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[4:5], v[4:5], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[10:11], null, v[6:7], v[6:7], v[2:3]		; GFX11-NEXT: v_div_scale_f64 v[14:15], vcc_lo, v[0:1], v[4:5], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[20:21], vcc_lo, v[0:1], v[4:5], v[0:1]		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX11-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX11-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX11-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[12:13], v[14:15]
; GFX11-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[12:13]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[6:7], v[6:7], v[2:3]
; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX11-NEXT: v_div_scale_f64 v[16:17], vcc_lo, v[2:3], v[6:7], v[2:3]
; GFX11-NEXT: v_div_scale_f64 v[16:17], s0, v[2:3], v[6:7], v[2:3]
; GFX11-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_mul_f64 v[18:19], v[20:21], v[12:13]
; GFX11-NEXT: v_mul_f64 v[22:23], v[16:17], v[14:15]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[18:19], v[20:21]
; GFX11-NEXT: v_fma_f64 v[10:11], -v[10:11], v[22:23], v[16:17]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[18:19]
; GFX11-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[22:23]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
		; GFX11-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv <2 x double> %a, %b		%fdiv = fdiv <2 x double> %a, %b
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_fdiv_v2f64_afn(<2 x double> %a, <2 x double> %b) {		define <2 x double> @v_fdiv_v2f64_afn(<2 x double> %a, <2 x double> %b) {
; GCN-LABEL: v_fdiv_v2f64_afn:		; GCN-LABEL: v_fdiv_v2f64_afn:
; ...
; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn <2 x double> %a, %b		%fdiv = fdiv afn <2 x double> %a, %b
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_fdiv_v2f64_ulp25(<2 x double> %a, <2 x double> %b) {		define <2 x double> @v_fdiv_v2f64_ulp25(<2 x double> %a, <2 x double> %b) {
; GFX6-LABEL: v_fdiv_v2f64_ulp25:		; GFX6-LABEL: v_fdiv_v2f64_ulp25:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX6-NEXT: v_div_scale_f64 v[14:15], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX6-NEXT: v_div_scale_f64 v[14:15], vcc, v[0:1], v[4:5], v[0:1]
; GFX6-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]		; GFX6-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX6-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[0:1], v[4:5], v[0:1]		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v5, v9
; GFX6-NEXT: v_rcp_f64_e32 v[16:17], v[14:15]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v15
		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0		; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v19
; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]		; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v5, v9
; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0		; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]		; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX6-NEXT: v_fma_f64 v[12:13], -v[14:15], v[16:17], 1.0		; GFX6-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v7, v15		; GFX6-NEXT: v_fma_f64 v[14:15], -v[8:9], v[12:13], v[14:15]
; GFX6-NEXT: v_fma_f64 v[12:13], v[16:17], v[12:13], v[16:17]		; GFX6-NEXT: v_div_fmas_f64 v[8:9], v[14:15], v[10:11], v[12:13]
; GFX6-NEXT: v_mul_f64 v[16:17], v[18:19], v[10:11]		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX6-NEXT: v_fma_f64 v[18:19], -v[8:9], v[16:17], v[18:19]		; GFX6-NEXT: v_div_scale_f64 v[16:17], vcc, v[2:3], v[6:7], v[2:3]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[14:15], v[12:13], 1.0		; GFX6-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX6-NEXT: v_div_fmas_f64 v[10:11], v[18:19], v[10:11], v[16:17]		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v7, v11
; GFX6-NEXT: v_fma_f64 v[8:9], v[12:13], v[8:9], v[12:13]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v3, v17
; GFX6-NEXT: v_div_scale_f64 v[12:13], s[6:7], v[2:3], v[6:7], v[2:3]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[10:11], v[4:5], v[0:1]
; GFX6-NEXT: v_mul_f64 v[16:17], v[12:13], v[8:9]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v3, v13
; GFX6-NEXT: v_fma_f64 v[18:19], -v[14:15], v[16:17], v[12:13]
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: s_nop 1		; GFX6-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
; GFX6-NEXT: v_div_fmas_f64 v[8:9], v[18:19], v[8:9], v[16:17]		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[8:9], v[6:7], v[2:3]		; GFX6-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX6-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX6-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX6-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX6-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX6-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
		; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_fdiv_v2f64_ulp25:		; GFX8-LABEL: v_fdiv_v2f64_ulp25:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX8-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX8-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX8-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX8-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX8-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]
; GFX8-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX8-NEXT: v_fma_f64 v[14:15], -v[8:9], v[10:11], 1.0
; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX8-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX8-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
; GFX8-NEXT: v_div_scale_f64 v[18:19], vcc, v[0:1], v[4:5], v[0:1]		; GFX8-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX8-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX8-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX8-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX8-NEXT: v_mul_f64 v[16:17], v[18:19], v[12:13]
; GFX8-NEXT: v_fma_f64 v[8:9], -v[8:9], v[16:17], v[18:19]
; GFX8-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[2:3], v[6:7], v[2:3]
; GFX8-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[16:17]
; GFX8-NEXT: s_mov_b64 vcc, s[4:5]
; GFX8-NEXT: v_mul_f64 v[20:21], v[18:19], v[14:15]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX8-NEXT: v_fma_f64 v[10:11], -v[10:11], v[20:21], v[18:19]		; GFX8-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX8-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[20:21]		; GFX8-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX8-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]
		; GFX8-NEXT: v_fma_f64 v[16:17], -v[10:11], v[12:13], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]
		; GFX8-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]
		; GFX8-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]
		; GFX8-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]
; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_fdiv_v2f64_ulp25:		; GFX9-LABEL: v_fdiv_v2f64_ulp25:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX9-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX9-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX9-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]
; GFX9-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX9-NEXT: v_fma_f64 v[14:15], -v[8:9], v[10:11], 1.0
; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX9-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
; GFX9-NEXT: v_div_scale_f64 v[18:19], vcc, v[0:1], v[4:5], v[0:1]		; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX9-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX9-NEXT: v_mul_f64 v[16:17], v[18:19], v[12:13]
; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[16:17], v[18:19]
; GFX9-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[2:3], v[6:7], v[2:3]
; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[16:17]
; GFX9-NEXT: s_mov_b64 vcc, s[4:5]
; GFX9-NEXT: v_mul_f64 v[20:21], v[18:19], v[14:15]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX9-NEXT: v_fma_f64 v[10:11], -v[10:11], v[20:21], v[18:19]		; GFX9-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX9-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[20:21]		; GFX9-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX9-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]
		; GFX9-NEXT: v_fma_f64 v[16:17], -v[10:11], v[12:13], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]
		; GFX9-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]
		; GFX9-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]
		; GFX9-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]
; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_fdiv_v2f64_ulp25:		; GFX10-LABEL: v_fdiv_v2f64_ulp25:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[8:9], s4, v[4:5], v[4:5], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[4:5], v[4:5], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[10:11], s4, v[6:7], v[6:7], v[2:3]		; GFX10-NEXT: v_div_scale_f64 v[14:15], vcc_lo, v[0:1], v[4:5], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[20:21], vcc_lo, v[0:1], v[4:5], v[0:1]		; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX10-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX10-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX10-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX10-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[12:13], v[14:15]
; GFX10-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[12:13]
; GFX10-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[6:7], v[6:7], v[2:3]
; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX10-NEXT: v_div_scale_f64 v[16:17], vcc_lo, v[2:3], v[6:7], v[2:3]
; GFX10-NEXT: v_div_scale_f64 v[16:17], s4, v[2:3], v[6:7], v[2:3]
; GFX10-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]
; GFX10-NEXT: v_mul_f64 v[18:19], v[20:21], v[12:13]
; GFX10-NEXT: v_mul_f64 v[22:23], v[16:17], v[14:15]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[18:19], v[20:21]
; GFX10-NEXT: v_fma_f64 v[10:11], -v[10:11], v[22:23], v[16:17]
; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[18:19]
; GFX10-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[22:23]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
		; GFX10-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
		; GFX10-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX10-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX10-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX10-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_fdiv_v2f64_ulp25:		; GFX11-LABEL: v_fdiv_v2f64_ulp25:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[8:9], null, v[4:5], v[4:5], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[4:5], v[4:5], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[10:11], null, v[6:7], v[6:7], v[2:3]		; GFX11-NEXT: v_div_scale_f64 v[14:15], vcc_lo, v[0:1], v[4:5], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[20:21], vcc_lo, v[0:1], v[4:5], v[0:1]		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX11-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX11-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX11-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[12:13], v[14:15]
; GFX11-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[12:13]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[6:7], v[6:7], v[2:3]
; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX11-NEXT: v_div_scale_f64 v[16:17], vcc_lo, v[2:3], v[6:7], v[2:3]
; GFX11-NEXT: v_div_scale_f64 v[16:17], s0, v[2:3], v[6:7], v[2:3]
; GFX11-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_mul_f64 v[18:19], v[20:21], v[12:13]
; GFX11-NEXT: v_mul_f64 v[22:23], v[16:17], v[14:15]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[18:19], v[20:21]
; GFX11-NEXT: v_fma_f64 v[10:11], -v[10:11], v[22:23], v[16:17]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[18:19]
; GFX11-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[22:23]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
		; GFX11-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv <2 x double> %a, %b, !fpmath !0		%fdiv = fdiv <2 x double> %a, %b, !fpmath !0
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_rcp_v2f64(<2 x double> %x) {		define <2 x double> @v_rcp_v2f64(<2 x double> %x) {
; GFX6-LABEL: v_rcp_v2f64:		; GFX6-LABEL: v_rcp_v2f64:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX6-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[2:3], v[2:3], 1.0		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[0:1], 1.0
; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX6-NEXT: v_mov_b32_e32 v18, 0x3ff00000		; GFX6-NEXT: v_mov_b32_e32 v14, 0x3ff00000
; GFX6-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v14, v11
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], 1.0, v[0:1], 1.0
; GFX6-NEXT: v_fma_f64 v[12:13], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[12:13], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v18, v9
; GFX6-NEXT: v_mul_f64 v[12:13], v[8:9], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v5		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v5
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[12:13], v[8:9]		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], -v[10:11], v[14:15], 1.0
; GFX6-NEXT: v_div_scale_f64 v[16:17], s[6:7], 1.0, v[2:3], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], v[14:15], v[4:5], v[14:15]
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_mul_f64 v[14:15], v[16:17], v[4:5]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_div_fmas_f64 v[6:7], v[8:9], v[6:7], v[12:13]		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[8:9], -v[10:11], v[14:15], v[16:17]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v18, v17		; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v11		; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
		; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
		; GFX6-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
		; GFX6-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[2:3], 1.0
		; GFX6-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v7
		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v14, v13
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[6:7], v[0:1], 1.0		; GFX6-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
; GFX6-NEXT: s_nop 0		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[8:9], v[4:5], v[14:15]		; GFX6-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[4:5], v[2:3], 1.0		; GFX6-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX6-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX6-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX6-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX6-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
		; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_rcp_v2f64:		; GFX8-LABEL: v_rcp_v2f64:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX8-NEXT: v_div_scale_f64 v[6:7], s[4:5], v[2:3], v[2:3], 1.0		; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX8-NEXT: v_div_scale_f64 v[16:17], s[4:5], 1.0, v[2:3], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX8-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX8-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX8-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[0:1], 1.0		; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX8-NEXT: v_fma_f64 v[14:15], -v[4:5], v[8:9], 1.0		; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
; GFX8-NEXT: v_fma_f64 v[18:19], -v[6:7], v[10:11], 1.0
; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[14:15], v[8:9]
; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[18:19], v[10:11]
; GFX8-NEXT: v_mul_f64 v[14:15], v[12:13], v[8:9]
; GFX8-NEXT: v_mul_f64 v[18:19], v[16:17], v[10:11]
; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[12:13]
; GFX8-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[16:17]
; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX8-NEXT: s_mov_b64 vcc, s[4:5]
; GFX8-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX8-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX8-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX8-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[2:3], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], -v[6:7], v[8:9], 1.0
		; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]
		; GFX8-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
		; GFX8-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
		; GFX8-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[12:13]
; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_rcp_v2f64:		; GFX9-LABEL: v_rcp_v2f64:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX9-NEXT: v_div_scale_f64 v[6:7], s[4:5], v[2:3], v[2:3], 1.0		; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX9-NEXT: v_div_scale_f64 v[16:17], s[4:5], 1.0, v[2:3], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX9-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[0:1], 1.0		; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX9-NEXT: v_fma_f64 v[14:15], -v[4:5], v[8:9], 1.0		; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
; GFX9-NEXT: v_fma_f64 v[18:19], -v[6:7], v[10:11], 1.0
; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[14:15], v[8:9]
; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[18:19], v[10:11]
; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[8:9]
; GFX9-NEXT: v_mul_f64 v[18:19], v[16:17], v[10:11]
; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[12:13]
; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[16:17]
; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX9-NEXT: s_mov_b64 vcc, s[4:5]
; GFX9-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX9-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[2:3], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], -v[6:7], v[8:9], 1.0
		; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]
		; GFX9-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
		; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
		; GFX9-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[12:13]
; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_rcp_v2f64:		; GFX10-LABEL: v_rcp_v2f64:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[4:5], s4, v[0:1], v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[0:1], v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[6:7], s4, v[2:3], v[2:3], 1.0		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, 1.0, v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[16:17], vcc_lo, 1.0, v[0:1], 1.0		; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX10-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX10-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX10-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[2:3], v[2:3], 1.0
; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX10-NEXT: v_div_scale_f64 v[12:13], vcc_lo, 1.0, v[2:3], 1.0
; GFX10-NEXT: v_div_scale_f64 v[12:13], s4, 1.0, v[2:3], 1.0
; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX10-NEXT: v_mul_f64 v[14:15], v[16:17], v[8:9]
; GFX10-NEXT: v_mul_f64 v[18:19], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[16:17]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[12:13]
; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX10-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX10-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX10-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_rcp_v2f64:		; GFX11-LABEL: v_rcp_v2f64:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[4:5], null, v[0:1], v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[0:1], v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[6:7], null, v[2:3], v[2:3], 1.0		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, 1.0, v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[16:17], vcc_lo, 1.0, v[0:1], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]
; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX11-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX11-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX11-NEXT: v_div_scale_f64 v[12:13], s0, 1.0, v[2:3], 1.0		; GFX11-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[2:3], v[2:3], 1.0
; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX11-NEXT: v_div_scale_f64 v[12:13], vcc_lo, 1.0, v[2:3], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_mul_f64 v[14:15], v[16:17], v[8:9]
; GFX11-NEXT: v_mul_f64 v[18:19], v[12:13], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[16:17]
; GFX11-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[12:13]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX11-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX11-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX11-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv <2 x double> <double 1.0, double 1.0>, %x		%fdiv = fdiv <2 x double> <double 1.0, double 1.0>, %x
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_rcp_v2f64_arcp(<2 x double> %x) {		define <2 x double> @v_rcp_v2f64_arcp(<2 x double> %x) {
; GFX6-LABEL: v_rcp_v2f64_arcp:		; GFX6-LABEL: v_rcp_v2f64_arcp:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX6-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[2:3], v[2:3], 1.0		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[0:1], 1.0
; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX6-NEXT: v_mov_b32_e32 v18, 0x3ff00000		; GFX6-NEXT: v_mov_b32_e32 v14, 0x3ff00000
; GFX6-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v14, v11
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], 1.0, v[0:1], 1.0
; GFX6-NEXT: v_fma_f64 v[12:13], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[12:13], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v18, v9
; GFX6-NEXT: v_mul_f64 v[12:13], v[8:9], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v5		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v5
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[12:13], v[8:9]		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], -v[10:11], v[14:15], 1.0
; GFX6-NEXT: v_div_scale_f64 v[16:17], s[6:7], 1.0, v[2:3], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], v[14:15], v[4:5], v[14:15]
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_mul_f64 v[14:15], v[16:17], v[4:5]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_div_fmas_f64 v[6:7], v[8:9], v[6:7], v[12:13]		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[8:9], -v[10:11], v[14:15], v[16:17]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v18, v17		; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v11		; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
		; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
		; GFX6-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
		; GFX6-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[2:3], 1.0
		; GFX6-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v7
		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v14, v13
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[6:7], v[0:1], 1.0		; GFX6-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
; GFX6-NEXT: s_nop 0		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[8:9], v[4:5], v[14:15]		; GFX6-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[4:5], v[2:3], 1.0		; GFX6-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX6-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX6-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX6-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX6-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
		; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_rcp_v2f64_arcp:		; GFX8-LABEL: v_rcp_v2f64_arcp:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX8-NEXT: v_div_scale_f64 v[6:7], s[4:5], v[2:3], v[2:3], 1.0		; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX8-NEXT: v_div_scale_f64 v[16:17], s[4:5], 1.0, v[2:3], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX8-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX8-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX8-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[0:1], 1.0		; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX8-NEXT: v_fma_f64 v[14:15], -v[4:5], v[8:9], 1.0		; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
; GFX8-NEXT: v_fma_f64 v[18:19], -v[6:7], v[10:11], 1.0
; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[14:15], v[8:9]
; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[18:19], v[10:11]
; GFX8-NEXT: v_mul_f64 v[14:15], v[12:13], v[8:9]
; GFX8-NEXT: v_mul_f64 v[18:19], v[16:17], v[10:11]
; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[12:13]
; GFX8-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[16:17]
; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX8-NEXT: s_mov_b64 vcc, s[4:5]
; GFX8-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX8-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX8-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX8-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[2:3], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], -v[6:7], v[8:9], 1.0
		; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]
		; GFX8-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
		; GFX8-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
		; GFX8-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[12:13]
; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_rcp_v2f64_arcp:		; GFX9-LABEL: v_rcp_v2f64_arcp:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX9-NEXT: v_div_scale_f64 v[6:7], s[4:5], v[2:3], v[2:3], 1.0		; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX9-NEXT: v_div_scale_f64 v[16:17], s[4:5], 1.0, v[2:3], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX9-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[0:1], 1.0		; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX9-NEXT: v_fma_f64 v[14:15], -v[4:5], v[8:9], 1.0		; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
; GFX9-NEXT: v_fma_f64 v[18:19], -v[6:7], v[10:11], 1.0
; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[14:15], v[8:9]
; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[18:19], v[10:11]
; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[8:9]
; GFX9-NEXT: v_mul_f64 v[18:19], v[16:17], v[10:11]
; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[12:13]
; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[16:17]
; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX9-NEXT: s_mov_b64 vcc, s[4:5]
; GFX9-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX9-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[2:3], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], -v[6:7], v[8:9], 1.0
		; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]
		; GFX9-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
		; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
		; GFX9-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[12:13]
; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_rcp_v2f64_arcp:		; GFX10-LABEL: v_rcp_v2f64_arcp:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[4:5], s4, v[0:1], v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[0:1], v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[6:7], s4, v[2:3], v[2:3], 1.0		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, 1.0, v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[16:17], vcc_lo, 1.0, v[0:1], 1.0		; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX10-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX10-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX10-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[2:3], v[2:3], 1.0
; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX10-NEXT: v_div_scale_f64 v[12:13], vcc_lo, 1.0, v[2:3], 1.0
; GFX10-NEXT: v_div_scale_f64 v[12:13], s4, 1.0, v[2:3], 1.0
; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX10-NEXT: v_mul_f64 v[14:15], v[16:17], v[8:9]
; GFX10-NEXT: v_mul_f64 v[18:19], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[16:17]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[12:13]
; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX10-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX10-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX10-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_rcp_v2f64_arcp:		; GFX11-LABEL: v_rcp_v2f64_arcp:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[4:5], null, v[0:1], v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[0:1], v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[6:7], null, v[2:3], v[2:3], 1.0		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, 1.0, v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[16:17], vcc_lo, 1.0, v[0:1], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]
; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX11-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX11-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX11-NEXT: v_div_scale_f64 v[12:13], s0, 1.0, v[2:3], 1.0		; GFX11-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[2:3], v[2:3], 1.0
; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX11-NEXT: v_div_scale_f64 v[12:13], vcc_lo, 1.0, v[2:3], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_mul_f64 v[14:15], v[16:17], v[8:9]
; GFX11-NEXT: v_mul_f64 v[18:19], v[12:13], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[16:17]
; GFX11-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[12:13]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX11-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX11-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX11-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv arcp <2 x double> <double 1.0, double 1.0>, %x		%fdiv = fdiv arcp <2 x double> <double 1.0, double 1.0>, %x
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_rcp_v2f64_arcp_afn(<2 x double> %x) {		define <2 x double> @v_rcp_v2f64_arcp_afn(<2 x double> %x) {
; GCN-LABEL: v_rcp_v2f64_arcp_afn:		; GCN-LABEL: v_rcp_v2f64_arcp_afn:
		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv arcp afn <2 x double> <double 1.0, double 1.0>, %x		%fdiv = fdiv arcp afn <2 x double> <double 1.0, double 1.0>, %x
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_rcp_v2f64_ulp25(<2 x double> %x) {		define <2 x double> @v_rcp_v2f64_ulp25(<2 x double> %x) {
; GFX6-LABEL: v_rcp_v2f64_ulp25:		; GFX6-LABEL: v_rcp_v2f64_ulp25:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX6-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX6-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[2:3], v[2:3], 1.0		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[0:1], 1.0
; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]		; GFX6-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX6-NEXT: v_mov_b32_e32 v18, 0x3ff00000		; GFX6-NEXT: v_mov_b32_e32 v14, 0x3ff00000
; GFX6-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v14, v11
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], 1.0, v[0:1], 1.0
; GFX6-NEXT: v_fma_f64 v[12:13], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[12:13], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v18, v9
; GFX6-NEXT: v_mul_f64 v[12:13], v[8:9], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v5		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v1, v5
; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[12:13], v[8:9]		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], -v[10:11], v[14:15], 1.0
; GFX6-NEXT: v_div_scale_f64 v[16:17], s[6:7], 1.0, v[2:3], 1.0
; GFX6-NEXT: v_fma_f64 v[4:5], v[14:15], v[4:5], v[14:15]
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_mul_f64 v[14:15], v[16:17], v[4:5]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_div_fmas_f64 v[6:7], v[8:9], v[6:7], v[12:13]		; GFX6-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX6-NEXT: v_fma_f64 v[8:9], -v[10:11], v[14:15], v[16:17]		; GFX6-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v18, v17		; GFX6-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v11		; GFX6-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
		; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
		; GFX6-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
		; GFX6-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[2:3], 1.0
		; GFX6-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v3, v7
		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v14, v13
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[6:7], v[0:1], 1.0		; GFX6-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
; GFX6-NEXT: s_nop 0		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
; GFX6-NEXT: v_div_fmas_f64 v[4:5], v[8:9], v[4:5], v[14:15]		; GFX6-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[4:5], v[2:3], 1.0		; GFX6-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX6-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX6-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX6-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX6-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
		; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_rcp_v2f64_ulp25:		; GFX8-LABEL: v_rcp_v2f64_ulp25:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX8-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX8-NEXT: v_div_scale_f64 v[6:7], s[4:5], v[2:3], v[2:3], 1.0		; GFX8-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX8-NEXT: v_div_scale_f64 v[16:17], s[4:5], 1.0, v[2:3], 1.0		; GFX8-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX8-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX8-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX8-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX8-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX8-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX8-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX8-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[0:1], 1.0		; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX8-NEXT: v_fma_f64 v[14:15], -v[4:5], v[8:9], 1.0		; GFX8-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
; GFX8-NEXT: v_fma_f64 v[18:19], -v[6:7], v[10:11], 1.0
; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[14:15], v[8:9]
; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[18:19], v[10:11]
; GFX8-NEXT: v_mul_f64 v[14:15], v[12:13], v[8:9]
; GFX8-NEXT: v_mul_f64 v[18:19], v[16:17], v[10:11]
; GFX8-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[12:13]
; GFX8-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[16:17]
; GFX8-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX8-NEXT: s_mov_b64 vcc, s[4:5]
; GFX8-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX8-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX8-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX8-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[2:3], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], -v[6:7], v[8:9], 1.0
		; GFX8-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]
		; GFX8-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
		; GFX8-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
		; GFX8-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[12:13]
; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_rcp_v2f64_ulp25:		; GFX9-LABEL: v_rcp_v2f64_ulp25:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[4:5], s[4:5], v[0:1], v[0:1], 1.0		; GFX9-NEXT: v_div_scale_f64 v[4:5], vcc, v[0:1], v[0:1], 1.0
; GFX9-NEXT: v_div_scale_f64 v[6:7], s[4:5], v[2:3], v[2:3], 1.0		; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX9-NEXT: v_div_scale_f64 v[16:17], s[4:5], 1.0, v[2:3], 1.0		; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, 1.0, v[0:1], 1.0
; GFX9-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX9-NEXT: v_fma_f64 v[10:11], -v[4:5], v[6:7], 1.0
; GFX9-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[10:11], v[6:7]
; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, 1.0, v[0:1], 1.0		; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[10:11]
; GFX9-NEXT: v_fma_f64 v[14:15], -v[4:5], v[8:9], 1.0		; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, v[2:3], v[2:3], 1.0
; GFX9-NEXT: v_fma_f64 v[18:19], -v[6:7], v[10:11], 1.0
; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[14:15], v[8:9]
; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[18:19], v[10:11]
; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[8:9]
; GFX9-NEXT: v_mul_f64 v[18:19], v[16:17], v[10:11]
; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[12:13]
; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[16:17]
; GFX9-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX9-NEXT: s_mov_b64 vcc, s[4:5]
; GFX9-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX9-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, 1.0, v[2:3], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], -v[6:7], v[8:9], 1.0
		; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]
		; GFX9-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
		; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
		; GFX9-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[12:13]
; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_rcp_v2f64_ulp25:		; GFX10-LABEL: v_rcp_v2f64_ulp25:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[4:5], s4, v[0:1], v[0:1], 1.0		; GFX10-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[0:1], v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[6:7], s4, v[2:3], v[2:3], 1.0		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, 1.0, v[0:1], 1.0
; GFX10-NEXT: v_div_scale_f64 v[16:17], vcc_lo, 1.0, v[0:1], 1.0		; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX10-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX10-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX10-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX10-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX10-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[2:3], v[2:3], 1.0
; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX10-NEXT: v_div_scale_f64 v[12:13], vcc_lo, 1.0, v[2:3], 1.0
; GFX10-NEXT: v_div_scale_f64 v[12:13], s4, 1.0, v[2:3], 1.0
; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX10-NEXT: v_mul_f64 v[14:15], v[16:17], v[8:9]
; GFX10-NEXT: v_mul_f64 v[18:19], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[16:17]
; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[12:13]
; GFX10-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX10-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX10-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX10-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_rcp_v2f64_ulp25:		; GFX11-LABEL: v_rcp_v2f64_ulp25:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[4:5], null, v[0:1], v[0:1], 1.0		; GFX11-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[0:1], v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[6:7], null, v[2:3], v[2:3], 1.0		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, 1.0, v[0:1], 1.0
; GFX11-NEXT: v_div_scale_f64 v[16:17], vcc_lo, 1.0, v[0:1], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[4:5]
; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[12:13], -v[4:5], v[8:9], 1.0		; GFX11-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
; GFX11-NEXT: v_fma_f64 v[14:15], -v[6:7], v[10:11], 1.0		; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[12:13], v[8:9]		; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[6:7], v[8:9]
; GFX11-NEXT: v_div_scale_f64 v[12:13], s0, 1.0, v[2:3], 1.0		; GFX11-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[2:3], v[2:3], 1.0
; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]		; GFX11-NEXT: v_div_scale_f64 v[12:13], vcc_lo, 1.0, v[2:3], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_mul_f64 v[14:15], v[16:17], v[8:9]
; GFX11-NEXT: v_mul_f64 v[18:19], v[12:13], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_fma_f64 v[4:5], -v[4:5], v[14:15], v[16:17]
; GFX11-NEXT: v_fma_f64 v[6:7], -v[6:7], v[18:19], v[12:13]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fmas_f64 v[4:5], v[4:5], v[8:9], v[14:15]
; GFX11-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[10:11], v[18:19]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[4:5], v[0:1], 1.0
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
		; GFX11-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
		; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_mul_f64 v[10:11], v[12:13], v[8:9]
		; GFX11-NEXT: v_fma_f64 v[6:7], -v[6:7], v[10:11], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_div_fmas_f64 v[6:7], v[6:7], v[8:9], v[10:11]
; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0		; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[6:7], v[2:3], 1.0
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv <2 x double> <double 1.0, double 1.0>, %x, !fpmath !0		%fdiv = fdiv <2 x double> <double 1.0, double 1.0>, %x, !fpmath !0
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_fdiv_v2f64_afn_ulp25(<2 x double> %a, <2 x double> %b) {		define <2 x double> @v_fdiv_v2f64_afn_ulp25(<2 x double> %a, <2 x double> %b) {
; GCN-LABEL: v_fdiv_v2f64_afn_ulp25:		; GCN-LABEL: v_fdiv_v2f64_afn_ulp25:
; [...]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv afn <2 x double> %a, %b, !fpmath !0		%fdiv = fdiv afn <2 x double> %a, %b, !fpmath !0
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_fdiv_v2f64_arcp_ulp25(<2 x double> %a, <2 x double> %b) {		define <2 x double> @v_fdiv_v2f64_arcp_ulp25(<2 x double> %a, <2 x double> %b) {
; GFX6-LABEL: v_fdiv_v2f64_arcp_ulp25:		; GFX6-LABEL: v_fdiv_v2f64_arcp_ulp25:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX6-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX6-NEXT: v_div_scale_f64 v[14:15], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX6-NEXT: v_div_scale_f64 v[14:15], vcc, v[0:1], v[4:5], v[0:1]
; GFX6-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]		; GFX6-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX6-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[0:1], v[4:5], v[0:1]		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v5, v9
; GFX6-NEXT: v_rcp_f64_e32 v[16:17], v[14:15]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v15
		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0		; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v1, v19
; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]		; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v5, v9
; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0		; GFX6-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]		; GFX6-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX6-NEXT: v_fma_f64 v[12:13], -v[14:15], v[16:17], 1.0		; GFX6-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v7, v15		; GFX6-NEXT: v_fma_f64 v[14:15], -v[8:9], v[12:13], v[14:15]
; GFX6-NEXT: v_fma_f64 v[12:13], v[16:17], v[12:13], v[16:17]		; GFX6-NEXT: v_div_fmas_f64 v[8:9], v[14:15], v[10:11], v[12:13]
; GFX6-NEXT: v_mul_f64 v[16:17], v[18:19], v[10:11]		; GFX6-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX6-NEXT: v_fma_f64 v[18:19], -v[8:9], v[16:17], v[18:19]		; GFX6-NEXT: v_div_scale_f64 v[16:17], vcc, v[2:3], v[6:7], v[2:3]
; GFX6-NEXT: v_fma_f64 v[8:9], -v[14:15], v[12:13], 1.0		; GFX6-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX6-NEXT: v_div_fmas_f64 v[10:11], v[18:19], v[10:11], v[16:17]		; GFX6-NEXT: v_cmp_eq_u32_e64 s[4:5], v7, v11
; GFX6-NEXT: v_fma_f64 v[8:9], v[12:13], v[8:9], v[12:13]		; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v3, v17
; GFX6-NEXT: v_div_scale_f64 v[12:13], s[6:7], v[2:3], v[6:7], v[2:3]
; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[10:11], v[4:5], v[0:1]
; GFX6-NEXT: v_mul_f64 v[16:17], v[12:13], v[8:9]
; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, v3, v13
; GFX6-NEXT: v_fma_f64 v[18:19], -v[14:15], v[16:17], v[12:13]
; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]		; GFX6-NEXT: s_xor_b64 vcc, vcc, s[4:5]
; GFX6-NEXT: s_nop 1		; GFX6-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
; GFX6-NEXT: v_div_fmas_f64 v[8:9], v[18:19], v[8:9], v[16:17]		; GFX6-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[8:9], v[6:7], v[2:3]		; GFX6-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX6-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX6-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX6-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX6-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX6-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
		; GFX6-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX6-NEXT: s_setpc_b64 s[30:31]		; GFX6-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_fdiv_v2f64_arcp_ulp25:		; GFX8-LABEL: v_fdiv_v2f64_arcp_ulp25:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX8-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX8-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX8-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX8-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX8-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX8-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX8-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]
; GFX8-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX8-NEXT: v_fma_f64 v[14:15], -v[8:9], v[10:11], 1.0
; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX8-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX8-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX8-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
; GFX8-NEXT: v_div_scale_f64 v[18:19], vcc, v[0:1], v[4:5], v[0:1]		; GFX8-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX8-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX8-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX8-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX8-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX8-NEXT: v_mul_f64 v[16:17], v[18:19], v[12:13]
; GFX8-NEXT: v_fma_f64 v[8:9], -v[8:9], v[16:17], v[18:19]
; GFX8-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[2:3], v[6:7], v[2:3]
; GFX8-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[16:17]
; GFX8-NEXT: s_mov_b64 vcc, s[4:5]
; GFX8-NEXT: v_mul_f64 v[20:21], v[18:19], v[14:15]
; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX8-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX8-NEXT: v_fma_f64 v[10:11], -v[10:11], v[20:21], v[18:19]		; GFX8-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX8-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[20:21]		; GFX8-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX8-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]
		; GFX8-NEXT: v_fma_f64 v[16:17], -v[10:11], v[12:13], 1.0
		; GFX8-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]
		; GFX8-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]
		; GFX8-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]
		; GFX8-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]
; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX8-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX8-NEXT: s_setpc_b64 s[30:31]		; GFX8-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: v_fdiv_v2f64_arcp_ulp25:		; GFX9-LABEL: v_fdiv_v2f64_arcp_ulp25:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[4:5], v[4:5], v[0:1]		; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[4:5], v[4:5], v[0:1]
; GFX9-NEXT: v_div_scale_f64 v[10:11], s[4:5], v[6:7], v[6:7], v[2:3]		; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX9-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX9-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX9-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]
; GFX9-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX9-NEXT: v_fma_f64 v[14:15], -v[8:9], v[10:11], 1.0
; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[14:15], v[10:11]
; GFX9-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
; GFX9-NEXT: v_div_scale_f64 v[18:19], vcc, v[0:1], v[4:5], v[0:1]		; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
; GFX9-NEXT: v_fma_f64 v[16:17], -v[10:11], v[14:15], 1.0
; GFX9-NEXT: v_fma_f64 v[14:15], v[14:15], v[16:17], v[14:15]
; GFX9-NEXT: v_mul_f64 v[16:17], v[18:19], v[12:13]
; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[16:17], v[18:19]
; GFX9-NEXT: v_div_scale_f64 v[18:19], s[4:5], v[2:3], v[6:7], v[2:3]
; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[16:17]
; GFX9-NEXT: s_mov_b64 vcc, s[4:5]
; GFX9-NEXT: v_mul_f64 v[20:21], v[18:19], v[14:15]
; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX9-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
; GFX9-NEXT: v_fma_f64 v[10:11], -v[10:11], v[20:21], v[18:19]		; GFX9-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
; GFX9-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[20:21]		; GFX9-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX9-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]
		; GFX9-NEXT: v_fma_f64 v[16:17], -v[10:11], v[12:13], 1.0
		; GFX9-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]
		; GFX9-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]
		; GFX9-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]
		; GFX9-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]
; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX9-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: v_fdiv_v2f64_arcp_ulp25:		; GFX10-LABEL: v_fdiv_v2f64_arcp_ulp25:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_div_scale_f64 v[8:9], s4, v[4:5], v[4:5], v[0:1]		; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[4:5], v[4:5], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[10:11], s4, v[6:7], v[6:7], v[2:3]		; GFX10-NEXT: v_div_scale_f64 v[14:15], vcc_lo, v[0:1], v[4:5], v[0:1]
; GFX10-NEXT: v_div_scale_f64 v[20:21], vcc_lo, v[0:1], v[4:5], v[0:1]		; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX10-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX10-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX10-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX10-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX10-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[12:13], v[14:15]
; GFX10-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[12:13]
; GFX10-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[6:7], v[6:7], v[2:3]
; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX10-NEXT: v_div_scale_f64 v[16:17], vcc_lo, v[2:3], v[6:7], v[2:3]
; GFX10-NEXT: v_div_scale_f64 v[16:17], s4, v[2:3], v[6:7], v[2:3]
; GFX10-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]
; GFX10-NEXT: v_mul_f64 v[18:19], v[20:21], v[12:13]
; GFX10-NEXT: v_mul_f64 v[22:23], v[16:17], v[14:15]
; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[18:19], v[20:21]
; GFX10-NEXT: v_fma_f64 v[10:11], -v[10:11], v[22:23], v[16:17]
; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[18:19]
; GFX10-NEXT: s_mov_b32 vcc_lo, s4
; GFX10-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[22:23]
; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX10-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
		; GFX10-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
		; GFX10-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX10-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX10-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX10-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX10-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX10-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX10-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: v_fdiv_v2f64_arcp_ulp25:		; GFX11-LABEL: v_fdiv_v2f64_arcp_ulp25:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_div_scale_f64 v[8:9], null, v[4:5], v[4:5], v[0:1]		; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[4:5], v[4:5], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[10:11], null, v[6:7], v[6:7], v[2:3]		; GFX11-NEXT: v_div_scale_f64 v[14:15], vcc_lo, v[0:1], v[4:5], v[0:1]
; GFX11-NEXT: v_div_scale_f64 v[20:21], vcc_lo, v[0:1], v[4:5], v[0:1]		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
; GFX11-NEXT: v_rcp_f64_e32 v[12:13], v[8:9]		; GFX11-NEXT: s_waitcnt_depctr 0xfff
; GFX11-NEXT: v_rcp_f64_e32 v[14:15], v[10:11]		; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX11-NEXT: s_waitcnt_depctr 0xfff		; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX11-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]		; GFX11-NEXT: v_mul_f64 v[12:13], v[14:15], v[10:11]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)		; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[12:13], v[14:15]
; GFX11-NEXT: v_fma_f64 v[16:17], -v[8:9], v[12:13], 1.0		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
; GFX11-NEXT: v_fma_f64 v[18:19], -v[10:11], v[14:15], 1.0		; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[12:13]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)		; GFX11-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[6:7], v[6:7], v[2:3]
; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[16:17], v[12:13]		; GFX11-NEXT: v_div_scale_f64 v[16:17], vcc_lo, v[2:3], v[6:7], v[2:3]
; GFX11-NEXT: v_div_scale_f64 v[16:17], s0, v[2:3], v[6:7], v[2:3]
; GFX11-NEXT: v_fma_f64 v[14:15], v[14:15], v[18:19], v[14:15]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_mul_f64 v[18:19], v[20:21], v[12:13]
; GFX11-NEXT: v_mul_f64 v[22:23], v[16:17], v[14:15]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[18:19], v[20:21]
; GFX11-NEXT: v_fma_f64 v[10:11], -v[10:11], v[22:23], v[16:17]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[12:13], v[18:19]
; GFX11-NEXT: s_mov_b32 vcc_lo, s0
; GFX11-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[14:15], v[22:23]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]		; GFX11-NEXT: v_div_fixup_f64 v[0:1], v[8:9], v[4:5], v[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_2) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
		; GFX11-NEXT: s_waitcnt_depctr 0xfff
		; GFX11-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
		; GFX11-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_mul_f64 v[14:15], v[16:17], v[12:13]
		; GFX11-NEXT: v_fma_f64 v[10:11], -v[10:11], v[14:15], v[16:17]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
		; GFX11-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[14:15]
; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]		; GFX11-NEXT: v_div_fixup_f64 v[2:3], v[10:11], v[6:7], v[2:3]
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fdiv = fdiv arcp <2 x double> %a, %b, !fpmath !0		%fdiv = fdiv arcp <2 x double> %a, %b, !fpmath !0
ret <2 x double> %fdiv		ret <2 x double> %fdiv
}		}

define <2 x double> @v_fdiv_v2f64_arcp_afn_ulp25(<2 x double> %a, <2 x double> %b) {		define <2 x double> @v_fdiv_v2f64_arcp_afn_ulp25(<2 x double> %a, <2 x double> %b) {
; GCN-LABEL: v_fdiv_v2f64_arcp_afn_ulp25:		; GCN-LABEL: v_fdiv_v2f64_arcp_afn_ulp25:
; [...]

llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -amdgpu-scalarize-global-loads=false -enable-misched=0 -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck --check-prefix=CI %s			; RUN: llc -global-isel -amdgpu-scalarize-global-loads=false -enable-misched=0 -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck --check-prefix=CI %s
	; RUN: llc -global-isel -amdgpu-scalarize-global-loads=false -enable-misched=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck --check-prefix=VI %s			; RUN: llc -global-isel -amdgpu-scalarize-global-loads=false -enable-misched=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck --check-prefix=VI %s

	define amdgpu_kernel void @frem_f16(half addrspace(1)* %out, half addrspace(1)* %in1, half addrspace(1)* %in2) #0 {			define amdgpu_kernel void @frem_f16(half addrspace(1)* %out, half addrspace(1)* %in1, half addrspace(1)* %in2) #0 {
	; CI-LABEL: frem_f16:			; CI-LABEL: frem_f16:
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dword s2, s[6:7], 0x0			; CI-NEXT: s_load_dword s2, s[6:7], 0x0
	; CI-NEXT: s_load_dword s0, s[0:1], 0x2			; CI-NEXT: s_load_dword s0, s[0:1], 0x2
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_cvt_f32_f16_e32 v0, s2			; CI-NEXT: v_cvt_f32_f16_e32 v0, s2
	; CI-NEXT: v_cvt_f32_f16_e32 v1, s0			; CI-NEXT: v_cvt_f32_f16_e32 v1, s0
	; CI-NEXT: v_div_scale_f32 v2, s[0:1], v1, v1, v0			; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; CI-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0			; CI-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0
	; CI-NEXT: v_rcp_f32_e32 v4, v2			; CI-NEXT: v_rcp_f32_e32 v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0			; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0
	; CI-NEXT: v_fma_f32 v4, v5, v4, v4			; CI-NEXT: v_fma_f32 v4, v5, v4, v4
	; CI-NEXT: v_mul_f32_e32 v5, v3, v4			; CI-NEXT: v_mul_f32_e32 v5, v3, v4
	; CI-NEXT: v_fma_f32 v6, -v2, v5, v3			; CI-NEXT: v_fma_f32 v6, -v2, v5, v3
	; CI-NEXT: v_fma_f32 v5, v6, v4, v5			; CI-NEXT: v_fma_f32 v5, v6, v4, v5
	[136 lines not shown]
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dword s2, s[6:7], 0x0			; CI-NEXT: s_load_dword s2, s[6:7], 0x0
	; CI-NEXT: s_load_dword s0, s[0:1], 0x4			; CI-NEXT: s_load_dword s0, s[0:1], 0x4
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_mov_b32_e32 v0, s0			; CI-NEXT: v_mov_b32_e32 v0, s0
	; CI-NEXT: v_div_scale_f32 v1, s[0:1], v0, v0, s2			; CI-NEXT: v_div_scale_f32 v1, vcc, v0, v0, s2
	; CI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2			; CI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2
	; CI-NEXT: v_rcp_f32_e32 v3, v1			; CI-NEXT: v_rcp_f32_e32 v3, v1
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v4, -v1, v3, 1.0			; CI-NEXT: v_fma_f32 v4, -v1, v3, 1.0
	; CI-NEXT: v_fma_f32 v3, v4, v3, v3			; CI-NEXT: v_fma_f32 v3, v4, v3, v3
	; CI-NEXT: v_mul_f32_e32 v4, v2, v3			; CI-NEXT: v_mul_f32_e32 v4, v2, v3
	; CI-NEXT: v_fma_f32 v5, -v1, v4, v2			; CI-NEXT: v_fma_f32 v5, -v1, v4, v2
	; CI-NEXT: v_fma_f32 v4, v5, v3, v4			; CI-NEXT: v_fma_f32 v4, v5, v3, v4
	[12 lines not shown]
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_load_dword s2, s[6:7], 0x0			; VI-NEXT: s_load_dword s2, s[6:7], 0x0
	; VI-NEXT: s_load_dword s0, s[0:1], 0x10			; VI-NEXT: s_load_dword s0, s[0:1], 0x10
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s0			; VI-NEXT: v_mov_b32_e32 v0, s0
	; VI-NEXT: v_div_scale_f32 v1, s[0:1], v0, v0, s2			; VI-NEXT: v_div_scale_f32 v1, vcc, v0, v0, s2
	; VI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2			; VI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2
	; VI-NEXT: v_rcp_f32_e32 v3, v1			; VI-NEXT: v_rcp_f32_e32 v3, v1
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v4, -v1, v3, 1.0			; VI-NEXT: v_fma_f32 v4, -v1, v3, 1.0
	; VI-NEXT: v_fma_f32 v3, v4, v3, v3			; VI-NEXT: v_fma_f32 v3, v4, v3, v3
	; VI-NEXT: v_mul_f32_e32 v4, v2, v3			; VI-NEXT: v_mul_f32_e32 v4, v2, v3
	; VI-NEXT: v_fma_f32 v5, -v1, v4, v2			; VI-NEXT: v_fma_f32 v5, -v1, v4, v2
	; VI-NEXT: v_fma_f32 v4, v5, v3, v4			; VI-NEXT: v_fma_f32 v4, v5, v3, v4
	[109 lines not shown]
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0			; CI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_mov_b32_e32 v0, s0			; CI-NEXT: v_mov_b32_e32 v0, s0
	; CI-NEXT: v_mov_b32_e32 v1, s1			; CI-NEXT: v_mov_b32_e32 v1, s1
	; CI-NEXT: v_div_scale_f64 v[2:3], s[0:1], v[0:1], v[0:1], s[2:3]			; CI-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], s[2:3]
	; CI-NEXT: v_div_scale_f64 v[8:9], vcc, s[2:3], v[0:1], s[2:3]			; CI-NEXT: v_div_scale_f64 v[8:9], vcc, s[2:3], v[0:1], s[2:3]
	; CI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]			; CI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
	; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; CI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]			; CI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
	; CI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]			; CI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
	[11 lines not shown]
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0			; VI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s0			; VI-NEXT: v_mov_b32_e32 v0, s0
	; VI-NEXT: v_mov_b32_e32 v1, s1			; VI-NEXT: v_mov_b32_e32 v1, s1
	; VI-NEXT: v_div_scale_f64 v[2:3], s[0:1], v[0:1], v[0:1], s[2:3]			; VI-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], s[2:3]
	; VI-NEXT: v_div_scale_f64 v[8:9], vcc, s[2:3], v[0:1], s[2:3]			; VI-NEXT: v_div_scale_f64 v[8:9], vcc, s[2:3], v[0:1], s[2:3]
	; VI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]			; VI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
	; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; VI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]			; VI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
	; VI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]			; VI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
	[133 lines not shown]
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dword s2, s[6:7], 0x0			; CI-NEXT: s_load_dword s2, s[6:7], 0x0
	; CI-NEXT: s_load_dword s0, s[0:1], 0x4			; CI-NEXT: s_load_dword s0, s[0:1], 0x4
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_cvt_f32_f16_e32 v0, s2			; CI-NEXT: v_cvt_f32_f16_e32 v0, s2
	; CI-NEXT: v_cvt_f32_f16_e32 v1, s0			; CI-NEXT: v_cvt_f32_f16_e32 v1, s0
	; CI-NEXT: s_lshr_b32 s6, s0, 16			; CI-NEXT: s_lshr_b32 s1, s2, 16
	; CI-NEXT: s_lshr_b32 s3, s2, 16			; CI-NEXT: s_lshr_b32 s3, s0, 16
	; CI-NEXT: v_div_scale_f32 v2, s[0:1], v1, v1, v0			; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; CI-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0			; CI-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0
	; CI-NEXT: v_rcp_f32_e32 v4, v2			; CI-NEXT: v_rcp_f32_e32 v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0			; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0
	; CI-NEXT: v_fma_f32 v4, v5, v4, v4			; CI-NEXT: v_fma_f32 v4, v5, v4, v4
	; CI-NEXT: v_mul_f32_e32 v5, v3, v4			; CI-NEXT: v_mul_f32_e32 v5, v3, v4
	; CI-NEXT: v_fma_f32 v6, -v2, v5, v3			; CI-NEXT: v_fma_f32 v6, -v2, v5, v3
	; CI-NEXT: v_fma_f32 v5, v6, v4, v5			; CI-NEXT: v_fma_f32 v5, v6, v4, v5
	; CI-NEXT: v_fma_f32 v2, -v2, v5, v3			; CI-NEXT: v_fma_f32 v2, -v2, v5, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5
	; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0			; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0
	; CI-NEXT: v_trunc_f32_e32 v2, v2			; CI-NEXT: v_trunc_f32_e32 v2, v2
	; CI-NEXT: v_fma_f32 v0, -v2, v1, v0			; CI-NEXT: v_fma_f32 v0, -v2, v1, v0
	; CI-NEXT: v_cvt_f32_f16_e32 v1, s3			; CI-NEXT: v_cvt_f32_f16_e32 v1, s1
	; CI-NEXT: v_cvt_f32_f16_e32 v2, s6			; CI-NEXT: v_cvt_f32_f16_e32 v2, s3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; CI-NEXT: v_cvt_f16_f32_e32 v0, v0			; CI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; CI-NEXT: v_div_scale_f32 v3, s[0:1], v2, v2, v1			; CI-NEXT: v_div_scale_f32 v3, vcc, v2, v2, v1
	; CI-NEXT: v_div_scale_f32 v4, vcc, v1, v2, v1			; CI-NEXT: v_div_scale_f32 v4, vcc, v1, v2, v1
	; CI-NEXT: v_rcp_f32_e32 v5, v3			; CI-NEXT: v_rcp_f32_e32 v5, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v6, -v3, v5, 1.0			; CI-NEXT: v_fma_f32 v6, -v3, v5, 1.0
	; CI-NEXT: v_fma_f32 v5, v6, v5, v5			; CI-NEXT: v_fma_f32 v5, v6, v5, v5
	; CI-NEXT: v_mul_f32_e32 v6, v4, v5			; CI-NEXT: v_mul_f32_e32 v6, v4, v5
	; CI-NEXT: v_fma_f32 v7, -v3, v6, v4			; CI-NEXT: v_fma_f32 v7, -v3, v6, v4
	; CI-NEXT: v_fma_f32 v6, v7, v5, v6			; CI-NEXT: v_fma_f32 v6, v7, v5, v6
	[62 lines not shown]
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0			; CI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x8			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x8
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_cvt_f32_f16_e32 v0, s2			; CI-NEXT: v_cvt_f32_f16_e32 v0, s2
	; CI-NEXT: v_cvt_f32_f16_e32 v1, s0			; CI-NEXT: v_cvt_f32_f16_e32 v1, s0
	; CI-NEXT: s_lshr_b32 s8, s2, 16			; CI-NEXT: s_lshr_b32 s6, s2, 16
	; CI-NEXT: s_lshr_b32 s9, s3, 16			; CI-NEXT: s_lshr_b32 s7, s3, 16
	; CI-NEXT: s_lshr_b32 s10, s0, 16			; CI-NEXT: s_lshr_b32 s8, s0, 16
	; CI-NEXT: v_div_scale_f32 v2, s[6:7], v1, v1, v0			; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; CI-NEXT: s_lshr_b32 s11, s1, 16			; CI-NEXT: s_lshr_b32 s9, s1, 16
	; CI-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0			; CI-NEXT: v_div_scale_f32 v3, vcc, v0, v1, v0
	; CI-NEXT: v_rcp_f32_e32 v4, v2			; CI-NEXT: v_rcp_f32_e32 v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0			; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0
	; CI-NEXT: v_fma_f32 v4, v5, v4, v4			; CI-NEXT: v_fma_f32 v4, v5, v4, v4
	; CI-NEXT: v_mul_f32_e32 v5, v3, v4			; CI-NEXT: v_mul_f32_e32 v5, v3, v4
	; CI-NEXT: v_fma_f32 v6, -v2, v5, v3			; CI-NEXT: v_fma_f32 v6, -v2, v5, v3
	; CI-NEXT: v_fma_f32 v5, v6, v4, v5			; CI-NEXT: v_fma_f32 v5, v6, v4, v5
	; CI-NEXT: v_fma_f32 v2, -v2, v5, v3			; CI-NEXT: v_fma_f32 v2, -v2, v5, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5
	; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0			; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0
	; CI-NEXT: v_trunc_f32_e32 v2, v2			; CI-NEXT: v_trunc_f32_e32 v2, v2
	; CI-NEXT: v_fma_f32 v0, -v2, v1, v0			; CI-NEXT: v_fma_f32 v0, -v2, v1, v0
	; CI-NEXT: v_cvt_f32_f16_e32 v1, s8			; CI-NEXT: v_cvt_f32_f16_e32 v1, s6
	; CI-NEXT: v_cvt_f32_f16_e32 v2, s10			; CI-NEXT: v_cvt_f32_f16_e32 v2, s8
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; CI-NEXT: v_cvt_f16_f32_e32 v0, v0			; CI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; CI-NEXT: v_div_scale_f32 v3, s[6:7], v2, v2, v1			; CI-NEXT: v_div_scale_f32 v3, vcc, v2, v2, v1
	; CI-NEXT: v_div_scale_f32 v4, vcc, v1, v2, v1			; CI-NEXT: v_div_scale_f32 v4, vcc, v1, v2, v1
	; CI-NEXT: v_rcp_f32_e32 v5, v3			; CI-NEXT: v_rcp_f32_e32 v5, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v6, -v3, v5, 1.0			; CI-NEXT: v_fma_f32 v6, -v3, v5, 1.0
	; CI-NEXT: v_fma_f32 v5, v6, v5, v5			; CI-NEXT: v_fma_f32 v5, v6, v5, v5
	; CI-NEXT: v_mul_f32_e32 v6, v4, v5			; CI-NEXT: v_mul_f32_e32 v6, v4, v5
	; CI-NEXT: v_fma_f32 v7, -v3, v6, v4			; CI-NEXT: v_fma_f32 v7, -v3, v6, v4
	; CI-NEXT: v_fma_f32 v6, v7, v5, v6			; CI-NEXT: v_fma_f32 v6, v7, v5, v6
	; CI-NEXT: v_fma_f32 v3, -v3, v6, v4			; CI-NEXT: v_fma_f32 v3, -v3, v6, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; CI-NEXT: v_div_fmas_f32 v3, v3, v5, v6
	; CI-NEXT: v_div_fixup_f32 v3, v3, v2, v1			; CI-NEXT: v_div_fixup_f32 v3, v3, v2, v1
	; CI-NEXT: v_trunc_f32_e32 v3, v3			; CI-NEXT: v_trunc_f32_e32 v3, v3
	; CI-NEXT: v_fma_f32 v1, -v3, v2, v1			; CI-NEXT: v_fma_f32 v1, -v3, v2, v1
	; CI-NEXT: v_cvt_f32_f16_e32 v2, s3			; CI-NEXT: v_cvt_f32_f16_e32 v2, s3
	; CI-NEXT: v_cvt_f32_f16_e32 v3, s1			; CI-NEXT: v_cvt_f32_f16_e32 v3, s1
	; CI-NEXT: v_cvt_f16_f32_e32 v1, v1			; CI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; CI-NEXT: v_div_scale_f32 v4, s[0:1], v3, v3, v2			; CI-NEXT: v_div_scale_f32 v4, vcc, v3, v3, v2
	; CI-NEXT: v_div_scale_f32 v5, vcc, v2, v3, v2			; CI-NEXT: v_div_scale_f32 v5, vcc, v2, v3, v2
	; CI-NEXT: v_rcp_f32_e32 v6, v4			; CI-NEXT: v_rcp_f32_e32 v6, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v7, -v4, v6, 1.0			; CI-NEXT: v_fma_f32 v7, -v4, v6, 1.0
	; CI-NEXT: v_fma_f32 v6, v7, v6, v6			; CI-NEXT: v_fma_f32 v6, v7, v6, v6
	; CI-NEXT: v_mul_f32_e32 v7, v5, v6			; CI-NEXT: v_mul_f32_e32 v7, v5, v6
	; CI-NEXT: v_fma_f32 v8, -v4, v7, v5			; CI-NEXT: v_fma_f32 v8, -v4, v7, v5
	; CI-NEXT: v_fma_f32 v7, v8, v6, v7			; CI-NEXT: v_fma_f32 v7, v8, v6, v7
	; CI-NEXT: v_fma_f32 v4, -v4, v7, v5			; CI-NEXT: v_fma_f32 v4, -v4, v7, v5
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v4, v4, v6, v7			; CI-NEXT: v_div_fmas_f32 v4, v4, v6, v7
	; CI-NEXT: v_div_fixup_f32 v4, v4, v3, v2			; CI-NEXT: v_div_fixup_f32 v4, v4, v3, v2
	; CI-NEXT: v_trunc_f32_e32 v4, v4			; CI-NEXT: v_trunc_f32_e32 v4, v4
	; CI-NEXT: v_fma_f32 v2, -v4, v3, v2			; CI-NEXT: v_fma_f32 v2, -v4, v3, v2
	; CI-NEXT: v_cvt_f32_f16_e32 v3, s9			; CI-NEXT: v_cvt_f32_f16_e32 v3, s7
	; CI-NEXT: v_cvt_f32_f16_e32 v4, s11			; CI-NEXT: v_cvt_f32_f16_e32 v4, s9
	; CI-NEXT: v_cvt_f16_f32_e32 v2, v2			; CI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; CI-NEXT: v_div_scale_f32 v5, s[0:1], v4, v4, v3			; CI-NEXT: v_div_scale_f32 v5, vcc, v4, v4, v3
	; CI-NEXT: v_div_scale_f32 v6, vcc, v3, v4, v3			; CI-NEXT: v_div_scale_f32 v6, vcc, v3, v4, v3
	; CI-NEXT: v_rcp_f32_e32 v7, v5			; CI-NEXT: v_rcp_f32_e32 v7, v5
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v8, -v5, v7, 1.0			; CI-NEXT: v_fma_f32 v8, -v5, v7, 1.0
	; CI-NEXT: v_fma_f32 v7, v8, v7, v7			; CI-NEXT: v_fma_f32 v7, v8, v7, v7
	; CI-NEXT: v_mul_f32_e32 v8, v6, v7			; CI-NEXT: v_mul_f32_e32 v8, v6, v7
	; CI-NEXT: v_fma_f32 v9, -v5, v8, v6			; CI-NEXT: v_fma_f32 v9, -v5, v8, v6
	; CI-NEXT: v_fma_f32 v8, v9, v7, v8			; CI-NEXT: v_fma_f32 v8, v9, v7, v8
	[87 lines not shown]
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0			; CI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x8			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x8
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_mov_b32_e32 v0, s0			; CI-NEXT: v_mov_b32_e32 v0, s0
	; CI-NEXT: v_div_scale_f32 v1, s[6:7], v0, v0, s2			; CI-NEXT: v_div_scale_f32 v1, vcc, v0, v0, s2
	; CI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2			; CI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2
	; CI-NEXT: v_rcp_f32_e32 v3, v1			; CI-NEXT: v_rcp_f32_e32 v3, v1
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v4, -v1, v3, 1.0			; CI-NEXT: v_fma_f32 v4, -v1, v3, 1.0
	; CI-NEXT: v_fma_f32 v3, v4, v3, v3			; CI-NEXT: v_fma_f32 v3, v4, v3, v3
	; CI-NEXT: v_mul_f32_e32 v4, v2, v3			; CI-NEXT: v_mul_f32_e32 v4, v2, v3
	; CI-NEXT: v_fma_f32 v5, -v1, v4, v2			; CI-NEXT: v_fma_f32 v5, -v1, v4, v2
	; CI-NEXT: v_fma_f32 v4, v5, v3, v4			; CI-NEXT: v_fma_f32 v4, v5, v3, v4
	; CI-NEXT: v_fma_f32 v1, -v1, v4, v2			; CI-NEXT: v_fma_f32 v1, -v1, v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v1, v1, v3, v4			; CI-NEXT: v_div_fmas_f32 v1, v1, v3, v4
	; CI-NEXT: v_div_fixup_f32 v1, v1, v0, s2			; CI-NEXT: v_div_fixup_f32 v1, v1, v0, s2
	; CI-NEXT: v_trunc_f32_e32 v1, v1			; CI-NEXT: v_trunc_f32_e32 v1, v1
	; CI-NEXT: v_fma_f32 v0, -v1, v0, s2			; CI-NEXT: v_fma_f32 v0, -v1, v0, s2
	; CI-NEXT: v_mov_b32_e32 v1, s1			; CI-NEXT: v_mov_b32_e32 v1, s1
	; CI-NEXT: v_div_scale_f32 v2, s[0:1], v1, v1, s3			; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, s3
	; CI-NEXT: v_div_scale_f32 v3, vcc, s3, v1, s3			; CI-NEXT: v_div_scale_f32 v3, vcc, s3, v1, s3
	; CI-NEXT: v_rcp_f32_e32 v4, v2			; CI-NEXT: v_rcp_f32_e32 v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0			; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0
	; CI-NEXT: v_fma_f32 v4, v5, v4, v4			; CI-NEXT: v_fma_f32 v4, v5, v4, v4
	; CI-NEXT: v_mul_f32_e32 v5, v3, v4			; CI-NEXT: v_mul_f32_e32 v5, v3, v4
	; CI-NEXT: v_fma_f32 v6, -v2, v5, v3			; CI-NEXT: v_fma_f32 v6, -v2, v5, v3
	; CI-NEXT: v_fma_f32 v5, v6, v4, v5			; CI-NEXT: v_fma_f32 v5, v6, v4, v5
	[12 lines not shown]
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0			; VI-NEXT: s_load_dwordx2 s[2:3], s[6:7], 0x0
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x20			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x20
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s0			; VI-NEXT: v_mov_b32_e32 v0, s0
	; VI-NEXT: v_div_scale_f32 v1, s[6:7], v0, v0, s2			; VI-NEXT: v_div_scale_f32 v1, vcc, v0, v0, s2
	; VI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2			; VI-NEXT: v_div_scale_f32 v2, vcc, s2, v0, s2
	; VI-NEXT: v_rcp_f32_e32 v3, v1			; VI-NEXT: v_rcp_f32_e32 v3, v1
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v4, -v1, v3, 1.0			; VI-NEXT: v_fma_f32 v4, -v1, v3, 1.0
	; VI-NEXT: v_fma_f32 v3, v4, v3, v3			; VI-NEXT: v_fma_f32 v3, v4, v3, v3
	; VI-NEXT: v_mul_f32_e32 v4, v2, v3			; VI-NEXT: v_mul_f32_e32 v4, v2, v3
	; VI-NEXT: v_fma_f32 v5, -v1, v4, v2			; VI-NEXT: v_fma_f32 v5, -v1, v4, v2
	; VI-NEXT: v_fma_f32 v4, v5, v3, v4			; VI-NEXT: v_fma_f32 v4, v5, v3, v4
	; VI-NEXT: v_fma_f32 v1, -v1, v4, v2			; VI-NEXT: v_fma_f32 v1, -v1, v4, v2
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v1, v1, v3, v4			; VI-NEXT: v_div_fmas_f32 v1, v1, v3, v4
	; VI-NEXT: v_div_fixup_f32 v1, v1, v0, s2			; VI-NEXT: v_div_fixup_f32 v1, v1, v0, s2
	; VI-NEXT: v_trunc_f32_e32 v1, v1			; VI-NEXT: v_trunc_f32_e32 v1, v1
	; VI-NEXT: v_fma_f32 v0, -v1, v0, s2			; VI-NEXT: v_fma_f32 v0, -v1, v0, s2
	; VI-NEXT: v_mov_b32_e32 v1, s1			; VI-NEXT: v_mov_b32_e32 v1, s1
	; VI-NEXT: v_div_scale_f32 v2, s[0:1], v1, v1, s3			; VI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, s3
	; VI-NEXT: v_div_scale_f32 v3, vcc, s3, v1, s3			; VI-NEXT: v_div_scale_f32 v3, vcc, s3, v1, s3
	; VI-NEXT: v_rcp_f32_e32 v4, v2			; VI-NEXT: v_rcp_f32_e32 v4, v2
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v5, -v2, v4, 1.0			; VI-NEXT: v_fma_f32 v5, -v2, v4, 1.0
	; VI-NEXT: v_fma_f32 v4, v5, v4, v4			; VI-NEXT: v_fma_f32 v4, v5, v4, v4
	; VI-NEXT: v_mul_f32_e32 v5, v3, v4			; VI-NEXT: v_mul_f32_e32 v5, v3, v4
	; VI-NEXT: v_fma_f32 v6, -v2, v5, v3			; VI-NEXT: v_fma_f32 v6, -v2, v5, v3
	; VI-NEXT: v_fma_f32 v5, v6, v4, v5			; VI-NEXT: v_fma_f32 v5, v6, v4, v5
	[20 lines not shown]
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0			; CI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
	; CI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x10			; CI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x10
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_mov_b32_e32 v0, s8			; CI-NEXT: v_mov_b32_e32 v0, s8
	; CI-NEXT: v_div_scale_f32 v1, s[6:7], v0, v0, s0			; CI-NEXT: v_div_scale_f32 v1, vcc, v0, v0, s0
	; CI-NEXT: v_div_scale_f32 v2, vcc, s0, v0, s0			; CI-NEXT: v_div_scale_f32 v2, vcc, s0, v0, s0
	; CI-NEXT: v_rcp_f32_e32 v3, v1			; CI-NEXT: v_rcp_f32_e32 v3, v1
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v4, -v1, v3, 1.0			; CI-NEXT: v_fma_f32 v4, -v1, v3, 1.0
	; CI-NEXT: v_fma_f32 v3, v4, v3, v3			; CI-NEXT: v_fma_f32 v3, v4, v3, v3
	; CI-NEXT: v_mul_f32_e32 v4, v2, v3			; CI-NEXT: v_mul_f32_e32 v4, v2, v3
	; CI-NEXT: v_fma_f32 v5, -v1, v4, v2			; CI-NEXT: v_fma_f32 v5, -v1, v4, v2
	; CI-NEXT: v_fma_f32 v4, v5, v3, v4			; CI-NEXT: v_fma_f32 v4, v5, v3, v4
	; CI-NEXT: v_fma_f32 v1, -v1, v4, v2			; CI-NEXT: v_fma_f32 v1, -v1, v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v1, v1, v3, v4			; CI-NEXT: v_div_fmas_f32 v1, v1, v3, v4
	; CI-NEXT: v_div_fixup_f32 v1, v1, v0, s0			; CI-NEXT: v_div_fixup_f32 v1, v1, v0, s0
	; CI-NEXT: v_trunc_f32_e32 v1, v1			; CI-NEXT: v_trunc_f32_e32 v1, v1
	; CI-NEXT: v_fma_f32 v0, -v1, v0, s0			; CI-NEXT: v_fma_f32 v0, -v1, v0, s0
	; CI-NEXT: v_mov_b32_e32 v1, s9			; CI-NEXT: v_mov_b32_e32 v1, s9
	; CI-NEXT: v_div_scale_f32 v2, s[6:7], v1, v1, s1			; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, s1
	; CI-NEXT: v_div_scale_f32 v3, vcc, s1, v1, s1			; CI-NEXT: v_div_scale_f32 v3, vcc, s1, v1, s1
	; CI-NEXT: v_rcp_f32_e32 v4, v2			; CI-NEXT: v_rcp_f32_e32 v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0			; CI-NEXT: v_fma_f32 v5, -v2, v4, 1.0
	; CI-NEXT: v_fma_f32 v4, v5, v4, v4			; CI-NEXT: v_fma_f32 v4, v5, v4, v4
	; CI-NEXT: v_mul_f32_e32 v5, v3, v4			; CI-NEXT: v_mul_f32_e32 v5, v3, v4
	; CI-NEXT: v_fma_f32 v6, -v2, v5, v3			; CI-NEXT: v_fma_f32 v6, -v2, v5, v3
	; CI-NEXT: v_fma_f32 v5, v6, v4, v5			; CI-NEXT: v_fma_f32 v5, v6, v4, v5
	; CI-NEXT: v_fma_f32 v2, -v2, v5, v3			; CI-NEXT: v_fma_f32 v2, -v2, v5, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5
	; CI-NEXT: v_div_fixup_f32 v2, v2, v1, s1			; CI-NEXT: v_div_fixup_f32 v2, v2, v1, s1
	; CI-NEXT: v_trunc_f32_e32 v2, v2			; CI-NEXT: v_trunc_f32_e32 v2, v2
	; CI-NEXT: v_fma_f32 v1, -v2, v1, s1			; CI-NEXT: v_fma_f32 v1, -v2, v1, s1
	; CI-NEXT: v_mov_b32_e32 v2, s10			; CI-NEXT: v_mov_b32_e32 v2, s10
	; CI-NEXT: v_div_scale_f32 v3, s[0:1], v2, v2, s2			; CI-NEXT: v_div_scale_f32 v3, vcc, v2, v2, s2
	; CI-NEXT: v_div_scale_f32 v4, vcc, s2, v2, s2			; CI-NEXT: v_div_scale_f32 v4, vcc, s2, v2, s2
	; CI-NEXT: v_rcp_f32_e32 v5, v3			; CI-NEXT: v_rcp_f32_e32 v5, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v6, -v3, v5, 1.0			; CI-NEXT: v_fma_f32 v6, -v3, v5, 1.0
	; CI-NEXT: v_fma_f32 v5, v6, v5, v5			; CI-NEXT: v_fma_f32 v5, v6, v5, v5
	; CI-NEXT: v_mul_f32_e32 v6, v4, v5			; CI-NEXT: v_mul_f32_e32 v6, v4, v5
	; CI-NEXT: v_fma_f32 v7, -v3, v6, v4			; CI-NEXT: v_fma_f32 v7, -v3, v6, v4
	; CI-NEXT: v_fma_f32 v6, v7, v5, v6			; CI-NEXT: v_fma_f32 v6, v7, v5, v6
	; CI-NEXT: v_fma_f32 v3, -v3, v6, v4			; CI-NEXT: v_fma_f32 v3, -v3, v6, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; CI-NEXT: v_div_fmas_f32 v3, v3, v5, v6
	; CI-NEXT: v_div_fixup_f32 v3, v3, v2, s2			; CI-NEXT: v_div_fixup_f32 v3, v3, v2, s2
	; CI-NEXT: v_trunc_f32_e32 v3, v3			; CI-NEXT: v_trunc_f32_e32 v3, v3
	; CI-NEXT: v_fma_f32 v2, -v3, v2, s2			; CI-NEXT: v_fma_f32 v2, -v3, v2, s2
	; CI-NEXT: v_mov_b32_e32 v3, s11			; CI-NEXT: v_mov_b32_e32 v3, s11
	; CI-NEXT: v_div_scale_f32 v4, s[0:1], v3, v3, s3			; CI-NEXT: v_div_scale_f32 v4, vcc, v3, v3, s3
	; CI-NEXT: v_div_scale_f32 v5, vcc, s3, v3, s3			; CI-NEXT: v_div_scale_f32 v5, vcc, s3, v3, s3
	; CI-NEXT: v_rcp_f32_e32 v6, v4			; CI-NEXT: v_rcp_f32_e32 v6, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v7, -v4, v6, 1.0			; CI-NEXT: v_fma_f32 v7, -v4, v6, 1.0
	; CI-NEXT: v_fma_f32 v6, v7, v6, v6			; CI-NEXT: v_fma_f32 v6, v7, v6, v6
	; CI-NEXT: v_mul_f32_e32 v7, v5, v6			; CI-NEXT: v_mul_f32_e32 v7, v5, v6
	; CI-NEXT: v_fma_f32 v8, -v4, v7, v5			; CI-NEXT: v_fma_f32 v8, -v4, v7, v5
	; CI-NEXT: v_fma_f32 v7, v8, v6, v7			; CI-NEXT: v_fma_f32 v7, v8, v6, v7
	[12 lines not shown]
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0			; VI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
	; VI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x40			; VI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x40
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s8			; VI-NEXT: v_mov_b32_e32 v0, s8
	; VI-NEXT: v_div_scale_f32 v1, s[6:7], v0, v0, s0			; VI-NEXT: v_div_scale_f32 v1, vcc, v0, v0, s0
	; VI-NEXT: v_div_scale_f32 v2, vcc, s0, v0, s0			; VI-NEXT: v_div_scale_f32 v2, vcc, s0, v0, s0
	; VI-NEXT: v_rcp_f32_e32 v3, v1			; VI-NEXT: v_rcp_f32_e32 v3, v1
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v4, -v1, v3, 1.0			; VI-NEXT: v_fma_f32 v4, -v1, v3, 1.0
	; VI-NEXT: v_fma_f32 v3, v4, v3, v3			; VI-NEXT: v_fma_f32 v3, v4, v3, v3
	; VI-NEXT: v_mul_f32_e32 v4, v2, v3			; VI-NEXT: v_mul_f32_e32 v4, v2, v3
	; VI-NEXT: v_fma_f32 v5, -v1, v4, v2			; VI-NEXT: v_fma_f32 v5, -v1, v4, v2
	; VI-NEXT: v_fma_f32 v4, v5, v3, v4			; VI-NEXT: v_fma_f32 v4, v5, v3, v4
	; VI-NEXT: v_fma_f32 v1, -v1, v4, v2			; VI-NEXT: v_fma_f32 v1, -v1, v4, v2
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v1, v1, v3, v4			; VI-NEXT: v_div_fmas_f32 v1, v1, v3, v4
	; VI-NEXT: v_div_fixup_f32 v1, v1, v0, s0			; VI-NEXT: v_div_fixup_f32 v1, v1, v0, s0
	; VI-NEXT: v_trunc_f32_e32 v1, v1			; VI-NEXT: v_trunc_f32_e32 v1, v1
	; VI-NEXT: v_fma_f32 v0, -v1, v0, s0			; VI-NEXT: v_fma_f32 v0, -v1, v0, s0
	; VI-NEXT: v_mov_b32_e32 v1, s9			; VI-NEXT: v_mov_b32_e32 v1, s9
	; VI-NEXT: v_div_scale_f32 v2, s[6:7], v1, v1, s1			; VI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, s1
	; VI-NEXT: v_div_scale_f32 v3, vcc, s1, v1, s1			; VI-NEXT: v_div_scale_f32 v3, vcc, s1, v1, s1
	; VI-NEXT: v_rcp_f32_e32 v4, v2			; VI-NEXT: v_rcp_f32_e32 v4, v2
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v5, -v2, v4, 1.0			; VI-NEXT: v_fma_f32 v5, -v2, v4, 1.0
	; VI-NEXT: v_fma_f32 v4, v5, v4, v4			; VI-NEXT: v_fma_f32 v4, v5, v4, v4
	; VI-NEXT: v_mul_f32_e32 v5, v3, v4			; VI-NEXT: v_mul_f32_e32 v5, v3, v4
	; VI-NEXT: v_fma_f32 v6, -v2, v5, v3			; VI-NEXT: v_fma_f32 v6, -v2, v5, v3
	; VI-NEXT: v_fma_f32 v5, v6, v4, v5			; VI-NEXT: v_fma_f32 v5, v6, v4, v5
	; VI-NEXT: v_fma_f32 v2, -v2, v5, v3			; VI-NEXT: v_fma_f32 v2, -v2, v5, v3
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; VI-NEXT: v_div_fmas_f32 v2, v2, v4, v5
	; VI-NEXT: v_div_fixup_f32 v2, v2, v1, s1			; VI-NEXT: v_div_fixup_f32 v2, v2, v1, s1
	; VI-NEXT: v_trunc_f32_e32 v2, v2			; VI-NEXT: v_trunc_f32_e32 v2, v2
	; VI-NEXT: v_fma_f32 v1, -v2, v1, s1			; VI-NEXT: v_fma_f32 v1, -v2, v1, s1
	; VI-NEXT: v_mov_b32_e32 v2, s10			; VI-NEXT: v_mov_b32_e32 v2, s10
	; VI-NEXT: v_div_scale_f32 v3, s[0:1], v2, v2, s2			; VI-NEXT: v_div_scale_f32 v3, vcc, v2, v2, s2
	; VI-NEXT: v_div_scale_f32 v4, vcc, s2, v2, s2			; VI-NEXT: v_div_scale_f32 v4, vcc, s2, v2, s2
	; VI-NEXT: v_rcp_f32_e32 v5, v3			; VI-NEXT: v_rcp_f32_e32 v5, v3
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v6, -v3, v5, 1.0			; VI-NEXT: v_fma_f32 v6, -v3, v5, 1.0
	; VI-NEXT: v_fma_f32 v5, v6, v5, v5			; VI-NEXT: v_fma_f32 v5, v6, v5, v5
	; VI-NEXT: v_mul_f32_e32 v6, v4, v5			; VI-NEXT: v_mul_f32_e32 v6, v4, v5
	; VI-NEXT: v_fma_f32 v7, -v3, v6, v4			; VI-NEXT: v_fma_f32 v7, -v3, v6, v4
	; VI-NEXT: v_fma_f32 v6, v7, v5, v6			; VI-NEXT: v_fma_f32 v6, v7, v5, v6
	; VI-NEXT: v_fma_f32 v3, -v3, v6, v4			; VI-NEXT: v_fma_f32 v3, -v3, v6, v4
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; VI-NEXT: v_div_fmas_f32 v3, v3, v5, v6
	; VI-NEXT: v_div_fixup_f32 v3, v3, v2, s2			; VI-NEXT: v_div_fixup_f32 v3, v3, v2, s2
	; VI-NEXT: v_trunc_f32_e32 v3, v3			; VI-NEXT: v_trunc_f32_e32 v3, v3
	; VI-NEXT: v_fma_f32 v2, -v3, v2, s2			; VI-NEXT: v_fma_f32 v2, -v3, v2, s2
	; VI-NEXT: v_mov_b32_e32 v3, s11			; VI-NEXT: v_mov_b32_e32 v3, s11
	; VI-NEXT: v_div_scale_f32 v4, s[0:1], v3, v3, s3			; VI-NEXT: v_div_scale_f32 v4, vcc, v3, v3, s3
	; VI-NEXT: v_div_scale_f32 v5, vcc, s3, v3, s3			; VI-NEXT: v_div_scale_f32 v5, vcc, s3, v3, s3
	; VI-NEXT: v_rcp_f32_e32 v6, v4			; VI-NEXT: v_rcp_f32_e32 v6, v4
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v7, -v4, v6, 1.0			; VI-NEXT: v_fma_f32 v7, -v4, v6, 1.0
	; VI-NEXT: v_fma_f32 v6, v7, v6, v6			; VI-NEXT: v_fma_f32 v6, v7, v6, v6
	; VI-NEXT: v_mul_f32_e32 v7, v5, v6			; VI-NEXT: v_mul_f32_e32 v7, v5, v6
	; VI-NEXT: v_fma_f32 v8, -v4, v7, v5			; VI-NEXT: v_fma_f32 v8, -v4, v7, v5
	; VI-NEXT: v_fma_f32 v7, v8, v6, v7			; VI-NEXT: v_fma_f32 v7, v8, v6, v7
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0			; CI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
	; CI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x10			; CI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x10
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_mov_b32_e32 v0, s8			; CI-NEXT: v_mov_b32_e32 v0, s8
	; CI-NEXT: v_mov_b32_e32 v1, s9			; CI-NEXT: v_mov_b32_e32 v1, s9
	; CI-NEXT: v_div_scale_f64 v[2:3], s[6:7], v[0:1], v[0:1], s[0:1]			; CI-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], s[0:1]
	; CI-NEXT: v_div_scale_f64 v[8:9], vcc, s[0:1], v[0:1], s[0:1]			; CI-NEXT: v_div_scale_f64 v[8:9], vcc, s[0:1], v[0:1], s[0:1]
	; CI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]			; CI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
	; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; CI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; CI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; CI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]			; CI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
	; CI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]			; CI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
	; CI-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]			; CI-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
	; CI-NEXT: v_div_fixup_f64 v[2:3], v[2:3], v[0:1], s[0:1]			; CI-NEXT: v_div_fixup_f64 v[2:3], v[2:3], v[0:1], s[0:1]
	; CI-NEXT: v_trunc_f64_e32 v[2:3], v[2:3]			; CI-NEXT: v_trunc_f64_e32 v[2:3], v[2:3]
	; CI-NEXT: v_fma_f64 v[0:1], -v[2:3], v[0:1], s[0:1]			; CI-NEXT: v_fma_f64 v[0:1], -v[2:3], v[0:1], s[0:1]
	; CI-NEXT: v_mov_b32_e32 v2, s10			; CI-NEXT: v_mov_b32_e32 v2, s10
	; CI-NEXT: v_mov_b32_e32 v3, s11			; CI-NEXT: v_mov_b32_e32 v3, s11
	; CI-NEXT: v_div_scale_f64 v[4:5], s[0:1], v[2:3], v[2:3], s[2:3]			; CI-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], s[2:3]
	; CI-NEXT: v_div_scale_f64 v[10:11], vcc, s[2:3], v[2:3], s[2:3]			; CI-NEXT: v_div_scale_f64 v[10:11], vcc, s[2:3], v[2:3], s[2:3]
	; CI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]			; CI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
	; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; CI-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]			; CI-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
	; CI-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]			; CI-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0			; VI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
	; VI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x40			; VI-NEXT: s_load_dwordx4 s[8:11], s[8:9], 0x40
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s8			; VI-NEXT: v_mov_b32_e32 v0, s8
	; VI-NEXT: v_mov_b32_e32 v1, s9			; VI-NEXT: v_mov_b32_e32 v1, s9
	; VI-NEXT: v_div_scale_f64 v[2:3], s[6:7], v[0:1], v[0:1], s[0:1]			; VI-NEXT: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], s[0:1]
	; VI-NEXT: v_div_scale_f64 v[8:9], vcc, s[0:1], v[0:1], s[0:1]			; VI-NEXT: v_div_scale_f64 v[8:9], vcc, s[0:1], v[0:1], s[0:1]
	; VI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]			; VI-NEXT: v_rcp_f64_e32 v[4:5], v[2:3]
	; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0			; VI-NEXT: v_fma_f64 v[6:7], -v[2:3], v[4:5], 1.0
	; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]			; VI-NEXT: v_fma_f64 v[4:5], v[4:5], v[6:7], v[4:5]
	; VI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]			; VI-NEXT: v_mul_f64 v[6:7], v[8:9], v[4:5]
	; VI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]			; VI-NEXT: v_fma_f64 v[2:3], -v[2:3], v[6:7], v[8:9]
	; VI-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]			; VI-NEXT: v_div_fmas_f64 v[2:3], v[2:3], v[4:5], v[6:7]
	; VI-NEXT: v_div_fixup_f64 v[2:3], v[2:3], v[0:1], s[0:1]			; VI-NEXT: v_div_fixup_f64 v[2:3], v[2:3], v[0:1], s[0:1]
	; VI-NEXT: v_trunc_f64_e32 v[2:3], v[2:3]			; VI-NEXT: v_trunc_f64_e32 v[2:3], v[2:3]
	; VI-NEXT: v_fma_f64 v[0:1], -v[2:3], v[0:1], s[0:1]			; VI-NEXT: v_fma_f64 v[0:1], -v[2:3], v[0:1], s[0:1]
	; VI-NEXT: v_mov_b32_e32 v2, s10			; VI-NEXT: v_mov_b32_e32 v2, s10
	; VI-NEXT: v_mov_b32_e32 v3, s11			; VI-NEXT: v_mov_b32_e32 v3, s11
	; VI-NEXT: v_div_scale_f64 v[4:5], s[0:1], v[2:3], v[2:3], s[2:3]			; VI-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], s[2:3]
	; VI-NEXT: v_div_scale_f64 v[10:11], vcc, s[2:3], v[2:3], s[2:3]			; VI-NEXT: v_div_scale_f64 v[10:11], vcc, s[2:3], v[2:3], s[2:3]
	; VI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]			; VI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
	; VI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; VI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; VI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; VI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; VI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; VI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; VI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; VI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; VI-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]			; VI-NEXT: v_mul_f64 v[8:9], v[10:11], v[6:7]
	; VI-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]			; VI-NEXT: v_fma_f64 v[4:5], -v[4:5], v[8:9], v[10:11]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.scale.ll

	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]			; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]
	; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc			; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], v0, v0, v2
	; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]			; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, v0, v2
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_1:			; GFX8-LABEL: test_div_scale_f32_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: v_add_u32_e32 v2, vcc, 4, v0			; GFX8-NEXT: v_add_u32_e32 v2, vcc, 4, v0
	; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1] glc			; GFX8-NEXT: flat_load_dword v0, v[0:1] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: flat_load_dword v1, v[2:3] glc			; GFX8-NEXT: flat_load_dword v1, v[2:3] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v1, v1, v0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_1:			; GFX10-LABEL: test_div_scale_f32_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc			; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc			; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s2, v2, v2, v1			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, v2, v2, v1
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_1:			; GFX11-LABEL: test_div_scale_f32_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc			; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc			; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, v0, v0, v1			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, v1
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]			; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]
	; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc			; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], v2, v0, v2
	; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]			; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, v2, v0, v2
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_2:			; GFX8-LABEL: test_div_scale_f32_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: v_add_u32_e32 v2, vcc, 4, v0			; GFX8-NEXT: v_add_u32_e32 v2, vcc, 4, v0
	; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1] glc			; GFX8-NEXT: flat_load_dword v0, v[0:1] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: flat_load_dword v1, v[2:3] glc			; GFX8-NEXT: flat_load_dword v1, v[2:3] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v0, v1, v0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v0, v1, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_2:			; GFX10-LABEL: test_div_scale_f32_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc			; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc			; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s2, v1, v2, v1			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, v1, v2, v1
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_2:			; GFX11-LABEL: test_div_scale_f32_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc			; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc			; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, v1, v0, v1			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, v1, v0, v1
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2			; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
	; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX7-NEXT: v_add_i32_e32 v2, vcc, 8, v0			; GFX7-NEXT: v_add_i32_e32 v2, vcc, 8, v0
	; GFX7-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc			; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc			; GFX7-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[2:3], v[2:3], v[0:1]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, v[2:3], v[2:3], v[0:1]
	; GFX7-NEXT: v_mov_b32_e32 v3, s1			; GFX7-NEXT: v_mov_b32_e32 v3, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s0			; GFX7-NEXT: v_mov_b32_e32 v2, s0
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_1:			; GFX8-LABEL: test_div_scale_f64_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: v_add_u32_e32 v2, vcc, 8, v0			; GFX8-NEXT: v_add_u32_e32 v2, vcc, 8, v0
	; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc			; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc			; GFX8-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[2:3], v[2:3], v[0:1]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, v[2:3], v[2:3], v[0:1]
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_1:			; GFX10-LABEL: test_div_scale_f64_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX10-NEXT: v_lshlrev_b32_e32 v4, 3, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v4, 3, v0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v4, s[2:3] glc dlc			; GFX10-NEXT: global_load_dwordx2 v[0:1], v4, s[2:3] glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:8 glc dlc			; GFX10-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:8 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, v[2:3], v[2:3], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[2:3], v[2:3], v[0:1]
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_1:			; GFX11-LABEL: test_div_scale_f64_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x34			; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x34
	; GFX11-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b64 v[0:1], v2, s[2:3] glc dlc			; GFX11-NEXT: global_load_b64 v[0:1], v2, s[2:3] glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_load_b64 v[2:3], v2, s[2:3] offset:8 glc dlc			; GFX11-NEXT: global_load_b64 v[2:3], v2, s[2:3] offset:8 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, v[2:3], v[2:3], v[0:1]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[2:3], v[2:3], v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1

	; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2			; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
	; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX7-NEXT: v_add_i32_e32 v2, vcc, 8, v0			; GFX7-NEXT: v_add_i32_e32 v2, vcc, 8, v0
	; GFX7-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc			; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc			; GFX7-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[2:3], v[0:1]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[2:3], v[0:1]
	; GFX7-NEXT: v_mov_b32_e32 v3, s1			; GFX7-NEXT: v_mov_b32_e32 v3, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s0			; GFX7-NEXT: v_mov_b32_e32 v2, s0
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_2:			; GFX8-LABEL: test_div_scale_f64_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: v_add_u32_e32 v2, vcc, 8, v0			; GFX8-NEXT: v_add_u32_e32 v2, vcc, 8, v0
	; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc			; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc			; GFX8-NEXT: flat_load_dwordx2 v[2:3], v[2:3] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[2:3], v[0:1]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[2:3], v[0:1]
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_2:			; GFX10-LABEL: test_div_scale_f64_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX10-NEXT: v_lshlrev_b32_e32 v4, 3, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v4, 3, v0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v4, s[2:3] glc dlc			; GFX10-NEXT: global_load_dwordx2 v[0:1], v4, s[2:3] glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:8 glc dlc			; GFX10-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:8 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, v[0:1], v[2:3], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[0:1], v[2:3], v[0:1]
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_2:			; GFX11-LABEL: test_div_scale_f64_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x34			; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x34
	; GFX11-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b64 v[0:1], v2, s[2:3] glc dlc			; GFX11-NEXT: global_load_b64 v[0:1], v2, s[2:3] glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_load_b64 v[2:3], v2, s[2:3] offset:8 glc dlc			; GFX11-NEXT: global_load_b64 v[2:3], v2, s[2:3] offset:8 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, v[0:1], v[2:3], v[0:1]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[0:1], v[2:3], v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1

	; GFX7-NEXT: s_mov_b32 s2, 0			; GFX7-NEXT: s_mov_b32 s2, 0
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]			; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]			; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[0:1], v0, v0, s8			; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, v0, s8
	; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_scalar_num_1:			; GFX8-LABEL: test_div_scale_f32_scalar_num_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dword s0, s[0:1], 0x54			; GFX8-NEXT: s_load_dword s0, s[0:1], 0x54
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1]			; GFX8-NEXT: flat_load_dword v0, v[0:1]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[0:1], v0, v0, s0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v0, v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v0, s4			; GFX8-NEXT: v_mov_b32_e32 v0, s4
	; GFX8-NEXT: v_mov_b32_e32 v1, s5			; GFX8-NEXT: v_mov_b32_e32 v1, s5
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_scalar_num_1:			; GFX10-LABEL: test_div_scale_f32_scalar_num_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x54			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x54
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v0, v0, s[6:7]			; GFX10-NEXT: global_load_dword v0, v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s0, v0, v0, s0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, s0
	; GFX10-NEXT: global_store_dword v1, v0, s[4:5]			; GFX10-NEXT: global_store_dword v1, v0, s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_scalar_num_1:			; GFX11-LABEL: test_div_scale_f32_scalar_num_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x54			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x54
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]			; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, v0, v0, s0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, s0
	; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]			; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%b = load float, float addrspace(1)* %gep, align 4			%b = load float, float addrspace(1)* %gep, align 4

	; GFX7-NEXT: s_mov_b32 s2, 0			; GFX7-NEXT: s_mov_b32 s2, 0
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]			; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]			; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[0:1], s8, v0, s8			; GFX7-NEXT: v_div_scale_f32 v0, vcc, s8, v0, s8
	; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_scalar_num_2:			; GFX8-LABEL: test_div_scale_f32_scalar_num_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dword s0, s[0:1], 0x34			; GFX8-NEXT: s_load_dword s0, s[0:1], 0x34
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1]			; GFX8-NEXT: flat_load_dword v0, v[0:1]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[0:1], s0, v0, s0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, s0, v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v0, s4			; GFX8-NEXT: v_mov_b32_e32 v0, s4
	; GFX8-NEXT: v_mov_b32_e32 v1, s5			; GFX8-NEXT: v_mov_b32_e32 v1, s5
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_scalar_num_2:			; GFX10-LABEL: test_div_scale_f32_scalar_num_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x34			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v0, v0, s[6:7]			; GFX10-NEXT: global_load_dword v0, v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s0, s0, v0, s0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, s0, v0, s0
	; GFX10-NEXT: global_store_dword v1, v0, s[4:5]			; GFX10-NEXT: global_store_dword v1, v0, s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_scalar_num_2:			; GFX11-LABEL: test_div_scale_f32_scalar_num_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x34			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x34
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]			; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, s0, v0, s0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, s0, v0, s0
	; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]			; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%b = load float, float addrspace(1)* %gep, align 4			%b = load float, float addrspace(1)* %gep, align 4

	; GFX7-NEXT: s_mov_b32 s2, 0			; GFX7-NEXT: s_mov_b32 s2, 0
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]			; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]			; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[0:1], s8, s8, v0			; GFX7-NEXT: v_div_scale_f32 v0, vcc, s8, s8, v0
	; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_scalar_den_1:			; GFX8-LABEL: test_div_scale_f32_scalar_den_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dword s0, s[0:1], 0x34			; GFX8-NEXT: s_load_dword s0, s[0:1], 0x34
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1]			; GFX8-NEXT: flat_load_dword v0, v[0:1]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[0:1], s0, s0, v0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, s0, s0, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, s4			; GFX8-NEXT: v_mov_b32_e32 v0, s4
	; GFX8-NEXT: v_mov_b32_e32 v1, s5			; GFX8-NEXT: v_mov_b32_e32 v1, s5
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_scalar_den_1:			; GFX10-LABEL: test_div_scale_f32_scalar_den_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x34			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v0, v0, s[6:7]			; GFX10-NEXT: global_load_dword v0, v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s0, s0, s0, v0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, s0, s0, v0
	; GFX10-NEXT: global_store_dword v1, v0, s[4:5]			; GFX10-NEXT: global_store_dword v1, v0, s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_scalar_den_1:			; GFX11-LABEL: test_div_scale_f32_scalar_den_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x34			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x34
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]			; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, s0, s0, v0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, s0, s0, v0
	; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]			; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%a = load float, float addrspace(1)* %gep, align 4			%a = load float, float addrspace(1)* %gep, align 4

	; GFX7-NEXT: s_mov_b32 s2, 0			; GFX7-NEXT: s_mov_b32 s2, 0
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]			; GFX7-NEXT: s_mov_b64 s[0:1], s[6:7]
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[0:3], 0 addr64
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]			; GFX7-NEXT: s_mov_b64 s[6:7], s[2:3]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[0:1], v0, s8, v0			; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, s8, v0
	; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_scalar_den_2:			; GFX8-LABEL: test_div_scale_f32_scalar_den_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dword s0, s[0:1], 0x34			; GFX8-NEXT: s_load_dword s0, s[0:1], 0x34
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1]			; GFX8-NEXT: flat_load_dword v0, v[0:1]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[0:1], v0, s0, v0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v0, s0, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, s4			; GFX8-NEXT: v_mov_b32_e32 v0, s4
	; GFX8-NEXT: v_mov_b32_e32 v1, s5			; GFX8-NEXT: v_mov_b32_e32 v1, s5
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_scalar_den_2:			; GFX10-LABEL: test_div_scale_f32_scalar_den_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x34			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v0, v0, s[6:7]			; GFX10-NEXT: global_load_dword v0, v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s0, v0, s0, v0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, v0, s0, v0
	; GFX10-NEXT: global_store_dword v1, v0, s[4:5]			; GFX10-NEXT: global_store_dword v1, v0, s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_scalar_den_2:			; GFX11-LABEL: test_div_scale_f32_scalar_den_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x34			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x34
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]			; GFX11-NEXT: global_load_b32 v0, v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, v0, s0, v0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, v0, s0, v0
	; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]			; GFX11-NEXT: global_store_b32 v1, v0, s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%a = load float, float addrspace(1)* %gep, align 4			%a = load float, float addrspace(1)* %gep, align 4

	; GFX7-NEXT: v_mov_b32_e32 v0, s6			; GFX7-NEXT: v_mov_b32_e32 v0, s6
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2			; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
	; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX7-NEXT: v_mov_b32_e32 v2, s4			; GFX7-NEXT: v_mov_b32_e32 v2, s4
	; GFX7-NEXT: v_mov_b32_e32 v3, s5			; GFX7-NEXT: v_mov_b32_e32 v3, s5
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[0:1], v[0:1], v[0:1], s[0:1]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[0:1], s[0:1]
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_scalar_num_1:			; GFX8-LABEL: test_div_scale_f64_scalar_num_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX8-NEXT: v_mov_b32_e32 v2, s4			; GFX8-NEXT: v_mov_b32_e32 v2, s4
	; GFX8-NEXT: v_mov_b32_e32 v3, s5			; GFX8-NEXT: v_mov_b32_e32 v3, s5
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[0:1], v[0:1], v[0:1], s[0:1]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[0:1], s[0:1]
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_scalar_num_1:			; GFX10-LABEL: test_div_scale_f64_scalar_num_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]			; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s0, v[0:1], v[0:1], s[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[0:1], v[0:1], s[0:1]
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_scalar_num_1:			; GFX11-LABEL: test_div_scale_f64_scalar_num_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]			; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, v[0:1], v[0:1], s[0:1]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[0:1], v[0:1], s[0:1]
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%b = load double, double addrspace(1)* %gep, align 8			%b = load double, double addrspace(1)* %gep, align 8

	; GFX7-NEXT: v_mov_b32_e32 v0, s6			; GFX7-NEXT: v_mov_b32_e32 v0, s6
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2			; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
	; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX7-NEXT: v_mov_b32_e32 v2, s4			; GFX7-NEXT: v_mov_b32_e32 v2, s4
	; GFX7-NEXT: v_mov_b32_e32 v3, s5			; GFX7-NEXT: v_mov_b32_e32 v3, s5
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[0:1], s[0:1], v[0:1], s[0:1]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, s[0:1], v[0:1], s[0:1]
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_scalar_num_2:			; GFX8-LABEL: test_div_scale_f64_scalar_num_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX8-NEXT: v_mov_b32_e32 v2, s4			; GFX8-NEXT: v_mov_b32_e32 v2, s4
	; GFX8-NEXT: v_mov_b32_e32 v3, s5			; GFX8-NEXT: v_mov_b32_e32 v3, s5
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[0:1], s[0:1], v[0:1], s[0:1]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, s[0:1], v[0:1], s[0:1]
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_scalar_num_2:			; GFX10-LABEL: test_div_scale_f64_scalar_num_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]			; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s0, s[0:1], v[0:1], s[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[0:1], v[0:1], s[0:1]
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_scalar_num_2:			; GFX11-LABEL: test_div_scale_f64_scalar_num_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]			; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, s[0:1], v[0:1], s[0:1]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[0:1], v[0:1], s[0:1]
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%b = load double, double addrspace(1)* %gep, align 8			%b = load double, double addrspace(1)* %gep, align 8

	; GFX7-NEXT: v_mov_b32_e32 v0, s6			; GFX7-NEXT: v_mov_b32_e32 v0, s6
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2			; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
	; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX7-NEXT: v_mov_b32_e32 v2, s4			; GFX7-NEXT: v_mov_b32_e32 v2, s4
	; GFX7-NEXT: v_mov_b32_e32 v3, s5			; GFX7-NEXT: v_mov_b32_e32 v3, s5
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[0:1], s[0:1], s[0:1], v[0:1]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, s[0:1], s[0:1], v[0:1]
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_scalar_den_1:			; GFX8-LABEL: test_div_scale_f64_scalar_den_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX8-NEXT: v_mov_b32_e32 v2, s4			; GFX8-NEXT: v_mov_b32_e32 v2, s4
	; GFX8-NEXT: v_mov_b32_e32 v3, s5			; GFX8-NEXT: v_mov_b32_e32 v3, s5
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[0:1], s[0:1], s[0:1], v[0:1]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, s[0:1], s[0:1], v[0:1]
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_scalar_den_1:			; GFX10-LABEL: test_div_scale_f64_scalar_den_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]			; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s0, s[0:1], s[0:1], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[0:1], s[0:1], v[0:1]
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_scalar_den_1:			; GFX11-LABEL: test_div_scale_f64_scalar_den_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]			; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, s[0:1], s[0:1], v[0:1]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[0:1], s[0:1], v[0:1]
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%a = load double, double addrspace(1)* %gep, align 8			%a = load double, double addrspace(1)* %gep, align 8

	; GFX7-NEXT: v_mov_b32_e32 v0, s6			; GFX7-NEXT: v_mov_b32_e32 v0, s6
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2			; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
	; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX7-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX7-NEXT: v_mov_b32_e32 v2, s4			; GFX7-NEXT: v_mov_b32_e32 v2, s4
	; GFX7-NEXT: v_mov_b32_e32 v3, s5			; GFX7-NEXT: v_mov_b32_e32 v3, s5
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[0:1], v[0:1], s[0:1], v[0:1]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], s[0:1], v[0:1]
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_scalar_den_2:			; GFX8-LABEL: test_div_scale_f64_scalar_den_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 3, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	; GFX8-NEXT: v_mov_b32_e32 v1, s7			; GFX8-NEXT: v_mov_b32_e32 v1, s7
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]			; GFX8-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
	; GFX8-NEXT: v_mov_b32_e32 v2, s4			; GFX8-NEXT: v_mov_b32_e32 v2, s4
	; GFX8-NEXT: v_mov_b32_e32 v3, s5			; GFX8-NEXT: v_mov_b32_e32 v3, s5
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[0:1], v[0:1], s[0:1], v[0:1]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], s[0:1], v[0:1]
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_scalar_den_2:			; GFX10-LABEL: test_div_scale_f64_scalar_den_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x54
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]			; GFX10-NEXT: global_load_dwordx2 v[0:1], v0, s[6:7]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s0, v[0:1], s[0:1], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[0:1], s[0:1], v[0:1]
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_scalar_den_2:			; GFX11-LABEL: test_div_scale_f64_scalar_den_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x54
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]			; GFX11-NEXT: global_load_b64 v[0:1], v0, s[6:7]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, v[0:1], s[0:1], v[0:1]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, v[0:1], s[0:1], v[0:1]
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%a = load double, double addrspace(1)* %gep, align 8			%a = load double, double addrspace(1)* %gep, align 8

	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true)			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true)
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f32_all_scalar_1(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) {			define amdgpu_kernel void @test_div_scale_f32_all_scalar_1(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) {
	; GFX7-LABEL: test_div_scale_f32_all_scalar_1:			; GFX7-LABEL: test_div_scale_f32_all_scalar_1:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dword s3, s[0:1], 0x1c			; GFX7-NEXT: s_load_dword s3, s[0:1], 0x1c
	; GFX7-NEXT: s_load_dword s4, s[0:1], 0x13			; GFX7-NEXT: s_load_dword s4, s[0:1], 0x13
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_mov_b32_e32 v0, s3			; GFX7-NEXT: v_mov_b32_e32 v0, s3
	; GFX7-NEXT: v_div_scale_f32 v0, s[4:5], v0, v0, s4			; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, v0, s4
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_all_scalar_1:			; GFX8-LABEL: test_div_scale_f32_all_scalar_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dword s2, s[0:1], 0x70			; GFX8-NEXT: s_load_dword s2, s[0:1], 0x70
	; GFX8-NEXT: s_load_dword s3, s[0:1], 0x4c			; GFX8-NEXT: s_load_dword s3, s[0:1], 0x4c
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v0, v0, s3			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v0, v0, s3
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_all_scalar_1:			; GFX10-LABEL: test_div_scale_f32_all_scalar_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x2			; GFX10-NEXT: s_clause 0x2
	; GFX10-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10-NEXT: s_load_dword s5, s[0:1], 0x70			; GFX10-NEXT: s_load_dword s5, s[0:1], 0x70
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s0, s5, s5, s4			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, s5, s5, s4
	; GFX10-NEXT: global_store_dword v1, v0, s[2:3]			; GFX10-NEXT: global_store_dword v1, v0, s[2:3]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_all_scalar_1:			; GFX11-LABEL: test_div_scale_f32_all_scalar_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x2			; GFX11-NEXT: s_clause 0x2
	; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x4c			; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x4c
	; GFX11-NEXT: s_load_b32 s3, s[0:1], 0x70			; GFX11-NEXT: s_load_b32 s3, s[0:1], 0x70
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, s3, s3, s2			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, s3, s3, s2
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f32_all_scalar_2(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) {			define amdgpu_kernel void @test_div_scale_f32_all_scalar_2(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) {
	; GFX7-LABEL: test_div_scale_f32_all_scalar_2:			; GFX7-LABEL: test_div_scale_f32_all_scalar_2:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dword s3, s[0:1], 0x1c			; GFX7-NEXT: s_load_dword s3, s[0:1], 0x1c
	; GFX7-NEXT: s_load_dword s4, s[0:1], 0x13			; GFX7-NEXT: s_load_dword s4, s[0:1], 0x13
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_mov_b32_e32 v0, s3			; GFX7-NEXT: v_mov_b32_e32 v0, s3
	; GFX7-NEXT: v_div_scale_f32 v0, s[4:5], s4, v0, s4			; GFX7-NEXT: v_div_scale_f32 v0, vcc, s4, v0, s4
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_all_scalar_2:			; GFX8-LABEL: test_div_scale_f32_all_scalar_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dword s2, s[0:1], 0x70			; GFX8-NEXT: s_load_dword s2, s[0:1], 0x70
	; GFX8-NEXT: s_load_dword s3, s[0:1], 0x4c			; GFX8-NEXT: s_load_dword s3, s[0:1], 0x4c
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], s3, v0, s3			; GFX8-NEXT: v_div_scale_f32 v2, vcc, s3, v0, s3
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_all_scalar_2:			; GFX10-LABEL: test_div_scale_f32_all_scalar_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x2			; GFX10-NEXT: s_clause 0x2
	; GFX10-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10-NEXT: s_load_dword s5, s[0:1], 0x70			; GFX10-NEXT: s_load_dword s5, s[0:1], 0x70
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s0, s4, s5, s4			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, s4, s5, s4
	; GFX10-NEXT: global_store_dword v1, v0, s[2:3]			; GFX10-NEXT: global_store_dword v1, v0, s[2:3]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_all_scalar_2:			; GFX11-LABEL: test_div_scale_f32_all_scalar_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x2			; GFX11-NEXT: s_clause 0x2
	; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x4c			; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x4c
	; GFX11-NEXT: s_load_b32 s3, s[0:1], 0x70			; GFX11-NEXT: s_load_b32 s3, s[0:1], 0x70
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, s2, s3, s2			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, s2, s3, s2
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f64_all_scalar_1(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) {			define amdgpu_kernel void @test_div_scale_f64_all_scalar_1(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) {
	; GFX7-LABEL: test_div_scale_f64_all_scalar_1:			; GFX7-LABEL: test_div_scale_f64_all_scalar_1:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x1d			; GFX7-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x1d
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x13			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x13
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_mov_b32_e32 v0, s2			; GFX7-NEXT: v_mov_b32_e32 v0, s2
	; GFX7-NEXT: v_mov_b32_e32 v1, s3			; GFX7-NEXT: v_mov_b32_e32 v1, s3
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[0:1], s[4:5]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[0:1], s[4:5]
	; GFX7-NEXT: v_mov_b32_e32 v3, s1			; GFX7-NEXT: v_mov_b32_e32 v3, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s0			; GFX7-NEXT: v_mov_b32_e32 v2, s0
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_all_scalar_1:			; GFX8-LABEL: test_div_scale_f64_all_scalar_1:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x74			; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x74
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x4c			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x4c
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[0:1], s[4:5]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[0:1], s[4:5]
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_all_scalar_1:			; GFX10-LABEL: test_div_scale_f64_all_scalar_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, s[4:5], s[4:5], s[2:3]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[4:5], s[4:5], s[2:3]
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_all_scalar_1:			; GFX11-LABEL: test_div_scale_f64_all_scalar_1:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x2			; GFX11-NEXT: s_clause 0x2
	; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x4c			; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x4c
	; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x74			; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x74
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, s[4:5], s[4:5], s[2:3]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[4:5], s[4:5], s[2:3]
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false)			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false)
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f64_all_scalar_2(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) {			define amdgpu_kernel void @test_div_scale_f64_all_scalar_2(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) {
	; GFX7-LABEL: test_div_scale_f64_all_scalar_2:			; GFX7-LABEL: test_div_scale_f64_all_scalar_2:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x1d			; GFX7-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x1d
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x13			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x13
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_mov_b32_e32 v0, s2			; GFX7-NEXT: v_mov_b32_e32 v0, s2
	; GFX7-NEXT: v_mov_b32_e32 v1, s3			; GFX7-NEXT: v_mov_b32_e32 v1, s3
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[2:3], s[4:5], v[0:1], s[4:5]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, s[4:5], v[0:1], s[4:5]
	; GFX7-NEXT: v_mov_b32_e32 v3, s1			; GFX7-NEXT: v_mov_b32_e32 v3, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s0			; GFX7-NEXT: v_mov_b32_e32 v2, s0
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_all_scalar_2:			; GFX8-LABEL: test_div_scale_f64_all_scalar_2:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x74			; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x74
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x4c			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x4c
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], s[4:5], v[0:1], s[4:5]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, s[4:5], v[0:1], s[4:5]
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_all_scalar_2:			; GFX10-LABEL: test_div_scale_f64_all_scalar_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, s[2:3], s[4:5], s[2:3]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[2:3], s[4:5], s[2:3]
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_all_scalar_2:			; GFX11-LABEL: test_div_scale_f64_all_scalar_2:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x2			; GFX11-NEXT: s_clause 0x2
	; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x4c			; GFX11-NEXT: s_load_b64 s[2:3], s[0:1], 0x4c
	; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x74			; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x74
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, s[2:3], s[4:5], s[2:3]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[2:3], s[4:5], s[2:3]
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true)			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true)
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f32_inline_imm_num(float addrspace(1)* %out, float addrspace(1)* %in) {			define amdgpu_kernel void @test_div_scale_f32_inline_imm_num(float addrspace(1)* %out, float addrspace(1)* %in) {
	; GFX7-LABEL: test_div_scale_f32_inline_imm_num:			; GFX7-LABEL: test_div_scale_f32_inline_imm_num:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9
	; GFX7-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX7-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX7-NEXT: v_mov_b32_e32 v1, 0			; GFX7-NEXT: v_mov_b32_e32 v1, 0
	; GFX7-NEXT: s_mov_b32 s6, 0			; GFX7-NEXT: s_mov_b32 s6, 0
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]			; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], v0, v0, 1.0
	; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]			; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]
				; GFX7-NEXT: s_waitcnt vmcnt(0)
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, v0, 1.0
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_inline_imm_num:			; GFX8-LABEL: test_div_scale_f32_inline_imm_num:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1]			; GFX8-NEXT: flat_load_dword v0, v[0:1]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v0, v0, 1.0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v0, v0, 1.0
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_inline_imm_num:			; GFX10-LABEL: test_div_scale_f32_inline_imm_num:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v0, v0, s[2:3]			; GFX10-NEXT: global_load_dword v0, v0, s[2:3]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s2, v0, v0, 1.0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, 1.0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_inline_imm_num:			; GFX11-LABEL: test_div_scale_f32_inline_imm_num:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[2:3]			; GFX11-NEXT: global_load_b32 v0, v0, s[2:3]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, v0, v0, 1.0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, 1.0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%a = load float, float addrspace(1)* %gep.0, align 4			%a = load float, float addrspace(1)* %gep.0, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 1.0, float %a, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 1.0, float %a, i1 false)
	; GFX7-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX7-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX7-NEXT: v_mov_b32_e32 v1, 0			; GFX7-NEXT: v_mov_b32_e32 v1, 0
	; GFX7-NEXT: s_mov_b32 s6, 0			; GFX7-NEXT: s_mov_b32 s6, 0
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]			; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], 2.0, 2.0, v0
	; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]			; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]
				; GFX7-NEXT: s_waitcnt vmcnt(0)
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, 2.0, 2.0, v0
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_inline_imm_den:			; GFX8-LABEL: test_div_scale_f32_inline_imm_den:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1]			; GFX8-NEXT: flat_load_dword v0, v[0:1]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], 2.0, 2.0, v0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, 2.0, 2.0, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_inline_imm_den:			; GFX10-LABEL: test_div_scale_f32_inline_imm_den:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v0, v0, s[2:3]			; GFX10-NEXT: global_load_dword v0, v0, s[2:3]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s2, 2.0, 2.0, v0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, 2.0, 2.0, v0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_inline_imm_den:			; GFX11-LABEL: test_div_scale_f32_inline_imm_den:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[2:3]			; GFX11-NEXT: global_load_b32 v0, v0, s[2:3]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, 2.0, 2.0, v0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, 2.0, 2.0, v0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%a = load float, float addrspace(1)* %gep.0, align 4			%a = load float, float addrspace(1)* %gep.0, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float 2.0, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float 2.0, i1 false)
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]			; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]
	; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc			; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: v_and_b32_e32 v1, 0x7fffffff, v2
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], v0, v0, v1
	; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]			; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]
				; GFX7-NEXT: v_and_b32_e32 v1, 0x7fffffff, v2
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, v0, v1
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_fabs_num:			; GFX8-LABEL: test_div_scale_f32_fabs_num:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v2, v[0:1] glc			; GFX8-NEXT: flat_load_dword v2, v[0:1] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, 4, v0			; GFX8-NEXT: v_add_u32_e32 v0, vcc, 4, v0
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1] glc			; GFX8-NEXT: flat_load_dword v0, v[0:1] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_and_b32_e32 v1, 0x7fffffff, v2			; GFX8-NEXT: v_and_b32_e32 v1, 0x7fffffff, v2
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v0, v0, v1			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v0, v0, v1
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_fabs_num:			; GFX10-LABEL: test_div_scale_f32_fabs_num:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc			; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc			; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_and_b32_e32 v0, 0x7fffffff, v1			; GFX10-NEXT: v_and_b32_e32 v0, 0x7fffffff, v1
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_div_scale_f32 v0, s2, v2, v2, v0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, v2, v2, v0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_fabs_num:			; GFX11-LABEL: test_div_scale_f32_fabs_num:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc			; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc			; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1			; GFX11-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_div_scale_f32 v0, null, v0, v0, v1			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, v1
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]			; GFX7-NEXT: s_mov_b64 s[4:5], s[2:3]
	; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc			; GFX7-NEXT: buffer_load_dword v2, v[0:1], s[4:7], 0 addr64 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc			; GFX7-NEXT: buffer_load_dword v0, v[0:1], s[4:7], 0 addr64 offset:4 glc
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], v0, v0, v2
	; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]			; GFX7-NEXT: s_mov_b64 s[2:3], s[6:7]
				; GFX7-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, v0, v2
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_fabs_den:			; GFX8-LABEL: test_div_scale_f32_fabs_den:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX8-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; GFX8-NEXT: v_add_u32_e32 v2, vcc, 4, v0			; GFX8-NEXT: v_add_u32_e32 v2, vcc, 4, v0
	; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; GFX8-NEXT: flat_load_dword v0, v[0:1] glc			; GFX8-NEXT: flat_load_dword v0, v[0:1] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: flat_load_dword v1, v[2:3] glc			; GFX8-NEXT: flat_load_dword v1, v[2:3] glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1			; GFX8-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v1, v1, v0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_fabs_den:			; GFX10-LABEL: test_div_scale_f32_fabs_den:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc			; GFX10-NEXT: global_load_dword v1, v0, s[2:3] glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc			; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_and_b32_e32 v0, 0x7fffffff, v2			; GFX10-NEXT: v_and_b32_e32 v0, 0x7fffffff, v2
	; GFX10-NEXT: v_div_scale_f32 v0, s2, v0, v0, v1			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, v1
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_fabs_den:			; GFX11-LABEL: test_div_scale_f32_fabs_den:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc			; GFX11-NEXT: global_load_b32 v1, v0, s[2:3] glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc			; GFX11-NEXT: global_load_b32 v0, v0, s[2:3] offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0			; GFX11-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_div_scale_f32 v0, null, v0, v0, v1			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, v0, v0, v1
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	%a = load volatile float, float addrspace(1)* %gep.0, align 4			%a = load volatile float, float addrspace(1)* %gep.0, align 4
	%b = load volatile float, float addrspace(1)* %gep.1, align 4			%b = load volatile float, float addrspace(1)* %gep.1, align 4

	%b.fabs = call float @llvm.fabs.f32(float %b)			%b.fabs = call float @llvm.fabs.f32(float %b)

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b.fabs, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b.fabs, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f32_val_undef_val(float addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f32_val_undef_val(float addrspace(1)* %out) #0 {
	; GFX7-LABEL: test_div_scale_f32_val_undef_val:			; GFX7-LABEL: test_div_scale_f32_val_undef_val:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX7-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], s0, s0, v0
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, s0, s0, v0
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_val_undef_val:			; GFX8-LABEL: test_div_scale_f32_val_undef_val:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX8-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], s0, s0, v0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, s0, s0, v0
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_val_undef_val:			; GFX10-LABEL: test_div_scale_f32_val_undef_val:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s2, s0, s0, 0x41000000			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, s0, s0, 0x41000000
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_val_undef_val:			; GFX11-LABEL: test_div_scale_f32_val_undef_val:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, s0, s0, 0x41000000			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, s0, s0, 0x41000000
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 8.0, float undef, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 8.0, float undef, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f32_undef_val_val(float addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f32_undef_val_val(float addrspace(1)* %out) #0 {
	; GFX7-LABEL: test_div_scale_f32_undef_val_val:			; GFX7-LABEL: test_div_scale_f32_undef_val_val:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX7-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], v0, v0, s0
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, v0, v0, s0
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_undef_val_val:			; GFX8-LABEL: test_div_scale_f32_undef_val_val:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX8-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v0, v0, s0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, v0, v0, s0
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_undef_val_val:			; GFX10-LABEL: test_div_scale_f32_undef_val_val:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s2, 0x41000000, 0x41000000, s0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, 0x41000000, 0x41000000, s0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_undef_val_val:			; GFX11-LABEL: test_div_scale_f32_undef_val_val:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, 0x41000000, 0x41000000, s0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, 0x41000000, 0x41000000, s0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float 8.0, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float 8.0, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f32_undef_undef_val(float addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f32_undef_undef_val(float addrspace(1)* %out) #0 {
	; GFX7-LABEL: test_div_scale_f32_undef_undef_val:			; GFX7-LABEL: test_div_scale_f32_undef_undef_val:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_div_scale_f32 v0, s[2:3], s0, s0, s0
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_div_scale_f32 v0, vcc, s0, s0, s0
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f32_undef_undef_val:			; GFX8-LABEL: test_div_scale_f32_undef_undef_val:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], s0, s0, s0			; GFX8-NEXT: v_div_scale_f32 v2, vcc, s0, s0, s0
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_undef_undef_val:			; GFX10-LABEL: test_div_scale_f32_undef_undef_val:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v0, s2, s0, s0, s0			; GFX10-NEXT: v_div_scale_f32 v0, vcc_lo, s0, s0, s0
	; GFX10-NEXT: global_store_dword v1, v0, s[0:1]			; GFX10-NEXT: global_store_dword v1, v0, s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f32_undef_undef_val:			; GFX11-LABEL: test_div_scale_f32_undef_undef_val:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v0, null, s0, s0, s0			; GFX11-NEXT: v_div_scale_f32 v0, vcc_lo, s0, s0, s0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float undef, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float undef, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_div_scale_f64_val_undef_val(double addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f64_val_undef_val(double addrspace(1)* %out) #0 {
	; GFX7-LABEL: test_div_scale_f64_val_undef_val:			; GFX7-LABEL: test_div_scale_f64_val_undef_val:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; GFX7-NEXT: s_mov_b32 s2, 0			; GFX7-NEXT: s_mov_b32 s2, 0
	; GFX7-NEXT: s_mov_b32 s3, 0x40200000			; GFX7-NEXT: s_mov_b32 s3, 0x40200000
	; GFX7-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[0:1], s[2:3]			; GFX7-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[0:1], s[2:3]
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_mov_b32_e32 v3, s1			; GFX7-NEXT: v_mov_b32_e32 v3, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s0			; GFX7-NEXT: v_mov_b32_e32 v2, s0
	; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX7-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: test_div_scale_f64_val_undef_val:			; GFX8-LABEL: test_div_scale_f64_val_undef_val:
	; GFX8: ; %bb.0:			; GFX8: ; %bb.0:
	; GFX8-NEXT: s_mov_b32 s2, 0			; GFX8-NEXT: s_mov_b32 s2, 0
	; GFX8-NEXT: s_mov_b32 s3, 0x40200000			; GFX8-NEXT: s_mov_b32 s3, 0x40200000
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[0:1], s[2:3]			; GFX8-NEXT: v_div_scale_f64 v[0:1], vcc, v[0:1], v[0:1], s[2:3]
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_val_undef_val:			; GFX10-LABEL: test_div_scale_f64_val_undef_val:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_mov_b32 s2, 0			; GFX10-NEXT: s_mov_b32 s2, 0
	; GFX10-NEXT: s_mov_b32 s3, 0x40200000			; GFX10-NEXT: s_mov_b32 s3, 0x40200000
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, s[0:1], s[0:1], s[2:3]			; GFX10-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[0:1], s[0:1], s[2:3]
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]			; GFX10-NEXT: global_store_dwordx2 v2, v[0:1], s[0:1]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: test_div_scale_f64_val_undef_val:			; GFX11-LABEL: test_div_scale_f64_val_undef_val:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_mov_b32 s2, 0			; GFX11-NEXT: s_mov_b32 s2, 0
	; GFX11-NEXT: s_mov_b32 s3, 0x40200000			; GFX11-NEXT: s_mov_b32 s3, 0x40200000
	; GFX11-NEXT: v_mov_b32_e32 v2, 0			; GFX11-NEXT: v_mov_b32_e32 v2, 0
	; GFX11-NEXT: v_div_scale_f64 v[0:1], null, s[0:1], s[0:1], s[2:3]			; GFX11-NEXT: v_div_scale_f64 v[0:1], vcc_lo, s[0:1], s[0:1], s[2:3]
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]			; GFX11-NEXT: global_store_b64 v2, v[0:1], s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double 8.0, double undef, i1 false)			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double 8.0, double undef, i1 false)
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8

llvm/test/CodeGen/AMDGPU/fdiv-nofpexcept.ll

	; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stop-after=finalize-isel -o - %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stop-after=finalize-isel -o - %s | FileCheck -check-prefix=GCN %s

	; Make sure nofpexcept flags are emitted when lowering a			; Make sure nofpexcept flags are emitted when lowering a
	; non-constrained fdiv.			; non-constrained fdiv.

	define float @fdiv_f32(float %a, float %b) #0 {			define float @fdiv_f32(float %a, float %b) #0 {
	; GCN-LABEL: name: fdiv_f32			; GCN-LABEL: name: fdiv_f32
	; GCN: bb.0.entry:			; GCN: bb.0.entry:
	; GCN-NEXT: liveins: $vgpr0, $vgpr1			; GCN-NEXT: liveins: $vgpr0, $vgpr1
	; GCN-NEXT: {{ $}}			; GCN-NEXT: {{ $}}
	; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GCN-NEXT: %4:vgpr_32, %5:sreg_64 = nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY1]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %4:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit-def dead $vcc, implicit $mode, implicit $exec
	; GCN-NEXT: %6:vgpr_32, %7:sreg_64 = nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %5:vgpr_32 = nofpexcept V_RCP_F32_e64 0, %4, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %8:vgpr_32 = nofpexcept V_RCP_F32_e64 0, %6, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %6:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY1]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit-def $vcc, implicit $mode, implicit $exec
				; GCN-NEXT: [[COPY2:%[0-9]+]]:sreg_64 = COPY $vcc
	; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 3			; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 3
	; GCN-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sgpr_32 = S_MOV_B32 1065353216			; GCN-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sgpr_32 = S_MOV_B32 1065353216
	; GCN-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 0			; GCN-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 0
	; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_]], 2305, implicit-def $mode, implicit $mode			; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_]], 2305, implicit-def $mode, implicit $mode
	; GCN-NEXT: %12:vgpr_32 = nofpexcept V_FMA_F32_e64 1, %6, 0, %8, 0, killed [[S_MOV_B32_1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %11:vgpr_32 = nofpexcept V_FMA_F32_e64 1, %4, 0, %5, 0, killed [[S_MOV_B32_1]], 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %13:vgpr_32 = nofpexcept V_FMA_F32_e64 0, killed %12, 0, %8, 0, %8, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %12:vgpr_32 = nofpexcept V_FMA_F32_e64 0, killed %11, 0, %5, 0, %5, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %14:vgpr_32 = nofpexcept V_MUL_F32_e64 0, %4, 0, %13, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %13:vgpr_32 = nofpexcept V_MUL_F32_e64 0, %6, 0, %12, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %15:vgpr_32 = nofpexcept V_FMA_F32_e64 1, %6, 0, %14, 0, %4, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %14:vgpr_32 = nofpexcept V_FMA_F32_e64 1, %4, 0, %13, 0, %6, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %16:vgpr_32 = nofpexcept V_FMA_F32_e64 0, killed %15, 0, %13, 0, %14, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %15:vgpr_32 = nofpexcept V_FMA_F32_e64 0, killed %14, 0, %12, 0, %13, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %17:vgpr_32 = nofpexcept V_FMA_F32_e64 1, %6, 0, %16, 0, %4, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %16:vgpr_32 = nofpexcept V_FMA_F32_e64 1, %4, 0, %15, 0, %6, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_2]], 2305, implicit-def dead $mode, implicit $mode			; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_2]], 2305, implicit-def dead $mode, implicit $mode
	; GCN-NEXT: $vcc = COPY %5			; GCN-NEXT: $vcc = COPY [[COPY2]]
	; GCN-NEXT: %18:vgpr_32 = nofpexcept V_DIV_FMAS_F32_e64 0, killed %17, 0, %13, 0, %16, 0, 0, implicit $mode, implicit $vcc, implicit $exec			; GCN-NEXT: %17:vgpr_32 = nofpexcept V_DIV_FMAS_F32_e64 0, killed %16, 0, %12, 0, %15, 0, 0, implicit $mode, implicit $vcc, implicit $exec
	; GCN-NEXT: %19:vgpr_32 = nofpexcept V_DIV_FIXUP_F32_e64 0, killed %18, 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %18:vgpr_32 = nofpexcept V_DIV_FIXUP_F32_e64 0, killed %17, 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: $vgpr0 = COPY %19			; GCN-NEXT: $vgpr0 = COPY %18
	; GCN-NEXT: SI_RETURN implicit $vgpr0			; GCN-NEXT: SI_RETURN implicit $vgpr0
	entry:			entry:
	%fdiv = fdiv float %a, %b			%fdiv = fdiv float %a, %b
	ret float %fdiv			ret float %fdiv
	}			}

	define float @fdiv_nnan_f32(float %a, float %b) #0 {			define float @fdiv_nnan_f32(float %a, float %b) #0 {
	; GCN-LABEL: name: fdiv_nnan_f32			; GCN-LABEL: name: fdiv_nnan_f32
	; GCN: bb.0.entry:			; GCN: bb.0.entry:
	; GCN-NEXT: liveins: $vgpr0, $vgpr1			; GCN-NEXT: liveins: $vgpr0, $vgpr1
	; GCN-NEXT: {{ $}}			; GCN-NEXT: {{ $}}
	; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GCN-NEXT: %4:vgpr_32, %5:sreg_64 = nnan nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY1]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %4:vgpr_32 = nnan nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit-def dead $vcc, implicit $mode, implicit $exec
	; GCN-NEXT: %6:vgpr_32, %7:sreg_64 = nnan nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %5:vgpr_32 = nnan nofpexcept V_RCP_F32_e64 0, %4, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %8:vgpr_32 = nnan nofpexcept V_RCP_F32_e64 0, %6, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %6:vgpr_32 = nnan nofpexcept V_DIV_SCALE_F32_e64 0, [[COPY1]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit-def $vcc, implicit $mode, implicit $exec
				; GCN-NEXT: [[COPY2:%[0-9]+]]:sreg_64 = COPY $vcc
	; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 3			; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 3
	; GCN-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sgpr_32 = S_MOV_B32 1065353216			; GCN-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sgpr_32 = S_MOV_B32 1065353216
	; GCN-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 0			; GCN-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 0
	; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_]], 2305, implicit-def $mode, implicit $mode			; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_]], 2305, implicit-def $mode, implicit $mode
	; GCN-NEXT: %12:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 1, %6, 0, %8, 0, killed [[S_MOV_B32_1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %11:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 1, %4, 0, %5, 0, killed [[S_MOV_B32_1]], 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %13:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 0, killed %12, 0, %8, 0, %8, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %12:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 0, killed %11, 0, %5, 0, %5, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %14:vgpr_32 = nnan nofpexcept V_MUL_F32_e64 0, %4, 0, %13, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %13:vgpr_32 = nnan nofpexcept V_MUL_F32_e64 0, %6, 0, %12, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %15:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 1, %6, 0, %14, 0, %4, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %14:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 1, %4, 0, %13, 0, %6, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %16:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 0, killed %15, 0, %13, 0, %14, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %15:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 0, killed %14, 0, %12, 0, %13, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: %17:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 1, %6, 0, %16, 0, %4, 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %16:vgpr_32 = nnan nofpexcept V_FMA_F32_e64 1, %4, 0, %15, 0, %6, 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_2]], 2305, implicit-def dead $mode, implicit $mode			; GCN-NEXT: S_SETREG_B32_mode killed [[S_MOV_B32_2]], 2305, implicit-def dead $mode, implicit $mode
	; GCN-NEXT: $vcc = COPY %5			; GCN-NEXT: $vcc = COPY [[COPY2]]
	; GCN-NEXT: %18:vgpr_32 = nnan nofpexcept V_DIV_FMAS_F32_e64 0, killed %17, 0, %13, 0, %16, 0, 0, implicit $mode, implicit $vcc, implicit $exec			; GCN-NEXT: %17:vgpr_32 = nnan nofpexcept V_DIV_FMAS_F32_e64 0, killed %16, 0, %12, 0, %15, 0, 0, implicit $mode, implicit $vcc, implicit $exec
	; GCN-NEXT: %19:vgpr_32 = nnan nofpexcept V_DIV_FIXUP_F32_e64 0, killed %18, 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec			; GCN-NEXT: %18:vgpr_32 = nnan nofpexcept V_DIV_FIXUP_F32_e64 0, killed %17, 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $mode, implicit $exec
	; GCN-NEXT: $vgpr0 = COPY %19			; GCN-NEXT: $vgpr0 = COPY %18
	; GCN-NEXT: SI_RETURN implicit $vgpr0			; GCN-NEXT: SI_RETURN implicit $vgpr0
	entry:			entry:
	%fdiv = fdiv nnan float %a, %b			%fdiv = fdiv nnan float %a, %b
	ret float %fdiv			ret float %fdiv
	}			}

	attributes #0 = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }			attributes #0 = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }

llvm/test/CodeGen/AMDGPU/fdiv.f64.ll

	; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=GCN %s


	; GCN-LABEL: {{^}}fdiv_f64:			; GCN-LABEL: {{^}}fdiv_f64:
	; GCN-DAG: buffer_load_dwordx2 [[NUM:v\[[0-9]+:[0-9]+\]]], off, {{s\[[0-9]+:[0-9]+\]}}, 0			; GCN-DAG: buffer_load_dwordx2 [[NUM:v\[[0-9]+:[0-9]+\]]], off, {{s\[[0-9]+:[0-9]+\]}}, 0
	; GCN-DAG: buffer_load_dwordx2 [[DEN:v\[[0-9]+:[0-9]+\]]], off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:8			; GCN-DAG: buffer_load_dwordx2 [[DEN:v\[[0-9]+:[0-9]+\]]], off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:8
	; CI-DAG: v_div_scale_f64 [[SCALE0:v\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, [[DEN]], [[DEN]], [[NUM]]			; CI-DAG: v_div_scale_f64 [[SCALE0:v\[[0-9]+:[0-9]+\]]], vcc, [[DEN]], [[DEN]], [[NUM]]
	; CI-DAG: v_div_scale_f64 [[SCALE1:v\[[0-9]+:[0-9]+\]]], vcc, [[NUM]], [[DEN]], [[NUM]]			; CI-DAG: v_div_scale_f64 [[SCALE1:v\[[0-9]+:[0-9]+\]]], vcc, [[NUM]], [[DEN]], [[NUM]]

	; Check for div_scale bug workaround on SI			; Check for div_scale bug workaround on SI
	; SI-DAG: v_div_scale_f64 [[SCALE0:v\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, [[DEN]], [[DEN]], [[NUM]]			; SI-DAG: v_div_scale_f64 [[SCALE0:v\[[0-9]+:[0-9]+\]]], vcc, [[DEN]], [[DEN]], [[NUM]]
	; SI-DAG: v_div_scale_f64 [[SCALE1:v\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, [[NUM]], [[DEN]], [[NUM]]			; SI-DAG: v_div_scale_f64 [[SCALE1:v\[[0-9]+:[0-9]+\]]], vcc, [[NUM]], [[DEN]], [[NUM]]

	; GCN-DAG: v_rcp_f64_e32 [[RCP_SCALE0:v\[[0-9]+:[0-9]+\]]], [[SCALE0]]			; GCN-DAG: v_rcp_f64_e32 [[RCP_SCALE0:v\[[0-9]+:[0-9]+\]]], [[SCALE0]]

	; SI-DAG: v_cmp_eq_u32_e32 vcc, {{v[0-9]+}}, {{v[0-9]+}}			; SI-DAG: v_cmp_eq_u32_e32 vcc, {{v[0-9]+}}, {{v[0-9]+}}
	; SI-DAG: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], {{v[0-9]+}}, {{v[0-9]+}}			; SI-DAG: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], {{v[0-9]+}}, {{v[0-9]+}}
	; SI-DAG: s_xor_b64 vcc, [[CMP0]], vcc			; SI-DAG: s_xor_b64 vcc, [[CMP0]], vcc

	; GCN-DAG: v_fma_f64 [[FMA0:v\[[0-9]+:[0-9]+\]]], -[[SCALE0]], [[RCP_SCALE0]], 1.0			; GCN-DAG: v_fma_f64 [[FMA0:v\[[0-9]+:[0-9]+\]]], -[[SCALE0]], [[RCP_SCALE0]], 1.0

llvm/test/CodeGen/AMDGPU/frem.ll

	; SI-NEXT: s_mov_b32 s2, s10			; SI-NEXT: s_mov_b32 s2, s10
	; SI-NEXT: s_mov_b32 s3, s11			; SI-NEXT: s_mov_b32 s3, s11
	; SI-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; SI-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: buffer_load_ushort v1, off, s[0:3], 0 offset:8			; SI-NEXT: buffer_load_ushort v1, off, s[0:3], 0 offset:8
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v1			; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
	; SI-NEXT: v_div_scale_f32 v2, vcc, v0, v1, v0			; SI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; SI-NEXT: v_div_scale_f32 v3, s[0:1], v1, v1, v0			; SI-NEXT: v_rcp_f32_e32 v3, v2
	; SI-NEXT: v_rcp_f32_e32 v4, v3			; SI-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v5, -v3, v4, 1.0			; SI-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; SI-NEXT: v_fma_f32 v4, v5, v4, v4			; SI-NEXT: v_fma_f32 v3, v5, v3, v3
	; SI-NEXT: v_mul_f32_e32 v5, v2, v4			; SI-NEXT: v_mul_f32_e32 v5, v4, v3
	; SI-NEXT: v_fma_f32 v6, -v3, v5, v2			; SI-NEXT: v_fma_f32 v6, -v2, v5, v4
	; SI-NEXT: v_fma_f32 v5, v6, v4, v5			; SI-NEXT: v_fma_f32 v5, v6, v3, v5
	; SI-NEXT: v_fma_f32 v2, -v3, v5, v2			; SI-NEXT: v_fma_f32 v2, -v2, v5, v4
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; SI-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; SI-NEXT: v_div_fixup_f32 v2, v2, v1, v0			; SI-NEXT: v_div_fixup_f32 v2, v2, v1, v0
	; SI-NEXT: v_trunc_f32_e32 v2, v2			; SI-NEXT: v_trunc_f32_e32 v2, v2
	; SI-NEXT: v_fma_f32 v0, -v2, v1, v0			; SI-NEXT: v_fma_f32 v0, -v2, v1, v0
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; SI-NEXT: v_cvt_f16_f32_e32 v0, v0			; SI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; SI-NEXT: buffer_store_short v0, off, s[8:11], 0			; SI-NEXT: buffer_store_short v0, off, s[8:11], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; CI-NEXT: s_mov_b32 s7, s11			; CI-NEXT: s_mov_b32 s7, s11
	; CI-NEXT: s_mov_b32 s3, s11			; CI-NEXT: s_mov_b32 s3, s11
	; CI-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; CI-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; CI-NEXT: buffer_load_ushort v1, off, s[0:3], 0 offset:8			; CI-NEXT: buffer_load_ushort v1, off, s[0:3], 0 offset:8
	; CI-NEXT: s_waitcnt vmcnt(1)			; CI-NEXT: s_waitcnt vmcnt(1)
	; CI-NEXT: v_cvt_f32_f16_e32 v0, v0			; CI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_cvt_f32_f16_e32 v1, v1			; CI-NEXT: v_cvt_f32_f16_e32 v1, v1
	; CI-NEXT: v_div_scale_f32 v3, s[0:1], v1, v1, v0			; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; CI-NEXT: v_div_scale_f32 v2, vcc, v0, v1, v0			; CI-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; CI-NEXT: v_rcp_f32_e32 v4, v3			; CI-NEXT: v_rcp_f32_e32 v3, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v5, -v3, v4, 1.0			; CI-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; CI-NEXT: v_fma_f32 v4, v5, v4, v4			; CI-NEXT: v_fma_f32 v3, v5, v3, v3
	; CI-NEXT: v_mul_f32_e32 v5, v2, v4			; CI-NEXT: v_mul_f32_e32 v5, v4, v3
	; CI-NEXT: v_fma_f32 v6, -v3, v5, v2			; CI-NEXT: v_fma_f32 v6, -v2, v5, v4
	; CI-NEXT: v_fma_f32 v5, v6, v4, v5			; CI-NEXT: v_fma_f32 v5, v6, v3, v5
	; CI-NEXT: v_fma_f32 v2, -v3, v5, v2			; CI-NEXT: v_fma_f32 v2, -v2, v5, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; CI-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0			; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0
	; CI-NEXT: v_trunc_f32_e32 v2, v2			; CI-NEXT: v_trunc_f32_e32 v2, v2
	; CI-NEXT: v_fma_f32 v0, -v2, v1, v0			; CI-NEXT: v_fma_f32 v0, -v2, v1, v0
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; CI-NEXT: v_cvt_f16_f32_e32 v0, v0			; CI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; CI-NEXT: buffer_store_short v0, off, s[8:11], 0			; CI-NEXT: buffer_store_short v0, off, s[8:11], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	;			;
	; SI-NEXT: s_mov_b32 s5, s7			; SI-NEXT: s_mov_b32 s5, s7
	; SI-NEXT: s_mov_b32 s6, s10			; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11			; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: s_mov_b32 s2, s10			; SI-NEXT: s_mov_b32 s2, s10
	; SI-NEXT: s_mov_b32 s3, s11			; SI-NEXT: s_mov_b32 s3, s11
	; SI-NEXT: buffer_load_dword v0, off, s[4:7], 0			; SI-NEXT: buffer_load_dword v0, off, s[4:7], 0
	; SI-NEXT: buffer_load_dword v1, off, s[0:3], 0 offset:16			; SI-NEXT: buffer_load_dword v1, off, s[0:3], 0 offset:16
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_div_scale_f32 v2, vcc, v0, v1, v0			; SI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; SI-NEXT: v_div_scale_f32 v3, s[0:1], v1, v1, v0			; SI-NEXT: v_rcp_f32_e32 v3, v2
	; SI-NEXT: v_rcp_f32_e32 v4, v3			; SI-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v5, -v3, v4, 1.0			; SI-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; SI-NEXT: v_fma_f32 v4, v5, v4, v4			; SI-NEXT: v_fma_f32 v3, v5, v3, v3
	; SI-NEXT: v_mul_f32_e32 v5, v2, v4			; SI-NEXT: v_mul_f32_e32 v5, v4, v3
	; SI-NEXT: v_fma_f32 v6, -v3, v5, v2			; SI-NEXT: v_fma_f32 v6, -v2, v5, v4
	; SI-NEXT: v_fma_f32 v5, v6, v4, v5			; SI-NEXT: v_fma_f32 v5, v6, v3, v5
	; SI-NEXT: v_fma_f32 v2, -v3, v5, v2			; SI-NEXT: v_fma_f32 v2, -v2, v5, v4
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; SI-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; SI-NEXT: v_div_fixup_f32 v2, v2, v1, v0			; SI-NEXT: v_div_fixup_f32 v2, v2, v1, v0
	; SI-NEXT: v_trunc_f32_e32 v2, v2			; SI-NEXT: v_trunc_f32_e32 v2, v2
	; SI-NEXT: v_fma_f32 v0, -v2, v1, v0			; SI-NEXT: v_fma_f32 v0, -v2, v1, v0
	; SI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; SI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; CI-LABEL: frem_f32:			; CI-LABEL: frem_f32:
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; CI-NEXT: s_mov_b32 s11, 0xf000			; CI-NEXT: s_mov_b32 s11, 0xf000
	; CI-NEXT: s_mov_b32 s10, -1			; CI-NEXT: s_mov_b32 s10, -1
	; CI-NEXT: s_mov_b32 s2, s10			; CI-NEXT: s_mov_b32 s2, s10
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_mov_b32 s8, s4			; CI-NEXT: s_mov_b32 s8, s4
	; CI-NEXT: s_mov_b32 s9, s5			; CI-NEXT: s_mov_b32 s9, s5
	; CI-NEXT: s_mov_b32 s4, s6			; CI-NEXT: s_mov_b32 s4, s6
	; CI-NEXT: s_mov_b32 s5, s7			; CI-NEXT: s_mov_b32 s5, s7
	; CI-NEXT: s_mov_b32 s6, s10			; CI-NEXT: s_mov_b32 s6, s10
	; CI-NEXT: s_mov_b32 s7, s11			; CI-NEXT: s_mov_b32 s7, s11
	; CI-NEXT: s_mov_b32 s3, s11			; CI-NEXT: s_mov_b32 s3, s11
	; CI-NEXT: buffer_load_dword v0, off, s[4:7], 0			; CI-NEXT: buffer_load_dword v0, off, s[4:7], 0
	; CI-NEXT: buffer_load_dword v1, off, s[0:3], 0 offset:16			; CI-NEXT: buffer_load_dword v1, off, s[0:3], 0 offset:16
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_div_scale_f32 v3, s[0:1], v1, v1, v0			; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v1, v0
	; CI-NEXT: v_div_scale_f32 v2, vcc, v0, v1, v0			; CI-NEXT: v_div_scale_f32 v4, vcc, v0, v1, v0
	; CI-NEXT: v_rcp_f32_e32 v4, v3			; CI-NEXT: v_rcp_f32_e32 v3, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v5, -v3, v4, 1.0			; CI-NEXT: v_fma_f32 v5, -v2, v3, 1.0
	; CI-NEXT: v_fma_f32 v4, v5, v4, v4			; CI-NEXT: v_fma_f32 v3, v5, v3, v3
	; CI-NEXT: v_mul_f32_e32 v5, v2, v4			; CI-NEXT: v_mul_f32_e32 v5, v4, v3
	; CI-NEXT: v_fma_f32 v6, -v3, v5, v2			; CI-NEXT: v_fma_f32 v6, -v2, v5, v4
	; CI-NEXT: v_fma_f32 v5, v6, v4, v5			; CI-NEXT: v_fma_f32 v5, v6, v3, v5
	; CI-NEXT: v_fma_f32 v2, -v3, v5, v2			; CI-NEXT: v_fma_f32 v2, -v2, v5, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v5			; CI-NEXT: v_div_fmas_f32 v2, v2, v3, v5
	; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0			; CI-NEXT: v_div_fixup_f32 v2, v2, v1, v0
	; CI-NEXT: v_trunc_f32_e32 v2, v2			; CI-NEXT: v_trunc_f32_e32 v2, v2
	; CI-NEXT: v_fma_f32 v0, -v2, v1, v0			; CI-NEXT: v_fma_f32 v0, -v2, v1, v0
	; CI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; CI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	;			;
	; VI-LABEL: frem_f32:			; VI-LABEL: frem_f32:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v2, s6			; VI-NEXT: v_mov_b32_e32 v2, s6
	; VI-NEXT: s_add_u32 s0, s0, 16			; VI-NEXT: s_add_u32 s0, s0, 16
	; VI-NEXT: v_mov_b32_e32 v3, s7			; VI-NEXT: v_mov_b32_e32 v3, s7
	; VI-NEXT: s_addc_u32 s1, s1, 0			; VI-NEXT: s_addc_u32 s1, s1, 0
	; VI-NEXT: flat_load_dword v4, v[2:3]			; VI-NEXT: flat_load_dword v4, v[2:3]
	; VI-NEXT: v_mov_b32_e32 v3, s1			; VI-NEXT: v_mov_b32_e32 v3, s1
	; VI-NEXT: v_mov_b32_e32 v2, s0			; VI-NEXT: v_mov_b32_e32 v2, s0
	; VI-NEXT: flat_load_dword v2, v[2:3]			; VI-NEXT: flat_load_dword v2, v[2:3]
	; VI-NEXT: v_mov_b32_e32 v0, s4			; VI-NEXT: v_mov_b32_e32 v0, s4
	; VI-NEXT: v_mov_b32_e32 v1, s5			; VI-NEXT: v_mov_b32_e32 v1, s5
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_div_scale_f32 v5, s[0:1], v2, v2, v4			; VI-NEXT: v_div_scale_f32 v3, vcc, v2, v2, v4
	; VI-NEXT: v_div_scale_f32 v3, vcc, v4, v2, v4			; VI-NEXT: v_div_scale_f32 v6, vcc, v4, v2, v4
	; VI-NEXT: v_rcp_f32_e32 v6, v5			; VI-NEXT: v_rcp_f32_e32 v5, v3
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; VI-NEXT: v_fma_f32 v7, -v3, v5, 1.0
	; VI-NEXT: v_fma_f32 v6, v7, v6, v6			; VI-NEXT: v_fma_f32 v5, v7, v5, v5
	; VI-NEXT: v_mul_f32_e32 v7, v3, v6			; VI-NEXT: v_mul_f32_e32 v7, v6, v5
	; VI-NEXT: v_fma_f32 v8, -v5, v7, v3			; VI-NEXT: v_fma_f32 v8, -v3, v7, v6
	; VI-NEXT: v_fma_f32 v7, v8, v6, v7			; VI-NEXT: v_fma_f32 v7, v8, v5, v7
	; VI-NEXT: v_fma_f32 v3, -v5, v7, v3			; VI-NEXT: v_fma_f32 v3, -v3, v7, v6
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v3, v3, v6, v7			; VI-NEXT: v_div_fmas_f32 v3, v3, v5, v7
	; VI-NEXT: v_div_fixup_f32 v3, v3, v2, v4			; VI-NEXT: v_div_fixup_f32 v3, v3, v2, v4
	; VI-NEXT: v_trunc_f32_e32 v3, v3			; VI-NEXT: v_trunc_f32_e32 v3, v3
	; VI-NEXT: v_fma_f32 v2, -v3, v2, v4			; VI-NEXT: v_fma_f32 v2, -v3, v2, v4
	; VI-NEXT: flat_store_dword v[0:1], v2			; VI-NEXT: flat_store_dword v[0:1], v2
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: frem_f32:			; GFX9-LABEL: frem_f32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dword v1, v0, s[6:7]			; GFX9-NEXT: global_load_dword v1, v0, s[6:7]
	; GFX9-NEXT: global_load_dword v2, v0, s[2:3] offset:16			; GFX9-NEXT: global_load_dword v2, v0, s[2:3] offset:16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_div_scale_f32 v4, s[0:1], v2, v2, v1			; GFX9-NEXT: v_div_scale_f32 v3, vcc, v2, v2, v1
	; GFX9-NEXT: v_div_scale_f32 v3, vcc, v1, v2, v1			; GFX9-NEXT: v_div_scale_f32 v5, vcc, v1, v2, v1
	; GFX9-NEXT: v_rcp_f32_e32 v5, v4			; GFX9-NEXT: v_rcp_f32_e32 v4, v3
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX9-NEXT: v_fma_f32 v6, -v4, v5, 1.0			; GFX9-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX9-NEXT: v_fma_f32 v5, v6, v5, v5			; GFX9-NEXT: v_fma_f32 v4, v6, v4, v4
	; GFX9-NEXT: v_mul_f32_e32 v6, v3, v5			; GFX9-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX9-NEXT: v_fma_f32 v7, -v4, v6, v3			; GFX9-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX9-NEXT: v_fma_f32 v6, v7, v5, v6			; GFX9-NEXT: v_fma_f32 v6, v7, v4, v6
	; GFX9-NEXT: v_fma_f32 v3, -v4, v6, v3			; GFX9-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX9-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; GFX9-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX9-NEXT: v_div_fixup_f32 v3, v3, v2, v1			; GFX9-NEXT: v_div_fixup_f32 v3, v3, v2, v1
	; GFX9-NEXT: v_trunc_f32_e32 v3, v3			; GFX9-NEXT: v_trunc_f32_e32 v3, v3
	; GFX9-NEXT: v_fma_f32 v1, -v3, v2, v1			; GFX9-NEXT: v_fma_f32 v1, -v3, v2, v1
	; GFX9-NEXT: global_store_dword v0, v1, s[4:5]			; GFX9-NEXT: global_store_dword v0, v1, s[4:5]
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: frem_f32:			; GFX10-LABEL: frem_f32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dword v1, v0, s[6:7]			; GFX10-NEXT: global_load_dword v1, v0, s[6:7]
	; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:16			; GFX10-NEXT: global_load_dword v2, v0, s[2:3] offset:16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v4, s0, v2, v2, v1			; GFX10-NEXT: v_div_scale_f32 v3, vcc_lo, v2, v2, v1
	; GFX10-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v2, v1			; GFX10-NEXT: v_div_scale_f32 v5, vcc_lo, v1, v2, v1
	; GFX10-NEXT: v_rcp_f32_e32 v5, v4			; GFX10-NEXT: v_rcp_f32_e32 v4, v3
	; GFX10-NEXT: s_denorm_mode 15			; GFX10-NEXT: s_denorm_mode 15
	; GFX10-NEXT: v_fma_f32 v6, -v4, v5, 1.0			; GFX10-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX10-NEXT: v_fmac_f32_e32 v5, v6, v5			; GFX10-NEXT: v_fmac_f32_e32 v4, v6, v4
	; GFX10-NEXT: v_mul_f32_e32 v6, v3, v5			; GFX10-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX10-NEXT: v_fma_f32 v7, -v4, v6, v3			; GFX10-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX10-NEXT: v_fmac_f32_e32 v6, v7, v5			; GFX10-NEXT: v_fmac_f32_e32 v6, v7, v4
	; GFX10-NEXT: v_fma_f32 v3, -v4, v6, v3			; GFX10-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX10-NEXT: s_denorm_mode 12			; GFX10-NEXT: s_denorm_mode 12
	; GFX10-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; GFX10-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX10-NEXT: v_div_fixup_f32 v3, v3, v2, v1			; GFX10-NEXT: v_div_fixup_f32 v3, v3, v2, v1
	; GFX10-NEXT: v_trunc_f32_e32 v3, v3			; GFX10-NEXT: v_trunc_f32_e32 v3, v3
	; GFX10-NEXT: v_fma_f32 v1, -v3, v2, v1			; GFX10-NEXT: v_fma_f32 v1, -v3, v2, v1
	; GFX10-NEXT: global_store_dword v0, v1, s[4:5]			; GFX10-NEXT: global_store_dword v0, v1, s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: frem_f32:			; GFX11-LABEL: frem_f32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: global_load_b32 v1, v0, s[6:7]			; GFX11-NEXT: global_load_b32 v1, v0, s[6:7]
	; GFX11-NEXT: global_load_b32 v2, v0, s[0:1] offset:16			; GFX11-NEXT: global_load_b32 v2, v0, s[0:1] offset:16
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v4, null, v2, v2, v1			; GFX11-NEXT: v_div_scale_f32 v3, vcc_lo, v2, v2, v1
	; GFX11-NEXT: v_div_scale_f32 v3, vcc_lo, v1, v2, v1			; GFX11-NEXT: v_div_scale_f32 v5, vcc_lo, v1, v2, v1
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f32_e32 v5, v4			; GFX11-NEXT: v_rcp_f32_e32 v4, v3
	; GFX11-NEXT: s_denorm_mode 15			; GFX11-NEXT: s_denorm_mode 15
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f32 v6, -v4, v5, 1.0			; GFX11-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; GFX11-NEXT: v_fmac_f32_e32 v5, v6, v5			; GFX11-NEXT: v_fmac_f32_e32 v4, v6, v4
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f32_e32 v6, v3, v5			; GFX11-NEXT: v_mul_f32_e32 v6, v5, v4
	; GFX11-NEXT: v_fma_f32 v7, -v4, v6, v3			; GFX11-NEXT: v_fma_f32 v7, -v3, v6, v5
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fmac_f32_e32 v6, v7, v5			; GFX11-NEXT: v_fmac_f32_e32 v6, v7, v4
	; GFX11-NEXT: v_fma_f32 v3, -v4, v6, v3			; GFX11-NEXT: v_fma_f32 v3, -v3, v6, v5
	; GFX11-NEXT: s_denorm_mode 12			; GFX11-NEXT: s_denorm_mode 12
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; GFX11-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; GFX11-NEXT: v_div_fixup_f32 v3, v3, v2, v1			; GFX11-NEXT: v_div_fixup_f32 v3, v3, v2, v1
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f32_e32 v3, v3			; GFX11-NEXT: v_trunc_f32_e32 v3, v3
	; GFX11-NEXT: v_fma_f32 v1, -v3, v2, v1			; GFX11-NEXT: v_fma_f32 v1, -v3, v2, v1
	; GFX11-NEXT: global_store_b32 v0, v1, s[4:5]			; GFX11-NEXT: global_store_b32 v0, v1, s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	float addrspace(1)* %in2) #0 {			float addrspace(1)* %in2) #0 {
	; SI-NEXT: s_mov_b32 s9, s11			; SI-NEXT: s_mov_b32 s9, s11
	; SI-NEXT: s_mov_b32 s10, s6			; SI-NEXT: s_mov_b32 s10, s6
	; SI-NEXT: s_mov_b32 s11, s7			; SI-NEXT: s_mov_b32 s11, s7
	; SI-NEXT: s_mov_b32 s2, s6			; SI-NEXT: s_mov_b32 s2, s6
	; SI-NEXT: s_mov_b32 s3, s7			; SI-NEXT: s_mov_b32 s3, s7
	; SI-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0			; SI-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0
	; SI-NEXT: buffer_load_dwordx2 v[2:3], off, s[0:3], 0			; SI-NEXT: buffer_load_dwordx2 v[2:3], off, s[0:3], 0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_div_scale_f64 v[4:5], s[0:1], v[2:3], v[2:3], v[0:1]			; SI-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
	; SI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]			; SI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
	; SI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; SI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; SI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; SI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; SI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; SI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; SI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; SI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; SI-NEXT: v_div_scale_f64 v[8:9], s[0:1], v[0:1], v[2:3], v[0:1]			; SI-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
	; SI-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]			; SI-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
	; SI-NEXT: v_fma_f64 v[12:13], -v[4:5], v[10:11], v[8:9]			; SI-NEXT: v_fma_f64 v[12:13], -v[4:5], v[10:11], v[8:9]
	; SI-NEXT: v_cmp_eq_u32_e32 vcc, v3, v5			; SI-NEXT: v_cmp_eq_u32_e32 vcc, v3, v5
	; SI-NEXT: v_cmp_eq_u32_e64 s[0:1], v1, v9			; SI-NEXT: v_cmp_eq_u32_e64 s[0:1], v1, v9
	; SI-NEXT: s_xor_b64 vcc, s[0:1], vcc			; SI-NEXT: s_xor_b64 vcc, s[0:1], vcc
	; SI-NEXT: s_nop 1			; SI-NEXT: s_nop 1
	; SI-NEXT: v_div_fmas_f64 v[4:5], v[12:13], v[6:7], v[10:11]			; SI-NEXT: v_div_fmas_f64 v[4:5], v[12:13], v[6:7], v[10:11]
	; SI-NEXT: v_div_fixup_f64 v[4:5], v[4:5], v[2:3], v[0:1]			; SI-NEXT: v_div_fixup_f64 v[4:5], v[4:5], v[2:3], v[0:1]
	; CI-NEXT: s_mov_b32 s4, s6			; CI-NEXT: s_mov_b32 s4, s6
	; CI-NEXT: s_mov_b32 s5, s7			; CI-NEXT: s_mov_b32 s5, s7
	; CI-NEXT: s_mov_b32 s6, s10			; CI-NEXT: s_mov_b32 s6, s10
	; CI-NEXT: s_mov_b32 s7, s11			; CI-NEXT: s_mov_b32 s7, s11
	; CI-NEXT: s_mov_b32 s3, s11			; CI-NEXT: s_mov_b32 s3, s11
	; CI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0			; CI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0
	; CI-NEXT: buffer_load_dwordx2 v[2:3], off, s[0:3], 0			; CI-NEXT: buffer_load_dwordx2 v[2:3], off, s[0:3], 0
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_div_scale_f64 v[4:5], s[0:1], v[2:3], v[2:3], v[0:1]			; CI-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
	; CI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]			; CI-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
	; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; CI-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; CI-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; CI-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]			; CI-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
	; CI-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]			; CI-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
	; CI-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]			; CI-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
	; VI-NEXT: v_mov_b32_e32 v3, s7			; VI-NEXT: v_mov_b32_e32 v3, s7
	; VI-NEXT: v_mov_b32_e32 v4, s0			; VI-NEXT: v_mov_b32_e32 v4, s0
	; VI-NEXT: v_mov_b32_e32 v5, s1			; VI-NEXT: v_mov_b32_e32 v5, s1
	; VI-NEXT: flat_load_dwordx2 v[2:3], v[2:3]			; VI-NEXT: flat_load_dwordx2 v[2:3], v[2:3]
	; VI-NEXT: flat_load_dwordx2 v[4:5], v[4:5]			; VI-NEXT: flat_load_dwordx2 v[4:5], v[4:5]
	; VI-NEXT: v_mov_b32_e32 v0, s4			; VI-NEXT: v_mov_b32_e32 v0, s4
	; VI-NEXT: v_mov_b32_e32 v1, s5			; VI-NEXT: v_mov_b32_e32 v1, s5
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_div_scale_f64 v[6:7], s[0:1], v[4:5], v[4:5], v[2:3]			; VI-NEXT: v_div_scale_f64 v[6:7], vcc, v[4:5], v[4:5], v[2:3]
	; VI-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]			; VI-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
	; VI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; VI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; VI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; VI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; VI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; VI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; VI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; VI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; VI-NEXT: v_div_scale_f64 v[10:11], vcc, v[2:3], v[4:5], v[2:3]			; VI-NEXT: v_div_scale_f64 v[10:11], vcc, v[2:3], v[4:5], v[2:3]
	; VI-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]			; VI-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
	; VI-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]			; VI-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX9-NEXT: v_mov_b32_e32 v12, 0			; GFX9-NEXT: v_mov_b32_e32 v12, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v12, s[6:7]			; GFX9-NEXT: global_load_dwordx2 v[0:1], v12, s[6:7]
	; GFX9-NEXT: global_load_dwordx2 v[2:3], v12, s[2:3]			; GFX9-NEXT: global_load_dwordx2 v[2:3], v12, s[2:3]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_div_scale_f64 v[4:5], s[0:1], v[2:3], v[2:3], v[0:1]			; GFX9-NEXT: v_div_scale_f64 v[4:5], vcc, v[2:3], v[2:3], v[0:1]
	; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]			; GFX9-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
	; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; GFX9-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; GFX9-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]			; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[0:1], v[2:3], v[0:1]
	; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]			; GFX9-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
	; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]			; GFX9-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v12, 0			; GFX10-NEXT: v_mov_b32_e32 v12, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v12, s[6:7]			; GFX10-NEXT: global_load_dwordx2 v[0:1], v12, s[6:7]
	; GFX10-NEXT: global_load_dwordx2 v[2:3], v12, s[2:3]			; GFX10-NEXT: global_load_dwordx2 v[2:3], v12, s[2:3]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[4:5], s0, v[2:3], v[2:3], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
	; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]			; GFX10-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
	; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; GFX10-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; GFX10-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[0:1], v[2:3], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[0:1], v[2:3], v[0:1]
	; GFX10-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]			; GFX10-NEXT: v_mul_f64 v[10:11], v[8:9], v[6:7]
	; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]			; GFX10-NEXT: v_fma_f64 v[4:5], -v[4:5], v[10:11], v[8:9]
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
	; GFX11-NEXT: v_mov_b32_e32 v12, 0			; GFX11-NEXT: v_mov_b32_e32 v12, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: global_load_b64 v[0:1], v12, s[6:7]			; GFX11-NEXT: global_load_b64 v[0:1], v12, s[6:7]
	; GFX11-NEXT: global_load_b64 v[2:3], v12, s[0:1]			; GFX11-NEXT: global_load_b64 v[2:3], v12, s[0:1]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[4:5], null, v[2:3], v[2:3], v[0:1]			; GFX11-NEXT: v_div_scale_f64 v[4:5], vcc_lo, v[2:3], v[2:3], v[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]			; GFX11-NEXT: v_rcp_f64_e32 v[6:7], v[4:5]
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0			; GFX11-NEXT: v_fma_f64 v[8:9], -v[4:5], v[6:7], 1.0
	; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]			; GFX11-NEXT: v_fma_f64 v[6:7], v[6:7], v[8:9], v[6:7]
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v0			; SI-NEXT: v_cvt_f32_f16_e32 v1, v0
	; SI-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; SI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: buffer_load_dword v2, off, s[8:11], 0 offset:16			; SI-NEXT: buffer_load_dword v2, off, s[8:11], 0 offset:16
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_cvt_f32_f16_e32 v3, v2			; SI-NEXT: v_cvt_f32_f16_e32 v3, v2
	; SI-NEXT: v_lshrrev_b32_e32 v2, 16, v2			; SI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
	; SI-NEXT: v_cvt_f32_f16_e32 v2, v2			; SI-NEXT: v_cvt_f32_f16_e32 v2, v2
	; SI-NEXT: v_div_scale_f32 v4, vcc, v0, v2, v0			; SI-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; SI-NEXT: v_div_scale_f32 v5, s[4:5], v2, v2, v0			; SI-NEXT: v_rcp_f32_e32 v5, v4
	; SI-NEXT: v_rcp_f32_e32 v6, v5			; SI-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; SI-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; SI-NEXT: v_fma_f32 v6, v7, v6, v6			; SI-NEXT: v_fma_f32 v5, v7, v5, v5
	; SI-NEXT: v_mul_f32_e32 v7, v4, v6			; SI-NEXT: v_mul_f32_e32 v7, v6, v5
	; SI-NEXT: v_fma_f32 v8, -v5, v7, v4			; SI-NEXT: v_fma_f32 v8, -v4, v7, v6
	; SI-NEXT: v_fma_f32 v7, v8, v6, v7			; SI-NEXT: v_fma_f32 v7, v8, v5, v7
	; SI-NEXT: v_fma_f32 v4, -v5, v7, v4			; SI-NEXT: v_fma_f32 v4, -v4, v7, v6
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v4, v4, v6, v7			; SI-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; SI-NEXT: v_div_fixup_f32 v4, v4, v2, v0			; SI-NEXT: v_div_fixup_f32 v4, v4, v2, v0
	; SI-NEXT: v_trunc_f32_e32 v4, v4			; SI-NEXT: v_trunc_f32_e32 v4, v4
	; SI-NEXT: v_fma_f32 v0, -v4, v2, v0			; SI-NEXT: v_fma_f32 v0, -v4, v2, v0
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; SI-NEXT: v_cvt_f16_f32_e32 v0, v0			; SI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v0			; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v0
	; SI-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1			; SI-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; SI-NEXT: v_div_scale_f32 v4, s[4:5], v3, v3, v1			; SI-NEXT: v_rcp_f32_e32 v4, v2
	; SI-NEXT: v_rcp_f32_e32 v5, v4			; SI-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v6, -v4, v5, 1.0			; SI-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; SI-NEXT: v_fma_f32 v5, v6, v5, v5			; SI-NEXT: v_fma_f32 v4, v6, v4, v4
	; SI-NEXT: v_mul_f32_e32 v6, v2, v5			; SI-NEXT: v_mul_f32_e32 v6, v5, v4
	; SI-NEXT: v_fma_f32 v7, -v4, v6, v2			; SI-NEXT: v_fma_f32 v7, -v2, v6, v5
	; SI-NEXT: v_fma_f32 v6, v7, v5, v6			; SI-NEXT: v_fma_f32 v6, v7, v4, v6
	; SI-NEXT: v_fma_f32 v2, -v4, v6, v2			; SI-NEXT: v_fma_f32 v2, -v2, v6, v5
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v2, v2, v5, v6			; SI-NEXT: v_div_fmas_f32 v2, v2, v4, v6
	; SI-NEXT: v_div_fixup_f32 v2, v2, v3, v1			; SI-NEXT: v_div_fixup_f32 v2, v2, v3, v1
	; SI-NEXT: v_trunc_f32_e32 v2, v2			; SI-NEXT: v_trunc_f32_e32 v2, v2
	; SI-NEXT: v_fma_f32 v1, -v2, v3, v1			; SI-NEXT: v_fma_f32 v1, -v2, v3, v1
	; SI-NEXT: v_cvt_f16_f32_e32 v1, v1			; SI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; SI-NEXT: v_or_b32_e32 v0, v1, v0			; SI-NEXT: v_or_b32_e32 v0, v1, v0
	; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; CI-NEXT: s_waitcnt vmcnt(1)			; CI-NEXT: s_waitcnt vmcnt(1)
	; CI-NEXT: v_cvt_f32_f16_e32 v1, v0			; CI-NEXT: v_cvt_f32_f16_e32 v1, v0
	; CI-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; CI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_cvt_f32_f16_e32 v3, v2			; CI-NEXT: v_cvt_f32_f16_e32 v3, v2
	; CI-NEXT: v_lshrrev_b32_e32 v2, 16, v2			; CI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
	; CI-NEXT: v_cvt_f32_f16_e32 v0, v0			; CI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; CI-NEXT: v_cvt_f32_f16_e32 v2, v2			; CI-NEXT: v_cvt_f32_f16_e32 v2, v2
	; CI-NEXT: v_div_scale_f32 v5, s[4:5], v2, v2, v0			; CI-NEXT: v_div_scale_f32 v4, vcc, v2, v2, v0
	; CI-NEXT: v_div_scale_f32 v4, vcc, v0, v2, v0			; CI-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; CI-NEXT: v_rcp_f32_e32 v6, v5			; CI-NEXT: v_rcp_f32_e32 v5, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; CI-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; CI-NEXT: v_fma_f32 v6, v7, v6, v6			; CI-NEXT: v_fma_f32 v5, v7, v5, v5
	; CI-NEXT: v_mul_f32_e32 v7, v4, v6			; CI-NEXT: v_mul_f32_e32 v7, v6, v5
	; CI-NEXT: v_fma_f32 v8, -v5, v7, v4			; CI-NEXT: v_fma_f32 v8, -v4, v7, v6
	; CI-NEXT: v_fma_f32 v7, v8, v6, v7			; CI-NEXT: v_fma_f32 v7, v8, v5, v7
	; CI-NEXT: v_fma_f32 v4, -v5, v7, v4			; CI-NEXT: v_fma_f32 v4, -v4, v7, v6
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v4, v4, v6, v7			; CI-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; CI-NEXT: v_div_fixup_f32 v4, v4, v2, v0			; CI-NEXT: v_div_fixup_f32 v4, v4, v2, v0
	; CI-NEXT: v_trunc_f32_e32 v4, v4			; CI-NEXT: v_trunc_f32_e32 v4, v4
	; CI-NEXT: v_fma_f32 v0, -v4, v2, v0			; CI-NEXT: v_fma_f32 v0, -v4, v2, v0
	; CI-NEXT: v_div_scale_f32 v4, s[4:5], v3, v3, v1			; CI-NEXT: v_div_scale_f32 v2, vcc, v3, v3, v1
	; CI-NEXT: v_div_scale_f32 v2, vcc, v1, v3, v1			; CI-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; CI-NEXT: v_cvt_f16_f32_e32 v0, v0			; CI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; CI-NEXT: v_lshlrev_b32_e32 v0, 16, v0			; CI-NEXT: v_lshlrev_b32_e32 v0, 16, v0
	; CI-NEXT: v_rcp_f32_e32 v5, v4			; CI-NEXT: v_rcp_f32_e32 v4, v2
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v6, -v4, v5, 1.0			; CI-NEXT: v_fma_f32 v6, -v2, v4, 1.0
	; CI-NEXT: v_fma_f32 v5, v6, v5, v5			; CI-NEXT: v_fma_f32 v4, v6, v4, v4
	; CI-NEXT: v_mul_f32_e32 v6, v2, v5			; CI-NEXT: v_mul_f32_e32 v6, v5, v4
	; CI-NEXT: v_fma_f32 v7, -v4, v6, v2			; CI-NEXT: v_fma_f32 v7, -v2, v6, v5
	; CI-NEXT: v_fma_f32 v6, v7, v5, v6			; CI-NEXT: v_fma_f32 v6, v7, v4, v6
	; CI-NEXT: v_fma_f32 v2, -v4, v6, v2			; CI-NEXT: v_fma_f32 v2, -v2, v6, v5
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v2, v2, v5, v6			; CI-NEXT: v_div_fmas_f32 v2, v2, v4, v6
	; CI-NEXT: v_div_fixup_f32 v2, v2, v3, v1			; CI-NEXT: v_div_fixup_f32 v2, v2, v3, v1
	; CI-NEXT: v_trunc_f32_e32 v2, v2			; CI-NEXT: v_trunc_f32_e32 v2, v2
	; CI-NEXT: v_fma_f32 v1, -v2, v3, v1			; CI-NEXT: v_fma_f32 v1, -v2, v3, v1
	; CI-NEXT: v_cvt_f16_f32_e32 v1, v1			; CI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; CI-NEXT: v_or_b32_e32 v0, v1, v0			; CI-NEXT: v_or_b32_e32 v0, v1, v0
	; CI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; CI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	;			;
	; SI-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 offset:32			; SI-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 offset:32
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_cvt_f32_f16_e32 v6, v0			; SI-NEXT: v_cvt_f32_f16_e32 v6, v0
	; SI-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; SI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: v_cvt_f32_f16_e32 v7, v1			; SI-NEXT: v_cvt_f32_f16_e32 v7, v1
	; SI-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; SI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v1			; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
	; SI-NEXT: v_div_scale_f32 v8, vcc, v5, v1, v5			; SI-NEXT: v_div_scale_f32 v8, vcc, v1, v1, v5
	; SI-NEXT: v_div_scale_f32 v9, s[4:5], v1, v1, v5			; SI-NEXT: v_rcp_f32_e32 v9, v8
	; SI-NEXT: v_rcp_f32_e32 v10, v9			; SI-NEXT: v_div_scale_f32 v10, vcc, v5, v1, v5
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v11, -v9, v10, 1.0			; SI-NEXT: v_fma_f32 v11, -v8, v9, 1.0
	; SI-NEXT: v_fma_f32 v10, v11, v10, v10			; SI-NEXT: v_fma_f32 v9, v11, v9, v9
	; SI-NEXT: v_mul_f32_e32 v11, v8, v10			; SI-NEXT: v_mul_f32_e32 v11, v10, v9
	; SI-NEXT: v_fma_f32 v12, -v9, v11, v8			; SI-NEXT: v_fma_f32 v12, -v8, v11, v10
	; SI-NEXT: v_fma_f32 v11, v12, v10, v11			; SI-NEXT: v_fma_f32 v11, v12, v9, v11
	; SI-NEXT: v_fma_f32 v8, -v9, v11, v8			; SI-NEXT: v_fma_f32 v8, -v8, v11, v10
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v8, v8, v10, v11			; SI-NEXT: v_div_fmas_f32 v8, v8, v9, v11
	; SI-NEXT: v_div_fixup_f32 v8, v8, v1, v5			; SI-NEXT: v_div_fixup_f32 v8, v8, v1, v5
	; SI-NEXT: v_trunc_f32_e32 v8, v8			; SI-NEXT: v_trunc_f32_e32 v8, v8
	; SI-NEXT: v_fma_f32 v1, -v8, v1, v5			; SI-NEXT: v_fma_f32 v1, -v8, v1, v5
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; SI-NEXT: v_cvt_f16_f32_e32 v1, v1			; SI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; SI-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; SI-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; SI-NEXT: v_div_scale_f32 v5, vcc, v4, v7, v4			; SI-NEXT: v_div_scale_f32 v5, vcc, v7, v7, v4
	; SI-NEXT: v_div_scale_f32 v8, s[4:5], v7, v7, v4			; SI-NEXT: v_rcp_f32_e32 v8, v5
	; SI-NEXT: v_rcp_f32_e32 v9, v8			; SI-NEXT: v_div_scale_f32 v9, vcc, v4, v7, v4
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v10, -v8, v9, 1.0			; SI-NEXT: v_fma_f32 v10, -v5, v8, 1.0
	; SI-NEXT: v_fma_f32 v9, v10, v9, v9			; SI-NEXT: v_fma_f32 v8, v10, v8, v8
	; SI-NEXT: v_mul_f32_e32 v10, v5, v9			; SI-NEXT: v_mul_f32_e32 v10, v9, v8
	; SI-NEXT: v_fma_f32 v11, -v8, v10, v5			; SI-NEXT: v_fma_f32 v11, -v5, v10, v9
	; SI-NEXT: v_fma_f32 v10, v11, v9, v10			; SI-NEXT: v_fma_f32 v10, v11, v8, v10
	; SI-NEXT: v_fma_f32 v5, -v8, v10, v5			; SI-NEXT: v_fma_f32 v5, -v5, v10, v9
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v5, v5, v9, v10			; SI-NEXT: v_div_fmas_f32 v5, v5, v8, v10
	; SI-NEXT: v_div_fixup_f32 v5, v5, v7, v4			; SI-NEXT: v_div_fixup_f32 v5, v5, v7, v4
	; SI-NEXT: v_trunc_f32_e32 v5, v5			; SI-NEXT: v_trunc_f32_e32 v5, v5
	; SI-NEXT: v_fma_f32 v4, -v5, v7, v4			; SI-NEXT: v_fma_f32 v4, -v5, v7, v4
	; SI-NEXT: v_cvt_f16_f32_e32 v4, v4			; SI-NEXT: v_cvt_f16_f32_e32 v4, v4
	; SI-NEXT: v_or_b32_e32 v1, v4, v1			; SI-NEXT: v_or_b32_e32 v1, v4, v1
	; SI-NEXT: v_div_scale_f32 v4, vcc, v3, v0, v3			; SI-NEXT: v_div_scale_f32 v4, vcc, v0, v0, v3
	; SI-NEXT: v_div_scale_f32 v5, s[4:5], v0, v0, v3			; SI-NEXT: v_rcp_f32_e32 v5, v4
	; SI-NEXT: v_rcp_f32_e32 v7, v5			; SI-NEXT: v_div_scale_f32 v7, vcc, v3, v0, v3
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v8, -v5, v7, 1.0			; SI-NEXT: v_fma_f32 v8, -v4, v5, 1.0
	; SI-NEXT: v_fma_f32 v7, v8, v7, v7			; SI-NEXT: v_fma_f32 v5, v8, v5, v5
	; SI-NEXT: v_mul_f32_e32 v8, v4, v7			; SI-NEXT: v_mul_f32_e32 v8, v7, v5
	; SI-NEXT: v_fma_f32 v9, -v5, v8, v4			; SI-NEXT: v_fma_f32 v9, -v4, v8, v7
	; SI-NEXT: v_fma_f32 v8, v9, v7, v8			; SI-NEXT: v_fma_f32 v8, v9, v5, v8
	; SI-NEXT: v_fma_f32 v4, -v5, v8, v4			; SI-NEXT: v_fma_f32 v4, -v4, v8, v7
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v4, v4, v7, v8			; SI-NEXT: v_div_fmas_f32 v4, v4, v5, v8
	; SI-NEXT: v_div_fixup_f32 v4, v4, v0, v3			; SI-NEXT: v_div_fixup_f32 v4, v4, v0, v3
	; SI-NEXT: v_trunc_f32_e32 v4, v4			; SI-NEXT: v_trunc_f32_e32 v4, v4
	; SI-NEXT: v_fma_f32 v0, -v4, v0, v3			; SI-NEXT: v_fma_f32 v0, -v4, v0, v3
	; SI-NEXT: v_cvt_f16_f32_e32 v0, v0			; SI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v0			; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v0
	; SI-NEXT: v_div_scale_f32 v3, vcc, v2, v6, v2			; SI-NEXT: v_div_scale_f32 v3, vcc, v6, v6, v2
	; SI-NEXT: v_div_scale_f32 v4, s[4:5], v6, v6, v2			; SI-NEXT: v_rcp_f32_e32 v4, v3
	; SI-NEXT: v_rcp_f32_e32 v5, v4			; SI-NEXT: v_div_scale_f32 v5, vcc, v2, v6, v2
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; SI-NEXT: v_fma_f32 v7, -v3, v4, 1.0
	; SI-NEXT: v_fma_f32 v5, v7, v5, v5			; SI-NEXT: v_fma_f32 v4, v7, v4, v4
	; SI-NEXT: v_mul_f32_e32 v7, v3, v5			; SI-NEXT: v_mul_f32_e32 v7, v5, v4
	; SI-NEXT: v_fma_f32 v8, -v4, v7, v3			; SI-NEXT: v_fma_f32 v8, -v3, v7, v5
	; SI-NEXT: v_fma_f32 v7, v8, v5, v7			; SI-NEXT: v_fma_f32 v7, v8, v4, v7
	; SI-NEXT: v_fma_f32 v3, -v4, v7, v3			; SI-NEXT: v_fma_f32 v3, -v3, v7, v5
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v3, v3, v5, v7			; SI-NEXT: v_div_fmas_f32 v3, v3, v4, v7
	; SI-NEXT: v_div_fixup_f32 v3, v3, v6, v2			; SI-NEXT: v_div_fixup_f32 v3, v3, v6, v2
	; SI-NEXT: v_trunc_f32_e32 v3, v3			; SI-NEXT: v_trunc_f32_e32 v3, v3
	; SI-NEXT: v_fma_f32 v2, -v3, v6, v2			; SI-NEXT: v_fma_f32 v2, -v3, v6, v2
	; SI-NEXT: v_cvt_f16_f32_e32 v2, v2			; SI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; SI-NEXT: v_or_b32_e32 v0, v2, v0			; SI-NEXT: v_or_b32_e32 v0, v2, v0
	; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; CI-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 offset:32			; CI-NEXT: buffer_load_dwordx2 v[0:1], off, s[8:11], 0 offset:32
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_cvt_f32_f16_e32 v7, v1			; CI-NEXT: v_cvt_f32_f16_e32 v7, v1
	; CI-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; CI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; CI-NEXT: v_cvt_f32_f16_e32 v1, v1			; CI-NEXT: v_cvt_f32_f16_e32 v1, v1
	; CI-NEXT: v_cvt_f32_f16_e32 v6, v0			; CI-NEXT: v_cvt_f32_f16_e32 v6, v0
	; CI-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; CI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; CI-NEXT: v_cvt_f32_f16_e32 v0, v0			; CI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; CI-NEXT: v_div_scale_f32 v9, s[4:5], v1, v1, v5			; CI-NEXT: v_div_scale_f32 v8, vcc, v1, v1, v5
	; CI-NEXT: v_div_scale_f32 v8, vcc, v5, v1, v5			; CI-NEXT: v_div_scale_f32 v10, vcc, v5, v1, v5
	; CI-NEXT: v_rcp_f32_e32 v10, v9			; CI-NEXT: v_rcp_f32_e32 v9, v8
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v11, -v9, v10, 1.0			; CI-NEXT: v_fma_f32 v11, -v8, v9, 1.0
	; CI-NEXT: v_fma_f32 v10, v11, v10, v10			; CI-NEXT: v_fma_f32 v9, v11, v9, v9
	; CI-NEXT: v_mul_f32_e32 v11, v8, v10			; CI-NEXT: v_mul_f32_e32 v11, v10, v9
	; CI-NEXT: v_fma_f32 v12, -v9, v11, v8			; CI-NEXT: v_fma_f32 v12, -v8, v11, v10
	; CI-NEXT: v_fma_f32 v11, v12, v10, v11			; CI-NEXT: v_fma_f32 v11, v12, v9, v11
	; CI-NEXT: v_fma_f32 v8, -v9, v11, v8			; CI-NEXT: v_fma_f32 v8, -v8, v11, v10
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v8, v8, v10, v11			; CI-NEXT: v_div_fmas_f32 v8, v8, v9, v11
	; CI-NEXT: v_div_fixup_f32 v8, v8, v1, v5			; CI-NEXT: v_div_fixup_f32 v8, v8, v1, v5
	; CI-NEXT: v_trunc_f32_e32 v8, v8			; CI-NEXT: v_trunc_f32_e32 v8, v8
	; CI-NEXT: v_fma_f32 v1, -v8, v1, v5			; CI-NEXT: v_fma_f32 v1, -v8, v1, v5
	; CI-NEXT: v_div_scale_f32 v8, s[4:5], v7, v7, v4			; CI-NEXT: v_div_scale_f32 v5, vcc, v7, v7, v4
	; CI-NEXT: v_div_scale_f32 v5, vcc, v4, v7, v4			; CI-NEXT: v_div_scale_f32 v9, vcc, v4, v7, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
	; CI-NEXT: v_cvt_f16_f32_e32 v1, v1			; CI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; CI-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; CI-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; CI-NEXT: v_rcp_f32_e32 v9, v8			; CI-NEXT: v_rcp_f32_e32 v8, v5
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v10, -v8, v9, 1.0			; CI-NEXT: v_fma_f32 v10, -v5, v8, 1.0
	; CI-NEXT: v_fma_f32 v9, v10, v9, v9			; CI-NEXT: v_fma_f32 v8, v10, v8, v8
	; CI-NEXT: v_mul_f32_e32 v10, v5, v9			; CI-NEXT: v_mul_f32_e32 v10, v9, v8
	; CI-NEXT: v_fma_f32 v11, -v8, v10, v5			; CI-NEXT: v_fma_f32 v11, -v5, v10, v9
	; CI-NEXT: v_fma_f32 v10, v11, v9, v10			; CI-NEXT: v_fma_f32 v10, v11, v8, v10
	; CI-NEXT: v_fma_f32 v5, -v8, v10, v5			; CI-NEXT: v_fma_f32 v5, -v5, v10, v9
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v5, v5, v9, v10			; CI-NEXT: v_div_fmas_f32 v5, v5, v8, v10
	; CI-NEXT: v_div_fixup_f32 v5, v5, v7, v4			; CI-NEXT: v_div_fixup_f32 v5, v5, v7, v4
	; CI-NEXT: v_trunc_f32_e32 v5, v5			; CI-NEXT: v_trunc_f32_e32 v5, v5
	; CI-NEXT: v_fma_f32 v4, -v5, v7, v4			; CI-NEXT: v_fma_f32 v4, -v5, v7, v4
	; CI-NEXT: v_div_scale_f32 v5, s[4:5], v0, v0, v3
	; CI-NEXT: v_cvt_f16_f32_e32 v4, v4			; CI-NEXT: v_cvt_f16_f32_e32 v4, v4
	; CI-NEXT: v_or_b32_e32 v1, v4, v1			; CI-NEXT: v_or_b32_e32 v1, v4, v1
	; CI-NEXT: v_div_scale_f32 v4, vcc, v3, v0, v3			; CI-NEXT: v_div_scale_f32 v4, vcc, v0, v0, v3
	; CI-NEXT: v_rcp_f32_e32 v7, v5			; CI-NEXT: v_div_scale_f32 v7, vcc, v3, v0, v3
				; CI-NEXT: v_rcp_f32_e32 v5, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v8, -v5, v7, 1.0			; CI-NEXT: v_fma_f32 v8, -v4, v5, 1.0
	; CI-NEXT: v_fma_f32 v7, v8, v7, v7			; CI-NEXT: v_fma_f32 v5, v8, v5, v5
	; CI-NEXT: v_mul_f32_e32 v8, v4, v7			; CI-NEXT: v_mul_f32_e32 v8, v7, v5
	; CI-NEXT: v_fma_f32 v9, -v5, v8, v4			; CI-NEXT: v_fma_f32 v9, -v4, v8, v7
	; CI-NEXT: v_fma_f32 v8, v9, v7, v8			; CI-NEXT: v_fma_f32 v8, v9, v5, v8
	; CI-NEXT: v_fma_f32 v4, -v5, v8, v4			; CI-NEXT: v_fma_f32 v4, -v4, v8, v7
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v4, v4, v7, v8			; CI-NEXT: v_div_fmas_f32 v4, v4, v5, v8
	; CI-NEXT: v_div_fixup_f32 v4, v4, v0, v3			; CI-NEXT: v_div_fixup_f32 v4, v4, v0, v3
	; CI-NEXT: v_trunc_f32_e32 v4, v4			; CI-NEXT: v_trunc_f32_e32 v4, v4
	; CI-NEXT: v_fma_f32 v0, -v4, v0, v3			; CI-NEXT: v_fma_f32 v0, -v4, v0, v3
	; CI-NEXT: v_div_scale_f32 v4, s[4:5], v6, v6, v2			; CI-NEXT: v_div_scale_f32 v3, vcc, v6, v6, v2
	; CI-NEXT: v_div_scale_f32 v3, vcc, v2, v6, v2			; CI-NEXT: v_div_scale_f32 v5, vcc, v2, v6, v2
	; CI-NEXT: v_cvt_f16_f32_e32 v0, v0			; CI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; CI-NEXT: v_lshlrev_b32_e32 v0, 16, v0			; CI-NEXT: v_lshlrev_b32_e32 v0, 16, v0
	; CI-NEXT: v_rcp_f32_e32 v5, v4			; CI-NEXT: v_rcp_f32_e32 v4, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v7, -v4, v5, 1.0			; CI-NEXT: v_fma_f32 v7, -v3, v4, 1.0
	; CI-NEXT: v_fma_f32 v5, v7, v5, v5			; CI-NEXT: v_fma_f32 v4, v7, v4, v4
	; CI-NEXT: v_mul_f32_e32 v7, v3, v5			; CI-NEXT: v_mul_f32_e32 v7, v5, v4
	; CI-NEXT: v_fma_f32 v8, -v4, v7, v3			; CI-NEXT: v_fma_f32 v8, -v3, v7, v5
	; CI-NEXT: v_fma_f32 v7, v8, v5, v7			; CI-NEXT: v_fma_f32 v7, v8, v4, v7
	; CI-NEXT: v_fma_f32 v3, -v4, v7, v3			; CI-NEXT: v_fma_f32 v3, -v3, v7, v5
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v3, v3, v5, v7			; CI-NEXT: v_div_fmas_f32 v3, v3, v4, v7
	; CI-NEXT: v_div_fixup_f32 v3, v3, v6, v2			; CI-NEXT: v_div_fixup_f32 v3, v3, v6, v2
	; CI-NEXT: v_trunc_f32_e32 v3, v3			; CI-NEXT: v_trunc_f32_e32 v3, v3
	; CI-NEXT: v_fma_f32 v2, -v3, v6, v2			; CI-NEXT: v_fma_f32 v2, -v3, v6, v2
	; CI-NEXT: v_cvt_f16_f32_e32 v2, v2			; CI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; CI-NEXT: v_or_b32_e32 v0, v2, v0			; CI-NEXT: v_or_b32_e32 v0, v2, v0
	; CI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; CI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	;			;
	}			}

	define amdgpu_kernel void @frem_v2f32(<2 x float> addrspace(1)* %out, <2 x float> addrspace(1)* %in1,			define amdgpu_kernel void @frem_v2f32(<2 x float> addrspace(1)* %out, <2 x float> addrspace(1)* %in1,
	; SI-LABEL: frem_v2f32:			; SI-LABEL: frem_v2f32:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd			; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
				rampitec: What happened here?
				Pierre-vh (author): I think something is wrong with the way I replace `SDValue(N, 1)` in `SelectDIV_SCALE`, but I'm not sure how to fix it. The `addLiveIn` is suspicious (I can't find any ISelDAGToDAG impl that also uses it), but without it, it crashes.
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s0, s4			; SI-NEXT: s_mov_b32 s0, s4
	; SI-NEXT: s_mov_b32 s1, s5			; SI-NEXT: s_mov_b32 s1, s5
	; SI-NEXT: s_mov_b32 s4, s6			; SI-NEXT: s_mov_b32 s4, s6
	; SI-NEXT: s_mov_b32 s5, s7			; SI-NEXT: s_mov_b32 s5, s7
	; SI-NEXT: s_mov_b32 s6, s2			; SI-NEXT: s_mov_b32 s6, s2
	; SI-NEXT: s_mov_b32 s7, s3			; SI-NEXT: s_mov_b32 s7, s3
	; SI-NEXT: s_mov_b32 s10, s2			; SI-NEXT: s_mov_b32 s10, s2
	; SI-NEXT: s_mov_b32 s11, s3			; SI-NEXT: s_mov_b32 s11, s3
	; SI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0			; SI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0
	; SI-NEXT: buffer_load_dwordx2 v[2:3], off, s[8:11], 0 offset:32			; SI-NEXT: buffer_load_dwordx2 v[2:3], off, s[8:11], 0 offset:32
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_div_scale_f32 v4, vcc, v1, v3, v1			; SI-NEXT: v_div_scale_f32 v4, vcc, v3, v3, v1
	; SI-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1			; SI-NEXT: v_rcp_f32_e32 v5, v4
	; SI-NEXT: v_rcp_f32_e32 v6, v5			; SI-NEXT: v_div_scale_f32 v6, vcc, v1, v3, v1
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; SI-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; SI-NEXT: v_fma_f32 v6, v7, v6, v6			; SI-NEXT: v_fma_f32 v5, v7, v5, v5
	; SI-NEXT: v_mul_f32_e32 v7, v4, v6			; SI-NEXT: v_mul_f32_e32 v7, v6, v5
	; SI-NEXT: v_fma_f32 v8, -v5, v7, v4			; SI-NEXT: v_fma_f32 v8, -v4, v7, v6
	; SI-NEXT: v_fma_f32 v7, v8, v6, v7			; SI-NEXT: v_fma_f32 v7, v8, v5, v7
	; SI-NEXT: v_fma_f32 v4, -v5, v7, v4			; SI-NEXT: v_fma_f32 v4, -v4, v7, v6
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v4, v4, v6, v7			; SI-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; SI-NEXT: v_div_fixup_f32 v4, v4, v3, v1			; SI-NEXT: v_div_fixup_f32 v4, v4, v3, v1
	; SI-NEXT: v_trunc_f32_e32 v4, v4			; SI-NEXT: v_trunc_f32_e32 v4, v4
	; SI-NEXT: v_fma_f32 v1, -v4, v3, v1			; SI-NEXT: v_fma_f32 v1, -v4, v3, v1
	; SI-NEXT: v_div_scale_f32 v3, vcc, v0, v2, v0			; SI-NEXT: v_div_scale_f32 v3, vcc, v2, v2, v0
	; SI-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; SI-NEXT: v_rcp_f32_e32 v4, v3
	; SI-NEXT: v_rcp_f32_e32 v5, v4			; SI-NEXT: v_div_scale_f32 v5, vcc, v0, v2, v0
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v6, -v4, v5, 1.0			; SI-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; SI-NEXT: v_fma_f32 v5, v6, v5, v5			; SI-NEXT: v_fma_f32 v4, v6, v4, v4
	; SI-NEXT: v_mul_f32_e32 v6, v3, v5			; SI-NEXT: v_mul_f32_e32 v6, v5, v4
	; SI-NEXT: v_fma_f32 v7, -v4, v6, v3			; SI-NEXT: v_fma_f32 v7, -v3, v6, v5
	; SI-NEXT: v_fma_f32 v6, v7, v5, v6			; SI-NEXT: v_fma_f32 v6, v7, v4, v6
	; SI-NEXT: v_fma_f32 v3, -v4, v6, v3			; SI-NEXT: v_fma_f32 v3, -v3, v6, v5
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; SI-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; SI-NEXT: v_div_fixup_f32 v3, v3, v2, v0			; SI-NEXT: v_div_fixup_f32 v3, v3, v2, v0
	; SI-NEXT: v_trunc_f32_e32 v3, v3			; SI-NEXT: v_trunc_f32_e32 v3, v3
	; SI-NEXT: v_fma_f32 v0, -v3, v2, v0			; SI-NEXT: v_fma_f32 v0, -v3, v2, v0
	; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; CI-LABEL: frem_v2f32:			; CI-LABEL: frem_v2f32:
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd
	; CI-NEXT: s_mov_b32 s3, 0xf000			; CI-NEXT: s_mov_b32 s3, 0xf000
	; CI-NEXT: s_mov_b32 s2, -1			; CI-NEXT: s_mov_b32 s2, -1
	; CI-NEXT: s_mov_b32 s10, s2			; CI-NEXT: s_mov_b32 s10, s2
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_mov_b32 s0, s4			; CI-NEXT: s_mov_b32 s0, s4
	; CI-NEXT: s_mov_b32 s1, s5			; CI-NEXT: s_mov_b32 s1, s5
	; CI-NEXT: s_mov_b32 s4, s6			; CI-NEXT: s_mov_b32 s4, s6
	; CI-NEXT: s_mov_b32 s5, s7			; CI-NEXT: s_mov_b32 s5, s7
	; CI-NEXT: s_mov_b32 s6, s2			; CI-NEXT: s_mov_b32 s6, s2
	; CI-NEXT: s_mov_b32 s7, s3			; CI-NEXT: s_mov_b32 s7, s3
	; CI-NEXT: s_mov_b32 s11, s3			; CI-NEXT: s_mov_b32 s11, s3
	; CI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0			; CI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0
	; CI-NEXT: buffer_load_dwordx2 v[2:3], off, s[8:11], 0 offset:32			; CI-NEXT: buffer_load_dwordx2 v[2:3], off, s[8:11], 0 offset:32
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_div_scale_f32 v5, s[4:5], v3, v3, v1			; CI-NEXT: v_div_scale_f32 v4, vcc, v3, v3, v1
	; CI-NEXT: v_div_scale_f32 v4, vcc, v1, v3, v1			; CI-NEXT: v_div_scale_f32 v6, vcc, v1, v3, v1
	; CI-NEXT: v_rcp_f32_e32 v6, v5			; CI-NEXT: v_rcp_f32_e32 v5, v4
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; CI-NEXT: v_fma_f32 v7, -v4, v5, 1.0
	; CI-NEXT: v_fma_f32 v6, v7, v6, v6			; CI-NEXT: v_fma_f32 v5, v7, v5, v5
	; CI-NEXT: v_mul_f32_e32 v7, v4, v6			; CI-NEXT: v_mul_f32_e32 v7, v6, v5
	; CI-NEXT: v_fma_f32 v8, -v5, v7, v4			; CI-NEXT: v_fma_f32 v8, -v4, v7, v6
	; CI-NEXT: v_fma_f32 v7, v8, v6, v7			; CI-NEXT: v_fma_f32 v7, v8, v5, v7
	; CI-NEXT: v_fma_f32 v4, -v5, v7, v4			; CI-NEXT: v_fma_f32 v4, -v4, v7, v6
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v4, v4, v6, v7			; CI-NEXT: v_div_fmas_f32 v4, v4, v5, v7
	; CI-NEXT: v_div_fixup_f32 v4, v4, v3, v1			; CI-NEXT: v_div_fixup_f32 v4, v4, v3, v1
	; CI-NEXT: v_trunc_f32_e32 v4, v4			; CI-NEXT: v_trunc_f32_e32 v4, v4
	; CI-NEXT: v_fma_f32 v1, -v4, v3, v1			; CI-NEXT: v_fma_f32 v1, -v4, v3, v1
	; CI-NEXT: v_div_scale_f32 v4, s[4:5], v2, v2, v0			; CI-NEXT: v_div_scale_f32 v3, vcc, v2, v2, v0
	; CI-NEXT: v_div_scale_f32 v3, vcc, v0, v2, v0			; CI-NEXT: v_div_scale_f32 v5, vcc, v0, v2, v0
	; CI-NEXT: v_rcp_f32_e32 v5, v4			; CI-NEXT: v_rcp_f32_e32 v4, v3
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v6, -v4, v5, 1.0			; CI-NEXT: v_fma_f32 v6, -v3, v4, 1.0
	; CI-NEXT: v_fma_f32 v5, v6, v5, v5			; CI-NEXT: v_fma_f32 v4, v6, v4, v4
	; CI-NEXT: v_mul_f32_e32 v6, v3, v5			; CI-NEXT: v_mul_f32_e32 v6, v5, v4
	; CI-NEXT: v_fma_f32 v7, -v4, v6, v3			; CI-NEXT: v_fma_f32 v7, -v3, v6, v5
	; CI-NEXT: v_fma_f32 v6, v7, v5, v6			; CI-NEXT: v_fma_f32 v6, v7, v4, v6
	; CI-NEXT: v_fma_f32 v3, -v4, v6, v3			; CI-NEXT: v_fma_f32 v3, -v3, v6, v5
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v3, v3, v5, v6			; CI-NEXT: v_div_fmas_f32 v3, v3, v4, v6
	; CI-NEXT: v_div_fixup_f32 v3, v3, v2, v0			; CI-NEXT: v_div_fixup_f32 v3, v3, v2, v0
	; CI-NEXT: v_trunc_f32_e32 v3, v3			; CI-NEXT: v_trunc_f32_e32 v3, v3
	; CI-NEXT: v_fma_f32 v0, -v3, v2, v0			; CI-NEXT: v_fma_f32 v0, -v3, v2, v0
	; CI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; CI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	;			;
	; VI-LABEL: frem_v2f32:			; VI-LABEL: frem_v2f32:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v2, s6			; VI-NEXT: v_mov_b32_e32 v2, s6
	; VI-NEXT: s_add_u32 s0, s0, 32			; VI-NEXT: s_add_u32 s0, s0, 32
	; VI-NEXT: s_addc_u32 s1, s1, 0			; VI-NEXT: s_addc_u32 s1, s1, 0
	; VI-NEXT: v_mov_b32_e32 v5, s1			; VI-NEXT: v_mov_b32_e32 v5, s1
	; VI-NEXT: v_mov_b32_e32 v3, s7			; VI-NEXT: v_mov_b32_e32 v3, s7
	; VI-NEXT: v_mov_b32_e32 v4, s0			; VI-NEXT: v_mov_b32_e32 v4, s0
	; VI-NEXT: flat_load_dwordx2 v[2:3], v[2:3]			; VI-NEXT: flat_load_dwordx2 v[2:3], v[2:3]
	; VI-NEXT: flat_load_dwordx2 v[4:5], v[4:5]			; VI-NEXT: flat_load_dwordx2 v[4:5], v[4:5]
	; VI-NEXT: v_mov_b32_e32 v0, s4			; VI-NEXT: v_mov_b32_e32 v0, s4
	; VI-NEXT: v_mov_b32_e32 v1, s5			; VI-NEXT: v_mov_b32_e32 v1, s5
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_div_scale_f32 v7, s[0:1], v5, v5, v3			; VI-NEXT: v_div_scale_f32 v6, vcc, v5, v5, v3
	; VI-NEXT: v_div_scale_f32 v6, vcc, v3, v5, v3			; VI-NEXT: v_div_scale_f32 v8, vcc, v3, v5, v3
	; VI-NEXT: v_rcp_f32_e32 v8, v7			; VI-NEXT: v_rcp_f32_e32 v7, v6
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v9, -v7, v8, 1.0			; VI-NEXT: v_fma_f32 v9, -v6, v7, 1.0
	; VI-NEXT: v_fma_f32 v8, v9, v8, v8			; VI-NEXT: v_fma_f32 v7, v9, v7, v7
	; VI-NEXT: v_mul_f32_e32 v9, v6, v8			; VI-NEXT: v_mul_f32_e32 v9, v8, v7
	; VI-NEXT: v_fma_f32 v10, -v7, v9, v6			; VI-NEXT: v_fma_f32 v10, -v6, v9, v8
	; VI-NEXT: v_fma_f32 v9, v10, v8, v9			; VI-NEXT: v_fma_f32 v9, v10, v7, v9
	; VI-NEXT: v_fma_f32 v6, -v7, v9, v6			; VI-NEXT: v_fma_f32 v6, -v6, v9, v8
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v6, v6, v8, v9			; VI-NEXT: v_div_fmas_f32 v6, v6, v7, v9
	; VI-NEXT: v_div_fixup_f32 v6, v6, v5, v3			; VI-NEXT: v_div_fixup_f32 v6, v6, v5, v3
	; VI-NEXT: v_trunc_f32_e32 v6, v6			; VI-NEXT: v_trunc_f32_e32 v6, v6
	; VI-NEXT: v_fma_f32 v3, -v6, v5, v3			; VI-NEXT: v_fma_f32 v3, -v6, v5, v3
	; VI-NEXT: v_div_scale_f32 v6, s[0:1], v4, v4, v2			; VI-NEXT: v_div_scale_f32 v5, vcc, v4, v4, v2
	; VI-NEXT: v_div_scale_f32 v5, vcc, v2, v4, v2			; VI-NEXT: v_div_scale_f32 v7, vcc, v2, v4, v2
	; VI-NEXT: v_rcp_f32_e32 v7, v6			; VI-NEXT: v_rcp_f32_e32 v6, v5
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v8, -v6, v7, 1.0			; VI-NEXT: v_fma_f32 v8, -v5, v6, 1.0
	; VI-NEXT: v_fma_f32 v7, v8, v7, v7			; VI-NEXT: v_fma_f32 v6, v8, v6, v6
	; VI-NEXT: v_mul_f32_e32 v8, v5, v7			; VI-NEXT: v_mul_f32_e32 v8, v7, v6
	; VI-NEXT: v_fma_f32 v9, -v6, v8, v5			; VI-NEXT: v_fma_f32 v9, -v5, v8, v7
	; VI-NEXT: v_fma_f32 v8, v9, v7, v8			; VI-NEXT: v_fma_f32 v8, v9, v6, v8
	; VI-NEXT: v_fma_f32 v5, -v6, v8, v5			; VI-NEXT: v_fma_f32 v5, -v5, v8, v7
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v5, v5, v7, v8			; VI-NEXT: v_div_fmas_f32 v5, v5, v6, v8
	; VI-NEXT: v_div_fixup_f32 v5, v5, v4, v2			; VI-NEXT: v_div_fixup_f32 v5, v5, v4, v2
	; VI-NEXT: v_trunc_f32_e32 v5, v5			; VI-NEXT: v_trunc_f32_e32 v5, v5
	; VI-NEXT: v_fma_f32 v2, -v5, v4, v2			; VI-NEXT: v_fma_f32 v2, -v5, v4, v2
	; VI-NEXT: flat_store_dwordx2 v[0:1], v[2:3]			; VI-NEXT: flat_store_dwordx2 v[0:1], v[2:3]
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: frem_v2f32:			; GFX9-LABEL: frem_v2f32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX9-NEXT: v_mov_b32_e32 v4, 0			; GFX9-NEXT: v_mov_b32_e32 v4, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v4, s[6:7]			; GFX9-NEXT: global_load_dwordx2 v[0:1], v4, s[6:7]
	; GFX9-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:32			; GFX9-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:32
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_div_scale_f32 v6, s[0:1], v3, v3, v1			; GFX9-NEXT: v_div_scale_f32 v5, vcc, v3, v3, v1
	; GFX9-NEXT: v_div_scale_f32 v5, vcc, v1, v3, v1			; GFX9-NEXT: v_div_scale_f32 v7, vcc, v1, v3, v1
	; GFX9-NEXT: v_rcp_f32_e32 v7, v6			; GFX9-NEXT: v_rcp_f32_e32 v6, v5
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX9-NEXT: v_fma_f32 v8, -v6, v7, 1.0			; GFX9-NEXT: v_fma_f32 v8, -v5, v6, 1.0
	; GFX9-NEXT: v_fma_f32 v7, v8, v7, v7			; GFX9-NEXT: v_fma_f32 v6, v8, v6, v6
	; GFX9-NEXT: v_mul_f32_e32 v8, v5, v7			; GFX9-NEXT: v_mul_f32_e32 v8, v7, v6
	; GFX9-NEXT: v_fma_f32 v9, -v6, v8, v5			; GFX9-NEXT: v_fma_f32 v9, -v5, v8, v7
	; GFX9-NEXT: v_fma_f32 v8, v9, v7, v8			; GFX9-NEXT: v_fma_f32 v8, v9, v6, v8
	; GFX9-NEXT: v_fma_f32 v5, -v6, v8, v5			; GFX9-NEXT: v_fma_f32 v5, -v5, v8, v7
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX9-NEXT: v_div_fmas_f32 v5, v5, v7, v8			; GFX9-NEXT: v_div_fmas_f32 v5, v5, v6, v8
	; GFX9-NEXT: v_div_fixup_f32 v5, v5, v3, v1			; GFX9-NEXT: v_div_fixup_f32 v5, v5, v3, v1
	; GFX9-NEXT: v_trunc_f32_e32 v5, v5			; GFX9-NEXT: v_trunc_f32_e32 v5, v5
	; GFX9-NEXT: v_fma_f32 v1, -v5, v3, v1			; GFX9-NEXT: v_fma_f32 v1, -v5, v3, v1
	; GFX9-NEXT: v_div_scale_f32 v5, s[0:1], v2, v2, v0			; GFX9-NEXT: v_div_scale_f32 v3, vcc, v2, v2, v0
	; GFX9-NEXT: v_div_scale_f32 v3, vcc, v0, v2, v0			; GFX9-NEXT: v_div_scale_f32 v6, vcc, v0, v2, v0
	; GFX9-NEXT: v_rcp_f32_e32 v6, v5			; GFX9-NEXT: v_rcp_f32_e32 v5, v3
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX9-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; GFX9-NEXT: v_fma_f32 v7, -v3, v5, 1.0
	; GFX9-NEXT: v_fma_f32 v6, v7, v6, v6			; GFX9-NEXT: v_fma_f32 v5, v7, v5, v5
	; GFX9-NEXT: v_mul_f32_e32 v7, v3, v6			; GFX9-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX9-NEXT: v_fma_f32 v8, -v5, v7, v3			; GFX9-NEXT: v_fma_f32 v8, -v3, v7, v6
	; GFX9-NEXT: v_fma_f32 v7, v8, v6, v7			; GFX9-NEXT: v_fma_f32 v7, v8, v5, v7
	; GFX9-NEXT: v_fma_f32 v3, -v5, v7, v3			; GFX9-NEXT: v_fma_f32 v3, -v3, v7, v6
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX9-NEXT: v_div_fmas_f32 v3, v3, v6, v7			; GFX9-NEXT: v_div_fmas_f32 v3, v3, v5, v7
	; GFX9-NEXT: v_div_fixup_f32 v3, v3, v2, v0			; GFX9-NEXT: v_div_fixup_f32 v3, v3, v2, v0
	; GFX9-NEXT: v_trunc_f32_e32 v3, v3			; GFX9-NEXT: v_trunc_f32_e32 v3, v3
	; GFX9-NEXT: v_fma_f32 v0, -v3, v2, v0			; GFX9-NEXT: v_fma_f32 v0, -v3, v2, v0
	; GFX9-NEXT: global_store_dwordx2 v4, v[0:1], s[4:5]			; GFX9-NEXT: global_store_dwordx2 v4, v[0:1], s[4:5]
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: frem_v2f32:			; GFX10-LABEL: frem_v2f32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v4, 0			; GFX10-NEXT: v_mov_b32_e32 v4, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v4, s[6:7]			; GFX10-NEXT: global_load_dwordx2 v[0:1], v4, s[6:7]
	; GFX10-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:32			; GFX10-NEXT: global_load_dwordx2 v[2:3], v4, s[2:3] offset:32
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v6, s0, v3, v3, v1			; GFX10-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
	; GFX10-NEXT: v_div_scale_f32 v5, vcc_lo, v1, v3, v1			; GFX10-NEXT: v_div_scale_f32 v7, vcc_lo, v1, v3, v1
	; GFX10-NEXT: v_rcp_f32_e32 v7, v6			; GFX10-NEXT: v_rcp_f32_e32 v6, v5
	; GFX10-NEXT: s_denorm_mode 15			; GFX10-NEXT: s_denorm_mode 15
	; GFX10-NEXT: v_fma_f32 v8, -v6, v7, 1.0			; GFX10-NEXT: v_fma_f32 v8, -v5, v6, 1.0
	; GFX10-NEXT: v_fmac_f32_e32 v7, v8, v7			; GFX10-NEXT: v_fmac_f32_e32 v6, v8, v6
	; GFX10-NEXT: v_mul_f32_e32 v8, v5, v7			; GFX10-NEXT: v_mul_f32_e32 v8, v7, v6
	; GFX10-NEXT: v_fma_f32 v9, -v6, v8, v5			; GFX10-NEXT: v_fma_f32 v9, -v5, v8, v7
	; GFX10-NEXT: v_fmac_f32_e32 v8, v9, v7			; GFX10-NEXT: v_fmac_f32_e32 v8, v9, v6
	; GFX10-NEXT: v_fma_f32 v5, -v6, v8, v5			; GFX10-NEXT: v_fma_f32 v5, -v5, v8, v7
	; GFX10-NEXT: s_denorm_mode 12			; GFX10-NEXT: s_denorm_mode 12
	; GFX10-NEXT: v_div_fmas_f32 v5, v5, v7, v8			; GFX10-NEXT: v_div_fmas_f32 v5, v5, v6, v8
	; GFX10-NEXT: v_div_fixup_f32 v5, v5, v3, v1			; GFX10-NEXT: v_div_fixup_f32 v5, v5, v3, v1
	; GFX10-NEXT: v_trunc_f32_e32 v5, v5			; GFX10-NEXT: v_trunc_f32_e32 v5, v5
	; GFX10-NEXT: v_fma_f32 v1, -v5, v3, v1			; GFX10-NEXT: v_fma_f32 v1, -v5, v3, v1
	; GFX10-NEXT: v_div_scale_f32 v5, s0, v2, v2, v0			; GFX10-NEXT: v_div_scale_f32 v3, vcc_lo, v2, v2, v0
	; GFX10-NEXT: v_div_scale_f32 v3, vcc_lo, v0, v2, v0			; GFX10-NEXT: v_div_scale_f32 v6, vcc_lo, v0, v2, v0
	; GFX10-NEXT: v_rcp_f32_e32 v6, v5			; GFX10-NEXT: v_rcp_f32_e32 v5, v3
	; GFX10-NEXT: s_denorm_mode 15			; GFX10-NEXT: s_denorm_mode 15
	; GFX10-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; GFX10-NEXT: v_fma_f32 v7, -v3, v5, 1.0
	; GFX10-NEXT: v_fmac_f32_e32 v6, v7, v6			; GFX10-NEXT: v_fmac_f32_e32 v5, v7, v5
	; GFX10-NEXT: v_mul_f32_e32 v7, v3, v6			; GFX10-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX10-NEXT: v_fma_f32 v8, -v5, v7, v3			; GFX10-NEXT: v_fma_f32 v8, -v3, v7, v6
	; GFX10-NEXT: v_fmac_f32_e32 v7, v8, v6			; GFX10-NEXT: v_fmac_f32_e32 v7, v8, v5
	; GFX10-NEXT: v_fma_f32 v3, -v5, v7, v3			; GFX10-NEXT: v_fma_f32 v3, -v3, v7, v6
	; GFX10-NEXT: s_denorm_mode 12			; GFX10-NEXT: s_denorm_mode 12
	; GFX10-NEXT: v_div_fmas_f32 v3, v3, v6, v7			; GFX10-NEXT: v_div_fmas_f32 v3, v3, v5, v7
	; GFX10-NEXT: v_div_fixup_f32 v3, v3, v2, v0			; GFX10-NEXT: v_div_fixup_f32 v3, v3, v2, v0
	; GFX10-NEXT: v_trunc_f32_e32 v3, v3			; GFX10-NEXT: v_trunc_f32_e32 v3, v3
	; GFX10-NEXT: v_fma_f32 v0, -v3, v2, v0			; GFX10-NEXT: v_fma_f32 v0, -v3, v2, v0
	; GFX10-NEXT: global_store_dwordx2 v4, v[0:1], s[4:5]			; GFX10-NEXT: global_store_dwordx2 v4, v[0:1], s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: frem_v2f32:			; GFX11-LABEL: frem_v2f32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
	; GFX11-NEXT: v_mov_b32_e32 v4, 0			; GFX11-NEXT: v_mov_b32_e32 v4, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: global_load_b64 v[0:1], v4, s[6:7]			; GFX11-NEXT: global_load_b64 v[0:1], v4, s[6:7]
	; GFX11-NEXT: global_load_b64 v[2:3], v4, s[0:1] offset:32			; GFX11-NEXT: global_load_b64 v[2:3], v4, s[0:1] offset:32
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v6, null, v3, v3, v1			; GFX11-NEXT: v_div_scale_f32 v5, vcc_lo, v3, v3, v1
	; GFX11-NEXT: v_div_scale_f32 v5, vcc_lo, v1, v3, v1			; GFX11-NEXT: v_div_scale_f32 v7, vcc_lo, v1, v3, v1
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f32_e32 v7, v6			; GFX11-NEXT: v_rcp_f32_e32 v6, v5
	; GFX11-NEXT: s_denorm_mode 15			; GFX11-NEXT: s_denorm_mode 15
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f32 v8, -v6, v7, 1.0			; GFX11-NEXT: v_fma_f32 v8, -v5, v6, 1.0
	; GFX11-NEXT: v_fmac_f32_e32 v7, v8, v7			; GFX11-NEXT: v_fmac_f32_e32 v6, v8, v6
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f32_e32 v8, v5, v7			; GFX11-NEXT: v_mul_f32_e32 v8, v7, v6
	; GFX11-NEXT: v_fma_f32 v9, -v6, v8, v5			; GFX11-NEXT: v_fma_f32 v9, -v5, v8, v7
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fmac_f32_e32 v8, v9, v7			; GFX11-NEXT: v_fmac_f32_e32 v8, v9, v6
	; GFX11-NEXT: v_fma_f32 v5, -v6, v8, v5			; GFX11-NEXT: v_fma_f32 v5, -v5, v8, v7
	; GFX11-NEXT: s_denorm_mode 12			; GFX11-NEXT: s_denorm_mode 12
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f32 v5, v5, v7, v8			; GFX11-NEXT: v_div_fmas_f32 v5, v5, v6, v8
	; GFX11-NEXT: v_div_fixup_f32 v5, v5, v3, v1			; GFX11-NEXT: v_div_fixup_f32 v5, v5, v3, v1
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f32_e32 v5, v5			; GFX11-NEXT: v_trunc_f32_e32 v5, v5
	; GFX11-NEXT: v_fma_f32 v1, -v5, v3, v1			; GFX11-NEXT: v_fma_f32 v1, -v5, v3, v1
	; GFX11-NEXT: v_div_scale_f32 v5, null, v2, v2, v0			; GFX11-NEXT: v_div_scale_f32 v3, vcc_lo, v2, v2, v0
	; GFX11-NEXT: v_div_scale_f32 v3, vcc_lo, v0, v2, v0			; GFX11-NEXT: v_div_scale_f32 v6, vcc_lo, v0, v2, v0
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f32_e32 v6, v5			; GFX11-NEXT: v_rcp_f32_e32 v5, v3
	; GFX11-NEXT: s_denorm_mode 15			; GFX11-NEXT: s_denorm_mode 15
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f32 v7, -v5, v6, 1.0			; GFX11-NEXT: v_fma_f32 v7, -v3, v5, 1.0
	; GFX11-NEXT: v_fmac_f32_e32 v6, v7, v6			; GFX11-NEXT: v_fmac_f32_e32 v5, v7, v5
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f32_e32 v7, v3, v6			; GFX11-NEXT: v_mul_f32_e32 v7, v6, v5
	; GFX11-NEXT: v_fma_f32 v8, -v5, v7, v3			; GFX11-NEXT: v_fma_f32 v8, -v3, v7, v6
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fmac_f32_e32 v7, v8, v6			; GFX11-NEXT: v_fmac_f32_e32 v7, v8, v5
	; GFX11-NEXT: v_fma_f32 v3, -v5, v7, v3			; GFX11-NEXT: v_fma_f32 v3, -v3, v7, v6
	; GFX11-NEXT: s_denorm_mode 12			; GFX11-NEXT: s_denorm_mode 12
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f32 v3, v3, v6, v7			; GFX11-NEXT: v_div_fmas_f32 v3, v3, v5, v7
	; GFX11-NEXT: v_div_fixup_f32 v3, v3, v2, v0			; GFX11-NEXT: v_div_fixup_f32 v3, v3, v2, v0
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f32_e32 v3, v3			; GFX11-NEXT: v_trunc_f32_e32 v3, v3
	; GFX11-NEXT: v_fma_f32 v0, -v3, v2, v0			; GFX11-NEXT: v_fma_f32 v0, -v3, v2, v0
	; GFX11-NEXT: global_store_b64 v4, v[0:1], s[4:5]			; GFX11-NEXT: global_store_b64 v4, v[0:1], s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	<2 x float> addrspace(1)* %in2) #0 {			<2 x float> addrspace(1)* %in2) #0 {
	; SI-NEXT: s_mov_b32 s5, s7			; SI-NEXT: s_mov_b32 s5, s7
	; SI-NEXT: s_mov_b32 s6, s2			; SI-NEXT: s_mov_b32 s6, s2
	; SI-NEXT: s_mov_b32 s7, s3			; SI-NEXT: s_mov_b32 s7, s3
	; SI-NEXT: s_mov_b32 s10, s2			; SI-NEXT: s_mov_b32 s10, s2
	; SI-NEXT: s_mov_b32 s11, s3			; SI-NEXT: s_mov_b32 s11, s3
	; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0			; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
	; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[8:11], 0 offset:64			; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[8:11], 0 offset:64
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_div_scale_f32 v8, vcc, v3, v7, v3			; SI-NEXT: v_div_scale_f32 v8, vcc, v7, v7, v3
	; SI-NEXT: v_div_scale_f32 v9, s[4:5], v7, v7, v3			; SI-NEXT: v_rcp_f32_e32 v9, v8
	; SI-NEXT: v_rcp_f32_e32 v10, v9			; SI-NEXT: v_div_scale_f32 v10, vcc, v3, v7, v3
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v11, -v9, v10, 1.0			; SI-NEXT: v_fma_f32 v11, -v8, v9, 1.0
	; SI-NEXT: v_fma_f32 v10, v11, v10, v10			; SI-NEXT: v_fma_f32 v9, v11, v9, v9
	; SI-NEXT: v_mul_f32_e32 v11, v8, v10			; SI-NEXT: v_mul_f32_e32 v11, v10, v9
	; SI-NEXT: v_fma_f32 v12, -v9, v11, v8			; SI-NEXT: v_fma_f32 v12, -v8, v11, v10
	; SI-NEXT: v_fma_f32 v11, v12, v10, v11			; SI-NEXT: v_fma_f32 v11, v12, v9, v11
	; SI-NEXT: v_fma_f32 v8, -v9, v11, v8			; SI-NEXT: v_fma_f32 v8, -v8, v11, v10
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v8, v8, v10, v11			; SI-NEXT: v_div_fmas_f32 v8, v8, v9, v11
	; SI-NEXT: v_div_fixup_f32 v8, v8, v7, v3			; SI-NEXT: v_div_fixup_f32 v8, v8, v7, v3
	; SI-NEXT: v_trunc_f32_e32 v8, v8			; SI-NEXT: v_trunc_f32_e32 v8, v8
	; SI-NEXT: v_fma_f32 v3, -v8, v7, v3			; SI-NEXT: v_fma_f32 v3, -v8, v7, v3
	; SI-NEXT: v_div_scale_f32 v7, vcc, v2, v6, v2			; SI-NEXT: v_div_scale_f32 v7, vcc, v6, v6, v2
	; SI-NEXT: v_div_scale_f32 v8, s[4:5], v6, v6, v2			; SI-NEXT: v_rcp_f32_e32 v8, v7
	; SI-NEXT: v_rcp_f32_e32 v9, v8			; SI-NEXT: v_div_scale_f32 v9, vcc, v2, v6, v2
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v10, -v8, v9, 1.0			; SI-NEXT: v_fma_f32 v10, -v7, v8, 1.0
	; SI-NEXT: v_fma_f32 v9, v10, v9, v9			; SI-NEXT: v_fma_f32 v8, v10, v8, v8
	; SI-NEXT: v_mul_f32_e32 v10, v7, v9			; SI-NEXT: v_mul_f32_e32 v10, v9, v8
	; SI-NEXT: v_fma_f32 v11, -v8, v10, v7			; SI-NEXT: v_fma_f32 v11, -v7, v10, v9
	; SI-NEXT: v_fma_f32 v10, v11, v9, v10			; SI-NEXT: v_fma_f32 v10, v11, v8, v10
	; SI-NEXT: v_fma_f32 v7, -v8, v10, v7			; SI-NEXT: v_fma_f32 v7, -v7, v10, v9
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v7, v7, v9, v10			; SI-NEXT: v_div_fmas_f32 v7, v7, v8, v10
	; SI-NEXT: v_div_fixup_f32 v7, v7, v6, v2			; SI-NEXT: v_div_fixup_f32 v7, v7, v6, v2
	; SI-NEXT: v_trunc_f32_e32 v7, v7			; SI-NEXT: v_trunc_f32_e32 v7, v7
	; SI-NEXT: v_fma_f32 v2, -v7, v6, v2			; SI-NEXT: v_fma_f32 v2, -v7, v6, v2
	; SI-NEXT: v_div_scale_f32 v6, vcc, v1, v5, v1			; SI-NEXT: v_div_scale_f32 v6, vcc, v5, v5, v1
	; SI-NEXT: v_div_scale_f32 v7, s[4:5], v5, v5, v1			; SI-NEXT: v_rcp_f32_e32 v7, v6
	; SI-NEXT: v_rcp_f32_e32 v8, v7			; SI-NEXT: v_div_scale_f32 v8, vcc, v1, v5, v1
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v9, -v7, v8, 1.0			; SI-NEXT: v_fma_f32 v9, -v6, v7, 1.0
	; SI-NEXT: v_fma_f32 v8, v9, v8, v8			; SI-NEXT: v_fma_f32 v7, v9, v7, v7
	; SI-NEXT: v_mul_f32_e32 v9, v6, v8			; SI-NEXT: v_mul_f32_e32 v9, v8, v7
	; SI-NEXT: v_fma_f32 v10, -v7, v9, v6			; SI-NEXT: v_fma_f32 v10, -v6, v9, v8
	; SI-NEXT: v_fma_f32 v9, v10, v8, v9			; SI-NEXT: v_fma_f32 v9, v10, v7, v9
	; SI-NEXT: v_fma_f32 v6, -v7, v9, v6			; SI-NEXT: v_fma_f32 v6, -v6, v9, v8
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v6, v6, v8, v9			; SI-NEXT: v_div_fmas_f32 v6, v6, v7, v9
	; SI-NEXT: v_div_fixup_f32 v6, v6, v5, v1			; SI-NEXT: v_div_fixup_f32 v6, v6, v5, v1
	; SI-NEXT: v_trunc_f32_e32 v6, v6			; SI-NEXT: v_trunc_f32_e32 v6, v6
	; SI-NEXT: v_fma_f32 v1, -v6, v5, v1			; SI-NEXT: v_fma_f32 v1, -v6, v5, v1
	; SI-NEXT: v_div_scale_f32 v5, vcc, v0, v4, v0			; SI-NEXT: v_div_scale_f32 v5, vcc, v4, v4, v0
	; SI-NEXT: v_div_scale_f32 v6, s[4:5], v4, v4, v0			; SI-NEXT: v_rcp_f32_e32 v6, v5
	; SI-NEXT: v_rcp_f32_e32 v7, v6			; SI-NEXT: v_div_scale_f32 v7, vcc, v0, v4, v0
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; SI-NEXT: v_fma_f32 v8, -v6, v7, 1.0			; SI-NEXT: v_fma_f32 v8, -v5, v6, 1.0
	; SI-NEXT: v_fma_f32 v7, v8, v7, v7			; SI-NEXT: v_fma_f32 v6, v8, v6, v6
	; SI-NEXT: v_mul_f32_e32 v8, v5, v7			; SI-NEXT: v_mul_f32_e32 v8, v7, v6
	; SI-NEXT: v_fma_f32 v9, -v6, v8, v5			; SI-NEXT: v_fma_f32 v9, -v5, v8, v7
	; SI-NEXT: v_fma_f32 v8, v9, v7, v8			; SI-NEXT: v_fma_f32 v8, v9, v6, v8
	; SI-NEXT: v_fma_f32 v5, -v6, v8, v5			; SI-NEXT: v_fma_f32 v5, -v5, v8, v7
	; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; SI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; SI-NEXT: v_div_fmas_f32 v5, v5, v7, v8			; SI-NEXT: v_div_fmas_f32 v5, v5, v6, v8
	; SI-NEXT: v_div_fixup_f32 v5, v5, v4, v0			; SI-NEXT: v_div_fixup_f32 v5, v5, v4, v0
	; SI-NEXT: v_trunc_f32_e32 v5, v5			; SI-NEXT: v_trunc_f32_e32 v5, v5
	; SI-NEXT: v_fma_f32 v0, -v5, v4, v0			; SI-NEXT: v_fma_f32 v0, -v5, v4, v0
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; CI-LABEL: frem_v4f32:			; CI-LABEL: frem_v4f32:
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; CI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd			; CI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd
	; CI-NEXT: s_mov_b32 s3, 0xf000			; CI-NEXT: s_mov_b32 s3, 0xf000
	; CI-NEXT: s_mov_b32 s2, -1			; CI-NEXT: s_mov_b32 s2, -1
	; CI-NEXT: s_mov_b32 s10, s2			; CI-NEXT: s_mov_b32 s10, s2
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_mov_b32 s0, s4			; CI-NEXT: s_mov_b32 s0, s4
	; CI-NEXT: s_mov_b32 s1, s5			; CI-NEXT: s_mov_b32 s1, s5
	; CI-NEXT: s_mov_b32 s4, s6			; CI-NEXT: s_mov_b32 s4, s6
	; CI-NEXT: s_mov_b32 s5, s7			; CI-NEXT: s_mov_b32 s5, s7
	; CI-NEXT: s_mov_b32 s6, s2			; CI-NEXT: s_mov_b32 s6, s2
	; CI-NEXT: s_mov_b32 s7, s3			; CI-NEXT: s_mov_b32 s7, s3
	; CI-NEXT: s_mov_b32 s11, s3			; CI-NEXT: s_mov_b32 s11, s3
	; CI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0			; CI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
	; CI-NEXT: buffer_load_dwordx4 v[4:7], off, s[8:11], 0 offset:64			; CI-NEXT: buffer_load_dwordx4 v[4:7], off, s[8:11], 0 offset:64
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_div_scale_f32 v9, s[4:5], v7, v7, v3			; CI-NEXT: v_div_scale_f32 v8, vcc, v7, v7, v3
	; CI-NEXT: v_div_scale_f32 v8, vcc, v3, v7, v3			; CI-NEXT: v_div_scale_f32 v10, vcc, v3, v7, v3
	; CI-NEXT: v_rcp_f32_e32 v10, v9			; CI-NEXT: v_rcp_f32_e32 v9, v8
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v11, -v9, v10, 1.0			; CI-NEXT: v_fma_f32 v11, -v8, v9, 1.0
	; CI-NEXT: v_fma_f32 v10, v11, v10, v10			; CI-NEXT: v_fma_f32 v9, v11, v9, v9
	; CI-NEXT: v_mul_f32_e32 v11, v8, v10			; CI-NEXT: v_mul_f32_e32 v11, v10, v9
	; CI-NEXT: v_fma_f32 v12, -v9, v11, v8			; CI-NEXT: v_fma_f32 v12, -v8, v11, v10
	; CI-NEXT: v_fma_f32 v11, v12, v10, v11			; CI-NEXT: v_fma_f32 v11, v12, v9, v11
	; CI-NEXT: v_fma_f32 v8, -v9, v11, v8			; CI-NEXT: v_fma_f32 v8, -v8, v11, v10
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v8, v8, v10, v11			; CI-NEXT: v_div_fmas_f32 v8, v8, v9, v11
	; CI-NEXT: v_div_fixup_f32 v8, v8, v7, v3			; CI-NEXT: v_div_fixup_f32 v8, v8, v7, v3
	; CI-NEXT: v_trunc_f32_e32 v8, v8			; CI-NEXT: v_trunc_f32_e32 v8, v8
	; CI-NEXT: v_fma_f32 v3, -v8, v7, v3			; CI-NEXT: v_fma_f32 v3, -v8, v7, v3
	; CI-NEXT: v_div_scale_f32 v8, s[4:5], v6, v6, v2			; CI-NEXT: v_div_scale_f32 v7, vcc, v6, v6, v2
	; CI-NEXT: v_div_scale_f32 v7, vcc, v2, v6, v2			; CI-NEXT: v_div_scale_f32 v9, vcc, v2, v6, v2
	; CI-NEXT: v_rcp_f32_e32 v9, v8			; CI-NEXT: v_rcp_f32_e32 v8, v7
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v10, -v8, v9, 1.0			; CI-NEXT: v_fma_f32 v10, -v7, v8, 1.0
	; CI-NEXT: v_fma_f32 v9, v10, v9, v9			; CI-NEXT: v_fma_f32 v8, v10, v8, v8
	; CI-NEXT: v_mul_f32_e32 v10, v7, v9			; CI-NEXT: v_mul_f32_e32 v10, v9, v8
	; CI-NEXT: v_fma_f32 v11, -v8, v10, v7			; CI-NEXT: v_fma_f32 v11, -v7, v10, v9
	; CI-NEXT: v_fma_f32 v10, v11, v9, v10			; CI-NEXT: v_fma_f32 v10, v11, v8, v10
	; CI-NEXT: v_fma_f32 v7, -v8, v10, v7			; CI-NEXT: v_fma_f32 v7, -v7, v10, v9
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v7, v7, v9, v10			; CI-NEXT: v_div_fmas_f32 v7, v7, v8, v10
	; CI-NEXT: v_div_fixup_f32 v7, v7, v6, v2			; CI-NEXT: v_div_fixup_f32 v7, v7, v6, v2
	; CI-NEXT: v_trunc_f32_e32 v7, v7			; CI-NEXT: v_trunc_f32_e32 v7, v7
	; CI-NEXT: v_fma_f32 v2, -v7, v6, v2			; CI-NEXT: v_fma_f32 v2, -v7, v6, v2
	; CI-NEXT: v_div_scale_f32 v7, s[4:5], v5, v5, v1			; CI-NEXT: v_div_scale_f32 v6, vcc, v5, v5, v1
	; CI-NEXT: v_div_scale_f32 v6, vcc, v1, v5, v1			; CI-NEXT: v_div_scale_f32 v8, vcc, v1, v5, v1
	; CI-NEXT: v_rcp_f32_e32 v8, v7			; CI-NEXT: v_rcp_f32_e32 v7, v6
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v9, -v7, v8, 1.0			; CI-NEXT: v_fma_f32 v9, -v6, v7, 1.0
	; CI-NEXT: v_fma_f32 v8, v9, v8, v8			; CI-NEXT: v_fma_f32 v7, v9, v7, v7
	; CI-NEXT: v_mul_f32_e32 v9, v6, v8			; CI-NEXT: v_mul_f32_e32 v9, v8, v7
	; CI-NEXT: v_fma_f32 v10, -v7, v9, v6			; CI-NEXT: v_fma_f32 v10, -v6, v9, v8
	; CI-NEXT: v_fma_f32 v9, v10, v8, v9			; CI-NEXT: v_fma_f32 v9, v10, v7, v9
	; CI-NEXT: v_fma_f32 v6, -v7, v9, v6			; CI-NEXT: v_fma_f32 v6, -v6, v9, v8
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v6, v6, v8, v9			; CI-NEXT: v_div_fmas_f32 v6, v6, v7, v9
	; CI-NEXT: v_div_fixup_f32 v6, v6, v5, v1			; CI-NEXT: v_div_fixup_f32 v6, v6, v5, v1
	; CI-NEXT: v_trunc_f32_e32 v6, v6			; CI-NEXT: v_trunc_f32_e32 v6, v6
	; CI-NEXT: v_fma_f32 v1, -v6, v5, v1			; CI-NEXT: v_fma_f32 v1, -v6, v5, v1
	; CI-NEXT: v_div_scale_f32 v6, s[4:5], v4, v4, v0			; CI-NEXT: v_div_scale_f32 v5, vcc, v4, v4, v0
	; CI-NEXT: v_div_scale_f32 v5, vcc, v0, v4, v0			; CI-NEXT: v_div_scale_f32 v7, vcc, v0, v4, v0
	; CI-NEXT: v_rcp_f32_e32 v7, v6			; CI-NEXT: v_rcp_f32_e32 v6, v5
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; CI-NEXT: v_fma_f32 v8, -v6, v7, 1.0			; CI-NEXT: v_fma_f32 v8, -v5, v6, 1.0
	; CI-NEXT: v_fma_f32 v7, v8, v7, v7			; CI-NEXT: v_fma_f32 v6, v8, v6, v6
	; CI-NEXT: v_mul_f32_e32 v8, v5, v7			; CI-NEXT: v_mul_f32_e32 v8, v7, v6
	; CI-NEXT: v_fma_f32 v9, -v6, v8, v5			; CI-NEXT: v_fma_f32 v9, -v5, v8, v7
	; CI-NEXT: v_fma_f32 v8, v9, v7, v8			; CI-NEXT: v_fma_f32 v8, v9, v6, v8
	; CI-NEXT: v_fma_f32 v5, -v6, v8, v5			; CI-NEXT: v_fma_f32 v5, -v5, v8, v7
	; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; CI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; CI-NEXT: v_div_fmas_f32 v5, v5, v7, v8			; CI-NEXT: v_div_fmas_f32 v5, v5, v6, v8
	; CI-NEXT: v_div_fixup_f32 v5, v5, v4, v0			; CI-NEXT: v_div_fixup_f32 v5, v5, v4, v0
	; CI-NEXT: v_trunc_f32_e32 v5, v5			; CI-NEXT: v_trunc_f32_e32 v5, v5
	; CI-NEXT: v_fma_f32 v0, -v5, v4, v0			; CI-NEXT: v_fma_f32 v0, -v5, v4, v0
	; CI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			; CI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	;			;
	; VI-LABEL: frem_v4f32:			; VI-LABEL: frem_v4f32:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s6			; VI-NEXT: v_mov_b32_e32 v0, s6
	; VI-NEXT: s_add_u32 s0, s0, 64			; VI-NEXT: s_add_u32 s0, s0, 64
	; VI-NEXT: s_addc_u32 s1, s1, 0			; VI-NEXT: s_addc_u32 s1, s1, 0
	; VI-NEXT: v_mov_b32_e32 v5, s1			; VI-NEXT: v_mov_b32_e32 v5, s1
	; VI-NEXT: v_mov_b32_e32 v1, s7			; VI-NEXT: v_mov_b32_e32 v1, s7
	; VI-NEXT: v_mov_b32_e32 v4, s0			; VI-NEXT: v_mov_b32_e32 v4, s0
	; VI-NEXT: flat_load_dwordx4 v[0:3], v[0:1]			; VI-NEXT: flat_load_dwordx4 v[0:3], v[0:1]
	; VI-NEXT: flat_load_dwordx4 v[4:7], v[4:5]			; VI-NEXT: flat_load_dwordx4 v[4:7], v[4:5]
	; VI-NEXT: v_mov_b32_e32 v8, s4			; VI-NEXT: v_mov_b32_e32 v8, s4
	; VI-NEXT: v_mov_b32_e32 v9, s5			; VI-NEXT: v_mov_b32_e32 v9, s5
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_div_scale_f32 v11, s[0:1], v7, v7, v3			; VI-NEXT: v_div_scale_f32 v10, vcc, v7, v7, v3
	; VI-NEXT: v_div_scale_f32 v10, vcc, v3, v7, v3			; VI-NEXT: v_div_scale_f32 v12, vcc, v3, v7, v3
	; VI-NEXT: v_rcp_f32_e32 v12, v11			; VI-NEXT: v_rcp_f32_e32 v11, v10
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v13, -v11, v12, 1.0			; VI-NEXT: v_fma_f32 v13, -v10, v11, 1.0
	; VI-NEXT: v_fma_f32 v12, v13, v12, v12			; VI-NEXT: v_fma_f32 v11, v13, v11, v11
	; VI-NEXT: v_mul_f32_e32 v13, v10, v12			; VI-NEXT: v_mul_f32_e32 v13, v12, v11
	; VI-NEXT: v_fma_f32 v14, -v11, v13, v10			; VI-NEXT: v_fma_f32 v14, -v10, v13, v12
	; VI-NEXT: v_fma_f32 v13, v14, v12, v13			; VI-NEXT: v_fma_f32 v13, v14, v11, v13
	; VI-NEXT: v_fma_f32 v10, -v11, v13, v10			; VI-NEXT: v_fma_f32 v10, -v10, v13, v12
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v10, v10, v12, v13			; VI-NEXT: v_div_fmas_f32 v10, v10, v11, v13
	; VI-NEXT: v_div_fixup_f32 v10, v10, v7, v3			; VI-NEXT: v_div_fixup_f32 v10, v10, v7, v3
	; VI-NEXT: v_trunc_f32_e32 v10, v10			; VI-NEXT: v_trunc_f32_e32 v10, v10
	; VI-NEXT: v_fma_f32 v3, -v10, v7, v3			; VI-NEXT: v_fma_f32 v3, -v10, v7, v3
	; VI-NEXT: v_div_scale_f32 v10, s[0:1], v6, v6, v2			; VI-NEXT: v_div_scale_f32 v7, vcc, v6, v6, v2
	; VI-NEXT: v_div_scale_f32 v7, vcc, v2, v6, v2			; VI-NEXT: v_div_scale_f32 v11, vcc, v2, v6, v2
	; VI-NEXT: v_rcp_f32_e32 v11, v10			; VI-NEXT: v_rcp_f32_e32 v10, v7
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v12, -v10, v11, 1.0			; VI-NEXT: v_fma_f32 v12, -v7, v10, 1.0
	; VI-NEXT: v_fma_f32 v11, v12, v11, v11			; VI-NEXT: v_fma_f32 v10, v12, v10, v10
	; VI-NEXT: v_mul_f32_e32 v12, v7, v11			; VI-NEXT: v_mul_f32_e32 v12, v11, v10
	; VI-NEXT: v_fma_f32 v13, -v10, v12, v7			; VI-NEXT: v_fma_f32 v13, -v7, v12, v11
	; VI-NEXT: v_fma_f32 v12, v13, v11, v12			; VI-NEXT: v_fma_f32 v12, v13, v10, v12
	; VI-NEXT: v_fma_f32 v7, -v10, v12, v7			; VI-NEXT: v_fma_f32 v7, -v7, v12, v11
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v7, v7, v11, v12			; VI-NEXT: v_div_fmas_f32 v7, v7, v10, v12
	; VI-NEXT: v_div_fixup_f32 v7, v7, v6, v2			; VI-NEXT: v_div_fixup_f32 v7, v7, v6, v2
	; VI-NEXT: v_trunc_f32_e32 v7, v7			; VI-NEXT: v_trunc_f32_e32 v7, v7
	; VI-NEXT: v_fma_f32 v2, -v7, v6, v2			; VI-NEXT: v_fma_f32 v2, -v7, v6, v2
	; VI-NEXT: v_div_scale_f32 v7, s[0:1], v5, v5, v1			; VI-NEXT: v_div_scale_f32 v6, vcc, v5, v5, v1
	; VI-NEXT: v_div_scale_f32 v6, vcc, v1, v5, v1			; VI-NEXT: v_div_scale_f32 v10, vcc, v1, v5, v1
	; VI-NEXT: v_rcp_f32_e32 v10, v7			; VI-NEXT: v_rcp_f32_e32 v7, v6
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v11, -v7, v10, 1.0			; VI-NEXT: v_fma_f32 v11, -v6, v7, 1.0
	; VI-NEXT: v_fma_f32 v10, v11, v10, v10			; VI-NEXT: v_fma_f32 v7, v11, v7, v7
	; VI-NEXT: v_mul_f32_e32 v11, v6, v10			; VI-NEXT: v_mul_f32_e32 v11, v10, v7
	; VI-NEXT: v_fma_f32 v12, -v7, v11, v6			; VI-NEXT: v_fma_f32 v12, -v6, v11, v10
	; VI-NEXT: v_fma_f32 v11, v12, v10, v11			; VI-NEXT: v_fma_f32 v11, v12, v7, v11
	; VI-NEXT: v_fma_f32 v6, -v7, v11, v6			; VI-NEXT: v_fma_f32 v6, -v6, v11, v10
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v6, v6, v10, v11			; VI-NEXT: v_div_fmas_f32 v6, v6, v7, v11
	; VI-NEXT: v_div_fixup_f32 v6, v6, v5, v1			; VI-NEXT: v_div_fixup_f32 v6, v6, v5, v1
	; VI-NEXT: v_trunc_f32_e32 v6, v6			; VI-NEXT: v_trunc_f32_e32 v6, v6
	; VI-NEXT: v_fma_f32 v1, -v6, v5, v1			; VI-NEXT: v_fma_f32 v1, -v6, v5, v1
	; VI-NEXT: v_div_scale_f32 v6, s[0:1], v4, v4, v0			; VI-NEXT: v_div_scale_f32 v5, vcc, v4, v4, v0
	; VI-NEXT: v_div_scale_f32 v5, vcc, v0, v4, v0			; VI-NEXT: v_div_scale_f32 v7, vcc, v0, v4, v0
	; VI-NEXT: v_rcp_f32_e32 v7, v6			; VI-NEXT: v_rcp_f32_e32 v6, v5
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; VI-NEXT: v_fma_f32 v10, -v6, v7, 1.0			; VI-NEXT: v_fma_f32 v10, -v5, v6, 1.0
	; VI-NEXT: v_fma_f32 v7, v10, v7, v7			; VI-NEXT: v_fma_f32 v6, v10, v6, v6
	; VI-NEXT: v_mul_f32_e32 v10, v5, v7			; VI-NEXT: v_mul_f32_e32 v10, v7, v6
	; VI-NEXT: v_fma_f32 v11, -v6, v10, v5			; VI-NEXT: v_fma_f32 v11, -v5, v10, v7
	; VI-NEXT: v_fma_f32 v10, v11, v7, v10			; VI-NEXT: v_fma_f32 v10, v11, v6, v10
	; VI-NEXT: v_fma_f32 v5, -v6, v10, v5			; VI-NEXT: v_fma_f32 v5, -v5, v10, v7
	; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; VI-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; VI-NEXT: v_div_fmas_f32 v5, v5, v7, v10			; VI-NEXT: v_div_fmas_f32 v5, v5, v6, v10
	; VI-NEXT: v_div_fixup_f32 v5, v5, v4, v0			; VI-NEXT: v_div_fixup_f32 v5, v5, v4, v0
	; VI-NEXT: v_trunc_f32_e32 v5, v5			; VI-NEXT: v_trunc_f32_e32 v5, v5
	; VI-NEXT: v_fma_f32 v0, -v5, v4, v0			; VI-NEXT: v_fma_f32 v0, -v5, v4, v0
	; VI-NEXT: flat_store_dwordx4 v[8:9], v[0:3]			; VI-NEXT: flat_store_dwordx4 v[8:9], v[0:3]
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: frem_v4f32:			; GFX9-LABEL: frem_v4f32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX9-NEXT: v_mov_b32_e32 v8, 0			; GFX9-NEXT: v_mov_b32_e32 v8, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[6:7]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[6:7]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[2:3] offset:64			; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[2:3] offset:64
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_div_scale_f32 v10, s[0:1], v7, v7, v3			; GFX9-NEXT: v_div_scale_f32 v9, vcc, v7, v7, v3
	; GFX9-NEXT: v_div_scale_f32 v9, vcc, v3, v7, v3			; GFX9-NEXT: v_div_scale_f32 v11, vcc, v3, v7, v3
	; GFX9-NEXT: v_rcp_f32_e32 v11, v10			; GFX9-NEXT: v_rcp_f32_e32 v10, v9
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX9-NEXT: v_fma_f32 v12, -v10, v11, 1.0			; GFX9-NEXT: v_fma_f32 v12, -v9, v10, 1.0
	; GFX9-NEXT: v_fma_f32 v11, v12, v11, v11			; GFX9-NEXT: v_fma_f32 v10, v12, v10, v10
	; GFX9-NEXT: v_mul_f32_e32 v12, v9, v11			; GFX9-NEXT: v_mul_f32_e32 v12, v11, v10
	; GFX9-NEXT: v_fma_f32 v13, -v10, v12, v9			; GFX9-NEXT: v_fma_f32 v13, -v9, v12, v11
	; GFX9-NEXT: v_fma_f32 v12, v13, v11, v12			; GFX9-NEXT: v_fma_f32 v12, v13, v10, v12
	; GFX9-NEXT: v_fma_f32 v9, -v10, v12, v9			; GFX9-NEXT: v_fma_f32 v9, -v9, v12, v11
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX9-NEXT: v_div_fmas_f32 v9, v9, v11, v12			; GFX9-NEXT: v_div_fmas_f32 v9, v9, v10, v12
	; GFX9-NEXT: v_div_fixup_f32 v9, v9, v7, v3			; GFX9-NEXT: v_div_fixup_f32 v9, v9, v7, v3
	; GFX9-NEXT: v_trunc_f32_e32 v9, v9			; GFX9-NEXT: v_trunc_f32_e32 v9, v9
	; GFX9-NEXT: v_fma_f32 v3, -v9, v7, v3			; GFX9-NEXT: v_fma_f32 v3, -v9, v7, v3
	; GFX9-NEXT: v_div_scale_f32 v9, s[0:1], v6, v6, v2			; GFX9-NEXT: v_div_scale_f32 v7, vcc, v6, v6, v2
	; GFX9-NEXT: v_div_scale_f32 v7, vcc, v2, v6, v2			; GFX9-NEXT: v_div_scale_f32 v10, vcc, v2, v6, v2
	; GFX9-NEXT: v_rcp_f32_e32 v10, v9			; GFX9-NEXT: v_rcp_f32_e32 v9, v7
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX9-NEXT: v_fma_f32 v11, -v9, v10, 1.0			; GFX9-NEXT: v_fma_f32 v11, -v7, v9, 1.0
	; GFX9-NEXT: v_fma_f32 v10, v11, v10, v10			; GFX9-NEXT: v_fma_f32 v9, v11, v9, v9
	; GFX9-NEXT: v_mul_f32_e32 v11, v7, v10			; GFX9-NEXT: v_mul_f32_e32 v11, v10, v9
	; GFX9-NEXT: v_fma_f32 v12, -v9, v11, v7			; GFX9-NEXT: v_fma_f32 v12, -v7, v11, v10
	; GFX9-NEXT: v_fma_f32 v11, v12, v10, v11			; GFX9-NEXT: v_fma_f32 v11, v12, v9, v11
	; GFX9-NEXT: v_fma_f32 v7, -v9, v11, v7			; GFX9-NEXT: v_fma_f32 v7, -v7, v11, v10
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX9-NEXT: v_div_fmas_f32 v7, v7, v10, v11			; GFX9-NEXT: v_div_fmas_f32 v7, v7, v9, v11
	; GFX9-NEXT: v_div_fixup_f32 v7, v7, v6, v2			; GFX9-NEXT: v_div_fixup_f32 v7, v7, v6, v2
	; GFX9-NEXT: v_trunc_f32_e32 v7, v7			; GFX9-NEXT: v_trunc_f32_e32 v7, v7
	; GFX9-NEXT: v_fma_f32 v2, -v7, v6, v2			; GFX9-NEXT: v_fma_f32 v2, -v7, v6, v2
	; GFX9-NEXT: v_div_scale_f32 v7, s[0:1], v5, v5, v1			; GFX9-NEXT: v_div_scale_f32 v6, vcc, v5, v5, v1
	; GFX9-NEXT: v_div_scale_f32 v6, vcc, v1, v5, v1			; GFX9-NEXT: v_div_scale_f32 v9, vcc, v1, v5, v1
	; GFX9-NEXT: v_rcp_f32_e32 v9, v7			; GFX9-NEXT: v_rcp_f32_e32 v7, v6
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX9-NEXT: v_fma_f32 v10, -v7, v9, 1.0			; GFX9-NEXT: v_fma_f32 v10, -v6, v7, 1.0
	; GFX9-NEXT: v_fma_f32 v9, v10, v9, v9			; GFX9-NEXT: v_fma_f32 v7, v10, v7, v7
	; GFX9-NEXT: v_mul_f32_e32 v10, v6, v9			; GFX9-NEXT: v_mul_f32_e32 v10, v9, v7
	; GFX9-NEXT: v_fma_f32 v11, -v7, v10, v6			; GFX9-NEXT: v_fma_f32 v11, -v6, v10, v9
	; GFX9-NEXT: v_fma_f32 v10, v11, v9, v10			; GFX9-NEXT: v_fma_f32 v10, v11, v7, v10
	; GFX9-NEXT: v_fma_f32 v6, -v7, v10, v6			; GFX9-NEXT: v_fma_f32 v6, -v6, v10, v9
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX9-NEXT: v_div_fmas_f32 v6, v6, v9, v10			; GFX9-NEXT: v_div_fmas_f32 v6, v6, v7, v10
	; GFX9-NEXT: v_div_fixup_f32 v6, v6, v5, v1			; GFX9-NEXT: v_div_fixup_f32 v6, v6, v5, v1
	; GFX9-NEXT: v_trunc_f32_e32 v6, v6			; GFX9-NEXT: v_trunc_f32_e32 v6, v6
	; GFX9-NEXT: v_fma_f32 v1, -v6, v5, v1			; GFX9-NEXT: v_fma_f32 v1, -v6, v5, v1
	; GFX9-NEXT: v_div_scale_f32 v6, s[0:1], v4, v4, v0			; GFX9-NEXT: v_div_scale_f32 v5, vcc, v4, v4, v0
	; GFX9-NEXT: v_div_scale_f32 v5, vcc, v0, v4, v0			; GFX9-NEXT: v_div_scale_f32 v7, vcc, v0, v4, v0
	; GFX9-NEXT: v_rcp_f32_e32 v7, v6			; GFX9-NEXT: v_rcp_f32_e32 v6, v5
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
	; GFX9-NEXT: v_fma_f32 v9, -v6, v7, 1.0			; GFX9-NEXT: v_fma_f32 v9, -v5, v6, 1.0
	; GFX9-NEXT: v_fma_f32 v7, v9, v7, v7			; GFX9-NEXT: v_fma_f32 v6, v9, v6, v6
	; GFX9-NEXT: v_mul_f32_e32 v9, v5, v7			; GFX9-NEXT: v_mul_f32_e32 v9, v7, v6
	; GFX9-NEXT: v_fma_f32 v10, -v6, v9, v5			; GFX9-NEXT: v_fma_f32 v10, -v5, v9, v7
	; GFX9-NEXT: v_fma_f32 v9, v10, v7, v9			; GFX9-NEXT: v_fma_f32 v9, v10, v6, v9
	; GFX9-NEXT: v_fma_f32 v5, -v6, v9, v5			; GFX9-NEXT: v_fma_f32 v5, -v5, v9, v7
	; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0			; GFX9-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0
	; GFX9-NEXT: v_div_fmas_f32 v5, v5, v7, v9			; GFX9-NEXT: v_div_fmas_f32 v5, v5, v6, v9
	; GFX9-NEXT: v_div_fixup_f32 v5, v5, v4, v0			; GFX9-NEXT: v_div_fixup_f32 v5, v5, v4, v0
	; GFX9-NEXT: v_trunc_f32_e32 v5, v5			; GFX9-NEXT: v_trunc_f32_e32 v5, v5
	; GFX9-NEXT: v_fma_f32 v0, -v5, v4, v0			; GFX9-NEXT: v_fma_f32 v0, -v5, v4, v0
	; GFX9-NEXT: global_store_dwordx4 v8, v[0:3], s[4:5]			; GFX9-NEXT: global_store_dwordx4 v8, v[0:3], s[4:5]
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: frem_v4f32:			; GFX10-LABEL: frem_v4f32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v8, 0			; GFX10-NEXT: v_mov_b32_e32 v8, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[6:7]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[6:7]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[2:3] offset:64			; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[2:3] offset:64
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v10, s0, v7, v7, v3			; GFX10-NEXT: v_div_scale_f32 v9, vcc_lo, v7, v7, v3
	; GFX10-NEXT: v_div_scale_f32 v9, vcc_lo, v3, v7, v3			; GFX10-NEXT: v_div_scale_f32 v11, vcc_lo, v3, v7, v3
	; GFX10-NEXT: v_rcp_f32_e32 v11, v10			; GFX10-NEXT: v_rcp_f32_e32 v10, v9
	; GFX10-NEXT: s_denorm_mode 15			; GFX10-NEXT: s_denorm_mode 15
	; GFX10-NEXT: v_fma_f32 v12, -v10, v11, 1.0			; GFX10-NEXT: v_fma_f32 v12, -v9, v10, 1.0
	; GFX10-NEXT: v_fmac_f32_e32 v11, v12, v11			; GFX10-NEXT: v_fmac_f32_e32 v10, v12, v10
	; GFX10-NEXT: v_mul_f32_e32 v12, v9, v11			; GFX10-NEXT: v_mul_f32_e32 v12, v11, v10
	; GFX10-NEXT: v_fma_f32 v13, -v10, v12, v9			; GFX10-NEXT: v_fma_f32 v13, -v9, v12, v11
	; GFX10-NEXT: v_fmac_f32_e32 v12, v13, v11			; GFX10-NEXT: v_fmac_f32_e32 v12, v13, v10
	; GFX10-NEXT: v_fma_f32 v9, -v10, v12, v9			; GFX10-NEXT: v_fma_f32 v9, -v9, v12, v11
	; GFX10-NEXT: s_denorm_mode 12			; GFX10-NEXT: s_denorm_mode 12
	; GFX10-NEXT: v_div_fmas_f32 v9, v9, v11, v12			; GFX10-NEXT: v_div_fmas_f32 v9, v9, v10, v12
	; GFX10-NEXT: v_div_fixup_f32 v9, v9, v7, v3			; GFX10-NEXT: v_div_fixup_f32 v9, v9, v7, v3
	; GFX10-NEXT: v_trunc_f32_e32 v9, v9			; GFX10-NEXT: v_trunc_f32_e32 v9, v9
	; GFX10-NEXT: v_fma_f32 v3, -v9, v7, v3			; GFX10-NEXT: v_fma_f32 v3, -v9, v7, v3
	; GFX10-NEXT: v_div_scale_f32 v9, s0, v6, v6, v2			; GFX10-NEXT: v_div_scale_f32 v7, vcc_lo, v6, v6, v2
	; GFX10-NEXT: v_div_scale_f32 v7, vcc_lo, v2, v6, v2			; GFX10-NEXT: v_div_scale_f32 v10, vcc_lo, v2, v6, v2
	; GFX10-NEXT: v_rcp_f32_e32 v10, v9			; GFX10-NEXT: v_rcp_f32_e32 v9, v7
	; GFX10-NEXT: s_denorm_mode 15			; GFX10-NEXT: s_denorm_mode 15
	; GFX10-NEXT: v_fma_f32 v11, -v9, v10, 1.0			; GFX10-NEXT: v_fma_f32 v11, -v7, v9, 1.0
	; GFX10-NEXT: v_fmac_f32_e32 v10, v11, v10			; GFX10-NEXT: v_fmac_f32_e32 v9, v11, v9
	; GFX10-NEXT: v_mul_f32_e32 v11, v7, v10			; GFX10-NEXT: v_mul_f32_e32 v11, v10, v9
	; GFX10-NEXT: v_fma_f32 v12, -v9, v11, v7			; GFX10-NEXT: v_fma_f32 v12, -v7, v11, v10
	; GFX10-NEXT: v_fmac_f32_e32 v11, v12, v10			; GFX10-NEXT: v_fmac_f32_e32 v11, v12, v9
	; GFX10-NEXT: v_fma_f32 v7, -v9, v11, v7			; GFX10-NEXT: v_fma_f32 v7, -v7, v11, v10
	; GFX10-NEXT: s_denorm_mode 12			; GFX10-NEXT: s_denorm_mode 12
	; GFX10-NEXT: v_div_fmas_f32 v7, v7, v10, v11			; GFX10-NEXT: v_div_fmas_f32 v7, v7, v9, v11
	; GFX10-NEXT: v_div_fixup_f32 v7, v7, v6, v2			; GFX10-NEXT: v_div_fixup_f32 v7, v7, v6, v2
	; GFX10-NEXT: v_trunc_f32_e32 v7, v7			; GFX10-NEXT: v_trunc_f32_e32 v7, v7
	; GFX10-NEXT: v_fma_f32 v2, -v7, v6, v2			; GFX10-NEXT: v_fma_f32 v2, -v7, v6, v2
	; GFX10-NEXT: v_div_scale_f32 v7, s0, v5, v5, v1			; GFX10-NEXT: v_div_scale_f32 v6, vcc_lo, v5, v5, v1
	; GFX10-NEXT: v_div_scale_f32 v6, vcc_lo, v1, v5, v1			; GFX10-NEXT: v_div_scale_f32 v9, vcc_lo, v1, v5, v1
	; GFX10-NEXT: v_rcp_f32_e32 v9, v7			; GFX10-NEXT: v_rcp_f32_e32 v7, v6
	; GFX10-NEXT: s_denorm_mode 15			; GFX10-NEXT: s_denorm_mode 15
	; GFX10-NEXT: v_fma_f32 v10, -v7, v9, 1.0			; GFX10-NEXT: v_fma_f32 v10, -v6, v7, 1.0
	; GFX10-NEXT: v_fmac_f32_e32 v9, v10, v9			; GFX10-NEXT: v_fmac_f32_e32 v7, v10, v7
	; GFX10-NEXT: v_mul_f32_e32 v10, v6, v9			; GFX10-NEXT: v_mul_f32_e32 v10, v9, v7
	; GFX10-NEXT: v_fma_f32 v11, -v7, v10, v6			; GFX10-NEXT: v_fma_f32 v11, -v6, v10, v9
	; GFX10-NEXT: v_fmac_f32_e32 v10, v11, v9			; GFX10-NEXT: v_fmac_f32_e32 v10, v11, v7
	; GFX10-NEXT: v_fma_f32 v6, -v7, v10, v6			; GFX10-NEXT: v_fma_f32 v6, -v6, v10, v9
	; GFX10-NEXT: s_denorm_mode 12			; GFX10-NEXT: s_denorm_mode 12
	; GFX10-NEXT: v_div_fmas_f32 v6, v6, v9, v10			; GFX10-NEXT: v_div_fmas_f32 v6, v6, v7, v10
	; GFX10-NEXT: v_div_fixup_f32 v6, v6, v5, v1			; GFX10-NEXT: v_div_fixup_f32 v6, v6, v5, v1
	; GFX10-NEXT: v_trunc_f32_e32 v6, v6			; GFX10-NEXT: v_trunc_f32_e32 v6, v6
	; GFX10-NEXT: v_fma_f32 v1, -v6, v5, v1			; GFX10-NEXT: v_fma_f32 v1, -v6, v5, v1
	; GFX10-NEXT: v_div_scale_f32 v6, s0, v4, v4, v0			; GFX10-NEXT: v_div_scale_f32 v5, vcc_lo, v4, v4, v0
	; GFX10-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v4, v0			; GFX10-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v4, v0
	; GFX10-NEXT: v_rcp_f32_e32 v7, v6			; GFX10-NEXT: v_rcp_f32_e32 v6, v5
	; GFX10-NEXT: s_denorm_mode 15			; GFX10-NEXT: s_denorm_mode 15
	; GFX10-NEXT: v_fma_f32 v9, -v6, v7, 1.0			; GFX10-NEXT: v_fma_f32 v9, -v5, v6, 1.0
	; GFX10-NEXT: v_fmac_f32_e32 v7, v9, v7			; GFX10-NEXT: v_fmac_f32_e32 v6, v9, v6
	; GFX10-NEXT: v_mul_f32_e32 v9, v5, v7			; GFX10-NEXT: v_mul_f32_e32 v9, v7, v6
	; GFX10-NEXT: v_fma_f32 v10, -v6, v9, v5			; GFX10-NEXT: v_fma_f32 v10, -v5, v9, v7
	; GFX10-NEXT: v_fmac_f32_e32 v9, v10, v7			; GFX10-NEXT: v_fmac_f32_e32 v9, v10, v6
	; GFX10-NEXT: v_fma_f32 v5, -v6, v9, v5			; GFX10-NEXT: v_fma_f32 v5, -v5, v9, v7
	; GFX10-NEXT: s_denorm_mode 12			; GFX10-NEXT: s_denorm_mode 12
	; GFX10-NEXT: v_div_fmas_f32 v5, v5, v7, v9			; GFX10-NEXT: v_div_fmas_f32 v5, v5, v6, v9
	; GFX10-NEXT: v_div_fixup_f32 v5, v5, v4, v0			; GFX10-NEXT: v_div_fixup_f32 v5, v5, v4, v0
	; GFX10-NEXT: v_trunc_f32_e32 v5, v5			; GFX10-NEXT: v_trunc_f32_e32 v5, v5
	; GFX10-NEXT: v_fma_f32 v0, -v5, v4, v0			; GFX10-NEXT: v_fma_f32 v0, -v5, v4, v0
	; GFX10-NEXT: global_store_dwordx4 v8, v[0:3], s[4:5]			; GFX10-NEXT: global_store_dwordx4 v8, v[0:3], s[4:5]
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: frem_v4f32:			; GFX11-LABEL: frem_v4f32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
	; GFX11-NEXT: v_mov_b32_e32 v8, 0			; GFX11-NEXT: v_mov_b32_e32 v8, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: global_load_b128 v[0:3], v8, s[6:7]			; GFX11-NEXT: global_load_b128 v[0:3], v8, s[6:7]
	; GFX11-NEXT: global_load_b128 v[4:7], v8, s[0:1] offset:64			; GFX11-NEXT: global_load_b128 v[4:7], v8, s[0:1] offset:64
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f32 v10, null, v7, v7, v3			; GFX11-NEXT: v_div_scale_f32 v9, vcc_lo, v7, v7, v3
	; GFX11-NEXT: v_div_scale_f32 v9, vcc_lo, v3, v7, v3			; GFX11-NEXT: v_div_scale_f32 v11, vcc_lo, v3, v7, v3
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f32_e32 v11, v10			; GFX11-NEXT: v_rcp_f32_e32 v10, v9
	; GFX11-NEXT: s_denorm_mode 15			; GFX11-NEXT: s_denorm_mode 15
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f32 v12, -v10, v11, 1.0			; GFX11-NEXT: v_fma_f32 v12, -v9, v10, 1.0
	; GFX11-NEXT: v_fmac_f32_e32 v11, v12, v11			; GFX11-NEXT: v_fmac_f32_e32 v10, v12, v10
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f32_e32 v12, v9, v11			; GFX11-NEXT: v_mul_f32_e32 v12, v11, v10
	; GFX11-NEXT: v_fma_f32 v13, -v10, v12, v9			; GFX11-NEXT: v_fma_f32 v13, -v9, v12, v11
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fmac_f32_e32 v12, v13, v11			; GFX11-NEXT: v_fmac_f32_e32 v12, v13, v10
	; GFX11-NEXT: v_fma_f32 v9, -v10, v12, v9			; GFX11-NEXT: v_fma_f32 v9, -v9, v12, v11
	; GFX11-NEXT: s_denorm_mode 12			; GFX11-NEXT: s_denorm_mode 12
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f32 v9, v9, v11, v12			; GFX11-NEXT: v_div_fmas_f32 v9, v9, v10, v12
	; GFX11-NEXT: v_div_fixup_f32 v9, v9, v7, v3			; GFX11-NEXT: v_div_fixup_f32 v9, v9, v7, v3
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f32_e32 v9, v9			; GFX11-NEXT: v_trunc_f32_e32 v9, v9
	; GFX11-NEXT: v_fma_f32 v3, -v9, v7, v3			; GFX11-NEXT: v_fma_f32 v3, -v9, v7, v3
	; GFX11-NEXT: v_div_scale_f32 v9, null, v6, v6, v2			; GFX11-NEXT: v_div_scale_f32 v7, vcc_lo, v6, v6, v2
	; GFX11-NEXT: v_div_scale_f32 v7, vcc_lo, v2, v6, v2			; GFX11-NEXT: v_div_scale_f32 v10, vcc_lo, v2, v6, v2
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f32_e32 v10, v9			; GFX11-NEXT: v_rcp_f32_e32 v9, v7
	; GFX11-NEXT: s_denorm_mode 15			; GFX11-NEXT: s_denorm_mode 15
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f32 v11, -v9, v10, 1.0			; GFX11-NEXT: v_fma_f32 v11, -v7, v9, 1.0
	; GFX11-NEXT: v_fmac_f32_e32 v10, v11, v10			; GFX11-NEXT: v_fmac_f32_e32 v9, v11, v9
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f32_e32 v11, v7, v10			; GFX11-NEXT: v_mul_f32_e32 v11, v10, v9
	; GFX11-NEXT: v_fma_f32 v12, -v9, v11, v7			; GFX11-NEXT: v_fma_f32 v12, -v7, v11, v10
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fmac_f32_e32 v11, v12, v10			; GFX11-NEXT: v_fmac_f32_e32 v11, v12, v9
	; GFX11-NEXT: v_fma_f32 v7, -v9, v11, v7			; GFX11-NEXT: v_fma_f32 v7, -v7, v11, v10
	; GFX11-NEXT: s_denorm_mode 12			; GFX11-NEXT: s_denorm_mode 12
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f32 v7, v7, v10, v11			; GFX11-NEXT: v_div_fmas_f32 v7, v7, v9, v11
	; GFX11-NEXT: v_div_fixup_f32 v7, v7, v6, v2			; GFX11-NEXT: v_div_fixup_f32 v7, v7, v6, v2
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f32_e32 v7, v7			; GFX11-NEXT: v_trunc_f32_e32 v7, v7
	; GFX11-NEXT: v_fma_f32 v2, -v7, v6, v2			; GFX11-NEXT: v_fma_f32 v2, -v7, v6, v2
	; GFX11-NEXT: v_div_scale_f32 v7, null, v5, v5, v1			; GFX11-NEXT: v_div_scale_f32 v6, vcc_lo, v5, v5, v1
	; GFX11-NEXT: v_div_scale_f32 v6, vcc_lo, v1, v5, v1			; GFX11-NEXT: v_div_scale_f32 v9, vcc_lo, v1, v5, v1
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f32_e32 v9, v7			; GFX11-NEXT: v_rcp_f32_e32 v7, v6
	; GFX11-NEXT: s_denorm_mode 15			; GFX11-NEXT: s_denorm_mode 15
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f32 v10, -v7, v9, 1.0			; GFX11-NEXT: v_fma_f32 v10, -v6, v7, 1.0
	; GFX11-NEXT: v_fmac_f32_e32 v9, v10, v9			; GFX11-NEXT: v_fmac_f32_e32 v7, v10, v7
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f32_e32 v10, v6, v9			; GFX11-NEXT: v_mul_f32_e32 v10, v9, v7
	; GFX11-NEXT: v_fma_f32 v11, -v7, v10, v6			; GFX11-NEXT: v_fma_f32 v11, -v6, v10, v9
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fmac_f32_e32 v10, v11, v9			; GFX11-NEXT: v_fmac_f32_e32 v10, v11, v7
	; GFX11-NEXT: v_fma_f32 v6, -v7, v10, v6			; GFX11-NEXT: v_fma_f32 v6, -v6, v10, v9
	; GFX11-NEXT: s_denorm_mode 12			; GFX11-NEXT: s_denorm_mode 12
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f32 v6, v6, v9, v10			; GFX11-NEXT: v_div_fmas_f32 v6, v6, v7, v10
	; GFX11-NEXT: v_div_fixup_f32 v6, v6, v5, v1			; GFX11-NEXT: v_div_fixup_f32 v6, v6, v5, v1
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f32_e32 v6, v6			; GFX11-NEXT: v_trunc_f32_e32 v6, v6
	; GFX11-NEXT: v_fma_f32 v1, -v6, v5, v1			; GFX11-NEXT: v_fma_f32 v1, -v6, v5, v1
	; GFX11-NEXT: v_div_scale_f32 v6, null, v4, v4, v0			; GFX11-NEXT: v_div_scale_f32 v5, vcc_lo, v4, v4, v0
	; GFX11-NEXT: v_div_scale_f32 v5, vcc_lo, v0, v4, v0			; GFX11-NEXT: v_div_scale_f32 v7, vcc_lo, v0, v4, v0
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f32_e32 v7, v6			; GFX11-NEXT: v_rcp_f32_e32 v6, v5
	; GFX11-NEXT: s_denorm_mode 15			; GFX11-NEXT: s_denorm_mode 15
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f32 v9, -v6, v7, 1.0			; GFX11-NEXT: v_fma_f32 v9, -v5, v6, 1.0
	; GFX11-NEXT: v_fmac_f32_e32 v7, v9, v7			; GFX11-NEXT: v_fmac_f32_e32 v6, v9, v6
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f32_e32 v9, v5, v7			; GFX11-NEXT: v_mul_f32_e32 v9, v7, v6
	; GFX11-NEXT: v_fma_f32 v10, -v6, v9, v5			; GFX11-NEXT: v_fma_f32 v10, -v5, v9, v7
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fmac_f32_e32 v9, v10, v7			; GFX11-NEXT: v_fmac_f32_e32 v9, v10, v6
	; GFX11-NEXT: v_fma_f32 v5, -v6, v9, v5			; GFX11-NEXT: v_fma_f32 v5, -v5, v9, v7
	; GFX11-NEXT: s_denorm_mode 12			; GFX11-NEXT: s_denorm_mode 12
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f32 v5, v5, v7, v9			; GFX11-NEXT: v_div_fmas_f32 v5, v5, v6, v9
	; GFX11-NEXT: v_div_fixup_f32 v5, v5, v4, v0			; GFX11-NEXT: v_div_fixup_f32 v5, v5, v4, v0
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f32_e32 v5, v5			; GFX11-NEXT: v_trunc_f32_e32 v5, v5
	; GFX11-NEXT: v_fma_f32 v0, -v5, v4, v0			; GFX11-NEXT: v_fma_f32 v0, -v5, v4, v0
	; GFX11-NEXT: global_store_b128 v8, v[0:3], s[4:5]			; GFX11-NEXT: global_store_b128 v8, v[0:3], s[4:5]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	<4 x float> addrspace(1)* %in2) #0 {			<4 x float> addrspace(1)* %in2) #0 {
	; SI-NEXT: s_mov_b32 s9, s11			; SI-NEXT: s_mov_b32 s9, s11
	; SI-NEXT: s_mov_b32 s10, s6			; SI-NEXT: s_mov_b32 s10, s6
	; SI-NEXT: s_mov_b32 s11, s7			; SI-NEXT: s_mov_b32 s11, s7
	; SI-NEXT: s_mov_b32 s2, s6			; SI-NEXT: s_mov_b32 s2, s6
	; SI-NEXT: s_mov_b32 s3, s7			; SI-NEXT: s_mov_b32 s3, s7
	; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[8:11], 0			; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[8:11], 0
	; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:64			; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:64
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_div_scale_f64 v[8:9], s[0:1], v[6:7], v[6:7], v[2:3]			; SI-NEXT: v_div_scale_f64 v[8:9], vcc, v[6:7], v[6:7], v[2:3]
	; SI-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]			; SI-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
	; SI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; SI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; SI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; SI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; SI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; SI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; SI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; SI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; SI-NEXT: v_div_scale_f64 v[12:13], s[0:1], v[2:3], v[6:7], v[2:3]			; SI-NEXT: v_div_scale_f64 v[12:13], vcc, v[2:3], v[6:7], v[2:3]
	; SI-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]			; SI-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
	; SI-NEXT: v_fma_f64 v[16:17], -v[8:9], v[14:15], v[12:13]			; SI-NEXT: v_fma_f64 v[16:17], -v[8:9], v[14:15], v[12:13]
	; SI-NEXT: v_cmp_eq_u32_e32 vcc, v7, v9			; SI-NEXT: v_cmp_eq_u32_e32 vcc, v7, v9
	; SI-NEXT: v_cmp_eq_u32_e64 s[0:1], v3, v13			; SI-NEXT: v_cmp_eq_u32_e64 s[0:1], v3, v13
	; SI-NEXT: s_xor_b64 vcc, s[0:1], vcc			; SI-NEXT: s_xor_b64 vcc, s[0:1], vcc
	; SI-NEXT: s_nop 1			; SI-NEXT: s_nop 1
	; SI-NEXT: v_div_fmas_f64 v[8:9], v[16:17], v[10:11], v[14:15]			; SI-NEXT: v_div_fmas_f64 v[8:9], v[16:17], v[10:11], v[14:15]
	; SI-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]			; SI-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]
	; SI-NEXT: v_bfe_u32 v10, v9, 20, 11			; SI-NEXT: v_bfe_u32 v10, v9, 20, 11
	; SI-NEXT: v_add_i32_e32 v12, vcc, 0xfffffc01, v10			; SI-NEXT: v_add_i32_e32 v12, vcc, 0xfffffc01, v10
	; SI-NEXT: s_mov_b32 s3, 0xfffff			; SI-NEXT: s_mov_b32 s3, 0xfffff
	; SI-NEXT: v_lshr_b64 v[10:11], s[2:3], v12			; SI-NEXT: v_lshr_b64 v[10:11], s[2:3], v12
	; SI-NEXT: v_not_b32_e32 v10, v10			; SI-NEXT: v_not_b32_e32 v10, v10
	; SI-NEXT: v_and_b32_e32 v10, v8, v10			; SI-NEXT: v_and_b32_e32 v10, v8, v10
	; SI-NEXT: v_not_b32_e32 v11, v11			; SI-NEXT: v_not_b32_e32 v11, v11
	; SI-NEXT: v_and_b32_e32 v11, v9, v11			; SI-NEXT: v_and_b32_e32 v11, v9, v11
	; SI-NEXT: v_and_b32_e32 v13, 0x80000000, v9			; SI-NEXT: v_and_b32_e32 v13, 0x80000000, v9
	; SI-NEXT: v_cmp_gt_i32_e32 vcc, 0, v12			; SI-NEXT: v_cmp_gt_i32_e32 vcc, 0, v12
	; SI-NEXT: v_cndmask_b32_e32 v11, v11, v13, vcc			; SI-NEXT: v_cndmask_b32_e32 v11, v11, v13, vcc
	; SI-NEXT: v_cmp_lt_i32_e64 s[0:1], 51, v12			; SI-NEXT: v_cmp_lt_i32_e64 s[0:1], 51, v12
	; SI-NEXT: v_cndmask_b32_e64 v9, v11, v9, s[0:1]			; SI-NEXT: v_cndmask_b32_e64 v9, v11, v9, s[0:1]
	; SI-NEXT: v_cndmask_b32_e64 v10, v10, 0, vcc			; SI-NEXT: v_cndmask_b32_e64 v10, v10, 0, vcc
	; SI-NEXT: v_cndmask_b32_e64 v8, v10, v8, s[0:1]			; SI-NEXT: v_cndmask_b32_e64 v8, v10, v8, s[0:1]
	; SI-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]			; SI-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]
	; SI-NEXT: v_div_scale_f64 v[6:7], s[0:1], v[4:5], v[4:5], v[0:1]			; SI-NEXT: v_div_scale_f64 v[6:7], vcc, v[4:5], v[4:5], v[0:1]
	; SI-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]			; SI-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
	; SI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; SI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; SI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; SI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; SI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; SI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; SI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; SI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; SI-NEXT: v_div_scale_f64 v[10:11], s[0:1], v[0:1], v[4:5], v[0:1]			; SI-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[4:5], v[0:1]
	; SI-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]			; SI-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
	; SI-NEXT: v_fma_f64 v[14:15], -v[6:7], v[12:13], v[10:11]			; SI-NEXT: v_fma_f64 v[14:15], -v[6:7], v[12:13], v[10:11]
	; SI-NEXT: v_cmp_eq_u32_e32 vcc, v5, v7			; SI-NEXT: v_cmp_eq_u32_e32 vcc, v5, v7
	; SI-NEXT: v_cmp_eq_u32_e64 s[0:1], v1, v11			; SI-NEXT: v_cmp_eq_u32_e64 s[0:1], v1, v11
	; SI-NEXT: s_xor_b64 vcc, s[0:1], vcc			; SI-NEXT: s_xor_b64 vcc, s[0:1], vcc
	; SI-NEXT: s_nop 1			; SI-NEXT: s_nop 1
	; SI-NEXT: v_div_fmas_f64 v[6:7], v[14:15], v[8:9], v[12:13]			; SI-NEXT: v_div_fmas_f64 v[6:7], v[14:15], v[8:9], v[12:13]
	; SI-NEXT: v_div_fixup_f64 v[6:7], v[6:7], v[4:5], v[0:1]			; SI-NEXT: v_div_fixup_f64 v[6:7], v[6:7], v[4:5], v[0:1]
	(28 lines not shown)
	; CI-NEXT: s_mov_b32 s4, s6			; CI-NEXT: s_mov_b32 s4, s6
	; CI-NEXT: s_mov_b32 s5, s7			; CI-NEXT: s_mov_b32 s5, s7
	; CI-NEXT: s_mov_b32 s6, s2			; CI-NEXT: s_mov_b32 s6, s2
	; CI-NEXT: s_mov_b32 s7, s3			; CI-NEXT: s_mov_b32 s7, s3
	; CI-NEXT: s_mov_b32 s11, s3			; CI-NEXT: s_mov_b32 s11, s3
	; CI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0			; CI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
	; CI-NEXT: buffer_load_dwordx4 v[4:7], off, s[8:11], 0 offset:64			; CI-NEXT: buffer_load_dwordx4 v[4:7], off, s[8:11], 0 offset:64
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: v_div_scale_f64 v[8:9], s[4:5], v[6:7], v[6:7], v[2:3]			; CI-NEXT: v_div_scale_f64 v[8:9], vcc, v[6:7], v[6:7], v[2:3]
	; CI-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]			; CI-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
	; CI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; CI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; CI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; CI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; CI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; CI-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; CI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; CI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; CI-NEXT: v_div_scale_f64 v[12:13], vcc, v[2:3], v[6:7], v[2:3]			; CI-NEXT: v_div_scale_f64 v[12:13], vcc, v[2:3], v[6:7], v[2:3]
	; CI-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]			; CI-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
	; CI-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]			; CI-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
	; CI-NEXT: s_nop 1			; CI-NEXT: s_nop 1
	; CI-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]			; CI-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
	; CI-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]			; CI-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]
	; CI-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]			; CI-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]
	; CI-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]			; CI-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]
	; CI-NEXT: v_div_scale_f64 v[6:7], s[4:5], v[4:5], v[4:5], v[0:1]			; CI-NEXT: v_div_scale_f64 v[6:7], vcc, v[4:5], v[4:5], v[0:1]
	; CI-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]			; CI-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
	; CI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; CI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; CI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; CI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; CI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; CI-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; CI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; CI-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; CI-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[4:5], v[0:1]			; CI-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[4:5], v[0:1]
	; CI-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]			; CI-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
	; CI-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]			; CI-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
	(16 lines not shown)
	; VI-NEXT: v_mov_b32_e32 v5, s1			; VI-NEXT: v_mov_b32_e32 v5, s1
	; VI-NEXT: v_mov_b32_e32 v1, s7			; VI-NEXT: v_mov_b32_e32 v1, s7
	; VI-NEXT: v_mov_b32_e32 v4, s0			; VI-NEXT: v_mov_b32_e32 v4, s0
	; VI-NEXT: flat_load_dwordx4 v[0:3], v[0:1]			; VI-NEXT: flat_load_dwordx4 v[0:3], v[0:1]
	; VI-NEXT: flat_load_dwordx4 v[4:7], v[4:5]			; VI-NEXT: flat_load_dwordx4 v[4:7], v[4:5]
	; VI-NEXT: v_mov_b32_e32 v8, s4			; VI-NEXT: v_mov_b32_e32 v8, s4
	; VI-NEXT: v_mov_b32_e32 v9, s5			; VI-NEXT: v_mov_b32_e32 v9, s5
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_div_scale_f64 v[10:11], s[0:1], v[6:7], v[6:7], v[2:3]			; VI-NEXT: v_div_scale_f64 v[10:11], vcc, v[6:7], v[6:7], v[2:3]
	; VI-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]			; VI-NEXT: v_rcp_f64_e32 v[12:13], v[10:11]
	; VI-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0			; VI-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
	; VI-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]			; VI-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
	; VI-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0			; VI-NEXT: v_fma_f64 v[14:15], -v[10:11], v[12:13], 1.0
	; VI-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]			; VI-NEXT: v_fma_f64 v[12:13], v[12:13], v[14:15], v[12:13]
	; VI-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]			; VI-NEXT: v_div_scale_f64 v[14:15], vcc, v[2:3], v[6:7], v[2:3]
	; VI-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]			; VI-NEXT: v_mul_f64 v[16:17], v[14:15], v[12:13]
	; VI-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]			; VI-NEXT: v_fma_f64 v[10:11], -v[10:11], v[16:17], v[14:15]
	; VI-NEXT: s_nop 1			; VI-NEXT: s_nop 1
	; VI-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]			; VI-NEXT: v_div_fmas_f64 v[10:11], v[10:11], v[12:13], v[16:17]
	; VI-NEXT: v_div_fixup_f64 v[10:11], v[10:11], v[6:7], v[2:3]			; VI-NEXT: v_div_fixup_f64 v[10:11], v[10:11], v[6:7], v[2:3]
	; VI-NEXT: v_trunc_f64_e32 v[10:11], v[10:11]			; VI-NEXT: v_trunc_f64_e32 v[10:11], v[10:11]
	; VI-NEXT: v_fma_f64 v[2:3], -v[10:11], v[6:7], v[2:3]			; VI-NEXT: v_fma_f64 v[2:3], -v[10:11], v[6:7], v[2:3]
	; VI-NEXT: v_div_scale_f64 v[6:7], s[0:1], v[4:5], v[4:5], v[0:1]			; VI-NEXT: v_div_scale_f64 v[6:7], vcc, v[4:5], v[4:5], v[0:1]
	; VI-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]			; VI-NEXT: v_rcp_f64_e32 v[10:11], v[6:7]
	; VI-NEXT: v_fma_f64 v[12:13], -v[6:7], v[10:11], 1.0			; VI-NEXT: v_fma_f64 v[12:13], -v[6:7], v[10:11], 1.0
	; VI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; VI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; VI-NEXT: v_fma_f64 v[12:13], -v[6:7], v[10:11], 1.0			; VI-NEXT: v_fma_f64 v[12:13], -v[6:7], v[10:11], 1.0
	; VI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; VI-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; VI-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]			; VI-NEXT: v_div_scale_f64 v[12:13], vcc, v[0:1], v[4:5], v[0:1]
	; VI-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]			; VI-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
	; VI-NEXT: v_fma_f64 v[6:7], -v[6:7], v[14:15], v[12:13]			; VI-NEXT: v_fma_f64 v[6:7], -v[6:7], v[14:15], v[12:13]
	(9 lines not shown)
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX9-NEXT: v_mov_b32_e32 v16, 0			; GFX9-NEXT: v_mov_b32_e32 v16, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[6:7]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[6:7]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[2:3] offset:64			; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[2:3] offset:64
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_div_scale_f64 v[8:9], s[0:1], v[6:7], v[6:7], v[2:3]			; GFX9-NEXT: v_div_scale_f64 v[8:9], vcc, v[6:7], v[6:7], v[2:3]
	; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]			; GFX9-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
	; GFX9-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; GFX9-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; GFX9-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; GFX9-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; GFX9-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, v[2:3], v[6:7], v[2:3]			; GFX9-NEXT: v_div_scale_f64 v[12:13], vcc, v[2:3], v[6:7], v[2:3]
	; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]			; GFX9-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
	; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]			; GFX9-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]			; GFX9-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
	; GFX9-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]			; GFX9-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]
	; GFX9-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]			; GFX9-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]
	; GFX9-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]			; GFX9-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]
	; GFX9-NEXT: v_div_scale_f64 v[6:7], s[0:1], v[4:5], v[4:5], v[0:1]			; GFX9-NEXT: v_div_scale_f64 v[6:7], vcc, v[4:5], v[4:5], v[0:1]
	; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]			; GFX9-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
	; GFX9-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; GFX9-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; GFX9-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; GFX9-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; GFX9-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[4:5], v[0:1]			; GFX9-NEXT: v_div_scale_f64 v[10:11], vcc, v[0:1], v[4:5], v[0:1]
	; GFX9-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]			; GFX9-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
	; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]			; GFX9-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
	(11 lines not shown)
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX10-NEXT: v_mov_b32_e32 v16, 0			; GFX10-NEXT: v_mov_b32_e32 v16, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[6:7]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[6:7]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[2:3] offset:64			; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[2:3] offset:64
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[8:9], s0, v[6:7], v[6:7], v[2:3]			; GFX10-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[6:7], v[6:7], v[2:3]
	; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]			; GFX10-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
	; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; GFX10-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; GFX10-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; GFX10-NEXT: v_div_scale_f64 v[12:13], vcc_lo, v[2:3], v[6:7], v[2:3]			; GFX10-NEXT: v_div_scale_f64 v[12:13], vcc_lo, v[2:3], v[6:7], v[2:3]
	; GFX10-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]			; GFX10-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
	; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]			; GFX10-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
	; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]			; GFX10-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
	; GFX10-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]			; GFX10-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]
	; GFX10-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]			; GFX10-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]
	; GFX10-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]			; GFX10-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]
	; GFX10-NEXT: v_div_scale_f64 v[6:7], s0, v[4:5], v[4:5], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[4:5], v[4:5], v[0:1]
	; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]			; GFX10-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
	; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; GFX10-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; GFX10-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[4:5], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[10:11], vcc_lo, v[0:1], v[4:5], v[0:1]
	; GFX10-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]			; GFX10-NEXT: v_mul_f64 v[12:13], v[10:11], v[8:9]
	; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]			; GFX10-NEXT: v_fma_f64 v[6:7], -v[6:7], v[12:13], v[10:11]
	(10 lines not shown)
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
	; GFX11-NEXT: v_mov_b32_e32 v16, 0			; GFX11-NEXT: v_mov_b32_e32 v16, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: global_load_b128 v[0:3], v16, s[6:7]			; GFX11-NEXT: global_load_b128 v[0:3], v16, s[6:7]
	; GFX11-NEXT: global_load_b128 v[4:7], v16, s[0:1] offset:64			; GFX11-NEXT: global_load_b128 v[4:7], v16, s[0:1] offset:64
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_div_scale_f64 v[8:9], null, v[6:7], v[6:7], v[2:3]			; GFX11-NEXT: v_div_scale_f64 v[8:9], vcc_lo, v[6:7], v[6:7], v[2:3]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]			; GFX11-NEXT: v_rcp_f64_e32 v[10:11], v[8:9]
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0			; GFX11-NEXT: v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
	; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]			; GFX11-NEXT: v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
	; GFX11-NEXT: v_div_scale_f64 v[12:13], vcc_lo, v[2:3], v[6:7], v[2:3]			; GFX11-NEXT: v_div_scale_f64 v[12:13], vcc_lo, v[2:3], v[6:7], v[2:3]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]			; GFX11-NEXT: v_mul_f64 v[14:15], v[12:13], v[10:11]
	; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]			; GFX11-NEXT: v_fma_f64 v[8:9], -v[8:9], v[14:15], v[12:13]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]			; GFX11-NEXT: v_div_fmas_f64 v[8:9], v[8:9], v[10:11], v[14:15]
	; GFX11-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]			; GFX11-NEXT: v_div_fixup_f64 v[8:9], v[8:9], v[6:7], v[2:3]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]			; GFX11-NEXT: v_trunc_f64_e32 v[8:9], v[8:9]
	; GFX11-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]			; GFX11-NEXT: v_fma_f64 v[2:3], -v[8:9], v[6:7], v[2:3]
	; GFX11-NEXT: v_div_scale_f64 v[6:7], null, v[4:5], v[4:5], v[0:1]			; GFX11-NEXT: v_div_scale_f64 v[6:7], vcc_lo, v[4:5], v[4:5], v[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]			; GFX11-NEXT: v_rcp_f64_e32 v[8:9], v[6:7]
	; GFX11-NEXT: s_waitcnt_depctr 0xfff			; GFX11-NEXT: s_waitcnt_depctr 0xfff
	; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0			; GFX11-NEXT: v_fma_f64 v[10:11], -v[6:7], v[8:9], 1.0
	; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]			; GFX11-NEXT: v_fma_f64 v[8:9], v[8:9], v[10:11], v[8:9]
	(24 lines not shown)

llvm/test/CodeGen/AMDGPU/inserted-wait-states.mir

(49 lines not shown)
bb.1:
S_BRANCH %bb.2		S_BRANCH %bb.2

bb.2:		bb.2:
$vcc = V_CMP_EQ_I32_e64 $vgpr1, $vgpr2, implicit $exec		$vcc = V_CMP_EQ_I32_e64 $vgpr1, $vgpr2, implicit $exec
$vgpr0 = V_DIV_FMAS_F32_e64 0, $vgpr1, 0, $vgpr2, 0, $vgpr3, 0, 0, implicit $mode, implicit $vcc, implicit $exec		$vgpr0 = V_DIV_FMAS_F32_e64 0, $vgpr1, 0, $vgpr2, 0, $vgpr3, 0, 0, implicit $mode, implicit $vcc, implicit $exec
S_BRANCH %bb.3		S_BRANCH %bb.3

bb.3:		bb.3:
$vgpr4, $vcc = V_DIV_SCALE_F32_e64 0, $vgpr1, 0, $vgpr1, 0, $vgpr3, 0, 0, implicit $mode, implicit $exec		$vgpr4 = V_DIV_SCALE_F32_e64 0, $vgpr1, 0, $vgpr1, 0, $vgpr3, 0, 0, implicit $mode, implicit-def $vcc, implicit $exec
$vgpr0 = V_DIV_FMAS_F32_e64 0, $vgpr1, 0, $vgpr2, 0, $vgpr3, 0, 0, implicit $mode, implicit $vcc, implicit $exec		$vgpr0 = V_DIV_FMAS_F32_e64 0, $vgpr1, 0, $vgpr2, 0, $vgpr3, 0, 0, implicit $mode, implicit $vcc, implicit $exec
S_ENDPGM 0		S_ENDPGM 0

...		...

...		...
---		---
# GCN-LABEL: name: s_getreg		# GCN-LABEL: name: s_getreg
(363 lines not shown)

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.div.scale.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI %s

	declare i32 @llvm.amdgcn.workitem.id.x() #1			declare i32 @llvm.amdgcn.workitem.id.x() #1
	declare { float, i1 } @llvm.amdgcn.div.scale.f32(float, float, i1) #1			declare { float, i1 } @llvm.amdgcn.div.scale.f32(float, float, i1) #1
	declare { double, i1 } @llvm.amdgcn.div.scale.f64(double, double, i1) #1			declare { double, i1 } @llvm.amdgcn.div.scale.f64(double, double, i1) #1
	declare float @llvm.fabs.f32(float) #1			declare float @llvm.fabs.f32(float) #1

	; SI-LABEL: {{^}}test_div_scale_f32_1:			; SI-LABEL: {{^}}test_div_scale_f32_1:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[B]], [[B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_1(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_1(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	%a = load volatile float, float addrspace(1)* %gep.0, align 4			%a = load volatile float, float addrspace(1)* %gep.0, align 4
	%b = load volatile float, float addrspace(1)* %gep.1, align 4			%b = load volatile float, float addrspace(1)* %gep.1, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_2:			; SI-LABEL: {{^}}test_div_scale_f32_2:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[A]], [[B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_2(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_2(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	%a = load volatile float, float addrspace(1)* %gep.0, align 4			%a = load volatile float, float addrspace(1)* %gep.0, align 4
	%b = load volatile float, float addrspace(1)* %gep.1, align 4			%b = load volatile float, float addrspace(1)* %gep.1, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_1:			; SI-LABEL: {{^}}test_div_scale_f64_1:
	; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:8			; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:8
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[A]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[B]], [[B]], [[A]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_1(double addrspace(1)* %out, double addrspace(1)* %aptr, double addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f64_1(double addrspace(1)* %out, double addrspace(1)* %aptr, double addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1

	%a = load volatile double, double addrspace(1)* %gep.0, align 8			%a = load volatile double, double addrspace(1)* %gep.0, align 8
	%b = load volatile double, double addrspace(1)* %gep.1, align 8			%b = load volatile double, double addrspace(1)* %gep.1, align 8

	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_2:			; SI-LABEL: {{^}}test_div_scale_f64_2:
	; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:8			; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:8
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[B]], [[A]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[A]], [[B]], [[A]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_2(double addrspace(1)* %out, double addrspace(1)* %aptr, double addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f64_2(double addrspace(1)* %out, double addrspace(1)* %aptr, double addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1

	%a = load volatile double, double addrspace(1)* %gep.0, align 8			%a = load volatile double, double addrspace(1)* %gep.0, align 8
	%b = load volatile double, double addrspace(1)* %gep.1, align 8			%b = load volatile double, double addrspace(1)* %gep.1, align 8

	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_scalar_num_1:			; SI-LABEL: {{^}}test_div_scale_f32_scalar_num_1:
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]]			; SI-DAG: buffer_load_dword [[B:v[0-9]+]]
	; SI-DAG: s_load_dword [[A:s[0-9]+]]			; SI-DAG: s_load_dword [[A:s[0-9]+]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[B]], [[B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_scalar_num_1(float addrspace(1)* %out, float addrspace(1)* %in, float %a) nounwind {			define amdgpu_kernel void @test_div_scale_f32_scalar_num_1(float addrspace(1)* %out, float addrspace(1)* %in, float %a) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%b = load float, float addrspace(1)* %gep, align 4			%b = load float, float addrspace(1)* %gep, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_scalar_num_2:			; SI-LABEL: {{^}}test_div_scale_f32_scalar_num_2:
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]]			; SI-DAG: buffer_load_dword [[B:v[0-9]+]]
	; SI-DAG: s_load_dword [[A:s[0-9]+]]			; SI-DAG: s_load_dword [[A:s[0-9]+]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[A]], [[B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_scalar_num_2(float addrspace(1)* %out, float addrspace(1)* %in, float %a) nounwind {			define amdgpu_kernel void @test_div_scale_f32_scalar_num_2(float addrspace(1)* %out, float addrspace(1)* %in, float %a) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%b = load float, float addrspace(1)* %gep, align 4			%b = load float, float addrspace(1)* %gep, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_scalar_den_1:			; SI-LABEL: {{^}}test_div_scale_f32_scalar_den_1:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]]			; SI-DAG: buffer_load_dword [[A:v[0-9]+]]
	; SI-DAG: s_load_dword [[B:s[0-9]+]]			; SI-DAG: s_load_dword [[B:s[0-9]+]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[B]], [[B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_scalar_den_1(float addrspace(1)* %out, float addrspace(1)* %in, float %b) nounwind {			define amdgpu_kernel void @test_div_scale_f32_scalar_den_1(float addrspace(1)* %out, float addrspace(1)* %in, float %b) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%a = load float, float addrspace(1)* %gep, align 4			%a = load float, float addrspace(1)* %gep, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_scalar_den_2:			; SI-LABEL: {{^}}test_div_scale_f32_scalar_den_2:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]]			; SI-DAG: buffer_load_dword [[A:v[0-9]+]]
	; SI-DAG: s_load_dword [[B:s[0-9]+]]			; SI-DAG: s_load_dword [[B:s[0-9]+]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[A]], [[B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_scalar_den_2(float addrspace(1)* %out, float addrspace(1)* %in, float %b) nounwind {			define amdgpu_kernel void @test_div_scale_f32_scalar_den_2(float addrspace(1)* %out, float addrspace(1)* %in, float %b) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid

	%a = load float, float addrspace(1)* %gep, align 4			%a = load float, float addrspace(1)* %gep, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_scalar_num_1:			; SI-LABEL: {{^}}test_div_scale_f64_scalar_num_1:
	; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]]			; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]]
	; SI-DAG: s_load_dwordx2 [[A:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd			; SI-DAG: s_load_dwordx2 [[A:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[A]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[B]], [[B]], [[A]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_scalar_num_1(double addrspace(1)* %out, double addrspace(1)* %in, double %a) nounwind {			define amdgpu_kernel void @test_div_scale_f64_scalar_num_1(double addrspace(1)* %out, double addrspace(1)* %in, double %a) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%b = load double, double addrspace(1)* %gep, align 8			%b = load double, double addrspace(1)* %gep, align 8

	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_scalar_num_2:			; SI-LABEL: {{^}}test_div_scale_f64_scalar_num_2:
	; SI-DAG: s_load_dwordx2 [[A:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd			; SI-DAG: s_load_dwordx2 [[A:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd
	; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]]			; SI-DAG: buffer_load_dwordx2 [[B:v\[[0-9]+:[0-9]+\]]]
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[B]], [[A]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[A]], [[B]], [[A]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_scalar_num_2(double addrspace(1)* %out, double addrspace(1)* %in, double %a) nounwind {			define amdgpu_kernel void @test_div_scale_f64_scalar_num_2(double addrspace(1)* %out, double addrspace(1)* %in, double %a) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%b = load double, double addrspace(1)* %gep, align 8			%b = load double, double addrspace(1)* %gep, align 8

	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_scalar_den_1:			; SI-LABEL: {{^}}test_div_scale_f64_scalar_den_1:
	; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]]			; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]]
	; SI-DAG: s_load_dwordx2 [[B:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd			; SI-DAG: s_load_dwordx2 [[B:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[A]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[B]], [[B]], [[A]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_scalar_den_1(double addrspace(1)* %out, double addrspace(1)* %in, double %b) nounwind {			define amdgpu_kernel void @test_div_scale_f64_scalar_den_1(double addrspace(1)* %out, double addrspace(1)* %in, double %b) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%a = load double, double addrspace(1)* %gep, align 8			%a = load double, double addrspace(1)* %gep, align 8

	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_scalar_den_2:			; SI-LABEL: {{^}}test_div_scale_f64_scalar_den_2:
	; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]]			; SI-DAG: buffer_load_dwordx2 [[A:v\[[0-9]+:[0-9]+\]]]
	; SI-DAG: s_load_dwordx2 [[B:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd			; SI-DAG: s_load_dwordx2 [[B:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xd
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[B]], [[A]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[A]], [[B]], [[A]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_scalar_den_2(double addrspace(1)* %out, double addrspace(1)* %in, double %b) nounwind {			define amdgpu_kernel void @test_div_scale_f64_scalar_den_2(double addrspace(1)* %out, double addrspace(1)* %in, double %b) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep = getelementptr double, double addrspace(1)* %in, i32 %tid			%gep = getelementptr double, double addrspace(1)* %in, i32 %tid

	%a = load double, double addrspace(1)* %gep, align 8			%a = load double, double addrspace(1)* %gep, align 8

	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_all_scalar_1:			; SI-LABEL: {{^}}test_div_scale_f32_all_scalar_1:
	; SI-DAG: s_load_dword [[A:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x13			; SI-DAG: s_load_dword [[A:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x13
	; SI-DAG: s_load_dword [[B:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x1c			; SI-DAG: s_load_dword [[B:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x1c
	; SI: v_mov_b32_e32 [[VA:v[0-9]+]], [[A]]			; SI: v_mov_b32_e32 [[VA:v[0-9]+]], [[A]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[VA]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[B]], [[B]], [[VA]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_all_scalar_1(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) nounwind {			define amdgpu_kernel void @test_div_scale_f32_all_scalar_1(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) nounwind {
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_all_scalar_2:			; SI-LABEL: {{^}}test_div_scale_f32_all_scalar_2:
	; SI-DAG: s_load_dword [[A:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x13			; SI-DAG: s_load_dword [[A:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x13
	; SI-DAG: s_load_dword [[B:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x1c			; SI-DAG: s_load_dword [[B:s[0-9]+]], {{s\[[0-9]+:[0-9]+\]}}, 0x1c
	; SI: v_mov_b32_e32 [[VB:v[0-9]+]], [[B]]			; SI: v_mov_b32_e32 [[VB:v[0-9]+]], [[B]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[VB]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[A]], [[VB]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_all_scalar_2(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) nounwind {			define amdgpu_kernel void @test_div_scale_f32_all_scalar_2(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b) nounwind {
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 true) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_all_scalar_1:			; SI-LABEL: {{^}}test_div_scale_f64_all_scalar_1:
	; SI-DAG: s_load_dwordx2 s[[[A_LO:[0-9]+]]:[[A_HI:[0-9]+]]], {{s\[[0-9]+:[0-9]+\]}}, 0x13			; SI-DAG: s_load_dwordx2 s[[[A_LO:[0-9]+]]:[[A_HI:[0-9]+]]], {{s\[[0-9]+:[0-9]+\]}}, 0x13
	; SI-DAG: s_load_dwordx2 [[B:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0x1d			; SI-DAG: s_load_dwordx2 [[B:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0x1d
	; SI-DAG: v_mov_b32_e32 v[[VA_LO:[0-9]+]], s[[A_LO]]			; SI-DAG: v_mov_b32_e32 v[[VA_LO:[0-9]+]], s[[A_LO]]
	; SI-DAG: v_mov_b32_e32 v[[VA_HI:[0-9]+]], s[[A_HI]]			; SI-DAG: v_mov_b32_e32 v[[VA_HI:[0-9]+]], s[[A_HI]]
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], v[[[VA_LO]]:[[VA_HI]]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[B]], [[B]], v[[[VA_LO]]:[[VA_HI]]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_all_scalar_1(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) nounwind {			define amdgpu_kernel void @test_div_scale_f64_all_scalar_1(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) nounwind {
	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 false) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_all_scalar_2:			; SI-LABEL: {{^}}test_div_scale_f64_all_scalar_2:
	; SI-DAG: s_load_dwordx2 [[A:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0x13			; SI-DAG: s_load_dwordx2 [[A:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0x13
	; SI-DAG: s_load_dwordx2 s[[[B_LO:[0-9]+]]:[[B_HI:[0-9]+]]], {{s\[[0-9]+:[0-9]+\]}}, 0x1d			; SI-DAG: s_load_dwordx2 s[[[B_LO:[0-9]+]]:[[B_HI:[0-9]+]]], {{s\[[0-9]+:[0-9]+\]}}, 0x1d
	; SI-DAG: v_mov_b32_e32 v[[VB_LO:[0-9]+]], s[[B_LO]]			; SI-DAG: v_mov_b32_e32 v[[VB_LO:[0-9]+]], s[[B_LO]]
	; SI-DAG: v_mov_b32_e32 v[[VB_HI:[0-9]+]], s[[B_HI]]			; SI-DAG: v_mov_b32_e32 v[[VB_HI:[0-9]+]], s[[B_HI]]
	; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], v[[[VB_LO]]:[[VB_HI]]], [[A]]			; SI: v_div_scale_f64 [[RESULT0:v\[[0-9]+:[0-9]+\]]], vcc, [[A]], v[[[VB_LO]]:[[VB_HI]]], [[A]]
	; SI: buffer_store_dwordx2 [[RESULT0]]			; SI: buffer_store_dwordx2 [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f64_all_scalar_2(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) nounwind {			define amdgpu_kernel void @test_div_scale_f64_all_scalar_2(double addrspace(1)* %out, [8 x i32], double %a, [8 x i32], double %b) nounwind {
	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double %a, double %b, i1 true) nounwind readnone
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_inline_imm_num:			; SI-LABEL: {{^}}test_div_scale_f32_inline_imm_num:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[A]], [[A]], 1.0			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[A]], [[A]], 1.0
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_inline_imm_num(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_inline_imm_num(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%a = load float, float addrspace(1)* %gep.0, align 4			%a = load float, float addrspace(1)* %gep.0, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 1.0, float %a, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 1.0, float %a, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_inline_imm_den:			; SI-LABEL: {{^}}test_div_scale_f32_inline_imm_den:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], 2.0, 2.0, [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, 2.0, 2.0, [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_inline_imm_den(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_inline_imm_den(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%a = load float, float addrspace(1)* %gep.0, align 4			%a = load float, float addrspace(1)* %gep.0, align 4

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float 2.0, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float 2.0, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_fneg_num:			; SI-LABEL: {{^}}test_div_scale_f32_fneg_num:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], -[[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[B]], [[B]], -[[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_fneg_num(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_fneg_num(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	%a = load volatile float, float addrspace(1)* %gep.0, align 4			%a = load volatile float, float addrspace(1)* %gep.0, align 4
	%b = load volatile float, float addrspace(1)* %gep.1, align 4			%b = load volatile float, float addrspace(1)* %gep.1, align 4

	%a.fneg = fneg float %a			%a.fneg = fneg float %a

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a.fneg, float %b, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a.fneg, float %b, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_fabs_num:			; SI-LABEL: {{^}}test_div_scale_f32_fabs_num:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_and_b32_e32 [[ABS_A:v[0-9]+]], 0x7fffffff, [[A]]			; SI: v_and_b32_e32 [[ABS_A:v[0-9]+]], 0x7fffffff, [[A]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[B]], [[B]], [[ABS_A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[B]], [[B]], [[ABS_A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_fabs_num(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_fabs_num(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	%a = load volatile float, float addrspace(1)* %gep.0, align 4			%a = load volatile float, float addrspace(1)* %gep.0, align 4
	%b = load volatile float, float addrspace(1)* %gep.1, align 4			%b = load volatile float, float addrspace(1)* %gep.1, align 4

	%a.fabs = call float @llvm.fabs.f32(float %a) nounwind readnone			%a.fabs = call float @llvm.fabs.f32(float %a) nounwind readnone

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a.fabs, float %b, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a.fabs, float %b, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_fneg_den:			; SI-LABEL: {{^}}test_div_scale_f32_fneg_den:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], -[[B]], -[[B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, -[[B]], -[[B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_fneg_den(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_fneg_den(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	%a = load volatile float, float addrspace(1)* %gep.0, align 4			%a = load volatile float, float addrspace(1)* %gep.0, align 4
	%b = load volatile float, float addrspace(1)* %gep.1, align 4			%b = load volatile float, float addrspace(1)* %gep.1, align 4

	%b.fneg = fneg float %b			%b.fneg = fneg float %b

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b.fneg, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b.fneg, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_fabs_den:			; SI-LABEL: {{^}}test_div_scale_f32_fabs_den:
	; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64			; SI-DAG: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64
	; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI-DAG: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_and_b32_e32 [[ABS_B:v[0-9]+]], 0x7fffffff, [[B]]			; SI: v_and_b32_e32 [[ABS_B:v[0-9]+]], 0x7fffffff, [[B]]
	; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], [[RESULT1:s\[[0-9]+:[0-9]+\]]], [[ABS_B]], [[ABS_B]], [[A]]			; SI: v_div_scale_f32 [[RESULT0:v[0-9]+]], vcc, [[ABS_B]], [[ABS_B]], [[A]]
	; SI: buffer_store_dword [[RESULT0]]			; SI: buffer_store_dword [[RESULT0]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_div_scale_f32_fabs_den(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {			define amdgpu_kernel void @test_div_scale_f32_fabs_den(float addrspace(1)* %out, float addrspace(1)* %in) nounwind {
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

	%a = load volatile float, float addrspace(1)* %gep.0, align 4			%a = load volatile float, float addrspace(1)* %gep.0, align 4
	%b = load volatile float, float addrspace(1)* %gep.1, align 4			%b = load volatile float, float addrspace(1)* %gep.1, align 4

	%b.fabs = call float @llvm.fabs.f32(float %b) nounwind readnone			%b.fabs = call float @llvm.fabs.f32(float %b) nounwind readnone

	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b.fabs, i1 false) nounwind readnone			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b.fabs, i1 false) nounwind readnone
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_val_undef_val:			; SI-LABEL: {{^}}test_div_scale_f32_val_undef_val:
	; SI: s_mov_b32 [[K:s[0-9]+]], 0x41000000			; SI: s_mov_b32 [[K:s[0-9]+]], 0x41000000
	; SI: v_div_scale_f32 v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, [[K]], v{{[0-9]+}}, [[K]]			; SI: v_div_scale_f32 v{{[0-9]+}}, vcc, [[K]], v{{[0-9]+}}, [[K]]
	define amdgpu_kernel void @test_div_scale_f32_val_undef_val(float addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f32_val_undef_val(float addrspace(1)* %out) #0 {
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 8.0, float undef, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float 8.0, float undef, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_undef_val_val:			; SI-LABEL: {{^}}test_div_scale_f32_undef_val_val:
	; SI: s_mov_b32 [[K:s[0-9]+]], 0x41000000			; SI: s_mov_b32 [[K:s[0-9]+]], 0x41000000
	; SI: v_div_scale_f32 v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, [[K]], [[K]], v{{[0-9]+}}			; SI: v_div_scale_f32 v{{[0-9]+}}, vcc, [[K]], [[K]], v{{[0-9]+}}
	define amdgpu_kernel void @test_div_scale_f32_undef_val_val(float addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f32_undef_val_val(float addrspace(1)* %out) #0 {
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float 8.0, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float 8.0, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f32_undef_undef_val:			; SI-LABEL: {{^}}test_div_scale_f32_undef_undef_val:
	; SI-NOT: v0			; SI-NOT: v0
	; SI: v_div_scale_f32 v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s0, s0, v0			; SI: v_div_scale_f32 v{{[0-9]+}}, vcc, s0, s0, v0
	define amdgpu_kernel void @test_div_scale_f32_undef_undef_val(float addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f32_undef_undef_val(float addrspace(1)* %out) #0 {
	%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float undef, i1 false)			%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float undef, i1 false)
	%result0 = extractvalue { float, i1 } %result, 0			%result0 = extractvalue { float, i1 } %result, 0
	store float %result0, float addrspace(1)* %out, align 4			store float %result0, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}test_div_scale_f64_val_undef_val:			; SI-LABEL: {{^}}test_div_scale_f64_val_undef_val:
	; SI-DAG: s_mov_b32 s[[K_LO:[0-9]+]], 0{{$}}			; SI-DAG: s_mov_b32 s[[K_LO:[0-9]+]], 0{{$}}
	; SI-DAG: s_mov_b32 s[[K_HI:[0-9]+]], 0x40200000			; SI-DAG: s_mov_b32 s[[K_HI:[0-9]+]], 0x40200000
	; SI: v_div_scale_f64 v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, s[[[K_LO]]:[[K_HI]]], v[0:1], s[[[K_LO]]:[[K_HI]]]			; SI: v_div_scale_f64 v{{\[[0-9]+:[0-9]+\]}}, vcc, s[[[K_LO]]:[[K_HI]]], v[0:1], s[[[K_LO]]:[[K_HI]]]
	define amdgpu_kernel void @test_div_scale_f64_val_undef_val(double addrspace(1)* %out) #0 {			define amdgpu_kernel void @test_div_scale_f64_val_undef_val(double addrspace(1)* %out) #0 {
	%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double 8.0, double undef, i1 false)			%result = call { double, i1 } @llvm.amdgcn.div.scale.f64(double 8.0, double undef, i1 false)
	%result0 = extractvalue { double, i1 } %result, 0			%result0 = extractvalue { double, i1 } %result, 0
	store double %result0, double addrspace(1)* %out, align 8			store double %result0, double addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readnone speculatable }			attributes #1 = { nounwind readnone speculatable }

llvm/test/CodeGen/AMDGPU/llvm.powi.ll

; GCN-NEXT: s_setpc_b64 s[30:31]
%res = call float @llvm.powi.f32.i32(float %l, i32 1)		%res = call float @llvm.powi.f32.i32(float %l, i32 1)
ret float %res		ret float %res
}		}

define float @v_powi_neg1_f32(float %l) {		define float @v_powi_neg1_f32(float %l) {
; GFX7-LABEL: v_powi_neg1_f32:		; GFX7-LABEL: v_powi_neg1_f32:
; GFX7: ; %bb.0:		; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX7-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX7-NEXT: v_rcp_f32_e32 v2, v1		; GFX7-NEXT: v_rcp_f32_e32 v2, v1
; GFX7-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX7-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX7-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX7-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX7-NEXT: v_fma_f32 v2, v4, v2, v2		; GFX7-NEXT: v_fma_f32 v2, v4, v2, v2
; GFX7-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX7-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX7-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX7-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX7-NEXT: v_fma_f32 v4, v5, v2, v4		; GFX7-NEXT: v_fma_f32 v4, v5, v2, v4
; GFX7-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX7-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX7-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX7-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX7-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX7-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX7-NEXT: s_setpc_b64 s[30:31]		; GFX7-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_powi_neg1_f32:		; GFX8-LABEL: v_powi_neg1_f32:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX8-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX8-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0		; GFX8-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0
; GFX8-NEXT: v_rcp_f32_e32 v3, v1		; GFX8-NEXT: v_rcp_f32_e32 v3, v1
; GFX8-NEXT: v_fma_f32 v4, -v1, v3, 1.0		; GFX8-NEXT: v_fma_f32 v4, -v1, v3, 1.0
; GFX8-NEXT: v_fma_f32 v3, v4, v3, v3		; GFX8-NEXT: v_fma_f32 v3, v4, v3, v3
; GFX8-NEXT: v_mul_f32_e32 v4, v2, v3		; GFX8-NEXT: v_mul_f32_e32 v4, v2, v3
; GFX8-NEXT: v_fma_f32 v5, -v1, v4, v2		; GFX8-NEXT: v_fma_f32 v5, -v1, v4, v2
; GFX8-NEXT: v_fma_f32 v4, v5, v3, v4		; GFX8-NEXT: v_fma_f32 v4, v5, v3, v4
; GFX8-NEXT: v_fma_f32 v1, -v1, v4, v2		; GFX8-NEXT: v_fma_f32 v1, -v1, v4, v2
; GCN-NEXT: s_setpc_b64 s[30:31]
ret float %res		ret float %res
}		}

define float @v_powi_neg2_f32(float %l) {		define float @v_powi_neg2_f32(float %l) {
; GFX7-LABEL: v_powi_neg2_f32:		; GFX7-LABEL: v_powi_neg2_f32:
; GFX7: ; %bb.0:		; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX7-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX7-NEXT: v_rcp_f32_e32 v2, v1		; GFX7-NEXT: v_rcp_f32_e32 v2, v1
; GFX7-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX7-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX7-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX7-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX7-NEXT: v_fma_f32 v2, v4, v2, v2		; GFX7-NEXT: v_fma_f32 v2, v4, v2, v2
; GFX7-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX7-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX7-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX7-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX7-NEXT: v_fma_f32 v4, v5, v2, v4		; GFX7-NEXT: v_fma_f32 v4, v5, v2, v4
; GFX7-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX7-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX7-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX7-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX7-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX7-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX7-NEXT: s_setpc_b64 s[30:31]		; GFX7-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_powi_neg2_f32:		; GFX8-LABEL: v_powi_neg2_f32:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX8-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX8-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0		; GFX8-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0
; GFX8-NEXT: v_rcp_f32_e32 v3, v1		; GFX8-NEXT: v_rcp_f32_e32 v3, v1
; GFX8-NEXT: v_fma_f32 v4, -v1, v3, 1.0		; GFX8-NEXT: v_fma_f32 v4, -v1, v3, 1.0
; GFX8-NEXT: v_fma_f32 v3, v4, v3, v3		; GFX8-NEXT: v_fma_f32 v3, v4, v3, v3
; GFX8-NEXT: v_mul_f32_e32 v4, v2, v3		; GFX8-NEXT: v_mul_f32_e32 v4, v2, v3
; GFX8-NEXT: v_fma_f32 v5, -v1, v4, v2		; GFX8-NEXT: v_fma_f32 v5, -v1, v4, v2
; GFX8-NEXT: v_fma_f32 v4, v5, v3, v4		; GFX8-NEXT: v_fma_f32 v4, v5, v3, v4
; GFX8-NEXT: v_fma_f32 v1, -v1, v4, v2		; GFX8-NEXT: v_fma_f32 v1, -v1, v4, v2
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX7-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX7-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX7-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX7-NEXT: v_rcp_f32_e32 v2, v1		; GFX7-NEXT: v_rcp_f32_e32 v2, v1
; GFX7-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0		; GFX7-NEXT: v_div_scale_f32 v3, vcc, 1.0, v0, 1.0
; GFX7-NEXT: v_fma_f32 v4, -v1, v2, 1.0		; GFX7-NEXT: v_fma_f32 v4, -v1, v2, 1.0
; GFX7-NEXT: v_fma_f32 v2, v4, v2, v2		; GFX7-NEXT: v_fma_f32 v2, v4, v2, v2
; GFX7-NEXT: v_mul_f32_e32 v4, v3, v2		; GFX7-NEXT: v_mul_f32_e32 v4, v3, v2
; GFX7-NEXT: v_fma_f32 v5, -v1, v4, v3		; GFX7-NEXT: v_fma_f32 v5, -v1, v4, v3
; GFX7-NEXT: v_fma_f32 v4, v5, v2, v4		; GFX7-NEXT: v_fma_f32 v4, v5, v2, v4
; GFX7-NEXT: v_fma_f32 v1, -v1, v4, v3		; GFX7-NEXT: v_fma_f32 v1, -v1, v4, v3
; GFX7-NEXT: v_div_fmas_f32 v1, v1, v2, v4		; GFX7-NEXT: v_div_fmas_f32 v1, v1, v2, v4
; GFX7-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0		; GFX7-NEXT: v_div_fixup_f32 v0, v1, v0, 1.0
; GFX7-NEXT: s_setpc_b64 s[30:31]		; GFX7-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX8-LABEL: v_powi_neg128_f32:		; GFX8-LABEL: v_powi_neg128_f32:
; GFX8: ; %bb.0:		; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0		; GFX8-NEXT: v_mul_f32_e32 v0, v0, v0
; GFX8-NEXT: v_div_scale_f32 v1, s[4:5], v0, v0, 1.0		; GFX8-NEXT: v_div_scale_f32 v1, vcc, v0, v0, 1.0
; GFX8-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0		; GFX8-NEXT: v_div_scale_f32 v2, vcc, 1.0, v0, 1.0
; GFX8-NEXT: v_rcp_f32_e32 v3, v1		; GFX8-NEXT: v_rcp_f32_e32 v3, v1
; GFX8-NEXT: v_fma_f32 v4, -v1, v3, 1.0		; GFX8-NEXT: v_fma_f32 v4, -v1, v3, 1.0
; GFX8-NEXT: v_fma_f32 v3, v4, v3, v3		; GFX8-NEXT: v_fma_f32 v3, v4, v3, v3
; GFX8-NEXT: v_mul_f32_e32 v4, v2, v3		; GFX8-NEXT: v_mul_f32_e32 v4, v2, v3
; GFX8-NEXT: v_fma_f32 v5, -v1, v4, v2		; GFX8-NEXT: v_fma_f32 v5, -v1, v4, v2
; GFX8-NEXT: v_fma_f32 v4, v5, v3, v4		; GFX8-NEXT: v_fma_f32 v4, v5, v3, v4
; GFX8-NEXT: v_fma_f32 v1, -v1, v4, v2		; GFX8-NEXT: v_fma_f32 v1, -v1, v4, v2

llvm/test/CodeGen/AMDGPU/sched-crash-dbg-value.mir

--- |
!6 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 81, isLocal: false, isDefinition: true, scopeLine: 86, flags: DIFlagPrototyped, isOptimized: true, unit: !0)		!6 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 81, isLocal: false, isDefinition: true, scopeLine: 86, flags: DIFlagPrototyped, isOptimized: true, unit: !0)
!7 = !DIBasicType(name: "float", size: 32, encoding: DW_ATE_float)		!7 = !DIBasicType(name: "float", size: 32, encoding: DW_ATE_float)
!8 = !DILocation(line: 102, column: 8, scope: !6)		!8 = !DILocation(line: 102, column: 8, scope: !6)

...		...
---		---

# CHECK: name: sched_dbg_value_crash		# CHECK: name: sched_dbg_value_crash
# CHECK: DBG_VALUE %99, $noreg, !5, !DIExpression(DW_OP_constu, 1, DW_OP_swap, DW_OP_xderef), debug-location !8		# CHECK: DBG_VALUE %97, $noreg, !5, !DIExpression(DW_OP_constu, 1, DW_OP_swap, DW_OP_xderef), debug-location !8

name: sched_dbg_value_crash		name: sched_dbg_value_crash
alignment: 1		alignment: 1
exposesReturnsTwice: false		exposesReturnsTwice: false
legalized: false		legalized: false
regBankSelected: false		regBankSelected: false
selected: false		selected: false
tracksRegLiveness: true		tracksRegLiveness: true
bb.0.bb:
BUFFER_STORE_DWORD_OFFEN %81, %stack.0.tmp5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr101, 104, 0, 0, 0, implicit $exec		BUFFER_STORE_DWORD_OFFEN %81, %stack.0.tmp5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr101, 104, 0, 0, 0, implicit $exec
BUFFER_STORE_DWORD_OFFEN %80, %stack.0.tmp5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr101, 100, 0, 0, 0, implicit $exec		BUFFER_STORE_DWORD_OFFEN %80, %stack.0.tmp5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr101, 100, 0, 0, 0, implicit $exec
BUFFER_STORE_DWORD_OFFEN %78, %stack.0.tmp5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr101, 96, 0, 0, 0, implicit $exec		BUFFER_STORE_DWORD_OFFEN %78, %stack.0.tmp5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr101, 96, 0, 0, 0, implicit $exec
%85:vgpr_32 = IMPLICIT_DEF		%85:vgpr_32 = IMPLICIT_DEF
%86:vgpr_32 = IMPLICIT_DEF		%86:vgpr_32 = IMPLICIT_DEF
%87:vgpr_32 = IMPLICIT_DEF		%87:vgpr_32 = IMPLICIT_DEF
%88:vgpr_32 = IMPLICIT_DEF		%88:vgpr_32 = IMPLICIT_DEF
%90:vgpr_32 = IMPLICIT_DEF		%90:vgpr_32 = IMPLICIT_DEF
%91:vgpr_32, dead %92:sreg_64 = nofpexcept V_DIV_SCALE_F32_e64 0, %90, 0, %90, 0, 1065353216, 0, 0, implicit $mode, implicit $exec		%91:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, %90, 0, %90, 0, 1065353216, 0, 0, implicit $mode, implicit $exec, implicit-def $vcc
%95:vgpr_32 = nofpexcept V_FMA_F32_e64 0, 0, 0, 0, 0, undef %93:vgpr_32, 0, 0, implicit $mode, implicit $exec		%95:vgpr_32 = nofpexcept V_FMA_F32_e64 0, 0, 0, 0, 0, undef %93:vgpr_32, 0, 0, implicit $mode, implicit $exec
%96:vgpr_32, %97:sreg_64 = nofpexcept V_DIV_SCALE_F32_e64 0, 1065353216, 0, %90, 0, 1065353216, 0, 0, implicit $mode, implicit $exec		%96:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, 1065353216, 0, %90, 0, 1065353216, 0, 0, implicit $mode, implicit $exec, implicit-def $vcc
%98:vgpr_32 = IMPLICIT_DEF		%98:vgpr_32 = IMPLICIT_DEF
%99:vgpr_32 = IMPLICIT_DEF		%99:vgpr_32 = IMPLICIT_DEF
%100:vgpr_32 = IMPLICIT_DEF		%100:vgpr_32 = IMPLICIT_DEF
%101:vgpr_32 = IMPLICIT_DEF		%101:vgpr_32 = IMPLICIT_DEF
%102:vgpr_32 = IMPLICIT_DEF		%102:vgpr_32 = IMPLICIT_DEF
%103:vgpr_32 = IMPLICIT_DEF		%103:vgpr_32 = IMPLICIT_DEF
%104:vgpr_32 = IMPLICIT_DEF		%104:vgpr_32 = IMPLICIT_DEF
%105:vgpr_32 = IMPLICIT_DEF		%105:vgpr_32 = IMPLICIT_DEF
%106:vgpr_32, dead %107:sreg_64 = nofpexcept V_DIV_SCALE_F32_e64 0, %90, 0, %90, 0, %105, 0, 0, implicit $mode, implicit $exec		%106:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, %90, 0, %90, 0, %105, 0, 0, implicit $mode, implicit $exec, implicit-def $vcc
%108:vgpr_32 = nofpexcept V_RCP_F32_e32 0, implicit $mode, implicit $exec		%108:vgpr_32 = nofpexcept V_RCP_F32_e32 0, implicit $mode, implicit $exec
%109:vgpr_32 = IMPLICIT_DEF		%109:vgpr_32 = IMPLICIT_DEF
%110:vgpr_32 = nofpexcept V_FMA_F32_e64 0, 0, 0, 0, 0, 0, 0, 0, implicit $mode, implicit $exec		%110:vgpr_32 = nofpexcept V_FMA_F32_e64 0, 0, 0, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
%111:vgpr_32, %112:sreg_64 = nofpexcept V_DIV_SCALE_F32_e64 0, 0, 0, 0, 0, 0, 0, 0, implicit $mode, implicit $exec		%111:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, 0, 0, 0, 0, 0, 0, 0, implicit $mode, implicit $exec, implicit-def $vcc
%113:vgpr_32 = nofpexcept V_MUL_F32_e32 0, %110, implicit $mode, implicit $exec		%113:vgpr_32 = nofpexcept V_MUL_F32_e32 0, %110, implicit $mode, implicit $exec
%114:vgpr_32 = IMPLICIT_DEF		%114:vgpr_32 = IMPLICIT_DEF
%115:vgpr_32 = IMPLICIT_DEF		%115:vgpr_32 = IMPLICIT_DEF
%116:vgpr_32 = IMPLICIT_DEF		%116:vgpr_32 = IMPLICIT_DEF
$vcc = IMPLICIT_DEF		$vcc = IMPLICIT_DEF
%117:vgpr_32 = nofpexcept V_DIV_FMAS_F32_e64 0, %116, 0, %110, 0, %115, 0, 0, implicit killed $vcc, implicit $mode, implicit $exec		%117:vgpr_32 = nofpexcept V_DIV_FMAS_F32_e64 0, %116, 0, %110, 0, %115, 0, 0, implicit killed $vcc, implicit $mode, implicit $exec
%118:vgpr_32 = nofpexcept V_DIV_FIXUP_F32_e64 0, %117, 0, %90, 0, %105, 0, 0, implicit $mode, implicit $exec		%118:vgpr_32 = nofpexcept V_DIV_FIXUP_F32_e64 0, %117, 0, %90, 0, %105, 0, 0, implicit $mode, implicit $exec
%119:vgpr_32 = IMPLICIT_DEF		%119:vgpr_32 = IMPLICIT_DEF

llvm/test/CodeGen/AMDGPU/wave32.ll

bb:
%tmp2 = load i64, i64 addrspace(1)* %arg, align 8		%tmp2 = load i64, i64 addrspace(1)* %arg, align 8
%tmp3 = udiv i64 %tmp1, %tmp2		%tmp3 = udiv i64 %tmp1, %tmp2
%tmp4 = getelementptr inbounds i64, i64 addrspace(1)* %arg, i64 2		%tmp4 = getelementptr inbounds i64, i64 addrspace(1)* %arg, i64 2
store i64 %tmp3, i64 addrspace(1)* %tmp4, align 8		store i64 %tmp3, i64 addrspace(1)* %tmp4, align 8
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_div_scale_f32:		; GCN-LABEL: {{^}}test_div_scale_f32:
; GFX1032: v_div_scale_f32 v{{[0-9]+}}, s{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GFX1032: v_div_scale_f32 v{{[0-9]+}}, vcc_lo, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GFX1064: v_div_scale_f32 v{{[0-9]+}}, s[{{[0-9:]+}}], v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GFX1064: v_div_scale_f32 v{{[0-9]+}}, vcc, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
define amdgpu_kernel void @test_div_scale_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {		define amdgpu_kernel void @test_div_scale_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid		%gep.0 = getelementptr float, float addrspace(1)* %in, i32 %tid
%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1		%gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1

%a = load volatile float, float addrspace(1)* %gep.0, align 4		%a = load volatile float, float addrspace(1)* %gep.0, align 4
%b = load volatile float, float addrspace(1)* %gep.1, align 4		%b = load volatile float, float addrspace(1)* %gep.1, align 4

%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone		%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %a, float %b, i1 false) nounwind readnone
%result0 = extractvalue { float, i1 } %result, 0		%result0 = extractvalue { float, i1 } %result, 0
store float %result0, float addrspace(1)* %out, align 4		store float %result0, float addrspace(1)* %out, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_div_scale_f64:		; GCN-LABEL: {{^}}test_div_scale_f64:
; GFX1032: v_div_scale_f64 v[{{[0-9:]+}}], s{{[0-9]+}}, v[{{[0-9:]+}}], v[{{[0-9:]+}}], v[{{[0-9:]+}}]		; GFX1032: v_div_scale_f64 v[{{[0-9:]+}}], vcc_lo, v[{{[0-9:]+}}], v[{{[0-9:]+}}], v[{{[0-9:]+}}]
; GFX1064: v_div_scale_f64 v[{{[0-9:]+}}], s[{{[0-9:]+}}], v[{{[0-9:]+}}], v[{{[0-9:]+}}], v[{{[0-9:]+}}]		; GFX1064: v_div_scale_f64 v[{{[0-9:]+}}], vcc, v[{{[0-9:]+}}], v[{{[0-9:]+}}], v[{{[0-9:]+}}]
define amdgpu_kernel void @test_div_scale_f64(double addrspace(1)* %out, double addrspace(1)* %aptr, double addrspace(1)* %in) #0 {		define amdgpu_kernel void @test_div_scale_f64(double addrspace(1)* %out, double addrspace(1)* %aptr, double addrspace(1)* %in) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid		%gep.0 = getelementptr double, double addrspace(1)* %in, i32 %tid
%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1		%gep.1 = getelementptr double, double addrspace(1)* %gep.0, i32 1

%a = load volatile double, double addrspace(1)* %gep.0, align 8		%a = load volatile double, double addrspace(1)* %gep.0, align 8
%b = load volatile double, double addrspace(1)* %gep.1, align 8		%b = load volatile double, double addrspace(1)* %gep.1, align 8

exit:		exit:
%cond = phi i1 [false, %entry], [%cmp1, %bb]		%cond = phi i1 [false, %entry], [%cmp1, %bb]
%result = call float @llvm.amdgcn.div.fmas.f32(float %a, float %b, float %c, i1 %cond) nounwind readnone		%result = call float @llvm.amdgcn.div.fmas.f32(float %a, float %b, float %c, i1 %cond) nounwind readnone
store float %result, float addrspace(1)* %gep.out, align 4		store float %result, float addrspace(1)* %gep.out, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}fdiv_f32:		; GCN-LABEL: {{^}}fdiv_f32:
; GFX1032: v_div_scale_f32 v{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}		; GFX1032: v_div_scale_f32 v{{[0-9]+}}, vcc_lo, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}
; GFX1064: v_div_scale_f32 v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}		; GFX1064: v_div_scale_f32 v{{[0-9]+}}, vcc, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}
; GCN: v_rcp_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}		; GCN: v_rcp_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}
; GFX1032: v_div_scale_f32 v{{[0-9]+}}, vcc_lo, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}		; GFX1032: v_div_scale_f32 v{{[0-9]+}}, vcc_lo, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}
; GFX1064: v_div_scale_f32 v{{[0-9]+}}, vcc, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}		; GFX1064: v_div_scale_f32 v{{[0-9]+}}, vcc, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}

; GCN-NOT: vcc		; GCN-NOT: vcc
; GCN: v_div_fmas_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN: v_div_fmas_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
define amdgpu_kernel void @fdiv_f32(float addrspace(1)* %out, float %a, float %b) #0 {		define amdgpu_kernel void @fdiv_f32(float addrspace(1)* %out, float %a, float %b) #0 {
entry:		entry:

llvm/test/MC/AMDGPU/gfx10_asm_vop3.s

	// GFX10: encoding: [0x05,0x00,0x6c,0xd5,0x01,0x83,0x01,0x00]			// GFX10: encoding: [0x05,0x00,0x6c,0xd5,0x01,0x83,0x01,0x00]

	v_mul_hi_i32 v5, v1, 0.5			v_mul_hi_i32 v5, v1, 0.5
	// GFX10: encoding: [0x05,0x00,0x6c,0xd5,0x01,0xe1,0x01,0x00]			// GFX10: encoding: [0x05,0x00,0x6c,0xd5,0x01,0xe1,0x01,0x00]

	v_mul_hi_i32 v5, v1, -4.0			v_mul_hi_i32 v5, v1, -4.0
	// GFX10: encoding: [0x05,0x00,0x6c,0xd5,0x01,0xef,0x01,0x00]			// GFX10: encoding: [0x05,0x00,0x6c,0xd5,0x01,0xef,0x01,0x00]

	v_div_scale_f32 v5, s0, v1, v2, v3			v_div_scale_f32 v5, vcc_lo, v1, v2, v3
	rampitec: Replace s[0:1]/s0 with vcc/vcc_lo
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v255, s0, v1, v2, v3			v_div_scale_f32 v255, vcc_lo, v1, v2, v3
	// W32: encoding: [0xff,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			// W32: encoding: [0xff,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v255, v2, v3			v_div_scale_f32 v5, vcc_lo, v255, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0xff,0x05,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0xff,0x05,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, s1, v2, v3			v_div_scale_f32 v5, vcc_lo, s1, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, s103, v2, v3			v_div_scale_f32 v5, vcc_lo, s103, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x67,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x67,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, vcc_lo, v2, v3			v_div_scale_f32 v5, vcc_lo, vcc_lo, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x6a,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x6a,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, vcc_hi, v2, v3			v_div_scale_f32 v5, vcc_lo, vcc_hi, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x6b,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x6b,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, ttmp11, v2, v3			v_div_scale_f32 v5, vcc_lo, ttmp11, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x77,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x77,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, m0, v2, v3			v_div_scale_f32 v5, vcc_lo, m0, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x7c,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x7c,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, exec_lo, v2, v3			v_div_scale_f32 v5, vcc_lo, exec_lo, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x7e,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x7e,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, exec_hi, v2, v3			v_div_scale_f32 v5, vcc_lo, exec_hi, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x7f,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x7f,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, 0, v2, v3			v_div_scale_f32 v5, vcc_lo, 0, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x80,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x80,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, -1, v2, v3			v_div_scale_f32 v5, vcc_lo, -1, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0xc1,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0xc1,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, 0.5, v2, v3			v_div_scale_f32 v5, vcc_lo, 0.5, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0xf0,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0xf0,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, -4.0, v2, v3			v_div_scale_f32 v5, vcc_lo, -4.0, v2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0xf7,0x04,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0xf7,0x04,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v255, v3			v_div_scale_f32 v5, vcc_lo, v1, v255, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0f,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0f,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, s2, v3			v_div_scale_f32 v5, vcc_lo, v1, s2, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, s103, v3			v_div_scale_f32 v5, vcc_lo, v1, s103, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xcf,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xcf,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, vcc_lo, v3			v_div_scale_f32 v5, vcc_lo, v1, vcc_lo, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd5,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd5,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, vcc_hi, v3			v_div_scale_f32 v5, vcc_lo, v1, vcc_hi, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd7,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd7,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, ttmp11, v3			v_div_scale_f32 v5, vcc_lo, v1, ttmp11, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, m0, v3			v_div_scale_f32 v5, vcc_lo, v1, m0, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xf9,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xf9,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, exec_lo, v3			v_div_scale_f32 v5, vcc_lo, v1, exec_lo, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xfd,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xfd,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, exec_hi, v3			v_div_scale_f32 v5, vcc_lo, v1, exec_hi, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, 0, v3			v_div_scale_f32 v5, vcc_lo, v1, 0, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x01,0x0d,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x01,0x0d,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, -1, v3			v_div_scale_f32 v5, vcc_lo, v1, -1, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x83,0x0d,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x83,0x0d,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, 0.5, v3			v_div_scale_f32 v5, vcc_lo, v1, 0.5, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xe1,0x0d,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xe1,0x0d,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, -4.0, v3			v_div_scale_f32 v5, vcc_lo, v1, -4.0, v3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0d,0x04]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0d,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, v255			v_div_scale_f32 v5, vcc_lo, v1, v2, v255
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x07]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x07]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, s3			v_div_scale_f32 v5, vcc_lo, v1, v2, s3
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x00]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x00]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, s103			v_div_scale_f32 v5, vcc_lo, v1, v2, s103
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x9e,0x01]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x9e,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, vcc_lo			v_div_scale_f32 v5, vcc_lo, v1, v2, vcc_lo
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xaa,0x01]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xaa,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, vcc_hi			v_div_scale_f32 v5, vcc_lo, v1, v2, vcc_hi
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xae,0x01]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xae,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, ttmp11			v_div_scale_f32 v5, vcc_lo, v1, v2, ttmp11
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x01]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, m0			v_div_scale_f32 v5, vcc_lo, v1, v2, m0
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xf2,0x01]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xf2,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, exec_lo			v_div_scale_f32 v5, vcc_lo, v1, v2, exec_lo
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfa,0x01]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfa,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, exec_hi			v_div_scale_f32 v5, vcc_lo, v1, v2, exec_hi
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x01]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, 0			v_div_scale_f32 v5, vcc_lo, v1, v2, 0
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x02,0x02]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x02,0x02]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, -1			v_div_scale_f32 v5, vcc_lo, v1, v2, -1
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x06,0x03]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x06,0x03]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, 0.5			v_div_scale_f32 v5, vcc_lo, v1, v2, 0.5
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xc2,0x03]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xc2,0x03]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s0, v1, v2, -4.0			v_div_scale_f32 v5, vcc_lo, v1, v2, -4.0
	// W32: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x03]			// W32: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x03]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, v3			v_div_scale_f32 v5, vcc, v1, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v255, s[0:1], v1, v2, v3			v_div_scale_f32 v255, vcc, v1, v2, v3
	// W64: encoding: [0xff,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			// W64: encoding: [0xff,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v255, v2, v3			v_div_scale_f32 v5, vcc, v255, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0xff,0x05,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0xff,0x05,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], s1, v2, v3			v_div_scale_f32 v5, vcc, s1, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], s103, v2, v3			v_div_scale_f32 v5, vcc, s103, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x67,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x67,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], vcc_lo, v2, v3			v_div_scale_f32 v5, vcc, vcc_lo, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x6a,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x6a,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], vcc_hi, v2, v3			v_div_scale_f32 v5, vcc, vcc_hi, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x6b,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x6b,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], ttmp11, v2, v3			v_div_scale_f32 v5, vcc, ttmp11, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x77,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x77,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], m0, v2, v3			v_div_scale_f32 v5, vcc, m0, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x7c,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x7c,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], exec_lo, v2, v3			v_div_scale_f32 v5, vcc, exec_lo, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x7e,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x7e,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], exec_hi, v2, v3			v_div_scale_f32 v5, vcc, exec_hi, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x7f,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x7f,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], 0, v2, v3			v_div_scale_f32 v5, vcc, 0, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x80,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x80,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], -1, v2, v3			v_div_scale_f32 v5, vcc, -1, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0xc1,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0xc1,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], 0.5, v2, v3			v_div_scale_f32 v5, vcc, 0.5, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0xf0,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0xf0,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], -4.0, v2, v3			v_div_scale_f32 v5, vcc, -4.0, v2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0xf7,0x04,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0xf7,0x04,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v255, v3			v_div_scale_f32 v5, vcc, v1, v255, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0f,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0f,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, s2, v3			v_div_scale_f32 v5, vcc, v1, s2, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, s103, v3			v_div_scale_f32 v5, vcc, v1, s103, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xcf,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xcf,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, vcc_lo, v3			v_div_scale_f32 v5, vcc, v1, vcc_lo, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd5,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd5,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, vcc_hi, v3			v_div_scale_f32 v5, vcc, v1, vcc_hi, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd7,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd7,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, ttmp11, v3			v_div_scale_f32 v5, vcc, v1, ttmp11, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, m0, v3			v_div_scale_f32 v5, vcc, v1, m0, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xf9,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xf9,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, exec_lo, v3			v_div_scale_f32 v5, vcc, v1, exec_lo, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xfd,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xfd,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, exec_hi, v3			v_div_scale_f32 v5, vcc, v1, exec_hi, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, 0, v3			v_div_scale_f32 v5, vcc, v1, 0, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x01,0x0d,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x01,0x0d,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, -1, v3			v_div_scale_f32 v5, vcc, v1, -1, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x83,0x0d,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x83,0x0d,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, 0.5, v3			v_div_scale_f32 v5, vcc, v1, 0.5, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xe1,0x0d,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xe1,0x0d,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, -4.0, v3			v_div_scale_f32 v5, vcc, v1, -4.0, v3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0d,0x04]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0d,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, v255			v_div_scale_f32 v5, vcc, v1, v2, v255
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x07]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x07]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, s3			v_div_scale_f32 v5, vcc, v1, v2, s3
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x00]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x00]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, s103			v_div_scale_f32 v5, vcc, v1, v2, s103
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x9e,0x01]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x9e,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, vcc_lo			v_div_scale_f32 v5, vcc, v1, v2, vcc_lo
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xaa,0x01]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xaa,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, vcc_hi			v_div_scale_f32 v5, vcc, v1, v2, vcc_hi
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xae,0x01]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xae,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, ttmp11			v_div_scale_f32 v5, vcc, v1, v2, ttmp11
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x01]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, m0			v_div_scale_f32 v5, vcc, v1, v2, m0
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xf2,0x01]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xf2,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, exec_lo			v_div_scale_f32 v5, vcc, v1, v2, exec_lo
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfa,0x01]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfa,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, exec_hi			v_div_scale_f32 v5, vcc, v1, v2, exec_hi
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x01]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, 0			v_div_scale_f32 v5, vcc, v1, v2, 0
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x02,0x02]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x02,0x02]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, -1			v_div_scale_f32 v5, vcc, v1, v2, -1
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x06,0x03]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x06,0x03]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, 0.5			v_div_scale_f32 v5, vcc, v1, v2, 0.5
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xc2,0x03]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xc2,0x03]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, s[0:1], v1, v2, -4.0			v_div_scale_f32 v5, vcc, v1, v2, -4.0
	// W64: encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x03]			// W64: encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x03]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_fmas_f32 v5, v1, v2, v3			v_div_fmas_f32 v5, v1, v2, v3
	// GFX10: encoding: [0x05,0x00,0x6f,0xd5,0x01,0x05,0x0e,0x04]			// GFX10: encoding: [0x05,0x00,0x6f,0xd5,0x01,0x05,0x0e,0x04]

	v_div_fmas_f32 v255, v1, v2, v3			v_div_fmas_f32 v255, v1, v2, v3
	// GFX10: encoding: [0xff,0x00,0x6f,0xd5,0x01,0x05,0x0e,0x04]			// GFX10: encoding: [0xff,0x00,0x6f,0xd5,0x01,0x05,0x0e,0x04]

	v_div_fmas_f32 v5, v255, v2, v3			v_div_fmas_f32 v5, v255, v2, v3

llvm/test/MC/AMDGPU/gfx11_asm_vop3.s

	v_div_fmas_f64 v[5:6], -|src_scc|, -1, 0.5 mul:4			v_div_fmas_f64 v[5:6], -|src_scc|, -1, 0.5 mul:4
	// GFX11: encoding: [0x05,0x01,0x38,0xd6,0xfd,0x82,0xc1,0x33]			// GFX11: encoding: [0x05,0x01,0x38,0xd6,0xfd,0x82,0xc1,0x33]

	v_div_fmas_f64 v[254:255], 0xaf123456, null, -1 clamp div:2			v_div_fmas_f64 v[254:255], 0xaf123456, null, -1 clamp div:2
	// GFX11: encoding: [0xfe,0x80,0x38,0xd6,0xff,0xf8,0x04,0x1b,0x56,0x34,0x12,0xaf]			// GFX11: encoding: [0xfe,0x80,0x38,0xd6,0xff,0xf8,0x04,0x1b,0x56,0x34,0x12,0xaf]

	v_div_scale_f32 v5, vcc_lo, v1, v2, s3			v_div_scale_f32 v5, vcc_lo, v1, v2, s3
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0x05,0x0e,0x00]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0x05,0x0e,0x00]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, v255, s2, s105			v_div_scale_f32 v5, vcc_lo, v255, s2, s105
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xff,0x05,0xa4,0x01]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xff,0x05,0xa4,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, s1, v255, exec_hi			v_div_scale_f32 v5, vcc_lo, s1, v255, exec_hi
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0xfe,0xff,0x01]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0xfe,0xff,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, s105, s105, exec_lo			v_div_scale_f32 v5, vcc_lo, s105, s105, exec_lo
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x69,0xd2,0xf8,0x01]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x69,0xd2,0xf8,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, vcc_lo, ttmp15, v3			v_div_scale_f32 v5, vcc_lo, vcc_lo, ttmp15, v3
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x6a,0xf6,0x0c,0x04]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x6a,0xf6,0x0c,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, vcc_hi, 0xaf123456, v255			v_div_scale_f32 v5, vcc_lo, vcc_hi, 0xaf123456, v255
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x6b,0xfe,0xfd,0x07,0x56,0x34,0x12,0xaf]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x6b,0xfe,0xfd,0x07,0x56,0x34,0x12,0xaf]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, -ttmp15, -src_scc, -ttmp15			v_div_scale_f32 v5, vcc_lo, -ttmp15, -src_scc, -ttmp15
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7b,0xfa,0xed,0xe1]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7b,0xfa,0xed,0xe1]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, m0, 0.5, m0			v_div_scale_f32 v5, vcc_lo, m0, 0.5, m0
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7d,0xe0,0xf5,0x01]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7d,0xe0,0xf5,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, exec_lo, -1, vcc_hi			v_div_scale_f32 v5, vcc_lo, exec_lo, -1, vcc_hi
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7e,0x82,0xad,0x01]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7e,0x82,0xad,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, -exec_hi, null, -vcc_lo			v_div_scale_f32 v5, vcc_lo, -exec_hi, null, -vcc_lo
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7f,0xf8,0xa8,0xa1]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7f,0xf8,0xa8,0xa1]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, null, exec_lo, neg(0xaf123456)			v_div_scale_f32 v5, vcc_lo, null, exec_lo, neg(0xaf123456)
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7c,0xfc,0xfc,0x83,0x56,0x34,0x12,0xaf]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0x7c,0xfc,0xfc,0x83,0x56,0x34,0x12,0xaf]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, -1, -exec_hi, -src_scc			v_div_scale_f32 v5, vcc_lo, -1, -exec_hi, -src_scc
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xc1,0xfe,0xf4,0xc3]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xc1,0xfe,0xf4,0xc3]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, 0.5, -m0, 0.5 mul:2			v_div_scale_f32 v5, vcc_lo, 0.5, -m0, 0.5 mul:2
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xf0,0xfa,0xc0,0x4b]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xf0,0xfa,0xc0,0x4b]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc_lo, -src_scc, vcc_lo, -1 mul:4			v_div_scale_f32 v5, vcc_lo, -src_scc, vcc_lo, -1 mul:4
	// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xfd,0xd4,0x04,0x33]			// W32: encoding: [0x05,0x6a,0xfc,0xd6,0xfd,0xd4,0x04,0x33]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v255, vcc_lo, neg(0xaf123456), -vcc_hi, null clamp div:2			v_div_scale_f32 v255, vcc_lo, neg(0xaf123456), -vcc_hi, null clamp div:2
	// W32: encoding: [0xff,0xea,0xfc,0xd6,0xff,0xd6,0xf0,0x79,0x56,0x34,0x12,0xaf]			// W32: encoding: [0xff,0xea,0xfc,0xd6,0xff,0xd6,0xf0,0x79,0x56,0x34,0x12,0xaf]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, v1, v2, s3			v_div_scale_f32 v5, vcc, v1, v2, s3
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0x05,0x0e,0x00]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0x05,0x0e,0x00]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, v255, s2, s105			v_div_scale_f32 v5, vcc, v255, s2, s105
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xff,0x05,0xa4,0x01]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xff,0x05,0xa4,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, s1, v255, exec_hi			v_div_scale_f32 v5, vcc, s1, v255, exec_hi
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0xfe,0xff,0x01]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x01,0xfe,0xff,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, s105, s105, exec_lo			v_div_scale_f32 v5, vcc, s105, s105, exec_lo
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x69,0xd2,0xf8,0x01]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x69,0xd2,0xf8,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, vcc_lo, ttmp15, v3			v_div_scale_f32 v5, vcc, vcc_lo, ttmp15, v3
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x6a,0xf6,0x0c,0x04]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x6a,0xf6,0x0c,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, vcc_hi, 0xaf123456, v255			v_div_scale_f32 v5, vcc, vcc_hi, 0xaf123456, v255
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x6b,0xfe,0xfd,0x07,0x56,0x34,0x12,0xaf]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x6b,0xfe,0xfd,0x07,0x56,0x34,0x12,0xaf]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, -ttmp15, -src_scc, -ttmp15			v_div_scale_f32 v5, vcc, -ttmp15, -src_scc, -ttmp15
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7b,0xfa,0xed,0xe1]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7b,0xfa,0xed,0xe1]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, m0, 0.5, m0			v_div_scale_f32 v5, vcc, m0, 0.5, m0
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7d,0xe0,0xf5,0x01]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7d,0xe0,0xf5,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, exec_lo, -1, vcc_hi			v_div_scale_f32 v5, vcc, exec_lo, -1, vcc_hi
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7e,0x82,0xad,0x01]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7e,0x82,0xad,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, -exec_hi, null, -vcc_lo			v_div_scale_f32 v5, vcc, -exec_hi, null, -vcc_lo
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7f,0xf8,0xa8,0xa1]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7f,0xf8,0xa8,0xa1]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, null, exec_lo, neg(0xaf123456)			v_div_scale_f32 v5, vcc, null, exec_lo, neg(0xaf123456)
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7c,0xfc,0xfc,0x83,0x56,0x34,0x12,0xaf]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0x7c,0xfc,0xfc,0x83,0x56,0x34,0x12,0xaf]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, -1, -exec_hi, -src_scc			v_div_scale_f32 v5, vcc, -1, -exec_hi, -src_scc
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xc1,0xfe,0xf4,0xc3]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xc1,0xfe,0xf4,0xc3]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, 0.5, -m0, 0.5 mul:2			v_div_scale_f32 v5, vcc, 0.5, -m0, 0.5 mul:2
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xf0,0xfa,0xc0,0x4b]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xf0,0xfa,0xc0,0x4b]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v5, vcc, -src_scc, vcc_lo, -1 mul:4			v_div_scale_f32 v5, vcc, -src_scc, vcc_lo, -1 mul:4
	// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xfd,0xd4,0x04,0x33]			// W64: encoding: [0x05,0x6a,0xfc,0xd6,0xfd,0xd4,0x04,0x33]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f32 v255, vcc, neg(0xaf123456), -vcc_hi, null clamp div:2			v_div_scale_f32 v255, vcc, neg(0xaf123456), -vcc_hi, null clamp div:2
	// W64: encoding: [0xff,0xea,0xfc,0xd6,0xff,0xd6,0xf0,0x79,0x56,0x34,0x12,0xaf]			// W64: encoding: [0xff,0xea,0xfc,0xd6,0xff,0xd6,0xf0,0x79,0x56,0x34,0x12,0xaf]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, v[1:2], v[2:3], v[3:4]			v_div_scale_f64 v[5:6], vcc_lo, v[1:2], v[2:3], v[3:4]
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x01,0x05,0x0e,0x04]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x01,0x05,0x0e,0x04]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, v[254:255], v[254:255], s[6:7]			v_div_scale_f64 v[5:6], vcc_lo, v[254:255], v[254:255], s[6:7]
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xfe,0xfd,0x1b,0x00]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xfe,0xfd,0x1b,0x00]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, s[2:3], s[4:5], v[254:255]			v_div_scale_f64 v[5:6], vcc_lo, s[2:3], s[4:5], v[254:255]
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x02,0x08,0xf8,0x07]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x02,0x08,0xf8,0x07]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, -s[104:105], s[104:105], -s[104:105]			v_div_scale_f64 v[5:6], vcc_lo, -s[104:105], s[104:105], -s[104:105]
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x68,0xd0,0xa0,0xa1]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x68,0xd0,0xa0,0xa1]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, vcc, -ttmp[14:15], -ttmp[14:15]			v_div_scale_f64 v[5:6], vcc_lo, vcc, -ttmp[14:15], -ttmp[14:15]
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x6a,0xf4,0xe8,0xc1]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x6a,0xf4,0xe8,0xc1]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, -ttmp[14:15], 0xaf123456, null			v_div_scale_f64 v[5:6], vcc_lo, -ttmp[14:15], 0xaf123456, null
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x7a,0xfe,0xf1,0x21,0x56,0x34,0x12,0xaf]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x7a,0xfe,0xf1,0x21,0x56,0x34,0x12,0xaf]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, -exec, -src_scc, -exec			v_div_scale_f64 v[5:6], vcc_lo, -exec, -src_scc, -exec
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x7e,0xfa,0xf9,0xe1]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x7e,0xfa,0xf9,0xe1]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, null, 0.5, vcc			v_div_scale_f64 v[5:6], vcc_lo, null, 0.5, vcc
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x7c,0xe0,0xa9,0x01]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0x7c,0xe0,0xa9,0x01]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, -1, -1, 0xaf123456			v_div_scale_f64 v[5:6], vcc_lo, -1, -1, 0xaf123456
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xc1,0x82,0xfd,0x03,0x56,0x34,0x12,0xaf]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xc1,0x82,0xfd,0x03,0x56,0x34,0x12,0xaf]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, 0.5, null, -src_scc mul:2			v_div_scale_f64 v[5:6], vcc_lo, 0.5, null, -src_scc mul:2
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xf0,0xf8,0xf4,0x8b]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xf0,0xf8,0xf4,0x8b]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc_lo, -src_scc, -exec, 0.5 mul:4			v_div_scale_f64 v[5:6], vcc_lo, -src_scc, -exec, 0.5 mul:4
	// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xfd,0xfc,0xc0,0x73]			// W32: encoding: [0x05,0x6a,0xfd,0xd6,0xfd,0xfc,0xc0,0x73]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[254:255], vcc_lo, 0xaf123456, -vcc, -1 clamp div:2			v_div_scale_f64 v[254:255], vcc_lo, 0xaf123456, -vcc, -1 clamp div:2
	// W32: encoding: [0xfe,0xea,0xfd,0xd6,0xff,0xd4,0x04,0x5b,0x56,0x34,0x12,0xaf]			// W32: encoding: [0xfe,0xea,0xfd,0xd6,0xff,0xd4,0x04,0x5b,0x56,0x34,0x12,0xaf]
	// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, v[1:2], v[2:3], v[3:4]			v_div_scale_f64 v[5:6], vcc, v[1:2], v[2:3], v[3:4]
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x01,0x05,0x0e,0x04]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x01,0x05,0x0e,0x04]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, v[254:255], v[254:255], s[6:7]			v_div_scale_f64 v[5:6], vcc, v[254:255], v[254:255], s[6:7]
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xfe,0xfd,0x1b,0x00]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xfe,0xfd,0x1b,0x00]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, s[2:3], s[4:5], v[254:255]			v_div_scale_f64 v[5:6], vcc, s[2:3], s[4:5], v[254:255]
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x02,0x08,0xf8,0x07]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x02,0x08,0xf8,0x07]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, -s[104:105], s[104:105], -s[104:105]			v_div_scale_f64 v[5:6], vcc, -s[104:105], s[104:105], -s[104:105]
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x68,0xd0,0xa0,0xa1]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x68,0xd0,0xa0,0xa1]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, vcc, -ttmp[14:15], -ttmp[14:15]			v_div_scale_f64 v[5:6], vcc, vcc, -ttmp[14:15], -ttmp[14:15]
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x6a,0xf4,0xe8,0xc1]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x6a,0xf4,0xe8,0xc1]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, -ttmp[14:15], 0xaf123456, null			v_div_scale_f64 v[5:6], vcc, -ttmp[14:15], 0xaf123456, null
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x7a,0xfe,0xf1,0x21,0x56,0x34,0x12,0xaf]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x7a,0xfe,0xf1,0x21,0x56,0x34,0x12,0xaf]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, -exec, -src_scc, -exec			v_div_scale_f64 v[5:6], vcc, -exec, -src_scc, -exec
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x7e,0xfa,0xf9,0xe1]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x7e,0xfa,0xf9,0xe1]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, null, 0.5, vcc			v_div_scale_f64 v[5:6], vcc, null, 0.5, vcc
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x7c,0xe0,0xa9,0x01]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0x7c,0xe0,0xa9,0x01]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, -1, -1, 0xaf123456			v_div_scale_f64 v[5:6], vcc, -1, -1, 0xaf123456
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xc1,0x82,0xfd,0x03,0x56,0x34,0x12,0xaf]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xc1,0x82,0xfd,0x03,0x56,0x34,0x12,0xaf]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, 0.5, null, -src_scc mul:2			v_div_scale_f64 v[5:6], vcc, 0.5, null, -src_scc mul:2
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xf0,0xf8,0xf4,0x8b]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xf0,0xf8,0xf4,0x8b]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[5:6], vcc, -src_scc, -exec, 0.5 mul:4			v_div_scale_f64 v[5:6], vcc, -src_scc, -exec, 0.5 mul:4
	// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xfd,0xfc,0xc0,0x73]			// W64: encoding: [0x05,0x6a,0xfd,0xd6,0xfd,0xfc,0xc0,0x73]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[254:255], vcc, 0xaf123456, -vcc, -1 clamp div:2			v_div_scale_f64 v[254:255], vcc, 0xaf123456, -vcc, -1 clamp div:2
	// W64: encoding: [0xfe,0xea,0xfd,0xd6,0xff,0xd4,0x04,0x5b,0x56,0x34,0x12,0xaf]			// W64: encoding: [0xfe,0xea,0xfd,0xd6,0xff,0xd4,0x04,0x5b,0x56,0x34,0x12,0xaf]
	// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction			// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: operands are not valid for this GPU or mode

	v_dot2_bf16_bf16 v5, v1, v2, s3			v_dot2_bf16_bf16 v5, v1, v2, s3
	// GFX11: encoding: [0x05,0x00,0x67,0xd6,0x01,0x05,0x0e,0x00]			// GFX11: encoding: [0x05,0x00,0x67,0xd6,0x01,0x05,0x0e,0x00]

	v_dot2_bf16_bf16 v5, v255, v255, s105			v_dot2_bf16_bf16 v5, v255, v255, s105
	// GFX11: encoding: [0x05,0x00,0x67,0xd6,0xff,0xff,0xa7,0x01]			// GFX11: encoding: [0x05,0x00,0x67,0xd6,0xff,0xff,0xa7,0x01]

	v_dot2_bf16_bf16 v5, s1, s2, v3			v_dot2_bf16_bf16 v5, s1, s2, v3

llvm/test/MC/AMDGPU/vop3.s

	// SICI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x0a,0xc8,0xd2,0x02,0x0b,0x02,0x30]			// SICI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x0a,0xc8,0xd2,0x02,0x0b,0x02,0x30]
	// VI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x82,0x80,0xd2,0x02,0x0b,0x02,0x30]			// VI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x82,0x80,0xd2,0x02,0x0b,0x02,0x30]

	v_add_f64_e64 v[0:1], -v[2:3], abs(v[5:6]) clamp mul:4			v_add_f64_e64 v[0:1], -v[2:3], abs(v[5:6]) clamp mul:4
	// SICI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x0a,0xc8,0xd2,0x02,0x0b,0x02,0x30]			// SICI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x0a,0xc8,0xd2,0x02,0x0b,0x02,0x30]
	// VI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x82,0x80,0xd2,0x02,0x0b,0x02,0x30]			// VI: v_add_f64 v[0:1], -v[2:3], \|v[5:6]\| clamp mul:4 ; encoding: [0x00,0x82,0x80,0xd2,0x02,0x0b,0x02,0x30]

	v_div_scale_f64 v[24:25], vcc, v[22:23], v[22:23], v[20:21]			v_div_scale_f64 v[24:25], vcc, v[22:23], v[22:23], v[20:21]
	// SICI: v_div_scale_f64 v[24:25], vcc, v[22:23], v[22:23], v[20:21] ; encoding: [0x18,0x6a,0xdc,0xd2,0x16,0x2d,0x52,0x04]			// SICI: v_div_scale_f64 v[24:25], vcc, v[22:23], v[22:23], v[20:21] ; encoding: [0x18,0x6a,0xdc,0xd2,0x16,0x2d,0x52,0x04]
				rampitec (Done): Ditto.
	// VI: v_div_scale_f64 v[24:25], vcc, v[22:23], v[22:23], v[20:21] ; encoding: [0x18,0x6a,0xe1,0xd1,0x16,0x2d,0x52,0x04]			// VI: v_div_scale_f64 v[24:25], vcc, v[22:23], v[22:23], v[20:21] ; encoding: [0x18,0x6a,0xe1,0xd1,0x16,0x2d,0x52,0x04]

	v_div_scale_f64 v[24:25], s[10:11], -v[22:23], v[20:21], v[20:21] clamp			v_div_scale_f64 v[24:25], vcc, -v[22:23], v[20:21], v[20:21] clamp
	// SICI: v_div_scale_f64 v[24:25], s[10:11], -v[22:23], v[20:21], v[20:21] clamp ; encoding: [0x18,0x0a,0xdc,0xd2,0x16,0x29,0x52,0x24]			// SICI: v_div_scale_f64 v[24:25], vcc, -v[22:23], v[20:21], v[20:21] clamp ; encoding: [0x18,0x6a,0xdc,0xd2,0x16,0x29,0x52,0x24]
	// VI: v_div_scale_f64 v[24:25], s[10:11], -v[22:23], v[20:21], v[20:21] clamp ; encoding: [0x18,0x8a,0xe1,0xd1,0x16,0x29,0x52,0x24]			// VI: v_div_scale_f64 v[24:25], vcc, -v[22:23], v[20:21], v[20:21] clamp ; encoding: [0x18,0xea,0xe1,0xd1,0x16,0x29,0x52,0x24]

	v_div_scale_f64 v[24:25], s[10:11], v[22:23], -v[20:21], v[20:21] clamp mul:2			v_div_scale_f64 v[24:25], vcc, v[22:23], -v[20:21], v[20:21] clamp mul:2
	// SICI: v_div_scale_f64 v[24:25], s[10:11], v[22:23], -v[20:21], v[20:21] clamp mul:2 ; encoding: [0x18,0x0a,0xdc,0xd2,0x16,0x29,0x52,0x4c]			// SICI: v_div_scale_f64 v[24:25], vcc, v[22:23], -v[20:21], v[20:21] clamp mul:2 ; encoding: [0x18,0x6a,0xdc,0xd2,0x16,0x29,0x52,0x4c]
	// VI: v_div_scale_f64 v[24:25], s[10:11], v[22:23], -v[20:21], v[20:21] clamp mul:2 ; encoding: [0x18,0x8a,0xe1,0xd1,0x16,0x29,0x52,0x4c]			// VI: v_div_scale_f64 v[24:25], vcc, v[22:23], -v[20:21], v[20:21] clamp mul:2 ; encoding: [0x18,0xea,0xe1,0xd1,0x16,0x29,0x52,0x4c]

	v_div_scale_f64 v[24:25], s[10:11], v[22:23], v[20:21], -v[20:21]			v_div_scale_f64 v[24:25], vcc, v[22:23], v[20:21], -v[20:21]
	// SICI: v_div_scale_f64 v[24:25], s[10:11], v[22:23], v[20:21], -v[20:21] ; encoding: [0x18,0x0a,0xdc,0xd2,0x16,0x29,0x52,0x84]			// SICI: v_div_scale_f64 v[24:25], vcc, v[22:23], v[20:21], -v[20:21] ; encoding: [0x18,0x6a,0xdc,0xd2,0x16,0x29,0x52,0x84]
	// VI: v_div_scale_f64 v[24:25], s[10:11], v[22:23], v[20:21], -v[20:21] ; encoding: [0x18,0x0a,0xe1,0xd1,0x16,0x29,0x52,0x84]			// VI: v_div_scale_f64 v[24:25], vcc, v[22:23], v[20:21], -v[20:21] ; encoding: [0x18,0x6a,0xe1,0xd1,0x16,0x29,0x52,0x84]

	v_div_scale_f32 v24, vcc, v22, v22, v20			v_div_scale_f32 v24, vcc, v22, v22, v20
	// SICI: v_div_scale_f32 v24, vcc, v22, v22, v20 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x04]			// SICI: v_div_scale_f32 v24, vcc, v22, v22, v20 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x04]
	// VI: v_div_scale_f32 v24, vcc, v22, v22, v20 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0x2d,0x52,0x04]			// VI: v_div_scale_f32 v24, vcc, v22, v22, v20 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0x2d,0x52,0x04]

	v_div_scale_f32 v24, vcc, -v22, v22, v20			v_div_scale_f32 v24, vcc, -v22, v22, v20
	// SICI: v_div_scale_f32 v24, vcc, -v22, v22, v20 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x24]			// SICI: v_div_scale_f32 v24, vcc, -v22, v22, v20 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x24]
	// VI: v_div_scale_f32 v24, vcc, -v22, v22, v20 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0x2d,0x52,0x24]			// VI: v_div_scale_f32 v24, vcc, -v22, v22, v20 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0x2d,0x52,0x24]

	v_div_scale_f32 v24, vcc, v22, -v22, v20 clamp			v_div_scale_f32 v24, vcc, v22, -v22, v20 clamp
	// SICI: v_div_scale_f32 v24, vcc, v22, -v22, v20 clamp ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x44]			// SICI: v_div_scale_f32 v24, vcc, v22, -v22, v20 clamp ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x44]
	// VI: v_div_scale_f32 v24, vcc, v22, -v22, v20 clamp ; encoding: [0x18,0xea,0xe0,0xd1,0x16,0x2d,0x52,0x44]			// VI: v_div_scale_f32 v24, vcc, v22, -v22, v20 clamp ; encoding: [0x18,0xea,0xe0,0xd1,0x16,0x2d,0x52,0x44]

	v_div_scale_f32 v24, vcc, v22, v22, -v20 clamp div:2			v_div_scale_f32 v24, vcc, v22, v22, -v20 clamp div:2
	// SICI: v_div_scale_f32 v24, vcc, v22, v22, -v20 clamp div:2 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x9c]			// SICI: v_div_scale_f32 v24, vcc, v22, v22, -v20 clamp div:2 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x9c]
	// VI: v_div_scale_f32 v24, vcc, v22, v22, -v20 clamp div:2 ; encoding: [0x18,0xea,0xe0,0xd1,0x16,0x2d,0x52,0x9c]			// VI: v_div_scale_f32 v24, vcc, v22, v22, -v20 clamp div:2 ; encoding: [0x18,0xea,0xe0,0xd1,0x16,0x2d,0x52,0x9c]

	v_div_scale_f32 v24, s[10:11], v22, v22, v20			v_div_scale_f32 v24, vcc, v22, v22, v20
	// SICI: v_div_scale_f32 v24, s[10:11], v22, v22, v20 ; encoding: [0x18,0x0a,0xda,0xd2,0x16,0x2d,0x52,0x04]			// SICI: v_div_scale_f32 v24, vcc, v22, v22, v20 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0x52,0x04]
	// VI: v_div_scale_f32 v24, s[10:11], v22, v22, v20 ; encoding: [0x18,0x0a,0xe0,0xd1,0x16,0x2d,0x52,0x04]			// VI: v_div_scale_f32 v24, vcc, v22, v22, v20 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0x2d,0x52,0x04]

	v_div_scale_f32 v24, vcc, v22, 1.0, v22			v_div_scale_f32 v24, vcc, v22, 1.0, v22
	// SICI: v_div_scale_f32 v24, vcc, v22, 1.0, v22 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0xe5,0x59,0x04]			// SICI: v_div_scale_f32 v24, vcc, v22, 1.0, v22 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0xe5,0x59,0x04]
	// VI: v_div_scale_f32 v24, vcc, v22, 1.0, v22 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0xe5,0x59,0x04]			// VI: v_div_scale_f32 v24, vcc, v22, 1.0, v22 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0xe5,0x59,0x04]

	v_div_scale_f32 v24, vcc, v22, v22, -2.0			v_div_scale_f32 v24, vcc, v22, v22, -2.0
	// SICI: v_div_scale_f32 v24, vcc, v22, v22, -2.0 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0xd6,0x03]			// SICI: v_div_scale_f32 v24, vcc, v22, v22, -2.0 ; encoding: [0x18,0x6a,0xda,0xd2,0x16,0x2d,0xd6,0x03]
	// VI: v_div_scale_f32 v24, vcc, v22, v22, -2.0 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0x2d,0xd6,0x03]			// VI: v_div_scale_f32 v24, vcc, v22, v22, -2.0 ; encoding: [0x18,0x6a,0xe0,0xd1,0x16,0x2d,0xd6,0x03]

llvm/test/MC/AMDGPU/wave32.s

	v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo			v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo
	// GFX1032: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]			// GFX1032: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]
	// GFX1064-ERR: :[[@LINE-2]]:25: error: invalid operand for instruction			// GFX1064-ERR: :[[@LINE-2]]:25: error: invalid operand for instruction

	v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc			v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc
	// GFX1032-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction			// GFX1032-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
	// GFX1064: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]			// GFX1064: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]

	v_div_scale_f32 v2, s2, v0, v0, v2			v_div_scale_f32 v2, vcc_lo, v0, v0, v2
	// GFX1032: v_div_scale_f32 v2, s2, v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]			// GFX1032: v_div_scale_f32 v2, vcc_lo, v0, v0, v2 ; encoding: [0x02,0x6a,0x6d,0xd5,0x00,0x01,0x0a,0x04]
	// GFX1064-ERR: :[[@LINE-2]]:21: error: invalid operand for instruction			// GFX1064-ERR: :[[@LINE-2]]:1: error: operands are not valid for this GPU or mode
				rampitec (Done): Please keep the order of the checks so it is easy to see the changes.
				Pierre-vh (author, Done): I didn't change the order. Do you mean the -ERR check should always be at the bottom (e.g. change GFX1064 to GFX1032 instead of adding the -ERR suffix and checking the error message)?
				rampitec (Not Done): It was a wave32 test. Now it is a wave64 test, and the wave32 test is below. They are swapped.

	v_div_scale_f32 v2, s[2:3], v0, v0, v2			v_div_scale_f32 v2, vcc, v0, v0, v2
	// GFX1032-ERR: :[[@LINE-1]]:21: error: invalid operand for instruction			// GFX1032-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
	// GFX1064: v_div_scale_f32 v2, s[2:3], v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]			// GFX1064: v_div_scale_f32 v2, vcc, v0, v0, v2 ; encoding: [0x02,0x6a,0x6d,0xd5,0x00,0x01,0x0a,0x04]

	v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3]			v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], v[2:3]
	// GFX1032: v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]			// GFX1032: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x6a,0x6e,0xd5,0x00,0x01,0x0a,0x04]
	// GFX1064-ERR: :[[@LINE-2]]:25: error: invalid operand for instruction			// GFX1064-ERR: :[[@LINE-2]]:1: error: operands are not valid for this GPU or mode

	v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3]			v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], v[2:3]
	// GFX1032-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction			// GFX1032-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
	// GFX1064: v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]			// GFX1064: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x6a,0x6e,0xd5,0x00,0x01,0x0a,0x04]

	v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]			v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]
	// GFX1032: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]			// GFX1032: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]
	// GFX1064-ERR: :[[@LINE-2]]:23: error: invalid operand for instruction			// GFX1064-ERR: :[[@LINE-2]]:23: error: invalid operand for instruction

	v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]			v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]
	// GFX1032-ERR: :[[@LINE-1]]:23: error: invalid operand for instruction			// GFX1032-ERR: :[[@LINE-1]]:23: error: invalid operand for instruction
	// GFX1064: v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]			// GFX1064: v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]

llvm/test/MC/AMDGPU/wave_any.s

	// GFX10: v_add_co_ci_u32_e64 v4, vcc, v1, v5, s[2:3] ; encoding: [0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00]			// GFX10: v_add_co_ci_u32_e64 v4, vcc, v1, v5, s[2:3] ; encoding: [0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00]

	v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo			v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo
	// GFX10: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]			// GFX10: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]

	v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc			v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc
	// GFX10: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]			// GFX10: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]

	v_div_scale_f32 v2, s2, v0, v0, v2			v_div_scale_f32 v2, vcc_lo, v0, v0, v2
	// GFX10: v_div_scale_f32 v2, s2, v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]			// GFX10: v_div_scale_f32 v2, vcc, v0, v0, v2 ; encoding: [0x02,0x6a,0x6d,0xd5,0x00,0x01,0x0a,0x04]
				rampitec (Done): Ditto.
				Pierre-vh (author, Done): Is the check order the issue or the encoding? Encoding looks good to me.

	v_div_scale_f32 v2, s[2:3], v0, v0, v2			v_div_scale_f32 v2, vcc, v0, v0, v2
	// GFX10: v_div_scale_f32 v2, s[2:3], v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]			// GFX10: v_div_scale_f32 v2, vcc, v0, v0, v2 ; encoding: [0x02,0x6a,0x6d,0xd5,0x00,0x01,0x0a,0x04]

	v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3]			v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], v[2:3]
	rampitec (Done): It seems the intent of the test was to check that both wave32 and wave64 versions are accepted with both attributes set. This is lost now.
	// GFX10: v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]			// GFX10: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x6a,0x6e,0xd5,0x00,0x01,0x0a,0x04]

	v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3]			v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], v[2:3]
	// GFX10: v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]			// GFX10: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x6a,0x6e,0xd5,0x00,0x01,0x0a,0x04]

	v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]			v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]
	// GFX10: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]			// GFX10: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]

	v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]			v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]
	// GFX10: v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]			// GFX10: v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]

	v_mad_u64_u32 v[0:1], s6, v0, v1, v[2:3]			v_mad_u64_u32 v[0:1], s6, v0, v1, v[2:3]

llvm/test/MC/Disassembler/AMDGPU/gfx10-wave32.txt

	# GFX1032: v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s2			# GFX1032: v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s2
	# GFX1064: v_add_co_ci_u32_e64 v4, vcc, v1, v5, s[2:3]			# GFX1064: v_add_co_ci_u32_e64 v4, vcc, v1, v5, s[2:3]
	0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00			0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00

	# GFX1032: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo			# GFX1032: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo
	# GFX1064: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ;			# GFX1064: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ;
	0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01			0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01

	# GFX1032: v_div_scale_f32 v2, s2, v0, v0, v2			# GFX1032: v_div_scale_f32 v2, vcc_lo, v0, v0, v2
	# GFX1064: v_div_scale_f32 v2, s[2:3], v0, v0, v2			# GFX1064: v_div_scale_f32 v2, vcc, v0, v0, v2
	0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04			0x02,0x6a,0x6d,0xd5,0x00,0x01,0x0a,0x04

	# GFX1032: v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3]			# GFX1032: v_div_scale_f64 v[2:3], vcc_lo, v[0:1], v[0:1], v[2:3]
	# GFX1064: v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3]			# GFX1064: v_div_scale_f64 v[2:3], vcc, v[0:1], v[0:1], v[2:3]
	0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04			0x02,0x6a,0x6e,0xd5,0x00,0x01,0x0a,0x04

	# GFX1032: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]			# GFX1032: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]
	# GFX1064: v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]			# GFX1064: v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]
	0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04			0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04

	# GFX1032: v_mad_u64_u32 v[0:1], s6, v0, v1, v[2:3]			# GFX1032: v_mad_u64_u32 v[0:1], s6, v0, v1, v[2:3]
	# GFX1064: v_mad_u64_u32 v[0:1], s[6:7], v0, v1, v[2:3]			# GFX1064: v_mad_u64_u32 v[0:1], s[6:7], v0, v1, v[2:3]
	0x00,0x06,0x76,0xd5,0x00,0x03,0x0a,0x04			0x00,0x06,0x76,0xd5,0x00,0x03,0x0a,0x04
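
A note for readers comparing the old and new v_div_scale encodings in these hunks: the only byte that changes is the second one (0x02 or 0x0a becomes 0x6a, and 0x8a becomes 0xea when clamp is set). The Python sketch below is not part of the patch. It is a reading aid that assumes bits [6:0] of that byte hold SDST and bit 7 holds clamp, which is what the VI and GFX10/GFX11 check lines here suggest (the SICI lines do not follow this clamp placement). `decode_sdst_byte` is a hypothetical helper, and 106/107 are taken as the operand codes for vcc_lo/vcc_hi because those are the bytes the GFX11 tests show for vcc_lo/vcc_hi as source operands.

```python
from typing import Tuple

# Assumed scalar operand codes, inferred from the test encodings in this diff.
VCC_LO = 0x6A  # 106
VCC_HI = 0x6B  # 107
MAX_SGPR = 105  # s0..s105 appear as plain numbers in the tests


def decode_sdst_byte(byte1: int) -> Tuple[str, bool]:
    """Split byte 1 of a VOP3B encoding into (sdst name, clamp flag).

    Assumption: bits [6:0] are SDST and bit 7 is clamp. This matches the
    VI/GFX10/GFX11 lines in the diff but is not taken from the ISA manual.
    """
    sdst = byte1 & 0x7F
    clamp = bool(byte1 & 0x80)
    if sdst == VCC_LO:
        name = "vcc"  # printed as vcc_lo in wave32 mode
    elif sdst == VCC_HI:
        name = "vcc_hi"
    elif sdst <= MAX_SGPR:
        name = f"s{sdst}"  # wave64 prints this as a pair, e.g. s[2:3]
    else:
        name = f"special({sdst})"  # other operand codes are not modeled here
    return name, clamp


if __name__ == "__main__":
    for b in (0x02, 0x6A, 0x8A, 0xEA):
        print(hex(b), decode_sdst_byte(b))
```

Under this reading, 0x02 is s2, 0x6a is vcc, and 0xea is vcc with clamp, which lines up with the left and right columns of the v_div_scale checks.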

llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt

	# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -disassemble -show-encoding < %s \| FileCheck -strict-whitespace -check-prefixes=GFX10,W32 %s			# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -disassemble -show-encoding < %s \| FileCheck -check-prefixes=GFX10,W32 %s
	# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -disassemble -show-encoding < %s \| FileCheck -strict-whitespace -check-prefixes=GFX10,W64 %s			# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -disassemble -show-encoding < %s \| FileCheck -check-prefixes=GFX10,W64 %s


	# GFX10: v_add3_u32 v255, v1, v2, v3 ; encoding: [0xff,0x00,0x6d,0xd7,0x01,0x05,0x0e,0x04]			# GFX10: v_add3_u32 v255, v1, v2, v3 ; encoding: [0xff,0x00,0x6d,0xd7,0x01,0x05,0x0e,0x04]
	0xff,0x00,0x6d,0xd7,0x01,0x05,0x0e,0x04			0xff,0x00,0x6d,0xd7,0x01,0x05,0x0e,0x04

	# GFX10: v_add3_u32 v5, -1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd7,0xc1,0x04,0x0e,0x04]			# GFX10: v_add3_u32 v5, -1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd7,0xc1,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd7,0xc1,0x04,0x0e,0x04			0x05,0x00,0x6d,0xd7,0xc1,0x04,0x0e,0x04

	0x05,0x00,0x70,0xd5,0xfe,0x05,0x0e,0x04			0x05,0x00,0x70,0xd5,0xfe,0x05,0x0e,0x04

	# GFX10: v_div_fmas_f64 v[5:6], |v[1:2]|, v[2:3], v[3:4] ; encoding: [0x05,0x01,0x70,0xd5,0x01,0x05,0x0e,0x04]			# GFX10: v_div_fmas_f64 v[5:6], |v[1:2]|, v[2:3], v[3:4] ; encoding: [0x05,0x01,0x70,0xd5,0x01,0x05,0x0e,0x04]
	0x05,0x01,0x70,0xd5,0x01,0x05,0x0e,0x04			0x05,0x01,0x70,0xd5,0x01,0x05,0x0e,0x04

	# GFX10: v_div_fmas_f64 v[5:6], |v[1:2]|, |v[2:3]|, |v[3:4]| ; encoding: [0x05,0x07,0x70,0xd5,0x01,0x05,0x0e,0x04]			# GFX10: v_div_fmas_f64 v[5:6], |v[1:2]|, |v[2:3]|, |v[3:4]| ; encoding: [0x05,0x07,0x70,0xd5,0x01,0x05,0x0e,0x04]
	0x05,0x07,0x70,0xd5,0x01,0x05,0x0e,0x04			0x05,0x07,0x70,0xd5,0x01,0x05,0x0e,0x04

	# W32: v_div_scale_f32 v255, s0, v1, v2, v3 ; encoding: [0xff,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			# W32: v_div_scale_f32 v255, vcc_lo, v1, v2, v3 ; encoding: [0xff,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	# W64: v_div_scale_f32 v255, s[0:1], v1, v2, v3 ; encoding: [0xff,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			# W64: v_div_scale_f32 v255, vcc, v1, v2, v3 ; encoding: [0xff,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	0xff,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04			0xff,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, -1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xc1,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, -1, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xc1,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], -1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xc1,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, -1, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xc1,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0xc1,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0xc1,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, -4.0, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xf7,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, -4.0, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xf7,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], -4.0, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xf7,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, -4.0, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xf7,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0xf7,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0xf7,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, 0, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x80,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, 0, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x80,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], 0, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x80,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, 0, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x80,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x80,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x80,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, 0.5, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xf0,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, 0.5, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xf0,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], 0.5, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xf0,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, 0.5, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xf0,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0xf0,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0xf0,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, exec_hi, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x7f,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, exec_hi, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x7f,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], exec_hi, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x7f,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, exec_hi, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x7f,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x7f,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x7f,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, exec_lo, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x7e,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, exec_lo, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x7e,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], exec_lo, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x7e,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, exec_lo, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x7e,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x7e,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x7e,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, m0, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x7c,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, m0, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x7c,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], m0, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x7c,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, m0, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x7c,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x7c,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x7c,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, s1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, s1, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], s1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, s1, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x01,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, s103, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x67,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, s103, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x67,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], s103, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x67,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, s103, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x67,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x67,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x67,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, ttmp11, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x77,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, ttmp11, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x77,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], ttmp11, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x77,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, ttmp11, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x77,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x77,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x77,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, v1, -1, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x83,0x0d,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, -1, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x83,0x0d,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, -1, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x83,0x0d,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, -1, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x83,0x0d,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0x83,0x0d,0x04			0x05,0x6a,0x6d,0xd5,0x01,0x83,0x0d,0x04

	# W32: v_div_scale_f32 v5, s0, v1, -4.0, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0d,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, -4.0, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0d,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, -4.0, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0d,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, -4.0, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0d,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xef,0x0d,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0d,0x04

	# W32: v_div_scale_f32 v5, s0, v1, 0, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x01,0x0d,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, 0, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x01,0x0d,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, 0, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x01,0x0d,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, 0, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x01,0x0d,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0x01,0x0d,0x04			0x05,0x6a,0x6d,0xd5,0x01,0x01,0x0d,0x04

	# W32: v_div_scale_f32 v5, s0, v1, 0.5, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xe1,0x0d,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, 0.5, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xe1,0x0d,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, 0.5, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xe1,0x0d,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, 0.5, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xe1,0x0d,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xe1,0x0d,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xe1,0x0d,0x04

	# W32: v_div_scale_f32 v5, s0, v1, exec_hi, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, exec_hi, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, exec_hi, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, exec_hi, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xff,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v1, exec_lo, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xfd,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, exec_lo, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xfd,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, exec_lo, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xfd,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, exec_lo, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xfd,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xfd,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xfd,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v1, m0, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xf9,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, m0, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xf9,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, m0, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xf9,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, m0, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xf9,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xf9,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xf9,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v1, s103, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xcf,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, s103, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xcf,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, s103, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xcf,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, s103, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xcf,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xcf,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xcf,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v1, s2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, s2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, s2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, s2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v1, ttmp11, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, ttmp11, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, ttmp11, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xef,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, ttmp11, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xef,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xef,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v1, v2, -1 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x06,0x03]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, -1 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x06,0x03]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, -1 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x06,0x03]			# W64: v_div_scale_f32 v5, vcc, v1, v2, -1 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x06,0x03]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0x06,0x03			0x05,0x6a,0x6d,0xd5,0x01,0x05,0x06,0x03

	# W32: v_div_scale_f32 v5, s0, v1, v2, -4.0 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x03]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, -4.0 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x03]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, -4.0 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x03]			# W64: v_div_scale_f32 v5, vcc, v1, v2, -4.0 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x03]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x03			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x03

	# W32: v_div_scale_f32 v5, s0, v1, v2, 0 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x02,0x02]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, 0 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x02,0x02]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, 0 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x02,0x02]			# W64: v_div_scale_f32 v5, vcc, v1, v2, 0 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x02,0x02]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0x02,0x02			0x05,0x6a,0x6d,0xd5,0x01,0x05,0x02,0x02

	# W32: v_div_scale_f32 v5, s0, v1, v2, 0.5 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xc2,0x03]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, 0.5 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xc2,0x03]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, 0.5 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xc2,0x03]			# W64: v_div_scale_f32 v5, vcc, v1, v2, 0.5 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xc2,0x03]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xc2,0x03			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xc2,0x03

	# W32: v_div_scale_f32 v5, s0, v1, v2, exec_hi ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x01]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, exec_hi ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x01]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, exec_hi ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x01]			# W64: v_div_scale_f32 v5, vcc, v1, v2, exec_hi ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x01]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x01			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x01

	# W32: v_div_scale_f32 v5, s0, v1, v2, exec_lo ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfa,0x01]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, exec_lo ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfa,0x01]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, exec_lo ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfa,0x01]			# W64: v_div_scale_f32 v5, vcc, v1, v2, exec_lo ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfa,0x01]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xfa,0x01			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfa,0x01

	# W32: v_div_scale_f32 v5, s0, v1, v2, m0 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xf2,0x01]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, m0 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xf2,0x01]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, m0 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xf2,0x01]			# W64: v_div_scale_f32 v5, vcc, v1, v2, m0 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xf2,0x01]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xf2,0x01			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xf2,0x01

	# W32: v_div_scale_f32 v5, s0, v1, v2, s103 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x9e,0x01]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, s103 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x9e,0x01]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, s103 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x9e,0x01]			# W64: v_div_scale_f32 v5, vcc, v1, v2, s103 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x9e,0x01]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0x9e,0x01			0x05,0x6a,0x6d,0xd5,0x01,0x05,0x9e,0x01

	# W32: v_div_scale_f32 v5, s0, v1, v2, s3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x00]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, s3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x00]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, s3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x00]			# W64: v_div_scale_f32 v5, vcc, v1, v2, s3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x00]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x00			0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x00

	# W32: v_div_scale_f32 v5, s0, v1, v2, ttmp11 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x01]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, ttmp11 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x01]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, ttmp11 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x01]			# W64: v_div_scale_f32 v5, vcc, v1, v2, ttmp11 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x01]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xde,0x01			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xde,0x01

	# W32: v_div_scale_f32 v5, s0, v1, v2, v255 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x07]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, v255 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x07]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, v255 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x07]			# W64: v_div_scale_f32 v5, vcc, v1, v2, v255 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x07]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xfe,0x07			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xfe,0x07

	# W32: v_div_scale_f32 v5, s0, v1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x01,0x05,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, v1, v2, vcc_hi ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xae,0x01]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, vcc_hi ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xae,0x01]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, vcc_hi ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xae,0x01]			# W64: v_div_scale_f32 v5, vcc, v1, v2, vcc_hi ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xae,0x01]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xae,0x01			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xae,0x01

	# W32: v_div_scale_f32 v5, s0, v1, v2, vcc_lo ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xaa,0x01]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v2, vcc_lo ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xaa,0x01]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v2, vcc_lo ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0x05,0xaa,0x01]			# W64: v_div_scale_f32 v5, vcc, v1, v2, vcc_lo ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0x05,0xaa,0x01]
	0x05,0x00,0x6d,0xd5,0x01,0x05,0xaa,0x01			0x05,0x6a,0x6d,0xd5,0x01,0x05,0xaa,0x01

	# W32: v_div_scale_f32 v5, s0, v1, v255, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0f,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, v255, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0f,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, v255, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xff,0x0f,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, v255, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0f,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xff,0x0f,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xff,0x0f,0x04

	# W32: v_div_scale_f32 v5, s0, v1, vcc_hi, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd7,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, vcc_hi, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd7,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, vcc_hi, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd7,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, vcc_hi, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd7,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xd7,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xd7,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v1, vcc_lo, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd5,0x0c,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v1, vcc_lo, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd5,0x0c,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v1, vcc_lo, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x01,0xd5,0x0c,0x04]			# W64: v_div_scale_f32 v5, vcc, v1, vcc_lo, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x01,0xd5,0x0c,0x04]
	0x05,0x00,0x6d,0xd5,0x01,0xd5,0x0c,0x04			0x05,0x6a,0x6d,0xd5,0x01,0xd5,0x0c,0x04

	# W32: v_div_scale_f32 v5, s0, v255, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xff,0x05,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, v255, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xff,0x05,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], v255, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0xff,0x05,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, v255, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0xff,0x05,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0xff,0x05,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0xff,0x05,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, vcc_hi, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x6b,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, vcc_hi, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x6b,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], vcc_hi, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x6b,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, vcc_hi, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x6b,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x6b,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x6b,0x04,0x0e,0x04

	# W32: v_div_scale_f32 v5, s0, vcc_lo, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x6a,0x04,0x0e,0x04]			# W32: v_div_scale_f32 v5, vcc_lo, vcc_lo, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x6a,0x04,0x0e,0x04]
	# W64: v_div_scale_f32 v5, s[0:1], vcc_lo, v2, v3 ; encoding: [0x05,0x00,0x6d,0xd5,0x6a,0x04,0x0e,0x04]			# W64: v_div_scale_f32 v5, vcc, vcc_lo, v2, v3 ; encoding: [0x05,0x6a,0x6d,0xd5,0x6a,0x04,0x0e,0x04]
	0x05,0x00,0x6d,0xd5,0x6a,0x04,0x0e,0x04			0x05,0x6a,0x6d,0xd5,0x6a,0x04,0x0e,0x04

	# GFX10: v_exp_f16_e64 v255, v1 ; encoding: [0xff,0x00,0xd8,0xd5,0x01,0x01,0x00,0x00]			# GFX10: v_exp_f16_e64 v255, v1 ; encoding: [0xff,0x00,0xd8,0xd5,0x01,0x01,0x00,0x00]
	0xff,0x00,0xd8,0xd5,0x01,0x01,0x00,0x00			0xff,0x00,0xd8,0xd5,0x01,0x01,0x00,0x00

	# GFX10: v_exp_f16_e64 v5, -1 ; encoding: [0x05,0x00,0xd8,0xd5,0xc1,0x00,0x00,0x00]			# GFX10: v_exp_f16_e64 v5, -1 ; encoding: [0x05,0x00,0xd8,0xd5,0xc1,0x00,0x00,0x00]
	0x05,0x00,0xd8,0xd5,0xc1,0x00,0x00,0x00			0x05,0x00,0xd8,0xd5,0xc1,0x00,0x00,0x00

	# GFX10: v_exp_f16_e64 v5, -4.0 ; encoding: [0x05,0x00,0xd8,0xd5,0xf7,0x00,0x00,0x00]			# GFX10: v_exp_f16_e64 v5, -4.0 ; encoding: [0x05,0x00,0xd8,0xd5,0xf7,0x00,0x00,0x00]


[AMDGPU] Fix SDST operand of V_DIV_SCALE to always be VCC
Changes Planned, Public
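
The test updates in this diff all change the second encoding byte of V_DIV_SCALE from a plain SGPR number (for example 0x00 or 0x02) to 0x6a. As a rough illustration only, the sketch below decodes that byte the way the expected disassembly lines read. It assumes the SDST field is the low 7 bits of the second byte and that 0x6a denotes VCC, which matches the expected output in these tests; the helper names `sdst_field` and `sdst_name` are hypothetical and are not part of LLVM.

```python
# Illustrative sketch, not LLVM code. Assumption: SDST is the low 7 bits
# of the second encoding byte, and 0x6a (106) is the VCC register.
VCC_LO = 0x6A
MAX_SGPR = 105  # assumed highest plain SGPR number on GFX10


def sdst_field(encoding):
    """Return the 7-bit SDST value from a list of 8 encoding bytes."""
    return encoding[1] & 0x7F


def sdst_name(encoding, wave32):
    """Name the SDST operand as the disassembler tests print it."""
    reg = sdst_field(encoding)
    if reg == VCC_LO:
        # wave32 prints the 32-bit half, wave64 the full 64-bit pair
        return "vcc_lo" if wave32 else "vcc"
    if reg > MAX_SGPR:
        raise ValueError(f"unhandled SDST value {reg:#x}")
    return f"s{reg}" if wave32 else f"s[{reg}:{reg + 1}]"


# Old and new encodings of "v_div_scale_f32 v5, <sdst>, v1, v2, v3"
old = [0x05, 0x00, 0x6D, 0xD5, 0x01, 0x05, 0x0E, 0x04]
new = [0x05, 0x6A, 0x6D, 0xD5, 0x01, 0x05, 0x0E, 0x04]
print(sdst_name(old, wave32=True), sdst_name(old, wave32=False))
print(sdst_name(new, wave32=True), sdst_name(new, wave32=False))
```

The same byte pattern explains why the v_mad_i64_i32 and v_mad_u64_u32 lines are unchanged: their second byte stays 0x06, which still reads as s6 or s[6:7].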

Revision Contents

Diff 458989

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp

llvm/lib/Target/AMDGPU/VOP3Instructions.td

llvm/lib/Target/AMDGPU/VOPInstructions.td

llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f32.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.scale.ll

llvm/test/CodeGen/AMDGPU/fdiv-nofpexcept.ll

llvm/test/CodeGen/AMDGPU/fdiv.f64.ll

llvm/test/CodeGen/AMDGPU/frem.ll

llvm/test/CodeGen/AMDGPU/inserted-wait-states.mir

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.div.scale.ll

llvm/test/CodeGen/AMDGPU/llvm.powi.ll

llvm/test/CodeGen/AMDGPU/sched-crash-dbg-value.mir

llvm/test/CodeGen/AMDGPU/wave32.ll

llvm/test/MC/AMDGPU/gfx10_asm_vop3.s

llvm/test/MC/AMDGPU/gfx11_asm_vop3.s

llvm/test/MC/AMDGPU/vop3.s

llvm/test/MC/AMDGPU/wave32.s

llvm/test/MC/AMDGPU/wave_any.s

llvm/test/MC/Disassembler/AMDGPU/gfx10-wave32.txt

llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt
