This is an archive of the discontinued LLVM Phabricator instance.

Add the Connex SIMD/vector processor back end (main back end patch)
Needs Review · Public

Authored by alexsusu on Mar 2 2021, 10:09 AM.


Details

Reviewers

jpienaar
jdoerfert
jfb
fedor.sergeev

Summary

Connex is an established, almost 30-year-old, wide research vector processor (see, for example, http://users.dcae.pub.ro/~gstefan/2ndLevel/connex.html) with between 32 and 4096 lanes; the lane count is easily changed at synthesis time.
A notable feature is that the Connex processor has a local banked vector memory (each lane has its own local memory), which achieves 1-cycle latency for both direct and indirect loads and stores, so the memory bandwidth is very high.

The Connex vector processor has 16-bit signed integer execution units in each lane. It efficiently emulates (by inlining the emulation subroutines in the instruction selection pass) 32-bit int and IEEE 754-2008 compliant 16-bit floating point (Clang type _Float16, C for ARM __fp16, LLVM IR half type). The emulation subroutines are in the lib/Target/Connex/Select_*_OpincaaCodeGen.h files, which are included in the ConnexISelDAGToDAG.cpp module, in the ConnexDAGToDAGISel::Select() method. These emulation subroutines can be easily adjusted, for example to increase performance by sacrificing f16 accuracy; drop me an email to ask how to do it. (They currently total almost 1 MB of C++ code.)
The Connex vector processor does not currently support the float, double, or 64-bit integer types.
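
To illustrate what emulating a 32-bit integer on 16-bit execution units involves, here is a minimal C++ sketch of a 32-bit addition built from two 16-bit halves and an explicit carry. It is not the code from Select_ADDi32_OpincaaCodeGen.h; the names Emulated32, split32, join32 and add32 are hypothetical, and unsigned halves are used only so that wraparound is well defined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation: one 32-bit value kept as two 16-bit halves,
// as a lane with only 16-bit execution units would have to hold it.
struct Emulated32 {
  uint16_t lo; // bits 0..15
  uint16_t hi; // bits 16..31
};

// Split a 32-bit value into its low and high 16-bit halves.
Emulated32 split32(uint32_t v) {
  return {static_cast<uint16_t>(v & 0xFFFFu), static_cast<uint16_t>(v >> 16)};
}

// Reassemble the two halves (done here only to check the result).
uint32_t join32(Emulated32 v) {
  return (static_cast<uint32_t>(v.hi) << 16) | v.lo;
}

// 32-bit addition using only 16-bit additions and a carry.
Emulated32 add32(Emulated32 a, Emulated32 b) {
  // Add the low halves; the cast truncates the sum to 16 bits.
  uint16_t lo = static_cast<uint16_t>(a.lo + b.lo);
  // The low addition wrapped around exactly when the result is below a.lo.
  uint16_t carry = lo < a.lo ? 1 : 0;
  // Add the high halves plus the carry, again truncated to 16 bits.
  uint16_t hi = static_cast<uint16_t>(a.hi + b.hi + carry);
  return {lo, hi};
}
```

Each emulated operation costs several 16-bit instructions per lane, which is consistent with the summary's remark that the emulation subroutines are inlined during instruction selection.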

More precisely, the back end targets a low-power variant of the Connex processor that is used as an accelerator. The working compiler is described at https://dl.acm.org/doi/10.1145/3406536 and at https://sites.google.com/site/connextools/ .

Note that our back end currently targets only our Connex Opincaa assembler (very easy to learn and use), available at https://gitlab.upb.ro/research/ConnexRelated/opincaa/ .
The Connex Opincaa assembler makes it possible to run code that is agnostic of both the Connex vector length and the host (CPU).

The ISA of the Connex vector processor is available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/blob/master/ConnexISA.pdf .
The Connex vector processor also has an open-source C++ simulator, available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .

The mailing list for the Connex processor and tools is: https://groups.google.com/forum/#!forum/connex-tools .

An interesting feature is that, in order to support recovering from the instruction selection pass's SelectionDAG back to the original source (C) code, we require adding a simple data structure in include/llvm/CodeGen/SelectionDAG.h (and helper methods in related files) that associates each SDValue with the LLVM IR Value object it was translated from:

DenseMap<const Value*, SDValue> *crtNodeMapPtr
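
As a rough illustration of how such a map could be queried, the following self-contained C++ sketch uses std::unordered_map and two placeholder structs in place of llvm::DenseMap, llvm::Value and llvm::SDValue. The names IRValue, DagNode, NodeMap and findSourceValue are hypothetical, and how the patch actually populates and reads crtNodeMapPtr is not shown here.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for llvm::Value and llvm::SDValue.
struct IRValue {
  std::string name;
};
struct DagNode {
  int id;
};

// Same shape as the proposed member: keyed by the IR value, storing the
// selection DAG node created for it.
using NodeMap = std::unordered_map<const IRValue *, DagNode>;

// Going from a DAG node back to its IR value is the reverse direction of the
// map, so this sketch scans the entries linearly.
const IRValue *findSourceValue(const NodeMap &map, int nodeId) {
  for (const auto &entry : map)
    if (entry.second.id == nodeId)
      return entry.first;
  return nullptr; // No IR value was recorded for this node.
}
```

Because the map is keyed by the IR value, the reverse lookup is linear in the number of entries; a second map keyed by node would avoid the scan at the cost of extra bookkeeping.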

The Connex back end is 4 years old. We published 2 academic papers on it at ACM TECS and a CGO workshop: https://dl.acm.org/citation.cfm?id=3306166 . However, we are still adding features to the back end.

Small note: the Connex back end is rather small and builds fast (in ~3-5 minutes, single-threaded on a decent machine). In Apr 2019 the built objects totaled 71,168K, while the smallest LLVM back end, MSP430, had 63,387K and the biggest ones, X86 and AMDGPU, had 359,736K and 488,309K respectively.

Importantly, I think the test/MC/Connex folder should not be populated for this patch, because the Connex back end can generate only assembly code, and that code must be consumed by the special Opincaa assembler, which is not integrated in LLVM. Other back ends do a similar thing, such as the NVPTX back end, which doesn't support object file generation. The Connex back end doesn't support object file generation either.
The eBPF+Connex-S processor has the same ABI as the eBPF processor it extends, except that Connex-S natively supports only 16-bit integers and can access the banked vector memory only by line (so Connex-S can't perform unaligned accesses).

The Connex processor is currently implemented in FPGA, but has also been implemented in silicon:

an older version for HDTV: Gheorghe M. Stefan, "The CA1024: A Massively Parallel Processor for Cost-Effective HDTV", 2006 (http://users.dcae.pub.ro/~gstefan/2ndLevel/images/connex_v4.ppt)
M. Malita and Gheorghe M. Stefan, "Map-scan Node Accelerator for Big-data"
Gheorghe M. Stefan and Mihaela Malita, "Can One-Chip Parallel Computing Be Liberated From Ad Hoc Solutions? A Computation Model Based Approach and Its Implementation"

Committing all the Connex back end files and a few other LLVM files that I had to touch for the back end to work well.

Diff Detail

Event Timeline

alexsusu created this revision. Mar 2 2021, 10:09 AM

Herald added subscribers: dexonsmith, pengfei, jfb and 6 others. Mar 2 2021, 10:09 AM

alexsusu requested review of this revision. Mar 2 2021, 10:09 AM

Herald added a project: Restricted Project. Mar 2 2021, 10:09 AM

Herald added subscribers: llvm-commits, jdoerfert, aheejin.

alexsusu added a parent revision: D97638: Add the Connex SIMD/vector processor back end. Mar 2 2021, 10:10 AM

alexsusu retitled this revision from Add the Connex SIMD/vector processor back end to Add the Connex SIMD/vector processor back end (main back end patch).

Harbormaster completed remote builds in B91611: Diff 327502. Mar 2 2021, 11:50 AM

Improved the code indentation, partly as suggested by the clang-format tool.

Herald added subscribers: mstorsjo, ormris. Jun 30 2021, 9:17 AM

Harbormaster completed remote builds in B111776: Diff 355568. Jun 30 2021, 9:18 AM

ormris removed a subscriber: ormris. Jan 24 2022, 11:51 AM

alexsusu edited the summary of this revision. Jan 19 2023, 10:25 AM

alexsusu added reviewers: jpienaar, jdoerfert.

Herald added a project: Restricted Project. Jan 19 2023, 10:25 AM

Herald added subscribers: pcwang-thead, kosarev.

dexonsmith removed a subscriber: dexonsmith. Jan 19 2023, 10:39 AM

alexsusu added a reviewer: jfb. Jan 23 2023, 11:27 AM

alexsusu added a reviewer: fedor.sergeev. Jan 23 2023, 12:18 PM

There's way too much commented out and macro conditional code

llvm/include/llvm/ADT/Triple.h
59 ↗	(On Diff #355568)	Belongs with the other patch?
llvm/lib/Target/Connex/Connex.td
31–43	Lots of extra spaces?
llvm/lib/Target/Connex/ConnexAsmPrinter.cpp
65–67	Can't have globals like this
611	dead function
1069	Remove all of these type of macros
1412–1432	Drop dead code
llvm/lib/Target/Connex/ConnexAsmPrinterLoopNests.h
71–79	Definitely shouldn't be using fgets/fscanf anywhere. File usage anywhere like this is dubious
llvm/lib/Target/Connex/ConnexISelDAGToDAG.cpp
484	no strncmps
593–594	no raw mallocs and strcpy
llvm/lib/Target/Connex/Select_REDf16_OpincaaCodeGen.h
10–14	Generated code should only come from tablegen and not be committed
llvm/test/CodeGen/Connex/MatMulBT-512_i16_tiled_182_74_512.ll
4 ↗	(On Diff #355568)	This test is huge and I don't see any checks
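
The remarks on ConnexISelDAGToDAG.cpp ("no strncmps", "no raw mallocs and strcpy") can be illustrated with a minimal sketch in standard C++17. It uses std::string and std::string_view as stand-ins; inside LLVM one would more likely use llvm::StringRef, as the patch's own isVectorBody() already does. The helper names hasPrefix and copyWithSuffix are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Prefix test without strncmp: compare the leading prefix.size() characters.
// substr() clamps the count, so a text shorter than the prefix is handled.
bool hasPrefix(std::string_view text, std::string_view prefix) {
  return text.substr(0, prefix.size()) == prefix;
}

// Owned, resizable copy without malloc/strcpy: std::string manages the
// buffer and releases it automatically when it goes out of scope.
std::string copyWithSuffix(std::string_view text, std::string_view suffix) {
  std::string result(text);
  result.append(suffix);
  return result;
}
```

Neither helper needs a manual length computation or an explicit free, which removes the buffer-size and lifetime mistakes that the raw C functions invite.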

Addressed reviews of Matt Arsenault.
Formatted all C++ source files with clang-format.

Harbormaster completed remote builds in B217369: Diff 502377.Mar 4 2023, 9:45 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

IR/

CMakeLists.txt

1 line

Intrinsics.td

1 line

IntrinsicsConnex.td

106 lines

lib/

IR/

Function.cpp

1 line

Target/

Connex/

CMakeLists.txt

53 lines

Connex.h

36 lines

Connex.td

66 lines

ConnexAsmPrinter.cpp

1222 lines

ConnexAsmPrinterLoopNests.h

131 lines

ConnexCallingConv.td

33 lines

ConnexConfig.h

101 lines

ConnexFrameLowering.h

41 lines

ConnexFrameLowering.cpp

41 lines

ConnexHazardRecognizer.h

55 lines

ConnexHazardRecognizer.cpp

296 lines

ConnexISelDAGToDAG.cpp

4489 lines

ConnexISelLowering.h

208 lines

ConnexISelLowering.cpp

3268 lines

ConnexISelMisc.h

18 lines

ConnexInstrFormats.td

749 lines

ConnexInstrInfo.h

96 lines

ConnexInstrInfo.cpp

863 lines

ConnexInstrInfo.td

22 lines

ConnexInstrInfoScalar.td

603 lines

ConnexInstrInfoVec.td

1294 lines

ConnexInstrInfoVecVsplat.td

506 lines

ConnexMCInstLower.h

43 lines

ConnexMCInstLower.cpp

112 lines

ConnexRegisterInfo.h

77 lines

ConnexRegisterInfo.cpp

160 lines

ConnexRegisterInfo.td

261 lines

ConnexSchedule.td

10 lines

ConnexSelectionDAGInfo.h

73 lines

ConnexSelectionDAGInfo.cpp

119 lines

ConnexSubtarget.h

69 lines

ConnexSubtarget.cpp

30 lines

ConnexTargetMachine.h

50 lines

ConnexTargetMachine.cpp

1528 lines

ConnexTargetTransformInfo.h

131 lines

MCTargetDesc/

CMakeLists.txt

15 lines

ConnexAsmBackend.cpp

111 lines

ConnexELFObjectWriter.cpp

82 lines

ConnexInstPrinter.h

68 lines

ConnexInstPrinter.cpp

364 lines

ConnexMCAsmInfo.h

45 lines

ConnexMCCodeEmitter.cpp

161 lines

ConnexMCTargetDesc.h

64 lines

ConnexMCTargetDesc.cpp

98 lines

Misc.h

143 lines

RecoverFromLlvmIR.h

2217 lines

Select_ABSi32_OpincaaCodeGen.h

306 lines

Select_ADDf16_OpincaaCodeGen.h

3555 lines

Select_ADDi32_OpincaaCodeGen.h

189 lines

Select_DIVf16_OpincaaCodeGen.h

2909 lines

Select_DIVi16_OpincaaCodeGen.h

5741 lines

Select_LTf16_OpincaaCodeGen.h

646 lines

Select_MULTf16_OpincaaCodeGen.h

3242 lines

Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h

325 lines

Select_REDf16_OpincaaCodeGen.h

1478 lines

Select_REDi32_OpincaaCodeGen.h

174 lines

Select_SHRAi32_OpincaaCodeGen.h

461 lines

Select_SUBf16_OpincaaCodeGen.h

3566 lines

Select_SUBi32_OpincaaCodeGen.h

191 lines

TargetInfo/

CMakeLists.txt

10 lines

ConnexTargetInfo.cpp

21 lines

test/

CodeGen/

Connex/

MatMulBT-128_i16.ll

230 lines

Diff 502377

llvm/include/llvm/IR/CMakeLists.txt

  set(LLVM_TARGET_DEFINITIONS Attributes.td)
  tablegen(LLVM Attributes.inc -gen-attrs)

  set(LLVM_TARGET_DEFINITIONS Intrinsics.td)
  tablegen(LLVM IntrinsicImpl.inc -gen-intrinsic-impl)
  tablegen(LLVM IntrinsicEnums.inc -gen-intrinsic-enums)
  tablegen(LLVM IntrinsicsAArch64.h -gen-intrinsic-enums -intrinsic-prefix=aarch64)
  tablegen(LLVM IntrinsicsAMDGPU.h -gen-intrinsic-enums -intrinsic-prefix=amdgcn)
  tablegen(LLVM IntrinsicsARM.h -gen-intrinsic-enums -intrinsic-prefix=arm)
  tablegen(LLVM IntrinsicsBPF.h -gen-intrinsic-enums -intrinsic-prefix=bpf)
+ tablegen(LLVM IntrinsicsConnex.h -gen-intrinsic-enums -intrinsic-prefix=connex)
  tablegen(LLVM IntrinsicsDirectX.h -gen-intrinsic-enums -intrinsic-prefix=dx)
  tablegen(LLVM IntrinsicsHexagon.h -gen-intrinsic-enums -intrinsic-prefix=hexagon)
  tablegen(LLVM IntrinsicsLoongArch.h -gen-intrinsic-enums -intrinsic-prefix=loongarch)
  tablegen(LLVM IntrinsicsMips.h -gen-intrinsic-enums -intrinsic-prefix=mips)
  tablegen(LLVM IntrinsicsNVPTX.h -gen-intrinsic-enums -intrinsic-prefix=nvvm)
  tablegen(LLVM IntrinsicsPowerPC.h -gen-intrinsic-enums -intrinsic-prefix=ppc)
  tablegen(LLVM IntrinsicsR600.h -gen-intrinsic-enums -intrinsic-prefix=r600)
  tablegen(LLVM IntrinsicsRISCV.h -gen-intrinsic-enums -intrinsic-prefix=riscv)
  tablegen(LLVM IntrinsicsSPIRV.h -gen-intrinsic-enums -intrinsic-prefix=spv)
  tablegen(LLVM IntrinsicsS390.h -gen-intrinsic-enums -intrinsic-prefix=s390)
  tablegen(LLVM IntrinsicsWebAssembly.h -gen-intrinsic-enums -intrinsic-prefix=wasm)
  tablegen(LLVM IntrinsicsX86.h -gen-intrinsic-enums -intrinsic-prefix=x86)
  tablegen(LLVM IntrinsicsXCore.h -gen-intrinsic-enums -intrinsic-prefix=xcore)
  tablegen(LLVM IntrinsicsVE.h -gen-intrinsic-enums -intrinsic-prefix=ve)
  add_public_tablegen_target(intrinsics_gen)

llvm/include/llvm/IR/Intrinsics.td

  include "llvm/IR/IntrinsicsARM.td"
  include "llvm/IR/IntrinsicsAArch64.td"
  include "llvm/IR/IntrinsicsXCore.td"
  include "llvm/IR/IntrinsicsHexagon.td"
  include "llvm/IR/IntrinsicsNVVM.td"
  include "llvm/IR/IntrinsicsMips.td"
  include "llvm/IR/IntrinsicsAMDGPU.td"
  include "llvm/IR/IntrinsicsBPF.td"
+ include "llvm/IR/IntrinsicsConnex.td"
  include "llvm/IR/IntrinsicsSystemZ.td"
  include "llvm/IR/IntrinsicsWebAssembly.td"
  include "llvm/IR/IntrinsicsRISCV.td"
  include "llvm/IR/IntrinsicsSPIRV.td"
  include "llvm/IR/IntrinsicsVE.td"
  include "llvm/IR/IntrinsicsDirectX.td"
  include "llvm/IR/IntrinsicsLoongArch.td"

llvm/include/llvm/IR/IntrinsicsConnex.td

This file was added.

				//===- IntrinsicsConnex.td - Defines Connex-S intrinsics ---- tablegen --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines all of the Connex-specific intrinsics.
				//
				//===----------------------------------------------------------------------===//

				// All Connex-S vector processor intrinsics start with "llvm.connex."
				//
				let TargetPrefix = "connex" in {

				/*
				* Note: all intrinsics defined in these .td files start with
				* the int_ prefix (from intrinsic). For this file they start with
				* int_connex prefix - otherwise we get the following TableGen error
				* <<error:Intrinsic 'int_end_repeat' does not start with 'llvm.connex.'!>>
				*
				* The LLVM IR intrinsics extend the LLVM language s.t. we can use
				* these instructions in an LLVM IR program. We also need to define the
				* corresponding assembly instructions in the back end TableGen files.
				*/

				/* Note: Following Intrinsics.td:
				class Intrinsic<list<LLVMType> ret_types,
				list<LLVMType> param_types = [],
				list<IntrinsicProperty> properties = [],
				string name = "">
				*/


				/* Small-note:
				llvm_i64_ty makes simpler my LLVM IR generation in the LoopVectorize.cpp
				module:
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_i64_ty], []>;
				But llvm_i32_ty is in accordance to the original i32 type of n.vec in the
				LoopVectorize.cpp module:
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_i32_ty], []>;

				Small-note: We get inspired from include/llvm/IR/IntrinsicsPowerPC.td:
				// Intrinsics used to generate ctrl-based loops.
				def int_ppc_mtctr : Intrinsic<[], [llvm_anyint_ty], []>;

				Small-note: Trying to use a polymorphic definition, which requires
				specifying the actual type in Function::Create(FunctionType::get(), ...)
				is:
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_anyint_ty], []>;
				When instantiating it in LoopVectorize.cpp like this:
				Value *instrinsicFunc = Intrinsic::getDeclaration(M,
				Intrinsic::connex_repeat_x_times);
				it gives error at runtime:
				llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*;
				size_t = long unsigned int]: Assertion `Index < Length &&
				"Invalid index!"' failed.
				*/
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_i64_ty], []>;
				def int_connex_end_repeat : Intrinsic<[], [], []>;

				/* Note: Possibly useful in the future.
				Connex OPINCAA's END_REPEAT does not have a relative offset,
				as the standard Connex assembly ijmpnzdec instruction,
				since it falls on Opincaa to compute the jump back relative offset.
				We can also use a setlc to position it outside the loop created by the
				ijmpnzdec instruction by using it inside a delay-slot instruction.

				def int_connex_setlc : Intrinsic<[], [llvm_i16_ty], []>;
				def int_connex_ijmpnzdec : Intrinsic<[], [], []>;
				*/



				/* IMPORTANT: REDUCE cannot return a value. It is the duty of the host (CPU)
				to read the result itself from the REDUCE issued by Connex-S.
				Therefore this definition is incorrect:
				def int_connex_reduce : Intrinsic<[llvm_i32_ty], [llvm_v128i16_ty], []>;
				*/
				/* Also good:
				def int_connex_reduce : Intrinsic<[], [llvm_v128i16_ty], []>;
				def int_connex_reduce_i32 : Intrinsic<[], [llvm_v64i32_ty], []>;
				def int_connex_reduce_f16 : Intrinsic<[], [llvm_v128f16_ty], []>;
				*/
				def int_connex_reduce : Intrinsic<[], [llvm_anyvector_ty], []>;

				/* Note: ctpop is already defined in Intrinsics.td.
				So the below definition is not required:
				def int_connex_ctpop : Intrinsic<[llvm_v8i16_ty],
				[llvm_v8i16_ty], []>;
				*/


				// Inherited BPF scalar intrinsics: Specialized loads from packet
				def int_connex_load_byte : ClangBuiltin<"__builtin_connex_load_byte">,
				Intrinsic<[llvm_i64_ty], [llvm_ptr_ty, llvm_i64_ty], [IntrReadMem]>;
				def int_connex_load_half : ClangBuiltin<"__builtin_connex_load_half">,
				Intrinsic<[llvm_i64_ty], [llvm_ptr_ty, llvm_i64_ty], [IntrReadMem]>;
				def int_connex_load_word : ClangBuiltin<"__builtin_connex_load_word">,
				Intrinsic<[llvm_i64_ty], [llvm_ptr_ty, llvm_i64_ty], [IntrReadMem]>;
				def int_connex_pseudo : ClangBuiltin<"__builtin_connex_pseudo">,
				Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty]>;
				}

llvm/lib/IR/Function.cpp

  #include "llvm/IR/InstIterator.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/IntrinsicInst.h"
  #include "llvm/IR/Intrinsics.h"
  #include "llvm/IR/IntrinsicsAArch64.h"
  #include "llvm/IR/IntrinsicsAMDGPU.h"
  #include "llvm/IR/IntrinsicsARM.h"
  #include "llvm/IR/IntrinsicsBPF.h"
+ #include "llvm/IR/IntrinsicsConnex.h"
  #include "llvm/IR/IntrinsicsDirectX.h"
  #include "llvm/IR/IntrinsicsHexagon.h"
  #include "llvm/IR/IntrinsicsMips.h"
  #include "llvm/IR/IntrinsicsNVPTX.h"
  #include "llvm/IR/IntrinsicsPowerPC.h"
  #include "llvm/IR/IntrinsicsR600.h"
  #include "llvm/IR/IntrinsicsRISCV.h"
  #include "llvm/IR/IntrinsicsS390.h"

llvm/lib/Target/Connex/CMakeLists.txt

This file was added.

				add_llvm_component_group(Connex)

				set(LLVM_TARGET_DEFINITIONS Connex.td)

				tablegen(LLVM ConnexGenRegisterInfo.inc -gen-register-info)
				tablegen(LLVM ConnexGenInstrInfo.inc -gen-instr-info)
				tablegen(LLVM ConnexGenAsmWriter.inc -gen-asm-writer)
				tablegen(LLVM ConnexGenAsmMatcher.inc -gen-asm-matcher)
				tablegen(LLVM ConnexGenDAGISel.inc -gen-dag-isel)
				tablegen(LLVM ConnexGenMCCodeEmitter.inc -gen-emitter)
				tablegen(LLVM ConnexGenCallingConv.inc -gen-callingconv)
				tablegen(LLVM ConnexGenSubtargetInfo.inc -gen-subtarget)
				add_public_tablegen_target(ConnexCommonTableGen)

				add_llvm_target(ConnexCodeGen
				ConnexAsmPrinter.cpp
				ConnexFrameLowering.cpp
				ConnexHazardRecognizer.cpp
				ConnexInstrInfo.cpp
				ConnexISelDAGToDAG.cpp
				ConnexISelLowering.cpp
				ConnexMCInstLower.cpp
				ConnexRegisterInfo.cpp
				ConnexSelectionDAGInfo.cpp
				ConnexSubtarget.cpp
				ConnexTargetMachine.cpp

				# similar to the ones of the Mips back end
				LINK_COMPONENTS
				Analysis
				AsmPrinter
				CodeGen
				Core
				MC
				ConnexDesc
				ConnexInfo
				IPO
				Scalar
				SelectionDAG
				Support
				Target
				TargetParser
				TransformUtils

				ADD_TO_COMPONENT
				Connex
				)

				#add_subdirectory(AsmParser)
				#add_subdirectory(InstPrinter)
				#add_subdirectory(Disassembler)
				add_subdirectory(MCTargetDesc)
				add_subdirectory(TargetInfo)

llvm/lib/Target/Connex/Connex.h

This file was added.

				//===-- Connex.h - Top-level interface for Connex representation - C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEX_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEX_H

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/Pass.h"
				#include "llvm/Target/TargetMachine.h"

				// We define reserved register(s) of Connex to use for:
				// - handling COPY instructions in WHERE blocks
				// (see ConnexTargetMachine.cpp and ConnexISelLowering.cpp), etc
				#define CONNEX_RESERVED_REGISTER_01 Connex::Wh30
				#define CONNEX_RESERVED_REGISTER_02 Connex::Wh31
				#define CONNEX_RESERVED_REGISTER_03 Connex::Wh29

				// This definition is used also in the OPINCAA library
				#define COPY_REGISTER_IMPLEMENTED_WITH_ORV_H

				namespace llvm {
				class ConnexTargetMachine;

				FunctionPass *createConnexISelDag(ConnexTargetMachine &TM);
				} // End namespace llvm

				#endif

llvm/lib/Target/Connex/Connex.td

This file was added.

				//===-- Connex.td - Describe the Connex Target Machine ----- tablegen --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//


				// The processor system we define is a:
				// - scalar processor, which is basically an almost unmodified version of the
				// extended BPF (Berkeley Packet Filter) 64-bit RISC processor, implemented
				// in LLVM. eBPF is well-described at
				// https://www.kernel.org/doc/Documentation/networking/filter.txt.
				// - a vector/SIMD unit, which is the Connex-S processor, actually an
				// accelerator, with its own separate-space banked vector memory.
				// The back end implementation starts from the BPF LLVM back end (the version
				// from Jul 2016), to which we add
				// the vector instruction support inspired mostly from the Mips MSA vector
				// extensions.
				// Note: the scalar registers of Connex are R0-31 (e.g., Connex::R10), and the
				// vector registers are Wh0-31 (from wide; e.g., Connex::Wh0).
				//
				// VERY IMPORTANT: Note that currently our back end targets only our Connex
				// OPINCAA assembler (very easy to learn and use) available at
				// https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .


				include "llvm/Target/Target.td"
				include "ConnexRegisterInfo.td"
				include "ConnexCallingConv.td"
				include "ConnexSchedule.td"
				include "ConnexInstrInfo.td"


				def ConnexInstrInfo : InstrInfo;

				class Proc<string Name, list<SubtargetFeature> Features>
				: Processor<Name, NoItineraries, Features>;

				def : Proc<"generic", []>;

				def ConnexInstPrinter : AsmWriter {
				string AsmWriterClassName = "InstPrinter";
				bit isMCAsmWriter = 1;
				}

				// Inspired from https://github.com/llvm-mirror/llvm/commit/5ef1349
				// (see also https://groups.google.com/forum/#!topic/llvm-dev/_Zr4Oe5MLkE)
				def ConnexAsmParser : AsmParser {
				bit HasMnemonicFirst = 0;
				}

				def ConnexAsmParserVariant : AsmParserVariant {
				int Variant = 0;
				string Name = "Connex";
				string BreakCharacters = ".";
				}

				def Connex : Target {
				let InstructionSet = ConnexInstrInfo;
				let AssemblyWriters = [ConnexInstPrinter];
				// Inspired from llvm/lib/Target/Hexagon/Hexagon.td
				let AssemblyParsers = [ConnexAsmParser];
				let AssemblyParserVariants = [ConnexAsmParserVariant];
				}

llvm/lib/Target/Connex/ConnexAsmPrinter.cpp

This file was added.

				//===-- ConnexAsmPrinter.cpp - Connex LLVM assembly writer ----------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains a printer that converts from our internal representation
				// of machine-dependent LLVM code to the Connex assembly language.
				//
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "ConnexConfig.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexMCInstLower.h"
				#include "ConnexTargetMachine.h"
				#include "MCTargetDesc/ConnexInstPrinter.h"
				#include "Misc.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/CodeGen/AsmPrinter.h"
				#include "llvm/CodeGen/MachineConstantPool.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineModuleInfo.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCStreamer.h"
				#include "llvm/MC/MCSymbol.h"
				#include "llvm/MC/TargetRegistry.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/raw_ostream.h"
				// TODO: #include "BTFDebug.h"

				using namespace llvm;

				// Inspired from llvm/lib/CodeGen/TargetPassConfig.cpp
				static cl::opt<bool> EnableCorrectBBsASMPrint(
				"enable-correct-asm-print", cl::Hidden, cl::init(true),
				cl::desc(
				"Correct the BBs of the 2nd innermost loop in loop nests of kernels "
				"and use normally REPEAT for it and 'host-side OPINCAA C++ for' as "
				"the innermost loop"));

				static cl::opt<bool> TreatRepeat2ndInnerLoopGlobalTmp(
				"treat-repeat-2nd-inner-loop", cl::Hidden, cl::init(true),
				cl::desc("Treat well 2nd inner loop in kernel and use normally REPEAT "
				"for it and host-side OPINCAA C++ for() as the inner loop"));

				#define DEBUG_TYPE "asm-printer"

				namespace {

				// Declarations for adapted RPO and DFS traversals of the CFG
				typedef bool (*CompareBBs)(MachineBasicBlock &b1, MachineBasicBlock &b2);
				//
				// We declare these vars static outside the class to avoid some strange C++
				// linker errors (used for adapted RPO or DFS traversal of the CFG).
				static std::map<MachineBasicBlock *, bool> visitedMBB;
				static std::map<MachineBasicBlock *, int> finishingTimeMBB; // DFS finish time
				static std::vector<MachineBasicBlock *> sortedListMBB;

				bool isMBBWithInlineAsmString(MachineBasicBlock *MBB, std::string strToSearch) {
				LLVM_DEBUG(
				dbgs() << "Entered isMBBWithOPINCAAKernelEndMarker(MBB->getName() = "
				<< MBB->getName() << ")\n");

				for (auto MIItr = MBB->begin(), MBBend = MBB->end(); MIItr != MBBend;
				++MIItr) {
				MachineInstr *MI = &(*MIItr);

				if (MI->isInlineAsm()) {
				LLVM_DEBUG(
				dbgs()
				<< " isMBBWithOPINCAAKernelEndMarker(): found INLINEASM *MI = "
				<< *MI << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				for (unsigned index = 0; index < MI->getNumOperands(); index++) {
				MachineOperand *miOpnd;
				miOpnd = &(MI->getOperand(index));

				if (miOpnd->isSymbol()) {
				std::string symStr = miOpnd->getSymbolName();
				LLVM_DEBUG(dbgs() << " isMBBWithOPINCAAKernelEndMarker(): symStr = "
				<< symStr << "\n");

				if (symStr.find(strToSearch) != std::string::npos) {
				LLVM_DEBUG(
				dbgs()
				<< " isMBBWithOPINCAAKernelEndMarker(): Found INLINEASM "
				"with strToSearch in the symbol "
				"operand\n");
				//"with host-side for loop"
				return true;
				}
				}
				}
				}
				}

				return false;
				} // End isMBBWithInlineAsmString()

				class ConnexAsmPrinter : public AsmPrinter {
				#include "ConnexAsmPrinterLoopNests.h"
				/*
				TODO:
				private:
				BTFDebug *BTF;
				explicit BPFAsmPrinter(TargetMachine &TM,
				std::unique_ptr<MCStreamer> Streamer)
				: AsmPrinter(TM, std::move(Streamer)), BTF(nullptr) {}
				*/
				public:
				explicit ConnexAsmPrinter(TargetMachine &TM,
				std::unique_ptr<MCStreamer> Streamer)
				: AsmPrinter(TM, std::move(Streamer)) {}

				StringRef getPassName() const override { return "Connex Assembly Printer"; }

				/*
				// Inspired from BPF's BPFAsmPrinter::emitInstruction(const MachineInstr *MI)
				void emitInstruction(const MachineInstr *MI) {
				MCInst TmpInst;

				//if (!BTF || !BTF->InstLower(MI, TmpInst)) {
				ConnexMCInstLower MCInstLowering(OutContext, *this);
				MCInstLowering.Lower(MI, TmpInst);
				//}

				EmitToStreamer(*OutStreamer, TmpInst);
				}
				*/

				/*
				(From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunctionPass.html
				we see SelectionDAGISel and AsmPrinter were the only passes that inherit
				MachineFunctionPass, from this back end.)
				From http://llvm.org/docs/doxygen/html/AsmPrinter_8h_source.html:
				/// Set up the AsmPrinter when we are working on a new module. If your pass
				/// overrides this, it must make sure to explicitly call this implementation.
				*/

				bool isVectorBody(StringRef &&strRef) {
				#define STR_VECTOR_BODY "vector.body"
				#define STR_VECTOR_BODY_PREHEADER ".preheader"

				LLVM_DEBUG(dbgs() << "isVectorBody(): strRef = " << strRef << "\n");

				// We can have several BBs with name vector.bodyXYZT (but we do NOT
				// search for STR_VECTOR_BODY_PREHEADER, which can be e.g.,
				// vector.body40.preheader)
				if (strRef.startswith(StringRef(STR_VECTOR_BODY)) &&
				strRef.endswith(StringRef(STR_VECTOR_BODY_PREHEADER)))
				return false;

				if (strRef.startswith(StringRef(STR_VECTOR_BODY)) == false)
				return false;

				LLVM_DEBUG(dbgs() << "isVectorBody(): returning true\n");

				return true;
				} // End isVectorBody()

				void moveToFrontRepeat(MachineBasicBlock *MBB) {
				LLVM_DEBUG(dbgs() << "Entered moveToFrontRepeat(MBB = " << MBB << ")\n");

				// Moving the REPEAT and its symbolic operand in INLINEASM to the
				// front of the MBB.
				for (auto MIItr = MBB->begin(); MIItr != MBB->end(); ++MIItr) {
				MachineInstr *MI = &(*MIItr);

				if (MI->getOpcode() == Connex::REPEAT_SYM_IMM) {
				LLVM_DEBUG(
				dbgs() << "moveToFrontRepeat(): Found Connex::REPEAT_SYM_IMM\n");
				MIItr++;

				MachineInstr *MI2 = &(*MIItr);

				if (MI2->isInlineAsm()) {
				LLVM_DEBUG(dbgs() << "moveToFrontRepeat(): Moving the successor "
				"INLINEASM together with the "
				"Connex::REPEAT_SYM_IMM\n");

				MBB->remove(MI2);
				MBB->insert(MBB->front(), MI2);
				} else {
				MIItr++;
				MI2 = &(*MIItr);

				LLVM_DEBUG(dbgs() << "moveToFrontRepeat(): Moving the following "
				"(not successor) INLINEASM together with the "
				"Connex::REPEAT_SYM_IMM\n");
				if (MI2->isInlineAsm()) {
				MBB->remove(MI2);
				MBB->insert(MBB->front(), MI2);
				} else {
				assert(0 && "Can't find INLINEASM associated to REPEAT_SYM_IMM");
				}
				}

				LLVM_DEBUG(
				dbgs() << "moveToFrontRepeat(): Moving Connex::REPEAT_SYM_IMM\n");

				MBB->remove(MI);
				MBB->insert(MBB->front(), MI);

				break;
				}
				}
				} // End moveToFrontRepeat()

				void moveToFrontInlineAsm(MachineBasicBlock *MBB, std::string strToSearch) {
				LLVM_DEBUG(dbgs() << "Entered moveToFrontInlineAsm(MBB = " << MBB
				<< ", strToSearch = " << strToSearch << ")\n");

				std::string strMBB = MBB->getName().str();

				// Moving strToSearch and its associated INLINEASM to the
				// front of the MBB.
				for (auto MIItr = MBB->begin(), MBBend = MBB->end(); MIItr != MBBend;) {
				MachineInstr *MI = &(*MIItr);

				// We avoid iterator invalidation:
				// See some comments on iterator invalidation (when doing remove) at
				// llvm.1065342.n5.nabble.com/deleting-or-replacing-a-MachineInst-td77723.html
				MachineBasicBlock::iterator MIsucc = MIItr;
				MIsucc++;

				if (MI->isInlineAsm()) {
				LLVM_DEBUG(dbgs() << " moveToFrontInlineAsm(): found INLINEASM *MI = "
				<< *MI << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				for (unsigned index = 0; index < MI->getNumOperands(); index++) {
				MachineOperand *miOpnd;
				miOpnd = &(MI->getOperand(index));

				LLVM_DEBUG(dbgs() << " MI->getOperand(" << index << ") = " << *miOpnd
				<< "\n");

				if (miOpnd->isSymbol()) {
				std::string symStr = miOpnd->getSymbolName();
				LLVM_DEBUG(dbgs() << " moveToFrontInlineAsm(): symStr = " << symStr
				<< "\n");

				if (symStr.find(strToSearch) != std::string::npos) {
				LLVM_DEBUG(dbgs() << " moveToFrontInlineAsm(): Found INLINEASM "
				"with strToSearch in the symbol "
				"operand\n");

				MBB->remove(MI);

				if (strMBB == "entry") {
				// The "entry" MBB normally contains Connex init
				// instructions, so we insert the marker MI at the end (before the
				// first terminator). This keeps the init instructions out of the
				// host-side for loop, where they would wrongly be re-executed
				// on every iteration of the loop body.
				MBB->insert(MBB->getFirstTerminator(), MI);
				} else {
				MBB->insert(MBB->front(), MI);
				}
				}
				}
				}
				}

				// We avoid iterator invalidation
				MIItr = MIsucc;
				}
				} // End moveToFrontInlineAsm()

				/*
				Moves to the front of the MBB either 3 (if justOne == false) or
				1 (if justOne == true) inline ASM expression(s), provided that the 1st
				inline ASM expression contains the OPINCAA kernel begin marker.

				This function must be run first with justOne == false and then
				with justOne == true.

				More exactly, in LoopVectorize.cpp we added, among others, the following
				3 consecutive inline ASM expressions:
				- 1 BEGIN_KERNEL INLINEASM instruction used as loop prologue
				- 1 END_KERNEL INLINEASM instruction used as
				loop prologue (END_KERNEL part)
				- 1 BEGIN_KERNEL INLINEASM instruction for
				the loop.
				When justOne == false we move these 3 instructions to the front of MBB.
				This ensures that, in the less likely case where ConnexISelDAGToDAG.cpp
				manually generates a VLOAD_H_SYM_IMM (with its associated inline ASM
				containing the symbolic operand), that instruction does not end up
				first, before the inline ASM expression with the OPINCAA loop header.
				We also make sure that any loads from spills are placed inside the loop
				prologue.

				When justOne == true we move only 1 instruction to the front: in
				runOnMachineFunction() we put all instructions of the predecessor (there
				has to be only 1 such predecessor) of vector.body at the front of MBB,
				so we then have to move the BEGIN_KERNEL of the loop prologue back to
				the front.
				*/
				void moveToFront(MachineBasicBlock *MBB, bool justOne) {
				MachineInstr *tmp1, *tmp2, *tmp3; //, *tmp4;
				int counter = 0;

				LLVM_DEBUG(dbgs() << "Entered moveToFront(justOne = " << justOne << ")\n");

				/* We compute MIItrLastLoadAssociatedToSpill, an iterator (pointer) to
				the first instruction after the loads (fills) from spills at the
				beginning of the BB.
				*/
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				/* Important: Make sure we put this initialization after any other MBB
				mutation in order to use it well to move the 3 INLINEASM instructions.
				*/
				MachineBasicBlock::iterator MIItrLastLoadAssociatedToSpill = MBB->front();

				if (justOne == false) {
				for (auto MIItr2 = MBB->begin(); MIItr2 != MBB->end(); ++MIItr2) {
				MachineInstr *MI = &(*MIItr2);

				LLVM_DEBUG(dbgs() << " moveToFront(): MI = " << MI
				<< ", MI->getOpcode() = " << MI->getOpcode() << "\n");

				unsigned imm = -1;
				if (MI->getOpcode() == Connex::LD_H) {
				// Inspired from
				// http://llvm.org/docs/doxygen/html/MachineInstr_8cpp_source.html,
				// method MachineInstr::isIdenticalTo()
				for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
				const MachineOperand &MO = MI->getOperand(i);

				if (MO.isImm()) {
				imm = MO.getImm();
				LLVM_DEBUG(dbgs() << " moveToFront(): imm = " << imm << "\n");
				break;
				}
				}

				// If the imm operand is >= CONNEX_MEM_NUM_ROWS - 32 it (normally)
				// means that the operation is generated in
				// ConnexInstrInfo::storeRegToStackSlot() and
				// ConnexInstrInfo::loadRegFromStackSlot(),
				// part of a spill or load from spill operation.
				// Note that on Connex we do not have a stack per se,
				// but we emulate it at the end of the LS memory.
				if ((imm >= CONNEX_MEM_NUM_ROWS - 32) &&
				(imm < CONNEX_MEM_NUM_ROWS)) {
				MIItrLastLoadAssociatedToSpill = MIItr2;
				MIItrLastLoadAssociatedToSpill++;
				}
				}
				} // end for
				} // if (justOne == false)

				// Moving the ISD::INLINEASM instruction containing the opincaa kernel
				// begin at the very front of this BB.
				for (auto MIItr = MBB->begin(); MIItr != MBB->end(); ++MIItr, ++counter) {
				MachineInstr *MI = &(*MIItr);

				if (MI->isInlineAsm()) {
				LLVM_DEBUG(dbgs() << " moveToFront() found INLINEASM MI = " << MI
				<< "\n");

				bool isOpincaaCodeBegin = false;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				for (unsigned index = 0; index < MI->getNumOperands(); index++) {
				MachineOperand *miOpndOpincaaCodeBegin; // = NULL;
				miOpndOpincaaCodeBegin = &(MI->getOperand(index));

				LLVM_DEBUG(dbgs() << " MI->getOperand(" << index
				<< ") = " << *miOpndOpincaaCodeBegin << "\n");

				if (miOpndOpincaaCodeBegin->isSymbol()) {
				std::string symStr = miOpndOpincaaCodeBegin->getSymbolName();
				LLVM_DEBUG(dbgs()
				<< " moveToFront(): symStr = " << symStr << "\n");
				if (symStr.find(STR_OPINCAA_CODE_BEGIN) != std::string::npos) {
				isOpincaaCodeBegin = true;
				break;
				}
				}
				}

				if (isOpincaaCodeBegin) {
				if (counter != 0) {
				// We move only if not at the beginning of MBB
				tmp1 = MI;
				LLVM_DEBUG(dbgs()
				<< " moveToFront(): moving INLINEASM to the front "
				"(counter = "
				<< counter << ", justOne = " << justOne << ")\n");

				if (justOne == true) {
				MBB->remove(tmp1);
				MBB->insert(MBB->front(), tmp1);
				} else {
				// We move the next 3 instructions to the front of
				// MBB, namely:
				// - 1 BEGIN_KERNEL INLINEASM instruction used as
				// loop prologue
				// - 1 END_KERNEL INLINEASM instruction used as
				// loop prologue (END_KERNEL part)
				// - 1 BEGIN_KERNEL INLINEASM instruction for
				// the loop.
				// TODO: check tmp3 and tmp2 are also INLINEASM.

				MIItr++;
				tmp2 = &(*MIItr);

				MIItr++;
				tmp3 = &(*MIItr);

				LLVM_DEBUG(dbgs() << " moveToFront(): tmp1 = " << tmp1 << "\n");
				LLVM_DEBUG(dbgs() << " moveToFront(): tmp2 = " << tmp2 << "\n");
				LLVM_DEBUG(dbgs() << " moveToFront(): tmp3 = " << tmp3 << "\n");
				/*
				MBB->remove(tmp4);
				//MBB->insert(MBB->front(), tmp3);
				*/

				MBB->remove(tmp3);

				MBB->remove(tmp2);

				MBB->remove(tmp1);

				// TODO: check that the iterator
				// MIItrLastLoadAssociatedToSpill does NOT get
				// invalidated - it seems it is not invalidated even if we
				// change MBB, which is so because the instruction
				// to which the iterator points to is NOT changed.
				MBB->insert(MIItrLastLoadAssociatedToSpill, tmp1);
				MBB->insert(MIItrLastLoadAssociatedToSpill, tmp2);
				MBB->insert(MIItrLastLoadAssociatedToSpill, tmp3);
				}
				} // End if (counter != 0)
				break;
				} // End if (isOpincaaCodeBegin)
				}
				// counter++;
				}
				} // End moveToFront()

				// Moving the last ISD::INLINEASM instruction of MBB at the very back of MBB
				void moveToBackLastInlineAsm(MachineBasicBlock *MBB) {
				MachineInstr *tmp1;
				int counter = 0;

				LLVM_DEBUG(dbgs() << " moveToBackLastInlineAsm(): MBB = " << MBB << "\n");

				for (auto MIItr = MBB->rbegin(); MIItr != MBB->rend(); ++MIItr, ++counter) {
				MachineInstr *MI = &(*MIItr);

				if (MI->isInlineAsm()) {
				LLVM_DEBUG(
				dbgs() << " moveToBackLastInlineAsm() found INLINEASM MI = "
				<< *MI << "\n");

				bool isOpincaaCodeEnd = false;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				for (unsigned index = 0; index < MI->getNumOperands(); index++) {
				MachineOperand *miOpndOpincaaCodeEnd;
				miOpndOpincaaCodeEnd = &(MI->getOperand(index));

				LLVM_DEBUG(dbgs() << " MI->getOperand(" << index
				<< ") = " << *miOpndOpincaaCodeEnd << "\n");

				if (miOpndOpincaaCodeEnd->isSymbol()) {
				std::string symStr = miOpndOpincaaCodeEnd->getSymbolName();
				LLVM_DEBUG(dbgs() << " moveToBackLastInlineAsm(): symStr = "
				<< symStr << "\n");
				if (symStr.find(STR_OPINCAA_CODE_END) != std::string::npos) {
				isOpincaaCodeEnd = true;
				break;
				}
				}
				}

				if (isOpincaaCodeEnd) {
				tmp1 = MI;
				LLVM_DEBUG(dbgs()
				<< " moveToBackLastInlineAsm(): moving INLINEASM to the "
				"back (counter = "
				<< counter << ")\n");

				MBB->remove(tmp1);
				MBB->insert(MBB->end(), tmp1);
				break;
				}
				}
				}
				} // End moveToBackLastInlineAsm()

				// We copy to the front of vector.body the instructions of its
				// predecessor basic block other than vector.body itself
				// (normally vector.ph).
				void copyInstructionsFromPred(MachineFunction &MF, MachineBasicBlock &MBB,
				MachineBasicBlock *&predMBBGood) {

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				/* (See fossies.org/linux/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp
				* also, method DeadMachineInstructionElim::runOnMachineFunction() for
				* an example of iteration backwards).
				*/
				unsigned counterPredMBB = 0;

				// rbegin() is a reverse_iterator
				for (auto predMIItr = predMBBGood->rbegin();
				predMIItr != predMBBGood->rend(); predMIItr++, counterPredMBB++) {
				MachineInstr *predMI = &(*predMIItr);

				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): predMI = " << predMI
				<< "\n");

				// Need to insert them in different order
				if (predMI->isBundle()) {
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): handling bundle\n");

				const MachineBasicBlock *MBBBundle = predMI->getParent();
				MachineBasicBlock::const_instr_iterator I = predMI->getIterator();

				// Important: We assume we work with finalized bundles
				I++;

				assert(I != MBBBundle->instr_end());
				const MachineInstr *I1 = &(*I);
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): *I1 = "
				<< *I1 << "\n");
				//
				I++;

				// Important: We assume we work with bundles with only 2 instructions

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				// bool isInsideBundle () const
				// Return true if MI is in a bundle (but not the first MI in a bundle).
				// bool isBundled () const
				// Return true if this instruction part of a bundle.
				//
				assert(I != MBBBundle->instr_end());
				const MachineInstr *I2 = &(*I);

				MachineInstr *newPredMI2 = MF.CloneMachineInstr(I2);
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): *newPredMI2 = "
				<< *newPredMI2 << "\n");
				MBB.insert(MBB.front(), newPredMI2);

				MachineInstr *newPredMI1 = MF.CloneMachineInstr(I1);
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): *newPredMI1 = "
				<< *newPredMI1 << "\n");
				MBB.insert(MBB.front(), newPredMI1);

				LLVM_DEBUG(
				dbgs() << " copyInstructionsFromPred(): End handling bundle\n");

				continue;
				}

				// We avoid the last instruction of predMBBGood, since it is an
				// unconditional JMP
				if (counterPredMBB == 0 &&
				// See
				// http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				predMI->isUnconditionalBranch()) { // predMBBGood->size())
				/* For llc -O3 it removes the JMP at the end of
				vector.ph, hence it merges it with vector.body,
				even if it leaves the entry label of vector.body.
				So we need to check if predMI is JMP with
				isUnconditionalBranch(). */
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): found a JMP, "
				"so not copying it in vector.body\n");
				continue;
				}

				/* Important note: EmitInstruction() fails for ISD::INLINEASM
				EmitInstruction(&predMI);
				*/

				/* See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				MachineInstr *CloneMachineInstr(const MachineInstr *Orig);
				CloneMachineInstr - Create a new MachineInstr which is a
				copy of the 'Orig' instruction, identical in all ways except
				the instruction has no parent, prev, or next.
				*/
				MachineInstr *newPredMI = MF.CloneMachineInstr(predMI);

				MBB.insert(MBB.front(), newPredMI);
				}

				// I guess normally we should have 2 predecessors, but since I mess
				// up in LoopVectorize.cpp the vector.body block in some cases
				// (e.g., with a few iterations, in the order of magnitude of the
				// vector unit width) it can remain with only 1 predecessor.
				//
				// assert(numPredecessors <= 2 &&
				// "vector.body should have at most 2 predecessors: itself and one more");
				} // End copyInstructionsFromPred()

				// Important: We copy from successor BB (middle.block) to vector.body BB
				void copyInstructionsFromSucc(MachineFunction &MF, MachineBasicBlock &MBB) {
				LLVM_DEBUG(dbgs() << " copyInstructionsFromSucc(): Move code from succ "
				"of block "
				<< MBB.getName().data() << "\n");
				arsenm: dead function

				int numSuccessors = 0;

				for (auto succMBB : MBB.successors()) {
				numSuccessors++;

				StringRef strSuccMBB = succMBB->getName();
				LLVM_DEBUG(dbgs() << " copyInstructionsFromSucc(): strSuccMBB = "
				<< strSuccMBB << "\n");

				// See llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				/* (See fossies.org/linux/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp
				* also, method DeadMachineInstructionElim::runOnMachineFunction() for
				* an example of iteration backwards).
				*/
				unsigned counterSuccMBB = 0;

				for (auto succMIItr = succMBB->begin(); succMIItr != succMBB->end();
				succMIItr++, counterSuccMBB++) {
				MachineInstr *succMI = &(*succMIItr);

				LLVM_DEBUG(dbgs() << " copyInstructionsFromSucc(): succMI = "
				<< *succMI << "\n");

				/* We skip the branch instructions of the successor (conditional or
				unconditional): they must not be copied into vector.body. */
				if (
				// See llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				(succMI->isUnconditionalBranch() ||
				succMI->isConditionalBranch())) { // predMBB->size())
				/* For llc -O3 it removes the JMP at the end of
				vector.ph, hence it merges it with vector.body,
				even if it leaves the entry label of vector.body.
				So we need to check if predMI is JMP with
				isUnconditionalBranch(). */
				LLVM_DEBUG(dbgs() << "copyInstructionsFromSucc(): found a JMP, "
				"so not copying it in vector.body\n");
				continue;
				}

				/* Important note: EmitInstruction() fails for ISD::INLINEASM
				EmitInstruction(&predMI);
				*/

				/* See llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				MachineInstr *CloneMachineInstr(const MachineInstr *Orig);
				CloneMachineInstr - Create a new MachineInstr which is a
				copy of the 'Orig' instruction, identical in all ways except
				the instruction has no parent, prev, or next.
				*/
				MachineInstr *newSuccMI = MF.CloneMachineInstr(succMI);

				// Gives error: "Assertion `!N->getParent() && "machine instruction
				// already in a basic block"' failed."
				// MBB.insert(MBB.front(), &predMI);
				MBB.insert(MBB.back(), newSuccMI);
				}

				// Instead of break we should check if predMBB is the BB "just"
				// above predMBBGood or below
				break;
				}

				assert(numSuccessors == 1);
				} // End copyInstructionsFromSucc()

				// If commented we traverse nodes in our standard DFS (pre-order).
				// Otherwise we traverse in Reverse post-order (RPO).
				#define ADAPTED_RPO

				/* In DFS() we store in the sortedListMBB vector the traversed nodes:
				- in RPO (Reverse post-order)
				- see e.g.
				eli.thegreenplace.net/2015/directed-graph-traversal-orderings-and-applications-to-data-flow-analysis/
				- OR, we can use, if we want, preorder (standard DFS).
				This is required because the MachineBasicBlock class iterates the BBs
				in an (undocumented/unspecified) order (for MatMul it is actually RPO),
				which is bad for our simple source-to-source transformation basically
				implemented with our simple ReplaceLoopsWithOpincaaKernels tool that
				simply copies a section of the Connex assembly code from the test.s
				file that lies between the markers
				"// START_OPINCAA_HOST_DEVICE_CODE" and
				"// END_OPINCAA_HOST_DEVICE_CODE".

				Using the MachineBasicBlock BB iterator order results in e.g.:
				- REPEAT instruction being misplaced (not at beginning of loop, but
				actually close to the end of the loop, close to END_REPEAT) - see the
				MatAdd test. This example shows the difference of the order:
				Printing the MBBs, as they are ordered now:
				BB name: = entry
				BB name: = entry
				BB name: = for.cond2.preheader.us.preheader
				BB name: = for.cond2.preheader.us
				BB name: = min.iters.checked
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.memcheck
				BB name: = vector.body.preheader
				BB name: = vector.body
				BB name: = middle.block
				BB name: = for.body6.us
				BB name: = for.cond2.for.inc20_crit_edge.us
				BB name: = for.end22.loopexit
				BB name: = for.end22
				My pre-order DFS traversal:
				DFS(): BB name: = entry, n = 0x136c038
				DFS(): BB name: = entry, n = 0x13ad750
				DFS(): BB name: = for.cond2.preheader.us.preheader, n = 0x136c0e8
				DFS(): BB name: = for.cond2.preheader.us, n = 0x136c198
				DFS(): BB name: = min.iters.checked, n = 0x136c4e8
				DFS(): BB name: = for.body6.us, n = 0x1381360
				DFS(): BB name: = for.cond2.for.inc20_crit_edge.us, n = 0x1381520
				DFS(): BB name: = for.end22.loopexit, n = 0x13816f0
				DFS(): BB name: = for.end22, n = 0x13817a0
				DFS(): BB name: = vector.memcheck, n = 0x136c598
				DFS(): BB name: = vector.memcheck, n = 0x13ad980
				DFS(): BB name: = vector.memcheck, n = 0x13ada30
				DFS(): BB name: = vector.memcheck, n = 0x13adb20
				DFS(): BB name: = vector.memcheck, n = 0x13adbd0
				DFS(): BB name: = vector.memcheck, n = 0x13adcc0
				DFS(): BB name: = vector.memcheck, n = 0x13add70
				DFS(): BB name: = vector.memcheck, n = 0x13adf30
				DFS(): BB name: = vector.memcheck, n = 0x13adfe0
				DFS(): BB name: = vector.memcheck, n = 0x13ab5f8
				DFS(): BB name: = vector.body.preheader, n = 0x136c648
				DFS(): BB name: = vector.body, n = 0x136c6f8
				DFS(): BB name: = middle.block, n = 0x136c928

				The issue with misplaced REPEAT can also be seen in the MatMul_sizeMat test.
				*/
				void DFS(MachineBasicBlock *n) {
				// See http://www.cplusplus.com/reference/map/map/count/
				if (visitedMBB.count(n) != 0)
				return;

				// See http://www.cplusplus.com/reference/map/map/insert/
				visitedMBB.insert(std::pair<MachineBasicBlock *, bool>(n, true));

				#ifndef ADAPTED_RPO
				finishingTimeMBB.insert(
				std::pair<MachineBasicBlock *, int>(n, sortedListMBB.size()));
				sortedListMBB.push_back(n);
				#endif

				const char *strN = n->getName().data();
				LLVM_DEBUG(dbgs() << "DFS(): BB name of n: = " << strN << ", n = " << n
				<< "\n");

				int numSuccessorsN = 0;
				MachineBasicBlock *successorsN[2];

				// If in the successors we have vector.ph, vector.body, etc we choose those
				// first.
				for (auto MBB : n->successors()) {
				std::string strMBB = MBB->getName().data();
				LLVM_DEBUG(dbgs() << " DFS(): successor: MBB name = " << strMBB
				<< ", MBB = " << MBB << "\n");
				if (strMBB == "min.iters.checked" ||
				// small-TODO: check only for "vector.*" not for all below
				strMBB == "vector.memcheck" || strMBB == "vector.ph" ||
				strMBB == "vector.body.preheader" || strMBB == "vector.body") {
				DFS(MBB); // This will update visitedMBB to avoid further visits
				}

				successorsN[numSuccessorsN & 1] = MBB;
				numSuccessorsN++;
				}
				LLVM_DEBUG(dbgs() << "DFS(): numSuccessorsN = " << numSuccessorsN << "\n");

				if (numSuccessorsN == 2) {
				std::string strSuccName0 =
				successorsN[numSuccessorsN & 1]->getName().str();
				std::string strSuccName1 =
				successorsN[(numSuccessorsN + 1) & 1]->getName().str();

				// If we have 2 successors e.g. %for.cond47.preheader.preheader,
				// %for.cond6.preheader.preheader
				// we choose the one with smaller ID (i.e., in this case 6) number first.
				// #define FOR_COND_STR "for.cond"
				std::string FOR_COND_STR = "for.cond";
				if (startsWith(strSuccName0, FOR_COND_STR) &&
				startsWith(strSuccName1, FOR_COND_STR)) {
				LLVM_DEBUG(dbgs() << "DFS(): strSuccName0 = " << strSuccName0 << "\n");
				LLVM_DEBUG(dbgs() << "DFS(): strSuccName1 = " << strSuccName1 << "\n");

				std::string strSuccNameId0, strSuccNameId1;
				strSuccNameId0 = strSuccName0.substr(FOR_COND_STR.size());
				strSuccNameId1 = strSuccName1.substr(FOR_COND_STR.size());
				strSuccNameId0 = strSuccNameId0.substr(0, strSuccNameId0.find('.'));
				strSuccNameId1 = strSuccNameId1.substr(0, strSuccNameId1.find('.'));

				LLVM_DEBUG(dbgs() << "DFS(): strSuccNameId0 = " << strSuccNameId0
				<< "\n");
				LLVM_DEBUG(dbgs() << "DFS(): strSuccNameId1 = " << strSuccNameId1
				<< "\n");

				if (atoi(strSuccNameId0.c_str()) < atoi(strSuccNameId1.c_str())) {
				#ifdef ADAPTED_RPO
				// The 1st successor has the smaller ID. With reverse post-order, the
				// successor visited last is emitted first, so we visit the 2nd
				// successor first.
				LLVM_DEBUG(dbgs() << "DFS(): Changing order of the 2 successors.\n");

				DFS(successorsN[(numSuccessorsN + 1) & 1]);
				// DFS(successorsN[numSuccessorsN & 1]);

				// IMPORTANT: Never give return since at the end of this function
				// we insert finishingTimeMBB.
				#else
				assert(0 && "NOT implemented.");
				// DFS(successorsN[numSuccessorsN & 1]);
				#endif
				}
				}

				// Addressing case encountered for MatMul-512.i16, TS_182_74
				// If we have 2 successors if.then... and if.else...
				// we choose if.then... first.
				#define IF_THEN "if.then"
				#define IF_ELSE "if.else"
				//
				#define IF_BODY1 "for.body4.us.preheader"
				// MEGA-TODO: this is a NON-general solution
				#define IF_BODY2 "for.body4.preheader"
				if ((startsWith(strSuccName0, IF_THEN) &&
				startsWith(strSuccName1, IF_ELSE)) ||
				(startsWith(strSuccName0, IF_BODY1) &&
				startsWith(strSuccName1, IF_BODY2))) {
				LLVM_DEBUG(dbgs() << "DFS(): strSuccName0 = " << strSuccName0 << "\n");
				LLVM_DEBUG(dbgs() << "DFS(): strSuccName1 = " << strSuccName1 << "\n");

				#ifdef ADAPTED_RPO
				// We want the 1st successor (e.g. if.then...) emitted first. With
				// reverse post-order, the successor visited last is emitted first,
				// so we visit the 2nd successor first.
				LLVM_DEBUG(dbgs() << "DFS(): Changing order of the 2 successors.\n");

				DFS(successorsN[(numSuccessorsN + 1) & 1]);
				// DFS(successorsN[numSuccessorsN & 1]);

				// IMPORTANT: Never give return since at the end of this function
				// we insert finishingTimeMBB.
				#else
				assert(0 && "NOT implemented.");
				// DFS(successorsN[numSuccessorsN & 1]);
				#endif
				}
				}

				for (auto MBB : n->successors()) {
				DFS(MBB);
				}

				#ifdef ADAPTED_RPO
				// See http://www.cplusplus.com/reference/map/map/insert/
				finishingTimeMBB.insert(
				std::pair<MachineBasicBlock *, int>(n, sortedListMBB.size()));
				sortedListMBB.push_back(n);
				#endif
				}

				static bool compareBasicBlocks(MachineBasicBlock &b1, MachineBasicBlock &b2) {
				LLVM_DEBUG(dbgs() << "compareBasicBlocks(): finishingTimeMBB[&b1] = "
				<< finishingTimeMBB[&b1] << ", finishingTimeMBB[&b2] = "
				<< finishingTimeMBB[&b2] << ".\n");

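				#ifdef ADAPTED_RPO
				return finishingTimeMBB[&b1] > finishingTimeMBB[&b2];
				#else
				// Plain DFS pre-order: sort by ascending visit index. Without this
				// branch the function would end without returning a value.
				return finishingTimeMBB[&b1] < finishingTimeMBB[&b2];
				#endif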
				}

				void sortMBBs(MachineFunction &MF) {
				MachineBasicBlock *entryMBB = NULL;

				LLVM_DEBUG(dbgs() << "Printing the MBBs, as they are ordered now:\n");
				// Looking at http://llvm.org/doxygen/classllvm_1_1MachineFunction.html
				// it seems it's not possible to obtain the root(s) of the MB otherwise.
				for (auto &MBB : MF) {
				if (entryMBB == NULL)
				entryMBB = &MBB;
				std::string strMBB = MBB.getName().str();
				LLVM_DEBUG(dbgs() << " BB name = " << strMBB << "\n");
				}

				// We now compute the order of the CFG node (BB) traversal
				visitedMBB.clear();
				finishingTimeMBB.clear();
				sortedListMBB.clear();
				//
				DFS(entryMBB);
				// Small Note: We can get inspired from the ReversePostOrderTraversal
				// LLVM class and create our adapted RPO order class, but note that
				// using ReversePostOrderTraversal doesn't change the order of MBBs in
				// the MF object, which is REQUIRED by the EmitFunctionBody() method,
				// which iterates over the MBBs of MF. This is why we perform the
				// somewhat-strange MF.sort() below.
				// See http://llvm.org/doxygen/X86WinAllocaExpander_8cpp_source.html#l00146
				// ReversePostOrderTraversal<MachineFunction*> RPO(&MF);
				// for (MachineBasicBlock *MBB : RPO) {...}
				// See also https://llvm.org/doxygen/PostOrderIterator_8h_source.html#l00259

				#ifdef ADAPTED_RPO
				LLVM_DEBUG(dbgs() << "ConnexAsmPrinter: ADAPTED_RPO sortedListMBB = \n");
				for (int idxSListMBB = sortedListMBB.size() - 1; idxSListMBB >= 0;
				idxSListMBB--)
				#else
				LLVM_DEBUG(dbgs() << "ConnexAsmPrinter: DFS order sortedListMBB =.\n");
				for (int idxSListMBB = 0; idxSListMBB < sortedListMBB.size(); idxSListMBB++)
				#endif
				{
				MachineBasicBlock *MBB = sortedListMBB[idxSListMBB];

				std::string strMBB = MBB->getName().str();
				LLVM_DEBUG(dbgs() << " BB name = " << strMBB << ", MBB = " << MBB
				<< "\n");
				}

				// For calling a templated function
				// see http://www.cplusplus.com/doc/oldtutorial/templates/
				MF.sort<CompareBBs>(compareBasicBlocks);

				LLVM_DEBUG(dbgs() << " After sort():\n");
				for (auto &MBB : MF) {
				std::string strMBB = MBB.getName().str();
				LLVM_DEBUG(dbgs() << " BB name = " << strMBB << "\n");
				}
				} // End sortMBBs()

				/// Emit the specified function out to the OutStreamer.
				bool runOnMachineFunction(MachineFunction &MF) override {
				LLVM_DEBUG(dbgs() << "Entered ConnexAsmPrinter::runOnMachineFunction().\n");
				LLVM_DEBUG(dbgs() << " EnableCorrectBBsASMPrint = "
				<< EnableCorrectBBsASMPrint << "\n");

				// We sort the BBs of the MF in a better order to be able to use our
				// ReplaceLoopsWithOpincaaKernels tool to extract correctly the vector
				// kernels, in SIMPLE TEXTUAL order, from the .s file generated here.
				sortMBBs(MF);

				int numVectorizedLoops = 0;
				bool TreatRepeat2ndInnerLoopGlobal = false;

				// We read from FILENAME_LOOPNESTS_LOCATIONS the configuration of the loop
				// nests in order to fill correctly the std::vector
				// treatRepeat2ndInnerLoop, which we use below.
				readLoopsLocFile(const_cast<char *>(FILENAME_LOOPNESTS_LOCATIONS), true);
				LLVM_DEBUG(
				dbgs() << "runOnMachineFunction(): treatRepeat2ndInnerLoop.size() = "
				<< treatRepeat2ndInnerLoop.size() << "\n");

				if (EnableCorrectBBsASMPrint) {
				this->MF = &MF;

				// Inspired from ConnexRegisterInfo.cpp:
				// const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();

				// Inspired from llvm.org/docs/doxygen/html/AsmPrinter_8cpp_source.html:

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				for (auto &MBB : MF) {
				if (numVectorizedLoops >= (int)treatRepeat2ndInnerLoop.size())
				TreatRepeat2ndInnerLoopGlobal = false;
				else
				TreatRepeat2ndInnerLoopGlobal =
				treatRepeat2ndInnerLoop[numVectorizedLoops];

				LLVM_DEBUG(
				dbgs() << "runOnMachineFunction(): TreatRepeat2ndInnerLoopGlobal = "
				<< TreatRepeat2ndInnerLoopGlobal << "\n");
				LLVM_DEBUG(dbgs() << "runOnMachineFunction(): numVectorizedLoops = "
				<< numVectorizedLoops << "\n");

				if (TreatRepeat2ndInnerLoopGlobal == true) {
				// TODO: think a bit: we should always call moveToFrontRepeat()
				// - we complicate a bit, BUT it is highly unlikely to have a
				// REPEAT() after the last vector.body

				// A bit inefficient - we try all MBB
				moveToFrontRepeat(&MBB);
				} else {
				// If we do this we risk to have comments like "Map/Reduction part"
				// after the REPEAT OPINCAA instruction.
				moveToFrontRepeat(&MBB);
				}

				// NOTE: We need to do this check because if we try to split in the
				// LoopVectorize pass MBB, it will get merged back into one BB after
				// LV, in opt.
				if (isMBBWithInlineAsmString(&MBB, STR_OPINCAA_CODE_END) == false) {
				LLVM_DEBUG(dbgs() << "isMBBWithInlineAsmString(STR_OPINCAA_CODE_END) "
				"returned false\n");
				// We take care to put the beginning marker for OPINCAA kernel at the
				// very front of its basic block, MBB - we try all MBBs.
				LLVM_DEBUG(dbgs()
				<< "Calling moveToFrontInlineAsm(STR_OPINCAA_CODE_BEGIN) "
				"for MBB = "
				<< MBB.getName() << "\n");

				moveToFrontInlineAsm(&MBB,
				const_cast<char *>(STR_OPINCAA_CODE_BEGIN));

				LLVM_DEBUG(dbgs()
				<< "Finished calling "
				"moveToFrontInlineAsm(STR_OPINCAA_CODE_BEGIN)\n");
				}

				if (isVectorBody(MBB.getName()) == false)
				continue;

				numVectorizedLoops++;

				// moveToFrontRepeat(MBB);
				//
				// replaceWithSymbolicIndex(&MBB);
				/* Important:
				* We move the Inline ASM expressions to the beginning of the BB,
				* by using moveToFront(),
				* such that, immediately after (see code below) we put the
				* instructions of the predecessor of the vector.body BB
				* at the top and then call moveToFront(&MBB, true) again
				* to make the code OK.
				*/
				// moveToFront(&MBB, false);

				MachineBasicBlock *predMBBGood;
				int numPredecessors = 0;
				for (auto predMBB : MBB.predecessors()) {
				numPredecessors++;

				if (isVectorBody(predMBB->getName()) == true)
				continue;
				else
				predMBBGood = predMBB;
				}

				// I guess normally we should have 2 predecessors, but since I mess
				// up in LoopVectorize.cpp the vector.body block in some cases
				// (e.g., with a few iterations, in the order of magnitude of the
				// vector unit width) it can remain with only 1 predecessor.
				assert(numPredecessors <= 2 && "vector.body should have at most "
				"2 predecessors: itself and one more");

				if (TreatRepeat2ndInnerLoopGlobal == false) {
				// copyInstructionsFromPred(MF, MBB, predMBBGood);

				// We move the header of the OPINCAA kernel
				moveToFront(predMBBGood, true);
				}

				// Does NOT help: moveToFront(&MBB, true);
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): calling "
				arsenm (Unsubmitted, Not Done): Remove all of these type of macros
				"moveToFrontInlineAsm(&MBB)\n");
				// moveToFront(&MBB, false);
				moveToFrontInlineAsm(&MBB, const_cast<char *>("for ("));

				if (TreatRepeat2ndInnerLoopGlobal == true) {
				moveToBackLastInlineAsm(&MBB);
				}
				} // End for (auto &MBB : MF)
				} // End if EnableCorrectBBsASMPrint

				SetupMachineFunction(MF);
				emitFunctionBody();

				return false;
				} // End bool runOnMachineFunction(MachineFunction &MF)

				void printOperand(const MachineInstr *MI, int OpNum, raw_ostream &O,
				const char *Modifier = nullptr);

				void emitInstruction(const MachineInstr *MI) override;

				// Taken from the MSP430 back end
				void printSrcMemOperand(const MachineInstr *MI, int OpNum, raw_ostream &O);

				bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo,
				unsigned AsmVariant, const char *ExtraCode,
				raw_ostream &OS) {
				LLVM_DEBUG(dbgs() << "Entered PrintAsmMemoryOperand()\n");
				return false;
				}

				bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo,
				unsigned AsmVariant, const char *ExtraCode,
				raw_ostream &OS) {
				LLVM_DEBUG(dbgs() << "Entered PrintAsmOperand()\n");
				return false;
				}

				void PrintSpecial(const MachineInstr *MI, raw_ostream &OS,
				const char *Code) const {
				LLVM_DEBUG(dbgs() << "Entered PrintSpecial()\n");
				}

				void printOffset(int64_t Offset, raw_ostream &OS) const {
				LLVM_DEBUG(dbgs() << "Entered printOffset()\n");
				}
				}; // End class ConnexAsmPrinter

				} // End namespace

				// TODO: remove since it seems it's NOT called
				void ConnexAsmPrinter::printOperand(const MachineInstr *MI, int OpNum,
				raw_ostream &O, const char *Modifier) {
				LLVM_DEBUG(dbgs() << "Entered ConnexAsmPrinter::printOperand()\n");
				const MachineOperand &MO = MI->getOperand(OpNum);

				switch (MO.getType()) {
				case MachineOperand::MO_Register:
				O << ConnexInstPrinter::getRegisterName(MO.getReg());
				break;

				case MachineOperand::MO_Immediate: {
				unsigned imm = MO.getImm();
				LLVM_DEBUG(dbgs() << "printOperand(): imm = " << imm << "\n");

				if (imm == CONNEX_MEM_NUM_ROWS + 10) {
				O << STR_LOOP_SYMBOLIC_INDEX;
				} else {
				O << MO.getImm();
				}
				// O << MO.getImm();
				break;
				}

				case MachineOperand::MO_MachineBasicBlock:
				O << *MO.getMBB()->getSymbol();
				break;

				case MachineOperand::MO_GlobalAddress:
				O << *getSymbol(MO.getGlobal());
				break;

				default:
				llvm_unreachable("<unknown operand type>");
				}
				}

				void ConnexAsmPrinter::printSrcMemOperand(const MachineInstr *MI, int OpNum,
				raw_ostream &O) {
				const MachineOperand &Base = MI->getOperand(OpNum);
				const MachineOperand &Disp = MI->getOperand(OpNum + 1);

				// Print displacement first

				// Imm here is in fact global address - print extra modifier.
				if (Disp.isImm() && !Base.getReg())
				O << '&';

				printOperand(MI, OpNum + 1, O, "nohash");

				// Print register base field
				if (Base.getReg()) {
				O << '(';
				printOperand(MI, OpNum, O);
				O << ')';
				}
				}

				void ConnexAsmPrinter::emitInstruction(const MachineInstr *MI) {
				// We need to store the correspondence between MachineInstr and the lowered
				// MCInst, since MCInst does not.
				// This could be used in ConnexInstPrinter.cpp.
				// static const MachineInstr *crtMI;

				LLVM_DEBUG(dbgs() << "Entered ConnexAsmPrinter::emitInstruction()...\n");

				/* Inspired from lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
				(actually it's class AMDGPUAsmPrinter)
				*/
				if (MI->isBundle()) {
				LLVM_DEBUG(dbgs() << " emitInstruction(): handling bundle\n");
				const MachineBasicBlock *MBB = MI->getParent();
				// MachineBasicBlock::const_instr_iterator I = ++MI->getIterator();
				MachineBasicBlock::const_instr_iterator I = MI->getIterator();
				I++;

				/*
				From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				bool isInsideBundle () const
				Return true if MI is in a bundle (but not the first MI in a bundle).
				*/
				while (I != MBB->instr_end() && I->isInsideBundle()) {
				emitInstruction(&(*I));
				++I;
				}

				return;
				}

				ConnexMCInstLower MCInstLowering(OutContext, *this);

				MCInst TmpInst;
				MCInstLowering.Lower(MI, TmpInst);

				// crtMI = MI;

				EmitToStreamer(*OutStreamer, TmpInst);
				} // End ConnexAsmPrinter::emitInstruction()

				// Force static initialization.
				extern "C" void LLVMInitializeConnexAsmPrinter() {
				RegisterAsmPrinter<ConnexAsmPrinter> Z(TheConnexTarget);
				}
				arsenm (Unsubmitted, Done): Drop dead code

llvm/lib/Target/Connex/ConnexAsmPrinterLoopNests.h

This file was added.

				//===-- ConnexAsmPrinterLoopNests.h ----------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// This file implements reading the FILENAME_LOOPNESTS_LOCATIONS file with info
				/// about start and end locations of loops nests, generated by the
				/// LoopVectorize pass.
				// Used by ConnexAsmPrinter.cpp and ReplaceLoopsWithOpincaaKernels.cpp.
				//===----------------------------------------------------------------------===//

				#ifndef CONNEX_ASM_PRINTER_LOOP_NESTS_H
				#define CONNEX_ASM_PRINTER_LOOP_NESTS_H

				// Used by ReplaceLoopsWithOpincaaKernels.cpp and ConnexAsmPrinter.cpp

				std::vector<bool> treatRepeat2ndInnerLoop;
				// The start and end of the innermost (or 2nd innermost) loop
				std::vector<int> linStart, colStart, linEnd, colEnd;
				//
				std::vector<int> linStartLoopNest, colStartLoopNest, linEndLoopNest,
				colEndLoopNest;

				/*
				We read into vectors the lines and columns of the innermost loop
				and, if there is one, also of the outermost loop
				of the loop nests specified in the FILENAME_LOOPNESTS_LOCATIONS file.
				We push true into the treatRepeat2ndInnerLoop vector
				if the loop nest has more than 1 loop,
				false otherwise.

				Note: We keep the numbering from 1 throughout the ENTIRE program,
				BUT in FindEndLoop() we decrement the value.
				*/
				void readLoopsLocFile(char *fileNameSrc, bool silentFail = false) {
				int index;
				char str[MAXLEN_STR];

				int linStartTmp, colStartTmp;
				int linEndTmp, colEndTmp;

				FILE *fin = fopen(fileNameSrc, "rt");

				/* We need to process each loop, from the last in the file to the first,
				therefore preserving the line & column numbers of the loops that
				remain to be replaced.
				*/
				if (silentFail) {
				if (fin == NULL) {
				printf("%s file NOT found (maybe NO loop was vectorized)",
				FILENAME_LOOPNESTS_LOCATIONS);
				return;
				}
				}
				assert(fin != NULL &&
				"readLoopsLocFile(): fileNameSrc (e.g., FILENAME_LOOPNESTS_LOCATIONS) "
				"file NOT found (maybe NO loop was vectorized). "
				"Anyhow cannot automatically replace in source file vectorized loops "
				"with OPINCAA kernels.");

				for (index = 0;; index++) {
				// We read the line with the C++ comment and discard it
				if (fgets(str, MAXLEN_STR - 1, fin) == NULL)
				break;

				printf("str = %s\n", str);
				fflush(stdout);

				// We read the coordinates of the innermost loop of the crt nest
				int res = fscanf(fin, "%d %d %d %d\r\n", &linStartTmp, &colStartTmp,
				&linEndTmp, &colEndTmp);
				(void)res;
				//
				printf("readLoopsLocFile(): index = %d\n", index);

				arsenm (Unsubmitted, Done): Definitely shouldn't be using fgets/fscanf anywhere. File usage anywhere like this is dubious
				printf("readLoopsLocFile(): (linStart = %d, colStart = %d) -> "
				"(linEndTmp = %d, colEndTmp = %d)\n",
				linStartTmp, colStartTmp, linEndTmp, colEndTmp);
				fflush(stdout);
				//
				linStart.push_back(linStartTmp);
				colStart.push_back(colStartTmp);
				linEnd.push_back(linEndTmp);
				colEnd.push_back(colEndTmp);
				assert(linStartTmp <= linEndTmp);

				// We check if the next line is one with C++ comment
				int ch = getc(fin);
				ungetc(ch, fin);

				printf("readLoopsLocFile(): ch = %d\n", (int)ch);
				fflush(stdout);

				if ((ch == '/') || (ch == -1)) {
				treatRepeat2ndInnerLoop.push_back(false);

				linStartLoopNest.push_back(-1);
				colStartLoopNest.push_back(-1);
				linEndLoopNest.push_back(-1);
				colEndLoopNest.push_back(-1);
				} else {
				// We read the coordinates of the outermost loop of the crt nest
				treatRepeat2ndInnerLoop.push_back(true);

				int res = fscanf(fin, "%d %d %d %d\r\n", &linStartTmp, &colStartTmp,
				&linEndTmp, &colEndTmp);
				(void)res;
				printf("readLoopsLocFile(): (linStart = %d, colStart = %d) -> "
				"(linEndTmp = %d, colEndTmp = %d)\n",
				linStartTmp, colStartTmp, linEndTmp, colEndTmp);
				fflush(stdout);

				linStartLoopNest.push_back(linStartTmp);
				colStartLoopNest.push_back(colStartTmp);
				linEndLoopNest.push_back(linEndTmp);
				colEndLoopNest.push_back(colEndTmp);
				}

				printf("readLoopsLocFile(): treatRepeat2ndInnerLoop[%d] = %d\n", index,
				(int)treatRepeat2ndInnerLoop[index]);
				fflush(stdout);
				}

				fclose(fin);
				} // End readLoopsLocFile()

				#endif // End CONNEX_ASM_PRINTER_LOOP_NESTS_H
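
Regarding the reviewer's concern about fgets/fscanf in readLoopsLocFile(): a minimal sketch of how the same records could be read through C++ streams instead. The record layout is only inferred from the code above (one comment line starting with '/', one line with four integers for the inner loop, and an optional second line with four integers for the outer loop). LoopSpan, LoopNestLoc, parseSpan and parseLoopNests are hypothetical names that do not appear in the patch.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types, not part of the patch.
struct LoopSpan {
  int LineStart = -1, ColStart = -1, LineEnd = -1, ColEnd = -1;
};

struct LoopNestLoc {
  LoopSpan Inner;        // innermost (or 2nd innermost) loop
  LoopSpan Outer;        // meaningful only when HasOuter is true
  bool HasOuter = false; // plays the role of treatRepeat2ndInnerLoop
};

// Parses one "lineStart colStart lineEnd colEnd" line; false if malformed.
static bool parseSpan(const std::string &Line, LoopSpan &S) {
  std::istringstream SS(Line);
  return static_cast<bool>(SS >> S.LineStart >> S.ColStart >> S.LineEnd >>
                           S.ColEnd);
}

// Reads all loop nest records from In, using the assumed layout described
// in the lead-in. Parsing stops at the first malformed record.
std::vector<LoopNestLoc> parseLoopNests(std::istream &In) {
  std::vector<LoopNestLoc> Nests;
  std::string Line;
  bool HaveLine = static_cast<bool>(std::getline(In, Line));
  while (HaveLine) {
    // Line holds the comment line of the record; it is discarded.
    if (!std::getline(In, Line))
      break;
    LoopNestLoc Nest;
    if (!parseSpan(Line, Nest.Inner))
      break;
    // Look at the next line: a comment starts a new record, anything else
    // is taken to be the outer loop of the current record.
    HaveLine = static_cast<bool>(std::getline(In, Line));
    if (HaveLine && !Line.empty() && Line[0] != '/') {
      Nest.HasOuter = parseSpan(Line, Nest.Outer);
      HaveLine = static_cast<bool>(std::getline(In, Line));
    }
    Nests.push_back(Nest);
  }
  return Nests;
}
```

Taking a std::istream rather than a file name keeps the parser testable on in-memory strings and leaves the decision of where the data comes from to the caller.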

llvm/lib/Target/Connex/ConnexCallingConv.td

This file was added.

				//===-- ConnexCallingConv.td - Calling Conventions Connex ---*- tablegen -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This describes the calling conventions for the Connex architecture.
				//
				//===----------------------------------------------------------------------===//

				// Connex 64-bit C return-value convention.
				def RetCC_Connex64 : CallingConv<[CCIfType<[i64], CCAssignToReg<[R0]>>]>;

				// Connex 64-bit C Calling convention.
				def CC_Connex64 : CallingConv<[
				// Promote i8/i16/i32 args to i64
				// TODO_CHANGE_BACKEND:
				CCIfType<[ i8, i16, i32 ], CCPromoteToType<i64>>,
				//CCIfType<[ i8, i16, i32 ], CCPromoteToType<i32>>,

				// All arguments get passed in integer registers if there is space.
				CCIfType<[i64], CCAssignToReg<[ R1, R2, R3, R4, R5 ]>>,
				//CCIfType<[i32], CCAssignToReg<[ R1, R2, R3, R4, R5 ]>>,

				// Could be assigned to the stack in 8-byte aligned units, but unsupported
				CCAssignToStack<8, 8>
				]>;

				def CSR : CalleeSavedRegs<(add R6, R7, R8, R9, R10)>;
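
To make the effect of CC_Connex64 concrete, here is a small toy model of the assignment order it describes: integer arguments (after promotion to i64) go to R1..R5 while registers remain, and anything beyond that falls through to 8-byte, 8-byte-aligned stack slots. This is only an illustration of the TableGen rules above, not code from the patch; ArgLoc and assignArgs are hypothetical names, and the patch comment itself states that the stack case is unsupported by the back end.

```cpp
#include <vector>

// Hypothetical description of where one argument ends up.
struct ArgLoc {
  bool InReg;           // true: passed in register R<RegNo>
  unsigned RegNo;       // valid only when InReg is true
  unsigned StackOffset; // valid only when InReg is false
};

// Toy model of CC_Connex64 for NumIntArgs integer arguments that have
// already been promoted to i64.
std::vector<ArgLoc> assignArgs(unsigned NumIntArgs) {
  const unsigned FirstArgReg = 1; // R1
  const unsigned NumArgRegs = 5;  // R1..R5
  const unsigned SlotSize = 8;    // CCAssignToStack<8, 8>
  std::vector<ArgLoc> Locs;
  unsigned NextStackOffset = 0;
  for (unsigned I = 0; I < NumIntArgs; ++I) {
    if (I < NumArgRegs) {
      Locs.push_back({true, FirstArgReg + I, 0});
    } else {
      Locs.push_back({false, 0, NextStackOffset});
      NextStackOffset += SlotSize;
    }
  }
  return Locs;
}
```

With seven arguments, the first five land in R1..R5 and the remaining two at stack offsets 0 and 8 in this model.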

llvm/lib/Target/Connex/ConnexConfig.h

This file was added.

				//===-- ConnexConfig.h -----------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				///
				//===----------------------------------------------------------------------===//

				#ifndef CONNEX_CONFIG_H
				#define CONNEX_CONFIG_H

				// This file is used by ConnexISelDAGToDAG.cpp, ConnexISelLowering.h,
				// ReplaceLoopsWithOpincaaKernels.cpp.

				// The macros in this header file are strategic, in the sense that the back end
				// could target a Connex vector processor of different vector length.
				// There are also some other important macros like: CONNEX_MEM_NUM_ROWS_EXTRA
				// (used to keep spilled registers, or tables for f16 operations like sqrt
				// or div, etc), STR_OPINCAA, etc.

				// These 2 types are defined also in OPINCAA lib, in include/Architecture.h
				typedef short TypeElement;
				typedef unsigned short UnsignedTypeElement;

				// The vector length of the Connex-S back end, which could be different
				// from the actual vector length of the Connex-S processor.
				#define CONNEX_VECTOR_LENGTH 8

				// TYPE is the type of an element of a Connex-S vector register
				#define TYPE_SIZEOF 2
				#define CONNEX_LINE_SIZE (CONNEX_VECTOR_LENGTH * TYPE_SIZEOF)

				//#define STR_LOOP_SYMBOLIC_INDEX "indexLLVM_LV / CONNEX_VECTOR_LENGTH"
				// NOTE: make sure it is equivalent to the above commented macro
				// NOTE: keep the parentheses since >> has low operator priority
				#define STR_LOOP_SYMBOLIC_INDEX "(indexLLVM_LV >> 7)"

				// This is the type of the scalar processor (normally the BPF processor) operand
				// TODO_CHANGE_BACKEND:
				#define TYPE_SCALAR_ELEMENT MVT::i64
				// #define TYPE_ELEMENT MVT::i32

				// #define TYPE_VECTOR MVT::v8i64
				// #define TYPE_VECTOR MVT::v16i32
				// #define TYPE_VECTOR MVT::v32i16
				// #define TYPE_VECTOR_I16 MVT::v128i16
				#define TYPE_VECTOR_I16 MVT::v8i16
				#define TYPE_VECTOR_I16_EXT_I64 MVT::v8i64
				// #define TYPE_VECTOR_ELEMENT MVT::i64
				#define TYPE_VECTOR_I16_ELEMENT MVT::i16

				// #define TYPE_VECTOR_I32 MVT::v64i32
				#define TYPE_VECTOR_I32 MVT::v4i32
				#define TYPE_VECTOR_I32_EXT_I64 MVT::v4i64
				#define TYPE_VECTOR_I32_ELEMENT MVT::i32

				#define TYPE_VECTOR_I64 MVT::v2i64
				// #define TYPE_VECTOR_I64_EXT_I64 MVT::v2i64
				#define TYPE_VECTOR_I64_ELEMENT MVT::i64

				// #define TYPE_VECTOR_F16 MVT::v128f16
				#define TYPE_VECTOR_F16 MVT::v8f16
				#define TYPE_VECTOR_F16_ELEMENT MVT::f16

				#define TYPE_VECTOR_I16_ELEMENT_BITSIZE 16
				#define TYPE_VECTOR_I32_ELEMENT_BITSIZE 32
				#define TYPE_VECTOR_F16_ELEMENT_BITSIZE 16

				// This constant is used as an offset to inform from LoopVectorize pass to
				// the ConnexInstPrinter that the respective address of the LD_H or ST_H
				// Connex-S instruction is actually symbolic (and the symbolic value
				// can be found in the associated InlineAsm expression for it).
				#define CONNEX_MEM_CONSTANT_OFFSET 1000

				#define CONNEX_MEM_NUM_ROWS 1024
				// For 64 lanes: #define CONNEX_MEM_NUM_ROWS 2048
				// Extra LS memory for spills and LUTs for div/sqrt.f16, etc
				#define CONNEX_MEM_NUM_ROWS_EXTRA 200
				#define CONNEX_MEM_NUM_ROWS_EXTRA_FOR_SPILL 50

				// NOTE: normally REPEAT accepts immediates in interval 0..1023
				#define VALUE_BOGUS_REPEAT_X_TIMES 32761

				// #ifndef MAXLEN_STR
				#define MAXLEN_STR 8192
				// #endif

				// Used in ConnexAsmPrinter.cpp and LoopVectorize.cpp
				#define STR_OPINCAA_CODE_BEGIN "// START_OPINCAA_HOST_DEVICE_CODE"
				#define STR_OPINCAA_CODE_END "// END_OPINCAA_HOST_DEVICE_CODE"
				//
				#define STR_OPINCAA_KERNEL_REDUCE_BEFORE_END \
				"REDUCE R(0); // We add a 'bogus' REDUCE to wait for it"

				#define FILENAME_LOOPNESTS_LOCATIONS "loopsLoc.txt"

				#endif
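
The note next to STR_LOOP_SYMBOLIC_INDEX asks that the shifted form stay equivalent to the division by CONNEX_VECTOR_LENGTH. A minimal sketch of how that invariant could be checked at compile time follows; the helper names are hypothetical and not part of the patch. Note that a shift by 7 corresponds to a vector length of 128, while this header sets CONNEX_VECTOR_LENGTH to 8; whether that difference is intended cannot be told from this header alone.

```cpp
// Hypothetical helper: log2(N) for a power-of-two N, usable in constant
// expressions (single-return form so it also compiles as C++11).
constexpr unsigned log2PowerOfTwo(unsigned N) {
  return N <= 1 ? 0 : 1 + log2PowerOfTwo(N / 2);
}

constexpr bool isPowerOfTwo(unsigned N) {
  return N != 0 && (N & (N - 1)) == 0;
}

// For a non-negative index, (Index >> Shift) equals (Index / Lanes) exactly
// when Lanes is a power of two and Shift == log2(Lanes).
constexpr bool shiftMatchesDivision(unsigned Shift, unsigned Lanes) {
  return isPowerOfTwo(Lanes) && Shift == log2PowerOfTwo(Lanes);
}

static_assert(shiftMatchesDivision(7, 128),
              "a shift of 7 corresponds to 128 lanes");
static_assert(shiftMatchesDivision(3, 8),
              "a shift of 3 corresponds to 8 lanes");
static_assert(!shiftMatchesDivision(7, 8),
              "a shift of 7 does not divide by 8");
```

A check of this kind placed next to the two macros would fail the build as soon as the vector length and the shift amount drift apart.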

llvm/lib/Target/Connex/ConnexFrameLowering.h

This file was added.

				//===-- ConnexFrameLowering.h - Define frame lowering for Connex -*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This class implements Connex-specific bits of TargetFrameLowering class.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXFRAMELOWERING_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXFRAMELOWERING_H

				#include "llvm/CodeGen/TargetFrameLowering.h"

				namespace llvm {
				class ConnexSubtarget;

				class ConnexFrameLowering : public TargetFrameLowering {
				public:
				explicit ConnexFrameLowering(const ConnexSubtarget &sti)
				: TargetFrameLowering(TargetFrameLowering::StackGrowsDown, Align(8), 0) {}

				void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
				void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;

				bool hasFP(const MachineFunction &MF) const override;
				void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
				RegScavenger *RS) const override;

				MachineBasicBlock::iterator
				eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI) const override {
				return MBB.erase(MI);
				}
				};
				} // End namespace llvm
				#endif

llvm/lib/Target/Connex/ConnexFrameLowering.cpp

This file was added.

				//===-- ConnexFrameLowering.cpp - Connex Frame Information ----------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of TargetFrameLowering class.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexFrameLowering.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexSubtarget.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"

				using namespace llvm;

				bool ConnexFrameLowering::hasFP(const MachineFunction &MF) const {
				return true;
				}

				void ConnexFrameLowering::emitPrologue(MachineFunction &MF,
				MachineBasicBlock &MBB) const {}

				void ConnexFrameLowering::emitEpilogue(MachineFunction &MF,
				MachineBasicBlock &MBB) const {}

				void ConnexFrameLowering::determineCalleeSaves(MachineFunction &MF,
				BitVector &SavedRegs,
				RegScavenger *RS) const {
				TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
				SavedRegs.reset(Connex::R6);
				SavedRegs.reset(Connex::R7);
				SavedRegs.reset(Connex::R8);
				SavedRegs.reset(Connex::R9);
				}

llvm/lib/Target/Connex/ConnexHazardRecognizer.h

This file was added.

				//===-- ConnexHazardRecognizer.h - Hazard recognizer for Connex -*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				///
				//===----------------------------------------------------------------------===//

				/* Inspired from llvm/lib/Target/PowerPC/PPCHazardRecognizer.h:
				/// PPCDispatchGroupSBHazardRecognizer - This class implements a
				/// scoreboard-based
				/// hazard recognizer for PPC ooo processors with dispatch-group hazards.
				*/

				#ifndef LLVM_LIB_TARGET_CONNEX_HAZARDRECOGNIZER_H
				#define LLVM_LIB_TARGET_CONNEX_HAZARDRECOGNIZER_H

				#include "ConnexInstrInfo.h"
				#include "llvm/CodeGen/ScheduleHazardRecognizer.h"
				#include "llvm/CodeGen/ScoreboardHazardRecognizer.h"
				#include "llvm/CodeGen/SelectionDAGNodes.h"

				namespace llvm {

				/* NOTE: ScheduleHazardRecognizer is basically an "interface"
				* (almost abstract, i.e. almost no functionality implemented) class, so better
				* stick with ScoreboardHazardRecognizer if its functionality is OK for me:
				*/

				/* We choose to inherit the ScoreboardHazardRecognizer because only this
				* performs out-of-order scheduling, and NOT ScheduleHazardRecognizer.
				*/
				class ConnexDispatchGroupSBHazardRecognizer
				: public ScoreboardHazardRecognizer {
				const ScheduleDAG *DAG;
				bool isDataHazard(SUnit *SU);

				public:
				ConnexDispatchGroupSBHazardRecognizer(const InstrItineraryData *ItinData,
				const ScheduleDAG *DAG_)
				: ScoreboardHazardRecognizer(ItinData, DAG_), DAG(DAG_) {}

				HazardType getHazardType(SUnit *SU, int Stalls) override;

				unsigned PreEmitNoops(SUnit *SU) override;
				void EmitInstruction(SUnit *SU) override;
				};

				} // End namespace llvm

				#endif

llvm/lib/Target/Connex/ConnexHazardRecognizer.cpp

This file was added.

				//===-- ConnexHazardRecognizer.cpp - Connex Hazard Recognizer Impls -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements hazard recognizer for scheduling on Connex processor.
				//
				//===----------------------------------------------------------------------===//

				// Inspired from llvm/lib/Target/PowerPC/PPCHazardRecognizer.cpp

				#include "ConnexHazardRecognizer.h"
				#include "Connex.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/CodeGen/ScheduleDAG.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/raw_ostream.h"
				//
				#define INCLUDE_SUNIT_DUMP
				#include "Misc.h" // For dumpSU()

				using namespace llvm;

				#define DEBUG_TYPE "post-RA-sched"

				// getPredMachineInstr() is declared in ConnexInstrInfo.cpp
				extern MachineInstr *getPredMachineInstr(MachineInstr *MI,
				MachineInstr **succMI);

				/*
				From llvm.org/docs/doxygen/html/ScheduleHazardRecognizer_8h_source.html:
				/// PreEmitNoops - This callback is invoked prior to emitting an instruction.
				/// It should return the number of noops to emit prior to the provided
				/// instruction.
				/// Note: This is only used during PostRA scheduling. EmitNoop is not called
				/// for these noops.
				*/
				unsigned ConnexDispatchGroupSBHazardRecognizer::PreEmitNoops(SUnit *SU) {
				assert(SU->isInstr() == true);

				if (isDataHazard(SU))
				return 1;

				return ScoreboardHazardRecognizer::PreEmitNoops(SU);
				}

				bool ConnexDispatchGroupSBHazardRecognizer::isDataHazard(SUnit *SU) {
				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MCInstrDesc.html
				const MCInstrDesc *MCID = DAG->getInstrDesc(SU);
				if (MCID == NULL)
				return false;

				unsigned numUses = MCID->getNumOperands() - MCID->getNumDefs();
				LLVM_DEBUG(dbgs() << " isDataHazard(): numUses = " << numUses << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): MCID->getNumOperands() = "
				<< MCID->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): MCID->getNumDefs() = "
				<< MCID->getNumDefs() << "\n");

				assert(SU->isInstr() == true);

				MachineInstr *MI = SU->getInstr();
				LLVM_DEBUG(dbgs() << " isDataHazard(): MI ="; MI->dump(););

				int MIOpcode = MI->getOpcode();
				LLVM_DEBUG(dbgs() << " isDataHazard(): MI->getOpcode() = " << MI->getOpcode()
				<< "\n");

				if (MIOpcode == Connex::ST_INDIRECT_H || MIOpcode == Connex::ST_INDIRECT_W ||
				MIOpcode == Connex::ST_INDIRECT_MASKED_H || MIOpcode == Connex::ST_H) {
				/* NOTE: END_REPEAT returns, to my surprise, also mayStore().
				But we should not worry about this since END_REPEAT takes no
				parameter. */
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU is Store\n");
				} else if (MIOpcode == Connex::LD_INDIRECT_H ||
				MIOpcode == Connex::LD_INDIRECT_W ||
				MIOpcode == Connex::LD_INDIRECT_MASKED_H) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU is Load\n");
				} else if (MIOpcode == Connex::WHEREEQ_BUNDLE_H ||
				MIOpcode == Connex::WHERELT_BUNDLE_H ||
				MIOpcode == Connex::WHEREULT_BUNDLE_H) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU is Where\n");
				} else {
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU NOT producing data hazard\n");

				// Very important
				return false;
				}

				LLVM_DEBUG(dbgs() << " isDataHazard(): MI->getNumOperands() = "
				<< MI->getNumOperands() << "\n");

				/*
				Why does getHazardType() find 3 Loads - because I was considering pred in
				DAG (SDNode), not in MachineInstr list, where it should be only 1?

				This should cover these cases described in ConnexISA.docx:
				- (i)write using register defined in the previous instruction:
				LS[R1] = R4
				LS[5] = R1
				and also this slightly different case:
				LS[R10] = R1

				- read using register defined in the previous instruction
				R4 = LS[R1]

				- wherexx using the flag defined in the previous instruction
				R1 = (R2 == R3)
				WHERE_EQUAL
				*/

				// small-TODO: understand conceptually what PPC was doing with dispatch group.

				// IMPORTANT: We keep this search for predecessors of SU in the DAG and not
				// for THE only predecessor of the MachineInstr (we are at Post-RA scheduler)
				// contained in SU because MAYBE/it is possible that when doing
				// ScoreboardHazardRecognizer (out-of-order scheduling to fill delay slots)
				// we could benefit from the DAG predecessors - QUITE UNLIKELY, but maybe
				// so. Otherwise, we should ONLY look at the
				// getPredMachineInstr(MachineInstr *MI).
				//
				// For any predecessors of SU with which we
				// have an ordering dependency, return true.
				for (unsigned i = 0, ie = (unsigned)SU->Preds.size(); i != ie; ++i) {
				const MCInstrDesc *PredMCID = DAG->getInstrDesc(SU->Preds[i].getSUnit());

				if (PredMCID == NULL) // || !PredMCID->mayStore())
				continue;

				// SU->Preds is SmallVector of SDep.
				// - see http://llvm.org/docs/doxygen/html/classllvm_1_1SUnit.html
				// - see http://llvm.org/docs/doxygen/html/classllvm_1_1SDep.html
				MachineInstr *PredMI = (SU->Preds[i].getSUnit())->getInstr();
				MachineInstr *tmpNotUsed;

				if (PredMI != getPredMachineInstr(MI, &tmpNotUsed)) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): jumping DAG predecessor that is "
				"NOT MachineInstr predecessor: PredMI =";
				PredMI->dump(); dbgs() << " for MI ="; MI->dump(););
				continue;
				}

				LLVM_DEBUG(dbgs() << " isDataHazard(): Found DAG predecessor that is "
				"MachineInstr predecessor: PredMI =";
				PredMI->dump(); dbgs() << " for MI ="; MI->dump(););

				LLVM_DEBUG(dbgs() << " isDataHazard(SU->Preds[" << i << "] = ";
				PredMI->dump();
				// (SU->Preds[i].getSUnit())->dump(DAG);
				// PredMCID->dump(DAG);
				dbgs() << ")\n");

				// TODO: check more carefully that SU->Preds[i] is THE previous
				// instruction in the list of MachineInstr - .getParent()
				// TODO: we have to check for LD_INDIRECT_H for the memory (offset)
				// register, not the passthrough (or mask).

				unsigned numDefs = PredMCID->getNumDefs();
				LLVM_DEBUG(dbgs() << " isDataHazard(): numDefs = " << numDefs << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMI->getNumOperands() = "
				<< PredMI->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMCID->getNumOperands() = "
				<< PredMCID->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMCID->getNumDefs() = "
				<< PredMCID->getNumDefs() << "\n");

				int idUseStart;
				if (MIOpcode == Connex::LD_INDIRECT_H ||
				MIOpcode == Connex::LD_INDIRECT_W ||
				MIOpcode == Connex::LD_INDIRECT_MASKED_H) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMI->getOpcode() = "
				<< PredMI->getOpcode() << "\n");

				if (PredMI->isInlineAsm()) {
				LLVM_DEBUG(
				dbgs() << " isDataHazard(): PredMI is INLINEASM so return true"
				<< "\n");
				// We assume that the PredMI INLINEASM is NOT a Connex
				// instruction, but a host-side OPINCAA C++ for loop.
				// In such case, we can have 2 data hazards with MI:
				// - one with the instruction above this C++ for statement
				// - one with the instruction at the end of this for loop
				// when we unroll (if the trip-count of the loop is >1)
				// this for loop
				//
				// Important TODO: make full checks and
				// return true only if it
				// is the case, to be more efficient.
				//
				// Important TODO: return true;
				}

				/* %Wh5<def>, %BoolMask1<def,dead> = LD_INDIRECT_MASKED_H %Wh4,
				%BoolMask0, %Wh0;
				mem:LD256[inttoptr (i16 51 to i16*)]
				(tbaa=!12)(alias.scope=!16)
				The arguments ("uses") of LD_INDIRECT_MASKED_H are:
				%Wh4 - I think it is the passthrough register
				(if mask bit is 0 we use passthrough)
				%BoolMask0 - is the mask
				%Wh0 - the offset register (if mask bit is 0 we use passthrough)
				Note that Connex does NOT support masked gather just with read
				(it requires WHERE also and things become more complex than
				just masked gather, in principle)
				*/

				if (MIOpcode == Connex::LD_INDIRECT_MASKED_H) {
				idUseStart = MCID->getNumDefs() + 2; // 1 for passthrough, 1 bool mask
				} else if (MIOpcode == Connex::LD_INDIRECT_H ||
				MIOpcode == Connex::LD_INDIRECT_W) {
				idUseStart = MCID->getNumDefs(); // no passthrough or bool mask to skip
				}
				} else {
				idUseStart = MCID->getNumDefs();
				}

				for (unsigned idUse = idUseStart; idUse < numUses; idUse++) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): MI->getOperand(" << idUse
				<< ") = " << MI->getOperand(idUse) << "\n");
				for (unsigned idDef = 0; idDef < numDefs; idDef++) {
				// See llvm.org/docs/doxygen/html/classllvm_1_1MachineOperand.html
				const MachineOperand &PredMIMO = PredMI->getOperand(idDef);
				const MachineOperand &MIMO = MI->getOperand(idUse);
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMI->getOperand(" << idDef
				<< ") = " << PredMI->getOperand(idDef) << "\n");

				if ((PredMI->getOpcode() != Connex::END_WHERE) &&
				(PredMI->getOpcode() != Connex::WHEREEQ) &&
				(PredMI->getOpcode() != Connex::WHERELT) &&
				(PredMI->getOpcode() != Connex::WHERECRY) && PredMIMO.isReg() &&
				MIMO.isReg() && PredMIMO.getReg() == MIMO.getReg()) {
				LLVM_DEBUG(dbgs()
				<< " isDataHazard(): found an instr sequence "
				"(defReg = PredOpcode; write/read/Where useReg;) and "
				"defReg == useReg. "
				"This sequence has to be separated by NOP to avoid "
				"true dependency hazard\n");
				return true;
				}
				}
				}
				}

				return false;
				}

				ScheduleHazardRecognizer::HazardType
				ConnexDispatchGroupSBHazardRecognizer::getHazardType(SUnit *SU, int Stalls) {
				return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
				}

				void ConnexDispatchGroupSBHazardRecognizer::EmitInstruction(SUnit *SU) {
				unsigned i, ie;

				LLVM_DEBUG(
				dbgs() << "Entered Connex's "
				"ConnexDispatchGroupSBHazardRecognizer::EmitInstruction(";
				dumpSU(SU, dbgs()); dbgs() << ")\n");
				//
				assert(SU->isInstr() == true);
				MachineInstr *MI = SU->getInstr();
				MachineBasicBlock *MBB = MI->getParent();
				LLVM_DEBUG(dbgs() << " EmitInstruction(): MBB = " << MBB->getFullName()
				<< "\n"
				// MBB->dump();
				);

				LLVM_DEBUG(dbgs() << " SU->Succs.size() = " << SU->Succs.size() << "\n");
				LLVM_DEBUG(dbgs() << " SU->Preds.size() = " << SU->Preds.size() << "\n");

				for (i = 0, ie = (unsigned)SU->Succs.size(); i != ie; ++i) {
				MachineInstr *SuccMI = (SU->Succs[i].getSUnit())->getInstr();
				if (SuccMI == NULL) {
				LLVM_DEBUG(dbgs() << " SU->Succs[" << i << "] = NULL\n");
				} else {
				LLVM_DEBUG(dbgs() << " SU->Succs[" << i << "] = "; SuccMI->dump();
				dbgs() << "\n");
				}
				}
				for (i = 0, ie = (unsigned)SU->Preds.size(); i != ie; ++i) {
				MachineInstr *PredMI = (SU->Preds[i].getSUnit())->getInstr();
				if (PredMI == NULL) {
				LLVM_DEBUG(dbgs() << " SU->Preds[" << i << "] = NULL\n");
				} else {
				LLVM_DEBUG(dbgs() << " SU->Preds[" << i << "] = "; PredMI->dump();
				dbgs() << "\n");
				}
				}

				return ScoreboardHazardRecognizer::EmitInstruction(SU);
				}
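
The tail of isDataHazard() above compares every register use of the current instruction with every register definition of its predecessor, skipping the leading use operands (passthrough value, mask) of masked loads and ignoring the WHERE*/END_WHERE predecessors. A minimal, self-contained sketch of that check follows. The `Operand` and `Instr` types and the `hasTrueDependency` name are hypothetical stand-ins for MachineOperand and MachineInstr, not part of the patch, and the sketch assumes that the definitions examined are those of the predecessor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for llvm::MachineOperand.
struct Operand {
  bool IsReg;   // true if the operand is a register
  unsigned Reg; // register number, meaningful only when IsReg is true
};

// Hypothetical, simplified stand-in for llvm::MachineInstr.
struct Instr {
  unsigned NumDefs;              // operands [0, NumDefs) are definitions
  std::vector<Operand> Operands; // definitions first, then uses
  bool IsPredicationMarker;      // models WHERE*/END_WHERE, which are excluded
};

// Returns true when Cur reads a register that Pred defines. Only the uses of
// Cur from index FirstUse onwards are inspected, so the caller can skip
// leading use operands such as a passthrough value or a mask.
bool hasTrueDependency(const Instr &Pred, const Instr &Cur,
                       std::size_t FirstUse) {
  assert(FirstUse >= Cur.NumDefs && "uses start after the definitions");
  if (Pred.IsPredicationMarker)
    return false;
  for (std::size_t U = FirstUse; U < Cur.Operands.size(); ++U) {
    const Operand &Use = Cur.Operands[U];
    if (!Use.IsReg)
      continue;
    for (std::size_t D = 0; D < Pred.NumDefs && D < Pred.Operands.size(); ++D) {
      const Operand &Def = Pred.Operands[D];
      if (Def.IsReg && Def.Reg == Use.Reg)
        return true; // read-after-write on the same register
    }
  }
  return false;
}
```

With this shape, the opcode-specific part reduces to choosing `FirstUse`: the number of definitions for an ordinary instruction, and the number of definitions plus two for a masked indirect load.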

llvm/lib/Target/Connex/ConnexISelDAGToDAG.cpp

This file was added.

				//===-- ConnexISelDAGToDAG.cpp - A dag to dag inst selector for Connex ----===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines a DAG pattern matching instruction selector for Connex,
				// converting from a legalized dag to a Connex dag.
				//
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "ConnexRegisterInfo.h"
				#include "ConnexSubtarget.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/ADT/None.h"
				#include "llvm/CodeGen/MachineConstantPool.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/SelectionDAGISel.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/IntrinsicsConnex.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetMachine.h"
				#include <algorithm>

				#define DEBUG_TYPE "connex-isel"

				// RecoverFromLlvmIR.h must be put after DEBUG_TYPE, since it has LLVM_DEBUG()
				#include "RecoverFromLlvmIR.h"

				using namespace llvm;

				#include "ConnexConfig.h"
				#include "ConnexISelMisc.h"

				/* To help reading ASM code we put some useful comments (INLINE Asm nodes)
				where the emulation of the unsupported operation of type i32/f16/etc
				starts and ends.
				*/
				#define MARKER_FOR_EMULATION

				/* Important: These macros with BITCAST can add hazards due to delay slots.
				We recommend disabling these macros.
				*/
				#define BITCAST_MAY2017_05_28
				// #define BITCAST_2018_06_F16

				#define CrtDAG CurDAG

				static bool isUnitSteppedZeroStartingVector(const BuildVectorSDNode *N) {
				unsigned int nOps = N->getNumOperands();

				assert(nOps > 1 &&
				"isUnitSteppedZeroStartingVector has 0 or 1 sized build vector");

				LLVM_DEBUG(dbgs() << "Entered isUnitSteppedZeroStartingVector()\n");

				for (unsigned int i = 0; i < nOps; ++i) {
				LLVM_DEBUG(dbgs() << "N->getOperand(" << i << ") = ";
				N->getOperand(i)->dump(); dbgs() << "\n");
				}

				LLVM_DEBUG(dbgs() << "Exiting isUnitSteppedZeroStartingVector()\n");

				return true;
				}

				// Instruction Selector Implementation
				namespace {

				class ConnexDAGToDAGISel : public SelectionDAGISel {
				public:
				static char ID;

				explicit ConnexDAGToDAGISel(ConnexTargetMachine &TM)
				: SelectionDAGISel(ID, TM) {}

				StringRef getPassName() const override {
				return "Connex DAG->DAG Pattern Instruction Selection";
				}

				private:
				// Include the pieces autogenerated from the target description.
				#include "ConnexGenDAGISel.inc"

				bool selectVectorAddr(SDNode *Parent, SDValue N, SDValue &Base);

				void selectBUILD_VECTOR(SDNode *Node);
				void selectVECTOR_SHUFFLE(SDNode *Node);

				SDNode *selectVSELECT(SDNode *Node);

				SDNode *selectReduceI32(SDNode *Node);
				SDNode *selectAddI32(SDNode *Node);
				SDNode *selectAbsI32(SDNode *Node);
				SDNode *selectSubI32(SDNode *Node);
				SDNode *selectMulI32(SDNode *Node);
				SDNode *selectSraI32(SDNode *Node);
				//
				SDNode *selectDivI16(SDNode *Node);
				//
				SDNode *selectReduceF16(SDNode *Node);
				SDNode *selectAddF16(SDNode *Node);
				SDNode *selectSubF16(SDNode *Node);
				SDNode *selectMulF16(SDNode *Node);
				SDNode *selectDivF16(SDNode *Node);
				SDNode *selectLtF16(SDNode *Node);

				void Select(SDNode *N) override;

				// Complex Pattern for address selection.
				bool SelectAddr(SDValue Addr, SDValue &Base, SDValue &Offset);
				bool SelectFIAddr(SDValue Addr, SDValue &Base, SDValue &Offset);

				// Added from MipsSEISelDAGToDAG.cpp
				bool selectAddrFrameIndex(SDValue Addr, SDValue &Base, SDValue &Offset) const;
				bool selectAddrFrameIndexOffset(SDValue Addr, SDValue &Base, SDValue &Offset,
				unsigned OffsetBits) const;
				bool selectAddrRegImm10(SDValue Addr, SDValue &Base, SDValue &Offset) const;
				bool selectAddrDefault(SDValue Addr, SDValue &Base, SDValue &Offset) const;
				bool selectIntAddrMSA(SDValue Addr, SDValue &Base, SDValue &Offset) const;

				// In Mips, MipsSEISelDAGToDAG inherits from MipsISelDAGToDAG; in Connex
				// there is no such base class, so the override qualifier is commented out.
				/// \brief Select constant vector splats.
				bool selectVSplat(SDNode *N, APInt &Imm,
				unsigned MinSizeInBits) const; // override;
				/// \brief Select constant vector splats whose value fits in a given integer.
				bool selectVSplatCommon(SDValue N, SDValue &Imm, bool Signed,
				unsigned ImmBitSize) const;
				/// \brief Select constant vector splats whose value fits in a uimm1.
				bool selectVSplatUimm1(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value fits in a uimm2.
				bool selectVSplatUimm2(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value fits in a uimm3.
				bool selectVSplatUimm3(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value fits in a uimm4.
				bool selectVSplatUimm4(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value fits in a uimm5.
				bool selectVSplatUimm5(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value fits in a uimm6.
				bool selectVSplatUimm6(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value fits in a uimm8.
				bool selectVSplatUimm8(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value fits in a simm5.
				bool selectVSplatSimm5(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value is a power of 2.
				bool selectVSplatUimmPow2(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value is the inverse of a
				/// power of 2.
				bool selectVSplatUimmInvPow2(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value is a run of set bits
				/// ending at the most significant bit
				bool selectVSplatMaskL(SDValue N, SDValue &Imm) const; // override;
				/// \brief Select constant vector splats whose value is a run of set bits
				/// starting at bit zero.
				bool selectVSplatMaskR(SDValue N, SDValue &Imm) const; // override;
				}; // end class ConnexDAGToDAGISel
				} // end namespace

				char ConnexDAGToDAGISel::ID = 0;

				// ComplexPattern used on Connex Load/Store instructions
				bool ConnexDAGToDAGISel::SelectAddr(SDValue Addr, SDValue &Base,
				SDValue &Offset) {
				// if Address is FI, get the TargetFrameIndex.
				SDLoc DL(Addr);
				if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
				// TODO_CHANGE_BACKEND:
				Base = CrtDAG->getTargetFrameIndex(FIN->getIndex(), TYPE_SCALAR_ELEMENT);

				Offset = CrtDAG->getTargetConstant(0, DL, TYPE_SCALAR_ELEMENT);
				return true;
				}

				if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
				Addr.getOpcode() == ISD::TargetGlobalAddress)
				return false;

				// Addresses of the form Addr+const or Addr|const
				if (CrtDAG->isBaseWithConstantOffset(Addr)) {
				ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Addr.getOperand(1));
				if (isInt<32>(CN->getSExtValue())) {
				// If the first operand is a FI, get the TargetFI Node
				if (FrameIndexSDNode *FIN =
				dyn_cast<FrameIndexSDNode>(Addr.getOperand(0)))
				// TODO_CHANGE_BACKEND:
				Base =
				CrtDAG->getTargetFrameIndex(FIN->getIndex(), TYPE_SCALAR_ELEMENT);
				else
				Base = Addr.getOperand(0);

				// TODO_CHANGE_BACKEND:
				Offset = CrtDAG->getTargetConstant(CN->getSExtValue(), DL,
				TYPE_SCALAR_ELEMENT);

				return true;
				}
				}

				Base = Addr;
				// TODO_CHANGE_BACKEND:
				Offset = CrtDAG->getTargetConstant(0, DL, TYPE_SCALAR_ELEMENT);

				return true;
				}

				// ComplexPattern used on Connex FI instruction
				bool ConnexDAGToDAGISel::SelectFIAddr(SDValue Addr, SDValue &Base,
				SDValue &Offset) {
				SDLoc DL(Addr);

				if (!CrtDAG->isBaseWithConstantOffset(Addr))
				return false;

				// Addresses of the form Addr+const or Addr|const
				ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Addr.getOperand(1));
				if (isInt<32>(CN->getSExtValue())) {

				// If the first operand is a FI, get the TargetFI Node
				if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Addr.getOperand(0)))
				// TODO_CHANGE_BACKEND:
				Base = CrtDAG->getTargetFrameIndex(FIN->getIndex(), TYPE_SCALAR_ELEMENT);
				else
				return false;

				// TODO_CHANGE_BACKEND:
				Offset =
				CrtDAG->getTargetConstant(CN->getSExtValue(), DL, TYPE_SCALAR_ELEMENT);
				return true;
				}

				return false;
				}
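
SelectFIAddr() above accepts an address only when it has the form base + constant, the constant fits in a signed 32-bit integer, and the base is a frame index. The following is a minimal sketch of that decision order on a simplified model. The `AddrExpr` and `BaseOffset` types and the `fitsSignedBits` and `selectFIAddr` names are hypothetical; `fitsSignedBits<N>` is only meant to mirror the range test that `isInt<N>` performs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of an address expression.
struct AddrExpr {
  bool IsFrameIndex;   // the base is a stack slot (frame index)
  int BaseId;          // frame index or register number of the base
  bool HasConstOffset; // the expression has the form base + constant
  int64_t ConstOffset; // the constant, meaningful only if HasConstOffset
};

// Hypothetical result pair, standing in for the Base and Offset out-parameters.
struct BaseOffset {
  int BaseId;
  int64_t Offset;
};

// True if X is representable as an N-bit signed integer, for 0 < N < 64.
template <unsigned N> constexpr bool fitsSignedBits(int64_t X) {
  static_assert(N > 0 && N < 64, "N must be in (0, 64)");
  return X >= -(int64_t(1) << (N - 1)) && X < (int64_t(1) << (N - 1));
}

// Matches only "frame index + constant that fits in 32 signed bits".
// Out is written only on success.
bool selectFIAddr(const AddrExpr &A, BaseOffset &Out) {
  if (!A.HasConstOffset)
    return false;
  if (!fitsSignedBits<32>(A.ConstOffset))
    return false;
  if (!A.IsFrameIndex)
    return false;
  Out = BaseOffset{A.BaseId, A.ConstOffset};
  return true;
}
```

Leaving the out-parameter untouched on failure keeps the caller free to try another addressing pattern with the same variables.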

				// Important: Note that recoverCExpressionFromSDNode() is used only for
				// REPEAT and BUILD_VECTOR nodes, in method Select().
				std::string
				recoverCExpressionFromSDNode(SDNode *theSDNode,
				DenseMap<const Value *, SDValue> &SDBNodeMap,
				bool failOver) {
				/*
				NOTE: the SelectionDAGISel::crtNodeMap, defined in
				lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp,
				discussed at lists.llvm.org/pipermail/llvm-dev/2016-November/107361.html

				getNodeMap() (method defined by me) returns the NodeMap object from
				SelectionDAGBuilder.h with this definition:
				DenseMap<const Value*, SDValue> NodeMap;

				Note however that this info is not enough since some SDNodes get generated
				in the following phases of the back end, namely:
				- DAG combining - see lib/CodeGen/SelectionDAG/DAGCombiner.cpp

				This class gets invoked much later, after all the ones mentioned above have
				finished.
				*/
				LLVM_DEBUG(
				dbgs()
				<< "Entered recoverCExpressionFromSDNode() (ConnexISelDAGToDAG.cpp)\n");

				std::string res;

				// Important note: class SelectionDAGBuilder is forward declared.
				// assert(SDB != NULL);
				// assert(SDB->NodeMap[(const Value *)NULL]); // NodeMap is private
				// auto iterNodeMap = SDB->NodeMap.begin();

				// bool res = SDB->HasTailCall;

				// DenseMap<const Value*, SDValue> &SDBNodeMap = crtNodeMap;

				// unsigned size = SDB->NodeMap.size();
				LLVM_DEBUG(dbgs() << "recoverCExpressionFromSDNode(): SDB->NodeMap.size() = "
				<< SDBNodeMap.size() << ", theSDNode = ";
				theSDNode->dump();
				dbgs() << ", theSDNode (ptr) = " << theSDNode << "\n");

				/* We retrieve from the SDBNodeMap the associated LLVM IR Instruction for
				theSDNode (SDNode created by SelectionDAGBuilder). */

				int counter = 0;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1DenseMapBase.html
				for (auto iterNodeMap = SDBNodeMap.begin(); iterNodeMap != SDBNodeMap.end();
				iterNodeMap++, counter++) {

				// Type is: llvm::detail::DenseMapPair<const llvm::Value*, llvm::SDValue>
				auto tmp1 = (*iterNodeMap);

				// Value *crtValue = tmp1.first;
				const Instruction *crtValue = (const Instruction *)(tmp1.first);
				SDNode *crtSDNode = tmp1.second.getNode();

				LLVM_DEBUG(dbgs() << "recoverCExpressionFromSDNode(): [#" << counter
				<< "] tmp1.first = "
				<< *crtValue
				/*
				<< ", tmp1.second = ";
				tmp1.second.dump();
				dbgs() << "\n"
				*/
				<< "\n");

				if (crtSDNode != nullptr) {
				LLVM_DEBUG(
				dbgs() << "recoverCExpressionFromSDNode(): tmp1.second.getNode() = ";
				crtSDNode->dump(); dbgs() << "\n");

				if (crtSDNode == theSDNode) {
				LLVM_DEBUG(
				dbgs() << "recoverCExpressionFromSDNode(): Found a match:...\n");

				/*
				This corresponds to cases like:
				From NEW_v128i16.zip/!/300_Opincaa/sSub/STDerr_llc_01
				recoverCExpressionFromSDNode():
				tmp1.first = %broadcast.splatinsert10 =
				insertelement <128 x i16> undef, i16 %sub, i32 0, !dbg !8
				recoverCExpressionFromSDNode():
				tmp1.second.getNode() = t33: v128i16 = BUILD_VECTOR t35, t35, t35,
				... t35

				We can see here that the machine-independent back end instruction
				BUILD_VECTOR is more complex (less abstract) than the LLVM IR
				insertelement.
				The equivalent to BUILD_VECTOR LLVM IR program uses also a shufflevector
				instruction:
				%broadcast.splatinsert10 = insertelement <128 x i16> undef, i16 %sub,
				i32 0, !dbg !8
				%broadcast.splat11 = shufflevector <128 x i16> %broadcast.splatinsert10,
				<128 x i16> undef, <128 x i32> zeroinitializer, !dbg !8

				Note that recoverCExpressionFromSDNode() is used only for BUILD_VECTOR.

				For the SSD benchmark, the associated instruction is though
				ShuffleVector
				(see Tests/DawnCC/90_SSD/B/STDerr_llc_01).
				*/
				/*
				assert( (crtValue->getOpcode() == Instruction::InsertElement) ||
				(crtValue->getOpcode() == Instruction::ShuffleVector)
				);
				*/

				Instruction *crtValueOp1;

				switch (crtValue->getOpcode()) {
				case Instruction::InsertElement:
				case Instruction::ShuffleVector:
				if (crtValue->getOpcode() == Instruction::InsertElement) {
				crtValueOp1 = (Instruction *)(crtValue->getOperand(1));
				} else {
				crtValueOp1 = (Instruction *)(crtValue->getOperand(0));
				assert(crtValueOp1->getOpcode() == Instruction::InsertElement);
				// TODO: check that crtValueOp1->getOperand(0) is vec undef,
				// crtValueOp1->getOperand(2) is 0
				crtValueOp1 = (Instruction *)(crtValueOp1->getOperand(1));
				}
				LLVM_DEBUG(dbgs() << " crtValueOp1 = " << crtValueOp1 << "\n");

				getExprForDMATransfer = true;
				// res = getExpr(crtValueOp1);
				res = canonicalizeExpression(getExpr(crtValueOp1), true);
				LLVM_DEBUG(dbgs() << " getExpr(crtValueOp1) = " << res << "\n");
				break;
				default:
				getExprForDMATransfer = true;
				// res = getExpr(const_cast<Instruction *>(crtValue));
				res = canonicalizeExpression(
				getExpr(const_cast<Instruction *>(crtValue)), true);

				LLVM_DEBUG(dbgs() << " getExpr(crtValue) = " << res << "\n");
				break;
				}
				break;
				}
				} else {
				LLVM_DEBUG(dbgs() << "recoverCExpressionFromSDNode(): "
				"tmp1.second.getNode() == nullptr\n\n");
				}
				} // end for

				if (res.length() == 0) {
				if (failOver) {
				// #define NVEC_STR "n.vec"
				#define NVEC_STR "VTC_ceil"

				/* TODO: Find, if possible, a better solution. Keep track of the
				SelectionDAGs of all BBs, not just the current BB. */

				LLVM_DEBUG(
				dbgs()
				<< "recoverCExpressionFromSDNode(): failOver == true --> we look "
				"for NVEC_STR (vector tripcount defined in LoopVectorize.cpp) "
				"in SDBNodeMap and retrieve for it\n");

				/* Although not a great alternative, we look in SDBNodeMap for
				* an entry containing %n.vec - this should exist from a previous
				* BB.
				*/
				for (auto iterNodeMap = SDBNodeMap.begin();
				iterNodeMap != SDBNodeMap.end(); iterNodeMap++, counter++) {
				auto tmp1 = (*iterNodeMap);
				const Instruction *crtValue = (const Instruction *)(tmp1.first);

				LLVM_DEBUG(dbgs() << "recoverCExpressionFromSDNode(): *crtValue = "
				<< *crtValue << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1StringRef.html
				if (crtValue->getName().starts_with(NVEC_STR) == true) {
				getExprForDMATransfer = true;

				res = getExpr(const_cast<Instruction *>(crtValue));
				LLVM_DEBUG(dbgs() << " recoverCExpressionFromSDNode(): res = " << res
				<< "\n");

				/* TODO: This is NOT good if the res already contains a constant such
				as 1 - OK we could take out CreateDiv in LoopVectorize.cpp, etc */

				// res = res + " / CONNEX_VECTOR_LENGTH";
				// Unfortunately, we hard code this also here...
				}
				}
				} else {
				assert(res.length() != 0);
				}
				}

				return res;
				}

				// Inspired from lib/Target/X86/X86ISelDAGToDAG.cpp
				bool ConnexDAGToDAGISel::selectVectorAddr(SDNode *Parent, SDValue N,
				SDValue &Index) {
				LLVM_DEBUG(dbgs() << "Entered ConnexDAGToDAGISel::selectVectorAddr()\n");

				LLVM_DEBUG(dbgs() << " selectVectorAddr(): Parent = "; Parent->dump(CrtDAG);
				dbgs() << "\n N = "; N->dump(CrtDAG);
				/*
				dbgs() << "\n Base.getNode() = " << Base.getNode();
				dbgs() << "\n Base = "; Base->dump(CrtDAG);
				*/
				dbgs() << "\n");

				// From llvm.org/docs/doxygen/html/classllvm_1_1MaskedGatherScatterSDNode.html
				MaskedGatherScatterSDNode *Mgs = dyn_cast<MaskedGatherScatterSDNode>(Parent);
				if (!Mgs)
				return false;

				/*
				// Retrieve the "scalar base pointer" (as said also at
				// lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150831/297534.html)
				Base = Mgs->getBasePtr();
				*/
				Index = Mgs->getIndex();

				LLVM_DEBUG(dbgs() << " selectVectorAddr(), after update: Parent = ";
				Parent->dump(CrtDAG); dbgs() << "\n N = "; N->dump(CrtDAG);
				dbgs() << "\n Index.getNode() = " << Index.getNode();
				dbgs() << "\n Index = "; Index->dump(CrtDAG); dbgs() << "\n");

				LLVM_DEBUG(dbgs() << "Exiting ConnexDAGToDAGISel::selectVectorAddr()\n");

				return true;
				}

				SDNode *CreateInlineAsmNode(SelectionDAG *CrtDAG, std::string asmString,
				SDNode *nodeSymImm, SDLoc &DL,
				bool specialCase = false) {
				// This step is very Important:
				// Important: As of Oct 2016, we must malloc the char * that is passed to
				// getTargetExternalSymbol as a reference, so we must make sure
				// the value persists after we get out of this function.
				// Hopefully no leak will happen either - maybe when deleting
				// SDNode the destructor frees the char *.
				// With difficulty I found with Google this method doing
				// creation of the SDNode, which is used also by
				arsenm (inline comment): no strncmps
				// getTargetExternalSymbol().
				// template <typename SDNodeT, typename... ArgTypes>
				// SDNodeT *newSDNode(ArgTypes &&... Args) {
				// return new (NodeAllocator.template Allocate<SDNodeT>())
				// SDNodeT(std::forward<ArgTypes>(Args)...);
				// }
				char *exprStrChar = new char[MAXLEN_STR];
				// strcpy(exprStrChar, asmString.c_str());
				// Inspired from
				// www.appsloveworld.com/cplus/100/251/c-stdstring-alternative-to-strcpy See
				// https://en.cppreference.com/w/cpp/algorithm/copy_n
				std::copy_n(asmString.c_str(), asmString.size() + 1, exprStrChar);
				LLVM_DEBUG(dbgs() << "CreateInlineAsmNode(): exprStrChar = " << exprStrChar
				<< "\n");
				/*
				See http://llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html:
				SDValue getTargetExternalSymbol(const char *Sym, EVT VT,
				unsigned char TargetFlags=0)
				*/
				SDValue extSym = CrtDAG->getTargetExternalSymbol(exprStrChar, MVT::i64);
				SDNode *extSymNode = extSym.getNode();
				LLVM_DEBUG(dbgs() << "CreateInlineAsmNode(): extSymNode = ";
				extSymNode->dump(); dbgs() << "\n");

				/*
				From http://llvm.org/doxygen/namespacellvm_1_1ISD.html
				"INLINEASM - Represents an inline asm block.
				This node always has two return values: a chain and a flag result.
				The inputs are as follows:
				Operand #0 : Input chain.
				Operand #1 : a ExternalSymbolSDNode with a pointer to the asm string.
				Operand #2 : a MDNodeSDNode with the !srcloc metadata.
				Operand #3 : HasSideEffect, IsAlignStack bits.
				After this, it is followed by a list of operands with this format:
				ConstantSDNode: Flags that encode whether it is a mem or not, the
				list of operands that follow, etc.
				See InlineAsm.h. ... however many operands ...
				Operand #last: Optional, an incoming flag."
				*/
				std::vector<SDValue> opsInline;

				// This generates either:
				// - a glue edge/link if the return type is MVT::Glue
				// - a chain edge/link if the return type is MVT::Other
				// between the nodeSymImm and the INLINEASM node.
				if (specialCase) {
				// opsInline.push_back(CrtDAG->getEntryNode());
				} else {
				opsInline.push_back(SDValue(nodeSymImm, 0));
				}
				//
				opsInline.push_back(extSym); // SDValue(extSym, 0));

				/* Creating a null-MDNode MDNodeSDNode object.
				Inspired by (since only SelectionDAG can call the constructor)
				http://llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html:
				class MDNodeSDNode : public SDNode {
				const MDNode *MD;
				friend class SelectionDAG;
				explicit MDNodeSDNode(const MDNode *md)
				: SDNode(ISD::MDNODE_SDNODE, 0, DebugLoc(), getSDVTList(MVT::Other)),
				MD(md)
				{}
				See also, although not helpful,
				http://llvm.org/docs/doxygen/html/classllvm_1_1MDNodeSDNode.html .
				*/
				/* Does NOT work: MDNodeSDNode mdNodeSDNode; // = MDNodeSDNode::getMD();
				is private: MDNodeSDNode::MDNodeSDNode(mdNode); */

				// Creating a NON-null-MDNode MDNodeSDNode object (has a
				// hexadecimal value when outputting the DOT file).
				/* From
				http://llvm.org/docs/doxygen/html/classllvm_1_1MDNode.html:
				Detailed Description
				Metadata nodes can be uniqued, like constants, or distinct.
				*/
				// Actually inspired from
				// http://ftp.nchc.org.tw/NetBSD/NetBSD-current/src/external/bsd/llvm/
				// dist/llvm/unittests/IR/MetadataTest.cpp
				MDNode *mdNode = MDNode::get(*(CrtDAG->getContext()), llvm::None);
				/*
				From http://llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html
				<<SDValue getMDNode (const MDNode *MD)
				Return an MDNodeSDNode which holds an MDNode.>>
				*/
				SDNode *mdNodeSDNode = CrtDAG->getMDNode(mdNode).getNode();
				//
				/* Avoiding error -
				see Tests/DawnCC/30l_dotprod_f16/5/STDerr_llc_01_old03:
				<< Assertion `Op.getValueType() != MVT::Other &&
				Op.getValueType() != MVT::Glue &&
				"Chain and glue operands should occur at end of operand list!"' failed.
				*/
				if (specialCase == false) {
				opsInline.push_back(SDValue(mdNodeSDNode, 0));
				}

				/*
				From http://llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html
				<<SDValue getTargetConstant (uint64_t Val, SDLoc DL, EVT VT,
				bool isOpaque=false)>>
				*/
				SDValue targetConstant = CrtDAG->getTargetConstant(1, DL, MVT::i64);
				SDNode *targetConstantSDNode = targetConstant.getNode();
				//
				opsInline.push_back(SDValue(targetConstantSDNode, 0));

				if (specialCase)
				opsInline.push_back(SDValue(nodeSymImm, 0));

				arsenm (inline comment): no raw mallocs and strcpy
				// Note that you can also look at the .dot file output
				// from the LLVM I-sel stage to get an idea on how an
				// INLINEASM node looks.

				// Related to CODE2018_07_01
				SDNode *inlineAsmNode;
				if (specialCase == true) {
				inlineAsmNode = CrtDAG->getMachineNode(
				Connex::INLINEASM, DL,
				// Result types:
				// CrtDAG->getVTList(TYPE_VECTOR_I16),
				CrtDAG->getVTList(MVT::Other, MVT::Glue), opsInline);
				} else {
				SDValue inlineAsm = CrtDAG->getNode(
				// We use this non-machine SDNode to avoid
				// <<Assertion `!AnyNotSched' failed.>> e.g.
				// in middle.block
				ISD::INLINEASM, DL,
				// Result types:
				// CrtDAG->getVTList(TYPE_VECTOR_I16),
				CrtDAG->getVTList(MVT::Other, MVT::Glue), opsInline);
				inlineAsmNode = inlineAsm.getNode();
				}

				LLVM_DEBUG(dbgs() << "CreateInlineAsmNode(): inlineAsmNode = ";
				inlineAsmNode->dump();
				// dbgs() << '\n'
				);

				return inlineAsmNode;
				} // End CreateInlineAsmNode()
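
CreateInlineAsmNode() above copies the asm string into a `new char[MAXLEN_STR]` buffer so that the pointer passed to getTargetExternalSymbol() outlives the function, and the reviewer asks for no raw allocations or string copies. One possible direction, sketched below with the standard library only, is a small pool that owns the strings and hands out stable pointers. The `AsmStringPool` class is hypothetical and not part of the patch; who would own such a pool in the back end (for example the instruction selector object) is an assumption left open here. Within LLVM itself, `llvm::StringSaver` from llvm/Support/StringSaver.h offers a similar save-and-keep-alive service on top of a bump allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical helper: owns every string saved into it and returns a
// NUL-terminated pointer that stays valid for the lifetime of the pool.
class AsmStringPool {
public:
  // Stores a copy of S and returns a pointer to the stored characters.
  // No fixed-size buffer is involved, so long strings cannot overflow.
  const char *save(const std::string &S) {
    Storage.push_back(S);
    return Storage.back().c_str();
  }

  std::size_t size() const { return Storage.size(); }

private:
  // std::deque::push_back does not move existing elements, so pointers
  // obtained from earlier c_str() calls remain valid as the pool grows.
  std::deque<std::string> Storage;
};
```

All strings are released together when the pool is destroyed, which removes both the leak and the unchecked copy into a fixed-length buffer.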

				static SDValue ChangeVectorType(SDValue InOp, MVT NVT, SelectionDAG &DAG,
				bool FillWithZeroes = false) {
				// Check if InOp already has the right width.
				MVT InVT = InOp.getSimpleValueType();
				if (InVT == NVT)
				return InOp;

				if (InOp.isUndef())
				return DAG.getUNDEF(NVT);

				unsigned InNumElts = InVT.getVectorNumElements();
				unsigned WidenNumElts = NVT.getVectorNumElements();
				/*
				assert(WidenNumElts > InNumElts && WidenNumElts % InNumElts == 0 &&
				"Unexpected request for vector widening");
				*/
				assert(WidenNumElts == InNumElts && "WidenNumElts == InNumElts failed");

				EVT EltVT = NVT.getVectorElementType();

				SDLoc dl(InOp);
				if (InOp.getOpcode() == ISD::CONCAT_VECTORS && InOp.getNumOperands() == 2) {
				SDValue N1 = InOp.getOperand(1);
				if ((ISD::isBuildVectorAllZeros(N1.getNode()) && FillWithZeroes) ||
				N1.isUndef()) {
				InOp = InOp.getOperand(0);
				InVT = InOp.getSimpleValueType();
				InNumElts = InVT.getVectorNumElements();
				}
				}

				if (ISD::isBuildVectorOfConstantSDNodes(InOp.getNode()) ||
				ISD::isBuildVectorOfConstantFPSDNodes(InOp.getNode())) {
				SmallVector<SDValue, 128> Ops;
				for (unsigned i = 0; i < InNumElts; ++i) {
				// Ops.push_back(InOp.getOperand(i));
				Ops.push_back(InOp.getOperand(0));
				}

				#if 0
				SDValue FillVal = FillWithZeroes ? DAG.getConstant(0, dl, EltVT) :
				DAG.getUNDEF(EltVT);
				for (unsigned i = 0; i < WidenNumElts - InNumElts; ++i)
				Ops.push_back(FillVal);
				#endif
				SDValue res = DAG.getBuildVector(NVT, dl, Ops);

				LLVM_DEBUG(dbgs() << "Exiting ChangeVectorType() with: res = "
				<< res.getNode() << ".\n");

				return res;
				}

				assert(0 && "ChangeVectorType(): I guess this case should not be reached");
				SDValue FillVal =
				FillWithZeroes ? DAG.getConstant(0, dl, NVT) : DAG.getUNDEF(NVT);
				return DAG.getNode(ISD::INSERT_SUBVECTOR, dl, NVT, FillVal, InOp,
				DAG.getIntPtrConstant(0, dl));
				}

				void ConnexDAGToDAGISel::selectBUILD_VECTOR(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectBUILD_VECTOR().\n");

				// NEW32
				EVT typeVecNode;
				SDLoc DL(Node);

				BuildVectorSDNode *BVN = cast<BuildVectorSDNode>(Node);
				APInt SplatValue, SplatUndef;
				unsigned SplatBitSize;
				bool HasAnyUndefs;
				unsigned LdiOp;
				EVT ResTy = BVN->getValueType(0);
				EVT ViaVecTy;

				bool needsConversionToResultType = true;

				SDNode *Res;

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR(): We are in the case TYPE_VECTOR_I32\n");
				/*
				TODO:
				Although so far we do not have a test for this case, in principle we
				should lower the following target-independent SDNode:
				BUILD_VECTOR i32ct
				to:
				R0 = 1;
				R1 = VLOAD i32ct_lower16bits;
				R2 = VLOAD i32ct_higher16bits;
				CELLSHR R2, R0;
				WHERE_EQ (INDEX & 1 == 1) // for all odd indices
				R1 = R2 | R2;
				END_WHERE;
				*/
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR(): We are in the case TYPE_VECTOR_I16\n");
				}
				typeVecNode = ResTy;

				/*
				From http://llvm.org/docs/doxygen/html/classllvm_1_1BuildVectorSDNode.html:
				bool isConstantSplat(APInt &SplatValue, APInt &SplatUndef,
				unsigned &SplatBitSize, bool &HasAnyUndefs,
				unsigned MinSplatBits=0, bool isBigEndian=false) const
				Check if this is a constant splat, and if so, find the smallest element
				size that splats the vector.

				By constant splat we understand a vector filled with the same
				constant value in all elements.
				*/
				if (BVN->isConstantSplat(SplatValue, SplatUndef, SplatBitSize, HasAnyUndefs,
				8, true) == false) {
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): "
				"BVN->isConstantSplat() == false:\n");

				/* Checking if we have a symbolic splat.
				From
				http://llvm.org/docs/doxygen/html/classllvm_1_1BuildVectorSDNode.html:
				SDValue getSplatValue (BitVector *UndefElements=nullptr) const
				<<Returns the splatted value or a null value if this is not a splat.>>
				*/
				SDValue symbolicValue = BVN->getSplatValue();
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): symbolicValue.getNode() = "
				<< symbolicValue.getNode() << "\n");

				// Inspired vaguely from
				// http://llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html
				if (symbolicValue.getNode() != nullptr) {
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): symbolicValue = ";
				symbolicValue->dump(); dbgs() << "\n");
				// LdiOp = Connex::VLOAD_H_STR;

				/* For the case BUILD_VECTOR is a variable splat
				(contains the same variable in all elements of the vector),
				we retrieve the C expression from the variable and generate
				an inlineasm with VLOAD variable_C_Expression (so this is OPINCAA host
				and Connex ASM code together). */

				/*
				From http://llvm.org/docs/doxygen/html/namespacellvm_1_1ISD.html:
				<<INLINEASM - Represents an inline asm block.
				This node always has two return values: a chain and a flag result.
				The inputs are as follows:
				Operand #0 : Input chain.
				Operand #1 : a ExternalSymbolSDNode with a pointer to the asm string.
				Operand #2 : a MDNodeSDNode with the !srcloc metadata.
				Operand #3 : HasSideEffect, IsAlignStack bits.
				After this, it is followed by a list of operands with this format:
				ConstantSDNode: Flags that encode whether it is a mem or not,
				the number of operands that follow, etc.
				See InlineAsm.h. ... however many operands ...
				Operand #last: Optional, an incoming flag.
				>>
				Also, ISD::INLINEASM accepts only objects of type ConstantSDNode
				from 2nd operand onwards - see InstrEmitter.cpp, line 966:
				unsigned Flags =
				cast<ConstantSDNode>(Node->getOperand(i))->getZExtValue();

				Examples of creating an INLINEASM SDNode, in llc:
				From llvm/lib/Target/Sparc/SparcISelDAGToDAG.cpp,
				(or llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp) :

				if (Glue.getNode())
				AsmNodeOperands.push_back(Glue);
				if (!Changed)
				return false;

				SDValue New = CrtDAG->getNode(ISD::INLINEASM, SDLoc(N),
				CrtDAG->getVTList(MVT::Other, MVT::Glue), AsmNodeOperands);
				New->setNodeId(-1);
				ReplaceNode(N, New.getNode());

				Less useful: From SelectionDAGISel.cpp:
				void SelectionDAGISel::Select_INLINEASM(SDNode *N) {
				SDLoc DL(N);

				std::vector<SDValue> Ops(N->op_begin(), N->op_end());
				SelectInlineAsmMemoryOperands(Ops, DL);

				const EVT VTs[] = {MVT::Other, MVT::Glue};
				SDValue New = CrtDAG->getNode(ISD::INLINEASM, DL, VTs, Ops);
				New->setNodeId(-1);
				ReplaceUses(N, New.getNode());
				CrtDAG->RemoveDeadNode(N);
				}

				From SelectionDAGBuilder.cpp:
				Chain = DAG.getNode(ISD::INLINEASM, getCurSDLoc(),
				DAG.getVTList(MVT::Other, MVT::Glue),
				AsmNodeOperands);

				LESS relevant note: to create an InlineAsm Value in the LLVM program,
				in clang/opt, we can use the API described at:
				http://llvm.org/docs/doxygen/html/classllvm_1_1InlineAsm.html
				http://llvm.org/docs/doxygen/html/InlineAsm_8h_source.html
				http://llvm.org/docs/doxygen/html/InlineAsm_8cpp_source.html
				*/

				SDValue InFlag(nullptr, 0); // NO Glue - Null incoming flag value.
				// Inspired from ConnexISelLowering.cpp
				MachineFunction &MF = CrtDAG->getMachineFunction();
				MachineRegisterInfo &RegInfo = MF.getRegInfo();
				/* From http://llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html
				SDValue SelectionDAG::getBasicBlock(MachineBasicBlock *MBB);
				// SDValue bb = CrtDAG->getBasicBlock(MachineBasicBlock *MBB);
				*/

				SDNode *firstAsmInlineSDNode = NULL;
				for (auto dagIter = CrtDAG->allnodes_begin(); // allnodes_iterator
				dagIter != CrtDAG->allnodes_end(); dagIter++) {
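				// Bind by reference: taking the address of a by-value copy below
				// would leave firstAsmInlineSDNode dangling after the loop.
				SDNode &iterSDNode = *dagIter;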
				/*
				LLVM_DEBUG(dbgs() << "dagIter = ";
				iterSDNode.dump(CrtDAG);
				dbgs() << '\n');
				*/
				if (iterSDNode.getOpcode() == ISD::INLINEASM) {
				firstAsmInlineSDNode = &iterSDNode;
				break;
				}
				}

				// Using the MDNode - because Inline gives error:
				// firstAsmInlineSDNode = (firstAsmInlineSDNode->getOperand(2)).getNode();

				if (firstAsmInlineSDNode == NULL)
				firstAsmInlineSDNode = (CrtDAG->getEntryNode()).getNode();
				LLVM_DEBUG(dbgs() << "firstAsmInlineSDNode = " << firstAsmInlineSDNode
				<< "\n");
				LLVM_DEBUG(dbgs() << "firstAsmInlineSDNode = ";
				firstAsmInlineSDNode->dump(); dbgs() << "[END]\n");

				SDValue firstAsmInlineSDValue = SDValue(firstAsmInlineSDNode, 0);
				LLVM_DEBUG(dbgs() << "firstAsmInlineSDValue = ";
				firstAsmInlineSDValue->dump(); dbgs() << "[END]\n");

				/* TODO: Properly treat the case
				typeVecNode == TYPE_VECTOR_I32.
				I.e., with multiple VLOAD_H, CELL_SH, WHERE, etc */

				SDNode *vloadSpecial = CrtDAG->getMachineNode(
				typeVecNode == TYPE_VECTOR_I16 ? Connex::VLOAD_H_SYM_IMM :
				// Connex::VLOAD_W_SYM_IMM,
				Connex::VLOAD_H_SYM_IMM,
				DL,
				//
				// We add MVT::Glue to the result
				// types to prevent llc from performing CSE
				// on these nodes: otherwise, if this
				// getMachineNode() function is called more
				// than once, it returns the same node again
				// and again (i.e., performs CSE), since the
				// node doesn't take any actual inputs.
				// - see why this is so at
				// llvm.org/docs/doxygen/html/SelectionDAG_8cpp_source.html
				CrtDAG->getVTList(
				// typeVecNode,
				TYPE_VECTOR_I16, MVT::Glue),
				//
				CrtDAG->getEntryNode()
				// We add a chain edge
				/* TODO Very Important - figure if
				I can do this better
				(maybe in Selection Lowering):
				//SDValue(firstAsmInlineSDNode, 0)
				firstAsmInlineSDValue */
				// SDValue(copyToRegAux, 0),
				// copyToRegAux
				/*
				Gives error: InstrEmitter.cpp:782:
				void llvm::InstrEmitter::EmitMachineNode(
				llvm::SDNode*, bool, bool,
				llvm::DenseMap<llvm::SDValue, unsigned int>&):
				Assertion `NumMIOperands >= II.getNumOperands()
				&& NumMIOperands <= II.getNumOperands() +
				II.getNumImplicitDefs() + NumImpUses &&
				"#operands for dag node doesn't match
				.td file!"' failed.
				*/
				// Node->getOperand(0)
				);
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): vloadSpecial = "
				<< vloadSpecial << ".\n"
				<< "vloadSpecial = ";
				vloadSpecial->dump(); dbgs() << "\n");

				std::string exprStr = "1"; // This is Wrong - we just put an incorrect
				// value TODO(2021_02_02): see below
				/*
				// TODO(2021_02_02):
				std::string exprStr = recoverCExpressionFromSDNode(Node, crtNodeMap,
				true
				);
				*/

				// std::string exprStr = recoverCExpressionFromSDNode(
				// symbolicValue.getNode(), crtNodeMap);
				exprStr = " " + exprStr;
				exprStr = exprStr + "; // MSA_I16";
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): exprStr = " << exprStr
				<< "\n");

				SDNode *inlineAsmNode =
				CreateInlineAsmNode(CrtDAG, exprStr, vloadSpecial, DL);

				// Very Important:
				// You might wonder why we also need to create the
				// CopyToReg and CopyFromReg SDNodes.
				// We add them to preserve the INLINEASM SDNode, which does NOT
				// have a type and needs to be chained/glued to its VLOAD*, while
				// the result (Res) of this instruction selection needs to have
				// a vector type (typeVecNode).
				// If we don't add them (e.g., we make
				// Res = inlineAsmNode;
				// ), we end up with erroneous cases like this
				// (which gives an assertion failure like:
				// "#operands for dag node doesn't match .td file!"):
				// SU(10): t71: v128i16,glue = VLOAD_H_SYM_IMM t0
				// SU(9): t74: ch,glue = inlineasm t71,
				// TargetExternalSymbol:i64' ((N + -1) << 1)) + 2) /
				// (((int *)&CONNEX_VL)[0])) ...;
				// // MSA_I10', MDNode:ch<0x1724220>, TargetConstant:i64<1>
				// SU(8): t75: v64i32 = NOP_BITCONVERT_WH t74
				unsigned virtRegRes = RegInfo.createVirtualRegister(
				typeVecNode == TYPE_VECTOR_I16 ? &Connex::VectorHRegClass
				: &Connex::VectorHRegClass);

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html:
				// SDValue getCopyFromReg(SDValue Chain, SDLoc dl, unsigned Reg, EVT VT)
				// SDValue getCopyFromReg(SDValue Chain, SDLoc dl, unsigned Reg, EVT VT,
				// SDValue Glue)
				//
				// SDValue getCopyToReg (SDValue Chain, SDLoc dl, unsigned Reg, SDValue N)
				// SDValue getCopyToReg (SDValue Chain, SDLoc dl, unsigned Reg, SDValue N,
				// SDValue Glue)
				// SDValue getCopyToReg (SDValue Chain, SDLoc dl, SDValue Reg, SDValue N,
				// SDValue Glue)
				SDValue copyToRegRes = CrtDAG->getCopyToReg(
				// CrtDAG->getEntryNode(),
				// messes up scheduling
				// SDValue(vloadSpecial, 0),
				// this should be considered chain
				// edge, even if VLOAD does NOT have
				// output ch port
				SDValue(inlineAsmNode, 0),
				// extSym,

				DL, virtRegRes, SDValue(vloadSpecial, 0), InFlag);

				SDValue copyFromRegRes =
				CrtDAG->getCopyFromReg(copyToRegRes, // chain
				DL, virtRegRes, typeVecNode
				//, copyToRegOp2
				);

				/*
				From http://llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html:
				SDValue getRegister (unsigned Reg, EVT VT)
				*/
				// Res = CrtDAG->getRegister(virtRegRes, TYPE_VECTOR_I16).getNode();
				Res = copyFromRegRes.getNode();

				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): Res = "; Res->dump();
				dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): inlineAsmNode = ";
				inlineAsmNode->dump(); dbgs() << "\n");

				/* TODO: Make sure I am not deleting an SDNode nc
				incoming on the chain port of Node, where nc is an arbitrary
				node which happened to be before Node. */
				// ReplaceNode(Node, Res);
				// return;

				needsConversionToResultType = false;
				} // End symbolicValue.getNode() != nullptr
				else {
				bool isUnitStepped = isUnitSteppedZeroStartingVector(BVN);
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): isUnitStepped = "
				<< isUnitStepped << "\n");

				if (isUnitStepped) {
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): isUnitStepped = true\n");
				/*
				LLVM_DEBUG(dbgs() << "Select() for ISD::BUILD_VECTOR: Res = ";
				Res->print(dbgs()); dbgs() << "\n");
				*/

				LdiOp = Connex::LDIX_H;

				ViaVecTy = TYPE_VECTOR_I16;
				/*
				// return std::make_pair(false, nullptr);
				LLVM_DEBUG(
				dbgs() << "Select() for ISD::BUILD_VECTOR: exiting with 1st return"
				"nullptr\n");

				return;
				*/

				/* Important: We use Connex's LDIX (LDIX_H) instruction, which takes
				no immediate operand, to build the unit-stepped, zero-starting
				vector (0, 1, 2, ...). */
				Res = CrtDAG->getMachineNode(LdiOp, DL, ViaVecTy);

				if (ResTy != ViaVecTy) {
				// If LdiOp is writing to a different register class to ResTy, then
				// fix it up here. This COPY_TO_REGCLASS should never cause a move.v
				// since the source and destination register sets contain the same
				// registers.
				const TargetLowering *TLI = getTargetLowering();
				MVT ResTySimple = ResTy.getSimpleVT();
				const TargetRegisterClass *RC = TLI->getRegClassFor(ResTySimple);

				LLVM_DEBUG(
				dbgs()
				<< "selectBUILD_VECTOR(): before CrtDAG->getMachineNode()\n");
				Res = CrtDAG->getMachineNode(
				Connex::COPY_TO_REGCLASS, DL, ResTy, SDValue(Res, 0),
				CrtDAG->getTargetConstant(RC->getID(), DL,
				// TODO_CHANGE_BACKEND:
				// MVT::i64));
				TYPE_SCALAR_ELEMENT));
				}
				}
				}
				} // End BVN->isConstantSplat == false
				else {
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR(): BVN->isConstantSplat() == true, "
				<< "SplatValue = " << SplatValue << ", SplatUndef = "
				<< SplatUndef << ", SplatBitSize = " << SplatBitSize << "\n");

				// TODO_CHANGE_BACKEND:
				// if (SplatBitSize == 8 || SplatBitSize == 16 || SplatBitSize == 32)
				if (SplatBitSize != TYPE_VECTOR_I16_ELEMENT_BITSIZE) {
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): SplatBitSize == "
				<< SplatBitSize
				<< " (8 is NOT supported in our back end)\n");
				// Important-TODO: kind of a wicked hack - try to avoid it by
				// defining the right conversion records in TableGen
				// TODO_CHANGE_BACKEND:
				SplatBitSize = 16;
				// SplatBitSize = 32;
				// SplatBitSize = 64;

				LLVM_DEBUG(
				dbgs() << " --> Extending element type to SplatBitSize = "
				<< SplatBitSize << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1APInt.html
				llvm::SmallVector<char, 5> splatValueStr;
				SplatValue.toString(splatValueStr, 10, 1);
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR: SplatValue = " << splatValueStr
				<< "\n");
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR: SplatValue.getBitWidth() = "
				<< SplatValue.getBitWidth() << "\n");

				// TODO This should be performed through TableGen
				// if (SplatBitSize > SplatValue.getBitWidth())
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1APInt.html
				SplatValue = SplatValue.zextOrTrunc(SplatBitSize);

				LLVM_DEBUG(dbgs() << "Select() for ISD::BUILD_VECTOR: After, "
				"SplatValue.getBitWidth() = "
				<< SplatValue.getBitWidth() << "\n");
				}

				llvm::SmallVector<char, 5> splatUndefStr;
				SplatUndef.toString(splatUndefStr, 10, 1);
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR: SplatUndef = " << splatUndefStr
				<< "\n");
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR: SplatBitSize = " << SplatBitSize
				<< "\n");

				/* TODO: VLOAD is NOT a feasible option if BUILD_VECTOR is loaded
				with DIFFERENT constant values. */

				switch (SplatBitSize) {
				default:
				// return std::make_pair(false, nullptr);
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR: exiting with 2nd return nullptr\n");
				return;
				case 8:
				// LdiOp = Connex::VLOAD_B;
				LdiOp = Connex::VLOAD_H;
				// TODO_CHANGE_BACKEND:
				// ViaVecTy = MVT::v16i8;
				// ViaVecTy = MVT::v16i32;
				ViaVecTy = TYPE_VECTOR_I16;
				/*
				LdiOp = Connex::VLOAD_H;
				ViaVecTy = MVT::v8i64;
				*/
				break;
				case 16:
				LdiOp = Connex::VLOAD_H;
				// TODO_CHANGE_BACKEND:
				ViaVecTy = TYPE_VECTOR_I16;
				break;
				case 32:
				// TODO_CHANGE_BACKEND:
				// TODO: We should add also WHERE and vload depending on index
				LdiOp = Connex::VLOAD_H;
				ViaVecTy = TYPE_VECTOR_I16;
				/*
				LdiOp = Connex::VLOAD_W;
				ViaVecTy = TYPE_VECTOR_I32; */
				break;
				case 64:
				assert(0 && "Connex supports only 16-bit "
				"immediate operands - see ConnexISA.docx");
				LdiOp = Connex::VLOAD_W; // TODO: actually VLOAD_D
				// TODO_CHANGE_BACKEND:
				// ViaVecTy = MVT::v8i64;
				ViaVecTy = TYPE_VECTOR_I16;
				break;
				/*
				LdiOp = Connex::VLOAD_H; //VLOAD:
				ViaVecTy = MVT::v8i64;
				break;
				*/
				}

				/*
				From http://llvm.org/docs/doxygen/html/APInt_8h_source.html:
				bool isSignedIntN(unsigned N) const
				Check if this APInt has an N-bits signed integer value.
				*/
				if (!SplatValue.isSignedIntN(16)) {
				// return std::make_pair(false, nullptr);
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR: exiting via 3rd return nullptr\n");

				return;
				}
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR: SplatValue.isSignedIntN(16) == true\n");

				llvm::SmallVector<char, 5> splatValueStr;
				SplatValue.toString(splatValueStr, 10, 1);
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR: SplatValue = " << splatValueStr
				<< "\n");

				// See http://llvm.org/docs/doxygen/html/structllvm_1_1EVT.html
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR: ViaVecTy.getVectorElementType() = "
				<< ViaVecTy.getVectorElementType().getEVTString() << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1SDLoc.html and
				// http://llvm.org/docs/doxygen/html/classllvm_1_1DebugLoc.html
				// LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR: DL = "
				// << DL.getDebugLoc().getLoc() << "\n");

				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR: before CrtDAG->getTargetConstant()\n");
				SDValue Imm = CrtDAG->getTargetConstant(SplatValue, DL,
				ViaVecTy.getVectorElementType());
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR: after CrtDAG->getTargetConstant()\n");
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1SDValue.html
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR: Imm = "; Imm.dump();
				dbgs() << "\n");

				/* Important: if we got this far then we use Connex's VLOAD (VLOAD_H)
				instruction to load the immediate value Imm in all vector elements. */
				Res = CrtDAG->getMachineNode(LdiOp, DL, ViaVecTy, Imm);

				// It doesn't make sense to use target independent BITCAST
				/*
				Res = CrtDAG->getMachineNode(ISD::BITCAST, DL,
				typeVecNode, SDValue(Res2, 0));
				*/
				}

				if (ResTy == TYPE_VECTOR_I32 && needsConversionToResultType) {
				LLVM_DEBUG(
				dbgs() << "selectBUILD_VECTOR(): Adding NOP_BITCONVERT_HW node\n");

				SDNode *ResOrig = Res;
				Res = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL, typeVecNode,
				SDValue(ResOrig, 0));
				}

				/*
				return std::make_pair(true, Res);
				*/
				LLVM_DEBUG(dbgs() << "selectBUILD_VECTOR(): Res = ";
				/* print() gives "Segmentation fault" when BUILD_VECTOR
				contains vars Res->print(dbgs()); dbgs() << "\n"); */
				Res->dump(CrtDAG); dbgs() << "\n");

				ReplaceNode(Node, Res);
				} // End selectBUILD_VECTOR()
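
// Illustrative sketch (not part of the patch): the constant-splat path above
// brings SplatValue to 16 bits with zextOrTrunc() and requires
// isSignedIntN(16) before emitting VLOAD_H. Below is a minimal stand-alone
// model of those two steps on plain integers. The helper names truncTo16,
// fitsInSigned16 and checkSplatModel are hypothetical; they exist neither in
// LLVM nor in the Connex back end.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: keep only the low 16 bits of a value, which is what
// truncating a wider integer to 16 bits does.
inline std::uint16_t truncTo16(std::uint64_t v) {
  return static_cast<std::uint16_t>(v & 0xFFFFu);
}

// Hypothetical helper: true when v is representable as a 16-bit
// two's-complement integer, i.e. -32768 <= v <= 32767.
inline bool fitsInSigned16(std::int64_t v) {
  return v >= std::numeric_limits<std::int16_t>::min() &&
         v <= std::numeric_limits<std::int16_t>::max();
}

// Small self-check of the two helpers above.
inline void checkSplatModel() {
  assert(fitsInSigned16(-1));
  assert(!fitsInSigned16(40000));
  assert(truncTo16(0x12345u) == 0x2345u);
}
```

// In this model a splat value of 32767 would be accepted as an immediate,
// while 32768 would take the early-return path.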

				SDNode *ConnexDAGToDAGISel::selectReduceI32(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectReduceI32(): Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(
				dbgs() << "selectReduceI32(): We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				// NOTE: Opnd 1 is a ct
				SDValue nodeOpSrc = Node->getOperand(2);

				// We need to preserve the node that was chained with Node so that
				// it is not removed.
				SDValue nodeOpChain = Node->getOperand(0); // Opnd 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectReduceI32(): nodeOpSrc.getValueType() = "
				<< nodeOpSrc.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectReduceI32(): nodeOpSrc = ";
				(nodeOpSrc.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_I32);

				#ifdef MARKER_FOR_EMULATION
				SDNode *nodeOpSrcCastBogus = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, MVT::Other,
				// This gives error: MVT::Glue,
				nodeOpSrc,
				// chain edge
				nodeOpChain);

				std::string exprStrBegin = "// Starting RED.i32 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCastBogus, DL);
				LLVM_DEBUG(dbgs() << "selectReduceI32: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");

				/* This node is also bogus, only for the sake of "sandwiching"
				the INLINE assembly with 2 NOPs.
				*/
				SDNode *nodeOpSrcCast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_HH, DL, TYPE_VECTOR_I16, MVT::Other,
				/* Important: this can give
				error:
				<<Assertion
				`N->getNodeId() == -1
				&&
				"Node already inserted!">>
				MVT::Glue,
				*/
				SDValue(nodeOpSrcCastBogus, 0),
				// chain
				SDValue(inlineAsmNodeBegin, 0));
				#else
				SDNode *nodeOpSrcCast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, MVT::Glue, nodeOpSrc,
				// chain edge
				nodeOpChain);
				#endif

				#include "Select_REDi32_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing RED.i32 emulation ;)";

				/*
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				reduceHigh16, DL);
				LLVM_DEBUG(dbgs() << "selectReduceI32(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");

				LLVM_DEBUG(dbgs() << "selectReduceI32(): reduceHigh16 = ";
				reduceHigh16->dump(CrtDAG); dbgs() << "\n");
				// return inlineAsmNodeEnd;
				// Gives error: <<SelectionDAG.cpp:6421:
				// void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*,
				// llvm::SDNode*): Assertion `(!From->hasAnyUseOfValue(i) ||
				// From->getValueType(i) == To->getValueType(i)) &&
				// "Cannot use this version of ReplaceAllUsesWith!"' failed.>>
				*/

				SDNode *resHH = CreateInlineAsmNode(CrtDAG, exprStrEnd, reduceHigh16, DL);

				/*
				// This node is also bogus, only for the sake of "sandwiching" the INLINE
				// assembly with 2 instructions.
				SDNode *resHH = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_HH,
				DL,
				// Gives error: <<void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*,
				// llvm::SDNode*): Assertion `(!From->hasAnyUseOfValue(i) ||
				// From->getValueType(i) == To->getValueType(i)) &&
				// "Cannot use this version of ReplaceAllUsesWith!"' failed.>>
				// TYPE_VECTOR_I16,
				// Gives error: <<Assertion `NumMIOperands >= II.getNumOperands() &&
				// NumMIOperands <= II.getNumOperands() + II.getNumImplicitDefs() +
				// NumImpUses &&
				// "#operands for dag node doesn't match .td file!"' failed.>> MVT::Other,
				SDValue(reduceHigh16, 0),
				// chain edge
				//SDValue(resH, 1)
				SDValue(inlineAsmNodeEnd, 0)
				);
				*/
				LLVM_DEBUG(dbgs() << "selectReduceI32(): resHH = "; resHH->dump(CrtDAG);
				dbgs() << "\n");

				return resHH;
				#else
				LLVM_DEBUG(dbgs() << "selectReduceI32(): reduceHigh16 = ";
				reduceHigh16->dump(CrtDAG); dbgs() << "\n");

				return reduceHigh16;
				#endif
				} // End selectReduceI32()

				SDNode *ConnexDAGToDAGISel::selectReduceF16(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectReduceF16(): Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(
				dbgs() << "selectReduceF16(): We are in the case TYPE_VECTOR_F16\n");
				typeVecNode = TYPE_VECTOR_F16;

				// NOTE: Opnd 1 is a ct
				SDValue nodeOpSrc = Node->getOperand(2);

				// We need to preserve the node that was chained with Node so that
				// it is not removed.
				SDValue nodeOpChain = Node->getOperand(0); // Opnd 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectReduceF16(): nodeOpSrc.getValueType() = "
				<< nodeOpSrc.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectReduceF16(): nodeOpSrc = ";
				(nodeOpSrc.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_F16);

				#ifdef MARKER_FOR_EMULATION
				SDNode *nodeOpSrcCastBogus1 = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_HH, DL, TYPE_VECTOR_I16, MVT::Other,
				// This gives error: MVT::Glue,
				nodeOpSrc,
				// chain edge
				nodeOpChain);

				std::string exprStrBegin = "// Starting red.f16 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCastBogus1, DL);
				LLVM_DEBUG(dbgs() << "selectReduceF16: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");

				/* This node is also bogus, only for the sake of "sandwiching" the INLINE
				assembly with 2 NOPs. */
				SDNode *nodeOpSrcCast =
				CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HH,
				// Important: this is a BOGUS
				// NOP_BITCONVERT - we just put
				// it since it has a Glue result
				// while nodeOpSrcCast2 does NOT
				DL, TYPE_VECTOR_I16, MVT::Other,
				// Important: this gives error:
				// <<Assertion `N->getNodeId()
				// == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				SDValue(nodeOpSrcCastBogus1, 0),
				// chain
				SDValue(inlineAsmNodeBegin, 0));
				#else
				SDNode *nodeOpSrcCast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_HH, DL, TYPE_VECTOR_I16, MVT::Glue, nodeOpSrc,
				// chain edge
				nodeOpChain);
				#endif

				#include "Select_REDf16_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing red.f16 emulation ;)";
				/*
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				reduceH, DL);
				LLVM_DEBUG(dbgs() << "selectReduceF16(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				*/
				SDNode *reduceHH = CreateInlineAsmNode(CrtDAG, exprStrEnd, reduceH, DL);

				LLVM_DEBUG(dbgs() << "selectReduceF16(): reduceH = "; reduceH->dump(CrtDAG);
				dbgs() << "\n");
				// return inlineAsmNodeEnd;
				// Gives error: <<SelectionDAG.cpp:6421:
				// void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*):
				// Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) ==
				// To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"'
				// failed.>>

				/*
				// This node is also bogus, only for the sake of "sandwiching" the INLINE
				// assembly with 2 instructions.
				SDNode *reduceHH = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_HH,
				DL,
				// Gives error: <<void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*,
				// llvm::SDNode*): Assertion `(!From->hasAnyUseOfValue(i) ||
				// From->getValueType(i) == To->getValueType(i)) &&
				// "Cannot use this version of ReplaceAllUsesWith!"' failed.>>
				// TYPE_VECTOR_I16,
				// Gives error: <<Assertion `NumMIOperands >= II.getNumOperands() &&
				// NumMIOperands <= II.getNumOperands() + II.getNumImplicitDefs() +
				// NumImpUses && "#operands for dag node doesn't match .td file!"'
				// failed.>>
				// MVT::Other,
				SDValue(reduceH, 0),
				// chain edge
				//SDValue(reduceH, 1)
				SDValue(inlineAsmNodeEnd, 0)
				);
				*/
				LLVM_DEBUG(dbgs() << "selectReduceF16(): reduceHH = "; reduceHH->dump(CrtDAG);
				dbgs() << "\n");

				return reduceHH;
				#else
				LLVM_DEBUG(dbgs() << "selectReduceF16(): reduceH = "; reduceH->dump(CrtDAG);
				dbgs() << "\n");

				return reduceH;
				#endif
				} // End selectReduceF16()

				SDNode *ConnexDAGToDAGISel::selectAddI32(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectAddI32(): Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				/* We look into doing "instruction-select" to
				OpDst = ADD OpSRC1, OpSRC2
				where the 3 operands are vectors of type <VFxi32>: */

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectAddI32(): We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectAddI32(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectAddI32(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectAddI32(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectAddI32(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_I32);

				/*
				Very Important:
				We convert the v4i32 add operation into a sequence of nodes that take as
				input the v4i32 operands of the operation, convert them to v8i16 operands
				using the NOP_BITCONVERT_WH nodes, and then instantiate the SDNodes
				emulating the v4i32 add operation.
				At the end we put a NOP_BITCONVERT_HW SDNode converting the result from
				v8i16 to v4i32.
				Note that these NOP_BITCONVERT_* nodes are mostly helpful conceptually, but
				they also anchor the emulation nodes so that they are not scheduled badly.
				*/

				SDNode *nodeOpSrcCast1 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16,
				// MVT::Other,
				MVT::Glue, nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting ADD.i32 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectAddI32: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16, MVT::Other,
				// Important: it gives error:
				// <<Assertion `N->getNodeId()
				// == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				#include "Select_ADDi32_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing ADD.i32 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resH,
				DL);
				LLVM_DEBUG(dbgs() << "selectAddI32(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");

				LLVM_DEBUG(dbgs() << "selectAddI32(): resH = "; resH->dump(CrtDAG);
				dbgs() << "\n");
				// return inlineAsmNodeEnd;
				// Gives error: <<SelectionDAG.cpp:6421:
				// void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*):
				// Assertion `(!From->hasAnyUseOfValue(i) ||
				// From->getValueType(i) == To->getValueType(i)) &&
				// "Cannot use this version of ReplaceAllUsesWith!"' failed.>>
				#endif

				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resH, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				SDValue(resH, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectAddI32(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectAddI32()
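
// Illustrative sketch (not part of the patch): selectAddI32() views each
// 32-bit element as two 16-bit halves and emulates the add on the 16-bit
// Execution Units. The real sequence lives in Select_ADDi32_OpincaaCodeGen.h,
// which is not shown here. The scalar model below only demonstrates the
// general carry-propagation idea for one lane; the names HalfPair,
// splitLane32, joinLane32 and add32ViaHalves are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane stored as two 16-bit halves.
struct HalfPair {
  std::uint16_t lo;
  std::uint16_t hi;
};

// Reinterpret a 32-bit value as (low half, high half).
inline HalfPair splitLane32(std::uint32_t x) {
  return HalfPair{static_cast<std::uint16_t>(x & 0xFFFFu),
                  static_cast<std::uint16_t>(x >> 16)};
}

// Reassemble the 32-bit value from its two halves.
inline std::uint32_t joinLane32(HalfPair h) {
  return (static_cast<std::uint32_t>(h.hi) << 16) | h.lo;
}

// Add two 32-bit lanes using only 16-bit-wide results plus a carry bit.
inline HalfPair add32ViaHalves(HalfPair a, HalfPair b) {
  std::uint32_t lowSum = static_cast<std::uint32_t>(a.lo) + b.lo;
  std::uint32_t carry = lowSum >> 16; // 1 when the low halves overflowed
  std::uint32_t highSum = static_cast<std::uint32_t>(a.hi) + b.hi + carry;
  return HalfPair{static_cast<std::uint16_t>(lowSum & 0xFFFFu),
                  static_cast<std::uint16_t>(highSum & 0xFFFFu)};
}

// Small self-check: the carry out of the low half reaches the high half.
inline void checkAddModel() {
  HalfPair r = add32ViaHalves(splitLane32(0x0001FFFFu), splitLane32(1u));
  assert(joinLane32(r) == 0x00020000u);
}
```

// The result wraps modulo 2^32, like a native 32-bit add would.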

				SDNode *ConnexDAGToDAGISel::selectAbsI32(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectAbsI32(): Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				/* We look into doing "instruction-select" to
				OpDst = ABS.i32 OpSRC1
				where the 2 operands are vectors of type <VFxi32>: */

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectAbsI32(): We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOpSrc1 = Node->getOperand(0);

				LLVM_DEBUG(dbgs() << "selectAbsI32(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectAbsI32(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_I32);

				/*
				Very Important:
				We convert the v4i32 abs operation into a sequence of nodes that take as
				input the v4i32 operand of the operation, convert it to a v8i16 operand
				using the NOP_BITCONVERT_WH node, and then instantiate the SDNodes
				emulating the v4i32 abs operation.
				At the end we put a NOP_BITCONVERT_HW SDNode converting the result from
				v8i16 to v4i32.
				Note that these NOP_BITCONVERT_* nodes are mostly helpful conceptually, but
				they also anchor the emulation nodes so that they are not scheduled badly.
				*/

				#ifdef MARKER_FOR_EMULATION
				SDNode *nodeOpSrcCastBogus = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, MVT::Other,
				// This gives error: MVT::Glue,
				nodeOpSrc1
				// chain edge
				// nodeOpChain
				);

				std::string exprStrBegin = "// Starting ABS.i32 emulation ;)";
				SDNode *inlineAsmNodeBegin = CreateInlineAsmNode(CrtDAG, exprStrBegin,
				// nodeOpSrcCast1, DL);
				nodeOpSrcCastBogus, DL);
				LLVM_DEBUG(dbgs() << "selectAbsI32: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");

				/* This node is also bogus, only for the sake of "sandwiching"
				the INLINE assembly with 2 NOPs. */
				SDNode *nodeOpSrcCast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_HH, DL, TYPE_VECTOR_I16, MVT::Other,
				// Important: this gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				SDValue(nodeOpSrcCastBogus, 0),
				// chain
				SDValue(inlineAsmNodeBegin, 0));
				#endif

				#include "Select_ABSi32_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing ABS.i32 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resH,
				DL);
				LLVM_DEBUG(dbgs() << "selectAbsI32(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");

				LLVM_DEBUG(dbgs() << "selectAbsI32(): resH = "; resH->dump(CrtDAG);
				dbgs() << "\n");
				// return inlineAsmNodeEnd;
				// Gives error: <<SelectionDAG.cpp:6421: void llvm::SelectionDAG::
				// ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*):
				// Assertion `(!From->hasAnyUseOfValue(i) ||
				// From->getValueType(i) == To->getValueType(i)) &&
				// "Cannot use this version of ReplaceAllUsesWith!"' failed.>>
				#endif

				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resH, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				SDValue(resH, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectAbsI32(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectAbsI32()
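
// Illustrative sketch (not part of the patch): a 32-bit absolute value can
// also be computed on 16-bit halves, by testing the sign bit of the high half
// and, if it is set, negating in two's complement (~x + 1) with a carry from
// the low half into the high half. The real sequence lives in
// Select_ABSi32_OpincaaCodeGen.h, which is not shown here, and may differ.
// The names LaneHalves, toHalves, fromHalves and abs32ViaHalves are
// hypothetical, and the per-lane branch is a scalar simplification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane stored as two 16-bit halves.
struct LaneHalves {
  std::uint16_t lo;
  std::uint16_t hi;
};

inline LaneHalves toHalves(std::uint32_t x) {
  return LaneHalves{static_cast<std::uint16_t>(x & 0xFFFFu),
                    static_cast<std::uint16_t>(x >> 16)};
}

inline std::uint32_t fromHalves(LaneHalves h) {
  return (static_cast<std::uint32_t>(h.hi) << 16) | h.lo;
}

// Absolute value of a two's-complement 32-bit lane, using 16-bit pieces.
// Note: the most negative value (0x80000000) maps to itself, as it does for
// any wrapping two's-complement negation.
inline LaneHalves abs32ViaHalves(LaneHalves x) {
  if ((x.hi & 0x8000u) == 0)
    return x; // sign bit clear: already non-negative
  std::uint32_t lowNeg =
      static_cast<std::uint32_t>(static_cast<std::uint16_t>(~x.lo)) + 1u;
  std::uint32_t carry = lowNeg >> 16; // 1 only when x.lo was 0
  std::uint32_t highNeg =
      static_cast<std::uint32_t>(static_cast<std::uint16_t>(~x.hi)) + carry;
  return LaneHalves{static_cast<std::uint16_t>(lowNeg & 0xFFFFu),
                    static_cast<std::uint16_t>(highNeg & 0xFFFFu)};
}

// Small self-check: -5 (0xFFFFFFFB) becomes 5.
inline void checkAbsModel() {
  assert(fromHalves(abs32ViaHalves(toHalves(0xFFFFFFFBu))) == 5u);
}
```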

				SDNode *ConnexDAGToDAGISel::selectSubI32(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectSubI32(): Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectSubI32(): We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectSubI32(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectSubI32(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectSubI32(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectSubI32(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_I32);

				SDNode *nodeOpSrcCast1 =
				CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16,
				// MVT::Other,
				MVT::Glue, nodeOpSrc1);
				//
				std::string exprStrBegin = "// Starting SUB.i32 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "Select() for SUB.i32: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				//
				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, MVT::Other,
				// Important: it gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				// SDValue(nodeOpSrcCast1, 1)
				SDValue(inlineAsmNodeBegin, 0));

				#include "Select_SUBi32_OpincaaCodeGen.h"

				std::string exprStrEnd = "// Finishing SUB.i32 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resH,
				DL);
				LLVM_DEBUG(dbgs() << "selectSubI32(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");

				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resH, 0),
				// chain edge
				// SDValue(resH, 1)
				SDValue(inlineAsmNodeEnd, 0));
				LLVM_DEBUG(dbgs() << "selectSubI32(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectSubI32()

				SDNode *ConnexDAGToDAGISel::selectMulI32(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectMulI32(): [LATEST] Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectMulI32(): We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectMulI32(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectMulI32(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectMulI32(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectMulI32(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_I32);

				SDNode *nodeOpSrcCast1 =
				CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				#else
				MVT::Glue,
				#endif
				nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting MUL.i32 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectMulI32: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				TYPE_VECTOR_I16, MVT::Other,
				// Important: this gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				// Note: COPY generated by TwoAddressInstruction in WHERE blocks and handled
				// by me in ConnexTargetMachine.cpp, etc.

				//#include "Select_MULTi32_SignAndMagnitude_OpincaaCodeGen.h"
				#include "Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing MUL.i32 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resH,
				DL);
				LLVM_DEBUG(dbgs() << "selectMulI32(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				#endif

				// End of method - we convert resH (vector of i16) to resW (vector of i32)
				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resH, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				SDValue(resH, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectMulI32(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectMulI32()

				SDNode *ConnexDAGToDAGISel::selectSraI32(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectSraI32(): [LATEST] Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectSraI32(): We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOp0 = Node->getOperand(0);
				SDValue nodeOp1 = Node->getOperand(1);

				SDNode *nodeOpSrcCast1 =
				CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				#else
				MVT::Glue,
				#endif
				nodeOp0);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting SHRA.i32 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectSraI32(): inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, MVT::Other,
				// Important: this can give error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOp1,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				#include "Select_SHRAi32_OpincaaCodeGen.h"

				LLVM_DEBUG(dbgs() << "selectSraI32(): resH = "; resH->dump(CrtDAG);
				dbgs() << "\n");

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing SHRA.i32 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resH,
				DL);
				LLVM_DEBUG(dbgs() << "selectSraI32(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				#endif

				// End of method - we convert resH (vector of i16) to resW (vector of i32)
				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resH, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				SDValue(resH, 1)
				#endif
				);

				LLVM_DEBUG(dbgs() << "selectSraI32(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectSraI32()

				SDNode *ConnexDAGToDAGISel::selectAddF16(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectAddF16(): [LATEST] Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectAddF16(): We are in the case TYPE_VECTOR_F16\n");
				typeVecNode = TYPE_VECTOR_F16;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectAddF16(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectAddF16(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectAddF16(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectAddF16(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_F16);

				SDNode *nodeOpSrcCast1 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				// It gives error: MVT::Glue,
				#else
				MVT::Glue,
				#endif
				nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting add.f16 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectAddF16: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16, MVT::Other,
				// Important: it gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				// Note: COPY generated by TwoAddressInstruction in WHERE blocks and handled
				// by me in ConnexTargetMachine.cpp, etc.

				#include "Select_ADDf16_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing add.f16 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resF16,
				DL);
				LLVM_DEBUG(dbgs() << "selectAddF16(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				#endif

				// End of method - we bit-convert resF16 (vector of i16) to resW (vector of
				// f16, i.e. typeVecNode)
				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resF16, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				SDValue(resF16, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectAddF16(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectAddF16()

				SDNode *ConnexDAGToDAGISel::selectSubF16(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectSubF16(): [LATEST] Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectSubF16(): We are in the case TYPE_VECTOR_F16\n");
				typeVecNode = TYPE_VECTOR_F16;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectSubF16(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectSubF16(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectSubF16(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectSubF16(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_F16);

				SDNode *nodeOpSrcCast1 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				// This gives error: MVT::Glue,
				#else
				MVT::Glue,
				#endif
				nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting sub.f16 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectSubF16: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16, MVT::Other,
				// Important: it gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				// Note: COPY generated by TwoAddressInstruction in WHERE blocks and handled
				// by me in ConnexTargetMachine.cpp, etc.

				#include "Select_SUBf16_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing sub.f16 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resF16,
				DL);
				LLVM_DEBUG(dbgs() << "selectSubF16(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				#endif

				// End of method - we bit-convert resF16 (vector of i16) to resW (vector of
				// f16, i.e. typeVecNode)
				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resF16, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				SDValue(resF16, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectSubF16(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectSubF16()

				SDNode *ConnexDAGToDAGISel::selectLtF16(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectLtF16(): [LATEST] Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectLtF16(): We are in the case TYPE_VECTOR_F16\n");
				typeVecNode = TYPE_VECTOR_F16;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectLtF16(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectLtF16(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectLtF16(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectLtF16(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_F16);

				SDNode *nodeOpSrcCast1 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				// It gives error: MVT::Glue,
				#else
				MVT::Glue,
				#endif
				nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting lt.f16 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectLtF16: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16, MVT::Other,
				// Important: it gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				// Note: COPY generated by TwoAddressInstruction in WHERE blocks and handled
				// by me in ConnexTargetMachine.cpp, etc.

				#include "Select_LTf16_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing lt.f16 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd,
				lastNode, // resF16,
				DL);
				LLVM_DEBUG(dbgs() << "selectLtF16(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				#endif

				// End of method - unlike the other f16 helpers, the result stays a vector
				// of i16 (the comparison result); its extra Glue output is consumed by
				// selectVSELECT().
				SDNode *resW =
				CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				// typeVecNode,
				TYPE_VECTOR_I16, MVT::Glue, SDValue(resF16, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				SDValue(resF16, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectLtF16(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectLtF16()

				SDNode *ConnexDAGToDAGISel::selectMulF16(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectMulF16(): [LATEST] Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectMulF16(): We are in the case TYPE_VECTOR_F16\n");
				typeVecNode = TYPE_VECTOR_F16;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectMulF16(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectMulF16(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectMulF16(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectMulF16(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_F16);

				SDNode *nodeOpSrcCast1 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				// This gives a serious error:
				// MVT::Glue,
				#else
				MVT::Glue,
				#endif
				nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting mult.f16 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectMulF16: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16, MVT::Other,
				// Important: it gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				// Note: COPY generated by TwoAddressInstruction in WHERE blocks and handled
				// by me in ConnexTargetMachine.cpp, etc.

				#include "Select_MULTf16_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing mult.f16 emulation ;)";
				SDNode *inlineAsmNodeEnd =
				CreateInlineAsmNode(CrtDAG, exprStrEnd, lastNode, DL);
				LLVM_DEBUG(dbgs() << "selectMulF16(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				#endif

				// End of method - we bit-convert resF16 (vector of i16) to resW (vector of
				// f16, i.e. typeVecNode)
				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resF16, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				#error Normally no longer supported
				SDValue(resF16, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectMulF16(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectMulF16()

				SDNode *ConnexDAGToDAGISel::selectDivF16(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectDivF16(): [LATEST] Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)

				LLVM_DEBUG(dbgs() << "selectDivF16(): We are in the case TYPE_VECTOR_F16\n");
				typeVecNode = TYPE_VECTOR_F16;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectDivF16(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectDivF16(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectDivF16(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectDivF16(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_F16);

				SDNode *nodeOpSrcCast1 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				// It gives error: MVT::Glue,
				#else
				MVT::Glue,
				#endif
				nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting div.f16 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectDivF16: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// The output type of the node
				TYPE_VECTOR_I16, MVT::Other,
				// Important: it gives error:
				// <<Assertion
				// `N->getNodeId() == -1 &&
				// "Node already inserted!">>
				// MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				// Note: COPY generated by TwoAddressInstruction in WHERE blocks and handled
				// by me in ConnexTargetMachine.cpp, etc.

				#include "Select_DIVf16_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing div.f16 emulation ;)";
				SDNode *inlineAsmNodeEnd =
				CreateInlineAsmNode(CrtDAG, exprStrEnd, lastNode, DL);
				LLVM_DEBUG(dbgs() << "selectDivF16(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");
				#endif

				// End of method - we bit-convert resF16 (vector of i16) to resW (vector of
				// f16, i.e. typeVecNode)
				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(resF16, 0),
				// chain edge
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeEnd, 0)
				#else
				#error Normally no longer supported
				SDValue(resF16, 1)
				#endif
				);
				LLVM_DEBUG(dbgs() << "selectDivF16(): resW = "; resW->dump(CrtDAG);
				dbgs() << "\n");

				return resW;
				} // End selectDivF16()

				SDNode *ConnexDAGToDAGISel::selectDivI16(SDNode *Node) {
				LLVM_DEBUG(dbgs() << "Entered selectDivI16(): Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "\n");

				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;

				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				LLVM_DEBUG(dbgs() << "selectDivI16(): nodeOpSrc1.getValueType() = "
				<< nodeOpSrc1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectDivI16(): nodeOpSrc1 = ";
				(nodeOpSrc1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectDivI16(): nodeOpSrc2.getValueType() = "
				<< nodeOpSrc2.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectDivI16(): nodeOpSrc2 = ";
				(nodeOpSrc2.getNode())->dump(); dbgs() << "\n");
				// assert(nodeOpSrc.getValueType() == TYPE_VECTOR_I16);

				SDNode *nodeOpSrcCast1 = CrtDAG->getMachineNode(
				// Important: this is a BOGUS NOP_BITCONVERT - we emit it
				// only because it has a Glue result, while nodeOpSrc1
				// does NOT
				Connex::NOP_BITCONVERT_HH, DL, TYPE_VECTOR_I16,
				#ifdef MARKER_FOR_EMULATION
				MVT::Other,
				// It gives error: MVT::Glue,
				#else
				MVT::Glue,
				#endif
				nodeOpSrc1);

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrBegin = "// Starting DIV.i16 emulation ;)";
				SDNode *inlineAsmNodeBegin =
				CreateInlineAsmNode(CrtDAG, exprStrBegin, nodeOpSrcCast1, DL);
				LLVM_DEBUG(dbgs() << "selectDivI16: inlineAsmNodeBegin = ";
				inlineAsmNodeBegin->dump(); dbgs() << "\n");
				#endif

				SDNode *nodeOpSrcCast2 = CrtDAG->getMachineNode(
				// Important: this is a BOGUS NOP_BITCONVERT - we emit it
				// only because it has a Glue result, while nodeOpSrc2
				// does NOT
				Connex::NOP_BITCONVERT_HH, DL, TYPE_VECTOR_I16, MVT::Other,
				// It gives error: MVT::Glue,
				nodeOpSrc2,
				// chain
				#ifdef MARKER_FOR_EMULATION
				SDValue(inlineAsmNodeBegin, 0)
				#else
				SDValue(nodeOpSrcCast1, 1)
				#endif
				);

				#include "Select_DIVi16_OpincaaCodeGen.h"

				#ifdef MARKER_FOR_EMULATION
				std::string exprStrEnd = "// Finishing DIV.i16 emulation ;)";
				SDNode *inlineAsmNodeEnd = CreateInlineAsmNode(CrtDAG, exprStrEnd, resH, DL);
				LLVM_DEBUG(dbgs() << "selectDivI16(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");

				LLVM_DEBUG(dbgs() << "selectDivI16(): resH = "; resH->dump(CrtDAG);
				dbgs() << "\n");
				// return inlineAsmNodeEnd;
				// Gives error: <<SelectionDAG.cpp:6421:
				// void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*):
				// Assertion `(!From->hasAnyUseOfValue(i) ||
				// From->getValueType(i) == To->getValueType(i)) &&
				// "Cannot use this version of ReplaceAllUsesWith!"' failed.>>

				SDNode *resHH = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HH, DL,
				TYPE_VECTOR_I16, SDValue(resH, 0),
				// chain edge
				// SDValue(resH, 1)
				SDValue(inlineAsmNodeEnd, 0));
				LLVM_DEBUG(dbgs() << "selectDivI16(): resHH = "; resHH->dump(CrtDAG);
				dbgs() << "\n");

				return resHH;
				#else
				return resH;
				#endif
				} // End selectDivI16()

				SDNode *ConnexDAGToDAGISel::selectVSELECT(SDNode *vselectNode) {
				// We expand ("instruction-select") the machine-independent instruction
				//   dst = VSELECT pred, true_assignment, false_assignment
				// (per lane i: dst[i] = pred[i] ? true_assignment[i] : false_assignment[i])
				// into the following Connex machine instruction sequence. The comparison
				// is excluded from the listing below and will be scheduled before it.
				//
				//   // For pred == false:
				//   dst = false_assignment
				//   WHERExy
				//     // For pred == true:
				//     dst = true_assignment
				//   END_WHERE
				//
				// NOTE: we could use a WHERE !pred to assign for the false case,
				// but the "destructive" assignment above is OK and
				// takes fewer instructions.

				// In the end I do VSELECT treatment here, in
				// ConnexISelDAGToDAG, and not in ISelLowering::LowerOperation.
				//
				// Note that register allocation is performed after Instruction selection
				// (see [Cardoso_2014], Figure on page 134).
				//
				// Note that it is not strictly required to create virtual registers for
				// the ORV_H machine instructions (we failed to add a chain (ch) input to
				// the setcc - see 50_IfConversion/Setcc_with_ch_input_port_NOT_working -
				// and I guess we would fail here also). Still, we create one for the true
				// ORV_H, because we need to make the associated predecessor CopyToReg
				// a successor of WHEREEQ; otherwise the WHEREEQ would not have a successor.
				// TODO (only if we want to be very thorough):
				// I guess we could make the CopyToReg successor of ORV_H a successor of
				// WHEREEQ and get rid of all input virtual registers.
				// NOTE: we canNOT get rid of the virtual register that keeps the result of
				// both ORV_H, because we could replace it only with a VSELECT (reminds me
				// of dataflow machines and multiplexers :) ), BUT we want
				// to lower VSELECT in other components.
				//
				// Note that the nodes we create here have to have correct ordering,
				// otherwise instruction selection can fail or have wrong semantics.

				// END_WHERE, etc. are defined in an anonymous enum in the TableGen-generated
				// ConnexGenInstrInfo.inc

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html:
				// LLVMContext * getContext () const

				SDLoc DL(vselectNode);

				EVT ViaVecTy;
				EVT typeVecNode;

				LLVM_DEBUG(dbgs() << "Entered selectVSELECT(): Selecting vselectNode = ";
				vselectNode->dump(CrtDAG); dbgs() << "\n");

				// EVT ResTy = Node->getValueType(1); // 0 is ch (chain)
				EVT ResTy = vselectNode->getValueType(0);
				LLVM_DEBUG(dbgs() << " ResTy = " << ResTy.getEVTString() << "\n");

				// SDValue chain = DAG.getEntryNode();

				assert(vselectNode->getNumOperands() == 3);
				LLVM_DEBUG(dbgs() << " selectVSELECT(): Initially vselectNode->use_size() = "
				<< vselectNode->use_size() << "\n");
				for (SDNode::use_iterator UI = vselectNode->use_begin(),
				UE = vselectNode->use_end();
				UI != UE; ++UI) {
				// Note: UI is an SDNode *
				LLVM_DEBUG(
				dbgs() << " selectVSELECT(): Initially a use of vselectNode is: ";
				UI->print(dbgs()); dbgs() << "\n");
				}

				// EVT nodeResType = vselectNode->getValueType(0);
				SDValue vselectNodeOp0 = vselectNode->getOperand(0);
				SDValue vselectNodeOp1 = vselectNode->getOperand(1);
				SDValue vselectNodeOp2 = vselectNode->getOperand(2);
				//
				LLVM_DEBUG(dbgs() << "selectVSELECT(): vselectNodeOp0.getValueType() = "
				<< vselectNodeOp0.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectVSELECT(): vselectNodeOp0 = ";
				(vselectNodeOp0.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectVSELECT(): vselectNodeOp1.getValueType() = "
				<< vselectNodeOp1.getValueType().getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "selectVSELECT(): vselectNodeOp1 = ";
				(vselectNodeOp1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectVSELECT(): vselectNodeOp2 = ";
				(vselectNodeOp2.getNode())->dump(); dbgs() << "\n");

				SDValue setCC = vselectNodeOp0;
				SDNode *setCCNode = setCC.getNode();
				SDValue setCCPred = vselectNodeOp0.getNode()->getOperand(2);
				SDNode *setCCPredNode = setCCPred.getNode();
				//
				LLVM_DEBUG(dbgs() << "selectVSELECT(): setCCPredNode = ";
				// << setCCPredNode
				setCCPredNode->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectVSELECT(): setCCNode = ";
				// << setCCPredNode
				setCCNode->dump(); dbgs() << "\n");

				SDValue setCCNodeOp0 = setCCNode->getOperand(0);
				SDValue setCCNodeOp1 = setCCNode->getOperand(1);
				SDValue setCCNodeOp2 = setCCNode->getOperand(2);
				//
				LLVM_DEBUG(dbgs() << "selectVSELECT(): setCCNodeOp0 = ";
				(setCCNodeOp0.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectVSELECT(): setCCNodeOp1 = ";
				(setCCNodeOp1.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "selectVSELECT(): setCCNodeOp2 = ";
				(setCCNodeOp2.getNode())->dump(); dbgs() << "\n");

				assert(setCCPredNode->isMachineOpcode() == false);
				assert(setCCPredNode->getOpcode() == ISD::CONDCODE);

				// EVT ResTy = TYPE_VECTOR_I16;

				unsigned whereOpcode;
				unsigned wherePredOpcode;
				unsigned typeCmp = cast<CondCodeSDNode>(setCCPredNode)->get();

				switch (typeCmp) {
				case ISD::SETEQ:
				whereOpcode = Connex::WHEREEQ;
				wherePredOpcode = Connex::EQ_H;
				break;
				case ISD::SETLT:
				whereOpcode = Connex::WHERELT;
				wherePredOpcode = Connex::LT_H;
				break;
				case ISD::SETULT:
				whereOpcode = Connex::WHERELT;
				wherePredOpcode = Connex::ULT_H;
				break;
				case ISD::SETOLT: {
				// We do operator strength reduction if one of the operands is 0
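				// dyn_cast (not cast), so a non-BUILD_VECTOR operand falls through to the
				// "NOT implemented" assert below instead of failing inside the cast.
				BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(setCCNodeOp1);

				if (BVN && isSplatVector(BVN) &&
				setCCNodeOp1->getConstantOperandVal(0) == 0) {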
				LLVM_DEBUG(
				dbgs()
				<< "selectVSELECT(): operator strength reduction if one of the "
				"operands is 0\n");

				typeCmp = ISD::SETLT; // ISD::SETULT;
				ResTy = TYPE_VECTOR_I16;

				whereOpcode = Connex::WHERELT;
				wherePredOpcode = Connex::LT_H;
				break;
				}

				assert(0 && "selectVSELECT(): Comparison (typeCmp) NOT implemented "
				"--> MUST IMPLEMENT");
				whereOpcode = Connex::WHERELT;
				wherePredOpcode = Connex::LT_H;
				break;
				}
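				default:
				// llvm_unreachable (instead of assert(0)) also stops builds with
				// assertions disabled from continuing with whereOpcode uninitialized.
				llvm_unreachable("selectVSELECT(): Comparison (typeCmp) NOT implemented "
				"--> MUST IMPLEMENT");
				// "VSELECT NOT implemented for other types than i16 - "
				// "need to do WHERELT/ULT.i32/f16"
				// (WHEREEQ.i32/f16 is identical to WHEREEQ.i16)");
				}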

				/*
				assert(ResTy == TYPE_VECTOR_I16 &&
				"VSELECT NOT implemented for other types than i16 - "
				"need to do WHERELT/ULT.i32/f16 "
				" (WHEREEQ.i32/f16 is identical to WHEREEQ.i16)");
				*/

				switch (typeCmp) {
				// ISD::SETEQ/SETLT/SETULT follow the same code to add glue - hence NO break
				case ISD::SETEQ:
				case ISD::SETLT:
				case ISD::SETULT: {
				// Here we add a glue edge to the predicate SDNode by creating a new
				// predicate MachineSDNode with a Glue result.
				// Without it, a scheduler such as pre-RA-sched=source can break up
				// this block of Connex (Machine) instructions implementing VSELECT.
				SDNode *newNode = CrtDAG->getMachineNode(wherePredOpcode, DL,
				// TYPE_VECTOR_I16,
				ResTy, MVT::Glue,
				// The 1st vector for comparison
				setCCNodeOp0,
				// The 2nd vector for comparison
				setCCNodeOp1
				// Glue input edge
				);
				LLVM_DEBUG(dbgs() << "selectVSELECT(): newNode = "; newNode->dump();
				dbgs() << "\n");

				ReplaceNode(setCCNode, newNode);
				setCCNode = newNode;
				setCC = SDValue(newNode, 0);

				break;
				}
				//
				case ISD::SETOLT: {
				// This is lt.f16

				whereOpcode = Connex::WHEREEQ;
				ResTy = TYPE_VECTOR_F16;

				// We ISel an lt.f16 and compare its result with 1.
				SDNode *resLtF16 = selectLtF16(setCCNode);

				// VLOAD 1;
				SDValue ct1 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				SDNode *vload1 = CrtDAG->getMachineNode(Connex::VLOAD_H, DL,
				TYPE_VECTOR_I16, MVT::Glue, ct1,
				// Glue input edge
				SDValue(resLtF16, 1));

				SDNode *newNode =
				CrtDAG->getMachineNode(Connex::EQ_H, DL, TYPE_VECTOR_I16, MVT::Glue,
				SDValue(resLtF16, 0), SDValue(vload1, 0),
				// Glue input edge
				SDValue(vload1, 1));
				LLVM_DEBUG(dbgs() << "selectVSELECT(): newNode = "; newNode->dump();
				dbgs() << "\n");

				ReplaceNode(setCCNode, newNode);
				setCCNode = newNode;
				setCC = SDValue(newNode, 0);

				break;
				}
				default:
				assert(0 && "case not reachable");
				break;
				}

				SDValue ct1 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16,
				true, false);
				SDNode *nopCopyFalse =
				CrtDAG->getMachineNode(Connex::NOP_BPF, DL, MVT::Glue,
				// Important: this is a small abuse
				// (trick). Normally we should put ct1
				// here, BUT we need to prevent ISel
				// from removing setCCNode as dead
				// code, so we pass the only output
				// value of setCCNode instead.
				// If setCCNode had a glue or chain
				// output we could have used that
				// output instead at the end of this
				// argument list.
				// ct1,
				SDValue(setCCNode, 0)
				// Glue (or chain) input edge
				// candidates, not used:
				// SDValue(eq1, 1)
				// SDValue(setCCNode, 1)
				//   Gives error: Assertion
				//   `(!Node || ResNo <
				//   Node->getNumValues()) &&
				//   "Invalid result number for
				//   the given node!"' failed.
				// SDValue(setCCNode, 0)
				//   Gives error: Assertion
				//   `NumMIOperands >=
				//   II.getNumOperands()
				//   && NumMIOperands <=
				//   II.getNumOperands() +
				//   II.getNumImplicitDefs() +
				//   NumImpUses && "#operands for
				//   dag node doesn't match .td
				//   file!"' failed.
				);

				SDNode *whereEq = CrtDAG->getMachineNode(whereOpcode, DL,
				// TYPE_VECTOR_I16,
				// MVT::Other,
				MVT::Glue,
				// SDValue(nopCopyFalse, 0),
				// vselectNodeOp2,
				// Glue/chain edge
				// SDValue(idxPredicate, 1)
				// setCCPred
				SDValue(nopCopyFalse, 0));

				// Important: ORV_SPECIAL_H carries a tied-to constraint, so
				// vselectNodeOp1 and vselectNodeOp2 are allocated to the same
				// physical vector register (dst).
				// As a result, ORV_SPECIAL_H overwrites vselectNodeOp2 (the false
				// value) with the lanes of vselectNodeOp1 (the true value) for which
				// the predicate holds.
				SDNode *copyTrue = CrtDAG->getMachineNode(
				Connex::ORV_SPECIAL_H, DL,
				// TYPE_VECTOR_I16,
				// ResTy,
				CrtDAG->getVTList(vselectNode->getValueType(0), MVT::Glue),
				{
				vselectNodeOp1, vselectNodeOp1, vselectNodeOp2,
				// Glue edge
				// SDValue(whereEq, 1) // Glue edge
				SDValue(whereEq, 0) // Glue edge
				});

				SDNode *endWhere = CrtDAG->getMachineNode(Connex::END_WHERE, DL,
				// ResTy,
				// TYPE_VECTOR_I16,
				MVT::Other,
				// SDValue(copyTrue, 0),
				// MVT::Glue,
				/* Important: we put this bogus
				operand here to force the PostRA
				scheduler to keep the
				WHERE..END_WHERE block intact
				withOUT using instruction bundles.
				*/
				// chain edge
				SDValue(copyTrue, 1) // Glue edge
				);
				std::string exprStrEnd = "// Finishing VSELECT emulation ;)";
				SDNode *inlineAsmNodeEnd =
				CreateInlineAsmNode(CrtDAG, exprStrEnd, endWhere, DL);
				LLVM_DEBUG(dbgs() << "selectVSELECT(): inlineAsmNodeEnd = ";
				inlineAsmNodeEnd->dump(); dbgs() << "\n");

				// SDNode *res = resW;
				SDNode *res = copyTrue;
				LLVM_DEBUG(dbgs() << "selectVSELECT(): res = "; res->dump(); dbgs() << "\n");

				return res;
				} // End selectVSELECT()
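/*
Illustration only (not part of the back end): a minimal host-side sketch of
the lane-wise semantics that the WHERE..END_WHERE sequence built by
selectVSELECT() is meant to have, as far as the comments above describe it.
The destination starts out holding the false value, and the true value is
copied over it only in the lanes where the predicate holds. The names Lanes,
whereLt and vselectViaWhere are hypothetical and exist only in this sketch;
the signed 16-bit lane type is an assumption that matches the i16 case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Lanes = std::vector<int16_t>;

// Per-lane predicate, modelling the signed less-than comparison that feeds
// the WHERELT block.
std::vector<bool> whereLt(const Lanes &a, const Lanes &b) {
  assert(a.size() == b.size() && "operands must have the same lane count");
  std::vector<bool> pred(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    pred[i] = a[i] < b[i];
  return pred;
}

// dst starts as the false value; inside the "where" block only the active
// lanes are overwritten with the true value.
Lanes vselectViaWhere(const std::vector<bool> &pred, const Lanes &trueVal,
                      const Lanes &falseVal) {
  assert(pred.size() == trueVal.size() && pred.size() == falseVal.size());
  Lanes dst = falseVal;
  for (std::size_t i = 0; i < dst.size(); ++i)
    if (pred[i])
      dst[i] = trueVal[i];
  return dst;
}
```

Usage: vselectViaWhere(whereLt(a, b), t, f) returns t[i] in every lane where
a[i] < b[i] and f[i] in all other lanes.
*/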

				// Note: all ISD opcodes can be also found at
				// http://llvm.org/docs/doxygen/html/namespacellvm_1_1ISD.html.
				// There are also Connex opcodes that are generated by TableGen.
				void ConnexDAGToDAGISel::Select(SDNode *Node) {
				unsigned Opcode = Node->getOpcode();

				// Dump information about the Node being selected
				LLVM_DEBUG(
				dbgs() << "Entered ConnexDAGToDAGISel::Select(): Selecting Node = ";
				Node->dump(CrtDAG); dbgs() << "Opcode = " << Opcode << "\n");

				// If we have a (custom) Machine node, it means we already have selected it
				if (Node->isMachineOpcode()) {
				LLVM_DEBUG(dbgs() << "== "; Node->dump(CrtDAG); dbgs() << '\n');
				return;
				}

				// tablegen selection should be handled here.
				switch (Opcode) {
				default:
				LLVM_DEBUG(dbgs() << "ConnexDAGToDAGISel::Select(): default case: Opcode = "
				<< Opcode << "\n");
				break;

				/* From http://llvm.org/docs/doxygen/html/ISDOpcodes_8h_source.html:
				/// OUTCHAIN = INTRINSIC_VOID(INCHAIN, INTRINSICID, arg1, arg2, ...)
				/// This node represents a target intrinsic function with side effects that
				/// does not return a result. The first operand is a chain pointer. The
				/// second is the ID number of the intrinsic from the llvm::Intrinsic
				/// namespace. The operands to the intrinsic follow.
				*/
				case ISD::INTRINSIC_VOID: {
				LLVM_DEBUG(
				dbgs() << "ConnexDAGToDAGISel::Select(): case ISD::INTRINSIC_VOID"
				<< "\n");

				unsigned intrinsicOpcode =
				cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();
				LLVM_DEBUG(dbgs() << "intrinsicOpcode = " << intrinsicOpcode << "\n");

				/*
				LLVM_DEBUG(dbgs() << "Intrinsic::connex_end_repeat = "
				<< Intrinsic::connex_end_repeat << "\n");
				LLVM_DEBUG(dbgs() << "Intrinsic::connex_reduce = "
				<< Intrinsic::connex_reduce << "\n");
				LLVM_DEBUG(dbgs() << "Intrinsic::connex_repeat_x_times = "
				<< Intrinsic::connex_repeat_x_times << "\n");
				*/

				LLVM_DEBUG(dbgs() << "Node->getOperand(0) = "; Node->getOperand(0).dump();
				dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Node->getOperand(1) = "; Node->getOperand(1).dump();
				dbgs() << "\n");

				switch (intrinsicOpcode) {
				case Intrinsic::connex_repeat_x_times: {
				SDLoc DL(Node);

				/* llvm.connex.repeat.x.times SDNode has 3 operands:
				- 0, which is the chain - a bit to my surprise
				SelectionDAGBuilder puts as input to the chain port
				the node just above it, not SDNode t0
				- 1, which is the intrinsic's opcode
				- 2, which is the actual parameter
				t16: ch = llvm.connex.repeat.x.times t10, TargetConstant:i64<471>, t15
				*/

				LLVM_DEBUG(dbgs() << "ConnexDAGToDAGISel::Select(): case "
				"Intrinsic::connex_repeat_x_times"
				<< "\n");
				LLVM_DEBUG(dbgs() << " Node->getOperand(2) = ";
				Node->getOperand(2).dump(); dbgs() << "\n");

				#define CODE2018_06_29

				SDNode *repeatSpecial =
				CrtDAG->getMachineNode(Connex::REPEAT_SYM_IMM, DL,
				// Return types
				#ifdef CODE2018_06_29
				/* Gives error at "List Scheduling":
				- when doing things as correct as
				possible (glue edge put in
				CONNEX::INLINEASM as the last
				operand):
				<<Assertion `!AnyNotSched' failed.>>

				- glue edge put in ISD::INLINEASM as
				the last operand): <<Assertion
				`isMachineOpcode() &&
				"Not a MachineInstr opcode!"'
				failed.>>

				- <<Assertion `N->getNodeId() == -1
				&& "Node already inserted!">>
				- because I put this Glue edge as
				1st operand of INLINEASM, which
				is documented as being wrong
				*/
				MVT::Glue,
				#else
				MVT::Other,
				#endif
				// We add a chain edge
				/* Important: passing CrtDAG->getEntryNode()
				 * here was wrong: ReplaceNode() deletes the
				 * platform-independent REPEAT SDNode, whose
				 * chain input opnd0 (Node->getOperand(0),
				 * an inline ASM expression) is not used by
				 * any other node, so opnd0 would become a
				 * dead node and eventually be removed.
				 * Instead, we give opnd0 as input to the
				 * chain port of the new machine-dependent
				 * node, which keeps opnd0 alive.
				 */
				Node->getOperand(0));
				LLVM_DEBUG(dbgs() << "Select() for Intrinsic::connex_repeat_x_times: "
				"repeatSpecial = ";
				repeatSpecial->dump(); dbgs() << "\n");

				SDNode *op2 = Node->getOperand(2).getNode();
				LLVM_DEBUG(dbgs() << "op2 = "; op2->dump(); dbgs() << "\n");

				std::string exprStr = "-1";
				// This is wrong - we just put an incorrect value
				// TODO(2021_02_02): see below
				/*
				// TODO(2021_02_02):
				std::string exprStr = " " +
				recoverCExpressionFromSDNode(op2, crtNodeMap, true)
				+ ");";
				*/

				SDNode *inlineAsmNode =
				CreateInlineAsmNode(CrtDAG, exprStr, repeatSpecial, DL
				#ifdef CODE2018_06_29
				,
				true
				#endif
				);

				// ReplaceAllUsesWith(Node, inlineAsmNode);
				// CrtDAG->RemoveDeadNode(Node);
				// Gives at scheduling error: Assertion `Node2Index[SU->NodeNum] >
				// Node2Index[I->getSUnit()->NodeNum] &&
				// "Wrong topological sorting"' failed.
				// ReplaceNode defined in include/llvm/CodeGen/SelectionDAGISel.h .

				ReplaceNode(Node, inlineAsmNode);

				// This takes out the REPEAT and symbolic expression INLINE Asm
				// ReplaceNode(Node, Node->getOperand(0).getNode());
				return;
				}
				/*
				case Intrinsic::connex_end_repeat:
				// Note: this case is handled in TableGen match pattern in
				// ConnexInstrInfo_REPEAT.td
				*/
				default:
				break;
				}
				// Avoid falling through into case ISD::INTRINSIC_W_CHAIN.
				break;
				}

				/* From http://llvm.org/docs/doxygen/html/ISDOpcodes_8h_source.html:
				/// RESULT,OUTCHAIN = INTRINSIC_W_CHAIN(INCHAIN, INTRINSICID, arg1, ...)
				/// This node represents a target intrinsic function with side effects that
				/// returns a result. The first operand is a chain pointer. The second is
				/// the ID number of the intrinsic from the llvm::Intrinsic namespace. The
				/// operands to the intrinsic follow. The node has two results, the result
				/// of the intrinsic and an output chain.
				*/
				case ISD::INTRINSIC_W_CHAIN: {
				LLVM_DEBUG(
				dbgs() << "ConnexDAGToDAGISel::Select(): case ISD::INTRINSIC_W_CHAIN"
				<< "\n");
				unsigned Num = cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();
				LLVM_DEBUG(dbgs() << "Num = " << Num << "\n");
				switch (Num) {
				case Intrinsic::connex_load_byte:
				case Intrinsic::connex_load_half:
				case Intrinsic::connex_load_word: {
				SDLoc DL(Node);
				SDValue Chain = Node->getOperand(0);
				SDValue N1 = Node->getOperand(1);
				SDValue Skb = Node->getOperand(2);
				SDValue N3 = Node->getOperand(3);

				// TODO_CHANGE_BACKEND:
				// SDValue R6Reg = CrtDAG->getRegister(Connex::R6, MVT::i64);
				SDValue R6Reg = CrtDAG->getRegister(Connex::R6, TYPE_SCALAR_ELEMENT);

				Chain = CrtDAG->getCopyToReg(Chain, DL, R6Reg, Skb, SDValue());
				Node = CrtDAG->UpdateNodeOperands(Node, Chain, N1, R6Reg, N3);
				break;
				}
				case Intrinsic::connex_reduce: {
				// EVT ResTy = Node->getValueType(0);
				EVT ResTy = (Node->getOperand(2).getNode())->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for Intrinsic::connex_reduce:\n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for connex_reduce.i32\n");

				SDNode *reduceHigh16 = selectReduceI32(Node);

				ReplaceNode(Node, reduceHigh16);
				// Res // does NOT work - gives RT error: whereEq);
				// ReplaceNode(Node, nodeOpSrcCast);

				/*
				// See llvm.org/docs/doxygen/html/classllvm_1_1SelectionDAG.html
				CrtDAG->SelectNodeTo(Node,
				Connex::RED_H,
				TYPE_VECTOR_I16,
				SDValue(vloadCt0_srcAux, 0));
				*/
				return;
				} // End case Intrinsic::connex_reduce_i32
				else if (ResTy == TYPE_VECTOR_F16) {
				LLVM_DEBUG(dbgs() << "Select() for connex_reduce.f16\n");

				SDNode *reduceH = selectReduceF16(Node);

				ReplaceNode(Node, reduceH);

				return;
				} // End case Intrinsic::connex_reduce_f16
				}
				}
				break;
				}

				case ISD::FrameIndex: {
				int FI = cast<FrameIndexSDNode>(Node)->getIndex();
				EVT VT = Node->getValueType(0);
				SDValue TFI = CrtDAG->getTargetFrameIndex(FI, VT);
				unsigned Opc = Connex::MOV_rr;
				if (Node->hasOneUse()) {
				CrtDAG->SelectNodeTo(Node, Opc, VT, TFI);
				return;
				}
				ReplaceNode(Node, CrtDAG->getMachineNode(Opc, SDLoc(Node), VT, TFI));
				return;
				}

				case ISD::INSERT_VECTOR_ELT: {
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::INSERT_VECTOR_ELT.\n");
				return;
				}
				/*
				case ISD::SETCC: {
				SDNode *res = Select...(Node);
				ReplaceNode(Node, res);
				return;
				}
				*/
				case ISD::VSELECT: {
				SDNode *res = selectVSELECT(Node);
				ReplaceNode(Node, res);
				return;
				}
				case ISD::FNEG: {
				EVT ResTy = Node->getValueType(0);
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::FNEG: \n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");
				SDLoc DL(Node);
				SDValue nodeOpSrc = Node->getOperand(0);

				if (ResTy == TYPE_VECTOR_F16) {
				LLVM_DEBUG(dbgs() << "Select() for FNEG: "
				"We are in the case TYPE_VECTOR_F16\n");

				SDNode *nodeOpSrcCast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, ResTy, MVT::Glue, nodeOpSrc);

				SDValue ct0x8000 = CrtDAG->getConstant(0x8000, DL, MVT::i16, true, false);
				SDNode *vload0x8000 = CrtDAG->getMachineNode(
				Connex::VLOAD_H, DL, TYPE_VECTOR_I16, MVT::Glue, ct0x8000,
				// glue (or chain) input
				SDValue(nodeOpSrcCast, 1));

				SDNode *res = CrtDAG->getMachineNode(Connex::XORV_H, DL, ResTy, MVT::Glue,
				SDValue(nodeOpSrcCast, 0),
				SDValue(vload0x8000, 0),
				// glue (or chain) input edge
				SDValue(vload0x8000, 1));

				SDNode *resW = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				ResTy, SDValue(res, 0),
				// chain edge
				SDValue(res, 1));
				ReplaceNode(Node, resW);

				LLVM_DEBUG(dbgs() << "Select() for FNEG: Node = "; Node->dump();
				dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select() for FNEG: res = "; res->dump();
				dbgs() << "\n");

				return;
				}
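/*
Illustration only (not part of the back end): the f16 FNEG lowering above
XORs every 16-bit lane with 0x8000. A minimal scalar sketch of why that
negates an IEEE 754 binary16 value: bit 15 is the sign bit, and flipping it
leaves the exponent and mantissa bits untouched. The helper name fnegF16Bits
is hypothetical; it works on the raw bit pattern and does not model the
NOP_BITCONVERT nodes.

```cpp
#include <cassert>
#include <cstdint>

// Negate a binary16 value given as its raw 16-bit pattern by flipping the
// sign bit (bit 15).
constexpr uint16_t fnegF16Bits(uint16_t bits) {
  return static_cast<uint16_t>(bits ^ 0x8000u);
}

// 0x3C00 encodes +1.0 and 0xBC00 encodes -1.0 in binary16.
static_assert(fnegF16Bits(0x3C00) == 0xBC00, "+1.0 must become -1.0");
```

Applying the helper twice gives back the original bit pattern, which matches
the expectation that negation is its own inverse.
*/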
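				// Other types: do not fall through into case ISD::FADD, which reads
				// a second operand that FNEG does not have.
				break;
				}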
				case ISD::FADD: {
				EVT ResTy = Node->getValueType(0);
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::FADD: \n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");

				SDLoc DL(Node);
				SDValue nodeOpSrc1 = Node->getOperand(0);
				SDValue nodeOpSrc2 = Node->getOperand(1);

				// NEW_FP16
				if (ResTy == TYPE_VECTOR_F16) {
				LLVM_DEBUG(dbgs() << "Select() for FADD: "
				"We are in the case TYPE_VECTOR_F16\n");

				SDNode *res = selectAddF16(Node);
				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::FADD: Node = "; Node->dump();
				dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select() for ISD::FADD: res = "; res->dump();
				dbgs() << "\n");

				return;
				} else if (ResTy == MVT::f16) {
				// Scalar F16
				// TODO: We should emulate with BPF assembler the add.f16
				// scalar op. This means we need to use a NOP_CONVERT_F16_TO_I64, etc
				LLVM_DEBUG(dbgs() << "Select() for FADD:We are in the case MVT::F16\n");

				SDNode *res = CrtDAG->getMachineNode(Connex::ADD_rr, // This is actually
				// a BPF instruction
				DL, ResTy,
				// NOT working - error <<Assertion
				// `(!From->hasAnyUseOfValue(i) ||
				// From->getValueType(i) ==
				// To->getValueType(i)) &&
				// "Cannot use this version of
				// ReplaceAllUsesWith!"' failed.>>:
				// MVT::i64,
				// MVT::Other,
				// nodeOpSrc1,
				// I guess this is not needed,
				// since the auto-ISeled BPF
				// instructions don't need it
				// either
				nodeOpSrc1, nodeOpSrc2
				// opChain
				);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for scalar ISD::FADD: Node = ";
				Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select() for scalar ISD::FADD: res = "; res->dump();
				dbgs() << "\n");

				return;
				}
				// Other types: do not fall through into case ISD::FSUB.
				break;
				} // End ISD::FADD
				case ISD::FSUB: {
				EVT ResTy = Node->getValueType(0);
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::FSUB: \n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_F16) {
				LLVM_DEBUG(dbgs() << "Select() for FSUB: "
				"We are in the case TYPE_VECTOR_F16\n");
				// typeVecNode = TYPE_VECTOR_F16;

				SDNode *res = selectSubF16(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::FSUB: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				}
				// Other types: do not fall through into case ISD::FMUL.
				break;
				}
				case ISD::FMUL: {
				EVT ResTy = Node->getValueType(0);
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::FMUL: \n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_F16) {
				LLVM_DEBUG(dbgs() << "Select() for FMUL: "
				"We are in the case TYPE_VECTOR_F16\n");
				// typeVecNode = TYPE_VECTOR_F16;

				// TODO
				SDNode *res = selectMulF16(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::FMUL: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				}
				// Other types: do not fall through into case ISD::ABS.
				break;
				}

				case ISD::ABS: {
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::ABS: \n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for ABS: "
				"We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDNode *res = selectAbsI32(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::ABS: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				assert(0 && "Not implemented!");
				LLVM_DEBUG(dbgs() << "Select() for ABS: "
				"We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End ISD::ABS

				// NEW32
				case ISD::ADD: {
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::ADD: \n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for ADD: "
				"We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDNode *res = selectAddI32(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::ADD: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for ADD: "
				"We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End ISD::ADD
				// NEW32
				case ISD::SUB: {
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::SUB.\n"
				<< "Select() for SUB: "
				"ResTy = "
				<< ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for SUB: "
				"We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDNode *res = selectSubI32(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::SUB: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for SUB: "
				"We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End ISD::SUB
				case ISD::MUL: {
				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::MUL.\n");

				LLVM_DEBUG(dbgs() << "Select() for MUL: "
				"ResTy = "
				<< ResTy.getEVTString() << "\n");
				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for MUL: "
				"We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDNode *res = selectMulI32(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::MUL: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for ISD::MUL: We are in the case "
				"TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End case ISD::MUL
				case ISD::FDIV: {
				EVT ResTy = Node->getValueType(0);
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::FDIV: \n"
				<< " ResTy = " << ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_F16) {
				LLVM_DEBUG(dbgs() << "Select() for FDIV: "
				"We are in the case TYPE_VECTOR_F16\n");
				// typeVecNode = TYPE_VECTOR_F16;

				// TODO
				SDNode *res = selectDivF16(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::FDIV: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				}
				/*
				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::FDIV.\n");

				LLVM_DEBUG(dbgs() << "Select() for FDIV: "
				"ResTy = " << ResTy.getEVTString()
				<< "\n");
				*/
				// Other types: do not fall through into case ISD::SDIV.
				break;
				}
				// TODO: should be also case ISD::SDIVREM:
				case ISD::SDIV: {
				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::SDIV.\n");

				LLVM_DEBUG(dbgs() << "Select() for SDIV: "
				"ResTy = "
				<< ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for SDIV: "
				"We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				assert(0 && "Not implemented");

				/* SDNode *res = SelectDivI32(Node);
				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::DIV: res = ";
				res->dump(CrtDAG); dbgs() << "\n");
				*/
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for ISD::SDIV: "
				"We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;

				SDNode *res = selectDivI16(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::SDIV: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				}

				break;
				}
				case ISD::OR: {
				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::OR.\n");
				LLVM_DEBUG(dbgs() << "Select() for OR: "
				"ResTy = "
				<< ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for OR: "
				"We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOp0 = Node->getOperand(0);
				SDValue nodeOp1 = Node->getOperand(1);

				SDNode *nodeOp0Cast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, nodeOp0);
				SDNode *nodeOp1Cast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, nodeOp1);

				SDNode *Res16 = CrtDAG->getMachineNode(Connex::ORV_H, DL, TYPE_VECTOR_I16,
				// MVT::Other,
				SDValue(nodeOp0Cast, 0),
				SDValue(nodeOp1Cast, 0));
				LLVM_DEBUG(dbgs() << "Select() for ISD::OR: Res16 = ";
				Res16->dump(CrtDAG); dbgs() << "\n");

				SDNode *Res = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(Res16, 0));

				ReplaceNode(Node, Res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::OR: Res = "; Res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(
				dbgs() << "Select() for OR: We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End ISD::OR
				// NEW32
				case ISD::AND: {
				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::AND.\n");

				LLVM_DEBUG(dbgs() << "Select() for AND: ResTy = " << ResTy.getEVTString()
				<< "\n");
				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(
				dbgs() << "Select() for AND: We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOp0 = Node->getOperand(0);
				SDValue nodeOp1 = Node->getOperand(1);

				SDNode *nodeOp0Cast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, nodeOp0);
				SDNode *nodeOp1Cast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, nodeOp1);

				SDNode *Res16 = CrtDAG->getMachineNode(
				Connex::ANDV_H, DL, TYPE_VECTOR_I16,
				// MVT::Other,
				SDValue(nodeOp0Cast, 0), SDValue(nodeOp1Cast, 0));
				LLVM_DEBUG(dbgs() << "Select() for ISD::AND: Res16 = ";
				Res16->dump(CrtDAG); dbgs() << "\n");

				SDNode *Res = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(Res16, 0));

				ReplaceNode(Node, Res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::AND: Res = "; Res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(
				dbgs() << "Select() for AND: We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End ISD::AND
				// NEW32
				case ISD::XOR: {
				SDLoc DL(Node);

				/* TODO: check that the flags are also equivalent: XOR i16
				sets flags like SUBC:
				see ConnexVector.cpp
				BINARY_OP_FLAGS_LIKE_SUBC(^) - look for the macros
				*/

				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::XOR.\n");

				LLVM_DEBUG(dbgs() << "Select() for XOR: ResTy = " << ResTy.getEVTString()
				<< "\n");
				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(
				dbgs() << "Select() for XOR: We are in the case TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				SDValue nodeOp0 = Node->getOperand(0);
				SDValue nodeOp1 = Node->getOperand(1);

				SDNode *nodeOp0Cast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, nodeOp0);
				SDNode *nodeOp1Cast = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, nodeOp1);

				SDNode *Res16 = CrtDAG->getMachineNode(
				Connex::XORV_H, DL, TYPE_VECTOR_I16,
				// MVT::Other,
				SDValue(nodeOp0Cast, 0), SDValue(nodeOp1Cast, 0));
				LLVM_DEBUG(dbgs() << "Select() for ISD::XOR: Res16 = ";
				Res16->dump(CrtDAG); dbgs() << "\n");

				SDNode *Res = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL,
				typeVecNode, SDValue(Res16, 0));

				ReplaceNode(Node, Res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::XOR: Res = "; Res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(
				dbgs() << "Select() for XOR: We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End ISD::XOR
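/*
Illustration only (not part of the back end): the i32 OR, AND and XOR cases
above reinterpret the operands as i16 vectors, apply the 16-bit operation and
reinterpret the result back. A minimal scalar sketch of why this is safe for
bitwise operations: each result bit depends only on the two input bits at the
same position, so the halves can be processed independently. The helper name
bitwise32ViaHalves is hypothetical, and how the two halves are laid out
across Connex lanes is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Apply a 16-bit bitwise operation to the low and high halves of two 32-bit
// values and reassemble the 32-bit result.
template <typename Op16>
uint32_t bitwise32ViaHalves(uint32_t a, uint32_t b, Op16 op) {
  uint16_t lo = op(static_cast<uint16_t>(a & 0xFFFFu),
                   static_cast<uint16_t>(b & 0xFFFFu));
  uint16_t hi = op(static_cast<uint16_t>(a >> 16),
                   static_cast<uint16_t>(b >> 16));
  return (static_cast<uint32_t>(hi) << 16) | lo;
}
```

The same argument does not carry over to ADD, SUB or MUL, where carries cross
the 16-bit boundary; those cases use dedicated emulation routines such as
selectAddI32().
*/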
				// NEW32
				case ISD::SRA: { // Arithmetic Shift Right
				// See http://llvm.org/docs/LangRef.html#ashr-instruction
				// and http://en.wikipedia.org/wiki/Arithmetic_shift
				SDLoc DL(Node);

				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::SRA.\n");
				LLVM_DEBUG(dbgs() << "Select() for SRA: "
				"ResTy = "
				<< ResTy.getEVTString() << "\n");

				if (ResTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for SRA: We are in the case "
				"TYPE_VECTOR_I32\n");
				// typeVecNode = TYPE_VECTOR_I32;

				/*
				TODO(INTERESTING)
				// ConstantSDNode *nodeOp0CtSDNode = cast<ConstantSDNode>(nodeOp1);
				BuildVectorSDNode *BVN = cast<BuildVectorSDNode>(nodeOp1.getNode());
				// TODO: need to discriminate case: immediate operand - it takes fewer
				// cycles
				APInt SplatValue, SplatUndef;
				unsigned SplatBitSize;
				bool HasAnyUndefs;
				if (BVN->isConstantSplat(SplatValue, SplatUndef, SplatBitSize,
				HasAnyUndefs, 8, true) == true) {
				LLVM_DEBUG(
				dbgs() << "Select() for SRA: BVN->isConstantSplat() == TRUE\n");
				// MEGA-TODO: in this case we should do ISHRA.i32 instead of SHRA.i32
				}
				*/

				SDNode *res = selectSraI32(Node);

				ReplaceNode(Node, res);

				LLVM_DEBUG(dbgs() << "Select() for ISD::SRA: res = "; res->dump(CrtDAG);
				dbgs() << "\n");
				return;
				} else if (ResTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(
				dbgs() << "Select() for SRA: We are in the case TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;
				}

				break;
				} // End ISD::SRA
				// NEW32
				case ISD::MGATHER: {
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::MGATHER.\n");
				LLVM_DEBUG(dbgs() << " Select(): Node = "; Node->dump(); dbgs() << "\n");

				SDLoc DL(Node);
				EVT ViaVecTy;
				EVT typeVecNode;
				EVT ResTy = Node->getValueType(0);

				// cast<> asserts on a type mismatch, so no separate null check is needed.
				MaskedGatherSDNode *nodeGather = cast<MaskedGatherSDNode>(Node);

				// See llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html#l02107
				SDValue indexVec = nodeGather->getIndex();
				SDValue passthruVec = nodeGather->getPassThru();

				LLVM_DEBUG(dbgs() << "Select() for MGATHER: indexVec = ";
				(indexVec.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select() for MGATHER: passthruVec = ";
				(passthruVec.getNode())->dump(); dbgs() << "\n");

				EVT opIndexVecTy = indexVec.getValueType();
				EVT opValVecTy = passthruVec.getValueType();

				LLVM_DEBUG(dbgs() << "Select() for MGATHER: opIndexVecTy = "
				<< opIndexVecTy.getEVTString()
				<< ", opValVecTy = " << opValVecTy.getEVTString()
				<< ", ResTy = " << ResTy.getEVTString() << "\n");

				SDValue opChain = Node->getOperand(0);
				LLVM_DEBUG(dbgs() << "Select() for MGATHER: opChain = ";
				(opChain.getNode())->dump(); dbgs() << "\n");

				// NEW_FP16
				// if (opValVecTy == TYPE_VECTOR_F16)
				if (ResTy == TYPE_VECTOR_F16) {
				typeVecNode = TYPE_VECTOR_F16;

				LLVM_DEBUG(dbgs() << "Select() for MGATHER: We are in the case "
				"ResTy == TYPE_VECTOR_F16\n");

				#ifdef BITCAST_2018_06_F16
				SDNode *indexVec16 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// typeVecNode,
				TYPE_VECTOR_I16,
				// The address operand
				indexVec);
				#endif

				SDNode *Res16 = CrtDAG->getMachineNode(Connex::LD_INDIRECT_H, DL,
				#ifdef BITCAST_2018_06_F16
				TYPE_VECTOR_I16,
				#else
				typeVecNode,
				// We prevent getting error:
				// <<Assertion
				// `(!From->hasAnyUseOfValue(i)
				// || From->getValueType(i) ==
				// To->getValueType(i)) &&
				// "Cannot use this version of
				// ReplaceAllUsesWith!"'
				// failed.>>
				#endif
				// MVT::Other,
				#ifdef BITCAST_2018_06_F16
				SDValue(indexVec16, 0), // indexVec
				#else
				indexVec,
				#endif
				opChain);

				SDNode *Res;
				#ifdef BITCAST_2018_06_F16
				Res = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL, typeVecNode,
				SDValue(Res16, 0));
				#else
				Res = Res16;
				#endif

				// TODO: Use instead of Connex::NOP_BITCONVERT_WH a new node called
				// Connex::NOP_BITCONVERT_F16H
				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				} else if (opIndexVecTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for MGATHER: We are in the case "
				"TYPE_VECTOR_I32\n");
				typeVecNode = TYPE_VECTOR_I32;

				/* Very Important: we add opChain to chain this new node with the node
				the target-independent masked_gather node was chained with.
				If we do not do this then we will eventually have other useful
				chained nodes removed, resulting in an incorrect/partial program. */
				/* TODO: not sure if the chain is going to always be operand 0.
				However masked_gather has a chain following attribute SDNPHasChain,
				see include/llvm/Target/TargetSelectionDAG.td
				See also indirectly the other params (methods get*()) of
				MaskedGatherScatterSDNode at
				http://llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html
				*/
				#ifdef BITCAST_MAY2017_05_28
				SDNode *indexVec16 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// typeVecNode,
				TYPE_VECTOR_I16,
				// The address operand
				indexVec);
				#endif
				SDNode *Res16 = CrtDAG->getMachineNode(Connex::LD_INDIRECT_H, DL,
				// typeVecNode,
				TYPE_VECTOR_I16,
				// MVT::Other,
				#ifdef BITCAST_MAY2017_05_28
				SDValue(indexVec16, 0), // indexVec
				#else
				indexVec,
				#endif
				opChain);
				SDNode *Res;
				#ifdef BITCAST_MAY2017_05_28
				Res = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_HW, DL, typeVecNode,
				MVT::Other,
				// We need this only for DotProd.i16
				SDValue(Res16, 0));
				#else
				Res = Res16;
				#endif

				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				} else if (opIndexVecTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for MGATHER: We are in the case "
				"TYPE_VECTOR_I16\n");
				typeVecNode = TYPE_VECTOR_I16;

				SDNode *Res = CrtDAG->getMachineNode(Connex::LD_INDIRECT_H, DL,
				// typeVecNode,
				TYPE_VECTOR_I16,
				/* Usually it comes with a chain;
				putting it here avoids the error
				<<Assertion `ResNo < NumValues &&
				"Illegal result number!"' failed.>>
				*/
				MVT::Other, indexVec, opChain);

				LLVM_DEBUG(dbgs() << "Res = "; Res->dump(CrtDAG); dbgs() << "\n");
				ReplaceNode(Node, Res);

				return;
				}
				// Res = CrtDAG->getMachineNode(Connex::LD_INDIRECT_W, DL, ViaVecTy,
				// Node->getOperand(0));
				// Res = CrtDAG->getMachineNode(LD_INDIRECT_W_DESC_BASE, DL, ViaVecTy,
				// Node->getOperand(0));
				// Res = CrtDAG->getMachineNode(ST_INDIRECT_H_DESC_BASE, DL, ViaVecTy,
				// Node->getOperand(0));

				break;
				} // End ISD::MGATHER
				// NEW32
				case ISD::MSCATTER: {
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::MSCATTER.\n");

				SDLoc DL(Node);
				MVT typeVecNode;
				// For SCATTER it is chain: EVT ResTy = Node->getValueType(0);
				// MVT mResTy = ResTy.getSimpleVT();

				MaskedScatterSDNode *nodeScatter = dyn_cast<MaskedScatterSDNode>(Node);
				assert(nodeScatter != nullptr);
				// See llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html#l02107
				SDValue indexVec = nodeScatter->getIndex();
				SDValue sourceVec = nodeScatter->getValue();

				LLVM_DEBUG(dbgs() << "Select() for MSCATTER: indexVec = ";
				(indexVec.getNode())->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select() for MSCATTER: sourceVec = ";
				(sourceVec.getNode())->dump(); dbgs() << "\n");

				EVT opIndexVecTy = indexVec.getValueType();
				/* Node->getOperand(0).getValueType(); */ // getSimpleValueType();
				EVT opSourceVecTy = sourceVec.getValueType();

				LLVM_DEBUG(dbgs() << "Select() for MSCATTER: "
				<< "opIndexVecTy = " << opIndexVecTy.getEVTString()
				<< ", opSourceVecTy = " << opSourceVecTy.getEVTString()
				<< "\n");

				// NEW_FP16
				if (opSourceVecTy == TYPE_VECTOR_F16) {
				LLVM_DEBUG(dbgs() << "Select() for MSCATTER: We are in the case "
				"opSourceVecTy == TYPE_VECTOR_F16\n");

				// TODO: Instead of Connex::NOP_BITCONVERT_WH, use a new node called
				// Connex::NOP_BITCONVERT_F16H
				#ifdef BITCAST_2018_06_F16
				SDNode *sourceVec16 = CrtDAG->getMachineNode(
				Connex::NOP_BITCONVERT_WH, DL, TYPE_VECTOR_I16, sourceVec);
				SDNode *Res =
				CrtDAG->getMachineNode(Connex::ST_INDIRECT_H, DL, MVT::Other,
				indexVec, SDValue(sourceVec16, 0));
				#else
				SDNode *Res = CrtDAG->getMachineNode(Connex::ST_INDIRECT_H, DL,
				MVT::Other, indexVec, sourceVec);
				#endif

				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				} else if (opIndexVecTy == TYPE_VECTOR_I32) {
				LLVM_DEBUG(dbgs() << "Select() for MSCATTER: We are in the case "
				"opIndexVecTy == TYPE_VECTOR_I32\n");

				typeVecNode = TYPE_VECTOR_I32;

				/* Very Important: we add opChain to chain this new node with the node
				the target-independent masked_scatter node was chained with.
				If we do not do this then we will eventually have other useful
				chained nodes removed, resulting in an incorrect/partial program. */
				/* TODO: not sure if the chain is going to always be operand 0.
				However masked_gather has a chain following attribute SDNPHasChain,
				see include/llvm/Target/TargetSelectionDAG.td
				See also indirectly the other params (methods get*()) of
				MaskedGatherScatterSDNode at
				http://llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html
				*/
				SDValue opChain = Node->getOperand(0);
				LLVM_DEBUG(dbgs() << "Select() for MSCATTER: opChain = ";
				(opChain.getNode())->dump(); dbgs() << "\n");
				#ifdef BITCAST_MAY2017_05_28
				SDNode *indexVec16 = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// typeVecNode,
				TYPE_VECTOR_I16, indexVec);
				SDNode *sourceVec16 =
				CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH, DL,
				// typeVecNode,
				TYPE_VECTOR_I16, sourceVec);
				#endif
				SDNode *Res = CrtDAG->getMachineNode(Connex::ST_INDIRECT_H, DL,
				// typeVecNode,
				// voidEVT,
				MVT::Other,
				#ifdef BITCAST_MAY2017_05_28
				SDValue(indexVec16, 0), // indexVec,
				// sourceVec
				SDValue(sourceVec16, 0)
				#else
				indexVec, sourceVec
				#endif
				/*
				// ,opChain
				TODO: figure out why I can't add a
				chain edge to scatter like I did for
				MGATHER. MAYBE use:
				CrtDAG->getVTList(MVT::Other,
				MVT::Glue), */
				);

				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				} else if (opIndexVecTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for MSCATTER: We are in the case "
				"opIndexVecTy == TYPE_VECTOR_I16\n");

				typeVecNode = TYPE_VECTOR_I16;

				SDNode *Res = CrtDAG->getMachineNode(Connex::ST_INDIRECT_H, DL,
				// typeVecNode,
				// voidEVT,
				MVT::Other, indexVec, sourceVec);

				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				}

				/*
				LLVMContext &theContext = *(CrtDAG->getContext());
				EVT voidEVT = EVT::getEVT(Type::getVoidTy(theContext));
				LLVM_DEBUG(dbgs() << " voidEVT = "
				<< voidEVT.getEVTString() << "\n");
				*/

				break;
				} // End ISD::MSCATTER
				case ISD::ConstantPool: {
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::ConstantPool.\n");
				LLVM_DEBUG(dbgs() << " Select(): Node = "; Node->dump(); dbgs() << "\n");

				SDLoc DL(Node);

				// TODO: check for splat 0..CVL-1
				// TODO: I need to return TYPE_VECTOR_I16 (maybe create a virtreg
				// also)
				SDNode *Res = CrtDAG->getMachineNode(Connex::LDIX_H, DL, MVT::i64
				// TYPE_VECTOR_I16,
				// We add a chain edge
				// CrtDAG->getEntryNode()
				// sourceVec,
				// offsetVec
				// MVT::Other
				// offset,
				// basePtr,
				// opChain
				);
				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				}
				// NEW_FP16: required for non-vector BBs like for.body
				case ISD::LOAD: {
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::LOAD.\n");
				LLVM_DEBUG(dbgs() << " Select(): Node = "; Node->dump(); dbgs() << "\n");

				SDLoc DL(Node);
				EVT typeVecNode;
				EVT resTy = Node->getValueType(0);

				LoadSDNode *nodeLoad = dyn_cast<LoadSDNode>(Node);
				assert(nodeLoad != nullptr);

				// See http://llvm.org/doxygen/SelectionDAGNodes_8h_source.html
				SDValue opChain = nodeLoad->getOperand(0);
				SDValue basePtr = nodeLoad->getBasePtr(); // Operand 1
				SDValue offset = nodeLoad->getOffset(); // Operand 2
				LLVM_DEBUG(
				dbgs() << "Select() for LOAD: basePtr = "; (basePtr.getNode())->dump();
				dbgs() << "Select() for LOAD: offset = "; (offset.getNode())->dump();
				dbgs() << "Select() for LOAD: opChain = "; (opChain.getNode())->dump();
				dbgs() << "\n");

				EVT offsetTy = offset.getValueType();

				LLVM_DEBUG(dbgs() << "Select() for LOAD: "
				<< "resTy = " << resTy.getEVTString()
				<< ", offsetTy = " << offsetTy.getEVTString() << "\n");
				LLVM_DEBUG(dbgs() << "Select() for LOAD: offset = ";
				(offset.getNode())->dump(); dbgs() << " basePtr = ";
				(basePtr.getNode())->dump(); dbgs() << " opChain = ";
				(opChain.getNode())->dump(); dbgs() << "\n");

				if (resTy == MVT::f16) {
				LLVM_DEBUG(dbgs() << "Select() for LOAD: We are in the case "
				"resTy == MVT::f16\n");

				// small-TODO: although useless, normally we should emulate f16 on BPF
				SDNode *Res16 =
				CrtDAG->getMachineNode(Connex::LDH, DL, resTy, MVT::Other,
				// NOT useful: MVT::Other,
				////offset,
				// Error: <<Assertion
				// `Op.getValueType() !=
				// MVT::Other &&
				// Op.getValueType() !=
				// MVT::Glue &&
				// "Chain and glue operands
				// should occur at end of
				// operand list!"' failed.>>
				// opChain,
				basePtr,
				//
				// Important: Unfortunately this
				// operand becomes a register,
				// not an immediate: offset,
				CrtDAG->getTargetConstant(0, DL, MVT::i64),
				// TODO: we should put probably
				// a different value than 0
				//
				opChain
				// This gives <<LLVM ERROR:
				// Cannot select: t60:
				// i64 = ConstantPool<half
				// 0xH0000> 0>> , basePtr
				);
				SDNode *Res = Res16;

				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << " Select(): Res->getOpcode() = " << Res->getOpcode()
				<< "\n");

				ReplaceNode(Node, Res);

				return;
				} // End if (resTy == MVT::f16)
				else if (resTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for LOAD: We are in the case "
				"resTy == TYPE_VECTOR_I16\n");

				SDNode *Res =
				CrtDAG->getMachineNode(Connex::LD_INDIRECT_H, DL, TYPE_VECTOR_I16,
				// We add a chain edge
				// CrtDAG->getEntryNode()
				// sourceVec,
				// offsetVec
				MVT::Other,
				// offset,
				basePtr, opChain);
				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				}

				break;
				} // End ISD::LOAD
				// NEW_FP16: normally required for non-vector BBs like for.body
				case ISD::STORE: {
				LLVM_DEBUG(dbgs() << "Entered Select() for ISD::STORE.\n");
				LLVM_DEBUG(dbgs() << " Select(): Node = "; Node->dump(); dbgs() << "\n");

				SDLoc DL(Node);
				EVT typeNode;
				EVT resTy = Node->getValueType(0);

				StoreSDNode *nodeStore = dyn_cast<StoreSDNode>(Node);
				assert(nodeStore != nullptr);

				// See http://llvm.org/doxygen/SelectionDAGNodes_8h_source.html#l02076
				SDValue opChain = nodeStore->getOperand(0);
				SDValue source = nodeStore->getValue(); // Operand 1
				SDValue basePtr = nodeStore->getBasePtr(); // Operand 2
				SDValue offset = nodeStore->getOffset(); // Operand 3
				LLVM_DEBUG(
				dbgs() << "Select() for STORE: offset = "; offset.getNode()->dump();
				dbgs() << "Select() for STORE: basePtr = "; basePtr.getNode()->dump();
				dbgs() << "Select() for STORE: source = "; source.getNode()->dump();
				dbgs() << "Select() for STORE: opChain = "; opChain.getNode()->dump();
				dbgs() << "\n");

				EVT offsetTy = offset.getValueType();
				EVT sourceTy = source.getValueType();

				LLVM_DEBUG(dbgs() << "Select() for STORE: "
				<< "sourceTy = " << sourceTy.getEVTString()
				<< ", offsetTy = " << offsetTy.getEVTString()
				<< ", resTy = " << resTy.getEVTString() << "\n");

				if (sourceTy == MVT::f16) {
				/* We need to treat this case because the BPF processor doesn't
				have any floating point support.
				*/
				LLVM_DEBUG(dbgs() << "Select() for STORE: We are in the case "
				"sourceTy == MVT::f16\n");

				/*
				// TODO: Use instead of Connex::NOP_BITCONVERT_WH a new node called
				// Connex::NOP_BITCONVERT_F16H
				SDNode *Res = CrtDAG->getMachineNode(Connex::NOP_BITCONVERT_WH,
				DL,
				MVT::Other,
				sourceVec,
				offsetVec
				);
				*/

				/* Crappy but it works: this is a scalar f16 STORE - we simply
				avoid generating a useful instruction - we just replace it
				with "pseudo"-instruction NOP_BOGUS, which doesn't have a
				useful assembly instruction.
				*/
				SDNode *Res = CrtDAG->getMachineNode( // Connex::NOP_BPF,
				// This must take an immediate
				// operand
				// An unnecessary NOP: Connex::NOP,
				Connex::NOP_BOGUS, DL, MVT::Other,
				// We add a chain edge
				// CrtDAG->getEntryNode()
				// sourceVec,
				// offsetVec
				opChain);
				// assert(0 && "I don't think it's implemented - anyhow I don't think "
				// "it's (much) used - we should try harder with "
				// "NOP_BITCONVERT, etc...");

				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				} else if (sourceTy == TYPE_VECTOR_I16) {
				LLVM_DEBUG(dbgs() << "Select() for STORE: We are in the case "
				"sourceTy == TYPE_VECTOR_I16\n");

				SDNode *Res = CrtDAG->getMachineNode(Connex::ST_INDIRECT_H, DL,
				// We add a chain edge
				// CrtDAG->getEntryNode()
				// sourceVec,
				// offsetVec
				MVT::Other, offset, source, opChain);
				LLVM_DEBUG(dbgs() << "Select(): Node = "; Node->dump(); dbgs() << "\n");
				LLVM_DEBUG(dbgs() << "Select(): Res = "; Res->dump(); dbgs() << "\n");

				ReplaceNode(Node, Res);

				return;
				}

				break;
				} // End ISD::STORE

				// Inspired from MipsSEISelDAGToDAG.cpp
				case ISD::BUILD_VECTOR: {
				selectBUILD_VECTOR(Node);
				return;
				} // End case ISD::BUILD_VECTOR
				/*
				// Very Important: In ISelLowering the DAG Combiner changes
				// (I think in all cases) the vector_shuffle SDNode into a BUILD_VECTOR.
				case ISD::VECTOR_SHUFFLE: {
				selectVECTOR_SHUFFLE(Node);
				return;
				} // End case ISD::VECTOR_SHUFFLE
				*/
				} // End switch (Opcode)

				/*
				// Select the default instruction
				SDNode *ResNode = SelectCode(Node);

				LLVM_DEBUG(dbgs() << "=> ";
				if (ResNode == nullptr || ResNode == Node)
				Node->dump(CrtDAG);
				else
				ResNode->dump(CrtDAG);
				dbgs() << '\n');

				LLVM_DEBUG(dbgs() << "Exiting Select()\n"); // also calling SelectCode()\n");
				ReplaceNode(Node, ResNode);
				return;
				*/

				// Select the default instruction
				// SDNode *ResNode = SelectCode(Node);
				SelectCode(Node);
				}

				FunctionPass *llvm::createConnexISelDag(ConnexTargetMachine &TM) {
				return new ConnexDAGToDAGISel(TM);
				}

				// Added from MipsSEISelDAGToDAG.cpp
				/// Match frameindex
				bool ConnexDAGToDAGISel::selectAddrFrameIndex(SDValue Addr, SDValue &Base,
				SDValue &Offset) const {
				if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
				EVT ValTy = Addr.getValueType();

				Base = CrtDAG->getTargetFrameIndex(FIN->getIndex(), ValTy);
				Offset = CrtDAG->getTargetConstant(0, SDLoc(Addr), ValTy);
				return true;
				}
				return false;
				}

				// Added from MipsSEISelDAGToDAG.cpp
				/// Match frameindex+offset and frameindex|offset
				bool ConnexDAGToDAGISel::selectAddrFrameIndexOffset(SDValue Addr, SDValue &Base,
				SDValue &Offset,
				unsigned OffsetBits) const {
				if (CrtDAG->isBaseWithConstantOffset(Addr)) {
				// isBaseWithConstantOffset() guarantees operand 1 is a ConstantSDNode.
				ConstantSDNode *CN = cast<ConstantSDNode>(Addr.getOperand(1));
				if (isIntN(OffsetBits, CN->getSExtValue())) {
				EVT ValTy = Addr.getValueType();

				// If the first operand is a FI, get the TargetFI Node
				if (FrameIndexSDNode *FIN =
				dyn_cast<FrameIndexSDNode>(Addr.getOperand(0)))
				Base = CrtDAG->getTargetFrameIndex(FIN->getIndex(), ValTy);
				else
				Base = Addr.getOperand(0);

				Offset =
				CrtDAG->getTargetConstant(CN->getZExtValue(), SDLoc(Addr), ValTy);
				return true;
				}
				}
				return false;
				}

				// Added from MipsSEISelDAGToDAG.cpp
				bool ConnexDAGToDAGISel::selectAddrRegImm10(SDValue Addr, SDValue &Base,
				SDValue &Offset) const {
				if (selectAddrFrameIndex(Addr, Base, Offset))
				return true;

				if (selectAddrFrameIndexOffset(Addr, Base, Offset, 10))
				return true;

				return false;
				}

				// Added from MipsSEISelDAGToDAG.cpp
				bool ConnexDAGToDAGISel::selectAddrDefault(SDValue Addr, SDValue &Base,
				SDValue &Offset) const {
				Base = Addr;
				Offset = CrtDAG->getTargetConstant(0, SDLoc(Addr), Addr.getValueType());
				return true;
				}

				// Added from MipsSEISelDAGToDAG.cpp
				bool ConnexDAGToDAGISel::selectIntAddrMSA(SDValue Addr, SDValue &Base,
				SDValue &Offset) const {
				if (selectAddrRegImm10(Addr, Base, Offset))
				return true;

				if (selectAddrDefault(Addr, Base, Offset))
				return true;

				return false;
				}

				// Added from MipsSEISelDAGToDAG.cpp
				// Select constant vector splats.
				//
				// Returns true and sets Imm if:
				// * N is an ISD::BUILD_VECTOR representing a constant splat
				// (The "MSA is enabled" check of the Mips original is commented out below.)
				bool ConnexDAGToDAGISel::selectVSplat(SDNode *N, APInt &Imm,
				unsigned MinSizeInBits) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexDAGToDAGISel::selectVSplat()\n");

				/*
				if (!Subtarget->hasMSA())
				return false;
				*/

				BuildVectorSDNode *Node = dyn_cast<BuildVectorSDNode>(N);

				if (!Node)
				return false;

				APInt SplatValue, SplatUndef;
				unsigned SplatBitSize;
				bool HasAnyUndefs;

				if (!Node->isConstantSplat(SplatValue, SplatUndef, SplatBitSize, HasAnyUndefs,
				MinSizeInBits,
				// !Subtarget->isLittle()
				false))
				return false;

				Imm = SplatValue;

				LLVM_DEBUG(dbgs() << "ConnexDAGToDAGISel::selectVSplat(): returning true\n");
				return true;
				}

				// Select constant vector splats.
				//
				// In addition to the requirements of selectVSplat(), this function returns
				// true and sets Imm if:
				// * The splat value is the same width as the elements of the vector
				// * The splat value fits in an integer with the specified signed-ness and
				// width.
				//
				// This function looks through ISD::BITCAST nodes.
				// TODO: This might not be appropriate for big-endian MSA since BITCAST is
				// sometimes a shuffle in big-endian mode.
				//
				// It's worth noting that this function is not used as part of the selection
				// of ldi.[bhwd] since it does not permit using the wrong-typed ldi.[bhwd]
				// instruction to achieve the desired bit pattern. ldi.[bhwd] is selected in
				// MipsSEDAGToDAGISel::selectNode.
				bool ConnexDAGToDAGISel::selectVSplatCommon(SDValue N, SDValue &Imm,
				bool Signed,
				unsigned ImmBitSize) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexDAGToDAGISel::selectVSplatCommon()\n");

				APInt ImmValue;
				EVT EltTy = N->getValueType(0).getVectorElementType();

				if (N->getOpcode() == ISD::BITCAST)
				N = N->getOperand(0);

				if (selectVSplat(N.getNode(), ImmValue, EltTy.getSizeInBits()) &&
				ImmValue.getBitWidth() == EltTy.getSizeInBits()) {

				if ((Signed && ImmValue.isSignedIntN(ImmBitSize)) ||
				(!Signed && ImmValue.isIntN(ImmBitSize))) {
				Imm = CrtDAG->getTargetConstant(ImmValue, SDLoc(N), EltTy);
				return true;
				}
				}

				return false;
				}

				// Select constant vector splats.
				bool ConnexDAGToDAGISel::selectVSplatUimm1(SDValue N, SDValue &Imm) const {
				LLVM_DEBUG(dbgs() << "Entered selectVSplatUimm1()\n");
				return selectVSplatCommon(N, Imm, false, 1);
				}

				bool ConnexDAGToDAGISel::selectVSplatUimm2(SDValue N, SDValue &Imm) const {
				return selectVSplatCommon(N, Imm, false, 2);
				}

				bool ConnexDAGToDAGISel::selectVSplatUimm3(SDValue N, SDValue &Imm) const {
				return selectVSplatCommon(N, Imm, false, 3);
				}

				// Select constant vector splats.
				bool ConnexDAGToDAGISel::selectVSplatUimm4(SDValue N, SDValue &Imm) const {
				return selectVSplatCommon(N, Imm, false, 4);
				}

				// Select constant vector splats.
				bool ConnexDAGToDAGISel::selectVSplatUimm5(SDValue N, SDValue &Imm) const {
				return selectVSplatCommon(N, Imm, false, 5);
				}

				// Select constant vector splats.
				bool ConnexDAGToDAGISel::selectVSplatUimm6(SDValue N, SDValue &Imm) const {
				return selectVSplatCommon(N, Imm, false, 6);
				}

				// Select constant vector splats.
				bool ConnexDAGToDAGISel::selectVSplatUimm8(SDValue N, SDValue &Imm) const {
				return selectVSplatCommon(N, Imm, false, 8);
				}

				// Select constant vector splats.
				bool ConnexDAGToDAGISel::selectVSplatSimm5(SDValue N, SDValue &Imm) const {
				return selectVSplatCommon(N, Imm, true, 5);
				}

				// Select constant vector splats whose value is a power of 2.
				//
				// In addition to the requirements of selectVSplat(), this function returns
				// true and sets Imm if:
				// * The splat value is the same width as the elements of the vector
				// * The splat value is a power of two.
				//
				// This function looks through ISD::BITCAST nodes.
				// TODO: This might not be appropriate for big-endian MSA since BITCAST is
				// sometimes a shuffle in big-endian mode.
				bool ConnexDAGToDAGISel::selectVSplatUimmPow2(SDValue N, SDValue &Imm) const {
				APInt ImmValue;
				EVT EltTy = N->getValueType(0).getVectorElementType();

				if (N->getOpcode() == ISD::BITCAST)
				N = N->getOperand(0);

				if (selectVSplat(N.getNode(), ImmValue, EltTy.getSizeInBits()) &&
				ImmValue.getBitWidth() == EltTy.getSizeInBits()) {
				int32_t Log2 = ImmValue.exactLogBase2();

				if (Log2 != -1) {
				Imm = CrtDAG->getTargetConstant(Log2, SDLoc(N), EltTy);
				return true;
				}
				}

				return false;
				}

				// Select constant vector splats whose value only has a consecutive sequence
				// of left-most bits set (e.g. 0b11...1100...00).
				//
				// In addition to the requirements of selectVSplat(), this function returns
				// true and sets Imm if:
				// * The splat value is the same width as the elements of the vector
				// * The splat value is a consecutive sequence of left-most bits.
				//
				// This function looks through ISD::BITCAST nodes.
				// TODO: This might not be appropriate for big-endian MSA since BITCAST is
				// sometimes a shuffle in big-endian mode.
				bool ConnexDAGToDAGISel::selectVSplatMaskL(SDValue N, SDValue &Imm) const {
				APInt ImmValue;
				EVT EltTy = N->getValueType(0).getVectorElementType();

				if (N->getOpcode() == ISD::BITCAST)
				N = N->getOperand(0);

				if (selectVSplat(N.getNode(), ImmValue, EltTy.getSizeInBits()) &&
				ImmValue.getBitWidth() == EltTy.getSizeInBits()) {
				// Extract the run of set bits starting with bit zero from the bitwise
				// inverse of ImmValue, and test that the inverse of this is the same
				// as the original value.
				if (ImmValue == ~(~ImmValue & ~(~ImmValue + 1))) {

				Imm = CrtDAG->getTargetConstant(ImmValue.countPopulation(), SDLoc(N),
				EltTy);
				return true;
				}
				}

				return false;
				}

				// Select constant vector splats whose value only has a consecutive sequence
				// of right-most bits set (e.g. 0b00...0011...11).
				//
				// In addition to the requirements of selectVSplat(), this function returns
				// true and sets Imm if:
				// * The splat value is the same width as the elements of the vector
				// * The splat value is a consecutive sequence of right-most bits.
				//
				// This function looks through ISD::BITCAST nodes.
				// TODO: This might not be appropriate for big-endian MSA since BITCAST is
				// sometimes a shuffle in big-endian mode.
				bool ConnexDAGToDAGISel::selectVSplatMaskR(SDValue N, SDValue &Imm) const {
				APInt ImmValue;
				EVT EltTy = N->getValueType(0).getVectorElementType();

				if (N->getOpcode() == ISD::BITCAST)
				N = N->getOperand(0);

				if (selectVSplat(N.getNode(), ImmValue, EltTy.getSizeInBits()) &&
				ImmValue.getBitWidth() == EltTy.getSizeInBits()) {
				// Extract the run of set bits starting with bit zero, and test that the
				// result is the same as the original value
				if (ImmValue == (ImmValue & ~(ImmValue + 1))) {
				Imm = CrtDAG->getTargetConstant(ImmValue.countPopulation(), SDLoc(N),
				EltTy);
				return true;
				}
				}

				return false;
				}

				bool ConnexDAGToDAGISel::selectVSplatUimmInvPow2(SDValue N,
				SDValue &Imm) const {
				APInt ImmValue;
				EVT EltTy = N->getValueType(0).getVectorElementType();

				if (N->getOpcode() == ISD::BITCAST)
				N = N->getOperand(0);

				if (selectVSplat(N.getNode(), ImmValue, EltTy.getSizeInBits()) &&
				ImmValue.getBitWidth() == EltTy.getSizeInBits()) {
				int32_t Log2 = (~ImmValue).exactLogBase2();

				if (Log2 != -1) {
				Imm = CrtDAG->getTargetConstant(Log2, SDLoc(N), EltTy);
				return true;
				}
				}

				return false;
				}
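
The splat predicates above (selectVSplatUimmPow2, selectVSplatMaskL, selectVSplatMaskR) rest on a few bit tests that are easier to check in isolation. The following is a minimal sketch of those tests on plain 16-bit integers instead of APInt. All function names here (isRightMask16, isLeftMask16, exactLog2_16, popCount16, maskPredicateChecks) are hypothetical stand-ins for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-bit stand-ins for the APInt tests used by the selectors.

// True when v is a run of set bits anchored at bit 0 (0b00...0011...11).
// Mirrors: ImmValue == (ImmValue & ~(ImmValue + 1))
bool isRightMask16(uint16_t v) {
  uint16_t next = static_cast<uint16_t>(v + 1); // wraps to 0 for 0xFFFF
  return v == static_cast<uint16_t>(v & ~next);
}

// True when v is a run of set bits anchored at the top bit (0b11...1100...00).
// Mirrors: ImmValue == ~(~ImmValue & ~(~ImmValue + 1))
bool isLeftMask16(uint16_t v) {
  uint16_t inv = static_cast<uint16_t>(~v);
  uint16_t next = static_cast<uint16_t>(inv + 1);
  uint16_t run = static_cast<uint16_t>(inv & ~next);
  return v == static_cast<uint16_t>(~run);
}

// Returns log2(v) when v is an exact power of two, otherwise -1.
// This follows the convention the selectors rely on for exactLogBase2().
int exactLog2_16(uint16_t v) {
  if (v == 0 || (v & (v - 1)) != 0)
    return -1;
  int log = 0;
  while ((v >>= 1) != 0)
    ++log;
  return log;
}

// Number of set bits; the mask selectors emit this count as the immediate.
int popCount16(uint16_t v) {
  int n = 0;
  while (v != 0) {
    v = static_cast<uint16_t>(v & (v - 1)); // clear the lowest set bit
    ++n;
  }
  return n;
}

// Small usage check covering one accepted and one rejected value per test.
void maskPredicateChecks() {
  assert(isRightMask16(0x00FF) && !isRightMask16(0x0101));
  assert(isLeftMask16(0xFF00) && !isLeftMask16(0xF0F0));
  assert(exactLog2_16(8) == 3 && exactLog2_16(12) == -1);
  assert(popCount16(0xFF00) == 8);
}
```

Note that both mask tests accept the degenerate values 0 and 0xFFFF, because an empty or full run of bits satisfies the identity; that behaviour follows from the expressions themselves.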

llvm/lib/Target/Connex/ConnexISelLowering.h

This file was added.

				//===-- ConnexISelLowering.h - Connex DAG Lowering Interface ----- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the interfaces that Connex uses to lower LLVM code into a
				/// selection DAG.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXISELLOWERING_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXISELLOWERING_H

				#include "Connex.h"
				#include "ConnexConfig.h"
				#include "llvm/CodeGen/SelectionDAG.h"
				#include "llvm/CodeGen/TargetLowering.h"

				namespace llvm {
				class ConnexSubtarget;

				namespace ConnexISD {
				/*
				From http://llvm.org/docs/doxygen/html/namespacellvm_1_1ISD.html:
				<<Targets may also define target-dependent operator codes for SDNodes.
				For example, on x86, these are the enum values in the X86ISD namespace.
				Targets should aim to use target-independent operators to model their
				instruction sets as much as possible, and only use target-dependent
				operators when they have special requirements.
				Finally, during and after selection proper, SDNodes may use special operator
				codes that correspond directly with MachineInstr opcodes.
				These are used to represent selected instructions.
				See the isMachineOpcode() and getMachineOpcode() member functions of
				SDNode.>>
				*/
				enum NodeType : unsigned {
				FIRST_NUMBER = ISD::BUILTIN_OP_END,
				RET_FLAG,
				CALL,
				SELECT_CC,
				BR_CC,

				/* Inspired from lib/Target/X86/X86ISelLowering.h
				/// A wrapper node for TargetConstantPool,
				/// TargetExternalSymbol, and TargetGlobalAddress.
				*/
				Wrapper,

				// From llvm/lib/Target/Mips/MipsISelLowering.h
				// Extended vector element extraction
				VEXTRACT_SEXT_ELT,
				VEXTRACT_ZEXT_ELT,

				// ConstantPool,

				// Vector Shuffle with mask as an operand
				VSHF, // Generic shuffle
				SHF, // 4-element set shuffle.
				ILVEV, // Interleave even elements
				ILVOD, // Interleave odd elements
				ILVL, // Interleave left elements
				ILVR, // Interleave right elements
				PCKEV, // Pack even elements
				PCKOD, // Pack odd elements
				};
				} // end namespace ConnexISD

				class ConnexTargetLowering : public TargetLowering {
				public:
				explicit ConnexTargetLowering(const TargetMachine &TM,
				const ConnexSubtarget &STI);

				SDValue LowerConstantPool(SDValue Op, SelectionDAG &DAG) const;

				// Inspired from lib/Target/AMDGPU/AMDGPUISelLowering.h
				SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;

				// Provide custom lowering hooks for some operations.
				SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;

				// This method returns the name of a target specific DAG node.
				const char *getTargetNodeName(unsigned Opcode) const override;

				MachineBasicBlock *
				EmitInstrWithCustomInserter(MachineInstr &MI,
				MachineBasicBlock *BB) const override;

				private:
				/*
				// From llvm/lib/Target/Mips/MipsISelLowering.h
				// Create a TargetGlobalAddress node.
				SDValue getTargetNode(GlobalAddressSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Create a TargetExternalSymbol node.
				SDValue getTargetNode(ExternalSymbolSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Create a TargetBlockAddress node.
				SDValue getTargetNode(BlockAddressSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Create a TargetJumpTable node.
				SDValue getTargetNode(JumpTableSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;
				*/
				// Create a TargetConstantPool node.
				SDValue getTargetNode(ConstantPoolSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Added from lib/Target/Mips/MipsSEISelLowering.cpp (method addMSAIntType)
				void addVectorIntType(MVT::SimpleValueType Ty, const TargetRegisterClass *RC);

				// Inspired from lib/Target/Mips/MipsSEISelLowering.cpp, addMSAFloatType()
				void addVectorFloatType(MVT::SimpleValueType Ty,
				const TargetRegisterClass *RC);

				bool allowsMisalignedMemoryAccesses(EVT VT, unsigned, unsigned,
				bool *Fast) const;

				void replaceAddI32UseWithADDVH(MVT &aType, SDValue &Index,
				SelectionDAG &DAG) const;

				SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
				/* static */ SDValue LowerMGATHER(SDValue &Op,
				// const ConnexSubtarget &Subtarget,
				SelectionDAG &DAG) const;
				/* static */ SDValue LowerMSCATTER(SDValue &Op,
				// const ConnexSubtarget &Subtarget,
				SelectionDAG &DAG) const;

				// Lower the result values of a call, copying them out of physregs into vregs
				SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
				CallingConv::ID CallConv, bool IsVarArg,
				const SmallVectorImpl<ISD::InputArg> &Ins,
				const SDLoc &DL, SelectionDAG &DAG,
				SmallVectorImpl<SDValue> &InVals) const;

				// Maximum number of arguments to a call
				static const unsigned MaxArgs;

				// Lower a call into CALLSEQ_START - ConnexISD::CALL - CALLSEQ_END chain
				SDValue LowerCall(TargetLowering::CallLoweringInfo &CLI,
				SmallVectorImpl<SDValue> &InVals) const override;

				// Lower incoming arguments, copy physregs into vregs
				SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
				bool IsVarArg,
				const SmallVectorImpl<ISD::InputArg> &Ins,
				const SDLoc &DL, SelectionDAG &DAG,
				SmallVectorImpl<SDValue> &InVals) const override;

				SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool IsVarArg,
				const SmallVectorImpl<ISD::OutputArg> &Outs,
				const SmallVectorImpl<SDValue> &OutVals, const SDLoc &DL,
				SelectionDAG &DAG) const override;

				EVT getOptimalMemOpType(const MemOp &Op,
				const AttributeList &FuncAttributes) const override {
				#define DEBUG_TYPE "connex-lower"

				LLVM_DEBUG(dbgs() << "Entered getOptimalMemOpType(): Op.size() = "
				<< Op.size() << "\n");

				// return Size >= 8 ? MVT::i64 : MVT::i32;
				// Inspired from lib/Target/BPF/BPFISelLowering.h
				return Op.size() >= 8 ? MVT::i64 : MVT::i32;

				// TODO_CHANGE_BACKEND - Seems it's NOT required:
				// return Size >= 8 ? TYPE_VECTOR_ELEMENT : MVT::i32;

				#undef DEBUG_TYPE
				}

				bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
				Type *Ty) const override {
				return true;
				}

				SDValue LowerVSELECT(SDValue &Op, SelectionDAG &DAG) const;

				// From llvm/lib/Target/Mips/MipsSEISelLowering.h
				SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;

				SDValue LowerADD_I32(SDValue Op, SelectionDAG &DAG) const;

				SDValue LowerADD_F16(SDValue &Op, SelectionDAG *DAG) const;
				SDValue LowerMUL_F16(SDValue &Op, SelectionDAG *DAG) const;
				SDValue LowerREDUCE_F16(SDValue &Op, SelectionDAG *DAG) const;

				SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;

				SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
				//
				EVT getSetCCResultType(const DataLayout &, LLVMContext &,
				EVT VT) const override;
				}; // end class ConnexTargetLowering
				} // end namespace llvm

				#endif
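
The size rule used by getOptimalMemOpType() above (operations of at least 8 bytes use i64, everything else i32) can be sketched outside LLVM as follows. The enum MemOpWidth and the function pickMemOpWidth are hypothetical stand-ins for illustration only; they are not part of the patch or of the LLVM API.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two MVT values the hook can return.
enum class MemOpWidth { I32, I64 };

// Mirrors the rule `Op.size() >= 8 ? MVT::i64 : MVT::i32` from the header.
// sizeInBytes plays the role of MemOp::size().
MemOpWidth pickMemOpWidth(std::uint64_t sizeInBytes) {
  return sizeInBytes >= 8 ? MemOpWidth::I64 : MemOpWidth::I32;
}
```

Under this rule a 4-byte copy would be done with a 32-bit access and a 16-byte copy with 64-bit accesses.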

llvm/lib/Target/Connex/ConnexISelLowering.cpp

This file was added.

This file has a very large number of changes (3,268 lines).

llvm/lib/Target/Connex/ConnexISelMisc.h

This file was added.

				#include "llvm/CodeGen/SelectionDAGNodes.h"

				using namespace llvm;

				// From llvm/lib/Target/Mips/MipsSEISelLowering.cpp
				static bool isSplatVector(const BuildVectorSDNode *N) {
				unsigned int nOps = N->getNumOperands();
				assert(nOps > 1 && "isSplatVector(): N is 0 or 1 sized build vector");

				SDValue Operand0 = N->getOperand(0);

				for (unsigned int i = 1; i < nOps; ++i) {
				if (N->getOperand(i) != Operand0)
				return false;
				}

				return true;
				}
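
The splat test above compares every operand of the build vector with operand 0. A minimal standalone sketch of the same idea, assuming plain std::vector input instead of a BuildVectorSDNode, might look like this. The name isSplat is hypothetical, and unlike the helper above this sketch returns false for inputs with fewer than two elements instead of asserting.

```cpp
#include <cstddef>
#include <vector>

// Returns true when every element equals the first one.
// Inputs with fewer than two elements are treated as non-splats here;
// the helper in ConnexISelMisc.h asserts in that case instead.
template <typename T>
bool isSplat(const std::vector<T> &elems) {
  if (elems.size() < 2)
    return false;
  for (std::size_t i = 1; i < elems.size(); ++i)
    if (elems[i] != elems[0])
      return false;
  return true;
}
```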

llvm/lib/Target/Connex/ConnexInstrFormats.td

This file was added.

				//===-- ConnexInstrFormats.td - Connex Instruction Formats -- tablegen --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				class InstConnex<dag outs, dag ins, string asmstr, list<dag> pattern>
				: Instruction {
				field bits<64> Inst;
				field bits<64> SoftFail = 0;
				let Size = 8;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				bits<3> ConnexClass;
				let Inst{58-56} = ConnexClass;

				dag OutOperandList = outs;
				dag InOperandList = ins;
				let AsmString = asmstr;
				let Pattern = pattern;
				}

				class InstConnex2<dag outs, dag ins, string asmstr, list<dag> pattern>
				: Instruction {
				field bits<64> Inst;
				field bits<64> SoftFail = 0;
				let Size = 8;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				// bits<3> ConnexClass;
				// let Inst{58-56} = ConnexClass;

				dag OutOperandList = outs;
				dag InOperandList = ins;
				let AsmString = asmstr;
				let Pattern = pattern;
				}


				// Pseudo instructions
				class Pseudo<dag outs, dag ins, string asmstr, list<dag> pattern>
				: InstConnex<outs, ins, asmstr, pattern> {
				let Inst{63-0} = 0;
				let isPseudo = 1;
				}


				// Inspired from book "Getting started with LLVM Core Libraries", 2014, page 141

				// Inspired from SparcInstrFormats.td:

				// Instruction with 16 bits immediate operand
				class Connex_IMM16_FMT<bits<6> opcode>: Instruction {
				field bits<32> Inst;

				bits<16> imm;
				bits<5> wl; // Connex-S left
				bits<5> wd; // Connex-S dest

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{31-26} = opcode;
				let Inst{25-10} = imm;
				let Inst{9-5} = wl;
				let Inst{4-0} = wd;
				}

				class Connex_IMM16_FMT2<bits<6> opcode>: Instruction {
				field bits<32> Inst;

				bits<16> imm;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{31-26} = opcode;
				let Inst{25-10} = imm;
				}

				class Connex_IMM16_FMT3<bits<6> opcode>: Instruction {
				field bits<32> Inst;

				bits<16> imm;
				bits<5> wd; // Connex-S dest

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{31-26} = opcode;
				let Inst{25-10} = imm;
				let Inst{4-0} = wd;
				}

				class Connex_IMM16_SYM_FMT<bits<6> opcode>: Instruction {
				field bits<32> Inst;

				// We comment it - otherwise we get error in TableGen
				// <<CodeGenInstruction.h:196: Assertion `i < OperandList.size() &&
				// "Invalid flat operand #"' failed.>>: bits<16> imm;
				bits<5> wd; // dest

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{31-26} = opcode;
				let Inst{4-0} = wd;
				}

				// Non-immediate instruction
				class Connex_NI_FMT<bits<9> opcode>: Instruction {
				field bits<32> Inst;

				bits<8> reserved;
				bits<5> wd;
				bits<5> wl;
				bits<5> wr;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{31-23} = opcode;
				let Inst{22-15} = reserved;
				let Inst{14-10} = wl;
				let Inst{9-5} = wr;
				let Inst{4-0} = wd;
				}

				// Non-immediate instruction for ISHLV(_SPECIAL), ISHRV(_SPECIAL)
				class Connex_NI_FMT_ISHV<bits<9> opcode>: Instruction {
				field bits<32> Inst;

				// wr is Right, wl is Left, wd is Dest - see ConnexISA.pdf
				// bits<8> reserved;
				bits<5> wr;
				bits<5> wl;
				bits<5> wd;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{31-23} = opcode;
				// let Inst{22-15} = reserved;
				let Inst{14-10} = wr;
				let Inst{9-5} = wl;
				let Inst{4-0} = wd;
				}

				class NonImmediateInstruction<bits<9> opcode, dag outs, dag ins,
				string asmstr, list<dag> pattern>
				: Instruction {
				field bits<32> Inst;
				let Inst{31-23} = opcode;

				// The Namespace field must be set; otherwise TableGen reports:
				// "error:No instructions defined!"
				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				dag OutOperandList = outs;
				dag InOperandList = ins;
				let AsmString = asmstr;
				let Pattern = pattern;
				}

				class ImmediateInstruction<bits<6> opcode, dag outs, dag ins, string asmstr,
				list<dag> pattern>
				: Instruction {
				field bits<32> Inst;
				let Inst{31-26} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				dag OutOperandList = outs;
				dag InOperandList = ins;
				let AsmString = asmstr;
				let Pattern = pattern;
				}


				// Inspired from lib/Target/Mips/Mips.td
				// The overall idea of the PredicateControl class is to chop the Predicates list
				// into subsets that are usually overridden independently. This allows
				// subclasses to partially override the predicates of their superclasses without
				// having to re-add all the existing predicates.
				class PredicateControl {
				// Predicates for the encoding scheme in use such as HasStdEnc
				list<Predicate> EncodingPredicates = [];
				// Predicates for the GPR size such as IsGP64bit
				list<Predicate> GPRPredicates = [];
				// Predicates for the FGR size and layout such as IsFP64bit
				list<Predicate> FGRPredicates = [];
				// Predicates for the instruction group membership such as ISA's and ASE's
				list<Predicate> InsnPredicates = [];
				// Predicate for marking the instruction as usable in hard-float mode only.
				list<Predicate> HardFloatPredicate = [];
				// Predicates for anything else
				list<Predicate> AdditionalPredicates = [];
				list<Predicate> Predicates = !listconcat(EncodingPredicates,
				GPRPredicates,
				FGRPredicates,
				InsnPredicates,
				HardFloatPredicate,
				AdditionalPredicates);
				}

				// Inspired from lib/Target/Mips/MipsInstrFormats.td
				// Format specifies the encoding used by the instruction. This is part of the
				// ad-hoc solution used to emit machine instruction encodings by our machine
				// code emitter.
				class Format<bits<4> val> {
				bits<4> Value = val;
				}

				def FrmOther : Format<6>; // Instruction w/ a custom format

				class ConnexMipsInst<dag outs, dag ins, string asmstr, list<dag> pattern,
				InstrItinClass itin, Format f>: Instruction
				{
				field bits<32> Inst;
				Format Form = f;

				let Namespace = "Connex";

				let Size = 4;

				bits<6> Opcode = 0;

				// Top 6 bits are the 'opcode' field
				let Inst{31-26} = Opcode;

				let OutOperandList = outs;
				let InOperandList = ins;

				let AsmString = asmstr;
				let Pattern = pattern;
				let Itinerary = itin;

				//
				// Attributes specific to Mips instructions...
				//
				bits<4> FormBits = Form.Value;

				// TSFlags layout should be kept in sync with MipsInstrInfo.h.
				let TSFlags{3-0} = FormBits;

				let DecoderNamespace = "Connex";

				field bits<32> SoftFail = 0;
				}

				// Inspired from lib/Target/Mips/MipsMSAInstrFormats.td:
				class MSAInst : ConnexMipsInst<(outs), (ins), "", [], NoItinerary, FrmOther>,
				PredicateControl {
				}

				class MSA_1R_FMT<bits<9> opcode>: MSAInst {
				bits<5> wl;

				let Inst{31-23} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{9-5} = wl;
				}

				class MSA_1R_FMT_dest<bits<9> opcode>: MSAInst {
				bits<5> wd;

				let Inst{31-23} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{9-5} = wd;
				}

				class MSA_1R_FMT_dest_imm<bits<6> opcode>: MSAInst {
				bits<5> wd; // Connex-S dest
				bits<16> imm;

				let Inst{31-26} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{25-10} = imm;
				let Inst{4-0} = wd;
				}

				class MSA_1R_FMT_left_imm<bits<6> opcode>: MSAInst {
				bits<5> wl; // Connex-S left
				bits<16> imm; // Connex-S immediate

				let Inst{31-26} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{25-10} = imm;
				let Inst{9-5} = wl;
				}

				class MSA_2R_FMT<bits<9> opcode>: MSAInst {
				bits<5> wl; // Connex-S left
				bits<5> wd; // Connex-S dest

				let Inst{31-23} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{9-5} = wl;
				let Inst{4-0} = wd;
				}

				class MSA_2R_FMT2<bits<9> opcode>: MSAInst {
				bits<5> wr; // Connex-S right
				bits<5> wl; // Connex-S left

				let Inst{31-23} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{14-10} = wr;
				let Inst{9-5} = wl;
				}

				class MSA_RR_FMT<bits<9> opcode>: MSAInst {
				bits<5> wr; // Connex-S right
				bits<5> wl; // Connex-S left

				let Inst{31-23} = opcode;

				let Namespace = "Connex";
				let DecoderNamespace = "Connex";

				let Inst{14-10} = wr;
				let Inst{9-5} = wl;
				}

				class MSA_3R_FMT<bits<9> opcode>: MSAInst {
				bits<5> wr; // Connex-S right
				bits<5> wl; // Connex-S left
				bits<5> wd; // Connex-S dest

				let Inst{31-23} = opcode;

				let Namespace = "Connex";

				let DecoderNamespace = "Connex";

				let Inst{14-10} = wr;
				let Inst{9-5} = wl;
				let Inst{4-0} = wd;
				}

				class MSA_3R_FMT2<bits<9> opcode>: MSAInst {
				bits<5> wr; // Connex-S right
				// bits<5> ws; // Connex-S left
				bits<5> wd; // Connex-S dest

				let Inst{31-23} = opcode;

				let Namespace = "Connex";

				let DecoderNamespace = "Connex";

				let Inst{14-10} = wr;
				// let Inst{9-5} = ws;
				let Inst{4-0} = wd;
				}

				class MSA_LDIX_LDSH_MULT_H_DESC_BASE<string instr_asm,
				RegisterOperand ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins);
				string AsmString = !strconcat(!strconcat("$wd = ", instr_asm),
				" ; // MSA_LDIX_LDSH_MULT");
				// Note: LDI is matched using custom matching code in MipsSEISelDAGToDAG.cpp
				list<dag> Pattern = [];

				// We need to set this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}

				class MSA_RR_PREFIX_DESC_BASE<string instr_asm,
				RegisterOperand ROWS,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins ROWS:$wr, ROWS:$wl);
				string AsmString = !strconcat(instr_asm,
				" ( $wr, $wl ); // MSA_RR generic instruction");
				list<dag> Pattern = [];

				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}

				class MSA_RR_INFIX_DESC_BASE<string instr_asm,
				RegisterOperand ROWS,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins ROWS:$wr, ROWS:$wl);
				string AsmString = !strconcat(
				!strconcat("$wr ", instr_asm),
				" $wl ; // MSA_RR_INFIX instruction");
				list<dag> Pattern = [];

				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}

				///////////////////////////////////////////////////////////////////////////////
				////////////////////BEGIN (i)read/(i)write SPECS///////////////////////////////
				///////////////////////////////////////////////////////////////////////////////

				// SDTMaskedGather, SDTMaskedScatter, masked_gather, masked_scatter
				// are found now in include/llvm/Target/TargetSelectionDAG.td.
				// However, SDTMaskedGather and SDTMaskedScatter now take 4 parameters.
				def SDTMaskedGather2: SDTypeProfile<2, 3, [ // masked gather
				SDTCisVec<0>, SDTCisVec<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<1, 3>,
				SDTCisPtrTy<4>, SDTCVecEltisVT<1, i1>, SDTCisSameNumEltsAs<0, 1>
				]>;

				/*
				// So: 3 input operands, 1 result.
				// Params are: mask, value, index; results are: ptr
				// Params are 0, 1, 2 and result is 3.
				// Operands 0 and 1 have vector type; also with same number of elements.
				// Operands 0 and 2 have identical types.
				// --> Opnd 2 is vector of i16 elements
				// Operand 3 (result 0) has pointer type.
				// Operand 0 is vector type with element type of i1.
				def SDTMaskedScatter: SDTypeProfile<1, 3, [ // masked scatter
				SDTCisVec<0>, SDTCisVec<1>, SDTCisSameAs<0, 2>, SDTCisSameNumEltsAs<0, 1>,
				SDTCVecEltisVT<0, i1>, SDTCisPtrTy<3>
				]>;

				def masked_scatter : SDNode<"ISD::MSCATTER", SDTMaskedScatter,
				[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
				*/

				def SDTMaskedScatter2: SDTypeProfile<1, 3, [ // masked scatter
				SDTCisVec<0>, SDTCisVec<1>, SDTCisSameAs<0, 2>, SDTCisSameNumEltsAs<0, 1>,
				SDTCVecEltisVT<0, i1>, SDTCisPtrTy<3>
				]>;

				def masked_gather2 : SDNode<"ISD::MGATHER", SDTMaskedGather2,
				[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
				def masked_scatter2 : SDNode<"ISD::MSCATTER", SDTMaskedScatter2,
				[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

				// Inspired from MSP430:
				def addr : ComplexPattern<iPTR, 2, "SelectAddr", [], []>;

				def simm10 : Operand<i32>;
				def simm16 : Operand<i32> {
				let DecoderMethod= "DecodeSimm16";
				}

				def MipsMemAsmOperand : AsmOperandClass {
				let Name = "Mem";
				let ParserMethod = "parseMemOperand";
				}

				class mem_generic : Operand<iPTR> {
				let PrintMethod = "printMemOperand";
				let MIOperandInfo = (ops ptr_rc, simm16);
				let EncoderMethod = "getMemEncoding";
				let ParserMatchClass = MipsMemAsmOperand;
				let OperandType = "OPERAND_MEMORY";
				}

				// MSA specific address operand
				def mem_msa : mem_generic {
				let MIOperandInfo = (ops ptr_rc, simm10);
				let EncoderMethod = "getMSAMemEncoding";
				}

				def mem_msa2 : mem_generic {
				let MIOperandInfo = (ops VectorHOpnd);
				let EncoderMethod = "getMSAMemEncoding";
				}

				// Address operands
				def MEMtest : Operand<i16> {
				let PrintMethod = "printMemOperand";
				let MIOperandInfo = (ops i16imm);
				}

				// From [LLVM]/llvm/lib/Target/Mips/MipsMSAInstrInfo.td
				def uimm4_ptr : Operand<iPTR> {
				let PrintMethod = "printUnsignedImm";
				}

				def immAlex : ComplexPattern<iPTR, 1, "SelectAddr", []>;
				def immLeafAlex : ImmLeaf<i64, [{return 1;}]>; // TODO: make sure we can return 1
				def uimm8 : Operand<i64> {
				let PrintMethod = "printUnsignedImm8";
				}

				// This is inspired from Mips MSA LD_DESC_BASE and got changed to have
				// immediate address operand.
				// Note that MSA_I10_LDI_DESC_BASE loads in a vector register an
				// immediate vector value.
				class LD_DESC_BASE<
				SDPatternOperator OpNode,
				ValueType TyNode, RegisterOperand ROWD,
				Operand MemOpnd = uimm4_ptr,
				ImmLeaf Addr = immLeafAlex,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins MemOpnd:$imm);
				string AsmString = "$wd = LS[$imm]; // IREAD (or Mips MSA's LD)";

				list<dag> Pattern = [(set ROWD:$wd, (TyNode (OpNode Addr:$imm)))];
				InstrItinClass Itinerary = itin;
				string DecoderMethod = "DecodeMSA128Mem";
				}

				class ST_DESC_BASE<
				SDPatternOperator OpNode,
				ValueType TyNode, RegisterOperand ROWS,
				Operand MemOpnd = uimm4_ptr, ImmLeaf Addr = immLeafAlex,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins ROWS:$wl, MemOpnd:$imm);
				string AsmString = "LS[$imm] = $wl ; // IWRITE (or Mips MSA's ST)";
				list<dag> Pattern = [(OpNode (TyNode ROWS:$wl), Addr:$imm)];
				InstrItinClass Itinerary = itin;
				string DecoderMethod = "DecodeMSA128Mem";
				}

				/*
				In a good sense LD_INDIRECT_DESC_BASE is similar to
				the gather of X86 AVX - remember I can implement gather with Connex's
				Read, by loading (the rather small array in each column of LS
				with LS[i] = Rsrc = array[i]).

				LD_INDIRECT_DESC_BASE is similar to:
				- VLOAD
				The Mips equivalent is LD_DESC_BASE which uses stack -
				def addrimm10 : ComplexPattern<iPTR, 2, "selectIntAddrMSA", [frameindex]>;
				The instructions selected look like:
				ld.w $w0, 32($fp)

				I need to instr-select:
				Rdst = LS[Rsrc]

				list<dag> Pattern = [(set ROWD:$wd, (TyNode (OpNode Addr:$addrsrc)))];

				// From http://llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html:
				// In the both nodes address is Op1, mask is Op2:
				// MaskedGatherSDNode (Chain, src0, mask, base, index),
				// src0 is a passthru value
				// MaskedScatterSDNode (Chain, value, mask, base, index)
				// Mask is a vector of i1 elements
				const SDValue &getBasePtr() const { return getOperand(3); }
				const SDValue &getIndex() const { return getOperand(4); }
				const SDValue &getMask() const { return getOperand(2); }
				const SDValue &getValue() const { return getOperand(1); }
				// This is pass-thru
				*/

				/*
				// From include/llvm/Target/TargetSelectionDAG.td

				// SDTypeProfile - This profile describes the type requirements of a Selection
				// DAG node.
				class SDTypeProfile<int numresults, int numoperands,
				list<SDTypeConstraint> constraints> {
				int NumResults = numresults;
				int NumOperands = numoperands;
				list<SDTypeConstraint> Constraints = constraints;
				}

				// So: 3 input operands, 2 results.
				// Params are: passthru, mask, index; results are: vector of i1, the
				// vector with the values loaded by the gather instruction (ptr)
				// Params are 0, 1, 2 and results are 3, 4.
				// Operands 0 and 1 have vector type; also with same number of elements.
				// Operands 0 and 2 have identical types.
				// Operands 1 and 3 have identical types.
				// --> Opnd 3 (result 0?) is i1 vector
				// Operand 4 (result 1?) has pointer type.
				// Operand 1 is vector type with element type of i1.
				// Note that ConnexTargetLowering::LowerMGATHER() treats a ~different
				// masked_gather, which is machine-independent, NOT like this one, with
				// different parameters.
				def SDTMaskedGather: SDTypeProfile<2, 3, [ // masked gather
				SDTCisVec<0>, SDTCisVec<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<1, 3>,
				SDTCisPtrTy<4>, SDTCVecEltisVT<1, i1>, SDTCisSameNumEltsAs<0, 1>
				]>;

				def masked_gather : SDNode<"ISD::MGATHER", SDTMaskedGather,
				[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
				*/

				// Note (defined in Target.td): def ptr_rc : PointerLikeRegClass<0>;
				//
				/* IMPORTANT: because of the SDNPMemOperand attribute of masked_gather it seems
				we need to make the index operand a memory operand.
				It also seems we need to make it a scalar operand by using iPTR and use a C++
				method that returns a vector type.
				If we try to use VectorHOpnd instead of vectoraddr in the Pattern, we get
				errors.
				*/
				// Gather mem operands
				def ScatterGatherMemOperand : Operand<iPTR> {
				let PrintMethod = "printScatterGatherMemOperand";
				let MIOperandInfo = (ops VectorH);
				}

				/* 1 means selectVectorAddr takes 1 extra argument, in this case reference
				int Index which we set with N->getIndex(). Otherwise, the 3rd(? maybe 2nd now)
				parameter of masked_gather would receive the base pointer IIRC. */
				def vectoraddr : ComplexPattern<iPTR, 1, "selectVectorAddr", [],
				[SDNPWantParent]>;

				/*
				Note: From
				llvm.org/docs/LangRef.html#masked-vector-gather-and-scatter-intrinsics
				‘llvm.masked.gather.*‘ Intrinsics
				<<Overview:
				Reads scalar values from arbitrary memory locations and gathers them into
				one vector.
				The memory locations are provided in the vector of pointers ‘ptrs‘.
				The memory is accessed according to the provided mask.
				The mask holds a bit for each vector lane, and is used to prevent memory
				accesses to the masked-off lanes.
				The masked-off lanes in the result vector are taken from the
				corresponding lanes of the ‘passthru‘ operand.>>
				*/

				// Inspired from [REPO]/llvm/lib/Target/X86/X86InstrAVX512.td
				class LD_INDIRECT_MASKED_DESC_BASE<RegisterOperand ROWD,
				// AVOIDING_USE_OF_PASSTHRU_REGISTER:
				RegisterOperand ROWSP = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd, BoolMaskOpnd:$dstmask);
				dag InOperandList = (ins
				// AVOIDING_USE_OF_PASSTHRU_REGISTER:
				ROWSP:$wsp, // passthru register
				BoolMaskOpnd:$srcmask, // mask register
				ScatterGatherMemOperand:$wr // index register
				);
				string AsmString = "$wd = LS[$wr]; // READ (gather)";
				list<dag> Pattern = [(set ROWD:$wd, BoolMaskOpnd:$dstmask,
				(masked_gather2
				// AVOIDING_USE_OF_PASSTHRU_REGISTER:
				ROWSP:$wsp,
				BoolMaskOpnd:$srcmask,
				vectoraddr:$wr
				)
				)];
				InstrItinClass Itinerary = itin;
				string DecoderMethod = "DecodeMSA128Mem";
				}

				// NEW32-TODO
				class LD_INDIRECT_DESC_BASE<RegisterOperand ROWD,
				RegisterOperand ROWSI = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWSI:$wr); // index register
				string AsmString = "$wd = LS[$wr]; // READ 32bits index (gather)";
				list<dag> Pattern = [];

				// We need to set this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				string DecoderMethod = "DecodeMSA128Mem";
				}

				// NEW32-TODO
				class ST_INDIRECT_DESC_BASE<
				RegisterOperand ROWSV,
				RegisterOperand ROWSI = ROWSV,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins
				ROWSI:$wr, // index register
				ROWSV:$wl
				);
				string AsmString = "LS[$wr] = $wl ; // WRITE 32bits index (scatter)";
				list<dag> Pattern = [];

				// We need to put this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				string DecoderMethod = "DecodeMSA128Mem";
				}

				/*
				// REPEATED: So: 3 input operands, 1 result.
				// Params are: mask, value, index; results are: ptr
				// Params are 0, 1, 2 and result is 3.
				// Operands 0 and 1 have vector type; also with same number of elements.
				// Operands 0 and 2 have identical types.
				// --> Opnd 2 is vector of i16 elements
				// Operand 3 (result 0) has pointer type.
				// Operand 0 is vector type with element type of i1.
				def SDTMaskedScatter: SDTypeProfile<1, 3, [ // masked scatter
				SDTCisVec<0>, SDTCisVec<1>, SDTCisSameAs<0, 2>, SDTCisSameNumEltsAs<0, 1>,
				SDTCVecEltisVT<0, i1>, SDTCisPtrTy<3>
				]>;

				def masked_scatter : SDNode<"ISD::MSCATTER", SDTMaskedScatter,
				[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
				*/

				class ST_INDIRECT_MASKED_DESC_BASE<
				RegisterOperand ROWV,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs BoolMaskOpnd:$dstmask);
				dag InOperandList = (ins ROWV:$wl, // value register
				BoolMaskOpnd:$wd, // mask register
				ScatterGatherMemOperand:$wr // index register
				);
				string AsmString = "LS[$wr] = $wl ; // WRITE (scatter)";
				list<dag> Pattern = [(set BoolMaskOpnd:$dstmask,
				(masked_scatter2
				// See for pattern, multiclass avx512_scatter,
				// def mr for pattern:
				ROWV:$wl, BoolMaskOpnd:$wd, vectoraddr:$wr)
				)];
				InstrItinClass Itinerary = itin;
				string DecoderMethod = "DecodeMSA128Mem";
				}
				///////////////////////////////////////////////////////////////////////////////
				/////////////////////END (i)read/(i)write SPECS////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////
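
As an illustration of the 32-bit layouts declared earlier in this file, the following sketch packs the fields of Connex_NI_FMT (opcode in bits 31-23, reserved in 22-15, wl in 14-10, wr in 9-5, wd in 4-0) and Connex_IMM16_FMT (opcode in 31-26, imm in 25-10, wl in 9-5, wd in 4-0) into a machine word. The function names encodeNI, encodeIMM16 and fieldOf are hypothetical helpers, not part of the patch, and any opcode values passed to them are arbitrary rather than real Connex opcodes.

```cpp
#include <cstdint>

// Connex_NI_FMT: opcode 31-23, reserved 22-15, wl 14-10, wr 9-5, wd 4-0.
// Each argument is masked to its field width before being shifted in place.
constexpr std::uint32_t encodeNI(std::uint32_t opcode, std::uint32_t wl,
                                 std::uint32_t wr, std::uint32_t wd,
                                 std::uint32_t reserved = 0) {
  return ((opcode & 0x1FFu) << 23) | ((reserved & 0xFFu) << 15) |
         ((wl & 0x1Fu) << 10) | ((wr & 0x1Fu) << 5) | (wd & 0x1Fu);
}

// Connex_IMM16_FMT: opcode 31-26, imm 25-10, wl 9-5, wd 4-0.
constexpr std::uint32_t encodeIMM16(std::uint32_t opcode, std::uint32_t imm,
                                    std::uint32_t wl, std::uint32_t wd) {
  return ((opcode & 0x3Fu) << 26) | ((imm & 0xFFFFu) << 10) |
         ((wl & 0x1Fu) << 5) | (wd & 0x1Fu);
}

// Extracts bits hi..lo (inclusive) of an instruction word.
// Only valid for fields narrower than 32 bits.
constexpr std::uint32_t fieldOf(std::uint32_t inst, unsigned hi, unsigned lo) {
  return (inst >> lo) & ((1u << (hi - lo + 1)) - 1u);
}
```

For example, fieldOf(encodeNI(op, wl, wr, wd), 14, 10) gives back wl. Note that Connex_NI_FMT_ISHV swaps the positions of wl and wr relative to this layout, so it would need its own encoder.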

llvm/lib/Target/Connex/ConnexInstrInfo.h

This file was added.

				//===-- ConnexInstrInfo.h - Connex Instruction Information ------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetInstrInfo class.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXINSTRINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXINSTRINFO_H

				#include "Connex.h"
				#include "ConnexRegisterInfo.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"

				#define GET_INSTRINFO_HEADER
				#include "ConnexGenInstrInfo.inc"

				namespace llvm {

				class ConnexInstrInfo : public ConnexGenInstrInfo {
				const ConnexRegisterInfo RI;

				public:
				ConnexInstrInfo();

				const ConnexRegisterInfo &getRegisterInfo() const { return RI; }

				// Got a bit inspired from lib/Target/AMDGPU/SIInstrInfo.cpp
				bool expandPostRAPseudo(MachineInstr &MI) const override;

				// Note: we do not use a pre-RA hazard recognizer since it works on the
				// MachineInstrs immediately after the 1st scheduling pass, which is before
				// RA, TwoAddressInstructionPass, etc. - so a lot of other instructions
				// will be added after the 1st scheduling pass.
				// We would like our post-RA Hazard recognizer to be able to reschedule
				// instructions in a different order (with the ScoreBoardHazardRecognizer)
				// in order to avoid inserting useless NOPs.

				// USE_POSTRA_SCHED
				// Got inspired from llvm/lib/Target/PowerPC/PPCInstrInfo.h
				ScheduleHazardRecognizer *
				CreateTargetPostRAHazardRecognizer(const InstrItineraryData *II,
				const ScheduleDAG *DAG) const override;

				ScheduleHazardRecognizer *
				CreateTargetMIHazardRecognizer(const InstrItineraryData *II,
				// 2021_02_09: const ScheduleDAG *DAG
				const ScheduleDAGMI *DAG // 2021_02_09
				) const override;

				void insertNoop(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI) const override;

				void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
				const DebugLoc &DL, MCRegister DestReg, MCRegister SrcReg,
				bool KillSrc) const override;

				void storeRegToStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MBBI, Register SrcReg,
				bool isKill, int FrameIndex,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI,
				Register VReg) const override;

				void loadRegFromStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MBBI, Register DestReg,
				int FrameIndex, const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI,
				Register VReg) const override;
				bool analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
				MachineBasicBlock *&FBB,
				SmallVectorImpl<MachineOperand> &Cond,
				bool AllowModify) const override;

				unsigned removeBranch(MachineBasicBlock &MBB,
				int *BytesRemoved = nullptr) const override;

				unsigned insertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
				MachineBasicBlock *FBB, ArrayRef<MachineOperand> Cond,
				const DebugLoc &DL,
				int *BytesAdded = nullptr) const override;

				bool isPredicable(MachineInstr &MI) const;

				protected:
				MachineMemOperand *GetMemOperand(MachineBasicBlock &MBB, int FI,
				MachineMemOperand::Flags Flag) const;
				}; // end class ConnexInstrInfo
				} // end namespace llvm

				#endif
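
The comment in this header explains that the post-RA hazard recognizer should be able to reorder instructions instead of padding with NOPs. The following toy model, which is only a sketch and not the Connex scoreboard, counts the NOPs an in-order machine would need for a given instruction order. The Instr struct, the countNops function and the latency parameter are all assumptions made for illustration; real Connex latencies are not taken from the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instruction model: one destination register and up to two
// source registers; -1 means "unused".
struct Instr {
  int def;
  int use0;
  int use1;
};

// Counts the NOPs needed when a result only becomes readable `latency`
// issue slots after its producer was issued (latency >= 1, assumed value).
std::size_t countNops(const std::vector<Instr> &prog, unsigned latency) {
  std::vector<long> issueSlot; // slot in which each instruction issues
  std::size_t nops = 0;
  long slot = 0; // next free issue slot
  for (std::size_t i = 0; i < prog.size(); ++i) {
    long earliest = slot;
    // Delay instruction i until all of its producers have completed.
    for (std::size_t j = 0; j < i; ++j) {
      bool dep = prog[j].def >= 0 &&
                 (prog[j].def == prog[i].use0 || prog[j].def == prog[i].use1);
      long ready = issueSlot[j] + static_cast<long>(latency);
      if (dep && ready > earliest)
        earliest = ready;
    }
    nops += static_cast<std::size_t>(earliest - slot);
    issueSlot.push_back(earliest);
    slot = earliest + 1;
  }
  return nops;
}
```

With an assumed latency of 3, a producer immediately followed by its consumer costs two NOPs in this model, while placing two independent instructions between them costs none. That difference is the motivation for rescheduling rather than inserting NOPs.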

llvm/lib/Target/Connex/ConnexInstrInfo.cpp

This file was added.

				//===-- ConnexInstrInfo.cpp - Connex Instruction Information ----- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetInstrInfo class.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexInstrInfo.h"
				#include "Connex.h"
				#include "ConnexHazardRecognizer.h" // USE_POSTRA_SCHED
				#include "ConnexSubtarget.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/ADT/STLExtras.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/MC/TargetRegistry.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/ErrorHandling.h"

				#define DEBUG_TYPE "connex-lower"

				#define GET_INSTRINFO_CTOR_DTOR
				#include "ConnexGenInstrInfo.inc"

				using namespace llvm;

				MachineInstr *getPredMachineInstr(MachineInstr *MI, MachineInstr **succMI) {
				MachineBasicBlock *MBB = MI->getParent();
				DebugLoc DL = MBB->findDebugLoc(MI);

				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): MI.getOpcode() = "
				<< MI->getOpcode() << "\n");

				// switch (MI.getOpcode())

				MachineInstr *predMI = NULL;
				*succMI = NULL;

				for (MachineBasicBlock::iterator I = MBB->begin(), IE = MBB->end(); I != IE;
				++I) {
				MachineInstr *IMI = (MachineInstr *)(&(*I));
				if (IMI == MI) {
				I++;
				// Do not dereference end() when MI is the last instruction of MBB
				if (I != IE)
				*succMI = (MachineInstr *)(&(*I));
				break;
				}
				predMI = (MachineInstr *)(&(*I));
				LLVM_DEBUG(
				dbgs() << "getPredMachineInstr(): (I in MBB of MI) I->getOpcode() = "
				<< I->getOpcode() << "\n");
				}

				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): MI = " << MI << "(" << MI << ")"
				<< "\n");
				if ((*succMI) != NULL && (*succMI) != nullptr) {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): succMI = "
				// We do not put this one because we can have issues with
				// NULL/invalid MachineInstr (at least in case of
				// llc -regalloc=fast) << **succMI
				<< "[TO BE DONE]"
				<< "(" << *succMI << ")"
				<< "\n");
				} else {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): succMI = NULL\n");
				}

				if (predMI != NULL) {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): predMI = " << predMI << "("
				<< predMI << ")"
				<< "\n");
				} else {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): predMI = NULL\n");
				}

				return predMI;
				}

				ConnexInstrInfo::ConnexInstrInfo()
				: ConnexGenInstrInfo(Connex::ADJCALLSTACKDOWN, Connex::ADJCALLSTACKUP) {}

				// Inspired from lib/Target/Mips/MipsInstrInfo.cpp
				MachineMemOperand *
				ConnexInstrInfo::GetMemOperand(MachineBasicBlock &MBB, int FI,
				MachineMemOperand::Flags Flag) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::GetMemOperand()\n");

				MachineFunction &MF = *MBB.getParent();
				MachineFrameInfo &MFI = MF.getFrameInfo();

				return MF.getMachineMemOperand(MachinePointerInfo::getFixedStack(MF, FI),
				// Flag, MFI.getObjectSize(FI), Align
				Flag, MFI.getObjectSize(FI),
				Align(MFI.getObjectAlign(FI)));
				}

				/*
				From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html:
				virtual void copyPhysReg(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI,
				DebugLoc DL, unsigned DestReg, unsigned SrcReg,
				bool KillSrc) const
				Emit instructions to copy a pair of physical registers.
				virtual void storeRegToStackSlot (MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI,
				unsigned SrcReg, bool isKill,
				int FrameIndex,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI) const
				Store the specified register of the given register class to the specified
				stack frame index.
				virtual void loadRegFromStackSlot (MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI,
				unsigned DestReg, int FrameIndex,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI) const
				Load the specified register of the given register class from the specified
				stack frame index.
				*/
				void ConnexInstrInfo::copyPhysReg(
				MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL,
				MCRegister DestReg, MCRegister SrcReg,
				bool KillSrc) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::copyPhysReg(I = " << I
				<< ", DestReg = " << DestReg << ", SrcReg = " << SrcReg
				<< ")\n");

				if (Connex::GPRRegClass.contains(DestReg, SrcReg)) {
				BuildMI(MBB, I, DL, get(Connex::MOV_rr), DestReg)
				.addReg(SrcReg, getKillRegState(KillSrc));
				} else if (Connex::VectorHRegClass.contains(DestReg, SrcReg)) {
				// llvm_unreachable("NOT implemented well!");

				/*
				// TODO
				if (SrcReg == ct) {
				BuildMI(MBB, I, DL, get(Connex::VLOAD_H), DestReg)
				.addImm(ct) //, getKillRegState(KillSrc))
				.addReg(SrcReg);
				}
				*/

				BuildMI(MBB, I, DL, get(Connex::ORV_H), DestReg)
				.addReg(SrcReg) //, getKillRegState(KillSrc))
				.addReg(SrcReg);
				} else
				// if (Connex::BoolMaskRegClass.contains(DestReg, SrcReg))
				if (Connex::BoolMaskRegClass.contains(DestReg) ||
				Connex::BoolMaskRegClass.contains(SrcReg)) {
				LLVM_DEBUG(dbgs() << "ConnexInstrInfo::copyPhysReg(): DestReg or SrcReg "
				"are in BoolMask\n");
				/*
				// Important-TODO: what if register Wh31, also called R(31), is already in
				// use for some other var?
				BuildMI(MBB, I, DL, get(Connex::VLOAD_H), Connex::Wh31)
				.addImm(0);

				BuildMI(MBB, I, DL, get(Connex::ORV_H), DestReg)
				.addReg(SrcReg) //, getKillRegState(KillSrc))
				.addReg(Connex::Wh31, getKillRegState(KillSrc));
				*/
				}
				/*
				// PREFERABLY_NOT_2019_03_21
				else
				if ( (Connex::MSA128WRegClass.contains(DestReg) &&
				Connex::VectorHRegClass.contains(SrcReg)) ||
				//
				(Connex::MSA128WRegClass.contains(SrcReg) &&
				Connex::VectorHRegClass.contains(DestReg)) ) {

				if (Connex::MSA128WRegClass.contains(DestReg)) {
				LLVM_DEBUG(dbgs()
				<< "ConnexInstrInfo::copyPhysReg(): DestReg is TYPE_VECTOR_I32 and "
				"SrcReg is TYPE_VECTOR_I16\n");
				}
				else
				if (Connex::MSA128WRegClass.contains(SrcReg)) {
				LLVM_DEBUG(dbgs()
				<< "ConnexInstrInfo::copyPhysReg(): DestReg is TYPE_VECTOR_I16 and "
				"SrcReg is TYPE_VECTOR_I32\n");
				}

				// BuildMI(MBB, I, DL, get(Connex::INLINEASM));
				// This makes llc give error:
				// <<llvm/include/llvm/CodeGen/MachineInstr.h:293:
				//const llvm::MachineOperand& llvm::MachineInstr::getOperand(unsigned int)
				// const:
				// Assertion `i < getNumOperands() && "getOperand() out of range!"'
				// failed.>>
				// This works surprisingly:
				// BuildMI(MBB, I, DL, get(Connex::NOP_BITCONVERT_HW));

				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				//BuildMI(MBB, I, DL, get(Connex::NOP_BOGUS));
				BuildMI(MBB, I, DL, get(Connex::ORV_H), DestReg)
				.addReg(SrcReg) //, getKillRegState(KillSrc))
				.addReg(SrcReg);
				#endif
				}
				*/
				else {
				llvm_unreachable("Impossible reg-to-reg copy");
				}
				}

				// storeRegToStackSlot() and loadRegFromStackSlot() use
				// the FI argument (frame index, the index within the current frame)
				//
				// This implements spilling of registers (both scalar, and vector).
				void ConnexInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I,
				Register SrcReg, bool IsKill, int FI,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI,
				Register VReg) const {
				DebugLoc DL;

				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::storeRegToStackSlot(): FI = "
				<< FI << "\n");
				// assert(FI >= 2 && "I assumed wrong that frame index >= 2"); // 2019_08_03

				/* MEGA-TODO: FI is a single variable, but we basically have 2 stack frames:
				- 1 for the scalar CPU;
				- normally 1 for the Connex vector processor, whose LS memory is a separate
				address space. Connex does NOT allow calls inside vector kernels, BUT the
				CPU does, although a good use case is not simple to find.

				Think of a case where this somewhat flawed solution is NOT good for
				programs (remember we output OPINCAA programs and NO CPU assembly code,
				and Connex does NOT allow calls inside vector kernels).

				Also, understand well why FI >= 2 always holds
				- it seems there is some prologue.
				*/
				// unsigned ConnexLSOffsetSpillLoad = (CONNEX_MEM_NUM_ROWS + 1) - FI;
				unsigned ConnexLSOffsetSpillLoad =
				(CONNEX_MEM_NUM_ROWS + CONNEX_MEM_NUM_ROWS_EXTRA_FOR_SPILL + 1) - FI;

				if (I != MBB.end())
				DL = I->getDebugLoc();

				if (RC == &Connex::GPRRegClass) {
				BuildMI(MBB, I, DL, get(Connex::STD))
				.addReg(SrcReg, getKillRegState(IsKill))
				.addFrameIndex(FI)
				.addImm(0);
				} else if (RC == &Connex::VectorHRegClass) {
				LLVM_DEBUG(dbgs() << " ConnexInstrInfo::storeRegToStackSlot(): Spilling Wh"
				<< SrcReg << " to ConnexLSOffsetSpillLoad = "
				<< ConnexLSOffsetSpillLoad << " (FI = " << FI << "), "
				<< "I == MBB.end() is " << (I == MBB.end())
				<< ", MBB = " << MBB.getFullName()
				<< ", &MBB.front() = " << &(MBB.front()) << "\n"
				<< "MBB = " << MBB
				//<< ", MBB.front() = " << MBB.front()
				);

				/* Very Important: after experimenting (see
				~/LLVM/Tests/DawnCC/91_SAD_f16/FEATURE_LENGTH_128/A/STDerr_llc_01)
				if we have INLINEASM at the beginning of the MBB, the MBB.front() is
				the 1st instruction AFTER these INLINEASM - this is why we can end up
				adding more NOPs...

				Important-TODO: we should take into consideration that vector.body has
				INLINEASM with host-side for loop here normally.
				*/

				// Note: this method is spilling the destination register of the
				// instruction *(I-1)
				/*
				// I got a strange error in LLVM when printing in certain cases *I
				// - see e.g. ~/LLVM/Tests/DawnCC/90_SSD_f16/3/STDerr_llc_01_old03
				LLVM_DEBUG(dbgs() << " ConnexInstrInfo::storeRegToStackSlot(): *I = "
				<< *I);
				*/

				MachineBasicBlock::iterator Iprev; // = I;

				MachineInstr *IMI;
				if (I == MBB.end())
				IMI = NULL;
				else
				IMI = (MachineInstr *)(&(*I));

				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IMI = " << IMI << "\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IMI == &MBB.front() = "
				<< (IMI == (&MBB.front())) << "\n");

				if ((I != MBB.end()) && (IMI != NULL) && (IMI != (&MBB.front()))) {
				Iprev = I;
				Iprev--;
				MachineInstr *IprevMI = (MachineInstr *)(&(*Iprev));

				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IprevMI = " << IprevMI
				<< "\n");
				LLVM_DEBUG(
				dbgs() << " storeRegToStackSlot(): IprevMI->getNumOperands() = "
				<< IprevMI->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IprevMI->getOpcode() == "
				"Connex::INLINEASM = "
				<< (IprevMI->getOpcode() == Connex::INLINEASM) << "\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IprevMI->getOpcode() == "
				"Connex::VLOAD_H_SYM_IMM = "
				<< (IprevMI->getOpcode() == Connex::VLOAD_H_SYM_IMM)
				<< "\n");
				// The case that goes wrong is LS[1013] = ...
				// because the instruction before it is MBB.front() and is an INLINEASM.

				if (IprevMI != NULL &&
				// NOT necessary: (IprevMI != (&MBB.front())) &&
				// (IMI != (&MBB.front())) &&
				(IprevMI->getNumOperands() >
				0 || // MEGA-TODO: understand why I give this
				IprevMI->getOpcode() == Connex::INLINEASM ||
				IprevMI->getOpcode() == Connex::VLOAD_H_SYM_IMM)) {

				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): Handling special case: "
				"if (IprevMI != NULL && ...).\n");

				MachineOperand &I0Opnd = IprevMI->getOperand(0);

				// Avoiding separating VLOAD_H_SYM_IMM from its corresponding INLINEASM
				if (IprevMI->getOpcode() == Connex::VLOAD_H_SYM_IMM) {
				// Treating Symbolic immediate operands
				// MEGA-TODO: check
				// assert(0 && "Bogus");
				assert(IprevMI->getNumOperands() > 0); // Just checking
				assert(IMI->getOpcode() == Connex::INLINEASM &&
				"The INLINEASM with the immediate operand should be next "
				"for VLOAD_H_SYM_IMM.");

				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): Treating "
				"VLOAD_H_SYM_IMM case.\n");
				I++;
				// Iprev++;
				}
				}
				}

				BuildMI(MBB, I, DL, get(Connex::ST_SPILL_H))
				.addReg(SrcReg, getKillRegState(IsKill))
				/*
				// Gives error I guess because it is a vector instruction, not eBPF one:
				// void llvm::MachineInstr::addOperand(llvm::MachineFunction&,
				// const llvm::MachineOperand&): Assertion `(isImpReg || Op.isRegMask() ||
				// MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) &&
				// "Trying to add an operand to a machine instr that is already done!"'
				// failed.
				.addFrameIndex(FI)
				// Even if Connex does NOT have a stack, we can use LS mem to easily
				// simulate it.
				*/
				.addImm(ConnexLSOffsetSpillLoad);

				LLVM_DEBUG(
				dbgs() << " storeRegToStackSlot(): Added ST_SPILL_H instruction.\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): MBB = " << MBB << "\n");
				} else if (RC == &Connex::BoolMaskRegClass) {
				/*
				BuildMI(MBB, I, DL, get(Connex::ST_H))
				.addReg(SrcReg, getKillRegState(IsKill))
				.addImm(CONNEX_MEM_NUM_ROWS - 100);
				// TODO: this is just bogus I guess, no need to spill v8i1 register
				*/
				} else {
				llvm_unreachable("Connex back end: Can't store register to stack slot");
				}
				}

				// This implements filling/reloading - i.e., load for spilled registers
				// (both scalar, and vector).
				void ConnexInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I,
				Register DestReg, int FI,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI,
				Register VReg) const {
				DebugLoc DL;

				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::loadRegFromStackSlot(): FI = "
				<< FI << "\n");

				// assert(FI >= 2 && "I assumed wrong that frame index >= 2");

				// unsigned ConnexLSOffsetFillLoad = (CONNEX_MEM_NUM_ROWS + 1) - FI;
				unsigned ConnexLSOffsetFillLoad =
				(CONNEX_MEM_NUM_ROWS + CONNEX_MEM_NUM_ROWS_EXTRA_FOR_SPILL + 1) - FI;

				if (I != MBB.end())
				DL = I->getDebugLoc();

				if (RC == &Connex::GPRRegClass) {
				BuildMI(MBB, I, DL, get(Connex::LDD), DestReg).addFrameIndex(FI).addImm(0);
				} else if (RC == &Connex::VectorHRegClass) {
				/*
				// This actually generates a malformed scalar instruction with
				// vector register
				BuildMI(MBB, I, DL, get(Connex::LDD), DestReg)
				.addFrameIndex(FI)
				.addImm(0);
				*/
				/*
				// It is NOT correct since LLVM assumes it uses a stack and the
				// operations are sort of PUSH/POP. Even if Connex does NOT have
				// a stack, we can use LS to easily simulate it.
				BuildMI(MBB, I, DL, get(Connex::LD_H), DestReg)
				.addImm(CONNEX_MEM_NUM_ROWS - 1 - DestReg);
				*/

				LLVM_DEBUG(dbgs() << " ConnexInstrInfo::loadRegFromStackSlot(): Filling Wh"
				<< DestReg << " from ConnexLSOffsetFillLoad = "
				<< ConnexLSOffsetFillLoad << " (FI = " << FI << ")\n");

				/*
				Important: Adding the NOP is NOT required, since the iread Connex
				instruction does NOT require the insertion of a delay slot between
				it and the instruction that uses the register read from the LS memory.
				*/
				BuildMI(MBB, I, DL, get(Connex::LD_FILL_H), DestReg)
				.addImm(ConnexLSOffsetFillLoad);
				/* TODO: get num vector registers from ConnexRegisterInfo.td:
				def VectorH: RegisterClass<"Connex", [v128i16], 32, */
				} else {
				llvm_unreachable("Connex back end: Can't load register from stack slot");
				}
				}

				bool ConnexInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
				MachineBasicBlock *&TBB,
				MachineBasicBlock *&FBB,
				SmallVectorImpl<MachineOperand> &Cond,
				bool AllowModify) const {
				// Start from the bottom of the block and work up, examining the
				// terminator instructions.
				MachineBasicBlock::iterator I = MBB.end();
				while (I != MBB.begin()) {
				--I;
				if (I->isDebugValue())
				continue;

				// Working from the bottom, when we see a non-terminator
				// instruction, we're done.
				if (!isUnpredicatedTerminator(*I))
				break;

				// A terminator that isn't a branch can't easily be handled
				// by this analysis.
				if (!I->isBranch())
				return true;

				// Handle unconditional branches.
				if (I->getOpcode() == Connex::JMP) {
				if (!AllowModify) {
				TBB = I->getOperand(0).getMBB();
				continue;
				}

				// If the block has any instructions after a J, delete them.
				while (std::next(I) != MBB.end())
				std::next(I)->eraseFromParent();
				Cond.clear();
				FBB = nullptr;

				// Delete the J if it's equivalent to a fall-through.
				if (MBB.isLayoutSuccessor(I->getOperand(0).getMBB())) {
				TBB = nullptr;
				I->eraseFromParent();
				I = MBB.end();
				continue;
				}

				// TBB is used to indicate the unconditional destination.
				TBB = I->getOperand(0).getMBB();
				continue;
				}
				// Cannot handle conditional branches
				return true;
				}

				return false;
				}

				unsigned ConnexInstrInfo::insertBranch(
				MachineBasicBlock &MBB, MachineBasicBlock *TBB, MachineBasicBlock *FBB,
				ArrayRef<MachineOperand> Cond, const DebugLoc &DL, int *BytesAdded) const {
				// Shouldn't be a fall through.
				assert(TBB && "InsertBranch must not be told to insert a fallthrough");

				if (Cond.empty()) {
				// Unconditional branch
				assert(!FBB && "Unconditional branch with multiple successors!");
				BuildMI(&MBB, DL, get(Connex::JMP)).addMBB(TBB);
				return 1;
				}

				llvm_unreachable("Unexpected conditional branch");
				}

				unsigned ConnexInstrInfo::removeBranch(MachineBasicBlock &MBB,
				int *BytesRemoved) const {
				MachineBasicBlock::iterator I = MBB.end();
				unsigned Count = 0;

				while (I != MBB.begin()) {
				--I;
				if (I->isDebugValue())
				continue;
				if (I->getOpcode() != Connex::JMP)
				break;
				// Remove the branch.
				I->eraseFromParent();
				I = MBB.end();
				++Count;
				}

				return Count;
				}

				/*
				TODO: better implement it in ConnexTargetMachine::addPreRegAlloc(), in
				order to avoid any spills the register allocator might create.

				Creating in ConnexInstrInfo::expandPostRAPseudo() bundle instructions
				with VLOAD_H_SYM_IMM + INLINEASM.
				This is a decent compromise although I do NOT use pseudo-instructions,
				using this after Register Allocation (PostRA) works because:
				- Important: INLINEASM is considered a pseudo-instruction (NOTE that
				VLOAD_H_SYM_IMM is NOT considered a pseudo-instruction);
				- pre-RA scheduler does NOT break the VLOAD_H_SYM_IMM from its associated
				INLINEASM;
				- register allocator does NOT break either the VLOAD_H_SYM_IMM from its
				associated INLINEASM, more exactly it doesn't insert spills or fills
				between the two instructions as far as I can see. Important: however I
				am NOT sure if this is always going to hold.
				As of Feb 2017, class TargetInstrInfo
				(see http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html)
				has a few methods called on MachineInstr, but expandPostRAPseudo() seems
				to be a very good candidate (also it has no method with MachineSDNode).
				Anyhow, we could create and register our own pass working on MachineInstr in
				order to bundle instructions together (or on MachineSDNode, before pre-RA
				scheduler, although I guess it might be DIFFICULT to bundle from
				MachineSDNode to MachineInstr, since we have to perform a simple scheduling).

				From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				<<This function is called for all pseudo instructions that remain after
				register allocation.
				Many pseudo instructions are created to help register allocation.
				This is the place to convert them into real instructions.
				The target can edit MI in place, or it can insert new instructions and
				erase MI.
				The function should return true if anything was changed.>>
				*/
				bool ConnexInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
				// Making expandPostRAPseudo() do nothing:
				return false;

				LLVM_DEBUG(
				dbgs() << "ConnexInstrInfo::expandPostRAPseudo(): MI.getOpcode() = "
				<< MI.getOpcode() << "\n");

				MachineBasicBlock *MBB = MI.getParent();
				DebugLoc DL = MBB->findDebugLoc(MI);

				/*
				// Inspired from lib/Target/PowerPC/PPCCTRLoops.cpp
				for (MachineBasicBlock::pred_iterator PI = MBB->pred_begin(),
				PIE = MBB->pred_end(); PI != PIE; ++PI)
				Preds.push_back(*PI);
				*/
				switch (MI.getOpcode()) {
				default:
				// return expandPostRAPseudo(MI);
				return false;

				case Connex::VLOAD_H_SYM_IMM:
				// This is just a placeholder for register allocation.
				LLVM_DEBUG(
				dbgs()
				<< "ConnexInstrInfo::expandPostRAPseudo(): found VLOAD_H_SYM_IMM\n");
				// MI.eraseFromParent();
				break;

				case Connex::INLINEASM:
				// This is just a placeholder for register allocation.
				LLVM_DEBUG(
				dbgs() << "ConnexInstrInfo::expandPostRAPseudo(): found INLINEASM\n");

				/*
				MachineInstr *predMI = NULL;
				MachineInstr *succMI = NULL;
				for (MachineBasicBlock::iterator I = MBB->begin(),
				IE = MBB->end(); I != IE; ++I) {
				MachineInstr *IMI = I;
				if (IMI == &MI) {
				I++;
				succMI = I;
				// predMI contains normally instruction VLOAD_H_SYM_IMM
				break;
				}
				predMI = I;
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): (pred) I->getOpcode() = "
				<< I->getOpcode() << "\n");
				}
				*/
				MachineInstr *succMI;
				MachineInstr *predMI = getPredMachineInstr(&MI, &succMI);

				if (predMI != NULL) {
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): predMI = " << predMI << "("
				<< predMI << ")"
				<< "\n");
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): succMI = " << succMI << "("
				<< succMI << ")"
				<< "\n");
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): MI = " << MI << "(" << &MI
				<< ")"
				<< "\n");

				if (predMI->getOpcode() == Connex::VLOAD_H_SYM_IMM) {
				// Inspired from lib/Target/AMDGPU/SIInstrInfo.cpp
				// (or Mips/MipsDelaySlotFiller.cpp)
				/* Create a bundle so these instructions won't be re-ordered by the
				post-RA scheduler. */

				/*
				#ifdef THIS_DOES_NOT_ASMPRINT_BUNDLES
				MIBundleBuilder Bundler(*MBB, MI);

				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): predMI->getParent() = "
				<< predMI->getParent() << "\n");

				// This must NOT be commented. Otherwise, it results in ~strange error
				in ConnexMCInstLower::Lower()
				predMI->eraseFromParent();
				LLVM_DEBUG(dbgs()
				<< "expandPostRAPseudo(): appending predMI to bundle\n");
				Bundler.append(predMI);

				LLVM_DEBUG(dbgs()
				<< "expandPostRAPseudo(): calling finalizeBundle()\n");
				// See llvm.org/docs/doxygen/html/MachineInstrBundle_8cpp_source.html
				llvm::finalizeBundle(*MBB, Bundler.begin());

				MI.eraseFromParent();

				#ifdef NOT_USEFUL
				// Inspired from
				// llvm.org/docs/doxygen/html/MachineInstrBuilder_8h_source.html
				MI.bundleWithPred();
				// Does NOT compile: llvm::finalizeBundle(MBB, predMI);
				#endif
				*/

				/* We now know that MI is the INLINEASM instruction that
				needs to be bundled with the previous instruction, predMI.
				*/
				/*
				We do NOT use MIBundleBuilder,
				with eventual MI/predMI/succMI.eraseFromParent().
				Just predMI and succMI iterators.
				Note that succMI is required if we want to bundle
				instructions in the interval
				predMI..MI, where succMI = succ(MI).

				So we normally bundle here: predMI, MI (without succMI).
				*/
				/* See llvm.org/docs/doxygen/html/MachineInstrBundle_8cpp_source.html
				*/
				llvm::finalizeBundle(*MBB, (MachineBasicBlock::instr_iterator)predMI,
				(MachineBasicBlock::instr_iterator)succMI);
				// (MachineBasicBlock::instr_iterator)&MI);

				/*
				// See llvm.org/docs/doxygen/html/classllvm_1_1MIBundleBuilder.html
				// MIBundleBuilder(MachineBasicBlock &BB,
				// MachineBasicBlock::iterator B,
				// MachineBasicBlock::iterator E)
				// Create a bundle from the sequence of instructions between B and E.
				MIBundleBuilder Bundler(*MBB, predMI, MI);

				// MI.eraseFromParent();
				// Bundler.append(&MI);

				// Bundler.append(&MI);
				//

				// Gives error
				// include/llvm/CodeGen/MachineInstrBundleIterator.h:42:
				// llvm::MachineInstrBundleIterator<Ty>::
				// MachineInstrBundleIterator(Ty*)[with Ty = llvm::MachineInstr]:
				// Assertion `(!MI || !MI->isBundledWithPred()) && "It's not legal to
				// initialize " "MachineInstrBundleIterator "
				// "with a bundled MI"' failed.
				////MIBundleBuilder Bundler(MBB, predMI, succMI);

				// See llvm.org/docs/doxygen/html/MachineInstrBundle_8cpp_source.html
				llvm::finalizeBundle(*MBB, Bundler.begin());

				MI.eraseFromParent();

				// This yields error <<[with Ty = llvm::MachineInstr]:
				// Assertion `(!MI || !MI->isBundledWithPred()) &&
				// "It's not legal to initialize " "MachineInstrBundleIterator "
				// "with a bundled MI"' failed.>>
				// predMI->eraseFromParent();
				*/
				}
				}

				break;
				}

				LLVM_DEBUG(dbgs() << "Before exit expandPostRAPseudo():\n");
				// Gives error since MI can be bundled: <<Assertion `!MI.isBundledWithPred()
				// && "It's not legal to initialize " "MachineInstrBundleIterator with a "
				// "bundled MI"' failed.>> MachineBasicBlock &MBB = *(MI.getParent());

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				// for (auto it: *MBB)
				for (MachineBasicBlock::iterator I = MBB->begin(), IE = MBB->end(); I != IE;
				++I) {
				/*
				LLVM_DEBUG(dbgs()
				<< "ConnexInstrInfo::expandPostRAPseudo(): it->getOpcode() = "
				<< it->getOpcode() << "\n");
				*/
				LLVM_DEBUG(dbgs() << " I = " << *I << "\n");
				/*
				switch (MI.getOpcode()) {
				}
				*/
				}

				/*
				const SIRegisterInfo *TRI
				= static_cast<const SIRegisterInfo *>(ST.getRegisterInfo());
				MachineFunction &MF = MBB->getParent();
				unsigned Reg = MI.getOperand(0).getReg();
				unsigned RegLo = TRI->getSubReg(Reg, AMDGPU::sub0);
				unsigned RegHi = TRI->getSubReg(Reg, AMDGPU::sub1);

				// Create a bundle so these instructions won't be re-ordered by the
				// post-RA scheduler.
				MIBundleBuilder Bundler(*MBB, MI);
				Bundler.append(BuildMI(MF, DL, get(AMDGPU::S_GETPC_B64), Reg));

				// Add 32-bit offset from this instruction to the start of the
				// constant data.
				Bundler.append(BuildMI(MF, DL, get(AMDGPU::S_ADD_U32), RegLo)
				.addReg(RegLo)
				.addOperand(MI.getOperand(1)));

				llvm::finalizeBundle(*MBB, Bundler.begin());

				MI.eraseFromParent();
				break;
				*/

				return false;
				} // End ConnexInstrInfo::expandPostRAPseudo()

				// USE_POSTRA_SCHED
				// Inspired from llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				ScheduleHazardRecognizer *ConnexInstrInfo::CreateTargetPostRAHazardRecognizer(
				const InstrItineraryData *II, const ScheduleDAG *DAG) const {
				LLVM_DEBUG(
				dbgs()
				<< "Entered ConnexInstrInfo::CreateTargetPostRAHazardRecognizer()\n");

				return new ConnexDispatchGroupSBHazardRecognizer(II, DAG);
				}

				/*
				ScheduleHazardRecognizer *
				ConnexInstrInfo::CreateTargetPostRAHazardRecognizer(const MachineFunction &MF)
				const {
				LLVM_DEBUG(dbgs()
				<< "Entered ConnexInstrInfo::"
				"CreateTargetPostRAHazardRecognizer(MachineFunction)\n");

				// TODO: Get inspired from AMDGPU how they added separate
				// PostRA HazardRecognizer.
				// See http://llvm.org/doxygen/classllvm_1_1MachineFunction.html
				return new ConnexDispatchGroupSBHazardRecognizer(II, DAG);
				}
				*/

				// Pre-RA mach. instr. scheduler hazard recognizer
				// I guess this method is called if I give llc -enable-misched,
				// which invokes MIScheduler
				// (see e.g. https://llvm.org/devmtg/2016-09/slides/Absar-SchedulingInOrder.pdf)
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				ScheduleHazardRecognizer *ConnexInstrInfo::CreateTargetMIHazardRecognizer(
				const InstrItineraryData *II,
				const ScheduleDAGMI *DAG) const {
				LLVM_DEBUG(
				dbgs() << "Entered ConnexInstrInfo::CreateTargetMIHazardRecognizer()\n");

				llvm_unreachable("ConnexInstrInfo::CreateTargetMIHazardRecognizer() "
				"not implemented");
				// return new ConnexDispatchGroupSBHazardRecognizerPreRAScheduler(II, DAG);
				}

				/*
				// USE_PRERA_HAZARD_RECOGNIZER

				// Pre-RA scheduler - default scheduler (no special param given to llc)
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				ScheduleHazardRecognizer *ConnexInstrInfo::CreateTargetHazardRecognizer(
				const TargetSubtargetInfo *STI,
				const ScheduleDAG *DAG) const {
				LLVM_DEBUG(dbgs()
				<< "Entered ConnexInstrInfo::CreateTargetHazardRecognizer()\n");

				return new ConnexDispatchGroupSBHazardRecognizerPreRAScheduler(
				// See http://llvm.org/docs/doxygen/html/TargetSubtargetInfo_8h_source.html
				STI->getInstrItineraryData(),
				DAG);
				}
				*/

				// Inspired from llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
				void ConnexInstrInfo::insertNoop(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::insertNoop()\n");

				DebugLoc DL;
				BuildMI(MBB, MI, DL, get(Connex::NOP));
				}

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html:
				// <<Return true if the specified instruction can be predicated.>>
				/* From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html:
				<<bool isPredicable (QueryType Type=AllInBundle) const
				Return true if this instruction has a predicate operand that
				controls execution.>>
				*/
				// Inspired from ARMBaseInstrInfo::isPredicable
				bool ConnexInstrInfo::isPredicable(MachineInstr &MI) const {
				// if (!MI.isPredicable())
				// return false;
				LLVM_DEBUG(dbgs() << "ConnexInstrInfo::isPredicable(): MI.getOpcode() = "
				<< MI.getOpcode() << "\n");

				if (MI.getOpcode() == Connex::VLOAD_H) {
				return true;
				}

				return false;
				}
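
Note on the spill-slot arithmetic used by storeRegToStackSlot() and loadRegFromStackSlot() above: both map a frame index FI to an LS memory row with the same expression, (CONNEX_MEM_NUM_ROWS + CONNEX_MEM_NUM_ROWS_EXTRA_FOR_SPILL + 1) - FI, so a fill reads the row its matching spill wrote. The following is only a minimal standalone sketch of that mapping. The two constant values and the names kMemNumRows, kMemNumRowsExtraForSpill and spillRowForFrameIndex are hypothetical and chosen for illustration; the real constants are defined elsewhere in the patch and may differ.

```cpp
#include <cassert>

// Hypothetical values for illustration only; they stand in for
// CONNEX_MEM_NUM_ROWS and CONNEX_MEM_NUM_ROWS_EXTRA_FOR_SPILL.
constexpr unsigned kMemNumRows = 1024;
constexpr unsigned kMemNumRowsExtraForSpill = 32;

// Same arithmetic as in storeRegToStackSlot()/loadRegFromStackSlot():
// larger frame indices map to lower LS rows, one row per frame index.
constexpr unsigned spillRowForFrameIndex(int FI) {
  return (kMemNumRows + kMemNumRowsExtraForSpill + 1) -
         static_cast<unsigned>(FI);
}

// With the hypothetical constants above, FI = 2 maps to row 1055.
static_assert(spillRowForFrameIndex(2) == 1055,
              "FI = 2 should map to row 1024 + 32 + 1 - 2");
```

Under these assumed values, consecutive frame indices land on adjacent rows just below the top of the extended LS region, which is how the back end simulates a stack in LS memory.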

llvm/lib/Target/Connex/ConnexInstrInfo.td

This file was added.

				//===-- ConnexInstrInfo.td - Target Description for Connex Target ---------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file describes the Connex instructions in TableGen format.
				//
				//===----------------------------------------------------------------------===//

				include "ConnexInstrFormats.td"

				include "ConnexInstrInfoVec.td"





				include "ConnexInstrInfoScalar.td"

llvm/lib/Target/Connex/ConnexInstrInfoScalar.td

This file was added.

				//=- ConnexInstrInfoScalar.td - Scalar Target Description for Connex Target -=//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file describes the Connex scalar instructions in TableGen format.
				// It basically implements the BPF ISA.
				//
				//===----------------------------------------------------------------------===//

				// Instruction Operands and Patterns (64 bits operands BPF)

				// These are target-independent nodes, but have target-specific formats.
				def SDT_ConnexCallSeqStart : SDCallSeqStart<[SDTCisVT<0, iPTR>,
				SDTCisVT<1, iPTR>]>;
				def SDT_ConnexCallSeqEnd : SDCallSeqEnd<[SDTCisVT<0, iPTR>,
				SDTCisVT<1, iPTR>]>;
				def SDT_ConnexCall : SDTypeProfile<0, -1, [SDTCisVT<0, iPTR>]>;
				def SDT_ConnexSetFlag : SDTypeProfile<0, 3, [SDTCisSameAs<0, 1>]>;
				def SDT_ConnexSelectCC : SDTypeProfile<1, 5, [SDTCisSameAs<1, 2>,
				SDTCisSameAs<0, 4>,
				SDTCisSameAs<4, 5>]>;
				def SDT_ConnexBrCC : SDTypeProfile<0, 4, [SDTCisSameAs<0, 1>,
				SDTCisVT<3, OtherVT>]>;
				def SDT_ConnexWrapper : SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>,
				SDTCisPtrTy<0>]>;

				def Connexcall : SDNode<"ConnexISD::CALL", SDT_ConnexCall,
				[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue,
				SDNPVariadic]>;
				def Connexretflag : SDNode<"ConnexISD::RET_FLAG", SDTNone,
				[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
				def Connexcallseq_start: SDNode<"ISD::CALLSEQ_START", SDT_ConnexCallSeqStart,
				[SDNPHasChain, SDNPOutGlue]>;
				def Connexcallseq_end : SDNode<"ISD::CALLSEQ_END", SDT_ConnexCallSeqEnd,
				[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
				def Connexbrcc : SDNode<"ConnexISD::BR_CC", SDT_ConnexBrCC,
				[SDNPHasChain, SDNPOutGlue, SDNPInGlue]>;

				def Connexselectcc : SDNode<"ConnexISD::SELECT_CC", SDT_ConnexSelectCC,
				[SDNPInGlue]>;
				def ConnexWrapper : SDNode<"ConnexISD::Wrapper", SDT_ConnexWrapper>;

				def brtarget : Operand<OtherVT>;
				def calltarget : Operand<i64>;

				def u64imm : Operand<i64> {
				let PrintMethod = "printImm64Operand";
				}

				/* Added type qualifier i64 to avoid error
				"Could not infer all types in pattern!" - type ambiguity since the
				variable name ("in dag operator") does not have a type and this poses
				issues to the Type inference algorithm, since I added in
				ConnexRegisterInfo.td a second RegisterClass with type v128i16... (or v8i64)
				*/
				def i64immSExt32 : PatLeaf<(i64 imm),
				[{return isInt<32>(N->getSExtValue()); }]>;

				// Addressing modes.
				def ADDRri : ComplexPattern<i64, 2, "SelectAddr", [], []>;
				def FIri : ComplexPattern<i64, 2, "SelectFIAddr", [add, or], []>;

				// Address operands
				def MEMri : Operand<i64> {
				let PrintMethod = "printMemOperand";
				let EncoderMethod = "getMemoryOpValue";
				let MIOperandInfo = (ops GPR, i16imm);
				}


				/* Added type qualifier i64 to avoid error
				"Could not infer all types in pattern!"
				- type ambiguity since the variable name ("in dag operator") does not have a
				type and this poses issues to the Type inference algorithm, since I added
				in ConnexRegisterInfo.td a second RegisterClass with type v2i64. */
				// Condition code predicates - used for pattern matching for jump instructions
				def Connex_CC_EQ : PatLeaf<(i64 imm),
				[{return (N->getZExtValue() == ISD::SETEQ);}]>;
				def Connex_CC_NE : PatLeaf<(i64 imm),
				[{return (N->getZExtValue() == ISD::SETNE);}]>;
				def Connex_CC_GE : PatLeaf<(i64 imm),
				[{return (N->getZExtValue() == ISD::SETGE);}]>;
				def Connex_CC_GT : PatLeaf<(i64 imm),
				[{return (N->getZExtValue() == ISD::SETGT);}]>;
				def Connex_CC_GTU : PatLeaf<(i64 imm),
				[{return (N->getZExtValue() == ISD::SETUGT);}]>;
				def Connex_CC_GEU : PatLeaf<(i64 imm),
				[{return (N->getZExtValue() == ISD::SETUGE);}]>;

				// jump instructions
				class JMP_RR<bits<4> Opc, string OpcodeStr, PatLeaf Cond>
				: InstConnex<(outs), (ins GPR:$dst, GPR:$src, brtarget:$BrDst),
				!strconcat(OpcodeStr, "\t$dst, $src goto $BrDst"),
				[(Connexbrcc i64:$dst, i64:$src, Cond, bb:$BrDst)]> {
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<4> src;
				bits<16> BrDst;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{55-52} = src;
				let Inst{51-48} = dst;
				let Inst{47-32} = BrDst;

				let op = Opc;
				let ConnexSrc = 1;
				let ConnexClass = 5; // Connex_JMP
				}

				class JMP_RI<bits<4> Opc, string OpcodeStr, PatLeaf Cond>
				: InstConnex<(outs), (ins GPR:$dst, i64imm:$imm, brtarget:$BrDst),
				!strconcat(OpcodeStr, "i\t$dst, $imm goto $BrDst"),
				[(Connexbrcc i64:$dst, i64immSExt32:$imm, Cond, bb:$BrDst)]> {
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<16> BrDst;
				bits<32> imm;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{51-48} = dst;
				let Inst{47-32} = BrDst;
				let Inst{31-0} = imm;

				let op = Opc;
				let ConnexSrc = 0;
				let ConnexClass = 5; // Connex_JMP
				}

				multiclass J<bits<4> Opc, string OpcodeStr, PatLeaf Cond> {
				def _rr : JMP_RR<Opc, OpcodeStr, Cond>;
				def _ri : JMP_RI<Opc, OpcodeStr, Cond>;
				}

				let isBranch = 1, isTerminator = 1, hasDelaySlot=0 in {
				// cmp+goto instructions
				defm JEQ : J<0x1, "jeq", Connex_CC_EQ>;
				defm JUGT : J<0x2, "jgt", Connex_CC_GTU>;
				defm JUGE : J<0x3, "jge", Connex_CC_GEU>;
				defm JNE : J<0x5, "jne", Connex_CC_NE>;
				defm JSGT : J<0x6, "jsgt", Connex_CC_GT>;
				defm JSGE : J<0x7, "jsge", Connex_CC_GE>;
				}
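				// Naming sketch (standard TableGen defm expansion): each defm above yields
				// two records named <defm name><def suffix>. For example, JEQ produces
				//   JEQ_rr  printed as "jeq\t$dst, $src goto $BrDst"  (ConnexSrc = 1)
				//   JEQ_ri  printed as "jeqi\t$dst, $imm goto $BrDst" (ConnexSrc = 0)
				// JUGT and JUGE reuse the "jgt"/"jge" mnemonics with the unsigned predicates
				// Connex_CC_GTU and Connex_CC_GEU.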

				// Inspired from def : Pat<(f32 (load addr:$src)), (LDRAM addr:$src)>;
				//Pat<(f32 (load addr:$src)), (JEQ_ri addr:$src)>;

				// ALU instructions
				class ALU_RI<bits<4> Opc, string OpcodeStr, SDNode OpNode>
				: InstConnex<(outs GPR:$dst), (ins GPR:$src2, i64imm:$imm),
				!strconcat(OpcodeStr, "i\t$dst, $imm"),
				[(set GPR:$dst, (OpNode GPR:$src2, i64immSExt32:$imm))]> {
				//[(set GPR:$dst, (OpNode GPR:$src2, (i64 i64immSExt32:$imm)))]> {
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<32> imm;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{51-48} = dst;
				let Inst{31-0} = imm;

				let op = Opc;
				let ConnexSrc = 0;
				let ConnexClass = 7; // Connex_ALU64
				}

				class ALU_RR<bits<4> Opc, string OpcodeStr, SDNode OpNode>
				: InstConnex<(outs GPR:$dst), (ins GPR:$src2, GPR:$src),
				!strconcat(OpcodeStr, "\t$dst, $src"),
				[(set GPR:$dst, (OpNode i64:$src2, i64:$src))]> {
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<4> src;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{55-52} = src;
				let Inst{51-48} = dst;

				let op = Opc;
				let ConnexSrc = 1;
				let ConnexClass = 7; // Connex_ALU64
				}

				multiclass ALU<bits<4> Opc, string OpcodeStr, SDNode OpNode> {
				def _rr : ALU_RR<Opc, OpcodeStr, OpNode>;
				def _ri : ALU_RI<Opc, OpcodeStr, OpNode>;
				}

				let Constraints = "$dst = $src2" in {
				let isAsCheapAsAMove = 1 in {
				defm ADD : ALU<0x0, "add", add>;
				defm SUB : ALU<0x1, "sub", sub>;
				defm OR : ALU<0x4, "or", or>;
				defm AND : ALU<0x5, "and", and>;
				defm SLL : ALU<0x6, "sll", shl>;
				defm SRL : ALU<0x7, "srl", srl>;
				defm XOR : ALU<0xa, "xor", xor>;
				defm SRA : ALU<0xc, "sra", sra>;
				}
				defm MUL : ALU<0x2, "mul", mul>;
				defm DIV : ALU<0x3, "div", udiv>;
				}

				class MOV_RR<string OpcodeStr>
				: InstConnex<(outs GPR:$dst), (ins GPR:$src),
				!strconcat(OpcodeStr, "\t$dst, $src"),
				[]> {
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<4> src;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{55-52} = src;
				let Inst{51-48} = dst;

				let op = 0xb; // Connex_MOV
				let ConnexSrc = 1; // Connex_X
				let ConnexClass = 7; // Connex_ALU64
				}

				class MOV_RI<string OpcodeStr>
				: InstConnex<(outs GPR:$dst), (ins i64imm:$imm),
				!strconcat(OpcodeStr, "\t$dst, $imm"),
				[(set GPR:$dst, (i64 i64immSExt32:$imm))]> {
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<32> imm;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{51-48} = dst;
				let Inst{31-0} = imm;

				let op = 0xb; // Connex_MOV
				let ConnexSrc = 0; // Connex_K
				let ConnexClass = 7; // Connex_ALU64
				}

				class LD_IMM64<bits<4> Pseudo, string OpcodeStr>
				: InstConnex<(outs GPR:$dst), (ins u64imm:$imm),
				!strconcat(OpcodeStr, "\t$dst, $imm"),
				[(set GPR:$dst, (i64 imm:$imm))]> {

				bits<3> mode;
				bits<2> size;
				bits<4> dst;
				bits<64> imm;

				let Inst{63-61} = mode;
				let Inst{60-59} = size;
				let Inst{51-48} = dst;
				let Inst{55-52} = Pseudo;
				let Inst{47-32} = 0;
				let Inst{31-0} = imm{31-0};

				let mode = 0; // Connex_IMM
				let size = 3; // Connex_DW
				let ConnexClass = 0; // Connex_LD
				}

				let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
				def LD_imm64 : LD_IMM64<0, "ld_64">;
				def MOV_rr : MOV_RR<"mov">;
				def MOV_ri : MOV_RI<"mov">;
				}

				def FI_ri
				: InstConnex2<(outs GPR:$dst), (ins MEMri:$addr),
				"lea\t$dst, $addr",
				[(set i64:$dst, FIri:$addr)]> {
				// This is a tentative instruction, and will be replaced
				// with MOV_rr and ADD_ri in PEI phase
				}


				def LD_pseudo
				: InstConnex<(outs GPR:$dst), (ins i64imm:$pseudo, u64imm:$imm),
				"ld_pseudo\t$dst, $pseudo, $imm",
				[(set GPR:$dst, (int_connex_pseudo imm:$pseudo, imm:$imm))]> {

				bits<3> mode;
				bits<2> size;
				bits<4> dst;
				bits<64> imm;
				bits<4> pseudo;

				let Inst{63-61} = mode;
				let Inst{60-59} = size;
				let Inst{51-48} = dst;
				let Inst{55-52} = pseudo;
				let Inst{47-32} = 0;
				let Inst{31-0} = imm{31-0};

				let mode = 0; // Connex_IMM
				let size = 3; // Connex_DW
				let ConnexClass = 0; // Connex_LD
				}

				// STORE instructions
				class STORE<bits<2> SizeOp, string OpcodeStr, list<dag> Pattern>
				: InstConnex<(outs), (ins GPR:$src, MEMri:$addr),
				!strconcat(OpcodeStr, "\t$addr, $src"), Pattern> {
				bits<3> mode;
				bits<2> size;
				bits<4> src;
				bits<20> addr;

				let Inst{63-61} = mode;
				let Inst{60-59} = size;
				let Inst{51-48} = addr{19-16}; // base reg
				let Inst{55-52} = src;
				let Inst{47-32} = addr{15-0}; // offset

				let mode = 3; // Connex_MEM
				let size = SizeOp;
				let ConnexClass = 3; // Connex_STX
				}

				class STOREi64<bits<2> Opc, string OpcodeStr, PatFrag OpNode>
				: STORE<Opc, OpcodeStr, [(OpNode i64:$src, ADDRri:$addr)]>;

				def STW : STOREi64<0x0, "stw", truncstorei32>;
				def STH : STOREi64<0x1, "sth", truncstorei16>;
				def STB : STOREi64<0x2, "stb", truncstorei8>;
				def STD : STOREi64<0x3, "std", store>;

				// LOAD instructions
				class LOAD<bits<2> SizeOp, string OpcodeStr, list<dag> Pattern>
				: InstConnex<(outs GPR:$dst), (ins MEMri:$addr),
				!strconcat(OpcodeStr, "\t$dst, $addr"), Pattern> {
				bits<3> mode;
				bits<2> size;
				bits<4> dst;
				bits<20> addr;

				let Inst{63-61} = mode;
				let Inst{60-59} = size;
				let Inst{51-48} = dst;
				let Inst{55-52} = addr{19-16};
				let Inst{47-32} = addr{15-0};

				let mode = 3; // Connex_MEM
				let size = SizeOp;
				let ConnexClass = 1; // Connex_LDX
				}

				class LOADi64<bits<2> SizeOp, string OpcodeStr, PatFrag OpNode>
				: LOAD<SizeOp, OpcodeStr, [(set i64:$dst, (OpNode ADDRri:$addr))]>;

				def LDW : LOADi64<0x0, "ldw", zextloadi32>;
				def LDH : LOADi64<0x1, "ldh", zextloadi16>;
				def LDB : LOADi64<0x2, "ldb", zextloadi8>;
				def LDD : LOADi64<0x3, "ldd", load>;

				class BRANCH<bits<4> Opc, string OpcodeStr, list<dag> Pattern>
				: InstConnex<(outs), (ins brtarget:$BrDst),
				!strconcat(OpcodeStr, "\t$BrDst"), Pattern> {
				bits<4> op;
				bits<16> BrDst;
				bits<1> ConnexSrc;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{47-32} = BrDst;

				let op = Opc;
				let ConnexSrc = 0;
				let ConnexClass = 5; // Connex_JMP
				}

				class CALL<string OpcodeStr>
				: InstConnex<(outs), (ins calltarget:$BrDst),
				!strconcat(OpcodeStr, "\t$BrDst"), []> {
				bits<4> op;
				bits<32> BrDst;
				bits<1> ConnexSrc;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{31-0} = BrDst;

				let op = 8; // Connex_CALL
				let ConnexSrc = 0;
				let ConnexClass = 5; // Connex_JMP
				}

				// Jump always
				let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
				def JMP : BRANCH<0x0, "jmp", [(br bb:$BrDst)]>;
				}

				// Jump and link
				let isCall=1, hasDelaySlot=0, Uses = [R11],
				// Potentially clobbered registers
				Defs = [R0, R1, R2, R3, R4, R5] in {
				def JAL : CALL<"call">;
				}

				class NOP_I<string OpcodeStr>
				: InstConnex<(outs), (ins i32imm:$imm),
				//!strconcat(OpcodeStr, "\t$imm ; // scalar or vector NOP"), []>
				!strconcat(OpcodeStr,
				" ; // (immOperand = $imm ) scalar or vector NOP"), []>
				{
				// mov r0, r0 == nop
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<4> src;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{55-52} = src;
				let Inst{51-48} = dst;

				let op = 0xb; // Connex_MOV
				let ConnexSrc = 1; // Connex_X
				let ConnexClass = 7; // Connex_ALU64
				let src = 0; // R0
				let dst = 0; // R0
				}

				/* If we manually generate NOP (for delay slots) it means we want to keep it,
				otherwise we should NOT have it generated
				(hasSideEffects - The instruction has side effects that are not captured
				by any operands of the instruction or other flags.)
				*/
				let hasSideEffects = 1 in
				def NOP_BPF : NOP_I<"NOP">;


				class RET<string OpcodeStr>
				: InstConnex<(outs), (ins),
				!strconcat(OpcodeStr, ""), [(Connexretflag)]> {
				bits<4> op;

				let Inst{63-60} = op;
				let Inst{59} = 0;
				let Inst{31-0} = 0;

				let op = 9; // Connex_EXIT
				let ConnexClass = 5; // Connex_JMP
				}

				let isReturn = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1,
				isNotDuplicable = 1 in {
				def RET : RET<"ret">;
				}

				// ADJCALLSTACKDOWN/UP pseudo insns
				let Defs = [R11], Uses = [R11], isCodeGenOnly = 1 in {
				def ADJCALLSTACKDOWN : Pseudo<(outs), (ins i64imm:$amt1, i64imm:$amt2),
				"#ADJCALLSTACKDOWN $amt1 $amt2",
				[(Connexcallseq_start timm:$amt1, timm:$amt2)]>;
				def ADJCALLSTACKUP : Pseudo<(outs), (ins i64imm:$amt1, i64imm:$amt2),
				"#ADJCALLSTACKUP $amt1 $amt2",
				[(Connexcallseq_end timm:$amt1, timm:$amt2)]>;
				}

				let usesCustomInserter = 1 in {
				def Select : Pseudo<
				(outs GPR:$dst),
				(ins GPR:$lhs, GPR:$rhs, i64imm:$imm, GPR:$src, GPR:$src2),
				"# Select PSEUDO $dst = $lhs $imm $rhs ? $src : $src2",
				[(set i64:$dst,
				(Connexselectcc i64:$lhs, i64:$rhs, (i64 imm:$imm),
				i64:$src, i64:$src2))]>;
				}

				// load 64-bit global addr into register
				def : Pat<(ConnexWrapper tglobaladdr:$in), (LD_imm64 tglobaladdr:$in)>;

				// 0xffffFFFF doesn't fit into simm32, optimize common case
				def : Pat<(i64 (and (i64 GPR:$src), 0xffffFFFF)),
				(SRL_ri (SLL_ri (i64 GPR:$src), 32), 32)>;
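				// Worked example (illustrative value, not from the original patch):
				//   $src             = 0x123456789ABCDEF0
				//   SLL_ri $src, 32  = 0x9ABCDEF000000000
				//   SRL_ri ..., 32   = 0x000000009ABCDEF0  (== $src & 0xffffFFFF)
				// AND_ri cannot match this mask because i64immSExt32 rejects 0xffffFFFF
				// (4294967295 is outside the signed 32-bit range).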

				// Calls
				def : Pat<(Connexcall tglobaladdr:$dst), (JAL tglobaladdr:$dst)>;
				def : Pat<(Connexcall imm:$dst), (JAL imm:$dst)>;

				// Loads
				def : Pat<(extloadi8 ADDRri:$src), (i64 (LDB ADDRri:$src))>;
				def : Pat<(extloadi16 ADDRri:$src), (i64 (LDH ADDRri:$src))>;
				def : Pat<(extloadi32 ADDRri:$src), (i64 (LDW ADDRri:$src))>;

				// Atomics
				class XADD<bits<2> SizeOp, string OpcodeStr, PatFrag OpNode>
				: InstConnex<(outs GPR:$dst), (ins MEMri:$addr, GPR:$val),
				!strconcat(OpcodeStr, "\t$dst, $addr, $val"),
				[(set GPR:$dst, (OpNode ADDRri:$addr, GPR:$val))]> {
				bits<3> mode;
				bits<2> size;
				// bits<4> src;
				bits<20> addr;

				let Inst{63-61} = mode;
				let Inst{60-59} = size;
				let Inst{51-48} = addr{19-16}; // base reg
				// let Inst{55-52} = src;
				let Inst{47-32} = addr{15-0}; // offset

				let mode = 6; // Connex_XADD
				let size = SizeOp;
				let ConnexClass = 3; // Connex_STX
				}

				let Constraints = "$dst = $val" in {
				def XADD32 : XADD<0, "xadd32", atomic_load_add_32>;
				def XADD64 : XADD<3, "xadd64", atomic_load_add_64>;
				// undefined def XADD16 : XADD<1, "xadd16", atomic_load_add_16>;
				// undefined def XADD8 : XADD<2, "xadd8", atomic_load_add_8>;
				}

				// bswap16, bswap32, bswap64
				class BSWAP<bits<32> SizeOp, string OpcodeStr, list<dag> Pattern>
				: InstConnex<(outs GPR:$dst), (ins GPR:$src),
				!strconcat(OpcodeStr, "\t$dst"),
				Pattern> {
				bits<4> op;
				bits<1> ConnexSrc;
				bits<4> dst;
				bits<32> imm;

				let Inst{63-60} = op;
				let Inst{59} = ConnexSrc;
				let Inst{51-48} = dst;
				let Inst{31-0} = imm;

				let op = 0xd; // Connex_END
				let ConnexSrc = 1; // Connex_TO_LE (TODO: use Connex_TO_BE for big-endian target)
				let ConnexClass = 4; // Connex_ALU
				let imm = SizeOp;
				}

				let Constraints = "$dst = $src" in {
				def BSWAP16 : BSWAP<16, "bswap16",
				[(set GPR:$dst, (srl (bswap GPR:$src), (i64 48)))]>;
				def BSWAP32 : BSWAP<32, "bswap32",
				[(set GPR:$dst, (srl (bswap GPR:$src), (i64 32)))]>;
				def BSWAP64 : BSWAP<64, "bswap64",
				[(set GPR:$dst, (bswap GPR:$src))]>;
				}
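				// Worked example (illustrative value, not from the original patch), with
				// $src = 0x1122334455667788 and bswap reversing all 8 bytes:
				//   bswap $src      = 0x8877665544332211
				//   BSWAP64 result  = 0x8877665544332211
				//   BSWAP32 result  = 0x8877665544332211 >> 32 = 0x0000000088776655
				//   BSWAP16 result  = 0x8877665544332211 >> 48 = 0x0000000000008877
				// i.e. the low 32 (or 16) bits of $src are byte-swapped and zero-extended.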

				let Defs = [R0, R1, R2, R3, R4, R5], Uses = [R6], hasSideEffects = 1,
				hasExtraDefRegAllocReq = 1, hasExtraSrcRegAllocReq = 1, mayLoad = 1 in {
				class LOAD_ABS<bits<2> SizeOp, string OpcodeStr, Intrinsic OpNode>
				: InstConnex<(outs), (ins GPR:$skb, i64imm:$imm),
				!strconcat(OpcodeStr, "\tr0, $skb.data + $imm"),
				[(set R0, (OpNode GPR:$skb, i64immSExt32:$imm))]> {
				//[(set R0, (OpNode GPR:$skb, (i64immSExt32:$imm)))]> {
				bits<3> mode;
				bits<2> size;
				bits<32> imm;

				let Inst{63-61} = mode;
				let Inst{60-59} = size;
				let Inst{31-0} = imm;

				let mode = 1; // Connex_ABS
				let size = SizeOp;
				let ConnexClass = 0; // Connex_LD
				}

				class LOAD_IND<bits<2> SizeOp, string OpcodeStr, Intrinsic OpNode>
				: InstConnex<(outs), (ins GPR:$skb, GPR:$val),
				!strconcat(OpcodeStr, "\tr0, $skb.data + $val"),
				[(set R0, (OpNode GPR:$skb, GPR:$val))]> {
				bits<3> mode;
				bits<2> size;
				bits<4> val;

				let Inst{63-61} = mode;
				let Inst{60-59} = size;
				let Inst{55-52} = val;

				let mode = 2; // Connex_IND
				let size = SizeOp;
				let ConnexClass = 0; // Connex_LD
				}
				}

				def LD_ABS_B : LOAD_ABS<2, "ldabs_b", int_connex_load_byte>;
				def LD_ABS_H : LOAD_ABS<1, "ldabs_h", int_connex_load_half>;
				def LD_ABS_W : LOAD_ABS<0, "ldabs_w", int_connex_load_word>;

				def LD_IND_B : LOAD_IND<2, "ldind_b", int_connex_load_byte>;
				def LD_IND_H : LOAD_IND<1, "ldind_h", int_connex_load_half>;
				def LD_IND_W : LOAD_IND<0, "ldind_w", int_connex_load_word>;

llvm/lib/Target/Connex/ConnexInstrInfoVec.td

This file was added.

				//==- ConnexInstrInfoVec.td - Vector Target Description for Connex Target -==//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file describes the Connex-S vector instructions in TableGen format.
				// The ISA is described in ConnexISA.pdf.
				//
				//===----------------------------------------------------------------------===//

				// From include/llvm/IR/IntrinsicsConnex.td:
				// def int_connex_reduce : Intrinsic<[], [llvm_anyvector_ty], []>;
				class RED_1R_DESC_BASE<string instr_asm,
				RegisterOperand ROWS,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins ROWS:$wl);
				string AsmString = !strconcat(instr_asm,
				" $wl ; // MSA_1R generic instruction");
				list<dag> Pattern = [(int_connex_reduce ROWS:$wl)];
				bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}
				class RED_H_DESC : RED_1R_DESC_BASE<"RED", VectorHOpnd>;
				class RED_H_ENC : MSA_1R_FMT<0b101010110>;
				def RED_H: RED_H_ENC, RED_H_DESC;

				// NOTE: RED_U, unsigned reduction, is only used for manual/custom ISel
				class RED_U_1R_DESC_BASE<string instr_asm,
				RegisterOperand ROWS,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins ROWS:$wl);
				string AsmString = !strconcat(instr_asm,
				" $wl ; // MSA_1R generic instruction");
				list<dag> Pattern = [];
				bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}
				class RED_U_H_DESC : RED_U_1R_DESC_BASE<"RED_U", VectorHOpnd>;
				class RED_U_H_ENC : MSA_1R_FMT<0b100000000>;
				// Note: does NOT affect any flags
				def RED_U_H: RED_U_H_ENC, RED_U_H_DESC;

				let hasSideEffects = 1 in // Inspired from MSP430InstrInfo.td
				def NOP : NonImmediateInstruction<0b000000000, (outs), (ins),
				"NOP; // NOP : NonImmediateInstruction", []>;
				// Note: does NOT affect any flags

				// NEW32
				class NOP_BITCONVERT_DESC_BASE<string AsmStrInfo,
				RegisterOperand ROWS,
				RegisterOperand ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWS:$wl);
				string AsmString = !strconcat(
				"// 'NOP' used for vector bitconvert $wl -> $wd . ",
				AsmStrInfo);
				list<dag> Pattern = [];

				// Inspired from include/llvm/Target/Target.td:
				// <<OperandConstraint, e.g. $src = $dst.>>
				string Constraints = "$wl = $wd";

				/*
				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern
				*/
				InstrItinClass Itinerary = itin;
				}
				class NOP_BITCONVERT_ENC : MSA_2R_FMT<0b000000000>; // MSAInst
				class NOP_BITCONVERT_HW_DESC : NOP_BITCONVERT_DESC_BASE<"v8i16 to v4i32",
				VectorHOpnd,
				VectorHOpnd>;
				def NOP_BITCONVERT_HW: NOP_BITCONVERT_ENC, NOP_BITCONVERT_HW_DESC;
				//
				class NOP_BITCONVERT_WH_DESC : NOP_BITCONVERT_DESC_BASE<"v4i32 to v8i16",
				VectorHOpnd,
				VectorHOpnd>;
				def NOP_BITCONVERT_WH: NOP_BITCONVERT_ENC, NOP_BITCONVERT_WH_DESC;
				//
				class NOP_BITCONVERT_HH_DESC : NOP_BITCONVERT_DESC_BASE<"v8i16 to v8i16",
				VectorHOpnd,
				VectorHOpnd>;
				def NOP_BITCONVERT_HH: NOP_BITCONVERT_ENC, NOP_BITCONVERT_HH_DESC;


				class NOP_BOGUS_DESC_BASE<InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins);
				string AsmString = "// 'NOP' used for ConnexInstrInfo::copyPhysReg()";
				list<dag> Pattern = [];

				// We put this normally since we don't specify a DAG pattern in Pattern
				//bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}
				class NOP_BOGUS_ENC : MSAInst; // TODO: add encoding;
				def NOP_BOGUS: NOP_BOGUS_ENC, NOP_BOGUS_DESC_BASE;

				class NOP_SPECIAL_DESC_BASE<RegisterOperand ROWS,
				RegisterOperand ROWD_TIED = ROWS,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD_TIED:$wdTied);
				dag InOperandList = (ins ROWS:$ws);
				string AsmString =
				"NOP ; // NOP_SPECIAL_DESC_BASE (ws = $ws, wdTied = $wdTied )";
				list<dag> Pattern = [];

				string Constraints = "$ws = $wdTied";

				// We need to put this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}
				class NOP_SPECIAL_DESC : NOP_SPECIAL_DESC_BASE<VectorHOpnd>;
				class NOP_SPECIAL_ENC : MSAInst; // TODO: add encoding;
				def NOP_SPECIAL: NOP_SPECIAL_ENC, NOP_SPECIAL_DESC;








				// From MipsMSAInstrInfo.td
				class IsCommutable {
				bit isCommutable = 1;
				}



				class MSA_2R_DESC_BASE<string instr_asm, SDPatternOperator OpNode,
				RegisterOperand ROWD, RegisterOperand ROWS = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWS:$wl);
				// INFIX_NOTATION:
				string AsmString = !strconcat("$wd = ",
				!strconcat(instr_asm,
				" $wl ; // MSA_2R generic instruction"));
				list<dag> Pattern = [(set ROWD:$wd, (OpNode ROWS:$wl))];
				InstrItinClass Itinerary = itin;
				}

				class MSA_3R_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl);
				// INFIX_NOTATION:
				string AsmString = !strconcat("$wd = $wr ",
				!strconcat(instr_asm,
				" $wl ; // MSA_3R generic instruction"));
				list<dag> Pattern = [(set ROWD:$wd, (OpNode ROWR:$wr, ROWL:$wl))];
				InstrItinClass Itinerary = itin;
				}

				class MSA_3R_PREFIX_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl);
				string AsmString = !strconcat(!strconcat("$wd = ", instr_asm),
				"( $wr , $wl ) ; // MSA_3R_PREFIX generic instruction");
				list<dag> Pattern = [(set ROWD:$wd, (OpNode ROWR:$wr, ROWL:$wl))];
				//bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}


				///////////////////////////////////////////////////////////////////////////////
				//////////////////////BEGIN (i)read/(i)write //////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////

				//include "ConnexInstrInfoVec_READ_WRITE.td"
				class LD_H_DESC : LD_DESC_BASE<load, v8i16, VectorHOpnd>; // ldvd OR iread
				// NEW32
				class LD_W_ENC : MSA_3R_FMT<0b000000000>;
				// NOTE: this is not a real Connex-S instruction
				class LD_H_ENC : MSA_1R_FMT_dest_imm<0b110100>;
				// Note: does NOT affect any flags
				def LD_H: LD_H_ENC, LD_H_DESC;

				class ST_H_DESC : ST_DESC_BASE<store, v8i16, VectorHOpnd>; // stvd OR iwrite
				class ST_H_ENC : MSA_1R_FMT_left_imm<0b110010>;
				// Note: does NOT affect any flags
				def ST_H: ST_H_ENC, ST_H_DESC;

				class LD_INDIRECT_MASKED_H_DESC : LD_INDIRECT_MASKED_DESC_BASE<VectorHOpnd>;
				class LD_INDIRECT_MASKED_H_ENC : MSA_3R_FMT2<0b100100100>;
				// Note: does NOT affect any flags
				def LD_INDIRECT_MASKED_H: LD_INDIRECT_MASKED_H_ENC, LD_INDIRECT_MASKED_H_DESC;

				class LD_INDIRECT_W_DESC : LD_INDIRECT_DESC_BASE<VectorHOpnd>;
				class LD_INDIRECT_W_ENC : MSA_3R_FMT2<0b100100100>;
				// Note: does NOT affect any flags
				def LD_INDIRECT_W: LD_INDIRECT_W_ENC, LD_INDIRECT_W_DESC;
				//
				class LD_INDIRECT_H_DESC : LD_INDIRECT_DESC_BASE<VectorHOpnd>;
				class LD_INDIRECT_H_ENC : MSA_3R_FMT2<0b100100100>;
				// Note: does NOT affect any flags
				def LD_INDIRECT_H: LD_INDIRECT_H_ENC, LD_INDIRECT_H_DESC;

				class ST_INDIRECT_W_DESC : ST_INDIRECT_DESC_BASE<VectorHOpnd>;
				class ST_INDIRECT_W_ENC : MSA_2R_FMT2<0b101001011>;
				// Note: affects flags: like Sub, Lt, Eq
				def ST_INDIRECT_W: ST_INDIRECT_W_ENC, ST_INDIRECT_W_DESC;
				//
				class ST_INDIRECT_H_DESC : ST_INDIRECT_DESC_BASE<VectorHOpnd>;
				class ST_INDIRECT_H_ENC : MSA_2R_FMT2<0b101001011>;
				// Note: affects flags: like Sub, Lt, Eq
				def ST_INDIRECT_H: ST_INDIRECT_H_ENC, ST_INDIRECT_H_DESC;

				class ST_INDIRECT_MASKED_H_DESC : ST_INDIRECT_MASKED_DESC_BASE<VectorHOpnd>;
				class ST_INDIRECT_MASKED_H_ENC : MSA_3R_FMT<0b100010100>;
				// Note: affects flags: like Sub, Lt, Eq
				def ST_INDIRECT_MASKED_H: ST_INDIRECT_MASKED_H_ENC, ST_INDIRECT_MASKED_H_DESC;


				// Fill (reload of a spilled value) - to recognize it is a fill in
				// ConnexTargetMachine.cpp
				def LD_FILL_H: LD_H_ENC, LD_H_DESC;
				//
				// For the spill - to recognize it is a spill in ConnexTargetMachine.cpp
				def ST_SPILL_H: ST_H_ENC, ST_H_DESC;

				///////////////////////////////////////////////////////////////////////////////
				///////////////////////END (i)read/(i)write ///////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////


				///////////////////////////////////////////////////////////////////////////////
				///////////////BEGIN REPEAT////////////////////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////

				// VERY IMPORTANT: we use the OPINCAA instruction names REPEAT and END_REPEAT
				// because the OPINCAA assembler transforms REPEAT into 2 Connex SETLC
				// assembly instructions, due to a hardware workaround.
				// Also, OPINCAA transforms END_REPEAT into IJMPNZDEC and NOP.


				class REPEAT_DESC_BASE<InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins i64imm:$imm);

				string AsmString = "REPEAT_TIMES($imm ); // ; REP_1R";

				list<dag> Pattern = [(int_connex_repeat_x_times imm:$imm)];
				bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}
				class REPEAT_DESC : REPEAT_DESC_BASE;
				class REPEAT_ENC : Connex_IMM16_FMT2<0b010101>;
				def REPEAT: REPEAT_ENC, REPEAT_DESC;


				class REPEAT_DESC_BASE_SYM_IMM<InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins);

				string AsmString =
				"REPEAT_TIMES( // (fake but necessary ; ) REPEAT_DESC_BASE_SYM_IMM";
				/* IMPORTANT: this instruction does NOT have a pattern here, but is matched
				using custom matching code in ConnexISelDAGToDAG.cpp,
				void ConnexDAGToDAGISel::Select(SDNode *Node) */
				list<dag> Pattern = [];
				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern
				InstrItinClass Itinerary = itin;
				}
				class REPEAT_DESC_SYM_IMM : REPEAT_DESC_BASE_SYM_IMM;
				class REPEAT_ENC_SYM_IMM : MSAInst; // small-MEGA TODO: add encoding;
				def REPEAT_SYM_IMM: REPEAT_ENC_SYM_IMM, REPEAT_DESC_SYM_IMM;


				class END_REPEAT_DESC_BASE<InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins);

				string AsmString = "END_REPEAT; // END_REPEAT_DESC_BASE";
				list<dag> Pattern = [(int_connex_end_repeat)]; // small-TODO: maybe I should put brcond targetAddress
				bit hasSideEffects = 1;
				bit isBranch = 1; // small-TODO: Is this instruction a branch instruction?
				InstrItinClass Itinerary = itin;
				}
				class END_REPEAT_DESC : END_REPEAT_DESC_BASE;
				class END_REPEAT_ENC : MSAInst;
				def END_REPEAT: END_REPEAT_ENC, END_REPEAT_DESC;

				///////////////////////////////////////////////////////////////////////////////
				//////////////END REPEAT //////////////////////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////

				// These are taken from Mips MipsMSAInstrInfo.td.
				// See also http://llvm.org/docs/LangRef.html#shl-instruction
				class SHLV_H_DESC : MSA_3R_DESC_BASE<"<<", shl, VectorHOpnd>;
				class SHLV_H_ENC : MSA_3R_FMT<0b101000000>;
				// Note: affects flags: like Add, Lt, Eq
				def SHLV_H : SHLV_H_ENC, SHLV_H_DESC;

				// srl is defined in [LLVM]/llvm/include/llvm/Target/TargetSelectionDAG.td
				class SHRV_H_DESC : MSA_3R_DESC_BASE<">>", srl, VectorHOpnd>;
				class SHRV_H_ENC : MSA_3R_FMT<0b101010000>;
				// Note: affects flags: like Sub, Lt, Eq
				def SHRV_H : SHRV_H_ENC, SHRV_H_DESC;

				// sra is defined in [LLVM]/llvm/include/llvm/Target/TargetSelectionDAG.td
				class SHRAV_H_DESC : MSA_3R_PREFIX_DESC_BASE<"SHRA", sra, VectorHOpnd>;
				class SHRAV_H_ENC : MSA_3R_FMT<0b101100000>;
				// Note: affects flags: like Addc, Ult, Eq
				def SHRAV_H : SHRAV_H_ENC, SHRAV_H_DESC;


				class NOT_H_DESC : MSA_2R_DESC_BASE<"~", not, VectorHOpnd>;
				class NOT_H_ENC : MSA_2R_FMT<0b101001100>;
				// Note: affects the flags in an undefined way
				def NOT_H : NOT_H_ENC, NOT_H_DESC;

				class ORV_H_DESC : MSA_3R_DESC_BASE<"\|", or, VectorHOpnd>, IsCommutable;
				class ORV_H_ENC : MSA_3R_FMT<0b101011100>;
				// Note: affects flags: like Sub, Lt, Eq
				def ORV_H : ORV_H_ENC, ORV_H_DESC;

				class ANDV_H_DESC : MSA_3R_DESC_BASE<"&", and, VectorHOpnd>, IsCommutable;
				class ANDV_H_ENC : MSA_3R_FMT<0b101101100>;
				// Note: affects flags: like Addc, Ult, Eq
				def ANDV_H : ANDV_H_ENC, ANDV_H_DESC;

				class XORV_H_DESC : MSA_3R_DESC_BASE<"^", xor, VectorHOpnd>, IsCommutable;
				class XORV_H_ENC : MSA_3R_FMT<0b101111100>;
				// Note: affects flags: like Subc, Ult, Eq
				def XORV_H : XORV_H_ENC, XORV_H_DESC;




				// Using ctpop intrinsic - just like llvm/lib/Target/AArch64/AArch64InstrInfo.td
				class POPCNT_H_DESC : MSA_2R_DESC_BASE<"POPCNT", ctpop, VectorHOpnd>;
				class POPCNT_H_ENC : MSA_2R_FMT<0b101110000>;
				// Note: does NOT affect any flags
				def POPCNT_H : POPCNT_H_ENC, POPCNT_H_DESC;




				class MSA_2R_SPECIAL_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWS = ROWD,
				RegisterOperand ROWS_TIED = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWS:$wl, ROWS_TIED:$wlTied);
				// INFIX_NOTATION:
				string AsmString = !strconcat("$wd = ",
				!strconcat(instr_asm,
				" $wl ; // MSA_2R_SPECIAL generic instruction (wlTied = $wlTied )")
				);
				list<dag> Pattern = [];

				// Inspired from include/llvm/Target/Target.td:
				// <<OperandConstraint, e.g. $src = $dst.>>
				string Constraints = "$wd = $wlTied";

				// We need to put this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}

				class MSA_3R_SPECIAL_PREFIX_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				RegisterOperand ROWR_TIED = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl, ROWR_TIED:$wrTied);
				string AsmString = !strconcat("$wd = ",
				!strconcat(instr_asm,
				"($wr, $wl ) ; // MSA_3R_SPECIAL generic instruction (wrTied = $wrTied )"
				) );
				list<dag> Pattern = [];

				// Inspired from include/llvm/Target/Target.td:
				// <<OperandConstraint, e.g. $src = $dst.>>
				string Constraints = "$wd = $wrTied";

				// We need to put this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}

				class MSA_3R_SPECIAL_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				RegisterOperand ROWR_TIED = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl, ROWR_TIED:$wrTied);
				// INFIX_NOTATION:
				string AsmString = !strconcat("$wd = $wr ",
				!strconcat(instr_asm,
				" $wl ; // MSA_3R_SPECIAL generic instruction (wrTied = $wrTied )"
				) );
				list<dag> Pattern = [];

				// Inspired from llvm/include/llvm/Target/Target.td:
				// <<OperandConstraint, e.g. $src = $dst.>>
				string Constraints = "$wd = $wrTied";

				// We need to put this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}






				include "ConnexInstrInfoVecVsplat.td"


				///////////////////////////////////////////////////////////////////////////////
				////////////////////////BEGIN ISHL/ISHR(A)/////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////
				//include "ConnexInstrInfoVec_bit.td"
				/* VERY IMPORTANT: even if the _ISHL/_ISHR(A) instructions are
				immediate, they have the 2nd opcode bit (also called the IMM bit)
				set to 0, as we can see in the Instruction format diagram above.
				So they are of type INSTRUCTION_TYPE_NO_IMMEDIATE
				- the immediate value is stored in the 5 bits of the
				right register, since it is enough for the delta operand
				of SHIFT operations (normally values 0-16 are enough for
				16-bit registers).

				IMPORTANT NOTE: these instructions take an immediate value that is used
				as an operand of a vector binary operation.
				Hence, we must make this immediate operand a vector immediate operand
				(and not a scalar immediate operand such as imm/i64imm, as we might be
				tempted to do);
				otherwise TableGen will complain "Could not infer all types in pattern!"
				at the Pattern(s) below.
				*/
				class Connex_IMM_SHIFT_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				SplatComplexPattern SplatImm,
				RegisterOperand ROWD,
				RegisterOperand ROWS = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWS:$wl, SplatImm.OpClass:$wr);
				// Note: For ISHL, ISHR(A) the immediate is actually the wr (Connex right)
				// register

				string AsmString = !strconcat("$wd = ",
				!strconcat(instr_asm,
				"($wl, $wr ); // Connex_IMM_SHIFT generic instruction"
				));

				list<dag> Pattern = [(set ROWD:$wd, (OpNode ROWS:$wl, SplatImm:$wr))];

				InstrItinClass Itinerary = itin;
				}
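
				// Worked example (a sketch derived from the !strconcat above; the register
				// names R0 and R1 are hypothetical): with instr_asm = "ISHL", AsmString
				// evaluates to
				//   "$wd = ISHL($wl, $wr ); // Connex_IMM_SHIFT generic instruction"
				// so a left shift of R0 by the splat immediate 3 into R1 would print
				// roughly as
				//   R1 = ISHL(R0, 3 ); // Connex_IMM_SHIFT generic instruction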


				class ISHLV_H_DESC : Connex_IMM_SHIFT_DESC_BASE<"ISHL", shl, vsplati16_uimm5,
				VectorHOpnd>;
				class ISHLV_H_ENC : Connex_NI_FMT_ISHV<0b101000001>;
				def ISHLV_H : ISHLV_H_ENC, ISHLV_H_DESC;

				class ISHRAV_H_DESC : Connex_IMM_SHIFT_DESC_BASE<"ISHRA", sra, vsplati16_uimm5,
				VectorHOpnd>;
				class ISHRAV_H_ENC : Connex_NI_FMT_ISHV<0b101100001>;
				def ISHRAV_H : ISHRAV_H_ENC, ISHRAV_H_DESC;

				class ISHRV_H_DESC : Connex_IMM_SHIFT_DESC_BASE<"ISHR", srl, vsplati16_uimm5,
				VectorHOpnd>;
				class ISHRV_H_ENC : Connex_NI_FMT_ISHV<0b101010001>;
				def ISHRV_H : ISHRV_H_ENC, ISHRV_H_DESC;

				class MSA_3R_PREFIX_SPECIAL_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				RegisterOperand ROWR_TIED = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl, ROWR_TIED:$wrTied);
				string AsmString = !strconcat(
				!strconcat("$wd = ", instr_asm),
				"($wr , $wl ) ; // MSA_3R_PREFIX_SPECIAL generic instr (wrTied = $wrTied )"
				);
				list<dag> Pattern = [];

				string Constraints = "$wd = $wrTied";

				// We need to put this to prevent llc (SelectionDAG, DAGCombiner, MachineCSE)
				// from applying CSE and other optimizations to the predicated instructions
				bit hasSideEffects = 1;

				InstrItinClass Itinerary = itin;
				}

				class SHLV_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"<<", shl, VectorHOpnd>;
				class SHLV_SPECIAL_H_ENC : MSA_3R_FMT<0b101000000>;
				// Note: affects flags: like Add, Lt, Eq
				def SHLV_SPECIAL_H : SHLV_SPECIAL_H_ENC, SHLV_SPECIAL_H_DESC;

				class SHRV_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<">>", srl, VectorHOpnd>;
				class SHRV_SPECIAL_H_ENC : MSA_3R_FMT<0b101010000>;
				// Note: affects flags: like Sub, Lt, Eq
				def SHRV_SPECIAL_H : SHRV_SPECIAL_H_ENC, SHRV_SPECIAL_H_DESC;

				class SHRAV_SPECIAL_H_DESC : MSA_3R_PREFIX_SPECIAL_DESC_BASE<"SHRA", sra,
				VectorHOpnd>;
				class SHRAV_SPECIAL_H_ENC : MSA_3R_FMT<0b101100000>;
				// Note: affects flags: like Addc, Ult, Eq
				def SHRAV_SPECIAL_H : SHRAV_SPECIAL_H_ENC, SHRAV_SPECIAL_H_DESC;

				class NOT_SPECIAL_H_DESC : MSA_2R_SPECIAL_DESC_BASE<"~", not, VectorHOpnd>;
				class NOT_SPECIAL_H_ENC : MSA_2R_FMT<0b101001100>;
				// Note: affects in an undefined way the flags
				def NOT_SPECIAL_H : NOT_SPECIAL_H_ENC, NOT_SPECIAL_H_DESC;

				// We use ORV_SPECIAL_H for codegen of VSELECT
				class ORV_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"\|", or, VectorHOpnd>,
				IsCommutable;
				class ORV_SPECIAL_H_ENC : MSA_3R_FMT<0b101011100>;
				// Note: affects flags: like Addc, Ult, Eq
				def ORV_SPECIAL_H : ORV_SPECIAL_H_ENC, ORV_SPECIAL_H_DESC;

				class ANDV_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"&", and, VectorHOpnd>,
				IsCommutable;
				class ANDV_SPECIAL_H_ENC : MSA_3R_FMT<0b111111101>;
				// Note: affects flags: like Addc, Ult, Eq
				def ANDV_SPECIAL_H : ANDV_SPECIAL_H_ENC, ANDV_SPECIAL_H_DESC;

				class XORV_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"^", xor, VectorHOpnd>,
				IsCommutable;
				class XORV_SPECIAL_H_ENC : MSA_3R_FMT<0b101111100>;
				// Note: affects flags: like Subc, Ult, Eq
				def XORV_SPECIAL_H : XORV_SPECIAL_H_ENC, XORV_SPECIAL_H_DESC;



				// Using ctpop intrinsic - just like llvm/lib/Target/AArch64/AArch64InstrInfo.td
				class POPCNT_SPECIAL_H_DESC : MSA_2R_SPECIAL_DESC_BASE<"POPCNT", ctpop,
				VectorHOpnd>;
				class POPCNT_SPECIAL_H_ENC : MSA_2R_FMT<0b101110000>;
				// Note: does NOT affect any flags
				def POPCNT_SPECIAL_H : POPCNT_SPECIAL_H_ENC, POPCNT_SPECIAL_H_DESC;



				class POWER_CELL_H_DESC_BASE<string instr_asm,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs);
				dag InOperandList = (ins);
				string AsmString = !strconcat(instr_asm, "; // POWER_CELL_H_DESC ");
				list<dag> Pattern = [];

				bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}
				class DISABLE_CELL_H_DESC : POWER_CELL_H_DESC_BASE<"DISABLE_CELL">;
				class DISABLE_CELL_H_ENC : MSAInst; // small-MEGA TODO: add encoding
				def DISABLE_CELL_H: DISABLE_CELL_H_ENC, DISABLE_CELL_H_DESC;
				//
				class ENABLE_ALL_CELLS_H_DESC : POWER_CELL_H_DESC_BASE<"ENABLE_ALL_CELLS">;
				class ENABLE_ALL_CELLS_H_ENC : MSAInst; // small-MEGA TODO: add encoding
				def ENABLE_ALL_CELLS_H: ENABLE_ALL_CELLS_H_ENC, ENABLE_ALL_CELLS_H_DESC;




				///////////////////////////////////////////////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////








				//include "ConnexInstrInfoVec_vsetcc_vselect.td"
				// Inspired from [LLVM]/llvm/lib/Target/Mips/MipsMSAInstrInfo.td
				// 1 result, 3 input operands
				def SDT_VSetCC : SDTypeProfile<1, 3, [
				SDTCisVec<0>,
				SDTCisVec<1>,
				SDTCisSameAs<0, 1>,
				SDTCisVT<3, OtherVT>
				]>;
				def vsetcc : SDNode<"ISD::SETCC", SDT_VSetCC>;

				class vsetcc_type<ValueType ResTy, CondCode CC> :
				PatFrag<(ops node:$lhs, node:$rhs),
				(ResTy (vsetcc node:$lhs, node:$rhs, CC))>;

				def vseteq_v128i16 : vsetcc_type<v8i16, SETEQ>;
				def vsetle_v128i16 : vsetcc_type<v8i16, SETLE>;
				def vsetlt_v128i16 : vsetcc_type<v8i16, SETLT>;
				def vsetule_v128i16 : vsetcc_type<v8i16, SETULE>;
				def vsetult_v128i16 : vsetcc_type<v8i16, SETULT>;









				class Connex_IMM_SHIFT_SPECIAL_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				SplatComplexPattern SplatImm,
				RegisterOperand ROWD,
				RegisterOperand ROWS = ROWD,
				RegisterOperand ROWS_TIED = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWS:$wl, SplatImm.OpClass:$wr, ROWS_TIED:$wsTied);
				string AsmString = !strconcat("$wd = ",
				!strconcat(instr_asm,
				"($wl, $wr ); // Connex_IMM_SHIFT_SPECIAL generic instr (wsTied = $wsTied )"
				));

				list<dag> Pattern = [];

				string Constraints = "$wd = $wsTied";
				// Inspired from Target.td: <<OperandConstraint, e.g. $src = $dst.>>
				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern

				InstrItinClass Itinerary = itin;
				}

				class ISHLV_SPECIAL_H_DESC : Connex_IMM_SHIFT_SPECIAL_DESC_BASE<"ISHL", shl,
				vsplati16_uimm5,
				VectorHOpnd>;
				def ISHLV_SPECIAL_H : ISHLV_H_ENC, ISHLV_SPECIAL_H_DESC;

				class ISHRV_SPECIAL_H_DESC : Connex_IMM_SHIFT_SPECIAL_DESC_BASE<"ISHR", srl,
				vsplati16_uimm5,
				VectorHOpnd>;
				def ISHRV_SPECIAL_H : ISHRV_H_ENC, ISHRV_SPECIAL_H_DESC;
				///////////////////////////////////////////////////////////////////////////////
				//////////////////////////END ISHL/ISHR(A)/////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////






				//include "ConnexInstrInfoVec_vinsert.td"
				// These are for ConnexTargetLowering::LowerEXTRACT_VECTOR_ELT and for
				// BUILD_VECTOR
				// Inspired from llvm/lib/Target/Mips/MipsMSAInstrInfo.td
				def MipsVExtractSExt : SDNode<"ConnexISD::VEXTRACT_SEXT_ELT",
				SDTypeProfile<1, 3, [SDTCisPtrTy<2>]>, []>;
				def MipsVExtractZExt : SDNode<"ConnexISD::VEXTRACT_ZEXT_ELT",
				SDTypeProfile<1, 3, [SDTCisPtrTy<2>]>, []>;


				// Inspired from MipsMSAInstrInfo.td
				def immZExt1Ptr : ImmLeaf<iPTR, [{return isUInt<1>(Imm);}]>;
				def immZExt2Ptr : ImmLeaf<iPTR, [{return isUInt<2>(Imm);}]>;
				def immZExt3Ptr : ImmLeaf<iPTR, [{return isUInt<3>(Imm);}]>;
				def immZExt4Ptr : ImmLeaf<iPTR, [{return isUInt<4>(Imm);}]>;


				// Pattern fragments
				def vextract_sext_i8 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractSExt node:$vec, node:$idx, i8)>;
				def vextract_sext_i16 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractSExt node:$vec, node:$idx, i16)>;
				def vextract_sext_i32 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractSExt node:$vec, node:$idx, i32)>;
				def vextract_sext_i64 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractSExt node:$vec, node:$idx, i64)>;

				def vextract_zext_i8 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractZExt node:$vec, node:$idx, i8)>;
				def vextract_zext_i16 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractZExt node:$vec, node:$idx, i16)>;
				def vextract_zext_i32 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractZExt node:$vec, node:$idx, i32)>;
				def vextract_zext_i64 : PatFrag<(ops node:$vec, node:$idx),
				(MipsVExtractZExt node:$vec, node:$idx, i64)>;

				def vinsert_v8i16 : PatFrag<(ops node:$vec, node:$val, node:$idx),
				(v8i16 (vector_insert node:$vec, node:$val, node:$idx))>;












				// From MipsDSPInstrInfo.td
				def immZExt1 : ImmLeaf<i32, [{return isUInt<1>(Imm);}]>;




				class EQ_H_DESC : MSA_3R_DESC_BASE<"==", vseteq_v128i16, VectorHOpnd>,
				IsCommutable;
				class EQ_H_ENC : MSA_3R_FMT<0b101001000>;
				let isCompare = 1 in // (Is this instruction a comparison instruction?)
				// We model the fact it changes the flags of Connex
				// (hasSideEffects - The instruction has side effects that are not captured
				// by any operands of the instruction or other flags.)
				let hasSideEffects = 1 in
				def EQ_H : EQ_H_ENC, EQ_H_DESC;

				class LT_H_DESC : MSA_3R_DESC_BASE<"<", setlt, VectorHOpnd>;
				class LT_H_ENC : MSA_3R_FMT<0b101011000>;
				let isCompare = 1 in // (Is this instruction a comparison instruction?)
				let hasSideEffects = 1 in
				def LT_H : LT_H_ENC, LT_H_DESC;

				class ULT_H_DESC : MSA_3R_PREFIX_DESC_BASE<"ULT", setult, VectorHOpnd>;
				class ULT_H_ENC : MSA_3R_FMT<0b101101000>;
				let isCompare = 1 in
				let hasSideEffects = 1 in
				def ULT_H : ULT_H_ENC, ULT_H_DESC;

				class EQ_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"==", vseteq_v128i16,
				VectorHOpnd>, IsCommutable;
				class EQ_SPECIAL_H_ENC : MSA_3R_FMT<0b101001000>;
				let isCompare = 1 in
				let hasSideEffects = 1 in
				def EQ_SPECIAL_H : EQ_SPECIAL_H_ENC, EQ_SPECIAL_H_DESC;

				class LT_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"<", vsetlt_v128i16,
				VectorHOpnd>;
				class LT_SPECIAL_H_ENC : MSA_3R_FMT<0b101011000>;
				let isCompare = 1 in
				let hasSideEffects = 1 in
				def LT_SPECIAL_H : LT_SPECIAL_H_ENC, LT_SPECIAL_H_DESC;

				class ULT_SPECIAL_H_DESC : MSA_3R_SPECIAL_PREFIX_DESC_BASE<"ULT",
				vsetult_v128i16,
				VectorHOpnd>;
				class ULT_SPECIAL_H_ENC : MSA_3R_FMT<0b101101000>;
				let isCompare = 1 in
				let hasSideEffects = 1 in
				def ULT_SPECIAL_H : ULT_SPECIAL_H_ENC, ULT_SPECIAL_H_DESC;

				///////////////////////////////////////////////////////////////////////////////
				//////////BEGIN VLOAD, LDIX, LDSH (load INDEX or from SHIFT_REG)///////////////
				///////////////////////////////////////////////////////////////////////////////
				//include "ConnexInstrInfoVec_VLOAD_LDIX_LDSH.td"

				/* IMPORTANT NOTE: these instructions take an immediate value that is used
				as an operand of a vector binary operation.
				Hence, we must make this immediate operand a vector immediate operand
				(and not a scalar immediate operand such as imm/i64imm, as we might be
				tempted to do);
				otherwise TableGen will complain "Could not infer all types in pattern!"
				at the Pattern(s) below.
				*/

				class MSA_I16_LDI_DESC_BASE<string instr_asm,
				SplatComplexPattern SplatImm,
				RegisterOperand ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins SplatImm.OpClass:$imm);
				string AsmString = "$wd = $imm ; // MSA_I16";

				/* IMPORTANT: this instruction does NOT have a pattern here, but is matched
				using custom matching code in ConnexISelDAGToDAG.cpp,
				void ConnexDAGToDAGISel::Select(SDNode *Node) */
				list<dag> Pattern = [];

				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern
				// bit isPredicable = 1;
				// Is this instruction predicable?
				InstrItinClass Itinerary = itin;
				}

				class VLOAD_H_DESC: MSA_I16_LDI_DESC_BASE<"VLOAD", vsplati16_simm16,
				VectorHOpnd>;
				class VLOAD_H_ENC : Connex_IMM16_FMT3<0b110101>;
				// Note: does NOT affect any flags
				// let isPredicable = 1 in
				let hasSideEffects = 0 in
				def VLOAD_H : VLOAD_H_ENC, VLOAD_H_DESC;


				class VLOAD_W_DESC : MSA_I16_LDI_DESC_BASE<"VLOAD_W_TODO",
				vsplati16_simm16, // TODO: Should be bigger
				VectorHOpnd>;
				class VLOAD_W_ENC : Connex_IMM16_FMT3<0b110101>;
				// IMPORTANT Note: the opcode is NOT correct since this is NOT an i16 native
				// Connex-S instruction
				//class VLOAD_W_ENC : MSA_2R_FMT<0b110110001>;
				def VLOAD_W : VLOAD_W_ENC, VLOAD_W_DESC;



				/* This is a bogus VLOAD used to avoid initializing registers in
				Select*_OpincaaCodeGen.cpp.
				It lets predicated instructions tie their destination registers
				(through tied-to constraints) to the nodes using this class
				without initializing those destination registers, since initialization
				is not necessary.
				*/
				class MSA_I16_LDI_BOGUS_DESC_BASE<string instr_asm,
				SplatComplexPattern SplatImm,
				RegisterOperand ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins SplatImm.OpClass:$imm);
				string AsmString = "// BOGUS: $wd = $imm ; // MSA_I16";
				list<dag> Pattern = [];

				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern
				// bit isPredicable = 1;
				// Is this instruction predicable?
				InstrItinClass Itinerary = itin;
				}
				//
				class VLOAD_BOGUS_H_DESC : MSA_I16_LDI_BOGUS_DESC_BASE<"VLOAD_BOGUS",
				vsplati16_simm16,
				VectorHOpnd>;
				// let isPredicable = 1 in
				def VLOAD_BOGUS_H : VLOAD_H_ENC, VLOAD_BOGUS_H_DESC;

				// Used for special case of REDUCE_i32, etc
				class MSA_I16_LDI_SPECIAL_DESC_BASE<string instr_asm,
				SplatComplexPattern SplatImm,
				RegisterOperand ROWD,
				RegisterOperand ROWR=ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins SplatImm.OpClass:$imm, ROWR:$wrTied);
				string AsmString = "$wd = $imm ; // MSA_I16 (special input wd = $wrTied )";

				/* IMPORTANT: this instruction does NOT have a pattern here, but is matched
				using custom matching code in ConnexISelDAGToDAG.cpp,
				void ConnexDAGToDAGISel::Select(SDNode *Node) */
				list<dag> Pattern = [];

				string Constraints = "$wrTied = $wd";
				// Inspired from Target.td: <<OperandConstraint, e.g. $src = $dst.>>

				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern
				//bit isPredicable = 1; // Is this instruction predicable?
				InstrItinClass Itinerary = itin;
				}
				class VLOAD_SPECIAL_H_DESC : MSA_I16_LDI_SPECIAL_DESC_BASE<"VLOAD_SPECIAL",
				vsplati16_simm16,
				VectorHOpnd>;
				def VLOAD_SPECIAL_H : VLOAD_H_ENC, VLOAD_SPECIAL_H_DESC;

				class MSA_I16_LDI_SYM_IMM_DESC_BASE<string instr_asm,
				RegisterOperand ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins);
				string AsmString = "$wd = // (fake but necessary ; ) VLOAD_H_SYM_IMM MSA_I16";

				/* IMPORTANT: this instruction does NOT have a pattern here, but is matched
				using custom matching code in ConnexISelDAGToDAG.cpp,
				void ConnexDAGToDAGISel::Select(SDNode *Node) */
				list<dag> Pattern = [];

				/* This prevents, in principle, the pre-RA LICM pass from moving the
				instructions. We need to put this since we don't specify a
				DAG pattern in Pattern. */
				bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}
				class VLOAD_H_DESC_SYM_IMM : MSA_I16_LDI_SYM_IMM_DESC_BASE<"VLOAD",
				VectorHOpnd>;
				class VLOAD_H_ENC_SYM_IMM : Connex_IMM16_SYM_FMT<0b110101>;
				def VLOAD_H_SYM_IMM : VLOAD_H_ENC_SYM_IMM, VLOAD_H_DESC_SYM_IMM;

				class LDIX_H_DESC : MSA_LDIX_LDSH_MULT_H_DESC_BASE<"INDEX", VectorHOpnd>;
				class LDIX_H_ENC : MSA_1R_FMT_dest<0b100100000>;
				// Note: does NOT affect any flags
				def LDIX_H : LDIX_H_ENC, LDIX_H_DESC;

				class MSA_LDIX_LDSH_MULT_SPECIAL_H_DESC_BASE<string instr_asm,
				RegisterOperand ROWD,
				RegisterOperand ROWS = ROWD,
				InstrItinClass itin = NoItinerary>
				{
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWS:$ws);
				string AsmString = !strconcat(!strconcat("$wd = ", instr_asm),
				" ; // MSA_LDIX_LDSH_MULT (ws = $ws )"
				);

				// No pattern here: matched using custom matching code in
				// ConnexISelDAGToDAG.cpp
				list<dag> Pattern = [];

				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern

				string Constraints = "$ws = $wd";
				// Inspired from Target.td: <<OperandConstraint, e.g. $src = $dst.>>

				InstrItinClass Itinerary = itin;
				}
				class LDIX_SPECIAL_H_DESC : MSA_LDIX_LDSH_MULT_SPECIAL_H_DESC_BASE<"INDEX",
				VectorHOpnd>;
				class LDIX_SPECIAL_H_ENC : MSA_1R_FMT_dest<0b100100000>;
				// Note: does NOT affect any flags
				def LDIX_SPECIAL_H : LDIX_SPECIAL_H_ENC, LDIX_SPECIAL_H_DESC;

				class LDSH_H_DESC : MSA_LDIX_LDSH_MULT_H_DESC_BASE<"SHIFT_REG", VectorHOpnd>;
				class LDSH_H_ENC : MSA_1R_FMT_dest<0b100110000>;
				// Note: does NOT affect any flags
				def LDSH_H : LDSH_H_ENC, LDSH_H_DESC;

				///////////////////////////////////////////////////////////////////////////////
				/////////////////END VLOAD, LDIX and LDSH//////////////////////////////////////
				///////////////////////////////////////////////////////////////////////////////








				// Note: add is defined in [LLVM]/llvm/include/llvm/Target/TargetSelectionDAG.td
				class ADDV_H_DESC : MSA_3R_DESC_BASE<"+", add, VectorHOpnd>, IsCommutable;
				class ADDV_H_ENC : MSA_3R_FMT<0b101000100>;
				// Note: affects flags: like Add, Lt, Eq
				def ADDV_H : ADDV_H_ENC, ADDV_H_DESC;

				class ADDV_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"+", add, VectorHOpnd>,
				IsCommutable;
				class ADDV_SPECIAL_H_ENC : MSA_3R_FMT<0b101000101>;
				// Note: affects flags: like Add, Lt, Eq
				def ADDV_SPECIAL_H : ADDV_SPECIAL_H_ENC, ADDV_SPECIAL_H_DESC;

				class SUBV_H_DESC : MSA_3R_DESC_BASE<"-", sub, VectorHOpnd>;
				class SUBV_H_ENC : MSA_3R_FMT<0b101010100>;
				// Note: affects flags: like Sub, Lt, Eq
				def SUBV_H : SUBV_H_ENC, SUBV_H_DESC;

				class SUBV_SPECIAL_H_DESC : MSA_3R_SPECIAL_DESC_BASE<"-", sub, VectorHOpnd>;
				// Note: affects flags: like Sub, Lt, Eq
				def SUBV_SPECIAL_H : SUBV_H_ENC, SUBV_SPECIAL_H_DESC;




				/*
				ADDC and SUBC may end up NOT coming immediately after ADD and SUB,
				respectively, even if we make ADDC take a MVT::Glue result from ADD
				(glue does not guarantee that ADDC comes immediately after ADD).
				In fact, ADD can be scheduled after ADDC by the post-RA scheduler
				("******** List Scheduling ********") - see
				DawnCC/35_MatMul_i32/2/STDerr_llc_01
				(a solution would be to disable the post-RA scheduler pass,
				but this is not desirable).
				Therefore, we add to ADDC and SUBC one more input, coming from ADD and SUB,
				respectively, in order to make (at least) ADDC come after ADD. However, not
				even this guarantees that ADDC comes immediately after ADD, as it
				should in order to preserve the Carry flags.
				*/
				class MSA_3R_PREFIX_CARRY_DESC_BASE<string instr_asm,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				RegisterOperand ROWR_CARRY = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl, ROWR_CARRY:$wrCarry);
				string AsmString = !strconcat(!strconcat("$wd = ", instr_asm),
				"( $wl , $wr ) ; // MSA_3R prefix Carry instruction (wrCarry = $wrCarry ) "
				);
				list<dag> Pattern = [];

				bit hasSideEffects = 1;
				// We need to put this since we don't specify a DAG pattern in Pattern
				InstrItinClass Itinerary = itin;
				}
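
				// Worked example (a sketch derived from the AsmString above; the register
				// names R1, R3, R4 and R5 are hypothetical): with instr_asm = "ADDC",
				// AsmString evaluates to
				//   "$wd = ADDC( $wl , $wr ) ; // MSA_3R prefix Carry instruction (wrCarry = $wrCarry ) "
				// so, assuming R4 holds the result of the preceding ADD that set the Carry
				// flag, the high half of an emulated 32-bit addition would print roughly as
				//   R5 = ADDC( R1 , R3 ) ; // MSA_3R prefix Carry instruction (wrCarry = R4 )
				// The wrCarry operand appears only in the trailing comment; it exists to
				// create the data dependence on the ADD described above.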
				class ADDCV_H_DESC : MSA_3R_PREFIX_CARRY_DESC_BASE<"ADDC", VectorHOpnd>,
				IsCommutable;
				class ADDCV_H_ENC : MSA_3R_FMT<0b101100100>;
				// Note: affects flags: like Addc, Ult, Eq
				def ADDCV_H : ADDCV_H_ENC, ADDCV_H_DESC;

				class SUBCV_H_DESC : MSA_3R_PREFIX_CARRY_DESC_BASE<"SUBC", VectorHOpnd>;
				class SUBCV_H_ENC : MSA_3R_FMT<0b101110100>;
				// Note: affects flags: like Subc, Ult, Eq
				def SUBCV_H : SUBCV_H_ENC, SUBCV_H_DESC;

				// As for MSA_3R_PREFIX_CARRY_DESC_BASE, we also take as input the result
				// of the preceding instruction, which sets the Carry flags.
				class MSA_3R_PREFIX_CARRY_SPECIAL_DESC_BASE<string instr_asm,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				RegisterOperand ROWR_CARRY = ROWD,
				RegisterOperand ROWD_TIED = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl, ROWR_CARRY:$wrCarry,
				ROWD_TIED:$wdTied);
				string AsmString = !strconcat(!strconcat("$wd = ", instr_asm),
				"( $wl , $wr ) ; // MSA_3R prefix Carry special instruction (wrCarry = $wrCarry and wdTied = $wdTied ) "
				);
				list<dag> Pattern = [];

				string Constraints = "$wd = $wdTied";
				// Inspired from llvm/include/llvm/Target/Target.td:
				// <<OperandConstraint, e.g. $src = $dst.>>

				// We need to put this since we don't specify a DAG pattern in Pattern
				bit hasSideEffects = 1;
				InstrItinClass Itinerary = itin;
				}
				//
				class ADDCV_SPECIAL_H_DESC : MSA_3R_PREFIX_CARRY_SPECIAL_DESC_BASE<"ADDC",
				VectorHOpnd>,
				IsCommutable;
				class ADDCV_SPECIAL_H_ENC : MSA_3R_FMT<0b101110101>;
				// Note: affects flags: like Addc, Ult, Eq
				def ADDCV_SPECIAL_H : ADDCV_SPECIAL_H_ENC, ADDCV_SPECIAL_H_DESC;
				//
				class SUBCV_SPECIAL_H_DESC : MSA_3R_PREFIX_CARRY_SPECIAL_DESC_BASE<"SUBC",
				VectorHOpnd>;
				class SUBCV_SPECIAL_H_ENC : MSA_3R_FMT<0b101110101>;
				// Note: affects flags: like Subc, Ult, Eq
				def SUBCV_SPECIAL_H : SUBCV_SPECIAL_H_ENC, SUBCV_SPECIAL_H_DESC;

				class MUL_3R_DESC_BASE<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWR:$wr, ROWL:$wl);
				// Important Note: we write in string "( $wl )" to be parsed properly
				string AsmString = !strconcat("$wr * ( $wl ); ", "$wd = MULTLO(); // MUL_3R");
				// TODO: Maybe return also the high 32 bits
				list<dag> Pattern = [(set ROWD:$wd, (OpNode ROWR:$wr, ROWL:$wl))];
				InstrItinClass Itinerary = itin;
				}
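
				// Worked example (a sketch derived from the AsmString above; the register
				// names R0, R1 and R2 are hypothetical): with $wr = R0, $wl = R1 and
				// $wd = R2, this single LLVM instruction would print as two Connex
				// assembly statements, roughly
				//   R0 * ( R1 ); R2 = MULTLO(); // MUL_3R
				// Note that instr_asm is not used in this AsmString.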

				/* Important:
				This generic i16 multiplication with destination is actually NOT a Connex-S
				instruction, but we use it to match the ISD::MUL automatically with
				TableGen, since it is simpler like this.
				MULV is multiplication with src1, src2 and dst
				(actually it has 2 Connex instructions)

				Note that we specify below also the Connex assembly instructions for
				multiplication.
				*/
				class MUL3RV_H_DESC : MUL_3R_DESC_BASE<"*", mul, VectorHOpnd>, IsCommutable;
				class MUL3RV_H_ENC : MSA_3R_FMT<0b111111111>;

				def MUL3RV_H : MUL3RV_H_ENC, MUL3RV_H_DESC;

				// Now we specify all actual Connex-S instructions for multiplication
				class MULT_H_DESC : MSA_RR_INFIX_DESC_BASE<"*", VectorHOpnd>, IsCommutable;
				class MULT_H_ENC : MSA_2R_FMT2<0b100001000>;
				// Note: affects flags: like Add, Lt, Eq
				def MULT_H : MULT_H_ENC, MULT_H_DESC;

				// IMPORTANT NOTE: MULT_U_H, unsigned multiplication, is only used for manual
				// ISel
				// TODO: maybe do also MULT_SPECIAL_H_DESC, MULT_U_SPECIAL_H_DESC
				class MULT_U_H_DESC : MSA_RR_PREFIX_DESC_BASE<"MULT_U", VectorHOpnd>,
				IsCommutable;
				class MULT_U_H_ENC : MSA_2R_FMT2<0b010110111>;
				// Note: affects flags: like Add, Lt, Eq
				def MULT_U_H : MULT_U_H_ENC, MULT_U_H_DESC;

				class MULTLO_H_DESC : MSA_LDIX_LDSH_MULT_H_DESC_BASE<"MULTLO()", VectorHOpnd>;
				class MULTLO_H_ENC : MSA_1R_FMT_dest<0b100101000>;
				// Note: does NOT affect any flags
				def MULTLO_H : MULTLO_H_ENC, MULTLO_H_DESC;

				class MULTHI_H_DESC : MSA_LDIX_LDSH_MULT_H_DESC_BASE<"MULTHI()", VectorHOpnd>;
				class MULTHI_H_ENC : MSA_1R_FMT_dest<0b100111000>;
				// Note: does NOT affect any flags
				def MULTHI_H : MULTHI_H_ENC, MULTHI_H_DESC;

				class MULTLO_SPECIAL_H_DESC : MSA_LDIX_LDSH_MULT_SPECIAL_H_DESC_BASE<"MULTLO()",
				VectorHOpnd>;
				def MULTLO_SPECIAL_H : MULTLO_H_ENC, MULTLO_SPECIAL_H_DESC;

				class MULTHI_SPECIAL_H_DESC : MSA_LDIX_LDSH_MULT_SPECIAL_H_DESC_BASE<"MULTHI()",
				VectorHOpnd>;
				def MULTHI_SPECIAL_H : MULTHI_H_ENC, MULTHI_SPECIAL_H_DESC;


				let Constraints = "$specialOperandOut = $specialOperandIn" in
				let isCodeGenOnly = 1 in
				// let hasSideEffects = 1 in
				// let isBarrier = 1 in // Can control flow fall through this instruction?
				// bit isSelect = 1; // Is this instruction a select instruction?
				def END_WHERE_2OPNDS : NonImmediateInstruction<0b100011111,
				(outs VectorHOpnd:$specialOperandOut),
				(ins VectorHOpnd:$specialOperandIn),
				"\n); // END_WHERE (NII) \n EXECUTE_IN_ALL( // (specialOperandIn = $specialOperandIn, specialOperandOut = $specialOperandOut) ;",
				[]>;
				//
				let isCodeGenOnly = 1 in
				let hasSideEffects = 1 in
				// let isBarrier = 1 in // Can control flow fall through this instruction?
				// bit isSelect = 1; // Is this instruction a select instruction?
				def END_WHERE : NonImmediateInstruction<0b100011111,
				(outs),
				(ins),
				"\n); // END_WHERE (NII) \n EXECUTE_IN_ALL( // ;",
				[]>;
				// Note: does NOT affect any flags

				let isCodeGenOnly = 1 in
				// To avoid: <<error: multiline instruction is not valid for the asmparser,
				// mark it isCodeGenOnly>>
				let hasSideEffects = 1 in
				let isBarrier = 1 in
				def WHERECRY : NonImmediateInstruction<0b100011100,
				(outs),
				(ins),
				"\n); // END_EXECUTE_IN_ALL\n EXECUTE_WHERE_CRY( // NII;",
				[]>;
				// Note: does NOT affect any flags


				//let Constraints = "$specialOperandOut = $specialOperandIn" in
				// To avoid: <<error: multiline instruction is not valid for the asmparser>>
				let isCodeGenOnly = 1 in
				let hasSideEffects = 1 in
				//let isBarrier = 1 in
				def WHEREEQ : NonImmediateInstruction<0b100011101,
				(outs),
				(ins),
				//(outs VectorHOpnd:$specialOperandOut),
				//(ins VectorHOpnd:$specialOperandIn),
				/*"); // END_EXECUTE_IN_ALL\n EXECUTE_WHERE_EQ( // NII "
				"(specialOperandIn = $specialOperandIn,
				"specialOperandOut = $specialOperandOut) ;",*/
				"); // END_EXECUTE_IN_ALL\n EXECUTE_WHERE_EQ( // NII ;",
				[]>;
				// Note: does NOT affect any flags




				/*
				let Constraints = "$specialOperandOut = $specialOperandIn" in
				// Inspired from llvm/include/llvm/Target/Target.td:
				// <<OperandConstraint, e.g. $src = $dst.>>
				*/
				let isCodeGenOnly = 1 in // To avoid: <<error: multiline instruction is not
				// valid for the asmparser, mark it isCodeGenOnly>>
				let hasSideEffects = 1 in
				//let isBarrier = 1 in
				def WHERELT : NonImmediateInstruction<0b100011110,
				(outs),
				(ins),
				//(outs VectorHOpnd:$specialOperandOut),
				//(ins VectorHOpnd:$specialOperandIn),
				/*
				"); // END_EXECUTE_IN_ALL\n EXECUTE_WHERE_LT( "
				"// NII (specialOperandIn = $specialOperandIn, "
				"specialOperandOut = $specialOperandOut) ;",
				*/
				"); // END_EXECUTE_IN_ALL\n EXECUTE_WHERE_LT( // NII ;",
				[]>;
				// Note: does NOT affect any flags


				// Inspired from lib/Target/WebAssembly/WebAssemblyInstrInfo.td
				def bb_op : Operand<OtherVT>;









				// Inspired from [LLVM]/llvm/lib/Target/Mips/MipsMSAInstrInfo.td

				class MSA_3R_DESC_BASE_2STR<string instr_asm,
				SDPatternOperator OpNode,
				RegisterOperand ROWD,
				RegisterOperand ROWR = ROWD,
				RegisterOperand ROWL = ROWD,
				RegisterOperand ROWP = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWL:$wl, ROWR:$wr, ROWP:$pred);
				// INFIX_NOTATION:
				string AsmString = instr_asm;
				list<dag> Pattern = [];
				InstrItinClass Itinerary = itin;
				}

				/* OLD: Important: WHERE_EQ/LT/CRY has 2 inputs and 1 output because I
				lower VSELECT to WHEREEQ_H, which I later use to create a bundle with real
				machine instructions to "resist" the post-RA scheduler without changing the
				order of the nodes (in ConnexTargetMachine.cpp, passes PassCreateWhereBlocks
				and PassFinalizeBundles). */
				class WHEREEQ_BUNDLE_H_ENC : MSA_3R_FMT<0b000110010>;
				class WHEREEQ_BUNDLE_H_DESC : MSA_3R_DESC_BASE_2STR<
				");\n EXECUTE_WHERE_EQ( //;",
				vseteq_v128i16, VectorHOpnd>;
				//, IsCommutable;
				//
				let isCodeGenOnly = 1 in
				let hasSideEffects = 1 in
				// let isBarrier = 1 in
				// let isPseudo = 1 in
				def WHEREEQ_BUNDLE_H : WHEREEQ_BUNDLE_H_ENC, WHEREEQ_BUNDLE_H_DESC;

				class WHERELT_BUNDLE_H_ENC : MSA_3R_FMT<0b000110010>;
				class WHERELT_BUNDLE_H_DESC : MSA_3R_DESC_BASE_2STR<
				");\n EXECUTE_WHERE_LT( //;",
				vsetlt_v128i16, VectorHOpnd>;
				//, IsCommutable;
				//
				let isCodeGenOnly = 1 in
				let hasSideEffects = 1 in
				// let isBarrier = 1 in
				// let isPseudo = 1 in
				def WHERELT_BUNDLE_H : WHERELT_BUNDLE_H_ENC, WHERELT_BUNDLE_H_DESC;

				class WHEREULT_BUNDLE_H_ENC : MSA_3R_FMT<0b000110010>;
				class WHEREULT_BUNDLE_H_DESC : MSA_3R_DESC_BASE_2STR<
				");\n EXECUTE_WHERE_ULT( //;",
				vsetult_v128i16, VectorHOpnd>;
				//, IsCommutable;
				//
				let isCodeGenOnly = 1 in
				let hasSideEffects = 1 in
				// let isBarrier = 1 in
				// let isPseudo = 1 in
				def WHEREULT_BUNDLE_H : WHEREULT_BUNDLE_H_ENC, WHEREULT_BUNDLE_H_DESC;

				// NOTE: It modifies Carry (Sub), Lt, Eq

				class CELLSHR_H_DESC : MSA_RR_PREFIX_DESC_BASE<"CELLSHR", VectorHOpnd>;
				class CELLSHR_H_ENC : MSA_RR_FMT<0b100010010>;
				// Note: affects flags: like Sub, Lt, Eq
				def CELLSHR_H : CELLSHR_H_ENC, CELLSHR_H_DESC;

				class CELLSHL_H_DESC : MSA_RR_PREFIX_DESC_BASE<"CELLSHL", VectorHOpnd>;
				class CELLSHL_H_ENC : MSA_RR_FMT<0b100010001>;
				// Note: affects flags: like Sub, Lt, Eq
				def CELLSHL_H : CELLSHL_H_ENC, CELLSHL_H_DESC;

llvm/lib/Target/Connex/ConnexInstrInfoVecVsplat.td

This file was added.

				// Inspired from llvm/lib/Target/Mips/MipsInstrInfo.td
				// - I copied only the records I needed (not all)

				//===----------------------------------------------------------------------===//
				// Mips Operand, Complex Patterns and Transformations Definitions.
				//===----------------------------------------------------------------------===//

				class ConstantSImmAsmOperandClass<int Bits, list<AsmOperandClass> Supers = [],
				int Offset = 0> : AsmOperandClass {
				let Name = "ConstantSImm" # Bits # "_" # Offset;
				let RenderMethod = "addConstantSImmOperands<" # Bits # ", " # Offset # ">";
				let PredicateMethod = "isConstantSImm<" # Bits # ", " # Offset # ">";
				let SuperClasses = Supers;
				let DiagnosticType = "SImm" # Bits # "_" # Offset;
				}

				class ConstantUImmAsmOperandClass<int Bits, list<AsmOperandClass> Supers = [],
				int Offset = 0> : AsmOperandClass {
				let Name = "ConstantUImm" # Bits # "_" # Offset;
				let RenderMethod = "addConstantUImmOperands<" # Bits # ", " # Offset # ">";
				let PredicateMethod = "isConstantUImm<" # Bits # ", " # Offset # ">";
				let SuperClasses = Supers;
				let DiagnosticType = "UImm" # Bits # "_" # Offset;
				}

				class ConstantUImmRangeAsmOperandClass<int Bottom, int Top,
				list<AsmOperandClass> Supers = []>
				: AsmOperandClass {
				let Name = "ConstantUImmRange" # Bottom # "_" # Top;
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isConstantUImmRange<" # Bottom # ", " # Top # ">";
				let SuperClasses = Supers;
				let DiagnosticType = "UImmRange" # Bottom # "_" # Top;
				}

				class SImmAsmOperandClass<int Bits, list<AsmOperandClass> Supers = []>
				: AsmOperandClass {
				let Name = "SImm" # Bits;
				let RenderMethod = "addSImmOperands<" # Bits # ">";
				let PredicateMethod = "isSImm<" # Bits # ">";
				let SuperClasses = Supers;
				let DiagnosticType = "SImm" # Bits;
				}

				class UImmAsmOperandClass<int Bits, list<AsmOperandClass> Supers = []>
				: AsmOperandClass {
				let Name = "UImm" # Bits;
				let RenderMethod = "addUImmOperands<" # Bits # ">";
				let PredicateMethod = "isUImm<" # Bits # ">";
				let SuperClasses = Supers;
				let DiagnosticType = "UImm" # Bits;
				}

				// AsmOperandClasses require a strict ordering which is difficult to manage
				// as a hierarchy. Instead, we use a linear ordering and impose an order that
				// is in some places arbitrary.
				//
				// Here are the rules that are in use:
				// * Wider immediates are a superset of narrower immediates:
				// uimm4 < uimm5 < uimm6
				// * For the same bit-width, unsigned immediates are a superset of signed
				// immediates:
				// simm4 < uimm4 < simm5 < uimm5
				// * For the same upper-bound, signed immediates are a superset of unsigned
				// immediates:
				// uimm3 < simm4 < uimm4 < simm5
				// * Modified immediates are a superset of ordinary immediates:
				// uimm5 < uimm5_plus1 (1..32) < uimm5_plus32 (32..63) < uimm6
				// The term 'superset' starts to break down here since the uimm5_plus* classes
				// are not true supersets of uimm5 (but they are still subsets of uimm6).
				// * 'Relaxed' immediates are supersets of the corresponding unsigned immediate.
				// uimm16 < uimm16_relaxed
				// * The codeGen pattern type is arbitrarily ordered.
				// uimm5 < uimm5_64, and uimm5 < vsplat_uimm5
				// This is entirely arbitrary. We need an ordering and what we pick is
				// unimportant since only one is possible for a given mnemonic.
				def SImm32RelaxedAsmOperandClass
				: SImmAsmOperandClass<32, []> {
				let Name = "SImm32_Relaxed";
				let PredicateMethod = "isAnyImm<32>";
				let DiagnosticType = "SImm32_Relaxed";
				}
				def SImm32AsmOperandClass
				: SImmAsmOperandClass<32, [SImm32RelaxedAsmOperandClass]>;
				def ConstantUImm26AsmOperandClass
				: ConstantUImmAsmOperandClass<26, [SImm32AsmOperandClass]>;
				def ConstantUImm20AsmOperandClass
				: ConstantUImmAsmOperandClass<20, [ConstantUImm26AsmOperandClass]>;
				def UImm16RelaxedAsmOperandClass
				: UImmAsmOperandClass<16, [ConstantUImm20AsmOperandClass]> {
				let Name = "UImm16_Relaxed";
				let PredicateMethod = "isAnyImm<16>";
				let DiagnosticType = "UImm16_Relaxed";
				}
				def UImm16AsmOperandClass
				: UImmAsmOperandClass<16, [UImm16RelaxedAsmOperandClass]>;
				def SImm16RelaxedAsmOperandClass
				: SImmAsmOperandClass<16, [UImm16RelaxedAsmOperandClass]> {
				let Name = "SImm16_Relaxed";
				let PredicateMethod = "isAnyImm<16>";
				let DiagnosticType = "SImm16_Relaxed";
				}
				def SImm16AsmOperandClass
				: SImmAsmOperandClass<16, [SImm16RelaxedAsmOperandClass]>;


				def ConstantSImm10Lsl3AsmOperandClass : AsmOperandClass {
				let Name = "SImm10Lsl3";
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isScaledSImm<10, 3>";
				let SuperClasses = [SImm16AsmOperandClass];
				let DiagnosticType = "SImm10_Lsl3";
				}
				def ConstantSImm10Lsl2AsmOperandClass : AsmOperandClass {
				let Name = "SImm10Lsl2";
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isScaledSImm<10, 2>";
				let SuperClasses = [ConstantSImm10Lsl3AsmOperandClass];
				let DiagnosticType = "SImm10_Lsl2";
				}
				def ConstantSImm11AsmOperandClass
				: ConstantSImmAsmOperandClass<11, [ConstantSImm10Lsl2AsmOperandClass]>;
				def ConstantSImm10Lsl1AsmOperandClass : AsmOperandClass {
				let Name = "SImm10Lsl1";
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isScaledSImm<10, 1>";
				let SuperClasses = [ConstantSImm11AsmOperandClass];
				let DiagnosticType = "SImm10_Lsl1";
				}
				def ConstantUImm10AsmOperandClass
				: ConstantUImmAsmOperandClass<10, [ConstantSImm10Lsl1AsmOperandClass]>;

				def ConstantSImm10AsmOperandClass
				: ConstantSImmAsmOperandClass<10, [ConstantUImm10AsmOperandClass]>;
				def ConstantSImm9AsmOperandClass
				: ConstantSImmAsmOperandClass<9, [ConstantSImm10AsmOperandClass]>;
				def ConstantSImm7Lsl2AsmOperandClass : AsmOperandClass {
				let Name = "SImm7Lsl2";
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isScaledSImm<7, 2>";
				let SuperClasses = [ConstantSImm9AsmOperandClass];
				let DiagnosticType = "SImm7_Lsl2";
				}
				def ConstantUImm8AsmOperandClass
				: ConstantUImmAsmOperandClass<8, [ConstantSImm7Lsl2AsmOperandClass]>;
				def ConstantUImm7Sub1AsmOperandClass
				: ConstantUImmAsmOperandClass<7, [ConstantUImm8AsmOperandClass], -1> {
				// Specify the names since the -1 offset causes invalid identifiers otherwise.
				let Name = "UImm7_N1";
				let DiagnosticType = "UImm7_N1";
				}
				def ConstantUImm7AsmOperandClass
				: ConstantUImmAsmOperandClass<7, [ConstantUImm7Sub1AsmOperandClass]>;
				def ConstantUImm6Lsl2AsmOperandClass : AsmOperandClass {
				let Name = "UImm6Lsl2";
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isScaledUImm<6, 2>";
				let SuperClasses = [ConstantUImm7AsmOperandClass];
				let DiagnosticType = "UImm6_Lsl2";
				}
				def ConstantUImm6AsmOperandClass
				: ConstantUImmAsmOperandClass<6, [ConstantUImm6Lsl2AsmOperandClass]>;
				def ConstantSImm6AsmOperandClass
				: ConstantSImmAsmOperandClass<6, [ConstantUImm6AsmOperandClass]>;
				def ConstantUImm5Lsl2AsmOperandClass : AsmOperandClass {
				let Name = "UImm5Lsl2";
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isScaledUImm<5, 2>";
				let SuperClasses = [ConstantSImm6AsmOperandClass];
				let DiagnosticType = "UImm5_Lsl2";
				}

				def ConstantUImm5_Range2_64AsmOperandClass
				: ConstantUImmRangeAsmOperandClass<2, 64,
				[ConstantUImm5Lsl2AsmOperandClass]>;

				def ConstantUImm5Plus33AsmOperandClass
				: ConstantUImmAsmOperandClass<5, [ConstantUImm5_Range2_64AsmOperandClass],
				33>;
				def ConstantUImm5ReportUImm6AsmOperandClass
				: ConstantUImmAsmOperandClass<5, [ConstantUImm5Plus33AsmOperandClass]> {
				let Name = "ConstantUImm5_0_Report_UImm6";
				let DiagnosticType = "UImm5_0_Report_UImm6";
				}
				def ConstantUImm5Plus32AsmOperandClass
				: ConstantUImmAsmOperandClass<
				5, [ConstantUImm5ReportUImm6AsmOperandClass], 32>;
				def ConstantUImm5Plus32NormalizeAsmOperandClass
				: ConstantUImmAsmOperandClass<5, [ConstantUImm5Plus32AsmOperandClass], 32> {
				let Name = "ConstantUImm5_32_Norm";
				// We must also subtract 32 when we render the operand.
				let RenderMethod = "addConstantUImmOperands<5, 32, -32>";
				}
				def ConstantUImm5Plus1AsmOperandClass
				: ConstantUImmAsmOperandClass<
				5, [ConstantUImm5Plus32NormalizeAsmOperandClass], 1>;
				def ConstantUImm5AsmOperandClass
				: ConstantUImmAsmOperandClass<5, [ConstantUImm5Plus1AsmOperandClass]>;
				def ConstantSImm5AsmOperandClass
				: ConstantSImmAsmOperandClass<5, [ConstantUImm5AsmOperandClass]>;
				def ConstantUImm4AsmOperandClass
				: ConstantUImmAsmOperandClass<4, [ConstantSImm5AsmOperandClass]>;
				def ConstantSImm4AsmOperandClass
				: ConstantSImmAsmOperandClass<4, [ConstantUImm4AsmOperandClass]>;
				def ConstantUImm3AsmOperandClass
				: ConstantUImmAsmOperandClass<3, [ConstantSImm4AsmOperandClass]>;
				def ConstantUImm2Plus1AsmOperandClass
				: ConstantUImmAsmOperandClass<2, [ConstantUImm3AsmOperandClass], 1>;
				def ConstantUImm2AsmOperandClass
				: ConstantUImmAsmOperandClass<2, [ConstantUImm3AsmOperandClass]>;
				def ConstantUImm1AsmOperandClass
				: ConstantUImmAsmOperandClass<1, [ConstantUImm2AsmOperandClass]>;
				def ConstantImmzAsmOperandClass : AsmOperandClass {
				let Name = "ConstantImmz";
				let RenderMethod = "addConstantUImmOperands<1>";
				let PredicateMethod = "isConstantImmz";
				let SuperClasses = [ConstantUImm1AsmOperandClass];
				let DiagnosticType = "Immz";
				}

				def ConstantSImm16Lsl2AsmOperandClass : AsmOperandClass {
				let Name = "SImm16Lsl2";
				let RenderMethod = "addImmOperands";
				let PredicateMethod = "isScaledSImm<16, 2>";
				let SuperClasses = [ConstantSImm9AsmOperandClass];
				let DiagnosticType = "SImm16_Lsl2";
				}
				def ConstantUImm16AsmOperandClass
				: ConstantUImmAsmOperandClass<16, [ConstantSImm16Lsl2AsmOperandClass]>;
				def ConstantSImm16AsmOperandClass
				: ConstantSImmAsmOperandClass<16, [ConstantUImm16AsmOperandClass]>;


				foreach I = {1, 2, 3, 4} in
				def uimm # I : Operand<i32> {
				let PrintMethod = "printUnsignedImm";
				let ParserMatchClass =
				!cast<AsmOperandClass>("ConstantUImm" # I # "AsmOperandClass");
				}

				foreach I = {1, 2, 3, 4, 5, 6, 8, 16} in
				def vsplat_uimm # I : Operand<vAny> {
				let PrintMethod = "printUImm<" # I # ">";
				let ParserMatchClass =
				!cast<AsmOperandClass>("ConstantUImm" # I # "AsmOperandClass");
				}






				foreach I = {5, 10, 16} in
				def vsplat_simm # I : Operand<vAny> {
				let ParserMatchClass =
				!cast<AsmOperandClass>("ConstantSImm" # I # "AsmOperandClass");
				}













				// ...
				def SDT_VSHF : SDTypeProfile<1, 3, [SDTCisInt<0>, SDTCisVec<0>,
				SDTCisInt<1>, SDTCisVec<1>,
				SDTCisSameAs<0, 2>, SDTCisSameAs<2, 3>]>;
				// ...
				//def MipsVSHF : SDNode<"MipsISD::VSHF", SDT_VSHF>;
				def MipsVSHF : SDNode<"ConnexISD::VSHF", SDT_VSHF>;
				// ...




				class SplatPatLeaf<Operand opclass, dag frag, code pred = [{}],
				SDNodeXForm xform = NOOP_SDNodeXForm>
				: PatLeaf<frag, pred, xform> {
				Operand OpClass = opclass;
				}

				class SplatComplexPattern<Operand opclass, ValueType ty, int numops, string fn,
				list<SDNode> roots = [],
				list<SDNodeProperty> props = []> :
				ComplexPattern<ty, numops, fn, roots, props> {
				Operand OpClass = opclass;
				}


				class MSA_ELM_SPLAT_DESC_BASE<string instr_asm, SplatComplexPattern SplatImm,
				RegisterOperand ROWD,
				RegisterOperand ROWS = ROWD,
				InstrItinClass itin = NoItinerary> {
				dag OutOperandList = (outs ROWD:$wd);
				dag InOperandList = (ins ROWS:$ws, SplatImm.OpClass:$n);
				string AsmString = !strconcat(instr_asm, "\t$wd, $ws[$n]");
				list<dag> Pattern = [(set ROWD:$wd, (MipsVSHF SplatImm:$n, ROWS:$ws,
				ROWS:$ws))];
				InstrItinClass Itinerary = itin;
				}


				// TODO_CHANGE_BACKEND:
				// TODO!!!! Alex: add ~original def vsplati16
				//def vsplati16 : PatFrag<(ops node:$e0),
				def vsplati64 : PatFrag<(ops node:$e0),
				(v8i64 (build_vector node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0))>;

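				// NEW32
				// FIXME: the result type below is v4i32, but the build_vector lists 64
				// operands; the commented-out v64i32 is the type that matches that count.
				def vsplati32 : PatFrag<(ops node:$e0),
				//(v64i32 (build_vector node:$e0, node:$e0,
				(v4i32 (build_vector node:$e0, node:$e0,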
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0,
				node:$e0, node:$e0))>;



				def vsplati64_elt : PatFrag<(ops node:$v, node:$i),
				(MipsVSHF (vsplati64 node:$i), node:$v, node:$v)>;
				def vsplati32_elt : PatFrag<(ops node:$v, node:$i),
				(MipsVSHF (vsplati32 node:$i), node:$v, node:$v)>;













				def vsplati8_uimm3 : SplatComplexPattern<vsplat_uimm3, v16i8, 1,
				"selectVSplatUimm3",
				[build_vector, bitconvert]>;

				def vsplati8_uimm4 : SplatComplexPattern<vsplat_uimm4, v16i8, 1,
				"selectVSplatUimm4",
				[build_vector, bitconvert]>;
				def vsplati8_uimm5 : SplatComplexPattern<vsplat_uimm5, v16i8, 1,
				"selectVSplatUimm5",
				[build_vector, bitconvert]>;

				def vsplati8_uimm8 : SplatComplexPattern<vsplat_uimm8, v16i8, 1,
				"selectVSplatUimm8",
				[build_vector, bitconvert]>;

				def vsplati8_simm5 : SplatComplexPattern<vsplat_simm5, v16i8, 1,
				"selectVSplatSimm5",
				[build_vector, bitconvert]>;

				// Alex: changed v8i16 into v128i16
				// TODO_CHANGE_BACKEND:
				def vsplati16_uimm3 : SplatComplexPattern<vsplat_uimm3, v8i16, 1,
				"selectVSplatUimm3",
				[build_vector, bitconvert]>;

				def vsplati16_uimm4 : SplatComplexPattern<vsplat_uimm4, v8i16, 1,
				"selectVSplatUimm4",
				[build_vector, bitconvert]>;

				def vsplati16_uimm5 : SplatComplexPattern<vsplat_uimm5, v8i16, 1,
				"selectVSplatUimm5",
				[build_vector, bitconvert]>;

				def vsplati16_simm5 : SplatComplexPattern<vsplat_simm5, v8i16, 1,
				"selectVSplatSimm5",
				[build_vector, bitconvert]>;

				def vsplati16_uimm16 : SplatComplexPattern<vsplat_uimm16, v8i16, 1,
				"selectVSplatUimm16",
				[build_vector, bitconvert]>;

				def vsplati16_simm16 : SplatComplexPattern<vsplat_simm16, v8i16, 1,
				"selectVSplatSimm16",
				[build_vector, bitconvert]>;

				// Alex: changed v4i32 into v16i32
				// TODO_CHANGE_BACKEND:
				// NEW32
				def vsplati32_uimm2 : SplatComplexPattern<vsplat_uimm2, v4i32, 1,
				"selectVSplatUimm2",
				[build_vector, bitconvert]>;

				def vsplati32_uimm5 : SplatComplexPattern<vsplat_uimm5, v4i32, 1,
				"selectVSplatUimm5",
				[build_vector, bitconvert]>;

				def vsplati32_simm5 : SplatComplexPattern<vsplat_simm5, v4i32, 1,
				"selectVSplatSimm5",
				[build_vector, bitconvert]>;

				// Alex: changed v2i64 into v8i64
				// TODO_CHANGE_BACKEND:
				def vsplati64_uimm1 : SplatComplexPattern<vsplat_uimm1, v2i64, 1,
				"selectVSplatUimm1",
				[build_vector, bitconvert]>;

				def vsplati64_uimm5 : SplatComplexPattern<vsplat_uimm5, v2i64, 1,
				"selectVSplatUimm5",
				[build_vector, bitconvert]>;

				def vsplati64_uimm6 : SplatComplexPattern<vsplat_uimm6, v2i64, 1,
				"selectVSplatUimm6",
				[build_vector, bitconvert]>;

				def vsplati64_simm5 : SplatComplexPattern<vsplat_simm5, v2i64, 1,
				"selectVSplatSimm5",
				[build_vector, bitconvert]>;



				// Any build_vector that is a constant splat with a value that is an exact
				// power of 2
				def vsplat_uimm_pow2 : ComplexPattern<vAny, 1, "selectVSplatUimmPow2",
				[build_vector, bitconvert]>;

				// Any build_vector that is a constant splat with a value that is the bitwise
				// inverse of an exact power of 2
				def vsplat_uimm_inv_pow2 : ComplexPattern<vAny, 1, "selectVSplatUimmInvPow2",
				[build_vector, bitconvert]>;







				// Any build_vector that is a constant splat with only a consecutive sequence
				// of left-most bits set.
				def vsplat_maskl_bits_uimm3
				: SplatComplexPattern<vsplat_uimm3, vAny, 1, "selectVSplatMaskL",
				[build_vector, bitconvert]>;
				def vsplat_maskl_bits_uimm4
				: SplatComplexPattern<vsplat_uimm4, vAny, 1, "selectVSplatMaskL",
				[build_vector, bitconvert]>;
				def vsplat_maskl_bits_uimm5
				: SplatComplexPattern<vsplat_uimm5, vAny, 1, "selectVSplatMaskL",
				[build_vector, bitconvert]>;
				def vsplat_maskl_bits_uimm6
				: SplatComplexPattern<vsplat_uimm6, vAny, 1, "selectVSplatMaskL",
				[build_vector, bitconvert]>;

				// Any build_vector that is a constant splat with only a consecutive sequence
				// of right-most bits set.
				def vsplat_maskr_bits_uimm3
				: SplatComplexPattern<vsplat_uimm3, vAny, 1, "selectVSplatMaskR",
				[build_vector, bitconvert]>;
				def vsplat_maskr_bits_uimm4
				: SplatComplexPattern<vsplat_uimm4, vAny, 1, "selectVSplatMaskR",
				[build_vector, bitconvert]>;
				def vsplat_maskr_bits_uimm5
				: SplatComplexPattern<vsplat_uimm5, vAny, 1, "selectVSplatMaskR",
				[build_vector, bitconvert]>;
				def vsplat_maskr_bits_uimm6
				: SplatComplexPattern<vsplat_uimm6, vAny, 1, "selectVSplatMaskR",
				[build_vector, bitconvert]>;





				class SPLATI_D_DESC : MSA_ELM_SPLAT_DESC_BASE<"splati.h", vsplati64_uimm1,
				VectorHOpnd>;
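
The operand classes in this file name predicate methods such as `isConstantUImm<Bits, Offset>` and `isScaledSImm<Bits, Shift>`, whose definitions live in the asm parser and are not shown here. Below is a minimal standalone sketch of the range checks such predicates usually perform, assuming the same semantics as in the Mips asm parser this file was derived from. The names `fitsUImm`, `fitsSImm` and `fitsScaledSImm` are hypothetical, and they take a plain `int64_t` instead of an operand object.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the asm-parser predicates named in the operand
// classes. Overflow of Value - Offset is ignored in this sketch.

// True if (Value - Offset) fits in an unsigned Bits-bit integer.
template <unsigned Bits, int64_t Offset = 0>
constexpr bool fitsUImm(int64_t Value) {
  static_assert(Bits >= 1 && Bits <= 62, "sketch supports 1..62 bits");
  return Value - Offset >= 0 && Value - Offset < (int64_t{1} << Bits);
}

// True if (Value - Offset) fits in a signed Bits-bit integer.
template <unsigned Bits, int64_t Offset = 0>
constexpr bool fitsSImm(int64_t Value) {
  static_assert(Bits >= 1 && Bits <= 62, "sketch supports 1..62 bits");
  return Value - Offset >= -(int64_t{1} << (Bits - 1)) &&
         Value - Offset < (int64_t{1} << (Bits - 1));
}

// True if Value is a signed Bits-bit integer shifted left by Shift bits,
// i.e. a multiple of 2^Shift whose quotient fits in Bits signed bits.
template <unsigned Bits, unsigned Shift>
constexpr bool fitsScaledSImm(int64_t Value) {
  return Value % (int64_t{1} << Shift) == 0 &&
         fitsSImm<Bits>(Value / (int64_t{1} << Shift));
}

// The "modified immediate" ranges quoted in the ordering comment.
static_assert(fitsUImm<5, 1>(32) && !fitsUImm<5, 1>(0),
              "uimm5_plus1 covers 1..32");
static_assert(fitsUImm<5, 32>(63) && !fitsUImm<5, 32>(31),
              "uimm5_plus32 covers 32..63");
```

The two `static_assert` lines show why the offset classes are not true supersets of uimm5: with an offset of 1 the value 0 is rejected, although it is a valid uimm5.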

llvm/lib/Target/Connex/ConnexMCInstLower.h

This file was added.

				//===-- ConnexMCInstLower.h - Lower MachineInstr to MCInst ----*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXMCINSTLOWER_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXMCINSTLOWER_H

				#include "llvm/Support/Compiler.h"

				namespace llvm {
				class AsmPrinter;
				class MCContext;
				class MCInst;
				class MCOperand;
				class MCSymbol;
				class MachineInstr;
				class MachineModuleInfoMachO;
				class MachineOperand;
				class Mangler;

				// ConnexMCInstLower - This class is used to lower a MachineInstr into an
				// MCInst.
				class LLVM_LIBRARY_VISIBILITY ConnexMCInstLower {
				MCContext &Ctx;

				AsmPrinter &Printer;

				public:
				ConnexMCInstLower(MCContext &ctx, AsmPrinter &printer)
				: Ctx(ctx), Printer(printer) {}
				void Lower(const MachineInstr *MI, MCInst &OutMI) const;

				MCOperand LowerSymbolOperand(const MachineOperand &MO, MCSymbol *Sym) const;

				MCSymbol *GetGlobalAddressSymbol(const MachineOperand &MO) const;
				};
				} // namespace llvm

				#endif

llvm/lib/Target/Connex/ConnexMCInstLower.cpp

This file was added.

				//=-- ConnexMCInstLower.cpp - Convert Connex MachineInstr to an MCInst ------=//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains code to lower Connex MachineInstrs to their corresponding
				// MCInst records.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexMCInstLower.h"
				#include "llvm/ADT/SmallString.h"
				#include "llvm/CodeGen/AsmPrinter.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCContext.h"
				#include "llvm/MC/MCExpr.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/raw_ostream.h"

				#include "llvm/Support/Debug.h" // for dbgs and LLVM_DEBUG() macro
				#define DEBUG_TYPE "mc-inst-lower"

				using namespace llvm;

				MCSymbol *
				ConnexMCInstLower::GetGlobalAddressSymbol(const MachineOperand &MO) const {
				return Printer.getSymbol(MO.getGlobal());
				}

				MCOperand ConnexMCInstLower::LowerSymbolOperand(const MachineOperand &MO,
				MCSymbol *Sym) const {

				const MCExpr *Expr = MCSymbolRefExpr::create(Sym, Ctx);

				if (!MO.isJTI() && MO.getOffset())
				llvm_unreachable("unknown symbol op");

				return MCOperand::createExpr(Expr);
				}

				void ConnexMCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexMCInstLower::Lower(MI = " << MI
				<< ")...\n");
				OutMI.setOpcode(MI->getOpcode());

				for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
				const MachineOperand &MO = MI->getOperand(i);
				LLVM_DEBUG(dbgs() << "ConnexMCInstLower::Lower(): MO = " << MO << "\n");
				LLVM_DEBUG(dbgs() << " ConnexMCInstLower::Lower(): MO.getType() = "
				<< MO.getType() << "\n");

				MCOperand MCOp;

				switch (MO.getType()) {

				default:
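				// MachineInstr::dump() is only defined in builds with assertions or
				// LLVM_ENABLE_DUMP, so print to errs() instead.
				MI->print(errs());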
				/*
				LLVM_DEBUG(dbgs() << "ConnexMCInstLower::Lower(): MO.getType() = "
				<< MO.getType() << "\n");
				*/

				llvm_unreachable("unknown operand type");

				case MachineOperand::MO_ExternalSymbol: {
				const MCSymbol *Symbol =
				Printer.GetExternalSymbolSymbol(MO.getSymbolName());
				MCSymbolRefExpr::VariantKind Kind = MCSymbolRefExpr::VK_None;
				const MCExpr *Expr = MCSymbolRefExpr::create(Symbol, Kind, Ctx);
				MCOp = MCOperand::createExpr(Expr);
				// Offset += MO.getOffset();
				break;
				}

				// case MachineOperand::MO_MetaData:
				case MachineOperand::MO_Metadata: {
				continue;
				// break;
				}

				case MachineOperand::MO_Register:
				// Ignore all implicit register operands.
				if (MO.isImplicit())
				continue;
				MCOp = MCOperand::createReg(MO.getReg());
				break;

				case MachineOperand::MO_Immediate:
				MCOp = MCOperand::createImm(MO.getImm());
				break;

				case MachineOperand::MO_MachineBasicBlock:
				MCOp = MCOperand::createExpr(
				MCSymbolRefExpr::create(MO.getMBB()->getSymbol(), Ctx));
				break;

				case MachineOperand::MO_RegisterMask:
				continue;
				case MachineOperand::MO_GlobalAddress:
				MCOp = LowerSymbolOperand(MO, GetGlobalAddressSymbol(MO));
				break;
				}

				OutMI.addOperand(MCOp);
				}
				}
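
The loop in `Lower()` above copies operands one by one and skips implicit registers, register masks and metadata. As a small standalone illustration of that filtering, the sketch below uses hypothetical `Kind` and `Operand` types in place of `MachineOperand` and `MCOperand`; it models only the control flow, not symbol or basic-block operands.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins for the LLVM operand classes.
enum class Kind { Register, Immediate, RegisterMask, Metadata, Other };

struct Operand {
  Kind K;
  int64_t Value; // register number or immediate value
  bool Implicit; // only meaningful for registers
};

// Mirrors the shape of the switch in Lower(): drop operands that have no
// MC-level counterpart, copy the rest in order, reject unknown kinds.
std::vector<Operand> lowerOperands(const std::vector<Operand> &In) {
  std::vector<Operand> Out;
  for (const Operand &Op : In) {
    switch (Op.K) {
    case Kind::Register:
      if (Op.Implicit)
        continue; // implicit register operands are ignored
      break;
    case Kind::Immediate:
      break;
    case Kind::RegisterMask:
    case Kind::Metadata:
      continue; // nothing to emit for these kinds
    default:
      throw std::invalid_argument("unknown operand type");
    }
    Out.push_back(Op);
  }
  return Out;
}
```

Note that `continue` inside the `switch` skips the `push_back` at the bottom of the loop body, which is the same trick the original code uses to avoid adding an empty operand.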

llvm/lib/Target/Connex/ConnexRegisterInfo.h

This file was added.

				//===-- ConnexRegisterInfo.h - Connex Register Information Impl -*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetRegisterInfo class.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXREGISTERINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXREGISTERINFO_H

				#include "llvm/CodeGen/TargetRegisterInfo.h"

				#define GET_REGINFO_HEADER
				#include "ConnexGenRegisterInfo.inc"

				namespace llvm {

				struct ConnexRegisterInfo : public ConnexGenRegisterInfo {

				ConnexRegisterInfo();

				// Inspired from lib/Target/Mips/MipsRegisterInfo.cpp
				const TargetRegisterClass *getPointerRegClass(const MachineFunction &MF,
				unsigned Kind) const;

				const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;

				/*
				From http://llvm.org/doxygen/classllvm_1_1TargetRegisterInfo.html:
				<<Returns a bitset indexed by physical register number indicating if a
				register is a special register that has particular uses and should be
				considered unavailable at all times, e.g. stack pointer, return address.
				A reserved register:
				is not allocatable
				is considered always live
				is ignored by liveness tracking It is often necessary to reserve the
				super registers of a reserved register as well, to avoid them
				getting allocated indirectly. You may use markSuperRegs() and
				checkAllSuperRegsMarked() in this case.>>
				*/
				BitVector getReservedRegs(const MachineFunction &MF) const override;

				bool eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
				unsigned FIOperandNum,
				RegScavenger *RS = nullptr) const override;

				Register getFrameRegister(const MachineFunction &MF) const override;

				/* Addressing bug
				(llc -O0, at pass: "******** FAST REGISTER ALLOCATION ********")
				<<Remaining virtual register operands
				UNREACHABLE executed at llvm/lib/CodeGen/MachineRegisterInfo.cpp:144!>>

				(Using the suggestion from
				https://groups.google.com/forum/#!topic/llvm-dev/fEyD9YREi5M).
				*/
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetRegisterInfo.html
				// Returns true if the target requires (and can make use of) the register
				// scavenger.
				bool requiresRegisterScavenging(const MachineFunction &MF) const override {
				return false;
				}

				bool requiresFrameIndexScavenging(const MachineFunction &MF) const override {
				return false;
				}
				};
				} // namespace llvm

				#endif
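
The quoted documentation for `getReservedRegs()` describes a bitset indexed by physical register number. A minimal standalone sketch of that idea follows, using `std::vector<bool>` instead of LLVM's `BitVector`; the helper names `makeReservedSet` and `allocatable` and the register numbers in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Build a bitset indexed by (hypothetical) physical register number in which
// every reserved register is marked.
std::vector<bool> makeReservedSet(std::size_t NumRegs,
                                  std::initializer_list<std::size_t> Reserved) {
  std::vector<bool> Bits(NumRegs, false);
  for (std::size_t Reg : Reserved)
    if (Reg < NumRegs) // ignore out-of-range numbers in this sketch
      Bits[Reg] = true;
  return Bits;
}

// Registers an allocator may hand out: everything that is not reserved.
std::vector<std::size_t> allocatable(const std::vector<bool> &ReservedBits) {
  std::vector<std::size_t> Regs;
  for (std::size_t Reg = 0; Reg < ReservedBits.size(); ++Reg)
    if (!ReservedBits[Reg])
      Regs.push_back(Reg);
  return Regs;
}
```

For example, `makeReservedSet(16, {10, 11})` marks two registers out of sixteen, so `allocatable` returns the remaining fourteen.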

llvm/lib/Target/Connex/ConnexRegisterInfo.cpp

This file was added.

				//===-- ConnexRegisterInfo.cpp - Connex Register Information ----- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetRegisterInfo class.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexRegisterInfo.h"
				#include "Connex.h"
				#include "ConnexSubtarget.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/RegisterScavenging.h"
				#include "llvm/CodeGen/TargetFrameLowering.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"
				#include "llvm/Support/ErrorHandling.h"

				#define GET_REGINFO_TARGET_DESC
				#include "ConnexGenRegisterInfo.inc"
				using namespace llvm;

				#include "llvm/Support/Debug.h" // for dbgs and LLVM_DEBUG() macro
				#define DEBUG_TYPE "connex-reg-info"

				ConnexRegisterInfo::ConnexRegisterInfo() : ConnexGenRegisterInfo(Connex::R0) {}

				// Inspired from lib/Target/Mips/MipsRegisterInfo.cpp
				const TargetRegisterClass *
				ConnexRegisterInfo::getPointerRegClass(const MachineFunction &MF,
				unsigned Kind) const {
				return &Connex::GPRRegClass;
				}

				const MCPhysReg *
				ConnexRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
				return CSR_SaveList;
				}

				BitVector ConnexRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
				int numRegs = getNumRegs();

				LLVM_DEBUG(dbgs() << "getReservedRegs(): numRegs = " << numRegs << "\n");

				BitVector Reserved(numRegs);
				// We reserve scalar registers
				Reserved.set(Connex::R10); // R10 is read only frame pointer
				Reserved.set(Connex::R11); // R11 is pseudo stack pointer

				/*
				We now reserve vector registers.
				Wh30, vector register R(30), is used by me to codegen:
				- LLVM's VSELECT on Connex in ConnexTargetMachine.cpp
				- PassAfterPostRAScheduler
				(NO longer: in ConnexISelLowering::Lower() for VSELECT to be
				lowered to WHERE*).
				Doing so we avoid errors like:
				<<* Bad machine code: Using an undefined physical register *
				- function: IfConversion
				- basic block: BB#6 vector.body (0x1501fd8)
				- instruction: %vreg47<def> = COPY
				- operand 1: %Wh31>>

				- in ConnexInstrInfo::copyPhysReg() .
				*/
				/*
				LLVM_DEBUG(dbgs() << "getReservedRegs(): CONNEX_RESERVED_REGISTER_01/02/03 = "
				<< CONNEX_RESERVED_REGISTER_01 << "(normally Wh30)/"
				<< CONNEX_RESERVED_REGISTER_02 << "(normally Wh31)/"
				<< CONNEX_RESERVED_REGISTER_03 << "(normally Wh31)"
				<< "\n");
				*/
				Reserved.set(CONNEX_RESERVED_REGISTER_01);
				Reserved.set(CONNEX_RESERVED_REGISTER_02);
				Reserved.set(CONNEX_RESERVED_REGISTER_03);

				return Reserved;
				}

				// From book Lopes_2014:
				// eliminateFrameIndex
				// "implements this replacement by converting each frame index to a real stack
				// offset for all machine instructions that contain stack references (usually
				// loads and stores). Extra instructions are also generated whenever
				// additional stack offset arithmetic is necessary".
				bool ConnexRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
				int SPAdj, unsigned FIOperandNum,
				RegScavenger *RS) const {
				assert(SPAdj == 0 && "Unexpected");

				unsigned i = 0;
				MachineInstr &MI = *II;
				MachineFunction &MF = *MI.getParent()->getParent();
				DebugLoc DL = MI.getDebugLoc();

				while (!MI.getOperand(i).isFI()) {
				++i;
				assert(i < MI.getNumOperands() && "Instr doesn't have FrameIndex operand!");
				}

				unsigned FrameReg = getFrameRegister(MF);
				int FrameIndex = MI.getOperand(i).getIndex();
				const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
				MachineBasicBlock &MBB = *MI.getParent();

				if (MI.getOpcode() == Connex::MOV_rr) {
				MI.getOperand(i).ChangeToRegister(FrameReg, false);

				// TODO MAYBE: we took out the scalar ADD and therefore we have to comment
				// this
				int Offset = MF.getFrameInfo().getObjectOffset(FrameIndex);
				unsigned reg = MI.getOperand(i - 1).getReg();

				BuildMI(MBB, ++II, DL, TII.get(Connex::ADD_ri), reg)
				.addReg(reg)
				.addImm(Offset);

				return false;
				}

				int Offset = MF.getFrameInfo().getObjectOffset(FrameIndex) +
				MI.getOperand(i + 1).getImm();

				if (!isInt<32>(Offset))
				llvm_unreachable("bug in frame offset");

				if (MI.getOpcode() == Connex::FI_ri) {
				// architecture does not really support FI_ri, replace it with
				// MOV_rr <target_reg>, frame_reg
				// ADD_ri <target_reg>, imm
				unsigned reg = MI.getOperand(i - 1).getReg();

				BuildMI(MBB, ++II, DL, TII.get(Connex::MOV_rr), reg).addReg(FrameReg);

				// TODO MAYBE: we took out the scalar ADD and therefore we have to comment
				// this
				BuildMI(MBB, II, DL, TII.get(Connex::ADD_ri), reg)
				.addReg(reg)
				.addImm(Offset);

				// Remove FI_ri instruction
				MI.eraseFromParent();
				} else {
				MI.getOperand(i).ChangeToRegister(FrameReg, false);
				MI.getOperand(i + 1).ChangeToImmediate(Offset);
				}

				return false;
				}

				Register ConnexRegisterInfo::getFrameRegister(const MachineFunction &MF) const {
				// MEGA-TODO: in principle we should return also for the Connex vector
				// processor a vector register like: Connex::Wh28
				return Connex::R10;
				}
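
`eliminateFrameIndex()` above does two things: it folds the frame object's offset into the instruction's immediate, and it expands the `FI_ri` pseudo into a `MOV_rr` followed by an `ADD_ri`. The standalone sketch below models both steps on plain data. The helper names `resolveOffset` and `expandFrameAddress`, the textual instruction format and the sample offsets are hypothetical, and negative frame indices (fixed objects) are not handled.

```cpp
#include <string>
#include <vector>

// Final displacement for a memory instruction: the frame object's offset
// plus the immediate already present on the instruction.
int resolveOffset(const std::vector<int> &ObjectOffsets, int FrameIndex,
                  int Imm) {
  return ObjectOffsets.at(FrameIndex) + Imm; // throws on a bad index
}

// Expansion of the FI_ri pseudo: copy the frame register into the
// destination, then add the resolved offset to it.
std::vector<std::string> expandFrameAddress(const std::string &Dest,
                                            const std::string &FrameReg,
                                            int Offset) {
  return {"MOV_rr " + Dest + ", " + FrameReg,
          "ADD_ri " + Dest + ", " + std::to_string(Offset)};
}
```

With object offsets `{-8, -16}`, frame index 1 and an immediate of 4, `resolveOffset` yields -12, and `expandFrameAddress("r1", "r10", -12)` produces the two-instruction sequence in the order the original code inserts it.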

llvm/lib/Target/Connex/ConnexRegisterInfo.td

This file was added.

				//===-- ConnexRegisterInfo.td - Connex Register defs -----*- tablegen -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				//===----------------------------------------------------------------------===//
				// Declarations that describe the Connex register file
				//===----------------------------------------------------------------------===//


				// Scalar registers are identified with 5-bit ID numbers (0 through 31).
				// Ri - 16-bit integer registers
				class Ri<bits<16> Enc, string n> : Register<n> {
				let Namespace = "Connex";
				let HWEncoding = Enc;
				}

				// Scalar registers of integer type
				def R0 : Ri< 0, "r0">, DwarfRegNum<[0]>;
				def R1 : Ri< 1, "r1">, DwarfRegNum<[1]>;
				def R2 : Ri< 2, "r2">, DwarfRegNum<[2]>;
				def R3 : Ri< 3, "r3">, DwarfRegNum<[3]>;
				def R4 : Ri< 4, "r4">, DwarfRegNum<[4]>;
				def R5 : Ri< 5, "r5">, DwarfRegNum<[5]>;
				def R6 : Ri< 6, "r6">, DwarfRegNum<[6]>;
				def R7 : Ri< 7, "r7">, DwarfRegNum<[7]>;
				def R8 : Ri< 8, "r8">, DwarfRegNum<[8]>;
				def R9 : Ri< 9, "r9">, DwarfRegNum<[9]>;
				def R10 : Ri<10, "r10">, DwarfRegNum<[10]>;
				def R11 : Ri<11, "r11">, DwarfRegNum<[11]>;
				def R12 : Ri<12, "r12">, DwarfRegNum<[12]>;
				def R13 : Ri<13, "r13">, DwarfRegNum<[13]>;
				def R14 : Ri<14, "r14">, DwarfRegNum<[14]>;
				def R15 : Ri<15, "r15">, DwarfRegNum<[15]>;
				def R16 : Ri<16, "r16">, DwarfRegNum<[16]>;
				def R17 : Ri<17, "r17">, DwarfRegNum<[17]>;
				def R18 : Ri<18, "r18">, DwarfRegNum<[18]>;
				def R19 : Ri<19, "r19">, DwarfRegNum<[19]>;
				def R20 : Ri<20, "r20">, DwarfRegNum<[20]>;
				def R21 : Ri<21, "r21">, DwarfRegNum<[21]>;
				def R22 : Ri<22, "r22">, DwarfRegNum<[22]>;
				def R23 : Ri<23, "r23">, DwarfRegNum<[23]>;
				def R24 : Ri<24, "r24">, DwarfRegNum<[24]>;
				def R25 : Ri<25, "r25">, DwarfRegNum<[25]>;
				def R26 : Ri<26, "r26">, DwarfRegNum<[26]>;
				def R27 : Ri<27, "r27">, DwarfRegNum<[27]>;
				def R28 : Ri<28, "r28">, DwarfRegNum<[28]>;
				def R29 : Ri<29, "r29">, DwarfRegNum<[29]>;
				def R30 : Ri<30, "r30">, DwarfRegNum<[30]>;
				def R31 : Ri<31, "r31">, DwarfRegNum<[31]>;








				// Register classes.

				//===----------------------------------------------------------------------===//
				// Declarations that describe the Connex register file
				//===----------------------------------------------------------------------===//
				let Namespace = "Connex" in {


				// Subregister indices. - See also llvm/lib/Target/X86/X86RegisterInfo.td

				// From llvm.org/svn/llvm-project/llvm/trunk/include/llvm/Target/Target.td


				/*
				// SubRegIndex - Use instances of SubRegIndex to identify subregisters.
				//class SubRegIndex<int size, int offset = 0>
				*/

				// 16 lanes Connex
				// This is for 16-bits subregisters (a Connex vector register has type 16 x i16)
				//
				def sub_16_00 : SubRegIndex<16, 0>;
				def sub_16_01 : SubRegIndex<16, 16>;
				def sub_16_02 : SubRegIndex<16, 32>;
				def sub_16_03 : SubRegIndex<16, 48>;
				def sub_16_04 : SubRegIndex<16, 64>;
				def sub_16_05 : SubRegIndex<16, 80>;
				def sub_16_06 : SubRegIndex<16, 96>;
				def sub_16_07 : SubRegIndex<16, 112>;
				def sub_16_08 : SubRegIndex<16, 128>;
				def sub_16_09 : SubRegIndex<16, 144>;
				def sub_16_10 : SubRegIndex<16, 160>;
				def sub_16_11 : SubRegIndex<16, 176>;
				def sub_16_12 : SubRegIndex<16, 192>;
				def sub_16_13 : SubRegIndex<16, 208>;
				def sub_16_14 : SubRegIndex<16, 224>;
				def sub_16_15 : SubRegIndex<16, 240>;
				} // END: let Namespace = "Connex" in






				// Register Operands.

				class ConnexAsmRegOperand : AsmOperandClass {
				let ParserMethod = "parseAnyRegister";
				}

				def ConnexVectorAsmOperand : ConnexAsmRegOperand {
				let Name = "ConnexVectorAsmReg";
				}




				class ConnexRegWithSubRegs<bits<16> Enc, string n, list<Register> subregs>
				: RegisterWithSubRegs<n, subregs> {
				let HWEncoding = Enc;
				let Namespace = "Connex";
				}

				// We have banks of 32 registers each.
				class ConnexVectorElementReg<bits<16> Enc, string n> : Register<n> {
				let HWEncoding = Enc;
				let Namespace = "Connex";
				}

				class SUBREGS<bits<16> Enc, string n> : ConnexVectorElementReg<Enc, n>;

				// 8 lanes Connex, with registers of 8 x 16-bits (type v8i16)
				class VectorRegPacked16bits<bits<16> Enc, string n, list<Register> subregs>
				: ConnexRegWithSubRegs<Enc, n, subregs> {
				// For 16-bit subregisters:
				let SubRegIndices = [ sub_16_00, sub_16_01, sub_16_02, sub_16_03,
				sub_16_04, sub_16_05, sub_16_06, sub_16_07];
				let CoveredBySubRegs = 1;
				}


				//!!!!TODO: implement aliasing between 64, 16 and 32 subregs - NOT sure if required


				let Namespace = "Connex" in {
				// 8 lanes vector registers (type v8i16 - 8 x 16-bits)
				// 32 vector registers
				foreach RegId = 0-31 in
				foreach I = 0-7 in
				def SR16b_#RegId#_#I : SUBREGS<I, "sr16_"#RegId#"_"#I>,
				DwarfRegNum<[!add(I, 2048)]>;

				// 8 lanes vector registers (type v8i16)
				foreach RegId = 0-31 in
				def Wh#RegId : VectorRegPacked16bits<0, "R("#RegId#")",
				[
				!cast<SUBREGS>("SR16b_"#RegId#"_0"),
				!cast<SUBREGS>("SR16b_"#RegId#"_1"),
				!cast<SUBREGS>("SR16b_"#RegId#"_2"),
				!cast<SUBREGS>("SR16b_"#RegId#"_3"),
				!cast<SUBREGS>("SR16b_"#RegId#"_4"),
				!cast<SUBREGS>("SR16b_"#RegId#"_5"),
				!cast<SUBREGS>("SR16b_"#RegId#"_6"),
				!cast<SUBREGS>("SR16b_"#RegId#"_7")]>,
				DwarfRegNum<[!add(RegId, 32)]>;
				} // END: let Namespace = "Connex" in



				/*
				From http://llvm.org/docs/WritingAnLLVMBackend.html#defining-a-register-class:
				<<To define a RegisterClass, use the following 4 arguments:
				- The first argument of the definition is the name of the namespace.
				- The second argument is a list of ValueType register type values that are
				defined in include/llvm/CodeGen/ValueTypes.td.
				Defined values include integer types (such as i16, i32, and i1 for Boolean),
				floating-point types (f32, f64), and vector types (for example, v8i16 for
				an 8 x i16 vector).
				All registers in a RegisterClass must have the same ValueType, but some
				registers may store vector data in different configurations.
				For example a register that can process a 128-bit vector may be able to
				handle 16 8-bit integer elements, 8 16-bit integers, 4 32-bit integers,
				and so on.
				- The third argument of the RegisterClass definition specifies the alignment
				required of the registers when they are stored or loaded to memory.
				(Alex: the alignment is expressed in number of bits of the register)
				- The final argument, regList, specifies which registers are in this class.
				If an alternative allocation order method is not specified, then regList
				also defines the order of allocation used by the register allocator.
				Besides simply listing registers with (add R0, R1, ...), more advanced set
				operators are available. See include/llvm/Target/Target.td for more
				information.>>
				*/




				// These are taken from Mips - in the file ConnexRegisterInfo.td these registers
				// (RegisterClass) are defined with the other register altogether - same
				// namespace, same target


				// TODO: currently I use v8i16 --> vector alignment is 128 bits; but CVL
				// (Connex vector length) can be arbitrary --> MEGA-TODO: find a work-around
				// for this
				def VectorH: RegisterClass<"Connex", [v8i16 /*, v128f16 */],
				// NUM_REGS = 32
				128, // Works, but 256-bytes alignment wastes too
				// much: 2048, NOTE: using 0 for alignment gives
				// an unexplainable "Stack dump"
				//64, // Works, but 256-bytes alignment wastes too
				// much: 2048, NOTE: using 0 for alignment
				// gives an unexplainable "Stack dump"
				//NUM_REGS=64
				// TODO_CHANGE_BACKEND
				// NUM_REGS = 32
				(sequence "Wh%u", 0, 31)>;


				def VectorHOpnd : RegisterOperand<VectorH> {
				let ParserMatchClass = ConnexVectorAsmOperand;
				}



				// NUM_REGS = 32
				// 32 mask registers with 1 bit per lane (v8i1, matching the 8-lane v8i16
				// vector registers)
				foreach RegId = 0-31 in
				def BoolMask#RegId : ConnexVectorElementReg<0, "BoolMask"#RegId>,
				DwarfRegNum<[!add(RegId, 10)]>;

				// Inspired from llvm/lib/Target/X86/X86RegisterInfo.td:
				// NUM_REGS = 32
				def BoolMask: RegisterClass<"Connex", [v8i1], 32,
				(sequence "BoolMask%u", 0, 31)>;
				def BoolMaskOpnd : RegisterOperand<BoolMask> {
				let ParserMatchClass = ConnexVectorAsmOperand;
				}









				// The GPR class of scalar registers

				def GPR : RegisterClass<"Connex", [i64], 64, (add (sequence "R%u", 0, 31))>;

				def GPR64AsmOperand : ConnexAsmRegOperand {
				let Name = "GPR64AsmReg";
				}

				// Taken from MipsRegisterInfo.td
				def GPR64Opnd : RegisterOperand<GPR> {
				let ParserMatchClass = GPR64AsmOperand;
				}
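Two properties of the register description above are easy to check mechanically: the bit offset of each 16-bit sub-register index, and whether the scalar registers have distinct hardware encodings. The following is a minimal sketch, not part of the patch. The helper names are hypothetical, and it assumes that scalar register Ri is meant to be encoded as i.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Bit offset of 16-bit lane `Lane` inside a vector register, matching the
// SubRegIndex<16, Offset> definitions (sub_16_00 at 0, sub_16_01 at 16, ...).
constexpr unsigned laneBitOffset(unsigned Lane) { return 16u * Lane; }

// Returns true when no two registers share a hardware encoding.
bool encodingsAreUnique(const std::vector<unsigned> &Encodings) {
  std::set<unsigned> Seen(Encodings.begin(), Encodings.end());
  return Seen.size() == Encodings.size();
}

// Encodings for R0..R31 under the assumption that Ri is encoded as i.
std::vector<unsigned> scalarEncodings() {
  std::vector<unsigned> Enc;
  for (unsigned I = 0; I < 32; ++I)
    Enc.push_back(I);
  return Enc;
}
```

A check of this kind flags any table in which two scalar registers end up with the same encoding value.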

llvm/lib/Target/Connex/ConnexSchedule.td

This file was added.

				def ALL_UNIT : FuncUnit; // The only functional unit of the processor

				// Inspired from PPCSchedule.td:def IIC_IntSimple : InstrItinClass;
				def ConnexItinClass : InstrItinClass;

				def ConnexItineraries : ProcessorItineraries<[ALL_UNIT], [], [
				// Inspired from lib/Target/AMDGPU/R600Schedule.td
				InstrItinData<ConnexItinClass, [InstrStage<1, [ALL_UNIT]>]>
				// 1 cycle long, uses the only functional unit of the processor
				]>;

llvm/lib/Target/Connex/ConnexSelectionDAGInfo.h

This file was added.

				//===-- ConnexSelectionDAGInfo.h - Connex SelectionDAG Info -----*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the Connex subclass for SelectionDAGTargetInfo.
				///
				//===----------------------------------------------------------------------===//

				// Inspired from ARM/ARMSelectionDAGInfo.cpp

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXSELECTIONDAGINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXSELECTIONDAGINFO_H

				#include "llvm/CodeGen/RuntimeLibcalls.h"
				#include "llvm/CodeGen/SelectionDAGTargetInfo.h"

				namespace llvm {

				/*
				namespace Connex_AM {
				static inline ShiftOpc getShiftOpcForNode(unsigned Opcode) {
				switch (Opcode) {
				default: return Connex_AM::no_shift;
				case ISD::SHL: return Connex_AM::lsl;
				case ISD::SRL: return Connex_AM::lsr;
				case ISD::SRA: return Connex_AM::asr;
				case ISD::ROTR: return Connex_AM::ror;
				//case ISD::ROTL: // Only if imm -> turn into ROTR.
				// Can't handle RRX here, because it would require folding a flag into
				// the addressing mode. :( This causes us to miss certain things.
				//case ConnexISD::RRX: return Connex_AM::rrx;
				}
				}
				} // end namespace Connex_AM
				*/

				class ConnexSelectionDAGInfo : public SelectionDAGTargetInfo {
				public:
				SDValue EmitTargetCodeForMemcpy(SelectionDAG &DAG, const SDLoc &dl,
				SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, Align Alignment,
				bool isVolatile, bool AlwaysInline,
				MachinePointerInfo DstPtrInfo,
				MachinePointerInfo SrcPtrInfo) const override;

				SDValue
				EmitTargetCodeForMemmove(SelectionDAG &DAG, const SDLoc &dl, SDValue Chain,
				SDValue Dst, SDValue Src, SDValue Size,
				Align Alignment, bool isVolatile,
				MachinePointerInfo DstPtrInfo,
				MachinePointerInfo SrcPtrInfo) const override;

				// Adjust parameters for memset, see RTABI section 4.3.4
				SDValue EmitTargetCodeForMemset(SelectionDAG &DAG, const SDLoc &dl,
				SDValue Chain, SDValue Op1, SDValue Op2,
				SDValue Op3, Align Alignment, bool isVolatile,
				bool AlwaysInline,
				MachinePointerInfo DstPtrInfo) const override;

				SDValue EmitSpecializedLibcall(SelectionDAG &DAG, const SDLoc &dl,
				SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, unsigned Align,
				RTLIB::Libcall LC) const;
				};

				} // end namespace llvm

				#endif

llvm/lib/Target/Connex/ConnexSelectionDAGInfo.cpp

This file was added.

				//===-- ConnexSelectionDAGInfo.cpp - Connex SelectionDAG Info -------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the ConnexSelectionDAGInfo class.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexSelectionDAGInfo.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/CodeGen/SelectionDAG.h"
				#include "llvm/IR/DerivedTypes.h"

				// Inspired from ARM/ARMSelectionDAGInfo.cpp

				using namespace llvm;

				#define DEBUG_TYPE "connex-selectiondag-info"

				// Emit a library call for the given Libcall. Note that, unlike the ARM code
				// this is derived from, the callee is currently always "memset" and the
				// arguments are passed in (ptr, size, value) order, whatever LC is.
				SDValue ConnexSelectionDAGInfo::EmitSpecializedLibcall(
				SelectionDAG &DAG, const SDLoc &dl, SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, unsigned Align, RTLIB::Libcall LC) const {

				const ConnexSubtarget &Subtarget =
				DAG.getMachineFunction().getSubtarget<ConnexSubtarget>();
				const ConnexTargetLowering *TLI = Subtarget.getTargetLowering();

				TargetLowering::ArgListTy Args;
				TargetLowering::ArgListEntry Entry;
				Entry.Ty = DAG.getDataLayout().getIntPtrType(*DAG.getContext());
				Entry.Node = Dst;
				Args.push_back(Entry);

				/*
				if (AEABILibcall == AEABI_MEMCLR) {
				Entry.Node = Size;
				Args.push_back(Entry);
				} else if (AEABILibcall == AEABI_MEMSET) {
				*/
				// Adjust parameters for memset, EABI uses format (ptr, size, value),
				// GNU library uses (ptr, value, size)
				// See RTABI section 4.3.4
				Entry.Node = Size;
				Args.push_back(Entry);

				// Extend or truncate the argument to be an i32 value for the call.
				if (Src.getValueType().bitsGT(MVT::i32))
				Src = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, Src);
				else if (Src.getValueType().bitsLT(MVT::i32))
				Src = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i32, Src);

				Entry.Node = Src;
				Entry.Ty = Type::getInt32Ty(*DAG.getContext());
				Entry.IsSExt = false;
				Args.push_back(Entry);
				/*
				} else {
				Entry.Node = Src;
				Args.push_back(Entry);

				Entry.Node = Size;
				Args.push_back(Entry);
				}
				*/

				static char const *FunctionNames[4][3] = {
				{"__aeabi_memcpy", "__aeabi_memcpy4", "__aeabi_memcpy8"},
				{"__aeabi_memmove", "__aeabi_memmove4", "__aeabi_memmove8"},
				// { "__aeabi_memset", "__aeabi_memset4", "__aeabi_memset8" },
				{"memset", "memset", "memset"},
				{"__aeabi_memclr", "__aeabi_memclr4", "__aeabi_memclr8"}};
				TargetLowering::CallLoweringInfo CLI(DAG);
				CLI.setDebugLoc(dl)
				.setChain(Chain)
				.setCallee(TLI->getLibcallCallingConv(LC),
				Type::getVoidTy(*DAG.getContext()),
				DAG.getExternalSymbol(FunctionNames[2][2],
				TLI->getPointerTy(DAG.getDataLayout())),
				std::move(Args))
				.setDiscardResult();
				std::pair<SDValue, SDValue> CallResult = TLI->LowerCallTo(CLI);

				return CallResult.second;
				}

				SDValue ConnexSelectionDAGInfo::EmitTargetCodeForMemcpy(
				SelectionDAG &DAG, const SDLoc &dl, SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, Align Alignment, bool isVolatile, bool AlwaysInline,
				MachinePointerInfo DstPtrInfo, MachinePointerInfo SrcPtrInfo) const {
				return EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size,
				Alignment.value(), RTLIB::MEMCPY);
				}

				SDValue ConnexSelectionDAGInfo::EmitTargetCodeForMemmove(
				SelectionDAG &DAG, const SDLoc &dl, SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, Align Alignment, bool isVolatile,
				MachinePointerInfo DstPtrInfo, MachinePointerInfo SrcPtrInfo) const {
				return EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size,
				Alignment.value(), RTLIB::MEMMOVE);
				}

				SDValue ConnexSelectionDAGInfo::EmitTargetCodeForMemset(
				SelectionDAG &DAG, const SDLoc &dl, SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, Align Alignment, bool isVolatile, bool AlwaysInline,
				MachinePointerInfo DstPtrInfo) const {
				LLVM_DEBUG(
				dbgs() << "Entered ConnexSelectionDAGInfo::EmitTargetCodeForMemset()"
				<< "\n");

				return EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size,
				Alignment.value(), RTLIB::MEMSET);
				}
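The comment in EmitSpecializedLibcall() contrasts two argument orders for memset: (ptr, size, value) for the EABI-style entry point and (ptr, value, size) for the C library function. A minimal sketch of the difference follows. `toyAeabiMemset` is a hypothetical stand-in written for illustration, not a real runtime function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for an EABI-style entry point, which takes its
// arguments as (ptr, size, value).
void toyAeabiMemset(void *Dst, std::size_t Size, int Value) {
  // The C library function takes (ptr, value, size), so the last two
  // arguments must be swapped when forwarding.
  std::memset(Dst, Value, Size);
}
```

Passing the arguments in one order to a callee that expects the other compiles without complaint, because both the size and the value are integers.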

llvm/lib/Target/Connex/ConnexSubtarget.h

This file was added.

				//===-- ConnexSubtarget.h - Define Subtarget for the Connex -----*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the Connex specific subclass of TargetSubtargetInfo.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXSUBTARGET_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXSUBTARGET_H

				#include "ConnexFrameLowering.h"
				#include "ConnexISelLowering.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexSelectionDAGInfo.h"
				#include "llvm/CodeGen/SelectionDAGTargetInfo.h"
				#include "llvm/CodeGen/TargetSubtargetInfo.h"
				#include "llvm/IR/DataLayout.h"
				#include "llvm/Target/TargetMachine.h"

				#define GET_SUBTARGETINFO_HEADER
				#include "ConnexGenSubtargetInfo.inc"

				namespace llvm {
				class StringRef;

				class ConnexSubtarget : public ConnexGenSubtargetInfo {
				virtual void anchor();
				ConnexInstrInfo InstrInfo;
				ConnexFrameLowering FrameLowering;
				ConnexTargetLowering TLInfo;

				SelectionDAGTargetInfo TSInfo;
				ConnexSelectionDAGInfo TSInfo2;

				public:
				// This constructor initializes the data members to match that
				// of the specified triple.
				ConnexSubtarget(const Triple &TT, const std::string &CPU,
				const std::string &FS, const TargetMachine &TM);

				// ParseSubtargetFeatures - Parses features string setting specified
				// subtarget options. Definition of function is auto generated by tblgen.
				void ParseSubtargetFeatures(StringRef CPU, StringRef TuneCPU, StringRef FS);

				const ConnexInstrInfo *getInstrInfo() const override { return &InstrInfo; }
				const ConnexFrameLowering *getFrameLowering() const override {
				return &FrameLowering;
				}
				const ConnexTargetLowering *getTargetLowering() const override {
				return &TLInfo;
				}

				const TargetRegisterInfo *getRegisterInfo() const override {
				return &InstrInfo.getRegisterInfo();
				}

				// Inspired from ARM/ARMSubtarget.cpp
				const ConnexSelectionDAGInfo *getSelectionDAGInfo() const override {
				return &TSInfo2;
				}
				};
				} // namespace llvm

				#endif

llvm/lib/Target/Connex/ConnexSubtarget.cpp

This file was added.

				//===-- ConnexSubtarget.cpp - Connex Subtarget Information ----------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the Connex specific subclass of TargetSubtargetInfo.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexSubtarget.h"
				#include "Connex.h"
				#include "llvm/MC/TargetRegistry.h"

				using namespace llvm;

				#define DEBUG_TYPE "connex-subtarget"

				#define GET_SUBTARGETINFO_TARGET_DESC
				#define GET_SUBTARGETINFO_CTOR
				#include "ConnexGenSubtargetInfo.inc"

				void ConnexSubtarget::anchor() {}

				ConnexSubtarget::ConnexSubtarget(const Triple &TT, const std::string &CPU,
				const std::string &FS, const TargetMachine &TM)
				: ConnexGenSubtargetInfo(TT, CPU, /*TuneCPU*/ CPU, FS), InstrInfo(),
				FrameLowering(this), TLInfo(TM, this), TSInfo2() {}

llvm/lib/Target/Connex/ConnexTargetMachine.h

This file was added.

				//===-- ConnexTargetMachine.h - Define TargetMachine for Connex --*- C++ -*--===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the Connex specific subclass of TargetMachine.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXTARGETMACHINE_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXTARGETMACHINE_H

				#include "ConnexSubtarget.h"
				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/Support/CodeGen.h"
				#include "llvm/Target/TargetMachine.h" // This was before
				#include <memory>

				namespace llvm {
				class ConnexTargetMachine : public LLVMTargetMachine {
				std::unique_ptr<TargetLoweringObjectFile> TLOF;
				ConnexSubtarget Subtarget;

				public:
				ConnexTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
				StringRef FS, const TargetOptions &Options,
				Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,
				CodeGenOpt::Level OL, bool JIT);

				const ConnexSubtarget *getSubtargetImpl() const { return &Subtarget; }
				const ConnexSubtarget *getSubtargetImpl(const Function &) const override {
				return &Subtarget;
				}

				TargetPassConfig *createPassConfig(PassManagerBase &PM) override;

				TargetTransformInfo getTargetTransformInfo(const Function &F) const override;

				TargetLoweringObjectFile *getObjFileLowering() const override {
				return TLOF.get();
				}
				};
				} // namespace llvm

				#endif

llvm/lib/Target/Connex/ConnexTargetMachine.cpp

This file was added.

				//===-- ConnexTargetMachine.cpp - Define TargetMachine for Connex ---------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Implements the info about the Connex target spec.
				//===----------------------------------------------------------------------===//

				#include "ConnexTargetMachine.h"
				#include "Connex.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
				#include "llvm/CodeGen/TargetPassConfig.h"
				#include "llvm/IR/LegacyPassManager.h"
				#include "llvm/MC/TargetRegistry.h"
				#include "llvm/Support/FormattedStream.h"
				#include "llvm/Target/TargetOptions.h"

				#include "llvm/Support/Debug.h"
				#define DEBUG_TYPE "connex-target-config"

				// This must be put after #include "llvm/Support/Debug.h"
				#include "ConnexTargetTransformInfo.h"

				using namespace llvm;

				static cl::opt<bool>
				DontTreatCopyInstructions("dont-treat-copy-instructions", cl::Hidden,
				cl::init(false),
				cl::desc("Don't treat copy instructions"));

				extern "C" void LLVMInitializeConnexTarget() {
				// Register the target - Force static initialization.
				RegisterTargetMachine<ConnexTargetMachine> Z(TheConnexTarget);
				}

				static StringRef computeDataLayout(const Triple &TT) {
				/*
				See http://llvm.org/docs/LangRef.html#data-layout for all details regarding
				layout declaration.
				- e
				Specifies that the target lays out data in little-endian form.
				- S<size>
				Specifies the natural alignment of the stack in bits.
				Alignment promotion of stack variables is limited to the natural stack
				alignment to avoid dynamic stack realignment.
				The stack alignment must be a multiple of 8-bits.
				If omitted, the natural stack alignment defaults to “unspecified”, which
				does not prevent any alignment promotions.
				- p[n]:<size>:<abi>:<pref>
				This specifies the size of a pointer and its <abi> and <pref>erred
				alignments for address space n. All sizes are in bits.
				The address space, n, is optional, and if not specified, denotes
				the default address space 0.
				The value of n must be in the range [1,2^23).
				- i<size>:<abi>:<pref>
				This specifies the alignment for an integer type of a given bit <size>.
				The value of <size> must be in the range [1,2^23).
				- n<size1>:<size2>:<size3>...
				This specifies a set of native integer widths for the target CPU in bits.
				- v<size>:<abi>:<pref>
				This specifies the alignment for a vector type of a given bit <size>.

				See also http://llvm.org/docs/WritingAnLLVMBackend.html
				An upper-case “E” in the string indicates a big-endian target data model.
				A lower-case “e” indicates little-endian.
				“p:” is followed by pointer information: size, ABI alignment, and
				preferred alignment.
				If only two figures follow “p:”, then the first value is pointer size,
				and the second value is both ABI and preferred alignment.
				Then a letter for numeric type alignment: “i”, “f”, “v”, or “a”
				(corresponding to integer, floating point, vector, or aggregate).
				“i”, “v”, or “a” are followed by ABI alignment and preferred alignment.
				“f” is followed by three values: the first indicates the size of a long
				double, then ABI alignment, and then ABI preferred alignment.
				*/

				// We specify here the data layout:
				// - of the CPU, eBPF (these are actually ABI properties);
				// - only a few alignment properties for the vector types, at the end of
				// the string. Note that we can't specify any other properties for the
				// Connex vector processor.
				// Very important: the pointer size is 64 (that of the eBPF CPU), because
				// the masked.gather/scatter instructions normally use such pointers in
				// LLVM IR, even if we translate them to writeDataTo/readDataFromConnex()
				// and Connex vector assembly instructions with indirect memory accesses.
				//
				// We really need to specify p:64 (not p:16), otherwise we get an error like:
				// "Do not know how to promote this operator!"
				// (GlobalAddress<i64* @CONNEX_VL> 0)
				// Important: the string is the one from the (e)BPF back end,
				// concatenated with the spec for the vector alignment for Connex.

				// return "e-m:e-p:64:64-i64:64-n32:64-S128-v128:128:128-v2048:2048:2048";
				// return
				// "e-m:e-p:64:64:64:64-p1:32:32:32:32-i64:64-n32:64-S128-v128:128:128-"
				// "v2048:2048:2048"; // 2019_06_25
				return "e-m:e-p:64:64:64:64-p1:64:64:64:64-i64:64-n32:64-S128-v128:128:128-"
				"v2048:2048:2048"; // 2019_06_25
				}
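The layout string returned by computeDataLayout() is a list of specifications separated by '-'. The sketch below splits it and reads the pointer size for the default address space. It uses only the standard library and hypothetical helper names. Inside LLVM one would query llvm::DataLayout instead, so this is an illustration of the string format and not code for the back end.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a data layout string into its '-'-separated specifications.
std::vector<std::string> splitSpecs(const std::string &Layout) {
  std::vector<std::string> Specs;
  std::stringstream SS(Layout);
  std::string Spec;
  while (std::getline(SS, Spec, '-'))
    Specs.push_back(Spec);
  return Specs;
}

// Returns the pointer size in bits for address space 0 ("p:<size>:..."),
// or 0 when the layout has no such specification.
unsigned defaultPointerSizeInBits(const std::string &Layout) {
  for (const std::string &Spec : splitSpecs(Layout))
    if (Spec.rfind("p:", 0) == 0) // Spec starts with "p:"
      return static_cast<unsigned>(std::stoul(Spec.substr(2)));
  return 0;
}
```

Applied to the string returned above, the "p:64:64:64:64" specification gives a 64-bit pointer for address space 0, while "p1:..." describes address space 1 and is skipped by the prefix test.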

				static Reloc::Model getEffectiveRelocModel(Optional<Reloc::Model> RM) {
				if (!RM.has_value())
				return Reloc::PIC_;
				return *RM;
				}

				// Inspired from XCore/XCoreTargetMachine.cpp
				static CodeModel::Model
				getEffectiveXCoreCodeModel(Optional<CodeModel::Model> CM) {
				if (CM) {
				if (CM != CodeModel::Small && CM != CodeModel::Large)
				report_fatal_error("Target only supports CodeModel Small or Large");
				return *CM;
				}
				return CodeModel::Small;
				}

				ConnexTargetMachine::ConnexTargetMachine(const Target &T, const Triple &TT,
				StringRef CPU, StringRef FS,
				const TargetOptions &Options,
				Optional<Reloc::Model> RM,
				Optional<CodeModel::Model> CM,
				CodeGenOpt::Level OL, bool JIT)
				: LLVMTargetMachine(T, computeDataLayout(TT), TT, CPU, FS, Options,
				getEffectiveRelocModel(RM),
				getEffectiveCodeModel(CM, CodeModel::Small), OL),
				TLOF(std::make_unique<TargetLoweringObjectFileELF>()),
				Subtarget(TT, std::string(CPU), std::string(FS), *this) {
				initAsmInfo();
				}

				namespace {

				/* I made sure that the iterators don't become invalid by using
				another iterator, e.g. I2succ, which stores the next pointer in the
				data structures.

				small-TODO: it might be safer to make one change at a time, moving (and
				maybe also erasing) one misplaced instr per WHERE block (or even per MBB),
				then leave the MBB::iterator loop and restart it from the beginning until
				NO more changes are performed. This would avoid any possible issue with
				iterator invalidation.
				*/
				class PassHandleMisplacedInstr : public MachineFunctionPass {
				public:
				PassHandleMisplacedInstr() : MachineFunctionPass(ID) {}

				StringRef getPassName() const override { return "PassHandleMisplacedInstr"; }

				/*
				// Very Important: GMS said in 2018 he doesn't like having arithmetic or logic
				// instruction between predicate and WHERE* instruction:
				#define ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				- this case needs to be implemented carefully - I only sketched it a bit, so
				it is not tested either
				*/

				void updateUsesOfRegUntilMisplacedInstr(
				MachineBasicBlock::iterator &Ipredicate,
				// We start replacing uses from Ipredicate + 1
				MachineBasicBlock::iterator &I2, // misplaced instr
				MachineBasicBlock::iterator &IE, unsigned regCrt, unsigned regNew) {
				LLVM_DEBUG(dbgs() << " I2 = " << *I2);

				/* We update all following occurrences of the dest register
				of the misplaced instr (which was also the dest register of the
				predicate)
				- for both uses and defs, until the 1st def. */
				MachineBasicBlock::iterator Iupdate;
				Iupdate = Ipredicate;
				Iupdate++;

				for (; Iupdate != I2 && Iupdate != IE; Iupdate++) {
				LLVM_DEBUG(dbgs() << " Iupdate = " << *Iupdate);

				/* Important: we go in reverse order to make the def last since we
				break at def. */
				for (int idOpnd = Iupdate->getNumOperands() - 1; idOpnd >= 0; idOpnd--) {
				MachineOperand &IOpnd = Iupdate->getOperand((unsigned)idOpnd);

				if (IOpnd.isReg() && IOpnd.getReg() == regCrt) {
				LLVM_DEBUG(dbgs() << "updateUsesOfRegUntilMisplacedInstr(): Updating "
				"to regNew the register of Iupdate. "
				" Iupdate = "
				<< *Iupdate);

				/*
				// This does NOT hold because we can have uses of a misplaced instr
				// dest register before the misplaced instr - see the big WHERE
				// block of ADD.f16
				assert( (Iupdate->getOpcode() == Connex::WHEREEQ ||
				Iupdate->getOpcode() == Connex::WHERELT ||
				Iupdate->getOpcode() == Connex::WHERECRY) &&
				"We should NOT be arriving here otherwise.");
				*/

				if (IOpnd.isDef()) {
				// We break
				Iupdate = IE;
				Iupdate--; // We make it break out of outermost loop
				break;
				}

				IOpnd.setReg(regNew);
				}
				}
				}
				}

				void putMisplacedInstrBeforeWhereBlock(
				MachineBasicBlock &MBB, const TargetInstrInfo *TII,
				MachineInstr *IMI, // The WHERE instruction
				MachineBasicBlock::iterator &I2, // misplaced instr
				MachineBasicBlock::iterator &I2plus1, MachineBasicBlock::iterator &IE,
				bool &changedMF, int &destRegisterPredicateOfSplitWhere) {
				/* NOTE: I2 is the misplaced instr instruction
				if (I2.getOperand(0) == Ipredicate.getOperand(0))
				for each instruction from Ipredicate to I2 - 1 replace defs and uses of
				I2.getOperand(0) with CONNEX_RESERVED_REGISTER_01
				*/

				/*
				Moving misplaced instr before the WHERE block.

				Normally we move the Misplaced Instr instructions and put them
				in the same order before the predicate.

				important-Note: If we have 2 Misplaced Instr with the same dest register,
				the WHERE block will be surely split at least for
				the 2nd Misplaced Instr. For example, from MatMul-256.f16:

				R(11) = R(23) == R(1);
				NOP;
				);
				EXECUTE_WHERE_EQ(
				R(19) = ISHL(R(21), 10);
				// Assume it's not here: R(19) = R(10) | R(19);
				// Assume it's not here: R(25) = R(1) & R(10);
				R(10) = R(0) | R(0); // COPY
				R(10) = R(26) - R(1);
				R(11) = R(1) << R(11);
				R(10) = R(0) | R(0); // COPY
				R(10) = R(11) & R(20);
				The 2nd COPY forces the WHERE to be split
				- it's actually a different variable.

				Note: although not important, in principle we could
				have non-SPECIALV_H instrs inside WHERE blocks if
				the register is NOT initialized. */
				LLVM_DEBUG(dbgs() << " moving I2 immediately before the "
				"predicate instruction linked to the "
				"WHERE block (Case 1 from paper)\n");

				MachineBasicBlock::iterator Ipredicate = IMI;
				LLVM_DEBUG(dbgs() << " IMI = " << *IMI << "\n");
				Ipredicate--;
				LLVM_DEBUG(dbgs() << " Ipredicate = " << *Ipredicate << "\n");

				/*
				if (Ipredicate->getOpcode() != Connex::NOP_BPF)
				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: Warning: "
				"Ipredicate->getOpcode() != Connex::NOP_BPF\n");
				*/
				assert(Ipredicate->getOpcode() == Connex::NOP_BPF
				//|| Ipredicate->getOpcode() == Connex::NOP
				);

				/* Ipredicate is pointing at 2 instructions before the
				WHERE* instruction, normally at the predicate
				instruction.*/
				Ipredicate--;

				LLVM_DEBUG(dbgs() << " Ipredicate = " << *Ipredicate << "\n");

				// Important-TODO: check better: check for right (w.r.t. WHERE) predicate
				// instruction before NOP
				assert(Ipredicate->getOpcode() == Connex::EQ_H ||
				Ipredicate->getOpcode() == Connex::LT_H ||
				Ipredicate->getOpcode() == Connex::ULT_H ||
				// This is for the case of using lane gating instructions
				// (DISABLE_CELL, ENABLE_ALL_CELLS)
				Ipredicate->getOpcode() == Connex::EQ_SPECIAL_H ||
				Ipredicate->getOpcode() == Connex::LT_SPECIAL_H ||
				Ipredicate->getOpcode() == Connex::ULT_SPECIAL_H);

				assert(Ipredicate->getOperand(0).isReg() &&
				Ipredicate->getOperand(0).isDef());
				assert(I2->getOperand(0).isReg() && I2->getOperand(0).isDef());

				/*
				// This case can be handled (ONLY) by splitting WHERE block:
				#ifndef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				assert(I2->getOperand(1).getReg() != Ipredicate->getOperand(0).getReg() &&
				"We reached a case that is not handled yet. Implement it!");
				#endif
				*/

				/* Checking for WAR/anti-dependence between the predicate and Misplaced
				Instr instruction
				- if so, then changing order (moving Misplaced Instr before predicate)
				compromises correctness so we make a copy of the respective predicate
				input. */
				// I2 is the Misplaced Instr instruction
				assert(I2->getOperand(0).isReg() && I2->getOperand(0).isDef());
				//
				// Ipredicate is the predicate instruction
				assert(Ipredicate->getOperand(1).isReg() &&
				Ipredicate->getOperand(1).isUse());
				assert(Ipredicate->getOperand(2).isReg() &&
				Ipredicate->getOperand(2).isUse());
				//
				bool sameOpnd1 =
				Ipredicate->getOperand(1).getReg() == I2->getOperand(0).getReg();
				bool sameOpnd2 =
				Ipredicate->getOperand(2).getReg() == I2->getOperand(0).getReg();
				//
				if (sameOpnd1 || sameOpnd2) {
				LLVM_DEBUG(
				dbgs()
				<< "Moving Misplaced Instr before WHERE predicate breaks "
				"WAR/anti-dependence relation between Misplaced Instr and "
				"predicate. "
				"--> fixing the problem by making copy of predicate input.\n");

				/* TODO: if Ipredicate has a use of the dest register of EQ,
				then add: a) an instr before Misplaced Instr with
				CONNEX_RESERVED_REGISTER_01 = Rinput_EQ | Rinput_EQ
				*/

				/* We preserve the input register of the predicate instruction, since
				it will be overwritten by the Misplaced Instr once the latter is
				moved before the predicate. We make a copy:
				CONNEX_RESERVED_REGISTER_01 = Rdst_MisplacedInstr |
				Rdst_MisplacedInstr
				*/
				#ifndef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				BuildMI(MBB, Ipredicate,
				/* We insert this MachineInstr before Ipredicate.
				The Misplaced Instr I2 is later also inserted before
				Ipredicate, so I2 ends up after this new copy */
				IMI->getDebugLoc(), TII->get(Connex::ORV_H),
				CONNEX_RESERVED_REGISTER_01)
				.addReg(I2->getOperand(0).getReg())
				.
				/* Note: I2 (Misplaced Instr) does NOT necessarily have the
				same dest register as Ipredicate. */
				addReg(I2->getOperand(0).getReg());
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif
				#endif
				/* The Misplaced Instr moved before Ipredicate must remain visible
				inside the WHERE block, so the Ipredicate destination must become a
				reserved register whenever it coincides with the Misplaced Instr
				destination. This is likely (although not guaranteed) when
				sameOpnd1 || sameOpnd2 holds, i.e., we can have
				Ipredicate->getOperand(0) == I2->getOperand(0);
				if we left it like that, Ipredicate would overwrite (shadow) the
				result of the Misplaced Instr. */
				if (Ipredicate->getOperand(0).getReg() == I2->getOperand(0).getReg())
				Ipredicate->getOperand(0).setReg(CONNEX_RESERVED_REGISTER_01);

				// Note: Ipredicate is the predicate instruction
				/* These checks also handle the case where both input operands of
				Ipredicate are the same.
				*/
				if (sameOpnd1)
				Ipredicate->getOperand(1).setReg(CONNEX_RESERVED_REGISTER_01);
				if (sameOpnd2)
				Ipredicate->getOperand(2).setReg(CONNEX_RESERVED_REGISTER_01);

				/* We now normally have to update the uses of modified input of
				Ipredicate for the following instructions between the predicate
				and the place where the Misplaced Instr was.
				However, the instructions using the input after predicate are
				only the ones in the WHERE block basically.
				*/
				updateUsesOfRegUntilMisplacedInstr(Ipredicate,
				I2, // Misplaced Instr
				IE, I2->getOperand(0).getReg(),
				CONNEX_RESERVED_REGISTER_01);
				} else // MEGA-TODO: think if OK
				if (Ipredicate->getOperand(0).getReg() == I2->getOperand(0).getReg()) {
				// If we have a WAW (output) dependence
				// Note: Ipredicate is the predicate, I2 is the Misplaced Instr
				LLVM_DEBUG(
				dbgs()
				<< " Found that the Misplaced Instr to be moved "
				"immediately before the predicate of the "
				"WHERE block has the same destination register as the predicate. "
				"This forces us to handle specially "
				"the predicate instr dest register, "
				"since this dest "
				"register is the same as the one of the "
				"Misplaced Instr (hence, a WAW dependence is broken "
				"and the program would become incorrect "
				"otherwise).\n");

				/* We update the dest register of Ipredicate (predicate)
				due to the conflict with I2, which we move before it. */
				/*
				if (destRegisterPredicateOfSplitWhere != -1)
				Ipredicate->getOperand(0).setReg(destRegisterPredicateOfSplitWhere);
				else
				Ipredicate->getOperand(0).setReg(CONNEX_RESERVED_REGISTER_01);
				*/
				Ipredicate->getOperand(0).setReg(CONNEX_RESERVED_REGISTER_02);
				//
				updateUsesOfRegUntilMisplacedInstr(Ipredicate,
				I2, // Misplaced Instr
				IE, I2->getOperand(0).getReg(),
				CONNEX_RESERVED_REGISTER_02);
				}

				// We move the Misplaced Instr instruction before the predicate
				MBB.remove((&(*I2)));
				// MBB.insert(IMI, I2); // It inserts before IMI
				#ifdef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				MBB.insert(MachineBasicBlock::iterator(IMI),
				(&(*I2))); // It inserts immediately before the WHERE instr
				#else
				MBB.insert(Ipredicate, (&(*I2))); // It inserts before Ipredicate
				#endif
				changedMF = true;

				// We handle the case of more than 1 Misplaced Instr instr in WHERE block
				// I2plus1 represents the next instr after the Misplaced Instr (before move)
				I2 = I2plus1;
				} // End putMisplacedInstrBeforeWhereBlock()

				inline void
				splitWhereBlock(MachineBasicBlock &MBB, const TargetInstrInfo *TII,
				MachineBasicBlock::iterator &I, MachineInstr *&IMI,
				MachineBasicBlock::iterator &I2, // Misplaced Instr instr
				MachineBasicBlock::iterator &IE, bool &changedMF,
				int &destRegisterPredicateOfSplitWhere) {
				/* This case handles only the cases we ran so far.
				See MEGA-TODO for limitation of this case. */
				changedMF = true;

				LLVM_DEBUG(dbgs() << " splitWhereBlock(): IMI = " << *IMI);
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): I2 = " << *I2 << "\n");

				/* TODO: handle case
				where we have Misplaced Instr between 2 instr like ADD and
				ADDC, which is incorrect because the Misplaced Instr messes
				up the Connex flags. */
				MachineBasicBlock::iterator I2plus1 = I2;
				I2plus1++;
				// I think this does NOT cover all cases but most of them
				assert(
				I2plus1->getOpcode() != Connex::ADDCV_H &&
				I2plus1->getOpcode() != Connex::SUBCV_H &&
				I2plus1->getOpcode() != Connex::ADDCV_SPECIAL_H &&
				I2plus1->getOpcode() != Connex::SUBCV_SPECIAL_H &&
				"We do NOT handle yet ADDCV/SUBCV instructions immediately after "
				"Misplaced Instr for this case (and the corresponding ADD/SUB before "
				"the Misplaced Instr)");

				LLVM_DEBUG(dbgs() << " splitting WHERE block in 2 s.t. we put I2 "
				"immediately after new END_WHERE resulting from "
				"split.\n");
				// I = beginning of new WHERE block
				// const TargetInstrInfo *TII =
				// MF.getSubtarget<ConnexSubtarget>().getInstrInfo();

				MachineBasicBlock::iterator Ipredicate = IMI;
				// We make Ipredicate point to the predicate of this WHERE
				// block
				Ipredicate--;
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): Ipredicate = " << *Ipredicate
				<< "\n");
				assert(Ipredicate->getOpcode() == Connex::NOP_BPF);
				Ipredicate--;
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): Ipredicate (2 instr before) = "
				<< *Ipredicate << "\n");

				unsigned regDest = CONNEX_RESERVED_REGISTER_02;
				int changedPredicateOpnd = -1;

				// We check that Ipredicate, the predicate, has the expected operands
				// (3 operands in the standard case, 4 for the *_SPECIAL_H variants)
				assert(((
				// For the standard case:
				(Ipredicate->getOpcode() == Connex::EQ_H ||
				Ipredicate->getOpcode() == Connex::LT_H ||
				Ipredicate->getOpcode() == Connex::ULT_H) &&
				Ipredicate->getNumOperands() == 3) ||
				(
				// For disabled lane gating regions
				(Ipredicate->getOpcode() == Connex::EQ_SPECIAL_H ||
				Ipredicate->getOpcode() == Connex::LT_SPECIAL_H ||
				Ipredicate->getOpcode() == Connex::ULT_SPECIAL_H) &&
				Ipredicate->getNumOperands() == 4)) &&
				Ipredicate->getOperand(0).isReg() &&
				Ipredicate->getOperand(0).isDef() &&
				Ipredicate->getOperand(1).isReg() &&
				Ipredicate->getOperand(1).isUse() &&
				Ipredicate->getOperand(2).isReg() &&
				Ipredicate->getOperand(2).isUse());

				unsigned predicateInstrOpnd[2];
				predicateInstrOpnd[0] = Ipredicate->getOperand(1).getReg();
				predicateInstrOpnd[1] = Ipredicate->getOperand(2).getReg();

				destRegisterPredicateOfSplitWhere = Ipredicate->getOperand(0).getReg();
				LLVM_DEBUG(
				dbgs()
				<< "PassHandleMisplacedInstr: destRegisterPredicateOfSplitWhere = "
				<< destRegisterPredicateOfSplitWhere << "\n");

				/*
				assert( (predicateInstrOpnd[0] != CONNEX_RESERVED_REGISTER_02) &&
				(predicateInstrOpnd[1] != CONNEX_RESERVED_REGISTER_02) &&
				// MEGA-TODO: implement this - it happens for ADD/MUL.f16
				"We currently can't handle these cases because we have only 1 reserved "
				"register.");
				*/
				unsigned predicateInstrOpcode = Ipredicate->getOpcode();
				unsigned predicateInstrOpndAux[2];

				/* We look if predicateInstrOpnd[*] is updated/redefined
				either in the predicate instruction or in the
				instructions of the
				associated WHERE block before the Misplaced Instr instr.
				- i.e., if predicateInstrOpnd[1] changes then
				use it as predicateInstrOpnd[0].
				If NO change happens we do NOT need to save the
				value of predicateInstrOpnd[*], i.e., to create
				ORV_H below.

				We check this from Ipredicate(+1) (next instr after predicate) to I2(-1)
				(Misplaced Instr instr, exclusive).
				We check if any of the operands of the predicate change.
				NOTE: assert (if both change - we don't want to waste by reserving 2
				Connex registers - maybe we can change the Connex ASM code by hand
				to avoid this).
				*/
				/*
				if (Ipredicate->getOperand(0).getReg() ==
				Ipredicate->getOperand(1).getReg()) {
				// We changed the 1st input operand of the predicate
				changedPredicateOpnd = 0;
				}
				else
				if (Ipredicate->getOperand(0).getReg() ==
				Ipredicate->getOperand(2).getReg()) {
				// We changed the 2nd input operand of the predicate
				changedPredicateOpnd = 1;
				}
				*/

				MachineBasicBlock::iterator Iaux = Ipredicate;
				// Iaux++;
				MachineBasicBlock::iterator IauxEnd = I2; // I2 is Misplaced Instr

				IauxEnd++; // TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS
				// IauxEnd--;

				/* Important: for the NEW predicate we don't care what we use for the
				destination register.

				We now check for the NEW predicate we create for the split if its input
				operands are updated between the
				original_predicate..Misplaced Instr */
				for (; Iaux != IauxEnd && Iaux != IE; Iaux++) {
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): Iaux = " << *Iaux << "\n");
				if (Iaux->getNumOperands() >= 1 && Iaux->getOperand(0).isReg() &&
				Iaux->getOperand(0).isDef()) {
				if (Iaux->getOperand(0).getReg() == predicateInstrOpnd[0]) {
				assert((changedPredicateOpnd == -1 || changedPredicateOpnd == 0) &&
				// MEGA-TODO: handle this assert violation case
				"It seems both input operands of the "
				"predicate get updated so we would need to "
				"reserve 2 Connex registers to handle well "
				"this case.");
				// We find that we subsequently change the 1st input operand of
				// the predicate
				changedPredicateOpnd = 0;
				} else if (Iaux->getOperand(0).getReg() == predicateInstrOpnd[1]) {
				/* We find that we subsequently change
				the 2nd input operand of the predicate */
				assert((changedPredicateOpnd == -1 || changedPredicateOpnd == 1) &&
				// MEGA-TODO: handle this assert violation case
				"It seems both input operands of the "
				"predicate get updated so we would need "
				"to reserve 2 Connex registers to handle "
				"well this case.");
				changedPredicateOpnd = 1;
				}
				}
				}

				LLVM_DEBUG(dbgs() << " changedPredicateOpnd = " << changedPredicateOpnd
				<< " (for the input operands of the predicate)\n");

				if (changedPredicateOpnd == -1) {
				// regDest = predicateInstrOpnd[0];
				predicateInstrOpndAux[0] = predicateInstrOpnd[0];
				predicateInstrOpndAux[1] = predicateInstrOpnd[1];
				} else {
				/* Put a copy of the changed input register of the predicate instruction
				before Ipredicate, the initial predicate of this WHERE block. */
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				if (regDest != predicateInstrOpnd[changedPredicateOpnd]) {
				BuildMI(MBB, Ipredicate, IMI->getDebugLoc(), TII->get(Connex::ORV_H),
				regDest)
				. // The reserved register, CONNEX_RESERVED_REGISTER_02
				addReg(predicateInstrOpnd[changedPredicateOpnd])
				.addReg(predicateInstrOpnd[changedPredicateOpnd]);
				}
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif

				/*
				predicateInstrOpndAux[0] = regDest; // Reserved register
				predicateInstrOpndAux[1] = predicateInstrOpnd[1 - changedPredicateOpnd];
				*/
				predicateInstrOpndAux[changedPredicateOpnd] =
				CONNEX_RESERVED_REGISTER_02; // regDest
				predicateInstrOpndAux[1 - changedPredicateOpnd] =
				predicateInstrOpnd[1 - changedPredicateOpnd];
				}

				LLVM_DEBUG(dbgs() << " predicateInstrOpndAux[0] = "
				<< predicateInstrOpndAux[0] << "\n");
				LLVM_DEBUG(dbgs() << " predicateInstrOpndAux[1] = "
				<< predicateInstrOpndAux[1] << "\n");

				MachineBasicBlock::iterator I2succ = I2;
				I2succ++;
				BuildMI(MBB,
				I2, // Immediately before the Misplaced Instr instr
				IMI->getDebugLoc(), TII->get(Connex::END_WHERE)
				//, I2->getOperand(0).getReg()
				);
				LLVM_DEBUG(dbgs() << " Finished creating the END_WHERE\n");

				/*
				// TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS

				// Ipredicate is predicate
				#if 0
				// Unnecessary check:
				assert(Ipredicate->getOperand(0).getReg() !=
				I2->getOperand(0).getReg());
				#endif

				// This check is actually VAGUELY different from the one above because
				// the one above inserts a register save (copy) instruction before the
				// original WHERE, while this new one after the new END_WHERE resulting
				// from the split.
				// Very Important Note: the new predicate WHERE can have the result stored
				// in RESERVED_REGISTER.
				// We now check for conflicts between:
				// - destination register operand of Misplaced Instr and
				// - input registers of predicate instruction.
				//
				// Note: I2 is the Misplaced Instr instruction that triggered the split of
				// WHERE block.
				//
				// Addressing the case, where after the split of WHERE* block we have
				// something like this immediately after the 1st new WHERE* block,
				// before the 2nd WHERE* block, where the repeated predicate instruction
				// (repeated by us) happens to use the register defined in the Misplaced
				// Instr instruction, which makes the computation incorrect:
				// END_WHERE;
				// R(26) = R(10) | R(10); // This COPY (Misplaced Instr) instruction is
				// // the reason of the split
				// R(30) = R(26) < R(3);
				// NOP
				// WHERE*
				//
				// Note: CONNEX_RESERVED_REGISTER_01 is a reserved register.
				//
				// To correct the problem in this example we have to copy the value of
				// R(26) in R(30):
				// END_WHERE;
				// R(30) = R(26) | R(26);
				// R(26) = R(10) | R(10); // This COPY (Misplaced Instr) instruction is
				// // the reason of the split
				// R(30) = R(30) < R(3);
				// NOP
				// WHERE*
				int changeInputPredicateOperandsDueToMisplacedInstr = 0;
				if (predicateInstrOpnd[0] == I2->getOperand(0).getReg()) {
				changeInputPredicateOperandsDueToMisplacedInstr |= 1;
				}
				if (predicateInstrOpnd[1] == I2->getOperand(0).getReg()) {
				changeInputPredicateOperandsDueToMisplacedInstr |= 2;
				}
				//
				assert(changeInputPredicateOperandsDueToMisplacedInstr != 3 &&
				// important-TODO: handle this assert violation case
				"We shouldn't have such a case - doesn't really make sense for a "
				"conditional to have both operands equal.");

				LLVM_DEBUG(dbgs() << " changeInputPredicateOperandsDueToMisplacedInstr = "
				<< changeInputPredicateOperandsDueToMisplacedInstr
				<< "\n");
				// assert(! (changedPredicateOpnd != -1 &&
				// changeInputPredicateOperandsDueToMisplacedInstr != 0) &&
				// // TODO: if not merging the 2 cases together, handle this assert
				// // violation case,
				// "We currently can't handle both cases simultaneously.");
				//
				if (changeInputPredicateOperandsDueToMisplacedInstr != 0) {
				LLVM_DEBUG(dbgs()
				<< " PassHandleMisplacedInstr::runOnMachineFunction(): correcting "
				"the conflicting register (due to the Misplaced Instr) in the "
				"predicate instruction\n");
				MachineBasicBlock::iterator Icorrect = I2succ;
				//Icorrect++;
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				BuildMI(MBB,
				Icorrect, // We insert this MachineInstr after new END_WHERE,
				// before Misplaced Instr instr
				IMI->getDebugLoc(),
				TII->get(Connex::ORV_H),
				CONNEX_RESERVED_REGISTER_02).
				addReg(I2->getOperand(0).getReg()).
				addReg(I2->getOperand(0).getReg());
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif


				// Note: Ipredicate is the predicate for the 1st (part) WHERE* block.
				// // Ipredicate->getOperand(1).setReg(CONNEX_RESERVED_REGISTER_02);

				LLVM_DEBUG(dbgs()
				<< "PassHandleMisplacedInstr: after WHERE block processed: MBB = ";
				MBB.dump());
				// We check that we don't mess up the program - TODO we should also check
				// that the iterators are not messed up
				// for (MachineBasicBlock::iterator Inew = MBB.begin(),
				// IEnew = MBB.end(); Inew != IEnew; ++Inew) {
				// //MachineInstr *IMI = I;
				// LLVM_DEBUG(dbgs() << " runOnMachineFunction(): Inew = "
				// << *Inew << "\n");
				// }
				}
				*/ // End comment (TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS)

				// I2succ++;
				LLVM_DEBUG(dbgs() << " moving I2 immediately after END_WHERE of "
				"split WHERE block\n");

				// Very Important: We create another predicate, a NOP and a new WHERE*
				// instructions, identical with the (previous) one associated to the
				// WHERE block, EXCEPT the destination register is
				// CONNEX_RESERVED_REGISTER_01 - this is safe.
				BuildMI(
				MBB,
				I2succ, // We insert new instr immediately before I2succ
				IMI->getDebugLoc(), TII->get(predicateInstrOpcode),
				CONNEX_RESERVED_REGISTER_01 // TODO: (2020_12_09) prove it's correct: it
				// looks we can use here also _01 reg (prove
				// by exploring all cases) instead of
				// CONNEX_RESERVED_REGISTER_03
				/*
				// destRegisterPredicateOfSplitWhere is made -1 only after
				// iterating over END_WHERE, below
				destRegisterPredicateOfSplitWhere != -1 ?
				destRegisterPredicateOfSplitWhere :
				regDest // It is CONNEX_RESERVED_REGISTER_02
				*/
				)
				.
				// We now change the conflicting register in the predicate
				// instruction.
				addReg((changedPredicateOpnd == 0)
				?
				/* // (! TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS)
				addReg(((changeInputPredicateOperandsDueToMisplacedInstr & 1)
				== 1) ?
				*/
				(unsigned)CONNEX_RESERVED_REGISTER_02
				: predicateInstrOpndAux[0])
				. // predicateInstrOpnd1).
				addReg((changedPredicateOpnd == 1)
				?
				/* // (! TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS)
				addReg(((changeInputPredicateOperandsDueToMisplacedInstr & 2)
				== 2) ?
				*/
				(unsigned)CONNEX_RESERVED_REGISTER_02
				: predicateInstrOpndAux[1]);

				BuildMI(MBB, I2succ, IMI->getDebugLoc(), TII->get(Connex::NOP_BPF));
				// TODO: maybe add an addImm(0)?, although it works without

				// We add the same WHERE instr as the one for this block
				/* This gives the following error:
				<<Assertion `!N->getParent() &&
				"machine instruction already in a basic block"' failed.>>
				MBB.insert(I2succ, IMI); // before I2succ
				*/
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): IMI (for split) = " << IMI
				<< "\n");

				LLVM_DEBUG(dbgs() << " splitWhereBlock(): I2succ = " << I2succ << "\n");

				/*
				IMI = I2succ;
				LLVM_DEBUG(dbgs() << " IMI = I2succ = "
				<< *IMI << "\n");
				// Important: This makes IMI NULL since IMI is a MachineInstr
				// - see 35l_MatMul_f16/SIZE_128/L_required_manual_move_fills/
				// STDerr_llc_01_old17
				IMI--;
				*/
				// From http://llvm.org/doxygen/MachineInstrBuilder_8h_source.html#l00312:
				// "inserts the newly-built instruction before the given position".
				// See good comments on iterator invalidation at:
				// http://llvm.1065342.n5.nabble.com/
				// deleting-or-replacing-a-MachineInst-td77723.html
				I = BuildMI(MBB,
				I2succ, // We insert new instr immediately before I2succ
				IMI->getDebugLoc(), TII->get(IMI->getOpcode())
				//, regDest
				);

				// TODO: understand if it generates (due to iterator invalidation??) another
				// END_WHERE - see Tests/DawnCC/25k_map_i32/MUL_i32/ (output_old06.cpp?)

				// NOTE: I is the new WHERE* instruction just created
				// We update I2 to check for more Misplaced Instr instrs after the new
				// created WHERE
				I2 = I;
				I2++;

				// We update IMI since we insert Misplaced Instr before the predicate of
				// WHERE using IMI
				IMI = (&(*I));

				// MachineBasicBlock::iterator Iaux10 = I2succ; Iaux10--;
				LLVM_DEBUG(dbgs() << " I2succ = " << I2succ << "\n");
				LLVM_DEBUG(dbgs() << " IMI = " << IMI << "\n");
				LLVM_DEBUG(dbgs() << " I = " << I << "\n");
				LLVM_DEBUG(dbgs() << " I2 = " << I2 << "\n");

				// break;
				// assert();
				LLVM_DEBUG(dbgs() << " To check: IMI = " << IMI << "\n");

				LLVM_DEBUG(
				dbgs() << "splitWhereBlock(): after splitting WHERE block in 2: MBB = ";
				MBB.dump());
				} // End splitWhereBlock()

				/*
				* Note: The structure of the loop nest with iterators is:
				* I = main loop iterating over all instr of the MBB
				* IMI = I;
				* I2
				* if IMI == WHERE*
				* I2 = I + 1;
				* for (;; I2++) // <--- here starts handleMisplacedInstrs()
				* if I2 == ORV_H (or whatever is used to implement the COPY
				* (Misplaced Instr) primitive)
				* for (I3 = IMI + 1; ; I3++) // used to compute whatToDo;
				if I3 == END_WHERE
				break;
				compute whatToDo;
				*/
				inline void
				handleMisplacedInstrs(MachineBasicBlock &MBB, const TargetInstrInfo *TII,
				MachineBasicBlock::iterator &I, MachineInstr *&IMI,
				MachineBasicBlock::iterator &I2,
				// Misplaced Instr
				MachineBasicBlock::iterator &IE, bool &changedMF,
				int &destRegisterPredicateOfSplitWhere) {
				LLVM_DEBUG(dbgs() << "Entered handleMisplacedInstrs()\n");

				// Iterating over all remaining instructions of the BB
				for (; I2 != IE; /* I2++ */) {
				LLVM_DEBUG(dbgs() << " I2 = " << *I2);

				// TO_ADAPT: currently copyPhysReg() is implemented with ORV_H

				// Important: NORMALLY, inside WHERE blocks generated
				// with OPINCAA lib's Kernel::genLLVMISelManualCode(),
				// we are guaranteed to have only ORV_SPECIAL_H Connex
				// instructions, so we meet an ORV_H only when a Misplaced Instr
				// was generated by the TwoAddressInstructionPass.
				if (
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				I2->getOpcode() == Connex::ORV_H
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif
				|| I2->getOpcode() == Connex::LD_FILL_H) {
				// MEGA-TODO: || I2->getOpcode() == Connex::ST_FILL_H

				// The ORV_H instruction implemented in copyPhysReg()
				// has both input operands equal.
				// NOTE: the destination register of any instruction
				// I is I->getOperand(0).

				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				if (I2->getOpcode() == Connex::ORV_H)
				assert(I2->getOperand(1).getReg() == I2->getOperand(2).getReg() &&
				"I2 is an ORV_H with different input operands. "
				"Maybe too paranoid check: We do not "
				"recommend to have emulation OPINCAA kernels "
				"generated by Kernel::genLLVMISelManualCode() "
				"with ORV_H inside WHERE blocks (if these "
				"instructions come from there). But you "
				"can comment this assert and issue a simple "
				"warning.");
				/*
				if (I2->getOperand(1).getReg() !=
				I2->getOperand(2).getReg())
				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: Warning: "
				"I2->getOperand(1).getReg() != "
				"I2->getOperand(2).getReg()\n\n");
				*/
				#endif // COPY_REGISTER_IMPLEMENTED_WITH_ORV_H

				// From http://llvm.org/doxygen/MachineBasicBlock_8h_source.html:
				// MBB::insert(iterator, MI)
				// "Insert MI into the instruction list before I, possibly inside a
				// bundle."
				LLVM_DEBUG(dbgs() << " found Misplaced Instr (COPY/LD_FILL) at I2 = "
				<< *I2
				<< " --> moving it out of the WHERE block to "
				"preserve correct program semantics.\n");

				// We should move I2 before or after the WHERE block,
				// or split the WHERE block in 2.
				/* The algo is (a sketch that MIGHT NOT reflect
				totally the implementation):
				NOTE: this is the case that allows having Misplaced Instr between
				predicate and WHERE instr.
				If the Misplaced Instr doesn't use (doesn't have as source)
				a register defined in the WHERE block
				BEFORE the Misplaced Instr (NO RAW/flow dependence relation
				to be broken)
				and also the Misplaced Instr doesn't define a register
				that is used by an instruction before
				(NO WAR/anti-dependence relation to be broken):
				We move the Misplaced Instr exactly before the
				WHERE instruction starting the block
				Else
				If the Misplaced Instr doesn't use (doesn't have as source)
				a register defined in the WHERE block,
				after the Misplaced Instr (NO WAR dep broken)
				and also the Misplaced Instr doesn't define a register
				used by an instruction after it (NO RAW dep broken):
				We move the Misplaced Instr exactly after the END_WHERE
				instruction ending the block
				Else
				Moving the Misplaced Instr immediately before/after
				the WHERE block is UNsafe and
				would change the program semantics.
				The solution is to split the WHERE block in
				two and, for the 2nd WHERE block, to copy the
				predicate (together with a NOP) just
				before it.
				*/

				#ifdef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				MachineBasicBlock::iterator I3 = IMI; // IMI is WHERE instr
				LLVM_DEBUG(dbgs() << " I3 = " << I3 << "\n");

				I3--;
				LLVM_DEBUG(dbgs() << " I3 (after 1 -)= " << I3 << "\n");

				assert(I3->getOpcode() == Connex::NOP ||
				I3->getOpcode() == Connex::NOP_BPF);

				I3--;
				LLVM_DEBUG(dbgs() << " I3 (after 2 -)= " << I3 << "\n");
				assert(I3->getOpcode() == Connex::EQ_H ||
				I3->getOpcode() == Connex::LT_H ||
				I3->getOpcode() == Connex::ULT_H);
				#else
				MachineBasicBlock::iterator I3 = IMI; // IMI is WHERE instr
				I3++;
				#endif

				#define SAFE_SINCE_NO_CONSTRAINT 0
				#define UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK 1
				#define UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK 2
				#define SAFE_TO_PUT_COPY_IN_SPLIT_WHERE_BLOCK 3
				int whatToDo = SAFE_SINCE_NO_CONSTRAINT;

				// bool I2afterIsInsideWhereBlock = true;
				bool I3IsBeforeI2 = true;

				// Remember: I2 points to the Misplaced Instr instruction
				for (; I3 != IE; I3++) {
				if (I3->getOpcode() == Connex::END_WHERE) {
				break;
				}

				LLVM_DEBUG(dbgs() << " I3 = " << I3);

				if (I3 == I2) {
				I3IsBeforeI2 = false;
				continue;
				}
				LLVM_DEBUG(dbgs() << " I3IsBeforeI2 = " << I3IsBeforeI2 << "\n");

				// We look at all operands of instruction I3
				// Note: I3->getOperand(0) is result of I3; the rest are inputs.
				for (unsigned idOpnd = 0; idOpnd < I3->getNumOperands(); idOpnd++) {
				MachineOperand &I3Opnd = I3->getOperand(idOpnd);

				LLVM_DEBUG(dbgs() << " I3Opnd (index = " << idOpnd
				<< ") = " << I3Opnd << "\n");

				if (I3Opnd.isReg() && I3Opnd.isUse()) {
				// Remember: I2 points to the Misplaced Instr instruction
				if (I3Opnd.getReg() == I2->getOperand(0).getReg()) {
				if (I3IsBeforeI2) {
				// WAR (anti-) dependence w.r.t. Misplaced Instr (I2), which
				// writes: I3 uses the dst-register of I2 (the Misplaced
				// Instr)
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"uses (WAR dependence) the "
				"dst-register of I2 "
				"--> moving I2 before the "
				"WHERE block is NOT safe\n");

				whatToDo |= UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				/*
				LLVM_DEBUG(dbgs() << " changing I2afterOpnd's reg to = "
				<< I2->getOperand(0).getReg() << "\n");
				I2afterOpnd.setReg(I2->getOperand(1).getReg());
				*/
				} else { // NOT I3IsBeforeI2
				// RAW dependence w.r.t. Misplaced Instr (I2), which writes
				// I3 uses the dst-register of I2 (the Misplaced Instr)
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"uses (RAW dependence) the dst-register "
				"of I2 --> moving I2 after the "
				"WHERE block is NOT safe\n");

				whatToDo |= UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				} else
				// Although we are safe on the else branch,
				// we put this code here for "completeness".
				if (
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				I2->getOpcode() == Connex::ORV_H &&
				#endif
				I3Opnd.getReg() == I2->getOperand(1).getReg()) {
				// RAR dependence - none actually :)
				if (I3IsBeforeI2) {
				// I3 uses the src-register of I2 (the Misplaced Instr)
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"uses (RAR dependence) the "
				"src-register of I2 "
				"--> everything is safe\n");

				// whatToDo |= UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				} else {
				// I3 uses the src-register of I2 (the Misplaced Instr)
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"uses (RAR dependence) the "
				"src-register of I2 "
				"--> everything is safe\n");

				// whatToDo |= UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				}
				} // End I3Opnd.isUse()
				else if (I3Opnd.isReg() && I3Opnd.isDef()) {
				// Remember: I2 points to the Misplaced Instr
				if (I3Opnd.getReg() == I2->getOperand(0).getReg()) {
				if (I3IsBeforeI2) {
				// WAW dependence w.r.t. Misplaced Instr (I2), which writes
				// I3 defs the dst-register of I2 (the Misplaced Instr instr)
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"defs (WAW dependence) the "
				"dst-register of I2 --> "
				"moving I2 before the "
				"WHERE block is NOT safe\n");

				whatToDo |= UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				} else {
				// WAW dependence w.r.t. Misplaced Instr (I2), which writes
				// I3 defs the dst-register of I2 (the Misplaced Instr instr)
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"defs (WAW dependence) the "
				"dst-register of I2 --> "
				"moving I2 after the "
				"WHERE block is NOT safe\n");

				whatToDo |= UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				} else if (
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				I2->getOpcode() == Connex::ORV_H &&
				#endif
				I3Opnd.getReg() == I2->getOperand(1).getReg()) {
				if (I3IsBeforeI2) {
				// RAW dependence: I3 writes the src-register that I2 (the
				// Misplaced Instr) reads afterwards
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"defs (RAW dependence) the src-register "
				"of I2 --> moving I2 before the "
				"WHERE block is NOT safe\n");

				whatToDo |= UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				} else {
				// WAR (read-before-write) dependence: I3 overwrites the
				// src-register that I2 (the Misplaced Instr) has read before it
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"defs (WAR dependence) the src-register "
				"of I2 --> moving I2 after the "
				"WHERE block is NOT safe\n");

				whatToDo |= UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				}
				} // End I3Opnd.isDef()
				} // End for loop idOpnd
				} // End for loop with ind-var I3

				/*
				* Note: The structure of the loop nest with iterators is:
				* I = main loop iterating over all instr of the MBB
				* IMI = I;
				* I2
				* if IMI == WHERE*
				* I2 = I + 1;
				* for (;; I2++) // <--- here starts handleMisplacedInstrs()
				* if I2 == ORV_H (or whatever is used to implement the COPY
				* (Misplaced Instr) primitive)
				* for (I3 = IMI + 1; ; I3++) // used to compute whatToDo;
				* if I3 == END_WHERE
				* break;
				* compute whatToDo;
				*/

				MachineBasicBlock::iterator I2plus1 = I2;
				//
				// We need to increment it, otherwise it looks that
				// I2 and I2plus1 are identical after remove()
				// and insert()
				I2plus1++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2plus1 = " << *I2plus1
				<< "\n");
				LLVM_DEBUG(
				dbgs() << " runOnMachineFunction(): I2 (before moving I2) = "
				<< *I2 << "\n");
				LLVM_DEBUG(dbgs() << " whatToDo = " << whatToDo << "\n");

				if ( // whatToDo == SAFE_SINCE_NO_CONSTRAINT ||
				whatToDo == UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK) {
				// Case 1 from paper
				LLVM_DEBUG(dbgs() << " Case 1 from paper --> calling "
				"putMisplacedInstrBeforeWhereBlock()");

				// Moving Misplaced Instr before the WHERE block.
				putMisplacedInstrBeforeWhereBlock(MBB, TII, IMI, I2, I2plus1, IE,
				changedMF,
				destRegisterPredicateOfSplitWhere);
				// break;

				} // End moving I2 just before logical instr linked to WHERE block
				else if (
				// We treat here SAFE_SINCE_NO_CONSTRAINT because moving after WHERE
				// block doesn't add any auxiliary instruction
				whatToDo == SAFE_SINCE_NO_CONSTRAINT ||
				whatToDo == UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK) {
				// Case 2 from paper
				// TODO: we should put multiple Misplaced Instr instructions from
				// this WHERE block in the SAME order after END_WHERE. See if such
				// cases happen.
				LLVM_DEBUG(dbgs() << " moving I2 immediately after WHERE block "
				"(Case 2 from paper)\n");
				assert(I3 != IE);

				LLVM_DEBUG(dbgs()
				<< " runOnMachineFunction(): *I2 = " << *I2 << "\n");

				// I3 is pointing to END_WHERE (see code above)
				LLVM_DEBUG(dbgs()
				<< " runOnMachineFunction(): *I3 = " << *I3 << "\n");

				assert((I3->getOpcode() == Connex::END_WHERE) &&
				"I3 should point to END_WHERE (see code above).");
				/*
				assert( (I3->getOpcode() == Connex::WHEREEQ ||
				I3->getOpcode() == Connex::WHERELT ||
				I3->getOpcode() == Connex::WHERECRY) &&
				"We should NOT be arriving here otherwise.");
				*/

				I3++; // Jump over END_WHERE (normally)
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): *I3 (after I3++) = "
				<< *I3 << "\n");

				LLVM_DEBUG(dbgs()
				<< " runOnMachineFunction(): Preparing to remove *I2 = "
				<< *I2 << " and moving it before *I3 = " << *I3
				<< "\n");
				MBB.remove((&(*I2)));
				MBB.insert(I3, (&(*I2))); // It inserts before I3

				/*
				// This is NOT good for case where we have 2+ Misplaced Instrs
				// instrs in the WHERE block: I = I3;
				// I2++;
				// I = I2;
				*/
				LLVM_DEBUG(dbgs()
				<< " runOnMachineFunction(): *I2 (after moving I2) = "
				<< *I2 << "\n");
				// I2plus1++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): *I2plus1 = "
				<< *I2plus1 << "\n");

				// Here we handle the case of more than 1 Misplaced Instr
				// instr in the WHERE block (I2plus1 represents the next
				// instr after the Misplaced Instr (before move))
				I2 = I2plus1;

				MachineBasicBlock::iterator I2plus2 = I2plus1;
				I2plus2++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): *I2plus2 = "
				<< *I2plus2 << "\n");

				changedMF = true;
				// This is NOT good for case where we have 2+ Misplaced Instrs
				// instrs in the WHERE block: break;
				// We keep searching with I2 for loop in this WHERE block
				// for more Misplaced Instrs.
				} // End if (whatToDo == UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK)
				else if (whatToDo == SAFE_TO_PUT_COPY_IN_SPLIT_WHERE_BLOCK) {
				// Case 3 from paper
				LLVM_DEBUG(dbgs()
				<< " Case 3 from paper --> calling splitWhereBlock()");
				splitWhereBlock(MBB, TII, I, IMI, I2, IE, changedMF,
				destRegisterPredicateOfSplitWhere);
				LLVM_DEBUG(dbgs() << " After calling splitWhereBlock(): *IMI = "
				<< *IMI << "\n");
				} // End if SPLIT WHERE block
				else
				// Important: we increment here the iterator over instruction in
				// WHERE block
				I2++;
				} // End if (I2->getOpcode() == Connex::ORV_H)
				else {
				// Important: we increment here the iterator over instruction in
				// WHERE block
				I2++;
				// else
				}

				// Note that the END_WHERE takes input node and has a value output
				if (I2->getOpcode() == Connex::END_WHERE) {
				LLVM_DEBUG(dbgs() << " found END_WHERE --> breaking I2 loop\n");
				I2++;
				I = I2;

				// MEGA-TODO: think if OK here
				destRegisterPredicateOfSplitWhere = -1;

				LLVM_DEBUG(
				dbgs() << " Making destRegisterPredicateOfSplitWhere = -1\n");

				break;
				}

				LLVM_DEBUG(
				dbgs() << "PassHandleMisplacedInstr: at end of for loop I2, *I2 = "
				<< *I2 << " and *IMI = " << *IMI);
				} // End for loop with ind-var I2
				} // End handleMisplacedInstrs()

				/// \brief Loop over all of the basic blocks
				bool runOnMachineFunction(MachineFunction &MF) {
				bool changedMF = false;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				LLVM_DEBUG(
				dbgs() << "Entered PassHandleMisplacedInstr::runOnMachineFunction(MF = "
				//; MF.dump();
				<< MF.getName()
				// dbgs()
				<< ")\n");
				// bool Changed = false;

				// Process all basic blocks.
				for (auto &MBB : MF) {
				// int anotherReservedRegister = -1;
				int destRegisterPredicateOfSplitWhere = -1;

				// For the current MBB:
				// See llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				LLVM_DEBUG(
				dbgs()
				<< "PassHandleMisplacedInstr::runOnMachineFunction(): a new MBB = "
				<< MBB << "\n");

				const TargetInstrInfo *TII =
				MF.getSubtarget<ConnexSubtarget>().getInstrInfo();

				// See llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				LLVM_DEBUG(
				dbgs()
				<< "PassHandleMisplacedInstr::runOnMachineFunction(): again MBB = "
				<< MBB << "\n");

				for (MachineBasicBlock::iterator I = MBB.begin(), IE = MBB.end(); I != IE;
				++I) {
				MachineInstr *IMI = (&(*I));
				/*
				if (IMI == &MI)
				I++;
				// predMI contains normally instruction VLOAD_H_SYM_IMM
				break;
				// predMI = I;
				*/
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): *I = " << *I << "\n");
				LLVM_DEBUG(
				dbgs() << " runOnMachineFunction(): DontTreatCopyInstructions = "
				<< DontTreatCopyInstructions << "\n");

				if (DontTreatCopyInstructions == false) {
				// Important: we move the Misplaced Instr instructions outside
				// the WHERE block, just like the ARM/Thumb2ITBlockPass.cpp
				// does (the ARM pass is also registered in addPreSched2()).
				// Note that moving Misplaced Instrs before WHERE (ARM IT) blocks
				// (as it seems ARM surprisingly is doing, since
				// MBB::insert(iterator, MI) does "Insert MI into the
				// instruction list before I, possibly inside a bundle.")
				// can change semantics in most cases.

				// Important: First we remove any Misplaced Instrs
				// generated by the TwoAddressInstructionPass and not erased
				// by RegisterCoalescer (transformed
				// into ORV_H) instructions inside WHERE* blocks.
				// This is to handle cases like sequences of manually
				// selected instructions in ConnexISelDAGToDAG for MULi32,
				// DIVi16, etc.
				if (IMI->getOpcode() == Connex::WHEREEQ ||
				IMI->getOpcode() == Connex::WHERELT ||
				IMI->getOpcode() == Connex::WHERECRY) {
				LLVM_DEBUG(dbgs() << "runOnMachineFunction(): found WHERE block\n");

				// Removing useless COPY immediately before WHERE* block
				// (between NOP and WHERE*, where it should normally be put).
				// It is useless - we eye-balled seriously on a few
				// programs, most notably SSD.f16 on Jul 29-30 2018
				// (I guess - MEGA-TODO: check if so) always because it is
				// generated by the WHERE* instruction and,
				// therefore, it's NOT required.
				// important-TODO: we should take care of COPY
				// instructions being moved by the post-RA scheduler.
				MachineBasicBlock::iterator ItmpToErase = IMI;
				ItmpToErase--;
				if (ItmpToErase->getOpcode() != Connex::NOP_BPF
				// || ItmpToErase->getOpcode() == Connex::NOP
				) {
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				if (ItmpToErase->getOpcode() == Connex::ORV_H) {
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif
				MachineInstr *Iremove = (&(*ItmpToErase));
				// ItmpToErase--;

				// We assert this COPY is related to the WHERE*
				// instruction - if NOT, then the COPY was moved
				// probably by the post-RA scheduler here.
				assert(Iremove->getOperand(0).isReg() &&
				Iremove->getOperand(0).isDef() &&
				Iremove->getOperand(0).getReg() ==
				IMI->getOperand(0).getReg());

				// Checking that it is really safe to remove this COPY
				// since it is not used by any instruction after it.
				MachineBasicBlock::iterator Icheck = I;
				//
				// We jump over the WHERE* instruction found
				Icheck++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): Icheck = "
				<< *Icheck << "\n");
				// Iterating over all remaining instructions of the BB
				for (; Icheck != IE; Icheck++) {
				LLVM_DEBUG(dbgs() << " Icheck = " << *Icheck);
				if (Icheck->getNumOperands() > 0 &&
				Icheck->getOperand(0).isReg() &&
				Icheck->getOperand(0).getReg() ==
				Iremove->getOperand(0).getReg()) {
				// It normally has to be a def - if it's a use it's bad
				assert(
				Icheck->getOperand(0).isDef() &&
				"PassHandleMisplacedInstr: Found a 'useless' COPY "
				"that is not useless since it is used after... - "
				"this is not good --> change ConnexTargetMachine.cpp");
				break;
				}
				}

				LLVM_DEBUG(dbgs() << " Removing useless COPY immediately "
				"before the WHERE block.\n");

				MBB.remove(Iremove);
				}
				}

				MachineBasicBlock::iterator I2 = I; // + 1;
				// We jump over the WHERE* instruction found
				I2++;
				LLVM_DEBUG(dbgs()
				<< " runOnMachineFunction(): *I2 = " << *I2 << "\n");

				// continue;

				handleMisplacedInstrs(MBB, TII, I, IMI,
				I2, // Misplaced Instr
				IE, changedMF,
				destRegisterPredicateOfSplitWhere);

				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: after WHERE "
				"block processed: MBB = ";
				MBB.dump());
				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: *IMI = " << *IMI);
				} // End if WHERE*
				} // End if (DontTreatCopyInstructions == false)
				} // End for (MachineBasicBlock::iterator I

				} // End for (auto &MBB : MF)

				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): changedMF = " << changedMF
				<< "\n");

				return changedMF; // indicates if we changed MF
				} // end runOnMachineFunction(MachineFunction &MF)

				private:
				MachineRegisterInfo *MRI;

				static char ID;
				}; // End class PassHandleMisplacedInstr
				char PassHandleMisplacedInstr::ID = 0;

				} // End namespace

				// Important: We don't use bundles, since we avoid using the post-RA scheduler

				namespace llvm {
				FunctionPass *createPassHandleMisplacedInstr() {
				return new PassHandleMisplacedInstr();
				}
				} // namespace llvm

				namespace {

				// Connex Code Generator Pass Configuration Options.
				class ConnexPassConfig : public TargetPassConfig {
				public:
				ConnexPassConfig(ConnexTargetMachine *TM, PassManagerBase &PM)
				: TargetPassConfig((LLVMTargetMachine &)(*TM), PM) {}

				ConnexTargetMachine &getConnexTargetMachine() const {
				return getTM<ConnexTargetMachine>();
				}

				// Important: if the methods defined below in this class are not
				// executed, llc fails with the error:
				// <<llc: target does not support generation of this file type!>>

				// bool addInstSelector() override;
				// Install an instruction selector pass using
				// the ISelDag to gen Connex code; also register extra passes.
				bool /* ConnexPassConfig:: */ addInstSelector() {
				// The registered pass is run immediately after the 1st List
				// scheduling, after the ISel pass registered above.
				// The reason it is NOT directly after the ISel pass is that it seems
				// that the 1st scheduling
				// pass is considered to be linked together with ISel.
				addPass(createConnexISelDag(getConnexTargetMachine()));

				return false;
				}

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetPassConfig.html
				// This method may be implemented by targets that want to run passes
				// immediately before register allocation.
				void addPreRegAlloc() {}

				void addPostRegAlloc() {}

				// From http://llvm.org/doxygen/classllvm_1_1TargetPassConfig.html:
				// <<This pass may be implemented by targets that want to run passes
				// immediately before machine code is emitted.>>
				void addPreEmitPass() {
				LLVM_DEBUG(dbgs() << "Entered ConnexPassConfig::addPreEmitPass().\n");

				addPass(createPassHandleMisplacedInstr());

				// Here we add a stand-alone hazard recognizer pass
				// Very Important: the post-RA hazard recognizer is called iff
				// we give:
				// llc -post-RA-scheduler ...
				addPass(&PostRAHazardRecognizerID);
				}
				}; // End class ConnexPassConfig

				} // end namespace

				TargetPassConfig *ConnexTargetMachine::createPassConfig(PassManagerBase &PM) {
				return new ConnexPassConfig(this, PM);
				}

				// Inspired from ARCTargetMachine.cpp
				TargetTransformInfo
				ConnexTargetMachine::getTargetTransformInfo(const Function &F) const {
				return TargetTransformInfo(ConnexTTIImpl(this, F));
				}
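
The dependence analysis and the three-way dispatch in handleMisplacedInstrs() ("Case 1", "Case 2" and "Case 3 from paper") can be summarised outside LLVM as a small decision table. The sketch below is only an illustration under stated assumptions: the numeric flag values, the OperandInfo struct, the Placement enum and both function names are hypothetical and are not taken from the patch; only the flag names and the rules themselves come from the code above.

```cpp
#include <cassert>

// Assumed flag values (hypothetical): chosen so that OR-ing the two UNSAFE
// bits gives the "split" value, which is what the comparisons in the pass
// suggest. The real definitions live elsewhere in the patch.
enum : unsigned {
  SAFE_SINCE_NO_CONSTRAINT = 0,
  UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK = 1,
  UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK = 2,
  SAFE_TO_PUT_COPY_IN_SPLIT_WHERE_BLOCK = 3
};

// Hypothetical summary of one register operand of an instruction I3 that
// sits in the same WHERE block as the misplaced copy I2.
struct OperandInfo {
  unsigned Reg;    // register number of the operand
  bool IsDef;      // true: I3 writes Reg, false: I3 reads Reg
  bool IsBeforeI2; // position of I3 relative to I2
};

enum class Placement { BeforeWhere, AfterWhere, SplitWhere };

// Constraint contributed by a single operand of I3.
unsigned constraintFor(const OperandInfo &Op, unsigned I2Dst, unsigned I2Src) {
  const unsigned Blocked = Op.IsBeforeI2
                               ? UNSAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK
                               : UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
  if (Op.Reg == I2Dst)
    return Blocked; // any use or def of I2's destination blocks that side
  if (Op.Reg == I2Src && Op.IsDef)
    return Blocked; // redefining I2's source blocks that side
  return SAFE_SINCE_NO_CONSTRAINT; // a plain read of the source (RAR) is harmless
}

// Dispatch on the OR of all per-operand constraints.
Placement choosePlacement(unsigned WhatToDo) {
  assert(WhatToDo <= SAFE_TO_PUT_COPY_IN_SPLIT_WHERE_BLOCK);
  if (WhatToDo == UNSAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK)
    return Placement::BeforeWhere; // Case 1
  if (WhatToDo == SAFE_TO_PUT_COPY_IN_SPLIT_WHERE_BLOCK)
    return Placement::SplitWhere;  // Case 3
  return Placement::AfterWhere;    // Case 2, also used when unconstrained
}
```

With these assumed values, a block in which one instruction before I2 reads I2's destination and another after I2 redefines I2's source accumulates both bits, so the only remaining option is to split the WHERE block.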

llvm/lib/Target/Connex/ConnexTargetTransformInfo.h

This file was added.

				//===-- ConnexTargetTransformInfo.h - Connex specific TTI -------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// This file contains a TargetTransformInfo::Concept conforming object specific
				/// to the Connex target machine. It uses the target's detailed information to
				/// provide more precise answers to certain TTI queries, while letting the
				/// target independent and default TTI implementations handle the rest.
				///
				//===----------------------------------------------------------------------===//

				// Inspired from XCore/XCoreTargetTransformInfo.h

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXTARGETTRANSFORMINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXTARGETTRANSFORMINFO_H

				#include "Connex.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/CodeGen/BasicTTIImpl.h"
				#include "llvm/CodeGen/TargetLowering.h"

				namespace llvm {

				class ConnexTTIImpl : public BasicTTIImplBase<ConnexTTIImpl> {
				typedef BasicTTIImplBase<ConnexTTIImpl> BaseT;
				typedef TargetTransformInfo TTI;
				friend BaseT;

				const ConnexSubtarget *ST;
				const ConnexTargetLowering *TLI;

				const ConnexSubtarget *getST() const {
				LLVM_DEBUG(dbgs() << "Entered getST()\n");
				return ST;
				}

				const ConnexTargetLowering *getTLI() const {
				LLVM_DEBUG(dbgs() << "Entered getTLI()\n");
				return TLI;
				}

				public:
				bool isLegalMaskedGather(Type *DataTy, Align Alignment) {
				// Inspired from X86TargetTransformInfo.cpp
				LLVM_DEBUG(dbgs() << "Entered isLegalMaskedGather()\n");

				/*
				// Some CPUs have better gather performance than others.
				// TODO: Remove the explicit ST->hasAVX512()?, That would mean we would only
				// enable gather with a -march.
				if (!(ST->hasAVX512() || (ST->hasFastGather() && ST->hasAVX2())))
				return false;

				// This function is called now in two cases: from the Loop Vectorizer
				// and from the Scalarizer.
				// When the Loop Vectorizer asks about legality of the feature,
				// the vectorization factor is not calculated yet. The Loop Vectorizer
				// sends a scalar type and the decision is based on the width of the
				// scalar element.
				// Later on, the cost model will estimate usage this intrinsic based on
				// the vector type.
				// The Scalarizer asks again about legality. It sends a vector type.
				// In this case we can reject non-power-of-2 vectors.
				// We also reject single element vectors as the type legalizer can't
				// scalarize it.
				if (isa<VectorType>(DataTy)) {
				unsigned NumElts = DataTy->getVectorNumElements();
				if (NumElts == 1 || !isPowerOf2_32(NumElts))
				return false;
				}
				Type *ScalarTy = DataTy->getScalarType();
				if (ScalarTy->isPointerTy())
				return true;

				if (ScalarTy->isFloatTy() || ScalarTy->isDoubleTy())
				return true;

				if (!ScalarTy->isIntegerTy())
				return false;

				unsigned IntWidth = ScalarTy->getIntegerBitWidth();
				return IntWidth == 32 || IntWidth == 64;
				*/

				Type *ScalarTy = DataTy->getScalarType();

				if (ScalarTy->isHalfTy())
				return true;

				if (ScalarTy->isIntegerTy()) {
				unsigned IntWidth = ScalarTy->getIntegerBitWidth();
				LLVM_DEBUG(dbgs() << "isLegalMaskedGather(): IntWidth = "
				<< IntWidth << "\n");
				return (IntWidth == 16) || (IntWidth == 32);
				}

				return false;
				}

				bool isLegalMaskedScatter(Type *DataType, Align Alignment) {
				LLVM_DEBUG(dbgs() << "Entered isLegalMaskedScatter()\n");

				// Inspired from X86TargetTransformInfo.cpp
				return isLegalMaskedGather(DataType, Alignment);
				}

				public:
				explicit ConnexTTIImpl(const ConnexTargetMachine *TM, const Function &F)
				: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl()),
				TLI(ST->getTargetLowering()) {
				LLVM_DEBUG(dbgs() << "Entered constructor ConnexTTIImpl()\n");
				}

				/*
				unsigned getNumberOfRegisters(bool Vector) {
				if (Vector) {
				return 0;
				}
				return 12;
				}
				*/
				};

				} // end namespace llvm

				#endif
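
The legality rule that isLegalMaskedGather() and isLegalMaskedScatter() apply depends only on the scalar element type, so it can be restated without any LLVM types. The following is a minimal sketch; the ScalarKind enum and the function name isLegalGatherElement are hypothetical stand-ins for the llvm::Type queries used in the header above.

```cpp
#include <cassert>

// Hypothetical stand-in for the element-type queries on llvm::Type.
enum class ScalarKind { Half, Integer, Float, Double, Pointer };

// Mirrors the active (non-commented) logic of isLegalMaskedGather():
// half-precision floats and 16- or 32-bit integers are accepted,
// everything else is rejected.
bool isLegalGatherElement(ScalarKind Kind, unsigned BitWidth) {
  if (Kind == ScalarKind::Half)
    return true;
  if (Kind == ScalarKind::Integer)
    return BitWidth == 16 || BitWidth == 32;
  return false;
}
```

This matches the summary of the patch: float, double and 64-bit integers are not supported, while i16 is native and i32 and f16 are emulated.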

llvm/lib/Target/Connex/MCTargetDesc/CMakeLists.txt

This file was added.

				add_llvm_component_library(LLVMConnexDesc
				ConnexMCTargetDesc.cpp
				ConnexAsmBackend.cpp
				ConnexInstPrinter.cpp
				ConnexMCCodeEmitter.cpp
				ConnexELFObjectWriter.cpp

				LINK_COMPONENTS
				ConnexInfo
				MC
				Support

				ADD_TO_COMPONENT
				Connex
				)

llvm/lib/Target/Connex/MCTargetDesc/ConnexAsmBackend.cpp

This file was added.

				//===-- ConnexAsmBackend.cpp - Connex Assembler Backend -------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/MC/MCAsmBackend.h"
				#include "llvm/MC/MCAssembler.h"
				#include "llvm/MC/MCContext.h"
				#include "llvm/MC/MCFixup.h"
				#include "llvm/MC/MCObjectWriter.h"
				#include "llvm/Support/EndianStream.h"
				#include <cassert>
				#include <cstdint>

				using namespace llvm;

				namespace {

				class ConnexAsmBackend : public MCAsmBackend {
				public:
				ConnexAsmBackend(support::endianness Endian) : MCAsmBackend(Endian) {}

				~ConnexAsmBackend() override = default;

				void applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,
				const MCValue &Target, MutableArrayRef<char> Data,
				uint64_t Value, bool IsResolved,
				const MCSubtargetInfo *STI) const override;

				std::unique_ptr<MCObjectTargetWriter>
				createObjectTargetWriter() const override;

				// No instruction requires relaxation
				bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,
				const MCRelaxableFragment *DF,
				const MCAsmLayout &Layout) const override {
				return false;
				}

				unsigned getNumFixupKinds() const override { return 1; }

				bool mayNeedRelaxation(const MCInst &Inst,
				const MCSubtargetInfo &STI) const override {
				return false;
				}

				bool writeNopData(raw_ostream &OS, uint64_t Count,
				const MCSubtargetInfo *STI) const override;
				};

				} // End anonymous namespace

				bool ConnexAsmBackend::writeNopData(raw_ostream &OS, uint64_t Count,
				const MCSubtargetInfo *STI) const {
				if ((Count % 8) != 0)
				return false;

				for (uint64_t i = 0; i < Count; i += 8)
				support::endian::write<uint64_t>(OS, 0x15000000, Endian);

				return true;
				}

				void ConnexAsmBackend::applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,
				const MCValue &Target,
				MutableArrayRef<char> Data, uint64_t Value,
				bool IsResolved,
				const MCSubtargetInfo *STI) const {
				if (Fixup.getKind() == FK_SecRel_4 || Fixup.getKind() == FK_SecRel_8) {
				// The Value is 0 for global variables, and the in-section offset
				// for static variables. Write to the immediate field of the inst.
				assert(Value <= UINT32_MAX);
				support::endian::write<uint32_t>(&Data[Fixup.getOffset() + 4],
				static_cast<uint32_t>(Value), Endian);
				} else if (Fixup.getKind() == FK_Data_4) {
				support::endian::write<uint32_t>(&Data[Fixup.getOffset()], Value, Endian);
				} else if (Fixup.getKind() == FK_Data_8) {
				support::endian::write<uint64_t>(&Data[Fixup.getOffset()], Value, Endian);
				} else if (Fixup.getKind() == FK_PCRel_4) {
				Value = (uint32_t)((Value - 8) / 8);
				if (Endian == support::little) {
				Data[Fixup.getOffset() + 1] = 0x10;
				support::endian::write32le(&Data[Fixup.getOffset() + 4], Value);
				} else {
				Data[Fixup.getOffset() + 1] = 0x1;
				support::endian::write32be(&Data[Fixup.getOffset() + 4], Value);
				}
				} else {
				assert(Fixup.getKind() == FK_PCRel_2);
				Value = (uint16_t)((Value - 8) / 8);
				support::endian::write<uint16_t>(&Data[Fixup.getOffset() + 2], Value,
				Endian);
				}
				}

				std::unique_ptr<MCObjectTargetWriter>
				ConnexAsmBackend::createObjectTargetWriter() const {
				return createConnexELFObjectWriter(0);
				}

				MCAsmBackend *llvm::createConnexAsmBackend(const Target &T,
				const MCSubtargetInfo &STI,
				const MCRegisterInfo &MRI,
				const MCTargetOptions &) {
				return new ConnexAsmBackend(support::little);
				}
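
The arithmetic in writeNopData() and in the PC-relative branches of applyFixup() implies fixed 8-byte instructions: padding is only possible in multiples of 8 bytes, and a byte distance is turned into an instruction count relative to the following instruction. A stand-alone sketch of that arithmetic is given below; the names InstrBytes, canPadWithNops, pcRelSlots and write32le are hypothetical helpers for illustration, not functions from the patch or from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Instruction size in bytes, as implied by the "% 8" and "/ 8" arithmetic.
constexpr uint64_t InstrBytes = 8;

// writeNopData() refuses padding that is not a whole number of instructions.
bool canPadWithNops(uint64_t Count) { return Count % InstrBytes == 0; }

// FK_PCRel_4 case: byte distance -> number of instructions to skip,
// counted from the instruction after the branch.
uint32_t pcRelSlots(uint64_t ByteDistance) {
  return static_cast<uint32_t>((ByteDistance - InstrBytes) / InstrBytes);
}

// Little-endian store of a 32-bit immediate into the instruction bytes.
void write32le(uint8_t *Dst, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Dst[I] = static_cast<uint8_t>(V >> (8 * I));
}
```

For example, a branch whose target lies 24 bytes after the branch itself is encoded with the immediate 2, and a target directly after the branch is encoded as 0.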

llvm/lib/Target/Connex/MCTargetDesc/ConnexELFObjectWriter.cpp

This file was added.

				//===-- ConnexELFObjectWriter.cpp - Connex ELF Writer ---------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/BinaryFormat/ELF.h"
				#include "llvm/MC/MCELFObjectWriter.h"
				#include "llvm/MC/MCFixup.h"
				#include "llvm/MC/MCObjectWriter.h"
				#include "llvm/MC/MCValue.h"
				#include "llvm/Support/ErrorHandling.h"
				#include <cstdint>

				using namespace llvm;

				namespace {

				class ConnexELFObjectWriter : public MCELFObjectTargetWriter {
				public:
				ConnexELFObjectWriter(uint8_t OSABI);

				~ConnexELFObjectWriter() override;

				protected:
				unsigned getRelocType(MCContext &Ctx, const MCValue &Target,
				const MCFixup &Fixup, bool IsPCRel) const override;
				};

				} // end anonymous namespace

				ConnexELFObjectWriter::ConnexELFObjectWriter(uint8_t OSABI)
				: MCELFObjectTargetWriter(/*Is64Bit*/ true, OSABI, ELF::EM_NONE,
				/*HasRelocationAddend*/ false) {}

				ConnexELFObjectWriter::~ConnexELFObjectWriter() {}

				unsigned ConnexELFObjectWriter::getRelocType(MCContext &Ctx,
				const MCValue &Target,
				const MCFixup &Fixup,
				bool IsPCRel) const {
				// Determine the type of the relocation
				switch ((unsigned)Fixup.getKind()) {
				default:
				llvm_unreachable("invalid fixup kind!");
				case FK_SecRel_8:
				return ELF::R_BPF_64_64;
				case FK_PCRel_4:
				case FK_SecRel_4:
				return ELF::R_BPF_64_32;
				case FK_Data_8:
				return ELF::R_BPF_64_64;
				case FK_Data_4:
				// .BTF.ext generates FK_Data_4 relocations for
				// insn offset by creating temporary labels.
				// The insn offset is within the code section and
				// already been fulfilled by applyFixup(). No
				// further relocation is needed.
				if (const MCSymbolRefExpr *A = Target.getSymA()) {
				if (A->getSymbol().isTemporary()) {
				MCSection &Section = A->getSymbol().getSection();
				const MCSectionELF *SectionELF = dyn_cast<MCSectionELF>(&Section);
				assert(SectionELF && "Null section for reloc symbol");

				// The reloc symbol should be in text section.
				unsigned Flags = SectionELF->getFlags();
				if ((Flags & ELF::SHF_ALLOC) && (Flags & ELF::SHF_EXECINSTR))
				return ELF::R_BPF_NONE;
				}
				}
				return ELF::R_BPF_64_32;
				}
				}

				std::unique_ptr<MCObjectTargetWriter>
				llvm::createConnexELFObjectWriter(uint8_t OSABI) {
				// Following https://reviews.llvm.org/D66259
				return std::make_unique<ConnexELFObjectWriter>(OSABI);
				}

llvm/lib/Target/Connex/MCTargetDesc/ConnexInstPrinter.h

This file was added.

				//===-- ConnexInstPrinter.h - Convert Connex MCInst to asm syntax -*- C++ -*--//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This class prints a Connex MCInst to a .s file.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_INSTPRINTER_CONNEXINSTPRINTER_H
				#define LLVM_LIB_TARGET_CONNEX_INSTPRINTER_CONNEXINSTPRINTER_H

				#include "llvm/MC/MCInstPrinter.h"

				namespace llvm {
				class MCOperand;

				class ConnexInstPrinter : public MCInstPrinter {
				public:
				ConnexInstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,
				const MCRegisterInfo &MRI)
				: MCInstPrinter(MAI, MII, MRI) {}

				void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
				const MCSubtargetInfo &STI, raw_ostream &O) override;

				// IMPORTANT Note: printOperand() etc are not methods of the
				// MCInstPrinter class, but they are methods called from the
				// TableGen generated code from ConnexGenAsmWriter.inc.
				void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O,
				const char *Modifier = nullptr);

				template <unsigned Bits, unsigned Offset = 0>
				void printUImm(const MCInst *MI, int opNum, raw_ostream &O);

				void printMemOperand(const MCInst *MI, int OpNo, raw_ostream &O,
				const char *Modifier = nullptr);

				// Taken from MSP430InstPrinter.h
				void printSrcMemOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O,
				const char *Modifier = nullptr);

				void printImm64Operand(const MCInst *MI, unsigned OpNo, raw_ostream &O);

				// Inspired from printi256mem() from
				// [LLVM]/lib/Target/X86/InstPrinter/X86IntelInstPrinter.h
				void printScatterGatherMemOperand(const MCInst *MI, unsigned OpNo,
				raw_ostream &O);

				// Autogenerated by tblgen.
				std::pair<const char *, uint64_t> getMnemonic(const MCInst *MI) override;
				void printInstruction(const MCInst *MI, uint64_t Address, raw_ostream &O);
				static const char *getRegisterName(unsigned RegNo);

				private:
				// Taken from [LLVM]/llvm/lib/Target/Mips/InstPrinter/MipsInstPrinter.h
				void printUnsignedImm8(const MCInst *MI, int opNum, raw_ostream &O);

				// Required by ConnexGenAsmWriter.inc
				// Inspired from Mips/InstPrinter/MipsInstPrinter.h
				void printUnsignedImm(const MCInst *MI, int opNum, raw_ostream &O);
				};
				} // namespace llvm

				#endif

llvm/lib/Target/Connex/MCTargetDesc/ConnexInstPrinter.cpp

This file was added.

				//===-- ConnexInstPrinter.cpp - Convert Connex MCInst to asm syntax -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This class prints a Connex MCInst to a .s file.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexInstPrinter.h"
				#include "Connex.h"
				#include "ConnexConfig.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCExpr.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCSymbol.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/FormattedStream.h"

				using namespace llvm;

				#define DEBUG_TYPE "asm-inst-printer"

				// Include the auto-generated portion of the assembly writer.
				#include "ConnexGenAsmWriter.inc"

				/*
				Note: As of Nov 2016, the LLVM APIs allow printing customized code only
				here (and NOT in ConnexAsmPrinter.cpp, which around a year ago had some APIs).
				*/

				void ConnexInstPrinter::printInst(const MCInst *MI, uint64_t Address,
				StringRef Annot, const MCSubtargetInfo &STI,
				raw_ostream &O) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printInst()...\n");
				LLVM_DEBUG(dbgs() << "printInst(): MI = " << MI << "\n");
				LLVM_DEBUG(dbgs() << "printInst(): MI->getOpcode() = " << MI->getOpcode()
				<< "\n");
				LLVM_DEBUG(dbgs() << "printInst(): Address = " << Address << "\n");

				/* For some reason, ConnexGenAsmWriter.inc cannot print INLINEASM from the
				MachineInstr bundles I create in ConnexInstrInfo.cpp, expandPostRAPseudo(),
				and then unpack in [Target]AsmPrinter::EmitInstruction(),
				because of this definition they have:
				static const uint32_t OpInfo0[] =
				0U, // PHI
				0U, // INLINEASM
				...
				etc.
				So I handle these INLINEASMs myself here.
				TODO: maybe explain better.
				*/
				if (MI->getOpcode() == TargetOpcode::INLINEASM) {
				O << " ";
				printOperand(MI, 0, O); // getOperand(0));
				O << " // custom code in ConnexInstPrinter::printInst() for INLINEASM";
				} else {
				printInstruction(MI, Address, O);
				}

				printAnnotation(O, Annot);
				}

				static void printExpr(const MCExpr *Expr, raw_ostream &O) {
				#ifndef NDEBUG
				const MCSymbolRefExpr *SRE;

				if (const MCBinaryExpr *BE = dyn_cast<MCBinaryExpr>(Expr))
				SRE = dyn_cast<MCSymbolRefExpr>(BE->getLHS());
				else
				SRE = dyn_cast<MCSymbolRefExpr>(Expr);
				assert(SRE && "Unexpected MCExpr type.");

				MCSymbolRefExpr::VariantKind Kind = SRE->getKind();

				assert(Kind == MCSymbolRefExpr::VK_None);
				#endif

				O << *Expr;
				}

				void ConnexInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
				raw_ostream &O, const char *Modifier) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printOperand(OpNo = " << OpNo
				<< ")...\n");
				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printOperand(): MI = " << MI
				<< "\n");
				LLVM_DEBUG(
				dbgs() << "ConnexInstPrinter::printOperand(): MI->getNumOperands() = "
				<< MI->getNumOperands() << "\n");

				/* Simple fallback, useful just for NOP -
				* TODO: I could take care of it in printInstruction(), which calls
				* printOperand()
				*/
				if (MI->getNumOperands() <= OpNo)
				return;

				LLVM_DEBUG(
				dbgs() << "ConnexInstPrinter::printOperand(): MI->getOperand(OpNo) = "
				<< MI->getOperand(OpNo) << "\n");

				assert((Modifier == 0 || Modifier[0] == 0) && "No modifiers supported");

				const MCOperand &Op = MI->getOperand(OpNo);

				if (Op.isReg()) {
				// This handles registers, such as scalar r0 or vector R(0)
				O << getRegisterName(Op.getReg());
				} else if (Op.isImm()) {
				/* Normally we do NOT get here because this case is treated in
				printUnsignedImm(). */
				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printOperand(): Op.getImm() = "
				<< Op.getImm() << "\n");
				O << (int32_t)Op.getImm();
				} else {
				assert(Op.isExpr() && "Expected an expression");
				printExpr(Op.getExpr(), O);
				}
				}

				template <unsigned Bits, unsigned Offset>
				void ConnexInstPrinter::printUImm(const MCInst *MI, int opNum, raw_ostream &O) {
				const MCOperand &MO = MI->getOperand(opNum);
				if (MO.isImm()) {
				uint64_t Imm = MO.getImm();
				Imm -= Offset;
				Imm &= (1 << Bits) - 1;
				Imm += Offset;
				O << formatImm(Imm);
				return;
				}

				printOperand(MI, opNum, O);
				}

				void ConnexInstPrinter::printMemOperand(const MCInst *MI, int OpNo,
				raw_ostream &O, const char *Modifier) {
				// We arrive here for instructions like: sth 0(r12), r14

				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printMemOperand()\n");

				const MCOperand &RegOp = MI->getOperand(OpNo);
				const MCOperand &OffsetOp = MI->getOperand(OpNo + 1);

				// offset
				if (OffsetOp.isImm())
				O << formatDec(OffsetOp.getImm());
				else
				assert(0 && "Expected an immediate");

				// register
				assert(RegOp.isReg() && "Register operand not a register");
				O << '(' << getRegisterName(RegOp.getReg()) << ')';
				}

				// Inspired from MSP430InstPrinter.h
				void ConnexInstPrinter::printSrcMemOperand(const MCInst *MI, unsigned OpNo,
				raw_ostream &O,
				const char *Modifier) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printSrcMemOperand()\n");

				const MCOperand &Base = MI->getOperand(0);
				const MCOperand &Disp = MI->getOperand(1);

				// Print displacement first

				// If the global address expression is a part of displacement field with a
				// register base, we should not emit any prefix symbol here, e.g.
				// mov.w &foo, r1
				// vs
				// mov.w glb(r1), r2
				// Otherwise (!) msp430-as will silently miscompile the output :(
				if (!Base.getReg())
				O << '&';

				if (Disp.isExpr())
				Disp.getExpr()->print(O, &MAI);
				else {
				assert(Disp.isImm() && "Expected immediate in displacement field");
				O << Disp.getImm();
				}

				// Print register base field
				if (Base.getReg())
				O << '(' << getRegisterName(Base.getReg()) << ')';
				}

				void ConnexInstPrinter::printImm64Operand(const MCInst *MI, unsigned OpNo,
				raw_ostream &O) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printImm64Operand()\n");

				const MCOperand &Op = MI->getOperand(OpNo);

				if (Op.isImm()) {
				// This is for instructions like: ld_64 r3, 4294967296
				O << (uint64_t)Op.getImm();
				} else {
				// This is for instructions like: ld_64 r1, <MCOperand Expr:(CONNEX_VL)>
				O << Op;
				}
				}

				void ConnexInstPrinter::printScatterGatherMemOperand(const MCInst *MI,
				unsigned OpNo,
				raw_ostream &O) {
				LLVM_DEBUG(
				dbgs()
				<< "Entered ConnexInstPrinter::printScatterGatherMemOperand() - "
				"NOTE that we discard the BasePtr of the TableGen MemOperand\n");
				/*
				IMPORTANT: Here, for the MCInst, the parameters do NOT follow the order from
				the .td file.
				Following include/llvm/Target/TargetSelectionDAG.td we have:

				// SDTypeProfile - This profile describes the type requirements of a
				// Selection DAG node.
				class SDTypeProfile<int numresults, int numoperands,
				list<SDTypeConstraint> constraints> {
				int NumResults = numresults;
				int NumOperands = numoperands;
				list<SDTypeConstraint> Constraints = constraints;
				}

				// So: 3 input operands, 2 results.
				// Params are: passthru, mask, index; results are: vector of i1,
				// vector of ptr (actual result)
				// Params are 0, 1, 2 and results are 3, 4.
				// Operands 0 and 1 have vector type, with same number of elements.
				// Operands 0 and 2 have identical types.
				// Operands 1 and 3 have identical types.
				// --> Opnd 3 (result 0?) is i1 vector
				// Operand 4 (result 1?) has pointer type.
				// Operand 1 is vector type with element type of i1.
				def SDTMaskedGather: SDTypeProfile<2, 3, [ // masked gather
				SDTCisVec<0>, SDTCisVec<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<1, 3>,
				SDTCisPtrTy<4>, SDTCVecEltisVT<1, i1>, SDTCisSameNumEltsAs<0, 1>
				]>;

				def masked_gather : SDNode<"ISD::MGATHER", SDTMaskedGather,
				[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
				*/

				if (MI->getNumOperands() > 4) {
				// We have an MGATHER operation
				const MCOperand &res = MI->getOperand(0);
				const MCOperand &index = MI->getOperand(4);
				const MCOperand &maskIn = MI->getOperand(1);
				const MCOperand &passthru = MI->getOperand(2);
				const MCOperand &maskOut = MI->getOperand(3);

				assert(index.isReg() && "index not a register");
				assert(passthru.isReg() && "passthru not a register");

				LLVM_DEBUG(dbgs() << "MI = " << *MI << "\n index = " << index
				<< "\n maskIn (bool vector register, which we actually "
				"do NOT use) = "
				<< maskIn << "\n passthru = " << passthru
				<< "\n maskOut = " << maskOut << "\n res = " << res
				<< "\n");

				LLVM_DEBUG(dbgs() << "\n res = " << res << "\n");

				assert(res.isReg() && "res not a register");
				O << getRegisterName(index.getReg());
				} else {
				// We have an MSCATTER operation
				const MCOperand &value = MI->getOperand(1);
				const MCOperand &maskIn = MI->getOperand(0);
				const MCOperand &mask2 = MI->getOperand(2);
				const MCOperand &index = MI->getOperand(3);

				LLVM_DEBUG(dbgs() << "MI = " << *MI << "\n value (src) = " << value
				<< "\n maskIn (bool vector register, "
				"which we actually do NOT use) = "
				<< maskIn << "\n index = " << index
				<< "\n mask2 = " << mask2 << "\n");
				O << getRegisterName(index.getReg());
				}

				LLVM_DEBUG(
				dbgs() << "Exiting ConnexInstPrinter::printScatterGatherMemOperand()\n");
				}

				// Taken from MipsInstPrinter.cpp
				// (required by ConnexGenAsmWriter.inc)
				void ConnexInstPrinter::printUnsignedImm(const MCInst *MI, int opNum,
				raw_ostream &O) {
				char *res = NULL;

				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printUnsignedImm()...\n");

				const MCOperand &MO = MI->getOperand(opNum);

				if (MO.isImm()) {
				unsigned int imm = MO.getImm();

				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printUnsignedImm(): imm = " << imm
				<< ", MI (ptr) = " << MI << ", MI = " << *MI << "\n");

				#ifdef GENERATE_ASSOCIATED_INLINEASM_FROM_LOOPVECTORIZE_PASS
				if (imm == VALUE_BOGUS_REPEAT_X_TIMES) {
				assert(0 && "This should NOT be executed since we don't "
				"use symbolic LD_H, ST_H or REPEAT (using INLINEASMs "
				"attached next to them) anymore");

				assert(MI->getOpcode() == Connex::REPEAT);
				#if 0
				res = getStringFromAssociatedInlineAsm(crtMI,
				const_cast<char *>("/*value*/"));
				#endif

				O << res;
				} else
				#endif

				if (imm == CONNEX_MEM_NUM_ROWS + CONNEX_MEM_CONSTANT_OFFSET) {
				assert(0 && "This should NOT be executed since we don't "
				"use symbolic LD_H, ST_H or REPEAT (using INLINEASMs "
				"attached next to them) anymore");

				assert((MI->getOpcode() == Connex::LD_H) ||
				(MI->getOpcode() == Connex::ST_H));
				#if 0
				res = getStringFromAssociatedInlineAsm(crtMI,
				const_cast<char *>("/*offset*/"));
				#endif

				O << STR_LOOP_SYMBOLIC_INDEX << " + " << res;
				} else if (imm >= CONNEX_MEM_NUM_ROWS) {
				int spillRelativeOffset =
				(int)imm - CONNEX_MEM_NUM_ROWS - CONNEX_MEM_NUM_ROWS_EXTRA_FOR_SPILL;
				assert(spillRelativeOffset <= 1);
				// In a few cases (Map.f16, SSD.f16) it is -1

				O << "CONNEX_MEM_SPILL_START_OFFSET";

				if (spillRelativeOffset >= 0)
				O << " + " << spillRelativeOffset;
				else
				O << " - " << -spillRelativeOffset;
				} else {
				O << imm; // (unsigned int)MO.getImm();
				}
				} else {
				printOperand(MI, opNum, O);
				}
				}

				// Inspired from [LLVM]/llvm/lib/Target/Mips/InstPrinter/MipsInstPrinter.h
				void ConnexInstPrinter::printUnsignedImm8(const MCInst *MI, int opNum,
				raw_ostream &O) {
				const MCOperand &MO = MI->getOperand(opNum);

				if (MO.isImm())
				O << (unsigned short int)(unsigned char)MO.getImm();
				else
				printOperand(MI, opNum, O);
				}
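
The printUImm<Bits, Offset>() template in the file above confines an immediate to a Bits-wide window that starts at Offset before printing it. The arithmetic can be tried in isolation with a small standalone sketch like the one below. Note that wrapUImm is a hypothetical helper name that does not appear in the patch, and that the sketch builds the mask from a 64-bit 1 (the patch uses a plain int 1), which is an assumption made here so that wider fields stay well defined.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch) that mirrors the arithmetic of
// ConnexInstPrinter::printUImm<Bits, Offset>() without any LLVM dependency.
template <unsigned Bits, unsigned Offset>
constexpr uint64_t wrapUImm(uint64_t Imm) {
  static_assert(Bits > 0 && Bits < 64, "sketch assumes a 1..63-bit field");
  Imm -= Offset;                    // move the window so that it starts at 0
  Imm &= (uint64_t(1) << Bits) - 1; // keep only the low Bits bits
  Imm += Offset;                    // move the window back to Offset
  return Imm;
}
```

With Bits = 4 and Offset = 0 the helper simply keeps the low nibble; with a non-zero Offset, values wrap around inside the range [Offset, Offset + 2^Bits - 1].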

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCAsmInfo.h

This file was added.

				//===-- ConnexMCAsmInfo.h - Connex asm properties -------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the declaration of the ConnexMCAsmInfo class.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCASMINFO_H
				#define LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCASMINFO_H

				#include "llvm/ADT/StringRef.h"
				#include "llvm/ADT/Triple.h"
				#include "llvm/MC/MCAsmInfo.h"

				namespace llvm {
				class Target;
				class Triple;

				class ConnexMCAsmInfo : public MCAsmInfo {
				public:
				explicit ConnexMCAsmInfo(const Triple &TT, const MCTargetOptions &Options) {
				PrivateGlobalPrefix = ".L";
				WeakRefDirective = "\t.weak\t";

				// Inspired from llvm.org/docs/doxygen/html/NVPTXMCAsmInfo_8cpp_source.html
				// Avoid emitting the APP and NO_APP delimiters around inline asm expressions
				CommentString = "//";
				InlineAsmStart = "";
				InlineAsmEnd = "";

				UsesELFSectionDirectiveForBSS = true;
				HasSingleParameterDotFile = false;
				HasDotTypeDotSizeDirective = false;

				SupportsDebugInformation = true;
				}
				};
				} // End namespace llvm

				#endif

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCCodeEmitter.cpp

This file was added.

				//===-- ConnexMCCodeEmitter.cpp - Convert Connex code to machine code -----===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the ConnexMCCodeEmitter class.
				//
				//===----------------------------------------------------------------------===//

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/MC/MCCodeEmitter.h"
				#include "llvm/MC/MCContext.h"
				#include "llvm/MC/MCExpr.h"
				#include "llvm/MC/MCFixup.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCRegisterInfo.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/Support/Endian.h"
				#include "llvm/Support/EndianStream.h"
				#include <cassert>
				#include <cstdint>

				using namespace llvm;

				#define DEBUG_TYPE "mccodeemitter"

				namespace {

				class ConnexMCCodeEmitter : public MCCodeEmitter {
				const MCInstrInfo &MCII;
				const MCRegisterInfo &MRI;
				bool IsLittleEndian;

				public:
				ConnexMCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri,
				bool IsLittleEndian)
				: MCII(mcii), MRI(mri), IsLittleEndian(IsLittleEndian) {}

				ConnexMCCodeEmitter(const ConnexMCCodeEmitter &) = delete;

				void operator=(const ConnexMCCodeEmitter &) = delete;

				~ConnexMCCodeEmitter() override = default;

				// getBinaryCodeForInstr - TableGen'erated function for getting the
				// binary encoding for an instruction.
				uint64_t getBinaryCodeForInstr(const MCInst &MI,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const;

				// getMachineOpValue - Return binary encoding of operand. If the machine
				// operand requires relocation, record the relocation and return zero.
				unsigned getMachineOpValue(const MCInst &MI, const MCOperand &MO,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const;

				uint64_t getMemoryOpValue(const MCInst &MI, unsigned Op,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const;

				void encodeInstruction(const MCInst &MI, raw_ostream &OS,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const override;
				};

				} // end anonymous namespace

				MCCodeEmitter *llvm::createConnexMCCodeEmitter(const MCInstrInfo &MCII,
				MCContext &Ctx) {
				return new ConnexMCCodeEmitter(MCII, *(Ctx.getRegisterInfo()), true);
				}

				unsigned
				ConnexMCCodeEmitter::getMachineOpValue(const MCInst &MI, const MCOperand &MO,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const {
				if (MO.isReg())
				return MRI.getEncodingValue(MO.getReg());
				if (MO.isImm())
				return static_cast<unsigned>(MO.getImm());

				assert(MO.isExpr());

				const MCExpr *Expr = MO.getExpr();

				assert(Expr->getKind() == MCExpr::SymbolRef);

				if (MI.getOpcode() == Connex::JAL)
				// func call name
				Fixups.push_back(MCFixup::create(0, Expr, FK_SecRel_4));
				else if (MI.getOpcode() == Connex::LD_imm64)
				Fixups.push_back(MCFixup::create(0, Expr, FK_SecRel_8));
				else
				// bb label
				Fixups.push_back(MCFixup::create(0, Expr, FK_PCRel_2));

				return 0;
				}

				static uint8_t SwapBits(uint8_t Val) {
				return (Val & 0x0F) << 4 | (Val & 0xF0) >> 4;
				}

				void ConnexMCCodeEmitter::encodeInstruction(const MCInst &MI, raw_ostream &OS,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const {
				unsigned Opcode = MI.getOpcode();
				support::endian::Writer OSE(OS,
				IsLittleEndian ? support::little : support::big);

				if (Opcode == Connex::LD_imm64 || Opcode == Connex::LD_pseudo) {
				uint64_t Value = getBinaryCodeForInstr(MI, Fixups, STI);
				OS << char(Value >> 56);
				if (IsLittleEndian)
				OS << char((Value >> 48) & 0xff);
				else
				OS << char(SwapBits((Value >> 48) & 0xff));
				OSE.write<uint16_t>(0);
				OSE.write<uint32_t>(Value & 0xffffFFFF);

				const MCOperand &MO = MI.getOperand(1);
				uint64_t Imm = MO.isImm() ? MO.getImm() : 0;
				OSE.write<uint8_t>(0);
				OSE.write<uint8_t>(0);
				OSE.write<uint16_t>(0);
				OSE.write<uint32_t>(Imm >> 32);
				} else {
				// Get instruction encoding and emit it
				uint64_t Value = getBinaryCodeForInstr(MI, Fixups, STI);
				OS << char(Value >> 56);
				if (IsLittleEndian)
				OS << char((Value >> 48) & 0xff);
				else
				OS << char(SwapBits((Value >> 48) & 0xff));
				OSE.write<uint16_t>((Value >> 32) & 0xffff);
				OSE.write<uint32_t>(Value & 0xffffFFFF);
				}
				}

				// Encode Connex Memory Operand
				uint64_t
				ConnexMCCodeEmitter::getMemoryOpValue(const MCInst &MI, unsigned Op,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const {
				uint64_t Encoding;
				const MCOperand Op1 = MI.getOperand(1);
				assert(Op1.isReg() && "First operand is not register.");
				Encoding = MRI.getEncodingValue(Op1.getReg());
				Encoding <<= 16;
				MCOperand Op2 = MI.getOperand(2);
				assert(Op2.isImm() && "Second operand is not immediate.");
				Encoding |= Op2.getImm() & 0xffff;
				return Encoding;
				}

				#include "ConnexGenMCCodeEmitter.inc"
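
The byte order produced by the general (non-LD_imm64) branch of encodeInstruction() above is easy to misread, because the top two bytes are streamed individually while the remaining 16-bit and 32-bit fields go through the endian-aware writer. The following standalone sketch reproduces that layout on a plain byte vector. The names swapNibbles, appendInt and encodeWord are hypothetical and exist only for this illustration; the sketch uses no LLVM classes.

```cpp
#include <cstdint>
#include <vector>

// Swaps the two 4-bit halves of a byte (the operation the patch calls SwapBits).
inline uint8_t swapNibbles(uint8_t Val) {
  return static_cast<uint8_t>((Val & 0x0F) << 4 | (Val & 0xF0) >> 4);
}

// Appends the low NumBytes bytes of V in the requested byte order.
inline void appendInt(std::vector<uint8_t> &Out, uint64_t V, unsigned NumBytes,
                      bool IsLittleEndian) {
  for (unsigned I = 0; I < NumBytes; ++I) {
    unsigned Shift = IsLittleEndian ? 8 * I : 8 * (NumBytes - 1 - I);
    Out.push_back(static_cast<uint8_t>(V >> Shift));
  }
}

// Hypothetical stand-in for the general branch of encodeInstruction().
inline std::vector<uint8_t> encodeWord(uint64_t Value, bool IsLittleEndian) {
  std::vector<uint8_t> Out;
  // Bits 63..56 are always emitted first, as a single byte.
  Out.push_back(static_cast<uint8_t>(Value >> 56));
  // Bits 55..48 follow; their nibbles are swapped in the big-endian case.
  uint8_t Second = static_cast<uint8_t>((Value >> 48) & 0xff);
  Out.push_back(IsLittleEndian ? Second : swapNibbles(Second));
  // Bits 47..32 as a 16-bit field, then bits 31..0 as a 32-bit field.
  appendInt(Out, (Value >> 32) & 0xffff, 2, IsLittleEndian);
  appendInt(Out, Value & 0xffffffffULL, 4, IsLittleEndian);
  return Out;
}
```

For the value 0x0102030405060708 in little-endian mode this yields the bytes 01 02 04 03 08 07 06 05, which shows that only the two trailing fields are byte-reversed.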

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.h

This file was added.

				//===-- ConnexMCTargetDesc.h - Connex Target Descriptions -------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file provides Connex specific target descriptions.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCTARGETDESC_H
				#define LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCTARGETDESC_H

				#include "llvm/Config/config.h"
				#include "llvm/Support/DataTypes.h"

				#include <memory>

				namespace llvm {
				class MCAsmBackend;
				class MCCodeEmitter;
				class MCContext;
				class MCInstrInfo;
				class MCObjectTargetWriter;
				class MCRegisterInfo;
				class MCSubtargetInfo;
				class MCTargetOptions;
				class StringRef;
				class Target;
				class Triple;
				class raw_ostream;
				class raw_pwrite_stream;

				extern Target TheConnexTarget;

				MCCodeEmitter *createConnexMCCodeEmitter(const MCInstrInfo &MCII,
				MCContext &Ctx);

				MCAsmBackend *createConnexAsmBackend(const Target &T,
				const MCSubtargetInfo &STI,
				const MCRegisterInfo &MRI,
				const MCTargetOptions &Options);

				std::unique_ptr<MCObjectTargetWriter>
				createConnexELFObjectWriter(uint8_t OSABI);
				} // namespace llvm

				// Defines symbolic names for Connex registers. This defines a mapping from
				// register name to register number.
				//
				#define GET_REGINFO_ENUM
				#include "ConnexGenRegisterInfo.inc"

				// Defines symbolic names for the Connex instructions.
				//
				#define GET_INSTRINFO_ENUM
				#include "ConnexGenInstrInfo.inc"

				#define GET_SUBTARGETINFO_ENUM
				#include "ConnexGenSubtargetInfo.inc"

				#endif

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.cpp

This file was added.

				//===-- ConnexMCTargetDesc.cpp - Connex Target Descriptions ---------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file provides Connex specific target descriptions.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexMCTargetDesc.h"
				#include "Connex.h"
				#include "ConnexInstPrinter.h"
				#include "ConnexMCAsmInfo.h"
				#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCRegisterInfo.h"
				#include "llvm/MC/MCStreamer.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/MC/TargetRegistry.h"
				#include "llvm/Support/ErrorHandling.h"

				#define GET_INSTRINFO_MC_DESC
				#include "ConnexGenInstrInfo.inc"

				#define GET_SUBTARGETINFO_MC_DESC
				#include "ConnexGenSubtargetInfo.inc"

				#define GET_REGINFO_MC_DESC
				#include "ConnexGenRegisterInfo.inc"

				using namespace llvm;

				static MCInstrInfo *createConnexMCInstrInfo() {
				MCInstrInfo *X = new MCInstrInfo();
				InitConnexMCInstrInfo(X);
				return X;
				}

				static MCRegisterInfo *createConnexMCRegisterInfo(const Triple &TT) {
				MCRegisterInfo *X = new MCRegisterInfo();
				InitConnexMCRegisterInfo(X, Connex::R11 /* RAReg doesn't exist */);
				return X;
				}

				static MCSubtargetInfo *
				createConnexMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {
				return createConnexMCSubtargetInfoImpl(TT, CPU, CPU, FS);
				}

				static MCStreamer *createConnexMCStreamer(const Triple &T, MCContext &Ctx,
				std::unique_ptr<MCAsmBackend> &&MAB,
				std::unique_ptr<MCObjectWriter> &&OW,
				std::unique_ptr<MCCodeEmitter> &&Emit,
				bool RelaxAll) {
				return createELFStreamer(Ctx, std::move(MAB), std::move(OW), std::move(Emit),
				RelaxAll);
				}

				static MCInstPrinter *createConnexMCInstPrinter(const Triple &T,
				unsigned SyntaxVariant,
				const MCAsmInfo &MAI,
				const MCInstrInfo &MII,
				const MCRegisterInfo &MRI) {
				if (SyntaxVariant == 0)
				return new ConnexInstPrinter(MAI, MII, MRI);
				return nullptr;
				}

				extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeConnexTargetMC() {
				for (Target *T : {&TheConnexTarget}) {
				// Register the MC asm info.
				RegisterMCAsmInfo<ConnexMCAsmInfo> X(*T);

				// Register the MC instruction info.
				TargetRegistry::RegisterMCInstrInfo(*T, createConnexMCInstrInfo);

				// Register the MC register info.
				TargetRegistry::RegisterMCRegInfo(*T, createConnexMCRegisterInfo);

				// Register the MC subtarget info.
				TargetRegistry::RegisterMCSubtargetInfo(*T, createConnexMCSubtargetInfo);

				// Register the object streamer
				TargetRegistry::RegisterELFStreamer(*T, createConnexMCStreamer);

				// Register the MCInstPrinter.
				TargetRegistry::RegisterMCInstPrinter(*T, createConnexMCInstPrinter);
				}

				// Register the MC code emitter
				TargetRegistry::RegisterMCCodeEmitter(TheConnexTarget,
				createConnexMCCodeEmitter);

				// Register the ASM Backend
				TargetRegistry::RegisterMCAsmBackend(TheConnexTarget, createConnexAsmBackend);
				}

llvm/lib/Target/Connex/Misc.h

This file was added.

				#ifndef MISC_H_INCLUDED__
				#define MISC_H_INCLUDED__

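				#include <algorithm> // For std::equal
				#include <cassert>
				#include <cstdio> // For sscanf() and sprintf()
				#include <string>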

				#ifndef MAXLEN_STR
				#define MAXLEN_STR 8192
				#endif

				//#include "string_extras.h"
				// We could use also boost::algorithm::ends_with() and starts_with()
				// From
				// stackoverflow.com/questions/8095088/how-to-check-string-start-in-c/8095132
				static bool startsWith(const std::string &haystack, const std::string &needle) {
				// We avoid making a copy of your string with strstr() or searching more
				// than at index 0
				return needle.length() <= haystack.length() &&
				std::equal(needle.begin(), needle.end(), haystack.begin());
				}

				// From
				// stackoverflow.com/questions/874134/find-out-if-string-ends-with-another-string-in-c
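				static bool endsWith(const std::string &s, const std::string &suffix) {
				// Guard the length first: s.size() - suffix.size() wraps around when the
				// suffix is longer than s, and can then compare equal to npos.
				return suffix.size() <= s.size() &&
				s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
				}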
				static inline bool ends_with(std::string const &value,
				std::string const &ending) {
				// TODO: check if OK
				if (ending.size() > value.size())
				return false;
				return std::equal(ending.rbegin(), ending.rend(), value.rbegin());
				}

				static std::string stringScanf(std::string aText, char *formatStr) {
				char strTmp[MAXLEN_STR];

				assert(formatStr[0] == '%' && formatStr[1] == 's');

				// TODO: implement without sscanf()
				// Maybe use as said at
				// https://stackoverflow.com/questions/6104821/c-equivalent-of-sscanf:
				// "The formatting isn't as easy but check out stringstream.
				// See also istringstream and ostringstream for input and
				// output buffers formatting."
				sscanf(aText.c_str(), formatStr, strTmp);

				return std::string(strTmp);
				}

				static std::string stringPrintf(char *formatStr, void *aPtr) {
				char strTmp[MAXLEN_STR];

				// TODO: implement without sprintf()
				sprintf(strTmp, formatStr, aPtr);

				return std::string(strTmp);
				}
				static std::string stringPrintf(char *formatStr, long aVal) {
				char strTmp[MAXLEN_STR];

				// TODO: implement without sprintf()
				sprintf(strTmp, formatStr, aVal);

				return std::string(strTmp);
				}

				#ifdef INCLUDE_SUNIT_DUMP

				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				// Inspired from SystemZHazardRecognizer.cpp

				#ifndef NDEBUG // Debug output

				// The SUnit (Scheduling Unit) class no longer has the dump() method,
				// so we create a helper method for it here.
				// Inspired from SystemZHazardRecognizer.h

				/// Resolves and caches a resolved scheduling class for an SUnit.
				static const MCSchedClassDesc *getSchedClass(SUnit *SU) {
				if (!SU->SchedClass) { // && SchedModel->hasInstrSchedModel()
				return NULL;
				// TODO: SU->SchedClass = SchedModel->resolveSchedClass(SU->getInstr());
				}

				return SU->SchedClass;
				}

				static void dumpSU(llvm::SUnit *SU, raw_ostream &OS) {
				OS << "SU(" << SU->NodeNum << "):";
				// OS << TII->getName(SU->getInstr()->getOpcode());
				OS << SU->getInstr()->getOpcode();

				const MCSchedClassDesc *SC = getSchedClass(SU);
				if (!SC || !SC->isValid())
				return;

				/*
				// TODO: make this compile

				for (TargetSchedModel::ProcResIter
				PI = SchedModel->getWriteProcResBegin(SC),
				PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
				const MCProcResourceDesc &PRD =
				*SchedModel->getProcResource(PI->ProcResourceIdx);
				std::string FU(PRD.Name);
				// trim e.g. Z13_FXaUnit -> FXa
				FU = FU.substr(FU.find("_") + 1);
				size_t Pos = FU.find("Unit");
				if (Pos != std::string::npos)
				FU.resize(Pos);
				if (FU == "LS") // LSUnit -> LSU
				FU = "LSU";
				OS << "/" << FU;

				if (PI->Cycles > 1)
				OS << "(" << PI->Cycles << "cyc)";
				}
				*/

				if (SC->NumMicroOps > 1)
				OS << "/" << SC->NumMicroOps << "uops";
				if (SC->BeginGroup && SC->EndGroup)
				OS << "/GroupsAlone";
				else if (SC->BeginGroup)
				OS << "/BeginsGroup";
				else if (SC->EndGroup)
				OS << "/EndsGroup";
				if (SU->isUnbuffered)
				OS << "/Unbuffered";
				/*
				// TODO: make this compile
				if (has4RegOps(SU->getInstr()))
				OS << "/4RegOps";
				*/
				}
				#endif // #ifndef NDEBUG

				#endif // INCLUDE_SUNIT_DUMP

				#endif // MISC_H_INCLUDED__
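
The TODO in stringScanf() above asks for an implementation that does not rely on sscanf(). Since the helper is only ever called with a "%s" format, one possible direction is to extract the first whitespace-delimited token with std::istringstream, which avoids the fixed-size buffer altogether. The sketch below shows the idea. firstToken is a hypothetical name that is not part of the patch, and returning an empty string when no token exists is an assumption about the desired behavior.

```cpp
#include <sstream>
#include <string>

// Hypothetical replacement for stringScanf(text, "%s"): returns the first
// whitespace-delimited token of Text, or an empty string if there is none.
inline std::string firstToken(const std::string &Text) {
  std::istringstream In(Text);
  std::string Token;
  In >> Token; // skips leading whitespace, then reads up to the next whitespace
  return Token;
}
```

Like the "%s" conversion, operator>> skips leading whitespace, so a call such as firstToken(" i32 %0") returns "i32" regardless of how long the token is.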

llvm/lib/Target/Connex/RecoverFromLlvmIR.h

This file was added.

				#ifndef RECOVER_FROM_LLVM_IR
				#define RECOVER_FROM_LLVM_IR

				#include <stack>
				#include <string>
				#include <unordered_map>
				#include <utility> // For std::pair

				#include "llvm/IR/DebugInfo.h"
				#include "llvm/IR/InstrTypes.h"

				// See http://llvm.org/docs/ProgrammersManual.html#isa
				#include "llvm/Support/Casting.h" // For dyn_cast

				#define EXCHANGE(a, b) \
				a ^= b; \
				b ^= a; \
				a ^= b;

				#ifndef MAXLEN_STR
				#define MAXLEN_STR 8192
				#endif

				#include "Misc.h"

				// #define DEBUG_TYPE LV_NAME
				// #define LLVM_DEBUG DEBUG

				// static const std::string STR_REMAINDER_VF = "n.mod.vf";

				using namespace llvm;

				namespace {

				// Normally used to return the variable name without suffix e.g. ".034"
				std::string rStripStringAfterChar(std::string str, char ch) {
				std::size_t pos = str.find(ch);

				return str.substr(0, pos);
				}

				/* Important Note:
				* If the val is an LLVM variable, it will return something like
				* "%[llvm_var_name]".
				* If val is a constant it returns normally the value of the
				* constant.
				*
				* I consider it a rather big deficiency of Value::getName() NOT to return
				* (itself or a different method, created by the key LLVM people)
				* the auto-generated number like %0, if the Value is created without an
				* explicit name.
				*
				* Important: I noticed that for different Instruction the result of print()
				* can be somewhat different, like:
				* - i32 %0
				* - %1 = bitcast ...
				*/
				std::string getLLVMValueName(Value *val) {
				/* Somewhat important: it is possible that, if the API
				changes a bit the name will NOT be printed
				here anymore */
				std::string printStr;
				raw_string_ostream OS(printStr);

				// bci->printAsOperand(OS, true); // Does NOT write anything (false neither)

				// See http://llvm.org/docs/doxygen/html/Value_8h_source.html#l00202
				/* Note: IsForDebug false can print:
				- the SAME as true or
				- the complete instruction, not just the value */
				val->print(OS, /*IsForDebug*/ true);
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): printStr = " << printStr << "\n");

				std::string strValName;
				std::string strValName2;

				if (llvm::dyn_cast<Constant>(val) != nullptr) {
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): val is Constant\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1Constant.html
				// sscanf(printStr.c_str(), "%s %s", strValName2, strValName);
				strValName2 = stringScanf(printStr, (char *)"%s ");
				strValName =
				stringScanf(printStr.substr(strValName2.length()), (char *)"%s ");

				/* Normally printStr is of form "type_ct val_ct".
				* But we can also have something like
				* @dataT = common local_unnamed_addr global [128 x [150 x half]]
				* zeroinitializer
				*/
				if (strValName2[0] == '@')
				strValName = strValName2.substr(1);
				} else {
				std::size_t posPercent = printStr.find('%');
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): posPercent = " << posPercent
				<< "\n");

				if (posPercent == std::string::npos) {
				// This is NOT a variable Value - probably just a constant
				return ""; // std::to_string("");
				}

				// sscanf(printStr.substr(posPercent).c_str(), "%s ", strValName);
				strValName = stringScanf(printStr.substr(posPercent), (char *)"%s ");
				// sscanf(valTypeAndName.c_str(), "%s %s", strValName, strValName);
				}

				std::string res = strValName;
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): res = " << res << "\n");

				return res;
				}

				// Used by getAllMetadata() (and getExpr())
				bool ranGetAllMetadata;
				// DenseMap<Value *, std::string> varNameMap;
				// Map with <name of Value, name of source var represented>
				std::unordered_map<std::string, std::string> varNameMap;
				//
				void getAllMetadata(Function *F) {
				ranGetAllMetadata = true;

				LLVM_DEBUG(dbgs() << "Entered getAllMetadata()\n");

				// Some info about metadata:
				// http://llvm.org/docs/SourceLevelDebugging.html#llvm-dbg-value

				// Inspired from
				// weaponshot.wordpress.com/2012/05/06/extract-all-the-metadata-nodes-in-llvm/
				for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB) {
				for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {
				/* Get the Metadata declared in the llvm intrinsic functions
				such as llvm.dbg.declare() */
				if (CallInst *CI = dyn_cast<CallInst>(I)) {
				if (Function *F = CI->getCalledFunction()) {
				// We look at the llvm.dbg.value metadata which associates Value
				// (LLVM IR values) with names in the original program
				if (F->getName().startswith("llvm.dbg.value")) {
				// if (F->getName().startswith("llvm.dbg"))
				LLVM_DEBUG(dbgs() << "getAllMetadata(): CI = " << *CI << "\n");

				/* It seems that the association between LLVM IR
				Value and names in the original source program
				is always like this:
				- opnd 0 contains the Value,
				- opnd 1 is always a (useless?) 0,
				- opnd 2 contains the DILocalVariable.
				*/
				// Error: <<no known conversion for argument 1 from
				// ‘const llvm::Value’ to ‘const llvm::Metadata’>>:
				// DILocalVariable *srcVar = llvm::dyn_cast_or_null<
				// DILocalVariable>(I->getOperand(2));
				// Error: <<no known conversion for argument 1 from
				// ‘const llvm::Value’ to ‘const llvm::Metadata’>>:
				// MDNode *srcVar =
				// llvm::dyn_cast_or_null<MDNode>(I->getOperand(2));
				/* See llvm.org/docs/doxygen/html/classllvm_1_1MetadataAsValue.html
				(see maybe llvm.org/docs/doxygen/html/namespacellvm_1_1mdconst.html:
				"Now that Value and Metadata are in separate hierarchies" */
				MetadataAsValue *srcVarMDV =
				llvm::dyn_cast_or_null<MetadataAsValue>(I->getOperand(2));

				// Value *val = I->getOperand(0);
				MetadataAsValue *val =
				llvm::dyn_cast_or_null<MetadataAsValue>(I->getOperand(0));
				assert(val != nullptr);

				if (srcVarMDV != nullptr) {
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MDNode.html
				// MDNode *srcVar = llvm::dyn_cast_or_null<MDNode>(
				// srcVarMDV->getMetadata());

				// See
				// llvm.org/docs/doxygen/html/classllvm_1_1DILocalVariable.html
				// and llvm.org/docs/doxygen/html/classllvm_1_1DIVariable.html
				DILocalVariable *srcVar = llvm::dyn_cast_or_null<DILocalVariable>(
				srcVarMDV->getMetadata());

				assert(srcVar != nullptr);

				// Gives compiler-error:
				// const MDOperand srcVarOpnd0 = srcVar->getOperand(0);
				// const MDOperand *srcVarOpnd0 = & (srcVar->getOperand(0));

				std::string valueName = getLLVMValueName(val);
				if (valueName.size() == 0) {
				/* We can have metadata which has for 1st
				operand a constant e.g. 0.
				For ex
				call void @llvm.dbg.value(metadata i32 0, i64 0,
				metadata !32, metadata !21), !dbg !33
				*/
				continue;
				}

				// varNameMap[valTypeAndName] = (srcVar->getName()).str();
				varNameMap[valueName] = (srcVar->getName()).str();

				// See
				// llvm.org/docs/doxygen/html/classllvm_1_1DILocalVariable.html
				LLVM_DEBUG(dbgs() << "getAllMetadata(): val = " << *val << "\n");
				LLVM_DEBUG(dbgs() << " val = " << val << "\n");
				LLVM_DEBUG(dbgs() << " val->getValueName() = "
				<< val->getValueName() << "\n");
				LLVM_DEBUG(dbgs()
				<< " val->getName() = " << val->getName() << "\n");
				LLVM_DEBUG(dbgs() << " srcVar = " << *srcVar << "\n");
				// LLVM_DEBUG(dbgs() << " srcVar->getOperand(0) = "
				LLVM_DEBUG(dbgs() << " srcVarName = "
				<< varNameMap[valueName] /* srcVar->getName() */
				<< "\n");
				}
				}
				}
				}
				}
				}
				} // end getAllMetadata()

				std::string printCTypeFromLLVMType(Type *aType, LLVMContext *aContext) {
				std::string res;

				// See http://llvm.org/doxygen/classllvm_1_1Type.html
				if (aType == Type::getInt16Ty(*aContext))
				res = "short";
				else if (aType == Type::getInt32Ty(*aContext)) // Builder.getInt32Ty())
				res = "int";
				else if (aType == Type::getHalfTy(*aContext))
				res = "half";
				else
				assert(0 && "printCTypeFromLLVMType(): Type NOT supported");

				return res;
				}

				// TODO: probably we will need to treat struct/record,
				// union/variants
				Type *getElementTypeOfDerivedType(Type *valType) {
				int sizeofElem;

				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): valType = " << *valType
				<< "\n");

				// getScalarType() helps for vector types (it returns the element type).
				// It does NOT help for pointer types, which is normally the case for val.
				Type *scalarType = valType->getScalarType();
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): scalarType = "
				<< *scalarType << "\n");

				sizeofElem = scalarType->getScalarSizeInBits() / 8;
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): sizeof(scalarType) = "
				<< sizeofElem << "\n");
				if (sizeofElem != 0)
				return scalarType;

				/*
				// Does not help: both return 0...
				LLVM_DEBUG(dbgs() << "GetSize(): bitsizeof(type of val) = "
				//<< valType->getPrimitiveSizeInBits() / 8 << "\n");
				<< valType->getScalarSizeInBits() << "\n");
				*/
				ArrayType *arrType = llvm::dyn_cast<ArrayType>(valType);

				if (arrType != nullptr) {
				Type *elemArrType = arrType->getElementType();
				sizeofElem = elemArrType->getScalarSizeInBits() / 8;
				LLVM_DEBUG(
				dbgs()
				<< "getElementTypeOfDerivedType(): (arrType != nullptr): elemArrType = "
				<< *elemArrType << "\n");
				LLVM_DEBUG(
				dbgs()
				<< "getElementTypeOfDerivedType(): (arrType != nullptr): sizeofElem = "
				<< sizeofElem << "\n");

				if (sizeofElem == 0) {
				return getElementTypeOfDerivedType(elemArrType);
				} else {
				return elemArrType;
				}
				}

				/*
				// MEGA-TODO: Now LLVM has opaque pointer types
				// - see https://llvm.org/docs/OpaquePointers.html
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1PointerType.html
				PointerType *ptrType = llvm::dyn_cast<PointerType>(valType);
				if (ptrType != nullptr) {
				Type *elemPtrType = ptrType->getElementType();

				sizeofElem = elemPtrType->getScalarSizeInBits() / 8;
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): elemPtrType = "
				<< *elemPtrType << "\n");
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): sizeof(elemPtrType) = "
				<< sizeofElem << "\n");

				if (sizeofElem == 0) {
				return getElementTypeOfDerivedType(elemPtrType);
				}
				else
				return elemPtrType;
				}
				*/

				return nullptr;
				}

				bool testEquivalence(Instruction *it, PHINode *phi) {
				Value *op0 = nullptr;

				LLVM_DEBUG(dbgs() << "Entered testEquivalence(): it = " << *it
				<< ", phi = " << *phi << "\n");

				if (phi == it)
				return true;

				if (it->getNumOperands() > 0) {
				op0 = it->getOperand(0);
				}

				switch (it->getOpcode()) {
				case Instruction::ZExt:
				case Instruction::SExt:
				case Instruction::Trunc:
				// case Instruction::ShuffleVector:
				// case Instruction::InsertElement:
				// case Instruction::PHI:
				// case Instruction::ExtractElement:
				// res = "";
				break;

				/* case Instruction::GetElementPtr:
				break; */

				default:
				return false;
				// assert(0 && "testEquivalence(): we do not deal with these cases");
				}

				/*
				// Important-TODO: need to do this for the case we have an access like
				// B[j + 1][0]
				switch (it->getOpcode()) {
				case Instruction::Add:
				res += " + ";
				break;
				}
				*/

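				// op0 can be e.g. an Argument or a Constant, so we must not blindly cast it
				// to Instruction.
				Instruction *op0Inst = llvm::dyn_cast_or_null<Instruction>(op0);
				if (op0Inst == nullptr)
				return false;

				return testEquivalence(op0Inst, phi);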
				}

				inline bool isGlobalArray(GetElementPtrInst *GEPPtr) {
				return llvm::dyn_cast<GlobalValue>(GEPPtr->getOperand(0)) != nullptr;
				}
				//
				inline int getIndexFirstOpndFromGEPInst(GetElementPtrInst *GEPPtr) {
				int startIndex;

				if (isGlobalArray(GEPPtr)) {
				/* Following also
				http://llvm.org/docs/GetElementPtr.html#why-is-the-extra-0-index-required
				we see that for global arrays, the 1st index
				in GEP is redundant - it has value 0 invariably,
				so we skip it.
				*/
				startIndex = 2;
				} else {
				startIndex = 1;
				}

				return startIndex;
				}
				//
				inline Value *getFirstIndexOpndFromGEPInst(GetElementPtrInst *GEPPtr) {
				int startIndex = getIndexFirstOpndFromGEPInst(GEPPtr);

				Value *res = GEPPtr->getOperand(startIndex);

				return res;
				}

				// We check that we have a correctly parenthesized expression
				bool checkCorrectParanthesis(std::string res, int *indexTop) {
				std::stack<int> stack;
				int numOpenParans = 0;

				for (std::size_t i = 0; i < res.length(); i++) {
				if (res[i] == '(') {
				numOpenParans++;
				stack.push(i);
				}
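				if (res[i] == ')') {
				numOpenParans--;

				// Check for an unmatched ')' BEFORE reading stack.top(): calling top()
				// on an empty std::stack is undefined behavior.
				// assert(numOpenParans >= 0 && "Invalid arithmetic expression!");
				if (numOpenParans < 0)
				return false;

				if (indexTop != nullptr)
				*indexTop = stack.top();

				stack.pop();
				}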
				}

				return (numOpenParans == 0);
				}
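// Illustrative sketch only, not part of the patch: a standalone variant of the
// balance check above that has no LLVM dependency. The name parensBalanced and
// its lastOpenIndex parameter are hypothetical. It assumes the same contract
// as checkCorrectParanthesis(): report whether the parentheses balance and,
// optionally, which '(' was matched by the most recent ')'.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true if every ')' in expr closes an earlier
// '(' and no '(' is left open. If lastOpenIndex is non-null, it receives the
// index of the '(' matched by the most recent matched ')', or -1 if no ')'
// was matched.
bool parensBalanced(const std::string &expr, int *lastOpenIndex = nullptr) {
  std::vector<std::size_t> openPositions; // indices of still-open '('
  if (lastOpenIndex != nullptr)
    *lastOpenIndex = -1;

  for (std::size_t i = 0; i < expr.size(); ++i) {
    if (expr[i] == '(') {
      openPositions.push_back(i);
    } else if (expr[i] == ')') {
      if (openPositions.empty())
        return false; // unmatched ')': stop before touching the empty stack
      if (lastOpenIndex != nullptr)
        *lastOpenIndex = static_cast<int>(openPositions.back());
      openPositions.pop_back();
    }
  }
  return openPositions.empty();
}
```

// For "(a + (b))" the last ')' closes the '(' at index 0, which is the
// condition canonicalizeExpression() uses to decide that the outer
// parentheses are redundant.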

				/*
				Currently this function ONLY gets rid of duplicated spaces and, if
				removeOuterParens is true, of one pair of useless outer parentheses.

				// Important-TODO: get rid of unnecessary parentheses
				- for this I normally have to parse expr first and pretty-print it
				intelligently.
				To do algebraic simplification is more complex. See Muchnick's book,
				- value numbering, etc.

				To do Constant folding (Constant-Expression Evaluation),
				although both these methods are heavy, we could use them:
				We could try to use CIL's partial evaluation module, but:
				- it doesn't work with C++
				We can't use sympy, which can parse expressions (parse_expr) and
				simplify them (method sympy.simplify.cse_main.cse) because:
				- see e.g.
				https://github.com/sympy/sympy/blob/master/sympy/parsing/sympy_parser.py
				- it doesn't handle pointers, etc - but we can extend it
				- ...
				*/
				std::string canonicalizeExpression(std::string aStr,
				bool removeOuterParens = false) {
				std::string res = aStr;

				LLVM_DEBUG(dbgs() << "Entered canonicalizeExpression(aStr = " << aStr
				<< ")\n");

				// We make all double spaces single-spaces.
				for (;;) {
				// From http://www.cplusplus.com/reference/string/string/find/
				// Search for 2 consecutive spaces; erasing one of them leaves a single one.
				std::size_t pos = res.find("  ");

				if (pos == std::string::npos) {
				break;
				} else {
				// std::cout << "first 'needle' found at position: " << pos << "\n";
				// From http://www.cplusplus.com/reference/string/string/erase/
				res.erase(pos, 1);
				}
				}
				/*
				std::cout << "canonicalizeExpression(): returning aStr = "
				<< res << "\n";
				*/

				if (removeOuterParens) {
				int indexTop = -1;

				bool correct = checkCorrectParanthesis(res, &indexTop);
				// assert(correct == true && "Invalid arithmetic expression!");

				LLVM_DEBUG(dbgs() << "canonicalizeExpression(): indexTop = " << indexTop
				<< "\n");
				if (indexTop == 0) {
				// If the last parenthesis (not necessarily the last char) is matched by
				// the 1st parenthesis
				// assert(res.front() == '(' && res.back() == ')');
				if (res.front() == '(' && res.back() == ')') {
				LLVM_DEBUG(
				dbgs() << "canonicalizeExpression(): removing useless outer ()\n");

				res = res.substr(1, res.size() - 2);
				}
				}
				}

				if (res != aStr)
				return canonicalizeExpression(res);
				return res;
				}
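// Illustrative sketch only, not part of the patch: the find()/erase() loop in
// canonicalizeExpression() rescans the string from the start after every
// erase. A single-pass alternative, under the assumption that only runs of the
// plain space character need collapsing, could look as follows. The name
// collapseSpaces is hypothetical.

```cpp
#include <string>

// Hypothetical helper: collapses every run of consecutive spaces in `in`
// to a single space, visiting each character exactly once.
std::string collapseSpaces(const std::string &in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    if (c == ' ' && !out.empty() && out.back() == ' ')
      continue; // skip the 2nd, 3rd, ... space of a run
    out.push_back(c);
  }
  return out;
}
```

// A leading run of spaces is reduced to one space rather than removed, which
// matches what repeatedly erasing one space of each double space would do.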

				inline void printInfo(Instruction *it, std::string str0, std::string str1,
				std::string iGetNameData, Value *op0, Value *op1) {
				LLVM_DEBUG(dbgs() << "printInfo(): it = " << *it << "\n");
				LLVM_DEBUG(dbgs() << "printInfo(): it ptr = " << it << "\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): it->getOpcodeName() = "
				<< it->getOpcodeName() << ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): it->getOpcode() = " << it->getOpcode()
				<< ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): it->getName() = " << iGetNameData
				<< ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): str0 = " << str0 << ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): str1 = " << str1 << ")\n");

				if (op0 == nullptr) {
				LLVM_DEBUG(dbgs() << " (printInfo(): op0 = nullptr)\n");
				} else {
				LLVM_DEBUG(dbgs() << " (printInfo(): op0 = " << *op0 << ")\n");
				}

				if (op1 == nullptr) {
				LLVM_DEBUG(dbgs() << " (printInfo(): op1 = nullptr)\n");
				} else {
				LLVM_DEBUG(dbgs() << " (printInfo(): op1 = " << *op1 << ")\n");
				}
				}

				std::string getPredicateString(int pred) {
				std::string res;

				// See all values of enum Predicate at
				// https://llvm.org/doxygen/classllvm_1_1CmpInst.html
				// Note: FCMP_O* is for ordered (neither operand can be a QNAN),
				// FCMP_U* is for unordered (either can be QNAN) -
				// see https://llvm.org/docs/LangRef.html#fcmp-instruction
				if (pred == CmpInst::ICMP_SGT || pred == CmpInst::ICMP_UGT ||
				pred == CmpInst::FCMP_OGT || pred == CmpInst::FCMP_UGT)
				res = " > ";
				else if (pred == CmpInst::ICMP_SGE || pred == CmpInst::ICMP_UGE ||
				pred == CmpInst::FCMP_OGE || pred == CmpInst::FCMP_UGE)
				res = " >= ";
				else if (pred == CmpInst::ICMP_SLT || pred == CmpInst::ICMP_ULT ||
				pred == CmpInst::FCMP_OLT || pred == CmpInst::FCMP_ULT)
				res = " < ";
				else if (pred == CmpInst::ICMP_SLE || pred == CmpInst::ICMP_ULE ||
				pred == CmpInst::FCMP_OLE || pred == CmpInst::FCMP_ULE)
				res = " <= ";
				else if (pred == CmpInst::ICMP_EQ || pred == CmpInst::FCMP_OEQ ||
				pred == CmpInst::FCMP_UEQ)
				res = " == ";
				else if (pred == CmpInst::ICMP_NE || pred == CmpInst::FCMP_ONE ||
				pred == CmpInst::FCMP_UNE)
				res = " != ";

				return res;
				}

				/* Alex:
				* - we get a C expression
				* by walking on the use-def-chains (more exactly the only reaching
				* definition for the SSA it instruction) in order to get the most complete
				* definition for the it instruction.
				*
				* - doing some sort of partial evaluation

				Note: SCEV also pretty prints - display expressions related to tripcounts
				(zext i16 (-1 + %N) to i32)
				(see code below:
				BackedgeTakenCount->dump();
				ExitCount->dump(); )
				See, more exactly,
				http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html
				void SCEV::print(raw_ostream &OS) const {}

				Important Note: We use ((int *)&x) instead of &x because & applied to an
				array (a global one at least) yields a pointer to the whole array, so
				pointer arithmetic on it advances in steps of sizeof(array), not in steps
				of one element.
				Concrete example on ARM 32 on zedboard.arh.pub.ro:
				/home/alarm/OpincaaLLVM/opincaa_standalone_app/35_MatMul/SIZE_256/STDout_003a
				Before 1st write: &A = 405912
				Before 1st write: &A + 20 = 3027352
				Before 1st write: &A + 131072 = 405912
				Before 1st write: ((char *)&A) + 131072 = 536984
				Here (&A + 20) - &A = 2621440 bytes = 20 * 131072, so each step of &A is
				131072 bytes. When running on ARM (32 bits processor), &A + 131072
				therefore advances by 131072 * 131072 = 2^34 bytes, which wraps around
				modulo 2^32 and gives &A + x == &A.
				So, again, when doing arithmetic we need to use (int *)(&A)
				or (short/char *)(&A) instead of &A.
				Note: [TODO CHECK WELL]: It seems for pointer type we print just the
				var e.g. A without &A.
				*/
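// Illustrative sketch only, not part of the patch: the pointer-to-array
// arithmetic described in the note above. It ASSUMES A is a 256x256 array of
// 16-bit elements (131072 bytes); the patch does not state the type of A,
// this is only consistent with the printed addresses. All function names
// below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical array: 256 * 256 elements of 2 bytes each = 131072 bytes.
static std::int16_t A[256][256];

// Byte distance between &A and &A + 1: a pointer to the whole array steps
// over one complete array.
std::size_t strideOfPointerToArray() {
  auto base = reinterpret_cast<std::uintptr_t>(&A);
  auto next = reinterpret_cast<std::uintptr_t>(&A + 1);
  return static_cast<std::size_t>(next - base);
}

// Byte distance between ((char *)&A) and ((char *)&A) + 1: one byte.
std::size_t strideOfCharPointer() {
  char *p = reinterpret_cast<char *>(&A);
  return static_cast<std::size_t>((p + 1) - p);
}

// Byte offset that &A + n would add on a machine with 32-bit addresses,
// modelled here by reducing n * sizeof(A) modulo 2^32.
std::uint32_t wrappedOffset32(std::uint32_t n) {
  std::uint64_t offset = static_cast<std::uint64_t>(n) * sizeof(A);
  return static_cast<std::uint32_t>(offset & 0xFFFFFFFFu);
}
```

// Under these assumptions wrappedOffset32(20) is 2621440, the difference
// between the two addresses printed above, and wrappedOffset32(131072) is 0.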
				int getExprExitCount_StepCt_int = -100100100;
				bool usePaddingForNestedLoops_more = false;
				bool getExprVarSpecial = false;
				// IIRC getExprVarSpecial is used to distinguish between LLVM IR Value objects
				// with the same name (by appending the pointer address to the name, so the
				// resulting name is e.g. row__0x1299b68).
				// Very Important: also call cacheExpr.clear() when setting
				// getExprVarSpecial = true.
				// bool getExprForTripCount = false;
				bool getExprForDMATransfer = true;
				// std::unordered_map<Instruction *, std::string> cacheExpr;
				std::unordered_map<Value *, std::string> cacheExpr;
				Value *basePtrGetExprIt; // The base pointer (GetElementPtr, 1st operand)
				int getExprGEPCount = 0;
				Value *getExprGEP;
				// Important-TODO: make getExpr(Value *it) and check if it is instruction or not
				//
				std::string getExpr(Value *aVal) {
				std::string str0("");
				std::string str1("");

				std::string strCopy;

				// ConstantExpr: bool isSelectConstantExpr = false;
				const ConstantExpr *CE;

				std::string res;

				Value *op0 = nullptr;
				Value *op1 = nullptr;

				const std::string STR_VEC_IND = "vec.ind";
				const std::string STR_STEP_ADD = "step.add";
				/* Note that if I recall correctly, the var names ending in splatinsert are
				automatically generated */
				const std::string STR_BROADCAST_SPLATINSERT = "broadcast.splatinsert";
				const std::string STR_SPLATINSERT = ".splatinsert";
				const std::string STR_BROADCAST_SPLAT = "broadcast.splat";
				const std::string STR_SPLAT = ".splat";
				//
				const std::string STR_INDUCTION = "induction";
				const std::string STR_UNDEF = "undef";
				const std::string STR_INDEX = "index";
				const std::string STR_INDEX_NEXT = "index.next";

				const std::string INVALID_VALUE_CACHEEXPR = "\\@@INVALID_STR@@";

				// Check for nullptr BEFORE the first dereference of aVal (getName() below).
				if (aVal == nullptr) {
				LLVM_DEBUG(dbgs() << "Entered getExpr(): aVal = nullptr.\n");
				return std::string("");
				}
				LLVM_DEBUG(dbgs() << "Entered getExpr(): aVal = " << aVal << ".\n");

				// See http://www.cplusplus.com/reference/unordered_map/unordered_map/find/
				// std::unordered_map<Instruction *, std::string>::const_iterator got =
				std::unordered_map<Value *, std::string>::const_iterator got =
				// cacheExpr.find(it);
				cacheExpr.find(aVal);

				// const char *iGetNameData = it->getName().data();
				// std::string iGetNameData(it->getName().data());
				// StringRef::data() is not guaranteed to be null-terminated, so use str().
				std::string iGetNameData(aVal->getName().str());

				res.clear();

				Instruction *it = llvm::dyn_cast<Instruction>(aVal);
				if (it == nullptr) {
				LLVM_DEBUG(dbgs() << "getExpr(): it = nullptr.\n");

				res = "";

				/*
				it = (Instruction *)aVal;
				LLVM_DEBUG(dbgs() << "getExpr(): After static typecast, *it = "
				<< *it << ".\n");
				*/

				/* Global var (values, not arrays) in LLVM language are already pointers to
				the global address space. This is why we need to use & for them.
				We check that *it is a GlobalValue like:
				@colsK = common local_unnamed_addr global i32 0, align 4
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1GlobalValue.html
				// (also http://llvm.org/docs/LangRef.html#global-variables)
				*/
				// if (GlobalValue *gv = llvm::dyn_cast<GlobalValue>(it))
				if (llvm::dyn_cast<GlobalValue>(aVal) != nullptr) {
				if (usePaddingForNestedLoops_more == true)
				res = "(";
				else
				res = "((int *)&";

				if (getExprVarSpecial) {
				// res += "<VAR*SPECIAL>";
				}

				res += iGetNameData;

				if (getExprVarSpecial) {
				// it is nullptr in this branch, so we print the address of aVal instead.
				res += stringPrintf(const_cast<char *>("__%p"), (void *)aVal);
				// res += "<VARSPECIALEND>";
				}

				res += ")";
				if (basePtrGetExprIt == nullptr)
				basePtrGetExprIt = aVal;

				goto GetExpr_end;
				}

				/* See llvm.org/docs/doxygen/html/Core_8h_source.html#l00100 and
				http://llvm.org/docs/doxygen/html/Instruction_8cpp_source.html#l00194
				for all supported opcodes.
				In fact, we can have more valid opcodes than these
				See http://llvm.org/docs/doxygen/html/Core_8h_source.html#l00100
				- the enums with typedef enum LLVMOpcode - e.g., LLVMAdd, etc
				seem to be related to values of Instruction::getOpcode().
				I think Instruction::Add == LLVMAdd + InstructionVal
				(use gdb to see exactly);
				note also that getOpcode() returns getValueID() - InstructionVal.
				http://llvm.org/docs/doxygen/html/Value_8h_source.html
				see enum ValueTy - better see
				http://llvm.org/test-doxygen/api/Value_8h_source.html,
				since the Value.h source file uses TableGen macros inside.
				*/
				LLVM_DEBUG(dbgs() << "getExpr(): Special case\n");
				const Constant *C = llvm::dyn_cast<Constant>(aVal);

				LLVM_DEBUG(dbgs() << "getExpr(): C = " << C << "\n");

				if (C != nullptr) {
				LLVM_DEBUG(dbgs() << " getExpr(): aVal is Constant.\n");
				// res += "Constant-->";

				if (const ConstantInt *CI = llvm::dyn_cast<ConstantInt>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): CI->getValue() = " << CI->getValue()
				<< ".\n");
				res += std::to_string(CI->getValue().getSExtValue());
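				} else if (const ConstantDataVector *CDV =
				llvm::dyn_cast<ConstantDataVector>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): CDV->getSplatValue() = "
				<< CDV->getSplatValue() << ".\n");
				// getSplatValue() returns nullptr if CDV is not a splat, and the splat
				// value is not necessarily a ConstantInt, so we do not cast blindly.
				if (const ConstantInt *SplatCI =
				llvm::dyn_cast_or_null<ConstantInt>(CDV->getSplatValue()))
				res += std::to_string(SplatCI->getSExtValue());
				}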
				else
				/*
				// Maybe useful in the future, but little likely:
				if (const ConstantDataArray *CA =
				llvm::dyn_cast<ConstantDataArray>(C)) { LLVM_DEBUG(dbgs() << "
				getExpr(): It is ConstantDataArray.\n");
				}
				if (const ConstantArray *CA = llvm::dyn_cast<ConstantArray>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): It is ConstantArray.\n");
				}
				*/

				/* Inspired from
				http://llvm.org/docs/doxygen/html/AsmWriter_8cpp_source.html#l01304,
				method WriteConstantInternal() .
				*/
				if (CE = llvm::dyn_cast<ConstantExpr>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): It is ConstantExpr.\n");
				LLVM_DEBUG(dbgs() << " getExpr(): CE->getNumOperands() = "
				<< CE->getNumOperands() << "\n");

				if (CE->getNumOperands() > 0) {
				op0 = CE->getOperand(0);
				str0 = op0->getName().data();

				if (CE->getOpcode() == Instruction::GetElementPtr) {
				res += "(int *)&(" + str0;
				} else {
				res += "(" + getExpr(op0) + " )";
				}
				// small-TODO: Replace the " )" with ")"
				}

				// From http://llvm.org/test-doxygen/api/Constants_8cpp_source.html
				// res += CE->getOpcodeName();
				switch (CE->getOpcode()) {
				// small-TODO: This code is similar to the one for the switch above
				// - maybe we should reuse code although it will make things more
				// complicated...
				case Instruction::Add:
				res += " + ";
				break;
				case Instruction::Sub:
				res += " - ";
				break;
				case Instruction::Mul:
				res += " * ";
				break;
				case Instruction::UDiv:
				case Instruction::SDiv:
				res += " / ";
				break;
				case Instruction::SRem:
				case Instruction::URem:
				res += " % ";
				break;
				case Instruction::Shl:
				res += " << ";
				break;
				case Instruction::LShr:
				res += " >> ";
				break;
				case Instruction::AShr:
				res += " >> ";
				break;
				case Instruction::ICmp:
				case Instruction::FCmp: {
				// Check type of cmp
				int pred = CE->getPredicate();
				res += getPredicateString(pred);

				break;
				}
				case Instruction::ZExt:
				case Instruction::SExt:
				// res += " ext "; // Note: this is unary operator
				break;
				case Instruction::Trunc:
				// res += " trunc ";
				break;
				case Instruction::Select: {
				res += " ? ";

				// To pretty-print the 3rd operand, below
				/*
				// ConstantExpr: isSelectConstantExpr = true;
				LLVM_DEBUG(dbgs()
				<< "getExpr(): setting isSelectConstantExpr = true\n");
				*/
				break;
				}

				case Instruction::PtrToInt:
				case Instruction::IntToPtr: {
				break;
				}
				case Instruction::GetElementPtr: {
				// res += " [Unsupported_C_CtExpr_operator = GEP]";

				int numOpnds = CE->getNumOperands();
				int startIndex = 2; // getIndexFirstOpndFromGEPInst(GEPPtr);

				for (int i = startIndex; i < numOpnds; i++) {
				res += "[";

				Value *op_i = CE->getOperand(i);
				res += getExpr(op_i);

				res += "]";
				}
				break;
				}
				default:
				res += " [Unsupported_C_CtExpr_operator]";
				break;
				}

				if (CE->getOpcode() != Instruction::GetElementPtr &&
				CE->getNumOperands() > 1) {
				op1 = CE->getOperand(1);
				str1 = op1->getName().data();

				res += getExpr(op1);
				}

				if (CE->getOpcode() == Instruction::Select &&
				CE->getNumOperands() > 2) {
				res += " : ";

				op1 = CE->getOperand(2);
				str1 = op1->getName().data();

				res += getExpr(op1);
				}

				if (CE->getNumOperands() == 0) // MEGA-TODO: take out this silly check
				res += " ";

				if (CE->getOpcode() == Instruction::GetElementPtr) {
				res += ")";
				}
				} // END if (CE = llvm::dyn_cast<ConstantExpr>(C))
				else {
				res += " [Unsupported_Constant]";
				/*
				// Compiler error: <<error: cannot use typeid with -fno-rtti>>
				res += typeid(C).name() + " ";
				res += typeid(aVal).name() + " ";
				*/

				res += " ";
				}

				// break;
				// goto GetExpr_end;
				} else {
				// res += " [Unsupported_C_operator] [Constant_C_is_nullptr]";
				res += iGetNameData;

				// goto GetExpr_end;
				}

				goto GetExpr_end;
				} else {
				LLVM_DEBUG(dbgs() << "getExpr(): it = " << it << ".\n");
				}

				/* Note: It is possible that the names have a suffix when we have 2+
				vars starting with the same name - this happens when more
				vector.body BBs are created (more loops are vectorized).
				For this reason we match names by prefix (startsWith()), not by
				exact comparison. */

				if (it->getNumOperands() > 0) {
				op0 = it->getOperand(0);
				str0 = op0->getName().data();
				if (it->getNumOperands() > 1) {
				op1 = it->getOperand(1);
				str1 = op1->getName().data();
				}
				}

				/*
				* Note: It points to an Instruction (or just a Value).
				getOperand() returns type Value.
				* From http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html
				<< StringRef getName () const
				Return a constant reference to the value's name. >>
				*/

				/*
				LLVM_DEBUG(dbgs() << "getExpr(): getExprForTripCount = "
				<< getExprForTripCount << "\n");
				*/
				printInfo(it, str0, str1, iGetNameData, op0, op1);

				if (got == cacheExpr.end()) {
				// cacheExpr.insert(it);

				/* We insert the INVALID_VALUE_CACHEEXPR placeholder, just to keep track
				* that we visited this node, and we update the entry with the correct
				* value at the end of the function. */
				cacheExpr[it] = INVALID_VALUE_CACHEEXPR; // res;
				} else {
				if (cacheExpr[it] != INVALID_VALUE_CACHEEXPR) {
				// This case can be quite easily reached if the expression it has
				// several times as constituent atoms the same expression.
				res = got->second;
				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): We already visited this node so we stop here.\n");
				goto GetExpr_end;
				}
				else
				/* We have already visited this node, but only the
				* INVALID_VALUE_CACHEEXPR placeholder is cached for it, i.e. we are
				* still in the middle of computing its expression. */
				if (it->getOpcode() == Instruction::PHI) {
				/* If we visited this phi we do NOT revisit it since it can easily
				* result in infinite cycles... It's not very well founded,
				* but it's OK :) */
				// We should keep the unstripped name, although it is possible that if
				// we visited the variable node before it might be already stripped.

				if (str0.empty()) {
				std::string exprOp0 = getExpr(op0);
				LLVM_DEBUG(dbgs() << "getExpr(): Checking PHI's exprOp0 = " << exprOp0
				<< " (should be a constant).\n");

				// MEGA-TODO: Test well, also with regression tests
				if (iGetNameData.empty()) {
				if (exprOp0.size() > 4)
				res = exprOp0;
				}

				// assert(exprOp0 == "0");
				} else if (str1.empty()) {
				std::string exprOp1 = getExpr(op1);
				LLVM_DEBUG(dbgs() << "getExpr(): Checking PHI's exprOp1 = " << exprOp1
				<< " (should be a constant).\n");

				// MEGA-TODO: Test well, also with regression tests
				if (iGetNameData.empty()) {
				if (exprOp1.size() > 4)
				res = exprOp1;
				}

				// assert(exprOp1 == "0");
				} else {
				LLVM_DEBUG(dbgs() << "getExpr(): Setting res to the stripped PHI name.\n");

				res = rStripStringAfterChar(iGetNameData, '.');
				if (getExprVarSpecial) {
				res += stringPrintf(const_cast<char *>("__%p"), (void *)it);

				// res += "<VARSPECIALEND>";
				}

				goto GetExpr_end;
				}

				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): We visited part of this PHI node "
				"so we approximate it... This should be avoided if possible.\n");

				if (getExprVarSpecial) {
				// res += "<VAR*SPECIAL>";
				}

				LLVM_DEBUG(dbgs() << "getExpr(): res = " << res << "\n");
				// res = rStripStringAfterChar(iGetNameData, '.');
				strCopy.assign(iGetNameData);
				strCopy = rStripStringAfterChar(strCopy, '.');
				res += strCopy;
				LLVM_DEBUG(dbgs() << "getExpr(): after, res = " << res << "\n");

				if (getExprVarSpecial) {
				res += stringPrintf(const_cast<char *>("__%p"), (void *)it);

				// res += "<VARSPECIALEND>";
				}

				goto GetExpr_end;
				}
				} // END else if (got == cacheExpr.end())

				/* // NOT_TREAT_NMODVF
				// When computing trip count, I don't want it to be multiple of VF,
				// but I want the original expression.
				// Note: n.mod.vf is a name given by the program below (this module) in
				// getOrCreateVectorTripCount().
				// It is possible that the names to have a suffix since the names
				// exist, since a different vector.body was created before.
				if (startsWith(iGetNameData, STR_REMAINDER_VF)) {
				LLVM_DEBUG(dbgs() << "getExpr(): NOT following remainder var "
				<< iGetNameData << ".\n");

				// A simple hack, since I already have the - operator and am lazy to
				// get rid of it:
				res = "0";

				goto GetExpr_end;
				}
				*/

				if (startsWith(iGetNameData, STR_INDUCTION) &&
				(it->getOpcode() == Instruction::Add)) {
				LLVM_DEBUG(dbgs() << "getExpr(): NOT following induction var "
				<< iGetNameData << ".\n");
				res = getExpr(it->getOperand(0));

				/* Indeed, induction is a vector of consecutive indices - let's call it
				a vector index.
				Very Important: To understand things better, we distinguish:
				- the scalar index, indexLLVM_LV, or LV's index (and index.next)
				- the vector index, vec.ind, used for loading from array (well,
				sort of scalar, but...) */

				/*
				// We do NOT process this:
				res += " + ";
				// TODO: check that op1 == <VF x i...><0, 1, ..., VF-1>
				res += "indexLLVM_LV";
				*/
				goto GetExpr_end;
				}

				if ((it->getOpcode() == Instruction::PHI) &&
				startsWith(iGetNameData, STR_INDEX) &&
				startsWith(std::string(it->getOperand(1)->getName().data()),
				STR_INDEX_NEXT)) {
				// TODO Check that op0 is constant 0.
				// Coping with %index = phi i32 [ 0, %vector.ph ],
				// [ %index.next, %vector.body ]
				// LLVM_DEBUG(dbgs() << "getExpr(): NOT following index induction var.\n");
				LLVM_DEBUG(dbgs() << "getExpr(): Treating special case index = "
				"phi(0, index.next).\n");

				// A simple hack, since I already have the - operator and I am lazy to
				// get rid of it:
				#ifdef AGGREGATED_DMA_TRANSFERS
				// Important note: we include this file from the back end also now
				// (not only LoopVectorize.cpp)
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#else
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#endif

				goto GetExpr_end;
				}

				// Here we try to solve a recurrence equation with any PHI node related to
				// the C source variables:
				if ((it->getOpcode() == Instruction::PHI) &&
				startsWith(iGetNameData, STR_VEC_IND) == false &&
				startsWith(iGetNameData, STR_STEP_ADD) == false &&
				startsWith(iGetNameData, STR_INDUCTION) == false) {
				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): it is Phi, phi node with no special vector vars...\n");

				assert(it->getNumOperands() > 0);

				// std::string exprOp1 = canonicalizeExpression(getExpr(op1);

				/*
				// Treating the rather unfortunate case that scalar evolution
				// is "over-intelligent"
				#define STR_INDVAR_WITH_ASSOCIATED_PHI "indvarScEv"
				if (startsWith(iGetNameData, STR_INDVAR_WITH_ASSOCIATED_PHI)) {
				LLVM_DEBUG(dbgs() << "getExpr(): Treating Phi case indvar with "
				"associated Phis\n");

				BasicBlock *bbIt = it->getParent();

				// TODO: retrieve increment step of Instruction *it
				// (maybe use the SCEV BackedgeTakenCount of getOrCreateTripCount()).
				// for (auto instr : bbIt)
				// for (BasicBlock::iterator I = bbIt->begin(); isa<PHINode>(I); ++I)
				for (const Instruction &I : *bbIt) {
				if (I.getOpcode() == Instruction::PHI) {
				std::string Iname = I.getName();
				LLVM_DEBUG(dbgs() << "getExpr(): Iname = " << Iname << "\n");

				if (!startsWith(Iname, STR_INDVAR_WITH_ASSOCIATED_PHI)) {
				// MEGA-TODO: check *I has the same PHI labels

				// TODO: retrieve increment step of Instruction *I

				res = getExpr(&I);
				goto GetExpr_end;
				//break;
				}
				}
				}
				}
				*/

				// MEGA-TODO: Test well
				// op0 can be a Constant, so we test with isa<> instead of casting it
				// blindly to Instruction.
				if (llvm::isa<PHINode>(op0)) {
				// MEGA-TODO:
				// && startsWith(exprOp1, STR_UNDEF))
				LLVM_DEBUG(dbgs() << "getExpr(): op0 is Phi --> res = getExpr(op0)\n");

				// NOT good - doesn't work well e.g. for 41_PolybenchCPU_covariance for
				// the for (j = i; ...) loop: res = getExpr(op0);
				res = rStripStringAfterChar(iGetNameData, '.');
				// Temporary solution: MEGA-TODO: test well

				if (getExprVarSpecial) { // NOT tested, not sure if really required
				res += stringPrintf(const_cast<char *>("__%p"), (void *)it);
				// res += "<VARSPECIALEND>";
				}

				goto GetExpr_end;
				}
				// MEGA-TODO: Test well
				// This is required e.g. for
				// 41_PolybenchCPU_covariance/B/A_With_counter_instrument/5/test.c
				else if (it->getNumOperands() >= 2 && llvm::isa<PHINode>(op1)) {
				// MEGA-TODO: && startsWith(exprOp1, STR_UNDEF)
				LLVM_DEBUG(dbgs() << "getExpr(): op1 is Phi --> res = getExpr(op1)\n");
				res = getExpr(op1);
				goto GetExpr_end;
				}

				std::string exprOp0Orig =
				canonicalizeExpression(getExpr(op0),
				/* removeOuterParens = */ true);
				LLVM_DEBUG(dbgs() << " exprOp0Orig = " << exprOp0Orig << "\n");
				if (exprOp0Orig == "0") {
				// if (str0.empty())
				// Note: vars like %33, but also constants like i64 0, don't have a name
				// --> str0 is empty for them
				LLVM_DEBUG(
				dbgs() << " getExpr(): exprOp0Orig == \"0\" --> exchanging operands\n");

				str0.swap(str1);

				Value *tmp = op0;
				op0 = op1;
				op1 = tmp;

				// EXCHANGE(str0, str1);
				// EXCHANGE((int)op0, (int)op1);

				printInfo(it, str0, str1, iGetNameData, op0, op1);
				}

				/*
				// TODO: We should treat PHI in a unified way by putting op0 to be a
				// constant normally and op1 an add/sub Instruction (normally)
				// rather-important-MEGA-TODO: determine if BB Phi-target of op0 dominates
				// BB Phi-target of op1. If not exchange op0<->op1
				// Currently we do a simpler solution since we do not have an
				// instance of DomTree in getExpr().
				// This solution is good but it does NOT treat cases like
				// for (j = i + 1; j < ...; j +...)
				std::string exprOp0 = getExpr(op0);
				std::string exprOp1 = getExpr(op1);
				if (exprOp1[0] >= '0' && exprOp1[0] <= '9') {
				LLVM_DEBUG(dbgs() << " Exchanging op0 and op1\n");

				// op1 is a numerical constant - we exchange 0 with 1
				Value *tmp = op1;
				op1 = op0;
				op0 = tmp;
				}
				*/
				if (str0.empty() == false) {
				// If op0 is not a constant (since the name of op0 is NOT empty)
				// assert(str0 ==(symbolically, after more recovery) iGetNameData + 1);

				LLVM_DEBUG(dbgs() << " ... op0 = " << op0 << "\n");
				LLVM_DEBUG(dbgs() << " ... op1 = " << op1 << "\n");

				LLVM_DEBUG(dbgs() << " ... Entering getExpr() for op0\n");
				std::string exprOp0 =
				canonicalizeExpression(getExpr(op0),
				/* removeOuterParens = */ false);
				LLVM_DEBUG(dbgs() << " exprOp0 = " << exprOp0 << "\n");

				std::string itVarPlusCt = "(";

				strCopy.assign(iGetNameData);
				strCopy = rStripStringAfterChar(strCopy, '.');
				if (getExprVarSpecial) {
				strCopy += stringPrintf(const_cast<char *>("__%p"), (void *)it);
				}

				// assert(op1 != nullptr);
				Instruction *itOp0 = llvm::dyn_cast<Instruction>(op0);
				assert(itOp0 != nullptr);
				std::string exprItOp0Op0 = getExpr(itOp0->getOperand(0));
				LLVM_DEBUG(dbgs() << " strCopy = " << strCopy << "\n");
				LLVM_DEBUG(dbgs() << " exprItOp0Op0 = " << exprItOp0Op0 << "\n");
				assert(exprItOp0Op0 == strCopy);

				// itVarPlusCt = itVarPlusCt + iGetNameData;
				itVarPlusCt = itVarPlusCt + strCopy;
				//
				// itVarPlusCt = itVarPlusCt + " + 1)";
				itVarPlusCt = itVarPlusCt + " + " + getExpr(itOp0->getOperand(1)) + ")";

				LLVM_DEBUG(dbgs() << " exprOp0 = " << exprOp0 << "\n");
				LLVM_DEBUG(dbgs() << " itVarPlusCt = " << itVarPlusCt << "\n");

				if (exprOp0 != itVarPlusCt) {
				// Important-TODO: take from the other if case below
				LLVM_DEBUG(dbgs() << " (SPECIAL) VERY BAD case encountered: "
				<< "Phi node is NOT like x = Phi(x + 1, 0) --> "
				"return 'main' part of exprOp0 = "
				<< exprOp0 << "\n");

				/* Important-TODO: this case is special (with a ~incomplete solution)
				- to compute a solution to the phi node we normally require more
				intelligent analysis.

				Here we treat associated Phis:
				For example, for test 32_MatAdd we have:
				%conv48.us = phi i32 [ %conv.us, %for.cond3.for.inc12_crit_edge.us ],
				[ 0, %for.cond3.preheader.us.preheader ]
				%i.047.us = phi i16 [ %inc13.us, %for.cond3.for.inc12_crit_edge.us ],
				[ 0, %for.cond3.preheader.us.preheader ]

				While the 2nd phi has an easy to find solution (by seeing that
				%inc13.us = add i16 %i.047.us, 1, !dbg !27)
				which means the closed-form solution of Phi is %i.047.us = i,
				for the 1st phi node the situation is Very complicated.
				But we see that:
				%conv.us = sext i16 %inc13.us to i32, !dbg !28
				which makes the Phi expression of %conv48.us the same as
				for %i.047.us .

				Also for SSD:
				%conv48.us = phi(i.047.us + 1, 0)

				getExpr(): it = %conv327 = phi i32 [ 0, %for.cond2.preheader ],
				[ %conv3, %for.inc44 ]
				getExpr(): op1 = %conv3 = sext i16 %inc45 to i32, !dbg !41
				getExpr(): updated op1 = %inc45 = add i16 %counter.026, 1,
				!dbg !40
				Although %conv327 does NOT appear in the final .ll file, if we look in:
				NEW_v128i16/90_CV/SSD/STDerr_clang_opt_01
				we have a similar case:
				for.cond7.preheader:
				; preds = %for.cond2.preheader, %for.inc44
				%conv327 = phi i32 [ 0, %for.cond2.preheader ],
				[ %conv3, %for.inc44 ]
				%counter.026 = phi i16 [ 0, %for.cond2.preheader ],
				[ %inc45, %for.inc44 ]
				*/

				/* MEGA-TODO: think if possible to do better like
				having getExpr return a parse tree where it is clear that a
				node is a var or constant in order to avoid using substr. */
				/*
				LLVM_DEBUG(dbgs()
				<< " res = " << res << "\n"
				<< " exprOp0.substr(1, exprOp0.size() - 6) = "
				<< exprOp0.substr(1, exprOp0.size() - 6) << "\n");
				*/
				assert(exprOp0.size() >= 6 + 1);

				// Since we are in a SPECIAL ("VERY BAD") case, we make some
				// checks to "fix" the problem. If it doesn't work we simply
				// "bail" out
				std::string exprOp0Substr = exprOp0.substr(1, exprOp0.size() - 6);
				LLVM_DEBUG(dbgs() << " exprOp0.substr(1, exprOp0.size() - 6) = "
				<< exprOp0Substr << "\n");
				// MEGA-TODO: CHECK better that expression is well formed
				if (checkCorrectParanthesis(exprOp0Substr, nullptr) &&
				(endsWith(exprOp0Substr, " +") == false)) {
				res += exprOp0Substr;
				} else {
				res += exprOp0;
				}

				goto GetExpr_end;
				} else { // i.e. if (exprOp0 == itVarPlusCt)
				// Case: *it is: x == phi(x + 1, 0);
				// Check getExpr(op0) == str0 + 1;

				LLVM_DEBUG(dbgs() << " ... Entering getExpr() for op1\n");
				std::string exprOp1 = canonicalizeExpression(getExpr(op1));
				LLVM_DEBUG(dbgs() << " exprOp1 = " << exprOp1 << "\n");
				// From NOW onwards we treat cases like for (c1 = 468; c1 >= 0; c1 -= 78)
				// assert(exprOp1 == "0");

				// assert(op0->getOpcode() == Instruction::ADD);
				/* assert that:
				- op1 is ct 0 and
				- op0 == iGetNameData + 1 (but this normally leads to
				a cyclic dependency)
				i.e., check that (str0 == iGetNameData) && (str1 == ct 0) */
				/* This next condition is Very important
				* - e.g., for i phi node, for ...: because TODO
				*/
				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): ...and str0 not empty, --> res = name of it\n");

				/* We don't modify iGetNameData - otherwise we get errors
				(assertion failures, etc) for modifying the LLVM variable names
				*/
				strCopy.assign(iGetNameData);

				/* We might have a newly created temp LLVM var and keep the original
				(source file) variable name
				*/
				strCopy = rStripStringAfterChar(strCopy, '.');
				res += strCopy;

				if (getExprVarSpecial) { // NOT tested, not sure if really required
				res += stringPrintf(const_cast<char *>("__%p"), (void *)it);
				// res += "<VARSPECIALEND>";
				}

				goto GetExpr_end;
				}
				}
				} // end if ((it->getOpcode() == Instruction::PHI)

				// TODO: NOT sure if it's OK to only choose it->getOperand(0)
				// Normally this makes it a pointer to Value
				if (it->getNumOperands() == 0) {
				Type *itType = ((Value *)it)->getType();

				LLVM_DEBUG(dbgs() << " (getExpr(): it->getType() = " << *itType
				<< " )\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1Type.html
				if (itType->isVectorTy()) {
				int64_t resVal = 0;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1ConstantVector.html
				/* Surprisingly NOT working:
				ConstantVector *ctVec = llvm::dyn_cast<ConstantVector>((Value *)it);
				*/

				// See llvm.org/docs/doxygen/html/classllvm_1_1ConstantDataVector.html
				ConstantDataVector *ctVec = llvm::dyn_cast<ConstantDataVector>(it);

				LLVM_DEBUG(dbgs() << "getExpr(): ctVec =" << ctVec << "\n");

				if (ctVec != nullptr) {
				// getSplatValue() returns nullptr if the vector is NOT a splat
				Constant *ctSplat = ctVec->getSplatValue();

				if (ctSplat != nullptr) {
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1Constant.html
				const APInt ctAPInt = ctSplat->getUniqueInteger();
				// TODO: Use instead Constant::getAggregateElement() - see
				// http://lists.llvm.org/pipermail/llvm-dev/2016-November/106954.html

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1APInt.html
				resVal = ctAPInt.getSExtValue();
				}
				}

				/* This was meant for the %induction vector var;
				but it's NOT good for %(broadcast).splatinsert - but we take
				care of this below ...TODO [SAY WHERE]
				*/
				// std::to_string() avoids the non-portable %ld format for int64_t
				res += "(int)" + std::to_string(resVal);

				goto GetExpr_end;
				}

				// We print the constant or input variable:
				std::string Result;
				raw_string_ostream OS(Result);
				it->printAsOperand(OS, /* bool PrintType = */ false);
				OS.flush();
				LLVM_DEBUG(dbgs() << " (getExpr(): it->printAsOperand() = " << Result
				<< ")\n");

				// We erase the leading % char if it exists - for name of var
				if (Result.c_str()[0] == '%')
				Result.erase(0, 1);

				// f16 (half) constants are printed by LLVM in hexadecimal with the
				// prefix 0xH, as specified at http://llvm.org/docs/LangRef.html
				// ("The IEEE 16-bit format (half precision) is represented by 0xH
				// followed by 4 hexadecimal digits."),
				// so we remove the 'H' to obtain a proper hexadecimal value.
				if (startsWith(Result, "0xH"))
				Result.erase(2, 1);

				/*
				Result.clear();
				it->print(OS);
				OS.flush();
				LLVM_DEBUG(dbgs() << " (getExpr(): it->print() = "
				<< Result << ")\n");
				*/
				/*
				switch (it->getOpcode()) {
				case Instruction::Constant:
				LLVM_DEBUG(dbgs() << " (getExpr(): it is Constant))\n");
				res = "ct";
				break;
				}
				*/
				if (startsWith(Result, STR_UNDEF) == false) {
				/* Note:
				We can also have as parent
				%broadcast.splatinsert = insertelement <32 x i64> undef, i64 %mul.us,
				i32 0
				For this case, operand 0 is printed as: "<32 x i64> undef".
				But we avoid to reach this case by specially treating
				a %broadcast.splatinsert node.
				*/
				res += Result;
				}
				goto GetExpr_end;
				} // End of if (it->getNumOperands() == 0)

				bool putParantheses;

				switch (it->getOpcode()) {
				case Instruction::ZExt:
				case Instruction::SExt:
				case Instruction::Trunc:
				case Instruction::ShuffleVector:
				case Instruction::InsertElement:
				case Instruction::PHI:
				case Instruction::ExtractElement:
				// res = "";
				putParantheses = false;
				break;

				case Instruction::GetElementPtr: {
				/*
				putParantheses = false;
				res = "(int *)&";
				*/

				/* Important:
				From http://en.cppreference.com/w/c/language/operator_precedence:
				- operator [] (Array subscripting) has higher precedence
				than & (Address-of).
				So we need to put parentheses here
				in case [] follows.
				*/
				putParantheses = true;

				// res = "(int *)&(";

				// By doing so we treat cases like (&ls[index])[0] (see SSD benchmark)
				res = "((int *)&";

				getExprGEPCount++;
				getExprGEP = it;

				GetElementPtrInst *GEPInstr = llvm::dyn_cast<GetElementPtrInst>(it);
				assert(GEPInstr != nullptr);
				if (basePtrGetExprIt == nullptr)
				basePtrGetExprIt = GEPInstr->getPointerOperand();

				break;
				}
				default:
				putParantheses = true;
				res = "(";
				}
				/*
				if (putParantheses)
				res = "(";
				//
				if (it->getOpcode() == Instruction::GetElementPtr) { }
				*/

				LLVM_DEBUG(dbgs() << "getExpr(): putParantheses = " << putParantheses
				<< "\n");

				if (it->getNumOperands() > 1) {
				LLVM_DEBUG(dbgs() << "getExpr(): it->getOperand(1) = " << *op1 << "; "
				<< "(str1 = " << str1 << ")[END]\n");

				// We prevent pretty-printing constant vectors
				// if (getExprForTripCount == false)
				/* TODO: maybe step.add is not operand 1, but 0 or 2, etc; check that
				op0 is constant */
				if (startsWith(iGetNameData, STR_VEC_IND) &&
				startsWith(str1, STR_STEP_ADD) &&
				startsWith(str0, STR_INDUCTION) == false) {
				/*
				This prevents further processing of:
				%vec.ind = phi <32 x i64> [ <i64 0, i64 1, ...>, %vector.ph ],
				[ %step.add, %vector.body ]
				BUT NOT of: %vec.ind = phi <32 x i32> [ %induction, %vector.ph ],
				[ %step.add, %vector.body ]
				*/
				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): treating vec.ind = phi ct_vec, step.add case\n");

				#ifdef AGGREGATED_DMA_TRANSFERS
				// Important note: we include this file from the back end also now
				// (not only LoopVectorize.cpp)
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#else
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#endif
				goto GetExpr_end;
				}

				if (startsWith(iGetNameData, STR_VEC_IND) &&
				startsWith(str0, STR_INDUCTION) && startsWith(str1, STR_STEP_ADD)) {
				/*
				This handles:
				%vec.ind = phi <32 x i32> [ %induction, %vector.ph ],
				[ %step.add, %vector.body ]
				i.e. the vector induction variable starts from %induction, NOT from a
				constant vector like <i64 0, i64 1, ...>.
				*/
				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): treating vec.ind = phi induction, step.add case\n");
				res = getExpr(op0);
				res += " + indexLLVM_LV";
				goto GetExpr_end;
				}

				if (it->getOpcode() == Instruction::PHI) {
				// assert(0 && "We should not get here... since we already treated it");
				LLVM_DEBUG(
				dbgs() << "getExpr(): it is Phi. (normally should not be here)\n");
				LLVM_DEBUG(dbgs() << " getExpr(): it = " << it << "\n");

				// Important-TODO : follow I guess the loopexit value
				/* This is for cases like the one encountered in 50_SpMV, where we
				cycle over temporary created vars:
				%1 = phi i16 [ %2, %for.cond.loopexit ],
				[ %.pre, %for.body.preheader ]
				%2 = load i16, i16* %arrayidx5, align 2, !dbg !64, !tbaa !46
				%arrayidx5 = getelementptr inbounds i16, i16* %row_ptr,
				i64 %idxprom4, !dbg !64
				%idxprom4 = sext i32 %add to i64, !dbg !64
				%add = add nsw i32 %i.026, 1, !dbg !63
				%i.026 = phi i32 [ %add, %for.cond.loopexit ],
				[ 0, %for.body.preheader ]
				*/
				res = getExpr(op0);
				LLVM_DEBUG(dbgs() << "getExpr(): it is Phi, res = " << res << "\n");

				/* Noname like in the case of 50_SpMV testcase:
				%1 = phi(%2, row_ptr[0])
				TODO But I guess I should check iGetName != str0 + 1...
				*/
				// Note: constants like i64 0 don't have name --> str0 is empty
				if (str0.empty()) {
				LLVM_DEBUG(dbgs() << "getExpr(): it is Phi, str0 is empty.\n");

				// assert getNumOperands() > 1
				std::string res2 = getExpr(op1);
				LLVM_DEBUG(dbgs() << "getExpr(): res2 = " << res2 << "\n");
				// res += " phi ";

				/* Here we compute the solution of phi - a 1st simple and ~bad
				* attempt.
				MEGA-TODO: compute the
				closed-form solution from these recursive equations.
				*/
				#define STR_TO_LOOK_FOR " + 1"
				res = canonicalizeExpression(res);
				LLVM_DEBUG(dbgs() << "getExpr(): After canonicalizeExpression() res = "
				<< res << "\n");
				std::size_t found = res.find(STR_TO_LOOK_FOR);
				if (found != std::string::npos) {
				LLVM_DEBUG(dbgs() << "getExpr(): calling res.erase(found, "
				"strlen(STR_TO_LOOK_FOR))\n");
				res.erase(found, strlen(STR_TO_LOOK_FOR));
				res = canonicalizeExpression(res, true);
				}
				/*
				// BUGS: because of modifying the internal char * of a std::string
				// and I guess string::size() needs to be updated
				// also(??)
				const char *resCStr = res.c_str();
				char *resCStrFound = (char *)strstr(resCStr, STR_TO_LOOK_FOR);
				if (resCStrFound != nullptr) {
				LLVM_DEBUG(dbgs() << "InstrumentVectorStore(): resCStrFound = "
				<< resCStrFound << "\n");
				// NOT correct - strings do overlap: strcpy(resCStrFound,
				// resCStrFound + 4);
				memmove(resCStrFound, resCStrFound + strlen(STR_TO_LOOK_FOR),
				strlen(resCStrFound + strlen(STR_TO_LOOK_FOR)) + 1);
				}
				*/
				} else {
				/* Important-TODO: think if it is correct to be empty
				- try it out - note there is also another case treating
				phi nodes above.
				*/
				}

				goto GetExpr_end;
				}

				/*
				// NOT necessary anymore - treat below this case by simply jumping to
				// meaningful values
				if (startsWith(iGetNameData, STR_BROADCAST_SPLATINSERT) ||
				startsWith(iGetNameData, STR_SPLATINSERT)) {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): treating (broadcast).splat(insert) case\n");

				// op0 should be vector undef
				res = getExpr(op1);
				goto GetExpr_end;
				}
				*/
				if (startsWith(iGetNameData, STR_BROADCAST_SPLAT) ||
				/* // I guess it's not necessary to do this test:
				&& (startsWith(iGetNameData, STR_BROADCAST_SPLATINSERT) == false) */
				startsWith(iGetNameData, STR_SPLAT)
				/* // I guess it's not necessary to do this test:
				&& (startsWith(iGetNameData, STR_SPLATINSERT) == false) */
				) {
				LLVM_DEBUG(dbgs() << "getExpr(): treating (broadcast).splat case\n");

				/* This is for the SSD test:
				%broadcast.splat33 = shufflevector <128 x i16> %broadcast.splatinsert32,
				<128 x i16> undef, <128 x i32> zeroinitializer
				where it =
				%broadcast.splatinsert32 = insertelement <128 x i16> undef, i16 %0,
				i32 0
				and op0 = <128 x i16> undef
				*/
				if (llvm::dyn_cast<Instruction>(op0) == nullptr) {
				res = getExpr(op1);
				goto GetExpr_end;

				/*it->getOpcode() == Instruction::InsertElement)
				if (startsWith(iGetNameData, STR_BROADCAST_SPLAT) ||
				*/
				} else {
				/// TODO: Maybe I should do some checks
				// op1 should be vector undef, op2 should be zeroinitializer
				// res = getExpr(op0);
				res = getExpr((((Instruction *)op0)->getOperand(1)));
				goto GetExpr_end;
				}
				}
				}

				// We now pretty print op0;

				if (str0.empty()
				/* ||
				startsWith(str0, STR_BROADCAST_SPLATINSERT) */
				) {
				/* If the name of the variable is empty it means it is an automatically
				* generated name (like %0, etc), NOT a name from the original (C,C++)
				* program. Therefore we look also at the def of this var.
				*/

				/*
				TODO
				- ~BAD: recursively test str0 until we reach a
				variable name that is input to the function??
				*/
				/* TODO (THIS IS MAYBE BADLY DESIGNED - might require more or fewer steps):
				* Coping with type conversions like i32 to i64 (ex:
				* 201_LoopVectorize/25_GOOD_map/NEW/7_v16i32/3better_opt.ll)
				* in which case we have the following:
				for.body.preheader: ; preds = %entry
				%0 = add i32 %N, -1
				%1 = zext i32 %0 to i64
				%2 = add nuw nsw i64 %1, 1
				%min.iters.check = icmp ult i64 %2, 16
				[...]
				min.iters.checked: ; preds = %for.body.preheader
				%n.vec = and i64 %2, 8589934576
				*/

				/*
				LLVM_DEBUG(dbgs() << "getExpr(): (it->getOperand(0) = "
				<< * (it->getOperand(0)) << ")\n");
				*/
				LLVM_DEBUG(
				dbgs() << "getExpr(): str0 empty (or so) --> calling getExpr(op0)\n");
				LLVM_DEBUG(dbgs() << " (getExpr(): current it = " << *it << ").\n");

				// strcpy(res, tmp);
				res += getExpr(op0);
				} else { // str0 is NOT empty
				/*
				if (getExprForTripCount == false) {
				LLVM_DEBUG(dbgs() << "getExpr(): returning str0 = "
				<< str0 << "\n");
				//res.assign(str0);
				// Gives <<warning: cast from type 'const char*' to type 'char*' casts
				// away qualifiers>>
				// * (char *)strchr(str0, '.') = 0;

				// Important-TODO: this
				// transformation I guess is NOT 100% safe, because a named var
				// can be a C var or an auxiliary LLVM var created in the LLVM pass
				// - think how to make it safe

				if (startsWith(str0, STR_VEC_IND) == false) {
				// We don't modify str0 - otherwise we get errors
				//(assertion failures, etc) for modifying the LLVM variable names
				strCopy.assign(str0);

				// We might have a newly created temp LLVM var and keep the original
				// (source file) variable name
				strCopy = rStripStringAfterChar(strCopy, '.');
				res += strCopy;

				// Maybe put here operation pretty-print TODO
				}
				else {
				// vec.ind is the widened induction variable
				// res += str0;
				}
				}
				else
				*/
				{ // getExprForTripCount == true and str0 not empty
				/*
				// This SOMETIMES introduces infinite cycles, which can be avoided
				// if we keep track of the instructions already visited
				Example of cycle:
				- these 2 simple instructions:
				%indvars.iv29 = phi i64 [ 0, %for.body.preheader ],
				[ %indvars.iv.next30, %for.cond.loopexit ].
				%indvars.iv.next30 = add nuw nsw i64 %indvars.iv29, 1, !dbg !9
				*/

				if ((it->getOpcode() == Instruction::GetElementPtr) &&
				(it->getNumOperands() >= 3)) {
				// getName() is a method of Value, so no cast to Instruction is needed
				res += op0->getName().data();
				} else {
				LLVM_DEBUG(
				dbgs() << "getExpr(): str0 not empty --> calling getExpr(op0)\n");
				// This introduces useless parentheses: res += "(";
				res += getExpr(op0);
				// This introduces useless parentheses: res += ")";
				}
				}
				}

				// We now pretty print operation associated to *it;

				/* We generate C code for the operation associated to the it
				LLVM instruction.
				See http://llvm.org/docs/doxygen/html/Instruction_8cpp_source.html
				for all/various possible opcodes - see method
				00194 const char *Instruction::getOpcodeName(unsigned OpCode). */
				// Note: vec.ind is a PHI node
				// if (startsWith(str0, STR_VEC_IND) == false)
				// if (!(getExprForTripCount == false && strcmp(str0, "vec.ind") == 0))
				switch (it->getOpcode()) {
				case Instruction::Call: {
				// Important-TODO: this works well for the case 31c_dotprod_RaduH,
				// BUT not sure if it's general
				res = "((int *)&(";
				res += iGetNameData;
				res += "))";

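				// getCalledFunction() returns nullptr for indirect calls
				Function *calledFunc = llvm::cast<CallInst>(it)->getCalledFunction();
				assert(calledFunc != nullptr && "indirect calls are NOT handled here");
				std::string strFuncName = calledFunc->getName().str();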
				assert(strFuncName == "malloc" || strFuncName == "calloc");

				// Inspired from llvm.org/docs/ProgrammersManual.html
				// #iterating-over-def-use-use-def-chains
				for (Value::user_iterator i = it->user_begin(), e = it->user_end(); i != e;
				++i) {
				if (Instruction *inst = dyn_cast<Instruction>(*i)) {
				LLVM_DEBUG(dbgs() << "getExpr(): it is used in instruction: " << *inst
				<< "\n");
				if (BitCastInst *bci = dyn_cast<BitCastInst>(*i)) {
				if (!bci->getName().empty()) {
				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): it is used in BitCast instruction --> we use "
				"its name instead\n");
				res = "((int *)&(";
				res += bci->getName().data();
				res += "))";
				} else {
				if (ranGetAllMetadata == false) {
				LLVM_DEBUG(dbgs() << "getExpr(): Before, varNameMap.size() = "
				<< varNameMap.size() << "\n");
				getAllMetadata(bci->getParent()->getParent());
				LLVM_DEBUG(dbgs() << "getExpr(): varNameMap.size() = "
				<< varNameMap.size() << "\n");
				}

				std::string valueName = getLLVMValueName(bci);

				// Normally the value name is a number when getName() is empty
				LLVM_DEBUG(dbgs() << "getExpr(): bci has empty name\n");
				LLVM_DEBUG(dbgs() << "getExpr(): bci = " << *bci << "\n");
				LLVM_DEBUG(dbgs() << " bci = " << bci << "\n");
				LLVM_DEBUG(dbgs() << " bci->getValueName() = "
				<< bci->getValueName() << "\n");
				LLVM_DEBUG(dbgs() << " bci->getName() = "
				<< bci->getName() << "\n");
				LLVM_DEBUG(dbgs() << "getExpr(): it = " << *it << "\n");
				//
				LLVM_DEBUG(dbgs() << "getExpr(): varNameMap[bci] = "
				<< varNameMap[valueName] << "\n");

				// res = varNameMap[valTypeAndName];
				res = varNameMap[valueName];

				goto GetExpr_end;

				/*
				for (Value::user_iterator i2 = bci->user_begin(),
				e2 = bci->user_end();
				i2 != e2; ++i2) {
				if (Instruction *inst2 = dyn_cast<Instruction>(*i2)) {
				LLVM_DEBUG(dbgs() << "getExpr(): bci is used in instruction: "
				<< *inst2 << "\n");
				if (StoreInst *si = dyn_cast<StoreInst>(*i2)) {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): bci is used in StoreInst instruction "
				"--> we use its name instead\n");
				res = "((int *)&(";
				res += si->getName().data();
				res += "))";
				goto GetExpr_end;
				}
				}
				}
				*/
				}
				}
				} else {
				LLVM_DEBUG(dbgs() << "getExpr(): it is used in val: " << *i << "\n");
				}
				}

				goto GetExpr_end;
				}
				case Instruction::Add:
				case Instruction::FAdd:
				res += " + ";
				break;
				case Instruction::Sub:
				res += " - ";
				break;
				// case Instruction::FSub:
				case Instruction::Mul: {
				res += " * ";
				break;
				}
				// case Instruction::FMul:
				case Instruction::UDiv:
				case Instruction::SDiv:
				case Instruction::FDiv:
				res += " / ";
				break;
				case Instruction::URem:
				case Instruction::SRem:
				// case Instruction::FRem:
				res += " % ";
				break;
				case Instruction::Shl:
				res += " << ";
				break;
				case Instruction::LShr:
				res += " >> ";
				break;
				// Important-TODO: think better
				case Instruction::AShr:
				/* From https://en.wikipedia.org/wiki/Arithmetic_shift#cite_ref-1 :
				"The >> operator in C and C++ is
				not necessarily an arithmetic shift. Usually it is only an
				arithmetic shift if used with a signed integer type on its
				left-hand side.
				If it is used on an unsigned integer type instead, it will be a
				logical shift."
				*/
				res += " >> ";
				break;
				case Instruction::And:
				res += " & ";
				break;
				case Instruction::Or:
				res += " | ";
				break;
				case Instruction::Xor:
				res += " ^ ";
				break;
				case Instruction::PHI:
				res += " phi ";
				break;
				case Instruction::Load:
				// res += " load ";
				res += "[0]";
				break;
				case Instruction::Store:
				res += " store ";
				break;
				case Instruction::GetElementPtr:
				// res += " getelementptr ";
				/*
				if (it->getNumOperands() < 3) {
				res += " + ";
				}
				*/
				break;
				case Instruction::ZExt:
				case Instruction::SExt:
				// res += " ext "; // Note: this is unary operator
				break;
				// case Instruction::FPTrunc:
				case Instruction::Trunc: {
				// res += " trunc ";
				break;
				}
				case Instruction::ICmp:
				case Instruction::FCmp: {
				// Check type of cmp
				CmpInst *Cmp = dyn_cast<CmpInst>(it);
				int pred = Cmp->getPredicate();
				res += getPredicateString(pred);

				break;
				}
				case Instruction::Select: {
				// TODO: add : and 3rd operand
				res += " ? ";
				break;
				}
				case Instruction::ShuffleVector: {
				// res += " shufflevector ";
				break;
				}
				case Instruction::InsertElement: {
				// res += " insertelement ";
				break;
				}
				case Instruction::ExtractElement: {
				// res += " extractelement ";

				LLVM_DEBUG(dbgs() << "getExpr(): case Instruction::ExtractElement.\n");
				LLVM_DEBUG(dbgs() << "getExpr(): res = " << res << "\n");

				std::string op1Expr = getExpr(op1);
				// ((Instruction *)op1)->getName().data();

				// std::string op0Expr = getExpr(op0);

				if (op1Expr == "0") {
				LLVM_DEBUG(
				dbgs()
				<< "getExpr(): Neutralizing ExtractElement, since index is 0\n");
				if (putParantheses)
				res += ")";

				goto GetExpr_end;
				}

				// TODO: check that op0 is vec.ind or sext vec.ind
				res = "((int *)&" + res;
				// res = "((int *)&" + op0Expr;
				res += "))"; // One ')' for the '(' added at beginning getExpr,
				// 1 to close the '(' before '&'
				res += "[";
				res += op1Expr;
				res += "]";

				// basePtr = nullptr;

				goto GetExpr_end;
				// break;
				}
				// See e.g. http://llvm.org/docs/doxygen/html/Instructions_8h_source.html
				case Instruction::PtrToInt:
				case Instruction::IntToPtr: {
				/* This is normally encountered when using the LLVM-SRA library and
				I give SCEVRangeBuilder->getUpperBound(AccessFunction) */
				// We don't do a thing
				break;
				}
				case Instruction::Alloca: {
				// res += "(int *)&(";
				// TODO: this works well for the case 31c_dotprod_RaduH
				res = "((int *)&(";
				res += iGetNameData;
				res += "))";
				goto GetExpr_end;
				// break;
				}
				default:
				assert(0 && "NOT implemented");
				} // end switch

				/*
				if (it->getOpcode() == Instruction::PHI) {
				// TODO: check that op0 is associated to predecessor BB
				// different than itself - e.g., preheader, vector.ph, etc

				// This results in incorrect parentheses - missing a few ')'
				goto GetExpr_end;
				}
				*/

				// Pretty print op1:

				/*
				if ((it->getNumOperands() > 1) &&
				(it->getOpcode() != Instruction::PHI))
				*/
				if (it->getNumOperands() > 1) {
				// strcat(res, " ");
				res += " ";

				bool specialCase = false;
				bool str1NotEmpty = (str1.empty() == false);

				if (str1NotEmpty) {
				LLVM_DEBUG(dbgs() << "getExpr(): str1 NOT empty: str1 = " << str1
				<< "\n");

				/* Important Note: some operands have names and are also
				instructions */

				/*
				The following can also introduce cycles:
				- an example
				getExpr(): str0 empty (or so) --> calling getExpr(op0)
				(getExpr(): current it = %vec.ind = phi <32 x i64> [ <i64 0,
				i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9,
				i64 10, i64 11, i64 12, i64 13, i64 14, i64 15, i64 16, i64 17,
				i64 18, i64 19, i64 20, i64 21, i64 22, i64 23, i64 24, i64 25,
				i64 26, i64 27, i64 28, i64 29, i64 30, i64 31>, %vector.ph ],
				[ %step.add, %vector.body ]).
				getExpr(): getExprForTripCount = 1
				getExpr(): it = <32 x i64> <i64 0, i64 1, i64 2, i64 3,
				i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10,
				i64 11, i64 12, i64 13, i64 14, i64 15, i64 16,
				i64 17, i64 18, i64 19, i64 20, i64 21, i64 22,
				i64 23, i64 24, i64 25, i64 26, i64 27, i64 28,
				i64 29, i64 30, i64 31>
				(getExpr(): it->getOpcodeName() = <Invalid operator> )
				(getExpr(): it->getName() = )
				(getExpr(): it->printAsOperand() == <i64 0, i64 1, i64 2, i64 3,
				i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11,
				i64 12, i64 13, i64 14, i64 15, i64 16, i64 17, i64 18, i64 19,
				i64 20, i64 21, i64 22, i64 23, i64 24, i64 25, i64 26, i64 27,
				i64 28, i64 29, i64 30, i64 31>)
				getExpr(): calling getExpr(op1).
				getExpr(): it = %vec.ind = phi <32 x i64> [ <i64 0, i64 1,
				i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10,
				i64 11, i64 12, i64 13, i64 14, i64 15, i64 16, i64 17, i64 18,
				i64 19, i64 20, i64 21, i64 22, i64 23, i64 24, i64 25, i64 26,
				i64 27, i64 28, i64 29, i64 30, i64 31>, %vector.ph ],
				[ %step.add, %vector.body ].
				getExpr(): getExprForTripCount = 1
				getExpr(): it = %step.add = add <32 x i64> %vec.ind,
				<i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32>, !dbg !38

				- another example:
				getExpr(): it = %row.020.us = phi i64 [ %inc16.us,
				%for.cond1.for.cond.cleanup3_crit_edge.us ],
				[ 0, %for.cond1.preheader.us.preheader ]
				(getExpr(): it->getOpcodeName() = phi)
				(getExpr(): it->getName() = row.020.us)
				getExpr(): it->getOperand(1) = i64 0; str1 = [END]
				getExpr(): getExprForTripCount = 1
				getExpr(): it = %inc16.us = add nuw nsw i64 %row.020.us, 1, !dbg !58
				(getExpr(): it->getOpcodeName() = add)
				(getExpr(): it->getName() = inc16.us)
				getExpr(): it->getOperand(1) = i64 1; str1 = [END]
				getExpr(): getExprForTripCount = 1
				getExpr(): it = %row.020.us = phi i64
				[ %inc16.us, %for.cond1.for.cond.cleanup3_crit_edge.us ],
				[ 0, %for.cond1.preheader.us.preheader ]
				*/
				// if (strcmp(str1, "broadcast.splat") == 0)
				if (((Instruction *)op1)->getNumOperands() != 0 &&
				// This prevents pretty-printing constant vectors, etc
				!(startsWith(iGetNameData, STR_VEC_IND) &&
				startsWith(str1, STR_STEP_ADD))) {
				// res += getExpr(op1);

				// We defer pretty printing below - see immediately below
				/*
				LLVM_DEBUG(dbgs() << "getExpr(): calling getExpr(op1).\n");
				LLVM_DEBUG(dbgs() << " getExpr(): it = " << *it << ".\n");
				res += getExpr(op1);
				*/
				} else {
				res += str1;
				specialCase = true;
				}
				} // End str1 NOT empty

				if (specialCase == false) {
				LLVM_DEBUG(dbgs() << "getExpr(): specialCase = false, "
				<< "str1NotEmpty = " << str1NotEmpty << ".\n");
				LLVM_DEBUG(dbgs() << " getExpr(): it = " << *it << ".\n");

				if (it->getOpcode() == Instruction::GetElementPtr) {
				int numOpnds = it->getNumOperands();
				GetElementPtrInst *GEPPtr = llvm::dyn_cast<GetElementPtrInst>(it);
				int startIndex = getIndexFirstOpndFromGEPInst(GEPPtr);

				for (int i = startIndex; i < numOpnds; i++) {
				res += "[";

				Value *op_i = it->getOperand(i);
				res += getExpr(op_i);

				res += "]";
				}
				} else {
				// strcat(res, getExpr(op1));
				res += getExpr(op1);
				}
				}
				} // END if (it->getNumOperands() > 1)

				// Important TODO: also handle Phi, which can have an arbitrary number of operands:
				// if (it->getOpcode() == Instruction::Phi)
				if (it->getOpcode() == Instruction::Select) {
				// ConstantExpr: || isSelectConstantExpr)
				res += " : ";

				Value *op2;
				/*
				// ConstantExpr:
				LLVM_DEBUG(dbgs()
				<< "getExpr(): isSelectConstantExpr = "
				<< isSelectConstantExpr);

				if (isSelectConstantExpr) {
				op2 = CE->getOperand(2);
				}
				else {
				*/
				op2 = it->getOperand(2);
				//}
				LLVM_DEBUG(dbgs() << "getExpr(): op2 = " << op2);
				res += getExpr(op2);
				}

				if (putParantheses)
				res += ")";

				GetExpr_end:
				/*
				// Don't really understand why it fails at compile-time at make_pair
				// std::unordered_map<Value *, std::string> cacheExpr;
				typedef Value *ValuePtr;
				//cacheExpr.insert(std::make_pair<Value *, std::string>(it, res));
				cacheExpr.insert(std::make_pair<InstructionPtr, std::string>(it, res));
				But this does NOT fail:
				// Inspired from example
				// http://www.cplusplus.com/reference/utility/make_pair/
				std::pair<Value *, std::string> tmp;
				tmp = std::make_pair(it, res);
				cacheExpr.insert(tmp);
				*/
				/*
				if ((res.size() == 2) && (res.c_str()[0] == '(') &&
				(res.c_str()[1] == ')'))
				*/
				if (res == "()") {
				// This is redundant so we drop it.
				res.clear();
				}

				LLVM_DEBUG(dbgs() << "getExpr(): Inserting in cacheExpr aVal = " << aVal;
				if (aVal != nullptr) dbgs()
				<< " (aVal = " << aVal << ") and res = " << res;
				dbgs() << "\n");
				cacheExpr[aVal] = res;

				return res;
				} // end getExpr()

				} // end namespace

				#endif // RECOVER_FROM_LLVM_IR
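
The commented-out note near the end of getExpr() asks why `std::make_pair<Value *, std::string>(it, res)` fails to compile. Since C++11, `std::make_pair` takes forwarding references, so spelling out the template arguments turns both parameters into rvalue references, and the lvalues `it` and `res` cannot bind to them. Below is a minimal sketch of two forms that do compile. It uses a stand-in `Value` struct rather than `llvm::Value`, and the names `ExprCache`, `cacheByMakePair`, and `cacheByEmplace` are hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for llvm::Value (assumption for illustration); only its address
// is used as a map key here.
struct Value {};

using ExprCache = std::unordered_map<Value *, std::string>;

// Let make_pair deduce its template arguments, so lvalues are accepted.
inline void cacheByMakePair(ExprCache &cache, Value *it,
                            const std::string &res) {
  // std::make_pair<Value *, std::string>(it, res) would NOT compile:
  // the explicit arguments force parameter types Value *&& and
  // std::string &&, which cannot bind to the lvalues it and res.
  cache.insert(std::make_pair(it, res));
}

// emplace constructs the pair in place and avoids make_pair entirely.
inline void cacheByEmplace(ExprCache &cache, Value *it,
                           const std::string &res) {
  cache.emplace(it, res);
}
```

The `cacheExpr[aVal] = res;` assignment that getExpr() actually uses sidesteps the problem in the same way, since it never names the pair type.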

llvm/lib/Target/Connex/Select_ABSi32_OpincaaCodeGen.h

This file was added.

				//===-- Select_ABSi32_OpincaaCodeGen.h --------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel abs.i32.
				// You can put this code in the Select() method of the [Target]SelectionDAGISel
				// class of your back end (or MAYBE in the
				// ISelLowering pass).
				// Number of instructions generated: 24.
				//
				//===----------------------------------------------------------------------===//

				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_i32/ABSi32_manual/DumpISel_OpincaaCodeGen_old006_030.cpp


				SDValue ct0 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				// Alex: SDValue(nodeOpSrcCast1, 1)
				SDValue(nodeOpSrcCast, 1) // Alex
				);

				SDValue ct1 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;// Instr #1
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				// R14 = INDEX;// Instr #2
				SDNode *ldix0 = CrtDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// CELL_SHR(R27, R30);// Instr #3
				SDNode *cellshr0 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				// Alex: SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast, 0), // Alex
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #4
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R19 = SHIFT_REG;// Instr #5
				SDNode *ldsh0 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				// R20 = R27 < R31;// Instr #6
				SDNode *lt0 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// Alex: SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast, 0), // Alex
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				// R13 = R14 & R30;// Instr #7
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(lt0, 1)
				);

				// R12 = R13 == R30;// Instr #8
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				// R20 = R20 & R12;// Instr #9
				SDNode *and1 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(lt0, 0),
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// R20 = R20 == R30;// Instr #10
				SDNode *eq1 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and1, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and1, 1)
				);

				SDValue ct3 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #11
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(eq1, 1)
				);

				// WHERE_EQ;// Instr #12
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R19 = R31 - R19;// Instr #13
				SDNode *sub0 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload0, 0),
				SDValue(ldsh0, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				}
				);

				// R27 = SUBC(R31, R27);// Instr #14
				SDNode *subc0 = CrtDAG->getMachineNode(
				Connex::SUBCV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload0, 0),
				// Alex: SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast, 0), // Alex
				SDValue(sub0, 0),
				// Alex: SDValue(nodeOpSrcCast1, 0)
				SDValue(nodeOpSrcCast, 0) // Alex
				// no need for glue or chain input (since it normally consumes the output of the predecessor)
				}
				);

				// END_WHERE;// Instr #15
				SDNode *endwhere0 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(subc0, 1)
				);

				// CELL_SHL(R19, R30);// Instr #16
				SDNode *cellshl0 = CrtDAG->getMachineNode(
				Connex::CELLSHL_H,
				DL,
				MVT::Glue,
				SDValue(sub0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				SDValue ct4 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #17
				SDNode *nop2 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshl0, 0)
				);

				// R19 = SHIFT_REG;// Instr #18
				SDNode *ldsh1 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R12 = R13 == R31;// Instr #19
				SDNode *eq2 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(ldsh1, 1)
				);

				SDValue ct5 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #20
				SDNode *nop3 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(eq2, 1)
				);

				// WHERE_EQ;// Instr #21
				SDNode *whereeq1 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R27 = R19 | R19;// Instr #22
				SDNode *resH /*or0*/ = CrtDAG->getMachineNode(
				Connex::ORV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(ldsh1, 0),
				SDValue(ldsh1, 0),
				SDValue(subc0, 0),
				// glue (or chain) input edge
				SDValue(whereeq1, 0)
				}
				);

				// END_WHERE;// Instr #23
				SDNode *lastNode /* endwhere1 */ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// Alex: MVT::Glue,
				MVT::Other, // Alex
				// glue (or chain) input edge
				// Alex: SDValue(or0, 1)
				SDValue(resH, 1)
				);

llvm/lib/Target/Connex/Select_ADDf16_OpincaaCodeGen.h

This file was added.

This file has a very large number of changes (3,555 lines).

llvm/lib/Target/Connex/Select_ADDi32_OpincaaCodeGen.h

This file was added.

				//===-- Select_ADDi32_OpincaaCodeGen.h - Connex specific TTI ---------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel add.i32.
				// You should include this code in the Select() method of the [Target]SelectionDAGISel
				// class of your back end (or MAYBE in the ISelLowering pass).
				// Number of instructions generated: 15.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_i32/ADDi32_manual/DumpISel_OpincaaCodeGen_old09_060.cpp


				SDValue ct0 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast2, 1)
				);

				SDValue ct1 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;// Instr #1
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				// R29 = R27 + R28;// Instr #2
				SDNode *add0 = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// R23 = ADDC(R31, R31);// Instr #3
				SDNode *addc0 = CrtDAG->getMachineNode(
				Connex::ADDCV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(vload0, 0),
				SDValue(add0, 0)
				// no need for glue or chain input (since it normally consumes the output of the predecessor)
				);

				// R26 = INDEX;// Instr #4
				SDNode *ldix0 = CrtDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(addc0, 1)
				);

				// R25 = R26 & R30;// Instr #5
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R24 = R25 == R30;// Instr #6
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #7
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;// Instr #8
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct3 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R23 = 0;// Instr #9
				SDNode *vload2 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				SDValue(addc0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				);

				// END_WHERE;// Instr #10
				SDNode *endwhere0 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// CELL_SHR(R23, R30);// Instr #11
				SDNode *cellshr0 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(vload2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				SDValue ct4 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #12
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R21 = SHIFT_REG;// Instr #13
				SDNode *ldsh0 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R22 = R21 + R29;// Instr #14
				SDNode *resH /*add1*/ = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(add0, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				SDNode *lastNode = resH;
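
Read against its register comments, the add.i32 kernel above appears to perform a carry-propagating add across pairs of 16-bit lanes. The sketch below models that in plain scalar C++; it is not the generated code. It assumes (inferred from the instruction comments, not stated in the patch) that each 32-bit value occupies two adjacent lanes with the low half in the even lane, that `ADDC(R31, R31)` yields the carry of the preceding add, and that `CELL_SHR` by 1 moves lane i's value to lane i+1. The names `addI32Lanes`, `splitI32`, and `joinI32` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::uint16_t;
using std::uint32_t;

// Scalar model of the add.i32 kernel under the assumptions stated above.
inline std::vector<uint16_t> addI32Lanes(const std::vector<uint16_t> &a,
                                         const std::vector<uint16_t> &b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  const std::size_t n = a.size();
  std::vector<uint16_t> sum(n), carry(n), shifted(n, 0), res(n);

  for (std::size_t i = 0; i < n; ++i) {
    const uint32_t wide = uint32_t(a[i]) + uint32_t(b[i]);
    sum[i] = uint16_t(wide);         // R29 = R27 + R28
    carry[i] = uint16_t(wide >> 16); // R23 = ADDC(R31, R31)
    if ((i & 1) == 1)                // WHERE_EQ on odd lanes: R23 = 0
      carry[i] = 0;
  }
  for (std::size_t i = 1; i < n; ++i)
    shifted[i] = carry[i - 1];       // CELL_SHR(R23, R30); R21 = SHIFT_REG
  for (std::size_t i = 0; i < n; ++i)
    res[i] = uint16_t(sum[i] + shifted[i]); // R22 = R21 + R29
  return res;
}

// Split 32-bit values into (low, high) 16-bit lane pairs.
inline std::vector<uint16_t> splitI32(const std::vector<uint32_t> &v) {
  std::vector<uint16_t> lanes;
  lanes.reserve(v.size() * 2);
  for (uint32_t x : v) {
    lanes.push_back(uint16_t(x & 0xFFFFu)); // even lane: low half
    lanes.push_back(uint16_t(x >> 16));     // odd lane: high half
  }
  return lanes;
}

// Rebuild 32-bit values from (low, high) lane pairs.
inline std::vector<uint32_t> joinI32(const std::vector<uint16_t> &lanes) {
  assert(lanes.size() % 2 == 0);
  std::vector<uint32_t> v;
  for (std::size_t i = 0; i < lanes.size(); i += 2)
    v.push_back(uint32_t(lanes[i]) | (uint32_t(lanes[i + 1]) << 16));
  return v;
}
```

Clearing the carry in odd lanes before the shift is what keeps one 32-bit element's overflow from leaking into the low half of its neighbour.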

llvm/lib/Target/Connex/Select_DIVf16_OpincaaCodeGen.h

This file was added.

This file has a very large number of changes (2,909 lines).

llvm/lib/Target/Connex/Select_DIVi16_OpincaaCodeGen.h

This file was added.

This file has a very large number of changes (5,741 lines).

llvm/lib/Target/Connex/Select_LTf16_OpincaaCodeGen.h

This file was added.

				//===-- Select_LTf16_OpincaaCodeGen.h --------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel lt.f16.
				// You should include this code in the Select() method of the [Target]SelectionDAGISel
				// class of your back end (or MAYBE in the ISelLowering pass).
				// Number of instructions generated: 53.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_f16/LTf16_manual/DumpISel_OpincaaCodeGen_old07_061.cpp



				SDValue ct0 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast1, 1)
				);

				SDValue ct1 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;// Instr #1
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(5, DL, MVT::i16, true, false);
				// R29 = 5;// Instr #2
				SDNode *vload2 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				SDValue ct3 = CrtDAG->getConstant(1023, DL, MVT::i16, true, false);
				// R13 = 1023;// Instr #3
				SDNode *vload3 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				SDValue ct4 = CrtDAG->getConstant(31744, DL, MVT::i16, true, false);
				// R12 = 31744;// Instr #4
				SDNode *vload4 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(vload3, 1)
				);

				SDValue ct5 = CrtDAG->getConstant(-32768, DL, MVT::i16, true, false);
				// R11 = -32768;// Instr #5
				SDNode *vload5 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(vload4, 1)
				);

				SDValue ct6 = CrtDAG->getConstant(1024, DL, MVT::i16, true, false);
				// R10 = 1024;// Instr #6
				SDNode *vload6 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct6,
				// glue (or chain) input edge
				SDValue(vload5, 1)
				);

				SDValue ct7 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R19 = 0;// Instr #7
				SDNode *vload7 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct7,
				// glue (or chain) input edge
				SDValue(vload6, 1)
				);

				SDValue ct8 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R14 = 1;// Instr #8
				SDNode *vload8 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct8,
				// glue (or chain) input edge
				SDValue(vload7, 1)
				);

				// R25 = R27 & R12;// Instr #9
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload4, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(vload8, 1)
				);

				// R26 = R27 & R13;// Instr #10
				SDNode *and1 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				// R21 = R23 & R12;// Instr #11
				SDNode *and2 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload4, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(and1, 1)
				);

				// R22 = R23 & R13;// Instr #12
				SDNode *and3 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(and2, 1)
				);

				// R17 = POPCNT(R25);// Instr #13
				SDNode *popcnt0 = CrtDAG->getMachineNode(
				Connex::POPCNT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				// glue (or chain) input edge
				SDValue(and3, 1)
				);

				// R17 = R17 == R29;// Instr #14
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(popcnt0, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(popcnt0, 1)
				);

				// R18 = R26 == R31;// Instr #15
				SDNode *eq1 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and1, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// R18 = R30 - R18;// Instr #16
				SDNode *sub0 = CrtDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(eq1, 0),
				// glue (or chain) input edge
				SDValue(eq1, 1)
				);

				// R18 = R18 & R17;// Instr #17
				SDNode *and4 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(sub0, 0),
				// glue (or chain) input edge
				SDValue(sub0, 1)
				);

				// R18 = R18 == R30;// Instr #18
				SDNode *eq2 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and4, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and4, 1)
				);

				SDValue ct9 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #19
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct9,
				// glue (or chain) input edge
				SDValue(eq2, 1)
				);

				// WHERE_EQ;// Instr #20
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct10 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R14 = 0;// Instr #21
				SDNode *vload9 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct10,
				SDValue(vload8, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				);

				// END_WHERE;// Instr #22
				SDNode *endwhere0 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload9, 1)
				);

				// R15 = POPCNT(R21);// Instr #23
				SDNode *popcnt1 = CrtDAG->getMachineNode(
				Connex::POPCNT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and2, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// R15 = R15 == R29;// Instr #24
				SDNode *eq3 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(popcnt1, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(popcnt1, 1)
				);

				// R16 = R22 == R31;// Instr #25
				SDNode *eq4 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and3, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(eq3, 1)
				);

				// R16 = R30 - R16;// Instr #26
				SDNode *sub1 = CrtDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(eq4, 0),
				// glue (or chain) input edge
				SDValue(eq4, 1)
				);

				// R16 = R16 & R15;// Instr #27
				SDNode *and5 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq3, 0),
				SDValue(sub1, 0),
				// glue (or chain) input edge
				SDValue(sub1, 1)
				);

				// R16 = R16 == R30;// Instr #28
				SDNode *eq5 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and5, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and5, 1)
				);

				SDValue ct11 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #29
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct11,
				// glue (or chain) input edge
				SDValue(eq5, 1)
				);

				// WHERE_EQ;// Instr #30
				SDNode *whereeq1 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				SDValue ct12 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R14 = 0;// Instr #31
				SDNode *vload10 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct12,
				SDValue(vload9, 0),
				// glue (or chain) input edge
				SDValue(whereeq1, 0)
				);

				// END_WHERE;// Instr #32
				SDNode *endwhere1 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload10, 1)
				);

				// R16 = R27 == R23;// Instr #33
				SDNode *eq6 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(endwhere1, 0)
				);

				// R14 = R14 ^ R16;// Instr #34
				SDNode *xor0 = CrtDAG->getMachineNode(
				Connex::XORV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq6, 0),
				SDValue(vload10, 0),
				// glue (or chain) input edge
				SDValue(eq6, 1)
				);

				// R16 = R27 & R23;// Instr #35
				SDNode *and6 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(xor0, 1)
				);

				// R16 = R16 & R11;// Instr #36
				SDNode *and7 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload5, 0),
				SDValue(and6, 0),
				// glue (or chain) input edge
				SDValue(and6, 1)
				);

				// R16 = R16 == R11;// Instr #37
				SDNode *eq7 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and7, 0),
				SDValue(vload5, 0),
				// glue (or chain) input edge
				SDValue(and7, 1)
				);

				// R16 = R16 & R14;// Instr #38
				SDNode *and8 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(xor0, 0),
				SDValue(eq7, 0),
				// glue (or chain) input edge
				SDValue(eq7, 1)
				);

				// R16 = R16 == R30;// Instr #39
				SDNode *eq8 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and8, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and8, 1)
				);

				SDValue ct13 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #40
				SDNode *nop2 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct13,
				// glue (or chain) input edge
				SDValue(eq8, 1)
				);

				// WHERE_EQ;// Instr #41
				SDNode *whereeq2 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R27 = R27 ^ R11;// Instr #42
				SDNode *xor1 = CrtDAG->getMachineNode(
				Connex::XORV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload5, 0),
				SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(whereeq2, 0)
				}
				);

				// R23 = R23 ^ R11;// Instr #43
				SDNode *xor2 = CrtDAG->getMachineNode(
				Connex::XORV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload5, 0),
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(xor1, 1)
				}
				);

				SDValue ct14 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R19 = 1;// Instr #44
				SDNode *vload11 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct14,
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(xor2, 1)
				);

				// END_WHERE;// Instr #45
				SDNode *endwhere2 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload11, 1)
				);

				// R16 = R27 < R23;// Instr #46
				SDNode *lt0 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(xor1, 0),
				SDValue(xor2, 0),
				// glue (or chain) input edge
				SDValue(endwhere2, 0)
				);

				// R16 = R16 & R14;// Instr #47
				SDNode *and9 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(xor0, 0),
				SDValue(lt0, 0),
				// glue (or chain) input edge
				SDValue(lt0, 1)
				);

				// R16 = R16 == R30;// Instr #48
				SDNode *eq9 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and9, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and9, 1)
				);

				SDValue ct15 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #49
				SDNode *nop3 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct15,
				// glue (or chain) input edge
				SDValue(eq9, 1)
				);

				// WHERE_EQ;// Instr #50
				SDNode *whereeq3 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R19 = R19 ^ R30;// Instr #51
				SDNode *resF16 /*xor3*/ = CrtDAG->getMachineNode(
				Connex::XORV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload1, 0),
				SDValue(vload11, 0),
				SDValue(vload11, 0),
				// glue (or chain) input edge
				SDValue(whereeq3, 0)
				}
				);

				// END_WHERE;// Instr #52
				SDNode *lastNode /*endwhere3*/ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// Alex: MVT::Glue,
				MVT::Other,
				// glue (or chain) input edge
				// Alex: SDValue(xor3, 1)
				SDValue(resF16, 1)
				);
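
The constants loaded at the start of the lt.f16 kernel (1023, 31744, -32768, 5) match the IEEE 754 binary16 field layout: mantissa mask, exponent mask, sign bit, and exponent width. Instructions #9 to #18 then look like a NaN test on the first operand (exponent bits all set, checked via POPCNT == 5, and mantissa non-zero), and #35 to #37 like a "both operands negative" test. The scalar sketch below illustrates just those two predicates on raw bit patterns. It is one reading of the register comments, not the generated code, and the names `isNaNF16` and `bothNegativeF16` are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

using std::uint16_t;

// IEEE 754 binary16 field masks, matching the constants the kernel loads.
constexpr uint16_t kMantissaMask = 1023;    // 0x03FF, R13 in the kernel
constexpr uint16_t kExponentMask = 31744;   // 0x7C00, R12 in the kernel
constexpr uint16_t kSignMask = 0x8000;      // bit pattern of -32768, R11
constexpr std::size_t kExponentBits = 5;    // R29 in the kernel

// NaN: all five exponent bits set and a non-zero mantissa.
inline bool isNaNF16(uint16_t bits) {
  // POPCNT(bits & exponent mask) == 5 means every exponent bit is 1.
  const bool expAllOnes =
      std::bitset<16>(bits & kExponentMask).count() == kExponentBits;
  const bool mantissaNonZero = (bits & kMantissaMask) != 0;
  return expAllOnes && mantissaNonZero;
}

// True when the sign bit is set in both operands: ((a & b) & sign) == sign.
inline bool bothNegativeF16(uint16_t a, uint16_t b) {
  return ((a & b) & kSignMask) == kSignMask;
}
```

Using a population count instead of an equality test against the exponent mask gives the same answer here, since the masked value can only have bits inside the five-bit exponent field.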

llvm/lib/Target/Connex/Select_MULTf16_OpincaaCodeGen.h

This file was added.

This file has a very large number of changes (3,242 lines).

llvm/lib/Target/Connex/Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h

This file was added.

				//===-- Select_MULTi32_OpincaaCodeGen.h --------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel mul.i32.
				// You can put this code in the Select() method of the [Target]SelectionDAGISel
				// class of your back end (or MAYBE in the ISelLowering pass).
				// Number of instructions generated: 27.


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_i32/MULTi32_manual_Complemented_radix_216_representation/DumpISel_OpincaaCodeGen_old50_320.cpp


				SDValue ct0 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast2, 1)
				);

				SDValue ct1 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;// Instr #1
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				// MULT_U(R28, R27);// Instr #2
				SDNode *mult_u0 = CrtDAG->getMachineNode(
				Connex::MULT_U_H,
				DL,
				MVT::Glue,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// R26 = MULT_LOW();// Instr #3
				SDNode *multlo0 = CrtDAG->getMachineNode(
				Connex::MULTLO_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(mult_u0, 0)
				);

				// R25 = MULT_HIGH();// Instr #4
				SDNode *multhi0 = CrtDAG->getMachineNode(
				Connex::MULTHI_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(multlo0, 1)
				);

				// CELL_SHR(R27, R30);// Instr #5
				SDNode *cellshr0 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(multhi0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #6
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R24 = SHIFT_REG;// Instr #7
				SDNode *ldsh0 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				// MULT_U(R24, R28);// Instr #8
				SDNode *mult_u1 = CrtDAG->getMachineNode(
				Connex::MULT_U_H,
				DL,
				MVT::Glue,
				SDValue(ldsh0, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				// R24 = MULT_LOW();// Instr #9
				SDNode *multlo1 = CrtDAG->getMachineNode(
				Connex::MULTLO_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(mult_u1, 0)
				);

				// CELL_SHR(R28, R30);// Instr #10
				SDNode *cellshr1 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(multlo1, 1)
				);

				SDValue ct3 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #11
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(cellshr1, 0)
				);

				// R23 = SHIFT_REG;// Instr #12
				SDNode *ldsh1 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// MULT_U(R23, R27);// Instr #13
				SDNode *mult_u2 = CrtDAG->getMachineNode(
				Connex::MULT_U_H,
				DL,
				MVT::Glue,
				SDValue(ldsh1, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(ldsh1, 1)
				);

				// R23 = MULT_LOW();// Instr #14
				SDNode *multlo2 = CrtDAG->getMachineNode(
				Connex::MULTLO_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(mult_u2, 0)
				);

				// CELL_SHR(R25, R30);// Instr #15
				SDNode *cellshr2 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(multhi0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(multlo2, 1)
				);

				SDValue ct4 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #16
				SDNode *nop2 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr2, 0)
				);

				// R21 = SHIFT_REG;// Instr #17
				SDNode *ldsh2 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R14 = INDEX;// Instr #18
				SDNode *ldix0 = CrtDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(ldsh2, 1)
				);

				// R13 = R14 & R30;// Instr #19
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R12 = R13 == R30;// Instr #20
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct5 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #21
				SDNode *nop3 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;// Instr #22
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R26 = R21 | R21;// Instr #23
				SDNode *or0 = CrtDAG->getMachineNode(
				Connex::ORV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(ldsh2, 0),
				SDValue(ldsh2, 0),
				SDValue(multlo0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				}
				);

				// R26 = R24 + R26;// Instr #24
				SDNode *add0 = CrtDAG->getMachineNode(
				Connex::ADDV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(or0, 0),
				SDValue(multlo1, 0),
				SDValue(or0, 0),
				// glue (or chain) input edge
				SDValue(or0, 1)
				}
				);

				// R26 = R23 + R26;// Instr #25
				SDNode *resH /*add1*/ = CrtDAG->getMachineNode(
				Connex::ADDV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(add0, 0),
				SDValue(multlo2, 0),
				SDValue(add0, 0),
				// glue (or chain) input edge
				SDValue(add0, 1)
				}
				);

				// END_WHERE;// Instr #26
				SDNode *lastNode /*endwhere0*/ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// Alex: MVT::Glue,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(resH /*add1*/, 1)
				);
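
The 27 generated instructions above appear to follow the schoolbook scheme for a 32-bit multiply built from 16-bit halves: one full 16x16 product of the low halves, two cross products of which only the low 16 bits are kept, and a final sum that lands in the high half. The scalar sketch below restates that arithmetic for a single element. The function name `mul32_from_halves` is hypothetical and not part of the patch, and the mapping of the low and high halves to neighbouring lanes (selected with `INDEX & 1` and moved with `CELL_SHR`) is an assumption read off the listing, not something the patch states.

```cpp
#include <cstdint>

// Hypothetical helper, not from the patch: computes (a * b) mod 2^32 using
// only 16 x 16 -> 32-bit unsigned multiplies, as a 16-bit lane would.
std::uint32_t mul32_from_halves(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t aLo = a & 0xFFFFu, aHi = a >> 16;
  const std::uint32_t bLo = b & 0xFFFFu, bHi = b >> 16;

  // Full product of the low halves (compare MULT_U, MULT_LOW, MULT_HIGH).
  const std::uint32_t loLo = aLo * bLo;

  // Cross products: only their low 16 bits can reach the 32-bit result
  // (compare the two MULT_U / MULT_LOW pairs fed by SHIFT_REG).
  const std::uint32_t cross1 = (aLo * bHi) & 0xFFFFu;
  const std::uint32_t cross2 = (aHi * bLo) & 0xFFFFu;

  // Low half is the low 16 bits of loLo; the high half sums the carry-out
  // of loLo with both cross terms, wrapping at 16 bits.
  const std::uint32_t resLo = loLo & 0xFFFFu;
  const std::uint32_t resHi = ((loLo >> 16) + cross1 + cross2) & 0xFFFFu;
  return (resHi << 16) | resLo;
}
```

Because the result is taken modulo 2^32, the same bit pattern is valid for signed and unsigned 32-bit operands, which is why unsigned 16-bit multiplies suffice.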

llvm/lib/Target/Connex/Select_REDf16_OpincaaCodeGen.h

This file was added.

				//===-- Select_REDf16_OpincaaCodeGen.h --------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel red.f16.
				// You should include this code in the Select() method of the [Target]SelectionDAGISel
				// class of your back end (or MAYBE in the ISelLowering pass).
				// Number of instructions generated: 122.
				//
				arsenm: Generated code should only come from tablegen and not be committed
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_f16/REDf16_manual/DumpISel_OpincaaCodeGen_old18_STD_853.cpp



				SDValue ct0 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R31 = 1;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast, 1)
				);

				SDValue ct1 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R30 = 0;// Instr #1
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(31, DL, MVT::i16, true, false);
				// R29 = 31;// Instr #2
				SDNode *vload2 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				SDValue ct3 = CrtDAG->getConstant(1023, DL, MVT::i16, true, false);
				// R13 = 1023;// Instr #3
				SDNode *vload3 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				SDValue ct4 = CrtDAG->getConstant(31744, DL, MVT::i16, true, false);
				// R12 = 31744;// Instr #4
				SDNode *vload4 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(vload3, 1)
				);

				SDValue ct5 = CrtDAG->getConstant(-32768, DL, MVT::i16, true, false);
				// R11 = -32768;// Instr #5
				SDNode *vload5 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(vload4, 1)
				);

				SDValue ct6 = CrtDAG->getConstant(1024, DL, MVT::i16, true, false);
				// R10 = 1024;// Instr #6
				SDNode *vload6 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct6,
				// glue (or chain) input edge
				SDValue(vload5, 1)
				);

				// R25 = R28 & R11;// Instr #7
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload5, 0),
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(vload6, 1)
				);

				// R26 = R28 & R12;// Instr #8
				SDNode *and1 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload4, 0),
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct7 = CrtDAG->getConstant(10, DL, MVT::i16, true, false);
				// R26 = R26 >> 10;// Instr #9
				SDNode *ishr0 = CrtDAG->getMachineNode(
				Connex::ISHRV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and1, 0),
				ct7,
				// glue (or chain) input edge
				SDValue(and1, 1)
				);

				// R27 = R28 & R13;// Instr #10
				SDNode *and2 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(ishr0, 1)
				);

				// R17 = R30 < R27;// Instr #11
				SDNode *lt0 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(and2, 0),
				// glue (or chain) input edge
				SDValue(and2, 1)
				);

				// R16 = R26 == R30;// Instr #12
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(ishr0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(lt0, 1)
				);

				// R09 = R16 & R17;// Instr #13
				SDNode *and3 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt0, 0),
				SDValue(eq0, 0),
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// R09 = R09 == R31;// Instr #14
				SDNode *eq1 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and3, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and3, 1)
				);

				SDValue ct8 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #15
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct8,
				// glue (or chain) input edge
				SDValue(eq1, 1)
				);

				// WHERE_EQ;// Instr #16
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct9 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R26 = 1;// Instr #17
				SDNode *vload7 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct9,
				SDValue(ishr0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				);

				// END_WHERE;// Instr #18
				SDNode *endwhere0 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload7, 1)
				);

				// R17 = R26 == R29;// Instr #19
				SDNode *eq2 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// R17 = R17 | R16;// Instr #20
				SDNode *or0 = CrtDAG->getMachineNode(
				Connex::ORV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(eq2, 0),
				// glue (or chain) input edge
				SDValue(eq2, 1)
				);

				// R17 = R17 == R30;// Instr #21
				SDNode *eq3 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(or0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(or0, 1)
				);

				SDValue ct10 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #22
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct10,
				// glue (or chain) input edge
				SDValue(eq3, 1)
				);

				// WHERE_EQ;// Instr #23
				SDNode *whereeq1 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R27 = R27 | R10;// Instr #24
				SDNode *or1 = CrtDAG->getMachineNode(
				Connex::ORV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload6, 0),
				SDValue(and2, 0),
				SDValue(and2, 0),
				// glue (or chain) input edge
				SDValue(whereeq1, 0)
				}
				);

				// END_WHERE;// Instr #25
				SDNode *endwhere1 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(or1, 1)
				);
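
Instr #7 to #25 above read as the usual unpacking of an IEEE 754 binary16 value into sign, biased exponent and significand: the masks -32768 (0x8000), 31744 (0x7C00) and 1023 (0x03FF) isolate the three fields, a nonzero subnormal gets its exponent forced to 1, and normal numbers get the hidden bit 1024 (0x0400) OR-ed into the significand. The scalar sketch below mirrors that logic for one element. `HalfFields` and `unpackHalf` are hypothetical names introduced only for illustration, and the `if` statements stand in for the `WHERE_EQ` / `END_WHERE` predicated regions.

```cpp
#include <cstdint>

// Hypothetical illustration type, not from the patch.
struct HalfFields {
  std::uint16_t sign;         // 0x8000 or 0 (mask -32768 in the listing)
  std::uint16_t exponent;     // biased exponent, 0..31
  std::uint16_t significand;  // 10 fraction bits, plus 0x0400 for normals
};

HalfFields unpackHalf(std::uint16_t bits) {
  HalfFields f;
  f.sign = bits & 0x8000u;               // Instr #7
  f.exponent = (bits & 0x7C00u) >> 10;   // Instr #8, #9
  f.significand = bits & 0x03FFu;        // Instr #10

  // Instr #11 to #18: a nonzero subnormal is treated as exponent 1.
  const bool expWasZero = (f.exponent == 0);
  if (expWasZero && f.significand > 0)
    f.exponent = 1;

  // Instr #19 to #25: add the hidden bit unless the exponent is 31
  // (infinity or NaN) or the original exponent field was zero.
  if (!(f.exponent == 31 || expWasZero))
    f.significand |= 0x0400u;
  return f;
}
```

For example, under these assumptions the bit pattern 0x3C00 (1.0) unpacks to exponent 15 with significand 0x0400, while 0x0001 (the smallest subnormal) unpacks to exponent 1 with significand 1 and no hidden bit.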

				// R18 = R26 == R29;// Instr #26
				SDNode *eq4 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(endwhere1, 0)
				);

				// R17 = R27 == R30;// Instr #27
				SDNode *eq5 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(or1, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(eq4, 1)
				);

				// R09 = R31 - R17;// Instr #28
				SDNode *sub0 = CrtDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(eq5, 0),
				// glue (or chain) input edge
				SDValue(eq5, 1)
				);

				// R09 = R09 & R18;// Instr #29
				SDNode *and4 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq4, 0),
				SDValue(sub0, 0),
				// glue (or chain) input edge
				SDValue(sub0, 1)
				);

				// RED(R09);// Instr #30
				SDNode *sumRed0 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(and4, 0),
				// glue (or chain) input edge
				SDValue(and4, 1)
				);

				// R24 = R18 & R17;// Instr #31
				SDNode *and5 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq5, 0),
				SDValue(eq4, 0),
				// glue (or chain) input edge
				SDValue(sumRed0, 0)
				);

				// R09 = R25 == R30;// Instr #32
				SDNode *eq6 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and5, 1)
				);

				// R16 = R24 & R09;// Instr #33
				SDNode *and6 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq6, 0),
				SDValue(and5, 0),
				// glue (or chain) input edge
				SDValue(eq6, 1)
				);

				// RED(R16);// Instr #34
				SDNode *sumRed1 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(and6, 0),
				// glue (or chain) input edge
				SDValue(and6, 1)
				);

				// R09 = R31 - R09;// Instr #35
				SDNode *sub1 = CrtDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(eq6, 0),
				// glue (or chain) input edge
				SDValue(sumRed1, 0)
				);

				// R16 = R24 & R09;// Instr #36
				SDNode *and7 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(sub1, 0),
				SDValue(and5, 0),
				// glue (or chain) input edge
				SDValue(sub1, 1)
				);

				// RED(R16);// Instr #37
				SDNode *sumRed2 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(and7, 0),
				// glue (or chain) input edge
				SDValue(and7, 1)
				);

				// R09 = R25 == R11;// Instr #38
				SDNode *eq7 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload5, 0),
				// glue (or chain) input edge
				SDValue(sumRed2, 0)
				);

				SDValue ct11 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #39
				SDNode *nop2 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct11,
				// glue (or chain) input edge
				SDValue(eq7, 1)
				);

				// WHERE_EQ;// Instr #40
				SDNode *whereeq2 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R27 = R30 - R27;// Instr #41
				SDNode *sub2 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload1, 0),
				SDValue(or1, 0),
				SDValue(or1, 0),
				// glue (or chain) input edge
				SDValue(whereeq2, 0)
				}
				);

				// END_WHERE;// Instr #42
				SDNode *endwhere2 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sub2, 1)
				);

				SDValue ct12 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R02 = R26 << 0;// Instr #43
				SDNode *ishl0 = CrtDAG->getMachineNode(
				Connex::ISHLV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				ct12,
				// glue (or chain) input edge
				SDValue(endwhere2, 0)
				);

				SDValue ct13 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R02 = 0;// Instr #44
				SDNode *vload8 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct13,
				// glue (or chain) input edge
				SDValue(ishl0, 1)
				);

				SDValue ct14 = CrtDAG->getConstant(6, DL, MVT::i16, true, false);
				// R24 = 6;// Instr #45
				SDNode *vload9 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct14,
				// glue (or chain) input edge
				SDValue(vload8, 1)
				);

				// R19 = R26 < R24;// Instr #46
				SDNode *lt1 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload9, 0),
				// glue (or chain) input edge
				SDValue(vload9, 1)
				);

				// R17 = R02 < R26;// Instr #47
				SDNode *lt2 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload8, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt1, 1)
				);

				// R02 = R31 + R02;// Instr #48
				SDNode *add0 = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload8, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt2, 1)
				);

				// R09 = R19 & R17;// Instr #49
				SDNode *and8 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt2, 0),
				SDValue(lt1, 0),
				// glue (or chain) input edge
				SDValue(add0, 1)
				);

				// R09 = R09 == R31;// Instr #50
				SDNode *eq8 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and8, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and8, 1)
				);

				SDValue ct15 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #51
				SDNode *nop3 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct15,
				// glue (or chain) input edge
				SDValue(eq8, 1)
				);

				// WHERE_EQ;// Instr #52
				SDNode *whereeq3 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R19 = R26 - R02;// Instr #53
				SDNode *sub3 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload7, 0),
				SDValue(add0, 0),
				SDValue(lt1, 0),
				// glue (or chain) input edge
				SDValue(whereeq3, 0)
				}
				);

				// R27 = R27 << R19;// Instr #54
				SDNode *shl0 = CrtDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(sub2, 0),
				SDValue(sub3, 0),
				SDValue(sub2, 0),
				// glue (or chain) input edge
				SDValue(sub3, 1)
				}
				);

				// RED(R27);// Instr #55
				SDNode *sumRed3 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl0, 0),
				// glue (or chain) input edge
				SDValue(shl0, 1)
				);

				// END_WHERE;// Instr #56
				SDNode *endwhere3 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed3, 0)
				);

				SDValue ct16 = CrtDAG->getConstant(5, DL, MVT::i16, true, false);
				// R02 = 5;// Instr #57
				SDNode *vload10 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct16,
				// glue (or chain) input edge
				SDValue(endwhere3, 0)
				);

				SDValue ct17 = CrtDAG->getConstant(11, DL, MVT::i16, true, false);
				// R24 = 11;// Instr #58
				SDNode *vload11 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct17,
				// glue (or chain) input edge
				SDValue(vload10, 1)
				);

				// R19 = R26 < R24;// Instr #59
				SDNode *lt3 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload11, 0),
				// glue (or chain) input edge
				SDValue(vload11, 1)
				);

				// R17 = R02 < R26;// Instr #60
				SDNode *lt4 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload10, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt3, 1)
				);

				// R02 = R31 + R02;// Instr #61
				SDNode *add1 = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload10, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt4, 1)
				);

				// R09 = R19 & R17;// Instr #62
				SDNode *and9 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt4, 0),
				SDValue(lt3, 0),
				// glue (or chain) input edge
				SDValue(add1, 1)
				);

				// R09 = R09 == R31;// Instr #63
				SDNode *eq9 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and9, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and9, 1)
				);

				SDValue ct18 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #64
				SDNode *nop4 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct18,
				// glue (or chain) input edge
				SDValue(eq9, 1)
				);

				// WHERE_EQ;// Instr #65
				SDNode *whereeq4 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop4, 0)
				);

				// R19 = R26 - R02;// Instr #66
				SDNode *sub4 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload7, 0),
				SDValue(add1, 0),
				SDValue(lt3, 0),
				// glue (or chain) input edge
				SDValue(whereeq4, 0)
				}
				);

				// R27 = R27 << R19;// Instr #67
				SDNode *shl1 = CrtDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(shl0, 0),
				SDValue(sub4, 0),
				SDValue(shl0, 0),
				// glue (or chain) input edge
				SDValue(sub4, 1)
				}
				);

				// RED(R27);// Instr #68
				SDNode *sumRed4 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl1, 0),
				// glue (or chain) input edge
				SDValue(shl1, 1)
				);

				// END_WHERE;// Instr #69
				SDNode *endwhere4 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed4, 0)
				);

				SDValue ct19 = CrtDAG->getConstant(10, DL, MVT::i16, true, false);
				// R02 = 10;// Instr #70
				SDNode *vload12 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct19,
				// glue (or chain) input edge
				SDValue(endwhere4, 0)
				);

				SDValue ct20 = CrtDAG->getConstant(16, DL, MVT::i16, true, false);
				// R24 = 16;// Instr #71
				SDNode *vload13 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct20,
				// glue (or chain) input edge
				SDValue(vload12, 1)
				);

				// R19 = R26 < R24;// Instr #72
				SDNode *lt5 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload13, 0),
				// glue (or chain) input edge
				SDValue(vload13, 1)
				);

				// R17 = R02 < R26;// Instr #73
				SDNode *lt6 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload12, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt5, 1)
				);

				// R02 = R31 + R02;// Instr #74
				SDNode *add2 = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload12, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt6, 1)
				);

				// R09 = R19 & R17;// Instr #75
				SDNode *and10 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt6, 0),
				SDValue(lt5, 0),
				// glue (or chain) input edge
				SDValue(add2, 1)
				);

				// R09 = R09 == R31;// Instr #76
				SDNode *eq10 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and10, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and10, 1)
				);

				SDValue ct21 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #77
				SDNode *nop5 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct21,
				// glue (or chain) input edge
				SDValue(eq10, 1)
				);

				// WHERE_EQ;// Instr #78
				SDNode *whereeq5 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop5, 0)
				);

				// R19 = R26 - R02;// Instr #79
				SDNode *sub5 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload7, 0),
				SDValue(add2, 0),
				SDValue(lt5, 0),
				// glue (or chain) input edge
				SDValue(whereeq5, 0)
				}
				);

				// R27 = R27 << R19;// Instr #80
				SDNode *shl2 = CrtDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(shl1, 0),
				SDValue(sub5, 0),
				SDValue(shl1, 0),
				// glue (or chain) input edge
				SDValue(sub5, 1)
				}
				);

				// RED(R27);// Instr #81
				SDNode *sumRed5 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl2, 0),
				// glue (or chain) input edge
				SDValue(shl2, 1)
				);

				// END_WHERE;// Instr #82
				SDNode *endwhere5 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed5, 0)
				);

				SDValue ct22 = CrtDAG->getConstant(15, DL, MVT::i16, true, false);
				// R02 = 15;// Instr #83
				SDNode *vload14 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct22,
				// glue (or chain) input edge
				SDValue(endwhere5, 0)
				);

				SDValue ct23 = CrtDAG->getConstant(21, DL, MVT::i16, true, false);
				// R24 = 21;// Instr #84
				SDNode *vload15 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct23,
				// glue (or chain) input edge
				SDValue(vload14, 1)
				);

				// R19 = R26 < R24;// Instr #85
				SDNode *lt7 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload15, 0),
				// glue (or chain) input edge
				SDValue(vload15, 1)
				);

				// R17 = R02 < R26;// Instr #86
				SDNode *lt8 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload14, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt7, 1)
				);

				// R02 = R31 + R02;// Instr #87
				SDNode *add3 = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload14, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt8, 1)
				);

				// R09 = R19 & R17;// Instr #88
				SDNode *and11 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt8, 0),
				SDValue(lt7, 0),
				// glue (or chain) input edge
				SDValue(add3, 1)
				);

				// R09 = R09 == R31;// Instr #89
				SDNode *eq11 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and11, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and11, 1)
				);

				SDValue ct24 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #90
				SDNode *nop6 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct24,
				// glue (or chain) input edge
				SDValue(eq11, 1)
				);

				// WHERE_EQ;// Instr #91
				SDNode *whereeq6 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop6, 0)
				);

				// R19 = R26 - R02;// Instr #92
				SDNode *sub6 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload7, 0),
				SDValue(add3, 0),
				SDValue(lt7, 0),
				// glue (or chain) input edge
				SDValue(whereeq6, 0)
				}
				);

				// R27 = R27 << R19;// Instr #93
				SDNode *shl3 = CrtDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(shl2, 0),
				SDValue(sub6, 0),
				SDValue(shl2, 0),
				// glue (or chain) input edge
				SDValue(sub6, 1)
				}
				);

				// RED(R27);// Instr #94
				SDNode *sumRed6 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl3, 0),
				// glue (or chain) input edge
				SDValue(shl3, 1)
				);

				// END_WHERE;// Instr #95
				SDNode *endwhere6 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed6, 0)
				);

				SDValue ct25 = CrtDAG->getConstant(20, DL, MVT::i16, true, false);
				// R02 = 20;// Instr #96
				SDNode *vload16 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct25,
				// glue (or chain) input edge
				SDValue(endwhere6, 0)
				);

				SDValue ct26 = CrtDAG->getConstant(26, DL, MVT::i16, true, false);
				// R24 = 26;// Instr #97
				SDNode *vload17 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct26,
				// glue (or chain) input edge
				SDValue(vload16, 1)
				);

				// R19 = R26 < R24;// Instr #98
				SDNode *lt9 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload17, 0),
				// glue (or chain) input edge
				SDValue(vload17, 1)
				);

				// R17 = R02 < R26;// Instr #99
				SDNode *lt10 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload16, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt9, 1)
				);

				// R02 = R31 + R02;// Instr #100
				SDNode *add4 = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload16, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt10, 1)
				);

				// R09 = R19 & R17;// Instr #101
				SDNode *and12 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt10, 0),
				SDValue(lt9, 0),
				// glue (or chain) input edge
				SDValue(add4, 1)
				);

				// R09 = R09 == R31;// Instr #102
				SDNode *eq12 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and12, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and12, 1)
				);

				SDValue ct27 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #103
				SDNode *nop7 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct27,
				// glue (or chain) input edge
				SDValue(eq12, 1)
				);

				// WHERE_EQ;// Instr #104
				SDNode *whereeq7 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop7, 0)
				);

				// R19 = R26 - R02;// Instr #105
				SDNode *sub7 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload7, 0),
				SDValue(add4, 0),
				SDValue(lt9, 0),
				// glue (or chain) input edge
				SDValue(whereeq7, 0)
				}
				);

				// R27 = R27 << R19;// Instr #106
				SDNode *shl4 = CrtDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(shl3, 0),
				SDValue(sub7, 0),
				SDValue(shl3, 0),
				// glue (or chain) input edge
				SDValue(sub7, 1)
				}
				);

				// RED(R27);// Instr #107
				SDNode *sumRed7 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl4, 0),
				// glue (or chain) input edge
				SDValue(shl4, 1)
				);

				// END_WHERE;// Instr #108
				SDNode *endwhere7 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed7, 0)
				);

				SDValue ct28 = CrtDAG->getConstant(25, DL, MVT::i16, true, false);
				// R02 = 25;// Instr #109
				SDNode *vload18 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct28,
				// glue (or chain) input edge
				SDValue(endwhere7, 0)
				);

				SDValue ct29 = CrtDAG->getConstant(31, DL, MVT::i16, true, false);
				// R24 = 31;// Instr #110
				SDNode *vload19 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct29,
				// glue (or chain) input edge
				SDValue(vload18, 1)
				);

				// R19 = R26 < R24;// Instr #111
				SDNode *lt11 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload19, 0),
				// glue (or chain) input edge
				SDValue(vload19, 1)
				);

				// R17 = R02 < R26;// Instr #112
				SDNode *lt12 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload18, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt11, 1)
				);

				// R02 = R31 + R02;// Instr #113
				SDNode *add5 = CrtDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload18, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt12, 1)
				);

				// R09 = R19 & R17;// Instr #114
				SDNode *and13 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt12, 0),
				SDValue(lt11, 0),
				// glue (or chain) input edge
				SDValue(add5, 1)
				);

				// R09 = R09 == R31;// Instr #115
				SDNode *eq13 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and13, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and13, 1)
				);

				SDValue ct30 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #116
				SDNode *nop8 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct30,
				// glue (or chain) input edge
				SDValue(eq13, 1)
				);

				// WHERE_EQ;// Instr #117
				SDNode *whereeq8 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop8, 0)
				);

				// R19 = R26 - R02;// Instr #118
				SDNode *sub8 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload7, 0),
				SDValue(add5, 0),
				SDValue(lt11, 0),
				// glue (or chain) input edge
				SDValue(whereeq8, 0)
				}
				);

				// R27 = R27 << R19;// Instr #119
				SDNode *shl5 = CrtDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(shl4, 0),
				SDValue(sub8, 0),
				SDValue(shl4, 0),
				// glue (or chain) input edge
				SDValue(sub8, 1)
				}
				);

				// RED(R27);// Instr #120
				SDNode *sumRed8 = CrtDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl5, 0),
				// glue (or chain) input edge
				SDValue(shl5, 1)
				);

				// END_WHERE;// Instr #121
				SDNode *reduceH /*endwhere8*/ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// Alex: MVT::Glue,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(sumRed8, 0)
				);

llvm/lib/Target/Connex/Select_REDi32_OpincaaCodeGen.h

This file was added.

				//===-- Select_REDi32_OpincaaCodeGen.h --------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel red.i32.
				// You should include this code in the Select() method of the [Target]SelectionDAGISel
				// class of your back end (or MAYBE in the ISelLowering pass).
				// Number of instructions generated: 14.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_i32/REDi32_manual/DumpISel_OpincaaCodeGen_old07_620.cpp


				SDValue ct0 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R29 = 1;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast, 1)
				);

				// CELL_SHR(R28, R29);// Instr #1
				SDNode *cellshr0 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(nodeOpSrcCast, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct1 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #2
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R27 = SHIFT_REG;// Instr #3
				SDNode *ldsh0 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				// R26 = INDEX;// Instr #4
				SDNode *ldix0 = CrtDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				// R25 = R26 & R29;// Instr #5
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R24 = R25 == R29;// Instr #6
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #7
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;// Instr #8
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				SDValue ct3 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R28 = 0;// Instr #9
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				);

				SDValue ct4 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R27 = 0;// Instr #10
				SDNode *vload2 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct4,
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// END_WHERE;// Instr #11
				SDNode *endwhere0 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// RED_U(R28);// Instr #12
				SDNode *sumRedU0 = CrtDAG->getMachineNode(
				Connex::RED_U_H,
				DL,
				MVT::Glue,
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// RED_U(R27);// Instr #13
				SDNode *reduceHigh16 /*sumRedU1*/ = CurDAG->getMachineNode(
				Connex::RED_U_H,
				DL,
				// Alex: MVT::Glue,
				MVT::Other,
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(sumRedU0, 0)
				);
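The red.i32 kernel above zeroes the odd lanes and then issues two RED_U reductions, one over each 16-bit half of the 32-bit operands. The following is only a rough host-side model of that idea, not code from the patch. It assumes that each i32 value occupies two adjacent 16-bit lanes with the low half in the even lane, and that the two partial sums are recombined as low + (high << 16); the function name `redI32OnI16Lanes` and the recombination step are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not part of the patch): sums the 32-bit values stored
// as (low half, high half) pairs of 16-bit lanes, using two separate
// unsigned reductions as the two RED_U instructions do.
uint32_t redI32OnI16Lanes(const std::vector<uint16_t> &lanes) {
  uint32_t sumLow = 0;  // models RED_U over the low halves
  uint32_t sumHigh = 0; // models RED_U over the high halves
  for (std::size_t i = 0; i + 1 < lanes.size(); i += 2) {
    sumLow += lanes[i];
    sumHigh += lanes[i + 1];
  }
  // Assumed recombination step: the result is taken modulo 2^32.
  return sumLow + (sumHigh << 16);
}
```

For example, the lanes {0xFFFF, 0x0001, 0x0001, 0x0000} hold the values 0x0001FFFF and 0x00000001 under the assumed layout, and the model returns their sum.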

llvm/lib/Target/Connex/Select_SHRAi32_OpincaaCodeGen.h

This file was added.

				//===-- Select_SHRAi32_OpincaaCodeGen.h --------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel shra.i32.
				// You should include this code in the Select() method of the [Target]SelectionDAGISel
				// class of your back end (or MAYBE in the ISelLowering pass).
				// Number of instructions generated: 33.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_i32/SHRAi32_manual/DumpISel_OpincaaCodeGen_old15_930.cpp


				// IMPORTANT NOTE: the OPINCAA lib gives the following warnings when generating
				// the code below:
				// "Warning: instrCrt.getDest() = 21 - register not initialized before updated in WHERE - maybe wrong semantics. instrCrtOrig = R21 = R27 - R10;"
				// "Warning: instrCrt.getDest() = 22 - register not initialized before updated in WHERE - maybe wrong semantics. instrCrtOrig = R22 = R23 << R21;"
				// Therefore we manually add the initialization code: the following 2 nodes
				// with opcode VLOAD_BOGUS_H, which does NOT generate a real Connex assembly
				// instruction. The later predicated instructions refer to these nodes via
				// tied-to constraints (constraint: the result of the predicated instruction
				// is the same as the bogus result of VLOAD_BOGUS_H).

				SDValue ct21Node = CurDAG->getConstant(21, DL, MVT::i16, true, false);
				SDNode *rct21Node = CurDAG->getMachineNode(
				Connex::VLOAD_BOGUS_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct21Node,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast1, 1)
				);

				SDValue ct22Node = CurDAG->getConstant(22, DL, MVT::i16, true, false);
				SDNode *rct22Node = CurDAG->getMachineNode(
				Connex::VLOAD_BOGUS_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct22Node,
				// glue (or chain) input edge
				SDValue(rct21Node, 1)
				);




				SDValue ct0 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				// Alex: SDValue(nodeOpSrcCast1, 1)
				SDValue(rct22Node, 1) // Alex
				);

				SDValue ct1 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;// Instr #1
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(16, DL, MVT::i16, true, false);
				// R10 = 16;// Instr #2
				SDNode *vload2 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				SDValue ct3 = CrtDAG->getConstant(31, DL, MVT::i16, true, false);
				// R08 = 31;// Instr #3
				SDNode *vload3 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// R27 = R27 & R08;// Instr #4
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(vload3, 1)
				);

				// R25 = INDEX;// Instr #5
				SDNode *ldix0 = CrtDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				// R25 = R25 & R30;// Instr #6
				SDNode *and1 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// CELL_SHR(R27, R25);// Instr #7
				SDNode *cellshr0 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(and1, 0),
				// glue (or chain) input edge
				SDValue(and1, 1)
				);

				SDValue ct4 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #8
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R27 = SHIFT_REG;// Instr #9
				SDNode *ldsh0 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				// R20 = R10 < R27;// Instr #10
				SDNode *lt0 = CrtDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload2, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				// R29 = SHRA(R28, R27);// Instr #11
				SDNode *shra0 = CrtDAG->getMachineNode(
				Connex::SHRAV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(lt0, 1)
				);

				// CELL_SHL(R28, R30);// Instr #12
				SDNode *cellshl0 = CrtDAG->getMachineNode(
				Connex::CELLSHL_H,
				DL,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(shra0, 1)
				);

				SDValue ct5 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #13
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(cellshl0, 0)
				);

				// R23 = SHIFT_REG;// Instr #14
				SDNode *ldsh1 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R25 = R25 == R31;// Instr #15
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and1, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(ldsh1, 1)
				);

				// R24 = R20 & R25;// Instr #16
				SDNode *and2 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(lt0, 0),
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// R19 = R24 == R30;// Instr #17
				SDNode *eq1 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and2, 1)
				);

				SDValue ct6 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #18
				SDNode *nop2 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct6,
				// glue (or chain) input edge
				SDValue(eq1, 1)
				);

				// WHERE_EQ;// Instr #19
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R21 = R27 - R10;// Instr #20
				SDNode *sub0 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(ldsh0, 0),
				SDValue(vload2, 0),
				// Alex: SDValue(, 0),
				SDValue(rct21Node, 0), // Alex
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				}
				);

				// R29 = SHRA(R23, R21);// Instr #21
				SDNode *shra1 = CrtDAG->getMachineNode(
				Connex::SHRAV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(ldsh1, 0),
				SDValue(sub0, 0),
				SDValue(shra0, 0),
				// glue (or chain) input edge
				SDValue(sub0, 1)
				}
				);

				// END_WHERE;// Instr #22
				SDNode *endwhere0 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(shra1, 1)
				);

				// R20 = R30 - R20;// Instr #23
				SDNode *sub1 = CrtDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(lt0, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// R24 = R20 & R25;// Instr #24
				SDNode *and3 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(sub1, 0),
				// glue (or chain) input edge
				SDValue(sub1, 1)
				);

				// R19 = R24 == R30;// Instr #25
				SDNode *eq2 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and3, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and3, 1)
				);

				SDValue ct7 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #26
				SDNode *nop3 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct7,
				// glue (or chain) input edge
				SDValue(eq2, 1)
				);

				// WHERE_EQ;// Instr #27
				SDNode *whereeq1 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R21 = R10 - R27;// Instr #28
				SDNode *sub2 = CrtDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(vload2, 0),
				SDValue(ldsh0, 0),
				SDValue(sub0, 0),
				// glue (or chain) input edge
				SDValue(whereeq1, 0)
				}
				);

				// R22 = R23 << R21;// Instr #29
				SDNode *shl0 = CrtDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(ldsh1, 0),
				SDValue(sub2, 0),
				// Alex: SDValue(, 0),
				SDValue(rct22Node, 0),
				// glue (or chain) input edge
				SDValue(sub2, 1)
				}
				);

				// R29 = R28 >> R27;// Instr #30
				SDNode *shr0 = CrtDAG->getMachineNode(
				Connex::SHRV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(nodeOpSrcCast1, 0),
				SDValue(ldsh0, 0),
				SDValue(shra1, 0),
				// glue (or chain) input edge
				SDValue(shl0, 1)
				}
				);

				// R29 = R29 | R22;// Instr #31
				SDNode *resH /*or0*/ = CrtDAG->getMachineNode(
				Connex::ORV_SPECIAL_H,
				DL,
				CrtDAG->getVTList(
				TYPE_VECTOR_I16,
				MVT::Glue
				),
				{
				SDValue(shl0, 0),
				SDValue(shr0, 0),
				SDValue(shr0, 0),
				// glue (or chain) input edge
				SDValue(shr0, 1)
				}
				);

				// END_WHERE;// Instr #32
				SDNode *lastNode /*endwhere1*/ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// MVT::Glue,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(resH /*or0*/, 1)
				);

llvm/lib/Target/Connex/Select_SUBf16_OpincaaCodeGen.h

This file was added.

This file has a very large number of changes (3,566 lines).

llvm/lib/Target/Connex/Select_SUBi32_OpincaaCodeGen.h

This file was added.

				//===-- Select_SUBi32_OpincaaCodeGen.h --------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel sub.i32.
				// You should include this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 15.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_i32/SUBi32_manual/DumpISel_OpincaaCodeGen_old140_420.cpp


				SDValue ct0 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;// Instr #0
				SDNode *vload0 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast2, 1)
				);

				SDValue ct1 = CrtDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;// Instr #1
				SDNode *vload1 = CrtDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				// R29 = R27 - R28;// Instr #2
				SDNode *sub0 = CrtDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// R23 = ADDC(R31, R31);// Instr #3
				SDNode *addc0 = CrtDAG->getMachineNode(
				Connex::ADDCV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(vload0, 0),
				SDValue(sub0, 0)
				// no need for glue or chain input (since it normally consumes the output of the predecessor)
				);

				// R26 = INDEX;// Instr #4
				SDNode *ldix0 = CrtDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(addc0, 1)
				);

				// R25 = R26 & R30;// Instr #5
				SDNode *and0 = CrtDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R24 = R25 == R30;// Instr #6
				SDNode *eq0 = CrtDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct2 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #7
				SDNode *nop0 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;// Instr #8
				SDNode *whereeq0 = CrtDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct3 = CrtDAG->getConstant(0, DL, MVT::i16, true, false);
				// R23 = 0;// Instr #9
				SDNode *vload2 = CrtDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				SDValue(addc0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 0)
				);

				// END_WHERE;// Instr #10
				SDNode *endwhere0 = CrtDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// CELL_SHR(R23, R30);// Instr #11
				SDNode *cellshr0 = CrtDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(vload2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				SDValue ct4 = CrtDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;// Instr #12
				SDNode *nop1 = CrtDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R23 = SHIFT_REG;// Instr #13
				SDNode *ldsh0 = CrtDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R29 = R29 - R23;// Instr #14
				SDNode *resH /*sub1*/ = CrtDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(sub0, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);



				SDNode *lastNode = resH;
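The register-level comments in the sub.i32 kernel above suggest a borrow-propagation scheme: subtract lane-wise, capture the borrow with ADDC(0, 0), clear it in the odd lanes, shift it one cell over, and subtract it from the neighbouring lane. The following is a minimal host-side sketch of that reading, not code from the patch. It assumes that each i32 value occupies two adjacent 16-bit lanes with the low half in the even lane, that ADDC(0, 0) yields 1 exactly when the preceding subtraction borrowed, and that CELL_SHR by 1 moves lane i to lane i + 1. The function name `subI32OnI16Lanes` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not part of the patch) of 32-bit subtraction emulated
// on 16-bit lanes. Both inputs must have the same (even) number of lanes.
std::vector<uint16_t> subI32OnI16Lanes(const std::vector<uint16_t> &a,
                                       const std::vector<uint16_t> &b) {
  const std::size_t n = a.size();
  std::vector<uint16_t> diff(n), borrow(n), shifted(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    diff[i] = static_cast<uint16_t>(a[i] - b[i]); // R29 = R27 - R28
    borrow[i] = a[i] < b[i] ? 1 : 0;              // R23 = ADDC(R31, R31)
  }
  for (std::size_t i = 1; i < n; i += 2)
    borrow[i] = 0; // WHERE (INDEX & 1) == 1: R23 = 0
  for (std::size_t i = 1; i < n; ++i)
    shifted[i] = borrow[i - 1]; // CELL_SHR(R23, 1); R23 = SHIFT_REG (assumed direction)
  for (std::size_t i = 0; i < n; ++i)
    diff[i] = static_cast<uint16_t>(diff[i] - shifted[i]); // R29 = R29 - R23
  return diff;
}
```

Under these assumptions, subtracting 1 from 0x00010000 (lanes {0x0000, 0x0001} minus {0x0001, 0x0000}) gives lanes {0xFFFF, 0x0000}, that is 0x0000FFFF. Clearing the borrow in the odd lanes is what keeps the high half of one value from leaking into the low half of the next.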

llvm/lib/Target/Connex/TargetInfo/CMakeLists.txt

This file was added.

				add_llvm_component_library(LLVMConnexInfo
				ConnexTargetInfo.cpp

				LINK_COMPONENTS
				MC
				Support

				ADD_TO_COMPONENT
				Connex
				)

llvm/lib/Target/Connex/TargetInfo/ConnexTargetInfo.cpp

This file was added.

				//===-- ConnexTargetInfo.cpp - Connex Target Implementation ---------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "llvm/MC/TargetRegistry.h"
				using namespace llvm;

				namespace llvm {
				Target TheConnexTarget;
				}

				extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeConnexTargetInfo() {
				TargetRegistry::RegisterTarget(
				TheConnexTarget, "connex", "Connex", "Connex",
				[](Triple::ArchType) { return false; }, true);
				}

llvm/test/CodeGen/Connex/MatMulBT-128_i16.ll

This file was added.

				; RUN: llc < %s -no-integrated-as -print-after-all -debug -march=connex -O3 -disable-cgp -pre-RA-sched=source -hoist-cheap-insts -enable-correct-asm-print -asm-show-inst -asm-verbose -debug-pass=Structure | FileCheck %s

				; From ~/LLVM/Tests/DawnCC/35_MatMul/SIZE_256/7_CVL8_LLVM80/3/test.ll

				; ModuleID = 'test.scalar.ll'
				source_filename = "test.c"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@C = common local_unnamed_addr global [256 x [256 x i16]] zeroinitializer, align 16
				@A = common local_unnamed_addr global [256 x [256 x i16]] zeroinitializer, align 16
				@B = common local_unnamed_addr global [256 x [256 x i16]] zeroinitializer, align 16
				@CONNEX_VL = external global i64

				; Function Attrs: nounwind uwtable
				define void @MatMul_BTransposed() local_unnamed_addr #0 !dbg !15 {
				entry:
				call void asm sideeffect "// START_OPINCAA_HOST_DEVICE_CODE\0A int numI16WordsAccessedInArrayA = 65536;\0A connexGlobal->writeDataToConnexPartial(A, // ;)\0A /* num elems written */ numI16WordsAccessedInArrayA, // ;)\0A /* offset */ 0);\0A // Generated in InstrumentVectorGatherLoadOrScatterStore() ;)\0A int numI16WordsAccessedInArrayB = 65536;\0A connexGlobal->writeDataToConnexPartial(B, // ;)\0A /* num elems written */ numI16WordsAccessedInArrayB, // ;)\0A /* offset */ 0 + CEIL_INT_DIV(numI16WordsAccessedInArrayA, CONNEX_VL));\0A // ;)\0Aif (connexGlobal->getKernel(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22) == NULL) {\0A BEGIN_KERNEL(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22); // Generated in vectorizeLoop()\0A EXECUTE_IN_ALL( // Generated in vectorizeLoop()\0A // Handling spills (from predecessors) and fills\0A", ""() #3, !dbg !18
				tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !19, metadata !21), !dbg !18
				%CONNEX_VL_DEREF_C = load i64, i64* @CONNEX_VL, align 8, !dbg !22
				%getSizeDiv_i64 = udiv i64 sub (i64 add (i64 ptrtoint ([256 x [256 x i16]]* @A to i64), i64 131072), i64 ptrtoint ([256 x [256 x i16]]* @A to i64)), %CONNEX_VL_DEREF_C, !dbg !22
				%getSizeDiv = lshr i64 %getSizeDiv_i64, 1, !dbg !22
				%ceil_getSize_16b = trunc i64 %getSizeDiv to i16, !dbg !22
				br label %for.cond1.preheader, !dbg !22

				for.cond1.preheader: ; preds = %entry, %for.inc27
				%0 = phi <8 x i16> [ undef, %entry ], [ %15, %for.inc27 ]
				%i.05 = phi i32 [ 0, %entry ], [ %inc28, %for.inc27 ]
				%vecIndVar2ndInnerLoop0 = insertelement <8 x i16> undef, i16 %ceil_getSize_16b, i64 0, !dbg !26
				%1 = shufflevector <8 x i16> %vecIndVar2ndInnerLoop0, <8 x i16> undef, <8 x i32> zeroinitializer, !dbg !26
				%idxprom4 = sext i32 %i.05 to i64, !dbg !26
				%GEPInstrIndexWith0.idx = shl nsw i64 %idxprom4, 9, !dbg !31
				%CONNEX_VL_DEREF_D = load i64, i64* @CONNEX_VL, align 8, !dbg !31
				%connexVLDerefAdjusted = shl i64 %CONNEX_VL_DEREF_D, 1, !dbg !31
				%finalIndexValue64 = udiv i64 %GEPInstrIndexWith0.idx, %connexVLDerefAdjusted, !dbg !31
				%finalIndexValue647 = trunc i64 %finalIndexValue64 to i16, !dbg !31
				br label %for.body3, !dbg !31

				for.body3: ; preds = %for.cond1.preheader, %for.inc24
				%2 = phi <8 x i16> [ %0, %for.cond1.preheader ], [ %15, %for.inc24 ]
				%j.04 = phi i32 [ 0, %for.cond1.preheader ], [ %inc25, %for.inc24 ]
				%varVecIndexOuterLoop = phi <8 x i16> [ %1, %for.cond1.preheader ], [ %15, %for.inc24 ], !dbg !26
				call void @llvm.connex.repeat.x.times(i64 256), !dbg !26
				; CHECK: REPEAT
				%idxprom = sext i32 %j.04 to i64, !dbg !26
				%arrayidx5 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @C, i64 0, i64 %idxprom4, i64 %idxprom, !dbg !26
				store i16 0, i16* %arrayidx5, align 2, !dbg !33
				tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !34, metadata !21), !dbg !35
				%CONNEX_VL_DEREF_B = load i64, i64* @CONNEX_VL, align 8, !dbg !36
				%n.mod.vf = urem i64 256, %CONNEX_VL_DEREF_B, !dbg !36
				%n.vec = sub nsw i64 256, %n.mod.vf, !dbg !36
				%cmp.zero = icmp eq i64 %n.vec, 0, !dbg !36
				%cast.crd = trunc i64 %n.vec to i32, !dbg !36
				br i1 %cmp.zero, label %for.body8.preheader, label %vector.ph, !dbg !36

				vector.ph: ; preds = %for.body3
				%vecInsElem_valExactLSOffset = insertelement <8 x i16> undef, i16 %finalIndexValue647, i64 0, !dbg !36
				%vecValExactLSOffset = shufflevector <8 x i16> %vecInsElem_valExactLSOffset, <8 x i16> undef, <8 x i32> zeroinitializer, !dbg !36
				br label %vector.body, !dbg !36

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%vec.phi = phi <8 x i16> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
				%varVecIndexInnerLoop = phi <8 x i16> [ %vecValExactLSOffset, %vector.ph ], [ %5, %vector.body ]
				%varVecIndexInnerLoop20 = phi <8 x i16> [ %varVecIndexOuterLoop, %vector.ph ], [ %8, %vector.body ]
				call void asm sideeffect " // Map part of reduction code; // Generated in vectorizeLoop()\0A", ""() #3
				call void asm sideeffect "// An empty inline Asm expression, required for ConnexAsmPrinter.cpp, MoveToFront();\0A\0A", ""() #3
				call void asm sideeffect "int indexLLVM_LV2;\0Aint origLoopTripCount = 256;\0Afor (indexLLVM_LV2 = 0; indexLLVM_LV2 < origLoopTripCount; indexLLVM_LV2 += CONNEX_VL) { // vectorized loop for induction var [NO INFO]\0A", ""() #3
				%3 = sext <8 x i16> %varVecIndexInnerLoop to <8 x i64>, !dbg !36
				%VectorGep = getelementptr i16, i16* inttoptr (i16 51 to i16*), <8 x i16> %varVecIndexInnerLoop, !dbg !36
				%4 = call <8 x i16> @llvm.masked.gather.v8i16(<8 x i16*> %VectorGep, i32 0, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i16> undef) #3, !dbg !36
				; CHECK: R(3) = LS[R(4)]
				%5 = add <8 x i16> %varVecIndexInnerLoop, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>, !dbg !36
				; CHECK: R(4) = R(4) + R(1)
				%6 = sext <8 x i16> %varVecIndexInnerLoop20 to <8 x i64>, !dbg !36
				%VectorGep21 = getelementptr i16, i16* inttoptr (i16 51 to i16*), <8 x i16> %varVecIndexInnerLoop20, !dbg !36
				%7 = call <8 x i16> @llvm.masked.gather.v8i16(<8 x i16*> %VectorGep21, i32 0, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i16> undef) #3, !dbg !36
				; CHECK: R(6) = LS[R(2)]
				%8 = add <8 x i16> %varVecIndexInnerLoop20, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>, !dbg !36
				; CHECK: R(2) = R(2) + R(1)
				%9 = mul <8 x i16> %7, %4, !dbg !40
				; CHECK: R(6) * ( R(3) ); R(3) = MULTLO();
				%10 = add <8 x i16> %vec.phi, %9, !dbg !42
				; CHECK: R(5) = R(5) + R(3)
				%index.next = add i64 %index, 8, !dbg !36
				%11 = icmp eq i64 %index.next, %n.vec, !dbg !36
				br i1 %11, label %middle.block, label %vector.body, !dbg !36, !llvm.loop !43

				middle.block: ; preds = %vector.body
				call void asm sideeffect "} // END for (indexLLVM_LV2) loop\0A", ""() #3
				call void @llvm.connex.reduce.v8i16(<8 x i16> %10)
				; CHECK: RED
				%cmp.n = icmp eq i64 %n.mod.vf, 0
				br i1 %cmp.n, label %for.inc24, label %for.body8.preheader, !dbg !36

				for.body8.preheader: ; preds = %middle.block, %for.body3
				%12 = phi <8 x i16> [ %2, %for.body3 ], [ %8, %middle.block ]
				%k.02.ph = phi i32 [ 0, %for.body3 ], [ %cast.crd, %middle.block ]
				br label %for.body8, !dbg !47

				for.body8: ; preds = %for.body8.preheader, %for.body8
				%add3 = phi i16 [ %add, %for.body8 ], [ 0, %for.body8.preheader ], !dbg !47
				%k.02 = phi i32 [ %inc, %for.body8 ], [ %k.02.ph, %for.body8.preheader ]
				%idxprom9 = sext i32 %k.02 to i64, !dbg !47
				%arrayidx12 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @A, i64 0, i64 %idxprom4, i64 %idxprom9, !dbg !47
				%13 = load i16, i16* %arrayidx12, align 2, !dbg !47
				%arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @B, i64 0, i64 %idxprom, i64 %idxprom9, !dbg !48
				%14 = load i16, i16* %arrayidx16, align 2, !dbg !48
				%mul = mul i16 %14, %13, !dbg !40
				%add = add i16 %add3, %mul, !dbg !42
				%inc = add nsw i32 %k.02, 1, !dbg !49
				tail call void @llvm.dbg.value(metadata i32 %inc, i64 0, metadata !34, metadata !21), !dbg !35
				%cmp7 = icmp slt i32 %inc, 256, !dbg !51
				br i1 %cmp7, label %for.body8, label %for.inc24.loopexit, !dbg !36, !llvm.loop !52

				for.inc24.loopexit: ; preds = %for.body8
				br label %for.inc24, !dbg !42

				for.inc24: ; preds = %for.inc24.loopexit, %middle.block
				%15 = phi <8 x i16> [ %8, %middle.block ], [ %12, %for.inc24.loopexit ]
				%add.lcssa = phi i16 [ undef, %middle.block ], [ %add, %for.inc24.loopexit ]
				store i16 %add.lcssa, i16* %arrayidx5, align 2, !dbg !42
				%inc25 = add nsw i32 %j.04, 1, !dbg !54
				tail call void @llvm.dbg.value(metadata i32 %inc25, i64 0, metadata !56, metadata !21), !dbg !57
				%cmp2 = icmp slt i32 %inc25, 256, !dbg !58
				br i1 %cmp2, label %for.body3, label %for.inc27, !dbg !31, !llvm.loop !59

				for.inc27: ; preds = %for.inc24
				call void @llvm.connex.end.repeat(), !dbg !61
				%inc28 = add nsw i32 %i.05, 1, !dbg !61
				tail call void @llvm.dbg.value(metadata i32 %inc28, i64 0, metadata !19, metadata !21), !dbg !18
				%cmp = icmp slt i32 %inc28, 256, !dbg !63
				call void asm sideeffect ");\0A END_KERNEL(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22);\0A} // END if (connexGlobal->getKernel(...) == NULL)\0A connexGlobal->executeKernel(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22);\0AconnexGlobal->readCorrectReductionResults(C, 65536, 2); \0A\0A// END_OPINCAA_HOST_DEVICE_CODE", ""() #3, !dbg !22
				br i1 %cmp, label %for.cond1.preheader, label %for.end29, !dbg !22, !llvm.loop !64

				for.end29: ; preds = %for.inc27
				ret void, !dbg !66
				}

				; Function Attrs: nounwind readnone
				declare void @llvm.dbg.value(metadata, i64, metadata, metadata) #1

				; Function Attrs: nounwind readonly
				declare <8 x i16> @llvm.masked.gather.v8i16(<8 x i16*>, i32, <8 x i1>, <8 x i16>) #2

				; Function Attrs: nounwind
				declare void @llvm.connex.repeat.x.times(i64) #3

				; Function Attrs: nounwind
				declare void @llvm.connex.end.repeat() #3

				; Function Attrs: nounwind
				declare void @llvm.connex.reduce.v8i16(<8 x i16>) #3

				attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind readonly }
				attributes #3 = { nounwind }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!12, !13}
				!llvm.ident = !{!14}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.9.0 (trunk 274579) (llvm/trunk 274513)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
				!1 = !DIFile(filename: "test.c", directory: "/home/Tests/DawnCC/35_MatMul/SIZE_256/7_CVL8_LLVMnew")
				!2 = !{}
				!3 = !{!4, !10, !11}
				!4 = distinct !DIGlobalVariable(name: "A", scope: !0, file: !1, line: 36, type: !5, isLocal: false, isDefinition: true)
				!5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !6, size: 1048576, align: 16, elements: !8)
				!6 = !DIDerivedType(tag: DW_TAG_typedef, name: "TYPE", file: !1, line: 34, baseType: !7)
				!7 = !DIBasicType(name: "short", size: 16, align: 16, encoding: DW_ATE_signed)
				!8 = !{!9, !9}
				!9 = !DISubrange(count: 256)
				!10 = distinct !DIGlobalVariable(name: "B", scope: !0, file: !1, line: 37, type: !5, isLocal: false, isDefinition: true)
				!11 = distinct !DIGlobalVariable(name: "C", scope: !0, file: !1, line: 38, type: !5, isLocal: false, isDefinition: true)
				!12 = !{i32 2, !"Dwarf Version", i32 4}
				!13 = !{i32 2, !"Debug Info Version", i32 3}
				!14 = !{!"clang version 3.9.0 (trunk 274579) (llvm/trunk 274513)"}
				!15 = distinct !DISubprogram(name: "MatMul_BTransposed", scope: !1, file: !1, line: 40, type: !16, isLocal: false, isDefinition: true, scopeLine: 40, isOptimized: false, unit: !0)
				!16 = !DISubroutineType(types: !17)
				!17 = !{null}
				!18 = !DILocation(line: 41, column: 9, scope: !15)
				!19 = !DILocalVariable(name: "i", scope: !15, file: !1, line: 41, type: !20)
				!20 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
				!21 = !DIExpression()
				!22 = !DILocation(line: 43, column: 5, scope: !23)
				!23 = !DILexicalBlockFile(scope: !24, file: !1, discriminator: 1)
				!24 = distinct !DILexicalBlock(scope: !25, file: !1, line: 43, column: 5)
				!25 = distinct !DILexicalBlock(scope: !15, file: !1, line: 43, column: 5)
				!26 = !DILocation(line: 45, column: 13, scope: !27)
				!27 = distinct !DILexicalBlock(scope: !28, file: !1, line: 44, column: 36)
				!28 = distinct !DILexicalBlock(scope: !29, file: !1, line: 44, column: 9)
				!29 = distinct !DILexicalBlock(scope: !30, file: !1, line: 44, column: 9)
				!30 = distinct !DILexicalBlock(scope: !24, file: !1, line: 43, column: 32)
				!31 = !DILocation(line: 44, column: 9, scope: !32)
				!32 = !DILexicalBlockFile(scope: !28, file: !1, discriminator: 1)
				!33 = !DILocation(line: 45, column: 21, scope: !27)
				!34 = !DILocalVariable(name: "k", scope: !15, file: !1, line: 41, type: !20)
				!35 = !DILocation(line: 41, column: 15, scope: !15)
				!36 = !DILocation(line: 46, column: 13, scope: !37)
				!37 = !DILexicalBlockFile(scope: !38, file: !1, discriminator: 1)
				!38 = distinct !DILexicalBlock(scope: !39, file: !1, line: 46, column: 13)
				!39 = distinct !DILexicalBlock(scope: !27, file: !1, line: 46, column: 13)
				!40 = !DILocation(line: 47, column: 36, scope: !41)
				!41 = distinct !DILexicalBlock(scope: !38, file: !1, line: 46, column: 40)
				!42 = !DILocation(line: 47, column: 25, scope: !41)
				!43 = distinct !{!43, !44, !45, !46}
				!44 = !DILocation(line: 46, column: 13, scope: !27)
				!45 = !{!"llvm.loop.vectorize.width", i32 1}
				!46 = !{!"llvm.loop.interleave.count", i32 1}
				!47 = !DILocation(line: 47, column: 28, scope: !41)
				!48 = !DILocation(line: 47, column: 38, scope: !41)
				!49 = !DILocation(line: 46, column: 35, scope: !50)
				!50 = !DILexicalBlockFile(scope: !38, file: !1, discriminator: 2)
				!51 = !DILocation(line: 46, column: 27, scope: !37)
				!52 = distinct !{!52, !44, !53, !45, !46}
				!53 = !{!"llvm.loop.unroll.runtime.disable"}
				!54 = !DILocation(line: 44, column: 31, scope: !55)
				!55 = !DILexicalBlockFile(scope: !28, file: !1, discriminator: 2)
				!56 = !DILocalVariable(name: "j", scope: !15, file: !1, line: 41, type: !20)
				!57 = !DILocation(line: 41, column: 12, scope: !15)
				!58 = !DILocation(line: 44, column: 23, scope: !32)
				!59 = distinct !{!59, !60}
				!60 = !DILocation(line: 44, column: 9, scope: !30)
				!61 = !DILocation(line: 43, column: 27, scope: !62)
				!62 = !DILexicalBlockFile(scope: !24, file: !1, discriminator: 2)
				!63 = !DILocation(line: 43, column: 19, scope: !23)
				!64 = distinct !{!64, !65}
				!65 = !DILocation(line: 43, column: 5, scope: !15)
				!66 = !DILocation(line: 53, column: 1, scope: !15)
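
For readers who want to check the test above by hand, the following is a minimal scalar sketch of what the IR appears to compute, reconstructed from the loads, stores and debug metadata (arrays A, B, C of 256 x 256 "short", loops i, j, k, with B indexed as B[j][k]). The original test.c is not part of this excerpt, so this is an approximation and not the author's source. The function names, the CEIL_INT_DIV definition, and CONNEX_VL = 8 (inferred from the <8 x i16> vectors and the "CVL8" directory name) are assumptions made for illustration. The two helpers mirror the address arithmetic of %getSizeDiv and %finalIndexValue64.

```c
#include <assert.h>
#include <stdint.h>

#define N 256        /* matrix dimension, from !DISubrange(count: 256) */
#define CONNEX_VL 8  /* ASSUMPTION: lane count, matching <8 x i16> */

/* ASSUMPTION: ceiling integer division, as the macro name suggests. */
#define CEIL_INT_DIV(a, b) (((a) + (b) - 1) / (b))

int16_t A[N][N], B[N][N], C[N][N];

/* Scalar reference (hypothetical name): C[i][j] = sum_k A[i][k] * B[j][k],
   accumulated modulo 2^16 like the i16 mul/add in the IR. */
void matmul_bt_reference(void) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            uint16_t acc = 0;
            for (int k = 0; k < N; k++) {
                /* The product of two int16_t values always fits in int32_t. */
                int32_t prod = (int32_t)A[i][k] * (int32_t)B[j][k];
                /* Unsigned arithmetic wraps, so truncation is well defined. */
                acc = (uint16_t)(acc + (uint32_t)prod);
            }
            /* Conversion back to int16_t is implementation-defined above
               32767; mainstream compilers wrap. */
            C[i][j] = (int16_t)acc;
        }
    }
}

/* Vector-memory lines taken by one N x N i16 array; mirrors
   %getSizeDiv = (131072 bytes / CONNEX_VL) >> 1. */
int lines_per_array(void) {
    return (int)(N * N * sizeof(int16_t)) / CONNEX_VL / 2;
}

/* First vector-memory line of row i; mirrors
   %finalIndexValue64 = (i << 9) / (CONNEX_VL << 1). */
int first_line_of_row(int i) {
    assert(i >= 0 && i < N);
    return (i << 9) / (CONNEX_VL << 1);
}
```

Under these assumptions, lines_per_array() gives 8192, which equals CEIL_INT_DIV(65536, CONNEX_VL), the line offset at which the host code writes array B. first_line_of_row(1) gives 32, that is 256 elements per row divided by 8 lanes.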



Revision Contents

Diff 502377

llvm/include/llvm/IR/CMakeLists.txt

llvm/include/llvm/IR/Intrinsics.td

llvm/include/llvm/IR/IntrinsicsConnex.td

llvm/lib/IR/Function.cpp

llvm/lib/Target/Connex/CMakeLists.txt

llvm/lib/Target/Connex/Connex.h

llvm/lib/Target/Connex/Connex.td

llvm/lib/Target/Connex/ConnexAsmPrinter.cpp

llvm/lib/Target/Connex/ConnexAsmPrinterLoopNests.h

llvm/lib/Target/Connex/ConnexCallingConv.td

llvm/lib/Target/Connex/ConnexConfig.h

llvm/lib/Target/Connex/ConnexFrameLowering.h

llvm/lib/Target/Connex/ConnexFrameLowering.cpp

llvm/lib/Target/Connex/ConnexHazardRecognizer.h

llvm/lib/Target/Connex/ConnexHazardRecognizer.cpp

llvm/lib/Target/Connex/ConnexISelDAGToDAG.cpp

llvm/lib/Target/Connex/ConnexISelLowering.h

llvm/lib/Target/Connex/ConnexISelLowering.cpp

llvm/lib/Target/Connex/ConnexISelMisc.h

llvm/lib/Target/Connex/ConnexInstrFormats.td

llvm/lib/Target/Connex/ConnexInstrInfo.h

llvm/lib/Target/Connex/ConnexInstrInfo.cpp

llvm/lib/Target/Connex/ConnexInstrInfo.td

llvm/lib/Target/Connex/ConnexInstrInfoScalar.td

llvm/lib/Target/Connex/ConnexInstrInfoVec.td

llvm/lib/Target/Connex/ConnexInstrInfoVecVsplat.td

llvm/lib/Target/Connex/ConnexMCInstLower.h

llvm/lib/Target/Connex/ConnexMCInstLower.cpp

llvm/lib/Target/Connex/ConnexRegisterInfo.h

llvm/lib/Target/Connex/ConnexRegisterInfo.cpp

llvm/lib/Target/Connex/ConnexRegisterInfo.td

llvm/lib/Target/Connex/ConnexSchedule.td

llvm/lib/Target/Connex/ConnexSelectionDAGInfo.h

llvm/lib/Target/Connex/ConnexSelectionDAGInfo.cpp

llvm/lib/Target/Connex/ConnexSubtarget.h

llvm/lib/Target/Connex/ConnexSubtarget.cpp

llvm/lib/Target/Connex/ConnexTargetMachine.h

llvm/lib/Target/Connex/ConnexTargetMachine.cpp

llvm/lib/Target/Connex/ConnexTargetTransformInfo.h

llvm/lib/Target/Connex/MCTargetDesc/CMakeLists.txt

llvm/lib/Target/Connex/MCTargetDesc/ConnexAsmBackend.cpp

llvm/lib/Target/Connex/MCTargetDesc/ConnexELFObjectWriter.cpp

llvm/lib/Target/Connex/MCTargetDesc/ConnexInstPrinter.h

llvm/lib/Target/Connex/MCTargetDesc/ConnexInstPrinter.cpp

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCAsmInfo.h

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCCodeEmitter.cpp

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.h

llvm/lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.cpp

llvm/lib/Target/Connex/Misc.h

llvm/lib/Target/Connex/RecoverFromLlvmIR.h

llvm/lib/Target/Connex/Select_ABSi32_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_ADDf16_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_ADDi32_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_DIVf16_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_DIVi16_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_LTf16_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_MULTf16_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_REDf16_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_REDi32_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_SHRAi32_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_SUBf16_OpincaaCodeGen.h

llvm/lib/Target/Connex/Select_SUBi32_OpincaaCodeGen.h

llvm/lib/Target/Connex/TargetInfo/CMakeLists.txt

llvm/lib/Target/Connex/TargetInfo/ConnexTargetInfo.cpp

llvm/test/CodeGen/Connex/MatMulBT-128_i16.ll
