This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

1/1
CMakeLists.txt
-
CODE_OWNERS.TXT
-
include/llvm/
-
llvm/
-
ADT/
1
Triple.h
-
Analysis/
-
RegionInfoImpl.h
-
ScalarEvolution.h
1/1
ScalarEvolutionExpander.h
-
CodeGen/
3
SelectionDAG.h
-
SelectionDAGISel.h
-
SlotIndexes.h
-
IR/
-
Intrinsics.td
-
IntrinsicsConnex.td
-
lib/
-
CodeGen/
1/1
LiveRangeCalc.cpp
1/1
RegAllocGreedy.cpp
-
SelectionDAG/
1
DAGCombiner.cpp
1
SelectionDAG.cpp
1/1
SelectionDAGBuilder.h
-
SelectionDAGISel.cpp
-
Target/
-
Connex/
-
Connex.h
5/10
ConnexAsmPrinter.cpp
1
ConnexAsmPrinterLoopNests.h
1/1
ConnexConfig.h
-
ConnexFrameLowering.h
-
ConnexFrameLowering.cpp
-
ConnexHazardRecognizer.h
-
ConnexHazardRecognizer.cpp
-
ConnexHazardRecognizerPreRAScheduler.h
-
ConnexHazardRecognizerPreRAScheduler.cpp
-
ConnexISelDAGToDAG.cpp
-
ConnexISelLowering.h
-
ConnexISelLowering.cpp
1/1
ConnexInstrInfo.h
-
ConnexInstrInfo.cpp
-
ConnexMCInstLower.h
-
ConnexMCInstLower.cpp
-
ConnexRegisterInfo.h
-
ConnexRegisterInfo.cpp
-
ConnexSelectionDAGInfo.h
3/3
ConnexSelectionDAGInfo.cpp
-
ConnexSubtarget.h
-
ConnexSubtarget.cpp
-
ConnexTargetMachine.h
1/1
ConnexTargetMachine.cpp
1/1
ConnexTargetTransformInfo.h
-
InstPrinter/
-
CMakeLists.txt
-
ConnexInstPrinter.h
-
ConnexInstPrinter.cpp
-
LLVMBuild.txt
-
LLVMBuild.txt
-
MCTargetDesc/
-
CMakeLists.txt
-
ConnexAsmBackend.cpp
-
ConnexELFObjectWriter.cpp
-
ConnexMCAsmInfo.h
-
ConnexMCCodeEmitter.cpp
-
ConnexMCTargetDesc.h
-
ConnexMCTargetDesc.cpp
-
LLVMBuild.txt
-
Misc.h
1/1
RecoverFromLlvmIR.h
-
Select_ADDf16_OpincaaCodeGen.h
1/1
Select_ADDi32_OpincaaCodeGen.h
-
Select_LTf16_OpincaaCodeGen.h
-
Select_MULTf16_OpincaaCodeGen.h
-
Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h
-
Select_REDf16_OpincaaCodeGen.h
-
Select_REDi32_OpincaaCodeGen.h
-
Select_SHRAi32_OpincaaCodeGen.h
-
Select_SUBf16_OpincaaCodeGen.h
1
Select_SUBi32_OpincaaCodeGen.h
-
TargetInfo/
-
CMakeLists.txt
-
ConnexTargetInfo.cpp
-
LLVMBuild.txt
-
LLVMBuild.txt
-
test/CodeGen/Connex/
-
CodeGen/
-
Connex/
-
MatMul-128_i16.ll
-
MatMul-128_i32.ll
-
basictest.ll
-
lit.local.cfg

Differential D60052

Add Connex vector processor back end
Needs Review · Public

Authored by alexsusu on Mar 31 2019, 5:16 PM.

Details

Reviewers

• llvm.org
asb
efriedma
arsenm
luismarques

Summary

Connex is an established, almost 30-year-old, very wide research vector processor (see, for example, http://users.dcae.pub.ro/~gstefan/2ndLevel/connex.html) with between 32 and 4096 lanes; the lane count is easily changed at synthesis time.
A notable feature is that the Connex processor has a local banked vector memory (each lane has its own local memory), which achieves 1-cycle latency for direct and indirect loads and stores, so the memory bandwidth is very high.

The Connex vector processor has 16-bit signed integer execution units in each lane. It efficiently emulates 32-bit integers and IEEE 754-2008 compliant 16-bit floating point (Clang type _Float16, Arm C type __fp16, LLVM IR type half) by inlining the emulation subroutines in the instruction selection pass. The emulation subroutines are in the lib/Target/Connex/Select_*_OpincaaCodeGen.h files, which are included in the ConnexISelDAGToDAG.cpp module, in the ConnexDAGToDAGISel::Select() method. These subroutines can easily be adjusted, for example to increase performance by sacrificing f16 accuracy; drop me an email to ask how to do it. (They currently total almost 1 MB of C++ code.)
The Connex vector processor does not currently support the float, double, or 64-bit integer types.
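
To illustrate the kind of emulation described above, here is a minimal C++ sketch of a 32-bit addition built only from 16-bit operations. It is not taken from Select_ADDi32_OpincaaCodeGen.h; the names Split32, split, join and add32 are hypothetical, and unsigned 16-bit words are assumed (rather than the signed units the hardware has) so that wraparound is well defined in C++.

```cpp
#include <cstdint>

// Hypothetical stand-in: one 32-bit value held as two 16-bit lane words.
struct Split32 {
  uint16_t Lo; // low 16 bits
  uint16_t Hi; // high 16 bits
};

// Break a 32-bit value into its two 16-bit halves.
Split32 split(uint32_t V) {
  return {static_cast<uint16_t>(V & 0xFFFFu), static_cast<uint16_t>(V >> 16)};
}

// Reassemble the 32-bit value from its halves.
uint32_t join(Split32 V) {
  return (static_cast<uint32_t>(V.Hi) << 16) | V.Lo;
}

// Add two 32-bit values using only 16-bit additions plus a carry bit.
Split32 add32(Split32 A, Split32 B) {
  uint16_t Lo = static_cast<uint16_t>(A.Lo + B.Lo);
  // The low sum wrapped around exactly when it is smaller than an operand.
  uint16_t Carry = Lo < A.Lo ? 1 : 0;
  uint16_t Hi = static_cast<uint16_t>(A.Hi + B.Hi + Carry);
  return {Lo, Hi};
}
```

For example, join(add32(split(0x0000FFFF), split(1))) gives 0x00010000: the low halves wrap to 0 and the carry is added into the high halves.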

More precisely, the back end targets a low-power variant of the Connex processor that is used as an accelerator. The working compiler is described at https://sites.google.com/site/alexsusu/myfilecabinet/OpincaaLLVM_TR_UPB.pdf .

Note that our back end currently targets only our Connex Opincaa assembler (very easy to learn and use), available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .
The Connex Opincaa assembler makes it possible to run code that is agnostic to both the Connex vector length and the host CPU.

The ISA of the Connex vector processor is available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/blob/master/ConnexISA.pdf .
The Connex vector processor also has an open-source C++ simulator, also available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .

The mailing list for the Connex processor and tools is: https://groups.google.com/forum/#!forum/connex-tools .

An interesting feature is that, in order to support recovering from the instruction selection pass's SelectionDAG back to the original source (C) code, we require adding a simple data structure in include/llvm/CodeGen/SelectionDAG.h (and helper methods in related files) that records, for each LLVM IR Value object, the SDValue it was translated to:
   DenseMap<const Value*, SDValue> *crtNodeMapPtr
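
As a rough illustration of what such a map does, the following self-contained sketch uses stand-in types instead of the LLVM classes. IRValue, DagNode, DagValue and NodeMap are hypothetical names; the update method only mirrors the spirit of the UpdateNodeMapSDValue helper named in the patch, whose actual behavior is not shown on this page.

```cpp
#include <unordered_map>

// Hypothetical stand-ins for llvm::Value, llvm::SDNode and llvm::SDValue.
struct IRValue { int Id; };
struct DagNode { int Opcode; };
struct DagValue { DagNode *Node; unsigned ResNo; };

class NodeMap {
  // Keyed by the IR value, like DenseMap<const Value*, SDValue>.
  std::unordered_map<const IRValue *, DagValue> Map;

public:
  // Remember which DAG value an IR value was lowered to.
  void record(const IRValue *V, DagValue SDV) { Map[V] = SDV; }

  // When a DAG node is replaced, redirect every entry that pointed at it.
  void update(const DagNode *OldNode, DagValue NewSDV) {
    for (auto &Entry : Map)
      if (Entry.second.Node == OldNode)
        Entry.second = NewSDV;
  }

  // Reverse lookup: the IR value a DAG node came from, or nullptr.
  // This is a linear scan because the map is keyed by the IR value.
  const IRValue *lookupSource(const DagNode *N) const {
    for (const auto &Entry : Map)
      if (Entry.second.Node == N)
        return Entry.first;
    return nullptr;
  }
};
```

The sketch also shows the maintenance cost of this design: every node replacement during combining or legalization has to go through update, otherwise the map silently goes stale.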

The Connex back end is 3 years old. We published 1 academic paper on it at a CGO workshop: https://dl.acm.org/citation.cfm?id=3306166 . However, we are still adding features to the back end.

Small note: the Connex back end is rather small and builds fast (in ~3-5 minutes, single-threaded, on a decent machine). In Apr 2019 the built objects total 71,168K, while the smallest LLVM back end, MSP430, has 63,387K and the biggest ones are X86 with 359,736K and AMDGPU with 488,309K.

An important point: I think the test/MC/Connex folder should not be populated for this patch, because the Connex back end can only generate assembly code that must be consumed by the special Opincaa assembler, which is not integrated in LLVM. Other back ends do a similar thing; for example, the NVPTX back end doesn't support object file generation. The Connex back end doesn't support object file generation either.
The eBPF+ConnexS processor has the same ABI as the eBPF processor it extends, except that Connex-S natively supports only 16-bit integers and can access the banked vector memory only by line (so Connex-S can't perform unaligned accesses).

The Connex processor is currently implemented in FPGA, but has also been implemented in silicon:
    - an older version for HDTV: Gheorghe M. Stefan, "The CA1024: A Massively Parallel Processor for Cost-Effective HDTV", 2006 (http://users.dcae.pub.ro/~gstefan/2ndLevel/images/connex_v4.ppt)
    - M. Malita and Gheorghe M. Stefan, "Map-scan Node Accelerator for Big-data"
    - Gheorghe M. Stefan and Mihaela Malita, "Can One-Chip Parallel Computing Be Liberated From Ad Hoc Solutions? A Computation Model Based Approach and Its Implementation"

Diff Detail

Repository: rL LLVM

Event Timeline

alexsusu created this revision. · Mar 31 2019, 5:16 PM

Herald added subscribers: llvm-commits, jdoerfert, kristina and 6 others. · Mar 31 2019, 5:16 PM

alexsusu edited the summary of this revision. · Mar 31 2019, 6:17 PM

Added 2 more files. I still have to add about 50 more files.

A few corrections to ConnexInstrInfoVec.td .

Herald added a subscriber: sanjoy. · Apr 12 2019, 3:40 PM

alexsusu edited the summary of this revision. · Apr 12 2019, 7:38 PM

alexsusu edited the summary of this revision.

alexsusu edited the summary of this revision. · Apr 12 2019, 7:53 PM

alexsusu edited the summary of this revision. · Apr 12 2019, 8:08 PM

More refactoring on the .td TableGen files. Added also more source files.

Added all source files for the Connex back end.
Followed coding standards from https://llvm.org/docs/CodingStandards.html .

Herald added subscribers: javed.absar, kristof.beyls. · Apr 15 2019, 1:49 PM

Added myself as maintainer of the Connex backend in CODE_OWNERS.TXT.

alexsusu edited the summary of this revision. · Apr 15 2019, 5:08 PM

alexsusu added reviewers: jpienaar, asb.

Herald added a subscriber: tpr. · Apr 15 2019, 5:08 PM

Some initial feedback:

- This patch is pretty huge which makes it pretty hard to meaningfully review
- There seem to be effectively no tests. I'd expect test/CodeGen/Connex and test/MC/Connex to be reasonably well populated
- Plenty of code commented out or with date-based comments that don't match our style, e.g. // 2018_*
- Files have the old license header rather than the new one

Made a first round of corrections following Alex Bradbury's review.

In D60052#1471560, @asb wrote:

> Some initial feedback:
>
> - This patch is pretty huge which makes it pretty hard to meaningfully review
> - There seem to be effectively no tests. I'd expect test/CodeGen/Connex and test/MC/Connex to be reasonably well populated
> - Plenty of code commented out or with date-based comments that don't match our style, e.g. // 2018_*
> - Files have the old license header rather than the new one

Hi, Alex.

I addressed a few of your concerns and I'm working on the others.

An important point: I think test/MC/Connex should not be populated for this patch, because the Connex back end can only generate assembly code that must be consumed by the special Opincaa assembler, which is not integrated in LLVM. Other back ends do a similar thing; for example, the NVPTX back end doesn't support object file generation. The Connex back end doesn't support object file generation either.

Thank you,
  Alex Susu

Added all required files for the back end (a few were missing).

Herald added a subscriber: ormris. · Apr 22 2019, 6:10 PM

ormris removed a subscriber: ormris. · Apr 23 2019, 9:36 AM

Just quickly reviewing the target-independent changes.

In general, commented-out code should be cleaned up.

Could you explain the need for crtNodeMapPtr a bit more? In general, IR instructions should be lowered to SelectionDAG nodes in a way that doesn't require referring back to the original Instruction afterwards.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
9092 ↗	(On Diff #196163)	If you have operations with multiple results you need to custom-legalize, your backend should just override LowerOperationWrapper.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
42	Stray include?

There's a lot of clutter that makes this hard to review. All of the extra debug, special ifdef blocks, and commented out code should be removed.

This should also be split into smaller patches. For example, the triple patches, then adding the basic target machine definition, and then MC parts before moving on to codegen and optimizations

include/llvm/ADT/Triple.h
56	The triple patches are usually committed as a first, separate patch
include/llvm/CodeGen/SelectionDAG.h
273–277	This probably should be dropped
1226–1231	This should be a separate patch (or you could just use the ArrayRef version)
lib/Target/Connex/ConnexAsmPrinter.cpp
53	This should be after all includes
54–59	Sort includes
61–64	You should not need this, also no global
165	isInlineAsm() (also for others)
226–227	A large number of the debug statements seem like they're just noise for committing
235	No c string functions
958	You shouldn't be looking at the .data() of a StringRef as if it were a C string like this; just use the StringRef directly.
lib/Target/Connex/ConnexAsmPrinterLoopNests.h
2–3	I don't really understand anything that's going on in this file. You shouldn't have file IO, or globals
lib/Target/Connex/ConnexConfig.h
4	These macros mostly seem like a waste of effort
lib/Target/Connex/ConnexHazardRecognizers.cpp
304 ↗	(On Diff #196163)	Fewer TODOs
308 ↗	(On Diff #196163)	No random special ifdef blocks
lib/Target/Connex/ConnexInstrInfo.h
53	No macro for this
lib/Target/Connex/ConnexSelectionDAGInfo.cpp
114	static const
176	static const
202	More junk macros
lib/Target/Connex/ConnexTargetMachine.cpp
899	This needs to be broken down into smaller functions
lib/Target/Connex/ConnexTargetTransformInfo.h
118	Noisy debug
lib/Target/Connex/RecoverFromLlvmIR.h
2–3	I don't know what this file is trying to accomplish, but it is a separate patch from the backend
lib/Target/Connex/Select_ADDi32_OpincaaCodeGen.h
10–19	There shouldn't be any generated code. Generated selection should come from table gen, with some manual code in *ISelDAGToDAG
lib/Target/Connex/TargetInfo/Makefile
1 ↗	(On Diff #196163)	I thought Makefiles were all gone?
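
The StringRef comment on ConnexAsmPrinter.cpp (line 958) can be illustrated with a small self-contained sketch. It uses std::string_view as a stand-in for llvm::StringRef, since both are non-owning views that are not guaranteed to be NUL-terminated. The function names and the "vload" string are hypothetical and not taken from the patch.

```cpp
#include <cstring>
#include <string_view>

// Problematic pattern: treats the view's data pointer as a C string.
// strcmp keeps reading until the first NUL in the underlying buffer,
// so it ignores the length stored in the view.
bool isVectorLoadCStyle(std::string_view Name) {
  return std::strcmp(Name.data(), "vload") == 0;
}

// Preferred pattern: compare the view directly; its length is respected.
bool isVectorLoad(std::string_view Name) {
  return Name == "vload";
}
```

With a view of the first five characters of "vload.connex", isVectorLoad returns true, while isVectorLoadCStyle returns false because strcmp sees the whole underlying string. If the underlying buffer had no terminating NUL at all, the C-style version would read out of bounds.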

alexsusu marked 2 inline comments as done. · Apr 25 2019, 4:07 PM

This comment was removed by alexsusu.

luismarques added a subscriber: luismarques. · Apr 26 2019, 3:43 AM

luismarques added inline comments.

CMakeLists.txt
324	New targets are generally considered experimental and not added to the default build list. You would generally build them by adding them to the cmake definition LLVM_EXPERIMENTAL_TARGETS_TO_BUILD.

Addressed most reviews of efriedma, arsenm, luismarques (and of asb).

Herald added a subscriber: wdng. · May 3 2019, 3:14 PM

Added relevant tests in test/CodeGen/Connex.

Herald added subscribers: arphaman, qcolombet, MatzeB. · May 6 2019, 3:52 PM

This still needs a lot of work before it will be in a committable state. All of the parts touching generic code need to be reviewed and committed separately at an absolute minimum. The backend also could use some breaking up.

In particular the Value*/SDNode map doesn't in general make sense. What is it used for? It's certainly not the correct solution to whatever problem you're solving with it.

include/llvm/Analysis/ScalarEvolutionExpander.h
264–272	SCEV changes need to be split to separate patches
include/llvm/CodeGen/SelectionDAG.h
273–277	This still needs to be dropped
lib/CodeGen/LiveRangeCalc.cpp
25–38	This needs to be removed
lib/CodeGen/RegAllocGreedy.cpp
1223–1224	Needs to be removed
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1472–1473	Needs to be removed
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
85	Ditto
lib/Target/Connex/ConnexAsmPrinter.cpp
79–86	These various places linking to other documentation should be removed. Doxygen generates the appropriate links
104–117	Remove commented out code
780–782	More macros to remove
lib/Target/Connex/Select_SUBi32_OpincaaCodeGen.h
1–6	This file needs to be dropped. There should be no committed, generated code. You should have table gen emit this, or manually write the code in ISelDAGToDAG

Feel free to re-add me on a new revision

luismarques resigned from this revision. · Feb 1 2021, 7:17 AM

Herald added a subscriber: pengfei. · Feb 1 2021, 7:17 AM

Revision Contents

Path	Size
CMakeLists.txt	1 line
CODE_OWNERS.TXT	4 lines
include/
  llvm/
    ADT/
      Triple.h	1 line
    Analysis/
      RegionInfoImpl.h	2 lines
      ScalarEvolution.h	4 lines
      ScalarEvolutionExpander.h	35 lines
    CodeGen/
      SelectionDAG.h	12 lines
      SelectionDAGISel.h	1 line
      SlotIndexes.h	31 lines
    IR/
      Intrinsics.td	1 line
      IntrinsicsConnex.td	106 lines
lib/
  CodeGen/
    LiveRangeCalc.cpp	24 lines
    RegAllocGreedy.cpp	3 lines
    SelectionDAG/
      DAGCombiner.cpp	3 lines
      SelectionDAG.cpp	45 lines
      SelectionDAGBuilder.h	6 lines
      SelectionDAGISel.cpp	6 lines
  Target/
    Connex/
      Connex.h	35 lines
      ConnexAsmPrinter.cpp	1271 lines
      ConnexAsmPrinterLoopNests.h	126 lines
      ConnexConfig.h	78 lines
      ConnexFrameLowering.h	41 lines
      ConnexFrameLowering.cpp	39 lines
      ConnexHazardRecognizer.h	79 lines
      ConnexHazardRecognizer.cpp	471 lines
      ConnexHazardRecognizerPreRAScheduler.h	70 lines
      ConnexHazardRecognizerPreRAScheduler.cpp	337 lines
      ConnexISelDAGToDAG.cpp	5094 lines
      ConnexISelLowering.h	213 lines
      ConnexISelLowering.cpp	3561 lines
      ConnexInstrInfo.h	96 lines
      ConnexInstrInfo.cpp	939 lines
      ConnexMCInstLower.h	42 lines
      ConnexMCInstLower.cpp	116 lines
      ConnexRegisterInfo.h	76 lines
      ConnexRegisterInfo.cpp	152 lines
      ConnexSelectionDAGInfo.h	74 lines
      ConnexSelectionDAGInfo.cpp	131 lines
      ConnexSubtarget.h	70 lines
      ConnexSubtarget.cpp	30 lines
      ConnexTargetMachine.h	51 lines
      ConnexTargetMachine.cpp	1580 lines
      ConnexTargetTransformInfo.h	132 lines
      InstPrinter/
        CMakeLists.txt	3 lines
        ConnexInstPrinter.h	65 lines
        ConnexInstPrinter.cpp	527 lines
        LLVMBuild.txt	23 lines
      LLVMBuild.txt	42 lines
      MCTargetDesc/
        CMakeLists.txt	6 lines
        ConnexAsmBackend.cpp	138 lines
        ConnexELFObjectWriter.cpp	84 lines
        ConnexMCAsmInfo.h	50 lines
        ConnexMCCodeEmitter.cpp	177 lines
        ConnexMCTargetDesc.h	64 lines
        ConnexMCTargetDesc.cpp	109 lines
        LLVMBuild.txt	22 lines
      Misc.h	78 lines
      RecoverFromLlvmIR.h	2022 lines
      Select_ADDf16_OpincaaCodeGen.h	3633 lines
      Select_ADDi32_OpincaaCodeGen.h	213 lines
      Select_LTf16_OpincaaCodeGen.h	705 lines
      Select_MULTf16_OpincaaCodeGen.h	3266 lines
      Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h	354 lines
      Select_REDf16_OpincaaCodeGen.h	1562 lines
      Select_REDi32_OpincaaCodeGen.h	191 lines
      Select_SHRAi32_OpincaaCodeGen.h	464 lines
      Select_SUBf16_OpincaaCodeGen.h	3651 lines
      Select_SUBi32_OpincaaCodeGen.h	212 lines
      TargetInfo/
        CMakeLists.txt	3 lines
        ConnexTargetInfo.cpp	23 lines
        LLVMBuild.txt	22 lines
    LLVMBuild.txt	1 line
test/
  CodeGen/
    Connex/
      MatMul-128_i16.ll	224 lines
      MatMul-128_i32.ll	221 lines
      basictest.ll	28 lines
      lit.local.cfg	2 lines

Diff 198358

CMakeLists.txt

[...]
  set(LLVM_INCLUDE_DIR ${CMAKE_CURRENT_BINARY_DIR}/include)

  # List of all targets to be built by default:
  set(LLVM_ALL_TARGETS
  AArch64
  AMDGPU
  ARM
  BPF
+ Connex
  [inline comment] luismarques (Done): New targets are generally considered experimental and not added to the default build list. You would generally build them by adding them to the cmake definition LLVM_EXPERIMENTAL_TARGETS_TO_BUILD.
  Hexagon
  Lanai
  Mips
  MSP430
  NVPTX
  PowerPC
  Sparc
  SystemZ
[...]

CODE_OWNERS.TXT

[...]
  N: Michael Spencer
  E: bigcheesegs@gmail.com
  D: Windows parts of Support, Object, ar, nm, objdump, ranlib, size

  N: Alexei Starovoitov
  E: alexei.starovoitov@gmail.com
  D: BPF backend

+ N: Alex Susu
+ E: alex.susu@gmail.com
+ D: Connex wide vector processor backend
+
  N: Tom Stellard
  E: tstellar@redhat.com
  D: Stable release management (x.y.[1-9] releases), AMDGPU Backend, libclc

  N: Evgeniy Stepanov
  E: eugenis@google.com
  D: MemorySanitizer (LLVM part)

[...]

include/llvm/ADT/Triple.h

[...] enum ArchType {
  arm, // ARM (little endian): arm, armv.*, xscale
  armeb, // ARM (big endian): armeb
  aarch64, // AArch64 (little endian): aarch64
  aarch64_be, // AArch64 (big endian): aarch64_be
  arc, // ARC: Synopsys ARC
  avr, // AVR: Atmel AVR microcontroller
  bpfel, // eBPF or extended BPF or 64-bit BPF (little endian)
  bpfeb, // eBPF or extended BPF or 64-bit BPF (big endian)
+ connex, // Connex vector processor
  [inline comment] arsenm (Not Done): The triple patches are usually committed as a first, separate patch
  hexagon, // Hexagon: hexagon
  mips, // MIPS: mips, mipsallegrex, mipsr6
  mipsel, // MIPSEL: mipsel, mipsallegrexe, mipsr6el
  mips64, // MIPS64: mips64, mips64r6, mipsn32, mipsn32r6
  mips64el, // MIPS64EL: mips64el, mips64r6el, mipsn32el, mipsn32r6el
  msp430, // MSP430: msp430
  ppc, // PPC: powerpc
  ppc64, // PPC64: powerpc64, ppu
[...]

include/llvm/Analysis/RegionInfoImpl.h

[...]

  template <class Tr>
  void RegionBase<Tr>::replaceEntry(BlockT *BB) {
  this->entry.setPointer(BB);
  }

  template <class Tr>
  void RegionBase<Tr>::replaceExit(BlockT *BB) {
- assert(exit && "No exit to replace!");
+ //assert(exit && "No exit to replace!");
  exit = BB;
  }

  template <class Tr>
  void RegionBase<Tr>::replaceEntryRecursive(BlockT *NewEntry) {
  std::vector<RegionT *> RegionQueue;
  BlockT *OldEntry = getEntry();

[...]

include/llvm/Analysis/ScalarEvolution.h

[...] public:
  /// Return true if the backedge taken count is either the value returned by
  /// getMaxBackedgeTakenCount or zero.
  bool isBackedgeTakenCountMaxOrZero(const Loop *L);

  /// Return true if the specified loop has an analyzable loop-invariant
  /// backedge-taken count.
  bool hasLoopInvariantBackedgeTakenCount(const Loop *L);

+ /// Determines if the terminator of a given exiting block consistently
+ /// controls the exit count of a loop.
+ static bool hasConsistentTerminator(const Loop *L, BasicBlock *ExitingBlock);
+
  /// This method should be called by the client when it has changed a loop in
  /// a way that may effect ScalarEvolution's ability to compute a trip count,
  /// or if the loop is deleted. This call is potentially expensive for large
  /// loop bodies.
  void forgetLoop(const Loop *L);

  // This method invokes forgetLoop for the outermost loop of the given loop
  // \p L, making ScalarEvolution forget about all this subtree. This needs to
[...]

include/llvm/Analysis/ScalarEvolutionExpander.h

Show First 20 Lines • Show All 81 Lines • ▼ Show 20 Lines	class SCEVExpander : public SCEVVisitor<SCEVExpander, Value*> {
/// variable. When false, expression are expanded in a more literal form.		/// variable. When false, expression are expanded in a more literal form.
bool CanonicalMode;		bool CanonicalMode;

/// When invoked from LSR, the expander is in "strength reduction" mode. The		/// When invoked from LSR, the expander is in "strength reduction" mode. The
/// only difference is that phi's are only reused if they are already in		/// only difference is that phi's are only reused if they are already in
/// "expanded" form.		/// "expanded" form.
bool LSRMode;		bool LSRMode;

typedef IRBuilder<TargetFolder> BuilderType;
BuilderType Builder;

// RAII object that stores the current insertion point and restores it when		// RAII object that stores the current insertion point and restores it when
// the object is destroyed. This includes the debug location. Duplicated		// the object is destroyed. This includes the debug location. Duplicated
// from InsertPointGuard to add SetInsertPoint() which is used to updated		// from InsertPointGuard to add SetInsertPoint() which is used to updated
// InsertPointGuards stack when insert points are moved during SCEV		// InsertPointGuards stack when insert points are moved during SCEV
// expansion.		// expansion.
class SCEVInsertPointGuard {		class SCEVInsertPointGuard {
IRBuilderBase &Builder;		IRBuilderBase &Builder;
AssertingVH<BasicBlock> Block;		AssertingVH<BasicBlock> Block;
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	explicit SCEVExpander(ScalarEvolution &se, const DataLayout &DL,
: SE(se), DL(DL), IVName(name), IVIncInsertLoop(nullptr),		: SE(se), DL(DL), IVName(name), IVIncInsertLoop(nullptr),
IVIncInsertPos(nullptr), CanonicalMode(true), LSRMode(false),		IVIncInsertPos(nullptr), CanonicalMode(true), LSRMode(false),
Builder(se.getContext(), TargetFolder(DL)) {		Builder(se.getContext(), TargetFolder(DL)) {
#ifndef NDEBUG		#ifndef NDEBUG
DebugType = "";		DebugType = "";
#endif		#endif
}		}

~SCEVExpander() {		virtual ~SCEVExpander() {
// Make sure the insert point guard stack is consistent.		// Make sure the insert point guard stack is consistent.
assert(InsertPointGuards.empty());		assert(InsertPointGuards.empty());
}		}

#ifndef NDEBUG		#ifndef NDEBUG
void setDebugType(const char* s) { DebugType = s; }		void setDebugType(const char* s) { DebugType = s; }
#endif		#endif

▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	#endif

/// Disable the behavior of expanding expressions in canonical form rather		/// Disable the behavior of expanding expressions in canonical form rather
/// than in a more literal form. Non-canonical mode is useful for late		/// than in a more literal form. Non-canonical mode is useful for late
/// optimization passes.		/// optimization passes.
void disableCanonicalMode() { CanonicalMode = false; }		void disableCanonicalMode() { CanonicalMode = false; }

void enableLSRMode() { LSRMode = true; }		void enableLSRMode() { LSRMode = true; }

		Instruction *getInsertPoint() {
		return &(*Builder.GetInsertPoint());
		}

		void setInsertPoint(BasicBlock *bb, BasicBlock::iterator ip) {
		Builder.SetInsertPoint(bb, ip);
		}

/// Set the current insertion point. This is useful if multiple calls to		/// Set the current insertion point. This is useful if multiple calls to
		arsenmUnsubmitted Not Done Reply Inline Actions SCEV changes need to be split to separate patches arsenm: SCEV changes need to be split to separate patches
/// expandCodeFor() are going to be made with the same insert point and the		/// expandCodeFor() are going to be made with the same insert point and the
/// insert point may be moved during one of the expansions (e.g. if the		/// insert point may be moved during one of the expansions (e.g. if the
/// insert point is not a block terminator).		/// insert point is not a block terminator).
void setInsertPoint(Instruction *IP) {		void setInsertPoint(Instruction *IP) {
assert(IP);		assert(IP);
Builder.SetInsertPoint(IP);		Builder.SetInsertPoint(IP);
}		}

Show All 28 Lines	#endif
/// didn't find any value it does not mean that there is no such value.		/// didn't find any value it does not mean that there is no such value.
///		///
Optional<ScalarEvolution::ValueOffsetPair>		Optional<ScalarEvolution::ValueOffsetPair>
getRelatedExistingExpansion(const SCEV S, const Instruction At, Loop *L);		getRelatedExistingExpansion(const SCEV S, const Instruction At, Loop *L);

private:		private:
LLVMContext &getContext() const { return SE.getContext(); }		LLVMContext &getContext() const { return SE.getContext(); }

		Value expand(const SCEV S);

		/// Determine the most "relevant" loop for the given SCEV.
		const Loop getRelevantLoop(const SCEV );

		protected:
		typedef IRBuilder<TargetFolder> BuilderType;
		BuilderType Builder;

		Value* getSavedExpression(const SCEV S, Instruction InsertPt);

		void rememberExpression(const SCEV S, Instruction InsertPt,
		Value *V);

/// Recursive helper function for isHighCostExpansion.		/// Recursive helper function for isHighCostExpansion.
bool isHighCostExpansionHelper(const SCEV S, Loop L,		bool isHighCostExpansionHelper(const SCEV S, Loop L,
const Instruction *At,		const Instruction *At,
SmallPtrSetImpl<const SCEV *> &Processed);		SmallPtrSetImpl<const SCEV *> &Processed);

/// Insert the specified binary operator, doing a small amount of work to		/// Insert the specified binary operator, doing a small amount of work to
/// avoid inserting an obviously redundant operation.		/// avoid inserting an obviously redundant operation.
Value InsertBinop(Instruction::BinaryOps Opcode, Value LHS, Value *RHS);		Value InsertBinop(Instruction::BinaryOps Opcode, Value LHS, Value *RHS);

		Value InsertCast(Instruction::CastOps Op, Value V, Type *DestTy);
		Value InsertICmp(CmpInst::Predicate P, Value LHS, Value *RHS);
		Value InsertSelect(Value C, Value True, Value False, const Twine &Name = "");

/// Arrange for there to be a cast of V to Ty at IP, reusing an existing		/// Arrange for there to be a cast of V to Ty at IP, reusing an existing
/// cast if a suitable one exists, moving an existing cast if a suitable one		/// cast if a suitable one exists, moving an existing cast if a suitable one
/// exists but isn't in the right place, or creating a new one.		/// exists but isn't in the right place, or creating a new one.
Value ReuseOrCreateCast(Value V, Type *Ty,		Value ReuseOrCreateCast(Value V, Type *Ty,
Instruction::CastOps Op,		Instruction::CastOps Op,
BasicBlock::iterator IP);		BasicBlock::iterator IP);

/// Insert a cast of V to the specified type, which must be possible with a		/// Insert a cast of V to the specified type, which must be possible with a
/// noop cast, doing what we can to share the casts.		/// noop cast, doing what we can to share the casts.
Value InsertNoopCastOfTo(Value V, Type *Ty);		Value InsertNoopCastOfTo(Value V, Type *Ty);

/// Expand a SCEVAddExpr with a pointer type into a GEP instead of using		/// Expand a SCEVAddExpr with a pointer type into a GEP instead of using
/// ptrtoint+arithmetic+inttoptr.		/// ptrtoint+arithmetic+inttoptr.
Value expandAddToGEP(const SCEV const *op_begin,		Value expandAddToGEP(const SCEV const *op_begin,
const SCEV const op_end,		const SCEV const op_end,
PointerType PTy, Type Ty, Value *V);		PointerType PTy, Type Ty, Value *V);
Value expandAddToGEP(const SCEV Op, PointerType PTy, Type Ty, Value *V);		Value expandAddToGEP(const SCEV Op, PointerType PTy, Type Ty, Value *V);

/// Find a previous Value in ExprValueMap for expand.		/// Find a previous Value in ExprValueMap for expand.
ScalarEvolution::ValueOffsetPair		ScalarEvolution::ValueOffsetPair
FindValueInExprValueMap(const SCEV S, const Instruction InsertPt);		FindValueInExprValueMap(const SCEV S, const Instruction InsertPt);

Value expand(const SCEV S);		//Value expand(const SCEV S);

/// Determine the most "relevant" loop for the given SCEV.		/// Determine the most "relevant" loop for the given SCEV.
const Loop getRelevantLoop(const SCEV );		//const Loop getRelevantLoop(const SCEV );

Value visitConstant(const SCEVConstant S) {		Value visitConstant(const SCEVConstant S) {
return S->getValue();		return S->getValue();
}		}

Value *visitTruncateExpr(const SCEVTruncateExpr *S);		Value *visitTruncateExpr(const SCEVTruncateExpr *S);

Value *visitZeroExtendExpr(const SCEVZeroExtendExpr *S);		Value *visitZeroExtendExpr(const SCEVZeroExtendExpr *S);

include/llvm/CodeGen/SelectionDAG.h

Show First 20 Lines • Show All 264 Lines • ▼ Show 20 Lines	class SelectionDAG {
BumpPtrAllocator Allocator;		BumpPtrAllocator Allocator;

/// Tracks dbg_value and dbg_label information through SDISel.		/// Tracks dbg_value and dbg_label information through SDISel.
SDDbgInfo *DbgInfo;		SDDbgInfo *DbgInfo;

uint16_t NextPersistentId = 0;		uint16_t NextPersistentId = 0;

public:		public:
		DenseMap<const Value *, SDValue> *crtNodeMapPtr;

		void SetNodeMap(DenseMap<const Value *, SDValue> *aCrtNodeMapPtr);

		void UpdateNodeMapSDValue(SDNode *oldSDN, SDValue &newSDV);
		arsenm (Not Done): This probably should be dropped
		arsenm (Not Done): This still needs to be dropped

/// Clients of various APIs that cause global effects on		/// Clients of various APIs that cause global effects on
/// the DAG can optionally implement this interface. This allows the clients		/// the DAG can optionally implement this interface. This allows the clients
/// to handle the various sorts of updates that happen.		/// to handle the various sorts of updates that happen.
///		///
/// A DAGUpdateListener automatically registers itself with DAG when it is		/// A DAGUpdateListener automatically registers itself with DAG when it is
/// constructed, and removes itself when destroyed in RAII fashion.		/// constructed, and removes itself when destroyed in RAII fashion.
struct DAGUpdateListener {		struct DAGUpdateListener {
DAGUpdateListener *const Next;		DAGUpdateListener *const Next;
▲ Show 20 Lines • Show All 931 Lines • ▼ Show 20 Lines	#endif
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT);		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,
SDValue Op1);		SDValue Op1);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,
SDValue Op1, SDValue Op2);		SDValue Op1, SDValue Op2);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,
SDValue Op1, SDValue Op2, SDValue Op3);		SDValue Op1, SDValue Op2, SDValue Op3);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,
		SDValue Op1, SDValue Op2,
		SDValue Op3, SDValue Op4);
		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,
		EVT VT2, SDValue Op1, SDValue Op2,
		SDValue Op3, SDValue Op4);
		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT,
		arsenm (Not Done): This should be a separate patch (or you could just use the ArrayRef version)
ArrayRef<SDValue> Ops);		ArrayRef<SDValue> Ops);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,
EVT VT2, SDValue Op1, SDValue Op2);		EVT VT2, SDValue Op1, SDValue Op2);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,
EVT VT2, SDValue Op1, SDValue Op2, SDValue Op3);		EVT VT2, SDValue Op1, SDValue Op2, SDValue Op3);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,
EVT VT2, ArrayRef<SDValue> Ops);		EVT VT2, ArrayRef<SDValue> Ops);
MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,		MachineSDNode *getMachineNode(unsigned Opcode, const SDLoc &dl, EVT VT1,
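Regarding the inline comment on the added four-operand getMachineNode overloads: the sketch below illustrates, under stated assumptions, why one array-reference overload can replace several fixed-arity ones. ArrayRefSketch, Operand, countOperands and sumOperandIds are hypothetical stand-ins written only for this illustration; they are not LLVM classes, and the real llvm::ArrayRef offers more constructors than shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of an array-reference type: a non-owning
// (pointer, length) view that converts implicitly from C arrays and vectors.
template <typename T> class ArrayRefSketch {
  const T *Data = nullptr;
  std::size_t Len = 0;

public:
  template <std::size_t N>
  ArrayRefSketch(const T (&Arr)[N]) : Data(Arr), Len(N) {}
  ArrayRefSketch(const std::vector<T> &V) : Data(V.data()), Len(V.size()) {}

  std::size_t size() const { return Len; }
  const T &operator[](std::size_t I) const { return Data[I]; }
};

// Hypothetical stand-in for an operand such as SDValue.
struct Operand {
  int Id;
};

// A single function serves callers with any number of operands.
inline std::size_t countOperands(ArrayRefSketch<Operand> Ops) {
  return Ops.size();
}

inline int sumOperandIds(ArrayRefSketch<Operand> Ops) {
  int Sum = 0;
  for (std::size_t I = 0; I < Ops.size(); ++I)
    Sum += Ops[I].Id;
  return Sum;
}
```

In the patch itself this would presumably mean building `SDValue Ops[] = {Op1, Op2, Op3, Op4};` at the call site and passing it to the existing `ArrayRef<SDValue>` declaration of getMachineNode shown above, rather than adding new overloads.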

include/llvm/CodeGen/SelectionDAGISel.h

Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	public:
SelectionDAGBuilder *SDB;		SelectionDAGBuilder *SDB;
AliasAnalysis *AA;		AliasAnalysis *AA;
GCFunctionInfo *GFI;		GCFunctionInfo *GFI;
CodeGenOpt::Level OptLevel;		CodeGenOpt::Level OptLevel;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetLowering *TLI;		const TargetLowering *TLI;
bool FastISelFailed;		bool FastISelFailed;
SmallPtrSet<const Instruction *, 4> ElidedArgCopyInstrs;		SmallPtrSet<const Instruction *, 4> ElidedArgCopyInstrs;
		DenseMap<const Value*, SDValue> crtNodeMap;

/// Current optimization remark emitter.		/// Current optimization remark emitter.
/// Used to report things like combines and FastISel failures.		/// Used to report things like combines and FastISel failures.
std::unique_ptr<OptimizationRemarkEmitter> ORE;		std::unique_ptr<OptimizationRemarkEmitter> ORE;

static char ID;		static char ID;

explicit SelectionDAGISel(TargetMachine &tm,		explicit SelectionDAGISel(TargetMachine &tm,

include/llvm/CodeGen/SlotIndexes.h

Show First 20 Lines • Show All 403 Lines • ▼ Show 20 Lines	public:
/// Returns true if the given machine instr is mapped to an index,		/// Returns true if the given machine instr is mapped to an index,
/// otherwise returns false.		/// otherwise returns false.
bool hasIndex(const MachineInstr &instr) const {		bool hasIndex(const MachineInstr &instr) const {
return mi2iMap.count(&instr);		return mi2iMap.count(&instr);
}		}

/// Returns the base index for the given instruction.		/// Returns the base index for the given instruction.
SlotIndex getInstructionIndex(const MachineInstr &MI) const {		SlotIndex getInstructionIndex(const MachineInstr &MI) const {
		// Alex: new code
		#ifndef DEBUG_TYPE
		#define DEBUG_TYPE "slotindexes_h"
		LLVM_DEBUG(dbgs() << "SlotIndex::getInstructionIndex(): MI = "
		<< MI
		<< ", MI.getOpcode() = " << MI.getOpcode()
		<< "\n");
		#undef DEBUG_TYPE
		#endif
		// Alex: END new code
// Instructions inside a bundle have the same number as the bundle itself.		// Instructions inside a bundle have the same number as the bundle itself.
auto BundleStart = getBundleStart(MI.getIterator());		auto BundleStart = getBundleStart(MI.getIterator());
auto BundleEnd = getBundleEnd(MI.getIterator());		auto BundleEnd = getBundleEnd(MI.getIterator());
// Use the first non-debug instruction in the bundle to get SlotIndex.		// Use the first non-debug instruction in the bundle to get SlotIndex.
const MachineInstr &BundleNonDebug =		const MachineInstr &BundleNonDebug =
*skipDebugInstructionsForward(BundleStart, BundleEnd);		*skipDebugInstructionsForward(BundleStart, BundleEnd);
		#ifndef DEBUG_TYPE
		#define DEBUG_TYPE "slotindexes_h"
		LLVM_DEBUG(dbgs() << "SlotIndex::getInstructionIndex(): BundleNonDebug = "
		<< BundleNonDebug << " "
		<< &BundleNonDebug
		<< ", BundleNonDebug.getOpcode() = " << BundleNonDebug.getOpcode()
		<< "\n");
		#undef DEBUG_TYPE
		#endif
assert(!BundleNonDebug.isDebugInstr() &&		assert(!BundleNonDebug.isDebugInstr() &&
"Could not use a debug instruction to query mi2iMap.");		"Could not use a debug instruction to query mi2iMap.");
Mi2IndexMap::const_iterator itr = mi2iMap.find(&BundleNonDebug);		Mi2IndexMap::const_iterator itr = mi2iMap.find(&BundleNonDebug);
		if (itr == mi2iMap.end()) {
		#ifndef DEBUG_TYPE
		#define DEBUG_TYPE "slotindexes_h"
		for (Mi2IndexMap::const_iterator itrAlex = mi2iMap.begin();
		itrAlex != mi2iMap.end(); itrAlex++) {
		LLVM_DEBUG(dbgs() << " *(itrAlex->first) = "
		<< itrAlex->first << " "
		<< *(itrAlex->first) << "\n");
		}
		#undef DEBUG_TYPE
		#endif
		}
assert(itr != mi2iMap.end() && "Instruction not found in maps.");		assert(itr != mi2iMap.end() && "Instruction not found in maps.");
return itr->second;		return itr->second;
}		}

/// Returns the instruction for the given index, or null if the given		/// Returns the instruction for the given index, or null if the given
/// index has no instruction associated with it.		/// index has no instruction associated with it.
MachineInstr* getInstructionFromIndex(SlotIndex index) const {		MachineInstr* getInstructionFromIndex(SlotIndex index) const {
return index.isValid() ? index.listEntry()->getInstr() : nullptr;		return index.isValid() ? index.listEntry()->getInstr() : nullptr;

include/llvm/IR/Intrinsics.td

	include "llvm/IR/IntrinsicsARM.td"			include "llvm/IR/IntrinsicsARM.td"
	include "llvm/IR/IntrinsicsAArch64.td"			include "llvm/IR/IntrinsicsAArch64.td"
	include "llvm/IR/IntrinsicsXCore.td"			include "llvm/IR/IntrinsicsXCore.td"
	include "llvm/IR/IntrinsicsHexagon.td"			include "llvm/IR/IntrinsicsHexagon.td"
	include "llvm/IR/IntrinsicsNVVM.td"			include "llvm/IR/IntrinsicsNVVM.td"
	include "llvm/IR/IntrinsicsMips.td"			include "llvm/IR/IntrinsicsMips.td"
	include "llvm/IR/IntrinsicsAMDGPU.td"			include "llvm/IR/IntrinsicsAMDGPU.td"
	include "llvm/IR/IntrinsicsBPF.td"			include "llvm/IR/IntrinsicsBPF.td"
				include "llvm/IR/IntrinsicsConnex.td"
	include "llvm/IR/IntrinsicsSystemZ.td"			include "llvm/IR/IntrinsicsSystemZ.td"
	include "llvm/IR/IntrinsicsWebAssembly.td"			include "llvm/IR/IntrinsicsWebAssembly.td"
	include "llvm/IR/IntrinsicsRISCV.td"			include "llvm/IR/IntrinsicsRISCV.td"

include/llvm/IR/IntrinsicsConnex.td

				//===- IntrinsicsConnex.td - Defines Connex-S intrinsics ---- tablegen --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines all of the Connex-specific intrinsics.
				//
				//===----------------------------------------------------------------------===//

				// All Connex-S vector processor intrinsics start with "llvm.connex."
				//
				let TargetPrefix = "connex" in {

				/*
				* Note: all intrinsics defined in these .td files start with
				* the int_ prefix (from intrinsic). For this file they start with
				* int_connex prefix - otherwise we get the following TableGen error
				* <<error:Intrinsic 'int_end_repeat' does not start with 'llvm.connex.'!>>
				*
				* The LLVM IR intrinsics extend the LLVM language s.t. we can use
				* these instructions in an LLVM IR program. We also need to define the
				* corresponding assembly instructions in the back end TableGen files.
				*/

				/* Following Intrinsics.td:
				class Intrinsic<list<LLVMType> ret_types,
				list<LLVMType> param_types = [],
				list<IntrinsicProperty> properties = [],
				string name = "">
				*/


				/* Small-note:
				llvm_i64_ty makes simpler my LLVM IR generation in the LoopVectorize.cpp
				module:
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_i64_ty], []>;
				But llvm_i32_ty is in accordance to the original i32 type of n.vec in the
				LoopVectorize.cpp module:
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_i32_ty], []>;

				Small-note: We get inspired from include/llvm/IR/IntrinsicsPowerPC.td:
				// Intrinsics used to generate ctrl-based loops.
				def int_ppc_mtctr : Intrinsic<[], [llvm_anyint_ty], []>;

				Small-note: Trying to use a polymorphic definition, which requires
				specifying the actual type in Function::Create(FunctionType::get(), ...)
				is:
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_anyint_ty], []>;
				When instantiating it in LoopVectorize.cpp like this:
				Value *instrinsicFunc = Intrinsic::getDeclaration(M,
				Intrinsic::connex_repeat_x_times);
				it gives error at runtime:
				llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*;
				size_t = long unsigned int]: Assertion `Index < Length &&
				"Invalid index!"' failed.
				*/
				def int_connex_repeat_x_times : Intrinsic<[], [llvm_i64_ty], []>;
				def int_connex_end_repeat : Intrinsic<[], [], []>;

				/* Note: Possibly useful in the future.
				Connex Opincaa's END_REPEAT does not have a relative offset,
				as the standard Connex assembly ijmpnzdec instruction,
				since it falls on Opincaa to compute the jump back relative offset.
				We can also use a setlc to position it outside the loop created by the
				ijmpnzdec instruction by using it inside a delay-slot instruction.

				def int_connex_setlc : Intrinsic<[], [llvm_i16_ty], []>;
				def int_connex_ijmpnzdec : Intrinsic<[], [], []>;
				*/



				/* IMPORTANT: REDUCE cannot return a value. It is the duty of the host (CPU)
				to read the result itself from the REDUCE issued by Connex-S.
				Therefore this definition is incorrect:
				def int_connex_reduce : Intrinsic<[llvm_i32_ty], [llvm_v128i16_ty], []>;
				*/
				/* GOOD:
				def int_connex_reduce : Intrinsic<[], [llvm_v128i16_ty], []>;
				def int_connex_reduce_i32 : Intrinsic<[], [llvm_v64i32_ty], []>;
				def int_connex_reduce_f16 : Intrinsic<[], [llvm_v128f16_ty], []>;
				*/
				def int_connex_reduce : Intrinsic<[], [llvm_anyvector_ty], []>;

				/* Note: ctpop is already defined in Intrinsics.td.
				So the below definition is not required:
				def int_connex_ctpop : Intrinsic<[llvm_v8i16_ty],
				[llvm_v8i16_ty], []>;
				*/


				// Inherited BPF scalar intrinsics: Specialized loads from packet
				def int_connex_load_byte : GCCBuiltin<"__builtin_connex_load_byte">,
				Intrinsic<[llvm_i64_ty], [llvm_ptr_ty, llvm_i64_ty], [IntrReadMem]>;
				def int_connex_load_half : GCCBuiltin<"__builtin_connex_load_half">,
				Intrinsic<[llvm_i64_ty], [llvm_ptr_ty, llvm_i64_ty], [IntrReadMem]>;
				def int_connex_load_word : GCCBuiltin<"__builtin_connex_load_word">,
				Intrinsic<[llvm_i64_ty], [llvm_ptr_ty, llvm_i64_ty], [IntrReadMem]>;
				def int_connex_pseudo : GCCBuiltin<"__builtin_connex_pseudo">,
				Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty]>;
				}

lib/CodeGen/LiveRangeCalc.cpp

#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/LiveInterval.h"		#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
		// Alex: new code
		// Assuming current include folder is llvm/lib/CodeGen (NOT in include)
		// NOT working: #include "ConnexGenInstrInfo.inc"
		#define GET_INSTRINFO_ENUM
		#include "/home/asusu/LLVM/llvm2019_03_29/llvm/lib/Target/Connex/ConnexGenInstrInfo.inc" // small-MEGA-TODO: this include is specific to my desktop...
		// NOT really good: #include "../Target/Connex/ConnexGenInstrInfo.inc" // This uses a temporary file
		/*
		// NOT working:
		#include "../Target/Connex/Connex.h"
		#include "../Target/Connex/ConnexInstrInfo.h"
		#include "../Target/Connex/ConnexMCInstLower.h"
		#include "../Target/Connex/ConnexTargetMachine.h"
		*/
		// Alex: END new code
		arsenm (Not Done): This needs to be removed
#include "llvm/CodeGen/SlotIndexes.h"		#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"		#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/MC/LaneBitmask.h"		#include "llvm/MC/LaneBitmask.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <iterator>		#include <iterator>
▲ Show 20 Lines • Show All 166 Lines • ▼ Show 20 Lines	if (MI->isPHI()) {
unsigned DefIdx;		unsigned DefIdx;
if (MO.isDef())		if (MO.isDef())
isEarlyClobber = MO.isEarlyClobber();		isEarlyClobber = MO.isEarlyClobber();
else if (MI->isRegTiedToDefOperand(OpNo, &DefIdx)) {		else if (MI->isRegTiedToDefOperand(OpNo, &DefIdx)) {
// FIXME: This would be a lot easier if tied early-clobber uses also		// FIXME: This would be a lot easier if tied early-clobber uses also
// had an early-clobber flag.		// had an early-clobber flag.
isEarlyClobber = MI->getOperand(DefIdx).isEarlyClobber();		isEarlyClobber = MI->getOperand(DefIdx).isEarlyClobber();
}		}
		// Alex: new code
		LLVM_DEBUG(dbgs() << "LiveRangeCalc::extendToUses(): *MI = "
		<< *MI
		<< "\n");
		if (MI->getOpcode() == llvm::Connex::ST_SPILL_H) {
		LLVM_DEBUG(dbgs() << "LiveRangeCalc::extendToUses(): "
		"found llvm::Connex::ST_SPILL_H\n");
		continue;
		}
		// Alex: END new code
UseIdx = Indexes->getInstructionIndex(*MI).getRegSlot(isEarlyClobber);		UseIdx = Indexes->getInstructionIndex(*MI).getRegSlot(isEarlyClobber);
}		}

// MI is reading Reg. We may have visited MI before if it happens to be		// MI is reading Reg. We may have visited MI before if it happens to be
// reading Reg multiple times. That is OK, extend() is idempotent.		// reading Reg multiple times. That is OK, extend() is idempotent.
extend(LR, UseIdx, Reg, Undefs);		extend(LR, UseIdx, Reg, Undefs);
}		}
}		}

lib/CodeGen/RegAllocGreedy.cpp

Show First 20 Lines • Show All 1,214 Lines • ▼ Show 20 Lines	if (BI.LiveIn) {
BC.Entry = SpillPlacement::MustSpill;		BC.Entry = SpillPlacement::MustSpill;
++Ins;		++Ins;
} else if (Intf.first() < BI.FirstInstr) {		} else if (Intf.first() < BI.FirstInstr) {
BC.Entry = SpillPlacement::PrefSpill;		BC.Entry = SpillPlacement::PrefSpill;
++Ins;		++Ins;
} else if (Intf.first() < BI.LastInstr) {		} else if (Intf.first() < BI.LastInstr) {
++Ins;		++Ins;
}		}
		// Alex: new code
		// Alex: END new code
		arsenm (Not Done): Needs to be removed
// Abort if the spill cannot be inserted at the MBB' start		// Abort if the spill cannot be inserted at the MBB' start
if (((BC.Entry == SpillPlacement::MustSpill) ||		if (((BC.Entry == SpillPlacement::MustSpill) ||
(BC.Entry == SpillPlacement::PrefSpill)) &&		(BC.Entry == SpillPlacement::PrefSpill)) &&
SlotIndex::isEarlierInstr(BI.FirstInstr,		SlotIndex::isEarlierInstr(BI.FirstInstr,
SA->getFirstSplitPoint(BC.Number)))		SA->getFirstSplitPoint(BC.Number)))
return false;		return false;
}		}


lib/CodeGen/SelectionDAG/DAGCombiner.cpp


Show First 20 Lines • Show All 1,463 Lines • ▼ Show 20 Lines	if (RV.getNode() == N)
continue;		continue;

assert(N->getOpcode() != ISD::DELETED_NODE &&		assert(N->getOpcode() != ISD::DELETED_NODE &&
RV.getOpcode() != ISD::DELETED_NODE &&		RV.getOpcode() != ISD::DELETED_NODE &&
"Node was deleted but visit returned new node!");		"Node was deleted but visit returned new node!");

LLVM_DEBUG(dbgs() << " ... into: "; RV.getNode()->dump(&DAG));		LLVM_DEBUG(dbgs() << " ... into: "; RV.getNode()->dump(&DAG));

		// Replacing SDNode N with RV in crtNodeMap
		DAG.UpdateNodeMapSDValue(N, RV);
		arsenm (Not Done): Needs to be removed

if (N->getNumValues() == RV.getNode()->getNumValues())		if (N->getNumValues() == RV.getNode()->getNumValues())
DAG.ReplaceAllUsesWith(N, RV.getNode());		DAG.ReplaceAllUsesWith(N, RV.getNode());
else {		else {
assert(N->getValueType(0) == RV.getValueType() &&		assert(N->getValueType(0) == RV.getValueType() &&
N->getNumValues() == 1 && "Type mismatch");		N->getNumValues() == 1 && "Type mismatch");
DAG.ReplaceAllUsesWith(N, &RV);		DAG.ReplaceAllUsesWith(N, &RV);
}		}


lib/CodeGen/SelectionDAG/SelectionDAG.cpp



/// makeVTList - Return an instance of the SDVTList struct initialized with the		/// makeVTList - Return an instance of the SDVTList struct initialized with the
/// specified members.		/// specified members.
static SDVTList makeVTList(const EVT *VTs, unsigned NumVTs) {		static SDVTList makeVTList(const EVT *VTs, unsigned NumVTs) {
SDVTList Res = {VTs, NumVTs};		SDVTList Res = {VTs, NumVTs};
return Res;		return Res;
}		}

		void SelectionDAG::SetNodeMap(DenseMap<const Value *, SDValue> *aCrtNodeMapPtr) {
		arsenm (Not Done): Ditto
		crtNodeMapPtr = aCrtNodeMapPtr;
		}

		void SelectionDAG::UpdateNodeMapSDValue(SDNode *oldSDN, SDValue &newSDV) {
		/* NOTE: SelectionDAGBuilder defines DenseMap NodeMap.
		* I added in SelectionDAGISel a copy of it, crtNodeMap.
		* The pointer crtNodeMapPtr here is the pointer of crtNodeMap
		* initialized in SelectionDAGISel::CodeGenAndEmitDAG().
		*/
		for (auto iterNodeMap = crtNodeMapPtr->begin();
		iterNodeMap != crtNodeMapPtr->end(); iterNodeMap++) {
		auto tmp1 = (*iterNodeMap);

		const Value *crtValue = (const Value *)(tmp1.first);

		SDValue crtSDValue = tmp1.second;
		SDNode *crtSDNode = crtSDValue.getNode();

		if (crtSDNode == oldSDN) {
		(*crtNodeMapPtr)[crtValue] = newSDV;
		break;
		}
		}
		}
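For readers following the crtNodeMap discussion, here is a minimal, self-contained sketch of the bookkeeping that UpdateNodeMapSDValue performs. It uses std::unordered_map and stub types in place of DenseMap, Value, SDNode and SDValue; NodeMapTy and remapNode are hypothetical names introduced only for this illustration. Unlike the patch, which stops at the first match, this variant updates every entry that refers to the old node. Whether several IR values can map to the same node at this point is an assumption, not something the patch states.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for llvm::Value, llvm::SDNode and llvm::SDValue.
struct Value {};
struct SDNode {};
struct SDValue {
  SDNode *Node = nullptr;
  unsigned ResNo = 0;

  SDValue() = default;
  SDValue(SDNode *N, unsigned R) : Node(N), ResNo(R) {}
  SDNode *getNode() const { return Node; }
};

using NodeMapTy = std::unordered_map<const Value *, SDValue>;

// Re-point every map entry that still refers to OldN at NewV.
// Returns the number of entries that were updated.
inline unsigned remapNode(NodeMapTy &Map, const SDNode *OldN, SDValue NewV) {
  unsigned Updated = 0;
  for (auto &Entry : Map) {
    if (Entry.second.getNode() == OldN) {
      Entry.second = NewV; // Assigning to an existing slot does not rehash.
      ++Updated;
    }
  }
  return Updated;
}
```

The scan is linear in the size of the map for each replaced node, which is one reason a reviewer might prefer a different mechanism for tracking replacements.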


// Default null implementations of the callbacks.		// Default null implementations of the callbacks.
void SelectionDAG::DAGUpdateListener::NodeDeleted(SDNode *, SDNode *) {}		void SelectionDAG::DAGUpdateListener::NodeDeleted(SDNode *, SDNode *) {}
void SelectionDAG::DAGUpdateListener::NodeUpdated(SDNode*) {}		void SelectionDAG::DAGUpdateListener::NodeUpdated(SDNode*) {}

void SelectionDAG::DAGNodeDeletedListener::anchor() {}		void SelectionDAG::DAGNodeDeletedListener::anchor() {}

#define DEBUG_TYPE "selectiondag"		#define DEBUG_TYPE "selectiondag"

▲ Show 20 Lines • Show All 7,705 Lines • ▼ Show 20 Lines	MachineSDNode *SelectionDAG::getMachineNode(unsigned Opcode, const SDLoc &dl,
SDValue Op1, SDValue Op2,		SDValue Op1, SDValue Op2,
SDValue Op3) {		SDValue Op3) {
SDVTList VTs = getVTList(VT1, VT2, VT3);		SDVTList VTs = getVTList(VT1, VT2, VT3);
SDValue Ops[] = { Op1, Op2, Op3 };		SDValue Ops[] = { Op1, Op2, Op3 };
return getMachineNode(Opcode, dl, VTs, Ops);		return getMachineNode(Opcode, dl, VTs, Ops);
}		}

MachineSDNode *SelectionDAG::getMachineNode(unsigned Opcode, const SDLoc &dl,		MachineSDNode *SelectionDAG::getMachineNode(unsigned Opcode, const SDLoc &dl,
		EVT VT,
		SDValue Op1, SDValue Op2,
		SDValue Op3, SDValue Op4) {
		SDVTList VTs = getVTList(VT);
		SDValue Ops[] = { Op1, Op2, Op3, Op4 };
		return getMachineNode(Opcode, dl, VTs, Ops);
		}

		MachineSDNode *SelectionDAG::getMachineNode(unsigned Opcode, const SDLoc &dl,
		EVT VT1, EVT VT2,
		SDValue Op1, SDValue Op2,
		SDValue Op3, SDValue Op4) {
		SDVTList VTs = getVTList(VT1, VT2);
		SDValue Ops[] = { Op1, Op2, Op3, Op4 };
		return getMachineNode(Opcode, dl, VTs, Ops);
		}

		MachineSDNode *SelectionDAG::getMachineNode(unsigned Opcode, const SDLoc &dl,
EVT VT1, EVT VT2, EVT VT3,		EVT VT1, EVT VT2, EVT VT3,
ArrayRef<SDValue> Ops) {		ArrayRef<SDValue> Ops) {
SDVTList VTs = getVTList(VT1, VT2, VT3);		SDVTList VTs = getVTList(VT1, VT2, VT3);
return getMachineNode(Opcode, dl, VTs, Ops);		return getMachineNode(Opcode, dl, VTs, Ops);
}		}

MachineSDNode *SelectionDAG::getMachineNode(unsigned Opcode, const SDLoc &dl,		MachineSDNode *SelectionDAG::getMachineNode(unsigned Opcode, const SDLoc &dl,
ArrayRef<EVT> ResultTys,		ArrayRef<EVT> ResultTys,

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Statepoint.h"		#include "llvm/IR/Statepoint.h"
#include "llvm/Support/BranchProbability.h"		#include "llvm/Support/BranchProbability.h"
#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MachineValueType.h"		#include "llvm/Support/MachineValueType.h"
		#include "llvm/Support/Debug.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

		efriedma (Done): Stray include?
namespace llvm {		namespace llvm {

class AllocaInst;		class AllocaInst;
class AtomicCmpXchgInst;		class AtomicCmpXchgInst;
class AtomicRMWInst;		class AtomicRMWInst;
class BasicBlock;		class BasicBlock;
class BranchInst;		class BranchInst;
class CallInst;		class CallInst;
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	class SelectionDAGBuilder {
/// Helper type for DanglingDebugInfoMap.		/// Helper type for DanglingDebugInfoMap.
typedef std::vector<DanglingDebugInfo> DanglingDebugInfoVector;		typedef std::vector<DanglingDebugInfo> DanglingDebugInfoVector;

/// Keeps track of dbg_values for which we have not yet seen the referent.		/// Keeps track of dbg_values for which we have not yet seen the referent.
/// We defer handling these until we do see it.		/// We defer handling these until we do see it.
MapVector<const Value*, DanglingDebugInfoVector> DanglingDebugInfoMap;		MapVector<const Value*, DanglingDebugInfoVector> DanglingDebugInfoMap;

public:		public:
		// Add a getter for NodeMap
		DenseMap<const Value*, SDValue> &getNodeMap() {
		return NodeMap;
		}

/// Loads are not emitted to the program immediately. We bunch them up and		/// Loads are not emitted to the program immediately. We bunch them up and
/// then emit token factor nodes when possible. This allows us to get simple		/// then emit token factor nodes when possible. This allows us to get simple
/// disambiguation between loads without worrying about alias analysis.		/// disambiguation between loads without worrying about alias analysis.
SmallVector<SDValue, 8> PendingLoads;		SmallVector<SDValue, 8> PendingLoads;

/// State used while lowering a statepoint sequence (gc_statepoint,		/// State used while lowering a statepoint sequence (gc_statepoint,
/// gc_relocate, and gc_result). See StatepointLowering.hpp/cpp for details.		/// gc_relocate, and gc_result). See StatepointLowering.hpp/cpp for details.
StatepointLoweringState StatepointLowering;		StatepointLoweringState StatepointLowering;

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

Show First 20 Lines • Show All 686 Lines • ▼ Show 20 Lines	for (BasicBlock::const_iterator I = Begin; I != End && !SDB->HasTailCall; ++I) {
if (!ElidedArgCopyInstrs.count(&*I))		if (!ElidedArgCopyInstrs.count(&*I))
SDB->visit(*I);		SDB->visit(*I);
}		}

// Make sure the root of the DAG is up-to-date.		// Make sure the root of the DAG is up-to-date.
CurDAG->setRoot(SDB->getControlRoot());		CurDAG->setRoot(SDB->getControlRoot());
HadTailCall = SDB->HasTailCall;		HadTailCall = SDB->HasTailCall;
SDB->resolveOrClearDbgInfo();		SDB->resolveOrClearDbgInfo();

		crtNodeMap = SDB->getNodeMap();

SDB->clear();		SDB->clear();

// Final step, emit the lowered DAG as machine code.		// Final step, emit the lowered DAG as machine code.
CodeGenAndEmitDAG();		CodeGenAndEmitDAG();
}		}

void SelectionDAGISel::ComputeLiveOutVRegInfo() {		void SelectionDAGISel::ComputeLiveOutVRegInfo() {
SmallPtrSet<SDNode*, 16> VisitedNodes;		SmallPtrSet<SDNode*, 16> VisitedNodes;
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	LLVM_DEBUG(dbgs() << "Initial selection DAG: "
<< "'\n";		<< "'\n";
CurDAG->dump());		CurDAG->dump());

if (ViewDAGCombine1 && MatchFilterBB)		if (ViewDAGCombine1 && MatchFilterBB)
CurDAG->viewGraph("dag-combine1 input for " + BlockName);		CurDAG->viewGraph("dag-combine1 input for " + BlockName);

// Run the DAG combiner in pre-legalize mode.		// Run the DAG combiner in pre-legalize mode.
{		{
		// We should do this only once
		CurDAG->SetNodeMap(&crtNodeMap);

NamedRegionTimer T("combine1", "DAG Combining 1", GroupName,		NamedRegionTimer T("combine1", "DAG Combining 1", GroupName,
GroupDescription, TimePassesIsEnabled);		GroupDescription, TimePassesIsEnabled);
CurDAG->Combine(BeforeLegalizeTypes, AA, OptLevel);		CurDAG->Combine(BeforeLegalizeTypes, AA, OptLevel);
}		}

#ifndef NDEBUG		#ifndef NDEBUG
if (TTI.hasBranchDivergence())		if (TTI.hasBranchDivergence())
CurDAG->VerifyDAGDiverence();		CurDAG->VerifyDAGDiverence();

lib/Target/Connex/Connex.h

				//===-- Connex.h - Top-level interface for Connex representation ------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEX_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEX_H

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/Target/TargetMachine.h"


				// We define reserved register(s) of Connex to use for:
				// - handling COPY instructions in WHERE blocks
				// (see ConnexTargetMachine.cpp and ConnexISelLowering.cpp), etc
				#define CONNEX_RESERVED_REGISTER_01 Connex::Wh30
				#define CONNEX_RESERVED_REGISTER_02 Connex::Wh31
				#define CONNEX_RESERVED_REGISTER_03 Connex::Wh29

				#define COPY_REGISTER_IMPLEMENTED_WITH_ORV_H

				namespace llvm {
				class ConnexTargetMachine;

				FunctionPass *createConnexISelDag(ConnexTargetMachine &TM);
				}

				#endif

lib/Target/Connex/ConnexAsmPrinter.cpp

				//===-- ConnexAsmPrinter.cpp - Connex LLVM assembly writer ----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains a printer that converts from our internal representation
				// of machine-dependent LLVM code to the Connex assembly language.
				//
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "ConnexConfig.h"
				#include "ConnexAsmPrinterLoopNests.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexMCInstLower.h"
				#include "ConnexTargetMachine.h"
				// 2019_03_30_TODO: #include "BTFDebug.h"
				#include "InstPrinter/ConnexInstPrinter.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/CodeGen/AsmPrinter.h"
				#include "llvm/CodeGen/MachineConstantPool.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineModuleInfo.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCStreamer.h"
				#include "llvm/MC/MCSymbol.h"
				#include "llvm/Support/TargetRegistry.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/Support/CommandLine.h" //See http://llvm.org/docs/CommandLine.html
				#include <unordered_map>


				using namespace llvm;

				// Inspired from llvm/lib/CodeGen/TargetPassConfig.cpp
				static cl::opt<bool> EnableCorrectBBsASMPrint("enable-correct-asm-print",
				cl::Hidden,
				cl::init(true),
				//cl::desc("Enable special instrumentation in ConnexASMPrinter")
				cl::desc("Correct the BBs of 2nd innermost loop in loop nests of kernels "
				"and use normally REPEAT for it and host-side Opincaa C++ for() as "
				"the innermost loop"));

				static cl::opt<bool> TreatRepeat2ndInnerLoopGlobalTmp("treat-repeat-2nd-inner-loop",
				cl::Hidden,
				cl::init(true),
				cl::desc("Treat well 2nd inner loop in kernel and use normally REPEAT "
				arsenm (Done): This should be after all includes
				"for it and host-side Opincaa C++ for() as the inner loop"));


				#define DEBUG_TYPE "asm-printer"


				arsenm (Done): Sort includes

				// We need to store the correspondence between MachineInstr and the lowered
				// MCInst, since MCInst does not.
				// This is used in ConnexInstPrinter.cpp.
				const MachineInstr *crtMI;
				arsenm (Not Done): You should not need this, also no global
				extern std::unordered_map<const MachineInstr *, const MachineInstr *> mapLD_ST_REPEAT_InlineAsm;




				namespace {
				class ConnexAsmPrinter : public AsmPrinter {
				public:
				explicit ConnexAsmPrinter(TargetMachine &TM,
				std::unique_ptr<MCStreamer> Streamer)
				: AsmPrinter(TM, std::move(Streamer)) {}

				StringRef getPassName() const override { return "Connex Assembly Printer"; }

				/*
				(From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunctionPass.html
				we see SelectionDAGISel and AsmPrinter were the only passes that inherit
				MachineFunctionPass, from this back end.)
				From http://llvm.org/docs/doxygen/html/AsmPrinter_8h_source.html:
				/// Set up the AsmPrinter when we are working on a new module. If your pass
				/// overrides this, it must make sure to explicitly call this implementation.
				*/
				arsenm (Not Done): These various places linking to other documentation should be removed. Doxygen generates the appropriate links

				bool isVectorBody(StringRef &&strRef) {
				#define STR_VECTOR_BODY "vector.body"
				#define STR_VECTOR_BODY_PREHEADER ".preheader"

				LLVM_DEBUG(dbgs() << "isVectorBody(): strRef = " << strRef << "\n");

				// We can have several BBs with name vector.bodyXYZT (but we do NOT
				// search for STR_VECTOR_BODY_PREHEADER, which can be e.g.,
				// vector.body40.preheader)
				if (strRef.startswith(StringRef(STR_VECTOR_BODY)) &&
				strRef.endswith(StringRef(STR_VECTOR_BODY_PREHEADER)))
				return false;

				if (strRef.startswith(StringRef(STR_VECTOR_BODY)) == false)
				return false;

				/*
				const char *str = strRef.data();

				if ((strncmp(str, STR_VECTOR_BODY,
				strlen(STR_VECTOR_BODY)) == 0) &&
				(strncmp(str + strlen(str) - strlen(STR_VECTOR_BODY_PREHEADER),
				STR_VECTOR_BODY_PREHEADER,
				strlen(STR_VECTOR_BODY_PREHEADER)) == 0))
				return false;

				if (strncmp(str, STR_VECTOR_BODY, strlen(STR_VECTOR_BODY)) != 0)
				return false;
				*/

				arsenm (Not Done): Remove commented out code
				LLVM_DEBUG(dbgs() << "isVectorBody(): returning true\n");

				return true;
				}
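
For readers following the name check in isVectorBody(), here is a minimal standalone sketch of the same rule. It is not part of the patch: `std::string_view` stands in for `llvm::StringRef` so the example builds without LLVM, and the helper names `startsWith`, `endsWith` and `isVectorBodyName` are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helpers; std::string_view replaces llvm::StringRef so the
// sketch has no LLVM dependency (requires C++17).
inline bool startsWith(std::string_view S, std::string_view Prefix) {
  return S.size() >= Prefix.size() && S.substr(0, Prefix.size()) == Prefix;
}

inline bool endsWith(std::string_view S, std::string_view Suffix) {
  return S.size() >= Suffix.size() &&
         S.substr(S.size() - Suffix.size()) == Suffix;
}

// True for "vector.body", "vector.body40", ...; false for names ending in
// ".preheader" (e.g. "vector.body40.preheader") or lacking the prefix.
inline bool isVectorBodyName(std::string_view Name) {
  constexpr std::string_view VectorBody = "vector.body";
  constexpr std::string_view Preheader = ".preheader";
  return startsWith(Name, VectorBody) && !endsWith(Name, Preheader);
}
```

Folding the two early returns into one boolean expression keeps the behaviour of the original function while avoiding the `STR_VECTOR_BODY` macros.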


				int ifImmSpecialUpdateMap(const MachineInstr *MI, const MachineInstr *MI2) {
				unsigned imm;

				if (MI2->getOpcode() == Connex::REPEAT) {
				const MachineOperand &MI2MO0 = MI2->getOperand(0);
				LLVM_DEBUG(dbgs() << "ifImmSpecialUpdateMap(): MI2MO0 = "
				<< MI2MO0 << "\n");

				imm = MI2MO0.getImm();
				}
				else {
				const MachineOperand &MI2MO0 = MI2->getOperand(0);
				LLVM_DEBUG(dbgs() << "ifImmSpecialUpdateMap(): MI2MO0 = "
				<< MI2MO0 << "\n");

				const MachineOperand &MI2MO1 = MI2->getOperand(1);
				LLVM_DEBUG(dbgs() << "ifImmSpecialUpdateMap(): MI2MO1 = "
				<< MI2MO1 << "\n");

				imm = MI2MO1.getImm();
				}

				LLVM_DEBUG(dbgs() << "ifImmSpecialUpdateMap(): imm = "
				<< imm << "\n");

				if ((imm == CONNEX_MEM_NUM_ROWS + 10) ||
				(imm == VALUE_BOGUS_REPEAT_X_TIMES)) {
				LLVM_DEBUG(dbgs() << "ifImmSpecialUpdateMap(): MI2 = "
				<< *MI2 << "\n");
				LLVM_DEBUG(dbgs() << "ifImmSpecialUpdateMap(): MI->getOperand(0) = "
				<< MI->getOperand(0) << "\n");
				LLVM_DEBUG(dbgs() << "ifImmSpecialUpdateMap(): MI = "
				<< MI
				<< ", MI2 (ptr) = " << MI2 << "\n");

				mapLD_ST_REPEAT_InlineAsm[MI2] = MI;
				return 1;
				}

				return -1;
				}

				arsenm (Done): isInlineAsm() (also for others)

				void MoveToFrontRepeat(MachineBasicBlock *MBB) {
				LLVM_DEBUG(dbgs() << "Entered MoveToFrontRepeat(MBB = "
				<< MBB << ")\n");

				// Moving the REPEAT and its symbolic operand in INLINEASM to the
				// front of the MBB.
				for (auto MIItr = MBB->begin(); MIItr != MBB->end(); ++MIItr) {
				MachineInstr *MI = &(*MIItr);

				if (MI->getOpcode() == Connex::REPEAT_SYM_IMM) {
				LLVM_DEBUG(dbgs() << "MoveToFrontRepeat(): Found Connex::REPEAT_SYM_IMM\n");
				MIItr++;

				MachineInstr *MI2 = &(*MIItr);

				if (MI2->isInlineAsm()) {
				LLVM_DEBUG(dbgs() << "MoveToFrontRepeat(): Moving the successor "
				"INLINEASM together with the Connex::REPEAT_SYM_IMM\n");

				MBB->remove(MI2);
				MBB->insert(MBB->front(), MI2);
				}
				else {
				MIItr++;
				MI2 = &(*MIItr);

				LLVM_DEBUG(dbgs() << "MoveToFrontRepeat(): Moving the following "
				"(not successor) INLINEASM together with the "
				"Connex::REPEAT_SYM_IMM\n");
				//MIItr++;

				if (MI2->isInlineAsm()) {
				MBB->remove(MI2);
				MBB->insert(MBB->front(), MI2);
				}
				else {
				assert(0 && "Can't find INLINEASM associated to REPEAT_SYM_IMM");
				}
				}

				LLVM_DEBUG(dbgs() << "MoveToFrontRepeat(): Moving Connex::REPEAT_SYM_IMM\n");

				MBB->remove(MI);
				MBB->insert(MBB->front(), MI);

				break;
				}
				}
				}


				void MoveToFrontInlineAsm(MachineBasicBlock *MBB, char *strToSearch) {
				LLVM_DEBUG(dbgs() << "Entered MoveToFrontInlineAsm(MBB = "
				<< MBB
				<< ", strToSearch = " << strToSearch << ")\n");

				// Moving the REPEAT and its symbolic operand in INLINEASM to the
				// front of the MBB.
				for (auto MIItr = MBB->begin(); MIItr != MBB->end(); /* ++MIItr */) {
				MachineInstr *MI = &(*MIItr);

				arsenm (Done): A large number of the debug statements seem like they're just noise for committing
				// We avoid iterator invalidation:
				// See some comments on iterator invalidation (when doing remove) at
				// http://llvm.1065342.n5.nabble.com/deleting-or-replacing-a-MachineInst-td77723.html
				MachineBasicBlock::iterator MIsucc = MIItr;
				MIsucc++;

				if (MI->isInlineAsm()) {
				LLVM_DEBUG(dbgs() << " MoveToFrontInlineAsm(): found INLINEASM MI = "
				arsenm (Not Done): No c string functions
				<< *MI << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				for (unsigned index = 0; index < MI->getNumOperands(); index++) {
				MachineOperand *miOpnd;
				miOpnd = & (MI->getOperand(index));

				LLVM_DEBUG(dbgs() << " MI->getOperand(" << index << ") = "
				<< *miOpnd << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineOperand.html
				if (miOpnd->isSymbol()) {
				const char *symStr = miOpnd->getSymbolName();
				LLVM_DEBUG(dbgs() << " MoveToFrontInlineAsm(): symStr = "
				<< symStr << "\n");

				if (strstr(symStr, strToSearch) != NULL) {
				LLVM_DEBUG(dbgs() << " MoveToFrontInlineAsm(): Found INLINEASM "
				"with strToSearch in the symbol "
				"operand\n");
				//"with host-side for loop"
				//break;

				MBB->remove(MI);
				MBB->insert(MBB->front(), MI);
				}
				}
				}
				}

				// We avoid iterator invalidation
				MIItr = MIsucc;
				}
				}
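
The "save the successor before moving" pattern used in MoveToFrontInlineAsm() can be tried in isolation. The sketch below is an illustration, not the patch's code: it uses `std::list<std::string>` as a stand-in for the instruction list of a MachineBasicBlock, and the function name `moveMatchesToFront` is hypothetical. It does not model MachineBasicBlock's own iterator rules.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Moves every element containing Needle to the front of Items. As in
// MoveToFrontInlineAsm(), each match is placed before the current first
// element, so matches end up in reverse order of discovery.
inline void moveMatchesToFront(std::list<std::string> &Items,
                               const std::string &Needle) {
  for (auto It = Items.begin(); It != Items.end(); /* advanced below */) {
    // Remember the successor first: after the splice, It sits at the front
    // of the list and ++It would restart the scan from there.
    auto Next = std::next(It);
    if (It->find(Needle) != std::string::npos)
      Items.splice(Items.begin(), Items, It);
    It = Next;
  }
}
```

For example, calling it on `{"a", "for (x)", "b", "for (y)"}` with the needle `"for ("` should leave the list as `{"for (y)", "for (x)", "a", "b"}`.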


				/*
				This moves to the front of the MBB 3 (if justOne == false) or
				1 (if justOne == true) inline ASM expression(s), provided the 1st inline
				expression contains the Opincaa kernel begin.

				This function must be run first with justOne == false and then
				with justOne == true.

				More exactly, in LoopVectorize.cpp we added, among others, the following
				3 inline ASM expressions (consecutively):
				- 1 BEGIN_KERNEL INLINEASM instruction used as loop prologue
				- 1 END_KERNEL INLINEASM instruction used as
				loop prologue (END_KERNEL part)
				- 1 BEGIN_KERNEL INLINEASM instruction for
				the loop.
				We move these 3 instructions to the front of MBB when justOne == false.
				This ensures that, in the less likely case of a VLOAD_H_SYM_IMM (and its
				associated inline ASM containing the symbolic operand) generated manually
				in ConnexISelDAGToDAG.cpp, that instruction does not end up first, before
				the Opincaa loop header inline ASM expression.
				We also make sure that any loads from spills are put inside the loop
				prologue.

				We move 1 instruction to the front since in runOnMachineFunction() we put
				all instructions of the predecessor (there has to be only 1 predecessor) of
				vector.body at the front of MBB, so we have to move the BEGIN_KERNEL of
				the loop prologue.
				*/
				void MoveToFront(MachineBasicBlock *MBB, bool justOne) {
				MachineInstr *tmp1, *tmp2, *tmp3; //, *tmp4;
				int counter = 0;

				LLVM_DEBUG(dbgs() << "Entered MoveToFront(justOne = "
				<< justOne << ")\n");


				/* We compute MIItrLastLoadAssociatedToSpill, an iterator (pointer) to
				the first instruction after the loads (fills) from spills at the
				beginning of the BB.
				*/
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				/* IMPORTANT: make sure we put this initialization after any other MBB mutation
				in order to use it well to move the 3 INLINEASM instructions.
				*/
				MachineBasicBlock::iterator MIItrLastLoadAssociatedToSpill = MBB->front();

				if (justOne == false) {
				for (auto MIItr2 = MBB->begin(); MIItr2 != MBB->end(); ++MIItr2) {
				MachineInstr *MI = &(*MIItr2);

				LLVM_DEBUG(dbgs() << " MoveToFront(): MI = "
				<< *MI
				<< ", MI->getOpcode() = "
				<< MI->getOpcode()
				<< "\n");

				unsigned imm = -1;
				if (MI->getOpcode() == Connex::LD_H) {
				/* Inspired from
				http://llvm.org/docs/doxygen/html/MachineInstr_8cpp_source.html,
				method MachineInstr::isIdenticalTo()
				*/
				for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
				const MachineOperand &MO = MI->getOperand(i);

				if (MO.isImm()) {
				imm = MO.getImm();
				LLVM_DEBUG(dbgs() << " MoveToFront(): imm = "
				<< imm << "\n");
				break;
				}
				}

				//if (MI == is a vector load (LD_H), with offset address
				/* If the imm operand > CONNEX_MEM_NUM_ROWS - 32 it (normally)
				* means that the operation is generated in
				* ConnexInstrInfo::storeRegToStackSlot() and
				* ConnexInstrInfo::loadRegFromStackSlot(),
				* part of a spill or load from spill operation.
				* Note that on Connex we do not have a stack per se,
				* but we emulate it at the end of the LS memory.
				*/
				if ((imm >= CONNEX_MEM_NUM_ROWS - 32) &&
				(imm < CONNEX_MEM_NUM_ROWS)) {
				//MIItr2++;
				MIItrLastLoadAssociatedToSpill = MIItr2;
				MIItrLastLoadAssociatedToSpill++;
				}
				}
				} // end for
				} // if (justOne == false)

				/* Moving the ISD::INLINEASM instruction containing the opincaa kernel
				begin at the very front of this BB. */
				for (auto MIItr = MBB->begin(); MIItr != MBB->end();
				++MIItr, ++counter) {
				MachineInstr *MI = &(*MIItr);

				if (MI->isInlineAsm()) {
				LLVM_DEBUG(dbgs() << " MoveToFront() found INLINEASM MI = "
				<< *MI << "\n");

				bool isOpincaaCodeBegin = false;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				for (unsigned index = 0; index < MI->getNumOperands(); index++) {
				MachineOperand *miOpndOpincaaCodeBegin; // = NULL;
				miOpndOpincaaCodeBegin = & (MI->getOperand(index));

				LLVM_DEBUG(dbgs() << " MI->getOperand(" << index << ") = "
				<< *miOpndOpincaaCodeBegin << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineOperand.html
				if (miOpndOpincaaCodeBegin->isSymbol()) {
				const char *symStr = miOpndOpincaaCodeBegin->getSymbolName();
				LLVM_DEBUG(dbgs() << " MoveToFront(): symStr = "
				<< symStr << "\n");
				if (strstr(symStr, STR_OPINCAA_CODE_BEGIN) != NULL) {
				isOpincaaCodeBegin = true;
				break;
				}
				}
				}

				if (isOpincaaCodeBegin) {
				if (counter != 0) {
				// We move only if not at the beginning of MBB
				tmp1 = MI;
				LLVM_DEBUG(dbgs() << " MoveToFront(): moving INLINEASM to the front (counter = "
				<< counter << ", justOne = "
				<< justOne << ")\n");

				if (justOne == true) {
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				MBB->remove(tmp1);
				MBB->insert(MBB->front(), tmp1);
				}
				else {
				/* We move the next 3 instructions to the front of
				MBB, namely:
				- 1 BEGIN_KERNEL INLINEASM instruction used as
				loop prologue
				- 1 END_KERNEL INLINEASM instruction used as
				loop prologue (END_KERNEL part)
				- 1 BEGIN_KERNEL INLINEASM instruction for
				the loop.

				TODO TODO TODO TODO: check tmp3 and tmp2 are
				also INLINEASM */

				MIItr++;
				tmp2 = &(*MIItr);

				MIItr++;
				tmp3 = &(*MIItr);

				LLVM_DEBUG(dbgs() << " MoveToFront(): tmp1 = "
				<< *tmp1 << "\n");
				LLVM_DEBUG(dbgs() << " MoveToFront(): tmp2 = "
				<< *tmp2 << "\n");
				LLVM_DEBUG(dbgs() << " MoveToFront(): tmp3 = "
				<< *tmp3 << "\n");
				/*
				MBB->remove(tmp4);
				//MBB->insert(MBB->front(), tmp3);
				*/

				MBB->remove(tmp3);

				MBB->remove(tmp2);

				MBB->remove(tmp1);

				/* TODO TODO TODO TODO TODO: check that the iterator
				MIItrLastLoadAssociatedToSpill does NOT get
				invalidated - it seems it is not invalidated even if we
				change MBB, which is so because the instruction
				to which the iterator points to is NOT changed. */
				MBB->insert(MIItrLastLoadAssociatedToSpill, tmp1);
				MBB->insert(MIItrLastLoadAssociatedToSpill, tmp2);
				MBB->insert(MIItrLastLoadAssociatedToSpill, tmp3);
				}
				} // END if (counter != 0)
				break;
				} // END if (isOpincaaCodeBegin)
				}
				//counter++;
				}
				} // END MoveToFront()
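
The spill detection in MoveToFront() boils down to a range test on the LD_H immediate. The sketch below restates it as a standalone predicate. The value 1024 for the number of rows is an assumption made only for illustration (the real CONNEX_MEM_NUM_ROWS comes from ConnexConfig.h and is not shown here), and the names `kAssumedMemNumRows`, `kSpillRows` and `isSpillSlotRow` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration only; the real CONNEX_MEM_NUM_ROWS is
// defined in ConnexConfig.h and is not shown in this listing.
constexpr std::uint32_t kAssumedMemNumRows = 1024;
// Per the comment in MoveToFront(), the stack is emulated at the end of the
// LS memory; the code treats the last 32 rows as spill slots.
constexpr std::uint32_t kSpillRows = 32;

// True if Imm addresses one of the rows reserved for spills and fills.
inline bool isSpillSlotRow(std::uint32_t Imm,
                           std::uint32_t MemNumRows = kAssumedMemNumRows) {
  assert(MemNumRows >= kSpillRows && "memory smaller than the spill area");
  return Imm >= MemNumRows - kSpillRows && Imm < MemNumRows;
}
```

Note that the original initialises `imm` with `-1` as an unsigned value; that sentinel is larger than any row index, so an LD_H without an immediate operand is never classified as a fill from a spill.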


				// Moving the last ISD::INLINEASM instruction of MBB at the very back of MBB
				void MoveToBackLastInlineAsm(MachineBasicBlock *MBB) {
				MachineInstr *tmp1; //, *tmp2, *tmp3;
				int counter = 0;

				LLVM_DEBUG(dbgs() << " MoveToBackLastInlineAsm(): MBB = "
				<< *MBB << "\n");

				for (auto MIItr = MBB->rbegin(); MIItr != MBB->rend();
				++MIItr, ++counter) {
				MachineInstr *MI = &(*MIItr);

				if (MI->isInlineAsm()) {
				LLVM_DEBUG(dbgs() << " MoveToBackLastInlineAsm() found INLINEASM MI = "
				<< *MI << "\n");

				bool isOpincaaCodeEnd = false;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				for (unsigned index = 0; index < MI->getNumOperands(); index++) {
				MachineOperand *miOpndOpincaaCodeEnd; // = NULL;
				miOpndOpincaaCodeEnd = & (MI->getOperand(index));

				LLVM_DEBUG(dbgs() << " MI->getOperand(" << index << ") = "
				<< *miOpndOpincaaCodeEnd << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineOperand.html
				if (miOpndOpincaaCodeEnd->isSymbol()) {
				const char *symStr = miOpndOpincaaCodeEnd->getSymbolName();
				LLVM_DEBUG(dbgs() << " MoveToBackLastInlineAsm(): symStr = "
				<< symStr << "\n");
				if (strstr(symStr, STR_OPINCAA_CODE_END) != NULL) {
				isOpincaaCodeEnd = true;
				break;
				}
				}
				}

				if (isOpincaaCodeEnd) {
				//if (counter != 0) { // We move only if not at the beginning of MBB
				tmp1 = MI;
				LLVM_DEBUG(dbgs() << " MoveToBackLastInlineAsm(): moving INLINEASM to the back (counter = "
				<< counter << ")\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				MBB->remove(tmp1);
				MBB->insert(MBB->end(), tmp1);
				//}
				break;
				}
				}
				//counter++;
				}
				} // END MoveToBackLastInlineAsm()


				void ReplaceWithSymbolicIndex(MachineBasicBlock *MBB) {
				assert(0 && "ReplaceWithSymbolicIndex() does NOT do anything anymore");

				LLVM_DEBUG(dbgs() << "Entered ReplaceWithSymbolicIndex()\n");

				unsigned imm = -1;

				for (auto &MI : *MBB) {
				if ((MI.getOpcode() == Connex::LD_H) ||
				(MI.getOpcode() == Connex::ST_H)) {
				/* Inspired from
				http://llvm.org/docs/doxygen/html/MachineInstr_8cpp_source.html,
				method MachineInstr::isIdenticalTo()
				*/
				for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
				const MachineOperand &MO = MI.getOperand(i);
				if (MO.isImm()) {
				imm = MO.getImm();
				LLVM_DEBUG(dbgs() << " ReplaceWithSymbolicIndex(): imm = "
				<< imm << "\n");
				/*
				if (imm == CONNEX_MEM_NUM_ROWS - 32 - 10) {
				MO.setImm((int64_t)-1);
				}
				*/
				break;
				}
				}
				}
				}
				}


				// We add at the front of vector.body the instructions
				// for the predecessor of vector.body basic-block DIFFERENT than
				// vector.body (normally vector.ph).
				void copyInstructionsFromPred(MachineFunction &MF, MachineBasicBlock &MBB,
				MachineBasicBlock * &predMBBGood) {

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				/* (See also https://fossies.org/linux/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp
				* method DeadMachineInstructionElim::runOnMachineFunction() for
				* an example of iteration backwards).
				*/
				//for (auto &predMI : (*predMBB))
				unsigned counterPredMBB = 0;

				// rbegin() is a reverse_iterator
				for (auto predMIItr = predMBBGood->rbegin();
				predMIItr != predMBBGood->rend();
				predMIItr++, counterPredMBB++) {
				MachineInstr *predMI = &(*predMIItr);

				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): predMI = "
				<< *predMI << "\n");

				// Need to insert them in different order
				if (predMI->isBundle()) {
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): handling bundle\n");

				const MachineBasicBlock *MBBBundle = predMI->getParent();
				//MachineBasicBlock::const_instr_iterator I = ++MI->getIterator();
				MachineBasicBlock::const_instr_iterator I = predMI->getIterator();

				// IMPORTANT: We assume we work with finalized bundles
				I++;

				// THIS cycles ~forever... EmitInstruction(& (*I) );

				assert(I != MBBBundle->instr_end());
				const MachineInstr *I1 = & (*I);
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): I1 = "
				<< *I1 << "\n");
				//
				I++;


				// IMPORTANT: We assume we work with bundles with only 2 instructions

				/*
				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				bool isInsideBundle () const
				Return true if MI is in a bundle (but not the first MI in a bundle).
				bool isBundled () const
				Return true if this instruction part of a bundle.
				*/
				/*
				// TODO: this fails if bundle created in addPreSched2()
				// (before post-RA scheduler):
				assert(I->isInsideBundle());
				assert(I->isBundled());
				*/
				//
				/*
				// TODO: this fails if bundle created in addPreSched2()
				// (before post-RA scheduler):
				assert(I->isInsideBundle());
				assert(I->isBundled());
				*/
				assert(I != MBBBundle->instr_end());
				const MachineInstr *I2 = & (*I);

				MachineInstr *newPredMI2 = MF.CloneMachineInstr(I2);
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): newPredMI2 = "
				<< *newPredMI2 << "\n");
				MBB.insert(MBB.front(), newPredMI2);

				MachineInstr *newPredMI1 = MF.CloneMachineInstr(I1);
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): newPredMI1 = "
				<< *newPredMI1 << "\n");
				MBB.insert(MBB.front(), newPredMI1);

				/*
				while (I != MBBBundle->instr_end() && I->isInsideBundle()) {
				MachineInstr *newPredMI =
				MF.CloneMachineInstr(& (*I));
				MBB.insert(MBB.front(), newPredMI);

				//EmitInstruction(& (*I) );

				++I;
				}
				*/

				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): END handling bundle\n");

				continue;
				}


				/*
				* We avoid the last instruction of predMBBGood, since it is an
				* unconditional JMP
				*/
				if (counterPredMBB == 0 &&
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				predMI->isUnconditionalBranch()) { // predMBBGood->size())
				/* For llc -O3 it removes the JMP at the end of
				vector.ph, hence it merges it with vector.body,
				even if it leaves the entry label of vector.body.
				So we need to check if predMI is JMP with
				isUnconditionalBranch(). */
				LLVM_DEBUG(dbgs() << " copyInstructionsFromPred(): found a JMP, "
				"so not copying it in vector.body\n");
				continue;
				}

				/* IMPORTANT note: EmitInstruction() fails for ISD::INLINEASM
				EmitInstruction(&predMI);
				*/

				/* See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				MachineInstr *CloneMachineInstr(const MachineInstr *Orig);
				CloneMachineInstr - Create a new MachineInstr which is a
				copy of the 'Orig' instruction, identical in all ways except
				the instruction has no parent, prev, or next.
				*/
				MachineInstr *newPredMI = MF.CloneMachineInstr(predMI);

				//MBB.insert(MBB.front(), &predMI);
				// Gives error: "Assertion `!N->getParent() &&
				// "machine instruction already in a basic block"' failed."
				MBB.insert(MBB.front(), newPredMI);
				}

				#ifdef NNNNO
				/*
				* I guess normally we should have 2 predecessors, but since I mess
				* up in LoopVectorize.cpp the vector.body block in some cases
				* (e.g., with a few iterations, in the order of magnitude of the
				* vector unit width) it can remain with only 1 predecessor.
				*/
				assert(numPredecessors <= 2 &&
				"vector.body should have at most 2 predecessors: itself and one more");
				#endif
				}


				// IMPORTANT: We copy from successor BB (middle.block) to vector.body BB
				void CopyInstructionsFromSucc(MachineFunction &MF, MachineBasicBlock &MBB) {
				LLVM_DEBUG(dbgs() << " CopyInstructionsFromSucc(): Move code from succ of block "
				<< MBB.getName().data() << "\n");

				int numSuccessors = 0;

				for (auto succMBB : MBB.successors()) {
				numSuccessors++;

				StringRef strSuccMBB = succMBB->getName();
				LLVM_DEBUG(dbgs() << " CopyInstructionsFromSucc(): strSuccMBB = "
				<< strSuccMBB << "\n");

				/*
				if (isVectorBody(strPredMBB) == true)
				continue;
				*/


				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				/* (See also https://fossies.org/linux/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp
				* method DeadMachineInstructionElim::runOnMachineFunction() for
				* an example of iteration backwards).
				*/
				//for (auto &predMI : (*predMBB))
				unsigned counterSuccMBB = 0;

				// Forward iteration here (begin()/end()), unlike the reverse
				// iteration in copyInstructionsFromPred()
				for (auto succMIItr = succMBB->begin();
				succMIItr != succMBB->end();
				succMIItr++, counterSuccMBB++) {
				MachineInstr *succMI = &(*succMIItr);

				LLVM_DEBUG(dbgs() << " CopyInstructionsFromSucc(): succMI = "
				<< *succMI << "\n");

				/*
				* We avoid the last instruction of predMBB, since it is an
				* unconditional JMP
				*/
				if (
				// counterSuccMBB == 0 &&
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				(succMI->isUnconditionalBranch() ||
				succMI->isConditionalBranch()) ) { // predMBB->size())
				/* For llc -O3 it removes the JMP at the end of
				vector.ph, hence it merges it with vector.body,
				even if it leaves the entry label of vector.body.
				So we need to check if predMI is JMP with
				isUnconditionalBranch(). */
				LLVM_DEBUG(dbgs() << "CopyInstructionsFromSucc(): found a JMP, "
				"so not copying it in vector.body\n");
				continue;
				}

				/* IMPORTANT note: EmitInstruction() fails for ISD::INLINEASM
				EmitInstruction(&predMI);
				*/

				/* See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				MachineInstr *CloneMachineInstr(const MachineInstr *Orig);
				CloneMachineInstr - Create a new MachineInstr which is a
				copy of the 'Orig' instruction, identical in all ways except
				the instruction has no parent, prev, or next.
				*/
				MachineInstr *newSuccMI = MF.CloneMachineInstr(succMI);

				// Gives error: "Assertion `!N->getParent() && "machine instruction already in a basic block"' failed."
				//MBB.insert(MBB.front(), &predMI);
				MBB.insert(MBB.back(), newSuccMI);
				}

				// Instead of break we should check if predMBB is the BB "just"
				// above predMBBGood or below
				break;
				}

				assert(numSuccessors == 1);
				} // END CopyInstructionsFromSucc()


				//#define TRY_DFS
				#ifdef TRY_DFS
				#define RPO
				arsenm (Not Done): More macros to remove
				std::map<MachineBasicBlock *, bool> visitedMBB;
				std::vector<MachineBasicBlock *> sortedListMBB;

				void DFS(MachineBasicBlock *n) {
				// See http://www.cplusplus.com/reference/map/map/count/
				if (visitedMBB.count(n) != 0)
				return;

				// See http://www.cplusplus.com/reference/map/map/insert/
				visitedMBB.insert(std::pair<MachineBasicBlock *, bool>(n, true));

				#ifndef RPO
				sortedListMBB.push_back(n);
				#endif

				StringRef strN = n->getName();
				LLVM_DEBUG(dbgs() << "DFS(): BB name: = " << strN
				<< ", n = " << n << "\n");

				#ifdef NNNO
				// If in the successors we have vector.ph, vector.body, etc we choose those
				// first.
				for (auto MBB : n->successors()) {
				StringRef strMBB = MBB->getName();
				/*
				LLVM_DEBUG(dbgs() << "DFS(): BB name: = " << strMBB
				<< ", MBB = " << MBB << "\n");
				*/
				if (strMBB.equals(StringRef("min.iters.checked")) ||
				// somewhat-IMPORTANT-TODO: check only for "vector.*" not for all these below
				strMBB.equals(StringRef("vector.memcheck")) ||
				strMBB.equals(StringRef("vector.ph")) ||
				strMBB.equals(StringRef("vector.body.preheader")) ||
				strMBB.equals(StringRef("vector.body"))) {
				DFS(MBB); // This will update visitedMBB to avoid further visits
				}
				}
				#endif

				//for (auto &MBB : n->successors())
				for (auto MBB : n->successors()) {
				/*
				const char *strMBB = MBB->getName().data();
				LLVM_DEBUG(dbgs() << "DFS(): BB name: = " << strMBB
				<< ", MBB = " << MBB << "\n");
				*/
				DFS(MBB);
				}

				#ifdef RPO
				sortedListMBB.push_back(n);
				#endif
				}
				#endif // TRY_DFS
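
The TRY_DFS block computes a post-order with a recursive DFS and, when RPO is defined, reads it backwards to obtain a reverse post-order. A macro-free sketch of that idea follows. It works on a plain adjacency list rather than on MachineBasicBlock, and the names `Graph`, `postOrderVisit` and `reversePostOrder` are hypothetical. LLVM itself ships `ReversePostOrderTraversal` in `llvm/ADT/PostOrderIterator.h`, which may be a better fit inside the backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Succ[N] lists the successors of node N (stand-in for MBB->successors()).
using Graph = std::vector<std::vector<std::size_t>>;

inline void postOrderVisit(const Graph &Succ, std::size_t N,
                           std::vector<bool> &Visited,
                           std::vector<std::size_t> &PostOrder) {
  if (Visited[N])
    return;
  Visited[N] = true;
  for (std::size_t S : Succ[N])
    postOrderVisit(Succ, S, Visited, PostOrder);
  PostOrder.push_back(N); // appended only after all successors are done
}

// Returns the nodes reachable from Entry in reverse post-order.
inline std::vector<std::size_t> reversePostOrder(const Graph &Succ,
                                                 std::size_t Entry) {
  assert(Entry < Succ.size() && "entry node out of range");
  std::vector<bool> Visited(Succ.size(), false);
  std::vector<std::size_t> Order;
  postOrderVisit(Succ, Entry, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

Keeping the visited set and the result vector local to the call, instead of in file-level globals such as `visitedMBB` and `sortedListMBB`, also removes the need to clear them between functions.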


				/// Emit the specified function out to the OutStreamer.
				bool runOnMachineFunction(MachineFunction &MF) override {
				LLVM_DEBUG(dbgs()
				<< "Entered ConnexAsmPrinter::runOnMachineFunction().\n");
				LLVM_DEBUG(dbgs() << " EnableCorrectBBsASMPrint = "
				<< EnableCorrectBBsASMPrint << "\n");

				MachineBasicBlock *entryMBB = NULL;

				#ifdef TRY_DFS
				LLVM_DEBUG(dbgs() << "Printing the MBBs, as they are ordered now:\n");

				/* Looking at http://llvm.org/doxygen/classllvm_1_1MachineFunction.html
				* it seems it's not possible to obtain the root(s) of the MB otherwise.
				*/
				for (auto &MBB : MF) {
				if (entryMBB == NULL)
				entryMBB = &MBB;
				StringRef strMBB = MBB.getName();
				LLVM_DEBUG(dbgs() << " BB name: = " << strMBB << "\n");
				}
				//
				visitedMBB.clear();
				sortedListMBB.clear();
				DFS(entryMBB);
				//
				#ifdef RPO
				LLVM_DEBUG(dbgs() << "ConnexAsmPrinter: (RPO) sortedListMBB = \n");

				for (int idxSListMBB = sortedListMBB.size() - 1;
				idxSListMBB >= 0; idxSListMBB--) {
				MachineBasicBlock *MBB = sortedListMBB[idxSListMBB];
				StringRef strMBB = MBB->getName();
				LLVM_DEBUG(dbgs() << " BB name: = " << strMBB
				<< ", MBB = " << MBB << "\n");
				}
				#else
				LLVM_DEBUG(dbgs() << "ConnexAsmPrinter: sortedListMBB = \n");

				for (auto &MBB : sortedListMBB) {
				StringRef strMBB = MBB->getName();
				LLVM_DEBUG(dbgs() << " BB name: = " << strMBB
				<< ", MBB = " << MBB << "\n");
				}
				#endif

				/*
				LLVM_DEBUG(dbgs()
				<< "Printing the MBBs, as they are ordered after MF.sort():\n");

				for (auto &MBB : MF) {
				StringRef strMBB = MBB.getName();
				LLVM_DEBUG(dbgs() << " BB name: = " << strMBB << "\n");
				}
				*/
				#endif // TRY_DFS

				int numVectorizedLoops = 0;
				bool TreatRepeat2ndInnerLoopGlobal;

				// We read from startLoc.txt the configuration of the loop nests
				// in order to fill correctly the std::vector treatRepeat2ndInnerLoop.
				readStartLocFile(const_cast<char *>("startLoc.txt"), true);
				LLVM_DEBUG(dbgs()
				<< "runOnMachineFunction(): treatRepeat2ndInnerLoop.size() = "
				<< treatRepeat2ndInnerLoop.size() << "\n");

				if (EnableCorrectBBsASMPrint) {
				// processFunction() just updates mapLD_ST_REPEAT_InlineAsm for the
				// given function.
				processFunction(&MF);

				this->MF = &MF;

				// Inspired from ConnexRegisterInfo.cpp:
				//const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();

				// Inspired from http://llvm.org/docs/doxygen/html/AsmPrinter_8cpp_source.html:

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				for (auto &MBB : MF) {
				if (numVectorizedLoops >= (int)treatRepeat2ndInnerLoop.size())
				TreatRepeat2ndInnerLoopGlobal = false;
				else
				TreatRepeat2ndInnerLoopGlobal = treatRepeat2ndInnerLoop[numVectorizedLoops];

				LLVM_DEBUG(dbgs()
				<< "runOnMachineFunction(): TreatRepeat2ndInnerLoopGlobal = "
				<< TreatRepeat2ndInnerLoopGlobal << "\n");
				LLVM_DEBUG(dbgs() << "runOnMachineFunction(): numVectorizedLoops = "
				<< numVectorizedLoops << "\n");

				if (TreatRepeat2ndInnerLoopGlobal == true) {
				// TODO: think a bit: we should always call MoveToFrontRepeat() - we complicate a bit, BUT it is highly unlikely to have a REPEAT() after the last vector.body
				// A bit inefficient - we try all MBB
				MoveToFrontRepeat(&MBB);
				}
				else {
				// If we do this we risk to have comments like "Map/Reduction part"
				// after the REPEAT Opincaa instruction.
				MoveToFrontRepeat(&MBB);
				}

				// We take care to put the beginning marker for Opincaa kernel at the
				// very front of its basic block, MBB - we try all MBBs.
				LLVM_DEBUG(dbgs() <<
				"Calling MoveToFrontInlineAsm(STR_OPINCAA_CODE_BEGIN)\n");
				MoveToFrontInlineAsm(&MBB, const_cast<char *>(STR_OPINCAA_CODE_BEGIN));
				LLVM_DEBUG(dbgs() <<
				"Finished calling MoveToFrontInlineAsm(STR_OPINCAA_CODE_BEGIN)\n");

				if (isVectorBody(MBB.getName()) == false)
				continue;

				numVectorizedLoops++;

				//MoveToFrontRepeat(MBB);
				//
				//ReplaceWithSymbolicIndex(&MBB);
				/* IMPORTANT:
				arsenm (Done): You shouldn't be looking at the .data() on a StringRef as if it were a c string like this should just use the StringRef directly.
				* We move the Inline ASM expressions to the beginning of the BB,
				* by using MoveToFront(),
				* such that, immediately after (see code below) we put the
				* instructions of the predecessor of the vector.body BB
				* at the top and then call MoveToFront(&MBB, true) again
				* to make the code OK.
				*/
				//MoveToFront(&MBB, false);

				MachineBasicBlock *predMBBGood;
				int numPredecessors = 0;
				for (auto predMBB : MBB.predecessors()) {
				numPredecessors++;

				if (isVectorBody(predMBB->getName()) == true)
				continue;
				else
				predMBBGood = predMBB;
				}

				// I guess normally we should have 2 predecessors, but since I mess
				// up in LoopVectorize.cpp the vector.body block in some cases
				// (e.g., with a few iterations, in the order of magnitude of the
				// vector unit width) it can remain with only 1 predecessor.
				assert(numPredecessors <= 2 && "vector.body should have at most "
				"2 predecessors: itself and one more");

				if (TreatRepeat2ndInnerLoopGlobal == false) {
				//copyInstructionsFromPred(MF, MBB, predMBBGood);

				// We move the header of the Opincaa kernel
				MoveToFront(predMBBGood, true);
				}

				// Does NOT help: MoveToFront(&MBB, true);
				LLVM_DEBUG(dbgs() <<
				" runOnMachineFunction(): calling MoveToFrontInlineAsm(&MBB)\n");
				//MoveToFront(&MBB, false);
				MoveToFrontInlineAsm(&MBB, const_cast<char *>("for ("));

				if (TreatRepeat2ndInnerLoopGlobal == true) {
				MoveToBackLastInlineAsm(&MBB);
				}
				} // END for (auto &MBB : MF)
				} // end if EnableCorrectBBsASMPrint

				SetupMachineFunction(MF);
				EmitFunctionBody();

				return false;
				} // end bool runOnMachineFunction(MachineFunction &MF)


				void printOperand(const MachineInstr *MI, int OpNum, raw_ostream &O,
				const char *Modifier = nullptr);


				void EmitInstruction(const MachineInstr *MI) override;


				// Taken from MSP430 back end
				void printSrcMemOperand(const MachineInstr *MI, int OpNum,
				raw_ostream &O);


				// processFunction() just updates mapLD_ST_REPEAT_InlineAsm.
				void processFunction(const MachineFunction *MF) {
				LLVM_DEBUG(dbgs() << "Entered processFunction()\n");

				for (auto &MBB : *MF) {
				for (auto MIItr = MBB.begin(); MIItr != MBB.end(); ++MIItr) {
				const MachineInstr *MI = &(*MIItr);

				LLVM_DEBUG(dbgs() << "processFunction(): MI = "
				<< *MI << "\n");

				if (MI->isInlineAsm()) {
				// TODO TODO: check also that the InlineAsm contains the substring "note that this line is normally NOT printed in the final .cpp"
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				MachineBasicBlock::const_iterator MIItr2 = MIItr;
				// TODO TODO: check for more instr, not just the next... it should help...
				MIItr2++;

				const MachineInstr *MI2 = &(*MIItr2);
				LLVM_DEBUG(dbgs() << "processFunction(): MI2->getOpcode() = "
				<< MI2->getOpcode() << "\n");
				LLVM_DEBUG(dbgs() << "processFunction(): MI2 = "
				<< MI2 << "\n");

				if (MI2->getOpcode() == 0) {
				/* It crashes when giving dbgs << *MI2, unfortunately...
				This case happens since I changed how I treat the
				writeDataToArray...() primitives in LoopVectorize.cpp
				because now I don't put them at the beginning of
				vector.body. */
				}
				else {
				LLVM_DEBUG(dbgs() << "processFunction(): MI2 = "
				<< *MI2 << "\n");
				}

				bool validCase = false;
				if ((MI2->getOpcode() == Connex::LD_H) ||
				(MI2->getOpcode() == Connex::ST_H) ||
				(MI2->getOpcode() == Connex::REPEAT)) {
				validCase = true;
				}
				else
				if (MI2->getOpcode() == Connex::VLOAD_H) {
				MIItr2++;
				MI2 = &(*MIItr2);

				if (MI2->getOpcode() == Connex::ST_H) {
				// TODO TODO TODO TODO: verify ALSO that dest vector register of MI2 (VLOAD_H) is used in ST_H instruction
				validCase = true;
				}
				}

				if (validCase) {
				if (ifImmSpecialUpdateMap(MI, MI2) == -1) {
				/* For test 300_Opincaa_BUG_Connex/STDerr_llc_01
				we require to look 1 more instruction.
				*/
				MIItr2++;
				MI2 = &(*MIItr2);

				if ((MI2->getOpcode() == Connex::LD_H) ||
				(MI2->getOpcode() == Connex::ST_H)) {
				//validCase = true;
				ifImmSpecialUpdateMap(MI, MI2);
				}
				}
				}
				}
				}
				}
				} // END processFunction()


				bool /*ConnexAsmPrinter::*/ PrintAsmMemoryOperand(const MachineInstr *MI,
				unsigned OpNo,
				unsigned AsmVariant,
				const char *ExtraCode,
				raw_ostream &OS) {
				LLVM_DEBUG(dbgs() << "Entered PrintAsmMemoryOperand()\n");
				return false;
				}


				bool /* ConnexAsmPrinter:: */ PrintAsmOperand(const MachineInstr *MI,
				unsigned OpNo,
				unsigned AsmVariant,
				const char *ExtraCode,
				raw_ostream &OS) {
				LLVM_DEBUG(dbgs() << "Entered PrintAsmOperand()\n");
				return false;
				}


				void PrintSpecial(const MachineInstr *MI, raw_ostream &OS,
				const char *Code) const {
				LLVM_DEBUG(dbgs() << "Entered PrintSpecial()\n");
				}


				void printOffset(int64_t Offset, raw_ostream &OS) const {
				LLVM_DEBUG(dbgs() << "Entered printOffset()\n");
				}


				// Note: NOT called
				void EmitInt32(int Value) const {
				LLVM_DEBUG(dbgs() << "Entered EmitInt32()\n");
				}
				}; // END class ConnexAsmPrinter

				} // END namespace


				/*
				// From [LLVM]/llvm38Nov2016/llvm/lib/Target/Mips/MipsAsmPrinter.cpp
				void ConnexAsmPrinter::printUnsignedImm(const MachineInstr *MI, int opNum,
				raw_ostream &O) {
				const MachineOperand &MO = MI->getOperand(opNum);
				if (MO.isImm())
				O << (unsigned short int)MO.getImm();
				else
				printOperand(MI, opNum, O);
				}

				// From [LLVM]/llvm38Nov2016/llvm/lib/Target/Mips/MipsAsmPrinter.cpp
				void ConnexAsmPrinter::printUnsignedImm8(const MachineInstr *MI, int opNum,
				raw_ostream &O) {
				const MachineOperand &MO = MI->getOperand(opNum);
				if (MO.isImm())
				O << (unsigned short int)(unsigned char)MO.getImm();
				else
				printOperand(MI, opNum, O);
				}
				*/


				// TODO: remove since it seems it's NOT called
				void ConnexAsmPrinter::printOperand(const MachineInstr *MI, int OpNum,
				raw_ostream &O, const char *Modifier) {
				LLVM_DEBUG(dbgs() << "Entered ConnexAsmPrinter::printOperand()\n");
				const MachineOperand &MO = MI->getOperand(OpNum);

				switch (MO.getType()) {
				case MachineOperand::MO_Register:
				O << ConnexInstPrinter::getRegisterName(MO.getReg());
				break;

				case MachineOperand::MO_Immediate: {
				unsigned imm = MO.getImm();
				LLVM_DEBUG(dbgs() << "printOperand(): imm = " << imm << "\n");

				if (imm == CONNEX_MEM_NUM_ROWS + 10) {
				O << STR_LOOP_SYMBOLIC_INDEX;
				}
				else {
				O << MO.getImm();
				}
				//O << MO.getImm();
				break;
				}

				case MachineOperand::MO_MachineBasicBlock:
				O << *MO.getMBB()->getSymbol();
				break;

				case MachineOperand::MO_GlobalAddress:
				O << *getSymbol(MO.getGlobal());
				break;

				default:
				llvm_unreachable("<unknown operand type>");
				}
				}


				void ConnexAsmPrinter::printSrcMemOperand(const MachineInstr *MI, int OpNum,
				raw_ostream &O) {
				const MachineOperand &Base = MI->getOperand(OpNum);
				const MachineOperand &Disp = MI->getOperand(OpNum+1);

				// Print displacement first

				// Imm here is in fact global address - print extra modifier.
				if (Disp.isImm() && !Base.getReg())
				O << '&';

				printOperand(MI, OpNum+1, O, "nohash");

				// Print register base field
				if (Base.getReg()) {
				O << '(';
				printOperand(MI, OpNum, O);
				O << ')';
				}
				}


				void ConnexAsmPrinter::EmitInstruction(const MachineInstr *MI) {
				LLVM_DEBUG(dbgs() << "Entered ConnexAsmPrinter::EmitInstruction()...\n");

				/* Inspired from lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
				(actually it's class AMDGPUAsmPrinter)
				*/
				if (MI->isBundle()) {
				LLVM_DEBUG(dbgs() << " EmitInstruction(): handling bundle\n");
				const MachineBasicBlock *MBB = MI->getParent();
				//MachineBasicBlock::const_instr_iterator I = ++MI->getIterator();
				MachineBasicBlock::const_instr_iterator I = MI->getIterator();
				I++;
				// THIS cycles ~forever... EmitInstruction(& (*I) );

				/*
				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				bool isInsideBundle () const
				Return true if MI is in a bundle (but not the first MI in a bundle).
				*/
				while (I != MBB->instr_end() && I->isInsideBundle()) {
				EmitInstruction(& (*I) );
				++I;
				}

				// Prints wrong instructions: EmitInstruction(& (*I) );
				return;
				}

				//#ifdef ORIGINAL_CODE
				ConnexMCInstLower MCInstLowering(OutContext, *this);

				MCInst TmpInst;
				MCInstLowering.Lower(MI, TmpInst);

				crtMI = MI;

				EmitToStreamer(*OutStreamer, TmpInst);

				//OutStreamer->EmitInstruction(MIPred, getSubtargetInfo());
				//#endif

				//AsmPrinter::EmitInstruction(MI);
				} // END ConnexAsmPrinter::EmitInstruction()


				// Force static initialization.
				extern "C" void LLVMInitializeConnexAsmPrinter() {
				RegisterAsmPrinter<ConnexAsmPrinter> Z(TheConnexTarget);
				}
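The `processFunction()` listing above advances `MIItr2` past the current instruction (up to three times) and dereferences it without comparing against `MBB.end()`. The following is a minimal sketch of a bounded lookahead, written against `std::list` rather than `MachineBasicBlock` so that it compiles on its own. `Instr`, the `OP_*` opcodes, `peekAhead()` and `countAsmVloadStore()` are all hypothetical names for illustration, not part of the patch or of LLVM, and the sketch does not claim to explain the crash mentioned in the comments.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical stand-in for MachineInstr: only the opcode matters here.
struct Instr {
  int opcode;
};

// Hypothetical opcodes, not the real Connex:: enumerators.
enum : int { OP_INLINEASM = 1, OP_VLOAD_H = 2, OP_ST_H = 3, OP_OTHER = 4 };

// Returns a pointer to the instruction `distance` positions after `it`,
// or nullptr if the block ends before that position is reached.
inline const Instr *peekAhead(const std::list<Instr> &block,
                              std::list<Instr>::const_iterator it,
                              std::size_t distance) {
  for (std::size_t i = 0; i < distance; ++i) {
    if (it == block.end())
      return nullptr;
    ++it;
  }
  return it == block.end() ? nullptr : &*it;
}

// Counts inline-asm instructions that are directly followed by a VLOAD_H
// and then an ST_H, never dereferencing past the end of the block.
inline int countAsmVloadStore(const std::list<Instr> &block) {
  int matches = 0;
  for (auto it = block.begin(); it != block.end(); ++it) {
    if (it->opcode != OP_INLINEASM)
      continue;
    const Instr *next = peekAhead(block, it, 1);
    const Instr *afterNext = peekAhead(block, it, 2);
    if (next && afterNext && next->opcode == OP_VLOAD_H &&
        afterNext->opcode == OP_ST_H)
      ++matches;
  }
  return matches;
}
```

An inline-asm instruction at the very end of a block simply yields `nullptr` from `peekAhead()`, so the caller can skip it instead of reading the end sentinel.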

lib/Target/Connex/ConnexAsmPrinterLoopNests.h

				//===-- ConnexAsmPrinterLoopNests.h - ------ C++ ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				arsenm (inline comment, marked Not Done): I don't really understand anything that's going on in this file. You shouldn't have file IO, or globals.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// This file implements reading the startLoc.txt file with info about the start
				/// and end locations of loop nests, generated by the LoopVectorize pass.
				// Used by ConnexAsmPrinter.cpp and ReplaceLoopsWithOpincaaKernels.cpp.
				//===----------------------------------------------------------------------===//

				#ifndef CONNEX_ASM_PRINTER_LOOP_NESTS_H
				#define CONNEX_ASM_PRINTER_LOOP_NESTS_H

				// Used by ReplaceLoopsWithOpincaaKernels.cpp and ConnexAsmPrinter.cpp

				std::vector<bool> treatRepeat2ndInnerLoop;
				// The start and end of the innermost (or 2nd innermost) loop
				std::vector<int> linStart, colStart, linEnd, colEnd;
				//
				std::vector<int> linStartLoopNest, colStartLoopNest, linEndLoopNest, colEndLoopNest;

				/*
				We read into vectors the lines and columns of the innermost loop
				and, if there is one, also of the outermost loop
				of the loop nests specified in the startLoc.txt file.
				For each loop nest we push true into the treatRepeat2ndInnerLoop vector
				if the nest has more than 1 loop, and false otherwise.

				Note: We keep the numbering from 1 throughout the ENTIRE program,
				BUT in FindEndLoop() we decrement the value.
				*/
				void readStartLocFile(char *fileNameSrc, bool silentFail=false) {
				int index;
				char str[MAXLEN_STR];

				int linStartTmp, colStartTmp;
				int linEndTmp, colEndTmp;

				FILE *fin = fopen(fileNameSrc, "rt");

				/* We need to process each loop, from the last in the file to the first,
				thereby preserving the line & column numbers of the loops that
				remain to be replaced.
				*/
				if (silentFail) {
				if (fin == NULL) {
				printf("startLoc.txt file NOT found (maybe NO loop was vectorized)");
				return;
				}
				}
				assert(fin != NULL &&
				"readStartLocFile(): fileNameSrc (e.g., startLoc.txt) file NOT found (maybe NO loop was vectorized). "
				"Anyhow cannot automatically replace in source file vectorized loops with Opincaa kernels.");

				//for (index = 0; index < replaceString.size(); index++)
				for (index = 0; ; index++) {
				// We read the line with the C++ comment and discard it
				if (fgets(str, MAXLEN_STR - 1, fin) == NULL)
				break;

				printf("str = %s\n", str);
				fflush(stdout);

				// We read the coordinates of the innermost loop of the crt nest
				fscanf(fin, "%d %d %d %d\r\n", &linStartTmp, &colStartTmp,
				&linEndTmp, &colEndTmp);
				//
				printf("readStartLocFile(): index = %d\n", index);

				printf("readStartLocFile(): (linStart = %d, colStart = %d) -> "
				"(linEndTmp = %d, colEndTmp = %d)\n",
				linStartTmp, colStartTmp, linEndTmp, colEndTmp);
				fflush(stdout);
				//
				linStart.push_back(linStartTmp);
				colStart.push_back(colStartTmp);
				linEnd.push_back(linEndTmp);
				colEnd.push_back(colEndTmp);
				assert(linStartTmp <= linEndTmp);

				// We check if the next line is one with C++ comment
				int ch = getc(fin);
				ungetc(ch, fin);

				printf("readStartLocFile(): ch = %d\n", (int)ch);
				fflush(stdout);

				if ((ch == '/') || (ch == -1)) {
				treatRepeat2ndInnerLoop.push_back(false);

				linStartLoopNest.push_back(-1);
				colStartLoopNest.push_back(-1);
				linEndLoopNest.push_back(-1);
				colEndLoopNest.push_back(-1);
				}
				else {
				// We read the coordinates of the outermost loop of the crt nest
				treatRepeat2ndInnerLoop.push_back(true);

				fscanf(fin, "%d %d %d %d\r\n", &linStartTmp, &colStartTmp,
				&linEndTmp, &colEndTmp);
				printf("readStartLocFile(): (linStart = %d, colStart = %d) -> "
				"(linEndTmp = %d, colEndTmp = %d)\n",
				linStartTmp, colStartTmp, linEndTmp, colEndTmp);
				fflush(stdout);

				linStartLoopNest.push_back(linStartTmp);
				colStartLoopNest.push_back(colStartTmp);
				linEndLoopNest.push_back(linEndTmp);
				colEndLoopNest.push_back(colEndTmp);
				}

				printf("readStartLocFile(): treatRepeat2ndInnerLoop[%d] = %d\n",
				index, (int)treatRepeat2ndInnerLoop[index]);
				fflush(stdout);
				}

				fclose(fin);
				} // END readStartLocFile()

				#endif // end CONNEX_ASM_PRINTER_LOOP_NESTS_H
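The record layout that `readStartLocFile()` expects (a comment line, one line of four integers for the innermost loop, and an optional second line of four integers for the outer loop) can also be parsed without globals or `FILE *`. The sketch below is one possible shape under that assumption about the file format. `LoopRange`, `LoopNestRecord`, `parseRange()` and `readLoopNests()` are hypothetical names, not part of the patch, and the function reads from any `std::istream` so it can be exercised without a file on disk.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Source range of one loop, using 1-based line and column numbers.
struct LoopRange {
  int linStart, colStart, linEnd, colEnd;
};

// One record: the innermost loop plus an optional outer loop.
struct LoopNestRecord {
  LoopRange inner;
  bool hasOuter;   // plays the role of treatRepeat2ndInnerLoop[i]
  LoopRange outer; // all fields are -1 when hasOuter is false
};

// Parses four integers from one line; returns false on malformed input.
inline bool parseRange(const std::string &line, LoopRange &out) {
  std::istringstream fields(line);
  return static_cast<bool>(fields >> out.linStart >> out.colStart >>
                           out.linEnd >> out.colEnd);
}

// Reads records of the form:
//   // comment line
//   linStart colStart linEnd colEnd     (innermost loop)
//   linStart colStart linEnd colEnd     (optional outer loop)
// Returns false if a coordinate line is missing or malformed.
inline bool readLoopNests(std::istream &in, std::vector<LoopNestRecord> &out) {
  std::string line;
  bool havePending = static_cast<bool>(std::getline(in, line));
  while (havePending) {
    // `line` holds the comment line of the current record; it is discarded.
    LoopNestRecord rec{{0, 0, 0, 0}, false, {-1, -1, -1, -1}};
    if (!std::getline(in, line) || !parseRange(line, rec.inner))
      return false;
    havePending = static_cast<bool>(std::getline(in, line));
    // A following line that does not start a comment is the outer loop.
    if (havePending && !line.empty() && line[0] != '/') {
      if (!parseRange(line, rec.outer))
        return false;
      rec.hasOuter = true;
      havePending = static_cast<bool>(std::getline(in, line));
    }
    out.push_back(rec);
  }
  return true;
}
```

Unlike the `fscanf()` calls in the listing, every read here is checked, and the result is returned to the caller rather than stored in header-level vectors.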

lib/Target/Connex/ConnexConfig.h

				#ifndef CONNEX_CONFIG_ALEX
				#define CONNEX_CONFIG_ALEX

				// This file is used by ConnexISelDAGToDAG.cpp, ConnexISelLowering.h,
				arsenm (inline comment, marked Done): These macros mostly seem like a waste of effort.
				// ReplaceLoopsWithOpincaaKernels.cpp.

				// The macros in this header file are strategic, in the sense that the back end
				// could target a Connex vector processor of different vector length.
				// There are also some other important macros like: CONNEX_MEM_NUM_ROWS_EXTRA
				// (used to keep spilled registers, or tables for f16 operations like sqrt
				// or div, etc), STR_OPINCAA, etc.


				// These 2 types are defined also in Opincaa lib, in include/Architecture.h
				typedef short TypeElement;
				typedef unsigned short UnsignedTypeElement;


				// The vector length of the Connex back end, which could be different
				// from the actual vector length of the Connex processor.
				#define CONNEX_VECTOR_LENGTH 8

				#define TYPE_SIZEOF 2
				#define CONNEX_LINE_SIZE (CONNEX_VECTOR_LENGTH * TYPE_SIZEOF)

				//#define STR_LOOP_SYMBOLIC_INDEX "indexLLVM_LV / CONNEX_VECTOR_LENGTH"
				// NOTE: make sure it is equivalent to the above commented macro
				// NOTE: keep the parentheses since >> has low operator precedence
				#define STR_LOOP_SYMBOLIC_INDEX "(indexLLVM_LV >> 7)"

				// This is the type of the scalar processor (normally the BPF processor) operand
				// TODO_CHANGE_BACKEND:
				#define TYPE_SCALAR_ELEMENT MVT::i64
				//#define TYPE_ELEMENT MVT::i32

				//#define TYPE_VECTOR MVT::v8i64
				//#define TYPE_VECTOR MVT::v16i32
				//#define TYPE_VECTOR MVT::v32i16
				//#define TYPE_VECTOR_I16 MVT::v128i16
				#define TYPE_VECTOR_I16 MVT::v8i16
				//#define TYPE_VECTOR_ELEMENT MVT::i64
				#define TYPE_VECTOR_I16_ELEMENT MVT::i16

				//#define TYPE_VECTOR_I32 MVT::v64i32
				#define TYPE_VECTOR_I32 MVT::v4i32
				#define TYPE_VECTOR_I32_ELEMENT MVT::i32

				//#define TYPE_VECTOR_F16 MVT::v128f16
				#define TYPE_VECTOR_F16 MVT::v8f16
				#define TYPE_VECTOR_F16_ELEMENT MVT::f16


				#define TYPE_VECTOR_I16_ELEMENT_BITSIZE 16
				#define TYPE_VECTOR_I32_ELEMENT_BITSIZE 32
				#define TYPE_VECTOR_F16_ELEMENT_BITSIZE 16


				#define CONNEX_MEM_NUM_ROWS 1024
				// For 64 lanes: #define CONNEX_MEM_NUM_ROWS 2048
				// Extra LS memory for spills and LUTs for div/sqrt.f16, etc
				#define CONNEX_MEM_NUM_ROWS_EXTRA 200

				// NOTE: normally REPEAT accepts immediates in interval 0..1023
				#define VALUE_BOGUS_REPEAT_X_TIMES 32761


				//#ifndef MAXLEN_STR
				#define MAXLEN_STR 8192
				//#endif

				// Used in ConnexAsmPrinter.cpp and LoopVectorize.cpp
				#define STR_OPINCAA_CODE_BEGIN "// START_OPINCAA_HOST_DEVICE_CODE"
				#define STR_OPINCAA_CODE_END "// END_OPINCAA_HOST_DEVICE_CODE"

				#define STR_OPINCAA_KERNEL_REDUCE_BEFORE_END "REDUCE R(0); // We add a 'bogus' REDUCE to wait for it"

				#endif
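The note on `STR_LOOP_SYMBOLIC_INDEX` asks that the shift stay equivalent to dividing by `CONNEX_VECTOR_LENGTH`. For a power-of-two length the shift amount is the base-2 logarithm, so `>> 7` corresponds to a length of 128 (as in the commented-out `v128i16` type), while a length of 8 corresponds to `>> 3`. The header does not say whether the mismatch with the current value of 8 is intentional. The sketch below shows one way to derive the shift from the length at compile time; `log2Floor()`, `kVectorLength`, `rowIndex()` and `symbolicIndexExpr()` are hypothetical names, not part of the patch.

```cpp
#include <cstdint>
#include <string>

// True if n is a non-zero power of two.
constexpr bool isPowerOfTwo(std::uint32_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Floor of log2(n) for n >= 1 (returns 0 for n <= 1).
constexpr int log2Floor(std::uint32_t n) {
  return n <= 1 ? 0 : 1 + log2Floor(n >> 1);
}

// Mirrors CONNEX_VECTOR_LENGTH from the header above.
constexpr std::uint32_t kVectorLength = 8;
constexpr int kIndexShift = log2Floor(kVectorLength);

static_assert(isPowerOfTwo(kVectorLength),
              "shift and division only agree for power-of-two lengths");
static_assert(kIndexShift == 3, "8 lanes correspond to a shift by 3");
static_assert(log2Floor(128) == 7, "128 lanes correspond to a shift by 7");

// Row index of an element: same value as elementIndex / kVectorLength.
constexpr std::uint32_t rowIndex(std::uint32_t elementIndex) {
  return elementIndex >> kIndexShift;
}

// Builds the symbolic index string for a given power-of-two vector length.
inline std::string symbolicIndexExpr(std::uint32_t vectorLength) {
  return "(indexLLVM_LV >> " + std::to_string(log2Floor(vectorLength)) + ")";
}
```

Deriving the string from the length keeps the two in step when the vector length changes, at the cost of no longer being a plain string literal macro.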

lib/Target/Connex/ConnexFrameLowering.h

				//===-- ConnexFrameLowering.h - Define frame lowering for Connex ------ C++ ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This class implements Connex-specific bits of TargetFrameLowering class.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXFRAMELOWERING_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXFRAMELOWERING_H

				#include "llvm/CodeGen/TargetFrameLowering.h"

				namespace llvm {
				class ConnexSubtarget;

				class ConnexFrameLowering : public TargetFrameLowering {
				public:
				explicit ConnexFrameLowering(const ConnexSubtarget &sti)
				: TargetFrameLowering(TargetFrameLowering::StackGrowsDown, 8, 0) {}

				void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
				void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;

				bool hasFP(const MachineFunction &MF) const override;
				void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
				RegScavenger *RS) const override;

				MachineBasicBlock::iterator
				eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI) const override {
				return MBB.erase(MI);
				}
				};
				}
				#endif

lib/Target/Connex/ConnexFrameLowering.cpp

				//===-- ConnexFrameLowering.cpp - Connex Frame Information ----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of TargetFrameLowering class.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexFrameLowering.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexSubtarget.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"

				using namespace llvm;

				bool ConnexFrameLowering::hasFP(const MachineFunction &MF) const { return true; }

				void ConnexFrameLowering::emitPrologue(MachineFunction &MF,
				MachineBasicBlock &MBB) const {}

				void ConnexFrameLowering::emitEpilogue(MachineFunction &MF,
				MachineBasicBlock &MBB) const {}

				void ConnexFrameLowering::determineCalleeSaves(MachineFunction &MF,
				BitVector &SavedRegs,
				RegScavenger *RS) const {
				TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
				SavedRegs.reset(Connex::R6);
				SavedRegs.reset(Connex::R7);
				SavedRegs.reset(Connex::R8);
				SavedRegs.reset(Connex::R9);
				}

lib/Target/Connex/ConnexHazardRecognizer.h

				//===-- ConnexHazardRecognizer.h - Hazard recognizer for Connex ------ C++ ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				///
				//===----------------------------------------------------------------------===//


				/* Inspired from llvm/lib/Target/PowerPC/PPCHazardRecognizer.h:
				/// PPCDispatchGroupSBHazardRecognizer - This class implements a scoreboard-based
				/// hazard recognizer for PPC ooo processors with dispatch-group hazards.
				*/


				#ifndef LLVM_LIB_TARGET_CONNEX_HAZARDRECOGNIZER_H
				#define LLVM_LIB_TARGET_CONNEX_HAZARDRECOGNIZER_H

				#include "ConnexInstrInfo.h"
				#include "llvm/CodeGen/ScheduleHazardRecognizer.h"
				#include "llvm/CodeGen/ScoreboardHazardRecognizer.h"
				#include "llvm/CodeGen/SelectionDAGNodes.h"

				namespace llvm {

				/* NOTE: ScheduleHazardRecognizer is basically an "interface"
				* (almost abstract, i.e. almost no functionality implemented) class, so better
				* stick with ScoreboardHazardRecognizer if its functionality is OK for me:
				class ConnexDispatchGroupSBHazardRecognizer : public ScheduleHazardRecognizer {
				*/

				/* We choose to inherit the ScoreboardHazardRecognizer because only this
				* performs out-of-order scheduling, and NOT ScheduleHazardRecognizer.
				*/
				class ConnexDispatchGroupSBHazardRecognizer : public ScoreboardHazardRecognizer {
				const ScheduleDAG *DAG;
				bool isDataHazard(SUnit *SU);

				/*
				SmallVector<SUnit *, 7> CurGroup;
				unsigned CurSlots, CurBranches;

				bool isLoadAfterStore(SUnit *SU);
				bool isBCTRAfterSet(SUnit *SU);
				bool mustComeFirst(const MCInstrDesc *MCID, unsigned &NSlots);
				*/

				public:
				ConnexDispatchGroupSBHazardRecognizer(const InstrItineraryData *ItinData,
				const ScheduleDAG *DAG_) :
				ScoreboardHazardRecognizer(ItinData, DAG_), DAG(DAG_)
				//, CurSlots(0), CurBranches(0)
				{
				//DEBUG(dbgs() << "Entered ConnexDispatchGroupSBHazardRecognizer()\n");
				}

				HazardType getHazardType(SUnit *SU, int Stalls) override;

				unsigned PreEmitNoops(SUnit *SU) override;
				/*
				bool ShouldPreferAnother(SUnit* SU) override;
				*/
				void EmitInstruction(SUnit *SU) override;
				/*
				void AdvanceCycle() override;
				void RecedeCycle() override;
				void Reset() override;
				void EmitNoop() override;
				*/
				};

				}

				#endif
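The comments in ConnexHazardRecognizer.cpp describe the hazard that `PreEmitNoops()` and `isDataHazard()` target: a consumer instruction (an indirect load or store, or a WHERE bundle) that reads a register defined by the instruction immediately before it needs one no-op in between. The toy model below illustrates that rule on a flat instruction list. It is a simplified sketch, not the patch's algorithm: the real code works on `SUnit` and `MachineInstr` objects inside the post-RA scheduler, and `ToyKind`, `ToyInstr`, `needsNoopBetween()` and `insertNoops()` are hypothetical names.

```cpp
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
enum class ToyKind { Producer, Consumer, Nop };

struct ToyInstr {
  ToyKind kind;
  int defReg;            // register written, or -1 if none
  std::vector<int> uses; // registers read
};

// True if instruction I reads register reg.
inline bool usesReg(const ToyInstr &I, int reg) {
  if (reg < 0)
    return false;
  for (int u : I.uses)
    if (u == reg)
      return true;
  return false;
}

// True if cur is a consumer that reads the register prev has just defined.
inline bool needsNoopBetween(const ToyInstr &prev, const ToyInstr &cur) {
  return cur.kind == ToyKind::Consumer && usesReg(cur, prev.defReg);
}

// Returns a copy of the sequence with one Nop placed before each consumer
// that depends on the instruction directly preceding it.
inline std::vector<ToyInstr> insertNoops(const std::vector<ToyInstr> &in) {
  std::vector<ToyInstr> out;
  for (const ToyInstr &I : in) {
    if (!out.empty() && needsNoopBetween(out.back(), I))
      out.push_back(ToyInstr{ToyKind::Nop, -1, {}});
    out.push_back(I);
  }
  return out;
}
```

A scheduler that can reorder instructions would prefer to fill the slot with an independent instruction, which is the reason the patch gives for deriving from `ScoreboardHazardRecognizer`; this sketch only covers the fallback of emitting a no-op.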

lib/Target/Connex/ConnexHazardRecognizer.cpp

				//===-- ConnexHazardRecognizer.cpp - Connex Hazard Recognizer Impls --------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the hazard recognizer for scheduling on Connex processors.
				//
				//===----------------------------------------------------------------------===//


				// Inspired from llvm/lib/Target/PowerPC/PPCHazardRecognizer.cpp

				/*
				The delay slot issues that need to be handled are for:
				- where normally; but NOW ([!!!!THINK BETTER - we added support for i32]) we only generate WHERE for the VSELECT LLVM IR instruction
				a bundle of 4 instructions (in ConnexTargetMachine.cpp, passes PassCreateWhereBlocks and PassFinalizeBundles).
				Basically we expand the following pseudo-machine instruction:
				dst = VSELECT pred, true_assignment, false_assignment:
				to the following Connex machine instr:
				(note the comparison is excluded from the bundle -
				it's scheduled before it)
				// For pred == false
				dst = false_assignment
				WHERExy
				// For pred == true:
				dst = true_assignment
				END_WHERE

				// The comparison is excluded from the bundle (SHOULD be scheduled before it)
				predicate-false register assignment
				WHERExy
				predicate-true register assignment
				END_WHERE
				Note: I tried to use TII->PredicateInstruction() but it didn't work - see http://lists.llvm.org/pipermail/llvm-dev/2017-March/111026.html
				- read, write
				- iwrite
				for each operation updating the register used by these instructions just before, which can be:
				iread, vload, ldix, multlo/hi, ldsh, add/c, sub/c, eq/ult/lt, (i)shl, (i)shr, (i)shra, popcount, not/or/and/xor.

				Similarly with the wherexx Connex instruction.

				The point is that we should try NOT to focus on the delay slots of the producer instructions (in number of 24), but focus on these delays at the consumer side because there are only 6 consumer instructions (read/write, iwrite, wherecr/eq/lt).

				Not only that, but we should try to fill the delay slots with instructions in out-of-order fashion.

				Hal Finkel pointed me to lib/Target/PowerPC/PPCHazardRecognizers.cpp:
				On 2/3/2017 10:25 PM, Hal Finkel wrote:
				> Hi Alex,
				> You can program a post-RA scheduler which will return NoopHazard in the appropriate
				> circumstances. You can look at the PowerPC target (e.g.
				> lib/Target/PowerPC/PPCHazardRecognizers.cpp) as an example.

				I guess Hal recommends customizing the post-RA scheduler because after RA we have finished all(?) instruction selection steps and we handle MachineInstr, which makes life simpler for us to see if we have ST_H or ST_INDIRECT, etc.
				See the Figure with passes in \cite{Cardoso_Lopes2014}, page 134.

				\cite{Cardoso_Lopes2014}
				"There are three distinct scheduler executions in the code generator:
				two prior and one post register allocation. The first works on
				SelectionDAG nodes while the other two work on machine
				instructions"

				"The scheduler runs before and after register allocation. However, the SDNode
				instruction representation is only available in the former while the latter uses the
				MachineInstr class. To cope with both SDNodes and MachineInstrs, the SUnit class
				(see the file <llvm_source>/include/llvm/CodeGen/ScheduleDAG.h) abstracts the
				underlying instruction representation as the unit used during instruction scheduling."

				See also http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleHazardRecognizer.html#details
				<<HazardRecognizer - This determines whether or not an instruction can be issued this cycle, and whether or not a noop needs to be inserted to handle the hazard.>>
				*/




				#include "ConnexHazardRecognizer.h"
				#include "Connex.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/CodeGen/ScheduleDAG.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/raw_ostream.h"
				#include "Misc.h" // For dumpSU()

				using namespace llvm;



				#define DEBUG_TYPE "post-RA-sched"

				// getPredMachineInstr() is defined in ConnexInstrInfo.cpp
				extern MachineInstr *getPredMachineInstr(MachineInstr *MI, MachineInstr **succMI);






				/*
				From http://llvm.org/docs/doxygen/html/ScheduleHazardRecognizer_8h_source.html#l00078:
				00073 /// PreEmitNoops - This callback is invoked prior to emitting an instruction.
				00074 /// It should return the number of noops to emit prior to the provided
				00075 /// instruction.
				00076 /// Note: This is only used during PostRA scheduling. EmitNoop is not called
				00077 /// for these noops.
				*
				*/
				unsigned ConnexDispatchGroupSBHazardRecognizer::PreEmitNoops(SUnit *SU) {
				assert(SU->isInstr() == true);

				/*
				MachineInstr *MI = SU->getInstr();
				int MIOpcode = MI->getOpcode();
				if (MIOpcode == Connex::LD_INDIRECT_H)
				*/
				if (isDataHazard(SU))
				return 1;

				return ScoreboardHazardRecognizer::PreEmitNoops(SU);
				}


				bool ConnexDispatchGroupSBHazardRecognizer::isDataHazard(SUnit *SU) {
				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MCInstrDesc.html
				const MCInstrDesc *MCID = DAG->getInstrDesc(SU);
				if (MCID == NULL)
				return false;

				/*
				// Note: MCPhysReg is an integer -
				// see http://llvm.org/docs/doxygen/html/namespacellvm.html:
				// "typedef uint16_t llvm::MCPhysReg"
				const MCPhysReg *MCIDArray = MCID->getImplicitUses();
				unsigned numUses = MCID->getNumImplicitUses(); seems it is always 0
				*/
				//const MCOperandInfo *MCIDArray = MCID->OpInfo;
				unsigned numUses = MCID->getNumOperands() - MCID->getNumDefs();
				//LLVM_DEBUG(dbgs() << " isDataHazard(): SU = " << numUses << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): numUses = " << numUses << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): MCID->getNumOperands() = "
				<< MCID->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): MCID->getNumDefs() = "
				<< MCID->getNumDefs() << "\n");


				assert(SU->isInstr() == true);

				MachineInstr *MI = SU->getInstr();
				LLVM_DEBUG(dbgs() << " isDataHazard(): MI =";
				MI->dump();
				);

				int MIOpcode = MI->getOpcode();
				LLVM_DEBUG(dbgs() << " isDataHazard(): MI->getOpcode() = "
				<< MI->getOpcode() << "\n");


				if (MIOpcode == Connex::ST_INDIRECT_H ||
				MIOpcode == Connex::ST_INDIRECT_W ||
				MIOpcode == Connex::ST_INDIRECT_MASKED_H ||
				MIOpcode == Connex::ST_H) {
				/* NOTE: END_REPEAT returns, to my surprise, also mayStore().
				But we should not worry about this since END_REPEAT takes no
				parameter. */
				/*
				if (MCID->mayStore())
				if (MCID->mayLoad())
				*/
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU is Store\n");
				}
				else
				if (MIOpcode == Connex::LD_INDIRECT_H ||
				MIOpcode == Connex::LD_INDIRECT_W ||
				MIOpcode == Connex::LD_INDIRECT_MASKED_H) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU is Load\n");
				}
				else
				if (
				//assert(MIOpcode != Connex::WHERECRY);
				//MIOpcode == Connex::WHERECRY ||
				MIOpcode == Connex::WHEREEQ_BUNDLE_H ||
				MIOpcode == Connex::WHERELT_BUNDLE_H ||
				MIOpcode == Connex::WHEREULT_BUNDLE_H) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU is Where\n");
				}
				else {
				LLVM_DEBUG(dbgs() << " isDataHazard(): SU NOT producing data hazard\n");

				// VERY IMPORTANT
				return false;
				}

				LLVM_DEBUG(dbgs() << " isDataHazard(): MI->getNumOperands() = "
				<< MI->getNumOperands() << "\n");

				/*
				Why does getHazardType() find 3 Loads - because I was considering pred in DAG (SDNode), not in MachineInstr list, where it should be only 1?

				This should cover these cases described in ConnexISA.docx:
				- (i)write using register defined in the previous instruction:
				LS[R1] = R4
				LS[5] = R1
				and also this slightly different case:
				LS[R10] = R1

				- read using register defined in the previous instruction
				R4 = LS[R1]

				- wherexx using the flag defined in the previous instruction
				R1 = (R2 == R3)
				WHERE_EQUAL
				*/

				/* small-TODO: understand conceptually what PPC was doing with dispatch group.

				IMPORTANT: We keep this search for predecessors of SU in the DAG and not for
				THE only predecessor of the MachineInstr (we are at Post-RA scheduler)
				contained in SU because MAYBE/it is possible that when doing
				ScoreboardHazardRecognizer (out-of-order scheduling to fill delay slots)
				we could benefit from the DAG predecessors - QUITE UNLIKELY, but maybe
				so. Otherwise, we should ONLY look at the
				getPredMachineInstr(MachineInstr *MI).

				For any predecessors of SU with which we
				have an ordering dependency, return true. */
				for (unsigned i = 0, ie = (unsigned) SU->Preds.size(); i != ie; ++i) {
				const MCInstrDesc *PredMCID = DAG->getInstrDesc(SU->Preds[i].getSUnit());

				if (PredMCID == NULL) // || !PredMCID->mayStore())
				continue;

				/* SU->Preds is SmallVector of SDep.
				* - see http://llvm.org/docs/doxygen/html/classllvm_1_1SUnit.html
				* - see http://llvm.org/docs/doxygen/html/classllvm_1_1SDep.html
				*/
				MachineInstr *PredMI = (SU->Preds[i].getSUnit())->getInstr();
				MachineInstr *tmpNotUsed;
				if (PredMI != getPredMachineInstr(MI, &tmpNotUsed)) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): jumping DAG predecessor that is "
				"NOT MachineInstr predecessor: PredMI =";
				PredMI->dump();
				dbgs() << " for MI =";
				MI->dump();
				);
				continue;
				}

				LLVM_DEBUG(dbgs() << " isDataHazard(): Found DAG predecessor that is "
				"MachineInstr predecessor: PredMI =";
				PredMI->dump();
				dbgs() << " for MI =";
				MI->dump();
				);

				LLVM_DEBUG(dbgs() << " isDataHazard(SU->Preds["
				<< i << "] = ";
				PredMI->dump();
				//(SU->Preds[i].getSUnit())->dump(DAG);
				//PredMCID->dump(DAG);
				dbgs() << ")\n");

				/*
				* // TODO: check BETTER we have to check SU->Preds[i] is THE prev
				instruction in the list of MachineInstr - .getParent()
				* TODO TODO TODO: we have to check for
				* LD_INDIRECT_H for the memory (offset) register,
				* not the passthrough (or mask).
				*/

				/*
				const MCPhysReg *PredMCIDArray = PredMCID->getImplicitDefs();
				unsigned numDefs = PredMCID->getNumImplicitDefs(); seems it is always 0
				*/
				unsigned numDefs = PredMCID->getNumDefs();
				//const MCOperandInfo *PredMCIDArray = PredMCID->OpInfo;
				LLVM_DEBUG(dbgs() << " isDataHazard(): numDefs = " << numDefs << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMI->getNumOperands() = "
				<< PredMI->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMCID->getNumOperands() = "
				<< PredMCID->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMCID->getNumDefs() = "
				<< PredMCID->getNumDefs() << "\n");

				int idUseStart;
				if (MIOpcode == Connex::LD_INDIRECT_H || MIOpcode == Connex::LD_INDIRECT_W ||
				MIOpcode == Connex::LD_INDIRECT_MASKED_H) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMI->getOpcode() = "
				<< PredMI->getOpcode() << "\n");

				if (PredMI->isInlineAsm()) {
				LLVM_DEBUG(dbgs()
				<< " isDataHazard(): PredMI is INLINEASM so return true"
				<< "\n");
				/* We assume that the PredMI INLINEASM is NOT a Connex
				* instruction, but a host-side Opincaa C++ for loop.
				* In such case, we can have 2 data hazards with MI:
				* - one with the instruction above this C++ for statement
				* - one with the instruction at the end of this for loop
				* when we unroll (if the trip-count of the loop is >1)
				* this for loop
				*
				* IMPORTANT-TODO: make full checks and
				* return true only if it
				* is the case, to be more efficient.
				*/
				// IMPORTANT-TODO: return true;
				}

				/* %Wh5<def>, %BoolMask1<def,dead> = LD_INDIRECT_MASKED_H %Wh4, %BoolMask0, %Wh0; mem:LD256[inttoptr (i16 51 to i16*)](tbaa=!12)(alias.scope=!16)
				The arguments ("uses") of LD_INDIRECT_MASKED_H are:
				%Wh4 - I think it is the passthrough register
				(if mask bit is 0 we use passthrough)
				%BoolMask0 - is the mask
				%Wh0 - the offset register (if mask bit is 0 we use passthrough)
				Note that Connex does NOT support masked gather just with read
				(it requires WHERE also and things become more complex than
				just masked gather, in principle)
				*/

				if (MIOpcode == Connex::LD_INDIRECT_MASKED_H) {
				idUseStart = MCID->getNumDefs() + 2; // 1 for passthrough, 1 for bool mask
				}
				else
				if (MIOpcode == Connex::LD_INDIRECT_H || MIOpcode == Connex::LD_INDIRECT_W) {
				idUseStart = MCID->getNumDefs(); // no passthrough or bool mask operands to skip
				}
				}
				else {
				idUseStart = MCID->getNumDefs();
				}

				for (unsigned idUse = idUseStart; idUse < numUses; idUse++) {
				/*
				LLVM_DEBUG(dbgs() << " isDataHazard(): MCIDArray[" << idUse
				<< "] = " << MCIDArray[idUse] << "\n");
				*/
				LLVM_DEBUG(dbgs() << " isDataHazard(): MI->getOperand(" << idUse
				<< ") = " << MI->getOperand(idUse) << "\n");
				for (unsigned idDef = 0; idDef < numDefs; idDef++) {
				/*
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMCIDArray[" << idDef
				<< "] = " << PredMCIDArray[idDef] << "\n");
				if (PredMCIDArray[idDef] == MCIDArray[idUse]) {
				LLVM_DEBUG(dbgs() << " isDataHazard(): found an instr sequence that has to be separated by NOP to avoid true dependency hazard\n");
				return true;
				}
				*/
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineOperand.html
				const MachineOperand &PredMIMO = PredMI->getOperand(idDef);
				const MachineOperand &MIMO = MI->getOperand(idUse);
				LLVM_DEBUG(dbgs() << " isDataHazard(): PredMI->getOperand(" << idDef
				<< ") = " << PredMI->getOperand(idDef) << "\n");

				if ((PredMI->getOpcode() != Connex::END_WHERE) &&
				(PredMI->getOpcode() != Connex::WHEREEQ) &&
				(PredMI->getOpcode() != Connex::WHERELT) &&
				(PredMI->getOpcode() != Connex::WHERECRY) &&
				PredMIMO.isReg() && MIMO.isReg() &&
				PredMIMO.getReg() == MIMO.getReg()) {
				LLVM_DEBUG(dbgs()
				<< " isDataHazard(): found an instr sequence "
				"(defReg = PredOpcode; write/read/Where useReg;) and "
				"defReg == useReg. "
				"This sequence has to be separated by NOP to avoid "
				"true dependency hazard\n");
				return true;
				}
				}
				}
				/*
				if (!SU->Preds[i].isNormalMemory() && !SU->Preds[i].isBarrier())
				continue;
				*/
				//return true;
				}

				return false;
				}


				ScheduleHazardRecognizer::HazardType
				ConnexDispatchGroupSBHazardRecognizer::getHazardType(SUnit *SU, int Stalls) {
				#ifdef USE_GETHAZARDTYPE
				static bool emittedNoop = false;

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1SUnit.html
				LLVM_DEBUG(dbgs() << "ConnexDispatchGroupSBHazardRecognizer::getHazardType(SU = ";
				SU->dump(DAG);
				dbgs() << ", Stalls = " << Stalls << ") and "
				<< "emittedNoop = " << emittedNoop << "\n");

				//if (Stalls == 0 && isLoadAfterStore(SU))
				if (Stalls == 0 && // no (pipeline?) stalls
				emittedNoop == false && // TODO This is a ~lousy solution, but can generate several NOPs in a function, etc
				/* TODO TODO: the problem I have is due to wrong instr
				itineraries??? */
				isDataHazard(SU)) {
				LLVM_DEBUG(dbgs() << " getHazardType(): return NoopHazard\n");

				emittedNoop = true;

				return NoopHazard;
				/* TODO TODO TODO TODO TODO TODO TODO: figure out how to make this work.
				Does NOT help at all (no change in code - not NOP,
				nor other useful instr in the delay slot):
				return Hazard;
				*/
				}
				else {
				emittedNoop = false;
				}

				return NoHazard;
				#endif

				return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
				}

				void ConnexDispatchGroupSBHazardRecognizer::EmitInstruction(SUnit *SU) {
				unsigned i, ie;

				LLVM_DEBUG(dbgs() << "Entered Connex's ConnexDispatchGroupSBHazardRecognizer::EmitInstruction(";
				dumpSU(SU, dbgs());
				dbgs() << ")\n");
				//
				assert(SU->isInstr() == true);
				MachineInstr *MI = SU->getInstr();
				MachineBasicBlock *MBB = MI->getParent();
				LLVM_DEBUG(dbgs() << " EmitInstruction(): MBB = "
				<< MBB->getFullName() << "\n"
				//MBB->dump();
				);

				LLVM_DEBUG(dbgs() << " SU->Succs.size() = "
				<< SU->Succs.size() << "\n");
				LLVM_DEBUG(dbgs() << " SU->Preds.size() = "
				<< SU->Preds.size() << "\n");

				for (i = 0, ie = (unsigned) SU->Succs.size(); i != ie; ++i) {
				MachineInstr *SuccMI = (SU->Succs[i].getSUnit())->getInstr();
				if (SuccMI == NULL) {
				LLVM_DEBUG(dbgs() << " SU->Succs["
				<< i << "] = NULL\n");
				}
				else {
				LLVM_DEBUG(dbgs() << " SU->Succs["
				<< i << "] = ";
				SuccMI->dump();
				dbgs() << "\n");
				}
				}
				for (i = 0, ie = (unsigned) SU->Preds.size(); i != ie; ++i) {
				MachineInstr *PredMI = (SU->Preds[i].getSUnit())->getInstr();
				if (PredMI == NULL) {
				LLVM_DEBUG(dbgs() << " SU->Preds["
				<< i << "] = NULL\n");
				}
				else {
				LLVM_DEBUG(dbgs() << " SU->Preds["
				<< i << "] = ";
				PredMI->dump();
				dbgs() << "\n");
				}
				}

				return ScoreboardHazardRecognizer::EmitInstruction(SU);
				}
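
The register comparison at the core of isDataHazard() can be sketched outside LLVM as follows. This is a minimal sketch under stated assumptions, not the patch's implementation: `Opcode`, `Operand`, `Instr`, `firstCheckedUse()`, `isWhereLike()` and `needsNopBetween()` are hypothetical stand-ins for the MachineInstr/MachineOperand queries used above, and the sketch assumes the use operands run to the end of the operand list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for MachineInstr and MachineOperand.
enum class Opcode {
  Other, WhereEq, WhereLt, WhereCry, EndWhere, LdIndirectH, LdIndirectMaskedH
};

struct Operand {
  bool IsReg;
  unsigned Reg; // meaningful only when IsReg is true
};

struct Instr {
  Opcode Op;
  unsigned NumDefs;              // defs come first in Operands
  std::vector<Operand> Operands; // defs, then uses
};

// Index of the first use operand that can take part in a hazard.
// For the masked indirect load, the passthrough and mask operands are skipped.
std::size_t firstCheckedUse(const Instr &MI) {
  if (MI.Op == Opcode::LdIndirectMaskedH)
    return MI.NumDefs + 2; // 1 for passthrough, 1 for bool mask
  return MI.NumDefs;
}

// The listing above excludes these predecessor opcodes from the check.
bool isWhereLike(Opcode Op) {
  return Op == Opcode::WhereEq || Op == Opcode::WhereLt ||
         Op == Opcode::WhereCry || Op == Opcode::EndWhere;
}

// True when MI reads a register that Pred (the previous instruction) defines.
bool needsNopBetween(const Instr &Pred, const Instr &MI) {
  if (isWhereLike(Pred.Op))
    return false;
  for (std::size_t U = firstCheckedUse(MI); U < MI.Operands.size(); ++U) {
    for (std::size_t D = 0; D < Pred.NumDefs && D < Pred.Operands.size(); ++D) {
      const Operand &Def = Pred.Operands[D];
      const Operand &Use = MI.Operands[U];
      if (Def.IsReg && Use.IsReg && Def.Reg == Use.Reg)
        return true;
    }
  }
  return false;
}
```

With this model, `R1 = ...` followed by `R4 = LS[R1]` reports a hazard, while a masked load whose only match is the passthrough register does not.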

lib/Target/Connex/ConnexHazardRecognizerPreRAScheduler.h

				//===-- ConnexHazardRecognizerPreRAScheduler.h - Connex pre-RA Hazard Recognizer ------ C++ ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				///
				//===----------------------------------------------------------------------===//

				/* Inspired from llvm/lib/Target/PowerPC/PPCHazardRecognizer.h:
				/// PPCDispatchGroupSBHazardRecognizer - This class implements a scoreboard-based
				/// hazard recognizer for PPC ooo processors with dispatch-group hazards.
				*/


				#ifndef LLVM_LIB_TARGET_CONNEX_HAZARDRECOGNIZER_PRE_RA_SCHEDULER_H
				#define LLVM_LIB_TARGET_CONNEX_HAZARDRECOGNIZER_PRE_RA_SCHEDULER_H

				#include "ConnexInstrInfo.h"
				#include "llvm/CodeGen/ScheduleHazardRecognizer.h"
				#include "llvm/CodeGen/ScoreboardHazardRecognizer.h"
				#include "llvm/CodeGen/SelectionDAGNodes.h"

				namespace llvm {

				/* We choose to inherit the ScoreboardHazardRecognizer because only this
				* performs out-of-order scheduling, and NOT ScheduleHazardRecognizer.
				*/
				class ConnexDispatchGroupSBHazardRecognizerPreRAScheduler : public ScoreboardHazardRecognizer {
				const ScheduleDAG *DAG;
				bool isReadAfterWrite(SUnit *SU);

				/*
				SmallVector<SUnit *, 7> CurGroup;
				unsigned CurSlots, CurBranches;

				bool isLoadAfterStore(SUnit *SU);
				bool isBCTRAfterSet(SUnit *SU);
				bool mustComeFirst(const MCInstrDesc *MCID, unsigned &NSlots);
				*/

				public:
				ConnexDispatchGroupSBHazardRecognizerPreRAScheduler(
				const InstrItineraryData *ItinData,
				const ScheduleDAG *DAG_) :
				ScoreboardHazardRecognizer(ItinData, DAG_), DAG(DAG_) {
				//DEBUG(dbgs()
				// << "Entered ConnexDispatchGroupSBHazardRecognizerPreRAScheduler()\n");
				}

				HazardType getHazardType(SUnit *SU, int Stalls) override;
				/*
				bool ShouldPreferAnother(SUnit* SU) override;
				*/
				unsigned PreEmitNoops(SUnit *SU) override;
				void EmitInstruction(SUnit *SU) override;
				/*
				void AdvanceCycle() override;
				void RecedeCycle() override;
				void Reset() override;
				*/
				void EmitNoop() override;
				};

				}

				#endif

lib/Target/Connex/ConnexHazardRecognizerPreRAScheduler.cpp

				//===-- ConnexHazardRecognizerPreRAScheduler.cpp - Connex Hazard Recognizer Impls --------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the pre-RA hazard recognizer for scheduling on Connex processors.
				//
				//===----------------------------------------------------------------------===//

				// Inspired from llvm/lib/Target/PowerPC/PPCHazardRecognizer.cpp

				#include "ConnexHazardRecognizerPreRAScheduler.h"
				#include "Connex.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/CodeGen/ScheduleDAG.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/raw_ostream.h"
				#include "Misc.h" // For dumpSU()

				using namespace llvm;



				#define DEBUG_TYPE "pre-RA-sched"

				/*
				SUnit is meant for both types of schedulers:
				- pre-RA, which deals with MachineSDNode and SDNode.
				- post-RA, which deal with MachineInstr
				But note that here we are a pre-RA scheduler.
				So, as expected here an SUnit contains ONLY MachineSDNode and SDNode.
				*/
				bool ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::isReadAfterWrite(SUnit *SU) {
				/* From http://llvm.org/docs/doxygen/html/classllvm_1_1MCInstrDesc.html
				NOTE: although SU->isInstr() == false, we can use DAG->getInstrDesc(SU).
				*/
				const MCInstrDesc *MCID = DAG->getInstrDesc(SU);

				if (MCID == NULL)
				return false;

				LLVM_DEBUG(dbgs() << "isReadAfterWrite(SU = ";
				dumpSU(SU, dbgs());
				dbgs() << ")\n");
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): SU->Succs.size() = "
				<< SU->Succs.size() << "\n");
				/* See http://llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html#l00481
				/// Test if this node has a post-isel opcode, directly
				/// corresponding to a MachineInstr opcode.
				*/
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): (SU->getNode())->isMachineOpcode() = "
				<< (SU->getNode())->isMachineOpcode() << "\n");
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): (SU->getNode())->getOpcode() = "
				<< (SU->getNode())->getOpcode() << "\n");
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): (SU->getNode())->getMachineOpcode() = "
				<< (SU->getNode())->getMachineOpcode() << "\n");

				#ifdef USE_FOUNDINLINEASM
				bool foundINLINEASM = false;
				#endif
				//MachineInstr *SUpred_INLINEASM = NULL;
				for (unsigned int i = 0; i < SU->Succs.size(); ++i) {
				SUnit *SUsucc = SU->Succs[i].getSUnit();
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): SU->Succs[" << i << "] = ";
				dumpSU(SUsucc, dbgs());
				dbgs() << ")\n");

				if ((SUsucc->getNode())->isMachineOpcode() == false) {
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): (SUsucc->getNode())->getOpcode() = "
				<< (SUsucc->getNode())->getOpcode() << "\n");
				}
				else{
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): (SUsucc->getNode())->getMachineOpcode() = "
				<< (SUsucc->getNode())->getMachineOpcode() << "\n");
				}

				if ( ((SUsucc->getNode())->isMachineOpcode() == false) &&
				((SUsucc->getNode())->getOpcode() == ISD::INLINEASM) ) {
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): Found SDNode ISD::INLINEASM\n");

				#ifdef USE_FOUNDINLINEASM
				foundINLINEASM = true;
				#endif
				/*
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				// This fails with: llvm::MachineInstr* llvm::SUnit::getInstr() const: Assertion `!Node && "Reading MachineInstr of SUnit with SDNode!"' failed.
				SUpred_INLINEASM = SUsucc->getInstr();
				assert(SUpred_INLINEASM != NULL);
				if ( ((SU->getNode())->isMachineOpcode() == true) &&
				((SU->getNode())->getMachineOpcode() == Connex::VLOAD_H_SYM_IMM) ) {
				SUpred_INLINEASM->bundleWithPred();
				}
				*/
				}
				}

				// See http://llvm.org/docs/doxygen/html/SelectionDAGNodes_8h_source.html#l00486
				if ( ((SU->getNode())->isMachineOpcode() == true) &&
				((SU->getNode())->getMachineOpcode() == Connex::VLOAD_H_SYM_IMM) ) {
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): Found Connex::VLOAD_H_SYM_IMM\n");

				/*
				if (foundINLINEASM == true) {
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): before getInstr()\n");
				// Gives error: <<llvm::MachineInstr* llvm::SUnit::getInstr() const: Assertion `!Node && "Reading MachineInstr of SUnit with SDNode!"' failed.>>
				(SU->getInstr())->bundleWithSucc();
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): after getInstr()\n");
				}
				*/
				/*
				LLVM_DEBUG(dbgs() << "isReadAfterWrite(): SU->Preds[0] = ";
				(SU->Preds[0].getSUnit())->dump(DAG);
				dbgs() << ")\n");
				*/
				}

				/*
				// Note: MCPhysReg is an integer - see http://llvm.org/docs/doxygen/html/namespacellvm.html: "typedef uint16_t llvm::MCPhysReg"
				const MCPhysReg *MCIDArray = MCID->getImplicitUses();
				unsigned numUses = MCID->getNumImplicitUses(); seems it is always 0
				*/
				//const MCOperandInfo *MCIDArray = MCID->OpInfo;
				unsigned numUses = MCID->getNumOperands() - MCID->getNumDefs();
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): numUses = " << numUses << "\n");
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): MCID->getNumOperands() = "
				<< MCID->getNumOperands() << "\n");
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): MCID->getNumDefs() = "
				<< MCID->getNumDefs() << "\n");

				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): SU->Preds.size() = "
				<< SU->Preds.size() << "\n");
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): SU->Succs.size() = "
				<< SU->Succs.size() << "\n");

				/*
				if (!MCID->mayLoad())
				return false;
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): SU can load\n");
				*/
				/* TODO: NOTE: END_REPEAT returns also mayStore(). But we should not worry
				about this since END_REPEAT takes no parameter. */
				if (!MCID->mayStore())
				return false;
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): SU can store\n");

				// IMPORTANT: In the standard pre-RA, END_REPEAT has isInstr() == false
				assert(SU->isInstr() == false);
				/*
				// TODO TODO TODO TODO: try to treat this since REPEAT is also intrinsic and can have conditional hazards
				if (SU->isInstr() == false) {
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): SU->isInstr() == false\n");
				return false;
				}
				*/

				SDNode *SDN = SU->getNode();
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): SDN->getNumOperands() = "
				<< SDN->getNumOperands() << "\n");

				// SU is a load; for any predecessors in this dispatch group, that are stores,
				// and with which we have an ordering dependency, return true.
				for (unsigned i = 0, ie = (unsigned) SU->Preds.size(); i != ie; ++i) {
				const MCInstrDesc *PredMCID = DAG->getInstrDesc(SU->Preds[i].getSUnit());

				if (PredMCID == NULL) // || !PredMCID->mayStore())
				continue;

				/* SU->Preds is SmallVector of SDep.
				* - see http://llvm.org/docs/doxygen/html/classllvm_1_1SUnit.html
				* - see http://llvm.org/docs/doxygen/html/classllvm_1_1SDep.html
				*/
				SDNode *PredSDN = (SU->Preds[i].getSUnit())->getNode();
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(SU->Preds["
				<< i << "] = ";
				PredSDN->dump();
				//(SU->Preds[i].getSUnit())->dump(DAG);
				//PredMCID->dump(DAG);
				dbgs() << ")\n");


				/*
				const MCPhysReg *PredMCIDArray = PredMCID->getImplicitDefs();
				unsigned numDefs = PredMCID->getNumImplicitDefs(); seems it is always 0
				*/
				unsigned numDefs = PredMCID->getNumDefs();
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): numDefs = " << numDefs << "\n");
				//const MCOperandInfo *PredMCIDArray = PredMCID->OpInfo;
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): PredSDN->getNumOperands() = "
				<< PredSDN->getNumOperands() << "\n");
				/*
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): PredSDN->getNumDefs() = "
				<< PredMCID->getNumDefs() << "\n");
				*/


				//for (unsigned idUse = MCID->getNumDefs(); idUse < numUses; idUse++)
				for (unsigned idUse = 0; idUse < numUses; idUse++) {
				/*
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): MCIDArray[" << idUse
				<< "] = " << MCIDArray[idUse] << "\n");
				*/
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): SDN->getOperand(" << idUse
				<< ") = ";
				SDN->getOperand(idUse)->dump();
				dbgs() << "\n");
				//for (unsigned idDef = 0; idDef < PredSDN->getNumOperands(); idDef++)
				for (unsigned idDef = 0; idDef < numDefs; idDef++) {
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): PredSDN->getOperand(" << idDef
				<< ") = ";
				PredSDN->getOperand(idDef)->dump();
				dbgs() << "\n");

				//if (PredSDN->getOperand(idDef) == SDN->getOperand(idUse))
				if (PredSDN == SDN->getOperand(idUse).getNode()) {
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): Found PredSDN == SDN->getOperand(idUse)\n");
				return true;
				}
				/*
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): PredMCIDArray[" << idDef
				<< "] = " << PredMCIDArray[idDef] << "\n");
				if (PredMCIDArray[idDef] == MCIDArray[idUse]) {
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): found an instr sequence that has to be separated by NOP to avoid true dependency hazard\n");
				return true;
				}
				*/

				/*
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineOperand.html
				const MachineOperand &PredMIMO = PredMI->getOperand(idDef);
				const MachineOperand &MIMO = MI->getOperand(idUse);
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): PredMI->getOperand("
				<< idDef
				<< ") = " << PredMI->getOperand(idDef) << "\n");

				if (PredMIMO.isReg() && MIMO.isReg() &&
				PredMIMO.getReg() == MIMO.getReg()) {
				LLVM_DEBUG(dbgs() << " isReadAfterWrite(): found an instr sequence that has to be separated by NOP to avoid true dependency hazard\n");
				return true;
				}
				*/
				}
				}
				/*
				if (!SU->Preds[i].isNormalMemory() && !SU->Preds[i].isBarrier())
				continue;
				*/
				//return true;
				}

				return false;
				}

				ScheduleHazardRecognizer::HazardType
				ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::getHazardType(SUnit *SU, int Stalls) {
				static bool emittedNoop = false;

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1SUnit.html
				LLVM_DEBUG(dbgs() << "ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::getHazardType(SU = ";
				dumpSU(SU, dbgs());
				dbgs() << ", Stalls = " << Stalls << ")\n");

				//if (Stalls == 0 && isLoadAfterStore(SU))
				if (Stalls == 0 && // no (pipeline?) stalls
				emittedNoop == false && // TODO TODO TODO This is a very lousy tmp solution
				isReadAfterWrite(SU)) {
				LLVM_DEBUG(dbgs() << " Pre-RA: getHazardType(): return NoopHazard\n");

				emittedNoop = true;

				return NoopHazard;
				}

				return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
				}

				void ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::EmitInstruction(SUnit *SU) {
				unsigned i, ie;

				LLVM_DEBUG(dbgs() << "Entered Connex's PreRA EmitInstruction(";
				dumpSU(SU, dbgs());
				dbgs() << ")\n");
				LLVM_DEBUG(dbgs() << " SU->Succs.size() = "
				<< SU->Succs.size() << "\n");
				LLVM_DEBUG(dbgs() << " SU->Preds.size() = "
				<< SU->Preds.size() << "\n");

				for (i = 0, ie = (unsigned) SU->Succs.size(); i != ie; ++i) {
				MachineInstr *SuccMI = (SU->Succs[i].getSUnit())->getInstr();
				if (SuccMI == NULL) {
				LLVM_DEBUG(dbgs() << " SU->Succs["
				<< i << "] = NULL\n");
				}
				else {
				LLVM_DEBUG(dbgs() << " SU->Succs["
				<< i << "] = ";
				SuccMI->dump();
				dbgs() << "\n");
				}
				}
				for (i = 0, ie = (unsigned) SU->Preds.size(); i != ie; ++i) {
				MachineInstr *PredMI = (SU->Preds[i].getSUnit())->getInstr();
				if (PredMI == NULL) {
				LLVM_DEBUG(dbgs() << " SU->Preds["
				<< i << "] = NULL\n");
				}
				else {
				LLVM_DEBUG(dbgs() << " SU->Preds["
				<< i << "] = ";
				PredMI->dump();
				dbgs() << "\n");
				}
				}

				return ScoreboardHazardRecognizer::EmitInstruction(SU);
				}

				/* See also http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleHazardRecognizer.html
				PreEmitNoops - This callback is invoked prior to emitting an instruction.
				*/
				unsigned ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::PreEmitNoops(SUnit *SU) {
				LLVM_DEBUG(dbgs() << "Entered Connex's PreRA PreEmitNoops()\n");
				return 0;
				}

				/* See also http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleHazardRecognizer.html
				EmitNoop - This callback is invoked when a noop was added to the instruction stream.
				*/
				void ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::EmitNoop() {
				LLVM_DEBUG(dbgs() << "Entered Connex's PreRA EmitNoop()\n");
				}
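
In the pre-RA getHazardType() above, emittedNoop is a function-level static that is set to true and never cleared, whereas the post-RA variant clears it in its else branch. The following is a minimal sketch, assuming the intent is "at most one NOP in front of a hazardous instruction", of keeping that flag per recognizer instance instead. `NoopOnceRecognizer`, its `HazardType` enum and the `HasRawHazard` parameter are hypothetical stand-ins and not the LLVM ScheduleHazardRecognizer API.

```cpp
#include <cassert>

// Hypothetical mirror of the three hazard kinds used in the listing above.
enum class HazardType { NoHazard, Hazard, NoopHazard };

// Hypothetical stand-in for the recognizer: the flag lives in the object,
// so it is not shared between functions or scheduling regions.
class NoopOnceRecognizer {
  bool EmittedNoop = false;

public:
  // HasRawHazard stands for the result of isReadAfterWrite(SU).
  HazardType getHazardType(bool HasRawHazard, int Stalls) {
    if (Stalls == 0 && !EmittedNoop && HasRawHazard) {
      EmittedNoop = true; // the next query for this instruction passes
      return HazardType::NoopHazard;
    }
    EmittedNoop = false; // re-arm for the next instruction
    return HazardType::NoHazard;
  }

  // Would correspond to clearing state at the start of a new region.
  void reset() { EmittedNoop = false; }
};
```

Querying the same hazardous instruction twice yields NoopHazard and then NoHazard, so exactly one NOP is requested; a later hazardous instruction can request another one.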

lib/Target/Connex/ConnexISelDAGToDAG.cpp

This file has a very large number of changes (5,094 lines).

lib/Target/Connex/ConnexISelLowering.h

				//===-- ConnexISelLowering.h - Connex DAG Lowering Interface ----------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the interfaces that Connex uses to lower LLVM code into a
				/// selection DAG.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXISELLOWERING_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXISELLOWERING_H

				#include "Connex.h"
				#include "llvm/CodeGen/SelectionDAG.h"
				#include "llvm/CodeGen/TargetLowering.h"

				#include "ConnexConfig.h"



				namespace llvm {
				class ConnexSubtarget;

				namespace ConnexISD {
				/*
				From http://llvm.org/docs/doxygen/html/namespacellvm_1_1ISD.html:
				<<Targets may also define target-dependent operator codes for SDNodes.
				For example, on x86, these are the enum values in the X86ISD namespace.
				Targets should aim to use target-independent operators to model their
				instruction sets as much as possible, and only use target-dependent
				operators when they have special requirements.
				Finally, during and after selection proper, SDNodes may use special operator
				codes that correspond directly with MachineInstr opcodes.
				These are used to represent selected instructions.
				See the isMachineOpcode() and getMachineOpcode() member functions of SDNode.>>
				*/
				enum NodeType : unsigned {
				FIRST_NUMBER = ISD::BUILTIN_OP_END,
				RET_FLAG,
				CALL,
				SELECT_CC,
				BR_CC,

				/* Inspired from lib/Target/X86/X86ISelLowering.h
				/// A wrapper node for TargetConstantPool,
				/// TargetExternalSymbol, and TargetGlobalAddress.
				*/
				Wrapper,

				// From [LLVM]/llvm/lib/Target/Mips/MipsISelLowering.h
				// Extended vector element extraction
				VEXTRACT_SEXT_ELT,
				VEXTRACT_ZEXT_ELT,

				//ConstantPool,

				// Vector Shuffle with mask as an operand
				VSHF, // Generic shuffle
				SHF, // 4-element set shuffle.
				ILVEV, // Interleave even elements
				ILVOD, // Interleave odd elements
				ILVL, // Interleave left elements
				ILVR, // Interleave right elements
				PCKEV, // Pack even elements
				PCKOD, // Pack odd elements
				};
				}


				class ConnexTargetLowering : public TargetLowering {
				public:
				explicit ConnexTargetLowering(const TargetMachine &TM,
				const ConnexSubtarget &STI);

				SDValue LowerConstantPool(SDValue Op, SelectionDAG &DAG) const;

				// Inspired from lib/Target/AMDGPU/AMDGPUISelLowering.h
				SDValue LowerDYNAMIC_STACKALLOC(SDValue Op,
				SelectionDAG &DAG) const;

				// Provide custom lowering hooks for some operations.
				SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;

				// This method returns the name of a target specific DAG node.
				const char *getTargetNodeName(unsigned Opcode) const override;

				MachineBasicBlock *EmitInstrWithCustomInserter(MachineInstr &MI,
				MachineBasicBlock *BB) const override;

				private:
				/*
				// From llvm/lib/Target/Mips/MipsISelLowering.h
				// Create a TargetGlobalAddress node.
				SDValue getTargetNode(GlobalAddressSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Create a TargetExternalSymbol node.
				SDValue getTargetNode(ExternalSymbolSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Create a TargetBlockAddress node.
				SDValue getTargetNode(BlockAddressSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Create a TargetJumpTable node.
				SDValue getTargetNode(JumpTableSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;
				*/
				// Create a TargetConstantPool node.
				SDValue getTargetNode(ConstantPoolSDNode *N, EVT Ty, SelectionDAG &DAG,
				unsigned Flag) const;

				// Added from lib/Target/Mips/MipsSEISelLowering.cpp (method addMSAIntType)
				void addVectorIntType(MVT::SimpleValueType Ty, const TargetRegisterClass *RC);

				// Inspired from lib/Target/Mips/MipsSEISelLowering.cpp, addMSAFloatType()
				void addVectorFloatType(MVT::SimpleValueType Ty,
				const TargetRegisterClass *RC);

				bool allowsMisalignedMemoryAccesses(EVT VT,
				unsigned,
				unsigned,
				bool *Fast) const;

				void replaceAddI32UseWithADDVH(MVT &aType, SDValue &Index,
				SelectionDAG &DAG) const;

				SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
				/*static*/ SDValue LowerMGATHER(SDValue &Op,
				//const ConnexSubtarget &Subtarget,
				SelectionDAG &DAG) const;
				/*static*/ SDValue LowerMSCATTER(SDValue &Op,
				//const ConnexSubtarget &Subtarget,
				SelectionDAG &DAG) const;

				// Lower the result values of a call, copying them out of physregs into vregs
				SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
				CallingConv::ID CallConv, bool IsVarArg,
				const SmallVectorImpl<ISD::InputArg> &Ins,
				const SDLoc &DL, SelectionDAG &DAG,
				SmallVectorImpl<SDValue> &InVals) const;

				// Maximum number of arguments to a call
				static const unsigned MaxArgs;

				// Lower a call into CALLSEQ_START - ConnexISD::CALL - CALLSEQ_END chain
				SDValue LowerCall(TargetLowering::CallLoweringInfo &CLI,
				SmallVectorImpl<SDValue> &InVals) const override;

				// Lower incoming arguments, copy physregs into vregs
				SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
				bool IsVarArg,
				const SmallVectorImpl<ISD::InputArg> &Ins,
				const SDLoc &DL, SelectionDAG &DAG,
				SmallVectorImpl<SDValue> &InVals) const override;

				SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool IsVarArg,
				const SmallVectorImpl<ISD::OutputArg> &Outs,
				const SmallVectorImpl<SDValue> &OutVals, const SDLoc &DL,
				SelectionDAG &DAG) const override;

				EVT getOptimalMemOpType(uint64_t Size, unsigned DstAlign, unsigned SrcAlign,
				bool IsMemset, bool ZeroMemset, bool MemcpyStrSrc,
				MachineFunction &MF) const override {
				#define DEBUG_TYPE "connex-lower"

				LLVM_DEBUG(dbgs() << "Entered getOptimalMemOpType(Size = " << Size
				<< ")\n");

				return Size >= 8 ? MVT::i64 : MVT::i32;

				// TODO_CHANGE_BACKEND - Seems it's NOT required:
				//return Size >= 8 ? TYPE_VECTOR_ELEMENT : MVT::i32;

				#undef DEBUG_TYPE
				}

				bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
				Type *Ty) const override {
				return true;
				}

				SDValue LowerVSELECT(SDValue &Op, SelectionDAG &DAG) const;

				// From [LLVM]/llvm/lib/Target/Mips/MipsSEISelLowering.h
				SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;

				SDValue LowerADD_I32(SDValue Op, SelectionDAG &DAG) const;

				SDValue LowerADD_F16(SDValue &Op, SelectionDAG *CurDAG) const;
				SDValue LowerMUL_F16(SDValue &Op, SelectionDAG *CurDAG) const;
				SDValue LowerREDUCE_F16(SDValue &Op, SelectionDAG *CurDAG) const;

				SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;


				SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
				SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
				//
				EVT getSetCCResultType(const DataLayout &, LLVMContext &, EVT VT) const;
				}; // end class ConnexTargetLowering
				} // end namespace llvm

				#endif
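
The ConnexISD::NodeType enum above starts numbering at ISD::BUILTIN_OP_END, and getTargetNodeName() is declared to map such opcodes to names. Its definition is not shown in this listing, so the following is only a sketch of the usual switch-based pattern, under stated assumptions: `BuiltinOpEnd` is a placeholder value, the namespace `sketch` is hypothetical, and only a subset of the node types is listed.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

// Placeholder for ISD::BUILTIN_OP_END; the real value comes from LLVM.
constexpr unsigned BuiltinOpEnd = 1000;

// Target-specific opcodes are numbered after all target-independent ones.
enum NodeType : unsigned {
  FIRST_NUMBER = BuiltinOpEnd,
  RET_FLAG,
  CALL,
  SELECT_CC,
  BR_CC,
  Wrapper
};

// Returns a printable name for a target-specific opcode,
// or nullptr when the opcode is not one of the target's own.
const char *getTargetNodeName(unsigned Opcode) {
  switch (static_cast<NodeType>(Opcode)) {
  case FIRST_NUMBER:
    break;
  case RET_FLAG:
    return "ConnexISD::RET_FLAG";
  case CALL:
    return "ConnexISD::CALL";
  case SELECT_CC:
    return "ConnexISD::SELECT_CC";
  case BR_CC:
    return "ConnexISD::BR_CC";
  case Wrapper:
    return "ConnexISD::Wrapper";
  }
  return nullptr;
}

} // namespace sketch
```

Returning nullptr for unknown opcodes lets a caller fall back to the target-independent names.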

lib/Target/Connex/ConnexISelLowering.cpp

This file has a very large number of changes (3,561 lines).

lib/Target/Connex/ConnexInstrInfo.h

				//===-- ConnexInstrInfo.h - Connex Instruction Information ------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetInstrInfo class.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXINSTRINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXINSTRINFO_H

				#include "Connex.h"
				#include "ConnexRegisterInfo.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"

				#define GET_INSTRINFO_HEADER
				#include "ConnexGenInstrInfo.inc"

				namespace llvm {

				class ConnexInstrInfo : public ConnexGenInstrInfo {
				const ConnexRegisterInfo RI;

				public:
				ConnexInstrInfo();

				const ConnexRegisterInfo &getRegisterInfo() const { return RI; }


				// Got a bit inspired from lib/Target/AMDGPU/SIInstrInfo.cpp
				bool expandPostRAPseudo(MachineInstr &MI) const;


				// Note: we do not use the Pre-RA hazard recognizer since it works on the
				// MachineInstr immediately after the 1st scheduling pass, which runs before
				// RA, TwoAddressInstructionPass, etc. - so a lot of other instructions
				// will be added after the 1st scheduling pass.
				// We would like our post-RA Hazard recognizer to be able to reschedule
				// instructions in a different order (with the ScoreBoardHazardRecognizer)
				// in order to avoid inserting useless NOPs.

				// USE_POSTRA_SCHED
				// Got inspired from llvm/lib/Target/PowerPC/PPCInstrInfo.h
				ScheduleHazardRecognizer *
				CreateTargetPostRAHazardRecognizer(const InstrItineraryData *II,
				const ScheduleDAG *DAG) const override;


				ScheduleHazardRecognizer *
				arsenm (inline comment): No macro for this
				CreateTargetMIHazardRecognizer(const InstrItineraryData *II,
				const ScheduleDAG *DAG) const override;

				void insertNoop(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI) const;


				void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
				const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,
				bool KillSrc) const override;

				void storeRegToStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MBBI, unsigned SrcReg,
				bool isKill, int FrameIndex,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI) const override;

				void loadRegFromStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MBBI, unsigned DestReg,
				int FrameIndex, const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI) const override;
				bool analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
				MachineBasicBlock *&FBB,
				SmallVectorImpl<MachineOperand> &Cond,
				bool AllowModify) const override;

				unsigned removeBranch(MachineBasicBlock &MBB,
				int *BytesRemoved = nullptr) const override;

				unsigned insertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
				MachineBasicBlock *FBB, ArrayRef<MachineOperand> Cond,
				const DebugLoc &DL,
				int *BytesAdded = nullptr) const override;

				bool isPredicable(MachineInstr &MI) const;

				protected:
				MachineMemOperand *GetMemOperand(MachineBasicBlock &MBB, int FI,
				MachineMemOperand::Flags Flag) const;
				}; // end class ConnexInstrInfo
				} // end namespace llvm

				#endif
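The header above declares storeRegToStackSlot() and loadRegFromStackSlot(). In the .cpp that follows, vector spills do not use a real stack: the LLVM frame index FI is turned into an LS-memory row with the expression (CONNEX_MEM_NUM_ROWS + 1) - FI, under the asserted assumption FI >= 2. A minimal standalone sketch of that mapping is given here. The value 1024 for the number of LS rows is an assumption chosen for illustration (it matches the "LS[1020]" and "LS[1013]" examples in the source comments), and the names kConnexMemNumRows and lsRowForFrameIndex are hypothetical, not part of the patch.

```cpp
#include <cassert>

// Assumed value, for illustration only; the patch uses CONNEX_MEM_NUM_ROWS,
// which is defined elsewhere.
constexpr unsigned kConnexMemNumRows = 1024;

// Hypothetical helper mirroring the expression used in the patch:
//   (CONNEX_MEM_NUM_ROWS + 1) - FI
// Spill rows are handed out from the top of LS memory downwards.
constexpr unsigned lsRowForFrameIndex(int FI) {
  return (kConnexMemNumRows + 1) - static_cast<unsigned>(FI);
}

// With FI >= 2, the first spill slot lands on the last LS row.
static_assert(lsRowForFrameIndex(2) == kConnexMemNumRows - 1,
              "FI = 2 maps to the last LS row");
static_assert(lsRowForFrameIndex(5) == 1020, "FI = 5 maps to row 1020");
```

Under these assumptions, each additional frame index consumes one more row from the end of LS memory, so the number of live spill slots bounds how many rows remain for program data.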

lib/Target/Connex/ConnexInstrInfo.cpp

				//===-- ConnexInstrInfo.cpp - Connex Instruction Information ----------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetInstrInfo class.
				//
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "ConnexHazardRecognizer.h" // USE_POSTRA_SCHED
				#include "ConnexHazardRecognizerPreRAScheduler.h"
				//#include "llvm/CodeGen/ScheduleDAG.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexSubtarget.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/TargetRegistry.h"
				#include "llvm/ADT/STLExtras.h"
				#include "llvm/ADT/SmallVector.h"

				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/Support/Debug.h"
				#define DEBUG_TYPE "connex-lower"

				#define GET_INSTRINFO_CTOR_DTOR
				#include "ConnexGenInstrInfo.inc"

				using namespace llvm;






				MachineInstr *getPredMachineInstr(MachineInstr *MI, MachineInstr **succMI) {
				MachineBasicBlock *MBB = MI->getParent();
				DebugLoc DL = MBB->findDebugLoc(MI);

				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): MI.getOpcode() = "
				<< MI->getOpcode() << "\n");

				//switch (MI.getOpcode())

				MachineInstr *predMI = NULL;
				/*
				MachineInstr *succMI = NULL;
				*/
				*succMI = NULL;

				for (MachineBasicBlock::iterator I = MBB->begin(),
				IE = MBB->end(); I != IE; ++I) {
				MachineInstr *IMI = &*I;
				if (IMI == MI) {
				// The successor exists only if MI is not the last instruction of MBB.
				++I;
				if (I != IE)
				*succMI = &*I;
				break;
				}
				predMI = &*I;
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): (I in MBB of MI) I->getOpcode() = "
				<< I->getOpcode() << "\n");
				}

				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): MI = "
				<< MI
				<< "(" << MI << ")"
				<< "\n");
				if (*succMI != nullptr) {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): succMI = "
				//We do not put this one because we can have issues with NULL/invalid MachineInstr (at least in case of llc -regalloc=fast) << **succMI
				<< "[TO BE DONE]"
				<< "(" << *succMI << ")"
				<< "\n");
				}
				else {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): succMI = NULL\n");
				}

				if (predMI != NULL) {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): predMI = "
				<< *predMI
				<< "(" << predMI << ")"
				<< "\n");
				}
				else {
				LLVM_DEBUG(dbgs() << "getPredMachineInstr(): predMI = NULL\n");
				}

				return predMI;
				}


				ConnexInstrInfo::ConnexInstrInfo()
				: ConnexGenInstrInfo(Connex::ADJCALLSTACKDOWN, Connex::ADJCALLSTACKUP) {}


				// Inspired from lib/Target/Mips/MipsInstrInfo.cpp
				MachineMemOperand *ConnexInstrInfo::GetMemOperand(MachineBasicBlock &MBB,
				int FI,
				MachineMemOperand::Flags Flag
				) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::GetMemOperand()\n");

				MachineFunction &MF = *MBB.getParent();
				MachineFrameInfo &MFI = MF.getFrameInfo();
				unsigned Align = MFI.getObjectAlignment(FI);

				return MF.getMachineMemOperand(MachinePointerInfo::getFixedStack(MF, FI),
				Flag, MFI.getObjectSize(FI), Align);
				}


				/*
				From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html:
				virtual void copyPhysReg (MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, DebugLoc DL, unsigned DestReg, unsigned SrcReg, bool KillSrc) const
				Emit instructions to copy a pair of physical registers.
				virtual void storeRegToStackSlot (MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, unsigned SrcReg, bool isKill, int FrameIndex, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI) const
				Store the specified register of the given register class to the specified stack frame index.
				virtual void loadRegFromStackSlot (MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, unsigned DestReg, int FrameIndex, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI) const
				Load the specified register of the given register class from the specified stack frame index.
				*/
				void ConnexInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I,
				const DebugLoc &DL, unsigned DestReg,
				unsigned SrcReg, bool KillSrc) const {
				LLVM_DEBUG(dbgs()
				<< "Entered ConnexInstrInfo::copyPhysReg(I = " << *I
				<< ", DestReg = " << DestReg
				<< ", SrcReg = " << SrcReg
				<< ")\n");

				if (Connex::GPRRegClass.contains(DestReg, SrcReg)) {
				BuildMI(MBB, I, DL, get(Connex::MOV_rr), DestReg)
				.addReg(SrcReg, getKillRegState(KillSrc));
				}
				else
				if (Connex::VectorHRegClass.contains(DestReg, SrcReg)) {
				//llvm_unreachable("NOT implemented well!");

				/*
				// TODO_TODO
				if (SrcReg == ct) {
				BuildMI(MBB, I, DL, get(Connex::VLOAD_H), DestReg)
				.addImm(ct) //, getKillRegState(KillSrc))
				.addReg(SrcReg);
				}
				*/

				BuildMI(MBB, I, DL, get(Connex::ORV_H), DestReg)
				.addReg(SrcReg) //, getKillRegState(KillSrc))
				.addReg(SrcReg);
				}
				else
				//if (Connex::BoolMaskRegClass.contains(DestReg, SrcReg))
				if (Connex::BoolMaskRegClass.contains(DestReg) ||
				Connex::BoolMaskRegClass.contains(SrcReg)) {
				LLVM_DEBUG(dbgs()
				<< "ConnexInstrInfo::copyPhysReg(): DestReg or SrcReg are in BoolMask\n");
				/*
				// IMPORTANT-TODO: what if register Wh31, also called R(31), is already in use for some other var?
				BuildMI(MBB, I, DL, get(Connex::VLOAD_H), Connex::Wh31)
				.addImm(0);

				BuildMI(MBB, I, DL, get(Connex::ORV_H), DestReg)
				.addReg(SrcReg) //, getKillRegState(KillSrc))
				.addReg(Connex::Wh31, getKillRegState(KillSrc));
				*/
				}
				/*
				// PREFERABLY_NOT_2019_03_21
				else
				if ( (Connex::MSA128WRegClass.contains(DestReg) &&
				Connex::VectorHRegClass.contains(SrcReg)) ||
				//
				(Connex::MSA128WRegClass.contains(SrcReg) &&
				Connex::VectorHRegClass.contains(DestReg)) ) {

				if (Connex::MSA128WRegClass.contains(DestReg)) {
				LLVM_DEBUG(dbgs()
				<< "ConnexInstrInfo::copyPhysReg(): DestReg is TYPE_VECTOR_I32 and SrcReg is TYPE_VECTOR_I16\n");
				}
				else
				if (Connex::MSA128WRegClass.contains(SrcReg)) {
				LLVM_DEBUG(dbgs()
				<< "ConnexInstrInfo::copyPhysReg(): DestReg is TYPE_VECTOR_I16 and SrcReg is TYPE_VECTOR_I32\n");
				}

				// BuildMI(MBB, I, DL, get(Connex::INLINEASM)); // This makes llc give error: <</home/asusu/LLVM/llvm38Nov2016/llvm/include/llvm/CodeGen/MachineInstr.h:293: const llvm::MachineOperand& llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed.>>
				// This works surprisingly: BuildMI(MBB, I, DL, get(Connex::NOP_BITCONVERT_HW));

				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				//BuildMI(MBB, I, DL, get(Connex::NOP_BOGUS));
				BuildMI(MBB, I, DL, get(Connex::ORV_H), DestReg)
				.addReg(SrcReg) //, getKillRegState(KillSrc))
				.addReg(SrcReg);
				#endif
				}
				*/
				else {
				llvm_unreachable("Impossible reg-to-reg copy");
				}
				}


				// storeRegToStackSlot() and loadRegFromStackSlot() use
				// the FI argument (frame index, the index within the current frame)
				//
				// This implements spilling of registers (both scalar, and vector).
				void ConnexInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I,
				unsigned SrcReg, bool IsKill, int FI,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI) const {
				DebugLoc DL;

				assert(FI >= 2 && "It seems I assumed wrong that frame index >= 2");

				/* MEGA-TODO: FI is a single variable, but we basically have 2 stack frames:
				- 1 for the scalar CPU;
				- normally 1 for the Connex vector processor's LS memory, which is a
				separate address space. Connex does NOT allow calls inside vector
				kernels, BUT the CPU does, although a good example is not simple to find.

				Think of a case where this somewhat flawed solution is NOT good for
				programs (remember we output Opincaa programs and NO CPU assembly code,
				and Connex does NOT allow calls inside vector kernels).

				Also, understand well why FI >= 2 always holds - it seems there is some prologue.
				*/
				unsigned ConnexLSOffsetSpillLoad = (CONNEX_MEM_NUM_ROWS + 1) - FI;

				if (I != MBB.end())
				DL = I->getDebugLoc();

				if (RC == &Connex::GPRRegClass) {
				BuildMI(MBB, I, DL, get(Connex::STD))
				.addReg(SrcReg, getKillRegState(IsKill))
				.addFrameIndex(FI)
				.addImm(0);
				}
				else
				if (RC == &Connex::VectorHRegClass) {
				LLVM_DEBUG(dbgs() << " ConnexInstrInfo::storeRegToStackSlot(): Spilling Wh"
				<< SrcReg
				<< " to ConnexLSOffsetSpillLoad = "
				<< ConnexLSOffsetSpillLoad
				<< " (FI = "
				<< FI
				<< "), "
				<< "I == MBB.end() is " << (I == MBB.end())
				<< ", MBB = " << MBB.getFullName()
				<< ", &MBB.front() = " << &(MBB.front()) << "\n"
				<< "MBB = " << MBB
				//<< ", MBB.front() = " << MBB.front()
				);

				/* VERY IMPORTANT: after experimenting (see
				/home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/DawnCC/91_SAD_f16/FEATURE_LENGTH_128/A/STDerr_llc_01)
				if we have INLINEASM at the beginning of the MBB, the MBB.front() is
				the 1st instruction AFTER these INLINEASM - this is why we can end up
				adding more NOPs...
				IMPORTANT-TODO: we should take into consideration that vector.body has
				INLINEASM with host-side for loop here normally.
				*/

				// Note: this method is spilling the destination register of the instruction *(I-1)
				/*
				// I got a strange error in LLVM when printing in certain cases *I - see e.g. /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/DawnCC/90_SSD_f16/3/STDerr_llc_01_old03
				LLVM_DEBUG(dbgs() << " ConnexInstrInfo::storeRegToStackSlot(): *I = "
				<< *I);
				*/

				/*
				Important-TODO: maybe we can avoid inserting the NOP now by making the
				post-RA (maybe even the pre-RA) scheduler reschedule instructions
				to insert a useful instruction in this delay slot.

				Adding the NOP is mandatory if the previous instruction updates the
				spilled register, since every (i)write instruction requires a delay
				slot between it and the instruction that generates its operand
				- in this case the register to be written to the LS memory.

				It prints something like:
				<<
				*(I--) = %vreg538<def> = XORV_H %vreg106, %vreg105; VectorH:%vreg538,%vreg106,%vreg105 dbg:test.c:48:36
				*I = %vreg175<def> = ADDV_H %vreg149, %vreg164; VectorH:%vreg175,%vreg149,%vreg164 dbg:test.c:48:36>>

				In this case it spills %vreg538 to LS memory
				- with an instruction like LS[1020] = R...
				*/

				MachineBasicBlock::iterator Iprev; // = I;

				#ifdef EXPERIMENTAL_2019_05
				bool IFront = (I == MBB.front());
				#endif

				MachineInstr *IMI;
				if (I == MBB.end())
				IMI = NULL;
				else
				IMI = &*I;

				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IMI = "
				<< IMI
				<< "\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IMI == &MBB.front() = "
				<< (IMI == (&MBB.front()) )
				<< "\n");

				if ( (I != MBB.end()) &&
				(IMI != NULL) &&
				(IMI != (&MBB.front())) ) {
				Iprev = I;
				Iprev--;
				MachineInstr *IprevMI = &*Iprev;

				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IprevMI = "
				<< *IprevMI
				<< "\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IprevMI->getNumOperands() = "
				<< IprevMI->getNumOperands()
				<< "\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IprevMI->getOpcode() == Connex::INLINEASM = "
				<< (IprevMI->getOpcode() == Connex::INLINEASM)
				<< "\n");
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): IprevMI->getOpcode() == Connex::VLOAD_H_SYM_IMM = "
				<< (IprevMI->getOpcode() == Connex::VLOAD_H_SYM_IMM)
				<< "\n");
				// The case that goes wrong is LS[1013] = ...
				// because the instruction before it is MBB.front() and is an INLINEASM.

				if (IprevMI != NULL &&
				// NOT necessary: (IprevMI != (&MBB.front())) &&
				//(IMI != (&MBB.front())) &&
				(IprevMI->getNumOperands() > 0 || // MEGA-TODO: understand why I give this
				IprevMI->getOpcode() == Connex::INLINEASM ||
				IprevMI->getOpcode() == Connex::VLOAD_H_SYM_IMM) ) {

				LLVM_DEBUG(dbgs()
				<< " storeRegToStackSlot(): Handling special cases.\n");

				MachineOperand &I0Opnd = IprevMI->getOperand(0);

				// Avoiding separating VLOAD_H_SYM_IMM from its corresponding INLINEASM
				if (IprevMI->getOpcode() == Connex::VLOAD_H_SYM_IMM) {
				// Treating Symbolic immediate operands
				// MEGA-TODO: check
				//assert(0 && "Bogus");
				assert(IprevMI->getNumOperands() > 0); // Just checking
				assert(IMI->getOpcode() == Connex::INLINEASM &&
				"The INLINEASM with the immediate operand should be next "
				"for VLOAD_H_SYM_IMM.");

				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): Treating "
				"VLOAD_H_SYM_IMM case.\n");
				I++;
				//Iprev++;
				}

				if ( (//IprevMI->getNumOperands() > 0 &&
				I0Opnd.isReg() &&
				I0Opnd.isDef() &&
				I0Opnd.getReg() == SrcReg
				) ||
				(IprevMI->getOpcode() == Connex::INLINEASM)) {
				/* Important-TODO: check better: first, for SAD.f16 we have a COPY
				between the host-for and the spill - so we should do these checks
				after the hoisting of spills, etc - IMPORTANT: either in
				ConnexAsmPrinter.cpp or PostRAHazardRecognizer which I'm afraid to
				run for programs using bigger types like f16 - e.g., SSD.f16.
				It is possible that the instruction IprevMI be a
				VLOAD or a for loop that has an instruction with dst register
				the one that is spilled. */
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): Adding NOP_BPF to "
				"avoid data hazards...[Explain better...]\n");
				#ifdef EXPERIMENTAL_2019_05
				BuildMI(MBB, I, DL, get(Connex::NOP_BPF)).addImm(1);
				#endif
				}
				else {
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): Not putting NOP "
				"after IprevMI = "
				<< *IprevMI
				//<< " before I = " << *I << "\n");
				<< " before: IMI = " << IMI << ",\n"
				<< " IMI->getOpcode() = "
				<< IMI->getOpcode() << "\n");
				/* I get some error here, from MachineInstr.cpp:1695:
				"I = #0 0x00007faf1da72700" and then it
				crashes without any warning:
				<< " I = " << *IMI << "\n"); */
				}
				}
				else {
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): else case for "
				"if (IprevMI != NULL && ...)\n");
				}
				}
				else {
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): else case for "
				"if (IMI != NULL && Iprev != &MBB.front())\n");

				if (IMI == (&MBB.front())) {
				LLVM_DEBUG(dbgs() << " storeRegToStackSlot(): case IMI == &MBB.front()\n");
				// We conservatively put a NOP before the spill (Store)
				#ifdef EXPERIMENTAL_2019_05
				// MEGA MEGA-TODO: see /home/asusu/LLVM/Tests/DawnCC/35l_MatMul_f16/SIZE_256/H_CVL8_LLVMnew/A/STDerr_llc_01 - gives error: <<Assertion `itr != mi2iMap.end() && "Instruction not found in maps."' failed.>>
				BuildMI(MBB, I, DL, get(Connex::NOP_BPF)).addImm(1);
				#endif
				}
				}
				//BuildMI(MBB, I, DL, get(Connex::NOP_BOGUS));

				#ifdef EXPERIMENTAL_2019_05
				if (IFront == false) {
				#endif
				BuildMI(MBB, I, DL, get(Connex::ST_SPILL_H))
				.addReg(SrcReg, getKillRegState(IsKill))
				/*
				// Gives error I guess because it is a vector instruction, not eBPF one:
				// void llvm::MachineInstr::addOperand(llvm::MachineFunction&,
				// const llvm::MachineOperand&): Assertion `(isImpReg || Op.isRegMask() ||
				// MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) &&
				// "Trying to add an operand to a machine instr that is already done!"'
				// failed.
				.addFrameIndex(FI)
				// Even if Connex does NOT have a stack, we can use LS mem to easily
				// simulate it.
				*/
				.addImm(ConnexLSOffsetSpillLoad);
				#ifdef EXPERIMENTAL_2019_05
				}
				#endif

				LLVM_DEBUG(dbgs() <<
				" storeRegToStackSlot(): Added ST_SPILL_H instruction.\n");
				LLVM_DEBUG(dbgs() <<
				" storeRegToStackSlot(): MBB = " << MBB << "\n");
				}
				else
				if (RC == &Connex::BoolMaskRegClass) {
				/*
				BuildMI(MBB, I, DL, get(Connex::ST_H))
				.addReg(SrcReg, getKillRegState(IsKill))
				.addImm(CONNEX_MEM_NUM_ROWS - 100);
				// TODO: this is just bogus I guess, no need to spill v8i1 register
				*/
				}
				else {
				llvm_unreachable("Connex back end: Can't store this register to stack slot");
				}
				}



				// This implements filling/reloading - i.e., load for spilled registers
				// (both scalar, and vector).
				void ConnexInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I,
				unsigned DestReg,
				int FI,
				const TargetRegisterClass *RC,
				const TargetRegisterInfo *TRI) const {
				DebugLoc DL;

				assert(FI >= 2 && "I assumed wrong that frame index >= 2");

				unsigned ConnexLSOffsetFillLoad = (CONNEX_MEM_NUM_ROWS + 1) - FI;

				if (I != MBB.end())
				DL = I->getDebugLoc();

				if (RC == &Connex::GPRRegClass) {
				BuildMI(MBB, I, DL, get(Connex::LDD), DestReg)
				.addFrameIndex(FI)
				.addImm(0);
				}
				else
				if (RC == &Connex::VectorHRegClass) {
				/*
				// This actually generates a malformed scalar instruction with
				// vector register
				BuildMI(MBB, I, DL, get(Connex::LDD), DestReg)
				.addFrameIndex(FI)
				.addImm(0);
				*/
				/*
				// It is NOT correct since LLVM assumes it uses a stack and the
				// operations are sort of PUSH/POP. Even if Connex does NOT have
				// a stack, we can use LS to easily simulate it.
				BuildMI(MBB, I, DL, get(Connex::LD_H), DestReg)
				.addImm(CONNEX_MEM_NUM_ROWS - 1 - DestReg);
				*/

				LLVM_DEBUG(dbgs() << " ConnexInstrInfo::loadRegFromStackSlot(): Filling Wh"
				<< DestReg
				<< " from ConnexLSOffsetFillLoad = "
				<< ConnexLSOffsetFillLoad
				<< " (FI = "
				<< FI
				<< ")\n");

				/*
				IMPORTANT: Adding the NOP is NOT required, since the iread Connex
				instruction does NOT require a delay slot between itself and the
				instruction that uses the register read from the LS memory.
				*/
				BuildMI(MBB, I, DL, get(Connex::LD_FILL_H), DestReg)
				.addImm(ConnexLSOffsetFillLoad);
				// TODO TODO TODO: get num vector registers from ConnexRegisterInfo.td: def VectorH: RegisterClass<"Connex", [v128i16], 32,
				}
				else {
				llvm_unreachable("Connex back end: Can't load this register from stack slot");
				}
				}

				bool ConnexInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
				MachineBasicBlock *&TBB,
				MachineBasicBlock *&FBB,
				SmallVectorImpl<MachineOperand> &Cond,
				bool AllowModify) const {
				// Start from the bottom of the block and work up, examining the
				// terminator instructions.
				MachineBasicBlock::iterator I = MBB.end();
				while (I != MBB.begin()) {
				--I;
				if (I->isDebugValue())
				continue;

				// Working from the bottom, when we see a non-terminator
				// instruction, we're done.
				if (!isUnpredicatedTerminator(*I))
				break;

				// A terminator that isn't a branch can't easily be handled
				// by this analysis.
				if (!I->isBranch())
				return true;

				// Handle unconditional branches.
				if (I->getOpcode() == Connex::JMP) {
				if (!AllowModify) {
				TBB = I->getOperand(0).getMBB();
				continue;
				}

				// If the block has any instructions after a J, delete them.
				while (std::next(I) != MBB.end())
				std::next(I)->eraseFromParent();
				Cond.clear();
				FBB = 0;

				// Delete the J if it's equivalent to a fall-through.
				if (MBB.isLayoutSuccessor(I->getOperand(0).getMBB())) {
				TBB = 0;
				I->eraseFromParent();
				I = MBB.end();
				continue;
				}

				// TBB is used to indicate the unconditional destination.
				TBB = I->getOperand(0).getMBB();
				continue;
				}
				// Cannot handle conditional branches
				return true;
				}

				return false;
				}

				unsigned ConnexInstrInfo::insertBranch(MachineBasicBlock &MBB,
				MachineBasicBlock *TBB,
				MachineBasicBlock *FBB,
				ArrayRef<MachineOperand> Cond,
				const DebugLoc &DL,
				int *BytesAdded) const {
				// Shouldn't be a fall through.
				assert(TBB && "InsertBranch must not be told to insert a fallthrough");

				if (Cond.empty()) {
				// Unconditional branch
				assert(!FBB && "Unconditional branch with multiple successors!");
				BuildMI(&MBB, DL, get(Connex::JMP)).addMBB(TBB);
				return 1;
				}

				llvm_unreachable("Unexpected conditional branch");
				}

				unsigned ConnexInstrInfo::removeBranch(MachineBasicBlock &MBB,
				int *BytesRemoved) const {
				MachineBasicBlock::iterator I = MBB.end();
				unsigned Count = 0;

				while (I != MBB.begin()) {
				--I;
				if (I->isDebugValue())
				continue;
				if (I->getOpcode() != Connex::JMP)
				break;
				// Remove the branch.
				I->eraseFromParent();
				I = MBB.end();
				++Count;
				}

				return Count;
				}

				/*
				TODO TODO: better implement it in ConnexTargetMachine::addPreRegAlloc(), in
				order to avoid any spills the register allocator might create.

				ConnexInstrInfo::expandPostRAPseudo() creates bundle instructions
				from VLOAD_H_SYM_IMM + INLINEASM.
				This is a decent compromise: although I do NOT use pseudo-instructions,
				doing this after Register Allocation (PostRA) works because:
				- IMPORTANT: INLINEASM is considered a pseudo-instruction (NOTE that
				VLOAD_H_SYM_IMM is NOT considered a pseudo-instruction);
				- pre-RA scheduler does NOT break the VLOAD_H_SYM_IMM from its associated
				INLINEASM;
				- register allocator does NOT break either the VLOAD_H_SYM_IMM from its
				associated INLINEASM, more exactly it doesn't insert spills or fills
				between the two instructions as far as I can see. IMPORTANT: however I
				am NOT sure if this is always going to hold.
				As of Feb 2017, class TargetInstrInfo
				(see http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html)
				has a few methods called on MachineInstr, but expandPostRAPseudo() seems
				to be a very good candidate (also it has no method with MachineSDNode).
				Anyhow, we could create and register our own pass working on MachineInstr in
				order to bundle instructions together (or on MachineSDNode, before pre-RA
				scheduler, although I guess it might be DIFFICULT to bundle from
				MachineSDNode to MachineInstr, since we have to perform a simple scheduling).

				From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				<<This function is called for all pseudo instructions that remain after register allocation.
				Many pseudo instructions are created to help register allocation.
				This is the place to convert them into real instructions.
				The target can edit MI in place, or it can insert new instructions and erase MI.
				The function should return true if anything was changed.>>
				*/
				bool ConnexInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
				// Making expandPostRAPseudo() do nothing:
				return false;

				LLVM_DEBUG(dbgs() << "ConnexInstrInfo::expandPostRAPseudo(): MI.getOpcode() = "
				<< MI.getOpcode() << "\n");

				MachineBasicBlock *MBB = MI.getParent();
				DebugLoc DL = MBB->findDebugLoc(MI);

				/*
				// Inspired from lib/Target/PowerPC/PPCCTRLoops.cpp
				for (MachineBasicBlock::pred_iterator PI = MBB->pred_begin(),
				PIE = MBB->pred_end(); PI != PIE; ++PI)
				Preds.push_back(*PI);
				*/
				switch (MI.getOpcode()) {
				default:
				//return expandPostRAPseudo(MI);
				return false;

				case Connex::VLOAD_H_SYM_IMM:
				// This is just a placeholder for register allocation.
				LLVM_DEBUG(dbgs() <<
				"ConnexInstrInfo::expandPostRAPseudo(): found VLOAD_H_SYM_IMM\n");
				//MI.eraseFromParent();
				break;

				case Connex::INLINEASM:
				// This is just a placeholder for register allocation.
				LLVM_DEBUG(dbgs() <<
				"ConnexInstrInfo::expandPostRAPseudo(): found INLINEASM\n");

				/*
				MachineInstr *predMI = NULL;
				MachineInstr *succMI = NULL;
				for (MachineBasicBlock::iterator I = MBB->begin(),
				IE = MBB->end(); I != IE; ++I) {
				MachineInstr *IMI = I;
				if (IMI == &MI) {
				I++;
				succMI = I;
				// predMI contains normally instruction VLOAD_H_SYM_IMM
				break;
				}
				predMI = I;
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): (pred) I->getOpcode() = "
				<< I->getOpcode() << "\n");
				}
				*/
				MachineInstr *succMI;
				MachineInstr *predMI = getPredMachineInstr(&MI, &succMI);

				if (predMI != NULL) {
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): predMI = "
				<< *predMI
				<< "(" << predMI << ")"
				<< "\n");
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): succMI = "
				<< *succMI
				<< "(" << succMI << ")"
				<< "\n");
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): MI = "
				<< MI
				<< "(" << &MI << ")"
				<< "\n");

				if (predMI->getOpcode() == Connex::VLOAD_H_SYM_IMM) {
				// Inspired from lib/Target/AMDGPU/SIInstrInfo.cpp
				// (or Mips/MipsDelaySlotFiller.cpp)
				/* Create a bundle so these instructions won't be re-ordered by the
				post-RA scheduler. */

				/*
				#ifdef THIS_DOES_NOT_ASMPRINT_BUNDLES
				MIBundleBuilder Bundler(*MBB, MI);

				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): predMI->getParent() = "
				<< predMI->getParent() << "\n");

				// This must NOT be commented. Otherwise, it results in ~strange error
				in ConnexMCInstLower::Lower()
				predMI->eraseFromParent();
				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): appending predMI to bundle\n");
				Bundler.append(predMI);

				LLVM_DEBUG(dbgs() << "expandPostRAPseudo(): calling finalizeBundle()\n");
				// See http://llvm.org/docs/doxygen/html/MachineInstrBundle_8cpp_source.html#l00217
				llvm::finalizeBundle(*MBB, Bundler.begin());

				MI.eraseFromParent();

				#ifdef NOT_USEFUL
				// Inspired from http://llvm.org/docs/doxygen/html/MachineInstrBuilder_8h_source.html#l00434
				MI.bundleWithPred();
				// Does NOT compile: llvm::finalizeBundle(MBB, predMI);
				#endif
				*/

				/* We now know that MI is the INLINEASM instruction that
				needs to be bundled with the previous instruction, predMI.
				*/
				/*
				We do NOT use MIBundleBuilder, with eventual MI/predMI/succMI.eraseFromParent().
				Just predMI and succMI iterators.
				Note that succMI is required if we want to bundle
				instructions in the interval
				predMI..MI, where succMI = succ(MI).

				So we normally bundle here: predMI, MI (without succMI).
				*/
				/* See llvm.org/docs/doxygen/html/MachineInstrBundle_8cpp_source.html#l00106
				and http://llvm.org/docs/doxygen/html/MachineInstrBundle_8cpp_source.html#l00217
				*/
				llvm::finalizeBundle(*MBB,
				(MachineBasicBlock::instr_iterator)predMI,
				(MachineBasicBlock::instr_iterator)succMI);
				//(MachineBasicBlock::instr_iterator)&MI);

				/*
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MIBundleBuilder.html
				// MIBundleBuilder (MachineBasicBlock &BB, MachineBasicBlock::iterator B, MachineBasicBlock::iterator E)
				// Create a bundle from the sequence of instructions between B and E.
				MIBundleBuilder Bundler(*MBB, predMI, MI);

				// MI.eraseFromParent();
				// Bundler.append(&MI);

				//Bundler.append(&MI);
				//

				// Gives error
				//include/llvm/CodeGen/MachineInstrBundleIterator.h:42:
				//llvm::MachineInstrBundleIterator<Ty>::MachineInstrBundleIterator(Ty*)
				//[with Ty = llvm::MachineInstr]:
				//Assertion `(!MI || !MI->isBundledWithPred()) && "It's not legal to
				//initialize " "MachineInstrBundleIterator " "with a bundled MI"' failed.
				////MIBundleBuilder Bundler(MBB, predMI, succMI);

				// See http://llvm.org/docs/doxygen/html/MachineInstrBundle_8cpp_source.html#l00217
				llvm::finalizeBundle(*MBB, Bundler.begin());

				MI.eraseFromParent();

				// This yields error <<[with Ty = llvm::MachineInstr]:
				// Assertion `(!MI || !MI->isBundledWithPred()) &&
				// "It's not legal to initialize " "MachineInstrBundleIterator "
				// "with a bundled MI"' failed.>>
				// predMI->eraseFromParent();
				*/
				}
				}

				break;
				}

				LLVM_DEBUG(dbgs() << "Before exit expandPostRAPseudo():\n");
				// Gives error since MI can be bundled: <<Assertion `!MI.isBundledWithPred() && "It's not legal to initialize " "MachineInstrBundleIterator with a " "bundled MI"' failed.>> MachineBasicBlock &MBB = *(MI.getParent());

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				//for (auto it: *MBB)
				for (MachineBasicBlock::iterator I = MBB->begin(),
				IE = MBB->end(); I != IE; ++I) {
				/*
				LLVM_DEBUG(dbgs() << "ConnexInstrInfo::expandPostRAPseudo(): it->getOpcode() = "
				<< it->getOpcode() << "\n");
				*/
				LLVM_DEBUG(dbgs() << " I = " << *I << "\n");
				/*
				switch (MI.getOpcode()) {
				}
				*/
				}

				/*
				const SIRegisterInfo *TRI
				= static_cast<const SIRegisterInfo *>(ST.getRegisterInfo());
				MachineFunction &MF = MBB->getParent();
				unsigned Reg = MI.getOperand(0).getReg();
				unsigned RegLo = TRI->getSubReg(Reg, AMDGPU::sub0);
				unsigned RegHi = TRI->getSubReg(Reg, AMDGPU::sub1);

				// Create a bundle so these instructions won't be re-ordered by the
				// post-RA scheduler.
				MIBundleBuilder Bundler(*MBB, MI);
				Bundler.append(BuildMI(MF, DL, get(AMDGPU::S_GETPC_B64), Reg));

				// Add 32-bit offset from this instruction to the start of the
				// constant data.
				Bundler.append(BuildMI(MF, DL, get(AMDGPU::S_ADD_U32), RegLo)
				.addReg(RegLo)
				.addOperand(MI.getOperand(1)));

				llvm::finalizeBundle(*MBB, Bundler.begin());

				MI.eraseFromParent();
				break;
				*/

				return false;
				} // END ConnexInstrInfo::expandPostRAPseudo()


				// USE_POSTRA_SCHED
				// Inspired from llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				ScheduleHazardRecognizer *ConnexInstrInfo::CreateTargetPostRAHazardRecognizer(
				const InstrItineraryData *II,
				const ScheduleDAG *DAG) const {
				/*
				unsigned Directive =
				DAG->MF.getSubtarget<PPCSubtarget>().getDarwinDirective();
				*/
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::CreateTargetPostRAHazardRecognizer()\n");

				return new ConnexDispatchGroupSBHazardRecognizer(II, DAG);
				}


				/*
				ScheduleHazardRecognizer *
				ConnexInstrInfo::CreateTargetPostRAHazardRecognizer(const MachineFunction &MF) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::CreateTargetPostRAHazardRecognizer(MachineFunction)\n");

				// TODO: Take inspiration from how AMDGPU added a separate
				// PostRA HazardRecognizer.
				// See http://llvm.org/doxygen/classllvm_1_1MachineFunction.html
				return new ConnexDispatchGroupSBHazardRecognizer(II, DAG);
				}
				*/

				// Pre-RA MI-scheduler - used if I give llc -enable-misched ...
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				ScheduleHazardRecognizer *ConnexInstrInfo::CreateTargetMIHazardRecognizer(
				const InstrItineraryData *II,
				const ScheduleDAG *DAG) const {
				LLVM_DEBUG(dbgs() <<
				"Entered ConnexInstrInfo::CreateTargetMIHazardRecognizer()\n");

				return new ConnexDispatchGroupSBHazardRecognizerPreRAScheduler(II, DAG);
				}


				/*
				// USE_PRERA_HAZARD_RECOGNIZER

				// Pre-RA scheduler - default scheduler (no special param given to llc)
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html
				ScheduleHazardRecognizer *ConnexInstrInfo::CreateTargetHazardRecognizer(
				const TargetSubtargetInfo *STI,
				const ScheduleDAG *DAG) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::CreateTargetHazardRecognizer()\n");

				return new ConnexDispatchGroupSBHazardRecognizerPreRAScheduler(
				// See http://llvm.org/docs/doxygen/html/TargetSubtargetInfo_8h_source.html#l00100
				STI->getInstrItineraryData(),
				DAG);
				}
				*/

				// Inspired from llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
				void ConnexInstrInfo::insertNoop(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MI) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstrInfo::insertNoop()\n");

				DebugLoc DL;
				BuildMI(MBB, MI, DL, get(Connex::NOP));
				}


				// From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html: <<Return true if the specified instruction can be predicated.>>
				/* From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineInstr.html:
				<<bool isPredicable (QueryType Type=AllInBundle) const
				Return true if this instruction has a predicate operand that controls execution.>>
				*/
				// Inspired from ARMBaseInstrInfo::isPredicable
				bool ConnexInstrInfo::isPredicable(MachineInstr &MI) const {
				//if (!MI.isPredicable())
				// return false;
				LLVM_DEBUG(dbgs() << "ConnexInstrInfo::isPredicable(): MI.getOpcode() = "
				<< MI.getOpcode() << "\n");

				if (MI.getOpcode() == Connex::VLOAD_H) {
				return true;
				}

				return false;
				}

lib/Target/Connex/ConnexMCInstLower.h

				//===-- ConnexMCInstLower.h - Lower MachineInstr to MCInst ---------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXMCINSTLOWER_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXMCINSTLOWER_H

				#include "llvm/Support/Compiler.h"

				namespace llvm {
				class AsmPrinter;
				class MCContext;
				class MCInst;
				class MCOperand;
				class MCSymbol;
				class MachineInstr;
				class MachineModuleInfoMachO;
				class MachineOperand;
				class Mangler;

				// ConnexMCInstLower - This class is used to lower a MachineInstr into an MCInst.
				class LLVM_LIBRARY_VISIBILITY ConnexMCInstLower {
				MCContext &Ctx;

				AsmPrinter &Printer;

				public:
				ConnexMCInstLower(MCContext &ctx, AsmPrinter &printer)
				: Ctx(ctx), Printer(printer) {}
				void Lower(const MachineInstr *MI, MCInst &OutMI) const;

				MCOperand LowerSymbolOperand(const MachineOperand &MO, MCSymbol *Sym) const;

				MCSymbol *GetGlobalAddressSymbol(const MachineOperand &MO) const;
				};
				}

				#endif

lib/Target/Connex/ConnexMCInstLower.cpp

				//=-- ConnexMCInstLower.cpp - Convert Connex MachineInstr to an MCInst ------------=//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains code to lower Connex MachineInstrs to their corresponding
				// MCInst records.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexMCInstLower.h"
				#include "llvm/CodeGen/AsmPrinter.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCContext.h"
				#include "llvm/MC/MCExpr.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/ADT/SmallString.h"

				#include "llvm/Support/Debug.h" // for dbgs and LLVM_DEBUG() macro
				#define DEBUG_TYPE "mc-inst-lower"


				using namespace llvm;

				MCSymbol *
				ConnexMCInstLower::GetGlobalAddressSymbol(const MachineOperand &MO) const {
				return Printer.getSymbol(MO.getGlobal());
				}

				MCOperand ConnexMCInstLower::LowerSymbolOperand(const MachineOperand &MO,
				MCSymbol *Sym) const {

				const MCExpr *Expr = MCSymbolRefExpr::create(Sym, Ctx);

				if (!MO.isJTI() && MO.getOffset())
				llvm_unreachable("unknown symbol op");

				return MCOperand::createExpr(Expr);
				}

				void ConnexMCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexMCInstLower::Lower(*MI = "
				<< *MI << ")...\n");
				OutMI.setOpcode(MI->getOpcode());

				for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
				const MachineOperand &MO = MI->getOperand(i);
				LLVM_DEBUG(dbgs() << "ConnexMCInstLower::Lower(): MO = "
				<< MO << "\n");
				LLVM_DEBUG(dbgs() << " ConnexMCInstLower::Lower(): MO.getType() = "
				<< MO.getType() << "\n");

				MCOperand MCOp;

				switch (MO.getType()) {

				default:
				MI->dump();
				/*
				LLVM_DEBUG(dbgs() << "ConnexMCInstLower::Lower(): MO.getType() = "
				<< MO.getType() << "\n");
				*/

				llvm_unreachable("unknown operand type");



				case MachineOperand::MO_ExternalSymbol: {
				const MCSymbol *Symbol = Printer.GetExternalSymbolSymbol(MO.getSymbolName());
				MCSymbolRefExpr::VariantKind Kind = MCSymbolRefExpr::VK_None;
				const MCExpr *Expr = MCSymbolRefExpr::create(Symbol, Kind, Ctx);
				MCOp = MCOperand::createExpr(Expr);
				//Offset += MO.getOffset();
				break;
				}

				//case MachineOperand::MO_MetaData: {
				case MachineOperand::MO_Metadata: {
				continue;
				//break;
				}

				case MachineOperand::MO_Register:
				// Ignore all implicit register operands.
				if (MO.isImplicit())
				continue;
				MCOp = MCOperand::createReg(MO.getReg());
				break;

				case MachineOperand::MO_Immediate:
				MCOp = MCOperand::createImm(MO.getImm());
				break;

				case MachineOperand::MO_MachineBasicBlock:
				MCOp = MCOperand::createExpr(
				MCSymbolRefExpr::create(MO.getMBB()->getSymbol(), Ctx));
				break;

				case MachineOperand::MO_RegisterMask:
				continue;
				case MachineOperand::MO_GlobalAddress:
				MCOp = LowerSymbolOperand(MO, GetGlobalAddressSymbol(MO));
				break;
				}

				OutMI.addOperand(MCOp);
				}
				}
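
Note on the operand filtering in ConnexMCInstLower::Lower() above: the switch drops implicit register operands, register masks, and metadata, and copies the remaining operands into the MCInst. The following is a minimal standalone sketch of that filtering rule only. It assumes simplified stand-in types (ToyOperandKind, ToyMachineOperand, ToyMCOperand, and lowerOperands are hypothetical names, not LLVM API) and it does not model symbols, basic blocks, or global addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified operand kinds (a small subset of what
// MachineOperand distinguishes in LLVM).
enum class ToyOperandKind { Register, Immediate, RegisterMask, Metadata };

// Hypothetical stand-in for MachineOperand.
struct ToyMachineOperand {
  ToyOperandKind Kind;
  int64_t Value;   // register number or immediate value
  bool IsImplicit; // only meaningful for Register operands
};

// Hypothetical stand-in for MCOperand.
struct ToyMCOperand {
  bool IsReg;
  int64_t Value;
};

// Applies the same filtering as the switch in Lower(): implicit registers,
// register masks, and metadata are skipped; explicit registers and
// immediates are copied to the output in their original order.
inline std::vector<ToyMCOperand>
lowerOperands(const std::vector<ToyMachineOperand> &Ops) {
  std::vector<ToyMCOperand> Out;
  for (const ToyMachineOperand &MO : Ops) {
    switch (MO.Kind) {
    case ToyOperandKind::Register:
      if (MO.IsImplicit)
        continue; // skip implicit register operands
      Out.push_back({true, MO.Value});
      break;
    case ToyOperandKind::Immediate:
      Out.push_back({false, MO.Value});
      break;
    case ToyOperandKind::RegisterMask:
    case ToyOperandKind::Metadata:
      continue; // never emitted as MC operands
    }
  }
  return Out;
}
```

In this sketch, as in Lower(), `continue` inside the switch moves on to the next operand without appending anything, so the output can be shorter than the input.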

lib/Target/Connex/ConnexRegisterInfo.h

				//===-- ConnexRegisterInfo.h - Connex Register Information Impl -------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetRegisterInfo class.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXREGISTERINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXREGISTERINFO_H

				#include "llvm/CodeGen/TargetRegisterInfo.h"

				#define GET_REGINFO_HEADER
				#include "ConnexGenRegisterInfo.inc"

				namespace llvm {

				struct ConnexRegisterInfo : public ConnexGenRegisterInfo {

				ConnexRegisterInfo();

				// Inspired from lib/Target/Mips/MipsRegisterInfo.cpp
				const TargetRegisterClass *getPointerRegClass(const MachineFunction &MF,
				unsigned Kind) const;

				const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;

				/*
				From http://llvm.org/doxygen/classllvm_1_1TargetRegisterInfo.html:
				<<Returns a bitset indexed by physical register number indicating if a
				register is a special register that has particular uses and should be
				considered unavailable at all times, e.g. stack pointer, return address.
				A reserved register:
				- is not allocatable
				- is considered always live
				- is ignored by liveness tracking
				It is often necessary to reserve the super registers of a reserved
				register as well, to avoid them getting allocated indirectly. You may
				use markSuperRegs() and checkAllSuperRegsMarked() in this case.>>
				*/
				BitVector getReservedRegs(const MachineFunction &MF) const override;

				void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
				unsigned FIOperandNum,
				RegScavenger *RS = nullptr) const override;

				unsigned getFrameRegister(const MachineFunction &MF) const override;


				/* Addressing bug
				(llc -O0, at pass: "******** FAST REGISTER ALLOCATION ********")
				<<Remaining virtual register operands
				UNREACHABLE executed at /home/asusu/LLVM/llvm38Nov2016/llvm/lib/CodeGen/MachineRegisterInfo.cpp:144!>>

				(Using the suggestion from https://groups.google.com/forum/#!topic/llvm-dev/fEyD9YREi5M).
				*/
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1TargetRegisterInfo.html
				// Returns true if the target requires (and can make use of) the register scavenger.
				bool requiresRegisterScavenging(const MachineFunction &MF) const override {
				//return true;
				return false;
				}

				bool requiresFrameIndexScavenging(const MachineFunction &MF) const override {
				//return true;
				return false;
				}
				};
				}

				#endif

lib/Target/Connex/ConnexRegisterInfo.cpp

				//===-- ConnexRegisterInfo.cpp - Connex Register Information ----------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the Connex implementation of the TargetRegisterInfo class.
				//
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "ConnexRegisterInfo.h"
				#include "ConnexSubtarget.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/RegisterScavenging.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/CodeGen/TargetFrameLowering.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"

				#define GET_REGINFO_TARGET_DESC
				#include "ConnexGenRegisterInfo.inc"
				using namespace llvm;

				#include "llvm/Support/Debug.h" // for dbgs and LLVM_DEBUG() macro
				#define DEBUG_TYPE "mc-inst-lower"



				ConnexRegisterInfo::ConnexRegisterInfo()
				: ConnexGenRegisterInfo(Connex::R0) {}

				// Inspired from lib/Target/Mips/MipsRegisterInfo.cpp
				const TargetRegisterClass *ConnexRegisterInfo::getPointerRegClass(
				const MachineFunction &MF,
				unsigned Kind) const {
				return &Connex::GPRRegClass;
				}

				const MCPhysReg *ConnexRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
				return CSR_SaveList;
				}

				BitVector ConnexRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
				int numRegs = getNumRegs();

				LLVM_DEBUG(dbgs() << "getReservedRegs(): numRegs = "
				<< numRegs << "\n");

				BitVector Reserved(numRegs);
				Reserved.set(Connex::R10); // R10 is read only frame pointer
				Reserved.set(Connex::R11); // R11 is pseudo stack pointer

				/* Wh30, vector register R(30), is used by me to codegen:
				- LLVM's VSELECT on Connex in ConnexTargetMachine.cpp - PassAfterPostRAScheduler
				(NO longer: in ConnexISelLowering::Lower() for VSELECT to be
				lowered to WHERE*).
				Doing so we avoid errors like:
				<<* Bad machine code: Using an undefined physical register *
				- function: IfConversion
				- basic block: BB#6 vector.body (0x1501fd8)
				- instruction: %vreg47<def> = COPY
				- operand 1: %Wh31>>

				- in ConnexInstrInfo::copyPhysReg() .
				*/
				Reserved.set(CONNEX_RESERVED_REGISTER_01);
				Reserved.set(CONNEX_RESERVED_REGISTER_02);
				Reserved.set(CONNEX_RESERVED_REGISTER_03);

				return Reserved;
				}

				// From book Lopes_2014:
				// "implements this replacement by converting each frame index to a real stack offset
				// for all machine instructions that contain stack references (usually loads and stores).
				// Extra instructions are also generated whenever additional stack offset arithmetic is
				// necessary".
				void ConnexRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
				int SPAdj, unsigned FIOperandNum,
				RegScavenger *RS) const {
				assert(SPAdj == 0 && "Unexpected");

				unsigned i = 0;
				MachineInstr &MI = *II;
				MachineFunction &MF = *MI.getParent()->getParent();
				DebugLoc DL = MI.getDebugLoc();

				while (!MI.getOperand(i).isFI()) {
				++i;
				assert(i < MI.getNumOperands() && "Instr doesn't have FrameIndex operand!");
				}

				unsigned FrameReg = getFrameRegister(MF);
				int FrameIndex = MI.getOperand(i).getIndex();
				const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
				MachineBasicBlock &MBB = *MI.getParent();

				if (MI.getOpcode() == Connex::MOV_rr) {
				MI.getOperand(i).ChangeToRegister(FrameReg, false);

				// TODO (maybe): we took out the scalar ADD, so this block may have to be commented out
				// /*
				int Offset = MF.getFrameInfo().getObjectOffset(FrameIndex);
				unsigned reg = MI.getOperand(i - 1).getReg();

				BuildMI(MBB, ++II, DL, TII.get(Connex::ADD_ri), reg)
				.addReg(reg)
				.addImm(Offset);
				// */

				return;
				}

				int Offset = MF.getFrameInfo().getObjectOffset(FrameIndex) +
				MI.getOperand(i + 1).getImm();

				if (!isInt<32>(Offset))
				llvm_unreachable("bug in frame offset");

				if (MI.getOpcode() == Connex::FI_ri) {
				// architecture does not really support FI_ri, replace it with
				// MOV_rr <target_reg>, frame_reg
				// ADD_ri <target_reg>, imm
				unsigned reg = MI.getOperand(i - 1).getReg();

				BuildMI(MBB, ++II, DL, TII.get(Connex::MOV_rr), reg)
				.addReg(FrameReg);

				// TODO (maybe): we took out the scalar ADD, so this block may have to be commented out
				// /*
				BuildMI(MBB, II, DL, TII.get(Connex::ADD_ri), reg)
				.addReg(reg)
				.addImm(Offset);
				// */

				// Remove FI_ri instruction
				MI.eraseFromParent();
				}
				else {
				MI.getOperand(i).ChangeToRegister(FrameReg, false);
				MI.getOperand(i + 1).ChangeToImmediate(Offset);
				}
				}

				unsigned ConnexRegisterInfo::getFrameRegister(const MachineFunction &MF) const {
				// TODO: in principle we should also return a vector register, e.g. Connex::Wh28, for the Connex vector processor
				return Connex::R10;
				}
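
Note on the general path of eliminateFrameIndex() above (the one that is neither MOV_rr nor FI_ri): it replaces the frame-index operand with the frame register and folds the object's stack offset into the instruction's immediate. The following is a minimal standalone sketch of just that arithmetic. It assumes hypothetical stand-in types (ToyFrameInfo, ResolvedAddress, and resolveFrameIndex are not LLVM API) and uses a 64-bit intermediate so that the range check is meaningful. In the code above, Offset is declared as int, so on platforms where int is 32 bits the isInt<32> check cannot fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for MachineFrameInfo: one byte offset per frame index.
struct ToyFrameInfo {
  std::vector<int64_t> ObjectOffsets;
  int64_t getObjectOffset(int FrameIndex) const {
    return ObjectOffsets.at(static_cast<std::size_t>(FrameIndex));
  }
};

// Hypothetical stand-in for the rewritten operand pair: base register + offset.
struct ResolvedAddress {
  unsigned BaseReg;
  int32_t Offset;
};

// Computes frame object offset + instruction immediate and checks that the
// sum fits in a signed 32-bit field. Returns false where the reviewed code
// would hit llvm_unreachable("bug in frame offset").
inline bool resolveFrameIndex(const ToyFrameInfo &MFI, int FrameIndex,
                              int64_t Imm, unsigned FrameReg,
                              ResolvedAddress &Out) {
  const int64_t Offset = MFI.getObjectOffset(FrameIndex) + Imm;
  if (Offset < std::numeric_limits<int32_t>::min() ||
      Offset > std::numeric_limits<int32_t>::max())
    return false;
  Out.BaseReg = FrameReg;
  Out.Offset = static_cast<int32_t>(Offset);
  return true;
}
```

For example, with object offsets {-8, -16, -24}, frame index 1 and immediate 4 resolve to offset -12 relative to the frame register, while an immediate too large for 32 bits is rejected.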

lib/Target/Connex/ConnexSelectionDAGInfo.h

				//===-- ConnexSelectionDAGInfo.h - Connex SelectionDAG Info ------------ C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the Connex subclass for SelectionDAGTargetInfo.
				///
				//===----------------------------------------------------------------------===//

				// Inspired from ARM/ARMSelectionDAGInfo.cpp


				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXSELECTIONDAGINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXSELECTIONDAGINFO_H

				//#include "MCTargetDesc/ConnexAddressingModes.h"
				#include "llvm/CodeGen/RuntimeLibcalls.h"
				#include "llvm/CodeGen/SelectionDAGTargetInfo.h"

				namespace llvm {

				/*
				namespace Connex_AM {
				static inline ShiftOpc getShiftOpcForNode(unsigned Opcode) {
				switch (Opcode) {
				default: return Connex_AM::no_shift;
				case ISD::SHL: return Connex_AM::lsl;
				case ISD::SRL: return Connex_AM::lsr;
				case ISD::SRA: return Connex_AM::asr;
				case ISD::ROTR: return Connex_AM::ror;
				//case ISD::ROTL: // Only if imm -> turn into ROTR.
				// Can't handle RRX here, because it would require folding a flag into
				// the addressing mode. :( This causes us to miss certain things.
				//case ConnexISD::RRX: return Connex_AM::rrx;
				}
				}
				} // end namespace Connex_AM
				*/

				class ConnexSelectionDAGInfo : public SelectionDAGTargetInfo {
				public:
				SDValue EmitTargetCodeForMemcpy(SelectionDAG &DAG, const SDLoc &dl,
				SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, unsigned Align, bool isVolatile,
				bool AlwaysInline,
				MachinePointerInfo DstPtrInfo,
				MachinePointerInfo SrcPtrInfo) const override;

				SDValue
				EmitTargetCodeForMemmove(SelectionDAG &DAG, const SDLoc &dl, SDValue Chain,
				SDValue Dst, SDValue Src, SDValue Size,
				unsigned Align, bool isVolatile,
				MachinePointerInfo DstPtrInfo,
				MachinePointerInfo SrcPtrInfo) const override;

				// Adjust parameters for memset, see RTABI section 4.3.4
				SDValue EmitTargetCodeForMemset(SelectionDAG &DAG, const SDLoc &dl,
				SDValue Chain, SDValue Op1, SDValue Op2,
				SDValue Op3, unsigned Align, bool isVolatile,
				MachinePointerInfo DstPtrInfo) const override;

				SDValue EmitSpecializedLibcall(SelectionDAG &DAG, const SDLoc &dl,
				SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, unsigned Align,
				RTLIB::Libcall LC) const;
				};

				} // end namespace llvm

				#endif

lib/Target/Connex/ConnexSelectionDAGInfo.cpp

				//===-- ConnexSelectionDAGInfo.cpp - Connex SelectionDAG Info -------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the ConnexSelectionDAGInfo class.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexTargetMachine.h"
				#include "llvm/CodeGen/SelectionDAG.h"
				#include "llvm/IR/DerivedTypes.h"
				#include "ConnexSelectionDAGInfo.h"


				// Inspired from ARM/ARMSelectionDAGInfo.cpp

				using namespace llvm;

				#define DEBUG_TYPE "connex-selectiondag-info"

				// Emit, if possible, a specialized version of the given Libcall. Typically this
				// means selecting the appropriately aligned version, but we also convert memset
				// of 0 into memclr.
				SDValue ConnexSelectionDAGInfo::EmitSpecializedLibcall(
				SelectionDAG &DAG, const SDLoc &dl, SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, unsigned Align, RTLIB::Libcall LC) const {

				const ConnexSubtarget &Subtarget =
				DAG.getMachineFunction().getSubtarget<ConnexSubtarget>();
				const ConnexTargetLowering *TLI = Subtarget.getTargetLowering();

				TargetLowering::ArgListTy Args;
				TargetLowering::ArgListEntry Entry;
				Entry.Ty = DAG.getDataLayout().getIntPtrType(*DAG.getContext());
				Entry.Node = Dst;
				Args.push_back(Entry);

				/*
				if (AEABILibcall == AEABI_MEMCLR) {
				Entry.Node = Size;
				Args.push_back(Entry);
				} else if (AEABILibcall == AEABI_MEMSET) {
				*/
				// Adjust parameters for memset, EABI uses format (ptr, size, value),
				// GNU library uses (ptr, value, size)
				// See RTABI section 4.3.4
				Entry.Node = Size;
				Args.push_back(Entry);

				// Extend or truncate the argument to be an i32 value for the call.
				if (Src.getValueType().bitsGT(MVT::i32))
				Src = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, Src);
				else if (Src.getValueType().bitsLT(MVT::i32))
				Src = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i32, Src);

				Entry.Node = Src;
				Entry.Ty = Type::getInt32Ty(*DAG.getContext());
				Entry.IsSExt = false;
				Args.push_back(Entry);
				/*
				} else {
				Entry.Node = Src;
				Args.push_back(Entry);

				Entry.Node = Size;
				Args.push_back(Entry);
				}
				*/

				static char const *FunctionNames[4][3] = {
				{ "__aeabi_memcpy", "__aeabi_memcpy4", "__aeabi_memcpy8" },
				{ "__aeabi_memmove", "__aeabi_memmove4", "__aeabi_memmove8" },
				//{ "__aeabi_memset", "__aeabi_memset4", "__aeabi_memset8" },
				{ "memset", "memset", "memset" },
				{ "__aeabi_memclr", "__aeabi_memclr4", "__aeabi_memclr8" }
				};
				TargetLowering::CallLoweringInfo CLI(DAG);
				CLI.setDebugLoc(dl)
				.setChain(Chain)
				.setCallee(
				TLI->getLibcallCallingConv(LC), Type::getVoidTy(*DAG.getContext()),
				DAG.getExternalSymbol(FunctionNames[2][2],
				TLI->getPointerTy(DAG.getDataLayout())),
				std::move(Args))
				.setDiscardResult();
				std::pair<SDValue,SDValue> CallResult = TLI->LowerCallTo(CLI);

				return CallResult.second;
				}

				SDValue ConnexSelectionDAGInfo::EmitTargetCodeForMemcpy(
				SelectionDAG &DAG, const SDLoc &dl, SDValue Chain, SDValue Dst, SDValue Src,
				SDValue Size, unsigned Align, bool isVolatile, bool AlwaysInline,
				MachinePointerInfo DstPtrInfo, MachinePointerInfo SrcPtrInfo) const {
				return EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size, Align,
				RTLIB::MEMCPY);
				}

				SDValue ConnexSelectionDAGInfo::EmitTargetCodeForMemmove(SelectionDAG &DAG,
				const SDLoc &dl,
				SDValue Chain,
				SDValue Dst,
				SDValue Src,
				SDValue Size,
				unsigned Align,
				bool isVolatile,
				MachinePointerInfo DstPtrInfo,
				MachinePointerInfo SrcPtrInfo) const {
				return EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size, Align,
				RTLIB::MEMMOVE);
				arsenm: static const
				}

				SDValue ConnexSelectionDAGInfo::EmitTargetCodeForMemset(SelectionDAG &DAG,
				const SDLoc &dl,
				SDValue Chain,
				SDValue Dst,
				SDValue Src,
				SDValue Size,
				unsigned Align,
				bool isVolatile,
				MachinePointerInfo DstPtrInfo) const {
				LLVM_DEBUG(dbgs() << "Entered ConnexSelectionDAGInfo::EmitTargetCodeForMemset()"
				<< "\n");

				return EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size, Align,
				RTLIB::MEMSET);
				}
				arsenm: More junk macros

lib/Target/Connex/ConnexSubtarget.h

				//===-- ConnexSubtarget.h - Define Subtarget for the Connex ------------ C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the Connex specific subclass of TargetSubtargetInfo.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXSUBTARGET_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXSUBTARGET_H

				#include "ConnexFrameLowering.h"
				#include "ConnexISelLowering.h"
				#include "ConnexInstrInfo.h"
				#include "ConnexSelectionDAGInfo.h"

				#include "llvm/CodeGen/SelectionDAGTargetInfo.h"
				#include "llvm/CodeGen/TargetSubtargetInfo.h"
				#include "llvm/IR/DataLayout.h"
				#include "llvm/Target/TargetMachine.h"

				#define GET_SUBTARGETINFO_HEADER
				#include "ConnexGenSubtargetInfo.inc"

				namespace llvm {
				class StringRef;

				class ConnexSubtarget : public ConnexGenSubtargetInfo {
				virtual void anchor();
				ConnexInstrInfo InstrInfo;
				ConnexFrameLowering FrameLowering;
				ConnexTargetLowering TLInfo;

				SelectionDAGTargetInfo TSInfo;
				ConnexSelectionDAGInfo TSInfo2;

				public:
				// This constructor initializes the data members to match that
				// of the specified triple.
				ConnexSubtarget(const Triple &TT, const std::string &CPU, const std::string &FS,
				const TargetMachine &TM);

				// ParseSubtargetFeatures - Parses features string setting specified
				// subtarget options. Definition of function is auto generated by tblgen.
				void ParseSubtargetFeatures(StringRef CPU, StringRef FS);

				const ConnexInstrInfo *getInstrInfo() const override { return &InstrInfo; }
				const ConnexFrameLowering *getFrameLowering() const override {
				return &FrameLowering;
				}
				const ConnexTargetLowering *getTargetLowering() const override {
				return &TLInfo;
				}

				const TargetRegisterInfo *getRegisterInfo() const override {
				return &InstrInfo.getRegisterInfo();
				}

				// Inspired from ARM/ARMSubtarget.cpp
				const ConnexSelectionDAGInfo *getSelectionDAGInfo() const override {
				return &TSInfo2;
				}
				};
				} // End llvm namespace

				#endif

lib/Target/Connex/ConnexSubtarget.cpp

				//===-- ConnexSubtarget.cpp - Connex Subtarget Information ----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the Connex specific subclass of TargetSubtargetInfo.
				//
				//===----------------------------------------------------------------------===//

				#include "ConnexSubtarget.h"
				#include "Connex.h"
				#include "llvm/Support/TargetRegistry.h"

				using namespace llvm;

				#define DEBUG_TYPE "connex-subtarget"

				#define GET_SUBTARGETINFO_TARGET_DESC
				#define GET_SUBTARGETINFO_CTOR
				#include "ConnexGenSubtargetInfo.inc"

				void ConnexSubtarget::anchor() {}

				ConnexSubtarget::ConnexSubtarget(const Triple &TT, const std::string &CPU,
				const std::string &FS, const TargetMachine &TM)
				: ConnexGenSubtargetInfo(TT, CPU, FS), InstrInfo(), FrameLowering(*this),
				TLInfo(TM, *this) {}

lib/Target/Connex/ConnexTargetMachine.h

				//===-- ConnexTargetMachine.h - Define TargetMachine for Connex --- C++ ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the Connex specific subclass of TargetMachine.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXTARGETMACHINE_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXTARGETMACHINE_H

				#include "ConnexSubtarget.h"
				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/Support/CodeGen.h"
				#include "llvm/Target/TargetMachine.h" // This was before
				#include <memory>

				namespace llvm {
				class ConnexTargetMachine : public LLVMTargetMachine {
				std::unique_ptr<TargetLoweringObjectFile> TLOF;
				ConnexSubtarget Subtarget;

				public:
				ConnexTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
				StringRef FS, const TargetOptions &Options,
				Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,
				CodeGenOpt::Level OL, bool JIT);

				const ConnexSubtarget *getSubtargetImpl() const { return &Subtarget; }
				const ConnexSubtarget *getSubtargetImpl(const Function &) const override {
				return &Subtarget;
				}

				TargetPassConfig *createPassConfig(PassManagerBase &PM) override;

				// Inspired from ARC/ARCTargetMachine.h
				TargetTransformInfo getTargetTransformInfo(const Function &F) override;

				TargetLoweringObjectFile *getObjFileLowering() const override {
				return TLOF.get();
				}
				};
				}

				#endif

lib/Target/Connex/ConnexTargetMachine.cpp

				//===-- ConnexTargetMachine.cpp - Define TargetMachine for Connex ---------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Implements the info about Connex target spec.
				// NOTE: I (partly) documented what the passes PassCreateBundles and
				// PassFinalizeBundles do and my design decisions at
				// http://lists.llvm.org/pipermail/llvm-dev/2017-March/110990.html
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
				#include "llvm/IR/LegacyPassManager.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/CodeGen/TargetPassConfig.h"
				#include "llvm/Support/FormattedStream.h"
				#include "llvm/Support/TargetRegistry.h"
				#include "llvm/Target/TargetOptions.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h" // For MIBundleBuilder
				//
				#include "llvm/CodeGen/MachineRegisterInfo.h"

				#include "llvm/Support/Debug.h"
				#define DEBUG_TYPE "connex-target-config"

				#include "ConnexTargetTransformInfo.h"




				using namespace llvm;

				static cl::opt<bool> DontTreatCopyInstructions("dont-treat-copy-instructions",
				cl::Hidden,
				cl::init(false),
				cl::desc("Don't treat copy instructions"));


				#define CONNEX_RESERVED_REGISTER_DST_FOR_SPLIT CONNEX_RESERVED_REGISTER_02
				// NOT compiling - <<error: ‘Wh3000’ is not a member of ‘llvm::Connex’>>: #define CONNEX_RESERVED_REGISTER_DST_FOR_SPLIT Connex::Wh3000
				// Gives strange results, but sort of helps when reading output.cpp: #define CONNEX_RESERVED_REGISTER_DST_FOR_SPLIT 3000

				extern "C" void LLVMInitializeConnexTarget() {
				// Register the target - Force static initialization.
				RegisterTargetMachine<ConnexTargetMachine> Z(TheConnexTarget);
				}

				static StringRef computeDataLayout(const Triple &TT) {
				/*
				See http://llvm.org/docs/LangRef.html#data-layout for all details regarding layout declaration.
				- e
				Specifies that the target lays out data in little-endian form.
				- S<size>
				Specifies the natural alignment of the stack in bits.
				Alignment promotion of stack variables is limited to the natural stack alignment to avoid dynamic stack realignment.
				The stack alignment must be a multiple of 8-bits.
				If omitted, the natural stack alignment defaults to “unspecified”, which does not prevent any alignment promotions.
				- p[n]:<size>:<abi>:<pref>
				This specifies the size of a pointer and its <abi> and <pref>erred alignments for address space n. All sizes are in bits. The address space, n, is optional, and if not specified, denotes the default address space 0. The value of n must be in the range [1,2^23).
				- i<size>:<abi>:<pref>
				This specifies the alignment for an integer type of a given bit <size>. The value of <size> must be in the range [1,2^23).
				- n<size1>:<size2>:<size3>...
				This specifies a set of native integer widths for the target CPU in bits.
				- v<size>:<abi>:<pref>
				This specifies the alignment for a vector type of a given bit <size>.

				See also http://llvm.org/docs/WritingAnLLVMBackend.html
				An upper-case “E” in the string indicates a big-endian target data model.
				A lower-case “e” indicates little-endian.
				“p:” is followed by pointer information: size, ABI alignment, and preferred alignment.
				If only two figures follow “p:”, then the first value is pointer size, and the second value is both ABI and preferred alignment.
				Then a letter for numeric type alignment: “i”, “f”, “v”, or “a” (corresponding to integer, floating point, vector, or aggregate).
				“i”, “v”, or “a” are followed by ABI alignment and preferred alignment.
				“f” is followed by three values: the first indicates the size of a long double, then ABI alignment, and then ABI preferred alignment.
				*/

				// We specify here the data layout:
				// - of the CPU (eBPF) - these are actually ABI properties;
				// - only a few alignment properties for the vector types
				// (see the end of the string). Note that we can't
				// specify any other properties for the Connex vector processor.
				// VERY IMPORTANT: The pointer size is 64 (that of the eBPF CPU), because the
				// masked.gather/scatter instructions normally use such pointers in LLVM IR,
				// even if we translate them to writeDataTo/readDataFromConnex() and
				// Connex vector assembly instructions with indirect memory accesses.
				//
				// We really need to specify p:64 (not p:16), otherwise we get an error like:
				// "Do not know how to promote this operator!"
				// (GlobalAddress<i64* @CONNEX_VL> 0")
				// IMPORTANT: the string is the one from the (e)BPF back end,
				// concatenated with the spec for the vector alignment for Connex.
				return "e-m:e-p:64:64-i64:64-n32:64-S128-v128:128:128-v2048:2048:2048";
				}


				static Reloc::Model getEffectiveRelocModel(Optional<Reloc::Model> RM) {
				if (!RM.hasValue())
				return Reloc::PIC_;
				return *RM;
				}


				// Inspired from XCore/XCoreTargetMachine.cpp
				static CodeModel::Model getEffectiveXCoreCodeModel(
				Optional<CodeModel::Model> CM) {
				if (CM) {
				if (CM != CodeModel::Small && CM != CodeModel::Large)
				report_fatal_error("Target only supports CodeModel Small or Large");
				return *CM;
				}
				return CodeModel::Small;
				}


				ConnexTargetMachine::ConnexTargetMachine(const Target &T, const Triple &TT,
				StringRef CPU, StringRef FS,
				const TargetOptions &Options,
				Optional<Reloc::Model> RM,
				Optional<CodeModel::Model> CM,
				CodeGenOpt::Level OL,
				bool JIT)
				: LLVMTargetMachine(T, computeDataLayout(TT), TT, CPU, FS, Options,
				getEffectiveRelocModel(RM),
				getEffectiveCodeModel(CM, CodeModel::Small), OL),
				TLOF(make_unique<TargetLoweringObjectFileELF>()),
				Subtarget(TT, CPU, FS, *this) {
				initAsmInfo();
				}





				namespace {


				/* I made sure that the iterators don't become invalid by using
				another iterator, e.g. I2succ, which stores the next pointer in the
				data structures.

				small-TODO: it might be safer to make one change at a time, moving (maybe also
				erasing) one COPY instr per WHERE block (or even per MBB), then leave
				the MBB::iterator loop and restart it from the beginning until
				NO more changes are performed - this would avoid any (possible) issue with
				iterator invalidation.
				*/
				class PassHandleMisplacedInstr : public MachineFunctionPass {
				public:
				PassHandleMisplacedInstr() : MachineFunctionPass(ID) {}

				StringRef getPassName() const override {
				return "PassHandleMisplacedInstr";
				}

				/* // GMS said he doesn't like having arithmetic or logic instruction between predicate and WHERE* instruction:
				#ifdef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				- this case needs to be implemented carefully - I only sketched it a bit, so
				it isn't tested either
				*/

				void updateUsesOfRegUntilCOPY(MachineBasicBlock::iterator &Ipredicate,
				// We start replacing uses from Ipredicate + 1
				MachineBasicBlock::iterator &I2, // COPY
				MachineBasicBlock::iterator &IE,
				unsigned regCrt,
				unsigned regNew) {
				LLVM_DEBUG(dbgs() << " I2 = " << *I2);

				/* We update all following occurrences of the dest register
				of COPY instr (which was also the dest register of the
				predicate)
				- for both uses and def, until 1st def. */
				MachineBasicBlock::iterator Iupdate;
				Iupdate = Ipredicate;
				Iupdate++;

				for (; Iupdate != I2 && Iupdate != IE; Iupdate++) {
				LLVM_DEBUG(dbgs() << " Iupdate = " << *Iupdate);

				/* IMPORTANT: we go in reverse order to make the def last since we
				break at def. */
				for (int idOpnd = Iupdate->getNumOperands() - 1; idOpnd >= 0; idOpnd--) {
				MachineOperand &IOpnd = Iupdate->getOperand((unsigned)idOpnd);

				if (IOpnd.isReg() && IOpnd.getReg() == regCrt) {
				LLVM_DEBUG(dbgs() << "updateUsesOfRegUntilCOPY(): Updating to "
				"regNew the register of Iupdate. "
				" Iupdate = "
				<< *Iupdate);

				/*
				// This does NOT hold because we can have uses of a COPY instr dest
				// register before the COPY - see the big WHERE block of ADD.f16
				assert( (Iupdate->getOpcode() == Connex::WHEREEQ ||
				Iupdate->getOpcode() == Connex::WHERELT ||
				Iupdate->getOpcode() == Connex::WHERECRY) &&
				"We should NOT be arriving here otherwise.");
				*/

				if (IOpnd.isDef()) {
				// We break
				Iupdate = IE; Iupdate--; // We make it break out of outermost loop
				break;
				}

				IOpnd.setReg(regNew);
				}
				}
				}
				}


				void putCOPYBeforeWhereBlock(MachineBasicBlock &MBB,
				const TargetInstrInfo *TII,
				//MachineBasicBlock::iterator &I,
				MachineInstr *IMI, // The WHERE instruction
				MachineBasicBlock::iterator &I2, // COPY
				MachineBasicBlock::iterator &I2plus1,
				MachineBasicBlock::iterator &IE,
				bool &changedMF,
				int &destRegisterPredicateOfSplitWhere) {
				/* NOTE: I2 is the COPY instruction
				if (I2.getOperand(0) == Ipredicate.getOperand(0))
				for each instruction from Ipredicate to I2 - 1 replace defs and uses of
				I2.getOperand(0) with CONNEX_RESERVED_REGISTER_01
				*/

				/*
				Moving COPY before the WHERE block.

				Normally we move the COPY instructions and put them
				in the same order before the predicate.

				important-Note: If we have 2 COPY with the same dest register,
				the WHERE block will be surely split at least for
				the 2nd COPY. For example, from MatMul-256.f16:

				R(11) = R(23) == R(1);
				NOP;
				);
				EXECUTE_WHERE_EQ(
				R(19) = ISHL(R(21), 10);
				// Assume it's not here: R(19) = R(10) | R(19);
				// Assume it's not here: R(25) = R(1) & R(10);
				R(10) = R(0) | R(0); // COPY
				R(10) = R(26) - R(1);
				R(11) = R(1) << R(11);
				R(10) = R(0) | R(0); // COPY
				R(10) = R(11) & R(20);
				The 2nd COPY forces the WHERE to be split
				- it's actually a different variable.

				Note: although not important, in principle we could
				have non-SPECIALV_H instrs inside WHERE blocks if
				the register is NOT initialized. */
				LLVM_DEBUG(dbgs() << " moving I2 immediately before the "
				"predicate instruction linked to the "
				"WHERE block\n");

				MachineBasicBlock::iterator Ipredicate = IMI;
				LLVM_DEBUG(dbgs() << " IMI = "
				<< *IMI << "\n");
				Ipredicate--;
				LLVM_DEBUG(dbgs() << " Ipredicate = "
				<< *Ipredicate << "\n");

				/*
				if (Ipredicate->getOpcode() != Connex::NOP_BPF)
				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: Warning: "
				"Ipredicate->getOpcode() != Connex::NOP_BPF\n");
				*/
				assert(Ipredicate->getOpcode() == Connex::NOP_BPF
				//|| Ipredicate->getOpcode() == Connex::NOP
				);

				/* Ipredicate is pointing at 2 instructions before the
				WHERE* instruction, normally at the predicate
				instruction.*/
				Ipredicate--;

				LLVM_DEBUG(dbgs() << " Ipredicate = "
				<< *Ipredicate << "\n");

				// IMPORTANT-TODO: check better: check for right (w.r.t. WHERE) predicate instruction before NOP
				assert(Ipredicate->getOpcode() == Connex::EQ_H ||
				Ipredicate->getOpcode() == Connex::LT_H ||
				Ipredicate->getOpcode() == Connex::ULT_H ||
				// This is for the case of using lane gating instructions (DISABLE_CELL, ENABLE_ALL_CELLS)
				Ipredicate->getOpcode() == Connex::EQ_SPECIAL_H ||
				Ipredicate->getOpcode() == Connex::LT_SPECIAL_H ||
				Ipredicate->getOpcode() == Connex::ULT_SPECIAL_H);


				assert(Ipredicate->getOperand(0).isReg() &&
				Ipredicate->getOperand(0).isDef());
				assert(I2->getOperand(0).isReg() &&
				I2->getOperand(0).isDef());


				/*
				// This case can be handled (ONLY) by splitting WHERE block:
				#ifndef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				assert(I2->getOperand(1).getReg() != Ipredicate->getOperand(0).getReg() &&
				"We reached a case that's not treatable - this case needs to be implemented!");
				#endif
				*/

				/* Checking for WAR/anti-dependence between predicate and COPY instruction
				- if so, then changing order (moving COPY before predicate) compromises
				correctness so we make a copy of the respective predicate input. */
				// I2 is the COPY instruction
				assert( I2->getOperand(0).isReg() && I2->getOperand(0).isDef() );
				//
				// Ipredicate is the predicate instruction
				assert( Ipredicate->getOperand(1).isReg() &&
				Ipredicate->getOperand(1).isUse() );
				assert( Ipredicate->getOperand(2).isReg() &&
				Ipredicate->getOperand(2).isUse() );
				//
				bool sameOpnd1 =
				Ipredicate->getOperand(1).getReg() == I2->getOperand(0).getReg();
				bool sameOpnd2 =
				Ipredicate->getOperand(2).getReg() == I2->getOperand(0).getReg();
				//
				if (sameOpnd1 || sameOpnd2) {
				LLVM_DEBUG(dbgs() <<
				"Moving COPY before WHERE predicate breaks WAR/anti-dependence "
				"relation between COPY and predicate. "
				"--> fixing the problem by making copy of predicate input.\n");

				/* TODO???: if Ipredicate has a use of the dest register of EQ????????????
				then add: a) an instr before COPY with
				CONNEX_RESERVED_REGISTER_01 = Rinput_EQ | Rinput_EQ
				*/

				/* We preserve the input register of the predicate instruction since it
				will be overwritten by the moved (before the predicate)
				COPY instruction:
				we make a copy:
				CONNEX_RESERVED_REGISTER_01 = Rdst_COPY | Rdst_COPY
				*/
				#ifndef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				BuildMI(MBB,
				Ipredicate,
				/* We insert this MachineInstr before Ipredicate.
				Also the COPY I2 we move after this, after Ipredicate,
				so I2 will be moved after this new copy */
				IMI->getDebugLoc(),
				TII->get(Connex::ORV_H),
				CONNEX_RESERVED_REGISTER_01).
				addReg(I2->getOperand(0).getReg()).
				/* Note: I2 (COPY) does NOT necessarily have the
				same dest register as Ipredicate. */
				addReg(I2->getOperand(0).getReg());
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif
				#endif
				/* The COPY moved before Ipredicate must remain visible inside the WHERE
				block, so we make the Ipredicate destination a reserved register.
				Since sameOpnd1 || sameOpnd2 holds, it is likely (though I think not
				guaranteed) that Ipredicate->getOperand(0) == I2->getOperand(0);
				if we left it like that, the predicate would shadow the COPY. */
				if (Ipredicate->getOperand(0).getReg() == I2->getOperand(0).getReg())
				Ipredicate->getOperand(0).setReg(CONNEX_RESERVED_REGISTER_01);

				// Note: Ipredicate is the predicate instruction
				/* These checks handle also the case both input operands of Ipredicate
				are the same.
				*/
				if (sameOpnd1)
				Ipredicate->getOperand(1).setReg(CONNEX_RESERVED_REGISTER_01);
				if (sameOpnd2)
				Ipredicate->getOperand(2).setReg(CONNEX_RESERVED_REGISTER_01);

				/* We now normally have to update the uses of modified input of
				Ipredicate for the following instructions between the predicate
				and the place where the COPY was.
				However, the instructions using the input after predicate are
				only the ones in the WHERE block basically.
				*/
				updateUsesOfRegUntilCOPY(Ipredicate,
				I2, // COPY
				IE,
				I2->getOperand(0).getReg(),
				CONNEX_RESERVED_REGISTER_01);
				}
				else // MEGA-TODO: think if OK
				if (Ipredicate->getOperand(0).getReg() == I2->getOperand(0).getReg()) {
				// If we have a WAW (output) dependence
				// Note: Ipredicate is the predicate, I2 is the COPY
				LLVM_DEBUG(dbgs() <<
				" Found that the COPY to be moved "
				"immediately before the predicate of the "
				"WHERE block has the same destination register as the predicate. "
				"This forces us to handle specially "
				"the predicate instr dest register, "
				"since this dest "
				"register is the same as the one of the "
				"COPY (hence, a WAW dependence is broken "
				"and the program would become incorrect "
				"otherwise).\n");

				/* We update the dest register of Ipredicate (predicate)
				due to conflict with I2, which we move before it. */
				/*
				if (destRegisterPredicateOfSplitWhere != -1)
				Ipredicate->getOperand(0).setReg(destRegisterPredicateOfSplitWhere);
				else
				Ipredicate->getOperand(0).setReg(CONNEX_RESERVED_REGISTER_01);
				*/
				Ipredicate->getOperand(0).setReg(CONNEX_RESERVED_REGISTER_02);
				//
				updateUsesOfRegUntilCOPY(Ipredicate,
				I2, // COPY
				IE,
				I2->getOperand(0).getReg(),
				CONNEX_RESERVED_REGISTER_02);
				}

				// We move the COPY instruction before the predicate
				MBB.remove((&(*I2)));
				//MBB.insert(IMI, I2); // It inserts before IMI
				#ifdef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				MBB.insert(Ipredicate, IMI); // It inserts immediately before the WHERE instr
				#else
				MBB.insert(Ipredicate, (&(*I2))); // It inserts before Ipredicate
				#endif
				changedMF = true;

				// We handle the case of more than 1 COPY instr in the WHERE block
				// I2plus1 represents the next instr after the COPY (before move)
				I2 = I2plus1;
				} // END putCOPYBeforeWhereBlock()


				void splitWhereBlock(MachineBasicBlock &MBB,
				const TargetInstrInfo *TII,
				MachineBasicBlock::iterator &I,
				MachineInstr *&IMI,
				MachineBasicBlock::iterator &I2, // COPY instr
				MachineBasicBlock::iterator &IE,
				bool &changedMF,
				int &destRegisterPredicateOfSplitWhere) {
				/* This case handles only the cases we ran so far.
				See MEGA-TODO for limitation of this case. */
				changedMF = true;

				LLVM_DEBUG(dbgs() << " splitWhereBlock(): IMI = "
				<< *IMI);
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): I2 = "
				<< *I2 << "\n");

				/* TODO TODO: handle case
				where we have COPY between 2 instr like ADD and
				ADDC, which is incorrect because the COPY messes
				up the Connex flags. */
				MachineBasicBlock::iterator I2plus1 = I2;
				I2plus1++;
				// I think this does NOT cover all cases but most of them
				assert(I2plus1->getOpcode() != Connex::ADDCV_H &&
				I2plus1->getOpcode() != Connex::SUBCV_H &&
				I2plus1->getOpcode() != Connex::ADDCV_SPECIAL_H &&
				I2plus1->getOpcode() != Connex::SUBCV_SPECIAL_H &&
				"We do NOT handle yet ADDCV/SUBCV instructions immediately after COPY "
				"for this case (and the corresponding ADD/SUB before the COPY)");

				LLVM_DEBUG(dbgs() << " splitting WHERE block in 2 s.t. we put I2 immediately "
				"after new END_WHERE resulting from split.\n");
				// I = beginning of new WHERE block
				//const TargetInstrInfo *TII = MF.getSubtarget<ConnexSubtarget>().getInstrInfo();

				MachineBasicBlock::iterator Ipredicate = IMI;
				// We make Ipredicate point to the predicate of this WHERE
				// block
				Ipredicate--;
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): Ipredicate = "
				<< *Ipredicate << "\n");
				assert(Ipredicate->getOpcode() == Connex::NOP_BPF);
				Ipredicate--;
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): Ipredicate (2 instr before) = "
				<< *Ipredicate << "\n");

				unsigned regDest = CONNEX_RESERVED_REGISTER_02;
				int changedPredicateOpnd = -1;

				// We check that Ipredicate, the predicate, has 3 operands
				// (4 operands for the *_SPECIAL_H lane gating variants)
				assert(
				(
				(
				// For the standard case:
				(Ipredicate->getOpcode() == Connex::EQ_H \|\|
				Ipredicate->getOpcode() == Connex::LT_H \|\|
				Ipredicate->getOpcode() == Connex::ULT_H
				) &&
				Ipredicate->getNumOperands() == 3
				)
				||
				(
				// For disabled lane gating regions
				(
				Ipredicate->getOpcode() == Connex::EQ_SPECIAL_H \|\|
				Ipredicate->getOpcode() == Connex::LT_SPECIAL_H \|\|
				Ipredicate->getOpcode() == Connex::ULT_SPECIAL_H
				) &&
				Ipredicate->getNumOperands() == 4
				)
				)
				&&
				Ipredicate->getOperand(0).isReg() &&
				Ipredicate->getOperand(0).isDef() &&
				Ipredicate->getOperand(1).isReg() &&
				Ipredicate->getOperand(1).isUse() &&
				Ipredicate->getOperand(2).isReg() &&
				Ipredicate->getOperand(2).isUse()
				);

				unsigned predicateInstrOpnd[2];
				predicateInstrOpnd[0] = Ipredicate->getOperand(1).getReg();
				predicateInstrOpnd[1] = Ipredicate->getOperand(2).getReg();

				destRegisterPredicateOfSplitWhere = Ipredicate->getOperand(0).getReg();
				LLVM_DEBUG(dbgs()
				<< "PassHandleMisplacedInstr: destRegisterPredicateOfSplitWhere = "
				<< destRegisterPredicateOfSplitWhere
				<< "\n");

				/*
				assert( (predicateInstrOpnd[0] != CONNEX_RESERVED_REGISTER_02) &&
				(predicateInstrOpnd[1] != CONNEX_RESERVED_REGISTER_02) &&
				// MEGA-MEGA-TODO: implement this - it happens for ADD/MUL.f16
				"We currently can't handle these cases because we have only 1 reserved register.");
				*/
				unsigned predicateInstrOpcode = Ipredicate->getOpcode();
				unsigned predicateInstrOpndAux[2];

				/* We look if predicateInstrOpnd[*] is updated/redefined
				either in the predicate instruction or in the
				instructions of the
				associated WHERE block before the COPY instr.
				- i.e., if predicateInstrOpnd[1] changes then
				use it as predicateInstrOpnd[0].
				If NO change happens we do NOT need to save the
				value of predicateInstrOpnd[*], i.e., to create
				ORV_H below.

				We check this from Ipredicate(+1) (next instr after predicate) to I2(-1)
				(COPY instr, exclusive).
				We check if any of the operands of the predicate change.
				NOTE: assert (if both change - we don't want to waste by reserving 2
				Connex registers - maybe we can change the Connex ASM code by hand
				to avoid this).
				*/
				/*
				if (Ipredicate->getOperand(0).getReg() ==
				Ipredicate->getOperand(1).getReg()) {
				// We changed the 1st input operand of the predicate
				changedPredicateOpnd = 0;
				}
				else
				if (Ipredicate->getOperand(0).getReg() ==
				Ipredicate->getOperand(2).getReg()) {
				// We changed the 2nd input operand of the predicate
				changedPredicateOpnd = 1;
				}
				*/

				MachineBasicBlock::iterator Iaux = Ipredicate;
				//Iaux++;
				MachineBasicBlock::iterator IauxEnd = I2; // I2 is COPY

				#define TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS
				#ifdef TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS
				IauxEnd++;
				#endif
				//IauxEnd--;
				/* IMPORTANT: for the NEW predicate we don't care what we use for the
				destination register.

				We now check for the NEW predicate we create for the split if its input
				operands are updated between the
				original_predicate..COPY_instr */
				for (; Iaux != IauxEnd && Iaux != IE; Iaux++) {
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): Iaux = "
				<< *Iaux << "\n");
				if (Iaux->getNumOperands() >= 1 && Iaux->getOperand(0).isReg() &&
				Iaux->getOperand(0).isDef()) {
				if (Iaux->getOperand(0).getReg() == predicateInstrOpnd[0]) {
				assert((changedPredicateOpnd == -1 || changedPredicateOpnd == 0) &&
				// MEGA-TODO: handle this assert violation case
				"It seems both input operands of the "
				"predicate get updated so we would need to "
				"reserve 2 Connex registers to handle well "
				"this case.");
				// We find that we subsequently change the 1st input operand of the predicate
				changedPredicateOpnd = 0;
				}
				else
				if (Iaux->getOperand(0).getReg() == predicateInstrOpnd[1]) {
				/* We find that we subsequently change
				the 2nd input operand of the predicate */
				assert((changedPredicateOpnd == -1 || changedPredicateOpnd == 1) &&
				// MEGA-TODO: handle this assert violation case
				"It seems both input operands of the "
				"predicate get updated so we would need "
				"to reserve 2 Connex registers to handle "
				"well this case.");
				changedPredicateOpnd = 1;
				}
				}
				}

				LLVM_DEBUG(dbgs() << " changedPredicateOpnd = "
				<< changedPredicateOpnd
				<< " (for the input operands of the predicate)\n");

				if (changedPredicateOpnd == -1) {
				//regDest = predicateInstrOpnd[0];
				predicateInstrOpndAux[0] = predicateInstrOpnd[0];
				predicateInstrOpndAux[1] = predicateInstrOpnd[1];
				}
				else {
				/* Put a copy of the changed input register of the predicate instruction
				before Ipredicate, the initial predicate of this WHERE block. */
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				if (regDest != predicateInstrOpnd[changedPredicateOpnd]) {
				BuildMI(MBB,
				Ipredicate,
				IMI->getDebugLoc(),
				TII->get(Connex::ORV_H),
				regDest). // The reserved register, CONNEX_RESERVED_REGISTER_02
				addReg(predicateInstrOpnd[changedPredicateOpnd]).
				addReg(predicateInstrOpnd[changedPredicateOpnd]);
				}
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif

				/*
				predicateInstrOpndAux[0] = regDest; // Reserved register
				predicateInstrOpndAux[1] = predicateInstrOpnd[1 - changedPredicateOpnd];
				*/
				predicateInstrOpndAux[changedPredicateOpnd] = CONNEX_RESERVED_REGISTER_02; // regDest
				predicateInstrOpndAux[1 - changedPredicateOpnd] =
				predicateInstrOpnd[1 - changedPredicateOpnd];
				}

				LLVM_DEBUG(dbgs() << " predicateInstrOpndAux[0] = "
				<< predicateInstrOpndAux[0]
				<< "\n");
				LLVM_DEBUG(dbgs() << " predicateInstrOpndAux[1] = "
				<< predicateInstrOpndAux[1]
				<< "\n");

				MachineBasicBlock::iterator I2succ = I2;
				I2succ++;
				BuildMI(MBB,
				I2, // Immediately before the COPY instr
				IMI->getDebugLoc(),
				TII->get(Connex::END_WHERE)
				//, I2->getOperand(0).getReg()
				);
				LLVM_DEBUG(dbgs() << " Finished creating the END_WHERE\n");

				#ifndef TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS
				/*
				// Ipredicate is predicate
				// Unnecessary check:
				assert(Ipredicate->getOperand(0).getReg() !=
				I2->getOperand(0).getReg());
				*/
				/*
				This check is actually VAGUELY different from the one above because
				the one above inserts a register save (copy) instruction before the original WHERE,
				while this new one after the new END_WHERE resulting from the split.
				VERY IMPORTANT Note: the new predicate WHERE can have the result stored in RESERVED_REGISTER.
				* We now check for conflicts between:
				- destination register operand of COPY and
				- input registers of predicate instruction.
				*
				* Note: I2 is the COPY instruction that triggered the split of WHERE block.
				*
				* Addressing the case, where after the split of WHERE* block we have something
				* like this immediately after the 1st new WHERE* block, before the 2nd
				* WHERE* block, where the repeated predicate instruction (repeated by us)
				* happens to use the register defined in the COPY instruction, which makes
				* the computation incorrect:
				* END_WHERE;
				* R(26) = R(10) | R(10); // This COPY instruction is the reason of the split
				* R(30) = R(26) < R(3);
				* NOP
				* WHERE*
				*
				* Note: R(30) (CONNEX_RESERVED_REGISTER_01) is a reserved register.
				*
				* To correct the problem in this example we have to copy the value of R(26)
				* in R(30):
				* END_WHERE;
				* R(30) = R(26) | R(26);
				* R(26) = R(10) | R(10); // This COPY instruction is the reason of the split
				* R(30) = R(30) < R(3);
				* NOP
				* WHERE*
				*/
				int changeInputPredicateOperandsDueToCOPY = 0;
				if (predicateInstrOpnd[0] == I2->getOperand(0).getReg()) {
				changeInputPredicateOperandsDueToCOPY |= 1;
				}
				if (predicateInstrOpnd[1] == I2->getOperand(0).getReg()) {
				changeInputPredicateOperandsDueToCOPY |= 2;
				}
				//
				assert(changeInputPredicateOperandsDueToCOPY != 3 &&
				// important-TODO: handle this assert violation case
				"We shouldn't have such a case - doesn't really make sense for a "
				"conditional to have both operands equal.");

				LLVM_DEBUG(dbgs() << " changeInputPredicateOperandsDueToCOPY = "
				<< changeInputPredicateOperandsDueToCOPY << "\n");
				/*
				assert(! (changedPredicateOpnd != -1 && changeInputPredicateOperandsDueToCOPY != 0) &&
				// TODO: if not merging the 2 cases together, handle this assert violation case,
				"We currently can't handle both cases simultaneously.");
				*/
				//
				if (changeInputPredicateOperandsDueToCOPY != 0) {
				LLVM_DEBUG(dbgs() << " PassHandleMisplacedInstr::runOnMachineFunction(): correcting "
				"the conflicting register (due to the COPY) in the "
				"predicate instruction\n");
				MachineBasicBlock::iterator Icorrect = I2succ;
				//Icorrect++;
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				BuildMI(MBB,
				Icorrect, // We insert this MachineInstr after the new END_WHERE, before the COPY instr
				IMI->getDebugLoc(),
				TII->get(Connex::ORV_H),
				CONNEX_RESERVED_REGISTER_02).
				addReg(I2->getOperand(0).getReg()).
				addReg(I2->getOperand(0).getReg());
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif


				/* Note: Ipredicate is the predicate for the 1st (part) WHERE* block.
				//Ipredicate->getOperand(1).setReg(CONNEX_RESERVED_REGISTER_02); */

				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: after WHERE block processed: MBB = ";
				MBB.dump());
				// We check that we don't mess up the program - TODO we should also check that the iterators are not messed up
				/*
				for (MachineBasicBlock::iterator Inew = MBB.begin(),
				IEnew = MBB.end(); Inew != IEnew; ++Inew) {
				//MachineInstr *IMI = I;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): Inew = "
				<< *Inew << "\n");
				}
				*/
				}
				#endif // END ifndef TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS


				// I2succ++;
				LLVM_DEBUG(dbgs() << " moving I2 immediately after END_WHERE of "
				"split WHERE block\n");

				/* VERY IMPORTANT: We create another predicate, a NOP and a new WHERE*
				instruction, identical to the (previous) ones associated with the
				WHERE block, EXCEPT that the predicate's destination register is a
				reserved register (CONNEX_RESERVED_REGISTER_03 when NEW2018_08_11 is
				defined) - this is safe. */
				BuildMI(MBB,
				I2succ, // We insert new instr immediately before I2succ
				IMI->getDebugLoc(),
				TII->get(predicateInstrOpcode),
				#define NEW2018_08_11
				#ifdef NEW2018_08_11
				CONNEX_RESERVED_REGISTER_03
				#else
				/* destRegisterPredicateOfSplitWhere is made -1 only after
				iterating over END_WHERE, below
				*/
				destRegisterPredicateOfSplitWhere != -1 ?
				destRegisterPredicateOfSplitWhere :
				regDest // It is CONNEX_RESERVED_REGISTER_02
				#endif
				).
				/* We now change the conflicting register in the predicate
				* instruction.
				*/
				#ifdef TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS
				addReg((changedPredicateOpnd == 0) ?
				#else
				addReg(((changeInputPredicateOperandsDueToCOPY & 1) == 1) ?
				#endif
				(unsigned)CONNEX_RESERVED_REGISTER_02 :
				predicateInstrOpndAux[0]). //predicateInstrOpnd1).
				#ifdef TREAT_ONLY_ONCE_CHANGE_PREDICATE_OPERANDS
				addReg((changedPredicateOpnd == 1) ?
				#else
				addReg(((changeInputPredicateOperandsDueToCOPY & 2) == 2) ?
				#endif
				(unsigned)CONNEX_RESERVED_REGISTER_02 :
				predicateInstrOpndAux[1]);

				BuildMI(MBB,
				I2succ,
				IMI->getDebugLoc(),
				TII->get(Connex::NOP_BPF));
				// TODO: maybe add an addImm(0)?, although it works without

				// We add the same WHERE instr as the one for this block
				/* This gives the following error:
				<<Assertion `!N->getParent() && "machine instruction already in a basic block"' failed.>>
				MBB.insert(I2succ, IMI); // before I2succ
				*/
				LLVM_DEBUG(dbgs() << " splitWhereBlock(): IMI (for split) = "
				<< *IMI << "\n");
				/* From http://llvm.org/doxygen/MachineInstrBuilder_8h_source.html#l00312:
				"inserts the newly-built instruction before the given position". */
				/*
				IMI = I2succ;
				LLVM_DEBUG(dbgs() << " IMI = I2succ = "
				<< *IMI << "\n");
				IMI--; // IMPORTANT: This makes IMI NULL since IMI is a MachineInstr - see /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/DawnCC/35l_MatMul_f16/SIZE_128/L/STDerr_llc_01_old17
				*/
				// See good comments on iterator invalidation: http://llvm.1065342.n5.nabble.com/deleting-or-replacing-a-MachineInst-td77723.html
				I = BuildMI(MBB,
				I2succ, // We insert new instr immediately before I2succ
				IMI->getDebugLoc(),
				TII->get(IMI->getOpcode()),
				regDest
				);

				// TODO TODO TODO TODO: understand if it generates (due to iterator invalidation??) another END_WHERE - see /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/DawnCC/25k_map/MUL_i32/!!/5_GOOD/output_old06.cpp

				// NOTE: I is the new WHERE* instruction just created
				// We update I2 to check for more COPY instrs after the new created WHERE
				I2 = I; I2++;

				// We update IMI since we insert COPY before predicate of WHERE using IMI
				IMI = (&(*I));

				//MachineBasicBlock::iterator Iaux10 = I2succ; Iaux10--;
				LLVM_DEBUG(dbgs() << " I2succ = "
				<< *I2succ << "\n");
				LLVM_DEBUG(dbgs() << " IMI = "
				<< *IMI << "\n");
				LLVM_DEBUG(dbgs() << " I = "
				<< *I << "\n");
				LLVM_DEBUG(dbgs() << " I2 = "
				<< *I2 << "\n");

				//break;
				//assert();
				LLVM_DEBUG(dbgs() << " To check: IMI = "
				<< *IMI << "\n");

				LLVM_DEBUG(dbgs()
				<< "splitWhereBlock(): after splitting WHERE block in 2: MBB = ";
				MBB.dump());
				} // END splitWhereBlock()


				/// \brief Loop over all of the basic blocks
				bool runOnMachineFunction(MachineFunction &MF) override {
				bool changedMF = false;

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineFunction.html
				LLVM_DEBUG(dbgs() << "Entered PassHandleMisplacedInstr::runOnMachineFunction(MF = "
				//; MF.dump();
				<< MF.getName()
				//dbgs()
				<< ")\n");
				//bool Changed = false;

				// Process all basic blocks.
				for (auto &MBB : MF) {
				//int anotherReservedRegister = -1;
				int destRegisterPredicateOfSplitWhere = -1;

				// For the current MBB:
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				LLVM_DEBUG(dbgs()
				<< "PassHandleMisplacedInstr::runOnMachineFunction(): a new MBB = "
				<< MBB
				<< "\n");

				const TargetInstrInfo *TII = MF.getSubtarget<ConnexSubtarget>().getInstrInfo();

				arsenm: This needs to be broken down into smaller functions
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MachineBasicBlock.html
				LLVM_DEBUG(dbgs()
				<< "PassHandleMisplacedInstr::runOnMachineFunction(): again MBB = "
				<< MBB
				<< "\n");

				for (MachineBasicBlock::iterator I = MBB.begin(),
				IE = MBB.end(); I != IE; ++I) {
				MachineInstr *IMI = (&(*I));
				/*
				if (IMI == &MI)
				I++;
				// predMI contains normally instruction VLOAD_H_SYM_IMM
				break;
				*/
				// predMI = I;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I = "
				<< *I << "\n");
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): DontTreatCopyInstructions = "
				<< DontTreatCopyInstructions << "\n");

				if (DontTreatCopyInstructions == false) {
				/*IMPORTANT: we move the COPY instructions outside
				the WHERE block, just like the ARM/Thumb2ITBlockPass.cpp
				does (the ARM pass is also registered in addPreSched2()).
				Note that moving COPY instrs before WHERE (ARM IT) blocks
				(as it seems ARM surprisingly is doing, since
				MBB::insert(iterator, MI) does "Insert MI into the
				instruction list before I, possibly inside a bundle.")
				can change semantics in most cases.

				IMPORTANT: This is where we remove any COPY instructions
				generated by the TwoAddressInstructionPass and not erased
				by RegisterCoalescer (transformed
				into ORV_H) instructions inside WHERE* blocks.
				This is to handle cases like sequences of manually
				selected instructions in ConnexISelDAGToDAG for MULi32, DIVi16, etc.
				*/
				if (IMI->getOpcode() == Connex::WHEREEQ ||
				IMI->getOpcode() == Connex::WHERELT ||
				IMI->getOpcode() == Connex::WHERECRY) {
				LLVM_DEBUG(dbgs() << "runOnMachineFunction(): found WHERE* block\n");

				/* Removing useless COPY immediately before WHERE* block
				* (between NOP and WHERE*, where it should normally be put).
				* It is useless - we eye-balled seriously on a few
				* programs, most notably SSD.f16 on Jul 29-30 2018
				* (I guess - MEGA-TODO: check if so) always because it is
				* generated by the WHERE* instruction and,
				* therefore, it's NOT required.
				* important-TODO: we should take care of COPY
				* instructions being moved by the post-RA scheduler. */
				MachineBasicBlock::iterator ItmpToErase = IMI;
				ItmpToErase--;
				if (ItmpToErase->getOpcode() != Connex::NOP_BPF
				//|| ItmpToErase->getOpcode() == Connex::NOP
				) {
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				if (ItmpToErase->getOpcode() == Connex::ORV_H) {
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif
				MachineInstr *Iremove = (&(*ItmpToErase));
				//ItmpToErase--;

				/* We assert this COPY is related to the WHERE*
				instruction - if NOT, then the COPY was moved
				probably by the post-RA scheduler here.
				*/
				assert(Iremove->getOperand(0).isReg() &&
				Iremove->getOperand(0).isDef() &&
				Iremove->getOperand(0).getReg() == IMI->getOperand(0).getReg()
				);


				/* Checking that it is really safe to remove this COPY
				since it is not used by any instruction after it.
				*/
				MachineBasicBlock::iterator Icheck = I;
				// We jump over the WHERE* instruction found
				Icheck++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): Icheck = "
				<< *Icheck << "\n");
				// Iterating over all remaining instructions of the BB
				for (; Icheck != IE; Icheck++) {
				LLVM_DEBUG(dbgs() << " Icheck = " << *Icheck);
				if (Icheck->getNumOperands() > 0 &&
				Icheck->getOperand(0).isReg() &&
				Icheck->getOperand(0).getReg() ==
				Iremove->getOperand(0).getReg()) {
				// It normally has to be a def - if it's a use it's bad
				assert(Icheck->getOperand(0).isDef() &&
				"PassHandleMisplacedInstr: Found a 'useless' COPY "
				"that is not useless since it is used after... - "
				"this is not good --> change ConnexTargetMachine.cpp");
				break;
				}
				}

				LLVM_DEBUG(dbgs() << " Removing useless COPY immediately "
				"before the WHERE block.\n");

				MBB.remove(Iremove);
				}
				}


				MachineBasicBlock::iterator I2 = I; // + 1;
				// We jump over the WHERE* instruction found
				I2++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2 = "
				<< *I2 << "\n");

				//continue;

				// Iterating over all remaining instructions of the BB
				for (; I2 != IE; /* I2++ */) {
				LLVM_DEBUG(dbgs() << " I2 = " << *I2);

				// TO_ADAPT: currently copyPhysReg() is implemented with ORV_H
				/* IMPORTANT: NORMALLY, inside WHERE blocks generated
				with Opincaa lib's Kernel::genLLVMISelManualCode(),
				we are guaranteed to have only ORV_SPECIAL_H Connex
				instructions, so meeting an ORV_H is only when a COPY
				was generated by the TwoAddressInstructionPass. */
				if (
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				I2->getOpcode() == Connex::ORV_H
				#else
				#error "This case is NOT implemented. Implement it!"
				#endif
				|| I2->getOpcode() == Connex::LD_FILL_H) {
				// MEGA-TODO: || I2->getOpcode() == Connex::ST_FILL_H
				/* The ORV_H instruction implemented in copyPhysReg()
				has both input operands equal.
				NOTE: the destination register of any instruction
				I is I->getOperand(0).
				*/

				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				if (I2->getOpcode() == Connex::ORV_H)
				assert(I2->getOperand(1).getReg() ==
				I2->getOperand(2).getReg() &&
				"I2 is an ORV_H with different input operands. "
				"Maybe too paranoid check: We do not "
				"recommend to have emulation Opincaa kernels "
				"generated by Kernel::genLLVMISelManualCode() "
				"with ORV_H inside WHERE blocks (if these "
				"instructions come from there). But you "
				"can comment this assert and issue a simple "
				"warning.");
				/*
				if (I2->getOperand(1).getReg() !=
				I2->getOperand(2).getReg())
				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: Warning: "
				"I2->getOperand(1).getReg() != "
				"I2->getOperand(2).getReg()\n\n");
				*/
				#endif


				/* From http://llvm.org/doxygen/MachineBasicBlock_8h_source.html:
				MBB::insert(iterator, MI)
				"Insert MI into the instruction list before I, possibly inside a bundle.
				*/
				LLVM_DEBUG(dbgs() << " found COPY/LD_FILL at I2 = " << *I2
				<< " --> moving it out of the WHERE block to "
				"preserve correct program semantics.\n");

				/* We should move I2 before or after the WHERE block,
				* or split the WHERE block in 2. */
				/* The algo is (a sketch that MIGHT NOT reflect
				totally the implementation):
				NOTE: this is the case that allows having COPY between
				predicate and WHERE instr.
				If the COPY doesn't use (doesn't have as source)
				a register defined in the WHERE block
				BEFORE the COPY (NO RAW/flow dependence relation to be broken)
				and also the COPY doesn't define a register
				that is used by an instruction before
				(NO WAR/anti-dependence relation to be broken):
				We move the COPY exactly before the
				WHERE instruction starting the block
				Else
				If the COPY doesn't use (doesn't have as source)
				a register defined in the WHERE block,
				after the COPY (NO WAR dep broken)
				and also the COPY doesn't define a register
				used by an instruction after it (NO RAW dep broken):
				We move the COPY exactly after the END_WHERE
				instruction ending the block
				Else
				Moving the COPY immediately before/after
				the WHERE block is UNsafe and
				would change program semantics.
				The solution is to split the WHERE block in
				two and for the 2nd WHERE block to copy the
				predicate (together with a NOP) just
				before it.
				*/

				#ifdef ALLOW_COPY_BETWEEN_PREDICATE_AND_WHERE_INSTRUCTIONS
				MachineBasicBlock::iterator I3 = IMI; // IMI is WHERE instr
				LLVM_DEBUG(dbgs() << " I3 = "
				<< *I3 << "\n");

				I3--;
				LLVM_DEBUG(dbgs() << " I3 (after 1 -)= "
				<< *I3 << "\n");

				assert(I3->getOpcode() == Connex::NOP ||
				I3->getOpcode() == Connex::NOP_BPF);

				I3--;
				LLVM_DEBUG(dbgs() << " I3 (after 2 -)= "
				<< *I3 << "\n");
				assert(I3->getOpcode() == Connex::EQ_H ||
				I3->getOpcode() == Connex::LT_H ||
				I3->getOpcode() == Connex::ULT_H);
				#else
				MachineBasicBlock::iterator I3 = IMI; // IMI is WHERE instr
				I3++;
				#endif

				#define SAFE_SINCE_NO_CONSTRAINT 0
				#define NOT_SAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK 1
				#define NOT_SAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK 2
				#define SAFE_TO_PUT_COPY_IN_SPLIT_WHERE_BLOCK 3
				int whatToDo = SAFE_SINCE_NO_CONSTRAINT;

				//bool I2afterIsInsideWhereBlock = true;
				bool I3IsBeforeI2 = true;

				// Remember: I2 points to the COPY instruction
				for (; I3 != IE; I3++) {
				if (I3->getOpcode() == Connex::END_WHERE) {
				break;
				}

				LLVM_DEBUG(dbgs() << " I3 = "
				<< *I3);

				if (I3 == I2) {
				I3IsBeforeI2 = false;
				continue;
				}
				LLVM_DEBUG(dbgs() << " I3IsBeforeI2 = "
				<< I3IsBeforeI2 << "\n");

				// We look at all operands of instruction I3
				for (unsigned idOpnd = 0; idOpnd < I3->getNumOperands();
				idOpnd++) {
				MachineOperand &I3Opnd = I3->getOperand(idOpnd);

				LLVM_DEBUG(dbgs() << " I3Opnd (index = " << idOpnd
				<< ") = " << I3Opnd << "\n");

				if (I3Opnd.isReg() && I3Opnd.isUse()) {
				// Remember: I2 points to the COPY instruction
				if (I3Opnd.getReg() == I2->getOperand(0).getReg()) {
				if (I3IsBeforeI2) {
				// WAR (anti-)dependence w.r.t. COPY (I2), which writes
				// I3 uses the dst-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"uses the dst-register of I2 "
				"--> moving I2 before the "
				"WHERE block is NOT safe\n");

				whatToDo |= NOT_SAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				/*
				LLVM_DEBUG(dbgs() << " changing I2afterOpnd's reg to = "
				<< I2->getOperand(0).getReg() << "\n");
				I2afterOpnd.setReg(I2->getOperand(1).getReg());
				*/
				}
				else { // NOT I3IsBeforeI2
				// RAW dependence w.r.t. COPY (I2), which writes
				// I3 uses the dst-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"uses the dst-register of I2 "
				"--> moving I2 after the "
				"WHERE block is NOT safe\n");

				whatToDo |= NOT_SAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				}
				else
				/* Although we are safe on the else branch,
				we put this code here for "completeness".
				*/
				if (
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				I2->getOpcode() == Connex::ORV_H &&
				#endif
				I3Opnd.getReg() == I2->getOperand(1).getReg()) {
				// RAR dependence - NONE actually :)
				if (I3IsBeforeI2) {
				// I3 uses the src-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"uses the src-register of I2 "
				"--> everything is safe\n");

				//whatToDo |= NOT_SAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				}
				else {
				// I3 uses the src-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"uses the src-register of I2 "
				"--> everything is safe\n");

				//whatToDo |= NOT_SAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				}
				} // END I3Opnd.isUse()
				else
				if (I3Opnd.isReg() && I3Opnd.isDef()) {
				// Remember: I2 points to the COPY instruction
				if (I3Opnd.getReg() == I2->getOperand(0).getReg()) {
				if (I3IsBeforeI2) {
				// WAW dependence w.r.t. COPY (I2), which writes
				// I3 defs the dst-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"defs the dst-register of I2 --> "
				"moving I2 before the "
				"WHERE block is NOT safe\n");

				whatToDo |= NOT_SAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				}
				else {
				// WAW dependence w.r.t. COPY (I2), which writes
				// I3 defs the dst-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"defs the dst-register of I2 --> "
				"moving I2 after the "
				"WHERE block is NOT safe\n");

				whatToDo |= NOT_SAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				}
				else
				if (
				#ifdef COPY_REGISTER_IMPLEMENTED_WITH_ORV_H
				I2->getOpcode() == Connex::ORV_H &&
				#endif
				I3Opnd.getReg() == I2->getOperand(1).getReg()) {
				if (I3IsBeforeI2) {
				// RAW dependence w.r.t. I3, which writes
				// I3 defs the src-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is before I2, "
				"defs the src-register of I2 --> "
				"moving I2 before the "
				"WHERE block is NOT safe\n");

				whatToDo |= NOT_SAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK;
				}
				else {
				// WAR (anti-)dependence w.r.t. I3, which writes
				// I3 defs the src-register of I2 (the COPY instr)
				LLVM_DEBUG(dbgs() << " I3, which is after I2, "
				"defs the src-register of I2 --> "
				"moving I2 after the "
				"WHERE block is NOT safe\n");

				whatToDo |= NOT_SAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK;
				}
				}
				} // END I3Opnd.isDef()
				} // END for loop idOpnd
				} // END for loop with ind-var I3

				/*
				* Note:
				* I = main loop iterating over all instr of the MBB
				* IMI = I;
				* I2
				* if IMI == WHERE*
				* I2 = I + 1;
				* for (;; I2++)
				* if I2 == ORV_H (or whatever is used to implement the COPY primitive)
				* for (I3 = IMI + 1; ; I3++) // used to compute whatToDo;
				if I3 == END_WHERE
				break;
				compute whatToDo;
				*/
				MachineBasicBlock::iterator I2plus1 = I2;
				/* We need to increment it, otherwise it looks that
				* I2 and I2plus1 are identical after remove()
				* and insert()
				*/
				I2plus1++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2plus1 = "
				<< *I2plus1 << "\n");
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2 (before moving I2) = "
				<< *I2 << "\n");
				LLVM_DEBUG(dbgs() << " whatToDo = " << whatToDo << "\n");

				if (//whatToDo == SAFE_SINCE_NO_CONSTRAINT ||
				whatToDo == NOT_SAFE_TO_PUT_COPY_AFTER_WHERE_BLOCK) {
				// Moving COPY before the WHERE block.
				putCOPYBeforeWhereBlock(MBB, TII, IMI, I2,
				I2plus1, IE, changedMF,
				destRegisterPredicateOfSplitWhere);
				// break;

				} // END moving I2 immediately before the logical instruction linked to the WHERE block
				else
				if (
				// We treat here SAFE_SINCE_NO_CONSTRAINT because moving after WHERE block doesn't add any auxiliary instruction
				whatToDo == SAFE_SINCE_NO_CONSTRAINT ||
				whatToDo == NOT_SAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK) {
				// TODO TODO: we should put multiple COPY instructions from this WHERE block in the SAME order after END_WHERE. See if such cases happen.
				LLVM_DEBUG(dbgs() << " moving I2 immediately after WHERE block\n");
				assert(I3 != IE);

				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2 = "
				<< *I2 << "\n");

				// I3 is pointing to END_WHERE (see code above)
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I3 = "
				<< *I3 << "\n");

				assert( (I3->getOpcode() == Connex::END_WHERE) &&
				"I3 should point to END_WHERE (see code above).");
				/*
				assert( (I3->getOpcode() == Connex::WHEREEQ ||
				I3->getOpcode() == Connex::WHERELT ||
				I3->getOpcode() == Connex::WHERECRY) &&
				"We should NOT be arriving here otherwise.");
				*/

				I3++; // Jump over END_WHERE (normally)
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I3 (after I3++) = "
				<< *I3 << "\n");

				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): Preparing to remove I2 = "
				<< *I2
				<< " and moving it before I3 = "
				<< *I3 << "\n");
				MBB.remove((&(*I2)));
				MBB.insert(I3, (&(*I2))); // It inserts before I3

				/* This is NOT good for case where we have 2+ COPY
				instrs in the WHERE block: I = I3; */
				//I2++;
				//I = I2;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2 (after moving I2) = "
				<< *I2 << "\n");
				// I2plus1++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2plus1 = "
				<< *I2plus1 << "\n");

				/* Here we handle the case of more than 1 COPY
				instr in the WHERE block (I2plus1 represents the next
				instr after the COPY (before move)) */
				I2 = I2plus1;

				MachineBasicBlock::iterator I2plus2 = I2plus1;
				I2plus2++;
				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): I2plus2 = "
				<< *I2plus2 << "\n");

				changedMF = true;
				/* This is NOT good for case where we have 2+ COPY
				instrs in the WHERE block: break;
				We keep searching with I2 for loop in this WHERE block
				for more COPY instrs. */
				} // END if (whatToDo == NOT_SAFE_TO_PUT_COPY_BEFORE_WHERE_BLOCK)
				else
				if (whatToDo == SAFE_TO_PUT_COPY_IN_SPLIT_WHERE_BLOCK) {
				splitWhereBlock(MBB, TII, I, IMI, I2, IE,
				changedMF,
				destRegisterPredicateOfSplitWhere);
				LLVM_DEBUG(dbgs() << " After calling splitWhereBlock(): IMI = "
				<< *IMI << "\n");
				} // END if SPLIT WHERE block
				else
				// IMPORTANT: we increment here the iterator over instruction in WHERE block
				I2++;
				} // END if (I2->getOpcode() == Connex::ORV_H)
				else {
				// IMPORTANT: we increment here the iterator over instruction in WHERE block
				I2++;
				// else
				}

				// Note that the END_WHERE takes input node and has a value output
				if (I2->getOpcode() == Connex::END_WHERE) {
				LLVM_DEBUG(dbgs() << " found END_WHERE --> breaking I2 loop\n");
				I2++;
				I = I2;

				// MEGA-TODO: think if OK here
				destRegisterPredicateOfSplitWhere = -1;

				LLVM_DEBUG(dbgs() << " Making destRegisterPredicateOfSplitWhere = -1\n");

				break;
				}

				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: at end of for loop I2, I2 = "
				<< *I2
				<< " and IMI = "
				<< *IMI);
				} // END for loop with ind-var I2

				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: after WHERE block processed: MBB = ";
				MBB.dump());
				LLVM_DEBUG(dbgs() << "PassHandleMisplacedInstr: IMI = "
				<< *IMI);
				} // END if WHERE*
				} // END if (DontTreatCopyInstructions == false)
				} // END for (MachineBasicBlock::iterator I

				} // END for (auto &MBB : MF)

				LLVM_DEBUG(dbgs() << " runOnMachineFunction(): changedMF = "
				<< changedMF << "\n");

				return changedMF; // indicates if we changed MF
				} // end runOnMachineFunction(MachineFunction &MF)

				private:
				MachineRegisterInfo *MRI;

				static char ID;
				}; // END class PassHandleMisplacedInstr
				char PassHandleMisplacedInstr::ID = 0;

				} // END namespace



				// We no longer use bundles, since we avoid using the post-RA scheduler
				//#define CREATE_BUNDLES
				#ifdef CREATE_BUNDLES
				#include "ConnexTargetMachine_NotUsed_Important.h"
				#endif

				// Gives error: should have been declared inside ‘llvm’: FunctionPass *llvm::createPreRAPassFinalizeBundles() { return new PreRAPassFinalizeBundles(); }
				namespace llvm {
				#ifdef CREATE_BUNDLES
				FunctionPass *createPassCreateBundles() {
				return new PassCreateBundles();
				}

				FunctionPass *createPassFinalizeBundles() {
				return new PassFinalizeBundles();
				}
				#endif

				FunctionPass *createPassHandleMisplacedInstr() {
				return new PassHandleMisplacedInstr();
				}
				}


				namespace {

				// Connex Code Generator Pass Configuration Options.
				class ConnexPassConfig : public TargetPassConfig {
				public:
				ConnexPassConfig(ConnexTargetMachine *TM, PassManagerBase &PM)
				: TargetPassConfig((LLVMTargetMachine &)(*TM), PM) {}


				ConnexTargetMachine &getConnexTargetMachine() const {
				return getTM<ConnexTargetMachine>();
				}


				//#ifdef CREATE_BUNDLES // IMPORTANT - not executing these methods inside results in error: <<llc: target does not support generation of this file type!>>
				//bool addInstSelector() override;
				// Install an instruction selector pass using
				// the ISelDag to gen Connex code; also register extra passes.

				// VERY IMPORTANT: commenting this method results in error: <<llc: target does not support generation of this file type!>>
				//#ifdef CREATE_BUNDLES
				bool /* ConnexPassConfig:: */ addInstSelector() {
				addPass(createConnexISelDag(getConnexTargetMachine()));

				/* The registered pass is run immediately after the 1st List
				* scheduling, after the ISel pass registered above.
				* The reason it is NOT directly after the ISel pass is that it seems
				* that the 1st scheduling
				* pass is considered to be linked together with ISel.
				*/
				#ifdef CREATE_BUNDLES
				addPass(createPassCreateBundles());
				#endif

				return false;
				}
				//#endif


				/* From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetPassConfig.html
				This method may be implemented by targets that want to run passes immediately before register allocation.
				*/
				void addPreRegAlloc() {
				/*
				// IMPORTANT: As of Mar 2017, implementing this pass with finalizeBundle here
				// gives error at:
				// <<llvm/include/llvm/MC/MCRegisterInfo.h:355: const llvm::MCRegisterDesc& llvm::MCRegisterInfo::operator[](unsigned int) const: Assertion `RegNo < NumRegs && "Attempting to access record for invalid register number!"' failed.>>

				LLVM_DEBUG(dbgs() << "Entered ConnexPassConfig::addPreRegAlloc().\n");

				// Inspired from llvm/lib/Target/X86/X86TargetMachine.cpp and X86OptimizeLEAs.cpp
				if (getOptLevel() != CodeGenOpt::None)
				addPass(createPassFinalizeBundles());
				*/

				/*
				LLVM_DEBUG(dbgs() << "Entered ConnexPassConfig::addPreRegAlloc().\n");
				//addPass(createPassCreateBundles());

				// IMPORTANT: finalizeBundle gives error:
				// <<MCRegisterInfo.h:355: const llvm::MCRegisterDesc&
				// llvm::MCRegisterInfo::operator[](unsigned int)
				// const: Assertion `RegNo < NumRegs && "Attempting to access record
				// for invalid register number!"' failed.>>
				addPass(createPassFinalizeBundles());
				*/
				}


				void addPostRegAlloc() {
				/*
				// It does NOT help for my llc -O1 bug related to <<Using an undefined physical register>>

				LLVM_DEBUG(dbgs() << "Entered ConnexPassConfig::addPostRegAlloc().\n");
				addPass(createPassFinalizeBundles());
				*/
				}


				#ifdef CREATE_BUNDLES
				/* IMPORTANT:
				From http://llvm.org/docs/doxygen/html/classllvm_1_1TargetPassConfig.html
				<<This method may be implemented by targets that want to run passes
				after prolog-epilog insertion and before the second instruction
				scheduling pass.>>
				(This runs after register allocation, before 2nd (post-RA) scheduler) */
				void addPreSched2() {
				LLVM_DEBUG(dbgs() << "Entered ConnexPassConfig::addPreSched2().\n");

				// Inspired from llvm/lib/Target/ARM/ARMTargetMachine.cpp
				//if (getOptLevel() != CodeGenOpt::None)
				addPass(createPassFinalizeBundles());
				}
				#endif
				//#endif // CREATE_BUNDLES


				/*
				From http://llvm.org/doxygen/classllvm_1_1TargetPassConfig.html:
				<<This pass may be implemented by targets that want to run passes
				immediately before machine code is emitted.>>
				*/
				void addPreEmitPass() {
				LLVM_DEBUG(dbgs() << "Entered ConnexPassConfig::addPreEmitPass().\n");

				addPass(createPassHandleMisplacedInstr());

				// Here we add a stand-alone hazard recognizer pass
				addPass(&PostRAHazardRecognizerID);
				}
				};

				} // end namespace

				TargetPassConfig *ConnexTargetMachine::createPassConfig(PassManagerBase &PM) {
				return new ConnexPassConfig(this, PM);
				}

				/*
				*/
				// Inspired from ARCTargetMachine.cpp
				TargetTransformInfo ConnexTargetMachine::getTargetTransformInfo(const Function &F) {
				return TargetTransformInfo(ConnexTTIImpl(this, F));
				}
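
The inline review note on `runOnMachineFunction()` asks for smaller functions. As a rough illustration only, the dependence scan that computes `whatToDo` could be isolated along the following lines. This is a sketch over a hypothetical toy instruction model: `ToyInstr`, `Placement` and `classifyCopy` are illustrative names that appear neither in the patch nor in LLVM, and the model assumes at most one defined register per instruction. It mirrors the flag logic of the I3 loop, where the two "not safe" flags OR together into the split case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model (NOT the LLVM MachineInstr API):
// each instruction defines at most one register and reads some registers.
struct ToyInstr {
  int Def;               // defined register, or -1 if none
  std::vector<int> Uses; // registers read
};

// Same numeric encoding as the whatToDo macros in the patch.
enum Placement : unsigned {
  NoConstraint = 0,
  NotSafeBefore = 1,                        // COPY must not go before WHERE
  NotSafeAfter = 2,                         // COPY must not go after END_WHERE
  MustSplit = NotSafeBefore | NotSafeAfter  // == 3: split the WHERE block
};

// Body holds the instructions strictly between WHERE* and END_WHERE;
// CopyIdx is the position of the COPY inside Body.
unsigned classifyCopy(const std::vector<ToyInstr> &Body, std::size_t CopyIdx) {
  const ToyInstr &Copy = Body[CopyIdx];
  unsigned Result = NoConstraint;

  for (std::size_t i = 0; i < Body.size(); ++i) {
    if (i == CopyIdx)
      continue;
    const ToyInstr &Other = Body[i];
    // A conflict with an earlier instruction blocks hoisting the COPY,
    // a conflict with a later one blocks sinking it.
    const unsigned Flag = (i < CopyIdx) ? NotSafeBefore : NotSafeAfter;

    // Other reads the register written by the COPY.
    for (int U : Other.Uses)
      if (U == Copy.Def)
        Result |= Flag;

    if (Other.Def != -1) {
      // Other writes the COPY's destination register.
      if (Other.Def == Copy.Def)
        Result |= Flag;
      // Other writes one of the COPY's source registers.
      for (int S : Copy.Uses)
        if (Other.Def == S)
          Result |= Flag;
    }
  }
  return Result;
}
```

With this encoding, the dispatch in the patch corresponds to: `NoConstraint` or `NotSafeBefore` moves the COPY after `END_WHERE`, `NotSafeAfter` moves it before the WHERE block, and `MustSplit` calls for splitting the block.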

lib/Target/Connex/ConnexTargetTransformInfo.h

				//===-- ConnexTargetTransformInfo.h - Connex specific TTI ---------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// This file contains a TargetTransformInfo::Concept conforming object specific to the
				/// Connex target machine. It uses the target's detailed information to
				/// provide more precise answers to certain TTI queries, while letting the
				/// target independent and default TTI implementations handle the rest.
				///
				//===----------------------------------------------------------------------===//

				// Inspired from XCore/XCoreTargetTransformInfo.h

				#ifndef LLVM_LIB_TARGET_CONNEX_CONNEXTARGETTRANSFORMINFO_H
				#define LLVM_LIB_TARGET_CONNEX_CONNEXTARGETTRANSFORMINFO_H

				#include "Connex.h"
				#include "ConnexTargetMachine.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/CodeGen/BasicTTIImpl.h"
				#include "llvm/CodeGen/TargetLowering.h"

				namespace llvm {

				class ConnexTTIImpl : public BasicTTIImplBase<ConnexTTIImpl> {
				typedef BasicTTIImplBase<ConnexTTIImpl> BaseT;
				typedef TargetTransformInfo TTI;
				friend BaseT;

				const ConnexSubtarget *ST;
				const ConnexTargetLowering *TLI;

				const ConnexSubtarget *getST() const {
				LLVM_DEBUG(dbgs() << "Entered getST()\n");
				return ST; }

				const ConnexTargetLowering *getTLI() const {
				LLVM_DEBUG(dbgs() << "Entered getTLI()\n");
				return TLI; }


				public:
				bool isLegalMaskedGather(Type *DataTy) {
				// Inspired from X86TargetTransformInfo.cpp
				LLVM_DEBUG(dbgs() << "Entered isLegalMaskedGather()\n");

				/*
				// Some CPUs have better gather performance than others.
				// TODO: Remove the explicit ST->hasAVX512()?, That would mean we would only
				// enable gather with a -march.
				if (!(ST->hasAVX512() || (ST->hasFastGather() && ST->hasAVX2())))
				return false;

				// This function is called now in two cases: from the Loop Vectorizer
				// and from the Scalarizer.
				// When the Loop Vectorizer asks about legality of the feature,
				// the vectorization factor is not calculated yet. The Loop Vectorizer
				// sends a scalar type and the decision is based on the width of the
				// scalar element.
				// Later on, the cost model will estimate usage of this intrinsic based on
				// the vector type.
				// The Scalarizer asks again about legality. It sends a vector type.
				// In this case we can reject non-power-of-2 vectors.
				// We also reject single element vectors as the type legalizer can't
				// scalarize it.
				if (isa<VectorType>(DataTy)) {
				unsigned NumElts = DataTy->getVectorNumElements();
				if (NumElts == 1 || !isPowerOf2_32(NumElts))
				return false;
				}
				Type *ScalarTy = DataTy->getScalarType();
				if (ScalarTy->isPointerTy())
				return true;

				if (ScalarTy->isFloatTy() || ScalarTy->isDoubleTy())
				return true;

				if (!ScalarTy->isIntegerTy())
				return false;

				unsigned IntWidth = ScalarTy->getIntegerBitWidth();
				return IntWidth == 32 || IntWidth == 64;
				*/

				Type *ScalarTy = DataTy->getScalarType();

				if (ScalarTy->isHalfTy())
				return true;

				if (ScalarTy->isIntegerTy()) {
				unsigned IntWidth = ScalarTy->getIntegerBitWidth();
				LLVM_DEBUG(dbgs() << "isLegalMaskedGather(): IntWidth = "
				<< IntWidth << "\n");
				//return IntWidth == 16; // 32 || IntWidth == 64;
				return (IntWidth == 16) || (IntWidth == 32);
				}

				return false;
				}

				bool isLegalMaskedScatter(Type *DataType) {
				LLVM_DEBUG(dbgs() << "Entered isLegalMaskedScatter()\n");

				// Inspired from X86TargetTransformInfo.cpp
				return isLegalMaskedGather(DataType);
				}

				public:
				explicit ConnexTTIImpl(const ConnexTargetMachine *TM, const Function &F)
				: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl()),
				// Doesn't help (inspired from X86 backend) : BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
				TLI(ST->getTargetLowering()) {
				LLVM_DEBUG(dbgs() << "Entered constructor ConnexTTIImpl()\n");
				}
				arsenm: Noisy debug

				/*
				unsigned getNumberOfRegisters(bool Vector) {
				if (Vector) {
				return 0;
				}
				return 12;
				}
				*/
				};

				} // end namespace llvm

				#endif
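
For readers who want the legality rule of `isLegalMaskedGather()` without the LLVM types, the check can be restated as a small stand-alone predicate. This is only a sketch: `ScalarKind`, `ToyScalarType` and `isLegalGatherScatterElement` are hypothetical names introduced here, standing in for `llvm::Type` queries such as `isHalfTy()` and `getIntegerBitWidth()`. It assumes, as the header above does, that scatter uses the same rule as gather.

```cpp
#include <cassert>

// Hypothetical stand-in for the scalar element kinds an llvm::Type can have.
enum class ScalarKind { Half, Float, Double, Integer, Pointer };

// Hypothetical scalar type descriptor; BitWidth only matters for Integer.
struct ToyScalarType {
  ScalarKind Kind;
  unsigned BitWidth;
};

// Mirrors the rule in the header above: half-precision floats and
// 16-bit or 32-bit integers are accepted, everything else is rejected.
bool isLegalGatherScatterElement(const ToyScalarType &Ty) {
  if (Ty.Kind == ScalarKind::Half)
    return true;
  if (Ty.Kind == ScalarKind::Integer)
    return Ty.BitWidth == 16 || Ty.BitWidth == 32;
  return false;
}
```

Note that, unlike the commented-out X86 logic, this rule rejects pointers, `float` and `double` elements, and places no constraint on the number of vector elements.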

lib/Target/Connex/InstPrinter/CMakeLists.txt

				add_llvm_library(LLVMConnexAsmPrinter
				ConnexInstPrinter.cpp
				)

lib/Target/Connex/InstPrinter/ConnexInstPrinter.h

				//===-- ConnexInstPrinter.h - Convert Connex MCInst to asm syntax -------*- C++ -*--//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This class prints a Connex MCInst to a .s file.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_INSTPRINTER_CONNEXINSTPRINTER_H
				#define LLVM_LIB_TARGET_CONNEX_INSTPRINTER_CONNEXINSTPRINTER_H

				#include "llvm/MC/MCInstPrinter.h"

				namespace llvm {
				class MCOperand;

				class ConnexInstPrinter : public MCInstPrinter {
				public:
				ConnexInstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,
				const MCRegisterInfo &MRI)
				: MCInstPrinter(MAI, MII, MRI) {}

				void printInst(const MCInst *MI, raw_ostream &O, StringRef Annot,
				const MCSubtargetInfo &STI) override;

				// IMPORTANT Note: printOperand() etc are not methods of the
				// MCInstPrinter class, but they are methods called from the
				// TableGen generated code from ConnexGenAsmWriter.inc.
				void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O,
				const char *Modifier = nullptr);

				template <unsigned Bits, unsigned Offset = 0>
				void printUImm(const MCInst *MI, int opNum, raw_ostream &O);

				void printMemOperand(const MCInst *MI, int OpNo, raw_ostream &O,
				const char *Modifier = nullptr);

				// Taken from MSP430InstPrinter.h
				void printSrcMemOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O,
				const char *Modifier = nullptr);

				void printImm64Operand(const MCInst *MI, unsigned OpNo, raw_ostream &O);

				// Inspired from printi256mem() from [LLVM]/lib/Target/X86/InstPrinter/X86IntelInstPrinter.h
				void printScatterGatherMemOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);

				// Autogenerated by tblgen.
				void printInstruction(const MCInst *MI, raw_ostream &O);
				static const char *getRegisterName(unsigned RegNo);

				private:
				// Taken from [LLVM]/llvm/lib/Target/Mips/InstPrinter/MipsInstPrinter.h
				void printUnsignedImm8(const MCInst *MI, int opNum, raw_ostream &O);

				// Required by ConnexGenAsmWriter.inc
				// Taken from Mips/InstPrinter/MipsInstPrinter.h
				void printUnsignedImm(const MCInst *MI, int opNum, raw_ostream &O);
				};
				}

				#endif

lib/Target/Connex/InstPrinter/ConnexInstPrinter.cpp

				//===-- ConnexInstPrinter.cpp - Convert Connex MCInst to asm syntax -------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This class prints a Connex MCInst to a .s file.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/MachineInstr.h"
				#include "Connex.h"
				#include "ConnexInstPrinter.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCExpr.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCSymbol.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/FormattedStream.h"

				#include "llvm/Support/Debug.h" // for dbgs and DEBUG() macro
				#include "ConnexConfig.h"

				using namespace llvm;


				//#define DEBUG_TYPE "asm-printer"
				#define DEBUG_TYPE "asm-inst-printer"

				// Include the auto-generated portion of the assembly writer.
				#include "ConnexGenAsmWriter.inc"


				#include "llvm/CodeGen/MachineInstr.h"
				#include <unordered_map>

				/*
				NOTE: as of Nov 2016, the LLVM APIs allow printing customized code only
				here (and NOT in ConnexAsmPrinter.cpp, which around a year ago had some APIs).
				*/

				/*
				We define these variables here and declare them extern in
				ConnexAsmPrinter.cpp (and NOT the other way around, which gives an ld error)
				because of the way these modules are linked by the LLVM build scripts.

				Note that the flow of operations is:
				ConnexAsmPrinter::EmitInstruction() gets called first and then
				ConnexInstPrinter::printUnsignedImm() gets called immediately after
				(look at the stdout files generated by llc with the DEBUG prints).
				Also, ConnexAsmPrinter::EmitInstruction() uses an automatic variable,
				MCInst TmpInst.
				So the MCInst exists only while it is being output to the stream and is
				then destroyed automatically; it therefore does NOT make sense to keep a
				map from the MCInst "in flight" (TmpInst) to its associated MachineInstr.
				*/
				const MachineInstr *crtMI = NULL;
				#ifdef NOTNOTNOT
				std::unordered_map<const MachineInstr *, MCInst *> mapMachineMCInst;
				#endif
				// A map associating: first is LD_H, ST_H or REPEAT, second is the associated INLINEASM
				std::unordered_map<const MachineInstr *, const MachineInstr *> mapLD_ST_REPEAT_InlineAsm;

				#ifdef NOTNOTNOT
				const MachineInstr *retrieveAssociatedMachineInstr(const MCInst *mci) {
				DEBUG(dbgs() << "Entered retrieveAssociatedMachineInstr()\n");

				// Initialized so that the final return is well defined when nothing matches.
				const MachineInstr *res = nullptr;

				//for (auto : mapMachineMCInst)
				// See http://www.cplusplus.com/reference/unordered_map/unordered_map/begin/
				for (auto it = mapMachineMCInst.begin();
				it != mapMachineMCInst.end(); ++it) {
				//std::cout << " " << it->first << ":" << it->second;
				if (it->second == mci) {
				const MachineInstr *mi = &(*(it->first));
				DEBUG(dbgs() << "retrieveAssociatedMachineInstr(): "
				<< "mci = " << *mci
				<< ", mci = " << mci
				//<< ", it->second = " << it->second
				<< ", MachineInstr = " << mi
				//<< " " << *mi
				<< "\n");

				res = it->first;
				/*const MachineInstr *res = it->first;
				return res;
				*/
				}
				}
				return res;
				/*
				assert(0 && "MCInst not found!");
				return NULL;
				*/
				}
				#endif



				void ConnexInstPrinter::printInst(const MCInst *MI, raw_ostream &O,
				StringRef Annot, const MCSubtargetInfo &STI) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printInst()...\n");
				//O << "Entered ConnexInstPrinter::printInst()\n";
				LLVM_DEBUG(dbgs() << "printInst(): MI = " << *MI << "\n");
				LLVM_DEBUG(dbgs() << "printInst(): MI->getOpcode() = "
				<< MI->getOpcode() << "\n");

				/* For some reason, ConnexGenAsmWriter.inc cannot print INLINEASM from the
				MachineInstr bundles I create in ConnexInstrInfo.cpp, expandPostRAPseudo(),
				and then unpack in [Target]AsmPrinter::EmitInstruction(),
				because of this definition they have:
				static const uint32_t OpInfo0[] = {
				0U, // PHI
				0U, // INLINEASM
				...
				etc.
				So I handle these INLINEASMs myself here.
				TODO: maybe explain better.
				*/
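				// TargetOpcode::INLINEASM has the value 1 here (see the OpInfo0 table above).
				if (MI->getOpcode() == TargetOpcode::INLINEASM) {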
				O << " ";
				printOperand(MI, 0, O); //getOperand(0));
				O << " // custom code in ConnexInstPrinter::printInst() for INLINEASM";
				}
				/*
				else
				if (MI->getOpcode() == 13) { // Handling bundle for VSELECT, more exactly instruction COPY
				// TODO TODO TODO: I should maybe implement this in printInstruction() and check for Bits != 0 and act accordingly
				O << " ";
				printOperand(MI, 0, O); //getOperand(0));
				O << " = ";
				printOperand(MI, 1, O);
				O << " | ";
				printOperand(MI, 1, O);
				O << " ; // custom code in ConnexInstPrinter::printInst() for VSELECT";
				}
				*/
				else {
				printInstruction(MI, O);
				}

				printAnnotation(O, Annot);
				}


				static void printExpr(const MCExpr *Expr, raw_ostream &O) {
				#ifndef NDEBUG
				const MCSymbolRefExpr *SRE;

				if (const MCBinaryExpr *BE = dyn_cast<MCBinaryExpr>(Expr))
				SRE = dyn_cast<MCSymbolRefExpr>(BE->getLHS());
				else
				SRE = dyn_cast<MCSymbolRefExpr>(Expr);
				assert(SRE && "Unexpected MCExpr type.");

				MCSymbolRefExpr::VariantKind Kind = SRE->getKind();

				assert(Kind == MCSymbolRefExpr::VK_None);
				#endif

				O << *Expr;
				}

				void ConnexInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
				raw_ostream &O, const char *Modifier) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printOperand(OpNo = "
				<< OpNo << ")...\n");
				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printOperand(): *MI = "
				<< *MI << "\n");
				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printOperand(): MI->getNumOperands() = "
				<< MI->getNumOperands() << "\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MCInst.html

				/* Simple fallback, useful just for NOP -
				* TODO: I could take care of it in printInstruction(), which calls
				* printOperand()
				*/
				if (MI->getNumOperands() <= OpNo)
				return;

				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printOperand(): MI->getOperand(OpNo) = "
				<< MI->getOperand(OpNo) << "\n");

				assert((Modifier == 0 || Modifier[0] == 0) && "No modifiers supported");

				const MCOperand &Op = MI->getOperand(OpNo);

				if (Op.isReg()) {
				// This handles registers, such as scalar r0 or vector R(0)
				O << getRegisterName(Op.getReg());
				}
				else
				if (Op.isImm()) {
				/* Normally we do NOT get here because this case is treated in
				printUnsignedImm(). */
				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printOperand(): Op.getImm() = "
				<< Op.getImm() << "\n");
				O << (int32_t)Op.getImm();
				}
				else {
				assert(Op.isExpr() && "Expected an expression");
				printExpr(Op.getExpr(), O);
				}
				}

				template <unsigned Bits, unsigned Offset>
				void ConnexInstPrinter::printUImm(const MCInst *MI, int opNum, raw_ostream &O) {
				const MCOperand &MO = MI->getOperand(opNum);
				if (MO.isImm()) {
				uint64_t Imm = MO.getImm();
				Imm -= Offset;
				// Use a 64-bit one so the shift is well defined for Bits >= 31.
				Imm &= (uint64_t(1) << Bits) - 1;
				Imm += Offset;
				O << formatImm(Imm);
				return;
				}

				printOperand(MI, opNum, O);
				}

				void ConnexInstPrinter::printMemOperand(const MCInst *MI, int OpNo, raw_ostream &O,
				const char *Modifier) {
				// We arrive here for instructions like: sth 0(r12), r14

				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printMemOperand()\n");

				const MCOperand &RegOp = MI->getOperand(OpNo);
				const MCOperand &OffsetOp = MI->getOperand(OpNo + 1);

				// offset
				if (OffsetOp.isImm())
				O << formatDec(OffsetOp.getImm());
				else
				assert(0 && "Expected an immediate");

				// register
				assert(RegOp.isReg() && "Register operand not a register");
				//#ifdef USE_ORIGINAL_PRINT_CODE
				O << '(' << getRegisterName(RegOp.getReg()) << ')';
				/*
				#else
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MCOperand.html
				O << getRegisterName(RegOp.getReg()); //print something like r1, r2, etc
				//O << RegOp.getImm(); // Gives error: /home/asusu/LLVM/llvm38Nov2016/llvm/include/llvm/MC/MCInst.h:75: int64_t llvm::MCOperand::getImm() const: Assertion `isImm() && "This is not an immediate"' failed.
				//O << RegOp; // Outputs something like <MCOperand Reg:66>, etc
				#endif
				*/
				}

				// Taken from MSP430InstPrinter.h
				void ConnexInstPrinter::printSrcMemOperand(const MCInst *MI, unsigned OpNo,
				raw_ostream &O,
				const char *Modifier) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printSrcMemOperand()\n");

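				// Index relative to OpNo, as the MSP430 original does, rather than
				// assuming the memory operand starts at operand 0.
				const MCOperand &Base = MI->getOperand(OpNo);
				const MCOperand &Disp = MI->getOperand(OpNo + 1);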

				// Print displacement first

				// If the global address expression is a part of displacement field with a
				// register base, we should not emit any prefix symbol here, e.g.
				// mov.w &foo, r1
				// vs
				// mov.w glb(r1), r2
				// Otherwise (!) msp430-as will silently miscompile the output :(
				if (!Base.getReg())
				O << '&';

				if (Disp.isExpr())
				Disp.getExpr()->print(O, &MAI);
				else {
				assert(Disp.isImm() && "Expected immediate in displacement field");
				O << Disp.getImm();
				}

				// Print register base field
				if (Base.getReg())
				O << '(' << getRegisterName(Base.getReg()) << ')';
				}

				void ConnexInstPrinter::printImm64Operand(const MCInst *MI, unsigned OpNo,
				raw_ostream &O) {
				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printImm64Operand()\n");

				const MCOperand &Op = MI->getOperand(OpNo);

				if (Op.isImm()) {
				// This is for instructions like: ld_64 r3, 4294967296
				O << (uint64_t)Op.getImm();
				}
				else {
				// This is for instructions like: ld_64 r1, <MCOperand Expr:(CONNEX_VL)>
				O << Op;
				}
				}

				void ConnexInstPrinter::printScatterGatherMemOperand(const MCInst *MI,
				unsigned OpNo,
				raw_ostream &O) {
				LLVM_DEBUG(dbgs() <<
				"Entered ConnexInstPrinter::printScatterGatherMemOperand() - "
				"NOTE that we discard the BasePtr of the TableGen MemOperand\n");
				/*
				IMPORTANT: Here, for the MCInst, the parameters do NOT follow the order from the .td file.
				Following include/llvm/Target/TargetSelectionDAG.td we have:

				// SDTypeProfile - This profile describes the type requirements of a Selection
				// DAG node.
				class SDTypeProfile<int numresults, int numoperands,
				list<SDTypeConstraint> constraints> {
				int NumResults = numresults;
				int NumOperands = numoperands;
				list<SDTypeConstraint> Constraints = constraints;
				}

				// So: 3 input operands, 2 results.
				// Params are: passthru, mask, index; results are: vector of i1, vector of ptr (actual result)
				// Params are 0, 1, 2 and results are 3, 4.
				// Operands 0 and 1 have vector type, with same number of elements.
				// Operands 0 and 2 have identical types.
				// Operands 1 and 3 have identical types.
				// --> Opnd 3 (result 0?) is i1 vector
				// Operand 4 (result 1?) has pointer type.
				// Operand 1 is vector type with element type of i1.
				def SDTMaskedGather: SDTypeProfile<2, 3, [ // masked gather
				SDTCisVec<0>, SDTCisVec<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<1, 3>,
				SDTCisPtrTy<4>, SDTCVecEltisVT<1, i1>, SDTCisSameNumEltsAs<0, 1>
				]>;

				def masked_gather : SDNode<"ISD::MGATHER", SDTMaskedGather,
				[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
				*/

				if (MI->getNumOperands() > 4) {
				// We have an MGATHER operation
				/*
				// THIS is WRONG:
				const MCOperand &index = MI->getOperand(0);
				const MCOperand &maskIn = MI->getOperand(1);
				const MCOperand &passthru = MI->getOperand(2);
				const MCOperand &maskOut = MI->getOperand(3);
				const MCOperand &res = MI->getOperand(4);
				*/
				const MCOperand &res = MI->getOperand(0);
				const MCOperand &index = MI->getOperand(4);
				const MCOperand &maskIn = MI->getOperand(1);
				const MCOperand &passthru = MI->getOperand(2);
				const MCOperand &maskOut = MI->getOperand(3);

				assert(index.isReg() && "index not a register");
				assert(passthru.isReg() && "passthru not a register");

				LLVM_DEBUG(dbgs() << "MI = " << *MI
				<< "\n index = " << index
				<< "\n maskIn (bool vector register, which we actually do NOT use) = " << maskIn
				<< "\n passthru = " << passthru
				<< "\n maskOut = " << maskOut
				<< "\n res = " << res << "\n");

				LLVM_DEBUG(dbgs() << "\n res = " << res << "\n");

				assert(res.isReg() && "res not a register");
				O << getRegisterName(index.getReg());
				}
				else {
				// We have an MSCATTER operation
				const MCOperand &value = MI->getOperand(1);
				const MCOperand &maskIn = MI->getOperand(0);
				const MCOperand &mask2 = MI->getOperand(2);
				const MCOperand &index = MI->getOperand(3);

				LLVM_DEBUG(dbgs() << "MI = " << *MI
				<< "\n value (src) = " << value
				<< "\n maskIn (bool vector register, "
				"which we actually do NOT use) = " << maskIn
				<< "\n index = " << index
				<< "\n mask2 = " << mask2
				<< "\n");
				O << getRegisterName(index.getReg());
				}


				/*
				O << "MI = " << *MI << "\n";
				O << "index = (" << getRegisterName(index.getReg()) << ")\n";
				O << "passthru = (" << getRegisterName(passthru.getReg()) << ")\n";
				O << "res = (" << getRegisterName(res.getReg()) << ")\n";
				//O << " = (" << getRegisterName(BaseReg.getReg()) << ")\n";
				*/

				//printMemReference(MI, OpNo, O);
				LLVM_DEBUG(dbgs() << "Exiting ConnexInstPrinter::printScatterGatherMemOperand()\n");
				}


				char *getStringFromAssociatedInlineAsm(const MachineInstr *assocMI,
				char *strToSearch) {
				char *res = NULL;

				assert(0 &&
				"getStringFromAssociatedInlineAsm() should NOT be executed since we don't "
				"use symbolic LD_H or ST_H anymore");

				LLVM_DEBUG(dbgs() << "getStringFromAssociatedInlineAsm(): assocMI = "
				//; assocMI->dump();
				//dbgs() <<
				<< "(" << assocMI << ")" << "\n");

				const MachineInstr *miInlineasm = mapLD_ST_REPEAT_InlineAsm[assocMI];
				LLVM_DEBUG(dbgs() << "getStringFromAssociatedInlineAsm(): miInlineasm = "
				<< miInlineasm << "\n");

				if (miInlineasm == NULL) {
				res = strdup("[NO_VALUE - since miInlineasm == NULL!!!!]");
				return res;
				}
				assert(miInlineasm->isInlineAsm());

				const MachineOperand &inlineAsmStrMO0 = miInlineasm->getOperand(0);

				// LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printUnsignedImm(): inlineAsmStrMO = "
				// << inlineAsmStrMO0 << "\n");
				// Inspired by http://llvm.org/docs/doxygen/html/MachineInstr_8cpp_source.html#l00306
				assert(inlineAsmStrMO0.getType() == MachineOperand::MO_ExternalSymbol);

				LLVM_DEBUG(dbgs() << "getStringFromAssociatedInlineAsm(): "
				"inlineAsmStrMO0.getSymbolName() = "
				<< inlineAsmStrMO0.getSymbolName() << "\n");

				// From http://llvm.org/docs/doxygen/html/classllvm_1_1MachineOperand.html
				// const char *getSymbolName () const
				res = strstr(const_cast<char *>(inlineAsmStrMO0.getSymbolName()),
				strToSearch);

				assert(res != NULL && "Did not find strToSearch marker in INLINEASM");

				res += strlen(strToSearch);

				assert(res != NULL);

				return res;
				}


				// Taken from MipsInstPrinter.cpp
				// (required by ConnexGenAsmWriter.inc)
				void ConnexInstPrinter::printUnsignedImm(const MCInst *MI, int opNum,
				raw_ostream &O) {
				char *res = NULL;
				//int offsetLS;

				LLVM_DEBUG(dbgs() << "Entered ConnexInstPrinter::printUnsignedImm()...\n");

				const MCOperand &MO = MI->getOperand(opNum);

				if (MO.isImm()) {
				// Printing 16-bits unsigned int
				//O << (unsigned short)MO.getImm();
				// Printing unsigned int
				unsigned imm = MO.getImm();

				LLVM_DEBUG(dbgs() << "ConnexInstPrinter::printUnsignedImm(): imm = "
				<< imm
				<< ", MI (ptr) = " << MI
				<< ", MI = " << *MI
				<< "\n");

				#ifdef GENERATE_ASSOCIATED_INLINEASM_FROM_LOOPVECTORIZE_PASS
				if (imm == VALUE_BOGUS_REPEAT_X_TIMES) {
				assert(MI->getOpcode() == Connex::REPEAT);

				res = getStringFromAssociatedInlineAsm(crtMI,
				const_cast<char *>("/*value*/"));

				O << res;
				}
				else
				#endif

				if (imm == CONNEX_MEM_NUM_ROWS + 10) {
				#ifdef NOTNOTNOT
				// This was too complicated

				//MCInst *assocMC = mapMachineMCInst[MI];
				const MachineInstr *assocMI =
				retrieveAssociatedMachineInstr(MI);
				#endif

				const MachineInstr *assocMI = crtMI;

				assert((MI->getOpcode() == Connex::LD_H) ||
				(MI->getOpcode() == Connex::ST_H));

				res = getStringFromAssociatedInlineAsm(crtMI,
				const_cast<char *>("/*offset*/"));

				//sscanf(res, "%d", &offsetLS);

				//LLVM_DEBUG(dbgs() << "assocMI = " << *assocMC << "\n");

				O << STR_LOOP_SYMBOLIC_INDEX
				<< " + " << res; // offsetLS
				}
				else {
				O << (unsigned int)MO.getImm();
				}
				}
				else
				printOperand(MI, opNum, O);
				}


				// Taken from [LLVM]/llvm/lib/Target/Mips/InstPrinter/MipsInstPrinter.h
				void ConnexInstPrinter::printUnsignedImm8(const MCInst *MI, int opNum,
				raw_ostream &O) {
				const MCOperand &MO = MI->getOperand(opNum);

				if (MO.isImm())
				O << (unsigned short int)(unsigned char)MO.getImm();
				else
				printOperand(MI, opNum, O);
				}
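
Note on printUImm() in the file above: it subtracts Offset, keeps the low Bits bits, and adds Offset back, so the printed value always falls in [Offset, Offset + 2^Bits). The sketch below reproduces just that arithmetic outside of LLVM. `wrapUImm` is a hypothetical name used only for illustration and is not part of the patch; the sketch also assumes a 64-bit one in the shift so that the mask stays well defined for wide fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone helper (not part of the patch): mirrors the
// arithmetic of printUImm<Bits, Offset> without any LLVM types.
template <unsigned Bits, unsigned Offset = 0>
constexpr uint64_t wrapUImm(uint64_t Imm) {
  static_assert(Bits > 0 && Bits < 64, "Bits must be in [1, 63]");
  Imm -= Offset;                     // move the range so it starts at 0
  Imm &= (uint64_t(1) << Bits) - 1;  // keep only the low Bits bits
  Imm += Offset;                     // move the range back
  return Imm;
}
```

For example, `wrapUImm<8>(0x1FF)` keeps only the low byte, and `wrapUImm<2, 1>(5)` wraps the value into the range 1..4.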

lib/Target/Connex/InstPrinter/LLVMBuild.txt

				;===- ./lib/Target/Connex/InstPrinter/LLVMBuild.txt ---------------*- Conf -*--===;
				;
				; The LLVM Compiler Infrastructure
				;
				; This file is distributed under the University of Illinois Open Source
				; License. See LICENSE.TXT for details.
				;
				;===------------------------------------------------------------------------===;
				;
				; This is an LLVMBuild description file for the components in this subdirectory.
				;
				; For more information on the LLVMBuild system, please see:
				;
				; http://llvm.org/docs/LLVMBuild.html
				;
				;===------------------------------------------------------------------------===;

				[component_0]
				type = Library
				name = ConnexAsmPrinter
				parent = Connex
				required_libraries = MC Support
				add_to_library_groups = Connex

lib/Target/Connex/LLVMBuild.txt

				;===- ./lib/Target/Connex/LLVMBuild.txt ---------------------------*- Conf -*--===;
				;
				; Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				; See https://llvm.org/LICENSE.txt for license information.
				; SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				;
				;===------------------------------------------------------------------------===;
				;
				; This is an LLVMBuild description file for the components in this subdirectory.
				;
				; For more information on the LLVMBuild system, please see:
				;
				; http://llvm.org/docs/LLVMBuild.html
				;
				;===------------------------------------------------------------------------===;

				[common]
				subdirectories = InstPrinter MCTargetDesc TargetInfo

				[component_0]
				type = TargetGroup
				name = Connex
				parent = Target
				has_asmprinter = 1

				[component_1]
				type = Library
				name = ConnexCodeGen
				parent = Connex
				required_libraries =
				Analysis
				AsmPrinter
				CodeGen
				Core
				MC
				ConnexAsmPrinter
				ConnexDesc
				ConnexInfo
				SelectionDAG
				Support
				Target
				add_to_library_groups = Connex

lib/Target/Connex/MCTargetDesc/CMakeLists.txt

				add_llvm_library(LLVMConnexDesc
				ConnexMCTargetDesc.cpp
				ConnexAsmBackend.cpp
				ConnexMCCodeEmitter.cpp
				ConnexELFObjectWriter.cpp
				)

lib/Target/Connex/MCTargetDesc/ConnexAsmBackend.cpp

				//===-- ConnexAsmBackend.cpp - Connex Assembler Backend -------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/MC/MCAsmBackend.h"
				#include "llvm/MC/MCAssembler.h"
				#include "llvm/MC/MCContext.h"
				#include "llvm/MC/MCFixup.h"
				/*
				// 2019_03_30
				#include "llvm/MC/MCDirectives.h"
				#include "llvm/MC/MCELFObjectWriter.h"
				#include "llvm/MC/MCFixupKindInfo.h"
				*/
				#include "llvm/MC/MCObjectWriter.h"
				#include "llvm/Support/EndianStream.h"
				/*
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/MC/MCExpr.h"
				*/
				#include <cassert>
				#include <cstdint>
				/*
				#include "llvm/MC/MCSymbol.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/raw_ostream.h"
				*/

				using namespace llvm;

				namespace {

				class ConnexAsmBackend : public MCAsmBackend {
				public:
				ConnexAsmBackend(support::endianness Endian) : MCAsmBackend(Endian) {}

				~ConnexAsmBackend() override = default;

				void applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,
				const MCValue &Target, MutableArrayRef<char> Data,
				uint64_t Value, bool IsResolved,
				const MCSubtargetInfo *STI) const override;

				std::unique_ptr<MCObjectTargetWriter> createObjectTargetWriter()
				const override;


				// No instruction requires relaxation
				bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,
				const MCRelaxableFragment *DF,
				const MCAsmLayout &Layout) const override {
				return false;
				}


				unsigned getNumFixupKinds() const override { return 1; }


				bool mayNeedRelaxation(const MCInst &Inst,
				const MCSubtargetInfo &STI) const override {
				return false;
				}


				void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
				MCInst &Res) const override {}


				bool writeNopData(raw_ostream &OS, uint64_t Count) const override;
				};

				} // end anonymous namespace


				bool ConnexAsmBackend::writeNopData(raw_ostream &OS, uint64_t Count) const {
				if ((Count % 8) != 0)
				return false;

				for (uint64_t i = 0; i < Count; i += 8)
				support::endian::write<uint64_t>(OS, 0x15000000, Endian);

				return true;
				}


				void ConnexAsmBackend::applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,
				const MCValue &Target,
				MutableArrayRef<char> Data, uint64_t Value,
				bool IsResolved,
				const MCSubtargetInfo *STI) const {
				if (Fixup.getKind() == FK_SecRel_4 || Fixup.getKind() == FK_SecRel_8) {
				// The Value is 0 for global variables, and the in-section offset
				// for static variables. Write to the immediate field of the inst.
				assert(Value <= UINT32_MAX);
				support::endian::write<uint32_t>(&Data[Fixup.getOffset() + 4],
				static_cast<uint32_t>(Value),
				Endian);
				} else if (Fixup.getKind() == FK_Data_4) {
				support::endian::write<uint32_t>(&Data[Fixup.getOffset()], Value, Endian);
				} else if (Fixup.getKind() == FK_Data_8) {
				support::endian::write<uint64_t>(&Data[Fixup.getOffset()], Value, Endian);
				} else if (Fixup.getKind() == FK_PCRel_4) {
				Value = (uint32_t)((Value - 8) / 8);
				if (Endian == support::little) {
				Data[Fixup.getOffset() + 1] = 0x10;
				support::endian::write32le(&Data[Fixup.getOffset() + 4], Value);
				} else {
				Data[Fixup.getOffset() + 1] = 0x1;
				support::endian::write32be(&Data[Fixup.getOffset() + 4], Value);
				}
				} else {
				assert(Fixup.getKind() == FK_PCRel_2);
				Value = (uint16_t)((Value - 8) / 8);
				support::endian::write<uint16_t>(&Data[Fixup.getOffset() + 2], Value,
				Endian);
				}
				}


				std::unique_ptr<MCObjectTargetWriter>
				ConnexAsmBackend::createObjectTargetWriter() const {
				return createConnexELFObjectWriter(0);
				}


				MCAsmBackend *llvm::createConnexAsmBackend(const Target &T,
				const MCSubtargetInfo &STI,
				const MCRegisterInfo &MRI,
				const MCTargetOptions &) {
				return new ConnexAsmBackend(support::little);
				}
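
Note on the FK_PCRel_2 branch of applyFixup() in the file above: the byte distance is turned into a count of 8-byte slots measured from the instruction after the branch, and the 16-bit result is stored two bytes into the instruction. The sketch below shows that computation on a plain byte buffer. The names `pcRelSlots`, `write16le` and `patchBranch` are hypothetical, and the fixed 8-byte instruction size is an assumption read off the `(Value - 8) / 8` expression, not something the patch states explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: every instruction is 8 bytes wide, as "(Value - 8) / 8" suggests.
constexpr uint64_t kInsnSize = 8;

// Convert a byte distance into a slot count relative to the next instruction.
// The truncation to 16 bits also yields the two's-complement encoding for
// backward branches when the distance is a multiple of 8.
inline uint16_t pcRelSlots(uint64_t ByteDistance) {
  return static_cast<uint16_t>((ByteDistance - kInsnSize) / kInsnSize);
}

// Store a 16-bit value in little-endian order at Data[Offset..Offset+1].
inline void write16le(std::vector<uint8_t> &Data, size_t Offset, uint16_t V) {
  assert(Offset + 2 <= Data.size() && "write past end of buffer");
  Data[Offset] = static_cast<uint8_t>(V & 0xff);
  Data[Offset + 1] = static_cast<uint8_t>(V >> 8);
}

// Patch the 16-bit field that sits 2 bytes into the instruction at FixupOffset.
inline void patchBranch(std::vector<uint8_t> &Data, size_t FixupOffset,
                        uint64_t ByteDistance) {
  write16le(Data, FixupOffset + 2, pcRelSlots(ByteDistance));
}
```

A branch whose target is 24 bytes ahead would therefore get the value 2 in its offset field.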

lib/Target/Connex/MCTargetDesc/ConnexELFObjectWriter.cpp

				//===-- ConnexELFObjectWriter.cpp - Connex ELF Writer ---------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/BinaryFormat/ELF.h"
				#include "llvm/MC/MCELFObjectWriter.h"
				#include "llvm/MC/MCFixup.h"
				#include "llvm/MC/MCObjectWriter.h"
				#include "llvm/MC/MCValue.h"
				#include "llvm/Support/ErrorHandling.h"
				#include <cstdint>

				using namespace llvm;

				namespace {

				class ConnexELFObjectWriter : public MCELFObjectTargetWriter {
				public:
				ConnexELFObjectWriter(uint8_t OSABI);

				~ConnexELFObjectWriter() override;

				protected:
				unsigned getRelocType(MCContext &Ctx, const MCValue &Target,
				const MCFixup &Fixup, bool IsPCRel) const override;
				};

				} // end anonymous namespace


				ConnexELFObjectWriter::ConnexELFObjectWriter(uint8_t OSABI)
				: MCELFObjectTargetWriter(/*Is64Bit*/ true, OSABI, ELF::EM_NONE,
				/*HasRelocationAddend*/ false) {}


				ConnexELFObjectWriter::~ConnexELFObjectWriter() {}


				unsigned ConnexELFObjectWriter::getRelocType(MCContext &Ctx, const MCValue &Target,
				const MCFixup &Fixup,
				bool IsPCRel) const {
				// determine the type of the relocation
				switch ((unsigned)Fixup.getKind()) {
				default:
				llvm_unreachable("invalid fixup kind!");
				case FK_SecRel_8:
				return ELF::R_BPF_64_64;
				case FK_PCRel_4:
				case FK_SecRel_4:
				return ELF::R_BPF_64_32;
				case FK_Data_8:
				return ELF::R_BPF_64_64;
				case FK_Data_4:
				// .BTF.ext generates FK_Data_4 relocations for
				// insn offset by creating temporary labels.
				// The insn offset is within the code section and
				// already been fulfilled by applyFixup(). No
				// further relocation is needed.
				if (const MCSymbolRefExpr *A = Target.getSymA()) {
				if (A->getSymbol().isTemporary()) {
				MCSection &Section = A->getSymbol().getSection();
				const MCSectionELF *SectionELF = dyn_cast<MCSectionELF>(&Section);
				assert(SectionELF && "Null section for reloc symbol");

				// The reloc symbol should be in text section.
				unsigned Flags = SectionELF->getFlags();
				if ((Flags & ELF::SHF_ALLOC) && (Flags & ELF::SHF_EXECINSTR))
				return ELF::R_BPF_NONE;
				}
				}
				return ELF::R_BPF_64_32;
				}
				}


				std::unique_ptr<MCObjectTargetWriter>
				llvm::createConnexELFObjectWriter(uint8_t OSABI) {
				return llvm::make_unique<ConnexELFObjectWriter>(OSABI);
				}

lib/Target/Connex/MCTargetDesc/ConnexMCAsmInfo.h

				//===-- ConnexMCAsmInfo.h - Connex asm properties -------------------*- C++ -*--====//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the declaration of the ConnexMCAsmInfo class.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCASMINFO_H
				#define LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCASMINFO_H

				#include "llvm/ADT/StringRef.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/ADT/Triple.h"

				namespace llvm {
				class Target;
				class Triple;

				class ConnexMCAsmInfo : public MCAsmInfo {
				public:
				explicit ConnexMCAsmInfo(const Triple &TT) {
				#ifdef NOT_NOT_NOT
				if (TT.getArch() == Triple::bpfeb)
				IsLittleEndian = false;
				#endif

				PrivateGlobalPrefix = ".L";
				WeakRefDirective = "\t.weak\t";

				// Inspired by http://llvm.org/docs/doxygen/html/NVPTXMCAsmInfo_8cpp_source.html#l00028
				// Empty InlineAsmStart/InlineAsmEnd avoid emitting the APP and NO_APP
				// delimiters around inline asm expressions.
				CommentString = "//";
				InlineAsmStart = "";
				InlineAsmEnd = "";

				UsesELFSectionDirectiveForBSS = true;
				HasSingleParameterDotFile = false;
				HasDotTypeDotSizeDirective = false;

				SupportsDebugInformation = true;
				}
				};
				}

				#endif

lib/Target/Connex/MCTargetDesc/ConnexMCCodeEmitter.cpp

				//===-- ConnexMCCodeEmitter.cpp - Convert Connex code to machine code -----------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the ConnexMCCodeEmitter class.
				//
				//===----------------------------------------------------------------------===//

				#include "MCTargetDesc/ConnexMCTargetDesc.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/MC/MCCodeEmitter.h"
				#include "llvm/MC/MCFixup.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCRegisterInfo.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/Support/Endian.h"
				#include "llvm/Support/EndianStream.h"
				#include <cassert>
				#include <cstdint>


				using namespace llvm;

				#define DEBUG_TYPE "mccodeemitter"

				namespace {

				class ConnexMCCodeEmitter : public MCCodeEmitter {
				const MCInstrInfo &MCII;
				const MCRegisterInfo &MRI;
				bool IsLittleEndian;

				public:
				ConnexMCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri,
				bool IsLittleEndian)
				: MCII(mcii), MRI(mri), IsLittleEndian(IsLittleEndian) {}

				ConnexMCCodeEmitter(const ConnexMCCodeEmitter &) = delete;

				void operator=(const ConnexMCCodeEmitter &) = delete;

				~ConnexMCCodeEmitter() override = default;

				// getBinaryCodeForInstr - TableGen'erated function for getting the
				// binary encoding for an instruction.
				uint64_t getBinaryCodeForInstr(const MCInst &MI,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const;

				// getMachineOpValue - Return binary encoding of operand. If the machine
				// operand requires relocation, record the relocation and return zero.
				unsigned getMachineOpValue(const MCInst &MI, const MCOperand &MO,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const;

				uint64_t getMemoryOpValue(const MCInst &MI, unsigned Op,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const;

				void encodeInstruction(const MCInst &MI, raw_ostream &OS,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const override;
				};

				} // end anonymous namespace

				MCCodeEmitter *llvm::createConnexMCCodeEmitter(const MCInstrInfo &MCII,
				const MCRegisterInfo &MRI,
				MCContext &Ctx) {
				return new ConnexMCCodeEmitter(MCII, MRI, true);
				}

				#ifdef NOT_NOT_NOT
				MCCodeEmitter *llvm::createBPFbeMCCodeEmitter(const MCInstrInfo &MCII,
				const MCRegisterInfo &MRI,
				MCContext &Ctx) {
				return new BPFMCCodeEmitter(MRI, false);
				}
				#endif



				unsigned ConnexMCCodeEmitter::getMachineOpValue(const MCInst &MI,
				const MCOperand &MO,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const {
				if (MO.isReg())
				return MRI.getEncodingValue(MO.getReg());
				if (MO.isImm())
				return static_cast<unsigned>(MO.getImm());

				assert(MO.isExpr());

				const MCExpr *Expr = MO.getExpr();

				assert(Expr->getKind() == MCExpr::SymbolRef);

				if (MI.getOpcode() == Connex::JAL)
				// func call name
				Fixups.push_back(MCFixup::create(0, Expr, FK_SecRel_4));
				else if (MI.getOpcode() == Connex::LD_imm64)
				Fixups.push_back(MCFixup::create(0, Expr, FK_SecRel_8));
				else
				// bb label
				Fixups.push_back(MCFixup::create(0, Expr, FK_PCRel_2));

				return 0;
				}

				static uint8_t SwapBits(uint8_t Val) {
				return (Val & 0x0F) << 4 | (Val & 0xF0) >> 4;
				}

				void ConnexMCCodeEmitter::encodeInstruction(const MCInst &MI, raw_ostream &OS,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const {
				/*
				// 2019_03_30_TODO
				verifyInstructionPredicates(MI,
				computeAvailableFeatures(STI.getFeatureBits()));
				*/

				unsigned Opcode = MI.getOpcode();
				support::endian::Writer OSE(OS,
				IsLittleEndian ? support::little : support::big);

				if (Opcode == Connex::LD_imm64 || Opcode == Connex::LD_pseudo) {
				uint64_t Value = getBinaryCodeForInstr(MI, Fixups, STI);
				OS << char(Value >> 56);
				if (IsLittleEndian)
				OS << char((Value >> 48) & 0xff);
				else
				OS << char(SwapBits((Value >> 48) & 0xff));
				OSE.write<uint16_t>(0);
				OSE.write<uint32_t>(Value & 0xffffFFFF);

				const MCOperand &MO = MI.getOperand(1);
				uint64_t Imm = MO.isImm() ? MO.getImm() : 0;
				OSE.write<uint8_t>(0);
				OSE.write<uint8_t>(0);
				OSE.write<uint16_t>(0);
				OSE.write<uint32_t>(Imm >> 32);
				} else {
				// Get instruction encoding and emit it
				uint64_t Value = getBinaryCodeForInstr(MI, Fixups, STI);
				OS << char(Value >> 56);
				if (IsLittleEndian)
				OS << char((Value >> 48) & 0xff);
				else
				OS << char(SwapBits((Value >> 48) & 0xff));
				OSE.write<uint16_t>((Value >> 32) & 0xffff);
				OSE.write<uint32_t>(Value & 0xffffFFFF);
				}
				}

				// Encode Connex Memory Operand
				uint64_t ConnexMCCodeEmitter::getMemoryOpValue(const MCInst &MI, unsigned Op,
				SmallVectorImpl<MCFixup> &Fixups,
				const MCSubtargetInfo &STI) const {
				uint64_t Encoding;
				const MCOperand Op1 = MI.getOperand(1);
				assert(Op1.isReg() && "First operand is not register.");
				Encoding = MRI.getEncodingValue(Op1.getReg());
				Encoding <<= 16;
				MCOperand Op2 = MI.getOperand(2);
				assert(Op2.isImm() && "Second operand is not immediate.");
				Encoding |= Op2.getImm() & 0xffff;
				return Encoding;
				}
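The packing done by `getMemoryOpValue` can be illustrated without any LLVM types. This is only a sketch under assumptions: `packMemOperand` and `checkPackMemOperand` are hypothetical names, the register is passed as its already-resolved encoding value, and the offset is a plain signed integer. It mirrors the shift-by-16 and the 16-bit mask used above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone analogue of the memory-operand packing:
// the register encoding goes into the bits above bit 15, and the low
// 16 bits hold the (truncated) immediate offset.
uint64_t packMemOperand(unsigned RegEncoding, int64_t Offset) {
  uint64_t Encoding = RegEncoding;
  Encoding <<= 16;
  Encoding |= static_cast<uint64_t>(Offset) & 0xffff;
  return Encoding;
}

// Usage check: a negative offset is kept as its 16-bit two's-complement form.
void checkPackMemOperand() {
  assert(packMemOperand(3, 8) == 0x30008);
  assert(packMemOperand(1, -2) == 0x1FFFE);
}
```

Note that the mask silently truncates offsets that do not fit in 16 bits, which is the same behavior as the masked `|=` in the original function.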

				// 2019_03_30_TODO #define ENABLE_INSTR_PREDICATE_VERIFIER
				#include "ConnexGenMCCodeEmitter.inc"
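The byte layout that `encodeInstruction` produces for a regular (non-`LD_imm64`) instruction can be sketched outside LLVM as follows. This is a minimal sketch, not the backend's code: `encodeWord`, `appendInt`, and `swapNibbles` are hypothetical names, and a `std::vector<uint8_t>` stands in for the `raw_ostream`. It assumes the same split of the 64-bit `Value` used above: one byte from bits 63..56, one byte from bits 55..48 (nibble-swapped only in big-endian mode), then a 16-bit field and a 32-bit field written in the selected byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Swap the high and low 4-bit halves of a byte (same idea as SwapBits).
static uint8_t swapNibbles(uint8_t Val) {
  return static_cast<uint8_t>(((Val & 0x0F) << 4) | ((Val & 0xF0) >> 4));
}

// Append an integer of type T in the requested byte order.
template <typename T>
static void appendInt(std::vector<uint8_t> &Out, T V, bool LittleEndian) {
  for (unsigned I = 0; I < sizeof(T); ++I) {
    unsigned Shift = LittleEndian ? 8 * I : 8 * (sizeof(T) - 1 - I);
    Out.push_back(static_cast<uint8_t>(V >> Shift));
  }
}

// Hypothetical analogue of the "else" branch of encodeInstruction.
std::vector<uint8_t> encodeWord(uint64_t Value, bool LittleEndian) {
  std::vector<uint8_t> Out;
  // Byte 0: bits 63..56, emitted unchanged in both modes.
  Out.push_back(static_cast<uint8_t>(Value >> 56));
  // Byte 1: bits 55..48, nibble-swapped in big-endian mode only.
  uint8_t Byte1 = static_cast<uint8_t>((Value >> 48) & 0xff);
  Out.push_back(LittleEndian ? Byte1 : swapNibbles(Byte1));
  // 16-bit and 32-bit fields follow in the selected byte order.
  appendInt<uint16_t>(Out, static_cast<uint16_t>((Value >> 32) & 0xffff),
                      LittleEndian);
  appendInt<uint32_t>(Out, static_cast<uint32_t>(Value & 0xffffFFFF),
                      LittleEndian);
  assert(Out.size() == 8 && "one instruction word is 8 bytes");
  return Out;
}
```

Calling `encodeWord` twice on the same value, once per byte order, makes it easy to see that only the second byte and the two trailing fields differ between the modes.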

lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.h

				//===-- ConnexMCTargetDesc.h - Connex Target Descriptions -------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file provides Connex specific target descriptions.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCTARGETDESC_H
				#define LLVM_LIB_TARGET_CONNEX_MCTARGETDESC_CONNEXMCTARGETDESC_H

				#include "llvm/Config/config.h"
				#include "llvm/Support/DataTypes.h"

				#include <memory>

				namespace llvm {
				class MCAsmBackend;
				class MCCodeEmitter;
				class MCContext;
				class MCInstrInfo;
				class MCObjectTargetWriter;
				class MCRegisterInfo;
				class MCSubtargetInfo;
				class MCTargetOptions;
				class StringRef;
				class Target;
				class Triple;
				class raw_ostream;
				class raw_pwrite_stream;

				extern Target TheConnexTarget;


				MCCodeEmitter *createConnexMCCodeEmitter(const MCInstrInfo &MCII,
				const MCRegisterInfo &MRI,
				MCContext &Ctx);

				MCAsmBackend *createConnexAsmBackend(const Target &T, const MCSubtargetInfo &STI,
				const MCRegisterInfo &MRI,
				const MCTargetOptions &Options);

				std::unique_ptr<MCObjectTargetWriter> createConnexELFObjectWriter(uint8_t OSABI);
				}

				// Defines symbolic names for Connex registers. This defines a mapping from
				// register name to register number.
				//
				#define GET_REGINFO_ENUM
				#include "ConnexGenRegisterInfo.inc"

				// Defines symbolic names for the Connex instructions.
				//
				#define GET_INSTRINFO_ENUM
				#include "ConnexGenInstrInfo.inc"

				#define GET_SUBTARGETINFO_ENUM
				#include "ConnexGenSubtargetInfo.inc"

				#endif

lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.cpp

				//===-- ConnexMCTargetDesc.cpp - Connex Target Descriptions ---------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file provides Connex specific target descriptions.
				//
				//===----------------------------------------------------------------------===//


				#include "Connex.h"
				#include "ConnexMCTargetDesc.h"
				#include "ConnexMCAsmInfo.h"
				#include "InstPrinter/ConnexInstPrinter.h"
				//#include "llvm/MC/MCCodeGenInfo.h"
				#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCRegisterInfo.h"
				#include "llvm/MC/MCStreamer.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/TargetRegistry.h"

				#define GET_INSTRINFO_MC_DESC
				#include "ConnexGenInstrInfo.inc"

				#define GET_SUBTARGETINFO_MC_DESC
				#include "ConnexGenSubtargetInfo.inc"

				#define GET_REGINFO_MC_DESC
				#include "ConnexGenRegisterInfo.inc"

				using namespace llvm;


				static MCInstrInfo *createConnexMCInstrInfo() {
				MCInstrInfo *X = new MCInstrInfo();
				InitConnexMCInstrInfo(X);
				return X;
				}


				static MCRegisterInfo *createConnexMCRegisterInfo(const Triple &TT) {
				MCRegisterInfo *X = new MCRegisterInfo();
				InitConnexMCRegisterInfo(X, Connex::R11 /* RAReg doesn't exist */);
				return X;
				}


				static MCSubtargetInfo *createConnexMCSubtargetInfo(const Triple &TT,
				StringRef CPU, StringRef FS) {
				return createConnexMCSubtargetInfoImpl(TT, CPU, FS);
				}


				static MCStreamer *createConnexMCStreamer(const Triple &T, MCContext &Ctx,
				std::unique_ptr<MCAsmBackend> &&MAB,
				std::unique_ptr<MCObjectWriter> &&OW,
				std::unique_ptr<MCCodeEmitter> &&Emitter,
				bool RelaxAll) {
				return createELFStreamer(Ctx, std::move(MAB), std::move(OW),
				std::move(Emitter),
				RelaxAll);
				}


				static MCInstPrinter *createConnexMCInstPrinter(const Triple &T,
				unsigned SyntaxVariant,
				const MCAsmInfo &MAI,
				const MCInstrInfo &MII,
				const MCRegisterInfo &MRI) {
				if (SyntaxVariant == 0)
				return new ConnexInstPrinter(MAI, MII, MRI);
				return nullptr;
				}


				extern "C" void LLVMInitializeConnexTargetMC() {
				for (Target *T : {&TheConnexTarget}) {
				// Register the MC asm info.
				RegisterMCAsmInfo<ConnexMCAsmInfo> X(*T);

				// Register the MC instruction info.
				TargetRegistry::RegisterMCInstrInfo(*T, createConnexMCInstrInfo);

				// Register the MC register info.
				TargetRegistry::RegisterMCRegInfo(*T, createConnexMCRegisterInfo);

				// Register the MC subtarget info.
				TargetRegistry::RegisterMCSubtargetInfo(*T,
				createConnexMCSubtargetInfo);

				// Register the object streamer
				TargetRegistry::RegisterELFStreamer(*T, createConnexMCStreamer);

				// Register the MCInstPrinter.
				TargetRegistry::RegisterMCInstPrinter(*T, createConnexMCInstPrinter);
				}

				// Register the MC code emitter
				TargetRegistry::RegisterMCCodeEmitter(TheConnexTarget,
				createConnexMCCodeEmitter);

				// Register the ASM Backend
				TargetRegistry::RegisterMCAsmBackend(TheConnexTarget,
				createConnexAsmBackend);
				}

lib/Target/Connex/MCTargetDesc/LLVMBuild.txt

				;===- ./lib/Target/Connex/MCTargetDesc/LLVMBuild.txt --------------*- Conf -*--===;
				;
				; Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				; See https://llvm.org/LICENSE.txt for license information.
				; SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				;
				;===------------------------------------------------------------------------===;
				;
				; This is an LLVMBuild description file for the components in this subdirectory.
				;
				; For more information on the LLVMBuild system, please see:
				;
				; http://llvm.org/docs/LLVMBuild.html
				;
				;===------------------------------------------------------------------------===;

				[component_0]
				type = Library
				name = ConnexDesc
				parent = Connex
				required_libraries = MC ConnexAsmPrinter ConnexInfo Support
				add_to_library_groups = Connex

lib/Target/Connex/Misc.h

				#ifndef INCLUDED_SUNIT_DUMP
				#define INCLUDED_SUNIT_DUMP

				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				// Inspired from SystemZHazardRecognizer.cpp

				#ifndef NDEBUG // Debug output


				// The SUnit (Scheduling Unit) class no longer has the dump() method,
				// so we create a helper method for it here.
				// Inspired from SystemZHazardRecognizer.h

				/// Returns the scheduling class cached on an SUnit, or NULL if none is set.
				static const MCSchedClassDesc *getSchedClass(SUnit *SU) {
				if (!SU->SchedClass // && SchedModel->hasInstrSchedModel()
				) {
				return NULL;
				// TODO: SU->SchedClass = SchedModel->resolveSchedClass(SU->getInstr());
				}

				return SU->SchedClass;
				}

				static void dumpSU(llvm::SUnit *SU, raw_ostream &OS) {
				OS << "SU(" << SU->NodeNum << "):";
				//OS << TII->getName(SU->getInstr()->getOpcode());
				OS << SU->getInstr()->getOpcode();

				const MCSchedClassDesc *SC = getSchedClass(SU);
				// getSchedClass() may return NULL, so guard before dereferencing.
				if (!SC || !SC->isValid())
				return;

				/*
				// TODO: make this compile

				for (TargetSchedModel::ProcResIter
				PI = SchedModel->getWriteProcResBegin(SC),
				PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
				const MCProcResourceDesc &PRD =
				*SchedModel->getProcResource(PI->ProcResourceIdx);
				std::string FU(PRD.Name);
				// trim e.g. Z13_FXaUnit -> FXa
				FU = FU.substr(FU.find("_") + 1);
				size_t Pos = FU.find("Unit");
				if (Pos != std::string::npos)
				FU.resize(Pos);
				if (FU == "LS") // LSUnit -> LSU
				FU = "LSU";
				OS << "/" << FU;

				if (PI->Cycles > 1)
				OS << "(" << PI->Cycles << "cyc)";
				}
				*/

				if (SC->NumMicroOps > 1)
				OS << "/" << SC->NumMicroOps << "uops";
				if (SC->BeginGroup && SC->EndGroup)
				OS << "/GroupsAlone";
				else if (SC->BeginGroup)
				OS << "/BeginsGroup";
				else if (SC->EndGroup)
				OS << "/EndsGroup";
				if (SU->isUnbuffered)
				OS << "/Unbuffered";
				/*
				// TODO: make this compile
				if (has4RegOps(SU->getInstr()))
				OS << "/4RegOps";
				*/
				}
				#endif

				#endif // INCLUDED_SUNIT_DUMP

lib/Target/Connex/RecoverFromLlvmIR.h

Property	Old Value	New Value
svn:special	null	* \ No newline at end of property

				#ifndef RECOVER_FROM_LLVM_IR
				#define RECOVER_FROM_LLVM_IR

				arsenm: I don't know what this file is trying to accomplish, but it is a separate patch from the backend.
				//#include <algorithm>
				//#include <functional>

				// Alex: new code
				//#include <ilist.h>
				#include <string>
				//#include <unordered_set>
				#include <unordered_map>
				#include <utility> // std::pair
				// Alex: END new code

				#include "llvm/IR/DebugInfo.h"

				// See http://llvm.org/docs/ProgrammersManual.html#isa
				#include "llvm/Support/Casting.h" // for dyn_cast

				//#define DEBUG_TYPE LV_NAME


				#define STR_REMAINDER_VF "n.mod.vf"


				#define EXCHANGE(a, b) a ^= b; b ^= a; a ^= b;

				#ifndef MAXLEN_STR
				//#define MAXLEN_STR 2048
				#define MAXLEN_STR 8192
				#endif

				using namespace llvm;

				namespace {

				// Normally used to return the variable name without suffix e.g. ".034"
				void rStripStringAfterChar(char *str, char ch) {
				/*
				//char *reductionVarNameTmp;
				char *strTmp;
				for (strTmp = str; *strTmp != 0; strTmp++) {
				if (*strTmp == ch)
				*strTmp = 0;
				}
				*/

				//char *found = const_cast<char *>(strchr(str0, '.'));
				char *found = const_cast<char *>(strchr(str, ch));
				if (found != NULL)
				*found = 0;
				}
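For comparison, the same suffix stripping can be expressed on `std::string` without modifying a caller-owned buffer. This is a sketch only; `stripAfterChar` and `checkStripAfterChar` are hypothetical names that do not appear in the patch, and the behavior assumed is the one described above: everything from the first occurrence of the character onward is dropped.

```cpp
#include <cassert>
#include <string>

// Return Name truncated at the first occurrence of Ch; if Ch does not
// occur, Name is returned unchanged.
std::string stripAfterChar(const std::string &Name, char Ch) {
  std::size_t Pos = Name.find(Ch);
  return Pos == std::string::npos ? Name : Name.substr(0, Pos);
}

// Usage check with names of the kind the loop vectorizer generates.
void checkStripAfterChar() {
  assert(stripAfterChar("index.next", '.') == "index");
  assert(stripAfterChar("sum", '.') == "sum");
}
```

Returning a new string avoids the `const_cast` and keeps the original name available to the caller.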


				/* IMPORTANT NOTE:
				* If the val is an LLVM variable, it will return something like
				* "%[llvm_var_name]".
				* If val is a constant it returns normally the value of the
				* constant.
				*
				* I consider it a rather big deficiency that neither Value::getName()
				* nor any other method provided by the core LLVM developers returns
				* the auto-generated number, such as %0, when the Value is created
				* without an explicit name.
				*
				* IMPORTANT: I noticed that for different Instructions the result of print()
				* can be somewhat different, like:
				* - i32 %0
				* - %1 = bitcast ...
				*/
				std::string getLLVMValueName(Value *val) {
				/* Somewhat important: it is possible that, if the API
				changes a bit the name will NOT be printed
				here anymore */
				std::string printStr;
				raw_string_ostream OS(printStr);

				// bci->printAsOperand(OS, true); // Does NOT write anything (false neither)

				// See http://llvm.org/docs/doxygen/html/Value_8h_source.html#l00202
				/* NOTE: IsForDebug false can print:
				- the SAME as true or
				- the complete instruction, not just the value */
				val->print(OS, /*IsForDebug*/ true);
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): printStr = "
				<< printStr << "\n");

				char strValName[MAXLEN_STR];
				char strValName2[MAXLEN_STR];

				if (llvm::dyn_cast<Constant>(val) != NULL) {
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): val is Constant\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1Constant.html
				sscanf(printStr.c_str(), "%s %s", strValName2, strValName);

				/* Normally printStr is of form "type_ct val_ct".
				* But we can also have something like
				* @dataT = common local_unnamed_addr global [128 x [150 x half]] zeroinitializer
				*/
				if (strValName2[0] == '@')
				strcpy(strValName, strValName2 + 1);
				}
				else {
				const char *ptr = printStr.c_str();
				for (; *ptr != 0; ptr++) {
				if (*ptr == '%')
				break;
				}
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): ptr = " << ptr << "\n");

				if (*ptr == 0) {
				// This is NOT a variable Value - probably just a constant
				return ""; //std::to_string("");
				}

				sscanf(ptr, "%s ", strValName);
				//sscanf(valTypeAndName.c_str(), "%s %s", strValName, strValName);
				}

				std::string res = strValName;
				LLVM_DEBUG(dbgs() << "getLLVMValueName(): res = "
				<< res << "\n");

				return res;
				}
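The non-constant branch of `getLLVMValueName` boils down to "find the first `%` and take the token up to the next whitespace". A minimal sketch of that step with `std::string`, which avoids the fixed-size `char` buffers and the unbounded `%s` conversions, could look like this. `firstPercentToken` and `checkFirstPercentToken` are hypothetical names, and the input strings are only assumed examples of what `print()` may produce.

```cpp
#include <cassert>
#include <string>

// Return the whitespace-delimited token that starts at the first '%',
// or an empty string if the text contains no '%'.
std::string firstPercentToken(const std::string &Printed) {
  std::size_t Start = Printed.find('%');
  if (Start == std::string::npos)
    return "";
  std::size_t End = Printed.find_first_of(" \t\n", Start);
  if (End == std::string::npos)
    return Printed.substr(Start);
  return Printed.substr(Start, End - Start);
}

// Usage check on the two printed forms mentioned in the comment above.
void checkFirstPercentToken() {
  assert(firstPercentToken("i32 %0") == "%0");
  assert(firstPercentToken("  %1 = bitcast i8* %p to i32*") == "%1");
  assert(firstPercentToken("i32 7").empty());
}
```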

				// Used by getAllMetadata() (and getExpr())
				bool ranGetAllMetadata;
				//DenseMap<Value *, std::string> varNameMap;
				// Map with <name of Value, name of source var represented>
				std::unordered_map<std::string, std::string> varNameMap;
				//
				void getAllMetadata(Function *F) {
				ranGetAllMetadata = true;

				LLVM_DEBUG(dbgs() << "Entered getAllMetadata()\n");

				// Some info about metadata: http://llvm.org/docs/SourceLevelDebugging.html#llvm-dbg-value

				// Inspired from
				// https://weaponshot.wordpress.com/2012/05/06/extract-all-the-metadata-nodes-in-llvm/
				for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB) {
				for (BasicBlock::iterator I = BB->begin(),
				E = BB->end(); I != E; ++I) {
				/* Get the Metadata declared in the llvm intrinsic functions
				such as llvm.dbg.declare() */
				if (CallInst *CI = dyn_cast<CallInst>(I)) {
				if (Function *F = CI->getCalledFunction()) {
				// We look at the llvm.dbg.value metadata which associates Value (LLVM IR values) with names in the original program
				if (F->getName().startswith("llvm.dbg.value")) {
				//if (F->getName().startswith("llvm.dbg"))
				LLVM_DEBUG(dbgs() << "getAllMetadata(): CI = " << *CI << "\n");

				/* It seems that the association between LLVM IR
				Value and names in the original source program
				is always like this:
				- opnd 0 contains the Value,
				- opnd 1 is always a (useless?) 0,
				- opnd 2 contains the DILocalVariable,
				*/
				// Error: <<no known conversion for argument 1 from ‘const llvm::Value*’ to ‘const llvm::Metadata*’>>: DILocalVariable *srcVar = llvm::dyn_cast_or_null<DILocalVariable>(I->getOperand(2));
				// Error: <<no known conversion for argument 1 from ‘const llvm::Value*’ to ‘const llvm::Metadata*’>>: MDNode *srcVar = llvm::dyn_cast_or_null<MDNode>(I->getOperand(2));
				/* See http://llvm.org/docs/doxygen/html/classllvm_1_1MetadataAsValue.html
				(see maybe http://llvm.org/docs/doxygen/html/namespacellvm_1_1mdconst.html:
				"Now that Value and Metadata are in separate hierarchies" */
				MetadataAsValue *srcVarMDV = llvm::dyn_cast_or_null<MetadataAsValue>(I->getOperand(2));

				//Value *val = I->getOperand(0);
				MetadataAsValue *val = llvm::dyn_cast_or_null<MetadataAsValue>(I->getOperand(0));
				assert(val != NULL);

				if (srcVarMDV != NULL) {
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1MDNode.html
				//MDNode *srcVar = llvm::dyn_cast_or_null<MDNode>(srcVarMDV->getMetadata());

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1DILocalVariable.html
				// and http://llvm.org/docs/doxygen/html/classllvm_1_1DIVariable.html
				DILocalVariable *srcVar = llvm::dyn_cast_or_null<DILocalVariable>(srcVarMDV->getMetadata());

				assert(srcVar != NULL);

				// Gives compiler-error: const MDOperand srcVarOpnd0 = srcVar->getOperand(0);
				//const MDOperand *srcVarOpnd0 = & (srcVar->getOperand(0));

				std::string valueName = getLLVMValueName(val);
				if (valueName.size() == 0) {
				/* We can have metadata which has for 1st
				operand a constant e.g. 0.
				For ex
				call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !32, metadata !21), !dbg !33
				*/
				continue;
				}

				//varNameMap[valTypeAndName] = (srcVar->getName()).str();
				varNameMap[valueName] = (srcVar->getName()).str();

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1DILocalVariable.html
				LLVM_DEBUG(dbgs() << "getAllMetadata(): val = "
				<< *val << "\n");
				LLVM_DEBUG(dbgs() << " val = "
				<< val << "\n");
				LLVM_DEBUG(dbgs() << " val->getValueName() = "
				<< val->getValueName() << "\n");
				LLVM_DEBUG(dbgs() << " val->getName() = "
				<< val->getName() << "\n");
				LLVM_DEBUG(dbgs() << " srcVar = "
				<< *srcVar << "\n");
				//LLVM_DEBUG(dbgs() << " srcVar->getOperand(0) = "
				LLVM_DEBUG(dbgs() << " srcVarName = "
				<< varNameMap[valueName] /* srcVar->getName() */
				<< "\n");
				}
				}
				}
				}
				}
				}
				} // end getAllMetadata()


				std::string printCTypeFromLLVMType(Type *aType, LLVMContext *aContext) {
				std::string res;

				// See http://llvm.org/doxygen/classllvm_1_1Type.html
				if (aType == Type::getInt16Ty(*aContext))
				res = "short";
				else
				if (aType == Type::getInt32Ty(*aContext)) //Builder.getInt32Ty())
				res = "int";
				else
				if (aType == Type::getHalfTy(*aContext))
				res = "half";
				else
				assert(0 && "printCTypeFromLLVMType(): Type NOT supported");

				return res;
				}


				// TODO: probably we will need to treat struct/record
				// (category theory product data type),
				// union/variants (category theory coproduct data type)
				Type *getElementTypeOfDerivedType(Type *valType) {
				int sizeofElem;

				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): valType = "
				<< *valType << "\n");

				// Helps for vector type.
				// So it does NOT help for pointer type, as it is the case for val (normally).
				Type *scalarType = valType->getScalarType();
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): scalarType = "
				<< *scalarType << "\n");

				sizeofElem = scalarType->getScalarSizeInBits() / 8;
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): sizeof(scalarType) = "
				<< sizeofElem << "\n");
				if (sizeofElem != 0)
				return scalarType;

				/*
				// Does NOT help: both return 0...
				LLVM_DEBUG(dbgs() << "GetSize(): bitsizeof(type of val) = "
				//<< valType->getPrimitiveSizeInBits() / 8 << "\n");
				<< valType->getScalarSizeInBits() << "\n");
				*/
				ArrayType *arrType = llvm::dyn_cast<ArrayType>(valType);

				if (arrType != NULL) {
				Type *elemArrType = arrType->getElementType();
				sizeofElem = elemArrType->getScalarSizeInBits() / 8;
				LLVM_DEBUG(dbgs()
				<< "getElementTypeOfDerivedType(): (arrType != NULL): elemArrType = "
				<< *elemArrType << "\n");
				LLVM_DEBUG(dbgs()
				<< "getElementTypeOfDerivedType(): (arrType != NULL): sizeofElem = "
				<< sizeofElem << "\n");

				if (sizeofElem == 0) {
				return getElementTypeOfDerivedType(elemArrType);
				}
				else {
				return elemArrType;
				}
				}

				/* See http://llvm.org/docs/doxygen/html/classllvm_1_1SequentialType.html
				and http://llvm.org/docs/doxygen/html/classllvm_1_1PointerType.html */
				PointerType *ptrType = llvm::dyn_cast<PointerType>(valType);
				if (ptrType != NULL) {
				Type *elemPtrType = ptrType->getElementType();

				sizeofElem = elemPtrType->getScalarSizeInBits() / 8;
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): elemPtrType = "
				<< *elemPtrType << "\n");
				LLVM_DEBUG(dbgs() << "getElementTypeOfDerivedType(): sizeof(elemPtrType) = "
				<< sizeofElem << "\n");

				if (sizeofElem == 0) {
				return getElementTypeOfDerivedType(elemPtrType);
				}
				else
				return elemPtrType;
				}

				/*
				ArrayType *elemPtrTypeArr = llvm::dyn_cast<ArrayType>(elemPtrType);
				if (elemPtrTypeArr != NULL) {
				Type *elemPtr2TypeArr = elemPtrTypeArr->getElementType();
				sizeofElem = elemPtr2TypeArr->getScalarSizeInBits() / 8;

				LLVM_DEBUG(dbgs()
				<< "getElementTypeOfDerivedType(): (elemPtr2TypeArr != NULL) elemPtr2TypeArr = "
				<< *elemPtr2TypeArr << "\n");
				LLVM_DEBUG(dbgs()
				<< "getElementTypeOfDerivedType(): (elemPtr2TypeArr != NULL) sizeofElem = "
				<< sizeofElem << "\n");
				LLVM_DEBUG(dbgs()
				<< "getElementTypeOfDerivedType(): elemPtrTypeArr->getNumElements() = "
				<< elemPtrTypeArr->getNumElements() << "\n");
				if (sizeofElem == 0) {
				return getElementTypeOfDerivedType(elemPtrType);
				}
				}
				*/

				return NULL;
				}


				bool testEquivalence(Instruction *it, PHINode *phi) {
				Value *op0 = NULL;

				LLVM_DEBUG(dbgs() << "Entered testEquivalence(): it = "
				<< * it
				<< ", phi = "
				<< *phi << "\n");

				if (phi == it)
				return true;

				if (it->getNumOperands() > 0) {
				op0 = it->getOperand(0);
				}

				switch (it->getOpcode()) {
				case Instruction::ZExt:
				case Instruction::SExt:
				case Instruction::Trunc:
				//case Instruction::ShuffleVector:
				//case Instruction::InsertElement:
				//case Instruction::PHI:
				//case Instruction::ExtractElement:
				//res = "";
				break;

				/*case Instruction::GetElementPtr:
				break; */

				default:
				return false;
				//assert(0 && "testEquivalence(): we do not deal with these cases");
				}

				/*
				// IMPORTANT-TODO: need to do this for the case we have an access like
				// B[j + 1][0]
				switch (it->getOpcode()) {
				case Instruction::Add:
				res += " + ";
				break;
				}
				*/

				return testEquivalence((Instruction *)op0, phi);
				}


				inline bool isGlobalArray(GetElementPtrInst *GEPPtr) {
				return llvm::dyn_cast<GlobalValue>(GEPPtr->getOperand(0)) != NULL;
				}

				inline Value *GetIndexOpndFromGEPInst(GetElementPtrInst *GEPPtr) {
				int startIndex;
				if (isGlobalArray(GEPPtr)) {
				/* Following also
				http://llvm.org/docs/GetElementPtr.html#why-is-the-extra-0-index-required
				we see that for global arrays, the 1st index
				in GEP is redundant - it has value 0 invariably,
				so we skip it.
				*/
				startIndex = 2;
				}
				else {
				startIndex = 1;
				}

				Value *res = GEPPtr->getOperand(startIndex);

				return res;
				}
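The reason for the extra leading zero index has a direct analogue in plain C++: a global array is addressed through a pointer to the whole array, so one index is needed to select the array object and a second one to select the element. The sketch below is only an illustration; `G`, `elementViaArrayPointer`, and `elementViaElementPointer` are hypothetical names. In GEP terms, operand 0 is the base pointer, so the leading zero is operand 1 and the first meaningful index is operand 2, which matches `startIndex = 2` above.

```cpp
#include <cassert>

// A global array, analogous to an LLVM global such as
// @G = global [8 x i32] zeroinitializer.
int G[8];

// Indexing through a pointer to the whole array needs two indices:
// the leading 0 selects the array object itself, I selects the element.
int *elementViaArrayPointer(int I) {
  int (*Base)[8] = &G;
  return &Base[0][I];
}

// Indexing through a plain element pointer needs only one index.
int *elementViaElementPointer(int *P, int I) { return &P[I]; }

// Usage check: both forms reach the same element.
void checkElementAccess() {
  assert(elementViaArrayPointer(3) == &G[3]);
  assert(elementViaElementPointer(G, 3) == &G[3]);
}
```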


				/*
				Currently this function ONLY does this: it gets rid of duplicated spaces.

				// IMPORTANT-TODO: get rid of unnecessary parentheses
				- for this normally I have to parse expr before and pretty-print it intelligently
				To do algebraic simplification is more complex. See Muchnick,
				- value numbering, etc.

				To do Constant folding (Constant-Expression Evaluation),
				although both these methods are heavy, we could use them:
				We could try to use CIL's partial evaluation module, but:
				- it doesn't work with C++
				We can't use sympy, which can parse expressions (parse_expr) and simplify them (method sympy.simplify.cse_main.cse) because:
				- see e.g.
				https://github.com/sympy/sympy/blob/master/sympy/parsing/sympy_parser.py
				- it doesn't handle pointers, etc - but we can extend it
				- ...
				*/
				std::string canonicalizeExpression(std::string aStr) {
				for (;;) {
				// From http://www.cplusplus.com/reference/string/string/find/
				// Search for two consecutive spaces; one of them is erased below.
				std::size_t pos = aStr.find("  ");

				if (pos == std::string::npos) {
				break;
				}
				else {
				//std::cout << "first 'needle' found at position: " << pos << "\n";
				// From http://www.cplusplus.com/reference/string/string/erase/
				aStr.erase(pos, 1);
				}
				}
				/*
				std::cout << "canonicalizeExpression(): returning aStr = "
				<< aStr << "\n";
				*/

				return aStr;
				}
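Collapsing runs of spaces can also be done in a single pass with `std::unique`, instead of repeatedly searching from the start of the string. This is a swapped-in technique, not what the patch does, and `collapseSpaces` and `checkCollapseSpaces` are hypothetical names. The predicate treats two neighboring characters as duplicates only when both are spaces, so all other repeated characters are preserved.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Replace every run of consecutive spaces with a single space.
std::string collapseSpaces(std::string S) {
  auto NewEnd = std::unique(S.begin(), S.end(), [](char A, char B) {
    return A == ' ' && B == ' ';
  });
  S.erase(NewEnd, S.end());
  return S;
}

// Usage check: runs of spaces shrink, doubled letters stay intact.
void checkCollapseSpaces() {
  assert(collapseSpaces("a  +   b") == "a + b");
  assert(collapseSpaces("aa  bb") == "aa bb");
}
```

The erase-in-a-loop version rescans the string after every removal, while this variant touches each character once.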



				inline void printInfo(Instruction *it,
				char *str0, char *str1, const char *iGetNameData,
				Value *op0, Value *op1) {
				LLVM_DEBUG(dbgs() << "printInfo(): it = " << *it << "\n");
				LLVM_DEBUG(dbgs() << "printInfo(): it ptr = " << it << "\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): it->getOpcodeName() = "
				<< it->getOpcodeName() << ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): it->getOpcode() = "
				<< it->getOpcode() << ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): it->getName() = "
				<< iGetNameData << ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): str0 = "
				<< str0 << ")\n");
				LLVM_DEBUG(dbgs() << " (printInfo(): str1 = "
				<< str1 << ")\n");

				if (op0 == NULL) {
				LLVM_DEBUG(dbgs() << " (printInfo(): op0 = NULL\n");
				}
				else {
				LLVM_DEBUG(dbgs() << " (printInfo(): op0 = "
				<< *op0 << ")\n");
				}

				if (op1 == NULL) {
				LLVM_DEBUG(dbgs() << " (printInfo(): op1 = NULL\n");
				}
				else {
				LLVM_DEBUG(dbgs() << " (printInfo(): op1 = "
				<< *op1 << ")\n");
				}
				}


				/* Alex:
				* - we get a C expression
				* by walking on the use-def-chains (more exactly the only reaching definition
				* for the SSA it instruction) in order to get the most complete definition
				* for the it instruction.
				*
				* - doing some sort of partial evaluation

				NOTE: SCEV also pretty prints - display expressions related to tripcounts
				(zext i16 (-1 + %N) to i32)
				(see code below:
				BackedgeTakenCount->dump();
				ExitCount->dump(); )
				See, more exactly, http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html
				void SCEV::print(raw_ostream &OS) const {}

				IMPORTANT NOTE: We use ((int *)&x) instead of &x because the & for an array
				(global at least) is a pointer to array and this affects/reflects on
				the pointer arithmetic.
				Concrete example on ARM 32 on zedboard.arh.pub.ro:
				/home/alarm/OpincaaLLVM/opincaa_standalone_app/35_MatMul/SIZE_256/STDout_003a
				Before 1st write: &A = 405912
				Before 1st write: &A + 20 = 3027352
				Before 1st write: &A + 131072 = 405912
				Before 1st write: ((char *)&A) + 131072 = 536984
				when running on ARM (32 bits processor) it is possible that &A + x == &A
				(where x is e.g. 131072) (probably because of overflow or because the
				VM did not map memory there or...)
				So, again, when doing arithmetic we need to use (int *)(&A)
				or (short/char *)(&A) instead of &A.
				NOTE: [TODO TODO CHECK WELL]: It seems for pointer type we print just the var e.g. A
				without &A.
				*/
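The pointer-arithmetic note above can be reproduced in a few lines. Adding 1 to `&A` advances by the size of the whole array, because `&A` is a pointer to the array type, whereas adding 1 after casting to an element pointer advances by one element. The logged addresses fit this reading: 3027352 - 405912 = 2621440 = 20 x 131072, so each step of `&A + 1` moved 131072 bytes, and 131072 such steps amount to 2^34 bytes, which wraps around to the starting address in 32-bit arithmetic. The sketch below uses a hypothetical 256-element array and hypothetical function names; it is not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical global array; its total size is 256 * sizeof(int) bytes.
static int A[256];

// Bytes advanced by adding 1 to a pointer to the whole array
// (&A has type int (*)[256]).
std::ptrdiff_t strideOfArrayPointer() {
  return reinterpret_cast<char *>(&A + 1) - reinterpret_cast<char *>(&A);
}

// Bytes advanced by adding 1 after casting to an element pointer.
std::ptrdiff_t strideOfElementPointer() {
  return reinterpret_cast<char *>(reinterpret_cast<int *>(&A) + 1) -
         reinterpret_cast<char *>(&A);
}

// Usage check: the two strides differ by a factor of the element count.
void checkStrides() {
  assert(strideOfArrayPointer() == static_cast<std::ptrdiff_t>(sizeof(A)));
  assert(strideOfElementPointer() == static_cast<std::ptrdiff_t>(sizeof(int)));
}
```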
				bool usePaddingForNestedLoops_more = false;
				bool getExprVarSpecial = false;
				//bool getExprForTripCount = false;
				bool getExprForDMATransfer = true;
				std::unordered_map<Instruction *, std::string> cacheExpr;
				Value *basePtrGetExprIt; // This is the base pointer (GetElementPtr, 1st operand; )
				// IMPORTANT-TODO: make getExpr(Value *it) and check if it is instruction or not
				//
				std::string getExpr(Instruction *it) {
				if (it == NULL) {
				LLVM_DEBUG(dbgs() << "Entered getExpr(): it = NULL\n");
				return std::string("");
				}
				else {
				LLVM_DEBUG(dbgs() << "Entered getExpr(): it = "
				<< *it << "\n");
				}

				char *str0 = const_cast<char *>("");
				char *str1 = const_cast<char *>("");

				char strCopy[MAXLEN_STR];
				//static char res[MAXLEN_STR];
				std::string res;

				#define STR_VEC_IND "vec.ind"
				#define STR_STEP_ADD "step.add"
				/* Note that if I recall correctly, the var names ending in splatinsert are
				automatically generated */
				#define STR_BROADCAST_SPLATINSERT "broadcast.splatinsert"
				#define STR_SPLATINSERT ".splatinsert"
				#define STR_BROADCAST_SPLAT "broadcast.splat"
				#define STR_SPLAT ".splat"
				//
				#define STR_INDUCTION "induction"
				#define STR_UNDEF "undef"
				#define STR_INDEX "index"
				#define STR_INDEX_NEXT "index.next"

				/* NOTE: It is possible that the names have a suffix when we have 2+
				vars starting with the same name - this happens when more
				vector.body BBs are created (more loops are vectorized).
				For this, we use strncmp(), not strcmp(). */

				Value *op0 = NULL;
				Value *op1 = NULL;

				if (it->getNumOperands() > 0) {
				op0 = it->getOperand(0);
				str0 = const_cast<char *>(op0->getName().data());
				if (it->getNumOperands() > 1) {
				op1 = it->getOperand(1);
				str1 = const_cast<char *>(op1->getName().data());
				}
				}

				/*
				* NOTE: it points to an Instruction (or just a Value).
				getOperand() returns type Value.
				* From http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html
				<< StringRef getName () const
				Return a constant reference to the value's name. >>
				*/

				const char *iGetNameData = it->getName().data();

				res.clear();


				/*
				LLVM_DEBUG(dbgs() << "getExpr(): getExprForTripCount = "
				<< getExprForTripCount << "\n");
				*/
				printInfo(it, str0, str1, iGetNameData, op0, op1);

				// See http://www.cplusplus.com/reference/unordered_map/unordered_map/find/
				std::unordered_map<Instruction *, std::string>::const_iterator got =
				cacheExpr.find(it);
				#define INVALID_VALUE_CACHEEXPR "\\@@INVALID_STR@@"
				if (got == cacheExpr.end()) {
				//cacheExpr.insert(it);

				/* We insert an empty string res, just to keep track we visited this
				* node and we update the entry with the correct value at the end of
				* the function */
				cacheExpr[it] = INVALID_VALUE_CACHEEXPR; //res;
				}
				else {
				if (cacheExpr[it] != INVALID_VALUE_CACHEEXPR) {
				// This case can be quite easily reached if the expression it has
				// several times as constituent atoms the same expression.
				res = got->second;
				LLVM_DEBUG(dbgs()
				<< "getExpr(): We already visited this node so we stop here.\n");
				goto GetExpr_end;
				}
				else
				/* We have already cached something for this node,
				* either an INVALID_VALUE_CACHEEXPR or a valid value we can return
				* directly.
				*/
				if (it->getOpcode() == Instruction::PHI) {
				/* If we visited this phi we do NOT revisit it since it can easily
				* result in infinite cycles... This is not rigorously justified,
				* but it's OK :) */
				/*
				We should keep the unstripped name, although it is possible that if
				we visited the variable node before it might be already stripped.
				*/

				if (strlen(str0) == 0) {
				std::string exprOp0 = getExpr((Instruction *)op0);
				LLVM_DEBUG(dbgs() << "getExpr(): Checking PHI's exprOp0 = "
				<< exprOp0 << " (should be a constant).\n");

				// 2018_12_15: MEGA-TODO: test well, also with regression tests
				if (strlen(iGetNameData) == 0) {
				if (exprOp0.size() > 4)
				res = exprOp0;
				}

				//assert(strcmp(exprOp0.c_str(), "0") == 0);
				}
				else
				if (strlen(str1) == 0) {
				std::string exprOp1 = getExpr((Instruction *)op1);
				LLVM_DEBUG(dbgs() << "getExpr(): Checking PHI's exprOp1 = "
				<< exprOp1 << " (should be a constant).\n");

				// 2018_12_15: MEGA-TODO: test well, also with regression tests
				if (strlen(iGetNameData) == 0) {
				if (exprOp1.size() > 4)
				res = exprOp1;
				}

				//assert(strcmp(exprOp1.c_str(), "0") == 0);
				}
				else {
				LLVM_DEBUG(dbgs() << "getExpr(): Setting res to empty string.\n");
				res = "";
				goto GetExpr_end;
				}

				LLVM_DEBUG(dbgs()
				<< "getExpr(): We visited part of this PHI node "
				"so we approximate it... This should be avoided if possible.\n");

				if (getExprVarSpecial) {
				//res += "<VAR*SPECIAL>";
				}

				LLVM_DEBUG(dbgs() << "getExpr(): res = " << res << "\n");
				//res = rStripStringAfterChar(iGetNameData, '.');
				strcpy(strCopy, iGetNameData);
				rStripStringAfterChar(strCopy, '.');
				res += strCopy;
				LLVM_DEBUG(dbgs() << "getExpr(): after, res = " << res << "\n");

				if (getExprVarSpecial) {
				char strTmp[MAXLEN_STR];
				sprintf(strTmp, "__%p", (void *)it);
				res += strTmp;
				//res += "<VARSPECIALEND>";
				}

				goto GetExpr_end;
				}
				} // END else if (got == cacheExpr.end())

				/* Global var (values, not arrays) in LLVM language are already pointers to
				the global address space. This is why we need to use & for them.
				We check that *it is a GlobalValue like:
				@colsK = common local_unnamed_addr global i32 0, align 4
				// See http://llvm.org/docs/doxygen/html/classllvm_1_1GlobalValue.html
				// (also http://llvm.org/docs/LangRef.html#global-variables)
				*/
				//if (GlobalValue *gv = llvm::dyn_cast<GlobalValue>(it))
				if (llvm::dyn_cast<GlobalValue>(it) != NULL) {
				if (usePaddingForNestedLoops_more == true)
				res = "(";
				else
				res = "((int *)&";

				if (getExprVarSpecial) {
				//res += "<VAR*SPECIAL>";
				}

				res += iGetNameData;

				if (getExprVarSpecial) {
				char strTmp[MAXLEN_STR];
				sprintf(strTmp, "__%p", (void *)it);
				res += strTmp;
				//res += "<VARSPECIALEND>";
				}

				res += ")";
				if (basePtrGetExprIt == NULL)
				basePtrGetExprIt = it;

				goto GetExpr_end;
				}

				#ifdef NOT_TREAT_NMODVF
				/* When computing trip count, I don't want it to be multiple of VF,
				but I want the original expression.
				Note: n.mod.vf is a name given by the program below (this module) in
				getOrCreateVectorTripCount(). */
				/* The names may carry a suffix when the plain name already exists,
				e.g. because a different vector.body was created before. */
				if (strncmp(iGetNameData, STR_REMAINDER_VF,
				strlen(STR_REMAINDER_VF)) == 0) {
				LLVM_DEBUG(dbgs() << "getExpr(): NOT following remainder var "
				<< iGetNameData << ".\n");

				/* A simple hack, since I already have the - operator and am lazy to
				get rid of it: */
				res = "0";

				goto GetExpr_end;
				}
				#endif

				if ((strncmp(iGetNameData, STR_INDUCTION, strlen(STR_INDUCTION)) == 0) &&
				(it->getOpcode() == Instruction::Add)) {
				LLVM_DEBUG(dbgs() << "getExpr(): NOT following induction var "
				<< iGetNameData << ".\n");
				res = getExpr((Instruction *) (it->getOperand(0)) );

				/* Indeed, induction is a vector of consecutive indices - let's call it
				a vector index.
				VERY IMPORTANT: To understand things better, we distinguish:
				- the scalar index, indexLLVM_LV, or LV's index (and index.next)
				- the vector index, vec.ind, used for loading from array (well,
				sort of scalar, but...) */

				/*
				// We do NOT process this:
				res += " + ";
				// TODO TODO: check that op1 == <VF x i...><0, 1, ..., VF-1>
				res += "indexLLVM_LV";
				*/
				goto GetExpr_end;
				}


				if ((strncmp(iGetNameData, STR_INDEX,
				strlen(STR_INDEX)) == 0) &&
				(it->getOpcode() == Instruction::PHI) &&
				(strncmp(it->getOperand(1)->getName().data(), STR_INDEX_NEXT,
				strlen(STR_INDEX_NEXT)) == 0)
				) {
				// TODO TODO Check that op0 is constant 0.
				// Coping with %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				//LLVM_DEBUG(dbgs() << "getExpr(): NOT following index induction var.\n");
				LLVM_DEBUG(dbgs() << "getExpr(): Treating special case index = phi(0, index.next).\n");

				/* A simple hack, since I already have the - operator and am lazy to
				get rid of it: */
				#ifdef AGGREGATED_DMA_TRANSFERS // IMPORTANT note: we include this file from the back end also now (not only LoopVectorize.cpp)
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#else
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#endif

				goto GetExpr_end;
				}

				// Note: constants like i64 0 don't have name --> str0 is empty
				/* Here we try to solve a recurrence equation with any PHI node related to
				the C source variables: */
				if ((it->getOpcode() == Instruction::PHI) &&
				strncmp(iGetNameData, STR_VEC_IND, strlen(STR_VEC_IND)) != 0 &&
				strncmp(iGetNameData, STR_STEP_ADD, strlen(STR_STEP_ADD)) != 0 &&
				strncmp(iGetNameData, STR_INDUCTION, strlen(STR_INDUCTION)) != 0) {
				LLVM_DEBUG(dbgs() <<
				"getExpr(): it is Phi, phi node with no special vector vars...\n");

				assert(it->getNumOperands() > 0);

				// 2018_12_15: MEGA-TODO: test well
				if (((Instruction *)op0)->getOpcode() == Instruction::PHI) {
				// MEGA-TODO: && strncmp(exprOp1.c_str(), STR_UNDEF, strlen(STR_UNDEF))
				LLVM_DEBUG(dbgs() <<
				"getExpr(): op0 is Phi --> res = getExpr(op0)\n");
				res = getExpr((Instruction *)op0);
				goto GetExpr_end;
				}
				else
				// 2018_12_15: MEGA-TODO: test well
				if (((Instruction *)op1)->getOpcode() == Instruction::PHI) {
				// MEGA-TODO: && strncmp(exprOp1.c_str(), STR_UNDEF, strlen(STR_UNDEF))
				LLVM_DEBUG(dbgs() <<
				"getExpr(): op1 is Phi --> res = getExpr(op1)\n");
				res = getExpr((Instruction *)op1);
				goto GetExpr_end;
				}

				//#ifdef NEW_STUFF_DANGER
				if (strlen(str0) == 0) {
				LLVM_DEBUG(dbgs() <<
				" getExpr(): strlen(str0) == 0 --> exchanging operands\n");

				// Use intptr_t, since long is narrower than a pointer on LLP64 targets
				intptr_t tmp;
				tmp = (intptr_t)str0;
				str0 = str1;
				str1 = (char *)tmp;

				tmp = (intptr_t)op0;
				op0 = op1;
				op1 = (Value *)tmp;

				//EXCHANGE(str0, str1);
				//EXCHANGE((int)op0, (int)op1);

				printInfo(it, str0, str1, iGetNameData, op0, op1);
				}
				//#endif

				if (strlen(str0) != 0) {
				//assert(str0 ==(symbolically, after more recovery) iGetNameData + 1);
				LLVM_DEBUG(dbgs() << " ... Entering getExpr() for op0\n");
				std::string exprOp0 = canonicalizeExpression(getExpr((Instruction *)op0));
				LLVM_DEBUG(dbgs() << " exprOp0 = " << exprOp0 << "\n");

				std::string tmp = "(";

				strcpy(strCopy, iGetNameData);
				rStripStringAfterChar(strCopy, '.');

				//tmp = tmp + iGetNameData;
				tmp = tmp + strCopy;
				tmp = tmp + " + 1)";

				LLVM_DEBUG(dbgs() << " tmp = " << tmp << "\n");

				/* IMPORTANT-TODO: in some cases like
				/home/asusu/LLVM/Tests/NEW_v128i16/32_MatAdd/STDerr_clang_opt_01
				we will have (i + 1) instead of (i.047.us + 1)
				*/
				if (strcmp(exprOp0.c_str(), tmp.c_str()) != 0) {
				// IMPORTANT-TODO: take from the other if case below
				LLVM_DEBUG(dbgs()
				<< " VERY BAD case encountered: "
				<< "Phi node is NOT like x = Phi(x + 1, 0) --> return 'main' part of exprOp0\n");
				/* IMPORTANT-TODO: this case is indeed bad - to
				compute a solution to the phi node we normally require more
				intelligent analysis.


				For example, for test 32_MatAdd we have:
				%conv48.us = phi i32 [ %conv.us, %for.cond3.for.inc12_crit_edge.us ],
				[ 0, %for.cond3.preheader.us.preheader ]
				%i.047.us = phi i16 [ %inc13.us, %for.cond3.for.inc12_crit_edge.us ],
				[ 0, %for.cond3.preheader.us.preheader ]

				While the 2nd phi has an easy to find solution (by seeing that
				%inc13.us = add i16 %i.047.us, 1, !dbg !27)
				which means the closed-form solution of Phi is %i.047.us = i,
				for the 1st phi node the situation is VERY complicated.
				But we see that:
				%conv.us = sext i16 %inc13.us to i32, !dbg !28
				which makes the Phi expression of %conv48.us the same as
				for %i.047.us .

				Also for SSD:
				%conv48.us = phi(i.047.us + 1, 0)

				getExpr(): it = %conv327 = phi i32 [ 0, %for.cond2.preheader ], [ %conv3, %for.inc44 ]
				getExpr(): op1 = %conv3 = sext i16 %inc45 to i32, !dbg !41
				getExpr(): updated op1 = %inc45 = add i16 %counter.026, 1, !dbg !40
				Although %conv.327 does NOT appear in the final .ll file, if we look in:
				NEW_v128i16/90_CV/SSD/STDerr_clang_opt_01
				we have a similar case:
				for.cond7.preheader: ; preds = %for.cond2.preheader, %for.inc44
				%conv327 = phi i32 [ 0, %for.cond2.preheader ], [ %conv3, %for.inc44 ]
				%counter.026 = phi i16 [ 0, %for.cond2.preheader ], [ %inc45, %for.inc44 ]
				*/

				/* IMPORTANT-TODO: think if possible to do better like
				having getExpr return a parse tree where it is clear that a
				node is a var or constant in order to avoid using substr. */
				res += exprOp0.substr(1, exprOp0.size() - 6);
				goto GetExpr_end;
				}
				else { //if (strcmp(exprOp0.c_str(), tmp.c_str()) == 0)
				// Case: *it is: x == phi(x + 1, 0);
				// Check getExpr(op0) == str0 + 1;

				LLVM_DEBUG(dbgs() << " ... Entering getExpr() for op1\n");
				std::string exprOp1 = canonicalizeExpression(getExpr((Instruction *)op1));
				LLVM_DEBUG(dbgs() << " exprOp1 = " << exprOp1 << "\n");
				assert(strcmp(exprOp1.c_str(), "0") == 0);

				//assert(op0->getOpcode() == Instruction::ADD);
				/* assert that:
				- op1 is ct 0 and
				- op0 == iGetNameData + 1 (but this normally leads to
				a cyclic dependency)
				i.e., check that (str0 == iGetNameData) && (str1 == ct 0) */
				/* This next condition is VERY important
				* - e.g., for i phi node, for ...: because TODO
				*/
				LLVM_DEBUG(dbgs() <<
				"getExpr(): ...and str0 not empty, --> res = name of it\n");

				/* We don't modify iGetNameData - otherwise we get errors
				(assertion failures, etc) for modifying the LLVM variable names
				*/
				strcpy(strCopy, iGetNameData);

				/* Alex: We might have a newly created temp LLVM var and keep the original
				(source file) variable name
				*/
				rStripStringAfterChar(strCopy, '.');
				res += strCopy;
				goto GetExpr_end;
				}
				}
				#ifdef NOTNOTNOT
				else
				if (strlen(str1) != 0) {
				LLVM_DEBUG(dbgs() << "getExpr(): op1 = " << *op1 << "\n");

				bool goodPhi = false;
				if (((Instruction *)op1)->getOpcode() == Instruction::Add) {
				goodPhi = true;
				}
				else
				/* IMPORTANT-TODO: make it more generic (maybe
				getExpr can itself say if we have a chain of SExt, Trunc, etc
				before an Add) */
				if (((Instruction *)op1)->getOpcode() == Instruction::SExt) {
				op1 = ((Instruction *)op1)->getOperand(0);

				if (((Instruction *)op1)->getOpcode() == Instruction::Add) {
				goodPhi = true;
				LLVM_DEBUG(dbgs() << "getExpr(): updated op1 = "
				<< *op1 << "\n");
				}
				}

				if (goodPhi) {
				LLVM_DEBUG(dbgs() << "getExpr(): ...and str1 not empty...\n");

				// TODO TODO: check for Add to be + 1, etc
				Value *op10 = ((Instruction *)op1)->getOperand(0);
				Value *op11 = ((Instruction *)op1)->getOperand(1);
				const char *op10Name = op10->getName().data();

				LLVM_DEBUG(dbgs() << "getExpr(): op10 = " << * op10 << "...\n");
				LLVM_DEBUG(dbgs() << "getExpr(): op10Name = "
				<< op10Name << "...\n");
				LLVM_DEBUG(dbgs() << "getExpr(): op11 = " << * op11 << "...\n");

				std::string res11 = getExpr((Instruction *)op11);
				LLVM_DEBUG(dbgs() << "getExpr(): ...getExpr(op11) = "
				<< res11 << "...\n");

				//((Instruction *)op1)->getOperand(0)->getName().data()
				if (strcmp(op10->getName().data(), iGetNameData) == 0 &&
				strcmp(res11.c_str(), "1") == 0) {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): ...it->op1->op0 == it --> res = name of it\n");

				/* We have instruction (recurrent equation):
				x = phi(0, x + 1) with solution x . */
				strcpy(strCopy, iGetNameData);
				rStripStringAfterChar(strCopy, '.');
				res += strCopy;
				goto GetExpr_end;
				}
				else {
				/* UNFORTUNATELY, we have an equation like:
				y = phi(0, f(x)).
				It is difficult to give a solution for general f(x).
				BUT for case f(x) = x + 1, if we have also an instruction
				x = phi(0, x + 1), with the same phi-labels as the
				y = phi(...) instruction then it is obvious that y = x.
				Fortunately, this happens quite often, actually.
				*/
				LLVM_DEBUG(dbgs()
				<< "getExpr(): ...it->op1->op0 != it --> ...\n");
				LLVM_DEBUG(dbgs() << "getExpr(): op0 = "
				<< *op0 << "\n");
				LLVM_DEBUG(dbgs() << "getExpr(): op1 = "
				<< *op1 << "\n");

				std::string res0Aux = getExpr((Instruction *)op0);
				assert(res0Aux == "0");

				if ( (((Instruction *)op1)->getOpcode() == Instruction::Add) &&
				(getExpr((Instruction *)op11) == "1") &&
				(((Instruction *)op10)->getOpcode() == Instruction::PHI) ) {
				std::string res10Aux = getExpr((Instruction *)op10);
				LLVM_DEBUG(dbgs() << "res10Aux = " << res10Aux.c_str() << "\n");

				if (cacheExpr[(Instruction *)op10] != INVALID_VALUE_CACHEEXPR) {
				LLVM_DEBUG(dbgs()
				<< "getExpr() - Special PHI case "
				"encountered: y = phi(0, x + 1), where x is also PHI\n");
				/*
				if (strncmp(cacheExpr[op10].c_str(),
				op10->getName().data(),
				cacheExpr[op10].size()) == 0)
				*/

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1PHINode.html
				//PHINode *op10It = (PHINode *)op10;
				PHINode *op10It = llvm::dyn_cast<PHINode>(op10);

				int itNumOpnds = it->getNumOperands();
				LLVM_DEBUG(dbgs() << "getExpr(): itNumOpnds = "
				<< itNumOpnds
				<< "; op10It->getNumOperands() = "
				<< op10It->getNumOperands() << "\n");
				assert(itNumOpnds == 2 &&
				op10It->getNumOperands() == 2);

				int iOpnd;
				for (iOpnd = 0; iOpnd < itNumOpnds; iOpnd++) {
				LLVM_DEBUG(dbgs() << "getExpr(): it->getIncomingBlock("
				<< iOpnd << ") = "
				<< ((PHINode *)it)->getIncomingBlock(iOpnd)
				<< "\n");
				LLVM_DEBUG(dbgs() << "getExpr(): op10It->getIncomingBlock("
				<< iOpnd << ") = "
				<< op10It->getIncomingBlock(iOpnd) << "\n");

				if (((PHINode *)it)->getIncomingBlock(iOpnd) !=
				op10It->getIncomingBlock(iOpnd))
				break;
				}

				LLVM_DEBUG(dbgs() << " getExpr() - ... and "
				"these 2 PHIs are basically equivalent "
				"(except the 'it' node does not have recursive eq "
				"as the other - 'it' has a different name than op10)\n");

				if (iOpnd == itNumOpnds) {
				res += res10Aux;
				goto GetExpr_end;
				}
				}
				}
				}
				}
				res += "!!!! [DO NOT KNOW HOW TO SOLVE]!!!!";
				} // end if (strlen(str1) != 0)
				#endif // NOTNOTNOT
				} // end if ((it->getOpcode() == Instruction::PHI)


				// TODO TODO: NOT sure if it's OK to only choose it->getOperand(0)
				// Normally this makes it a pointer to Value
				if (it->getNumOperands() == 0) {
				Type *itType = ((Value *)it)->getType();

				LLVM_DEBUG(dbgs() << " (getExpr(): it->getType() = "
				<< *itType << " )\n");

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1Type.html
				if (itType->isVectorTy()) {
				int64_t resVal = 0;
				char strAux[MAXLEN_STR];

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1ConstantVector.html
				// Surprisingly NOT working: ConstantVector *ctVec = llvm::dyn_cast<ConstantVector>((Value *)it);

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1ConstantDataVector.html
				ConstantDataVector *ctVec = llvm::dyn_cast<ConstantDataVector>((Value *)it);

				LLVM_DEBUG(dbgs() << "getExpr(): ctVec ="
				<< ctVec << "\n");

				if (ctVec != NULL) {
				Constant *ctSplat = ctVec->getSplatValue();

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1Constant.html
				const APInt ctAPInt = ctSplat->getUniqueInteger();
				// TODO TODO: Use instead Constant::getAggregateElement() - see http://lists.llvm.org/pipermail/llvm-dev/2016-November/106954.html

				// See http://llvm.org/docs/doxygen/html/classllvm_1_1APInt.html
				resVal = ctAPInt.getSExtValue();
				}

				/* This was meant for the %induction vector var;
				but it's NOT good for %(broadcast).splatinsert - but we take
				care of this below ...TODO [SAY WHERE]
				*/
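				// resVal is int64_t, so print it portably as long long and bound the write
				snprintf(strAux, sizeof(strAux), "(int)%lld", (long long)resVal);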
				res += strAux;
				goto GetExpr_end;
				}

				// We print the constant or input variable:
				std::string Result;
				raw_string_ostream OS(Result);
				((Value *)it)->printAsOperand(OS, /* bool PrintType = */ false);
				OS.flush();
				LLVM_DEBUG(dbgs() << " (getExpr(): it->printAsOperand() = "
				<< Result << ")\n");

				// We erase the leading % char if it exists - for name of var
				if (Result.c_str()[0] == '%')
				Result.erase(0, 1);

				/*
				Result.clear();
				((Value *)it)->print(OS);
				OS.flush();
				LLVM_DEBUG(dbgs() << " (getExpr(): it->print() = "
				<< Result << ")\n");
				*/
				/*
				switch (it->getOpcode()) {
				case Instruction::Constant:
				LLVM_DEBUG(dbgs() << " (getExpr(): it is Constant))\n");
				res = "ct!!!!";
				break;
				}
				*/
				if (strncmp(Result.c_str(), STR_UNDEF, strlen(STR_UNDEF)) != 0) {
				/* Note:
				We can also have as parent %broadcast.splatinsert = insertelement <32 x i64> undef, i64 %mul.us, i32 0
				For this case, operand 0 is printed as: "<32 x i64> undef".
				But we avoid to reach this case by specially treating
				a %broadcast.splatinsert node.
				*/
				res += Result;
				}
				goto GetExpr_end;
				} // END of if (it->getNumOperands() == 0)


				bool putParantheses;

				switch (it->getOpcode()) {
				case Instruction::ZExt:
				case Instruction::SExt:
				case Instruction::Trunc:
				case Instruction::ShuffleVector:
				case Instruction::InsertElement:
				case Instruction::PHI:
				case Instruction::ExtractElement:
				//res = "";
				putParantheses = false;
				break;

				case Instruction::GetElementPtr: {
				/*
				putParantheses = false;
				res = "(int *)&";
				*/

				/* IMPORTANT:
				From http://en.cppreference.com/w/c/language/operator_precedence:
				- operator [] (Array subscripting) has higher precedence
				than & (Address-of).
				So we need to put parentheses here
				in case [] follows.
				*/
				putParantheses = true;

				//res = "(int *)&(";

				// By doing so we treat case like (&ls[index])[0] (see SSD benchmark)
				res = "((int *)&";

				GetElementPtrInst *GEPInstr = llvm::dyn_cast<GetElementPtrInst>(it);
				assert(GEPInstr != NULL);
				if (basePtrGetExprIt == NULL)
				basePtrGetExprIt = GEPInstr->getPointerOperand();

				break;
				}
				default:
				putParantheses = true;
				res = "(";
				}
				/*
				if (putParantheses)
				res = "(";
				//
				if (it->getOpcode() == Instruction::GetElementPtr) { }
				*/

				LLVM_DEBUG(dbgs() << "getExpr(): putParantheses = "
				<< putParantheses << "\n");

				if (it->getNumOperands() > 1) {
				LLVM_DEBUG(dbgs() << "getExpr(): it->getOperand(1) = "
				<< *op1 << "; "
				<< "(str1 = "
				<< str1 << ")[END]\n");

				// We prevent pretty-printing constant vectors
				//if (getExprForTripCount == false)
				/* TODO: maybe step.add is not operand 1, but 0 or 2, etc; check that
				op0 is constant */
				if (strncmp(iGetNameData, STR_VEC_IND, strlen(STR_VEC_IND)) == 0 &&
				strncmp(str1, STR_STEP_ADD, strlen(STR_STEP_ADD)) == 0 &&
				strncmp(str0, STR_INDUCTION, strlen(STR_INDUCTION)) != 0) {
				/*
				This prevents further processing of:
				%vec.ind = phi <32 x i64> [ <i64 0, i64 1, ...>, %vector.ph ], [ %step.add, %vector.body ]
				BUT NOT of: %vec.ind = phi <32 x i32> [ %induction, %vector.ph ], [ %step.add, %vector.body ]
				*/
				LLVM_DEBUG(dbgs() << "getExpr(): treating vec.ind = phi ct_vec, step.add case\n");

				#ifdef AGGREGATED_DMA_TRANSFERS // IMPORTANT note: we include this file from the back end also now (not only LoopVectorize.cpp)
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#else
				if (getExprForDMATransfer)
				res = "0";
				else
				res = "indexLLVM_LV";
				#endif
				goto GetExpr_end;
				}

				if (strncmp(iGetNameData, STR_VEC_IND, strlen(STR_VEC_IND)) == 0 &&
				strncmp(str0, STR_INDUCTION, strlen(STR_INDUCTION)) == 0 &&
				strncmp(str1, STR_STEP_ADD, strlen(STR_STEP_ADD)) == 0
				) {
				/*
				This prevents further processing of:
				%vec.ind = phi <32 x i64> [ <i64 0, i64 1, ...>, %vector.ph ], [ %step.add, %vector.body ]
				BUT NOT of: %vec.ind = phi <32 x i32> [ %induction, %vector.ph ], [ %step.add, %vector.body ]
				*/
				LLVM_DEBUG(dbgs()
				<< "getExpr(): treating vec.ind = phi induction, step.add case\n");
				res = getExpr((Instruction *)op0);
				res += " + indexLLVM_LV";
				goto GetExpr_end;
				}

				if (it->getOpcode() == Instruction::PHI) {
				//assert(0 && "We should not get here... since we already treated it");
				LLVM_DEBUG(dbgs()
				<< "getExpr(): it is Phi. (normally should not be here)\n");
				LLVM_DEBUG(dbgs() << " getExpr(): it = " << it << "\n");

				// IMPORTANT-TODO : follow I guess the loopexit value
				/* This is for cases like the one encountered in 50_SpMV, where we
				cycle over temporary created vars:
				%1 = phi i16 [ %2, %for.cond.loopexit ], [ %.pre, %for.body.preheader ]
				%2 = load i16, i16* %arrayidx5, align 2, !dbg !64, !tbaa !46
				%arrayidx5 = getelementptr inbounds i16, i16* %row_ptr, i64 %idxprom4, !dbg !64
				%idxprom4 = sext i32 %add to i64, !dbg !64
				%add = add nsw i32 %i.026, 1, !dbg !63
				%i.026 = phi i32 [ %add, %for.cond.loopexit ], [ 0, %for.body.preheader ]
				*/
				res = getExpr((Instruction *)op0);
				LLVM_DEBUG(dbgs() << "getExpr(): it is Phi, res = " << res << "\n");


				/* Noname like in the case of 50_SpMV testcase:
				%1 = phi(%2, row_ptr[0])
				TODO TODO But I guess I should check iGetName != str0 + 1...
				*/
				// Note: constants like i64 0 don't have name --> str0 is empty
				if (strlen(str0) == 0) {
				LLVM_DEBUG(dbgs() << "getExpr(): it is Phi, str0 is empty.\n");

				// assert getNumOperands() > 1
				std::string res2 = getExpr((Instruction *)op1);
				//res += " phi ";

				LLVM_DEBUG(dbgs() << "getExpr(): res2 = " << res2 << "\n");

				/* Here we compute the solution of phi - a 1st simple and ~bad
				* attempt.
				MEGA MEGA-TODO: compute the
				closed-form solution from these recursive equations.
				*/
				#define STR_TO_LOOK_FOR " + 1"
				std::size_t found = canonicalizeExpression(res).find(STR_TO_LOOK_FOR);
				if (found != std::string::npos) {
				LLVM_DEBUG(dbgs() << "getExpr(): calling res.erase(found, "
				"strlen(STR_TO_LOOK_FOR))\n");
				res.erase(found, strlen(STR_TO_LOOK_FOR));
				}
				/*
				//BUGS: because of modifying the internal char * of a std::string
				// and I guess string::size() needs to be updated
				// also(??)
				const char *resCStr = res.c_str();
				char *resCStrFound = (char *)strstr(resCStr, STR_TO_LOOK_FOR);
				if (resCStrFound != NULL) {
				LLVM_DEBUG(dbgs() << "InstrumentVectorStore(): resCStrFound = "
				<< resCStrFound << "\n");
				// NOT correct - strings do overlap: strcpy(resCStrFound, resCStrFound + 4);
				memmove(resCStrFound, resCStrFound + strlen(STR_TO_LOOK_FOR),
				strlen(resCStrFound + strlen(STR_TO_LOOK_FOR)) + 1);
				}
				*/
				}
				else {
				/* IMPORTANT-TODO: think if it is correct to be empty
				- try it out - note there is also another case treating
				phi nodes above.
				*/
				}

				goto GetExpr_end;
				}

				/*
				// NOT necessary anymore - treat below this case by simply jumping to
				// meaningful values
				if (strncmp(iGetNameData, STR_BROADCAST_SPLATINSERT,
				strlen(STR_BROADCAST_SPLATINSERT)) == 0 ||
				strncmp(iGetNameData, STR_SPLATINSERT,
				strlen(STR_SPLATINSERT)) == 0) {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): treating (broadcast).splat(insert) case\n");

				// op0 should be vector undef
				res = getExpr((Instruction *)op1);
				goto GetExpr_end;
				}
				*/
				if (strncmp(iGetNameData, STR_BROADCAST_SPLAT,
				strlen(STR_BROADCAST_SPLAT)) == 0 ||
				/* // I guess it's not necessary to do this test:
				&& (strncmp(iGetNameData, STR_BROADCAST_SPLATINSERT,
				strlen(STR_BROADCAST_SPLATINSERT)) != 0) */
				(strncmp(iGetNameData, STR_SPLAT,
				strlen(STR_SPLAT)) == 0)
				/* // I guess it's not necessary to do this test:
				&& (strncmp(iGetNameData, STR_SPLATINSERT,
				strlen(STR_SPLATINSERT)) != 0) */
				) {
				LLVM_DEBUG(dbgs() << "getExpr(): treating (broadcast).splat case\n");

				//if (((Instruction *)op0)
				/* This is for the SSD test:
				%broadcast.splat33 = shufflevector <128 x i16> %broadcast.splatinsert32,
				<128 x i16> undef, <128 x i32> zeroinitializer
				where it =
				%broadcast.splatinsert32 = insertelement <128 x i16> undef, i16 %0, i32 0
				and op0 = <128 x i16> undef
				*/
				if (llvm::dyn_cast<Instruction>(op0) == NULL) {
				res = getExpr((Instruction *) op1);
				goto GetExpr_end;

				/*it->getOpcode() == Instruction::InsertElement)
				if (strncmp(iGetNameData, STR_BROADCAST_SPLAT,
				strlen(STR_BROADCAST_SPLAT)) == 0 ||
				*/
				}
				else {
				/// TODO TODO: maybe I should do some checks
				// op1 should be vector undef, op2 should be zeroinitializer
				//res = getExpr((Instruction *)op0);
				res = getExpr((Instruction *) (((Instruction *)op0)->getOperand(1)) );
				goto GetExpr_end;
				}
				}
				}

				// We now pretty print op0;

				if ((strlen(str0) == 0)
				/* ||
				(strncmp(str0, STR_BROADCAST_SPLATINSERT,
				strlen(STR_BROADCAST_SPLATINSERT)) == 0)) { */
				) {
				/* If the name of the variable is empty it means it is an automatically
				* generated name (like %0, etc), NOT a name from the original (C,C++)
				* program. Therefore we look also at the def of this var.
				*/

				/*
				TODO TODO
				- ~BAD: recursively test str0 until we reach a
				variable name that is input to the function??
				*/
				/* TODO TODO (THIS IS MAYBE BADLY DESIGNED - might require more or fewer steps):
				* Coping with type conversions like i32 to i64 (ex:
				* ~/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/201_LoopVectorize/25_GOOD_map/NEW/7_v16i32/3better_opt.ll)
				* in which case we have the following:
				for.body.preheader: ; preds = %entry
				%0 = add i32 %N, -1
				%1 = zext i32 %0 to i64
				%2 = add nuw nsw i64 %1, 1
				%min.iters.check = icmp ult i64 %2, 16
				[...]
				min.iters.checked: ; preds = %for.body.preheader
				%n.vec = and i64 %2, 8589934576
				*/

				/*
				LLVM_DEBUG(dbgs() << "getExpr(): (it->getOperand(0) = "
				<< * (it->getOperand(0)) << ")\n");
				*/
				LLVM_DEBUG(dbgs()
				<< "getExpr(): str0 empty (or so) --> calling getExpr(op0)\n");
				LLVM_DEBUG(dbgs() << " (getExpr(): current it = " << *it << ").\n");

				//strcpy(res, tmp);
				res += getExpr((Instruction *)op0);
				}
				else { // str0 is NOT empty
				/*
				// NOTNOTNOTNONOTNO
				if (getExprForTripCount == false) {
				LLVM_DEBUG(dbgs() << "getExpr(): returning str0 = "
				<< str0 << "\n");
				//strcpy(res, str0);
				// Gives <<warning: cast from type ‘const char *’ to type ‘char *’ casts
				// away qualifiers>>
				// * (char *)strchr(str0, '.') = 0;

				// IMPORTANT-TODO: this
				// transformation I guess is NOT 100% safe, because a named var
				// can be a C var or an auxiliary LLVM var created in the LLVM pass
				// - think how to make it safe

				if (strncmp(str0, STR_VEC_IND, strlen(STR_VEC_IND)) != 0) {
				// We don't modify str0 - otherwise we get errors
				//(assertion failures, etc) for modifying the LLVM variable names
				strcpy(strCopy, str0);

				// Alex: We might have a newly created temp LLVM var and keep the original
				// (source file) variable name
				rStripStringAfterChar(strCopy, '.');
				res += strCopy;

				// Maybe put here operation pretty-print TODO TODO
				}
				else {
				// vec.ind is the widened induction variable
				//res += str0;
				}
				}
				else
				*/
				{ //getExprForTripCount == true and str0 not empty
				/*
				// This SOMETIMES introduces infinite cycles, which can be avoided
				// if we keep track of the instructions already visited
				Example of cycle:
				- these 2 simple instructions:
				%indvars.iv29 = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next30, %for.cond.loopexit ].
				%indvars.iv.next30 = add nuw nsw i64 %indvars.iv29, 1, !dbg !9
				*/

				if ((it->getOpcode() == Instruction::GetElementPtr) &&
				(it->getNumOperands() >= 3)) {
				res += ((Instruction *)op0)->getName().data();
				}
				else {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): str0 not empty --> calling getExpr(op0)\n");
				// This introduces useless parantheses: res += "(";
				res += getExpr((Instruction *)op0);
				// This introduces useless parantheses: res += ")";
				}
				}
				}


				// We now pretty print operation associated to *it;

				/* We generate C code for the operation associated to the it
				LLVM instruction.
				See http://llvm.org/docs/doxygen/html/Instruction_8cpp_source.html
				for all/various possible opcodes - see method
				00194 const char *Instruction::getOpcodeName(unsigned OpCode) . */
				// NOTE: vec.ind is a PHI node
				//if (strncmp(str0, STR_VEC_IND, strlen(STR_VEC_IND)) != 0)
				//{
				//if (!(getExprForTripCount == false && strcmp(str0, "vec.ind") == 0))
				switch (it->getOpcode()) {
				case Instruction::Call: {
				// IMPORTANT-TODO: this works well for the case 31c_dotprod_RaduH, BUT not sure if it's general
				res = "((int *)&(";
				res += iGetNameData;
				res += "))";

				const char *strFuncName;
				strFuncName = dyn_cast<CallInst>(it)->getCalledFunction()->getName().data();
				assert( (strcmp(strFuncName, "malloc") == 0) ||
				(strcmp(strFuncName, "calloc") == 0) );

				// Inspired from http://llvm.org/docs/ProgrammersManual.html#iterating-over-def-use-use-def-chains
				for (Value::user_iterator i = it->user_begin(),
				e = it->user_end();
				i != e; ++i) {
				if (Instruction *inst = dyn_cast<Instruction>(*i)) {
				LLVM_DEBUG(dbgs() << "getExpr(): it is used in instruction: "
				<< *inst << "\n");
				if (BitCastInst *bci = dyn_cast<BitCastInst>(*i)) {
				if (strlen(bci->getName().data()) != 0) {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): it is used in BitCast instruction --> we use "
				"its name instead\n");
				res = "((int *)&(";
				res += bci->getName().data();
				res += "))";
				}
				else {
				if (ranGetAllMetadata == false) {
				LLVM_DEBUG(dbgs() << "getExpr(): Before, varNameMap.size() = "
				<< varNameMap.size() << "\n");
				getAllMetadata(bci->getParent()->getParent());
				LLVM_DEBUG(dbgs() << "getExpr(): varNameMap.size() = "
				<< varNameMap.size() << "\n");
				}


				std::string valueName = getLLVMValueName(bci);

				// Normally the value name is a number when getName() is empty
				LLVM_DEBUG(dbgs() << "getExpr(): bci has empty name\n");
				LLVM_DEBUG(dbgs() << "getExpr(): bci = " << *bci << "\n");
				LLVM_DEBUG(dbgs() << " bci = " << bci << "\n");
				LLVM_DEBUG(dbgs() << " bci->getValueName() = "
				<< bci->getValueName() << "\n");
				LLVM_DEBUG(dbgs() << " bci->getName() = "
				<< bci->getName() << "\n");
				LLVM_DEBUG(dbgs() << "getExpr(): it = " << *it << "\n");
				//
				LLVM_DEBUG(dbgs() << "getExpr(): varNameMap[bci] = "
				<< varNameMap[valueName] << "\n");

				//res = varNameMap[valTypeAndName];
				res = varNameMap[valueName];

				goto GetExpr_end;

				/*
				for (Value::user_iterator i2 = bci->user_begin(),
				e2 = bci->user_end();
				i2 != e2; ++i2) {
				if (Instruction *inst2 = dyn_cast<Instruction>(*i2)) {
				LLVM_DEBUG(dbgs() << "getExpr(): bci is used in instruction: "
				<< *inst2 << "\n");
				if (StoreInst *si = dyn_cast<StoreInst>(*i2)) {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): bci is used in StoreInst instruction "
				"--> we use its name instead\n");
				res = "((int *)&(";
				res += si->getName().data();
				res += "))";
				goto GetExpr_end;
				}
				}
				}
				*/
				}
				}
				}
				else {
				LLVM_DEBUG(dbgs() << "getExpr(): it is used in val: "
				<< *i << "\n");
				}
				}

				goto GetExpr_end;
				}
				case Instruction::Add:
				res += " + ";
				break;
				//case Instruction::FAdd:
				case Instruction::Sub:
				res += " - ";
				break;
				//case Instruction::FSub:
				case Instruction::Mul:
				res += " * ";
				break;
				//case Instruction::FMul:
				case Instruction::UDiv:
				case Instruction::SDiv:
				case Instruction::FDiv:
				res += " / ";
				break;
				case Instruction::URem:
				case Instruction::SRem:
				//case Instruction::FRem:
				res += " % ";
				break;
				case Instruction::Shl:
				res += " << ";
				break;
				case Instruction::LShr:
				res += " >> ";
				break;
				// IMPORTANT-TODO: think better
				case Instruction::AShr:
				/* From https://en.wikipedia.org/wiki/Arithmetic_shift#cite_ref-1 :
				"The >> operator in C and C++ is
				not necessarily an arithmetic shift. Usually it is only an
				arithmetic shift if used with a signed integer type on its
				left-hand side.
				If it is used on an unsigned integer type instead, it will be a
				logical shift."
				*/
				res += " >> ";
				break;
				case Instruction::And:
				res += " & ";
				break;
				case Instruction::Or:
				res += " | ";
				break;
				case Instruction::Xor:
				res += " ^ ";
				break;
				case Instruction::PHI:
				res += " phi ";
				break;
				case Instruction::Load:
				//res += " load ";
				res += "[0]";
				break;
				case Instruction::Store:
				res += " store ";
				break;
				case Instruction::GetElementPtr:
				//res += " getelementptr ";
				/*
				if (it->getNumOperands() < 3) {
				res += " + ";
				}
				*/
				break;
				case Instruction::ZExt:
				case Instruction::SExt:
				//res += " ext "; // NOTE: this is unary operator
				break;
				//case Instruction::FPTrunc:
				case Instruction::Trunc: {
				//res += " trunc ";
				break;
				}
				case Instruction::ICmp:
				case Instruction::FCmp: {
				/* TODO TODO: check type of cmp
				CmpInst *Cmp = dyn_cast<CmpInst>(it);
				Cmp->getPredicate()
				*/
				res += " > ";
				break;
				}
				case Instruction::Select: {
				// TODO TODO: add : and 3rd operand
				res += " ? ";
				break;
				}
				case Instruction::ShuffleVector: {
				//res += " shufflevector ";
				break;
				}
				case Instruction::InsertElement: {
				//res += " insertelement ";
				break;
				}
				case Instruction::ExtractElement: {
				//res += " extractelement ";

				std::string op1Expr = getExpr((Instruction *)op1); //((Instruction *)op1)->getName().data();
				if (op1Expr == "0") {
				LLVM_DEBUG(dbgs()
				<< "getExpr(): Neutralizing ExtractElement, since index is 0\n");
				if (putParantheses)
				res += ")";

				goto GetExpr_end;
				}


				// TODO TODO: check that op0 is vec.ind or sext vec.ind
				res = "((int *)&" + res;
				res += "))"; // One ')' for the '(' added at the beginning of getExpr(),
				// one to close the '(' before '&'
				res += "[";
				res += op1Expr;
				res += "]";

				//basePtr = NULL;

				goto GetExpr_end;
				//break;
				}
				// See e.g. http://llvm.org/docs/doxygen/html/Instructions_8h_source.html#l04703
				case Instruction::PtrToInt:
				case Instruction::IntToPtr: {
				/* This is normally encountered when using the LLVM-SRA library and
				I give SCEVRangeBuilder->getUpperBound(AccessFunction) */
				// We don't do a thing
				break;
				}
				case Instruction::Alloca: {
				//res += "(int *)&(";
				// TODO TODO: this works well for the case 31c_dotprod_RaduH
				res = "((int *)&(";
				res += iGetNameData;
				res += "))";
				goto GetExpr_end;
				//break;
				}
				default:
				/* See llvm.org/docs/doxygen/html/Core_8h_source.html#l00100 and
				http://llvm.org/docs/doxygen/html/Instruction_8cpp_source.html#l00194
				for all supported opcodes.
				In fact, we can have more valid opcodes than these
				See http://llvm.org/docs/doxygen/html/Core_8h_source.html#l00100
				- the enums with typedef enum LLVMOpcode - e.g., LLVMAdd, etc
				seem to be related to values of Instruction::getOpcode().
				I think Instruction::Add == LLVMAdd + InstructionVal (use gdb to see exactly);
				note also that getOpcode() returns getValueID() - InstructionVal.
				http://llvm.org/docs/doxygen/html/Value_8h_source.html
				see enum ValueTy - better see http://llvm.org/test-doxygen/api/Value_8h_source.html,
				since the Value.h source file uses TableGen macros inside.
				*/
				LLVM_DEBUG(dbgs() << "getExpr(): !!!!Special case: it = "
				<< *it
				<< "\n");
				const Constant *C = llvm::dyn_cast<Constant>(it);

				LLVM_DEBUG(dbgs() << "getExpr(): C = "
				<< C
				<< "\n");

				if (C != NULL) {
				LLVM_DEBUG(dbgs() << " getExpr(): It is Constant.\n");
				//res += "Constant-->";

				if (const ConstantInt *CI = llvm::dyn_cast<ConstantInt>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): CI->getValue() = "
				<< CI->getValue()
				<< ".\n");
				}
				/*
				// Maybe useful in the future, but unlikely:
				if (const ConstantDataArray *CA = llvm::dyn_cast<ConstantDataArray>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): It is ConstantDataArray.\n");
				}
				if (const ConstantArray *CA = llvm::dyn_cast<ConstantArray>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): It is ConstantArray.\n");
				}
				*/

				/* Inspired from http://llvm.org/docs/doxygen/html/AsmWriter_8cpp_source.html#l01304,
				method WriteConstantInternal() .
				*/
				if (const ConstantExpr *CE = llvm::dyn_cast<ConstantExpr>(C)) {
				LLVM_DEBUG(dbgs() << " getExpr(): It is ConstantExpr.\n");

				// From http://llvm.org/test-doxygen/api/Constants_8cpp_source.html#l01937
				// res += CE->getOpcodeName();
				switch (CE->getOpcode()) {
				// small-TODO: this code is similar to the one for the switch above - maybe we should reuse code although it will make things more complicated...
				case Instruction::Add:
				res += " + ";
				break;
				case Instruction::Sub:
				res += " - ";
				break;
				case Instruction::Mul:
				res += " * ";
				break;
				case Instruction::UDiv:
				case Instruction::SDiv:
				res += " / ";
				break;
				case Instruction::SRem:
				case Instruction::URem:
				res += " % ";
				break;
				case Instruction::Shl:
				res += " << ";
				break;
				case Instruction::LShr:
				res += " >> ";
				break;
				case Instruction::AShr:
				res += " >> ";
				break;
				case Instruction::ICmp:
				case Instruction::FCmp:
				res += " > ";
				break;
				case Instruction::ZExt:
				case Instruction::SExt:
				//res += " ext "; // NOTE: this is unary operator
				break;
				case Instruction::Trunc:
				//res += " trunc ";
				break;

				case Instruction::PtrToInt:
				case Instruction::IntToPtr: {
				break;
				}
				default:
				res += " [Unsupported_C_CtExpr_operator]";
				break;
				}
				res += " ";
				}
				else {
				res += " [Unsupported_C_operator]";
				res += it->getOpcodeName();
				res += " ";
				}
				break;
				}
				else {
				//res += " [Unsupported_C_operator] [Constant_C_is_NULL]";
				res += iGetNameData;
				}
				} // end switch

				/*
				if (it->getOpcode() == Instruction::PHI) {
				// TODO TODO: check that op0 is associated to predecessor BB
				// different than itself - e.g., preheader, vector.ph, etc

				// This results in incorrect parentheses - missing a few ')'
				goto GetExpr_end;
				}
				*/

				// Pretty print op1:

				/*
				if ((it->getNumOperands() > 1) &&
				(it->getOpcode() != Instruction::PHI)) {
				*/
				if (it->getNumOperands() > 1) {
				//strcat(res, " ");
				res += " ";

				bool specialCase = false;
				bool str1NotEmpty = (strlen(str1) != 0);

				if (str1NotEmpty) {
				LLVM_DEBUG(dbgs() << "getExpr(): str1 NOT empty: str1 = "
				<< str1 << "\n");


				/* IMPORTANT NOTE: some operands have names and are also
				instructions */

				/*
				The following can also introduce cycles:
				- an example
				getExpr(): str0 empty (or so) --> calling getExpr(op0)
				(getExpr(): current it = %vec.ind = phi <32 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15, i64 16, i64 17, i64 18, i64 19, i64 20, i64 21, i64 22, i64 23, i64 24, i64 25,
				i64 26, i64 27, i64 28, i64 29, i64 30, i64 31>, %vector.ph ], [ %step.add, %vector.body ]).
				getExpr(): getExprForTripCount = 1
				getExpr(): it = <32 x i64> <i64 0, i64 1, i64 2, i64 3,
				i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10,
				i64 11, i64 12, i64 13, i64 14, i64 15, i64 16,
				i64 17, i64 18, i64 19, i64 20, i64 21, i64 22,
				i64 23, i64 24, i64 25, i64 26, i64 27, i64 28,
				i64 29, i64 30, i64 31>
				(getExpr(): it->getOpcodeName() = <Invalid operator> )
				(getExpr(): it->getName() = )
				(getExpr(): it->printAsOperand() == <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15, i64 16, i64 17, i64 18, i64 19, i64 20, i64 21, i64 22, i64 23, i64 24, i64 25, i64 26, i64 27, i6
				4 28, i64 29, i64 30, i64 31>)
				getExpr(): calling getExpr(op1).
				getExpr(): it = %vec.ind = phi <32 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15, i64 16, i64 17, i64 18, i64 19, i64 20, i64 21, i64 22, i64 23, i64 24, i64 25, i64 26,
				i64 27, i64 28, i64 29, i64 30, i64 31>, %vector.ph ], [ %step.add, %vector.body ].
				getExpr(): getExprForTripCount = 1
				getExpr(): it = %step.add = add <32 x i64> %vec.ind,
				<i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32, i64 32, i64 32, i64 32, i64 32,
				i64 32, i64 32>, !dbg !38

				- another example:
				getExpr(): it = %row.020.us = phi i64 [ %inc16.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %for.cond1.preheader.us.preheader ]
				(getExpr(): it->getOpcodeName() = phi)
				(getExpr(): it->getName() = row.020.us)
				getExpr(): it->getOperand(1) = i64 0; str1 = [END]
				getExpr(): getExprForTripCount = 1
				getExpr(): it = %inc16.us = add nuw nsw i64 %row.020.us, 1, !dbg !58
				(getExpr(): it->getOpcodeName() = add)
				(getExpr(): it->getName() = inc16.us)
				getExpr(): it->getOperand(1) = i64 1; str1 = [END]
				getExpr(): getExprForTripCount = 1
				getExpr(): it = %row.020.us = phi i64 [ %inc16.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %for.cond1.preheader.us.preheader ]
				*/
				//if (strcmp(str1, "broadcast.splat") == 0)
				if (((Instruction *)op1)->getNumOperands() != 0 &&
				// This prevents pretty-printing constant vectors, etc
				!(strncmp(iGetNameData, STR_VEC_IND, strlen(STR_VEC_IND)) == 0 &&
				strncmp(str1, STR_STEP_ADD, strlen(STR_STEP_ADD)) == 0)
				) {
				//strcat(res, getExpr((Instruction *)op1));

				// We defer pretty printing below - see immediately below
				/*
				LLVM_DEBUG(dbgs() << "getExpr(): calling getExpr(op1).\n");
				LLVM_DEBUG(dbgs() << " getExpr(): it = " << *it << ".\n");
				res += getExpr((Instruction *)op1);
				*/
				}
				else {
				//strcat(res, str1);
				res += str1;
				specialCase = true;
				}
				} // End str1 NOT empty

				if (specialCase == false) {
				LLVM_DEBUG(dbgs() << "getExpr(): specialCase = false, "
				<< "str1NotEmpty = " << str1NotEmpty << ".\n");
				LLVM_DEBUG(dbgs() << " getExpr(): it = " << *it << ".\n");

				if (it->getOpcode() == Instruction::GetElementPtr) {
				int numOpnds = it->getNumOperands();

				//GetIndexOpndFromGEPInst(GEPPtr);
				int startIndex;

				if (llvm::dyn_cast<GlobalValue>(it->getOperand(0)) != NULL) {
				/* We empirically saw that for global arrays, the 1st index
				in GEP is redundant - it has value 0 invariably,
				so we skip it.
				*/
				startIndex = 2;
				}
				else {
				startIndex = 1;
				}

				for (int i = startIndex; i < numOpnds; i++) {
				res += "[";

				Value *op_i = it->getOperand(i);
				res += getExpr((Instruction *)op_i);

				res += "]";
				}
				}
				else {
				//strcat(res, getExpr((Instruction *)op1));
				res += getExpr((Instruction *)op1);
				}
				}
				}


				// IMPORTANT-TODO: also handle PHI, which can have an arbitrary number of operands: if (it->getOpcode() == Instruction::PHI) {
				if (it->getOpcode() == Instruction::Select) {
				res += " : ";

				Value *op2;
				op2 = it->getOperand(2);
				res += getExpr((Instruction *)op2);
				}


				if (putParantheses)
				res += ")";

				GetExpr_end:
				/*
				// Don't really understand why it fails at compile-time at make_pair
				// std::unordered_map<Instruction *, std::string> cacheExpr;
				typedef Instruction *InstructionPtr;
				//cacheExpr.insert(std::make_pair<Instruction *, std::string>(it, res));
				cacheExpr.insert(std::make_pair<InstructionPtr, std::string>(it, res));
				But this does NOT fail:
				// Inspired from example http://www.cplusplus.com/reference/utility/make_pair/
				std::pair<Instruction *, std::string> tmp;
				tmp = std::make_pair(it, res);
				cacheExpr.insert(tmp);
				*/
				/*
				if ((res.size() == 2) && (res.c_str()[0] == '(') &&
				(res.c_str()[1] == ')')) {
				*/
				if (res == "()") {
				// This is redundant so we drop it.
				res.clear();
				}

				LLVM_DEBUG(dbgs() << "getExpr(): Inserting in cacheExpr it = " << it
				<< " (*it = " << *it
				<< ") and res = " << res << "\n");
				cacheExpr[it] = res;
				return res;
				}

				} // end namespace

				#endif // RECOVER_FROM_LLVM_IR
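
The small-TODO in getExpr() notes that the switch over `it->getOpcode()` and the switch over `CE->getOpcode()` map the same opcodes to the same C operator text. A minimal sketch of factoring that mapping into one helper is shown here. It is not part of the patch: the `Opcode` enum is a hypothetical stand-in for the `llvm::Instruction` opcode values, and `opcodeToCOperator` is an illustrative name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the llvm::Instruction opcodes that both switches
// in getExpr() handle. A real helper would take the unsigned value returned
// by getOpcode() instead.
enum class Opcode {
  Add, Sub, Mul, UDiv, SDiv, URem, SRem, Shl, LShr, AShr,
  ICmp, FCmp, ZExt, SExt, Trunc, PtrToInt, IntToPtr, Unsupported
};

// Writes the C operator text for `op` into `out` and returns true.
// Casts map to the empty string, since the source prints nothing for them.
// Returns false and leaves `out` untouched for opcodes outside the shared set.
bool opcodeToCOperator(Opcode op, std::string &out) {
  switch (op) {
  case Opcode::Add:  out = " + ";  return true;
  case Opcode::Sub:  out = " - ";  return true;
  case Opcode::Mul:  out = " * ";  return true;
  case Opcode::UDiv:
  case Opcode::SDiv: out = " / ";  return true;
  case Opcode::URem:
  case Opcode::SRem: out = " % ";  return true;
  case Opcode::Shl:  out = " << "; return true;
  case Opcode::LShr:
  case Opcode::AShr: out = " >> "; return true; // same text for both, as in the source
  case Opcode::ICmp:
  case Opcode::FCmp: out = " > ";  return true; // predicate not inspected, as in the source
  case Opcode::ZExt:
  case Opcode::SExt:
  case Opcode::Trunc:
  case Opcode::PtrToInt:
  case Opcode::IntToPtr: out = ""; return true;
  case Opcode::Unsupported: return false;
  }
  return false;
}
```

Both call sites could then append `out` to `res` and fall back to their own "unsupported" marker when the helper returns false, which keeps the two different fallback strings intact.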

lib/Target/Connex/Select_ADDf16_OpincaaCodeGen.h

This file has a very large number of changes (3,633 lines).

lib/Target/Connex/Select_ADDi32_OpincaaCodeGen.h

				//===-- Select_ADDi32_OpincaaCodeGen.h - Connex specific TTI ---------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel add.i32.
				// You should put this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 15.
				//
				//===----------------------------------------------------------------------===//

				// From /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/opincaa_standalone_apps/Emulate_i32/ADD_i32_manual/DumpISel_OpincaaCodeGen_old05_011.cpp

				// R27 is REG_SRC1. It is represented by result of nodeOpSrcCast1.
				Inline comment (arsenm): There shouldn't be any generated code. Generated selection should come from TableGen, with some manual code in ISelDAGToDAG.
				// R28 is REG_SRC2. It is represented by result of nodeOpSrcCast2.







				SDValue ct0 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;
				// Instr #0
				SDNode *vload0 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast2, 1)
				);

				SDValue ct1 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;
				// Instr #1
				SDNode *vload1 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				// R29 = R27 + R28;
				// Instr #2
				SDNode *add0 = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// R23 = ADDC(R31, R31);
				// Instr #3
				SDNode *addc0 = CurDAG->getMachineNode(
				Connex::ADDCV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(vload0, 0),
				SDValue(add0, 0)
				// no need for glue or chain input (since it normally consumes the output of the predecessor)
				);

				// R26 = INDEX;
				// Instr #4
				SDNode *ldix0 = CurDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(addc0, 1)
				);

				// R25 = R26 & R30;
				// Instr #5
				SDNode *and0 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R24 = R25 == R30;
				// Instr #6
				SDNode *eq0 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct2 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #7
				SDNode *nop0 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;
				// Instr #8
				SDNode *whereeq0 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct3 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R23 = 0;
				// Instr #9
				SDNode *vload2 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				SDValue(addc0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 1)
				);

				// END_WHERE;
				// Instr #10
				SDNode *endwhere0 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// CELL_SHR(R23, R30);
				// Instr #11
				SDNode *cellshr0 = CurDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(vload2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				SDValue ct4 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #12
				SDNode *nop1 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R21 = SHIFT_REG;
				// Instr #13
				SDNode *ldsh0 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R22 = R21 + R29;
				// Instr #14
				SDNode *resH /*add1*/ = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(add0, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				SDNode *lastNode = resH;
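
For readers following the 15-instruction sequence above, here is a scalar model of what it appears to compute: a 32-bit add carried out on pairs of 16-bit lanes. This is a sketch under assumptions that the patch does not state explicitly. It assumes the low half of each i32 sits in an even lane and the high half in the next odd lane, and that `CELL_SHR` by 1 delivers lane `i - 1`'s value to lane `i`. The function name `addI32OnI16Lanes` is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the lane-wise sequence. Each comment names the register
// assignment from the generated code that the step is meant to mirror.
std::vector<uint16_t> addI32OnI16Lanes(const std::vector<uint16_t> &a,
                                       const std::vector<uint16_t> &b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  const std::size_t n = a.size();
  std::vector<uint16_t> sum(n), carry(n), shifted(n, 0), res(n);

  for (std::size_t i = 0; i < n; ++i) {
    const uint32_t wide = uint32_t(a[i]) + uint32_t(b[i]);
    sum[i] = uint16_t(wide);         // R29 = R27 + R28
    carry[i] = uint16_t(wide >> 16); // R23 = ADDC(R31, R31), i.e. 0 + 0 + carry
  }

  // WHERE (INDEX & 1) == 1: R23 = 0. A carry out of a high half is dropped.
  for (std::size_t i = 1; i < n; i += 2)
    carry[i] = 0;

  // CELL_SHR(R23, R30); R21 = SHIFT_REG. Assumed direction: lane i - 1 to lane i.
  for (std::size_t i = 1; i < n; ++i)
    shifted[i] = carry[i - 1];

  for (std::size_t i = 0; i < n; ++i)
    res[i] = uint16_t(sum[i] + shifted[i]); // R22 = R21 + R29

  return res;
}
```

With `a = {0xFFFF, 0x0001}` and `b = {0x0001, 0x0000}` (that is, 0x0001FFFF + 1), the low lane overflows to 0 and its carry lands in the high lane, giving `{0x0000, 0x0002}`.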

lib/Target/Connex/Select_LTf16_OpincaaCodeGen.h

				//===-- Select_LTf16_OpincaaCodeGen.h ---------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel lt.f16.
				// You should put this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 53.
				//
				//===----------------------------------------------------------------------===//

				// From /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/opincaa_standalone_apps/Emulate_f16/LT_f16_manual/DumpISel_OpincaaCodeGen_old05_050.cpp

				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from Opincaa lib from kernel: lt.f16.
				// It is important to put this code in the Select() method of the
				// SelectionDAGISel class of your back end, after the ISelLowering pass,
				// which contains the DAG Combiner, because the DAG Combiner can remove
				// the getCopyToReg() we create, which can lead to the following error:
				// <<Register use before def!>> assertion failed.
				// Number of instructions generated: 53.




				SDValue ct0 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;
				// Instr #0
				SDNode *vload0 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast1, 1)
				);

				SDValue ct1 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;
				// Instr #1
				SDNode *vload1 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct2 = CurDAG->getConstant(5, DL, MVT::i16, true, false);
				// R29 = 5;
				// Instr #2
				SDNode *vload2 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				SDValue ct3 = CurDAG->getConstant(1023, DL, MVT::i16, true, false);
				// R13 = 1023;
				// Instr #3
				SDNode *vload3 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				SDValue ct4 = CurDAG->getConstant(31744, DL, MVT::i16, true, false);
				// R12 = 31744;
				// Instr #4
				SDNode *vload4 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(vload3, 1)
				);

				SDValue ct5 = CurDAG->getConstant(-32768, DL, MVT::i16, true, false);
				// R11 = -32768;
				// Instr #5
				SDNode *vload5 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(vload4, 1)
				);

				SDValue ct6 = CurDAG->getConstant(1024, DL, MVT::i16, true, false);
				// R10 = 1024;
				// Instr #6
				SDNode *vload6 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct6,
				// glue (or chain) input edge
				SDValue(vload5, 1)
				);

				SDValue ct7 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R19 = 0;
				// Instr #7
				SDNode *vload7 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct7,
				// glue (or chain) input edge
				SDValue(vload6, 1)
				);

				SDValue ct8 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R14 = 1;
				// Instr #8
				SDNode *vload8 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct8,
				// glue (or chain) input edge
				SDValue(vload7, 1)
				);

				// R25 = R27 & R12;
				// Instr #9
				SDNode *and0 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload4, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(vload8, 1)
				);

				// R26 = R27 & R13;
				// Instr #10
				SDNode *and1 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				// R21 = R23 & R12;
				// Instr #11
				SDNode *and2 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload4, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(and1, 1)
				);

				// R22 = R23 & R13;
				// Instr #12
				SDNode *and3 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(and2, 1)
				);

				// R17 = POPCNT(R25);
				// Instr #13
				SDNode *popcnt0 = CurDAG->getMachineNode(
				Connex::POPCNT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				// glue (or chain) input edge
				SDValue(and3, 1)
				);

				// R17 = R17 == R29;
				// Instr #14
				SDNode *eq0 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(popcnt0, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(popcnt0, 1)
				);

				// R18 = R26 == R31;
				// Instr #15
				SDNode *eq1 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and1, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// R18 = R30 - R18;
				// Instr #16
				SDNode *sub0 = CurDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(eq1, 0),
				// glue (or chain) input edge
				SDValue(eq1, 1)
				);

				// R18 = R18 & R17;
				// Instr #17
				SDNode *and4 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(sub0, 0),
				// glue (or chain) input edge
				SDValue(sub0, 1)
				);

				// R18 = R18 == R30;
				// Instr #18
				SDNode *eq2 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and4, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and4, 1)
				);

				SDValue ct9 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #19
				SDNode *nop0 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct9,
				// glue (or chain) input edge
				SDValue(eq2, 1)
				);

				// WHERE_EQ;
				// Instr #20
				SDNode *whereeq0 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq2, 0),
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct10 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R14 = 0;
				// Instr #21
				SDNode *vload9 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct10,
				SDValue(vload8, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 1)
				);

				// END_WHERE;
				// Instr #22
				SDNode *endwhere0 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload9, 1)
				);

				// R15 = POPCNT(R21);
				// Instr #23
				SDNode *popcnt1 = CurDAG->getMachineNode(
				Connex::POPCNT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and2, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// R15 = R15 == R29;
				// Instr #24
				SDNode *eq3 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(popcnt1, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(popcnt1, 1)
				);

				// R16 = R22 == R31;
				// Instr #25
				SDNode *eq4 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and3, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(eq3, 1)
				);

				// R16 = R30 - R16;
				// Instr #26
				SDNode *sub1 = CurDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(eq4, 0),
				// glue (or chain) input edge
				SDValue(eq4, 1)
				);

				// R16 = R16 & R15;
				// Instr #27
				SDNode *and5 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq3, 0),
				SDValue(sub1, 0),
				// glue (or chain) input edge
				SDValue(sub1, 1)
				);

				// R16 = R16 == R30;
				// Instr #28
				SDNode *eq5 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and5, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and5, 1)
				);

				SDValue ct11 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #29
				SDNode *nop1 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct11,
				// glue (or chain) input edge
				SDValue(eq5, 1)
				);

				// WHERE_EQ;
				// Instr #30
				SDNode *whereeq1 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq5, 0),
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				SDValue ct12 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R14 = 0;
				// Instr #31
				SDNode *vload10 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct12,
				SDValue(vload9, 0),
				// glue (or chain) input edge
				SDValue(whereeq1, 1)
				);

				// END_WHERE;
				// Instr #32
				SDNode *endwhere1 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload10, 1)
				);

				// R16 = R27 == R23;
				// Instr #33
				SDNode *eq6 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(endwhere1, 0)
				);

				// R14 = R14 ^ R16;
				// Instr #34
				SDNode *xor0 = CurDAG->getMachineNode(
				Connex::XORV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq6, 0),
				SDValue(vload10, 0),
				// glue (or chain) input edge
				SDValue(eq6, 1)
				);

				// R16 = R27 & R23;
				// Instr #35
				SDNode *and6 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(xor0, 1)
				);

				// R16 = R16 & R11;
				// Instr #36
				SDNode *and7 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload5, 0),
				SDValue(and6, 0),
				// glue (or chain) input edge
				SDValue(and6, 1)
				);

				// R16 = R16 == R11;
				// Instr #37
				SDNode *eq7 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and7, 0),
				SDValue(vload5, 0),
				// glue (or chain) input edge
				SDValue(and7, 1)
				);

				// R16 = R16 & R14;
				// Instr #38
				SDNode *and8 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(xor0, 0),
				SDValue(eq7, 0),
				// glue (or chain) input edge
				SDValue(eq7, 1)
				);

				// R16 = R16 == R30;
				// Instr #39
				SDNode *eq8 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and8, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and8, 1)
				);

				SDValue ct13 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #40
				SDNode *nop2 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct13,
				// glue (or chain) input edge
				SDValue(eq8, 1)
				);

				// WHERE_EQ;
				// Instr #41
				SDNode *whereeq2 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq8, 0),
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R27 = R27 ^ R11;
				// Instr #42
				SDNode *xor1 = CurDAG->getMachineNode(
				Connex::XORV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload5, 0),
				SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(whereeq2, 1)
				);

				// R23 = R23 ^ R11;
				// Instr #43
				SDNode *xor2 = CurDAG->getMachineNode(
				Connex::XORV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload5, 0),
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(xor1, 1)
				);

				SDValue ct14 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R19 = 1;
				// Instr #44
				SDNode *vload11 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct14,
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(xor2, 1)
				);

				// END_WHERE;
				// Instr #45
				SDNode *endwhere2 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload11, 1)
				);

				// R16 = R27 < R23;
				// Instr #46
				SDNode *lt0 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(xor1, 0),
				SDValue(xor2, 0),
				// glue (or chain) input edge
				SDValue(endwhere2, 0)
				);

				// R16 = R16 & R14;
				// Instr #47
				SDNode *and9 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(xor0, 0),
				SDValue(lt0, 0),
				// glue (or chain) input edge
				SDValue(lt0, 1)
				);

				// R16 = R16 == R30;
				// Instr #48
				SDNode *eq9 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and9, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and9, 1)
				);

				SDValue ct15 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #49
				SDNode *nop3 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct15,
				// glue (or chain) input edge
				SDValue(eq9, 1)
				);

				// WHERE_EQ;
				// Instr #50
				SDNode *whereeq3 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq9, 0),
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R19 = R19 ^ R30;
				// Instr #51
				SDNode *resF16 /*xor3*/ = CurDAG->getMachineNode(
				Connex::XORV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(vload11, 0),
				SDValue(vload11, 0),
				// glue (or chain) input edge
				SDValue(whereeq3, 1)
				);

				// END_WHERE;
				// Instr #52
				SDNode *lastNode /*endwhere3*/ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// Alex: MVT::Glue,
				MVT::Other,
				// glue (or chain) input edge
				// Alex: SDValue(xor3, 1)
				SDValue(resF16, 1)
				);
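
Note on the WHERE_EQ ... END_WHERE blocks above: the `_SPECIAL` opcodes take one extra value operand, which in these listings is the previous value of the destination register (for example `nodeOpSrcCast1` appears twice for `R27 = R27 ^ R11`). A plausible reading is that lanes not selected by the preceding `EQ_H` keep that previous value. The scalar model below is only a sketch under that assumption; `xor_where` and `demo` are hypothetical names, not part of the patch or of the Connex back end.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a predicated XOR inside WHERE_EQ ... END_WHERE.
// flag[i] != 0 means lane i is selected; unselected lanes keep old[i].
template <std::size_t N>
std::array<uint16_t, N> xor_where(const std::array<uint16_t, N> &flag,
                                  const std::array<uint16_t, N> &a,
                                  const std::array<uint16_t, N> &b,
                                  const std::array<uint16_t, N> &old) {
    std::array<uint16_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = flag[i] ? static_cast<uint16_t>(a[i] ^ b[i]) : old[i];
    }
    return out;
}

// Flip the sign bit (0x8000, i.e. -32768 as i16) only on selected lanes.
inline void demo() {
    const std::array<uint16_t, 4> flag{1, 0, 1, 0};
    const std::array<uint16_t, 4> sign{0x8000, 0x8000, 0x8000, 0x8000};
    const std::array<uint16_t, 4> val{1, 2, 3, 4};
    const auto r = xor_where(flag, sign, val, val);
    assert(r[0] == 0x8001 && r[1] == 2 && r[2] == 0x8003 && r[3] == 4);
}
```

Passing the old value as an explicit operand keeps the data dependence visible to the DAG, which is presumably why the generator emits it.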

lib/Target/Connex/Select_MULTf16_OpincaaCodeGen.h

This file has a very large number of changes (3,266 lines).

lib/Target/Connex/Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h

				//===-- Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h -*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel mul.f16.
				// You should include this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 27.
				//
				//===----------------------------------------------------------------------===//


				// Copied from /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/opincaa_standalone_apps/Emulate_i32/MULTi32_manual_Complemented_radix_216_representation/DumpISel_OpincaaCodeGen_old27_220.cpp


				// R27 is REG_SRC1. It is represented by result of nodeOpSrcCast1.
				// R28 is REG_SRC2. It is represented by result of nodeOpSrcCast2.






				SDValue ct0 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;
				// Instr #0
				SDNode *vload0 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast2, 1)
				);

				SDValue ct1 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;
				// Instr #1
				SDNode *vload1 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				// MULT_U(R28, R27);
				// Instr #2
				SDNode *mult_u0 = CurDAG->getMachineNode(
				Connex::MULT_U_H,
				DL,
				MVT::Other,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// R26 = MULT_LOW();
				// Instr #3
				SDNode *multlo0 = CurDAG->getMachineNode(
				Connex::MULTLO_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(mult_u0, 0)
				);

				// R25 = MULT_HIGH();
				// Instr #4
				SDNode *multhi0 = CurDAG->getMachineNode(
				Connex::MULTHI_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(multlo0, 1)
				);

				// CELL_SHR(R27, R30);
				// Instr #5
				SDNode *cellshr0 = CurDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Other,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(multhi0, 1)
				);

				SDValue ct2 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #6
				SDNode *nop0 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Other,
				ct2,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R24 = SHIFT_REG;
				// Instr #7
				SDNode *ldsh0 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				// MULT_U(R24, R28);
				// Instr #8
				SDNode *mult_u1 = CurDAG->getMachineNode(
				Connex::MULT_U_H,
				DL,
				MVT::Other,
				SDValue(ldsh0, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				// R24 = MULT_LOW();
				// Instr #9
				SDNode *multlo1 = CurDAG->getMachineNode(
				Connex::MULTLO_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(mult_u1, 0)
				);

				// CELL_SHR(R28, R30);
				// Instr #10
				SDNode *cellshr1 = CurDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Other,
				SDValue(nodeOpSrcCast2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(multlo1, 1)
				);

				SDValue ct3 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #11
				SDNode *nop1 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Other,
				ct3,
				// glue (or chain) input edge
				SDValue(cellshr1, 0)
				);

				// R23 = SHIFT_REG;
				// Instr #12
				SDNode *ldsh1 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// MULT_U(R23, R27);
				// Instr #13
				SDNode *mult_u2 = CurDAG->getMachineNode(
				Connex::MULT_U_H,
				DL,
				MVT::Other,
				SDValue(ldsh1, 0),
				SDValue(nodeOpSrcCast1, 0),
				// glue (or chain) input edge
				SDValue(ldsh1, 1)
				);

				// R23 = MULT_LOW();
				// Instr #14
				SDNode *multlo2 = CurDAG->getMachineNode(
				Connex::MULTLO_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(mult_u2, 0)
				);

				// CELL_SHR(R25, R30);
				// Instr #15
				SDNode *cellshr2 = CurDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Other,
				SDValue(multhi0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(multlo2, 1)
				);

				SDValue ct4 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #16
				SDNode *nop2 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Other,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr2, 0)
				);

				// R21 = SHIFT_REG;
				// Instr #17
				SDNode *ldsh2 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R14 = INDEX;
				// Instr #18
				SDNode *ldix0 = CurDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(ldsh2, 1)
				);

				// R13 = R14 & R30;
				// Instr #19
				SDNode *and0 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R12 = R13 == R30;
				// Instr #20
				SDNode *eq0 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct5 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #21
				SDNode *nop3 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Other,
				ct5,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;
				// Instr #22
				SDNode *whereeq0 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				SDValue(eq0, 0),
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R26 = R21 | R21;
				// Instr #23
				SDNode *or0 = CurDAG->getMachineNode(
				Connex::ORV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				SDValue(ldsh2, 0),
				SDValue(ldsh2, 0),
				SDValue(multlo0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 1)
				);

				// R26 = R24 + R26;
				// Instr #24
				SDNode *add0 = CurDAG->getMachineNode(
				Connex::ADDV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				SDValue(or0, 0),
				SDValue(multlo1, 0),
				SDValue(or0, 0),
				// glue (or chain) input edge
				SDValue(or0, 1)
				);

				// R26 = R23 + R26;
				// Instr #25
				SDNode *resH /*add1*/ = CurDAG->getMachineNode(
				Connex::ADDV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Other,
				SDValue(add0, 0),
				SDValue(multlo2, 0),
				SDValue(add0, 0),
				// glue (or chain) input edge
				SDValue(add0, 1)
				);

				// END_WHERE;
				// Instr #26
				SDNode *lastNode /*endwhere0*/ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(resH /*add1*/, 1)
				);


				//SDNode *lastNode = resF16;
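
Note on the kernel above: it appears to build a 32-bit product out of 16-bit pieces. `MULT_LOW()`/`MULT_HIGH()` give the two halves of the 16x16 product, `CELL_SHR` by one cell fetches the neighbouring half, and the final additions run only on cells where `INDEX & 1` equals 1. The sketch below shows the underlying arithmetic in plain C++. It is not the generated code, and the mapping of low and high halves to even and odd cells is an assumption; `mul32_from_halves` and `self_check` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: low 32 bits of a * b computed from 16-bit halves only.
inline uint32_t mul32_from_halves(uint32_t a, uint32_t b) {
    const uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    const uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    const uint32_t p = a_lo * b_lo;        // 16x16 product, fits in 32 bits
    const uint32_t res_lo = p & 0xFFFFu;   // cf. MULT_LOW()
    const uint32_t carry = p >> 16;        // cf. MULT_HIGH()

    // Only the low 16 bits of each cross term reach the upper result half.
    const uint32_t cross1 = (a_lo * b_hi) & 0xFFFFu;
    const uint32_t cross2 = (a_hi * b_lo) & 0xFFFFu;

    const uint32_t res_hi = (carry + cross1 + cross2) & 0xFFFFu;
    return (res_hi << 16) | res_lo;
}

// Compare against native wrap-around multiplication.
inline void self_check() {
    assert(mul32_from_halves(0xFFFFFFFFu, 0xFFFFFFFFu) == 1u);
    assert(mul32_from_halves(0x12345678u, 0x9ABCDEF0u) ==
           0x12345678u * 0x9ABCDEF0u);
}
```

The `a_hi * b_hi` term is dropped on purpose: it only affects bits 32 and above, which a 32-bit result discards.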

lib/Target/Connex/Select_REDf16_OpincaaCodeGen.h

				//===-- Select_REDf16_OpincaaCodeGen.h --------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel red.f16.
				// You should include this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 122.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/Tests/opincaa_standalone_apps/Emulate_f16/REDf16_manual/DumpISel_OpincaaCodeGen.cpp



				SDValue ct0 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R31 = 1;
				// Instr #0
				SDNode *vload0 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast, 1)
				);

				SDValue ct1 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R30 = 0;
				// Instr #1
				SDNode *vload1 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct2 = CurDAG->getConstant(31, DL, MVT::i16, true, false);
				// R29 = 31;
				// Instr #2
				SDNode *vload2 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				SDValue ct3 = CurDAG->getConstant(1023, DL, MVT::i16, true, false);
				// R13 = 1023;
				// Instr #3
				SDNode *vload3 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				SDValue ct4 = CurDAG->getConstant(31744, DL, MVT::i16, true, false);
				// R12 = 31744;
				// Instr #4
				SDNode *vload4 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(vload3, 1)
				);

				SDValue ct5 = CurDAG->getConstant(-32768, DL, MVT::i16, true, false);
				// R11 = -32768;
				// Instr #5
				SDNode *vload5 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(vload4, 1)
				);

				SDValue ct6 = CurDAG->getConstant(1024, DL, MVT::i16, true, false);
				// R10 = 1024;
				// Instr #6
				SDNode *vload6 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct6,
				// glue (or chain) input edge
				SDValue(vload5, 1)
				);

				// R25 = R28 & R11;
				// Instr #7
				SDNode *and0 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload5, 0),
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(vload6, 1)
				);

				// R26 = R28 & R12;
				// Instr #8
				SDNode *and1 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload4, 0),
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct7 = CurDAG->getConstant(10, DL, MVT::i16, true, false);
				// R26 = R26 >> 10;
				// Instr #9
				SDNode *ishr0 = CurDAG->getMachineNode(
				Connex::ISHRV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and1, 0),
				ct7,
				// glue (or chain) input edge
				SDValue(and1, 1)
				);

				// R27 = R28 & R13;
				// Instr #10
				SDNode *and2 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(ishr0, 1)
				);

				// R17 = R30 < R27;
				// Instr #11
				SDNode *lt0 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(and2, 0),
				// glue (or chain) input edge
				SDValue(and2, 1)
				);

				// R16 = R26 == R30;
				// Instr #12
				SDNode *eq0 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(ishr0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(lt0, 1)
				);

				// R09 = R16 & R17;
				// Instr #13
				SDNode *and3 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt0, 0),
				SDValue(eq0, 0),
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// R09 = R09 == R31;
				// Instr #14
				SDNode *eq1 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and3, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and3, 1)
				);

				SDValue ct8 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #15
				SDNode *nop0 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct8,
				// glue (or chain) input edge
				SDValue(eq1, 1)
				);

				// WHERE_EQ;
				// Instr #16
				SDNode *whereeq0 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq1, 0),
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct9 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R26 = 1;
				// Instr #17
				SDNode *vload7 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct9,
				SDValue(ishr0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 1)
				);

				// END_WHERE;
				// Instr #18
				SDNode *endwhere0 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload7, 1)
				);

				// R17 = R26 == R29;
				// Instr #19
				SDNode *eq2 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// R17 = R17 | R16;
				// Instr #20
				SDNode *or0 = CurDAG->getMachineNode(
				Connex::ORV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(eq2, 0),
				// glue (or chain) input edge
				SDValue(eq2, 1)
				);

				// R17 = R17 == R30;
				// Instr #21
				SDNode *eq3 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(or0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(or0, 1)
				);

				SDValue ct10 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #22
				SDNode *nop1 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct10,
				// glue (or chain) input edge
				SDValue(eq3, 1)
				);

				// WHERE_EQ;
				// Instr #23
				SDNode *whereeq1 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq3, 0),
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R27 = R27 | R10;
				// Instr #24
				SDNode *or1 = CurDAG->getMachineNode(
				Connex::ORV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload6, 0),
				SDValue(and2, 0),
				SDValue(and2, 0),
				// glue (or chain) input edge
				SDValue(whereeq1, 1)
				);

				// END_WHERE;
				// Instr #25
				SDNode *endwhere1 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(or1, 1)
				);

				// R18 = R26 == R29;
				// Instr #26
				SDNode *eq4 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(endwhere1, 0)
				);

				// R17 = R27 == R30;
				// Instr #27
				SDNode *eq5 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(or1, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(eq4, 1)
				);

				// R09 = R31 - R17;
				// Instr #28
				SDNode *sub0 = CurDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(eq5, 0),
				// glue (or chain) input edge
				SDValue(eq5, 1)
				);

				// R09 = R09 & R18;
				// Instr #29
				SDNode *and4 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq4, 0),
				SDValue(sub0, 0),
				// glue (or chain) input edge
				SDValue(sub0, 1)
				);

				// REDUCE(R09);
				// Instr #30
				SDNode *sumRed0 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(and4, 0),
				// glue (or chain) input edge
				SDValue(and4, 1)
				);

				// R24 = R18 & R17;
				// Instr #31
				SDNode *and5 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq5, 0),
				SDValue(eq4, 0),
				// glue (or chain) input edge
				SDValue(sumRed0, 0)
				);

				// R09 = R25 == R30;
				// Instr #32
				SDNode *eq6 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and5, 1)
				);

				// R16 = R24 & R09;
				// Instr #33
				SDNode *and6 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq6, 0),
				SDValue(and5, 0),
				// glue (or chain) input edge
				SDValue(eq6, 1)
				);

				// REDUCE(R16);
				// Instr #34
				SDNode *sumRed1 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(and6, 0),
				// glue (or chain) input edge
				SDValue(and6, 1)
				);

				// R09 = R31 - R09;
				// Instr #35
				SDNode *sub1 = CurDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(eq6, 0),
				// glue (or chain) input edge
				SDValue(sumRed1, 0)
				);

				// R16 = R24 & R09;
				// Instr #36
				SDNode *and7 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(sub1, 0),
				SDValue(and5, 0),
				// glue (or chain) input edge
				SDValue(sub1, 1)
				);

				// REDUCE(R16);
				// Instr #37
				SDNode *sumRed2 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(and7, 0),
				// glue (or chain) input edge
				SDValue(and7, 1)
				);

				// R09 = R25 == R11;
				// Instr #38
				SDNode *eq7 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload5, 0),
				// glue (or chain) input edge
				SDValue(sumRed2, 0)
				);

				SDValue ct11 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #39
				SDNode *nop2 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct11,
				// glue (or chain) input edge
				SDValue(eq7, 1)
				);

				// WHERE_EQ;
				// Instr #40
				SDNode *whereeq2 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq7, 0),
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R27 = R30 - R27;
				// Instr #41
				SDNode *sub2 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(or1, 0),
				SDValue(or1, 0),
				// glue (or chain) input edge
				SDValue(whereeq2, 1)
				);

				// END_WHERE;
				// Instr #42
				SDNode *endwhere2 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sub2, 1)
				);

				SDValue ct12 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R02 = R26 << 0;
				// Instr #43
				SDNode *ishl0 = CurDAG->getMachineNode(
				Connex::ISHLV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				ct12,
				// glue (or chain) input edge
				SDValue(endwhere2, 0)
				);

				SDValue ct13 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R02 = 0;
				// Instr #44
				SDNode *vload8 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct13,
				// glue (or chain) input edge
				SDValue(ishl0, 1)
				);

				SDValue ct14 = CurDAG->getConstant(6, DL, MVT::i16, true, false);
				// R24 = 6;
				// Instr #45
				SDNode *vload9 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct14,
				// glue (or chain) input edge
				SDValue(vload8, 1)
				);

				// R19 = R26 < R24;
				// Instr #46
				SDNode *lt1 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload9, 0),
				// glue (or chain) input edge
				SDValue(vload9, 1)
				);

				// R17 = R02 < R26;
				// Instr #47
				SDNode *lt2 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload8, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt1, 1)
				);

				// R02 = R31 + R02;
				// Instr #48
				SDNode *add0 = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload8, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt2, 1)
				);

				// R09 = R19 & R17;
				// Instr #49
				SDNode *and8 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt2, 0),
				SDValue(lt1, 0),
				// glue (or chain) input edge
				SDValue(add0, 1)
				);

				// R09 = R09 == R31;
				// Instr #50
				SDNode *eq8 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and8, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and8, 1)
				);

				SDValue ct15 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #51
				SDNode *nop3 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct15,
				// glue (or chain) input edge
				SDValue(eq8, 1)
				);

				// WHERE_EQ;
				// Instr #52
				SDNode *whereeq3 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq8, 0),
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R19 = R26 - R02;
				// Instr #53
				SDNode *sub3 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(add0, 0),
				SDValue(lt1, 0),
				// glue (or chain) input edge
				SDValue(whereeq3, 1)
				);

				// R27 = R27 << R19;
				// Instr #54
				SDNode *shl0 = CurDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(sub2, 0),
				SDValue(sub3, 0),
				SDValue(sub2, 0),
				// glue (or chain) input edge
				SDValue(sub3, 1)
				);

				// REDUCE(R27);
				// Instr #55
				SDNode *sumRed3 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl0, 0),
				// glue (or chain) input edge
				SDValue(shl0, 1)
				);

				// END_WHERE;
				// Instr #56
				SDNode *endwhere3 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed3, 0)
				);

				SDValue ct16 = CurDAG->getConstant(5, DL, MVT::i16, true, false);
				// R02 = 5;
				// Instr #57
				SDNode *vload10 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct16,
				// glue (or chain) input edge
				SDValue(endwhere3, 0)
				);

				SDValue ct17 = CurDAG->getConstant(11, DL, MVT::i16, true, false);
				// R24 = 11;
				// Instr #58
				SDNode *vload11 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct17,
				// glue (or chain) input edge
				SDValue(vload10, 1)
				);

				// R19 = R26 < R24;
				// Instr #59
				SDNode *lt3 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload11, 0),
				// glue (or chain) input edge
				SDValue(vload11, 1)
				);

				// R17 = R02 < R26;
				// Instr #60
				SDNode *lt4 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload10, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt3, 1)
				);

				// R02 = R31 + R02;
				// Instr #61
				SDNode *add1 = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload10, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt4, 1)
				);

				// R09 = R19 & R17;
				// Instr #62
				SDNode *and9 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt4, 0),
				SDValue(lt3, 0),
				// glue (or chain) input edge
				SDValue(add1, 1)
				);

				// R09 = R09 == R31;
				// Instr #63
				SDNode *eq9 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and9, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and9, 1)
				);

				SDValue ct18 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #64
				SDNode *nop4 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct18,
				// glue (or chain) input edge
				SDValue(eq9, 1)
				);

				// WHERE_EQ;
				// Instr #65
				SDNode *whereeq4 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq9, 0),
				// glue (or chain) input edge
				SDValue(nop4, 0)
				);

				// R19 = R26 - R02;
				// Instr #66
				SDNode *sub4 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(add1, 0),
				SDValue(lt3, 0),
				// glue (or chain) input edge
				SDValue(whereeq4, 1)
				);

				// R27 = R27 << R19;
				// Instr #67
				SDNode *shl1 = CurDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(shl0, 0),
				SDValue(sub4, 0),
				SDValue(shl0, 0),
				// glue (or chain) input edge
				SDValue(sub4, 1)
				);

				// REDUCE(R27);
				// Instr #68
				SDNode *sumRed4 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl1, 0),
				// glue (or chain) input edge
				SDValue(shl1, 1)
				);

				// END_WHERE;
				// Instr #69
				SDNode *endwhere4 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed4, 0)
				);

				SDValue ct19 = CurDAG->getConstant(10, DL, MVT::i16, true, false);
				// R02 = 10;
				// Instr #70
				SDNode *vload12 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct19,
				// glue (or chain) input edge
				SDValue(endwhere4, 0)
				);

				SDValue ct20 = CurDAG->getConstant(16, DL, MVT::i16, true, false);
				// R24 = 16;
				// Instr #71
				SDNode *vload13 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct20,
				// glue (or chain) input edge
				SDValue(vload12, 1)
				);

				// R19 = R26 < R24;
				// Instr #72
				SDNode *lt5 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload13, 0),
				// glue (or chain) input edge
				SDValue(vload13, 1)
				);

				// R17 = R02 < R26;
				// Instr #73
				SDNode *lt6 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload12, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt5, 1)
				);

				// R02 = R31 + R02;
				// Instr #74
				SDNode *add2 = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload12, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt6, 1)
				);

				// R09 = R19 & R17;
				// Instr #75
				SDNode *and10 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt6, 0),
				SDValue(lt5, 0),
				// glue (or chain) input edge
				SDValue(add2, 1)
				);

				// R09 = R09 == R31;
				// Instr #76
				SDNode *eq10 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and10, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and10, 1)
				);

				SDValue ct21 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #77
				SDNode *nop5 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct21,
				// glue (or chain) input edge
				SDValue(eq10, 1)
				);

				// WHERE_EQ;
				// Instr #78
				SDNode *whereeq5 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq10, 0),
				// glue (or chain) input edge
				SDValue(nop5, 0)
				);

				// R19 = R26 - R02;
				// Instr #79
				SDNode *sub5 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(add2, 0),
				SDValue(lt5, 0),
				// glue (or chain) input edge
				SDValue(whereeq5, 1)
				);

				// R27 = R27 << R19;
				// Instr #80
				SDNode *shl2 = CurDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(shl1, 0),
				SDValue(sub5, 0),
				SDValue(shl1, 0),
				// glue (or chain) input edge
				SDValue(sub5, 1)
				);

				// REDUCE(R27);
				// Instr #81
				SDNode *sumRed5 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl2, 0),
				// glue (or chain) input edge
				SDValue(shl2, 1)
				);

				// END_WHERE;
				// Instr #82
				SDNode *endwhere5 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed5, 0)
				);

				SDValue ct22 = CurDAG->getConstant(15, DL, MVT::i16, true, false);
				// R02 = 15;
				// Instr #83
				SDNode *vload14 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct22,
				// glue (or chain) input edge
				SDValue(endwhere5, 0)
				);

				SDValue ct23 = CurDAG->getConstant(21, DL, MVT::i16, true, false);
				// R24 = 21;
				// Instr #84
				SDNode *vload15 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct23,
				// glue (or chain) input edge
				SDValue(vload14, 1)
				);

				// R19 = R26 < R24;
				// Instr #85
				SDNode *lt7 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload15, 0),
				// glue (or chain) input edge
				SDValue(vload15, 1)
				);

				// R17 = R02 < R26;
				// Instr #86
				SDNode *lt8 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload14, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt7, 1)
				);

				// R02 = R31 + R02;
				// Instr #87
				SDNode *add3 = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload14, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt8, 1)
				);

				// R09 = R19 & R17;
				// Instr #88
				SDNode *and11 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt8, 0),
				SDValue(lt7, 0),
				// glue (or chain) input edge
				SDValue(add3, 1)
				);

				// R09 = R09 == R31;
				// Instr #89
				SDNode *eq11 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and11, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and11, 1)
				);

				SDValue ct24 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #90
				SDNode *nop6 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct24,
				// glue (or chain) input edge
				SDValue(eq11, 1)
				);

				// WHERE_EQ;
				// Instr #91
				SDNode *whereeq6 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq11, 0),
				// glue (or chain) input edge
				SDValue(nop6, 0)
				);

				// R19 = R26 - R02;
				// Instr #92
				SDNode *sub6 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(add3, 0),
				SDValue(lt7, 0),
				// glue (or chain) input edge
				SDValue(whereeq6, 1)
				);

				// R27 = R27 << R19;
				// Instr #93
				SDNode *shl3 = CurDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(shl2, 0),
				SDValue(sub6, 0),
				SDValue(shl2, 0),
				// glue (or chain) input edge
				SDValue(sub6, 1)
				);

				// REDUCE(R27);
				// Instr #94
				SDNode *sumRed6 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl3, 0),
				// glue (or chain) input edge
				SDValue(shl3, 1)
				);

				// END_WHERE;
				// Instr #95
				SDNode *endwhere6 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed6, 0)
				);

				SDValue ct25 = CurDAG->getConstant(20, DL, MVT::i16, true, false);
				// R02 = 20;
				// Instr #96
				SDNode *vload16 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct25,
				// glue (or chain) input edge
				SDValue(endwhere6, 0)
				);

				SDValue ct26 = CurDAG->getConstant(26, DL, MVT::i16, true, false);
				// R24 = 26;
				// Instr #97
				SDNode *vload17 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct26,
				// glue (or chain) input edge
				SDValue(vload16, 1)
				);

				// R19 = R26 < R24;
				// Instr #98
				SDNode *lt9 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload17, 0),
				// glue (or chain) input edge
				SDValue(vload17, 1)
				);

				// R17 = R02 < R26;
				// Instr #99
				SDNode *lt10 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload16, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt9, 1)
				);

				// R02 = R31 + R02;
				// Instr #100
				SDNode *add4 = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload16, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt10, 1)
				);

				// R09 = R19 & R17;
				// Instr #101
				SDNode *and12 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt10, 0),
				SDValue(lt9, 0),
				// glue (or chain) input edge
				SDValue(add4, 1)
				);

				// R09 = R09 == R31;
				// Instr #102
				SDNode *eq12 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and12, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and12, 1)
				);

				SDValue ct27 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #103
				SDNode *nop7 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct27,
				// glue (or chain) input edge
				SDValue(eq12, 1)
				);

				// WHERE_EQ;
				// Instr #104
				SDNode *whereeq7 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq12, 0),
				// glue (or chain) input edge
				SDValue(nop7, 0)
				);

				// R19 = R26 - R02;
				// Instr #105
				SDNode *sub7 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(add4, 0),
				SDValue(lt9, 0),
				// glue (or chain) input edge
				SDValue(whereeq7, 1)
				);

				// R27 = R27 << R19;
				// Instr #106
				SDNode *shl4 = CurDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(shl3, 0),
				SDValue(sub7, 0),
				SDValue(shl3, 0),
				// glue (or chain) input edge
				SDValue(sub7, 1)
				);

				// REDUCE(R27);
				// Instr #107
				SDNode *sumRed7 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl4, 0),
				// glue (or chain) input edge
				SDValue(shl4, 1)
				);

				// END_WHERE;
				// Instr #108
				SDNode *endwhere7 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(sumRed7, 0)
				);

				SDValue ct28 = CurDAG->getConstant(25, DL, MVT::i16, true, false);
				// R02 = 25;
				// Instr #109
				SDNode *vload18 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct28,
				// glue (or chain) input edge
				SDValue(endwhere7, 0)
				);

				SDValue ct29 = CurDAG->getConstant(31, DL, MVT::i16, true, false);
				// R24 = 31;
				// Instr #110
				SDNode *vload19 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct29,
				// glue (or chain) input edge
				SDValue(vload18, 1)
				);

				// R19 = R26 < R24;
				// Instr #111
				SDNode *lt11 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(vload19, 0),
				// glue (or chain) input edge
				SDValue(vload19, 1)
				);

				// R17 = R02 < R26;
				// Instr #112
				SDNode *lt12 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload18, 0),
				SDValue(vload7, 0),
				// glue (or chain) input edge
				SDValue(lt11, 1)
				);

				// R02 = R31 + R02;
				// Instr #113
				SDNode *add5 = CurDAG->getMachineNode(
				Connex::ADDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload18, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(lt12, 1)
				);

				// R09 = R19 & R17;
				// Instr #114
				SDNode *and13 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(lt12, 0),
				SDValue(lt11, 0),
				// glue (or chain) input edge
				SDValue(add5, 1)
				);

				// R09 = R09 == R31;
				// Instr #115
				SDNode *eq13 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and13, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and13, 1)
				);

				SDValue ct30 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #116
				SDNode *nop8 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct30,
				// glue (or chain) input edge
				SDValue(eq13, 1)
				);

				// WHERE_EQ;
				// Instr #117
				SDNode *whereeq8 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq13, 0),
				// glue (or chain) input edge
				SDValue(nop8, 0)
				);

				// R19 = R26 - R02;
				// Instr #118
				SDNode *sub8 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload7, 0),
				SDValue(add5, 0),
				SDValue(lt11, 0),
				// glue (or chain) input edge
				SDValue(whereeq8, 1)
				);

				// R27 = R27 << R19;
				// Instr #119
				SDNode *shl5 = CurDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(shl4, 0),
				SDValue(sub8, 0),
				SDValue(shl4, 0),
				// glue (or chain) input edge
				SDValue(sub8, 1)
				);

				// REDUCE(R27);
				// Instr #120
				SDNode *sumRed8 = CurDAG->getMachineNode(
				Connex::RED_H,
				DL,
				MVT::Glue,
				SDValue(shl5, 0),
				// glue (or chain) input edge
				SDValue(shl5, 1)
				);

				// END_WHERE;
				// Instr #121
				SDNode *reduceH /* endwhere8 */ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// Alex: MVT::Glue, // Error: <<Assertion `N->getNodeId() == -1 && "Node already inserted!"' failed.>>
				MVT::Other,
				// glue (or chain) input edge
				SDValue(sumRed8, 0)
				);

lib/Target/Connex/Select_REDi32_OpincaaCodeGen.h

				//===-- Select_REDi32_OpincaaCodeGen.h --------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel red.i32.
				// You should include this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 14.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/opincaa_standalone_apps/Emulate_i32/RED_i32_manual/DumpISel_OpincaaCodeGen_old04_300.cpp



				SDValue ct0 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R29 = 1;
				// Instr #0
				SDNode *vload0 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast, 1)
				);

				// CELL_SHR(R28, R29);
				// Instr #1
				SDNode *cellshr0 = CurDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(nodeOpSrcCast, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct1 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #2
				SDNode *nop0 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R27 = SHIFT_REG;
				// Instr #3
				SDNode *ldsh0 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				// R26 = INDEX;
				// Instr #4
				SDNode *ldix0 = CurDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				// R25 = R26 & R29;
				// Instr #5
				SDNode *and0 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R24 = R25 == R29;
				// Instr #6
				SDNode *eq0 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct2 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #7
				SDNode *nop1 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;
				// Instr #8
				SDNode *whereeq0 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				SDValue ct3 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R28 = 0;
				// Instr #9
				SDNode *vload1 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				SDValue(nodeOpSrcCast, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 1)
				);

				SDValue ct4 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R27 = 0;
				// Instr #10
				SDNode *vload2 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct4,
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// END_WHERE;
				// Instr #11
				SDNode *endwhere0 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// REDUCE_U(R28);
				// Instr #12
				SDNode *sumRedU0 = CurDAG->getMachineNode(
				Connex::RED_U_H,
				DL,
				MVT::Glue,
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// REDUCE_U(R27);
				// Instr #13
				SDNode *reduceHigh16 /*sumRedU1*/ = CurDAG->getMachineNode(
				Connex::RED_U_H,
				DL,
				// Alex: MVT::Glue,
				MVT::Other,
				SDValue(vload2, 0),
				// glue (or chain) input edge
				SDValue(sumRedU0, 0)
				);

lib/Target/Connex/Select_SHRAi32_OpincaaCodeGen.h

				//===-- Select_SHRAi32_OpincaaCodeGen.h --------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel shra.i32.
				// You should include this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 33.
				//
				//===----------------------------------------------------------------------===//

				// From /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/opincaa_standalone_apps/Emulate_i32/SHRA_i32_manual/DumpISel_OpincaaCodeGen_old13_927.cpp




				/* Alex: added manually to have predicated instructions refer to tied-to
				constraints to these nodes (destination registers of predicated instr)
				without initializing the respective dest registers, since it's not necessary.
				*/
				SDValue ct21Node = CurDAG->getConstant(21, DL, MVT::i16, true, false);
				SDNode *r21Node = CurDAG->getMachineNode(
				Connex::VLOAD_BOGUS_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct21Node,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast1, 1)
				);

				SDValue ct22Node = CurDAG->getConstant(22, DL, MVT::i16, true, false);
				SDNode *r22Node = CurDAG->getMachineNode(
				Connex::VLOAD_BOGUS_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct22Node,
				// glue (or chain) input edge
				SDValue(r21Node, 1)
				);


				SDValue ct0 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;
				// Instr #0
				SDNode *vload0 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(r22Node, 1)
				);

				SDValue ct1 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;
				// Instr #1
				SDNode *vload1 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				SDValue ct2 = CurDAG->getConstant(16, DL, MVT::i16, true, false);
				// R10 = 16;
				// Instr #2
				SDNode *vload2 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				SDValue ct3 = CurDAG->getConstant(31, DL, MVT::i16, true, false);
				// R08 = 31;
				// Instr #3
				SDNode *vload3 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// R27 = R27 & R08;
				// Instr #4
				SDNode *and0 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload3, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(vload3, 1)
				);

				// R25 = INDEX;
				// Instr #5
				SDNode *ldix0 = CurDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				// R25 = R25 & R30;
				// Instr #6
				SDNode *and1 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// CELL_SHR(R27, R25);
				// Instr #7
				SDNode *cellshr0 = CurDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(and1, 0),
				// glue (or chain) input edge
				SDValue(and1, 1)
				);

				SDValue ct4 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #8
				SDNode *nop0 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R27 = SHIFT_REG;
				// Instr #9
				SDNode *ldsh0 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				// R20 = R10 < R27;
				// Instr #10
				SDNode *lt0 = CurDAG->getMachineNode(
				Connex::LT_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload2, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				// R29 = SHRA(R28, R27);
				// Instr #11
				SDNode *shra0 = CurDAG->getMachineNode(
				Connex::SHRAV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(lt0, 1)
				);

				// CELL_SHL(R28, R30);
				// Instr #12
				SDNode *cellshl0 = CurDAG->getMachineNode(
				Connex::CELLSHL_H,
				DL,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(shra0, 1)
				);

				SDValue ct5 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #13
				SDNode *nop1 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct5,
				// glue (or chain) input edge
				SDValue(cellshl0, 0)
				);

				// R23 = SHIFT_REG;
				// Instr #14
				SDNode *ldsh1 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R25 = R25 == R31;
				// Instr #15
				SDNode *eq0 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and1, 0),
				SDValue(vload0, 0),
				// glue (or chain) input edge
				SDValue(ldsh1, 1)
				);

				// R24 = R20 & R25;
				// Instr #16
				SDNode *and2 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(lt0, 0),
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// R19 = R24 == R30;
				// Instr #17
				SDNode *eq1 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and2, 1)
				);

				SDValue ct6 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #18
				SDNode *nop2 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct6,
				// glue (or chain) input edge
				SDValue(eq1, 1)
				);

				// WHERE_EQ;
				// Instr #19
				SDNode *whereeq0 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq1, 0),
				// glue (or chain) input edge
				SDValue(nop2, 0)
				);

				// R21 = R27 - R10;
				// Instr #20
				SDNode *sub0 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(ldsh0, 0),
				SDValue(vload2, 0),
				SDValue(r21Node, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 1)
				);

				// R29 = SHRA(R23, R21);
				// Instr #21
				SDNode *shra1 = CurDAG->getMachineNode(
				Connex::SHRAV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(ldsh1, 0),
				SDValue(sub0, 0),
				SDValue(shra0, 0),
				// glue (or chain) input edge
				SDValue(sub0, 1)
				);

				// END_WHERE;
				// Instr #22
				SDNode *endwhere0 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(shra1, 1)
				);

				// R20 = R30 - R20;
				// Instr #23
				SDNode *sub1 = CurDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(lt0, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				// R24 = R20 & R25;
				// Instr #24
				SDNode *and3 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				SDValue(sub1, 0),
				// glue (or chain) input edge
				SDValue(sub1, 1)
				);

				// R19 = R24 == R30;
				// Instr #25
				SDNode *eq2 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and3, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and3, 1)
				);

				SDValue ct7 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #26
				SDNode *nop3 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct7,
				// glue (or chain) input edge
				SDValue(eq2, 1)
				);

				// WHERE_EQ;
				// Instr #27
				SDNode *whereeq1 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq2, 0),
				// glue (or chain) input edge
				SDValue(nop3, 0)
				);

				// R21 = R10 - R27;
				// Instr #28
				SDNode *sub2 = CurDAG->getMachineNode(
				Connex::SUBV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload2, 0),
				SDValue(ldsh0, 0),
				SDValue(sub0, 0),
				// glue (or chain) input edge
				SDValue(whereeq1, 1)
				);

				// R22 = R23 << R21;
				// Instr #29
				SDNode *shl0 = CurDAG->getMachineNode(
				Connex::SHLV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(ldsh1, 0),
				SDValue(sub2, 0),
				SDValue(r22Node, 0),
				// glue (or chain) input edge
				SDValue(sub2, 1)
				);

				// R29 = R28 >> R27;
				// Instr #30
				SDNode *shr0 = CurDAG->getMachineNode(
				Connex::SHRV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(ldsh0, 0),
				SDValue(shra1, 0),
				// glue (or chain) input edge
				SDValue(shl0, 1)
				);

				// R29 = R29 | R22;
				// Instr #31
				SDNode *resH /*or0*/ = CurDAG->getMachineNode(
				Connex::ORV_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(shl0, 0),
				SDValue(shr0, 0),
				SDValue(shr0, 0),
				// glue (or chain) input edge
				SDValue(shr0, 1)
				);

				// END_WHERE;
				// Instr #32
				SDNode *lastNode /*endwhere1*/ = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				// MVT::Glue,
				MVT::Other,
				// glue (or chain) input edge
				SDValue(resH /*or0*/, 1)
				);

lib/Target/Connex/Select_SUBf16_OpincaaCodeGen.h

This file has a very large number of changes (3,651 lines). Show File Contents

lib/Target/Connex/Select_SUBi32_OpincaaCodeGen.h

				//===-- Select_SUBi32_OpincaaCodeGen.h --------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				arsenm: This file needs to be dropped. There should be no committed, generated code. You should have TableGen emit this, or manually write the code in ISelDAGToDAG.
				//===----------------------------------------------------------------------===//
				/// \file
				// Code auto-generated by method Kernel::genLLVMISelManualCode()
				// from the OPINCAA lib, from kernel sub.i32.
				// You should include this code in the Select() method of the SelectionDAGISel
				// class of your back end.
				// Number of instructions generated: 15.
				//
				//===----------------------------------------------------------------------===//


				// From /home/asusu/LLVM/llvm38Nov2016/llvm/build40/bin/Tests/NEW_v128i16/opincaa_standalone_apps/Emulate_i32/SUB_i32_manual/DumpISel_OpincaaCodeGen_old110_400.cpp

				// R27 is REG_SRC1. It is represented by result of nodeOpSrcCast1.
				// R28 is REG_SRC2. It is represented by result of nodeOpSrcCast2.





				SDValue ct0 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R31 = 0;
				// Instr #0
				SDNode *vload0 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct0,
				// glue (or chain) input edge
				SDValue(nodeOpSrcCast2, 1)
				);

				SDValue ct1 = CurDAG->getConstant(1, DL, MVT::i16, true, false);
				// R30 = 1;
				// Instr #1
				SDNode *vload1 = CurDAG->getMachineNode(
				Connex::VLOAD_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct1,
				// glue (or chain) input edge
				SDValue(vload0, 1)
				);

				// R29 = R27 - R28;
				// Instr #2
				SDNode *sub0 = CurDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(nodeOpSrcCast1, 0),
				SDValue(nodeOpSrcCast2, 0),
				// glue (or chain) input edge
				SDValue(vload1, 1)
				);

				// R23 = ADDC(R31, R31);
				// Instr #3
				SDNode *addc0 = CurDAG->getMachineNode(
				Connex::ADDCV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload0, 0),
				SDValue(vload0, 0),
				SDValue(sub0, 0)
				// no need for glue or chain input (since it normally consumes the output of the predecessor)
				);

				// R26 = INDEX;
				// Instr #4
				SDNode *ldix0 = CurDAG->getMachineNode(
				Connex::LDIX_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(addc0, 1)
				);

				// R25 = R26 & R30;
				// Instr #5
				SDNode *and0 = CurDAG->getMachineNode(
				Connex::ANDV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(vload1, 0),
				SDValue(ldix0, 0),
				// glue (or chain) input edge
				SDValue(ldix0, 1)
				);

				// R24 = R25 == R30;
				// Instr #6
				SDNode *eq0 = CurDAG->getMachineNode(
				Connex::EQ_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(and0, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(and0, 1)
				);

				SDValue ct2 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #7
				SDNode *nop0 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct2,
				// glue (or chain) input edge
				SDValue(eq0, 1)
				);

				// WHERE_EQ;
				// Instr #8
				SDNode *whereeq0 = CurDAG->getMachineNode(
				Connex::WHEREEQ,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(eq0, 0),
				// glue (or chain) input edge
				SDValue(nop0, 0)
				);

				SDValue ct3 = CurDAG->getConstant(0, DL, MVT::i16, true, false);
				// R23 = 0;
				// Instr #9
				SDNode *vload2 = CurDAG->getMachineNode(
				Connex::VLOAD_SPECIAL_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				ct3,
				SDValue(addc0, 0),
				// glue (or chain) input edge
				SDValue(whereeq0, 1)
				);

				// END_WHERE;
				// Instr #10
				SDNode *endwhere0 = CurDAG->getMachineNode(
				Connex::END_WHERE,
				DL,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(vload2, 1)
				);

				// CELL_SHR(R23, R30);
				// Instr #11
				SDNode *cellshr0 = CurDAG->getMachineNode(
				Connex::CELLSHR_H,
				DL,
				MVT::Glue,
				SDValue(vload2, 0),
				SDValue(vload1, 0),
				// glue (or chain) input edge
				SDValue(endwhere0, 0)
				);

				SDValue ct4 = CurDAG->getConstant(1 /* Num of cycles to NOP */, DL, MVT::i16, true, false);
				// NOP;
				// Instr #12
				SDNode *nop1 = CurDAG->getMachineNode(
				Connex::NOP_BPF,
				DL,
				MVT::Glue,
				ct4,
				// glue (or chain) input edge
				SDValue(cellshr0, 0)
				);

				// R23 = SHIFT_REG;
				// Instr #13
				SDNode *ldsh0 = CurDAG->getMachineNode(
				Connex::LDSH_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				// glue (or chain) input edge
				SDValue(nop1, 0)
				);

				// R29 = R29 - R23;
				// Instr #14
				SDNode *resH /*sub1*/ = CurDAG->getMachineNode(
				Connex::SUBV_H,
				DL,
				TYPE_VECTOR_I16,
				MVT::Glue,
				SDValue(sub0, 0),
				SDValue(ldsh0, 0),
				// glue (or chain) input edge
				SDValue(ldsh0, 1)
				);

				SDNode *lastNode = resH;
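
For readers following the sub.i32 kernel above, the register-level comments (`R29 = R27 - R28`, `ADDC`, `WHERE_EQ`, `CELL_SHR`, `SHIFT_REG`) suggest a 32-bit subtract emulated on 16-bit lanes with a borrow passed between neighbouring lanes. The scalar model below is a minimal sketch of that reading and is not part of the patch. It assumes that each i32 occupies two adjacent lanes (even lane holds the low half, odd lane the high half), that `ADDC(R31, R31)` with R31 = 0 reads back the borrow flag of the preceding subtract, and that `CELL_SHR` by 1 makes lane i receive the value of lane i-1. The function name `subI32OnI16Lanes` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the sub.i32 kernel; not part of the patch.
// ASSUMPTIONS: even lane = low 16 bits, odd lane = high 16 bits of each i32,
// and CELL_SHR by 1 makes lane i read the value held by lane i-1.
std::vector<uint16_t> subI32OnI16Lanes(const std::vector<uint16_t> &src1,
                                       const std::vector<uint16_t> &src2) {
  assert(src1.size() == src2.size());
  const std::size_t n = src1.size();
  std::vector<uint16_t> res(n), borrow(n), shifted(n, 0);

  for (std::size_t i = 0; i < n; ++i) {
    // Instr #2: lane-wise 16-bit subtract (wraps modulo 2^16).
    res[i] = static_cast<uint16_t>(src1[i] - src2[i]);
    // Instr #3: ADDC(0, 0) is modeled as reading back the borrow flag.
    borrow[i] = src1[i] < src2[i] ? 1 : 0;
  }

  // Instr #4 to #10: WHERE (INDEX & 1) == 1, clear the borrow in odd lanes.
  for (std::size_t i = 1; i < n; i += 2)
    borrow[i] = 0;

  // Instr #11 to #13: CELL_SHR + SHIFT_REG, each lane takes the borrow of
  // its left neighbour (assumed direction).
  for (std::size_t i = 1; i < n; ++i)
    shifted[i] = borrow[i - 1];

  // Instr #14: subtract the incoming borrow from every lane.
  for (std::size_t i = 0; i < n; ++i)
    res[i] = static_cast<uint16_t>(res[i] - shifted[i]);

  return res;
}
```

Under these assumptions, calling the function with lanes `{0x0000, 0x0001}` (the value 0x00010000) and `{0x0001, 0x0000}` (the value 1) yields `{0xFFFF, 0x0000}`, that is 0x0000FFFF.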

lib/Target/Connex/TargetInfo/CMakeLists.txt

				add_llvm_library(LLVMConnexInfo
				ConnexTargetInfo.cpp
				)

lib/Target/Connex/TargetInfo/ConnexTargetInfo.cpp

				//===-- ConnexTargetInfo.cpp - Connex Target Implementation ---------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "Connex.h"
				#include "llvm/Support/TargetRegistry.h"
				using namespace llvm;

				namespace llvm {
				Target TheConnexTarget;
				}

				extern "C" void LLVMInitializeConnexTargetInfo() {
				TargetRegistry::RegisterTarget(TheConnexTarget, "connex",
				//"Connex (host endian)",
				"Connex",
				"Connex",
				[](Triple::ArchType) { return false; }, true);
				}

lib/Target/Connex/TargetInfo/LLVMBuild.txt

				;===- ./lib/Target/Connex/TargetInfo/LLVMBuild.txt ----------------- Conf ---===;
				;
				; Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				; See https://llvm.org/LICENSE.txt for license information.
				; SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				;
				;===------------------------------------------------------------------------===;
				;
				; This is an LLVMBuild description file for the components in this subdirectory.
				;
				; For more information on the LLVMBuild system, please see:
				;
				; http://llvm.org/docs/LLVMBuild.html
				;
				;===------------------------------------------------------------------------===;

				[component_0]
				type = Library
				name = ConnexInfo
				parent = Connex
				required_libraries = Support
				add_to_library_groups = Connex

lib/Target/LLVMBuild.txt

				[common]
				subdirectories =
				AMDGPU
				ARC
				ARM
				AArch64
				AVR
				BPF
				Connex
				Lanai
				Hexagon
				MSP430
				NVPTX
				Mips
				PowerPC
				RISCV
				Sparc

test/CodeGen/Connex/MatMul-128_i16.ll

				; RUN: llc < %s -print-after-all -debug -march=connex -O3 -disable-cgp -pre-RA-sched=source -hoist-cheap-insts -enable-correct-asm-print -asm-show-inst -asm-verbose -debug-pass=Structure | FileCheck %s

				; From ~/LLVM/Tests/DawnCC/35_MatMul/SIZE_256/7_CVL8_LLVMnew/3/test.ll

				; ModuleID = 'test.scalar.ll'
				source_filename = "test.c"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@C = common local_unnamed_addr global [256 x [256 x i16]] zeroinitializer, align 16
				@A = common local_unnamed_addr global [256 x [256 x i16]] zeroinitializer, align 16
				@B = common local_unnamed_addr global [256 x [256 x i16]] zeroinitializer, align 16
				@CONNEX_VL = external global i64

				; Function Attrs: nounwind uwtable
				define void @MatMul_BTransposed() local_unnamed_addr #0 !dbg !15 {
				entry:
				call void asm sideeffect "// START_OPINCAA_HOST_DEVICE_CODE\0A int numI16WordsAccessedInArrayA = 65536;\0A connexGlobal->writeDataToConnexPartial(A, // ;)\0A /* num elems written */ numI16WordsAccessedInArrayA, // ;)\0A /* offset */ 0);\0A // Generated in InstrumentVectorGatherLoadOrScatterStore() ;)\0A int numI16WordsAccessedInArrayB = 65536;\0A connexGlobal->writeDataToConnexPartial(B, // ;)\0A /* num elems written */ numI16WordsAccessedInArrayB, // ;)\0A /* offset */ 0 + CEIL_INT_DIV(numI16WordsAccessedInArrayA, CONNEX_VL));\0A // ;)\0Aif (connexGlobal->getKernel(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22) == NULL) {\0A BEGIN_KERNEL(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22); // Generated in vectorizeLoop()\0A EXECUTE_IN_ALL( // Generated in vectorizeLoop()\0A // Handling spills (from predecessors) and fills\0A", ""() #3, !dbg !18
				tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !19, metadata !21), !dbg !18
				%CONNEX_VL_DEREF_C = load i64, i64* @CONNEX_VL, align 8, !dbg !22
				%getSizeDiv_i64 = udiv i64 sub (i64 add (i64 ptrtoint ([256 x [256 x i16]]* @A to i64), i64 131072), i64 ptrtoint ([256 x [256 x i16]]* @A to i64)), %CONNEX_VL_DEREF_C, !dbg !22
				%getSizeDiv = lshr i64 %getSizeDiv_i64, 1, !dbg !22
				%ceil_getSize_16b = trunc i64 %getSizeDiv to i16, !dbg !22
				br label %for.cond1.preheader, !dbg !22

				for.cond1.preheader: ; preds = %entry, %for.inc27
				%0 = phi <8 x i16> [ undef, %entry ], [ %15, %for.inc27 ]
				%i.05 = phi i32 [ 0, %entry ], [ %inc28, %for.inc27 ]
				%vecIndVar2ndInnerLoop0 = insertelement <8 x i16> undef, i16 %ceil_getSize_16b, i64 0, !dbg !26
				%1 = shufflevector <8 x i16> %vecIndVar2ndInnerLoop0, <8 x i16> undef, <8 x i32> zeroinitializer, !dbg !26
				%idxprom4 = sext i32 %i.05 to i64, !dbg !26
				%GEPInstrIndexWith0.idx = shl nsw i64 %idxprom4, 9, !dbg !31
				%CONNEX_VL_DEREF_D = load i64, i64* @CONNEX_VL, align 8, !dbg !31
				%connexVLDerefAdjusted = shl i64 %CONNEX_VL_DEREF_D, 1, !dbg !31
				%finalIndexValue64 = udiv i64 %GEPInstrIndexWith0.idx, %connexVLDerefAdjusted, !dbg !31
				%finalIndexValue647 = trunc i64 %finalIndexValue64 to i16, !dbg !31
				br label %for.body3, !dbg !31

				for.body3: ; preds = %for.cond1.preheader, %for.inc24
				%2 = phi <8 x i16> [ %0, %for.cond1.preheader ], [ %15, %for.inc24 ]
				%j.04 = phi i32 [ 0, %for.cond1.preheader ], [ %inc25, %for.inc24 ]
				%varVecIndexOuterLoop = phi <8 x i16> [ %1, %for.cond1.preheader ], [ %15, %for.inc24 ], !dbg !26
				call void @llvm.connex.repeat.x.times(i64 256), !dbg !26
				; CHECK: REPEAT_TIMES
				%idxprom = sext i32 %j.04 to i64, !dbg !26
				%arrayidx5 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @C, i64 0, i64 %idxprom4, i64 %idxprom, !dbg !26
				store i16 0, i16* %arrayidx5, align 2, !dbg !33
				tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !34, metadata !21), !dbg !35
				%CONNEX_VL_DEREF_B = load i64, i64* @CONNEX_VL, align 8, !dbg !36
				%n.mod.vf = urem i64 256, %CONNEX_VL_DEREF_B, !dbg !36
				%n.vec = sub nsw i64 256, %n.mod.vf, !dbg !36
				%cmp.zero = icmp eq i64 %n.vec, 0, !dbg !36
				%cast.crd = trunc i64 %n.vec to i32, !dbg !36
				br i1 %cmp.zero, label %for.body8.preheader, label %vector.ph, !dbg !36

				vector.ph: ; preds = %for.body3
				%vecInsElem_valExactLSOffset = insertelement <8 x i16> undef, i16 %finalIndexValue647, i64 0, !dbg !36
				%vecValExactLSOffset = shufflevector <8 x i16> %vecInsElem_valExactLSOffset, <8 x i16> undef, <8 x i32> zeroinitializer, !dbg !36
				br label %vector.body, !dbg !36

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%vec.phi = phi <8 x i16> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
				%varVecIndexInnerLoop = phi <8 x i16> [ %vecValExactLSOffset, %vector.ph ], [ %5, %vector.body ]
				%varVecIndexInnerLoop20 = phi <8 x i16> [ %varVecIndexOuterLoop, %vector.ph ], [ %8, %vector.body ]
				call void asm sideeffect " // Map part of reduction code; // Generated in vectorizeLoop()\0A", ""() #3
				call void asm sideeffect "// An empty inline Asm expression, required for ConnexAsmPrinter.cpp, MoveToFront();\0A\0A", ""() #3
				call void asm sideeffect "int indexLLVM_LV2;\0Aint origLoopTripCount = 256;\0Afor (indexLLVM_LV2 = 0; indexLLVM_LV2 < origLoopTripCount; indexLLVM_LV2 += CONNEX_VL) { // vectorized loop for induction var [NO INFO]\0A", ""() #3
				%3 = sext <8 x i16> %varVecIndexInnerLoop to <8 x i64>, !dbg !36
				%VectorGep = getelementptr i16, i16* inttoptr (i16 51 to i16*), <8 x i16> %varVecIndexInnerLoop, !dbg !36
				%4 = call <8 x i16> @llvm.masked.gather.v8i16(<8 x i16*> %VectorGep, i32 0, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i16> undef) #3, !dbg !36
				%5 = add <8 x i16> %varVecIndexInnerLoop, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>, !dbg !36
				%6 = sext <8 x i16> %varVecIndexInnerLoop20 to <8 x i64>, !dbg !36
				%VectorGep21 = getelementptr i16, i16* inttoptr (i16 51 to i16*), <8 x i16> %varVecIndexInnerLoop20, !dbg !36
				%7 = call <8 x i16> @llvm.masked.gather.v8i16(<8 x i16*> %VectorGep21, i32 0, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i16> undef) #3, !dbg !36
				%8 = add <8 x i16> %varVecIndexInnerLoop20, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>, !dbg !36
				%9 = mul <8 x i16> %7, %4, !dbg !40
				%10 = add <8 x i16> %vec.phi, %9, !dbg !42
				%index.next = add i64 %index, 8, !dbg !36
				%11 = icmp eq i64 %index.next, %n.vec, !dbg !36
				br i1 %11, label %middle.block, label %vector.body, !dbg !36, !llvm.loop !43

				middle.block: ; preds = %vector.body
				call void asm sideeffect "} // END for (indexLLVM_LV2) loop\0A", ""() #3
				call void @llvm.connex.reduce.v8i16(<8 x i16> %10)
				; CHECK: RED
				%cmp.n = icmp eq i64 %n.mod.vf, 0
				br i1 %cmp.n, label %for.inc24, label %for.body8.preheader, !dbg !36

				for.body8.preheader: ; preds = %middle.block, %for.body3
				%12 = phi <8 x i16> [ %2, %for.body3 ], [ %8, %middle.block ]
				%k.02.ph = phi i32 [ 0, %for.body3 ], [ %cast.crd, %middle.block ]
				br label %for.body8, !dbg !47

				for.body8: ; preds = %for.body8.preheader, %for.body8
				%add3 = phi i16 [ %add, %for.body8 ], [ 0, %for.body8.preheader ], !dbg !47
				%k.02 = phi i32 [ %inc, %for.body8 ], [ %k.02.ph, %for.body8.preheader ]
				%idxprom9 = sext i32 %k.02 to i64, !dbg !47
				%arrayidx12 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @A, i64 0, i64 %idxprom4, i64 %idxprom9, !dbg !47
				%13 = load i16, i16* %arrayidx12, align 2, !dbg !47
				%arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @B, i64 0, i64 %idxprom, i64 %idxprom9, !dbg !48
				%14 = load i16, i16* %arrayidx16, align 2, !dbg !48
				%mul = mul i16 %14, %13, !dbg !40
				%add = add i16 %add3, %mul, !dbg !42
				%inc = add nsw i32 %k.02, 1, !dbg !49
				tail call void @llvm.dbg.value(metadata i32 %inc, i64 0, metadata !34, metadata !21), !dbg !35
				%cmp7 = icmp slt i32 %inc, 256, !dbg !51
				br i1 %cmp7, label %for.body8, label %for.inc24.loopexit, !dbg !36, !llvm.loop !52

				for.inc24.loopexit: ; preds = %for.body8
				br label %for.inc24, !dbg !42

				for.inc24: ; preds = %for.inc24.loopexit, %middle.block
				%15 = phi <8 x i16> [ %8, %middle.block ], [ %12, %for.inc24.loopexit ]
				%add.lcssa = phi i16 [ undef, %middle.block ], [ %add, %for.inc24.loopexit ]
				store i16 %add.lcssa, i16* %arrayidx5, align 2, !dbg !42
				%inc25 = add nsw i32 %j.04, 1, !dbg !54
				tail call void @llvm.dbg.value(metadata i32 %inc25, i64 0, metadata !56, metadata !21), !dbg !57
				%cmp2 = icmp slt i32 %inc25, 256, !dbg !58
				br i1 %cmp2, label %for.body3, label %for.inc27, !dbg !31, !llvm.loop !59

				for.inc27: ; preds = %for.inc24
				call void @llvm.connex.end.repeat(), !dbg !61
				%inc28 = add nsw i32 %i.05, 1, !dbg !61
				tail call void @llvm.dbg.value(metadata i32 %inc28, i64 0, metadata !19, metadata !21), !dbg !18
				%cmp = icmp slt i32 %inc28, 256, !dbg !63
				call void asm sideeffect ");\0A END_KERNEL(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22);\0A} // END if (connexGlobal->getKernel(...) == NULL)\0A connexGlobal->executeKernel(\22OpincaaLLVM_MatMul_BTransposed_lines_43_0\22);\0AconnexGlobal->readCorrectReductionResults(C, 65536, 2); \0A\0A// END_OPINCAA_HOST_DEVICE_CODE", ""() #3, !dbg !22
				br i1 %cmp, label %for.cond1.preheader, label %for.end29, !dbg !22, !llvm.loop !64

				for.end29: ; preds = %for.inc27
				ret void, !dbg !66
				}

				; Function Attrs: nounwind readnone
				declare void @llvm.dbg.value(metadata, i64, metadata, metadata) #1

				; Function Attrs: nounwind readonly
				declare <8 x i16> @llvm.masked.gather.v8i16(<8 x i16*>, i32, <8 x i1>, <8 x i16>) #2

				; Function Attrs: nounwind
				declare void @llvm.connex.repeat.x.times(i64) #3

				; Function Attrs: nounwind
				declare void @llvm.connex.end.repeat() #3

				; Function Attrs: nounwind
				declare void @llvm.connex.reduce.v8i16(<8 x i16>) #3

				attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind readonly }
				attributes #3 = { nounwind }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!12, !13}
				!llvm.ident = !{!14}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.9.0 (trunk 274579) (llvm/trunk 274513)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
				!1 = !DIFile(filename: "test.c", directory: "/home/Tests/DawnCC/35_MatMul/SIZE_256/7_CVL8_LLVMnew")
				!2 = !{}
				!3 = !{!4, !10, !11}
				!4 = distinct !DIGlobalVariable(name: "A", scope: !0, file: !1, line: 36, type: !5, isLocal: false, isDefinition: true)
				!5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !6, size: 1048576, align: 16, elements: !8)
				!6 = !DIDerivedType(tag: DW_TAG_typedef, name: "TYPE", file: !1, line: 34, baseType: !7)
				!7 = !DIBasicType(name: "short", size: 16, align: 16, encoding: DW_ATE_signed)
				!8 = !{!9, !9}
				!9 = !DISubrange(count: 256)
				!10 = distinct !DIGlobalVariable(name: "B", scope: !0, file: !1, line: 37, type: !5, isLocal: false, isDefinition: true)
				!11 = distinct !DIGlobalVariable(name: "C", scope: !0, file: !1, line: 38, type: !5, isLocal: false, isDefinition: true)
				!12 = !{i32 2, !"Dwarf Version", i32 4}
				!13 = !{i32 2, !"Debug Info Version", i32 3}
				!14 = !{!"clang version 3.9.0 (trunk 274579) (llvm/trunk 274513)"}
				!15 = distinct !DISubprogram(name: "MatMul_BTransposed", scope: !1, file: !1, line: 40, type: !16, isLocal: false, isDefinition: true, scopeLine: 40, isOptimized: false, unit: !0)
				!16 = !DISubroutineType(types: !17)
				!17 = !{null}
				!18 = !DILocation(line: 41, column: 9, scope: !15)
				!19 = !DILocalVariable(name: "i", scope: !15, file: !1, line: 41, type: !20)
				!20 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
				!21 = !DIExpression()
				!22 = !DILocation(line: 43, column: 5, scope: !23)
				!23 = !DILexicalBlockFile(scope: !24, file: !1, discriminator: 1)
				!24 = distinct !DILexicalBlock(scope: !25, file: !1, line: 43, column: 5)
				!25 = distinct !DILexicalBlock(scope: !15, file: !1, line: 43, column: 5)
				!26 = !DILocation(line: 45, column: 13, scope: !27)
				!27 = distinct !DILexicalBlock(scope: !28, file: !1, line: 44, column: 36)
				!28 = distinct !DILexicalBlock(scope: !29, file: !1, line: 44, column: 9)
				!29 = distinct !DILexicalBlock(scope: !30, file: !1, line: 44, column: 9)
				!30 = distinct !DILexicalBlock(scope: !24, file: !1, line: 43, column: 32)
				!31 = !DILocation(line: 44, column: 9, scope: !32)
				!32 = !DILexicalBlockFile(scope: !28, file: !1, discriminator: 1)
				!33 = !DILocation(line: 45, column: 21, scope: !27)
				!34 = !DILocalVariable(name: "k", scope: !15, file: !1, line: 41, type: !20)
				!35 = !DILocation(line: 41, column: 15, scope: !15)
				!36 = !DILocation(line: 46, column: 13, scope: !37)
				!37 = !DILexicalBlockFile(scope: !38, file: !1, discriminator: 1)
				!38 = distinct !DILexicalBlock(scope: !39, file: !1, line: 46, column: 13)
				!39 = distinct !DILexicalBlock(scope: !27, file: !1, line: 46, column: 13)
				!40 = !DILocation(line: 47, column: 36, scope: !41)
				!41 = distinct !DILexicalBlock(scope: !38, file: !1, line: 46, column: 40)
				!42 = !DILocation(line: 47, column: 25, scope: !41)
				!43 = distinct !{!43, !44, !45, !46}
				!44 = !DILocation(line: 46, column: 13, scope: !27)
				!45 = !{!"llvm.loop.vectorize.width", i32 1}
				!46 = !{!"llvm.loop.interleave.count", i32 1}
				!47 = !DILocation(line: 47, column: 28, scope: !41)
				!48 = !DILocation(line: 47, column: 38, scope: !41)
				!49 = !DILocation(line: 46, column: 35, scope: !50)
				!50 = !DILexicalBlockFile(scope: !38, file: !1, discriminator: 2)
				!51 = !DILocation(line: 46, column: 27, scope: !37)
				!52 = distinct !{!52, !44, !53, !45, !46}
				!53 = !{!"llvm.loop.unroll.runtime.disable"}
				!54 = !DILocation(line: 44, column: 31, scope: !55)
				!55 = !DILexicalBlockFile(scope: !28, file: !1, discriminator: 2)
				!56 = !DILocalVariable(name: "j", scope: !15, file: !1, line: 41, type: !20)
				!57 = !DILocation(line: 41, column: 12, scope: !15)
				!58 = !DILocation(line: 44, column: 23, scope: !32)
				!59 = distinct !{!59, !60}
				!60 = !DILocation(line: 44, column: 9, scope: !30)
				!61 = !DILocation(line: 43, column: 27, scope: !62)
				!62 = !DILexicalBlockFile(scope: !24, file: !1, discriminator: 2)
				!63 = !DILocation(line: 43, column: 19, scope: !23)
				!64 = distinct !{!64, !65}
				!65 = !DILocation(line: 43, column: 5, scope: !15)
				!66 = !DILocation(line: 53, column: 1, scope: !15)
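
Reviewer note on the test above: the `for.body3` block splits the 256-iteration inner loop into a vectorized part and a scalar remainder (`%n.mod.vf`, `%n.vec`, `%cmp.zero`), and `for.cond1.preheader` turns a row's byte offset into a starting vector-line index (`%finalIndexValue64`). The following is only a minimal C++ sketch of that arithmetic, assuming `CONNEX_VL` counts 16-bit lanes per vector line (the IR doubles it to get a byte count, which suggests this but does not prove it). The names `LoopSplit`, `splitTripCount` and `startLine` are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// Hypothetical helper type; only the arithmetic mirrors the IR above.
struct LoopSplit {
  std::uint64_t vectorCount; // %n.vec: iterations covered by the vector loop
  std::uint64_t remainder;   // %n.mod.vf: iterations left for the scalar loop
  bool skipVectorLoop;       // %cmp.zero: true when %n.vec == 0
};

// Mirrors: %n.mod.vf = urem i64 256, VL ; %n.vec = sub i64 256, %n.mod.vf.
// Precondition: vl != 0 (urem by zero is undefined in LLVM IR as well).
inline LoopSplit splitTripCount(std::uint64_t tripCount, std::uint64_t vl) {
  const std::uint64_t rem = tripCount % vl;
  const std::uint64_t vec = tripCount - rem;
  return {vec, rem, vec == 0};
}

// Mirrors: %GEPInstrIndexWith0.idx = shl i64 row, 9   (row * 512 bytes),
//          %connexVLDerefAdjusted  = shl i64 VL, 1    (assumed: bytes per line),
//          %finalIndexValue64      = udiv idx, adjusted.
// Precondition: vl != 0.
inline std::uint64_t startLine(std::uint64_t row, std::uint64_t vl) {
  const std::uint64_t rowByteOffset = row << 9;
  const std::uint64_t lineBytes = vl << 1;
  return rowByteOffset / lineBytes;
}
```

With an assumed vector length of 128, `splitTripCount(256, 128)` gives a vector part of 256 iterations and no remainder, which corresponds to the `%cmp.n` branch in `middle.block` going straight to `for.inc24` and skipping the scalar `for.body8` loop.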

test/CodeGen/Connex/MatMul-128_i32.ll

				; RUN: llc < %s -print-after-all -debug -march=connex -O3 -disable-cgp -pre-RA-sched=source -hoist-cheap-insts -enable-correct-asm-print -asm-show-inst -asm-verbose -debug-pass=Structure | FileCheck %s

				; From ~/LLVM/Tests/DawnCC/35_MatMul_i32/SIZE128/I_CVL8_LLVMnew/4/test.ll

				; ModuleID = 'test.scalar.ll'
				source_filename = "test.c"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@C = common local_unnamed_addr global [128 x [128 x i32]] zeroinitializer, align 16
				@A = common local_unnamed_addr global [128 x [128 x i32]] zeroinitializer, align 16
				@B = common local_unnamed_addr global [128 x [128 x i32]] zeroinitializer, align 16
				@CONNEX_VL = external global i64

				; Function Attrs: nounwind uwtable
				define void @MatMul_BTransposed() local_unnamed_addr #0 !dbg !15 {
				entry:
				call void asm sideeffect "// START_OPINCAA_HOST_DEVICE_CODE\0A int numI16WordsAccessedInArrayA = 32768;\0A connexGlobal->writeDataToConnexPartial(A, // ;)\0A /* num elems written */ numI16WordsAccessedInArrayA, // ;)\0A /* offset */ 0);\0A // Generated in InstrumentVectorGatherLoadOrScatterStore() ;)\0A int numI16WordsAccessedInArrayB = 32768;\0A connexGlobal->writeDataToConnexPartial(B, // ;)\0A /* num elems written */ numI16WordsAccessedInArrayB, // ;)\0A /* offset */ 0 + CEIL_INT_DIV(numI16WordsAccessedInArrayA, CONNEX_VL));\0A // Generated in InstrumentVectorGatherLoadOrScatterStore() ;)\0Aif (connexGlobal->getKernel(\22OpincaaLLVM_MatMul_BTransposed_lines_44_0\22) == NULL) {\0A BEGIN_KERNEL(\22OpincaaLLVM_MatMul_BTransposed_lines_44_0\22); // Generated in vectorizeLoop()\0A EXECUTE_IN_ALL( // Generated in vectorizeLoop()\0A // Handling spills (from predecessors) and fills\0A", ""() #3, !dbg !18
				tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !19, metadata !20), !dbg !18
				%CONNEX_VL_DEREF_C = load i64, i64* @CONNEX_VL, align 8, !dbg !21
				%getSizeDiv_i64 = udiv i64 sub (i64 add (i64 ptrtoint ([128 x [128 x i32]]* @A to i64), i64 65536), i64 ptrtoint ([128 x [128 x i32]]* @A to i64)), %CONNEX_VL_DEREF_C, !dbg !21
				%getSizeDiv = lshr i64 %getSizeDiv_i64, 1, !dbg !21
				%ceil_getSize_16b = trunc i64 %getSizeDiv to i32, !dbg !21
				br label %for.cond1.preheader, !dbg !21

				for.cond1.preheader: ; preds = %entry, %for.inc24
				%0 = phi <4 x i32> [ undef, %entry ], [ %15, %for.inc24 ]
				%i.04 = phi i32 [ 0, %entry ], [ %inc25, %for.inc24 ]
				%vecIndVar2ndInnerLoop0 = insertelement <4 x i32> undef, i32 %ceil_getSize_16b, i64 0, !dbg !25
				%1 = shufflevector <4 x i32> %vecIndVar2ndInnerLoop0, <4 x i32> undef, <4 x i32> zeroinitializer, !dbg !25
				%idxprom4 = sext i32 %i.04 to i64, !dbg !25
				%GEPInstrIndexWith0.idx = shl nsw i64 %idxprom4, 9, !dbg !30
				%CONNEX_VL_DEREF_D = load i64, i64* @CONNEX_VL, align 8, !dbg !30
				%connexVLDerefAdjusted = shl i64 %CONNEX_VL_DEREF_D, 1, !dbg !30
				%finalIndexValue64 = udiv i64 %GEPInstrIndexWith0.idx, %connexVLDerefAdjusted, !dbg !30
				%finalIndexValue646 = trunc i64 %finalIndexValue64 to i32, !dbg !30
				br label %for.body3, !dbg !30

				for.body3: ; preds = %for.cond1.preheader, %for.inc21
				%2 = phi <4 x i32> [ %0, %for.cond1.preheader ], [ %15, %for.inc21 ]
				%j.03 = phi i32 [ 0, %for.cond1.preheader ], [ %inc22, %for.inc21 ]
				%varVecIndexOuterLoop = phi <4 x i32> [ %1, %for.cond1.preheader ], [ %15, %for.inc21 ], !dbg !25
				call void @llvm.connex.repeat.x.times(i64 128), !dbg !25
				%idxprom = sext i32 %j.03 to i64, !dbg !25
				%arrayidx5 = getelementptr inbounds [128 x [128 x i32]], [128 x [128 x i32]]* @C, i64 0, i64 %idxprom4, i64 %idxprom, !dbg !25
				store i32 0, i32* %arrayidx5, align 4, !dbg !32
				tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !33, metadata !20), !dbg !34
				%CONNEX_VL_DEREF_B = load i64, i64* @CONNEX_VL, align 8, !dbg !35
				%n.mod.vf = urem i64 128, %CONNEX_VL_DEREF_B, !dbg !35
				%n.vec = sub nsw i64 128, %n.mod.vf, !dbg !35
				%cmp.zero = icmp eq i64 %n.vec, 0, !dbg !35
				%cast.crd = trunc i64 %n.vec to i32, !dbg !35
				br i1 %cmp.zero, label %for.body8.preheader, label %vector.ph, !dbg !35

				vector.ph: ; preds = %for.body3
				%vecInsElem_valExactLSOffset = insertelement <4 x i32> undef, i32 %finalIndexValue646, i64 0, !dbg !35
				%vecValExactLSOffset = shufflevector <4 x i32> %vecInsElem_valExactLSOffset, <4 x i32> undef, <4 x i32> zeroinitializer, !dbg !35
				br label %vector.body, !dbg !35

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
				%varVecIndexInnerLoop = phi <4 x i32> [ %vecValExactLSOffset, %vector.ph ], [ %5, %vector.body ]
				%varVecIndexInnerLoop19 = phi <4 x i32> [ %varVecIndexOuterLoop, %vector.ph ], [ %8, %vector.body ]
				call void asm sideeffect " // Map part of reduction code; // Generated in vectorizeLoop()\0A", ""() #3
				call void asm sideeffect "// An empty inline Asm expression, required for ConnexAsmPrinter.cpp, MoveToFront();\0A\0A", ""() #3
				call void asm sideeffect "int indexLLVM_LV2;\0Aint origLoopTripCount = 128;\0Afor (indexLLVM_LV2 = 0; indexLLVM_LV2 < origLoopTripCount; indexLLVM_LV2 += (CONNEX_VL / 2)) { // vectorized loop for induction var [NO INFO]\0A", ""() #3
				%3 = sext <4 x i32> %varVecIndexInnerLoop to <4 x i64>, !dbg !35
				%VectorGep = getelementptr i32, i32* inttoptr (i32 51 to i32*), <4 x i32> %varVecIndexInnerLoop, !dbg !35
				%4 = call <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> %VectorGep, i32 0, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef) #3, !dbg !35
				%5 = add <4 x i32> %varVecIndexInnerLoop, <i32 1, i32 1, i32 1, i32 1>, !dbg !35
				%6 = sext <4 x i32> %varVecIndexInnerLoop19 to <4 x i64>, !dbg !35
				%VectorGep20 = getelementptr i32, i32* inttoptr (i32 51 to i32*), <4 x i32> %varVecIndexInnerLoop19, !dbg !35
				%7 = call <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> %VectorGep20, i32 0, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef) #3, !dbg !35
				%8 = add <4 x i32> %varVecIndexInnerLoop19, <i32 1, i32 1, i32 1, i32 1>, !dbg !35
				%9 = mul nsw <4 x i32> %7, %4, !dbg !39
				%10 = add nsw <4 x i32> %vec.phi, %9, !dbg !41
				%index.next = add i64 %index, 8, !dbg !35
				%11 = icmp eq i64 %index.next, %n.vec, !dbg !35
				br i1 %11, label %middle.block, label %vector.body, !dbg !35, !llvm.loop !42

				middle.block: ; preds = %vector.body
				call void asm sideeffect "} // END for (indexLLVM_LV2) loop\0A", ""() #3
				call void @llvm.connex.reduce.v4i32(<4 x i32> %10)
				%cmp.n = icmp eq i64 %n.mod.vf, 0
				br i1 %cmp.n, label %for.inc21, label %for.body8.preheader, !dbg !35

				for.body8.preheader: ; preds = %middle.block, %for.body3
				%12 = phi <4 x i32> [ %2, %for.body3 ], [ %8, %middle.block ]
				%k.01.ph = phi i32 [ 0, %for.body3 ], [ %cast.crd, %middle.block ]
				br label %for.body8, !dbg !46

				for.body8: ; preds = %for.body8.preheader, %for.body8
				%add2 = phi i32 [ %add, %for.body8 ], [ 0, %for.body8.preheader ], !dbg !46
				%k.01 = phi i32 [ %inc, %for.body8 ], [ %k.01.ph, %for.body8.preheader ]
				%idxprom9 = sext i32 %k.01 to i64, !dbg !46
				%arrayidx12 = getelementptr inbounds [128 x [128 x i32]], [128 x [128 x i32]]* @A, i64 0, i64 %idxprom4, i64 %idxprom9, !dbg !46
				%13 = load i32, i32* %arrayidx12, align 4, !dbg !46
				%arrayidx16 = getelementptr inbounds [128 x [128 x i32]], [128 x [128 x i32]]* @B, i64 0, i64 %idxprom, i64 %idxprom9, !dbg !47
				%14 = load i32, i32* %arrayidx16, align 4, !dbg !47
				%mul = mul nsw i32 %14, %13, !dbg !39
				%add = add nsw i32 %add2, %mul, !dbg !41
				%inc = add nsw i32 %k.01, 1, !dbg !48
				tail call void @llvm.dbg.value(metadata i32 %inc, i64 0, metadata !33, metadata !20), !dbg !34
				%cmp7 = icmp slt i32 %inc, 128, !dbg !50
				br i1 %cmp7, label %for.body8, label %for.inc21.loopexit, !dbg !35, !llvm.loop !51

				for.inc21.loopexit: ; preds = %for.body8
				br label %for.inc21, !dbg !41

				for.inc21: ; preds = %for.inc21.loopexit, %middle.block
				%15 = phi <4 x i32> [ %8, %middle.block ], [ %12, %for.inc21.loopexit ]
				%add.lcssa = phi i32 [ undef, %middle.block ], [ %add, %for.inc21.loopexit ]
				store i32 %add.lcssa, i32* %arrayidx5, align 4, !dbg !41
				%inc22 = add nsw i32 %j.03, 1, !dbg !53
				tail call void @llvm.dbg.value(metadata i32 %inc22, i64 0, metadata !55, metadata !20), !dbg !56
				%cmp2 = icmp slt i32 %inc22, 128, !dbg !57
				br i1 %cmp2, label %for.body3, label %for.inc24, !dbg !30, !llvm.loop !58

				for.inc24: ; preds = %for.inc21
				call void @llvm.connex.end.repeat(), !dbg !60
				%inc25 = add nsw i32 %i.04, 1, !dbg !60
				tail call void @llvm.dbg.value(metadata i32 %inc25, i64 0, metadata !19, metadata !20), !dbg !18
				%cmp = icmp slt i32 %inc25, 128, !dbg !62
				call void asm sideeffect ");\0A END_KERNEL(\22OpincaaLLVM_MatMul_BTransposed_lines_44_0\22);\0A} // END if (connexGlobal->getKernel(...) == NULL)\0A connexGlobal->executeKernel(\22OpincaaLLVM_MatMul_BTransposed_lines_44_0\22);\0AconnexGlobal->readCorrectReductionResults(C, 16384, 4);\0A\0A// END_OPINCAA_HOST_DEVICE_CODE", ""() #3, !dbg !21
				br i1 %cmp, label %for.cond1.preheader, label %for.end26, !dbg !21, !llvm.loop !63

				for.end26: ; preds = %for.inc24
				ret void, !dbg !65
				}

				; Function Attrs: nounwind readnone
				declare void @llvm.dbg.value(metadata, i64, metadata, metadata) #1

				; Function Attrs: nounwind readonly
				declare <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*>, i32, <4 x i1>, <4 x i32>) #2

				; Function Attrs: nounwind
				declare void @llvm.connex.repeat.x.times(i64) #3

				; Function Attrs: nounwind
				declare void @llvm.connex.end.repeat() #3

				; Function Attrs: nounwind
				declare void @llvm.connex.reduce.v4i32(<4 x i32>) #3

				attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind readonly }
				attributes #3 = { nounwind }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!12, !13}
				!llvm.ident = !{!14}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.9.0 (trunk 274579) (llvm/trunk 274513)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
				!1 = !DIFile(filename: "test.c", directory: "/home/Tests/DawnCC/35_MatMul_i32/SIZE128/G_CVL8")
				!2 = !{}
				!3 = !{!4, !10, !11}
				!4 = distinct !DIGlobalVariable(name: "A", scope: !0, file: !1, line: 36, type: !5, isLocal: false, isDefinition: true)
				!5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !6, size: 524288, align: 32, elements: !8)
				!6 = !DIDerivedType(tag: DW_TAG_typedef, name: "TYPE", file: !1, line: 33, baseType: !7)
				!7 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
				!8 = !{!9, !9}
				!9 = !DISubrange(count: 128)
				!10 = distinct !DIGlobalVariable(name: "B", scope: !0, file: !1, line: 37, type: !5, isLocal: false, isDefinition: true)
				!11 = distinct !DIGlobalVariable(name: "C", scope: !0, file: !1, line: 38, type: !5, isLocal: false, isDefinition: true)
				!12 = !{i32 2, !"Dwarf Version", i32 4}
				!13 = !{i32 2, !"Debug Info Version", i32 3}
				!14 = !{!"clang version 3.9.0 (trunk 274579) (llvm/trunk 274513)"}
				!15 = distinct !DISubprogram(name: "MatMul_BTransposed", scope: !1, file: !1, line: 40, type: !16, isLocal: false, isDefinition: true, scopeLine: 40, isOptimized: false, unit: !0)
				!16 = !DISubroutineType(types: !17)
				!17 = !{null}
				!18 = !DILocation(line: 42, column: 9, scope: !15)
				!19 = !DILocalVariable(name: "i", scope: !15, file: !1, line: 42, type: !7)
				!20 = !DIExpression()
				!21 = !DILocation(line: 44, column: 5, scope: !22)
				!22 = !DILexicalBlockFile(scope: !23, file: !1, discriminator: 1)
				!23 = distinct !DILexicalBlock(scope: !24, file: !1, line: 44, column: 5)
				!24 = distinct !DILexicalBlock(scope: !15, file: !1, line: 44, column: 5)
				!25 = !DILocation(line: 46, column: 13, scope: !26)
				!26 = distinct !DILexicalBlock(scope: !27, file: !1, line: 45, column: 36)
				!27 = distinct !DILexicalBlock(scope: !28, file: !1, line: 45, column: 9)
				!28 = distinct !DILexicalBlock(scope: !29, file: !1, line: 45, column: 9)
				!29 = distinct !DILexicalBlock(scope: !23, file: !1, line: 44, column: 32)
				!30 = !DILocation(line: 45, column: 9, scope: !31)
				!31 = !DILexicalBlockFile(scope: !27, file: !1, discriminator: 1)
				!32 = !DILocation(line: 46, column: 21, scope: !26)
				!33 = !DILocalVariable(name: "k", scope: !15, file: !1, line: 42, type: !7)
				!34 = !DILocation(line: 42, column: 15, scope: !15)
				!35 = !DILocation(line: 47, column: 13, scope: !36)
				!36 = !DILexicalBlockFile(scope: !37, file: !1, discriminator: 1)
				!37 = distinct !DILexicalBlock(scope: !38, file: !1, line: 47, column: 13)
				!38 = distinct !DILexicalBlock(scope: !26, file: !1, line: 47, column: 13)
				!39 = !DILocation(line: 48, column: 36, scope: !40)
				!40 = distinct !DILexicalBlock(scope: !37, file: !1, line: 47, column: 40)
				!41 = !DILocation(line: 48, column: 25, scope: !40)
				!42 = distinct !{!42, !43, !44, !45}
				!43 = !DILocation(line: 47, column: 13, scope: !26)
				!44 = !{!"llvm.loop.vectorize.width", i32 1}
				!45 = !{!"llvm.loop.interleave.count", i32 1}
				!46 = !DILocation(line: 48, column: 28, scope: !40)
				!47 = !DILocation(line: 48, column: 38, scope: !40)
				!48 = !DILocation(line: 47, column: 35, scope: !49)
				!49 = !DILexicalBlockFile(scope: !37, file: !1, discriminator: 2)
				!50 = !DILocation(line: 47, column: 27, scope: !36)
				!51 = distinct !{!51, !43, !52, !44, !45}
				!52 = !{!"llvm.loop.unroll.runtime.disable"}
				!53 = !DILocation(line: 45, column: 31, scope: !54)
				!54 = !DILexicalBlockFile(scope: !27, file: !1, discriminator: 2)
				!55 = !DILocalVariable(name: "j", scope: !15, file: !1, line: 42, type: !7)
				!56 = !DILocation(line: 42, column: 12, scope: !15)
				!57 = !DILocation(line: 45, column: 23, scope: !31)
				!58 = distinct !{!58, !59}
				!59 = !DILocation(line: 45, column: 9, scope: !29)
				!60 = !DILocation(line: 44, column: 27, scope: !61)
				!61 = !DILexicalBlockFile(scope: !23, file: !1, discriminator: 2)
				!62 = !DILocation(line: 44, column: 19, scope: !22)
				!63 = distinct !{!63, !64}
				!64 = !DILocation(line: 44, column: 5, scope: !15)
				!65 = !DILocation(line: 54, column: 1, scope: !15)
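
Reviewer note on the two MatMul tests: the host-side code embedded in the first inline asm string hard-codes `numI16WordsAccessedInArrayA` (65536 for the 256 x 256 `i16` arrays, 32768 for the 128 x 128 `i32` arrays) and places array B at an offset of `CEIL_INT_DIV(numI16WordsAccessedInArrayA, CONNEX_VL)`. The sketch below shows where those constants could come from. It is an illustration only: `i16WordsInArray` and `offsetOfB` are hypothetical names, and `ceilIntDiv` assumes that the `CEIL_INT_DIV` macro (whose definition is not part of this diff) is ordinary rounding-up integer division.

```cpp
#include <cstdint>

// Number of 16-bit words occupied by a rows x cols array whose elements are
// elemBytes wide. Hypothetical helper, not taken from the patch.
constexpr std::uint64_t i16WordsInArray(std::uint64_t rows, std::uint64_t cols,
                                        std::uint64_t elemBytes) {
  return rows * cols * elemBytes / 2;
}

// Assumed meaning of CEIL_INT_DIV: integer division rounded up.
// Precondition: b != 0.
constexpr std::uint64_t ceilIntDiv(std::uint64_t a, std::uint64_t b) {
  return (a + b - 1) / b;
}

// Offset, in vector lines, at which array B would start when it is written
// directly after array A (mirrors "0 + CEIL_INT_DIV(numI16WordsA, CONNEX_VL)").
constexpr std::uint64_t offsetOfB(std::uint64_t wordsInA, std::uint64_t vl) {
  return ceilIntDiv(wordsInA, vl);
}
```

For the 128 x 128 `i32` arrays, 16384 elements occupy 32768 16-bit words, which matches the constant in the inline asm; with an assumed `CONNEX_VL` of 128, array B would then start 256 lines after A.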

test/CodeGen/Connex/basictest.ll

				; RUN: llc < %s -march=bpfel | FileCheck %s

				define i32 @test0(i32 %X) {
				%tmp.1 = add i32 %X, 1
				ret i32 %tmp.1
				; CHECK-LABEL: test0:
				; CHECK: addi r1, 1
				}

				; CHECK-LABEL: store_imm:
				; CHECK: stw 0(r1), r{{[03]}}
				; CHECK: stw 4(r2), r{{[03]}}
				define i32 @store_imm(i32* %a, i32* %b) {
				entry:
				store i32 0, i32* %a, align 4
				%0 = getelementptr inbounds i32, i32* %b, i32 1
				store i32 0, i32* %0, align 4
				ret i32 0
				}

				@G = external global i8
				define zeroext i8 @loadG() {
				%tmp = load i8, i8* @G
				ret i8 %tmp
				; CHECK-LABEL: loadG:
				; CHECK: ld_64 r1
				; CHECK: ldb r0, 0(r1)
				}

test/CodeGen/Connex/lit.local.cfg

				if not 'Connex' in config.root.targets:
				    config.unsupported = True
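
The check in lit.local.cfg can be tried outside of lit with a minimal sketch like the one below. It assumes `config.root.targets` behaves like a collection of target names; `make_config` and `apply_target_gate` are hypothetical helpers written for this illustration and are not part of lit or of this revision.

```python
from types import SimpleNamespace


def make_config(targets):
    # Hypothetical stand-in for the config object lit passes to lit.local.cfg.
    # Only the two attributes used by the check are modelled.
    root = SimpleNamespace(targets=frozenset(targets))
    return SimpleNamespace(root=root, unsupported=False)


def apply_target_gate(config, target="Connex"):
    # Same logic as the lit.local.cfg above: if the target was not built,
    # mark every test in this directory as unsupported.
    if target not in config.root.targets:
        config.unsupported = True
    return config


with_connex = apply_target_gate(make_config(["X86", "Connex"]))
without_connex = apply_target_gate(make_config(["X86", "BPF"]))

print(with_connex.unsupported)     # False: tests in the directory are run
print(without_connex.unsupported)  # True: tests in the directory are skipped
```

With this gate, a build that leaves out the Connex target reports the tests under test/CodeGen/Connex as unsupported rather than failed.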

Add Connex vector processor back end
Needs Review (Public)

Diff 198358

CMakeLists.txt

CODE_OWNERS.TXT

include/llvm/ADT/Triple.h

include/llvm/Analysis/RegionInfoImpl.h

include/llvm/Analysis/ScalarEvolution.h

include/llvm/Analysis/ScalarEvolutionExpander.h

include/llvm/CodeGen/SelectionDAG.h

include/llvm/CodeGen/SelectionDAGISel.h

include/llvm/CodeGen/SlotIndexes.h

include/llvm/IR/Intrinsics.td

include/llvm/IR/IntrinsicsConnex.td

lib/CodeGen/LiveRangeCalc.cpp

lib/CodeGen/RegAllocGreedy.cpp

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

lib/Target/Connex/Connex.h

lib/Target/Connex/ConnexAsmPrinter.cpp

lib/Target/Connex/ConnexAsmPrinterLoopNests.h

lib/Target/Connex/ConnexConfig.h

lib/Target/Connex/ConnexFrameLowering.h

lib/Target/Connex/ConnexFrameLowering.cpp

lib/Target/Connex/ConnexHazardRecognizer.h

lib/Target/Connex/ConnexHazardRecognizer.cpp

lib/Target/Connex/ConnexHazardRecognizerPreRAScheduler.h

lib/Target/Connex/ConnexHazardRecognizerPreRAScheduler.cpp

lib/Target/Connex/ConnexISelDAGToDAG.cpp

lib/Target/Connex/ConnexISelLowering.h

lib/Target/Connex/ConnexISelLowering.cpp

lib/Target/Connex/ConnexInstrInfo.h

lib/Target/Connex/ConnexInstrInfo.cpp

lib/Target/Connex/ConnexMCInstLower.h

lib/Target/Connex/ConnexMCInstLower.cpp

lib/Target/Connex/ConnexRegisterInfo.h

lib/Target/Connex/ConnexRegisterInfo.cpp

lib/Target/Connex/ConnexSelectionDAGInfo.h

lib/Target/Connex/ConnexSelectionDAGInfo.cpp

lib/Target/Connex/ConnexSubtarget.h

lib/Target/Connex/ConnexSubtarget.cpp

lib/Target/Connex/ConnexTargetMachine.h

lib/Target/Connex/ConnexTargetMachine.cpp

lib/Target/Connex/ConnexTargetTransformInfo.h

lib/Target/Connex/InstPrinter/CMakeLists.txt

lib/Target/Connex/InstPrinter/ConnexInstPrinter.h

lib/Target/Connex/InstPrinter/ConnexInstPrinter.cpp

lib/Target/Connex/InstPrinter/LLVMBuild.txt

lib/Target/Connex/LLVMBuild.txt

lib/Target/Connex/MCTargetDesc/CMakeLists.txt

lib/Target/Connex/MCTargetDesc/ConnexAsmBackend.cpp

lib/Target/Connex/MCTargetDesc/ConnexELFObjectWriter.cpp

lib/Target/Connex/MCTargetDesc/ConnexMCAsmInfo.h

lib/Target/Connex/MCTargetDesc/ConnexMCCodeEmitter.cpp

lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.h

lib/Target/Connex/MCTargetDesc/ConnexMCTargetDesc.cpp

lib/Target/Connex/MCTargetDesc/LLVMBuild.txt

lib/Target/Connex/Misc.h

lib/Target/Connex/RecoverFromLlvmIR.h

lib/Target/Connex/Select_ADDf16_OpincaaCodeGen.h

lib/Target/Connex/Select_ADDi32_OpincaaCodeGen.h

lib/Target/Connex/Select_LTf16_OpincaaCodeGen.h

lib/Target/Connex/Select_MULTf16_OpincaaCodeGen.h

lib/Target/Connex/Select_MULTi32_ComplementedRepresentation_OpincaaCodeGen.h

lib/Target/Connex/Select_REDf16_OpincaaCodeGen.h

lib/Target/Connex/Select_REDi32_OpincaaCodeGen.h

lib/Target/Connex/Select_SHRAi32_OpincaaCodeGen.h

lib/Target/Connex/Select_SUBf16_OpincaaCodeGen.h

lib/Target/Connex/Select_SUBi32_OpincaaCodeGen.h

lib/Target/Connex/TargetInfo/CMakeLists.txt

lib/Target/Connex/TargetInfo/ConnexTargetInfo.cpp

lib/Target/Connex/TargetInfo/LLVMBuild.txt

lib/Target/LLVMBuild.txt

test/CodeGen/Connex/MatMul-128_i16.ll
