This is an archive of the discontinued LLVM Phabricator instance.

Add the Connex SIMD/vector processor back end
Needs Review · Public

Authored by alexsusu on Feb 28 2021, 11:57 AM.

Details

Reviewers

asb
efriedma
jpienaar
jfb
fedor.sergeev

Summary

Connex is an established, almost 30-year-old wide research vector processor (see, for example, http://users.dcae.pub.ro/~gstefan/2ndLevel/connex.html) with between 32 and 4096 lanes; the lane count is easily changed at synthesis time.
A notable feature is that the Connex processor has a local banked vector memory (each lane has its own local memory), which achieves 1-cycle latency for both direct and indirect loads and stores, so the memory bandwidth is very high.
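
To make the banked-memory idea concrete, here is a minimal host-side C++ model. It is only a sketch under stated assumptions: the names (BankedMemory, loadDirect, loadIndirect, makeDemoMemory), the lane count of 4 and the per-lane depth of 8 are hypothetical, and it assumes that a direct load gives every lane the same row index while an indirect load takes the row index from a per-lane address vector. It does not model timing or the actual Connex ISA.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative sizes only; the real lane count is between 32 and 4096.
constexpr std::size_t kLanes = 4;
constexpr std::size_t kRows = 8;  // assumed per-lane memory depth

using Vec = std::array<int16_t, kLanes>;

// Hypothetical model: one private memory bank per lane.
struct BankedMemory {
  std::array<std::array<int16_t, kRows>, kLanes> bank{};
};

// Fill each bank so that bank[lane][row] == lane * 100 + row.
BankedMemory makeDemoMemory() {
  BankedMemory m;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    for (std::size_t row = 0; row < kRows; ++row)
      m.bank[lane][row] = static_cast<int16_t>(lane * 100 + row);
  return m;
}

// Direct load (assumed semantics): every lane reads the same row of its own bank.
Vec loadDirect(const BankedMemory &m, std::size_t row) {
  Vec out{};
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    out[lane] = m.bank[lane][row % kRows];
  return out;
}

// Indirect load (assumed semantics): each lane reads the row named by its own
// entry in the address vector, still only from its own bank.
Vec loadIndirect(const BankedMemory &m, const Vec &addr) {
  Vec out{};
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    std::size_t row = static_cast<uint16_t>(addr[lane]) % kRows;
    out[lane] = m.bank[lane][row];
  }
  return out;
}
```

Because no lane ever touches another lane's bank in this model, all lanes can be served at once, which is the intuition behind the bandwidth claim above.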

The Connex vector processor has 16-bit signed integer execution units in each lane. It efficiently emulates (by inlining the emulation subroutines in the instruction selection pass) 32-bit int and IEEE 754-2008 compliant 16-bit floating point (Clang type _Float16, ARM C type __fp16, LLVM IR type half). The emulation subroutines are in the lib/Target/Connex/Select_*_OpincaaCodeGen.h files, which are included in the ConnexISelDAGToDAG.cpp module, in the ConnexDAGToDAGISel::Select() method. These emulation subroutines can be easily adjusted, for example to increase performance by sacrificing f16 accuracy; drop me an email to ask how to do it. (They currently total almost 1 MB of C++ code.)
The Connex vector processor does not currently support the float, double, or 64-bit integer types.
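
As an illustration of what emulating a 32-bit operation on 16-bit units involves, the following C++ sketch adds two 32-bit values using only 16-bit additions and an explicit carry. It is not taken from the Select_*_OpincaaCodeGen.h files: the names U32Pair, split32, join32 and add32 are hypothetical, and it uses unsigned 16-bit halves for clarity, whereas the Connex lanes have signed units (the two's complement bit patterns of the sums are the same).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation: a 32-bit value held as two 16-bit halves,
// as a lane with only 16-bit execution units would have to store it.
struct U32Pair {
  uint16_t lo;
  uint16_t hi;
};

U32Pair split32(uint32_t v) {
  return {static_cast<uint16_t>(v & 0xFFFFu), static_cast<uint16_t>(v >> 16)};
}

uint32_t join32(U32Pair p) {
  return (static_cast<uint32_t>(p.hi) << 16) | p.lo;
}

// 32-bit addition built from 16-bit additions plus a carry.
U32Pair add32(U32Pair a, U32Pair b) {
  uint16_t lo = static_cast<uint16_t>(a.lo + b.lo);
  // If the low halves wrapped around, the truncated sum is smaller than a.lo.
  uint16_t carry = (lo < a.lo) ? 1 : 0;
  uint16_t hi = static_cast<uint16_t>(a.hi + b.hi + carry);
  return {lo, hi};
}
```

For example, join32(add32(split32(0x0001FFFF), split32(1))) yields 0x00020000: the low halves wrap to 0 and the carry increments the high half. A vector back end would apply the same sequence to every lane at once.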

More precisely, the back end targets a low-power variant of the Connex processor that is used as an accelerator. The working compiler is described at https://dl.acm.org/doi/10.1145/3406536 and at https://sites.google.com/site/connextools/ .

Note that our back end currently targets only our Connex Opincaa assembler (very easy to learn and use), available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .
The Connex Opincaa assembler allows running code that is agnostic to both the Connex vector length and the host (CPU).

The ISA of the Connex vector processor is available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/blob/master/ConnexISA.pdf .
The Connex vector processor also has an open source C++ simulator, available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .

The mailing list for the Connex processor and tools is: https://groups.google.com/forum/#!forum/connex-tools .

An interesting feature is that, in order to support recovering from the instruction selection pass's SelectionDAG back to the original source (C) code, we require adding a simple data structure in include/llvm/CodeGen/SelectionDAG.h (and helper methods in related files) that associates each SDValue with the LLVM IR Value object it was translated from:

DenseMap<const Value*, SDValue> *crtNodeMapPtr
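
A rough idea of how such a map could be used to get from a DAG node back to its IR value is sketched below. This is a standalone approximation, not the patch's code: Value and SDValue here are simplified stand-in structs rather than the LLVM classes, std::unordered_map replaces DenseMap so the example compiles without LLVM, and findOrigin is a hypothetical helper.

```cpp
#include <string>
#include <unordered_map>

// Simplified stand-ins for llvm::Value and llvm::SDValue (assumptions for
// illustration; the real classes carry far more state).
struct Value {
  std::string name;
};

struct SDValue {
  int nodeId;
};

// Same shape as the map in the summary: IR value -> DAG value.
using NodeMap = std::unordered_map<const Value *, SDValue>;

// Hypothetical reverse lookup: find the IR value a DAG node was built from.
// A linear scan is enough for a sketch; returns nullptr if nothing matches.
const Value *findOrigin(const NodeMap &map, SDValue node) {
  for (const auto &entry : map)
    if (entry.second.nodeId == node.nodeId)
      return entry.first;
  return nullptr;
}
```

With two values a and b mapped to nodes 1 and 2, findOrigin(map, SDValue{2}) returns &b, and a node that was never recorded returns nullptr.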

The Connex back end is 4 years old. We published 2 academic papers on it at ACM TECS and a CGO workshop: https://dl.acm.org/citation.cfm?id=3306166 . However, we are still adding features to the back end.

Small note: the Connex back end is rather small and builds fast (in ~3-5 minutes, single-threaded, on a decent machine). In Apr 2019 its built objects totaled 71,168K, while the smallest LLVM back end, MSP430, had 63,387K and the biggest ones, X86 and AMDGPU, had 359,736K and 488,309K respectively.

Importantly, I think the test/MC/Connex folder should not be populated for this patch, because the Connex back end can only generate assembly code intended for the special Opincaa assembler, which is not integrated in LLVM. Other back ends do a similar thing, such as the NVPTX back end, which doesn't support object file generation. The Connex back end doesn't support object file generation either.
The eBPF+ConnexS processor has the same ABI as the eBPF processor it extends, except that Connex-S natively supports only 16-bit integers and can access the banked vector memory only by line (so Connex-S can't perform unaligned accesses).

The Connex processor is currently implemented on FPGA, but it has also been implemented in silicon:

an older version for HDTV: Gheorghe M. Stefan, "The CA1024: A Massively Parallel Processor for Cost-Effective HDTV", 2006 (http://users.dcae.pub.ro/~gstefan/2ndLevel/images/connex_v4.ppt)
M. Malita and Gheorghe M. Stefan, "Map-scan Node Accelerator for Big-data"
Gheorghe M. Stefan and Mihaela Malita, "Can One-Chip Parallel Computing Be Liberated From Ad Hoc Solutions? A Computation Model Based Approach and Its Implementation"

Committing the patch with the main CMakeLists.txt and the Triple.h file separately first.

Diff Detail

Event Timeline

alexsusu created this revision. (Feb 28 2021, 11:57 AM)

Herald added subscribers: dexonsmith, pengfei, kristof.beyls and 2 others. (Feb 28 2021, 11:57 AM)

alexsusu requested review of this revision. (Feb 28 2021, 11:57 AM)

Herald added a project: Restricted Project. (Feb 28 2021, 11:57 AM)

Herald added a subscriber: llvm-commits.

alexsusu retitled this revision from Add Connex vector processor back end to Add the Connex SIMD/vector processor back end. (Feb 28 2021, 11:58 AM)

alexsusu edited the summary of this revision.

Harbormaster completed remote builds in B91243: Diff 326990. (Feb 28 2021, 1:18 PM)

alexsusu edited the summary of this revision. (Mar 2 2021, 9:37 AM)

alexsusu edited the summary of this revision.

alexsusu added a child revision: D97783: Add the Connex SIMD/vector processor back end (main back end patch). (Mar 2 2021, 10:10 AM)

alexsusu edited the summary of this revision. (Mar 4 2021, 6:18 AM)

alexsusu added reviewers: asb, efriedma.

alexsusu added reviewers: jpienaar, jfb, fedor.sergeev. (Jan 28 2023, 8:24 AM)

Herald added a project: Restricted Project. (Jan 28 2023, 8:24 AM)

Herald added a subscriber: kosarev.

dexonsmith removed a subscriber: dexonsmith. (Jan 28 2023, 9:04 AM)

Needs the triple unit tests

llvm/CMakeLists.txt
432	New targets start in experimental state, not sure if we have a separate list of them but you should just drop this

Addressed review of Matt Arsenault (put the Connex back end in the experimental list, LLVM_EXPERIMENTAL_TARGETS_TO_BUILD, in CMakeLists.txt; added the source file llvm/unittests/TargetParser/TripleTest.cpp).

Harbormaster completed remote builds in B217409: Diff 502421. (Mar 5 2023, 3:45 AM)

Revision Contents

Path                                          Size
llvm/CMakeLists.txt                           2 lines
llvm/include/llvm/TargetParser/Triple.h       1 line
llvm/unittests/TargetParser/TripleTest.cpp    3 lines

Diff 502421

llvm/CMakeLists.txt

Inline comment (arsenm, Not Done), anchored after Hexagon in LLVM_ALL_TARGETS: New targets start in experimental state, not sure if we have a separate list of them but you should just drop this

  # List of all targets to be built by default:
  set(LLVM_ALL_TARGETS
    AArch64
    AMDGPU
    ARM
    AVR
    BPF
    Hexagon
    Lanai
    LoongArch
    Mips
    MSP430
    NVPTX
    PowerPC
    RISCV
    Sparc
    SystemZ
    VE
    WebAssembly
    X86
    XCore
    )

  # List of targets with JIT support:
  set(LLVM_TARGETS_WITH_JIT X86 PowerPC AArch64 ARM Mips SystemZ)

  set(LLVM_TARGETS_TO_BUILD "all"
    CACHE STRING "Semicolon-separated list of targets to build, or \"all\".")

- set(LLVM_EXPERIMENTAL_TARGETS_TO_BUILD ""
+ set(LLVM_EXPERIMENTAL_TARGETS_TO_BUILD "Connex"
    CACHE STRING "Semicolon-separated list of experimental targets to build.")

  option(BUILD_SHARED_LIBS
    "Build all libraries as shared libraries instead of static" OFF)

  option(LLVM_ENABLE_BACKTRACES "Enable embedding backtraces on crash." ON)
  if(LLVM_ENABLE_BACKTRACES)
    set(ENABLE_BACKTRACES 1)

llvm/include/llvm/TargetParser/Triple.h

Context: enum ArchType {

  armeb, // ARM (big endian): armeb
  aarch64, // AArch64 (little endian): aarch64
  aarch64_be, // AArch64 (big endian): aarch64_be
  aarch64_32, // AArch64 (little endian) ILP32: aarch64_32
  arc, // ARC: Synopsys ARC
  avr, // AVR: Atmel AVR microcontroller
  bpfel, // eBPF or extended BPF or 64-bit BPF (little endian)
  bpfeb, // eBPF or extended BPF or 64-bit BPF (big endian)
+ connex, // Connex vector processor
  csky, // CSKY: csky
  dxil, // DXIL 32-bit DirectX bytecode
  hexagon, // Hexagon: hexagon
  loongarch32, // LoongArch (32-bit): loongarch32
  loongarch64, // LoongArch (64-bit): loongarch64
  m68k, // M68k: Motorola 680x0 family
  mips, // MIPS: mips, mipsallegrex, mipsr6
  mipsel, // MIPSEL: mipsel, mipsallegrexe, mipsr6el

llvm/unittests/TargetParser/TripleTest.cpp

Context: TEST(TripleTest, EndianArchVariants) {

  T.setArch(Triple::bpfeb);
  EXPECT_EQ(Triple::bpfeb, T.getBigEndianArchVariant().getArch());
  EXPECT_EQ(Triple::bpfel, T.getLittleEndianArchVariant().getArch());

  T.setArch(Triple::bpfel);
  EXPECT_EQ(Triple::bpfeb, T.getBigEndianArchVariant().getArch());
  EXPECT_EQ(Triple::bpfel, T.getLittleEndianArchVariant().getArch());

+ T.setArch(Triple::connex);
+ EXPECT_EQ(Triple::connex, T.getArch());
+
  T.setArch(Triple::mips64);
  EXPECT_EQ(Triple::mips64, T.getBigEndianArchVariant().getArch());
  EXPECT_EQ(Triple::NoSubArch, T.getBigEndianArchVariant().getSubArch());
  EXPECT_EQ(Triple::mips64el, T.getLittleEndianArchVariant().getArch());
  EXPECT_EQ(Triple::NoSubArch, T.getLittleEndianArchVariant().getSubArch());

  T.setArch(Triple::mips64, Triple::MipsSubArch_r6);
  EXPECT_EQ(Triple::mips64, T.getBigEndianArchVariant().getArch());