
Add the Connex SIMD/vector processor back end (main back end patch)
Needs Review · Public

Authored by alexsusu on Mar 2 2021, 10:09 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

Connex is an established, almost 30-year-old, wide research vector processor (see, for example, http://users.dcae.pub.ro/~gstefan/2ndLevel/connex.html) with between 32 and 4096 lanes; the lane count is easily changed at synthesis time.
A very interesting feature is that the Connex processor has a local banked vector memory (each lane has its own local memory), which achieves 1-cycle latency for both direct and indirect loads and stores, so the aggregate memory bandwidth is very high.
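
To make the banked layout concrete, here is a minimal host-side C++ sketch of a memory with one private bank per lane. It is only an illustration: the class name BankedVectorMemory and its methods are hypothetical, and the assumption that an indirect load lets each lane read its own bank at a per-lane address is my reading of the description above, not taken from the Connex ISA document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not the real hardware or simulator API):
// each lane owns a private bank of 16-bit words.
class BankedVectorMemory {
public:
  BankedVectorMemory(std::size_t lanes, std::size_t linesPerBank)
      : banks_(lanes, std::vector<int16_t>(linesPerBank, 0)) {}

  std::size_t lanes() const { return banks_.size(); }

  // Direct store: every lane writes its element at the same line.
  void store(std::size_t line, const std::vector<int16_t>& v) {
    assert(v.size() == banks_.size());
    for (std::size_t lane = 0; lane < banks_.size(); ++lane)
      banks_[lane].at(line) = v[lane];
  }

  // Direct load: every lane reads the same line of its own bank.
  std::vector<int16_t> load(std::size_t line) const {
    std::vector<int16_t> out(banks_.size());
    for (std::size_t lane = 0; lane < banks_.size(); ++lane)
      out[lane] = banks_[lane].at(line);
    return out;
  }

  // Indirect load (assumed semantics): lane i reads its own bank
  // at the address held in addr[i].
  std::vector<int16_t> loadIndirect(const std::vector<uint16_t>& addr) const {
    assert(addr.size() == banks_.size());
    std::vector<int16_t> out(banks_.size());
    for (std::size_t lane = 0; lane < banks_.size(); ++lane)
      out[lane] = banks_[lane].at(addr[lane]);
    return out;
  }

private:
  std::vector<std::vector<int16_t>> banks_;
};
```

Because no two lanes ever share a bank in this model, a direct or an indirect access touches each bank exactly once, which is the property that makes a single-cycle access plausible.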

The Connex vector processor has 16-bit signed integer Execution Units in each lane. It efficiently emulates (by inlining the emulation subroutines in the instruction selection pass) 32-bit int and IEEE 754-2008 compliant 16-bit floating point (Clang type _Float16, C for ARM __fp16, LLVM IR half type). The emulation subroutines are in the lib/Target/Connex/Select_*_OpincaaCodeGen.h files, which are included in the ConnexISelDAGToDAG.cpp module, in the ConnexDAGToDAGISel::Select() method. These emulation subroutines can be easily adjusted, for example to increase performance by sacrificing f16 accuracy - drop me an email to ask how to do it. (They currently total almost 1 MB of C++ code.)
The Connex vector processor does not currently support the float, double, or 64-bit integer types.
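
As an illustration of what emulating a 32-bit operation on 16-bit units involves, here is a small host-side C++ sketch of a 32-bit addition built from 16-bit halves with an explicit carry. It is not the code from the Select_*_OpincaaCodeGen.h files; the names U32Halves, split32, join32 and add32 are hypothetical, and unsigned halves are used here only to keep the C++ free of signed-overflow issues.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit value kept as two 16-bit halves, as a 16-bit lane would hold it.
struct U32Halves {
  uint16_t lo;
  uint16_t hi;
};

U32Halves split32(uint32_t v) {
  return {static_cast<uint16_t>(v & 0xFFFFu), static_cast<uint16_t>(v >> 16)};
}

uint32_t join32(U32Halves h) {
  return (static_cast<uint32_t>(h.hi) << 16) | h.lo;
}

// 32-bit add using only 16-bit results: add the low halves,
// derive the carry, then add the high halves plus the carry.
U32Halves add32(U32Halves a, U32Halves b) {
  uint16_t lo = static_cast<uint16_t>(a.lo + b.lo);  // wraps modulo 2^16
  uint16_t carry = (lo < a.lo) ? 1 : 0;              // wrap-around means carry out
  uint16_t hi = static_cast<uint16_t>(a.hi + b.hi + carry);
  return {lo, hi};
}
```

One 32-bit add becomes two 16-bit adds plus a comparison, which gives a rough idea of why inlining such sequences during instruction selection matters for performance.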

More precisely, the back end targets a low-power variant of the Connex processor that is used as an accelerator. The working compiler is described at https://dl.acm.org/doi/10.1145/3406536 and at https://sites.google.com/site/connextools/ .

Note that our back end currently targets only our Connex Opincaa assembler (very easy to learn and use), available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .
The Connex Opincaa assembler makes it possible to run code that is agnostic to both the Connex vector length and the host (CPU).

The ISA of the Connex vector processor is available at https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/blob/master/ConnexISA.pdf .
The Connex vector processor also has an open source C++ simulator, available at the same location: https://gitlab.dcae.pub.ro/research/ConnexRelated/opincaa/ .

The mailing list for the Connex processor and tools is: https://groups.google.com/forum/#!forum/connex-tools .

An interesting feature is that, in order to support recovering the original source (C) code from the instruction selection pass's SelectionDAG, we require adding a simple data structure in include/llvm/CodeGen/SelectionDAG.h (and helper methods in related files) that records, for each LLVM IR Value object, the SDValue it was translated to:

DenseMap<const Value*, SDValue> *crtNodeMapPtr
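
A rough sketch of how such a map could be filled and then queried in the reverse direction (from a DAG node back to the IR value) follows. To stay self-contained it does not use LLVM headers: Value and SDValue are tiny stand-in structs, std::unordered_map stands in for DenseMap, and the helper names recordTranslation and findSourceValue are hypothetical rather than the helper methods added by the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-ins for llvm::Value and llvm::SDValue (illustration only).
struct Value {
  std::string name;
};
struct SDValue {
  int nodeId;
};

// std::unordered_map plays the role of DenseMap<const Value*, SDValue>.
using NodeMap = std::unordered_map<const Value*, SDValue>;

// Remember which DAG node an IR value was translated to.
void recordTranslation(NodeMap& map, const Value* v, SDValue n) {
  map[v] = n;
}

// Reverse lookup: find the IR value a DAG node came from.
// Linear scan, since the map is keyed by the IR value.
const Value* findSourceValue(const NodeMap& map, SDValue n) {
  for (const auto& entry : map)
    if (entry.second.nodeId == n.nodeId)
      return entry.first;
  return nullptr;  // node was not created from a recorded IR value
}
```

Keying by the IR value makes the forward lookup cheap while the DAG is being built; the reverse direction shown here costs a scan, which is acceptable only if it is used sparingly.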

The Connex back end is 4 years old. We published 2 academic papers on it, in ACM TECS and at a CGO workshop: https://dl.acm.org/citation.cfm?id=3306166 . However, we are still adding features to the back end.

Small note: the Connex back end is rather small and builds fast (in ~3-5 mins, single-threaded, on a decent machine). In Apr 2019 the built objects totaled 71,168K, while the smallest LLVM back end, MSP430, had 63,387K and the biggest ones, X86 and AMDGPU, had 359,736K and 488,309K respectively.

An important point: I think the test/MC/Connex folder should not be populated for this patch, because the Connex back end can generate only assembly code, which must be consumed by the special Opincaa assembler, and that assembler is not integrated in LLVM. Other back ends do something similar, for example the NVPTX back end, which doesn't support object file generation. The Connex back end doesn't support object file generation either.
The eBPF+Connex-S processor has the same ABI as the eBPF processor it extends, except that Connex-S natively supports only 16-bit integers and can access the banked vector memory only by line (so Connex-S can't perform unaligned accesses).

The Connex processor is currently implemented in FPGA, but has also been implemented in silicon:

an older version for HDTV: Gheorghe M. Stefan, "The CA1024: A Massively Parallel Processor for Cost-Effective HDTV", 2006 (http://users.dcae.pub.ro/~gstefan/2ndLevel/images/connex_v4.ppt)
M. Malita and Gheorghe M. Stefan, "Map-scan Node Accelerator for Big-data"
Gheorghe M. Stefan and Mihaela Malita, "Can One-Chip Parallel Computing Be Liberated From Ad Hoc Solutions? A Computation Model Based Approach and Its Implementation"

Committing all the Connex back end files and a few other LLVM files that I had to touch for the back end to work well.

Event Timeline

alexsusu created this revision. (Mar 2 2021, 10:09 AM)
alexsusu requested review of this revision. (Mar 2 2021, 10:09 AM)
Herald added a project: Restricted Project. (Mar 2 2021, 10:09 AM)
alexsusu retitled this revision from Add the Connex SIMD/vector processor back end to Add the Connex SIMD/vector processor back end (main back end patch).