); // use ALU1, ALU2, and LOAD
+```
+
+
+These statements allow a latency rule (or subunit) to tie a set of instructions to functional unit instances. When there is more than one instance of the specified unit, or if the unit is declared as a functional unit group, one of those units is selected at compile time. Likewise, if the unit is a subunit of one or more functional units, one of the “parent” functional units is selected.
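+
+For illustration, here is a minimal hand-written sketch (the family, CPU, unit, and subunit names are hypothetical, and the functional unit template definitions are omitted). The ALU template is instantiated twice, so a subunit whose fus() statement names ALU can be assigned to either instance, while a subunit that names the group ALU_OR_LOAD can be assigned to any member of that group:
+
+
+```
+protected phases MyFamily { F1, E[1..4] };
+
+cpu MyCpu("mycpu") {
+  issue(F1) s0;
+  func_unit ALU<0> U0();      // first instance of the ALU unit template
+  func_unit ALU<0> U1();      // second instance of the ALU unit template
+  func_unit LOAD<0> U2();
+}
+
+func_group ALU_OR_LOAD: ALU, LOAD;
+
+// Tied to the ALU template: one of U0 or U1 is selected at compile time.
+subunit alu_op() {{ def(E1, $0); fus(ALU, 1); }}
+
+// Tied to a functional unit group: any member unit may be selected.
+subunit alu_or_load_op() {{ def(E1, $0); fus(ALU_OR_LOAD, 1); }}
+```
+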
+
+### **Machine Description Compiler Artifacts**
+
+What does all of this produce?
+
+The primary artifact of the MDL compiler is a set of data that we associate with each instruction description in a targeted compiler. For each instruction, at compiler-build time we produce a list of objects, each of which describes that instruction’s behavior on a single functional unit instance. The instruction will have one of these objects for each functional unit instance that it can be scheduled on across all CPUs. These are written out as a set of auto-initialized collections of objects that are attached to instruction templates in the target compiler.
+
+Each of these objects describes an instruction’s behavior cycle by cycle:
+
+
+
+* Which operands’ registers it reads and writes.
+* What register constraints are applied to its operands.
+* What resources it uses, holds, or reserves.
+* What explicit functional unit and issue slots it uses.
+* What pooled resources need to be allocated.
+
+The other primary artifact is a set of objects and methods for managing the low-level details of instruction scheduling and register allocation. This includes methods to build and manage resource pools, pipeline models, resource reservation infrastructure, and instruction bundling, all specialized for the input machine description.
+
+As part of this effort, we will incrementally modify the LLVM compiler to optionally use this information alongside the SchedMachineModel and Itinerary methodologies.
+
+
+
+
+
+![alt_text](images/image1.png "image_tooltip")
+
+
+
+### **Using the MDL Language in LLVM**
+
+The proposed use case for the MDL language is as an alternative specification for the architecture description currently embodied in TableGen Schedules and Itineraries, particularly for architectures for which Schedules and Itineraries are not expressive enough. It is explicitly _not_ the intent that it “replace TableGen”. But we believe that the MDL language is a better language (vs Schedules and Itineraries) for a large class of accelerators, and can be used effectively alongside TableGen.
+
+We’ve written a tool (TdScan) which extracts enough information from TableGen descriptions so that we can sync instruction definitions with architecture definitions. TdScan can also optionally scrape all of the Schedule and Itinerary information from a tablegen description and produce an equivalent\*\* MDL description.
+
+So there are several possible MDL usage scenarios:
+
+
+
+* _Current:_ Given a complete tablegen description with schedules or itineraries, scrape the architecture information and create an MDL description of the architecture every time you build the compiler.
+* _Transitional:_ Scrape an existing tablegen description and keep the generated MDL file, using it as the architecture description going forward.
+* _Future (potentially):_ When writing a compiler for a new architecture, write an MDL description rather than schedules and/or itineraries.
+
+The general development flow of using an MDL description in LLVM looks like this:
+
+
+
+1. Write an architecture description (or scrape one from an existing tablegen description).
+ 1. Instructions, operands, register descriptions in .td files
+ 2. Microarchitecture description in .mdl files
+2. Compile TD files with TableGen
+3. Use TdScan to scrape instruction, operand, and register information from tablegen, producing a .mdl file
+4. Compile the top-level MDL file (which includes the scraped Tablegen information). This produces C++ code for inclusion in llvm.
+5. Build LLVM.
+
+
+
+
+
+![alt_text](images/image2.png "image_tooltip")
+
+
+
+#### **TdScan**
+
+To synchronize an MDL architecture description with llvm TableGen descriptions, we’ve written a tool which scrapes information that the MDL compiler needs from Tablegen files. In the general case, it collects basic information about registers, register classes, operands, and instruction definitions, and it produces an “mdl” file which can be processed by the MDL compiler to sync an architecture description to the tablegen descriptions of instructions.
+
+For currently upstreamed targets that use Schedules or Itineraries, TdScan can also extract the whole architecture specification from the tablegen files, and produce an MDL description of the architecture. We’ve used this approach to prove out our llvm integration with upstreamed targets. The integration and testing of this is ongoing.
+
+
+#### **Upstream Targets**
+
+In general, upstream targets have no compelling need for MDL descriptions - the existing Schedules and/or Itinerary descriptions are field-tested. However, there are a few benefits to using an MDL description for existing targets. The primary benefit is that the MDL descriptions are typically quite a bit smaller, more succinct, and (we believe) more intuitive than the equivalent TableGen descriptions.
+
+
+
+
+| CPU     | MDL Lines of Code | Tablegen Lines of Code |
+| ------- | ----------------- | ---------------------- |
+| AArch64 | 2866              | 21429                  |
+| AMDGPU  | 299               | 388                    |
+| ARM     | 3380              | 9371                   |
+| Hexagon | 1947              | 17625                  |
+| Lanai   | 87                | 69                     |
+| Mips    | 472               | 3003                   |
+| PowerPC | 4251              | 4276                   |
+| RISCV   | 123               | 2909                   |
+| Sparc   | 237               | 123                    |
+| SystemZ | 1638              | 9224                   |
+| X86     | 5686              | 25631                  |
+
+
+
+
+\*\* Note: these numbers are generated using “index-based” references in subunit/latency rules, rather than symbolic references. These typically yield 10-20% fewer lines of MDL description than when operand names are used, almost entirely due to operand name differences between instruction definitions (like “dest” vs “dst”, or “src1” vs “s1”). However, the databases produced by the two approaches are virtually identical - albeit ordered differently.
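+
+To illustrate the difference, here is the same kind of subunit rule written both ways (a sketch; the unit and operand names are hypothetical):
+
+
+```
+// Index-based: $0 refers to the instruction's first flattened operand.
+subunit add_op() {{ def(E1, $0); fus(ALU, 1); }}
+
+// Symbolic: the destination operand is referenced by name, so instruction
+// definitions must agree on that operand name ("dst" vs "dest", etc).
+subunit add_op_named() {{ def(E1, $dst); fus(ALU, 1); }}
+```
+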
+
+
+#### **Syncing Instruction Information**
+
+The MDL compiler needs 3 pieces of information from tablegen for each machine instruction:
+
+
+
+1. The instruction opcode name
+2. Each operand’s name, type, and order of appearance in an instruction instance
+3. The name(s) of the subunit(s) it can run on.
+
+Subunits are a new concept introduced with the MDL. The normal approach is to modify each tablegen instruction description to explicitly specify subunit assignments, which become an additional instruction attribute. The other approach is to use subunit template bases, which use regular expressions to tie instructions to subunits (just like InstRW records).
+
+As part of the build process, we use a program (“tdscan”) which scrapes the instruction information - including the subunit information - from a target’s tablegen files and generates information about the target’s instructions. TdScan allows us to stay in sync with changes to instruction definitions.
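+
+For illustration, here is a sketch of the instruction definitions that tdscan produces, with the subunit attribute attached (the instruction, operand, and subunit names are hypothetical):
+
+
+```
+instruction ADD_rr(GPR dst, GPR src1, GPR src2) { subunit(int_alu); }
+instruction LD_w(GPR dst, MEM addr) { subunit(load_store); }
+```
+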
+
+
+#### **Using the generated microarchitecture information in LLVM**
+
+There are two classes of services that the MDL database and associated APIs provide:
+
+
+
+* Detailed pipeline modeling for instructions (for all processors, for all functional units) including instruction latency calculations and resource usage (hazard management)
+* Parallel instruction bundling and instruction scheduling.
+
+The tablegen scraper (tdscan) can correctly scan all upstreamed targets and generate correct instruction, operand, and register class information for all of them.
+
+We can also extract high-level architecture information and generate correct MDL descriptions for all the upstreamed targets that have Schedules or Itineraries (AArch64, AMDGPU, AMD/R600, ARM, Hexagon, Lanai, Mips, PPC, RISCV, Sparc, SystemZ, X86). Usually, the new architecture spec is dramatically simpler than the tablegen descriptions.
+
+We provide code and libraries to do the following things - in machine-independent ways:
+
+
+
+* Calculate accurate instruction latencies.
+* Build and manage instruction bundles (parallel instructions), performing all the required legality checks and resource allocation based on information in the generated database.
+* Manage resource reservations and hazards for an instruction scheduler.
+* Determine latencies between instructions based on resource holds and reservations.
+* Query functional unit, issue slot, and resource assignments for a bundled/scheduled instruction.
+* Query the set of all register uses and defs for an instruction instance, with accurate timing information.
+* Manage functional unit register forwarding.
+
+There’s more we can do here, and a deeper integration with upstreamed LLVM is a long-term goal.
+
+
+#### **Current Status of the LLVM Integration (briefly)**
+
+
+
+* We can generate MDL full architecture specs for all upstreamed targets, and properly represent and use all metadata associated with Schedules and Itineraries.
+* The MDL database is used to properly calculate instruction latencies for all architectures. Caveat: we don’t yet fully convert Itinerary and Schedule forwarding information, since the LLVM model for forwarding is fundamentally different from the MDL model, and the provided information is typically incomplete.
+* We’ve integrated the MDL-based bundle-packing and hazard management into all the LLVM schedulers, with the exception of the Swing scheduler, which is still in progress. We’ve run all the standard tests, and in most cases we produce the same schedules, with positive and negative performance differences in the noise.
+
+
+
+
+### **Appendix A: Full Language Grammar**
+
+The definitive Antlr4-based grammar is in llvm/utils/MdlCompiler/mdl.g4.
+
+
+```
+architecture_spec : architecture_item+ EOF ;
+architecture_item : family_name
+ | cpu_def
+ | register_def
+ | register_class
+ | resource_def
+ | pipe_def
+ | func_unit_template
+ | func_unit_group
+ | subunit_template
+ | latency_template
+ | instruction_def
+ | operand_def
+ | derived_operand_def
+ | import_file
+ | predicate_def ;
+
+import_file : 'import' STRING ;
+
+family_name : 'family' IDENT ';' ;
+
+//---------------------------------------------------------------------------
+// Top-level CPU instantiation.
+//---------------------------------------------------------------------------
+cpu_def : 'cpu' IDENT ('(' STRING (',' STRING)* ')')?
+ '{' cpu_stmt* '}' ';'? ;
+
+cpu_stmt : pipe_def
+ | resource_def
+ | reorder_buffer_def
+ | issue_statement
+ | cluster_instantiation
+ | func_unit_instantiation
+ | forward_stmt ;
+
+cluster_instantiation : 'cluster' IDENT '{' cluster_stmt+ '}' ';'? ;
+
+cluster_stmt : resource_def
+ | issue_statement
+ | func_unit_instantiation
+ | forward_stmt ;
+
+issue_statement : 'issue' '(' IDENT ')' name_list ';' ;
+
+func_unit_instantiation : 'func_unit' func_unit_instance func_unit_bases*
+ IDENT '(' resource_refs? ')'
+ ('->' (pin_one | pin_any | pin_all))? ';'
+
+func_unit_instance : IDENT ('<>' | ('<' number '>'))?
+func_unit_bases : ':' func_unit_instance
+
+pin_one : IDENT ;
+pin_any : IDENT ('|' IDENT)+ ;
+pin_all : IDENT ('&' IDENT)+ ;
+
+//---------------------------------------------------------------------------
+// A single forwarding specification (in CPUs and Clusters).
+//---------------------------------------------------------------------------
+forward_stmt : 'forward' IDENT '->'
+ forward_to_unit (',' forward_to_unit)* ';' ;
+forward_to_unit : IDENT ('(' snumber ')')? ;
+
+//---------------------------------------------------------------------------
+// Functional unit template definition.
+//---------------------------------------------------------------------------
+func_unit_template : 'func_unit' IDENT base_list
+ '(' func_unit_params? ')'
+ '{' func_unit_template_stmt* '}' ';'? ;
+
+func_unit_params : fu_decl_item (';' fu_decl_item)* ;
+fu_decl_item : 'resource' name_list
+ | 'register_class' name_list ;
+
+func_unit_template_stmt : resource_def
+ | port_def
+ | connect_stmt
+ | subunit_instantiation ;
+
+port_def : 'port' port_decl (',' port_decl)* ';' ;
+port_decl : IDENT ('<' IDENT '>')? ('(' resource_ref ')')? ;
+connect_stmt : 'connect' IDENT
+ ('to' IDENT)? ('via' resource_ref)? ';' ;
+
+//---------------------------------------------------------------------------
+// Functional unit group definition.
+//---------------------------------------------------------------------------
+func_unit_group : FUNCGROUP IDENT ':' name_list ';' ;
+
+//---------------------------------------------------------------------------
+// Definition of subunit template instantiation.
+//---------------------------------------------------------------------------
+subunit_instantiation : (name_list ':')? subunit_statement
+ | name_list ':' '{' subunit_statement* '}' ';'? ;
+
+subunit_statement : 'subunit' subunit_instance (',' subunit_instance)* ';' ;
+subunit_instance : IDENT '(' resource_refs? ')' ;
+
+//---------------------------------------------------------------------------
+// Definition of subunit template definition.
+//---------------------------------------------------------------------------
+subunit_template : 'subunit' IDENT su_base_list '(' su_decl_items? ')'
+ (('{' subunit_body* '}' ';'?) |
+ ('{{' latency_items* '}}' ';'? )) ;
+
+su_decl_items : su_decl_item (';' su_decl_item)* ;
+su_decl_item : 'resource' name_list
+ | 'port' name_list ;
+
+su_base_list : (':' (IDENT | STRING_LITERAL))* ;
+
+subunit_body : latency_instance ;
+latency_instance : (name_list ':')? latency_statement
+ | name_list ':' '{' latency_statement* '}' ';'? ;
+latency_statement : 'latency' IDENT '(' resource_refs? ')' ';' ;
+
+//---------------------------------------------------------------------------
+// Latency template definition.
+//---------------------------------------------------------------------------
+latency_template : 'latency' IDENT base_list
+ '(' su_decl_items? ')'
+ '{' latency_items* '}' ';'? ;
+
+latency_items : (name_list ':')?
+ (latency_item | ('{' latency_item* '}' ';'?)) ;
+
+latency_item : latency_ref
+ | conditional_ref
+ | fus_statement ;
+
+//---------------------------------------------------------------------------
+// Conditional references
+//---------------------------------------------------------------------------
+conditional_ref : 'if' IDENT '{' latency_item* '}'
+ (conditional_elseif | conditional_else)? ;
+conditional_elseif : 'else' 'if' IDENT '{' latency_item* '}'
+ (conditional_elseif | conditional_else)? ;
+conditional_else : 'else' '{' latency_item* '}' ;
+
+//---------------------------------------------------------------------------
+// Basic references
+//---------------------------------------------------------------------------
+latency_ref : ref_type '(' latency_spec ')' ';' ;
+
+ref_type : ('use' | 'def' | 'usedef' | 'kill' |
+ 'hold' | 'res' | 'predicate') ;
+
+latency_spec : expr (':' number)? ',' latency_resource_refs
+ | expr ('[' number (',' number)? ']')? ',' operand
+ | expr ',' operand ',' latency_resource_refs ;
+
+expr : '-' expr
+ | expr ('*' | '/') expr
+ | expr ('+' | '-') expr
+ | '{' expr '}'
+ | '(' expr ')'
+ | IDENT
+ | number
+ | operand ;
+
+//---------------------------------------------------------------------------
+// Shorthand for a reference that uses functional units.
+//---------------------------------------------------------------------------
+fus_statement : 'fus' '(' (fus_item ('&' fus_item)* ',')?
+ snumber (',' fus_attribute)* ')' ';'
+ ;
+fus_item : IDENT ('<' (expr ':')? number '>')? ;
+
+fus_attribute : 'BeginGroup' | 'EndGroup' | 'SingleIssue' | 'RetireOOO' ;
+
+//---------------------------------------------------------------------------
+// Latency resource references
+//---------------------------------------------------------------------------
+latency_resource_refs : latency_resource_ref (',' latency_resource_ref)* ;
+
+latency_resource_ref : resource_ref ':' number (':' IDENT)?
+ | resource_ref ':' IDENT (':' IDENT)?
+ | resource_ref ':' ':' IDENT // no allocation
+ | resource_ref ':' '*' // allocate all members
+ | resource_ref ;
+
+operand : (IDENT ':')? '$' IDENT ('.' operand_ref)*
+ | (IDENT ':')? '$' number
+ | (IDENT ':')? '$$' number
+
+operand_ref : (IDENT | number) ;
+
+//---------------------------------------------------------------------------
+// Pipeline phase names definitions.
+//---------------------------------------------------------------------------
+pipe_def : protection? 'phases' IDENT '{' pipe_phases '}' ';'? ;
+protection : 'protected' | 'unprotected' | 'hard' ;
+pipe_phases : phase_id (',' phase_id)* ;
+phase_id : '#'? IDENT ('[' range ']')? ('=' number)? ;
+
+//---------------------------------------------------------------------------
+// Resource definitions: global in scope, CPU- or Datapath- or FU-level.
+//---------------------------------------------------------------------------
+resource_def : 'resource' ( '(' IDENT ('..' IDENT)? ')' )?
+ resource_decl (',' resource_decl)* ';' ;
+
+resource_decl : IDENT (':' number)? ('[' number ']')?
+ | IDENT (':' number)? '{' name_list '}'
+ | IDENT (':' number)? '{' group_list '}' ;
+
+resource_refs : resource_ref (',' resource_ref)* ;
+
+resource_ref : IDENT ('[' range ']')?
+ | IDENT '.' IDENT
+ | IDENT '[' number ']'
+ | IDENT ('|' IDENT)+
+ | IDENT ('&' IDENT)+ ;
+
+//---------------------------------------------------------------------------
+// List of identifiers.
+//---------------------------------------------------------------------------
+name_list : IDENT (',' IDENT)* ;
+group_list : IDENT ('|' IDENT)+
+ | IDENT ('&' IDENT)+ ;
+
+//---------------------------------------------------------------------------
+// List of template bases
+//---------------------------------------------------------------------------
+base_list : (':' IDENT)* ;
+
+//---------------------------------------------------------------------------
+// Register definitions.
+//---------------------------------------------------------------------------
+register_def : 'register' register_decl (',' register_decl)* ';' ;
+register_decl : IDENT ('[' range ']')? ;
+
+register_class : 'register_class' IDENT
+ '{' register_decl (',' register_decl)* '}' ';'?
+ | 'register_class' IDENT '{' '}' ';'? ;
+
+//---------------------------------------------------------------------------
+// Instruction definition.
+//---------------------------------------------------------------------------
+instruction_def : 'instruction' IDENT
+ '(' (operand_decl (',' operand_decl)*)? ')'
+ '{'
+ ('subunit' '(' name_list ')' ';' )?
+ ('derived' '(' name_list ')' ';' )?
+ '}' ';'? ;
+
+//---------------------------------------------------------------------------
+// Operand definition.
+//---------------------------------------------------------------------------
+operand_def : 'operand' IDENT
+ '(' (operand_decl (',' operand_decl)*)? ')'
+ '{' (operand_type | operand_attribute)* '}' ';'?
+ ;
+operand_decl : ((IDENT (IDENT)?) | '...') ('(I)' | '(O)')? ;
+
+operand_type : 'type' '(' IDENT ')' ';' ;
+
+operand_attribute : (name_list ':')? operand_attribute_stmt
+ | name_list ':' '{' operand_attribute_stmt* '}' ';'? ;
+operand_attribute_stmt : 'attribute' IDENT '=' (snumber | tuple)
+ ('if' ('lit' | 'address' | 'label')
+ ('[' pred_value (',' pred_value)* ']' )? )? ';' ;
+pred_value : snumber
+ | snumber '..' snumber
+ | '{' number '}' ;
+
+//---------------------------------------------------------------------------
+// Derived Operand definition.
+//---------------------------------------------------------------------------
+derived_operand_def : 'operand' IDENT base_list ('(' ')')?
+ '{' (operand_type | operand_attribute)* '}' ';'? ;
+
+//---------------------------------------------------------------------------
+// Predicate definition.
+//---------------------------------------------------------------------------
+predicate_def : 'predicate' IDENT ':' predicate_op? ';' ;
+
+predicate_op : pred_opcode '<' pred_opnd (',' pred_opnd)* ','? '>'
+ | code_escape
+ | IDENT ;
+code_escape : '[' '{' .*? '}' ']' ;
+
+pred_opnd : IDENT
+ | snumber
+ | STRING_LITERAL
+ | '[' IDENT (',' IDENT)* ']'
+ | predicate_op
+ | operand ;
+
+pred_opcode : 'CheckAny' | 'CheckAll' | 'CheckNot' | 'CheckOpcode'
+ | 'CheckIsRegOperand' | 'CheckRegOperand'
+ | 'CheckSameRegOperand' | 'CheckNumOperands'
+ | 'CheckIsImmOperand' | 'CheckImmOperand'
+ | 'CheckZeroOperand' | 'CheckInvalidRegOperand'
+ | 'CheckFunctionPredicate' | 'CheckFunctionPredicateWithTII'
+ | 'TIIPredicate'
+ | 'OpcodeSwitchStatement' | 'OpcodeSwitchCase'
+ | 'ReturnStatement' | 'MCSchedPredicate' ;
+
+//---------------------------------------------------------------------------
+// Match and convert a number, a set of numbers, and a range of numbers.
+//---------------------------------------------------------------------------
+number : NUMBER ;
+snumber : NUMBER | '-' NUMBER ;
+tuple : '[' snumber (',' snumber)* ']' ;
+range : number '..' number ;
+```
+
+
+
+### **Appendix B: Future Directions**
+
+
+#### **Memory Hierarchy**
+
+We need a first class representation of any compiler-managed memory hierarchy.
+
+Compiler-managed memory
+
+
+
+* Per level
+ * Size
+ * Addressable units
+ * Speed
+ * Latency
+ * Access method(s)
+ * Banking
+ * Sharing
+* Separate address spaces
+ * Code, Data, I/O, etc
+
+Caches
+
+
+
+* Per level
+ * Size
+ * Type (I, D, I/D)
+ * Replacement policy
+ * Mapping (direct, associativity)
+ * Line size
+ * Prefetching
+ * Miss cost modeling
+ * etc
+
+Synchronization policies
+
+Virtual Memory
+
+DMA system descriptions
+
+
+#### **Multi-Processor System Topology**
+
+
+### **Appendix C: RISC-V Generated Architecture Description**
+
+This is a complete, automatically generated machine description for RISC-V using our tool to scrape information from tablegen files. We can automatically generate MDL specifications for all targets that have schedules and/or itineraries. We include RISC-V here for illustrative purposes.
+
+The “Schedule” td files for RISC-V are approximately 1720 lines of tablegen, describing two full schedule models and one “default” model. The generated MDL file is ~120 lines of our machine description language.
+
+
+```
+//---------------------------------------------------------------------
+// This file is autogenerated from an LLVM Target Description File.
+//---------------------------------------------------------------------
+import "RISCV_instructions.mdl"
+
+//---------------------------------------------------------------------
+// Pipeline phase definitions
+//---------------------------------------------------------------------
+protected phases RISCV { F1, E[1..57] };
+
+//---------------------------------------------------------------------
+// CPU Description Classes (4 entries)
+//---------------------------------------------------------------------
+cpu RISCV("generic", "generic-rv32", "generic-rv64") {
+}
+
+cpu Rocket("rocket", "rocket-rv32", "rocket-rv64", "sifive-e20", "sifive-e21", "sifive-e24", "sifive-e31", "sifive-e34", "sifive-s21", "sifive-s51", "sifive-s54", "sifive-u54") {
+ protected phases defaults { LOAD_PHASE=3 };
+ issue(F1) s0;
+ func_unit RocketUnitALU<0> U0();
+ func_unit RocketUnitB<0> U1();
+ func_unit RocketUnitFPALU<0> U2();
+ func_unit RocketUnitFPDivSqrt<1> U3();
+ func_unit RocketUnitIDiv<1> U4();
+ func_unit RocketUnitIMul<0> U5();
+ func_unit RocketUnitMem<0> U6();
+}
+
+cpu SiFive7("sifive-7-series", "sifive-e76", "sifive-s76", "sifive-u74") {
+ protected phases defaults { LOAD_PHASE=3 };
+ issue(F1) s0, s1;
+ func_unit SiFive7PipeA<0> U0();
+ func_unit SiFive7PipeB<0>:SiFive7FDiv<1>:SiFive7IDiv<1> U1();
+}
+
+cpu SyntacoreSCR1("syntacore-scr1-base", "syntacore-scr1-max") {
+ protected phases defaults { LOAD_PHASE=2 };
+ issue(F1) s0;
+ func_unit SCR1_ALU<0> U0();
+ func_unit SCR1_CFU<0> U1();
+ func_unit SCR1_DIV<0> U2();
+ func_unit SCR1_LSU<0> U3();
+ func_unit SCR1_MUL<0> U4();
+}
+
+//---------------------------------------------------------------------
+// Functional Unit Groups
+//---------------------------------------------------------------------
+func_group SiFive7PipeAB: SiFive7PipeA, SiFive7PipeB;
+
+//---------------------------------------------------------------------
+// Subunit Definitions (58 entries)
+//---------------------------------------------------------------------
+subunit sub6() {{ def(E1, $0); fus(1); fus(Rocket, 0); }}
+subunit sub7() {{ def(E1, $0); fus(1); fus(SiFive7, 0); }}
+subunit sub8() {{ def(E1, $0); fus(1); fus(SyntacoreSCR1, 0); }}
+subunit sub49() {{ def(E1, $0); fus(RocketUnitALU, 1); fus(RocketUnitB, 1); }}
+subunit sub0() {{ def(E1, $0); fus(RocketUnitALU, 1); }}
+subunit sub41() {{ def(E1, $0); fus(RocketUnitB, 1); }}
+subunit sub56() {{ def(E1, $0); fus(RocketUnitMem, 1); }}
+subunit sub51() {{ def(E1, $0); fus(SCR1_ALU, 1); fus(SCR1_CFU, 1); }}
+subunit sub2() {{ def(E1, $0); fus(SCR1_ALU, 1); }}
+subunit sub42() {{ def(E1, $0); fus(SCR1_CFU, 1); }}
+subunit sub5() {{ def(E1, $0); fus(SCR1_LSU, 1); }}
+subunit sub45() {{ def(E1, $0); fus(SCR1_MUL, 1); }}
+subunit sub57() {{ def(E1, $0); fus(SiFive7PipeA, 1); }}
+subunit sub12() {{ def(E1, $0); fus(SiFive7PipeB, 1); }}
+subunit sub46() {{ def(E1, $1); fus(RocketUnitALU, 1); fus(RocketUnitB, 1); }}
+subunit sub17() {{ def(E1, $1); fus(RocketUnitB, 1); }}
+subunit sub48() {{ def(E1, $1); fus(SCR1_ALU, 1); fus(SCR1_CFU, 1); }}
+subunit sub19() {{ def(E1, $1); fus(SCR1_CFU, 1); }}
+subunit sub18() {{ def(E1, $1); fus(SiFive7PipeB, 1); }}
+subunit sub26() {{ def(E16, $0); fus(SiFive7PipeB&SiFive7IDiv<15>, 1); }}
+subunit sub33() {{ def(E2, $0); fus(RocketUnitFPALU, 1); }}
+subunit sub3() {{ def(E2, $0); fus(RocketUnitMem, 1); }}
+subunit sub20() {{ def(E2, $0); fus(SCR1_LSU<2>, 1); }}
+subunit sub13() {{ def(E2, $0); fus(SiFive7PipeA, 1); }}
+subunit sub35() {{ def(E20, $0); fus(RocketUnitFPDivSqrt<20>, 1); }}
+subunit sub40() {{ def(E25, $0); fus(RocketUnitFPDivSqrt<25>, 1); }}
+subunit sub37() {{ def(E27, $0); fus(SiFive7PipeB&SiFive7FDiv<26>, 1); }}
+subunit sub43() {{ def(E3, $0); fus(RocketUnitMem, 1); }}
+subunit sub52() {{ def(E3, $0); fus(SiFive7PipeA&SiFive7PipeB, 2); }}
+subunit sub4() {{ def(E3, $0); fus(SiFive7PipeA, 1); }}
+subunit sub50() {{ def(E3, $0); fus(SiFive7PipeAB, 1); fus(SiFive7PipeB, 1); }}
+subunit sub1() {{ def(E3, $0); fus(SiFive7PipeAB, 1); }}
+subunit sub34() {{ def(E3, $0); fus(SiFive7PipeB, 1); }}
+subunit sub47() {{ def(E3, $1); fus(SiFive7PipeAB, 1); fus(SiFive7PipeB, 1); }}
+subunit sub25() {{ def(E33, $0); fus(RocketUnitIDiv<33>, 1); }}
+subunit sub27() {{ def(E33, $0); fus(SCR1_DIV<33>, 1); }}
+subunit sub28() {{ def(E34, $0); fus(RocketUnitIDiv<34>, 1); }}
+subunit sub31() {{ def(E4, $0); fus(RocketUnitFPALU, 1); }}
+subunit sub44() {{ def(E4, $0); fus(RocketUnitIMul, 1); }}
+subunit sub39() {{ def(E5, $0); fus(RocketUnitFPALU, 1); }}
+subunit sub32() {{ def(E5, $0); fus(SiFive7PipeB, 1); }}
+subunit sub36() {{ def(E56, $0); fus(SiFive7PipeB&SiFive7FDiv<55>, 1); }}
+subunit sub29() {{ def(E6, $0); fus(RocketUnitFPALU, 1); }}
+subunit sub38() {{ def(E7, $0); fus(RocketUnitFPALU, 1); }}
+subunit sub30() {{ def(E7, $0); fus(SiFive7PipeB, 1); }}
+subunit sub21() {{ fus(1); fus(Rocket, 0); }}
+subunit sub22() {{ fus(1); fus(SiFive7, 0); }}
+subunit sub23() {{ fus(1); fus(SyntacoreSCR1, 0); }}
+subunit sub53() {{ fus(RocketUnitALU, 1); fus(RocketUnitB, 1); }}
+subunit sub9() {{ fus(RocketUnitB, 1); }}
+subunit sub14() {{ fus(RocketUnitMem, 1); }}
+subunit sub55() {{ fus(SCR1_ALU, 1); fus(SCR1_CFU, 1); }}
+subunit sub11() {{ fus(SCR1_CFU, 1); }}
+subunit sub16() {{ fus(SCR1_LSU, 1); }}
+subunit sub24() {{ fus(SCR1_LSU<2>, 1); }}
+subunit sub15() {{ fus(SiFive7PipeA, 1); }}
+subunit sub54() {{ fus(SiFive7PipeAB, 1); fus(SiFive7PipeB, 1); }}
+subunit sub10() {{ fus(SiFive7PipeB, 1); }}
+
diff --git a/llvm/docs/Mdl/RFC.md b/llvm/docs/Mdl/RFC.md
new file mode 100644
--- /dev/null
+++ b/llvm/docs/Mdl/RFC.md
@@ -0,0 +1,46 @@
+
+## MDL: A Micro-Architecture Description Language for LLVM
+
+November 2022 Reid Tatge [tatge@google.com](mailto:tatge@google.com)
+
+
+#### **TL;DR:**
+
+We’ve created a DSL and compiler for modeling micro-architecture that handles a very broad class of architectures - CPU, GPUs, VLIWs, DSPs, ML accelerators, and embedded devices. This effort grew out of a need to quickly develop and experiment with high-quality compilers and tools to facilitate rapid architecture exploration. We named the DSL “MDL” for “Microarchitecture Description Language”.
+
+While being significantly more expressive than TableGen’s Schedules and Itineraries used in LLVM, MDL is also more concise, and simpler to read and write, while supporting a much broader class of embedded and accelerator architectures. We can currently automatically _generate_ MDL descriptions for all upstream targets, which are in many cases 1/10 the size of the equivalent TableGen descriptions. We’ve integrated this with LLVM, and are sending out this RFC because we believe it could be valuable to the larger LLVM community.
+
+
+The MDL compiler, associated tools, and documentation are available as open source (at https://github.com/MPACT-ORG/llvm-project/tree/work), and we would like to explore adding this to the LLVM project, and encourage contributions from others.
+
+
+#### **Background**
+
+Over the last few years, we have been using LLVM to develop a compiler backend for Google’s TPU machine learning accelerators. TPUs have complex microarchitectures and pose a number of challenges that are not seen in typical LLVM targets:
+
+
+
+* Clustered VLIW with partitioned register files.
+* Extremely deep pipelines with complex hazard conditions
+* Instructions with functional-unit-specific and/or cluster-specific behaviors
+ * Non-trivial and/or instance-specific latencies
+ * Complex resource usage
+ * Functional-unit-specific register constraints
+* Shared/allocated encoding resources (instructions need 1..M of N resources)
+* Explicitly managed hardware resources (register ports, internal datapaths, busses, etc)
+
+While some of these problems manifest in a few upstream targets, this collection of problems is a superset of the problems directly addressed by LLVM - Schedules and Itineraries are simply not sufficient to model everything. Supporting this class of architecture is therefore code-intensive - it takes around 20,000 lines of C++ code to model the TPU sub-targets. This is brittle, hard to write, debug, test, and evolve over time. In contrast, the MDL description for these sub-targets is ~2,000 lines of text.
+
+
+#### **Status**
+
+
+
+* We’ve created the MDL language and compiler for describing microarchitecture details, a methodology for integrating it with TableGen files for any target, and a set of APIs that can be used in a machine-independent way to inform back-end passes such as bundle-packing, instruction scheduling, and register allocation.
+* To facilitate integration with LLVM, we built a tool which scrapes architectural information from TableGen files, and produces our MDL language for all upstream targets.
+* We’ve modified the CodeGen and MC libraries to (optionally) use our methodology for latency management.
+
+There is a lot more to do. For example, we plan to enhance existing back-end scheduling passes and register allocation passes to cleanly handle a larger class of embedded and accelerator architectures, based on MDL-generated information.
+
+We welcome feedback on the language design and associated tools and use model. You can find the MDL design documentation, compiler, and other tools in our github repo in llvm/docs/mdl.
+
diff --git a/llvm/docs/Mdl/ResourceGroups.md b/llvm/docs/Mdl/ResourceGroups.md
new file mode 100644
--- /dev/null
+++ b/llvm/docs/Mdl/ResourceGroups.md
@@ -0,0 +1,304 @@
+
+
+
+## Modeling Resource Groups
+
+Reid Tatge tatge@google.com
+
+
+[TOC]
+
+
+
+### Introduction
+
+The MDL language supports the specification and use of “resource groups” - sets of related resources that can be allocated like a pool:
+
+
+```
+ resource group { a, b, c, d, e, f };
+```
+
+
+Resource groups have CPU, Cluster, or Functional Unit Template scope, and can be passed as parameters to functional unit, subunit, or latency templates. You can pass an entire group to a template as a parameter:
+
+
+```
+ subunit yyy(group); // reference the entire group
+```
+
+
+Or you can pass a reference to a single member, using a C++ “struct”-like syntax:
+
+
+```
+ subunit xxx(group.a); // reference a single member of a group
+```
+
+
+When a group is passed to a template, you can allocate a single member of a group:
+
+
+```
+ def(E3, group:1); // allocate a single resource from the group
+```
+
+
+Or reference a named item of the group:
+
+
+```
+ def(E3, group.d); // use a named member of the group
+```
+
+
+Or reference the entire group:
+
+
+```
+ def(E3, group); // use all the resources in a group
+```
+
+
+However, you cannot cleanly reference a subset of a group (or an arbitrary set of resources).
+
+
+### Former interpretation of groups
+
+In the former interpretation, members of a resource group have the scope of the context they are defined in (CPU, Cluster, or Functional Unit Template). Resource groups defined in the same scope may define members with the same name, and these names can shadow other resource names defined in the same scope. So, for example, the following is legal:
+
+
+```
+ resource fun;
+ resource group g1 { happy, fun, ball }; // Don't tease this
+ resource group g2 { programming, is, fun };
+```
+
+
+In this case, we have defined 9 distinct resources in the same scope (including the group resources):
+
+
+```
+ fun, g1, g1.happy, g1.fun, g1.ball, g2, g2.programming, g2.is, g2.fun
+```
+
+
+The previous compiler allowed you to specify group members by name as long as they were unique in the current context and didn’t shadow other defined resources. In this case, “fun” is defined three times, so any use of one of those resources must qualify the reference:
+
+
+```
+ func_unit mu_fu fu1(fun, g1.fun, g2.fun); // passes 3 different resources
+
+
+```
+
+
+Grouped resources with unique names can simply be referenced by their name:
+
+
+```
+ subunit yyy(programming, is, happy);
+```
+
+
+
+### New model: Arbitrary grouping of resources
+
+There is a fairly common need to specify different subsets of a set of defined resources. The MDL has a methodology to support aspects of this, but in the general case we don’t have a direct syntax for making this easy to specify. This is particularly common with itineraries, where each stage specifies a different set of resources that it can use. For this use case, we’d like to be able to use groups to define subsets of defined resources, for example:
+
+
+```
+ resource res1, res2, res3, res4, res5, res6, res7, res8;
+ resource lows { res1, res2, res3, res4 };
+ resource highs { res5, res6, res7, res8 };
+ resource odds { res1, res3, res5, res7 };
+ resource evens { res2, res4, res6, res8 };
+ resource arbitrary { res1, res4, res5 };
+```
+
+
+In this case, all the group members with the same name refer to the same defined resource (in the current scope). This allows us to use groups to define arbitrary sets of defined resources, rather than defining distinct resources for each member.
+
+In the “fun” example from the previous section, rather than creating nine distinct resources, we would generate only seven: g1, g2, happy, fun, ball, programming, is - i.e., all the “fun” members refer to the same “fun” resource.
+
+This is a very minor change in the language interpretation. It removes the ability for two resource groups defined in the same scope to have distinct members with the same name, but that feature has relatively little utility compared to being able to define arbitrary subsets of defined resources.
+
+
+### Semantic and Syntax Changes
+
+Since this is primarily a change in the interpretation of resource groups, no syntax changes are strictly _required_. However, we would like to introduce a syntax for shortcutting the specification of a resource group as a template parameter. Consider the following example:
+
+
+```
+ resource group1 { res1, res2, res3 };
+ resource group2 { res3, res4, res5 };
+ resource group3 { res5, res6 };
+ subunit xyzzy(group1, group2, group3);
+```
+
+
+With the new interpretation, this defines 3 resource groups and (only) 6 resources (res1..res6).
+
+We introduce a syntax that allows you to define these groups implicitly as part of the instance, so that the explicit group definitions are unnecessary. We’ll also add syntax to set the default allocation for a resource group - either “one of” or “all of”.
+
+
+```
+ subunit xyzzy(res1|res2|res3, res3|res4|res5, res5|res6);
+ subunit plugh(res1&res2&res3, res3&res4&res5, res5&res6);
+```
+
+
+Explicitly declared groups can also be defined with this syntax. Note that all the “operators” (‘,’, ‘&’, and ‘|’) must be identical within a single definition:
+
+
+```
+ resource group1 { res1 | res2 | res3 };
+ resource group2 { res3 & res4 & res5 };
+ resource group3 { res5, res6 }; // equivalent to |
+```
+
+
+When a group declared with “&” is “used” without an explicit allocation (a la x.y), all of its members are used. When a group declared with “,” or “|” is used, only one member is allocated (a la x.1). We now have a syntax, x.\*, which allocates all of a group’s members, regardless of how it is declared.
+
+Implicitly declared groups can be used/declared in functional unit instances and subunit instances only. They cannot be used in latency instances (i.e., in subunit templates), since resources can only be declared in CPUs, clusters, and functional unit templates. We may add this capability in the future.
+
+As with the current syntax, note that defined group members are promoted to the scope that the group is defined in, so there’s no need to explicitly define the members of the group as normally defined resources. This change would formalize that promotion.
+
+There are a few minor aspects of this new capability that we need to error check. A resource group definition can have shared bits (“resource x:3”), and/or a phase specification (“resource(E1) x”) and we assume all items in the resource group have the same definition. If we allow a group to reference already defined resources, we _may_ want to ensure all the resources are the same as the group resource definition (which might be an implicit definition…). Or not - there may be some value in allowing different members of a resource group to have different phases, for example.
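+
+For example (a sketch with hypothetical resource names), a group definition can carry both a phase specification and shared bits, which all of its members are assumed to share:
+
+
+```
+ resource(E1) imm_pool:3 { imm0, imm1, imm2 };  // phase E1, 3 shared bits
+```
+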
+
+
+### General Design
+
+An important part of this design change is that for descriptions that don’t have groups with identically named members, the behavior doesn’t change, and this change should be transparent. (None of the existing descriptions have this issue.)
+
+In general, this design simplifies the compiler design of resources quite a bit. It complicates the bundle packing code a bit, since we must provide an explicit list of resource ids to allocate. We may want to handle reference groups and reference arrays the same way.
+
+
+#### Parser Changes
+
+We will modify the parser to recognize implicitly defined groups in template instance parameters, and create groups for each of those occurrences.
+
+
+```
+ subunit xxx(res1 | res2 | res3, res3 & res4);
+```
+
+
+produces (internally):
+
+
+```
+resource anon1 { res1 | res2 | res3 };
+resource anon2 { res3 & res4 };
+subunit xxx(anon1, anon2);
+```
+
+
+which in turn produces (internally):
+
+```
+ resource res1, res2, res3, res4;
+ resource anon1 { res1 | res2 | res3 };
+ resource anon2 { res3 & res4 };
+ subunit xxx(anon1, anon2);
+```
+
+
+We maintain a table of groups so that we share definitions across explicit and implicit definitions.
+
+
+#### Promoting Members
+
+In the front-end of the compiler, we preprocess resource group definitions in CPUs, Clusters, and Functional Unit Templates to promote members to the scope they are defined in. While doing this promotion, if the resource already exists, we want to ensure that any phase or shared-bits attributes are the same. As we promote the members of a group, we create a vector of ResourceDefs for the group definition to link each member to its promoted, defined resource. Each member contains an index into that list of ResourceDefs.
+
+
+#### Name Lookup
+
+In general, member name lookup is easier. For unqualified references (like “member”), we can eliminate the separate member-name lookups, since the member would have been promoted to a top-level reference. For qualified members (like “group.member”) the code can remain the way it is. We could also simplify it to reduce to a pointer to the promoted resource.
+
+
+#### Resource Id Assignment
+
+We no longer need to assign resource ids to either a group or to its declared members. A group is now simply a set of defined resources, and their associated ids.
+
+
+#### Accessing Member Ids
+
+Currently, a member’s id is the sum of its group id and its index in the group. In the new approach, a member’s index in the group is used to index into the group’s vector of ResourceDefs, and we use that resource’s id.
+
+
+#### Writing out Member Id Name Definitions
+
+When we write out definitions for resources, we no longer need to write out ids for resource groups, or their members. We can simply skip them.
+
+
+#### Building Resource Sets
+
+When we create permutations of pooled resource assignments, we must use a set of resource ids, rather than a simple range. We should do this the same way for arrays and groups.
+
+
+
+#### Output of the Database For Resource Groups and Arrays
+
+Rather than simply write out an initial resource id and a number of resources, for groups we need to write out a vector of the resource ids in the group. We may want to create a table of these, since there will be many duplicates. We will probably want to use the same mechanism for both Arrays and Groups, so that these can be treated the same way in the database and the bundle packer - even though an Array is guaranteed to have consecutive ids.
+
+We modify the PooledResourceRef definition - rather than provide a base resource id for the pool, we instead provide an array of resources associated with the pool. For example, today a PooledResourceRef looks like this:
+
+
+```
+ static std::vector PRES_101
+ {{RefUse,1,0,nullptr,47,2,&POOL_11}};
+```
+
+
+Currently, we only provide the base id of the pool/group, in this example 47. To implement the new methodology, we instead provide a pointer to an array of ids associated with the pool:
+
+
+```
+ static ResourceId MEMBERS_47 { 23, 43, 39, 35 };
+ static std::vector PRES_101
+ {{RefUse,1,0,nullptr,&MEMBERS_47,2,&POOL_11}};
+```
+
+
+
+#### Bundle Packing
+
+As in the database, a “Pool” is no longer a base plus a number of members. It is now a vector of explicit resource ids as part of the PooledResourceRef object. Rather than “compute” the resource ids in a pool, we just use the explicitly enumerated resource ids. (This is a one or two line change in the pool allocation code.)
+
+
+#### TdScan Changes
+
+Currently each stage of an itinerary can specify a set of resources to use in that stage, specifying either all of the resources or just one. Functional unit templates and subunit templates are defined to have a resource template argument for each stage. For each CPU, for each functional unit, TdScan generates a separate instance of the functional unit for each permutation of the stage resources. For example, given the set of InstrStages:
+
+
+```
+InstrStage: cycles=1, units=[ADD1, ADD2], timeinc=-1
+InstrStage: cycles=1, units=[UNIT1, UNIT2], timeinc=-1
+InstrStage: cycles=1, units=[STORE1, STORE2], timeinc=-1
+```
+
+
+We previously generated the following functional unit definitions:
+
+
+```
+ func_unit type name(ADD1, UNIT1, STORE1);
+ func_unit type name(ADD2, UNIT1, STORE1);
+ func_unit type name(ADD1, UNIT2, STORE1);
+ func_unit type name(ADD2, UNIT2, STORE1);
+ func_unit type name(ADD1, UNIT1, STORE2);
+ func_unit type name(ADD2, UNIT1, STORE2);
+ func_unit type name(ADD1, UNIT2, STORE2);
+ func_unit type name(ADD2, UNIT2, STORE2);
+```
+
+
+This is all the permutations of the resource sets associated with the three stages. With the new syntax, we generate the following:
+
+ `func_unit type name(ADD1|ADD2, UNIT1|UNIT2, STORE1|STORE2);`
+
+and let the MDL compiler create the allocation pools to implement the permutations automatically.
+
diff --git a/llvm/docs/Mdl/UsingTheMDLCompiler.md b/llvm/docs/Mdl/UsingTheMDLCompiler.md
new file mode 100644
--- /dev/null
+++ b/llvm/docs/Mdl/UsingTheMDLCompiler.md
@@ -0,0 +1,575 @@
+
+
+## Using TdScan and the MDL compiler
+
+Reid Tatge tatge@google.com
+
+
+[TOC]
+
+
+
+#### **Overview of the process**
+
+This document describes the steps to building the MDL compiler and Tablegen scraper (tdscan) so that you can create and debug MDL instruction descriptions for LLVM.
+
+The “normal” process of using an MDL machine description for a target is to write the overall architecture description by hand, and generate an instruction description by scraping information from the tablegen description of the target. The generated instruction description is explicitly imported by the MDL compiler to tie the hand-written architecture description to the instruction descriptions in the target’s tablegen files.
+
+To keep the architecture in sync with the LLVM description, we extract and scrape the tablegen information as part of the compiler build process. The extraction process uses tablegen to write out all the target information, and the scraper scans this file and produces an MDL-based description of instructions, operands, registers, and register classes. This is imported by the architecture description so that the two descriptions are compiled together. This produces .cc and .h files that can be included in the LLVM build.
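+
+For example, a hand-written architecture spec for a hypothetical target “MyCpu” would typically start by importing the generated instruction file (all names here are illustrative):
+
+
+```
+// MyCpu.mdl : hand-written architecture description.
+import "MyCpu_instructions.mdl"   // instruction/operand/register info generated by tdscan
+
+protected phases MyCpu { F1, E[1..8] };
+
+cpu generic("mycpu-generic") {
+  issue(F1) s0;
+  func_unit ALU<0> U0();
+}
+```
+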
+
+
+
+#### **Scraping Information from Tablegen**
+
+To synchronize the MDL for a target with LLVM, we need to extract all of the instruction, operand and register definitions from the tablegen description. The first step in this process is to get tablegen to dump its internal representation of the target description to a plain text file.
+
+
+##### **Create tablegen information for a target:**
+
+This step uses the normal tablegen program to produce a dump of all the tablegen information for any LLVM target.
+
+
+
+* export LLVM=<path to llvm>
+* export TARGET=<family-name>
+
+ Where family-name is one of: AArch64, AMDGPU, ARC, ARM, AVR, BPF, CSKY, Hexagon, Lanai, M68k, Mips, MSP430, NVPTX, PPC, RISCV, Sparc, SystemZ, VE, WebAssembly, X86, XCore
+Where family-name is one of AArch64, AMDGPU, ARC, ARM, AVR, BPF, CSKY, Hexagon, Lanai, M68k, Mips, MSP430, NVPTX, PPC, RISCV, Sparc, SystemZ, VE, WebAssembly, X86, XCore
+* .../clang-tblgen -print-records \
+ -I $LLVM/llvm-project/llvm/include/ \
+ -I $LLVM/llvm/include/llvm/IR/ \
+ -I $LLVM/llvm-project/llvm/lib/Target/$TARGET/ \
+ $LLVM/llvm-project/llvm/lib/Target/$TARGET/$(TARGET).td > ~/$(TARGET).txt
+
+This creates the file <family\_name>.txt, which can be processed by “tdscan” to produce an MDL file that describes the ISA of the processor family.
+
+
+##### **Scraping the tablegen file to produce ISA information**
+
+In this step we use “tdscan” to process the tablegen output file, which produces an MDL description of the target’s instruction set.
+
+
+
+
+* export TARGET=<family-name>
+
+ Where family-name is one of: AArch64, AMDGPU, ARC, ARM, AVR, BPF, CSKY, Hexagon, Lanai, LoongArch, M68k, Mips, MSP430, NVPTX, PPC, RISCV, Sparc, SystemZ, VE, WebAssembly, X86, XCore
+
+* `…/tdscan --family_name=$TARGET $(TARGET).txt`
+
+This produces the file $(TARGET)\_instructions.mdl, which contains MDL descriptions for all instructions, operands, registers, and register classes defined in the td files for that target.
+
+Anomalies:
+
+
+
+* For Sparc, the family name is actually “SP”, while the file name is “Sparc.txt”.
+* For PowerPC, the name of the td file is PPC.td which resides at //third\_party/llvm/llvm-project/llvm/lib/Target/**PowerPC**
+
+Note: Without the --family\_name argument, tdscan uses the input file name as the target name (minus the filename extension). As noted above, in general the family name matches the input file name.
+
+##### **Scraping the tablegen file to produce a full architecture spec**
+
+If the tablegen description contains Schedules and/or Itinerary descriptions, you can also have tdscan produce an MDL architecture spec for a processor. Currently, this applies to the following targets: AArch64, AMDGPU, ARM, Hexagon, Lanai, MIPS, PPC, RISCV, Sparc (SP), SystemZ, and X86.
+
+
+
+* export TARGET=<family-name>
+
+ Where family-name is one of: AArch64, AMDGPU, ARM, Hexagon, Lanai, Mips, PPC, RISCV, Sparc, SystemZ, X86 (same family name caveat for Sparc)
+
+* `…/tdscan --gen_arch_spec --family_name=$TARGET $(TARGET).txt`
+
+This will produce both the instructions file ($(TARGET)\_instructions.mdl) and the architecture spec file ($(TARGET).mdl). The generated architecture spec will explicitly import the instruction description file. Compiling $(TARGET).mdl with the MDL compiler will produce an instruction database for the processor family.
+
+
+#### **Compiling a Machine Description**
+
+Generally, we separate the instruction descriptions from the architecture spec into separate .mdl files, and the architecture spec explicitly imports the instruction descriptions. So to compile a full machine description, we invoke the compiler on the architecture spec:
+
+
+```
+.../mdl CPU.mdl
+```
+
+
+This will create three files: CPUGenMdlInfo.inc, CPUGenMdlTarget.inc, and CPUGenInfo.h, which contain the database of architecture and instruction information that is imported into LLVM.
+
+
+##### **Command line options**
+
+You can invoke the compiler with “--help” to get a brief description of the command line options. The following options are supported, and discussed in more detail below:
+
+
+```
+ --check_all_operands (Check references to all operands - not just registers);
+ default: false;
+ --check_usage (Check subunit, reference, and resource usage);
+ default: false;
+
+ --dump_fus (Dump functional unit instantiations); default: false;
+ --dump_instr (Dump instruction information); default: false;
+ --dump_llvm_defs (Dump LLVM definitions); default: false;
+ --dump_preds (Dump user-defined predicates); default: false;
+ --dump_resources (Dump resource ids); default: false;
+ --dump_spec (Dump entire mdl specification); default: false;
+ --dump_sus (Dump subunit instantiations); default: false;
+
+ --fatal_warnings (Treat warnings as errors); default: false;
+ --import_dir (import file dir); default: "";
+ --output_dir (output file dir); default: "";
+ --warnings (Print warnings); default: true;
+```
+
+
+
+##### **Options that help debug a machine description under development**
+
+
+###### **--check\_usage:**
+
+This option checks for possible errors in the description:
+
+
+
+* It checks that every register operand is explicitly referred to in the latency rules that apply to that instruction.
+* It warns for any latency template reference (use, def, etc.) that never appears to apply to any instruction.
+* It warns for any unused subunit template (never referred to by any instruction).
+* It warns for any resource that is never referenced anywhere.
+
+These are not errors, but could indicate that something is incorrectly modeled.
+
+
+###### **--check\_all\_operands:**
+
+This option does the same checks that --check\_usage performs, but also checks that every single operand - even non-register operands - is always referenced. This is also not an error, but simply a diagnostic tool.
+
+
+###### **--dump\_instr:**
+
+This option dumps comprehensive information (to stdout) about every behavior of every instruction on every subtarget.
+
+**NOTE:** There are a LOT of instruction descriptions - each instruction has entries for the cross product of each processor, functional unit, and issue slot it can run on. You'll notice that the entries are often almost identical except for where they run. Internally, identical aspects of the description are shared - across different instructions, functional units, and processors - so this isn't as bad as it might seem. If you look through the resource references, you should see EXACTLY what each instruction does in each context it can run in. A few thoughts:
+
+
+
+* If you ignore functional unit and issue slot resources, many of the instances of an instruction are going to be identical (from the perspective of a simulator, for example).
+* There are quite a few instructions that have different operand and resource latencies based on which functional unit they run on. So the only difference between their descriptions will be a single latency (operand or resource). The good news is that the representation of all of this is pretty compact.
+* All of this information is encoded in the output file (<family>.mdl.cc).
+
+So there is a massive amount of information here - not to worry: the compiler deduplicates everything, so there is very little redundancy in the generated database. This is just the “raw” information the compiler generates internally.
+
+Here's what the output of --dump\_instr looks like:
+
+
+
+```
+Instruction: MOV16rm(GR16 dst, i16mem src)
+ flat(GR16 dst, (i16mem.ptr_rc) (src.0), (i16mem.i8imm) (src.1),
+ (i16mem.ptr_rc_nosp) (src.2), (i16mem.i32imm) (src.3),
+ (i16mem.SEGMENT_REG) (src.4)) {
+ subunit(sub579,sub1976,sub1977,sub1978,sub1979,sub1980,sub1981,sub1982,sub1983,
+ sub1984,sub2767,sub2768,sub2769,sub2770); }
+ Subunit: AlderlakeP.U11
+ Operand references:
+ ===> def.p(E6, GR16:$dst[0])
+ Resources:
+ use.p(F1,U11{12})
+ Pool Resources:
+ Architectural Register Constraints:
+
+
+
+Instruction: MOV16rm(GR16 dst, i16mem src)
+ flat(GR16 dst, (i16mem.ptr_rc) (src.0), (i16mem.i8imm) (src.1),
+ (i16mem.ptr_rc_nosp) (src.2), (i16mem.i32imm) (src.3),
+ (i16mem.SEGMENT_REG) (src.4)) { subunit(sub579,sub1976,sub1977,sub1978,sub1979,sub1980,sub1981,sub1982,sub1983,
+ sub1984,sub2767,sub2768,sub2769,sub2770); }
+ Subunit: Znver1.U0
+ Operand references:
+ ===> def.p(E5, GR16:$dst[0])
+ ===> use.p(E5, i16mem:$src.0[1])
+ Resources:
+ use.p(F1,U0{1})
+ Pool Resources:
+ Architectural Register Constraints:
+```
+
+
+**_How to Interpret MDL Debug Output_**
+
+Each instruction record describes a single behavior of an instruction on a particular processor and functional unit. For each instruction, we write out:
+
+
+
+* The instruction name (per LLVM) and the operand types/names as declared in llvm. Some of these operands are composites of other operands.
+* The "flat" operand list: each composite operand is expanded to its components as discrete operands. This is the "real" operand list.
+* The "Subunit": the processor and functional unit names for this instance of the instruction.
+* All of this instruction's operand references, and the name of the pipeline phase they happen in (E1, etc). This includes operand-related resource references, if any.
+* All of this instruction's resource references, and when they happen
+* All of this instruction's pooled resource references, and when they happen.
+* Any architectural register constraints imposed on the instruction by this functional unit (most CPUs don't have these)
+
+**Operands:**
+
+The operand references have the syntax (in the output):
+
+
+```
+ <opcode> (<protection>)? '(' <pipeline_phase> ','
+          <operand_specification> ','
+          (<resource_references>)? ')'
+```
+
+where the opcodes are "use", "def", “predicate”. <Protection> is what kind of pipeline protection is used for this reference (protected, unprotected, hard), one of “.p”, “.u”, or “.h”.
+
+**Resources:**
+
+The resource references are the same, without the operand reference component.
+
+ ` <opcode> (<protection>)? '(' <pipeline_phase> ',' <resource_references> ')' `
+
+An operand specification has the syntax:
+
+
+```
+ <operand_type> ':' '$' <operand_name> '[' <flat_operand_index> ']'
+```
+
+
+An example: GPR:$x[2] refers to operand number 2 (in the flat operand list), called "x", which has operand type GPR.
+
+The resource references have the syntax:
+
+
+```
+ <resource_name> '{' <resource_id> '}'
+```
+
+
+An example: alu1{2} refers to a resource "alu1" which has a resource id of 2.
+
+**Pooled Resources:**
+
+Pooled resources have a slightly more complex syntax:
+
+
+```
+ <resource_name> '{' <resource_id> '}' '[' <member_range> ']'
+ (':' <attribute>)* '-->' <operand_index>
+```
+
+
+An example: imm{26}[0..3]:size:bits-->2 refers to the "imm" resource, resource id 26, a subrange of members 0..3 with "size" and "bits" attributes, associated with operand 2.
+
+Pooled resources also carry “subpool id” and “size request” information.
+
+
+###### **--dump\_resources:**
+
+Write descriptions of all defined resources to stdout.
+
+For each subtarget, we print a set of resource definitions, followed by a list of _pooled_ resource definitions (if the description includes any resource pools).
+
+**_Example Resource Dump:_**
+
+
+```
+Resources defined for 'RISCV' ---------------------------------------
+fake.RISCV.end : 1
+
+Pooled resources defined for 'RISCV' --------------------------------
+
+Resources defined for 'Rocket' ---------------------------------------
+Funcunit.Rocket.__.U0 : 1, cycles: [0..0]