Index: llvm/trunk/docs/CommandGuide/llvm-exegesis.rst
===================================================================
--- llvm/trunk/docs/CommandGuide/llvm-exegesis.rst
+++ llvm/trunk/docs/CommandGuide/llvm-exegesis.rst
@@ -22,7 +22,116 @@
 result is printed out as YAML to the standard output.
 
 The main goal of this tool is to automatically (in)validate the LLVM's TableDef
-scheduling models.
+scheduling models. To that end, we also provide analysis of the results.
+
+EXAMPLES: benchmarking
+----------------------
+
+Assume you have an X86-64 machine. To measure the latency of a single
+instruction, run:
+
+.. code-block:: bash
+
+    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr
+
+Measuring the uop decomposition of an instruction works similarly:
+
+.. code-block:: bash
+
+    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
+
+The output is a YAML document (the default is to write to stdout, but you can
+redirect the output to a file using `-benchmarks-file`):
+
+.. code-block:: none
+
+    ---
+    key:
+      opcode_name: ADD64rr
+      mode: latency
+      config: ''
+    cpu_name: haswell
+    llvm_triple: x86_64-unknown-linux-gnu
+    num_repetitions: 10000
+    measurements:
+      - { key: latency, value: 1.0058, debug_string: '' }
+    error: ''
+    info: 'explicit self cycles, selecting one aliasing configuration.
+      Snippet:
+      ADD64rr R8, R8, R10
+      '
+    ...
+
+To measure the latency of all instructions for the host architecture, run:
+
+.. code-block:: bash
+
+    #!/bin/bash
+    readonly INSTRUCTIONS=$(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=)
+    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
+    do
+      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
+    done
+
+FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.
+
+EXAMPLES: analysis
+----------------------
+
+Assuming you have a set of benchmarked instructions (either latency or uops) as
+YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
+following command:
+
+.. code-block:: bash
+
+    $ llvm-exegesis -mode=analysis \
+      -benchmarks-file=/tmp/benchmarks.yaml \
+      -analysis-clusters-output-file=/tmp/clusters.csv \
+      -analysis-inconsistencies-output-file=/tmp/inconsistencies.txt
+
+This will group the instructions into clusters with the same performance
+characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
+following format:
+
+.. code-block:: none
+
+    cluster_id,opcode_name,config,sched_class
+    ...
+    2,ADD32ri8_DB,,WriteALU,1.00
+    2,ADD32ri_DB,,WriteALU,1.01
+    2,ADD32rr,,WriteALU,1.01
+    2,ADD32rr_DB,,WriteALU,1.00
+    2,ADD32rr_REV,,WriteALU,1.00
+    2,ADD64i32,,WriteALU,1.01
+    2,ADD64ri32,,WriteALU,1.01
+    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
+    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
+    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
+    2,ADD64ri8,,WriteALU,1.00
+    2,SETBr,,WriteSETCC,1.01
+    ...
+
+:program:`llvm-exegesis` will also analyze the clusters to point out
+inconsistencies in the scheduling information. For example,
+`/tmp/inconsistencies.txt` will contain messages like:
+
+.. code-block:: none
+
+    Sched Class EXTRACTPSrr_VEXTRACTPSrr contains instructions with distinct performance characteristics, falling into 2 clusters:
+    4,EXTRACTPSrr,,3.00
+    3,VEXTRACTPSrr,,2.01
+
+    Sched Class WriteCRC32 contains instructions with distinct performance characteristics, falling into 2 clusters:
+    4,CRC32r32r16,,3.01
+    4,CRC32r32r32,,3.00
+    11,CRC32r32r8,,4.01
+    4,CRC32r64r64,,3.01
+    4,CRC32r64r8,,3.00
+
+Note that the scheduling class names will be resolved only when
+:program:`llvm-exegesis` is compiled in debug mode, else only the class id will
+be shown.
+This does not invalidate any of the analysis results though.
+
 
 OPTIONS
 -------
@@ -41,15 +150,40 @@
  Specify the opcode to measure, by name.
  Either `opcode-index` or `opcode-name` must be set.
 
-.. option:: -benchmark-mode=[Latency|Uops]
+.. option:: -mode=[latency|uops|analysis]
 
- Specify which characteristic of the opcode to measure.
+ Specify the run mode.
 
 .. option:: -num-repetitions=
 
  Specify the number of repetitions of the asm snippet.
  Higher values lead to more accurate measurements but lengthen the benchmark.
 
+.. option:: -benchmarks-file=
+
+ File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
+ results. "-" uses stdin/stdout.
+
+.. option:: -analysis-clusters-output-file=
+
+ If provided, write the analysis clusters as CSV to this file. "-" prints to
+ stdout.
+
+.. option:: -analysis-inconsistencies-output-file=
+
+ If non-empty, write inconsistencies found during analysis to this file. "-"
+ prints to stdout.
+
+.. option:: -analysis-numpoints=
+
+ Specify the numPoints parameter to be used for DBSCAN clustering
+ (`analysis` mode).
+
+.. option:: -analysis-epsilon=
+
+ Specify the epsilon parameter to be used for DBSCAN clustering
+ (`analysis` mode).
+
 
 EXIT STATUS
 -----------
Index: llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp
===================================================================
--- llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp
+++ llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp
@@ -49,7 +49,7 @@
 
 enum class BenchmarkModeE { Latency, Uops, Analysis };
 static llvm::cl::opt BenchmarkMode(
-    "benchmark-mode", llvm::cl::desc("the benchmark mode to run"),
+    "mode", llvm::cl::desc("the mode to run"),
     llvm::cl::values(
         clEnumValN(BenchmarkModeE::Latency, "latency", "Instruction Latency"),
         clEnumValN(BenchmarkModeE::Uops, "uops", "Uop Decomposition"),