diff --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -89,18 +89,18 @@
 
 .. code-block:: bash
 
-    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr
+    $ llvm-exegesis --mode=latency --opcode-name=ADD64rr
 
 Measuring the uop decomposition or inverse throughput of an instruction works similarly:
 
 .. code-block:: bash
 
-    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
-    $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr
+    $ llvm-exegesis --mode=uops --opcode-name=ADD64rr
+    $ llvm-exegesis --mode=inverse_throughput --opcode-name=ADD64rr
 
 
 The output is a YAML document (the default is to write to stdout, but you can
-redirect the output to a file using `-benchmarks-file`):
+redirect the output to a file using `--benchmarks-file`):
 
 .. code-block:: none
 
@@ -125,7 +125,7 @@
 
 .. code-block:: bash
 
-    $ llvm-exegesis -mode=latency -opcode-index=-1
+    $ llvm-exegesis --mode=latency --opcode-index=-1
 
 
 EXAMPLE 2: benchmarking a custom code snippet
@@ -136,7 +136,7 @@
 
 .. code-block:: bash
 
-    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-
+    $ echo "vzeroupper" | llvm-exegesis --mode=uops --snippets-file=-
 
 Real-life code snippets typically depend on registers or memory.
-:program:`llvm-exegesis` checks the liveliness of registers (i.e. any register
+:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
@@ -189,10 +189,10 @@
 
 .. code-block:: bash
 
-    $ llvm-exegesis -mode=analysis \
-  -benchmarks-file=/tmp/benchmarks.yaml \
-  -analysis-clusters-output-file=/tmp/clusters.csv \
-  -analysis-inconsistencies-output-file=/tmp/inconsistencies.html
+    $ llvm-exegesis --mode=analysis \
+  --benchmarks-file=/tmp/benchmarks.yaml \
+  --analysis-clusters-output-file=/tmp/clusters.csv \
+  --analysis-inconsistencies-output-file=/tmp/inconsistencies.html
 
 This will group the instructions into clusters with the same performance
 characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
@@ -230,28 +230,28 @@
 OPTIONS
 -------
 
-.. option:: -help
+.. option:: --help
 
  Print a summary of command line options.
 
-.. option:: -opcode-index=<LLVM opcode index>
+.. option:: --opcode-index=<LLVM opcode index>
 
  Specify the opcode to measure, by index. Specifying `-1` will result
  in measuring every existing opcode. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
 
-.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...
+.. option:: --opcode-name=<opcode name 1>,<opcode name 2>,...
 
  Specify the opcode to measure, by name. Several opcodes can be specified as
  a comma-separated list. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
 
-.. option:: -snippets-file=<filename>
+.. option:: --snippets-file=<filename>
 
  Specify the custom code snippet to measure. See example 2 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
 
-.. option:: -mode=[latency|uops|inverse_throughput|analysis]
+.. option:: --mode=[latency|uops|inverse_throughput|analysis]
 
  Specify the run mode. Note that some modes have additional requirements and options.
 
@@ -274,7 +274,7 @@
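-  * ``assemble-measured-code``: Same as ``prepare-and-assemble-snippet``. but also creates the full sequence that can be dumped to a file using ``--dump-object-to-disk``.
+  * ``assemble-measured-code``: Same as ``prepare-and-assemble-snippet``, but also creates the full sequence that can be dumped to a file using ``--dump-object-to-disk``.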
   * ``measure``: Same as ``assemble-measured-code``, but also runs the measurement.
 
-.. option:: -x86-lbr-sample-period=<nBranches/sample>
+.. option:: --x86-lbr-sample-period=<nBranches/sample>
 
   Specify the LBR sampling period - how many branches before we take a sample.
   When a positive value is specified for this option and when the mode is `latency`,
@@ -283,7 +283,7 @@
   could occur if the sampling is too frequent. A prime number should be used to
   avoid consistently skipping certain blocks.
 
-.. option:: -x86-disable-upper-sse-registers
+.. option:: --x86-disable-upper-sse-registers
 
   Using the upper xmm registers (xmm8-xmm15) forces a longer instruction encoding
   which may put greater pressure on the frontend fetch and decode stages,
@@ -292,7 +292,7 @@
   enabled can help determine the effects of the frontend and can be used to
   improve latency and throughput estimates.
 
-.. option:: -repetition-mode=[duplicate|loop|min]
+.. option:: --repetition-mode=[duplicate|loop|min]
 
  Specify the repetition mode. `duplicate` will create a large, straight line
  basic block with `num-repetitions` instructions (repeating the snippet
@@ -307,13 +307,13 @@
  instead use the `min` mode, which will run each other mode,
  and produce the minimal measured result.
 
-.. option:: -num-repetitions=<Number of repetitions>
+.. option:: --num-repetitions=<Number of repetitions>
 
  Specify the target number of executed instructions. Note that the actual
  repetition count of the snippet will be `num-repetitions`/`snippet size`.
  Higher values lead to more accurate measurements but lengthen the benchmark.
 
-.. option:: -loop-body-size=<Preferred loop body size>
+.. option:: --loop-body-size=<Preferred loop body size>
 
- Only effective for `-repetition-mode=[loop|min]`.
+ Only effective for `--repetition-mode=[loop|min]`.
  Instead of looping over the snippet directly, first duplicate it so that the
@@ -321,7 +321,7 @@
- in loop body being cached in the CPU Op Cache / Loop Cache, which allows to
+ in loop body being cached in the CPU Op Cache / Loop Cache,
  which may have higher throughput than the CPU decoders.
 
-.. option:: -max-configs-per-opcode=<value>
+.. option:: --max-configs-per-opcode=<value>
 
  Specify the maximum configurations that can be generated for each opcode.
  By default this is `1`, meaning that we assume that a single measurement is
@@ -333,22 +333,22 @@
  lead to different performance characteristics.
 
 
-.. option:: -benchmarks-file=</path/to/file>
+.. option:: --benchmarks-file=</path/to/file>
 
  File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
  modes) benchmark results. "-" uses stdin/stdout.
 
-.. option:: -analysis-clusters-output-file=</path/to/file>
+.. option:: --analysis-clusters-output-file=</path/to/file>
 
  If provided, write the analysis clusters as CSV to this file. "-" prints to
  stdout. By default, this analysis is not run.
 
-.. option:: -analysis-inconsistencies-output-file=</path/to/file>
+.. option:: --analysis-inconsistencies-output-file=</path/to/file>
 
  If non-empty, write inconsistencies found during analysis to this file. `-`
  prints to stdout. By default, this analysis is not run.
 
-.. option:: -analysis-filter=[all|reg-only|mem-only]
+.. option:: --analysis-filter=[all|reg-only|mem-only]
 
  By default, all benchmark results are analysed, but sometimes it may be useful
- to only look at those that to not involve memory, or vice versa. This option
+ to only look at those that do not involve memory, or vice versa. This option
@@ -356,44 +356,44 @@
  ones that do involve memory (involve instructions that may read or write to
  memory), or the opposite, to only keep such benchmarks.
 
-.. option:: -analysis-clustering=[dbscan,naive]
+.. option:: --analysis-clustering=[dbscan,naive]
 
  Specify the clustering algorithm to use. By default DBSCAN will be used.
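- Naive clustering algorithm is better for doing further work on the
- `-analysis-inconsistencies-output-file=` output, it will create one cluster
+ The naive clustering algorithm is better for doing further work on the
+ `--analysis-inconsistencies-output-file=` output: it will create one cluster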
  per opcode, and check that the cluster is stable (all points are neighbours).
 
-.. option:: -analysis-numpoints=<dbscan numPoints parameter>
+.. option:: --analysis-numpoints=<dbscan numPoints parameter>
 
- Specify the numPoints parameters to be used for DBSCAN clustering
+ Specify the numPoints parameter to be used for DBSCAN clustering
  (`analysis` mode, DBSCAN only).
 
-.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>
+.. option:: --analysis-clustering-epsilon=<dbscan epsilon parameter>
 
  Specify the epsilon parameter used for clustering of benchmark points
  (`analysis` mode).
 
-.. option:: -analysis-inconsistency-epsilon=<epsilon>
+.. option:: --analysis-inconsistency-epsilon=<epsilon>
 
  Specify the epsilon parameter used for detection of when the cluster
  is different from the LLVM schedule profile values (`analysis` mode).
 
-.. option:: -analysis-display-unstable-clusters
+.. option:: --analysis-display-unstable-clusters
 
  If there is more than one benchmark for an opcode, said benchmarks may end up
  not being clustered into the same cluster if the measured performance
  characteristics are different. by default all such opcodes are filtered out.
  This flag will instead show only such unstable opcodes.
 
-.. option:: -ignore-invalid-sched-class=false
+.. option:: --ignore-invalid-sched-class=false
 
  If set, ignore instructions that do not have a sched class (class idx = 0).
 
-.. option:: -mtriple=<triple name>
+.. option:: --mtriple=<triple name>
 
- Target triple. See `-version` for available targets.
+ Target triple. See `--version` for available targets.
 
-.. option:: -mcpu=<cpu name>
+.. option:: --mcpu=<cpu name>
 
  If set, measure the cpu characteristics using the counters for this CPU. This
  is useful when creating new sched models (the host CPU is unknown to LLVM).