Introduction
Currently, llvm-mca accepts only assembly code as input.  We would like to
extend llvm-mca to support object files, allowing users to analyze the
performance of binaries.  The proposed changes add an optional object file
section, which can be stripped out if desired.
For the llvm-mca binary support feature to be useful, a user needs to tell
llvm-mca which portions of their code they would like analyzed.  Currently,
this is accomplished via assembly comments.  However, assembly comments are not
preserved in object files, which is what motivated this RFC.  For the proposed
binary support, we need to introduce changes to clang and llvm to allow the
user's object code to be recognized by llvm-mca:
- We need a way for a user to identify a region/block of code they want analyzed by llvm-mca.
- We need the information defining the user's region of code to be maintained in the object file so that llvm-mca can analyze the desired region(s) from the binary object file.
We define a "code region" as a subset of a user's program that is to be
analyzed via llvm-mca.  The sequence of instructions to be analyzed is
represented as a pair: <start, end> where 'start' marks the beginning of the
region and 'end' terminates it.  The instructions between 'start' and 'end'
form the region that can be analyzed by llvm-mca at a later time.
Example
Before we go into the details of this proposed change, let's first look at a
simple example:
// example.c -- Analyze a dot-product expression.
double test(double x, double y) {
  double result = 0.0;
  __mca_code_region_start(42);
  result += x * y;
  __mca_code_region_end();
  return result;
}
In the example above, we have identified a code region, in this case a single
dot-product expression.  For the sake of brevity we've chosen a very simple
example; in practice a region could span multiple expressions.  We have also
labeled this region as number 42.  That identifier is only for the user's
benefit and makes an llvm-mca analysis report easier to read later.
When this code is compiled, the region markers (the __mca_code_region_start and
__mca_code_region_end calls) are transformed into assembly labels.  Although the
markers look like function calls, in reality they are no-ops: only the labels
shown below are emitted.
test:
	pushq	%rbp
	movq	%rsp, %rbp
	movsd	%xmm0, -8(%rbp)
	movsd	%xmm1, -16(%rbp)
.Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
	xorps	%xmm0, %xmm0
	movsd	%xmm0, -24(%rbp)
	movsd	-8(%rbp), %xmm0
	mulsd	-16(%rbp), %xmm0
	addsd	-24(%rbp), %xmm0
	movsd	%xmm0, -24(%rbp)
.Lmca_code_region_end_0:   # LLVM-MCA-END ID: 42
	movsd	-24(%rbp), %xmm0
	popq	%rbp
	retq
	.section	.mca_code_regions,"",@progbits
	.quad	42
	.quad	.Lmca_code_region_start_0
	.quad	.Lmca_code_region_end_0-.Lmca_code_region_start_0
The assembly has been trimmed to show the portions relevant to this RFC.
Notice that the labels enclose the user-defined region and preserve the user's
arbitrary region identifier, the ever-so-important region 42.
In the object file section .mca_code_regions, we have noted the user's region
identifier (.quad 42), start address, and region size.  A more complicated
example can have multiple regions defined within a single .mca_code_regions
section.  This section can be read by llvm-mca, allowing llvm-mca to take
object files as input instead of assembly source.
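To make the layout concrete, here is a minimal sketch of how a reader might
decode the raw bytes of the section.  It assumes each record is three
little-endian 64-bit values in the order shown above (identifier, start
address, size), as in the x86-64 example.  The struct and function names are
hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* One record of the proposed .mca_code_regions section, mirroring the three
   .quad directives in the example: identifier, start address, size in bytes.
   (Hypothetical type name; not taken from the patch.) */
typedef struct {
  uint64_t id;
  uint64_t start;
  uint64_t size;
} mca_code_region;

enum { MCA_RECORD_BYTES = 3 * 8 };

/* Read one 64-bit value stored least significant byte first.
   Assumption: the section uses the target's little-endian layout. */
static uint64_t read_le64(const unsigned char *p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | p[i];
  return v;
}

/* Decode at most max_out records from the raw section bytes.
   Returns the number of records decoded, or -1 if len is not a whole
   number of records. */
long decode_mca_code_regions(const unsigned char *data, size_t len,
                             mca_code_region *out, size_t max_out) {
  if (len % MCA_RECORD_BYTES != 0)
    return -1;
  size_t n = len / MCA_RECORD_BYTES;
  if (n > max_out)
    n = max_out;
  for (size_t i = 0; i < n; ++i) {
    const unsigned char *rec = data + i * MCA_RECORD_BYTES;
    out[i].id = read_le64(rec);
    out[i].start = read_le64(rec + 8);
    out[i].size = read_le64(rec + 16);
  }
  return (long)n;
}
```

With this layout every region costs 24 bytes in the section, and a file with N
regions yields N consecutive records.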
Details
We need a way for a user to identify a region/block of code they want analyzed
by llvm-mca.  We solve this problem by introducing two intrinsics that mark the
regions of code to be analyzed.
The two intrinsics are: llvm.mca.code.region.start and
llvm.mca.code.region.end.  A user can identify a code region by inserting the
__mca_code_region_start and __mca_code_region_end markers.  These are simply
clang builtins and are transformed into the aforementioned intrinsics during
compilation.  The code between the intrinsics is what we call a "code region"
and is meant to be easily identifiable by llvm-mca; any code between a
start/end pair can be analyzed by llvm-mca at a later time.  A user can define
multiple non-overlapping code regions within their program.
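Since these builtins exist only in a compiler carrying the proposed patch,
sources that use them would not build elsewhere.  One way to keep such sources
portable is a guard like the sketch below.  It assumes the patched clang
reports the builtins through __has_builtin, which this RFC does not state.
MCA_REGION_START and MCA_REGION_END are hypothetical names and are not part of
the patch.

```c
/* Hypothetical wrapper macros; not part of the proposed patch. */
#ifndef __has_builtin
#define __has_builtin(x) 0 /* for compilers that lack __has_builtin */
#endif

/* Assumption: a compiler with the patch applied answers true here. */
#if __has_builtin(__mca_code_region_start) && \
    __has_builtin(__mca_code_region_end)
#define MCA_REGION_START(id) __mca_code_region_start(id)
#define MCA_REGION_END() __mca_code_region_end()
#else
/* Without the builtins the markers expand to nothing. */
#define MCA_REGION_START(id) ((void)0)
#define MCA_REGION_END() ((void)0)
#endif

double test(double x, double y) {
  double result = 0.0;
  MCA_REGION_START(42);
  result += x * y;
  MCA_REGION_END();
  return result;
}
```

With an unpatched compiler the macros expand to nothing, so test() simply
computes x * y and no region is recorded.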
The llvm.mca.code.region.start intrinsic takes an integer constant as its only
argument.  This argument is implemented as a metadata i32, and is only used
when generating llvm-mca reports. This value allows a user to more easily
identify a specific code region.  llvm.mca.code.region.end takes no arguments.
Since we disallow nesting of regions, the first 'end' intrinsic lexically
following a 'start' intrinsic represents the end of that code region.
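The pairing rule can be sketched as a single pass over the markers in lexical
order.  The types and the function below are hypothetical, and returning an
error for a nested 'start' or a stray 'end' is an assumption of this sketch;
the RFC does not say how such input is diagnosed.

```c
#include <stddef.h>

typedef enum { MCA_MARK_START, MCA_MARK_END } mca_marker_kind;

/* A marker as it appears in lexical order (hypothetical representation). */
typedef struct {
  mca_marker_kind kind;
  unsigned id;     /* meaningful for 'start' markers only */
  size_t position; /* index of the marker in the instruction stream */
} mca_marker;

typedef struct {
  unsigned id;
  size_t begin;
  size_t end;
} mca_region_span;

/* Pair each 'start' with the first 'end' that follows it.
   Returns the number of regions written to out, or -1 if a 'start' is
   nested, an 'end' has no matching 'start', a region is left open, or
   out has no room left. */
long pair_mca_markers(const mca_marker *marks, size_t n,
                      mca_region_span *out, size_t max_out) {
  size_t count = 0;
  int open = 0; /* nonzero while inside a region */
  mca_region_span cur = {0, 0, 0};

  for (size_t i = 0; i < n; ++i) {
    if (marks[i].kind == MCA_MARK_START) {
      if (open)
        return -1; /* nesting is disallowed */
      cur.id = marks[i].id;
      cur.begin = marks[i].position;
      open = 1;
    } else {
      if (!open || count == max_out)
        return -1; /* stray 'end', or no room left in out */
      cur.end = marks[i].position;
      out[count++] = cur;
      open = 0;
    }
  }
  return open ? -1 : (long)count;
}
```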
Now that we have a way to identify regions for analysis, we need a way to
preserve that information so that it can be read at a later time.  To accomplish
this we propose adding a new section (.mca_code_regions) to the object file
generated by llvm.  During code generation, the start/end intrinsics described
above will be transformed into start/end labels in assembly.  When llvm
generates the object file from the user's code, these start/end labels form a
pair of values identifying the start address of the user's code region and its
size.  The size is the number of bytes between the start and end labels.  Note
that the labels are emitted during assembly printing.  We hope that these
labels have no influence on code generation or basic-block placement.  However,
the target assembler's strategy for handling labels is outside of our control.
This proposed change affects the size of a binary, but only if the user calls
the start/end builtins mentioned above.  The additional size of the
.mca_code_regions section, which we imagine to be very small (on the order of a
few bytes), can trivially be stripped by tools like 'strip' or 'objcopy'.
Implementation Status
We currently have the proposed changes implemented at the URL posted below.
This initial patch only targets ELF object files, and does not handle
relocatable addresses.  Since the start of a code region is represented as an
assembly label, and referenced in the .mca_code_regions section, that address
is relocatable.  That value can be represented as a section-relative relocatable
symbol (.text + addend), but we are not handling that case yet.  Instead, the
proposed changes only handle linked/executable object files.
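For reference, resolving such a section-relative value amounts to adding the
addend to the address at which .text ends up (the usual S + A computation for
an absolute ELF relocation).  The helper below is a hypothetical illustration
only; the current patch does not do this.

```c
#include <stdint.h>

/* Hypothetical helper: compute S + A, where S is the address at which .text
   was placed and A is the addend recorded for the region's start label. */
uint64_t resolve_region_start(uint64_t text_address, int64_t addend) {
  return text_address + (uint64_t)addend;
}
```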
The change is presented as a monolithic patch; however, when the time comes
it will be split into three patches:
- The introduction of the builtins to clang.
- The llvm portion (the added intrinsics).
- The llvm-mca portion.
This RFC was proposed to the wider llvm community here:
https://lists.llvm.org/pipermail/llvm-dev/2018-November/127784.html
I've set hasSideEffects to true, so that DeadMachineInstructionElim does not remove the llvm-mca code markers under optimization. However, it's certainly possible that optimizations will move code outside of the region.