Page MenuHomePhabricator

[RFC][LLD][ELF] Cortex-M Security Extensions (CMSE) Support
Needs ReviewPublic

Authored by amilendra on Dec 1 2022, 2:57 AM.

Details

Summary

This is to summarize the approach and a request for comments in any constraints
the approach will break. Alternative suggestions are also very much welcome.

Arm v8-M Security Extensions: Requirements on Development Tools [1] specifies
the functionality required by the toolchain. Clang support has already been
added in [2]. This patch implements the linker requirements specified in [1].

Linker Requirements

The linker acts on entry functions generated by the compiler/assembler.
An entry function has two ELF function (STT_FUNC) symbols labelling it:

  1. A standard symbol ( e.g. foo)
  2. A special symbol (e.g. __acle_se_foo)

See [Req.43], [Req.45].

Generation of secure gateway veneers

The linker generates an SG veneer for each entry function with external linkage
that has its standard and special symbols labelling the same address
(e.g. foo == __acle_se_foo). Otherwise an SG is assumed to be present. [Req.44]

An SG veneer consists of an SG instruction followed by a B.W instruction that
targets the entry function it veneers. [Req.9], [Req.12]

An SG veneer must be labelled by an ELF symbol that has the same binding, type,
and name as the entry function it veneers. [Req.10]

SG veneer section

A toolchain must support creating a section of SG veneers consisting of one or
more veneers placed consecutively in memory. [Req.11]

A section of SG veneers must be aligned to a 32-byte boundary, and must be zero
padded to a 32- byte boundary. [Req.13]

The position of SG veneers in a section must be controllable by the developer.
This allows the developer to fix the addresses of the SG veneers such that
secure code can be updated independently of non-secure code. [Req.14]

Generation of a SG import library

The linker supports generating an SG import library (import library), which is a
relocatable file containing only copies of the (absolute) symbols of the SGs in
the secure executable. [Req.8]

The import library is used when updating non-secure code to specify to the
linker the fixed addresses of the SG veneers used by secure code [Req.14].

Implementation Notes

External interface

The external interface of the linker for the CMSE functionality follows the
external interface used by the BFD ld linker.[3]

All SG veneers are placed in the special output section '.gnu.sgstubs'. The
start address of '.gnu.sgstubs' must be set, either with the command-line option
'--section-start' or in a linker script. The ld linker throws an error when the
start address is not specified.

The '--in-implib=file' specifies an input import library whose symbols must keep
the same address in the executable being produced.

ld emits a warning is given if no '--out-implib' is given but new symbols have
been introduced in the executable that should be listed in its import library.
Otherwise, if '--out-implib' is specified, the symbols are added to the output
import library. ld emits a warning is also given if some symbols present in the
input import library have disappeared from the executable.

In ld, the '--in-implib' and '--out-implib' options are effective only when
'--cmse-implib' is specified. '--cmse-implib' was needed in ld maybe because
'--out-implib' already existed for the i386 PE backend [4], so some mechanism
was needed to differentiate its use in the ARM backend. LLD does not implement
'--out-implib' so the '--cmse-implib' option seems redundant at the moment. In
fact, the linker for the proprietary Arm Compiler toolchain (armlink) only
define two options to specify the input and output import libraries,
--import_cmse_lib_in=file and --import_cmse_lib_out=file, respectively. I've
added the '--cmse-implib' option for now, but will remove it if we decide it
unnecessary.

Reading the import library

The input import library is a relocatable object with only a symbol table. This
is similar to the --just-symbols functionality, but is used to inform the linker
of SG veneers that must be placed at fixed addresses. So the symbols in the
input import library will not be added to the global symbol table (symtab).
Instead the information will be maintained via a separate StringMap table which
uses the symbol name as the key.

Generating the SG veneers and SG veneer section

Generation and positioning of SG veneers is the most complicated part.
There are two types of SG veneers.

  1. SG veneers that should be generated by the linker at fixed addresses specified by the input import library.
  2. SG veneers that do not have their symbols specified in the input import library which the linker can place at any location within the SG veneer section.

In both cases the standard symbol of the entry functions should be retargetted
to point to the start of the generated SG veneer.

In this patch I've modelled the problem by defining a type of SyntheticSection,
ArmCmseSGVeneer, and two specializations of it, ArmCmseFixedAddressSGVeneer and
ArmCmseVariableAddressSGVeneer. An ArmCmseSGVeneer keeps track of its
corresponding standard and special symbol pair. After the addresses are fixed,
the standard symbol should point to the start address of the ArmCmseSGVeneer,
and the veneer generated in ArmCmseSGVeneer should branch to the special symbol.

ArmCmseSGSection is another type of SyntheticSection which I've used to model
the SG veneer section and is meant to be the parent of all the ArmCmseSGVeneers.
ArmCmseSGSection maintains the range of addresses used by the input import
library. ArmCmseFixedAddressSGVeneers will be placed sparsely at fixed addresses
within that range, and ArmCmseVariableAddressSGVeneer will be placed at
consecutive addresses out of that range.

For now, finalizing the ArmCmseSGSection is done in a single pass after output
addresses have been decided. However, further testing and review may show that
multiple passes needs to be done to finalize the ArmCmseSGSection.

I briefly considered if this can be modelled using the Thunk management
mechanism, but decided against it because seemed to me that Thunks are generated
for relocations to symbols, while the CMSE veneers do not require relocations.

TODO

This patch is still very much a prototype that demonstrates only the most
common use cases of writing and reading an import library, placement of veneers
at fixed and variable addresses.

Much of error handling is yet to be implemented. I plan to follow the error
messages emitted by ld.

More tests related to SG veneer section positioning needs to be done.
More testing on various combinations of symbol bindings, types, visibility needs
to be done.

Writing the import library to disk is a very hacky implementation that seg
faults in multi threaded invocations. The tests workaround this by using
--threads=1. This needs to be refactored and fixed before landing.

Reference

[1] Arm® v8-M Security Extensions: Requirements on Development Tools Version: 1.2
Downloadable from https://developer.arm.com/documentation/ecm0359818/latest
[2] https://reviews.llvm.org/D71129 [ARM][CMSE] Implement CMSE attributes
[3] https://sourceware.org/binutils/docs/ld/ARM.html
[4] https://sourceware.org/pipermail/binutils/2015-December/091424.html

The following are the requirements specified in [1] that are implemented by the linker.

[Req.8] A relocatable file containing only copies of the (absolute) symbols of the
secure gateways in the secure executable must be available to link non-secure
code against.
[Req.9] A toolchain must support generating a secure gateway veneer for each entry
function with external linkage. It consists of an SG instruction followed by a
B.W instruction that targets the entry function it veneers.
[Req.10] A secure gateway veneer must be labelled by an ELF symbol that has the same
binding, type, and name as the function it veneers, following the rules for C
entities as defined by [AAELF]
[Req.11] A toolchain must support creating a vector of secure gateway veneers
consisting of one or more veneers placed consecutively in memory.
[Req.12] Excluding the first instruction of a secure gateway veneer, a veneer
must not contain the bit pattern of the SG instruction on a 2-byte boundary.
[Req.13] A vector of secure gateway veneers must be aligned to a 32-byte
boundary, and must be zero padded to a 32-byte boundary.
[Req.14] You must have granted control of the position of secure gateway veneers
in a vector.
[Req.43] An entry function has two ELF function ( STT_FUNC ) symbols labelling it:

• A symbol that follows the standard naming for C entities as defined by
  [AAELF] labels the function’s inline secure gateway if it has one, otherwise
  the function’s first instruction.
• A special symbol that prefixes the standard function name with __acle_se_
  labels the function’s first non-SG instruction.

[Req.44] A toolchain must generate a secure gateway veneer for an entry function
that has both its symbols labelling the same address. Otherwise a secure gateway
is assumed to be present.
[Req.45] The address of an entry function must be the address labelled by its
standard symbol.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 2:57 AM
amilendra requested review of this revision.Dec 1 2022, 2:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 2:57 AM
tschuett added inline comments.
lld/ELF/Arch/ARM.cpp
934

I have literally no idea what you are doing, but you create a SmallVector inputSections of size 0 and then immediately add 4 elements.

amilendra added inline comments.Dec 3 2022, 3:07 AM
lld/ELF/Arch/ARM.cpp
934

Thanks for pointing it out. I'll improve it.

MaskRay added inline comments.Dec 3 2022, 12:18 PM
lld/ELF/Arch/ARM.cpp
934

SmallVector<SyntheticSection *, 4> instantiates some code which bloats the lld executable. SmallVector<SyntheticSection *, 0> makes sense if SmallVector<SyntheticSection *, 4> hasn't been used globally.

tschuett added inline comments.Dec 3 2022, 12:20 PM
lld/ELF/Arch/ARM.cpp
934

Then it is binary size vs. malloc.

MaskRay requested changes to this revision.Dec 3 2022, 4:57 PM
MaskRay added inline comments.
lld/ELF/Arch/ARM.cpp
917

delete

920

combine two if

934

I'll optimize for binary size and code size. Inline elements>0 compiles to more code and optimizes for this very niche extension which is only called once. It's not a good tradeoff to optimize away a few malloc.

939

These are not ELF header. elfheader* are not good names.

948

Several functions are repeatedly called. Use an array for the 4 OutputSections and use loops.

1030

TaskGroup's destructor ensures the tasks are done. It must finish before buffer->commit is called.

{
  parallel::TaskGroup tg;
  for (...)
    xxx->writeTo<ELFT>(...);
}
1037

Use the if (auto e = buffer->commit()) fatal(..) pattern in Writer.cpp

lld/ELF/InputFiles.cpp
1089

delete numSymbols == 0 which is always true

1094

The comment is incorrect. There is no lazy state.

1153

postParse is performance critical and parallel. symtab.addCmseSymPair(&sym, ext_sym); is racy.

Move cmse-specific code here and processArmCmseSymbol(sym, secIdx); to a separate pass. Consider just iterating symtab when --cmse-implib is used so that symtab.find(ext_name) and cmseSymMap can be avoided (symtab doesn't have duplicate entries).

1155

Use consume_front

1163

Diagnostics are not capitalized and don't have a trailing period.

lld/ELF/Options.td
83

We prefer FF and EEq for new options which only support the double-dash form.

lld/ELF/SymbolTable.cpp
106

delete braces around a simple single statement.

lld/ELF/SyntheticSections.cpp
2202

This order is more common.

2205

delete braces around simple single statement

The if for impLibMaxAddr looks strange. Move impLibMaxAddr += this->entsize after the loop.

2208

delete config->emachine == EM_ARM

2210

delete blank line

2226

Use try_emplace and the pair result to know whether the value has been inserted. Avoid code duplication.

2259

inline this called-only-once function.

2260

The preferred syntax is /*off=*/0

lld/ELF/SyntheticSections.h
1222

blank line above

use final

1242

a specified size.

1247

delete blank line

This revision now requires changes to proceed.Dec 3 2022, 4:57 PM

I am very familiar with lld/ELF. However, after rereading the description twice I was still quite puzzled until I started to read the implementation.
I think the description can use some reorganization. A random user will likely be more confused than I.

The 3 newly added options are critical for a reader, but the description is unclear.

  • The '--in-implib=file' specifies ... has a vague description then stops describing more semantics.
  • ld emits a warning is given if no '--out-implib' is given but new symbols have (typo?) The paragraph should just say what --out-implib does. was needed in ld maybe because can be moved to a footnote (Reference).
  • LLD does not implement '--out-implib' so the '--cmse-implib' option seems redundant at the moment. I'm puzzled. Doesn't this patch add --out-implib
  • Make import library to be a secure gateway import => grammar error in to be?
  • Option HelpText: please don't just follow GNU ld which have very vague description. These options can use better description. If necessary, add more information to lld/docs/ld.lld.1
MaskRay added inline comments.Dec 3 2022, 5:39 PM
lld/ELF/SyntheticSections.cpp
2285

delete parentheses

2290

const size_t off = (impLibMaxAddr < getVA() ? getVA() : impLibMaxAddr) + newEntries * entsize;

Prefer computing the initial value in one step instead of two.

2295
2298

delete parentheses

lld/ELF/SyntheticSections.h
1372

move before ppc64LongBranchTarget

lld/test/ELF/arm-cmse-error.s
17 ↗(On Diff #479223)

There should be another test where the absolute addresses are different.

lld/test/ELF/arm-cmse-veneers.s
100

-NEXT if applicable

Thanks for posting this I've got about half way through. I'll need to re-read the specification as well. Will comment more as I have them.

Will be worth a motivating example in the description, for example a simple secure state offering a single function, with the non-secure state accessing it via the import library. Will also be useful to show the extension of the secure-state API with another function while maintining the address of the previous function. This might help people reading the description.

lld/ELF/Arch/ARM.cpp
916

Would be good to have a summary comment of what a CMSE import lib is, and what the function has to do. Possibly worth a link to the specification as well.

940

IIUC we are writing an ELF file with just 4 sections, each with one input section of the same name. Perhaps use a prefix or suffix of Is and os to distinguish the two?

1011

In this case it looks like we are only ever writing 4 sections, the import lib is really just a symbol table. We'll not need the SHN_XINDEX part.

1019

I don't think you'll ever need a SHT_SYMTAB_SHNDX section for the import lib as we know ahead of time that the number of sections is fewer than the limit. Can you remove it?

1029

Would this be simpler if done in series? In other words is it worth multithreading for just 4 relatively small sections?

lld/ELF/Config.h
169

LLD tends not to use underscores in names. I recommend using CMSE in the name as implib is quite general and CMSE is very specific.

lld/ELF/InputFiles.cpp
1024

These look like they are only used within this source file. Can they be made static?

1148

Will be worth a brief comment saying what this function is supposed to do. It saves having to look up the specification.

1149

Is this sufficient? For example should we check from the BuildAttributes that the architecture is at least v8-M?

1156

Suggest Try to find the associated symbol definition. Symbol must have external linkage.

Symbol visibility has a specific meaning that is best not to confuse with binding.

1182

Suggest // Check that both symbols are Thumb symbols of type STT_FUNC.

lld/ELF/SymbolTable.h
63

Comment to describe these CMSE specific containers.

lld/ELF/SyntheticSections.h
1241

The symbol table entry, or the symbol from the import library

1243

You might be able to forwared declar ArmCmseSGVeneer (you only use it by name in the Vector), and move the class definitions to the .cpp file.

amilendra updated this revision to Diff 482810.EditedDec 14 2022, 4:25 AM

Thanks for the comprehensive reviews.

I've

  • Addressed most of the review comments.
  • Added more error handling.
  • Add diagnostics for missing .gnu.sgstubs section start address.
  • Sorted the symbols alphabetically for deterministic address assignment.

I'll update the error message/help/description text later.

amilendra marked 44 inline comments as done.Dec 14 2022, 4:45 AM
amilendra added inline comments.
lld/ELF/Arch/ARM.cpp
940

I've refactored to use OutputSection, InputSection pairs which hopefully makes the intentions clearer.

1029

IIUC the OutputSection::writeTo(uint8_t *buf, parallel::TaskGroup &tg) interface only supports multi-threaded calls?

1030

Thanks. This was the reason my tests were segfaulting during multi-threaded runs.

lld/ELF/InputFiles.cpp
1149

Thanks. Added checks for BuildAttributes.

lld/ELF/SyntheticSections.cpp
2226

IIUC, try_emplace inserts to symtab.cmseImportLib if it the key not already present?

lld/ELF/SyntheticSections.h
1222

ArmCmseSGVeneer is sub-classed further as ArmCmseFixedAddressSGVeneer and ArmCmseVariableAddressSGVeneer. Added final to those classes.

1243

I was not able to do this because the subclasses refer to enum Kind defined in ArmCmseSGVeneer.

amilendra marked 4 inline comments as done.Dec 14 2022, 4:47 AM

Neither arc patch D139092 nor curl -L 'https://reviews.llvm.org/D139092?download=1' | patch -p1 applies this patch cleanly. Can you rebase and check that this works?

lld/ELF/Arch/ARM.cpp
915–1022
915–1022

variable case

lld/ELF/Driver.cpp
341–342

This piece of code doesn't exist in upstream. Please ensure the patch is against the upstream. If you have multiple commits locally, fold them.

lld/ELF/InputFiles.cpp
533–540

As mentioned, toString(this) is costly and should be used here.

amilendra updated this revision to Diff 483904.Dec 19 2022, 4:15 AM

arc patch D139092 fails with the following error.

 INFO  Base commit is not in local repository; trying to fetch.
Branch name arcpatch-D139092 already exists; trying a new name.
Created and checked out branch arcpatch-D139092_1.


    This diff is against commit 28f0aebff1119d0bbae765b3600751ece4733e12, but
    the commit is nowhere in the working copy. Try to apply it against the
    current working copy state? (e7bd805805e4d8f61faeeee150bbdab1df116452)
    [Y/n] n

Usage Exception: User aborted the workflow.

I think the reason was that I rebased arcpatch-D139092 against main locally
and that caused the parent commit ID to change.
Reverting back to Diff ID 479223 to get back this revision to a consistent state.
Will apply Diff ID 482810 next without rebasing so that (hopefully)
the parent commit will remain consistent.

amilendra updated this revision to Diff 483907.Dec 19 2022, 4:24 AM
git am <Diff ID 482810>.patch
arc diff --update
amilendra updated this revision to Diff 483914.Dec 19 2022, 4:47 AM

arc patch D139092 still gives me the error.

 INFO  Base commit is not in local repository; trying to fetch.
Created and checked out branch arcpatch-D139092.
    This diff is against commit 0580656340fc6850797dc5291df79b8fb10f53ff, but
    the commit is nowhere in the working copy. Try to apply it against the
    current working copy state? (ec87b7231586865f7b0adfe22a0d056e40ae9264)
    [Y/n]

Usage Exception: User aborted the workflow.

Folding the two commits and doing arc diff --update D139092

amilendra updated this revision to Diff 483920.Dec 19 2022, 5:14 AM

Now I can do arc patch D139092
Rebasing using arc diff $(git merge-base HEAD origin) --update D139092

amilendra updated this revision to Diff 483972.Dec 19 2022, 8:13 AM

Fix review comments.
arm-cmse-invalid-arch.s only works for the ARM target.

amilendra marked 2 inline comments as done.

Addressing a review comment I missed.

amilendra marked 5 inline comments as done.Dec 19 2022, 11:01 AM
amilendra added inline comments.
lld/ELF/Driver.cpp
341–342

This piece of code doesn't exist in upstream. Please ensure the patch is against the upstream. If you have multiple commits locally, fold them.

I think it should have been fixed now.

bsmith added a subscriber: bsmith.Mon, Jan 23, 7:14 AM