
Support Intel Control-flow Enforcement Technology
Accepted · Public

Authored by ruiu on Mar 25 2019, 10:19 AM.

Details

Summary

This patch is based on https://reviews.llvm.org/D58102.

This patch adds Intel CET (Control-flow Enforcement Technology) support to
lld. The implementation follows the draft version of psABI which you can
download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.

I added a comment about what Intel CET is, so I won't repeat it here, but in
summary, CET introduces a new restriction on indirect jump instructions so
that you can limit the places to which indirect jumps can transfer control.

In order to use the feature, you need to compile source files with
-fcf-protection. CET is enabled only when all input files are compiled with
the flag. If you want to make sure that the feature is enabled, pass
-require-cet to lld.

CET-enabled executable files have two PLT sections, ".plt" and ".plt.sec".
For the details as to why we have two sections, please read this patch's
comment.

Event Timeline

There are a very large number of changes, so older changes are hidden.
ruiu updated this revision to Diff 192744.Mar 28 2019, 4:56 PM
  • Generally simplified the code.
  • Separated CET-specific code from the regular code.
  • Fixed bugs and added tests for .got.plt, .plt and .splt.
lld/ELF/Arch/X86.cpp
428

It is not useful. For code size, is it better to write nothing here, or not use it at all?

MaskRay added inline comments.Mar 28 2019, 10:52 PM
lld/ELF/Arch/X86.cpp
412

brend32 -> endbr32

414

template <class ELFT> class IntelCET : public X86_64<ELFT> {

IBT-enabled PLT can be used with the ILP32 programming model (x86-32 ABI). (If you don't like putting too much stuff in the initial commit, class IntelCET : public X86_64<ELF64LE> { is fine.)

lld/ELF/Arch/X86_64.cpp
571

brend32 -> endbr64

588

If I understand it correctly, IntelCET::writePlt writes the contents of ld.bfd's .plt.sec, then it doesn't need a PLT header. Can Symbol::getPltVA (Plt->getVA() + Plt->HeaderSize + PltIndex * Target->PltEntrySize;) compute a wrong address because it thinks HeaderSize is actually 0?

Disassembly of section .plt:  // counterpart in lld: IntelCET::writeIBTPlt

0000000000001020 <.plt>:
    1020:       ff 35 e2 2f 00 00       pushq  0x2fe2(%rip)        # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026:       f2 ff 25 e3 2f 00 00    bnd jmpq *0x2fe3(%rip)        # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
    102d:       0f 1f 00                nopl   (%rax)
    1030:       f3 0f 1e fa             endbr64 
    1034:       68 00 00 00 00          pushq  $0x0
    1039:       f2 e9 e1 ff ff ff       bnd jmpq 1020 <.plt>
    103f:       90                      nop
    1040:       f3 0f 1e fa             endbr64 
    1044:       68 01 00 00 00          pushq  $0x1
    1049:       f2 e9 d1 ff ff ff       bnd jmpq 1020 <.plt>
    104f:       90                      nop
...
Disassembly of section .plt.sec:  // counterpart in lld: IntelCET::writePlt

0000000000001060 <putchar@plt>:
    1060:       f3 0f 1e fa             endbr64 
    1064:       f2 ff 25 ad 2f 00 00    bnd jmp QWORD PTR [rip+0x2fad]        # 4018 <putchar@GLIBC_2.2.5>
    106b:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

0000000000001070 <puts@plt>:
    1070:       f3 0f 1e fa             endbr64 
    1074:       f2 ff 25 a5 2f 00 00    bnd jmp QWORD PTR [rip+0x2fa5]        # 4020 <puts@GLIBC_2.2.5>
    107b:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

call flow control -> Control-Flow Enforcement ?

608 ↗(On Diff #192744)

using Elf_Nhdr = ... is more common nowadays.

(Besides, at some point, we need some IPLT + --require-cet tests.)

lld/ELF/Arch/X86.cpp
429

In ld.bfd's implementation, I don't see 16 bytes of NOPs. I think in the ABI, .SPLT1 is the counterpart of .PLT1; there is a .PLT0 (PLT header), but .SPLT0 doesn't exist.

Also, ld.bfd uses .plt.sec, not .splt. @hjl.tools Which is the preferred name for the second PLT section?

ruiu marked 10 inline comments as done.Mar 29 2019, 10:20 AM
ruiu added inline comments.
lld/ELF/Arch/X86.cpp
414

Actually the template is gone; I removed it because no code depended on ELFT. (Also, I guess no one really uses the x32 ABI? I think I read an article saying that Linux kernel developers are discussing deprecating it.)

lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

This is a general name and not specific to Intel's implementation.

608 ↗(On Diff #192744)

Yeah, but we should convert all of them in lld at once.

ruiu updated this revision to Diff 192855.Mar 29 2019, 10:20 AM
ruiu marked 2 inline comments as done.
  • removed SPLT0 as it contained just two NOPs
  • added tests for --require-cet
ruiu updated this revision to Diff 193034.Mar 31 2019, 6:00 PM
  • rebase
MaskRay added inline comments.Mar 31 2019, 8:10 PM
lld/ELF/Arch/X86.cpp
414

Sorry I didn't notice that X86_64 had been de-templated.

lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

My concern here is that llvm-objdump/GNU objdump and other tools/profilers would have to recognize two section names, .plt.sec and .splt. Many tools use the section name to do special things, e.g.:
@hjl.tools If the ABI can be changed so that it does not deliberately differ from the implementation (or the other way round), that would be great.

binutils/objdump.c:739

static bfd_boolean
is_significant_symbol_name (const char * name)
{
  return strncmp (name, ".plt", 4) == 0 || strcmp (name, ".got") == 0;
}
ruiu marked an inline comment as done.Mar 31 2019, 8:27 PM
ruiu added inline comments.
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

I'm not sure if I follow. Currently, in my patch, I use ".splt" and ".plt" as PLT section names, which I think are defined by the ABI. Are you saying that ".plt.sec" might be the correct name?

xiangzhangllvm added inline comments.Mar 31 2019, 8:35 PM
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

The spec does not fix the PLT's name; its full name is the second IBT-enabled PLT.
GCC names it .plt.sec; I named it .splt for short at first.
It may be better to use the same name as GCC.

ruiu marked an inline comment as done.Mar 31 2019, 9:36 PM
ruiu added inline comments.
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

OK, then I'll just follow the others. Is there anything else in this patch for which you invented something nonstandard?

MaskRay added inline comments.Mar 31 2019, 10:12 PM
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

I'm not sure if I follow. Currently, in my patch, I use ".splt" and ".plt" as PLT section names, which I think are defined by the ABI. Are you saying that ".plt.sec" might be the correct name?

Sorry for not being clear. PLT sections do not have a dedicated section type and in practice they are usually recognized by the name .plt. My example above gave one case. If ld.bfd uses .plt.sec while we use .splt, the tools (objdump/llvm-mc/profiler/...) would have to understand the two names. The complexity could have been avoided if the linkers and the ABI agreed on the same name, say, .plt.sec.

I prefer .plt.sec to .splt because the same-prefix convention is already used in several other places to have fine-grained section types.

Thus, I suggest, if there is no strong reason to keep using .splt in the ABI, change it to what the implementations do: .plt.sec.

ruiu updated this revision to Diff 193044.Mar 31 2019, 10:49 PM
  • renamed .splt to .plt.sec

writePltHeader should also be changed.

diff --git i/ELF/Arch/X86_64.cpp w/ELF/Arch/X86_64.cpp
index 668b40863..75f780f6b 100644
--- i/ELF/Arch/X86_64.cpp
+++ w/ELF/Arch/X86_64.cpp
@@ -144,7 +144,7 @@ void X86_64::writePltHeader(uint8_t *Buf) const {
   };
   memcpy(Buf, PltData, sizeof(PltData));
   uint64_t GotPlt = In.GotPlt->getVA();
-  uint64_t Plt = In.Plt->getVA();
+  uint64_t Plt = In.IBTPlt ? In.IBTPlt->getVA() : In.Plt->getVA();
   write32le(Buf + 2, GotPlt - Plt + 2); // GOTPLT+8
   write32le(Buf + 8, GotPlt - Plt + 4); // GOTPLT+16
 }

With these changes, I managed to run hello-world with clang {,-m32} -fuse-ld=lld -fcf-protection=branch a.c -Wl,--require-cet -o a; ./a

lld/ELF/Arch/X86_64.cpp
617

write32le(Buf + 5, I);

For glibc lazy binding, x86_64 uses relocation indices while x86_32 uses offsets.

lld/ELF/Writer.cpp
524

Since In.IBTPlt is named .plt while In.Plt is named .plt.sec, creating In.IBTPlt before In.Plt will give us the ordering .plt .plt.sec (instead of .plt.sec .plt), which may look more natural.

No strong preference here. Up to you.

ruiu updated this revision to Diff 193048.Apr 1 2019, 12:53 AM
  • review comments
ruiu added a comment.Apr 1 2019, 12:53 AM

writePltHeader should also be changed.

diff --git i/ELF/Arch/X86_64.cpp w/ELF/Arch/X86_64.cpp
index 668b40863..75f780f6b 100644
--- i/ELF/Arch/X86_64.cpp
+++ w/ELF/Arch/X86_64.cpp
@@ -144,7 +144,7 @@ void X86_64::writePltHeader(uint8_t *Buf) const {
   };
   memcpy(Buf, PltData, sizeof(PltData));
   uint64_t GotPlt = In.GotPlt->getVA();
-  uint64_t Plt = In.Plt->getVA();
+  uint64_t Plt = In.IBTPlt ? In.IBTPlt->getVA() : In.Plt->getVA();
   write32le(Buf + 2, GotPlt - Plt + 2); // GOTPLT+8
   write32le(Buf + 8, GotPlt - Plt + 4); // GOTPLT+16
 }

With these changes, I managed to run hello-world with clang {,-m32} -fuse-ld=lld -fcf-protection=branch a.c -Wl,--require-cet -o a; ./a

Great! Thank you for verifying!

MaskRay accepted this revision.Apr 1 2019, 1:04 AM

".splt" is now In.Plt. Changes to Relocations.cpp are reverted as we no longer need them.

Is --force-cet the best option name? H. J. Lu pointed out that gold has -z ibtplt and other options, but I couldn't find these options in the binutils repository. Is the option really implemented in gold?

The description needs an update: .splt -> .plt.sec, gold -> ld.bfd.

ibtplt is implemented for ld.bfd but not for gold.

This revision is now accepted and ready to land.Apr 1 2019, 1:04 AM
xiangzhangllvm added inline comments.Apr 1 2019, 1:08 AM
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

Another difference from GNU ld is that GNU ld can handle the case of multiple .note.gnu.property sections, for example:

file.s :
 .section ".note.gnu.property", "a"
....
 .section ".note.gnu.property", "a"
....

In an earlier version of my patch I supported this, but in the end, to simplify the patch, I deleted that code,
because I thought the compiler and assembler should merge them into one section (usually with multiple descriptors) one day.
I am not sure whether that thought is right, but I figured you should know.

MaskRay added inline comments.Apr 1 2019, 1:26 AM
lld/ELF/Arch/X86.cpp
414

IBT (or IBTPlt) may be a better name than IntelCET as this class has nothing to do with the other part of CET: shstk.

lld/test/ELF/Inputs/i386-cet1.s
2 ↗(On Diff #193048)

Missing .align 4

lld/test/ELF/Inputs/i386-cet2.s
1 ↗(On Diff #193048)

Missing .align 8

lld/test/ELF/Inputs/i386-cet4.s
1 ↗(On Diff #193048)

In one of the tests, it may be good to check --require-cet works as intended if .note.gnu.property contains several notes.

ld -r may merge .note.gnu.property of several input objects into one.

ruiu edited the summary of this revision. (Show Details)Apr 1 2019, 1:27 AM
MaskRay added inline comments.Apr 1 2019, 1:27 AM
lld/test/ELF/Inputs/i386-cet2.s
1 ↗(On Diff #193048)

Sorry, x86-64-cet.s needs .align 8, not i386-cet*.s.

ruiu marked an inline comment as done.Apr 1 2019, 1:33 AM
ruiu added inline comments.
lld/test/ELF/Inputs/i386-cet1.s
2 ↗(On Diff #193048)

Do you need that? It's not wrong to add it, but for the purpose of this test, isn't the file enough as is?

MaskRay added inline comments.Apr 1 2019, 1:47 AM
lld/test/ELF/Inputs/i386-cet1.s
2 ↗(On Diff #193048)

Not required but that would make the tests feel more real.

xiangzhangllvm added inline comments.Apr 1 2019, 4:21 AM
lld/ELF/SyntheticSections.cpp
314

This may waste 4 bytes of code size on 32-bit machines.

ruiu marked an inline comment as done.Apr 1 2019, 9:44 PM
ruiu added inline comments.
lld/ELF/SyntheticSections.cpp
314

Wasting 4 bytes is, I think, completely acceptable, but this looks like a correctness issue. I'll fix it.

ruiu updated this revision to Diff 193233.Apr 1 2019, 10:27 PM
ruiu edited the summary of this revision. (Show Details)
  • address review comments
ruiu marked an inline comment as done.Apr 1 2019, 10:27 PM

I noticed that the latest GNU objdump can symbolize ld.bfd-emitted .plt/.plt.sec entries but not those emitted by lld. The gdb disassembly outputs exhibit the same difference.
I guess it is probably because lld doesn't emit the bnd prefix (0xf2) while ld.bfd does (BTW, I cannot turn it off: there is no -z nobndplt), and the relevant support isn't currently available in objdump.
If that is the case, it gives me the impression that the ABI can still be changed, as no tooling has caught up yet.

Since we don't emit the bnd prefix (0xf2) for MPX (MPX support has been dropped by GCC 9), we can merge .plt and .plt.sec entries as follows:

4 endbr64
5 jmpq *xxx(%rip) ; jump to the next endbr64 for lazy binding
4 endbr64
5 pushq           ; relocaton index
5 jmpq *xxx(%rip) ; jump to .plt

This PLT entry takes 4+5+4+5+5 = 23 bytes. If we aim for 8-byte alignment, we get a 24-byte PLT entry size, which improves on the current 32 bytes.
(However, if the bnd prefix is prepended to the jmpq instructions, the PLT entry takes 25 bytes and won't fit in 24 bytes. I feel people don't care much about bnd now, so don't let it pessimize the common case.)

I have another question for @xiangzhangllvm: have you considered optimizing the PLT entry size for -z now? We could have a PLT entry as simple as:

4 endbr64
5 jmpq *xxx(%rip)

Of course, if we use -z now, there may be little reason not to just use -fno-plt when compiling the sources.

Hi MaskRay, I tested your idea by changing the patch:

diff --git a/ELF/Arch/X86_64.cpp b/ELF/Arch/X86_64.cpp
index 7d7cc00..14fb6b2 100644
--- a/ELF/Arch/X86_64.cpp
+++ b/ELF/Arch/X86_64.cpp
@@ -553,11 +553,11 @@ void IntelCET::writePlt(uint8_t *Buf, uint64_t GotPltEntryAddr,
                         unsigned RelOff) const {
   const uint8_t Inst[] = {
       0xf3, 0x0f, 0x1e, 0xfa,       // endbr64
-      0xff, 0x25, 0,    0,    0, 0, // jmpq *got(%rip)
-      0x66, 0x0f, 0x1f, 0x44, 0, 0, // nop
+      0xf2, 0xff, 0x25, 0,    0,    0, 0, // jmpq *got(%rip)
+      0x0f, 0x1f, 0x44, 0, 0, // nop
   };
   memcpy(Buf, Inst, sizeof(Inst));
-  write32le(Buf + 6, GotPltEntryAddr - PltEntryAddr - 10);
+  write32le(Buf + 7, GotPltEntryAddr - PltEntryAddr - 11);
 }

After that, GNU objdump can symbolize .plt.sec; it really is the BND prefix that matters.

[xiangzh1@shliclel100 /export/users.shared/xiangzh1/LLVM_ORG/test]$ld.lld -e func1 t.o t1.so -o tout-new
[xiangzh1@shliclel100 /export/users.shared/xiangzh1/LLVM_ORG/test]$objdump -d tout-new

tout-new:     file format elf64-x86-64


Disassembly of section .text:

0000000000201000 <func1>:
  201000:       e8 3b 00 00 00          callq  201040 <func2@plt>
  201005:       e8 46 00 00 00          callq  201050 <func3@plt>
  20100a:       c3                      retq

Disassembly of section .plt:

0000000000201010 <.plt>:
  201010:       ff 35 f2 1f 00 00       pushq  0x1ff2(%rip)        # 203008 <_DYNAMIC+0x1008>
  201016:       ff 25 f4 1f 00 00       jmpq   *0x1ff4(%rip)        # 203010 <_DYNAMIC+0x1010>
  20101c:       0f 1f 40 00             nopl   0x0(%rax)
  201020:       f3 0f 1e fa             endbr64
  201024:       68 00 00 00 00          pushq  $0x0
  201029:       e9 e2 ff ff ff          jmpq   201010 <func1+0x10>
  20102e:       66 90                   xchg   %ax,%ax
  201030:       f3 0f 1e fa             endbr64
  201034:       68 01 00 00 00          pushq  $0x1
  201039:       e9 d2 ff ff ff          jmpq   201010 <func1+0x10>
  20103e:       66 90                   xchg   %ax,%ax

Disassembly of section .plt.sec:

0000000000201040 <func2@plt>:
  201040:       f3 0f 1e fa             endbr64
  201044:       f2 ff 25 cd 1f 00 00    bnd jmpq *0x1fcd(%rip)        # 203018 <func2>
  20104b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000201050 <func3@plt>:
  201050:       f3 0f 1e fa             endbr64
  201054:       f2 ff 25 c5 1f 00 00    bnd jmpq *0x1fc5(%rip)        # 203020 <func3>
  20105b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

But I think this is not a good reason to change the ABI; it is better to change objdump's code to fit the new PLT.
GCC has implemented this ABI for a year; it would be very hard to push through a change.

We never considered optimizing the PLT entry size, because we cannot be sure how developers would deal with the old 16-byte PLT in their old programs.
That is also one of the reasons why GCC (H.J.) split the PLT into two 16-byte PLTs in the first place.

ruiu added a comment.Apr 10 2019, 2:43 AM

Hi MaskRay, I tested your idea by changing the patch: [...] After that, GNU objdump can symbolize .plt.sec; it really is the BND prefix that matters. [...] But I think this is not a good reason to change the ABI; it is better to change objdump's code to fit the new PLT. GCC has implemented this ABI for a year; it would be very hard to push through a change.

I don't think I'm convinced. MaskRay's point is that the tooling's support of CET is buggy and needs changing anyway; currently it looks like no tool is working correctly. Then why do we need to stick with an ABI that is claimed, without any evidence, to provide better performance? Keeping the ABI simple is perhaps much more important than you may think. I really wish you would take that more seriously.

Besides that, I don't think GCC itself implemented any feature of this PLT -- it is only in the BFD linker, not in gold. Also, no one is using the feature in the wild because no processors supporting CET are available on the market. Everything seems to be experimental at the moment.

We never considered optimizing the PLT entry size, because we cannot be sure how developers would deal with the old 16-byte PLT in their old programs.
That is also one of the reasons why GCC (H.J.) split the PLT into two 16-byte PLTs in the first place.

I don't think I'm convinced. MaskRay's point is that the tooling's support of CET is buggy and needs changing anyway; currently it looks like no tool is working correctly. Then why do we need to stick with an ABI that is claimed, without any evidence, to provide better performance? Keeping the ABI simple is perhaps much more important than you may think. I really wish you would take that more seriously.

Tooling output changes are straightforward to fix, and yes, CET is new and tooling is still being adjusted, but this adjustment is carried out by following the published ABI. We recently fixed ltrace for CET PLT usage in Fedora, for example.

To change the ABI you must go upstream to the ABI discussion list and have that discussion with the hardware vendor. The hardware vendor supports many different operating systems and implementations and the changes to the standard need to be discussed there. Alternatively you can make and adhere to your own standards and work with downstream tooling to support those standards as alternative implementations (hopefully with interoperability). I would not like to see two alternative ABIs, that leads to serious problems in downstream tooling and duplicated effort.

My opinion is that lld is too late to make the requested change to the PLT format at this point. You either get to adhere to the standard and participate in distribution builds, or not and you don't.

Besides that, I don't think GCC itself implemented any feature of this PLT -- it is only in the BFD linker, not in gold. Also, no one is using the feature in the wild because no processors supporting CET are available on the market. Everything seems to be experimental at the moment.

There is nothing wrong with a feature being only in the BFD linker; it is the canonical linker for many distributions and build systems. Gold is a distinct linker from BFD, just like lld is. Gold has limited support for working with CET-enabled libraries, and the point of that support is just to disable CET (that's what is minimally required, really). Lack of feature parity with BFD may mean downstream distributions cannot use the linker for building distribution binaries that must meet the security policy for the target machine, e.g. must have CET enabled.

Red Hat Enterprise Linux 8 Beta (publicly available for x86_64) is using the current CET PLT layout generated by BFD.

Hi Ruiu & Fangrui,
Your idea also makes sense, but it is really far too late to change the ABI now.
I hope we can reach a consensus as soon as possible.
Thank you all very much!

ruiu added a comment.Apr 16 2019, 3:13 AM

The root cause of the problem here is that the ABI doesn't give enough rationale for choosing a complex format over a simpler one, even though it was set just a few years ago and there's no historical burden forcing it. I was repeatedly told that it is too late to change the ABI, but that wasn't convincing (which is why the same topic came up many times in this thread), because no one said how much too late it is -- what work is already done, and if we redid it, how long that would likely take, etc. So, Carlos, thank you for giving some examples of work that has already been done for real programs.

I hope that future changes to the ABI come with more rationale whenever some obviously simpler format doesn't work for some reason.

So it seems I have two choices: just land this patch or start a discussion on the ABI mailing list to change the ABI. The latter would take much time.

Honestly, although I'm generally interested in the binary-level file format of executables, I'm less interested in CET itself, especially compared to those who work for Intel. So I'm naturally less interested in making the effort to propose a new ABI for CET. I guess I need cooperation from Intel people to make the desired change, but it looks like no one from Intel is interested. If that's the case, I should probably just land this patch and accept the consequences. But before that, let me ask again -- are you OK with the new 2-PLT scheme? Do you just accept it? Sorry for sticking to this point.

MaskRay added a comment.EditedApr 16 2019, 7:20 AM

Red Hat Enterprise Linux 8 Beta (publicly available for x86_64) is using the current CET PLT layout generated by BFD.

If it uses MPX and the bnd prefix, it is probably not too late to change.

To change the ABI you must go upstream to the ABI discussion list and have that discussion with the hardware vendor. The hardware vendor supports many different operating systems and implementations and the changes to the standard need to be discussed there. Alternatively you can make and adhere to your own standards and work with downstream tooling to support those standards as alternative implementations (hopefully with interoperability). I would not like to see two alternative ABIs, that leads to serious problems in downstream tooling and duplicated effort.

My opinion is that lld is too late to make the requested change to the PLT format at this point. You either get to adhere to the standard and participate in distribution builds, or not and you don't.

@codonell @xiangzhangllvm I did create a thread in the x86-64-abi mailing list https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI The only reply was from the maintainer. I think I'd like more opinions from others.

I am a happy Linux glibc user, but I want to learn things from multiple perspectives -- insights from other parties are good. So I asked some FreeBSD folks to chime in, and I asked how musl people felt about the second-PLT scheme. Since this second PLT is really Linux+x86+glibc lazy-binding specific, it implies more overhead for other libcs. musl people can probably live with -fno-plt (musl doesn't support glibc-style lazy binding -- they don't need a PLT at all).

To be honest, I feel the ABI is proposed in a way that forces other OSes, linkers, binary manipulation tools, emulators, disassemblers, binary analyzers, instrumentation tools and profilers to follow suit: there are announcements but no design discussions. However, I know at this point arguing is probably unavailing. Therefore, I believe I can be convinced if someone (from Intel, Red Hat, etc.) gives performance numbers demonstrating that this second-PLT scheme indeed has the claimed cache-locality merit and is better than the alternative 24-byte PLT entry scheme.

We are OK with the 2-PLT scheme.

I think I'd like **more opinions from others**.  **there are announcements but no design discussions**

Hi Fangrui, the 2-PLT scheme passed the ABI discussions; H.J. did not make it all by himself.

**gives performance numbers demonstrating this second PLT scheme has indeed the cache-locality merit and is better than the alternative 24-byte PLT entry scheme**.

Yes, we really have no performance evidence, but this design really performs better in theory, and this scheme has been in place for years and nothing has gone wrong.
The 24-byte PLT is beautiful, but that is not an overwhelming reason to overthrow the large body of previous work.

ruiu added a comment.Apr 18 2019, 1:28 AM

We are OK with the 2-PLT scheme.

I think I'd like **more opinions from others**.  **there are announcements but no design discussions**

Hi Fangrui, the 2-PLT scheme passed the ABI discussions; H.J. did not make it all by himself.

**gives performance numbers demonstrating this second PLT scheme has indeed the cache-locality merit and is better than the alternative 24-byte PLT entry scheme**.

Yes, we really have no performance evidence, but this design really performs better in theory, and this scheme has been in place for years and nothing has gone wrong.
The 24-byte PLT is beautiful, but that is not an overwhelming reason to overthrow the large body of previous work.

Let's discuss that on the x86-64 System V ABI mailing list. I signed up and once approved, I'll send a mail there.

xiangzhangllvm added a comment.EditedApr 18 2019, 8:20 PM

Hi friends, I wrote a simple test to measure the performance impact.
To keep it simple, I changed the PLT entry size from 16 bytes to 24 bytes in LLVM 8.0.0:

[xiangzh1@scels74 /export/iusers/xiangzh1/LLVM/LLVMORG]$diff llvm800/tools/lld/ELF/Arch/X86_64.cpp llvm800-PLT24/tools/lld/ELF/Arch/X86_64.cpp
67c67
<   PltEntrySize = 16;
---
>   PltEntrySize = 24; //16-->24
157a158,159
>       0x66, 0x90,                   //nop
>       0x66, 0x0f, 0x1f, 0x44, 0, 0, // nop
[xiangzh1@scels74 /export/iusers/xiangzh1/LLVM/LLVMORG]$

Then I wrote a simple test case like the following:

Main.c
#include<stdio.h>
#include<time.h>
#define FUNCTION_PLT( ID ) fplt##ID();
int main(){
int i;
clock_t t1, t2;
t1 = clock();
for(i = 0; i < 1000000; i++){
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
...
FUNCTION_PLT( n )
}
t2 = clock();
printf("Time cost %ld\n", t2-t1 );
  return 0;
}

Fplt.c
#define FUNCTION_PLT( ID ) int fplt##ID(){return ID;}
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
.....
FUNCTION_PLT( n )

After that, I built them with the 16-byte-PLT lld and the 24-byte-PLT lld:

/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/clang -c Main.c Fplt.c -w;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/ld.lld -shared Fplt.o -o Fplt.so;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800-plt24/bin/clang -fuse-ld=lld Main.o Fplt.so -o a24.out;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/clang -fuse-ld=lld Main.o Fplt.so -o a16.out;

Then I copied them to my personal PC, which only I use.
The following text is copied from my personal PC (I have not changed any of the data):

xiangzh1@xiangzh1:~/test/PLT/test$ export LD_LIBRARY_PATH=./
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2924013
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2721932
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2855578
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2330382
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2958377
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2361670
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2509809
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2451191
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2779484
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2551673
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2752218
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2752107
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2584999
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2630829
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 3148183
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2342366
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2678235
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2518696
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2767221
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2712448
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2667558
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2818396
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2740905
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2420335
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2586176
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2556211
xiangzh1@xiangzh1:~/test/PLT/test$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Stepping:            9
CPU MHz:             845.906
CPU max MHz:         4200.0000
CPU min MHz:         800.0000
BogoMIPS:            7200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
xiangzh1@xiangzh1:~/test/PLT/test$ ls
a16.out  a24.d  a24.dump  a24.out  cmd.txt  Fplt-bk.c  Fplt-bk.o  Fplt.c  Fplt.o  Fplt.so  Main-bk.c  Main-bk.o  Main.c  Main.o  perf.data  perf.data.a24  perf.data.old  pp  tt
xiangzh1@xiangzh1:~/test/PLT/test$ head -n 15 Main.c
#include<stdio.h>
#include<time.h>

#define FUNCTION_PLT( ID ) fplt##ID();

int main(){
int i;
clock_t t1, t2;
t1 = clock();
for(i = 0; i < 1000000; i++){

FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
FUNCTION_PLT( 3 )
xiangzh1@xiangzh1:~/test/PLT/test$ tail -n 10 Main.c
FUNCTION_PLT( 724 )
FUNCTION_PLT( 725 )
FUNCTION_PLT( 726 )
FUNCTION_PLT( 727 )
FUNCTION_PLT( 728 )
}
t2 = clock();
printf("Time cost %ld\n", t2-t1 );
  return 0;
}
xiangzh1@xiangzh1:~/test/PLT/test$ head -n 10 Fplt.c
#define FUNCTION_PLT( ID ) int fplt##ID(){return ID;}
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
FUNCTION_PLT( 3 )
FUNCTION_PLT( 4 )
FUNCTION_PLT( 5 )
FUNCTION_PLT( 6 )
FUNCTION_PLT( 7 )
FUNCTION_PLT( 8 )
xiangzh1@xiangzh1:~/test/PLT/test$ tail -n 10 Fplt.c
FUNCTION_PLT( 719 )
FUNCTION_PLT( 720 )
FUNCTION_PLT( 721 )
FUNCTION_PLT( 722 )
FUNCTION_PLT( 723 )
FUNCTION_PLT( 724 )
FUNCTION_PLT( 725 )
FUNCTION_PLT( 726 )
FUNCTION_PLT( 727 )
FUNCTION_PLT( 728 )
xiangzh1@xiangzh1:~/test/PLT/test$

I found that most of the time the big-PLT binary costs more time than the small-PLT one.
This change is based on the LLVM 8.0.0 lld, but the principle is the same.
I also captured the cache misses in the main function (I increased the loop count to 10000000):

ruiu added a comment.Apr 18 2019, 9:13 PM

Xiang,

I believe that your benchmark is inappropriate and does not correctly capture the performance characteristics of PLT entries of different sizes.

First of all, your microbenchmark is too artificial. You created lots of PLT entries for functions that do nothing but immediately return, and your program does nothing but hammer PLT entries all the time. That usage pattern is very different from real programs.

Second, and more importantly, I couldn't reproduce your result with the 32-byte PLT entry. One possibility is that the slowdown you observed was due to the fact that half of the 24-byte PLT entries are not aligned to 16-byte boundaries. Intel's x86-64 Optimization Manual recommends aligning jump targets to 16 bytes, so that might not be a surprise.

My machine is an AMD Ryzen Threadripper 2990WX, so your benchmark's behavior on my machine might differ from yours, but I could still observe that the benchmark slows down only with the 24-byte PLT entry, as you can see below.

$ for i in `seq 1 10`; do echo -n "PLT16 "; ./main.16; echo -n "PLT24 "; ./main.24; echo -n "PLT32 "; ./main.32; echo; done
PLT16 Time cost 2667549
PLT24 Time cost 2891321
PLT32 Time cost 2656223

PLT16 Time cost 2682222
PLT24 Time cost 3245131
PLT32 Time cost 2682468

PLT16 Time cost 2659353
PLT24 Time cost 2895083
PLT32 Time cost 2657525

PLT16 Time cost 2680843
PLT24 Time cost 2867365
PLT32 Time cost 2676404

PLT16 Time cost 2661513
PLT24 Time cost 2931767
PLT32 Time cost 2653236

PLT16 Time cost 2680713
PLT24 Time cost 2872741
PLT32 Time cost 2678326

PLT16 Time cost 2661513
PLT24 Time cost 2892384
PLT32 Time cost 2656584

PLT16 Time cost 2683864
PLT24 Time cost 2887594
PLT32 Time cost 2658880

PLT16 Time cost 2682057
PLT24 Time cost 2911524
PLT32 Time cost 2679250

PLT16 Time cost 2654767
PLT24 Time cost 2915021
PLT32 Time cost 2659874

main.16 is linked with the regular lld. main.24 is linked with a modified lld whose output PLT size is 24 bytes. main.32 is linked with a modified lld whose output PLT size is 32 bytes.

As you can see, even with your synthetic microbenchmark, I couldn't see a difference between the 16-byte and 32-byte PLT entries. With that result, I have to say that I'm even more skeptical of the 2-PLT scheme than before.

Ping. I'm planning to upstream support for the AArch64 BTI and PAC features in LLD, llvm-objdump, and llvm-readelf. I've written the LLD patches on top of this patch, as the .note.gnu.property section is used in a similar way (GNU_PROPERTY_AARCH64_FEATURE_1_AND), so only a few small alterations are needed to that part of the code. I'm wondering if it is best to:

  • Submit the patches rebased on top of this patch.
    • The advantage here is that you could see the differences in supporting X86 and AArch64 more clearly.
  • Strip out the .note.gnu.property code from this patch and resubmit it as part of the AArch64 BTI and PAC.
    • The advantage here is that the AArch64 feature doesn't have to wait for CET to land.

Any suggestions or preferences? I have the code written and will start upstreaming tomorrow. There are some LLVM patches for llvm-readobj and llvm-objdump that the LLD patches will depend on, so I'll start with those.

The Arm ELF ABI document has details on the .note.gnu.property support: https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q1-documentation.

I was also wondering if there has been any progress with this. Was there any more discussion on the ABI mailing list?

I'm sorry; I raised the problem with some leaders and experts, but the PLT ABI is still undecided. The biggest concern comes from the unknown risk of diverging from GNU.

@xiangzhangllvm If it is still undecided, then there is still opportunity to fix :) IMHO tooling is still immature and currently very few tools understand .plt.sec.

I'd really like to see more discussions about this, e.g. https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI
If PLT entries should be aligned by 16 to maximize performance, shall we switch to 32-byte PLT entry? Xiang, can you raise a topic on https://sourceware.org/ml/gnu-gabi/2019-q2 or https://sourceware.org/ml/binutils/2019-05/

I'd really like to see more discussions about this, e.g. https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI
If PLT entries should be aligned by 16 to maximize performance, shall we switch to 32-byte PLT entry? Xiang, can you raise a topic on https://sourceware.org/ml/gnu-gabi/2019-q2 or https://sourceware.org/ml/binutils/2019-05/

Hi Fangrui, a 32-byte PLT entry is also 16-byte aligned. I have tested the performance before; they are the same.

MaskRay added a comment.EditedMay 28 2019, 8:33 PM

Hi Fangrui, a 32-byte PLT entry is also 16-byte aligned. I have tested the performance before; they are the same.

Great. If the unified 32-byte PLT entry scheme doesn't regress, let's delete .plt.sec and fix the ABI. A simplified scheme will ease the adoption of CET. I guess you are in a good position to do this. You may send an email to the gnu-gabi and binutils mailing lists.

The current problem is that the 2nd PLT ABI has been on the GCC side for more than 2 years. I think we need to work with the GCC community to have a consistent ABI so we won't break each other. Let's have more discussion on the x86-64 ABI site.

@annita.zhang Thanks for weighing in! AFAIK, .plt.sec is a section synthesized by the linker. GCC doesn't know or have to know how PLT entries are represented.

for more than 2 years

This is less of a concern to me. Even GNU objdump compiled from binutils-gdb trunk (the same repository as ld.bfd) doesn't know how to symbolize such .plt.sec or .plt entries, so I believe the tooling is still immature and the adoption is low.

Annita, you and Xiang may have better ideas about which parties should be involved in the discussion. Could one of you start a thread on x86-64-abi, binutils, and gnu-gabi, and/or gcc (if you think GCC folks should also be involved)? You may CC Rui, Peter, and me.

ruiu added a comment.May 29 2019, 5:45 AM

I was planning to start a discussion on the gabi mailing list. I noticed that I hadn't joined the mailing list, and it's moderated, so I waited for my join request to be approved; then, when it was approved, I didn't start a discussion immediately. Sorry for my negligence. Let me start a discussion now.

In lld we already support two different PLT formats for mitigating Spectre. It might make sense to add them to the standard if enough people are already using them. The last time I tried it with gdb, gdb crashed because of the nonstandard PLT format.

MaskRay added inline comments.May 30 2019, 3:24 AM
lld/ELF/Config.h
85 ↗(On Diff #193233)

Split off the X86Features (or ANDFeatures) change and commit, to make D62609 easier to review?

ruiu added a comment.Jun 3 2019, 6:37 AM

I sent a proposal to the x86-64-abi mailing list. You can read the thread here: https://groups.google.com/forum/?hl=en#!topic/x86-64-abi/sQcX3__r4c0

ruiu updated this revision to Diff 203536.Jun 7 2019, 4:58 AM
  • rebase so that the patch does not diverge too much from the trunk
ruiu updated this revision to Diff 203771.Jun 9 2019, 10:20 PM
  • fix tests