This is an archive of the discontinued LLVM Phabricator instance.

Support Intel Control-flow Enforcement Technology
ClosedPublic

Authored by MaskRay on Mar 25 2019, 10:19 AM.

Details

Summary

This patch is based on https://reviews.llvm.org/D58102.

This patch adds Intel CET (Control-flow Enforcement Technology) support to
lld. The implementation follows the draft version of psABI which you can
download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.

I added a comment about what Intel CET is, so I won't repeat it here, but in
summary, CET introduces a new restriction on indirect jump instructions so
that you can limit the places to which you can jump to using indirect jumps.

In order to use the feature, you need to compile source files with
-fcf-protection=full. CET is enabled only when all input files are compiled with
the flag. If you want to make sure that the feature is enabled, pass
-z force-ibt -z shstk to lld.

CET-enabled executable files have two PLT sections, ".plt" and ".plt.sec".
For the details as to why we have two sections, please read this patch's
comment.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
peter.smith added inline comments.Mar 26 2019, 10:28 AM
lld/ELF/Driver.cpp
1684

In theory you could update Config->X86Features incrementally as you read them in rather than storing them in InputFiles and processing them later. Although I think doing all the logic here makes it easier to understand.

1692

I've made a few English suggestions below. No functional changes.

// To enable CET (x86's hardware-assited control flow enforcement),
// each source file must be compiled by passing --xxx to the compiler
// [what is the compiler flag to enable CET?]. Object files compiled with the
// flag contain feature flags indicating that they are compatible with CET. We
// enable the feature only when all object files are compatible with CET.
//
// This function returns the merged feature flags. If 0, we cannot enable CET.
lld/ELF/SyntheticSections.cpp
291

I think it would be better to say
"In x86, object files may contain feature flags indicating the features that they have used."
The x86-64-psABI has:

GNU_PROPERTY_X86_FEATURE_1_AND The x86 processor features indicated by
the corresponding bits are used in program.
2563

Only the spec is available at the moment.

2570

"4 bytes long" and "16 bytes long"

ruiu marked 10 inline comments as done.Mar 26 2019, 1:47 PM
ruiu added inline comments.
lld/ELF/Arch/X86_64.cpp
185–186

Buf is incremented on every loop iteration.

lld/ELF/InputFiles.cpp
627 ↗(On Diff #192157)

Data can be empty.

631 ↗(On Diff #192157)

SIze is actually the size of the body, so I needed +8. Fixed.

lld/test/ELF/Inputs/x86-64-cet4.s
3 ↗(On Diff #192157)

I don't want to calculate offset in the assembler as this is a test file. Something more obvious is preferable. I added i386-cet* test files.

ruiu updated this revision to Diff 192347.Mar 26 2019, 1:48 PM
ruiu marked an inline comment as done.
  • address review comments
ruiu updated this revision to Diff 192351.Mar 26 2019, 1:58 PM
  • add tests to verify the contents of .plt and .splt

Is --force-cet the best option name? H. J. Lu pointed out that gold has -z ibtplt and other options, but I couldn't find these options in the binutils' repository. Is the option really implemneted to gold?

The binutils-gdb repository hosts two linkers: ld/ld-new (GNU ld) and gold/ld-new (GNU gold). -z ibtplt and -z ibt are ld.bfd options, not gold's. I asked in https://reviews.llvm.org/D58102 if there is a proposal to add support to gold but get no response so far...

I've done some experiments and let me clarify things. @xiangzhangllvm Please correct me I missed something.

On the compiler side

Support for Intel CET requires GCC 8 (https://gcc.gnu.org/gcc-8/changes.html).

-fcf-protection= is used to define __CET__ (branch: 1, return: 2, full: 3) and emit .note.gnu.property in object files.

In some older releases of GCC 8 (before Apr 20, 2018), CET requires two other options -mibt and -mshstk, which enable the macros __IBT__ __SHSTK__, respectively. -mcet implies both -mibt and -mshstk and defines __CET__. -mshstk enables several builtin functions.

-mibt is deleted in some newer GCC 8 release. -mshstk controls only the availability of some builtin options. If you are using some older releases of GCC 8:

gcc a.c -mibt -c # `__IBT__` is defined but no .note.gnu.property
gcc a.c -fcf-protection=branch -mibt -c # NT_GNU_PROPERTY_X86_FEATURE_IBT tag
gcc a.c -fcf-protection=return -mshstk -c # NT_GNU_PROPERTY_X86_FEATURE_SHSTK tag
gcc a.c -fcf-protection=full -mcet -c # -mcet is short for -mibt -mshstk, IBT + SHSTK

In newer GCC, forget about -mibt -mshstk and just use -fcf-protection=. All these options do no affect codegen.

On the linker side

ibt is what concerns linkers. shstk has no influence other than the output note section. The feature is only implemented in ld.bfd, not in gold.

-z ibtplt generates the second PLT .plt.sec (not .splt). -z ibt implies -z ibtplt.
The option does not check the input NT_GNU_PROPERTY_X86_FEATURE_IBT tag.

-z ibt emits NT_GNU_PROPERTY_X86_FEATURE_IBT but plain -z ibtplt doesn't.

GCC 5~8 support Intel MPX and you shall see the bnd prefix in some jump instructions.
Note that GCC 9/Linux kernel 4.18 drop MPX. Fortunately it seems MPX doesn't change the size of PLT entries. Thus in lld we don't have to support MPX, but we are still flexible enough to add related support if it revives in the future.

I think you have elaborate them.
I have check it. it is really ld.bfd not ld.gold, sorry for my misremember.

xiangzhangllvm added a comment.EditedMar 26 2019, 6:52 PM

So on the compiler driver side, -fcf-protection is the single option end users are concerned. I believe this option hasn't been assigned linker option semantics in either GCC or clang. If assigning it the linker option semantics is favorable, we should make it compatible in GCC/clang and ideally use the same options in lld/ld.bfd/gold.

The current form of this patch just implements --force-cet, thus I believe the use cases are:

gcc a.o -o a # create `.splt` (or `.plt.sec`) if all of a.o and other stdlib have IBT bit set
             # I worry that someone may not want .splt for this case
  ld.lld a.o ... -o a

If all the files contain IBT, it means all the files compiled with -fcf-protection(=branch) ,there is no reason for user not to enable the IBT.

gcc a.o -fcf-protection -o a # fail if any of a.o and other stdlib do not have IBT+SHSTK bits set

It will not fail, it will continue link out a.out which contain no IBT+SHSTK

ld.lld a.o ... --force-cet -o a

This will require all files be compiled with -fcf-protection(=xxx), or it will give out error.

Another possibility is:

gcc a.o -o a # create .splt (or .plt.sec) if all of a.o and other stdlib have IBT bit set

ld.lld a.o ... -o a

gcc a.o -fcf-protection -o a # -fcf-protection is ignored as a linker option

ld.lld a.o ... -o a

Yes, current gcc and ld.lld do like this. -fcf-protection is just compiler option, it will be ignored for linker

gcc a.o -fcf-protection -Wl,--force-cet -o a # -Wl,--force-cet may be replaced by a -f flag

ld.lld a.o ... --force-cet -o a

Yes, currently, they are same, but I think people will rarely use --force-cet, it hard to linker the big projects in now days. Especially for the self build projects.

Another thing, given the removal of `-mibt` (D46881 on clang side), I wish we can repurpose `-z ibt` `-z ibtplt` `-z shstk` in ld.bfd if we can find better semantics for them, e.g. let `-fcf-protection=branch` translate to `-z ibt`: fail if any of input object files does not have `GNU_PROPERTY_X86_FEATURE_1_IBT` set. `-fcf-protection=shstk` translates to `-z shstk`, respectively.

It seems not good idea, we should separate the options of compiler and linker, and reserve the "-z xxx" options just for tests, not encourage users to use them for other purposes. Like gcc does.
In normal way, the resulting CET flags should mainly depend on whether all the link files are CET compiled or not.

xiangzhangllvm added inline comments.Mar 26 2019, 7:16 PM
lld/ELF/Arch/X86_64.cpp
185–186

Hi ruiu, I want to mean:
Next loop Buf += 16, (-getPltEntryOffset(I)) -= 16
So they will write the some address for jmpq Instruction.

lld/test/ELF/x86-64-cet.s
35–37

This context is not corresponding to your writePlt() function.

if (In.IBTPlt) {
  const uint8_t Inst[] = {
      0xf3, 0x0f, 0x1e, 0xfb,       // endbr32
      0xff, 0x25, 0,    0,    0, 0, // jmpq *got(%rip)
      0x66, 0x0f, 0x1f, 0x44, 0, 0, // nop
  };
ruiu updated this revision to Diff 192744.Mar 28 2019, 4:56 PM
  • Generally simplified the code.
  • Separated CET-specific code from the regular code.
  • Fix bugs and add tests for .got.plt, .plt and .splt.
lld/ELF/Arch/X86.cpp
430

It is not useful, For code size, Is it better to write nothing here, or not use it at all ?

MaskRay added inline comments.Mar 28 2019, 10:52 PM
lld/ELF/Arch/X86.cpp
414

brend32 -> endbr32

416

template <class ELFT> class IntelCET : public X86_64<ELFT> {

IBT-enabled PLT can be used with the ILP32 programming model (x86-32 ABI). (If you don't like putting too much stuff in the initial commit, class IntelCET : public X86_64<ELF64LE> {.

lld/ELF/Arch/X86_64.cpp
572

brend32 -> endbr64

589

If I understand it correctly, IntelCET::writePlt writes the contents of ld.bfd's .plt.sec, then it doesn't need a PLT header. Can Symbol::getPltVA (Plt->getVA() + Plt->HeaderSize + PltIndex * Target->PltEntrySize;) compute a wrong address because it thinks HeaderSize is actually 0?

Disassembly of section .plt:  // counterpart in lld: IntelCET::writeIBTPlt

0000000000001020 <.plt>:
    1020:       ff 35 e2 2f 00 00       pushq  0x2fe2(%rip)        # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026:       f2 ff 25 e3 2f 00 00    bnd jmpq *0x2fe3(%rip)        # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
    102d:       0f 1f 00                nopl   (%rax)
    1030:       f3 0f 1e fa             endbr64 
    1034:       68 00 00 00 00          pushq  $0x0
    1039:       f2 e9 e1 ff ff ff       bnd jmpq 1020 <.plt>
    103f:       90                      nop
    1040:       f3 0f 1e fa             endbr64 
    1044:       68 01 00 00 00          pushq  $0x1
    1049:       f2 e9 d1 ff ff ff       bnd jmpq 1020 <.plt>
    104f:       90                      nop
...
Disassembly of section .plt.sec:  // counterpart in lld: IntelCET::writePlt

0000000000001060 <putchar@plt>:
    1060:       f3 0f 1e fa             endbr64 
    1064:       f2 ff 25 ad 2f 00 00    bnd jmp QWORD PTR [rip+0x2fad]        # 4018 <putchar@GLIBC_2.2.5>
    106b:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

0000000000001070 <puts@plt>:
    1070:       f3 0f 1e fa             endbr64 
    1074:       f2 ff 25 a5 2f 00 00    bnd jmp QWORD PTR [rip+0x2fa5]        # 4020 <puts@GLIBC_2.2.5>
    107b:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

call flow control -> Control-Flow Enforcement ?

608 ↗(On Diff #192744)

using Elf_Nhdr = ... is more common nowadays.

(Besides, at some point, we need some IPLT + --require-cet tests.)

lld/ELF/Arch/X86.cpp
431

In the implementation of ld.bfd, I don't have 16 bytes of NOPs. I think in the ABI, .SPLT1 is the counterpart of .PLT1, there is .PLT0 (PLT header) but .SPLT0 doesn't exist.

Also, ld.bfd uses .plt.sec, not .splt. @hjl.tools Which is the preferred name for the second PLT section?

ruiu marked 10 inline comments as done.Mar 29 2019, 10:20 AM
ruiu added inline comments.
lld/ELF/Arch/X86.cpp
416

Actually the template is gone, I removed it because no code depended on ELFT. (Also no one really uses x32 ABI I guess? I thought that I read an article that Linux kernel developers are discussing deprecating x32 ABI.)

lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

This is a general name and not specific to the Intel's implementation.

608 ↗(On Diff #192744)

Yeah, but we should convert all of them in lld at once.

ruiu updated this revision to Diff 192855.Mar 29 2019, 10:20 AM
ruiu marked 2 inline comments as done.
  • removed SPLT0 as it contained just two NOPs
  • added tests for --require-cet
ruiu updated this revision to Diff 193034.Mar 31 2019, 6:00 PM
  • rebase
MaskRay added inline comments.Mar 31 2019, 8:10 PM
lld/ELF/Arch/X86.cpp
416

Sorry I didn't notice that X86_64 had been de-templated.

lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

My concern here is that llvm-objdump/GNU objdump and other tools/profilers would have to recognize two section names .plt.sec and .splt. Many tools use the section name to do special things, e.g.
@hjl.tools If the ABI can be changed to not deliberately differ from the implementation (or the other way round), it'd be great.

binutils/objdump.c:739

static bfd_boolean
is_significant_symbol_name (const char * name)
{
  return strncmp (name, ".plt", 4) == 0 || strcmp (name, ".got") == 0;
}
ruiu marked an inline comment as done.Mar 31 2019, 8:27 PM
ruiu added inline comments.
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

I'm not sure if I follow. Currently, in my patch, I use ".splt" and ".plt" as PLT section names, which I think as defined by the ABI. Are you saying that ".plt.sec" might be a correct name?

xiangzhangllvm added inline comments.Mar 31 2019, 8:35 PM
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

The SPEC not fix the plt's name, It full name is Second IBT-enabled PLT,
GCC name it as plt.sec. I name it splt for short at first.
It may be better to have the same name with GCC.

ruiu marked an inline comment as done.Mar 31 2019, 9:36 PM
ruiu added inline comments.
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

Ok, the I'll just follow the others. Is there anything other than this you invented a nonstandard stuff for this patch?

MaskRay added inline comments.Mar 31 2019, 10:12 PM
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

I'm not sure if I follow. Currently, in my patch, I use ".splt" and ".plt" as PLT section names, which I think as defined by the ABI. Are you saying that ".plt.sec" might be a correct name?

Sorry for not being clear. PLT sections do not have a dedicated section type and in practice they are usually recognized by the name .plt. My example above gave one case. If ld.bfd uses .plt.sec while we use .splt, the tools (objdump/llvm-mc/profiler/...) would have to understand the two names. The complexity could have been avoided if the linkers and the ABI agreed on the same name, say, .plt.sec.

I prefer .plt.sec to .splt because the same-prefix convention is already used in several other places to have fine-grained section types.

Thus, I suggest, if there is no strong reason to keep using .splt in the ABI, change it to what the implementations do: .plt.sec.

ruiu updated this revision to Diff 193044.Mar 31 2019, 10:49 PM
  • renamed .splt .plt.sec

writePltHeader should also be changed.

diff --git i/ELF/Arch/X86_64.cpp w/ELF/Arch/X86_64.cpp
index 668b40863..75f780f6b 100644
--- i/ELF/Arch/X86_64.cpp
+++ w/ELF/Arch/X86_64.cpp
@@ -144,7 +144,7 @@ void X86_64::writePltHeader(uint8_t *Buf) const {
   };
   memcpy(Buf, PltData, sizeof(PltData));
   uint64_t GotPlt = In.GotPlt->getVA();
-  uint64_t Plt = In.Plt->getVA();
+  uint64_t Plt = In.IBTPlt ? In.IBTPlt->getVA() : In.Plt->getVA();
   write32le(Buf + 2, GotPlt - Plt + 2); // GOTPLT+8
   write32le(Buf + 8, GotPlt - Plt + 4); // GOTPLT+16
 }

With these changes, I manage to run hello-world with clang {,-m32} -fuse-ld=lld -fcf-protection=branch a.c -Wl,--require-cet -o a; ./a

lld/ELF/Arch/X86_64.cpp
618

write32le(Buf + 5, I);

For glibc lazy binding, x86_64 uses relocation indices while x86_32 uses offsets.

lld/ELF/Writer.cpp
532

Since In.IBTPlt is named .plt while In.Plt is named .plt.sec.
Creating In.IBTPlt before In.Plt will give us this ordering: .plt .plt.sec (instead of .plt.sec .plt), which may look more natural.

No strong preference here. Up to you.

ruiu updated this revision to Diff 193048.Apr 1 2019, 12:53 AM
  • review comments
ruiu added a comment.Apr 1 2019, 12:53 AM

writePltHeader should also be changed.

diff --git i/ELF/Arch/X86_64.cpp w/ELF/Arch/X86_64.cpp
index 668b40863..75f780f6b 100644
--- i/ELF/Arch/X86_64.cpp
+++ w/ELF/Arch/X86_64.cpp
@@ -144,7 +144,7 @@ void X86_64::writePltHeader(uint8_t *Buf) const {
   };
   memcpy(Buf, PltData, sizeof(PltData));
   uint64_t GotPlt = In.GotPlt->getVA();
-  uint64_t Plt = In.Plt->getVA();
+  uint64_t Plt = In.IBTPlt ? In.IBTPlt->getVA() : In.Plt->getVA();
   write32le(Buf + 2, GotPlt - Plt + 2); // GOTPLT+8
   write32le(Buf + 8, GotPlt - Plt + 4); // GOTPLT+16
 }

With these changes, I manage to run hello-world with clang {,-m32} -fuse-ld=lld -fcf-protection=branch a.c -Wl,--require-cet -o a; ./a

Great! Thank you for verifying!

MaskRay accepted this revision.Apr 1 2019, 1:04 AM

".splt" is now In.Plt. Changes to Relocations.cpp are reverted as we no longer need them.

Is --force-cet the best option name? H. J. Lu pointed out that gold has -z ibtplt and other options, but I couldn't find these options in the binutils' repository. Is the option really implemneted to gold?

The description needs update. .splt -> .plt.sec, gold -> ld.bfd

ibtplt is implemented for ld.bfd but not for gold.

This revision is now accepted and ready to land.Apr 1 2019, 1:04 AM
xiangzhangllvm added inline comments.Apr 1 2019, 1:08 AM
lld/ELF/InputFiles.cpp
594 ↗(On Diff #192744)

Another difference with GNU-LD is that, GNU-LD can deal with the muli-.note.gnu.sections case, for example:

file.s :
 .section ".note.gnu.property", "a"
....
 .section ".note.gnu.property", "a"
....

In my earlier patch version I supported it, but at last, to simply the patch, I delete this function.
Because I sought compiler and assember should merge them to 1 section (usually with multi-desc) one day.
I am not sure my sought is right or not. I think it should let you know.

MaskRay added inline comments.Apr 1 2019, 1:26 AM
lld/ELF/Arch/X86.cpp
416

IBT (or IBTPlt) may be a better name than IntelCET as this class has nothing to do with the other part of CET: shstk.

lld/test/ELF/Inputs/i386-cet1.s
2 ↗(On Diff #193048)

Missing .align 4

lld/test/ELF/Inputs/i386-cet2.s
1 ↗(On Diff #193048)

Missing .align 8

lld/test/ELF/Inputs/i386-cet4.s
1 ↗(On Diff #193048)

In one of the tests, it may be good to check --require-cet works as intended if .note.gnu.property contains several notes.

ld -r may merge .note.gnu.property of several input objects into one.

ruiu edited the summary of this revision. (Show Details)Apr 1 2019, 1:27 AM
MaskRay added inline comments.Apr 1 2019, 1:27 AM
lld/test/ELF/Inputs/i386-cet2.s
1 ↗(On Diff #193048)

Sorry, x86-64-cet.s needs .align 8, not i386-cet*.s.

ruiu marked an inline comment as done.Apr 1 2019, 1:33 AM
ruiu added inline comments.
lld/test/ELF/Inputs/i386-cet1.s
2 ↗(On Diff #193048)

Do you need that? It's not wrong to add that but for the purpose of testing this file is enough, no?

MaskRay added inline comments.Apr 1 2019, 1:47 AM
lld/test/ELF/Inputs/i386-cet1.s
2 ↗(On Diff #193048)

Not required but that would make the tests feel more real.

xiangzhangllvm added inline comments.Apr 1 2019, 4:21 AM
lld/ELF/SyntheticSections.cpp
311

Here may waste 4 bytes of code size in 32-bit machine.

ruiu marked an inline comment as done.Apr 1 2019, 9:44 PM
ruiu added inline comments.
lld/ELF/SyntheticSections.cpp
311

Wasting 4 bytes is I think completely acceptable, but I think this is a correctness issue. I'll fix that.

ruiu updated this revision to Diff 193233.Apr 1 2019, 10:27 PM
ruiu edited the summary of this revision. (Show Details)
  • address review comments
ruiu marked an inline comment as done.Apr 1 2019, 10:27 PM

I noticed that the latest GNU objdump can symbolize ld.bfd emitted .plt .plt.sec entries but not those emitted by lld. The gdb disassembly outputs exhibit the same difference.
I guess it is probably because lld doesn't emit the bnd prefix (0xf2) while ld.bfd does (BTW, I cannot turn off it: there is no -z nobndplt), and the relevant support isn't currently available in objdump.
If that is the case, it gives me the impression that the ABI can be changed as no tooling has caught up yet.

Since we don't emit the bnd prefix (0xf2) for MPX (MPX support has been dropped by GCC 9), we can merge .plt and .plt.sec entries as follows:

4 endbr64
5 jmpq *xxx(%rip) ; jump to the next endbr64 for lazy binding
4 endbr64
5 pushq           ; relocaton index
5 jmpq *xxx(%rip) ; jump to .plt

This PLT entry takes 4+5+4+5+5=23 bytes. If we aim for an 8-byte alignment, we can get the 24-byte PLT entry size, which improves on the current 32-byte.
(However, if the bnd prefix is prepended to the jmpq instructions, the PLT will take 25 bytes and won't cram in a 24-byte entry. I feel people don't care much about it now, so don't let it pessimize common cases)

I have another question to @xiangzhangllvm: have you considered optimizing the PLT entry size for -z now? We can have such simple PLT entry:

4 endbr64
5 jmpq *xxx(%rip)

Of course, if we use -z now, there may be little reason not to just use -fno-plt when compiling the sources.

Hi MaskRay, I tested your idea by changing the patch:

diff --git a/ELF/Arch/X86_64.cpp b/ELF/Arch/X86_64.cpp
index 7d7cc00..14fb6b2 100644
--- a/ELF/Arch/X86_64.cpp
+++ b/ELF/Arch/X86_64.cpp
@@ -553,11 +553,11 @@ void IntelCET::writePlt(uint8_t *Buf, uint64_t GotPltEntryAddr,
                         unsigned RelOff) const {
   const uint8_t Inst[] = {
       0xf3, 0x0f, 0x1e, 0xfa,       // endbr64
-      0xff, 0x25, 0,    0,    0, 0, // jmpq *got(%rip)
-      0x66, 0x0f, 0x1f, 0x44, 0, 0, // nop
+      0xf2, 0xff, 0x25, 0,    0,    0, 0, // jmpq *got(%rip)
+      0x0f, 0x1f, 0x44, 0, 0, // nop
   };
   memcpy(Buf, Inst, sizeof(Inst));
-  write32le(Buf + 6, GotPltEntryAddr - PltEntryAddr - 10);
+  write32le(Buf + 7, GotPltEntryAddr - PltEntryAddr - 11);
 }

After that, GNU objdump can symbolize .plt.sec, it is really the BND prefix factor.

[xiangzh1@shliclel100 /export/users.shared/xiangzh1/LLVM_ORG/test]$ld.lld -e func1 t.o t1.so -o tout-new
[xiangzh1@shliclel100 /export/users.shared/xiangzh1/LLVM_ORG/test]$objdump -d tout-new

tout-new:     file format elf64-x86-64


Disassembly of section .text:

0000000000201000 <func1>:
  201000:       e8 3b 00 00 00          callq  201040 <func2@plt>
  201005:       e8 46 00 00 00          callq  201050 <func3@plt>
  20100a:       c3                      retq

Disassembly of section .plt:

0000000000201010 <.plt>:
  201010:       ff 35 f2 1f 00 00       pushq  0x1ff2(%rip)        # 203008 <_DYNAMIC+0x1008>
  201016:       ff 25 f4 1f 00 00       jmpq   *0x1ff4(%rip)        # 203010 <_DYNAMIC+0x1010>
  20101c:       0f 1f 40 00             nopl   0x0(%rax)
  201020:       f3 0f 1e fa             endbr64
  201024:       68 00 00 00 00          pushq  $0x0
  201029:       e9 e2 ff ff ff          jmpq   201010 <func1+0x10>
  20102e:       66 90                   xchg   %ax,%ax
  201030:       f3 0f 1e fa             endbr64
  201034:       68 01 00 00 00          pushq  $0x1
  201039:       e9 d2 ff ff ff          jmpq   201010 <func1+0x10>
  20103e:       66 90                   xchg   %ax,%ax

Disassembly of section .plt.sec:

0000000000201040 <func2@plt>:
  201040:       f3 0f 1e fa             endbr64
  201044:       f2 ff 25 cd 1f 00 00    bnd jmpq *0x1fcd(%rip)        # 203018 <func2>
  20104b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000201050 <func3@plt>:
  201050:       f3 0f 1e fa             endbr64
  201054:       f2 ff 25 c5 1f 00 00    bnd jmpq *0x1fc5(%rip)        # 203020 <func3>
  20105b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

But I think this is not the good reason to change the ABI, it is better to change the objdump tool's code to fit the new plt.
GCC have implemented this ABI for one year, It will be very hard to push the changing work.

We never considered optimizing the PLT entry size, because we can not make sure how the developers to "deal with" the old 16-bytes plt in their old programs.
That is also one of the reasons why GCC (H.J.) split the plt to 2 16-bytes plts at first.

ruiu added a comment.Apr 10 2019, 2:43 AM

Hi MaskRay, I tested your idea by changing the patch:

diff --git a/ELF/Arch/X86_64.cpp b/ELF/Arch/X86_64.cpp
index 7d7cc00..14fb6b2 100644
--- a/ELF/Arch/X86_64.cpp
+++ b/ELF/Arch/X86_64.cpp
@@ -553,11 +553,11 @@ void IntelCET::writePlt(uint8_t *Buf, uint64_t GotPltEntryAddr,
                         unsigned RelOff) const {
   const uint8_t Inst[] = {
       0xf3, 0x0f, 0x1e, 0xfa,       // endbr64
-      0xff, 0x25, 0,    0,    0, 0, // jmpq *got(%rip)
-      0x66, 0x0f, 0x1f, 0x44, 0, 0, // nop
+      0xf2, 0xff, 0x25, 0,    0,    0, 0, // jmpq *got(%rip)
+      0x0f, 0x1f, 0x44, 0, 0, // nop
   };
   memcpy(Buf, Inst, sizeof(Inst));
-  write32le(Buf + 6, GotPltEntryAddr - PltEntryAddr - 10);
+  write32le(Buf + 7, GotPltEntryAddr - PltEntryAddr - 11);
 }

After that, GNU objdump can symbolize .plt.sec, it is really the BND prefix factor.

[xiangzh1@shliclel100 /export/users.shared/xiangzh1/LLVM_ORG/test]$ld.lld -e func1 t.o t1.so -o tout-new
[xiangzh1@shliclel100 /export/users.shared/xiangzh1/LLVM_ORG/test]$objdump -d tout-new

tout-new:     file format elf64-x86-64


Disassembly of section .text:

0000000000201000 <func1>:
  201000:       e8 3b 00 00 00          callq  201040 <func2@plt>
  201005:       e8 46 00 00 00          callq  201050 <func3@plt>
  20100a:       c3                      retq

Disassembly of section .plt:

0000000000201010 <.plt>:
  201010:       ff 35 f2 1f 00 00       pushq  0x1ff2(%rip)        # 203008 <_DYNAMIC+0x1008>
  201016:       ff 25 f4 1f 00 00       jmpq   *0x1ff4(%rip)        # 203010 <_DYNAMIC+0x1010>
  20101c:       0f 1f 40 00             nopl   0x0(%rax)
  201020:       f3 0f 1e fa             endbr64
  201024:       68 00 00 00 00          pushq  $0x0
  201029:       e9 e2 ff ff ff          jmpq   201010 <func1+0x10>
  20102e:       66 90                   xchg   %ax,%ax
  201030:       f3 0f 1e fa             endbr64
  201034:       68 01 00 00 00          pushq  $0x1
  201039:       e9 d2 ff ff ff          jmpq   201010 <func1+0x10>
  20103e:       66 90                   xchg   %ax,%ax

Disassembly of section .plt.sec:

0000000000201040 <func2@plt>:
  201040:       f3 0f 1e fa             endbr64
  201044:       f2 ff 25 cd 1f 00 00    bnd jmpq *0x1fcd(%rip)        # 203018 <func2>
  20104b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000201050 <func3@plt>:
  201050:       f3 0f 1e fa             endbr64
  201054:       f2 ff 25 c5 1f 00 00    bnd jmpq *0x1fc5(%rip)        # 203020 <func3>
  20105b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

But I think this is not the good reason to change the ABI, it is better to change the objdump tool's code to fit the new plt.
GCC have implemented this ABI for one year, It will be very hard to push the changing work.

I don't think I'm convinced. MaskRay's point is that the tooling's support of CET is buggy and needs changing anyway. Currently it looks like no tool seems to be working correctly. Then, why don't you need to stick with the ABI that is claimed to provide better performance without any evidence? Keeping ABI simple is perhaps much more important than you may be thinking. I really wish you will take it more seriously.

Besides that, I don't think GCC implemented any of the feature of the PLT -- it is only in the BFD linker but not in gold. Also no one is using the feature in the wild because no processors supporting CET is available in the market. Everything seems to be experimental yet at the moment.

We never considered optimizing the PLT entry size, because we can not make sure how the developers to "deal with" the old 16-bytes plt in their old programs.
That is also one of the reasons why GCC (H.J.) split the plt to 2 16-bytes plts at first.

I don't think I'm convinced. MaskRay's point is that the tooling's support of CET is buggy and needs changing anyway. Currently it looks like no tool seems to be working correctly. Then, why don't you need to stick with the ABI that is claimed to provide better performance without any evidence? Keeping ABI simple is perhaps much more important than you may be thinking. I really wish you will take it more seriously.

Tooling output changes are straight forward to fix, and yes CET is new, and tooling is still being adjusted, but this adjustment is carried out by following the published ABI. We recently fixed ltrace for CET plt usage in Fedora for example.

To change the ABI you must go upstream to the ABI discussion list and have that discussion with the hardware vendor. The hardware vendor supports many different operating systems and implementations and the changes to the standard need to be discussed there. Alternatively you can make and adhere to your own standards and work with downstream tooling to support those standards as alternative implementations (hopefully with interoperability). I would not like to see two alternative ABIs, that leads to serious problems in downstream tooling and duplicated effort.

My opinion is that lld is too late to make the requested change to the PLT format at this point. You either get to adhere to the standard and participate in distribution builds, or not and you don't.

Besides that, I don't think GCC implemented any of the feature of the PLT -- it is only in the BFD linker but not in gold. Also no one is using the feature in the wild because no processors supporting CET is available in the market. Everything seems to be experimental yet at the moment.

There is nothing wrong with a feature being only in the BFD linker, it is the canonical linker for many distributions and build systems. Gold is a distinct linker from BFD, just like lld is a distinct linker from BFD. Gold has limited support for working with CET-enabled libraries, and the point of this support is just to disable CET (that's what is minimally required realy). Lack of feature parity with BFD may mean downstream distributions cannot use the linker for building distribution binaries that meet security policy for the target machine e.g. must have CET enabled.

Red Hat Enterprise Linux 8 Beta (publicly available for x86_64) is using the current CET PLT layout generated by BFD.

Hi Ruiu & Fangrui
Your idea is also make sense, but It is really too too late to change ABI now.
I wish we can reach a consensus as soon as possible.
Thank you all, very much!

ruiu added a comment.Apr 16 2019, 3:13 AM

The root cause of the problem here is that the API doesn't have enough rationale to choose a complex format than a simpler one, even though it was set just a few years ago and there's no historical burden to do so. I was repeatedly told that it is too late to change the ABI but that wasn't convincing (that's why the same topic came up many times in the thread) because no one said how much it is too late -- what thing is already done and if we redo it how long does it likely to take, etc. So, Carlos, thank you for giving some examples of the stuff that's already done to some programs.

I hope that future changes to the ABI have more rationale if some obvious simpler file format doesn't work for some reason.

So it seems I have two choices: just land this patch or start a discussion on the ABI mailing list to change the ABI. The latter would take much time.

Honestly, although I'm generally interested in the binary-level file format of executable files, I'm less interested in CET itself, especially compared to those who are working for Intel. So I'm naturally less interested in making efforts to propose a new ABI for CET. I guess I need a cooperation from Intel people to make a desired change. But looks like no one from Intel is interested in it. If that's the case, I probably should just land this patch and accept the consequence. But before that, let me ask again -- are you guys OK with the new 2-PLT scheme? Do you just accept that? Sorry for sticking to this point.

MaskRay added a comment.EditedApr 16 2019, 7:20 AM

Red Hat Enterprise Linux 8 Beta (publicly available for x86_64) is using the current CET PLT layout generated by BFD.

If it uses MPX and the bnd prefix, it is probably not too late to change.

To change the ABI you must go upstream to the ABI discussion list and have that discussion with the hardware vendor. The hardware vendor supports many different operating systems and implementations and the changes to the standard need to be discussed there. Alternatively you can make and adhere to your own standards and work with downstream tooling to support those standards as alternative implementations (hopefully with interoperability). I would not like to see two alternative ABIs, that leads to serious problems in downstream tooling and duplicated effort.

My opinion is that lld is too late to make the requested change to the PLT format at this point. You either get to adhere to the standard and participate in distribution builds, or not and you don't.

@codonell @xiangzhangllvm I did create a thread in the x86-64-abi mailing list https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI The only reply was from the maintainer. I think I'd like more opinions from others.

I am a happy Linux glibc user but I want to learn things from multiple perspectives - insights from other parties are good. So I asked some FreeBSD folks to chime in and I asked how musl people felt about the second PLT scheme. Since this second PLT is really Linux+x86+glibc lazybind specific - it implies more overhead to other libcs. musl people can probably live with -fno-plt (musl doesn't support glibc style lazybind - they don't need PLT at all)

To be honest I feel the ABI is proposed in such a way that enforces other OSes, linkers, binary manipulation tools, emulators, diassemblers, binary analyzers, instrumentation tools and profilers to follow suit: there are announcements but no design discussions. However, I know at this point arguing is probably unavailing. Therefore, I believe I can be convinced if someone (from Intel, Red Hat, etc) gives performance numbers demonstrating this second PLT scheme has indeed the cache-locality merit and is better than the alternative 24-byte PLT entry scheme.

We are Ok with with the 2-PLT scheme.

I think I'd like more opinions from others. there are announcements but no design discussions

Hi, Fangrui, the 2-PLT scheme has pass the ABI discussions, not H.J just made it by only himself.

gives performance numbers demonstrating this second PLT scheme has indeed the cache-locality merit and is better than the alternative 24-byte PLT entry scheme.

Yes, We really have not performance evidence, but this design is really more performance in theory, and this scheme has done years , and nothing went wrong.
The 24-bit plt is beautiful, but it is not the overwhelming reason to overthrow the big previous works.

ruiu added a comment.Apr 18 2019, 1:28 AM

We are Ok with with the 2-PLT scheme.

I think I'd like more opinions from others. there are announcements but no design discussions

Hi, Fangrui, the 2-PLT scheme has pass the ABI discussions, not H.J just made it by only himself.

gives performance numbers demonstrating this second PLT scheme has indeed the cache-locality merit and is better than the alternative 24-byte PLT entry scheme.

Yes, We really have not performance evidence, but this design is really more performance in theory, and this scheme has done years , and nothing went wrong.
The 24-bit plt is beautiful, but it is not the overwhelming reason to overthrow the big previous works.

Let's discuss that on the x86-64 System V ABI mailing list. I signed up and once approved, I'll send a mail there.

xiangzhangllvm added a comment.EditedApr 18 2019, 8:20 PM

Hi friends, I write a simple test to test the performance:
To simply test the performance affect I changed the PLT size from 16 Bytes to 24Bytes in LLVM 8.0.0.

[xiangzh1@scels74 /export/iusers/xiangzh1/LLVM/LLVMORG]$diff llvm800/tools/lld/ELF/Arch/X86_64.cpp llvm800-PLT24/tools/lld/ELF/Arch/X86_64.cpp
67c67
<   PltEntrySize = 16;
---
>   PltEntrySize = 24; //16-->24
157a158,159
>       0x66, 0x90,                   //nop
>       0x66, 0x0f, 0x1f, 0x44, 0, 0, // nop
[xiangzh1@scels74 /export/iusers/xiangzh1/LLVM/LLVMORG]$

Then I write a simple case like following:

Main.c
#include<stdio.h>
#include<time.h>
#define FUNCTION_PLT( ID ) fplt##ID();
int main(){
int i;
clock_t t1, t2;
t1 = clock();
for(i = 0; i < 1000000; i++){
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
...
FUNCTION_PLT( n )
}
t2 = clock();
printf("Time cost %ld\n", t2-t1 );
  return 0;
}

Fplt.c
#define FUNCTION_PLT( ID ) int fplt##ID(){return ID;}
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
.....
FUNCTION_PLT( n )

After that, I build them with 16Byte PLT-LLD and 24Byte PLT-LLD:

/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/clang -c Main.c Fplt.c -w;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/ld.lld -shared Fplt.o -o Fplt.so;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800-plt24/bin/clang -fuse-ld=lld Main.o Fplt.so -o a24.out;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/clang -fuse-ld=lld Main.o Fplt.so -o a16.out;

Then I copy them to my personal PC which just only I using it.
The follow text is copied from my personal PC (originally, I not change any data):

xiangzh1@xiangzh1:~/test/PLT/test$ export LD_LIBRARY_PATH=./
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2924013
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2721932
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2855578
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2330382
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2958377
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2361670
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2509809
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2451191
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2779484
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2551673
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2752218
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2752107
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2584999
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2630829
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 3148183
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2342366
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2678235
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2518696
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2767221
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2712448
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2667558
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2818396
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2740905
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2420335
xiangzh1@xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2586176
xiangzh1@xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2556211
xiangzh1@xiangzh1:~/test/PLT/test$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Stepping:            9
CPU MHz:             845.906
CPU max MHz:         4200.0000
CPU min MHz:         800.0000
BogoMIPS:            7200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
xiangzh1@xiangzh1:~/test/PLT/test$ ls
a16.out  a24.d  a24.dump  a24.out  cmd.txt  Fplt-bk.c  Fplt-bk.o  Fplt.c  Fplt.o  Fplt.so  Main-bk.c  Main-bk.o  Main.c  Main.o  perf.data  perf.data.a24  perf.data.old  pp  tt
xiangzh1@xiangzh1:~/test/PLT/test$ head -n 15 Main.c
#include<stdio.h>
#include<time.h>

#define FUNCTION_PLT( ID ) fplt##ID();

int main(){
int i;
clock_t t1, t2;
t1 = clock();
for(i = 0; i < 1000000; i++){

FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
FUNCTION_PLT( 3 )
xiangzh1@xiangzh1:~/test/PLT/test$ tail -n 10 Main.c
FUNCTION_PLT( 724 )
FUNCTION_PLT( 725 )
FUNCTION_PLT( 726 )
FUNCTION_PLT( 727 )
FUNCTION_PLT( 728 )
}
t2 = clock();
printf("Time cost %ld\n", t2-t1 );
  return 0;
}
xiangzh1@xiangzh1:~/test/PLT/test$ head -n 10 Fplt.c
#define FUNCTION_PLT( ID ) int fplt##ID(){return ID;}
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
FUNCTION_PLT( 3 )
FUNCTION_PLT( 4 )
FUNCTION_PLT( 5 )
FUNCTION_PLT( 6 )
FUNCTION_PLT( 7 )
FUNCTION_PLT( 8 )
xiangzh1@xiangzh1:~/test/PLT/test$ tail -n 10 Fplt.c
FUNCTION_PLT( 719 )
FUNCTION_PLT( 720 )
FUNCTION_PLT( 721 )
FUNCTION_PLT( 722 )
FUNCTION_PLT( 723 )
FUNCTION_PLT( 724 )
FUNCTION_PLT( 725 )
FUNCTION_PLT( 726 )
FUNCTION_PLT( 727 )
FUNCTION_PLT( 728 )
xiangzh1@xiangzh1:~/test/PLT/test$

I found most time the big plt cost more time than small plt file.
This change just base on LLVM8.0.0-LLD, but the the principle is the same.
I also catch the cache miss in main function (I add the loop times to 10000000):

ruiu added a comment.Apr 18 2019, 9:13 PM

Xiang,

I believe that your benchmark is inappropriate and does not correctly capture correct performance characteristics of PLT entries of different sizes.

First of all, your microbenchmark is too artificial. You created lots of PLT entries for functions that don't do anything but immediately return, and your program doesn't do anything but hammer PLT entries all the time. That usage pattern is very different from real programs.

Second of all, and more importantly, I couldn't reproduce your result with the PLT of size 32 bytes. One possibility is that the performance you noticed was due to the fact that half of the 24-byte PLT entries are not aligned to 16 byte boundaries. Intel x86-64 Optimization Manual recommends that you align jump target to 16 bytes, so it might not be a surprise, though.

My machine is AMD Ryzen Threadripper 2990WX, so your benchmark's behavior on my machine might be different from yours, but still I could observe that your benchmark slows down only when I choose to use 24-byte PLT entry, as you can see below.

$ for i in `seq 1 10`; do echo -n "PLT16 "; ./main.16; echo -n "PLT24 "; ./main.24; echo -n "PLT32 "; ./main.32; echo; done
PLT16 Time cost 2667549
PLT24 Time cost 2891321
PLT32 Time cost 2656223

PLT16 Time cost 2682222
PLT24 Time cost 3245131
PLT32 Time cost 2682468

PLT16 Time cost 2659353
PLT24 Time cost 2895083
PLT32 Time cost 2657525

PLT16 Time cost 2680843
PLT24 Time cost 2867365
PLT32 Time cost 2676404

PLT16 Time cost 2661513
PLT24 Time cost 2931767
PLT32 Time cost 2653236

PLT16 Time cost 2680713
PLT24 Time cost 2872741
PLT32 Time cost 2678326

PLT16 Time cost 2661513
PLT24 Time cost 2892384
PLT32 Time cost 2656584

PLT16 Time cost 2683864
PLT24 Time cost 2887594
PLT32 Time cost 2658880

PLT16 Time cost 2682057
PLT24 Time cost 2911524
PLT32 Time cost 2679250

PLT16 Time cost 2654767
PLT24 Time cost 2915021
PLT32 Time cost 2659874

main.16 is linked with the regular lld. main.24 is linked with a modified lld whose output PLT size is 24 bytes. main.32 is linked with a modified lld whose output PLT size is 32 bytes.

As you can see, even with your synthesized microbenchmark, I couldn't see the difference of 16-byte PLT and 32-byte PLT. With that result, I have to say that I'm even more skeptical of the 2-PLT scheme than before.

Ping. I'm planning to upstream support for the AArch64 BTI and PAC features in LLD, llvm-objdump and llvm-readelf. I've written the LLD patches on top of this patch as the .note.gnu.property section is used in a similar way (GNU_PROPERTY_AARCH64_FEATURE_1_AND) so only a few small alterations are needed to that part of the code. I'm wondering if it is best to:

  • Submit the patches rebased on top of this patch.
    • The advantage here is that you could see the differences in supporting X86 and AArch64 more clearly.
  • Strip out the .note.gnu.property code from this patch and resubmit it as part of the AArch64 BTI and PAC.
    • The advantage here is that the AArch64 feature doesn't have to wait for CET to land.

Any suggestions or preferences? I have the code written and will start upstreaming tomorrow. There are some LLVM for llvm-readobj and llvm-objdump that the LLD patches will depend on so I'll start with those.

The ELF ABI has details on the .note.gnu.property support. https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q1-documentation.

I was also wondering if there has been any progress with this. Was there any more discussion on the ABI mailing list?

I'm sorry, I reflect the problem to some leaders and experts, but the PLT ABI is still undecided. The most concern come from the unknow risk of difference with GNU.

I'm sorry, I reflect the problem to some leaders and experts, but the PLT ABI is still undecided. The most concern come from the unknow risk of difference with GNU.

@xiangzhangllvm If it is still undecided, then there is still opportunity to fix :) IMHO tooling is still immature and currently very few tools understand .plt.sec.

I'd really like to see more discussions about this, e.g. https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI
If PLT entries should be aligned by 16 to maximize performance, shall we switch to 32-byte PLT entry? Xiang, can you raise a topic on https://sourceware.org/ml/gnu-gabi/2019-q2 or https://sourceware.org/ml/binutils/2019-05/

I'm sorry, I reflect the problem to some leaders and experts, but the PLT ABI is still undecided. The most concern come from the unknow risk of difference with GNU.

@xiangzhangllvm If it is still undecided, then there is still opportunity to fix :) IMHO tooling is still immature and currently very few tools understand .plt.sec.

I'd really like to see more discussions about this, e.g. https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI
If PLT entries should be aligned by 16 to maximize performance, shall we switch to 32-byte PLT entry? Xiang, can you raise a topic on https://sourceware.org/ml/gnu-gabi/2019-q2 or https://sourceware.org/ml/binutils/2019-05/

hi, fangrui, 32-byte PLT entry is also aligned by 16, I have test the performance before, they are same.

MaskRay added a comment.EditedMay 28 2019, 8:33 PM

I'm sorry, I reflect the problem to some leaders and experts, but the PLT ABI is still undecided. The most concern come from the unknow risk of difference with GNU.

@xiangzhangllvm If it is still undecided, then there is still opportunity to fix :) IMHO tooling is still immature and currently very few tools understand .plt.sec.

I'd really like to see more discussions about this, e.g. https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI
If PLT entries should be aligned by 16 to maximize performance, shall we switch to 32-byte PLT entry? Xiang, can you raise a topic on https://sourceware.org/ml/gnu-gabi/2019-q2 or https://sourceware.org/ml/binutils/2019-05/

hi, fangrui, 32-byte PLT entry is also aligned by 16, I have test the performance before, they are same.

Great. If the unified 32-byte PLT entry scheme doesn't regress, let's delete .plt.sec and fix the ABI. A simplified scheme will ease the adoption of CET. I guess you are in a good position to do this. You may send an email to the gnu-gabi and binutils mailing lists.

The current problem is the 2nd PLT ABI was in GCC side for more than 2 years. I think we need to work with GCC community to have a consistent ABI so we won't break each other. Let's have more discussion in x86_64 ABI site.

The current problem is the 2nd PLT ABI was in GCC side for more than 2 years. I think we need to work with GCC community to have a consistent ABI so we won't break each other. Let's have more discussion in x86_64 ABI site.

@annita.zhang Thanks for weighing in! AFAIK, .plt.sec is a section synthesized by the linker. GCC doesn't know or have to know how PLT entries are represented.

for more than 2 years

This is less of a concern to me. Even GNU objdump compiled from binutils-gdb trunk (the same repository as ld.bfd) doesn't know how to symbolize such .plt.sec or .plt entries, so I believe the tooling is still immature and the adoption is low.

Annita, you and Xiang may have better ideas which parties should be involved in the discussion. Could one of you start a thread on x86-64-abi, binutils, and gnu-gabi? And/or gcc (if you think GCC folks should also be involved). You may CC Rui, Peter, and me.

ruiu added a comment.May 29 2019, 5:45 AM

I was planning to start a discussion at the gabi mailing list. I noticed that I didn't join the mailing list, and it's moderated, so I waited my join request would be approved, and then when it was approved, I didn't start a discussion immediately. Sorry for my negligence. Let me start a discussion.

In lld we already support two different PLT formats for mitigating Spectre. It might make sense to add them to the standard if enough number of people are already using them. Last I tried it with gdb, gdb crashes because of the nonstandard PLT format.

MaskRay added inline comments.May 30 2019, 3:24 AM
lld/ELF/Config.h
89

Split off the X86Features (or ANDFeatures) change and commit, to make D62609 easier to review?

ruiu added a comment.Jun 3 2019, 6:37 AM

I sent a proposal to the x86-64-abi mailing list. You can read the thread here: https://groups.google.com/forum/?hl=en#!topic/x86-64-abi/sQcX3__r4c0

ruiu updated this revision to Diff 203536.Jun 7 2019, 4:58 AM
  • rebase so that the patch does not diverge too much from the trunk
ruiu updated this revision to Diff 203771.Jun 9 2019, 10:20 PM
  • fix tests

Hi,

I added a comment to x86-64 ABI google group link. I also posted it below.

This CET ABI issue was raised during glibc discussion in GCC Cauldron. The conclusion was CET ABI wouldn’t be changed. It’s not because it’s a better option, but because it’s been finalized for 2 years (in 2017). And the implementation has been included in Linux distributions already. There's no obvious reason to change it w/o a pretty better solution.

On the other hand, Intel wants to support it in LLVM lld anyway. But we don’t want to force lld developers to do what they don’t want. So now the ball is at lld side. If lld would like to follow the CET ABI, we’re willing to submit the corresponding lld patches. If not, we will submit the lld patches supporting 32-byte PLTs. The disadvantage of the latter is ABI is incompatible between GCC and LLVM for CET. It may cause potential issues in some tools which has dependence on PLT size and layout. But we don’t have known samples yet. From our estimation, the risk may be low.

I’m looking forward to hearing your opinions and moving it forward.

Thanks,
Annita

MaskRay accepted this revision.EditedOct 16 2019, 2:38 AM

Hi,

I added a comment to x86-64 ABI google group link. I also posted it below.

This CET ABI issue was raised during glibc discussion in GCC Cauldron. The conclusion was CET ABI wouldn’t be changed. It’s not because it’s a better option, but because it’s been finalized for 2 years (in 2017). And the implementation has been included in Linux distributions already. There's no obvious reason to change it w/o a pretty better solution.

On the other hand, Intel wants to support it in LLVM lld anyway. But we don’t want to force lld developers to do what they don’t want. So now the ball is at lld side. If lld would like to follow the CET ABI, we’re willing to submit the corresponding lld patches. If not, we will submit the lld patches supporting 32-byte PLTs. The disadvantage of the latter is ABI is incompatible between GCC and LLVM for CET. It may cause potential issues in some tools which has dependence on PLT size and layout. But we don’t have known samples yet. From our estimation, the risk may be low.

I’m looking forward to hearing your opinions and moving it forward.

Thanks,
Annita

I actually made a similar proposal https://groups.google.com/forum/?hl=en#!topic/x86-64-abi/Z6GqQDy_NaI in April but it did not spur discussions.
Ali Bahrami and Michael Matz were active in generic-abi discussions. I am really glad they commented on https://groups.google.com/forum/?hl=en#!topic/x86-64-abi/sQcX3__r4c0 My summary is that many people think .plt.sec is overengineered. Carlos's stance is fairly obvious from the discussion. It is just too late to fix anything. (Gossip is that it is probably only Carlos himself cares about the auditing feature in glibc... As people have pointed out, very few libc implementations (just glibc and Solaris libc) support auditing so this does not surprise me at all -;))

In the Cauldron, GCC/LLVM Collaboration BoF was a topic. The PLT scheme argument reflects a bigger problem that some collaboration does not work well. I will give an example about the latest binutils 2.33.1 release. I caught the objcopy --set-section-alignment problem in the last minute (https://sourceware.org/ml/binutils/2019-10/msg00008.html). I would regret if the problematic objcopy patch slipped through and GNU developers were reluctant to change it - which would mean llvm-objcopy would not have an ideal solution. Recently in another case, I caught a problem related to .ctf but I guess GNU maintainers will not care my opinion. I'll stop as it may be too off-topic now.

For my own notes, I should follow binutils, libc-alpha, and x86-64-abi discussions more closely.

I accepted this patch. I am still sad about the .plt.sec scheme but am not opposed to this patch. It is up to @ruiu to rebase the patch and commit it.

This comment was removed by annita.zhang.

In the Cauldron, GCC/LLVM Collaboration BoF was a topic. The PLT scheme argument reflects a bigger problem that some collaboration does not work well. I will give an example about the latest binutils 2.33.1 release. I caught the objcopy --set-section-alignment problem in the last minute (https://sourceware.org/ml/binutils/2019-10/msg00008.html). I would regret if the problematic objcopy patch slipped through and GNU developers were reluctant to change it - which would mean llvm-objcopy would not have an ideal solution. Recently in another case, I caught a problem related to .ctf but I guess GNU maintainers will not care my opinion. I'll stop as it may be too off-topic now.

For my own notes, I should follow binutils, libc-alpha, and x86-64-abi discussions more closely.

I accepted this patch. I am still sad about the .plt.sec scheme but am not opposed to this patch. It is up to @ruiu to rebase the patch and commit it.

@MaskRay, I totally agree with you that LLVM and GCC community should work more closely on those ABI standards. For example, we are facing some LLVM ABI noncompliant issues especially in legacy ia32. And they're hard to fix. We need to involve both LLVM and GCC into discussion for future ABI changes. So both can be ABI compliant once it's finalized.

@ruiu, what's your decision then?

ruiu added a comment.Nov 8 2019, 2:43 AM

This patch is outdated and needs rebasing. I'll rebase, upload it again and then submit.

Sounds good. Thank you!

This patch is outdated and needs rebasing. I'll rebase, upload it again and then submit.

Hi ruiu, just remind, if you have no time, I can rebase it.

This patch is outdated and needs rebasing. I'll rebase, upload it again and then submit.

Hi ruiu, just remind, if you have no time, I can rebase it.

I can do that, too, and cleaning the test.

lld/test/ELF/x86-64-cet.s
13

These instructions should be aligned.

llvm-objcopy<9 may have alignment problems.

This patch is outdated and needs rebasing. I'll rebase, upload it again and then submit.

Hi ruiu, just remind, if you have no time, I can rebase it.

I can do that, too, and cleaning the test.

Very good! thank you! In fact, I am a little unfamiliar about LLD code, for I have not do the lld work for a long time.

MaskRay updated this revision to Diff 233253.Dec 10 2019, 8:13 PM

Rebase and improve comments.

ruiu added a comment.Dec 10 2019, 8:39 PM

Hmm, I'm sorry but I'm confused. IIRC I had a discussion in the LLVM dev meeting that we were going to submit a change with a single PLT scheme rather than IPLT, so when I said that I'm going to submit a patch, I meant that I'm going to submit a patch for the 1PLT scheme rather than the 2PLT scheme. But this is for the 2PLT scheme. This is not something I want.

MaskRay added a comment.EditedDec 10 2019, 8:53 PM

@xiangzhangllvm @annita.zhang The patch has been rebased, but it doesn't seem to work.

I made a local patch to make --require-cet behave more like --force-bti:

--- i/lld/ELF/Driver.cpp
+++ w/lld/ELF/Driver.cpp
@@ -1705,2 +1705,4 @@ template <class ELFT> static uint32_t getAndFeatures() {
-    } else if (!features && config->requireCET)
-      error(toString(f) + ": --require-cet: file is not compatible with CET");
+    } else if (config->requireCET && !(features & GNU_PROPERTY_X86_FEATURE_1_IBT)) {
+      warn(toString(f) + ": --require-cet: file is not compatible with CET");
+      features |= GNU_PROPERTY_X86_FEATURE_1_IBT;
+    }

I tried a trivial program with 2 PLT calls.

gcc -fcf-protection=full -c a.c
gcc -fcf-protection=full a.c -o a '-###'   # Retrieve linker command line, replace ld with

My GCC crt files are not CET compatible but I think that is probably irrelevant.

% ld.lld --eh-frame-hdr -m elf_x86_64 "--hash-style=gnu" -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -o a /usr/lib/gcc/x86_64-linux-gnu/8/../../
../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/8/crtbeginS.o -L/usr/lib/gc
c/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux
-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../.. a.o -lgcc --push-state --as-needed -lgcc_s
 --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/8/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/8/../../..
/x86_64-linux-gnu/crtn.o --require-cet -o a                                                                                                          
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/crtbeginS.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS): --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/crtendS.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crtn.o: --require-cet: file is not compatible with CET

OK, it segfaults. So there may be some issues in the PLT.

./a => segmentation fault

Peter Smith implemented --force-bti for AArch64 in D62609.
Its semantic is:

--force-bti : Act as if all relocatable inputs had GNU_PROPERTY_AARCH64_FEATURE_1_BTI and warn for every relocatable object that does not.

Do you think it makes more sense to change --require-cet to --force-cet or --force-ibt?

There is also some glibc code

@peter.smith I find that in binutils-gdb, the AArch64 option is named -z force-bti, not --force-bti (commit [BFD, LD, AArch64, 2/3] Add --force-bti to enable BTI and to select BTI enabled PLTs). Do you intend to make them consistent?

Hmm, I'm sorry but I'm confused. IIRC I had a discussion in the LLVM dev meeting that we were going to submit a change with a single PLT scheme rather than IPLT, so when I said that I'm going to submit a patch, I meant that I'm going to submit a patch for the 1PLT scheme rather than the 2PLT scheme. But this is for the 2PLT scheme. This is not something I want.

What was the decision that was made at the developer meeting? Will lld support the 2PLT scheme defined in the psABI?

MaskRay added a comment.EditedDec 10 2019, 9:16 PM

Hmm, I'm sorry but I'm confused. IIRC I had a discussion in the LLVM dev meeting that we were going to submit a change with a single PLT scheme rather than IPLT, so when I said that I'm going to submit a patch, I meant that I'm going to submit a patch for the 1PLT scheme rather than the 2PLT scheme. But this is for the 2PLT scheme. This is not something I want.

What was the decision that was made at the developer meeting? Will lld support the 2PLT scheme defined in the psABI?

(Personally, I am still unhappy with the .plt.sec scheme. I think I prefer a 1PLT scheme like AArch64's Branch Target Indicators (BTI) and Pointer Authentication Code (PAC).) But I thought @ruiu accepted the fait accompli because there had been very strong objection to diverge from what GNU ld does. https://groups.google.com/forum/#!topic/x86-64-abi/sQcX3__r4c0

objdump -d (built from binutils-gdb HEAD) still does not work with lld produced binaries.

% objdump -d a.bfd  # -z ibt
Disassembly of section .plt:

0000000000001020 <.plt>:
    1020:       ff 35 e2 2f 00 00       pushq  0x2fe2(%rip)        # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026:       f2 ff 25 e3 2f 00 00    bnd jmpq *0x2fe3(%rip)        # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
    102d:       0f 1f 00                nopl   (%rax)
    1030:       f3 0f 1e fa             endbr64 
    1034:       68 00 00 00 00          pushq  $0x0
    1039:       f2 e9 e1 ff ff ff       bnd jmpq 1020 <.plt>
    103f:       90                      nop
    1040:       f3 0f 1e fa             endbr64 
    1044:       68 01 00 00 00          pushq  $0x1
    1049:       f2 e9 d1 ff ff ff       bnd jmpq 1020 <.plt>
    104f:       90                      nop
Disassembly of section .plt.got:

0000000000001050 <__cxa_finalize@plt>:
    1050:       f3 0f 1e fa             endbr64 
    1054:       f2 ff 25 9d 2f 00 00    bnd jmpq *0x2f9d(%rip)        # 3ff8 <__cxa_finalize@GLIBC_2.2.5>
    105b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Disassembly of section .plt.sec:

0000000000001060 <puts@plt>:
    1060:       f3 0f 1e fa             endbr64 
    1064:       f2 ff 25 ad 2f 00 00    bnd jmpq *0x2fad(%rip)        # 4018 <puts@GLIBC_2.2.5>
    106b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000001070 <exit@plt>:
    1070:       f3 0f 1e fa             endbr64 
    1074:       f2 ff 25 a5 2f 00 00    bnd jmpq *0x2fa5(%rip)        # 4020 <exit@GLIBC_2.2.5>
    107b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
% objdump -d a.lld  # --require-cet (actually my patched version, which behaves like AArch64's --force-bti
...
Disassembly of section .plt:

00000000000018c0 <.plt>:
    18c0:       ff 35 0a 22 00 00       pushq  0x220a(%rip)        # 3ad0 <__init_array_end+0x1190>
    18c6:       ff 25 0c 22 00 00       jmpq   *0x220c(%rip)        # 3ad8 <__init_array_end+0x1198>
    18cc:       0f 1f 40 00             nopl   0x0(%rax)
    18d0:       f3 0f 1e fa             endbr64 
    18d4:       68 00 00 00 00          pushq  $0x0
    18d9:       e9 e2 ff ff ff          jmpq   18c0 <_fini+0x14>
    18de:       66 90                   xchg   %ax,%ax
    18e0:       f3 0f 1e fa             endbr64 
    18e4:       68 01 00 00 00          pushq  $0x1
    18e9:       e9 d2 ff ff ff          jmpq   18c0 <_fini+0x14>
    18ee:       66 90                   xchg   %ax,%ax
    18f0:       f3 0f 1e fa             endbr64 
    18f4:       68 02 00 00 00          pushq  $0x2
    18f9:       e9 c2 ff ff ff          jmpq   18c0 <_fini+0x14>
    18fe:       66 90                   xchg   %ax,%ax

Disassembly of section .plt.sec:

0000000000001900 <.plt.sec>:
    1900:       f3 0f 1e fa             endbr64 
    1904:       ff 25 16 22 00 00       jmpq   *0x2216(%rip)        # 3b20 <__cxa_finalize@GLIBC_2.2.5>
    190a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
    1910:       f3 0f 1e fa             endbr64 
    1914:       ff 25 0e 22 00 00       jmpq   *0x220e(%rip)        # 3b28 <puts@GLIBC_2.2.5>
    191a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
    1920:       f3 0f 1e fa             endbr64 
    1924:       ff 25 06 22 00 00       jmpq   *0x2206(%rip)        # 3b30 <exit@GLIBC_2.2.5>
    192a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

On AArch64, the linker produces two dynamic tags DT_AARCH64_BTI_PLT and DT_AARCH64_PAC_PLT which will be read by glibc ld.so. I haven't checked how x86_64 glibc handles this.

I will probably build a top-of-tree glibc and a top-of-tree gcc tomorrow and investigate more about these stuff.

Hmm, I'm sorry but I'm confused. IIRC I had a discussion in the LLVM dev meeting that we were going to submit a change with a single PLT scheme rather than IPLT, so when I said that I'm going to submit a patch, I meant that I'm going to submit a patch for the 1PLT scheme rather than the 2PLT scheme. But this is for the 2PLT scheme. This is not something I want.

What was the decision that was made at the developer meeting? Will lld support the 2PLT scheme defined in the psABI?

I was confused too. @ruiu, maskray, what's the decision for lld support to CET?

xiangzhangllvm added inline comments.Dec 10 2019, 9:46 PM
lld/ELF/Driver.cpp
1704–1709

I don't know why "--forece-bti" use "warn" not "error", in my eyes, people use --require-cet to generate the output file, mostly means, they want to run the output file on a CET-checked machine. It will run fail when linked Non-CET object files to the output file. So here we used "error" instead of "warn".

MaskRay added inline comments.Dec 10 2019, 9:59 PM
lld/ELF/Driver.cpp
1704–1709

GNU ld supports -z cet-report=[none|warning|error] (added in 2019-04) https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=233a00833b984319d5e94db3f5d5d9a735edc984

Examples:

% ld.bfd ... -z cet-report=error              # exit code: 1
./ld-new: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: error: missing SHSTK property
./ld-new: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: error: missing SHSTK property
% ld.bfd ... -z cet-report=warning        # exit code: 0
./ld-new: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: warning: missing SHSTK property
./ld-new: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: warning: missing SHSTK property
> gcc -fcf-protection=full -c a.c
> gcc -fcf-protection=full a.c -o a '-###'   # Retrieve linker command line, replace ld with

My GCC crt files are not CET compatible but I think that is probably irrelevant.

% ld.lld --eh-frame-hdr -m elf_x86_64 "--hash-style=gnu" -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -o a /usr/lib/gcc/x86_64-linux-gnu/8/../../
../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/8/crtbeginS.o -L/usr/lib/gc
c/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux
-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../.. a.o -lgcc --push-state --as-needed -lgcc_s
 --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/8/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/8/../../..
/x86_64-linux-gnu/crtn.o --require-cet -o a                                                                                                          
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/crtbeginS.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS): --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/crtendS.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crtn.o: --require-cet: file is not compatible with CET

OK, it segfaults. So there may be some issues in the PLT.

./a => segmentation fault

Can you share your a.c code?
I test the "hello world" test, it is ok. (At first I use as 2.25, it failed at assembling.)

[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$gcc -fcf-protection=full main.c -S -o main.s
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$cat main.c
#include<stdio.h>
int main() {
 printf("Hello llvm!\n");
  return 0;
}
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$gcc -fcf-protection=full main.c -S -o main.s
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$/rdrive/ref/binutils/2.29/rhel70/efi2/bfd/bin/as --64 main.s -o main.o
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$/rdrive/ref/binutils/2.29/rhel70/efi2/bfd/bin/as -version
GNU assembler (GNU Binutils) 2.29
Copyright (C) 2017 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-linux-gnu'.
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$../build-cet/bin/ld.lld --require-cet -plugin /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../libexec/gcc/x86_64-linux-gnu/8.3.0/liblto_plugin.so "-plugin-opt=/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../libexec/gcc/x86_64-linux-gnu/8.3.0/lto-wrapper" "-plugin-opt=-fresolution=/tmp/ccQKvMnE.res" "-plugin-opt=-pass-through=-lgcc" "-plugin-opt=-pass-through=-lgcc_s" "-plugin-opt=-pass-through=-lc" "-plugin-opt=-pass-through=-lgcc" "-plugin-opt=-pass-through=-lgcc_s" --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /lib/../lib64/crt1.o /lib/../lib64/crti.o /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtbegin.o -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0 -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/../../.. main.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtend.o /lib/../lib64/crtn.o
ld.lld: warning: /lib/../lib64/crt1.o: --require-cet: file is not compatible with CET
ld.lld: warning: /lib/../lib64/crti.o: --require-cet: file is not compatible with CET
ld.lld: warning: /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtbegin.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib64/libc_nonshared.a(elf-init.oS): --require-cet: file is not compatible with CET
ld.lld: warning: /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtend.o: --require-cet: file is not compatible with CET
ld.lld: warning: /lib/../lib64/crtn.o: --require-cet: file is not compatible with CET
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$./a.out
Hello llvm!
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$

Remove the "-plugin xxx", also work.

[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$../build-cet/bin/ld.lld --require-cet  --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /lib/../lib64/crt1.o /lib/../lib64/crti.o /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtbegin.o -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0 -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/../../.. main.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtend.o /lib/../lib64/crtn.o                                                                           ld.lld: warning: /lib/../lib64/crt1.o: --require-cet: file is not compatible with CET
ld.lld: warning: /lib/../lib64/crti.o: --require-cet: file is not compatible with CET
ld.lld: warning: /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtbegin.o: --require-cet: file is not compatible with CET
ld.lld: warning: /usr/lib64/libc_nonshared.a(elf-init.oS): --require-cet: file is not compatible with CET
ld.lld: warning: /nfs/sc/proj/icl/rdrive/ref/gcc/8.3.0/rhel70/efi2/bin/../lib/gcc/x86_64-linux-gnu/8.3.0/crtend.o: --require-cet: file is not compatible with CET
ld.lld: warning: /lib/../lib64/crtn.o: --require-cet: file is not compatible with CET
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$./a.out
Hello llvm!
[xiangzh1@scels75 /export/iusers/xiangzh1/LLVM/ORG/test]$
MaskRay updated this revision to Diff 233263.EditedDec 11 2019, 12:24 AM

Apply two patches

-  uint64_t Plt = In.Plt->getVA();
+  uint64_t Plt = In.IBTPlt ? In.IBTPlt->getVA() : In.Plt->getVA();
--- i/lld/ELF/Driver.cpp
+++ w/lld/ELF/Driver.cpp
@@ -1705,2 +1705,4 @@ template <class ELFT> static uint32_t getAndFeatures() {
-    } else if (!features && config->requireCET)
-      error(toString(f) + ": --require-cet: file is not compatible with CET");
+    } else if (config->requireCET && !(features & GNU_PROPERTY_X86_FEATURE_1_IBT)) {
+      warn(toString(f) + ": --require-cet: file is not compatible with CET");
+      features |= GNU_PROPERTY_X86_FEATURE_1_IBT;
+    }

@xiangzhangllvm The previous version I uploaded definitely did not work. I just applied a fix. Both i386 and x86_64 seem to work now, but there are still problems with the tests.

You need either arc patch D59780 or curl -L 'https://reviews.llvm.org/D59780?download=true' | patch -p1 to apply the latest diff in your local repository.

I have a separate patch to rename lld --force-bti to -z force-bti. I think --force-ibt or -z force-ibt may make more sense than --require-cet.

Hmm, I'm sorry but I'm confused. IIRC I had a discussion in the LLVM dev meeting that we were going to submit a change with a single PLT scheme rather than IPLT, so when I said that I'm going to submit a patch, I meant that I'm going to submit a patch for the 1PLT scheme rather than the 2PLT scheme. But this is for the 2PLT scheme. This is not something I want.

What was the decision that was made at the developer meeting? Will lld support the 2PLT scheme defined in the psABI?

I don't know what was discussed at the developer meeting, but I don't see how not following the established ABI is an option. It seems clear that GNU implementations will not change the ABI they've been using for 2+ years. I don't see the value in having lld create a competing ABI.

Apply two patches

-  uint64_t Plt = In.Plt->getVA();
+  uint64_t Plt = In.IBTPlt ? In.IBTPlt->getVA() : In.Plt->getVA();
--- i/lld/ELF/Driver.cpp
+++ w/lld/ELF/Driver.cpp
@@ -1705,2 +1705,4 @@ template <class ELFT> static uint32_t getAndFeatures() {
-    } else if (!features && config->requireCET)
-      error(toString(f) + ": --require-cet: file is not compatible with CET");
+    } else if (config->requireCET && !(features & GNU_PROPERTY_X86_FEATURE_1_IBT)) {
+      warn(toString(f) + ": --require-cet: file is not compatible with CET");
+      features |= GNU_PROPERTY_X86_FEATURE_1_IBT;
+    }

@xiangzhangllvm The previous version I uploaded definitely did not work. I just applied a fix. Both i386 and x86_64 seem to work now, but there are still problems with the tests.

You need either arc patch D59780 or curl -L 'https://reviews.llvm.org/D59780?download=true' | patch -p1 to apply the latest diff in your local repository.

I have a separate patch to rename lld --force-bti to -z force-bti. I think --force-ibt or -z force-ibt may make more sense than --require-cet.

Thank you very much!
I checked the "X86-64 fail tests", the reason is just that you changed "error" to "warn".
--force-ibt maybe use, (but to change --force-ibt to --force-cet, because 'cet' contains 'ibt' and 'shstk')
--require-cet is also needed, (strictly check the input files, make sure run ok in CET machines).
And I think it better to add --try-cet too, try to generate CET output if all inputs is CETed, this must be useful in bootstrap compiling program. (To let the whole program CETed step by step).

Any way, we can add the other "refined options" late, this time, just let '-force-cet' in.

Apply two patches

-  uint64_t Plt = In.Plt->getVA();
+  uint64_t Plt = In.IBTPlt ? In.IBTPlt->getVA() : In.Plt->getVA();
--- i/lld/ELF/Driver.cpp
+++ w/lld/ELF/Driver.cpp
@@ -1705,2 +1705,4 @@ template <class ELFT> static uint32_t getAndFeatures() {
-    } else if (!features && config->requireCET)
-      error(toString(f) + ": --require-cet: file is not compatible with CET");
+    } else if (config->requireCET && !(features & GNU_PROPERTY_X86_FEATURE_1_IBT)) {
+      warn(toString(f) + ": --require-cet: file is not compatible with CET");
+      features |= GNU_PROPERTY_X86_FEATURE_1_IBT;
+    }

@xiangzhangllvm The previous version I uploaded definitely did not work. I just applied a fix. Both i386 and x86_64 seem to work now, but there are still problems with the tests.

You need either arc patch D59780 or curl -L 'https://reviews.llvm.org/D59780?download=true' | patch -p1 to apply the latest diff in your local repository.

I have a separate patch to rename lld --force-bti to -z force-bti. I think --force-ibt or -z force-ibt may make more sense than --require-cet.

Thank you very much!
I checked the "X86-64 fail tests", the reason is just that you changed "error" to "warn".
--force-ibt maybe use, (but to change --force-ibt to --force-cet, because 'cet' contains 'ibt' and 'shstk')
--require-cet is also needed, (strictly check the input files, make sure run ok in CET machines).
And I think it better to add --try-cet too, try to generate CET output if all inputs is CETed, this must be useful in bootstrap compiling program. (To let the whole program CETed step by step).

Any way, we can add the other "refined options" late, this time, just let '-force-cet' in.

x86 IBT is similar to AArch64 BTI. Whether we should upgrade from -z force-ibt to -z force-cet depends on whether shstk requires support in the object files for correctness.. Here is my understanding.

  • GNU_PROPERTY_X86_FEATURE_1_IBT represents the capability of IBT in an object file. The output is IBT capable only if all object files have the bit. GNU_PROPERTY_AARCH64_FEATURE_1_BTI belongs to this category. So naming the relevant option -z force-ibt makes sense.
  • GNU_PROPERTY_X86_FEATURE_1_SHSTK represents that an object file desires SHSTK. It does not require support in the object files for correctness. GNU_PROPERTY_AARCH64_FEATURE_1_PAC_PLT belongs to this category.

I would name the two options -z force-ibt and -z shstk. I can imagine that shstk may have non-negligible overhead, and probably not everyone wants to enable it. So having separate options makes sense.

(GNU ld supports -z ibt and -z cet-report=[none|warning|error]. If shstk, as I think, does not require object files' support, I will consider it not so ideal. The name -z ibt does not make it clear that this forces ibt when some object files are not IBT capable. -z cet-report mixes intentions of two features.)

If @ruiu still accepts the patch in the first place, I may go with -z force-ibt and -z shstk.

MaskRay added a comment.EditedDec 11 2019, 7:45 PM

Hmm, I'm sorry but I'm confused. IIRC I had a discussion in the LLVM dev meeting that we were going to submit a change with a single PLT scheme rather than IPLT, so when I said that I'm going to submit a patch, I meant that I'm going to submit a patch for the 1PLT scheme rather than the 2PLT scheme. But this is for the 2PLT scheme. This is not something I want.

What was the decision that was made at the developer meeting? Will lld support the 2PLT scheme defined in the psABI?

I don't know what was discussed at the developer meeting, but I don't see how not following the established ABI is an option. It seems clear that GNU implementations will not change the ABI they've been using for 2+ years. I don't see the value in having lld create a competing ABI.

The PLT scheme in this patch is already different. See that GNU ld -z ibt produces a MPX bnd prefix but this patch doesn't. Also check out my objdump (built from HEAD) example above. PLT schemes have already diverged. I am still not very convinced doing things differently is breaking ABI. If pursing for 1PLT scheme breaks ABI, then this patch breaks ABI, too. In the long term, downstream tools should be taught not to rely on the exact forms of PLT entries. They should make assumption that the first instructions takes X bytes, the second takes Y, etc. They should use better heuristics and the psABI should mention this so that we won't get locked into the any specific PLT scheme.

For MPX prefix:
GCC have not supported the MPX from GCC 9. And intel will not support MPX code too. So we don’t consider MPX for CET in LLD.

For MPX prefix:
GCC have not supported the MPX from GCC 9. And intel will not support MPX code too. So we don’t consider MPX for CET in LLD.

I know that GCC has removed MPX and the Linux kernel is removing MPX (user-visible APIs and self-tests have been removed). I asked because I haven't seen a change on binutils-gdb side that will support a .plt.sec scheme without the BND prefix. So, I wonder what kind of changes are considered divergence from x86-64 psABI. After the removal of the BND prefix, the .plt entry will get the leeway of 2 bytes. If, say, in the future, a new security enhanced feature is proposed which requires a new instruction that will take more than 2 bytes, the 16-byte .plt entry no longer works, and toolchains will have to migrate a third PLT scheme, different from traditional PLT and the .plt.sec scheme.

As to the option name question, are you happy with -z force-ibt and -z shstk? (My understanding is that they should be very similar to -z force-bti and -z pac-plt, respectively.)

For MPX prefix:
GCC have not supported the MPX from GCC 9. And intel will not support MPX code too. So we don’t consider MPX for CET in LLD.

I know that GCC has removed MPX and the Linux kernel is removing MPX (user-visible APIs and self-tests have been removed). I asked because I haven't seen a change on binutils-gdb side that will support a .plt.sec scheme without the BND prefix. So, I wonder what kind of changes are considered divergence from x86-64 psABI. After the removal of the BND prefix, the .plt entry will get the leeway of 2 bytes. If, say, in the future, a new security enhanced feature is proposed which requires a new instruction that will take more than 2 bytes, the 16-byte .plt entry no longer works, and toolchains will have to migrate a third PLT scheme, different from traditional PLT and the .plt.sec scheme.

As to the option name question, are you happy with -z force-ibt and -z shstk? (My understanding is that they should be very similar to -z force-bti and -z pac-plt, respectively.)

These options are OK for us, thank you again!
By the way, as I know, gnu ld have no CET options, (excepted -z cet-report=xxx you point out before). it just try to add CET flags if all input file CETed.
For MPX prefix, in my eyes, it seems the new Binutils just forget to remove it. Try ld --help you can see it list "-z nobndplt Generate a regular PLT (default)"

MaskRay updated this revision to Diff 233731.Dec 12 2019, 8:38 PM
MaskRay edited the summary of this revision. (Show Details)

Delete --force-cet. Add -z force-ibt and -z shstk.

For MPX prefix:
GCC have not supported the MPX from GCC 9. And intel will not support MPX code too. So we don’t consider MPX for CET in LLD.

I know that GCC has removed MPX and the Linux kernel is removing MPX (user-visible APIs and self-tests have been removed). I asked because I haven't seen a change on binutils-gdb side that will support a .plt.sec scheme without the BND prefix. So, I wonder what kind of changes are considered divergence from x86-64 psABI. After the removal of the BND prefix, the .plt entry will get the leeway of 2 bytes. If, say, in the future, a new security enhanced feature is proposed which requires a new instruction that will take more than 2 bytes, the 16-byte .plt entry no longer works, and toolchains will have to migrate a third PLT scheme, different from traditional PLT and the .plt.sec scheme.

As to the option name question, are you happy with -z force-ibt and -z shstk? (My understanding is that they should be very similar to -z force-bti and -z pac-plt, respectively.)

These options are OK for us, thank you again!
By the way, as I know, gnu ld have no CET options, (excepted -z cet-report=xxx you point out before). it just try to add CET flags if all input file CETed.
For MPX prefix, in my eyes, it seems the new Binutils just forget to remove it. Try ld --help you can see it list "-z nobndplt Generate a regular PLT (default)"

-z nobndplt is not implemented in GNU ld

% ld/ld-new -z nobndplt -v
./ld-new: warning: -z nobndplt ignored
GNU ld (GNU Binutils) 2.33.50.20191213

gold supports -z nobndplt but it does not implement IBT.

% gold/ld-new --help | grep bndplt
  -z bndplt                   (x86-64 only) Generate a BND PLT for Intel MPX
  -z nobndplt                 Generate a regular PLT (default)

I have uploaded a diff that implements -z force-ibt and -z shstk, but ultimately it is the code owner (@ruiu)'s decision whether the change should be included.

@ruiu,
Can we submit the patch?

PIng.
@ruiu LLVM 10.0.0 will be branched on Jan. 15. I would expect that the LLD CET support can be included in this release. If you have concern with current implementation, let's have an offline discussion and move it forward quickly.

@ruiu If you are ok with this, I'll rebase, fix the PLT declarations (since I updated them in previous refactorings) and commit.

ruiu added a comment.Jan 9 2020, 6:44 AM

@MaskRay Please rebase and upload it with me and other people as reviewers, then I'll LGTM.

MaskRay commandeered this revision.Jan 9 2020, 10:27 AM
MaskRay edited reviewers, added: ruiu; removed: MaskRay.
This revision now requires review to proceed.Jan 9 2020, 10:27 AM
This comment was removed by MaskRay.
MaskRay updated this revision to Diff 237157.EditedJan 9 2020, 12:19 PM

@ruiu Done.

Rebase on top of D72474

iplt works. Tested several combinations

clang -m32 -static -fcf-protection=full a.c -fuse-ld=lld -Wl,-z,force-ibt -fno-PIC -o a
clang -m64 -static -fcf-protection=full a.c -fuse-ld=lld -Wl,-z,force-ibt -fno-PIC -o a
clang -m32 -static -fcf-protection=full a.c -fuse-ld=lld -Wl,-z,force-ibt -fPIC -o a
clang -m64 -static -fcf-protection=full a.c -fuse-ld=lld -Wl,-z,force-ibt -fPIC -o a
clang -m32 -fcf-protection=full a.c -fuse-ld=lld -Wl,-z,force-ibt -fPIC -o a
clang -m64 -fcf-protection=full a.c -fuse-ld=lld -Wl,-z,force-ibt -fPIC -o a

Unit tests: fail. 61724 tests passed, 1 failed and 779 were skipped.

failed: lld.ELF/x86-64-feature-cet.s

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

MaskRay updated this revision to Diff 237175.Jan 9 2020, 1:01 PM

Fix manpage

Unit tests: fail. 61725 tests passed, 1 failed and 779 were skipped.

failed: lld.ELF/x86-64-feature-cet.s

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

xiangzhangllvm added inline comments.Jan 9 2020, 6:38 PM
lld/test/ELF/x86-64-feature-cet.s
41

lit test failed here, Here we mainly check is the first plt address is written correct in .got.plt
and we can really see the data (50132000)
201350: f3 0f 1e fa endbr64 // First PLT

I see here (line 87) add indirect function check, do we need to deal with this special function in this patch?
or deal with it in another independent patch?

MaskRay marked an inline comment as done.
MaskRay added inline comments.
lld/test/ELF/x86-64-feature-cet.s
41

On RELA targets, .got.plt slots for IFUNC are not used. They can be kept zero. See D72474

Unfortunately pre-merge-bot does not recognize patch dependencies, so x86-64-feature-cet.s will fail.

xiangzhangllvm added inline comments.Jan 10 2020, 3:11 AM
lld/test/ELF/x86-64-feature-cet.s
41

So can we remove the GOTPLT check here let it 'skip' the bot check? because LLVM 10.0.0 will release on Jan 15, 2020. We can add back the GOTPLT check after the dependent patch merged.

MaskRay marked 2 inline comments as done.Jan 10 2020, 10:37 AM
MaskRay added inline comments.
lld/test/ELF/x86-64-feature-cet.s
41

You did not need to worry about the test...

Anyway I've committed the .got.plt change. This patch should pass all tests now.

@ruiu Good to merge?

MaskRay updated this revision to Diff 237391.Jan 10 2020, 10:38 AM
MaskRay marked an inline comment as done.

Rebase after .got.plt change (D72474).

merge_guards_bot should be happy now.

Unit tests: pass. 61763 tests passed, 0 failed and 780 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

ping @ruiu Hi, Do you have time to review? Thanks!

@MaskRay Please rebase and upload it with me and other people as reviewers, then I'll LGTM.

@MaskRay as @ruiu said above, I think you may assume that the patch is LGTH after it's rebased, if he has no other comments today. Given that, can you merge the patch in before LLVM branching?

This revision was not accepted when it landed; it landed in state Needs Review.Jan 13 2020, 11:48 PM
This revision was automatically updated to reflect the committed changes.

Thank you again!!