Page MenuHomePhabricator

Support X86 Control-flow Enforcement Technology (CET) in LLD
Needs ReviewPublic

Authored by xiangzhangllvm on Feb 11 2019, 10:31 PM.

Details

Summary
Control-flow Enforcement Technology (CET)

provides the following capabilities to defend against ROP/JOP style control-flow subversion attacks:

Shadow Stack (SHSTK) – return address protection to defend against Return Oriented Programming,

A shadow stack is a second stack for the program that is used exclusively for control transfer operations.
This stack is separate from the data stack and can be enabled for operation individually in user mode or supervisor mode.
When shadow stacks are enabled, the CALL instruction pushes the return address on both the data and shadow stack.
The RET instruction pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor signals a control protection exception (#CP).

Indirect branch tracking (IBT) – free branch protection to defend against Jump/Call Oriented Programming.

The ENDBRANCH is a new instruction that is used to mark valid indirect call/jump targets in the program.
This instruction opcode is selected to be one that is a NOP on legacy machines such that programs compiled with ENDBRANCH new instruction continue to function on old machines without the CET enforcement.
On processors that support CET the ENDBRANCH is still a NOP and is primarily used as a marker instruction by the in-order part of the processor pipeline to detect control flow violations.
The CPU implements a state machine that tracks indirect jump and call instructions. When one of these instructions is seen, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state.
In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH.
If an ENDBRANCH is not seen the processor causes a control protection fault else the state machine moves back to IDLE state.

Work flow of CET:

A CET supported executable file will contain the CET flags in its .note.gnu.property section.
(The flags of SHSTK or IBT will be set at the GNU_PROPERTY_X86_FEATURE_1_AND property in .note.gnu.property section if the CET is enabled.)
When run the executable file, CET-enabled OS will check these flags in the related segment, and run the feature accordingly on supported hardware.
If it run on unsupported machines, These CET flags or ENDBRANCH will be ignore.

The key implement of the feature in LLD:

LLD will first check if all the relocatable object files contain the .note.gnu.property section and if there is GNU_PROPERTY_X86_FEATURE_1_AND property in this section.
If it finds the GNU_PROPERTY_X86_FEATURE_1_AND property in all the relocatable object files,
it will handle the flags in these properties and sections, create the resulting .note.gnu.property section in output file if necessary.
Because the IBT feature needs to insert an endbr* instruction at the end of indirect jump, LLD also deals with the lazy binding by adding a second PLT called SPLT.
Recording to https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-draft.pdf 13.2 DynamicLinking

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

From the linker's perspective, .splt *is* PLT, which inter-module calls jump to. If .splt is enabled, .plt is no longer a PLT in the regular sense, but it just contains code to lazy-resolve symbols. So, I think you should handle .splt as the PLT despite its name throughout this patch if .splt is enabled. For example, In.Plt should be > the .splt and Sym.getPltVA() should return its address in .splt.
(I feel that the spec made a mistake when naming .splt -- it should've been .plt and the current "first" PLT should be .splt, but it's probably too late to complain.)

Hi , @ruiu:
I was refining the patch as your second suggestion, I got some troubles in this work. I found it hard to "reverse" the roles of the 2 PLTs in the source code.

  1. We can not simply use Sym.getPltVA() instead of Sym.getSPltVA(), because the first PLT entry is also needed somewhere, for example:

void X86_64<ELFT>::writeGotPlt(uint8_t *Buf, const Symbol &S) const {

if (Config->SPltSectionEnable)
  // If the PLT is IBT-enabled, the entries in .got.plt also initially point to
  // the entries in the first PLT not the second PLT.
  write64le(Buf, S.getPltVA());
else
  // Otherwise the entries in .got.plt initially point back to the
  // corresponding PLT entries with a fixed offset to skip the first
  // instruction.
  write64le(Buf, S.getPltVA() + 6);

}

  1. If I "reverse" the definition of the 2 PLTs fully, how about the Non-CET case, use SPLT class to definite the plts? And I worry that it may affect the other targets' code.

I am also a little afraid that will take confusions of understanding the code when people refer the ABI files.

Could you please give me suggestions about these if you have some good ideals?
And do you still concern the options way?

@xiangzhangllvm, a question for you not directly related to the patch but I suspect you might be best suited to find an answer: how do we test CET support today? We started trying to add CET to FreeBSD some time ago but have no way to test any work that we do.

We are using a special CET SDV for CET enabling. We have been enabling CET in Fedora piece by piece by adding -fcf-protection to CFLAGS.

I think https://bugs.llvm.org/show_bug.cgi?id=41000 "PT_NOTE segment not packed correctly" has some relevance to this review. In particular the .note.gnu.property sections on 64-bit machines have 8-byte alignment, merging these with 4-byte aligned note sections such as build-id can cause problems for consumers of PT_NOTE (see also https://sourceware.org/ml/libc-alpha/2018-08/msg00340.html for a discussion on the glibc mailing list). My understanding is that the eventual solution was to collate 4 and 8 byte SHT_NOTE sections separately and output a PT_NOTE section for each. I think that we should follow that and have tests for it.

I think https://bugs.llvm.org/show_bug.cgi?id=41000 "PT_NOTE segment not packed correctly" has some relevance to this review. In particular the .note.gnu.property sections on 64-bit machines have 8-byte alignment, merging these with 4-byte aligned note sections such as build-id can cause problems for consumers of PT_NOTE (see also https://sourceware.org/ml/libc-alpha/2018-08/msg00340.html for a discussion on the glibc mailing list). My understanding is that the eventual solution was to collate 4 and 8 byte SHT_NOTE sections separately and output a PT_NOTE section for each. I think that we should follow that and have tests for it.

Good discovery!
I think we need change the .note.gnu.property section alignment to 4-byte in Clang-TableGen, because all its element is 4-byte required now.
The old ABI required some of note.gnu.property' elements be 8-byte aligned in 64-bit machine, so the section required 8-byte alignment.
Last year, I found a bug about this alignment between GCC and CLANG, Then I discussed with the ABI designer and changed its element's alignment in https://reviews.llvm.org/D56080 .
I think there are already no ‘significance’ of 8 byte alignment for this section, we can change it now in LLVM.

MaskRay added a comment.EditedMar 8 2019, 6:17 PM

I think https://bugs.llvm.org/show_bug.cgi?id=41000 "PT_NOTE segment not packed correctly" has some relevance to this review. In particular the .note.gnu.property sections on 64-bit machines have 8-byte alignment, merging these with 4-byte aligned note sections such as build-id can cause problems for consumers of PT_NOTE (see also https://sourceware.org/ml/libc-alpha/2018-08/msg00340.html for a discussion on the glibc mailing list). My understanding is that the eventual solution was to collate 4 and 8 byte SHT_NOTE sections separately and output a PT_NOTE section for each. I think that we should follow that and have tests for it.

Good discovery!
I think we need change the .note.gnu.property section alignment to 4-byte in Clang-TableGen, because all its element is 4-byte required now.
The old ABI required some of note.gnu.property' elements be 8-byte aligned in 64-bit machine, so the section required 8-byte alignment.
Last year, I found a bug about this alignment between GCC and CLANG, Then I discussed with the ABI designer and changed its element's alignment in https://reviews.llvm.org/D56080 .
I think there are already no ‘significance’ of 8 byte alignment for this section, we can change it now in LLVM.

@peter.smith @xiangzhangllvm We can move the PT_NOTE alignment discussion to https://reviews.llvm.org/D59120 .

Other than the Auto vs Manual (for .note.gnu.property merging) debate, the second PLT section .splt may be more controversial. Hopefully we get consensus on .note.gnu.property soon and focus on that.

By the way: Do we want to rebuild the Linux with LLVM one day?

@xiangzhangllvm I think the answer is yes, at least some organizations are interested at that. https://github.com/ClangBuiltLinux/linux/issues I believe clang can build aarch64 and x86_64 Linux now. Linking the kernel with lld is probably done on aarch64, and linking it with x86_64 and powerpc64 has made much progress (the issue land is mostly the linker scripts).

Yes, MaskRay, let's get consensus on Auto vs Manual soon,
If you and @ruiu do not buy the reasones of Auto way, I tend to switch to the Manual way right now, after all, it is easy to 'refine' in future.

Hi friends:
I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.
I did these because I want to quickly finish the options discussion, I think it is easy to refine in the future.
Thank you very much!

ruiu added a comment.Mar 11 2019, 10:39 AM

Hi friends:
I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.
I did these because I want to quickly finish the options discussion, I think it is easy to refine in the future.

We actually need to be careful when adding new command line options. It is after all the only thing our users will see, and a flaw in a design would cause a real pain. In addition to that, a new feature with a new command line option is not something we can refine easily in the future. It is actually opposite: once we add an option, it is extremely difficult to change or remove it. Therefore, it is worth spending a little time to get everything right from beginning.

ruiu added a comment.Mar 11 2019, 11:11 AM

--auto-cet and --force-cet is perhaps fine. That should ensure users are aware of what they are doing (so the feature is not enabled too automatically), and it also provides a way to pin it so that it won't be accidentally turned off.

Was there any reason to choose that option names? H. J. Lu mentioned that GNU linkers supports several -z options for CET. If there's no strong reason to invent new option names, we should use the same command line options as GNU.

I think there are still several issues with this patch. I pointed out some of them before, but to summarize, let me iterate:

  • Your patch collects X86Feature1 bits from input files to copy that to a .note section in the output file. I don't think you need to do that. You only need to make sure that the input files have a desired bit in X86Feature1 bits, and generate a .note section containing GNU_PROPERTY_X86_FEATURE_1_IBT bit. In other words, please remove X86Feature1AND from Config.
  • You add fair amount of new code to Driver.cpp, but which feels wrong. It should not be part of a driver. But I think the real problem is that the code to read .note section does just too much.
  • Do you really have to read a .note section containing multiple descriptions? I think that compiler-created .note section contains only one description per a section, so you don't need to care abou it.
xiangzhangllvm added a comment.EditedMar 11 2019, 6:32 PM

--auto-cet and --force-cet is perhaps fine. That should ensure users are aware of what they are doing (so the feature is not enabled too automatically), and it also provides a way to pin it so that it won't be accidentally turned off.

Was there any reason to choose that option names? H. J. Lu mentioned that GNU linkers supports several -z options for CET. If there's no strong reason to invent new option names, we should use the same command line options as GNU.

Gold auto enabled CET in default, so it have no options corresponding to the --auto-cet and --force-cet.
The '-z' options in gold is just for testing, For Simplify,It better to add them in a dependent patch.

I think there are still several issues with this patch. I pointed out some of them before, but to summarize, let me iterate:

  • Your patch collects X86Feature1 bits from input files to copy that to a .note section in the output file. I don't think you need to do that. You only need to make sure that the input files have a desired bit in X86Feature1 bits, and generate a .note section containing GNU_PROPERTY_X86_FEATURE_1_IBT bit. In other words, please remove X86Feature1AND from Config.

Yes, ruiu, last time I also ask you about this point. please let me summarize your ideas here, then please check if I understand right?

  1. If not all input files have X86Feature1 bits, then discard the .note.gnu.property sections.
  2. If all input files have X86Feature1 bits: then we need to 'calculate' the resulting X86Feature1 bits from all the X86Feature1 bits ? (yes or no ?) then delete all the input files' .note.gnu.property sections ? (yes or no ?) then generate a .note.gnu.property section with the resulting X86Feature1 bits in input file level not in output file by Writer, so we need not to pass the X86Feature1 bits to Writer ? (yes or no ?)
  • You add fair amount of new code to Driver.cpp, but which feels wrong. It should not be part of a driver. But I think the real problem is that the code to read .note section does just too much.

Hi, ruiu, there are not very much code now, in fact, half of them are code annotation.

  • Do you really have to read a .note section containing multiple descriptions? I think that compiler-created .note section contains only one description per a section, so you don't need to care about it.

Yes, the compiler-created .note section contains only one description per a section, But we should consider the handwriting assemble files.

Thank you for your careful review!

Yes, ruiu, last time I also ask you about this point. please let me summarize your ideas here, then please check if I understand right?

If not all input files have X86Feature1 bits, then discard the .note.gnu.property sections.
If all input files have X86Feature1 bits:

then we need to 'calculate' the resulting X86Feature1 bits from all the X86Feature1 bits ?(yes or no ?)
then delete all the input files' .note.gnu.property sections ?(yes or no ?)
then generate a .note.gnu.property section with the resulting X86Feature1 bits in input file level not in output file by Writer, so we need not to pass the X86Feature1 bits to Writer ?(yes or no ?)
MaskRay added a comment.EditedMar 11 2019, 7:11 PM

Gold auto enabled CET in default

What kind of CET support does gold have now? I think it can merge .note.gnu.property now but I don't find code related to second PLT or ibtplt. (I have a question about ibtplt, see below).

I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.

I find that ld.bfd has three related options -z ibt -z ibtplt -z shstk (are there others?). What do they do?

-z ibtplt                   Generate IBT-enabled PLT entries\n"));
fprintf (file, _("\
-z ibt                      Generate GNU_PROPERTY_X86_FEATURE_1_IBT\n"));
fprintf (file, _("\
-z shstk                    Generate GNU_PROPERTY_X86_FEATURE_1_SHSTK\n"));

Is -z ibtplt similar to the second PLT? Does -z ibt generate GNU_PROPERTY_X86_FEATURE_1_IBT even if there is an input object file with the GNU_PROPERTY_X86_FEATURE_1_IBT bit cleared? Or when it is specified while -z shstk is not, merge just ibt but not shstk? I'm puzzled as you said the note merging behavior is automatic in ld.bfd.

--auto-cet and --force-cet

We can also make one option --cet={none,auto,force}, but I'd like to understand first what -z ibt -z shstk -z ibtplt do...

ELF/Driver.cpp
75

This macros is unnecessary. You can use alignTo. See ELF/LinkerScript.cpp for examples.

327

Delete "IBT can" " instead of retpolineplt".

They are two different aspects of security and are actually unrelated. I think the incompatibility is not due to technical impossibility, but we expect ( Retpoline-A-Branch-Target-Injection-Mitigation.pdf Section 4.3 claims) in the future retpolineplt can retire because of better hardware-level protection on CET-enabled processors.

MaskRay added inline comments.Mar 11 2019, 7:27 PM
test/ELF/i386-feature-1-and.s
24

These llvm-readelf -S %t RUN commands can share the same check-prefix. They just check if .note.gnu.property is present. (Actually I'm thinking if they can be deleted if the related llvm-readelf -n is not empty)

40

Typo. The comment may be less useful. A brief comment above the llvm-mc ld.lld commands explaining what the test is intended for may be more helpful to readers.

test/ELF/x86-64-feature-1-and.s
40

Typo. protery -> property

xiangzhangllvm added a comment.EditedMar 11 2019, 8:01 PM

Gold auto enabled CET in default

What kind of CET support does gold have now? I think it can merge .note.gnu.property now but I don't find code related to second PLT or ibtplt. (I have a question about ibtplt, see below).

I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.

I find that ld.bfd has three related options -z ibt -z ibtplt -z shstk (are there others?). What do they do?

-z ibtplt                   Generate IBT-enabled PLT entries\n"));
fprintf (file, _("\
-z ibt                      Generate GNU_PROPERTY_X86_FEATURE_1_IBT\n"));
fprintf (file, _("\
-z shstk                    Generate GNU_PROPERTY_X86_FEATURE_1_SHSTK\n"));

Is -z ibtplt similar to the second PLT? Does -z ibt generate GNU_PROPERTY_X86_FEATURE_1_IBT even if there is an input object file with the GNU_PROPERTY_X86_FEATURE_1_IBT bit cleared? Or when it is specified while -z shstk is not, merge just ibt but not shstk? I'm puzzled as you said the note merging behavior is automatic in ld.bfd.

--auto-cet and --force-cet

We can also make one option --cet={none,auto,force}, but I'd like to understand first what -z ibt -z shstk -z ibtplt do...

Hi MaskRay,
1 please not focus on these -z options now, H.J add them just for testing, for example, if you compiled file without CET, but finally, you want to see "there will be errors" in runtime if you force add CET flags in the output file.

2 I have sought used --xxx={none,auto,force}, at last I choose using --xxx-xxx, because I think we will mostly use --auto-cet, rarely use others

This macros is unnecessary. You can use alignTo. See ELF/LinkerScript.cpp for examples.

Delete "IBT can"  " instead of retpolineplt".

OK, I'll change it, thank you very much!

xiangzhangllvm updated this revision to Diff 190206.EditedMar 11 2019, 10:22 PM
--- a/ELF/Driver.cpp
+++ b/ELF/Driver.cpp
@@ -72,8 +72,6 @@ LinkerDriver *elf::Driver;

 static void setConfigs(opt::InputArgList &Args);

-#define ALIGN_UP(size, alignment) ((size) + (alignment) - 1)&(-(alignment))

 bool elf::link(ArrayRef<const char *> Args, bool CanExitEarly,
                raw_ostream &Error) {
   errorHandler().LogName = args::getFilenameWithoutExe(Args[0]);
@@ -323,8 +321,7 @@ static void checkOptions() {

   if (Config->ZRetpolineplt
       && (Config->AutoCET || Config->ForceCET))
-    error("--auto-cet/--force-cet may not be used with -z retpolineplt, IBT can"
-          " instead of retpolineplt");
+    error("--auto-cet/--force-cet may not be used with -z retpolineplt");
 }

 static const char *getReproduceOption(opt::InputArgList &Args) {
@@ -1600,7 +1597,7 @@ static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
         do {
           unsigned Size = Data[CurSecOffset + DescOffset + pr_dataszInDescIndex];
           Size += CurSecOffset * 4 + DescOffset*4 + DescFixedSize;
-          Size = ALIGN_UP (Size, S->Alignment);
+          Size = alignTo (Size, S->Alignment);
           NextDescOffset = Size / 4 - CurSecOffset;  //next DescOffset
           Next_pr_typeIndex = CurSecOffset + NextDescOffset + pr_typeInDescIndex;

@@ -1624,7 +1621,7 @@ static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
     CurSize += HeadSize + Size;
     // Section and pr_data both have the align requestment: 8 in 64bit cpu
     // and 4 in 32bit cpu.
-    CurSize = ALIGN_UP (CurSize, S->Alignment);
+    CurSize = alignTo (CurSize, S->Alignment);
     CurSecOffset = CurSize / 4;
   }
   return false;
test/ELF/i386-feature-1-and.s
24

It seems make sense!

40

Yes, I ll recommands it, in fact, these test is very simple for CET test.
They just check the flags is ok or not.
The most testing work we need to test in a CET special machine.

test/ELF/x86-64-feature-1-and.s
40

Sorry! ^_^

As an earlier comment mentioned, there is quite a lot of code that depends on the details of the .note.gnu.property section added very early to Driver.c which is not where we'd normally expect to see it. It also looks like you using quite a lot of code to get an InputSection to represent the output .note.gnu.property section. Given that there will be multiple Input .note.gnu.property sections but only one Output .note.gnu.property section, I think it would be cleaner to use a SyntheticSection to represent the .note.gnu.property section. SyntheticSections can implement all sorts of custom functionality that could move the parsing and merging code out of Driver.cpp. My suggestion is to:

  • Create a new SyntheticSection (see SyntheticSections.h), something like NotePropertySyntheticSection. This will represent our single output .note.gnu.property section. It can store the status of the FEATUREAND bits.
  • Add an entry to InStruct for a single instance of it that we will use for the output.
  • After InStruct instance is created in Writer.c but before the relocations are scanned to create PLT entries iterate through InputSections and pass each .note.gnu.property to the NotePropertySyntheticSection. Then erase these InputSections from the InputSections. The combineEhFrameSections<ELFT> function in Writer.cpp is one way of doing something like that.
  • When you add the .note.gnu.property InputSection to the NotePropertySyntheticSection you can update the FEATUREAND bits
  • The writeTo() member function of the NotePropertySyntheticSection can output the data needed to make up the output.

I think that this should allow you to integrate the .note.gnu.property in a similar way to other "merge" features such as EHFrame, and SHF_MERGE sections.

The ld.bfd AArch64 BTI patches have just gone up on https://www.sourceware.org/ml/binutils/2019-03/msg00021.html so I'll hopefully be starting work on another feature that uses .note.gnu.property soon.

ruiu added a comment.Mar 12 2019, 10:59 AM

--auto-cet and --force-cet is perhaps fine. That should ensure users are aware of what they are doing (so the feature is not enabled too automatically), and it also provides a way to pin it so that it won't be accidentally turned off.

Was there any reason to choose that option names? H. J. Lu mentioned that GNU linkers supports several -z options for CET. If there's no strong reason to invent new option names, we should use the same command line options as GNU.

Gold auto enabled CET in default, so it have no options corresponding to the --auto-cet and --force-cet.
The '-z' options in gold is just for testing, For Simplify,It better to add them in a dependent patch.

Please keep in mind that "let's think about that later" is not a correct approach for command line option names. That's a public interface that cannot be changed once released. I think that is also true to gold. Even if the intention to add the -z options to gold was for testing, if the options have been added in a user-visible way, it is very unlikely that they'd be removed. So you need to think carefully as to what is the most consistent set of option names both for gold and lld once you implement all the features you want. That the situation that lld has --{auto,force}-cet and gold has -z options for the same functionality seems pretty inconsistent. Probably the importance of option names should be taken more seriously.

I think there are still several issues with this patch. I pointed out some of them before, but to summarize, let me iterate:

  • Your patch collects X86Feature1 bits from input files to copy that to a .note section in the output file. I don't think you need to do that. You only need to make sure that the input files have a desired bit in X86Feature1 bits, and generate a .note section containing GNU_PROPERTY_X86_FEATURE_1_IBT bit. In other words, please remove X86Feature1AND from Config.

Yes, ruiu, last time I also ask you about this point. please let me summarize your ideas here, then please check if I understand right?

  1. If not all input files have X86Feature1 bits, then discard the .note.gnu.property sections.
  2. If all input files have X86Feature1 bits: then we need to 'calculate' the resulting X86Feature1 bits from all the X86Feature1 bits ? (yes or no ?) then delete all the input files' .note.gnu.property sections ? (yes or no ?) then generate a .note.gnu.property section with the resulting X86Feature1 bits in input file level not in output file by Writer, so we need not to pass the X86Feature1 bits to Writer ? (yes or no ?)

What I recommended is to always discard .note section after checking the GNU_PROPERTY_X86_FEATURE_1_IBT *bit*. There's no .note section merging or anything. When you parse a file, just read a single bit from a .note section (if exists), and discard it.

As a result, after you read all input files, you have only one bit -- which indicates whether or not all input files support CET.

Then, if CET is supported by all input files, create a synthetic section for a .note section to enable CET for the output.

  • You add fair amount of new code to Driver.cpp, but which feels wrong. It should not be part of a driver. But I think the real problem is that the code to read .note section does just too much.

Hi, ruiu, there are not very much code now, in fact, half of them are code annotation.

  • Do you really have to read a .note section containing multiple descriptions? I think that compiler-created .note section contains only one description per a section, so you don't need to care about it.

Yes, the compiler-created .note section contains only one description per a section, But we should consider the handwriting assemble files.

I'm skeptical if people actually hand-write a .note section in assembly. In particular, is it really normal or even correct to write a note section containing multiple descriptions in assembly by hand? When you enable the CET for an assembly file, you would pass a command line option to the assembler to let the assembler to add that section, no? I seems that you are overthinking. I'd just consider the case where there is only one note description in a .note section.

As an earlier comment mentioned, there is quite a lot of code that depends on the details of the .note.gnu.property section added very early to Driver.c which is not where we'd normally expect to see it. It also looks like you using quite a lot of code to get an InputSection to represent the output .note.gnu.property section. Given that there will be multiple Input .note.gnu.property sections but only one Output .note.gnu.property section, I think it would be cleaner to use a SyntheticSection to represent the .note.gnu.property section. SyntheticSections can implement all sorts of custom functionality that could move the parsing and merging code out of Driver.cpp. My suggestion is to:

  • Create a new SyntheticSection (see SyntheticSections.h), something like NotePropertySyntheticSection. This will represent our single output .note.gnu.property section. It can store the status of the FEATUREAND bits.
  • Add an entry to InStruct for a single instance of it that we will use for the output.
  • After InStruct instance is created in Writer.c but before the relocations are scanned to create PLT entries iterate through InputSections and pass each .note.gnu.property to the NotePropertySyntheticSection. Then erase these InputSections from the InputSections. The combineEhFrameSections<ELFT> function in Writer.cpp is one way of doing something like that.
  • When you add the .note.gnu.property InputSection to the NotePropertySyntheticSection you can update the FEATUREAND bits
  • The writeTo() member function of the NotePropertySyntheticSection can output the data needed to make up the output.

    I think that this should allow you to integrate the .note.gnu.property in a similar way to other "merge" features such as EHFrame, and SHF_MERGE sections.

    The ld.bfd AArch64 BTI patches have just gone up on https://www.sourceware.org/ml/binutils/2019-03/msg00021.html so I'll hopefully be starting work on another feature that uses .note.gnu.property soon.

Thanks very much for your suggestion, I was not much familiar to the function of SyntheticSection before, I'll carefully dig this class, It seems really not good to do the "calculate bits" work in Driver.cpp now. Thank you! peter

Please keep in mind that "let's think about that later" is not a correct approach for command line option names. That's a public interface that cannot be changed once released. I think that is also true to gold. Even if the intention to add the -z options to gold was for testing, if the options have been added in a user-visible way, it is very unlikely that they'd be removed. So you need to think carefully as to what is the most consistent set of option names both for gold and lld once you implement all the features you want. That the situation that lld has --{auto,force}-cet and gold has -z options for the same functionality seems pretty inconsistent. Probably the importance of option names should be taken more seriously.

lld's --{auto,force}-cet is different with gold's -z options, "-z" options directly add CET info into output files, not care the input files is CETed or not, they just designed for testing, it is not right to use them in normal build.
--{auto,force}-cet depend on the input files to enable CET or not.
The key difference is that gold auto enabled CET in default (equal with LLD's --auto-cet).
I tended to keep it same with gold, but seems hard to persuade you ☺

What I recommended is to always discard .note section after checking the GNU_PROPERTY_X86_FEATURE_1_IBT *bit*. There's no .note section merging or anything. When you parse a file, just read a single bit from a .note section (if exists), and discard it.
As a result, after you read all input files, you have only one bit -- which indicates whether or not all input files support CET.
Then, if CET is supported by all input files, create a synthetic section for a .note section to enable CET for the output.

OK, as peter's suggestion, I'll recheck here code, thank you!

I'm skeptical if people actually hand-write a .note section in assembly. In particular, is it really normal or even correct to write a note section containing multiple descriptions in assembly by hand?

The assemble syntax support multiple descriptions.
And suppose that, one day, not only CET feature will put into .note.gnu.property, (now ABI have defined several other kind features such as GNU_PROPERTY_X86_ISA_1_USED ... ). So I think it will be normal to see one section with multiple descriptions (each description contains a different feature).

When you enable the CET for an assembly file, you would pass a command line option to the assembler to let the assembler to add that section, no? I seems that you are overthinking. I'd just consider the case where there is only one note description in a .note section.

Yes, no, we write the .note.gnu.property in the assembly file not use options.

ruiu added a comment.Mar 13 2019, 1:55 PM

Please keep in mind that "let's think about that later" is not a correct approach for command line option names. That's a public interface that cannot be changed once released. I think that is also true to gold. Even if the intention to add the -z options to gold was for testing, if the options have been added in a user-visible way, it is very unlikely that they'd be removed. So you need to think carefully as to what is the most consistent set of option names both for gold and lld once you implement all the features you want. That the situation that lld has --{auto,force}-cet and gold has -z options for the same functionality seems pretty inconsistent. Probably the importance of option names should be taken more seriously.

lld's --{auto,force}-cet is different with gold's -z options, "-z" options directly add CET info into output files, not care the input files is CETed or not, they just designed for testing, it is not right to use them in normal build.
--{auto,force}-cet depend on the input files to enable CET or not.
The key difference is that gold auto enabled CET in default (equal with LLD's --auto-cet).

But gold already has options, -z ibt and -z ibtplt. You should be consistent with these options, shouldn't we?

What's going to happen if you specify these options to gold and some input files lack special marker .note section? Does the gold linker silently create an executable that won't work, or does it reject with an error message?

What I recommended is to always discard .note section after checking the GNU_PROPERTY_X86_FEATURE_1_IBT *bit*. There's no .note section merging or anything. When you parse a file, just read a single bit from a .note section (if exists), and discard it.
As a result, after you read all input files, you have only one bit -- which indicates whether or not all input files support CET.
Then, if CET is supported by all input files, create a synthetic section for a .note section to enable CET for the output.

OK, as peter's suggestion, I'll recheck here code, thank you!

I'm skeptical if people actually hand-write a .note section in assembly. In particular, is it really normal or even correct to write a note section containing multiple descriptions in assembly by hand?

The assemble syntax support multiple descriptions.
And suppose that, one day, not only CET feature will put into .note.gnu.property, (now ABI have defined several other kind features such as GNU_PROPERTY_X86_ISA_1_USED ... ). So I think it will be normal to see one section with multiple descriptions (each description contains a different feature).

When you enable the CET for an assembly file, you would pass a command line option to the assembler to let the assembler to add that section, no? I seems that you are overthinking. I'd just consider the case where there is only one note description in a .note section.

Yes, no, we write the .note.gnu.property in the assembly file not use options.

Fair enough. Maybe the thing that alerted me on that code was that the code was written in a very different way than the other code in the same .cpp file and other .cpp files in the same directory. First of all, the code to read that section should not be in the driver, and that was written in a different coding style. Perhaps after you fix that (probably you need to read code around the location where you are editing and mimic the way what they are doing), the code would blend into other pieces of code.

hi dears,
I updated the patch: most of changes refer to peter and ruiu's suggestions, but not of all.
The main change is:

  1. Read all input files, then collect the CET features, and discard the .note.gnu.property sections, do these in SyntheticSections.cpp not in Driver.cpp.
  2. If CET features checked successful, then create the SyntheticSection's sections (GnuPropertySection, SPltSection) in Writer's createSyntheticSections().

instead of directly creating .note.gnu.property section in Writer.

  1. Adjust the test cases to remove redundant code check.

deficiencies:

  1. I tend to collect the CET features in GnuPropertySection (SyntheticSection) class or in Writer->run like combineEhFrameSections(), but I found it should first get the CET features,

then decide to create the GnuPropertySection or not (SPltSection too).

  1. I didn't remove the Config->X86Feature1AND, because I need it to tell the createSyntheticSections() to create the GnuPropertySection or not (SPltSection too).
MaskRay edited reviewers, added: MaskRay; removed: maskray0.Mar 18 2019, 2:06 AM

H.J. have enable the CET in GNU-toolchain and rebuild/rewrite the related libs/linux-kernal, his team are testing it for at last one year.

Not Linux, FreeBSD. We are hoping to develop the kernel support but are unable to do so without a simulator.

deficiencies:

  1. I tend to collect the CET features in GnuPropertySection (SyntheticSection) class or in Writer->run like combineEhFrameSections(), but I found it should first get the CET features, then decide to create the GnuPropertySection or not (SPltSection too).
  2. I didn't remove the Config->X86Feature1AND, because I need it to tell the createSyntheticSections() to create the GnuPropertySection or not (SPltSection too).

I guess you mean "I tried", not "I tend" in 1). I think that you can do accomplish 1.) via unconditionally creating the In.GnuPropertySection and SPltSection and by implementing the empty() member function, ensure that they are removed by removeUnusedSyntheticSections() if there isn't anything that uses them. Unconditionally creating the sections means that you don't need to call mergeAggregateMetaData from Driver.cpp prior to createSyntheticSections(). For the case of ifunc where you need to turn off the feature then just inform In.GnuPropertySection and In.SPltSection as they will always be there.

ELF/SyntheticSections.cpp
300

Typo Dnone, should be Done. Having said that. You have already commented that only X86_FEATURE_1_AND is supported, do you need to repeat it?

ELF/SyntheticSections.h
154

The comment has quite a few typos. May I suggest something like:
// We only support a single feature X86_FEATURE_1_AND, so we hard code the DescSize to a single entry.

157

This doesn't need to be a function, you only use it once to initialize DescSize. Why not just make DescSize static const size_t DescSize = 16;

158

These member functions are only called once and can be inlined.

161

If you implement the empty() member function then you will be able to unconditionally create In.GnuProperty as it will be removed if it isn't used.

164

DescBuf is only used in writeTo, it can be defined there.

ELF/Writer.cpp
89

You aren't using this anymore so it can be removed.

425

If you implement the GnuPropertySection::empty() member function then you can unconditionally create the section. It will be removed by removeUnusedSynthetics() if there are no entries.

We are using a special CET SDV for CET enabling. We have been enabling CET in Fedora piece by piece by adding -fcf-protection to CFLAGS.

How do we get access to such a simulator so that we can do the same in FreeBSD?

xiangzhangllvm marked 7 inline comments as done.Mar 18 2019, 6:37 PM
xiangzhangllvm added inline comments.
ELF/SyntheticSections.cpp
300

Yes, sorry for the typo, thank you!
I repeat the reason here is to let people easily understand and they will know how to change it when the other feature added.

ELF/SyntheticSections.h
154

Sorry for the typos, I 'll change it, thank you!

157

For consideration of extensibility, I used function to get the DescSize. We will do some work here when we need deal with the other feature types, because there will create more than one Desc here.

158

Same reason with the getDescSize()

164

Make sense!

ELF/Writer.cpp
89

Yes, you are right!

425

Thank you! I want to try.

But gold already has options, -z ibt and -z ibtplt. You should be consistent with these options, shouldn't we?

Yes, I think we should, and I think it's better to support these test options in a independent patch, and it's easy to support.

What's going to happen if you specify these options to gold and some input files lack special marker .note section? Does the gold linker silently create an executable that won't work, or does it reject with an error message?

Yes, I test it, it will directly create '.note.gnu.property' section or 'splt', create an executable that won't work, and give no errors.
So it just used for test.
What your idea here?

We are using a special CET SDV for CET enabling. We have been enabling CET in Fedora piece by piece by adding -fcf-protection to CFLAGS.

How do we get access to such a simulator so that we can do the same in FreeBSD?

Hi, emaste , as I know, the SDV is not public now, we just internally use it.
is it? @hjl.tools

xiangzhangllvm added a comment.EditedMar 19 2019, 1:25 AM

I guess you mean "I tried", not "I tend" in 1). I think that you can do accomplish 1.) via unconditionally creating the In.GnuPropertySection and SPltSection and by implementing the empty() member function, ensure that they are removed by removeUnusedSyntheticSections() if there isn't anything that uses them. Unconditionally creating the sections means that you don't need to call mergeAggregateMetaData from Driver.cpp prior to createSyntheticSections(). For the case of ifunc where you need to turn off the feature then just inform In.GnuPropertySection and In.SPltSection as they will always be there.

Hi peter, I have implemented your idea in my code, and it work well, I did not update here now because I am not sure about the ifunc.
One of my destination is to remove the Config->X86Feature1AND and do the feature collection work inside synthetic GnuPropertySection class.
I found I can remove the Config->X86Feature1AND except dealing with ifunc, If I remove the Config->X86Feature1AND I still need another global variable.
At this stage I want to disable CET when the program will use ifunc, my current code is Relocations.cpp:1072

// To Be Done.
// Now the IBT can't work with IFUNC feature, because they may conflict in
// changing the PLT. So we disable the IBT when the IFUNC enabled.
if (Sym.isGnuIFunc()) {
  Config->X86Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
  Config->SPltSectionEnable = false;
}

Do you have some good idea to turn off the CET feature in ifunc ?
Thank you very much!

xiangzhangllvm added a comment.EditedMar 19 2019, 1:35 AM

I do this changes in my local: (the patch is too big here, so I cut some unimportant changes)
I do the feature collection work in the GnuPropertySection's Constructor which will be called in createSyntheticSections().
For the SPlt, I keep its generation condition, because the GnuPropertySection's Constructor will set the Config->SPltSectionEnable.

  1 diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
  2 index 7628df0..d5217bd 100644
  3 --- a/ELF/Driver.cpp
  4 +++ b/ELF/Driver.cpp
  5 @@ -1626,14 +1626,6 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &Args) {
  6        return S->Name.startswith(".debug") || S->Name.startswith(".zdebug");
  7      });
  8
  9 -  // Merge aggregate metadata sections like .note.gnu.property that should be
 10 -  // found across all relocated files and reproduced once. But Until now, the
 11 -  // emitAggregateMetadata just deal with the X86Feature1AND info, and only X86_64, i386
 12 -  // support the X86Feature1AND.
 13 -  if ((Config->EMachine == EM_X86_64 || Config->EMachine == EM_386)
 14 -      && (Config->AutoCET || Config->ForceCET))
 15 -    mergeAggregateMetadata();
 16 -
 17    Config->EFlags = Target->calcEFlags();
 18
 19    if (Config->EMachine == EM_ARM) {

 20 diff --git a/ELF/SyntheticSections.cpp b/ELF/SyntheticSections.cpp
 21 index bb1c5d3..4ed1a72 100644
 22 --- a/ELF/SyntheticSections.cpp
 23 +++ b/ELF/SyntheticSections.cpp

 44 +bool GnuPropertySection::empty() const {
 45 +  uint32_t Feature = getFeature();
 46 +  return Feature == 0x0;
 47  }
 48
 59  void GnuPropertySection::setFeature() {
 60 -  Feature1AND = getFeature();
 61 +  Feature1AND = mergeAggregateMetadata();
 62  }
 63

101 -void elf::mergeAggregateMetadata() {
102 +uint32_t elf::mergeAggregateMetadata() {
         ... do some change here to return  X86Feature1AND;
136  }
137
138  template <class ELFT> void elf::splitSections() {
139 diff --git a/ELF/SyntheticSections.h b/ELF/SyntheticSections.h
140 index f148281..e369ef1 100644
141 --- a/ELF/SyntheticSections.h
142 +++ b/ELF/SyntheticSections.h
143 @@ -149,19 +149,19 @@ public:
144    GnuPropertySection();
145    void writeTo(uint8_t *Buf) override;
146    size_t getSize() const override;
147 +  bool empty() const override;
148
176
177 diff --git a/ELF/Writer.cpp b/ELF/Writer.cpp
178 index 11639e5..0deebb9 100644
179 --- a/ELF/Writer.cpp
180 +++ b/ELF/Writer.cpp
181 @@ -335,6 +335,16 @@ template <class ELFT> static void createSyntheticSections() {
182        Add(Sec);
183    }
184
185 +  // Merge aggregate metadata sections like .note.gnu.property that should be
186 +  // found across all relocated files and reproduced once. But Until now, the
187 +  // emitAggregateMetadata just deal with the X86Feature1AND info, and only
188 +  // X86_64, i386 support the X86Feature1AND.
189 +  if ((Config->EMachine == EM_X86_64 || Config->EMachine == EM_386)
190 +      && (Config->AutoCET || Config->ForceCET)) {
191 +    In.GnuProperty = make<GnuPropertySection>();
192 +    Add(In.GnuProperty);
193 +  }

198 @@ -416,16 +426,12 @@ template <class ELFT> static void createSyntheticSections() {
202 
203    if (Config->SPltSectionEnable) {
204      In.Splt = make<SPltSection>();
205      Add(In.Splt);
206    }
207

I guess you mean "I tried", not "I tend" in 1). I think that you can do accomplish 1.) via unconditionally creating the In.GnuPropertySection and SPltSection and by implementing the empty() member function, ensure that they are removed by removeUnusedSyntheticSections() if there isn't anything that uses them. Unconditionally creating the sections means that you don't need to call mergeAggregateMetaData from Driver.cpp prior to createSyntheticSections(). For the case of ifunc where you need to turn off the feature then just inform In.GnuPropertySection and In.SPltSection as they will always be there.

Hi peter, I have implemented your idea in my code, and it work well, I did not update here now because I am not sure about the ifunc.
One of my destination is to remove the Config->X86Feature1AND and do the feature collection work inside synthetic GnuPropertySection class.
I found I can remove the Config->X86Feature1AND except dealing with ifunc, If I remove the Config->X86Feature1AND I still need another global variable.
At this stage I want to disable CET when the program will use ifunc, my current code is Relocations.cpp:1072

// To Be Done.
// Now the IBT can't work with IFUNC feature, because they may conflict in
// changing the PLT. So we disable the IBT when the IFUNC enabled.
if (Sym.isGnuIFunc()) {
  Config->X86Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
  Config->SPltSectionEnable = false;
}

Do you have some good idea to turn off the CET feature in ifunc ?
Thank you very much!

As I understand it, what you have in this review (not the patch attached as a comment).

  • Driver.cpp calls mergeAggregateMetaData() to set the Config->SpltSectionEnabled and Config->X86Feature1AND
    • We know that all input objects have the feature property set.
  • Writer.cpp creates the In.SpltSection() and the In.GnuProperty() section as we know that will be needed.
  • If we encounter an ifunc then we disable IBT by setting Config->SpltSectionEnabled to false and clearing Config->X86Feature1AND

One thing I don't understand yet, apologies I've not had time to go into the details of how the splt section works, is that if you discover an ifunc mid-way through processing then you'll have inserted many entries into the In.Splt section, then disabled it, from then on you'll just use the In.Plt. Will this result in a broken PLT as it looks like we'll call get getPltVA() if Config->SpltSectionEnabled and I'm not sure if that will return the right thing if we've previously added an entry into the In.Splt section. I don't see any existing tests for ifunc handling, can you add a test case that does something like:

  • Enable SpltSection
  • Generate a In.SpltSection entry
  • encounter an Ifunc
  • Disable SpltSection
  • Generate a In.Plt entry for some other symbol
  • Check that the generated PLT makes sense.

If this bit is broken then I think you'll need to find a way of handling this case, which won't be easy. I can think of a few approaches:

  • Do a pre-scan for likely ifuncs, before calling scanRelocs. This could be simpler but more pessimistic than the existing code.
  • Have a way that the In.SpltSection decays into an In.PltSection gracefully if an ifunc is detected.

Going back to your original question, how about:

  • Writer.cpp at construction of In.GnuPropertySection calls mergeAggregateMetaData() and sets GnuPropertySection::Feature1AND and possibly SpltSection::enable
    • As you've implemented empty() these sections can be unconditionally created, although it doesn't matter too much if you conditionally create them, in that case In.GnuPltSection == nullptr is implicity Feature1AND=0 and In.SpltSection == nullptr is implicitly enableSpltSection == false
  • If we encounter an ifunc then we disable IBT by setting In.SpltSection::Enable to false and setting In.GnuPropertySection::X86Feature1AND to 0. If you've not created the sections then you'll obviously need to test that they exist first. If I've understood correctly if the sections don't exist then there is nothing to disable anyway.

Please let me know if I've misunderstood something critical?

ruiu added a comment.Mar 19 2019, 9:32 AM

Overall this patch still look very different from the existing lld code base and does many things by hand without using LLVM libObject. You really need to read the existing code first and then try to mimic what the existing code is doing. What this patch is trying to do is not that new -- your patch should and could follow the existing patterns that I believe not too hard to find by reading existing code.

It is perhaps faster to actually show how it should look like than pointing out all the details in the review thread, so I'll apply your patch locally, modify that, and then upload it so that you can read it.

ELF/SyntheticSections.cpp
318–328

These functions are too trivial.

3012–3013

This does not follow the LLVM naming convention.

3085

Essentially, when you read some structure from a file, you should cast a byte array to a struct defined in llvm/include/llvm/Object/ELFTypes.h. You should not read data like this function does.

3089

Needless use of decltype.

ELF/Writer.cpp
426–427

I don't think you need this variable.

As I understand it, what you have in this review (not the patch attached as a comment).

  • Driver.cpp calls mergeAggregateMetaData() to set the Config->SpltSectionEnabled and Config->X86Feature1AND
    • We know that all input objects have the feature property set.
  • Writer.cpp creates the In.SpltSection() and the In.GnuProperty() section as we know that will be needed.
  • If we encounter an ifunc then we disable IBT by setting Config->SpltSectionEnabled to false and clearing Config->X86Feature1AND

    One thing I don't understand yet, apologies I've not had time to go into the details of how the splt section works, is that if you discover an ifunc mid-way through processing then you'll have inserted many entries into the In.Splt section, then disabled it, from then on you'll just use the In.Plt. Will this result in a broken PLT as it looks like we'll call get getPltVA() if Config->SpltSectionEnabled and I'm not sure if that will return the right thing if we've previously added an entry into the In.Splt section. I don't see any existing tests for ifunc handling, can you add a test case that does something like:
  • Enable SpltSection
  • Generate a In.SpltSection entry
  • encounter an Ifunc
  • Disable SpltSection
  • Generate a In.Plt entry for some other symbol
  • Check that the generated PLT makes sense.

    If this bit is broken then I think you'll need to find a way of handling this case, which won't be easy. I can think of a few approaches:
  • Do a pre-scan for likely ifuncs, before calling scanRelocs. This could be simpler but more pessimistic than the existing code.
  • Have a way that the In.SpltSection decays into an In.PltSection gracefully if an ifunc is detected.

    Going back to your original question, how about:
  • Writer.cpp at construction of In.GnuPropertySection calls mergeAggregateMetaData() and sets GnuPropertySection::Feature1AND and possibly SpltSection::enable
    • As you've implemented empty() these sections can be unconditionally created, although it doesn't matter too much if you conditionally create them, in that case In.GnuPltSection == nullptr is implicity Feature1AND=0 and In.SpltSection == nullptr is implicitly enableSpltSection == false
  • If we encounter an ifunc then we disable IBT by setting In.SpltSection::Enable to false and setting In.GnuPropertySection::X86Feature1AND to 0. If you've not created the sections then you'll obviously need to test that they exist first. If I've understood correctly if the sections don't exist then there is nothing to disable anyway.

    Please let me know if I've misunderstood something critical?

Hi, Peter, very appreciate for your suggestion, for the Splt, in fact, it is just the entry of the old PLT, you can see in this way: SPLT+PLT=(old)PLT, we split the (old)Plt to Plt +SPlt, one of the reasons is for performance that we want to put them into different areas (hot code, cold code) .
And in this patch, I used getSPltVA() instead of getPltVA when face outside. For example in getRelocTargetVA():

case R_PLT:
  // When the CET Branch protection is enabled, the second PLT
  // should be call to.
  if (Config->SPltSectionEnable)
    return Sym.getSPltVA() + A;
  else
    return Sym.getPltVA() + A;

Maybe it's hard to use In.SpltSection::Enable to do these.
Any way I will check it.
Thank you very much!

Overall this patch still look very different from the existing lld code base and does many things by hand without using LLVM libObject. You really need to read the existing code first and then try to mimic what the existing code is doing. What this patch is trying to do is not that new -- your patch should and could follow the existing patterns that I believe not too hard to find by reading existing code.

It is perhaps faster to actually show how it should look like than pointing out all the details in the review thread, so I'll apply your patch locally, modify that, and then upload it so that you can read it.

Hi, ruiu, I'll try to read them and let the code style like the existing lld code, I think the most un-beautiful code you sought is in the findGNUPropertyX86Feature1AND() function, here is too much details about the ABI, and I get the features in my way.

But I am not quite understanding your following review:

    In.GnuProperty = make<GnuPropertySection>();
    Add(In.GnuProperty);
ruiu:  I don't think you need this variable.

Did you mean I should not use the GnuPropertySection (SyntheticSection Class) to generate the output .note.gnu.property section? It seems a little different with peter's suggestion.

Anyway I'll read more code first.
Thank you again!

Fine tuning the patch, and still need to refine the "code style".

xiangzhangllvm marked an inline comment as done.Mar 19 2019, 10:18 PM
xiangzhangllvm added inline comments.
ELF/SyntheticSections.cpp
3085

Yes! This is the key point!

xiangzhangllvm marked an inline comment as done.Mar 19 2019, 11:34 PM
xiangzhangllvm added inline comments.
ELF/SyntheticSections.cpp
3085

Hi Ruiu, let me tell you my idea about how to rewrite here code first:

First I checked llvm/include/llvm/Object/ELFTypes.h, and find no structure corresponding to our .note.gnu.property.
Compiler write this section at llvm/lib/Target/X86/X86AsmPrinter.cpp:X86AsmPrinter::EmitStartOfAsmFile(), it not used any structure too.

I want to change here like this:

struct head{
...
}
struct desc{
...
}
struct gnu.property{
  head H;
  desc D[]; 
}

then cast the note.gnu.property's data array to struct gnu.property.
and check the features though struct gnu.property.
Do you think it's OK?

I rewrited the collecting feature functions.
It is very simple now!

+struct GnuPropertyHeadType {
+  uint32_t Namesz;
+  uint32_t Descsz;
+  uint32_t NType;
+  uint32_t Name;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyDescType {
+  uint32_t DescTy;
+  uint32_t Descsz;
+  uint32_t Feature;
+  uint32_t Pad;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyType {
+  GnuPropertyHeadType Head;
+  GnuPropertyDescType Desc[];
+};                                            // Did here need packed attribute ? I have a little concern
+
+static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
+                                          unsigned &Features) {
+  auto Data = S->getDataAs<uint32_t>();
+  GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());
+  size_t DescNum = GnuProperty->Head.Descsz / sizeof (GnuPropertyDescType);
+  if (GnuProperty->Head.NType != NT_GNU_PROPERTY_TYPE_0)
+    return false;
+  for (size_t i = 0; i < DescNum; i ++) {
+    if (GnuProperty->Desc[i].DescTy == GNU_PROPERTY_X86_FEATURE_1_AND) {
+      Features = GnuProperty->Desc[i].Feature;
+      return true;
+    }
+  }
+  return false;
+}

I rewrited the collecting feature functions.
It is very simple now!

+struct GnuPropertyHeadType {
+  uint32_t Namesz;
+  uint32_t Descsz;
+  uint32_t NType;
+  uint32_t Name;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyDescType {
+  uint32_t DescTy;
+  uint32_t Descsz;
+  uint32_t Feature;
+  uint32_t Pad;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyType {
+  GnuPropertyHeadType Head;
+  GnuPropertyDescType Desc[];
+};                                            // Did here need packed attribute ? I have a little concern
+
+static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
+                                          unsigned &Features) {
+  auto Data = S->getDataAs<uint32_t>();
+  GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());
+  size_t DescNum = GnuProperty->Head.Descsz / sizeof (GnuPropertyDescType);
+  if (GnuProperty->Head.NType != NT_GNU_PROPERTY_TYPE_0)
+    return false;
+  for (size_t i = 0; i < DescNum; i ++) {
+    if (GnuProperty->Desc[i].DescTy == GNU_PROPERTY_X86_FEATURE_1_AND) {
+      Features = GnuProperty->Desc[i].Feature;
+      return true;
+    }
+  }
+  return false;
+}

I'm having problems getting this to compile on at least GCC 5.4.0 on Linux:

lld/ELF/SyntheticSections.cpp:3104:28: warning: ISO C++ forbids zero-size array ‘Desc’ [-Wpedantic]
   GnuPropertyDescType Desc[]

lld/ELF/SyntheticSections.cpp:3110:55: error: invalid conversion from ‘llvm::cast_retty<GnuPropertyType, const unsigned int*>::ret_type {aka const GnuPropertyType*}’ to ‘GnuPropertyType*’ [-fpermissive]
   GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());

The cast<> is an LLVM cast http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates and isn't something you can do on an arbitrary struct.

Assuming (x86) that your input data is always little endian then you'll need to use something like ulittle32_t instead of uint32_t or your tests will fail when run on a big-endian machine like PPC or some Mips machines.

Hi, Peter, very appreciate for your suggestion, for the Splt, in fact, it is just the entry of the old PLT, you can see in this way: SPLT+PLT=(old)PLT, we split the (old)Plt to Plt +SPlt, one of the reasons is for performance that we want to put them into different areas (hot code, cold code) .
And in this patch, I used getSPltVA() instead of getPltVA when face outside. For example in getRelocTargetVA():

case R_PLT:
  // When the CET Branch protection is enabled, the second PLT
  // should be call to.
  if (Config->SPltSectionEnable)
    return Sym.getSPltVA() + A;
  else
    return Sym.getPltVA() + A;

Maybe it's hard to use In.SpltSection::Enable to do these.
Any way I will check it.
Thank you very much!

I've checked that it does work, although at the moment it does leave a redundant .splt section in the binary. I think that this can be removed by accounting for this in the SPltSection::empty(), I've left a suggestion inline. I also note that there are no tests that check the contents of the .splt are correct. I think it is important to have those as it can be easy for it to be broken by some later change and no-one noticing it.

ELF/SyntheticSections.h
681

If I've understood correctly if the .SPLT is disabled then this should return true, otherwise if you have any ifuncs then a redundant .splt section is produced.

test/ELF/x86-64-feature-1-and.s
58

The test checks that the splt is created, but it doesn't check that the contents are correct. There should be at least one of these tests that disassembles the .plt and checks that it is loading the correct .got entry.

There also should be a test to show that when an ifunc plt is added then we revert to using the .plt, and that any existing .splt entries are not called.

For example:

.type foo STT_GNU_IFUNC
.globl ifunc
ifun:
 ret

 .globl func1
 .type func1,@function
func1:
        callq func2
        callq ifunc    // func2 will have been added to .splt, we don't want func3 to be.
        callq func3
ruiu added a comment.Mar 20 2019, 3:48 PM

Overall this patch still look very different from the existing lld code base and does many things by hand without using LLVM libObject. You really need to read the existing code first and then try to mimic what the existing code is doing. What this patch is trying to do is not that new -- your patch should and could follow the existing patterns that I believe not too hard to find by reading existing code.

It is perhaps faster to actually show how it should look like than pointing out all the details in the review thread, so I'll apply your patch locally, modify that, and then upload it so that you can read it.

Hi, ruiu, I'll try to read them and let the code style like the existing lld code, I think the most un-beautiful code you sought is in the findGNUPropertyX86Feature1AND() function, here is too much details about the ABI, and I get the features in my way.

But I am not quite understanding your following review:

    In.GnuProperty = make<GnuPropertySection>();
    Add(In.GnuProperty);
ruiu:  I don't think you need this variable.

Did you mean I should not use the GnuPropertySection (SyntheticSection Class) to generate the output .note.gnu.property section? It seems a little different with peter's suggestion.

Sometimes we create a section and fill it later, because the exact contents of the section is not known at the time when we create it. In that case, you have to create a section first, and then store its pointer to In.* so that you can refer them later.

I don't think you need that, though. There's nothing to "finalize" for that section because everything is known when you instantiate this class.

xiangzhangllvm added a comment.EditedMar 20 2019, 5:40 PM

I'm having problems getting this to compile on at least GCC 5.4.0 on Linux:

lld/ELF/SyntheticSections.cpp:3104:28: warning: ISO C++ forbids zero-size array ‘Desc’ [-Wpedantic]
   GnuPropertyDescType Desc[]

lld/ELF/SyntheticSections.cpp:3110:55: error: invalid conversion from ‘llvm::cast_retty<GnuPropertyType, const unsigned int*>::ret_type {aka const GnuPropertyType*}’ to ‘GnuPropertyType*’ [-fpermissive]
   GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());

The cast<> is an LLVM cast http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates and isn't something you can do on an arbitrary struct.

Assuming (x86) that your input data is always little endian then you'll need to use something like ulittle32_t instead of uint32_t or your tests will fail when run on a big-endian machine like PPC or some Mips machines.

Yes! It is really have error!
I used incremental compilation, didn't find it at first. I'll change it.

xiangzhangllvm marked 2 inline comments as done.Mar 20 2019, 6:45 PM

I've checked that it does work, although at the moment it does leave a redundant .splt section in the binary. I think that this can be removed by accounting for this in the SPltSection::empty(), .........

It shouldn't have redundant .splt section, there should not have any .splt section if IBT is disable or non-dynamic link. Could you show me your test please?

ELF/SyntheticSections.h
681

Till now this patch doesn't carefully handle ifunc, in ELF/Relocations.cpp:1072, I disable the splt if checked ifunc, if splt is disabled, there will be no splt section created.

if (Config->SPltSectionEnable) {
  In.Splt = make<SPltSection>();
  Add(In.Splt);
}
test/ELF/x86-64-feature-1-and.s
58

I check the contents of plts in my local, I'll add them to the patch.

For, I ifunc, I planned to handle it in my next patch, so I didn't add tests here.

I've checked that it does work, although at the moment it does leave a redundant .splt section in the binary. I think that this can be removed by accounting for this in the SPltSection::empty(), .........

It shouldn't have redundant .splt section, there should not have any .splt section if IBT is disable or non-dynamic link. Could you show me your test please?

My test is a simple modification of x86-64-feature-1-and.s and Inputs/x86-64-feature-1-and1.s
Step 1: add func3 and func4 to Inputs/x86-64-feature-1-and1.s so that we can resolve the symbols.

 .text
 .globl func3
 .type func3,@function
func3:
 retq

.text
 .globl func4
 .type func4,@function
func4:
 retq

For x86-64-feature-1-and.s
change.text to

 .text

.type foo STT_GNU_IFUNC
.globl foo
foo:
 ret

 .globl func1
 .type func1,@function
func1:
        callq func2
        callq func3
        callq foo        
        callq func4
 retq

I've added an ifunc, a call to an ifunc and a call to a couple more functions. Notably the ifunc is after the calls to func1 and func2, so we turn off the feature after having added entries to the .splt.
Disassembly from gnu objdump

0000000000201000 <foo>:
  201000:	c3                   	retq   

0000000000201001 <func1>:
  201001:	e8 2a 00 00 00       	callq  201030 <func2@plt>
  201006:	e8 35 00 00 00       	callq  201040 <func3@plt>
  20100b:	e8 50 00 00 00       	callq  201060 <*ABS*+0x201000@plt>
  201010:	e8 3b 00 00 00       	callq  201050 <func4@plt>
  201015:	c3                   	retq   

Disassembly of section .plt:

0000000000201020 <func2@plt-0x10>:
  201020:	ff 35 e2 1f 00 00    	pushq  0x1fe2(%rip)        # 203008 <_DYNAMIC+0x1008>
  201026:	ff 25 e4 1f 00 00    	jmpq   *0x1fe4(%rip)        # 203010 <_DYNAMIC+0x1010>
  20102c:	0f 1f 40 00          	nopl   0x0(%rax)

0000000000201030 <func2@plt>:
  201030:	ff 25 e2 1f 00 00    	jmpq   *0x1fe2(%rip)        # 203018 <func2>
  201036:	68 00 00 00 00       	pushq  $0x0
  20103b:	e9 e0 ff ff ff       	jmpq   201020 <func1+0x1f>

0000000000201040 <func3@plt>:
  201040:	ff 25 da 1f 00 00    	jmpq   *0x1fda(%rip)        # 203020 <func3>
  201046:	68 01 00 00 00       	pushq  $0x1
  20104b:	e9 d0 ff ff ff       	jmpq   201020 <func1+0x1f>

0000000000201050 <func4@plt>:
  201050:	ff 25 d2 1f 00 00    	jmpq   *0x1fd2(%rip)        # 203028 <func4>
  201056:	68 02 00 00 00       	pushq  $0x2
  20105b:	e9 c0 ff ff ff       	jmpq   201020 <func1+0x1f>

0000000000201060 <*ABS*+0x201000@plt>:
  201060:	ff 25 ca 1f 00 00    	jmpq   *0x1fca(%rip)        # 203030 <_DYNAMIC+0x1030>
  201066:	68 00 00 00 00       	pushq  $0x0
  20106b:	e9 e0 ff ff ff       	jmpq   201050 <func4@plt>

Disassembly of section .splt:

0000000000201070 <.splt>:
  201070:	f3 0f 1e fa          	endbr64 
  201074:	ff 25 9e 1f 00 00    	jmpq   *0x1f9e(%rip)        # 203018 <func2>
  20107a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  201080:	f3 0f 1e fa          	endbr64 
  201084:	ff 25 96 1f 00 00    	jmpq   *0x1f96(%rip)        # 203020 <func3>
  20108a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

Note the redundant splt. I'm using the previous version of the patch as the current one didn't compile for me. I will double check the most recent patch with a more recent compiler.

Bug fix for 32-bit, the Desc size is not same between 32-bit and 64-bit

Hi peter, this update will remove redundant .splt in your upper ifunc test.
The reason of redundant .splt is that we scan the IFUNC symbol after we collected the CET feature. So the splt was created first and without being removed after we found IFUNC.
Anyway this patch is just evade ifunc, I planed to hand ifunc+cet after this patch.
Thank your test! thank you too

The update:

 1 diff --git a/ELF/SyntheticSections.cpp b/ELF/SyntheticSections.cpp
 2 index 9066a84..a55ca1b 100644
 3 --- a/ELF/SyntheticSections.cpp
 4 +++ b/ELF/SyntheticSections.cpp
 5 @@ -304,6 +304,9 @@ void GnuPropertySection::writeTo(uint8_t *Buf) {
 6    uint8_t *DescBuf = Buf + 16;
 7    write32(DescBuf, GNU_PROPERTY_X86_FEATURE_1_AND); // Feature type
 8    write32(DescBuf + 4, 4);                          // Feature size
 9 +  // We need remove this if code when we supported IFUNC+CET.
10 +  if (!Config->SPltSectionEnable)
11 +    Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
12    write32(DescBuf + 8, Feature1AND);                // Feature flags
13  }
14
15 diff --git a/ELF/SyntheticSections.h b/ELF/SyntheticSections.h
16 index 626a76d..51c3515 100644
17 --- a/ELF/SyntheticSections.h
18 +++ b/ELF/SyntheticSections.h
19 @@ -697,7 +697,8 @@ public:
20    SPltSection();
21    void writeTo(uint8_t *Buf) override;
22    size_t getSize() const override;
23 -  bool empty() const override { return Entries.empty(); }
24 +  bool empty() const override {
25 +    return !Config->SPltSectionEnable || Entries.empty(); }
26
27    template <class ELFT> void addEntry(Symbol &Sym);
28  private:
29 diff --git a/ELF/Writer.cpp b/ELF/Writer.cpp
30 index 11639e5..341a557 100644
31 --- a/ELF/Writer.cpp
32 +++ b/ELF/Writer.cpp
33 @@ -421,10 +421,8 @@ template <class ELFT> static void createSyntheticSections() {
34      Add(In.Splt);
35    }
36
37 -  if (Config->X86Feature1AND) {
38 -    In.GnuProperty = make<GnuPropertySection>();
39 -    Add(In.GnuProperty);
40 -  }
41 +  if (Config->X86Feature1AND)
42 +    Add(make<GnuPropertySection>());
43
44    // .note.GNU-stack is always added when we are creating a re-linkable
45    // object file. Other linkers are using the presence of this marker
46 @@ -1669,8 +1667,6 @@ template <class ELFT> void Writer<ELFT>::finalizeSections() {
47    // earlier.
48    finalizeSynthetic(In.EhFrame);
49
50 -  finalizeSynthetic(In.GnuProperty);
51 -
52    for (Symbol *S : Symtab->getSymbols())
53      if (!S->IsPreemptible)

Update test to check splt's context

ruiu added a comment.Mar 25 2019, 9:56 AM

As I said I was working on refining this patch. Actually it turned out that it was more like a complete rewrite than a refinement, as this patch was not aligned very well with the existing lld codebase and also have several performance issues. I'll post a patch shortly.

I posted https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI to ask 3 questions regarding IBT-enabled PLT with lazy binding.