This is an archive of the discontinued LLVM Phabricator instance.

Support X86 Control-flow Enforcement Technology (CET) in LLD
AbandonedPublic

Authored by xiangzhangllvm on Feb 11 2019, 10:31 PM.

Details

Summary
Control-flow Enforcement Technology (CET)

provides the following capabilities to defend against ROP/JOP style control-flow subversion attacks:

Shadow Stack (SHSTK) – return address protection to defend against Return Oriented Programming,

A shadow stack is a second stack for the program that is used exclusively for control transfer operations.
This stack is separate from the data stack and can be enabled for operation individually in user mode or supervisor mode.
When shadow stacks are enabled, the CALL instruction pushes the return address on both the data and shadow stack.
The RET instruction pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor signals a control protection exception (#CP).

Indirect branch tracking (IBT) – free branch protection to defend against Jump/Call Oriented Programming.

The ENDBRANCH is a new instruction that is used to mark valid indirect call/jump targets in the program.
This instruction opcode is selected to be one that is a NOP on legacy machines such that programs compiled with ENDBRANCH new instruction continue to function on old machines without the CET enforcement.
On processors that support CET the ENDBRANCH is still a NOP and is primarily used as a marker instruction by the in-order part of the processor pipeline to detect control flow violations.
The CPU implements a state machine that tracks indirect jump and call instructions. When one of these instructions is seen, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state.
In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH.
If an ENDBRANCH is not seen the processor causes a control protection fault else the state machine moves back to IDLE state.

Work flow of CET:

A CET supported executable file will contain the CET flags in its .note.gnu.property section.
(The flags of SHSTK or IBT will be set at the GNU_PROPERTY_X86_FEATURE_1_AND property in .note.gnu.property section if the CET is enabled.)
When run the executable file, CET-enabled OS will check these flags in the related segment, and run the feature accordingly on supported hardware.
If it run on unsupported machines, These CET flags or ENDBRANCH will be ignore.

The key implement of the feature in LLD:

LLD will first check if all the relocatable object files contain the .note.gnu.property section and if there is GNU_PROPERTY_X86_FEATURE_1_AND property in this section.
If it finds the GNU_PROPERTY_X86_FEATURE_1_AND property in all the relocatable object files,
it will handle the flags in these properties and sections, create the resulting .note.gnu.property section in output file if necessary.
Because the IBT feature needs to insert an endbr* instruction at the end of indirect jump, LLD also deals with the lazy binding by adding a second PLT called SPLT.
Recording to https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-draft.pdf 13.2 DynamicLinking

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

MaskRay added a comment.EditedFeb 12 2019, 12:00 AM

ELF/Arch/{X86,X86_64}.cpp implement another nonstandard PLT called retpoline PLT.
Can the security enhancement charactersitics of CET be used along with -z retpolineplt?
If not (for now and for the future), have you considered making a class like Retpoline which derives from X86_64<ELFT> and overrides the relevant GOT/PLT methods (does SPLT have a mnemonic name? Secure PLT?)

template <class ELFT> class Retpoline : public X86_64<ELFT> {
public:
  Retpoline();
  void writeGotPlt(uint8_t *Buf, const Symbol &S) const override;
  void writePltHeader(uint8_t *Buf) const override;
  void writePlt(uint8_t *Buf, uint64_t GotPltEntryAddr, uint64_t PltEntryAddr,
                int32_t Index, unsigned RelOff) const override;
};

When linking statically, IFUNC also requires PLT. Does this SPLT need to do anything to play with static linking?

Another question: is there any concurrent work planned on ld.bfd or gold? I find that gold merges GNU_PROPERTY_X86_* bits for .note.gnu.property but it doesn't have the SPLT feature yet.

I 'll check these issues, thank you!

ruiu added a comment.Feb 13 2019, 1:43 PM

This is a fairly large patch that introduces a completely new feature to lld. Please allow me time to review this patch carefully at high level; I wouldn't like to review the code in detail until high level concerns or questions are addressed. Next time, please consider sending an RFC to express your intent to implement a feature before finishing up your patch. Completing a proof-of-concept is fine, but further development should be done incrementally in my opinion. It is not an uncommon situation in which we have to request a significant rework if a large feature is implemented all at once in a single patch. In addition to that, splitting a patch into a number of incremental patches makes code review much easier.

Hi MaskRay:
I have checked your concerns, the SPLT really have problems to work with retpolineplt and IFUNC, because they all change the old PLT, I tend to deal with these conflicts in next time, because this patch is already big.
I'll disable the IBT which may generate SPLT to resolve these conflict temporarily when retpolineplt or IFUNC enabled.
And, yes, I have considered making a class for the SPLT, class PltSection : public SyntheticSection, it overrides the writeTo function.
In fact, The SPLT is standard for Second PLT, when the CET's IBT feature enabled, we split the old PLT to PLT + SPLT, dynamic call first call to SPLT then jump to PLT, the main reason we add a SPLT is that we need insert
ENDBRANCH instructions in the old PLT and we don't want to change PLT size at first.
Thank you very much!

Hi ruiu:
Very thank you for spending your time to review this patch, I'll keep your adjustments in my mind. But I am not clear about the RFC, what it stands for?

Hi ruiu:
Very thank you for spending your time to review this patch, I'll keep your adjustments in my mind. But I am not clear about the RFC, what it stands for?

I think RFC stands for "Request For Comments". You may send an email to llvm-dev (see http://lists.llvm.org/pipermail/llvm-dev/2019-February/thread.html for some examples) It is commonly used when some completely new features are planned. The comments here go to the mailing list llvm-commits but not many people subscribe to the list. llvm-dev has a wider audience.

Hi dears, I update the patch, and disable the IBT feature when the IFUNC or retpolineplt enabled.
I'll deal with them next time and now I mark "To Be Done" in the corresponding position.

Hi ruiu:
Very thank you for spending your time to review this patch, I'll keep your adjustments in my mind. But I am not clear about the RFC, what it stands for?

I think RFC stands for "Request For Comments". You may send an email to llvm-dev (see http://lists.llvm.org/pipermail/llvm-dev/2019-February/thread.html for some examples) It is commonly used when some completely new features are planned. The comments here go to the mailing list llvm-commits but not many people subscribe to the list. llvm-dev has a wider audience.

Thank you MaskRay!
And I update the patch to avoid ifunc and retpolineplt:

1 diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
2 index 4b5a89a..1bc8101 100644
3 --- a/ELF/Driver.cpp
4 +++ b/ELF/Driver.cpp
5 @@ -1505,6 +1505,13 @@ static void readGNUPropertyX86Feature1AND(InputSectionBase *S,
6    // integer. A bit in the output pr_data field is set only if it is set
7    // in all relocatable input pr_data fields.
8    Config->X86Feature1AND &= Flags;
9 +

10 + To Be Done.
11 +
Now the IBT can't work with retpolineplt feature, becasue they may conflict
12 + in changing the PLT. So we disable the IBT when the IFUNC enabled.
13 + if (Config->ZRetpolineplt)
14 + Config->X86Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
15 +
16 Config->SPltSectionEnable = Config->X86Feature1AND
17 & GNU_PROPERTY_X86_FEATURE_1_IBT;
18 }
19 diff --git a/ELF/Relocations.cpp b/ELF/Relocations.cpp
20 index 2dc981e..c579dfb 100644
21 --- a/ELF/Relocations.cpp
22 +++ b/ELF/Relocations.cpp
23 @@ -1036,6 +1036,12 @@ static void scanReloc(InputSectionBase &Sec, OffsetGetter &GetOffset, RelTy *&I,
24 getLocation(Sec, Sym, Offset));
25 }
26 Expr = toPlt(Expr);
27 +
28 +
To Be Done.
29 + Now the IBT can't work with IFUNC feature, becasue they may conflict in
30 +
changing the PLT. So we disable the IBT when the IFUNC enabled.
31 + Config->X86Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
32 + Config->SPltSectionEnable = false;
33 } else if (!Sym.IsPreemptible && Expr == R_GOT_PC && !isAbsoluteValue(Sym)) {
34 Expr = Target->adjustRelaxExpr(Type, RelocatedAddr, Expr);
35 } else if (!Sym.IsPreemptible) {

Hi ruiu:
The main logic in this patch is that: the top function is LinkerDriver::mergeAggregateMetadata(), it check if all the relocatable object files contain the GNU_PROPERTY_X86_FEATURE_1_AND property in this section.
All the GNU_PROPERTY_X86_FEATURE_1_AND property flags will be bit-and(&) at function readGNUPropertyX86Feature1AND. Then all property information will be collected in Config->X86Feature1AND.
The Config->X86Feature1AND will tell what contexts the result output .note.gnu.property section should be set.
The data struct of .note.gnu.property is written in https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf.

ruiu added a comment.Feb 14 2019, 10:27 AM

Can I ask a few random questions to understand the design of the patch?

  1. In the patch description, it says that lld checks if all input object files contains a certain marker .note section. What is supposed to happen if some object files miss the .note section? Do we silently disable the feature in that case?
  1. Why did you choose to create a separate section .splt instead of just writing different contents to .plt?
  1. Do you really want to support i386 (as opposed to x86-64)? I'm genuinely curious who would want to use the feature in the 32-bit mode.

Can I ask a few random questions to understand the design of the patch?

  1. In the patch description, it says that lld checks if all input object files contains a certain marker .note section. What is supposed to happen if some object files miss the .note section? Do we silently disable the feature in that case?
  1. Why did you choose to create a separate section .splt instead of just writing different contents to .plt?
  1. Do you really want to support i386 (as opposed to x86-64)? I'm genuinely curious who would want to use the feature in the 32-bit mode.

1 Yes, we will disable the feature if not all the input object files contain the .note.gnu.property section, because the CET feature will supervise all the indirect jumps in the program. If one input file do not contain the CET info (flags which is set in .note.gnu.property section's GNU_PROPERTY_X86_FEATURE_1_AND property ), it means it can‘t be check by the CET hardware, so we need to disable the CET feature. This is important to link the non-CET library.
2 Here we can optimize, when IBT enabled, we need to insert the ENDBRANCH instructions in the old PLT, at first we don't want to change the PLT size, so we add a SPLT ,the caller first jump to SPLT and then go to PLT. The function of PLT +SPLT = old PLT.
3 We both support it in x86 and x86-64, please refer line 2017 in Driver.cpp. The control flow attack threat both exist in 32-bit and 64-bit, So the further 32-bit CPU will support the CET too, In fact GCC has supported the CET feature in 32-bit machines by Lu HongJiu who write the https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf.
Thank you very much!

ruiu added a comment.Feb 14 2019, 5:46 PM

1 Yes, we will disable the feature if not all the input object files contain the .note.gnu.property section, because the CET feature will supervise all the indirect jumps in the program. If one input file do not contain the CET info (flags which is set in .note.gnu.property section's GNU_PROPERTY_X86_FEATURE_1_AND property ), it means it can‘t be check by the CET hardware, so we need to disable the CET feature. This is important to link the non-CET library.

Looks like the behavior implemented in this patch is this: if all object files are compiled with the CET-enabling compiler option, lld marks the resulting executable as CET-enabled, and if not, mark as otherwise. I think that behavior is error-prone and can be dangerous. Imagine that you already have a program that you use CET. If you make a change to the program and link it against a CET-unaware static library, it would disable CET altogether. I could imagine many different scenarios in which you would accidentally disable CET without knowing.

So I think a safer approach here is to add a new option to the linker to enable/disable CET. If enabled, lld should verify that all input object files has the .note section and report error if not.

2 Here we can optimize, when IBT enabled, we need to insert the ENDBRANCH instructions in the old PLT, at first we don't want to change the PLT size, so we add a SPLT ,the caller first jump to SPLT and then go to PLT. The function of PLT +SPLT = old PLT.

Can you elaborate a bit as to why you didn't want to change the PLT size? I think PLT should work with a different PLT entry size.

We actually have an alternative PLT format for a Spectre mitigation, whose PLT entry size is different from the standard one, and that's working fine.

3 We both support it in x86 and x86-64, please refer line 2017 in Driver.cpp. The control flow attack threat both exist in 32-bit and 64-bit, So the further 32-bit CPU will support the CET too, In fact GCC has supported the CET feature in 32-bit machines by Lu HongJiu who write the https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf.

Got it.

1 Yes, we will disable the feature if not all the input object files contain the .note.gnu.property section, because the CET feature will supervise all the indirect jumps in the program. If one input file do not contain the CET info (flags which is set in .note.gnu.property section's GNU_PROPERTY_X86_FEATURE_1_AND property ), it means it can‘t be check by the CET hardware, so we need to disable the CET feature. This is important to link the non-CET library.

Looks like the behavior implemented in this patch is this: if all object files are compiled with the CET-enabling compiler option, lld marks the resulting executable as CET-enabled, and if not, mark as otherwise. I think that behavior is error-prone and can be dangerous. Imagine that you already have a program that you use CET. If you make a change to the program and link it against a CET-unaware static library, it would disable CET altogether. I could imagine many different scenarios in which you would accidentally disable CET without knowing.

So I think a safer approach here is to add a new option to the linker to enable/disable CET. If enabled, lld should verify that all input object files has the .note section and report error if not.

It seems a good idea, please let me think it over for a time, anyway I wish it be an another increment patch if we decide do it.

2 Here we can optimize, when IBT enabled, we need to insert the ENDBRANCH instructions in the old PLT, at first we don't want to change the PLT size, so we add a SPLT ,the caller first jump to SPLT and then go to PLT. The function of PLT +SPLT = old PLT.

Can you elaborate a bit as to why you didn't want to change the PLT size? I think PLT should work with a different PLT entry size.

We actually have an alternative PLT format for a Spectre mitigation, whose PLT entry size is different from the standard one, and that's working fine.

These days I am also re-considering the SPLT way of implementing the IBT of PLT, In fact, at beginning, we thought the size of PLT may be fixed in some SPEC, so we just think that not changing the old PLT's size and use a independent SPLT with the size is a more safe way.
If you think just rewriting the old PLT is better, I'll try to re-implement this part.

These days I am also re-considering the SPLT way of implementing the IBT of PLT, In fact, at beginning, we thought the size of PLT may be fixed in some SPEC, so we just think that not changing the old PLT's size and using a independent SPLT with the same size is a more safe way.
If you think just rewriting the old PLT is better, I'll try to re-implement this part.

(If you haven't started a thread on llvm-dev maybe it is a good time to do it now. It'd be good to get feedback from users of other OSes and arches)

Varying PLT entry sizes should work but you'd likely have to teach GNU objdump/llvm-objdump how to symbolize them. (lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp fintPltEntries)

(If you haven't started a thread on llvm-dev maybe it is a good time to do it now. It'd be good to get feedback from users of other OSes and arches)

Varying PLT entry sizes should work but you'd likely have to teach GNU objdump/llvm-objdump how to symbolize them. (lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp fintPltEntries)

Yes, MaskRay, I'll check how GCC deal with it here, and try keep compatible with it.

Hi friends, And I find GCC used Second PLT too. I am trying to contact its designer. One of the reviewer Lu Hongjiu.

ruiu added a comment.Feb 15 2019, 11:16 AM

I really don't think we should introduce a new PLT (.splt) if the reason of adding it is to keep the compatibility of the existing .plt format. Second PLT is too complicated and would become a technical debt.

Please imagine what could happen in, say 10 years from now, when all x86 CPUs in the market support CET. Many programs would be using CET to tighten security. One who runs objdump on a binary might ask "what is this .splt section?" and the answer would be "that's essentially a PLT that was invented 10 years ago so that they didn't have to change the .plt content format at that moment." I don't think that's a good scenario.

gdb or other GNU binutils tools might not work correctly with a nonstandard PLT, but that's fixable. We should avoid complicate the binary format to avoid a short-term problem.

Hi ruiu, I asked its designer. This is really specified in x86-64 psABI SPEC, Please refer https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-draft.pdf 13.2 DynamicLinking

xiangzhangllvm edited reviewers, added: wxiao3; removed: pengfei.
xiangzhangllvm edited the summary of this revision. (Show Details)Feb 18 2019, 4:47 PM
ruiu added a comment.Feb 19 2019, 4:04 PM

Hi ruiu, I asked its designer. This is really specified in x86-64 psABI SPEC, Please refer https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-draft.pdf 13.2 DynamicLinking

We do not always have to just follow the psABI. If some part of it doesn't make much sense, we should change the spec instead of adopting it.

I think SPLT has better compatibility compared to change the original PLT entry. Given the platform is old, and use the old dynamic linker, the new library that build with SPLT can still be linked by dynamic linker. I am not sure if it still works if we change the struct of PLT entry.
The second issue is GNU toolchain has already adopt SPLT, if we want to change the design, we need also influence the GNU community to revise the design and all their libraries which are built with CET enabled toolchain.

ruiu added a comment.Feb 19 2019, 6:31 PM

I think SPLT has better compatibility compared to change the original PLT entry. Given the platform is old, and use the old dynamic linker, the new library that build with SPLT can still be linked by dynamic linker. I am not sure if it still works if we change the struct of PLT entry.

PLT itself works fine even if we change the instructions and the size of the PLT entry. I know that because we already did that for Retpoline PLT. As a mitigation for Spectre, lld can emit a PLT that uses RET instead of indirect JMP. That hardened PLT entry is larger than the standard one.

The second issue is GNU toolchain has already adopt SPLT, if we want to change the design, we need also influence the GNU community to revise the design and all their libraries which are built with CET enabled toolchain.

But I strongly believe you should do that. As I said, imagine the future in which CET is widely adopted. All x86 executables in the future would have two PLTs for no obvious technical reason? I don't really want it to happen.

Hi ruiu

It is better to have ABI discussion at https://groups.google.com/forum/#!forum/x86-64-abi or send email x86-64-abi@googlegroups.com. Could you join to discuss the ABI issue there? Below is some piece of discussion that I can search from the forum https://groups.google.com/forum/#!topic/x86-64-abi/nWA7tcnjPnk.

There are 2 reasons for this 2-PLT scheme:

  1. Provide compatibility with other tools that have an hardcoded limit of 16

bytes for an x86 PLT entry.

  1. Improve code cache locality: since most of the instructions in .plt would be

executed only the first time a symbol is resolved they would waste space in
the cache and, by having a .plt.sec, only instructions that are often executed
would be cached.

Thanks

The 2nd reason will be obvious in the case that the program with many dynamic calls.

Hi dear ruiu:
The Second PLT in x86-64 psABI has been settled more than one year ago, considering the compatibility and performance, the designers carefully selected this way at that time.
And there seems no overwhelming reason to change the SPEC, for many people think the Second PLT way is more flexible and compatible.
I think it would be ill-advised to let LLD keep different with GCC in this point for the GCC has adopt this way for one year.
I wish you can reconsider the technology of Second PLT, and we also wish you can join us in discussing this point at https://groups.google.com/forum/#!topic/x86-64-abi/nWA7tcnjPnk.
Thank you very much!

ruiu added a comment.Feb 22 2019, 5:03 PM

I asked about the rationale behind the 2-PLT scheme to H. J. Lu who proposed the design. I was not convinced one of the reasons he presented, which is that the 2-PLT scheme provides compatibility with old systems, because clearly the 2-PLT scheme breaks compatibility by changing the semantics of the existing .plt section and adding the second PLT. The other reason was more convincing, which is, it was designed with performance in mind, and it was an effort to reduce the size of the hot code path. With the 2-PLT scheme, the second PLT is in theory hot and the first PLT is relatively cold, as the first PLT is used only when a symbol is resolved lazily. That being said, there was not evidence or benchmark results supporting the claim, so I cannot really buy that argument.

So I'm not personally happy about that ABI. However, the 2-PLT scheme is there since Intel MPX support was added to the ABI, and it seems too late to propose a change. Therefore, as you and H. J. Lu suggested, we should proceed with the 2-PLT scheme. I'd like to leave a honest comment about the claimed design rationale and our thought for future reference, though (you don't need to do that in this patch.)

ELF/Driver.cpp
1459

You added large amount of code here. What does this do in short? I'm not interested in a file format at the moment, but I'd like to know what this code is supposed to do at high level.

ELF/InputSection.cpp
713–727

From the linker's perspective, .splt *is* PLT, which inter-module calls jump to. If .splt is enabled, .plt is no longer a PLT in the regular sense, but it just contains code to lazy-resolve symbols. So, I think you should handle .splt as the PLT despite its name throughout this patch if .splt is enabled. For example, In.Plt should be the .splt and Sym.getPltVA() should return its address in .splt.

(I feel that the spec made a mistake when naming .splt -- it should've been .plt and the current "first" PLT should be .splt, but it's probably too late to complain.)

ELF/Writer.cpp
1811

How is GNU ifunc supposed to work when CET is enabled?

xiangzhangllvm marked 3 inline comments as done.Feb 24 2019, 6:28 PM

Hi ruiu, very thank you for considering accepting the 2-PLT ABI design in X86 arch, Yes, we really have no real measurable evidence for the benefit of this design, and I also think there must be other problems for the CET here, but it is just the beginning. We will work on improving and refining it.
Your questions and suggestions during the discussion were all very much to the point.
I'll first to implement your first 2 review suggestions above.
Thank you again!

ELF/Driver.cpp
1459

The main purpose of here code is collecting the X86 features in the .note.gnu.section. Because there are different formats of the .note.gnu.section, finding and collecting features need deal with some details here. It really seems too much code, I'll try my best to simple here code.

ELF/InputSection.cpp
713–727

Your suggestions make sense! I'll check and refine it, Thank you very much!

ELF/Writer.cpp
1811

The GNU IFUNC is also specified in https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf.
It very similar to lazy binding.
I think there will be not much problem for IFUNC if I use your second suggestion-(handle .splt as the PLT despite its name throughout this patch).
We'll resolve these un-compatibility between CET and IFUNC, retpolineplt in the following patches.

ruiu added a comment.Feb 25 2019, 1:46 PM

I'd think you should add a new linker command line flag to enforce CET from the initial patch as I suggested earlier. Security measures should not be automatically turned off, so the behavior implemented in this patch seems little dangerous.

The other reason why I want to see that in the initial patch is that I believe adding that flag makes code actually simpler. With a command line flag, you wouldn't need to "merge" .note section attributes. You would only need to check if an input file has a required bit in a .note section when reading a file (if not, report an error), and you can always emit a .note section of the same contents from the linker to enable CET.

Hi ruiu, very thank you for considering accepting the 2-PLT ABI design in X86 arch, Yes, we really have no real measurable evidence for the benefit of this design, and I also think there must be other problems for the CET here, but it is just the beginning. We will work on improving and refining it.
Your questions and suggestions during the discussion were all very much to the point.
I'll first to implement your first 2 review suggestions above.
Thank you again!

Wait, I thought this was going in because it was already part of the ABI not an experimental WIP. Which is it? :)

I'm also less convinced of the utility of this ABI extension and would like to hear about the appropriate discussions. I don't think I saw them on the list, but I could be mistaken. What's up here? What's going on with the use of gnu note sections rather than any other part of the ELF abi? If we can, I think there should be a longer discussion about both the motivation and implementation of this feature.

Thanks!

-eric

Hi ruiu:
I update the patch, all the change is to simplify collecting X86 feature as your first suggestion point out.
simplifing these code is mainly based on that we rudely erasing the .note.gnu.property sections after we deal with the GNU_PROPERTY_X86_FEATURE_1_AND property.
Instead, the old code will check and keep the non-GNU_PROPERTY_X86_FEATURE_1_AND property in .note.gnu.property section.
The reason is also writen in the patch:

+ We rudely erase the .note.gnu.property sections in the input files. Because
+
till now we just implemented GNU_PROPERTY_X86_FEATURE_1_AND property, the
+ other unhandled properties in the sections may be discard too. It seems not
+
reasonable. But till now it's reasonable, because the .note.gnu.property
+ is a special section, all the properties in this section need to be checked
+
and handled. If they are simply passed to output file without being handled
+ there may be a bug or exception in running the output file. For example,
+
the GNU_PROPERTY_X86_ISA_1_USED also need all input file contain its flags,
+ it can't be simply passed to the output file without checking other files.
+
So we erase these informations before we implement the function of them.

Hi echristo:
In fact, there are already a long discussion about your question at ABI team. It's hard for us to not comply with the ABI rules. I did not join the ABI-making.
So I just can give you the explain by myself: In my eyes, using a new section (.note.gnu.property) to hold the GNU_PROPERTY_X86_* property is based on the specialty
of these X86 properties, they need all input files hold the property, it is easier to check or discard them in a special section than put them in the old exisiting sections.
An it is more flexible and extensible to put them in a special section.
I think we can make a deeper discussion at https://groups.google.com/forum/#!forum/x86-64-abi
thank you very much!

MaskRay added a comment.EditedFeb 26 2019, 1:31 AM

In case you missed it, what people request here is a thread (with title: [RFC] ...) on the llvm-dev mailing list...

People who have replied here may know there is a discussion thread at https://groups.google.com/forum/#!topic/x86-64-abi/nWA7tcnjPnk But other developers working on components that may be affected by this 2-PLT scheme may not be aware of it: clang, lld, lldb, binutils, MC, ...

Next time, please consider sending an RFC to express your intent to implement a feature before finishing up your patch.

You may send an email to llvm-dev (see http://lists.llvm.org/pipermail/llvm-dev/2019-February/thread.html for some examples)

If we can, I think there should be a longer discussion about both the motivation and implementation of this feature.

In case you missed it, what people request here is a thread (with title: [RFC] ...) on the llvm-dev mailing list...

People who have replied here may know there is a discussion thread at https://groups.google.com/forum/#!topic/x86-64-abi/nWA7tcnjPnk But other developers working on components that may be affected by this 2-PLT scheme may not be aware of it: clang, lld, lldb, binutils, MC, ...

Next time, please consider sending an RFC to express your intent to implement a feature before finishing up your patch.

You may send an email to llvm-dev (see http://lists.llvm.org/pipermail/llvm-dev/2019-February/thread.html for some examples)

If we can, I think there should be a longer discussion about both the motivation and implementation of this feature.

Sorry for my unfamiliarity with the process. It's my first time to send the patch to LLD community. Now, I have just register on the llvm-dev, and send the patch information to http://lists.llvm.org/pipermail/llvm-dev/2019-February/130538.html

ruiu added a comment.Feb 26 2019, 10:19 AM

I don't think you added a new command line option. My suggestion was this:

  1. Add a new command line option, say, --intel-cet (that's perhaps not the final name of the option, I chose it tentatively to move things forward.)
  2. If --intel-cet is given, verify that each input file contains a .note section with an appropriate bit, and discard .note section after reading it (so we don't merge them or copy them to the output file). If there's a file that doesn't contain a proper .note, report an error and stop.
  3. When creating an input file, create a .note section with appropriate contents when --intel-cet was given.

Specifically, I don't think you should add a new member X86Feature1AND to Config because with the above scheme you are no longer merging .note section so you don't need to collect bits. mergeAggregateMetadata should also be deleted from the patch.

What do you think?

Can you also rebase your patch so that I can apply your patch cleanly to git head?

I don't think you added a new command line option. My suggestion was this:

  1. Add a new command line option, say, --intel-cet (that's perhaps not the final name of the option, I chose it tentatively to move things forward.)
  2. If --intel-cet is given, verify that each input file contains a .note section with an appropriate bit, and discard .note section after reading it (so we don't merge them or copy them to the output file). If there's a file that doesn't contain a proper .note, report an error and stop.
  3. When creating an input file, create a .note section with appropriate contents when --intel-cet was given.

Hi ruiu:
Yes, I know your suggestion, and I was discussing the option with GCC team, because GCC 9 supports to use lld too, and there is no this kind option in current GNU-ld, I want to let them know this thing.
In fact, The current approach defined in psABI provides the maximum forward and backward binary compatibility so that CET can be enabled in OS piece by piece. Otherwise, CET OS enabling can
be very hard.

Specifically, I don't think you should add a new member X86Feature1AND to Config because with the above scheme you are no longer merging .note section so you don't need to collect bits. mergeAggregateMetadata should also be deleted from the patch.

What do you think?

Using Config->X86Feature1AND to collect the features' flags is nesseary, There are 2 reasons:
1st, --intel-cet is an mandatory option that force all the input files, libs, even kernels to be CETed, this is not sufficient in current state, so the non-mandatory cet scheme will the be a top priority
for a long time.
2nd, There are more than one features in the GNU_PROPERTY_X86_FEATURE_1_AND property, each feature need to "AND" with other
input file's. We need to "calculate" the resulting features of GNU_PROPERTY_X86_FEATURE_1_AND by ourselves before pass them into the output file.

Can you also rebase your patch so that I can apply your patch cleanly to git head?

sure!
Thank you very much!

I don't think you added a new command line option. My suggestion was this:

  1. Add a new command line option, say, --intel-cet (that's perhaps not the final name of the option, I chose it tentatively to move things forward.)
  2. If --intel-cet is given, verify that each input file contains a .note section with an appropriate bit, and discard .note section after reading it (so we don't merge them or copy them to the output file). If there's a file that doesn't contain a proper .note, report an error and stop.
  3. When creating an input file, create a .note section with appropriate contents when --intel-cet was given.

Specifically, I don't think you should add a new member X86Feature1AND to Config because with the above scheme you are no longer merging .note section so you don't need to collect bits. mergeAggregateMetadata should also be deleted from the patch.

What do you think?

This will not work because it makes it impossible to roll out CET support in a distribution incrementally. Each time a library is CET-enabled (because all its assembler files are properly annotated), all reverse dependencies would have to be patched and their build systems changed to pass a new flag to the linker.

How does lld handle the non-executable stack flag? From a developer point of view, the CET processing is supposed to be very similar.

This will not work because it makes it impossible to roll out CET support in a distribution incrementally. Each time a library is CET-enabled (because all its assembler files are properly annotated), all reverse dependencies would have to be patched and their build systems changed to pass a new flag to the linker.

How does lld handle the non-executable stack flag? From a developer point of view, the CET processing is supposed to be very similar.

Yes the CET can't be mandatory now, and we will re-writing the OS kernels and libs to support it, anyway there will be a very long way to go.
So, now CET in this patch is very easy to be disabled (once linked any non-CET input file). It won't infect current non-CET programs.

Thanks for putting this feature forward. I've not had a chance to go through everything in detail but I thought it would be important to mention that AArch64 has a similar set of features (Pointer Authentication PAC and Branch Target Identification BTI) that are going to use .note.gnu.property sections with GNU_PROPERTY_AARCH64_FEATURE_1_AND (same meaning as GNU_PROPERTY_X86_FEATURE_1_AND), with two associated feature bits GNU_PROPERTY_AARCH64_FEATURE_1_BTI and GNU_PROPERTY_AARCH64_FEATURE_1_PAC. AArch64 does need a modified PLT entry to make this work but it doesn't use a .splt, in effect an extra instruction at the top of the PLT if BTI is used and one at the end if PAC is used, or both if both BTI and PAC are needed. Given that we will have at least two targets implementing a similar mechanism but with target specific details then we'll either need to make it my responsibility to generalise the .note.gnu.property implementation so that it can support both AArch64 and X86, or we make it generic from the start. The main difference for AArch64 is that there are two independent feature bits to track.

In the case of AArch64 it is important for the program loader to only enable the BTI/PAC feature for the process if the whole program has been compiled/assembled to support it. Our prior experience with assembler files in particular is that it is very easy to get a single .s file added to a build that is harmless but doesn't have the .note.gnu.property set properly and a single one of these would be enough to clear all the features. Our thoughts were to add a command line option to force generation of the appropriate PLTs and the .note.gnu.property in the output file but to warn if an input file doesn't have the .note.gnu.property flag.

In the case of AArch64 (I don't know about X86), to ease deployment, the new instructions have been added to the Hint space so that they will act as no-ops AArch64 machines that don't support them.

Unfortunately I can't point you at any official documentation of the flags yet, the next release of the 64-bit ABI will have the details and should be released soon. I hope to post some patches soon after that has been done.

Unfortunately I can't point you at any official documentation of the flags yet, the next release of the 64-bit ABI will have the details and should be released soon. I hope to post some patches soon after that has been done.

Soon happened to be a few hours ago: https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2018q4 search for BTI and PAC

Thanks for putting this feature forward. I've not had a chance to go through everything in detail but I thought it would be important to mention that AArch64 has a similar set of features (Pointer Authentication PAC and Branch Target Identification BTI) that are going to use .note.gnu.property sections with GNU_PROPERTY_AARCH64_FEATURE_1_AND (same meaning as GNU_PROPERTY_X86_FEATURE_1_AND), with two associated feature bits GNU_PROPERTY_AARCH64_FEATURE_1_BTI and GNU_PROPERTY_AARCH64_FEATURE_1_PAC. AArch64 does need a modified PLT entry to make this work but it doesn't use a .splt, in effect an extra instruction at the top of the PLT if BTI is used and one at the end if PAC is used, or both if both BTI and PAC are needed. Given that we will have at least two targets implementing a similar mechanism but with target specific details then we'll either need to make it my responsibility to generalise the .note.gnu.property implementation so that it can support both AArch64 and X86, or we make it generic from the start. The main difference for AArch64 is that there are two independent feature bits to track.

In the case of AArch64 it is important for the program loader to only enable the BTI/PAC feature for the process if the whole program has been compiled/assembled to support it. Our prior experience with assembler files in particular is that it is very easy to get a single .s file added to a build that is harmless but doesn't have the .note.gnu.property set properly and a single one of these would be enough to clear all the features. Our thoughts were to add a command line option to force generation of the appropriate PLTs and the .note.gnu.property in the output file but to warn if an input file doesn't have the .note.gnu.property flag.

$ info ld

'shstk'
     Generate GNU_PROPERTY_X86_FEATURE_1_SHSTK in
     .note.gnu.property section to indicate compatibility with
     Intel Shadow Stack.  Supported for Linux/i386 and
     Linux/x86_64.

'ibtplt'
     Generate Intel Indirect Branch Tracking (IBT) enabled PLT
     entries.  Supported for Linux/i386 and Linux/x86_64.

'ibt'
     Generate GNU_PROPERTY_X86_FEATURE_1_IBT in .note.gnu.property
     section to indicate compatibility with IBT. This also implies
     'ibtplt'.  Supported for Linux/i386 and Linux/x86_64.
ruiu added a comment.Feb 27 2019, 10:42 AM

is given, verify that each input file contains a .note section with an appropriate bit, and discard .note section after reading it (so we don't merge them or copy them to the output file). If there's a file that doesn't contain a proper .note, report an error and stop.

  1. When creating an input file, create a .note section with appropriate contents when --intel-cet was given.

Specifically, I don't think you should add a new member X86Feature1AND to Config because with the above scheme you are no longer merging .note section so you don't need to collect bits. mergeAggregateMetadata should also be deleted from the patch.

What do you think?

This will not work because it makes it impossible to roll out CET support in a distribution incrementally. Each time a library is CET-enabled (because all its assembler files are properly annotated), all reverse dependencies would have to be patched and their build systems changed to pass a new flag to the linker.

How does lld handle the non-executable stack flag? From a developer point of view, the CET processing is supposed to be very similar.

You made a good point: lld actually uses only the command line flag to enable non-executable stack flag for the same reason.

is given, verify that each input file contains a .note section with an appropriate bit, and discard .note section after reading it (so we don't merge them or copy them to the output file). If there's a file that doesn't contain a proper .note, report an error and stop.

  1. When creating an input file, create a .note section with appropriate contents when --intel-cet was given.

Specifically, I don't think you should add a new member X86Feature1AND to Config because with the above scheme you are no longer merging .note section so you don't need to collect bits. mergeAggregateMetadata should also be deleted from the patch.
What do you think?

Hi, Sorry ruiu, I don't much understand your idea, may I ask you some questions?
Just check each file contains .note.gnu.property or not? If so, how to deal with the appropriate bits from different files' sections? If we didn't deal with these appropriate bits, what the resulting context will put into the outfile's .note.gnu.property? For example, if a file
contain IBT and SHSTK bits, and another file just contain SHSTK bit, what the resulting bits will be in outfile if we did not deal with it (Now we do this thing in mergeAggregateMetadata).
As I know, lld will just treats the .note.gnu.proterty section as notes, and just concatenates all the .note.gnu.proterty sections together.

>> This will not work because it makes it impossible to roll out CET support in a distribution incrementally. Each time a library is CET-enabled (because all its assembler files are properly annotated), all reverse dependencies would have to be patched and their build systems changed to pass a new flag to the linker.

You made a good point: lld actually uses only the command line flag to enable non-executable stack flag for the same reason.

I think you mis-understand fweimer's idea, I think he want to say we can't use option to force enable CET, and it will very trouble if we let a library CETed.

In fact, if we add an option to force enable CET, We also need to do CET work as current way when the option not used.
just like this patch:

1 diff --git a/ELF/Config.h b/ELF/Config.h
2 index cdf594d..f49dae9 100644
3 --- a/ELF/Config.h
4 +++ b/ELF/Config.h
5 @@ -188,6 +188,7 @@ struct Configuration {
6    bool ZCombreloc;
7    bool ZCopyreloc;
8    bool ZExecstack;
9 +  bool ZForceCET;

10 bool ZGlobal;
11 bool ZHazardplt;
12 bool ZInitfirst;
13 diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
14 index f6da5df..5708fdf 100644
15 --- a/ELF/Driver.cpp
16 +++ b/ELF/Driver.cpp
17 @@ -348,8 +348,8 @@ static bool getZFlag(opt::InputArgList &Args, StringRef K1, StringRef K2,
18
19 static bool isKnownZFlag(StringRef S) {
20 return S == "combreloc" || S == "copyreloc" || S == "defs" ||
21 - S == "execstack" || S == "global" || S == "hazardplt" ||
22 - S == "initfirst" || S == "interpose" ||
23 + S == "execstack" || S == "force-cet" || S == "global" ||
24 + S == "hazardplt" || S == "initfirst" || S == "interpose" ||
25 S == "keep-text-section-prefix" || S == "lazy" || S == "muldefs" ||
26 S == "nocombreloc" || S == "nocopyreloc" || S == "nodefaultlib" ||
27 S == "nodelete" || S == "nodlopen" || S == "noexecstack" ||
28 @@ -873,6 +873,7 @@ void LinkerDriver::readConfigs(opt::InputArgList &Args) {
29 Config->ZCombreloc = getZFlag(Args, "combreloc", "nocombreloc", true);
30 Config->ZCopyreloc = getZFlag(Args, "copyreloc", "nocopyreloc", true);
31 Config->ZExecstack = getZFlag(Args, "execstack", "noexecstack", false);
32 + Config->ZForceCET = hasZOption(Args, "force-cet");
33 Config->ZGlobal = hasZOption(Args, "global");
34 Config->ZHazardplt = hasZOption(Args, "hazardplt");
35 Config->ZInitfirst = hasZOption(Args, "initfirst");
36 @@ -1656,9 +1657,16 @@ template <class ELFT> void LinkerDriver::mergeAggregateMetadata() {
37 There is no X86Feature1AND .note.gnu.property section in this file. Set
38
all CFProtection properties to false and discard other such sections.
39 IsAllFileF1AND = false;
40 - } else {
41 +
42 + // If used -z force-cet option, all the input files must be CET marked.
43 + if (Config->ZForceCET) {
44 + error(toString(F) + "must enable CET when using -z force-cet " +
45 + "linker option.");
46 + break;
47 + }
48 + } else {
49 readGNUPropertyX86Feature1AND<ELFT>(*I, Config, X86F1ANDDescLoca);
50 - }
51 + }
52 }
53 if (!IsAllFileF1AND) {
54 Config->SPltSectionEnable = false;

Add -z force-cet option to force enable CET.
But if we do not use this option, CET will also try to work unless meet the conditions of no satisfaction.

Add -z force-cet option to force enable CET.
But if we do not use this option, CET will also try to work unless meet the conditions of no satisfaction.

GNU linker supports:

'shstk'

Generate GNU_PROPERTY_X86_FEATURE_1_SHSTK in
.note.gnu.property section to indicate compatibility with
Intel Shadow Stack.  Supported for Linux/i386 and
Linux/x86_64.

'ibtplt'

Generate Intel Indirect Branch Tracking (IBT) enabled PLT
entries.  Supported for Linux/i386 and Linux/x86_64.

'ibt'

Generate GNU_PROPERTY_X86_FEATURE_1_IBT in .note.gnu.property
section to indicate compatibility with IBT. This also implies
'ibtplt'.  Supported for Linux/i386 and Linux/x86_64.
ruiu added a comment.Mar 1 2019, 2:20 PM

I'm not still convinced that having the mechanism to automatically enable/disable CET depending on how input object files are compiled is a good idea. It seems really bad idea to me. I'd think the right way to roll out CET is (1) compile source files with CET enabled, (2) enforce CET at the linker, and (3) permanently keep the linker option so that CET is not accidentally turned off. Step 3 is in practice mandatory for programs that you want to use CET, or otherwise you are exposed to risk of accidentally disabling it. And if you have to enforce CET at the linker level for programs you are serious about their security, there's no point of "automatically" enabling it. Rather, I'd think that a security mechanism that is "likely" automatically turned on is more harmful than helpful; that gives a false impression that a user's program is protected with CET even if it's not. I seems to me that that's a dangerous situation and would cause security issues that can be prevented now. Enabling and disabling security mechanism should be an intentional choice.

MaskRay added a comment.EditedMar 1 2019, 10:07 PM

Now the IBT can't work with retpolineplt feature, becasue they may conflict
in changing the PLT. So we disable the IBT when the IFUNC enabled.

According to Retpoline-A-Branch-Target-Injection-Mitigation.pdf Section 4.3

To avoid this conflict, future Intel processors implementing CET will also contain hardware mitigations for branch target injection (Spectre variant 2) (enhanced IBRS), that obviate the need for retpoline.

I hope this will be true (CET and enhanced IBRS) and retpolineplt will no longer be needed. If that is the case, you may delete Config->ZRetpolineplt checks from CET handling code and add an incompatibility error in ELF/Driver.cpp:checkOptions:

if (Config->IntelCET && Config->ZRetpolineplt)
  error("--intel-cet may not be used with -z retpolineplt");

(--intel-cet may not be the final name of the option.)

Given that we will have at least two targets implementing a similar mechanism but with target specific details then we'll either need to make it my responsibility to generalise the .note.gnu.property implementation so that it can support both AArch64 and X86, or we make it generic from the start. The main difference for AArch64 is that there are two independent feature bits to track.

Soon happened to be a few hours ago: https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2018q4 search for BTI and PAC

@peter.smith Looking forward to a generic .note.gnu.property mechanism! Hope the ABI will soon get a PDF version (for now the link is nearly unsearchable. The avaiable PDF is pretty old (of version May 22, 2013)).

When creating an input file, create a .note section with appropriate contents when --intel-cet was given.

I believe executable stack has been considered bad since 1990s. lld is new(fortunate) enough to make an elegant choice: simply default to noexecstack and not to care about --warn-execstack and the tricky is_default_stack_executable() warn_execstack() states. For CET we do need to consider transition plans, which may make the logic complex (I am also not happy with that).

@ruiu Have you considered a --warn-intel-cet (or --error-intel-cet, --intel-cet-handling=error, -z intel-shstk --intel-cet=ibt,shstk)?

  • AND-feature bits (GNU_PROPERTY_AARCH64_FEATURE_1_AND GNU_PROPERTY_X86_FEATURE_1_AND) in .note.gnu.property will always be handled and emitted into linked executable/DSO
  • If the option is seen but the secure bit of the result AND-feature is not set, report a warning/error.
  • The option is permanently added to mark this executable/DSO is really CET protected.
  • Say in 5/10 years, this option becomes the default in lld (the almost ideal -z noexecstack status as we have today). Users need --no-intel-cet if they want to opt out.

Users should not believe the executable/DSO is CET protected unless the option is used in linking. The issue is that some executable/DSO not linked by --intel-cet may give a false impression that they are CET protected (say there is a .splt section) though actually not.

Question: do we have AMD people here describing if they could have a similar security enhancement feature?

I'm not still convinced that having the mechanism to automatically enable/disable CET depending on how input object files are compiled is a good idea. It seems really bad idea to me. I'd think the right way to roll out CET is (1) compile source files with CET enabled, (2) enforce CET at the linker, and (3) permanently keep the linker option so that CET is not accidentally turned off. Step 3 is in practice mandatory for programs that you want to use CET, or otherwise you are exposed to risk of accidentally disabling it. And if you have to enforce CET at the linker level for programs you are serious about their security, there's no point of "automatically" enabling it. Rather, I'd think that a security mechanism that is "likely" automatically turned on is more harmful than helpful; that gives a false impression that a user's program is protected with CET even if it's not. I seems to me that that's a dangerous situation and would cause security issues that can be prevented now. Enabling and disabling security mechanism should be an intentional choice.

Just to check I understand, I think your concern is that as the feature is enabled automatically when it is safe to do so, but is silently disabled if at least one object is not marked correctly, so users can't be sure that their binaries are protected without checking the output .note.gnu.property section?

I've tried to sum up the possibilities in the table below. With Auto behaviour as proposed in this patch and I believe is implemented in ld.bfd. The Manual behaviour is my extrapolation of an alternative behaviour, I think we'd have to do something when all the inputs had the CET property and the user hadn't set the linker command line flag otherwise we'd have an equivalent user expectation problem that the feature would be enabled.

Linker CLI switchAll input objects have propertyAuto behaviourManual behaviour
onyesEnable feature (PLT and mark output)same as Auto
offyesEnable feature (PLT and mark output)Warn "All inputs have CET but you have not enabled it, did you mean to switch it on?
onnoDo not enable feature, Warn or error, diagnose objects missing featuresame as Auto
offnoDo nothingsame as Auto

At this stage I couldn't definitively say that Auto is better than Manual or vice-versa. However if GNU ld has already implemented it one way and we have a community of people expecting it to work that way it seems like it is worth following suite unless there is a really good reason not to.

I agree with you that people should always use the command-line option to tell the linker that the feature is desired/required. If there is a common command line between ld, bfd and to a lesser extend gold, then the linker driver should set it if the compiler flag has been used.

I agree with you that people should always use the command-line option to tell the linker that the feature is desired/required. If there is a common command line between ld, bfd and to a lesser extend gold, then the linker driver should set it if the compiler flag has been used.

Thank you, peter, the table is very clear!
Just one point I want to say: We do not need check the compiler flags, Because if the CET flag exist in input files, it has directly mean the compiler flag been used.

My appoint is similar to yours.
And another my concern is the transition period of using CET:
If CET can only be enabled by manual behavior, I can make sure that no body will open CET option in this period, because they are very easy to get link errors when they are trying to link files. People will directly remove the CET options in their projects (if contain CET at first).

In fact, CET works in units of program processes,it can and should be enabled in OS or other platforms piece by piece, otherwise it will be very hard to pass the transition period time.

This week Lu.H.J (CET designer and Implementer in GNU) will come, I will discuss all your ideas with him.
Thank you! all here friends

Hi friends:
Today I discussed our option ideals with H.J.
In short, He suggested us to choose "Auto CET in linker by default", he gaves 3 key reason:

  1. We rarely used linker to directly link out programs, instead, we mostly use compiler to directly build our programs. So in these cases, users need to set CET compilation options (-fcf-protection)

in the build command lines if they want to use CET. They know it.

  1. He illustrate how he enabled CET piece by piece: for the gcc + glibc projects which depends on each other in their 'growth', he first partly CETed gcc, then use it partly CETed glibc, and again use the partly CETed gcc+glibc to build gcc and glibc, after times of this recirculating

he fully CETed the Interdependence projects. He also estimate that there will be cost yesrs to fully CETed the Linux OS and its programs, This is a long process". All of this will be very hard if we set "Enforce CET in linker".

By the way: Do we want to rebuild the Linux with LLVM one day?
  1. He explained that, IBT of CET can works in part unit of program process, if one CETed program load NON-CETed dynamic *.so files, The CET can just work in the old part of the program (not contain dynamic libs).

So, he suggest us to choose this "Auto" way, let program try their best to use CET.

Any way, This is just his suggestion. He said he also respect our last selections.

So, friends, if you still have concerns about the option way, please let me know, I'am not object to change the patch. I respect your ideas too.
I just hope to commit the first CET patch in LLD earlier, and then refining it.
Thank you very much!

How does lld handle the non-executable stack flag? From a developer point of view, the CET processing is supposed to be very similar.

The same messy way GNU tools do. Automatically detecting features based on input objects is convenient for users, at the cost of additional complexity in the tools and the introduction of silent failure modes.

@xiangzhangllvm, a question for you not directly related to the patch but I suspect you might be best suited to find an answer: how do we test CET support today? We started trying to add CET to FreeBSD some time ago but have no way to test any work that we do.

How does lld handle the non-executable stack flag? From a developer point of view, the CET processing is supposed to be very similar.

The same messy way GNU tools do. Automatically detecting features based on input objects is convenient for users, at the cost of additional complexity in the tools and the introduction of silent failure modes.

Hi emaste:
LLD will set the non-executable stack flag in default. The following code may answer your questions:

// The GNU linker uses .note.GNU-stack section as a marker indicating
// that the code in the object file does not expect that the stack is
// executable (in terms of NX bit). If all input files have the marker,
// the GNU linker adds a PT_GNU_STACK segment to tells the loader to
// make the stack non-executable. Most object files have this section as
// of 2017.
//
// But making the stack non-executable is a norm today for security
// reasons. Failure to do so may result in a serious security issue.
// Therefore, we make LLD always add PT_GNU_STACK unless it is
// explicitly told to do otherwise (by -z execstack). Because the stack
// executable-ness is controlled solely by command line options,
// .note.GNU-stack sections are simply ignored.

And , do you mean the relations between CET and the non-executable stack? CET focus on the control flow protections, not only on stack.

@xiangzhangllvm, a question for you not directly related to the patch but I suspect you might be best suited to find an answer: how do we test CET support today? We started trying to add CET to FreeBSD some time ago but have no way to test any work that we do.

CET should be support on hardware and OS first, all of this, our team have done. It is really a big work.
H.J. have enable the CET in GNU-toolchain and rebuild/rewrite the related libs/linux-kernal, his team are testing it for at last one year.
You can ask him the testing work details.

From the linker's perspective, .splt *is* PLT, which inter-module calls jump to. If .splt is enabled, .plt is no longer a PLT in the regular sense, but it just contains code to lazy-resolve symbols. So, I think you should handle .splt as the PLT despite its name throughout this patch if .splt is enabled. For example, In.Plt should be > the .splt and Sym.getPltVA() should return its address in .splt.
(I feel that the spec made a mistake when naming .splt -- it should've been .plt and the current "first" PLT should be .splt, but it's probably too late to complain.)

Hi , @ruiu:
I was refining the patch as your second suggestion, I got some troubles in this work. I found it hard to "reverse" the roles of the 2 PLTs in the source code.

  1. We can not simply use Sym.getPltVA() instead of Sym.getSPltVA(), because the first PLT entry is also needed somewhere, for example:

void X86_64<ELFT>::writeGotPlt(uint8_t *Buf, const Symbol &S) const {

if (Config->SPltSectionEnable)
  // If the PLT is IBT-enabled, the entries in .got.plt also initially point to
  // the entries in the first PLT not the second PLT.
  write64le(Buf, S.getPltVA());
else
  // Otherwise the entries in .got.plt initially point back to the
  // corresponding PLT entries with a fixed offset to skip the first
  // instruction.
  write64le(Buf, S.getPltVA() + 6);

}

  1. If I "reverse" the definition of the 2 PLTs fully, how about the Non-CET case, use SPLT class to definite the plts? And I worry that it may affect the other targets' code.

I am also a little afraid that will take confusions of understanding the code when people refer the ABI files.

Could you please give me suggestions about these if you have some good ideals?
And do you still concern the options way?

@xiangzhangllvm, a question for you not directly related to the patch but I suspect you might be best suited to find an answer: how do we test CET support today? We started trying to add CET to FreeBSD some time ago but have no way to test any work that we do.

We are using a special CET SDV for CET enabling. We have been enabling CET in Fedora piece by piece by adding -fcf-protection to CFLAGS.

I think https://bugs.llvm.org/show_bug.cgi?id=41000 "PT_NOTE segment not packed correctly" has some relevance to this review. In particular the .note.gnu.property sections on 64-bit machines have 8-byte alignment, merging these with 4-byte aligned note sections such as build-id can cause problems for consumers of PT_NOTE (see also https://sourceware.org/ml/libc-alpha/2018-08/msg00340.html for a discussion on the glibc mailing list). My understanding is that the eventual solution was to collate 4 and 8 byte SHT_NOTE sections separately and output a PT_NOTE section for each. I think that we should follow that and have tests for it.

I think https://bugs.llvm.org/show_bug.cgi?id=41000 "PT_NOTE segment not packed correctly" has some relevance to this review. In particular the .note.gnu.property sections on 64-bit machines have 8-byte alignment, merging these with 4-byte aligned note sections such as build-id can cause problems for consumers of PT_NOTE (see also https://sourceware.org/ml/libc-alpha/2018-08/msg00340.html for a discussion on the glibc mailing list). My understanding is that the eventual solution was to collate 4 and 8 byte SHT_NOTE sections separately and output a PT_NOTE section for each. I think that we should follow that and have tests for it.

Good discovery!
I think we need change the .note.gnu.property section alignment to 4-byte in Clang-TableGen, because all its element is 4-byte required now.
The old ABI required some of note.gnu.property' elements be 8-byte aligned in 64-bit machine, so the section required 8-byte alignment.
Last year, I found a bug about this alignment between GCC and CLANG, Then I discussed with the ABI designer and changed its element's alignment in https://reviews.llvm.org/D56080 .
I think there are already no ‘significance’ of 8 byte alignment for this section, we can change it now in LLVM.

MaskRay added a comment.EditedMar 8 2019, 6:17 PM

I think https://bugs.llvm.org/show_bug.cgi?id=41000 "PT_NOTE segment not packed correctly" has some relevance to this review. In particular the .note.gnu.property sections on 64-bit machines have 8-byte alignment, merging these with 4-byte aligned note sections such as build-id can cause problems for consumers of PT_NOTE (see also https://sourceware.org/ml/libc-alpha/2018-08/msg00340.html for a discussion on the glibc mailing list). My understanding is that the eventual solution was to collate 4 and 8 byte SHT_NOTE sections separately and output a PT_NOTE section for each. I think that we should follow that and have tests for it.

Good discovery!
I think we need change the .note.gnu.property section alignment to 4-byte in Clang-TableGen, because all its element is 4-byte required now.
The old ABI required some of note.gnu.property' elements be 8-byte aligned in 64-bit machine, so the section required 8-byte alignment.
Last year, I found a bug about this alignment between GCC and CLANG, Then I discussed with the ABI designer and changed its element's alignment in https://reviews.llvm.org/D56080 .
I think there are already no ‘significance’ of 8 byte alignment for this section, we can change it now in LLVM.

@peter.smith @xiangzhangllvm We can move the PT_NOTE alignment discussion to https://reviews.llvm.org/D59120 .

Other than the Auto vs Manual (for .note.gnu.property merging) debate, the second PLT section .splt may be more controversial. Hopefully we get consensus on .note.gnu.property soon and focus on that.

By the way: Do we want to rebuild the Linux with LLVM one day?

@xiangzhangllvm I think the answer is yes, at least some organizations are interested at that. https://github.com/ClangBuiltLinux/linux/issues I believe clang can build aarch64 and x86_64 Linux now. Linking the kernel with lld is probably done on aarch64, and linking it with x86_64 and powerpc64 has made much progress (the issue land is mostly the linker scripts).

Yes, MaskRay, let's get consensus on Auto vs Manual soon,
If you and @ruiu do not buy the reasones of Auto way, I tend to switch to the Manual way right now, after all, it is easy to 'refine' in future.

Hi friends:
I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.
I did these because I want to quickly finish the options discussion, I think it is easy to refine in the future.
Thank you very much!

ruiu added a comment.Mar 11 2019, 10:39 AM

Hi friends:
I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.
I did these because I want to quickly finish the options discussion, I think it is easy to refine in the future.

We actually need to be careful when adding new command line options. It is after all the only thing our users will see, and a flaw in a design would cause a real pain. In addition to that, a new feature with a new command line option is not something we can refine easily in the future. It is actually opposite: once we add an option, it is extremely difficult to change or remove it. Therefore, it is worth spending a little time to get everything right from beginning.

ruiu added a comment.Mar 11 2019, 11:11 AM

--auto-cet and --force-cet is perhaps fine. That should ensure users are aware of what they are doing (so the feature is not enabled too automatically), and it also provides a way to pin it so that it won't be accidentally turned off.

Was there any reason to choose that option names? H. J. Lu mentioned that GNU linkers supports several -z options for CET. If there's no strong reason to invent new option names, we should use the same command line options as GNU.

I think there are still several issues with this patch. I pointed out some of them before, but to summarize, let me iterate:

  • Your patch collects X86Feature1 bits from input files to copy that to a .note section in the output file. I don't think you need to do that. You only need to make sure that the input files have a desired bit in X86Feature1 bits, and generate a .note section containing GNU_PROPERTY_X86_FEATURE_1_IBT bit. In other words, please remove X86Feature1AND from Config.
  • You add fair amount of new code to Driver.cpp, but which feels wrong. It should not be part of a driver. But I think the real problem is that the code to read .note section does just too much.
  • Do you really have to read a .note section containing multiple descriptions? I think that compiler-created .note section contains only one description per a section, so you don't need to care abou it.
xiangzhangllvm added a comment.EditedMar 11 2019, 6:32 PM

--auto-cet and --force-cet is perhaps fine. That should ensure users are aware of what they are doing (so the feature is not enabled too automatically), and it also provides a way to pin it so that it won't be accidentally turned off.

Was there any reason to choose that option names? H. J. Lu mentioned that GNU linkers supports several -z options for CET. If there's no strong reason to invent new option names, we should use the same command line options as GNU.

Gold auto enabled CET in default, so it have no options corresponding to the --auto-cet and --force-cet.
The '-z' options in gold is just for testing, For Simplify,It better to add them in a dependent patch.

I think there are still several issues with this patch. I pointed out some of them before, but to summarize, let me iterate:

  • Your patch collects X86Feature1 bits from input files to copy that to a .note section in the output file. I don't think you need to do that. You only need to make sure that the input files have a desired bit in X86Feature1 bits, and generate a .note section containing GNU_PROPERTY_X86_FEATURE_1_IBT bit. In other words, please remove X86Feature1AND from Config.

Yes, ruiu, last time I also ask you about this point. please let me summarize your ideas here, then please check if I understand right?

  1. If not all input files have X86Feature1 bits, then discard the .note.gnu.property sections.
  2. If all input files have X86Feature1 bits: then we need to 'calculate' the resulting X86Feature1 bits from all the X86Feature1 bits ? (yes or no ?) then delete all the input files' .note.gnu.property sections ? (yes or no ?) then generate a .note.gnu.property section with the resulting X86Feature1 bits in input file level not in output file by Writer, so we need not to pass the X86Feature1 bits to Writer ? (yes or no ?)
  • You add fair amount of new code to Driver.cpp, but which feels wrong. It should not be part of a driver. But I think the real problem is that the code to read .note section does just too much.

Hi, ruiu, there are not very much code now, in fact, half of them are code annotation.

  • Do you really have to read a .note section containing multiple descriptions? I think that compiler-created .note section contains only one description per a section, so you don't need to care about it.

Yes, the compiler-created .note section contains only one description per a section, But we should consider the handwriting assemble files.

Thank you for your careful review!

Yes, ruiu, last time I also ask you about this point. please let me summarize your ideas here, then please check if I understand right?

If not all input files have X86Feature1 bits, then discard the .note.gnu.property sections.
If all input files have X86Feature1 bits:

then we need to 'calculate' the resulting X86Feature1 bits from all the X86Feature1 bits ?(yes or no ?)
then delete all the input files' .note.gnu.property sections ?(yes or no ?)
then generate a .note.gnu.property section with the resulting X86Feature1 bits in input file level not in output file by Writer, so we need not to pass the X86Feature1 bits to Writer ?(yes or no ?)
MaskRay added a comment.EditedMar 11 2019, 7:11 PM

Gold auto enabled CET in default

What kind of CET support does gold have now? I think it can merge .note.gnu.property now but I don't find code related to second PLT or ibtplt. (I have a question about ibtplt, see below).

I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.

I find that ld.bfd has three related options -z ibt -z ibtplt -z shstk (are there others?). What do they do?

-z ibtplt                   Generate IBT-enabled PLT entries\n"));
fprintf (file, _("\
-z ibt                      Generate GNU_PROPERTY_X86_FEATURE_1_IBT\n"));
fprintf (file, _("\
-z shstk                    Generate GNU_PROPERTY_X86_FEATURE_1_SHSTK\n"));

Is -z ibtplt similar to the second PLT? Does -z ibt generate GNU_PROPERTY_X86_FEATURE_1_IBT even if there is an input object file with the GNU_PROPERTY_X86_FEATURE_1_IBT bit cleared? Or when it is specified while -z shstk is not, merge just ibt but not shstk? I'm puzzled as you said the note merging behavior is automatic in ld.bfd.

--auto-cet and --force-cet

We can also make one option --cet={none,auto,force}, but I'd like to understand first what -z ibt -z shstk -z ibtplt do...

ELF/Driver.cpp
75

This macros is unnecessary. You can use alignTo. See ELF/LinkerScript.cpp for examples.

333

Delete "IBT can" " instead of retpolineplt".

They are two different aspects of security and are actually unrelated. I think the incompatibility is not due to technical impossibility, but we expect ( Retpoline-A-Branch-Target-Injection-Mitigation.pdf Section 4.3 claims) in the future retpolineplt can retire because of better hardware-level protection on CET-enabled processors.

MaskRay added inline comments.Mar 11 2019, 7:27 PM
test/ELF/i386-feature-1-and.s
24

These llvm-readelf -S %t RUN commands can share the same check-prefix. They just check if .note.gnu.property is present. (Actually I'm thinking if they can be deleted if the related llvm-readelf -n is not empty)

40

Typo. The comment may be less useful. A brief comment above the llvm-mc ld.lld commands explaining what the test is intended for may be more helpful to readers.

test/ELF/x86-64-feature-1-and.s
40

Typo. protery -> property

xiangzhangllvm added a comment.EditedMar 11 2019, 8:01 PM

Gold auto enabled CET in default

What kind of CET support does gold have now? I think it can merge .note.gnu.property now but I don't find code related to second PLT or ibtplt. (I have a question about ibtplt, see below).

I add --auto-cet and --force-cet options, and disable CET in default. Now there will be no influence to the LLD when not using these options.

I find that ld.bfd has three related options -z ibt -z ibtplt -z shstk (are there others?). What do they do?

-z ibtplt                   Generate IBT-enabled PLT entries\n"));
fprintf (file, _("\
-z ibt                      Generate GNU_PROPERTY_X86_FEATURE_1_IBT\n"));
fprintf (file, _("\
-z shstk                    Generate GNU_PROPERTY_X86_FEATURE_1_SHSTK\n"));

Is -z ibtplt similar to the second PLT? Does -z ibt generate GNU_PROPERTY_X86_FEATURE_1_IBT even if there is an input object file with the GNU_PROPERTY_X86_FEATURE_1_IBT bit cleared? Or when it is specified while -z shstk is not, merge just ibt but not shstk? I'm puzzled as you said the note merging behavior is automatic in ld.bfd.

--auto-cet and --force-cet

We can also make one option --cet={none,auto,force}, but I'd like to understand first what -z ibt -z shstk -z ibtplt do...

Hi MaskRay,
1 please not focus on these -z options now, H.J add them just for testing, for example, if you compiled file without CET, but finally, you want to see "there will be errors" in runtime if you force add CET flags in the output file.

2 I have sought used --xxx={none,auto,force}, at last I choose using --xxx-xxx, because I think we will mostly use --auto-cet, rarely use others

This macros is unnecessary. You can use alignTo. See ELF/LinkerScript.cpp for examples.

Delete "IBT can"  " instead of retpolineplt".

OK, I'll change it, thank you very much!

xiangzhangllvm updated this revision to Diff 190206.EditedMar 11 2019, 10:22 PM
--- a/ELF/Driver.cpp
+++ b/ELF/Driver.cpp
@@ -72,8 +72,6 @@ LinkerDriver *elf::Driver;

 static void setConfigs(opt::InputArgList &Args);

-#define ALIGN_UP(size, alignment) ((size) + (alignment) - 1)&(-(alignment))

 bool elf::link(ArrayRef<const char *> Args, bool CanExitEarly,
                raw_ostream &Error) {
   errorHandler().LogName = args::getFilenameWithoutExe(Args[0]);
@@ -323,8 +321,7 @@ static void checkOptions() {

   if (Config->ZRetpolineplt
       && (Config->AutoCET || Config->ForceCET))
-    error("--auto-cet/--force-cet may not be used with -z retpolineplt, IBT can"
-          " instead of retpolineplt");
+    error("--auto-cet/--force-cet may not be used with -z retpolineplt");
 }

 static const char *getReproduceOption(opt::InputArgList &Args) {
@@ -1600,7 +1597,7 @@ static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
         do {
           unsigned Size = Data[CurSecOffset + DescOffset + pr_dataszInDescIndex];
           Size += CurSecOffset * 4 + DescOffset*4 + DescFixedSize;
-          Size = ALIGN_UP (Size, S->Alignment);
+          Size = alignTo (Size, S->Alignment);
           NextDescOffset = Size / 4 - CurSecOffset;  //next DescOffset
           Next_pr_typeIndex = CurSecOffset + NextDescOffset + pr_typeInDescIndex;

@@ -1624,7 +1621,7 @@ static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
     CurSize += HeadSize + Size;
     // Section and pr_data both have the align requestment: 8 in 64bit cpu
     // and 4 in 32bit cpu.
-    CurSize = ALIGN_UP (CurSize, S->Alignment);
+    CurSize = alignTo (CurSize, S->Alignment);
     CurSecOffset = CurSize / 4;
   }
   return false;
test/ELF/i386-feature-1-and.s
24

It seems make sense!

40

Yes, I ll recommands it, in fact, these test is very simple for CET test.
They just check the flags is ok or not.
The most testing work we need to test in a CET special machine.

test/ELF/x86-64-feature-1-and.s
40

Sorry! ^_^

As an earlier comment mentioned, there is quite a lot of code that depends on the details of the .note.gnu.property section added very early to Driver.c which is not where we'd normally expect to see it. It also looks like you using quite a lot of code to get an InputSection to represent the output .note.gnu.property section. Given that there will be multiple Input .note.gnu.property sections but only one Output .note.gnu.property section, I think it would be cleaner to use a SyntheticSection to represent the .note.gnu.property section. SyntheticSections can implement all sorts of custom functionality that could move the parsing and merging code out of Driver.cpp. My suggestion is to:

  • Create a new SyntheticSection (see SyntheticSections.h), something like NotePropertySyntheticSection. This will represent our single output .note.gnu.property section. It can store the status of the FEATUREAND bits.
  • Add an entry to InStruct for a single instance of it that we will use for the output.
  • After InStruct instance is created in Writer.c but before the relocations are scanned to create PLT entries iterate through InputSections and pass each .note.gnu.property to the NotePropertySyntheticSection. Then erase these InputSections from the InputSections. The combineEhFrameSections<ELFT> function in Writer.cpp is one way of doing something like that.
  • When you add the .note.gnu.property InputSection to the NotePropertySyntheticSection you can update the FEATUREAND bits
  • The writeTo() member function of the NotePropertySyntheticSection can output the data needed to make up the output.

I think that this should allow you to integrate the .note.gnu.property in a similar way to other "merge" features such as EHFrame, and SHF_MERGE sections.

The ld.bfd AArch64 BTI patches have just gone up on https://www.sourceware.org/ml/binutils/2019-03/msg00021.html so I'll hopefully be starting work on another feature that uses .note.gnu.property soon.

ruiu added a comment.Mar 12 2019, 10:59 AM

--auto-cet and --force-cet is perhaps fine. That should ensure users are aware of what they are doing (so the feature is not enabled too automatically), and it also provides a way to pin it so that it won't be accidentally turned off.

Was there any reason to choose that option names? H. J. Lu mentioned that GNU linkers supports several -z options for CET. If there's no strong reason to invent new option names, we should use the same command line options as GNU.

Gold auto enabled CET in default, so it have no options corresponding to the --auto-cet and --force-cet.
The '-z' options in gold is just for testing, For Simplify,It better to add them in a dependent patch.

Please keep in mind that "let's think about that later" is not a correct approach for command line option names. That's a public interface that cannot be changed once released. I think that is also true to gold. Even if the intention to add the -z options to gold was for testing, if the options have been added in a user-visible way, it is very unlikely that they'd be removed. So you need to think carefully as to what is the most consistent set of option names both for gold and lld once you implement all the features you want. That the situation that lld has --{auto,force}-cet and gold has -z options for the same functionality seems pretty inconsistent. Probably the importance of option names should be taken more seriously.

I think there are still several issues with this patch. I pointed out some of them before, but to summarize, let me iterate:

  • Your patch collects X86Feature1 bits from input files to copy that to a .note section in the output file. I don't think you need to do that. You only need to make sure that the input files have a desired bit in X86Feature1 bits, and generate a .note section containing GNU_PROPERTY_X86_FEATURE_1_IBT bit. In other words, please remove X86Feature1AND from Config.

Yes, ruiu, last time I also ask you about this point. please let me summarize your ideas here, then please check if I understand right?

  1. If not all input files have X86Feature1 bits, then discard the .note.gnu.property sections.
  2. If all input files have X86Feature1 bits: then we need to 'calculate' the resulting X86Feature1 bits from all the X86Feature1 bits ? (yes or no ?) then delete all the input files' .note.gnu.property sections ? (yes or no ?) then generate a .note.gnu.property section with the resulting X86Feature1 bits in input file level not in output file by Writer, so we need not to pass the X86Feature1 bits to Writer ? (yes or no ?)

What I recommended is to always discard .note section after checking the GNU_PROPERTY_X86_FEATURE_1_IBT *bit*. There's no .note section merging or anything. When you parse a file, just read a single bit from a .note section (if exists), and discard it.

As a result, after you read all input files, you have only one bit -- which indicates whether or not all input files support CET.

Then, if CET is supported by all input files, create a synthetic section for a .note section to enable CET for the output.

  • You add fair amount of new code to Driver.cpp, but which feels wrong. It should not be part of a driver. But I think the real problem is that the code to read .note section does just too much.

Hi, ruiu, there are not very much code now, in fact, half of them are code annotation.

  • Do you really have to read a .note section containing multiple descriptions? I think that compiler-created .note section contains only one description per a section, so you don't need to care about it.

Yes, the compiler-created .note section contains only one description per a section, But we should consider the handwriting assemble files.

I'm skeptical if people actually hand-write a .note section in assembly. In particular, is it really normal or even correct to write a note section containing multiple descriptions in assembly by hand? When you enable the CET for an assembly file, you would pass a command line option to the assembler to let the assembler to add that section, no? I seems that you are overthinking. I'd just consider the case where there is only one note description in a .note section.

As an earlier comment mentioned, there is quite a lot of code that depends on the details of the .note.gnu.property section added very early to Driver.c which is not where we'd normally expect to see it. It also looks like you using quite a lot of code to get an InputSection to represent the output .note.gnu.property section. Given that there will be multiple Input .note.gnu.property sections but only one Output .note.gnu.property section, I think it would be cleaner to use a SyntheticSection to represent the .note.gnu.property section. SyntheticSections can implement all sorts of custom functionality that could move the parsing and merging code out of Driver.cpp. My suggestion is to:

  • Create a new SyntheticSection (see SyntheticSections.h), something like NotePropertySyntheticSection. This will represent our single output .note.gnu.property section. It can store the status of the FEATUREAND bits.
  • Add an entry to InStruct for a single instance of it that we will use for the output.
  • After InStruct instance is created in Writer.c but before the relocations are scanned to create PLT entries iterate through InputSections and pass each .note.gnu.property to the NotePropertySyntheticSection. Then erase these InputSections from the InputSections. The combineEhFrameSections<ELFT> function in Writer.cpp is one way of doing something like that.
  • When you add the .note.gnu.property InputSection to the NotePropertySyntheticSection you can update the FEATUREAND bits
  • The writeTo() member function of the NotePropertySyntheticSection can output the data needed to make up the output.

I think that this should allow you to integrate the .note.gnu.property in a similar way to other "merge" features such as EHFrame, and SHF_MERGE sections.

The ld.bfd AArch64 BTI patches have just gone up on https://www.sourceware.org/ml/binutils/2019-03/msg00021.html so I'll hopefully be starting work on another feature that uses .note.gnu.property soon.

Thanks very much for your suggestion, I was not much familiar to the function of SyntheticSection before, I'll carefully dig this class, It seems really not good to do the "calculate bits" work in Driver.cpp now. Thank you! peter

Please keep in mind that "let's think about that later" is not a correct approach for command line option names. That's a public interface that cannot be changed once released. I think that is also true to gold. Even if the intention to add the -z options to gold was for testing, if the options have been added in a user-visible way, it is very unlikely that they'd be removed. So you need to think carefully as to what is the most consistent set of option names both for gold and lld once you implement all the features you want. That the situation that lld has --{auto,force}-cet and gold has -z options for the same functionality seems pretty inconsistent. Probably the importance of option names should be taken more seriously.

lld's --{auto,force}-cet is different with gold's -z options, "-z" options directly add CET info into output files, not care the input files is CETed or not, they just designed for testing, it is not right to use them in normal build.
--{auto,force}-cet depend on the input files to enable CET or not.
The key difference is that gold auto enabled CET in default (equal with LLD's --auto-cet).
I tended to keep it same with gold, but seems hard to persuade you ☺

What I recommended is to always discard .note section after checking the GNU_PROPERTY_X86_FEATURE_1_IBT *bit*. There's no .note section merging or anything. When you parse a file, just read a single bit from a .note section (if exists), and discard it.
As a result, after you read all input files, you have only one bit -- which indicates whether or not all input files support CET.
Then, if CET is supported by all input files, create a synthetic section for a .note section to enable CET for the output.

OK, as peter's suggestion, I'll recheck here code, thank you!

I'm skeptical if people actually hand-write a .note section in assembly. In particular, is it really normal or even correct to write a note section containing multiple descriptions in assembly by hand?

The assemble syntax support multiple descriptions.
And suppose that, one day, not only CET feature will put into .note.gnu.property, (now ABI have defined several other kind features such as GNU_PROPERTY_X86_ISA_1_USED ... ). So I think it will be normal to see one section with multiple descriptions (each description contains a different feature).

When you enable the CET for an assembly file, you would pass a command line option to the assembler to let the assembler to add that section, no? I seems that you are overthinking. I'd just consider the case where there is only one note description in a .note section.

Yes, no, we write the .note.gnu.property in the assembly file not use options.

ruiu added a comment.Mar 13 2019, 1:55 PM

Please keep in mind that "let's think about that later" is not a correct approach for command line option names. That's a public interface that cannot be changed once released. I think that is also true to gold. Even if the intention to add the -z options to gold was for testing, if the options have been added in a user-visible way, it is very unlikely that they'd be removed. So you need to think carefully as to what is the most consistent set of option names both for gold and lld once you implement all the features you want. That the situation that lld has --{auto,force}-cet and gold has -z options for the same functionality seems pretty inconsistent. Probably the importance of option names should be taken more seriously.

lld's --{auto,force}-cet is different with gold's -z options, "-z" options directly add CET info into output files, not care the input files is CETed or not, they just designed for testing, it is not right to use them in normal build.
--{auto,force}-cet depend on the input files to enable CET or not.
The key difference is that gold auto enabled CET in default (equal with LLD's --auto-cet).

But gold already has options, -z ibt and -z ibtplt. You should be consistent with these options, shouldn't we?

What's going to happen if you specify these options to gold and some input files lack special marker .note section? Does the gold linker silently create an executable that won't work, or does it reject with an error message?

What I recommended is to always discard .note section after checking the GNU_PROPERTY_X86_FEATURE_1_IBT *bit*. There's no .note section merging or anything. When you parse a file, just read a single bit from a .note section (if exists), and discard it.
As a result, after you read all input files, you have only one bit -- which indicates whether or not all input files support CET.
Then, if CET is supported by all input files, create a synthetic section for a .note section to enable CET for the output.

OK, as peter's suggestion, I'll recheck here code, thank you!

I'm skeptical if people actually hand-write a .note section in assembly. In particular, is it really normal or even correct to write a note section containing multiple descriptions in assembly by hand?

The assemble syntax support multiple descriptions.
And suppose that, one day, not only CET feature will put into .note.gnu.property, (now ABI have defined several other kind features such as GNU_PROPERTY_X86_ISA_1_USED ... ). So I think it will be normal to see one section with multiple descriptions (each description contains a different feature).

When you enable the CET for an assembly file, you would pass a command line option to the assembler to let the assembler to add that section, no? I seems that you are overthinking. I'd just consider the case where there is only one note description in a .note section.

Yes, no, we write the .note.gnu.property in the assembly file not use options.

Fair enough. Maybe the thing that alerted me on that code was that the code was written in a very different way than the other code in the same .cpp file and other .cpp files in the same directory. First of all, the code to read that section should not be in the driver, and that was written in a different coding style. Perhaps after you fix that (probably you need to read code around the location where you are editing and mimic the way what they are doing), the code would blend into other pieces of code.

hi dears,
I updated the patch: most of changes refer to peter and ruiu's suggestions, but not of all.
The main change is:

  1. Read all input files, then collect the CET features, and discard the .note.gnu.property sections, do these in SyntheticSections.cpp not in Driver.cpp.
  2. If CET features checked successful, then create the SyntheticSection's sections (GnuPropertySection, SPltSection) in Writer's createSyntheticSections().

instead of directly creating .note.gnu.property section in Writer.

  1. Adjust the test cases to remove redundant code check.

deficiencies:

  1. I tend to collect the CET features in GnuPropertySection (SyntheticSection) class or in Writer->run like combineEhFrameSections(), but I found it should first get the CET features,

then decide to create the GnuPropertySection or not (SPltSection too).

  1. I didn't remove the Config->X86Feature1AND, because I need it to tell the createSyntheticSections() to create the GnuPropertySection or not (SPltSection too).
MaskRay edited reviewers, added: MaskRay; removed: maskray0.Mar 18 2019, 2:06 AM

H.J. have enable the CET in GNU-toolchain and rebuild/rewrite the related libs/linux-kernal, his team are testing it for at last one year.

Not Linux, FreeBSD. We are hoping to develop the kernel support but are unable to do so without a simulator.

deficiencies:

  1. I tend to collect the CET features in GnuPropertySection (SyntheticSection) class or in Writer->run like combineEhFrameSections(), but I found it should first get the CET features,

then decide to create the GnuPropertySection or not (SPltSection too).

  1. I didn't remove the Config->X86Feature1AND, because I need it to tell the createSyntheticSections() to create the GnuPropertySection or not (SPltSection too).

I guess you mean "I tried", not "I tend" in 1). I think that you can do accomplish 1.) via unconditionally creating the In.GnuPropertySection and SPltSection and by implementing the empty() member function, ensure that they are removed by removeUnusedSyntheticSections() if there isn't anything that uses them. Unconditionally creating the sections means that you don't need to call mergeAggregateMetaData from Driver.cpp prior to createSyntheticSections(). For the case of ifunc where you need to turn off the feature then just inform In.GnuPropertySection and In.SPltSection as they will always be there.

ELF/SyntheticSections.cpp
300

Typo Dnone, should be Done. Having said that. You have already commented that only X86_FEATURE_1_AND is supported, do you need to repeat it?

ELF/SyntheticSections.h
154

The comment has quite a few typos. May I suggest something like:
// We only support a single feature X86_FEATURE_1_AND, so we hard code the DescSize to a single entry.

157

This doesn't need to be a function, you only use it once to initialize DescSize. Why not just make DescSize static const size_t DescSize = 16;

158

These member functions are only called once and can be inlined.

161

If you implement the empty() member function then you will be able to unconditionally create In.GnuProperty as it will be removed if it isn't used.

164

DescBuf is only used in writeTo, it can be defined there.

ELF/Writer.cpp
88

You aren't using this anymore so it can be removed.

424

If you implement the GnuPropertySection::empty() member function then you can unconditionally create the section. It will be removed by removeUnusedSynthetics() if there are no entries.

We are using a special CET SDV for CET enabling. We have been enabling CET in Fedora piece by piece by adding -fcf-protection to CFLAGS.

How do we get access to such a simulator so that we can do the same in FreeBSD?

xiangzhangllvm marked 7 inline comments as done.Mar 18 2019, 6:37 PM
xiangzhangllvm added inline comments.
ELF/SyntheticSections.cpp
300

Yes, sorry for the typo, thank you!
I repeat the reason here is to let people easily understand and they will know how to change it when the other feature added.

ELF/SyntheticSections.h
154

Sorry for the typos, I 'll change it, thank you!

157

For consideration of extensibility, I used function to get the DescSize. We will do some work here when we need deal with the other feature types, because there will create more than one Desc here.

158

Same reason with the getDescSize()

164

Make sense!

ELF/Writer.cpp
88

Yes, you are right!

424

Thank you! I want to try.

But gold already has options, -z ibt and -z ibtplt. You should be consistent with these options, shouldn't we?

Yes, I think we should, and I think it's better to support these test options in a independent patch, and it's easy to support.

What's going to happen if you specify these options to gold and some input files lack special marker .note section? Does the gold linker silently create an executable that won't work, or does it reject with an error message?

Yes, I test it, it will directly create '.note.gnu.property' section or 'splt', create an executable that won't work, and give no errors.
So it just used for test.
What your idea here?

We are using a special CET SDV for CET enabling. We have been enabling CET in Fedora piece by piece by adding -fcf-protection to CFLAGS.

How do we get access to such a simulator so that we can do the same in FreeBSD?

Hi, emaste , as I know, the SDV is not public now, we just internally use it.
is it? @hjl.tools

xiangzhangllvm added a comment.EditedMar 19 2019, 1:25 AM

I guess you mean "I tried", not "I tend" in 1). I think that you can do accomplish 1.) via unconditionally creating the In.GnuPropertySection and SPltSection and by implementing the empty() member function, ensure that they are removed by removeUnusedSyntheticSections() if there isn't anything that uses them. Unconditionally creating the sections means that you don't need to call mergeAggregateMetaData from Driver.cpp prior to createSyntheticSections(). For the case of ifunc where you need to turn off the feature then just inform In.GnuPropertySection and In.SPltSection as they will always be there.

Hi peter, I have implemented your idea in my code, and it work well, I did not update here now because I am not sure about the ifunc.
One of my destination is to remove the Config->X86Feature1AND and do the feature collection work inside synthetic GnuPropertySection class.
I found I can remove the Config->X86Feature1AND except dealing with ifunc, If I remove the Config->X86Feature1AND I still need another global variable.
At this stage I want to disable CET when the program will use ifunc, my current code is Relocations.cpp:1072

// To Be Done.
// Now the IBT can't work with IFUNC feature, because they may conflict in
// changing the PLT. So we disable the IBT when the IFUNC enabled.
if (Sym.isGnuIFunc()) {
  Config->X86Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
  Config->SPltSectionEnable = false;
}

Do you have some good idea to turn off the CET feature in ifunc ?
Thank you very much!

xiangzhangllvm added a comment.EditedMar 19 2019, 1:35 AM

I do this changes in my local: (the patch is too big here, so I cut some unimportant changes)
I do the feature collection work in the GnuPropertySection's Constructor which will be called in createSyntheticSections().
For the SPlt, I keep its generation condition, because the GnuPropertySection's Constructor will set the Config->SPltSectionEnable.

  1 diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
  2 index 7628df0..d5217bd 100644
  3 --- a/ELF/Driver.cpp
  4 +++ b/ELF/Driver.cpp
  5 @@ -1626,14 +1626,6 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &Args) {
  6        return S->Name.startswith(".debug") || S->Name.startswith(".zdebug");
  7      });
  8
  9 -  // Merge aggregate metadata sections like .note.gnu.property that should be
 10 -  // found across all relocated files and reproduced once. But Until now, the
 11 -  // emitAggregateMetadata just deal with the X86Feature1AND info, and only X86_64, i386
 12 -  // support the X86Feature1AND.
 13 -  if ((Config->EMachine == EM_X86_64 || Config->EMachine == EM_386)
 14 -      && (Config->AutoCET || Config->ForceCET))
 15 -    mergeAggregateMetadata();
 16 -
 17    Config->EFlags = Target->calcEFlags();
 18
 19    if (Config->EMachine == EM_ARM) {

 20 diff --git a/ELF/SyntheticSections.cpp b/ELF/SyntheticSections.cpp
 21 index bb1c5d3..4ed1a72 100644
 22 --- a/ELF/SyntheticSections.cpp
 23 +++ b/ELF/SyntheticSections.cpp

 44 +bool GnuPropertySection::empty() const {
 45 +  uint32_t Feature = getFeature();
 46 +  return Feature == 0x0;
 47  }
 48
 59  void GnuPropertySection::setFeature() {
 60 -  Feature1AND = getFeature();
 61 +  Feature1AND = mergeAggregateMetadata();
 62  }
 63

101 -void elf::mergeAggregateMetadata() {
102 +uint32_t elf::mergeAggregateMetadata() {
         ... do some change here to return  X86Feature1AND;
136  }
137
138  template <class ELFT> void elf::splitSections() {
139 diff --git a/ELF/SyntheticSections.h b/ELF/SyntheticSections.h
140 index f148281..e369ef1 100644
141 --- a/ELF/SyntheticSections.h
142 +++ b/ELF/SyntheticSections.h
143 @@ -149,19 +149,19 @@ public:
144    GnuPropertySection();
145    void writeTo(uint8_t *Buf) override;
146    size_t getSize() const override;
147 +  bool empty() const override;
148
176
177 diff --git a/ELF/Writer.cpp b/ELF/Writer.cpp
178 index 11639e5..0deebb9 100644
179 --- a/ELF/Writer.cpp
180 +++ b/ELF/Writer.cpp
181 @@ -335,6 +335,16 @@ template <class ELFT> static void createSyntheticSections() {
182        Add(Sec);
183    }
184
185 +  // Merge aggregate metadata sections like .note.gnu.property that should be
186 +  // found across all relocated files and reproduced once. But Until now, the
187 +  // emitAggregateMetadata just deal with the X86Feature1AND info, and only
188 +  // X86_64, i386 support the X86Feature1AND.
189 +  if ((Config->EMachine == EM_X86_64 || Config->EMachine == EM_386)
190 +      && (Config->AutoCET || Config->ForceCET)) {
191 +    In.GnuProperty = make<GnuPropertySection>();
192 +    Add(In.GnuProperty);
193 +  }

198 @@ -416,16 +426,12 @@ template <class ELFT> static void createSyntheticSections() {
202 
203    if (Config->SPltSectionEnable) {
204      In.Splt = make<SPltSection>();
205      Add(In.Splt);
206    }
207

I guess you mean "I tried", not "I tend" in 1). I think that you can do accomplish 1.) via unconditionally creating the In.GnuPropertySection and SPltSection and by implementing the empty() member function, ensure that they are removed by removeUnusedSyntheticSections() if there isn't anything that uses them. Unconditionally creating the sections means that you don't need to call mergeAggregateMetaData from Driver.cpp prior to createSyntheticSections(). For the case of ifunc where you need to turn off the feature then just inform In.GnuPropertySection and In.SPltSection as they will always be there.

Hi peter, I have implemented your idea in my code, and it work well, I did not update here now because I am not sure about the ifunc.
One of my destination is to remove the Config->X86Feature1AND and do the feature collection work inside synthetic GnuPropertySection class.
I found I can remove the Config->X86Feature1AND except dealing with ifunc, If I remove the Config->X86Feature1AND I still need another global variable.
At this stage I want to disable CET when the program will use ifunc, my current code is Relocations.cpp:1072

// To Be Done.
// Now the IBT can't work with IFUNC feature, because they may conflict in
// changing the PLT. So we disable the IBT when the IFUNC enabled.
if (Sym.isGnuIFunc()) {
  Config->X86Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
  Config->SPltSectionEnable = false;
}

Do you have some good idea to turn off the CET feature in ifunc ?
Thank you very much!

As I understand it, what you have in this review (not the patch attached as a comment).

  • Driver.cpp calls mergeAggregateMetaData() to set the Config->SpltSectionEnabled and Config->X86Feature1AND
    • We know that all input objects have the feature property set.
  • Writer.cpp creates the In.SpltSection() and the In.GnuProperty() section as we know that will be needed.
  • If we encounter an ifunc then we disable IBT by setting Config->SpltSectionEnabled to false and clearing Config->X86Feature1AND

One thing I don't understand yet, apologies I've not had time to go into the details of how the splt section works, is that if you discover an ifunc mid-way through processing then you'll have inserted many entries into the In.Splt section, then disabled it, from then on you'll just use the In.Plt. Will this result in a broken PLT as it looks like we'll call get getPltVA() if Config->SpltSectionEnabled and I'm not sure if that will return the right thing if we've previously added an entry into the In.Splt section. I don't see any existing tests for ifunc handling, can you add a test case that does something like:

  • Enable SpltSection
  • Generate a In.SpltSection entry
  • encounter an Ifunc
  • Disable SpltSection
  • Generate a In.Plt entry for some other symbol
  • Check that the generated PLT makes sense.

If this bit is broken then I think you'll need to find a way of handling this case, which won't be easy. I can think of a few approaches:

  • Do a pre-scan for likely ifuncs, before calling scanRelocs. This could be simpler but more pessimistic than the existing code.
  • Have a way that the In.SpltSection decays into an In.PltSection gracefully if an ifunc is detected.

Going back to your original question, how about:

  • Writer.cpp at construction of In.GnuPropertySection calls mergeAggregateMetaData() and sets GnuPropertySection::Feature1AND and possibly SpltSection::enable
    • As you've implemented empty() these sections can be unconditionally created, although it doesn't matter too much if you conditionally create them, in that case In.GnuPltSection == nullptr is implicity Feature1AND=0 and In.SpltSection == nullptr is implicitly enableSpltSection == false
  • If we encounter an ifunc then we disable IBT by setting In.SpltSection::Enable to false and setting In.GnuPropertySection::X86Feature1AND to 0. If you've not created the sections then you'll obviously need to test that they exist first. If I've understood correctly if the sections don't exist then there is nothing to disable anyway.

Please let me know if I've misunderstood something critical?

ruiu added a comment.Mar 19 2019, 9:32 AM

Overall this patch still look very different from the existing lld code base and does many things by hand without using LLVM libObject. You really need to read the existing code first and then try to mimic what the existing code is doing. What this patch is trying to do is not that new -- your patch should and could follow the existing patterns that I believe not too hard to find by reading existing code.

It is perhaps faster to actually show how it should look like than pointing out all the details in the review thread, so I'll apply your patch locally, modify that, and then upload it so that you can read it.

ELF/SyntheticSections.cpp
318–328

These functions are too trivial.

3047–3048

This does not follow the LLVM naming convention.

3120

Essentially, when you read some structure from a file, you should cast a byte array to a struct defined in llvm/include/llvm/Object/ELFTypes.h. You should not read data like this function does.

3124

Needless use of decltype.

ELF/Writer.cpp
425–426

I don't think you need this variable.

As I understand it, what you have in this review (not the patch attached as a comment).

  • Driver.cpp calls mergeAggregateMetaData() to set the Config->SpltSectionEnabled and Config->X86Feature1AND
    • We know that all input objects have the feature property set.
  • Writer.cpp creates the In.SpltSection() and the In.GnuProperty() section as we know that will be needed.
  • If we encounter an ifunc then we disable IBT by setting Config->SpltSectionEnabled to false and clearing Config->X86Feature1AND

One thing I don't understand yet, apologies I've not had time to go into the details of how the splt section works, is that if you discover an ifunc mid-way through processing then you'll have inserted many entries into the In.Splt section, then disabled it, from then on you'll just use the In.Plt. Will this result in a broken PLT as it looks like we'll call get getPltVA() if Config->SpltSectionEnabled and I'm not sure if that will return the right thing if we've previously added an entry into the In.Splt section. I don't see any existing tests for ifunc handling, can you add a test case that does something like:

  • Enable SpltSection
  • Generate a In.SpltSection entry
  • encounter an Ifunc
  • Disable SpltSection
  • Generate a In.Plt entry for some other symbol
  • Check that the generated PLT makes sense.

If this bit is broken then I think you'll need to find a way of handling this case, which won't be easy. I can think of a few approaches:

  • Do a pre-scan for likely ifuncs, before calling scanRelocs. This could be simpler but more pessimistic than the existing code.
  • Have a way that the In.SpltSection decays into an In.PltSection gracefully if an ifunc is detected.

Going back to your original question, how about:

  • Writer.cpp at construction of In.GnuPropertySection calls mergeAggregateMetaData() and sets GnuPropertySection::Feature1AND and possibly SpltSection::enable
    • As you've implemented empty() these sections can be unconditionally created, although it doesn't matter too much if you conditionally create them, in that case In.GnuPltSection == nullptr is implicity Feature1AND=0 and In.SpltSection == nullptr is implicitly enableSpltSection == false
  • If we encounter an ifunc then we disable IBT by setting In.SpltSection::Enable to false and setting In.GnuPropertySection::X86Feature1AND to 0. If you've not created the sections then you'll obviously need to test that they exist first. If I've understood correctly if the sections don't exist then there is nothing to disable anyway.

Please let me know if I've misunderstood something critical?

Hi, Peter, very appreciate for your suggestion, for the Splt, in fact, it is just the entry of the old PLT, you can see in this way: SPLT+PLT=(old)PLT, we split the (old)Plt to Plt +SPlt, one of the reasons is for performance that we want to put them into different areas (hot code, cold code) .
And in this patch, I used getSPltVA() instead of getPltVA when face outside. For example in getRelocTargetVA():

case R_PLT:
  // When the CET Branch protection is enabled, the second PLT
  // should be call to.
  if (Config->SPltSectionEnable)
    return Sym.getSPltVA() + A;
  else
    return Sym.getPltVA() + A;

Maybe it's hard to use In.SpltSection::Enable to do these.
Any way I will check it.
Thank you very much!

Overall this patch still look very different from the existing lld code base and does many things by hand without using LLVM libObject. You really need to read the existing code first and then try to mimic what the existing code is doing. What this patch is trying to do is not that new -- your patch should and could follow the existing patterns that I believe not too hard to find by reading existing code.

It is perhaps faster to actually show how it should look like than pointing out all the details in the review thread, so I'll apply your patch locally, modify that, and then upload it so that you can read it.

Hi, ruiu, I'll try to read them and let the code style like the existing lld code, I think the most un-beautiful code you sought is in the findGNUPropertyX86Feature1AND() function, here is too much details about the ABI, and I get the features in my way.

But I am not quite understanding your following review:

    In.GnuProperty = make<GnuPropertySection>();
    Add(In.GnuProperty);
ruiu:  I don't think you need this variable.

Did you mean I should not use the GnuPropertySection (SyntheticSection Class) to generate the output .note.gnu.property section? It seems a little different with peter's suggestion.

Anyway I'll read more code first.
Thank you again!

Fine tuning the patch, and still need to refine the "code style".

xiangzhangllvm marked an inline comment as done.Mar 19 2019, 10:18 PM
xiangzhangllvm added inline comments.
ELF/SyntheticSections.cpp
3120

Yes! This is the key point!

xiangzhangllvm marked an inline comment as done.Mar 19 2019, 11:34 PM
xiangzhangllvm added inline comments.
ELF/SyntheticSections.cpp
3120

Hi Ruiu, let me tell you my idea about how to rewrite here code first:

First I checked llvm/include/llvm/Object/ELFTypes.h, and find no structure corresponding to our .note.gnu.property.
Compiler write this section at llvm/lib/Target/X86/X86AsmPrinter.cpp:X86AsmPrinter::EmitStartOfAsmFile(), it not used any structure too.

I want to change here like this:

struct head{
...
}
struct desc{
...
}
struct gnu.property{
  head H;
  desc D[]; 
}

then cast the note.gnu.property's data array to struct gnu.property.
and check the features though struct gnu.property.
Do you think it's OK?

I rewrited the collecting feature functions.
It is very simple now!

+struct GnuPropertyHeadType {
+  uint32_t Namesz;
+  uint32_t Descsz;
+  uint32_t NType;
+  uint32_t Name;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyDescType {
+  uint32_t DescTy;
+  uint32_t Descsz;
+  uint32_t Feature;
+  uint32_t Pad;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyType {
+  GnuPropertyHeadType Head;
+  GnuPropertyDescType Desc[];
+};                                            // Did here need packed attribute ? I have a little concern
+
+static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
+                                          unsigned &Features) {
+  auto Data = S->getDataAs<uint32_t>();
+  GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());
+  size_t DescNum = GnuProperty->Head.Descsz / sizeof (GnuPropertyDescType);
+  if (GnuProperty->Head.NType != NT_GNU_PROPERTY_TYPE_0)
+    return false;
+  for (size_t i = 0; i < DescNum; i ++) {
+    if (GnuProperty->Desc[i].DescTy == GNU_PROPERTY_X86_FEATURE_1_AND) {
+      Features = GnuProperty->Desc[i].Feature;
+      return true;
+    }
+  }
+  return false;
+}

I rewrited the collecting feature functions.
It is very simple now!

+struct GnuPropertyHeadType {
+  uint32_t Namesz;
+  uint32_t Descsz;
+  uint32_t NType;
+  uint32_t Name;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyDescType {
+  uint32_t DescTy;
+  uint32_t Descsz;
+  uint32_t Feature;
+  uint32_t Pad;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyType {
+  GnuPropertyHeadType Head;
+  GnuPropertyDescType Desc[];
+};                                            // Did here need packed attribute ? I have a little concern
+
+static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
+                                          unsigned &Features) {
+  auto Data = S->getDataAs<uint32_t>();
+  GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());
+  size_t DescNum = GnuProperty->Head.Descsz / sizeof (GnuPropertyDescType);
+  if (GnuProperty->Head.NType != NT_GNU_PROPERTY_TYPE_0)
+    return false;
+  for (size_t i = 0; i < DescNum; i ++) {
+    if (GnuProperty->Desc[i].DescTy == GNU_PROPERTY_X86_FEATURE_1_AND) {
+      Features = GnuProperty->Desc[i].Feature;
+      return true;
+    }
+  }
+  return false;
+}

I'm having problems getting this to compile on at least GCC 5.4.0 on Linux:

lld/ELF/SyntheticSections.cpp:3104:28: warning: ISO C++ forbids zero-size array ‘Desc’ [-Wpedantic]
   GnuPropertyDescType Desc[]

lld/ELF/SyntheticSections.cpp:3110:55: error: invalid conversion from ‘llvm::cast_retty<GnuPropertyType, const unsigned int*>::ret_type {aka const GnuPropertyType*}’ to ‘GnuPropertyType*’ [-fpermissive]
   GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());

The cast<> is an LLVM cast http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates and isn't something you can do on an arbitrary struct.

Assuming (x86) that your input data is always little endian then you'll need to use something like ulittle32_t instead of uint32_t or your tests will fail when run on a big-endian machine like PPC or some Mips machines.

Hi, Peter, very appreciate for your suggestion, for the Splt, in fact, it is just the entry of the old PLT, you can see in this way: SPLT+PLT=(old)PLT, we split the (old)Plt to Plt +SPlt, one of the reasons is for performance that we want to put them into different areas (hot code, cold code) .
And in this patch, I used getSPltVA() instead of getPltVA when face outside. For example in getRelocTargetVA():

case R_PLT:
  // When the CET Branch protection is enabled, the second PLT
  // should be call to.
  if (Config->SPltSectionEnable)
    return Sym.getSPltVA() + A;
  else
    return Sym.getPltVA() + A;

Maybe it's hard to use In.SpltSection::Enable to do these.
Any way I will check it.
Thank you very much!

I've checked that it does work, although at the moment it does leave a redundant .splt section in the binary. I think that this can be removed by accounting for this in the SPltSection::empty(), I've left a suggestion inline. I also note that there are no tests that check the contents of the .splt are correct. I think it is important to have those as it can be easy for it to be broken by some later change and no-one noticing it.

ELF/SyntheticSections.h
700

If I've understood correctly if the .SPLT is disabled then this should return true, otherwise if you have any ifuncs then a redundant .splt section is produced.

test/ELF/x86-64-feature-1-and.s
57

The test checks that the splt is created, but it doesn't check that the contents are correct. There should be at least one of these tests that disassembles the .plt and checks that it is loading the correct .got entry.

There also should be a test to show that when an ifunc plt is added then we revert to using the .plt, and that any existing .splt entries are not called.

For example:

.type foo STT_GNU_IFUNC
.globl ifunc
ifun:
 ret

 .globl func1
 .type func1,@function
func1:
        callq func2
        callq ifunc    // func2 will have been added to .splt, we don't want func3 to be.
        callq func3
ruiu added a comment.Mar 20 2019, 3:48 PM

Overall this patch still look very different from the existing lld code base and does many things by hand without using LLVM libObject. You really need to read the existing code first and then try to mimic what the existing code is doing. What this patch is trying to do is not that new -- your patch should and could follow the existing patterns that I believe not too hard to find by reading existing code.

It is perhaps faster to actually show how it should look like than pointing out all the details in the review thread, so I'll apply your patch locally, modify that, and then upload it so that you can read it.

Hi, ruiu, I'll try to read them and let the code style like the existing lld code, I think the most un-beautiful code you sought is in the findGNUPropertyX86Feature1AND() function, here is too much details about the ABI, and I get the features in my way.

But I am not quite understanding your following review:

    In.GnuProperty = make<GnuPropertySection>();
    Add(In.GnuProperty);
ruiu:  I don't think you need this variable.

Did you mean I should not use the GnuPropertySection (SyntheticSection Class) to generate the output .note.gnu.property section? It seems a little different with peter's suggestion.

Sometimes we create a section and fill it later, because the exact contents of the section is not known at the time when we create it. In that case, you have to create a section first, and then store its pointer to In.* so that you can refer them later.

I don't think you need that, though. There's nothing to "finalize" for that section because everything is known when you instantiate this class.

xiangzhangllvm added a comment.EditedMar 20 2019, 5:40 PM

I'm having problems getting this to compile on at least GCC 5.4.0 on Linux:

lld/ELF/SyntheticSections.cpp:3104:28: warning: ISO C++ forbids zero-size array ‘Desc’ [-Wpedantic]
   GnuPropertyDescType Desc[]

lld/ELF/SyntheticSections.cpp:3110:55: error: invalid conversion from ‘llvm::cast_retty<GnuPropertyType, const unsigned int*>::ret_type {aka const GnuPropertyType*}’ to ‘GnuPropertyType*’ [-fpermissive]
   GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());

The cast<> is an LLVM cast http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates and isn't something you can do on an arbitrary struct.

Assuming (x86) that your input data is always little endian then you'll need to use something like ulittle32_t instead of uint32_t or your tests will fail when run on a big-endian machine like PPC or some Mips machines.

Yes! It is really have error!
I used incremental compilation, didn't find it at first. I'll change it.

xiangzhangllvm marked 2 inline comments as done.Mar 20 2019, 6:45 PM

I've checked that it does work, although at the moment it does leave a redundant .splt section in the binary. I think that this can be removed by accounting for this in the SPltSection::empty(), .........

It shouldn't have redundant .splt section, there should not have any .splt section if IBT is disable or non-dynamic link. Could you show me your test please?

ELF/SyntheticSections.h
700

Till now this patch doesn't carefully handle ifunc, in ELF/Relocations.cpp:1072, I disable the splt if checked ifunc, if splt is disabled, there will be no splt section created.

if (Config->SPltSectionEnable) {
  In.Splt = make<SPltSection>();
  Add(In.Splt);
}
test/ELF/x86-64-feature-1-and.s
57

I check the contents of plts in my local, I'll add them to the patch.

For, I ifunc, I planned to handle it in my next patch, so I didn't add tests here.

I've checked that it does work, although at the moment it does leave a redundant .splt section in the binary. I think that this can be removed by accounting for this in the SPltSection::empty(), .........

It shouldn't have redundant .splt section, there should not have any .splt section if IBT is disable or non-dynamic link. Could you show me your test please?

My test is a simple modification of x86-64-feature-1-and.s and Inputs/x86-64-feature-1-and1.s
Step 1: add func3 and func4 to Inputs/x86-64-feature-1-and1.s so that we can resolve the symbols.

 .text
 .globl func3
 .type func3,@function
func3:
 retq

.text
 .globl func4
 .type func4,@function
func4:
 retq

For x86-64-feature-1-and.s
change.text to

 .text

.type foo STT_GNU_IFUNC
.globl foo
foo:
 ret

 .globl func1
 .type func1,@function
func1:
        callq func2
        callq func3
        callq foo        
        callq func4
 retq

I've added an ifunc, a call to an ifunc and a call to a couple more functions. Notably the ifunc is after the calls to func1 and func2, so we turn off the feature after having added entries to the .splt.
Disassembly from gnu objdump

0000000000201000 <foo>:
  201000:	c3                   	retq   

0000000000201001 <func1>:
  201001:	e8 2a 00 00 00       	callq  201030 <func2@plt>
  201006:	e8 35 00 00 00       	callq  201040 <func3@plt>
  20100b:	e8 50 00 00 00       	callq  201060 <*ABS*+0x201000@plt>
  201010:	e8 3b 00 00 00       	callq  201050 <func4@plt>
  201015:	c3                   	retq   

Disassembly of section .plt:

0000000000201020 <func2@plt-0x10>:
  201020:	ff 35 e2 1f 00 00    	pushq  0x1fe2(%rip)        # 203008 <_DYNAMIC+0x1008>
  201026:	ff 25 e4 1f 00 00    	jmpq   *0x1fe4(%rip)        # 203010 <_DYNAMIC+0x1010>
  20102c:	0f 1f 40 00          	nopl   0x0(%rax)

0000000000201030 <func2@plt>:
  201030:	ff 25 e2 1f 00 00    	jmpq   *0x1fe2(%rip)        # 203018 <func2>
  201036:	68 00 00 00 00       	pushq  $0x0
  20103b:	e9 e0 ff ff ff       	jmpq   201020 <func1+0x1f>

0000000000201040 <func3@plt>:
  201040:	ff 25 da 1f 00 00    	jmpq   *0x1fda(%rip)        # 203020 <func3>
  201046:	68 01 00 00 00       	pushq  $0x1
  20104b:	e9 d0 ff ff ff       	jmpq   201020 <func1+0x1f>

0000000000201050 <func4@plt>:
  201050:	ff 25 d2 1f 00 00    	jmpq   *0x1fd2(%rip)        # 203028 <func4>
  201056:	68 02 00 00 00       	pushq  $0x2
  20105b:	e9 c0 ff ff ff       	jmpq   201020 <func1+0x1f>

0000000000201060 <*ABS*+0x201000@plt>:
  201060:	ff 25 ca 1f 00 00    	jmpq   *0x1fca(%rip)        # 203030 <_DYNAMIC+0x1030>
  201066:	68 00 00 00 00       	pushq  $0x0
  20106b:	e9 e0 ff ff ff       	jmpq   201050 <func4@plt>

Disassembly of section .splt:

0000000000201070 <.splt>:
  201070:	f3 0f 1e fa          	endbr64 
  201074:	ff 25 9e 1f 00 00    	jmpq   *0x1f9e(%rip)        # 203018 <func2>
  20107a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  201080:	f3 0f 1e fa          	endbr64 
  201084:	ff 25 96 1f 00 00    	jmpq   *0x1f96(%rip)        # 203020 <func3>
  20108a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

Note the redundant splt. I'm using the previous version of the patch as the current one didn't compile for me. I will double check the most recent patch with a more recent compiler.

Bug fix for 32-bit, the Desc size is not same between 32-bit and 64-bit

Hi peter, this update will remove redundant .splt in your upper ifunc test.
The reason of redundant .splt is that we scan the IFUNC symbol after we collected the CET feature. So the splt was created first and without being removed after we found IFUNC.
Anyway this patch is just evade ifunc, I planed to hand ifunc+cet after this patch.
Thank your test! thank you too

The update:

 1 diff --git a/ELF/SyntheticSections.cpp b/ELF/SyntheticSections.cpp
 2 index 9066a84..a55ca1b 100644
 3 --- a/ELF/SyntheticSections.cpp
 4 +++ b/ELF/SyntheticSections.cpp
 5 @@ -304,6 +304,9 @@ void GnuPropertySection::writeTo(uint8_t *Buf) {
 6    uint8_t *DescBuf = Buf + 16;
 7    write32(DescBuf, GNU_PROPERTY_X86_FEATURE_1_AND); // Feature type
 8    write32(DescBuf + 4, 4);                          // Feature size
 9 +  // We need remove this if code when we supported IFUNC+CET.
10 +  if (!Config->SPltSectionEnable)
11 +    Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
12    write32(DescBuf + 8, Feature1AND);                // Feature flags
13  }
14
15 diff --git a/ELF/SyntheticSections.h b/ELF/SyntheticSections.h
16 index 626a76d..51c3515 100644
17 --- a/ELF/SyntheticSections.h
18 +++ b/ELF/SyntheticSections.h
19 @@ -697,7 +697,8 @@ public:
20    SPltSection();
21    void writeTo(uint8_t *Buf) override;
22    size_t getSize() const override;
23 -  bool empty() const override { return Entries.empty(); }
24 +  bool empty() const override {
25 +    return !Config->SPltSectionEnable || Entries.empty(); }
26
27    template <class ELFT> void addEntry(Symbol &Sym);
28  private:
29 diff --git a/ELF/Writer.cpp b/ELF/Writer.cpp
30 index 11639e5..341a557 100644
31 --- a/ELF/Writer.cpp
32 +++ b/ELF/Writer.cpp
33 @@ -421,10 +421,8 @@ template <class ELFT> static void createSyntheticSections() {
34      Add(In.Splt);
35    }
36
37 -  if (Config->X86Feature1AND) {
38 -    In.GnuProperty = make<GnuPropertySection>();
39 -    Add(In.GnuProperty);
40 -  }
41 +  if (Config->X86Feature1AND)
42 +    Add(make<GnuPropertySection>());
43
44    // .note.GNU-stack is always added when we are creating a re-linkable
45    // object file. Other linkers are using the presence of this marker
46 @@ -1669,8 +1667,6 @@ template <class ELFT> void Writer<ELFT>::finalizeSections() {
47    // earlier.
48    finalizeSynthetic(In.EhFrame);
49
50 -  finalizeSynthetic(In.GnuProperty);
51 -
52    for (Symbol *S : Symtab->getSymbols())
53      if (!S->IsPreemptible)

Update test to check splt's context

ruiu added a comment.Mar 25 2019, 9:56 AM

As I said I was working on refining this patch. Actually it turned out that it was more like a complete rewrite than a refinement, as this patch was not aligned very well with the existing lld codebase and also have several performance issues. I'll post a patch shortly.

I posted https://groups.google.com/forum/#!topic/x86-64-abi/Z6GqQDy_NaI to ask 3 questions regarding IBT-enabled PLT with lazy binding.

MaskRay requested changes to this revision.Jul 27 2020, 11:45 PM

Implemented. Can be closed now..

This revision now requires changes to proceed.Jul 27 2020, 11:45 PM
xiangzhangllvm abandoned this revision.Jul 28 2020, 6:11 PM