Introduce a ThunkInserter CRTP base class from which new thunk types can inherit, e.g., thunks to mitigate https://software.intel.com/security-software-guidance/software-guidance/load-value-injection.
Details
Diff Detail
Event Timeline
By the way, I had initially implemented this patch with a pure virtual base class and a retpoline thunk (and later LVI thunk) class that implements the interface. However, I could not for the life of me structure the classes in a manner that would allow the compiler to devirtualize. Using CRTP admittedly sacrifices some readability, but it does not prevent the compiler from inlining RetpolineThunkInserter's methods.
Asking to learn here. I've not heard of CRTP and don't quite understand your explanation.
- What do you mean by allowing the compiler to devirtualize?
- Why is it preferable to allow the compiler to devirtualize?
- Why is it desirable to allow the compiler to inline RetpolineThunkInserter's methods?
- Why is that preferable to readability in this case? Is the implication that there is a large perf impact to not use CRTP?
More comments later.
llvm/lib/Target/X86/X86IndirectThunks.cpp | ||
---|---|---|
77 | Why not define this and the above function inline like you do for the LVI version in the 3/3 patch in this series? |
Recommend the article here: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern
- What do you mean by allowing the compiler to devirtualize?
Suppose you have
struct Base { virtual void foo() = 0; }; struct D1 : Base { void foo() { … }; }; struct D2 final : Base { void foo() { … }; }; void barB(B *bptr) { bptr->foo(); } // cannot devirtualize void barD1(D1 *d1ptr) { d1ptr->foo(); } // cannot devirtualize void barD2(D2 *d2ptr) { d2ptr->foo(); } // can devirtualize
Devirtualizing a call means that you don't need to look up the method in the object's vtable. In short, a compiler can devirtualize a call if the target callee is unambiguous. In BarB(), the call to foo() cannot be devirtualized, because the pointee may be of type D1, D2, or something else that derives from D1. In BarD1(), the call also cannot be devirtualized because the compiler has no way of knowing that something else, perhaps in some other translation unit, may derive from D1 and have its own foo() implementation. The call to foo() in barD2() CAN be devirtualized because D2 is final and thus nothing can derive from it.
- Why is it preferable to allow the compiler to devirtualize?
- You save yourself two loads: loading the object's vtable, and loading the address of the target method.
- You allow the compiler to inline the callee.
- Why is it desirable to allow the compiler to inline RetpolineThunkInserter's methods?
This is my opinion and someone else may disagree. I suspect that most code compiled for X86 will not actually use any of these special thunks. But this pass is always run anyways, and the thunk inserter(s) need to look at every function and determine whether it is a thunk, or a function that needs a thunk. If these checks are being made as virtual calls, then IMO those cycles are being wasted. Virtual calls are most useful for big, scalable software that may change frequently. I see the thunk inserter as something fairly static that may only need to add a new thunk, say, every two years or so.
- Why is that preferable to readability in this case? Is the implication that there is a large perf impact to not use CRTP?
I don't know whether there will be a large performance impact, but there will be an impact. If the impact only applied to code that was using thunks, I think this would be ok. But some of the logic is being applied to all code, and therefore IMO should be as fast as possible.
I think I got all the answers to my questions from elsewhere. Lmk if these don't follow your rationale.
- What do you mean by allowing the compiler to devirtualize?
- Why is it preferable to allow the compiler to devirtualize?
devirtualize -> Avoid virtual dispatch. Virtual dispatch has overhead we prefer to avoid. LLVM has a preference to avoid virtual dispatch where possible and there is precedent within the codebase for using CRTP.
- Why is it desirable to allow the compiler to inline RetpolineThunkInserter's methods?
Performance
- Why is that preferable to readability in this case?
Following LLVM precedent.
- Is the implication that there is a large perf impact to not use CRTP?
Yes.
More comments later once I go through the whole thing. Sorry about not code reviewing all at once. I know it's not a desirable pattern.
Why not define this and the above function inline like you do for the LVI version in the 3/3 patch in this series?