This is an archive of the discontinued LLVM Phabricator instance.

Retain all jump table range checks when using BTI.
ClosedPublic

Authored by simon_tatham on Jul 17 2023, 9:35 AM.

Details

Summary

This modifies the switch-statement generation in SelectionDAGBuilder,
specifically the part that generates case clusters of type CC_JumpTable.

A table-based branch of any kind is at risk of being a JOP gadget, if
it doesn't range-check the offset into the table. For some types of
table branch, such as Arm TBB/TBH, the impact of this is limited
because the value loaded from the table is a relative offset of
limited size; for others, such as a MOV PC,Rn computed branch into a
table of further branch instructions, the gadget is fully general.

When compiling for branch-target enforcement via Arm's BTI system,
many of these table branch idioms use branch instructions of types
that do not require a BTI instruction at the branch destination. This
avoids the need to put a BTI at the start of each case handler,
reducing the number of available gadgets with BTIs (i.e. ones
which could be used by a JOP attack in spite of the BTI system). But
without a range check, the use of a non-BTI-requiring branch also
opens up a larger range of followup gadgets for an attacker's use.

A defence against this is to avoid optimising away the range check on
the table offset, even if the compiler believes that no out-of-range
value should be able to reach the table branch. (Rationale: that may
be true for values generated legitimately by the program, but not
those generated maliciously by attackers who have already corrupted
the control flow.)

The effect of keeping the range check and branching to an unreachable
block is that no actual code is generated at that block, so it will
typically point at the end of the function. That may still cause some
kind of unpredictable code execution (such as executing data as code,
or falling through to the next function in the code section), but even
if so, there will only be one possible invalid branch target,
rather than giving an attacker the choice of many possibilities.

This defence is enabled only when branch target enforcement is in use.
Without branch target enforcement, the range check is easily bypassed
anyway, by branching in to a location just after it. But with
enforcement, the attacker will have to enter the jump table dispatcher
at the initial BTI and then go through the range check. (Or, if they
don't, it's because they already have a general BTI-bypassing
gadget.)

Diff Detail

Event Timeline

simon_tatham created this revision.Jul 17 2023, 9:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 17 2023, 9:35 AM
simon_tatham requested review of this revision.Jul 17 2023, 9:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 17 2023, 9:35 AM

I can confirm that this is Arm's preferred way of fixing this. The alternative is to always use a BTI setting indirect branch, but this requires adding BTI j in front of every valid target of the branch, which bloats code-size and leaves more targets for an attacker to indirectly jump to.

Branch Target Identification on AArch64, which doesn't have a direct equivalent of the mov pc, Rn non BTI setting indirect branch is not currently affected. However the architecture has reserved the use of RET x16 for this purpose so code-generation strategy may change to need this in the future on AArch64 too.

MaskRay accepted this revision.Jul 25 2023, 11:38 PM

I agree with the hardening side argument. I have checked FallthroughUnreachable uses in CC_BitTests and CC_Range for some CodeGen tests (mostly in X86/) and confirmed that they don't need the "branch-target-enforcement" special case.

A table-based branch of any kind is at risk of being a JOP gadget, if it doesn't range-check the offset into the table. ...

Consider adding CC_JumpTable to the paragraph. Its case label is many lines above and with the default context of git log, it's difficult to see how this change is relevant to CC_JumpTable ...

... many of these table branch idioms use branch instructions that do not set the BTI flag, so they can target instructions without BTI landing pads.

Q: what does "use branch instructions that do not set the BTI flag" mean?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11324

CurMF->getMMI().getModule()->getModuleFlag( is more common to get Module.

llvm/test/CodeGen/Thumb2/jump-table-bti.ll
2

There is a bti-jump-table.mir, but I think jump-table-* as this patch adds is a better name. It groups jump table tests together and matches ARM/jump-table-*.ll.

2

This test file needs a file-level comment describing what it tests, perhaps:

;; When BTI is enabled, keep the range check for a jump table for hardening, even with a unreachable default

4

Add quotes, otherwise this seems to trigger some substitutions in zsh: sed '/^..for-non-bti-build-sed-will-delete-everything-after-this-line/q'

This revision is now accepted and ready to land.Jul 25 2023, 11:38 PM

I can confirm that this is Arm's preferred way of fixing this. The alternative is to always use a BTI setting indirect branch, but this requires adding BTI j in front of every valid target of the branch, which bloats code-size and leaves more targets for an attacker to indirectly jump to.

I see that the AArch64 equivalent (by changing llvm.arm.space to llvm.aarch64.space) has bti j in front of every valid target of the branch, therefore bloating code size.
But how does this leave more targets for an attacker to indirectly jump to? Do you mean that these bti j can be the target of a malicious indirect jump from elsewhere?

However, isn't this as unsafe as jumping to the default basic block (in this test case, the end of the function, after the RET) for an out-of-range branch value in the current switch instruction?

I agree with the hardening side argument. I have checked FallthroughUnreachable uses in CC_BitTests and CC_Range for some CodeGen tests (mostly in X86/) and confirmed that they don't need the "branch-target-enforcement" special case.

A table-based branch of any kind is at risk of being a JOP gadget, if it doesn't range-check the offset into the table. ...

Consider adding CC_JumpTable to the paragraph. Its case label is many lines above and with the default context of git log, it's difficult to see how this change is relevant to CC_JumpTable ...

... many of these table branch idioms use branch instructions that do not set the BTI flag, so they can target instructions without BTI landing pads.

Q: what does "use branch instructions that do not set the BTI flag" mean?

AArch32 has quite a few ways to do an indirect branch as the PC is a writeable register. The way the M-profile BTI is defined is in terms of BTI setting and BTI clearing instructions, where the flag is the EPSR.B bit. Intuitively a BTI setting instruction must transfer control to a BTI clearing instruction. Some indirect branches such as MOV PC, <Src Reg> are not BTI setting so they are not required to transfer control to a BTI clearing instruction.

Quotes/paraphrases from the v8-m Arm ARM https://developer.arm.com/documentation/ddi0553/latest/

BTI clearing: Branch Target Identification clearing instruction. Any instruction that clears the EPSR.B bit to zero.
BTI setting: Branch Target Identification setting instruction. Any instruction that sets the EPSR.B bit to one.
...
EPSR.B bit:
Unless otherwise stated, when this bit is set the next executed instruction must be a BTI clearing instruction otherwise an INVSTATE UsageFault is generated.

The BTI setting instructions are:

* BLX.
* BLXNS.
* When the register holding the branch address is not the LR:
– BX.
– BXNS.
* When the address is loaded into the PC:
– LDR (register).
– LDR (literal).
* When the address is loaded into the PC and the base address register is either not the SP or the SP and write-back of the SP does not occur:
– LDR (immediate).
– LDM, LDMIA, LDMFD.
– LDMDB, LDMEA.

The BTI clearing instructions are:

* BTI.
* SG.
* PACBTI.

I can confirm that this is Arm's preferred way of fixing this. The alternative is to always use a BTI setting indirect branch, but this requires adding BTI j in front of every valid target of the branch, which bloats code-size and leaves more targets for an attacker to indirectly jump to.

I see that the AArch64 equivalent (by changing llvm.arm.space to llvm.aarch64.space) has bti j in front of every valid target of the branch, therefore bloating code size.
But how does this leave more targets for an attacker to indirectly jump to? Do you mean that these bti j can be the target of a malicious indirect jump from elsewhere?

Yes. The theory is that if the attacker finds some vulnerability outside of the function, they can jump to the start of each case statement, which increases the number of gadgets in the program, whether those can be usefully exploited or not will depend on the program. It also has the side-effect of increasing code-size.

However, isn't this as unsafe as jumping to the default basic block (in this test case, the end of the function, after the RET) for an out-of-range branch value in the current switch instruction?

In the AArch32 case the MOV PC, <REG> instruction can land anywhere, not just at the start of a BTI instruction. On AArch64 the BX <reg> instruction is at least constrained to a BTI compatible landing pad.

Hope I've understood the question well enough to answer there.

chill added a subscriber: chill.Jul 26 2023, 2:41 AM
chill added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11324
simon_tatham edited the summary of this revision. (Show Details)

Thanks for the review. New version addressing all those comments (I think):

Switched to finding the current module via CurMF as suggested.

Added @chill's suggestion of checking for a function attribute that overrides the module flag, and added some cases of that to the tests.

Added a file-header comment in the test, and zsh-proofing quotes in the sed commands.

Reworded the commit message to explicitly mention CC_JumpTable, and also to talk about branch instructions that do/don't require a BTI at their destination, instead of the more low-level terminology of whether they 'set the BTI flag'.

(This review is already marked as accepted, but @chill's point about function attributes required a noticeable change. @chill, I'll land this on Monday on the basis of the previous acceptance if I don't see any objections before then.)

chill accepted this revision.Jul 28 2023, 2:49 AM
chill added a comment.EditedJul 28 2023, 2:55 AM

LGTM.

Not sure about the sed usage, is it universally available (e.g. llvm-lit internal) or maybe the test would need REQUIRES: system-linux ?

Not sure about the sed usage, is it universally available (e.g. llvm-lit internal) or maybe the test would need REQUIRES: system-linux ?

I wasn't sure either, so I checked before deciding to use sed for the test preprocessing. The Software section of the Getting Started guide lists sed as one of the expected tools on the compilation host. And other lit tests already exist that use sed in pipelines without any special REQUIRES: to authorise it – a quick grep finds, for example, llvm/test/CodeGen/AArch64/speculation-hardening.ll.

I guess that means if you're building and testing on Windows, it's up to you to arrange to have all those tools on Windows one way or another. (Perhaps the 'git bash' environment provides good-enough ones?)

This revision was automatically updated to reflect the committed changes.