This is an archive of the discontinued LLVM Phabricator instance.

Add assembler plumbing for sanitize_memtag
ClosedPublic

Authored by hctim on Jun 30 2022, 4:50 PM.

Details

Summary

Extends the Asm reader/writer to support reading and writing the
'.memtag' directive (including allowing it on internal global
variables). Also add some extra tooling support, including objdump and
yaml2obj/obj2yaml.

Test that the sanitize_memtag IR attribute produces the expected asm
directive.

Uses the new Aarch64 MemtagABI specification
(https://github.com/ARM-software/abi-aa/blob/main/memtagabielf64/memtagabielf64.rst)
to identify symbols as tagged in object files. This is done using a
R_AARCH64_NONE relocation that identifies each tagged symbol, and these
relocations are tagged in a special SHT_AARCH64_MEMTAG_GLOBALS_STATIC
section. This signals to the linker that the global variable should be
tagged.

Diff Detail

Event Timeline

hctim created this revision.Jun 30 2022, 4:50 PM
hctim requested review of this revision.Jun 30 2022, 4:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 4:50 PM

This needs a discussion on https://github.com/ARM-software/abi-aa/ . There are not many STO_ bits and I am not sure the usage is worth reserving one bit for it.

fmayer added inline comments.Jul 1 2022, 8:58 AM
llvm/lib/MC/MCSymbolELF.cpp
39

_Shift for consistency?

204

Consider turning this into explicit branches, which I think is easier to understand. Up to you though.

uint32_t NewFlags;
if (Tagged)
   NewFlags = getFlags() | (1x << ELF_IsMemoryTagged);
else
   NewFlags = getFlags() & ~(1x << ELF_IsMemoryTagged);
hctim edited the summary of this revision. (Show Details)Jul 6 2022, 11:21 AM
hctim marked 2 inline comments as done.

This needs a discussion on https://github.com/ARM-software/abi-aa/ . There are not many STO_ bits and I am not sure the usage is worth reserving one bit for it.

Reworked: no longer uses st_other bits to signal to the linker to do work - instead, it uses a new android-specific section type filled with R_AARCH64_NONE relocations.

llvm/lib/MC/MCSymbolELF.cpp
204

sure, done, but slightly differently.

hctim updated this revision to Diff 442644.Jul 6 2022, 11:21 AM
hctim marked an inline comment as done.

Update.

MaskRay added inline comments.Jul 7 2022, 10:33 AM
llvm/include/llvm/MC/MCDirectives.h
49

So that next time a value is added, no need to touch the MCSA_Tagged line.

llvm/include/llvm/MC/MCELFObjectWriter.h
143

Use imperative sentences. (Omitting s is much more common)

llvm/lib/MC/ELFObjectWriter.cpp
1091

The chunk of code doesn't belong here. See finalizeCGProfile

llvm/lib/MC/MCParser/AsmParser.cpp
5607

The name .tagged is general. Worth using a name more specific to memtag.

llvm/lib/MC/MCSymbolELF.cpp
201

you may omit 0x

207

you may omit 0x

llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFObjectWriter.cpp
37
459

Omit the variable. Just return

llvm/test/CodeGen/AArch64/global-tagging.ll
3 ↗(On Diff #442644)

llvm-mc tests belong to test/MC/AArch64

4 ↗(On Diff #442644)

llvm-readelf -r is better when the number of relocations is to be checked.

6 ↗(On Diff #442644)

Check D107949 how I add yaml2obj/obj2yaml/llvm-readobj/llvm-objdump. You can use that as the first patch, and llvm-mc support as the second.

12 ↗(On Diff #442644)

It usually reads better if for a particular check prefix, the content is aligned.

hctim updated this revision to Diff 443044.Jul 7 2022, 1:30 PM
hctim marked 12 inline comments as done.

Update with Ray's comments.

llvm/lib/MC/ELFObjectWriter.cpp
1091

I've moved it into a member function, but unlike finalizeCGProfile, we're not writing out a section here, we're only adding a set of relocations (and a special section to hold them). Our section has no bytes, just a set of relocations that reference it.

I don't see any other reasonable places to add this "add some relocations" pass (apart from in the ELFWriter c'tor).

llvm/test/CodeGen/AArch64/global-tagging.ll
6 ↗(On Diff #442644)

yaml2obj/obj2yaml/llvm-readelf/llvm-objdump here aren't actually being updated (I've removed an ELF note reservation from previous patchsets), they're just being used to check that the .memtag directive forces a relocation. I don't think testing each tool is necessary.

eugenis added inline comments.Jul 12 2022, 4:57 PM
llvm/include/llvm/MC/MCAsmInfo.h
777

getMemtagAttr ?

llvm/include/llvm/MC/MCSymbolELF.h
47

Why are these names so random?

getMemTag()
setMemTag()

or something similar with "is"

hctim marked 2 inline comments as done.Jul 12 2022, 5:22 PM
hctim added inline comments.
llvm/include/llvm/MC/MCSymbolELF.h
47

done, modelled after setVisibility / getVisibility above

hctim updated this revision to Diff 444116.Jul 12 2022, 5:22 PM
hctim marked an inline comment as done.

Unify the naming, based on 'Memtag'.

MaskRay added subscribers: Wilco1, peter.smith.EditedJul 13 2022, 10:48 AM

The implementation looks good to me. I assume that you have authority to define a value in SHT_ANDROID_*.

My only concern is whether regular AArch64 may need something similar in the future, then whether the section name .memtag.globals and the directive name .memtag may cause a recognition conflict.
Technically the assembler may decode the environment from the target triple (aarch64-linux-android31) and do different things on Android and on non-Android, but I personally favor different section names/directives
which can help Android migrate to the standard.

@Wilco1 @peter.smith

llvm/test/MC/AArch64/global-tagging.ll
7

> => -o

12

Instead of interleaving YAML/RELOCS/ASM checks, I think it seems more readable to group YAML/RELOCS/ASM separately.

enh added a subscriber: enh.Jul 13 2022, 11:56 AM

The implementation looks good to me. I assume that you have authority to define a value in SHT_ANDROID_*.

i own Android's elf.h and dynamic linker [citations needed: https://cs.android.com/android/platform/superproject/+/master:bionic/libc/include/elf.h and https://cs.android.com/android/platform/superproject/+/master:bionic/OWNERS], and this lgtm :-)

My only concern is whether regular AArch64 may need something similar in the future, then whether the section name .memtag.globals and the directive name .memtag may cause a recognition conflict.
Technically the assembler may decode the environment from the target triple (aarch64-linux-android31) and do different things on Android and on non-Android, but I personally favor different section names/directives
which can help Android migrate to the standard.

@Wilco1 @peter.smith

yeah, ideally we'll be able to get everyone in sync in the end, even if we have to ship first (like with the compressed relocations stuff).

Purely on the ABI front, I've got some questions and suggestions.

As a general open question, what are your ABI plans? Is it to create an Android specific ABI for .memtag, which presumably will need to be maintained for some period of time? I assume that if there is an alternative Linux ABI defined, presumably in the AArch64 ABI https://github.com/ARM-software/abi-aa then you'll consider adopting that?

As an aside the ABI is open for contributions https://github.com/ARM-software/abi-aa/blob/main/CONTRIBUTING.md if there is an effort to get a memtag ABI defined (not aware of one internally) we'd like to work with everyone to write something up. The PAuthABI (https://discourse.llvm.org/t/llvm-pointer-authentication-sync-ups/62661/5) is an example of that. We have an ELF extension https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst that has some additional relocations and dynamic relocations reserved.

For Arm/AArch64 specific sections, although I can't find it written down as "reserved" we try and follow a naming convention for Arm/AArch64 specific sections such as .ARM.attributes . My expectation is that anything that we define that is SHT_PROGBITS that is specific to Arm/AArch64 will have a .ARM. or .AArch64. prefix. If your section name is specific to Android, but could be supported on other architectures at some future point, could you use something like .Android.memtag?

For the .memtag directive there could be a parameter, potentially optional that changed its behaviour. For example .memtag huge, android .memtag huge, arm. It wouldn't be the end of the world if there had to be a second directive with a different name, although that would be potentially confusing. There doesn't seem to be a naming convention for directives in GNU assembler like there would be section names.

hctim added a comment.Jul 14 2022, 1:24 PM

Purely on the ABI front, I've got some questions and suggestions.

As a general open question, what are your ABI plans? Is it to create an Android specific ABI for .memtag, which presumably will need to be maintained for some period of time? I assume that if there is an alternative Linux ABI defined, presumably in the AArch64 ABI https://github.com/ARM-software/abi-aa then you'll consider adopting that?

I think the AArch64 ABI is a reasonable place for this - however my main concern is that I have zero experience with non-Android loaders, and no experience with other linkers outside of lld. As such, it's hard to estimate the "unknown unknowns" of those areas.

There is also a separate piece that is part of this work that belongs in the linker. In summary, it's a special section with a series of ULEB-encoded pairs of {offset, size} descriptors (one for each global variable that wants to be tagged).

I'll draft up an MTE AArch64 ABI extension over there, but will probably need some hand holding :).

For now though, to allow experimentation and further development, it would be great to temporarily reserve some section identifiers and semantics. Do you think we should use aarch64 reservations, or Android reservations? I would hope that we can avoid a RELR-style launch where Android supports some scheme for a few generations, then the generic ABI variant comes about, and then Android has to support both (and app developers have to use the old one for backwards compatibility, but that's not always obvious).

As an aside the ABI is open for contributions https://github.com/ARM-software/abi-aa/blob/main/CONTRIBUTING.md if there is an effort to get a memtag ABI defined (not aware of one internally) we'd like to work with everyone to write something up. The PAuthABI (https://discourse.llvm.org/t/llvm-pointer-authentication-sync-ups/62661/5) is an example of that. We have an ELF extension https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst that has some additional relocations and dynamic relocations reserved.

Fortunately, the current design fits without having to create new relocations (we did have a previous RFC that required new relocations for optimisation, but that's not our focus right now). We do re-use existing relocation types in a new novel way (like the R_NONE in a special section in this patch, or applications of memory tags in R_RELATIVE and other dynamic relocations).

For Arm/AArch64 specific sections, although I can't find it written down as "reserved" we try and follow a naming convention for Arm/AArch64 specific sections such as .ARM.attributes . My expectation is that anything that we define that is SHT_PROGBITS that is specific to Arm/AArch64 will have a .ARM. or .AArch64. prefix. If your section name is specific to Android, but could be supported on other architectures at some future point, could you use something like .Android.memtag?

For now though, to allow experimentation and further development, it would be great to temporarily reserve some section identifiers and semantics. Do you think we should use aarch64 reservations, or Android reservations?

For the .memtag directive there could be a parameter, potentially optional that changed its behaviour. For example .memtag huge, android .memtag huge, arm. It wouldn't be the end of the world if there had to be a second directive with a different name, although that would be potentially confusing. There doesn't seem to be a naming convention for directives in GNU assembler like there would be section names.

Isn't the compile target enough for this? For example, we might also want to (on some Android NDK version gate) do something different, which would be implied by the ## part of the aarch64-linux-android## target. I also believe that the name here is generic enough to be used by other architectures (x86, etc.) in future.

Purely on the ABI front, I've got some questions and suggestions.

As a general open question, what are your ABI plans? Is it to create an Android specific ABI for .memtag, which presumably will need to be maintained for some period of time? I assume that if there is an alternative Linux ABI defined, presumably in the AArch64 ABI https://github.com/ARM-software/abi-aa then you'll consider adopting that?

I think the AArch64 ABI is a reasonable place for this - however my main concern is that I have zero experience with non-Android loaders, and no experience with other linkers outside of lld. As such, it's hard to estimate the "unknown unknowns" of those areas.

There is also a separate piece that is part of this work that belongs in the linker. In summary, it's a special section with a series of ULEB-encoded pairs of {offset, size} descriptors (one for each global variable that wants to be tagged).

I'll draft up an MTE AArch64 ABI extension over there, but will probably need some hand holding :).

For now though, to allow experimentation and further development, it would be great to temporarily reserve some section identifiers and semantics. Do you think we should use aarch64 reservations, or Android reservations? I would hope that we can avoid a RELR-style launch where Android supports some scheme for a few generations, then the generic ABI variant comes about, and then Android has to support both (and app developers have to use the old one for backwards compatibility, but that's not always obvious).

Thank you. The best place to start would be to raise a github issue https://github.com/ARM-software/abi-aa/issues to cover the overall addition. If you can put a summary (or link to a description elsewhere) of what all the parts to the extension are in the issue then we can advise on the best way forward.

We do have people from our GNU team that have some experience of glibc and GNU ld so hopefully we can find anything that wouldn't be implementable in these areas.

As an aside the ABI is open for contributions https://github.com/ARM-software/abi-aa/blob/main/CONTRIBUTING.md if there is an effort to get a memtag ABI defined (not aware of one internally) we'd like to work with everyone to write something up. The PAuthABI (https://discourse.llvm.org/t/llvm-pointer-authentication-sync-ups/62661/5) is an example of that. We have an ELF extension https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst that has some additional relocations and dynamic relocations reserved.

Fortunately, the current design fits without having to create new relocations (we did have a previous RFC that required new relocations for optimisation, but that's not our focus right now). We do re-use existing relocation types in a new novel way (like the R_NONE in a special section in this patch, or applications of memory tags in R_RELATIVE and other dynamic relocations).

For Arm/AArch64 specific sections, although I can't find it written down as "reserved" we try and follow a naming convention for Arm/AArch64 specific sections such as .ARM.attributes . My expectation is that anything that we define that is SHT_PROGBITS that is specific to Arm/AArch64 will have a .ARM. or .AArch64. prefix. If your section name is specific to Android, but could be supported on other architectures at some future point, could you use something like .Android.memtag?

For now though, to allow experimentation and further development, it would be great to temporarily reserve some section identifiers and semantics. Do you think we should use aarch64 reservations, or Android reservations?

If we've got the opportunity to change while the feature is in development then either one is fine. In the ABI we've not yet got any .aarch64. reservations yet. We have .ARM.attributes reserved for BuildAttributes but I'm unsure whether the intent was that .ARM. would be the prefix for AArch64 or the more likely case that .ARM.attributes was chosen as existing tools already recognised that section name from the 32-bit ABI. At the moment the Android reservation is outside of anything Arm would define so you have a guarantee of no name clash if the ABI process falls through for any reason. If we can make the AArch64 ABI work then I hope it would be possible to change it later.

For the .memtag directive there could be a parameter, potentially optional that changed its behaviour. For example .memtag huge, android .memtag huge, arm. It wouldn't be the end of the world if there had to be a second directive with a different name, although that would be potentially confusing. There doesn't seem to be a naming convention for directives in GNU assembler like there would be section names.

Isn't the compile target enough for this? For example, we might also want to (on some Android NDK version gate) do something different, which would be implied by the ## part of the aarch64-linux-android## target. I also believe that the name here is generic enough to be used by other architectures (x86, etc.) in future.

The target is enough, we'd have to be careful in the tests to not abbreviate the triple too much but that is not a major problem.

hctim added a subscriber: pcc.Aug 9 2022, 10:36 AM
enh added a comment.Aug 9 2022, 11:24 AM

For now though, to allow experimentation and further development, it would be great to temporarily reserve some section identifiers and semantics. Do you think we should use aarch64 reservations, or Android reservations? I would hope that we can avoid a RELR-style launch where Android supports some scheme for a few generations, then the generic ABI variant comes about, and then Android has to support both (and app developers have to use the old one for backwards compatibility, but that's not always obvious).

fwiw, i don't think this case is as bad as compressed relocations because there's no NDK MTE ABI anyway, so this would only affect platform code. it would be annoying for SoC vendors/OEMs needing to rebuild prebuilts (and we'd need the backwards compatibility for 3+ years so that new system image + old vendor image still worked), but at least this wouldn't leak to app developers.

that said, it would still be ideal to get to an official ARM ABI before Android U is finalized early next year --- that would make it more likely we can avoid any disruption.

hctim added a comment.Aug 9 2022, 11:34 AM

For now though, to allow experimentation and further development, it would be great to temporarily reserve some section identifiers and semantics. Do you think we should use aarch64 reservations, or Android reservations? I would hope that we can avoid a RELR-style launch where Android supports some scheme for a few generations, then the generic ABI variant comes about, and then Android has to support both (and app developers have to use the old one for backwards compatibility, but that's not always obvious).

fwiw, i don't think this case is as bad as compressed relocations because there's no NDK MTE ABI anyway, so this would only affect platform code. it would be annoying for SoC vendors/OEMs needing to rebuild prebuilts (and we'd need the backwards compatibility for 3+ years so that new system image + old vendor image still worked), but at least this wouldn't leak to app developers.

that said, it would still be ideal to get to an official ARM ABI before Android U is finalized early next year --- that would make it more likely we can avoid any disruption.

I think it could leak to very progressive app developers - globals are tagged on a per-DSO basis and (at least now) dlopen doesn't discriminate between platform DSOs and app DSOs.

We could change that if we didn't want to have ABI problems, but if we ship MTE globals in an apex, we need the backwards compatibility anyway.

In the MemtagABI doc I'm drafting it also has dynamic table entries so that we can move off of the elf note for specifying heap/stack/etc., so that'd need to stick around for a few generations as well.

I think we'll just end up supporting both an old and a new for a few generations anyway.

hctim added a comment.Aug 16 2022, 5:31 PM

I've uploaded the first draft of the MemtagABI in https://github.com/ARM-software/abi-aa/pull/166

Please send ABI discussion over there.

For now, I'll change any implementation semantics to reflect that draft ABI and use some identifiers in the OS-experimental ranges to allow this to be upstreamed without dependencies on the ABI approval. We'll consider it an experimental ABI for Android use, but not plan to ship any of it right now.

Thanks very much for writing the ABI doc. Will take a look and leave comments on the pull request.

hctim updated this revision to Diff 454100.Aug 19 2022, 1:42 PM
hctim marked 2 inline comments as done.

Addressed the outstanding comments. Moved the Android-specific section name to line up with the MemtagABI name (just in the Android-specific space until that ABI is merged - in order to unblock our testing/merging).

@peter.smith, @MaskRay, @eugenis - would you mind taking a final review of this please? ELF types are Android-specific variants of those described in the MemtagABI draft (https://github.com/ARM-software/abi-aa/pull/166), so that we can merge this for Android testing.

I can take a look from the perspective of the ABI, not too familiar with the implementation but will try my best. I'm running a bit behind schedule so it may be a few days before I can comment. Very happy if others want to approve, particularly on the implementation.

peter.smith accepted this revision.Aug 31 2022, 8:51 AM

One really small comment, that is only a suggestion. This looks good to me. I've set approved, but please give a bit of time to see if there are any objections.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
714

For the moment is it worth extending the check to an Android OS, that could be going over the top though.

This revision is now accepted and ready to land.Aug 31 2022, 8:51 AM
hctim marked an inline comment as done.Sep 1 2022, 2:19 PM
hctim added inline comments.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
714

done, and moved from assert -> error.

hctim updated this revision to Diff 457402.Sep 1 2022, 2:20 PM
hctim marked an inline comment as done.

Update w/ comments.

hctim updated this revision to Diff 457431.Sep 1 2022, 3:46 PM

Move some existing stack tests to use -fsanitize=memtag-stack now that we have a non-android warning on -fsanitize=memtag-globals (which is implied by -fsantize=memtag, which the tests previously used).

Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2022, 3:46 PM
eugenis added inline comments.Sep 9 2022, 11:52 AM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
713

isTagged() already includes a check for hasSanitizerMetadata()

hctim marked an inline comment as done.Sep 12 2022, 12:33 PM
hctim updated this revision to Diff 459534.Sep 12 2022, 12:33 PM

Remove unnecessary hasSanitizerMetadata()

hctim edited the summary of this revision. (Show Details)Nov 23 2022, 1:59 PM
hctim updated this revision to Diff 477600.Nov 23 2022, 1:59 PM

Use new MemtagABI section types, now that it's merged upstream: https://github.com/ARM-software/abi-aa/blob/main/memtagabielf64/memtagabielf64.rst

fmayer added inline comments.Nov 23 2022, 2:20 PM
llvm/lib/MC/ELFObjectWriter.cpp
622

Is the story here that on non-AArch64, this function gets run, but there will be no SymE.isMemtag()?

1326–1333

move this after the assert for Sym

hctim updated this revision to Diff 477620.Nov 23 2022, 3:02 PM
hctim marked 2 inline comments as done.

Move isMemtag below assert(Sym)

MaskRay accepted this revision.Nov 23 2022, 11:47 PM
fmayer accepted this revision.Nov 28 2022, 10:01 AM
fmayer added inline comments.
llvm/lib/MC/ELFObjectWriter.cpp
622

optional: the assert message does not really convey that. Maybe something like "Unexpectedly found tagged global"

hctim updated this revision to Diff 478367.Nov 28 2022, 2:07 PM
hctim marked an inline comment as done.

Update comment for .memtag globals when generating object files on non-MTE-globals supported architectures, and move said comment from an assert to a fatal error.

llvm/lib/MC/ELFObjectWriter.cpp
622

Correct, and getMemtagRelocsSection() returns nullptr on every arch except aarch64.

This revision was landed with ongoing or failed builds.Dec 1 2022, 10:51 AM
This revision was automatically updated to reflect the committed changes.