This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] [sanitizers] Add MSan support for AArch64
ClosedPublic

Authored by zatrazz on Sep 8 2015, 2:14 PM.

Download Raw Diff

Details

Reviewers

rengolin
t.p.northover
aemerson
pcc

Summary

This patch adds support for msan on aarch64-linux for both 39 and
42-bit VMA. The support is enabled by defining the
SANITIZER_AARCH64_VMA compiler flag to either 39 or 42 at build time
for both clang/llvm and compiler-rt. The default VMA is 39 bits.

Diff Detail

Event Timeline

zatrazz updated this revision to Diff 34260.Sep 8 2015, 2:14 PM

zatrazz retitled this revision from to [PATCH] [sanitizers] Add MSan support for AArch64.

zatrazz updated this object.

zatrazz added reviewers: t.p.northover, aemerson, rengolin, pcc.

zatrazz added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. · View Herald TranscriptSep 8 2015, 2:14 PM

eugenis added a subscriber: eugenis.Sep 8 2015, 2:39 PM

eugenis added inline comments.

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	What's that? It's not even attached to any cmake setting. Does it mean that a typical clang binary can target all of x86/arm/ppc/etc, 32/64, but only one fixed aarch64 kernel configuration? I really think this should be a compiler flag. And it should be possible to remove the corresponding macro in compiler-rt as well by exporting this setting through a global variable in the compiled objects (see __msan_keep_going or the ASAN_FLEXIBLE_MAPPING thing).

samsonov added a subscriber: samsonov.Sep 8 2015, 2:51 PM

samsonov added inline comments.

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	+1. I'm sorry that we didn't figure it out promptly during reviews where this define was added to another sanitizers. We should rework this - building compiler and runtimes with a secret define is not the way to go.

rengolin added inline comments.Sep 9 2015, 4:10 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	What's that? An unfortunate choice. We're in need of a better one. It's not even attached to any cmake setting. CFLAGS. I know... Does it mean that a typical clang binary can target all of x86/arm/ppc/etc, 32/64, but only one fixed aarch64 kernel configuration? I think so. If we want to target both 39 and 42, we'd have to create two separate libraries every time. With 48 coming to be the default in a year or so, we'd have to keep three. If that's acceptable, than I think we should create them all every time. If not, we need a way to select them at build time. I really think this should be a compiler flag. Can you expand on that? This is a temporary implementation detail that is very specific to AArch64, and one that will disappear soon. Currently, it is a compiler flag. Not a pretty one, but one that we expect to be used very rarely. In the future, when it all becomes 48 bits, it'll be used even less, and possibly deprecated altogether when we give up supporting the "old" VMA configurations. Creating an -mvma or anything like that would be kinda pointless, and harder to revert later.

zatrazz added inline comments.Sep 9 2015, 11:48 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	What's that? It's not even attached to any cmake setting. Does it mean that a typical clang binary can target all of x86/arm/ppc/etc, 32/64, but only one fixed aarch64 kernel configuration? I really think this should be a compiler flag. And it should be possible to remove the corresponding macro in compiler-rt as well by exporting this setting through a global variable in the compiled objects (see __msan_keep_going or the ASAN_FLEXIBLE_MAPPING thing). I though about adding it as compiler flag and let the compiler/user decide to which VMA to use. However, as pointed out by rengolin it would require to build multiple sanitizer libraries for each VMA and select them at link time. If it is the desired target I can evaluate this option, however I would like to avoid such complication from the fact recent kernel change default VMA to 48 bits and the idea is to use only this config.

eugenis added inline comments.Sep 9 2015, 12:39 PM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	It should be possible to have a single runtime library that would detect VMA at runtime. Maybe it would cost us a percent or two of performance. Having said that, I don't think this macro, as it is, would cause us any problems in practice, and I don't mind if you go ahead with this patch. Two questions: Does lower VMA setting work on kernels configured for higher VMA, or does it have to be an exact match? is there a runtime check for an incompatible or suboptimal VMA setting?

It should be possible to have a single runtime library that would detect VMA at runtime. Maybe it would cost us a percent or two of performance.

The performance issue is not the runtime detection, but how to disentangle all the macros and one VMA assumption
in sanitizer code. The MSAN compiler-rt patch relies on define both the internal allocator (which can be SizeClassAllocator32
or SizeClassAllocator64 depending of the available VMA size) and transformation macros (MEM_TO_SHADOW) that are
used extensively in a lot of places.

And there is the question if this work will be really useful, since the idea is indeed to have only one VMA as default (48-bit
for aarch64).

Having said that, I don't think this macro, as it is, would cause us any problems in practice, and I don't mind if you go ahead with this patch.

Two questions:

Does lower VMA setting work on kernels configured for higher VMA, or does it have to be an exact match?

It must be a exact match because compiler-rt sanitizer libraries mmap specific areas for shadow and origin memory
and they are calculated based on where the user regions are placed. For instance, for 39-bit VMA some text segment
is expected to be between 0x7000000000-0x8000000000 where for 42-bit VMA 0x3f000000000-0x40000000000. This
generates complete different mappings.

is there a runtime check for an incompatible or suboptimal VMA setting?

Not right now, but it is something I will implement.

In D12707#243160, @zatrazz wrote:

is there a runtime check for an incompatible or suboptimal VMA setting?

Not right now, but it is something I will implement.

That's a good idea, especially now that we have two "defaults", depending on the distro.

kristof.beyls added a subscriber: kristof.beyls.Sep 10 2015, 7:21 AM

kristof.beyls added inline comments.

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	I'm a bit confused. It seems to me that this is adding functionality to the code-generation part of the sanitizer, i.e. embedded in the compiler right? And we're making a decision at compiler-construction time about only supporting one variant of AArch64 linux kernels? That doesn't seem right to me as at compiler-construction time you may want to create a compiler that can target all the different variants of linux kernels.

http://reviews.llvm.org/D12763

eugenis added inline comments.Sep 10 2015, 11:52 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	Yeah, it does sound quite bad. I'd also like to argue that lower VMA settings are unlikely to go away in the near future. Even when the linux kernel defaults to 48, there are platforms using older kernels, already released devices with at least a few years of lifetime. For example, a toolchain in the Android NDK would need to support VMA variants for a wide set of devices, and bundling a whole extra toolchain or two just for aarch64 is not an option. I take back my earlier comment. I don't see a way around -mvma. I agree that a runtime VMA detection (i.e. a single runtime library for all VMA variants) is not realistic. Sounds like we'll need to build 3 libraries and pick the correct one in the driver based on -mvma value.

rengolin added inline comments.Sep 11 2015, 2:20 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	I didn't mean 39/42 are going away, just that we'll probably move the default some time in the future. In the long run, I agree that -mvma will be required, if anything, to pick one of the versions. We could also have the detection in the compiler itself, to act as a default, but due to cross-compilation issues, we'll need the flag anyway. Kristof also mentioned about name mangling as a way to do some run time selection, but that would also require some magic in the run time, either a dispatch table, or self-modifying code at start up (like IFUNC). I don't think we're at the stage that we can take any of these decisions lightly. Given the status of the sanitizer code, where a lot of it is either hard-coded or based on pre-processor macros, trying anything dynamic will be a big refactoring change. One that we may need, granted, but not at the expense of having the sanitizers working on the two VMA configurations we have today in some way. As I said earlier, we're discussing this, and maybe we should put forward a BOF session in the US LLVM to discuss the refactoring, but meanwhile, this allows us to test it on both VMAs as we're doing for the other sanitizers. We haven't taken any decision, nor we're not changing anything here, just adding functionality.

kristof.beyls added inline comments.Sep 11 2015, 2:37 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	I don't think that name mangling requires dispatch tables or self-modifying code. I agree that name mangling or another mechanism to fail at binary build/link time rather than at run time can be done as part of a separate patch. However, I do think that hard-coding/restricting which linux distribution you can target when building the compiler is evil, and this patch shouldn't do that. I therefore think that some command line option indeed is needed with a reasonable default before this patch can land.

rengolin added inline comments.Sep 11 2015, 2:47 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	There is a compilation option to control that, -DSANITIZER_AARCH64_VMA=42. I agree it's not the best, but being temporary until we can refactor all the other hard-coded stuff to make this work at run time without performance impacts, I think it's acceptable.

zatrazz added inline comments.Sep 11 2015, 5:33 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	This patch itself is a enablement one meant to make it possible to run and check msan output in current AArch64 scenarios. The idea is to be able to continuous check for regression by buildbots and add an initial support. We do agree that current approach (using a compiler flag) is not the best option and it should go away with a better strategy. However, as pointed out I also do agree it should be done in another patch and incrementally. It requires some refactor from compiler-rt to build/generate multiple versions of the required symbols used by the sanitizers and the implementation of the flag mechanism. My idea is to after finish the msan sanitizer (aarch64-linux now supports asan, tsan, and dfsan) is to work on a mechanism to select the vma at build time and make the compiler-rt transparent to it (either by using name mangling, runtime detection, or multiples compiler-rt builds). So should we still block this patch based on the assumption that this mechanism will be one used for general deployment?

kristof.beyls added inline comments.Sep 11 2015, 5:44 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	My main objection is that - unless I misunderstand - the current patch doesn't offer a command line flag to clang and/or llvm to specify the vma size of the system you're targeting. clang/llvm offers cross-compilation support out-of-the-box and only supporting a single vma size in built libraries just seems such a big regression over by-default cross-compilation support that I think this patch shouldn't land without a way to support instrumentation for different vma sizes. The way the patch currently is, it's basically impossible to create a single clang binary that can offer native support across all linux-aarch distributions. I think that's just too wrong. Furthermore, it seems like a minor change to this patch to not hard-code which vma size is supported in the llvm/clang binary being built, but instead having an llvm flag for this. Basically, I object to using predefines like SANITIZER_AARCH64_VMA in the compiler source code - i.e. this patch.

rengolin added inline comments.Sep 11 2015, 6:00 AM

lib/Transforms/Instrumentation/MemorySanitizer.cpp
125	My main objection is that - unless I misunderstand - the current patch doesn't offer a command line flag to clang and/or llvm to specify the vma size of the system you're targeting. It does, -DSANITIZER_AARCH64_VMA=42. We are testing both 39 and 42 bits VMA on our buildbots, so it is possible to change it. The way the patch currently is, it's basically impossible to create a single clang binary that can offer native support across all linux-aarch distributions. I think that's just too wrong. That's not this patch, that's how the whole sanitizer is hard-coded to be. That's a very important thing we need to implement, which will be our next step, but this is not related to this patch. This specific patch is not using anything that the other sanitizers are not, since we're not inventing the macro here, now. Because all sanitizers use the same macro, once we fix for one, we fix for all. But this is not the point of this patch.

rengolin accepted this revision.Sep 15 2015, 7:45 AM

rengolin edited edge metadata.

This revision is now accepted and ready to land.Sep 15 2015, 7:45 AM

I will push this patch.

Wrong patch (this was meant for compiler-rt).

zatrazz closed this revision.Sep 16 2015, 8:14 AM

Revision Contents

Path

Size

lib/

Transforms/

Instrumentation/

MemorySanitizer.cpp

34 lines

Diff 34890

lib/Transforms/Instrumentation/MemorySanitizer.cpp

Context not available.

	#define DEBUG_TYPE "msan"	#define DEBUG_TYPE "msan"

		// VMA size definition for architecture that support multiple sizes.
		// AArch64 has 3 VMA sizes: 39, 42 and 48.
		#ifndef SANITIZER_AARCH64_VMA
		eugenisUnsubmitted Not Done Reply Inline Actions What's that? It's not even attached to any cmake setting. Does it mean that a typical clang binary can target all of x86/arm/ppc/etc, 32/64, but only one fixed aarch64 kernel configuration? I really think this should be a compiler flag. And it should be possible to remove the corresponding macro in compiler-rt as well by exporting this setting through a global variable in the compiled objects (see __msan_keep_going or the ASAN_FLEXIBLE_MAPPING thing). eugenis: What's that? It's not even attached to any cmake setting. Does it mean that a typical clang…
		samsonovUnsubmitted Not Done Reply Inline Actions +1. I'm sorry that we didn't figure it out promptly during reviews where this define was added to another sanitizers. We should rework this - building compiler and runtimes with a secret define is not the way to go. samsonov: +1. I'm sorry that we didn't figure it out promptly during reviews where this define was added…
		rengolinUnsubmitted Not Done Reply Inline Actions What's that? An unfortunate choice. We're in need of a better one. It's not even attached to any cmake setting. CFLAGS. I know... Does it mean that a typical clang binary can target all of x86/arm/ppc/etc, 32/64, but only one fixed aarch64 kernel configuration? I think so. If we want to target both 39 and 42, we'd have to create two separate libraries every time. With 48 coming to be the default in a year or so, we'd have to keep three. If that's acceptable, than I think we should create them all every time. If not, we need a way to select them at build time. I really think this should be a compiler flag. Can you expand on that? This is a temporary implementation detail that is very specific to AArch64, and one that will disappear soon. Currently, it is a compiler flag. Not a pretty one, but one that we expect to be used very rarely. In the future, when it all becomes 48 bits, it'll be used even less, and possibly deprecated altogether when we give up supporting the "old" VMA configurations. Creating an -mvma or anything like that would be kinda pointless, and harder to revert later. rengolin: > What's that? An unfortunate choice. We're in need of a better one. > It's not even attached…
		zatrazzAuthorUnsubmitted Not Done Reply Inline Actions What's that? It's not even attached to any cmake setting. Does it mean that a typical clang binary can target all of x86/arm/ppc/etc, 32/64, but only one fixed aarch64 kernel configuration? I really think this should be a compiler flag. And it should be possible to remove the corresponding macro in compiler-rt as well by exporting this setting through a global variable in the compiled objects (see __msan_keep_going or the ASAN_FLEXIBLE_MAPPING thing). I though about adding it as compiler flag and let the compiler/user decide to which VMA to use. However, as pointed out by rengolin it would require to build multiple sanitizer libraries for each VMA and select them at link time. If it is the desired target I can evaluate this option, however I would like to avoid such complication from the fact recent kernel change default VMA to 48 bits and the idea is to use only this config. zatrazz: > What's that? > It's not even attached to any cmake setting. > Does it mean that a typical…
		eugenisUnsubmitted Not Done Reply Inline Actions It should be possible to have a single runtime library that would detect VMA at runtime. Maybe it would cost us a percent or two of performance. Having said that, I don't think this macro, as it is, would cause us any problems in practice, and I don't mind if you go ahead with this patch. Two questions: Does lower VMA setting work on kernels configured for higher VMA, or does it have to be an exact match? is there a runtime check for an incompatible or suboptimal VMA setting? eugenis: It should be possible to have a single runtime library that would detect VMA at runtime. Maybe…
		kristof.beylsUnsubmitted Not Done Reply Inline Actions I'm a bit confused. It seems to me that this is adding functionality to the code-generation part of the sanitizer, i.e. embedded in the compiler right? And we're making a decision at compiler-construction time about only supporting one variant of AArch64 linux kernels? That doesn't seem right to me as at compiler-construction time you may want to create a compiler that can target all the different variants of linux kernels. kristof.beyls: I'm a bit confused. It seems to me that this is adding functionality to the code-generation…
		eugenisUnsubmitted Not Done Reply Inline Actions Yeah, it does sound quite bad. I'd also like to argue that lower VMA settings are unlikely to go away in the near future. Even when the linux kernel defaults to 48, there are platforms using older kernels, already released devices with at least a few years of lifetime. For example, a toolchain in the Android NDK would need to support VMA variants for a wide set of devices, and bundling a whole extra toolchain or two just for aarch64 is not an option. I take back my earlier comment. I don't see a way around -mvma. I agree that a runtime VMA detection (i.e. a single runtime library for all VMA variants) is not realistic. Sounds like we'll need to build 3 libraries and pick the correct one in the driver based on -mvma value. eugenis: Yeah, it does sound quite bad. I'd also like to argue that lower VMA settings are unlikely to…
		rengolinUnsubmitted Not Done Reply Inline Actions I didn't mean 39/42 are going away, just that we'll probably move the default some time in the future. In the long run, I agree that -mvma will be required, if anything, to pick one of the versions. We could also have the detection in the compiler itself, to act as a default, but due to cross-compilation issues, we'll need the flag anyway. Kristof also mentioned about name mangling as a way to do some run time selection, but that would also require some magic in the run time, either a dispatch table, or self-modifying code at start up (like IFUNC). I don't think we're at the stage that we can take any of these decisions lightly. Given the status of the sanitizer code, where a lot of it is either hard-coded or based on pre-processor macros, trying anything dynamic will be a big refactoring change. One that we may need, granted, but not at the expense of having the sanitizers working on the two VMA configurations we have today in some way. As I said earlier, we're discussing this, and maybe we should put forward a BOF session in the US LLVM to discuss the refactoring, but meanwhile, this allows us to test it on both VMAs as we're doing for the other sanitizers. We haven't taken any decision, nor we're not changing anything here, just adding functionality. rengolin: I didn't mean 39/42 are going away, just that we'll probably move the default some time in the…
		kristof.beylsUnsubmitted Not Done Reply Inline Actions I don't think that name mangling requires dispatch tables or self-modifying code. I agree that name mangling or another mechanism to fail at binary build/link time rather than at run time can be done as part of a separate patch. However, I do think that hard-coding/restricting which linux distribution you can target when building the compiler is evil, and this patch shouldn't do that. I therefore think that some command line option indeed is needed with a reasonable default before this patch can land. kristof.beyls: I don't think that name mangling requires dispatch tables or self-modifying code. I agree that…
		rengolinUnsubmitted Not Done Reply Inline Actions There is a compilation option to control that, -DSANITIZER_AARCH64_VMA=42. I agree it's not the best, but being temporary until we can refactor all the other hard-coded stuff to make this work at run time without performance impacts, I think it's acceptable. rengolin: There is a compilation option to control that, -DSANITIZER_AARCH64_VMA=42. I agree it's not the…
		zatrazzAuthorUnsubmitted Not Done Reply Inline Actions This patch itself is a enablement one meant to make it possible to run and check msan output in current AArch64 scenarios. The idea is to be able to continuous check for regression by buildbots and add an initial support. We do agree that current approach (using a compiler flag) is not the best option and it should go away with a better strategy. However, as pointed out I also do agree it should be done in another patch and incrementally. It requires some refactor from compiler-rt to build/generate multiple versions of the required symbols used by the sanitizers and the implementation of the flag mechanism. My idea is to after finish the msan sanitizer (aarch64-linux now supports asan, tsan, and dfsan) is to work on a mechanism to select the vma at build time and make the compiler-rt transparent to it (either by using name mangling, runtime detection, or multiples compiler-rt builds). So should we still block this patch based on the assumption that this mechanism will be one used for general deployment? zatrazz: This patch itself is a enablement one meant to make it possible to run and check msan output…
		kristof.beylsUnsubmitted Not Done Reply Inline Actions My main objection is that - unless I misunderstand - the current patch doesn't offer a command line flag to clang and/or llvm to specify the vma size of the system you're targeting. clang/llvm offers cross-compilation support out-of-the-box and only supporting a single vma size in built libraries just seems such a big regression over by-default cross-compilation support that I think this patch shouldn't land without a way to support instrumentation for different vma sizes. The way the patch currently is, it's basically impossible to create a single clang binary that can offer native support across all linux-aarch distributions. I think that's just too wrong. Furthermore, it seems like a minor change to this patch to not hard-code which vma size is supported in the llvm/clang binary being built, but instead having an llvm flag for this. Basically, I object to using predefines like SANITIZER_AARCH64_VMA in the compiler source code - i.e. this patch. kristof.beyls: My main objection is that - unless I misunderstand - the current patch doesn't offer a command…
		rengolinUnsubmitted Not Done Reply Inline Actions My main objection is that - unless I misunderstand - the current patch doesn't offer a command line flag to clang and/or llvm to specify the vma size of the system you're targeting. It does, -DSANITIZER_AARCH64_VMA=42. We are testing both 39 and 42 bits VMA on our buildbots, so it is possible to change it. The way the patch currently is, it's basically impossible to create a single clang binary that can offer native support across all linux-aarch distributions. I think that's just too wrong. That's not this patch, that's how the whole sanitizer is hard-coded to be. That's a very important thing we need to implement, which will be our next step, but this is not related to this patch. This specific patch is not using anything that the other sanitizers are not, since we're not inventing the macro here, now. Because all sanitizers use the same macro, once we fix for one, we fix for all. But this is not the point of this patch. rengolin: > My main objection is that - unless I misunderstand - the current patch doesn't offer a…
		# define SANITIZER_AARCH64_VMA 39
		#else
		# if SANITIZER_AARCH64_VMA != 39 && SANITIZER_AARCH64_VMA != 42
		# error "invalid SANITIZER_AARCH64_VMA size"
		# endif
		#endif

	static const unsigned kOriginSize = 4;	static const unsigned kOriginSize = 4;
	static const unsigned kMinOriginAlignment = 4;	static const unsigned kMinOriginAlignment = 4;
	static const unsigned kShadowTLSAlignment = 8;	static const unsigned kShadowTLSAlignment = 8;
Context not available.
	0x1C0000000000, // OriginBase	0x1C0000000000, // OriginBase
	};	};

		// aarch64 Linux
		static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
		#if SANITIZER_AARCH64_VMA == 39
		0x007C00000000, // AndMask
		0x000100000000, // XorMask
		0x004000000000, // ShadowBase
		0x004300000000, // OriginBase
		#elif SANITIZER_AARCH64_VMA == 42
		0x03E000000000, // AndMask
		0x001000000000, // XorMask
		0x010000000000, // ShadowBase
		0x012000000000, // OriginBase
		#endif
		};

	// i386 FreeBSD	// i386 FreeBSD
	static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {	static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
	0x000180000000, // AndMask	0x000180000000, // AndMask
Context not available.
	&Linux_PowerPC64_MemoryMapParams,	&Linux_PowerPC64_MemoryMapParams,
	};	};

		static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
		NULL,
		&Linux_AArch64_MemoryMapParams,
		};

	static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {	static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
	&FreeBSD_I386_MemoryMapParams,	&FreeBSD_I386_MemoryMapParams,
	&FreeBSD_X86_64_MemoryMapParams,	&FreeBSD_X86_64_MemoryMapParams,
Context not available.
	case Triple::ppc64le:	case Triple::ppc64le:
	MapParams = Linux_PowerPC_MemoryMapParams.bits64;	MapParams = Linux_PowerPC_MemoryMapParams.bits64;
	break;	break;
		case Triple::aarch64:
		case Triple::aarch64_be:
		MapParams = Linux_ARM_MemoryMapParams.bits64;
		break;
	default:	default:
	report_fatal_error("unsupported architecture");	report_fatal_error("unsupported architecture");
	}	}
Context not available.