This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] [msan] Unify aarch64 mapping
ClosedPublic

Authored by zatrazz on Oct 16 2015, 11:06 AM.

Details

Summary

This patch unifies the 39-bit and 42-bit mappings for aarch64 using only
one instrumentation algorithm (based on the 39-bit mapping). A runtime
check avoids mapping 42-bit-only segments on 39-bit kernels.

The LLVM instrumentation change is at [1]. The only downside of this
patch is that for 42-bit VMA the 39-bit shadow/origin segments will be created
regardless. However, a follow-up patch could filter these out based on runtime
VMA information.

[1] http://reviews.llvm.org/D13817

Event Timeline

zatrazz updated this revision to Diff 37614.Oct 16 2015, 11:06 AM
zatrazz retitled this revision from to [compiler-rt] [msan] Unify aarch64 mapping.
zatrazz updated this object.
zatrazz added reviewers: pcc, rengolin, eugenis, kcc, samsonov.
zatrazz added a subscriber: llvm-commits.
eugenis added inline comments.Oct 19 2015, 11:58 AM
lib/msan/msan.h
106–107

This is probably out of scope for this review, but could you elaborate, and maybe add a comment, about the constraints that led to this complex mapping function? For example, a list of all address ranges that must be in "app" regions would help.

This mapping limits the applications to roughly 1/7th of the address space on 39 bit VMA and only 1/30th on 42 bit VMA. Could we do any better?

zatrazz added inline comments.Oct 22 2015, 6:45 AM
lib/msan/msan.h
106–107

This is exactly what I am struggling with: the current aarch64 39-bit and 42-bit VMA constraints regarding PIE positioning. The memory segments are:

0000000000-0010000000: both 39 and 42 for own programs segments
5500000000-5600000000: 39-bits PIE program segments
7f80000000-7000000000: 39-bits libraries segments

2aa00000000-2ab00000000: 42-bits PIE program segments
3ff00000000-3ffffffffff: 42-bits libraries segments

I am trying to increase the segment sizes, but it is hard to come up with a single transformation that works on both 39-bit and 42-bit VMA, that is, one that maps 39-bit addresses to 39-bit addresses and also works for 42-bit. I am open to suggestions.

eugenis added inline comments.Oct 22 2015, 11:29 AM
lib/msan/msan.h
106–107

Can we do the same as on x86_64: flip either one or both of the most significant bits (38 & 37)?
39-bit addresses will stay 39-bit.
The following regions seem to have a long enough constant left prefix for this transformation to be linear:
2aa00000000-2ab00000000: 42-bits PIE program segments
3ff00000000-3ffffffffff: 42-bits libraries segments

It will fragment the 42-bit VMA into about 16 application segments, and the same number of shadow and origin segments; some of them will be marked invalid to avoid shadow/app/origin overlap with other segments. Not a problem, as long as the function is linear on any contiguous kernel-mapped range.
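
The two properties this suggestion relies on can be checked with a small sketch. It assumes a mask with bits 37 and 38 set; the names `kFlipMask`, `FlipTopBits`, `FitsIn39Bits` and `HasConstantPrefix` are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uint64_t;

// Assumed mask: bits 37 and 38, the two most significant bits of a 39-bit VMA.
constexpr uptr kFlipMask = (1ULL << 37) | (1ULL << 38);

// Hypothetical helper: flip both bits with a single XOR.
constexpr uptr FlipTopBits(uptr addr) { return addr ^ kFlipMask; }

// True if addr is representable in a 39-bit VMA.
constexpr bool FitsIn39Bits(uptr addr) { return (addr >> 39) == 0; }

// True if every address in [first, last] shares the same bits from 37 upward.
// On such a range the XOR amounts to one fixed offset, so it is linear there.
constexpr bool HasConstantPrefix(uptr first, uptr last) {
  return (first >> 37) == (last >> 37);
}

static_assert(FitsIn39Bits(FlipTopBits(0x7fffffffffULL)),
              "a 39-bit address stays 39-bit");
static_assert(HasConstantPrefix(0x2aa00000000ULL, 0x2aaffffffffULL),
              "42-bit PIE program range");
static_assert(HasConstantPrefix(0x3ff00000000ULL, 0x3ffffffffffULL),
              "42-bit library range");
```

Because the XOR only touches bits 37 and 38, it can never set a bit at position 39 or above, and any range that does not cross a 2^37 boundary is shifted as a whole.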

zatrazz added inline comments.Oct 22 2015, 2:07 PM
lib/msan/msan.h
106–107

This seems to be a slightly better strategy:

{0x00000000000ULL, 0x01000000000ULL, MappingDesc::INVALID, "invalid"},
{0x01000000000ULL, 0x02000000000ULL, MappingDesc::SHADOW,  "shadow-1"},
{0x02000000000ULL, 0x03000000000ULL, MappingDesc::ORIGIN,  "origin"},
{0x03000000000ULL, 0x03500000000ULL, MappingDesc::INVALID, "invalid"},
{0x03500000000ULL, 0x03600000000ULL, MappingDesc::SHADOW,  "shadow-2"},
{0x03600000000ULL, 0x04500000000ULL, MappingDesc::INVALID, "invalid"},
{0x04500000000ULL, 0x04600000000ULL, MappingDesc::ORIGIN,  "origin"},
{0x04600000000ULL, 0x05500000000ULL, MappingDesc::INVALID, "invalid"},
{0x05500000000ULL, 0x05600000000ULL, MappingDesc::APP,     "app-1"},
{0x05600000000ULL, 0x07000000000ULL, MappingDesc::INVALID, "invalid"},
{0x07000000000ULL, 0x08000000000ULL, MappingDesc::APP,     "app-2"},
{0x08000000000ULL, 0x2A000000000ULL, MappingDesc::INVALID, "invalid"},
{0x2a000000000ULL, 0x2ac00000000ULL, MappingDesc::APP,     "app-3"},
{0x2AC00000000ULL, 0x2C000000000ULL, MappingDesc::INVALID, "invalid"},
{0x2C000000000ULL, 0x2CC00000000ULL, MappingDesc::SHADOW,  "shadow-3"},
{0x2CC00000000ULL, 0x2D000000000ULL, MappingDesc::INVALID, "invalid"},
{0x2D000000000ULL, 0x2DC00000000ULL, MappingDesc::ORIGIN,  "origin-3"},
{0x2DC00000000ULL, 0x39000000000ULL, MappingDesc::INVALID, "invalid"},
{0x39000000000ULL, 0x3A000000000ULL, MappingDesc::SHADOW,  "shadow"},
{0x3A000000000ULL, 0x3B000000000ULL, MappingDesc::ORIGIN,  "origin"},
{0x3B000000000ULL, 0x3F000000000ULL, MappingDesc::INVALID, "invalid"},
{0x3F000000000ULL, 0x40000000000ULL, MappingDesc::APP,     "app-4"},
#define MEM_TO_SHADOW(mem) ((uptr)mem ^ 0x6000000000ULL)
#define SHADOW_TO_ORIGIN(shadow) (((uptr)(shadow)) + 0x1000000000ULL)

However, it does not increase the VMA available for 42-bit (4.39%, compared to 13% for 39-bit). I will try to check whether it is possible to squeeze out more for 42-bit, but the PIE constraint is really making this hard :/

eugenis added inline comments.Oct 22 2015, 2:19 PM
lib/msan/msan.h
106–107

Your "invalid" regions are suspiciously large. It should be possible to add more app space w/o changing the mapping function.

For example, [3b, 3c) is mapped to [3d, 3e) with origin at [3e, 3f) - all three are marked invalid in your list.

Also, when naming regions, please make sure that shadow for "app-N" is called "shadow-N" - makes the list easier to read and verify. The same for origin.
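
The [3b, 3c) example can be verified mechanically. The sketch below reuses the two transformations proposed earlier in the thread; the `Range` struct and the `ShadowOf`/`OriginOf` helpers are hypothetical names added for illustration, and they are only meaningful for ranges that do not cross a 2^37 boundary.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uint64_t;

// Transformations as proposed in the thread.
#define MEM_TO_SHADOW(mem) (((uptr)(mem)) ^ 0x6000000000ULL)
#define SHADOW_TO_ORIGIN(shadow) (((uptr)(shadow)) + 0x1000000000ULL)

// Hypothetical helper type: half-open address range [start, end).
struct Range {
  uptr start;
  uptr end;
};

// Shadow range of an app range. The last byte is mapped and then bumped by
// one so that the result is again a half-open range.
inline Range ShadowOf(Range app) {
  return {MEM_TO_SHADOW(app.start), MEM_TO_SHADOW(app.end - 1) + 1};
}

// Origin range of an app range, derived from its shadow range.
inline Range OriginOf(Range app) {
  Range shadow = ShadowOf(app);
  return {SHADOW_TO_ORIGIN(shadow.start), SHADOW_TO_ORIGIN(shadow.end - 1) + 1};
}
```

Calling `ShadowOf({0x3B000000000ULL, 0x3C000000000ULL})` yields [0x3D000000000, 0x3E000000000), and `OriginOf` on the same range yields [0x3E000000000, 0x3F000000000), matching the three regions named above.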

zatrazz added inline comments.Oct 22 2015, 2:48 PM
lib/msan/msan.h
106–107

I realized it just after I hit the send button. Using the segments:

{0x11000000000ULL, 0x12000000000ULL, MappingDesc::APP, "app"}
{0x20000000000ULL, 0x22000000000ULL, MappingDesc::APP, "app"}
{0x2E000000000ULL, 0x2F000000000ULL, MappingDesc::APP, "app"}
{0x3B000000000ULL, 0x3C000000000ULL, MappingDesc::APP, "app"}

With these I could reach a total of 12.21% of the VMA available to the application, a percentage similar to x86_64 and MIPS (12.50%).

About the names, I will change them and add proper comments.

zatrazz updated this revision to Diff 38256.Oct 23 2015, 1:24 PM

This is an updated patch that uses different mapping regions and a different transformation (using XOR to flip bits 38/37). The new mapping is:

  • 39-bit and 42-bit mapping:
    0x00000000000ULL-0x01000000000ULL MappingDesc::INVALID
    0x01000000000ULL-0x02000000000ULL MappingDesc::SHADOW
    0x02000000000ULL-0x03000000000ULL MappingDesc::ORIGIN
    0x03000000000ULL-0x03500000000ULL MappingDesc::INVALID
    0x03500000000ULL-0x03600000000ULL MappingDesc::SHADOW
    0x03600000000ULL-0x04500000000ULL MappingDesc::INVALID
    0x04500000000ULL-0x04600000000ULL MappingDesc::ORIGIN
    0x04600000000ULL-0x05500000000ULL MappingDesc::INVALID
    0x05500000000ULL-0x05600000000ULL MappingDesc::APP
    0x05600000000ULL-0x07000000000ULL MappingDesc::INVALID
    0x07000000000ULL-0x08000000000ULL MappingDesc::APP
  • 42-bit mapping:
    0x08000000000ULL-0x09000000000ULL MappingDesc::INVALID
    0x09000000000ULL-0x0A000000000ULL MappingDesc::SHADOW
    0x0A000000000ULL-0x0B000000000ULL MappingDesc::ORIGIN
    0x0B000000000ULL-0x0F000000000ULL MappingDesc::INVALID
    0x0F000000000ULL-0x10000000000ULL MappingDesc::APP
    0x10000000000ULL-0x11000000000ULL MappingDesc::INVALID
    0x11000000000ULL-0x12000000000ULL MappingDesc::APP
    0x12000000000ULL-0x17000000000ULL MappingDesc::INVALID
    0x17000000000ULL-0x18000000000ULL MappingDesc::SHADOW
    0x18000000000ULL-0x19000000000ULL MappingDesc::ORIGIN
    0x19000000000ULL-0x20000000000ULL MappingDesc::INVALID
    0x20000000000ULL-0x21000000000ULL MappingDesc::APP
    0x21000000000ULL-0x26000000000ULL MappingDesc::INVALID
    0x26000000000ULL-0x27000000000ULL MappingDesc::SHADOW
    0x27000000000ULL-0x28000000000ULL MappingDesc::ORIGIN
    0x28000000000ULL-0x29000000000ULL MappingDesc::SHADOW
    0x29000000000ULL-0x2A000000000ULL MappingDesc::ORIGIN
    0x2A000000000ULL-0x2AC00000000ULL MappingDesc::APP
    0x2AC00000000ULL-0x2C000000000ULL MappingDesc::INVALID
    0x2C000000000ULL-0x2CC00000000ULL MappingDesc::SHADOW
    0x2CC00000000ULL-0x2D000000000ULL MappingDesc::INVALID
    0x2D000000000ULL-0x2DC00000000ULL MappingDesc::ORIGIN
    0x2DC00000000ULL-0x2E000000000ULL MappingDesc::INVALID
    0x2E000000000ULL-0x2F000000000ULL MappingDesc::APP
    0x2F000000000ULL-0x39000000000ULL MappingDesc::INVALID
    0x39000000000ULL-0x3A000000000ULL MappingDesc::SHADOW
    0x3A000000000ULL-0x3B000000000ULL MappingDesc::ORIGIN
    0x3B000000000ULL-0x3C000000000ULL MappingDesc::APP
    0x3C000000000ULL-0x3D000000000ULL MappingDesc::INVALID
    0x3D000000000ULL-0x3E000000000ULL MappingDesc::SHADOW
    0x3E000000000ULL-0x3F000000000ULL MappingDesc::ORIGIN
    0x3F000000000ULL-0x40000000000ULL MappingDesc::APP

Although it creates a lot of segments, it allows a single transformation function to be used for both 39-bit and 42-bit VMA. It also leaves a high percentage of the total VMA available for normal allocations (13.28% for 39-bit and 12.21% for 42-bit). It also uses a faster transformation.
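
The two percentages can be reproduced from the APP ranges of this revision's mapping. The following is a sketch only: `AppRange`, `TotalBytes` and `AppPercent` are hypothetical names, and the percentage is taken over the whole VMA (1 << bits), which is an assumption about how the figures were computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using uptr = std::uint64_t;

// Hypothetical helper type: half-open APP range [start, end).
struct AppRange {
  uptr start;
  uptr end;
};

// APP ranges shared by the 39-bit and 42-bit mappings.
const AppRange kApp39[] = {
    {0x05500000000ULL, 0x05600000000ULL},
    {0x07000000000ULL, 0x08000000000ULL},
};

// APP ranges that exist only with a 42-bit VMA.
const AppRange kApp42Only[] = {
    {0x0F000000000ULL, 0x10000000000ULL}, {0x11000000000ULL, 0x12000000000ULL},
    {0x20000000000ULL, 0x21000000000ULL}, {0x2A000000000ULL, 0x2AC00000000ULL},
    {0x2E000000000ULL, 0x2F000000000ULL}, {0x3B000000000ULL, 0x3C000000000ULL},
    {0x3F000000000ULL, 0x40000000000ULL},
};

// Sum of the sizes of all ranges in the array.
template <std::size_t N>
uptr TotalBytes(const AppRange (&ranges)[N]) {
  uptr total = 0;
  for (std::size_t i = 0; i < N; ++i) total += ranges[i].end - ranges[i].start;
  return total;
}

// Share of a VMA of the given width, in percent.
double AppPercent(uptr app_bytes, unsigned vma_bits) {
  return 100.0 * static_cast<double>(app_bytes) /
         static_cast<double>(1ULL << vma_bits);
}
```

`AppPercent(TotalBytes(kApp39), 39)` gives 17/128, about 13.28%, and `AppPercent(TotalBytes(kApp39) + TotalBytes(kApp42Only), 42)` gives 125/1024, about 12.21%.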

I also updated the LLVM patch to change the instrumentation [1]. No regressions were found on aarch64 39/42-bit, and the new transformation also fixed an XFAIL test (chained_origin_limits).

[1] http://reviews.llvm.org/D13817

eugenis edited edge metadata.Oct 23 2015, 3:30 PM

To clarify, 12-13% is based on the whole address space, as in 1<<39, and not just user-addressable part?
Because on x86_64, for example, userspace is only allowed to map up to 0x800000000000, and MSan leaves 25% of that to the application, which is close to the theoretical limit of 33%.

lib/msan/msan.h
75

Why just 55 to 56? The whole 50 to 60 seems ok.

In general, I don't see a reason for the third hexadecimal digit to be anything but 0; please reconsider the other case of this below.

In general this looks very nice. I don't think we should be scared of the huge number of mappings, this is only a very minor constant work at startup. It also slows down MEM_IS_APP, of course, but it should not be noticeable provided that the loop in that function is fully unrolled - you might want to check that the assembly looks reasonable.
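
For reference, the kind of lookup loop being discussed might look like the sketch below. It is an illustration rather than a quote from msan.h: the `MappingDesc` field names, `kLayout`, and `AddrIsApp` are assumptions, and the table holds only the ranges shared by the 39-bit and 42-bit mappings from the previous revision.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uint64_t;

// Assumed shape of a mapping entry; the field order follows the initializers
// quoted in the thread, the field names are hypothetical.
struct MappingDesc {
  uptr start;
  uptr end;
  enum Type { INVALID, APP, SHADOW, ORIGIN } type;
  const char *name;
};

const MappingDesc kLayout[] = {
    {0x00000000000ULL, 0x01000000000ULL, MappingDesc::INVALID, "invalid"},
    {0x01000000000ULL, 0x02000000000ULL, MappingDesc::SHADOW, "shadow"},
    {0x02000000000ULL, 0x03000000000ULL, MappingDesc::ORIGIN, "origin"},
    {0x03000000000ULL, 0x03500000000ULL, MappingDesc::INVALID, "invalid"},
    {0x03500000000ULL, 0x03600000000ULL, MappingDesc::SHADOW, "shadow"},
    {0x03600000000ULL, 0x04500000000ULL, MappingDesc::INVALID, "invalid"},
    {0x04500000000ULL, 0x04600000000ULL, MappingDesc::ORIGIN, "origin"},
    {0x04600000000ULL, 0x05500000000ULL, MappingDesc::INVALID, "invalid"},
    {0x05500000000ULL, 0x05600000000ULL, MappingDesc::APP, "app"},
    {0x05600000000ULL, 0x07000000000ULL, MappingDesc::INVALID, "invalid"},
    {0x07000000000ULL, 0x08000000000ULL, MappingDesc::APP, "app"},
};
const unsigned kLayoutSize = sizeof(kLayout) / sizeof(kLayout[0]);

// Linear scan over a table whose size is known at compile time, so a
// compiler is free to unroll it fully; whether it does is not guaranteed.
inline bool AddrIsApp(uptr addr) {
  for (unsigned i = 0; i < kLayoutSize; ++i)
    if (kLayout[i].type == MappingDesc::APP && addr >= kLayout[i].start &&
        addr < kLayout[i].end)
      return true;
  return false;
}
```

The cost of the check grows with the number of table entries, which is why the generated assembly is worth inspecting when the table gets long.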

zatrazz updated this revision to Diff 38458.Oct 26 2015, 2:08 PM
zatrazz edited edge metadata.

Differences from previous versions:

  • Change the 39-bit mapping for PIE from 0x55...-0x56... to the whole 0x50...-0x60...

To answer the question: I based the calculations on the whole VMA range. I think that with these mappings MSAN for aarch64 with 39-bit and 42-bit VMA is on par with other architectures regarding the VMA available for program memory allocation. I also checked the MEM_IS_APP code generation: clang fully unrolls the loop, while GCC does not (and I do not think this is an issue right now).

eugenis accepted this revision.Oct 26 2015, 2:17 PM
eugenis edited edge metadata.

LGTM

This revision is now accepted and ready to land.Oct 26 2015, 2:17 PM
zatrazz closed this revision.Oct 29 2015, 6:06 AM