This is an archive of the discontinued LLVM Phabricator instance.

[ARM][AArch64] Workaround ARM/AArch64 percularity in clearing icache.
ClosedPublic

Authored by maxim-kuvyrkov on Nov 24 2017, 3:55 AM.

Details

Summary
Certain ARM implementations treat icache clear instruction as a memory read,
and CPU segfaults on trying to clear cache on !PROT_READ page.
We workaround this in Memory::protectMappedMemory by adding
PROT_READ to affected pages, clearing the cache, and then setting
desired protection.

This fixes "AllocationTests/MappedMemoryTest.***/3" unit-tests on
affected hardware.

Diff Detail

Repository
rL LLVM

Event Timeline

maxim-kuvyrkov created this revision.Nov 24 2017, 3:55 AM

The problem manifests itself in failing "AllocationTests/MappedMemoryTest.***/3" unit-tests when running on ARMv8.2 hardware with linux kernel version ~4.10.

I'm confused by the title and rationale for this change. I don't think this has anything to do with Armv8.2-A and is instead a feature of Armv8-A.

Looking at C.5.12 in a recent release (ARM DDI 0487B.a) Arm Architecture Reference Manual I see "If EL0 access is enabled, when executing at EL0, this instruction requires read access permission to the VA, otherwise it is IMPLEMENTATION DEFINED whether it causes a Permission Fault." that text (or a close equivalent) looks to have been in the manual since the first release.

So the change makes sense, but the references to Armv8.2-A look wrong to me.

lib/Support/Unix/Memory.inc
172 ↗(On Diff #124158)

Probably this should be !(Protect & PROT_READ)

I've successfully tested an earlier version of this patch, but not the uploaded version yet. Please review, but don't submit until testing is finished.

lib/Support/Unix/Memory.inc
172 ↗(On Diff #124158)

Indeed! Thanks.

...

So the change makes sense, but the references to Armv8.2-A look wrong to me.

I've checked the docs once again, and, indeed, they don't mention Armv8.2, so I'll remove the reference. However, judging from cursory read of https://patchwork.kernel.org/patch/9275721/ , it appears only Armv8.2 kernels would set EXECUTE_ONLY bits on the pages, kernels for Armv8.[01] would set READ permissions on EXEC pages as well.

maxim-kuvyrkov retitled this revision from [ARM][AArch64] Workaround ARMv8.2 percularity in clearing icache. to [ARM][AArch64] Workaround ARM/AArch64 percularity in clearing icache..
maxim-kuvyrkov edited the summary of this revision. (Show Details)

Updated patch with James' comments addressed (testing is still in progress).

The code changes make sense to me. There is a small typo in the title, it should be "peculiarity", not worth updating the review but will be worth changing in any commit message. It will be worth waiting to see if there are comments early next week from the US due to thanksgiving.

...

So the change makes sense, but the references to Armv8.2-A look wrong to me.

I've checked the docs once again, and, indeed, they don't mention Armv8.2, so I'll remove the reference. However, judging from cursory read of https://patchwork.kernel.org/patch/9275721/ , it appears only Armv8.2 kernels would set EXECUTE_ONLY bits on the pages, kernels for Armv8.[01] would set READ permissions on EXEC pages as well.

My reading (confirmed by our kernel team here) is that you'll be EXECUTE_ONLY at all architecture levels. The difference is that prior to Armv8.2 you don't have a Privileged Access Never (PAN) bit, so EL1 (kernel) can read EL0 (user)'s execute only data. With PAN+UAO in Armv8.2-A, even the kernel can't read the user execute only data.

So this code has likely always been buggy for AArch64, we just didn't see it on Linux systems until Linux 4.9 (4.10?) started allowing use of execute-only user permissions.

@jgreenhalgh , thanks for the info.

I've successfully tested the latest version of the patch with no failures, and with 8 progressions on AArch64 machine with 4.10 kernel.

zatrazz added inline comments.Nov 27 2017, 3:26 AM
lib/Support/Unix/Memory.inc
134 ↗(On Diff #124169)

I think instead of issue a mmap plus two mprotects for MF_EXEC, it would be better if we could just mmap with read permission, call InvalidateInstructionCache and then mmap to the expected protection. Something as:

int adjustAllocateMappedProtectionFlags (unsigned Flags)
{
#if defined(__NetBSD__) && defined(PROT_MPROTECT)
  Protect |= PROT_MPROTECT(PROT_READ | PROT_WRITE | PROT_EXEC);
#endif
#if defined(__arm__) || defined(__aarch64__)
  Protect |= PROT_READ;
#endif
  return Protect;
}

[...]

MemoryBlock
Memory::allocateMappedMemory(size_t NumBytes,
                             const MemoryBlock *const NearBlock,
                             unsigned PFlags,
                             std::error_code &EC) {
  [...]
  int Protect = getPosixProtectionFlags(PFlags);

  Protect = adjustAllocateMappedProtectionFlags (Protect);
  [...]

And then issue the InvalidateInstructionCache without requiring to mprotect to PROT_READ first:

std::error_code
Memory::protectMappedMemory(const MemoryBlock &M, unsigned Flags) {
  [...]
  bool InvalidateCache = (Flags & MF_EXEC);

#if defined(__arm__) || defined(__aarch64__)
  // Certain ARM implementations treat icache clear instruction as a memory read,
  // and CPU segfaults on trying to clear cache on !PROT_READ page.  Therefore we need
  // to temporarily add PROT_READ for the sake of flushing the instruction caches.
  if (InvalidateCache && !(Protect & PROT_READ)) {
    Memory::InvalidateInstructionCache(M.Address, M.Size);
    InvalidateCache = false;
  }
#endif

I think Memory should be removed as it's impossible to make it portable as is to every combination of OS and hardware. It's too low-level interface.

lhames accepted this revision.Nov 27 2017, 9:07 AM

This workaround seems ok in the short term. Longer term I think Kamil is right: Memory should be nuked in favor of something higher level.

This revision is now accepted and ready to land.Nov 27 2017, 9:07 AM
lib/Support/Unix/Memory.inc
134 ↗(On Diff #124169)

I experimented with that, and the code starts to look much uglier. This file is target-independent, and it seems better to localize all workaround code in a single function, rather than spreading it across several functions.

Would someone with commit access please merge this?

This revision was automatically updated to reflect the committed changes.

I've committed it for you r319166