This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/builtins/
-
builtins/
-
clear_cache.c

Differential D37788

[ARM] builtins: Replace abort by assert in clear_cache.
ClosedPublic

Authored by manojgupta on Sep 12 2017, 8:36 PM.

Download Raw Diff

Details

Reviewers

rengolin
compnerd
srhines
peter.smith
joerg

Commits

rGa2eacb3bfc92: [ARM] builtins: Replace abort by assert in clear_cache.
rCRT314322: [ARM] builtins: Replace abort by assert in clear_cache.
rL314322: [ARM] builtins: Replace abort by assert in clear_cache.

Summary

builtion_clear_cache maps to clear_cache function. On Linux,
clear_cache functions makes a syscall and does an abort if syscall fails.
Replace the abort by an assert so that non-debug builds do not abort
if the syscall fails.

Fixes PR34588.

Diff Detail

Build Status

Buildable 10168
Build 10168: arc lint + arc unit

Event Timeline

manojgupta created this revision.Sep 12 2017, 8:36 PM

Herald added subscribers: kristof.beyls, aemerson. · View Herald TranscriptSep 12 2017, 8:36 PM

From reading the code and the disassembly this looks like it isn't checking for a NULL start address but is actually checking the value in r0 after the svc call that will clear the cache has completed. Looking in http://elixir.free-electrons.com/linux/latest/source/arch/arm/kernel/traps.c [*] I can see that the call may use non zero return status to signal an error. Am I missing something here?

I don't know enough about the OS to know whether this check makes sense for Android or ChromeOS, if it is usefull on Linux perhaps you might want to conditionally compile it out rather than remove it.

Extract from web-page.

static inline int
do_cache_op(unsigned long start, unsigned long end, int flags)
{
	if (end < start || flags)
		return -EINVAL;

	if (!access_ok(VERIFY_READ, start, end - start))
		return -EFAULT;

	return __do_cache_op(start, end);
}
...
/*
 * Handle all unrecognised system calls.
 *  0x9f0000 - 0x9fffff are some more esoteric system calls
 */
#define NR(x) ((__ARM_NR_##x) - __ARM_NR_BASE)
asmlinkage int arm_syscall(int no, struct pt_regs *regs)
{
...
/*
	 * Flush a region from virtual address 'r0' to virtual address 'r1'
	 * _exclusive_.  There is no alignment requirement on either address;
	 * user space does not need to know the hardware cache layout.
	 *
	 * r2 contains flags.  It should ALWAYS be passed as ZERO until it
	 * is defined to be something else.  For now we ignore it, but may
	 * the fires of hell burn in your belly if you break this rule. ;)
	 *
	 * (at a later date, we may want to allow this call to not flush
	 * various aspects of the cache.  Passing '0' will guarantee that
	 * everything necessary gets flushed to maintain consistency in
	 * the specified region).
	 */
	case NR(cacheflush):
		return do_cache_op(regs->ARM_r0, regs->ARM_r1, regs->ARM_r2);

Updated change description.

Thanks Peter for taking a look. I was a bit confused by the assembly and didn't realize r0 is not the address.
libgcc implementation does not do the abort, It just happily calls svc and returns without any check. So, I have updated the summary to reflect only abort removal.

Do you have any evidence that this behaviour was reached by design and not accident?

A simple GCC/LKML thread answering that question would be enough, but code comments and documentation would be also interesting to have.

I really do not recommend us changing behaviour by looking at GCC's source code. Not only this is "bug-for-bug dumb emulation", but this can eventually take us into GPL problems.

This revision now requires changes to proceed.Sep 13 2017, 4:20 AM

rengolin added a reviewer: joerg.Sep 13 2017, 4:21 AM

I can't really comment on the Linux interface, but we generally ignore the system call error in NetBSD where one is necessary. While aborting might be legal, it seems to be the worst possible way of dealing with questionable arguments.

Digging it a bit, commit r203674 (reviewed by dsanders/bastien) introduced this part for Android ARM/MIPS.

The original description indicates that the reasons for this patch were misguided, the bug report has nothing to indicate what the error was and why turning the error checking off would fix it.

We're down to just emulating libgcc bug-for-bug, and that's not a path I'm willing to blindly take.

So, I recommend the authors to dig a bit more of what's happening and come up with either a clear description of why the error handling needs removing, or submit a new patch to fix the actual problem.

@joerg From a system point of view, how common is the case where the parameters are garbage? Shouldn't systems validate them before calling a service call? If users/libraries are calling system calls with bad arguments, don't we want to know what, why and how to fix them?

Thanks Renato,
Right now, I am trying to work with some kernel folks to get a better understanding of the underlying issue.
Will update when I have more information.

From speaking to a few people internally and digging a bit further into the code it seems like:

The prototype for the builtin is void builtin_clear_cache (char *begin, char *end) so there isn't a way of reporting failure from that function to the caller
The system call can fail on Linux when StartAddress > EndAddress which the user can check themselves before calling, or if the access_ok check fails (see comment later) or if a data abort exception occurs when doing one of the CP15 cache manipulation instructions. According to the linux kernel code, this happens when the virtual address isn't mapped.

So it looks like a program has given builtin_clear_cache an invalid memory range. In an ideal world builtin_clear_cache would pass the error code up to the caller so that they could decide what to do about it, unfortunately it is a void function so we can't do that. I guess the question is whether failure of builtin_clear_cache is sufficient to unconditionally terminate the program. There are defensible arguments on both sides, I'm tending towards the libgcc behaviour as that is what many applications will be expecting, if the error checking is needed then the OS specific functions could be used. In fact this may be the reason that the builtin doesn't return an error code as it has to be portable across many architectures and not all of these will have a cache clear that can fault.

access_ok for Arm expands to the __range_ok, the AArch64 has a comment that I've adapted for Arm.

/*
 * Test whether a block of memory is a valid user space address.
 * Returns 1 if the range is valid, 0 otherwise.
 *
 * This is equivalent to the following test:
 * (u33)addr + (u33)size <= current->addr_limit

In D37788#869943, @peter.smith wrote:

The prototype for the builtin is void builtin_clear_cache (char *begin, char *end) so there isn't a way of reporting failure from that function to the caller

The system call can fail on Linux when StartAddress > EndAddress which the user can check themselves before calling, or if the access_ok check fails (see comment later) or if a data abort exception occurs when doing one of the CP15 cache manipulation instructions. According to the linux kernel code, this happens when the virtual address isn't mapped. So it looks like a program has given builtin_clear_cache an invalid memory range.

Would be interesting to know how often these things happen in non-exceptional cases.

In an ideal world builtin_clear_cache would pass the error code up to the caller so that they could decide what to do about it, unfortunately it is a void function so we can't do that.

I'm coming at this frmo a pragmatic point of view, so we can ignore the ideal world scenario. :)

There are defensible arguments on both sides, I'm tending towards the libgcc behaviour as that is what many applications will be expecting, if the error checking is needed then the OS specific functions could be used. In fact this may be the reason that the builtin doesn't return an error code as it has to be portable across many architectures and not all of these will have a cache clear that can fault.

access_ok for Arm expands to the __range_ok, the AArch64 has a comment that I've adapted for Arm.
/*
 * Test whether a block of memory is a valid user space address.
 * Returns 1 if the range is valid, 0 otherwise.
 *
 * This is equivalent to the following test:
 * (u33)addr + (u33)size <= current->addr_limit

Sorry, pressed ctrl+enter by accident.

In D37788#869943, @peter.smith wrote:

I guess the question is whether failure of builtin_clear_cache is sufficient to unconditionally terminate the program.

Also, what kinds of failure are we talking about...

If the conditions that this happens are usually non-exceptional because of how the OS is designed, and core applications crash, than turning off the error is a no brainer.

OTOH, if this only happens on bad user code, then I see this as an opportunity to fix their code instead.

Bottom line is: we don't need to emulate libgcc in this case, and if this helps users write better code, then Linux devs will actually thank us for that.

Makes sense?

cheers,
--renato

Some more thoughts.

For a [VA start, VA end) interval that is entirely unmapped it is safe for the cache flush to continue without error as by definition the addresses won't be in the cache.
If we can split [VA start, VA end) into two sub-ranges [Mapped VA start, Mapped VA end), [Mapped VA end, Unmapped VA end) then when the function returns all the Mapped addresses will have been cleared, we don't care about the unmapped addresses.
If the [VA start, VA end) is sparsely mapped into non contiguous chunks of mapped and non-mapped then there is a potential problem if the program is written to assume that all addresses in the interval have been flushed.

Speaking to my colleague that works on QEMU, a common use case for this function is for people writing JITs, in essence write instructions to a buffer, call __builtin_cache flush with the address range of the buffer. In this case the buffer will be mapped and there should be no data aborts. A potential situation where an abort might occur is if the Jit is a bit sloppy about giving precise address ranges for what it has modified, especially if the address space is sparsely mapped.

It would be useful to find out what application crashed, and whether the compiler-rt abort was a genuine bug in the application, or a legitimate use where there was no expectation that all addresses were mapped.

Pragmatically I think the compiler-rt implementation is effectively adding an extra pre-condition to the builtin that all virtual addresses in the range are mapped. In return it offers the post-condition that all addresses in the range have been definitely cleared. My opinion is that we shouldn't change the contract of the builtin (defined by its implementation) without some kind of extra documentation.

Apologies for being light on details. The failure was discovered in NaCl (Native Client) in Chrome browser when running on ChromeOS with details at https://bugs.chromium.org/p/chromium/issues/detail?id=761103.
I don't know much about NaCl except that it allows execution for some trusted processes natively in a sandbox. It is also pretty painful to debug.

I also found (same as Peter) that builtin_clear_cache is mostly called on VMs/JITs. A quick search revealed art (Android runtime), halide, NaCl, openjdk, qemu, webkit, a bunch of profilers etc. use this builtin. Debugging a crash in any of these applications is non-trivial.
Only Nacl is deployed on ChromeBook and it took a pretty painful debugging and still struggling (can't run gdb :( ) to reveal that clear_cache builtin is causing problems. There is no guarantee that any of the above applications may not crash if built on ARM with compiler-rt. In addition, when clear_cache builtin aborts, there is no clear way to know the root cause since Linux kernel does not dump error messages in cache flush syscalls.

FWIW, in the past, this assertion has saved me some heartache since it caught the fact that the input was invalid and that the syscall actually failed.

Some more info on this. At the time Android did its implementation, the Linux kernel used[1] to call do_cache_op, ignoring the return result and then returning zero. Now [2], it returns whatever do_cache_op returns, which ends up being a very long sequence of uses and definitions, boiling down to SVC.

The GCC manual [3] doesn't say anything about failures, but it does say either one of two things will happen:

instructions are emitted in-line to clear the instruction cache
call to the __clear_cache function in libgcc is made

it's not unreasonable to assume "emit instruction" (singular) to be just SVC, no checks. Their own implementation in libgcc seems to confirm that.

However, the actual behaviour is probably historical (as shown from the kernel evolution), because x86 doesn't need to flush the code cache, while ARM does.

On the debug side of things, compiler-rt is implementing a function that will be on the stack trace after RT aborts, so easy to spot that it was a failed cache clearing attempt, with all registers as witness. If that function gets inlined, then walking back the assembly code will get you the SVE call, and then the register state.

I understand that NaCl and QEMU are a pain to debug, but diagnosing the problems of clearing the cache can help find much harder errors elsewhere.

So, I'm in two minds about this. While I understand the frustration of virtualisation users, I find it hard to concede without harder evidence (which so far has been anecdotal).

Being pragmatic, I would be ok with a change that had a non-default pre-processor guard (something like UNSAFE_CLEAR_CACHE), which platforms are free to set in their build systems, avoiding the abort.

In that case, I'd like people that are building compiler-rt on multiple platforms (like Peter, Saleem, Joerg) to make sure the solution is good for them.

[1] http://elixir.free-electrons.com/linux/v2.6.24/source/arch/arm/kernel/traps.c#L468
[2] http://elixir.free-electrons.com/linux/v4.14-rc1/source/arch/arm/kernel/traps.c#L631
[3] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

So, I'm in two minds about this. While I understand the frustration of virtualisation users, I find it hard to concede without harder evidence (which so far has been anecdotal).

I've not been able to come to a firm conclusion on this either, I think that there is a good chance that there is call to the builtin that doesn't specify the address range with sufficient precision. However on a pragmatic basis I'm tending towards removing/altering the check for the following reasons:

The clear_cache() implementations on all the other architectures and platforms do not call compilerrt_abort(), for example the return value of sys_icache_invalidate() on Apple is not checked.
We are adding a stronger pre-condition [start, end) virtual addresses must be currently mapped, to libgcc's implementation. In return we can offer a stronger post-condition. True we don't need to slavishly follow libgcc, but we don't have any documentation stating this anywhere and I think that most people porting applications to compiler-rt will assume that the clear-cache function has the same interface.
Unlike the other uses of compilerrt_abort which usually represent unimplemented cases or precondition failures, I think that only the application can truly judge whether the program should be aborted.

I don't think the above argument is strong enough to go against the consensus opinion though.

Being pragmatic, I would be ok with a change that had a non-default pre-processor guard (something like UNSAFE_CLEAR_CACHE), which platforms are free to set in their build systems, avoiding the abort.

In that case, I'd like people that are building compiler-rt on multiple platforms (like Peter, Saleem, Joerg) to make sure the solution is good for them.

I think that would be difficult to do without making it ARM_UNSAFE_CLEAR_CACHE, for example on AArch64 the cache invalidation instructions are in user-space so clear-cache uses them directly, there is a bit in the SCTLR for EL1 that controls whether these instructions trap to EL1 or are ignored. An alternative might be to call another helper function, something like clear_cache_error() which would have a default weak implementation that calls compilerrt_abort().

In D37788#873836, @peter.smith wrote:

I don't think the above argument is strong enough to go against the consensus opinion though.

I don't think we have a consensus! :)

I think that would be difficult to do without making it ARM_UNSAFE_CLEAR_CACHE, for example on AArch64 the cache invalidation instructions are in user-space so clear-cache uses them directly, there is a bit in the SCTLR for EL1 that controls whether these instructions trap to EL1 or are ignored. An alternative might be to call another helper function, something like clear_cache_error() which would have a default weak implementation that calls compilerrt_abort().

This looks ugly. I agree with you, I'm more inclined to have a special case for ARM (if anyone wants it) than everyone else.

Peter, Saleem?

Just want to give an update on this.
The issue is being investigated by NaCL team but so far this is what they have told us:

"The addresses are correctly mapped. They start off as mmap()'d from a file as PROT_NONE, but they are mprotect()'d to PROT_READ|PROT_EXEC some time before calling cache flush on them."

Seaching on web, mprotect() + cache flush seems to be a common idiom in JITs.

An example of this idion usage: make_executable() function in https://github.com/pathscale/libunwind/blob/vanilla_pathscale/tests/ia64-test-dyn1.c

I verified that running the above code in a sample program (with a mmap'ed()'s memory buffer) aborts the program with compiler-rt but not libgcc.

In D37788#879519, @manojgupta wrote:

An example of this idion usage: make_executable() function in https://github.com/pathscale/libunwind/blob/vanilla_pathscale/tests/ia64-test-dyn1.c

I verified that running the above code in a sample program (with a mmap'ed()'s memory buffer) aborts the program with compiler-rt but not libgcc.

If I understand correctly the use of mprotect in that example leaves the pages readable and executable, but removes write access? As far as I can tell the memory doesn't need to be readable for cache clear instructions to work so I am surprised that an abort is happening.

What I'd like to know is if the libgcc implementation is reporting failure and ignoring it, in which case there might be something fundamentally wrong, or maybe the system call on your kernel the syscall doesn't report the error in r0, in which case compiler-rt is reading random garbage. Apologies I'm at a conference this week and can't check any of this myself, would you be able to:

Check your kernel to see what its implementation of the syscall does? If it doesn't return anything useful in R0 we have our answer.
Check the return value from the syscall from libgcc, or some clone of it?

I patched a couple of kernels to print the return value of the syscall and tested a few cases.

Atleast on Linux kernels 3.14 and 3.18, the syscall returns -EFAULT whenever PROT_WRITE is not used in mprotect call.

The following is the implementation in kernel 3.14 (http://elixir.free-electrons.com/linux/v3.14.79/source/arch/arm/kernel/traps.c#L536 )

static inline int do_cache_op(unsigned long start, unsigned long end, int flags)
{
	if (end < start || flags)
		return -EINVAL;
	if (!access_ok(VERIFY_READ, start, end - start))
		return -EFAULT; // -EFAULT is NOT returned from here. 
	return __do_cache_op(start, end); // returns -EFAULT without PROT_WRITE
}

libgcc does't check the return value of the syscall, it just makes the syscall and returns. So does gforth's implementation. Are there any other implementations I should check?

Interesting, I think the next thing to find out is "Is the cache flushed, and then the syscall returns -EFAULT, or is the cache not flushed and the syscall returns -EFAULT?" If it is the former then we may have a reasonable case for removing the check for compiler-rt, however if it is the latter then compiler-rt may have found a subtle bug in the idiom of removing write access, call clear cache on Arm.

It would be good to get this checked over by someone that knows linux kernel's better than I do.

A colleague mentioned that on Arm removing write and adding exec should cause mprotect() to flush the at least some of the cache hierarchy. This might explain why we don't see lots of failures, however it doesn't explain why we are seeing an abort when trying to clear the cache. The only thing that I can think of is that there is some dirty data in _a_ cache that can't be written back to the memory as the mprotect() has just marked it read-only. However I'm struggling to work out how this situation could arise.

Looking at the kernel source code, it will return -EFAULT if an instruction raises a fault.

Also, a colleague pointed me to Aarch64 docs (https://developer.arm.com/docs/den0024/latest/11-caches/115-cache-maintenance ) that state:
The AArch64 Data Cache invalidate instruction by address, DC IVAC, requires write permission or else a permission fault is generated.

I could not find anything similar for ARM32 but can i assume that ARM32 has a similar behavior? Most docs I found state use of mrc/mcr instructions for the cache flushing and requiring supervisor mode but didn't say anything related to write permission.

Ugh, this is not particularly helpful. I think that in the case that we can detect the failure, it would be nice. OTOH, I agree that these functions are special (effectively to be treated as if the compiler had emitted them inline). I think that having a guarded way to actually keep the abort is nice ... maybe an assert(start_reg == 0 && "syscall failed"); instead. That would just get pulled out in the case of a _NDEBUG build.

@peter.smith I don't think that the "kernel" here would be anything other than Linux. If the kernel version is missing the reporting, that is a different issue.

As an aside, it sounds like although this is a pain to debug, it may actually be identifying a real issue? (which is a good reason to keep it around).

The reason linux kernel is returning an error is because the mcr instruction is indeed raising a fault when operating on read only memory. Kernel's fault handler catches it and so it ends up returning -EFAULT.

http://elixir.free-electrons.com/linux/v3.18.71/source/arch/arm/mm/cache-v7.S#L259

ENTRY(v7_coherent_user_range)
UNWIND(.fnstart )
dcache_line_size r2, r3
sub r3, r2, #1
bic r12, r0, r3
#ifdef CONFIG_ARM_ERRATA_764369
ALT_SMP(W(dsb))
ALT_UP(W(nop))
#endif
1:
USER( mcr p15, 0, r12, c7, c11, 1 ) @ clean D line to the point of unification // This instruction raises a fault if memory is read only.

This code is unchanged between Linux kernel 3.xx to the current kernel 4.14 .

Apologies for taking so long to respond, I've been trying to find a reference as to whether the Arm cache clean using MRC CP15, ... should always cause a Data Abort if there is no write permission as with the AArch64 version, or only when it tries to actually write to memory, which would only occur if the data in the cache were dirty.

The best I can come up with is in the v7A Arm Architecture Reference Manual. By tracing through the pseudo code; the permission checks are done on a TranslationTable look up (as would be done when looking up an entry in the physically tagged cache by VA). The permission check on the TranslationTable look up will fail if the operation is a write but no write permission is available. The cache maintenance operations are considered write-only so this implies that on Arm v7A the cache maintenance operations need the memory to be writeable.

I think that this makes sense as you cannot clean a cache to main memory if the memory has been made read-only. As a result I think it is incorrect on Arm and AArch64 to do mprotect to remove write access and then flush the caches. The mprotect() implementation on Linux will do the cache maintenance to make sure any dirty data for the range is flushed so another call to clean the cache is not necessary.

I think that compnerd's idea of using an assert is a good one as this will give people the chance to find errors, while maintaining the same interface as the other architectures and libgcc.

I agree with Peter analysis and accept Saleem's solution as a compromise.

However, if this is indeed the case, it would be good to have that idiom fixed in whatever implementation still uses it.

Replaced abort by an assert as requested.

Harbormaster completed remote builds in B10618: Diff 116785.Sep 27 2017, 3:18 AM

I am also ok with an assert. Also adding Joerg back to reviewers, arc dropped him from the reviewers list when updating the review.

LGTM, thanks!

Let's see how this goes... :)

This revision is now accepted and ready to land.Sep 27 2017, 7:47 AM

manojgupta closed this revision.Sep 27 2017, 10:37 AM

Revision Contents

Path

Size

lib/

builtins/

clear_cache.c

3 lines

Diff 114961

lib/builtins/clear_cache.c

Show First 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	#define __ARM_NR_cacheflush 0x0f0002
register int start_reg __asm("r0") = (int) (intptr_t) start;		register int start_reg __asm("r0") = (int) (intptr_t) start;
const register int end_reg __asm("r1") = (int) (intptr_t) end;		const register int end_reg __asm("r1") = (int) (intptr_t) end;
const register int flags __asm("r2") = 0;		const register int flags __asm("r2") = 0;
const register int syscall_nr __asm("r7") = __ARM_NR_cacheflush;		const register int syscall_nr __asm("r7") = __ARM_NR_cacheflush;
__asm __volatile("svc 0x0"		__asm __volatile("svc 0x0"
: "=r"(start_reg)		: "=r"(start_reg)
: "r"(syscall_nr), "r"(start_reg), "r"(end_reg),		: "r"(syscall_nr), "r"(start_reg), "r"(end_reg),
"r"(flags));		"r"(flags));
if (start_reg != 0) {
compilerrt_abort();
}
#elif defined(_WIN32)		#elif defined(_WIN32)
FlushInstructionCache(GetCurrentProcess(), start, end - start);		FlushInstructionCache(GetCurrentProcess(), start, end - start);
#else		#else
compilerrt_abort();		compilerrt_abort();
#endif		#endif
#elif defined(__mips__)		#elif defined(__mips__)
const uintptr_t start_int = (uintptr_t) start;		const uintptr_t start_int = (uintptr_t) start;
const uintptr_t end_int = (uintptr_t) end;		const uintptr_t end_int = (uintptr_t) end;
▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines