Page MenuHomePhabricator

[AArch64][Builtins] Avoid unnecssary cache cleaning
ClosedPublic

Authored by bryanpkc on Oct 21 2019, 3:31 AM.

Details

Summary

Use new control bits CTR_EL0.DIC and CTR_EL0.IDC to discover the d-cache
cleaning and i-cache invalidation requirements for instruction-to-data
coherence. This matches the behavior in the latest libgcc.

Author: Shaokun Zhang <zhangshaokun@hisilicon.com>

Diff Detail

Event Timeline

bryanpkc created this revision.Oct 21 2019, 3:31 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 21 2019, 3:31 AM
Herald added subscribers: llvm-commits, Restricted Project, delcypher, kristof.beyls. · View Herald Transcript

I'm on vacation at the moment, will take a look next week as this will need some checking of the architecture manual. This LKML message https://lkml.org/lkml/2018/10/9/698 implies that in some cases it is insufficient to just read CTR_EL0.{IDC,DIC}, as a 0 also requires a check with CTR_EL1 which can't be done in userspace. Would you be able to take a look at that thread? For compiler-rt we need something that works on all CPUs at EL0.

I'm on vacation at the moment, will take a look next week as this will need some checking of the architecture manual. This LKML message https://lkml.org/lkml/2018/10/9/698 implies that in some cases it is insufficient to just read CTR_EL0.{IDC,DIC}, as a 0 also requires a check with CTR_EL1 which can't be done in userspace. Would you be able to take a look at that thread? For compiler-rt we need something that works on all CPUs at EL0.

The way I read that LKML discussion is that it's a kernel discussion on how to expose this feature to userspace (EL0). On earlier architecture revisions this field is architected as "reserved zero", meaning reads from it will be 0. Therefore at boot time the kernel needs to look at a higher EL to figure out if the feature actually exists. Once it figures that out it will set the EL0 field (probably through trapping) to the appropriate value for userspace, which is what we should be reading. Note that a value of 0 here is conservatively correct: it means userspace has to do the explicit cache clean operation, which is always safe to do in this context. This change tries to figure when it's safe to avoid that cache clean.

This logic is the same as the libgcc logic in http://gcc.gnu.org/r276122 so it looks good to me. The libgcc implementation caches the value of ctr_el0 in a static variable to avoid a sysreg read on every call. Can this be done here as well?

I actually read that LKML post before submitting the patch, and I had the same understanding as @ktkachov. I will change the code to cache the result of the MRS instruction as suggested.

bryanpkc updated this revision to Diff 226542.Oct 26 2019, 10:35 AM

Avoid reading CTR_EL0 on every call to __clear_cache.

peter.smith accepted this revision.Oct 28 2019, 3:26 AM

Thanks both for the extra information and update. Looks good to me.

This revision is now accepted and ready to land.Oct 28 2019, 3:26 AM
This revision was automatically updated to reflect the committed changes.