- User Since
- Jun 18 2014, 12:59 AM (269 w, 6 d)
Mon, Aug 12
Fri, Aug 9
I've just added a few fly-by nits; I'm afraid I didn't do an in-depth review.
Thu, Aug 8
Mon, Aug 5
Jul 10 2019
Thank you very much Ziang for sharing this work!
I'd love to more easily benchmark on Android devices - and this seems to be a good step in the right direction.
Thanks for the feedback Sam!
Jul 5 2019
Thanks for upstreaming this, Tim.
May 6 2019
I haven't created any tests for this yet - I thought I'd first try and get feedback on whether this approach seems right before I spent time learning about how to get the test-suite unit/regression tests set up.
Apr 15 2019
Apr 11 2019
Apr 2 2019
This intrinsic got added to gcc a while ago and should become available in the upcoming gcc 9 release.
In gcc however, the prototype of the intrinsic is slightly different (see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html):
type __builtin_speculation_safe_value (type val, type failval)
It provides a second optional argument "failval". From the gcc documentation: "The function may use target-dependent speculation tracking state to cause failval to be returned when it is known that speculative execution has incorrectly predicted a conditional branch operation."
So, when implementing the intrinsic using a speculation barrier such as lfence, that failval argument doesn't have any effect. However, when lowering the intrinsic using speculation tracking similar to how that's used in SLH, this failval parameter is used to return a non-zero value on miss-speculation, in case the developer prefers that over the default zero value.
Mar 27 2019
Thanks for picking this up, Zola!
Mar 8 2019
Mar 7 2019
Feb 27 2019
LGTM, thanks for fixing this!
Feb 21 2019
This seems quite obvious.
Typically when we change the behaviour of code, we aim to add a unit/regression test to check that indeed the code now has the intended behaviour (and mostly to make sure future changes do not regress on that expected behaviour).
That being said, this is a change to debug output, for which we typically do not add regression tests.
Feb 20 2019
Feb 12 2019
Jan 23 2019
Jan 18 2019
Updated approach to use the register scavenger to find an available temporary register. In the rare case when no temporary register is available, fall back to inserting a full speculation barrier.
Jan 16 2019
Jan 15 2019
Jan 14 2019
Jan 11 2019
Any feedback on the idea of an alternative, much lighter Conduit API client (since we only use a subset of Phab features) where it's also easier to (by default) hook linters or various other tools to? Not sure how others feel about Arcanist especially considering it covers a massive set of APIs beyond what's commonly used here and a local PHP installation just to run it is often undesirable (A simpler one-file client in Python should suffice, especially considering that many are changing their workflows due to the Git migration).
Jan 10 2019
Jan 9 2019
Jan 8 2019
I naively wonder if unconditionally enabling a CSE optimization even at -O0 is going to lead to a poorer debug experience in some cases?
Jan 7 2019
Fix 2 issues flagged by -verify-machineinstrs when testing this patch on more code.
Dec 20 2018
Dec 17 2018
Dec 12 2018
This updated diff addresses Oliver's final comment, namely about what to do when the program uses X16, e.g because of use of inline assembly specifically requesting to use X16.
Besides all the options I came up with previously, there is a 4th option: prevent speculation by inserting DSB SYS/ISB instruction pairs. Conceptually this is not unlike using lfence in X86SpeculativeLoadHardening.
I believe the pros/cons of this approach are (compared to the 3 options I listed previously):
- A relatively simple implementation.
- Still keeping the advantages of doing speculation hardening very late.
- No silent non-protected code.
- This alternative protection mechanism is expected to only be needed for a very small amount of code.
- This likely has higher overhead than if we could still use a data flow mechanism to track when control flow miss-speculation happens.
Dec 6 2018
There currently isn't even a user interface to reserve X16.
X16 can be reserved by the user using inline assembly, either with a clobber or named register variable. I can think of a few cases where this might happen in real code:
D51432 uses it in libunwind to implement unwinding through functions using return-address signing.
Inline assembly which contains a function call will need to clobber X16 and X17.
If moving this pass to before register allocation would add a lot of extra complexity, maybe this could also be solved by copying the taint into SP before each inline asm block, and back out afterwards, like we currently do for calls?
- Addressing most of Oliver's comments.
Dec 5 2018
Dec 3 2018
Nov 26 2018
Nov 22 2018
LGTM now. I'm not an expert on the clang attribute mechanics, so am relying on Aaron's judgement/review for that part.
Nov 20 2018
Nov 16 2018
Nov 9 2018
This patch looks fine to me, assuming the TSV110 core implements Armv8.2 including the optional extensions ARMv8.2-FP16, ARMv8.2-DotProd, ARMv8.2-FHM and SPE.
Oct 31 2018
Add a sentence on statuses involved when triaging.
Oct 29 2018
Oct 28 2018
Oct 26 2018
Oct 25 2018
I guess you're mainly asking review on this to get opinions on the approach to introduce new pseudo instructions just to avoid cycles during DAG combining and/or lowering?
From the patch, I furthermore guess that introducing a new pseudo leads to quite a bit of code duplication/violations of the "don't-repeat-yourself" principle in the instruction matching patterns?
Oct 12 2018
Oct 11 2018
Sep 21 2018
Sep 19 2018
Sep 18 2018
Sep 7 2018
Sep 6 2018
Aug 30 2018
I am as excited by this as the other reviewers.
Thank you very much for this, Matthias!
Aug 27 2018
I'm not an expert on many of the areas touched by this patch, but it looks fine from me from a high-level point-of-view, modulo a few nits I have on a few comments.
Aug 23 2018
Jul 10 2018
Jul 9 2018
update diff with full context
Jun 28 2018
Jun 27 2018
I believe this is aiming to implement the -ffixed-reg command line option from gcc.
The documentation is at https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html:
Jun 22 2018
It's clear from your description that the run_under _mutateCommandLine isn't a "private" function in the module as is, and therefore removing the under score makes sense to me.
Maybe it'd be better for the mutatePlan in perf.py to be implemented in a different way so that _mutateCommandLine really can be private in the run_under module.
I'm afraid I don't have enough experience here to know whether that would be a reasonable goal, implementable with reasonable effort.
I hope @MatzeB could share his opinion on this.