Page MenuHomePhabricator

annita.zhang (annita.zhang)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 1 2019, 11:11 PM (65 w, 6 d)

Recent Activity

Feb 23 2020

annita.zhang accepted D75017: [Driver][X86] Add helptext for malign-branch*, mbranches-within-32B-boundaries.

LGTM

Feb 23 2020, 6:48 PM · Restricted Project
annita.zhang added inline comments to D75017: [Driver][X86] Add helptext for malign-branch*, mbranches-within-32B-boundaries.
Feb 23 2020, 5:00 PM · Restricted Project

Jan 30 2020

annita.zhang accepted D73267: Fix helptext for opt/llc after 14fc20ca6.

Refine the wordings to make it clearer. Other than that, LGTM.

Jan 30 2020, 1:12 AM · Restricted Project

Jan 17 2020

annita.zhang added a comment to D72463: [Driver][X86] Add -malign-branch* and -mbranches-within-32B-boundaries.

@MaskRay Did you merge it to LLVM 10 branch?

It is included in the release branch.

git branch origin/release/10.x --contains 5ca24d09aefaedf8e4148c7fce4b4ab0c4ecc72a # suceeded

Jan 17 2020, 9:34 PM · Restricted Project
annita.zhang added a comment to D72463: [Driver][X86] Add -malign-branch* and -mbranches-within-32B-boundaries.

@MaskRay Did you merge it to LLVM 10 branch?

Jan 17 2020, 7:18 AM · Restricted Project

Jan 16 2020

annita.zhang added a comment to D72225: Align branches within 32-Byte boundary(Prefix padding).

I have serious reservations about the rush to land this patch. I have expressed some of this privately, and other bits have been in previous review threads, but I want to put everything in one public place.

I don't see a strong value in having this in the current LLVM release. We have support for nop padding, and the delta between nop padding and prefix padding is minimal. My personal take is that wrapping up all of the wiring to provide the clang driver option to enable padding (via whatever mechanism is available, for now nops) should take preference over making the padding code marginally better.

Jan 16 2020, 12:35 AM · Restricted Project

Jan 15 2020

annita.zhang added a comment to D72225: Align branches within 32-Byte boundary(Prefix padding).

Diff 21 is still incorrect. I'll give a reproduce.

(DATA16_PREFIX)
(LEA64r)
<--- Diff 21 can place cs (0x2e) here and break the General Dynamic TLS code sequence --->
<MCInst 851> (DATA16_PREFIX)
<MCInst 851> (DATA16_PREFIX)
<MCInst 2450> (REX64_PREFIX)
<MCInst 602 <MCOperand Expr:(__tls_get_addr@PLT)>> CALL64pcrel32
Jan 15 2020, 10:53 PM · Restricted Project

Jan 13 2020

annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

@MaskRay Please rebase and upload it with me and other people as reviewers, then I'll LGTM.

Jan 13 2020, 11:12 PM · Restricted Project

Jan 6 2020

annita.zhang added inline comments to D72227: [Driver][X86] Add -malign-branch* and -malign-branch-within-32B-boundaries.
Jan 6 2020, 11:02 PM · Restricted Project
annita.zhang added inline comments to D72227: [Driver][X86] Add -malign-branch* and -malign-branch-within-32B-boundaries.
Jan 6 2020, 11:02 PM · Restricted Project
annita.zhang requested changes to D72227: [Driver][X86] Add -malign-branch* and -malign-branch-within-32B-boundaries.

A general comment. All the tests have been done with -malign-branch-prefix-size=5. I don't see any explicit reason to change the default prefix size from 5 to 4.

Jan 6 2020, 9:55 PM · Restricted Project
annita.zhang added a reviewer for D68720: Support -fstack-clash-protection for x86: andrew.w.kaylor.
Jan 6 2020, 12:16 AM · Restricted Project, Restricted Project
annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

PIng.
@ruiu LLVM 10.0.0 will be branched on Jan. 15. I would expect that the LLD CET support can be included in this release. If you have concern with current implementation, let's have an offline discussion and move it forward quickly.

Jan 6 2020, 12:16 AM · Restricted Project

Dec 20 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

I've gone ahead and landed the patch so that we can iterate in tree. See commit 14fc20ca62821b5f85582bf76a467d412248c248.

I've also landed a couple of follow up patches to address issues which would have otherwise required iteration on the review. See commits c148e2e2ef86f53391be459752511684424f331b, 4024d49edc1598a6f8017df541147b38bf1e2818, and 8b725f0459eee468ed7f9935fba3278fcb4997b1.

I still see some room for further cleanup (i.e. the fragment range scheme and tests), but what's in is of reasonable quality.

There's a couple follow up patches which are probably called for, but I think we can work on these in parallel now.

  1. We need to settle on assembler syntax.
  2. We need a patch for the x86 MI to MC translation to mark regions unsafe to pad. (Probably best to separate from the above for the moment.)
  3. We can incrementally add support for prefix padding.
Dec 20 2019, 6:48 PM · Restricted Project, Restricted Project

Dec 19 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

To be concrete, I propose:
".autopad", ".noautopad": allow/disallow the assembler to emit padding via inserting a nop or prefix before any instruction, as needed.
".align_branch_boundary [N,] [instruction,]": Enable branch-boundary padding (per previous description).

Dec 19 2019, 6:04 PM · Restricted Project, Restricted Project

Dec 17 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).
Dec 17 2019, 12:57 AM · Restricted Project, Restricted Project

Dec 16 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

Here are the minutes from our phone call a few minutes ago.

Thanks for coordinating the meeting and having a clear summary. It helps a lot to accelerate the patch review. I really appreciate it!

Annita will refresh current patch with two key changes. 1) Drop prefix support and simplify and 2) drop clang driver support for now. Desire is to minimize cycle time before next iteration so that feedback on approach can be given while reviewers are still around.

Dec 16 2019, 10:00 PM · Restricted Project, Restricted Project
annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

More performance data was posted on http://lists.llvm.org/pipermail/llvm-dev/2019-December/137609.html and http://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html. Let's move on based on the data.

Dec 16 2019, 8:43 AM · Restricted Project, Restricted Project
annita.zhang added a comment to D71238: Align non-fused branches within 32-Byte boundary (basic case).

I just want to mention here, that my comment on the original patch still stands and really needs to be addressed:

We need to have detailed performance and binary size numbers for this kind of change. We should be looking at the test suite and SPEC for performance. I'd like to see before/after performance numbers on a system that is *not* using the microcode feature that addresses jCC (either a newer CPU, an older CPu, or an unpatched CPU). I'd also like to see before/after performance numbers on a system that *is* using the microcode feature to address the jCC issue. I'd suggest Clang itself, Firefox, and Chromium as binaries to show the binary size impact of this patch. Happy for others to suggest binaries to use to evaluate the binary size impact.

I continue to think that any patch we plan to land should have these numbers, and each incremental update to that patch should have the updated numbers. Otherwise, there is no way to do a good job reviewing the *impact* of these changes even after folks are satisfied with the *code* in these changes.

Dec 16 2019, 12:51 AM · Restricted Project
annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

I'm seeing lots of updates to fix bugs, but no movement for many days on both my meta comments and (in some ways more importantly) James's meta comments. (And thanks Philip for chiming in too!)

Meanwhile, we really, really need to get this functionality in place. The entire story for minimizing the new microcode performance hit hinges on these patches, and I'm really worried by how little progress we're seeing here.

Sorry for belated response. We're working hard to go through some paper work to get the performance data ready. I think maybe it's better to open a mailing thread in llvm-dev to post those performance data and discuss those suggestions.

The first data was posted in http://lists.llvm.org/pipermail/llvm-dev/2019-December/137413.html.

Thanks,
Annita

Dec 16 2019, 12:51 AM · Restricted Project, Restricted Project

Dec 10 2019

annita.zhang added a comment to D71238: Align non-fused branches within 32-Byte boundary (basic case).

Honestly, I both strongly agree with this, and see the argument as a somewhat lost cause. I think in practical terms, we *will* have to support something which works for both compiler generated assembly and legacy hand written assembly. I dislike that fact and the design it encourages, but it's reality.

Hand-written assembly is generally responsible for its own performance optimizations, and we do not usually expect the assembler to muck around with the code. So, why is it assumed that _this_ optimization should be done automatically on such assembly code?

Dec 10 2019, 9:55 PM · Restricted Project
annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

Hmm, I'm sorry but I'm confused. IIRC I had a discussion in the LLVM dev meeting that we were going to submit a change with a single PLT scheme rather than IPLT, so when I said that I'm going to submit a patch, I meant that I'm going to submit a patch for the 1PLT scheme rather than the 2PLT scheme. But this is for the 2PLT scheme. This is not something I want.

What was the decision that was made at the developer meeting? Will lld support the 2PLT scheme defined in the psABI?

Dec 10 2019, 9:28 PM · Restricted Project

Dec 9 2019

annita.zhang added a comment to D71238: Align non-fused branches within 32-Byte boundary (basic case).

I have a general comment. I think the macro-fused jcc instructions should be taken into account even for NOP padding. From our experience, it's better to take the macro-fused instructions as an integrated instruction, and ensure it not cross or against 32-byte boundary by padding if necessary.

Dec 9 2019, 10:25 PM · Restricted Project
annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

The point is that we have explicit requirement at the start and we have a lowering into 16-byte sequence that we need to be preserved exactly as it is.
Essentially what we need is a "protection" for this sequence from any changes by machinery that generates the binary code.
How can we protect a particular byte sequence from being changed by this branch aligner?

Dec 9 2019, 5:17 AM · Restricted Project, Restricted Project

Dec 7 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).
Dec 7 2019, 1:30 AM · Restricted Project, Restricted Project

Dec 6 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

Third, I have not see a justification for why complexity for instruction prefix padding is necessary. All the effected CPUs support multi-byte nops, so we're talking about a *single micro op* difference between the nop form and prefix form. Can anyone point to a performance delta due to this? If not, I'd suggest we should start with the nop form, and then build the prefix form in a generic manner for all alignment varieties.

+1.

+1. Starting from just NOP padding sounds a simple and good first step. We can explore segment override prefixes in the future.

Dec 6 2019, 12:23 AM · Restricted Project, Restricted Project

Dec 3 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

I'm seeing lots of updates to fix bugs, but no movement for many days on both my meta comments and (in some ways more importantly) James's meta comments. (And thanks Philip for chiming in too!)

Meanwhile, we really, really need to get this functionality in place. The entire story for minimizing the new microcode performance hit hinges on these patches, and I'm really worried by how little progress we're seeing here.

Dec 3 2019, 11:30 PM · Restricted Project, Restricted Project

Nov 14 2019

annita.zhang added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

Thanks for sending this out!

I've made a minor comment on the flag structure, but my bigger question would be: can you summarize the performance hit and code size hit you've seen across some benchmarks? SPEC and the LLVM test suite would be nice, and maybe code size alone for some large binary like Chrome, Firefox, or Safari?

I'd be particularly interested in comparing the performance & code size hit incurred by the suggested "fused,jcc,jmp" set compared to all (adding call, ret, and indirect). If there is a big drop in either, I'd love to know which of these incurs the large drop.

Thanks,
-Chandler

Nov 14 2019, 9:49 PM · Restricted Project, Restricted Project

Nov 8 2019

annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

Sounds good. Thank you!

Nov 8 2019, 8:26 AM · Restricted Project
annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

ping

Nov 8 2019, 12:49 AM · Restricted Project

Oct 21 2019

annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

In the Cauldron, GCC/LLVM Collaboration BoF was a topic. The PLT scheme argument reflects a bigger problem that some collaboration does not work well. I will give an example about the latest binutils 2.33.1 release. I caught the objcopy --set-section-alignment problem in the last minute (https://sourceware.org/ml/binutils/2019-10/msg00008.html). I would regret if the problematic objcopy patch slipped through and GNU developers were reluctant to change it - which would mean llvm-objcopy would not have an ideal solution. Recently in another case, I caught a problem related to .ctf but I guess GNU maintainers will not care my opinion. I'll stop as it may be too off-topic now.

For my own notes, I should follow binutils, libc-alpha, and x86-64-abi discussions more closely.

I accepted this patch. I am still sad about the .plt.sec scheme but am not opposed to this patch. It is up to @ruiu to rebase the patch and commit it.

Oct 21 2019, 12:39 AM · Restricted Project
annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.
Oct 21 2019, 12:20 AM · Restricted Project

Oct 15 2019

annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

I added a comment to x86-64 ABI google group link. I also posted it below.

Oct 15 2019, 1:58 AM · Restricted Project

May 28 2019

annita.zhang added a comment to D59780: Support Intel Control-flow Enforcement Technology.

The current problem is the 2nd PLT ABI was in GCC side for more than 2 years. I think we need to work with GCC community to have a consistent ABI so we won't break each other. Let's have more discussion in x86_64 ABI site.

May 28 2019, 10:45 PM · Restricted Project