anna (Anna Thomas)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 30 2016, 11:13 AM (390 w, 1 d)

Recent Activity

Yesterday

anna added a comment to D153014: Deduplication of cyclic PHI nodes.

Looks like the missing concern from Nikita is about the compile time impact. Marek is checking this using the compile time tracker.

Wed, Sep 20, 8:34 AM · Restricted Project, Restricted Project
anna updated the diff for D154157: [LV] Cost model for out-of-loop reductions.

Added ctpop test

Wed, Sep 20, 7:07 AM · Restricted Project, Restricted Project
anna added a comment to D154157: [LV] Cost model for out-of-loop reductions.

Would you mind testing whether this also fixes https://github.com/llvm/llvm-project/issues/57476 and adding it as a test case if so?

Wed, Sep 20, 7:04 AM · Restricted Project, Restricted Project

Tue, Sep 19

anna added a comment to D154157: [LV] Cost model for out-of-loop reductions.

Any comments to progress this further?

Tue, Sep 19, 6:20 AM · Restricted Project, Restricted Project

Tue, Sep 12

anna added a comment to D154157: [LV] Cost model for out-of-loop reductions.

ping?

Tue, Sep 12, 6:55 PM · Restricted Project, Restricted Project

Fri, Sep 8

anna added a comment to D157549: [LV] Add debug output to print interleaved groups.
Fri, Sep 8, 9:26 AM · Restricted Project, Restricted Project
anna retitled D127884: POC: Add `elementtype` attribute requirement on atomic memory intrinsics. from Add `elementtype` attribute requirement on atomic memory intrinsics. to POC: Add `elementtype` attribute requirement on atomic memory intrinsics..
Fri, Sep 8, 9:22 AM · Restricted Project, Restricted Project
anna retitled D127884: POC: Add `elementtype` attribute requirement on atomic memory intrinsics. from [Draft][LangRef] Document `elementtype` attribute requirement on atomic memory intrinsics. to Add `elementtype` attribute requirement on atomic memory intrinsics..
Fri, Sep 8, 9:19 AM · Restricted Project, Restricted Project
anna updated the diff for D127884: POC: Add `elementtype` attribute requirement on atomic memory intrinsics..

This is a POC of most of the changes for the RFC to follow. The change will be split into LangRef + API as one change, Verifier + pass updates as another change.

Fri, Sep 8, 9:18 AM · Restricted Project, Restricted Project
anna added a comment to D157549: [LV] Add debug output to print interleaved groups.

shall I go ahead and land this? Simple enough change.

Fri, Sep 8, 9:13 AM · Restricted Project, Restricted Project

Wed, Sep 6

anna updated the summary of D154157: [LV] Cost model for out-of-loop reductions.
Wed, Sep 6, 11:29 AM · Restricted Project, Restricted Project
anna updated the diff for D154157: [LV] Cost model for out-of-loop reductions.

rebased over changes from dmgreen with more accurate costs for ARM out-of-loop reductions.

Wed, Sep 6, 11:21 AM · Restricted Project, Restricted Project
anna added a comment to D154157: [LV] Cost model for out-of-loop reductions.

Our X86 results are in. Over 244 workloads, the geomean changes by about 0.4%. Looking at individual workloads, there are no major gains or regressions in large applications (unfortunately, we cannot share the exact benchmark names publicly). However, we do see big gains in 3 of these workloads, where a floating-point minimum/maximum reduction is performed over a float array of 3 elements (without this change, we were vectorizing it).
Overall, I think the change is still reasonable to have since we now account for out-of-loop reductions more accurately.

Wed, Sep 6, 11:19 AM · Restricted Project, Restricted Project
anna commandeered D127884: POC: Add `elementtype` attribute requirement on atomic memory intrinsics..

Thanks @dantrushin for the patch. We have now had nasty miscompiles in practice (memory intrinsics were lowered to regular loops without the barriers needed for GC) and would like to make progress with this patch.

Wed, Sep 6, 8:16 AM · Restricted Project, Restricted Project

Tue, Sep 5

anna requested changes to D157729: [GuardWidening] Widen widenable conditions instead of branches.

Rebase needed. thanks.

Tue, Sep 5, 2:03 PM · Restricted Project, Restricted Project
anna added a comment to D154157: [LV] Cost model for out-of-loop reductions.

I've made some changes for the Arm MVE and NEON costs. Can you try a rebase? Thanks

Tue, Sep 5, 8:54 AM · Restricted Project, Restricted Project

Thu, Aug 31

anna added a comment to D154157: [LV] Cost model for out-of-loop reductions.

Gentle ping @fhahn. Anything I could do to move this forward? I have kept the option off by default, since I would need help testing this on supported targets upstream. It helps our use case, where loops are not vectorized when out-of-loop reductions are present in small trip count loops.
There may be some fallout where corrections to the reduction cost modelling will be required.

What are the regressions on ARM/RISCV? I think we should aim to enable this by default for all platforms, otherwise it is at the risk of not getting enabled. Also, those tests may show issues with the current implementation.

The regressions on ARM and RISCV are because we have an updated minimum trip count we expect (to offset the cost of the out-of-loop reductions), so the CHECK lines have been updated accordingly. RISCV looks okay with the minimum trip count being at least 4 or 8 (we round the MinimumTripCount up to VF, so it is more conservative).

Thu, Aug 31, 12:23 PM · Restricted Project, Restricted Project
anna updated the diff for D154157: [LV] Cost model for out-of-loop reductions.

changed the flag to on by default. Fixed the ARM/RISCV tests (by updating the lines). More details will follow.

Thu, Aug 31, 12:02 PM · Restricted Project, Restricted Project

Wed, Aug 30

anna accepted D157689: [GuardWidening] Refactor to work with the list of checks to widen/hoist.

LGTM.

Wed, Aug 30, 2:07 PM · Restricted Project, Restricted Project
anna added a comment to D154157: [LV] Cost model for out-of-loop reductions.

Gentle ping @fhahn. Anything I could do to move this forward? I have kept the option off by default, since I would need help testing this on supported targets upstream. It helps our use case, where loops are not vectorized when out-of-loop reductions are present in small trip count loops.
There may be some fallout where corrections to the reduction cost modelling will be required.

Wed, Aug 30, 7:46 AM · Restricted Project, Restricted Project

Mon, Aug 28

anna accepted D159009: [NFC][GuardWidening] Split widenCondCommon method.

Thanks!

Mon, Aug 28, 1:58 PM · Restricted Project, Restricted Project

Fri, Aug 25

anna accepted D158866: [StatepointLowering] Fix possible nullptr access in debug output.
Fri, Aug 25, 12:15 PM · Restricted Project, Restricted Project
anna added inline comments to D157689: [GuardWidening] Refactor to work with the list of checks to widen/hoist.
Fri, Aug 25, 10:37 AM · Restricted Project, Restricted Project
anna updated the diff for D154157: [LV] Cost model for out-of-loop reductions.

added an option which switches this off by default. The failures above are related to ARM and RISCV targets.

Fri, Aug 25, 7:52 AM · Restricted Project, Restricted Project

Thu, Aug 24

anna added inline comments to D157689: [GuardWidening] Refactor to work with the list of checks to widen/hoist.
Thu, Aug 24, 7:46 AM · Restricted Project, Restricted Project

Aug 22 2023

anna updated the diff for D154157: [LV] Cost model for out-of-loop reductions.

Rewrote the patch to extend existing logic for runtime checks calculation.
This now supports all kinds of out-of-loop reductions. Since we compute an upper bound on MinProfitableTripCount, this may be a more conservative estimate.
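
The break-even reasoning behind a minimum profitable trip count can be sketched as below. This is an illustrative model only: `minProfitableTripCount` and its cost parameters are hypothetical names, not code from the patch. It assumes the vector loop pays a one-time overhead outside the loop (such as the final reduction) and that the result is rounded up to a multiple of VF.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not LLVM's API: smallest trip count at which the
// vector loop plus a one-time out-of-loop overhead is no more expensive
// than the scalar loop.
//   ScalarCost: cost of one scalar iteration.
//   VectorCost: cost of one vector iteration (covers VF scalar iterations).
//   Overhead:   one-time cost paid outside the loop (e.g. the final reduction).
std::optional<uint64_t> minProfitableTripCount(uint64_t ScalarCost,
                                               uint64_t VectorCost,
                                               uint64_t VF,
                                               uint64_t Overhead) {
  if (VF == 0)
    return std::nullopt;
  uint64_t ScalarPerVecIter = ScalarCost * VF;
  if (ScalarPerVecIter <= VectorCost)
    return std::nullopt; // The vector body is never cheaper.
  uint64_t Div = ScalarPerVecIter - VectorCost;
  // Solve ScalarCost * TC >= VectorCost * TC / VF + Overhead for TC,
  // using a ceiling division.
  uint64_t MinTC = (Overhead * VF + Div - 1) / Div;
  // Round up to the next multiple of VF, which makes the bound conservative.
  return (MinTC + VF - 1) / VF * VF;
}
```

For example, with a scalar cost of 4, a vector cost of 8, VF = 4 and an overhead of 10, the raw break-even point is 5 iterations, which rounds up to 8.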

Aug 22 2023, 10:05 AM · Restricted Project, Restricted Project

Aug 18 2023

anna committed rG23f08af2bedd: [Inline] Avoid incompatible return attributes on deoptimize (authored by anna).
[Inline] Avoid incompatible return attributes on deoptimize
Aug 18 2023, 9:56 AM · Restricted Project, Restricted Project
anna closed D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 9:56 AM · Restricted Project, Restricted Project
anna updated the summary of D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 9:46 AM · Restricted Project, Restricted Project
anna updated the diff for D158286: [Inline] Avoid incompatible return attributes on deoptimize.

addressed review comment.

Aug 18 2023, 9:43 AM · Restricted Project, Restricted Project
anna added inline comments to D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 9:43 AM · Restricted Project, Restricted Project
anna updated the summary of D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 9:18 AM · Restricted Project, Restricted Project
anna updated the summary of D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 9:01 AM · Restricted Project, Restricted Project
anna added a comment to D158286: [Inline] Avoid incompatible return attributes on deoptimize.

One thing to note though is we would need to future proof against incompatible types (whenever new return types are added), but that happens in the common API: AttributeFuncs::typeIncompatible.

Aug 18 2023, 9:00 AM · Restricted Project, Restricted Project
anna updated the diff for D158286: [Inline] Avoid incompatible return attributes on deoptimize.

remove incompatible return attributes at the point we change the return type.

Aug 18 2023, 8:58 AM · Restricted Project, Restricted Project
anna added a comment to D158286: [Inline] Avoid incompatible return attributes on deoptimize.

Where is this update done? Shouldn't the code changing the return type drop incompatible attributes?

Aug 18 2023, 8:21 AM · Restricted Project, Restricted Project
anna updated the summary of D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 8:18 AM · Restricted Project, Restricted Project
anna updated the summary of D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 8:17 AM · Restricted Project, Restricted Project
anna requested review of D158286: [Inline] Avoid incompatible return attributes on deoptimize.
Aug 18 2023, 8:16 AM · Restricted Project, Restricted Project

Aug 17 2023

anna accepted D157502: [LoopPredication] Rework assumes of widened conditions.

LGTM w/ comments.

Aug 17 2023, 9:38 AM · Restricted Project, Restricted Project
anna added a comment to D157729: [GuardWidening] Widen widenable conditions instead of branches.

Do you need review for this? Could you pls state the review stack in the description so the order of reviews is clear? Thanks.

Aug 17 2023, 7:57 AM · Restricted Project, Restricted Project
anna added a comment to D157689: [GuardWidening] Refactor to work with the list of checks to widen/hoist.

Do you need review for this? Could you pls state the review stack in the description so the order of reviews is clear? Thanks.

Aug 17 2023, 7:57 AM · Restricted Project, Restricted Project

Aug 15 2023

anna accepted D157529: [NFC][GuardUtils] Add util to extract widenable conditions.

LGTM w/ comments.

Aug 15 2023, 9:28 AM · Restricted Project, Restricted Project

Aug 9 2023

anna requested review of D157549: [LV] Add debug output to print interleaved groups.
Aug 9 2023, 2:17 PM · Restricted Project, Restricted Project
anna committed rG5dfdf34df0de: [LV] Move interleaved test to X86 directory (authored by anna).
[LV] Move interleaved test to X86 directory
Aug 9 2023, 1:04 PM · Restricted Project, Restricted Project
anna added a comment to D155520: [LV] Complete load groups and release store groups in presence of dependency.

Thanks, I've moved the test and removed the x86-registered-target

Aug 9 2023, 1:03 PM · Restricted Project, Restricted Project
anna added inline comments to D157529: [NFC][GuardUtils] Add util to extract widenable conditions.
Aug 9 2023, 12:29 PM · Restricted Project, Restricted Project
anna accepted D157276: [NFC][LoopPredication] Extract guard parsing to GuardUtils.
Aug 9 2023, 10:23 AM · Restricted Project, Restricted Project

Aug 8 2023

anna committed rGcb7d28ef52b4: Fix BB failure for check lines (authored by anna).
Fix BB failure for check lines
Aug 8 2023, 5:30 PM · Restricted Project, Restricted Project
anna committed rG3cf24dbbdde0: [LV] Complete load groups and release store groups. Try 2. (authored by anna).
[LV] Complete load groups and release store groups. Try 2.
Aug 8 2023, 3:11 PM · Restricted Project, Restricted Project
anna closed D155520: [LV] Complete load groups and release store groups in presence of dependency.
Aug 8 2023, 3:10 PM · Restricted Project, Restricted Project
anna added a comment to D155520: [LV] Complete load groups and release store groups in presence of dependency.

Thanks everyone for the review and test cases. I'll try landing this again today.

Aug 8 2023, 1:25 PM · Restricted Project, Restricted Project
anna added inline comments to D157276: [NFC][LoopPredication] Extract guard parsing to GuardUtils.
Aug 8 2023, 9:25 AM · Restricted Project, Restricted Project
anna accepted D157073: [LegacyPM] Remove LowerGuardIntrinsicLegacyPass.

LGTM - we don't use the legacy pass pipeline. In fact, we're moving away from guard representation completely (not sure if anyone else downstream uses it though).

Aug 8 2023, 7:27 AM · Restricted Project, Restricted Project

Aug 3 2023

anna updated the diff for D155520: [LV] Complete load groups and release store groups in presence of dependency.

added Ayal's minimal reproducer. Addressed review comments.

Aug 3 2023, 6:48 PM · Restricted Project, Restricted Project
anna updated the diff for D155520: [LV] Complete load groups and release store groups in presence of dependency.

updated test case with correct check lines and comments.

Aug 3 2023, 10:10 AM · Restricted Project, Restricted Project
anna added inline comments to D155520: [LV] Complete load groups and release store groups in presence of dependency.
Aug 3 2023, 9:52 AM · Restricted Project, Restricted Project
anna accepted D156963: [X86] Workaround possible CPUID bug in Sandy Bridge..
Aug 3 2023, 7:10 AM · Restricted Project, Restricted Project, Restricted Project
anna added a comment to D156963: [X86] Workaround possible CPUID bug in Sandy Bridge..

Thank you for the fix! Details of the diagnostic are here (for reference later): https://reviews.llvm.org/D155145#4556178

Aug 3 2023, 7:09 AM · Restricted Project, Restricted Project, Restricted Project

Aug 2 2023

anna added a comment to D155145: [X86] Add AVX-VNNI-INT16 instructions..

thank you @craig.topper and @pengfei .

Aug 2 2023, 8:58 PM · Restricted Project, Restricted Project, Restricted Project
anna added a comment to D155145: [X86] Add AVX-VNNI-INT16 instructions..

Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's a bug in CPUID on Sandy Bridge.

Sure, on the original code before the patch you suggested right?
The two calls are:

    bool HasLeaf7 =
        MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, &EAX, &EBX, &ECX, &EDX);
+   llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
+                << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << EDX
+                << "\n";

    Features["fsgsbase"]   = HasLeaf7 && ((EBX >>  0) & 1);
    ....
    bool HasLeaf7Subleaf1 =
        MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, &EDX);
+   llvm::errs() << "Before setting sha512 the value for EAX: " << EAX
+                << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << EDX
+                << "\n";
    Features["sha512"]     = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
    ...
we set avxvnniint16 after this

Takes a while to get a build on this machine, should have the output soon.

Aug 2 2023, 8:01 PM · Restricted Project, Restricted Project, Restricted Project
anna added a comment to D155145: [X86] Add AVX-VNNI-INT16 instructions..

Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's a bug in CPUID on Sandy Bridge.

Sure, on the original code before the patch you suggested right?
The two calls are:

    bool HasLeaf7 =
        MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, &EAX, &EBX, &ECX, &EDX);
+   llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
+                << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << EDX
+                << "\n";
Aug 2 2023, 10:36 AM · Restricted Project, Restricted Project, Restricted Project
anna updated the diff for D155520: [LV] Complete load groups and release store groups in presence of dependency.

addressed review comment for use-after-free error (updated test to show the LoopAccessAnalysis bailout no longer present).

Aug 2 2023, 9:31 AM · Restricted Project, Restricted Project
anna added a comment to D155520: [LV] Complete load groups and release store groups in presence of dependency.

Thanks Ayal for the root cause. I'll update the patch

Aug 2 2023, 9:28 AM · Restricted Project, Restricted Project
anna added a comment to D155145: [X86] Add AVX-VNNI-INT16 instructions..

We see a crash bisected to this patch about using an illegal instruction.
Here's the CPUInfo for the machine:

CPU info:
current cpu id: 22
total 32(physical cores 16) (assigned logical cores 32) (assigned physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, clmul, ht, tsc, tscinvbit, tscinv, clflush
AvgLoads: 0.30, 0.10, 0.18
CPU Model and flags from /proc/cpuinfo:
model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Online cpus: 0-31
Offline cpus:
BIOS frequency limitation: <Not Available>
Frequency switch latency (ns): 20000
Available cpu frequencies: <Not Available>
Current governor: schedutil
Core performance/turbo boost: <Not Available>

I don't see avxvnniint16 or avx2 in the flags list. So, this (relatively new) instruction shouldn't be generated for this machine. Any ideas on why this might be happening?

As far as I can see from the patch, the only way to generate avxvnniint16 instructions is to call its specific intrinsics explicitly. And we will check compiling options in FE before allowing to call the intrinsics. We do have an optimization to generate vnni instructions without intrinsics, but we haven't extended it to avxvnniint16 so far.
So I don't know what's wrong in your case. Could you provide a reproducer for your problem?

I've investigated what is going on. With this patch, we are now passing +avxvnniint16 into the machine attributes. With that attribute, we now generate an instruction which is illegal on a Sandy Bridge machine:

   0x3013f2af:	jmpq   0x3013f09b
   0x3013f2b4:	mov    %rax,%rdi
   0x3013f2b7:	and    $0xfffffffffffffff0,%rdi
=> 0x3013f2bb:	vpbroadcastd %xmm0,%ymm2
   0x3013f2c0:	vpbroadcastd %xmm1,%ymm3

The instruction vpbroadcastd %xmm0,%ymm2 requires the AVX2 CPU flag: https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has only the AVX flag.

This is the complete mattr generated:

!3 = !{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"}

I've confirmed that if we change this to -avxvnniint16, we do not generate vpbroadcastd.

W.r.t. how we get the machine attributes generated through our front-end:

    if (!sys::getHostCPUFeatures(Features))
      return std::move(mattr);

    // Fill mattr with default values.
    mattr.reserve(Features.getNumItems());
    for (auto &I : Features) {
      std::string attr(I.first());
      mattr.emplace_back(std::string(I.second ? "+" : "-") + attr);
    }
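
To make the data flow concrete, here is a minimal standalone sketch of the same transformation. It is an approximation under stated assumptions: `FeatureMap` (a `std::map`) stands in for the `StringMap<bool>` that `sys::getHostCPUFeatures` fills, and `buildMAttr` is a hypothetical name, not our front-end's function.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in (assumption) for the StringMap<bool> filled by
// sys::getHostCPUFeatures.
using FeatureMap = std::map<std::string, bool>;

// Turn each detected feature into a "+name" or "-name" attribute string.
std::vector<std::string> buildMAttr(const FeatureMap &Features) {
  std::vector<std::string> MAttr;
  MAttr.reserve(Features.size());
  for (const auto &[Name, Enabled] : Features)
    MAttr.push_back((Enabled ? "+" : "-") + Name);
  return MAttr;
}
```

Every entry is forwarded as-is, so a feature that host detection reports incorrectly ends up in the attribute list with a "+" prefix.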

So, the problem is in getHostCPUFeatures, possibly this line from the patch:
Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && HasAVXSave;

Does this patch help?

diff --git a/llvm/lib/TargetParser/Host.cpp b/llvm/lib/TargetParser/Host.cpp
index 1141df09307c..11a6879fb76a 100644
--- a/llvm/lib/TargetParser/Host.cpp
+++ b/llvm/lib/TargetParser/Host.cpp
@@ -1769,7 +1769,7 @@ bool sys::getHostCPUFeatures(StringMap<bool> &Features) {
   Features["amx-tile"]   = HasLeaf7 && ((EDX >> 24) & 1) && HasAMXSave;
   Features["amx-int8"]   = HasLeaf7 && ((EDX >> 25) & 1) && HasAMXSave;
   bool HasLeaf7Subleaf1 =
-      MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, &EDX);
+      HasLeaf7 && EAX >= 1 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, &EDX);
   Features["sha512"]     = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
   Features["sm3"]        = HasLeaf7Subleaf1 && ((EAX >> 1) & 1);
   Features["sm4"]        = HasLeaf7Subleaf1 && ((EAX >> 2) & 1);
Aug 2 2023, 7:01 AM · Restricted Project, Restricted Project, Restricted Project
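
The guard proposed in the diff above can be sketched in isolation. CPUID leaf 7, subleaf 0 reports the highest supported subleaf in EAX, so subleaf 1 should only be trusted when that value is at least 1. In this sketch `CpuidRegs`, `CpuidFn` and `hasAvxVnniInt16` are hypothetical names, and the CPUID query is injected as a callback so the logic can be exercised without real hardware; it is not the code in Host.cpp.

```cpp
#include <cstdint>
#include <functional>

struct CpuidRegs {
  uint32_t EAX = 0, EBX = 0, ECX = 0, EDX = 0;
};

// Hypothetical query callback: returns the registers for (Leaf, Subleaf).
using CpuidFn = std::function<CpuidRegs(uint32_t Leaf, uint32_t Subleaf)>;

// Returns true only if leaf 7 subleaf 1 is valid and its EDX bit 10 is set.
bool hasAvxVnniInt16(uint32_t MaxLevel, bool HasAVXSave, const CpuidFn &Cpuid) {
  if (MaxLevel < 7)
    return false;
  // Leaf 7, subleaf 0: EAX holds the highest supported subleaf.
  CpuidRegs Sub0 = Cpuid(7, 0);
  if (Sub0.EAX < 1)
    return false; // Subleaf 1 is not valid, so its bits cannot be trusted.
  CpuidRegs Sub1 = Cpuid(7, 1);
  return ((Sub1.EDX >> 10) & 1) && HasAVXSave;
}
```

A CPU that reports 0 as its highest subleaf is treated as lacking the feature, whatever an out-of-range subleaf 1 query happens to return.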

Aug 1 2023

anna added a comment to D155145: [X86] Add AVX-VNNI-INT16 instructions..

We see a crash bisected to this patch about using an illegal instruction.
Here's the CPUInfo for the machine:

CPU info:
current cpu id: 22
total 32(physical cores 16) (assigned logical cores 32) (assigned physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, clmul, ht, tsc, tscinvbit, tscinv, clflush
AvgLoads: 0.30, 0.10, 0.18
CPU Model and flags from /proc/cpuinfo:
model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Online cpus: 0-31
Offline cpus:
BIOS frequency limitation: <Not Available>
Frequency switch latency (ns): 20000
Available cpu frequencies: <Not Available>
Current governor: schedutil
Core performance/turbo boost: <Not Available>

I don't see avxvnniint16 or avx2 in the flags list. So, this (relatively new) instruction shouldn't be generated for this machine. Any ideas on why this might be happening?

As far as I can see from the patch, the only way to generate avxvnniint16 instructions is to call its specific intrinsics explicitly. And we will check compiling options in FE before allowing to call the intrinsics. We do have an optimization to generate vnni instructions without intrinsics, but we haven't extended it to avxvnniint16 so far.
So I don't know what's wrong in your case. Could you provide a reproducer for your problem?

Aug 1 2023, 12:39 PM · Restricted Project, Restricted Project, Restricted Project

Jul 28 2023

anna added a comment to D155145: [X86] Add AVX-VNNI-INT16 instructions..

We see a crash bisected to this patch about using an illegal instruction.
Here's the CPUInfo for the machine:

CPU info:
current cpu id: 22
total 32(physical cores 16) (assigned logical cores 32) (assigned physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, clmul, ht, tsc, tscinvbit, tscinv, clflush
AvgLoads: 0.30, 0.10, 0.18
CPU Model and flags from /proc/cpuinfo:
model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Online cpus: 0-31
Offline cpus:
BIOS frequency limitation: <Not Available>
Frequency switch latency (ns): 20000
Available cpu frequencies: <Not Available>
Current governor: schedutil
Core performance/turbo boost: <Not Available>

I don't see avxvnniint16 or avx2 in the flags list. So, this (relatively new) instruction shouldn't be generated for this machine. Any ideas on why this might be happening?

Jul 28 2023, 11:51 AM · Restricted Project, Restricted Project, Restricted Project

Jul 27 2023

anna added inline comments to D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 27 2023, 10:13 AM · Restricted Project, Restricted Project
anna added inline comments to D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 27 2023, 10:10 AM · Restricted Project, Restricted Project
anna requested review of D155520: [LV] Complete load groups and release store groups in presence of dependency.

Fixed the bug caused by this patch.

Jul 27 2023, 10:09 AM · Restricted Project, Restricted Project
anna updated the diff for D155520: [LV] Complete load groups and release store groups in presence of dependency.

fixed use-after-free error (with added testcase). This bug wasn't there before the patch because we would break out of the inner loop accessing A, whereas now we were continuing to see which other A accesses needed to release the group.

Jul 27 2023, 10:08 AM · Restricted Project, Restricted Project
anna reopened D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 27 2023, 10:08 AM · Restricted Project, Restricted Project
anna added a comment to D155520: [LV] Complete load groups and release store groups in presence of dependency.

I reduced the testcase enough to show how GroupA can be the same as GroupB, but there are two ways to fix this:

  • Identify that GroupA is the same as GroupB and, once we have released GroupA, iterate to the next B instruction (in the outer loop).
  • GroupA which is being released should never be the same as GroupB, and we do this by making sure that there's no dependency between *any of* the stores when inserting into GroupB. AFAICT, we don't do that since we check only between A and B when inserting (store) A into (store) GroupB.
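
The first option can be illustrated with a toy model. Everything here is hypothetical (`GroupMap`, `releaseGroup`, `releaseConflictingGroups`, and the `Conflicts` predicate are made-up names, instructions are plain integers) and it only shows the control flow, not the vectorizer's data structures: once the released group turns out to be B's own group, the inner scan stops and the outer loop moves on to the next B.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Toy model: instructions are ints in program order, and GroupOf maps each
// grouped instruction to its group id.
using GroupMap = std::map<int, int>;

// Drop every member of group G, mimicking "releasing" the group.
void releaseGroup(GroupMap &GroupOf, int G) {
  for (auto It = GroupOf.begin(); It != GroupOf.end();) {
    if (It->second == G)
      It = GroupOf.erase(It);
    else
      ++It;
  }
}

// For each B (walking backwards), scan the earlier instructions A. If A is
// grouped and conflicts with B, release A's group. Returns the number of
// groups released.
int releaseConflictingGroups(GroupMap &GroupOf, const std::vector<int> &Insts,
                             const std::function<bool(int, int)> &Conflicts) {
  int Released = 0;
  for (std::size_t BI = Insts.size(); BI-- > 0;) {
    int B = Insts[BI];
    for (std::size_t AI = BI; AI-- > 0;) {
      int A = Insts[AI];
      auto AIt = GroupOf.find(A);
      if (AIt == GroupOf.end() || !Conflicts(A, B))
        continue;
      int GroupA = AIt->second;
      auto BIt = GroupOf.find(B);
      bool SameGroup = BIt != GroupOf.end() && BIt->second == GroupA;
      releaseGroup(GroupOf, GroupA);
      ++Released;
      // B's group no longer exists: stop using it and move to the next B.
      if (SameGroup)
        break;
    }
  }
  return Released;
}
```

In this model a released group is just a set of erased map entries, so nothing dangles; in code that holds a pointer to GroupB, continuing the inner loop after the release is where a use-after-free could arise.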
Jul 27 2023, 8:58 AM · Restricted Project, Restricted Project
anna added a comment to D155520: [LV] Complete load groups and release store groups in presence of dependency.

Thanks @mstorsjo and just for info, the use-after-free error shows up as a hang in non-asan builds (so it's highly likely both reports are the same bug). I've reduced the first repro with bugpoint and it is due to GroupA == GroupB.

Jul 27 2023, 6:35 AM · Restricted Project, Restricted Project

Jul 26 2023

anna added a reverting change for rGeaf6117f3388: [LV] Complete load groups and release store groups in presence of dependency: rGe85fd3cbdd68: Revert "[LV] Complete load groups and release store groups in presence of….
Jul 26 2023, 12:08 PM · Restricted Project, Restricted Project
anna committed rGe85fd3cbdd68: Revert "[LV] Complete load groups and release store groups in presence of… (authored by anna).
Revert "[LV] Complete load groups and release store groups in presence of…
Jul 26 2023, 12:08 PM · Restricted Project, Restricted Project
anna added a reverting change for D155520: [LV] Complete load groups and release store groups in presence of dependency: rGe85fd3cbdd68: Revert "[LV] Complete load groups and release store groups in presence of….
Jul 26 2023, 12:07 PM · Restricted Project, Restricted Project
anna added inline comments to D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 26 2023, 11:56 AM · Restricted Project, Restricted Project

Jul 25 2023

anna committed rGeaf6117f3388: [LV] Complete load groups and release store groups in presence of dependency (authored by anna).
[LV] Complete load groups and release store groups in presence of dependency
Jul 25 2023, 2:32 PM · Restricted Project, Restricted Project
anna closed D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 25 2023, 2:32 PM · Restricted Project, Restricted Project
anna added a comment to D155520: [LV] Complete load groups and release store groups in presence of dependency.

Thank you for the review Ayal!

Jul 25 2023, 1:32 PM · Restricted Project, Restricted Project
anna added a comment to D155520: [LV] Complete load groups and release store groups in presence of dependency.

Hi Ayal, any more comments? Thanks.

Jul 25 2023, 11:38 AM · Restricted Project, Restricted Project

Jul 21 2023

anna updated the diff for D155520: [LV] Complete load groups and release store groups in presence of dependency.

addressed review comments.

Jul 21 2023, 10:17 AM · Restricted Project, Restricted Project
anna added inline comments to D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 21 2023, 10:09 AM · Restricted Project, Restricted Project

Jul 19 2023

anna added inline comments to D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 19 2023, 9:45 AM · Restricted Project, Restricted Project
anna updated the diff for D155520: [LV] Complete load groups and release store groups in presence of dependency.

addressed most review comments: corrected the if (DependentInst) check and made a couple of other NFC changes.

Jul 19 2023, 9:12 AM · Restricted Project, Restricted Project
anna added inline comments to D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 19 2023, 9:10 AM · Restricted Project, Restricted Project

Jul 17 2023

anna requested review of D155520: [LV] Complete load groups and release store groups in presence of dependency.
Jul 17 2023, 3:10 PM · Restricted Project, Restricted Project
anna committed rGa5573bf030e8: [LV] Precommit test for interleaving miscompile (authored by anna).
[LV] Precommit test for interleaving miscompile
Jul 17 2023, 2:25 PM · Restricted Project, Restricted Project

Jul 14 2023

anna committed rG9675e3fa81e5: [LV] Address post-commit NFC comments in interleave (authored by anna).
[LV] Address post-commit NFC comments in interleave
Jul 14 2023, 1:25 PM · Restricted Project, Restricted Project
anna committed rGdfaf4587e4ce: Precommit follow-up testcase for interleaved miscompile (authored by anna).
Precommit follow-up testcase for interleaved miscompile
Jul 14 2023, 1:05 PM · Restricted Project, Restricted Project
anna added inline comments to D154309: [LV] Do not add load to group if it moves across conflicting store..
Jul 14 2023, 8:11 AM · Restricted Project, Restricted Project

Jul 13 2023

anna added inline comments to D154309: [LV] Do not add load to group if it moves across conflicting store..
Jul 13 2023, 5:40 PM · Restricted Project, Restricted Project
anna planned changes to D154157: [LV] Cost model for out-of-loop reductions.

thank you Florian, this is a nice idea. Working on it.

Jul 13 2023, 5:36 PM · Restricted Project, Restricted Project

Jul 12 2023

anna committed rG11592667344f: [SLP] Add support for fmaximum/fminimum reduction (authored by anna).
[SLP] Add support for fmaximum/fminimum reduction
Jul 12 2023, 12:23 PM · Restricted Project, Restricted Project
anna closed D154463: [SLPVectorize] Add support for fmaximum/fminimum reduction.
Jul 12 2023, 12:23 PM · Restricted Project, Restricted Project
anna committed rGa43aebcd91c3: [SLP] Test for minimum/maximum reduction (authored by anna).
[SLP] Test for minimum/maximum reduction
Jul 12 2023, 12:23 PM · Restricted Project, Restricted Project
anna closed D155096: [SLP] Test for minimum/maximum reduction.
Jul 12 2023, 12:23 PM · Restricted Project, Restricted Project
anna updated the diff for D154463: [SLPVectorize] Add support for fmaximum/fminimum reduction.

rebased over test in D155096

Jul 12 2023, 10:18 AM · Restricted Project, Restricted Project
anna updated the diff for D155096: [SLP] Test for minimum/maximum reduction.

removed check tags

Jul 12 2023, 9:30 AM · Restricted Project, Restricted Project