- User Since
- Apr 25 2016, 3:58 AM (317 w, 4 d)
May 1 2021
Apr 30 2021
Apr 26 2021
Regarding the pipe events, the patch on kernel.org just copy-pasted the events from zen1 and zen2. I have since learnt that these events are restricted and, at present, not really ready for public release for znver3. That is what the kernel patch reflects as well. Note that PMCx000 is incorrectly mapped and doesn't really count the 6 pipes.
Is there any workaround that you would suggest? I understand that these are pfm counters and are dynamic in nature. However, for the model (at least for the throughput numbers), shall I publish our internal numbers obtained by running the internally enabled libpfm4 library? I understand that this is restricted in its immediate purpose; however, at present it may act as a workaround for the throughput and llvm-mca numbers.
In the meantime, I am going through the modifications/corrections made by @lebedev.ri.
Mar 24 2021
The FPU pipe counters aren't in the PPR either. I have raised a ticket to get the PPR updated. Unfortunately, until the document is updated, these events can't be enabled.
Mar 19 2021
My colleague has submitted the libpfm4 patch to the libpfm4 community. You can take this patch and use it to enable exegesis for znver3. Thank you for the help!
Mar 16 2021
We have exchanged mails with Eranian regarding libpfm4 support. We will upload the libpfm4 patch shortly (in 3-4 days). The patch will be based on the latest PPR manual (https://www.amd.com/system/files/TechDocs/55898_pub.zip). I believe that will keep things moving for verifying the numbers and the details with exegesis.
Feb 26 2021
I can work on the znver1 and znver2 models as well. I have a znver2 machine. Let me check if I can get a znver1 machine and run exegesis to correct these latency/throughput numbers.
Feb 25 2021
We have started working on the libpfm patch and are working on getting it posted very soon! Hopefully, once that is in place, we will be able to measure these numbers on znver3 hardware and correct them accordingly.
Jan 26 2021
I would like to know if this patch can be approved now, with the numbers checked later when a CPU is accessible.
Since this is an extension of, and a set of tweaks to, the znver1->znver2 path, I would request that if it is doable.
Jan 20 2021
Jan 12 2021
Jan 11 2021
The enhancement applies to specific string lengths. I will check it and, if need be, submit changes with respect to this prerequisite.
Jan 8 2021
Jan 6 2021
- The instructions are updated for prefix specifiers.
- Except for pvalidate, all the SNP instructions are valid only in the 64-bit environment. Corrected the test accordingly.
- The modes (In64BitMode, In64BitMode) are updated in the instruction description.
Jan 5 2021
Updating the patch so that the simplified patch adds only a few missing znver3 tests. Subsequent patches will comprehensively enable other znver3 features.
Dec 8 2020
Yep, thank you! I will post smaller incremental patches.
Dec 7 2020
Apr 22 2020
@andrew.w.kaylor I went through the mailing list thread regarding this change and saw "Eventually, we’ll want to go back and teach specific optimizations to understand the intrinsics so that where possible optimizations can be performed in a manner consistent with dynamic rounding modes and strict exception handling.".
Do you have any references/plans on how to teach specific optimizations about this?
Jan 30 2020
Good with me! I have moved some stones to get the wikichip URL updated.
Yes, looks good to me.
Jan 27 2020
The changes made are obvious. Are you using libpfm with znver2 to verify these?
Jan 9 2020
Jan 4 2020
RKSimon, could you please commit D66088 on my behalf? I think my GitHub account has not been added.
Dec 31 2019
We are checking the libpfm enablement. I can commit D66088 if we are okay without libpfm.
Nov 18 2019
I agree on having a patch to enable exegesis. I will post that in a couple of days. As mentioned, this is part of the initial plan as well.
Nov 11 2019
Nov 3 2019
Oct 22 2019
Updated the patch for review comments.
Oct 16 2019
Updated for review comments and latency modifications for the MUL, VZEROUPPER and CLZERO instructions.
The changes for review comments are incorporated.
The latency information for the CLZERO, VZEROUPPER and MUL instructions is updated.
Aug 18 2019
Aug 12 2019
Feb 26 2019
Feb 25 2019
Feb 19 2019
Addressed the comments from Craig Topper.
Feb 18 2019
Jul 23 2018
I am fine!
Jun 19 2018
Jun 3 2018
May 3 2018
Sorry! Missed this thread completely!
May 1 2018
gcc uses feature bits. That would be a differential ISA with respect to the previous generation. I think the idea is to get the ISA list as close as possible to the underlying arch. When an older version of the compiler that doesn't have the arch enabled is used, the compiler can fall back to the closest arch that enables the ISA list.
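A minimal sketch of that fallback idea, under stated assumptions: the arch names (`gen1`, `gen2`), the feature sets and the `closest_arch` helper are all hypothetical and only illustrate picking the known arch whose ISA list is the largest subset of the target's features. This is not how gcc or LLVM actually implement it.

```python
# Hypothetical arch table, for illustration only; not real compiler data.
KNOWN_ARCHES = {
    "gen1": {"sse2", "sse3"},
    "gen2": {"sse2", "sse3", "avx"},
}


def closest_arch(target_features, known_arches):
    """Return the known arch whose ISA list is the largest subset of
    target_features, or None if no known arch is fully supported."""
    best_name, best_size = None, -1
    # Sort by name so ties are resolved deterministically.
    for name, feats in sorted(known_arches.items()):
        # Only consider arches whose every feature exists on the target.
        if feats <= target_features and len(feats) > best_size:
            best_name, best_size = name, len(feats)
    return best_name


# A newer target that an older compiler does not know by name.
new_target = {"sse2", "sse3", "avx", "avx2"}
fallback = closest_arch(new_target, KNOWN_ARCHES)
```

Here `fallback` ends up as the hypothetical `gen2`, since it is the richest known ISA list that the new target fully supports.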
Apr 8 2018
Apr 7 2018
It shouldn't differ.
The xmm version has a 1-cycle latency and the ymm version has a 2-cycle latency, for both AVX and SSE.
Mar 29 2018
Mar 27 2018
Mar 25 2018
Looks good to me!
Mar 21 2018
Aug 30 2017
Updated for review comments from Craig Topper!
Aug 22 2017
Simon! If you are okay with the patch, can you please commit it on my behalf?
Updated as per Javed's comments!
Aug 20 2017
Updated the patch as per Simon's comments.
Added the FP instruction itineraries, which include the SSE4A and SHA instructions.
Aug 18 2017
Yes, Simon! I will include the SSE4A instructions and their itineraries in the next patch. I will include tests verifying them as well.
If this patch is okay, can you please commit it on my behalf?
Simon, Craig Topper! My next increment is ready. If this patch can be accepted and committed, I will rebase and submit the next patch.
Or should I submit the next patch as an incremental patch with the changes put forth in this patch? Please help!
Aug 14 2017
Updated for the itineraries of memory variants of the instructions.
Aug 11 2017
Jul 19 2017
Jul 18 2017
Simon! If you are fine with it, can you please commit the patch on my behalf? I am yet to get commit access. I will probably try to get it after this patch.
Patch update: For newer testcases.
Jul 17 2017
Updated as per Javed's review comments!
Jul 16 2017
Updated as per the review comments.
Jul 12 2017
Feb 8 2017
Thank you @craig.topper.
@craig.topper If you are okay, can you please commit the changes on my behalf?
I think it is okay even if we don't set the mayStore attribute.
I wrote a simple test to check the following:
- Schedules based on the instruction attribute
- Side-effect handling
Feb 7 2017
Updated the test file "x86-32.s" with a clzero-only test!
Updated the builtins test for "__builtin_ia32_clzero".
Updated for review comments.
Updated for the review comments
Feb 1 2017
Jan 9 2017
If okay, can you please commit these on my behalf? I don't have write access.
Yes, true. I mentioned that with regard to the grouping or the order of the enabled features. These initFeatureMap entries are based on the intrinsics and the CodeGen part.
Adding znver1 to the following tests.
b. Slow SHLD
c. slow unaligned memory
Falling back to CK_BTVER1 is OK, but falling back to CK_BTVER2 is not. The latter is not possible because of the partial YMM writes, which behave differently on znver1 for AVX instructions and their legacy SIMD counterparts. So, as of now, I am leaving them in alphabetical order.
Jan 8 2017
The clzero intrinsic handling and feature addition will be handled as a separate patch.
Added movbe and sse4a to the ISA list of znver1.
The clzero builtins and feature addition will be handled separately in another patch.
SSE4a and movbe are added to the ISA list.
Dec 21 2016
I am preparing a patch which doesn't include the clzero feature.
I will submit a separate patch for the clzero feature.