- User Since
- Jun 20 2019, 5:36 PM (189 w, 2 d)
Tue, Jan 31
The code generated for inserting the scalar result of a 'uaddlv' intrinsic into a vector contains a redundant round trip: the result is first moved to an integer register and then moved back to the destination Neon vector register. The move can be done directly. As I understand it, this happens because the instruction selected for the vector insertion expects its value in an i32/i64 integer register, and at instruction selection time the 'uaddlv' intrinsic is typed as producing an i32/i64 result, whereas in practice the result of 'uaddlv' lands in a Neon register. I therefore added patterns that match the combined use of these two operations (uaddlv and vector_insert) and generate a direct move from the source Neon register to the destination Neon register.
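A source-level sketch of the kind of code that can produce this combination may make the description easier to follow. The function name `sumBytesIntoLane0` is hypothetical, and the scalar fallback exists only so the snippet also builds on non-AArch64 hosts. Whether a given compiler version emits the integer-register round trip or the direct move is not something this snippet guarantees.

```cpp
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: sums eight bytes with a widening add-across-lanes
// and inserts the sum into lane 0 of an otherwise zero 4 x u32 vector.
// The four resulting lanes are written to `out`.
void sumBytesIntoLane0(const std::uint8_t *bytes, std::uint32_t out[4]) {
#if defined(__aarch64__)
  uint8x8_t v = vld1_u8(bytes);
  // Typically selected as: uaddlv h0, v0.8b (result lives in a Neon register).
  uint16_t sum = vaddlv_u8(v);
  // Lane insert: this is where a move through an integer register can appear.
  uint32x4_t r = vsetq_lane_u32(sum, vdupq_n_u32(0), 0);
  vst1q_u32(out, r);
#else
  // Portable model of the same semantics.
  std::uint16_t sum = 0;
  for (int i = 0; i < 8; ++i)
    sum = static_cast<std::uint16_t>(sum + bytes[i]);
  out[0] = sum;
  out[1] = out[2] = out[3] = 0;
#endif
}
```

Compiling this for AArch64 with optimizations and inspecting the assembly is one way to check which sequence a particular toolchain generates.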
Dec 15 2022
Addressed the final comments in a separate commit.
Dec 9 2022
Dec 8 2022
Added an assert for an extra check
Trying to fix patching error
Re-based on newly added tests
Added an extra unit test for 'zext <8xi8> to <8xi33>'. Added GISel path testing.
Dec 2 2022
Removed tbl-conversion cases for destination vector element widths above 64 bits, due to observed performance regressions. Will move these to a later patch once we find a fix.
Blocked tbl-conversion for destination element sizes above 64 bits, since in these cases each tbl instruction can select at most two destination vector elements, making the conversion less beneficial.
Added two new test cases for destination vectors of arbitrary lengths
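The tbl limitation mentioned in the entries above can be illustrated with a small portable model. This is a sketch, not the code from the patch: `tbl`, `zextMask`, and `lanesPerTbl` are hypothetical names, and the model assumes a single 128-bit table register and little-endian lane layout. It relies on the fact that the AArch64 TBL instruction writes zero for any out-of-range index, which is what makes it usable for zero-extension.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Model of a one-register TBL: each output byte is table[idx[i]],
// or 0 when idx[i] is out of range (>= 16).
Bytes16 tbl(const Bytes16 &table, const Bytes16 &idx) {
  Bytes16 out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
  return out;
}

// Index mask that zero-extends consecutive source bytes, starting at
// firstSrc, into destination lanes of dstBytes bytes each. The low byte
// of every lane picks a source byte; all other bytes use 0xFF, which is
// out of range and therefore yields zero.
Bytes16 zextMask(unsigned firstSrc, unsigned dstBytes) {
  Bytes16 idx;
  idx.fill(0xFF);
  for (unsigned lane = 0; lane < 16 / dstBytes; ++lane)
    idx[lane * dstBytes] = static_cast<std::uint8_t>(firstSrc + lane);
  return idx;
}

// Number of destination elements one 128-bit TBL result can hold.
constexpr unsigned lanesPerTbl(unsigned dstBits) { return 128 / dstBits; }
```

Under this model a single tbl produces four i32 lanes but only two i64 lanes and one 128-bit lane, so wider destination elements need more tbl instructions per source vector.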
Dec 1 2022
Removed duplicate code by passing function pointers as parameters, as advised in the reviews. Added more performance tests using ZExt/Trunc operations in combination with an addition operation.
Nov 29 2022
Fixed rebasing error of duplicated tests
Trying to fix patching error again
Trying to fix patching error because of rebasing
Rebased on latest updated zext unit tests
Added tests for multiple back-to-back zext instructions for vectors & rebased on recent commit
Updating labels after rebasing
Trying to patch due to review dependency
Variable name changes
Addressed reviewer comments
Nov 28 2022
Rebasing & merging on a recent commit
Trying to fix patching error due to local parent commits
Removed cases where TBL lowering will not be beneficial
Nov 25 2022
Trying to fix rebasing error
Rebasing on parent patch for tests
Nov 23 2022
Removed the (8|16)xi16 to (8|16)xi8 conversion because it showed no benefit in instruction count and additionally added more instructions to the loop header. Updated comments.
Updated a comment
Removed case for 'trunc <(8|16)xi16> %x to <(8|16)xi8>' since it was adding more instructions to loop header, while not improving loop instruction count
Rebasing on commit of test cases prior to application of this patch
Nov 22 2022
Updated comments as mentioned in the reviews. Rebased on tests for this change prior to applying this patch.
Nov 21 2022
Minor update to comment
Nov 19 2022
Reverted to using random inputs & changed correctness test to compare against same operations with no vectorization
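The testing approach described here, comparing a vectorized loop against the same operation with vectorization disabled on random inputs, can be sketched as follows. This is an illustration under assumptions, not the test from the patch: the function names are hypothetical, the truncation from 32-bit to 8-bit elements is just one example type pair, and the `#pragma clang loop` hints are honored only by Clang (other compilers may ignore them, in which case both loops compile the same way).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Truncating loop with a request to vectorize at width 16.
void truncVectorized(const std::uint32_t *in, std::uint8_t *out, std::size_t n) {
#pragma clang loop vectorize_width(16)
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<std::uint8_t>(in[i]);
}

// Same loop with vectorization disabled; serves as the reference.
void truncScalar(const std::uint32_t *in, std::uint8_t *out, std::size_t n) {
#pragma clang loop vectorize(disable)
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<std::uint8_t>(in[i]);
}

// Fills an input buffer with seeded random values and checks that both
// versions produce identical results.
bool resultsMatch(std::size_t n, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::uint32_t> dist;
  std::vector<std::uint32_t> in(n);
  for (auto &x : in)
    x = dist(rng);
  std::vector<std::uint8_t> a(n), b(n);
  truncVectorized(in.data(), a.data(), n);
  truncScalar(in.data(), b.data(), n);
  return a == b;
}
```

Using a fixed seed keeps the random inputs reproducible, so a mismatch can be replayed.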
Nov 15 2022
Removed randomization in input & combined correctness tests with performance ones. Explicitly added vectorization width for 16 elements since the related patches target this width.
Nov 8 2022
Nov 3 2022
Addressed comments by t.p.northover - refactored code to remove redundancy
Patch already exists. This was posted by mistake.
Nov 2 2022
Minor fix to comments
Nov 1 2022
Removed the addition operation to keep only the truncate or zero-extend operation for a more focused performance comparison
Oct 31 2022
All the comments have been addressed in the latest patch.
Removed two test cases whose related patches are not yet available.
Oct 28 2022
Oct 25 2022
Ran clang-format since it was failing in the build report at https://buildkite.com/llvm-project/diff-checks/builds/133184
Oct 24 2022
The automated build tests failed for the previous patch because it was based on a previous commit for a unit test that isn't submitted yet. This patch fixes it by squashing the previous commit, removing the dependency & showing the final update.
Oct 22 2022
Extended the trunc lowering for other types like 16xi64, 16xi16, 8xi16
Oct 21 2022
Extended it to be generic enough for both truncate & zero-extend vector operations
Fixed a mistake where the same test was being run twice
Oct 20 2022
Made a separate file for testing truncate and zero-extend vector operations. Added truncate tests for different data types, with different vectorization width settings.
Oct 19 2022
Removed redundant code
Oct 4 2022
Removed an unused variable warning
Ran git-clang-format & made a minor change to reduce LoC.