Feb 27 2023
In D142594#4138899, @dmgreen wrote: Thanks. From what I can see this LGTM. The peephole optimizations have a habit of causing subtle problems. Please make sure you run a bootstrap at least to see if any problems arise.
Feb 26 2023
Merged with latest codebase & updated the tests
Updated tests based on the latest LLVM open-source code
Feb 20 2023
Removed redundant lines from MIR test file
Addressed the reviewer comments.
Removed a redundant modification in a file
Addressed most reviewer comments: removed the kill flag on the source register, dropped the code that removed dangling COPY instructions, removed unnecessary checks, & updated test scripts.
Feb 19 2023
Added a test file in MIR to show what happens in these cases post-ISel.
Feb 10 2023
Moved the optimization from the ISel phase to a post-ISel peephole optimization
Feb 9 2023
Added a test case for inserting the 'uaddlv' result at a non-zero index in the destination vector
Feb 8 2023
Thank you everyone for the feedback. I am currently working on the post-ISel peephole optimization, since that can capture the more generic pattern of inserting the 'uaddlv' result into the destination vector.
Jan 31 2023
The code generated for inserting the scalar result of a 'uaddlv' intrinsic into a vector exhibits a redundant move: the value is first moved to an integer register & then moved back to the destination vector Neon register, although it could be moved directly. In my understanding, this happens because, at instruction selection time, the instruction selected for the vector insertion expects an i32/i64 integer register to hold the value, & the instruction selected for 'uaddlv' produces an i32/i64 result, whereas in practice the result of 'uaddlv' lands in a Neon register. Therefore I added patterns that match the combined use of these two instructions (uaddlv & vector_insert) & generate a direct move from the source to the destination Neon register.
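For illustration, a minimal C sketch of the pattern in question, written with ACLE NEON intrinsics (the function name is hypothetical; the patch itself operates at the ISel/MIR level, not on source code):

```c
#include <arm_neon.h>

// Hypothetical source-level version of the pattern: the widening
// horizontal sum produced by uaddlv is inserted into lane 0 of a
// destination vector.
uint16x8_t insert_uaddlv(uint8x8_t in, uint16x8_t dst) {
  uint16_t sum = vaddlv_u8(in);        // selects 'uaddlv h0, v0.8b'
  return vsetq_lane_u16(sum, dst, 0);  // 'vector_insert' of the scalar
}
```

Previously this went through a GPR, roughly 'fmov w8, s0' followed by 'mov v1.h[0], w8'; with the combined pattern the value can stay in the SIMD register file, e.g. 'mov v1.h[0], v0.h[0]'. Register numbers here are illustrative, as the exact output depends on register allocation.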
Dec 15 2022
Addressed the final comments in a separate commit.
Dec 9 2022
Dec 8 2022
Added an assert for an extra check
Trying to fix patching error
Rebased on newly added tests
Added an extra unit test for 'zext <8xi8> to <8xi33>'. Added GISel path testing.
Dec 2 2022
Removed tbl-conversion cases with destination vector element width above 64, due to observed performance regressions. Will move this to a later patch, once we find a fix.
Ran clang-format
Blocked tbl-conversion for destination element sizes above 64, since in these cases each tbl instruction can fill at most 2 destination vector elements, making the conversion less beneficial
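For context, a minimal sketch of the TBL trick this lowering relies on, written with NEON intrinsics purely for illustration (the actual transform rewrites IR; the function name and lane widths are chosen for the example):

```c
#include <arm_neon.h>

// Hedged illustration of the underlying trick: TBL indices >= 16 are out
// of range, so TBL writes 0 to those result bytes, which supplies the
// high (zero-extended) bytes of each little-endian u32 lane.
uint32x4_t zext_u8_to_u32_via_tbl(uint8x16_t src) {
  const uint8x16_t idx = {0, 0xFF, 0xFF, 0xFF,
                          1, 0xFF, 0xFF, 0xFF,
                          2, 0xFF, 0xFF, 0xFF,
                          3, 0xFF, 0xFF, 0xFF};
  return vreinterpretq_u32_u8(vqtbl1q_u8(src, idx));
}
```

Each tbl produces at most 16 result bytes, so wider destination elements mean fewer lanes filled per instruction; that is the trade-off behind the cutoff above.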
Added two new test cases for destination vectors of arbitrary lengths
Dec 1 2022
Removed duplicate code by passing function pointers as parameters, as advised in the reviews. Added more performance tests using ZExt/Trunc operations in combination with an addition operation.
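A minimal sketch of that deduplication idea; the names and signatures are illustrative, not the actual test-suite code:

```c
#include <stdint.h>

// The shared driver takes the loop under test as a function pointer
// instead of duplicating the benchmark harness per variant.
typedef void (*LoopFn)(const uint8_t *restrict, uint32_t *restrict, int);

static void zextInLoop(const uint8_t *restrict in, uint32_t *restrict out,
                       int n) {
  for (int i = 0; i < n; ++i)
    out[i] = in[i];                  // zext i8 -> i32
}

static void zextAddInLoop(const uint8_t *restrict in, uint32_t *restrict out,
                          int n) {
  for (int i = 0; i < n; ++i)
    out[i] = (uint32_t)in[i] + 1u;   // zext combined with an addition
}

// Setup & result checking live here once, shared by every variant.
static void runVariant(LoopFn fn, const uint8_t *restrict in,
                       uint32_t *restrict out, int n) {
  fn(in, out, n);
}
```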
Nov 29 2022
Fixed rebasing error of duplicated tests
Trying to fix patching error again
Trying to fix patching error because of rebasing
Rebased on latest updated zext unit tests
Added tests for multiple back-to-back zext instructions for vectors & rebased on recent commit
Updating labels after rebasing
Trying to patch due to review dependency
Ran clang-format
Variable name changes
Addressed reviewer comments
Nov 28 2022
Rebasing & merging on a recent commit
Trying to fix patching error due to local parent commits
Removed cases where TBL lowering will not be beneficial
Nov 25 2022
Trying to fix rebasing error
Rebasing on parent patch for tests
Nov 23 2022
Removed the (8|16)xi16 to (8|16)xi8 conversion because it wasn't showing benefits in instruction count & was additionally adding more instructions to the loop header. Updated comments.
Updated a comment
Removed the case for 'trunc <(8|16)xi16> %x to <(8|16)xi8>' since it was adding more instructions to the loop header while not improving the loop instruction count
Rebasing on commit of test cases prior to application of this patch
Nov 22 2022
Updated comments as mentioned in the reviews. Rebased on tests for this change prior to applying this patch.
Nov 21 2022
Added comments
Minor update to comment
Nov 19 2022
In D138059#3935023, @fhahn wrote: Yeah, it is probably fine now, but testing with a single value also seems to make the test less interesting. You could keep the random initialization and add a version of truncOrZextVecInLoopWithVW8 that disables vectorization to generate comparison data for testing.
Reverted to using random inputs & changed the correctness test to compare against the same operations with no vectorization
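A minimal sketch of that comparison scheme, assuming clang loop pragmas; the function names are illustrative stand-ins for truncOrZextVecInLoopWithVW8 & friends:

```c
#include <stdint.h>

// Version under test: the loop vectorizer is free to transform this.
void truncVecInLoop(const uint64_t *restrict in, uint8_t *restrict out,
                    int n) {
  for (int i = 0; i < n; ++i)
    out[i] = (uint8_t)in[i];
}

// Reference version: identical computation with vectorization disabled,
// used to generate comparison data for the randomly initialized inputs.
void truncVecInLoopNoVec(const uint64_t *restrict in, uint8_t *restrict out,
                         int n) {
#pragma clang loop vectorize(disable)
  for (int i = 0; i < n; ++i)
    out[i] = (uint8_t)in[i];
}
```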
Nov 15 2022
In D138059#3929083, @fhahn wrote:
> In D138059#3928627, @paquette wrote: Why randomized?
> In D138059#3929074, @nilanjana_basu wrote: Removed randomization in input & combined correctness tests with performance ones. Explicitly added vectorization width for 16 elements since the related patches target this width.
I think the main reason for initializing with random data is to make the benchmarks more robust so the optimizer won't be able to (partly) optimize out our benchmark code?
In D138059#3928627, @paquette wrote: Why randomized?
Removed randomization in input & combined correctness tests with performance ones. Explicitly added vectorization width for 16 elements since the related patches target this width.
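For reference, a hedged sketch of how a fixed vectorization width of 16 can be requested with a clang loop pragma (the function name is hypothetical):

```c
#include <stdint.h>

// Illustrative only: pin the loop vectorizer to 16 lanes per iteration.
void zextVecInLoopVW16(const uint8_t *restrict in, uint32_t *restrict out,
                       int n) {
#pragma clang loop vectorize_width(16)
  for (int i = 0; i < n; ++i)
    out[i] = in[i];  // zext i8 -> i32
}
```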
Nov 8 2022
Nov 3 2022
Ran clang-format
Addressed comments by t.p.northover - refactored code to remove redundancy
Patch already exists. This was posted by mistake.
Nov 2 2022
Minor fix to comments
Nov 1 2022
Removed the addition operation to keep only the truncate or zero-extend operation for a more focused performance comparison
Oct 31 2022
All the comments have been addressed in the latest patch.
Removed two test cases whose related patches are not yet available.