- User Since
- Feb 7 2020, 5:34 AM (33 w, 9 h)
Jul 29 2020
Jun 29 2020
Jun 27 2020
- revert to using static instead of thread_local
- removed more inconsistent spaces
Jun 26 2020
Addressed Alex's comments:
- make default configuration thread_local
- fixed clang tidy issues
- removed unnecessary spaces
Jun 25 2020
- Added a comment that reflects the discussions about clone (can be reverted) vs inline (performance).
- The current implementation is revertible at the cost of cloning the loop body.
- Use std::tie to improve the code readability.
Jun 24 2020
Jun 23 2020
- copy the options instead of keeping a reference
Jun 22 2020
Jun 20 2020
- addressed the review comments
- use ceil division by the step to compute the number of loop iterations (to support the cases where the number of loop iterations is known but not a multiple of the step)
- adapted one of the test cases to test this scenario
- removed some unnecessary headers
- cleanup includes
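The ceil-division trip count mentioned in the update above can be sketched as follows. This is only a minimal illustration, not the code from the patch: the helper names `ceilDiv` and `tripCount` are hypothetical, and the sketch assumes a loop of the form `for (i = lb; i < ub; i += step)` with a positive step.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: ceil(a / b) for a >= 0 and b > 0.
inline int64_t ceilDiv(int64_t a, int64_t b) {
  assert(a >= 0 && b > 0);
  return (a + b - 1) / b;
}

// Hypothetical helper: number of iterations of
//   for (i = lb; i < ub; i += step)
// Rounding up counts the last, partial step when (ub - lb) is not a
// multiple of step.
inline int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0);
  return ub > lb ? ceilDiv(ub - lb, step) : 0;
}
```

For example, a loop from 0 to 10 with step 4 visits 0, 4 and 8, so `tripCount(0, 10, 4)` is 3, whereas plain integer division would give 2.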
Jun 19 2020
- added additional test
- separated the loop tiling patch from the canonicalization
- addressed Stephan's comments
OK, I will try to split things up.
updated to the latest master
I may misunderstand this, but in mlir/test/Transforms/parallel-loop-collapsing.mlir the computation of I3 directly uses NEW_I0, multiplies it by 10, and adds 9. Since NEW_I0 takes the values 0 and 1, we get the value 19 for I3, although the original loop has the range 9 to 11. I guess NEW_I0 should be divided by two before computing the index I3?
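The arithmetic behind the question above can be checked with a small sketch. It only reproduces the numbers quoted in the comment and is not derived from the test file itself. The function names are hypothetical, and the upper bound 11 is treated as inclusive, which is the more generous reading; 19 falls outside the range either way.

```cpp
#include <cassert>
#include <cstdint>

// Index expression as described in the comment: I3 = NEW_I0 * 10 + 9.
inline int64_t i3AsDescribed(int64_t newI0) { return newI0 * 10 + 9; }

// Hypothetical check: does `value` lie in the original loop range [lb, ub]?
inline bool inOriginalRange(int64_t value, int64_t lb, int64_t ub) {
  assert(lb <= ub);
  return value >= lb && value <= ub;
}
```

With NEW_I0 = 0 the expression yields 9, which is in range; with NEW_I0 = 1 it yields 19, which is not.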
- I started from the fixed tiling pass and tried to apply minimal changes (drop the minimum operation if the loop size is a multiple of the tile size)
- I added a canonicalization to drop single iteration loops
- I use CX / VX instead of VAL_X
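The idea of dropping the minimum operation can be illustrated with a short sketch. It is an assumption-laden model and not the pass itself: `tileExtent` is a hypothetical helper, the loop is assumed to run over `[0, size)`, and `iv` is assumed to be the start of a tile (a multiple of `tileSize`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: extent of the tile that starts at `iv` in a loop
// over [0, size), where `iv` is assumed to be a multiple of `tileSize`.
inline int64_t tileExtent(int64_t iv, int64_t size, int64_t tileSize) {
  assert(tileSize > 0 && iv >= 0 && iv < size);
  // If the loop size is a multiple of the tile size, every tile is full,
  // so the min against the remaining iterations is not needed.
  if (size % tileSize == 0)
    return tileSize;
  // Otherwise the last tile may be partial and must be clamped.
  return std::min(tileSize, size - iv);
}
```

For a loop of size 8 and tile size 4 both tiles have extent 4, while for size 10 the tile starting at 8 is clamped to 2.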
Jun 17 2020
Jun 9 2020
The update passes the options structure to the type converter and to the conversion pattern base class (replaces the llvm type converter customizations). I also extended the patch to the rocdl backend.
May 26 2020
If we want to aim for a somewhat bigger refactoring, we can postpone landing
I fixed the typo and extended the existing tests a little to test 32-bit index computations
May 20 2020
My assumption is that the LLVMTypeConverterCustomization does not interfere with the address space conversion. Should the address space conversion be an integral part of the LLVMTypeConverterCustomization class?
May 18 2020
Apr 16 2020
Apr 3 2020
Thanks for fixing this. Should have had that idea myself!
Mar 18 2020
Mar 2 2020
Feb 12 2020
I added a test for a memref with a non-zero offset and non-standard strides. Let me know if you had another test in mind...