User Details
- User Since
- Nov 17 2017, 12:00 PM (178 w, 5 d)
Wed, Apr 7
Thu, Apr 1
Mar 12 2021
Mar 3 2021
Thank you for fixing this!
Mar 2 2021
Feb 26 2021
Feb 23 2021
Feb 22 2021
Feb 18 2021
Thanks for adding the asserts!
Feb 17 2021
Thanks for your work! It looks good to me and I only have a few minor comments.
Feb 9 2021
Feb 3 2021
Feb 2 2021
It is a very nice PR, thanks for the great work! I finish one round of reading the code and haven't read the MLIR tests yet. Would like to send out my comments I have so far.
Jan 28 2021
LGTM. Thanks for fixing this!
Apr 3 2019
Remove iostream.
Apr 2 2019
Apr 1 2019
Mar 29 2019
Rewrite a few magic constants in a different form, hoping this can make it easier to understand what is going on.
Modify test to also check a negative value beyond -max(float).
Fix commit message.
Modify test cases to check for the use of some constants.
Mar 28 2019
Fix the test to pass in a scalar value instead of a pointer.
Remove unnecessary device memory allocation and copy.
Add a call to cudaDeviceSynchronize.
Rename round.cu to test_round.cu.
Fix the intrinsic for double.
Change the test to use assert and a variable to prevent constant folding.
Mar 22 2019
Mar 19 2019
Modify test case to check the return value.
The problem is only reported by ubsan (undefined behavior sanitizer). My test case produces a value of 1.27e+80, which is a value between max(float) and max(double). Without ubsan, llvm opt (compiled with clang for x86) produces the correct result silently. However, since (float)the-double-value has undefined behavior by c++ language, there is no guarantee that the llvm opt produced by any compiler for any platform produces the same result. Do you have suggestion on what is a better test case?
Update commit message.
Add a test case.
Mar 18 2019
Jun 6 2018
@spatel Thanks for your comment! I need to revise my example to better match the real case I am looking at-- in particular, to show that the optimization will increase the number of zext instructions. In this example, there are only two zext instructions. However, if we narrow all the arithmetic operations to i32 we will need four zext instructions. In one of your previous comment, you have suggested to extend TruncInstCombine in aggressive-instcombine. I see the following issues: (1) TruncInstCombine doesn't increase the number of zext/sext instructions. (2) TruncInstCombine currently only handles operations with this properties truncate(op(i64), i32) == op(truncate(i64, i32)), div/rem/right-shift which are needed here aren't part of such operations. We can properly add value range analysis to resolve this though. (3) TruncInstCombine is driven by the truncate instruction in the IR, and there is no such truncate instruction in the case here. To handle the case here (such as the pattern "lshr followed by shl" you mentioned) , is it acceptable to add a new optimization to aggressive-instcombine that can increase the number of zext/sext instructions?
Jun 5 2018
What Sanjoy suggested doesn't (completely) solve the problem we have , where UDIV and UREM are used to calculate the dimensional indices from a linear index, and the dimensionally indices are then used to access tensors of different shapes (for example, two tensors being accessed both require broadcasting but along different dimensions).
With the current state of D47113, would it be acceptable if I revise my change here to only narrow i64 udiv/urem to i32 using the added patterns?
May 21 2018
InstCombiner::visitShl performs similar narrowing without checking user count and can increase the total number to ZExt instructions.
InstCombiner::visitLShr can perform the same transformation for the cases where correlated-value propagation is not needed to discover the range of the values.
However, unlike the transformation here, InstCombiner::visitLShr carefully makes sure that the transformation won't increase the total number of ZExt/ZExt instructions (Op1.hasOneUse check). Why the transformation here doesn't need similar check? Is it safe to remove such a check in InstCombiner::visitLShr?
May 18 2018
Ping.
May 14 2018
Instcombine has udiv/urem narrowing without increasing the total number of ZExt instructions. Correlated-Propagation has udiv/urem narrowing that can increasing the total number of ZExt instruction (kind of a superset of what is in instcombine). These are existing duplicated functionalities. I have considered two possible changes to this: (1) allow instcombine to increase the total number of ZExt instructions in general (by simply removing the m_oneuse related checks). (2) as shown in this change, allow instcombine to handle a special case and may increase the number of ZExt instructions by one.
I looked at other similar narrowing transformations in instcombine, and found most of them have the check to avoid increasing the total number of ZExt/SExt instructions (lshr narrowing for example), but some of them don't have such a check (shl narrowing for example). I am not sure whether these are by design or due to something else such as ignorance. Do you know about this?
May 13 2018
Are you suggesting to teach correlated-propagation to narrow shl/and? InstCombine can perform such narrowing, I don't see why it is better to duplicate this in correlated-propagation. Can you explain? The "m_OneUse" check in the code I am changing here is a simple logic to make sure that the transformation do not increase the number of ZExt instructions. I basically add a little more logic there to say "if there are two uses, and we can guarantee that the transformation do not increase the number of ZExt instructions, we also want to perform the transformation here. " What are your concerns for such a change?
May 12 2018
Yes, you are right in that correlated value propagation can handle the case. However, if the divisor is a power of 2 and instcombine is invoked before correlated value propagation (as in the opt passes), instcombine transforms the i64 udiv/urem into i64 shift/and. My original motivated test case is like below, the change here is required in order for opt transform all the i64 arithmetic operations into i32 operations.
Use early return.
May 11 2018
Add another test case.
Apr 26 2018
- Removed nuw nsw from the tests.
Apr 25 2018
- Removed nuw nsw from the tests.
- Removed nuw nsw from the tests.
- Removed nuw nsw from the tests.
- Removed nuw nsw from the tests.
- Removed \brief from the comment.
- Remove nuw nsw from the tests.
Apr 24 2018
- Applied git clang-format.
- Use prefix ++ instead of postfix ++.
- Use prefix ++ instead of postfix ++.
- Use instnamer to fix names.
- Use instnamer to fix names.
- Use instnamer to fix the name of the tests.