
easyaspi314 (easyaspi314 (Devin))
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 12 2018, 8:01 PM (103 w, 2 d)

Recent Activity

Nov 7 2019

easyaspi314 added a comment to D69762: [Diagnostics] Try to improve warning message for -Wreturn-type.

How about "this non-void {function|block} {may|does} not return a value"

Nov 7 2019, 1:09 PM · Restricted Project, Restricted Project

Jan 22 2019

easyaspi314 added a comment to D56474: [ARM] [NEON] Add ROTR/ROTL lowering.

Huh. Sorry about the idle time.

Jan 22 2019, 4:49 PM

Jan 18 2019

easyaspi314 added a comment to D56472: Change test/tools/lto/no-bitcode.s requirement from arm to aarch64.

@RKSimon considering that I am a newbie, I am pretty sure I do not.

Jan 18 2019, 7:48 AM

Jan 9 2019

easyaspi314 added a comment to D56474: [ARM] [NEON] Add ROTR/ROTL lowering.

Definitely either result/writeback cycles.

Jan 9 2019, 7:53 PM
easyaspi314 updated the diff for D56474: [ARM] [NEON] Add ROTR/ROTL lowering.

Fixed my incredibly stupid typo, added FSHL/FSHR support, and used llvm_unreachable instead of the ugly goto.

Jan 9 2019, 7:14 PM
easyaspi314 added a comment to D56474: [ARM] [NEON] Add ROTR/ROTL lowering.

Huh. Apparently, vshr/vsli is actually faster.

Jan 9 2019, 6:48 PM
easyaspi314 added a comment to D56474: [ARM] [NEON] Add ROTR/ROTL lowering.

I do want to mention that VSLI is not beneficial if it is required for the value to be in the same register as before. This pattern will place the value in a different register.

Jan 9 2019, 5:29 PM
easyaspi314 added a comment to D56474: [ARM] [NEON] Add ROTR/ROTL lowering.

Huh. Does clang even emit FSHL/FSHR instructions?

Jan 9 2019, 4:35 PM
easyaspi314 planned changes to D56474: [ARM] [NEON] Add ROTR/ROTL lowering.

I'm not sure this is really the best approach... essentially, there are two relevant transforms here:

  1. A rotate by a multiple of 8 can be transformed into a shuffle. I guess the only case that's really relevant on ARM is vrev, since there aren't any other single-instruction shifts that correspond to a rotate, so maybe it's okay to just special-case here.

Yeah, also, you need to load the pattern.

  2. (OR X, (SRL Y, N)) can be transformed to VSRI if X has enough known trailing zeros. You can special-case rotates (or slightly more generally, FSHL/FSHR), but it's not much harder to handle the general case.

True. I should probably implement that.

I'm also a little concerned that the VSRI could actually be slower in certain cases... if you look at timings for a Cortex-A57, 128-bit VSRI takes two cycles throughput to execute, as opposed to one for a regular shift.

Well VSRI takes care of both VSHR and VORR. It doesn't save any time, but it saves space.
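
To make the rotate-as-shift-insert idea concrete, here is a minimal scalar sketch in C that models one 64-bit lane. The helper names (lane_shl, lane_sri, rotl_or, rotl_sri) are hypothetical and are not taken from the patch; lane_sri is an assumed model of VSRI semantics (keep the top n bits of the destination, fill the rest with the source shifted right by n). It only illustrates why a left shift followed by a shift-right-insert yields the same result as shift, shift, OR.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of a plain left shift (VSHL #n). */
static uint64_t lane_shl(uint64_t x, unsigned n)
{
    assert(n < 64);
    return x << n;
}

/* Assumed scalar model of one 64-bit lane of VSRI #n: the top n bits of
 * dest are kept, the remaining bits are replaced by src >> n. */
static uint64_t lane_sri(uint64_t dest, uint64_t src, unsigned n)
{
    assert(n >= 1 && n <= 63);
    uint64_t inserted = ~UINT64_C(0) >> n; /* bit positions written by src >> n */
    return (dest & ~inserted) | (src >> n);
}

/* Reference rotate-left: two shifts and an OR. Valid for 1 <= n <= 63. */
static uint64_t rotl_or(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 63);
    return (x << n) | (x >> (64 - n));
}

/* Rotate-left as shift + shift-insert: the insert replaces the shift/OR pair. */
static uint64_t rotl_sri(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 63);
    return lane_sri(lane_shl(x, n), x, 64 - n);
}
```

Because x << n already has n trailing zeros, the insert loses nothing, which is the "enough known trailing zeros" condition in the quoted review comment. Calling rotl_sri(x, n) and rotl_or(x, n) with the same arguments should give identical results for any n from 1 to 63.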

Jan 9 2019, 2:31 PM

Jan 8 2019

easyaspi314 created D56474: [ARM] [NEON] Add ROTR/ROTL lowering.
Jan 8 2019, 9:16 PM
easyaspi314 created D56472: Change test/tools/lto/no-bitcode.s requirement from arm to aarch64.
Jan 8 2019, 7:18 PM
easyaspi314 updated the diff for D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

Updated tests with the generation tools, changed multiply cost to 8.

Jan 8 2019, 6:58 PM · Restricted Project

Jan 6 2019

easyaspi314 added a comment to D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..
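typedef unsigned long long v2i64 __attribute__((vector_size(16)));

/* Multiply each 64-bit lane by the constant 5. */
v2i64 mul5(v2i64 val)
{
    return val * 5;
}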
Jan 6 2019, 7:03 PM · Restricted Project

Jan 3 2019

easyaspi314 updated the summary of D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..
Jan 3 2019, 8:27 PM · Restricted Project
easyaspi314 updated the diff for D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

Ok, done. I fixed twomul and constant interleaving, and added a couple more tests.

Jan 3 2019, 7:37 PM · Restricted Project
easyaspi314 added a comment to D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

vmul.i32 Qd,Qn,Qm actually takes 4 cycles, which means twomul has the same timing as ssemul, 11 cycles.
@efriedma that explains why twomul wasn't visibly faster in my tests.
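
For context, a 64-bit lane multiply without a native 64x64 instruction can be built from 32-bit pieces. The scalar C sketch below shows the arithmetic identity only. The function names are hypothetical, and mapping the two variants onto the patch's ssemul and twomul sequences is an assumption, not something the comments confirm. The second variant relies on the fact that only the low 32 bits of the cross terms survive the shift by 32, so a non-widening 32-bit multiply is sufficient for them.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a * b using three widening 32x32->64 multiplies. */
static uint64_t mul64_three_widening(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low   = (uint64_t)a_lo * b_lo;
    uint64_t cross = (uint64_t)a_lo * b_hi + (uint64_t)a_hi * b_lo;

    /* The a_hi * b_hi term would be shifted by 64 bits, so it is dropped. */
    return low + (cross << 32);
}

/* Same result with one widening multiply plus 32-bit (wrapping) multiplies
 * for the cross terms, since only their low 32 bits reach the result. */
static uint64_t mul64_narrow_cross(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low   = (uint64_t)a_lo * b_lo;
    uint32_t cross = a_lo * b_hi + a_hi * b_lo; /* wraps modulo 2^32 */

    return low + ((uint64_t)cross << 32);
}
```

Both functions should agree with the plain a * b on uint64_t operands. Which form is faster on a given core depends on the per-instruction timings discussed in the comment above.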

Jan 3 2019, 9:29 AM · Restricted Project

Jan 2 2019

easyaspi314 added a comment to D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

Oops, that's not right. mult_constant isn't matching.

Jan 2 2019, 7:28 PM · Restricted Project
easyaspi314 added a comment to D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

This is what I have now. Constant swapping and twomul are now implemented, and I will update the patch once I finish the documentation and tests, as well as double-check the return values on my phone.

typedef unsigned long long v2i64 __attribute__((vector_size(16)));
Jan 2 2019, 6:01 PM · Restricted Project
easyaspi314 planned changes to D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

I figured things out and I am on track to adding twomul and constant interleaving.

Jan 2 2019, 4:57 PM · Restricted Project

Dec 31 2018

easyaspi314 updated the diff for D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

Fixed cost model and added cost model tests. I also made a larger diff.

Dec 31 2018, 4:59 PM · Restricted Project

Dec 29 2018

easyaspi314 planned changes to D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

I am going to add constant interleaving, more tests, and call it done. I think that twomul and interleaved loads are much smaller and lower priority optimizations, and something I am not comfortable enough to do myself.

Dec 29 2018, 5:55 PM · Restricted Project

Dec 28 2018

easyaspi314 added a reviewer for D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine.: eli.friedman.
Dec 28 2018, 7:30 PM · Restricted Project
easyaspi314 updated the diff for D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

*facepalms* Wow, I am an idiot.

Dec 28 2018, 7:28 PM · Restricted Project
easyaspi314 updated the diff for D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..

Actually, further optimizations can be done later. At the very least, this is usable. I still left the notes for someone who wants to implement them later, but I think that at least, for now, it is far better than it was.

Dec 28 2018, 7:15 PM · Restricted Project

Dec 27 2018

easyaspi314 updated the summary of D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..
Dec 27 2018, 9:10 PM · Restricted Project
easyaspi314 retitled D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine. from [ARM]: WIP: Add optimized uint64x2_t multiply routine. to [ARM]: WIP: Add optimized NEON uint64x2_t multiply routine..
Dec 27 2018, 9:10 PM · Restricted Project
easyaspi314 added inline comments to D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..
Dec 27 2018, 9:01 PM · Restricted Project
easyaspi314 created D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine..
Dec 27 2018, 8:48 PM · Restricted Project

Dec 24 2018

easyaspi314 added a comment to rL350059: [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ..

That looks much better. 👍🏻

Dec 24 2018, 1:02 PM

Dec 23 2018

easyaspi314 added a comment to D56057: [X86] Individually simplify both operands of PMULDQ/PMULUDQ using the other entry point of SimplifyDemandedBits that allows the one use check of the root node to be suppressed..

The really concerning thing is that this patch does the exact same thing that's done by simplifyI24 in AMDGPUISelLowering.cpp. I assume Simon, like me, assumed that established infrastructure like that was doing the right thing and didn't question it too much.

Dec 23 2018, 10:18 PM
easyaspi314 added a comment to D56057: [X86] Individually simplify both operands of PMULDQ/PMULUDQ using the other entry point of SimplifyDemandedBits that allows the one use check of the root node to be suppressed..

Looking at the vector-reduce-mul.ll changes, I don't think this patch is valid. If the first node has multiple uses we can't propagate the input demanded bits to the next operation down.

Dec 23 2018, 8:41 PM
easyaspi314 added a comment to D56057: [X86] Individually simplify both operands of PMULDQ/PMULUDQ using the other entry point of SimplifyDemandedBits that allows the one use check of the root node to be suppressed..

I am a little suspicious about these changes to the test results.

Dec 23 2018, 1:07 PM

Apr 16 2018

easyaspi314 added inline comments to D45643: [Failing one test] Reword [-Wreturn-type] messages to "non-void x does not return a value".
Apr 16 2018, 9:55 AM · Restricted Project
easyaspi314 added inline comments to D45643: [Failing one test] Reword [-Wreturn-type] messages to "non-void x does not return a value".
Apr 16 2018, 9:22 AM · Restricted Project

Apr 13 2018

easyaspi314 created D45643: [Failing one test] Reword [-Wreturn-type] messages to "non-void x does not return a value".
Apr 13 2018, 4:10 PM · Restricted Project