
wengxt (Xuetian Weng)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 1 2015, 5:33 PM (215 w, 1 d)

Recent Activity

Oct 23 2017

wengxt closed D34911: Enable parsing C++ names generated by lambda functions..
Oct 23 2017, 8:52 PM

Jul 13 2017

wengxt added a comment to D34911: Enable parsing C++ names generated by lambda functions..

I don't have commit access. Could anyone help me commit this?

Jul 13 2017, 8:20 AM

Jun 30 2017

wengxt created D34911: Enable parsing C++ names generated by lambda functions..
Jun 30 2017, 2:38 PM

Aug 30 2015

wengxt added a comment to D12246: [NVPTX] change threading intrinsics from noduplicate to convergent.
for (int i = 0; i < *bound; ++i) {
  if (i == 0)
    __syncthreads();
}

This input program is valid as long as *bound > 0 has the same value across the block. Here, loop unrolling by a factor of 2 will separate the first iteration of the loop into a duplicate body for the case where *bound is odd. I checked a similar example loop that doesn't call __syncthreads(), and LLVM does unroll it by a factor of 2 in this way. If the parity of *bound is divergent, then only part of the warp would execute the __syncthreads() in the duplicated odd-case loop body. So I think unrolling does have to be careful with divergent trip counts for loops that contain __syncthreads() in cases like this.

Aug 30 2015, 10:12 PM
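A small host-side C++ model can make the unrolling argument above concrete. It is not CUDA device code and not LLVM's unroller. Each element of `bounds` stands for the `*bound` value one thread observes, and the model records which static copy of `__syncthreads()` each thread reaches. The names `Site`, `run_original`, `run_unrolled` and `distinct_sites` are hypothetical. The unrolled shape is an assumption: runtime unrolling by 2, with the odd remainder iteration executed in a separate copy before the unrolled body, as the comment describes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Which static copy of __syncthreads() a simulated thread reached.
enum class Site { Original, Prologue, UnrolledBody };

// Original loop: for (int i = 0; i < bound; ++i) if (i == 0) __syncthreads();
std::vector<Site> run_original(const std::vector<int>& bounds) {
  std::vector<Site> hits;
  for (int bound : bounds)            // one entry per simulated thread
    for (int i = 0; i < bound; ++i)
      if (i == 0) hits.push_back(Site::Original);
  return hits;
}

// Assumed shape after runtime unrolling by 2: if the trip count is odd,
// one iteration runs in a separate prologue copy before the unrolled body.
std::vector<Site> run_unrolled(const std::vector<int>& bounds) {
  std::vector<Site> hits;
  for (int bound : bounds) {
    // One copy of the original loop body, tagged with its static location.
    auto body = [&hits](int i, Site where) {
      if (i == 0) hits.push_back(where);
    };
    if (bound <= 0) continue;         // loop body never entered
    int i = 0;
    if (bound % 2 != 0) {             // odd trip count: prologue iteration
      body(i, Site::Prologue);
      ++i;
    }
    for (; i < bound; i += 2) {       // remaining trip count is even
      body(i, Site::UnrolledBody);
      body(i + 1, Site::UnrolledBody);
    }
  }
  return hits;
}

// Number of distinct __syncthreads() call sites reached by the "warp".
std::size_t distinct_sites(const std::vector<Site>& hits) {
  return std::set<Site>(hits.begin(), hits.end()).size();
}
```

With `bounds = {3, 4, 3, 4}`, every thread reaches the single barrier in the original loop. After unrolling, the odd-bound threads reach the prologue copy and the even-bound threads reach the copy inside the unrolled body, so `distinct_sites` goes from 1 to 2. If every thread sees the same parity, the threads still agree on a single site. The split is the kind of new control dependence that the `convergent` annotation is meant to rule out.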
wengxt updated the diff for D12484: [JumpThreading] make jump threading respect convergent annotation..

address jingyue's comment

Aug 30 2015, 7:44 PM
wengxt added inline comments to D12246: [NVPTX] change threading intrinsics from noduplicate to convergent.
Aug 30 2015, 2:22 PM
wengxt added a comment to D12246: [NVPTX] change threading intrinsics from noduplicate to convergent.
Aug 30 2015, 2:21 PM
wengxt updated the diff for D12246: [NVPTX] change threading intrinsics from noduplicate to convergent.

Separate jump threading change to another review.
Address jingyue's comment.

Aug 30 2015, 2:18 PM
wengxt added a parent revision for D12246: [NVPTX] change threading intrinsics from noduplicate to convergent: D12484: [JumpThreading] make jump threading respect convergent annotation..
Aug 30 2015, 2:14 PM
wengxt added a child revision for D12484: [JumpThreading] make jump threading respect convergent annotation.: D12246: [NVPTX] change threading intrinsics from noduplicate to convergent.
Aug 30 2015, 2:14 PM
wengxt retitled D12484: [JumpThreading] make jump threading respect convergent annotation. from [JumpThreading] make jump threading respect convergent annotation. For example, if (cond) { ... } else { ... } convergent_call(); if (cond) { ... } else { ... } should not be optimized to if (cond) { ... to [JumpThreading] make jump threading respect convergent annotation..
Aug 30 2015, 2:12 PM
wengxt added a reviewer for D12484: [JumpThreading] make jump threading respect convergent annotation.: arsenm.
Aug 30 2015, 2:12 PM
wengxt updated D12484: [JumpThreading] make jump threading respect convergent annotation..
Aug 30 2015, 2:11 PM
wengxt retitled D12484: [JumpThreading] make jump threading respect convergent annotation. from to [JumpThreading] make jump threading respect convergent annotation. For example, if (cond) { ... } else { ... } convergent_call(); if (cond) { ... } else { ... } should not be optimized to if (cond) { ....
Aug 30 2015, 2:09 PM

Aug 21 2015

wengxt updated D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation..
Aug 21 2015, 1:27 PM
wengxt added a comment to D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation..
Aug 21 2015, 1:26 PM
wengxt updated the diff for D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation..

Make __shared__ variables with an initializer a hard error.

Aug 21 2015, 1:26 PM
wengxt updated the diff for D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation..

update based on comments.

Aug 21 2015, 1:09 PM
wengxt added a comment to D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation..

Just say: __shared__ variables cannot have an initialization as part of their declaration. Whether their values are kept across invocations is undefined.

Aug 21 2015, 11:16 AM
wengxt added a comment to D12242: [NVPTX] Allow undef value as global initializer.
Aug 21 2015, 11:12 AM
wengxt updated the diff for D12242: [NVPTX] Allow undef value as global initializer.

update comments.

Aug 21 2015, 11:11 AM
wengxt retitled D12246: [NVPTX] change threading intrinsics from noduplicate to convergent from to [NVPTX] change threading intrinsics from noduplicate to convergent.
Aug 21 2015, 11:00 AM
wengxt added a parent revision for D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation.: D12242: [NVPTX] Allow undef value as global initializer.
Aug 21 2015, 9:54 AM
wengxt added a child revision for D12242: [NVPTX] Allow undef value as global initializer: D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation..
Aug 21 2015, 9:54 AM
wengxt retitled D12242: [NVPTX] Allow undef value as global initializer from to [NVPTX] Allow undef value as global initializer.
Aug 21 2015, 9:53 AM
wengxt retitled D12241: [CUDA] Change initializer for CUDA device code based on CUDA documentation. from to [CUDA] Change initializer for CUDA device code based on CUDA documentation..
Aug 21 2015, 9:31 AM

Aug 10 2015

wengxt added a comment to D11855: SelectionDAG: Prefer to combine multiplication with less uses for fma.
Aug 10 2015, 11:48 AM
wengxt updated the diff for D11855: SelectionDAG: Prefer to combine multiplication with less uses for fma.

Address jingyue's comment

Aug 10 2015, 11:48 AM
wengxt added a comment to D11855: SelectionDAG: Prefer to combine multiplication with less uses for fma.
Aug 10 2015, 11:18 AM
wengxt updated the diff for D11855: SelectionDAG: Prefer to combine multiplication with less uses for fma.

Address comments

Aug 10 2015, 11:17 AM

Aug 7 2015

wengxt retitled D11855: SelectionDAG: Prefer to combine multiplication with less uses for fma from to SelectionDAG: Prefer to combine multiplication with less uses for fma.
Aug 7 2015, 5:37 PM

Aug 6 2015

wengxt updated subscribers of D11143: [RFC] Cross Block DSE.
Aug 6 2015, 1:02 PM

Aug 5 2015

wengxt added inline comments to D11774: [NVPTX] Use LDG for pointer induction variables.
Aug 5 2015, 11:59 AM

Jul 23 2015

wengxt retitled D11311: Remove the user-count threshold when analyzing read attributes from Replace hardcoded threshold when analyze read attributes to Replace the hardcoded threshold when analyze read attributes.
Jul 23 2015, 3:47 PM
wengxt retitled D11311: Remove the user-count threshold when analyzing read attributes from Replace hardcoded threshold with an option when analyze read attributes to Replace hardcoded threshold when analyze read attributes.
Jul 23 2015, 3:47 PM
wengxt updated the diff for D11311: Remove the user-count threshold when analyzing read attributes.

remove the threshold

Jul 23 2015, 3:46 PM
wengxt added a comment to D11311: Remove the user-count threshold when analyzing read attributes.

Xuetian Weng wrote:

wengxt added a comment.

Question one, what does this have to do with ld.global.nc? It looks like
you've got a load instruction (wrapped in an intrinsic) and it in turn
has many uses. The fact it's ld.global.nc doesn't seem to be relevant?

In theory, if you replaced the ld.global.nc with a LoadInst, would you
have the same problem? Or is there some other optimization that occurs
on those uses of a LoadInst that doesn't occur on the ld.global.nc?

In our case, we intend to transform LoadInst into ld.global.nc during instruction
selection, not the other way around. It's only safe to use ld.global.nc for a LoadInst
if the pointer being accessed is readonly (and noalias, which is irrelevant to this
patch).

I realize that, but my question stands. Is there an optimization that
should be happening to these ld.global.nc before we get here?

My commit message might be confusing. I don't think this change has anything to do with ld.global.nc. The LoadInst-to-ld.global.nc optimization is just one example that can benefit from this change; we happened to notice it while working on our CUDA compiler.

Jul 23 2015, 10:01 AM
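The trade-off behind D11311 can be illustrated with a toy model of read-attribute inference for a pointer argument. This is not LLVM's FunctionAttrs code. `UseKind`, `ReadAttr`, `infer_with_cap`, `infer_uncapped` and the cap value used below are all hypothetical. The point is only that a capped walk over the pointer's uses must answer conservatively once it gives up. A pointer with many loads and no stores then loses its `readonly` fact, and any readonly-dependent lowering such as the LoadInst-to-ld.global.nc case above is blocked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified kinds of use a pointer argument can have.
enum class UseKind { Load, Store, PassToReadonlyCallee };

enum class ReadAttr { ReadOnly, MayWrite };

// Capped walk: after visiting `max_uses` uses, give up and answer
// conservatively. `max_uses` is illustrative, not LLVM's actual value.
ReadAttr infer_with_cap(const std::vector<UseKind>& uses, std::size_t max_uses) {
  std::size_t visited = 0;
  for (UseKind u : uses) {
    if (++visited > max_uses) return ReadAttr::MayWrite;  // conservative bail-out
    if (u == UseKind::Store) return ReadAttr::MayWrite;
  }
  return ReadAttr::ReadOnly;
}

// Uncapped walk: every use is inspected, so the answer depends only on
// whether some use actually writes through the pointer.
ReadAttr infer_uncapped(const std::vector<UseKind>& uses) {
  for (UseKind u : uses)
    if (u == UseKind::Store) return ReadAttr::MayWrite;
  return ReadAttr::ReadOnly;
}
```

For a pointer with 30 loads and a hypothetical cap of 20, `infer_with_cap` returns `MayWrite` even though nothing writes through the pointer, while `infer_uncapped` returns `ReadOnly`. The uncapped walk costs time linear in the number of uses. Removing the threshold trades that cost for a more precise result.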

Jul 20 2015

wengxt updated D11314: [NVPTX] make load on global readonly memory to use ldg.
Jul 20 2015, 1:34 PM
wengxt retitled D11366: [MDA] change BlockScanLimit into a command line option. from to Change BlockScanLimit into a command line option..
Jul 20 2015, 12:41 PM
wengxt updated D11314: [NVPTX] make load on global readonly memory to use ldg.
Jul 20 2015, 12:06 PM
wengxt updated the diff for D11314: [NVPTX] make load on global readonly memory to use ldg.

update commit message

Jul 20 2015, 12:05 PM
wengxt added a comment to D11311: Remove the user-count threshold when analyzing read attributes.

Question one, what does this have to do with ld.global.nc? It looks like
you've got a load instruction (wrapped in an intrinsic) and it in turn
has many uses. The fact it's ld.global.nc doesn't seem to be relevant?

In theory, if you replaced the ld.global.nc with a LoadInst, would you
have the same problem? Or is there some other optimization that occurs
on those uses of a LoadInst that doesn't occur on the ld.global.nc?

Jul 20 2015, 11:59 AM

Jul 17 2015

wengxt retitled D11314: [NVPTX] make load on global readonly memory to use ldg from to [NVPTX] make load on global readonly memory to use ldg.
Jul 17 2015, 2:51 PM
wengxt updated the diff for D11311: Remove the user-count threshold when analyzing read attributes.

update test plan info

Jul 17 2015, 2:32 PM
wengxt retitled D11311: Remove the user-count threshold when analyzing read attributes from to Replace hardcoded threshold with an option when analyze read attributes.
Jul 17 2015, 1:37 PM

Jul 1 2015

wengxt added a comment to D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.

Hi Samuel,

Jul 1 2015, 12:30 PM
wengxt added inline comments to D10853: [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass.
Jul 1 2015, 11:22 AM
wengxt updated the diff for D10853: [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass.

update according to comments

Jul 1 2015, 11:20 AM

Jun 30 2015

wengxt retitled D10853: [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass from to [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass.
Jun 30 2015, 4:09 PM
wengxt accepted D10849: [NVPTX] cleanups and refactoring in NVPTXFrameLowering.cpp.

LGTM

Jun 30 2015, 2:27 PM
wengxt retitled D10844: [NVPTX] Fix issue introduced in D10321 from to [NVPTX] Fix issue introduced in D10321.
Jun 30 2015, 11:01 AM

Jun 26 2015

wengxt updated the diff for D10750: Make InstCombine aware of TargetTransformInfo when optimize extension.

Add a datalayout test case to make sure it would not pass with the old code.

Jun 26 2015, 12:02 PM

Jun 25 2015

wengxt updated the diff for D10750: Make InstCombine aware of TargetTransformInfo when optimize extension.

update reviewer

Jun 25 2015, 3:15 PM
wengxt added a reviewer for D10750: Make InstCombine aware of TargetTransformInfo when optimize extension: majnemer.
Jun 25 2015, 3:13 PM
wengxt retitled D10750: Make InstCombine aware of TargetTransformInfo when optimize extension from to Make InstCombine aware of TargetTransformInfo when optimize extension.
Jun 25 2015, 3:10 PM

Jun 24 2015

wengxt updated the diff for D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.

update based on comments.

Jun 24 2015, 11:39 AM
wengxt added a comment to D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.

What is the future expectation for NVPTXPeephole? Are you planning on adding additional transforms? If not, perhaps a more specific name is warranted. Otherwise, LGTM! Thanks!

Jun 24 2015, 11:39 AM

Jun 19 2015

wengxt updated the diff for D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.

update based on previous comments

Jun 19 2015, 2:27 PM
wengxt added inline comments to D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.
Jun 19 2015, 1:31 PM

Jun 18 2015

wengxt added inline comments to D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.
Jun 18 2015, 5:45 PM
wengxt updated the diff for D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.

add a new test case to test a case of not emitting cvta.local %SP, %SPL

Jun 18 2015, 5:43 PM
wengxt updated the diff for D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.

Fix issues raised in comments.

Jun 18 2015, 5:38 PM
wengxt added inline comments to D10549: Add NVPTXPeephole pass to reduce unnecessary address cast.
Jun 18 2015, 5:37 PM
wengxt retitled D10549: Add NVPTXPeephole pass to reduce unnecessary address cast from to Add NVPTXPeephole pass to reduce unnecessary address cast.
Jun 18 2015, 1:56 PM

Jun 17 2015

wengxt updated the diff for D10483: Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address.

Sorry for the mistake.

Jun 17 2015, 3:22 PM
wengxt updated the diff for D10483: Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address.

Fix issues according to jingyue's comments

Jun 17 2015, 2:48 PM

Jun 16 2015

wengxt updated the diff for D10483: Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address.

Try to address jingyue's comments.

Jun 16 2015, 6:48 PM
wengxt added a comment to D10483: Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address.

This depends on D10482 to eliminate allocas across the addrspacecast boundary; otherwise there may be a performance regression because SROA will not work properly.

Jun 16 2015, 12:30 PM
wengxt added a parent revision for D10483: Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address: D10482: Make SROA handle addrspacecast.
Jun 16 2015, 12:26 PM
wengxt retitled D10483: Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address from to Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address.
Jun 16 2015, 12:26 PM
wengxt retitled D10482: Make SROA handle addrspacecast from to Make SROA handle addrspacecast.
Jun 16 2015, 12:21 PM

Jun 15 2015

wengxt added a comment to D4501: Teach SROA about addrspacecast. .

So this is a case that we currently want to handle in NVPTX, and it is not covered by InstCombine/SROA right now.

Jun 15 2015, 5:03 PM

Jun 9 2015

wengxt updated the diff for D10327: Reassociate in favor of grouping previously paired operands.

Updated based on Daniel's comment.

Jun 9 2015, 5:15 PM

Jun 8 2015

wengxt added a comment to D10327: Reassociate in favor of grouping previously paired operands.

Related discussion and original patch on llvmdev.

Jun 8 2015, 4:23 PM
wengxt retitled D10327: Reassociate in favor of grouping previously paired operands from to Reassociate in favor of grouping previously paired operands.
Jun 8 2015, 4:03 PM