Page MenuHomePhabricator

jaykang10 (JinGu Kang)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 18 2014, 2:14 AM (274 w, 1 h)

Recent Activity

Jun 7 2019

jaykang10 added inline comments to D62962: Clang implementation of sizeless types.
Jun 7 2019, 2:29 AM · Restricted Project

Jun 12 2018

jaykang10 updated the diff for D48066: Add one more No-alias case to alias analysis..
Jun 12 2018, 9:31 AM
jaykang10 added a comment to D48066: Add one more No-alias case to alias analysis..

I don't understand what you're saying. Once you shift, you know that the lowest-order bit is zero. You can naturally combine that with whatever other information also happens to be known about idx.

Jun 12 2018, 8:55 AM
jaykang10 updated the diff for D48066: Add one more No-alias case to alias analysis..
Jun 12 2018, 8:54 AM
jaykang10 added a comment to D48066: Add one more No-alias case to alias analysis..
knownZeros(n) == (knownZeros(idx) << 1) | 1 and
knownOnes(n) == knownOnes(idx) << 1

I meant if nothing is known about idx, we fail to get No-alias opportunity. If there is nothing for KnownBits, let's just check odd number and the idx is used by both offsets as below.

Jun 12 2018, 6:50 AM
jaykang10 added inline comments to D48066: Add one more No-alias case to alias analysis..
Jun 12 2018, 6:28 AM
jaykang10 updated the diff for D48066: Add one more No-alias case to alias analysis..
Jun 12 2018, 5:46 AM
jaykang10 added a reviewer for D48066: Add one more No-alias case to alias analysis.: hfinkel.
Jun 12 2018, 3:25 AM
jaykang10 created D48066: Add one more No-alias case to alias analysis..
Jun 12 2018, 2:50 AM

Apr 11 2017

jaykang10 abandoned D31866: Keep NDEBUG out of header.
Apr 11 2017, 3:18 AM

Apr 10 2017

jaykang10 added inline comments to D31866: Keep NDEBUG out of header.
Apr 10 2017, 4:13 PM
jaykang10 added inline comments to D31866: Keep NDEBUG out of header.
Apr 10 2017, 12:19 AM

Apr 9 2017

jaykang10 created D31866: Keep NDEBUG out of header.
Apr 9 2017, 1:42 PM

Apr 4 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

LGTM! Thanks!

Apr 4 2017, 9:54 AM
jaykang10 committed rL299445: Preserve vec3 type..
Preserve vec3 type.
Apr 4 2017, 9:53 AM
jaykang10 closed D30810: Preserve vec3 type. by committing rL299445: Preserve vec3 type..
Apr 4 2017, 9:53 AM

Mar 24 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

I believe the argument lacks numbers (or at least you have them but didn't mention). I didn't hear about performance results, or validation that this was actually tested for correctness. Small test cases prove a point, but can't be considered general.

OTOH, it seems like this is exactly why you want the flag, to hide any potential issues and play with it. I'm not opposed to adding the flag if there's a commitment to actually get the result and change the default to whatever seems to be better, does that seems reasonable?

Mar 24 2017, 10:33 AM

Mar 23 2017

jaykang10 updated the diff for D30810: Preserve vec3 type..

Preserved vec3 type on __builtin_astype.

Mar 23 2017, 10:10 AM

Mar 22 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

Yes. This would make sense. I am guessing that in vec3->vec4, we will have 3 loads and 4 stores and in vec4->vec3 we will have 4 loads and 3 stores?

Mar 22 2017, 10:40 AM

Mar 21 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

Would you be able to update ScalarExprEmitter::VisitAsTypeExpr implementation accordingly to make things consistent?

Not sure it got lost somewhere... do you think you could address this too?

I guess due to 4 element alignment the generated casts as vec4 should be fine for vec3, but it is worth checking...

Mar 21 2017, 5:47 PM
jaykang10 added a comment to D30810: Preserve vec3 type..

As a result, I think it would be good for clang to have both of features and I would like to stick to the option "-fpresereve-vec3' to change the behavior easily.

The motivation doesn't seem solid to me, who else is going to benefit from this flag?

There are some off the main tree implementation that would benefit. But in the case of AMD GPU 3 loads/stores will be produced instead of 4. Sounds like a good optimization to me. As I said in my previous comment I think it should have been the default behavior from the beginning, but since different implementation landed first we can integrate this one now with an additional option.

Additionally, Here is assembly output from vec3 with amdgcn target. :)

LLVM IR

define void @foo(<3 x float>* nocapture %a, <3 x float>* nocapture readonly %b) {
entry:
  %0 = load <3 x float>, <3 x float>* %b, align 16
  store <3 x float> %0, <3 x float>* %a, align 16
  ret void
}

Assembly Output

	.text
	.section	.AMDGPU.config
	.long	47176
	.long	11272256
	.long	47180
	.long	132
	.long	47200
	.long	0
	.long	4
	.long	0
	.long	8
	.long	0
	.text
	.globl	foo
	.p2align	8
	.type	foo,@function
foo:                                    ; @foo
; BB#0:                                 ; %entry
	s_load_dword s2, s[0:1], 0x9
	s_load_dword s0, s[0:1], 0xa
	s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	s_mov_b32 s5, SCRATCH_RSRC_DWORD1
	s_mov_b32 s6, -1
	s_mov_b32 s8, s3
	s_mov_b32 s7, 0xe8f000
	s_waitcnt lgkmcnt(0)
	v_mov_b32_e32 v0, s0
	buffer_load_dword v2, v0, s[4:7], s8 offen
	buffer_load_dword v3, v0, s[4:7], s8 offen offset:8
	buffer_load_dword v0, v0, s[4:7], s8 offen offset:4
	v_mov_b32_e32 v1, s2
	s_waitcnt vmcnt(0)
	buffer_store_dword v0, v1, s[4:7], s8 offen offset:4
	buffer_store_dword v2, v1, s[4:7], s8 offen
	buffer_store_dword v3, v1, s[4:7], s8 offen offset:8
	s_endpgm
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo

	.section	.AMDGPU.csdata
; Kernel info:
; codeLenInByte = 112
; NumSgprs: 9
; NumVgprs: 4
; FloatMode: 192
; IeeeMode: 1
; ScratchSize: 0
; LDSByteSize: 0 bytes/workgroup (compile time only)
; SGPRBlocks: 1
; VGPRBlocks: 0
; NumSGPRsForWavesPerEU: 9
; NumVGPRsForWavesPerEU: 4
; ReservedVGPRFirst: 0
; ReservedVGPRCount: 0
; COMPUTE_PGM_RSRC2:USER_SGPR: 2
; COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0
; COMPUTE_PGM_RSRC2:TGID_X_EN: 1
; COMPUTE_PGM_RSRC2:TGID_Y_EN: 0
; COMPUTE_PGM_RSRC2:TGID_Z_EN: 0
; COMPUTE_PGM_RSRC2:TIDIG_COMP_CNT: 0

	.section	".note.GNU-stack"
Mar 21 2017, 3:03 AM

Mar 16 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

As a result, I think it would be good for clang to have both of features and I would like to stick to the option "-fpresereve-vec3' to change the behavior easily.

The motivation doesn't seem solid to me, who else is going to benefit from this flag?

There are some off the main tree implementation that would benefit. But in the case of AMD GPU 3 loads/stores will be produced instead of 4. Sounds like a good optimization to me. As I said in my previous comment I think it should have been the default behavior from the beginning, but since different implementation landed first we can integrate this one now with an additional option.

Mar 16 2017, 5:15 AM
jaykang10 added a comment to D30810: Preserve vec3 type..

Actually, it's not a mis-compile. The record layout shows that there is a padding before field f2 and f2 starts at byte 16. So using "store <4 x float>" doesn't overwrite the field.

It depends on float3's alignment. I guess the float3's alignment is 16 on your target so there is a padding bytes before f2 to be aligned by 16. If you add attribute((packed)) on struct type, you could see overwrite.

I looked at ItaniumRecordLayoutBuilder to understand how attribute((packed)) affects the struct's layout.

If the number of elements of a vector is not a power of 2, ASTContext::getTypeInfoImpl rounds up the width and alignment to the next power of 2. So the size of float3 is 16 bytes. attribute((packed)) can change the alignment of float3, but it doesn't change its size, so there is a 4-byte padding between field f1 and f2.

Mar 16 2017, 4:40 AM

Mar 15 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

The motivation doesn't seem solid to me, who else is going to benefit from this flag? You also didn't explain why doing this transformation yourself (looking through the shuffle) on your downstream pass isn't enough for you. We generally try to avoid adding flags if not for a good reason.

Mar 15 2017, 7:09 PM
jaykang10 added a comment to D30810: Preserve vec3 type..

I don't think there is anything wrong with the generation of vec3->vec4 in Clang. I believe the motivation for this was the OpenCL spec treating vec3 as vec4 aligned type (see section 6.1.5: https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#12). So in terms of memory layout vec3 wouldn't be any different to vec4. But in terms of operations (including loads/stores) there can be potential gain from not handling the 4th element. This can be exploited by some targets. I think generating the vec3 from the frontend would be a better choice in the first place. Because backend can decide how to handle this. Including for architectures with no SIMD support it would just generate 3 separate loads/stores. Right now it seems that it will be forced to generate 4 loads/stores.

But considering that transformation to vec4 has been the default implementation for quite a while in the frontend, I think we would need a stronger motivation for switching to original vec3. So current approach with a special flag for preserving vec3 should be good enough to fit all needs.

Mar 15 2017, 3:16 PM
jaykang10 added a comment to D30810: Preserve vec3 type..

I believe the assumption is more practical: most part of upstream llvm targets only support vectors with even sized number of lanes. And in those cases you would have to expand to a 4x vector and leave the 4th element as undef anyway, so it was done in the front-end to get rid of it right away. Probably GPU targets do some special tricks here during legalization.

I have compiled below code, which current clang generates for vec3, using llc with amdgcn target.

Mar 15 2017, 12:17 AM

Mar 14 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

Actually, it's not a mis-compile. The record layout shows that there is a padding before field f2 and f2 starts at byte 16. So using "store <4 x float>" doesn't overwrite the field.

Mar 14 2017, 4:24 PM
jaykang10 added a comment to D30810: Preserve vec3 type..

Hi JinGu,

I just read the discussion on cfe-dev, some comments:

My colleague and I are implementing a transformation pass between LLVM IR and another IR and we want to keep the 3-component vector types in our target IR

Why can't you go back through the shuffle to find out that it comes from a vec3 -> vec4 transformation? Adding a flag here just for that seems very odd.

Hi Bruno,

Thanks for your comment. We have a pass to undo the vec4 to vec3. I wondered why clang generates the vec4 for vec3 load/store. As you can see the comment on clang's code, they are generated for better performance. I had a questions at this point. clang should consider vector load/store aligned by 4 regardless of target??? llvm's codegen could handle vec3 according to targets' vector load/store with their alignment. I agree the flag looks odd but I was concerned some llvm's targets depend on the vec4 so I suggested to add the flag. If I missed something, please let me know.

I think doing this transformation in a frontend was not the best choice because we are losing the source information too early. But some targets can offer a better support for vec3 natively than converting to vec4. Regarding the option, I am wondering whether adding this as a part of TargetInfo would be a better solution. Although, I don't think any currently available upstream targets support this.

Mar 14 2017, 11:49 AM

Mar 13 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

Hi JinGu,

I just read the discussion on cfe-dev, some comments:

My colleague and I are implementing a transformation pass between LLVM IR and another IR and we want to keep the 3-component vector types in our target IR

Why can't you go back through the shuffle to find out that it comes from a vec3 -> vec4 transformation? Adding a flag here just for that seems very odd.

Mar 13 2017, 2:43 PM
jaykang10 added a comment to D30810: Preserve vec3 type..

Would you be able to update ScalarExprEmitter::VisitAsTypeExpr implementation accordingly to make things consistent?

Mar 13 2017, 12:47 PM
jaykang10 added a comment to D30810: Preserve vec3 type..

Could you elaborate on the motivation for this change?

I was wondering why clang (CodeGen) needed the help of a command line option to decide whether vec3 should be converted to vec4. Can it just preserve vec3 when the architecture is spir?

Mar 13 2017, 2:07 AM

Mar 10 2017

jaykang10 added a comment to D30810: Preserve vec3 type..

Could you please add your test here (probably goes to test/CodeGenOpenCL)?

Oops!
I am so sorry. I missed it. I have updated it.

Mar 10 2017, 8:27 AM
jaykang10 updated the diff for D30810: Preserve vec3 type..

Changed help text for option and Added test file.

Mar 10 2017, 8:26 AM
jaykang10 updated the diff for D30810: Preserve vec3 type..

Fixed typo.

Mar 10 2017, 3:36 AM
jaykang10 updated the diff for D30810: Preserve vec3 type..

Added -f prefix to option name.

Mar 10 2017, 3:32 AM

Mar 9 2017

jaykang10 created D30810: Preserve vec3 type..
Mar 9 2017, 10:43 PM