Page MenuHomePhabricator

tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (390 w, 6 d)

Recent Activity

Thu, Jun 30

tra added a comment to D128914: [HIP] Add support for handling HIP in the linker wrapper.

Syntax/style looks OK to me with a few nits.

Thu, Jun 30, 12:36 PM · Restricted Project, Restricted Project

Wed, Jun 29

tra accepted D128850: [HIP] Generate offloading entries for HIP with the new driver..
Wed, Jun 29, 1:27 PM · Restricted Project, Restricted Project

Tue, Jun 28

tra added a comment to D128752: [CUDA] Stop adding CUDA features twice.

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

Yeah, we have some existing tests that check for including the target features at least once. I felt like there was no need to include a test for what was more or less an oversight

Tue, Jun 28, 2:48 PM · Restricted Project, Restricted Project
tra accepted D128752: [CUDA] Stop adding CUDA features twice.

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

Tue, Jun 28, 1:52 PM · Restricted Project, Restricted Project
tra added a comment to D128752: [CUDA] Stop adding CUDA features twice.

we no longer will have a cached CUDA installation so we will usually create it twice.

Does that result in extra output in case we find an unexpected CUDA version, or when compiler is run with -v ?

We may want to wrap installation detector creation into some sort of singleton creation function.

We already create another one for the call coming from Clang, this patch just gets rid of the other call. I think the printouts you're talking about come from the variable in the CUDA toolchain specifically. Here we simply create one to get the version and throw it away. It's not ideal to do the same work twice, so we could wrap this into some singleton interface. Maybe a static optional value inside of the Toolchains/Cuda.cpp file wither a getter that returns a reference to it. Though I don't think this is likely to be a bottleneck.

Tue, Jun 28, 1:14 PM · Restricted Project, Restricted Project
tra added a comment to D128752: [CUDA] Stop adding CUDA features twice.

we no longer will have a cached CUDA installation so we will usually create it twice.

Tue, Jun 28, 12:39 PM · Restricted Project, Restricted Project

Mon, Jun 27

tra updated subscribers of D124787: [NVPTX] Implement NVPTX AliasAnalysis.

@nikic -- we could use your input here. The patch looks OK to me, but I'm not particularly familiar with AA machinery in general and could use a second opinion.

Mon, Jun 27, 12:33 PM · Restricted Project, Restricted Project

Thu, Jun 23

tra accepted D128441: [CUDA] Do not embed a fatbinary when using the new driver.

The linker wrapper cannot do anything with these embedded PTX files because we do not know how to link them,

Thu, Jun 23, 10:54 AM · Restricted Project, Restricted Project
tra accepted D128022: [HIP] add -fhip-kernel-arg-name.
Thu, Jun 23, 10:44 AM · Restricted Project, Restricted Project

Wed, Jun 22

tra added a comment to D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode.

Then we do need a knob controlling whether we do want to embed PTX or not. The default should be "off" IMO.
We currently have --[no-]cuda-include-ptx= we may reuse for that purpose.

We could definitely re-use that. It's another option that probably need to go inside the binary itself since normally those options aren't passed to the linker.

Wed, Jun 22, 4:39 PM · Restricted Project, Restricted Project
tra accepted D127686: [Offloading] Embed the target features in the OffloadBinary.

LGTM with a nit.

Wed, Jun 22, 2:44 PM · Restricted Project, Restricted Project
tra added a comment to D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode.

Do we want/need PTX, I do not, but I don't mind having it. Someone will ask for it eventually.

Wed, Jun 22, 2:38 PM · Restricted Project, Restricted Project
tra added inline comments to D128022: [HIP] add -fhip-kernel-arg-name.
Wed, Jun 22, 2:26 PM · Restricted Project, Restricted Project
tra added inline comments to D128022: [HIP] add -fhip-kernel-arg-name.
Wed, Jun 22, 11:30 AM · Restricted Project, Restricted Project
tra accepted D127948: [TableGen] Add new operator !exists.

Very nice. LGTM.

Wed, Jun 22, 10:15 AM · Restricted Project, Restricted Project

Tue, Jun 21

tra added inline comments to D127948: [TableGen] Add new operator !exists.
Tue, Jun 21, 4:49 PM · Restricted Project, Restricted Project
tra added a comment to D127948: [TableGen] Add new operator !exists.

Thank you. One more thought after sleeping on it: What do you think about !exists? I think !exists<T>("foo") reads nicely as "does there exist a record of type T with name 'foo'?". It's not perfect, but I like that there's less confusion with "instanceof" or "isinstance" operators that some programming languages have.

Tue, Jun 21, 2:28 PM · Restricted Project, Restricted Project
tra accepted D127510: [NVPTX] Keep metadata attached to module-scope variables.

Good catch! Thank you for fixing this.

Tue, Jun 21, 10:29 AM · Restricted Project, Restricted Project

Thu, Jun 16

tra added a comment to D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode.

Playing devil's advocate, I've got to ask -- do we even want to support JIT?

Thu, Jun 16, 2:40 PM · Restricted Project, Restricted Project
tra accepted D127866: [NVPTX] Fix issues in ptxas integration to LIT tests.

LGTM in general.

Thu, Jun 16, 1:17 PM · Restricted Project, Restricted Project
tra accepted D127878: [NVPTX] Fix constant expression initializers for global variables.
Thu, Jun 16, 1:06 PM · Restricted Project, Restricted Project
tra added inline comments to D127668: [NVPTX] Fix pointer type for short 32-bit pointers.
Thu, Jun 16, 10:54 AM · Restricted Project, Restricted Project

Wed, Jun 15

tra added a comment to D127739: [TableGen] Check if input string of !isa is an instance of record type.

I have considered adding a new operator !instanceof<T>(s) to check if whether a record with type T and name s exists, but I think !isa have the same semantic meaning with !instanceof,

Wed, Jun 15, 10:47 AM · Restricted Project, Restricted Project

Tue, Jun 14

tra accepted D127774: [Binary] Add iterator to the OffloadBinary string maps.
Tue, Jun 14, 1:58 PM · Restricted Project, Restricted Project
tra added inline comments to D127774: [Binary] Add iterator to the OffloadBinary string maps.
Tue, Jun 14, 1:45 PM · Restricted Project, Restricted Project
tra accepted D127771: [HIP] fix long double size.

AFAICT, the test case you've added works fine with the compiler at HEAD: https://cuda.godbolt.org/z/q3xYMfdeb
I guess it only shows up in assertion-enabled builds. Can you check what happens if you run the test case compiled as CUDA? I suspect it will have the same issue -- we don't override setAuxTarget at all.

Tue, Jun 14, 11:27 AM · Restricted Project, Restricted Project
tra added a comment to D127739: [TableGen] Check if input string of !isa is an instance of record type.

Do you have a practical use case where this feature would be needed?

Tue, Jun 14, 10:25 AM · Restricted Project, Restricted Project
tra accepted D127707: [Clang] Simplify unifying target features.
Tue, Jun 14, 10:02 AM · Restricted Project, Restricted Project

Mon, Jun 13

tra added inline comments to D127686: [Offloading] Embed the target features in the OffloadBinary.
Mon, Jun 13, 5:30 PM · Restricted Project, Restricted Project
tra added inline comments to D127686: [Offloading] Embed the target features in the OffloadBinary.
Mon, Jun 13, 4:13 PM · Restricted Project, Restricted Project
tra added inline comments to D127686: [Offloading] Embed the target features in the OffloadBinary.
Mon, Jun 13, 3:39 PM · Restricted Project, Restricted Project
tra accepted D127668: [NVPTX] Fix pointer type for short 32-bit pointers.

Good catch. Thank you for fixing this.

Mon, Jun 13, 11:32 AM · Restricted Project, Restricted Project
tra accepted D127673: [OpenMP] Fix offload packager not writing to temps correctly.

LGTM overall with a couple of nits.

Mon, Jun 13, 11:28 AM · Restricted Project, Restricted Project

Fri, Jun 10

tra added inline comments to D127504: [NVPTX] Use the mask() operator to initialize packed structs with pointers.
Fri, Jun 10, 3:14 PM · Restricted Project, Restricted Project
tra accepted D127515: [Clang] Change host/device only compilation to a driver mode.
Fri, Jun 10, 12:27 PM · Restricted Project, Restricted Project

Thu, Jun 9

tra added a comment to D127267: [NVPTX] Add setAuxTarget override rather than make a new TargetInfo.

I think the root cause of the problem here is that CUDA compilation assumes that the GPU-side looks identical to the host side of the compilation. The test was intended to verify that and, AFAICT, it did its job flagging the issue here.

Thu, Jun 9, 2:55 PM · Restricted Project, Restricted Project
tra added inline comments to D127142: [HIP] Link with clang_rt.builtins.
Thu, Jun 9, 2:37 PM · Restricted Project

Wed, Jun 8

tra accepted D126904: [llvm-objdump] Add support for dumping embedded offloading data.
Wed, Jun 8, 1:58 PM · Restricted Project, Restricted Project

Jun 6 2022

tra added a reviewer for D127142: [HIP] Link with clang_rt.builtins: MaskRay.

Looks OK syntax-wise. Library path should probably be fixed, though it appears to be a somewhat separate issue and could be done in its own patch, if the required change is not trivial.

Jun 6 2022, 12:44 PM · Restricted Project

Jun 3 2022

tra added a comment to D125165: [Clang] Introduce clang-offload-packager tool to bundle device files.

@jhuber6 -- @MaskRay has found that ninja install is failing in a clean build with:

clang: error: unable to execute command: Executable "clang-offload-packager" doesn't exist!
Jun 3 2022, 1:50 PM · Restricted Project, Restricted Project
tra added inline comments to D126904: [llvm-objdump] Add support for dumping embedded offloading data.
Jun 3 2022, 12:36 PM · Restricted Project, Restricted Project
tra added inline comments to D126904: [llvm-objdump] Add support for dumping embedded offloading data.
Jun 3 2022, 11:02 AM · Restricted Project, Restricted Project

Jun 2 2022

tra added a comment to D126904: [llvm-objdump] Add support for dumping embedded offloading data.

This patch only supports printing these sections, later we will want to
support dumping files the user may be interested in via another flag. I
am unsure if this should go here in llvm-objdump or llvm-objcopy.

Jun 2 2022, 3:27 PM · Restricted Project, Restricted Project
tra added inline comments to D126812: [Binary] Promote OffloadBinary to inherit from Binary.
Jun 2 2022, 10:07 AM · Restricted Project, Restricted Project, Restricted Project

Jun 1 2022

tra accepted D126812: [Binary] Promote OffloadBinary to inherit from Binary.
Jun 1 2022, 2:08 PM · Restricted Project, Restricted Project, Restricted Project
tra added a comment to D126812: [Binary] Promote OffloadBinary to inherit from Binary.

LGTM in principle, with new nits.

Jun 1 2022, 1:26 PM · Restricted Project, Restricted Project, Restricted Project

May 31 2022

tra accepted D126681: [HIP] Fix static lib name on windows.
May 31 2022, 10:11 AM · Restricted Project, Restricted Project
tra accepted D126704: [HIP] Pass -Xoffload-linker option to device linker.
May 31 2022, 10:07 AM · Restricted Project, Restricted Project

May 25 2022

tra accepted D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.
May 25 2022, 4:47 PM · Restricted Project, Restricted Project
tra added a comment to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.

It would be great to have some compile-time checks for that, if possible. Otherwise it will only manifest at run-time and the end user will have no clue what's going on.

Not sure how we could check it at compile-time, if we knew what it was supposed to be we could just set it properly right?

May 25 2022, 3:45 PM · Restricted Project, Restricted Project
tra added a comment to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.

I am OK with this patch.

May 25 2022, 3:00 PM · Restricted Project, Restricted Project
tra accepted D126398: [Clang] Introduce `--offload-link` option to perform offload device linking.
May 25 2022, 12:04 PM · Restricted Project, Restricted Project
tra added inline comments to D126398: [Clang] Introduce `--offload-link` option to perform offload device linking.
May 25 2022, 11:53 AM · Restricted Project, Restricted Project
tra added a comment to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.

How much work would it take to add cuid generation in the new driver, similar to what the old driver does, using the same logic, however imperfect it is? I'd be OK with that as a possibly permanent solution.

May 25 2022, 11:17 AM · Restricted Project, Restricted Project
tra added a comment to D126398: [Clang] Introduce `--offload-link` option to perform offload device linking.

Currently, we infer the usage of the device linker by the user
specifying an offloading toolchain, e.g. (--offload-arch=...) or
(-fopenmp-targets=...), but this shouldn't be strictly necessary.

May 25 2022, 11:08 AM · Restricted Project, Restricted Project
tra added a comment to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.

Is this patch in its current form blocking any of your other work? no-cuid approach, even if we figure out how to do it, will likely take some time. Do you need an interim solution until then?

May 25 2022, 10:52 AM · Restricted Project, Restricted Project
tra accepted D126369: [LLVM] Add rcp.approx.ftz.f32 intrinsic.

LGTM. Do you need me to land it?

May 25 2022, 10:32 AM · Restricted Project, Restricted Project, Restricted Project

May 24 2022

tra added a comment to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.

I'm still itching to figure out a way to avoid CUID altogether and with the new driver it may be possible.
CUID serves two purposes:
a) avoid name conflicts during device-side linking ("must be globally unique" part)
b) allow host to refer to something in the GPU executable ("stable within TU" part)

May 24 2022, 4:37 PM · Restricted Project, Restricted Project
tra added inline comments to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.
May 24 2022, 1:09 PM · Restricted Project, Restricted Project
tra added inline comments to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.
May 24 2022, 10:59 AM · Restricted Project, Restricted Project
tra added a reviewer for D126158: [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.: tra.
May 24 2022, 10:33 AM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
tra added a comment to D126158: [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration..

I would suggest separating it into separate LLVM and MLIR patches.

May 24 2022, 10:33 AM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

May 23 2022

tra accepted D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker.

Couple of nits. LGTM otherwise.

May 23 2022, 4:51 PM · Restricted Project, Restricted Project
tra added inline comments to D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker.
May 23 2022, 3:39 PM · Restricted Project, Restricted Project
tra added a comment to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.

clang a.c -c -o a-0.o Has some externalized static variable.
clang a.c -c -o a-1.o
clang a-0.o a-1.o
Redefined symbol error

May 23 2022, 3:25 PM · Restricted Project, Restricted Project
tra added a comment to D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker.

We keep running into the same old underlying issue that we do not have a good way to name/reference specific parts of the compilation pipeline. -Xfoo used to work OK for the linear 'standard' compilation pipeline, but these days when compilation grew from a simple linear pipe it's no longer adequate and we need to extend it.

Yeah, it's getting increasingly complicated to refer to certain portions of the compilation toolchain as we start adding more complicated stuff. Just recently I had a problem that I wanted to pass an -Xclang argument only to the CUDA toolchain, and there's no way to do it as far as I can tell. It may be worth revisiting this whole concept to support more arbitrary combinations.

May 23 2022, 2:40 PM · Restricted Project, Restricted Project
tra added a comment to D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given.

The one downside to this is that we are no longer stable under multiple compilations of the same file.

May 23 2022, 2:30 PM · Restricted Project, Restricted Project
tra accepted D126231: [External][CUDA] Add option to test with the new driver.
May 23 2022, 2:06 PM · Restricted Project
tra added a comment to D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker.

It's better to avoid JoinedAndSeparate for new options. It is for --xxx val and --xxxval but not intended for the option this patch will add.

May 23 2022, 2:03 PM · Restricted Project, Restricted Project
tra added a comment to D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker.

We already use this approach for the -Xopenmp-target=<triple> <arg> option that forwards arguments to the given toolchain, so that's what I was using as a basis.

May 23 2022, 12:22 PM · Restricted Project, Restricted Project
tra requested changes to D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker.
May 23 2022, 11:45 AM · Restricted Project, Restricted Project
tra added a comment to D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker.

The syntax is confusing. Normally only triple would be the argument for -Xoffload-linker option.
Having both -Xoffload-linker and -Xoffload-linker= variants also looks odd to me.

May 23 2022, 11:44 AM · Restricted Project, Restricted Project
tra added a comment to D126079: [NFC][LLVM] Merge shouldExpandAtomic*InIR into shouldExpandAtomicInstInIR.

Looks like it's a wash -- public API is smaller, but the implementations didn't really benefit much. If anything, they got a bit less readable.

May 23 2022, 11:21 AM · Restricted Project, Restricted Project

May 20 2022

tra accepted D125639: [NVPTX] Enable AtomicExpandPass for NVPTX.
May 20 2022, 12:00 PM · Restricted Project, Restricted Project
tra added a comment to D126079: [NFC][LLVM] Merge shouldExpandAtomic*InIR into shouldExpandAtomicInstInIR.

My original suggestion which resulted in this patch was to generalize the API to just one function accepting Instruction* and let the target deal with the details.
For many targets there's a fair amount of duplicated code across the per-instruction variant implementation of shouldExpandX and using single call would allow to consolidate the atomic expansion logic in fewer places.

May 20 2022, 11:52 AM · Restricted Project, Restricted Project

May 19 2022

tra added inline comments to D125937: [NVVM] Update intrinsic defintions to include the `nocallback` attribute.
May 19 2022, 10:24 AM · Restricted Project, Restricted Project

May 18 2022

tra added inline comments to D125639: [NVPTX] Enable AtomicExpandPass for NVPTX.
May 18 2022, 11:19 AM · Restricted Project, Restricted Project
tra accepted D125909: [AMDGPU] emit macro __GFX9__ etc.
May 18 2022, 10:46 AM · Restricted Project, Restricted Project
tra accepted D125829: [clang] Fix __has_builtin.

LGTM with a possible nit.

May 18 2022, 10:39 AM · Restricted Project, Restricted Project

May 17 2022

tra updated subscribers of D125829: [clang] Fix __has_builtin.

@echristo - FYI, in case you have an opinion on target builtin feature checks.

May 17 2022, 2:04 PM · Restricted Project, Restricted Project
tra added a comment to D124382: [Clang] Recognize target address space in superset calculation.

My concerns have been addressed. I'll defer the final LGTM to @Anastasia.

May 17 2022, 11:14 AM · Restricted Project, Restricted Project
tra added inline comments to D125639: [NVPTX] Enable AtomicExpandPass for NVPTX.
May 17 2022, 10:51 AM · Restricted Project, Restricted Project

May 16 2022

tra added inline comments to D125639: [NVPTX] Enable AtomicExpandPass for NVPTX.
May 16 2022, 4:17 PM · Restricted Project, Restricted Project
tra added inline comments to D125652: [LLVM] Add a check if should cast atomic operations to integer type.
May 16 2022, 4:09 PM · Restricted Project, Restricted Project
tra added a comment to D125639: [NVPTX] Enable AtomicExpandPass for NVPTX.

What's the minimum SM version do we currently support? I did find from one of your patch (https://reviews.llvm.org/D24943) that it had the function hasAtomAddF32, but now I can't find it. Was it removed because we support a minimum SM version, like SM20?

May 16 2022, 1:06 PM · Restricted Project, Restricted Project
tra added inline comments to D125639: [NVPTX] Enable AtomicExpandPass for NVPTX.
May 16 2022, 11:23 AM · Restricted Project, Restricted Project
tra added a comment to D125555: [clang] Add __has_target_feature.

Oops. forgot to hit 'submit' last week, so there's some overlap with Aaron's question.

May 16 2022, 10:13 AM · Restricted Project

May 13 2022

tra added a comment to D125512: [NVPTX] Enable atomic expansion of atomicrmw for NVPTX.

Tests?

May 13 2022, 10:15 AM · Restricted Project, Restricted Project
tra added inline comments to D125555: [clang] Add __has_target_feature.
May 13 2022, 9:53 AM · Restricted Project

May 11 2022

tra accepted D125423: [Intrinsics] Fix `nvvm_prmt` intrinsic attributes.

Good catch. LGTM.

May 11 2022, 5:32 PM · Restricted Project, Restricted Project

May 10 2022

tra added inline comments to D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver..
May 10 2022, 3:58 PM · Restricted Project, Restricted Project, Restricted Project
tra accepted D123812: [CUDA] Add wrapper code generation for registering CUDA images.
May 10 2022, 3:52 PM · Restricted Project, Restricted Project
tra accepted D123471: [CUDA] Create offloading entries when using the new driver.
May 10 2022, 3:47 PM · Restricted Project, Restricted Project
tra accepted D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver..

Please add a TODO around the items that need further work, so we don't forget about them. Review comments tend to fade from memory.

May 10 2022, 3:44 PM · Restricted Project, Restricted Project, Restricted Project
tra accepted D125165: [Clang] Introduce clang-offload-packager tool to bundle device files.

OK. LGTM.

May 10 2022, 3:28 PM · Restricted Project, Restricted Project
tra added inline comments to D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver..
May 10 2022, 2:52 PM · Restricted Project, Restricted Project, Restricted Project
tra updated subscribers of D125165: [Clang] Introduce clang-offload-packager tool to bundle device files.

LGTM in principle.

May 10 2022, 11:44 AM · Restricted Project, Restricted Project
tra added inline comments to D124866: [CUDA][HIP] support __noinline__ as keyword.
May 10 2022, 10:51 AM · Restricted Project, Restricted Project

May 9 2022

tra added inline comments to D125165: [Clang] Introduce clang-offload-packager tool to bundle device files.
May 9 2022, 2:36 PM · Restricted Project, Restricted Project
tra accepted D124866: [CUDA][HIP] support __noinline__ as keyword.

If we are to add __forceinline__ as a keyword, I feel it better be a separate patch to be cleaner.

May 9 2022, 1:17 PM · Restricted Project, Restricted Project