Page MenuHomePhabricator

ABataev (Alexey Bataev)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 31 2013, 4:40 AM (341 w, 3 d)

Recent Activity

Fri, Aug 16

ABataev committed rG217ff1e44560: [OPENMP5.0]Diagnose global variables in lambda not marked as declare target. (authored by ABataev).
[OPENMP5.0]Diagnose global variables in lambda not marked as declare target.
Fri, Aug 16, 1:15 PM
ABataev committed rL369146: [OPENMP5.0]Diagnose global variables in lambda not marked as declare.
[OPENMP5.0]Diagnose global variables in lambda not marked as declare
Fri, Aug 16, 1:14 PM
ABataev added a comment to D64700: [SLPVectorizer] [NFC] Avoid repetitive calls to getSameOpcode()..

Rebase please

Fri, Aug 16, 12:35 PM
ABataev added a comment to D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script.

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

Is there a bug # ?

Fri, Aug 16, 7:59 AM
ABataev added a comment to D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script.

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Fri, Aug 16, 7:44 AM

Thu, Aug 15

ABataev accepted D62432: [SLPVectorizer] Make the scheduler aware of the TreeEntry operands..

LG

Thu, Aug 15, 1:49 PM · Restricted Project
ABataev committed rGdeb49a6217da: Mark the test as unsupported on darwin, NFC. (authored by ABataev).
Mark the test as unsupported on darwin, NFC.
Thu, Aug 15, 1:32 PM
ABataev committed rL369044: Mark the test as unsupported on darwin, NFC..
Mark the test as unsupported on darwin, NFC.
Thu, Aug 15, 1:31 PM
ABataev added a comment to D66296: [BUNDLER]Improve the test, NFC..

Looks like this is failing on Darwin:

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/1193/consoleFull#-2382751928254eaf0-7326-4999-85b0-388101f2d404

/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/clang/test/Driver/clang-offload-bundler.c:120:15: error: CK-TEXTLL: expected string not found in input
// CK-TEXTLL: @A = dso_local global i32 0
              ^
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/clang-build/tools/clang/test/Driver/Output/clang-offload-bundler.c.tmp.bundle3.ll:3:1: note: scanning from here
; ModuleID = '/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/clang/test/Driver/clang-offload-bundler.c'
^
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/clang-build/tools/clang/test/Driver/Output/clang-offload-bundler.c.tmp.bundle3.ll:8:1: note: possible intended match here
@A = global i32 0, align 4
Thu, Aug 15, 12:00 PM · Restricted Project, Restricted Project
ABataev committed rG1c3a5d78bd96: Fix the test, NFC. (authored by ABataev).
Fix the test, NFC.
Thu, Aug 15, 10:54 AM
ABataev committed rL369028: Fix the test, NFC..
Fix the test, NFC.
Thu, Aug 15, 10:52 AM
ABataev committed rG4fb80d56db62: [Driver][Bundler] Improve bundling of object files. (authored by ABataev).
[Driver][Bundler] Improve bundling of object files.
Thu, Aug 15, 10:18 AM
ABataev committed rL369019: [Driver][Bundler] Improve bundling of object files..
[Driver][Bundler] Improve bundling of object files.
Thu, Aug 15, 10:17 AM
ABataev closed D65819: [Driver][Bundler] Improve bundling of object files..
Thu, Aug 15, 10:17 AM · Restricted Project, Restricted Project
ABataev updated the diff for D65819: [Driver][Bundler] Improve bundling of object files..

Rebase

Thu, Aug 15, 9:44 AM · Restricted Project, Restricted Project
ABataev committed rGb2df99cd9501: [BUNDLER]Improve the test, NFC. (authored by ABataev).
[BUNDLER]Improve the test, NFC.
Thu, Aug 15, 9:31 AM
ABataev committed rL369015: [BUNDLER]Improve the test, NFC..
[BUNDLER]Improve the test, NFC.
Thu, Aug 15, 9:31 AM
ABataev closed D66296: [BUNDLER]Improve the test, NFC..
Thu, Aug 15, 9:30 AM · Restricted Project, Restricted Project
ABataev added a comment to D66095: [InstCombine] canonicalize a scalar-select-of-vectors to vector select.

What about very long vectors that do no fit into single register? Is it cost effective for such vectors too?

We would split the long vectors into values that fit the target registers in the backend. At that point, the target can decide if N vector selects are better or worse than a transfer to scalar compare and branch. As @lebedev.ri mentioned, we're missing that logic in SDAG, but given that this transform produces the better code for the default case, I don't think we need to make this patch dependent on backend fixups.

But it adds extra cost for vectors splitting. Maybe limit the size of the vectors in the patch?

How would we do that? This is canonicalization, so we are not using any target-dependent information here. Presumably, whoever or whatever created the illegal vectors in the first place knows that codegen will have to alter the those ops to create legal code, so that will be handled by a pass that has a cost model.

Thu, Aug 15, 8:22 AM · Restricted Project
ABataev created D66296: [BUNDLER]Improve the test, NFC..
Thu, Aug 15, 8:18 AM · Restricted Project, Restricted Project
ABataev added a comment to D66095: [InstCombine] canonicalize a scalar-select-of-vectors to vector select.

What about very long vectors that do no fit into single register? Is it cost effective for such vectors too?

We would split the long vectors into values that fit the target registers in the backend. At that point, the target can decide if N vector selects are better or worse than a transfer to scalar compare and branch. As @lebedev.ri mentioned, we're missing that logic in SDAG, but given that this transform produces the better code for the default case, I don't think we need to make this patch dependent on backend fixups.

Thu, Aug 15, 8:09 AM · Restricted Project
ABataev updated the diff for D65819: [Driver][Bundler] Improve bundling of object files..

Fixed comments.

Thu, Aug 15, 7:43 AM · Restricted Project, Restricted Project
ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

The code changes look good to me, but the test doesn't pass on x86. We've faced the same problem when clang-offload-bundler was initially committed and the current testing is the best we were able to do.

Thu, Aug 15, 7:43 AM · Restricted Project, Restricted Project
ABataev added inline comments to D62432: [SLPVectorizer] Make the scheduler aware of the TreeEntry operands..
Thu, Aug 15, 6:38 AM · Restricted Project

Wed, Aug 14

ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

Why there are the changes from the another patch?

Wed, Aug 14, 6:00 PM · Restricted Project
ABataev added a comment to D66247: [OpenMP] Fix target map for unused variables.

Yes, just realized that, defaultmap does not affect explicit firstprivates. Then just check map(a) firstprivate(a) for int128 type. Check that it still has tofrom mapping. If so, then probably we already can handle such combinations correctly in the codegen. Also, test it for other mapping types, like alloc, from, to.

All work as expected.

Wed, Aug 14, 3:09 PM · Restricted Project
ABataev added a comment to D66095: [InstCombine] canonicalize a scalar-select-of-vectors to vector select.

What about very long vectors that do no fit into single register? Is it cost effective for such vectors too?

Wed, Aug 14, 2:58 PM · Restricted Project
ABataev added a comment to D66247: [OpenMP] Fix target map for unused variables.

Try map(a) firstprivate(a) defaultmap(scalar:tofrom), where a is int, for example. The variable must be mapped as tofrom in this case but, most probably, will be mapped as to.

Map type is 35=0x23, which has tofrom.

Wed, Aug 14, 2:50 PM · Restricted Project
ABataev added a comment to D66247: [OpenMP] Fix target map for unused variables.

Do we really need to map such variables? According to standard, "The map clause specifies how an original list item is mapped from the current task’s data environment to a corresponding list item in the device data environment of the device identified by the construct.", i.e. it does not map the variable, but "specifies" how to map it.

What you're asking for now contradicts what you asked for at https://reviews.llvm.org/D65835#1624659 when I asked about the same example of target map(a) enclosing teams private(a).

Seems to me I did not expressed my idea absolutely correctly. I meant that if the variable is really mapped, then the maptype must be generated in all cases. If we do not need to map the variable, just like in these case, there definitely should not be maptype.

Then I still need an example where D65835 doesn't behave correctly.

Wed, Aug 14, 2:26 PM · Restricted Project
ABataev added a comment to D66247: [OpenMP] Fix target map for unused variables.

Do we really need to map such variables? According to standard, "The map clause specifies how an original list item is mapped from the current task’s data environment to a corresponding list item in the device data environment of the device identified by the construct.", i.e. it does not map the variable, but "specifies" how to map it.

What you're asking for now contradicts what you asked for at https://reviews.llvm.org/D65835#1624659 when I asked about the same example of target map(a) enclosing teams private(a).

Wed, Aug 14, 2:05 PM · Restricted Project
ABataev added a comment to D66247: [OpenMP] Fix target map for unused variables.

Do we really need to map such variables? According to standard, "The map clause specifies how an original list item is mapped from the current task’s data environment to a corresponding list item in the device data environment of the device identified by the construct.", i.e. it does not map the variable, but "specifies" how to map it.

Wed, Aug 14, 1:45 PM · Restricted Project
ABataev updated the diff for D65819: [Driver][Bundler] Improve bundling of object files..

Reworked to keep partial linking to make original host object available for analysis without unbundling.

Wed, Aug 14, 1:09 PM · Restricted Project, Restricted Project
ABataev committed rGf8be476f0cde: [OPENMP]Support for non-rectangular loops. (authored by ABataev).
[OPENMP]Support for non-rectangular loops.
Wed, Aug 14, 12:30 PM
ABataev committed rL368903: [OPENMP]Support for non-rectangular loops..
[OPENMP]Support for non-rectangular loops.
Wed, Aug 14, 12:29 PM
ABataev added inline comments to D64700: [SLPVectorizer] [NFC] Avoid repetitive calls to getSameOpcode()..
Wed, Aug 14, 6:24 AM

Tue, Aug 13

ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Can you please again share what exactly is the problem, with a small example? I saw discussions on openmp-dev, but that project was huge, and above you were quoting a man page and hinted to global constructors.

I don't have a small reproducer, unfortunately, only the big one.
Here is the message from the user:

Hi,
I am revisiting this possible compiler bug in Clang 8.0.0.

In the source code I am developing, there's a global static variable, nest::sli_neuron::recordablesMap_ put in the BSS section and it is expected to be fully initialized by the time nest::sli_neuron::sli_neuron() gets called, however in a gdb session:

(gdb) p nest::sli_neuron::recordablesMap_
Python Exception <type 'exceptions.AttributeError'> 'gdb.Type' object has no attribute 'name':
$1 = {<std::map<Name, double (nest::sli_neuron::*)() const, std::less<Name>, std::allocator<std::pair<Name const, double (nest::sli_neuron::*)() const> > >> = std::map with 0 elements, _vptr$RecordablesMap = 0x0}

this doesn't happen when -fopenmp-targets is _not_ used, is it not trivial to come up a reproducer, thus I am sending a work-in-progress report hoping someone will shed some light on this.

Thanks,
Itaru.

Another one:

Hi,
I am seeing a link error shown below:

`.text.startup' referenced in section `.init_array.0' of /tmp/event_delivery_manager-02f392.o: defined in discarded section `.text.startup[_ZN4nest18DataSecondaryEventIdNS_16GapJunctionEventEE18supported_syn_ids_E]' of /tmp/event_delivery_manager-02f392.o
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)

I am not sure how to tackle this as the part is referenced isn't what I am
working on. I am using the latest trunk Clang 10 at this moment.

Steps to reproduce:

Steps to produce:
$ git clone https://https://github.com/nest/nest-simulator/
$ mkdir /tmp/build/nest-clang-offload/
$ cd /tmp/build/nest-clang-offload/
$ cmake -DCMAKE_TOOLCHAIN_FILE=Platform/JURON_Clang -DCMAKE_INSTALL_PREFIX=/path/to/opt/nest-clang -Dwith-gsl=ON -Dwith-openmp=ON -Dwith-mpi=OFF -Dwith-python=OFF -Dwith-offload=ON /path/to/nest-simulator/

I've seen these reports. What has eventually led to this patch, ie where do constructors come into play?

Tue, Aug 13, 12:41 PM · Restricted Project, Restricted Project
ABataev committed rG4a0328c92a81: Don't use std::errc (authored by ABataev).
Don't use std::errc
Tue, Aug 13, 12:33 PM
ABataev committed rL368739: Don't use std::errc.
Don't use std::errc
Tue, Aug 13, 12:31 PM
ABataev closed D66143: Don't use std::errc.
Tue, Aug 13, 12:31 PM · Restricted Project, Restricted Project
ABataev accepted D65836: Factor architecture dependent code out of loop.cu.

What about namong convention here? Shall we use capital letters for the var names or it is fine as is?

Long term it's probably worth moving to the LLVM conventions throughout. Short term, I'd rather keep roughly the same style as the surrounding code. I believe that's what this patch does.

Also, did you come to an agreement about design, directory layout etc.?

Overall agreement on the final design is a work in progress. I believe there is agreement that we'll need at least one target specific header, so this patch follows the suggestion in D64217 and calls it target_impl.

At the moment, nvptx/src is the most sensible directory for it (as that's the only directory!). I think this is an uncontentious step in the right direction.

Tue, Aug 13, 12:06 PM · Restricted Project, Restricted Project
ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Can you please again share what exactly is the problem, with a small example? I saw discussions on openmp-dev, but that project was huge, and above you were quoting a man page and hinted to global constructors.

Tue, Aug 13, 12:02 PM · Restricted Project, Restricted Project
ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Tue, Aug 13, 11:42 AM · Restricted Project, Restricted Project
ABataev added a comment to D65836: Factor architecture dependent code out of loop.cu.

Naming convention

Tue, Aug 13, 10:13 AM · Restricted Project, Restricted Project
ABataev added a comment to D65836: Factor architecture dependent code out of loop.cu.

What about namong convention here? Shall we use capital letters for the var names or it is fine as is? Also, did you come to an agreement about design, directory layout etc.?

Tue, Aug 13, 10:13 AM · Restricted Project, Restricted Project
ABataev added a comment to D66143: Don't use std::errc.

As far as I know 4.8 is no longer supported.

That's Trusty and Trusty is on 4.8 which we no longer support, right?

It is still supported with some extra cmake flags.

It used to be, but that's no longer true. r366333 added an unconditional use of <regex> which doesn't exist in 4.8.

Tue, Aug 13, 8:46 AM · Restricted Project, Restricted Project
ABataev added a comment to D66143: Don't use std::errc.

Also, there is a clang unit test that fails because of the same reason. Can I include the changes in this test into this patch too or better to make a separate patch? The test is unittests/Basic/FileManagerTest.cpp

Tue, Aug 13, 8:26 AM · Restricted Project, Restricted Project
ABataev added a comment to D66143: Don't use std::errc.

That's Trusty and Trusty is on 4.8 which we no longer support, right?

Tue, Aug 13, 8:23 AM · Restricted Project, Restricted Project
ABataev added a comment to D66143: Don't use std::errc.

std::error_code works correctly, it is std::make_error_code leads to something strange in some cases.

Tue, Aug 13, 8:21 AM · Restricted Project, Restricted Project
ABataev added a comment to D66143: Don't use std::errc.

This isn't in a hot codepath, so the virtual calls aren't super important.

This not working in some libstdc++s is important of course, but llvm's Errc.h claims "std::error_code works OK on all platforms we use". I guess that's not true? Can you include more details on which platforms this is broken?

Tue, Aug 13, 8:17 AM · Restricted Project, Restricted Project
ABataev created D66143: Don't use std::errc.
Tue, Aug 13, 8:06 AM · Restricted Project, Restricted Project
ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Tue, Aug 13, 7:07 AM · Restricted Project, Restricted Project
ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Ping.

Tue, Aug 13, 6:50 AM · Restricted Project, Restricted Project
ABataev added a comment to D65013: [OPENMP][NVPTX]Fix parallel level counter in Cuda 9.0..

Ping

Tue, Aug 13, 6:47 AM · Restricted Project

Sun, Aug 11

ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

Target teams private map will produce extra private in target context, other constructs either.

Here's what I tried:

int a;
#pragma omp target teams private(a) map(a)
  ;

The same code is generated as for:

int a;
#pragma omp target map(a)
#pragma omp teams private(a)
  ;

I haven't debugged it yet, but it seems orthogonal to whether you have a combined directive, which is what this patch is about.

Did you check the mapping flags, generated during host codegen? They must be the same. With private clause it may generate just map(alloc) instead of map(tofrom).

I diffed the .ll files for combined vs. separate constructs. The only difference is the file ID. @.offload_maptypes isn't generated in either (but it is if I replace private with firstprivate).

Sun, Aug 11, 2:47 PM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

Target teams private map will produce extra private in target context, other constructs either.

Here's what I tried:

int a;
#pragma omp target teams private(a) map(a)
  ;

The same code is generated as for:

int a;
#pragma omp target map(a)
#pragma omp teams private(a)
  ;

I haven't debugged it yet, but it seems orthogonal to whether you have a combined directive, which is what this patch is about.

Sun, Aug 11, 11:16 AM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

This is wrong. It affects all possible combinations and not only fof scalar types, all of them are affected.

Are you saying the patch isn't sufficient because other types need to be fixed too? Can you give an example?

Sun, Aug 11, 9:33 AM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

Try something like target parallel firstprivate (a) map(a). Currently it will create a firstprivate copy of the variable a in target context thought it is not required at all. It may lead to increased register pressure and performance degradation.

Thanks. The only combination that appears to be affected is firstprivate and map for scalar types. I had only tried arrays and structs earlier, but they're not affected. If I had looked a little more closely at the test case this patch already introduced, I would have noticed that int is affected. The problematic analysis is in sema, where there was an apparent assumption that firstprivate wouldn't appear with map due to the previous restriction. This update fixes that.

For my previous update, I meant to point out that I introduced a fixme into the test suite. See the phabricator summary for details.

Sun, Aug 11, 8:53 AM · Restricted Project

Sat, Aug 10

ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I made the following changes, as suggested:

  • Add back restriction for OpenMP < 5.0.
  • Remove is_device_ptr changes.

    Alexey, you said:

Plus, these changes are not enough to support this new feature from OpenMP 5.0. There definitely must be some changes in the codegen. If the variable is mapped in the target construct, we should not generate a code for the private clause of this variable on the target construct, since, in this case, private clauses are applied for the inner subdirectives, which are the part of the combined directive, but not the target part of the directive.

I haven't spent enough time exploring codegen here in the past, so I'm starting out by investigating the codegen output for various examples. I'm looking for differences between a combined target teams and an equivalent separate target and teams. So far, I see nothing but superficial differences. Do you have an example where there's an undesirable difference?

Thanks.

Sat, Aug 10, 3:44 PM · Restricted Project

Fri, Aug 9

ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

I think, this combination is still allowed. I assume that

#Pragma omp target parallel is_device_ptr(a) <dsa_clause>(a)

is the same as

#pragma omp target is_device_ptr(a)
#pragma omp parallel <dsa_clause>(a)

i.e. datasharing clauses are applied to inner sub-regions, not the target region itself.

With the parallel, that makes sense to me. In that case, however, I'd imagine that the privitization works as normal and the code wouldn't crash. Right?

Fri, Aug 9, 4:55 PM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

Fri, Aug 9, 4:10 PM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Clang currently doesn't permit is_device_ptr with firstprivate either, but I'm not aware of any reason to diagnose that. Is there? And then there are privatization clauses like linear and reduction....

Fri, Aug 9, 3:49 PM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?
Fri, Aug 9, 3:09 PM · Restricted Project

Thu, Aug 8

ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Thu, Aug 8, 3:50 PM · Restricted Project
ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Additional note. Seems to me, it has something to do with the partial linking. According to ld documentation, it is recommended to use -Ur option for partial linking of the C++ object files to resolve global constructors.

-Ur
For anything other than C++ programs, this option is equivalent to '-r': it generates relocatable output--i.e., an output file that can in turn serve as input to ld. When linking C++ programs, `-Ur' does resolve references to constructors, unlike '-r'. It does not work to use '-Ur' on files that were themselves linked with `-Ur'; once the constructor table has been built, it cannot be added to. Use `-Ur' only for the last partial link, and '-r' for the others.

The problem I saw is exactly connected with the global constructors, which are not called after partial linking.
Seems to me, we partially link objects files for C++ with -Ur option but we cannot say if this the last time we perform partial linking or not (we may try to bundle/unbundle objects several times, especially when we'll try to support linking with libraries). Better not to use partial linking in bundler.

Thu, Aug 8, 1:48 PM · Restricted Project, Restricted Project
ABataev added inline comments to D64375: [OpenMP][Docs] Provide implementation status details.
Thu, Aug 8, 10:39 AM · Restricted Project
ABataev added inline comments to D64700: [SLPVectorizer] [NFC] Avoid repetitive calls to getSameOpcode()..
Thu, Aug 8, 9:39 AM
ABataev added a comment to D65836: Factor architecture dependent code out of loop.cu.

@ABataev, others, any concerns? If not, let's go ahead with this.

Did you come to an agreement about the design of the new universal library? I would suggest starting with this.

We have a proposal and no major complaints, I count that as agreement.

Thu, Aug 8, 9:04 AM · Restricted Project, Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

I'd like to understand this better. Is there something from the spec we can quote in the code?

See 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives

I looked again. I'm still not finding any text in that section that implies is_device_ptr follows the same restrictions as map. Can you please cite specific lines of text instead of an entire section? Thanks for your help.

Thu, Aug 8, 8:23 AM · Restricted Project
ABataev added a comment to D65836: Factor architecture dependent code out of loop.cu.

@ABataev, others, any concerns? If not, let's go ahead with this.

Thu, Aug 8, 7:23 AM · Restricted Project, Restricted Project
ABataev added inline comments to D64142: [SLP] try to create vector loads from bitcasted scalar pointers.
Thu, Aug 8, 7:12 AM · Restricted Project
ABataev committed rG195ae90307bd: [OPENMP]Add support for analysis of linear variables and step. (authored by ABataev).
[OPENMP]Add support for analysis of linear variables and step.
Thu, Aug 8, 6:43 AM
ABataev committed rL368295: [OPENMP]Add support for analysis of linear variables and step..
[OPENMP]Add support for analysis of linear variables and step.
Thu, Aug 8, 6:43 AM
ABataev closed D65461: [OPENMP]Add support for analysis of linear variables and step..
Thu, Aug 8, 6:43 AM · Restricted Project, Restricted Project
ABataev added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Thu, Aug 8, 5:22 AM · Restricted Project
ABataev added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Thu, Aug 8, 2:38 AM · Restricted Project

Wed, Aug 7

ABataev added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Wed, Aug 7, 2:35 PM · Restricted Project
ABataev added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Wed, Aug 7, 1:56 PM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

I'd like to understand this better. Is there something from the spec we can quote in the code?

Wed, Aug 7, 12:15 PM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I should have reported that the current implementation isn't complete for OpenMP 4.5. For example, on target teams, reduction(+:x) map(x) is an error but not map(x) reduction(+:x). So there are bugs to fix, and maybe this will evolve into multiple patches, but I want to be sure I'm on the right path first.

Wed, Aug 7, 12:02 PM · Restricted Project
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

Wed, Aug 7, 11:59 AM · Restricted Project
ABataev added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Wed, Aug 7, 11:45 AM · Restricted Project
ABataev added inline comments to D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script.
Wed, Aug 7, 11:41 AM
ABataev updated the diff for D65461: [OPENMP]Add support for analysis of linear variables and step..

Rebase

Wed, Aug 7, 8:09 AM · Restricted Project, Restricted Project
ABataev accepted D65876: Use forceinline. Necessary for nvcc to inline small functions within the bitcode library.

LG

Wed, Aug 7, 8:01 AM · Restricted Project, Restricted Project
ABataev added inline comments to D65836: Factor architecture dependent code out of loop.cu.
Wed, Aug 7, 7:42 AM · Restricted Project, Restricted Project
ABataev committed rGa06155ddc4ed: [OPENMP]Set default version to OpenMP 4.5. (authored by ABataev).
[OPENMP]Set default version to OpenMP 4.5.
Wed, Aug 7, 7:40 AM
ABataev committed rL368172: [OPENMP]Set default version to OpenMP 4.5..
[OPENMP]Set default version to OpenMP 4.5.
Wed, Aug 7, 7:38 AM
ABataev added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

is_device_ptr can be considered as a kind of mapping clause (see 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives), so, I assume, clang correct here in terms of OpenMP 4.5.
Thus, I would not call this a "fix", this is just a new feature from OpenMP 5.0.
Plus, these changes are not enough to support this new feature from OpenMP 5.0. There definitely must be some changes in the codegen. If the variable is mapped in the target construct, we should not generate a code for the private clause of this variable on the target construct, since, in this case, private clauses are applied for the inner subdirectives, which are the part of the combined directive, but not the target part of the directive.

Wed, Aug 7, 7:09 AM · Restricted Project
ABataev committed rG2e3a07fcb87e: [OPENMP]Add standard macro value _OPENMP for OpenMP 5.0. (authored by ABataev).
[OPENMP]Add standard macro value _OPENMP for OpenMP 5.0.
Wed, Aug 7, 7:04 AM
ABataev committed rL368170: [OPENMP]Add standard macro value _OPENMP for OpenMP 5.0..
[OPENMP]Add standard macro value _OPENMP for OpenMP 5.0.
Wed, Aug 7, 7:01 AM
ABataev added a comment to D65836: Factor architecture dependent code out of loop.cu.

I would suggest at first to come to an agreement on the design of this reworked library at first.

Wed, Aug 7, 6:55 AM · Restricted Project, Restricted Project

Tue, Aug 6

ABataev added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Can you detail what "incorrect linking" means? AFAIK the additional sections were just bloating the executable, but how do they affect correctness?

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Tue, Aug 6, 2:33 PM · Restricted Project, Restricted Project
ABataev updated the diff for D65013: [OPENMP][NVPTX]Fix parallel level counter in Cuda 9.0..

Rebase.

Tue, Aug 6, 11:57 AM · Restricted Project
ABataev added inline comments to D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script.
Tue, Aug 6, 11:37 AM
ABataev created D65819: [Driver][Bundler] Improve bundling of object files..
Tue, Aug 6, 11:34 AM · Restricted Project, Restricted Project
ABataev committed rGc10180ed8ed9: [OPENMP][OFFLOADING]Fix the test, NFC. (authored by ABataev).
[OPENMP][OFFLOADING]Fix the test, NFC.
Tue, Aug 6, 11:14 AM
ABataev committed rL368068: [OPENMP][OFFLOADING]Fix the test, NFC..
[OPENMP][OFFLOADING]Fix the test, NFC.
Tue, Aug 6, 11:13 AM
ABataev added inline comments to D64217: [OpenMP][NFCI] Cleanup the target state queue implementation.
Tue, Aug 6, 9:16 AM · Restricted Project
ABataev added a comment to D64218: [OpenMP][NFCI] Cleanup the target synchronization implementation.

@ABataev I am confused by the comments you make. Could you describe a situation in which we do not execute the same code before and after this patch? Also, the O0/03 inline & constant propagation comment is not clear to me, what is the issue there?

I was confused by the names of the template arguments.

I'm happy to rename them if we want to go ahead with this __kmpc_impl_barrier (potentially without the rest in here). Name suggestions are always welcome.

Tue, Aug 6, 8:22 AM · Restricted Project
ABataev added a comment to D64218: [OpenMP][NFCI] Cleanup the target synchronization implementation.

@ABataev I am confused by the comments you make. Could you describe a situation in which we do not execute the same code before and after this patch? Also, the O0/03 inline & constant propagation comment is not clear to me, what is the issue there?

Tue, Aug 6, 7:57 AM · Restricted Project