
estewart08 (Ethan Stewart)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 16 2019, 12:59 PM (218 w, 2 d)

Recent Activity

Apr 7 2023

estewart08 added a comment to D147756: [Libomptarget] Load an image if it is compatible with at least one device.

What we currently have conservatively determines that a plugin is compatible with the image if all devices managed by the plugin are compatible. This is because, when we initialize the device vector, we use a one-for-all style: if a plugin is compatible, we assume we can use all devices managed by the plugin. This patch will break that assumption.
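
For illustration only, here is a minimal sketch of the two policies being contrasted above. The image_t type, num_devices, and is_device_compatible() are hypothetical stand-ins, not the libomptarget plugin API:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the plugin interface; not libomptarget code. */
typedef struct { const void *start; size_t size; } image_t;
extern int num_devices;
extern bool is_device_compatible(int device, const image_t *img);

/* Current (conservative) behavior: the image is usable only if every
   device managed by the plugin accepts it. */
bool plugin_compatible_all(const image_t *img) {
  for (int d = 0; d < num_devices; ++d)
    if (!is_device_compatible(d, img))
      return false;
  return true;
}

/* Behavior discussed in D147756: one compatible device is enough. */
bool plugin_compatible_any(const image_t *img) {
  for (int d = 0; d < num_devices; ++d)
    if (is_device_compatible(d, img))
      return true;
  return false;
}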

Apr 7 2023, 8:19 AM · Restricted Project, Restricted Project

Feb 21 2023

Herald added a reviewer for D74144: [OPENMP50]Add basic support for array-shaping operation.: NoQ.

@ABataev, the test below is extracted from the Sollve test suite, and Clang now emits:

test.c:17:35: error: subscripted value is not an array or pointer
    #pragma omp target update to( (([N][N])foo)[1:M] )
                                  ^~~~~~~~~~~~~
test.c:17:5: error: expected at least one 'to' clause or 'from' clause specified to '#pragma omp target update'
    #pragma omp target update to( (([N][N])foo)[1:M] )

This error message comes from ActOnOMPArraySectionExpr, which is called inside ParsePostfixExpressionSuffix. The issue is that the base expression seen by ActOnOMPArraySectionExpr looks like:

ParenExpr 0x122859be0 '<OpenMP array shaping type>' lvalue
`-OMPArrayShapingExpr 0x122859b98 '<OpenMP array shaping type>' lvalue
  |-IntegerLiteral 0x122859b38 'int' 5
  |-IntegerLiteral 0x122859b58 'int' 5
  `-DeclRefExpr 0x122859b78 'int *' lvalue Var 0x1228599d0 'foo' 'int *'

which is not a base that we would expect in an array section expression. I've tried relaxing the base type check in ActOnOMPArraySectionExpr, but I'm not sure that's the way to go. (Or should I just extract the DeclRefExpr from the OMPArrayShapingExpr before calling ActOnOMPArraySectionExpr?)

#define N 5
#define M 3

int main(void) {
    int tmp[N][N];
    for(int i=0; i<N; i++)
        for(int j=0; j<N; j++)
            tmp[i][j] = N*i + j;

    int *foo = &tmp[0][0];

    // This compiles just fine
    //#pragma omp target update to( ([N][N])foo )

    // This is rejected by the compiler
    #pragma omp target update to( (([N][N])foo)[1:M] )
}

I don't think it is allowed by the standard.

According to the standard, the shape-operator can appear only in clauses where it is explicitly allowed.
In this case, array shaping is used as the base expression of an array section (or subscript) expression, which does not meet the standard. The array shaping operation is not used in a clause; instead, it is used as a base subexpression of another expression.

In OpenMP 5.0 [2.12.6, target update construct, Restrictions, C/C++, p.1]: "The list items that appear in the to or from clauses may use shape-operators."
Also, the array shaping section in https://github.com/OpenMP/Examples illustrates the same usage:

...
S-17 // update boundary points (two columns of 2D array) on the host
S-18 // pointer is shaped to 2D array using the shape-operator
S-19 #pragma omp target update from( (([nx][ny+2])a)[0:nx][1], (([nx][ny+2])a)[0:nx][ny] )
...

Then we just need to fix the examples document, if it has this example.

Feb 21 2023, 2:26 PM · Restricted Project, Restricted Project
Herald added a project to D74970: [OpenMP] Refactor the analysis in checkMapClauseBaseExpression using StmtVisitor class.: Restricted Project.
Feb 21 2023, 1:29 PM · Restricted Project, Restricted Project, Restricted Project

Jan 24 2023

estewart08 added a comment to D142378: [OpenMP][Archer] Use dlsym rather than weak symbols for TSan annotations.

Also broke the AMD OpenMP buildbot:
https://lab.llvm.org/buildbot/#/builders/193/builds/25420

Jan 24 2023, 9:56 AM · Restricted Project, Restricted Project

Nov 2 2022

estewart08 committed rG85c2d92b9b32: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation. (authored by estewart08).
[openmp][AMDGPU] - Correct getNumberOfBlocks calculation.
Nov 2 2022, 9:39 AM · Restricted Project, Restricted Project
estewart08 closed D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation..
Nov 2 2022, 9:39 AM · Restricted Project, Restricted Project
estewart08 updated the summary of D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation..
Nov 2 2022, 9:32 AM · Restricted Project, Restricted Project
estewart08 added a reviewer for D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation.: jhuber6.
Nov 2 2022, 9:11 AM · Restricted Project, Restricted Project
estewart08 updated the summary of D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation..
Nov 2 2022, 8:53 AM · Restricted Project, Restricted Project
estewart08 updated the summary of D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation..
Nov 2 2022, 8:53 AM · Restricted Project, Restricted Project
estewart08 updated the diff for D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation..

Updated summary.

Nov 2 2022, 8:51 AM · Restricted Project, Restricted Project
estewart08 updated the diff for D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation..

Adding clang-format to PATH.

Nov 2 2022, 8:50 AM · Restricted Project, Restricted Project
estewart08 requested review of D137261: [openmp][AMDGPU] - Correct getNumberOfBlocks calculation..
Nov 2 2022, 8:47 AM · Restricted Project, Restricted Project
estewart08 added inline comments to D135444: [OpenMP] Utilize the "non-uniform-workgroup" to simplify DeviceRTL.
Nov 2 2022, 7:45 AM · Restricted Project, Restricted Project
estewart08 added a comment to D135444: [OpenMP] Utilize the "non-uniform-workgroup" to simplify DeviceRTL.

This seems to have broken the amdgpu buildbot.
https://lab.llvm.org/buildbot/#/builders/193/builds/21152

Nov 2 2022, 6:47 AM · Restricted Project, Restricted Project

Dec 16 2021

estewart08 added a comment to D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm.

@estewart08, thoughts on a good CMake variable to allow users to define the equivalent of /opt/rocm, rather than using an environment variable inside the CMake file?

Dec 16 2021, 10:55 AM · Restricted Project, Restricted Project, Restricted Project

Dec 15 2021

estewart08 accepted D115813: [Debuginfod] Fix debuginfod unit test when $HOME is not writable..

LGTM

Dec 15 2021, 2:00 PM · Restricted Project
estewart08 added inline comments to D112758: [llvm] [Debuginfo] Debuginfod client library..
Dec 15 2021, 1:52 PM · Restricted Project
estewart08 added inline comments to D112758: [llvm] [Debuginfo] Debuginfod client library..
Dec 15 2021, 10:32 AM · Restricted Project

Dec 14 2021

estewart08 added a comment to D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge that is largely redundant with other mechanisms for indicating constants.

Dec 14 2021, 1:53 PM · Restricted Project
estewart08 added a comment to D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..

This may cause perf regressions for HIP.

Dec 14 2021, 11:45 AM · Restricted Project

Dec 13 2021

estewart08 committed rGd1327f8a574a: [clang][amdgpu] - Choose when to promote VarDecl to address space 4. (authored by estewart08).
[clang][amdgpu] - Choose when to promote VarDecl to address space 4.
Dec 13 2021, 2:30 PM
estewart08 closed D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..
Dec 13 2021, 2:30 PM · Restricted Project
estewart08 added a reviewer for D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.: yaxunl.
Dec 13 2021, 12:12 PM · Restricted Project
estewart08 updated the diff for D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..

Resubmit patch with lint.

Dec 13 2021, 12:11 PM · Restricted Project
estewart08 requested review of D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..
Dec 13 2021, 12:08 PM · Restricted Project

Nov 22 2021

estewart08 accepted D114274: [openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments.

LGTM, passed CI testing.

Nov 22 2021, 2:41 PM · Restricted Project

Jul 29 2021

estewart08 added a comment to D104904: [OpenMP][AMDGCN] Initial math headers support.

How do we get this moving?

Jul 29 2021, 9:37 AM · Restricted Project

Jul 21 2021

estewart08 added inline comments to D104904: [OpenMP][AMDGCN] Initial math headers support.
Jul 21 2021, 7:01 AM · Restricted Project

Jul 9 2021

estewart08 added inline comments to D104904: [OpenMP][AMDGCN] Initial math headers support.
Jul 9 2021, 12:10 PM · Restricted Project

Jun 23 2021

estewart08 added inline comments to D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn..
Jun 23 2021, 7:05 AM · Restricted Project, Restricted Project

Jun 22 2021

estewart08 updated the diff for D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn..
Add test_isnan function to hip-header.hip.
Jun 22 2021, 1:47 PM · Restricted Project, Restricted Project

Jun 21 2021

estewart08 retitled D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn. from [OpenMP][AMDGCN] Apply fix for isnan, isinf and isinfinite for amdgcn. to [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn..
Jun 21 2021, 7:18 PM · Restricted Project, Restricted Project
estewart08 retitled D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn. from [OpenMP] Apply fix for isnan, isinf and isinfinite for amdgcn. to [OpenMP][AMDGCN] Apply fix for isnan, isinf and isinfinite for amdgcn..
Jun 21 2021, 7:09 PM · Restricted Project, Restricted Project
estewart08 updated the diff for D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn..

Attempt to use clang-format.

Jun 21 2021, 7:07 PM · Restricted Project, Restricted Project
estewart08 added reviewers for D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.: ronlieb, JonChesterfield, yaxunl, scchan, ashi1.
Jun 21 2021, 6:25 PM · Restricted Project, Restricted Project
estewart08 requested review of D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn..
Jun 21 2021, 6:22 PM · Restricted Project, Restricted Project

May 7 2021

estewart08 accepted D101911: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

LGTM as a temporary workaround until SPMD properly assigns team private variables.

May 7 2021, 5:55 PM · Restricted Project
estewart08 added a comment to D101911: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

Hi Ethan, try this patch and see if it fixes the issue.

May 7 2021, 5:53 PM · Restricted Project

May 5 2021

estewart08 added a comment to D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.

Did you try the reproducer with the applied patch?

Yes, I still saw the test fail, although it was not with latest llvm-project. Are you saying the reproducer passes for you?

I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?

This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

It produces incorrect/incomplete results in the dist[0] index after a manual reduction and, in turn, the final global gpu_results array is incorrect.
When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] having been updated by thread 1, which tells me the array is still thread-private.
Adding some printfs and looking at one team's output:

SPMD

Thread 0: dist[0]: 1
Thread 0: dist[1]: 0  // This should be 1
After reduction into dist[0]: 1  // This should be 2
gpu_results = [1,1]  // [2,2] expected

Generic Mode:

Thread 0: dist[0]: 1
Thread 0: dist[1]: 1   
After reduction into dist[0]: 2
gpu_results = [2,2]

Hmm, I would expect a crash if the array was allocated in the local memory. Could you try to add some more printfs (with data and addresses of the array) to check the results? Maybe there is a data race somewhere in the code?

As a reminder, each thread updates a unique index in the dist array and each team updates a unique index in gpu_results.

SPMD - shows each thread has a unique address for dist array

Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc

Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4

Team 0 Thread 0: After reduction into dist[0]: 1
Team 0 Thread 0: gpu_results address: 0x7f92a5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c

Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184

Team 1 Thread 0: After reduction into dist[0]: 1
Team 1 Thread 0: gpu_results address: 0x7f92a5000000

gpu_results[0]: 1
gpu_results[1]: 1

Generic - shows each team shares dist array address amongst threads

Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
Team 0 Thread 1: dist[1]: 1, 0x7fac01938884

Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
Team 0 Thread 0: dist[1]: 1, 0x7fac01938884

Team 0 Thread 0: After reduction into dist[0]: 2
Team 0 Thread 0: gpu_results address: 0x7fabc5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14

Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14

Team 1 Thread 0: After reduction into dist[0]: 2
Team 1 Thread 0: gpu_results address: 0x7fabc5000000

Could you check if it works with -fno-openmp-cuda-parallel-target-regions option?

Unfortunately that crashes:
llvm-project/llvm/lib/IR/Instructions.cpp:495: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.

Hmm, could you provide a full stack trace?

At this point I am not sure I want to dig into that crash as our llvm-branch is not caught up to trunk.

I did build trunk and ran some tests on an sm_70:
-Without this patch: code fails with incomplete results
-Without this patch and with -fno-openmp-cuda-parallel-target-regions: code fails with incomplete results

-With this patch: code fails with incomplete results (thread private array)
Team 0 Thread 1: dist[0]: 0, 0x7c1e800000a8
Team 0 Thread 1: dist[1]: 1, 0x7c1e800000ac

Team 0 Thread 0: dist[0]: 1, 0x7c1e800000a0
Team 0 Thread 0: dist[1]: 0, 0x7c1e800000a4

Team 0 Thread 0: After reduction into dist[0]: 1
Team 0 Thread 0: gpu_results address: 0x7c1ebc800000

Team 1 Thread 1: dist[0]: 0, 0x7c1e816f27c8
Team 1 Thread 1: dist[1]: 1, 0x7c1e816f27cc

Team 1 Thread 0: dist[0]: 1, 0x7c1e816f27c0
Team 1 Thread 0: dist[1]: 0, 0x7c1e816f27c4

Team 1 Thread 0: After reduction into dist[0]: 1
Team 1 Thread 0: gpu_results address: 0x7c1ebc800000

gpu_results[0]: 1
gpu_results[1]: 1
FAIL

-With this patch and with -fno-openmp-cuda-parallel-target-regions: Pass
Team 0 Thread 1: dist[0]: 1, 0x7a5b56000018
Team 0 Thread 1: dist[1]: 1, 0x7a5b5600001c

Team 0 Thread 0: dist[0]: 1, 0x7a5b56000018
Team 0 Thread 0: dist[1]: 1, 0x7a5b5600001c

Team 0 Thread 0: After reduction into dist[0]: 2
Team 0 Thread 0: gpu_results address: 0x7a5afc800000

Team 1 Thread 1: dist[0]: 1, 0x7a5b56000018
Team 1 Thread 1: dist[1]: 1, 0x7a5b5600001c

Team 1 Thread 0: dist[0]: 1, 0x7a5b56000018
Team 1 Thread 0: dist[1]: 1, 0x7a5b5600001c

Team 1 Thread 0: After reduction into dist[0]: 2
Team 1 Thread 0: gpu_results address: 0x7a5afc800000

gpu_results[0]: 2
gpu_results[1]: 2
PASS

I am concerned about team 0 and team 1 having the same address for the dist array here.

This is caused by a problem with the runtime. It should work with the -fno-openmp-cuda-parallel-target-regions option (I think), since it uses a different runtime function for this case, and I just want to check that it really works. It looks like the runtime currently allocates a unique array for each thread.
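
For reference, a minimal reproducer along the lines described in this thread could look like the sketch below. It is reconstructed from the printouts above (the dist and gpu_results names follow them); the team/thread counts and the reduction shape are assumptions, not the original test from PR48851.

#include <omp.h>
#include <stdio.h>

#define NUM_TEAMS 2
#define NUM_THREADS 2

int main(void) {
  int gpu_results[NUM_TEAMS] = {0, 0};

  #pragma omp target teams num_teams(NUM_TEAMS) thread_limit(NUM_THREADS) \
      map(tofrom: gpu_results)
  {
    // Declared between the target teams and parallel regions: expected to be
    // team-private, i.e. shared by all threads of a team.
    int dist[NUM_THREADS];

    #pragma omp parallel num_threads(NUM_THREADS)
    {
      dist[omp_get_thread_num()] = 1; // each thread writes a unique index
    }

    // Manual reduction by the team's initial thread into dist[0];
    // expected result is NUM_THREADS (2) per team.
    for (int i = 1; i < NUM_THREADS; ++i)
      dist[0] += dist[i];

    gpu_results[omp_get_team_num()] = dist[0]; // unique index per team
  }

  printf("gpu_results[0]: %d\n", gpu_results[0]); // expected 2
  printf("gpu_results[1]: %d\n", gpu_results[1]); // expected 2
  return 0;
}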

May 5 2021, 7:05 AM · Restricted Project

May 4 2021

estewart08 added a comment to D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.

Did you try the reproducer with the applied patch?

Yes, I still saw the test fail, although it was not with latest llvm-project. Are you saying the reproducer passes for you?

I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?

This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

It produces incorrect/incomplete results in the dist[0] index after a manual reduction and, in turn, the final global gpu_results array is incorrect.
When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] having been updated by thread 1, which tells me the array is still thread-private.
Adding some printfs and looking at one team's output:

SPMD

Thread 0: dist[0]: 1
Thread 0: dist[1]: 0  // This should be 1
After reduction into dist[0]: 1  // This should be 2
gpu_results = [1,1]  // [2,2] expected

Generic Mode:

Thread 0: dist[0]: 1
Thread 0: dist[1]: 1   
After reduction into dist[0]: 2
gpu_results = [2,2]

Hmm, I would expect a crash if the array was allocated in the local memory. Could you try to add some more printfs (with data and addresses of the array) to check the results? Maybe there is a data race somewhere in the code?

As a reminder, each thread updates a unique index in the dist array and each team updates a unique index in gpu_results.

SPMD - shows each thread has a unique address for dist array

Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc

Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4

Team 0 Thread 0: After reduction into dist[0]: 1
Team 0 Thread 0: gpu_results address: 0x7f92a5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c

Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184

Team 1 Thread 0: After reduction into dist[0]: 1
Team 1 Thread 0: gpu_results address: 0x7f92a5000000

gpu_results[0]: 1
gpu_results[1]: 1

Generic - shows each team shares dist array address amongst threads

Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
Team 0 Thread 1: dist[1]: 1, 0x7fac01938884

Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
Team 0 Thread 0: dist[1]: 1, 0x7fac01938884

Team 0 Thread 0: After reduction into dist[0]: 2
Team 0 Thread 0: gpu_results address: 0x7fabc5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14

Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14

Team 1 Thread 0: After reduction into dist[0]: 2
Team 1 Thread 0: gpu_results address: 0x7fabc5000000

Could you check if it works with -fno-openmp-cuda-parallel-target-regions option?

Unfortunately that crashes:
llvm-project/llvm/lib/IR/Instructions.cpp:495: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.

Hmm, could you provide a full stack trace?

May 4 2021, 12:48 PM · Restricted Project

Apr 29 2021

estewart08 added a comment to D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.

Did you try the reproducer with the applied patch?

Yes, I still saw the test fail, although it was not with latest llvm-project. Are you saying the reproducer passes for you?

I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?

This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

It produces incorrect/incomplete results in the dist[0] index after a manual reduction and, in turn, the final global gpu_results array is incorrect.
When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] having been updated by thread 1, which tells me the array is still thread-private.
Adding some printfs and looking at one team's output:

SPMD

Thread 0: dist[0]: 1
Thread 0: dist[1]: 0  // This should be 1
After reduction into dist[0]: 1  // This should be 2
gpu_results = [1,1]  // [2,2] expected

Generic Mode:

Thread 0: dist[0]: 1
Thread 0: dist[1]: 1   
After reduction into dist[0]: 2
gpu_results = [2,2]

Hmm, I would expect a crash if the array was allocated in the local memory. Could you try to add some more printfs (with data and addresses of the array) to check the results? Maybe there is a data race somewhere in the code?

As a reminder, each thread updates a unique index in the dist array and each team updates a unique index in gpu_results.

SPMD - shows each thread has a unique address for dist array

Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc

Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4

Team 0 Thread 0: After reduction into dist[0]: 1
Team 0 Thread 0: gpu_results address: 0x7f92a5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c

Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184

Team 1 Thread 0: After reduction into dist[0]: 1
Team 1 Thread 0: gpu_results address: 0x7f92a5000000

gpu_results[0]: 1
gpu_results[1]: 1

Generic - shows each team shares dist array address amongst threads

Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
Team 0 Thread 1: dist[1]: 1, 0x7fac01938884

Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
Team 0 Thread 0: dist[1]: 1, 0x7fac01938884

Team 0 Thread 0: After reduction into dist[0]: 2
Team 0 Thread 0: gpu_results address: 0x7fabc5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14

Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14

Team 1 Thread 0: After reduction into dist[0]: 2
Team 1 Thread 0: gpu_results address: 0x7fabc5000000

Could you check if it works with -fno-openmp-cuda-parallel-target-regions option?

Apr 29 2021, 2:38 PM · Restricted Project
estewart08 added a comment to D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.

Did you try the reproducer with the applied patch?

Yes, I still saw the test fail, although it was not with latest llvm-project. Are you saying the reproducer passes for you?

I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?

This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

It produces incorrect/incomplete results in the dist[0] index after a manual reduction and, in turn, the final global gpu_results array is incorrect.
When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] having been updated by thread 1, which tells me the array is still thread-private.
Adding some printfs and looking at one team's output:

SPMD

Thread 0: dist[0]: 1
Thread 0: dist[1]: 0  // This should be 1
After reduction into dist[0]: 1  // This should be 2
gpu_results = [1,1]  // [2,2] expected

Generic Mode:

Thread 0: dist[0]: 1
Thread 0: dist[1]: 1   
After reduction into dist[0]: 2
gpu_results = [2,2]

Hmm, I would expect a crash if the array was allocated in the local memory. Could you try to add some more printfs (with data and addresses of the array) to check the results? Maybe there is a data race somewhere in the code?

Apr 29 2021, 12:36 PM · Restricted Project
estewart08 added a comment to D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.

Did you try the reproducer with the applied patch?

Yes, I still saw the test fail, although it was not with latest llvm-project. Are you saying the reproducer passes for you?

I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?

Apr 29 2021, 11:12 AM · Restricted Project
estewart08 added a comment to D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.

Did you try the reproducer with the applied patch?

Apr 29 2021, 9:58 AM · Restricted Project
estewart08 added a comment to D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode..

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.

Apr 29 2021, 9:53 AM · Restricted Project

Feb 7 2020

estewart08 updated the diff for D74092: Changed omp_get_max_threads() implementation to more closely match spec description..
  • Added FIXME comment to describe change in omp_get_max_threads behavior.
Feb 7 2020, 3:20 PM · Restricted Project

Feb 6 2020

estewart08 updated the diff for D74092: Changed omp_get_max_threads() implementation to more closely match spec description..
  • Update max_threads.c api test to match the change for omp_get_max_threads().
Feb 6 2020, 9:44 AM · Restricted Project

Feb 5 2020

estewart08 added a comment to D74092: Changed omp_get_max_threads() implementation to more closely match spec description..

I can definitely add the change to max_threads.c to this review. The CHECK would become 64 because, with this proposed change, we are now counting all threads: 32 from thread_limit + 32 from the master warp.

// CHECK: Non-SPMD MaxThreadsL1 = 64

Yes, the test I proposed would be for nvptx only, since the other tests reside in the nvptx directory and the original max_threads test was checking nvptx values as well. Is the plan to convert all tests so that they support different architectures in the future and move them to common?
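
As a sketch of the kind of check being discussed (not the actual max_threads.c test), the value in question can be queried from a generic (Non-SPMD) target region with a thread_limit of 32; the 32 + 32 = 64 arithmetic comes from the comment above:

#include <omp.h>
#include <stdio.h>

int main(void) {
  int max_threads_l1 = -1;

  // Generic (Non-SPMD) kernel: no combined parallel on the target construct.
  #pragma omp target teams num_teams(1) thread_limit(32) \
      map(tofrom: max_threads_l1)
  {
    max_threads_l1 = omp_get_max_threads();
  }

  // With the proposed change: 32 (thread_limit) + 32 (master warp) = 64.
  printf("Non-SPMD MaxThreadsL1 = %d\n", max_threads_l1);
  return 0;
}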

Feb 5 2020, 5:33 PM · Restricted Project
estewart08 retitled D74092: Changed omp_get_max_threads() implementation to more closely match spec description. from Changed omp_get_max_threads() implementation to more closely match spec description: "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a... to Changed omp_get_max_threads() implementation to more closely match spec description..
Feb 5 2020, 2:34 PM · Restricted Project
estewart08 created D74092: Changed omp_get_max_threads() implementation to more closely match spec description..
Feb 5 2020, 2:16 PM · Restricted Project