gtbercea (Gheorghe-Teodor Bercea)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 29 2016, 12:44 AM (111 w, 1 d)

Recent Activity

Fri, Jan 18

gtbercea accepted D56733: [OPENMP] update release note for implemented OMP 5.0 features.

Thanks for the update!

Fri, Jan 18, 10:48 AM · Restricted Project

Thu, Jan 17

gtbercea added inline comments to D56733: [OPENMP] update release note for implemented OMP 5.0 features.
Thu, Jan 17, 1:41 PM · Restricted Project
gtbercea added a comment to D56733: [OPENMP] update release note for implemented OMP 5.0 features.

Could we add the changes in D56790 to this diff?

Sure, I will do that.

Thu, Jan 17, 1:10 PM · Restricted Project

Jan 16 2019

gtbercea added a comment to D56733: [OPENMP] update release note for implemented OMP 5.0 features.

Could we add the changes in D56790 to this diff?

Jan 16 2019, 12:09 PM · Restricted Project
gtbercea created D56790: [OpenMP] Update release notes with OpenMP related changes.
Jan 16 2019, 9:12 AM

Jan 9 2019

gtbercea added inline comments to D55928: [OpenMP] Add flag for preventing the extension to 64 bits for the collapse loop counter.
Jan 9 2019, 12:15 PM
gtbercea updated the diff for D55773: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level.
  • Fix.
Jan 9 2019, 9:09 AM
gtbercea updated the diff for D55773: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level.
  • Fix.
Jan 9 2019, 9:05 AM
gtbercea updated the diff for D55773: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level.
  • Add slot initialization.
Jan 9 2019, 8:58 AM
gtbercea added inline comments to D55773: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level.
Jan 9 2019, 6:46 AM

Jan 8 2019

gtbercea updated the diff for D55773: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level.
  • Use new sync.
Jan 8 2019, 1:49 PM
gtbercea updated the diff for D56413: [OpenMP] Avoid remainder operations for loop index values on a collapsed loop nest..
Fix.
Jan 8 2019, 1:01 PM
gtbercea added a comment to D55773: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level.

ping

Jan 8 2019, 12:27 PM
gtbercea updated the diff for D56413: [OpenMP] Avoid remainder operations for loop index values on a collapsed loop nest..

Remove redundant initialization.

Jan 8 2019, 11:54 AM
gtbercea updated the diff for D56413: [OpenMP] Avoid remainder operations for loop index values on a collapsed loop nest..

Fix update.

Jan 8 2019, 11:41 AM
gtbercea updated the diff for D56413: [OpenMP] Avoid remainder operations for loop index values on a collapsed loop nest..
Invert accumulation direction.
Jan 8 2019, 11:40 AM

Jan 7 2019

gtbercea created D56413: [OpenMP] Avoid remainder operations for loop index values on a collapsed loop nest..
Jan 7 2019, 2:09 PM

Dec 20 2018

gtbercea added inline comments to D55928: [OpenMP] Add flag for preventing the extension to 64 bits for the collapse loop counter.
Dec 20 2018, 7:43 AM
gtbercea updated the diff for D55928: [OpenMP] Add flag for preventing the extension to 64 bits for the collapse loop counter.
  • Address comments.
Dec 20 2018, 7:40 AM
gtbercea added inline comments to D55928: [OpenMP] Add flag for preventing the extension to 64 bits for the collapse loop counter.
Dec 20 2018, 7:25 AM
gtbercea updated the diff for D55928: [OpenMP] Add flag for preventing the extension to 64 bits for the collapse loop counter.
  • Address comments.
Dec 20 2018, 7:24 AM
gtbercea created D55928: [OpenMP] Add flag for preventing the extension to 64 bits for the collapse loop counter.
Dec 20 2018, 6:42 AM

Dec 17 2018

gtbercea created D55773: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level.
Dec 17 2018, 8:44 AM
gtbercea created D55772: [OpenMP][libomptarget] Suppress C++ 11 related warnings when building libomptarget-nvptx bitcode library.
Dec 17 2018, 8:39 AM

Dec 10 2018

gtbercea accepted D55514: [OPENMP][NVPTX]Revert __kmpc_shuffle_int64 to its original form..

LG

Dec 10 2018, 8:41 AM

Dec 7 2018

gtbercea accepted D55440: [OPENMP][NVPTX]Enable fast shuffles on 64bit values only if CUDA >= 9..

LG

Dec 7 2018, 2:10 PM
gtbercea accepted D55436: [OPENMP][NVPTX]Save registers for optimized builds with enabled logging..

LG

Dec 7 2018, 7:41 AM

Dec 6 2018

gtbercea accepted D55379: [OPENMP][NVPTX]Correct type casting for printf args + simplified shfl64 function..

LG

Dec 6 2018, 11:49 AM
gtbercea accepted D55370: [OPENMP][NVPTX]Fix __kmpc_flush to flush the memory per system, not per block..

LG

Dec 6 2018, 7:20 AM

Dec 3 2018

gtbercea created D55219: [OpenMP][libomptarget] Flush intermediate values during team reduction.
Dec 3 2018, 7:16 AM

Nov 30 2018

gtbercea accepted D55130: [OPENMP][NVPTX]Make runtime compatible with the original runtime..

LGTM

Nov 30 2018, 8:37 AM

Nov 29 2018

gtbercea added a comment to D54342: Add omp_get_device_num() and update several other device API functions.

After this patch, omp_get_num_devices() seems to wrongly return 0 even when devices are present.

Nov 29 2018, 2:22 PM · Restricted Project

Nov 27 2018

gtbercea updated the diff for D54970: [OpenMP] Add a new version of the SPMD deinit kernel function.
Add constant values to function calls.
Nov 27 2018, 1:47 PM
gtbercea updated the diff for D54970: [OpenMP] Add a new version of the SPMD deinit kernel function.
Delete old function.
Nov 27 2018, 1:19 PM
gtbercea updated the diff for D54969: [OpenMP][libomptarget] Add new version of SPMD deinit kernel function with argument.
Call new function in the old function.
Nov 27 2018, 1:15 PM
gtbercea added a parent revision for D54970: [OpenMP] Add a new version of the SPMD deinit kernel function: D54969: [OpenMP][libomptarget] Add new version of SPMD deinit kernel function with argument.
Nov 27 2018, 1:09 PM
gtbercea added a child revision for D54969: [OpenMP][libomptarget] Add new version of SPMD deinit kernel function with argument: D54970: [OpenMP] Add a new version of the SPMD deinit kernel function.
Nov 27 2018, 1:09 PM
gtbercea created D54970: [OpenMP] Add a new version of the SPMD deinit kernel function.
Nov 27 2018, 1:09 PM
gtbercea created D54969: [OpenMP][libomptarget] Add new version of SPMD deinit kernel function with argument.
Nov 27 2018, 1:05 PM
gtbercea accepted D54967: [OPENMP][NVPTX]Basic support for reductions across the teams..

LG

Nov 27 2018, 12:54 PM
gtbercea updated the diff for D54960: [OpenMP][libomptarget] Refactor SPMD and runtime requirement checking.
Address comments.
Nov 27 2018, 11:18 AM
gtbercea accepted D52700: [libomptarget-nvptx] Remove dead functions.

LGTM since neither Clang nor clang-ykt use these functions.

Nov 27 2018, 11:04 AM
gtbercea created D54960: [OpenMP][libomptarget] Refactor SPMD and runtime requirement checking.
Nov 27 2018, 11:00 AM

Nov 20 2018

gtbercea accepted D54766: [OPENMP][NVPTX]Improved lock/critical constructs, NFC..

This is actually a bug fix.

Nov 20 2018, 12:14 PM

Nov 8 2018

gtbercea accepted D54260: [OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping..

LG

Nov 8 2018, 7:10 AM

Nov 2 2018

gtbercea accepted D54035: [OPENMP][OFFLOADING]Change the lambda capturing flags..

LG

Nov 2 2018, 8:21 AM

Nov 1 2018

gtbercea accepted D53943: [OPENMP][NVPTX]Fixed/improved support for globalization in team contexts..

LG

Nov 1 2018, 1:26 PM

Oct 31 2018

gtbercea added inline comments to D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.
Oct 31 2018, 1:40 PM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.
Reinstate assert.
Oct 31 2018, 1:38 PM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.

Move while on else branch.

Oct 31 2018, 12:35 PM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.
Address comments.
Oct 31 2018, 11:52 AM

Oct 30 2018

gtbercea accepted D51107: [LIBOMPTARGET] Add support for mapping of lambda captures..

LG

Oct 30 2018, 8:06 AM

Oct 29 2018

gtbercea retitled D53827: [OpenMP] Fix condition. from [OpenMP][NFC] Fix condition. to [OpenMP] Fix condition..
Oct 29 2018, 12:44 PM
gtbercea updated the diff for D53827: [OpenMP] Fix condition..
Add tests.
Oct 29 2018, 12:42 PM
gtbercea added reviewers for D53827: [OpenMP] Fix condition.: ABataev, caomhin.
Oct 29 2018, 12:32 PM
gtbercea created D53827: [OpenMP] Fix condition..
Oct 29 2018, 12:32 PM
gtbercea added a comment to D53443: [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases..

ping

Oct 29 2018, 7:39 AM
gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.

Fix comment.

Oct 29 2018, 7:37 AM

Oct 26 2018

gtbercea updated the diff for D53443: [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases..
Add test.
Oct 26 2018, 2:44 PM
gtbercea created D53772: [NFC][OpenMP] Add new test for parallel for code generation..
Oct 26 2018, 11:54 AM
gtbercea updated the diff for D53443: [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases..
Add test.
Oct 26 2018, 8:43 AM

Oct 25 2018

gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.

Change tests.

Oct 25 2018, 1:39 PM
gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.

Add test for collapse.

Oct 25 2018, 12:35 PM
gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.
Simplify code.
Oct 25 2018, 10:02 AM
gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.
Refactor static chunk schedules. Fix tests.
Oct 25 2018, 8:59 AM
gtbercea added inline comments to D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.
Oct 25 2018, 6:55 AM
gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.

Use NumIterations.

Oct 25 2018, 6:53 AM
gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.
Refactor chunk one checking.
Oct 25 2018, 6:41 AM

Oct 19 2018

gtbercea added a parent revision for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for: D53443: [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases..
Oct 19 2018, 1:06 PM
gtbercea added a child revision for D53443: [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases.: D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.
Oct 19 2018, 1:06 PM
gtbercea updated the diff for D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.
Rebase.
Oct 19 2018, 12:34 PM
gtbercea created D53448: [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for.
Oct 19 2018, 12:33 PM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.
Refactor.
Oct 19 2018, 12:21 PM
gtbercea updated the diff for D53443: [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases..
Refactor.
Oct 19 2018, 12:19 PM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.
Refactor.
Oct 19 2018, 12:16 PM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.
Refactor.
Oct 19 2018, 10:54 AM
gtbercea created D53443: [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases..
Oct 19 2018, 10:39 AM

Oct 11 2018

gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.

Simplify call to common push function.

Oct 11 2018, 12:32 PM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.

Refactor.

Oct 11 2018, 11:10 AM
gtbercea updated the diff for D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.

Ensure PushSize is multiple of 8 bytes.

Oct 11 2018, 10:56 AM
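The update note above asks for PushSize to be a multiple of 8 bytes. One common way to get that is to round the size up with a bit mask. The following is only a minimal sketch of that arithmetic, not the code from D53141; the helper name round_up_to_8 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not taken from D53141): round a byte count up to the
 * next multiple of 8. */
size_t round_up_to_8(size_t size) {
    /* Guard against wrap-around when 7 is added below. */
    assert(size <= SIZE_MAX - 7);
    /* Adding 7 and clearing the low three bits rounds up to a multiple of 8;
     * values that are already multiples of 8 are left unchanged. */
    return (size + 7) & ~(size_t)7;
}
```

The mask form works because 8 is a power of two; for other alignments a division-based round-up would be needed.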
gtbercea created D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.
Oct 11 2018, 8:21 AM

Oct 1 2018

gtbercea abandoned D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

While going through my list of reviews, I noticed that this patch was reverted because of memory leaks in other changes. However, I don't think we need it anymore because Clang now raises the PTX level as needed for the CUDA version in use. Can we abandon this flag?

Oct 1 2018, 7:21 AM

Sep 28 2018

gtbercea added a comment to D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.

You report a slowdown that I am actually not able to reproduce. Do you use any additional clauses not present in your previous post?

No, only dist_schedule(static) which is faster. Tested on a Tesla P100 with today's trunk version:

#pragma omp target teams distribute parallel for (new defaults): 190 - 250 GB/s
adding clauses for old defaults, schedule(static) dist_schedule(static): 30 - 50 GB/s
same directive with only dist_schedule(static) added (fewer registers): 320 - 400 GB/s
Sep 28 2018, 10:36 AM
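For readers who want to see how the three configurations in the comment above differ in source form, here is a minimal sketch. The increment kernel and the function names are hypothetical and chosen only for illustration; the comment does not show the benchmark code that was measured, and this sketch makes no claim about bandwidth.

```c
#include <assert.h>

/* Hypothetical kernels for illustration only; not the measured benchmark. */

/* Variant 1: no schedule clauses, so the compiler's defaults apply. */
void increment_new_defaults(double *a, int n) {
    assert(a != 0 && n >= 0);
    #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
    for (int i = 0; i < n; i++) {
        a[i] += 1.0;
    }
}

/* Variant 2: both clauses spelled out, which the comment describes as the
 * old defaults. */
void increment_old_defaults(double *a, int n) {
    assert(a != 0 && n >= 0);
    #pragma omp target teams distribute parallel for map(tofrom: a[0:n]) \
        schedule(static) dist_schedule(static)
    for (int i = 0; i < n; i++) {
        a[i] += 1.0;
    }
}

/* Variant 3: only dist_schedule(static) is added; the parallel for schedule
 * is left to the compiler. */
void increment_dist_static_only(double *a, int n) {
    assert(a != 0 && n >= 0);
    #pragma omp target teams distribute parallel for map(tofrom: a[0:n]) \
        dist_schedule(static)
    for (int i = 0; i < n; i++) {
        a[i] += 1.0;
    }
}
```

All three variants compute the same result; only the distribution of iterations across teams and threads changes. Built without OpenMP support, the pragmas are ignored and the loops run serially on the host.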
gtbercea added a comment to D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.

One big problem your code has is that the trip count is incredibly small, especially for STREAM and especially on GPUs. You need a much larger loop size; otherwise the timings will be dominated by OpenMP setup costs.

Sure, I'm not that dumb. The real code has larger loops, this was just for demonstration purposes. I don't expect the register count to change based on loop size - is that too optimistic?

Sep 28 2018, 7:54 AM
gtbercea added a comment to D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.

One big problem your code has is that the trip count is incredibly small, especially for STREAM and especially on GPUs. You need a much larger loop size; otherwise the timings will be dominated by OpenMP setup costs.

Sure, I'm not that dumb. The real code has larger loops, this was just for demonstration purposes. I don't expect the register count to change based on loop size - is that too optimistic?

Sep 28 2018, 5:40 AM
gtbercea added a comment to D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.

Just tested this and got very weird results for register usage:

void func(double *a) {
  #pragma omp target teams distribute parallel for map(a[0:100]) // dist_schedule(static)
  for (int i = 0; i < 100; i++) {
    a[i]++;
  }
}

Compiling with current trunk for sm_60 (Pascal): 29 registers
Adding dist_schedule(static) (the previous default): 19 registers
For reference: dist_schedule(static, 128) also uses 29 registers

Any ideas? This significantly slows down STREAM...

Sep 28 2018, 5:27 AM

Sep 27 2018

gtbercea updated the diff for D52629: [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing.

Address comment.

Sep 27 2018, 1:25 PM
gtbercea created D52629: [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing.
Sep 27 2018, 1:18 PM
gtbercea retitled D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing from [OpenMP] Make default schedules for NVPTX target regions in SPMD mode achieve coalescing to [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.
Sep 27 2018, 12:25 PM
gtbercea added a comment to D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.

Should we also change the default schedule to static, 1? I know that's not really needed for teams distribute parallel for (because the new default dist_schedule only leaves one iteration per thread), but this doesn't happen for target parallel for. Additionally it would make the intent more explicit and LLVM doesn't need to look through divisions needed to implement static without chunk. Just thinking aloud, not sure if that's worth it.

Sep 27 2018, 8:20 AM
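The comment above contrasts schedule(static, 1) with schedule(static) without a chunk size. The sketch below shows one way to picture the difference by computing which thread would own iteration i in each case. It is an illustration under stated assumptions, not the Clang or runtime implementation: the function names are hypothetical, and the blocked split assumes a ceiling division, whereas the OpenMP specification leaves the exact split for the unchunked case to the implementation.

```c
#include <assert.h>

/* schedule(static, 1): iterations are dealt out round-robin, so consecutive
 * iterations go to consecutive threads. Hypothetical helper. */
int owner_static_chunk1(int i, int num_threads) {
    assert(i >= 0 && num_threads > 0);
    return i % num_threads;
}

/* schedule(static) without a chunk size: each thread gets one contiguous
 * block. ASSUMPTION: blocks are sized by ceiling division; real
 * implementations may split the iterations differently. */
int owner_static_blocked(int i, int n, int num_threads) {
    assert(n > 0 && num_threads > 0 && i >= 0 && i < n);
    /* This division is the extra work the unchunked form needs. */
    int block = (n + num_threads - 1) / num_threads;
    return i / block;
}
```

With the round-robin mapping, neighbouring threads touch neighbouring array elements in the same step, which is the access pattern that coalescing relies on. With the blocked mapping, neighbouring threads are a whole block apart.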
gtbercea updated the diff for D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.

Fix type of chunk size.

Sep 27 2018, 7:55 AM

Sep 26 2018

gtbercea abandoned D52436: [OpenMP][libomptarget] Add runtime functions for default schedule for distribute.

Due to most recent proposed changes to Clang in D52434, changes to the runtime are no longer required.

Sep 26 2018, 12:18 PM
gtbercea updated the diff for D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.

Only change default schedule for distribute directive.

Sep 26 2018, 12:16 PM

Sep 24 2018

gtbercea created D52436: [OpenMP][libomptarget] Add runtime functions for default schedule for distribute.
Sep 24 2018, 2:16 PM
gtbercea created D52434: [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing.
Sep 24 2018, 1:44 PM

Sep 21 2018

gtbercea accepted D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime..

LGTM

Sep 21 2018, 6:34 AM

Sep 14 2018

gtbercea created D52122: [OpenMP][libomptarget] Set the frame pointer then test empty slot condition.
Sep 14 2018, 1:43 PM

Sep 11 2018

gtbercea accepted D51937: [OPENMP]Increment iterator when the loop is continued..

LG

Sep 11 2018, 10:06 AM
gtbercea added a comment to D51687: [libomptarget-nvptx] Add testing infrastructure.

Considering your comment in the description about requiring the latest Clang, perhaps you should revisit this patch: D46842

Sep 11 2018, 6:06 AM

Aug 30 2018

gtbercea added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

removing InitializePredefinedAuxMacros and the new test completely should do.

Aug 30 2018, 1:19 PM