
dhruvachak (Dhruva Chakrabarti)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 15 2021, 11:51 AM (62 w, 2 h)

Recent Activity

Thu, Apr 28

dhruvachak requested review of D124652: [OMPT] [amdgpu] Implemented device init/fini/load callbacks.
Thu, Apr 28, 6:44 PM · Restricted Project, Restricted Project

Apr 19 2022

dhruvachak requested review of D124070: [OMPT] [amdgpu] Implemented callback registration in amdgpu plugin.
Apr 19 2022, 11:46 PM · Restricted Project, Restricted Project

Apr 18 2022

dhruvachak requested review of D123974: [OpenMP] [OMPT] Implemented callback registration in libomptarget.
Apr 18 2022, 7:24 PM · Restricted Project, Restricted Project

Apr 14 2022

dhruvachak updated the diff for D123572: [OpenMP] Implemented a connector for communication of OMPT callbacks between libraries.

Removed absolute path for location of omp-tools.h.

Apr 14 2022, 10:26 PM · Restricted Project, Restricted Project
dhruvachak committed rG7086a1db80e1: [libomptarget] [amdgpu] Hostcall offset check should consider implicit args (authored by dhruvachak).
Apr 14 2022, 5:54 PM · Restricted Project, Restricted Project
dhruvachak closed D123827: [libomptarget] [amdgpu] Hostcall offset check should consider implicit args.
Apr 14 2022, 5:54 PM · Restricted Project, Restricted Project
dhruvachak requested review of D123827: [libomptarget] [amdgpu] Hostcall offset check should consider implicit args.
Apr 14 2022, 5:27 PM · Restricted Project, Restricted Project

Apr 12 2022

dhruvachak requested review of D123572: [OpenMP] Implemented a connector for communication of OMPT callbacks between libraries.
Apr 12 2022, 12:00 AM · Restricted Project, Restricted Project

Apr 11 2022

dhruvachak added a reviewer for D123429: [OpenMP] Create separate categories for host, device, [no]emi events: JonChesterfield.
Apr 11 2022, 7:24 PM · Restricted Project, Restricted Project

Apr 8 2022

dhruvachak added a reviewer for D123429: [OpenMP] Create separate categories for host, device, [no]emi events: jlpeyton.
Apr 8 2022, 6:55 PM · Restricted Project, Restricted Project
dhruvachak added a comment to D123429: [OpenMP] Create separate categories for host, device, [no]emi events.

This patch is split out of https://reviews.llvm.org/D113728 which shows where the new event categories are used.

Apr 8 2022, 6:34 PM · Restricted Project, Restricted Project
dhruvachak requested review of D123429: [OpenMP] Create separate categories for host, device, [no]emi events.
Apr 8 2022, 6:29 PM · Restricted Project, Restricted Project
dhruvachak added inline comments to D102107: [OpenMP] Codegen aggregate for outlined function captures.
Apr 8 2022, 12:36 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
dhruvachak added inline comments to D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.
Apr 8 2022, 12:16 PM · Restricted Project, Restricted Project
dhruvachak added a comment to D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

I addressed most of the feedback and updated this review. I will work on splitting up this patch into multiple pieces so that they can be reviewed more easily.

Apr 8 2022, 11:36 AM · Restricted Project, Restricted Project
dhruvachak updated the diff for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

Addressed feedback.

Apr 8 2022, 11:20 AM · Restricted Project, Restricted Project

Apr 7 2022

dhruvachak added a comment to D102107: [OpenMP] Codegen aggregate for outlined function captures.

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

@ggeorgakoudis Can you please rebase this patch on top of main? Thanks.

Hey @dhruvachak. Unfortunately I can't find time lately to work on this patch. Would you like to take over?

Apr 7 2022, 12:44 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

Mar 30 2022

dhruvachak added a comment to D102107: [OpenMP] Codegen aggregate for outlined function captures.

As discussed in https://github.com/llvm/llvm-project/issues/54654, this needs to be added for SPMDization with this patch. Not sure whether further handling is required.

Mar 30 2022, 6:03 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
Herald added a project to D102107: [OpenMP] Codegen aggregate for outlined function captures: Restricted Project.

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

Mar 30 2022, 12:05 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

Jan 27 2022

dhruvachak added reviewers for D118424: Implementation of OMPT target device tracing: RaviNarayanaswamy, dreachem.
Jan 27 2022, 7:17 PM · Restricted Project
dhruvachak retitled D118424 from "Implementation of OMPT target device tracing This patch is on top of D113728 which has support for OMPT target callbacks." to "Implementation of OMPT target device tracing".
Jan 27 2022, 7:14 PM · Restricted Project
dhruvachak requested review of D118424: Implementation of OMPT target device tracing.
Jan 27 2022, 7:06 PM · Restricted Project

Jan 25 2022

dhruvachak updated the diff for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

Guarded ompt_init constructors with #ifdef OMPT_SUPPORT and added comments to tests.

Jan 25 2022, 4:15 PM · Restricted Project, Restricted Project

Dec 30 2021

dhruvachak updated the diff for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

Added new tests for OMPT target callbacks for amdgpu

Dec 30 2021, 4:33 PM · Restricted Project, Restricted Project

Dec 10 2021

dhruvachak added a reviewer for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support: AndreyChurbanov.
Dec 10 2021, 4:43 PM · Restricted Project, Restricted Project

Dec 9 2021

dhruvachak updated the diff for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

Addressed feedback

Dec 9 2021, 7:30 PM · Restricted Project, Restricted Project
dhruvachak added inline comments to D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.
Dec 9 2021, 7:26 PM · Restricted Project, Restricted Project

Nov 22 2021

dhruvachak updated the diff for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

Addressed feedback: libomptarget finalize should not depend on OMPD

Nov 22 2021, 2:31 PM · Restricted Project, Restricted Project
dhruvachak added inline comments to D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.
Nov 22 2021, 2:29 PM · Restricted Project, Restricted Project

Nov 19 2021

dhruvachak updated the diff for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

Addressed feedback: made the suggested cmake change and removed unrelated changes

Nov 19 2021, 5:53 PM · Restricted Project, Restricted Project
dhruvachak added a comment to D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.

Thanks, Joachim, for your comments. I removed the unrelated changes and made the cmake change in the next version.

Nov 19 2021, 5:47 PM · Restricted Project, Restricted Project

Nov 11 2021

dhruvachak added reviewers for D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support: JonChesterfield, jmellorcrummey, ronlieb, protze.joachim.
Nov 11 2021, 7:09 PM · Restricted Project, Restricted Project
dhruvachak requested review of D113728: [libomptarget] [amdgpu] Foundation for OMPT target callback support.
Nov 11 2021, 7:05 PM · Restricted Project, Restricted Project

Sep 29 2021

dhruvachak committed rG622627025332: [libomptarget] [amdgpu] After a kernel dispatch packet is published, its… (authored by dhruvachak).
Sep 29 2021, 9:22 AM
dhruvachak closed D110679: [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.
Sep 29 2021, 9:22 AM · Restricted Project
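The rule in D110679 follows from how HSA-style queues work: once the packet header is written, the packet is published and the packet processor may consume it and let the slot be recycled, so the host must not read the packet afterwards. A minimal sketch, assuming a simplified, hypothetical `DispatchPacket` type and header value in place of the real HSA packet layout, copies out everything the host still needs before the release store that publishes the header.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a kernel dispatch packet.
// Only the fields this sketch needs are modeled.
struct DispatchPacket {
  std::atomic<uint16_t> header{0};  // writing a non-zero value publishes the packet
  uint32_t grid_size_x = 0;
  uint16_t workgroup_size_x = 0;
  uint64_t kernel_object = 0;
};

// Hypothetical header value meaning "dispatch packet, ready to run".
constexpr uint16_t kDispatchReadyHeader = 2;

// What the host keeps for later use (e.g. tracing) after the launch.
struct LaunchRecord {
  uint32_t grid_size_x;
  uint16_t workgroup_size_x;
};

// Fills the packet body, captures what the host still needs, and only then
// publishes the header. After the release store, pkt must not be read.
LaunchRecord publishDispatch(DispatchPacket &pkt, uint32_t grid, uint16_t wg,
                             uint64_t kernel) {
  pkt.grid_size_x = grid;
  pkt.workgroup_size_x = wg;
  pkt.kernel_object = kernel;

  // Copy out the needed fields BEFORE publishing.
  LaunchRecord rec{pkt.grid_size_x, pkt.workgroup_size_x};

  // Release ordering makes the body writes visible before the header.
  pkt.header.store(kDispatchReadyHeader, std::memory_order_release);

  // From here on, use rec rather than pkt.
  return rec;
}
```

The design point is that any post-launch bookkeeping reads the local `LaunchRecord`, never the queue slot.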
dhruvachak updated the summary of D110679: [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.
Sep 29 2021, 12:18 AM · Restricted Project
dhruvachak added a reviewer for D110679: [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.: laurentm0.
Sep 29 2021, 12:02 AM · Restricted Project

Sep 28 2021

dhruvachak requested review of D110679: [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.
Sep 28 2021, 11:59 PM · Restricted Project

Jul 1 2021

dhruvachak accepted D105239: [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global.

Yes, I was concerned about the num_groups computation getting out of sync when the grid size is not a multiple of num_threads. But I understand from your response that we will have to change the print at that point anyway, so let's go ahead with this.
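The concern can be made concrete: deriving the group count by plain integer division drops the trailing partial workgroup whenever the grid size is not a multiple of the workgroup size. A minimal sketch with hypothetical function names contrasting truncating and rounding-up division:

```cpp
#include <cassert>
#include <cstdint>

// Truncating division: undercounts when GridSize % NumThreads != 0.
uint32_t numGroupsTruncating(uint32_t GridSize, uint32_t NumThreads) {
  assert(NumThreads != 0 && "workgroup size must be non-zero");
  return GridSize / NumThreads;
}

// Rounding up: also counts the trailing partial workgroup.
// Written without (GridSize + NumThreads - 1) to avoid overflow.
uint32_t numGroupsRoundedUp(uint32_t GridSize, uint32_t NumThreads) {
  assert(NumThreads != 0 && "workgroup size must be non-zero");
  return GridSize / NumThreads + (GridSize % NumThreads != 0 ? 1 : 0);
}
```

For example, a grid of 1000 with 256 threads per group gives 3 groups with truncation but 4 when rounding up; the two agree only when the grid is an exact multiple.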

Jul 1 2021, 8:55 AM · Restricted Project

Jun 30 2021

dhruvachak added inline comments to D105239: [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global.
Jun 30 2021, 11:16 PM · Restricted Project
dhruvachak added a reverting change for rG2240b41ee4f3: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size: rG98c36f0079d4: Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size".
Jun 30 2021, 5:15 PM
dhruvachak committed rG98c36f0079d4: Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size" (authored by dhruvachak).
Jun 30 2021, 5:15 PM
dhruvachak added a reverting change for D105073: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size: rG98c36f0079d4: Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size".
Jun 30 2021, 5:15 PM · Restricted Project
dhruvachak closed D105250: Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size".
Jun 30 2021, 5:15 PM · Restricted Project
dhruvachak added a reverting change for rG2240b41ee4f3: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size: D105250: Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size".
Jun 30 2021, 5:11 PM
dhruvachak requested review of D105250: Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size".
Jun 30 2021, 5:11 PM · Restricted Project
dhruvachak added a reverting change for D105073: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size: D105250: Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size".
Jun 30 2021, 5:11 PM · Restricted Project
dhruvachak added a comment to D105229: [libomptarget][nfc] Replace out arguments with struct return.

For anything that getLaunchVals can access directly (such as DeviceInfo.*, e.g. DeviceInfo.EnvTeamLimit), can we avoid passing it as an argument and read it directly instead? That would minimize the number of arguments to getLaunchVals.
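The suggested shape, sketched under assumptions: getLaunchVals receives only per-launch inputs, reads device-wide settings such as DeviceInfo.EnvTeamLimit directly, and returns its results as a struct instead of through out arguments (the direction of D105229). The `DeviceInfoTy` layout, the `LaunchVals` fields, and the clamping logic below are hypothetical and do not reproduce the plugin's actual code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical device-wide settings, loosely modeled on DeviceInfo.*.
struct DeviceInfoTy {
  int EnvTeamLimit = -1;  // < 0 means "not set in the environment"
  int EnvNumTeams = -1;
  int ComputeUnits = 60;
  int DefaultTeamsPerCU = 4;
};
DeviceInfoTy DeviceInfo;  // global, read directly by getLaunchVals

// Results returned by value instead of through out arguments.
struct LaunchVals {
  int NumTeams;
  int ThreadsPerGroup;
};

// Only per-launch inputs are parameters; device-wide limits come from DeviceInfo.
LaunchVals getLaunchVals(int RequestedTeams, int RequestedThreads) {
  int Teams = RequestedTeams > 0
                  ? RequestedTeams
                  : (DeviceInfo.EnvNumTeams > 0
                         ? DeviceInfo.EnvNumTeams
                         : DeviceInfo.ComputeUnits * DeviceInfo.DefaultTeamsPerCU);
  if (DeviceInfo.EnvTeamLimit > 0)
    Teams = std::min(Teams, DeviceInfo.EnvTeamLimit);
  return LaunchVals{Teams, std::max(RequestedThreads, 1)};
}
```

The trade-off is a hidden dependency on global state, which passing the values explicitly had avoided; in exchange, call sites shrink to the values that actually vary per launch.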

Jun 30 2021, 2:56 PM · Restricted Project

Jun 29 2021

dhruvachak requested review of D105163: [libomptarget] [amdgpu] Ensure OMP_TEAMS_THREAD_LIMIT is not overwritten by the default workgroup size.
Jun 29 2021, 5:25 PM · Restricted Project
dhruvachak committed rGe0b713a0357a: [libomptarget] [amdgpu] Change default number of teams per computation unit (authored by dhruvachak).
Jun 29 2021, 3:35 PM
dhruvachak closed D99003: [libomptarget] [amdgpu] Change default number of teams per computation unit.
Jun 29 2021, 3:35 PM · Restricted Project
dhruvachak updated the diff for D99003: [libomptarget] [amdgpu] Change default number of teams per computation unit.

rebase

Jun 29 2021, 3:32 PM · Restricted Project
dhruvachak committed rG2240b41ee4f3: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size (authored by dhruvachak).
Jun 29 2021, 1:48 PM
dhruvachak closed D105073: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size.
Jun 29 2021, 1:48 PM · Restricted Project
dhruvachak added a comment to D105073: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size.

I guess this avoids a case where threadsPerGroup is set to zero by a zero ConstWGSize.
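One way to express that guard, as a sketch: treat a zero compile-time workgroup size as "unspecified" and fall back to a default, so threadsPerGroup can never become zero. ConstWGSize and threadsPerGroup are the names from the comment; the default value, the MaxWGSize parameter, and the function itself are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fallback when the kernel does not specify a workgroup size.
constexpr uint32_t kDefaultWGSize = 256;

// A ConstWGSize of zero is treated as "unspecified" rather than propagated,
// so the returned threadsPerGroup is always at least 1.
uint32_t pickThreadsPerGroup(uint32_t ConstWGSize, uint32_t MaxWGSize) {
  uint32_t threadsPerGroup = ConstWGSize != 0 ? ConstWGSize : kDefaultWGSize;
  if (MaxWGSize != 0 && threadsPerGroup > MaxWGSize)
    threadsPerGroup = MaxWGSize;  // respect a known upper bound
  assert(threadsPerGroup != 0);
  return threadsPerGroup;
}
```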

Jun 29 2021, 12:25 PM · Restricted Project

Jun 28 2021

dhruvachak updated the diff for D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

[libomptarget] [amdgpu] Set number of teams and threads based on GPU occupancy.

Jun 28 2021, 6:38 PM · Restricted Project, Restricted Project
dhruvachak requested review of D105073: [libomptarget] [amdgpu] Fix default setting of max flat workgroup size.
Jun 28 2021, 6:15 PM · Restricted Project
dhruvachak updated the diff for D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

[libomptarget] [amdgpu] Set number of teams and threads based on GPU occupancy.

Jun 28 2021, 11:57 AM · Restricted Project, Restricted Project

Jun 25 2021

dhruvachak added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

I am seeing a run-time failure as well in addition to the compile-time failure reported earlier. Here's a test case, veccopy.c.

Jun 25 2021, 6:17 PM · Restricted Project

Jun 16 2021

dhruvachak added a comment to D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

I haven't tried to understand the control flow yet. Is the idea to map a target region to as large a fraction of a CU as we can, scaling it back when occupancy constraints would force some of it to be idle anyway?
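Read that way, the idea can be sketched as: start from the largest workgroup the device allows, then cap it at the number of lanes that can actually be resident on one CU given the kernel's occupancy. This is an interpretation of the question, not the patch's algorithm; the function name is hypothetical and all hardware parameters (wavefront size, SIMDs per CU, occupancy-limited waves per SIMD) are caller-supplied assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// WavesPerSIMD is the occupancy-limited count (e.g. derived from VGPR use).
uint32_t threadsPerGroupForOccupancy(uint32_t MaxWGSize, uint32_t WavefrontSize,
                                     uint32_t SIMDsPerCU, uint32_t WavesPerSIMD) {
  assert(WavefrontSize != 0);
  // Lanes that can be resident on one CU at the same time.
  uint32_t ResidentLanes = SIMDsPerCU * WavesPerSIMD * WavefrontSize;
  // A larger workgroup could not be fully resident, so scale it back.
  uint32_t Threads = std::min(MaxWGSize, ResidentLanes);
  // Keep a whole number of wavefronts, and at least one wavefront.
  Threads = std::max(WavefrontSize, Threads / WavefrontSize * WavefrontSize);
  return Threads;
}
```

With illustrative inputs (1024, 64, 4, 2), only 512 lanes fit, so the workgroup is scaled back from 1024 to 512.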

Jun 16 2021, 1:20 PM · Restricted Project, Restricted Project
dhruvachak added a reviewer for D98832: [libomptarget] Tune the number of teams and threads for kernel launch.: carlo.bertolli.
Jun 16 2021, 10:12 AM · Restricted Project, Restricted Project
dhruvachak updated the diff for D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

[libomptarget] [amdgpu] Set number of teams and threads based on GPU occupancy.

Jun 16 2021, 10:05 AM · Restricted Project, Restricted Project

May 26 2021

dhruvachak accepted D103090: [libomptarget][nfc][amdgpu] Factor out setting upper bounds.

LGTM. Thanks.

May 26 2021, 11:28 AM · Restricted Project
dhruvachak accepted D103093: [libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable.

LGTM

May 26 2021, 10:50 AM · Restricted Project
dhruvachak added inline comments to D103090: [libomptarget][nfc][amdgpu] Factor out setting upper bounds.
May 26 2021, 10:44 AM · Restricted Project

May 24 2021

dhruvachak committed rG96d70f4d289b: [libomptarget] [amdgpu] Added LDS usage to the kernel trace (authored by dhruvachak).
May 24 2021, 7:34 PM
dhruvachak closed D103059: [libomptarget] [amdgpu] Added LDS usage to the kernel trace.
May 24 2021, 7:34 PM · Restricted Project
dhruvachak added inline comments to D103059: [libomptarget] [amdgpu] Added LDS usage to the kernel trace.
May 24 2021, 7:24 PM · Restricted Project
dhruvachak requested review of D103059: [libomptarget] [amdgpu] Added LDS usage to the kernel trace.
May 24 2021, 5:47 PM · Restricted Project
dhruvachak committed rGca17b26d4d7a: [libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner… (authored by dhruvachak).
May 24 2021, 3:24 PM
dhruvachak closed D103037: [libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case.
May 24 2021, 3:24 PM · Restricted Project
dhruvachak requested review of D103037: [libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case.
May 24 2021, 11:26 AM · Restricted Project

Mar 19 2021

dhruvachak requested review of D99003: [libomptarget] [amdgpu] Change default number of teams per computation unit.
Mar 19 2021, 7:00 PM · Restricted Project
dhruvachak committed rG451e7001a097: Empty test commit, verifying commit access (authored by dhruvachak).
Mar 19 2021, 5:43 PM
dhruvachak added a comment to D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

...
Agreed. However, I don't see LDS usage in the metadata table in the image. Is it present there?

Yes, see https://llvm.org/docs/AMDGPUUsage.html for the list of what we can expect. What may not be obvious is that the metadata calls it ".group_segment_fixed_size". I don't know the origin of the terminology, maybe OpenCL?

In theory, a very high SGPR count can limit the number of workgroups that can be resident at once if it is not factored into the choice of the number of threads. In practice, though, VGPRs tend to be the primary limiting factor, so perhaps we can start by using VGPRs for this purpose and let experience guide us in the future.

If I understand correctly, occupancy rules all look something like (resource available / resource used) == number that can run simultaneously, where one of the resources tends to be limiting. Offhand, I think those resources are VGPRs, SGPRs, and LDS (group segment). I think there's also an architecture-dependent upper bound on how many can run at once even if they use very little of those, maybe 8 for gfx9 and 16 for gfx10.

If that's right, perhaps the calculation should look something like:

unsigned vgpr_occupancy = vgpr_available / vgpr_used;
unsigned sgpr_occupancy = sgpr_available / sgpr_used;
unsigned lds_occupancy = lds_available / lds_used;
unsigned limiting_occupancy = min(min(vgpr_occupancy, sgpr_occupancy), lds_occupancy);

and then we derive threadsPerGroup from that occupancy and the various other considerations.
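A runnable sketch of that calculation, under stated assumptions: each resource allows Available / Used simultaneous units (both measured against the same pool), a resource the kernel does not use imposes no limit, and an architectural cap bounds the result. The `ResourceUse` struct and `limitingOccupancy` function are hypothetical, and the actual per-SIMD or per-CU budgets must come from the target's documentation, not from this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// One occupancy-limiting resource: how much the pool holds and how much one
// unit of work (e.g. one wave) consumes from that same pool.
struct ResourceUse {
  uint64_t Available;
  uint64_t Used;
};

// Units that can run simultaneously: the minimum over all resources of
// Available / Used, further bounded by an architectural cap.
// A resource with Used == 0 does not constrain occupancy.
uint64_t limitingOccupancy(const std::vector<ResourceUse> &Resources,
                           uint64_t ArchCap) {
  uint64_t Occupancy = std::numeric_limits<uint64_t>::max();
  for (const ResourceUse &R : Resources) {
    if (R.Used == 0)
      continue;
    Occupancy = std::min(Occupancy, R.Available / R.Used);
  }
  return std::min(Occupancy, ArchCap);
}
```

With illustrative entries {256, 64} and {800, 100} plus an unused LDS entry, the result is 4, unless the architectural cap is lower; skipping zero-use resources also avoids the division by zero a literal transcription would hit.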

Mar 19 2021, 12:37 PM · Restricted Project, Restricted Project

Mar 18 2021

dhruvachak added a comment to D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

Could you upload patches with full context, please?

Mar 18 2021, 4:03 PM · Restricted Project, Restricted Project
dhruvachak updated the diff for D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

Added full context to the updated patch.

Mar 18 2021, 3:55 PM · Restricted Project, Restricted Project
dhruvachak added a comment to D98832: [libomptarget] Tune the number of teams and threads for kernel launch.

This is really interesting. The idea seems to be to choose the dispatch parameters based on the kernel metadata and the limits of the machine.

What's the underlying heuristic? Break the work across N CUs in chunks that match the occupancy limits of each CU?
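If the heuristic is as described in that question, the team count could be sketched as N CUs times the number of whole workgroups that fit on one CU at once. This is an interpretation, not the patch's code; the function name is hypothetical and the occupancy-limited waves per CU are a caller-supplied assumption.

```cpp
#include <cassert>
#include <cstdint>

// Teams (workgroups) that can be resident at once across the device.
uint32_t teamsForDevice(uint32_t NumCUs, uint32_t WavesPerCU,
                        uint32_t ThreadsPerGroup, uint32_t WavefrontSize) {
  assert(WavefrontSize != 0 && ThreadsPerGroup != 0);
  // Wavefronts needed by one workgroup, rounded up.
  uint32_t WavesPerGroup = (ThreadsPerGroup + WavefrontSize - 1) / WavefrontSize;
  // Whole workgroups that fit on one CU at the same time; at least one.
  uint32_t GroupsPerCU = WavesPerCU / WavesPerGroup;
  if (GroupsPerCU == 0)
    GroupsPerCU = 1;
  return NumCUs * GroupsPerCU;
}
```

For example, 60 CUs with 8 resident waves each and 256-thread groups on a 64-wide wavefront give 2 groups per CU, so 120 teams.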

Mar 18 2021, 11:25 AM · Restricted Project, Restricted Project
dhruvachak added a comment to D98829: [libomptarget] Add register usage info to kernel metadata.

Looks good to me, thanks.

We should probably use uint64_t everywhere, instead of sometimes truncating to uint32_t, but that pattern and the 0,0,0,0 one are pre-existing.

Let's go with this and, as a separate patch, report errors on implausible (e.g. > 4 billion) register counts, along with sanity-checking the requested LDS, etc.
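A minimal sketch of that follow-up, under assumptions: metadata values stay uint64_t until they are checked, and anything that does not fit in uint32_t is rejected instead of silently truncated. The map-based metadata input, the `KernelUsage` struct, and the helper names are hypothetical; the key names follow the AMDGPU code object metadata (.sgpr_count, .vgpr_count, .group_segment_fixed_size), and the bound simply mirrors the "> 4 billion" example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <optional>
#include <string>

struct KernelUsage {
  uint32_t SGPRCount;
  uint32_t VGPRCount;
  uint32_t LDSBytes;  // from .group_segment_fixed_size
};

// Narrow a 64-bit metadata value, failing instead of silently truncating.
static std::optional<uint32_t> narrowChecked(uint64_t V) {
  if (V > std::numeric_limits<uint32_t>::max())
    return std::nullopt;
  return static_cast<uint32_t>(V);
}

// Returns std::nullopt if a key is missing or a value is implausibly large.
std::optional<KernelUsage>
readKernelUsage(const std::map<std::string, uint64_t> &Metadata) {
  auto Get = [&](const char *Key) -> std::optional<uint32_t> {
    auto It = Metadata.find(Key);
    if (It == Metadata.end())
      return std::nullopt;
    return narrowChecked(It->second);
  };
  auto S = Get(".sgpr_count");
  auto V = Get(".vgpr_count");
  auto L = Get(".group_segment_fixed_size");
  if (!S || !V || !L)
    return std::nullopt;
  return KernelUsage{*S, *V, *L};
}
```

Unlike a 0,0,0,0 default, a missing or oversized entry here surfaces as an explicit failure the caller must handle.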

Mar 18 2021, 9:51 AM · Restricted Project

Mar 17 2021

dhruvachak requested review of D98832: [libomptarget] Tune the number of teams and threads for kernel launch.
Mar 17 2021, 5:52 PM · Restricted Project, Restricted Project
dhruvachak requested review of D98829: [libomptarget] Add register usage info to kernel metadata.
Mar 17 2021, 5:32 PM · Restricted Project