
rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (375 w, 6 d)

Recent Activity

Today

rampitec added a comment to D104331: [AMDGPU] Use performOptimizedStructLayout for LDS sort.

PSDB passed.

Thu, Jun 17, 1:51 PM · Restricted Project
rampitec added a comment to D104316: [AMDGPU] Propagate LDS align into to instructions.

PSDB passed.

Thu, Jun 17, 12:07 PM · Restricted Project
rampitec updated the diff for D104316: [AMDGPU] Propagate LDS align into to instructions.

Fixed formatting.

Thu, Jun 17, 10:51 AM · Restricted Project
rampitec added inline comments to D104316: [AMDGPU] Propagate LDS align into to instructions.
Thu, Jun 17, 10:50 AM · Restricted Project

Yesterday

rampitec committed rG0a07343e34fc: [AMDGPU] Fixed constexpr expansion to handle multiple uses (authored by rampitec).
[AMDGPU] Fixed constexpr expansion to handle multiple uses
Wed, Jun 16, 4:58 PM
rampitec closed D104425: [AMDGPU] Fixed constexpr expansion to handle multiple uses.
Wed, Jun 16, 4:58 PM · Restricted Project
rampitec added inline comments to D104425: [AMDGPU] Fixed constexpr expansion to handle multiple uses.
Wed, Jun 16, 4:56 PM · Restricted Project
rampitec added a comment to D103655: [AMDGPU] Handle constant LDS uses from different kernels.

Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll

LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50

$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll

Thanks, reproduced.

Wed, Jun 16, 4:02 PM · Restricted Project
rampitec requested review of D104425: [AMDGPU] Fixed constexpr expansion to handle multiple uses.
Wed, Jun 16, 4:02 PM · Restricted Project
rampitec added a comment to D103655: [AMDGPU] Handle constant LDS uses from different kernels.

Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll

LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50

$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll
Wed, Jun 16, 3:11 PM · Restricted Project
rampitec added a comment to D103655: [AMDGPU] Handle constant LDS uses from different kernels.

FYI. I've just got an assertion in the pass. I'll post a reduced reproducer when I have it.
Meanwhile, here's the crash info:

F0616 14:20:09.488221 1150352 logging.cc:107] assert.h assertion failed at third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:269 in typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::PointerType, Y = llvm::Type]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!"
*** Check failure stack trace: ***
    @     0x55555d4253df  absl::logging_internal::LogMessage::Die()
    @     0x55555d424e54  absl::logging_internal::LogMessage::SendToLog()
    @     0x55555d424b7f  absl::logging_internal::LogMessage::Flush()
    @     0x55555d425ae9  absl::logging_internal::LogMessageFatal::~LogMessageFatal()
    @     0x55555d4239c4  __assert_fail
    @     0x555558fddf77  llvm::GetElementPtrInst::Create()
    @     0x55555d1077d8  llvm::ConstantExpr::getAsInstruction()
    @     0x55555d201d42  llvm::convertConstantExprsToInstructions()
    @     0x55555d20125c  llvm::convertConstantExprsToInstructions()
    @     0x55555b47e1bd  llvm::AMDGPU::replaceConstantUsesInFunction()
    @     0x55555b2a40d5  (anonymous namespace)::AMDGPULowerModuleLDS::processUsedLDS()
    @     0x55555b2a31df  (anonymous namespace)::AMDGPULowerModuleLDS::runOnModule()
    @     0x55555d1bff34  llvm::legacy::PassManagerImpl::run()
    @     0x555558fae1b9  (anonymous namespace)::EmitAssemblyHelper::EmitAssemblyWithNewPassManager()
    @     0x555558fa9537  clang::EmitBackendOutput()
    @     0x555558fa66a5  clang::BackendConsumer::HandleTranslationUnit()
    @     0x555559c210b4  clang::ParseAST()
    @     0x5555599cb106  clang::FrontendAction::Execute()
    @     0x55555993fdcf  clang::CompilerInstance::ExecuteAction()
    @     0x555558bdbff3  clang::ExecuteCompilerInvocation()
    @     0x555558bcfd54  cc1_main()
    @     0x555558bcd6e7  ExecuteCC1Tool()
    @     0x555558bcd3fd  main
    @     0x7ffff7d29bbd  __libc_start_main
    @     0x555558bca0a9  _start
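
The assertion above comes from LLVM's checked downcast. cast<X>() asserts that isa<X>() holds before converting. So a cast<PointerType>() reached from GetElementPtrInst::Create() aborts when the Type it receives is not a pointer type. Below is a minimal toy model of that hand-rolled RTTI pattern. The class names mirror LLVM's, but they are simplified, hypothetical stand-ins, not LLVM's real classes or headers.

```cpp
#include <cassert>

// Toy stand-ins for llvm::Type / llvm::PointerType (simplified, hypothetical).
class Type {
public:
  enum TypeKind { IntegerTyKind, PointerTyKind };
  explicit Type(TypeKind K) : Kind(K) {}
  TypeKind getKind() const { return Kind; }

private:
  TypeKind Kind;
};

class IntegerType : public Type {
public:
  IntegerType() : Type(IntegerTyKind) {}
  // classof() is what isa<> consults to check the dynamic kind.
  static bool classof(const Type *T) { return T->getKind() == IntegerTyKind; }
};

class PointerType : public Type {
public:
  PointerType() : Type(PointerTyKind) {}
  static bool classof(const Type *T) { return T->getKind() == PointerTyKind; }
};

// isa<X>(V): true if V's dynamic kind is X.
template <typename To, typename From> bool isa(const From *V) {
  return To::classof(V);
}

// cast<X>(V): checked downcast; asserts on a kind mismatch, which is the
// kind of failure reported in the trace above.
template <typename To, typename From> To *cast(From *V) {
  assert(isa<To>(V) && "cast<Ty>() argument of incompatible type!");
  return static_cast<To *>(V);
}

// dyn_cast<X>(V): returns nullptr instead of asserting on a mismatch.
template <typename To, typename From> To *dyn_cast(From *V) {
  return isa<To>(V) ? static_cast<To *>(V) : nullptr;
}
```

In this model, calling cast<PointerType>() on an IntegerType trips the same assertion message. dyn_cast<> is the non-asserting alternative when the kind is not guaranteed.
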
Wed, Jun 16, 2:52 PM · Restricted Project

Tue, Jun 15

rampitec requested review of D104331: [AMDGPU] Use performOptimizedStructLayout for LDS sort.
Tue, Jun 15, 4:06 PM · Restricted Project
rampitec added inline comments to D104316: [AMDGPU] Propagate LDS align into to instructions.
Tue, Jun 15, 12:43 PM · Restricted Project
rampitec requested review of D104316: [AMDGPU] Propagate LDS align into to instructions.
Tue, Jun 15, 12:14 PM · Restricted Project
rampitec committed rGa11880468e55: [AMDGPU] Fix lds superalign test. NFC. (authored by rampitec).
[AMDGPU] Fix lds superalign test. NFC.
Tue, Jun 15, 11:13 AM
rampitec accepted D104293: [AMDGPU] Set more flags on Real instructions.
Tue, Jun 15, 10:47 AM · Restricted Project
rampitec accepted D104306: [AMDGPU] Use defvar in SOPInstructions.td. NFC.
Tue, Jun 15, 10:46 AM · Restricted Project

Mon, Jun 14

rampitec accepted D104219: [AMDGPU] Limit runs of fixLdsBranchVmemWARHazard.

LGTM apart from tidy comments on variable names.

Mon, Jun 14, 12:55 PM · Restricted Project
rampitec accepted D104241: AMDGPU: Fix assert on m0_lo16/m0_hi16.
Mon, Jun 14, 12:50 PM · Restricted Project
rampitec accepted D104181: AMDGPU: Fix infinite loop in DAG combine with fneg + fma.

LGTM

Mon, Jun 14, 12:40 PM · Restricted Project
rampitec accepted D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

LGTM. Please allow at least 24 hours for others to comment too before submitting. In the meantime, PSDB and ePSDB runs would be useful.

Mon, Jun 14, 12:36 PM · Restricted Project
rampitec added inline comments to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Mon, Jun 14, 12:11 PM · Restricted Project

Thu, Jun 10

rampitec added inline comments to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Thu, Jun 10, 11:31 AM · Restricted Project
rampitec added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

Insert @llvm.amdgcn.mbcnt.hi() for wave64 mode.

Thu, Jun 10, 11:17 AM · Restricted Project

Wed, Jun 9

rampitec added inline comments to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Wed, Jun 9, 12:28 PM · Restricted Project
rampitec added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

Implemented approach (2). Here we actually do not need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)). Irrespective of wave64 or wave32, __builtin_amdgcn_mbcnt_lo(~0u, 0u) is enough. The reason is that we only want to identify lane 0. On the other hand, for wave64, if we wanted to identify any lane greater than 31, then we would need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)).

As far as I understand, mbcnt_lo will return 0 for any thread >= 32, so you still need to use mbcnt_hi.

@b-sumner why do you suggest nesting the hi and lo calls? I think it should be (__builtin_amdgcn_mbcnt_lo(~0u, 0u) + __builtin_amdgcn_mbcnt_hi(~0u, 0u)) == 0.
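
The nested and summed forms discussed above can be compared with a small host-side model. The model assumes the masked-bit-count semantics from the AMDGPU ISA pseudocode:

- ThreadMask = (1 << lane) - 1 over 64 bits.
- mbcnt_lo counts mask bits [31:0].
- mbcnt_hi counts mask bits [63:32].

The function names and the explicit lane parameter are illustrative only. They are not the clang builtins' signatures.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Assumed semantics: bits of the 64-bit lane mask strictly below this lane.
static uint64_t threadMask(unsigned Lane) {
  return Lane >= 64 ? ~0ull : ((1ull << Lane) - 1);
}

// Model of mbcnt_lo(Mask, Src): Src + popcount(Mask & ThreadMask[31:0]).
uint32_t mbcntLo(unsigned Lane, uint32_t Mask, uint32_t Src) {
  uint32_t Below = static_cast<uint32_t>(threadMask(Lane));
  return Src + static_cast<uint32_t>(std::bitset<32>(Mask & Below).count());
}

// Model of mbcnt_hi(Mask, Src): Src + popcount(Mask & ThreadMask[63:32]).
uint32_t mbcntHi(unsigned Lane, uint32_t Mask, uint32_t Src) {
  uint32_t Below = static_cast<uint32_t>(threadMask(Lane) >> 32);
  return Src + static_cast<uint32_t>(std::bitset<32>(Mask & Below).count());
}

// Nested form: mbcnt_hi(~0u, mbcnt_lo(~0u, 0u)).
uint32_t laneIdNested(unsigned Lane) {
  return mbcntHi(Lane, ~0u, mbcntLo(Lane, ~0u, 0u));
}

// Summed form: mbcnt_lo(~0u, 0u) + mbcnt_hi(~0u, 0u).
uint32_t laneIdSum(unsigned Lane) {
  return mbcntLo(Lane, ~0u, 0u) + mbcntHi(Lane, ~0u, 0u);
}
```

Under this model, both forms return the lane id for every lane of a wave64, so "== 0" holds only for lane 0. The model's mbcnt_lo(~0u, 0u) on its own evaluates to 32 for lanes 32 through 63.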

Wed, Jun 9, 11:55 AM · Restricted Project
rampitec added a comment to D103951: [NFC][Scheduler] Refactor tryCandidate to return boolean.

We may change this logic further: return true only if TopCand.Reason is set (not NoCand anymore)

This seems to be better. Returning true with NoCand is misleading to me.
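
The convention agreed on above can be sketched as follows: report success only when the candidate's Reason is no longer NoCand. Apart from the NoCand name, everything here is hypothetical. The struct fields and the two-step heuristic are invented for illustration and are far simpler than the real scheduler's comparison.

```cpp
#include <cassert>

// Hypothetical reasons; only NoCand mirrors the name used in the discussion.
enum class CandReason { NoCand, RegPressure, Latency };

// Hypothetical candidate: lower PressureDelta is better, higher Latency is better.
struct SchedCandidate {
  CandReason Reason;
  int PressureDelta;
  int Latency;
};

// Returns true only if TryCand beats Cand, i.e. TryCand.Reason ends up set.
bool tryCandidate(const SchedCandidate &Cand, SchedCandidate &TryCand) {
  TryCand.Reason = CandReason::NoCand;
  if (TryCand.PressureDelta < Cand.PressureDelta)
    TryCand.Reason = CandReason::RegPressure;
  else if (TryCand.PressureDelta == Cand.PressureDelta &&
           TryCand.Latency > Cand.Latency)
    TryCand.Reason = CandReason::Latency;
  return TryCand.Reason != CandReason::NoCand;
}
```

Because the boolean is derived from Reason, a caller can never observe true paired with NoCand.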

Wed, Jun 9, 11:45 AM · Restricted Project
rampitec added inline comments to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Wed, Jun 9, 11:32 AM · Restricted Project
rampitec added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

Implemented approach (2). Here we actually do not need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)). Irrespective of wave64 or wave32, __builtin_amdgcn_mbcnt_lo(~0u, 0u) is enough. The reason is that we only want to identify lane 0. On the other hand, for wave64, if we wanted to identify any lane greater than 31, then we would need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)).

Wed, Jun 9, 11:26 AM · Restricted Project
rampitec accepted D103431: [AMDGPU] Fix missing lowering of LDS used in global scope.

Looks good to me now.

Wed, Jun 9, 10:27 AM · Restricted Project
rampitec added inline comments to D103431: [AMDGPU] Fix missing lowering of LDS used in global scope.
Wed, Jun 9, 10:20 AM · Restricted Project

Tue, Jun 8

rampitec accepted D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`.

Thanks!

Tue, Jun 8, 12:39 PM · Restricted Project
rampitec added inline comments to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Tue, Jun 8, 12:39 PM · Restricted Project
rampitec requested changes to D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`.

Oops, too soon. Please add "amdgpu" to the switch.

Tue, Jun 8, 12:08 PM · Restricted Project
rampitec accepted D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`.

LGTM as a short term w/a.

Tue, Jun 8, 12:07 PM · Restricted Project
rampitec added a comment to D103431: [AMDGPU] Fix missing lowering of LDS used in global scope.

Can you add tests for GlobalAlias? It is not immediately clear if these are handled properly.

Tue, Jun 8, 11:59 AM · Restricted Project
rampitec added a comment to D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`.
Tue, Jun 8, 11:59 AM · Restricted Project
rampitec added a comment to D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`.
Tue, Jun 8, 11:58 AM · Restricted Project

Mon, Jun 7

rampitec accepted D103663: [AMDGPU] Add gfx1013 target.
Mon, Jun 7, 11:43 PM · Restricted Project, Restricted Project, Restricted Project
rampitec added inline comments to D103663: [AMDGPU] Add gfx1013 target.
Mon, Jun 7, 5:13 PM · Restricted Project, Restricted Project, Restricted Project
rampitec committed rG05289dfb6246: [AMDGPU] Handle constant LDS uses from different kernels (authored by rampitec).
[AMDGPU] Handle constant LDS uses from different kernels
Mon, Jun 7, 3:44 PM
rampitec closed D103655: [AMDGPU] Handle constant LDS uses from different kernels.
Mon, Jun 7, 3:44 PM · Restricted Project
rampitec updated the diff for D103655: [AMDGPU] Handle constant LDS uses from different kernels.

Rebased.

Mon, Jun 7, 3:30 PM · Restricted Project
rampitec added inline comments to D103663: [AMDGPU] Add gfx1013 target.
Mon, Jun 7, 3:24 PM · Restricted Project, Restricted Project, Restricted Project
rampitec accepted D103817: [AMDGPU] Introduce command line switch to control super aligning of LDS.
Mon, Jun 7, 3:21 PM · Restricted Project
rampitec added a comment to D103800: [AMDGPU] Add VReg_192 support for MIMG instructions.

Maybe this is stating the obvious, but you could add v6i32 and v6f32 types. @tpr did that before for v3 and v5 vectors, specifically for AMDGPU image ops. See D58901 (and a bunch of related commits you can find with git log --grep v5f32).

Mon, Jun 7, 1:35 PM · Restricted Project
rampitec added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

You probably need to wrap all prologue LDS stores into a block that executes them only from lane 0, and add a barrier after it. @t-tye, correct me if I am wrong.

But I remember that we had decided to avoid a barrier here, and instead just make sure that each thread within each wave executes the store instructions? In any case, let me clarify it with @t-tye and @b-sumner.

I do not remember, but we can probably omit it since it is a single store to read-only memory. Anyway, a confirmation from @t-tye would be nice.

Mon, Jun 7, 1:33 PM · Restricted Project
rampitec updated subscribers of D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Mon, Jun 7, 1:31 PM · Restricted Project
rampitec added inline comments to D103817: [AMDGPU] Introduce command line switch to control super aligning of LDS.
Mon, Jun 7, 1:30 PM · Restricted Project
rampitec added a comment to D103431: [AMDGPU] Fix missing lowering of LDS used in global scope.

This will probably need a rebase on top of D103655, which should presumably simplify the shouldLowerLDSToStruct() logic?

Mon, Jun 7, 12:59 PM · Restricted Project
rampitec accepted D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions.

Thanks! It makes it now I believe. D103655 has @timestwo() which exploits partial conversion change. LGTM.

Mon, Jun 7, 12:47 PM · Restricted Project
rampitec added inline comments to D103655: [AMDGPU] Handle constant LDS uses from different kernels.
Mon, Jun 7, 12:45 PM · Restricted Project
rampitec updated the diff for D103655: [AMDGPU] Handle constant LDS uses from different kernels.

Rebased.

Mon, Jun 7, 12:43 PM · Restricted Project
rampitec added inline comments to D103663: [AMDGPU] Add gfx1013 target.
Mon, Jun 7, 11:56 AM · Restricted Project, Restricted Project, Restricted Project
rampitec added inline comments to D103655: [AMDGPU] Handle constant LDS uses from different kernels.
Mon, Jun 7, 10:25 AM · Restricted Project
rampitec updated the diff for D103655: [AMDGPU] Handle constant LDS uses from different kernels.

Changed initializer style.

Mon, Jun 7, 10:24 AM · Restricted Project
rampitec added inline comments to D103663: [AMDGPU] Add gfx1013 target.
Mon, Jun 7, 10:11 AM · Restricted Project, Restricted Project, Restricted Project

Fri, Jun 4

rampitec added inline comments to D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions.
Fri, Jun 4, 2:49 PM · Restricted Project
rampitec added inline comments to D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions.
Fri, Jun 4, 2:47 PM · Restricted Project
rampitec added inline comments to D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions.
Fri, Jun 4, 2:37 PM · Restricted Project
rampitec added inline comments to D103655: [AMDGPU] Handle constant LDS uses from different kernels.
Fri, Jun 4, 2:29 PM · Restricted Project
rampitec added inline comments to D103655: [AMDGPU] Handle constant LDS uses from different kernels.
Fri, Jun 4, 2:16 PM · Restricted Project
rampitec added inline comments to D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions.
Fri, Jun 4, 1:25 PM · Restricted Project
rampitec added inline comments to D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions.
Fri, Jun 4, 1:08 PM · Restricted Project
rampitec added inline comments to D103663: [AMDGPU] Add gfx1013 target.
Fri, Jun 4, 12:27 PM · Restricted Project, Restricted Project, Restricted Project
rampitec accepted D103672: [AMDGPU] Add v5f32/VReg_160 support for MIMG instructions.

I believe it was long needed.

Fri, Jun 4, 12:01 PM · Restricted Project
rampitec added inline comments to D103431: [AMDGPU] Fix missing lowering of LDS used in global scope.
Fri, Jun 4, 11:58 AM · Restricted Project
rampitec added a comment to D103655: [AMDGPU] Handle constant LDS uses from different kernels.

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a, but it is better to solve it sooner rather than later.

I am not sure if we can completely eliminate module lds. My understanding is that it is still required to handle LDS used within non-kernel functions. But instead of dealing with LDS directly, it should deal with pointers, hence the patch https://reviews.llvm.org/D103225.

Fri, Jun 4, 11:44 AM · Restricted Project
rampitec updated the diff for D103655: [AMDGPU] Handle constant LDS uses from different kernels.

Addressed review comments.

Fri, Jun 4, 11:41 AM · Restricted Project
rampitec added a comment to D103663: [AMDGPU] Add gfx1013 target.

You need to replace HasGFX10_BEncoding with HasGFX10_AEncoding in the BVH and IMAGE_MSAA_LOAD_X. You also need to update llvm.amdgcn.image.msaa.load.x.ll test to include gfx1013.

Fri, Jun 4, 11:09 AM · Restricted Project, Restricted Project, Restricted Project

Thu, Jun 3

rampitec added inline comments to D103431: [AMDGPU] Fix missing lowering of LDS used in global scope.
Thu, Jun 3, 10:26 PM · Restricted Project
rampitec added a reviewer for D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions.: tra.
Thu, Jun 3, 6:47 PM · Restricted Project
rampitec updated subscribers of D103655: [AMDGPU] Handle constant LDS uses from different kernels.
Thu, Jun 3, 4:35 PM · Restricted Project
rampitec added a comment to D103655: [AMDGPU] Handle constant LDS uses from different kernels.

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

Thu, Jun 3, 4:33 PM · Restricted Project
rampitec added a comment to D103655: [AMDGPU] Handle constant LDS uses from different kernels.

@hsmhsm It depends on the convertConstantExprsToInstructions() you have implemented in D103225. I would appreciate it if you extracted the helper into a separate review. Kudos for the helper, BTW.

Thu, Jun 3, 4:09 PM · Restricted Project
rampitec requested review of D103655: [AMDGPU] Handle constant LDS uses from different kernels.
Thu, Jun 3, 4:07 PM · Restricted Project
rampitec added inline comments to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Thu, Jun 3, 1:32 PM · Restricted Project
rampitec accepted D103261: [AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.

I like the ISA changes; I'd say let's try it. If needed, we could restrict it later with a total LDS consumption calculation.
@hsmhsm please allow a day before submitting in case people have strong objections.

Thu, Jun 3, 1:29 PM · Restricted Project
rampitec added inline comments to D103431: [AMDGPU] Fix missing lowering of LDS used in global scope.
Thu, Jun 3, 1:23 PM · Restricted Project

Tue, Jun 1

rampitec abandoned D103213: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.
Tue, Jun 1, 5:09 PM · Restricted Project
rampitec committed rG9e2e49328f19: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a (authored by rampitec).
[AMDGPU] All GWS instructions need aligned VGPR on gfx90a
Tue, Jun 1, 5:08 PM
rampitec closed D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.
Tue, Jun 1, 5:08 PM · Restricted Project
rampitec updated the diff for D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.

Added comment.

Tue, Jun 1, 4:51 PM · Restricted Project
rampitec updated the diff for D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.

Same code added to selectDSGWSIntrinsic().

Tue, Jun 1, 4:16 PM · Restricted Project
rampitec added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

You probably need to wrap all prologue LDS stores into a block that executes them only from lane 0, and add a barrier after it. @t-tye, correct me if I am wrong.

But I remember that we had decided to avoid a barrier here, and instead just make sure that each thread within each wave executes the store instructions? In any case, let me clarify it with @t-tye and @b-sumner.

Tue, Jun 1, 2:54 PM · Restricted Project
rampitec added inline comments to D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.
Tue, Jun 1, 2:45 PM · Restricted Project
rampitec accepted D103322: [AMDGPU] Use s_add_i32 for address additions.

LGTM modulo Jay's comments.

Tue, Jun 1, 2:42 PM · Restricted Project
rampitec updated the diff for D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.

Fix tidy warning.

Tue, Jun 1, 2:22 PM · Restricted Project
rampitec added inline comments to D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.
Tue, Jun 1, 2:20 PM · Restricted Project
rampitec updated the diff for D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.

Added verifier check.

Tue, Jun 1, 1:51 PM · Restricted Project
rampitec added inline comments to D103213: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.
Tue, Jun 1, 10:40 AM · Restricted Project

Thu, May 27

rampitec updated the diff for D103213: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.

Fix tidy warnings.

Thu, May 27, 1:38 PM · Restricted Project
rampitec added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

You probably need to wrap all prologue LDS stores into a block that executes them only from lane 0, and add a barrier after it. @t-tye, correct me if I am wrong.

Thu, May 27, 1:30 PM · Restricted Project
rampitec added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

Note the failure: LLVM.CodeGen/AMDGPU::llc-pipeline.ll

Thu, May 27, 12:48 PM · Restricted Project
rampitec added a comment to D103138: [AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees.

Generally LGTM. Please wait for other reviewers too.

Thu, May 27, 12:45 PM · Restricted Project
rampitec added a comment to D103261: [AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.

Do we actually see underaligned LDS variables in practice? What if the user is forcing a lower alignment to prioritize space utilization?

Thu, May 27, 12:39 PM · Restricted Project
rampitec added a comment to D103261: [AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.

What's the use case for this? It will increase memory use if it increases the alignment of any variables.
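
The memory cost mentioned above comes from padding: raising one variable's alignment can push later variables to higher offsets. The sketch below uses a hypothetical LDSVar descriptor, lays variables out in the given order, and reports the end offset. The descending-alignment sort is a simple illustrative heuristic, not LLVM's performOptimizedStructLayout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical descriptor of an LDS variable (bytes; Align is a power of two).
struct LDSVar {
  uint64_t Size;
  uint64_t Align;
};

// Round Offset up to the next multiple of Align.
static uint64_t alignTo(uint64_t Offset, uint64_t Align) {
  return (Offset + Align - 1) / Align * Align;
}

// Places the variables in the given order, padding each one to its
// alignment, and returns the end offset of the last variable.
uint64_t layoutSize(const std::vector<LDSVar> &Vars) {
  uint64_t Offset = 0;
  for (const LDSVar &V : Vars)
    Offset = alignTo(Offset, V.Align) + V.Size;
  return Offset;
}

// Illustrative heuristic only: most-aligned variables first.
std::vector<LDSVar> sortedByAlignDesc(std::vector<LDSVar> Vars) {
  std::stable_sort(Vars.begin(), Vars.end(),
                   [](const LDSVar &A, const LDSVar &B) { return A.Align > B.Align; });
  return Vars;
}
```

Take three 4-byte variables. Raising the middle one's alignment from 4 to 16 bytes moves the end offset from 12 to 24 bytes. Reordering by alignment brings it back to 12.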

Thu, May 27, 12:24 PM · Restricted Project
rampitec added a comment to D103230: [AMDGPU] Add options to disable NSA for BVH instructions.

Any reason to force SA?

Thu, May 27, 10:02 AM · Restricted Project
rampitec added a comment to D103261: [AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.

Thanks! I'd suggest combining all fix-lds-alignmen*.ll tests into one. Just use different names for different blocks of variables so that each block is used in only one kernel.

Thu, May 27, 9:55 AM · Restricted Project
rampitec added a comment to D103197: [AMDGPU] All GWS instructions need aligned VGPR on gfx90a.

JBTW, this patch directly reflects what actually happens in HW. Even though these 3 instructions read 32 bits as a source, internally they request a 64-bit read, which in turn triggers the alignment requirement. I assume this is really modeled as "read a dword as an operand and have an implicit read of a superreg".
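
The modeling described above implies a simple legality rule on gfx90a: a dword operand plus an implicit read of the 64-bit superregister must start at an even VGPR, because 64-bit VGPR tuples must be even-aligned there. Below is a minimal sketch of that check. The function and its parameters are hypothetical and are not the verifier code added in this patch.

```cpp
#include <cassert>

// Hypothetical legality check for the data operand of a GWS instruction.
// The operand is a single VGPR vN, but (per the comment above) the hardware
// reads the 64-bit pair vN:vN+1, so on gfx90a N must be even.
bool isLegalGWSDataVGPR(unsigned VGPRIndex, bool IsGFX90A) {
  if (!IsGFX90A)
    return true; // No tuple alignment requirement is modeled for other targets.
  return VGPRIndex % 2 == 0;
}
```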

Thu, May 27, 2:04 AM · Restricted Project