
Oct 18 2022

jpages accepted D134641: [AMDGPU][Backend] Fix user-after-free in AMDGPUReleaseVGPRs::isLastVGPRUseVMEMStore.

LGTM, thanks!

Oct 18 2022, 9:42 AM · Restricted Project, Restricted Project

Jun 29 2022

jpages accepted D128270: [AMDGPU] New AMDGPUInsertDelayAlu pass.
Jun 29 2022, 10:43 AM · Restricted Project, Restricted Project

Jun 23 2022

jpages accepted D128442: [AMDGPU] GFX11: automatically release VGPRs at the end of the shader.

LGTM.

Jun 23 2022, 9:08 AM · Restricted Project, Restricted Project

Jun 2 2022

jpages committed rG2dfe41944658: [AMDGPU] Improve codegen of extractelement/insertelement in some cases (authored by jpages).
Jun 2 2022, 2:10 PM · Restricted Project, Restricted Project
jpages closed D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.
Jun 2 2022, 2:10 PM · Restricted Project, Restricted Project, Unknown Object (Project)

Jun 1 2022

jpages added a comment to D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.

I wouldn’t really expect there to be a performance difference between gpr indexing and movrel sequences. Gfx8 has both, so you could directly compare there.

Jun 1 2022, 9:56 AM · Restricted Project, Restricted Project, Unknown Object (Project)

May 31 2022

jpages updated the diff for D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.

Rebased, keep the current version for GFX9 and only apply this patch on architectures with movrel.

May 31 2022, 3:54 PM · Restricted Project, Restricted Project, Unknown Object (Project)

May 30 2022

jpages added a comment to D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.

Any performance numbers? The 8 element case was driven by a specific customer program and the performance of the cmp/select was better than movrel.

I don't know why that would be. Maybe the performance characteristics are different on GFX10+ compared to GFX9.

Also on GFX10+ sgpr usage does not affect occupancy, so perhaps the heuristic could be tweaked to make it more likely to use s_movrel (not v_movrel) on GFX10+.

I will try to get some performance numbers on specific games. Do you know if this performance problem was specific to an architecture?
Like Jay said, I could tweak the heuristic for this and only generate it for GFX10+.

All the tests were done on gfx9 flavors. It is also worth noting gfx9 does not have movrel, but uses s_set_gpr_idx_on instead, which may be the difference.

I.e. it may be linked to FeatureMovrel.

May 30 2022, 4:26 PM · Restricted Project, Restricted Project, Unknown Object (Project)
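
To make the two strategies compared in this thread concrete, here is a minimal C++ sketch of the difference between a compare/select chain and a single dynamically indexed read on an 8-element vector. It is an illustration only, not code from the patch: the function names `extractSelectChain` and `extractIndexed` are hypothetical, and the mapping to hardware instructions mentioned in the comments is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration, not LLVM code: two ways to read element `idx`
// from an 8-element vector whose elements live in separate registers.

// Compare/select chain: the index is compared against each constant position
// and the matching element is selected. Assumed to be roughly analogous to
// the cmp/select lowering discussed above. An out-of-range index yields v[0].
std::uint32_t extractSelectChain(const std::array<std::uint32_t, 8> &v,
                                 std::size_t idx) {
  std::uint32_t result = v[0];
  for (std::size_t i = 1; i < v.size(); ++i)
    result = (idx == i) ? v[i] : result;
  return result;
}

// Indexed read: one access through a runtime index. Assumed to be roughly
// analogous to a movrel or s_set_gpr_idx_on based lowering.
std::uint32_t extractIndexed(const std::array<std::uint32_t, 8> &v,
                             std::size_t idx) {
  assert(idx < v.size() && "index must be in range");
  return v[idx];
}
```

Both functions return the same value for any in-range index; the discussion above is about which form is cheaper on a given architecture, which this sketch does not measure.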

May 26 2022

jpages added a comment to D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.

Any performance numbers? The 8 element case was driven by a specific customer program and the performance of the cmp/select was better than movrel.

I don't know why that would be. Maybe the performance characteristics are different on GFX10+ compared to GFX9.

Also on GFX10+ sgpr usage does not affect occupancy, so perhaps the heuristic could be tweaked to make it more likely to use s_movrel (not v_movrel) on GFX10+.

May 26 2022, 7:35 AM · Restricted Project, Restricted Project, Unknown Object (Project)

May 25 2022

jpages added a reviewer for D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases: rampitec.
May 25 2022, 12:51 PM · Restricted Project, Restricted Project, Unknown Object (Project)
jpages updated the diff for D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.

Updated a testcase

May 25 2022, 12:46 PM · Restricted Project, Restricted Project, Unknown Object (Project)
jpages retitled D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases from [AMDGPU] Improve codegen of extractelement in some cases to [AMDGPU] Improve codegen of extractelement/insertelement in some cases.
May 25 2022, 9:24 AM · Restricted Project, Restricted Project, Unknown Object (Project)
jpages requested review of D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.
May 25 2022, 9:21 AM · Restricted Project, Restricted Project, Unknown Object (Project)

Feb 11 2022

jpages committed rGdcb2da13f16e: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode (authored by jpages).
Feb 11 2022, 9:09 AM
jpages closed D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 11 2022, 9:09 AM · Restricted Project, Unknown Object (Project)

Feb 10 2022

jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 10 2022, 5:33 PM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 10 2022, 4:30 PM · Restricted Project, Unknown Object (Project)

Feb 8 2022

jpages added inline comments to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 8 2022, 1:42 PM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 8 2022, 1:35 PM · Restricted Project, Unknown Object (Project)

Feb 7 2022

jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Rebased following Jay's comments.

Feb 7 2022, 8:50 AM · Restricted Project, Unknown Object (Project)

Feb 3 2022

jpages added inline comments to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 3 2022, 6:12 PM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 3 2022, 6:11 PM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Updated a failing test on AArch64.

Feb 3 2022, 3:32 PM · Restricted Project, Unknown Object (Project)

Feb 1 2022

jpages added inline comments to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Feb 1 2022, 8:33 AM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Rebased following Jay's comments.

Feb 1 2022, 8:26 AM · Restricted Project, Unknown Object (Project)

Jan 31 2022

jpages added a comment to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Rebased. As suggested, I added a target-independent ISD opcode.

Jan 31 2022, 2:29 PM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Jan 31 2022, 2:21 PM · Restricted Project, Unknown Object (Project)

Jan 13 2022

jpages added inline comments to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Jan 13 2022, 1:46 PM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

The intrinsic has the IntrNoMem attribute, and is represented by INTRINSIC_WO_CHAIN in the DAG.

Jan 13 2022, 1:40 PM · Restricted Project, Unknown Object (Project)

Jan 11 2022

jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

I changed the patch to have only one intrinsic with a rounding mode parameter as suggested. This required many modifications in the frontend parts of LLVM.

Jan 11 2022, 4:44 PM · Restricted Project, Unknown Object (Project)

Dec 2 2021

jpages added inline comments to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Dec 2 2021, 9:35 AM · Restricted Project, Unknown Object (Project)

Dec 1 2021

jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Added G_FPTRUNC_ROUND_UPWARD/G_FPTRUNC_ROUND_DOWNWARD opcodes between the intrinsics and the pseudo instructions. Added a custom ISD node for the DAG version.

Dec 1 2021, 2:56 PM · Restricted Project, Unknown Object (Project)

Oct 20 2021

jpages added a comment to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

The consequence of this change is the use of setreg instead of s_round_mode in the codegen. But it's probably better to not reinvent the wheel as this pass is already optimized to not insert too many setreg.

Good point. s_round_mode/s_denorm_mode are new in GFX10, so they did not exist when this pass was written. Do you think the pass could be improved to emit s_round_mode/s_denorm_mode instead of s_setreg whenever it only needs to change the rounding/denormal bits of the mode register? That could be a separate patch.

Oct 20 2021, 12:14 PM · Restricted Project, Unknown Object (Project)
jpages added inline comments to D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Oct 20 2021, 12:12 PM · Restricted Project, Unknown Object (Project)

Oct 14 2021

jpages committed rGe4e48e2f025b: [AMDGPU] Add more tests for build_vector (authored by jpages).
Oct 14 2021, 8:55 AM
jpages closed D111652: [AMDGPU] Add more tests for build_vector.
Oct 14 2021, 8:55 AM · Restricted Project

Oct 13 2021

jpages updated the diff for D111652: [AMDGPU] Add more tests for build_vector.

It was possible without the script; the diff is smaller that way, which is probably better.

Oct 13 2021, 12:41 PM · Restricted Project
jpages updated the diff for D111652: [AMDGPU] Add more tests for build_vector.

Rebased: updated the test with the script instead of manually.

Oct 13 2021, 8:01 AM · Restricted Project
jpages added inline comments to D111652: [AMDGPU] Add more tests for build_vector.
Oct 13 2021, 7:57 AM · Restricted Project
jpages updated the diff for D111652: [AMDGPU] Add more tests for build_vector.

Rebased without -LABEL in --check-prefixes

Oct 13 2021, 7:21 AM · Restricted Project

Oct 12 2021

jpages requested review of D111652: [AMDGPU] Add more tests for build_vector.
Oct 12 2021, 9:13 AM · Restricted Project

Oct 8 2021

jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Rebased based on Jay's comments.

Oct 8 2021, 9:48 AM · Restricted Project, Unknown Object (Project)

Oct 4 2021

jpages updated the diff for D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Rebased based on previous comments.

Oct 4 2021, 1:16 PM · Restricted Project, Unknown Object (Project)

Sep 27 2021

jpages requested review of D110579: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.
Sep 27 2021, 1:17 PM · Restricted Project, Unknown Object (Project)

Jun 3 2021

jpages added a comment to D103344: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16.

Probably should also add a globalisel test just in case

Jun 3 2021, 2:25 PM · Restricted Project
jpages committed rG37821155c972: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16 (authored by jpages).
Jun 3 2021, 1:41 PM
jpages closed D103344: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16.
Jun 3 2021, 1:41 PM · Restricted Project

Jun 2 2021

jpages updated the diff for D103344: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16.

Renamed the test file with a better name.

Jun 2 2021, 8:46 AM · Restricted Project

Jun 1 2021

jpages updated the diff for D103344: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16.

Rebased, changed the test with bugpoint/opt to a much simpler version.

Jun 1 2021, 12:27 PM · Restricted Project

May 31 2021

jpages updated the diff for D103344: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16.
May 31 2021, 1:01 PM · Restricted Project

May 28 2021

jpages added reviewers for D103344: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16: arsenm, rampitec.
May 28 2021, 2:50 PM · Restricted Project
jpages requested review of D103344: [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16.
May 28 2021, 2:50 PM · Restricted Project

May 11 2021

jpages updated the diff for D98081: [AMDGPU] Improve Codegen for build_vector.
May 11 2021, 12:15 PM · Restricted Project, Unknown Object (Project)

May 10 2021

jpages added inline comments to D98081: [AMDGPU] Improve Codegen for build_vector.
May 10 2021, 3:58 PM · Restricted Project, Unknown Object (Project)
jpages updated the diff for D98081: [AMDGPU] Improve Codegen for build_vector.

Added the VOP3Mods in the pattern and associated tests, thanks for the suggestion!

May 10 2021, 3:45 PM · Restricted Project, Unknown Object (Project)

May 6 2021

jpages updated the diff for D98081: [AMDGPU] Improve Codegen for build_vector.

Rebased to do it in TableGen; it should be cleaner now. I removed the overly complex pattern with the conversions from i16 to f16.
Instead, the v_pack_b32_f16 instruction will be used only for f16, when we are sure the two inputs have already been flushed if needed.

May 6 2021, 8:55 AM · Restricted Project, Unknown Object (Project)

May 4 2021

jpages added a comment to D101481: [AMDGPU] Select V_CVT_*16_F16 more often.

Thanks for the review. I don't have commit access to the repo, could someone do it for me?

May 4 2021, 1:47 PM · Restricted Project, Unknown Object (Project)

May 3 2021

jpages updated the diff for D101481: [AMDGPU] Select V_CVT_*16_F16 more often.

Updated for code style

May 3 2021, 8:36 AM · Restricted Project, Unknown Object (Project)

Apr 30 2021

jpages updated the diff for D101481: [AMDGPU] Select V_CVT_*16_F16 more often.

Thank you for your inputs.

Apr 30 2021, 3:08 PM · Restricted Project, Unknown Object (Project)

Apr 28 2021

jpages added a reviewer for D101481: [AMDGPU] Select V_CVT_*16_F16 more often: arsenm.
Apr 28 2021, 12:45 PM · Restricted Project, Unknown Object (Project)
jpages requested review of D101481: [AMDGPU] Select V_CVT_*16_F16 more often.
Apr 28 2021, 12:42 PM · Restricted Project, Unknown Object (Project)

Mar 23 2021

jpages updated the diff for D98081: [AMDGPU] Improve Codegen for build_vector.

It appears that this instruction is harder to select than expected on integers.

Mar 23 2021, 12:07 PM · Restricted Project, Unknown Object (Project)

Mar 5 2021

jpages requested review of D98081: [AMDGPU] Improve Codegen for build_vector.
Mar 5 2021, 3:07 PM · Restricted Project, Unknown Object (Project)