jbhateja (Jatin Bhateja)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 23 2017, 9:51 AM (21 w, 6 d)

Recent Activity

Thu, Sep 21

jbhateja updated the diff for D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
  • Updating the test case with more than one use of sqrt / mul.
Thu, Sep 21, 10:58 PM
jbhateja committed rL313964: [X86] Updating the test case for FMF propagation..
[X86] Updating the test case for FMF propagation.
Thu, Sep 21, 10:50 PM
jbhateja closed D38163: [X86] Updating the test case for FMF propagation. by committing rL313964: [X86] Updating the test case for FMF propagation..
Thu, Sep 21, 10:50 PM
jbhateja accepted D38163: [X86] Updating the test case for FMF propagation..
Thu, Sep 21, 10:45 PM
jbhateja created D38163: [X86] Updating the test case for FMF propagation..
Thu, Sep 21, 10:45 PM
jbhateja added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@reviewers, the required revision changes are done; let me know if this can land again.

Thu, Sep 21, 12:21 PM
jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

I've added some more FMF tests at rL313893 which I think this patch will miscompile. Please rebase/update.

As I suggested before, this patch shouldn't try to enable multiple DAG combines with node-level FMF. It's not as straightforward as you might think.

Pick exactly one combine if you want to show that this patch is working as intended. The llvm.muladd intrinsic test that you have here with a target that supports 'fma' (not plain x86) seems like a good choice to me. If we have a strict op in IR, it should produce an fma instruction. If we have a fast op in IR, it should produce the simpler fmul instruction?

My understanding and code changes are based on the LLVM Language Reference Manual's section on fast-math flags (http://llvm.org/docs/LangRef.html#fast-math-flags),

which says, for the nnan flag: "Allow optimizations to assume the arguments and result are not NaN".

Now, in the following case, which you added:

%y = call float @llvm.sqrt.f32(float %x)
%z = fdiv fast float 1.0, %y
ret float %z

We don't have the fast flag on the intrinsic, but the DAG combine for the fdiv sees a fast flag, assumes that the result (%z) and the arguments (the constant and %y) are not NaN, and goes ahead and generates a reciprocal sqrt. If you remove fast from the fdiv and add it to the intrinsic, then the FMF optimization at the fdiv will not kick in.

Could you please let me know what you expected here?

I expect that the sqrt result is strict, i.e., it should use sqrtss if this is x86-64. We're not allowed to use rsqrtss and lose precision on that op.

That said, my memory of exactly how op-level FMF should work is fuzzy. If anyone else remembers or can link to threads where we've discussed this, please feel free to jump in. :)

Thu, Sep 21, 12:16 PM
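The disagreement above is about which node's flags should license the reciprocal-sqrt fold. As a minimal sketch, assuming a hypothetical `FPNode` type with a single `Fast` bit (this is not LLVM's `SDNodeFlags` interface), the difference between consulting one global switch and consulting the flags on both the `fdiv` and the `sqrt` can be modelled as follows. Requiring `Fast` on both nodes is an assumption chosen to match the stated expectation that a strict sqrt must stay `sqrtss`.

```cpp
#include <cassert>

// Hypothetical model of a DAG node that carries its own fast-math state.
// This is not LLVM's SDNodeFlags API; 'Fast' stands in for the IR 'fast' keyword.
struct FPNode {
  bool Fast = false;
};

// Global policy, analogous to testing Options.UnsafeFPMath: the flags on the
// individual nodes are never consulted.
bool rsqrtAllowedGlobal(bool UnsafeFPMath) { return UnsafeFPMath; }

// Node-level policy (assumed): folding 1.0 / sqrt(x) into an rsqrt estimate
// changes the precision of the sqrt as well as the fdiv, so both must be fast.
bool rsqrtAllowedNodeLevel(const FPNode &FDiv, const FPNode &Sqrt) {
  return FDiv.Fast && Sqrt.Fast;
}

// Usage example mirroring the two IR test cases quoted in this thread.
void exampleRecipSqrtPolicy() {
  FPNode FastDiv, StrictSqrt, FastSqrt;
  FastDiv.Fast = true;
  FastSqrt.Fast = true;
  // Strict sqrt call feeding 'fdiv fast': the sqrt must stay full precision.
  assert(!rsqrtAllowedNodeLevel(FastDiv, StrictSqrt));
  // Both the sqrt call and the fdiv carry 'fast' (the fast_recip_sqrt case).
  assert(rsqrtAllowedNodeLevel(FastDiv, FastSqrt));
}
```

With the global policy the answer is the same for every function in the module, which is why the per-instruction flags in these tests have no effect until the combine reads them.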
jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

I've added some more FMF tests at rL313893 which I think this patch will miscompile. Please rebase/update.

As I suggested before, this patch shouldn't try to enable multiple DAG combines with node-level FMF. It's not as straightforward as you might think.

Pick exactly one combine if you want to show that this patch is working as intended. The llvm.muladd intrinsic test that you have here with a target that supports 'fma' (not plain x86) seems like a good choice to me. If we have a strict op in IR, it should produce an fma instruction. If we have a fast op in IR, it should produce the simpler fmul instruction?

Thu, Sep 21, 11:53 AM
jbhateja added inline comments to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
Thu, Sep 21, 3:10 AM
jbhateja updated the diff for D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
  • Resolving review comments.
Thu, Sep 21, 3:07 AM
jbhateja committed rL313869: [X86] Adding a testpoint for fast-math flags propagation..
[X86] Adding a testpoint for fast-math flags propagation.
Thu, Sep 21, 2:55 AM
jbhateja closed D38127: [X86] Adding a testpoint for fast-math flags propagation. by committing rL313869: [X86] Adding a testpoint for fast-math flags propagation..
Thu, Sep 21, 2:55 AM
jbhateja accepted D38127: [X86] Adding a testpoint for fast-math flags propagation..
Thu, Sep 21, 2:50 AM
jbhateja created D38127: [X86] Adding a testpoint for fast-math flags propagation..
Thu, Sep 21, 2:49 AM
jbhateja updated the diff for D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Thu, Sep 21, 12:15 AM

Tue, Sep 19

jbhateja added inline comments to D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..
Tue, Sep 19, 4:15 AM
jbhateja updated the diff for D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..
Tue, Sep 19, 12:10 AM

Mon, Sep 18

jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

Ping @reviewers.

Mon, Sep 18, 6:32 PM

Sun, Sep 17

jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

Ping @reviewers.

Sun, Sep 17, 12:04 PM
jbhateja updated the diff for D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
  • Updating tests for reported PRs for initial patch.
Sun, Sep 17, 11:36 AM
jbhateja committed rL313490: Adding test cases for PR34629 & PR34634..
Adding test cases for PR34629 & PR34634.
Sun, Sep 17, 11:17 AM
jbhateja closed D37962: Adding test cases for PR34629 & PR34634. by committing rL313490: Adding test cases for PR34629 & PR34634..
Sun, Sep 17, 11:17 AM
jbhateja accepted D37962: Adding test cases for PR34629 & PR34634..
Sun, Sep 17, 11:07 AM
jbhateja created D37962: Adding test cases for PR34629 & PR34634..
Sun, Sep 17, 11:06 AM
jbhateja added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@RKSimon, @reviewers, this revision was accepted earlier, and the issues reported after it was committed to trunk have now been fixed.

Sun, Sep 17, 12:38 AM
jbhateja updated the diff for D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
  • Undefining the result operand of the factored statement to preserve the SSA nature of Machine IR.
  • This fixes the reported PR 34634 and PR 34629 and the reported build-bot failures.
Sun, Sep 17, 12:24 AM

Sat, Sep 16

jbhateja updated the diff for D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
  • Rebase from trunk.
  • More changes to cover review comments.
  • Testing the usage of fast-math flags on nodes in some places; this fixes PR34558.
  • Other places where flags on nodes need to be checked will be handled incrementally.
Sat, Sep 16, 6:24 AM

Fri, Sep 15

jbhateja added inline comments to D37880: Fix an out-of-bounds shufflevector index bug.
Fri, Sep 15, 10:35 PM
jbhateja added a comment to D37880: Fix an out-of-bounds shufflevector index bug.

I propose the following as the fix; Simon and other reviewers can comment.

Fri, Sep 15, 3:06 AM

Thu, Sep 14

jbhateja committed rL313343: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
Thu, Sep 14, 10:32 PM
jbhateja closed D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. by committing rL313343: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Thu, Sep 14, 10:32 PM
jbhateja updated the diff for D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
  • A few syntactic changes.
Thu, Sep 14, 10:18 PM

Wed, Sep 13

jbhateja updated the diff for D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
  • Review comments resolution.
Wed, Sep 13, 9:16 AM
jbhateja added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@lsaba, @reviewers, waiting for your LGTM or any remaining comments on this.
Thanks

Wed, Sep 13, 6:28 AM
jbhateja updated the diff for D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
  • Review comments handling.
Wed, Sep 13, 6:18 AM

Tue, Sep 12

jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

@reviewers, are there any more comments apart from the last ones? This is just to save an iteration. Thanks for your time reviewing.

Tue, Sep 12, 8:00 PM
jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

Ping @reviewers

Tue, Sep 12, 5:41 PM
jbhateja updated the diff for D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
  • Review comments resolution + flags propagation over operands.
Tue, Sep 12, 8:16 AM

Mon, Sep 11

jbhateja added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

Ping @reviewers.

Mon, Sep 11, 9:48 AM
jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

Here's a potential test case that would show a difference from having FMF on a sqrt intrinsic:

define float @fast_recip_sqrt(float %x) {
  %y = call fast float @llvm.sqrt.f32(float %x)
  %z = fdiv fast float 1.0,  %y
  ret float %z
}
declare float @llvm.sqrt.f32(float) nounwind readonly

...but as I said earlier, we need to fix the DAGCombiner code where this fold is implemented to recognize the flags on the individual nodes. Currently, it just checks the global state:

if (Options.UnsafeFPMath) {

On x86 currently, this will use the full-precision sqrtss+divss, but it should be using rsqrtss followed by mulss/addss to refine the estimate.

Mon, Sep 11, 9:17 AM
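The refinement described in the comment above (an estimate followed by multiplies and adds) can be illustrated with one Newton-Raphson step for 1/sqrt(x). This is a minimal sketch, assuming the starting estimate is supplied by the caller in place of a real `rsqrtss` result; the helper names are hypothetical, and LLVM's actual lowering may arrange the multiplies and adds differently.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~= 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
// Each step roughly squares the relative error and uses only multiplies
// and an add/subtract.
float refineRecipSqrt(float X, float Y) {
  return Y * (1.5f - 0.5f * X * Y * Y);
}

// Starts from a caller-supplied estimate (standing in for a hardware rsqrt
// result) and applies a fixed number of refinement steps.
float recipSqrtFromEstimate(float X, float Estimate, int Steps) {
  assert(X > 0.0f && Steps >= 0);
  float Y = Estimate;
  for (int I = 0; I < Steps; ++I)
    Y = refineRecipSqrt(X, Y);
  return Y;
}

// Relative error against the full-precision sqrt + divide result.
float relativeError(float Approx, float X) {
  float Exact = 1.0f / std::sqrt(X);
  return std::fabs(Approx - Exact) / Exact;
}
```

For x = 4 and a 2% error estimate of 0.49, one step brings the relative error below 1e-3 and a second step below 1e-5; the number of steps a backend uses is a precision/latency trade-off.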
jbhateja added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

@RKSimon, anything else, or should I check this in as NFC?

Mon, Sep 11, 7:53 AM
jbhateja added reviewers for D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management.: spatel, RKSimon.
Mon, Sep 11, 6:48 AM
jbhateja created D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..
Mon, Sep 11, 6:46 AM
jbhateja updated the diff for D34596: [X86]: Adding a new priority function 'guided-src' for Scheduler DAG instruction scheduling..
Mon, Sep 11, 6:14 AM

Sun, Sep 10

jbhateja added a comment to D37616: [X86] PR34149 Suboptimal codegen for fast minnum and maxnum..

I might be missing some context here. If we have fast/nnan on these calls, then can't we simplify this in IR to fcmp+select and not have to deal with this in the backend? The intrinsics only exist to make sure that NaN behavior in IR meets the higher level standards, so if we have nnan, then we don't need the intrinsic?

Intrinsic functions defer code generation/expansion to the backend; this gives the backend control over generating efficient code for the specific target.

It's incorrect that intrinsics are passed unaltered to the backend for expansion/optimization. See the optimizations for both generic and target-specific intrinsics in InstCombiner::visitCallInst().

Again, I may be missing some context - who created this IR? Creating a 'call fast llvm.maxnum()' just doesn't make sense to me, so if we can fix that in IR, we should do that. The intrinsic inhibits the large number of potential optimizations for fcmp+select that we have in IR. No target should benefit from having extra NaN semantics requirements provided by the intrinsic that are then overridden by FMF.

Please split the FlagsAcquirer diff into a separate patch.

Sun, Sep 10, 10:32 AM
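Why `nnan` makes the intrinsic unnecessary can be seen from the NaN behavior alone. A minimal sketch, assuming C's `std::fmin` as a stand-in for `llvm.minnum` (the LangRef describes minnum as matching libm's `fmin`) and a plain `<` plus conditional as a stand-in for `fcmp olt` + `select`; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for llvm.minnum: like C's fmin, if exactly one operand is NaN,
// the other operand is returned.
float minnumLike(float A, float B) { return std::fmin(A, B); }

// Stand-in for 'fcmp olt' + 'select': an ordered compare is false whenever
// either operand is NaN, so the select falls through to B.
float cmpSelectMin(float A, float B) { return A < B ? A : B; }

// With nnan the inputs are assumed not to be NaN; for such inputs the two
// forms compare equal (== also treats -0.0 and +0.0 as equal).
bool formsAgree(float A, float B) {
  assert(!std::isnan(A) && !std::isnan(B) && "nnan: inputs must not be NaN");
  return minnumLike(A, B) == cmpSelectMin(A, B);
}
```

The two forms only diverge when an operand is NaN, which is exactly the case the `nnan` flag rules out.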

Sat, Sep 9

jbhateja added a comment to D37616: [X86] PR34149 Suboptimal codegen for fast minnum and maxnum..

I might be missing some context here. If we have fast/nnan on these calls, then can't we simplify this in IR to fcmp+select and not have to deal with this in the backend? The intrinsics only exist to make sure that NaN behavior in IR meets the higher level standards, so if we have nnan, then we don't need the intrinsic?

Sat, Sep 9, 10:12 AM
jbhateja updated the diff for D37616: [X86] PR34149 Suboptimal codegen for fast minnum and maxnum..
  • Consolidating Instruction->SDNode Flags propagation in one class.
Sat, Sep 9, 9:54 AM

Fri, Sep 8

jbhateja closed D37613: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'.

Closing with commit rL312778: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum

Fri, Sep 8, 2:47 AM
jbhateja accepted D37613: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'.
Fri, Sep 8, 2:47 AM
jbhateja updated the diff for D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
  • Rebasing again.
  • Adding a check for subtarget feature Slow3OpsLEA in pattern matching.
Fri, Sep 8, 2:44 AM
jbhateja edited reviewers for D37616: [X86] PR34149 Suboptimal codegen for fast minnum and maxnum., added: sanjoy, RKSimon, spatel, craig.topper; removed: llvm-commits.
Fri, Sep 8, 2:40 AM
jbhateja created D37616: [X86] PR34149 Suboptimal codegen for fast minnum and maxnum..
Fri, Sep 8, 2:38 AM
jbhateja committed rL312778: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and….
[X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and…
Fri, Sep 8, 2:17 AM
jbhateja closed D37614: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum' by committing rL312778: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and….
Fri, Sep 8, 2:17 AM
jbhateja accepted D37614: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'.
Fri, Sep 8, 2:13 AM
jbhateja created D37614: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'.
Fri, Sep 8, 2:12 AM
jbhateja created D37613: [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'.
Fri, Sep 8, 2:12 AM

Wed, Sep 6

jbhateja added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

3-operand LEAs are costly starting with Sandy Bridge. Is there a limitation in the code on the targets this transformation applies to? If not, I think there should be.
You can check the Slow3OpsLEA feature for the full list of targets.

Wed, Sep 6, 4:48 AM
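For context, a "3-operand" LEA is one that uses a base register, an index register and a displacement together. A minimal sketch of the address arithmetic, assuming a hypothetical `LeaOperands` model that ignores operand sizes and register classes, shows why such an LEA can be split into a 2-operand LEA plus an `add` on targets with the `Slow3OpsLEA` feature. The split shown is one valid rewrite, not necessarily the sequence LLVM emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of an x86 LEA: Base + Index * Scale + Disp.
struct LeaOperands {
  bool HasBase, HasIndex;
  int64_t Base, Index;
  int Scale;    // 1, 2, 4 or 8
  int32_t Disp; // signed displacement
};

int64_t evalLea(const LeaOperands &Op) {
  assert(Op.Scale == 1 || Op.Scale == 2 || Op.Scale == 4 || Op.Scale == 8);
  int64_t R = Op.Disp;
  if (Op.HasBase)
    R += Op.Base;
  if (Op.HasIndex)
    R += Op.Index * Op.Scale;
  return R;
}

// Base, index and a non-zero displacement all present.
bool isThreeOperandLea(const LeaOperands &Op) {
  return Op.HasBase && Op.HasIndex && Op.Disp != 0;
}

// One possible split: a 2-operand LEA (base + index*scale) followed by an
// add of the displacement. It computes the same address.
int64_t evalSplitLea(const LeaOperands &Op) {
  LeaOperands TwoOp = Op;
  TwoOp.Disp = 0;
  return evalLea(TwoOp) + Op.Disp; // the separate 'add'
}
```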
jbhateja added a comment to D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..

I think we should try to combine based on the add only being used by the extract_vector_elt. Turn the add into a 128-bit add being fed by extract_subvectors. Similarly if we see an add only being used by an extract_subvector we can shrink that add too and push the extracts up. This type of transform feels more generally useful because it will allow us to narrow many more adds in this code. This will enable EVEX->VEX to use a smaller encoding. We can apply this to many other opcodes as well.

If we do this early enough we should be able to shrink the add before the horizontal add detection.

Wed, Sep 6, 12:43 AM
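The narrowing suggested in the comment above relies on lane-wise independence: lane i of a vector add depends only on lane i of its operands, so an element taken from the low 128 bits of a 512-bit add equals the same element of a 128-bit add of the extracted low subvectors. A minimal sketch, assuming vectors modelled as `std::array` of 32-bit lanes and hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 16 lanes = 512 bits (zmm), 4 lanes = 128 bits (xmm). Unsigned lanes give
// the same wrap-around behavior as a vector integer add.
using V512 = std::array<uint32_t, 16>;
using V128 = std::array<uint32_t, 4>;

template <std::size_t N>
std::array<uint32_t, N> addLanes(const std::array<uint32_t, N> &A,
                                 const std::array<uint32_t, N> &B) {
  std::array<uint32_t, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = A[I] + B[I];
  return R;
}

// extract_subvector of the low 128 bits.
V128 lowSubvector(const V512 &V) {
  V128 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = V[I];
  return R;
}

// Before: a full 512-bit add whose only use is one extracted lane.
uint32_t extractFromWideAdd(const V512 &A, const V512 &B, std::size_t Lane) {
  return addLanes(A, B)[Lane];
}

// After: extract the low subvectors first, then do a 128-bit add.
uint32_t extractFromNarrowAdd(const V512 &A, const V512 &B, std::size_t Lane) {
  assert(Lane < 4 && "only lanes in the low 128 bits can be narrowed this way");
  return addLanes(lowSubvector(A), lowSubvector(B))[Lane];
}
```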

Tue, Sep 5

jbhateja committed rL312614: Updating a test reference for rL312608..
Updating a test reference for rL312608.
Tue, Sep 5, 8:59 PM
jbhateja closed D37501: Updating a test reference for rL312608. by committing rL312614: Updating a test reference for rL312608..
Tue, Sep 5, 8:59 PM
jbhateja accepted D37501: Updating a test reference for rL312608..
Tue, Sep 5, 8:56 PM
jbhateja created D37501: Updating a test reference for rL312608..
Tue, Sep 5, 8:52 PM
jbhateja committed rL312608: [X86] Allow cross-lane permutations for sub targets supporting AVX2..
[X86] Allow cross-lane permutations for sub targets supporting AVX2.
Tue, Sep 5, 8:01 PM
jbhateja closed D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2. by committing rL312608: [X86] Allow cross-lane permutations for sub targets supporting AVX2..
Tue, Sep 5, 8:01 PM
jbhateja updated the diff for D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2..
  • Formatting changes
Tue, Sep 5, 7:53 PM

Mon, Sep 4

jbhateja updated the diff for D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
  • Fine tuning pattern matching condition.
  • Formatting changes.
Mon, Sep 4, 9:55 PM
jbhateja updated the diff for D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2..
  • Reverting lit script change.
Mon, Sep 4, 11:34 AM
jbhateja updated the diff for D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2..
Mon, Sep 4, 11:27 AM

Sun, Sep 3

jbhateja committed rL312444: Test commit access in clang..
Test commit access in clang.
Sun, Sep 3, 8:30 AM
jbhateja closed D37426: Test commit access in clang. by committing rL312444: Test commit access in clang..
Sun, Sep 3, 8:30 AM
jbhateja accepted D37426: Test commit access in clang..
Sun, Sep 3, 8:25 AM
jbhateja created D37426: Test commit access in clang..
Sun, Sep 3, 8:25 AM
jbhateja added a comment to D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2..

Oh, OK, thanks. I'll take care of that in the future.

Sun, Sep 3, 4:29 AM
jbhateja added a comment to D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2..

Hi Jatin,
Thanks for working on this.
You didn't add llvm-commits as a subscriber upon review creation, so this won't show up on the mailing list. Can you open a new review?

Hi Guy,

Sun, Sep 3, 3:07 AM
jbhateja added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@lamas, @reviewers, the comments have been taken care of. Let me know if there is anything else.

Sun, Sep 3, 12:16 AM
jbhateja added a comment to D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2..

Ping reviewers.

Sun, Sep 3, 12:10 AM
jbhateja added a comment to D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..

Ping reviewers

Sun, Sep 3, 12:10 AM

Fri, Sep 1

jbhateja added reviewers for D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2.: aymanmus, craig.topper, RKSimon, spatel, guyblank.
Fri, Sep 1, 11:48 AM
jbhateja created D37388: [X86] Allow cross-lane permutations for sub targets supporting AVX2..
Fri, Sep 1, 11:27 AM

Wed, Aug 30

jbhateja updated the diff for D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Wed, Aug 30, 10:52 PM

Tue, Aug 29

jbhateja added a comment to D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..

ping @reviewers

Tue, Aug 29, 4:26 AM
jbhateja committed rL311994: [X86] Adding a test to demonstrate aggressive folding for LEA facotrization..
[X86] Adding a test to demonstrate aggressive folding for LEA facotrization.
Tue, Aug 29, 3:51 AM
jbhateja closed D37257: [X86] Adding a test to demonstrate aggressive folding for LEA facotrization. by committing rL311994: [X86] Adding a test to demonstrate aggressive folding for LEA facotrization..
Tue, Aug 29, 3:50 AM
jbhateja accepted D37257: [X86] Adding a test to demonstrate aggressive folding for LEA facotrization..
Tue, Aug 29, 3:37 AM
jbhateja created D37257: [X86] Adding a test to demonstrate aggressive folding for LEA facotrization..
Tue, Aug 29, 3:36 AM

Sun, Aug 27

jbhateja updated the diff for D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..
  • Stashed change left out of the last check-in, plus formatting changes.
  • Updating test reference
Sun, Aug 27, 8:22 AM
jbhateja updated the diff for D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..
  • Removing a file that got added in the last patch.
Sun, Aug 27, 6:32 AM
jbhateja added a comment to D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..

Thinking about this some more. Do we really want to use a horizontal add instruction for a register with itself? Horizontal add is suboptimally implemented in microcode. It's 3 uops while the pshufd and the add are only 2 uops. The 3 uops also mean it's limited to the complex decoder on Intel hardware.

Sun, Aug 27, 6:27 AM
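The functional equivalence behind this suggestion can be checked on a lane model. A minimal sketch, assuming `phaddd`/`pshufd`-like semantics on four 32-bit integer lanes and hypothetical helper names; it shows only that both sequences produce the same pair sums (identical in lane 0, which is what a reduction extracts) and says nothing about the uop counts, which come from the reviewer's comment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 128-bit vector of four 32-bit integer lanes.
using V4 = std::array<uint32_t, 4>;

// phaddd-style horizontal add: pairwise sums of A, then pairwise sums of B.
V4 hadd(const V4 &A, const V4 &B) {
  return {{A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]}};
}

// pshufd-style permute within one register: R[I] = V[Mask[I]].
V4 shuffle(const V4 &V, const std::array<std::size_t, 4> &Mask) {
  V4 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    assert(Mask[I] < 4);
    R[I] = V[Mask[I]];
  }
  return R;
}

V4 addLanes(const V4 &A, const V4 &B) {
  V4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Alternative to hadd(V, V): swap adjacent lanes, then a regular vector add.
V4 shuffleThenAdd(const V4 &V) {
  return addLanes(V, shuffle(V, {{1, 0, 3, 2}}));
}
```

For V = {1, 2, 3, 4}, `hadd(V, V)` gives {3, 7, 3, 7} and `shuffleThenAdd(V)` gives {3, 3, 7, 7}: the same sums in a different lane order.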
jbhateja updated the diff for D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..
Sun, Aug 27, 6:22 AM
jbhateja closed D37183: [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32..

closing with commit rL311847: [X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vectors types.

Sun, Aug 27, 5:50 AM
jbhateja committed rL311847: [X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vectors types.
[X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vectors types
Sun, Aug 27, 5:44 AM
jbhateja updated the diff for D37183: [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32..

[X86] Adding more test points for horizontal add/sub for integers/floating avx512 vector types.

Sun, Aug 27, 5:39 AM
jbhateja reopened D37183: [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32..

Reopening to add more tests under the same revision.

Sun, Aug 27, 5:04 AM

Sat, Aug 26

jbhateja committed rL311834: [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32..
[X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32.
Sat, Aug 26, 12:06 PM
jbhateja closed D37183: [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32. by committing rL311834: [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32..
Sat, Aug 26, 12:05 PM
jbhateja committed rL311833: [DAGCombiner] Extending pattern detection for vector shuffle..
[DAGCombiner] Extending pattern detection for vector shuffle.
Sat, Aug 26, 12:05 PM
jbhateja added a reverting commit for rL311247: Merge branch 'arcpatch-D35788': rL311832: Revert rL311247 : To rectify commit message..
Sat, Aug 26, 12:05 PM
jbhateja committed rL311832: Revert rL311247 : To rectify commit message..
Revert rL311247 : To rectify commit message.
Sat, Aug 26, 12:05 PM
jbhateja accepted D37183: [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32..
Sat, Aug 26, 11:55 AM