MatzeB (Matthias Braun)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 13 2013, 3:10 PM (266 w, 6 d)

Recent Activity

Yesterday

MatzeB accepted D52410: Use TRI->regsOverlap() in MachineBasicBlock::computeRegisterLiveness.

code change LGTM.
I'd be great if you can find a test for this. (Just in case the AMDGPU based test doesn't work out I'd be fine without one here, given how rarely computeRegisterLiveness() is used)

Mon, Sep 24, 5:33 PM
MatzeB accepted D52374: [MachineCopyPropagation] Reimplement CopyTracker in terms of register units.

I'm convinced now the clobbering works correctly because as clobberRegister does not just clobber the register at hand but the whole destination register of the copy touched by the COPY at hand.
LGTM (this time with the right review :)

Mon, Sep 24, 5:31 PM
MatzeB added a comment to D51524: [ARM64] [Windows] Handle funclets.

I don't really know how exception handling on windows works or what a funclet is. So here's just a couple style comments and questions

Mon, Sep 24, 5:25 PM
MatzeB accepted D52370: [MachineCopyPropagation] Rework how we manage RegMask clobbers.

I'm convinced now the clobbering works correctly because as clobberRegister does not just clobber the register at hand but the whole destination register of the copy touched by the COPY at hand.
LGTM

Mon, Sep 24, 4:07 PM

Fri, Sep 21

MatzeB added inline comments to D52374: [MachineCopyPropagation] Reimplement CopyTracker in terms of register units.
Fri, Sep 21, 5:24 PM
MatzeB accepted D52370: [MachineCopyPropagation] Rework how we manage RegMask clobbers.

LGTM

Fri, Sep 21, 5:13 PM
MatzeB accepted D52085: [ARM][ARMLoadStoreOptimizer].

The new .mir test is nicely concise now. Maybe we can even do without the .ll test now?

Fri, Sep 21, 9:14 AM
MatzeB accepted D51915: [LNT] Improve alignment of HTML report..

LGTM

Fri, Sep 21, 9:09 AM
MatzeB accepted D51916: [LNT] Use unit_abbrev of Fields in reports..

LGTM
We may consider adding something like a tooltip in the future with the fully expanded value...

Fri, Sep 21, 9:07 AM
MatzeB accepted D52346: [LNT] Print machine name in LNT graph tooltip..

LGTM

Fri, Sep 21, 9:00 AM

Thu, Sep 20

MatzeB created D52335: AArch64FastISel: Abort intrinsic selection if the left operand didn't select.
Thu, Sep 20, 6:30 PM
MatzeB accepted D52127: [MF][MBB]: Add the ability to register callbacks for removal and insertion of MachineInstrs.

LGTM with nitpicks addressed.

Thu, Sep 20, 1:35 PM

Wed, Sep 19

MatzeB closed D52255: MachineScheduler: Add -misched-print-dags flag.

Landed in rL342589

Wed, Sep 19, 1:57 PM
MatzeB closed D52256: AArch64: Add FuseCryptoEOR fusion rules.

Landed in rL342590

Wed, Sep 19, 1:56 PM
MatzeB added inline comments to D52256: AArch64: Add FuseCryptoEOR fusion rules.
Wed, Sep 19, 1:10 PM
MatzeB updated the diff for D52256: AArch64: Add FuseCryptoEOR fusion rules.

Address review feedback.

Wed, Sep 19, 1:08 PM

Tue, Sep 18

MatzeB created D52256: AArch64: Add FuseCryptoEOR fusion rules.
Tue, Sep 18, 6:29 PM
MatzeB created D52255: MachineScheduler: Add -misched-print-dags flag.
Tue, Sep 18, 6:29 PM
MatzeB added a comment to D52085: [ARM][ARMLoadStoreOptimizer].

Please create a .mir test that just checks the ARMLoadStoreOptimizer pass. See test/CodeGen/ARM/vldm-liveness.mir or test/CodeGen/ARM/load_store_opt_kill.mir for example.

Tue, Sep 18, 4:46 PM

Mon, Sep 17

MatzeB added inline comments to D51348: CodeGen: Make computeRegisterLiveness consider successors.
Mon, Sep 17, 4:09 PM
MatzeB added a comment to D51502: [X86] Fix register resizings for inline assembly register operands..

Isn't this fix about inline assembly? Why do we see all the scheduling/regalloc changes here?

Mon, Sep 17, 11:43 AM
MatzeB added a comment to rL342175: [X86] Fix register resizings for inline assembly register operands..

Isn't this fix about inline assembly? Why do we see all the scheduling/regalloc changes here?

Mon, Sep 17, 11:40 AM
MatzeB added a comment to D52127: [MF][MBB]: Add the ability to register callbacks for removal and insertion of MachineInstrs.

Some additional coding style comments

Mon, Sep 17, 11:06 AM
MatzeB added a comment to D52127: [MF][MBB]: Add the ability to register callbacks for removal and insertion of MachineInstrs.

I've seen callbacks lead to unintuitive/hard to understand code. Can you motivate the use case some more?

Mon, Sep 17, 10:53 AM

Fri, Sep 14

MatzeB updated the summary of D52125: X86, AArch64, ARM: Do not attach debug location to spill/reload instructions.
Fri, Sep 14, 3:04 PM
MatzeB created D52125: X86, AArch64, ARM: Do not attach debug location to spill/reload instructions.
Fri, Sep 14, 3:03 PM

Thu, Sep 13

MatzeB added inline comments to D49503: [test-suite] Add Image Processing Kernels Using Benchmark Library: Dither Algorithms.
Thu, Sep 13, 5:40 PM
MatzeB added a comment to D52010: RegAllocFast: Rewrite and improve.

Hi Matthias,

Thanks for modernizing this allocator.

Could you split the patch to make the review easier?

You mentioned several points in your description, if at all possible, it would be nice to have one patch for each.

Cheers,
-Quentin

I can try to do that for review purposes... But some things depend on each other so the intermediate steps are probably not going to be functional...

Thu, Sep 13, 4:54 PM

Wed, Sep 12

MatzeB added inline comments to D51916: [LNT] Use unit_abbrev of Fields in reports..
Wed, Sep 12, 8:12 PM
MatzeB created D52010: RegAllocFast: Rewrite and improve.
Wed, Sep 12, 7:29 PM

Mon, Sep 10

MatzeB accepted D51862: [LNT] more pep8 code style fixes.

Out of interest: What version of pycodestyle do you use? I currently seem to have version 2.4.0 here while I do see most of the issues, for example I don't see a message about the regex functions not using raw strings.

Mon, Sep 10, 9:33 AM

Fri, Sep 7

MatzeB added a comment to D51751: Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue.

LGTM once Eli approves.

Fri, Sep 7, 1:52 PM
MatzeB accepted D51752: NFC: deduplicate isRepeatedBytePattern from clang to LLVM's isBytewiseValue.

Looks obvious and LGTM.

Fri, Sep 7, 1:14 PM

Thu, Sep 6

MatzeB added a comment to D51752: NFC: deduplicate isRepeatedBytePattern from clang to LLVM's isBytewiseValue.
  • I assume the review lacks the changes on the llvm side?
Thu, Sep 6, 2:27 PM
MatzeB added a comment to D51752: NFC: deduplicate isRepeatedBytePattern from clang to LLVM's isBytewiseValue.
  • I'm not a good person to review clang changes.
  • I assume the review lacks the changes on the llvm side?
  • On the LLVM side this feels like something for Analysis/ValueTracking.h in fact I already see Value *isBytewiseValue(Value *V); there, maybe that is good enough for your purpose?
Thu, Sep 6, 2:20 PM

Wed, Sep 5

MatzeB accepted D51706: ARM64: improve non-zero memset isel by ~2x.

Nice change, LGTM!
Maybe wait a couple days before committing in case someone better versed with SelectionDAG wants to chime in.

Wed, Sep 5, 4:06 PM
MatzeB added inline comments to D51474: Consider CSRs in computeRegisterLiveness.
Wed, Sep 5, 9:11 AM

Tue, Sep 4

MatzeB added inline comments to D51663: [test-suite, CUDA] Update CUDA test suite cmake files..
Tue, Sep 4, 4:32 PM
MatzeB accepted D51663: [test-suite, CUDA] Update CUDA test suite cmake files..

LGTM, sorry for the breakage.

Tue, Sep 4, 4:32 PM
MatzeB added inline comments to D51474: Consider CSRs in computeRegisterLiveness.
Tue, Sep 4, 1:15 PM
MatzeB added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().
In D51048#1223622, @tra wrote:

Though thinking about it, I don't really understand your problem:

How can there be multiple outputs with the same name? They have different names in the repository (algorithm.reference_output, assert.reference_output, ...)

For CUDA tests, from each test source file we generate a lot of test executables compiled for combination of {CUDA version, C++ standard, C++ standard library}. All of those test executables will run and need to have their output compared to the same reference file.

Tue, Sep 4, 11:45 AM

Fri, Aug 31

MatzeB added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().
In D51048#1221423, @tra wrote:

I've ran into an unexpected problem after this patches have landed.

CUDA test suite generates multiple tests that use the same reference output.
I can't just use llvm_test_data(test_XYZ, "test.reference_output"), because there can only be one test.reference_output in the build directory.
I also can't manually create a symlink $SRC/test.reference_output -> $BUILD_DIR/test-XYZ.reference_output and then use llvm_test_data(SOURCE_DIR=$BUILD_DIR test-XYZ.reference_output) because it can't link to itself.
The only way around is to create a new directory, symlink original file there as test_XYZ and then use the temp dir as SOURCE_DIR to create test_XYZ.reference_output in the build dir. This is rather convoluted.

We probably need llvm_test_data with an optional argument that would allow providing a different name for the file name in the build dir, or make it unique automatically.

Possible fix for your case (untested as I don't have a CUDA environment):

diff --git a/External/CUDA/CMakeLists.txt b/External/CUDA/CMakeLists.txt
index 071ffbc9..63ab1e4c 100644
--- a/External/CUDA/CMakeLists.txt
+++ b/External/CUDA/CMakeLists.txt
@@ -58,13 +58,16 @@ macro(create_one_local_test_f Name FileGlob FilterRegex)
     set(_executable ${Name}-${VariantSuffix})
     set(_executable_path ${CMAKE_CURRENT_BINARY_DIR}/${_executable})
     # Verify reference output if it exists.
+    llvm_test_run()
+    set(REFERENCE_OUTPUT)
     if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${Name}.reference_output)
-      llvm_test_traditional(${Name})
-    else()
-      # otherwise just run the executable.
-      llvm_test_run()
+      set(REFERENCE_OUTPUT ${Name}.reference_output)
+      llvm_test_verify(WORKDIR ${CMAKE_CURRENT_BINARY_DIR}
+        ${FPCMP} %o ${REFERENCE_OUTPUT}
+      )
     endif()
     llvm_test_executable(${_executable} ${_sources})
+    llvm_test_data(${_executable} ${REFERENCE_OUTPUT})
     target_compile_options(${_executable} PUBLIC ${VariantCPPFLAGS})
     if(VariantLibs)
       target_link_libraries(${_executable} ${VariantLibs})
Fri, Aug 31, 5:43 PM
MatzeB added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().
In D51048#1221423, @tra wrote:

I've ran into an unexpected problem after this patches have landed.

CUDA test suite generates multiple tests that use the same reference output.
I can't just use llvm_test_data(test_XYZ, "test.reference_output"), because there can only be one test.reference_output in the build directory.
I also can't manually create a symlink $SRC/test.reference_output -> $BUILD_DIR/test-XYZ.reference_output and then use llvm_test_data(SOURCE_DIR=$BUILD_DIR test-XYZ.reference_output) because it can't link to itself.
The only way around is to create a new directory, symlink original file there as test_XYZ and then use the temp dir as SOURCE_DIR to create test_XYZ.reference_output in the build dir. This is rather convoluted.

We probably need llvm_test_data with an optional argument that would allow providing a different name for the file name in the build dir, or make it unique automatically.

Fri, Aug 31, 5:40 PM
MatzeB accepted D51537: Extend hasStoreToStackSlot with list of FI accesses..

Good idea! LGTM.

Fri, Aug 31, 1:35 PM
MatzeB added inline comments to D51465: Revamp test-suite documentation.
Fri, Aug 31, 10:43 AM
MatzeB added inline comments to D51465: Revamp test-suite documentation.
Fri, Aug 31, 9:42 AM
MatzeB updated the diff for D51465: Revamp test-suite documentation.
  • Address remaining review feedback, shorten some terminal dumps.
  • I will commit this after landing D51080 (the remote section here describes the situation after that patch)
Fri, Aug 31, 9:42 AM

Thu, Aug 30

MatzeB added inline comments to D51474: Consider CSRs in computeRegisterLiveness.
Thu, Aug 30, 5:00 PM
MatzeB added a comment to D51357: Remove LIT_SITE_CFG_IN_FOOTER..

I currently see on my machine:

[0/1] Running the LLVM regression tests
llvm-lit: /Users/mbraun/dev/public_llvm/utils/lit/lit/TestingConfig.py:102: fatal: unable to parse config file '/Users/mbraun/dev/public_llvm/build/test/lit.site.cfg.py', traceback: Traceback (most recent call last):
  File "/Users/mbraun/dev/public_llvm/utils/lit/lit/TestingConfig.py", line 89, in load_from_path
    exec(compile(data, path, 'exec'), cfg_globals, None)
  File "/Users/mbraun/dev/public_llvm/build/test/lit.site.cfg.py", line 59, in <module>
    lit.llvm.initialize(lit_config, config)
AttributeError: 'module' object has no attribute 'llvm'
Thu, Aug 30, 4:38 PM
MatzeB accepted D51495: Fix removal of dead elements from PressureDiff.

Good catch, the fix looks good to me. I assume you built llvm will all (non-experimental) targets and tested them?

Thu, Aug 30, 4:34 PM
MatzeB added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().

And for some context about the seemingly excessive use of WORKDIR ${CMAKE_CURRENT_BINARY_DIR} here: One of my original plans was to make the test-suite build directory relocatable, i.e. after building you could move it to any other directory and still correctly run the benchmarks. That would mean have to stop using absolute paths and need to express everything relative to the benchmark binary. I hoped that an intermediate step towards this is changing as much as possible to WORKDIR ${CMAKE_CURRENT_BINARY_DIR} + relative paths. A next step would then be to change the default workdir to the binary dir.

Thu, Aug 30, 3:27 PM
MatzeB updated the diff for D51048: cmake: Specify reference outputs in llvm_test_data().

After applying this patch, the tests fail for me because it is taking the hash of stdout instead of the file mentioned in llvm_test_verify_hash_program_output macro.
Changing line 64 of cmake/modules/SingleMultiSource.cmake to ${CMAKE_SOURCE_DIR}/HashProgramOutput.sh ${file} will fix this.

Yes sorry, I had fixed this in my local copy too but failed to update the review.

Thu, Aug 30, 3:15 PM
MatzeB added inline comments to D51465: Revamp test-suite documentation.
Thu, Aug 30, 2:16 PM
MatzeB updated the diff for D51465: Revamp test-suite documentation.

Thanks for the detailed review comments!

Thu, Aug 30, 2:16 PM
MatzeB added a comment to D51474: Consider CSRs in computeRegisterLiveness.

Thanks for looking into this!

Thu, Aug 30, 9:37 AM
MatzeB added inline comments to D51080: litsupport/remote: Work without shared filesystem.
Thu, Aug 30, 9:11 AM
MatzeB accepted D39386: [Power9] Allow gpr callee saved spills in prologue to vector registers rather than stack.

LGTM too; some nitpicks below.

Thu, Aug 30, 9:09 AM

Wed, Aug 29

MatzeB edited reviewers for D51465: Revamp test-suite documentation, added: paquette; removed: jpaquette.
Wed, Aug 29, 3:56 PM
MatzeB created D51465: Revamp test-suite documentation.
Wed, Aug 29, 3:56 PM
MatzeB added a comment to D51080: litsupport/remote: Work without shared filesystem.

ping

The summary mentions this is for early review. Has this changed?

Ah yes, the first versions of this patch was not working in some corner cases like PGO. But with the latest updates I consider this patch ready.

Wed, Aug 29, 10:59 AM
MatzeB added inline comments to D51405: [LNT][RFC] Introduce Latest Runs Report..
Wed, Aug 29, 10:53 AM
MatzeB added a comment to D51405: [LNT][RFC] Introduce Latest Runs Report..

The new report and its implementation look good to me! However before committing this needs at least a basic unit test (tests/server/ui/V4Pages.py) to check that the page renders without exception and produces valid html.

Wed, Aug 29, 10:45 AM
MatzeB accepted D51350: CodeGen: Make computeRegisterLiveness search forward first.

This change LGTM and should be committed.

Wed, Aug 29, 10:34 AM
MatzeB accepted D51348: CodeGen: Make computeRegisterLiveness consider successors.

LGTM

Wed, Aug 29, 10:31 AM
MatzeB accepted D51421: Don't count debug instructions towards neighborhood count.

LGTM, thanks

Wed, Aug 29, 10:31 AM

Tue, Aug 28

MatzeB added a comment to D51080: litsupport/remote: Work without shared filesystem.

ping

Tue, Aug 28, 6:08 PM
MatzeB added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().

ping

Tue, Aug 28, 6:07 PM

Mon, Aug 27

MatzeB added a comment to D49097: [RegisterCoalescer] Another fix for subrange join unreachable.
In D49097#1213446, @tpr wrote:

This fix adds code to scan all uses of a subrange and extend it, after every join. I can't think of a way of doing it in a lower impact way.

Which has got me thinking:

Am I right about all of the following?

  1. The decision on whether to join a copy is taken only with reference to each register's main range, not the subrange.

Yes.

  1. The only reason it does similar stuff with conflict detection and resolution on the subranges is so it can modify them incrementally.

Yes, it's mostly about updating the subranges.

  1. This fix, scanning all uses and extending subranges after a join, pretty much nullifies the advantage of modifying subranges incrementally.

Two reasons:

  1. Updating the liveranges was considered better for compiletime (I mean you could remove even the non-subrange updating logic with the same reasoning to make the code even simpler...)
  2. Recalculating subregister liveness used to not work reliably after coalescing (There were corner cases with full register uses with only some lanes being defined, without having corresponding IMPLICIT_DEF instructions for the undefined lanes).
Mon, Aug 27, 9:26 AM
MatzeB accepted D51308: MachineVerifier: Fix assert on implicit virtreg use.

LGTM

Mon, Aug 27, 8:34 AM

Aug 23 2018

MatzeB updated the diff for D51080: litsupport/remote: Work without shared filesystem.
  • Fix PGO in remote setting: We have to do the merging on the host side where the compiler toolchain resides, however before we can do that we have to retrieve the profile data from device. Make the remote module add commands to retrieve the profile data.
Aug 23 2018, 3:24 PM
MatzeB updated the diff for D51080: litsupport/remote: Work without shared filesystem.
  • Address review comments
  • Execute METRIC: lines remotely
Aug 23 2018, 9:31 AM

Aug 22 2018

MatzeB added a comment to D51080: litsupport/remote: Work without shared filesystem.
  1. cmake builddir for cross-compilation
  2. make/ninja
  3. rsync pushes test-suite builddir to target
  4. llvm-lit locale, but run every command on remote using ssh
  5. including fpmc/VALIDATE step in *.test (SPEC2017 also uses llvm_add_host_executable)

The VERIFY: steps in the scripts are run remotely as well.

PREPARE: and METRIC as well?

Yes PREPARE: is modified as well, and I just realized METRIC: should be modified too but is still TODO :)

Aug 22 2018, 3:33 PM
MatzeB added a comment to D51080: litsupport/remote: Work without shared filesystem.

Can you outline the intended workflow? My guess:

  1. cmake builddir for cross-compilation
  2. make/ninja
  3. rsync pushes test-suite builddir to target
  4. llvm-lit locale, but run every command on remote using
  5. including fpmc/VALIDATE step in *.test (SPEC2017 also uses llvm_add_host_executable)

The VERIFY: steps in the scripts are run remotely as well. I hope that takes part of the spec2017 validation things[*]

Aug 22 2018, 2:20 PM
MatzeB updated the diff for D51080: litsupport/remote: Work without shared filesystem.
Aug 22 2018, 1:26 PM
MatzeB accepted D50914: [RegisterCoalescer] Fix for assert in removePartialRedundancy.

LGTM with nitpicks addressed.

Aug 22 2018, 10:47 AM

Aug 21 2018

MatzeB created D51080: litsupport/remote: Work without shared filesystem.
Aug 21 2018, 5:48 PM
MatzeB added a dependency for D51048: cmake: Specify reference outputs in llvm_test_data(): D50209: cmake: Explicitely specify benchmark data.
Aug 21 2018, 10:37 AM
MatzeB added a dependent revision for D50209: cmake: Explicitely specify benchmark data: D51048: cmake: Specify reference outputs in llvm_test_data().
Aug 21 2018, 10:37 AM
MatzeB created D51048: cmake: Specify reference outputs in llvm_test_data().
Aug 21 2018, 10:36 AM

Aug 20 2018

MatzeB added inline comments to D50209: cmake: Explicitely specify benchmark data.
Aug 20 2018, 4:30 PM
MatzeB added a comment to D50209: cmake: Explicitely specify benchmark data.

I am ok with committing this, but maybe we should have someone else's opinion as well?

Aug 20 2018, 3:45 PM
MatzeB updated the diff for D50209: cmake: Explicitely specify benchmark data.

Updated patch:

Aug 20 2018, 3:40 PM
MatzeB added a comment to D50986: [mips] Prevent shrink-wrap for BuildPairF64, ExtractElementF64 when they use $sp.

This looks wrong to me. And it would be good if you could explain with more detail or an example. So far I see: An instruction uses a register but we didn't add a register operand for that. Why would we fix this by adding an exception into shrink wrapping instead of making sure there is an actual operand?

Aug 20 2018, 1:13 PM
MatzeB added a comment to D50988: Remove Darwin support from POWER backend..

Actually can you explain the removal of llvm/test/tools/dsymutil/PowerPC/sibling.test?

I had an email conversation with @JDevlieghere regarding this test. Someone suggested a test based on an old version of llvm-gcc from an old Powerbook G4, which is how this test was created. That was the only reason for this being a PowerPC test.

I tried generating an alternate test using a new GCC on a ppc64le machine, but the test caused an Assert failure in dsymutil. Jonas is looking at. In the meantime, he said this specific test should not block this patch.

If you have another idea for generating an equivalent test, I'm happy to do it.

Aug 20 2018, 1:08 PM
MatzeB added a comment to D50988: Remove Darwin support from POWER backend..

Actually can you explain the removal of llvm/test/tools/dsymutil/PowerPC/sibling.test?

Aug 20 2018, 12:32 PM
MatzeB accepted D50988: Remove Darwin support from POWER backend..

I just realized that this is particular test is only about making the tests not use a darwin target anymore. That LGTM.

Aug 20 2018, 12:31 PM
MatzeB added reviewers for D50988: Remove Darwin support from POWER backend.: dexonsmith, Gerolf.

I personally can't think of a reason why not to do this. But I think sending an RFC to llvm-dev is in order here.

Aug 20 2018, 12:29 PM
MatzeB accepted D48994: Update DBG_VALUE register operand during LiveInterval operations.

LGTM
Just change the title when committing since this is in generic code and will affect every target and not just "WebAssembly"

Aug 20 2018, 10:24 AM · debug-info
MatzeB accepted D50339: Consistently use MemoryLocation::UnknownSize to indicate unknown access size.

The practical application of this is to indicate aliasing with a given memory object, but without specifying exactly which part of the object is accessed (since it may be unknown). Dropping memory operands altogether will alias the operation with all other objects, so it will be correct, but overly conservative.

Aug 20 2018, 10:23 AM
MatzeB added a comment to D49401: TII: Generalize X86's isSafeToClobberEFLAGs.

Also I'd generally recommend to rather look at LivePhysReg/LiveRegUnits for doing liveness queries post-ra. They can be more expensive but give more consistent results...

I want to use this in an SSA pass, not post-RA

Aug 20 2018, 10:19 AM
MatzeB added a comment to D49671: [SchedModel] Propagate read advance cycles to implicit operands outside instruction descriptor.

This is tricky. Some comments:

  • Have you tried enabling subregister liveness tracking? Among other things it gets rid of the implicit-defs/uses for the full registers... (though there may be other factors influencing that decisions)

Yes, IIRC I have tried that, but got crashes immediately which was discouraging. So for the moment, that is not something that could be the default for SystemZ, I think.

  • What about just setting the latencies induced by the artifical implicit def-/uses[1] to 0? [1] = in lack of a better way to identify them, that would be all implicit vreg defs/uses that are not part of the MCInstrDesc.

Yes, that was also my idea but as I wrote earlier in some rare cases I noticed instructions where the actual latency was only put on that extra regalloc operand, while the explicit use op had just a unit latency!

I looked into this now a bit more, and it seems that in these cases a multiply or other instruction requires a double word register (128 bit), so a 64 bit register is coalesced into it:

Before Coalescing:

16B       %0:gr64bit = LGFRL @seedi ::
128B      undef %5.subreg_l64:gr128bit = COPY %0:gr64bit
144B      %6:gr128bit = COPY %5:gr128bit
160B      %6:gr128bit = MLGR %6:gr128bit, %3:gr64bit

After Coalescing:

16B       undef %5.subreg_l64:gr128bit = LGFRL @seedi :: (dereferenceable load 4 from @seedi)
...
144B      %6:gr128bit = COPY %5:gr128bit

After RA:

bb.0.entry:
renamable $r1d = LGFRL @seedi, implicit-def $r0q :: (dereferenceable load 4 from @seedi)
...
renamable $r4q = COPY renamable $r0q

After Post-RA pseudo instruction expansion pass:

bb.0.entry:
renamable $r1d = LGFRL @seedi, implicit-def $r0q :: (dereferenceable load 4 from @seedi)
...
$r4d = LGR $r0d, implicit $r0q
$r5d = LGR $r1d, implicit $r0q

DAG has the latency on $r0q (superreg), instead of $r0d between SU(0) and SU(3). ($r0q = $r0d + $r1d):

SU(0):   renamable $r1d = LGFRL @seedi, implicit-def $r0q :: (dereferenceable load 4 from @seedi)
# preds left       : 0
# succs left       : 9
# rdefs left       : 0
Latency            : 5
Depth              : 0
Height             : 70
Successors:
SU(4): Data Latency=5 Reg=$r1d
SU(4): Data Latency=5 Reg=$r0q
SU(3): Data Latency=5 Reg=$r0q
SU(3): Data Latency=1 Reg=$r0d
...
SU(3):   $r4d = LGR $r0d, implicit $r0q
# preds left       : 2
# succs left       : 3
# rdefs left       : 0
Latency            : 1
Depth              : 5
Height             : 65
Predecessors:
SU(0): Data Latency=5 Reg=$r0q
SU(0): Data Latency=1 Reg=$r0d
...
SU(4):   $r5d = LGR $r1d, implicit $r0q
# preds left       : 2
# succs left       : 3
# rdefs left       : 0
Latency            : 1
Depth              : 5
Height             : 65
Predecessors:
SU(0): Data Latency=5 Reg=$r1d
SU(0): Data Latency=5 Reg=$r0q
...

Seems like these are (rare) cases then where the defining instruction has an explicit def-op of a subregister, and a RegAlloc-implicit-def of the full register. The using instruction has an explicit use of the *other*
subreg, and an implicit use of the full register. The latency value is set only on the super-register (RegAlloc operand).

  • The "implicit-def of the full register" is added when materializing the result of the regalloc in VirtRegMap.cpp, it is not present or necessary during regalloc itself.
  • In your debug output I see:
SU(0):   renamable $r1d = LGFRL @seedi, implicit-def $r0q :: (dereferenceable load 4 from @seedi)
...
Successors:
SU(4): Data Latency=5 Reg=$r1d
SU(4): Data Latency=5 Reg=$r0q
SU(3): Data Latency=5 Reg=$r0q
SU(3): Data Latency=1 Reg=$r0d

Given that the actual instruction only writes to r1d I would argue that the latencies on r0q and r0d are "fake". Hence my proposal to ignore the extra operands during schedule dag construction or force their latency to zero. Or do you actually want a latency between SU(0) and SU(3) here?

Aug 20 2018, 10:18 AM
MatzeB added a comment to D50339: Consistently use MemoryLocation::UnknownSize to indicate unknown access size.

First: My experience with MachineMemOperands is limited, so feel free to correct me where necessary.

Aug 20 2018, 10:02 AM
MatzeB added a comment to D50914: [RegisterCoalescer] Fix for assert in removePartialRedundancy.

Please read https://llvm.org/docs/MIRLangRef.html#simplifying-mir-files

Aug 20 2018, 9:13 AM
MatzeB added a comment to D50914: [RegisterCoalescer] Fix for assert in removePartialRedundancy.

It would help review if you could describe the exact situation this happens. I am wondering:

Aug 20 2018, 9:11 AM
MatzeB added a reviewer for D50914: [RegisterCoalescer] Fix for assert in removePartialRedundancy: wmi.
Aug 20 2018, 8:56 AM

Aug 2 2018

MatzeB updated the diff for D50209: cmake: Explicitely specify benchmark data.

Let llvm_test_data() handle directories so we don't need a separate llvm_test_data_dir().

Aug 2 2018, 7:01 PM
MatzeB added inline comments to D50209: cmake: Explicitely specify benchmark data.
Aug 2 2018, 6:51 PM
MatzeB updated the diff for D50209: cmake: Explicitely specify benchmark data.

Fix benchmarks I missed before because they aren't built by default.

Aug 2 2018, 6:32 PM
MatzeB updated the diff for D50209: cmake: Explicitely specify benchmark data.
  • properly document new cmake functions and add SUB_DIRECTORY argument to llvm_test_data_dir() for consistency.
Aug 2 2018, 5:33 PM
MatzeB added a comment to D50209: cmake: Explicitely specify benchmark data.

Some notes:

  • I is only for the benchmarks shipping with the test-suite yet. The things in External will need some additional changes (they will keep working as is, just won't consistenly have their data in the builddir)
  • The copying grows the builddir from ~900MB to 1GB for me. If this is a concern then we could add a mode that creates symlinks when the test-suite is running locally; however I wasn't convinced yet it is worth having two modes just for 100MB savings in the builddir.
Aug 2 2018, 5:29 PM