This is an archive of the discontinued LLVM Phabricator instance.

Differential D48225

[llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register.
ClosedPublic

Authored by andreadb on Jun 15 2018, 9:27 AM.

Details

Reviewers

RKSimon
spatel
courbet
gbedwell
craig.topper
ab
mattd

Commits

rG2145b13fc92c: [llvm-mca][X86] Teach how to identify register writes that implicitly clear the…
rL335113: [llvm-mca][X86] Teach how to identify register writes that implicitly clear the…

Summary

This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register.

On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits".
Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.
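The difference between a 32-bit write and a narrower write can be sketched with a small standalone model. `GPR64Model` and `writeKillsOldValue` are hypothetical names made up for this illustration; they are not part of LLVM or of this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 64-bit general purpose register (e.g. RAX).
// Illustration only; this is not LLVM or llvm-mca code.
struct GPR64Model {
  std::uint64_t Bits = 0;

  // A 32-bit write (e.g. to EAX) zero-extends into the full register,
  // so nothing of the old value survives.
  void write32(std::uint32_t Value) { Bits = Value; }

  // A 16-bit write (e.g. to AX) replaces only the low 16 bits; the upper
  // 48 bits keep their old contents.
  void write16(std::uint16_t Value) {
    Bits = (Bits & ~std::uint64_t{0xFFFF}) | Value;
  }
};

// True if a GPR write of the given width leaves no bits of the old 64-bit
// value behind, i.e. a later 64-bit read does not depend on older writes.
inline bool writeKillsOldValue(unsigned WidthInBits) {
  assert(WidthInBits == 8 || WidthInBits == 16 || WidthInBits == 32 ||
         WidthInBits == 64);
  return WidthInBits >= 32;
}
```

Only the 32-bit and 64-bit cases fully redefine the register, which is why they are the interesting ones for dependency tracking.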

This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register.
The rest of the patch teaches llvm-mca how to use that new method to obtain the information, and update the register dependencies accordingly.
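As a rough sketch of what "update the register dependencies accordingly" means, consider a tracker for a single 64-bit register split into two halves. `SuperRegTracker` is a hypothetical, heavily simplified model written for this note; it is not how llvm-mca's RegisterFile is implemented.

```cpp
#include <cassert>
#include <set>

// Hypothetical tracker for one 64-bit register, split into a low and a
// high 32-bit half. Each half remembers the index of its last writer.
struct SuperRegTracker {
  int LowWriter = -1;  // last instruction that wrote bits [31:0]
  int HighWriter = -1; // last instruction that defined bits [63:32]

  void write64(int InstrIndex) {
    assert(InstrIndex >= 0);
    LowWriter = HighWriter = InstrIndex;
  }

  // A 32-bit write always defines the low half. If it also clears the
  // super-register, it becomes the definition of the high half too.
  void write32(int InstrIndex, bool ClearsSuperRegister) {
    assert(InstrIndex >= 0);
    LowWriter = InstrIndex;
    if (ClearsSuperRegister)
      HighWriter = InstrIndex;
  }

  // Instructions that a full 64-bit read has to wait for.
  std::set<int> read64Dependencies() const {
    std::set<int> Deps;
    if (LowWriter >= 0)
      Deps.insert(LowWriter);
    if (HighWriter >= 0)
      Deps.insert(HighWriter);
    return Deps;
  }
};
```

With the sequence from clear-super-register-1.s (a 64-bit write to RAX followed by a 32-bit write to EAX), a later read of RAX depends on both writes when the flag is false, and only on the EAX write when it is true.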

I compared the kernels from tests clear-super-register-1.s and clear-super-register-2.s against the output from perf on btver2.
Previously there was a large discrepancy between the estimated IPC and the measured IPC. Now the differences are mostly in the noise.

Please let me know if okay to commit.

Thanks,
Andrea

Event Timeline

andreadb created this revision. Jun 15 2018, 9:27 AM

Herald added a subscriber: tschuett. Jun 15 2018, 9:27 AM

Should there be some AVX512VL tests? We don't have any scheduler that can test XOP instructions AFAICT (unless we want to cheat and use SandyBridge in its role as the generic model).

include/llvm/MC/MCInstrAnalysis.h
88	When is it better to use BitVector vs APInt? I don't have an answer but we're incredibly inconsistent on this!
lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
19	Include ordering?
316	XOP instructions? I think the TBM instructions that use this encoding will be safe as they are always GR32/GR64.
323	Is this safe for i686 32-bit targets?

I like the fact that we can now keep track of the implicit zero of the upper register bits. Very cool. This change makes sense, and as far as I can tell it looks good, but I'd wait for a few others to weigh in.

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
307	Since you have a static routine for creating instances of this class: `createX86MCInstrAnalysis`, you can probably make this ctor private. It can probably be marked `explicit`, to imply no converting.
328	Can we be sure that NumDefs <= BitVector::size? I know Bitvector's access operators assert to check for out of bounds indexing.
tools/llvm-mca/InstrBuilder.h
55	Should the names of the formal parameters be capitalized? I realize you are trying to avoid clashing with the member names.

In D48225#1133813, @RKSimon wrote:

Should there be some AVX512VL tests? We don't have any scheduler that can test XOP instructions AFAICT (unless we want to cheat and use SandyBridge in its role as the generic model).

Thanks Simon,

I will add a test for AVX512VL. If you have a good idea about how to test XOP, then I can add more tests for it too.

include/llvm/MC/MCInstrAnalysis.h
88	I think using a BitVector (at least in this context) is probably okay. But, to be honest, I don't know the right answer to that question either. The idea is to use a simple bit vector for very simple bit manipulation. APInt has a much richer interface: beyond bit manipulation, it supports arithmetic and logic computation on arbitrary-precision integers. APInt is probably "over designed" for this particular context. That being said, both interfaces are okay. If you prefer, I can switch to APInt.
lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
19	Sorry about that. I will move the new include before "X86MCAsmInfo.h".
316	Good point. I forgot about XOP. However - as you mentioned - I have no idea how to test it. Do you still want me to add the check for XOP, even if we cannot write a test for it? Alternatively, I can leave a FIXME comment. Not sure what is more appropriate in this context...
323	It should be safe for i686 because the super-register of a 32-bit GPR is not usable/existent in practice.

craig.topper added inline comments. Jun 15 2018, 10:41 AM

include/llvm/MC/MCInstrAnalysis.h
88	BitVector always heap allocates, APInt heap allocates above 64 bits. SmallBitVector heap allocates above 58 bits on 64-bit hosts and above 27 bits on 32-bit hosts.
lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
323	VR128RegClass doesn't include XMM16-XMM31. Those are in VR128XRegClass.

Does this cover writes to YMM clearing the upper half of ZMM registers?

andreadb added inline comments. Jun 15 2018, 10:44 AM

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
307	Unless I am missing something, the `createX86MCInstrAnalysis` static routine would not be able to see the constructor if we make it private.
328	We cannot be sure. We assume that it has been set to the right size by the user. Otherwise, it will assert inside BitVector. What we could do, is to always "resize" the BitVector to the actual number of register writes (from the information in Desc).
tools/llvm-mca/InstrBuilder.h
55	I have seen it done in a few (not many) places in LLVM specifically to avoid the issue with name clash. I can use different names to avoid the name clashing. Alternatively, I can just capitalise the param names; it would still work, but it would not be nice to read.
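
The "resize" option mentioned in the reply for line 328 above could look roughly like the sketch below. `DescModel`, `prepareWriteMask` and `implicitWriteIndex` are hypothetical stand-ins invented for illustration, and `std::vector<bool>` is used in place of the LLVM container so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the write counts stored in an instruction
// descriptor.
struct DescModel {
  std::size_t NumDefs;         // explicit register writes
  std::size_t NumImplicitDefs; // implicit register writes
};

// Sizes the mask to one flag per register write and clears every flag, so
// later indexing cannot go out of bounds whatever size the caller passed in.
inline void prepareWriteMask(const DescModel &Desc,
                             std::vector<bool> &Writes) {
  Writes.assign(Desc.NumDefs + Desc.NumImplicitDefs, false);
}

// Flag index of the I-th implicit write: implicit writes come after all
// explicit writes.
inline std::size_t implicitWriteIndex(const DescModel &Desc, std::size_t I) {
  assert(I < Desc.NumImplicitDefs && "implicit write index out of range");
  return Desc.NumDefs + I;
}
```

The trade-off is that the callee pays for the resize on every call instead of relying on the caller to size the mask once.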

In D48225#1133891, @craig.topper wrote:

Does this cover writes to YMM clearing the upper half of ZMM registers?

Good point. I didn't consider that case.
I guess we need to conservatively also check for YMM writes. Obviously, the implicit zeroing of the upper half of ZMM would only make sense for targets where ZMM is effectively usable...

include/llvm/MC/MCInstrAnalysis.h
88	Thanks for the answer. I will change it to APInt then.
lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
323	I will change it. Thanks.

mattd added inline comments. Jun 15 2018, 11:02 AM

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
307	You are correct, `createX86MCInstrAnalysis` would have to be a static method of `X86MCInstrAnalysis` to make private the create* routine. I see that none of the other create* routines in here are presented that way, so it probably doesn't make much sense to change this one. Please ignore my original comment.

Diffusion mentioned this in rL334945: [llvm-mca] Add tests for XOP and AVX512 instructions that implicitly clear the…. Jun 18 2018, 7:05 AM

Patch updated.

Addressed review comments.
We now use an APInt instead of a BitVector to keep track of writes that update super-register(s).

A few XOP and AVX512 tests have been added at revision 334945 (http://llvm.org/viewvc/llvm-project?view=revision&revision=334945).

This new patch shows the updated analysis. Now the tool correctly kills the dependencies with the zeroed portions of YMM/ZMM registers. The new IPC for those tests looks much more realistic. For example, avx512-super-registers-2.s goes from IPC 0.29 to IPC 1.89, with a theoretical maximum IPC of 2.00 (computed as NumInstructions / Block RThroughput = 6 / 3).
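
The arithmetic behind these figures can be reproduced with a few lines of code. The helper names below (`ipc`, `maxIPC`, `roundTo2`) are made up for this sketch; the inputs are the 6 / 3 example above and the clear-super-register-1.s expectations in this revision (400 instructions, 1203 total cycles before the patch, 704 after).

```cpp
#include <cassert>
#include <cmath>

// IPC as shown in the summary view: simulated instructions divided by
// simulated cycles.
inline double ipc(unsigned Instructions, unsigned TotalCycles) {
  assert(TotalCycles != 0 && "cannot compute IPC over zero cycles");
  return static_cast<double>(Instructions) / TotalCycles;
}

// Upper bound on IPC for a block: instructions per iteration divided by
// the block reciprocal throughput.
inline double maxIPC(unsigned InstructionsPerIteration,
                     double BlockRThroughput) {
  assert(BlockRThroughput > 0.0 && "reciprocal throughput must be positive");
  return InstructionsPerIteration / BlockRThroughput;
}

// Rounds to two decimal places, matching how the figures are quoted here.
inline double roundTo2(double Value) {
  return std::round(Value * 100.0) / 100.0;
}
```

For the inputs named above, `maxIPC(6, 3.0)` evaluates to 2.0, and `roundTo2(ipc(400, 1203))` and `roundTo2(ipc(400, 704))` evaluate to 0.33 and 0.57, the values in the old and new CHECK lines.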

Please let me know if okay to commit.

Thanks,
Andrea

No more comments from me - @mattd @craig.topper are you guys happy with this?

I'm cool with the patch. I don't see anything wrong.

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
330	I like this lambda, and it makes sense to me. AFAIK, the [=] capture encourages a copy of GR32RC and VR256XRC. Can we get away with a reference capture [&] instead?

mattd added inline comments. Jun 19 2018, 8:34 AM

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
330	Nevermind, the GR32RC and VR128XRC are not automatic to this lambda, so they are propagated via reference semantics. This looks good to me.

Thanks Matt and Simon.

I will wait for the okay from Craig on the AVX512 part before committing this patch.

Cheers,
Andrea

craig.topper added inline comments. Jun 19 2018, 10:08 AM

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
348	What should happen if you enable avx512f and xop instructions at the same time? I know no real CPU supports it, but should a 256-bit xop instruction clear the upper bits of zmm?

andreadb added inline comments. Jun 19 2018, 10:54 AM

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
348	For XOP instructions, the document "AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit and 256-Bit Media Instructions" says:

"Bits [255:128] of the YMM register that corresponds to the destination are cleared"

That sentence relates to XOP instructions that set an XMM register. However, the same document uses a very similar sentence when describing AVX instructions. For example, for VADDPD we have this:

"XMM Encoding: The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared."

VLMAX (or, the concept of a "maximum vector register width" for the processor) is not even mentioned in the entire document. So, I honestly don't know what the right answer to your question is. If we want to be conservative, then we can assume for now that XOP does not update the upper bits of a ZMM register. In future (if AMD decides not to drop XOP), we can revisit this choice and update/simplify this code. What do you think?

lebedev.ri added a subscriber: lebedev.ri. Jun 19 2018, 10:59 AM

lebedev.ri added inline comments.

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
348	"In future (if AMD decides not to drop XOP)": They already dropped it (and TBM), they don't exist in Zen.

So as it is now, writing a VR128X with XOP will zero [511:256] and [255:128], but writing VR256X with xop won't?

In D48225#1136793, @craig.topper wrote:

So as it is now, writing a VR128X with XOP will zero [511:256] and [255:128], but writing VR256X with xop won't?

I see what you mean. I don't want to complicate the API (especially since there is no cpu with XOP and AVX512f). What if I treat XOP the same as AVX then? In practice, this won't make any difference.

I'm fine with treating it the same as AVX.

Patch updated.

Treat XOP the same way as AVX. When the destination of a XOP instruction is an XMM register, we assume that the upper portion of all super-registers is cleared.
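
The rule agreed on in this thread can be summarised as a small predicate. This is a sketch under assumptions: the `Encoding` and `RegClass` enums and the function name are hypothetical simplifications, not the X86II flags or MCRegisterClass objects that the patch itself uses.

```cpp
#include <cassert>

// Hypothetical, simplified view of an instruction encoding and of the
// register class of a written register.
enum class Encoding { Legacy, VEX, EVEX, XOP };
enum class RegClass { GR32, GR64, VR128X, VR256X, VR512 };

// True if a write to a register of class RC by an instruction with
// encoding Enc is assumed to clear the upper portion of its
// super-register(s).
inline bool writeClearsSuperRegister(Encoding Enc, RegClass RC) {
  // A 32-bit GPR write zero-extends into the 64-bit register regardless
  // of how the instruction is encoded.
  if (RC == RegClass::GR32)
    return true;

  // XOP is treated the same way as VEX/EVEX (AVX) encodings.
  const bool IsVectorEncoding = Enc == Encoding::VEX ||
                                Enc == Encoding::EVEX ||
                                Enc == Encoding::XOP;

  // XMM and YMM writes clear the upper bits of the wider aliasing registers.
  return IsVectorEncoding &&
         (RC == RegClass::VR128X || RC == RegClass::VR256X);
}
```

Under this model a legacy SSE write to an XMM register does not clear anything, while the same write from a VEX, EVEX or XOP instruction does.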

Ok the register class/feature stuff looks good to me now.

We have a quorum! LGTM

This revision is now accepted and ready to land. Jun 20 2018, 2:36 AM

In D48225#1137437, @RKSimon wrote:

We have a quorum! LGTM

Cheers ;-)

Thanks Craig/Simon/Matt for all the feedback.

Closed by commit rL335113: [llvm-mca][X86] Teach how to identify register writes that implicitly clear the… (authored by adibiagio). Jun 20 2018, 3:12 AM

This revision was automatically updated to reflect the committed changes.

Diff 151519

include/llvm/MC/MCInstrAnalysis.h

Show All 16 Lines

#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include <cstdint>		#include <cstdint>

namespace llvm {		namespace llvm {

		class BitVector;
		class MCRegisterInfo;

class MCInstrAnalysis {		class MCInstrAnalysis {
protected:		protected:
friend class Target;		friend class Target;

const MCInstrInfo *Info;		const MCInstrInfo *Info;

public:		public:
MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}		MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
Show All 22 Lines	public:
virtual bool isReturn(const MCInst &Inst) const {		virtual bool isReturn(const MCInst &Inst) const {
return Info->get(Inst.getOpcode()).isReturn();		return Info->get(Inst.getOpcode()).isReturn();
}		}

virtual bool isTerminator(const MCInst &Inst) const {		virtual bool isTerminator(const MCInst &Inst) const {
return Info->get(Inst.getOpcode()).isTerminator();		return Info->get(Inst.getOpcode()).isTerminator();
}		}

		/// Returns true if at least one of the register writes performed by
		/// \param Inst implicitly clears the upper portion of a super-register.
		///
		/// Example: on X86-64, a write to EAX implicitly clears the upper half of
		/// RAX. Also (still on x86) an XMM write performed by an AVX 128-bit
		/// instruction implicitly clears the upper half of the aliasing YMM register.
		///
		/// This method also updates a BitVector of register writes. There is one
		/// bit for every explicit/implicit write performed by the instruction. If a
		/// write implicitly clears its super-registers, then the corresponding bit is
		/// set (vic. it is cleared).
		///
		/// The first bits in the vector are related to explicit writes. The remaining
		/// bits are related to implicit writes. The sequence of writes follows the
		/// machine operand sequence. For implicit writes, the sequence is defined by
		/// the MCInstrDesc.
		///
		/// The assumption is that the capacity of the BitVector is correctly set by
		/// the caller. The default implementation conservatively assumes that none of
		/// the writes clears the upper portion of a super-register.
		virtual bool clearsSuperRegisters(const MCRegisterInfo &MRI,
		const MCInst &Inst,
		BitVector &Writes) const;

/// Given a branch instruction try to get the address the branch		/// Given a branch instruction try to get the address the branch
/// targets. Return true on success, and the address in Target.		/// targets. Return true on success, and the address in Target.
virtual bool		virtual bool
evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,		evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const;		uint64_t &Target) const;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_MC_MCINSTRANALYSIS_H		#endif // LLVM_MC_MCINSTRANALYSIS_H

lib/MC/MCInstrAnalysis.cpp

	//===- MCInstrAnalysis.cpp - InstrDesc target hooks -----------------------===//			//===- MCInstrAnalysis.cpp - InstrDesc target hooks -----------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/MC/MCInstrAnalysis.h"			#include "llvm/MC/MCInstrAnalysis.h"

				#include "llvm/ADT/BitVector.h"
	#include "llvm/MC/MCInst.h"			#include "llvm/MC/MCInst.h"
	#include "llvm/MC/MCInstrDesc.h"			#include "llvm/MC/MCInstrDesc.h"
	#include "llvm/MC/MCInstrInfo.h"			#include "llvm/MC/MCInstrInfo.h"
	#include <cstdint>			#include <cstdint>

	using namespace llvm;			using namespace llvm;

				bool MCInstrAnalysis::clearsSuperRegisters(const MCRegisterInfo &MRI,
				const MCInst &Inst,
				BitVector &Writes) const {
				Writes.reset();
				return false;
				}

	bool MCInstrAnalysis::evaluateBranch(const MCInst &Inst, uint64_t Addr,			bool MCInstrAnalysis::evaluateBranch(const MCInst &Inst, uint64_t Addr,
	uint64_t Size, uint64_t &Target) const {			uint64_t Size, uint64_t &Target) const {
	if (Inst.getNumOperands() == 0 ||			if (Inst.getNumOperands() == 0 ||
	Info->get(Inst.getOpcode()).OpInfo[0].OperandType != MCOI::OPERAND_PCREL)			Info->get(Inst.getOpcode()).OpInfo[0].OperandType != MCOI::OPERAND_PCREL)
	return false;			return false;

	int64_t Imm = Inst.getOperand(0).getImm();			int64_t Imm = Inst.getOperand(0).getImm();
	Target = Addr+Size+Imm;			Target = Addr+Size+Imm;
	return true;			return true;
	}			}

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp

	Show All 9 Lines
	// This file provides X86 specific target descriptions.			// This file provides X86 specific target descriptions.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "X86MCTargetDesc.h"			#include "X86MCTargetDesc.h"
	#include "InstPrinter/X86ATTInstPrinter.h"			#include "InstPrinter/X86ATTInstPrinter.h"
	#include "InstPrinter/X86IntelInstPrinter.h"			#include "InstPrinter/X86IntelInstPrinter.h"
	#include "X86MCAsmInfo.h"			#include "X86MCAsmInfo.h"
				#include "X86BaseInfo.h"
				#include "llvm/ADT/BitVector.h"
	#include "llvm/ADT/Triple.h"			#include "llvm/ADT/Triple.h"
	#include "llvm/DebugInfo/CodeView/CodeView.h"			#include "llvm/DebugInfo/CodeView/CodeView.h"
	#include "llvm/MC/MCInstrAnalysis.h"			#include "llvm/MC/MCInstrAnalysis.h"
	#include "llvm/MC/MCInstrInfo.h"			#include "llvm/MC/MCInstrInfo.h"
	#include "llvm/MC/MCRegisterInfo.h"			#include "llvm/MC/MCRegisterInfo.h"
	#include "llvm/MC/MCStreamer.h"			#include "llvm/MC/MCStreamer.h"
	#include "llvm/MC/MCSubtargetInfo.h"			#include "llvm/MC/MCSubtargetInfo.h"
	#include "llvm/MC/MachineLocation.h"			#include "llvm/MC/MachineLocation.h"
	▲ Show 20 Lines • Show All 262 Lines • ▼ Show 20 Lines
	}			}

	static MCRelocationInfo *createX86MCRelocationInfo(const Triple &TheTriple,			static MCRelocationInfo *createX86MCRelocationInfo(const Triple &TheTriple,
	MCContext &Ctx) {			MCContext &Ctx) {
	// Default to the stock relocation info.			// Default to the stock relocation info.
	return llvm::createMCRelocationInfo(TheTriple, Ctx);			return llvm::createMCRelocationInfo(TheTriple, Ctx);
	}			}

				namespace llvm {
				namespace X86_MC {

				class X86MCInstrAnalysis : public MCInstrAnalysis {
				X86MCInstrAnalysis(const X86MCInstrAnalysis &) = delete;
				X86MCInstrAnalysis &operator=(const X86MCInstrAnalysis &) = delete;
				virtual ~X86MCInstrAnalysis() = default;

				public:
				X86MCInstrAnalysis(const MCInstrInfo *MCII) : MCInstrAnalysis(MCII) {}

				bool clearsSuperRegisters(const MCRegisterInfo &MRI, const MCInst &Inst,
				BitVector &Writes) const override {
				const MCInstrDesc &Desc = Info->get(Inst.getOpcode());

				// AVX instructions that write to XMM registers zero out the upper 128 bits
				// of the underlying YMM register.
				bool HasVEXOrEVEX = ((Desc.TSFlags & X86II::EncodingMask) == X86II::VEX ||
				(Desc.TSFlags & X86II::EncodingMask) == X86II::EVEX);

				// On X86-64, a general purpose integer register is viewed as a 64-bit
				// register internal to the processor.
				// An update to the lower 32 bits of a 64 bit integer register is
				// architecturally defined to zero extend the upper 32 bits.
				const MCRegisterClass &GR32RC = MRI.getRegClass(X86::GR32RegClassID);
				const MCRegisterClass &XMMRC = MRI.getRegClass(X86::VR128RegClassID);

				unsigned NumDefs = Desc.getNumDefs();
				for (unsigned I = 0, E = NumDefs; I < E; ++I) {
				const MCOperand &Op = Inst.getOperand(I);
				Writes[I] = GR32RC.contains(Op.getReg()) ||
				(HasVEXOrEVEX && XMMRC.contains(Op.getReg()));
				}

				for (unsigned I = 0, E = Desc.getNumImplicitDefs(); I < E; ++I) {
				const MCPhysReg Reg = Desc.getImplicitDefs()[I];
				Writes[I + NumDefs] =
				GR32RC.contains(Reg) || (HasVEXOrEVEX && XMMRC.contains(Reg));
				}

				return Writes.any();
				}
				};

				} // end of namespace X86_MC

				} // end of namespace llvm

	static MCInstrAnalysis *createX86MCInstrAnalysis(const MCInstrInfo *Info) {			static MCInstrAnalysis *createX86MCInstrAnalysis(const MCInstrInfo *Info) {
	return new MCInstrAnalysis(Info);			return new X86_MC::X86MCInstrAnalysis(Info);
	}			}

	// Force static initialization.			// Force static initialization.
	extern "C" void LLVMInitializeX86TargetMC() {			extern "C" void LLVMInitializeX86TargetMC() {
	for (Target *T : {&getTheX86_32Target(), &getTheX86_64Target()}) {			for (Target *T : {&getTheX86_32Target(), &getTheX86_64Target()}) {
	// Register the MC asm info.			// Register the MC asm info.
	RegisterMCAsmInfoFn X(*T, createX86MCAsmInfo);			RegisterMCAsmInfoFn X(*T, createX86MCAsmInfo);

	// Register the MC instruction info.			// Register the MC instruction info.
	▲ Show 20 Lines • Show All 218 Lines • Show Last 20 Lines

test/tools/llvm-mca/X86/BtVer2/clear-super-register-1.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 -resource-pressure=false -timeline -timeline-max-iterations=2 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 -resource-pressure=false -timeline -timeline-max-iterations=2 < %s | FileCheck %s

	## Sets register RAX.			## Sets register RAX.
	imulq $5, %rcx, %rax			imulq $5, %rcx, %rax

	## Kills the previous definition of RAX.			## Kills the previous definition of RAX.
	## The upper portion of RAX is cleared.			## The upper portion of RAX is cleared.
	lzcnt %ecx, %eax			lzcnt %ecx, %eax

	## The AND can start immediately after the LZCNT.			## The AND can start immediately after the LZCNT.
	## It doesn't need to wait for the IMUL.			## It doesn't need to wait for the IMUL.
	and %rcx, %rax			and %rcx, %rax
	bsf %rax, %rcx			bsf %rax, %rcx

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 400			# CHECK-NEXT: Instructions: 400
	# CHECK-NEXT: Total Cycles: 1203			# CHECK-NEXT: Total Cycles: 704
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 0.33			# CHECK-NEXT: IPC: 0.57
	# CHECK-NEXT: Block RThroughput: 6.0			# CHECK-NEXT: Block RThroughput: 6.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [6]: HasSideEffects			# CHECK-NEXT: [6]: HasSideEffects

	# CHECK: [1] [2] [3] [4] [5] [6] Instructions:			# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
	# CHECK-NEXT: 2 6 4.00 imulq $5, %rcx, %rax			# CHECK-NEXT: 2 6 4.00 imulq $5, %rcx, %rax
	# CHECK-NEXT: 1 1 0.50 lzcntl %ecx, %eax			# CHECK-NEXT: 1 1 0.50 lzcntl %ecx, %eax
	# CHECK-NEXT: 1 1 0.50 andq %rcx, %rax			# CHECK-NEXT: 1 1 0.50 andq %rcx, %rax
	# CHECK-NEXT: 8 5 2.00 bsfq %rax, %rcx			# CHECK-NEXT: 8 5 2.00 bsfq %rax, %rcx

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0123456789			# CHECK-NEXT: 01234567
	# CHECK-NEXT: Index 0123456789 0123456			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeeeeER . . . .. imulq $5, %rcx, %rax			# CHECK: [0,0] DeeeeeeER . . . imulq $5, %rcx, %rax
	# CHECK-NEXT: [0,1] .DeE----R . . . .. lzcntl %ecx, %eax			# CHECK-NEXT: [0,1] .DeE----R . . . lzcntl %ecx, %eax
	# CHECK-NEXT: [0,2] .D=====eER. . . .. andq %rcx, %rax			# CHECK-NEXT: [0,2] .D=eE----R. . . andq %rcx, %rax
	# CHECK-NEXT: [0,3] . D=====eeeeeER. . .. bsfq %rax, %rcx			# CHECK-NEXT: [0,3] . D=eeeeeER . . bsfq %rax, %rcx
	# CHECK-NEXT: [1,0] . .D======eeeeeeER .. imulq $5, %rcx, %rax			# CHECK-NEXT: [1,0] . .D==eeeeeeER. imulq $5, %rcx, %rax
	# CHECK-NEXT: [1,1] . . D=====eE-----R .. lzcntl %ecx, %eax			# CHECK-NEXT: [1,1] . . D=eE-----R. lzcntl %ecx, %eax
	# CHECK-NEXT: [1,2] . . D===========eER .. andq %rcx, %rax			# CHECK-NEXT: [1,2] . . D==eE-----R andq %rcx, %rax
	# CHECK-NEXT: [1,3] . . D===========eeeeeER bsfq %rax, %rcx			# CHECK-NEXT: [1,3] . . D==eeeeeER bsfq %rax, %rcx

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 2 4.0 0.5 0.0 imulq $5, %rcx, %rax			# CHECK-NEXT: 0. 2 2.0 0.5 0.0 imulq $5, %rcx, %rax
	# CHECK-NEXT: 1. 2 3.5 0.5 4.5 lzcntl %ecx, %eax			# CHECK-NEXT: 1. 2 1.5 0.5 4.5 lzcntl %ecx, %eax
	# CHECK-NEXT: 2. 2 9.0 0.0 0.0 andq %rcx, %rax			# CHECK-NEXT: 2. 2 2.5 0.0 4.5 andq %rcx, %rax
	# CHECK-NEXT: 3. 2 9.0 0.0 0.0 bsfq %rax, %rcx			# CHECK-NEXT: 3. 2 2.5 0.0 0.0 bsfq %rax, %rcx
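
As a sanity check on the summary numbers in this test, IPC is the instruction count divided by the total cycle count. The helper below is a standalone sketch and not part of the patch. The name `ipc` and the rounding to two decimals are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Instructions per cycle, rounded to two decimals.
// Hypothetical helper; llvm-mca computes and prints this value itself.
double ipc(unsigned Instructions, unsigned TotalCycles) {
  assert(TotalCycles != 0 && "cycle count must be non-zero");
  return std::round(100.0 * Instructions / TotalCycles) / 100.0;
}
```

With the values from this test, `ipc(400, 1203)` gives 0.33 (before the patch) and `ipc(400, 704)` gives 0.57 (after the patch).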

test/tools/llvm-mca/X86/BtVer2/clear-super-register-2.s

	Show All 27 Lines
	vaddps %ymm3, %ymm1, %ymm4			vaddps %ymm3, %ymm1, %ymm4
	vaddps %ymm3, %ymm1, %ymm4			vaddps %ymm3, %ymm1, %ymm4
	vaddps %ymm3, %ymm1, %ymm4			vaddps %ymm3, %ymm1, %ymm4
	vaddps %ymm3, %ymm1, %ymm4			vaddps %ymm3, %ymm1, %ymm4
	vandps %xmm4, %xmm1, %xmm0			vandps %xmm4, %xmm1, %xmm0

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 1800			# CHECK-NEXT: Instructions: 1800
	# CHECK-NEXT: Total Cycles: 7003			# CHECK-NEXT: Total Cycles: 3811
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 0.26			# CHECK-NEXT: IPC: 0.47
	# CHECK-NEXT: Block RThroughput: 38.0			# CHECK-NEXT: Block RThroughput: 38.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	Show All 15 Lines
	# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 2 3 2.00 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 1 1 0.50 vandps %xmm4, %xmm1, %xmm0			# CHECK-NEXT: 1 1 0.50 vandps %xmm4, %xmm1, %xmm0

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0123456789 0123456789 0123456789 01234			# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789
	# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789			# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789

	# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER . . . . . . . vdivps %ymm0, %ymm1, %ymm3			# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER . . . . . . . . vdivps %ymm0, %ymm1, %ymm3
	# CHECK-NEXT: [0,1] .DeeeE----------------------------------R . . . . . . . vaddps %xmm0, %xmm1, %xmm3			# CHECK-NEXT: [0,1] .DeeeE----------------------------------R . . . . . . . . vaddps %xmm0, %xmm1, %xmm3
	# CHECK-NEXT: [0,2] . D====================================eeeER . . . . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,2] . D==eeeE--------------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,3] . D=====================================eeeER . . . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,3] . D===eeeE------------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,4] . D======================================eeeER . . . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,4] . D====eeeE-----------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,5] . D=======================================eeeER. . . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,5] . D=====eeeE---------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,6] . .D========================================eeeER . . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,6] . .D======eeeE--------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,7] . . D=========================================eeeER . . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,7] . . D=======eeeE------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,8] . . D==========================================eeeER . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,8] . . D========eeeE-----------------------R. . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,9] . . D===========================================eeeER . . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,9] . . D=========eeeE---------------------R. . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,10] . . D============================================eeeER. . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,10] . . D==========eeeE--------------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,11] . . .D=============================================eeeER . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,11] . . .D===========eeeE------------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,12] . . . D==============================================eeeER . . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,12] . . . D============eeeE-----------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,13] . . . D===============================================eeeER . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,13] . . . D=============eeeE---------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,14] . . . D================================================eeeER . . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,14] . . . D==============eeeE--------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,15] . . . D=================================================eeeER. . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,15] . . . D===============eeeE------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,16] . . . .D==================================================eeeER . vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: [0,16] . . . .D================eeeE-----------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: [0,17] . . . . D====================================================eER . vandps %xmm4, %xmm1, %xmm0			# CHECK-NEXT: [0,17] . . . . D==================eE----------R . . . . . . . vandps %xmm4, %xmm1, %xmm0
				# CHECK-NEXT: [1,0] . . . . D====================eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER. vdivps %ymm0, %ymm1, %ymm3
				# CHECK-NEXT: [1,1] . . . . D=================eeeE-------------------------------------R. vaddps %xmm0, %xmm1, %xmm3
				# CHECK-NEXT: [1,2] . . . . D===================eeeE-----------------------------------R vaddps %ymm3, %ymm1, %ymm4
				# CHECK-NEXT: [1,3] . . . . .D====================eeeE---------------------------------R vaddps %ymm3, %ymm1, %ymm4

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vdivps %ymm0, %ymm1, %ymm3			# CHECK-NEXT: 0. 2 11.0 1.5 0.0 vdivps %ymm0, %ymm1, %ymm3
	# CHECK-NEXT: 1. 1 1.0 1.0 34.0 vaddps %xmm0, %xmm1, %xmm3			# CHECK-NEXT: 1. 2 9.5 0.5 35.5 vaddps %xmm0, %xmm1, %xmm3
	# CHECK-NEXT: 2. 1 37.0 0.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 2. 2 11.5 0.0 33.5 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 3. 1 38.0 2.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 3. 2 12.5 2.0 31.5 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 4. 1 39.0 4.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 4. 1 5.0 4.0 29.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 5. 1 40.0 6.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 5. 1 6.0 6.0 27.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 6. 1 41.0 8.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 6. 1 7.0 7.0 26.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 7. 1 42.0 10.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 7. 1 8.0 8.0 24.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 8. 1 43.0 12.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 8. 1 9.0 9.0 23.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 9. 1 44.0 14.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 9. 1 10.0 10.0 21.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 10. 1 45.0 16.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 10. 1 11.0 11.0 20.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 11. 1 46.0 18.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 11. 1 12.0 12.0 18.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 12. 1 47.0 20.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 12. 1 13.0 13.0 17.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 13. 1 48.0 22.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 13. 1 14.0 14.0 15.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 14. 1 49.0 24.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 14. 1 15.0 15.0 14.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 15. 1 50.0 26.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 15. 1 16.0 16.0 12.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 16. 1 51.0 28.0 0.0 vaddps %ymm3, %ymm1, %ymm4			# CHECK-NEXT: 16. 1 17.0 17.0 11.0 vaddps %ymm3, %ymm1, %ymm4
	# CHECK-NEXT: 17. 1 53.0 0.0 0.0 vandps %xmm4, %xmm1, %xmm0			# CHECK-NEXT: 17. 1 19.0 0.0 10.0 vandps %xmm4, %xmm1, %xmm0

tools/llvm-mca/InstrBuilder.h

	Show All 11 Lines
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_TOOLS_LLVM_MCA_INSTRBUILDER_H			#ifndef LLVM_TOOLS_LLVM_MCA_INSTRBUILDER_H
	#define LLVM_TOOLS_LLVM_MCA_INSTRBUILDER_H			#define LLVM_TOOLS_LLVM_MCA_INSTRBUILDER_H

	#include "Instruction.h"			#include "Instruction.h"
	#include "Support.h"			#include "Support.h"
				#include "llvm/MC/MCInstrAnalysis.h"
	#include "llvm/MC/MCInstrInfo.h"			#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCRegisterInfo.h"
	#include "llvm/MC/MCSubtargetInfo.h"			#include "llvm/MC/MCSubtargetInfo.h"

	namespace mca {			namespace mca {

	class DispatchUnit;			class DispatchUnit;

	/// A builder class that knows how to construct Instruction objects.			/// A builder class that knows how to construct Instruction objects.
	///			///
	/// Every llvm-mca Instruction is described by an object of class InstrDesc.			/// Every llvm-mca Instruction is described by an object of class InstrDesc.
	/// An InstrDesc describes which registers are read/written by the instruction,			/// An InstrDesc describes which registers are read/written by the instruction,
	/// as well as the instruction latency and hardware resources consumed.			/// as well as the instruction latency and hardware resources consumed.
	///			///
	/// This class is used by the tool to construct Instructions and instruction			/// This class is used by the tool to construct Instructions and instruction
	/// descriptors (i.e. InstrDesc objects).			/// descriptors (i.e. InstrDesc objects).
	/// Information from the machine scheduling model is used to identify processor			/// Information from the machine scheduling model is used to identify processor
	/// resources that are consumed by an instruction.			/// resources that are consumed by an instruction.
	class InstrBuilder {			class InstrBuilder {
	const llvm::MCSubtargetInfo &STI;			const llvm::MCSubtargetInfo &STI;
	const llvm::MCInstrInfo &MCII;			const llvm::MCInstrInfo &MCII;
				const llvm::MCRegisterInfo &MRI;
				const llvm::MCInstrAnalysis &MCIA;
	llvm::SmallVector<uint64_t, 8> ProcResourceMasks;			llvm::SmallVector<uint64_t, 8> ProcResourceMasks;

	llvm::DenseMap<unsigned short, std::unique_ptr<const InstrDesc>> Descriptors;			llvm::DenseMap<unsigned short, std::unique_ptr<const InstrDesc>> Descriptors;
	llvm::DenseMap<const llvm::MCInst *, std::unique_ptr<const InstrDesc>>			llvm::DenseMap<const llvm::MCInst *, std::unique_ptr<const InstrDesc>>
	VariantDescriptors;			VariantDescriptors;

	const InstrDesc &createInstrDescImpl(const llvm::MCInst &MCI);			const InstrDesc &createInstrDescImpl(const llvm::MCInst &MCI);
	InstrBuilder(const InstrBuilder &) = delete;			InstrBuilder(const InstrBuilder &) = delete;
	InstrBuilder &operator=(const InstrBuilder &) = delete;			InstrBuilder &operator=(const InstrBuilder &) = delete;

	public:			public:
	InstrBuilder(const llvm::MCSubtargetInfo &sti, const llvm::MCInstrInfo &mcii)			InstrBuilder(const llvm::MCSubtargetInfo &sti, const llvm::MCInstrInfo &mcii,
				mattd: Should the names of the formal parameters be capitalized? I realize you are trying to avoid clashing with the member names.
				andreadb: I have seen it done in a few (not many) places in LLVM specifically to avoid the issue with name clash. I can use different names to avoid the name clashing. Alternatively, I just capitalise the param names; it would still work, but it would not be nice to read.
	: STI(sti), MCII(mcii),			const llvm::MCRegisterInfo &mri,
				const llvm::MCInstrAnalysis &mcia)
				: STI(sti), MCII(mcii), MRI(mri), MCIA(mcia),
	ProcResourceMasks(STI.getSchedModel().getNumProcResourceKinds()) {			ProcResourceMasks(STI.getSchedModel().getNumProcResourceKinds()) {
	computeProcResourceMasks(STI.getSchedModel(), ProcResourceMasks);			computeProcResourceMasks(STI.getSchedModel(), ProcResourceMasks);
	}			}

	const InstrDesc &getOrCreateInstrDesc(const llvm::MCInst &MCI);			const InstrDesc &getOrCreateInstrDesc(const llvm::MCInst &MCI);
	// Returns an array of processor resource masks.			// Returns an array of processor resource masks.
	// Masks are computed by function mca::computeProcResourceMasks. see			// Masks are computed by function mca::computeProcResourceMasks. see
	// Support.h for a description of how masks are computed and how masks can be			// Support.h for a description of how masks are computed and how masks can be
	Show All 10 Lines

tools/llvm-mca/InstrBuilder.cpp

//===--------------------- InstrBuilder.cpp ---------------------- C++ --===//		//===--------------------- InstrBuilder.cpp ---------------------- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
/// \file		/// \file
///		///
/// This file implements the InstrBuilder interface.		/// This file implements the InstrBuilder interface.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstrBuilder.h"		#include "InstrBuilder.h"
		#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/WithColor.h"		#include "llvm/Support/WithColor.h"

#define DEBUG_TYPE "llvm-mca"		#define DEBUG_TYPE "llvm-mca"

Show All 129 Lines	static void computeMaxLatency(InstrDesc &ID, const MCInstrDesc &MCDesc,
// If latency is unknown, then conservatively assume a MaxLatency of 100cy.		// If latency is unknown, then conservatively assume a MaxLatency of 100cy.
ID.MaxLatency = Latency < 0 ? 100U : static_cast<unsigned>(Latency);		ID.MaxLatency = Latency < 0 ? 100U : static_cast<unsigned>(Latency);
}		}

static void populateWrites(InstrDesc &ID, const MCInst &MCI,		static void populateWrites(InstrDesc &ID, const MCInst &MCI,
const MCInstrDesc &MCDesc,		const MCInstrDesc &MCDesc,
const MCSchedClassDesc &SCDesc,		const MCSchedClassDesc &SCDesc,
const MCSubtargetInfo &STI) {		const MCSubtargetInfo &STI) {
// Set if writes through this opcode may update super registers.
// TODO: on x86-64, a 4 byte write of a general purpose register always
// fully updates the super-register.
// More in general, (at least on x86) not all register writes perform
// a partial (super-)register update.
// For example, an AVX instruction that writes on a XMM register implicitly
// zeroes the upper half of every aliasing super-register.
//
// For now, we pessimistically assume that writes are all potentially
// partial register updates. This is a good default for most targets, execept
// for those like x86 which implement a special semantic for certain opcodes.
// At least on x86, this may lead to an inaccurate prediction of the
// instruction level parallelism.
bool FullyUpdatesSuperRegisters = false;

// Now Populate Writes.

// This algorithm currently works under the strong (and potentially incorrect)		// This algorithm currently works under the strong (and potentially incorrect)
// assumption that information related to register def/uses can be obtained		// assumption that information related to register def/uses can be obtained
// from MCInstrDesc.		// from MCInstrDesc.
//		//
// However class MCInstrDesc is used to describe MachineInstr objects and not		// However class MCInstrDesc is used to describe MachineInstr objects and not
// MCInst objects. To be more specific, MCInstrDesc objects are opcode		// MCInst objects. To be more specific, MCInstrDesc objects are opcode
// descriptors that are automatically generated via tablegen based on the		// descriptors that are automatically generated via tablegen based on the
// instruction set information available from the target .td files. That		// instruction set information available from the target .td files. That
Show All 84 Lines	if (CurrentDef < NumWriteLatencyEntries) {
// Conservatively default to MaxLatency.		// Conservatively default to MaxLatency.
Write.Latency = WLE.Cycles == -1 ? ID.MaxLatency : WLE.Cycles;		Write.Latency = WLE.Cycles == -1 ? ID.MaxLatency : WLE.Cycles;
Write.SClassOrWriteResourceID = WLE.WriteResourceID;		Write.SClassOrWriteResourceID = WLE.WriteResourceID;
} else {		} else {
// Assign a default latency for this write.		// Assign a default latency for this write.
Write.Latency = ID.MaxLatency;		Write.Latency = ID.MaxLatency;
Write.SClassOrWriteResourceID = 0;		Write.SClassOrWriteResourceID = 0;
}		}
Write.FullyUpdatesSuperRegs = FullyUpdatesSuperRegisters;
Write.IsOptionalDef = false;		Write.IsOptionalDef = false;
LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "\t\tOpIdx=" << Write.OpIndex << ", Latency=" << Write.Latency		dbgs() << "\t\tOpIdx=" << Write.OpIndex << ", Latency=" << Write.Latency
<< ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';		<< ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';
});		});
CurrentDef++;		CurrentDef++;
}		}

Show All 196 Lines	for (const ReadDescriptor &RD : D.Reads) {
if (!RegID)		if (!RegID)
continue;		continue;

// Okay, this is a register operand. Create a ReadState for it.		// Okay, this is a register operand. Create a ReadState for it.
assert(RegID > 0 && "Invalid register ID found!");		assert(RegID > 0 && "Invalid register ID found!");
NewIS->getUses().emplace_back(llvm::make_unique<ReadState>(RD, RegID));		NewIS->getUses().emplace_back(llvm::make_unique<ReadState>(RD, RegID));
}		}

		// Use a bit-vector to track register writes that implicitly clear the upper
		// portion of the underlying super-registers.
		// There is one bit for every (explicit or implicit) register write.
		BitVector BV(D.Writes.size());

		// Now query the MCInstrAnalysis object to obtain information about which
		// register writes implicitly clear the upper portion of a super-register.
		MCIA.clearsSuperRegisters(MRI, MCI, BV);

// Initialize writes.		// Initialize writes.
		unsigned WriteIndex = 0;
for (const WriteDescriptor &WD : D.Writes) {		for (const WriteDescriptor &WD : D.Writes) {
unsigned RegID =		unsigned RegID =
WD.OpIndex == -1 ? WD.RegisterID : MCI.getOperand(WD.OpIndex).getReg();		WD.OpIndex == -1 ? WD.RegisterID : MCI.getOperand(WD.OpIndex).getReg();
// Check if this is a optional definition that references NoReg.		// Check if this is a optional definition that references NoReg.
if (WD.IsOptionalDef && !RegID)		if (WD.IsOptionalDef && !RegID) {
		++WriteIndex;
continue;		continue;
		}

assert(RegID && "Expected a valid register ID!");		assert(RegID && "Expected a valid register ID!");
NewIS->getDefs().emplace_back(llvm::make_unique<WriteState>(WD, RegID));		NewIS->getDefs().emplace_back(
		llvm::make_unique<WriteState>(WD, RegID, BV[WriteIndex]));
		++WriteIndex;
}		}

return NewIS;		return NewIS;
}		}
} // namespace mca		} // namespace mca
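
The write-initialization loop above keeps `WriteIndex` in step with `D.Writes` even when an optional definition is skipped, so each bit in `BV` stays paired with its write descriptor. The following is a minimal standalone sketch of that bookkeeping. The `Toy*` types and `buildWriteStates` are hypothetical stand-ins, and `std::vector<bool>` replaces `llvm::BitVector`; none of this is the llvm-mca API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mca::WriteDescriptor.
struct ToyWriteDesc {
  int OpIndex;         // -1 for an implicit write.
  unsigned RegisterID; // Only meaningful for implicit writes.
  bool IsOptionalDef;
};

// Hypothetical stand-in for mca::WriteState.
struct ToyWriteState {
  unsigned RegisterID;
  bool ClearsSuperRegs;
};

// Operands[i] holds the register ID of operand i (0 means NoReg).
// ClearsMask has one bit per write descriptor, like BV in the patch.
std::vector<ToyWriteState>
buildWriteStates(const std::vector<ToyWriteDesc> &Writes,
                 const std::vector<unsigned> &Operands,
                 const std::vector<bool> &ClearsMask) {
  assert(ClearsMask.size() == Writes.size() && "one bit per write");
  std::vector<ToyWriteState> Defs;
  for (std::size_t I = 0; I < Writes.size(); ++I) {
    const ToyWriteDesc &WD = Writes[I];
    unsigned RegID = WD.OpIndex == -1 ? WD.RegisterID : Operands[WD.OpIndex];
    // Skip optional definitions that reference NoReg. The index I still
    // advances, so later writes read the correct bit from ClearsMask.
    if (WD.IsOptionalDef && RegID == 0)
      continue;
    Defs.push_back({RegID, ClearsMask[I]});
  }
  return Defs;
}
```

In this sketch the loop counter plays the role of `WriteIndex`; the patch needs the explicit `++WriteIndex` before `continue` because it uses a range-based loop.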

tools/llvm-mca/Instruction.h

Show All 64 Lines
struct WriteDescriptor {		struct WriteDescriptor {
// Operand index. -1 if this is an implicit write.		// Operand index. -1 if this is an implicit write.
int OpIndex;		int OpIndex;
// Write latency. Number of cycles before write-back stage.		// Write latency. Number of cycles before write-back stage.
int Latency;		int Latency;
// This field is set to a value different than zero only if this		// This field is set to a value different than zero only if this
// is an implicit definition.		// is an implicit definition.
unsigned RegisterID;		unsigned RegisterID;
// True if this write generates a partial update of a super-registers.
// On X86, this flag is set by byte/word writes on GPR registers. Also,
// a write of an XMM register only partially updates the corresponding
// YMM super-register if the write is associated to a legacy SSE instruction.
bool FullyUpdatesSuperRegs;
// Instruction itineraries would set this field to the SchedClass ID.		// Instruction itineraries would set this field to the SchedClass ID.
// Otherwise, it defaults to the WriteResourceID from the MCWriteLatencyEntry		// Otherwise, it defaults to the WriteResourceID from the MCWriteLatencyEntry
// element associated to this write.		// element associated to this write.
// When computing read latencies, this value is matched against the		// When computing read latencies, this value is matched against the
// "ReadAdvance" information. The hardware backend may implement		// "ReadAdvance" information. The hardware backend may implement
// dedicated forwarding paths to quickly propagate write results to dependent		// dedicated forwarding paths to quickly propagate write results to dependent
// instructions waiting in the reservation station (effectively bypassing the		// instructions waiting in the reservation station (effectively bypassing the
// write-back stage).		// write-back stage).
Show All 38 Lines	class WriteState {
int CyclesLeft;		int CyclesLeft;

// Actual register defined by this write. This field is only used		// Actual register defined by this write. This field is only used
// to speedup queries on the register file.		// to speedup queries on the register file.
// For implicit writes, this field always matches the value of		// For implicit writes, this field always matches the value of
// field RegisterID from WD.		// field RegisterID from WD.
unsigned RegisterID;		unsigned RegisterID;

		// True if this write implicitly clears the upper portion of RegisterID's
		// super-registers.
		bool ClearsSuperRegs;

// A list of dependent reads. Users is a set of dependent		// A list of dependent reads. Users is a set of dependent
// reads. A dependent read is added to the set only if CyclesLeft		// reads. A dependent read is added to the set only if CyclesLeft
// is "unknown". As soon as CyclesLeft is 'known', each user in the set		// is "unknown". As soon as CyclesLeft is 'known', each user in the set
// gets notified with the actual CyclesLeft.		// gets notified with the actual CyclesLeft.

// The 'second' element of a pair is a "ReadAdvance" number of cycles.		// The 'second' element of a pair is a "ReadAdvance" number of cycles.
std::set<std::pair<ReadState *, int>> Users;		std::set<std::pair<ReadState *, int>> Users;

public:		public:
WriteState(const WriteDescriptor &Desc, unsigned RegID)		WriteState(const WriteDescriptor &Desc, unsigned RegID,
: WD(Desc), CyclesLeft(UNKNOWN_CYCLES), RegisterID(RegID) {}		bool clearsSuperRegs = false)
		: WD(Desc), CyclesLeft(UNKNOWN_CYCLES), RegisterID(RegID),
		ClearsSuperRegs(clearsSuperRegs) {}
WriteState(const WriteState &Other) = delete;		WriteState(const WriteState &Other) = delete;
WriteState &operator=(const WriteState &Other) = delete;		WriteState &operator=(const WriteState &Other) = delete;

int getCyclesLeft() const { return CyclesLeft; }		int getCyclesLeft() const { return CyclesLeft; }
unsigned getWriteResourceID() const { return WD.SClassOrWriteResourceID; }		unsigned getWriteResourceID() const { return WD.SClassOrWriteResourceID; }
unsigned getRegisterID() const { return RegisterID; }		unsigned getRegisterID() const { return RegisterID; }

void addUser(ReadState *Use, int ReadAdvance);		void addUser(ReadState *Use, int ReadAdvance);
bool fullyUpdatesSuperRegs() const { return WD.FullyUpdatesSuperRegs; }		bool clearsSuperRegisters() const { return ClearsSuperRegs; }

// On every cycle, update CyclesLeft and notify dependent users.		// On every cycle, update CyclesLeft and notify dependent users.
void cycleEvent();		void cycleEvent();
void onInstructionIssued();		void onInstructionIssued();

#ifndef NDEBUG		#ifndef NDEBUG
void dump() const;		void dump() const;
#endif		#endif
Show All 213 Lines

tools/llvm-mca/RegisterFile.cpp

Show All 132 Lines	void RegisterFile::addRegisterWrite(WriteState &WS,

// No physical registers are allocated for instructions that are optimized in		// No physical registers are allocated for instructions that are optimized in
// hardware. For example, zero-latency data-dependency breaking instructions		// hardware. For example, zero-latency data-dependency breaking instructions
// don't consume physical registers.		// don't consume physical registers.
if (ShouldAllocatePhysRegs)		if (ShouldAllocatePhysRegs)
allocatePhysRegs(Mapping.second, UsedPhysRegs);		allocatePhysRegs(Mapping.second, UsedPhysRegs);

// If this is a partial update, then we are done.		// If this is a partial update, then we are done.
if (!WS.fullyUpdatesSuperRegs())		if (!WS.clearsSuperRegisters())
return;		return;

for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I)		for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I)
RegisterMappings[*I].first = &WS;		RegisterMappings[*I].first = &WS;
}		}

void RegisterFile::removeRegisterWrite(const WriteState &WS,		void RegisterFile::removeRegisterWrite(const WriteState &WS,
MutableArrayRef<unsigned> FreedPhysRegs,		MutableArrayRef<unsigned> FreedPhysRegs,
bool ShouldFreePhysRegs) {		bool ShouldFreePhysRegs) {
unsigned RegID = WS.getRegisterID();		unsigned RegID = WS.getRegisterID();
bool ShouldInvalidateSuperRegs = WS.fullyUpdatesSuperRegs();		bool ShouldInvalidateSuperRegs = WS.clearsSuperRegisters();

assert(RegID != 0 && "Invalidating an already invalid register?");		assert(RegID != 0 && "Invalidating an already invalid register?");
assert(WS.getCyclesLeft() != -512 &&		assert(WS.getCyclesLeft() != -512 &&
"Invalidating a write of unknown cycles!");		"Invalidating a write of unknown cycles!");
assert(WS.getCyclesLeft() <= 0 && "Invalid cycles left for this write!");		assert(WS.getCyclesLeft() <= 0 && "Invalid cycles left for this write!");
RegisterMapping &Mapping = RegisterMappings[RegID];		RegisterMapping &Mapping = RegisterMappings[RegID];
if (!Mapping.first)		if (!Mapping.first)
return;		return;
Show All 101 Lines
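
The effect of `clearsSuperRegisters()` in `addRegisterWrite` can be illustrated with a toy register file: a write that clears its super-registers also becomes the last known writer of each of them, while a partial write leaves their mappings untouched. This is a simplified sketch under stated assumptions. `ToyRegisterFile`, `makeToyGprFile`, the register constants, and the integer write IDs are all hypothetical, and physical register allocation is not modelled.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical register IDs for one x86-64 GPR family (0 is reserved).
constexpr unsigned AL = 1, AX = 2, EAX = 3, RAX = 4;

struct ToyRegisterFile {
  // Super-registers of each register (illustrative table).
  std::map<unsigned, std::vector<unsigned>> SuperRegs;
  // ID of the last write seen for each register (0 means none).
  std::map<unsigned, int> LastWrite;

  void addRegisterWrite(unsigned RegID, int WriteID, bool ClearsSuperRegs) {
    assert(RegID != 0 && "invalid register ID");
    LastWrite[RegID] = WriteID;
    // If this is a partial update, then we are done.
    if (!ClearsSuperRegs)
      return;
    // Otherwise the write also defines every super-register.
    for (unsigned Super : SuperRegs[RegID])
      LastWrite[Super] = WriteID;
  }
};

ToyRegisterFile makeToyGprFile() {
  ToyRegisterFile RF;
  RF.SuperRegs[AL] = {AX, EAX, RAX};
  RF.SuperRegs[AX] = {EAX, RAX};
  RF.SuperRegs[EAX] = {RAX};
  return RF;
}
```

For example, after a write to `RAX` followed by a write to `EAX` flagged as clearing super-registers, the `RAX` entry points at the second write, which is what removes the false dependency on the earlier 64-bit write in the first test.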

tools/llvm-mca/llvm-mca.cpp

Show All 375 Lines	int main(int argc, char **argv) {

std::unique_ptr<buffer_ostream> BOS;		std::unique_ptr<buffer_ostream> BOS;

mca::CodeRegions Regions(SrcMgr);		mca::CodeRegions Regions(SrcMgr);
MCStreamerWrapper Str(Ctx, Regions);		MCStreamerWrapper Str(Ctx, Regions);

std::unique_ptr<MCInstrInfo> MCII(TheTarget->createMCInstrInfo());		std::unique_ptr<MCInstrInfo> MCII(TheTarget->createMCInstrInfo());

		std::unique_ptr<MCInstrAnalysis> MCIA(
		TheTarget->createMCInstrAnalysis(MCII.get()));

if (!MCPU.compare("native"))		if (!MCPU.compare("native"))
MCPU = llvm::sys::getHostCPUName();		MCPU = llvm::sys::getHostCPUName();

std::unique_ptr<MCSubtargetInfo> STI(		std::unique_ptr<MCSubtargetInfo> STI(
TheTarget->createMCSubtargetInfo(TripleName, MCPU, /* FeaturesStr */ ""));		TheTarget->createMCSubtargetInfo(TripleName, MCPU, /* FeaturesStr */ ""));
if (!STI->isCPUStringValid(MCPU))		if (!STI->isCPUStringValid(MCPU))
return 1;		return 1;

Show All 53 Lines	int main(int argc, char **argv) {

const MCSchedModel &SM = STI->getSchedModel();		const MCSchedModel &SM = STI->getSchedModel();

unsigned Width = SM.IssueWidth;		unsigned Width = SM.IssueWidth;
if (DispatchWidth)		if (DispatchWidth)
Width = DispatchWidth;		Width = DispatchWidth;

// Create an instruction builder.		// Create an instruction builder.
mca::InstrBuilder IB(*STI, *MCII);		mca::InstrBuilder IB(*STI, *MCII, *MRI, *MCIA);

// Number each region in the sequence.		// Number each region in the sequence.
unsigned RegionIdx = 0;		unsigned RegionIdx = 0;
for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {		for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {
// Skip empty code regions.		// Skip empty code regions.
if (Region->empty())		if (Region->empty())
continue;		continue;

Show All 69 Lines