This is an archive of the discontinued LLVM Phabricator instance.


Differential D88180

[RFC] StableHashTree Implementation.
Needs Review · Public

Authored by plotfi on Sep 23 2020, 1:20 PM.


Details

Reviewers

lanza
paquette
thegameg
kyulee
manmanren

Summary

This is a first pass at bringing in some work that has been done on helping the Machine Outliner make cross-module outlining decisions. A lot of this work is inspired by or directly refactored from the Global Machine Outliner for ThinLTO talk from EuroLLVM 2020 (https://llvm.org/devmtg/2020-04/talks.html#TechTalk_58).

This diff, however, involves no LTO. It adds a way to serialize a representation of the MachineOutliner suffix tree to disk as a HashTree. Serialized HashTrees can be read back in and used to make better outlining decisions for modules where a Candidate sequence occurs only once but has duplicate Candidates in other modules. This is a first step, and I anticipate it will evolve a lot from its current form.

For now I have a single test case to showcase the mechanics, but am working on more test cases.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
540 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp

Event Timeline

plotfi created this revision. Sep 23 2020, 1:20 PM

Herald added a project: Restricted Project. Sep 23 2020, 1:20 PM

Herald added subscribers: llvm-commits, dexonsmith, hiraditya, mgorny.

plotfi requested review of this revision. Sep 23 2020, 1:20 PM

@thegameg: @paquette tells me you might have some ideas on a better format than this JSON business going on here (based on work you've done on remarks). What do you think?

Harbormaster completed remote builds in B72717: Diff 293840. Sep 23 2020, 1:58 PM

plotfi updated this revision to Diff 294195. Sep 24 2020, 4:34 PM

Harbormaster completed remote builds in B72891: Diff 294195. Sep 24 2020, 5:27 PM

Cleaning up patch to be easier to understand.

spelling and grammar

Harbormaster completed remote builds in B72899: Diff 294211. Sep 24 2020, 6:36 PM

Harbormaster completed remote builds in B72901: Diff 294212. Sep 24 2020, 6:48 PM

clang-tidy

comments added

Harbormaster completed remote builds in B72907: Diff 294228. Sep 24 2020, 10:40 PM

I think it would make sense to put the StableHashTree implementation in its own patch. Then, in a follow up, you can plumb through the outliner support.

The data structure itself needs tests outside of the outliner, and I think it references the outliner itself a little too much in the comments.

I also feel like the outliner shouldn't be responsible for producing HashTree. That seems like a different thing which may have its own cost model and considerations. It might make sense to adapt the IRSimilarity framework to MIR and use that for the purposes of producing the tree.

Having the outliner consume the tree seems fine to me though.

llvm/include/llvm/CodeGen/MachineOutliner.h
185 ↗	(On Diff #294228)	Variable name in comment should match the actual variable name. (Included existing typo for clarity)
188 ↗	(On Diff #294228)	Should say "Matches" or "Match", not "Matche". Maybe a more succinct name?
190 ↗	(On Diff #294228)	Typo
194 ↗	(On Diff #294228)	If you use an `Optional`, you can differentiate between "it's just empty" and "it's not actually being used" in the type itself. Also, would it make sense to use a `SmallVector` here? From http://llvm.org/docs/ProgrammersManual.html#vector: "However, SmallVector<T, 0> is often a better option due to the advantages listed [in the SmallVector section]. std::vector is still useful when you need to store more than UINT32_MAX elements or when interfacing with code that expects vectors."
llvm/include/llvm/CodeGen/MachineStableHash.h
1	Might be worth fixing the filename here in a NFC commit?
26	remove unnecessary whitespace change
34	Why not both const?
34	Probably should have a documentation comment?
llvm/include/llvm/CodeGen/StableHashTree.h
12	I think that this comment can describe what this actually does in a bit more detail. If this is intended to be a reusable data structure (as the comment implies), I think it'd make sense to address the following questions: (1) What does the stable hash tree actually do? (2) Why would you use it in a transformation? Also, "Global Machine Outlining" isn't defined anywhere. In the patch description you have: "A lot of this work is inspired by or directly refactored from the Global Machine Outliner for ThinLTO talk from EuroLLVM 2020 (https://llvm.org/devmtg/2020-04/talks.html#TechTalk_58)." So it'd be nice to include that somewhere, so people curious about that can take a look.
29	Should be a Doxygen comment. Try to use class names to make things clear.
34	Move member variable documentation into the struct.
35	Move member variable documentation into the struct.
37	If you use a more meaningful name than "Data", it shouldn't be necessary to document this?
38	Could this just be a function that checks if the map is empty?
39	Would it be appropriate to use an `IndexedMap` here? From http://llvm.org/docs/ProgrammersManual.html#llvm-adt-indexedmap-h: "IndexedMap is a specialized container for mapping small dense integers (or values that can be mapped to small dense integers) to some other type. It is internally implemented as a vector with a mapping function that maps the keys to the dense integer range." I suppose it depends on if `stable_hash` tends to be dense, by whatever measure of dense is being used here.
42	Match the name of the file?
46	I think that you can drop the part about `walkEdges`? General documentation for the data structure and how it works would be better in the `\file` comment at the top.
50	These type names are pretty long. Maybe it'd be good to reduce the cognitive overload by doing something like this somewhere: `/// Graph traversal callback types.` `///{` `using EdgeCallbackFn = std::function<void(const HashNode *, const HashNode *)>;` `using NodeCallbackFn = std::function<void(const HashNode *)>;` `///}`
51	Although vertex and node are interchangeable terms, I think it'd be good to be consistent and just choose one?
57	Should be Doxygen
60	Documentation comments don't need to include implementation info; that can go out of date.
61	Use Doxygen stuff
71	No need to mention where this is called if you document the algorithm somewhere.
75	Documentation should just say what the function does, not include implementation details. Should be a Doxygen comment.
80	Probably good to not mention outlining here if this is supposed to be general-purpose?
90	Needs documentation.
llvm/lib/CodeGen/MachineOutliner.cpp
114 ↗	(On Diff #294228)	This is a long sentence. Split it up?
372 ↗	(On Diff #294228)	Capitalization?
596 ↗	(On Diff #294228)	Can this go in a function?
611 ↗	(On Diff #294228)	Do you have to use `llvm::` here?
629 ↗	(On Diff #294228)
668 ↗	(On Diff #294228)	I'm not a fan of messing with the found candidates or cost model to make this work. If you wanted to handle candidates that appear once across many modules, I think it would be best to pre-populate a hash tree with known beneficial candidates versus trying to guess/mess with stuff during outlining? Since the tree is serialized to JSON, it should be possible to pre-populate it without using the outliner... Maybe it'd make sense to adapt the IR similarity framework for this portion somehow versus putting all of this in the outliner? It seems like generating the hash tree is really outside the scope of the pass. Consuming and using it seems okay though.
848 ↗	(On Diff #294228)	remove braces
946 ↗	(On Diff #294228)	remove braces
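
A small standalone sketch of the `Optional` point raised on MachineOutliner.h line 194 above: wrapping the container in an optional lets the type itself distinguish "feature not in use" from "in use, but nothing matched". This is only an illustration. `std::optional` stands in for `llvm::Optional`, and `MatchState` and `classify` are hypothetical names that do not appear in the patch.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical three-way state that a plain vector cannot express on its own.
enum class MatchState { NotInUse, InUseButEmpty, HasMatches };

// An unset optional means the feature is off; an engaged optional holding an
// empty vector means the feature is on but found nothing.
MatchState classify(const std::optional<std::vector<std::uint64_t>> &Matches) {
  if (!Matches)
    return MatchState::NotInUse;
  return Matches->empty() ? MatchState::InUseButEmpty : MatchState::HasMatches;
}
```

With a bare `std::vector`, the first two states would both look like an empty container, so callers would need a separate flag.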

Updated based on @paquette's feedback. This only includes the StableHashTree data structure and a unit test.

plotfi retitled this revision from "[RFC] HashTree and MachineOutliner HashTree Serialization for cross module data sharing." to "[RFC] StableHashTree Implementation.". Oct 27 2020, 1:17 PM

plotfi marked 20 inline comments as done. Oct 27 2020, 1:25 PM

plotfi added inline comments.

llvm/include/llvm/CodeGen/StableHashTree.h
38	I thought this myself, but you need an IsTerminal flag because you could have some sequence you want to hash like: ORRWri ORRWri ORRWri RET, as well as: ORRWri ORRWri ORRWri. You need the terminal flag to know that even though a node has successors, that node can be the terminal node of a sequence that was added.
39	Can I add this as a post commit NFC commit? I am unsure on the tradeoffs here at the moment.
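
To make the `IsTerminal` argument above concrete, here is a minimal standalone sketch of a trie over hash values in which one inserted sequence is a prefix of another. It is not the patch's implementation: `Hash` is a stand-in for `llvm::stable_hash`, and `Node`, `insertSequence`, and `contains` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llvm::stable_hash.
using Hash = std::uint64_t;

// Simplified node with the same shape as the HashNode discussed in the review.
struct Node {
  Hash Value = 0;
  bool IsTerminal = false;
  std::unordered_map<Hash, std::unique_ptr<Node>> Successors;
};

// Inserts Seq below Root and marks only the final node as terminal.
void insertSequence(Node &Root, const std::vector<Hash> &Seq) {
  Node *Current = &Root;
  for (Hash H : Seq) {
    std::unique_ptr<Node> &Next = Current->Successors[H];
    if (!Next) {
      Next = std::make_unique<Node>();
      Next->Value = H;
    }
    Current = Next.get();
  }
  if (Current != &Root)
    Current->IsTerminal = true;
}

// Returns true only if Seq was inserted as a complete sequence, not merely
// as a prefix of a longer one.
bool contains(const Node &Root, const std::vector<Hash> &Seq) {
  const Node *Current = &Root;
  for (Hash H : Seq) {
    auto It = Current->Successors.find(H);
    if (It == Current->Successors.end())
      return false;
    Current = It->second.get();
  }
  return Current != &Root && Current->IsTerminal;
}
```

If only the four-element sequence is inserted, the third node has a successor and is not terminal, so checking for emptiness of `Successors` alone could not tell whether the three-element sequence had been added as well.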

Harbormaster completed remote builds in B76624: Diff 301092. Oct 27 2020, 2:02 PM

plotfi marked an inline comment as done. Oct 30 2020, 11:20 AM

plotfi marked an inline comment as not done. Nov 2 2020, 9:58 AM

If this data structure is a trie, is there any reason you can't just improve the existing SuffixTree to do all of this?

A suffix tree is just a compressed trie.

llvm/include/llvm/CodeGen/StableHashTree.h
58	Should `IsTerminal` ever be modified outside of `Insert` and `readFromBuffer`? Would it make sense to have it be a private member with a getter?
101	Should this be in StableHashTree.cpp?
123	Needs documentation? What happens if something inserted was already in the tree?
130	Seems like this may be a bit nicer as an Optional which returns where the thing was found? Optional<HashNode *> find(const StableHashSequence &Sequence) const; That way you can also access the specific node if you need it. Or, even better, if you had an iterator type for this data structure you could do something like this: iterator find(const StableHashSequence &Sequence) const; I feel like the iterator idea is probably better, since it's more consistent with the rest of the LLVM data types. You could even have an `edge_iterator` and a `vertex_iterator` for edge/vertex walks.
136	Should these functions be in StableHashTree.cpp?
144	Missing comment?
llvm/lib/CodeGen/StableHashTree.cpp
68	Typo
117	Should be consistent with how JSON is capitalized in things that a person might see on the command line.
181	Nit: It's a bit nicer to read if you put the simpler situations as the one you continue from. This lessens the indentation level of the more complicated case: if (I != Current->Successors.end()) { Current = I->second.get(); continue; } // Didn't find the hash in the current node's successors. Create a new one. std::unique_ptr<HashNode> Next = std::make_unique<HashNode>(); // ...
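
The early-`continue` shape suggested for StableHashTree.cpp line 181 above can be sketched in a standalone form as follows. This is an assumption-laden illustration rather than the patch's code: `Hash`, `TrieNode`, and `insertOnce` are hypothetical names, and returning `false` for a sequence that is already present is just one possible answer to the reviewer's question about duplicate inserts.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::stable_hash.
using Hash = std::uint64_t;

struct TrieNode {
  Hash Value = 0;
  bool IsTerminal = false;
  std::unordered_map<Hash, std::unique_ptr<TrieNode>> Successors;
};

// Inserts Seq; returns true if it was not already present as a full sequence.
bool insertOnce(TrieNode &Root, const std::vector<Hash> &Seq) {
  TrieNode *Current = &Root;
  for (Hash H : Seq) {
    auto I = Current->Successors.find(H);
    // Simple case first: the successor exists, so just follow it.
    if (I != Current->Successors.end()) {
      Current = I->second.get();
      continue;
    }
    // Didn't find the hash among the successors. Create a new node.
    auto Next = std::make_unique<TrieNode>();
    Next->Value = H;
    TrieNode *Raw = Next.get();
    Current->Successors.emplace(H, std::move(Next));
    Current = Raw;
  }
  // An empty sequence or an already-terminal node changes nothing.
  if (Current == &Root || Current->IsTerminal)
    return false;
  Current->IsTerminal = true;
  return true;
}
```

Handling the "found" case first keeps the node-creation path at a single indentation level, which is the readability benefit the nit is after.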

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/MachineStableHash.h	10 lines
llvm/include/llvm/CodeGen/StableHashTree.h	176 lines
llvm/lib/CodeGen/CMakeLists.txt	1 line
llvm/lib/CodeGen/MachineStableHash.cpp	14 lines
llvm/lib/CodeGen/StableHashTree.cpp	205 lines
llvm/unittests/CodeGen/CMakeLists.txt	1 line
llvm/unittests/CodeGen/StableHashTreeTest.cpp	112 lines

Diff 301092

llvm/include/llvm/CodeGen/MachineStableHash.h

//===------------ MachineStableHash.h - MIR Stable Hashing Utilities ------===//

paquette (Unsubmitted)

Done

Might be worth fixing the filename here in a NFC commit?


// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

// Stable hashing for MachineInstr and MachineOperand. Useful for getting a

// hash across runs, modules, etc.

//===----------------------------------------------------------------------===//

#ifndef LLVM_CODEGEN_MACHINESTABLEHASH_H

#define LLVM_CODEGEN_MACHINESTABLEHASH_H

#include "llvm/CodeGen/MachineBasicBlock.h"

#include "llvm/CodeGen/StableHashing.h"

namespace llvm {

class MachineInstr;

class MachineOperand;

/// \returns a stable_hash for MachineOperand \p MO

stable_hash stableHashValue(const MachineOperand &MO);

paquette (Unsubmitted)

Done

remove unnecessary whitespace change


/// \returns a stable_hash of all the hashes of the opcode and MachineOperands

/// of MachineInstr \p MI.

stable_hash stableHashValue(const MachineInstr &MI, bool HashVRegs = false,

bool HashConstantPoolIndices = false,

bool HashMemOperands = false);

/// \returns a collection of stable_hashes for each of the MachineInstrs of

/// a given MachineBasicBlock from iterator \p Begin to \p End.

paquette (Unsubmitted)

Done

std::vector<stable_hash>

- stableHashMachineInstrs(MachineBasicBlock::iterator &Begin,

+ stableHashMachineInstrs(const MachineBasicBlock::iterator &Begin,

const MachineBasicBlock::iterator &End);

Why not both const?


paquette (Unsubmitted)

Done

Probably should have a documentation comment?


std::vector<stable_hash>

stableHashMachineInstrs(const MachineBasicBlock::iterator &Begin,

const MachineBasicBlock::iterator &End);

} // namespace llvm

#endif

llvm/include/llvm/CodeGen/StableHashTree.h

This file was added.

//===-- StableHashTree.h ----------------------------------------*- C++ -*-===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

///

/// \file

/// Contains a stable hash tree implementation based on llvm::stable_hash.

/// A StableHashTree is a Trie that contains sequences of hash values. This

/// data structure is generic, and is intended for use cases where something

paquette (Unsubmitted)

Done

I think that this comment can describe what this actually does in a bit more detail. If this is intended to be a reusable data structure (as the comment implies), I think it'd make sense to address the following questions:

What does the stable hash tree actually do?
Why would you use it in a transformation?

Also "Global Machine Outlining" isn't defined anywhere.

In the patch description you have:

A lot of this work is inspired by or directly refactored from the Global Machine Outliner for ThinLTO talk from EuroLLVM 2020 (https://llvm.org/devmtg/2020-04/talks.html#TechTalk_58).

So it'd be nice to include that somewhere, so people curious about that can take a look.


/// similar to a trie is already in use. The upside to using a StableHashTree is

/// that it can be used to understand data collected across modules, or it can

/// be used to serialize data about a build to disk for use in a future build.

///

/// To use a StableHashTree you must already have a way to take some sequence of

/// data and use llvm::stable_hash to turn that sequence into a

/// std::vector<llvm::stable_hash> (ie StableHashSequence). Each of these hash

/// sequences can be inserted into a StableHashTree where the beginning of a

/// unique sequence starts from the root of the tree and ends at a Terminal

/// (IsTerminal) node.

///

/// This StableHashTree was originally implemented as part of the EuroLLVM 2020

/// talk "Global Machine Outliner for ThinLTO":

///

/// https://llvm.org/devmtg/2020-04/talks.html#TechTalk_58

///

/// This talk covers how a global stable hash tree is used to collect

paquette (Unsubmitted)

Done

namespace llvm {

- // A node in the hash tree might be terminal, i.e. it represents the end

+ // A HashNode might be a terminal, i.e. it represents the end

// of an stable instruction hash sequence that was outlined in some module.

Should be a Doxygen comment
Try to use class names to make things clear


/// information about valid MachineOutliner Candidates across modules, and used

/// to inform modules where matching candidates are encountered but occur in

/// less frequency and as a result are ignored by the MachineOutliner had there

/// not been a global stable hash tree in use (assuming FullLTO is disabled).

///

paquette (Unsubmitted)

Done

// different stable instruction hashes.

- // Data is the Hash for the current node

// IsTerminal is true if this node is the last node in a hash sequence

Move member variable documentation into the struct.


//===----------------------------------------------------------------------===//

paquette (Unsubmitted)

Done

// Data is the Hash for the current node

- // IsTerminal is true if this node is the last node in a hash sequence

struct HashNode {

Move member variable documentation into the struct.


#ifndef LLVM_CODEGEN_STABLEHASHTREE_H

paquette (Unsubmitted)

Done

struct HashNode {

- stable_hash Data = 0LL;

+ stable_hash Hash = 0LL;

bool IsTerminal{false};

If you use a more meaningful name than "Data", it shouldn't be necessary to document this?


#define LLVM_CODEGEN_STABLEHASHTREE_H

paquette (Unsubmitted)

Done

stable_hash Data = 0LL;

- bool IsTerminal{false};

+ /// \returns true if this HashNode is the last in a hash sequence.

+ bool isTerminal() { return Successors.empty(); }

std::unordered_map<stable_hash, std::unique_ptr<HashNode>> Successors;

Could this just be a function that checks if the map is empty?


plotfi (Author, Unsubmitted)

Done

I thought this myself, but you need an IsTerminal flag because you could have some sequence you want to hash like:

ORRWri
ORRWri
ORRWri
RET

as well as

ORRWri
ORRWri
ORRWri

You need the terminal flag to know that even though a node has successors, that node can be the terminal node of a sequence that was added.


paquette (Unsubmitted)

Not Done

bool IsTerminal{false};

- std::unordered_map<stable_hash, std::unique_ptr<HashNode>> Successors;

+ IndexedMap<stable_hash, std::unique_ptr<HashNode>> Successors;

};

class HashTree {

Would it be appropriate to use an IndexedMap here?

http://llvm.org/docs/ProgrammersManual.html#llvm-adt-indexedmap-h

IndexedMap is a specialized container for mapping small dense integers (or values that can be mapped to small dense integers) to some other type. It is internally implemented as a vector with a mapping function that maps the keys to the dense integer range.

I suppose it depends on if stable_hash tends to be dense, by whatever measure of dense is being used here.


plotfi (Author, Unsubmitted)

Not Done

Can I add this as a post commit NFC commit? I am unsure on the tradeoffs here at the moment.


#include <algorithm>

#include <memory>

#include <unordered_map>

paquette (Unsubmitted)

Done

std::unordered_map<stable_hash, std::unique_ptr<HashNode>> Successors;

};

- class HashTree {

+ class StableHashTree {

public:

Match the name of the file?


#include <vector>

#include "llvm/CodeGen/StableHashing.h"

#include "llvm/Support/Error.h"

paquette (Unsubmitted)

Done

I think that you can drop the part about walkEdges?

General documentation for the data structure and how it works would be better in the \file comment at the top.


#include "llvm/Support/VirtualFileSystem.h"

#include "llvm/Support/raw_ostream.h"

namespace llvm {

paquette (Unsubmitted)

Done

void walkGraph(

- std::function<void(const HashNode *, const HashNode *)> CallbackEdge,

+ EdgeCallbackFn CallbackEdge,

std::function<void(const HashNode *)> CallbackVertex) const;

These type names are pretty long.

Maybe it'd be good to reduce the cognitive overload by doing something like this somewhere:

/// Graph traversal callback types.
///{
using EdgeCallbackFn = std::function<void(const HashNode *, const HashNode *)>;
using NodeCallbackFn = std::function<void(const HashNode *)>;
///}


paquette (Unsubmitted)

Done

std::function<void(const HashNode *, const HashNode *)> CallbackEdge,

- std::function<void(const HashNode *)> CallbackVertex) const;

+ std::function<void(const HashNode *)> CallbackNode) const;

/// Walks the edges of a HashTree using walkGraph.

Although vertex and node are interchangeable terms, I think it'd be good to be consistent and just choose one?


/// \brief A HashNode is an entry in a StableHashTree that contains a value Hash

/// as well as a collection of Successors (which are other HashNodes that are

/// part of a sequence of llvm::stable_hashes). A HashNode might

/// be IsTerminal meaning that it represents the end of a stable_hash sequence.

struct HashNode {

stable_hash Hash = 0LL;

paquette (Unsubmitted)

Done

Should be Doxygen


bool IsTerminal{false};

paquette (Unsubmitted)

Not Done

Should IsTerminal ever be modified outside of Insert and readFromBuffer?

Would it make sense to have it be a private member with a getter?


std::unordered_map<stable_hash, std::unique_ptr<HashNode>> Successors;

};

paquette (Unsubmitted)

Done

void walkVertices(std::function<void(const HashNode *)> Callback) const;

- /// Uses HashTree::walkEdges to print the edges of the hash tree.

+ /// Print the edges of the HashTree.

/// If a DebugMap is provided, then it will be used to provide richer output.

Documentation comments don't need to include implementation info; that can go out of date.


paquette (Unsubmitted)

Done

/// Uses HashTree::walkEdges to print the edges of the hash tree.

- /// If a DebugMap is provided, then it will be used to provide richer output.

+ /// If a \p DebugMap is provided, then it will be used to provide richer output.

void dump(raw_ostream &OS = llvm::errs(),

Use Doxygen stuff


struct StableHashTree {

/// Graph traversal callback types.

///{

using EdgeCallbackFn =

std::function<void(const HashNode *, const HashNode *)>;

using NodeCallbackFn = std::function<void(const HashNode *)>;

///}

using StableHashSequence = std::vector<stable_hash>;

paquette (Unsubmitted)

Done

llvm::Error writeHashTreeToFile(StringRef Filename) const;

- /// When building a hash tree, insert sequences of stable instruction hashes.

+ /// Insert \p StableHashSequences into the HashTree.

void insertIntoHashTree(

No need to mention where this is called if you document the algorithm somewhere.


/// Walks every edge and node in the StableHashTree and calls CallbackEdge

/// for the edges and CallbackNode for the nodes with the stable_hash for

/// the source and the stable_hash of the sink for an edge. These generic

paquette (Unsubmitted)

Done

const std::vector<std::vector<stable_hash>> &StableHashSequences);

- // When using a hash tree, starting from the root, check whether a sequence

+ /// Checks if \p StableHashSequence is in the HashTree.

+ ///

+ /// \returns true when \p StableHashSequence is in the HashTree, and false

+ /// otherwise.

// of stable instruction hashes ends up at a terminal node.

- Documentation should just say what the function does, not include implementation details.

Doxygen comment.


/// callbacks can be used to traverse a StableHashTree for the purpose of

/// print debugging or serializing it.

void walkGraph(EdgeCallbackFn CallbackEdge,

NodeCallbackFn CallbackNode) const;

paquette (Unsubmitted)

Done

Probably good to not mention outlining here if this is supposed to be general-purpose?


/// Walks the nodes of a StableHashTree using walkGraph.

void walkVertices(NodeCallbackFn Callback) const {

walkGraph([](const HashNode *A, const HashNode *B) {}, Callback);

}

/// Uses walkVertices to print a StableHashTree.

/// If a \p DebugMap is provided, then it will be used to provide richer

/// output.

void print(raw_ostream &OS = llvm::errs(),

std::unordered_map<stable_hash, std::string> DebugMap = {}) const;

paquette (Unsubmitted)

Done

Needs documentation.


void dump() const { print(llvm::errs()); }

/// Builds a StableHashTree from a \p Buffer.

/// The serialization format here should be considered opaque, and may change.

/// \returns llvm::Error::ErrorSuccess if successful, otherwise returns some

/// other llvm::Error error kind.

llvm::Error readFromBuffer(StringRef Buffer);

/// Deserializes a StableHashTree from a file at \p Filename.

llvm::Error readFromFile(StringRef Filename) {

paquette (Unsubmitted)

Not Done

Should this be in StableHashTree.cpp?


llvm::SmallString<256> Filepath(Filename);

auto FileOrError =

llvm::vfs::getRealFileSystem()->getBufferForFile(Filepath);

if (!FileOrError)

return llvm::errorCodeToError(FileOrError.getError());

return readFromBuffer(FileOrError.get()->getBuffer());

}

/// Serializes a StableHashTree to a file at \p Filename.

llvm::Error writeToFile(StringRef Filename) const {

std::error_code EC;

llvm::raw_fd_ostream OS(Filename, EC, llvm::sys::fs::OF_Text);

if (EC)

return llvm::createStringError(EC, "Unable to open StableHashTree Data");

// Note: For now the format is the same as the print output, but this can

// change.

print(OS);

OS.flush();

return llvm::Error::success();

}

void insert(const std::vector<StableHashSequence> &Sequences) {

paquette (Unsubmitted)

Not Done

Needs documentation?

What happens if something inserted was already in the tree?


for (const auto &Sequence : Sequences)

insert(Sequence);

}

/// \returns true if \p Sequence exists in a StableHashTree, false

/// otherwise.

bool find(const StableHashSequence &Sequence) const;

paquette (Unsubmitted)

Not Done

Seems like this may be a bit nicer as an Optional which returns where the thing was found?

Optional<HashNode *> find(const StableHashSequence &Sequence) const;

That way you can also access the specific node if you need it.

Or, even better, if you had an iterator type for this data structure you could do something like this:

iterator find(const StableHashSequence &Sequence) const;

I feel like the iterator idea is probably better, since it's more consistent with the rest of the LLVM data types. You could even have an edge_iterator and a vertex_iterator for edge/vertex walks.
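
As a rough illustration of the first alternative suggested above (a lookup that returns the node rather than a `bool`), here is a standalone sketch. It makes several assumptions: `std::optional` stands in for `llvm::Optional`, and `Hash`, `PathNode`, `addChild`, and `findNode` are hypothetical names that are not part of the patch. The iterator-based variant is not shown.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::stable_hash.
using Hash = std::uint64_t;

struct PathNode {
  Hash Value = 0;
  bool IsTerminal = false;
  std::unordered_map<Hash, std::unique_ptr<PathNode>> Successors;
};

// Hypothetical helper used only to build a small tree by hand.
PathNode *addChild(PathNode &Parent, Hash H, bool Terminal) {
  auto Child = std::make_unique<PathNode>();
  Child->Value = H;
  Child->IsTerminal = Terminal;
  PathNode *Raw = Child.get();
  Parent.Successors[H] = std::move(Child);
  return Raw;
}

// Returns the terminal node reached by Seq, or std::nullopt if Seq was never
// inserted as a complete sequence.
std::optional<const PathNode *> findNode(const PathNode &Root,
                                         const std::vector<Hash> &Seq) {
  const PathNode *Current = &Root;
  for (Hash H : Seq) {
    auto It = Current->Successors.find(H);
    if (It == Current->Successors.end())
      return std::nullopt;
    Current = It->second.get();
  }
  if (!Current->IsTerminal)
    return std::nullopt;
  return Current;
}
```

A caller that only needs the yes/no answer can still use `findNode(...).has_value()`, while a caller that wants to keep walking from the match has the node available.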


/// \returns the size of a StableHashTree by traversing it. If

/// \p GetTerminalCountOnly is true, it only counts the terminal nodes

/// (meaning it returns the size of the number of hash sequences in a

/// StableHashTree).

size_t size(bool GetTerminalCountOnly = false) const {

paquette (Unsubmitted)

Not Done

Should these functions be in StableHashTree.cpp?


size_t Size = 0;

walkVertices([&Size, GetTerminalCountOnly](const HashNode *N) {

Size += (N && (!GetTerminalCountOnly || N->IsTerminal));

});

return Size;

}

size_t depth() const {

paquette (Unsubmitted)

Not Done

Missing comment?


size_t Size = 0;

std::unordered_map<const HashNode *, size_t> DepthMap;

walkGraph(

[&DepthMap](const HashNode *Src, const HashNode *Dst) {

size_t Depth = DepthMap[Src];

DepthMap[Dst] = Depth + 1;

},

[&Size, &DepthMap](const HashNode *N) {

Size = std::max(Size, DepthMap[N]);

});

return Size;

}
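
The way `depth()` above derives node depths from the edge callback can be shown in a standalone sketch. It relies on the traversal invoking the edge callback for a parent before the node callback for its child, which the stack-based walk below does. `GNode`, `EdgeFn`, `NodeFn`, `walk`, and `treeDepth` are hypothetical names for this illustration, not the patch's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <stack>
#include <unordered_map>
#include <vector>

// Hypothetical minimal tree node.
struct GNode {
  std::vector<std::unique_ptr<GNode>> Children;
};

using EdgeFn = std::function<void(const GNode *, const GNode *)>;
using NodeFn = std::function<void(const GNode *)>;

// Iterative walk: visit a node, then report and push each outgoing edge.
void walk(const GNode &Root, const EdgeFn &OnEdge, const NodeFn &OnNode) {
  std::stack<const GNode *> Stack;
  Stack.push(&Root);
  while (!Stack.empty()) {
    const GNode *Current = Stack.top();
    Stack.pop();
    OnNode(Current);
    for (const auto &Child : Current->Children) {
      OnEdge(Current, Child.get());
      Stack.push(Child.get());
    }
  }
}

// Depth of the deepest node, with the root at depth 0.
std::size_t treeDepth(const GNode &Root) {
  std::size_t Max = 0;
  std::unordered_map<const GNode *, std::size_t> DepthMap;
  walk(
      Root,
      [&DepthMap](const GNode *Src, const GNode *Dst) {
        std::size_t Depth = DepthMap[Src]; // Root defaults to depth 0.
        DepthMap[Dst] = Depth + 1;
      },
      [&Max, &DepthMap](const GNode *N) { Max = std::max(Max, DepthMap[N]); });
  return Max;
}
```

Every child's depth is recorded when its parent is popped, so by the time the child itself is visited its entry in `DepthMap` is already set.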

private:

/// StableHashTree is a compact representation of a set of stable_hash

/// sequences. It allows for efficient walking of these sequences for

/// matching purposes. HashTreeImpl is the root node of this tree. Its Hash

/// value is 0, and its Successors are the beginning of StableHashSequences

/// inserted into the StableHashTree.

HashNode HashTreeImpl;

/// Inserts a \p Sequence into a StableHashTree. The last node in the sequence

/// will set IsTerminal to true in StableHashTree.

void insert(const StableHashSequence &Sequence);

};

} // namespace llvm

#endif

llvm/lib/CodeGen/CMakeLists.txt

add_llvm_component_library(LLVMCodeGen
MachineLoopInfo.cpp
MachineLoopUtils.cpp
MachineModuleInfo.cpp
MachineModuleInfoImpls.cpp
MachineOperand.cpp
MachineOptimizationRemarkEmitter.cpp
MachineOutliner.cpp
MachinePassManager.cpp
+ StableHashTree.cpp
MachinePipeliner.cpp
MachinePostDominators.cpp
MachineRegionInfo.cpp
MachineRegisterInfo.cpp
MachineScheduler.cpp
MachineSink.cpp
MachineSizeOpts.cpp
MachineSSAUpdater.cpp

llvm/lib/CodeGen/MachineStableHash.cpp

for (const auto *Op : MI.memoperands()) {
HashComponents.push_back(static_cast<unsigned>(Op->getSyncScopeID()));
HashComponents.push_back(static_cast<unsigned>(Op->getBaseAlign().value()));
HashComponents.push_back(static_cast<unsigned>(Op->getFailureOrdering()));
}

return stable_hash_combine_range(HashComponents.begin(),
HashComponents.end());
}

+ std::vector<stable_hash>
+ llvm::stableHashMachineInstrs(const MachineBasicBlock::iterator &Begin,
+                               const MachineBasicBlock::iterator &End) {
+   std::vector<stable_hash> Sequence;
+   for (auto I = Begin; I != End; I++) {
+     const MachineInstr &MI = *I;
+     stable_hash Hash = stableHashValue(MI);
+     if (!Hash)
+       return {};
+     Sequence.push_back(Hash);
+   }
+   return Sequence;
+ }
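
The function above is all-or-nothing: if any instruction in the range hashes to zero, the whole sequence is discarded and an empty vector is returned. A standalone sketch of that pattern follows, without any LLVM types. `Hash`, `hashAll`, and the caller-supplied `HashOne` callback are hypothetical, and treating zero as "no usable hash" is taken from the code shown here rather than from any documented contract.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::stable_hash.
using Hash = std::uint64_t;

// Hashes every item with HashOne. If any item hashes to zero, the partial
// result is dropped and an empty sequence is returned instead.
std::vector<Hash>
hashAll(const std::vector<std::string> &Items,
        const std::function<Hash(const std::string &)> &HashOne) {
  std::vector<Hash> Sequence;
  Sequence.reserve(Items.size());
  for (const std::string &Item : Items) {
    Hash H = HashOne(Item);
    if (!H)
      return {};
    Sequence.push_back(H);
  }
  return Sequence;
}
```

One consequence of this design is that a caller cannot distinguish an empty input range from a range that contained an unhashable item, since both yield an empty vector.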

llvm/lib/CodeGen/StableHashTree.cpp

This file was added.

//===---- StableHashTree.cpp ------------------------------------*- C++ -*-===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

///

//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/StableHashTree.h"

#include "llvm/ADT/STLExtras.h"

#include "llvm/ADT/Statistic.h"

#include "llvm/CodeGen/MachineOperand.h"

#include "llvm/CodeGen/StableHashing.h"

#include "llvm/Support/Debug.h"

#include "llvm/Support/Error.h"

#include "llvm/Support/ErrorHandling.h"

#include "llvm/Support/JSON.h"

#include "llvm/Support/MemoryBuffer.h"

#include "llvm/Support/raw_ostream.h"

#include <cstdlib>

#include <functional>

#include <iterator>

#include <set>

#include <ios>

#include <stack>

#include <string>

#include <system_error>

#include <unordered_map>

#include <utility>

#include <vector>

#define DEBUG_TYPE "stable-hash-tree"

using namespace llvm;

namespace llvm {

void StableHashTree::walkGraph(EdgeCallbackFn CallbackEdge,

NodeCallbackFn CallbackNode) const {

std::stack<const HashNode *> Stack;

Stack.push(&HashTreeImpl);

while (!Stack.empty()) {

const auto *Current = Stack.top();

Stack.pop();

CallbackNode(Current);

for (const auto &P : Current->Successors) {

CallbackEdge(Current, P.second.get());

Stack.push(P.second.get());

}

}

}

void StableHashTree::print(

llvm::raw_ostream &OS,

std::unordered_map<stable_hash, std::string> DebugMap) const {

std::unordered_map<const HashNode *, unsigned> NodeMap;

walkVertices([&NodeMap](const HashNode *Current) {

size_t Index = NodeMap.size();

NodeMap[Current] = Index;

assert(Index = NodeMap.size() + 1 &&

"Expected size of ModeMap to increment by 1");

paquette (inline comment, Unsubmitted, Not Done): Typo. Suggested change:

- "Expected size of ModeMap to increment by 1");

+ "Expected size of NodeMap to increment by 1");

});

bool IsFirstEntry = true;

OS << "{";

for (const auto &Entry : NodeMap) {

if (!IsFirstEntry)

OS << ",";

OS << "\n";

IsFirstEntry = false;

OS << " \"" << Entry.second << "\" : {\n";

OS << " \"hash\" : \"";

OS.raw_ostream::write_hex(Entry.first->Hash);

OS << "\",\n";

OS << " \"isTerminal\" : "

<< "\"" << (Entry.first->IsTerminal ? "true" : "false") << "\",\n";

// For debugging we want to provide a string representation of the hashing

// source, such as a MachineInstr dump, etc. Not intended for production.

auto MII = DebugMap.find(Entry.first->Hash);

if (MII != DebugMap.end())

OS << " \"source\" : \"" << MII->second << "\",\n";

OS << " \"neighbors\" : [";

bool IsFirst = true;

for (const auto &Adj : Entry.first->Successors) {

if (!IsFirst)

OS << ",";

IsFirst = false;

OS << " \"";

OS << NodeMap[Adj.second.get()];

OS << "\" ";

}

OS << "]\n }";

}

OS << "\n}\n";

OS.flush();

}
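
The numbering lambda at the top of `print` assigns each visited node the next free index. A minimal sketch of that step follows, assuming the intent of the assertion is that every visited node is new to the map. Note that the assertion in the diff uses `=` where a comparison seems intended; because `&&` binds tighter than `=`, that expression assigns a non-zero value to `Index` and so cannot fail. The sketch uses `==` and checks that the map grew to `Index + 1`. The helper name `assignNextIndex` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: give K the next free index (0, 1, 2, ...) and check
// that the map really grew by one entry, i.e. K was not already present.
template <typename Key>
std::size_t assignNextIndex(std::unordered_map<Key, std::size_t> &Map,
                            const Key &K) {
  std::size_t Index = Map.size();
  Map[K] = Index;
  // After inserting a new key the size is one larger than the index we
  // just handed out.
  assert(Map.size() == Index + 1 && "Expected size of map to increment by 1");
  return Index;
}
```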

llvm::Error StableHashTree::readFromBuffer(StringRef Buffer) {

auto Json = llvm::json::parse(Buffer);

if (!Json)

return Json.takeError();

const json::Object *JO = Json.get().getAsObject();

if (!JO)

return llvm::createStringError(std::error_code(), "Bad Json");

paquette (inline comment, Unsubmitted, Not Done): This should be consistent with how JSON is capitalized in things a person might see on the command line. Suggested change:

- return llvm::createStringError(std::error_code(), "Bad Json");

+ return llvm::createStringError(std::error_code(), "Bad JSON");

std::unordered_map<unsigned, const llvm::json::Value *> JsonMap;

for (const auto &E : *JO)

JsonMap[std::stoul(E.first.str())] = &E.second;

assert(JsonMap.find(0x0) != JsonMap.end() && "Expected a root HashTree node");

// We have a JsonMap and a NodeMap. We walk the JSON form of the HashTree

// using the JsonMap and a stack of JSON IDs. As we walk, we use the

// IDs to get the current JSON node and the current HashNode.

std::unordered_map<unsigned, HashNode *> NodeMap;

std::stack<unsigned> Stack;

Stack.push(0);

NodeMap[0] = &HashTreeImpl;

while (!Stack.empty()) {

unsigned Current = Stack.top();

Stack.pop();

HashNode *CurrentSubtree = NodeMap[Current];

const auto *CurrentJson = JsonMap[Current]->getAsObject();

std::vector<unsigned> Neighbors;

llvm::transform(*CurrentJson->get("neighbors")->getAsArray(),

std::back_inserter(Neighbors),

[](const llvm::json::Value &S) {

return std::stoull(S.getAsString()->str());

});

stable_hash Hash = std::stoull(

CurrentJson->get("hash")->getAsString()->str(), nullptr, 16);

CurrentSubtree->Hash = Hash;

std::string IsTerminalStr =

StringRef(CurrentJson->get("isTerminal")->getAsString()->str()).lower();

CurrentSubtree->IsTerminal =

IsTerminalStr == "true" || IsTerminalStr == "on";

for (auto N : Neighbors) {

auto I = JsonMap.find(N);

if (I == JsonMap.end())

return llvm::createStringError(std::error_code(),

"Missing neighbor in JSON");

std::unique_ptr<HashNode> Neighbor = std::make_unique<HashNode>();

HashNode *NeighborPtr = Neighbor.get();

stable_hash StableHash = std::stoull(

I->second->getAsObject()->get("hash")->getAsString()->str(), nullptr,

16);

CurrentSubtree->Successors.emplace(StableHash, std::move(Neighbor));

NodeMap[N] = NeighborPtr;

Stack.push(I->first);

}

}

return llvm::Error::success();

}
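
`readFromBuffer` recovers each hash from its hexadecimal string form with `std::stoull(..., nullptr, 16)`, which reports malformed input by throwing. A minimal sketch of a non-throwing round trip is shown below, assuming C++17 and a 64-bit `stable_hash`. The helpers `hashToHex` and `hexToHash` are hypothetical names used only for illustration; they are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Stand-in for llvm::stable_hash (assumed here to be a 64-bit integer).
using stable_hash = std::uint64_t;

// Hypothetical helper: format a hash as lowercase hex without a prefix.
std::string hashToHex(stable_hash Hash) {
  char Buf[16]; // 64 bits need at most 16 hex digits.
  auto Res = std::to_chars(Buf, Buf + sizeof(Buf), Hash, 16);
  return std::string(Buf, Res.ptr);
}

// Hypothetical helper: parse a hex string back into a hash. Returns
// std::nullopt for empty input, trailing garbage, or overflow instead of
// throwing.
std::optional<stable_hash> hexToHash(std::string_view Text) {
  stable_hash Value = 0;
  const char *First = Text.data();
  const char *Last = First + Text.size();
  auto Res = std::from_chars(First, Last, Value, 16);
  if (Res.ec != std::errc() || Res.ptr != Last)
    return std::nullopt;
  return Value;
}
```

A reader built this way could turn a failed parse into an `llvm::Error`, in the same style as the existing "Missing neighbor in JSON" check, rather than relying on exceptions.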

void StableHashTree::insert(const StableHashSequence &Sequence) {

HashNode *Current = &HashTreeImpl;

for (stable_hash StableHash : Sequence) {

auto I = Current->Successors.find(StableHash);

if (I == Current->Successors.end()) {

paquette (inline comment, Unsubmitted, Not Done): Nit: it's a bit nicer to read if the simpler situation is the one you continue from. This lessens the indentation level of the more complicated case:

if (I != Current->Successors.end()) {
  Current = I->second.get();
  continue;
}

// Didn't find the hash in the current node's successors. Create a new one.
std::unique_ptr<HashNode> Next = std::make_unique<HashNode>();
// ...

std::unique_ptr<HashNode> Next = std::make_unique<HashNode>();

HashNode *NextPtr = Next.get();

NextPtr->Hash = StableHash;

Current->Successors.emplace(StableHash, std::move(Next));

Current = NextPtr;

continue;

}

Current = I->second.get();

}

Current->IsTerminal = true;

}

bool StableHashTree::find(const StableHashSequence &Sequence) const {

const HashNode *Current = &HashTreeImpl;

for (stable_hash StableHash : Sequence) {

const auto I = Current->Successors.find(StableHash);

if (I == Current->Successors.end())

return false;

Current = I->second.get();

}

return Current->IsTerminal;

}
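
Taken together, `insert` and `find` above implement a trie keyed on hash values: shared prefixes share nodes, and `IsTerminal` marks where an inserted sequence ends. The following standalone sketch illustrates that behaviour under stated assumptions. `Node` and `HashTrie` are hypothetical stand-ins for `HashNode` and `StableHashTree`, `insert` is written in the early-continue style suggested in the inline comment, and `size()` and `depth()` are defined here as "number of nodes including the root" and "longest root-to-leaf path in edges". Those two definitions are guesses chosen to reproduce the numbers the unit test expects; the header's actual definitions are not shown in this part of the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

using stable_hash = std::uint64_t; // stand-in for llvm::stable_hash

struct Node { // hypothetical stand-in for HashNode
  stable_hash Hash = 0;
  bool IsTerminal = false;
  std::unordered_map<stable_hash, std::unique_ptr<Node>> Successors;
};

class HashTrie { // hypothetical stand-in for StableHashTree
  Node Root;

  static std::size_t countNodes(const Node &N) {
    std::size_t Count = 1; // count N itself
    for (const auto &P : N.Successors)
      Count += countNodes(*P.second);
    return Count;
  }

  static std::size_t maxEdges(const Node &N) {
    std::size_t Deepest = 0;
    for (const auto &P : N.Successors)
      Deepest = std::max(Deepest, 1 + maxEdges(*P.second));
    return Deepest;
  }

public:
  void insert(const std::vector<stable_hash> &Sequence) {
    Node *Current = &Root;
    for (stable_hash H : Sequence) {
      auto I = Current->Successors.find(H);
      if (I != Current->Successors.end()) {
        Current = I->second.get(); // prefix already present, reuse it
        continue;
      }
      // Hash not found among the successors: create a new node.
      auto Next = std::make_unique<Node>();
      Next->Hash = H;
      Node *NextPtr = Next.get();
      Current->Successors.emplace(H, std::move(Next));
      Current = NextPtr;
    }
    Current->IsTerminal = true;
  }

  bool find(const std::vector<stable_hash> &Sequence) const {
    const Node *Current = &Root;
    for (stable_hash H : Sequence) {
      auto I = Current->Successors.find(H);
      if (I == Current->Successors.end())
        return false;
      Current = I->second.get();
    }
    return Current->IsTerminal; // a bare prefix does not count as a match
  }

  std::size_t size() const { return countNodes(Root); }
  std::size_t depth() const { return maxEdges(Root); }
};
```

Inserting the three sequences used by the unit test, written here as `{1, 2, 4}`, `{1, 3, 4}` and `{1, 3, 4, 5}`, yields seven nodes under these definitions: the root, `1`, `2`, `3`, one `4` under each of `2` and `3`, and `5`. The longest path has four edges.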

} // namespace llvm

llvm/unittests/CodeGen/CMakeLists.txt

Show All 17 Lines	add_llvm_unittest(CodeGenTests
AllocationOrderTest.cpp		AllocationOrderTest.cpp
AsmPrinterDwarfTest.cpp		AsmPrinterDwarfTest.cpp
DIEHashTest.cpp		DIEHashTest.cpp
DIETest.cpp		DIETest.cpp
LowLevelTypeTest.cpp		LowLevelTypeTest.cpp
LexicalScopesTest.cpp		LexicalScopesTest.cpp
MachineInstrBundleIteratorTest.cpp		MachineInstrBundleIteratorTest.cpp
MachineInstrTest.cpp		MachineInstrTest.cpp
		StableHashTreeTest.cpp
MachineOperandTest.cpp		MachineOperandTest.cpp
PassManagerTest.cpp		PassManagerTest.cpp
ScalableVectorMVTsTest.cpp		ScalableVectorMVTsTest.cpp
TypeTraitsTest.cpp		TypeTraitsTest.cpp
TargetOptionsTest.cpp		TargetOptionsTest.cpp
TestAsmPrinter.cpp		TestAsmPrinter.cpp
)		)

add_subdirectory(GlobalISel)		add_subdirectory(GlobalISel)

target_link_libraries(CodeGenTests PRIVATE LLVMTestingSupport)		target_link_libraries(CodeGenTests PRIVATE LLVMTestingSupport)

llvm/unittests/CodeGen/StableHashTreeTest.cpp

This file was added.

				//===- StableHashTreeTest.cpp ---------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/StableHashTree.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineMemOperand.h"
				#include "llvm/CodeGen/MachineModuleInfo.h"
				#include "llvm/CodeGen/MachineStableHash.h"
				#include "llvm/CodeGen/TargetFrameLowering.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"
				#include "llvm/CodeGen/TargetLowering.h"
				#include "llvm/CodeGen/TargetSubtargetInfo.h"
				#include "llvm/IR/DebugInfoMetadata.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/ModuleSlotTracker.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCSymbol.h"
				#include "llvm/Support/TargetRegistry.h"
				#include "llvm/Support/TargetSelect.h"
				#include "llvm/Target/TargetMachine.h"
				#include "llvm/Target/TargetOptions.h"
				#include "gtest/gtest.h"

				using namespace llvm;

				namespace {
				// Include helper functions to ease the manipulation of MachineFunctions.
				#include "MFCommon.inc"

				TEST(HashBasicBlock, StableHashTreeTest) {
				LLVMContext Ctx;
				Module Mod("Module", Ctx);
				auto MF = createMachineFunction(Ctx, Mod);

				MCInstrDesc MCID1 = {0, 0, 0, 0, 0, 0, 0, nullptr, nullptr, nullptr};
				MCInstrDesc MCID2 = {1, 0, 0, 0, 0, 0, 0, nullptr, nullptr, nullptr};
				MCInstrDesc MCID3 = {2, 0, 0, 0, 0, 0, 0, nullptr, nullptr, nullptr};
				MCInstrDesc MCID4 = {3, 0, 0, 0, 0, 0, 0, nullptr, nullptr, nullptr};
				MCInstrDesc MCID5 = {4, 0, 0, 0, 0, 0, 0, nullptr, nullptr, nullptr};

				std::vector<std::vector<MCInstrDesc>> InstrSequences = {
				{MCID1, MCID2, MCID4},
				{MCID1, MCID3, MCID4},
				{MCID1, MCID3, MCID4, MCID5},
				};

				// Populate a Stable Hash Tree with Machine Instructions. Because this is a
				// Hash Trie the series of instructions should overlap and result in a tree
				// that is of depth 4 and of size 7.
				bool IsFirst = true;
				StableHashTree Tree;
				for (auto &IS : InstrSequences) {
				auto *MBB = MF->CreateMachineBasicBlock();
				for (auto &MCID : IS) {
				auto *MI = MF->CreateMachineInstr(MCID, DebugLoc());
				MBB->insert(MBB->end(), MI);
				}

				auto BI = MBB->begin();
				std::vector<std::vector<stable_hash>> HashList = {
				llvm::stableHashMachineInstrs(BI, MBB->end())};
				Tree.insert(HashList);

				if (IsFirst) {
				IsFirst = false;
				ASSERT_TRUE(Tree.depth() == 3);
				}
				}

				// Check depth and size of this tree as expected above.
				ASSERT_TRUE(Tree.depth() == 4);
				ASSERT_TRUE(Tree.size() == 7);

				// Since the purpose of Stable Hash Tree is partly for serializing, test
				// print.
				std::string Str;
				raw_string_ostream Sstr(Str);
				Tree.print(Sstr);

				// Now that `Tree` has been serialized to a string, deserialize it into Tree2.
				StableHashTree Tree2;
				if (auto Err = Tree2.readFromBuffer(StringRef(Str)))
				consumeError(std::move(Err));

				// HashNode stores its successors in an unordered_map, so to compare `Tree`
				// and `Tree2` we collect the hash values into a sorted std::map of
				// std::set while walking each tree.
				std::map<stable_hash, std::set<stable_hash>> HashValueMap1;
				std::map<stable_hash, std::set<stable_hash>> HashValueMap2;

				Tree.walkVertices([&HashValueMap1](const HashNode *N) {
				for (const auto &Succ : N->Successors)
				HashValueMap1[N->Hash].insert(Succ.first);
				});

				Tree2.walkVertices([&HashValueMap2](const HashNode *N) {
				for (const auto &Succ : N->Successors)
				HashValueMap2[N->Hash].insert(Succ.first);
				});

				ASSERT_TRUE(std::equal(HashValueMap1.begin(), HashValueMap1.end(),
				HashValueMap2.begin()));
				}

				} // end namespace


Revision Contents

Diff 301092

llvm/include/llvm/CodeGen/MachineStableHash.h

llvm/include/llvm/CodeGen/StableHashTree.h

llvm/lib/CodeGen/CMakeLists.txt

llvm/lib/CodeGen/MachineStableHash.cpp

llvm/lib/CodeGen/StableHashTree.cpp

llvm/unittests/CodeGen/CMakeLists.txt

llvm/unittests/CodeGen/StableHashTreeTest.cpp
