This is an archive of the discontinued LLVM Phabricator instance.


Differential D83814

[clangd] Add Random Forest runtime for code completion.
ClosedPublic

Authored by usaxena95 on Jul 14 2020, 2:32 PM.


Details

Reviewers

sammccall
adamcz

Commits

rG9b6765e784b3: [clangd] Add Random Forest runtime for code completion.

Summary

We intend to replace the heuristics-based code completion ranking with a Decision Forest model.

This patch introduces a format for representing the model and an inference runtime that is code-generated at build time.

Forest.json contains all the trees as an array of trees.
Features.json describes the features to be used.
The codegen script takes the above two files and generates CompletionModel, containing the Feature struct and the corresponding Evaluate function. The Evaluate function maps the features of a completion candidate to a real number describing the relevance of that candidate.
The codegen is part of build system and these files are generated at build time.
Proposes a way to test the generated runtime using a test model.
- Replicates the model structure in unittests.
- unittest tests both the test model (for correct tree traversal) and the real model (for sanity).
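
To make the summary concrete, here is a minimal Python sketch of what evaluating such a forest amounts to. The node keys (`operation`, `feature`, `threshold`, `set`, `then`, `else`, `score`) follow the codegen script in this patch; the feature names, thresholds and scores are made up for illustration, and the patch itself generates C++ at build time rather than interpreting JSON at runtime.

```python
def evaluate_tree(node, example):
    """Walk one tree and return the score of the leaf that is reached."""
    while node["operation"] != "boost":
        value = example[node["feature"]]
        if node["operation"] == "if_greater":
            # The generated C++ in this patch compares with >=.
            taken = value >= node["threshold"]
        elif node["operation"] == "if_member":
            taken = value in node["set"]
        else:
            raise ValueError(f"Unknown operation: {node['operation']}")
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate(forest, example):
    """Sum the leaf scores of all trees to get the relevance score."""
    return sum(evaluate_tree(tree, example) for tree in forest)


# Hypothetical two-tree forest; names and numbers are illustrative only.
forest = [
    {"operation": "if_greater", "feature": "NumReferences", "threshold": 200.0,
     "then": {"operation": "boost", "score": 10.0},
     "else": {"operation": "boost", "score": -20.0}},
    {"operation": "if_member", "feature": "ContextKind", "set": ["Kind1", "Kind2"],
     "then": {"operation": "boost", "score": 5.0},
     "else": {"operation": "boost", "score": -6.0}},
]

print(evaluate(forest, {"NumReferences": 250.0, "ContextKind": "Kind1"}))
print(evaluate(forest, {"NumReferences": 10.0, "ContextKind": "Kind3"}))
```

Each tree contributes exactly one leaf score, so the result is a plain sum over trees.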

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

usaxena95 created this revision. Jul 14 2020, 2:32 PM

Herald added a project: Restricted Project. Jul 14 2020, 2:32 PM

Herald added subscribers: cfe-commits, kadircet, arphaman and 4 others.

Harbormaster failed remote builds in B64229: Diff 277978! Jul 14 2020, 2:33 PM

usaxena95 edited the summary of this revision. Jul 15 2020, 1:58 AM

usaxena95 edited the summary of this revision. Jul 15 2020, 9:09 AM

usaxena95 edited the summary of this revision.

nridge added a subscriber: nridge. Jul 21 2020, 4:24 PM

Is there some additional context for what this patch is aiming to do? The description sounds interesting but I don't really understand it.

What is a "feature" in this context?

The features refer to the code completion signals in https://github.com/llvm/llvm-project/blob/master/clang-tools-extra/clangd/Quality.h
These signals are currently used to map the code completion candidates to a relevance score using hand-coded heuristics.
We intend to replace the heuristics with a Decision Forest model. This patch introduces a dummy model and the corresponding runtime that will be used to run inference on this model.

This is still WIP and I will provide more details in the description once this is finalized.

Addressed offline comments.

Harbormaster failed remote builds in B65534: Diff 280406! Jul 24 2020, 4:57 AM

usaxena95 edited the summary of this revision. Jul 24 2020, 7:11 AM

usaxena95 added a reviewer: sammccall.

Better formatting in generated files.

Harbormaster failed remote builds in B65559: Diff 280455! Jul 24 2020, 7:13 AM

usaxena95 edited the summary of this revision. Jul 24 2020, 7:23 AM

In D83814#2166349, @usaxena95 wrote:

This is still WIP and I will provide more details in the description once this is finalized.

This really should have high level documentation - I don't think Nathan will be the only one to ask these questions.
We need a description of what problem this solves, what the concepts are (trees, features, scores), inference, training, data sources, generated API.

Given that the implementation is necessarily split across CMake, generated code, generator, and data (but not a lot of hand-written C++) I think it's probably best documented in a README.md or so in the model/ dir.

Seems fine to do that in this patch, or ahead of time (before the implementation)... I wouldn't wait to do it after, it'll help with the review.

One question we haven't discussed is how workspace/symbol should work. (This uses the hand-tuned score function with some different logic). This is relevant to naming: it's not the "completion model" if it also has other features.

clang-tools-extra/clangd/CMakeLists.txt
32	if you want the compiler script to be a parameter, make it an argument to the function rather than a magic variable. But really, I think the CompletionModel.cmake is tightly coupled to the python script, I think it can be hardcoded there.
clang-tools-extra/clangd/CompletionModel.cmake
1 ↗	(On Diff #280455)	I think there's some confusion in the naming. You've got the code split into two locations: the generic generator and the specific code completion model. However the directory named just "model" contains the specific stuff, and the generic parts are named "completionmodel!". I'd suggest either:
- don't generalize, and put everything in clangd/quality or so
- split into clangd/quality/ and clangd/forest/ for the specific/generic parts
5 ↗	(On Diff #280455)	what does the class do?
7 ↗	(On Diff #280455)	df is cryptic. decision_forest or gen_decision_forest?
17 ↗	(On Diff #280455)	I'd suggest passing the component filenames explicitly here since you're computing them anyway

sammccall added inline comments. Jul 24 2020, 2:15 PM

clang-tools-extra/clangd/CompletionModel.cmake
19 ↗	(On Diff #280455)	fname -> filename
29 ↗	(On Diff #280455)	this needs to be guarded based on the compiler - other compilers use different flags. I'd suggest just -Wno-unused
31 ↗	(On Diff #280455)	It'd be nice to avoid passing data out by setting magic variables with generic names. The caller is already passing in the filename they want, so they know what it is.
clang-tools-extra/clangd/CompletionModelCodegen.py
1 ↗	(On Diff #280455)	this needs documentation throughout
9 ↗	(On Diff #280455)	Hmm, do these classes really pay for themselves compared to just using the JSON-decoded data structures directly and writing functions? e.g.
    def setter(feature):
      if (feature['kind'] == KIND_CATEGORICAL)
        return "void Set{feature}(float V) {{ {feature} = OrderEncode(V); }}".format(feature['name'])
      ...
10 ↗	(On Diff #280455)	These labels seem a bit abstract, what do you think about "number"/"enum"?
21 ↗	(On Diff #280455)	Assert failures print the expression, so no need to say "header not found" etc. And in fact the next line will throw a reasonable exception in this case... could drop these
28 ↗	(On Diff #280455)	nit: `set{Feature}`, following LLVM style
33 ↗	(On Diff #280455)	raise instead? I'm not sure we want this error handling to be turned off... assert is for programming errors (applies to the rest of the usage of assert too)
45 ↗	(On Diff #280455)	i think this is `set(f.header for f in features.values() if f.type == Feature.Type.CATEGORICAL)` with no need for lambda but my python is rusty
52 ↗	(On Diff #280455)	why not assert this on self.ns so that "::Foo" will work fine?
60 ↗	(On Diff #280455)	join here rather than returning an array?
67 ↗	(On Diff #280455)	gaurd -> guard
70 ↗	(On Diff #280455)	`''.join('_' if x in '-' else x.upper() for x in self.fname)` ?
74 ↗	(On Diff #280455)	(again, not clear the classes pay for themselves in terms of complexity. We do a lot of parsing and validation, but it's not clear it's important)
103 ↗	(On Diff #280455)	would be nice to structure to reduce the indentation inside here. Also 'codegen' is a pretty generic name for this particular piece. I'd consider
    def node(n):
      return {
        'boost': boost_node,
        'if_greater': if_greater_node,
        'if_member': if_member_node,
      }[n['operation']](n)
and then define each case separately. (That snippet does actually check errors!)
129 ↗	(On Diff #280455)	rather than passing features around to get access to the enum type, what about adding `using {feature}_type = ...` to the top of the generated file, and using that here?
282 ↗	(On Diff #280455)	putting all the trees in a single huge json file seems like it makes them hard to inspect, did you consider one-per-file?
clang-tools-extra/clangd/for-review-only/CompletionModel.cpp
28 ↗	(On Diff #280455)	nit: consistently either:
- tree0, tree0_node1
- tree_0, tree_0_node_1
- t0, t0_n1
etc
clang-tools-extra/clangd/for-review-only/CompletionModel.h
10 ↗	(On Diff #280455)	this shouldn't be part of the public interface. Can we make it private static in the model class?
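
The dictionary dispatch suggested for line 103 of CompletionModelCodegen.py can be sketched as follows. This is an illustration only: the three helpers return simplified, hypothetical snippets, not the strings the real generator emits. The point is that an unknown `operation` fails loudly with a `KeyError`, without any explicit validation code.

```python
def boost_node(n):
    # Simplified stand-in for the leaf snippet.
    return f"Score += {n['score']};"


def if_greater_node(n):
    # Simplified stand-in for the numeric comparison snippet.
    return f"if (E.{n['feature']} >= {n['threshold']}) ..."


def if_member_node(n):
    # Simplified stand-in for the set membership snippet.
    members = " | ".join(n["set"])
    return f"if (E.{n['feature']} & ({members})) ..."


def node(n):
    """Dispatch on the node's operation; unknown operations raise KeyError."""
    return {
        "boost": boost_node,
        "if_greater": if_greater_node,
        "if_member": if_member_node,
    }[n["operation"]](n)


print(node({"operation": "boost", "score": 1.5}))
try:
    node({"operation": "if_less", "feature": "X", "threshold": 1})
except KeyError as err:
    print("unknown operation:", err)
```

Adding a new node kind then means writing one function and adding one dictionary entry.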

usaxena95 added a reviewer: adamcz. Sep 8 2020, 5:01 AM

Hi @usaxena95 and @sammccall,

I am wondering about a couple of high-level things.

Do you guys intend to open-source also the training part of the model pipeline or publish a model trained on generic-enough training set so it could be reasonably used on "any" codebase?

Do you still intend to support the heuristic that is currently powering clangd in the future?

Thanks!

Addressed comments.

Harbormaster completed remote builds in B71261: Diff 291024. Sep 10 2020, 10:31 AM

usaxena95 added inline comments. Sep 10 2020, 10:37 AM

clang-tools-extra/clangd/CMakeLists.txt
32	Hardcoded it in .cmake file.
clang-tools-extra/clangd/CompletionModel.cmake
1 ↗	(On Diff #280455)	Moved .cmake, codegen and model in quality dir.
5 ↗	(On Diff #280455)	The class specifies the name and scope of the Feature class. `clang::clangd::Example` in this case.
17 ↗	(On Diff #280455)	This allows the generated cc file to include the header using "filename.h". If we give the filepath as input, we would have to strip out the filename from it. Although I like the current notion of being explicit that the output_dir contains the two files. We need to add output_dir to include path to use this library.
29 ↗	(On Diff #280455)	only MSVC needed a different flag. `-Wno-unused` works with Clang and GCC. https://godbolt.org/z/Gvdne7
31 ↗	(On Diff #280455)	We can avoid `GENERATED_CC`. But I still wanted to keep the output directory as a detail in this function itself and not as an input parameter. Changed the name to more specific name `DECISION_FOREST_OUTPUT_DIR`.
clang-tools-extra/clangd/CompletionModelCodegen.py
9 ↗	(On Diff #280455)	Removed the Feature class and Tree. CppClass calculates and holds the namespaces which I felt convenient.
52 ↗	(On Diff #280455)	Allowed fully qualified names of classes.
70 ↗	(On Diff #280455)	Sorry for making this complicated. filename was assumed to be in PascalCase (and not contain '-' at all). I wanted to convert it to UPPER_SNAKE_CASE. To avoid unnecessary complexity, let's simply convert it to upper case.

Hi @jkorous

Do you guys intend to open-source also the training part of the model pipeline ?

Open-sourcing the training part (both dataset generation and using an open-source DecisionForest-based framework for training) has been on our radar, although gathering capacity for this task has been difficult lately.

Publish a model trained on generic-enough training set so it could be reasonably used on "any" codebase?

Although the current model has not been trained on a generic codebase, the features involved don't capture code style, conventions, or variable names, so it is likely to perform well on generic codebases as well. This remains to be tested.

Do you still intend to support the heuristic that is currently powering clangd in the future?

Currently we are planning to use this model behind a flag. Initially we would be focusing on comparing the two. Since maintaining and developing signals is easier for an ML model, we might end up deprecating the heuristics.

Thanks,
Utkarsh.

Added README.md for the code completion model.

Harbormaster completed remote builds in B71363: Diff 291205. Sep 11 2020, 7:26 AM

adamcz added inline comments. Sep 14 2020, 8:05 AM

clang-tools-extra/clangd/unittests/DecisionForestTests.cpp
6	This is supposed to be "namespace clang", right?

Fixed namespace.

usaxena95 marked an inline comment as done. Sep 15 2020, 12:16 AM

Harbormaster completed remote builds in B71690: Diff 291813. Sep 15 2020, 12:30 AM

Fixed namespace ending.

Harbormaster completed remote builds in B71734: Diff 291909. Sep 15 2020, 7:38 AM

Looks good to me overall, some minor style comments included ;-)

Do we expect this to be a generic model code generator that gets reused for other things? If not, maybe we can hardcode more (like the namespace, class name, etc), but if you think there's other use cases for this then this LGTM.

clang-tools-extra/clangd/quality/CompletionModelCodegen.py
40	Why GENERATED_DECISION_FOREST_MODEL instead of output_dir, to be consistent with header guards for other files? Doesn't matter much for generated code, but if someone opens this in vim they'll see warnings.
58	nit: add space after if for readability (also below)
94	Please extend the comment to mention the second return value (size of the tree)
106	This is a good place to use a Python f-string. Also in a few places below.
127	style nit: be consistent with spaces around +
178	Is there a reason to make this a friend free-function instead of static method on the Example class? The problem is that you now end up with clang::clangd::Evaluate, so if we ever re-use this code gen for another model we'll have a name collision.
272	nit: be consistent about putting a "." at the end of the help text or not.
clang-tools-extra/clangd/unittests/model/CategoricalFeature.h
1 ↗	(On Diff #291813)	Can we rename this directory? quality/model makes some sense (although it would be better to include something about code completion there), but unittests/model is not very descriptive - what model? How about unittests/decision_forest_model/ or something like that? Or go with the Input/TEST_NAME pattern.

Addressed comments.

clang-tools-extra/clangd/quality/CompletionModelCodegen.py
40	The output_dir is the absolute path and not a project relative path. I tried to stick with a special prefix for header guard as done in other generated headers (e.g. protos). If someone opens this in vim, there would be many other warnings that they would see, like "unused_label" ;) I don't think that would be a concern since it would be opened for inspection and not for editing.
178	The class name ("Example" currently) would be different for a different model and therefore there would be another overload for `Evaluate(const AnotherClass&)` even if the namespaces are the same (`clang::clangd`).
clang-tools-extra/clangd/unittests/model/CategoricalFeature.h
1 ↗	(On Diff #291813)	You are right, "quality" wasn't indicative of code completion here, but we decided to be consistent with the current naming. The current heuristics for the ranking are in Quality.h and Quality.cpp ;-) Changed the dir name in unittests.

Harbormaster completed remote builds in B71866: Diff 292188. Sep 16 2020, 5:13 AM

Just some naming and doc nits. This looks really solid now, nice job!

In D83814#2261458, @jkorous wrote:

Hi @usaxena95 and @sammccall,

I am wondering about a couple of high-level things.

Do you guys intend to open-source also the training part of the model pipeline or publish a model trained on generic-enough training set so it could be reasonably used on "any" codebase?

@adamcz and I were talking about this too... I think it's important we do as much of this as possible. I was the one not finding time to do it though, and I think Adam may do better :-)

- the existing training stuff is using internal tech, but AFAIK it's nothing LightGBM can't do (it trains on a single machine). So we should be able to open-source the training setup and actually use that.
- training data generation is harder to open, because it involves sampling a large diverse body of code and parsing lots of it. The core code that embeds clangd and extracts completion candidates should be very doable, so one could run over LLVM on one machine. The framework to run at a larger scale is coupled to internal infra though, and we're currently training on both public and non-public code.

clang-tools-extra/clangd/quality/CompletionModel.cmake
11	these vars are used only once, I'd suggest inlining them for readability
14	/generated/decision_forest seems redundant considering ${CMAKE_BINARY_DIR} is already the generated-files tree for the directory of the calling CMakeLists. Can't we just use ${CMAKE_BINARY_DIR} directly and avoid the DECISION_FOREST_OUTPUT_DIR variable?
clang-tools-extra/clangd/quality/CompletionModelCodegen.py
151	you can use triple-quoted f-strings, which I think would be more readable than blocks of "code +="
    code += f"""class {cpp_class.name} {{
    public:
      {"\n ".join(setters)}

    private:
      {"\n ".join(class_members)}
    """
etc
may even do the whole thing in one go.
218	again, making this one big triple-quoted f-string may be nicer, up to you
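
One caveat with the triple-quoted f-string suggestion: before Python 3.12, the expression part of an f-string may not contain a backslash, so `{"\n ".join(setters)}` is a syntax error on those versions. Binding the separator to a name first avoids this. A minimal sketch, with a hypothetical `render_class` helper and made-up setters and members:

```python
def render_class(name, setters, members):
    """Render a C++ class body with one triple-quoted f-string."""
    # The separator lives in a variable because f-string expressions
    # cannot contain a backslash before Python 3.12.
    nline = "\n  "
    return f"""class {name} {{
public:
  {nline.join(setters)}

private:
  {nline.join(members)}
}};
"""


print(render_class(
    "Example",
    ["void setA(float V);", "void setB(unsigned V);"],
    ["uint32_t A = 0;", "uint32_t B = 0;"],
))
```

Literal braces in the C++ output are written as `{{` and `}}`, the same way the codegen script in this patch does it.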
clang-tools-extra/clangd/quality/README.md
5	The second half of this sentence simply restates the first. Maybe we can combine this with the second paragraph: "A decision tree is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a binary decision based on the input data, and leaf nodes represent a prediction."
9	Nit: I think it's worth separating out defining features vs conditions. e.g. "An input (code completion candidate) is characterized as a set of features, such as the type of symbol or the number of existing references. At every non-leaf node, we evaluate the condition to decide whether to go left or right. The condition compares one feature of the input against a constant. It is either: ...".
17	nit: rather than alternating between describing traversing all trees and one tree, I'd just say "To compute an overall quality score, we traverse each tree in this way and add up the scores".
27	This is a little prone to confusion with C++ type. Consider "kind" instead?
35	this might be "type"?
39	The max numeric value may not exceed 32 (is that right?)
77	nit: order should match order of the actual paragraphs. This is short enough though that you might not want the mini-TOC here.
99	This seems like it might be a pain to maintain. Maybe just include the json files and the public interface from DecisionForestRuntime.h?
99	may want to add the cmake invocation to generate the files.
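
On the question about line 39: the generated setters in this patch store an ENUM feature as `1 << V` in a `uint32_t`, and `if_member` nodes test it against an OR of such bits. Under that encoding, enum values have to lie in 0..31. A small Python sketch of the idea, using hypothetical helper names:

```python
UINT32_BITS = 32  # width of the uint32_t member that holds the feature


def encode_enum(value):
    """Encode an enum value as a single set bit, like `1 << V` in the setter."""
    if not 0 <= value < UINT32_BITS:
        raise ValueError(f"enum value {value} does not fit in {UINT32_BITS} bits")
    return 1 << value


def member_mask(values):
    """OR together the bits of all enum values in an if_member set."""
    mask = 0
    for value in values:
        mask |= encode_enum(value)
    return mask


def is_member(encoded, mask):
    """Membership test: a single AND, as in the generated condition."""
    return (encoded & mask) != 0


mask = member_mask([1, 3])
print(is_member(encode_enum(3), mask))
print(is_member(encode_enum(2), mask))
```

The membership test costs one bitwise AND regardless of how many values the set contains.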

Addressed comments.

usaxena95 marked an inline comment as done. Sep 18 2020, 12:52 AM

usaxena95 added inline comments.

clang-tools-extra/clangd/quality/CompletionModel.cmake
14	Changed `CMAKE_BINARY_DIR` to `CMAKE_CURRENT_BINARY_DIR` and removed /generated/decision_forest to avoid the `DECISION_FOREST_OUTPUT_DIR` variable.

Harbormaster completed remote builds in B72140: Diff 292717. Sep 18 2020, 12:54 AM

LG from my side!

This revision is now accepted and ready to land. Sep 18 2020, 1:06 AM

adamcz accepted this revision. Sep 18 2020, 6:27 AM

Removed generated (for review) files.

Harbormaster completed remote builds in B72188: Diff 292818. Sep 18 2020, 9:16 AM

Fixed output_dir cmake variable. Clean build succeeds now.
Ready to land.

Harbormaster completed remote builds in B72196: Diff 292828. Sep 18 2020, 10:15 AM

This revision was landed with ongoing or failed builds. Sep 18 2020, 10:27 AM

Closed by commit rG9b6765e784b3: [clangd] Add Random Forest runtime for code completion. (authored by usaxena95).

This revision was automatically updated to reflect the committed changes.

usaxena95 added a commit: rG9b6765e784b3: [clangd] Add Random Forest runtime for code completion..

echristo added a reverting change: rG549e55b3d563: Temporarily Revert "[clangd] Add Random Forest runtime for code completion.". Sep 18 2020, 2:50 PM

usaxena95 edited the summary of this revision. Sep 19 2020, 12:37 AM

Revision Contents

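Path	Size
clang-tools-extra/clangd/CMakeLists.txt	9 lines
clang-tools-extra/clangd/quality/CompletionModel.cmake	37 lines
clang-tools-extra/clangd/quality/CompletionModelCodegen.py	283 lines
clang-tools-extra/clangd/quality/README.md	220 lines
clang-tools-extra/clangd/quality/model/features.json	8 lines
clang-tools-extra/clangd/quality/model/forest.json	18 lines
clang-tools-extra/clangd/unittests/CMakeLists.txt	10 lines
clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp	12 lines
clang-tools-extra/clangd/unittests/DecisionForestTests.cpp	29 lines
clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h	5 lines
clang-tools-extra/clangd/unittests/decision_forest_model/features.json	16 lines
clang-tools-extra/clangd/unittests/decision_forest_model/forest.json	52 lines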

Diff 292839

clang-tools-extra/clangd/CMakeLists.txt

[22 lines not shown]
  )

  set(LLVM_LINK_COMPONENTS
    Support
    AllTargetsInfos
    FrontendOpenMP
    Option
    )

+ include(${CMAKE_CURRENT_SOURCE_DIR}/quality/CompletionModel.cmake)
+ gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/quality/model CompletionModel clang::clangd::Example)

  if(MSVC AND NOT CLANG_CL)
    set_source_files_properties(CompileCommands.cpp PROPERTIES COMPILE_FLAGS -wd4130) # disables C4130: logical operation on address of string constant
  endif()

  include_directories(BEFORE "${CMAKE_CURRENT_BINARY_DIR}/../clang-tidy")

  add_clang_library(clangDaemon
    AST.cpp
[32 lines not shown]
    Selection.cpp
    SemanticHighlighting.cpp
    SemanticSelection.cpp
    SourceCode.cpp
    QueryDriverDatabase.cpp
    TUScheduler.cpp
    URI.cpp
    XRefs.cpp
+   ${CMAKE_CURRENT_BINARY_DIR}/CompletionModel.cpp

    index/Background.cpp
    index/BackgroundIndexLoader.cpp
    index/BackgroundIndexStorage.cpp
    index/BackgroundQueue.cpp
    index/BackgroundRebuild.cpp
    index/CanonicalIncludes.cpp
    index/FileIndex.cpp
[24 lines not shown]
    clangTidy
    ${LLVM_PTHREAD_LIB}
    ${ALL_CLANG_TIDY_CHECKS}

    DEPENDS
    omp_gen
    )

+ # Include generated CompletionModel headers.
+ target_include_directories(clangDaemon PUBLIC
+   $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
+ )

  clang_target_link_libraries(clangDaemon
    PRIVATE
    clangAST
    clangASTMatchers
    clangBasic
    clangDriver
    clangFormat
    clangFrontend
[40 lines not shown]

clang-tools-extra/clangd/quality/CompletionModel.cmake

This file was added.

# Run the Completion Model Codegenerator on the model present in the
# ${model} directory.
# Produces a pair of files called ${filename}.h and ${filename}.cpp in the
# ${CMAKE_CURRENT_BINARY_DIR}. The generated header
# will define a C++ class called ${cpp_class} - which may be a
# namespace-qualified class name.
function(gen_decision_forest model filename cpp_class)
  set(model_compiler ${CMAKE_SOURCE_DIR}/../clang-tools-extra/clangd/quality/CompletionModelCodegen.py)

  set(output_dir ${CMAKE_CURRENT_BINARY_DIR})
  set(header_file ${output_dir}/${filename}.h)
  set(cpp_file ${output_dir}/${filename}.cpp)

  add_custom_command(OUTPUT ${header_file} ${cpp_file}
    COMMAND "${Python3_EXECUTABLE}" ${model_compiler}
      --model ${model}
      --output_dir ${output_dir}
      --filename ${filename}
      --cpp_class ${cpp_class}
    COMMENT "Generating code completion model runtime..."
    DEPENDS ${model_compiler} ${model}/forest.json ${model}/features.json
    VERBATIM )

  set_source_files_properties(${header_file} PROPERTIES
    GENERATED 1)
  set_source_files_properties(${cpp_file} PROPERTIES
    GENERATED 1)

  # Disable unused label warning for generated files.
  if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
    set_source_files_properties(${cpp_file} PROPERTIES
      COMPILE_FLAGS /wd4102)
  else()
    set_source_files_properties(${cpp_file} PROPERTIES
      COMPILE_FLAGS -Wno-unused)
  endif()
endfunction()

clang-tools-extra/clangd/quality/CompletionModelCodegen.py

This file was added.

				"""Code generator for Code Completion Model Inference.

				Tool runs on the Decision Forest model defined in {model} directory.
				It generates two files: {output_dir}/{filename}.h and {output_dir}/{filename}.cpp
				The generated files defines the Example class named {cpp_class} having all the features as class members.
				The generated runtime provides an `Evaluate` function which can be used to score a code completion candidate.
				"""

import argparse
import json
import struct
from enum import Enum


class CppClass:
    """Holds class name and names of the enclosing namespaces."""

    def __init__(self, cpp_class):
        ns_and_class = cpp_class.split("::")
        self.ns = [ns for ns in ns_and_class[0:-1] if len(ns) > 0]
        self.name = ns_and_class[-1]
        if len(self.name) == 0:
            raise ValueError("Empty class name.")

    def ns_begin(self):
        """Returns snippet for opening namespace declarations."""
        open_ns = [f"namespace {ns} {{" for ns in self.ns]
        return "\n".join(open_ns)

    def ns_end(self):
        """Returns snippet for closing namespace declarations."""
        close_ns = [
            f"}} // namespace {ns}" for ns in reversed(self.ns)]
        return "\n".join(close_ns)


def header_guard(filename):
    '''Returns the header guard for the generated header.'''
    return f"GENERATED_DECISION_FOREST_MODEL_{filename.upper()}_H"


def boost_node(n, label, next_label):
    """Returns code snippet for a leaf/boost node.
    Adds value of leaf to the score and jumps to the root of the next tree."""
    return f"{label}: Score += {n['score']}; goto {next_label};"


def if_greater_node(n, label, next_label):
    """Returns code snippet for a if_greater node.
    Jumps to true_label if the Example feature (NUMBER) is greater than the threshold.
    Comparing integers is much faster than comparing floats. Assuming floating points
    are represented as IEEE 754, it order-encodes the floats to integers before comparing them.
    Control falls through if condition is evaluated to false."""
    threshold = n["threshold"]
    return f"{label}: if (E.{n['feature']} >= {order_encode(threshold)} /*{threshold}*/) goto {next_label};"


def if_member_node(n, label, next_label):
    """Returns code snippet for a if_member node.
    Jumps to true_label if the Example feature (ENUM) is present in the set of enum values
    described in the node.
    Control falls through if condition is evaluated to false."""
    members = '|'.join([
        f"BIT({n['feature']}_type::{member})"
        for member in n["set"]
    ])
    return f"{label}: if (E.{n['feature']} & ({members})) goto {next_label};"


def node(n, label, next_label):
    """Returns code snippet for the node."""
    return {
        'boost': boost_node,
        'if_greater': if_greater_node,
        'if_member': if_member_node,
    }[n['operation']](n, label, next_label)


def tree(t, tree_num: int, node_num: int):
    """Returns code for inferencing a Decision Tree.
    Also returns the size of the decision tree.

    A tree starts with its label `t{tree#}`.
    A node of the tree starts with label `t{tree#}_n{node#}`.

    The tree contains two types of node: Conditional node and Leaf node.
    - Conditional node evaluates a condition. If true, it jumps to the true node/child.
      Code is generated using pre-order traversal of the tree considering
      false node as the first child. Therefore the false node is always the
      immediately next label.
    - Leaf node adds the value to the score and jumps to the next tree.
    """
    label = f"t{tree_num}_n{node_num}"
    code = []
    if node_num == 0:
        code.append(f"t{tree_num}:")

    if t["operation"] == "boost":
        code.append(node(t, label=label, next_label=f"t{tree_num+1}"))
        return code, 1

    false_code, false_size = tree(
        t['else'], tree_num=tree_num, node_num=node_num+1)

    true_node_num = node_num+false_size+1
    true_label = f"t{tree_num}_n{true_node_num}"

    true_code, true_size = tree(
        t['then'], tree_num=tree_num, node_num=true_node_num)

    code.append(node(t, label=label, next_label=true_label))

    return code+false_code+true_code, 1+false_size+true_size


				def gen_header_code(features_json: list, cpp_class, filename: str):
				"""Returns code for header declaring the inference runtime.

				Declares the Example class named {cpp_class} inside relevant namespaces.
				The Example class contains all the features as class members. This
				class can be used to represent a code completion candidate.
				Provides `float Evaluate()` function which can be used to score the Example.
				"""
				setters = []
				for f in features_json:
				feature = f["name"]
				if f["kind"] == "NUMBER":
				adamczUnsubmitted Done Reply Inline Actions style nit: be consistent with spaces around + adamcz: style nit: be consistent with spaces around +
				# Floats are order-encoded to integers for faster comparison.
				setters.append(
				f"void set{feature}(float V) {{ {feature} = OrderEncode(V); }}")
				elif f["kind"] == "ENUM":
				setters.append(
				f"void set{feature}(unsigned V) {{ {feature} = 1 << V; }}")
				else:
				raise ValueError("Unhandled feature type.", f["kind"])

				# Class members represent all the features of the Example.
				class_members = [f"uint32_t {f['name']} = 0;" for f in features_json]

				nline = "\n "
				guard = header_guard(filename)
				return f"""#ifndef {guard}
				#define {guard}
				#include <cstdint>

				{cpp_class.ns_begin()}
				class {cpp_class.name} {{
				public:
				{nline.join(setters)}

				private:
				sammccall (inline comment, marked Done): You can use triple-quoted f-strings, which I think would be more readable than blocks of "code +=", e.g. `code += f"""class {cpp_class.name} {{ public: {"\n ".join(setters)} private: {"\n ".join(class_members)} """` etc. May even do the whole thing in one go.
				{nline.join(class_members)}

				// Produces an integer that sorts in the same order as F.
				// That is: a < b <==> OrderEncode(a) < OrderEncode(b).
				static uint32_t OrderEncode(float F);
				friend float Evaluate(const {cpp_class.name}&);
				}};

				float Evaluate(const {cpp_class.name}&);
				{cpp_class.ns_end()}
				#endif // {guard}
				"""


				def order_encode(v: float):
				i = struct.unpack('<I', struct.pack('<f', v))[0]
				TopBit = 1 << 31
				# IEEE 754 floats compare like sign-magnitude integers.
				if (i & TopBit): # Negative float
				return (1 << 32) - i # low half of integers, order reversed.
				return TopBit + i # top half of integers


				def evaluate_func(forest_json: list, cpp_class: CppClass):
				"""Generates code for `float Evaluate(const {Example}&)` function.
				The generated function can be used to score an Example."""
				code = f"float Evaluate(const {cpp_class.name}& E) {{\n"
				adamcz (inline comment, marked Done): Is there a reason to make this a friend free function instead of a static method on the Example class? The problem is that you now end up with clang::clangd::Evaluate, so if we ever re-use this codegen for another model we'll have a name collision.
				usaxena95 (author, inline reply, marked Done): The class name ("Example" currently) would be different for a different model, and therefore there would be another overload `Evaluate(const AnotherClass&)` even if the namespaces are the same (`clang::clangd`).
				lines = []
				lines.append("float Score = 0;")
				tree_num = 0
				for tree_json in forest_json:
				lines.extend(tree(tree_json, tree_num=tree_num, node_num=0)[0])
				lines.append("")
				tree_num += 1

				lines.append(f"t{len(forest_json)}: // No such tree.")
				lines.append("return Score;")
				code += " " + "\n ".join(lines)
				code += "\n}"
				return code


				def gen_cpp_code(forest_json: list, features_json: list, filename: str,
				cpp_class: CppClass):
				"""Generates code for the .cpp file."""
				# Headers
				# Required by OrderEncode(float F).
				angled_include = [
				f'#include <{h}>'
				for h in ["cstring", "limits"]
				]

				# Include generated header.
				quoted_headers = {f"{filename}.h", "llvm/ADT/bit.h"}
				# Headers required by ENUM features used by the model.
				quoted_headers |= {f["header"]
				for f in features_json if f["kind"] == "ENUM"}
				quoted_include = [f'#include "{h}"' for h in sorted(quoted_headers)]

				# using-decl for ENUM features.
				using_decls = "\n".join(f"using {feature['name']}_type = {feature['type']};"
				for feature in features_json
				if feature["kind"] == "ENUM")
				nl = "\n"
				return f"""{nl.join(angled_include)}

				{nl.join(quoted_include)}
				sammccall (inline comment, marked Done): Again, making this one big triple-quoted f-string may be nicer, up to you.

				#define BIT(X) (1 << X)

				{cpp_class.ns_begin()}

				{using_decls}

				uint32_t {cpp_class.name}::OrderEncode(float F) {{
				static_assert(std::numeric_limits<float>::is_iec559, "");
				constexpr uint32_t TopBit = ~(~uint32_t{{0}} >> 1);

				// Get the bits of the float. Endianness is the same as for integers.
				uint32_t U = llvm::bit_cast<uint32_t>(F);
				std::memcpy(&U, &F, sizeof(U));
				// IEEE 754 floats compare like sign-magnitude integers.
				if (U & TopBit) // Negative float.
				return 0 - U; // Map onto the low half of integers, order reversed.
				return U + TopBit; // Positive floats map onto the high half of integers.
				}}

				{evaluate_func(forest_json, cpp_class)}
				{cpp_class.ns_end()}
				"""


				def main():
				parser = argparse.ArgumentParser('DecisionForestCodegen')
				parser.add_argument('--filename', help='output file name.')
				parser.add_argument('--output_dir', help='output directory.')
				parser.add_argument('--model', help='path to model directory.')
				parser.add_argument(
				'--cpp_class',
				help='The name of the class (which may be namespace-qualified) created in the generated header.'
				)
				ns = parser.parse_args()

				output_dir = ns.output_dir
				filename = ns.filename
				header_file = f"{output_dir}/{filename}.h"
				cpp_file = f"{output_dir}/{filename}.cpp"
				cpp_class = CppClass(cpp_class=ns.cpp_class)

				model_file = f"{ns.model}/forest.json"
				features_file = f"{ns.model}/features.json"

				with open(features_file) as f:
				features_json = json.load(f)

				with open(model_file) as m:
				forest_json = json.load(m)

				with open(cpp_file, 'w+t') as output_cc:
				output_cc.write(
				gen_cpp_code(forest_json=forest_json,
				adamcz (inline comment, marked Done): Nit: be consistent about putting a "." at the end of the help text or not.
				features_json=features_json,
				filename=filename,
				cpp_class=cpp_class))

				with open(header_file, 'w+t') as output_h:
				output_h.write(gen_header_code(
				features_json=features_json, cpp_class=cpp_class, filename=filename))


				if __name__ == '__main__':
				main()
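The order-encoding used by the code generator can be checked in isolation. The sketch below repeats the `order_encode` logic from the listing above (only the local variable naming differs) and adds a hypothetical helper, `is_order_preserving`, that tests the property `a < b <==> order_encode(a) < order_encode(b)` on a handful of arbitrarily chosen sample values. It is an illustration under these assumptions, not part of the patch.

```python
import struct


def order_encode(v: float) -> int:
    """Maps a 32-bit float to an unsigned integer that sorts in the same order."""
    # Reinterpret the IEEE 754 single-precision bits as an unsigned integer.
    i = struct.unpack('<I', struct.pack('<f', v))[0]
    top_bit = 1 << 31
    if i & top_bit:  # Negative float: map onto the low half, order reversed.
        return (1 << 32) - i
    return top_bit + i  # Non-negative float: map onto the high half.


def is_order_preserving(values) -> bool:
    """Hypothetical helper: checks a < b <==> encode(a) < encode(b) pairwise."""
    return all((a < b) == (order_encode(a) < order_encode(b))
               for a in values for b in values)


# Arbitrary sample values, all representable as distinct 32-bit floats
# (except -0.0 and 0.0, which compare equal and encode to the same integer).
samples = [-1e30, -2.5, -1.0, -0.0, 0.0, 1e-10, 1.0, 200.0, 3.4e38]
```

Because both zeros encode to the same integer, a threshold comparison on encoded values treats `-0.0` and `0.0` identically, matching float comparison.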

clang-tools-extra/clangd/quality/README.md

This file was added.

				# Decision Forest Code Completion Model

				## Decision Forest
				A decision forest is a collection of many decision trees. A decision tree is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a binary decision based on the input data, and leaf nodes represent a prediction.

				sammccall (inline comment, marked Done): The second half of this sentence simply restates the first. Maybe we can combine this with the second paragraph: "A decision tree is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a binary decision based on the input data, and leaf nodes represent a prediction."
				In order to predict the relevance of a code completion item, we traverse each of the decision trees beginning with their roots until we reach a leaf.

				An input (code completion candidate) is characterized as a set of features, such as the type of symbol or the number of existing references.

				sammccall (inline comment, marked Done): Nit: I think it's worth separating out defining features vs conditions, e.g. "An input (code completion candidate) is characterized as a set of features, such as the type of symbol or the number of existing references. At every non-leaf node, we evaluate the condition to decide whether to go left or right. The condition compares one feature of the input against a constant. It is either: ...".
				At every non-leaf node, we evaluate the condition to decide whether to go left or right. The condition compares one feature of the input against a constant. The condition can be of two types:
				- if_greater: Checks whether a numerical feature is >= a threshold.
				- if_member: Checks whether the enum feature is contained in the set defined in the node.

				A leaf node contains a score value.
				To compute an overall quality score, we traverse each tree in this way and add up the scores.
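The traversal described above can be sketched as a small interpreter over the JSON form of the model. This is only an illustration: the patch generates C++ code at build time rather than interpreting JSON, and the feature names `NumReferences` and `Kind`, as well as the two-tree forest, are made up for this example.

```python
def evaluate_tree(node: dict, example: dict) -> float:
    """Walks one tree from the root to a leaf and returns the leaf score."""
    while node["operation"] != "boost":
        value = example[node["feature"]]
        if node["operation"] == "if_greater":
            # A numerical feature is compared against the threshold with >=.
            taken = value >= node["threshold"]
        elif node["operation"] == "if_member":
            # An enum feature is tested for membership in the node's set.
            taken = value in node["set"]
        else:
            raise ValueError(f"Unknown operation: {node['operation']!r}")
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate_forest(forest: list, example: dict) -> float:
    """Sums the leaf scores reached in every tree of the forest."""
    return sum(evaluate_tree(tree, example) for tree in forest)


# Hypothetical two-tree forest; feature names are invented for illustration.
forest = [
    {"operation": "if_greater", "feature": "NumReferences", "threshold": 10,
     "then": {"operation": "boost", "score": 2.0},
     "else": {"operation": "boost", "score": -1.0}},
    {"operation": "if_member", "feature": "Kind", "set": ["Function", "Variable"],
     "then": {"operation": "boost", "score": 0.5},
     "else": {"operation": "boost", "score": 0.0}},
]
```

For example, `evaluate_forest(forest, {"NumReferences": 10, "Kind": "Function"})` takes the `then` branch in both trees.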

				## Model Input Format
				sammccall (inline comment, marked Done): Nit: rather than alternating between describing traversing all trees and one tree, I'd just say "To compute an overall quality score, we traverse each tree in this way and add up the scores".
				The input model is represented in JSON format.

				### Features
				The file `features.json` defines the features available to the model.
				It is a JSON list of features. Each feature is one of the following two kinds.

				#### Number
				```
				{
				"name": "a_numerical_feature",
				sammccall (inline comment, marked Done): This is a little prone to confusion with C++ type. Consider "kind" instead?
				"kind": "NUMBER"
				}
				```
				#### Enum
				```
				{
				"name": "an_enum_feature",
				"kind": "ENUM",
				sammccall (inline comment, marked Done): This might be "type"?
				"type": "fully::qualified::enum",
				"header": "path/to/HeaderDeclaringEnum.h"
				}
				```
				sammccall (inline comment, marked Done): The max numeric value may not exceed 32 (is that right?)
				The field `type` specifies the fully qualified name of the enum.
				The enum may have at most 32 enumerators (numeric values 0 to 31), since the generated setter stores each value as one bit of a `uint32_t`.

				The field `header` specifies the header containing the declaration of the enum.
				This header is included by the inference runtime.
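The 32-value limit follows from the bit encoding: the generated setter stores `1 << V` in a 32-bit member. The sketch below illustrates that encoding in Python. The bitmask membership test (`encoded & mask`) is an assumption about how an `if_member` node could be checked against such an encoding, and `TestEnum`, `encode_enum`, `make_mask` and `is_member` are hypothetical names used only here.

```python
from enum import IntEnum


class TestEnum(IntEnum):
    """Hypothetical enum standing in for a model's ENUM feature type."""
    A = 0
    B = 1
    C = 2
    D = 3


def encode_enum(value: int) -> int:
    """Encodes an enum value as a single set bit, as the generated setter does."""
    if not 0 <= value < 32:
        raise ValueError("Enum value does not fit into a 32-bit mask.")
    return 1 << value


def make_mask(members) -> int:
    """Builds the bitmask for the set listed in an if_member node."""
    mask = 0
    for member in members:
        mask |= 1 << member
    return mask


def is_member(encoded: int, mask: int) -> bool:
    """Assumed membership test: one AND against the precomputed mask."""
    return (encoded & mask) != 0
```

With this encoding a set-membership condition costs a single bitwise operation regardless of how many enum values the set contains.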


				### Decision Forest
				The file `forest.json` defines the decision forest. It is a JSON list of DecisionTree objects.

				DecisionTree is one of IfGreaterNode, IfMemberNode, LeafNode.
				#### IfGreaterNode
				```
				{
				"operation": "if_greater",
				"feature": "a_numerical_feature",
				"threshold": A real number,
				"then": {A DecisionTree},
				"else": {A DecisionTree}
				}
				```
				#### IfMemberNode
				```
				{
				"operation": "if_member",
				"feature": "an_enum_feature",
				"set": ["enum_value1", "enum_value2", ...],
				"then": {A DecisionTree},
				"else": {A DecisionTree}
				}
				```
				#### LeafNode
				```
				{
				"operation": "boost",
				"score": A real number
				}
				```
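A minimal sketch of how the three node shapes above could be checked before code generation is shown below. The helper `count_nodes` and the `REQUIRED_KEYS` table are hypothetical and not part of the patch; the function validates the keys of each node and returns the number of nodes in the tree, which corresponds to the tree size that the code generator's `tree()` function returns as its second value.

```python
# Keys each node kind must carry, taken from the format described above.
REQUIRED_KEYS = {
    "if_greater": {"feature", "threshold", "then", "else"},
    "if_member": {"feature", "set", "then", "else"},
    "boost": {"score"},
}


def count_nodes(node: dict) -> int:
    """Validates a DecisionTree recursively and returns its node count."""
    op = node.get("operation")
    if op not in REQUIRED_KEYS:
        raise ValueError(f"Unknown operation: {op!r}")
    missing = REQUIRED_KEYS[op] - set(node)
    if missing:
        raise ValueError(f"{op} node is missing keys: {sorted(missing)}")
    if op == "boost":
        return 1  # A leaf contributes exactly one node.
    # Internal node: itself plus both subtrees.
    return 1 + count_nodes(node["else"]) + count_nodes(node["then"])


# Hypothetical tree with two internal nodes and three leaves.
example_tree = {
    "operation": "if_greater", "feature": "ANumber", "threshold": 200.0,
    "then": {"operation": "boost", "score": 10.0},
    "else": {
        "operation": "if_member", "feature": "ACategorical", "set": ["A", "C"],
        "then": {"operation": "boost", "score": 3.0},
        "else": {"operation": "boost", "score": -4.0},
    },
}
```

Since a decision tree is a full binary tree, the node count is always odd: a tree with `k` internal nodes has `k + 1` leaves.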
				sammccall (inline comment, marked Done): Nit: order should match order of the actual paragraphs. This is short enough though that you might not want the mini-TOC here.

				## Code Generator for Inference
				The implementation of the inference runtime is split across the following parts:

				### Code generator
				The code generator `CompletionModelCodegen.py` takes the `${model}` directory as input and generates the inference library:
				- `${output_dir}/{filename}.h`
				- `${output_dir}/{filename}.cpp`

				Invocation:
				```
				python3 CompletionModelCodegen.py \
				--model path/to/model/dir \
				--output_dir path/to/output/dir \
				--filename OutputFileName \
				--cpp_class clang::clangd::YourExampleClass
				```
				### Build System
				`CompletionModel.cmake` provides the `gen_decision_forest` function.
				A client that intends to use the CompletionModel for inference can call it to trigger the code generator and generate the inference library.
				The client can then use the generated API by including and depending on this library.

				sammccall (inline comment, marked Done): This seems like it might be a pain to maintain. Maybe just include the json files and the public interface from DecisionForestRuntime.h?
				sammccall (inline comment, marked Done): May want to add the cmake invocation to generate the files.
				### Generated API for inference
				The code generator defines the Example `class` inside relevant namespaces as specified in option `${cpp_class}`.

				The members of this generated class comprise all the features mentioned in `features.json`.
				Thus this class can represent a code completion candidate that needs to be scored.

				The API also provides `float Evaluate(const MyClass&)` which can be used to score the completion candidate.


				## Example
				### model/features.json
				```
				[
				{
				"name": "ANumber",
				"kind": "NUMBER"
				},
				{
				"name": "AFloat",
				"kind": "NUMBER"
				},
				{
				"name": "ACategorical",
				"kind": "ENUM",
				"type": "ns1::ns2::TestEnum",
				"header": "model/CategoricalFeature.h"
				}
				]
				```
				### model/forest.json
				```
				[
				{
				"operation": "if_greater",
				"feature": "ANumber",
				"threshold": 200.0,
				"then": {
				"operation": "if_greater",
				"feature": "AFloat",
				"threshold": -1,
				"then": {
				"operation": "boost",
				"score": 10.0
				},
				"else": {
				"operation": "boost",
				"score": -20.0
				}
				},
				"else": {
				"operation": "if_member",
				"feature": "ACategorical",
				"set": [
				"A",
				"C"
				],
				"then": {
				"operation": "boost",
				"score": 3.0
				},
				"else": {
				"operation": "boost",
				"score": -4.0
				}
				}
				},
				{
				"operation": "if_member",
				"feature": "ACategorical",
				"set": [
				"A",
				"B"
				],
				"then": {
				"operation": "boost",
				"score": 5.0
				},
				"else": {
				"operation": "boost",
				"score": -6.0
				}
				}
				]
				```
				### DecisionForestRuntime.h
				```
				...
				namespace ns1 {
				namespace ns2 {
				namespace test {
				class Example {
				public:
				void setANumber(float V) { ... }
				void setAFloat(float V) { ... }
				void setACategorical(unsigned V) { ... }

				private:
				...
				};

				float Evaluate(const Example&);
				} // namespace test
				} // namespace ns2
				} // namespace ns1
				```

				### CMake Invocation
				In order to use the inference runtime, one can use the `gen_decision_forest` function
				defined in `CompletionModel.cmake`, which invokes `CompletionModelCodegen.py` with the appropriate arguments.

				For example, the following invocation reads the model present in `path/to/model` and creates
				`${CMAKE_CURRENT_BINARY_DIR}/myfilename.h` and `${CMAKE_CURRENT_BINARY_DIR}/myfilename.cpp`
				describing a `class` named `MyClass` in namespace `fully::qualified`.



				```
				gen_decision_forest(path/to/model
				myfilename
				::fully::qualified::MyClass)
				```
				No newline at end of file

clang-tools-extra/clangd/quality/model/features.json

This file was added.

				[
				{
				"name": "ContextKind",
				"kind": "ENUM",
				"type": "clang::CodeCompletionContext::Kind",
				"header": "clang/Sema/CodeCompleteConsumer.h"
				}
				]
				No newline at end of file

clang-tools-extra/clangd/quality/model/forest.json

This file was added.

				[
				{
				"operation": "if_member",
				"feature": "ContextKind",
				"set": [
				"CCC_DotMemberAccess",
				"CCC_ArrowMemberAccess"
				],
				"then": {
				"operation": "boost",
				"score": 3.0
				},
				"else": {
				"operation": "boost",
				"score": 1.0
				}
				}
				]
				No newline at end of file

clang-tools-extra/clangd/unittests/CMakeLists.txt

[22 lines not shown]
endif()		endif()

if (CLANGD_ENABLE_REMOTE)		if (CLANGD_ENABLE_REMOTE)
include_directories(${CMAKE_CURRENT_BINARY_DIR}/../index/remote)		include_directories(${CMAKE_CURRENT_BINARY_DIR}/../index/remote)
add_definitions(-DGOOGLE_PROTOBUF_NO_RTTI=1)		add_definitions(-DGOOGLE_PROTOBUF_NO_RTTI=1)
set(REMOTE_TEST_SOURCES remote/MarshallingTests.cpp)		set(REMOTE_TEST_SOURCES remote/MarshallingTests.cpp)
endif()		endif()

		include(${CMAKE_CURRENT_SOURCE_DIR}/../quality/CompletionModel.cmake)
		gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/decision_forest_model DecisionForestRuntimeTest ::ns1::ns2::test::Example)

add_custom_target(ClangdUnitTests)		add_custom_target(ClangdUnitTests)
add_unittest(ClangdUnitTests ClangdTests		add_unittest(ClangdUnitTests ClangdTests
Annotations.cpp		Annotations.cpp
ASTTests.cpp		ASTTests.cpp
BackgroundIndexTests.cpp		BackgroundIndexTests.cpp
CanonicalIncludesTests.cpp		CanonicalIncludesTests.cpp
ClangdTests.cpp		ClangdTests.cpp
ClangdLSPServerTests.cpp		ClangdLSPServerTests.cpp
CodeCompleteTests.cpp		CodeCompleteTests.cpp
CodeCompletionStringsTests.cpp		CodeCompletionStringsTests.cpp
CollectMacrosTests.cpp		CollectMacrosTests.cpp
CompileCommandsTests.cpp		CompileCommandsTests.cpp
CompilerTests.cpp		CompilerTests.cpp
ConfigCompileTests.cpp		ConfigCompileTests.cpp
ConfigProviderTests.cpp		ConfigProviderTests.cpp
ConfigYAMLTests.cpp		ConfigYAMLTests.cpp
		DecisionForestTests.cpp
DexTests.cpp		DexTests.cpp
DiagnosticsTests.cpp		DiagnosticsTests.cpp
DraftStoreTests.cpp		DraftStoreTests.cpp
ExpectedTypeTest.cpp		ExpectedTypeTest.cpp
FileDistanceTests.cpp		FileDistanceTests.cpp
FileIndexTests.cpp		FileIndexTests.cpp
FindSymbolsTests.cpp		FindSymbolsTests.cpp
FindTargetTests.cpp		FindTargetTests.cpp
[29 lines not shown]
TestFS.cpp		TestFS.cpp
TestIndex.cpp		TestIndex.cpp
TestTU.cpp		TestTU.cpp
TypeHierarchyTests.cpp		TypeHierarchyTests.cpp
TweakTests.cpp		TweakTests.cpp
TweakTesting.cpp		TweakTesting.cpp
URITests.cpp		URITests.cpp
XRefsTests.cpp		XRefsTests.cpp
		${CMAKE_CURRENT_BINARY_DIR}/DecisionForestRuntimeTest.cpp

support/CancellationTests.cpp		support/CancellationTests.cpp
support/ContextTests.cpp		support/ContextTests.cpp
support/FunctionTests.cpp		support/FunctionTests.cpp
support/MarkupTests.cpp		support/MarkupTests.cpp
support/ThreadingTests.cpp		support/ThreadingTests.cpp
support/TestTracer.cpp		support/TestTracer.cpp
support/TraceTests.cpp		support/TraceTests.cpp

${REMOTE_TEST_SOURCES}		${REMOTE_TEST_SOURCES}

$<TARGET_OBJECTS:obj.clangDaemonTweaks>		$<TARGET_OBJECTS:obj.clangDaemonTweaks>
)		)

		# Include generated CompletionModel headers.
		target_include_directories(ClangdTests PUBLIC
		$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
		)

clang_target_link_libraries(ClangdTests		clang_target_link_libraries(ClangdTests
PRIVATE		PRIVATE
clangAST		clangAST
clangASTMatchers		clangASTMatchers
clangBasic		clangBasic
clangFormat		clangFormat
clangFrontend		clangFrontend
clangIndex		clangIndex
[30 lines not shown]

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp

//===-- CodeCompleteTests.cpp ------------------------------------ C++ --===//		//===-- CodeCompleteTests.cpp ------------------------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "Annotations.h"		#include "Annotations.h"
#include "ClangdServer.h"		#include "ClangdServer.h"
#include "CodeComplete.h"		#include "CodeComplete.h"
#include "Compiler.h"		#include "Compiler.h"
		#include "CompletionModel.h"
#include "Matchers.h"		#include "Matchers.h"
#include "Protocol.h"		#include "Protocol.h"
#include "Quality.h"		#include "Quality.h"
#include "SourceCode.h"		#include "SourceCode.h"
#include "SyncAPI.h"		#include "SyncAPI.h"
#include "TestFS.h"		#include "TestFS.h"
#include "TestIndex.h"		#include "TestIndex.h"
#include "TestTU.h"		#include "TestTU.h"
[21 lines not shown]
using ::testing::AllOf;		using ::testing::AllOf;
using ::testing::Contains;		using ::testing::Contains;
using ::testing::ElementsAre;		using ::testing::ElementsAre;
using ::testing::Field;		using ::testing::Field;
using ::testing::HasSubstr;		using ::testing::HasSubstr;
using ::testing::IsEmpty;		using ::testing::IsEmpty;
using ::testing::Not;		using ::testing::Not;
using ::testing::UnorderedElementsAre;		using ::testing::UnorderedElementsAre;
		using ContextKind = CodeCompletionContext::Kind;

// GMock helpers for matching completion items.		// GMock helpers for matching completion items.
MATCHER_P(Named, Name, "") { return arg.Name == Name; }		MATCHER_P(Named, Name, "") { return arg.Name == Name; }
MATCHER_P(NameStartsWith, Prefix, "") {		MATCHER_P(NameStartsWith, Prefix, "") {
return llvm::StringRef(arg.Name).startswith(Prefix);		return llvm::StringRef(arg.Name).startswith(Prefix);
}		}
MATCHER_P(Scope, S, "") { return arg.Scope == S; }		MATCHER_P(Scope, S, "") { return arg.Scope == S; }
MATCHER_P(Qualifier, Q, "") { return arg.RequiredQualifier == Q; }		MATCHER_P(Qualifier, Q, "") { return arg.RequiredQualifier == Q; }
[98 lines not shown]	return codeComplete(FilePath, Test.point(), /*Preamble=*/nullptr, ParseInput,
Opts);		Opts);
}		}

Symbol withReferences(int N, Symbol S) {		Symbol withReferences(int N, Symbol S) {
S.References = N;		S.References = N;
return S;		return S;
}		}

		TEST(DecisionForestRuntime, SanityTest) {
		using Example = clangd::Example;
		using clangd::Evaluate;
		Example E1;
		E1.setContextKind(ContextKind::CCC_ArrowMemberAccess);
		Example E2;
		E2.setContextKind(ContextKind::CCC_SymbolOrNewName);
		EXPECT_GT(Evaluate(E1), Evaluate(E2));
		}

TEST(CompletionTest, Limit) {		TEST(CompletionTest, Limit) {
clangd::CodeCompleteOptions Opts;		clangd::CodeCompleteOptions Opts;
Opts.Limit = 2;		Opts.Limit = 2;
auto Results = completions(R"cpp(		auto Results = completions(R"cpp(
struct ClassWithMembers {		struct ClassWithMembers {
int AAA();		int AAA();
int BBB();		int BBB();
int CCC();		int CCC();
[2,795 lines not shown]

clang-tools-extra/clangd/unittests/DecisionForestTests.cpp

This file was added.

				#include "DecisionForestRuntimeTest.h"
				#include "decision_forest_model/CategoricalFeature.h"
				#include "gtest/gtest.h"

				namespace clang {
				namespace clangd {
				adamcz (inline comment, marked Done): This is supposed to be "namespace clang", right?

				TEST(DecisionForestRuntime, Evaluate) {
				using Example = ::ns1::ns2::test::Example;
				using Cat = ::ns1::ns2::TestEnum;
				using ::ns1::ns2::test::Evaluate;

				Example E;
				E.setANumber(200); // True
				E.setAFloat(0); // True: +10.0
				E.setACategorical(Cat::A); // True: +5.0
				EXPECT_EQ(Evaluate(E), 15.0);

				E.setANumber(200); // True
				E.setAFloat(-2.5); // False: -20.0
				E.setACategorical(Cat::B); // True: +5.0
				EXPECT_EQ(Evaluate(E), -15.0);

				E.setANumber(100); // False
				E.setACategorical(Cat::C); // True: +3.0, False: -6.0
				EXPECT_EQ(Evaluate(E), -3.0);
				}
				} // namespace clangd
				} // namespace clang

clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h

This file was added.

				namespace ns1 {
				namespace ns2 {
				enum TestEnum { A, B, C, D };
				} // namespace ns2
				} // namespace ns1

clang-tools-extra/clangd/unittests/decision_forest_model/features.json

This file was added.

				[
				{
				"name": "ANumber",
				"kind": "NUMBER"
				},
				{
				"name": "AFloat",
				"kind": "NUMBER"
				},
				{
				"name": "ACategorical",
				"kind": "ENUM",
				"type": "ns1::ns2::TestEnum",
				"header": "decision_forest_model/CategoricalFeature.h"
				}
				]
				No newline at end of file

clang-tools-extra/clangd/unittests/decision_forest_model/forest.json

This file was added.

				[
				{
				"operation": "if_greater",
				"feature": "ANumber",
				"threshold": 200.0,
				"then": {
				"operation": "if_greater",
				"feature": "AFloat",
				"threshold": -1,
				"then": {
				"operation": "boost",
				"score": 10.0
				},
				"else": {
				"operation": "boost",
				"score": -20.0
				}
				},
				"else": {
				"operation": "if_member",
				"feature": "ACategorical",
				"set": [
				"A",
				"C"
				],
				"then": {
				"operation": "boost",
				"score": 3.0
				},
				"else": {
				"operation": "boost",
				"score": -4.0
				}
				}
				},
				{
				"operation": "if_member",
				"feature": "ACategorical",
				"set": [
				"A",
				"B"
				],
				"then": {
				"operation": "boost",
				"score": 5.0
				},
				"else": {
				"operation": "boost",
				"score": -6.0
				}
				}
				]
				No newline at end of file

Revision Contents

Diff 292839
