Garamond.DTVModel, Garamond.ENVOP_REQUEST, Garamond.ERRORED_REQUEST, Garamond.EmbeddingsLibrary, Garamond.KILL_REQUEST, Garamond.READCONFIGS_REQUEST, Garamond.RESPONSE_TERMINATOR, Garamond.UNINITIALIZED_REQUEST, Garamond.BOEEmbedder, Garamond.BOREPEmbedder, Garamond.BruteTreeIndex, Garamond.CPMeanEmbedder, Garamond.DTVEmbedder, Garamond.DisCEmbedder, Garamond.HNSWIndex, Garamond.IVFIndex, Garamond.InternalRequest, Garamond.KDTreeIndex, Garamond.NaiveIndex, Garamond.NoopIndex, Garamond.SIFEmbedder, Garamond.SearchEnv, Garamond.SearchResult, Garamond.Searcher, Base.deleteat!, Base.length, Base.parse, Base.pop!, Base.popfirst!, Base.push!, Base.pushfirst!, Garamond.__document2vec, Garamond.aggregate!, Garamond.build_data_env, Garamond.build_logger, Garamond.build_response, Garamond.build_result_from_ids, Garamond.build_search_env, Garamond.build_search_env, Garamond.build_searcher, Garamond.chop_to_length, Garamond.densify, Garamond.detect_language, Garamond.document2vec, Garamond.env_operator, Garamond.garamond_log_formatter, Garamond.missing_needles, Garamond.noop_ranker, Garamond.parse_configuration, Garamond.printable_version, Garamond.read_configuration_to_json, Garamond.respond, Garamond.rest_server, Garamond.search, Garamond.search, Garamond.search_server, Garamond.sentences2vec, Garamond.squash, Garamond.squash, Garamond.suggestion_search!, Garamond.summarize, Garamond.unix_socket_server, Garamond.version, Garamond.web_socket_server, Garamond.word_embeddings, HNSW.knn_search
Garamond.BruteTreeIndex — Type.
BruteTree index type for storing vectors. It is a wrapper around a BruteTree nearest-neighbor (NN) structure and performs brute-force search using a distance-based similarity between vectors.
Garamond.HNSWIndex — Type.
HNSW index type for storing vectors. It is a wrapper around a HierarchicalNSW (Hierarchical Navigable Small Worlds) NN graph structure and performs a very efficient search using a distance-based similarity between vectors.
Garamond.IVFIndex — Type.
IVFADC index type for storing vectors. It is a wrapper around an IVFADCIndex (inverted file with asymmetric distance computation) structure and supports billion-scale search using a distance-based similarity between vectors.
Garamond.KDTreeIndex — Type.
K-D Tree index type for storing vectors. It is a wrapper around a KDTree NN structure and performs a more efficient search using a distance-based similarity between vectors.
Garamond.NaiveIndex — Type.
Naive index type for storing vectors. It is a wrapper around a vector of embeddings and performs brute-force search using the cosine similarity between vectors.
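As a rough illustration of what a naive index does, the sketch below stores embeddings as matrix columns and ranks all of them by cosine similarity to a query. ToyNaiveIndex and naive_search are hypothetical names used only for this example; they are not part of Garamond's API.

```julia
using LinearAlgebra

# Hypothetical stand-in for a naive index: one embedding per matrix column.
struct ToyNaiveIndex{T<:AbstractFloat}
    data::Matrix{T}
end

# Brute-force search: score every stored vector by cosine similarity to
# `point`, then return the ids and scores of the `k` best matches.
function naive_search(index::ToyNaiveIndex, point::AbstractVector, k::Int)
    n = size(index.data, 2)
    scores = [dot(view(index.data, :, j), point) /
              (norm(view(index.data, :, j)) * norm(point)) for j in 1:n]
    best = sortperm(scores, rev=true)[1:min(k, n)]
    return best, scores[best]
end

idx = ToyNaiveIndex([1.0 0.0 1.0; 0.0 1.0 1.0])
ids, scores = naive_search(idx, [1.0, 0.1], 2)
```

Every query touches every stored vector, which is why the tree- and graph-based indexes above exist for larger collections.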
Garamond.NoopIndex — Type.
Noop index type for storing vectors. Returns empty vectors of indexes and scores. Useful when search is done only in the db.
Garamond.SearchEnv — Type.
Search environment object. It contains all the data, searchers and additional structures needed by the engine to function.
Garamond.SearchResult — Type.
Object that stores the search results from a single searcher.
Garamond.Searcher — Type.
Search object. It contains all the indexed data and related configuration that allows searches to be performed.
Garamond.build_search_env — Method.
build_search_env(config_path; cache_path=nothing)
Creates a search environment using the information provided by the configuration file config_path.
Garamond.build_search_env — Method.
build_search_env(env_config; cache_path=nothing)
Creates a search environment using the information provided by the environment configuration env_config. A cache filepath can be specified by cache_path, in which case the function will attempt to load it first.
Garamond.parse_configuration — Method.
parse_configuration(filename)
Parses a data configuration file (JSON format) and returns a NamedTuple that acts as a search environment configuration.
• Search environment options reference
    data_loader::Function             # 0-argument function that, when called, loads the data i.e. dbdata
    data_sampler::Function            # function that takes raw data as input and outputs a dbdata row
    id_key::Symbol                    # the name of the primary integer key in dbdata
    vectors_eltype::Type              # the type of the vectors, scores etc.; has to be <:AbstractFloat
    searcher_configs::Vector{NamedTuple}  # vector of searcher configs (see reference below)
    embedder_configs::Vector{NamedTuple}  # vector of embedder configs (see reference below)
    config_path::String               # the path to the config
• Embedder config fields reference
    id::String
    description::String
    language::String                  # the embedder-level language
    stem_words::Bool                  # whether to stem words
    ngram_complexity::Int             # ngram complexity (i.e. max number of tokens for an n-gram)
    vectors::Symbol                   # word vectors calculation/source i.e. :count, :tf, :tfidf, :bm25, :word2vec, :glove, :conceptnet, :compressed
    vectors_transform::Symbol         # transform to apply to the vectors i.e. :lsa, :rp, :none
    vectors_dimension::Int            # desired dimensionality after transform (ignored for word2vec approaches)
    embeddings_path::Union{Nothing, String}  # path to the embeddings file
    embeddings_kind::Symbol           # type of the embeddings file for Word2Vec, GloVe i.e. :text, :binary
    doc2vec_method::Symbol            # how to arrive at a single embedding from multiple ones i.e. :boe, :sif etc.
    glove_vocabulary::Union{Nothing, String}  # path to a GloVe-generated vocabulary file (only for binary embeddings)
    oov_policy::Symbol                # what to do with non-embeddable documents i.e. :none, :largevector
    embedder_kwarguments::Dict{Symbol, Any}  # explicit specification of embedder keyword arguments
    embeddable_fields::Union{Nothing, Vector{Symbol}}  # which fields to use for training the embedder
    text_strip_flags::UInt32          # how to strip text data before indexing
    sif_alpha::Float                  # smooth inverse frequency α parameter (for 'sif' doc2vec method only)
    borep_dimension::Int              # output dimension for BOREP embedder
    borep_pooling_function::Symbol    # pooling function for the BOREP embedder
    disc_ngram::Int                   # DisC embedder ngram parameter
• Searcher config fields reference
    id::String                        # searcher id
    id_aggregation::String            # aggregation id
    description::String               # description of the searcher
    enabled::Vector{Bool}             # whether to use the searcher in search or not
    search_index::Symbol              # type of the search index i.e. :naive, :kdtree, :hnsw
    search_index_arguments::Vector{Any}
    search_index_kwarguments::Dict{Symbol, Any}
    indexable_fields::Union{Nothing, Vector{Symbol}}  # which fields to index
    data_embedder::String             # id of the data/document embedder
    input_embedder::String            # id of the input/query embedder
    heuristic::Union{Nothing, Symbol} # search heuristic for suggesting corrections of misspelled words (nothing means no recommendations)
    score_alpha::Float                # score alpha (parameter for the scoring function)
    score_weight::Float               # weight of scores of searcher (used in result aggregation)
Garamond.rest_server — Method.
rest_server(port::Integer, io_port::Integer, search_server_ready::Condition [;ipaddr::String])
Starts a bi-directional HTTP REST server at address ipaddr::String (defaults to "0.0.0.0" i.e. all IPs) that uses the TCP port port and communicates with the search server through the TCP port io_port. The server is started once the condition search_server_ready is triggered.
Garamond.search — Method.
search(srcher, query [;kwargs])
Searches for query (i.e. key terms) in srcher and returns information about the documents that best match the query. The function returns an object of type SearchResult.
Arguments
• srcher::Searcher is the searcher
• query is the query; it can be either a String or a Vector{String}
Keyword arguments
• search_method::Symbol controls the type of matching: :exact uses exact matches while :regex considers the needle a regular expression
• max_matches::Int is the maximum number of search results to return
• max_suggestions::Int is the maximum number of suggestions to return for each missing needle
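The difference between the two search_method values can be sketched on a plain list of tokens. toy_match is a hypothetical helper, and matching against a Vector{String} is an assumption made for illustration; the searcher's actual data structures are not shown here.

```julia
# Hypothetical sketch of the two matching modes on a plain vocabulary:
# :exact keeps tokens equal to the needle, :regex treats it as a pattern.
function toy_match(vocabulary::Vector{String}, needle::String, search_method::Symbol)
    if search_method === :exact
        return filter(w -> w == needle, vocabulary)
    elseif search_method === :regex
        pattern = Regex(needle)
        return filter(w -> occursin(pattern, w), vocabulary)
    else
        throw(ArgumentError("unknown search method: $search_method"))
    end
end

vocabulary = ["garamond", "garland", "font"]
exact_hits = toy_match(vocabulary, "gar", :exact)
regex_hits = toy_match(vocabulary, "^gar", :regex)
```

With :exact the needle "gar" matches nothing, while the pattern "^gar" matches every token that starts with "gar".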
Garamond.search — Method.
search(srchers, query [;kwargs])
Searches for query (i.e. key terms) in multiple searchers and returns information about the documents that best match the query. The function returns the search results in the form of a Vector{SearchResult}.
Arguments
• srchers::Vector{Searcher} is the searchers vector
• query is the query; it can be either a String or a Vector{String}
Keyword arguments
• search_method::Symbol controls the type of matching: :exact uses exact matches while :regex considers the needle a regular expression
• max_matches::Int is the maximum number of search results to return
• max_suggestions::Int is the maximum number of suggestions to return for each missing needle
• custom_weights::Dict{Symbol, Float} are custom weights for each searcher's results, used in result aggregation
Garamond.search_server — Method.
search_server(data_config_path, io_port, search_server_ready; cache_path=nothing)
Search server for Garamond. It is a finite-state machine that, when called, creates the searchers (i.e. search objects) using data_config_path and then loops continuously in order to asynchronously handle outside requests.
After the searchers are loaded, the search server sends a notification using search_server_ready to any listening I/O servers.
Garamond.unix_socket_server — Method.
unix_socket_server(socket::AbstractString, io_port::Integer, start::Condition)
Starts a bi-directional UNIX socket server that uses the UNIX socket socket and communicates with the search server through the TCP port io_port. The server is started once the condition start is triggered.
Garamond.web_socket_server — Method.
web_socket_server(port::UInt16, io_port::Integer, start::Condition [; ipaddr::String])
Starts a bi-directional web socket server at address ipaddr::String (defaults to "127.0.0.1") and port port that communicates with the search server through the TCP port io_port. The server is started once the condition start is triggered.
Garamond.DTVModel — Constant.
Constant that represents document term vector (DTV) models used in text embedding.
Garamond.ENVOP_REQUEST — Constant.
Request corresponding to an environment operation command.
Garamond.ERRORED_REQUEST — Constant.
Request corresponding to an error i.e. in parsing.
Garamond.EmbeddingsLibrary — Constant.
Constant that represents embeddings libraries used in text embedding.
Garamond.KILL_REQUEST — Constant.
Request corresponding to a kill server command.
Garamond.READCONFIGS_REQUEST — Constant.
Request corresponding to a searcher read configuration command.
Garamond.RESPONSE_TERMINATOR — Constant.
Standard response terminator. It is used in the client-server communication to mark the end of sent and received messages.
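A minimal sketch of terminator-based message framing, assuming a hypothetical terminator value of "\n"; the actual value of RESPONSE_TERMINATOR is not given here and may differ. send_message and receive_message are hypothetical helpers, and an IOBuffer stands in for a socket.

```julia
# Hypothetical terminator value; the real RESPONSE_TERMINATOR may differ.
const TOY_TERMINATOR = "\n"

# The sender appends the terminator after every message...
send_message(io::IO, msg::AbstractString) = write(io, msg, TOY_TERMINATOR)

# ...and the receiver reads up to it (the terminator itself is dropped).
receive_message(io::IO) = readuntil(io, TOY_TERMINATOR)

buffer = IOBuffer()
send_message(buffer, "first request")
send_message(buffer, "second request")
seekstart(buffer)
```

Because both sides agree on the terminator, a reader on a stream can split consecutive messages without knowing their lengths in advance.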
Garamond.UNINITIALIZED_REQUEST — Constant.
Default request.
Garamond.BOEEmbedder — Type.
Bag-of-embeddings (BOE) structure for document embedding using word vectors.
Garamond.BOREPEmbedder — Type.
Bag-of-random-embedding-projections (BOREP) structure for document embedding using word vectors.
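The BOREP idea can be sketched as: project each word vector with a fixed random matrix, then pool the projections across words. The uniform initialization range and the :mean/:max pooling options below are assumptions for this sketch, and random_projection and borep_embed are hypothetical names, not Garamond functions.

```julia
using Random, Statistics

# Hypothetical helper: a d_out × d_in random matrix with entries drawn
# uniformly between -1/sqrt(d_in) and 1/sqrt(d_in).
function random_projection(d_out::Int, d_in::Int; rng=MersenneTwister(0))
    s = 1 / sqrt(d_in)
    return (2 .* rand(rng, d_out, d_in) .- 1) .* s
end

# Project every word vector (one per column of `words`) and pool across words.
function borep_embed(W::AbstractMatrix, words::AbstractMatrix; pooling::Symbol=:mean)
    projected = W * words                        # d_out × n_words
    if pooling === :mean
        return vec(mean(projected, dims=2))
    elseif pooling === :max
        return vec(maximum(projected, dims=2))
    else
        throw(ArgumentError("unknown pooling function: $pooling"))
    end
end

W = random_projection(8, 3)
words = rand(MersenneTwister(1), 3, 5)           # five 3-dimensional word vectors
doc_mean = borep_embed(W, words)
doc_max = borep_embed(W, words; pooling=:max)
```

The projection matrix is drawn once and reused for every document, so the output dimension (8 here) is independent of the word vector dimension.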
Garamond.CPMeanEmbedder — Type.
Concatenated-power-mean-embeddings (CPMean) structure for document embedding using word vectors.
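A sketch of concatenated power means, assuming word vectors are the columns of a matrix. To keep it simple, only the powers 1 (mean), Inf (max) and -Inf (min) are supported, which avoids fractional powers of negative components; power_mean, cpmean_embed and the default set of powers are hypothetical choices for this example.

```julia
using Statistics

# Power mean of each row of `words` (word vectors are the columns).
# Only p = 1 (mean), p = Inf (max) and p = -Inf (min) are handled here.
function power_mean(words::AbstractMatrix, p::Real)
    if p == 1
        return vec(mean(words, dims=2))
    elseif p == Inf
        return vec(maximum(words, dims=2))
    elseif p == -Inf
        return vec(minimum(words, dims=2))
    else
        throw(ArgumentError("unsupported power p = $p in this sketch"))
    end
end

# Concatenate the power means into a single document vector.
cpmean_embed(words::AbstractMatrix; powers=(1, Inf, -Inf)) =
    reduce(vcat, [power_mean(words, p) for p in powers])

words = [1.0 3.0; -2.0 4.0]      # two 2-dimensional word vectors
doc = cpmean_embed(words)
```

The output length is the word vector dimension times the number of powers, here 2 × 3 = 6.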
Garamond.DTVEmbedder — Type.
Structure for document embedding using DTVs.
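As an example of a DTV model, the sketch below builds a tf-idf weighted term-document matrix with idf(t) = log(N / df(t)). This is one common tf-idf variant chosen for illustration; tfidf_matrix is a hypothetical helper, and Garamond may weight terms differently.

```julia
# Hypothetical sketch: build a term-document matrix (terms × documents)
# and weight it with tf-idf. Documents are assumed to be non-empty.
function tfidf_matrix(docs::Vector{Vector{String}})
    vocab = sort(unique(reduce(vcat, docs)))
    index = Dict(t => i for (i, t) in enumerate(vocab))
    counts = zeros(Float64, length(vocab), length(docs))
    for (j, doc) in enumerate(docs), t in doc
        counts[index[t], j] += 1
    end
    tf = counts ./ sum(counts, dims=1)           # term frequency per document
    df = vec(sum(counts .> 0, dims=2))           # document frequency per term
    idf = log.(length(docs) ./ df)
    return vocab, tf .* idf                      # each column is one document's DTV
end

docs = [["garamond", "font"], ["garamond", "search"]]
vocab, W = tfidf_matrix(docs)
```

A term that appears in every document ("garamond" here) gets an idf of zero and therefore contributes nothing to the document vectors.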
Garamond.DisCEmbedder — Type.
Distributed Co-occurrence (DisC) structure for document embedding using word vectors.
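The DisC construction can be sketched as follows: each n-gram is represented by the element-wise product of its word vectors, the n-gram vectors of each length are summed, and the sums for n = 1..max_n are concatenated. disc_embed is a hypothetical name, and no scaling of the n-gram sums is applied here, which is a simplification.

```julia
# Hypothetical sketch of DisC-style embedding; word vectors are the columns
# of `words`, in document order.
function disc_embed(words::AbstractMatrix, max_n::Int)
    d, len = size(words)
    parts = Vector{Vector{Float64}}()
    for n in 1:max_n
        acc = zeros(d)
        # Every contiguous n-gram contributes the element-wise product of its vectors.
        for start in 1:(len - n + 1)
            acc .+= vec(prod(words[:, start:start+n-1], dims=2))
        end
        push!(parts, acc)
    end
    return reduce(vcat, parts)
end

words = [1.0 2.0 3.0; 1.0 1.0 2.0]   # three 2-dimensional word vectors
doc = disc_embed(words, 2)
```

The element-wise product makes the n-gram representation order-independent within the n-gram, while the concatenation keeps n-grams of different lengths apart.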
Garamond.InternalRequest — Type.
Request object for the internal server of the engine.
Garamond.SIFEmbedder — Type.
Smooth inverse frequency (SIF) structure for document embedding using word vectors.
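SIF can be sketched in two steps. First, each sentence vector is a weighted average of its word vectors with weights a / (a + p(w)), where p(w) is the word's estimated frequency. Second, each sentence vector's projection onto the first singular vector of the sentence matrix is removed. sif_embed, the word-probability input and the default a = 1e-3 are assumptions for this sketch.

```julia
using LinearAlgebra, Random

# Hypothetical SIF sketch: `sentences[j]` holds the word vectors of sentence j
# as columns and `probs[j]` the estimated frequency of each of those words.
function sif_embed(sentences::Vector{<:AbstractMatrix}, probs::Vector{<:AbstractVector}; a=1e-3)
    d = size(first(sentences), 1)
    X = zeros(d, length(sentences))
    for (j, (W, p)) in enumerate(zip(sentences, probs))
        w = a ./ (a .+ p)                    # frequent words get smaller weights
        X[:, j] = W * w ./ length(w)         # weighted average of word vectors
    end
    u = svd(X).U[:, 1]                       # first singular vector
    return X .- u * (u' * X)                 # remove the common component
end

rng = MersenneTwister(42)
sentences = [rand(rng, 4, 3), rand(rng, 4, 2), rand(rng, 4, 5)]
probs = [fill(0.01, 3), fill(0.02, 2), fill(0.001, 5)]
E = sif_embed(sentences, probs)
```

After the common component is removed, the sentence vectors lie in a subspace orthogonal to u, so E loses one dimension of rank.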
Base.deleteat! — Method.
deleteat!(env::SearchEnv, pos)
Deletes from a search environment the db and index elements with linear indices found in pos.
Base.length — Method.
length(index)
Returns the number of points indexed in index.
Base.parse — Method.
parse(::Type{InternalRequest}, request::AbstractString)
Parses an outside request received from a client into an InternalRequest usable by the search server.
Base.pop! — Method.
pop!(env::SearchEnv)
Pops the last point from a search environment. Returns the last db row and the associated indexed vector.
Base.popfirst! — Method.
popfirst!(env::SearchEnv)
Pops the first point from a search environment. Returns the first db row and the associated indexed vector.
Base.push! — Method.
push!(env::SearchEnv, rawdata)
Pushes rawdata to a search environment i.e. to the db and all indexes.
Base.pushfirst! — Method.
pushfirst!(env::SearchEnv, rawdata)
Pushes rawdata to the first position of a search environment i.e. to the db and all indexes.
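The push!/pop! semantics (db and indexes updated together) can be illustrated with a toy environment holding a single index. ToyEnv and toy_embed are hypothetical; a real SearchEnv holds several searchers and indexes.

```julia
# Hypothetical toy environment: a "db" of rows plus one index holding
# one embedding per row. push!/pop! keep both in sync.
struct ToyEnv
    db::Vector{String}
    index::Vector{Vector{Float64}}
end

# Hypothetical embedder used only for this sketch.
toy_embed(text::String) = [Float64(length(text)), Float64(count(c -> c == 'a', text))]

function Base.push!(env::ToyEnv, rawdata::String)
    push!(env.db, rawdata)                   # store the row...
    push!(env.index, toy_embed(rawdata))     # ...and its embedding
    return env
end

# Returns the last db row together with its indexed vector.
Base.pop!(env::ToyEnv) = (pop!(env.db), pop!(env.index))

env = ToyEnv(String[], Vector{Float64}[])
push!(env, "banana")
push!(env, "kiwi")
row, vector = pop!(env)
```

Keeping every mutation paired ensures that position i in the db always corresponds to position i in each index.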
Garamond.__document2vec — Method.
document2vec(embedder, document)
Word-embeddings approach to document embedding. It embeds documents using word embeddings libraries and an algorithm that combines the word embeddings (which algorithm depends on the type of embedder).
Arguments
• embedder::WordVectorsEmbedder is the embedder
• document::Vector{String} is the document to be embedded, where each vector element corresponds to a sentence
Garamond.aggregate! — Method.
Aggregates search results from several searchers based on their aggregation_id i.e. results from searchers with identical aggregation IDs are merged into a new search result that replaces the individual searcher ones.
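One way to picture the aggregation is to merge results that share an aggregation id and combine their scores with per-searcher weights. Summing weighted scores is an assumption made for this sketch, since the merge rule is not specified; ToyResult and toy_aggregate are hypothetical.

```julia
# Hypothetical result type: scores keyed by document id.
struct ToyResult
    id::String
    aggregation_id::String
    scores::Dict{Int, Float64}
end

# Merge results sharing an aggregation id into one result whose score for
# each document is the weighted sum of the individual scores (weight 1.0
# for searchers without a custom weight).
function toy_aggregate(results::Vector{ToyResult}, weights::Dict{String, Float64})
    merged = Dict{String, Dict{Int, Float64}}()
    for r in results
        acc = get!(merged, r.aggregation_id, Dict{Int, Float64}())
        w = get(weights, r.id, 1.0)
        for (doc, s) in r.scores
            acc[doc] = get(acc, doc, 0.0) + w * s
        end
    end
    # The merged result takes the aggregation id as its own id.
    return [ToyResult(agg, agg, scores) for (agg, scores) in merged]
end

results = [ToyResult("s1", "A", Dict(1 => 1.0, 2 => 0.5)),
           ToyResult("s2", "A", Dict(2 => 1.0)),
           ToyResult("s3", "B", Dict(3 => 0.2))]
merged = toy_aggregate(results, Dict("s2" => 2.0))
```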
Garamond.build_data_env — Method.
build_data_env(env::SearchEnv)
Strips the searchers from env.
Garamond.build_logger — Function.
build_logger(logging_stream, log_level)
Builds a logger using the stream logging_stream and the log level log_level provided.
Arguments
• logging_stream::String is the output stream and can take the values: "null" logs to /dev/null, "stdout" (default) logs to standard output, "/path/to/existing/file" logs to an existing file and "/path/to/non-existing/file" creates the log file. If no valid option is provided, the default stream is the standard output.
• log_level::String is the log level; it can take the values "debug", "info" and "error", and defaults to "info" if no valid option is provided.
Garamond.build_response — Method.
build_response(dbdata, request, results [; kwargs...])
Builds a response for an engine client using the data, request and results.
Garamond.build_result_from_ids — Method.
Constructs a search result from a list of data ids.
Garamond.build_searcher — Method.
build_searcher(dbdata, config)
Creates a Searcher from a searcher configuration.
Garamond.chop_to_length — Method.
Post-processes a string to fit a certain length, adding … at the end of its chopped representation if necessary.
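A minimal sketch of chopping a string to a target length with a trailing ellipsis. It counts characters rather than bytes and includes the ellipsis in the length; toy_chop_to_length is hypothetical, and whether Garamond counts the ellipsis toward the length is an assumption.

```julia
# Hypothetical sketch: keep at most `len` characters (not bytes), replacing
# the tail with "…" when the string is too long; the ellipsis counts
# toward `len` here.
function toy_chop_to_length(s::AbstractString, len::Int)
    len <= 0 && return ""
    length(s) <= len && return String(s)
    return first(s, len - 1) * "…"
end

short = toy_chop_to_length("Garamond", 5)
unchanged = toy_chop_to_length("abc", 5)
```

Working on characters instead of bytes keeps multi-byte UTF-8 characters (including "…" itself) from being split.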
Garamond.densify — Method.
densify(array)
Transforms sparse arrays into dense ones.
Garamond.detect_language — Method.
detect_language(text [; default=DEFAULT_LANGUAGE])
Detects the language of a piece of text. Returns a language of type Languages.Language. If the text is empty or the detection confidence is low, the default language is returned.
Garamond.document2vec — Method.
document2vec(embedder, document [;isregex=false])
Embeds documents. The document representation is conceptually a vector of sentences; the output is always a vector of floating point numbers.
Arguments
• embedder::AbstractEmbedder is the embedder
• document::Vector{AbstractString} is the document to be embedded, where each vector element corresponds to a sentence
Keyword arguments
• isregex::Bool: a false value (default) specifies that the document tokens are to be matched exactly while a true value specifies that the tokens are to be matched partially (for DTV-based document embedding only)
Garamond.env_operator — Method.
env_operator(env, channels)
Saves/Loads/Updates the search environment env. Communication with the search server, i.e. getting the command and its arguments and sending back a new environment, is done via channels.
Garamond.garamond_log_formatter — Method.
garamond_log_formatter(level, _module, group, id, file, line)
Garamond-specific log message formatter. Takes a fixed set of input arguments and returns the color, prefix and suffix for the log message.
Garamond.missing_needles — Function.
Returns the found and missing needles using an embedder.
Garamond.noop_ranker — Method.
Noop ranker; it does not rank and returns the first input argument unchanged.
Garamond.printable_version — Method.
printable_version()
Returns a pretty version string that includes the git commit and date.
Garamond.read_configuration_to_json — Method.
read_configuration_to_json(env)
Returns a JSON dictionary with the full configuration of the search environment.
Garamond.respond — Method.
respond(env, socket, counter, channels)
Responds to search server requests received on socket using the search data from the searchers. The requests are counted through the variable counter.
Garamond.sentences2vec — Method.
sentences2vec(embedder, document_embedding, embedded_words [;dim=0])
Returns a matrix of sentence embeddings from a vector of matrices containing individual sentence word embeddings. Used mostly for word-vectors based embedders.
Arguments
• embedder::AbstractEmbedder is the embedder
• document_embedding::Vector{Matrix{AbstractFloat}} are the document's word embeddings, where each element of the vector represents the embedding of a sentence (with the matrix columns being individual word embeddings)
• embedded_words::Vector{Vector{AbstractString}} are the words in each sentence that were embedded (their order corresponds to the order of the matrix columns in document_embedding)
Keyword arguments
• dim::Int is the dimension of the word embeddings i.e. the number of components in a word vector (default 0)
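A sketch of how per-sentence word-embedding matrices can be reduced to one column per sentence, assuming mean pooling and a zero column of length dim for sentences with no embedded words. toy_sentences2vec is hypothetical; actual embedders combine words in embedder-specific ways.

```julia
using Statistics

# Hypothetical sketch: one output column per sentence, obtained by averaging
# that sentence's word-embedding columns. Sentences with no embedded words
# keep a zero column of length `dim`.
function toy_sentences2vec(document_embedding::Vector{Matrix{Float64}}, dim::Int)
    X = zeros(dim, length(document_embedding))
    for (j, S) in enumerate(document_embedding)
        if size(S, 2) > 0
            X[:, j] = vec(mean(S, dims=2))
        end
    end
    return X
end

doc = [[1.0 3.0; 2.0 4.0], zeros(2, 0)]   # second sentence has no embedded words
X = toy_sentences2vec(doc, 2)
```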
Garamond.squash — Method.
squash(m)
Function that creates a single mean vector from a matrix m and performs some normalization operations as well.
Garamond.squash — Method.
squash(vv, m)
Function that creates a single mean vector from a vector of vectors vv, where each vector has length m, and performs some normalization operations as well.
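Assuming the normalization is L2 (scaling to unit length), both squash methods can be sketched as below. toy_squash is a hypothetical name, and the empty-input behavior (a zero vector of length m) is an assumption.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch: average the columns of `m` and scale the result to
# unit Euclidean norm (a zero mean vector is returned unchanged).
function toy_squash(m::AbstractMatrix{T}) where {T<:AbstractFloat}
    v = vec(mean(m, dims=2))
    n = norm(v)
    return n > 0 ? v ./ n : v
end

# Same operation for a vector of equal-length vectors, each of length `m`.
function toy_squash(vv::Vector{<:AbstractVector{T}}, m::Int) where {T<:AbstractFloat}
    isempty(vv) && return zeros(T, m)        # nothing to average
    return toy_squash(reduce(hcat, vv))
end

from_matrix = toy_squash([3.0 1.0; 0.0 0.0])
from_vectors = toy_squash([[3.0, 0.0], [1.0, 0.0]], 2)
```

The vector-of-vectors method simply stacks its input into a matrix, so both methods return the same result for the same data.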
Garamond.suggestion_search! — Method.
suggestion_search!(suggestions, search_tree, needles [;max_suggestions=1])
Searches the search tree for partial matches for each of the needles.
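Partial matching for suggestions can be illustrated with edit distance. The sketch scans a plain vocabulary instead of a search tree (a BK-tree or a similar structure would avoid the full scan); levenshtein, toy_suggestion_search! and the max_distance cutoff are hypothetical.

```julia
# Plain dynamic-programming Levenshtein distance using a single row.
function levenshtein(a::AbstractString, b::AbstractString)
    x, y = collect(a), collect(b)
    d = collect(0:length(y))                 # distances for the previous row
    for i in 1:length(x)
        prev, d[1] = d[1], i
        for j in 1:length(y)
            cur = d[j+1]
            d[j+1] = min(d[j+1] + 1, d[j] + 1, prev + (x[i] == y[j] ? 0 : 1))
            prev = cur
        end
    end
    return d[end]
end

# For every needle, store up to `max_suggestions` vocabulary words within
# edit distance `max_distance`, closest first.
function toy_suggestion_search!(suggestions::Dict{String, Vector{String}},
                                vocabulary::Vector{String},
                                needles::Vector{String};
                                max_suggestions::Int=1, max_distance::Int=2)
    for needle in needles
        dists = [(levenshtein(needle, w), w) for w in vocabulary]
        near = sort(filter(t -> t[1] <= max_distance, dists))
        suggestions[needle] = [w for (_, w) in near[1:min(max_suggestions, end)]]
    end
    return suggestions
end

vocabulary = ["garamond", "garland", "diamond"]
suggestions = toy_suggestion_search!(Dict{String, Vector{String}}(), vocabulary,
                                     ["garamnd"]; max_suggestions=2)
```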
Garamond.summarize — Method.
summarize(sentences [;ns=1, flags=DEFAULT_SUMMARIZATION_STRIP_FLAGS])
Builds a summary of the text's sentences. The resulting summary will be a document of ns sentences; each sentence is pre-processed using the flags option.
Garamond.version — Method.
version()
Returns the current Garamond version using the Project.toml and git. If Project.toml or git is not available, the version defaults to an empty string.
Garamond.word_embeddings — Method.
word_embeddings(word_vectors, document_tokens [;kwargs])
Returns a matrix corresponding to the word embeddings of document_tokens as well as the indices of missing i.e. not-embedded tokens.
Arguments
• word_vectors::EmbeddingsLibrary is the word vectors object; it can be a Word2Vec.WordVectors, Glowe.WordVectors or ConceptnetNumberbatch.ConceptNet
• document_tokens::Vector{String} are the words to be embedded, where each vector element corresponds to a word
Keyword arguments
• keep_size::Bool: a false value discards vectors for words not found while a true value (default) places a zero vector in the embeddings matrix
• print_matched_words::Bool: if true, the words that were and that were not embedded are printed (default false)
• kwargs...: the rest of the keyword arguments are ConceptNet-specific and can be found by inspecting the help of ConceptnetNumberbatch.embed_document
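The effect of keep_size can be sketched with a Dict standing in for the embeddings library. toy_word_embeddings and the Dict-based vocabulary are hypothetical; the real function works with the EmbeddingsLibrary types listed above.

```julia
# Hypothetical sketch: a Dict stands in for the embeddings library.
# Returns the embeddings matrix (one column per kept token) and the
# indices of tokens that could not be embedded.
function toy_word_embeddings(vectors::Dict{String, Vector{Float64}}, dim::Int,
                             tokens::Vector{String}; keep_size::Bool=true)
    missing_idx = findall(t -> !haskey(vectors, t), tokens)
    # keep_size=true: a zero column replaces each missing token;
    # keep_size=false: missing tokens are skipped.
    cols = [get(vectors, t, zeros(dim)) for t in tokens if keep_size || haskey(vectors, t)]
    emb = isempty(cols) ? zeros(dim, 0) : reduce(hcat, cols)
    return emb, missing_idx
end

wv = Dict("cat" => [1.0, 0.0], "dog" => [0.0, 1.0])
tokens = ["cat", "zebra", "dog"]
full, missing_idx = toy_word_embeddings(wv, 2, tokens)
compact, _ = toy_word_embeddings(wv, 2, tokens; keep_size=false)
```

With keep_size=true, column i still corresponds to token i, which is convenient when the embeddings must stay aligned with the input tokens.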
HNSW.knn_search — Method.
knn_search(index, point, k, keep)
Searches for the k nearest neighbors of point in the data contained in index. The index may vary from a simple wrapper around a matrix to more complex structures such as k-d trees. Only neighbors present in keep are returned.
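A brute-force sketch of k-nearest-neighbor search restricted to a keep set, using Euclidean distance over the columns of a matrix. toy_knn_search is hypothetical and is not the HNSW.knn_search implementation, which works on an HNSW graph rather than scanning all points.

```julia
using LinearAlgebra

# Hypothetical sketch: k nearest neighbors (Euclidean distance) of `point`
# among the columns of `data`, considering only column ids found in `keep`.
function toy_knn_search(data::AbstractMatrix, point::AbstractVector, k::Int, keep)
    candidates = [j for j in axes(data, 2) if j in keep]
    dists = [norm(view(data, :, j) .- point) for j in candidates]
    order = sortperm(dists)[1:min(k, length(dists))]
    return candidates[order], dists[order]
end

data = [0.0 1.0 5.0; 0.0 1.0 5.0]
ids, dists = toy_knn_search(data, [0.0, 0.0], 2, Set([2, 3]))
```

Column 1 is the closest point to the origin but is excluded because it is not in keep.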