API Reference
compress(wv [;kwargs...])

Compresses wv::WordVectors by using array quantization.

Keyword arguments

  • sampling_ratio::AbstractFloat specifies the percentage of vectors to use for quantization codebook creation

  • k::Int number of quantization values for a codebook
  • m::Int number of codebooks to use
  • method::Symbol specifies the array quantization method
  • distance::PreMetric the distance used for quantization

Other keyword arguments specific to the quantization methods can also be provided.
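
To make the roles of k, m and sampling_ratio more concrete, the following is a minimal, self-contained sketch of codebook-based quantization on a plain matrix. It is not the package's implementation: the names toy_quantize and reconstruct are hypothetical, the one-column-per-word layout is an assumption, and codebook entries are simply picked from sampled vectors instead of being trained by an actual quantization method.

```julia
using LinearAlgebra, Random

# Hypothetical sketch: codebooks are built from sampled columns, not trained.
function toy_quantize(vectors::AbstractMatrix; k::Int, m::Int,
                      sampling_ratio::Real=0.5, rng=MersenneTwister(0))
    d, n = size(vectors)
    @assert d % m == 0 "dimension must be divisible by the number of codebooks"
    step = d ÷ m
    # Use only a fraction of the vectors for codebook creation.
    nsample = min(n, max(k, round(Int, sampling_ratio * n)))
    sample = randperm(rng, n)[1:nsample]
    @assert k <= length(sample) "not enough sampled vectors for k codebook entries"
    codebooks = Vector{Matrix{Float64}}(undef, m)
    codes = Matrix{Int}(undef, m, n)
    for j in 1:m
        rows = ((j - 1) * step + 1):(j * step)
        # Codebook j holds k sub-vectors taken from the sample.
        codebooks[j] = vectors[rows, sample[1:k]]
        for i in 1:n
            sub = vectors[rows, i]
            # Store the index of the closest codebook entry.
            codes[j, i] = argmin([norm(sub - c) for c in eachcol(codebooks[j])])
        end
    end
    return codebooks, codes
end

# Rebuild an approximate vector for word i from its m codes.
reconstruct(codebooks, codes, i) =
    vcat((codebooks[j][:, codes[j, i]] for j in eachindex(codebooks))...)

vectors = randn(MersenneTwister(1), 4, 20)
codebooks, codes = toy_quantize(vectors; k=4, m=2, sampling_ratio=0.5)
approx = reconstruct(codebooks, codes, 1)
```

Each word is stored as m small integers instead of a full vector, which is where the compression comes from; larger k and m give a closer approximation at the cost of more storage.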

compressedwordvectors(filename [,type=Float64][; kind=:text])

Generate a CompressedWordVectors type object from a file.

Arguments

  • filename::AbstractString the embeddings file name
  • type::Type type of the embedding vector elements; default Float64

Keyword arguments

  • kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :text

conceptnet2wv(cptnet, language)

Converts a ConceptNet object, cptnet, to a WordVectors object. The language of the word embeddings has to be specified explicitly as a Symbol or Languages.Language, since ConceptNet embeddings can be multilingual.

cosine_vec(wv::WordVectors, wordvector, n=10 [;vocab=nothing])

Compute the cosine similarities between wordvector and the word vectors from wv, and return the positions and similarity values of the best n matches. A vocabulary mask vocab can be specified to consider only a subset of the word vectors.
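
The computation described above can be sketched on a plain matrix as follows. This is an illustration only: top_cosine is a hypothetical helper, the one-column-per-word layout and the Boolean form of the mask are assumptions, and the package's own implementation may differ.

```julia
using LinearAlgebra

# Toy embedding matrix with one column per word (made-up values).
vectors = [1.0 0.0 1.0 0.2;
           0.0 1.0 1.0 0.9]

# Hypothetical helper, not the package's implementation.
function top_cosine(vectors::AbstractMatrix, query::AbstractVector, n::Int;
                    mask=nothing)
    # Cosine similarity between the query and every column.
    sims = [dot(col, query) / (norm(col) * norm(query)) for col in eachcol(vectors)]
    # Words outside the mask can never be selected.
    if mask !== nothing
        sims[.!mask] .= -Inf
    end
    positions = collect(partialsortperm(sims, 1:min(n, length(sims)); rev=true))
    return positions, sims[positions]
end

positions, top_sims = top_cosine(vectors, [1.0, 0.0], 2)
masked_positions, _ = top_cosine(vectors, [1.0, 0.0], 2;
                                 mask=[false, true, true, true])
```

With the mask shown, the first word is excluded, so the best matches are taken from the remaining three columns.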

pca_reduction(wv::WordVectors, rdim=7, outdim=size(wv.vectors,1) [;do_pca=true])

Post-processes the word embeddings wv by removing the first rdim PCA components from the word vectors and, if do_pca=true, reducing the dimensionality to outdim through a subsequent PCA transform.

Arguments

  • wv::WordVectors the word embeddings
  • rdim::Int the number of PCA components to remove from the data (default 7)
  • outdim::Int the output dimensionality of the data after the PCA dimensionality reduction; the reduction is performed only if do_pca=true, and the default value equals the dimensionality of the input embeddings, i.e. no reduction

Keyword arguments

  • do_pca::Bool whether to perform a PCA transform of the post-processed data (default true)
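
The component-removal step can be illustrated on a plain matrix. The sketch below assumes one column per word and covers only the removal of the leading principal directions, not the optional final PCA transform; remove_top_components is a hypothetical name and the code is not taken from the package.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper; assumes one column per word.
function remove_top_components(vectors::AbstractMatrix, rdim::Int)
    # Center the data so that the principal directions are meaningful.
    centered = vectors .- mean(vectors, dims=2)
    # Columns of U are the principal directions, ordered by decreasing variance.
    U = svd(centered).U
    top = U[:, 1:rdim]
    # Subtract the projection onto the first rdim principal directions.
    return centered - top * (top' * centered)
end

vectors = randn(5, 50)
processed = remove_top_components(vectors, 2)
```

After this step the processed vectors have no remaining component along the removed directions, while the number of rows is unchanged.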

similarity_order(wv::WordVectors, alpha=-0.65)

Post-processes the word embeddings wv through a linear transformation that adjusts the similarity order of the model, so that the embeddings capture more information than is directly apparent. The function returns a new WordVectors object containing the processed embeddings.

Arguments

  • wv::WordVectors the word embeddings

  • alpha::AbstractFloat the α parameter of the algorithm (default -0.65)
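
The text above does not spell out the transformation. As an assumption for illustration only, one linear map of this kind takes the eigendecomposition V V' = Q Λ Q' of the embedding matrix V (one column per word) and multiplies every word vector by Λ^α Q'. The helper toy_similarity_order below is hypothetical and may not match what the package does.

```julia
using LinearAlgebra

# Hypothetical helper; assumes one column per word and a full-rank matrix.
function toy_similarity_order(vectors::AbstractMatrix, alpha::Real)
    # Eigendecomposition of the symmetric matrix V * V'.
    F = eigen(Symmetric(vectors * vectors'))
    # Linear map Λ^alpha * Q' applied to every word vector.
    W = Diagonal(F.values .^ alpha) * F.vectors'
    return W * vectors
end

V = randn(3, 10)
unchanged = toy_similarity_order(V, 0)      # alpha = 0 only rotates the vectors
adjusted = toy_similarity_order(V, -0.65)   # negative alpha damps dominant directions
```

In this sketch, alpha = 0 leaves all pairwise dot products intact, whereas other values rescale the directions of the space relative to each other; eigenvalues close to zero would make negative values of alpha numerically unstable.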

vocab_reduction(wv::WordVectors, seed, nn)

Produces a reduced vocabulary version of wv by removing all but the nn nearest neighbors of each word present in the vocabulary seed.
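
A minimal sketch of this kind of reduction on plain arrays is given below. The helper reduce_vocabulary is hypothetical; it assumes one column per word, uses cosine similarity to find neighbors, and keeps the seed words themselves in addition to their neighbors, which the description above does not state explicitly.

```julia
using LinearAlgebra

# Hypothetical helper, not the package's implementation.
function reduce_vocabulary(words, vectors, seed, nn)
    # Normalize columns so that dot products are cosine similarities.
    norms = [norm(col) for col in eachcol(vectors)]
    unit = vectors ./ norms'
    keep = Set{Int}()
    for w in seed
        i = findfirst(==(w), words)
        i === nothing && continue      # skip seed words missing from the vocabulary
        push!(keep, i)                 # assumption: the seed word itself is kept
        sims = unit' * unit[:, i]
        sims[i] = -Inf                 # a word is not its own neighbor
        k = min(nn, length(words) - 1)
        union!(keep, partialsortperm(sims, 1:k; rev=true))
    end
    kept = sort!(collect(keep))
    return words[kept], vectors[:, kept]
end

words = ["cat", "dog", "car", "truck", "sky"]
vectors = [1.0 1.0 0.1 0.2 -1.0;
           0.1 0.2 1.0 1.0 -1.0]
small_words, small_vectors = reduce_vocabulary(words, vectors, ["cat"], 1)
```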

write2disk(filename::AbstractString, wv::CompressedWordVectors [;kind=:binary])

Writes compressed embeddings to disk.

Arguments

  • filename::AbstractString the embeddings file name
  • wv::CompressedWordVectors the embeddings

Keyword arguments

  • kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary

write2disk(filename::AbstractString, wv::WordVectors [;kind=:binary])

Writes embeddings to disk.

Arguments

  • filename::AbstractString the embeddings file name
  • wv::WordVectors the embeddings

Keyword arguments

  • kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary

analogy_words(cwv, pos, neg, n=5)

Return the top n words computed by analogy similarity between the positive words pos and the negative words neg from the CompressedWordVectors cwv.

get_vector(cwv, word)

Return the vector representation of word from the CompressedWordVectors cwv.

vocabulary(cwv)

Return the vocabulary as a vector of words of the CompressedWordVectors cwv.

Base.size (method)
size(cwv)

Return the word vector length and the number of words as a tuple.

analogy(cwv, pos, neg, n=5)

Compute the analogy similarity between two lists of words. The positions and the similarity values of the top n most similar words are returned. For example, king - man + woman = queen is expressed as pos=["king", "woman"], neg=["man"].
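
The king - man + woman example can be reproduced with toy vectors. The sketch below is self-contained and does not use the package: toy_analogy is a hypothetical helper, the vector values are made up, and excluding the query words from the candidates is an assumption.

```julia
using LinearAlgebra

words = ["king", "queen", "man", "woman", "apple"]
# Made-up 3-dimensional vectors, one column per word.
# Rows loosely stand for "royalty", "male" and "female".
vectors = [1.0 1.0 0.0 0.0 0.1;
           1.0 0.0 1.0 0.0 0.1;
           0.0 1.0 0.0 1.0 0.1]

# Hypothetical helper, not the package's implementation.
function toy_analogy(words, vectors, pos, neg, n)
    idx(w) = findfirst(==(w), words)
    # Add the positive word vectors and subtract the negative ones.
    target = sum(vectors[:, idx(w)] for w in pos) -
             sum(vectors[:, idx(w)] for w in neg)
    sims = [dot(col, target) / (norm(col) * norm(target)) for col in eachcol(vectors)]
    # Exclude the query words themselves from the candidates.
    for w in vcat(pos, neg)
        sims[idx(w)] = -Inf
    end
    positions = collect(partialsortperm(sims, 1:n; rev=true))
    return positions, sims[positions]
end

positions, sims = toy_analogy(words, vectors, ["king", "woman"], ["man"], 1)
println(words[positions])   # prints ["queen"]
```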

cosine(cwv, word, n=10)

Return the positions of the n (by default n = 10) nearest neighbors of word and their cosine similarities.

cosine_similar_words(cwv, word, n=10)

Return the top n (by default n = 10) most similar words to word from the CompressedWordVectors cwv.

in_vocabulary(cwv, word)

Return true if word is part of the vocabulary of the CompressedWordVectors cwv and false otherwise.

index(cwv, word)

Return the index of word from the CompressedWordVectors cwv.

similarity(cwv, word1, word2)

Return the cosine similarity value between two words word1 and word2.
