Base.size
EmbeddingsAnalysis.analogy
EmbeddingsAnalysis.compress
EmbeddingsAnalysis.compressedwordvectors
EmbeddingsAnalysis.conceptnet2wv
EmbeddingsAnalysis.cosine
EmbeddingsAnalysis.cosine_similar_words
EmbeddingsAnalysis.cosine_vec
EmbeddingsAnalysis.in_vocabulary
EmbeddingsAnalysis.index
EmbeddingsAnalysis.pca_reduction
EmbeddingsAnalysis.similarity
EmbeddingsAnalysis.similarity_order
EmbeddingsAnalysis.vocab_reduction
EmbeddingsAnalysis.write2disk
EmbeddingsAnalysis.write2disk
Word2Vec.analogy_words
Word2Vec.get_vector
Word2Vec.vocabulary
EmbeddingsAnalysis.compress
— Method.compress(wv [;kwargs...])
Compresses wv::WordVectors by using array quantization.
Keyword arguments
sampling_ratio::AbstractFloat specifies the percentage of vectors to use for quantization codebook creation
k::Int number of quantization values for a codebook
m::Int number of codebooks to use
method::Symbol specifies the array quantization method
distance::PreMetric is the distance
Other keyword arguments specific to the quantization methods can also be provided.
EmbeddingsAnalysis.compressedwordvectors
— Method.compressedwordvectors(filename [,type=Float64][; kind=:text])
Generate a CompressedWordVectors type object from a file.
Arguments
filename::AbstractString the embeddings file name
type::Type type of the embedding vector elements; default Float64
Keyword arguments
kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :text
EmbeddingsAnalysis.conceptnet2wv
— Method.conceptnet2wv(cptnet, language)
Converts a ConceptNet object, cptnet, to a WordVectors object. The language of the word embeddings has to be specified explicitly as a Symbol or Languages.Language (ConceptNet embeddings can be multilingual).
EmbeddingsAnalysis.cosine_vec
— Function.cosine_vec(wv::WordVectors, wordvector, n=10 [;vocab=nothing])
Compute the cosine similarities between wordvector and the word vectors from wv, and return the positions and similarity values of the best n matches. A vocabulary mask vocab can be specified to consider only a subset of word vectors.
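The computation described above can be illustrated with a minimal, self-contained sketch. It uses only the LinearAlgebra standard library; the helper name top_cosine, its mask keyword and the toy matrix are assumptions made for illustration and are not part of EmbeddingsAnalysis, whose actual implementation may differ.

```julia
using LinearAlgebra

# Hypothetical helper, not the package implementation: the columns of
# `vectors` are word vectors; `mask` optionally restricts the search.
function top_cosine(vectors::AbstractMatrix, v::AbstractVector, n::Int=10;
                    mask::Union{Nothing,AbstractVector{Bool}}=nothing)
    # Cosine similarity of `v` against every column
    sims = [dot(col, v) / (norm(col) * norm(v)) for col in eachcol(vectors)]
    # Candidate column indices, honouring the optional mask
    candidates = mask === nothing ? collect(1:size(vectors, 2)) : findall(mask)
    # Sort candidates by decreasing similarity and keep the best n
    order = sort(candidates; by = i -> sims[i], rev = true)
    best = order[1:min(n, length(order))]
    return best, sims[best]
end

vectors = [1.0 0.0 1.0;
           0.0 1.0 1.0]   # three 2-dimensional toy vectors, one per column
positions, scores = top_cosine(vectors, [1.0, 0.0], 2)
```

Passing a Boolean mask such as mask=[false, true, true] limits the ranking to the second and third columns, which mirrors the role the vocab argument plays in the description above.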
EmbeddingsAnalysis.pca_reduction
— Method.pca_reduction(wv::WordVectors, rdim=7, outdim=size(wv.vectors,1); [do_pca=true])
Post-processes the word embeddings wv by removing the first rdim PCA components from the word vectors and, if do_pca=true, also reduces the dimensionality to outdim through a subsequent PCA transform.
Arguments
wv::WordVectors the word embeddings
rdim::Int the number of PCA components to remove from the data (default 7)
outdim::Int the output dimensionality of the data after the PCA dimensionality reduction; the reduction is performed only if do_pca=true, and the default value is the dimensionality of the input embeddings, i.e. no reduction
Keyword arguments
do_pca::Bool whether to perform a PCA transform of the post-processed data (default true)
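The two steps described above (removing the dominant principal components, then optionally projecting to fewer dimensions) can be sketched as follows. This is only an illustration built on the LinearAlgebra and Statistics standard libraries; the helper names remove_top_components and pca_project and the toy data are assumptions, and the package's own implementation may differ in details such as centring.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch, not the package implementation.
# X holds one word vector per column (d × n).
function remove_top_components(X::AbstractMatrix, rdim::Int)
    Xc = X .- mean(X, dims=2)       # centre each dimension
    U = svd(Xc).U                   # principal directions as columns
    top = U[:, 1:rdim]              # the rdim dominant directions
    return Xc - top * (top' * Xc)   # project those directions out
end

function pca_project(X::AbstractMatrix, outdim::Int)
    Xc = X .- mean(X, dims=2)
    U = svd(Xc).U
    return U[:, 1:outdim]' * Xc     # outdim × n reduced representation
end

# Toy data: six 4-dimensional vectors
X = [1.0  2.0  3.0 4.0  5.0  6.0;
     2.0  1.0  4.0 3.0  6.0  5.0;
     0.5 -1.0  0.0 1.0 -0.5  2.0;
     3.0  0.0 -1.0 2.0  1.0 -2.0]

Y = remove_top_components(X, 1)   # same shape, dominant direction removed
Z = pca_project(Y, 2)             # reduced to 2 dimensions
```

In this sketch the second step corresponds to do_pca=true; skipping pca_project leaves the dimensionality unchanged.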
References:
EmbeddingsAnalysis.similarity_order
— Method.similarity_order(wv::WordVectors, alpha=-0.65)
Post-processes the word embeddings wv through a linear transformation that adjusts the similarity order of the model, so that the embeddings capture more information than is directly apparent. The function returns a new WordVectors object containing the processed embeddings.
Arguments
wv::WordVectors the word embeddings
alpha::AbstractFloat the α parameter of the algorithm (default -0.65)
References:
EmbeddingsAnalysis.vocab_reduction
— Method.vocab_reduction(wv::WordVectors, seed, nn)
Produces a reduced vocabulary version of wv by removing all but the nn nearest neighbors of each word present in the vocabulary seed.
EmbeddingsAnalysis.write2disk
— Method.write2disk(filename::AbstractString, wv::CompressedWordVectors [;kind=:binary])
Writes compressed embeddings to disk.
Arguments
filename::AbstractString the embeddings file name
wv::CompressedWordVectors the embeddings
Keyword arguments
kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary
EmbeddingsAnalysis.write2disk
— Method.write2disk(filename::AbstractString, wv::WordVectors [;kind=:binary])
Writes embeddings to disk.
Arguments
filename::AbstractString the embeddings file name
wv::WordVectors the embeddings
Keyword arguments
kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary
Word2Vec.analogy_words
— Function.analogy_words(cwv, pos, neg, n=5)
Return the top n words computed by analogy similarity between positive words pos and negative words neg from the CompressedWordVectors cwv.
Word2Vec.get_vector
— Method.get_vector(cwv, word)
Return the vector representation of word from the CompressedWordVectors cwv.
Word2Vec.vocabulary
— Method.vocabulary(cwv)
Return the vocabulary as a vector of words of the CompressedWordVectors cwv.
Base.size
— Method.size(cwv)
Return the word vector length and the number of words as a tuple.
EmbeddingsAnalysis.analogy
— Method.analogy(cwv, pos, neg, n=5)
Compute the analogy similarity between two lists of words. The positions and the similarity values of the top n similar words will be returned. For example, king - man + woman = queen will be pos=["king", "woman"], neg=["man"].
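The king - man + woman example can be worked through on a toy vocabulary. The sketch below is self-contained and uses only the LinearAlgebra standard library; the helper toy_analogy, the hand-made vectors and the choice to exclude the query words from the ranking are all assumptions for illustration, not the package's implementation.

```julia
using LinearAlgebra

# Toy vocabulary with hand-made vectors (hypothetical data); one column per word
words = ["king", "queen", "man", "woman", "child"]
vectors = [1.0 1.0 0.0 0.0 0.0;   # "royal" dimension
           1.0 0.0 1.0 0.0 0.5;   # "male" dimension
           0.0 1.0 0.0 1.0 0.5]   # "female" dimension

# Hypothetical helper, not the package implementation
function toy_analogy(words, vectors, pos, neg, n=5)
    col(w) = vectors[:, findfirst(==(w), words)]
    # Add the positive word vectors and subtract the negative ones
    query = sum(col.(pos)) - sum(col.(neg))
    sims = [dot(c, query) / (norm(c) * norm(query)) for c in eachcol(vectors)]
    # Assumption: the query words themselves are excluded from the ranking
    candidates = [i for i in eachindex(words)
                  if !(words[i] in pos) && !(words[i] in neg)]
    order = sort(candidates; by = i -> sims[i], rev = true)
    best = order[1:min(n, length(order))]
    return best, sims[best]
end

positions, scores = toy_analogy(words, vectors, ["king", "woman"], ["man"], 1)
best_word = words[positions[1]]
```

With these toy vectors the query equals the "queen" column exactly, so "queen" is ranked first. As in the description above, the helper returns positions and similarity values rather than the words themselves.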
EmbeddingsAnalysis.cosine
— Function.cosine(cwv, word, n=10)
Return the positions of the n (by default n = 10) nearest neighbors of word and their cosine similarities.
EmbeddingsAnalysis.cosine_similar_words
— Function.cosine_similar_words(cwv, word, n=10)
Return the top n (by default n = 10) most similar words to word from the CompressedWordVectors cwv.
EmbeddingsAnalysis.in_vocabulary
— Method.in_vocabulary(cwv, word)
Return true if word is part of the vocabulary of the CompressedWordVectors cwv and false otherwise.
EmbeddingsAnalysis.index
— Method.index(cwv, word)
Return the index of word from the CompressedWordVectors cwv.
EmbeddingsAnalysis.similarity
— Method.similarity(cwv, word1, word2)
Return the cosine similarity value between two words word1 and word2.