matchzoo.embedding package
Submodules
matchzoo.embedding.embedding module
MatchZoo toolkit for token embeddings.
class matchzoo.embedding.embedding.Embedding(data)
Bases: object
Embedding class.
Examples:
    >>> import matchzoo as mz
    >>> train_raw = mz.datasets.toy.load_data()
    >>> pp = mz.preprocessors.NaivePreprocessor()
    >>> train = pp.fit_transform(train_raw, verbose=0)
    >>> vocab_unit = mz.build_vocab_unit(train, verbose=0)
    >>> term_index = vocab_unit.state['term_index']
    >>> embed_path = mz.datasets.embeddings.EMBED_RANK

To load from a file:
    >>> embedding = mz.embedding.load_from_file(embed_path)
    >>> matrix = embedding.build_matrix(term_index)
    >>> matrix.shape[0] == len(term_index)
    True

To build your own:
    >>> import pandas as pd
    >>> data = pd.DataFrame(data=[[0, 1], [2, 3]], index=['A', 'B'])
    >>> embedding = mz.Embedding(data)
    >>> matrix = embedding.build_matrix({'A': 2, 'B': 1, '_PAD': 0})
    >>> matrix.shape == (3, 2)
    True
build_matrix(term_index, initializer=<function Embedding.<lambda>>)
Build a matrix using term_index.
Parameters:
- term_index (dict) – A dict or TermIndex to build with.
- initializer – A callable that returns a default value for terms missing from data. (default: a random uniform distribution in the range (-0.2, 0.2))
Return type: ndarray
Returns: A matrix.
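The role of the initializer may be easier to see in a small sketch. The code below is not MatchZoo's implementation: `build_matrix_sketch` is a hypothetical helper, a plain dict of lists stands in for the `DataFrame` passed to `Embedding`, and the result is a list of rows rather than an ndarray. It assumes that every cell is first filled by calling the initializer and that rows for known terms are then overwritten with their vectors.

```python
import random


def build_matrix_sketch(vectors, term_index,
                        initializer=lambda: random.uniform(-0.2, 0.2)):
    """Hypothetical stand-in for Embedding.build_matrix (illustration only).

    vectors:     dict mapping term -> list of floats (all the same length)
    term_index:  dict mapping term -> row index in the output matrix
    initializer: zero-argument callable giving the value for each cell
                 of a row whose term is missing from `vectors`
    """
    output_dim = len(next(iter(vectors.values())))
    # Fill every cell with an initializer value first.
    matrix = [[initializer() for _ in range(output_dim)]
              for _ in range(len(term_index))]
    # Overwrite the rows of terms that have a known vector.
    for term, index in term_index.items():
        if term in vectors:
            matrix[index] = list(vectors[term])
    return matrix


vectors = {'A': [0.0, 1.0], 'B': [2.0, 3.0]}
term_index = {'A': 2, 'B': 1, '_PAD': 0}

# A constant initializer makes the row for the missing '_PAD' term predictable.
matrix = build_matrix_sketch(vectors, term_index, initializer=lambda: 0.0)
print(matrix)
```

With the default initializer, the `_PAD` row would instead hold values drawn uniformly from (-0.2, 0.2), so passing a constant callable is one way to get a deterministic padding row.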
input_dim
Return the embedding input dimension.
Return type: int

output_dim
Return the embedding output dimension.
Return type: int
matchzoo.embedding.embedding.load_from_file(file_path, mode='word2vec')
Load embedding from file_path.
Parameters:
- file_path (str) – Path to file.
- mode (str) – Embedding file format mode, one of ‘word2vec’ or ‘glove’. (default: ‘word2vec’)
Return type: matchzoo.embedding.Embedding
Returns: A matchzoo.embedding.Embedding instance.
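The two modes correspond to two common text layouts for embedding files. As a rough sketch, and not MatchZoo's actual loader, the hypothetical `parse_embedding_text` below assumes space-separated text files in which the word2vec layout starts with a "vocab_size dimension" header line and the GloVe layout has no header. It returns a plain dict instead of an `Embedding`.

```python
import io


def parse_embedding_text(handle, mode='word2vec'):
    """Hypothetical parser for text embedding files (illustration only).

    Assumes one "term v1 v2 ... vn" entry per line, separated by spaces.
    In 'word2vec' mode the first line is treated as a header and skipped;
    in 'glove' mode every line is treated as an entry.
    """
    if mode not in ('word2vec', 'glove'):
        raise ValueError(f"Unsupported mode: {mode!r}")
    lines = iter(handle)
    if mode == 'word2vec':
        next(lines, None)  # skip the "vocab_size dimension" header
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(' ')
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


w2v_text = "2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n"
glove_text = "hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n"

w2v = parse_embedding_text(io.StringIO(w2v_text), mode='word2vec')
glove = parse_embedding_text(io.StringIO(glove_text), mode='glove')
print(w2v == glove)
```

Under these assumptions, choosing the wrong mode either drops the first real entry (a GloVe file read as word2vec) or turns the header into a spurious term (a word2vec file read as GloVe), which is why the mode should match the file.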