matchzoo.embedding package

Submodules

matchzoo.embedding.embedding module

MatchZoo toolkit for token embedding.

class matchzoo.embedding.embedding.Embedding(data)

Bases: object

Embedding class.

Examples::
>>> import matchzoo as mz
>>> train_raw = mz.datasets.toy.load_data()
>>> pp = mz.preprocessors.NaivePreprocessor()
>>> train = pp.fit_transform(train_raw, verbose=0)
>>> vocab_unit = mz.build_vocab_unit(train, verbose=0)
>>> term_index = vocab_unit.state['term_index']
>>> embed_path = mz.datasets.embeddings.EMBED_RANK
To load from a file:
>>> embedding = mz.embedding.load_from_file(embed_path)
>>> matrix = embedding.build_matrix(term_index)
>>> matrix.shape[0] == len(term_index)
True
To build your own:
>>> import pandas as pd
>>> data = pd.DataFrame(data=[[0, 1], [2, 3]], index=['A', 'B'])
>>> embedding = mz.Embedding(data)
>>> matrix = embedding.build_matrix({'A': 2, 'B': 1, '_PAD': 0})
>>> matrix.shape == (3, 2)
True
build_matrix(term_index, initializer=<function Embedding.<lambda>>)

Build a matrix using term_index.

Parameters:
  • term_index (dict) – A dict or TermIndex to build with.
  • initializer – A callable that returns a default value for terms missing from data. (default: a random uniform draw from the range (-0.2, 0.2))
Return type: ndarray

Returns: A matrix with one row per entry in term_index and output_dim columns.
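
The behaviour described above can be illustrated with a minimal sketch. This is not MatchZoo's implementation: build_matrix_sketch and the plain dict of vectors are hypothetical stand-ins, and the sketch assumes the indices in term_index are contiguous integers starting at 0.

```python
import numpy as np


def build_matrix_sketch(vectors, term_index,
                        initializer=lambda: np.random.uniform(-0.2, 0.2)):
    """Hypothetical stand-in for Embedding.build_matrix (illustration only)."""
    # Assumption: every known term has a vector of the same length.
    output_dim = len(next(iter(vectors.values())))
    matrix = np.empty((len(term_index), output_dim))
    for term, index in term_index.items():
        if term in vectors:
            # Known term: copy its vector into the row given by term_index.
            matrix[index] = vectors[term]
        else:
            # Missing term: fill the row with values drawn from the initializer.
            matrix[index] = [initializer() for _ in range(output_dim)]
    return matrix


vectors = {'A': [0.0, 1.0], 'B': [2.0, 3.0]}
term_index = {'A': 2, 'B': 1, '_PAD': 0}

# '_PAD' has no vector, so row 0 is filled by the initializer.
matrix = build_matrix_sketch(vectors, term_index)

# A custom initializer, here one that zero-fills missing terms.
zeros = build_matrix_sketch(vectors, term_index, initializer=lambda: 0.0)
```

Passing a different initializer is the way to control how out-of-vocabulary rows start out, for example zero-filling a padding row instead of drawing random values.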

input_dim

Return the embedding input dimension.

Return type: int
output_dim

Return the embedding output dimension.

Return type: int
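
As a rough illustration of the two dimensions, the sketch below uses only pandas. It assumes, without confirmation from this page, that input_dim corresponds to the number of terms (rows) in data and output_dim to the vector length (columns); the assumed_* names are hypothetical.

```python
import pandas as pd

# Three terms, each with a two-dimensional vector.
data = pd.DataFrame(data=[[0, 1], [2, 3], [4, 5]], index=['A', 'B', 'C'])

# Assumption: input_dim matches the row count and output_dim the column count.
assumed_input_dim, assumed_output_dim = data.shape
```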
matchzoo.embedding.embedding.load_from_file(file_path, mode='word2vec')

Load embedding from file_path.

Parameters:
  • file_path (str) – Path to file.
  • mode (str) – Embedding file format mode, one of ‘word2vec’ or ‘glove’. (default: ‘word2vec’)
Return type: Embedding

Returns: A matchzoo.embedding.Embedding instance.
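
The two modes differ mainly in file layout: a word2vec text file usually begins with a header line giving the vocabulary size and vector dimension, while a GloVe text file has no header. The sketch below shows that difference with a hypothetical parse_embedding_lines helper working on in-memory lines; it is not how load_from_file is implemented.

```python
def parse_embedding_lines(lines, mode='word2vec'):
    """Hypothetical parser showing the two text layouts (illustration only)."""
    if mode not in ('word2vec', 'glove'):
        raise ValueError(f"unsupported mode: {mode!r}")
    rows = iter(lines)
    if mode == 'word2vec':
        # Skip the "<vocab_size> <dim>" header line of the word2vec text format.
        next(rows)
    vectors = {}
    for line in rows:
        parts = line.rstrip().split(' ')
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        # First field is the token, the remaining fields are its vector.
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


word2vec_text = ["2 3", "cat 0.1 0.2 0.3", "dog 0.4 0.5 0.6"]
glove_text = ["cat 0.1 0.2 0.3", "dog 0.4 0.5 0.6"]

w2v = parse_embedding_lines(word2vec_text, mode='word2vec')
glove = parse_embedding_lines(glove_text, mode='glove')
```

Choosing the wrong mode typically shows up as either a spurious first "term" (a word2vec header read as GloVe) or a silently dropped first vector (a GloVe file read as word2vec).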

Module contents