Welcome to MatchZoo's documentation!¶

MatchZoo是一款深度文本匹配的工具。它是为了促进深度文本匹配模型的设计比较和共享而开发。这里有很多像DRMM,MatchPyramid, MV-LSTM, aNMM, DUET, ARC-I, ARC-II, DSSM和 CDSSM一样的深度匹配模型,使用统一的接口进行封装。MatchZoo主要解决的任务有文档检索,问题回答,会话应答排序,同义句识别,等等。我们非常欢迎MatchZoo的用户提出建议与意见和贡献代码。
matchzoo package¶
Subpackages¶
matchzoo.auto package¶
Submodules¶
matchzoo.auto.prepare module¶
Prepare mode, preprocessor, and data pack.
-
matchzoo.auto.prepare.
prepare
(model, data_pack, preprocessor=None, verbose=1)¶ Prepare mode, preprocessor, and data pack.
This handles interaction among data, model, and preprocessor automatically. For example, some model like DSSM have dynamic input shapes based on the result of word hashing. Some models have an embedding layer which dimension is related to the data's vocabulary size. prepare takes care of all that and returns properly prepared model, data, and preprocessor for you.
参数: - model --
- data_pack --
- preprocessor -- If not set, use the model's default preprocessor.
- verbose -- Verbosity, 0 or 1.
返回类型: Tuple
[BaseModel
,DataPack
,BasePreprocessor
]返回:
matchzoo.auto.tuner module¶
Tuner class. Currently a minimum working demo.
Module contents¶
matchzoo.contrib package¶
Subpackages¶
matchzoo.contrib.models package¶
Match LSTM model.
-
class
matchzoo.contrib.models.match_lstm.
MatchLSTM
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
Match LSTM model.
Examples
>>> model = MatchLSTM() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 100 >>> model.params['embedding_trainable'] = True >>> model.params['fc_num_units'] = 200 >>> model.params['lstm_num_units'] = 256 >>> model.params['dropout_rate'] = 0.5 >>> model.build()
-
build
()¶ Build model.
-
classmethod
get_default_params
()¶ Get default parameters.
-
Module contents¶
matchzoo.data_generator package¶
Submodules¶
matchzoo.data_generator.data_generator module¶
Base generator.
-
class
matchzoo.data_generator.data_generator.
DataGenerator
(data_pack, batch_size=32, shuffle=True)¶ 基类:
keras.utils.data_utils.Sequence
Abstract base class of all matchzoo generators.
Every generator must implement
_get_batch_of_transformed_samples()
method.Examples
>>> import matchzoo as mz >>> raw_data = mz.datasets.toy.load_data() >>> data_generator = DataGenerator(raw_data, batch_size=3, ... shuffle=False) >>> len(data_generator) 34 >>> data_generator.num_instance 100 >>> x, y = data_generator[-1] >>> type(x) <class 'dict'> >>> x.keys() dict_keys(['id_left', 'text_left', 'id_right', 'text_right']) >>> type(x['id_left']) <class 'numpy.ndarray'> >>> type(x['id_right']) <class 'numpy.ndarray'> >>> type(x['text_left']) <class 'numpy.ndarray'> >>> type(x['text_right']) <class 'numpy.ndarray'> >>> type(y) <class 'numpy.ndarray'>
-
num_instance
¶ Return the number of instances.
返回类型: int
-
on_epoch_end
()¶ Reorganize the index array while epoch is ended.
-
reset
()¶ Reset the generator from begin.
-
matchzoo.data_generator.dpool_data_generator module¶
Data generator with dynamic pooling.
-
class
matchzoo.data_generator.dpool_data_generator.
DPoolDataGenerator
(data_pack, fixed_length_left, fixed_length_right, compress_ratio_left=1, compress_ratio_right=1, batch_size=32, shuffle=True)¶ 基类:
matchzoo.data_generator.data_generator.DataGenerator
Generate data with dynamic pooling.
Examples
>>> import matchzoo as mz >>> raw_data = mz.datasets.toy.load_data() >>> preprocessor = mz.preprocessors.BasicPreprocessor( ... fixed_length_left=10, ... fixed_length_right=40, ... remove_stop_words=True) >>> processed_data = preprocessor.fit_transform(raw_data) >>> data_generator = DPoolDataGenerator(processed_data, 3, 10, ... batch_size=3, shuffle=False) >>> len(data_generator) 34 >>> data_generator.num_instance 100 >>> x, y = data_generator[-1] >>> type(x) <class 'dict'> >>> x.keys() dict_keys(['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'length_right', 'dpool_index']) >>> type(x['id_left']) <class 'numpy.ndarray'> >>> type(x['id_right']) <class 'numpy.ndarray'> >>> type(x['text_left']) <class 'numpy.ndarray'> >>> type(x['text_right']) <class 'numpy.ndarray'> >>> type(y) <class 'numpy.ndarray'>
-
class
matchzoo.data_generator.dpool_data_generator.
DPoolPairDataGenerator
(data_pack, fixed_length_left, fixed_length_right, compress_ratio_left=1, compress_ratio_right=1, num_dup=1, num_neg=1, batch_size=32, shuffle=True)¶ 基类:
matchzoo.data_generator.pair_data_generator.PairDataGenerator
Generate pair-wise data with dynamic pooling.
Examples
>>> np.random.seed(111) >>> import matchzoo as mz >>> raw_data = mz.datasets.toy.load_data() >>> preprocessor = mz.preprocessors.BasicPreprocessor( ... fixed_length_left=10, ... fixed_length_right=40, ... remove_stop_words=True) >>> processed_data = preprocessor.fit_transform(raw_data) >>> data_generator = DPoolPairDataGenerator(processed_data, 3, 10, ... 1, 1, 2, 1, 3, False) >>> data_generator.num_instance 10 >>> len(data_generator) 4 >>> x, y = data_generator[0] >>> type(x) <class 'dict'> >>> x.keys() dict_keys(['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'length_right', 'dpool_index']) >>> type(x['id_left']) <class 'numpy.ndarray'> >>> type(x['id_right']) <class 'numpy.ndarray'> >>> type(x['text_left']) <class 'numpy.ndarray'> >>> type(x['text_right']) <class 'numpy.ndarray'> >>> len(x['id_left']) 6 >>> len(x['id_right']) 6 >>> type(y) <class 'numpy.ndarray'>
matchzoo.data_generator.dynamic_data_generator module¶
Dynamic data generator with transform function inside.
-
class
matchzoo.data_generator.dynamic_data_generator.
DynamicDataGenerator
(func, *args, **kwargs)¶ 基类:
matchzoo.data_generator.data_generator.DataGenerator
Data generator with preprocess unit inside.
Examples
>>> import matchzoo as mz >>> raw_data = mz.datasets.toy.load_data() >>> data_generator = DynamicDataGenerator(len, data_pack=raw_data, ... batch_size=1, shuffle=False) >>> len(data_generator) 100 >>> x, y = data_generator[0] >>> type(x) <class 'dict'> >>> x.keys() dict_keys(['id_left', 'text_left', 'id_right', 'text_right']) >>> type(x['id_left']) <class 'numpy.ndarray'> >>> type(x['id_right']) <class 'numpy.ndarray'> >>> type(x['text_left']) <class 'numpy.ndarray'> >>> type(x['text_right']) <class 'numpy.ndarray'> >>> type(y) <class 'numpy.ndarray'>
matchzoo.data_generator.histogram_data_generator module¶
Data generator with matching histogram.
-
class
matchzoo.data_generator.histogram_data_generator.
HistogramDataGenerator
(data_pack, embedding_matrix, bin_size=30, hist_mode='CH', batch_size=32, shuffle=True)¶ 基类:
matchzoo.data_generator.data_generator.DataGenerator
Generate data with matching histogram.
参数: - data_pack (
DataPack
) -- The input data pack. - embedding_matrix (
ndarray
) -- The embedding matrix used to generator match histogram. - bin_size (
int
) -- The number of bin size of the histogram. - hist_mode (
str
) -- The mode of theMatchingHistogramUnit
, one of CH, NH, and LCH. - batch_size (
int
) -- The batch size. - shuffle (
bool
) -- Boolean, whether to shuffle the data while generating a batch.
Examples
>>> import matchzoo as mz >>> raw_data = mz.datasets.toy.load_data() >>> preprocessor = mz.preprocessors.BasicPreprocessor() >>> processed_data = preprocessor.fit_transform(raw_data) >>> raw_embedding = mz.embedding.load_from_file( ... mz.datasets.embeddings.EMBED_10_GLOVE ... ) >>> embedding_matrix = raw_embedding.build_matrix( ... preprocessor.context['vocab_unit'].state['term_index'] ... ) >>> data_generator = HistogramDataGenerator(processed_data, ... embedding_matrix, 3, 'CH', batch_size=3, shuffle=False ... ) >>> x, y = data_generator[-1] >>> type(x) <class 'dict'> >>> x.keys() dict_keys(['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'length_right', 'match_histogram']) >>> type(x['match_histogram']) <class 'numpy.ndarray'> >>> x['match_histogram'].shape (1, 30, 3) >>> type(y) <class 'numpy.ndarray'>
- data_pack (
-
class
matchzoo.data_generator.histogram_data_generator.
HistogramPairDataGenerator
(data_pack, embedding_matrix, bin_size=30, hist_mode='CH', num_dup=1, num_neg=1, batch_size=32, shuffle=True)¶ 基类:
matchzoo.data_generator.pair_data_generator.PairDataGenerator
Generate pair-wise data with matching histogram.
参数: - data_pack (
DataPack
) -- The input data pack. - embedding_matrix (
ndarray
) -- The embedding matrix used to generator match histogram. - bin_size (
int
) -- The number of bin size of the histogram. - hist_mode (
str
) -- The mode of theMatchingHistogramUnit
, one of CH, NH, and LCH. - batch_size (
int
) -- The batch size. - shuffle (
bool
) -- Boolean, whether to shuffle the data while generating a batch.
Examples
>>> np.random.seed(111) >>> import matchzoo as mz >>> raw_data = mz.datasets.toy.load_data() >>> preprocessor = mz.preprocessors.BasicPreprocessor() >>> processed_data = preprocessor.fit_transform(raw_data) >>> raw_embedding = mz.embedding.load_from_file( ... mz.datasets.embeddings.EMBED_10_GLOVE ... ) >>> embedding_matrix = raw_embedding.build_matrix( ... preprocessor.context['vocab_unit'].state['term_index'] ... ) >>> data_generator = HistogramPairDataGenerator(processed_data, ... embedding_matrix, 3, 'CH', 1, 1, 3, False) >>> len(data_generator) 2 >>> x, y = data_generator[0] >>> type(x) <class 'dict'> >>> x.keys() dict_keys(['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'length_right', 'match_histogram']) >>> type(x['match_histogram']) <class 'numpy.ndarray'> >>> x['match_histogram'].shape (6, 30, 3) >>> type(y) <class 'numpy.ndarray'>
- data_pack (
-
matchzoo.data_generator.histogram_data_generator.
match_histogram_generator
(x, match_hist_unit)¶ Generate the matching hisogram for input.
参数: - x (
dict
) -- The input dict. - match_hist_unit (
MatchingHistogramUnit
) -- The histogram unitMatchingHistogramUnit
.
返回类型: ndarray
返回: The matching histogram.
- x (
-
matchzoo.data_generator.histogram_data_generator.
trunc_text
(input_text, length)¶ Truncating the input text according to the input length.
参数: - input_text (
list
) -- The input text need to be truncated. - length (
list
) -- The length used to truncated the text.
返回类型: list
返回: The truncated text.
- input_text (
matchzoo.data_generator.pair_data_generator module¶
Pair-wise data generator.
-
class
matchzoo.data_generator.pair_data_generator.
PairDataGenerator
(data_pack, num_dup=1, num_neg=1, batch_size=32, shuffle=True)¶ 基类:
matchzoo.data_generator.data_generator.DataGenerator
Generate pair-wise data.
Examples
>>> np.random.seed(111) >>> import matchzoo as mz >>> raw_data = mz.datasets.toy.load_data() >>> data_generator = PairDataGenerator(raw_data, 2, 1, 3, False) >>> data_generator.num_instance 10 >>> len(data_generator) 4 >>> x, y = data_generator[0] >>> type(x) <class 'dict'> >>> x.keys() dict_keys(['id_left', 'text_left', 'id_right', 'text_right']) >>> type(x['id_left']) <class 'numpy.ndarray'> >>> type(x['id_right']) <class 'numpy.ndarray'> >>> type(x['text_left']) <class 'numpy.ndarray'> >>> type(x['text_right']) <class 'numpy.ndarray'> >>> len(x['id_left']) 6 >>> len(x['id_right']) 6 >>> type(y) <class 'numpy.ndarray'>
-
num_instance
¶ Get the total number of pairs.
返回类型: int
-
classmethod
reorganize_data_pack
(data_pack, num_dup=1, num_neg=1)¶ Re-organize the data pack as pair-wise format.
参数: - data_pack (
DataPack
) -- the inputDataPack
. - num_dup (
int
) -- number of duplicates for each positive sample. - num_neg (
int
) -- number of negative samples associated with each positive sample.
返回: the reorganized
DataPack
object.- data_pack (
-
Module contents¶
matchzoo.data_pack package¶
Submodules¶
matchzoo.data_pack.build_unit_from_data_pack module¶
Build unit from data pack.
-
matchzoo.data_pack.build_unit_from_data_pack.
build_unit_from_data_pack
(unit, data_pack, mode='both', flatten=True, verbose=1)¶ Build a
StatefulProcessorUnit
from aDataPack
object.参数: - unit (
StatefulProcessorUnit
) --StatefulProcessorUnit
object to be built. - data_pack (
DataPack
) -- The inputDataPack
object. - mode (
str
) -- One of 'left', 'right', and 'both', to determine the source data for building theVocabularyUnit
. - flatten (
bool
) -- Flatten the datapack or not. True to organize theDataPack
text as a list, and False to organizeDataPack
text as a list of list. - verbose (
int
) -- Verbosity.
返回类型: 返回: A built
StatefulProcessorUnit
object.- unit (
matchzoo.data_pack.build_vocab_unit module¶
Build a processor_units.VocabularyUnit
given data_pack.
-
matchzoo.data_pack.build_vocab_unit.
build_vocab_unit
(data_pack, mode='both', verbose=1)¶ Build a
processor_units.VocabularyUnit
given data_pack.The data_pack should be preprocessed forehand, and each item in text_left and text_right columns of the data_pack should be a list of tokens.
参数: - data_pack (
DataPack
) -- TheDataPack
to build vocabulary upon. - mode (
str
) -- One of 'left', 'right', and 'both', to determine the source data for building theVocabularyUnit
. - verbose (
int
) -- Verbosity.
返回类型: 返回: A built vocabulary unit.
- data_pack (
matchzoo.data_pack.data_pack module¶
Matchzoo DataPack, pair-wise tuple (feature) and context as input.
-
class
matchzoo.data_pack.data_pack.
DataPack
(relation, left, right)¶ 基类:
object
Matchzoo
DataPack
data structure, store dataframe and context.DataPack is a MatchZoo native data structure that most MatchZoo data handling processes build upon. A DataPack consists of three parts: left, right and relation, each one of is a pandas.DataFrame.
参数: - relation (
DataFrame
) -- Store the relation between left document and right document use ids. - left (
DataFrame
) -- Store the content or features for id_left. - right (
DataFrame
) -- Store the content or features for id_right.
Example
>>> left = [ ... ['qid1', 'query 1'], ... ['qid2', 'query 2'] ... ] >>> right = [ ... ['did1', 'document 1'], ... ['did2', 'document 2'] ... ] >>> relation = [['qid1', 'did1', 1], ['qid2', 'did2', 1]] >>> relation_df = pd.DataFrame(relation) >>> left = pd.DataFrame(left) >>> right = pd.DataFrame(right) >>> dp = DataPack( ... relation=relation_df, ... left=left, ... right=right, ... ) >>> len(dp) 2
-
DATA_FILENAME
= 'data.dill'¶
-
class
FrameView
(data_pack)¶ 基类:
object
FrameView.
-
append_text_length
()¶ Append length_left and length_right columns.
参数: inplace -- True to modify inplace, False to return a modified copy. (default: False) Example
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> 'length_left' in data_pack.frame[0].columns False >>> new_data_pack = data_pack.append_text_length() >>> 'length_left' in new_data_pack.frame[0].columns True >>> 'length_left' in data_pack.frame[0].columns False >>> data_pack.append_text_length(inplace=True) >>> 'length_left' in data_pack.frame[0].columns True
-
apply_on_text
(func, mode='both', rename=None, verbose=1)¶ Apply func to text columns based on mode.
参数: - func (
Callable
) -- The function to apply. - mode (
str
) -- One of "both", "left" and "right". - rename (
Optional
[str
]) -- If set, use new names for results instead of replacing the original columns. To set rename in "both" mode, use a tuple of str, e.g. ("text_left_new_name", "text_right_new_name"). - inplace -- True to modify inplace, False to return a modified copy. (default: False)
- verbose (
int
) -- Verbosity.
返回: - Examples::
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> frame = data_pack.frame
- To apply len on the left text and add the result as 'length_left':
>>> data_pack.apply_on_text(len, mode='left', ... rename='length_left', ... inplace=True) >>> list(frame[0].columns) ['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'label']
- To do the same to the right text:
>>> data_pack.apply_on_text(len, mode='right', ... rename='length_right', ... inplace=True) >>> list(frame[0].columns) ['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'length_right', 'label']
- To do the same to the both texts at the same time:
>>> data_pack.apply_on_text(len, mode='both', ... rename=('extra_left', 'extra_right'), ... inplace=True) >>> list(frame[0].columns) ['id_left', 'text_left', 'length_left', 'extra_left', 'id_right', 'text_right', 'length_right', 'extra_right', 'label']
- To suppress outputs:
>>> data_pack.apply_on_text(len, mode='both', verbose=0, ... inplace=True)
- func (
-
drop_label
()¶ Remove label column from the data pack.
参数: inplace -- True to modify inplace, False to return a modified copy. (default: False) Example
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> data_pack.has_label True >>> data_pack.drop_label(inplace=True) >>> data_pack.has_label False
-
frame
¶ View the data pack as a
pandas.DataFrame
.Returned data frame is created by merging the left data frame, the right dataframe and the relation data frame. Use [] to access an item or a slice of items.
返回类型: FrameView
返回: A matchzoo.DataPack.FrameView
instance.Example
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> type(data_pack.frame) <class 'matchzoo.data_pack.data_pack.DataPack.FrameView'> >>> frame_slice = data_pack.frame[0:5] >>> type(frame_slice) <class 'pandas.core.frame.DataFrame'> >>> list(frame_slice.columns) ['id_left', 'text_left', 'id_right', 'text_right', 'label'] >>> full_frame = data_pack.frame() >>> len(full_frame) == len(data_pack) True
-
has_label
¶ return -- True if label column exists, False other wise.
返回类型: bool
-
one_hot_encode_label
(num_classes=2)¶ One-hot encode label column of relation.
参数: - num_classes -- Number of classes.
- inplace -- True to modify inplace, False to return a modified copy. (default: False)
返回:
-
relation
¶ Get
relation()
ofDataPack
.返回类型: DataFrame
-
save
(dirpath)¶ Save the
DataPack
object.A saved
DataPack
is represented as a directory with aDataPack
object (transformed user input as features and context), it will be saved by pickle.参数: dirpath ( Union
[str
,Path
]) -- directory path of the savedDataPack
.
-
shuffle
()¶ Shuffle the data pack by shuffling the relation column.
参数: inplace -- True to modify inplace, False to return a modified copy. (default: False) Example
>>> import matchzoo as mz >>> import numpy.random >>> numpy.random.seed(0) >>> data_pack = mz.datasets.toy.load_data() >>> orig_ids = data_pack.relation['id_left'] >>> shuffled = data_pack.shuffle() >>> (shuffled.relation['id_left'] != orig_ids).any() True
-
unpack
()¶ Unpack the data for training.
The return value can be directly feed to model.fit or model.fit_generator.
返回类型: Tuple
[Dict
[str
, <built-in function array>],Optional
[<built-in function array>]]返回: A tuple of (X, y). y is None if self has no label. Example
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> X, y = data_pack.unpack() >>> type(X) <class 'dict'> >>> sorted(X.keys()) ['id_left', 'id_right', 'text_left', 'text_right'] >>> type(y) <class 'numpy.ndarray'> >>> X, y = data_pack.drop_label().unpack() >>> type(y) <class 'NoneType'>
- relation (
matchzoo.data_pack.pack module¶
Convert list of input into class:DataPack expected format.
-
matchzoo.data_pack.pack.
pack
(df)¶ Pack a
DataPack
using df.The df must have text_left and text_right columns. Optionally, the df can have id_left, id_right to index text_left and text_right respectively. id_left, id_right will be automatically generated if not specified.
参数: df ( DataFrame
) -- Inputpandas.DataFrame
to use.- Examples::
>>> import matchzoo as mz >>> import pandas as pd >>> df = pd.DataFrame(data={'text_left': list('AABC'), ... 'text_right': list('abbc'), ... 'label': [0, 1, 1, 0]}) >>> mz.pack(df).frame() id_left text_left id_right text_right label 0 L-0 A R-0 a 0 1 L-0 A R-1 b 1 2 L-1 B R-1 b 1 3 L-2 C R-2 c 0
返回类型: DataPack
Module contents¶
matchzoo.datasets package¶
Subpackages¶
matchzoo.datasets.embeddings package¶
Embedding data loader.
matchzoo.datasets.snli package¶
SNLI data loader.
-
matchzoo.datasets.snli.load_data.
load_data
(stage='train', task='classification', target_label='entailment')¶ Load SNLI data.
参数: - stage -- One of train, dev, and test. (default: train)
- task -- Could be one of ranking, classification or a
matchzoo.engine.BaseTask
instance. (default: ranking) - target_label -- If ranking, chose one of entailment, contradiction, neutral, and - as the positive label. (default: entailment)
返回: A DataPack if ranking, a tuple of (DataPack, classes) if classification.
matchzoo.datasets.toy package¶
-
matchzoo.datasets.toy.
load_data
(stage='train', task='ranking')¶ Load WikiQA data.
参数: - stage -- One of train, dev, and test.
- task -- Could be one of ranking, classification or a
matchzoo.engine.BaseTask
instance.
返回: A DataPack if ranking, a tuple of (DataPack, classes) if classification.
Example
>>> import matchzoo as mz >>> stages = 'train', 'dev', 'test' >>> tasks = 'ranking', 'classification' >>> for stage in stages: ... for task in tasks: ... _ = mz.datasets.toy.load_data(stage, task)
matchzoo.datasets.wiki_qa package¶
WikiQA data loader.
-
matchzoo.datasets.wiki_qa.load_data.
load_data
(stage='train', task='ranking', filter=False)¶ Load WikiQA data.
参数: - stage -- One of train, dev, and test.
- task -- Could be one of ranking, classification or a
matchzoo.engine.BaseTask
instance. - filter -- Whether remove the questions without correct answers.
返回: A DataPack if ranking, a tuple of (DataPack, classes) if classification.
matchzoo.engine package¶
Submodules¶
matchzoo.engine.base_metric module¶
Metric base class and some related utilities.
-
matchzoo.engine.base_metric.
parse_metric
(metric)¶ Parse input metric in any form into a
BaseMetric
instance.参数: metric ( Union
[str
,Type
[BaseMetric
],BaseMetric
]) -- Input metric in any form.返回: A BaseMetric
instance- Examples::
>>> from matchzoo import engine, metrics
- Use str as keras native metrics:
>>> engine.parse_metric('mse') 'mse'
- Use str as MatchZoo metrics:
>>> mz_metric = engine.parse_metric('map') >>> type(mz_metric) <class 'matchzoo.metrics.mean_average_precision.MeanAveragePrecision'>
- Use
matchzoo.engine.BaseMetric
subclasses as MatchZoo metrics: >>> type(engine.parse_metric(metrics.AveragePrecision)) <class 'matchzoo.metrics.average_precision.AveragePrecision'>
- Use
matchzoo.engine.BaseMetric
instances as MatchZoo metrics: >>> type(engine.parse_metric(metrics.AveragePrecision())) <class 'matchzoo.metrics.average_precision.AveragePrecision'>
-
matchzoo.engine.base_metric.
sort_and_couple
(labels, scores)¶ Zip the labels with scores into a single list.
返回类型: <built-in function array>
matchzoo.engine.base_model module¶
Base Model.
-
class
matchzoo.engine.base_model.
BaseModel
(params=None, backend=None)¶ 基类:
abc.ABC
Abstract base class of all matchzoo models.
-
BACKEND_WEIGHTS_FILENAME
= 'backend_weights.h5'¶
-
PARAMS_FILENAME
= 'params.dill'¶
-
backend
¶ return model backend, a keras model instance.
返回类型: Model
-
build
()¶ Build model, each sub class need to impelemnt this method.
Example
>>> BaseModel() Traceback (most recent call last): ... TypeError: Can't instantiate abstract class BaseModel ... >>> class MyModel(BaseModel): ... def build(self): ... pass >>> assert MyModel()
-
compile
()¶ Compile model for training.
Only keras native metrics are compiled together with backend. MatchZoo metrics are evaluated only through
evaluate()
. Notice that keras count loss as one of the metrics while MatchZoomatchzoo.engine.BaseTask
does not.Examples
>>> from matchzoo import models >>> model = models.Naive() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.params['task'].metrics = ['mse', 'map'] >>> model.params['task'].metrics ['mse', mean_average_precision(0)] >>> model.build() >>> model.compile()
-
evaluate
(x, y, batch_size=128, verbose=1)¶ Evaluate the model.
See
keras.models.Model.evaluate()
for more details.参数: - x (
Union
[ndarray
,List
[ndarray
],Dict
[str
,ndarray
]]) -- input data - y (
ndarray
) -- labels - batch_size (
int
) -- number of samples per gradient update - verbose (
int
) -- verbosity mode, 0 or 1
返回类型: Dict
[str
,float
]返回: scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.backend.metrics_names will give you the display labels for the scalar outputs.
- Examples::
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> preprocessor = mz.preprocessors.NaivePreprocessor() >>> data_pack = preprocessor.fit_transform(data_pack) >>> m = mz.models.DenseBaseline() >>> m.params['task'] = mz.tasks.Ranking() >>> m.params['task'].metrics = [ ... 'acc', 'mse', 'mae', 'ce', ... 'average_precision', 'precision', 'dcg', 'ndcg', ... 'mean_reciprocal_rank', 'mean_average_precision', 'mrr', ... 'map', 'MAP', ... mz.metrics.AveragePrecision(threshold=1), ... mz.metrics.Precision(k=2, threshold=2), ... mz.metrics.DiscountedCumulativeGain(k=2), ... mz.metrics.NormalizedDiscountedCumulativeGain( ... k=3, threshold=-1), ... mz.metrics.MeanReciprocalRank(threshold=2), ... mz.metrics.MeanAveragePrecision(threshold=3) ... ] >>> m.guess_and_fill_missing_params(verbose=0) >>> m.build() >>> m.compile() >>> x, y = data_pack.unpack() >>> evals = m.evaluate(x, y, verbose=0) >>> type(evals) <class 'dict'>
- x (
-
fit
(x, y, batch_size=128, epochs=1, verbose=1, **kwargs)¶ Fit the model.
See
keras.models.Model.fit()
for more details.参数: - x (
Union
[ndarray
,List
[ndarray
]]) -- input data. - y (
ndarray
) -- labels. - batch_size (
int
) -- number of samples per gradient update. - epochs (
int
) -- number of epochs to train the model. - verbose (
int
) -- 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Key word arguments not listed above will be propagated to keras's fit.
返回类型: History
返回: A keras.callbacks.History instance. Its history attribute contains all information collected during training. - x (
-
fit_generator
(generator, epochs=1, verbose=1, **kwargs)¶ Fit the model with matchzoo generator.
See
keras.models.Model.fit_generator()
for more details.参数: - generator (
DataGenerator
) -- A generator, an instance ofengine.DataGenerator
. - epochs (
int
) -- Number of epochs to train the model. - verbose (
int
) -- 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
返回类型: History
返回: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
- generator (
-
classmethod
get_default_params
(with_embedding=False, with_multi_layer_perceptron=False)¶ Model default parameters.
- The common usage is to instantiate
matchzoo.engine.ModelParams
- first, then set the model specific parametrs.
Examples
>>> class MyModel(BaseModel): ... def build(self): ... print(self._params['num_eggs'], 'eggs') ... print('and', self._params['ham_type']) ... ... @classmethod ... def get_default_params(cls): ... params = engine.ParamTable() ... params.add(engine.Param('num_eggs', 512)) ... params.add(engine.Param('ham_type', 'Parma Ham')) ... return params >>> my_model = MyModel() >>> my_model.build() 512 eggs and Parma Ham
Notice that all parameters must be serialisable for the entire model to be serialisable. Therefore, it's strongly recommended to use python native data types to store parameters.
返回类型: ParamTable
返回: model parameters - The common usage is to instantiate
-
classmethod
get_default_preprocessor
()¶ Model default preprocessor.
The preprocessor's transform should produce a correctly shaped data pack that can be used for training. Some extra configuration (e.g. setting input_shapes in
matchzoo.models.DSSMModel
may be required on the user's end.返回类型: BasePreprocessor
返回: Default preprocessor.
-
guess_and_fill_missing_params
(verbose=1)¶ Guess and fill missing parameters in
params
.Use this method to automatically fill-in hyper parameters. This involves some guessing so the parameter it fills could be wrong. For example, the default task is Ranking, and if we do not set it to Classification manaully for data packs prepared for classification, then the shape of the model output and the data will mismatch.
参数: verbose -- Verbosity.
-
load_embedding_matrix
(embedding_matrix, name='embedding')¶ Load an embedding matrix.
Load an embedding matrix into the model's embedding layer. The name of the embedding layer is specified by name. For models with only one embedding layer, set name='embedding' when creating the keras layer, and use the default name when load the matrix. For models with more than one embedding layers, initialize keras layer with different layer names, and set name accordingly to load a matrix to a chosen layer.
参数: - embedding_matrix (
ndarray
) -- Embedding matrix to be loaded. - name (
str
) -- Name of the layer. (default: 'embedding')
- embedding_matrix (
-
params
¶ return -- model parameters.
返回类型: ParamTable
-
predict
(x, batch_size=128)¶ Generate output predictions for the input samples.
See
keras.models.Model.predict()
for more details.参数: - x (
Union
[ndarray
,List
[ndarray
]]) -- input data - batch_size -- number of samples per gradient update
返回类型: ndarray
返回: numpy array(s) of predictions
- x (
-
save
(dirpath)¶ Save the model.
A saved model is represented as a directory with two files. One is a model parameters file saved by pickle, and the other one is a model h5 file saved by keras.
参数: dirpath ( Union
[str
,Path
]) -- directory path of the saved model
-
-
matchzoo.engine.base_model.
load_model
(dirpath)¶ Load a model. The reverse function of
BaseModel.save()
.参数: dirpath ( Union
[str
,Path
]) -- directory path of the saved model返回类型: BaseModel
返回: a BaseModel
instance
matchzoo.engine.base_preprocessor module¶
BasePreprocessor
define input and ouutput for processors.
-
class
matchzoo.engine.base_preprocessor.
BasePreprocessor
¶ 基类:
object
BasePreprocessor
to input handle data.A preprocessor should be used in two steps. First, fit, then, transform. fit collects information into context, which includes everything the preprocessor needs to transform together with other useful information for later use. fit will only change the preprocessor's inner state but not the input data. In contrast, transform returns a modified copy of the input data without changing the preprocessor's inner state.
-
DATA_FILENAME
= 'preprocessor.dill'¶
-
context
¶ Return context.
-
fit
(data_pack, verbose=1)¶ Fit parameters on input data.
This method is an abstract base method, need to be implemented in the child class.
This method is expected to return itself as a callable object.
参数: - data_pack (
DataPack
) --Datapack
object to be fitted. - verbose -- Verbosity.
返回类型: - data_pack (
-
fit_transform
(data_pack, verbose=1)¶ Call fit-transform.
参数: data_pack ( DataPack
) --DataPack
object to be processed.返回类型: DataPack
-
save
(dirpath)¶ Save the
DSSMPreprocessor
object.A saved
DSSMPreprocessor
is represented as a directory with the context object (fitted parameters on training data), it will be saved by pickle.参数: dirpath ( Union
[str
,Path
]) -- directory path of the savedDSSMPreprocessor
.
-
-
matchzoo.engine.base_preprocessor.
load_preprocessor
(dirpath)¶ Load the fitted context. The reverse function of
save()
.参数: dirpath ( Union
[str
,Path
]) -- directory path of the saved model.返回类型: DataPack
返回: a DSSMPreprocessor
instance.
-
matchzoo.engine.base_preprocessor.
validate_context
(func)¶ Validate context in the preprocessor.
matchzoo.engine.base_task module¶
Base task.
-
class
matchzoo.engine.base_task.
BaseTask
(loss=None, metrics=None)¶ 基类:
abc.ABC
Base Task, shouldn't be used directly.
-
classmethod
convert_metrics
(metrics)¶ Convert metrics into properly formed list of metrics.
Examples
>>> BaseTask.convert_metrics(['mse']) ['mse'] >>> BaseTask.convert_metrics('map') [mean_average_precision(0)]
-
classmethod
list_available_losses
()¶ 返回类型: list
返回: a list of available losses.
-
classmethod
list_available_metrics
()¶ 返回类型: list
返回: a list of available metrics.
-
loss
¶ return -- Loss used in the task.
-
metrics
¶ return -- Metrics used in the task.
-
output_dtype
¶ return -- output data type for specific task.
-
output_shape
¶ return -- output shape of a single sample of the task.
返回类型: tuple
-
classmethod
matchzoo.engine.callbacks module¶
Callbacks.
-
class
matchzoo.engine.callbacks.
EvaluateAllMetrics
(model, x, y, once_every=1, batch_size=32, model_save_path=None, verbose=1)¶ 基类:
keras.callbacks.Callback
Callback to evaluate all metrics.
MatchZoo metrics can not be evaluated batch-wise since they require dataset-level information. As a result, MatchZoo metrics are not evaluated automatically when a Model fit. When this callback is used, all metrics, including MatchZoo metrics and Keras metrics, are evluated once every once_every epochs.
参数: - model (
BaseModel
) -- Model to evaluate. - x (
Union
[ndarray
,List
[ndarray
]]) -- - y (
ndarray
) -- - once_every (
int
) -- Evaluation only triggers when epoch % once_every == 0. (default: 1, i.e. evaluate on every epoch's end) - batch_size (
int
) -- Number of samples per evaluation. This only affects the evaluation of Keras metrics, since MatchZoo metrics are always evaluated using the full data. - model_save_path (
Optional
[str
]) -- Directory path to save the model after each evaluate callback, (default: None, i.e., no saving.) - verbose -- Verbosity.
-
on_epoch_end
(epoch, logs=None)¶ Called at the end of en epoch.
参数: - epoch -- integer, index of epoch.
- logs -- dictionary of logs.
返回: dictionary of logs.
- model (
matchzoo.engine.hyper_spaces module¶
Hyper parameter search spaces wrapping hyperopt.
-
class
matchzoo.engine.hyper_spaces.
HyperoptProxy
(hyperopt_func, **kwargs)¶ 基类:
object
Hyperopt proxy class.
See hyperopt's documentation for more details: https://github.com/hyperopt/hyperopt/wiki/FMin
Reason of these wrappers:
A hyper space in hyperopt requires a label to instantiate. This label is used later as a reference to original hyper space that is sampled. In matchzoo, hyper spaces are used inmatchzoo.engine.Param
. Only if a hyper space's label matches its parentmatchzoo.engine.Param
's name, matchzoo can correctly back-refrenced the parameter got sampled. This can be done by asking the user always use the same name for a parameter and its hyper space, but typos can occur. As a result, these wrappers are created to hide hyper spaces' label, and always correctly bind them with its parameter's name.- Examples::
>>> import matchzoo as mz >>> from hyperopt.pyll.stochastic import sample
- Basic Usage:
>>> model = mz.models.DenseBaseline() >>> sample(model.params.hyper_space) {'mlp_num_layers': 1.0, 'mlp_num_units': 274.0}
- Arithmetic Operations:
>>> new_space = 2 ** mz.hyper_spaces.quniform(2, 6) >>> model.params.get('mlp_num_layers').hyper_space = new_space >>> sample(model.params.hyper_space) {'mlp_num_layers': 8.0, 'mlp_num_units': 292.0}
-
convert
(name)¶ Attach name as hyperopt.hp's label.
参数: name ( str
) --返回类型: Apply
返回: a hyperopt ready search space
-
class
matchzoo.engine.hyper_spaces.
choice
(options)¶ 基类:
matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.choice()
proxy.
-
class
matchzoo.engine.hyper_spaces.
quniform
(low, high, q=1)¶ 基类:
matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.quniform()
proxy.
-
class
matchzoo.engine.hyper_spaces.
uniform
(low, high)¶ 基类:
matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.uniform()
proxy.
matchzoo.engine.param module¶
Parameter class.
-
class
matchzoo.engine.param.
Param
(name, value=None, hyper_space=None, validator=None, desc=None)¶ 基类:
object
Parameter class.
Basic usages with a name and value:
>>> param = Param('my_param', 10) >>> param.name 'my_param' >>> param.value 10
Use with a validator to make sure the parameter always keeps a valid value.
>>> param = Param( ... name='my_param', ... value=5, ... validator=lambda x: 0 < x < 20 ... ) >>> param.validator <function <lambda> at 0x...> >>> param.value 5 >>> param.value = 10 >>> param.value 10 >>> param.value = -1 Traceback (most recent call last): ... ValueError: Validator not satifised. The validator's definition is as follows: validator=lambda x: 0 < x < 20
Use with a hyper space. Setting up a hyper space for a parameter makes the parameter tunable in a
matchzoo.engine.Tuner
.>>> from matchzoo.engine.hyper_spaces import quniform >>> param = Param( ... name='positive_num', ... value=1, ... hyper_space=quniform(low=1, high=5) ... ) >>> param.hyper_space <matchzoo.engine.hyper_spaces.quniform object at ...> >>> from hyperopt.pyll.stochastic import sample >>> hyperopt_space = param.hyper_space.convert(param.name) >>> samples = [sample(hyperopt_space) for _ in range(64)] >>> set(samples) == {1, 2, 3, 4, 5} True
The boolean value of a
Param
instance is only True when the value is not None. This is because some default falsy values like zero or an empty list are valid parameter values. In other words, the boolean value means to be "if the parameter value is filled".>>> param = Param('dropout') >>> if param: ... print('OK') >>> param = Param('dropout', 0) >>> if param: ... print('OK') OK
A _pre_assignment_hook is initialized as a data type convertor if the value is set as a number to keep data type consistency of the parameter. This conversion supports python built-in numbers, numpy numbers, and any number that inherits
numbers.Number
.>>> param = Param('float_param', 0.5) >>> param.value = 10 >>> param.value 10.0 >>> type(param.value) <class 'float'>
-
desc
¶ return -- Parameter description.
-
hyper_space
¶ return -- Hyper space of the parameter.
-
name
¶ return -- Name of the parameter.
返回类型: str
-
set_default
(val, verbose=1)¶ Set default value, has no effect if already has a value.
参数: - val -- Default value to set.
- verbose -- Verbosity.
-
validator
¶ return -- Validator of the parameter.
返回类型: Callable
[[Any
],bool
]
-
value
¶ return -- Value of the parameter.
返回类型: Any
-
matchzoo.engine.param_table module¶
Parameters table class.
-
class
matchzoo.engine.param_table.
ParamTable
¶ 基类:
object
Parameter table class.
Example
>>> params = ParamTable() >>> params.add(Param('ham', 'Parma Ham')) >>> params.add(Param('egg', 'Over Easy')) >>> params['ham'] 'Parma Ham' >>> params['egg'] 'Over Easy' >>> print(params) ham Parma Ham egg Over Easy >>> params.add(Param('egg', 'Sunny side Up')) Traceback (most recent call last): ... ValueError: Parameter named egg already exists. To re-assign parameter egg value, use `params["egg"] = value` instead.
-
completed
()¶ 返回类型: bool
返回: True if all params are filled, False otherwise. Example
>>> import matchzoo >>> model = matchzoo.models.Naive() >>> model.params.completed() False >>> model.guess_and_fill_missing_params(verbose=0) >>> model.params.completed() True
-
hyper_space
¶ return -- Hyper space of the table, a valid hyperopt graph.
返回类型: dict
-
keys
()¶ 返回类型: Keysview
[~KT]返回: Parameter table keys.
-
set
(key, param)¶ Set key to parameter param.
-
Module contents¶
matchzoo.layers package¶
Submodules¶
matchzoo.layers.dynamic_pooling_layer module¶
An implementation of Dynamic Pooling Layer.
-
class
matchzoo.layers.dynamic_pooling_layer.
DynamicPoolingLayer
(psize1, psize2, **kwargs)¶ 基类:
keras.engine.base_layer.Layer
Layer that computes dynamic pooling of one tensor.
参数: - psize1 (
int
) -- pooling size of dimension 1 - psize2 (
int
) -- pooling size of dimension 2 - kwargs -- Standard layer keyword arguments.
Examples
>>> import matchzoo as mz >>> layer = mz.layers.DynamicPoolingLayer(3, 2) >>> num_batch, left_len, right_len, num_dim = 5, 3, 2, 10 >>> layer.build([[num_batch, left_len, right_len, num_dim], ... [num_batch, left_len, right_len, 3]])
-
build
(input_shape)¶ Build the layer.
参数: input_shape -- the shapes of the input tensors, for DynamicPoolingLayer we need tow input tensors.
-
call
(inputs, **kwargs)¶ The computation logic of DynamicPoolingLayer.
参数: inputs ( list
) -- two input tensors.返回类型: Any
-
compute_output_shape
(input_shape)¶ Calculate the layer output shape.
参数: input_shape ( list
) -- the shapes of the input tensors, for DynamicPoolingLayer we need tow input tensors.返回类型: tuple
-
get_config
()¶ Get the config dict of DynamicPoolingLayer.
返回类型: dict
- psize1 (
matchzoo.layers.matching_layer module¶
An implementation of Matching Layer.
-
class
matchzoo.layers.matching_layer.
MatchingLayer
(normalize=False, matching_type='dot', **kwargs)¶ 基类:
keras.engine.base_layer.Layer
Layer that computes a matching matrix between samples in two tensors.
参数: - normalize (
bool
) -- Whether to L2-normalize samples along the dot product axis before taking the dot product. If set to True, then the output of the dot product is the cosine proximity between the two samples. - matching_type (
str
) -- the similarity function for matching - kwargs -- Standard layer keyword arguments.
Examples
>>> import matchzoo as mz >>> layer = mz.layers.MatchingLayer(matching_type='dot', ... normalize=True) >>> num_batch, left_len, right_len, num_dim = 5, 3, 2, 10 >>> layer.build([[num_batch, left_len, num_dim], ... [num_batch, right_len, num_dim]])
-
build
(input_shape)¶ Build the layer.
参数: input_shape ( list
) -- the shapes of the input tensors, for MatchingLayer we need tow input tensors.
-
call
(inputs, **kwargs)¶ The computation logic of MatchingLayer.
参数: inputs ( list
) -- two input tensors.返回类型: Any
-
compute_output_shape
(input_shape)¶ Calculate the layer output shape.
参数: input_shape ( list
) -- the shapes of the input tensors, for MatchingLayer we need tow input tensors.返回类型: tuple
-
get_config
()¶ Get the config dict of MatchingLayer.
返回类型: dict
- normalize (
Module contents¶
matchzoo.losses package¶
Submodules¶
matchzoo.losses.rank_cross_entropy_loss module¶
The rank cross entropy loss.
-
class
matchzoo.losses.rank_cross_entropy_loss.
RankCrossEntropyLoss
(num_neg=1)¶ 基类:
object
Rank cross entropy loss.
Examples
>>> from keras import backend as K >>> softmax = lambda x: np.exp(x)/np.sum(np.exp(x), axis=0) >>> x_pred = K.variable(np.array([[1.0], [1.2], [0.8]])) >>> x_true = K.variable(np.array([[1], [0], [0]])) >>> expect = -np.log(softmax(np.array([[1.0], [1.2], [0.8]]))) >>> loss = K.eval(RankCrossEntropyLoss(num_neg=2)(x_true, x_pred)) >>> np.isclose(loss, expect[0]).all() True
matchzoo.losses.rank_hinge_loss module¶
The rank hinge loss.
-
class
matchzoo.losses.rank_hinge_loss.
RankHingeLoss
(num_neg=1, margin=1.0)¶ 基类:
object
Rank hinge loss.
Examples
>>> from keras import backend as K >>> x_pred = K.variable(np.array([[1.0], [1.2], [0.8], [1.4]])) >>> x_true = K.variable(np.array([[1], [0], [1], [0]])) >>> expect = ((1.0 + 1.2 - 1.0) + (1.0 + 1.4 - 0.8)) / 2 >>> expect 1.4 >>> loss = K.eval(RankHingeLoss(num_neg=1, margin=1.0)(x_true, x_pred)) >>> np.isclose(loss, expect) True
Module contents¶
matchzoo.metrics package¶
Submodules¶
matchzoo.metrics.average_precision module¶
Average precision metric for ranking.
-
class
matchzoo.metrics.average_precision.
AveragePrecision
(threshold=0)¶ 基类:
matchzoo.engine.base_metric.BaseMetric
Average precision metric.
-
ALIAS
= ['average_precision', 'ap']¶
-
matchzoo.metrics.discounted_cumulative_gain module¶
Discounted cumulative gain metric for ranking.
-
class
matchzoo.metrics.discounted_cumulative_gain.
DiscountedCumulativeGain
(k=1, threshold=0)¶ 基类:
matchzoo.engine.base_metric.BaseMetric
Disconunted cumulative gain metric.
-
ALIAS
= ['discounted_cumulative_gain', 'dcg']¶
-
matchzoo.metrics.mean_average_precision module¶
Mean average precision metric for ranking.
-
class
matchzoo.metrics.mean_average_precision.
MeanAveragePrecision
(threshold=0)¶ 基类:
matchzoo.engine.base_metric.BaseMetric
Mean average precision metric.
-
ALIAS
= ['mean_average_precision', 'map']¶
-
matchzoo.metrics.mean_reciprocal_rank module¶
Mean reciprocal ranking metric.
-
class
matchzoo.metrics.mean_reciprocal_rank.
MeanReciprocalRank
(threshold=0)¶ 基类:
matchzoo.engine.base_metric.BaseMetric
Mean reciprocal rank metric.
-
ALIAS
= ['mean_reciprocal_rank', 'mrr']¶
-
matchzoo.metrics.normalized_discounted_cumulative_gain module¶
Normalized discounted cumulative gain metric for ranking.
-
class
matchzoo.metrics.normalized_discounted_cumulative_gain.
NormalizedDiscountedCumulativeGain
(k=1, threshold=0)¶ 基类:
matchzoo.engine.base_metric.BaseMetric
Normalized discounted cumulative gain metric.
-
ALIAS
= ['normalized_discounted_cumulative_gain', 'ndcg']¶
-
matchzoo.metrics.precision module¶
Precision for ranking.
-
class
matchzoo.metrics.precision.
Precision
(k=1, threshold=0)¶ 基类:
matchzoo.engine.base_metric.BaseMetric
Precision metric.
-
ALIAS
= 'precision'¶
-
Module contents¶
matchzoo.models package¶
Submodules¶
matchzoo.models.anmm module¶
An implementation of aNMM Model.
-
class
matchzoo.models.anmm.
ANMM
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
ANMM Model.
Examples
>>> model = ANMM() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
aNMM model based on bin weighting and query term attentions
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
matchzoo.models.arci module¶
An implementation of ArcI Model.
-
class
matchzoo.models.arci.
ArcI
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
ArcI Model.
Examples
>>> model = ArcI() >>> model.params['num_blocks'] = 1 >>> model.params['left_filters'] = [32] >>> model.params['right_filters'] = [32] >>> model.params['left_kernel_sizes'] = [3] >>> model.params['right_kernel_sizes'] = [3] >>> model.params['left_pool_sizes'] = [2] >>> model.params['right_pool_sizes'] = [4] >>> model.params['conv_activation_func'] = 'relu' >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 64 >>> model.params['mlp_num_fan_out'] = 32 >>> model.params['mlp_activation_func'] = 'relu' >>> model.params['dropout_rate'] = 0.5 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
ArcI use Siamese arthitecture.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
matchzoo.models.arcii module¶
An implementation of ArcII Model.
-
class
matchzoo.models.arcii.
ArcII
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
ArcII Model.
Examples
>>> model = ArcII() >>> model.params['embedding_output_dim'] = 300 >>> model.params['num_blocks'] = 2 >>> model.params['kernel_1d_count'] = 32 >>> model.params['kernel_1d_size'] = 3 >>> model.params['kernel_2d_count'] = [16, 32] >>> model.params['kernel_2d_size'] = [[3, 3], [3, 3]] >>> model.params['pool_2d_size'] = [[2, 2], [2, 2]] >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
ArcII has the desirable property of letting two sentences meet before their own high-level representations mature.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
matchzoo.models.cdssm module¶
An implementation of CDSSM (CLSM) model.
-
class
matchzoo.models.cdssm.
CDSSM
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
CDSSM Model implementation.
Learning Semantic Representations Using Convolutional Neural Networks for Web Search. (2014a) A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. (2014b)
Examples
>>> model = CDSSM() >>> model.params['optimizer'] = 'adam' >>> model.params['filters'] = 32 >>> model.params['kernel_size'] = 3 >>> model.params['conv_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
CDSSM use Siamese architecture.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
classmethod
get_default_preprocessor
()¶ 返回: Default preprocessor.
-
guess_and_fill_missing_params
(verbose=1)¶ Guess and fill missing parameters in
params
.Use this method to automatically fill-in hyper parameters. This involves some guessing so the parameter it fills could be wrong. For example, the default task is Ranking, and if we do not set it to Classification manually for data packs prepared for classification, then the shape of the model output and the data will mismatch.
参数: verbose -- Verbosity.
-
matchzoo.models.conv_knrm module¶
ConvKNRM model.
-
class
matchzoo.models.conv_knrm.
ConvKNRM
(params=None, backend=None)¶ -
ConvKNRM model.
Examples
>>> model = ConvKNRM() >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 300 >>> model.params['embedding_trainable'] = True >>> model.params['filters'] = 128 >>> model.params['conv_activation_func'] = 'tanh' >>> model.params['max_ngram'] = 3 >>> model.params['use_crossmatch'] = True >>> model.params['kernel_num'] = 11 >>> model.params['sigma'] = 0.1 >>> model.params['exact_sigma'] = 0.001 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model.
-
get_default_params
()¶ Get default parameters.
-
matchzoo.models.dense_baseline module¶
A simple densely connected baseline model.
-
class
matchzoo.models.dense_baseline.
DenseBaseline
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
A simple densely connected baseline model.
Examples
>>> model = DenseBaseline() >>> model.params['mlp_num_layers'] = 2 >>> model.params['mlp_num_units'] = 300 >>> model.params['mlp_num_fan_out'] = 128 >>> model.params['mlp_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.compile()
-
build
()¶ Model structure.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
matchzoo.models.drmm module¶
An implementation of DRMM Model.
-
class
matchzoo.models.drmm.
DRMM
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
DRMM Model.
Examples
>>> model = DRMM() >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 5 >>> model.params['mlp_num_fan_out'] = 1 >>> model.params['mlp_activation_func'] = 'tanh' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.compile()
-
classmethod
attention_layer
(attention_input, attention_mask=None)¶ Performs attention on the input.
参数: - attention_input (
Any
) -- The input tensor for attention layer. - attention_mask (
Optional
[Any
]) -- A tensor to mask the invalid values.
返回: The masked output tensor.
- attention_input (
-
build
()¶ Build model structure.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
classmethod
matchzoo.models.drmmtks module¶
An implementation of DRMMTKS Model.
-
class
matchzoo.models.drmmtks.
DRMMTKS
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
DRMMTKS Model.
Examples
>>> model = DRMMTKS() >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 100 >>> model.params['top_k'] = 20 >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 5 >>> model.params['mlp_num_fan_out'] = 1 >>> model.params['mlp_activation_func'] = 'tanh' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
classmethod
attention_layer
(attention_input, attention_mask=None)¶ Performs attention on the input.
参数: - attention_input (
Any
) -- The input tensor for attention layer. - attention_mask (
Optional
[Any
]) -- A tensor to mask the invalid values.
返回: The masked output tensor.
- attention_input (
-
build
()¶ Build model structure.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
classmethod
matchzoo.models.dssm module¶
An implementation of DSSM, Deep Structured Semantic Model.
-
class
matchzoo.models.dssm.
DSSM
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
Deep structured semantic model.
Examples
>>> model = DSSM() >>> model.params['mlp_num_layers'] = 3 >>> model.params['mlp_num_units'] = 300 >>> model.params['mlp_num_fan_out'] = 128 >>> model.params['mlp_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
DSSM use Siamese arthitecture.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
classmethod
get_default_preprocessor
()¶ 返回: Default preprocessor.
-
matchzoo.models.duet module¶
DUET Model.
-
class
matchzoo.models.duet.
DUET
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
DUET Model.
Examples
>>> model = DUET() >>> model.params['embedding_input_dim'] = 1000 >>> model.params['embedding_output_dim'] = 300 >>> model.params['lm_filters'] = 32 >>> model.params['lm_hidden_sizes'] = [64, 32] >>> model.params['dropout_rate'] = 0.5 >>> model.params['dm_filters'] = 32 >>> model.params['dm_kernel_size'] = 3 >>> model.params['dm_d_mpool'] = 4 >>> model.params['dm_hidden_sizes'] = [64, 32] >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model.
-
classmethod
get_default_params
()¶ Get default parameters.
-
matchzoo.models.knrm module¶
KNRM model.
-
class
matchzoo.models.knrm.
KNRM
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
KNRM model.
Examples
>>> model = KNRM() >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 10 >>> model.params['embedding_trainable'] = True >>> model.params['kernel_num'] = 11 >>> model.params['sigma'] = 0.1 >>> model.params['exact_sigma'] = 0.001 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model.
-
classmethod
get_default_params
()¶ Get default parameters.
-
matchzoo.models.match_pyramid module¶
An implementation of MatchPyramid Model.
-
class
matchzoo.models.match_pyramid.
MatchPyramid
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
MatchPyramid Model.
Examples
>>> model = MatchPyramid() >>> model.params['embedding_output_dim'] = 300 >>> model.params['num_blocks'] = 2 >>> model.params['kernel_count'] = [16, 32] >>> model.params['kernel_size'] = [[3, 3], [3, 3]] >>> model.params['dpool_size'] = [3, 10] >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
MatchPyramid text matching as image recognition.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
matchzoo.models.mvlstm module¶
An implementation of MVLSTM Model.
-
class
matchzoo.models.mvlstm.
MVLSTM
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
MVLSTM Model.
Examples
>>> model = MVLSTM() >>> model.params['lstm_units'] = 32 >>> model.params['top_k'] = 50 >>> model.params['mlp_num_layers'] = 2 >>> model.params['mlp_num_units'] = 20 >>> model.params['mlp_num_fan_out'] = 10 >>> model.params['mlp_activation_func'] = 'relu' >>> model.params['dropout_rate'] = 0.5 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
-
classmethod
get_default_params
()¶ 返回类型: ParamTable
返回: model default parameters.
-
matchzoo.models.naive module¶
Naive model with a simplest structure for testing purposes.
-
class
matchzoo.models.naive.
Naive
(params=None, backend=None)¶ 基类:
matchzoo.engine.base_model.BaseModel
Naive model with a simplest structure for testing purposes.
Bare minimum functioning model. The best choice to get things rolling. The worst choice to fit and evaluate performance.
-
build
()¶ Build.
-
matchzoo.models.parameter_readme_generator module¶
matchzoo.preprocessors package¶
Submodules¶
matchzoo.preprocessors.basic_preprocessor module¶
Basic Preprocessor.
-
class
matchzoo.preprocessors.basic_preprocessor.
BasicPreprocessor
(fixed_length_left=30, fixed_length_right=30, filter_mode='df', filter_low_freq=2, filter_high_freq=inf, remove_stop_words=False)¶ 基类:
matchzoo.engine.base_preprocessor.BasePreprocessor
Baisc preprocessor helper.
参数: - fixed_length_left (
int
) -- Integer, maximize length ofleft
in the data_pack. - fixed_length_right (
int
) -- Integer, maximize length ofright
in the data_pack. - filter_mode (
str
) -- String, mode used byFrequenceFilterUnit
, Can be 'df', 'cf', and 'idf'. - filter_low_freq (
float
) -- Float, lower bound value used byFrequenceFilterUnit
. - filter_high_freq (
float
) -- Float, upper bound value used byFrequenceFilterUnit
. - remove_stop_words (
bool
) -- Bool, useStopRemovalUnit
unit or not.
Example
>>> import matchzoo as mz >>> train_data = mz.datasets.toy.load_data('train') >>> test_data = mz.datasets.toy.load_data('test') >>> preprocessor = mz.preprocessors.BasicPreprocessor( ... fixed_length_left=10, ... fixed_length_right=20, ... filter_mode='df', ... filter_low_freq=2, ... filter_high_freq=1000, ... remove_stop_words=True ... ) >>> preprocessor = preprocessor.fit(train_data) >>> preprocessor.context['input_shapes'] [(10,), (20,)] >>> preprocessor.context['vocab_size'] 284 >>> processed_train_data = preprocessor.transform(train_data) >>> type(processed_train_data) <class 'matchzoo.data_pack.data_pack.DataPack'> >>> test_data_transformed = preprocessor.transform(test_data) >>> type(test_data_transformed) <class 'matchzoo.data_pack.data_pack.DataPack'>
- fixed_length_left (
matchzoo.preprocessors.cdssm_preprocessor module¶
CDSSM Preprocessor.
-
class
matchzoo.preprocessors.cdssm_preprocessor.
CDSSMPreprocessor
(fixed_length_left=10, fixed_length_right=40, with_word_hashing=True)¶ 基类:
matchzoo.engine.base_preprocessor.BasePreprocessor
CDSSM Model preprocessor.
matchzoo.preprocessors.dssm_preprocessor module¶
DSSM Preprocessor.
-
class
matchzoo.preprocessors.dssm_preprocessor.
DSSMPreprocessor
(with_word_hashing=True)¶ 基类:
matchzoo.engine.base_preprocessor.BasePreprocessor
DSSM Model preprocessor.
matchzoo.preprocessors.naive_preprocessor module¶
Naive Preprocessor.
-
class
matchzoo.preprocessors.naive_preprocessor.
NaivePreprocessor
¶ 基类:
matchzoo.engine.base_preprocessor.BasePreprocessor
Naive preprocessor.
Example
>>> import matchzoo as mz >>> train_data = mz.datasets.toy.load_data() >>> test_data = mz.datasets.toy.load_data(stage='test') >>> preprocessor = mz.preprocessors.NaivePreprocessor() >>> train_data_processed = preprocessor.fit_transform(train_data) >>> type(train_data_processed) <class 'matchzoo.data_pack.data_pack.DataPack'> >>> test_data_transformed = preprocessor.transform(test_data) >>> type(test_data_transformed) <class 'matchzoo.data_pack.data_pack.DataPack'>
matchzoo.processor_units package¶
Submodules¶
matchzoo.processor_units.chain_transform module¶
Wrapper function organizes a number of transform functions.
-
matchzoo.processor_units.chain_transform.
chain_transform
(units)¶ Compose unit transformations into a single function.
matchzoo.processor_units.processor_units module¶
Matchzoo toolkit for text pre-processing.
-
class
matchzoo.processor_units.processor_units.
DigitRemovalUnit
¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit to remove digits.
-
transform
(tokens)¶ Remove digits from list of tokens.
参数: tokens ( list
) -- list of tokens to be filtered.Return tokens: tokens of tokens without digits. 返回类型: list
-
-
class
matchzoo.processor_units.processor_units.
FixedLengthUnit
(text_length, pad_value=0, pad_mode='pre', truncate_mode='pre')¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
FixedLengthUnit Class.
Process unit to get the fixed length text.
Examples
>>> fixedlen = FixedLengthUnit(3) >>> fixedlen.transform(range(1, 6)) == [3, 4, 5] True >>> fixedlen = FixedLengthUnit(3) >>> fixedlen.transform(range(1, 3)) == [0, 1, 2] True
-
transform
(tokens)¶ Transform list of tokenized tokens into the fixed length text.
参数: tokens ( list
) -- list of tokenized tokens.Return tokens: list of tokenized tokens in fixed length. 返回类型: list
-
-
class
matchzoo.processor_units.processor_units.
FrequencyFilterUnit
(low=0, high=inf, mode='df')¶ 基类:
matchzoo.processor_units.processor_units.StatefulProcessorUnit
Frequency filter unit.
参数: - low -- Lower bound, inclusive.
- high -- Upper bound, exclusive.
- mode -- One of tf (term frequency), df (document frequency), and idf (inverse document frequency).
- Examples::
>>> import matchzoo as mz
- To filter based on term frequency (tf):
>>> tf_filter = mz.processor_units.FrequencyFilterUnit( ... low=2, mode='tf') >>> tf_filter.fit([['A', 'B', 'B'], ['C', 'C', 'C']]) >>> tf_filter.transform(['A', 'B', 'C']) ['B', 'C']
- To filter based on document frequency (df):
>>> tf_filter = mz.processor_units.FrequencyFilterUnit( ... low=2, mode='df') >>> tf_filter.fit([['A', 'B'], ['B', 'C']]) >>> tf_filter.transform(['A', 'B', 'C']) ['B']
- To filter based on inverse document frequency (idf):
>>> idf_filter = mz.processor_units.FrequencyFilterUnit( ... low=1.2, mode='idf') >>> idf_filter.fit([['A', 'B'], ['B', 'C', 'D']]) >>> idf_filter.transform(['A', 'B', 'C']) ['A', 'C']
-
fit
(list_of_tokens)¶ Fit list_of_tokens by calculating mode states.
-
transform
(tokens)¶ Transform a list of tokens by filtering out unwanted words.
返回类型: list
-
class
matchzoo.processor_units.processor_units.
LemmatizationUnit
¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit for token lemmatization.
-
transform
(tokens)¶ Lemmatization a sequence of tokens.
参数: tokens ( list
) -- list of tokens to be lemmatized.Return tokens: list of lemmatizd tokens. 返回类型: list
-
-
class
matchzoo.processor_units.processor_units.
LowercaseUnit
¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit for text lower case.
-
transform
(tokens)¶ Convert list of tokens to lower case.
参数: tokens ( list
) -- list of tokens.Return tokens: lower-cased list of tokens. 返回类型: list
-
-
class
matchzoo.processor_units.processor_units.
MatchingHistogramUnit
(bin_size=30, embedding_matrix=None, normalize=True, mode='LCH')¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
MatchingHistogramUnit Class.
参数: - bin_size (
int
) -- The number of bins of the matching histogram. - embedding_matrix -- The word embedding matrix applied to calculate the matching histogram.
- normalize -- Boolean, normalize the embedding or not.
- mode (
str
) -- The type of the historgram, it should be one of 'CH', 'NG', or 'LCH'.
Examples
>>> embedding_matrix = np.array([[1.0, -1.0], [1.0, 2.0], [1.0, 3.0]]) >>> text_left = [0, 1] >>> text_right = [1, 2] >>> histogram = MatchingHistogramUnit(3, embedding_matrix, True, 'CH') >>> histogram.transform([text_left, text_right]) [[3.0, 1.0, 1.0], [1.0, 2.0, 2.0]]
-
transform
(text_pair)¶ Transform the input text.
返回类型: list
- bin_size (
-
class
matchzoo.processor_units.processor_units.
NgramLetterUnit
(ngram=3, reduce_dim=True)¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit for n-letter generation.
Triletter is used in
DSSMModel
. This processor is expected to execute before Vocab has been created.Examples
>>> triletter = NgramLetterUnit() >>> rv = triletter.transform(['hello', 'word']) >>> len(rv) 9 >>> rv ['#he', 'hel', 'ell', 'llo', 'lo#', '#wo', 'wor', 'ord', 'rd#'] >>> triletter = NgramLetterUnit(reduce_dim=False) >>> rv = triletter.transform(['hello', 'word']) >>> len(rv) 2 >>> rv [['#he', 'hel', 'ell', 'llo', 'lo#'], ['#wo', 'wor', 'ord', 'rd#']]
-
transform
(tokens)¶ Transform token into tri-letter.
For example, word should be represented as #wo, wor, ord and rd#.
参数: tokens ( list
) -- list of tokens to be transformed.Return n_letters: generated n_letters. 返回类型: list
-
-
class
matchzoo.processor_units.processor_units.
ProcessorUnit
¶ 基类:
object
Process unit do not persive state (i.e. do not need fit).
-
transform
(input)¶ Abstract base method, need to be implemented in subclass.
-
-
class
matchzoo.processor_units.processor_units.
PuncRemovalUnit
¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit for remove punctuations.
-
transform
(tokens)¶ Remove punctuations from list of tokens.
参数: tokens ( list
) -- list of toekns.Return rv: tokens without punctuation. 返回类型: list
-
-
class
matchzoo.processor_units.processor_units.
StatefulProcessorUnit
¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit do persive state (i.e. need fit).
-
fit
(input)¶ Abstract base method, need to be implemented in subclass.
-
state
¶ Get current state.
-
-
class
matchzoo.processor_units.processor_units.
StemmingUnit
(stemmer='porter')¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit for token stemming.
-
transform
(tokens)¶ Reducing inflected words to their word stem, base or root form.
参数: - tokens (
list
) -- list of string to be stemmed. - stemmer -- stemmer to use, porter or lancaster.
引发: ValueError -- stemmer type should be porter or lancaster.
Return tokens: stemmed token.
返回类型: list
- tokens (
-
-
class
matchzoo.processor_units.processor_units.
StopRemovalUnit
(lang='english')¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit to remove stop words.
Example
>>> unit = StopRemovalUnit() >>> unit.transform(['a', 'the', 'test']) ['test'] >>> type(unit.stopwords) <class 'list'>
-
stopwords
¶ Get stopwords based on language.
Params lang: language code. 返回类型: list
返回: list of stop words.
-
transform
(tokens)¶ Remove stopwords from list of tokenized tokens.
参数: - tokens (
list
) -- list of tokenized tokens. - lang -- language code for stopwords.
Return tokens: list of tokenized tokens without stopwords.
返回类型: list
- tokens (
-
-
class
matchzoo.processor_units.processor_units.
TokenizeUnit
¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Process unit for text tokenization.
-
transform
(input)¶ Process input data from raw terms to list of tokens.
参数: input ( str
) -- raw textual input.Return tokens: tokenized tokens as a list. 返回类型: list
-
-
class
matchzoo.processor_units.processor_units.
VocabularyUnit
¶ 基类:
matchzoo.processor_units.processor_units.StatefulProcessorUnit
Vocabulary class.
Examples
>>> vocab = VocabularyUnit() >>> vocab.fit(['A', 'B', 'C', 'D', 'E']) >>> term_index = vocab.state['term_index'] >>> term_index {'E': 1, 'C': 2, 'D': 3, 'A': 4, 'B': 5} >>> index_term = vocab.state['index_term'] >>> index_term {1: 'C', 2: 'A', 3: 'E', 4: 'B', 5: 'D'}
>>> term_index['out-of-vocabulary-term'] 0 >>> index_term[0] '' >>> index_term[42] Traceback (most recent call last): ... KeyError: 42
>>> a_index = term_index['A'] >>> c_index = term_index['C'] >>> vocab.transform(['C', 'A', 'C']) == [c_index, a_index, c_index] True >>> vocab.transform(['C', 'A', 'OOV']) == [c_index, a_index, 0] True
>>> indices = vocab.transform('ABCDDZZZ') >>> ''.join(vocab.state['index_term'][i] for i in indices) 'ABCDD'
-
class
IndexTerm
¶ 基类:
dict
Map index to term.
-
class
TermIndex
¶ 基类:
dict
Map term to index.
-
transform
(tokens)¶ Transform a list of tokens to corresponding indices.
返回类型: list
-
class
-
class
matchzoo.processor_units.processor_units.
WordHashingUnit
(term_index)¶ 基类:
matchzoo.processor_units.processor_units.ProcessorUnit
Word-hashing layer for DSSM-based models.
The input of
WordHashingUnit
should be a list of word sub-letter list extracted from one document. The output of is the word-hashing representation of this document.NgramLetterUnit
andVocabularyUnit
are two essential prerequisite ofWordHashingUnit
.Examples
>>> letters = [['#te', 'tes','est', 'st#'], ['oov']] >>> word_hashing = WordHashingUnit( ... term_index={'': 0,'st#': 1, '#te': 2, 'est': 3, 'tes': 4}) >>> hashing = word_hashing.transform(letters) >>> hashing[0] array([0., 1., 1., 1., 1., 0.]) >>> hashing[1] array([1., 0., 0., 0., 0., 0.]) >>> hashing.shape (2, 6)
-
transform
(terms)¶ Transform list of
letters
into word hashing layer.参数: terms ( list
) -- list of tri_letters generated byNgramLetterUnit
.返回类型: ndarray
返回: Word hashing representation of tri-letters.
-
-
matchzoo.processor_units.processor_units.
list_available
()¶ List all available units.
Module contents¶
matchzoo.tasks package¶
Submodules¶
matchzoo.tasks.classification module¶
Classification task.
-
class
matchzoo.tasks.classification.
Classification
(num_classes=2)¶ 基类:
matchzoo.engine.base_task.BaseTask
Classification task.
Examples
>>> classification_task = Classification(num_classes=2) >>> classification_task.metrics = ['precision'] >>> classification_task.num_classes 2 >>> classification_task.output_shape (2,) >>> classification_task.output_dtype <class 'int'> >>> print(classification_task) Classification Task with 2 classes
-
classmethod
list_available_losses
()¶ 返回类型: list
返回: a list of available losses.
-
classmethod
list_available_metrics
()¶ 返回类型: list
返回: a list of available metrics.
-
num_classes
¶ return -- number of classes to classify.
返回类型: int
-
output_dtype
¶ return -- target data type, expect int as output.
-
output_shape
¶ return -- output shape of a single sample of the task.
返回类型: tuple
-
classmethod
matchzoo.tasks.ranking module¶
Ranking task.
-
class
matchzoo.tasks.ranking.
Ranking
(loss=None, metrics=None)¶ 基类:
matchzoo.engine.base_task.BaseTask
Ranking Task.
Examples
>>> ranking_task = Ranking() >>> ranking_task.metrics = ['map', 'ndcg'] >>> ranking_task.output_shape (1,) >>> ranking_task.output_dtype <class 'float'> >>> print(ranking_task) Ranking Task
-
classmethod
list_available_losses
()¶ 返回类型: list
返回: a list of available losses.
-
classmethod
list_available_metrics
()¶ 返回类型: list
返回: a list of available metrics.
-
output_dtype
¶ return -- target data type, expect float as output.
-
output_shape
¶ return -- output shape of a single sample of the task.
返回类型: tuple
-
classmethod
matchzoo.tasks.utils module¶
Task utilities.
Module contents¶
Submodules¶
matchzoo.embedding module¶
Matchzoo toolkit for token embedding.
-
class
matchzoo.embedding.
Embedding
(data)¶ 基类:
object
Embedding class.
- Examples::
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> pp = mz.preprocessors.NaivePreprocessor() >>> vocab_unit = mz.build_vocab_unit(pp.fit_transform(data_pack), ... verbose=0) >>> term_index = vocab_unit.state['term_index'] >>> embed_path = mz.datasets.embeddings.EMBED_RANK
- To load from a file:
>>> embedding = mz.embedding.load_from_file(embed_path) >>> matrix = embedding.build_matrix(term_index) >>> matrix.shape[0] == len(term_index) + 1 True
- To build your own:
>>> data = pd.DataFrame(data=[[0, 1], [2, 3]], index=['A', 'B']) >>> embedding = mz.embedding.Embedding(data) >>> matrix = embedding.build_matrix({'A': 2, 'B': 1}) >>> matrix.shape == (3, 2) True
-
build_matrix
(term_index, initializer=<function Embedding.<lambda>>)¶ Build a matrix using term_index.
参数: - term_index (
dict
) -- A dict or TermIndex to build with. - initializer -- A callable that returns a default value for missing terms in data. (default: a random uniform distribution in range) (-0.2, 0.2)).
返回类型: ndarray
返回: A matrix.
- term_index (
-
input_dim
¶ return Embedding input dimension.
返回类型: int
-
output_dim
¶ return Embedding output dimension.
返回类型: int
-
matchzoo.embedding.
load_from_file
(file_path, mode='word2vec')¶ Load embedding from file_path.
参数: - file_path (
str
) -- Path to file. - mode (
str
) -- Embedding file format mode, one of 'word2vec' or 'glove'. (default: 'word2vec')
返回类型: 返回: An
matchzoo.embedding.Embedding
instance.- file_path (
matchzoo.logger module¶
MatchZoo Logging module.
matchzoo.version module¶
Matchzoo version file.
Module contents¶
MatchZoo Model Reference¶
NaiveModel¶
Model Documentation¶
Naive model with a simplest structure for testing purposes.
Bare minimum functioning model. The best choice to get things rolling. The worst choice to fit and evaluate performance.
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.naive_model.NaiveModel'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | choice in ['adam', 'adgrad', 'rmsprop'] |
DSSM¶
Model Documentation¶
Deep structured semantic model.
- Examples:
>>> model = DSSM() >>> model.params['mlp_num_layers'] = 3 >>> model.params['mlp_num_units'] = 300 >>> model.params['mlp_num_fan_out'] = 128 >>> model.params['mlp_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.dssm.DSSM'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | adam | choice in ['adam', 'adgrad', 'rmsprop'] | |
5 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn't be changed. | True | |
6 | mlp_num_units | Number of units in first mlp_num_layers layers. | ||
7 | mlp_num_layers | Number of layers of the multiple layer percetron. | ||
8 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer percetron and the output. | ||
9 | mlp_activation_func | Activation function used in the multiple layer perceptron. |
CDSSM¶
Model Documentation¶
CDSSM Model implementation.
Learning Semantic Representations Using Convolutional Neural Networks for Web Search. (2014a) A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. (2014b)
- Examples:
>>> model = CDSSM() >>> model.params['optimizer'] = 'adam' >>> model.params['filters'] = 32 >>> model.params['kernel_size'] = 3 >>> model.params['conv_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.cdssm.CDSSM'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | choice in ['adam', 'adgrad', 'rmsprop'] | ||
5 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn't be changed. | True | |
6 | mlp_num_units | Number of units in first mlp_num_layers layers. | ||
7 | mlp_num_layers | Number of layers of the multiple layer percetron. | ||
8 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer percetron and the output. | ||
9 | mlp_activation_func | Activation function used in the multiple layer perceptron. | ||
10 | filters | Number of filters in the 1D convolution layer. | 32 | |
11 | kernel_size | Number of kernel size in the 1D convolution layer. | 3 | |
12 | strides | Strides in the 1D convolution layer. | 1 | |
13 | padding | The padding mode in the convolution layer. It should be one of same, valid, and causal. | same | |
14 | conv_activation_func | Activation function in the convolution layer. | relu | |
15 | w_initializer | glorot_normal | ||
16 | b_initializer | zeros | ||
17 | dropout_rate | The dropout rate. | 0.3 |
DenseBaselineModel¶
Model Documentation¶
A simple densely connected baseline model.
- Examples:
>>> model = DenseBaselineModel() >>> model.params['mlp_num_layers'] = 2 >>> model.params['mlp_num_units'] = 300 >>> model.params['mlp_num_fan_out'] = 128 >>> model.params['mlp_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.compile()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.dense_baseline_model.DenseBaselineModel'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | choice in ['adam', 'adgrad', 'rmsprop'] | ||
5 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn't be changed. | True | |
6 | mlp_num_units | Number of units in first mlp_num_layers layers. | 256 | quantitative uniform distribution in [16, 512), with a step size of 1 |
7 | mlp_num_layers | Number of layers of the multiple layer percetron. | quantitative uniform distribution in [1, 5), with a step size of 1 | |
8 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer percetron and the output. | ||
9 | mlp_activation_func | Activation function used in the multiple layer perceptron. |
ArcI¶
Model Documentation¶
ArcI Model.
- Examples:
>>> model = ArcI() >>> model.params['num_blocks'] = 1 >>> model.params['left_filters'] = [32] >>> model.params['right_filters'] = [32] >>> model.params['left_kernel_sizes'] = [3] >>> model.params['right_kernel_sizes'] = [3] >>> model.params['left_pool_sizes'] = [2] >>> model.params['right_pool_sizes'] = [4] >>> model.params['conv_activation_func'] = 'relu' >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 64 >>> model.params['mlp_num_fan_out'] = 32 >>> model.params['mlp_activation_func'] = 'relu' >>> model.params['dropout_rate'] = 0.5 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.arci.ArcI'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | adam | choice in ['adam', 'adgrad', 'rmsprop'] | |
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn't be changed. | True | |
10 | mlp_num_units | Number of units in first mlp_num_layers layers. | ||
11 | mlp_num_layers | Number of layers of the multiple layer percetron. | ||
12 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer percetron and the output. | ||
13 | mlp_activation_func | Activation function used in the multiple layer perceptron. | ||
14 | num_blocks | Number of convolution blocks. | 1 | |
15 | left_filters | The filter size of each convolution blocks for the left input. | [32] | |
16 | left_kernel_sizes | The kernel size of each convolution blocks for the left input. | [3] | |
17 | right_filters | The filter size of each convolution blocks for the right input. | [32] | |
18 | right_kernel_sizes | The kernel size of each convolution blocks for the right input. | [3] | |
19 | conv_activation_func | The activation function in the convolution layer. | relu | |
20 | left_pool_sizes | The pooling size of each convolution blocks for the left input. | [2] | |
21 | right_pool_sizes | The pooling size of each convolution blocks for the right input. | [2] | |
22 | padding | The padding mode in the convolution layer. It should be oneof same, valid, and causal. | same | choice in ['same', 'valid', 'causal'] |
23 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.01 |
ArcII¶
Model Documentation¶
ArcII Model.
- Examples:
>>> model = ArcII() >>> model.params['embedding_output_dim'] = 300 >>> model.params['num_blocks'] = 2 >>> model.params['kernel_1d_count'] = 32 >>> model.params['kernel_1d_size'] = 3 >>> model.params['kernel_2d_count'] = [16, 32] >>> model.params['kernel_2d_size'] = [[3, 3], [3, 3]] >>> model.params['pool_2d_size'] = [[2, 2], [2, 2]] >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.arcii.ArcII'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | adam | choice in ['adam', 'rmsprop', 'adagrad'] | |
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | num_blocks | Number of 2D convolution blocks. | 1 | |
10 | kernel_1d_count | Kernel count of 1D convolution layer. | 32 | |
11 | kernel_1d_size | Kernel size of 1D convolution layer. | 3 | |
12 | kernel_2d_count | Kernel count of 2D convolution layer ineach block | [32] | |
13 | kernel_2d_size | Kernel size of 2D convolution layer in each block. | [[3, 3]] | |
14 | activation | Activation function. | relu | |
15 | pool_2d_size | Size of pooling layer in each block. | [[2, 2]] | |
16 | padding | The padding mode in the convolution layer. It should be oneof same, valid, and causal. | same | choice in ['same', 'valid', 'causal'] |
17 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.01 |
MatchPyramid¶
Model Documentation¶
MatchPyramid Model.
- Examples:
>>> model = MatchPyramid() >>> model.params['embedding_output_dim'] = 300 >>> model.params['num_blocks'] = 2 >>> model.params['kernel_count'] = [16, 32] >>> model.params['kernel_size'] = [[3, 3], [3, 3]] >>> model.params['dpool_size'] = [3, 10] >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.match_pyramid.MatchPyramid'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | adam | choice in ['adam', 'rmsprop', 'adagrad'] | |
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | num_blocks | Number of convolution blocks. | 1 | |
10 | kernel_count | The kernel count of the 2D convolution of each block. | [32] | |
11 | kernel_size | The kernel size of the 2D convolution of each block. | [[3, 3]] | |
12 | activation | The activation function. | relu | |
13 | dpool_size | The max-pooling size of each block. | [3, 10] | |
14 | padding | The padding mode in the convolution layer. It should be oneof same, valid, and causal. | same | choice in ['same', 'valid', 'causal'] |
15 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.01 |
KNRM¶
Model Documentation¶
KNRM model.
- Examples:
>>> model = KNRM() >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 10 >>> model.params['embedding_trainable'] = True >>> model.params['kernel_num'] = 11 >>> model.params['sigma'] = 0.1 >>> model.params['exact_sigma'] = 0.001 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.knrm.KNRM'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | choice in ['adam', 'adgrad', 'rmsprop'] | ||
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | kernel_num | The number of RBF kernels. | 11 | quantitative uniform distribution in [5, 20), with a step size of 1 |
10 | sigma | The sigma defines the kernel width. | 0.1 | quantitative uniform distribution in [0.01, 0.2), with a step size of 0.01 |
11 | exact_sigma | The exact_sigma denotes the sigma for exact match. | 0.001 |
DUET¶
Model Documentation¶
DUET Model.
- Examples:
>>> model = DUET() >>> model.params['embedding_input_dim'] = 1000 >>> model.params['embedding_output_dim'] = 300 >>> model.params['lm_filters'] = 32 >>> model.params['lm_hidden_sizes'] = [64, 32] >>> model.params['dropout_rate'] = 0.5 >>> model.params['dm_filters'] = 32 >>> model.params['dm_kernel_size'] = 3 >>> model.params['dm_d_mpool'] = 4 >>> model.params['dm_hidden_sizes'] = [64, 32] >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.duet.DUET'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | choice in ['adam', 'adgrad', 'rmsprop'] | ||
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | lm_filters | Filter size of 1D convolution layer in the local model. | 32 | |
10 | lm_hidden_sizes | A list of hidden size of the MLP layer in the local model. | [32] | |
11 | dm_filters | Filter size of 1D convolution layer in the distributed model. | 32 | |
12 | dm_kernel_size | Kernel size of 1D convolution layer in the distributed model. | 3 | |
13 | dm_q_hidden_size | Hidden size of the MLP layer for the left text in the distributed model. | 32 | |
14 | dm_d_mpool | Max pooling size for the right text in the distributed model. | 3 | |
15 | dm_hidden_sizes | A list of hidden size of the MLP layer in the distributed model. | [32] | |
16 | padding | The padding mode in the convolution layer. It should be one of same, valid, and causal. | same | |
17 | activation_func | Activation function in the convolution layer. | relu | |
18 | dropout_rate | The dropout rate. | 0.5 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.02 |
DRMMTKS¶
Model Documentation¶
DRMMTKS Model.
- Examples:
>>> model = DRMMTKS() >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 100 >>> model.params['top_k'] = 20 >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 5 >>> model.params['mlp_num_fan_out'] = 1 >>> model.params['mlp_activation_func'] = 'tanh' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.drmmtks.DRMMTKS'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | [(5,), (300,)] | |
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | adam | choice in ['adam', 'adgrad', 'rmsprop'] | |
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn't be changed. | True | |
10 | mlp_num_units | Number of units in first mlp_num_layers layers. | ||
11 | mlp_num_layers | Number of layers of the multiple layer percetron. | ||
12 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer percetron and the output. | ||
13 | mlp_activation_func | Activation function used in the multiple layer perceptron. | ||
14 | mask_value | The value to be masked from inputs. | -1 | |
15 | top_k | Size of top-k pooling layer. | 10 | quantitative uniform distribution in [2, 100), with a step size of 1 |
DRMM¶
Model Documentation¶
DRMM Model.
- Examples:
>>> model = DRMM() >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 5 >>> model.params['mlp_num_fan_out'] = 1 >>> model.params['mlp_activation_func'] = 'tanh' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.compile()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.drmm.DRMM'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | [(5,), (5, 30)] | |
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | adam | choice in ['adam', 'adgrad', 'rmsprop'] | |
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn't be changed. | True | |
10 | mlp_num_units | Number of units in first mlp_num_layers layers. | ||
11 | mlp_num_layers | Number of layers of the multiple layer percetron. | ||
12 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer percetron and the output. | ||
13 | mlp_activation_func | Activation function used in the multiple layer perceptron. | ||
14 | mask_value | The value to be masked from inputs. | -1 |
ANMM¶
Model Documentation¶
ANMM Model.
- Examples:
>>> model = ANMM() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.anmm.ANMM'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | choice in ['adam', 'adgrad', 'rmsprop'] | ||
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | dropout_rate | The dropout rate. | 0.1 | |
10 | num_layers | Number of hidden layers in the MLP layer. | 2 | |
11 | hidden_sizes | Number of hidden size for each hidden layer | [30, 30] |
MVLSTM¶
Model Documentation¶
MVLSTM Model.
- Examples:
>>> model = MVLSTM() >>> model.params['lstm_units'] = 32 >>> model.params['top_k'] = 50 >>> model.params['mlp_num_layers'] = 2 >>> model.params['mlp_num_units'] = 20 >>> model.params['mlp_num_fan_out'] = 10 >>> model.params['mlp_activation_func'] = 'relu' >>> model.params['dropout_rate'] = 0.5 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.models.mvlstm.MVLSTM'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | adam | choice in ['adam', 'adgrad', 'rmsprop'] | |
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn't be changed. | True | |
10 | mlp_num_units | Number of units in first mlp_num_layers layers. | ||
11 | mlp_num_layers | Number of layers of the multiple layer percetron. | ||
12 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer percetron and the output. | ||
13 | mlp_activation_func | Activation function used in the multiple layer perceptron. | ||
14 | lstm_units | Integer, the hidden size in the bi-directional LSTM layer. | 32 | |
15 | dropout_rate | Float, the dropout rate. | 0.0 | |
16 | top_k | Integer, the size of top-k pooling layer. | 10 | quantitative uniform distribution in [2, 100), with a step size of 1 |
MatchLSTM¶
Model Documentation¶
Match LSTM model.
- Examples:
>>> model = MatchLSTM() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 100 >>> model.params['embedding_trainable'] = True >>> model.params['fc_num_units'] = 200 >>> model.params['lstm_num_units'] = 256 >>> model.params['dropout_rate'] = 0.5 >>> model.build()
Model Hyper Parameters¶
Name | Description | Default Value | Default Hyper-Space | |
---|---|---|---|---|
0 | name | Not related to the model's behavior. | ||
1 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class 'matchzoo.contrib.models.match_lstm.MatchLSTM'> | |
2 | input_shapes | Dependent on the model and data. Should be set manually. | ||
3 | task | Decides model output shape, loss, and metrics. | ||
4 | optimizer | choice in ['adam', 'adgrad', 'rmsprop'] | ||
5 | with_embedding | A flag used help auto module. Shouldn't be changed. | True | |
6 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
7 | embedding_output_dim | Should be set manually. | ||
8 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | ||
9 | lstm_num_units | The hidden size in the LSTM layer. | 256 | quantitative uniform distribution in [128, 384), with a step size of 32 |
10 | fc_num_units | The hidden size in the full connection layer. | 200 | quantitative uniform distribution in [100, 300), with a step size of 20 |
11 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.9), with a step size of 0.01 |