matchzoo.engine package¶

Submodules¶

matchzoo.engine.base_metric module¶

Metric base class and some related utilities.

class matchzoo.engine.base_metric.BaseMetric¶

Bases: abc.ABC

Metric base class.

ALIAS = 'base_metric'¶

matchzoo.engine.base_metric.parse_metric(metric)¶

Parse input metric in any form into a BaseMetric instance.

Parameters:	metric (`Union`[`str`, `Type`[`BaseMetric`], `BaseMetric`]) -- Input metric in any form.
Returns:	A `BaseMetric` instance

Examples::

>>> from matchzoo import engine, metrics

Use str as keras native metrics:

>>> engine.parse_metric('mse')
'mse'

Use str as MatchZoo metrics:

>>> mz_metric = engine.parse_metric('map')
>>> type(mz_metric)
<class 'matchzoo.metrics.mean_average_precision.MeanAveragePrecision'>

Use matchzoo.engine.BaseMetric subclasses as MatchZoo metrics:

>>> type(engine.parse_metric(metrics.AveragePrecision))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>

Use matchzoo.engine.BaseMetric instances as MatchZoo metrics:

>>> type(engine.parse_metric(metrics.AveragePrecision()))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
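The dispatch shown in these examples can be sketched without MatchZoo installed. This is a minimal illustration, not the library's implementation: `ToyMetric`, `ToyMAP`, `KERAS_NATIVE` and `toy_parse_metric` are hypothetical names, and looking metrics up through `ALIAS` is an assumption based on the `ALIAS` attribute documented above.

```python
import abc


class ToyMetric(abc.ABC):
    """Hypothetical stand-in for a metric base class."""
    ALIAS = 'toy_metric'


class ToyMAP(ToyMetric):
    """Hypothetical metric registered under two aliases."""
    ALIAS = ['mean_average_precision', 'map']


# Assumed set of names that are passed through as backend-native metrics.
KERAS_NATIVE = {'mse', 'mae', 'acc'}


def toy_parse_metric(metric):
    """Accept a str, a ToyMetric subclass, or a ToyMetric instance."""
    if isinstance(metric, ToyMetric):
        return metric                      # already an instance
    if isinstance(metric, type) and issubclass(metric, ToyMetric):
        return metric()                    # instantiate the class
    if isinstance(metric, str):
        if metric in KERAS_NATIVE:
            return metric                  # native names stay strings
        for cls in ToyMetric.__subclasses__():
            aliases = cls.ALIAS if isinstance(cls.ALIAS, list) else [cls.ALIAS]
            if metric.lower() in aliases:
                return cls()               # alias matched, build the metric
    raise ValueError(f'Cannot parse metric: {metric!r}')
```

Keeping native names as plain strings mirrors the `'mse'` example above, where the parsed result is still `'mse'`.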

matchzoo.engine.base_metric.sort_and_couple(labels, scores)¶

Zip the labels with scores into a single list.

Return type:	<built-in function array>
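The one-line description does not state the resulting order. The sketch below assumes that the pairs are sorted by score, highest first, and returns a plain list instead of an array; `toy_sort_and_couple` is a hypothetical name, not the library function.

```python
def toy_sort_and_couple(labels, scores):
    """Pair each label with its score, highest score first (assumed order)."""
    couples = list(zip(labels, scores))
    return sorted(couples, key=lambda pair: pair[1], reverse=True)


ranked = toy_sort_and_couple([0, 1, 0], [0.2, 0.9, 0.5])
print(ranked)  # [(1, 0.9), (0, 0.5), (0, 0.2)]
```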

matchzoo.engine.base_model module¶

Base Model.

class matchzoo.engine.base_model.BaseModel(params=None, backend=None)¶

Bases: abc.ABC

Abstract base class of all matchzoo models.

BACKEND_WEIGHTS_FILENAME = 'backend_weights.h5'¶

PARAMS_FILENAME = 'params.dill'¶

backend¶

Return the model backend, a Keras model instance.

Return type:	`Model`

build()¶

Build the model. Each subclass needs to implement this method.

Example

>>> BaseModel()  
Traceback (most recent call last):
...
TypeError: Can't instantiate abstract class BaseModel ...
>>> class MyModel(BaseModel):
...     def build(self):
...         pass
>>> assert MyModel()

compile()¶

Compile model for training.

Only Keras native metrics are compiled together with the backend. MatchZoo metrics are evaluated only through evaluate(). Note that Keras counts the loss as one of the metrics, while MatchZoo's matchzoo.engine.BaseTask does not.

Examples

>>> from matchzoo import models
>>> model = models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params['task'].metrics = ['mse', 'map']
>>> model.params['task'].metrics
['mse', mean_average_precision(0)]
>>> model.build()
>>> model.compile()
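The split between the two kinds of metrics can be pictured as a simple partition. The sketch below assumes, as the `parse_metric` examples suggest, that Keras native metrics remain plain strings while MatchZoo metrics are objects; `FakeMetric` and `split_metrics` are hypothetical names.

```python
class FakeMetric:
    """Hypothetical placeholder for a MatchZoo metric instance."""

    def __repr__(self):
        return 'fake_metric'


def split_metrics(metrics):
    """Separate backend-native metric names from metric objects."""
    native = [m for m in metrics if isinstance(m, str)]
    dataset_level = [m for m in metrics if not isinstance(m, str)]
    return native, dataset_level


native, dataset_level = split_metrics(['mse', FakeMetric(), 'mae'])
print(native)         # ['mse', 'mae']
print(dataset_level)  # [fake_metric]
```

Only the first list would be handed to the backend at compile time; the second would wait for an explicit evaluation pass.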

evaluate(x, y, batch_size=128, verbose=1)¶

Evaluate the model.

See keras.models.Model.evaluate() for more details.

Parameters:

x (`Union`[`ndarray`, `List`[`ndarray`], `Dict`[`str`, `ndarray`]]) -- Input data.
y (`ndarray`) -- Labels.
batch_size (`int`) -- Number of samples per evaluation batch.
verbose (`int`) -- Verbosity mode, 0 or 1.

Return type:	`Dict`[`str`, `float`]
Returns:	A dictionary of evaluation results with one entry per metric. For the Keras native metrics, the attribute model.backend.metrics_names gives the display labels of the scalar outputs.

Examples::

>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> data_pack = preprocessor.fit_transform(data_pack)
>>> m = mz.models.DenseBaseline()
>>> m.params['task'] = mz.tasks.Ranking()
>>> m.params['task'].metrics = [
...     'acc', 'mse', 'mae', 'ce',
...     'average_precision', 'precision', 'dcg', 'ndcg',
...     'mean_reciprocal_rank', 'mean_average_precision', 'mrr',
...     'map', 'MAP',
...     mz.metrics.AveragePrecision(threshold=1),
...     mz.metrics.Precision(k=2, threshold=2),
...     mz.metrics.DiscountedCumulativeGain(k=2),
...     mz.metrics.NormalizedDiscountedCumulativeGain(
...         k=3, threshold=-1),
...     mz.metrics.MeanReciprocalRank(threshold=2),
...     mz.metrics.MeanAveragePrecision(threshold=3)
... ]
>>> m.guess_and_fill_missing_params(verbose=0)
>>> m.build()
>>> m.compile()
>>> x, y = data_pack.unpack()
>>> evals = m.evaluate(x, y, verbose=0)
>>> type(evals)
<class 'dict'>

fit(x, y, batch_size=128, epochs=1, verbose=1, **kwargs)¶

Fit the model.

See keras.models.Model.fit() for more details.

Parameters:

x (`Union`[`ndarray`, `List`[`ndarray`]]) -- Input data.
y (`ndarray`) -- Labels.
batch_size (`int`) -- Number of samples per gradient update.
epochs (`int`) -- Number of epochs to train the model.
verbose (`int`) -- Verbosity mode: 0 = silent, 1 = verbose, 2 = one log line per epoch.

Keyword arguments not listed above are propagated to Keras's fit.

Return type:	`History`
Returns:	A keras.callbacks.History instance. Its history attribute contains all information collected during training.

fit_generator(generator, epochs=1, verbose=1, **kwargs)¶

Fit the model with matchzoo generator.

See keras.models.Model.fit_generator() for more details.

Parameters:

generator (`DataGenerator`) -- A generator, an instance of `engine.DataGenerator`.
epochs (`int`) -- Number of epochs to train the model.
verbose (`int`) -- Verbosity mode: 0 = silent, 1 = verbose, 2 = one log line per epoch.

Return type:	`History`
Returns:	A keras.callbacks.History instance. Its history attribute contains all information collected during training.

classmethod get_default_params(with_embedding=False, with_multi_layer_perceptron=False)¶

Model default parameters.

The common usage is to instantiate matchzoo.engine.ModelParams first, then set the model-specific parameters.

Examples

>>> class MyModel(BaseModel):
...     def build(self):
...         print(self._params['num_eggs'], 'eggs')
...         print('and', self._params['ham_type'])
...
...     @classmethod
...     def get_default_params(cls):
...         params = engine.ParamTable()
...         params.add(engine.Param('num_eggs', 512))
...         params.add(engine.Param('ham_type', 'Parma Ham'))
...         return params
>>> my_model = MyModel()
>>> my_model.build()
512 eggs
and Parma Ham

Notice that all parameters must be serialisable for the entire model to be serialisable. Therefore, it's strongly recommended to use python native data types to store parameters.
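The following sketch illustrates why native data types are the safe choice. It uses the standard library `pickle` module as a stand-in for whichever serializer the model relies on, and the parameter names are made up for the illustration.

```python
import pickle
import threading

# Native Python types survive a serialisation round trip unchanged.
native_params = {'num_eggs': 512, 'ham_type': 'Parma Ham'}
restored = pickle.loads(pickle.dumps(native_params))

# A parameter holding a system resource, such as a lock, cannot be pickled.
risky_params = {'num_eggs': 512, 'lock': threading.Lock()}
try:
    pickle.dumps(risky_params)
    serialisable = True
except TypeError:
    serialisable = False
```

A single non-serialisable value is enough to make the whole parameter set, and with it the model, impossible to save.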

Return type:	`ParamTable`
Returns:	model parameters

classmethod get_default_preprocessor()¶

Model default preprocessor.

The preprocessor's transform should produce a correctly shaped data pack that can be used for training. Some extra configuration (e.g. setting input_shapes in matchzoo.models.DSSMModel) may be required on the user's end.

Return type:	`BasePreprocessor`
Returns:	Default preprocessor.

guess_and_fill_missing_params(verbose=1)¶

Guess and fill missing parameters in params.

Use this method to fill in hyperparameters automatically. This involves some guessing, so the parameters it fills could be wrong. For example, the default task is Ranking; if it is not set to Classification manually for data packs prepared for classification, the shape of the model output and the data will mismatch.

Parameters:	verbose -- Verbosity.

load_embedding_matrix(embedding_matrix, name='embedding')¶

Load an embedding matrix.

Load an embedding matrix into the model's embedding layer. The name of the embedding layer is specified by name. For models with only one embedding layer, set name='embedding' when creating the Keras layer, and use the default name when loading the matrix. For models with more than one embedding layer, initialize the Keras layers with different layer names, and set name accordingly to load a matrix into a chosen layer.

Parameters:

embedding_matrix (`ndarray`) -- Embedding matrix to be loaded.
name (`str`) -- Name of the layer. (default: 'embedding')

params¶

return -- model parameters.

Return type:	`ParamTable`

predict(x, batch_size=128)¶

Generate output predictions for the input samples.

See keras.models.Model.predict() for more details.

Parameters:

x (`Union`[`ndarray`, `List`[`ndarray`]]) -- Input data.
batch_size -- Number of samples per prediction batch.

Return type:	`ndarray`
Returns:	numpy array(s) of predictions

save(dirpath)¶

Save the model.

A saved model is represented as a directory with two files. One is a model parameters file saved by pickle, and the other one is a model h5 file saved by keras.

Parameters:	dirpath (`Union`[`str`, `Path`]) -- directory path of the saved model

matchzoo.engine.base_model.load_model(dirpath)¶

Load a model. The reverse function of BaseModel.save().

Parameters:	dirpath (`Union`[`str`, `Path`]) -- directory path of the saved model
Return type:	`BaseModel`
Returns:	a `BaseModel` instance

matchzoo.engine.base_preprocessor module¶

BasePreprocessor defines the input and output for preprocessors.

class matchzoo.engine.base_preprocessor.BasePreprocessor¶

Bases: object

BasePreprocessor to handle input data.

A preprocessor should be used in two steps. First, fit, then, transform. fit collects information into context, which includes everything the preprocessor needs to transform together with other useful information for later use. fit will only change the preprocessor's inner state but not the input data. In contrast, transform returns a modified copy of the input data without changing the preprocessor's inner state.
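The two-step contract can be illustrated with a toy class that works on plain lists of strings rather than on a `DataPack`. `ToyPreprocessor` is a hypothetical name and the vocabulary logic is only an example of what a context might hold.

```python
class ToyPreprocessor:
    """Hypothetical preprocessor that indexes lower-cased words."""

    def __init__(self):
        self.context = {}

    def fit(self, texts):
        # Collect a vocabulary into the context; the input is left untouched.
        vocab = sorted({w for t in texts for w in t.lower().split()})
        self.context['vocab'] = {w: i for i, w in enumerate(vocab)}
        return self

    def transform(self, texts):
        # Build a new list and only read from the context.
        vocab = self.context['vocab']
        return [[vocab[w] for w in t.lower().split() if w in vocab]
                for t in texts]


raw = ['Parma Ham', 'ham and eggs']
pre = ToyPreprocessor().fit(raw)
encoded = pre.transform(raw)
print(encoded)  # [[3, 2], [2, 0, 1]]
```

After both calls `raw` is unchanged, and calling `transform` again yields the same result because the context was not modified.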

DATA_FILENAME = 'preprocessor.dill'¶

context¶: Return context.

fit(data_pack, verbose=1)¶

Fit parameters on input data.

This is an abstract method that needs to be implemented in the child class.

This method is expected to return itself as a callable object.

Parameters:

data_pack (`DataPack`) -- `DataPack` object to be fitted.
verbose -- Verbosity.

Return type:	`BasePreprocessor`

fit_transform(data_pack, verbose=1)¶

Call fit-transform.

Parameters:	data_pack (`DataPack`) -- `DataPack` object to be processed.
Return type:	`DataPack`

save(dirpath)¶

Save the preprocessor object.

A saved preprocessor is represented as a directory containing the context object (the parameters fitted on the training data), which is saved by pickle.

Parameters:	dirpath (`Union`[`str`, `Path`]) -- directory path of the saved preprocessor.

transform(data_pack, verbose=1)¶

Transform input data to expected manner.

This is an abstract method that needs to be implemented in the child class.

Parameters:

data_pack (`DataPack`) -- `DataPack` object to be transformed.
verbose -- Verbosity.

Return type:	`DataPack`

matchzoo.engine.base_preprocessor.load_preprocessor(dirpath)¶

Load the fitted context. The reverse function of save().

Parameters:	dirpath (`Union`[`str`, `Path`]) -- directory path of the saved preprocessor.
Return type:	`DataPack`
Returns:	a fitted preprocessor instance.
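A save and load pair of this shape can be sketched with the standard library alone. The helper names and the file name below are hypothetical, and refusing to overwrite an existing directory is an assumption made for the sketch.

```python
import pickle
import tempfile
from pathlib import Path

DATA_FILENAME = 'toy_preprocessor.pkl'  # hypothetical file name


def toy_save(obj, dirpath):
    """Write obj into a new directory as a single pickle file."""
    dirpath = Path(dirpath)
    dirpath.mkdir(parents=True, exist_ok=False)  # assumed: never overwrite
    with open(dirpath / DATA_FILENAME, 'wb') as f:
        pickle.dump(obj, f)


def toy_load(dirpath):
    """Read back what toy_save wrote."""
    with open(Path(dirpath) / DATA_FILENAME, 'rb') as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'saved'
    toy_save({'vocab': {'ham': 0}}, target)
    loaded = toy_load(target)
```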

matchzoo.engine.base_preprocessor.validate_context(func)¶: Validate context in the preprocessor.
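The description does not say what the validation checks. A plausible reading, given the fit-then-transform contract described for `BasePreprocessor`, is a decorator that rejects calls made before the context has been filled. The sketch below follows that assumption; `toy_validate_context` and `ToyPreprocessor` are hypothetical names.

```python
import functools


def toy_validate_context(func):
    """Reject calls made before the context has been filled."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not self.context:
            raise ValueError('Call fit before transform.')
        return func(self, *args, **kwargs)
    return wrapper


class ToyPreprocessor:
    def __init__(self):
        self.context = {}

    def fit(self, texts):
        self.context['size'] = len(texts)
        return self

    @toy_validate_context
    def transform(self, texts):
        return [t.lower() for t in texts]
```

With this decorator, `ToyPreprocessor().transform(['A'])` raises `ValueError`, while the same call after `fit` succeeds.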

matchzoo.engine.base_task module¶

Base task.

class matchzoo.engine.base_task.BaseTask(loss=None, metrics=None)¶

Bases: abc.ABC

Base Task, shouldn't be used directly.

classmethod convert_metrics(metrics)¶

Convert metrics into properly formed list of metrics.

Examples

>>> BaseTask.convert_metrics(['mse'])
['mse']
>>> BaseTask.convert_metrics('map')
[mean_average_precision(0)]

classmethod list_available_losses()¶

Return type:	`list`
Returns:	a list of available losses.

classmethod list_available_metrics()¶

Return type:	`list`
Returns:	a list of available metrics.

loss¶: return -- Loss used in the task.

metrics¶: return -- Metrics used in the task.

output_dtype¶: return -- output data type for specific task.

output_shape¶

return -- output shape of a single sample of the task.

Return type:	`tuple`

matchzoo.engine.base_task.list_available_tasks(base=<class 'matchzoo.engine.base_task.BaseTask'>)¶

Return type:	`List`[`Type`[`BaseTask`]]
Returns:	a list of available task types.
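One common way to discover task types from a base class is to walk its subclasses. The sketch below shows that technique with hypothetical classes; whether the library function also includes indirect subclasses is an assumption.

```python
import abc


class ToyTask(abc.ABC):
    """Hypothetical task base class."""


class ToyRanking(ToyTask):
    pass


class ToyClassification(ToyTask):
    pass


class ToyPairwiseRanking(ToyRanking):
    pass


def toy_list_available(base=ToyTask):
    """Collect every direct and indirect subclass of base."""
    found = []
    for sub in base.__subclasses__():
        found.append(sub)
        found.extend(toy_list_available(sub))
    return found
```

Passing a narrower base, for example `toy_list_available(ToyRanking)`, restricts the result to that branch of the hierarchy.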

matchzoo.engine.callbacks module¶

Callbacks.

class matchzoo.engine.callbacks.EvaluateAllMetrics(model, x, y, once_every=1, batch_size=32, model_save_path=None, verbose=1)¶

Bases: keras.callbacks.Callback

Callback to evaluate all metrics.

MatchZoo metrics cannot be evaluated batch-wise since they require dataset-level information. As a result, MatchZoo metrics are not evaluated automatically when a model is fit. When this callback is used, all metrics, including MatchZoo metrics and Keras metrics, are evaluated once every once_every epochs.

Parameters:

model (BaseModel) -- Model to evaluate.
x (Union[ndarray, List[ndarray]]) --
y (ndarray) --
once_every (int) -- Evaluation only triggers when epoch % once_every == 0. (default: 1, i.e. evaluate on every epoch's end)
batch_size (int) -- Number of samples per evaluation. This only affects the evaluation of Keras metrics, since MatchZoo metrics are always evaluated using the full data.
model_save_path (Optional[str]) -- Directory path to save the model after each evaluate callback, (default: None, i.e., no saving.)
verbose -- Verbosity.

on_epoch_end(epoch, logs=None)¶

Called at the end of an epoch.

Parameters:

epoch -- integer, index of epoch.
logs -- dictionary of logs.

Returns:	dictionary of logs.
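The trigger rule given for once_every, `epoch % once_every == 0`, can be checked in isolation. The helper below is a hypothetical illustration that assumes a zero-based epoch index.

```python
def should_evaluate(epoch, once_every=1):
    """Apply the documented trigger rule to a zero-based epoch index."""
    return epoch % once_every == 0


triggered = [e for e in range(6) if should_evaluate(e, once_every=2)]
print(triggered)  # [0, 2, 4]
```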

matchzoo.engine.hyper_spaces module¶

Hyper parameter search spaces wrapping hyperopt.

class matchzoo.engine.hyper_spaces.HyperoptProxy(hyperopt_func, **kwargs)¶

Bases: object

Hyperopt proxy class.

See hyperopt's documentation for more details: https://github.com/hyperopt/hyperopt/wiki/FMin

Reason for these wrappers:

A hyper space in hyperopt requires a label to instantiate. This label is used later as a reference to the original hyper space that was sampled. In matchzoo, hyper spaces are used in matchzoo.engine.Param. Only if a hyper space's label matches the name of its parent matchzoo.engine.Param can matchzoo correctly back-reference the parameter that was sampled. This could be done by asking the user to always use the same name for a parameter and its hyper space, but typos can occur. As a result, these wrappers hide the hyper spaces' labels and always bind them correctly to their parameter's name.

Examples::

>>> import matchzoo as mz
>>> from hyperopt.pyll.stochastic import sample

Basic Usage:

>>> model = mz.models.DenseBaseline()
>>> sample(model.params.hyper_space)  
{'mlp_num_layers': 1.0, 'mlp_num_units': 274.0}

Arithmetic Operations:

>>> new_space = 2 ** mz.hyper_spaces.quniform(2, 6)
>>> model.params.get('mlp_num_layers').hyper_space = new_space
>>> sample(model.params.hyper_space)  
{'mlp_num_layers': 8.0, 'mlp_num_units': 292.0}

convert(name)¶

Attach name as hyperopt.hp's label.

Parameters:	name (`str`) --
Return type:	`Apply`
Returns:	a hyperopt ready search space
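The idea of deferring the label until the owning parameter is known can be sketched without hyperopt. In the sketch below `fake_hp_uniform` is a hypothetical stand-in for a labelled search-space constructor, and `ToyProxy` and `ToyParam` are illustrative names, not the library classes.

```python
def fake_hp_uniform(label, low, high):
    """Hypothetical stand-in for a labelled search-space constructor."""
    return {'label': label, 'kind': 'uniform', 'low': low, 'high': high}


class ToyProxy:
    """Store the constructor and its arguments, but not the label."""

    def __init__(self, space_func, **kwargs):
        self._func = space_func
        self._kwargs = kwargs

    def convert(self, name):
        # The label is supplied only here, by the owner of the parameter.
        return self._func(name, **self._kwargs)


class ToyParam:
    """Hypothetical parameter that binds its own name to its hyper space."""

    def __init__(self, name, hyper_space=None):
        self.name = name
        self.hyper_space = hyper_space

    def to_space(self):
        return self.hyper_space.convert(self.name)


param = ToyParam('dropout_rate', ToyProxy(fake_hp_uniform, low=0.0, high=0.5))
space = param.to_space()
print(space['label'])  # dropout_rate
```

Because the user never types the label, it cannot drift away from the parameter name.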

class matchzoo.engine.hyper_spaces.choice(options)¶

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.choice() proxy.

class matchzoo.engine.hyper_spaces.quniform(low, high, q=1)¶

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.quniform() proxy.

class matchzoo.engine.hyper_spaces.uniform(low, high)¶

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.uniform() proxy.

matchzoo.engine.param module¶

Parameter class.

class matchzoo.engine.param.Param(name, value=None, hyper_space=None, validator=None, desc=None)¶

Bases: object

Parameter class.

Basic usages with a name and value:

>>> param = Param('my_param', 10)
>>> param.name
'my_param'
>>> param.value
10

Use with a validator to make sure the parameter always keeps a valid value.

>>> param = Param(
...     name='my_param',
...     value=5,
...     validator=lambda x: 0 < x < 20
... )
>>> param.validator  
<function <lambda> at 0x...>
>>> param.value
5
>>> param.value = 10
>>> param.value
10
>>> param.value = -1
Traceback (most recent call last):
    ...
ValueError: Validator not satifised.
The validator's definition is as follows:
validator=lambda x: 0 < x < 20

Use with a hyper space. Setting up a hyper space for a parameter makes the parameter tunable in a matchzoo.engine.Tuner.

>>> from matchzoo.engine.hyper_spaces import quniform
>>> param = Param(
...     name='positive_num',
...     value=1,
...     hyper_space=quniform(low=1, high=5)
... )
>>> param.hyper_space  
<matchzoo.engine.hyper_spaces.quniform object at ...>
>>> from hyperopt.pyll.stochastic import sample
>>> hyperopt_space = param.hyper_space.convert(param.name)
>>> samples = [sample(hyperopt_space) for _ in range(64)]
>>> set(samples) == {1, 2, 3, 4, 5}
True

The boolean value of a Param instance is True only when the value is not None. This is because some falsy values, such as zero or an empty list, are valid parameter values. In other words, the boolean value answers the question "has the parameter value been filled?".

>>> param = Param('dropout')
>>> if param:
...     print('OK')
>>> param = Param('dropout', 0)
>>> if param:
...     print('OK')
OK
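This behaviour corresponds to a `__bool__` method that tests for None instead of relying on the truthiness of the value. A minimal sketch, using a hypothetical `ToyParam` class rather than the library's `Param`:

```python
class ToyParam:
    """Hypothetical parameter whose truthiness means 'has a value'."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value

    def __bool__(self):
        return self.value is not None


filled = [bool(ToyParam('dropout')),
          bool(ToyParam('dropout', 0)),
          bool(ToyParam('layers', []))]
print(filled)  # [False, True, True]
```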

A _pre_assignment_hook is initialized as a data type converter if the value is set to a number, to keep the data type of the parameter consistent. This conversion supports Python built-in numbers, numpy numbers, and any number that inherits from numbers.Number.

>>> param = Param('float_param', 0.5)
>>> param.value = 10
>>> param.value
10.0
>>> type(param.value)
<class 'float'>
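One way to obtain this behaviour is to remember the type of the initial numeric value and apply it on every later assignment. The sketch below assumes that mechanism; `ToyParam` and its `_hook` attribute are hypothetical and not the library's implementation.

```python
import numbers


class ToyParam:
    """Hypothetical parameter that keeps the type of its first numeric value."""

    def __init__(self, name, value=None):
        self.name = name
        self._hook = None
        if isinstance(value, numbers.Number):
            self._hook = type(value)  # e.g. float for 0.5
        self._value = value

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # Convert incoming values with the remembered type, if any.
        if self._hook is not None:
            new_value = self._hook(new_value)
        self._value = new_value


param = ToyParam('float_param', 0.5)
param.value = 10
print(param.value, type(param.value).__name__)  # 10.0 float
```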

desc¶: return -- Parameter description.

hyper_space¶: return -- Hyper space of the parameter.

name¶

return -- Name of the parameter.

Return type:	`str`

set_default(val, verbose=1)¶

Set the default value. Has no effect if the parameter already has a value.

Parameters:

val -- Default value to set.
verbose -- Verbosity.
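The "no effect if already set" rule can be sketched as follows. The sketch interprets "has a value" as "is not None", in line with the boolean semantics described for Param, and uses a hypothetical `ToyParam` class.

```python
class ToyParam:
    """Hypothetical parameter with a fill-if-empty default."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value

    def set_default(self, val):
        """Fill in val only when no value has been set yet."""
        if self.value is None:
            self.value = val


empty = ToyParam('num_units')
empty.set_default(64)
preset = ToyParam('num_units', 128)
preset.set_default(64)
print(empty.value, preset.value)  # 64 128
```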

validator¶

return -- Validator of the parameter.

Return type:	`Callable`[[`Any`], `bool`]

value¶

return -- Value of the parameter.

Return type:	`Any`

matchzoo.engine.param_table module¶

Parameters table class.

class matchzoo.engine.param_table.ParamTable¶

Bases: object

Parameter table class.

Example

>>> params = ParamTable()
>>> params.add(Param('ham', 'Parma Ham'))
>>> params.add(Param('egg', 'Over Easy'))
>>> params['ham']
'Parma Ham'
>>> params['egg']
'Over Easy'
>>> print(params)
ham                           Parma Ham
egg                           Over Easy
>>> params.add(Param('egg', 'Sunny side Up'))
Traceback (most recent call last):
    ...
ValueError: Parameter named egg already exists.
To re-assign parameter egg value, use `params["egg"] = value` instead.

add(param)¶

Parameters:	param (`Param`) -- parameter to add.

completed()¶

Return type:	`bool`
Returns:	True if all params are filled, False otherwise.

Example

>>> import matchzoo
>>> model = matchzoo.models.Naive()
>>> model.params.completed()
False
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params.completed()
True
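The completeness check, together with the duplicate-name rule shown in the class example, can be sketched with a plain dictionary. `ToyTable` is a hypothetical class, and treating None as "not filled" is an assumption taken from the description of Param.

```python
class ToyTable:
    """Hypothetical name-to-value table that rejects duplicate names."""

    def __init__(self):
        self._values = {}

    def add(self, name, value=None):
        if name in self._values:
            raise ValueError(f'Parameter named {name} already exists.')
        self._values[name] = value

    def __setitem__(self, name, value):
        self._values[name] = value

    def completed(self):
        # Complete means that no value is still None.
        return all(v is not None for v in self._values.values())


table = ToyTable()
table.add('ham', 'Parma Ham')
table.add('egg')
before = table.completed()
table['egg'] = 'Over Easy'
after = table.completed()
print(before, after)  # False True
```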

get(key)¶

Return type:	`Param`
Returns:	The parameter in the table named key.

hyper_space¶

return -- Hyper space of the table, a valid hyperopt graph.

Return type:	`dict`

keys()¶

Return type:	`KeysView`[~KT]
Returns:	Parameter table keys.

set(key, param)¶: Set key to parameter param.
