matchzoo.engine package

Submodules

matchzoo.engine.base_metric module

Metric base class and some related utilities.

class matchzoo.engine.base_metric.BaseMetric

Bases: abc.ABC

Metric base class.

ALIAS = 'base_metric'
matchzoo.engine.base_metric.sort_and_couple(labels, scores)

Zip the labels with scores into a single array of (label, score) pairs, sorted by score in descending order.

Return type:numpy.ndarray
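To make the role of sort_and_couple concrete, here is a minimal, self-contained sketch of a metric written in the style of BaseMetric. ToyBaseMetric and ToyPrecisionAtK are hypothetical stand-ins, not MatchZoo's own classes. The sketch assumes that a metric exposes an ALIAS, is called on one query's labels and scores, and that sort_and_couple orders the (label, score) pairs by score, highest first.

```python
import abc

import numpy as np


def sort_and_couple(labels, scores):
    """Pair each label with its score, sorted by score in descending order."""
    couple = list(zip(labels, scores))
    return np.array(sorted(couple, key=lambda pair: pair[1], reverse=True))


class ToyBaseMetric(abc.ABC):
    """Hypothetical stand-in for BaseMetric."""

    ALIAS = 'base_metric'

    @abc.abstractmethod
    def __call__(self, y_true, y_pred):
        """Compute the metric for one query's labels and predicted scores."""


class ToyPrecisionAtK(ToyBaseMetric):
    """Hypothetical precision@k: share of the top-k items whose label exceeds threshold."""

    ALIAS = 'toy_precision'

    def __init__(self, k=1, threshold=0):
        self.k = k
        self.threshold = threshold

    def __call__(self, y_true, y_pred):
        ranked = sort_and_couple(y_true, y_pred)  # column 0: labels, column 1: scores
        top_labels = ranked[:self.k, 0]
        return float(np.sum(top_labels > self.threshold)) / self.k


labels = [0, 1, 0, 1]
scores = [0.1, 0.9, 0.4, 0.2]
print(sort_and_couple(labels, scores)[:, 0])  # labels in ranked order
print(ToyPrecisionAtK(k=2)(labels, scores))   # 0.5
```

Sorting once by score lets every rank-based metric read the labels in ranked order straight from the first column.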

matchzoo.engine.base_model module

Base Model.

class matchzoo.engine.base_model.BaseModel(params=None, backend=None)

Bases: abc.ABC

Abstract base class of all MatchZoo models.

MatchZoo models are wrapped around keras models, and the underlying keras model, once built, can be accessed through model.backend. params is a set of model hyper-parameters that deterministically builds a model. In other words, params['model_class'](params=params) always creates models with the same structure given the same params.

Parameters:
  • params (Optional[ParamTable]) – Model hyper-parameters. (default: return value from get_default_params())
  • backend (Optional[Model]) – A keras model as the model backend. Usually not passed as an argument.

Example

>>> BaseModel()  # doctest: +ELLIPSIS
Traceback (most recent call last):
...
TypeError: Can't instantiate abstract class BaseModel ...
>>> class MyModel(BaseModel):
...     def build(self):
...         pass
>>> isinstance(MyModel(), BaseModel)
True
BACKEND_WEIGHTS_FILENAME = 'backend_weights.h5'
PARAMS_FILENAME = 'params.dill'
backend

Return the model backend, a keras model instance.

Return type:Model
build()

Build the model. Each subclass needs to implement this method.

compile()

Compile model for training.

Only keras native metrics are compiled together with the backend. MatchZoo metrics are evaluated only through evaluate(). Note that keras counts the loss as one of the metrics, while MatchZoo’s matchzoo.engine.BaseTask does not.

Examples

>>> from matchzoo import models
>>> model = models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params['task'].metrics = ['mse', 'map']
>>> model.params['task'].metrics
['mse', mean_average_precision(0.0)]
>>> model.build()
>>> model.compile()
evaluate(x, y, batch_size=128)

Evaluate the model.

Parameters:
  • x (Dict[str, ndarray]) – Input data.
  • y (ndarray) – Labels.
  • batch_size (int) – Number of samples per batch when predicting for evaluation. (default: 128)
Examples
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> data_pack = preprocessor.fit_transform(data_pack, verbose=0)
>>> m = mz.models.DenseBaseline()
>>> m.params['task'] = mz.tasks.Ranking()
>>> m.params['task'].metrics = [
...     'acc', 'mse', 'mae', 'ce',
...     'average_precision', 'precision', 'dcg', 'ndcg',
...     'mean_reciprocal_rank', 'mean_average_precision', 'mrr',
...     'map', 'MAP',
...     mz.metrics.AveragePrecision(threshold=1),
...     mz.metrics.Precision(k=2, threshold=2),
...     mz.metrics.DiscountedCumulativeGain(k=2),
...     mz.metrics.NormalizedDiscountedCumulativeGain(
...         k=3, threshold=-1),
...     mz.metrics.MeanReciprocalRank(threshold=2),
...     mz.metrics.MeanAveragePrecision(threshold=3)
... ]
>>> m.guess_and_fill_missing_params(verbose=0)
>>> m.build()
>>> m.compile()
>>> x, y = data_pack.unpack()
>>> evals = m.evaluate(x, y)
>>> type(evals)
<class 'dict'>
Return type:Dict[BaseMetric, float]
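Ranking metrics such as mean average precision are defined per query and then averaged, which is why they need all predictions of a query at once rather than one batch at a time. The sketch below illustrates this in plain Python, assuming each prediction carries a query id and that a label above 0 counts as relevant. The function names are hypothetical; this is not MatchZoo's implementation of evaluate().

```python
from collections import defaultdict


def average_precision(labels, scores, threshold=0):
    """AP of one query: mean of precision@i over the ranks i of relevant items."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    hits, precisions = 0, []
    for rank, (label, _) in enumerate(ranked, start=1):
        if label > threshold:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(query_ids, labels, scores):
    """Group predictions by query id, then average the per-query AP."""
    groups = defaultdict(lambda: ([], []))
    for qid, label, score in zip(query_ids, labels, scores):
        groups[qid][0].append(label)
        groups[qid][1].append(score)
    per_query = [average_precision(ls, ss) for ls, ss in groups.values()]
    return sum(per_query) / len(per_query)


query_ids = ['q1', 'q1', 'q2', 'q2']
labels = [1, 0, 0, 1]
scores = [0.9, 0.1, 0.8, 0.3]
print(mean_average_precision(query_ids, labels, scores))  # (1.0 + 0.5) / 2 = 0.75
```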
evaluate_generator(generator, batch_size=128)

Evaluate the model.

Parameters:
  • generator (DataGenerator) – DataGenerator to evaluate.
  • batch_size (int) – Batch size. (default: 128)
Return type:

Dict[BaseMetric, float]

fit(x, y, batch_size=128, epochs=1, verbose=1, **kwargs)

Fit the model.

See keras.models.Model.fit() for more details.

Parameters:
  • x (Union[ndarray, List[ndarray], dict]) – input data.
  • y (ndarray) – labels.
  • batch_size (int) – number of samples per gradient update.
  • epochs (int) – number of epochs to train the model.
  • verbose (int) – 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.

Keyword arguments not listed above are passed on to keras’s fit.

Return type:History
Returns:A keras.callbacks.History instance. Its history attribute contains all information collected during training.
fit_generator(generator, epochs=1, verbose=1, **kwargs)

Fit the model with a MatchZoo data generator.

See keras.models.Model.fit_generator() for more details.

Parameters:
  • generator (DataGenerator) – A generator, an instance of engine.DataGenerator.
  • epochs (int) – Number of epochs to train the model.
  • verbose (int) – 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Return type:

History

Returns:

A keras.callbacks.History instance. Its history attribute contains all information collected during training.

classmethod get_default_params(with_embedding=False, with_multi_layer_perceptron=False)

Model default parameters.

The common usage is to instantiate matchzoo.engine.ParamTable first, then set the model-specific parameters.

Examples

>>> class MyModel(BaseModel):
...     def build(self):
...         print(self._params['num_eggs'], 'eggs')
...         print('and', self._params['ham_type'])
...
...     @classmethod
...     def get_default_params(cls):
...         params = ParamTable()
...         params.add(Param('num_eggs', 512))
...         params.add(Param('ham_type', 'Parma Ham'))
...         return params
>>> my_model = MyModel()
>>> my_model.build()
512 eggs
and Parma Ham

Notice that all parameters must be serialisable for the entire model to be serialisable. Therefore, it’s strongly recommended to use python native data types to store parameters.

Return type:ParamTable
Returns:model parameters
classmethod get_default_preprocessor()

Model default preprocessor.

The preprocessor’s transform should produce a correctly shaped data pack that can be used for training. Some extra configuration (e.g. setting input_shapes in matchzoo.models.DSSMModel) may be required on the user’s end.

Return type:BasePreprocessor
Returns:Default preprocessor.
get_embedding_layer(name='embedding')

Get the embedding layer.

All MatchZoo models with a single embedding layer set the embedding layer name to embedding, and this method should return that layer.

Parameters:name (str) – Name of the embedding layer. (default: embedding)
Return type:Layer
guess_and_fill_missing_params(verbose=1)

Guess and fill missing parameters in params.

Use this method to automatically fill in the remaining hyper-parameters. This involves some guessing, so the values it fills in could be wrong. For example, the default task is Ranking; if we do not manually set it to Classification for data packs prepared for classification, the shape of the model output and the shape of the data will mismatch.

Parameters:verbose – Verbosity.
load_embedding_matrix(embedding_matrix, name='embedding')

Load an embedding matrix.

Load an embedding matrix into the model’s embedding layer. The name of the embedding layer is specified by name. For models with only one embedding layer, set name='embedding' when creating the keras layer, and use the default name when loading the matrix. For models with more than one embedding layer, initialize the keras layers with different names, and set name accordingly to load a matrix into the chosen layer.

Parameters:
  • embedding_matrix (ndarray) – Embedding matrix to be loaded.
  • name (str) – Name of the layer. (default: 'embedding')
params

model parameters.

Return type:ParamTable
predict(x, batch_size=128)

Generate output predictions for the input samples.

See keras.models.Model.predict() for more details.

Parameters:
  • x (Dict[str, ndarray]) – input data
  • batch_size – number of samples per batch
Return type:

ndarray

Returns:

numpy array(s) of predictions

save(dirpath)

Save the model.

A saved model is represented as a directory with two files: a model parameters file (params.dill) serialized with dill, and a backend weights file (backend_weights.h5) saved by keras.

Parameters:dirpath (Union[str, Path]) – directory path of the saved model

Example

>>> import matchzoo as mz
>>> model = mz.models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
>>> model.save('temp-model')
>>> import shutil
>>> shutil.rmtree('temp-model')
matchzoo.engine.base_model.load_model(dirpath)

Load a model. The reverse function of BaseModel.save().

Parameters:dirpath (Union[str, Path]) – directory path of the saved model
Return type:BaseModel
Returns:a BaseModel instance

Example

>>> import matchzoo as mz
>>> model = mz.models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
>>> model.save('my-model')
>>> model.params.keys() == mz.load_model('my-model').params.keys()
True
>>> import shutil
>>> shutil.rmtree('my-model')
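The save/load round trip can be sketched with the standard library alone. In this hypothetical ToyModel the structure is fully determined by its params, so persisting the params is enough to rebuild it. pickle stands in for dill here, and the keras weights file that BaseModel.save() also writes is only mentioned in a comment.

```python
import pickle
import tempfile
from pathlib import Path

PARAMS_FILENAME = 'params.dill'  # file name taken from BaseModel; pickle stands in for dill


class ToyModel:
    """Hypothetical model whose structure is fully determined by its params."""

    def __init__(self, params=None):
        self.params = dict(params or {'num_units': 8, 'activation': 'relu'})

    def save(self, dirpath):
        dirpath = Path(dirpath)
        dirpath.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing save
        with open(dirpath / PARAMS_FILENAME, 'wb') as f:
            pickle.dump(self.params, f)
        # BaseModel.save() would also write the backend weights (backend_weights.h5) here.


def load_toy_model(dirpath):
    """Reverse of ToyModel.save(): rebuild the model from the saved params."""
    with open(Path(dirpath) / PARAMS_FILENAME, 'rb') as f:
        params = pickle.load(f)
    return ToyModel(params=params)


with tempfile.TemporaryDirectory() as tmp:
    model = ToyModel({'num_units': 16, 'activation': 'tanh'})
    model.save(Path(tmp) / 'my-model')
    restored = load_toy_model(Path(tmp) / 'my-model')

print(restored.params == model.params)  # True
```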

matchzoo.engine.base_preprocessor module

BasePreprocessor defines the input and output of preprocessors.

class matchzoo.engine.base_preprocessor.BasePreprocessor

Bases: object

BasePreprocessor to handle input data.

A preprocessor should be used in two steps: first fit, then transform. fit collects information into context, which includes everything the preprocessor needs to transform, together with other useful information for later use. fit only changes the preprocessor’s inner state, not the input data. In contrast, transform returns a modified copy of the input data without changing the preprocessor’s inner state.
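The fit/transform contract described above can be sketched with a hypothetical vocabulary preprocessor that works on plain strings instead of a DataPack. fit only updates context, while transform reads context and builds new output without touching the input or the context.

```python
class ToyVocabPreprocessor:
    """Hypothetical preprocessor: fit builds a vocabulary, transform maps words to ids."""

    def __init__(self):
        self._context = {}

    @property
    def context(self):
        return self._context

    def fit(self, texts):
        # Only the inner state changes; `texts` is only read.
        vocab = sorted({word for text in texts for word in text.split()})
        self._context['term_index'] = {word: i + 1 for i, word in enumerate(vocab)}
        return self

    def transform(self, texts):
        # Builds new lists from `texts`; the context is only read. Unknown words map to 0.
        term_index = self._context['term_index']
        return [[term_index.get(word, 0) for word in text.split()] for text in texts]

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)


train = ['b a', 'c a']
prep = ToyVocabPreprocessor()
print(prep.fit_transform(train))  # [[2, 1], [3, 1]]
print(prep.transform(['a d']))    # [[1, 0]]
```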

DATA_FILENAME = 'preprocessor.dill'
context

Return context.

fit(data_pack, verbose=1)

Fit parameters on input data.

This is an abstract method that needs to be implemented in the child class.

This method is expected to return the preprocessor itself.

Parameters:
  • data_pack (DataPack) – DataPack object to be fitted.
  • verbose (int) – Verbosity.
  • verbose (int) – Verbosity.
Return type:

BasePreprocessor

fit_transform(data_pack, verbose=1)

Call fit, then transform.

Parameters:
  • data_pack (DataPack) – DataPack object to be processed.
  • verbose (int) – Verbosity.
Return type:

DataPack

save(dirpath)

Save the preprocessor.

A saved preprocessor is represented as a directory containing the context object (the parameters fitted on training data), serialized with dill.

Parameters:dirpath (Union[str, Path]) – directory path of the saved preprocessor.
transform(data_pack, verbose=1)

Transform input data into the expected form.

This is an abstract method that needs to be implemented in the child class.

Parameters:
  • data_pack (DataPack) – DataPack object to be transformed.
  • verbose (int) – Verbosity.
Return type:

DataPack

matchzoo.engine.base_preprocessor.load_preprocessor(dirpath)

Load the fitted context. The reverse function of save().

Parameters:dirpath (Union[str, Path]) – directory path of the saved preprocessor.
Return type:BasePreprocessor
Returns:a BasePreprocessor instance.
matchzoo.engine.base_preprocessor.validate_context(func)

Validate context in the preprocessor.
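A decorator such as validate_context typically guards transform so that it cannot run on an unfitted preprocessor. The following is a minimal sketch under that assumption; the error message, the emptiness check on context, and the ToyPreprocessor class are illustrative, not taken from MatchZoo.

```python
import functools


def validate_context(func):
    """Hypothetical sketch: refuse to run `func` until the preprocessor is fitted."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not self.context:  # an empty context means fit() has not run yet
            raise ValueError('Please call `fit` before calling `transform`.')
        return func(self, *args, **kwargs)

    return wrapper


class ToyPreprocessor:
    def __init__(self):
        self.context = {}

    def fit(self, data):
        self.context['vocab_size'] = len(set(data))
        return self

    @validate_context
    def transform(self, data):
        return [item.upper() for item in data]


prep = ToyPreprocessor()
try:
    prep.transform(['a'])
except ValueError as err:
    print(err)  # Please call `fit` before calling `transform`.
print(prep.fit(['a', 'b']).transform(['a']))  # ['A']
```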

matchzoo.engine.base_task module

Base task.

class matchzoo.engine.base_task.BaseTask(loss=None, metrics=None)

Bases: abc.ABC

Base Task, shouldn’t be used directly.

classmethod list_available_losses()
Return type:list
Returns:a list of available losses.
classmethod list_available_metrics()
Return type:list
Returns:a list of available metrics.
loss

Loss used in the task.

metrics

Metrics used in the task.

output_dtype

Output data type for the specific task.

output_shape

Output shape of a single sample of the task.

Return type:tuple
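The output_shape property is what links a task to the model output, and it is why guess_and_fill_missing_params can produce mismatched shapes when the task is left at Ranking for classification data. The sketch below assumes a ranking task predicts one score per sample and a classification task one value per class; ToyTask and its subclasses are hypothetical.

```python
import abc


class ToyTask(abc.ABC):
    """Hypothetical stand-in for BaseTask."""

    @property
    @abc.abstractmethod
    def output_shape(self):
        """Shape of the model output for a single sample."""


class ToyRanking(ToyTask):
    @property
    def output_shape(self):
        return (1,)  # one relevance score per (left, right) pair


class ToyClassification(ToyTask):
    def __init__(self, num_classes=2):
        self.num_classes = num_classes

    @property
    def output_shape(self):
        return (self.num_classes,)  # one value per class


print(ToyRanking().output_shape)          # (1,)
print(ToyClassification(3).output_shape)  # (3,)
```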

matchzoo.engine.callbacks module

Callbacks.

class matchzoo.engine.callbacks.EvaluateAllMetrics(model, x, y, once_every=1, batch_size=128, model_save_path=None, verbose=1)

Bases: keras.callbacks.Callback

Callback to evaluate all metrics.

MatchZoo metrics cannot be evaluated batch-wise since they require dataset-level information. As a result, MatchZoo metrics are not evaluated automatically when a model is fitted. When this callback is used, all metrics, including MatchZoo metrics and Keras metrics, are evaluated once every once_every epochs.

Parameters:
  • model (BaseModel) – Model to evaluate.
  • x (Union[ndarray, List[ndarray]]) – Input data to evaluate on.
  • y (ndarray) – Labels of the input data.
  • once_every (int) – Evaluation only triggers when epoch % once_every == 0. (default: 1, i.e. evaluate on every epoch’s end)
  • batch_size (int) – Number of samples per evaluation. This only affects the evaluation of Keras metrics, since MatchZoo metrics are always evaluated using the full data.
  • model_save_path (Optional[str]) – Directory path to save the model to after each evaluation. (default: None, i.e. no saving)
  • verbose – Verbosity.
on_epoch_end(epoch, logs=None)

Called at the end of an epoch.

Parameters:
  • epoch (int) – Index of the epoch.
  • logs (Optional[dict]) – dictionary of logs.
Returns:

dictionary of logs.
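The once_every behaviour can be sketched without keras. ToyEvaluateAllMetrics below is a hypothetical class that follows the trigger condition documented above (epoch % once_every == 0) and merges the full-data evaluation results into the epoch logs.

```python
class ToyEvaluateAllMetrics:
    """Hypothetical callback mirroring the documented once_every logic, without keras."""

    def __init__(self, evaluate_fn, once_every=1):
        self.evaluate_fn = evaluate_fn  # evaluates all metrics on the full data
        self.once_every = once_every
        self.evaluated_epochs = []

    def on_epoch_end(self, epoch, logs=None):
        logs = dict(logs or {})
        if epoch % self.once_every == 0:  # trigger condition as documented above
            logs.update(self.evaluate_fn())
            self.evaluated_epochs.append(epoch)
        return logs


callback = ToyEvaluateAllMetrics(lambda: {'map': 0.75}, once_every=2)
all_logs = [callback.on_epoch_end(epoch, {'loss': 0.3}) for epoch in range(5)]
print(callback.evaluated_epochs)  # [0, 2, 4]
print(all_logs[1])                # {'loss': 0.3}
```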

matchzoo.engine.hyper_spaces module

Hyper parameter search spaces wrapping hyperopt.

class matchzoo.engine.hyper_spaces.HyperoptProxy(hyperopt_func, **kwargs)

Bases: object

Hyperopt proxy class.

See hyperopt’s documentation for more details: https://github.com/hyperopt/hyperopt/wiki/FMin

Reason for these wrappers:

A hyper space in hyperopt requires a label to instantiate. This label is used later as a reference to the original hyper space that was sampled. In MatchZoo, hyper spaces are used in matchzoo.engine.Param. MatchZoo can only correctly trace a sample back to its parameter if the hyper space’s label matches the name of its parent matchzoo.engine.Param. This could be done by asking users to always use the same name for a parameter and its hyper space, but typos can occur. These wrappers therefore hide the hyper space’s label and always bind it to the parameter’s name.
Examples
>>> import matchzoo as mz
>>> from hyperopt.pyll.stochastic import sample
Basic Usage:
>>> model = mz.models.DenseBaseline()
>>> sample(model.params.hyper_space)  # doctest: +SKIP
{'mlp_num_layers': 1.0, 'mlp_num_units': 274.0}
Arithmetic Operations:
>>> new_space = 2 ** mz.hyper_spaces.quniform(2, 6)
>>> model.params.get('mlp_num_layers').hyper_space = new_space
>>> sample(model.params.hyper_space)  # doctest: +SKIP
{'mlp_num_layers': 8.0, 'mlp_num_units': 292.0}
convert(name)

Attach name as hyperopt.hp’s label.

Parameters:name (str) – Name to use as the label, normally the name of the owning parameter.
Return type:Apply
Returns:a hyperopt ready search space
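The label-binding idea behind HyperoptProxy can be sketched without hyperopt installed. fake_quniform is a hypothetical stand-in that only records its arguments, taking them in the same order as hyperopt.hp.quniform(label, low, high, q). ToyHyperoptProxy and ToyTunableParam are illustrative, not MatchZoo's classes.

```python
def fake_quniform(label, low, high, q):
    """Hypothetical stand-in for hyperopt.hp.quniform: records its arguments."""
    return {'label': label, 'dist': 'quniform', 'low': low, 'high': high, 'q': q}


class ToyHyperoptProxy:
    """Store a hyperopt-style function and its arguments; the label comes later."""

    def __init__(self, hyperopt_func, **kwargs):
        self._func = hyperopt_func
        self._kwargs = kwargs

    def convert(self, name):
        # The label is always the owning parameter's name, so the two cannot diverge.
        return self._func(name, **self._kwargs)


class ToyTunableParam:
    """Hypothetical parameter that converts its hyper space using its own name."""

    def __init__(self, name, hyper_space=None):
        self.name = name
        self.hyper_space = hyper_space

    def hyperopt_space(self):
        return self.hyper_space.convert(self.name)


param = ToyTunableParam('mlp_num_layers', ToyHyperoptProxy(fake_quniform, low=1, high=5, q=1))
print(param.hyperopt_space())  # the label is 'mlp_num_layers'
```

Because the proxy passes the name as the first positional argument, swapping fake_quniform for hyperopt.hp.quniform should yield a real hyperopt search space labelled with the parameter's name.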
class matchzoo.engine.hyper_spaces.choice(options)

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.choice() proxy.

class matchzoo.engine.hyper_spaces.quniform(low, high, q=1)

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.quniform() proxy.

matchzoo.engine.hyper_spaces.sample(space)

Take a sample in the hyper space.

This method is stateless, so the distribution of the samples differs from that of a tune call. This function just gives a general idea of what a sample from the space looks like.

Example

>>> import matchzoo as mz
>>> space = mz.models.Naive.get_default_params().hyper_space
>>> mz.hyper_spaces.sample(space)  # doctest: +ELLIPSIS
{'optimizer': ...}
class matchzoo.engine.hyper_spaces.uniform(low, high)

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.uniform() proxy.

matchzoo.engine.param module

Parameter class.

class matchzoo.engine.param.Param(name, value=None, hyper_space=None, validator=None, desc=None)

Bases: object

Parameter class.

Basic usages with a name and value:

>>> param = Param('my_param', 10)
>>> param.name
'my_param'
>>> param.value
10

Use with a validator to make sure the parameter always keeps a valid value.

>>> param = Param(
...     name='my_param',
...     value=5,
...     validator=lambda x: 0 < x < 20
... )
>>> param.validator  # doctest: +ELLIPSIS
<function <lambda> at 0x...>
>>> param.value
5
>>> param.value = 10
>>> param.value
10
>>> param.value = -1
Traceback (most recent call last):
    ...
ValueError: Validator not satifised.
The validator's definition is as follows:
validator=lambda x: 0 < x < 20

Use with a hyper space. Setting up a hyper space for a parameter makes the parameter tunable in a matchzoo.engine.Tuner.

>>> from matchzoo.engine.hyper_spaces import quniform
>>> param = Param(
...     name='positive_num',
...     value=1,
...     hyper_space=quniform(low=1, high=5)
... )
>>> param.hyper_space  # doctest: +ELLIPSIS
<matchzoo.engine.hyper_spaces.quniform object at ...>
>>> from hyperopt.pyll.stochastic import sample
>>> hyperopt_space = param.hyper_space.convert(param.name)
>>> samples = [sample(hyperopt_space) for _ in range(64)]
>>> set(samples) == {1, 2, 3, 4, 5}
True

The boolean value of a Param instance is True only when the value is not None. This is because some falsy values like zero or an empty list are valid parameter values. In other words, the boolean value indicates whether the parameter value is filled.

>>> param = Param('dropout')
>>> if param:
...     print('OK')
>>> param = Param('dropout', 0)
>>> if param:
...     print('OK')
OK

If the value is set as a number, a _pre_assignment_hook is initialized as a data type converter to keep the parameter’s data type consistent. This conversion supports Python built-in numbers, numpy numbers, and any number that inherits from numbers.Number.

>>> param = Param('float_param', 0.5)
>>> param.value = 10
>>> param.value
10.0
>>> type(param.value)
<class 'float'>
desc

Parameter description.

Return type:str
hyper_space

Hyper space of the parameter.

Return type:Union[Apply, HyperoptProxy]
name

Name of the parameter.

Return type:str
reset()

Set the parameter’s value to None, which means “not set”.

This method bypasses the validator.

Example

>>> import matchzoo as mz
>>> param = mz.Param(
...     name='str', validator=lambda x: isinstance(x, str))
>>> param.value = 'hello'
>>> param.value = None
Traceback (most recent call last):
    ...
ValueError: Validator not satifised.
The validator's definition is as follows:
name='str', validator=lambda x: isinstance(x, str))
>>> param.reset()
>>> param.value is None
True
set_default(val, verbose=1)

Set the default value. This has no effect if the parameter already has a value.

Parameters:
  • val – Default value to set.
  • verbose – Verbosity.
validator

Validator of the parameter.

Return type:Callable[[Any], bool]
value

Value of the parameter.

Return type:Any
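The documented Param behaviours (validation, the “is filled” boolean, numeric type conversion, set_default, and reset bypassing the validator) can be summarised in one small sketch. ToyParam is hypothetical and simplified; in particular, converting with the type of the initial value is an assumption about how the pre-assignment hook behaves.

```python
import numbers


class ToyParam:
    """Hypothetical, simplified sketch of the Param behaviour documented above."""

    def __init__(self, name, value=None, validator=None):
        self.name = name
        self.validator = validator
        # Assumption: a numeric initial value fixes the type used for later assignments.
        self._convert = type(value) if isinstance(value, numbers.Number) else None
        self._value = None
        if value is not None:
            self.value = value  # goes through the setter, so it is validated

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if self._convert is not None and isinstance(new_value, numbers.Number):
            new_value = self._convert(new_value)  # e.g. 10 -> 10.0 for a float param
        if self.validator is not None and not self.validator(new_value):
            raise ValueError('Validator not satisfied.')
        self._value = new_value

    def __bool__(self):
        # "Is the value filled?": 0 and [] count as filled; only None does not.
        return self._value is not None

    def set_default(self, val):
        if self._value is None:
            self.value = val

    def reset(self):
        self._value = None  # bypasses the validator on purpose


ratio = ToyParam('float_param', 0.5)
ratio.value = 10
print(ratio.value, bool(ToyParam('dropout', 0)), bool(ToyParam('dropout')))  # 10.0 True False
```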

matchzoo.engine.param_table module

Parameters table class.

class matchzoo.engine.param_table.ParamTable

Bases: object

Parameter table class.

Example

>>> params = ParamTable()
>>> params.add(Param('ham', 'Parma Ham'))
>>> params.add(Param('egg', 'Over Easy'))
>>> params['ham']
'Parma Ham'
>>> params['egg']
'Over Easy'
>>> print(params)
ham                           Parma Ham
egg                           Over Easy
>>> params.add(Param('egg', 'Sunny side Up'))
Traceback (most recent call last):
    ...
ValueError: Parameter named egg already exists.
To re-assign parameter egg value, use `params["egg"] = value` instead.
add(param)
Parameters:param (Param) – parameter to add.
completed()
Return type:bool
Returns:True if all params are filled, False otherwise.

Example

>>> import matchzoo
>>> model = matchzoo.models.Naive()
>>> model.params.completed()
False
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params.completed()
True
get(key)
Return type:Param
Returns:The parameter in the table named key.
hyper_space

Hyper space of the table, a valid hyperopt graph.

Return type:dict
keys()
Return type:KeysView
Returns:Parameter table keys.
set(key, param)

Set key to parameter param.

to_frame()

Convert the parameter table into a pandas data frame.

Return type:DataFrame
Returns:A pandas.DataFrame.

Example

>>> import matchzoo as mz
>>> table = mz.ParamTable()
>>> table.add(mz.Param(name='x', value=10, desc='my x'))
>>> table.add(mz.Param(name='y', value=20, desc='my y'))
>>> table.to_frame()
  Name Description  Value Hyper-Space
0    x        my x     10        None
1    y        my y     20        None
update(other)

Update self.

Update self with the key/value pairs from other, overwriting existing keys. Notice that this does not add new keys to self.

This method is usually used by models to obtain useful information from a preprocessor’s context.

Parameters:other (dict) – The dictionary used for the update.

Example

>>> import matchzoo as mz
>>> model = mz.models.DenseBaseline()
>>> model.params['input_shapes'] is None
True
>>> prpr = model.get_default_preprocessor()
>>> _ = prpr.fit(mz.datasets.toy.load_data(), verbose=0)
>>> model.params.update(prpr.context)
>>> model.params['input_shapes']
[(30,), (30,)]
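The add, completed, and update semantics above can be sketched with a plain dictionary. ToyParamTable is hypothetical and stores raw values rather than Param objects; the key point it shows is that update() overwrites existing keys and ignores keys the table does not have.

```python
class ToyParamTable:
    """Hypothetical sketch of ParamTable's add/completed/update semantics."""

    def __init__(self):
        self._values = {}  # plain values here; the real table holds Param objects

    def add(self, name, value=None):
        if name in self._values:
            raise ValueError(f'Parameter named {name} already exists.')
        self._values[name] = value

    def __getitem__(self, key):
        return self._values[key]

    def __contains__(self, key):
        return key in self._values

    def completed(self):
        return all(value is not None for value in self._values.values())

    def update(self, other):
        # Overwrite existing keys only; keys unknown to the table are ignored.
        for key, value in other.items():
            if key in self._values:
                self._values[key] = value


table = ToyParamTable()
table.add('input_shapes')
table.add('task', 'ranking')
print(table.completed())  # False
table.update({'input_shapes': [(30,), (30,)], 'vocab_size': 100})
print(table['input_shapes'], 'vocab_size' in table, table.completed())  # [(30,), (30,)] False True
```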

matchzoo.engine.parse_metric module

matchzoo.engine.parse_metric.parse_metric(metric, task=None)

Parse input metric in any form into a BaseMetric instance.

Parameters:
  • metric (Union[str, Type[BaseMetric], BaseMetric]) – Input metric in any form.
  • task (Optional[BaseTask]) – Task used to determine the task-specific metric.
Return type:

Union[BaseMetric, str]

Returns:

A BaseMetric instance, or the input string itself when it names a keras native metric.

Examples
>>> from matchzoo import metrics
>>> from matchzoo.engine.parse_metric import parse_metric
Use str as keras native metrics:
>>> parse_metric('mse')
'mse'
Use str as MatchZoo metrics:
>>> mz_metric = parse_metric('map')
>>> type(mz_metric)
<class 'matchzoo.metrics.mean_average_precision.MeanAveragePrecision'>
Use matchzoo.engine.BaseMetric subclasses as MatchZoo metrics:
>>> type(parse_metric(metrics.AveragePrecision))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
Use matchzoo.engine.BaseMetric instances as MatchZoo metrics:
>>> type(parse_metric(metrics.AveragePrecision()))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
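The dispatch performed by parse_metric can be sketched as follows, assuming metrics are looked up by case-insensitive alias and that unrecognised strings are passed through for keras to interpret. The registry and all class names are hypothetical.

```python
class ToyBaseMetric:
    """Hypothetical stand-in for BaseMetric."""
    ALIAS = ['base_metric']


class ToyMeanAveragePrecision(ToyBaseMetric):
    ALIAS = ['mean_average_precision', 'map']


class ToyAveragePrecision(ToyBaseMetric):
    ALIAS = ['average_precision', 'ap']


TOY_REGISTRY = [ToyMeanAveragePrecision, ToyAveragePrecision]  # hypothetical registry


def toy_parse_metric(metric):
    """Normalise a metric given as a string, a metric class, or a metric instance."""
    if isinstance(metric, ToyBaseMetric):
        return metric  # already an instance: use as-is
    if isinstance(metric, type) and issubclass(metric, ToyBaseMetric):
        return metric()  # a class: instantiate with default arguments
    if isinstance(metric, str):
        for metric_class in TOY_REGISTRY:
            if metric.lower() in metric_class.ALIAS:  # so 'MAP' also works
                return metric_class()
        return metric  # unrecognised strings are left for keras to interpret
    raise ValueError(f'Cannot parse metric: {metric!r}')


print(toy_parse_metric('mse'))                               # mse
print(type(toy_parse_metric('MAP')).__name__)                # ToyMeanAveragePrecision
print(type(toy_parse_metric(ToyAveragePrecision)).__name__)  # ToyAveragePrecision
```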

Module contents