matchzoo.engine package

Submodules

matchzoo.engine.base_metric module

Metric base class and some related utilities.

class matchzoo.engine.base_metric.BaseMetric

Bases: abc.ABC

Metric base class.

ALIAS = 'base_metric'
matchzoo.engine.base_metric.parse_metric(metric)

Parse input metric in any form into a BaseMetric instance.

Parameters: metric (Union[str, Type[BaseMetric], BaseMetric]) -- Input metric in any form.
Returns: A BaseMetric instance
Examples::
>>> from matchzoo import engine, metrics
Use str as keras native metrics:
>>> engine.parse_metric('mse')
'mse'
Use str as MatchZoo metrics:
>>> mz_metric = engine.parse_metric('map')
>>> type(mz_metric)
<class 'matchzoo.metrics.mean_average_precision.MeanAveragePrecision'>
Use matchzoo.engine.BaseMetric subclasses as MatchZoo metrics:
>>> type(engine.parse_metric(metrics.AveragePrecision))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
Use matchzoo.engine.BaseMetric instances as MatchZoo metrics:
>>> type(engine.parse_metric(metrics.AveragePrecision()))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
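The dispatch illustrated above can be pictured with a small stand-alone sketch. This is not MatchZoo's implementation: ToyMetric, ToyMAP, KERAS_NATIVE and toy_parse_metric are hypothetical names, and the set of Keras-native strings is an assumption made for illustration only.

```python
class ToyMetric:
    """Hypothetical stand-in for a metric base class."""
    ALIAS = 'toy_metric'


class ToyMAP(ToyMetric):
    """Hypothetical metric registered under two aliases."""
    ALIAS = ['mean_average_precision', 'map']


# Assumed set of strings that are handed to the backend untouched.
KERAS_NATIVE = {'mse', 'mae', 'acc'}


def toy_parse_metric(metric):
    """Return a ToyMetric instance, or the string itself if backend-native."""
    if isinstance(metric, ToyMetric):
        return metric                      # already an instance
    if isinstance(metric, type) and issubclass(metric, ToyMetric):
        return metric()                    # a class: instantiate it
    if isinstance(metric, str):
        key = metric.lower()
        if key in KERAS_NATIVE:
            return key                     # left for the backend
        for cls in ToyMetric.__subclasses__():
            aliases = cls.ALIAS if isinstance(cls.ALIAS, list) else [cls.ALIAS]
            if key in aliases:
                return cls()               # alias lookup succeeded
    raise ValueError(f'Cannot parse metric: {metric!r}')
```

The three branches mirror the three accepted input forms: instance, class, and string.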
matchzoo.engine.base_metric.sort_and_couple(labels, scores)

Zip the labels with scores into a single list.

Return type: numpy.ndarray
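A minimal sketch of the idea, assuming that "sort" means ordering the pairs by score from highest to lowest (the description above does not state the order). toy_sort_and_couple is a hypothetical name, and the sketch returns a plain list rather than an array.

```python
def toy_sort_and_couple(labels, scores):
    """Pair each label with its score, highest score first (assumed order)."""
    couple = list(zip(labels, scores))
    return sorted(couple, key=lambda pair: pair[1], reverse=True)


pairs = toy_sort_and_couple([0, 1, 0], [0.2, 0.9, 0.5])
```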

matchzoo.engine.base_model module

Base Model.

class matchzoo.engine.base_model.BaseModel(params=None, backend=None)

Bases: abc.ABC

Abstract base class of all matchzoo models.

BACKEND_WEIGHTS_FILENAME = 'backend_weights.h5'
PARAMS_FILENAME = 'params.dill'
backend

Return the model backend, a Keras model instance.

Return type: Model
build()

Build the model. Each subclass needs to implement this method.

Example

>>> BaseModel()  
Traceback (most recent call last):
...
TypeError: Can't instantiate abstract class BaseModel ...
>>> class MyModel(BaseModel):
...     def build(self):
...         pass
>>> assert MyModel()
compile()

Compile model for training.

Only Keras-native metrics are compiled together with the backend. MatchZoo metrics are evaluated only through evaluate(). Note that Keras counts the loss as one of the metrics, while matchzoo.engine.BaseTask does not.

Examples

>>> from matchzoo import models
>>> model = models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params['task'].metrics = ['mse', 'map']
>>> model.params['task'].metrics
['mse', mean_average_precision(0)]
>>> model.build()
>>> model.compile()
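The split between the two kinds of metrics can be sketched as follows. The sketch assumes, as the example output above suggests, that Keras-native metrics stay plain strings after parsing while MatchZoo metrics become objects. split_metrics and ToyMAP are hypothetical names, not part of the library.

```python
def split_metrics(metrics):
    """Partition metrics into backend-native strings and metric objects."""
    native = [m for m in metrics if isinstance(m, str)]
    custom = [m for m in metrics if not isinstance(m, str)]
    return native, custom


class ToyMAP:
    """Hypothetical dataset-level metric object."""

    def __repr__(self):
        return 'toy_map'


# Only `native` would be handed to the backend at compile time;
# `custom` would be computed later, during evaluation.
native, custom = split_metrics(['mse', ToyMAP()])
```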
evaluate(x, y, batch_size=128, verbose=1)

Evaluate the model.

See keras.models.Model.evaluate() for more details.

Parameters:
  • x (Union[ndarray, List[ndarray], Dict[str, ndarray]]) -- input data
  • y (ndarray) -- labels
  • batch_size (int) -- number of samples per evaluation batch
  • verbose (int) -- verbosity mode, 0 or 1
Return type:

Dict[str, float]

Returns:

A dict of evaluation results. The attribute model.backend.metrics_names gives the display labels of the scalar outputs computed by the Keras backend.

Examples::
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> data_pack = preprocessor.fit_transform(data_pack)
>>> m = mz.models.DenseBaseline()
>>> m.params['task'] = mz.tasks.Ranking()
>>> m.params['task'].metrics = [
...     'acc', 'mse', 'mae', 'ce',
...     'average_precision', 'precision', 'dcg', 'ndcg',
...     'mean_reciprocal_rank', 'mean_average_precision', 'mrr',
...     'map', 'MAP',
...     mz.metrics.AveragePrecision(threshold=1),
...     mz.metrics.Precision(k=2, threshold=2),
...     mz.metrics.DiscountedCumulativeGain(k=2),
...     mz.metrics.NormalizedDiscountedCumulativeGain(
...         k=3, threshold=-1),
...     mz.metrics.MeanReciprocalRank(threshold=2),
...     mz.metrics.MeanAveragePrecision(threshold=3)
... ]
>>> m.guess_and_fill_missing_params(verbose=0)
>>> m.build()
>>> m.compile()
>>> x, y = data_pack.unpack()
>>> evals = m.evaluate(x, y, verbose=0)
>>> type(evals)
<class 'dict'>
fit(x, y, batch_size=128, epochs=1, verbose=1, **kwargs)

Fit the model.

See keras.models.Model.fit() for more details.

Parameters:
  • x (Union[ndarray, List[ndarray]]) -- input data.
  • y (ndarray) -- labels.
  • batch_size (int) -- number of samples per gradient update.
  • epochs (int) -- number of epochs to train the model.
  • verbose (int) -- 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.

Keyword arguments not listed above are passed on to Keras's fit.

Return type: History
Returns: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
fit_generator(generator, epochs=1, verbose=1, **kwargs)

Fit the model with matchzoo generator.

See keras.models.Model.fit_generator() for more details.

Parameters:
  • generator (DataGenerator) -- A generator, an instance of engine.DataGenerator.
  • epochs (int) -- Number of epochs to train the model.
  • verbose (int) -- 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Return type:

History

Returns:

A keras.callbacks.History instance. Its history attribute contains all information collected during training.

classmethod get_default_params(with_embedding=False, with_multi_layer_perceptron=False)

Model default parameters.

The common usage is to instantiate matchzoo.engine.ModelParams
first, then set the model-specific parameters.

Examples

>>> class MyModel(BaseModel):
...     def build(self):
...         print(self._params['num_eggs'], 'eggs')
...         print('and', self._params['ham_type'])
...
...     @classmethod
...     def get_default_params(cls):
...         params = engine.ParamTable()
...         params.add(engine.Param('num_eggs', 512))
...         params.add(engine.Param('ham_type', 'Parma Ham'))
...         return params
>>> my_model = MyModel()
>>> my_model.build()
512 eggs
and Parma Ham

Notice that all parameters must be serialisable for the entire model to be serialisable. Therefore, it's strongly recommended to use python native data types to store parameters.

Return type: ParamTable
Returns: model parameters
classmethod get_default_preprocessor()

Model default preprocessor.

The preprocessor's transform should produce a correctly shaped data pack that can be used for training. Some extra configuration (e.g. setting input_shapes in matchzoo.models.DSSMModel) may be required on the user's end.

Return type: BasePreprocessor
Returns: Default preprocessor.
guess_and_fill_missing_params(verbose=1)

Guess and fill missing parameters in params.

Use this method to automatically fill in hyper parameters. This involves some guessing, so the parameters it fills could be wrong. For example, the default task is Ranking, and if we do not set it to Classification manually for data packs prepared for classification, then the shapes of the model output and the data will not match.

Parameters: verbose -- Verbosity.
load_embedding_matrix(embedding_matrix, name='embedding')

Load an embedding matrix.

Load an embedding matrix into the model's embedding layer. The name of the embedding layer is specified by name. For models with only one embedding layer, set name='embedding' when creating the Keras layer, and use the default name when loading the matrix. For models with more than one embedding layer, initialize the Keras layers with different names, and set name accordingly to load a matrix into the chosen layer.

Parameters:
  • embedding_matrix (ndarray) -- Embedding matrix to be loaded.
  • name (str) -- Name of the layer. (default: 'embedding')
params

return -- model parameters.

Return type: ParamTable
predict(x, batch_size=128)

Generate output predictions for the input samples.

See keras.models.Model.predict() for more details.

Parameters:
  • x (Union[ndarray, List[ndarray]]) -- input data
  • batch_size -- number of samples per prediction batch
Return type:

ndarray

Returns:

numpy array(s) of predictions

save(dirpath)

Save the model.

A saved model is represented as a directory with two files. One is a model parameters file saved by pickle, and the other one is a model h5 file saved by keras.

Parameters: dirpath (Union[str, Path]) -- directory path of the saved model
matchzoo.engine.base_model.load_model(dirpath)

Load a model. The reverse function of BaseModel.save().

Parameters: dirpath (Union[str, Path]) -- directory path of the saved model
Return type: BaseModel
Returns: a BaseModel instance
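The "directory with two files" layout and its round trip can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: toy_save, toy_load and both file names are hypothetical, and raw bytes stand in for the backend weights that Keras would write to an h5 file.

```python
import pickle
import tempfile
from pathlib import Path

PARAMS_FILENAME = 'params.pkl'      # hypothetical file name
WEIGHTS_FILENAME = 'weights.bin'    # hypothetical file name


def toy_save(dirpath, params, weights):
    """Write params and weights as two files inside dirpath."""
    dirpath = Path(dirpath)
    dirpath.mkdir(parents=True, exist_ok=True)
    with open(dirpath / PARAMS_FILENAME, 'wb') as f:
        pickle.dump(params, f)
    (dirpath / WEIGHTS_FILENAME).write_bytes(weights)


def toy_load(dirpath):
    """Reverse of toy_save: read both files back from dirpath."""
    dirpath = Path(dirpath)
    with open(dirpath / PARAMS_FILENAME, 'rb') as f:
        params = pickle.load(f)
    weights = (dirpath / WEIGHTS_FILENAME).read_bytes()
    return params, weights


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'my_model'
    toy_save(target, {'num_units': 64}, b'\x00\x01')
    restored_params, restored_weights = toy_load(target)
```

Keeping the parameters and the weights in separate files lets each be written by the tool best suited to it.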

matchzoo.engine.base_preprocessor module

BasePreprocessor defines the input and output for preprocessors.

class matchzoo.engine.base_preprocessor.BasePreprocessor

Bases: object

BasePreprocessor to handle input data.

A preprocessor should be used in two steps. First, fit, then, transform. fit collects information into context, which includes everything the preprocessor needs to transform together with other useful information for later use. fit will only change the preprocessor's inner state but not the input data. In contrast, transform returns a modified copy of the input data without changing the preprocessor's inner state.
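The two-step contract can be illustrated with a toy vocabulary builder. ToyVocabPreprocessor is a hypothetical class that works on plain lists of strings rather than DataPack objects; it only demonstrates that fit changes the context while transform returns new data.

```python
class ToyVocabPreprocessor:
    """Hypothetical preprocessor that maps tokens to integer ids."""

    def __init__(self):
        self.context = {}

    def fit(self, texts):
        """Collect a vocabulary into the context; texts are left untouched."""
        vocab = sorted({tok for text in texts for tok in text.split()})
        self.context['vocab'] = {tok: i + 1 for i, tok in enumerate(vocab)}
        return self

    def transform(self, texts):
        """Return new id sequences; unknown tokens map to 0."""
        vocab = self.context['vocab']
        return [[vocab.get(tok, 0) for tok in text.split()] for text in texts]

    def fit_transform(self, texts):
        """Fit on texts, then transform the same texts."""
        return self.fit(texts).transform(texts)


train = ['a b', 'b c']
prep = ToyVocabPreprocessor()
ids = prep.fit_transform(train)
```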

DATA_FILENAME = 'preprocessor.dill'
context

Return context.

fit(data_pack, verbose=1)

Fit parameters on input data.

This is an abstract method and needs to be implemented in the child class.

This method is expected to return itself as a callable object.

Parameters:
  • data_pack (DataPack) -- Datapack object to be fitted.
  • verbose -- Verbosity.
Return type:

BasePreprocessor

fit_transform(data_pack, verbose=1)

Call fit-transform.

Parameters: data_pack (DataPack) -- DataPack object to be processed.
Return type: DataPack
save(dirpath)

Save the preprocessor object.

A saved preprocessor is represented as a directory containing the context object (the parameters fitted on training data), which is saved by pickle.

Parameters: dirpath (Union[str, Path]) -- directory path of the saved preprocessor.
transform(data_pack, verbose=1)

Transform input data to expected manner.

This is an abstract method and needs to be implemented in the child class.

Parameters:
  • data_pack (DataPack) -- DataPack object to be transformed.
  • verbose -- Verbosity.
Return type:

DataPack

matchzoo.engine.base_preprocessor.load_preprocessor(dirpath)

Load the fitted context. The reverse function of save().

Parameters: dirpath (Union[str, Path]) -- directory path of the saved preprocessor.
Return type: DataPack
Returns: a preprocessor instance.
matchzoo.engine.base_preprocessor.validate_context(func)

Validate context in the preprocessor.
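The description above does not say what the validation checks. One plausible reading, sketched here purely as an assumption, is a decorator that refuses to run a method until fit has populated the context. toy_validate_context and ToyPreprocessor are hypothetical names.

```python
import functools


def toy_validate_context(func):
    """Refuse to run func until the preprocessor has a non-empty context."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not self.context:
            raise ValueError('Please call `fit` before `transform`.')
        return func(self, *args, **kwargs)
    return wrapper


class ToyPreprocessor:
    """Hypothetical preprocessor whose transform is guarded."""

    def __init__(self):
        self.context = {}

    def fit(self, data):
        self.context['size'] = len(data)
        return self

    @toy_validate_context
    def transform(self, data):
        return list(data)
```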

matchzoo.engine.base_task module

Base task.

class matchzoo.engine.base_task.BaseTask(loss=None, metrics=None)

Bases: abc.ABC

Base Task, shouldn't be used directly.

classmethod convert_metrics(metrics)

Convert metrics into properly formed list of metrics.

Examples

>>> BaseTask.convert_metrics(['mse'])
['mse']
>>> BaseTask.convert_metrics('map')
[mean_average_precision(0)]
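The list-normalising part of this conversion can be sketched as below. The sketch deliberately omits the per-metric parsing step, and the handling of None is an assumption. toy_convert_metrics is a hypothetical name.

```python
def toy_convert_metrics(metrics):
    """Wrap a single metric in a list; copy lists; treat None as empty."""
    if metrics is None:
        return []                 # assumption: no metrics means an empty list
    if not isinstance(metrics, list):
        return [metrics]          # a single metric becomes a one-item list
    return list(metrics)          # shallow copy so the caller's list is safe
```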
classmethod list_available_losses()
Return type: list
Returns: a list of available losses.
classmethod list_available_metrics()
Return type: list
Returns: a list of available metrics.
loss

return -- Loss used in the task.

metrics

return -- Metrics used in the task.

output_dtype

return -- output data type for specific task.

output_shape

return -- output shape of a single sample of the task.

Return type: tuple
matchzoo.engine.base_task.list_available_tasks(base=<class 'matchzoo.engine.base_task.BaseTask'>)
Return type: List[Type[BaseTask]]
Returns: a list of available task types.

matchzoo.engine.callbacks module

Callbacks.

class matchzoo.engine.callbacks.EvaluateAllMetrics(model, x, y, once_every=1, batch_size=32, model_save_path=None, verbose=1)

Bases: keras.callbacks.Callback

Callback to evaluate all metrics.

MatchZoo metrics cannot be evaluated batch-wise since they require dataset-level information. As a result, MatchZoo metrics are not evaluated automatically when a model is fit. When this callback is used, all metrics, including MatchZoo metrics and Keras metrics, are evaluated once every once_every epochs.

Parameters:
  • model (BaseModel) -- Model to evaluate.
  • x (Union[ndarray, List[ndarray]]) -- Input data.
  • y (ndarray) -- Labels.
  • once_every (int) -- Evaluation only triggers when epoch % once_every == 0. (default: 1, i.e. evaluate on every epoch's end)
  • batch_size (int) -- Number of samples per evaluation. This only affects the evaluation of Keras metrics, since MatchZoo metrics are always evaluated using the full data.
  • model_save_path (Optional[str]) -- Directory path to save the model after each evaluate callback, (default: None, i.e., no saving.)
  • verbose -- Verbosity.
on_epoch_end(epoch, logs=None)

Called at the end of an epoch.

Parameters:
  • epoch -- integer, index of epoch.
  • logs -- dictionary of logs.
Returns:

dictionary of logs.
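The trigger condition stated for once_every can be written out directly. This sketch follows the condition exactly as documented (epoch % once_every == 0); whether the library counts epochs from 0 or from 1 is not stated here, so the 0-based range below is an assumption. should_evaluate is a hypothetical helper.

```python
def should_evaluate(epoch, once_every=1):
    """Return True when evaluation triggers for this epoch index."""
    return epoch % once_every == 0


# With once_every=2, every second epoch index triggers an evaluation.
triggered = [e for e in range(6) if should_evaluate(e, once_every=2)]
```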

matchzoo.engine.hyper_spaces module

Hyper parameter search spaces wrapping hyperopt.

class matchzoo.engine.hyper_spaces.HyperoptProxy(hyperopt_func, **kwargs)

Bases: object

Hyperopt proxy class.

See hyperopt's documentation for more details: https://github.com/hyperopt/hyperopt/wiki/FMin

Reason for these wrappers:

A hyper space in hyperopt requires a label to instantiate. This label is used later as a reference to the original hyper space that was sampled. In matchzoo, hyper spaces are used in matchzoo.engine.Param. Only if a hyper space's label matches the name of its parent matchzoo.engine.Param can matchzoo correctly trace back which parameter was sampled. This could be done by asking the user to always use the same name for a parameter and its hyper space, but typos can occur. As a result, these wrappers hide the hyper spaces' labels and always bind each one to its parameter's name.
Examples::
>>> import matchzoo as mz
>>> from hyperopt.pyll.stochastic import sample
Basic Usage:
>>> model = mz.models.DenseBaseline()
>>> sample(model.params.hyper_space)  
{'mlp_num_layers': 1.0, 'mlp_num_units': 274.0}
Arithmetic Operations:
>>> new_space = 2 ** mz.hyper_spaces.quniform(2, 6)
>>> model.params.get('mlp_num_layers').hyper_space = new_space
>>> sample(model.params.hyper_space)  
{'mlp_num_layers': 8.0, 'mlp_num_units': 292.0}
convert(name)

Attach name as hyperopt.hp's label.

Parameters: name (str) --
Return type: Apply
Returns: a hyperopt ready search space
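The deferred-label idea behind the proxy can be sketched without hyperopt. ToyProxy and fake_quniform are hypothetical: fake_quniform merely stands in for a search-space constructor that takes a label as its first argument, and returns a dict so the binding is easy to inspect.

```python
class ToyProxy:
    """Store a space constructor and its arguments, but not its label."""

    def __init__(self, space_func, **kwargs):
        self._func = space_func
        self._kwargs = kwargs

    def convert(self, name):
        """Build the space, using the parameter name as the label."""
        return self._func(name, **self._kwargs)


def fake_quniform(label, low, high, q):
    """Hypothetical stand-in for a labelled search-space constructor."""
    return {'label': label, 'low': low, 'high': high, 'q': q}


proxy = ToyProxy(fake_quniform, low=1, high=5, q=1)
space = proxy.convert('mlp_num_layers')
```

Because the label is supplied only at conversion time, the user never types it twice, which removes the opportunity for a mismatch.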
class matchzoo.engine.hyper_spaces.choice(options)

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.choice() proxy.

class matchzoo.engine.hyper_spaces.quniform(low, high, q=1)

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.quniform() proxy.

class matchzoo.engine.hyper_spaces.uniform(low, high)

Bases: matchzoo.engine.hyper_spaces.HyperoptProxy

hyperopt.hp.uniform() proxy.

matchzoo.engine.param module

Parameter class.

class matchzoo.engine.param.Param(name, value=None, hyper_space=None, validator=None, desc=None)

Bases: object

Parameter class.

Basic usages with a name and value:

>>> param = Param('my_param', 10)
>>> param.name
'my_param'
>>> param.value
10

Use with a validator to make sure the parameter always keeps a valid value.

>>> param = Param(
...     name='my_param',
...     value=5,
...     validator=lambda x: 0 < x < 20
... )
>>> param.validator  
<function <lambda> at 0x...>
>>> param.value
5
>>> param.value = 10
>>> param.value
10
>>> param.value = -1
Traceback (most recent call last):
    ...
ValueError: Validator not satifised.
The validator's definition is as follows:
validator=lambda x: 0 < x < 20

Use with a hyper space. Setting up a hyper space for a parameter makes the parameter tunable in a matchzoo.engine.Tuner.

>>> from matchzoo.engine.hyper_spaces import quniform
>>> param = Param(
...     name='positive_num',
...     value=1,
...     hyper_space=quniform(low=1, high=5)
... )
>>> param.hyper_space  
<matchzoo.engine.hyper_spaces.quniform object at ...>
>>> from hyperopt.pyll.stochastic import sample
>>> hyperopt_space = param.hyper_space.convert(param.name)
>>> samples = [sample(hyperopt_space) for _ in range(64)]
>>> set(samples) == {1, 2, 3, 4, 5}
True

The boolean value of a Param instance is True only when the value is not None. This is because some falsy values, such as zero or an empty list, are valid parameter values. In other words, the boolean value indicates whether the parameter value is filled.

>>> param = Param('dropout')
>>> if param:
...     print('OK')
>>> param = Param('dropout', 0)
>>> if param:
...     print('OK')
OK

A _pre_assignment_hook is initialized as a data type converter if the value is set to a number, to keep the data type of the parameter consistent. This conversion supports Python built-in numbers, numpy numbers, and any number that inherits from numbers.Number.

>>> param = Param('float_param', 0.5)
>>> param.value = 10
>>> param.value
10.0
>>> type(param.value)
<class 'float'>
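The three behaviours described above (validation on assignment, "filled" truthiness, and numeric type locking) can be combined in one small sketch. ToyParam is a hypothetical class, not the library's Param; in particular, skipping validation for None and locking the type on the first numeric assignment are assumptions.

```python
import numbers


class ToyParam:
    """Hypothetical parameter with a validator and numeric type locking."""

    def __init__(self, name, value=None, validator=None):
        self.name = name
        self._validator = validator
        self._convert = None      # set once a numeric value is assigned
        self._value = None
        self.value = value        # goes through the setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if new_value is not None:
            if self._convert is not None:
                new_value = self._convert(new_value)   # keep the locked type
            if self._validator is not None and not self._validator(new_value):
                raise ValueError('Validator not satisfied.')
        self._value = new_value
        # Lock the numeric type on the first numeric assignment.
        if self._convert is None and isinstance(new_value, numbers.Number):
            self._convert = type(new_value)

    def __bool__(self):
        # "Filled" means "not None", so 0 and [] still count as filled.
        return self._value is not None
```

A rejected assignment raises before the stored value is touched, so the parameter keeps its last valid value.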
desc

return -- Parameter description.

hyper_space

return -- Hyper space of the parameter.

name

return -- Name of the parameter.

Return type: str
set_default(val, verbose=1)

Set the default value. This has no effect if the parameter already has a value.

Parameters:
  • val -- Default value to set.
  • verbose -- Verbosity.
validator

return -- Validator of the parameter.

Return type: Callable[[Any], bool]
value

return -- Value of the parameter.

Return type: Any

matchzoo.engine.param_table module

Parameters table class.

class matchzoo.engine.param_table.ParamTable

Bases: object

Parameter table class.

Example

>>> params = ParamTable()
>>> params.add(Param('ham', 'Parma Ham'))
>>> params.add(Param('egg', 'Over Easy'))
>>> params['ham']
'Parma Ham'
>>> params['egg']
'Over Easy'
>>> print(params)
ham                           Parma Ham
egg                           Over Easy
>>> params.add(Param('egg', 'Sunny side Up'))
Traceback (most recent call last):
    ...
ValueError: Parameter named egg already exists.
To re-assign parameter egg value, use `params["egg"] = value` instead.
add(param)
Parameters: param (Param) -- parameter to add.
completed()
Return type: bool
Returns: True if all params are filled, False otherwise.

Example

>>> import matchzoo
>>> model = matchzoo.models.Naive()
>>> model.params.completed()
False
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params.completed()
True
get(key)
Return type: Param
Returns: The parameter in the table named key.
hyper_space

return -- Hyper space of the table, a valid hyperopt graph.

Return type: dict
keys()
Return type: KeysView[~KT]
Returns: Parameter table keys.
set(key, param)

Set key to parameter param.
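The table behaviour described in this section (duplicate names rejected on add, re-assignment through indexing, and a completeness check) can be sketched with a plain dict. ToyParamTable is a hypothetical class that stores bare values instead of Param objects; raising KeyError when assigning to an unknown name is an assumption.

```python
class ToyParamTable:
    """Hypothetical name -> value table that rejects duplicate names."""

    def __init__(self):
        self._params = {}

    def add(self, name, value=None):
        """Register a new parameter; adding an existing name is an error."""
        if name in self._params:
            raise ValueError(f'Parameter named {name} already exists.')
        self._params[name] = value

    def __getitem__(self, name):
        return self._params[name]

    def __setitem__(self, name, value):
        if name not in self._params:
            raise KeyError(name)      # assumption: only known names are set
        self._params[name] = value

    def keys(self):
        return self._params.keys()

    def completed(self):
        """True when no parameter is left unfilled (None)."""
        return all(v is not None for v in self._params.values())


table = ToyParamTable()
table.add('ham', 'Parma Ham')
table.add('egg')                      # registered but not yet filled
before = table.completed()
table['egg'] = 'Over Easy'
after = table.completed()
```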

Module contents