matchzoo.engine package
Submodules
matchzoo.engine.base_metric module
Metric base class and some related utilities.
matchzoo.engine.base_metric.parse_metric(metric)
Parse input metric in any form into a BaseMetric instance.
Parameters: metric (Union[str, Type[BaseMetric], BaseMetric]) -- Input metric in any form.
Returns: A BaseMetric instance. As the first example shows, a string naming a keras native metric is returned unchanged.
Examples:
>>> from matchzoo import engine, metrics
Use str as keras native metrics:
>>> engine.parse_metric('mse')
'mse'
Use str as MatchZoo metrics:
>>> mz_metric = engine.parse_metric('map')
>>> type(mz_metric)
<class 'matchzoo.metrics.mean_average_precision.MeanAveragePrecision'>
Use matchzoo.engine.BaseMetric subclasses as MatchZoo metrics:
>>> type(engine.parse_metric(metrics.AveragePrecision))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
Use matchzoo.engine.BaseMetric instances as MatchZoo metrics:
>>> type(engine.parse_metric(metrics.AveragePrecision()))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
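The dispatch described above (string, class, or instance) can be sketched in plain Python. This is a minimal illustration, not MatchZoo's implementation: the stand-in classes, the ALIAS attribute, and the case-insensitive lookup are assumptions made for the example.

```python
from typing import Type, Union


class BaseMetric:
    """Stand-in for a metric base class (hypothetical, not MatchZoo's)."""

    ALIAS = "base_metric"


class MeanAveragePrecision(BaseMetric):
    """Stand-in metric with two hypothetical aliases."""

    ALIAS = ("mean_average_precision", "map")


def parse_metric(metric: Union[str, Type[BaseMetric], BaseMetric]):
    """Resolve a string, a metric class, or a metric instance."""
    if isinstance(metric, BaseMetric):
        return metric  # already an instance
    if isinstance(metric, type) and issubclass(metric, BaseMetric):
        return metric()  # instantiate the class with its defaults
    if isinstance(metric, str):
        for subclass in BaseMetric.__subclasses__():
            aliases = subclass.ALIAS
            if isinstance(aliases, str):
                aliases = (aliases,)
            if metric.lower() in aliases:
                return subclass()
        return metric  # unknown name: assumed to be a backend-native metric
    raise ValueError(f"cannot parse metric: {metric!r}")


print(parse_metric("mse"))                 # mse
print(type(parse_metric("MAP")).__name__)  # MeanAveragePrecision
```

Returning unknown strings unchanged mirrors the 'mse' example above, where the name is handed on to keras as-is.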
matchzoo.engine.base_metric.sort_and_couple(labels, scores)
Zip the labels with scores into a single list.
Return type: numpy array
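A pure-Python sketch of what coupling and sorting could look like. The descending order by score is an assumption suggested by the function name, and the sketch returns a list of tuples rather than a numpy array.

```python
from typing import List, Sequence, Tuple


def sort_and_couple(
    labels: Sequence[float], scores: Sequence[float]
) -> List[Tuple[float, float]]:
    """Pair each label with its score, highest score first (assumed order)."""
    coupled = list(zip(labels, scores))
    return sorted(coupled, key=lambda pair: pair[1], reverse=True)


ranked = sort_and_couple([0, 1, 0], [0.2, 0.9, 0.5])
print(ranked)  # [(1, 0.9), (0, 0.5), (0, 0.2)]
```

Ranking metrics can then walk the coupled list from the top and read the label at each rank.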
matchzoo.engine.base_model module
Base Model.
class matchzoo.engine.base_model.BaseModel(params=None, backend=None)
Bases: abc.ABC
Abstract base class of all matchzoo models.
BACKEND_WEIGHTS_FILENAME = 'backend_weights.h5'
PARAMS_FILENAME = 'params.dill'
backend
Return the model backend, a keras model instance.
Return type: Model
build()
Build the model. Each subclass needs to implement this method.
Example
>>> BaseModel()
Traceback (most recent call last):
...
TypeError: Can't instantiate abstract class BaseModel ...
>>> class MyModel(BaseModel):
...     def build(self):
...         pass
>>> assert MyModel()
compile()
Compile the model for training.
Only keras native metrics are compiled together with the backend. MatchZoo metrics are evaluated only through evaluate(). Note that keras counts the loss as one of the metrics, while MatchZoo's matchzoo.engine.BaseTask does not.
Examples
>>> from matchzoo import models
>>> model = models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params['task'].metrics = ['mse', 'map']
>>> model.params['task'].metrics
['mse', mean_average_precision(0)]
>>> model.build()
>>> model.compile()
evaluate(x, y, batch_size=128, verbose=1)
Evaluate the model.
See keras.models.Model.evaluate() for more details.
Parameters:
- x (Union[ndarray, List[ndarray], Dict[str, ndarray]]) -- input data
- y (ndarray) -- labels
- batch_size (int) -- number of samples per evaluation batch
- verbose (int) -- verbosity mode, 0 or 1
Return type: Dict[str, float]
Returns: a dictionary mapping each metric to its scalar value. For the keras native metrics, the attribute model.backend.metrics_names gives the display labels of the underlying scalar outputs.
Examples:
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> data_pack = preprocessor.fit_transform(data_pack)
>>> m = mz.models.DenseBaseline()
>>> m.params['task'] = mz.tasks.Ranking()
>>> m.params['task'].metrics = [
...     'acc', 'mse', 'mae', 'ce',
...     'average_precision', 'precision', 'dcg', 'ndcg',
...     'mean_reciprocal_rank', 'mean_average_precision', 'mrr',
...     'map', 'MAP',
...     mz.metrics.AveragePrecision(threshold=1),
...     mz.metrics.Precision(k=2, threshold=2),
...     mz.metrics.DiscountedCumulativeGain(k=2),
...     mz.metrics.NormalizedDiscountedCumulativeGain(
...         k=3, threshold=-1),
...     mz.metrics.MeanReciprocalRank(threshold=2),
...     mz.metrics.MeanAveragePrecision(threshold=3)
... ]
>>> m.guess_and_fill_missing_params(verbose=0)
>>> m.build()
>>> m.compile()
>>> x, y = data_pack.unpack()
>>> evals = m.evaluate(x, y, verbose=0)
>>> type(evals)
<class 'dict'>
fit(x, y, batch_size=128, epochs=1, verbose=1, **kwargs)
Fit the model.
See keras.models.Model.fit() for more details.
Parameters:
- x (Union[ndarray, List[ndarray]]) -- input data.
- y (ndarray) -- labels.
- batch_size (int) -- number of samples per gradient update.
- epochs (int) -- number of epochs to train the model.
- verbose (int) -- 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Keyword arguments not listed above are passed on to keras's fit.
Return type: History
Returns: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
fit_generator(generator, epochs=1, verbose=1, **kwargs)
Fit the model with a matchzoo generator.
See keras.models.Model.fit_generator() for more details.
Parameters:
- generator (DataGenerator) -- A generator, an instance of engine.DataGenerator.
- epochs (int) -- Number of epochs to train the model.
- verbose (int) -- 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Return type: History
Returns: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
classmethod get_default_params(with_embedding=False, with_multi_layer_perceptron=False)
Model default parameters.
The common usage is to instantiate matchzoo.engine.ModelParams first, then set the model-specific parameters.
Examples
>>> class MyModel(BaseModel):
...     def build(self):
...         print(self._params['num_eggs'], 'eggs')
...         print('and', self._params['ham_type'])
...
...     @classmethod
...     def get_default_params(cls):
...         params = engine.ParamTable()
...         params.add(engine.Param('num_eggs', 512))
...         params.add(engine.Param('ham_type', 'Parma Ham'))
...         return params
>>> my_model = MyModel()
>>> my_model.build()
512 eggs
and Parma Ham
Note that all parameters must be serialisable for the entire model to be serialisable. It is therefore strongly recommended to store parameters in python native data types.
Return type: ParamTable
Returns: model parameters
classmethod get_default_preprocessor()
Model default preprocessor.
The preprocessor's transform should produce a correctly shaped data pack that can be used for training. Some extra configuration (e.g. setting input_shapes in matchzoo.models.DSSMModel) may be required on the user's end.
Return type: BasePreprocessor
Returns: Default preprocessor.
guess_and_fill_missing_params(verbose=1)
Guess and fill missing parameters in params.
Use this method to fill in hyper parameters automatically. This involves some guessing, so the values it fills in could be wrong. For example, the default task is Ranking; if it is not set to Classification manually for data packs prepared for classification, the shape of the model output will not match the data.
Parameters: verbose -- Verbosity.
load_embedding_matrix(embedding_matrix, name='embedding')
Load an embedding matrix.
Load an embedding matrix into the model's embedding layer. The name of the embedding layer is specified by name. For models with only one embedding layer, set name='embedding' when creating the keras layer, and use the default name when loading the matrix. For models with more than one embedding layer, initialize the keras layers with different layer names, and set name accordingly to load a matrix into the chosen layer.
Parameters:
- embedding_matrix (ndarray) -- Embedding matrix to be loaded.
- name (str) -- Name of the layer. (default: 'embedding')
params
Return the model parameters.
Return type: ParamTable
predict(x, batch_size=128)
Generate output predictions for the input samples.
See keras.models.Model.predict() for more details.
Parameters:
- x (Union[ndarray, List[ndarray]]) -- input data
- batch_size -- number of samples per prediction batch
Return type: ndarray
Returns: numpy array(s) of predictions
save(dirpath)
Save the model.
A saved model is represented as a directory with two files. One is a model parameters file saved by pickle (see PARAMS_FILENAME), and the other is a model h5 file saved by keras (see BACKEND_WEIGHTS_FILENAME).
Parameters: dirpath (Union[str, Path]) -- directory path of the saved model
matchzoo.engine.base_model.load_model(dirpath)
Load a model. The reverse function of BaseModel.save().
Parameters: dirpath (Union[str, Path]) -- directory path of the saved model
Return type: BaseModel
Returns: a BaseModel instance
matchzoo.engine.base_preprocessor module
BasePreprocessor defines the input and output for processors.
class matchzoo.engine.base_preprocessor.BasePreprocessor
Bases: object
BasePreprocessor to handle input data.
A preprocessor should be used in two steps: first fit, then transform. fit collects information into context, which includes everything the preprocessor needs to transform, together with other useful information for later use. fit only changes the preprocessor's inner state, not the input data. In contrast, transform returns a modified copy of the input data without changing the preprocessor's inner state.
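The fit/transform contract described above can be illustrated with a toy preprocessor. This is a sketch under stated assumptions: VocabPreprocessor and its token-to-id context are hypothetical and are not part of MatchZoo.

```python
from typing import Dict, List


class VocabPreprocessor:
    """Toy preprocessor (hypothetical): maps tokens to integer ids."""

    def __init__(self) -> None:
        self.context: Dict[str, Dict[str, int]] = {}

    def fit(self, texts: List[List[str]]) -> "VocabPreprocessor":
        # fit only updates the inner state; the input is left untouched.
        vocab = sorted({token for text in texts for token in text})
        self.context["vocab"] = {tok: i + 1 for i, tok in enumerate(vocab)}
        return self

    def transform(self, texts: List[List[str]]) -> List[List[int]]:
        # transform reads the context and builds new lists from the input.
        if "vocab" not in self.context:
            raise ValueError("Call fit before transform.")
        vocab = self.context["vocab"]
        return [[vocab.get(token, 0) for token in text] for text in texts]

    def fit_transform(self, texts: List[List[str]]) -> List[List[int]]:
        return self.fit(texts).transform(texts)


data = [["hello", "world"], ["hello", "matchzoo"]]
pre = VocabPreprocessor()
print(pre.fit_transform(data))  # [[1, 3], [1, 2]]
```

Because fit returns the preprocessor itself, fit and transform can be chained, which is what fit_transform does here. Unseen tokens map to 0 in this sketch.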
DATA_FILENAME = 'preprocessor.dill'
context
Return context.
fit(data_pack, verbose=1)
Fit parameters on input data.
This is an abstract method that must be implemented in the child class.
It is expected to return the preprocessor itself as a callable object.
Parameters:
- data_pack (DataPack) -- DataPack object to be fitted.
- verbose -- Verbosity.
Return type: BasePreprocessor
fit_transform
(data_pack, verbose=1)¶ Call fit-transform.
参数: data_pack ( DataPack
) --DataPack
object to be processed.返回类型: DataPack
save(dirpath)
Save the DSSMPreprocessor object.
A saved DSSMPreprocessor is represented as a directory containing the context object (the parameters fitted on the training data), which is saved by pickle.
Parameters: dirpath (Union[str, Path]) -- directory path of the saved DSSMPreprocessor.
matchzoo.engine.base_preprocessor.load_preprocessor(dirpath)
Load the fitted context. The reverse function of save().
Parameters: dirpath (Union[str, Path]) -- directory path of the saved model.
Return type: DataPack
Returns: a DSSMPreprocessor instance.
matchzoo.engine.base_preprocessor.validate_context(func)
Validate context in the preprocessor.
matchzoo.engine.base_task module
Base task.
class matchzoo.engine.base_task.BaseTask(loss=None, metrics=None)
Bases: abc.ABC
Base Task, shouldn't be used directly.
classmethod convert_metrics(metrics)
Convert metrics into a properly formed list of metrics.
Examples
>>> BaseTask.convert_metrics(['mse'])
['mse']
>>> BaseTask.convert_metrics('map')
[mean_average_precision(0)]
classmethod list_available_losses()
Return type: list
Returns: a list of available losses.
classmethod list_available_metrics()
Return type: list
Returns: a list of available metrics.
loss
Return the loss used in the task.
metrics
Return the metrics used in the task.
output_dtype
Return the output data type for the specific task.
output_shape
Return the output shape of a single sample of the task.
Return type: tuple
matchzoo.engine.callbacks module
Callbacks.
class matchzoo.engine.callbacks.EvaluateAllMetrics(model, x, y, once_every=1, batch_size=32, model_save_path=None, verbose=1)
Bases: keras.callbacks.Callback
Callback to evaluate all metrics.
MatchZoo metrics cannot be evaluated batch-wise since they require dataset-level information. As a result, MatchZoo metrics are not evaluated automatically when a model is fit. When this callback is used, all metrics, including MatchZoo metrics and Keras metrics, are evaluated once every once_every epochs.
Parameters:
- model (BaseModel) -- Model to evaluate.
- x (Union[ndarray, List[ndarray]]) -- Input data to evaluate on.
- y (ndarray) -- Labels.
- once_every (int) -- Evaluation only triggers when epoch % once_every == 0. (default: 1, i.e. evaluate at the end of every epoch)
- batch_size (int) -- Number of samples per evaluation. This only affects the evaluation of Keras metrics, since MatchZoo metrics are always evaluated using the full data.
- model_save_path (Optional[str]) -- Directory path to save the model after each evaluate callback. (default: None, i.e. no saving)
- verbose -- Verbosity.
on_epoch_end(epoch, logs=None)
Called at the end of an epoch.
Parameters:
- epoch -- integer, index of epoch.
- logs -- dictionary of logs.
Returns: dictionary of logs.
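The once_every trigger condition quoted above can be checked in isolation. This is a small sketch that assumes zero-based epoch indices, as Keras passes them to on_epoch_end; should_evaluate is a hypothetical helper, not a MatchZoo function.

```python
def should_evaluate(epoch: int, once_every: int) -> bool:
    """Return True when the documented trigger condition holds."""
    return epoch % once_every == 0


triggered = [epoch for epoch in range(6) if should_evaluate(epoch, once_every=2)]
print(triggered)  # [0, 2, 4]
```

Under that assumption, once_every=2 evaluates after the first, third, and fifth epochs rather than after the second, fourth, and sixth.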
matchzoo.engine.hyper_spaces module
Hyper parameter search spaces wrapping hyperopt.
class matchzoo.engine.hyper_spaces.HyperoptProxy(hyperopt_func, **kwargs)
Bases: object
Hyperopt proxy class.
See hyperopt's documentation for more details: https://github.com/hyperopt/hyperopt/wiki/FMin
Reason for these wrappers:
A hyper space in hyperopt requires a label to instantiate. This label is used later as a reference to the original hyper space that is sampled. In matchzoo, hyper spaces are used in matchzoo.engine.Param. Only if a hyper space's label matches the name of its parent matchzoo.engine.Param can matchzoo correctly back-reference the parameter that was sampled. This could be done by asking the user to always use the same name for a parameter and its hyper space, but typos can occur. As a result, these wrappers are created to hide the hyper spaces' labels and always bind them correctly to their parameter's name.
Examples:
>>> import matchzoo as mz
>>> from hyperopt.pyll.stochastic import sample
Basic Usage:
>>> model = mz.models.DenseBaseline()
>>> sample(model.params.hyper_space)
{'mlp_num_layers': 1.0, 'mlp_num_units': 274.0}
Arithmetic Operations:
>>> new_space = 2 ** mz.hyper_spaces.quniform(2, 6)
>>> model.params.get('mlp_num_layers').hyper_space = new_space
>>> sample(model.params.hyper_space)
{'mlp_num_layers': 8.0, 'mlp_num_units': 292.0}
convert(name)
Attach name as hyperopt.hp's label.
Parameters: name (str) --
Return type: Apply
Returns: a hyperopt ready search space
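The deferred-label idea behind these wrappers can be sketched without hyperopt. In this illustration fake_quniform is a hypothetical stand-in for a label-first search-space constructor, and SpaceProxy and this Param are simplified stand-ins rather than the MatchZoo classes.

```python
from typing import Any, Callable, Dict, Optional


def fake_quniform(label: str, low: float, high: float, q: float) -> Dict[str, Any]:
    """Hypothetical stand-in for a label-first search-space constructor."""
    return {"label": label, "kind": "quniform", "low": low, "high": high, "q": q}


class SpaceProxy:
    """Stores the constructor and its arguments; the label is bound later."""

    def __init__(self, func: Callable[..., Any], **kwargs: Any) -> None:
        self._func = func
        self._kwargs = kwargs

    def convert(self, name: str) -> Any:
        # The parameter name becomes the label, so the two cannot drift apart.
        return self._func(name, **self._kwargs)


class Param:
    """Minimal parameter holding a name and an optional proxy."""

    def __init__(self, name: str, hyper_space: Optional[SpaceProxy] = None) -> None:
        self.name = name
        self.hyper_space = hyper_space


param = Param("mlp_num_layers", SpaceProxy(fake_quniform, low=1, high=5, q=1))
space = param.hyper_space.convert(param.name)
print(space["label"])  # mlp_num_layers
```

The user never types the label, so a typo cannot desynchronise a parameter from its search space.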
class matchzoo.engine.hyper_spaces.choice(options)
Bases: matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.choice() proxy.
class matchzoo.engine.hyper_spaces.quniform(low, high, q=1)
Bases: matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.quniform() proxy.
class matchzoo.engine.hyper_spaces.uniform(low, high)
Bases: matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.uniform() proxy.
matchzoo.engine.param module
Parameter class.
class matchzoo.engine.param.Param(name, value=None, hyper_space=None, validator=None, desc=None)
Bases: object
Parameter class.
Basic usage with a name and value:
>>> param = Param('my_param', 10)
>>> param.name
'my_param'
>>> param.value
10
Use with a validator to make sure the parameter always keeps a valid value.
>>> param = Param(
...     name='my_param',
...     value=5,
...     validator=lambda x: 0 < x < 20
... )
>>> param.validator
<function <lambda> at 0x...>
>>> param.value
5
>>> param.value = 10
>>> param.value
10
>>> param.value = -1
Traceback (most recent call last):
    ...
ValueError: Validator not satifised.
The validator's definition is as follows:
validator=lambda x: 0 < x < 20
Use with a hyper space. Setting up a hyper space for a parameter makes the parameter tunable in a matchzoo.engine.Tuner.
>>> from matchzoo.engine.hyper_spaces import quniform
>>> param = Param(
...     name='positive_num',
...     value=1,
...     hyper_space=quniform(low=1, high=5)
... )
>>> param.hyper_space
<matchzoo.engine.hyper_spaces.quniform object at ...>
>>> from hyperopt.pyll.stochastic import sample
>>> hyperopt_space = param.hyper_space.convert(param.name)
>>> samples = [sample(hyperopt_space) for _ in range(64)]
>>> set(samples) == {1, 2, 3, 4, 5}
True
The boolean value of a Param instance is True only when the value is not None. This is because some falsy defaults, such as zero or an empty list, are valid parameter values. In other words, the boolean value answers the question "is the parameter value filled?".
>>> param = Param('dropout')
>>> if param:
...     print('OK')
>>> param = Param('dropout', 0)
>>> if param:
...     print('OK')
OK
If the value is set to a number, a _pre_assignment_hook is initialized as a data type converter to keep the parameter's data type consistent. This conversion supports python built-in numbers, numpy numbers, and any number that inherits from numbers.Number.
>>> param = Param('float_param', 0.5)
>>> param.value = 10
>>> param.value
10.0
>>> type(param.value)
<class 'float'>
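The three behaviours described above (validation on assignment, "filled" truthiness, and numeric type consistency) can be combined in a minimal stand-alone sketch. It is an illustration under assumptions, not the MatchZoo class: the _hook attribute, the shortened error message, the exclusion of bool, and skipping validation for None are choices made for this example.

```python
import numbers
from typing import Any, Callable, Optional


class Param:
    """Minimal parameter with a validator and numeric type consistency."""

    def __init__(self, name: str, value: Any = None,
                 validator: Optional[Callable[[Any], bool]] = None) -> None:
        self.name = name
        self._validator = validator
        self._hook = None
        self._value = None
        # bool is a Number subclass; excluding it is a sketch-level choice.
        if isinstance(value, numbers.Number) and not isinstance(value, bool):
            self._hook = type(value)  # e.g. float keeps later values as floats
        self.value = value

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new_value: Any) -> None:
        if new_value is not None:  # None means "not filled" and is not checked
            if self._hook is not None:
                new_value = self._hook(new_value)
            if self._validator is not None and not self._validator(new_value):
                raise ValueError("Validator not satisfied.")
        self._value = new_value

    def __bool__(self) -> bool:
        # Falsy values such as 0 or [] still count as "filled".
        return self._value is not None


param = Param("float_param", 0.5)
param.value = 10
print(param.value)  # 10.0
print(bool(Param("dropout")), bool(Param("dropout", 0)))  # False True
```

Running the conversion before the validator means the validator always sees a value of the parameter's own type.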
desc
Return the parameter description.
hyper_space
Return the hyper space of the parameter.
name
Return the name of the parameter.
Return type: str
set_default(val, verbose=1)
Set the default value. Has no effect if the parameter already has a value.
Parameters:
- val -- Default value to set.
- verbose -- Verbosity.
validator
Return the validator of the parameter.
Return type: Callable[[Any], bool]
value
Return the value of the parameter.
Return type: Any
matchzoo.engine.param_table module
Parameters table class.
class matchzoo.engine.param_table.ParamTable
Bases: object
Parameter table class.
Example
>>> params = ParamTable()
>>> params.add(Param('ham', 'Parma Ham'))
>>> params.add(Param('egg', 'Over Easy'))
>>> params['ham']
'Parma Ham'
>>> params['egg']
'Over Easy'
>>> print(params)
ham                           Parma Ham
egg                           Over Easy
>>> params.add(Param('egg', 'Sunny side Up'))
Traceback (most recent call last):
    ...
ValueError: Parameter named egg already exists.
To re-assign parameter egg value, use `params["egg"] = value` instead.
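The add-versus-reassign rule from the example above, together with the completed() check documented for this class, can be sketched as a small stand-alone table. For brevity this hypothetical version stores plain values instead of Param objects, and its error messages are simplified.

```python
from typing import Any, Dict


class ParamTable:
    """Minimal table mapping parameter names to values."""

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {}

    def add(self, name: str, value: Any = None) -> None:
        # Adding an existing name is rejected; re-assignment uses item syntax.
        if name in self._params:
            raise ValueError(f"Parameter named {name} already exists.")
        self._params[name] = value

    def __getitem__(self, name: str) -> Any:
        return self._params[name]

    def __setitem__(self, name: str, value: Any) -> None:
        if name not in self._params:  # sketch-level choice: no implicit add
            raise KeyError(name)
        self._params[name] = value

    def completed(self) -> bool:
        # "Filled" means the value is not None, so 0 or [] still count.
        return all(value is not None for value in self._params.values())


params = ParamTable()
params.add("ham", "Parma Ham")
params.add("egg")
print(params.completed())  # False
params["egg"] = "Over Easy"
print(params.completed())  # True
```

Keeping add and item assignment separate makes an accidental overwrite of an existing parameter an explicit error.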
completed()
Return type: bool
Returns: True if all params are filled, False otherwise.
Example
>>> import matchzoo
>>> model = matchzoo.models.Naive()
>>> model.params.completed()
False
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params.completed()
True
hyper_space
Return the hyper space of the table, a valid hyperopt graph.
Return type: dict
keys()
Return type: KeysView[~KT]
Returns: Parameter table keys.
set(key, param)
Set key to parameter param.
-