matchzoo.engine package
Submodules
matchzoo.engine.base_metric module
Metric base class and some related utilities.
class matchzoo.engine.base_metric.BaseMetric
Bases: abc.ABC
Metric base class.
ALIAS = 'base_metric'
matchzoo.engine.base_metric.sort_and_couple(labels, scores)
Zip the labels with scores into a single list.
Return type: numpy.array
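A minimal pure-Python sketch of the idea behind sort_and_couple, assuming (from the function's name) that it pairs each label with its score and orders the pairs by score, highest first, which is the order ranking metrics consume. The real function returns a numpy array; the hypothetical `sort_and_couple_sketch` returns a list of tuples so that it runs without numpy.

```python
from typing import List, Sequence, Tuple


def sort_and_couple_sketch(labels: Sequence[float],
                           scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Hypothetical stand-in for sort_and_couple (descending order is assumed)."""
    # Pair each label with the score the model assigned to the same document.
    couple = list(zip(labels, scores))
    # Rank pairs by score, highest first; sorted() stays stable even with
    # reverse=True, so tied scores keep their input order.
    return sorted(couple, key=lambda pair: pair[1], reverse=True)


print(sort_and_couple_sketch([0, 1, 0], [0.2, 0.9, 0.5]))
# [(1, 0.9), (0, 0.5), (0, 0.2)]
```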
matchzoo.engine.base_model module
Base Model.
class matchzoo.engine.base_model.BaseModel(params=None, backend=None)
Bases: abc.ABC
Abstract base class of all MatchZoo models.
MatchZoo models are wrappers over Keras models; the underlying Keras model can be accessed via model.backend. params is a set of model hyper-parameters that deterministically build a model. In other words, params['model_class'](params=params) with the same params always creates models with the same structure.
Parameters:
- params (Optional[ParamTable]) – Model hyper-parameters. (default: return value from get_default_params())
- backend (Optional[Model]) – A Keras model to use as the model backend. Usually not passed as an argument.
Example
>>> BaseModel()  # doctest: +ELLIPSIS
Traceback (most recent call last):
...
TypeError: Can't instantiate abstract class BaseModel ...
>>> class MyModel(BaseModel):
...     def build(self):
...         pass
>>> isinstance(MyModel(), BaseModel)
True
BACKEND_WEIGHTS_FILENAME = 'backend_weights.h5'
PARAMS_FILENAME = 'params.dill'
backend
Return the model backend, a Keras model instance.
Return type: Model
build()
Build the model. Each subclass needs to implement this method.
compile()
Compile the model for training.
Only Keras native metrics are compiled together with the backend. MatchZoo metrics are evaluated only through evaluate(). Note that Keras counts the loss as one of the metrics, while MatchZoo's matchzoo.engine.BaseTask does not.
Examples
>>> from matchzoo import models
>>> model = models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params['task'].metrics = ['mse', 'map']
>>> model.params['task'].metrics
['mse', mean_average_precision(0.0)]
>>> model.build()
>>> model.compile()
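The split between Keras native metrics and MatchZoo metrics can be illustrated with a small stdlib-only sketch. It assumes, as the example above suggests, that plain strings such as 'mse' go to Keras while metric objects are held back for evaluate(). `split_metrics` and `FakeMatchZooMetric` are hypothetical names, not part of MatchZoo.

```python
from typing import List, Tuple, Union


class FakeMatchZooMetric:
    """Hypothetical stand-in for a MatchZoo BaseMetric instance."""

    def __init__(self, alias: str) -> None:
        self.alias = alias

    def __repr__(self) -> str:
        return f"{self.alias}(0.0)"


Metric = Union[str, FakeMatchZooMetric]


def split_metrics(metrics: List[Metric]) -> Tuple[List[str], List[FakeMatchZooMetric]]:
    """Separate Keras-native metric names from MatchZoo metric objects."""
    keras_metrics = [m for m in metrics if isinstance(m, str)]
    matchzoo_metrics = [m for m in metrics if not isinstance(m, str)]
    return keras_metrics, matchzoo_metrics


keras_part, mz_part = split_metrics(['mse', FakeMatchZooMetric('mean_average_precision')])
print(keras_part)  # ['mse'] (would be compiled with the backend)
print(mz_part)     # [mean_average_precision(0.0)] (left for evaluate())
```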
evaluate(x, y, batch_size=128)
Evaluate the model.
Parameters:
- x (Dict[str, ndarray]) – Input data.
- y (ndarray) – Labels.
- batch_size (int) – Number of samples per batch when predicting for evaluation. (default: 128)
Examples:
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> data_pack = preprocessor.fit_transform(data_pack, verbose=0)
>>> m = mz.models.DenseBaseline()
>>> m.params['task'] = mz.tasks.Ranking()
>>> m.params['task'].metrics = [
...     'acc', 'mse', 'mae', 'ce',
...     'average_precision', 'precision', 'dcg', 'ndcg',
...     'mean_reciprocal_rank', 'mean_average_precision', 'mrr',
...     'map', 'MAP',
...     mz.metrics.AveragePrecision(threshold=1),
...     mz.metrics.Precision(k=2, threshold=2),
...     mz.metrics.DiscountedCumulativeGain(k=2),
...     mz.metrics.NormalizedDiscountedCumulativeGain(
...         k=3, threshold=-1),
...     mz.metrics.MeanReciprocalRank(threshold=2),
...     mz.metrics.MeanAveragePrecision(threshold=3)
... ]
>>> m.guess_and_fill_missing_params(verbose=0)
>>> m.build()
>>> m.compile()
>>> x, y = data_pack.unpack()
>>> evals = m.evaluate(x, y)
>>> type(evals)
<class 'dict'>
Return type: Dict[BaseMetric, float]
evaluate_generator(generator, batch_size=128)
Evaluate the model.
Parameters:
- generator (DataGenerator) – DataGenerator to evaluate.
- batch_size (int) – Batch size. (default: 128)
Return type: Dict[BaseMetric, float]
fit(x, y, batch_size=128, epochs=1, verbose=1, **kwargs)
Fit the model.
See keras.models.Model.fit() for more details.
Parameters:
- x (Union[ndarray, List[ndarray], dict]) – Input data.
- y (ndarray) – Labels.
- batch_size (int) – Number of samples per gradient update.
- epochs (int) – Number of epochs to train the model.
- verbose (int) – 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Keyword arguments not listed above are propagated to Keras's fit.
Return type: History
Returns: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
fit_generator(generator, epochs=1, verbose=1, **kwargs)
Fit the model with a MatchZoo generator.
See keras.models.Model.fit_generator() for more details.
Parameters:
- generator (DataGenerator) – A generator, an instance of engine.DataGenerator.
- epochs (int) – Number of epochs to train the model.
- verbose (int) – 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Return type: History
Returns: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
classmethod get_default_params(with_embedding=False, with_multi_layer_perceptron=False)
Model default parameters.
The common usage is to instantiate matchzoo.engine.ModelParams first, then set the model-specific parameters.
Examples
>>> class MyModel(BaseModel):
...     def build(self):
...         print(self._params['num_eggs'], 'eggs')
...         print('and', self._params['ham_type'])
...
...     @classmethod
...     def get_default_params(cls):
...         params = ParamTable()
...         params.add(Param('num_eggs', 512))
...         params.add(Param('ham_type', 'Parma Ham'))
...         return params
>>> my_model = MyModel()
>>> my_model.build()
512 eggs
and Parma Ham
Note that all parameters must be serialisable for the entire model to be serialisable. Therefore, it's strongly recommended to use Python native data types to store parameters.
Return type: ParamTable
Returns: Model parameters.
classmethod get_default_preprocessor()
Model default preprocessor.
The preprocessor's transform should produce a correctly shaped data pack that can be used for training. Some extra configuration (e.g. setting input_shapes in matchzoo.models.DSSMModel) may be required on the user's end.
Return type: BasePreprocessor
Returns: Default preprocessor.
get_embedding_layer(name='embedding')
Get the embedding layer.
All MatchZoo models with a single embedding layer set the embedding layer name to embedding, and this method should return that layer.
Parameters: name (str) – Name of the embedding layer. (default: embedding)
Return type: Layer
guess_and_fill_missing_params(verbose=1)
Guess and fill missing parameters in params.
Use this method to automatically fill in other hyper-parameters. This involves some guessing, so the parameters it fills could be wrong. For example, the default task is Ranking; if we do not manually set it to Classification for data packs prepared for classification, the shape of the model output and the shape of the data will mismatch.
Parameters: verbose – Verbosity.
load_embedding_matrix(embedding_matrix, name='embedding')
Load an embedding matrix.
Load an embedding matrix into the model's embedding layer. The name of the embedding layer is specified by name. For models with only one embedding layer, set name='embedding' when creating the Keras layer, and use the default name when loading the matrix. For models with more than one embedding layer, initialize the Keras layers with different names, and set name accordingly to load a matrix into the chosen layer.
Parameters:
- embedding_matrix (ndarray) – Embedding matrix to be loaded.
- name (str) – Name of the layer. (default: 'embedding')
params
Model parameters.
Return type: ParamTable
predict(x, batch_size=128)
Generate output predictions for the input samples.
See keras.models.Model.predict() for more details.
Parameters:
- x (Dict[str, ndarray]) – Input data.
- batch_size – Number of samples per batch.
Return type: ndarray
Returns: numpy array(s) of predictions.
save(dirpath)
Save the model.
A saved model is represented as a directory with two files. One is a model parameters file (PARAMS_FILENAME) saved by pickle, and the other is a model h5 file (BACKEND_WEIGHTS_FILENAME) saved by Keras.
Parameters: dirpath (Union[str, Path]) – Directory path of the saved model.
Example
>>> import matchzoo as mz
>>> model = mz.models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
>>> model.save('temp-model')
>>> import shutil
>>> shutil.rmtree('temp-model')
matchzoo.engine.base_model.load_model(dirpath)
Load a model. The reverse function of BaseModel.save().
Parameters: dirpath (Union[str, Path]) – Directory path of the saved model.
Return type: BaseModel
Returns: A BaseModel instance.
Example
>>> import matchzoo as mz
>>> model = mz.models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
>>> model.save('my-model')
>>> model.params.keys() == mz.load_model('my-model').params.keys()
True
>>> import shutil
>>> shutil.rmtree('my-model')
matchzoo.engine.base_preprocessor module
BasePreprocessor defines the input and output for processors.
class matchzoo.engine.base_preprocessor.BasePreprocessor
Bases: object
BasePreprocessor to handle input data.
A preprocessor should be used in two steps: first fit, then transform. fit collects information into context, which includes everything the preprocessor needs to transform, together with other useful information for later use. fit only changes the preprocessor's inner state, not the input data. In contrast, transform returns a modified copy of the input data without changing the preprocessor's inner state.
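The fit/transform contract described above can be sketched without MatchZoo. The hypothetical `VocabPreprocessor` below works on a plain list of strings rather than a DataPack: fit only fills `context` (here, a token vocabulary) and returns self, while transform builds a new list and leaves both its input and the context untouched.

```python
from typing import Dict, List


class VocabPreprocessor:
    """Hypothetical toy preprocessor following the fit-then-transform contract."""

    def __init__(self) -> None:
        self.context: Dict[str, Dict[str, int]] = {}

    def fit(self, texts: List[str]) -> "VocabPreprocessor":
        # Only the preprocessor's inner state changes; `texts` is not modified.
        vocab: Dict[str, int] = {}
        for text in texts:
            for token in text.split():
                vocab.setdefault(token, len(vocab) + 1)  # index 0 is kept for unknown tokens
        self.context['vocab'] = vocab
        return self

    def transform(self, texts: List[str]) -> List[List[int]]:
        # Return a new, transformed copy; neither `texts` nor `context` changes.
        vocab = self.context['vocab']
        return [[vocab.get(token, 0) for token in text.split()] for text in texts]

    def fit_transform(self, texts: List[str]) -> List[List[int]]:
        return self.fit(texts).transform(texts)


train = ['how are you', 'how old are you']
prep = VocabPreprocessor()
print(prep.fit_transform(train))        # [[1, 2, 3], [1, 4, 2, 3]]
print(prep.transform(['who are you']))  # [[0, 2, 3]]
```

Keeping everything learned in a single context dict is what lets a preprocessor be saved and reloaded as one object.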
DATA_FILENAME = 'preprocessor.dill'
context
Return context.
fit(data_pack, verbose=1)
Fit parameters on input data.
This method is an abstract base method and needs to be implemented in the child class.
This method is expected to return itself as a callable object.
Parameters:
- data_pack (DataPack) – DataPack object to be fitted.
- verbose (int) – Verbosity.
fit_transform(data_pack, verbose=1)
Call fit, then transform.
Parameters:
- data_pack (DataPack) – DataPack object to be processed.
- verbose (int) – Verbosity.
save(dirpath)
Save the preprocessor object.
A saved preprocessor is represented as a directory containing the context object (parameters fitted on training data), saved by pickle.
Parameters: dirpath (Union[str, Path]) – Directory path of the saved preprocessor.
transform(data_pack, verbose=1)
Transform input data into the expected format.
This method is an abstract base method and needs to be implemented in the child class.
Parameters:
- data_pack (DataPack) – DataPack object (or list of text-left, text-right tuples) to be transformed.
- verbose (int) – Verbosity.
matchzoo.engine.base_preprocessor.load_preprocessor(dirpath)
Load the fitted context. The reverse function of save().
Parameters: dirpath (Union[str, Path]) – Directory path of the saved preprocessor.
Return type: BasePreprocessor
Returns: A preprocessor instance.
matchzoo.engine.base_preprocessor.validate_context(func)
Validate context in the preprocessor.
matchzoo.engine.base_task module
Base task.
class matchzoo.engine.base_task.BaseTask(loss=None, metrics=None)
Bases: abc.ABC
Base task; shouldn't be used directly.
classmethod list_available_losses()
Return type: list
Returns: A list of available losses.
classmethod list_available_metrics()
Return type: list
Returns: A list of available metrics.
loss
Loss used in the task.
metrics
Metrics used in the task.
output_dtype
Output data type for the specific task.
output_shape
Output shape of a single sample of the task.
Return type: tuple
matchzoo.engine.callbacks module
Callbacks.
class matchzoo.engine.callbacks.EvaluateAllMetrics(model, x, y, once_every=1, batch_size=128, model_save_path=None, verbose=1)
Bases: keras.callbacks.Callback
Callback to evaluate all metrics.
MatchZoo metrics cannot be evaluated batch-wise, since they require dataset-level information. As a result, MatchZoo metrics are not evaluated automatically when a model is fitted. When this callback is used, all metrics, including MatchZoo metrics and Keras metrics, are evaluated once every once_every epochs.
Parameters:
- model (BaseModel) – Model to evaluate.
- x (Union[ndarray, List[ndarray]]) – Input data.
- y (ndarray) – Labels.
- once_every (int) – Evaluation only triggers when epoch % once_every == 0. (default: 1, i.e. evaluate at the end of every epoch)
- batch_size (int) – Number of samples per evaluation batch. This only affects the evaluation of Keras metrics, since MatchZoo metrics are always evaluated using the full data.
- model_save_path (Optional[str]) – Directory path to save the model to after each evaluation. (default: None, i.e. no saving)
- verbose – Verbosity.
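The once_every rule, epoch % once_every == 0, can be sketched in isolation. `EveryNEpochsSketch` is a hypothetical, Keras-free stand-in that only records which epochs would trigger a full evaluation; it does not call MatchZoo or Keras. Whether the real callback counts epochs from 0 or 1 is not stated here, so the sketch simply applies the rule to the epoch index it receives.

```python
from typing import Callable, Dict, List, Optional


class EveryNEpochsSketch:
    """Hypothetical stand-in for the once_every logic of EvaluateAllMetrics."""

    def __init__(self, evaluate: Callable[[], Dict[str, float]], once_every: int = 1) -> None:
        self._evaluate = evaluate
        self._once_every = once_every
        self.evaluated_epochs: List[int] = []

    def on_epoch_end(self, epoch: int, logs: Optional[dict] = None) -> dict:
        logs = dict(logs or {})
        # Evaluation only triggers when epoch % once_every == 0.
        if epoch % self._once_every == 0:
            logs.update(self._evaluate())  # full-dataset metrics merged into the logs
            self.evaluated_epochs.append(epoch)
        return logs


cb = EveryNEpochsSketch(evaluate=lambda: {'map': 0.5}, once_every=3)
for epoch in range(7):
    cb.on_epoch_end(epoch, logs={'loss': 1.0})
print(cb.evaluated_epochs)  # [0, 3, 6]
```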
on_epoch_end(epoch, logs=None)
Called at the end of an epoch.
Parameters:
- epoch (int) – Integer, index of the epoch.
- logs (Optional[dict]) – Dictionary of logs.
Returns: Dictionary of logs.
matchzoo.engine.hyper_spaces module
Hyper parameter search spaces wrapping hyperopt.
class matchzoo.engine.hyper_spaces.HyperoptProxy(hyperopt_func, **kwargs)
Bases: object
Hyperopt proxy class.
See hyperopt's documentation for more details: https://github.com/hyperopt/hyperopt/wiki/FMin
Reason for these wrappers:
A hyper space in hyperopt requires a label to instantiate. This label is used later as a reference to the original hyper space that is sampled. In MatchZoo, hyper spaces are used in matchzoo.engine.Param. Only if a hyper space's label matches the name of its parent matchzoo.engine.Param can MatchZoo correctly back-reference the parameter that got sampled. This could be done by asking the user to always use the same name for a parameter and its hyper space, but typos can occur. As a result, these wrappers hide the hyper spaces' labels and always bind them correctly to their parameter's name.
Examples:
>>> import matchzoo as mz
>>> from hyperopt.pyll.stochastic import sample
Basic usage:
>>> model = mz.models.DenseBaseline()
>>> sample(model.params.hyper_space)  # doctest: +SKIP
{'mlp_num_layers': 1.0, 'mlp_num_units': 274.0}
Arithmetic operations:
>>> new_space = 2 ** mz.hyper_spaces.quniform(2, 6)
>>> model.params.get('mlp_num_layers').hyper_space = new_space
>>> sample(model.params.hyper_space)  # doctest: +SKIP
{'mlp_num_layers': 8.0, 'mlp_num_units': 292.0}
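The core idea of the proxy (store the hyperopt function and its arguments, and supply the label only once the parameter name is known) can be sketched without hyperopt. `ProxySketch` and `fake_quniform` are hypothetical; with hyperopt installed, a function such as hyperopt.hp.quniform, which takes the label as its first argument, would take the fake's place. The arithmetic support shown in the examples above is not modeled here.

```python
from typing import Any, Callable, Dict


class ProxySketch:
    """Hypothetical sketch: defer the hyperopt label until the parameter name is known."""

    def __init__(self, hyperopt_func: Callable[..., Any], **kwargs: Any) -> None:
        self._func = hyperopt_func
        self._kwargs = kwargs

    def convert(self, name: str) -> Any:
        # The label is always the parameter's name, so the two cannot drift apart.
        return self._func(name, **self._kwargs)


def fake_quniform(label: str, low: float, high: float, q: float = 1) -> Dict[str, Any]:
    """Hypothetical stand-in for hyperopt.hp.quniform that just records its arguments."""
    return {'label': label, 'low': low, 'high': high, 'q': q}


space = ProxySketch(fake_quniform, low=2, high=6, q=1)
print(space.convert('mlp_num_layers'))
# {'label': 'mlp_num_layers', 'low': 2, 'high': 6, 'q': 1}
```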
convert(name)
Attach name as hyperopt.hp's label.
Parameters: name (str) – Label to attach, i.e. the parameter's name.
Return type: Apply
Returns: A hyperopt-ready search space.
class matchzoo.engine.hyper_spaces.choice(options)
Bases: matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.choice() proxy.
class matchzoo.engine.hyper_spaces.quniform(low, high, q=1)
Bases: matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.quniform() proxy.
matchzoo.engine.hyper_spaces.sample(space)
Take a sample in the hyper space.
This method is stateless, so the distribution of the samples is different from that of a tune call. This function just gives a general idea of what a sample from the space looks like.
Example
>>> import matchzoo as mz
>>> space = mz.models.Naive.get_default_params().hyper_space
>>> mz.hyper_spaces.sample(space)  # doctest: +ELLIPSIS
{'optimizer': ...}
class matchzoo.engine.hyper_spaces.uniform(low, high)
Bases: matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.uniform() proxy.
matchzoo.engine.param module
Parameter class.
class matchzoo.engine.param.Param(name, value=None, hyper_space=None, validator=None, desc=None)
Bases: object
Parameter class.
Basic usage with a name and value:
>>> param = Param('my_param', 10)
>>> param.name
'my_param'
>>> param.value
10
Use with a validator to make sure the parameter always keeps a valid value.
>>> param = Param(
...     name='my_param',
...     value=5,
...     validator=lambda x: 0 < x < 20
... )
>>> param.validator  # doctest: +ELLIPSIS
<function <lambda> at 0x...>
>>> param.value
5
>>> param.value = 10
>>> param.value
10
>>> param.value = -1
Traceback (most recent call last):
...
ValueError: Validator not satifised.
The validator's definition is as follows:
validator=lambda x: 0 < x < 20
Use with a hyper space. Setting up a hyper space for a parameter makes the parameter tunable in a matchzoo.engine.Tuner.
>>> from matchzoo.engine.hyper_spaces import quniform
>>> param = Param(
...     name='positive_num',
...     value=1,
...     hyper_space=quniform(low=1, high=5)
... )
>>> param.hyper_space  # doctest: +ELLIPSIS
<matchzoo.engine.hyper_spaces.quniform object at ...>
>>> from hyperopt.pyll.stochastic import sample
>>> hyperopt_space = param.hyper_space.convert(param.name)
>>> samples = [sample(hyperopt_space) for _ in range(64)]
>>> set(samples) == {1, 2, 3, 4, 5}
True
The boolean value of a Param instance is only True when the value is not None. This is because some falsy values, like zero or an empty list, are valid parameter values. In other words, the boolean value means "whether the parameter value is filled".
>>> param = Param('dropout')
>>> if param:
...     print('OK')
>>> param = Param('dropout', 0)
>>> if param:
...     print('OK')
OK
If the value is set as a number, a _pre_assignment_hook is initialized as a data type converter to keep the parameter's data type consistent. This conversion supports Python built-in numbers, numpy numbers, and any number that inherits from numbers.Number.
>>> param = Param('float_param', 0.5)
>>> param.value = 10
>>> param.value
10.0
>>> type(param.value)
<class 'float'>
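The behaviours described for Param (validation on assignment, truthiness meaning "value is filled", numeric type conversion, and a reset that bypasses the validator) can be condensed into a short sketch. `MiniParam` is hypothetical and simplified: it converts a newly assigned number to the type of the initial value, which is one plausible way to get the float behaviour shown above, not necessarily how _pre_assignment_hook is implemented.

```python
import numbers
from typing import Any, Callable, Optional


class MiniParam:
    """Hypothetical, simplified sketch of Param's documented behaviour."""

    def __init__(self, name: str, value: Any = None,
                 validator: Optional[Callable[[Any], bool]] = None) -> None:
        self.name = name
        self._validator = validator
        self._convert: Optional[type] = None
        # Assumption: remember the initial numeric type and convert later numbers to it.
        if isinstance(value, numbers.Number) and not isinstance(value, bool):
            self._convert = type(value)
        self._value = None
        if value is not None:
            self.value = value

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new_value: Any) -> None:
        if self._convert is not None and isinstance(new_value, numbers.Number):
            new_value = self._convert(new_value)
        if self._validator is not None and not self._validator(new_value):
            raise ValueError('Validator not satisfied.')
        self._value = new_value

    def reset(self) -> None:
        # Bypasses the validator, as documented for Param.reset().
        self._value = None

    def __bool__(self) -> bool:
        # True only when a value is filled; 0 or [] still count as filled.
        return self._value is not None


p = MiniParam('float_param', 0.5)
p.value = 10
print(p.value, type(p.value).__name__)                    # 10.0 float
print(bool(MiniParam('dropout')), bool(MiniParam('dropout', 0)))  # False True
```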
desc
Parameter description.
Return type: str
hyper_space
Hyper space of the parameter.
Return type: Union[Apply, HyperoptProxy]
name
Name of the parameter.
Return type: str
reset()
Set the parameter's value to None, which means "not set".
This method bypasses the validator.
Example
>>> import matchzoo as mz
>>> param = mz.Param(
...     name='str', validator=lambda x: isinstance(x, str))
>>> param.value = 'hello'
>>> param.value = None
Traceback (most recent call last):
...
ValueError: Validator not satifised.
The validator's definition is as follows:
name='str', validator=lambda x: isinstance(x, str))
>>> param.reset()
>>> param.value is None
True
set_default(val, verbose=1)
Set the default value; has no effect if the parameter already has a value.
Parameters:
- val – Default value to set.
- verbose – Verbosity.
validator
Validator of the parameter.
Return type: Callable[[Any], bool]
value
Value of the parameter.
Return type: Any
matchzoo.engine.param_table module
Parameters table class.
class matchzoo.engine.param_table.ParamTable
Bases: object
Parameter table class.
Example
>>> params = ParamTable()
>>> params.add(Param('ham', 'Parma Ham'))
>>> params.add(Param('egg', 'Over Easy'))
>>> params['ham']
'Parma Ham'
>>> params['egg']
'Over Easy'
>>> print(params)
ham Parma Ham
egg Over Easy
>>> params.add(Param('egg', 'Sunny side Up'))
Traceback (most recent call last):
...
ValueError: Parameter named egg already exists.
To re-assign parameter egg value, use `params["egg"] = value` instead.
completed()
Return type: bool
Returns: True if all params are filled, False otherwise.
Example
>>> import matchzoo
>>> model = matchzoo.models.Naive()
>>> model.params.completed()
False
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params.completed()
True
hyper_space
Hyper space of the table, a valid hyperopt graph.
Return type: dict
keys()
Return type: KeysView
Returns: Parameter table keys.
set(key, param)
Set key to parameter param.
to_frame()
Convert the parameter table into a pandas data frame.
Return type: DataFrame
Returns: A pandas.DataFrame.
Example
>>> import matchzoo as mz
>>> table = mz.ParamTable()
>>> table.add(mz.Param(name='x', value=10, desc='my x'))
>>> table.add(mz.Param(name='y', value=20, desc='my y'))
>>> table.to_frame()
  Name Description  Value Hyper-Space
0    x        my x     10        None
1    y        my y     20        None
update(other)
Update self.
Update self with the key/value pairs from other, overwriting existing keys. Note that this does not add new keys to self.
This method is usually used by models to obtain useful information from a preprocessor's context.
Parameters: other (dict) – The dictionary used for the update.
Example
>>> import matchzoo as mz
>>> model = mz.models.DenseBaseline()
>>> model.params['input_shapes'] is None
True
>>> prpr = model.get_default_preprocessor()
>>> _ = prpr.fit(mz.datasets.toy.load_data(), verbose=0)
>>> model.params.update(prpr.context)
>>> model.params['input_shapes']
[(30,), (30,)]
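The "overwrite existing keys, never add new ones" rule can be shown with plain dictionaries. `update_existing` is a hypothetical helper, not the ParamTable implementation, and 'vocab_size' is an illustrative context key: it mirrors the example above, where a preprocessor context fills input_shapes, while extra context entries are ignored as the description states.

```python
from typing import Any, Dict


def update_existing(table: Dict[str, Any], other: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: overwrite keys already in `table`; skip unknown keys."""
    for key, value in other.items():
        if key in table:
            table[key] = value
    return table


params = {'input_shapes': None, 'mlp_num_units': 128}
context = {'input_shapes': [(30,), (30,)], 'vocab_size': 500}  # 'vocab_size' is illustrative
update_existing(params, context)
print(params)  # {'input_shapes': [(30,), (30,)], 'mlp_num_units': 128}
```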
matchzoo.engine.parse_metric module
matchzoo.engine.parse_metric.parse_metric(metric, task=None)
Parse an input metric in any form into a BaseMetric instance.
Parameters:
- metric (Union[str, Type[BaseMetric], BaseMetric]) – Input metric in any form.
- task (Optional[BaseTask]) – Task type for determining the specific metric.
Return type: Union[BaseMetric, str]
Returns: A BaseMetric instance, or the input str for Keras native metrics.
Examples:
>>> from matchzoo import metrics
>>> from matchzoo.engine.parse_metric import parse_metric
Use str as Keras native metrics:
>>> parse_metric('mse')
'mse'
Use str as MatchZoo metrics:
>>> mz_metric = parse_metric('map')
>>> type(mz_metric)
<class 'matchzoo.metrics.mean_average_precision.MeanAveragePrecision'>
Use matchzoo.engine.BaseMetric subclasses as MatchZoo metrics:
>>> type(parse_metric(metrics.AveragePrecision))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
Use matchzoo.engine.BaseMetric instances as MatchZoo metrics:
>>> type(parse_metric(metrics.AveragePrecision()))
<class 'matchzoo.metrics.average_precision.AveragePrecision'>
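The dispatch performed by parse_metric can be sketched with a small, hypothetical registry. Consistent with the examples above, the sketch assumes that a known alias such as 'map' (matched case-insensitively, as 'MAP' also works) maps to a metric class, an unknown string passes through unchanged as a Keras native metric, a metric class is instantiated, and an instance is returned as-is. `BaseMetricSketch`, `AveragePrecisionSketch`, `MeanAveragePrecisionSketch`, and `ALIASES` are illustrative names, and the sketch ignores the task argument.

```python
from typing import Dict, Tuple, Type, Union


class BaseMetricSketch:
    """Hypothetical stand-in for matchzoo.engine.base_metric.BaseMetric."""
    ALIAS: Tuple[str, ...] = ()


class AveragePrecisionSketch(BaseMetricSketch):
    ALIAS = ('average_precision', 'ap')


class MeanAveragePrecisionSketch(BaseMetricSketch):
    ALIAS = ('mean_average_precision', 'map')


# Alias -> class lookup table built from the hypothetical metric classes.
ALIASES: Dict[str, Type[BaseMetricSketch]] = {
    alias: cls
    for cls in (AveragePrecisionSketch, MeanAveragePrecisionSketch)
    for alias in cls.ALIAS
}


def parse_metric_sketch(
    metric: Union[str, Type[BaseMetricSketch], BaseMetricSketch]
) -> Union[BaseMetricSketch, str]:
    if isinstance(metric, str):
        cls = ALIASES.get(metric.lower())
        # Unknown strings are assumed to be Keras native metrics and pass through.
        return cls() if cls is not None else metric
    if isinstance(metric, type) and issubclass(metric, BaseMetricSketch):
        return metric()  # a metric class: instantiate it with default arguments
    if isinstance(metric, BaseMetricSketch):
        return metric    # already an instance: use it as-is
    raise ValueError(f'Cannot parse metric: {metric!r}')


print(parse_metric_sketch('mse'))                                  # mse
print(type(parse_metric_sketch('MAP')).__name__)                   # MeanAveragePrecisionSketch
print(type(parse_metric_sketch(AveragePrecisionSketch)).__name__)  # AveragePrecisionSketch
```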