matchzoo.preprocessors package

Submodules

matchzoo.preprocessors.basic_preprocessor module

Basic Preprocessor.

class matchzoo.preprocessors.basic_preprocessor.BasicPreprocessor(fixed_length_left=30, fixed_length_right=30, filter_mode='df', filter_low_freq=2, filter_high_freq=inf, remove_stop_words=False)

Bases: matchzoo.engine.base_preprocessor.BasePreprocessor

Basic preprocessor helper.

Parameters:
  • fixed_length_left (int) -- Integer, maximum length of the left text in the data_pack.
  • fixed_length_right (int) -- Integer, maximum length of the right text in the data_pack.
  • filter_mode (str) -- String, mode used by FrequenceFilterUnit; can be 'df', 'cf', or 'idf'.
  • filter_low_freq (float) -- Float, lower bound value used by FrequenceFilterUnit.
  • filter_high_freq (float) -- Float, upper bound value used by FrequenceFilterUnit.
  • remove_stop_words (bool) -- Bool, whether to use StopRemovalUnit.
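
The effect of filter_low_freq and filter_high_freq in 'df' mode can be illustrated with a small standalone sketch. It does not call MatchZoo; the function name document_frequency_filter and the half-open interval low <= df < high are assumptions made for illustration, and the library's own unit may treat the bounds differently.

```python
import math
from collections import Counter


def document_frequency_filter(docs, low=2, high=math.inf):
    """Keep tokens whose document frequency df satisfies low <= df < high.

    Hypothetical helper for illustration only; not part of MatchZoo.
    """
    df = Counter()
    for tokens in docs:
        # Count each token at most once per document.
        df.update(set(tokens))
    keep = {term for term, count in df.items() if low <= count < high}
    return [[tok for tok in tokens if tok in keep] for tokens in docs]


docs = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
filtered_default = document_frequency_filter(docs)
filtered_bounded = document_frequency_filter(docs, low=2, high=3)
```

With the defaults, tokens that occur in only one document ("c" and "d") are dropped; lowering the upper bound to 3 also drops "a", which occurs in all three documents.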

Example

>>> import matchzoo as mz
>>> train_data = mz.datasets.toy.load_data('train')
>>> test_data = mz.datasets.toy.load_data('test')
>>> preprocessor = mz.preprocessors.BasicPreprocessor(
...     fixed_length_left=10,
...     fixed_length_right=20,
...     filter_mode='df',
...     filter_low_freq=2,
...     filter_high_freq=1000,
...     remove_stop_words=True
... )
>>> preprocessor = preprocessor.fit(train_data)
>>> preprocessor.context['input_shapes']
[(10,), (20,)]
>>> preprocessor.context['vocab_size']
284
>>> processed_train_data = preprocessor.transform(train_data)
>>> type(processed_train_data)
<class 'matchzoo.data_pack.data_pack.DataPack'>
>>> test_data_transformed = preprocessor.transform(test_data)
>>> type(test_data_transformed)
<class 'matchzoo.data_pack.data_pack.DataPack'>

fit(data_pack, verbose=1)

Fit pre-processing context for transformation.

Parameters:
  • data_pack (DataPack) -- data_pack to be preprocessed.
  • verbose -- Verbosity.
Returns:

BasicPreprocessor instance.

transform(data_pack, verbose=1)

Apply transformation on data, create fixed length representation.

Parameters:
  • data_pack (DataPack) -- Inputs to be preprocessed.
  • verbose -- Verbosity.
Return type:

DataPack

Returns:

Transformed data as DataPack object.
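
The "fixed length representation" produced by transform can be pictured as padding short sequences and truncating long ones. The following is a minimal sketch under stated assumptions: the helper to_fixed_length, its pad_mode and truncate_mode arguments, and the choice of 'pre' as the default are illustrative and not confirmed by this reference.

```python
def to_fixed_length(ids, length, pad_value=0, pad_mode="pre", truncate_mode="pre"):
    """Pad or truncate a list of token ids to exactly `length` items.

    Hypothetical helper for illustration only; not part of MatchZoo.
    """
    ids = list(ids)
    if len(ids) > length:
        if truncate_mode == "pre":
            # Drop items from the front, keep the last `length` items.
            return ids[len(ids) - length:]
        # Drop items from the end, keep the first `length` items.
        return ids[:length]
    padding = [pad_value] * (length - len(ids))
    return padding + ids if pad_mode == "pre" else ids + padding


short = to_fixed_length([1, 2, 3], 5)
long_pre = to_fixed_length([1, 2, 3, 4, 5, 6], 4)
long_post = to_fixed_length([1, 2, 3, 4, 5, 6], 4, truncate_mode="post")
```

Under these assumptions, every left text ends up with fixed_length_left ids and every right text with fixed_length_right ids, which matches the input_shapes reported in the example above.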

matchzoo.preprocessors.cdssm_preprocessor module

CDSSM Preprocessor.

class matchzoo.preprocessors.cdssm_preprocessor.CDSSMPreprocessor(fixed_length_left=10, fixed_length_right=40, with_word_hashing=True)

Bases: matchzoo.engine.base_preprocessor.BasePreprocessor

CDSSM Model preprocessor.

fit(data_pack, verbose=1)

Fit pre-processing context for transformation.

Parameters:
  • data_pack (DataPack) -- data_pack to be preprocessed.
  • verbose -- Verbosity.
Returns:

CDSSMPreprocessor instance.

transform(data_pack, verbose=1)

Apply transformation on data, create letter-ngram representation.

Parameters:
  • data_pack (DataPack) -- Inputs to be preprocessed.
  • verbose -- Verbosity.
Return type:

DataPack

Returns:

Transformed data as DataPack object.

matchzoo.preprocessors.dssm_preprocessor module

DSSM Preprocessor.

class matchzoo.preprocessors.dssm_preprocessor.DSSMPreprocessor(with_word_hashing=True)

Bases: matchzoo.engine.base_preprocessor.BasePreprocessor

DSSM Model preprocessor.

fit(data_pack, verbose=1)

Fit pre-processing context for transformation.

Parameters:
  • data_pack (DataPack) -- data_pack to be preprocessed.
  • verbose -- Verbosity.
Returns:

DSSMPreprocessor instance.

transform(data_pack, verbose=1)

Apply transformation on data, create tri-letter representation.

Parameters:
  • data_pack (DataPack) -- Inputs to be preprocessed.
  • verbose -- Verbosity.
Return type:

DataPack

Returns:

Transformed data as DataPack object.
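
A tri-letter representation splits each word into overlapping three-character pieces after marking the word boundaries. The sketch below is standalone and does not use MatchZoo; the names tri_letters and word_hashing, the '#' boundary marker, and the count-vector output are assumptions chosen for illustration, so the library's own word hashing may differ in detail.

```python
def tri_letters(word, n=3):
    """Split a word into letter n-grams after adding '#' boundary markers.

    Hypothetical helper for illustration only; not part of MatchZoo.
    """
    marked = f"#{word}#"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def word_hashing(tokens, term_index):
    """Count the tri-letters of all tokens into a vector indexed by term_index.

    Tri-letters missing from term_index are ignored.
    """
    vector = [0] * len(term_index)
    for token in tokens:
        for gram in tri_letters(token):
            if gram in term_index:
                vector[term_index[gram]] += 1
    return vector


grams = tri_letters("hello")
term_index = {"#he": 0, "hel": 1, "lo#": 2, "xyz": 3}
hashed = word_hashing(["hello", "hello"], term_index)
```

Here term_index stands in for a tri-letter vocabulary that would be built during fit; because distinct words share tri-letters, the resulting vector is much smaller than a one-hot word vocabulary.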

matchzoo.preprocessors.naive_preprocessor module

Naive Preprocessor.

class matchzoo.preprocessors.naive_preprocessor.NaivePreprocessor

Bases: matchzoo.engine.base_preprocessor.BasePreprocessor

Naive preprocessor.

Example

>>> import matchzoo as mz
>>> train_data = mz.datasets.toy.load_data()
>>> test_data = mz.datasets.toy.load_data(stage='test')
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> train_data_processed = preprocessor.fit_transform(train_data)
>>> type(train_data_processed)
<class 'matchzoo.data_pack.data_pack.DataPack'>
>>> test_data_transformed = preprocessor.transform(test_data)
>>> type(test_data_transformed)
<class 'matchzoo.data_pack.data_pack.DataPack'>

fit(data_pack, verbose=1)

Fit pre-processing context for transformation.

Parameters:
  • data_pack (DataPack) -- data_pack to be preprocessed.
  • verbose -- Verbosity.
Returns:

NaivePreprocessor instance.

transform(data_pack, verbose=1)

Apply transformation on data.

Parameters:
  • data_pack (DataPack) -- Inputs to be preprocessed.
  • verbose -- Verbosity.
Return type:

DataPack

Returns:

Transformed data as DataPack object.

Module contents

matchzoo.preprocessors.list_available()