matchzoo.preprocessors package
Submodules
matchzoo.preprocessors.basic_preprocessor module
Basic Preprocessor.
class matchzoo.preprocessors.basic_preprocessor.BasicPreprocessor(fixed_length_left=30, fixed_length_right=30, filter_mode='df', filter_low_freq=2, filter_high_freq=inf, remove_stop_words=False)
Bases: matchzoo.engine.base_preprocessor.BasePreprocessor
Basic preprocessor helper.
Parameters:
- fixed_length_left (int) -- Integer, maximum length of left in the data_pack.
- fixed_length_right (int) -- Integer, maximum length of right in the data_pack.
- filter_mode (str) -- String, mode used by FrequenceFilterUnit. Can be 'df', 'cf', or 'idf'.
- filter_low_freq (float) -- Float, lower bound value used by FrequenceFilterUnit.
- filter_high_freq (float) -- Float, upper bound value used by FrequenceFilterUnit.
- remove_stop_words (bool) -- Bool, whether to use the StopRemovalUnit.
Example
>>> import matchzoo as mz
>>> train_data = mz.datasets.toy.load_data('train')
>>> test_data = mz.datasets.toy.load_data('test')
>>> preprocessor = mz.preprocessors.BasicPreprocessor(
...     fixed_length_left=10,
...     fixed_length_right=20,
...     filter_mode='df',
...     filter_low_freq=2,
...     filter_high_freq=1000,
...     remove_stop_words=True
... )
>>> preprocessor = preprocessor.fit(train_data)
>>> preprocessor.context['input_shapes']
[(10,), (20,)]
>>> preprocessor.context['vocab_size']
284
>>> processed_train_data = preprocessor.transform(train_data)
>>> type(processed_train_data)
<class 'matchzoo.data_pack.data_pack.DataPack'>
>>> test_data_transformed = preprocessor.transform(test_data)
>>> type(test_data_transformed)
<class 'matchzoo.data_pack.data_pack.DataPack'>
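To make the 'df' filter mode and its two bounds more concrete, the following is a minimal pure-Python sketch of document-frequency filtering. It does not use or reproduce MatchZoo's filter unit: the helper names document_frequency and filter_by_df are hypothetical, and treating the bounds as the half-open interval [low, high) is an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def document_frequency(docs: Iterable[List[str]]) -> Counter:
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        # set(): a term counts once per document, however often it repeats
        df.update(set(tokens))
    return df


def filter_by_df(docs: Iterable[List[str]], low: float = 2,
                 high: float = float("inf")) -> List[List[str]]:
    """Keep terms whose document frequency lies in [low, high).

    The half-open interval is an assumption of this sketch.
    """
    docs = [list(d) for d in docs]
    df = document_frequency(docs)
    keep = {term for term, count in df.items() if low <= count < high}
    return [[t for t in d if t in keep] for d in docs]


docs = [
    ["deep", "text", "matching"],
    ["text", "matching", "toolkit"],
    ["deep", "learning"],
]
filtered = filter_by_df(docs, low=2)
print(filtered)
```

With low=2, the terms that occur in only one document ("toolkit" and "learning") are dropped, which mirrors the intent of filter_low_freq=2 in the example above.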
matchzoo.preprocessors.cdssm_preprocessor module
CDSSM Preprocessor.
class matchzoo.preprocessors.cdssm_preprocessor.CDSSMPreprocessor(fixed_length_left=10, fixed_length_right=40, with_word_hashing=True)
Bases: matchzoo.engine.base_preprocessor.BasePreprocessor
CDSSM Model preprocessor.
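The fixed_length_left and fixed_length_right arguments suggest that each text is brought to a fixed number of tokens. The sketch below illustrates that general idea only. The function name fix_length, the padding value 0, and the choice to truncate and pad at the end of the sequence are all assumptions of this example, not a description of how MatchZoo does it.

```python
from typing import List


def fix_length(ids: List[int], length: int, pad_value: int = 0) -> List[int]:
    """Return a copy of `ids` with exactly `length` items.

    Longer sequences are cut at the end and shorter ones are padded at the
    end with `pad_value`. Both choices are assumptions of this sketch.
    """
    truncated = ids[:length]
    return truncated + [pad_value] * (length - len(truncated))


left = fix_length([4, 8, 15], length=5)
right = fix_length([4, 8, 15, 16, 23, 42], length=4)
print(left, right)
```

Whether to pad or truncate at the start or the end is a design choice; a library may default to either side.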
matchzoo.preprocessors.dssm_preprocessor module
DSSM Preprocessor.
class matchzoo.preprocessors.dssm_preprocessor.DSSMPreprocessor(with_word_hashing=True)
Bases: matchzoo.engine.base_preprocessor.BasePreprocessor
DSSM Model preprocessor.
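In the DSSM literature, word hashing means representing each word as a bag of letter n-grams (usually trigrams) taken from the word with boundary markers added. The following is a minimal sketch of that technique, assuming "#" as the boundary marker. The names letter_ngrams, build_ngram_index, and word_hashing are hypothetical and do not correspond to MatchZoo functions.

```python
from typing import Dict, Iterable, List


def letter_ngrams(word: str, n: int = 3) -> List[str]:
    """Split a word into letter n-grams after adding '#' boundary markers."""
    marked = f"#{word}#"
    if len(marked) <= n:
        return [marked]
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def build_ngram_index(words: Iterable[str], n: int = 3) -> Dict[str, int]:
    """Assign an index to every distinct n-gram, in sorted order."""
    grams = sorted({g for w in words for g in letter_ngrams(w, n)})
    return {g: i for i, g in enumerate(grams)}


def word_hashing(tokens: Iterable[str], ngram_index: Dict[str, int],
                 n: int = 3) -> List[int]:
    """Encode tokens as a count vector over the known letter n-grams."""
    vector = [0] * len(ngram_index)
    for token in tokens:
        for gram in letter_ngrams(token, n):
            if gram in ngram_index:  # n-grams unseen at build time are skipped
                vector[ngram_index[gram]] += 1
    return vector


index = build_ngram_index(["good", "food"])
print(letter_ngrams("good"))
print(word_hashing(["good"], index))
```

Because many words share letter n-grams, the resulting vocabulary is far smaller than a word-level vocabulary, and unseen words still map to known n-grams.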
matchzoo.preprocessors.naive_preprocessor module
Naive Preprocessor.
class matchzoo.preprocessors.naive_preprocessor.NaivePreprocessor
Bases: matchzoo.engine.base_preprocessor.BasePreprocessor
Naive preprocessor.
Example
>>> import matchzoo as mz
>>> train_data = mz.datasets.toy.load_data()
>>> test_data = mz.datasets.toy.load_data(stage='test')
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> train_data_processed = preprocessor.fit_transform(train_data)
>>> type(train_data_processed)
<class 'matchzoo.data_pack.data_pack.DataPack'>
>>> test_data_transformed = preprocessor.transform(test_data)
>>> type(test_data_transformed)
<class 'matchzoo.data_pack.data_pack.DataPack'>