matchzoo.data_generator package¶
Subpackages¶
Submodules¶
matchzoo.data_generator.data_generator module¶
Base generator.
-
class
matchzoo.data_generator.data_generator.DataGenerator(data_pack, mode='point', num_dup=1, num_neg=1, resample=True, batch_size=128, shuffle=True, callbacks=None)¶
Bases: keras.utils.data_utils.Sequence
Data Generator.
Used to divide a matchzoo.DataPack into batches. This is helpful for generating batch-wise features and delaying data preprocessing to the fit time.
See tutorials/data_handling.ipynb for a walkthrough.
Parameters:
- data_pack (DataPack) – DataPack to generate data from.
- mode – One of “point”, “pair”, and “list”. (default: “point”)
- num_dup (int) – Number of duplications per instance, only effective when mode is “pair”. (default: 1)
- num_neg (int) – Number of negative samples per instance, only effective when mode is “pair”. (default: 1)
- resample (bool) – Whether to resample for each epoch, only effective when mode is “pair”. (default: True)
- batch_size (int) – Batch size. (default: 128)
- shuffle (bool) – Whether to shuffle the samples/instances. (default: True)
- callbacks (Optional[List[Callback]]) – Callbacks. See matchzoo.data_generator.callbacks for more details.
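The interaction between num_dup and num_neg in “pair” mode can be hard to picture from the parameter list alone. The sketch below is one plausible reading, not the library's implementation: it assumes that each positive document is repeated num_dup times and that each repetition is followed by num_neg negatives drawn from the same query. The function make_pairs and the (query, document, label) tuples are hypothetical and do not use matchzoo.

```python
import random


def make_pairs(relations, num_dup=1, num_neg=1, seed=0):
    """Expand (query, doc, label) rows into pairwise training groups.

    Assumption: every positive document is repeated num_dup times, and each
    repetition is followed by num_neg negatives sampled from the same query.
    """
    rng = random.Random(seed)

    # Group documents by query so negatives come from the same query.
    by_query = {}
    for query, doc, label in relations:
        by_query.setdefault(query, []).append((doc, label))

    rows = []
    for query, docs in by_query.items():
        positives = [doc for doc, label in docs if label > 0]
        negatives = [doc for doc, label in docs if label == 0]
        if not negatives:
            continue  # no pair can be formed for this query
        for pos in positives:
            for _ in range(num_dup):
                rows.append((query, pos, 1))
                for _ in range(num_neg):
                    rows.append((query, rng.choice(negatives), 0))
    return rows


relations = [
    ("Q1", "D1-0", 1), ("Q1", "D1-1", 0), ("Q1", "D1-2", 0),
    ("Q2", "D2-0", 0), ("Q2", "D2-1", 1),
]
pairs = make_pairs(relations, num_dup=2, num_neg=3)
print(len(pairs))  # 2 queries x 1 positive x 2 duplicates x (1 + 3) rows = 16
```

Under this reading, each group of 1 + num_neg consecutive rows holds one positive followed by its sampled negatives, which is the layout a pairwise ranking loss would typically expect.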
- Examples:
>>> import numpy as np
>>> import matchzoo as mz
>>> np.random.seed(0)
>>> data_pack = mz.datasets.toy.load_data()
>>> batch_size = 8
- To generate data points:
>>> point_gen = mz.DataGenerator(
...     data_pack=data_pack,
...     batch_size=batch_size
... )
>>> len(point_gen)
13
>>> x, y = point_gen[0]
>>> for key, value in sorted(x.items()):
...     print(key, str(value)[:30])
id_left ['Q6' 'Q17' 'Q1' 'Q13' 'Q16' '
id_right ['D6-6' 'D17-1' 'D1-2' 'D13-3'
text_left ['how long is the term for fed
text_right ['See Article I and Article II
- To generate data pairs:
>>> pair_gen = mz.DataGenerator(
...     data_pack=data_pack,
...     mode='pair',
...     num_dup=4,
...     num_neg=4,
...     batch_size=batch_size,
...     shuffle=False
... )
>>> len(pair_gen)
3
>>> x, y = pair_gen[0]
>>> for key, value in sorted(x.items()):
...     print(key, str(value)[:30])
id_left ['Q1' 'Q1' 'Q1' 'Q1' 'Q1' 'Q1'
id_right ['D1-3' 'D1-4' 'D1-0' 'D1-1' '
text_left ['how are glacier caves formed
text_right ['A glacier cave is a cave for
- To generate data lists:
- # TODO:
-
batch_indices¶ batch_indices getter.
-
batch_size¶ batch_size getter.
-
callbacks¶ callbacks getter.
-
mode¶ mode getter.
-
num_dup¶ num_dup getter.
-
num_neg¶ num_neg getter.
-
on_epoch_end()¶ Reorganize the index array when an epoch ends.
-
reset_index()¶ Set the index_array. Here the index_array records the index of all the instances.
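To illustrate how an index array, batch_indices, and on_epoch_end() might fit together, here is a minimal sketch in plain Python. It is an assumption about the general pattern, not the library's code: the class IndexBatcher, its seed argument, and the instance count of 100 are all hypothetical.

```python
import math
import random


class IndexBatcher:
    """Hypothetical stand-in for the index bookkeeping of a batch generator."""

    def __init__(self, num_instances, batch_size=128, shuffle=True, seed=0):
        self._num_instances = num_instances
        self._batch_size = batch_size
        self._shuffle = shuffle
        self._rng = random.Random(seed)
        self.reset_index()

    def __len__(self):
        # Number of batches; the last batch may be smaller than batch_size.
        return math.ceil(self._num_instances / self._batch_size)

    def reset_index(self):
        # The index array records the position of every instance.
        index_array = list(range(self._num_instances))
        if self._shuffle:
            self._rng.shuffle(index_array)
        # Slice the index array into consecutive batches.
        self.batch_indices = [
            index_array[start:start + self._batch_size]
            for start in range(0, self._num_instances, self._batch_size)
        ]

    def on_epoch_end(self):
        # Rebuild (and possibly reshuffle) the batches for the next epoch.
        self.reset_index()


batcher = IndexBatcher(num_instances=100, batch_size=8, shuffle=False)
print(len(batcher))               # 13
print(batcher.batch_indices[-1])  # [96, 97, 98, 99]
```

Keeping only indices in the generator means the actual rows are looked up, and any preprocessing applied, at the moment a batch is requested.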
-
shuffle¶ shuffle getter.
matchzoo.data_generator.data_generator_builder module¶
-
class
matchzoo.data_generator.data_generator_builder.DataGeneratorBuilder(**kwargs)¶
Bases: object
Data Generator Builder. In essence a wrapped partial function.
Example
>>> import matchzoo as mz
>>> builder = mz.DataGeneratorBuilder(mode='pair', batch_size=32)
>>> data = mz.datasets.toy.load_data()
>>> gen = builder.build(data)
>>> type(gen)
<class 'matchzoo.data_generator.data_generator.DataGenerator'>
>>> gen.batch_size
32
>>> gen_64 = builder.build(data, batch_size=64)
>>> gen_64.batch_size
64
-
build(data_pack, **kwargs)¶ Build a DataGenerator.
Parameters: - data_pack – DataPack to build upon.
- kwargs – Additional keyword arguments to override the keyword arguments passed in __init__.
Return type: DataGenerator
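The “wrapped partial function” description can be made concrete with functools.partial from the standard library. The sketch below is only an analogy for the override behaviour described for build(): make_generator is a hypothetical stand-in that returns a plain dict instead of a real generator, and nothing here uses matchzoo.

```python
from functools import partial


def make_generator(data_pack, mode="point", batch_size=128):
    """Hypothetical stand-in for a generator constructor."""
    return {"data_pack": data_pack, "mode": mode, "batch_size": batch_size}


# Store default keyword arguments once, as a builder would in __init__.
builder = partial(make_generator, mode="pair", batch_size=32)

# Calling the partial plays the role of build(): stored keywords apply.
gen = builder("toy")
print(gen["batch_size"])     # 32

# Keywords given at call time override the stored ones.
gen_64 = builder("toy", batch_size=64)
print(gen_64["batch_size"])  # 64
print(gen_64["mode"])        # pair
```

The same effect could be had by merging two keyword dictionaries by hand; partial simply packages that merge with the callable.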