matchzoo.data_generator package

Submodules

matchzoo.data_generator.data_generator module

Base generator.

class matchzoo.data_generator.data_generator.DataGenerator(data_pack, mode='point', num_dup=1, num_neg=1, resample=True, batch_size=128, shuffle=True, callbacks=None)

Bases: keras.utils.data_utils.Sequence

Data Generator.

Used to divide a matchzoo.DataPack into batches. This is helpful for generating batch-wise features and for delaying data preprocessing until fit time.

See tutorials/data_handling.ipynb for a walkthrough.
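The batching idea described above can be sketched in plain Python. This is a simplified illustration, not MatchZoo's implementation. The class name PointBatcher and its seed parameter are hypothetical, and the class only imitates the keras.utils.Sequence protocol (__len__, __getitem__, on_epoch_end) without importing Keras. Rows are looked up only when a batch is requested, which is what allows per-batch work to be deferred until fit time.

```python
import math
import random
from typing import List, Optional, Sequence


class PointBatcher:
    """Hypothetical, simplified stand-in for point-mode batching.

    Follows the keras.utils.Sequence protocol (__len__, __getitem__,
    on_epoch_end) but does not depend on Keras.
    """

    def __init__(self, data: Sequence, batch_size: int = 128,
                 shuffle: bool = True, seed: Optional[int] = None):
        self._data = data
        self._batch_size = batch_size
        self._shuffle = shuffle
        self._rng = random.Random(seed)
        self.reset_index()

    def reset_index(self) -> None:
        # One index per instance; optionally shuffled, then cut into batches.
        index_array = list(range(len(self._data)))
        if self._shuffle:
            self._rng.shuffle(index_array)
        self.batch_indices: List[List[int]] = [
            index_array[i:i + self._batch_size]
            for i in range(0, len(index_array), self._batch_size)
        ]

    def __len__(self) -> int:
        # Number of batches: ceil(num_instances / batch_size).
        return math.ceil(len(self._data) / self._batch_size)

    def __getitem__(self, item: int) -> list:
        # Rows are fetched (and could be preprocessed) only on request.
        return [self._data[i] for i in self.batch_indices[item]]

    def on_epoch_end(self) -> None:
        # Rebuild the index array so the next epoch sees a new order.
        self.reset_index()


# Usage with 100 dummy rows.
batcher = PointBatcher(list(range(100)), batch_size=8, seed=0)
print(len(batcher))      # 13, i.e. ceil(100 / 8)
print(len(batcher[12]))  # 4: the last batch holds the remainder
batcher.on_epoch_end()   # reshuffles for the next epoch
```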

Parameters:
  • data_pack (DataPack) – DataPack to generate data from.
  • mode – One of “point”, “pair”, or “list”. (default: “point”)
  • num_dup (int) – Number of duplications per instance, only effective when mode is “pair”. (default: 1)
  • num_neg (int) – Number of negative samples per instance, only effective when mode is “pair”. (default: 1)
  • resample (bool) – Whether to resample for each epoch, only effective when mode is “pair”. (default: True)
  • batch_size (int) – Batch size. (default: 128)
  • shuffle (bool) – Whether to shuffle the samples/instances. (default: True)
  • callbacks (Optional[List[Callback]]) – Callbacks. See matchzoo.data_generator.callbacks for more details.
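The num_dup and num_neg parameters can be illustrated with a minimal sketch of pair-wise sampling. It assumes, as a simplification, that a negative for a given positive is any document of the same query with a strictly lower label, sampled with replacement. build_pairs and its seed argument are hypothetical names, and MatchZoo's own pair construction may differ in detail.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Relation = Tuple[str, str, int]  # (id_left, id_right, label)


def build_pairs(relations: List[Relation], num_dup: int = 1,
                num_neg: int = 1, seed: int = 0) -> List[List[Relation]]:
    """Hypothetical sketch: turn (query, doc, label) rows into pair-wise instances.

    Each instance is [positive, negative_1, ..., negative_num_neg]; every
    positive is repeated num_dup times, each time with freshly sampled negatives.
    """
    rng = random.Random(seed)
    groups: Dict[str, List[Relation]] = defaultdict(list)
    for rel in relations:
        groups[rel[0]].append(rel)

    instances = []
    for rows in groups.values():
        for pos in rows:
            # Assumed rule: negatives share the query and have a strictly lower label.
            negatives = [r for r in rows if r[2] < pos[2]]
            if not negatives:
                continue  # nothing ranks below this row, so it cannot be a positive
            for _ in range(num_dup):
                # Sample with replacement so num_neg may exceed len(negatives).
                sampled = [rng.choice(negatives) for _ in range(num_neg)]
                instances.append([pos] + sampled)
    return instances


# Usage with dummy relations (not the toy dataset).
rels = [("q1", "d1", 0), ("q1", "d2", 0), ("q1", "d3", 1),
        ("q2", "d4", 0), ("q2", "d5", 1)]
pairs = build_pairs(rels, num_dup=4, num_neg=4)
print(len(pairs))     # 8: two positives x num_dup=4
print(len(pairs[0]))  # 5: one positive + num_neg=4 negatives
```

Calling build_pairs again with a different seed draws new negatives; assuming resample=True rebuilds the pair-wise instances at every epoch, this is the kind of variation it would introduce.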
Examples
>>> import numpy as np
>>> import matchzoo as mz
>>> np.random.seed(0)
>>> data_pack = mz.datasets.toy.load_data()
>>> batch_size = 8
To generate data points:
>>> point_gen = mz.DataGenerator(
...     data_pack=data_pack,
...     batch_size=batch_size
... )
>>> len(point_gen)
13
>>> x, y = point_gen[0]
>>> for key, value in sorted(x.items()):
...     print(key, str(value)[:30])
id_left ['Q6' 'Q17' 'Q1' 'Q13' 'Q16' '
id_right ['D6-6' 'D17-1' 'D1-2' 'D13-3'
text_left ['how long is the term for fed
text_right ['See Article I and Article II
To generate data pairs:
>>> pair_gen = mz.DataGenerator(
...     data_pack=data_pack,
...     mode='pair',
...     num_dup=4,
...     num_neg=4,
...     batch_size=batch_size,
...     shuffle=False
... )
>>> len(pair_gen)
3
>>> x, y = pair_gen[0]
>>> for key, value in sorted(x.items()):
...     print(key, str(value)[:30])
id_left ['Q1' 'Q1' 'Q1' 'Q1' 'Q1' 'Q1'
id_right ['D1-3' 'D1-4' 'D1-0' 'D1-1' '
text_left ['how are glacier caves formed
text_right ['A glacier cave is a cave for
To generate data lists: no example is provided yet.
batch_indices

batch_indices getter.

batch_size

batch_size getter.

callbacks

callbacks getter.

mode

mode getter.

num_dup

num_dup getter.

num_neg

num_neg getter.

on_epoch_end()

Reorganize the index array at the end of each epoch.

reset_index()

Set the index_array.

The index_array records the indices of all the instances.
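One way to picture the index bookkeeping behind reset_index and batch_indices in pair mode is sketched below. It assumes the rows are already laid out as consecutive instances of 1 + num_neg rows with the positive first. pair_batch_indices is a hypothetical helper, not part of MatchZoo. Instances are shuffled as whole units so none is ever split, and each batch is flattened into a list of row indices; in this sketch, batch_size therefore counts instances rather than rows.

```python
import math
import random
from typing import List, Optional


def pair_batch_indices(num_rows: int, num_neg: int, batch_size: int,
                       shuffle: bool = False,
                       seed: Optional[int] = None) -> List[List[int]]:
    """Hypothetical sketch of pair-mode index bookkeeping.

    Assumes rows are stored as consecutive instances of (1 + num_neg) rows.
    """
    step = num_neg + 1
    num_instances = num_rows // step  # leftover rows that do not fill an instance are dropped
    # index_pool[i] holds the row indices of instance i.
    index_pool = [list(range(i * step, (i + 1) * step))
                  for i in range(num_instances)]
    if shuffle:
        # Shuffle whole instances so a positive stays with its negatives.
        random.Random(seed).shuffle(index_pool)
    batches = []
    for b in range(math.ceil(num_instances / batch_size)):
        chunk = index_pool[b * batch_size:(b + 1) * batch_size]
        # Flatten: a full batch holds batch_size * (1 + num_neg) rows.
        batches.append([idx for instance in chunk for idx in instance])
    return batches


# Usage: 100 dummy rows, num_neg=4 -> 20 instances of 5 rows each.
batches = pair_batch_indices(100, num_neg=4, batch_size=8)
print(len(batches))     # 3, i.e. ceil(20 / 8)
print(len(batches[0]))  # 40 = 8 instances * 5 rows
```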

shuffle

shuffle getter.

matchzoo.data_generator.data_generator_builder module

class matchzoo.data_generator.data_generator_builder.DataGeneratorBuilder(**kwargs)

Bases: object

Data Generator Builder. In essence, a wrapped partial function.

Example

>>> import matchzoo as mz
>>> builder = mz.DataGeneratorBuilder(mode='pair', batch_size=32)
>>> data = mz.datasets.toy.load_data()
>>> gen = builder.build(data)
>>> type(gen)
<class 'matchzoo.data_generator.data_generator.DataGenerator'>
>>> gen.batch_size
32
>>> gen_64 = builder.build(data, batch_size=64)
>>> gen_64.batch_size
64
build(data_pack, **kwargs)

Build a DataGenerator.

Parameters:
  • data_pack – DataPack to build upon.
  • kwargs – Additional keyword arguments to override the keyword arguments passed in __init__.
Return type:

DataGenerator
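The "wrapped partial function" idea, including the override behaviour of build, can be sketched with a small class that stores keyword arguments and merges per-call ones on top. PartialBuilder and make_generator are hypothetical names used only for illustration; the real builder constructs a DataGenerator.

```python
from typing import Any, Callable, Dict


class PartialBuilder:
    """Hypothetical sketch of a builder that behaves like a wrapped partial.

    Keyword arguments given at construction act as defaults; keyword
    arguments given to build() override them for that call only.
    """

    def __init__(self, factory: Callable[..., Any], **kwargs: Any):
        self._factory = factory
        self._kwargs: Dict[str, Any] = kwargs

    def build(self, data: Any, **kwargs: Any) -> Any:
        # In a dict merge later keys win, so per-call kwargs override stored ones.
        return self._factory(data, **{**self._kwargs, **kwargs})


def make_generator(data, mode='point', batch_size=128):
    # Hypothetical stand-in for a generator constructor.
    return {'data': data, 'mode': mode, 'batch_size': batch_size}


builder = PartialBuilder(make_generator, mode='pair', batch_size=32)
print(builder.build([1, 2, 3])['batch_size'])                 # 32
print(builder.build([1, 2, 3], batch_size=64)['batch_size'])  # 64
```

functools.partial already lets call-time keyword arguments extend and override the stored ones; a wrapper like this mainly gives the operation a descriptive build() name and a fixed data argument.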

Module contents