dataset

Utilities for working with datasets.

Functions:

  • iter_dataset

    Iterate over a Hugging Face Dataset, yielding each 'row' as a dictionary.

iter_dataset

iter_dataset(dataset: Dataset) -> Iterator[dict[str, Any]]

Iterate over a Hugging Face Dataset, yielding each 'row' as a dictionary.

Parameters:

  • dataset (Dataset) –

    The dataset to iterate over.

Yields:

  • dict[str, Any]

    A dictionary representing a single entry in the batch, where each key is a
    column name and the corresponding value is the entry in that column for the
    current row.

Source code in descent/utils/dataset.py
def iter_dataset(dataset: datasets.Dataset) -> typing.Iterator[dict[str, typing.Any]]:
    """Iterate over a Hugging Face Dataset, yielding each 'row' as a dictionary.

    Args:
        dataset: The dataset to iterate over.

    Yields:
        A dictionary representing a single entry in the batch, where each key is a
        column name and the corresponding value is the entry in that column for the
        current row.
    """

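    # Column names, in the order defined by the dataset's features.
    columns = [*dataset.features]

    # Read each column in full, then transpose the columns into row tuples.
    # strict=True (Python 3.10+) raises if the columns differ in length.
    for row in zip(*[dataset[column] for column in columns], strict=True):
        yield dict(zip(columns, row, strict=True))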