mindformers.dataset.MaskLanguageModelDataset

class mindformers.dataset.MaskLanguageModelDataset(dataset_config: dict = None)[source]

BERT pretraining dataset.

Examples

>>> from mindformers.tools.register import MindFormerConfig
>>> from mindformers import MindFormerBook
>>> from mindformers.dataset import MaskLanguageModelDataset
>>> from mindformers.dataset import build_dataset, check_dataset_config
>>> config_dict_list = MindFormerBook.get_trainer_support_task_list()
>>> config_path = config_dict_list['fill_mask']['bert_tiny_uncased']
>>> # Initialize a MindFormerConfig instance with a specific config file of yaml.
>>> config = MindFormerConfig(config_path)
>>> config.train_dataset.data_loader.dataset_dir = "The required task dataset path"
>>> # Note: for detailed data settings, refer to
>>> # https://gitee.com/mindspore/mindformers/blob/r0.3/docs/model_cards/bert.md
>>> check_dataset_config(config)
>>> # 1) use config dict to build dataset
>>> dataset_from_config = build_dataset(config.train_dataset_task)
>>> # 2) use class name to build dataset
>>> dataset_from_name = build_dataset(class_name='MaskLanguageModelDataset',
...                                   dataset_config=config.train_dataset_task.dataset_config)
>>> # 3) use class to build dataset
>>> dataset_from_class = MaskLanguageModelDataset(config.train_dataset_task.dataset_config)
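The three construction paths above (config dict, class name, direct class call) follow a common registry pattern. The sketch below is a self-contained toy that illustrates that pattern only; it is not the MindFormers implementation. The names `ToyDataset`, `toy_build`, `register`, the `"type"` key, and the `batch_size` field are all hypothetical and chosen for illustration.

```python
# Toy registry illustrating the three construction paths.
# Assumption: everything here is hypothetical and NOT the MindFormers code.
from typing import Any, Callable, Dict, Optional

_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(cls):
    """Record a class under its own name so it can be looked up by string."""
    _REGISTRY[cls.__name__] = cls
    return cls


@register
class ToyDataset:
    """Hypothetical stand-in for a dataset class that takes one config dict."""

    def __init__(self, dataset_config: Optional[dict] = None):
        self.dataset_config = dataset_config or {}


def toy_build(config: Optional[dict] = None,
              class_name: Optional[str] = None,
              **kwargs):
    """Build from a {'type': name, ...} dict, or from a class name plus kwargs."""
    if config is not None:
        params = dict(config)            # copy so the caller's dict is not mutated
        class_name = params.pop("type")  # assumed key naming the registered class
        kwargs = {**params, **kwargs}
    if class_name is None:
        raise ValueError("either config or class_name is required")
    return _REGISTRY[class_name](**kwargs)


cfg = {"batch_size": 4}  # hypothetical dataset settings

# 1) build from a config dict that names the class
from_config = toy_build({"type": "ToyDataset", "dataset_config": cfg})
# 2) build from the class name
from_name = toy_build(class_name="ToyDataset", dataset_config=cfg)
# 3) instantiate the class directly
from_class = ToyDataset(cfg)
```

In this toy, all three objects end up holding the same settings; the string-based paths exist so that a YAML file can select the class without importing it.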