mindformers.AutoTokenizer
- class mindformers.AutoTokenizer
Loads a tokenizer according to yaml_name_or_path. Two cases are supported:
1. yaml_name_or_path is a model name.
2. yaml_name_or_path is the path to the downloaded files.
- Examples:
>>> from mindformers.auto_class import AutoTokenizer
>>>
>>> # 1) Instantiate a tokenizer by model name.
>>> tokenizer_a = AutoTokenizer.from_pretrained("clip_vit_b_32")
>>>
>>> # 2) Instantiate a tokenizer from the path to the downloaded files.
>>> from mindformers.models.clip.clip_tokenizer import CLIPTokenizer
>>> clip_tokenizer = CLIPTokenizer.from_pretrained("clip_vit_b_32")
>>> path_saved = "./clip_tokenizer"  # example directory (an assumption); use any writable path
>>> clip_tokenizer.save_pretrained(path_saved)
>>> restore_tokenizer = AutoTokenizer.from_pretrained(path_saved)
- classmethod from_pretrained(yaml_name_or_path, **kwargs)
Instantiates a tokenizer from a supported yaml name or a path.
- Args:
- yaml_name_or_path (str): A supported yaml name or a path to a .yaml file.
Supported model names can be selected from .show_support_list(). A model name may be given with the mindspore prefix or on its own, for example "mindspore/clip_vit_b_32" or "clip_vit_b_32".
- pretrained_model_name_or_path (Optional[str]): Equivalent to yaml_name_or_path.
If pretrained_model_name_or_path is set, yaml_name_or_path is ignored.
- Returns:
A tokenizer instance of a class that inherits from PretrainedTokenizer.
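
The argument rules described above can be illustrated with a small standalone sketch. It is not the MindFormers implementation: resolve_name_or_path is a hypothetical helper written for this illustration, and treating the "mindspore/" prefix as something that is stripped is an assumption based only on the statement that both name forms are accepted. Paths that do not start with that prefix are returned unchanged.

```python
from typing import Optional

# Prefix mentioned in the yaml_name_or_path description.
MINDSPORE_PREFIX = "mindspore/"


def resolve_name_or_path(
    yaml_name_or_path: Optional[str] = None,
    pretrained_model_name_or_path: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick the effective name or path.

    Mirrors the documented rules only:
    - pretrained_model_name_or_path, when set, takes precedence;
    - "mindspore/<name>" and "<name>" refer to the same model
      (modelled here by stripping the prefix, which is an assumption).
    """
    if pretrained_model_name_or_path is not None:
        target = pretrained_model_name_or_path
    else:
        target = yaml_name_or_path

    if target is None:
        raise ValueError("Either argument must be provided.")

    if target.startswith(MINDSPORE_PREFIX):
        return target[len(MINDSPORE_PREFIX):]
    return target


# Both spellings of the model name resolve to the same value.
plain = resolve_name_or_path("clip_vit_b_32")
prefixed = resolve_name_or_path("mindspore/clip_vit_b_32")

# When both arguments are given, yaml_name_or_path is ignored.
overridden = resolve_name_or_path(
    "ignored_name", pretrained_model_name_or_path="clip_vit_b_32"
)
```

In real code the resolved value would be passed to AutoTokenizer.from_pretrained, which performs its own handling. The sketch only makes the precedence and naming rules concrete.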