mindformers.models.clip.CLIPTokenizer
class mindformers.models.clip.CLIPTokenizer(vocab_file: str, eos_token: str = '<|endoftext|>', bos_token: str = '<|startoftext|>', pad_token: str = '<|endoftext|>', unk_token: str = '<|endoftext|>')
CLIP tokenizer.
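Parameters
vocab_file (str) – Path to the vocabulary file.
eos_token (str) – End-of-sequence token. Default: '<|endoftext|>'.
bos_token (str) – Beginning-of-sequence token. Default: '<|startoftext|>'.
pad_token (str) – Padding token. Default: '<|endoftext|>'.
unk_token (str) – Token used for unknown input. Default: '<|endoftext|>'.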
Examples
>>> from mindformers import CLIPTokenizer
>>> CLIPTokenizer.show_support_list()
INFO - support list of CLIPTokenizer is:
INFO - ['clip_vit_b_32']
INFO - -------------------------------------
>>> tokenizer = CLIPTokenizer.from_pretrained('clip_vit_b_32')
>>> tokenizer("a boy")
{'input_ids': [49406, 320, 1876, 49407], 'attention_mask': [1, 1, 1, 1]}
FILE_LIST = ['tokenizer_config.json']
CLIP tokenizer.
build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)
Insert the special tokens into the input ids. Currently, token_ids_0 is supported as a list of ids.
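The effect of inserting special tokens can be sketched in plain Python without importing mindformers. The helper `wrap_with_special_tokens` below is a hypothetical stand-in, not the library's implementation. The ids 49406 and 49407 are assumptions read off the documented output for "a boy", where they appear as the first and last entries; this reference does not state that they are the BOS and EOS ids.

```python
from typing import List

# Assumed ids, taken from the documented example output
# [49406, 320, 1876, 49407]. Treating the first and last entries as the
# BOS and EOS ids is an inference, not something this reference states.
BOS_ID = 49406
EOS_ID = 49407


def wrap_with_special_tokens(token_ids: List[int],
                             bos_id: int = BOS_ID,
                             eos_id: int = EOS_ID) -> List[int]:
    """Hypothetical helper: return a new list [bos_id] + token_ids + [eos_id]."""
    return [bos_id] + list(token_ids) + [eos_id]


# 320 and 1876 are the two middle ids from the documented example.
wrapped = wrap_with_special_tokens([320, 1876])

# One mask entry per id, all set to 1 because nothing here is padding.
attention_mask = [1] * len(wrapped)

print(wrapped)
print(attention_mask)
```

Under these assumptions the sketch produces the same `input_ids` and `attention_mask` as the documented `tokenizer("a boy")` call. The helper returns a new list and leaves its argument unmodified.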
property vocab_size
Get the vocabulary size.