mindformers.models.clip.CLIPProcessor
- class mindformers.models.clip.CLIPProcessor(image_processor, tokenizer, max_length=77, padding='max_length', return_tensors='ms')
CLIP processor. It combines an image processor (BaseImageProcessor) for image input and a tokenizer (BaseTokenizer) for text input.
- Args:
image_processor (BaseImageProcessor): Processes the image data.
tokenizer (BaseTokenizer): Processes the text data.
max_length (Optional[int]): The length of the text token sequence. Default: 77.
padding (Optional[str]): The padding strategy of the tokenizer, one of [None, 'max_length']. Default: 'max_length'.
return_tensors (Optional[str]): The type of tensors returned by the tokenizer, one of [None, 'ms']. Default: 'ms'.
- Examples:
>>> from mindformers import CLIPProcessor
>>> from mindformers.tools.image_tools import load_image
>>> image = load_image("https://ascend-repo-modelzoo.obs.cn-east-2."
...                    "myhuaweicloud.com/XFormer_for_mindspore/clip/sunflower.png")
>>> text = ["a boy", "a girl"]
>>> CLIPProcessor.show_support_list()
INFO - support list of CLIP Processor is:
INFO -    ['clip_vit_b_32']
INFO - -------------------------------------
>>> processor = CLIPProcessor.from_pretrained('clip_vit_b_32')
>>> processor(image, text)
{'image': Tensor(shape=[1, 3, 224, 224], dtype=Float32, value=
[[[[-1.52949083e+000, -1.52949083e+000,... -1.48569560e+000, -1.50029397e+000],
[-1.52949083e+000, -1.52949083e+000, ... -1.50029397e+000, -1.50029397e+000],
[-1.50029397e+000, -1.50029397e+000 ... -1.48569560e+000, -1.50029397e+000],
...
[8.23431015e-001, 8.80311251e-001, ... -1.33801913e+000, -1.43755960e+000],
[7.80770779e-001, 8.37651074e-001, ... -1.23847866e+000, -1.39489937e+000],
[6.10130012e-001, 7.66550720e-001, ... -1.19581854e+000, -1.38067937e+000]]]]),
'text': Tensor(shape=[2, 77], dtype=Int32, value=
[[49406,   320,  1876 ...     0,     0,     0],
[49406,   320,  1611 ...     0,     0,     0]])}
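
The text output above has shape [2, 77] and ends in zeros, which is what padding='max_length' with max_length=77 would produce. The following is a minimal pure-Python sketch of that padding step, not the MindFormers implementation. The helper name pad_token_ids, the pad id of 0 (inferred from the trailing zeros in the output), the truncation of over-long inputs, and the token ids in the sample batch are all assumptions made for illustration.

```python
from typing import List, Optional


def pad_token_ids(token_ids: List[int],
                  max_length: int = 77,
                  padding: Optional[str] = "max_length",
                  pad_id: int = 0) -> List[int]:
    """Hypothetical helper: pad one token-id sequence to a fixed length.

    Not part of mindformers; it only illustrates the padding argument.
    """
    if padding is None:
        # No padding requested: return the ids unchanged.
        return list(token_ids)
    if padding != "max_length":
        raise ValueError("padding must be None or 'max_length'")
    # Assumption: sequences longer than max_length are truncated.
    clipped = list(token_ids[:max_length])
    # Fill the remainder with the (assumed) pad id.
    return clipped + [pad_id] * (max_length - len(clipped))


# Made-up token ids standing in for two tokenized prompts.
batch = [[101, 7, 42], [101, 7, 43, 9]]
padded = [pad_token_ids(ids) for ids in batch]

print(len(padded), len(padded[0]))  # rows and columns of the padded batch
```

Every row now has the same length, so the batch can be stacked into a single 2-D integer tensor, matching the [batch_size, max_length] layout of the 'text' entry.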