mindformers.models.clip.CLIPModel
- class mindformers.models.clip.CLIPModel(config: CLIPConfig)
CLIPModel. The supported model names can be listed with CLIPModel.show_support_list().
- Args:
config (CLIPConfig): The configuration of the CLIP model, which can be obtained from the CLIPConfig class.
- Examples:
>>> from mindformers import CLIPModel
>>> CLIPModel.show_support_list()
INFO - support list of CLIPModel is:
INFO - ['clip_vit_b_32']
INFO - -------------------------------------
>>> model = CLIPModel.from_pretrained('clip_vit_b_32')
>>> type(model)
<class 'mindformers.models.clip.clip.CLIPModel'>
- get_image_features(image: Tensor, pixel_values: Optional[Tensor] = None)
Get the image features.
- Args:
image (ms.Tensor): An image tensor produced by image_processor.
pixel_values (Optional[ms.Tensor]): Equivalent to "image". If "pixel_values" is set, "image" is ignored.
- Returns:
The image features.
- Examples:
>>> import numpy as np
>>> from mindformers import CLIPModel, CLIPProcessor
>>> processor = CLIPProcessor.from_pretrained('clip_vit_b_32')
>>> model = CLIPModel.from_pretrained('clip_vit_b_32')
>>> fake_image_batch = np.random.random((5, 3, 578, 213))
>>> model.get_image_features(processor.image_processor(fake_image_batch))
Tensor(shape=[5, 512], dtype=Float32, value=
[[-1.50102973e-001, -2.63687313e-001, -5.65953791e-001 ... -2.93511450e-001],
 [-1.50103331e-001, -2.63622820e-001, -5.65623760e-001 ... -2.93337226e-001],
 [-1.50102973e-001, -2.63687313e-001, -5.65953791e-001 ... -2.93511450e-001],
 [-1.49712294e-001, -2.64100820e-001, -5.65740824e-001 ... -2.93599486e-001],
 [-1.50102973e-001, -2.63687313e-001, -5.65953791e-001 ... -2.93511450e-001]])
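The Args description states that pixel_values takes the place of image when it is set. A minimal sketch of that precedence rule follows, in plain Python. The helper name resolve_alias is hypothetical and is not part of mindformers; the actual implementation inside CLIPModel may differ, and the string stand-ins replace what would be ms.Tensor objects in a real call.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_alias(primary: Optional[T], alias: Optional[T] = None) -> T:
    """Hypothetical helper: return `alias` when it is set, otherwise `primary`."""
    if alias is not None:
        # The alias (e.g. pixel_values) wins, so the primary argument is ignored.
        return alias
    if primary is None:
        raise ValueError("either the primary argument or its alias must be set")
    return primary


# Stand-in values for illustration only; real calls would pass tensors.
chosen = resolve_alias("image_tensor", "pixel_values_tensor")
fallback = resolve_alias("image_tensor")
```

The same precedence is documented for text and input_ids in get_text_features.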
- get_text_features(text: Tensor, input_ids: Optional[Tensor] = None)
Get the text features.
- Args:
text (ms.Tensor): A tensor of text token ids produced by the tokenizer.
input_ids (Optional[ms.Tensor]): Equivalent to "text". If "input_ids" is set, "text" is ignored.
- Returns:
The text features.
- Examples:
>>> from mindformers import CLIPModel, CLIPProcessor
>>> processor = CLIPProcessor.from_pretrained('clip_vit_b_32')
>>> model = CLIPModel.from_pretrained('clip_vit_b_32')
>>> fake_text_batch = ["a boy", "a girl", "a women", "a men"]
>>> text = processor.tokenizer(
...     fake_text_batch, max_length=77, padding="max_length", return_tensors="ms"
... )["input_ids"]
>>> model.get_text_features(text)
Tensor(shape=[4, 512], dtype=Float32, value=
[[6.03631809e-002, 1.79528534e-001, ... -2.23753393e-001, 1.42413378e-002],
 [1.28974199e-001, 7.46373609e-002, ... -3.68579805e-001, 1.53980583e-001],
 [9.89909172e-002, 2.01410800e-002, ... -2.54495114e-001, 7.68117979e-002],
 [3.16975415e-002, 2.26992741e-001, ... -5.22942394e-002, 1.98922127e-001]])
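Both methods return features with 512 columns in the examples above ([5, 512] for images, [4, 512] for texts), so the two outputs can be compared row against row. The sketch below shows one common way to do that, cosine similarity, using NumPy only. The function name cosine_similarity_matrix is hypothetical, and the random arrays are stand-ins for real features; with actual model outputs, the tensors would presumably first be converted with Tensor.asnumpy().

```python
import numpy as np


def cosine_similarity_matrix(image_features: np.ndarray,
                             text_features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cosine similarity of every image row with every text row."""
    # Scale each row to unit length so the dot product equals the cosine.
    image_unit = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    text_unit = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    return image_unit @ text_unit.T


# Random stand-ins shaped like the documented outputs (assumption, not real features).
rng = np.random.default_rng(0)
image_features = rng.standard_normal((5, 512))
text_features = rng.standard_normal((4, 512))

similarity = cosine_similarity_matrix(image_features, text_features)
# Index of the most similar text for each image.
best_text_per_image = similarity.argmax(axis=1)
```

Here similarity has one row per image and one column per text, so best_text_per_image holds five indices into the text batch.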