mindformers.models.clip.CLIPModel
class mindformers.models.clip.CLIPModel(config: mindformers.models.clip.clip_config.CLIPConfig)
The CLIP model. The supported model names can be listed with CLIPModel.show_support_list().
- Parameters
config (CLIPConfig) – The configuration of the CLIP model, which can be obtained from the CLIPConfig class.
Examples
>>> from mindformers import CLIPModel
>>> CLIPModel.show_support_list()
INFO - support list of CLIPModel is:
INFO - ['clip_vit_b_32']
INFO - -------------------------------------
>>> model = CLIPModel.from_pretrained('clip_vit_b_32')
>>> type(model)
<class 'mindformers.models.clip.clip.CLIPModel'>
get_image_features(image: mindspore.common.tensor.Tensor, pixel_values: Optional[mindspore.common.tensor.Tensor] = None)
Get the image features.
- Parameters
image (ms.Tensor) – An image tensor produced by image_processor.
pixel_values (Optional[ms.Tensor]) – Equivalent to "image"; if "pixel_values" is set, "image" is ignored.
- Returns
The image features.
Examples
>>> import numpy as np
>>> from mindformers import CLIPModel, CLIPProcessor
>>> processor = CLIPProcessor.from_pretrained('clip_vit_b_32')
>>> model = CLIPModel.from_pretrained('clip_vit_b_32')
>>> fake_image_batch = np.random.random((5, 3, 578, 213))
>>> model.get_image_features(processor.image_processor(fake_image_batch))
Tensor(shape=[5, 512], dtype=Float32, value=
[[-1.50102973e-001, -2.63687313e-001, -5.65953791e-001 ... -2.93511450e-001],
 [-1.50103331e-001, -2.63622820e-001, -5.65623760e-001 ... -2.93337226e-001],
 [-1.50102973e-001, -2.63687313e-001, -5.65953791e-001 ... -2.93511450e-001],
 [-1.49712294e-001, -2.64100820e-001, -5.65740824e-001 ... -2.93599486e-001],
 [-1.50102973e-001, -2.63687313e-001, -5.65953791e-001 ... -2.93511450e-001]])
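The parameter description for get_image_features says that pixel_values stands in for image whenever it is set. The sketch below only illustrates that precedence rule; it is not taken from the MindFormers source. The helper name resolve_image_input is hypothetical, and NumPy arrays stand in for ms.Tensor so the snippet runs without MindSpore installed.

```python
from typing import Optional

import numpy as np


def resolve_image_input(
    image: Optional[np.ndarray],
    pixel_values: Optional[np.ndarray] = None,
) -> Optional[np.ndarray]:
    """Hypothetical helper: pick the tensor that would actually be encoded.

    Mirrors the documented rule that "pixel_values", when provided,
    replaces "image". This is an assumption about behavior, not the
    library's real code.
    """
    return image if pixel_values is None else pixel_values


# Stand-in batches (assumed shapes, chosen only for illustration).
image = np.zeros((2, 3, 4, 4), dtype=np.float32)
pixel_values = np.ones((2, 3, 4, 4), dtype=np.float32)

only_image = resolve_image_input(image)
both_given = resolve_image_input(image, pixel_values)
```

Passing only image returns it unchanged, while passing both returns pixel_values. The same reading would apply to the text / input_ids pair of get_text_features.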
get_text_features(text: mindspore.common.tensor.Tensor, input_ids: Optional[mindspore.common.tensor.Tensor] = None)
Get the text features.
- Parameters
text (ms.Tensor) – A tensor of text token ids produced by the tokenizer.
input_ids (Optional[ms.Tensor]) – Equivalent to "text"; if "input_ids" is set, "text" is ignored.
- Returns
The text features.
Examples
>>> from mindformers import CLIPModel, CLIPProcessor
>>> processor = CLIPProcessor.from_pretrained('clip_vit_b_32')
>>> model = CLIPModel.from_pretrained('clip_vit_b_32')
>>> fake_text_batch = ["a boy", "a girl", "a women", "a men"]
>>> text = processor.tokenizer(
...     fake_text_batch, max_length=77, padding="max_length", return_tensors="ms"
... )["input_ids"]
>>> model.get_text_features(text)
Tensor(shape=[4, 512], dtype=Float32, value=
[[6.03631809e-002, 1.79528534e-001, ... -2.23753393e-001, 1.42413378e-002],
 [1.28974199e-001, 7.46373609e-002, ... -3.68579805e-001, 1.53980583e-001],
 [9.89909172e-002, 2.01410800e-002, ... -2.54495114e-001, 7.68117979e-002],
 [3.16975415e-002, 2.26992741e-001, ... -5.22942394e-002, 1.98922127e-001]])
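One common use of the two feature methods is to score each image against a set of candidate texts by cosine similarity. The sketch below is not part of the MindFormers API. It uses random NumPy arrays in place of the [5, 512] image features and [4, 512] text features shown in the examples, the helper names are hypothetical, and the logit_scale value of 100.0 is an assumption chosen for illustration.

```python
import numpy as np


def l2_normalize(features: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)


def image_to_text_probs(
    image_features: np.ndarray,
    text_features: np.ndarray,
    logit_scale: float = 100.0,  # assumed temperature, not read from the model
) -> np.ndarray:
    """Return a [num_images, num_texts] matrix of softmax scores."""
    img = l2_normalize(image_features)
    txt = l2_normalize(text_features)
    logits = logit_scale * (img @ txt.T)
    # Subtract the row maximum before exponentiating for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


# Random stand-ins for model.get_image_features(...) and
# model.get_text_features(...); real features would come from the model.
rng = np.random.default_rng(0)
image_features = rng.standard_normal((5, 512)).astype(np.float32)
text_features = rng.standard_normal((4, 512)).astype(np.float32)

probs = image_to_text_probs(image_features, text_features)
best_text = probs.argmax(axis=-1)  # index of the best-matching text per image
```

With real model outputs, the tensors would first need converting to NumPy (for example with Tensor.asnumpy() in MindSpore) before being passed to image_to_text_probs.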