mindformers.pipeline.TextClassificationPipeline
class mindformers.pipeline.TextClassificationPipeline(model, tokenizer=None, **kwargs)
Pipeline for text classification.
- Parameters
model (Union[str, BaseModel]) – The model used to perform the task. This is either a supported model name or a model instance that inherits from BaseModel.
tokenizer (None or Tokenizer) – The tokenizer used for text processing. Default: None.
- Raises
TypeError – If the type of the input model or tokenizer is not correct.
ValueError – If the input model is not in the support list.
Examples
>>> from mindformers.pipeline import TextClassificationPipeline
>>> from mindformers import AutoTokenizer, BertForMultipleChoice, AutoConfig
>>> input_data = ["The new rights are nice enough-Everyone really likes the newest benefits ",
...               "i don't know um do you do a lot of camping-I know exactly."]
>>> tokenizer = AutoTokenizer.from_pretrained('txtcls_bert_base_uncased_mnli')
>>> txtcls_mnli_config = AutoConfig.from_pretrained('txtcls_bert_base_uncased_mnli')
>>> model = BertForMultipleChoice(txtcls_mnli_config)
>>> txtcls_pipeline = TextClassificationPipeline(task='text_classification',
...                                              model=model,
...                                              tokenizer=tokenizer,
...                                              max_length=model.config.seq_length,
...                                              padding="max_length")
>>> results = txtcls_pipeline(input_data, top_k=1)
>>> print(results)
[[{'label': 'neutral', 'score': 0.9714198708534241}], [{'label': 'contradiction', 'score': 0.9967639446258545}]]
forward(model_inputs, **forward_params)
Run the forward pass of the model.
- Parameters
model_inputs (dict) – The outputs of preprocess.
- Returns
A dict containing the predicted probabilities (probs).
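The parameter descriptions suggest that the three stages feed into each other: preprocess produces model_inputs, forward turns them into model_outputs, and postprocess ranks the result. The sketch below illustrates that chaining with a stand-in class. ToyPipeline, its made-up labels, and the assumption that calling the pipeline runs the stages in this order are all hypothetical and are not taken from the MindFormers source.

```python
class ToyPipeline:
    """Stand-in that mirrors the three stages; it is not the MindFormers class."""

    def preprocess(self, inputs, **preprocess_params):
        # Pretend tokenization: one "token" per whitespace-separated word.
        return {"input_ids": inputs.split()}

    def forward(self, model_inputs, **forward_params):
        # Pretend model: score two made-up labels from the token count.
        n = len(model_inputs["input_ids"])
        return {"probs": {"short": 1.0 / n, "long": 1.0 - 1.0 / n}}

    def postprocess(self, model_outputs, top_k=1, **postprocess_params):
        # Rank labels by score, highest first, and keep the top_k entries.
        ranked = sorted(model_outputs["probs"].items(),
                        key=lambda kv: kv[1], reverse=True)
        return [{"label": label, "score": score} for label, score in ranked[:top_k]]

    def __call__(self, inputs, top_k=1):
        # Assumed call order: preprocess -> forward -> postprocess.
        model_inputs = self.preprocess(inputs)
        model_outputs = self.forward(model_inputs)
        return self.postprocess(model_outputs, top_k=top_k)


result = ToyPipeline()("a b c d")
```

Keeping each stage as a separate method means a stage can be swapped or tested in isolation, which is presumably why the real class exposes them individually.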
inputs_process(inputs_zero, inputs_one)
Process a pair of sentences for sentence-relationship classification.
- Parameters
inputs_zero (str) – The first sentence.
inputs_one (str) – The second sentence.
- Returns
The processed inputs, mask, and token types for the two sentences.
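To make the three returned values concrete, here is a minimal sketch of how a sentence pair is commonly laid out for BERT-style models: [CLS] first sentence [SEP] second sentence [SEP]. It assumes inputs_process follows this usual convention, which the documentation above does not confirm. The toy tokenizer, the ids 101 and 102, and the function names are invented for illustration.

```python
# Special-token ids invented for this sketch only.
CLS_ID, SEP_ID = 101, 102


def toy_tokenize(sentence, vocab):
    """Whitespace tokenizer that assigns ids on first sight (illustrative only)."""
    return [vocab.setdefault(word, len(vocab) + 1000)
            for word in sentence.lower().split()]


def encode_pair(inputs_zero, inputs_one):
    """Build ids, mask and token types for [CLS] A [SEP] B [SEP]."""
    vocab = {}
    ids_zero = toy_tokenize(inputs_zero, vocab)
    ids_one = toy_tokenize(inputs_one, vocab)
    input_ids = [CLS_ID] + ids_zero + [SEP_ID] + ids_one + [SEP_ID]
    # Segment 0 covers [CLS], the first sentence and its [SEP]; segment 1 the rest.
    token_type_ids = [0] * (len(ids_zero) + 2) + [1] * (len(ids_one) + 1)
    # No padding is applied here, so every position is a real token.
    attention_mask = [1] * len(input_ids)
    return input_ids, attention_mask, token_type_ids


ids, mask, types = encode_pair("The new rights are nice enough",
                               "Everyone really likes the newest benefits")
```

The token types are what let the model tell the two sentences apart, which matters for relationship labels such as 'neutral' or 'contradiction'.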
postprocess(model_outputs, **postprocess_params)
Postprocess the model outputs into classification results.
- Parameters
model_outputs (dict) – The outputs of the forward process.
top_k (int) – The number of highest-probability results to return.
- Returns
The classification results.
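As an illustration of what a top_k selection might look like, the sketch below turns raw scores into probabilities with a softmax and keeps the top_k labels in the same {'label', 'score'} shape as the example output. It is not the library's implementation. The label order in LABELS is an assumption: the example output only shows 'neutral' and 'contradiction', and 'entailment' is assumed as the third MNLI class.

```python
import math

# Hypothetical label order, assumed for this sketch.
LABELS = ["entailment", "neutral", "contradiction"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, labels=LABELS, top_k=1):
    """Return the top_k label/score entries, highest score first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [{"label": label, "score": score} for label, score in ranked[:top_k]]


result = top_k_labels([0.2, 3.1, -1.0], top_k=2)
```

With top_k=1 each input yields a single-entry list, which matches the nesting seen in the example output.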
preprocess(inputs, **preprocess_params)
Preprocess the input text for classification.
- Parameters
inputs (str) – The text to be classified.
max_length (int) – The maximum length of the tokenizer's output.
padding (False / "max_length") – The padding strategy: False for no padding, or "max_length" to pad to max_length.
return_tensors ("ms") – The type of the returned tensors.
- Returns
The processed text.
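The interaction between max_length and padding can be sketched as follows. This is a simplified illustration, not the tokenizer's actual code. PAD_ID, the function name, and the choice to truncate inputs longer than max_length are assumptions made for the sketch.

```python
PAD_ID = 0  # assumed padding id, chosen for this sketch


def pad_or_truncate(input_ids, max_length, padding="max_length"):
    """Truncate to max_length, then optionally pad up to it; also return the mask."""
    ids = list(input_ids[:max_length])
    mask = [1] * len(ids)  # 1 marks a real token
    if padding == "max_length":
        missing = max_length - len(ids)
        ids += [PAD_ID] * missing
        mask += [0] * missing  # 0 marks a padding position
    return ids, mask


padded_ids, padded_mask = pad_or_truncate([101, 7, 8, 102], max_length=6)
unpadded_ids, unpadded_mask = pad_or_truncate([101, 7, 8, 102], max_length=6,
                                              padding=False)
```

Padding every input to the same length, as the class example does with padding="max_length", gives the model fixed-shape inputs, while padding=False keeps each sequence at its natural length.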