mindformers.pipeline.TextClassificationPipeline

class mindformers.pipeline.TextClassificationPipeline(model, tokenizer=None, **kwargs)

Pipeline for text classification.

Parameters
  • model (Union[str, BaseModel]) – The model used to perform the task. It can be either the name of a supported model or a model instance that inherits from BaseModel.

  • tokenizer (Union[None, Tokenizer]) – The tokenizer used for text processing. Default: None.

Raises
  • TypeError – If the types of the input model or tokenizer are incorrect.

  • ValueError – If the input model is not in the supported list.
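The checks listed under Raises can be pictured with the minimal sketch below. It is not MindFormers' implementation: BaseModel and BaseTokenizer here are local placeholder classes, check_pipeline_args is a hypothetical helper, and SUPPORTED_MODELS is a hypothetical support list that holds only the model name used in this page's example.

```python
# Hypothetical stand-ins; MindFormers' real classes and support list differ.
class BaseModel:
    """Placeholder for mindformers' BaseModel."""


class BaseTokenizer:
    """Placeholder for a tokenizer base class."""


# Hypothetical support list, for illustration only.
SUPPORTED_MODELS = {"txtcls_bert_base_uncased_mnli"}


def check_pipeline_args(model, tokenizer=None):
    """Validate pipeline arguments along the lines of the Raises section."""
    if isinstance(model, str):
        # A model given by name must appear in the support list.
        if model not in SUPPORTED_MODELS:
            raise ValueError(
                f"{model!r} is not in the supported list: {sorted(SUPPORTED_MODELS)}"
            )
    elif not isinstance(model, BaseModel):
        # Anything else must be a model instance.
        raise TypeError(
            f"model must be a str or a BaseModel instance, got {type(model).__name__}"
        )
    # Assumption: the tokenizer, when given, must be a tokenizer instance.
    if tokenizer is not None and not isinstance(tokenizer, BaseTokenizer):
        raise TypeError(
            f"tokenizer must be None or a BaseTokenizer, got {type(tokenizer).__name__}"
        )
```

For example, check_pipeline_args("unknown_model") raises ValueError, while check_pipeline_args(42) raises TypeError.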

Examples

>>> from mindformers.pipeline import TextClassificationPipeline
>>> from mindformers import AutoTokenizer, BertForMultipleChoice, AutoConfig
>>> input_data = ["The new rights are nice enough-Everyone really likes the newest benefits ",
...               "i don't know um do you do a lot of camping-I know exactly."]
>>> tokenizer = AutoTokenizer.from_pretrained('txtcls_bert_base_uncased_mnli')
>>> txtcls_mnli_config = AutoConfig.from_pretrained('txtcls_bert_base_uncased_mnli')
>>> model = BertForMultipleChoice(txtcls_mnli_config)
>>> txtcls_pipeline = TextClassificationPipeline(task='text_classification',
...                                              model=model,
...                                              tokenizer=tokenizer,
...                                              max_length=model.config.seq_length,
...                                              padding="max_length")
>>> results = txtcls_pipeline(input_data, top_k=1)
>>> print(results)
[[{'label': 'neutral', 'score': 0.9714198708534241}],
 [{'label': 'contradiction', 'score': 0.9967639446258545}]]

forward(model_inputs, **forward_params)

Run the forward pass of the model on the preprocessed inputs.

Parameters
  • model_inputs (dict) – The outputs of preprocess().

Returns

A dict containing the predicted probabilities.
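The Returns entry says forward produces a dict of probabilities. Assuming the network emits raw logits that are normalized with a softmax (an assumption; the page does not say how the probabilities are computed), the conversion for a single input could look like the sketch below. The logits_to_probs name and the "probs" key are hypothetical.

```python
import math


def logits_to_probs(logits):
    """Turn one input's raw logits into a dict of softmax probabilities.

    Assumption: the model returns unnormalized logits; the "probs" key
    name is hypothetical.
    """
    # Subtract the largest logit so that exp() cannot overflow.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return {"probs": [e / total for e in exps]}


print(logits_to_probs([1.0, 1.0]))  # {'probs': [0.5, 0.5]}
```

Subtracting the maximum logit does not change the softmax result, but it keeps large logits such as 1000.0 from overflowing math.exp.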

inputs_process(inputs_zero, inputs_one)

Process a sentence pair for sentence-relationship classification.

Parameters
  • inputs_zero (str) – The first sentence.

  • inputs_one (str) – The second sentence.

Returns

The processed input IDs, attention mask, and token type IDs for the sentence pair.
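inputs_process returns input IDs, a mask, and token types for a sentence pair. The sketch below shows the common BERT-style layout those three items suggest: [CLS] A [SEP] B [SEP], a mask of 1 for real tokens and 0 for padding, and token type 0 for the first segment and 1 for the second. This layout is an assumption, not MindFormers' code; encode_pair and the toy vocabulary are hypothetical, and the sentences are taken as already tokenized.

```python
def encode_pair(tokens_zero, tokens_one, vocab, max_length):
    """Build BERT-style (input_ids, attention_mask, token_type_ids) for a pair.

    Hypothetical helper: inputs are pre-tokenized word lists and vocab maps
    tokens to IDs, with "[PAD]", "[UNK]", "[CLS]" and "[SEP]" entries.
    """
    tokens = ["[CLS]", *tokens_zero, "[SEP]", *tokens_one, "[SEP]"]
    if len(tokens) > max_length:
        raise ValueError(
            f"pair needs {len(tokens)} positions but max_length is {max_length}"
        )
    input_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    # Segment 0: [CLS], sentence A, first [SEP]. Segment 1: sentence B, last [SEP].
    token_type_ids = [0] * (len(tokens_zero) + 2) + [1] * (len(tokens_one) + 1)
    attention_mask = [1] * len(tokens)
    # Right-pad every list to max_length; padded positions are masked out.
    pad = max_length - len(tokens)
    input_ids += [vocab["[PAD]"]] * pad
    token_type_ids += [0] * pad
    attention_mask += [0] * pad
    return input_ids, attention_mask, token_type_ids


vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "i": 4, "know": 5, "exactly": 6}
ids, mask, types = encode_pair(["i", "know"], ["exactly"], vocab, max_length=8)
print(ids)    # [2, 4, 5, 3, 6, 3, 0, 0]
print(mask)   # [1, 1, 1, 1, 1, 1, 0, 0]
print(types)  # [0, 0, 0, 0, 1, 1, 0, 0]
```

Raising on over-long pairs keeps the sketch short; a real tokenizer would usually truncate instead.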

postprocess(model_outputs, **postprocess_params)

Postprocess the model outputs into classification results.

Parameters
  • model_outputs (dict) – The outputs of the forward process.

  • top_k (int) – The number of highest-probability labels to return for each input.

Returns

The classification results: for each input, a list of dicts with 'label' and 'score' keys.
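The result format in the class example (one list of {'label': ..., 'score': ...} dicts per input, trimmed to top_k) can be produced from per-input probabilities with a simple sort. In this sketch, top_k_labels is hypothetical, and the id2label mapping is an assumed MNLI label order for illustration only; the real order depends on the model configuration.

```python
def top_k_labels(batch_probs, id2label, top_k=1):
    """Return, for each input, its top_k labels and scores, highest first."""
    results = []
    for probs in batch_probs:
        # Rank class indices by probability, largest first.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        results.append(
            [{"label": id2label[i], "score": probs[i]} for i in ranked[:top_k]]
        )
    return results


# Assumed label order, for illustration only.
id2label = {0: "contradiction", 1: "entailment", 2: "neutral"}
print(top_k_labels([[0.1, 0.2, 0.7], [0.9, 0.05, 0.05]], id2label, top_k=1))
# [[{'label': 'neutral', 'score': 0.7}], [{'label': 'contradiction', 'score': 0.9}]]
```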

preprocess(inputs, **preprocess_params)

Preprocess the input text for text classification.

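Parameters
  • inputs (str) – The text to be classified.

  • max_length (int) – The maximum length of the tokenizer's output.

  • padding (False / "max_length") – The padding strategy: "max_length" pads the output to max_length, and False disables padding.

  • return_tensors ("ms") – The type of the returned tensors; "ms" returns MindSpore tensors.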

Returns

The processed text.
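The effect of max_length and padding can be sketched on a plain list of token IDs. This assumes that sequences longer than max_length are truncated and that padding="max_length" appends pad_id on the right; pad_or_truncate and pad_id=0 are hypothetical, and the actual tokenizer may behave differently.

```python
def pad_or_truncate(ids, max_length, padding="max_length", pad_id=0):
    """Shape one token-ID list according to max_length and padding.

    Assumptions: over-long inputs are truncated to max_length;
    padding="max_length" right-pads with pad_id; padding=False
    leaves short inputs unpadded.
    """
    if padding not in (False, "max_length"):
        raise ValueError(f"unsupported padding value: {padding!r}")
    # Truncate first so the result never exceeds max_length.
    ids = list(ids[:max_length])
    if padding == "max_length":
        ids += [pad_id] * (max_length - len(ids))
    return ids


print(pad_or_truncate([5, 6, 7], 5))                 # [5, 6, 7, 0, 0]
print(pad_or_truncate([5, 6, 7], 5, padding=False))  # [5, 6, 7]
print(pad_or_truncate([5, 6, 7], 2))                 # [5, 6]
```

Padding every input to the same max_length, as the class example does with padding="max_length", gives fixed-shape batches.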