mindformers.pipeline.TokenClassificationPipeline

class mindformers.pipeline.TokenClassificationPipeline(model, id2label, tokenizer=None, **kwargs)[source]

Pipeline for token classification

Parameters
  • model (Union[str, BaseModel]) – The model used to perform the task. The input can be a supported model name or a model instance inherited from BaseModel.

  • tokenizer (Tokenizer, optional) – The tokenizer used for text processing. Default: None.

  • id2label (dict) – A dict that maps each label id to its label string.

Raises
  • TypeError – If the types of the input model and tokenizer are incorrect.

  • ValueError – If the input model is not in the support list.

Examples

>>> from mindformers.pipeline import TokenClassificationPipeline
>>> from mindformers import AutoTokenizer, BertForTokenClassification, AutoConfig
>>> from mindformers.dataset.labels import cluener_labels
>>> id2label = {label_id: label for label_id, label in enumerate(cluener_labels)}
>>> input_data = ["表身刻有代表日内瓦钟表匠freresoltramare的“fo”字样。"]
>>> tokenizer = AutoTokenizer.from_pretrained('tokcls_bert_base_chinese_cluener')
>>> ner_dense_cluener_config = AutoConfig.from_pretrained('tokcls_bert_base_chinese_cluener')
>>> model = BertForTokenClassification(ner_dense_cluener_config)
>>> tokcls_pipeline = TokenClassificationPipeline(task='token_classification',
...                                               model=model,
...                                               id2label=id2label,
...                                               tokenizer=tokenizer,
...                                               max_length=model.config.seq_length,
...                                               padding="max_length")
>>> results = tokcls_pipeline(input_data)
>>> print(results)
    [[{'entity_group': 'address', 'start': 6, 'end': 8, 'score': 0.52329, 'word': '日内瓦'},
      {'entity_group': 'name', 'start': 12, 'end': 25, 'score': 0.83922, 'word': 'freresoltramar'}]]
forward(model_inputs, **forward_params)[source]

Forward process

Parameters

model_inputs (dict) – The outputs of preprocess.

Returns

The probs dict.

get_entities_bios(seq)[source]

Gets entities from a sequence of labels in the BIOS tagging scheme.

Parameters

seq (list) – sequence of labels.

Returns

list of (chunk_type, chunk_start, chunk_end).

Return type

list

Example

>>> seq = ['B-PER', 'I-PER', 'O', 'S-LOC']
>>> get_entities_bios(seq)
[['PER', 0, 1], ['LOC', 3, 3]]
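The BIOS extraction described above can be sketched in plain Python. This is a minimal illustration written as a standalone function, not the library's source: the handling of a stray "I-" tag (it closes any open chunk and is dropped) is an assumption, and the real method may treat malformed sequences differently.

```python
def get_entities_bios(seq):
    """Return [chunk_type, chunk_start, chunk_end] spans (end inclusive)."""
    chunks = []
    current = None  # the chunk being extended, or None when no chunk is open
    for index, tag in enumerate(seq):
        prefix, _, chunk_type = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == chunk_type:
            # "I-X" continues an open chunk of the same type
            current[2] = index
            continue
        # Any other tag ends the open chunk, if there is one
        if current is not None:
            chunks.append(current)
            current = None
        if prefix == "B":
            # "B-X" opens a new multi-token chunk
            current = [chunk_type, index, index]
        elif prefix == "S":
            # "S-X" is a complete single-token chunk
            chunks.append([chunk_type, index, index])
        # "O" and stray "I-X" tags (assumption) contribute nothing
    if current is not None:
        chunks.append(current)
    return chunks


print(get_entities_bios(["B-PER", "I-PER", "O", "S-LOC"]))
```

Both indices are positions in the label sequence, and the end index is inclusive, which matches the example above.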

postprocess(model_outputs, **postprocess_params)[source]

Postprocess

Parameters

model_outputs (dict) – The outputs of the forward process.

Returns

The generated results.
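To make the shape of the generated results concrete, the sketch below groups per-character predictions into dictionaries with the same keys as the pipeline output shown in the class example (entity_group, start, end, score, word). It is an illustration under stated assumptions, not the library's postprocess: the helper name group_entities, the one-prediction-per-character alignment, the sample label set, and the use of the mean probability as the score are all hypothetical.

```python
def group_entities(text, label_ids, scores, id2label):
    """Merge per-character BIOS predictions into entity dictionaries."""
    labels = [id2label[label_id] for label_id in label_ids]
    spans = []      # finished [entity_type, start, end] spans, end inclusive
    current = None  # span still being extended
    for index, label in enumerate(labels):
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and current is not None and current[0] == entity_type:
            current[2] = index
            continue
        if current is not None:
            spans.append(current)
            current = None
        if prefix == "B":
            current = [entity_type, index, index]
        elif prefix == "S":
            spans.append([entity_type, index, index])
    if current is not None:
        spans.append(current)

    results = []
    for entity_type, start, end in spans:
        span_scores = scores[start:end + 1]
        results.append({
            "entity_group": entity_type,
            "start": start,
            "end": end,
            # Assumption: the entity score is the mean per-character probability
            "score": sum(span_scores) / len(span_scores),
            "word": text[start:end + 1],
        })
    return results


# Hypothetical label map and predictions, for illustration only
id2label = {0: "O", 1: "B-address", 2: "I-address"}
text = "代表日内瓦钟表匠"
label_ids = [0, 0, 1, 2, 2, 0, 0, 0]
scores = [0.9, 0.9, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9]
print(group_entities(text, label_ids, scores, id2label))
```

As in the class example, start and end are character offsets into the input text and end is inclusive, so the entity text is text[start:end + 1].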

preprocess(inputs, **preprocess_params)[source]

Preprocess of token classification

Parameters
  • inputs (str) – The text to be classified.

  • max_length (int) – The maximum length of the tokenizer's output.

  • padding (False / "max_length") – Whether to pad the tokenizer's output to max_length.

  • return_tensors ("ms") – The type of the returned tensors.

Returns

The processed text.
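The effect of padding="max_length" together with max_length can be illustrated without the tokenizer. The helper below is a hypothetical sketch, not part of mindformers: the name pad_to_max_length, the pad id of 0, the truncation of over-long inputs, and the output keys are all assumptions made for illustration.

```python
def pad_to_max_length(input_ids, max_length, pad_id=0):
    """Pad (or truncate) a list of token ids to exactly max_length entries."""
    # Assumption: inputs longer than max_length are truncated
    ids = list(input_ids)[:max_length]
    pad_count = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * pad_count,
        # 1 marks a real token, 0 marks padding
        "attention_mask": [1] * len(ids) + [0] * pad_count,
    }


# Hypothetical token ids, for illustration only
print(pad_to_max_length([101, 2769, 102], max_length=6))
```

Padding every input to the same length gives the model fixed-shape inputs, which is why the class example passes max_length=model.config.seq_length.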