mindformers.pipeline.TokenClassificationPipeline
- class mindformers.pipeline.TokenClassificationPipeline(model, id2label, tokenizer=None, **kwargs)
Pipeline for token classification
- Args:
model (Union[str, BaseModel]): The model used to perform the task. It can be a supported model name or a model instance inherited from BaseModel.
id2label (dict): A dict that maps each label id to its label string.
tokenizer (None or Tokenizer): The tokenizer used for text processing. Default: None.
- Raises:
TypeError: If the type of the input model or tokenizer is incorrect.
ValueError: If the input model is not in the support list.
- Examples:
>>> from mindformers.pipeline import TokenClassificationPipeline
>>> from mindformers import AutoTokenizer, BertForTokenClassification, AutoConfig
>>> from mindformers.dataset.labels import cluener_labels
>>> id2label = {label_id: label for label_id, label in enumerate(cluener_labels)}
>>> input_data = ["表身刻有代表日内瓦钟表匠freresoltramare的“fo”字样。"]
>>> tokenizer = AutoTokenizer.from_pretrained('tokcls_bert_base_chinese_cluener')
>>> ner_dense_cluener_config = AutoConfig.from_pretrained('tokcls_bert_base_chinese_cluener')
>>> model = BertForTokenClassification(ner_dense_cluener_config)
>>> tokcls_pipeline = TokenClassificationPipeline(task='token_classification',
...                                               model=model,
...                                               id2label=id2label,
...                                               tokenizer=tokenizer,
...                                               max_length=model.config.seq_length,
...                                               padding="max_length")
>>> results = tokcls_pipeline(input_data)
>>> print(results)
[[{'entity_group': 'address', 'start': 6, 'end': 8, 'score': 0.52329, 'word': '日内瓦'},
  {'entity_group': 'name', 'start': 12, 'end': 25, 'score': 0.83922, 'word': 'freresoltramar'}]]
- forward(model_inputs, **forward_params)
Run the forward pass of the model.
- Args:
model_inputs (dict): Outputs of preprocess.
- Return:
A dict of predicted probabilities.
- get_entities_bios(seq)
Gets entities from a sequence of labels tagged in the BIOS scheme.
- Args:
seq (list): Sequence of labels.
- Returns:
list: List of (chunk_type, chunk_start, chunk_end), where chunk_end is inclusive.
- Example:
>>> seq = ['B-PER', 'I-PER', 'O', 'S-LOC']
>>> get_entities_bios(seq)
[['PER', 0, 1], ['LOC', 3, 3]]
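The BIOS decoding described above can be sketched in plain Python. This is a minimal illustration, not the MindFormers implementation: the function name `decode_bios` is hypothetical, and its handling of malformed sequences (for example an `I-` tag with no preceding `B-` tag) is an assumption that may differ from the library.

```python
def decode_bios(seq):
    """Hypothetical BIOS decoder; a sketch, not MindFormers' own code.

    Returns a list of [chunk_type, chunk_start, chunk_end] with an
    inclusive end index.
    """
    chunks = []
    current = None  # the entity currently being extended, or None

    for idx, tag in enumerate(seq):
        # "B-PER" -> prefix "B", label "PER"; "O" -> prefix "O", label ""
        prefix, _, label = tag.partition("-")

        if prefix == "S" and label:
            # Single-token entity: close any open entity, then emit this one.
            if current is not None:
                chunks.append(current)
                current = None
            chunks.append([label, idx, idx])
        elif prefix == "B" and label:
            # Beginning of a new entity: close any open entity first.
            if current is not None:
                chunks.append(current)
            current = [label, idx, idx]
        elif prefix == "I" and label and current is not None and current[0] == label:
            # Continuation of the open entity: extend its end index.
            current[2] = idx
        else:
            # "O", or an "I-" tag that does not continue the open entity
            # (assumption: such a tag just closes the open entity).
            if current is not None:
                chunks.append(current)
                current = None

    if current is not None:
        chunks.append(current)
    return chunks


print(decode_bios(['B-PER', 'I-PER', 'O', 'S-LOC']))
# [['PER', 0, 1], ['LOC', 3, 3]]
```

The sketch keeps at most one open entity at a time, so a single left-to-right pass over the labels is enough.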
- postprocess(model_outputs, **postprocess_params)
Postprocess the outputs of the forward pass.
- Args:
model_outputs (dict): Outputs of the forward process.
- Return:
The generated results.
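To make the shape of the generated results concrete, the following sketch builds result dicts with the same keys as the class example (`entity_group`, `start`, `end`, `score`, `word`) from already-decoded entity chunks. Everything here is an assumption for illustration: the helper name `chunks_to_results` is hypothetical, the sketch assumes one score per input character, and averaging the scores over a span is a guess at how `score` could be computed, not the library's documented behavior.

```python
def chunks_to_results(text, chunks, char_scores):
    """Hypothetical helper, not part of MindFormers.

    text:        the original input string
    chunks:      list of [entity_type, start, end] with inclusive end
    char_scores: one confidence value per character of `text` (assumption)
    """
    results = []
    for entity_type, start, end in chunks:
        span_scores = char_scores[start:end + 1]
        results.append({
            "entity_group": entity_type,
            "start": start,
            "end": end,
            # Assumption: the span score is the mean of its per-character scores.
            "score": sum(span_scores) / len(span_scores),
            "word": text[start:end + 1],
        })
    return results


text = "表身刻有代表日内瓦钟表匠"
results = chunks_to_results(text, [["address", 6, 8]], [0.5] * len(text))
print(results[0]["word"])
# 日内瓦
```

The inclusive `end` index matches the class example, where the three-character span '日内瓦' is reported with start 6 and end 8.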
- preprocess(inputs, **preprocess_params)
Preprocess the input text for token classification.
- Args:
inputs (str): The text to be classified.
max_length (int): Maximum length of the tokenizer's output.
padding (False or "max_length"): Whether to pad the output to max_length.
return_tensors ("ms"): The type of the returned tensors.
- Return:
The processed text.
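The effect of `padding="max_length"` can be illustrated without the tokenizer. The sketch below is a plain-Python approximation under stated assumptions: the helper name `pad_to_max_length`, the pad id of 0, the truncation of over-long inputs, and the returned attention mask are all hypothetical and are not taken from the MindFormers tokenizer.

```python
def pad_to_max_length(token_ids, max_length, pad_id=0):
    """Hypothetical padding helper, not the MindFormers tokenizer.

    Pads (or, by assumption, truncates) a list of token ids to exactly
    `max_length` entries and returns it with a matching attention mask.
    """
    kept = token_ids[:max_length]      # assumption: over-long input is truncated
    n_pad = max_length - len(kept)     # number of pad positions to append
    padded = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return padded, attention_mask


ids, mask = pad_to_max_length([101, 7, 8, 102], max_length=6)
print(ids)
# [101, 7, 8, 102, 0, 0]
print(mask)
# [1, 1, 1, 1, 0, 0]
```

With `padding=False`, no pad positions would be appended and the output length would follow the input length instead.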