sec_parser.processing_steps.title_classifier

Classes

TitleClassifier

Classifies elements into TitleElement instances by scanning a list of semantic elements.

Module Contents

class sec_parser.processing_steps.title_classifier.TitleClassifier(types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.AbstractElementwiseProcessingStep

Classifies elements into TitleElement instances by scanning a list of semantic elements and replacing suitable candidates.

The “_unique_styles_by_order” tuple:

  • Represents an ordered set of unique styles found in the document.

  • Preserves the order of insertion, which determines the hierarchical level of each style.

  • Assumes that “highlight” styles appearing earlier in the document correspond to higher-level paragraph or section headings.
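To illustrate how insertion order can translate into heading levels, here is a minimal sketch. It assumes a hypothetical frozen `TextStyle` dataclass (the real class's fields are not documented here) and a hypothetical helper `levels_by_first_appearance`; it is not the library's implementation. `dict.fromkeys` acts as an ordered set, and a style's index in the resulting tuple is taken as its level, with 0 as the highest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextStyle:
    """Hypothetical stand-in for sec_parser's TextStyle (fields assumed)."""

    bold: bool = False
    italic: bool = False
    font_size_pt: float = 10.0


def levels_by_first_appearance(styles: list[TextStyle]) -> list[int]:
    """Hypothetical helper: return a heading level per style, 0 being highest.

    dict.fromkeys keeps the first occurrence of each key in insertion
    order, so the resulting tuple is an ordered set of unique styles.
    """
    unique_styles_by_order: tuple[TextStyle, ...] = tuple(dict.fromkeys(styles))
    # A style's position in the ordered set is treated as its level.
    return [unique_styles_by_order.index(style) for style in styles]


big_bold = TextStyle(bold=True, font_size_pt=14.0)
bold = TextStyle(bold=True)
italic = TextStyle(italic=True)

print(levels_by_first_appearance([big_bold, bold, big_bold, italic, bold]))
# [0, 1, 0, 2, 1]
```

The first style seen always maps to level 0, so under this assumption a document that opens with its largest heading style gets the intuitive hierarchy, while one that opens with a minor highlight would rank that highlight highest.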

_unique_styles_by_order: tuple[sec_parser.semantic_elements.highlighted_text_element.TextStyle, ...] = ()

_add_unique_style(style: sec_parser.semantic_elements.highlighted_text_element.TextStyle) -> None

Add a new unique style if not already present.
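A sketch of the "add if not already present" behavior on an immutable tuple follows. `StyleRegistry` and this `TextStyle` are hypothetical stand-ins chosen for illustration; the real method body may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextStyle:
    """Hypothetical stand-in for sec_parser's TextStyle (fields assumed)."""

    bold: bool = False
    italic: bool = False


class StyleRegistry:
    """Hypothetical class mirroring the documented attribute and method."""

    # Class-level default, as in the documented attribute.
    _unique_styles_by_order: tuple[TextStyle, ...] = ()

    def _add_unique_style(self, style: TextStyle) -> None:
        # Already present: keep its original position (and thus its level).
        if style in self._unique_styles_by_order:
            return
        # Tuples are immutable, so rebind to a new tuple with the style appended.
        self._unique_styles_by_order = (*self._unique_styles_by_order, style)


registry = StyleRegistry()
for style in (TextStyle(bold=True), TextStyle(italic=True), TextStyle(bold=True)):
    registry._add_unique_style(style)

print(registry._unique_styles_by_order)
# (TextStyle(bold=True, italic=False), TextStyle(bold=False, italic=True))
print(StyleRegistry._unique_styles_by_order)
# ()
```

Because the tuple is rebuilt rather than mutated, a class-level default of `()` is safe here: the first assignment creates an instance attribute, and the shared class-level empty tuple is never modified.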

_process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, _: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

Process a single element, converting it to a TitleElement if it is a suitable candidate.
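The sketch below shows one way the per-element conversion could work. It uses hypothetical stand-in classes (`SemanticElement`, `HighlightedTextElement`, `TitleElement`, `TitleClassifierSketch`) and assumes that a "suitable candidate" is any element already marked as highlighted text; the actual candidate criteria and `TitleElement` constructor in sec_parser may differ. The ignored second parameter mirrors the unused `ElementProcessingContext` argument in the documented signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextStyle:
    """Hypothetical stand-in for sec_parser's TextStyle (fields assumed)."""

    bold: bool = False
    font_size_pt: float = 10.0


@dataclass
class SemanticElement:
    """Hypothetical base element holding the element's text."""

    text: str


@dataclass
class HighlightedTextElement(SemanticElement):
    """Hypothetical element already flagged as visually highlighted."""

    style: TextStyle


@dataclass
class TitleElement(SemanticElement):
    """Hypothetical title element; level 0 is the highest heading."""

    level: int


class TitleClassifierSketch:
    """Hypothetical, simplified analogue of TitleClassifier."""

    def __init__(self) -> None:
        self._unique_styles_by_order: tuple[TextStyle, ...] = ()

    def _process_element(self, element: SemanticElement, _: object = None) -> SemanticElement:
        # Assumption: only highlighted text is a title candidate.
        if not isinstance(element, HighlightedTextElement):
            return element
        # Record the style on first sight; its position becomes its level.
        if element.style not in self._unique_styles_by_order:
            self._unique_styles_by_order += (element.style,)
        level = self._unique_styles_by_order.index(element.style)
        return TitleElement(text=element.text, level=level)


large_bold = TextStyle(bold=True, font_size_pt=14.0)
bold = TextStyle(bold=True)

elements = [
    HighlightedTextElement("PART I", large_bold),
    SemanticElement("Body text."),
    HighlightedTextElement("Item 1. Business", bold),
    HighlightedTextElement("PART II", large_bold),
]
classifier = TitleClassifierSketch()
result = [classifier._process_element(e) for e in elements]
for e in result:
    print(e)
```

Mapping `_process_element` over the list reproduces the scan-and-replace behavior described for the class: non-candidates pass through untouched, and each highlighted element becomes a title whose level depends on when its style was first seen. Note that the classifier instance carries state, so processing order matters.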