sec_parser.processing_steps.title_classifier
Classes
TitleClassifier: converts elements into TitleElement instances by scanning a list of semantic elements and replacing suitable candidates.
Module Contents
- class sec_parser.processing_steps.title_classifier.TitleClassifier(types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)
Converts elements into TitleElement instances by scanning a list of semantic elements and replacing suitable candidates.
The “_unique_styles_by_order” tuple:
Represents an ordered set of unique styles found in the document.
Preserves the order of insertion, which determines the hierarchical level of each style.
Assumes that earlier “highlight” styles correspond to higher-level paragraph or section headings.
- _unique_styles_by_order: tuple[sec_parser.semantic_elements.highlighted_text_element.TextStyle, ...] = ()
- _add_unique_style(style: sec_parser.semantic_elements.highlighted_text_element.TextStyle) -> None
Add a new unique style if not already present.
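The ordered-set behaviour described for “_unique_styles_by_order” can be sketched in a few lines. This is a minimal illustration and not the sec_parser implementation: `Style` is a hypothetical stand-in for TextStyle, and `add_unique_style` and `level_of` are hypothetical helper names. The sketch assumes that a style's position of first appearance is its heading level, with 0 as the highest level.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    """Hypothetical stand-in for TextStyle; it only needs to support equality."""

    bold: bool = False
    italic: bool = False


def add_unique_style(styles: tuple[Style, ...], style: Style) -> tuple[Style, ...]:
    """Return `styles` with `style` appended, unless it is already present."""
    if style in styles:
        return styles
    return (*styles, style)


def level_of(styles: tuple[Style, ...], style: Style) -> int:
    """Assumed convention: index of first appearance is the level (0 = highest)."""
    return styles.index(style)


# Styles in the order they are encountered; the repeated bold style is ignored.
styles: tuple[Style, ...] = ()
for seen in (Style(bold=True), Style(italic=True), Style(bold=True)):
    styles = add_unique_style(styles, seen)
```

A tuple keeps insertion order and is immutable, so each addition produces a new value instead of mutating shared state.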
- _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, _: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
Process each element and convert to TitleElement if necessary.
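The element-wise conversion can be pictured roughly as follows. This is a self-contained sketch under stated assumptions and does not use the sec_parser API: `Element`, `HighlightedText`, `Title`, and `SketchTitleClassifier` are hypothetical stand-ins, styles are plain strings, and it assumes that only highlighted elements are candidates for conversion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a generic semantic element."""

    text: str


@dataclass(frozen=True)
class HighlightedText(Element):
    """Hypothetical candidate element carrying a style (a plain string here)."""

    style: str = ""


@dataclass(frozen=True)
class Title(Element):
    """Hypothetical stand-in for TitleElement; level 0 is the highest."""

    level: int = 0


class SketchTitleClassifier:
    """Illustrative only; mirrors the documented behaviour, not the real class."""

    def __init__(self) -> None:
        self._unique_styles_by_order: tuple[str, ...] = ()

    def process(self, element: Element) -> Element:
        # Assumption: non-highlighted elements pass through unchanged.
        if not isinstance(element, HighlightedText):
            return element
        # Record the style on first sight so earlier styles get lower indices.
        if element.style not in self._unique_styles_by_order:
            self._unique_styles_by_order += (element.style,)
        level = self._unique_styles_by_order.index(element.style)
        return Title(text=element.text, level=level)


classifier = SketchTitleClassifier()
elements = [
    HighlightedText("PART I", style="bold"),
    Element("Body text."),
    HighlightedText("Item 1.", style="italic"),
    HighlightedText("PART II", style="bold"),
]
result = [classifier.process(e) for e in elements]
```

In this sketch, both bold headings end up at level 0 and the italic heading at level 1, because the bold style was seen first.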