sec_parser.processing_steps.page_header_classifier
==================================================

.. py:module:: sec_parser.processing_steps.page_header_classifier


Classes
-------

.. autoapisummary::

   sec_parser.processing_steps.page_header_classifier.PageHeaderCandidate
   sec_parser.processing_steps.page_header_classifier.PageHeaderClassifier


Module Contents
---------------

.. py:class:: PageHeaderCandidate

   .. py:attribute:: TEXT_LENGTH_THRESHOLD
      :value: 100


   .. py:attribute:: OCCURRENCE_THRESHOLD
      :value: 5


   .. py:attribute:: MOST_COMMON_CANDIDATE_LIMIT
      :value: None


   .. py:attribute:: text
      :type: str


   .. py:attribute:: style
      :type: sec_parser.semantic_elements.highlighted_text_element.TextStyle | None


.. py:class:: PageHeaderClassifier(types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

   Bases: :py:obj:`sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.AbstractElementwiseProcessingStep`


   .. py:attribute:: _NUM_ITERATIONS
      :value: 2


   .. py:attribute:: _element_to_page_header_candidate
      :type: dict[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, PageHeaderCandidate]


   .. py:attribute:: _candidate_count
      :type: collections.Counter[PageHeaderCandidate]


   .. py:attribute:: _most_common_candidates
      :type: dict[PageHeaderCandidate, int] | None
      :value: None


   .. py:method:: _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement


   .. py:method:: _find_page_header_candidates(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> None


   .. py:method:: _classify_elements(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement


   .. py:method:: _get_most_common_candidates() -> dict[PageHeaderCandidate, int]
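
The member names above suggest a two-pass, count-then-classify design: a first pass that records short text candidates in a ``Counter``, and a second pass that flags elements whose candidate repeats often enough. The sketch below illustrates that general technique only. It does not import ``sec_parser``, and it is not the library's implementation. The ``Candidate`` class and the functions ``find_repeated_headers`` and ``classify`` are hypothetical names, and the way the three constants are interpreted (strict length cutoff, inclusive occurrence cutoff, ``None`` meaning "no limit") is an assumption inferred from their names.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Values copied from the documented constants; how they are applied
# below is an assumption, not confirmed library behavior.
TEXT_LENGTH_THRESHOLD = 100
OCCURRENCE_THRESHOLD = 5
MOST_COMMON_CANDIDATE_LIMIT = None


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for PageHeaderCandidate.

    Frozen so that instances are hashable and can be Counter keys.
    """

    text: str
    style: str | None = None


def find_repeated_headers(texts: list[str]) -> dict[Candidate, int]:
    """Pass 1: count short texts and keep the ones that repeat often."""
    counts = Counter(
        Candidate(text) for text in texts if len(text) < TEXT_LENGTH_THRESHOLD
    )
    # most_common(None) returns every entry, ordered by descending count.
    common = counts.most_common(MOST_COMMON_CANDIDATE_LIMIT)
    return {cand: n for cand, n in common if n >= OCCURRENCE_THRESHOLD}


def classify(texts: list[str]) -> list[tuple[str, bool]]:
    """Pass 2: flag each text whose candidate was found to be repeated."""
    headers = find_repeated_headers(texts)
    return [(text, Candidate(text) in headers) for text in texts]
```

Under these assumptions, a short line repeated six times across a document would be flagged, while a one-off sentence or a repeated 150-character paragraph would not. Counting must finish before any element can be classified, which is one plausible reason for a step to iterate over the elements twice.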