sec_parser.processing_steps.page_number_classifier
==================================================

.. py:module:: sec_parser.processing_steps.page_number_classifier


Classes
-------

.. autoapisummary::

   sec_parser.processing_steps.page_number_classifier.PageNumberCandidate
   sec_parser.processing_steps.page_number_classifier.MostCommonCandidateSearchStatus
   sec_parser.processing_steps.page_number_classifier.PageNumberClassifier


Module Contents
---------------

.. py:class:: PageNumberCandidate

   .. py:attribute:: TEXT_LENGTH_THRESHOLD
      :value: 100


   .. py:attribute:: OCCURRENCE_THRESHOLD
      :value: 5


   .. py:attribute:: text
      :type: str


.. py:class:: MostCommonCandidateSearchStatus(*args, **kwds)

   Bases: :py:obj:`enum.Enum`

   Create a collection of name/value pairs.

   Example enumeration:

   >>> class Color(Enum):
   ...     RED = 1
   ...     BLUE = 2
   ...     GREEN = 3

   Access them by:

   - attribute access:

     >>> Color.RED
     <Color.RED: 1>

   - value lookup:

     >>> Color(1)
     <Color.RED: 1>

   - name lookup:

     >>> Color['RED']
     <Color.RED: 1>

   Enumerations can be iterated over, and know how many members they have:

   >>> len(Color)
   3

   >>> list(Color)
   [<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]

   Methods can be added to enumerations, and members can have their own
   attributes -- see the documentation for details.


   .. py:attribute:: NOT_SEARCHED


   .. py:attribute:: NOT_EXIST


   .. py:attribute:: FOUND


.. py:class:: PageNumberClassifier(types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

   Bases: :py:obj:`sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.AbstractElementwiseProcessingStep`


   .. py:attribute:: _NUM_ITERATIONS
      :value: 2


   .. py:attribute:: _element_to_page_number_candidate
      :type: dict[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, PageNumberCandidate]


   .. py:attribute:: _candidate_count
      :type: collections.Counter[PageNumberCandidate]


   .. py:attribute:: _most_common_candidate
      :type: PageNumberCandidate | None
      :value: None


   .. py:attribute:: _most_common_candidate_count
      :type: int
      :value: 0


   .. py:attribute:: _search_status
      :type: MostCommonCandidateSearchStatus


   .. py:method:: _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement


   .. py:method:: _find_page_number_candidates(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> None


   .. py:method:: _classify_elements(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement


   .. py:method:: _get_most_common_candidate() -> PageNumberCandidate | None
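
The names documented here (``_NUM_ITERATIONS`` of 2, a ``Counter`` of candidates, a most common candidate, and a three-state search status) suggest a two-pass flow: collect candidates first, then classify against the most frequent one. The standalone sketch below illustrates that idea only. It does not import ``sec_parser`` and is not the library's implementation. ``Candidate``, ``SearchStatus``, ``to_candidate``, and ``classify_page_numbers`` are hypothetical names, and both the digit-masking rule and the way the two thresholds are applied are assumptions; only the threshold values (100 and 5) come from this page.

```python
import re
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

TEXT_LENGTH_THRESHOLD = 100  # value from this page; how it is applied is assumed
OCCURRENCE_THRESHOLD = 5     # value from this page; how it is applied is assumed


class SearchStatus(Enum):
    """Hypothetical mirror of MostCommonCandidateSearchStatus."""
    NOT_SEARCHED = auto()
    NOT_EXIST = auto()
    FOUND = auto()


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for PageNumberCandidate (hashable, so countable)."""
    text: str


def to_candidate(text: str) -> Optional[Candidate]:
    """Assumed rule: short text with a digit; digit runs are masked as '#'."""
    stripped = text.strip()
    if not stripped or len(stripped) > TEXT_LENGTH_THRESHOLD:
        return None
    if not re.search(r"\d", stripped):
        return None
    return Candidate(re.sub(r"\d+", "#", stripped))


def classify_page_numbers(texts: list[str]) -> tuple[SearchStatus, list[bool]]:
    """Return the search status and one page-number flag per input text."""
    # Pass 1: turn each text into a candidate (or None) and count them.
    candidates = [to_candidate(t) for t in texts]
    counts = Counter(c for c in candidates if c is not None)

    # Between passes: keep the most common candidate if it occurs often enough.
    status = SearchStatus.NOT_EXIST
    winner: Optional[Candidate] = None
    if counts:
        top, occurrences = counts.most_common(1)[0]
        if occurrences >= OCCURRENCE_THRESHOLD:
            winner, status = top, SearchStatus.FOUND

    # Pass 2: flag every text whose candidate matches the winner.
    flags = [winner is not None and c == winner for c in candidates]
    return status, flags
```

Masking digits lets "1", "2", and "3" count as the same candidate, which is what makes a frequency threshold meaningful in this sketch; the real class may normalize text differently.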