sec_parser.processing_steps.page_number_classifier
Classes
- AbstractElementwiseProcessingStep: Iterates over all semantic elements, with or without applying transformations.
- ElementProcessingContext: Provides context information for elementwise processing steps.
- PageNumberElement: Represents a page number within a document.
- MostCommonCandidateSearchStatus: Create a collection of name/value pairs.
Module Contents
- class sec_parser.processing_steps.page_number_classifier.AbstractElementwiseProcessingStep(*, types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)
Bases: sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep
The AbstractElementwiseProcessingStep class iterates over all semantic elements, with or without applying transformations.
- _NUM_ITERATIONS = 1
- abstract _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The _process_element method transforms a single semantic element into another.
It can also be used simply to iterate over all elements without applying any transformations, in which case it returns each element unchanged.
- _process_recursively(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], *, _context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
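The elementwise pattern described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the sec_parser implementation: `Element`, `Title`, `Context`, `ElementwiseStep`, `UppercaseTitles`, and the `process` method are hypothetical stand-ins, and the type-filtering rule is an assumption about how `types_to_process` and `types_to_exclude` might behave.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for AbstractSemanticElement."""
    text: str


class Title(Element):
    """Hypothetical element subtype used only in this sketch."""


@dataclass
class Context:
    """Hypothetical stand-in for ElementProcessingContext."""
    iteration: int


class ElementwiseStep(ABC):
    """Sketch of an elementwise step; not the library's code."""

    _NUM_ITERATIONS = 1

    def __init__(self, *, types_to_process=None, types_to_exclude=None):
        self._types_to_process = types_to_process or set()
        self._types_to_exclude = types_to_exclude or set()

    def process(self, elements):
        # Visit every element once per iteration, passing the iteration index along.
        for iteration in range(self._NUM_ITERATIONS):
            context = Context(iteration=iteration)
            elements = [self._maybe_process(e, context) for e in elements]
        return elements

    def _maybe_process(self, element, context):
        # Assumed filtering rule: excluded types pass through untouched, and
        # when an allow-list is given, anything outside it passes through too.
        if any(isinstance(element, t) for t in self._types_to_exclude):
            return element
        if self._types_to_process and not any(
            isinstance(element, t) for t in self._types_to_process
        ):
            return element
        return self._process_element(element, context)

    @abstractmethod
    def _process_element(self, element, context):
        """Return the transformed element, or the same element to leave it as-is."""


class UppercaseTitles(ElementwiseStep):
    def _process_element(self, element, context):
        return type(element)(element.text.upper())


step = UppercaseTitles(types_to_process={Title})
result = step.process([Title("summary"), Element("body text")])
```

Here only the `Title` element is transformed; the plain `Element` is returned unchanged because it is not in the allow-list.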
- class sec_parser.processing_steps.page_number_classifier.ElementProcessingContext
The ElementProcessingContext class is designed to provide context information for elementwise processing steps.
- iteration: int
- class sec_parser.processing_steps.page_number_classifier.PageNumberElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: IrrelevantElement
The PageNumberElement class represents a page number within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page numbers in the document.
- class sec_parser.processing_steps.page_number_classifier.PageNumberCandidate
- TEXT_LENGTH_THRESHOLD = 100
- OCCURRENCE_THRESHOLD = 5
- text: str
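The reference lists the two thresholds but does not say how they are applied. One plausible reading, shown purely as an assumption, is that a text qualifies as a candidate only if it is shorter than `TEXT_LENGTH_THRESHOLD` characters and recurs at least `OCCURRENCE_THRESHOLD` times. The `Candidate` class and both helper methods below are hypothetical, and whether the real comparisons are strict or inclusive is not stated in the source.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class Candidate:
    """Hypothetical mirror of PageNumberCandidate; the methods are assumptions."""

    # Constants with the values documented for PageNumberCandidate.
    TEXT_LENGTH_THRESHOLD: ClassVar[int] = 100
    OCCURRENCE_THRESHOLD: ClassVar[int] = 5

    text: str

    def is_short_enough(self) -> bool:
        # Assumed rule: long passages are body text, not page numbers.
        return len(self.text) < self.TEXT_LENGTH_THRESHOLD

    def is_frequent_enough(self, occurrences: int) -> bool:
        # Assumed rule: a page-number pattern repeats across many pages.
        return occurrences >= self.OCCURRENCE_THRESHOLD


short = Candidate("Page 12")
long_text = Candidate("x" * 100)
```

Declaring the thresholds as `ClassVar` keeps them out of the generated `__init__`, so `text` stays the only instance field, matching the documented attribute list.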
- class sec_parser.processing_steps.page_number_classifier.MostCommonCandidateSearchStatus(*args, **kwds)
Bases: enum.Enum
Create a collection of name/value pairs.
Example enumeration:
>>> class Color(Enum):
...     RED = 1
...     BLUE = 2
...     GREEN = 3
Access them by:
attribute access:
>>> Color.RED
<Color.RED: 1>
value lookup:
>>> Color(1)
<Color.RED: 1>
name lookup:
>>> Color['RED']
<Color.RED: 1>
Enumerations can be iterated over, and know how many members they have:
>>> len(Color)
3
>>> list(Color)
[<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]
Methods can be added to enumerations, and members can have their own attributes – see the documentation for details.
- NOT_SEARCHED
- NOT_EXIST
- FOUND
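A three-state status like this is typically useful when a search result is cached: `None` alone cannot distinguish "not searched yet" from "searched and found nothing". The sketch below illustrates that idea under assumptions; `SearchStatus`, `CandidateTracker`, and its attributes are hypothetical names, and the source does not confirm that the library uses the enum this way.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum, auto


class SearchStatus(Enum):
    """Hypothetical mirror of MostCommonCandidateSearchStatus."""
    NOT_SEARCHED = auto()
    NOT_EXIST = auto()
    FOUND = auto()


class CandidateTracker:
    """Caches the most common candidate so the search runs at most once."""

    def __init__(self, min_occurrences: int = 5) -> None:
        self.counts = Counter()
        self.min_occurrences = min_occurrences
        self.status = SearchStatus.NOT_SEARCHED
        self.search_runs = 0  # bookkeeping for the sketch only
        self._most_common = None

    def most_common(self) -> str | None:
        # Only search while the status says no search has happened yet.
        if self.status is SearchStatus.NOT_SEARCHED:
            self.search_runs += 1
            top = self.counts.most_common(1)
            if top and top[0][1] >= self.min_occurrences:
                self._most_common = top[0][0]
                self.status = SearchStatus.FOUND
            else:
                self.status = SearchStatus.NOT_EXIST
        return self._most_common


tracker = CandidateTracker()
tracker.counts.update(["Page #"] * 6 + ["Table #"] * 2)
first = tracker.most_common()
second = tracker.most_common()  # served from the cache

empty = CandidateTracker()
empty.most_common()
empty.most_common()  # still only one search, even though nothing was found
```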
- class sec_parser.processing_steps.page_number_classifier.PageNumberClassifier(types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)
- _NUM_ITERATIONS = 2
- _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
- _find_page_number_candidates(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> None
- _classify_elements(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
- _get_most_common_candidate() -> PageNumberCandidate | None
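With `_NUM_ITERATIONS = 2` and the method names above, the classifier appears to work in two passes: one that collects candidates and one that classifies elements against the most common candidate. The following standalone sketch shows that two-pass idea on plain strings. It is an assumption-laden illustration, not the library's algorithm: `candidate_key`, `classify_page_numbers`, and the rule of replacing digit runs with `#` are all hypothetical, and the threshold values are borrowed from `PageNumberCandidate`.

```python
from __future__ import annotations

import re
from collections import Counter

TEXT_LENGTH_THRESHOLD = 100  # value documented on PageNumberCandidate
OCCURRENCE_THRESHOLD = 5     # value documented on PageNumberCandidate


def candidate_key(text: str) -> str | None:
    """Return a normalized pattern for short texts containing digits, else None."""
    text = text.strip()
    if len(text) >= TEXT_LENGTH_THRESHOLD or not re.search(r"\d", text):
        return None
    # Assumed normalization: "Page 3" and "Page 4" collapse to "Page #".
    return re.sub(r"\d+", "#", text)


def classify_page_numbers(texts: list[str]) -> list[bool]:
    # Pass 1 (iteration 0): count how often each candidate pattern occurs.
    counts = Counter()
    for text in texts:
        key = candidate_key(text)
        if key is not None:
            counts[key] += 1

    # Pick the most common pattern, if it occurs often enough.
    most_common = None
    if counts:
        key, occurrences = counts.most_common(1)[0]
        if occurrences >= OCCURRENCE_THRESHOLD:
            most_common = key

    # Pass 2 (iteration 1): flag elements matching the chosen pattern.
    return [
        most_common is not None and candidate_key(t) == most_common
        for t in texts
    ]


texts = [
    "Page 1", "Revenue grew 10% in 2023", "Page 2",
    "Page 3", "Introduction", "Page 4", "Page 5",
]
labels = classify_page_numbers(texts)
too_few = classify_page_numbers(["Page 1", "Page 2", "Page 3"])
```

Two passes are needed because an element can only be judged a page number once the whole document has been scanned and the dominant pattern is known.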