sec_parser.processing_steps.page_number_classifier

Classes

AbstractElementwiseProcessingStep

AbstractElementwiseProcessingStep iterates over all semantic elements, with or without applying transformations.

ElementProcessingContext

The ElementProcessingContext class provides context information for elementwise processing steps.

PageNumberElement

The PageNumberElement class represents a page number within a document.

PageNumberCandidate

MostCommonCandidateSearchStatus

Enumeration with the members NOT_SEARCHED, NOT_EXIST, and FOUND.

PageNumberClassifier

Module Contents

class sec_parser.processing_steps.page_number_classifier.AbstractElementwiseProcessingStep(*, types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep

AbstractElementwiseProcessingStep iterates over all semantic elements, with or without applying transformations.

_NUM_ITERATIONS = 1
abstract _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

Transforms a single semantic element into another. A subclass can also use it simply to visit every element and return each one unchanged.

_process_recursively(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], *, _context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]

_process(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
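
The elementwise pattern described above can be sketched without the library. This is only an illustration under stated assumptions: Element, Context, ElementwiseStep, and UppercaseStep are hypothetical stand-ins rather than sec_parser classes, and the choice to process child elements before their parent is a guess at what _process_recursively does.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for AbstractSemanticElement."""

    text: str
    inner_elements: list[Element] = field(default_factory=list)


@dataclass
class Context:
    """Hypothetical stand-in for ElementProcessingContext."""

    iteration: int


class ElementwiseStep(ABC):
    """Simplified stand-in for AbstractElementwiseProcessingStep."""

    _NUM_ITERATIONS = 1

    @abstractmethod
    def _process_element(self, element: Element, context: Context) -> Element:
        """Return the transformed element, or the same one to leave it as is."""

    def _process_recursively(
        self, elements: list[Element], *, _context: Context
    ) -> list[Element]:
        result = []
        for element in elements:
            # Assumption: nested elements are handled before their parent.
            element.inner_elements = self._process_recursively(
                element.inner_elements, _context=_context
            )
            result.append(self._process_element(element, _context))
        return result

    def process(self, elements: list[Element]) -> list[Element]:
        # Every iteration walks the whole tree once; the context tells
        # _process_element which pass it is in.
        for iteration in range(self._NUM_ITERATIONS):
            context = Context(iteration=iteration)
            elements = self._process_recursively(elements, _context=context)
        return elements


class UppercaseStep(ElementwiseStep):
    """Toy transformation: replace every element with an upper-cased copy."""

    def _process_element(self, element: Element, context: Context) -> Element:
        return Element(element.text.upper(), element.inner_elements)


tree = [Element("a", [Element("b")]), Element("c")]
processed = UppercaseStep().process(tree)
```

A subclass that only wants to inspect elements would return element unchanged from _process_element.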
class sec_parser.processing_steps.page_number_classifier.ElementProcessingContext

The ElementProcessingContext class is designed to provide context information for elementwise processing steps.

iteration: int
class sec_parser.processing_steps.page_number_classifier.PageNumberElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: IrrelevantElement

The PageNumberElement class represents a page number within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page numbers in the document.
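
Because PageNumberElement derives from IrrelevantElement, downstream code can presumably drop page numbers together with every other irrelevant element using a single isinstance check. The sketch below uses hypothetical stand-in classes with the same names rather than the real sec_parser types, and TextElement is invented for the illustration.

```python
class IrrelevantElement:
    """Hypothetical stand-in for the library's IrrelevantElement."""

    def __init__(self, text: str) -> None:
        self.text = text


class PageNumberElement(IrrelevantElement):
    """Stand-in: a page number is one kind of irrelevant element."""


class TextElement:
    """Hypothetical element type that carries real content."""

    def __init__(self, text: str) -> None:
        self.text = text


elements = [
    TextElement("Revenue increased."),
    PageNumberElement("12"),
    TextElement("Costs decreased."),
]

# One check on the base class also filters out every subclass.
relevant = [e for e in elements if not isinstance(e, IrrelevantElement)]
relevant_texts = [e.text for e in relevant]
```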

class sec_parser.processing_steps.page_number_classifier.PageNumberCandidate
TEXT_LENGTH_THRESHOLD = 100
OCCURRENCE_THRESHOLD = 5
text: str
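
The reference does not say how the two thresholds are applied. One plausible reading, sketched here purely as an assumption, is that text longer than TEXT_LENGTH_THRESHOLD is never considered and that a candidate must occur at least OCCURRENCE_THRESHOLD times. The helpers is_short_enough and recurring_candidates are hypothetical, as is the choice of inclusive comparisons.

```python
from collections import Counter
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class PageNumberCandidate:
    """Stand-in mirroring the documented attributes."""

    TEXT_LENGTH_THRESHOLD: ClassVar[int] = 100
    OCCURRENCE_THRESHOLD: ClassVar[int] = 5
    text: str


def is_short_enough(text: str) -> bool:
    # Assumption: the length threshold is an inclusive upper bound.
    return len(text) <= PageNumberCandidate.TEXT_LENGTH_THRESHOLD


def recurring_candidates(texts: list[str]) -> list[PageNumberCandidate]:
    # Frozen dataclasses are hashable, so they can be counted directly.
    counts = Counter(PageNumberCandidate(t) for t in texts if is_short_enough(t))
    return [
        candidate
        for candidate, occurrences in counts.items()
        if occurrences >= PageNumberCandidate.OCCURRENCE_THRESHOLD
    ]


texts = ["Page"] * 5 + ["x"] * 4 + ["y" * 101] * 6
found = recurring_candidates(texts)
```

Here "x" is rejected for occurring only four times and the 101-character string is rejected for its length, so only "Page" survives.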
class sec_parser.processing_steps.page_number_classifier.MostCommonCandidateSearchStatus(*args, **kwds)

Bases: enum.Enum

Enumeration with the three members listed below. Judging by their names, NOT_SEARCHED means the search for the most common page number candidate has not run yet, NOT_EXIST means it ran and found no candidate, and FOUND means a candidate was identified.

NOT_SEARCHED
NOT_EXIST
FOUND
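
A three-state status like this is typically useful for caching a search whose result may legitimately be "nothing": a bare None cannot distinguish "not searched yet" from "searched and found nothing". The sketch below illustrates that idea only; SearchStatus and CandidateSearch are hypothetical names, and the library's actual use of the enum may differ.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum, auto


class SearchStatus(Enum):
    """Stand-in with the same member names as the documented enum."""

    NOT_SEARCHED = auto()
    NOT_EXIST = auto()
    FOUND = auto()


class CandidateSearch:
    """Runs the search at most once and remembers the outcome."""

    def __init__(self, texts: list[str], min_occurrences: int = 5) -> None:
        self._texts = texts
        self._min_occurrences = min_occurrences
        self.status = SearchStatus.NOT_SEARCHED
        self._result: str | None = None

    def most_common(self) -> str | None:
        if self.status is SearchStatus.NOT_SEARCHED:
            top = Counter(self._texts).most_common(1)
            if top and top[0][1] >= self._min_occurrences:
                self._result = top[0][0]
                self.status = SearchStatus.FOUND
            else:
                self.status = SearchStatus.NOT_EXIST
        # On later calls the cached result is returned without recounting.
        return self._result


search = CandidateSearch(["Page"] * 6 + ["Total"] * 2)
first = search.most_common()
second = search.most_common()

empty = CandidateSearch(["a", "b"])
missing = empty.most_common()
```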
class sec_parser.processing_steps.page_number_classifier.PageNumberClassifier(types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.AbstractElementwiseProcessingStep

_NUM_ITERATIONS = 2
_process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

_find_page_number_candidates(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) → None

_classify_elements(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

_get_most_common_candidate() → PageNumberCandidate | None
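
With _NUM_ITERATIONS = 2 and the method names above, the classifier appears to work in two passes: the first collects candidates and the second classifies elements against the most common one. The following is a minimal sketch of such a two-pass scheme, not the library's implementation. TextElement, PageNumber, and TwoPassPageNumberSketch are hypothetical, and replacing digit runs with "#" so that "Page 1" and "Page 2" count as the same pattern is an illustrative choice of mine.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextElement:
    """Hypothetical stand-in for an unclassified semantic element."""

    text: str


@dataclass
class PageNumber:
    """Hypothetical stand-in for PageNumberElement."""

    text: str


class TwoPassPageNumberSketch:
    NUM_ITERATIONS = 2
    TEXT_LENGTH_THRESHOLD = 100
    OCCURRENCE_THRESHOLD = 5

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    @staticmethod
    def _pattern(text: str) -> str:
        # Illustrative normalisation: every run of digits becomes "#".
        return re.sub(r"\d+", "#", text.strip())

    def _find_candidates(self, element: TextElement) -> None:
        text = element.text
        if len(text) <= self.TEXT_LENGTH_THRESHOLD and re.search(r"\d", text):
            self._counts[self._pattern(text)] += 1

    def _most_common(self) -> str | None:
        if not self._counts:
            return None
        pattern, occurrences = self._counts.most_common(1)[0]
        return pattern if occurrences >= self.OCCURRENCE_THRESHOLD else None

    def _classify(self, element: TextElement) -> TextElement | PageNumber:
        best = self._most_common()
        if best is not None and self._pattern(element.text) == best:
            return PageNumber(element.text)
        return element

    def process(self, elements: list[TextElement]) -> list:
        for iteration in range(self.NUM_ITERATIONS):
            if iteration == 0:
                # Pass 1: only gather statistics, change nothing.
                for element in elements:
                    self._find_candidates(element)
            else:
                # Pass 2: relabel elements matching the dominant pattern.
                elements = [self._classify(element) for element in elements]
        return elements


document = [TextElement(f"Page {n}") for n in range(1, 7)]
document.insert(3, TextElement("Revenue grew 10%"))
classified = TwoPassPageNumberSketch().process(document)
page_numbers = [e.text for e in classified if isinstance(e, PageNumber)]
```

Two passes are needed because an element cannot be judged in isolation: whether "Page 3" is a page number depends on how often the same pattern recurs across the whole document.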