sec_parser.processing_steps.supplementary_text_classifier
Classes
AbstractElementwiseProcessingStep | Iterates over all semantic elements, with or without applying transformations.
ElementProcessingContext | Provides context information for elementwise processing steps.
HighlightedTextElement | Among other uses, an intermediate step in identifying title elements.
SupplementaryText | Captures various types of supplementary text within a document.
TextElement | Represents a standard text paragraph within a document.
SupplementaryTextClassifier | Converts elements into SupplementaryText instances.
Functions
clean_whitespace | Replace newlines and any following spaces with a single space.
Module Contents
- class sec_parser.processing_steps.supplementary_text_classifier.AbstractElementwiseProcessingStep(*, types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)
Bases: sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep
The AbstractElementwiseProcessingStep class is used to iterate over all semantic elements, with or without applying transformations.
- _NUM_ITERATIONS = 1
- abstract _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
Transform a single semantic element into another.
It can also be used to simply iterate over all elements without applying any transformations.
- _process_recursively(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], *, _context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
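The elementwise pattern described above can be illustrated with a small, self-contained sketch. It does not import sec_parser: `Element`, `Title`, `Context`, `ElementwiseStep`, and `UppercaseStep` are hypothetical stand-ins, and the filtering semantics shown for `types_to_process` and `types_to_exclude` are an assumption based on the parameter names, not a copy of the library's logic. Recursion into nested elements is omitted.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for AbstractSemanticElement."""
    text: str


@dataclass
class Title(Element):
    """Hypothetical subclass, used only to demonstrate type filtering."""


@dataclass
class Context:
    """Hypothetical stand-in for ElementProcessingContext."""
    iteration: int


class ElementwiseStep(ABC):
    """Toy elementwise step: visits every element once per iteration."""

    _NUM_ITERATIONS = 1

    def __init__(self, *, types_to_process=None, types_to_exclude=None):
        self._types_to_process = types_to_process or set()
        self._types_to_exclude = types_to_exclude or set()

    def process(self, elements: list[Element]) -> list[Element]:
        for iteration in range(self._NUM_ITERATIONS):
            context = Context(iteration=iteration)
            elements = [self._visit(e, context) for e in elements]
        return elements

    def _visit(self, element: Element, context: Context) -> Element:
        # Assumed semantics: an empty types_to_process set means "process all".
        if self._types_to_process and not isinstance(
            element, tuple(self._types_to_process)
        ):
            return element
        # Excluded types pass through untouched.
        if isinstance(element, tuple(self._types_to_exclude)):
            return element
        return self._process_element(element, context)

    @abstractmethod
    def _process_element(self, element: Element, context: Context) -> Element:
        """Return a transformed element, or the same element unchanged."""


class UppercaseStep(ElementwiseStep):
    """Example subclass: upper-cases the text of each processed element."""

    def _process_element(self, element: Element, context: Context) -> Element:
        return type(element)(text=element.text.upper())


step = UppercaseStep(types_to_exclude={Title})
result = step.process([Element("net sales"), Title("Item 1")])
print(result)
```

Here the `Title` instance is skipped because its type is excluded, while the plain `Element` is transformed. A subclass that only wants to inspect elements could return each element unchanged from `_process_element`.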
- class sec_parser.processing_steps.supplementary_text_classifier.ElementProcessingContext
The ElementProcessingContext class is designed to provide context information for elementwise processing steps.
- iteration: int
- class sec_parser.processing_steps.supplementary_text_classifier.HighlightedTextElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, style: TextStyle | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The HighlightedTextElement class, among other uses, is an intermediate step in identifying title elements.
For example: elements with specific styles (such as bold or italic text) are first classified as HighlightedTextElements. These are later examined to determine whether they should be considered TitleElements.
- classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, style: TextStyle | None = None) -> HighlightedTextElement
Convert the semantic element into another semantic element type.
- to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]
- class sec_parser.processing_steps.supplementary_text_classifier.SupplementaryText(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The SupplementaryText class captures various types of supplementary text within a document, such as unit qualifiers, additional notes, and disclaimers.
For example:
- “(In millions, except number of shares which are reflected in thousands and per share amounts)”
- “See accompanying Notes to Condensed Consolidated Financial Statements.”
- “Disclaimer: This is not financial advice.”
- class sec_parser.processing_steps.supplementary_text_classifier.TextElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The TextElement class represents a standard text paragraph within a document.
- sec_parser.processing_steps.supplementary_text_classifier.clean_whitespace(input_str: str) -> str
Replace newlines and any following spaces with a single space.
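A minimal sketch of the behavior described above, written against the standard library only. The function name `clean_whitespace_sketch` and the exact regular expression are assumptions for illustration; the library's own implementation may differ (for example, in how it treats tabs or consecutive newlines).

```python
import re


def clean_whitespace_sketch(input_str: str) -> str:
    """Collapse each newline, plus any whitespace right after it, into one space.

    Assumption: "\\s*" is used for the trailing part, so tabs and further
    newlines following the first newline are swallowed as well.
    """
    return re.sub(r"\n\s*", " ", input_str)


wrapped = "(In millions, except\n    per share amounts)"
print(clean_whitespace_sketch(wrapped))
# prints: (In millions, except per share amounts)
```

Text without newlines is returned unchanged, so runs of ordinary spaces elsewhere in a line are not touched by this sketch.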
- class sec_parser.processing_steps.supplementary_text_classifier.SupplementaryTextClassifier(*, types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)
The SupplementaryTextClassifier class converts elements into SupplementaryText instances.
This step scans a list of semantic elements and modifies it, primarily by replacing suitable candidates with SupplementaryText instances.
- _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, _: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
Transform a single semantic element into a SupplementaryText instance if applicable.
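To make the idea of "replacing suitable candidates" concrete, here is a toy sketch that does not use sec_parser. The classes `ToyTextElement` and `ToySupplementaryText`, the helper `looks_supplementary`, and all of its heuristics are hypothetical; they are modeled loosely on the SupplementaryText examples in this module's documentation and are not the rules the real classifier applies.

```python
from dataclasses import dataclass


@dataclass
class ToyTextElement:
    """Hypothetical stand-in for a plain text element."""
    text: str


@dataclass
class ToySupplementaryText:
    """Hypothetical stand-in for a supplementary text element."""
    text: str


def looks_supplementary(text: str) -> bool:
    """Illustrative heuristics only (assumed, not taken from the library)."""
    stripped = text.strip()
    # Fully parenthesized text, e.g. unit qualifiers such as "(In millions)".
    if stripped.startswith("(") and stripped.endswith(")"):
        return True
    # Common lead-ins for notes and disclaimers.
    return stripped.lower().startswith(("see accompanying", "disclaimer:"))


def process_element(element):
    """Return a ToySupplementaryText for matching candidates, else the input."""
    if isinstance(element, ToyTextElement) and looks_supplementary(element.text):
        return ToySupplementaryText(text=element.text)
    return element


elements = [
    ToyTextElement("(In millions, except per share amounts)"),
    ToyTextElement("Net sales increased during the quarter."),
    ToyTextElement("Disclaimer: This is not financial advice."),
]
classified = [process_element(e) for e in elements]
print([type(e).__name__ for e in classified])
# prints: ['ToySupplementaryText', 'ToyTextElement', 'ToySupplementaryText']
```

Elements that do not match any heuristic are returned as-is, which mirrors the elementwise contract of returning either a transformed element or the original one.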