sec_parser.processing_steps.abstract_classes.abstract_processing_step

Attributes

ElementTransformer

Exceptions

SecParserRuntimeError

Base exception class for sec_parser.

AlreadyProcessedError

Raised when a processing step instance is used for more than one transformation.

Classes

AbstractSemanticElement

A meaningful unit, such as a paragraph or a table, within an SEC EDGAR HTML document.

AbstractProcessingStep

AbstractProcessingStep class for transforming a list of elements.

Module Contents

exception sec_parser.processing_steps.abstract_classes.abstract_processing_step.SecParserRuntimeError

Bases: SecParserError, RuntimeError

Base exception class for sec_parser. All custom exceptions in sec_parser inherit from this class.

class sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: abc.ABC

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.

This class serves as a foundational representation of such semantic elements. It contains an HtmlTag object that stores the raw HTML tag information. Subclasses implement additional behaviors based on the type of the semantic element.
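A minimal sketch of the wrapper pattern described above. It assumes a hypothetical stand-in `HtmlTag` that stores only a tag name and its text; the real `sec_parser.processing_engine.html_tag.HtmlTag` does much more. `SemanticElement`, `TextElement`, `TableElement`, and `word_count()` are illustrative classes defined here. They are not imports from sec_parser.

```python
from abc import ABC
from dataclasses import dataclass


@dataclass
class HtmlTag:
    """Hypothetical stand-in for the raw tag: just a tag name and its text."""

    name: str
    text: str


class SemanticElement(ABC):
    """Base wrapper: holds exactly one HtmlTag and exposes it read-only."""

    def __init__(self, html_tag: HtmlTag) -> None:
        self._html_tag = html_tag

    @property
    def html_tag(self) -> HtmlTag:
        return self._html_tag

    @property
    def text(self) -> str:
        # Passthrough to the wrapped tag, like the documented `text` property.
        return self._html_tag.text


class TextElement(SemanticElement):
    """Prose paragraph; adds a behaviour that only makes sense for text."""

    def word_count(self) -> int:
        return len(self.text.split())


class TableElement(SemanticElement):
    """Table; a fuller implementation would add table-specific behaviour."""
```

The base class only owns the tag. Each subclass adds behaviour for its own element type without touching the raw HTML.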

log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) → None

Must be called at the very end of the __init__ method.
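One way to apply the "very end of `__init__`" rule is sketched below with hypothetical stand-ins: `ProcessingLog`, `Element`, `TitleElement`, and `describe()`. This is an assumption about usage, not sec_parser's code. A subclass that adds its own attributes passes `log_origin=None` up to the base class. It then calls `log_init()` itself once those attributes exist, so the log entry never describes a half-built object.

```python
from __future__ import annotations


class ProcessingLog:
    """Hypothetical stand-in for a processing log: a list of (origin, message)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []

    def add_item(self, origin: str, message: str) -> None:
        self.entries.append((origin, message))


class Element:
    def __init__(
        self,
        text: str,
        *,
        processing_log: ProcessingLog | None = None,
        log_origin: str | None = None,
    ) -> None:
        self.text = text
        self.processing_log = processing_log if processing_log is not None else ProcessingLog()
        # Last statement: the object is fully built before it is logged.
        self.log_init(log_origin)

    def describe(self) -> str:
        return repr(self.text)

    def log_init(self, log_origin: str | None = None) -> None:
        # Nothing is logged when no origin is given.
        if log_origin is not None:
            self.processing_log.add_item(
                log_origin, f"Created {type(self).__name__} {self.describe()}"
            )


class TitleElement(Element):
    def __init__(
        self,
        text: str,
        *,
        level: int = 0,
        processing_log: ProcessingLog | None = None,
        log_origin: str | None = None,
    ) -> None:
        # log_origin=None stops the base class from logging now; describe()
        # would fail here because self.level does not exist yet.
        super().__init__(text, processing_log=processing_log, log_origin=None)
        self.level = level
        # Log only after the subclass-specific attribute is set.
        self.log_init(log_origin)

    def describe(self) -> str:
        return f"{self.text!r} level={self.level}"
```

If `TitleElement` forwarded `log_origin` to `super().__init__()` instead, `describe()` would run before `self.level` exists and raise `AttributeError`.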

property html_tag: sec_parser.processing_engine.html_tag.HtmlTag
classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) → AbstractSemanticElement

Create an instance of this class from an existing semantic element, converting source into this element type.
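Below is a sketch of how a conversion classmethod of this kind can work. It assumes the new element wraps the same tag as the source element and records the conversion in a plain `history` list. `HtmlTag`, `Element`, `UnclassifiedElement`, `TitleElement`, and `history` are hypothetical names, and `log_origin` is reduced to a plain string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HtmlTag:
    """Hypothetical stand-in for the wrapped tag."""

    text: str


class Element:
    def __init__(self, html_tag: HtmlTag, *, history: list[str] | None = None) -> None:
        self.html_tag = html_tag
        self.history = history if history is not None else []

    @classmethod
    def create_from_element(cls, source: Element, log_origin: str) -> Element:
        # Build an instance of *cls* (not of type(source)) around the same tag,
        # copying the source's history and appending one conversion entry.
        entry = f"{log_origin}: {type(source).__name__} -> {cls.__name__}"
        return cls(source.html_tag, history=[*source.history, entry])


class UnclassifiedElement(Element):
    pass


class TitleElement(Element):
    pass
```

Because the method uses `cls`, calling it on a subclass such as `TitleElement` yields that subclass. The source element is left unchanged.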

to_dict(*, include_previews: bool = False, include_contents: bool = False) → dict[str, Any]
__repr__() → str

Return repr(self).

contains_words() → bool

Return True if the semantic element contains text.

property text: str

Passthrough to the text property of the underlying HtmlTag.

get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) → str

Passthrough to the HtmlTag method of the same name.

get_summary() → str

Return a human-readable summary of the semantic element.

This method aims to provide a simplified, human-friendly representation of the underlying HtmlTag. In this base implementation, it is a passthrough to the HtmlTag’s get_text() method.

Note: Subclasses may override this method to provide a more specific summary based on the type of element.
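Below is a sketch of a subclass that overrides get_summary(). It assumes a hypothetical `TextElement` that truncates long text to a fixed length. The base class mirrors the documented passthrough to `get_text()`. `HtmlTag` is a minimal stand-in, and `MAX_CHARS` is an illustrative limit.

```python
from __future__ import annotations


class HtmlTag:
    """Minimal stand-in exposing get_text(); not sec_parser's HtmlTag."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self) -> str:
        return self._text


class Element:
    def __init__(self, html_tag: HtmlTag) -> None:
        self.html_tag = html_tag

    def get_summary(self) -> str:
        # Base behaviour: passthrough to the tag's get_text().
        return self.html_tag.get_text()


class TextElement(Element):
    MAX_CHARS = 40  # hypothetical summary length limit

    def get_summary(self) -> str:
        text = self.html_tag.get_text()
        if len(text) <= self.MAX_CHARS:
            return text
        # Keep the summary at exactly MAX_CHARS, including the "..." marker.
        return text[: self.MAX_CHARS - 3] + "..."
```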

sec_parser.processing_steps.abstract_classes.abstract_processing_step.ElementTransformer
exception sec_parser.processing_steps.abstract_classes.abstract_processing_step.AlreadyProcessedError

Bases: sec_parser.exceptions.SecParserRuntimeError

Raised when a processing step instance is used for more than one transformation. Inherits from SecParserRuntimeError, the base runtime exception class for sec_parser.

class sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep

Bases: abc.ABC

AbstractProcessingStep class for transforming a list of elements. Chaining multiple steps together allows for complex transformations while keeping the code modular.

Each instance of a step is designed to be used for a single transformation operation. This ensures that any internal state maintained during a transformation is isolated to the processing of a single document.
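Below is a sketch of chaining single-use steps. It assumes the one-transformation rule is enforced by a flag checked in process(), with reuse reported through a local exception named after AlreadyProcessedError. That is an illustration, not necessarily how sec_parser enforces the rule. Elements are plain strings here, and `Step`, `StripWhitespace`, `DropEmpty`, and `run_pipeline()` are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class AlreadyProcessedError(RuntimeError):
    """Local stand-in for the exception of the same name."""


class Step(ABC):
    def __init__(self) -> None:
        self._already_processed = False

    def process(self, elements: list[str]) -> list[str]:
        # Refuse reuse so per-document state never leaks between documents.
        if self._already_processed:
            raise AlreadyProcessedError(f"{type(self).__name__} instance was already used")
        self._already_processed = True
        return self._process(elements)

    @abstractmethod
    def _process(self, elements: list[str]) -> list[str]:
        raise NotImplementedError


class StripWhitespace(Step):
    def _process(self, elements: list[str]) -> list[str]:
        return [e.strip() for e in elements]


class DropEmpty(Step):
    def _process(self, elements: list[str]) -> list[str]:
        return [e for e in elements if e]


def run_pipeline(elements: list[str], step_classes: list[type[Step]]) -> list[str]:
    # Create fresh step instances for every document, one per step class.
    for step_class in step_classes:
        elements = step_class().process(elements)
    return elements
```

Passing step classes rather than step instances to `run_pipeline()` means each document automatically gets new, unused steps.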

process(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]

Transform the list of semantic elements.

Note: For performance reasons, the elements argument may be mutated in place. Pass a copy if the original must be preserved.

abstract _process(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]

Implement the actual transformation logic. Child classes must override this method to provide step-specific transformation logic.
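Below is a sketch of a concrete step that overrides _process(). It uses a hypothetical `MergeAdjacentText` step and a minimal `Element` dataclass, with a simplified base class that omits any single-use guard. The step mutates elements in place, which is the behaviour the note on process() warns about, so the caller passes a deep copy to keep the originals unchanged. A shallow `list(...)` copy would not be enough, because the element objects themselves are modified.

```python
from __future__ import annotations

import copy
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical minimal semantic element: a type label plus text."""

    kind: str
    text: str


class ProcessingStep(ABC):
    """Simplified stand-in: process() just delegates to _process()."""

    def process(self, elements: list[Element]) -> list[Element]:
        return self._process(elements)

    @abstractmethod
    def _process(self, elements: list[Element]) -> list[Element]:
        raise NotImplementedError


class MergeAdjacentText(ProcessingStep):
    """Hypothetical step: join consecutive "text" elements into one."""

    def _process(self, elements: list[Element]) -> list[Element]:
        merged: list[Element] = []
        for element in elements:
            if merged and merged[-1].kind == "text" and element.kind == "text":
                # Mutates the previous input element in place instead of copying it.
                merged[-1].text += " " + element.text
            else:
                merged.append(element)
        return merged


original = [
    Element("text", "Revenue rose."),
    Element("text", "Costs fell."),
    Element("table", "Q1 | Q2"),
]
# Deep copy: the step edits element objects, not just the list.
result = MergeAdjacentText().process(copy.deepcopy(original))
```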