sec_parser.semantic_elements.abstract_semantic_element

Exceptions

InvalidLevelError

Exception for invalid level values; derives from SecParserValueError.

Classes

AbstractSemanticElement

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose.

AbstractLevelElement

The AbstractLevelElement class provides a level attribute to semantic elements.

Module Contents

class sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: abc.ABC

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.

This class serves as a foundational representation of such semantic elements, containing an HtmlTag object that stores the raw HTML tag information. Subclasses will implement additional behaviors based on the type of the semantic element.
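The wrapping relationship described above can be illustrated with a minimal, self-contained sketch. It does not import sec_parser: FakeHtmlTag, SketchSemanticElement and SketchParagraph are hypothetical stand-ins invented for this example, and the real classes carry more state (for instance the processing log).

```python
from abc import ABC
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeHtmlTag:
    """Hypothetical stand-in for HtmlTag; it only stores text."""

    text: str


class SketchSemanticElement(ABC):
    """Simplified stand-in for AbstractSemanticElement (assumption, not the real class)."""

    def __init__(self, html_tag: FakeHtmlTag) -> None:
        # The element keeps a reference to the raw tag it was built from.
        self._html_tag = html_tag

    @property
    def html_tag(self) -> FakeHtmlTag:
        return self._html_tag

    @property
    def text(self) -> str:
        # Passthrough to the wrapped tag's text.
        return self._html_tag.text


class SketchParagraph(SketchSemanticElement):
    """Hypothetical subclass; the element type itself carries the meaning."""


paragraph = SketchParagraph(FakeHtmlTag("Net sales increased 8% year over year."))
print(type(paragraph).__name__, "->", paragraph.text)
```

The subclass adds no behavior here; in this sketch the distinction between element types is expressed purely through the class of the wrapper.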

_html_tag
processing_log
log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) -> None

Has to be called at the very end of the __init__ method.

property html_tag: sec_parser.processing_engine.html_tag.HtmlTag
classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) -> AbstractSemanticElement

Convert the semantic element into another semantic element type.
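One way to picture this conversion is a classmethod that builds an instance of the target type from the source element's tag. The sketch below is an assumption about the general pattern, not the library's implementation; every class name in it is hypothetical, and log_origin is modelled as a plain string rather than a LogItemOrigin.

```python
class FakeHtmlTag:
    """Hypothetical stand-in for HtmlTag."""

    def __init__(self, text: str) -> None:
        self.text = text


class SketchElement:
    """Simplified stand-in for AbstractSemanticElement."""

    def __init__(self, html_tag: FakeHtmlTag, *, log_origin: str = "") -> None:
        self.html_tag = html_tag
        self.log_origin = log_origin

    @classmethod
    def create_from_element(
        cls, source: "SketchElement", log_origin: str
    ) -> "SketchElement":
        # Reuse the source's tag; only the wrapping element type changes.
        return cls(source.html_tag, log_origin=log_origin)


class SketchUnclassified(SketchElement):
    """Hypothetical type for an element that has not been classified yet."""


class SketchTitle(SketchElement):
    """Hypothetical type for a title element."""


raw = SketchUnclassified(FakeHtmlTag("Item 1A. Risk Factors"))
title = SketchTitle.create_from_element(raw, log_origin="title_step")
print(type(raw).__name__, "->", type(title).__name__)
```

Calling the classmethod on the target class means the caller chooses the resulting type, while the underlying tag object is shared between the old and the new element.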

to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]
__repr__() -> str
contains_words() -> bool

Return True if the semantic element contains text.

property text: str

Property text is a passthrough to the HtmlTag text property.

get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) -> str

get_source_code is a passthrough to the HtmlTag method.

get_summary() -> str

Return a human-readable summary of the semantic element.

This method aims to provide a simplified, human-friendly representation of the underlying HtmlTag. In this base implementation, it is a passthrough to the HtmlTag’s get_text() method.

Note: Subclasses may override this method to provide a more specific summary based on the type of element.
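The base behavior and a subclass override might look like the following sketch. The classes are hypothetical stand-ins, and the table summary format is invented purely for illustration; it is not what sec_parser produces.

```python
class FakeHtmlTag:
    """Hypothetical stand-in for HtmlTag."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self) -> str:
        return self._text


class SketchElement:
    """Simplified stand-in for AbstractSemanticElement."""

    def __init__(self, html_tag: FakeHtmlTag) -> None:
        self.html_tag = html_tag

    def get_summary(self) -> str:
        # Base behavior: pass the tag's text through unchanged.
        return self.html_tag.get_text()


class SketchTable(SketchElement):
    """Hypothetical subclass with a more specific summary."""

    def get_summary(self) -> str:
        # Invented format: report the number of text lines instead of the text.
        rows = self.html_tag.get_text().splitlines()
        return f"Table with {len(rows)} rows"


tag = FakeHtmlTag("Total revenue\n2023\n2022")
base = SketchElement(tag)
table = SketchTable(tag)
print(base.get_summary())
print(table.get_summary())
```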

class sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, level: int | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: AbstractSemanticElement

The AbstractLevelElement class provides a level attribute to semantic elements. It represents hierarchical levels in the document structure. For instance, a main section title might be at level 1, a subsection at level 2, etc.
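The level attribute and its lower bound can be sketched as follows. This is a simplified model under stated assumptions: that a missing level falls back to MIN_LEVEL, and that a level below MIN_LEVEL raises an error. SketchLevelElement and SketchInvalidLevelError are hypothetical names standing in for AbstractLevelElement and InvalidLevelError.

```python
from typing import Optional


class SketchInvalidLevelError(ValueError):
    """Hypothetical stand-in for InvalidLevelError."""


class SketchLevelElement:
    """Simplified stand-in for AbstractLevelElement."""

    MIN_LEVEL = 0

    def __init__(self, text: str, *, level: Optional[int] = None) -> None:
        # Assumption: an omitted level defaults to MIN_LEVEL.
        resolved = self.MIN_LEVEL if level is None else level
        # Assumption: levels below MIN_LEVEL are rejected.
        if resolved < self.MIN_LEVEL:
            raise SketchInvalidLevelError(
                f"Level must be at least {self.MIN_LEVEL}, got {resolved}"
            )
        self.text = text
        self.level = resolved


default = SketchLevelElement("Cover page")
section = SketchLevelElement("Item 1. Business", level=1)
subsection = SketchLevelElement("Competition", level=2)

try:
    SketchLevelElement("Broken", level=-1)
    rejected = False
except SketchInvalidLevelError:
    rejected = True

print(default.level, section.level, subsection.level, rejected)
```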

MIN_LEVEL = 0
level = 0
classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, level: int | None = None) -> AbstractLevelElement

Convert the semantic element into another semantic element type.

to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]
__repr__() -> str
exception sec_parser.semantic_elements.abstract_semantic_element.InvalidLevelError

Bases: sec_parser.exceptions.SecParserValueError

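Exception for invalid level values. The generated description, "Base exception class for sec_parser. All custom exceptions in sec_parser are inherited from this class.", appears to be inherited from the sec_parser base exception rather than written for InvalidLevelError, which itself derives from SecParserValueError.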