sec_parser.semantic_elements.abstract_semantic_element
======================================================

.. py:module:: sec_parser.semantic_elements.abstract_semantic_element


Exceptions
----------

.. autoapisummary::

   sec_parser.semantic_elements.abstract_semantic_element.InvalidLevelError


Classes
-------

.. autoapisummary::

   sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
   sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement


Module Contents
---------------

.. py:class:: AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`abc.ABC`


   In the domain of HTML parsing, especially in the context of SEC EDGAR documents,
   a semantic element refers to a meaningful unit within the document that serves a
   specific purpose. For example, a paragraph or a table might be considered a
   semantic element. Unlike syntactic elements, which merely exist to structure the
   HTML, semantic elements carry information that is vital to the understanding of
   the document's content.

   This class serves as a foundational representation of such semantic elements,
   containing an HtmlTag object that stores the raw HTML tag information. Subclasses
   will implement additional behaviors based on the type of the semantic element.


   .. py:attribute:: _html_tag


   .. py:attribute:: processing_log


   .. py:method:: log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) -> None

      Has to be called at the very end of the __init__ method.



   .. py:property:: html_tag
      :type: sec_parser.processing_engine.html_tag.HtmlTag



   .. py:method:: create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) -> AbstractSemanticElement
      :classmethod:


      Convert the semantic element into another semantic element type.



   .. py:method:: to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]


   .. py:method:: __repr__() -> str


   .. py:method:: contains_words() -> bool

      Return True if the semantic element contains text.



   .. py:property:: text
      :type: str


      Property text is a passthrough to the HtmlTag text property.



   .. py:method:: get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) -> str

      get_source_code is a passthrough to the HtmlTag method.



   .. py:method:: get_summary() -> str

      Return a human-readable summary of the semantic element.

      This method aims to provide a simplified, human-friendly representation of
      the underlying HtmlTag. In this base implementation, it is a passthrough
      to the HtmlTag's get_text() method.

      Note: Subclasses may override this method to provide a more specific summary
      based on the type of element.



.. py:class:: AbstractLevelElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, level: int | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`AbstractSemanticElement`


   The AbstractLevelElement class provides a level attribute to semantic elements.
   It represents hierarchical levels in the document structure. For instance,
   a main section title might be at level 1, a subsection at level 2, etc.


   .. py:attribute:: MIN_LEVEL
      :value: 0



   .. py:attribute:: level
      :value: 0



   .. py:method:: create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, level: int | None = None) -> AbstractLevelElement
      :classmethod:


      Convert the semantic element into another semantic element type.



   .. py:method:: to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]


   .. py:method:: __repr__() -> str


.. py:exception:: InvalidLevelError

   Bases: :py:obj:`sec_parser.exceptions.SecParserValueError`


   Base exception class for sec_parser.
   All custom exceptions in sec_parser are inherited from this class.
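
The pattern documented above (an element that wraps a single tag, a ``create_from_element`` classmethod that converts one element type into another, and a level attribute bounded below by ``MIN_LEVEL``) can be illustrated with a small standalone sketch. It deliberately does not import ``sec_parser``: ``StubTag``, ``SketchElement``, ``SketchLevelElement``, ``SketchInvalidLevelError``, ``TextSketch`` and ``TitleSketch`` are hypothetical stand-ins invented for this example, and processing logs are left out entirely. The rule that a level below ``MIN_LEVEL`` raises an error, and the rule used for ``contains_words``, are assumptions suggested by the names in this module, not a description of the library's actual implementation.

```python
from __future__ import annotations

from abc import ABC
from dataclasses import dataclass


@dataclass(frozen=True)
class StubTag:
    """Hypothetical stand-in for HtmlTag; it only carries text."""

    text: str


class SketchInvalidLevelError(ValueError):
    """Hypothetical stand-in for InvalidLevelError."""


class SketchElement(ABC):
    """Stand-in for AbstractSemanticElement: wraps exactly one tag."""

    def __init__(self, html_tag: StubTag) -> None:
        self._html_tag = html_tag

    @property
    def html_tag(self) -> StubTag:
        return self._html_tag

    @property
    def text(self) -> str:
        # Passthrough to the wrapped tag, as the text property is described.
        return self._html_tag.text

    def contains_words(self) -> bool:
        # Assumption: "contains text" means at least one alphabetic character.
        return any(ch.isalpha() for ch in self.text)

    @classmethod
    def create_from_element(cls, source: SketchElement) -> SketchElement:
        # Conversion reuses the source's tag instead of copying it.
        return cls(source.html_tag)


class SketchLevelElement(SketchElement):
    """Stand-in for AbstractLevelElement: adds a validated level."""

    MIN_LEVEL = 0

    def __init__(self, html_tag: StubTag, *, level: int | None = None) -> None:
        super().__init__(html_tag)
        resolved = self.MIN_LEVEL if level is None else level
        if resolved < self.MIN_LEVEL:
            # Assumed validation rule; see the lead-in paragraph.
            raise SketchInvalidLevelError(
                f"level {resolved} is below MIN_LEVEL {self.MIN_LEVEL}"
            )
        self.level = resolved

    @classmethod
    def create_from_element(
        cls, source: SketchElement, *, level: int | None = None
    ) -> SketchLevelElement:
        return cls(source.html_tag, level=level)


class TextSketch(SketchElement):
    """Concrete element type without a level."""


class TitleSketch(SketchLevelElement):
    """Concrete element type with a level."""


# Usage: reclassify a plain text element as a level-1 title.
paragraph = TextSketch(StubTag("Item 1. Business"))
title = TitleSketch.create_from_element(paragraph, level=1)
```

In this sketch the converted element shares the tag object of its source, so reclassifying an element never re-parses or duplicates the underlying HTML. Omitting ``level`` falls back to ``MIN_LEVEL``, which matches the documented default value of ``0`` for the ``level`` attribute.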