sec_parser.semantic_elements
============================

.. py:module:: sec_parser.semantic_elements

.. autoapi-nested-parse::

   The semantic_elements subpackage provides abstractions
   for meaningful units in SEC EDGAR documents. It converts
   raw HTML elements into representations that carry
   semantic significance.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/sec_parser/semantic_elements/abstract_semantic_element/index
   /autoapi/sec_parser/semantic_elements/composite_semantic_element/index
   /autoapi/sec_parser/semantic_elements/highlighted_text_element/index
   /autoapi/sec_parser/semantic_elements/mixins/index
   /autoapi/sec_parser/semantic_elements/semantic_elements/index
   /autoapi/sec_parser/semantic_elements/table_element/index
   /autoapi/sec_parser/semantic_elements/title_element/index
   /autoapi/sec_parser/semantic_elements/top_section_start_marker/index
   /autoapi/sec_parser/semantic_elements/top_section_title/index
   /autoapi/sec_parser/semantic_elements/top_section_title_types/index


Exceptions
----------

.. autoapisummary::

   sec_parser.semantic_elements.InvalidLevelError


Classes
-------

.. autoapisummary::

   sec_parser.semantic_elements.AbstractLevelElement
   sec_parser.semantic_elements.AbstractSemanticElement
   sec_parser.semantic_elements.CompositeSemanticElement
   sec_parser.semantic_elements.EmptyElement
   sec_parser.semantic_elements.ImageElement
   sec_parser.semantic_elements.IrrelevantElement
   sec_parser.semantic_elements.NotYetClassifiedElement
   sec_parser.semantic_elements.PageHeaderElement
   sec_parser.semantic_elements.PageNumberElement
   sec_parser.semantic_elements.SupplementaryText
   sec_parser.semantic_elements.TextElement
   sec_parser.semantic_elements.TableElement
   sec_parser.semantic_elements.TitleElement
   sec_parser.semantic_elements.TopSectionTitle


Package Contents
----------------

.. py:class:: AbstractLevelElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, level: int | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`AbstractSemanticElement`


   The AbstractLevelElement class provides a level attribute to semantic elements.
   It represents hierarchical levels in the document structure. For instance,
   a main section title might be at level 1, a subsection at level 2, etc.
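
   The idea of a validated level can be sketched with a small stand-in class.
   This is a minimal illustration, not the library's implementation:
   ``LevelElement`` and its ``text`` argument are hypothetical, and the fallback
   to ``MIN_LEVEL`` when no level is given is an assumption.

   .. code-block:: python

      from __future__ import annotations


      class InvalidLevelError(ValueError):
          """Stand-in for the sec_parser exception of the same name."""


      class LevelElement:
          """Hypothetical stand-in; not the library's AbstractLevelElement."""

          MIN_LEVEL = 0

          def __init__(self, text: str, *, level: int | None = None) -> None:
              # Assumption: a missing level falls back to MIN_LEVEL.
              level = self.MIN_LEVEL if level is None else level
              if level < self.MIN_LEVEL:
                  msg = f"Level must be equal or greater than {self.MIN_LEVEL}"
                  raise InvalidLevelError(msg)
              self.text = text
              self.level = level


      section = LevelElement("Main section", level=1)
      subsection = LevelElement("Subsection", level=2)
      unspecified = LevelElement("No explicit level")

   In this sketch a negative level is rejected at construction time, so every
   existing element is guaranteed to carry a usable level.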


   .. py:attribute:: MIN_LEVEL
      :value: 0


   .. py:attribute:: level
      :value: 0


   .. py:method:: create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, level: int | None = None) -> AbstractLevelElement
      :classmethod:


      Convert the semantic element into another semantic element type.


   .. py:method:: to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]


   .. py:method:: __repr__() -> str


.. py:class:: AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`abc.ABC`


   In the domain of HTML parsing, especially in the context of SEC EDGAR documents,
   a semantic element refers to a meaningful unit within the document that serves a
   specific purpose. For example, a paragraph or a table might be considered a
   semantic element. Unlike syntactic elements, which merely exist to structure the
   HTML, semantic elements carry information that is vital to the understanding of the
   document's content.

   This class serves as a foundational representation of such semantic elements,
   containing an HtmlTag object that stores the raw HTML tag information. Subclasses
   will implement additional behaviors based on the type of the semantic element.


   .. py:attribute:: _html_tag


   .. py:attribute:: processing_log


   .. py:method:: log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) -> None

      Must be called at the very end of the ``__init__`` method.


   .. py:property:: html_tag
      :type: sec_parser.processing_engine.html_tag.HtmlTag


   .. py:method:: create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) -> AbstractSemanticElement
      :classmethod:


      Convert the semantic element into another semantic element type.
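
      The conversion pattern can be sketched with stand-in classes. Everything
      below is hypothetical (``Element``, ``NotYetClassified``, ``Text``, and the
      tuple-based ``log``); it only illustrates a classmethod that builds the
      target type from the source's HTML tag while recording who converted it.

      .. code-block:: python

         from __future__ import annotations


         class Element:
             """Hypothetical stand-in for a semantic element wrapping one tag."""

             def __init__(self, html_tag: str, log: tuple[str, ...] = ()) -> None:
                 self.html_tag = html_tag
                 self.log = log

             @classmethod
             def create_from_element(cls, source: Element, log_origin: str) -> Element:
                 # Reuse the source's tag and append an entry naming the origin
                 # of the conversion and the resulting type.
                 entry = f"{log_origin}: {cls.__name__}"
                 return cls(source.html_tag, source.log + (entry,))


         class NotYetClassified(Element):
             pass


         class Text(Element):
             pass


         raw = NotYetClassified("<p>Revenue grew.</p>")
         converted = Text.create_from_element(raw, log_origin="TextClassifier")

      Because the method is called on the target class, the same call shape works
      for any subclass, and the source element is left unchanged.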


   .. py:method:: to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]


   .. py:method:: __repr__() -> str


   .. py:method:: contains_words() -> bool

      Return True if the semantic element contains text.


   .. py:property:: text
      :type: str


      Property text is a passthrough to the HtmlTag text property.


   .. py:method:: get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) -> str

      get_source_code is a passthrough to the HtmlTag method.


   .. py:method:: get_summary() -> str

      Return a human-readable summary of the semantic element.

      This method aims to provide a simplified, human-friendly representation of
      the underlying HtmlTag. In this base implementation, it is a passthrough
      to the HtmlTag's get_text() method.

      Note: Subclasses may override this method to provide a more specific summary
      based on the type of element.
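
      A subclass override might look like the following sketch. The classes and
      the summary wording are hypothetical stand-ins chosen for illustration;
      they are not the summaries the library produces.

      .. code-block:: python

         class Element:
             """Hypothetical base element holding plain text."""

             def __init__(self, text: str) -> None:
                 self.text = text

             def get_summary(self) -> str:
                 # Base behaviour in this sketch: return the text unchanged.
                 return self.text


         class Table(Element):
             """Hypothetical table element that summarizes its shape."""

             def __init__(self, text: str, rows: int) -> None:
                 super().__init__(text)
                 self.rows = rows

             def get_summary(self) -> str:
                 # Raw table text is hard to read, so describe the table instead.
                 return f"Table with {self.rows} rows and {len(self.text)} characters."


         elements = [Element("Net sales increased."), Table("2023 2022 100 90", rows=2)]
         summaries = [element.get_summary() for element in elements]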


.. py:exception:: InvalidLevelError

   Bases: :py:obj:`sec_parser.exceptions.SecParserValueError`


   Raised when an element is given an invalid ``level``, for example one
   below ``MIN_LEVEL``.


.. py:class:: CompositeSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, inner_elements: tuple[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, Ellipsis] | None, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement`


   CompositeSemanticElement acts as a container for other semantic elements,
   especially for cases where a single HTML root tag wraps multiple elements.
   This preserves the structure of the original document and enables features
   such as semantic segmentation visualization and debugging by comparison
   with the original document.

   Why is this useful:

   1. Some semantic elements, like XBRL tags (``<ix>``), may wrap multiple semantic
      elements. The container ensures that these relationships are not broken
      during parsing.
   2. Enables the parser to fully reconstruct the original HTML document, which
      opens up possibilities for features like semantic segmentation visualization
      (e.g. recreate the original document but put semi-transparent colored boxes
      on top, based on semantic meaning), serialization of parsed documents into
      an augmented HTML, and debugging by comparing to the original document.


   .. py:attribute:: _inner_elements
      :type:  tuple[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, Ellipsis]
      :value: ()


   .. py:property:: inner_elements
      :type: tuple[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, Ellipsis]


   .. py:method:: create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, inner_elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement] | None = None) -> CompositeSemanticElement
      :classmethod:


      Convert the semantic element into another semantic element type.


   .. py:method:: to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]


   .. py:method:: unwrap_elements(elements: collections.abc.Iterable[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], *, include_containers: bool | None = None) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
      :classmethod:


      Recursively flatten a list of AbstractSemanticElement objects.
      For each CompositeSemanticElement encountered, its inner_elements
      are also recursively flattened. The 'include_containers' parameter controls
      whether the CompositeSemanticElement itself is included in the flattened list.
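
      The recursive flattening described above can be sketched as follows. The
      ``Element`` and ``Composite`` classes are hypothetical stand-ins, and two
      details are assumptions: containers are excluded by default, and an
      included container is placed before its inner elements.

      .. code-block:: python

         from __future__ import annotations

         from collections.abc import Iterable


         class Element:
             def __init__(self, name: str) -> None:
                 self.name = name


         class Composite(Element):
             def __init__(self, name: str, inner_elements: tuple[Element, ...]) -> None:
                 super().__init__(name)
                 self.inner_elements = inner_elements


         def unwrap_elements(
             elements: Iterable[Element],
             *,
             include_containers: bool = False,
         ) -> list[Element]:
             flat: list[Element] = []
             for element in elements:
                 if isinstance(element, Composite):
                     if include_containers:
                         flat.append(element)
                     # Recurse so that nested containers are flattened as well.
                     flat.extend(
                         unwrap_elements(
                             element.inner_elements,
                             include_containers=include_containers,
                         )
                     )
                 else:
                     flat.append(element)
             return flat


         inner = Composite("inner", (Element("c"),))
         outer = Composite("outer", (Element("b"), inner))
         nested = [Element("a"), outer, Element("d")]

         names = [e.name for e in unwrap_elements(nested)]
         names_with_containers = [
             e.name for e in unwrap_elements(nested, include_containers=True)
         ]

      Document order is preserved in both cases; the flag only decides whether
      the wrapping containers themselves appear in the result.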


.. py:class:: EmptyElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`IrrelevantElement`


   The EmptyElement class represents an HTML element that does not contain any content.
   It is a subclass of the IrrelevantElement class and is used to identify and handle
   empty HTML tags in the document.


.. py:class:: ImageElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement`


   The ImageElement class represents a standard image within a document.


.. py:class:: IrrelevantElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement`


   The IrrelevantElement class identifies elements in the parsed HTML that do not
   contribute to the content. These elements often include page separators, page
   numbers, and other non-content items. For instance, HTML tags without content
   like ``<p></p>`` or ``<div></div>`` are deemed irrelevant, often used in
   documents just to add vertical space.


.. py:class:: NotYetClassifiedElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement`


   The NotYetClassifiedElement class represents an element whose type
   has not yet been determined. The parsing process aims to
   classify all instances of this class into more specific
   subclasses of AbstractSemanticElement.


.. py:class:: PageHeaderElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`IrrelevantElement`


   The PageHeaderElement class represents a page header within a document.
   It is a subclass of the IrrelevantElement class and is used to identify
   and handle page headers in the document, such as current section titles
   and company names.


.. py:class:: PageNumberElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`IrrelevantElement`


   The PageNumberElement class represents a page number within a document.
   It is a subclass of the IrrelevantElement class and is used to identify
   and handle page numbers in the document.


.. py:class:: SupplementaryText(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin`, :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement`


   The SupplementaryText class captures various types of supplementary text
   within a document, such as unit qualifiers, additional notes, and disclaimers.

   For example:

   - "(In millions, except number of shares which are reflected in thousands and
     per share amounts)"
   - "See accompanying Notes to Condensed Consolidated Financial Statements."
   - "Disclaimer: This is not financial advice."


.. py:class:: TextElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin`, :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement`


   The TextElement class represents a standard text paragraph within a document.


.. py:class:: TableElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement`


   The TableElement class represents a standard table within a document.


   .. py:method:: get_summary() -> str

      Return a human-readable summary of the semantic element.

      This method aims to provide a simplified, human-friendly representation of
      the underlying HtmlTag.


   .. py:method:: to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]


   .. py:method:: table_to_markdown() -> str


.. py:class:: TitleElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, level: int | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin`, :py:obj:`sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement`


   The TitleElement class represents the title of a paragraph or other content object.
   It serves as a semantic marker, providing context and structure to the document.


.. py:class:: TopSectionTitle(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None, level: int | None = None, section_type: sec_parser.semantic_elements.top_section_title_types.TopSectionInFiling | None = None)

   Bases: :py:obj:`sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin`, :py:obj:`sec_parser.semantic_elements.top_section_start_marker.TopSectionStartMarker`


   The TopSectionTitle class represents the title and the beginning of a top-level
   section of a document. For instance, in SEC 10-Q reports, a
   top-level section could be "Part I, Item 3. Quantitative and Qualitative
   Disclosures About Market Risk".


   .. py:method:: create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, level: int | None = None, section_type: sec_parser.semantic_elements.top_section_title_types.TopSectionInFiling | None = None) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement
      :classmethod:


      Convert the semantic element into another semantic element type.


   .. py:method:: to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]