sec_parser.processing_engine.core
=================================

.. py:module:: sec_parser.processing_engine.core


Classes
-------

.. autoapisummary::

   sec_parser.processing_engine.core.AbstractSemanticElementParser
   sec_parser.processing_engine.core.Edgar10QParser
   sec_parser.processing_engine.core.Edgar10KParser


Module Contents
---------------

.. py:class:: AbstractSemanticElementParser(get_steps: Callable[[], list[sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep]] | None = None, *, parsing_options: sec_parser.processing_engine.types.ParsingOptions | None = None, html_tag_parser: sec_parser.processing_engine.html_tag_parser.AbstractHtmlTagParser | None = None)

   Bases: :py:obj:`abc.ABC`


   Responsible for parsing semantic elements from HTML documents.
   It takes raw HTML and turns it into a list of objects
   representing semantic elements.

   At a High Level:
   ==================
   1. Extract top-level HTML tags from the document.
   2. Transform these tags into a list of more specific semantic
      elements step-by-step.

   Why Focus on Top-Level Tags?
   ============================
   SEC filings usually have a flat HTML structure, which simplifies the
   parsing process. Each top-level HTML tag often directly corresponds
   to a single semantic element. This is different from many websites
   where HTML tags are nested deeply, requiring more complex parsing.

   For Advanced Users:
   ====================
   The parsing process is implemented as a sequence of steps and allows for
   customization at each step.

   - Pipeline Pattern: Raw HTML tags pass through an ordered sequence of
     processing steps, akin to a Finite State Machine (FSM): each element
     transitions through the states defined by those steps.

   - Strategy Pattern: Each step is customizable. You can replace, remove,
     or extend any existing step with your own or an inherited
     implementation. Alternatively, you can replace the entire pipeline
     with your own process.


   .. py:attribute:: _get_steps


   .. py:attribute:: _parsing_options


   .. py:attribute:: _html_tag_parser


   .. py:method:: get_default_steps() -> list[sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep]
      :abstractmethod:


   .. py:method:: parse(html: str | bytes, *, unwrap_elements: bool | None = None, include_containers: bool | None = None, include_irrelevant_elements: bool | None = None) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]


   .. py:method:: parse_from_tags(root_tags: list[sec_parser.processing_engine.html_tag.HtmlTag], *, unwrap_elements: bool | None = None, include_containers: bool | None = None, include_irrelevant_elements: bool | None = None) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
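
The two-stage flow described for ``AbstractSemanticElementParser`` (collect top-level tags, then refine them through a replaceable sequence of steps) can be sketched with the standard library alone. This is an illustration of the pattern under stated assumptions, not the library's implementation: ``Element``, ``TopLevelTagCollector``, ``make_step``, the classification rules, and this ``parse`` function are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, replace
from html.parser import HTMLParser
from typing import Callable, Optional

# Tags without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a semantic element."""
    tag: str
    text: str
    kind: str = "unclassified"


class TopLevelTagCollector(HTMLParser):
    """Stage 1: record each top-level tag and all text nested inside it."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.items: list[list[str]] = []  # [tag, accumulated text]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            self.items.append([tag, ""])
        self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        if self.depth > 0 and self.items:
            self.items[-1][1] += data


Step = Callable[[list[Element]], list[Element]]


def make_step(kind: str, predicate: Callable[[Element], bool]) -> Step:
    """Build a step that labels still-unclassified elements matching predicate."""
    def step(elements: list[Element]) -> list[Element]:
        return [
            replace(e, kind=kind) if e.kind == "unclassified" and predicate(e) else e
            for e in elements
        ]
    return step


# Made-up rules, chosen only to show that step order matters.
classify_tables = make_step("table", lambda e: e.tag == "table")
classify_titles = make_step("title", lambda e: e.text.isupper() and len(e.text) < 80)
classify_text = make_step("text", lambda e: bool(e.text))


def default_steps() -> list[Step]:
    return [classify_tables, classify_titles, classify_text]


def parse(html: str, get_steps: Optional[Callable[[], list[Step]]] = None) -> list[Element]:
    """Stage 2: run the elements through the steps, in order."""
    collector = TopLevelTagCollector()
    collector.feed(html)
    collector.close()
    elements = [Element(tag, text.strip()) for tag, text in collector.items]
    for step in (get_steps or default_steps)():
        elements = step(elements)
    return elements


html = (
    "<div>PART I</div>"
    "<p>Revenue <b>grew</b> this quarter.</p>"
    "<table><tr><td>1</td></tr></table>"
)
default_result = parse(html)
# Strategy pattern: swap in a different list of steps via the callable.
custom_result = parse(html, get_steps=lambda: [classify_text])
```

Passing a callable rather than a list mirrors the ``get_steps`` parameter in the signature above: each call to ``parse`` asks for a fresh list of steps, so steps that keep internal state are not shared between runs.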


.. py:class:: Edgar10QParser(get_steps: Callable[[], list[sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep]] | None = None, *, parsing_options: sec_parser.processing_engine.types.ParsingOptions | None = None, html_tag_parser: sec_parser.processing_engine.html_tag_parser.AbstractHtmlTagParser | None = None)

   Bases: :py:obj:`AbstractSemanticElementParser`


   The Edgar10QParser class is responsible for parsing SEC EDGAR 10-Q
   quarterly reports. It transforms the HTML documents into a list
   of elements. Each element in this list represents a part of
   the visual structure of the original document.


   .. py:method:: get_default_steps(get_checks: Callable[[], list[sec_parser.processing_steps.individual_semantic_element_extractor.single_element_checks.abstract_single_element_check.AbstractSingleElementCheck]] | None = None) -> list[sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep]


   .. py:method:: get_default_single_element_checks() -> list[sec_parser.processing_steps.individual_semantic_element_extractor.single_element_checks.abstract_single_element_check.AbstractSingleElementCheck]
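
A minimal usage sketch for ``Edgar10QParser``, relying only on the constructor and ``parse`` signatures listed above. It assumes the ``sec_parser`` package is installed; the import is guarded so the snippet still runs without it. The HTML string is a made-up stand-in for a real filing, so the element types it yields are not representative.

```python
try:
    from sec_parser.processing_engine.core import Edgar10QParser
except ImportError:  # sec_parser is not installed in this environment
    Edgar10QParser = None

# Made-up input; a real 10-Q document is far larger.
html = (
    "<p>Item 1. Financial Statements</p>"
    "<p>Revenue increased during the quarter.</p>"
)

if Edgar10QParser is not None:
    # With no arguments, the parser uses its default processing steps.
    elements = Edgar10QParser().parse(html)
    for element in elements:
        print(type(element).__name__)
else:
    elements = []
    print("sec_parser is not installed; nothing was parsed")
```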


.. py:class:: Edgar10KParser(get_steps: Callable[[], list[sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep]] | None = None, *, parsing_options: sec_parser.processing_engine.types.ParsingOptions | None = None, html_tag_parser: sec_parser.processing_engine.html_tag_parser.AbstractHtmlTagParser | None = None)

   Bases: :py:obj:`AbstractSemanticElementParser`


   The Edgar10KParser class is responsible for parsing SEC EDGAR 10-K
   annual reports. It transforms the HTML documents into a list
   of elements. Each element in this list represents a part of
   the visual structure of the original document.


   .. py:method:: get_default_steps(get_checks: Callable[[], list[sec_parser.processing_steps.individual_semantic_element_extractor.single_element_checks.abstract_single_element_check.AbstractSingleElementCheck]] | None = None) -> list[sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep]


   .. py:method:: get_default_single_element_checks() -> list[sec_parser.processing_steps.individual_semantic_element_extractor.single_element_checks.abstract_single_element_check.AbstractSingleElementCheck]