sec_parser.processing_steps.text_element_merger
===============================================
.. py:module:: sec_parser.processing_steps.text_element_merger
Classes
-------
.. autoapisummary::
sec_parser.processing_steps.text_element_merger.TextElementMerger
Module Contents
---------------
.. py:class:: TextElementMerger
Bases: :py:obj:`sec_parser.processing_steps.abstract_classes.abstract_element_batch_processing_step.AbstractElementBatchProcessingStep`
TextElementMerger is a processing step that merges adjacent text elements.
For example, a TextElement() immediately followed by another TextElement()
is merged into a single TextElement().
Intended to fix weird formatting artifacts, such as:
Property and equipment, net, co
nsisted of the following (in millions):
Note how the text is split into two spans, partway through the word "consisted", even though it is a single sentence.
Source: https://www.sec.gov/Archives/edgar/data/1652044/000165204423000094/goog-20230930.htm
.. py:method:: _process_elements(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], _: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
.. py:method:: _merge(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
:classmethod:
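
The merging behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the sec_parser API and is not the library's implementation: the ``Element`` dataclass, its ``kind`` and ``text`` fields, and the ``merge_adjacent_text`` function are hypothetical names introduced only for this example, and concatenating the text without a separator is an assumption based on the mid-word split shown in the docstring.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a semantic element; not a sec_parser class."""

    kind: str  # e.g. "text" or "table"
    text: str


def merge_adjacent_text(elements: list[Element]) -> list[Element]:
    """Collapse each run of consecutive "text" elements into one element."""
    merged: list[Element] = []
    for element in elements:
        previous = merged[-1] if merged else None
        if previous is not None and previous.kind == "text" and element.kind == "text":
            # Assumption: join without a separator, because the split
            # can fall in the middle of a word ("co" + "nsisted").
            merged[-1] = Element("text", previous.text + element.text)
        else:
            merged.append(element)
    return merged


parts = [
    Element("text", "Property and equipment, net, co"),
    Element("text", "nsisted of the following (in millions):"),
    Element("table", "..."),
]
result = merge_adjacent_text(parts)
print(result[0].text)
```

Only neighbouring text elements are merged in this sketch; the non-text element that follows is left untouched, so the relative order of the remaining elements is preserved.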