sec_parser.processing_steps.text_element_merger

Classes

TextElementMerger

TextElementMerger is a processing step that merges adjacent text elements.

Module Contents

class sec_parser.processing_steps.text_element_merger.TextElementMerger

Bases: sec_parser.processing_steps.abstract_classes.abstract_element_batch_processing_step.AbstractElementBatchProcessingStep

TextElementMerger is a processing step that merges adjacent text elements. For example, it merges TextElement(<span></span>) and TextElement(<span></span>) into a single TextElement(<span></span><span></span>).
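A minimal sketch of the merge described above, assuming a hypothetical `TextElement` dataclass that stores only its raw HTML as a string (this is not the sec_parser element API). Under that assumption, merging two adjacent elements amounts to concatenating their HTML in document order.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    # Hypothetical stand-in for a semantic element: it only stores raw HTML.
    html: str


def merge_two(first: TextElement, second: TextElement) -> TextElement:
    # Concatenate the HTML of both elements, keeping their original order.
    return TextElement(first.html + second.html)


merged = merge_two(TextElement("<span></span>"), TextElement("<span></span>"))
print(merged.html)  # <span></span><span></span>
```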

Intended to fix formatting artifacts in the source HTML, such as:
<ix:nonnumeric contextref="c-1" name="us-gaap:PropertyPlantAndEquipmentTextBlock" id="f-989" escape="true">

<span style="background-color:#ffffff;color:#000000;font-family:'Arial',sans-serif;font-size:10pt;font-weight:400;line-height:120%">Property and equipment, net, co</span> <span style="color:#000000;font-family:'Arial',sans-serif;font-size:10pt;font-weight:400;line-height:120%">nsisted of the following (in millions):</span>

</ix:nonnumeric>

Notice how the text is split into two spans even though it is a single sentence; the word "consisted" is itself broken across them. Source: https://www.sec.gov/Archives/edgar/data/1652044/000165204423000094/goog-20230930.htm
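To illustrate why merging matters here, the sketch below uses the standard-library `html.parser.HTMLParser` to extract the visible text of each span. The inline styles are dropped for brevity and the spans are written back to back; `TextCollector` and `visible_text` are hypothetical helpers, not part of sec_parser. Neither span alone holds the full sentence; only their concatenation does.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data from an HTML fragment, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(fragment: str) -> str:
    # Feed the fragment, flush any buffered text, and join the collected data.
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    return "".join(collector.parts)


first = "<span>Property and equipment, net, co</span>"
second = "<span>nsisted of the following (in millions):</span>"

print(visible_text(first))  # Property and equipment, net, co
print(visible_text(first + second))
# Property and equipment, net, consisted of the following (in millions):
```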

_process_elements(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], _: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
classmethod _merge(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
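The two signatures above suggest a split of responsibilities: `_process_elements` walks the whole element list, while `_merge` combines one list of elements into a single element. A hedged sketch of that shape follows, using hypothetical `Element`, `TextElement`, and `TableElement` classes in place of sec_parser's own; it groups consecutive text elements with `itertools.groupby` and merges every run longer than one. This illustrates the technique and is not the library's implementation.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Element:
    # Hypothetical base class standing in for AbstractSemanticElement.
    html: str


class TextElement(Element):
    pass


class TableElement(Element):
    pass


def merge(elements: list[Element]) -> Element:
    # Hypothetical analogue of _merge: join a run of elements into one TextElement.
    return TextElement("".join(e.html for e in elements))


def process_elements(elements: list[Element]) -> list[Element]:
    result: list[Element] = []
    # groupby yields maximal runs of consecutive items sharing a key,
    # so only *adjacent* text elements land in the same group.
    for is_text, run in groupby(elements, key=lambda e: isinstance(e, TextElement)):
        group = list(run)
        if is_text and len(group) > 1:
            result.append(merge(group))
        else:
            result.extend(group)
    return result


elements = [
    TextElement("<span>a</span>"),
    TextElement("<span>b</span>"),
    TableElement("<table></table>"),
    TextElement("<span>c</span>"),
]
out = process_elements(elements)
print([type(e).__name__ for e in out])  # ['TextElement', 'TableElement', 'TextElement']
print(out[0].html)  # <span>a</span><span>b</span>
```

Because `groupby` only groups consecutive items, the trailing `TextElement` stays separate from the first two: the table between them breaks the run.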