sec_parser.processing_steps.text_element_merger
Classes
TextElementMerger is a processing step that merges adjacent text elements.
Module Contents
- class sec_parser.processing_steps.text_element_merger.TextElementMerger
- TextElementMerger is a processing step that merges adjacent text elements. For example, it merges TextElement(<span></span>) and TextElement(<span></span>) into a single TextElement(<span></span><span></span>).
- Intended to fix weird formatting artifacts, such as:
- <ix:nonnumeric contextref="c-1" name="us-gaap:PropertyPlantAndEquipmentTextBlock" id="f-989" escape="true">
<span style="background-color:#ffffff;color:#000000;font-family:'Arial',sans-serif;font-size:10pt;font-weight:400;line-height:120%">Property and equipment, net, co</span> <span style="color:#000000;font-family:'Arial',sans-serif;font-size:10pt;font-weight:400;line-height:120%">nsisted of the following (in millions):</span>
</ix:nonnumeric>
Notice how the text is split into two spans in the middle of the word "consisted", even though it is a single sentence. Source: https://www.sec.gov/Archives/edgar/data/1652044/000165204423000094/goog-20230930.htm
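The merging behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the sec_parser API: the `Element` class and the `merge_adjacent_text` function are hypothetical stand-ins invented for this example, and the sketch assumes that merging means concatenating the HTML of consecutive text elements while leaving all other elements untouched. Whitespace between the original spans is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a parsed element; not the sec_parser class."""

    kind: str  # e.g. "text" or "table"
    html: str  # raw HTML source of the element


def merge_adjacent_text(elements: list[Element]) -> list[Element]:
    """Concatenate the HTML of consecutive "text" elements into one element."""
    merged: list[Element] = []
    for element in elements:
        previous = merged[-1] if merged else None
        if previous is not None and previous.kind == "text" and element.kind == "text":
            # Extend the current run: replace the last element with the combined HTML.
            merged[-1] = Element("text", previous.html + element.html)
        else:
            # Either a non-text element or the start of a new run.
            merged.append(element)
    return merged


elements = [
    Element("text", "<span>Property and equipment, net, co</span>"),
    Element("text", "<span>nsisted of the following (in millions):</span>"),
    Element("table", "<table></table>"),
    Element("text", "<span>Next paragraph.</span>"),
]
result = merge_adjacent_text(elements)
for element in result:
    print(element.kind, element.html)
```

In this sketch the two split spans end up in a single text element, while the text element after the table stays separate because it is not adjacent to the first run.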