sec_parser.semantic_elements.mixins.dict_text_content_mixin

Classes

DictTextContentMixin

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose.

Module Contents

class sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.

This class serves as a foundational representation of such semantic elements, containing an HtmlTag object that stores the raw HTML tag information. Subclasses will implement additional behaviors based on the type of the semantic element.

to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]
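
The signature above shows only that to_dict accepts two keyword-only flags and returns a dictionary; this page does not document which keys the result contains. The following is a minimal sketch of the general pattern such a mixin could follow, extending the dictionary produced by its base class through super(). It does not import sec_parser: BaseElementSketch, TextContentMixinSketch, ParagraphSketch, and the keys "cls_name", "text_preview", and "text_content" are all hypothetical stand-ins chosen for illustration, not the library's actual classes or output.

```python
from __future__ import annotations

from typing import Any


class BaseElementSketch:
    """Hypothetical stand-in for AbstractSemanticElement (not the real class)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_dict(
        self,
        *,
        include_previews: bool = False,
        include_contents: bool = False,
    ) -> dict[str, Any]:
        # Assumed base behavior: always report the class name and
        # optionally add a short preview of the text.
        result: dict[str, Any] = {"cls_name": type(self).__name__}
        if include_previews:
            result["text_preview"] = self.text[:20]
        return result


class TextContentMixinSketch(BaseElementSketch):
    """Hypothetical mixin that adds the element's text to the dictionary."""

    def to_dict(
        self,
        *,
        include_previews: bool = False,
        include_contents: bool = False,
    ) -> dict[str, Any]:
        # Forward both flags to the base class, then add one extra key.
        # The key name "text_content" is an assumption.
        return {
            **super().to_dict(
                include_previews=include_previews,
                include_contents=include_contents,
            ),
            "text_content": self.text,
        }


class ParagraphSketch(TextContentMixinSketch):
    """Hypothetical concrete element that inherits the mixin's to_dict."""


element = ParagraphSketch("Revenue increased during the quarter.")
summary = element.to_dict()
detailed = element.to_dict(include_previews=True)
```

Because the mixin forwards both keyword arguments unchanged, any subclass that inherits from it keeps the base class's handling of include_previews and include_contents and gains only the additional text entry.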