sec_parser.semantic_elements.mixins.dict_text_content_mixin

Classes

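`AbstractSemanticElement`	In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose.
`DictTextContentMixin`	In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose.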

Module Contents

class sec_parser.semantic_elements.mixins.dict_text_content_mixin.AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: abc.ABC

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.

This class serves as a foundational representation of such semantic elements, containing an HtmlTag object that stores the raw HTML tag information. Subclasses will implement additional behaviors based on the type of the semantic element.

log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) → None

Must be called at the very end of the __init__ method.
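The ordering requirement for log_init can be illustrated with a minimal, self-contained sketch. SketchLog, SketchElement, and SketchTitle are hypothetical stand-ins and not the sec_parser classes. The sketch assumes that logging is skipped when no origin is given, so a subclass can pass log_origin=None to the base constructor and log once after its own attributes are set.

```python
from __future__ import annotations


class SketchLog:
    """Hypothetical stand-in for a processing log."""

    def __init__(self) -> None:
        self.items: list[tuple[str, str]] = []

    def add_item(self, message: str, origin: str) -> None:
        self.items.append((message, origin))


class SketchElement:
    """Hypothetical stand-in for a semantic element base class."""

    def __init__(self, text: str, *, processing_log: SketchLog | None = None,
                 log_origin: str | None = None) -> None:
        self.text = text
        self.processing_log = processing_log if processing_log is not None else SketchLog()
        self.log_init(log_origin)  # last statement of __init__

    def log_init(self, log_origin: str | None = None) -> None:
        # Assumption: nothing is logged when no origin is supplied.
        if log_origin is not None:
            self.processing_log.add_item(repr(self), log_origin)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.text!r})"


class SketchTitle(SketchElement):
    """Hypothetical subclass that adds one attribute."""

    def __init__(self, text: str, *, level: int = 0,
                 processing_log: SketchLog | None = None,
                 log_origin: str | None = None) -> None:
        # Pass log_origin=None so the base class does not log a half-built object.
        super().__init__(text, processing_log=processing_log, log_origin=None)
        self.level = level
        self.log_init(log_origin)  # logged once, after every attribute is set

    def __repr__(self) -> str:
        return f"SketchTitle({self.text!r}, level={self.level})"


log = SketchLog()
SketchTitle("Item 1. Business", level=1, processing_log=log, log_origin="demo")
```

If the subclass logged before assigning level, the repr used as the log message would fail with an AttributeError, which is the kind of problem that calling log_init last avoids.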

property html_tag: sec_parser.processing_engine.html_tag.HtmlTag

classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) → AbstractSemanticElement

Convert the semantic element into another semantic element type.
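A conversion classmethod of this shape might look like the sketch below. All class names are hypothetical stand-ins, and the choice to share the source's history object with the new element (instead of copying it) is an assumption made for illustration, not a statement about sec_parser's behaviour.

```python
from __future__ import annotations


class SketchElement:
    """Hypothetical stand-in for a semantic element."""

    def __init__(self, html: str, *, history: list[str] | None = None,
                 log_origin: str | None = None) -> None:
        self.html = html
        self.history = history if history is not None else []
        if log_origin is not None:
            self.history.append(f"{type(self).__name__} <- {log_origin}")

    @classmethod
    def create_from_element(cls, source: SketchElement, log_origin: str) -> SketchElement:
        # Reuse the source's underlying markup and history; only the type changes.
        return cls(source.html, history=source.history, log_origin=log_origin)


class SketchText(SketchElement):
    """Hypothetical generic text element."""


class SketchTitle(SketchElement):
    """Hypothetical title element."""


text = SketchText("<b>Risk Factors</b>", log_origin="parser")
title = SketchTitle.create_from_element(text, log_origin="title_classifier")
```

Because the method is called on the target class, cls is SketchTitle here, so the same method serves every subclass without being overridden.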

to_dict(*, include_previews: bool = False, include_contents: bool = False) → dict[str, Any]

__repr__() → str

Return repr(self).

contains_words() → bool

Return True if the semantic element contains text.

property text: str

Property text is a passthrough to the HtmlTag text property.

get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) → str

get_source_code is a passthrough to the HtmlTag method.

get_summary() → str

Return a human-readable summary of the semantic element.

This method aims to provide a simplified, human-friendly representation of the underlying HtmlTag. In this base implementation, it is a passthrough to the HtmlTag’s get_text() method.

Note: Subclasses may override this method to provide a more specific summary based on the type of element.
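As an illustration of the override described in the note, the following sketch uses hypothetical stand-in classes (not the sec_parser ones). The base class returns its text unchanged, while a table-like subclass returns a short description instead; the summary wording and the rows and columns attributes are invented for this example.

```python
class SketchElement:
    """Hypothetical stand-in for the base semantic element."""

    def __init__(self, text: str) -> None:
        self.text = text

    def get_summary(self) -> str:
        # Base behaviour: hand back the element's text unchanged.
        return self.text


class SketchTable(SketchElement):
    """Hypothetical subclass whose raw text is a poor summary."""

    def __init__(self, text: str, *, rows: int, columns: int) -> None:
        super().__init__(text)
        self.rows = rows
        self.columns = columns

    def get_summary(self) -> str:
        # Describe the table's shape rather than dumping every cell.
        return f"Table with {self.rows} rows and {self.columns} columns"


paragraph = SketchElement("Revenue increased during the fiscal year.")
table = SketchTable("2023 100 2022 90", rows=2, columns=2)
summaries = [element.get_summary() for element in (paragraph, table)]
```

Callers can treat every element uniformly through get_summary() and still get a type-appropriate result.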

class sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.

This class serves as a foundational representation of such semantic elements, containing an HtmlTag object that stores the raw HTML tag information. Subclasses will implement additional behaviors based on the type of the semantic element.

to_dict(*, include_previews: bool = False, include_contents: bool = False) → dict[str, Any]
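The page does not say what this to_dict override adds. Going by the mixin's name, one plausible pattern is to extend the parent's dictionary with the element's text. The sketch below shows that pattern with hypothetical stand-in classes. The key names "cls_name" and "text_content", and the choice to add the text regardless of the flags, are assumptions and are not taken from sec_parser.

```python
from __future__ import annotations

from typing import Any


class SketchElement:
    """Hypothetical stand-in for the base semantic element."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_dict(self, *, include_previews: bool = False,
                include_contents: bool = False) -> dict[str, Any]:
        # Assumed base payload: just the class name.
        return {"cls_name": type(self).__name__}


class SketchTextContentMixin(SketchElement):
    """Hypothetical mixin that adds the element text to the dictionary."""

    def to_dict(self, *, include_previews: bool = False,
                include_contents: bool = False) -> dict[str, Any]:
        # Forward the flags unchanged, then extend the parent's result.
        result = super().to_dict(
            include_previews=include_previews,
            include_contents=include_contents,
        )
        result["text_content"] = self.text  # assumed key name
        return result


class SketchParagraph(SketchTextContentMixin):
    """Hypothetical concrete element that opts in to the mixin."""


as_dict = SketchParagraph("Revenue grew.").to_dict()
```

Calling super().to_dict(...) instead of building a new dictionary keeps the mixin composable: any keys contributed by other classes in the inheritance chain are preserved.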