sec_parser.semantic_elements.abstract_semantic_element

Attributes

LogItemOrigin

Exceptions

SecParserValueError

Value error raised by sec_parser; subclass of SecParserError and ValueError.

InvalidLevelError

Error raised for an invalid element level; subclass of SecParserValueError.

Classes

ProcessingLog

AbstractSemanticElement

Base class for semantic elements: meaningful units of an SEC EDGAR HTML document, such as paragraphs and tables.

AbstractLevelElement

The AbstractLevelElement class provides a level attribute to semantic elements.

Module Contents

exception sec_parser.semantic_elements.abstract_semantic_element.SecParserValueError

Bases: SecParserError, ValueError

A ValueError specific to sec_parser. Like all custom exceptions in sec_parser, it inherits from SecParserError, the library's base exception class.

sec_parser.semantic_elements.abstract_semantic_element.LogItemOrigin

class sec_parser.semantic_elements.abstract_semantic_element.ProcessingLog

add_item(*, message: LogItemPayload, log_origin: LogItemOrigin) -> None

get_items() -> tuple[LogItem, ...]

copy() -> ProcessingLog
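The three ProcessingLog methods describe an append-only log that can be snapshotted. The sketch below is a hypothetical stand-in, not sec_parser's implementation: `MiniProcessingLog`, the frozen `LogItem` dataclass, and plain-string origins are all assumptions. It illustrates a keyword-only `add_item`, a read-only tuple from `get_items`, and a `copy` that later additions do not affect.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class LogItem:
    """Hypothetical log entry: where it came from and what was recorded."""
    origin: str
    payload: Any


@dataclass
class MiniProcessingLog:
    """Hypothetical stand-in mirroring ProcessingLog's documented methods."""
    _items: list[LogItem] = field(default_factory=list)

    def add_item(self, *, message: Any, log_origin: str) -> None:
        # Keyword-only arguments, as in the documented signature.
        self._items.append(LogItem(origin=log_origin, payload=message))

    def get_items(self) -> tuple[LogItem, ...]:
        # Return a tuple so callers cannot mutate the internal list.
        return tuple(self._items)

    def copy(self) -> MiniProcessingLog:
        # New list holding the same immutable items; later additions to
        # either log do not show up in the other.
        return MiniProcessingLog(list(self._items))


log = MiniProcessingLog()
log.add_item(message="created", log_origin="step_a")
snapshot = log.copy()
log.add_item(message="merged", log_origin="step_b")
print(len(log.get_items()), len(snapshot.get_items()))  # 2 1
```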
class sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: abc.ABC

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.

This class serves as a foundational representation of such semantic elements, containing an HtmlTag object that stores the raw HTML tag information. Subclasses will implement additional behaviors based on the type of the semantic element.
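A self-contained sketch of this design follows. `FakeHtmlTag`, `SemanticElementSketch`, and `ParagraphSketch` are hypothetical names, and the real HtmlTag wraps a parsed HTML tag rather than a plain string. The base class holds the tag and passes `text` through to it, while a subclass adds behavior specific to its element type (here, a word count chosen only for illustration).

```python
from __future__ import annotations

from abc import ABC


class FakeHtmlTag:
    """Hypothetical stand-in for HtmlTag: it just stores raw text."""

    def __init__(self, text: str) -> None:
        self._text = text

    @property
    def text(self) -> str:
        return self._text


class SemanticElementSketch(ABC):
    """Hypothetical base class: wraps a tag and passes text through to it."""

    def __init__(self, html_tag: FakeHtmlTag) -> None:
        self._html_tag = html_tag

    @property
    def html_tag(self) -> FakeHtmlTag:
        return self._html_tag

    @property
    def text(self) -> str:
        # Passthrough to the wrapped tag.
        return self._html_tag.text

    def contains_words(self) -> bool:
        # Assumption: any non-whitespace text counts as containing words.
        return bool(self.text.strip())


class ParagraphSketch(SemanticElementSketch):
    """Hypothetical subclass adding type-specific behavior."""

    def word_count(self) -> int:
        return len(self.text.split())


para = ParagraphSketch(FakeHtmlTag("Risk factors are described below."))
print(para.contains_words(), para.word_count())  # True 5
```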

log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) -> None

Must be called at the very end of the __init__ method.
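One plausible reason for calling log_init last is that a log entry describing the element should see its fully initialized state. The sketch below illustrates that ordering under assumptions: `LoggedElementSketch`, `HeadingSketch`, the list-based log, and recording the element's `repr` are all hypothetical, and the real log_init may record different data.

```python
from __future__ import annotations


class LoggedElementSketch:
    """Hypothetical base: records a snapshot of the element in a log."""

    def __init__(self, text: str, *, log: list[str] | None = None) -> None:
        self.text = text
        self.log = log if log is not None else []
        # Subclasses are expected to call log_init() themselves, last.

    def log_init(self, log_origin: str | None = None) -> None:
        # Record the current state; attributes assigned after this call
        # would be missing from the entry.
        self.log.append(f"{log_origin or 'unknown'}: created {self!r}")

    def __repr__(self) -> str:
        return f"{type(self).__name__}<{self.text!r}>"


class HeadingSketch(LoggedElementSketch):
    def __init__(self, text: str, *, level: int, log: list[str] | None = None) -> None:
        super().__init__(text, log=log)
        self.level = level       # finish subclass setup first...
        self.log_init("demo")    # ...then log, at the very end

    def __repr__(self) -> str:
        return f"HeadingSketch<{self.text!r}, level={self.level}>"


h = HeadingSketch("Item 1A", level=1)
print(h.log)  # ["demo: created HeadingSketch<'Item 1A', level=1>"]
```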

property html_tag: sec_parser.processing_engine.html_tag.HtmlTag
classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) -> AbstractSemanticElement

Convert an existing semantic element (source) into an instance of the class on which this method is called.
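Because create_from_element is a classmethod, calling it on a target class can produce an element of that class from an existing one. The sketch below shows that pattern with hypothetical classes (`ElementSketch`, `NotYetClassified`, `TitleSketch`); reusing the source's tag and copying its log history are assumptions made for illustration, not confirmed details of sec_parser.

```python
from __future__ import annotations


class ElementSketch:
    """Hypothetical element holding a tag and a list-based log."""

    def __init__(self, html_tag: str, *, processing_log: list[str] | None = None,
                 log_origin: str | None = None) -> None:
        self.html_tag = html_tag
        self.processing_log = processing_log if processing_log is not None else []
        self.processing_log.append(f"{log_origin}: became {type(self).__name__}")

    @classmethod
    def create_from_element(cls, source: ElementSketch, log_origin: str) -> ElementSketch:
        # Build an instance of cls (the target type) around the same tag,
        # carrying over a copy of the source's log history.
        return cls(source.html_tag, processing_log=list(source.processing_log),
                   log_origin=log_origin)


class NotYetClassified(ElementSketch):
    pass


class TitleSketch(ElementSketch):
    pass


raw = NotYetClassified("<b>Part I</b>", log_origin="parser")
title = TitleSketch.create_from_element(raw, log_origin="classifier")
print(type(title).__name__, title.processing_log)
# TitleSketch ['parser: became NotYetClassified', 'classifier: became TitleSketch']
```

Copying the log keeps the source element's history unchanged when the converted element records new entries.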

to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]

__repr__() -> str

Return repr(self).

contains_words() -> bool

Return True if the semantic element contains text.

property text: str

Property text is a passthrough to the HtmlTag text property.

get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) -> str

Passthrough to the corresponding get_source_code method of the underlying HtmlTag.

get_summary() -> str

Return a human-readable summary of the semantic element.

This method aims to provide a simplified, human-friendly representation of the underlying HtmlTag. In this base implementation, it is a passthrough to the HtmlTag’s get_text() method.

Note: Subclasses may override this method to provide a more specific summary based on the type of element.
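The override pattern described in the note can be sketched as follows. `SummaryBaseSketch` and `TableSketch` are hypothetical classes, and the "Table with N rows" wording is an invented example of a type-specific summary, not sec_parser's actual output.

```python
from __future__ import annotations


class SummaryBaseSketch:
    """Hypothetical base: the summary is just the element's text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def get_summary(self) -> str:
        # Base behavior: pass the text straight through.
        return self.text


class TableSketch(SummaryBaseSketch):
    """Hypothetical table element built from rows of cell strings."""

    def __init__(self, rows: list[list[str]]) -> None:
        super().__init__("\n".join(" | ".join(row) for row in rows))
        self.rows = rows

    def get_summary(self) -> str:
        # Override: a short description instead of the full cell text.
        return f"Table with {len(self.rows)} rows"


print(SummaryBaseSketch("Net income rose.").get_summary())  # Net income rose.
print(TableSketch([["Year", "Revenue"], ["2023", "10"]]).get_summary())  # Table with 2 rows
```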

class sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, level: int | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: AbstractSemanticElement

The AbstractLevelElement class provides a level attribute to semantic elements. It represents hierarchical levels in the document structure. For instance, a main section title might be at level 1, a subsection at level 2, etc.
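A minimal sketch of a level-carrying element follows, using the MIN_LEVEL = 0 constant listed for this class. Two behaviors are assumptions rather than documented facts: a missing level falls back to MIN_LEVEL, and a level below MIN_LEVEL raises an error (here a hypothetical `InvalidLevelSketchError` standing in for InvalidLevelError). The actual validation rules in sec_parser may differ.

```python
from __future__ import annotations


class InvalidLevelSketchError(ValueError):
    """Hypothetical stand-in for InvalidLevelError."""


class LevelElementSketch:
    """Hypothetical element carrying a hierarchical level."""

    MIN_LEVEL = 0  # mirrors the documented MIN_LEVEL constant

    def __init__(self, text: str, *, level: int | None = None) -> None:
        # Assumption: a missing level falls back to MIN_LEVEL.
        level = self.MIN_LEVEL if level is None else level
        # Assumption: levels below MIN_LEVEL are rejected.
        if level < self.MIN_LEVEL:
            raise InvalidLevelSketchError(
                f"level must be >= {self.MIN_LEVEL}, got {level}"
            )
        self.text = text
        self.level = level


outline = [
    LevelElementSketch("Part I", level=1),
    LevelElementSketch("Item 1. Business", level=2),
    LevelElementSketch("Item 1A. Risk Factors", level=2),
]
for element in outline:
    # Indent by level to show the hierarchy.
    print("  " * element.level + element.text)
```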

MIN_LEVEL = 0
classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, level: int | None = None) -> AbstractLevelElement

Convert an existing semantic element (source) into an instance of the class on which this method is called; the optional level keyword sets the level of the new element.
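This override adds a keyword-only level argument to the conversion. The sketch below mirrors that signature shape with hypothetical classes (`PlainSketch`, `LevelledSketch`); storing log_origin as a plain attribute and defaulting to MIN_LEVEL when no level is given are assumptions, not confirmed sec_parser behavior.

```python
from __future__ import annotations


class PlainSketch:
    """Hypothetical element with no level."""

    def __init__(self, text: str) -> None:
        self.text = text


class LevelledSketch(PlainSketch):
    """Hypothetical level element with a converting classmethod."""

    MIN_LEVEL = 0

    def __init__(self, text: str, *, level: int | None = None,
                 log_origin: str | None = None) -> None:
        super().__init__(text)
        # Assumption: no level given means MIN_LEVEL.
        self.level = self.MIN_LEVEL if level is None else level
        self.log_origin = log_origin

    @classmethod
    def create_from_element(cls, source: PlainSketch, log_origin: str, *,
                            level: int | None = None) -> LevelledSketch:
        # The extra keyword lets the caller assign a level while converting.
        return cls(source.text, level=level, log_origin=log_origin)


plain = PlainSketch("Item 7. MD&A")
heading = LevelledSketch.create_from_element(plain, "demo", level=2)
print(heading.text, heading.level)  # Item 7. MD&A 2
```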

to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]

__repr__() -> str

Return repr(self).

exception sec_parser.semantic_elements.abstract_semantic_element.InvalidLevelError

Bases: sec_parser.exceptions.SecParserValueError

Like all custom exceptions in sec_parser, it inherits (through SecParserValueError) from SecParserError, the library's base exception class.
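Because of these bases, an InvalidLevelError can be caught as InvalidLevelError, SecParserValueError, SecParserError, or a plain ValueError. The sketch below re-declares the documented hierarchy locally to show this; these are not imports from sec_parser, SecParserError's own base is assumed to be Exception, and `check_level` is a hypothetical validator used only to raise the error.

```python
class SecParserError(Exception):
    """Local stand-in; its real base class is assumed to be Exception."""


class SecParserValueError(SecParserError, ValueError):
    """Mirrors the documented bases: SecParserError and ValueError."""


class InvalidLevelError(SecParserValueError):
    """Mirrors the documented base: SecParserValueError."""


def check_level(level: int, min_level: int = 0) -> int:
    # Hypothetical validator, used only to raise the error.
    if level < min_level:
        raise InvalidLevelError(f"level {level} is below {min_level}")
    return level


# Each handler in turn catches the same error, from most to least specific.
for handler in (InvalidLevelError, SecParserValueError, SecParserError, ValueError):
    try:
        check_level(-1)
    except handler as exc:
        print(f"caught by {handler.__name__}: {exc}")
```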