sec_parser.semantic_elements.highlighted_text_element

Exceptions

SecParserValueError

Base exception class for sec_parser.

Classes

AbstractSemanticElement

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose.

HighlightedTextElement

The HighlightedTextElement class, among other uses, is an intermediate step in identifying title elements.

TextStyle

Functions

exceeds_capitalization_threshold(→ bool)

Calculate the percentage of capitalized letters in a given string s.

Module Contents

exception sec_parser.semantic_elements.highlighted_text_element.SecParserValueError

Bases: SecParserError, ValueError

Base exception class for sec_parser. All custom exceptions in sec_parser are inherited from this class.
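Because the class lists both SecParserError and ValueError as bases, a handler written for either type should also catch it. The sketch below illustrates that inheritance with stand-in classes defined locally; they are not imported from sec_parser, and the assumption that SecParserError derives directly from Exception is ours.

```python
# Stand-in definitions that mirror the documented bases.
# They are illustrative only and are NOT imported from sec_parser.
class SecParserError(Exception):
    """Hypothetical stand-in for the library's base exception."""


class SecParserValueError(SecParserError, ValueError):
    """Stand-in with the same two bases as documented above."""


def describe(err: Exception) -> str:
    """Report which handler types would catch the given exception."""
    caught_by = []
    if isinstance(err, SecParserError):
        caught_by.append("SecParserError")
    if isinstance(err, ValueError):
        caught_by.append("ValueError")
    return ", ".join(caught_by)


try:
    raise SecParserValueError("invalid value")
except ValueError as exc:  # a plain ValueError handler is enough
    caught = describe(exc)
```

Under these assumptions, code that already handles ValueError keeps working, while callers that want to isolate library errors can catch SecParserError instead.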

class sec_parser.semantic_elements.highlighted_text_element.AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: abc.ABC

In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.

This class serves as a foundational representation of such semantic elements, containing an HtmlTag object that stores the raw HTML tag information. Subclasses will implement additional behaviors based on the type of the semantic element.

log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) → None

Has to be called at the very end of the __init__ method.

property html_tag: sec_parser.processing_engine.html_tag.HtmlTag

classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) → AbstractSemanticElement

Convert the semantic element into another semantic element type.

to_dict(*, include_previews: bool = False, include_contents: bool = False) → dict[str, Any]

__repr__() → str

Return repr(self).

contains_words() → bool

Return True if the semantic element contains text.

property text: str

Property text is a passthrough to the HtmlTag text property.

get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) → str

get_source_code is a passthrough to the HtmlTag method.

get_summary() → str

Return a human-readable summary of the semantic element.

This method aims to provide a simplified, human-friendly representation of the underlying HtmlTag. In this base implementation, it is a passthrough to the HtmlTag’s get_text() method.

Note: Subclasses may override this method to provide a more specific summary based on the type of element.
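The passthrough behaviour and the override mentioned in the note can be sketched as follows. HtmlTag and AbstractSemanticElement here are heavily simplified stand-ins written for this example (the real constructors take more arguments, as shown in the signature above), and ShortSummaryElement with its MAX_CHARS limit is a hypothetical subclass, not part of sec_parser.

```python
from abc import ABC


class HtmlTag:
    """Hypothetical, simplified stand-in for the library's HtmlTag."""

    def __init__(self, text: str) -> None:
        self._text = text

    @property
    def text(self) -> str:
        return self._text

    def get_text(self) -> str:
        return self._text


class AbstractSemanticElement(ABC):
    """Simplified stand-in: wraps an HtmlTag and exposes passthroughs."""

    def __init__(self, html_tag: HtmlTag) -> None:
        self._html_tag = html_tag

    @property
    def text(self) -> str:
        # Passthrough to the HtmlTag text property.
        return self._html_tag.text

    def get_summary(self) -> str:
        # Base behaviour: passthrough to HtmlTag.get_text().
        return self._html_tag.get_text()


class ShortSummaryElement(AbstractSemanticElement):
    """Hypothetical subclass that truncates long summaries."""

    MAX_CHARS = 20  # illustrative limit, not a library constant

    def get_summary(self) -> str:
        text = self.text
        if len(text) <= self.MAX_CHARS:
            return text
        return text[: self.MAX_CHARS] + "..."


tag = HtmlTag("Management's Discussion and Analysis")
base_summary = AbstractSemanticElement(tag).get_summary()
short_summary = ShortSummaryElement(tag).get_summary()
```

In this sketch the base class returns the tag text unchanged, while the subclass returns a shortened form of the same text.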

sec_parser.semantic_elements.highlighted_text_element.exceeds_capitalization_threshold(s: str, threshold: float) → bool

Calculate the percentage of capitalized letters in a given string s and compare it against threshold. Only characters that can be capitalized (alphabetic characters) are counted.
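A minimal sketch of such a check is shown below. It is not the library's implementation: the function name carries a _sketch suffix to make that clear, and three details are assumptions, namely that threshold is a percentage between 0 and 100, that the comparison is strict, and that a string without alphabetic characters yields False.

```python
def exceeds_capitalization_threshold_sketch(s: str, threshold: float) -> bool:
    """Return True if the share of uppercase letters in s exceeds threshold.

    Assumptions (not confirmed by the documentation above):
    - threshold is a percentage in the range 0..100;
    - the comparison is strict (greater than, not greater or equal);
    - a string with no alphabetic characters returns False.
    """
    # Digits, spaces, and punctuation are ignored entirely.
    letters = [ch for ch in s if ch.isalpha()]
    if not letters:
        return False
    uppercase_count = sum(1 for ch in letters if ch.isupper())
    percentage = 100.0 * uppercase_count / len(letters)
    return percentage > threshold


heading_result = exceeds_capitalization_threshold_sketch("ITEM 1A. RISK FACTORS", 80)
sentence_result = exceeds_capitalization_threshold_sketch("Risk factors", 80)
```

With these assumptions, an all-caps heading passes an 80 percent threshold regardless of the digits and punctuation it contains, while ordinary sentence-case text does not.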

class sec_parser.semantic_elements.highlighted_text_element.HighlightedTextElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, style: TextStyle | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The HighlightedTextElement class, among other uses, is an intermediate step in identifying title elements.

For example, elements with specific styles (such as bold or italic text) are first classified as HighlightedTextElements. These are later examined to determine whether they should be considered TitleElements.
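The two-step flow described above can be sketched with plain data. Everything in this example is a stand-in: Element, mark_highlighted, and promote_titles are hypothetical names, and the rule that decides which highlighted elements become titles is passed in as a parameter because the documentation does not specify it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Element:
    """Hypothetical stand-in for a semantic element."""

    text: str
    bold: bool = False
    italic: bool = False
    kind: str = "text"


def mark_highlighted(elements: list[Element]) -> list[Element]:
    """Step 1: styled elements become 'highlighted'."""
    for element in elements:
        if element.bold or element.italic:
            element.kind = "highlighted"
    return elements


def promote_titles(
    elements: list[Element], is_title: Callable[[Element], bool]
) -> list[Element]:
    """Step 2: only highlighted elements are considered for 'title'."""
    for element in elements:
        if element.kind == "highlighted" and is_title(element):
            element.kind = "title"
    return elements


elements = [
    Element("Item 1. Business", bold=True),
    Element("The company sells widgets across many markets worldwide.", italic=True),
    Element("Plain paragraph."),
]
mark_highlighted(elements)
# The length-based rule below is purely illustrative.
promote_titles(elements, is_title=lambda e: len(e.text) < 40)
kinds = [e.kind for e in elements]
```

Keeping the intermediate "highlighted" state separate means the styling check and the title decision can be changed independently of each other.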

classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, style: TextStyle | None = None) → HighlightedTextElement

Convert the semantic element into another semantic element type.

to_dict(*, include_previews: bool = False, include_contents: bool = False) → dict[str, Any]

class sec_parser.semantic_elements.highlighted_text_element.TextStyle

PERCENTAGE_THRESHOLD = 80
BOLD_THRESHOLD = 600

is_all_uppercase: bool = False
bold_with_font_weight: bool = False
italic: bool = False
centered: bool = False
underline: bool = False

__bool__() → bool

classmethod from_style_and_text(style_percentage: dict[tuple[str, str], float], text: str) → TextStyle

classmethod _is_bold_with_font_weight(key: str, value: str) → bool
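The members above carry no descriptions, so the following is only one plausible reading, written as a separate TextStyleSketch class rather than as the library's code. It assumes that style_percentage maps (CSS property, value) pairs to the percentage of the text they cover, that a flag is set when that percentage reaches PERCENTAGE_THRESHOLD, that BOLD_THRESHOLD is a numeric font-weight cutoff, and that __bool__ is true when any flag is set. The CSS property names and the all-uppercase rule are also assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TextStyleSketch:
    """Illustrative reading of TextStyle; not the library's implementation."""

    # Unannotated class attributes are constants, not dataclass fields.
    PERCENTAGE_THRESHOLD = 80
    BOLD_THRESHOLD = 600

    is_all_uppercase: bool = False
    bold_with_font_weight: bool = False
    italic: bool = False
    centered: bool = False
    underline: bool = False

    def __bool__(self) -> bool:
        # Assumed: truthy when at least one style flag is set.
        return any(getattr(self, f.name) for f in fields(self))

    @classmethod
    def from_style_and_text(
        cls, style_percentage: dict[tuple[str, str], float], text: str
    ) -> "TextStyleSketch":
        def covered(predicate) -> bool:
            # Assumed: a style counts once it covers enough of the text.
            return any(
                predicate(key, value) and pct >= cls.PERCENTAGE_THRESHOLD
                for (key, value), pct in style_percentage.items()
            )

        letters = [ch for ch in text if ch.isalpha()]
        return cls(
            is_all_uppercase=bool(letters) and all(ch.isupper() for ch in letters),
            bold_with_font_weight=covered(cls._is_bold_with_font_weight),
            italic=covered(lambda k, v: k == "font-style" and v == "italic"),
            centered=covered(lambda k, v: k == "text-align" and v == "center"),
            underline=covered(lambda k, v: k == "text-decoration" and v == "underline"),
        )

    @classmethod
    def _is_bold_with_font_weight(cls, key: str, value: str) -> bool:
        if key != "font-weight":
            return False
        if value == "bold":
            return True
        try:
            return int(value) >= cls.BOLD_THRESHOLD
        except ValueError:
            # Non-numeric weights such as "normal" are treated as not bold.
            return False


bold_style = TextStyleSketch.from_style_and_text({("font-weight", "700"): 100.0}, "Overview")
plain_style = TextStyleSketch.from_style_and_text({("font-weight", "400"): 100.0}, "Overview")
```

Under this reading, an empty or all-False style is falsy, which would let calling code write a simple `if style:` test to decide whether an element is highlighted at all.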