sec_parser.semantic_elements.highlighted_text_element
Exceptions
- SecParserValueError: Base exception class for sec_parser.
Classes
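- AbstractSemanticElement: In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose.
- HighlightedTextElement: The HighlightedTextElement class, among other uses, is an intermediate step in identifying title elements.
- TextStyle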
Functions
- exceeds_capitalization_threshold: Calculate the percentage of capitalized letters in a given string s.
Module Contents
- exception sec_parser.semantic_elements.highlighted_text_element.SecParserValueError
Bases: SecParserError, ValueError
Base exception class for sec_parser. All custom exceptions in sec_parser are inherited from this class.
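Because SecParserValueError lists both SecParserError and ValueError as bases, a handler for either one should catch it. The sketch below illustrates this with locally defined stand-in classes that mirror the documented hierarchy; it does not import sec_parser, and the parse_threshold helper is hypothetical.

```python
class SecParserError(Exception):
    """Stand-in for the library's base exception (not the real class)."""


class SecParserValueError(SecParserError, ValueError):
    """Stand-in that mirrors the documented bases."""


def parse_threshold(raw: str) -> float:
    # Hypothetical helper: accept only values in the range 0..100.
    value = float(raw)
    if not 0 <= value <= 100:
        raise SecParserValueError(f"threshold out of range: {value}")
    return value


def caught_as(exc_type: type) -> bool:
    # Report whether the out-of-range error is caught by the given handler type.
    try:
        parse_threshold("150")
    except exc_type:
        return True
    return False
```

With these stand-ins, caught_as(ValueError) and caught_as(SecParserError) both return True, so callers can choose between a library-specific handler and a generic one.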
- class sec_parser.semantic_elements.highlighted_text_element.AbstractSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: abc.ABC
In the domain of HTML parsing, especially in the context of SEC EDGAR documents, a semantic element refers to a meaningful unit within the document that serves a specific purpose. For example, a paragraph or a table might be considered a semantic element. Unlike syntactic elements, which merely exist to structure the HTML, semantic elements carry information that is vital to the understanding of the document’s content.
This class serves as a foundational representation of such semantic elements, containing an HtmlTag object that stores the raw HTML tag information. Subclasses will implement additional behaviors based on the type of the semantic element.
- log_init(log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None) -> None
Must be called at the very end of the __init__ method.
- property html_tag: sec_parser.processing_engine.html_tag.HtmlTag
- classmethod create_from_element(source: AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin) -> AbstractSemanticElement
Convert the semantic element into another semantic element type.
- to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]
- __repr__() -> str
Return repr(self).
- contains_words() -> bool
Return True if the semantic element contains text.
- property text: str
Property text is a passthrough to the HtmlTag text property.
- get_source_code(*, pretty: bool = False, enable_compatibility: bool = False) -> str
get_source_code is a passthrough to the HtmlTag method.
- get_summary() -> str
Return a human-readable summary of the semantic element.
This method aims to provide a simplified, human-friendly representation of the underlying HtmlTag. In this base implementation, it is a passthrough to the HtmlTag’s get_text() method.
Note: Subclasses may override this method to provide a more specific summary based on the type of element.
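The passthrough and conversion behavior described above can be sketched as follows. This is a minimal illustration, not the library's implementation: FakeHtmlTag, SketchElement, PlainElement and TitleLikeElement are hypothetical stand-ins, the log_origin and processing_log parameters are omitted, and the contains_words rule is an assumption.

```python
from __future__ import annotations

from abc import ABC


class FakeHtmlTag:
    """Hypothetical stand-in for HtmlTag with only what the sketch needs."""

    def __init__(self, text: str) -> None:
        self.text = text

    def get_text(self) -> str:
        return self.text


class SketchElement(ABC):
    """Wraps a tag and forwards text access to it."""

    def __init__(self, html_tag: FakeHtmlTag) -> None:
        self._html_tag = html_tag

    @property
    def html_tag(self) -> FakeHtmlTag:
        return self._html_tag

    @property
    def text(self) -> str:
        # Passthrough to the wrapped tag's text.
        return self._html_tag.text

    def contains_words(self) -> bool:
        # Assumption: "contains text" means at least one alphabetic character.
        return any(ch.isalpha() for ch in self.text)

    def get_summary(self) -> str:
        # Base behavior: forward to the tag's get_text().
        return self._html_tag.get_text()

    @classmethod
    def create_from_element(cls, source: SketchElement) -> SketchElement:
        # Re-wrap the same tag in the target element type.
        return cls(source.html_tag)


class PlainElement(SketchElement):
    pass


class TitleLikeElement(SketchElement):
    def get_summary(self) -> str:
        # Subclass override providing a more specific summary.
        return f"[title] {self.text}"
```

Calling TitleLikeElement.create_from_element(plain) on a PlainElement yields a TitleLikeElement that shares the same tag object, so only the element type and its summary change.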
- sec_parser.semantic_elements.highlighted_text_element.exceeds_capitalization_threshold(s: str, threshold: float) -> bool
Calculate the percentage of capitalized letters in a given string s and report whether it exceeds threshold. Only counts characters that can be capitalized (alphabetic characters).
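One way to read this description is sketched below. The function name capitalization_exceeds is hypothetical, and three details are assumptions rather than documented behavior: the threshold is expressed in percent (0 to 100), the comparison is strict, and a string with no alphabetic characters returns False.

```python
def capitalization_exceeds(s: str, threshold: float) -> bool:
    # Only alphabetic characters can be capitalized, so ignore everything else.
    letters = [ch for ch in s if ch.isalpha()]
    if not letters:
        # Assumption: no letters means nothing can exceed the threshold.
        return False
    upper = sum(1 for ch in letters if ch.isupper())
    # Assumption: threshold is a percentage and the comparison is strict.
    return upper * 100 / len(letters) > threshold
```

Under these assumptions, capitalization_exceeds("RISK FACTORS", 80) is True, while capitalization_exceeds("Risk Factors", 80) is False because only 2 of its 11 letters are uppercase.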
- class sec_parser.semantic_elements.highlighted_text_element.HighlightedTextElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, style: TextStyle | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The HighlightedTextElement class, among other uses, is an intermediate step in identifying title elements.
For example:
First, elements with specific styles (like bold or italic text) are classified as HighlightedTextElements. These are later examined to determine if they should be considered TitleElements.
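The two-stage flow described above can be sketched as follows. Everything here is hypothetical: Element, mark_highlighted and promote_titles are stand-ins, and the promotion rule (a highlighted element that does not end with a period becomes a title) is invented purely for illustration, since this page does not say how titles are actually determined.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a semantic element."""

    text: str
    bold: bool = False
    kind: str = "text"


def mark_highlighted(elements: list[Element]) -> list[Element]:
    # Stage 1: styled elements become "highlighted".
    return [replace(e, kind="highlighted") if e.bold else e for e in elements]


def promote_titles(elements: list[Element]) -> list[Element]:
    # Stage 2 (invented rule): highlighted text without a trailing
    # period is treated as a title; other elements are left unchanged.
    return [
        replace(e, kind="title")
        if e.kind == "highlighted" and not e.text.rstrip().endswith(".")
        else e
        for e in elements
    ]
```

Keeping the stages separate means the second pass only has to inspect elements that the first pass already flagged.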
- classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, style: TextStyle | None = None) -> HighlightedTextElement
Convert the semantic element into another semantic element type.
- to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]
- class sec_parser.semantic_elements.highlighted_text_element.TextStyle
- PERCENTAGE_THRESHOLD = 80
- BOLD_THRESHOLD = 600
- is_all_uppercase: bool = False
- bold_with_font_weight: bool = False
- italic: bool = False
- centered: bool = False
- underline: bool = False
- __bool__() -> bool
- classmethod from_style_and_text(style_percentage: dict[tuple[str, str], float], text: str) -> TextStyle
- classmethod _is_bold_with_font_weight(key: str, value: str) -> bool
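The attributes and thresholds listed above suggest how a style summary might be derived. The following is a sketch under stated assumptions, not the library's code: TextStyleSketch is a hypothetical stand-in, style_percentage is assumed to map (CSS property, value) pairs to the percentage of text they cover, a style counts only when it covers at least PERCENTAGE_THRESHOLD percent, and the CSS property names checked here are guesses.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TextStyleSketch:
    # Unannotated names are plain class attributes, not dataclass fields.
    PERCENTAGE_THRESHOLD = 80
    BOLD_THRESHOLD = 600

    is_all_uppercase: bool = False
    bold_with_font_weight: bool = False
    italic: bool = False
    centered: bool = False
    underline: bool = False

    def __bool__(self) -> bool:
        # Truthy when at least one style flag is set.
        return any(asdict(self).values())

    @classmethod
    def from_style_and_text(
        cls, style_percentage: dict[tuple[str, str], float], text: str
    ) -> TextStyleSketch:
        # Assumption: keep only styles covering enough of the text.
        dominant = {
            key
            for key, pct in style_percentage.items()
            if pct >= cls.PERCENTAGE_THRESHOLD
        }
        letters = [ch for ch in text if ch.isalpha()]
        return cls(
            is_all_uppercase=bool(letters) and all(ch.isupper() for ch in letters),
            bold_with_font_weight=any(
                cls._is_bold_with_font_weight(k, v) for k, v in dominant
            ),
            italic=("font-style", "italic") in dominant,
            centered=("text-align", "center") in dominant,
            underline=("text-decoration", "underline") in dominant,
        )

    @classmethod
    def _is_bold_with_font_weight(cls, key: str, value: str) -> bool:
        # Assumption: "bold" or a numeric weight at or above BOLD_THRESHOLD.
        if key != "font-weight":
            return False
        if value == "bold":
            return True
        return value.isdigit() and int(value) >= cls.BOLD_THRESHOLD
```

In this sketch an instance with no flags set is falsy, which would let calling code write `if style:` to skip unstyled text.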