sec_parser.semantic_elements.semantic_elements
Classes
- The NotYetClassifiedElement class represents an element whose type has not yet been determined.
- The ErrorWhileProcessingElement class represents an element that could not be processed due to an error.
- The IrrelevantElement class identifies elements in the parsed HTML that do not contribute to the content.
- The PageNumberElement class represents a page number within a document.
- The PageHeaderElement class represents a page header within a document.
- The EmptyElement class represents an HTML element that does not contain any content.
- The IntroductorySectionElement class represents elements that are part of the introductory sections of a document.
- The TextElement class represents a standard text paragraph within a document.
- The SupplementaryText class captures various types of supplementary text within a document.
- The ImageElement class represents a standard image within a document.
Module Contents
- class sec_parser.semantic_elements.semantic_elements.NotYetClassifiedElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The NotYetClassifiedElement class represents an element whose type has not yet been determined. The parsing process aims to classify all instances of this class into more specific subclasses of AbstractSemanticElement.
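The classification idea described above can be illustrated with a small, self-contained sketch. The classes below are hypothetical stand-ins that only borrow the documented names; they are not the real sec_parser classes (which wrap an HtmlTag rather than a plain string), and the `classify` function with its two toy rules is an assumption made for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-ins that mirror the documented names only.
@dataclass
class AbstractSemanticElement:
    text: str  # the real classes hold an HtmlTag; a string keeps the sketch small


class NotYetClassifiedElement(AbstractSemanticElement):
    pass


class IrrelevantElement(AbstractSemanticElement):
    pass


class EmptyElement(IrrelevantElement):
    pass


class TextElement(AbstractSemanticElement):
    pass


def classify(element: AbstractSemanticElement) -> AbstractSemanticElement:
    """Replace an unclassified element with a more specific type (toy rules)."""
    if not isinstance(element, NotYetClassifiedElement):
        return element  # already classified, leave it untouched
    if not element.text.strip():
        return EmptyElement(element.text)
    return TextElement(element.text)


elements = [NotYetClassifiedElement("Revenue grew."), NotYetClassifiedElement("   ")]
classified = [classify(e) for e in elements]
print([type(e).__name__ for e in classified])  # ['TextElement', 'EmptyElement']
```

In this sketch an element that is already classified passes through unchanged, so running the step twice is harmless.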
- class sec_parser.semantic_elements.semantic_elements.ErrorWhileProcessingElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, error: Exception, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The ErrorWhileProcessingElement class represents an element that could not be processed due to an error. This class is used to handle exceptions and errors during the parsing process.
- error
- classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, error: Exception | None = None) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
Convert the semantic element into another semantic element type.
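As a rough illustration of how an error element might be produced, the sketch below wraps a failing processing step and converts the source element into an error element that carries the exception. Everything here is a simplified stand-in: the classes only imitate the documented signature, `log_origin` is a plain string rather than a LogItemOrigin, and `run_step` and `failing_step` are hypothetical helpers that do not exist in sec_parser.

```python
from typing import Callable, Optional


# Hypothetical stand-ins, not the real sec_parser classes.
class AbstractSemanticElement:
    def __init__(self, html_tag: str) -> None:
        self.html_tag = html_tag  # the real class stores an HtmlTag object


class ErrorWhileProcessingElement(AbstractSemanticElement):
    def __init__(self, html_tag: str, error: Optional[Exception]) -> None:
        super().__init__(html_tag)
        self.error = error

    @classmethod
    def create_from_element(
        cls,
        source: AbstractSemanticElement,
        log_origin: str,  # assumption: a string instead of LogItemOrigin
        *,
        error: Optional[Exception] = None,
    ) -> "ErrorWhileProcessingElement":
        # Reuse the source's tag and attach the exception that occurred.
        return cls(source.html_tag, error)


def run_step(
    element: AbstractSemanticElement,
    step: Callable[[AbstractSemanticElement], AbstractSemanticElement],
) -> AbstractSemanticElement:
    """Apply a step; on failure, return an error element instead of raising."""
    try:
        return step(element)
    except Exception as exc:
        return ErrorWhileProcessingElement.create_from_element(
            element, log_origin="run_step", error=exc
        )


def failing_step(element: AbstractSemanticElement) -> AbstractSemanticElement:
    raise ValueError("unexpected markup")


result = run_step(AbstractSemanticElement("<p>text</p>"), failing_step)
print(type(result).__name__, "-", result.error)
```

Keeping the failure as an element, rather than letting the exception propagate, lets the remaining elements of a document continue through the pipeline in this sketch.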
- class sec_parser.semantic_elements.semantic_elements.IrrelevantElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The IrrelevantElement class identifies elements in the parsed HTML that do not contribute to the content. These elements often include page separators, page numbers, and other non-content items. For instance, tags without content, such as <p></p> or <div></div>, are deemed irrelevant; documents often use them just to add vertical space.
- class sec_parser.semantic_elements.semantic_elements.PageNumberElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: IrrelevantElement
The PageNumberElement class represents a page number within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page numbers in the document.
- class sec_parser.semantic_elements.semantic_elements.PageHeaderElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: IrrelevantElement
The PageHeaderElement class represents a page header within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page headers in the document, such as current section titles and company names.
- class sec_parser.semantic_elements.semantic_elements.EmptyElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: IrrelevantElement
The EmptyElement class represents an HTML element that does not contain any content. It is a subclass of the IrrelevantElement class and is used to identify and handle empty HTML tags in the document.
- class sec_parser.semantic_elements.semantic_elements.IntroductorySectionElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: IrrelevantElement
The IntroductorySectionElement class represents elements that are part of the introductory sections of a document, such as the title page, disclaimers, or other preliminary information that precedes the main content. It is a subclass of the IrrelevantElement class, as these introductory sections are typically not part of the core financial data to be extracted.
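Because the four classes above all derive from IrrelevantElement, a single isinstance check against the base class is enough to drop them in one pass. The following sketch shows that pattern with hypothetical stand-in classes that reuse the documented names and hierarchy; the `drop_irrelevant` helper and the plain-string `text` attribute are assumptions, not part of sec_parser.

```python
from typing import List


# Hypothetical stand-ins that reproduce only the documented hierarchy.
class AbstractSemanticElement:
    def __init__(self, text: str) -> None:
        self.text = text


class IrrelevantElement(AbstractSemanticElement):
    pass


class PageNumberElement(IrrelevantElement):
    pass


class PageHeaderElement(IrrelevantElement):
    pass


class EmptyElement(IrrelevantElement):
    pass


class IntroductorySectionElement(IrrelevantElement):
    pass


class TextElement(AbstractSemanticElement):
    pass


def drop_irrelevant(
    elements: List[AbstractSemanticElement],
) -> List[AbstractSemanticElement]:
    """Keep only elements that are not IrrelevantElement or a subclass of it."""
    return [e for e in elements if not isinstance(e, IrrelevantElement)]


parsed = [
    PageHeaderElement("ACME Corp"),
    TextElement("Net sales increased."),
    PageNumberElement("12"),
    EmptyElement(""),
]
kept = drop_irrelevant(parsed)
print([e.text for e in kept])  # ['Net sales increased.']
```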
- class sec_parser.semantic_elements.semantic_elements.TextElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The TextElement class represents a standard text paragraph within a document.
- class sec_parser.semantic_elements.semantic_elements.SupplementaryText(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The SupplementaryText class captures various types of supplementary text within a document, such as unit qualifiers, additional notes, and disclaimers.
For example:
- “(In millions, except number of shares which are reflected in thousands and per share amounts)”
- “See accompanying Notes to Condensed Consolidated Financial Statements.”
- “Disclaimer: This is not financial advice.”
- class sec_parser.semantic_elements.semantic_elements.ImageElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The ImageElement class represents a standard image within a document.