sec_parser.semantic_elements.semantic_elements

Classes

NotYetClassifiedElement

The NotYetClassifiedElement class represents an element whose type has not yet been determined.

ErrorWhileProcessingElement

The ErrorWhileProcessingElement class represents an element that could not be processed due to an error.

IrrelevantElement

The IrrelevantElement class identifies elements in the parsed HTML that do not contribute to the content.

PageNumberElement

The PageNumberElement class represents a page number within a document.

PageHeaderElement

The PageHeaderElement class represents a page header within a document.

EmptyElement

The EmptyElement class represents an HTML element that does not contain any content.

IntroductorySectionElement

The IntroductorySectionElement class represents elements that are part of the introductory sections of a document.

TextElement

The TextElement class represents a standard text paragraph within a document.

SupplementaryText

The SupplementaryText class captures various types of supplementary text within a document.

ImageElement

The ImageElement class represents a standard image within a document.

Module Contents

class sec_parser.semantic_elements.semantic_elements.NotYetClassifiedElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The NotYetClassifiedElement class represents an element whose type has not yet been determined. The parsing process aims to classify all instances of this class into more specific subclasses of AbstractSemanticElement.
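The classification idea described above can be sketched without the library installed. The classes below are simplified stand-ins that only borrow the documented names, and the `classify` function with its "blank text means empty" rule is a hypothetical illustration, not sec_parser's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Stand-in for AbstractSemanticElement; it holds only raw text here."""
    text: str


class NotYetClassifiedElement(Element):
    """Stand-in: an element whose type is still unknown."""


class EmptyElement(Element):
    """Stand-in: an element without content."""


class TextElement(Element):
    """Stand-in: a regular text paragraph."""


def classify(element: Element) -> Element:
    """Hypothetical rule: blank text becomes EmptyElement, anything else TextElement."""
    if not isinstance(element, NotYetClassifiedElement):
        return element  # already classified, leave it untouched
    target = TextElement if element.text.strip() else EmptyElement
    return target(element.text)


elements = [NotYetClassifiedElement("Revenue grew."), NotYetClassifiedElement("  ")]
classified = [classify(e) for e in elements]

# Ideally nothing remains unclassified once every step has run.
unclassified_left = [e for e in classified if type(e) is NotYetClassifiedElement]
```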

class sec_parser.semantic_elements.semantic_elements.ErrorWhileProcessingElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, error: Exception, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The ErrorWhileProcessingElement class represents an element that could not be processed due to an error. This class is used to handle exceptions and errors during the parsing process.

error

classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, error: Exception | None = None) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

Convert the semantic element into another semantic element type.
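A minimal sketch of this conversion pattern, using stand-in classes rather than the real sec_parser types. The plain-list processing log, the string `log_origin`, and the `ValueError` raised when no error is supplied are assumptions made for illustration; only the method name and its parameters come from the signature above.

```python
from __future__ import annotations


class Element:
    """Stand-in for AbstractSemanticElement (not the real class)."""

    def __init__(self, html_tag: str, *, processing_log: list | None = None,
                 log_origin: str | None = None) -> None:
        self.html_tag = html_tag
        # Assumption: the log is a plain list of (origin, class name) pairs.
        self.processing_log = processing_log if processing_log is not None else []
        if log_origin is not None:
            self.processing_log.append((log_origin, type(self).__name__))

    @classmethod
    def create_from_element(cls, source: Element, log_origin: str, **kwargs) -> Element:
        # Reuse the source's tag and log so the conversion history is preserved.
        return cls(source.html_tag, processing_log=source.processing_log,
                   log_origin=log_origin, **kwargs)


class ErrorElement(Element):
    """Stand-in for ErrorWhileProcessingElement."""

    def __init__(self, html_tag: str, error: Exception, *,
                 processing_log: list | None = None,
                 log_origin: str | None = None) -> None:
        super().__init__(html_tag, processing_log=processing_log, log_origin=log_origin)
        self.error = error

    @classmethod
    def create_from_element(cls, source: Element, log_origin: str, *,
                            error: Exception | None = None) -> Element:
        if error is None:
            # Assumption: an error element without an error is rejected.
            raise ValueError("error must be provided")
        return cls(source.html_tag, error,
                   processing_log=source.processing_log, log_origin=log_origin)


source = Element("<p>text</p>", log_origin="parser")
failed = ErrorElement.create_from_element(source, "step_x", error=RuntimeError("boom"))
```

In this sketch the converted element shares the source's log, so `failed.processing_log` records both the original creation and the conversion.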

class sec_parser.semantic_elements.semantic_elements.IrrelevantElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The IrrelevantElement class identifies elements in the parsed HTML that do not contribute to the content. These elements often include page separators, page numbers, and other non-content items. For instance, empty HTML tags such as <p></p> or <div></div>, which documents often use only to add vertical space, are deemed irrelevant.

class sec_parser.semantic_elements.semantic_elements.PageNumberElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: IrrelevantElement

The PageNumberElement class represents a page number within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page numbers in the document.

class sec_parser.semantic_elements.semantic_elements.PageHeaderElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: IrrelevantElement

The PageHeaderElement class represents a page header within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page headers in the document, such as current section titles and company names.

class sec_parser.semantic_elements.semantic_elements.EmptyElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: IrrelevantElement

The EmptyElement class represents an HTML element that does not contain any content. It is a subclass of the IrrelevantElement class and is used to identify and handle empty HTML tags in the document.

class sec_parser.semantic_elements.semantic_elements.IntroductorySectionElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: IrrelevantElement

The IntroductorySectionElement class represents elements that are part of the introductory sections of a document, such as the title page, disclaimers, or other preliminary information that precedes the main content. It is a subclass of the IrrelevantElement class, as these introductory sections are typically not part of the core financial data to be extracted.
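Because PageNumberElement, PageHeaderElement, EmptyElement, and IntroductorySectionElement all derive from IrrelevantElement, a single `isinstance` check against the base class is enough to drop them. The sketch below uses bare stand-in classes that mirror the documented hierarchy; the constructors and the sample list are illustrative assumptions, not the real sec_parser objects.

```python
class AbstractSemanticElement:
    """Stand-in base class; the real one wraps an HtmlTag."""

    def __init__(self, text: str = "") -> None:
        self.text = text


class IrrelevantElement(AbstractSemanticElement):
    pass


class PageNumberElement(IrrelevantElement):
    pass


class PageHeaderElement(IrrelevantElement):
    pass


class EmptyElement(IrrelevantElement):
    pass


class IntroductorySectionElement(IrrelevantElement):
    pass


class TextElement(AbstractSemanticElement):
    pass


# Hypothetical parse result mixing content and non-content elements.
elements = [
    IntroductorySectionElement("FORM 10-Q"),
    PageHeaderElement("Example Corp."),
    TextElement("Net sales increased."),
    EmptyElement(),
    PageNumberElement("12"),
    TextElement("Gross margin was stable."),
]

# One check against the base class filters out every irrelevant subclass.
content = [e for e in elements if not isinstance(e, IrrelevantElement)]
```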

class sec_parser.semantic_elements.semantic_elements.TextElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The TextElement class represents a standard text paragraph within a document.

class sec_parser.semantic_elements.semantic_elements.SupplementaryText(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The SupplementaryText class captures various types of supplementary text within a document, such as unit qualifiers, additional notes, and disclaimers.

For example:

  • “(In millions, except number of shares which are reflected in thousands and per share amounts)”

  • “See accompanying Notes to Condensed Consolidated Financial Statements.”

  • “Disclaimer: This is not financial advice.”

class sec_parser.semantic_elements.semantic_elements.ImageElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The ImageElement class represents a standard image within a document.