sec_parser.semantic_elements.semantic_elements

Classes

`NotYetClassifiedElement`	The NotYetClassifiedElement class represents an element whose type has not yet been determined.
`ErrorWhileProcessingElement`	The ErrorWhileProcessingElement class represents an element that could not be processed due to an error.
`IrrelevantElement`	The IrrelevantElement class identifies elements in the parsed HTML that do not contribute to the content.
`PageNumberElement`	The PageNumberElement class represents a page number within a document.
`PageHeaderElement`	The PageHeaderElement class represents a page header within a document.
`EmptyElement`	The EmptyElement class represents an HTML element that does not contain any content.
`IntroductorySectionElement`	The IntroductorySectionElement class represents elements that are part of the introductory sections of a document, such as the title page or disclaimers.
`TextElement`	The TextElement class represents a standard text paragraph within a document.
`SupplementaryText`	The SupplementaryText class captures various types of supplementary text within a document, such as unit qualifiers, additional notes, and disclaimers.
`ImageElement`	The ImageElement class represents a standard image within a document.

Module Contents

class sec_parser.semantic_elements.semantic_elements.NotYetClassifiedElement

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The NotYetClassifiedElement class represents an element whose type has not yet been determined. The parsing process aims to classify all instances of this class into more specific subclasses of AbstractSemanticElement.
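The classification flow described above can be pictured with a minimal, self-contained sketch. The classes below are simplified stand-ins that only borrow the documented names; the real classes are built around an HtmlTag rather than a string, and the `classify` function and its rules are hypothetical, not the library's actual processing steps.

```python
from dataclasses import dataclass


@dataclass
class AbstractSemanticElement:
    """Simplified stand-in: the real class wraps an HtmlTag, not a string."""
    text: str


class NotYetClassifiedElement(AbstractSemanticElement):
    """Element whose type has not been determined yet."""


class EmptyElement(AbstractSemanticElement):
    """Element without any content."""


class TextElement(AbstractSemanticElement):
    """Ordinary text paragraph."""


def classify(element: AbstractSemanticElement) -> AbstractSemanticElement:
    """Hypothetical rule: swap an unclassified element for a specific one."""
    if not isinstance(element, NotYetClassifiedElement):
        return element  # already classified, leave untouched
    if not element.text.strip():
        return EmptyElement(element.text)
    return TextElement(element.text)


raw = [NotYetClassifiedElement("Revenue grew."), NotYetClassifiedElement("  ")]
classified = [classify(e) for e in raw]

# Anything still of this exact type was not matched by any rule.
remaining = [e for e in classified if type(e) is NotYetClassifiedElement]
```

In this sketch, an empty `remaining` list signals that every element was assigned a more specific type.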

class sec_parser.semantic_elements.semantic_elements.ErrorWhileProcessingElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, error: Exception, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The ErrorWhileProcessingElement class represents an element that could not be processed due to an error. This class is used to handle exceptions and errors during the parsing process.

error

classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, error: Exception | None = None) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
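As a rough illustration of the pattern this class supports, the sketch below converts a failing element into an error element instead of aborting the whole run. It is a simplified, self-contained model and an assumption about usage, not the library's code: `Element` and `ErrorElement` are stand-ins, the stand-in `create_from_element` omits the `log_origin` argument and the processing log that the real signature includes, and `risky_step` is a hypothetical processing step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    """Stand-in for AbstractSemanticElement (the real one wraps an HtmlTag)."""
    html: str


@dataclass
class ErrorElement(Element):
    """Stand-in for ErrorWhileProcessingElement; keeps the triggering error."""
    error: Exception | None = None

    @classmethod
    def create_from_element(
        cls, source: Element, *, error: Exception | None = None
    ) -> ErrorElement:
        # Reuse the source's content and attach the exception to it.
        return cls(source.html, error=error)


def risky_step(element: Element) -> Element:
    """Hypothetical processing step that fails on table markup."""
    if "<table" in element.html:
        raise ValueError("cannot parse table")
    return element


def process(elements: list[Element]) -> list[Element]:
    result: list[Element] = []
    for element in elements:
        try:
            result.append(risky_step(element))
        except Exception as exc:
            # Record the failure in place and continue with the next element.
            result.append(ErrorElement.create_from_element(element, error=exc))
    return result


processed = process([Element("<p>ok</p>"), Element("<table></table>")])
failures = [e for e in processed if isinstance(e, ErrorElement)]
```

Keeping the failed element in its original position means the output list stays aligned with the input, and the stored `error` can be inspected afterwards.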

class sec_parser.semantic_elements.semantic_elements.IrrelevantElement

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The IrrelevantElement class identifies elements in the parsed HTML that do not contribute to the content. These elements often include page separators, page numbers, and other non-content items. For instance, empty HTML tags such as <p></p> or <div></div>, which documents often use only to add vertical space, are deemed irrelevant.

class sec_parser.semantic_elements.semantic_elements.PageNumberElement

Bases: IrrelevantElement

The PageNumberElement class represents a page number within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page numbers in the document.

class sec_parser.semantic_elements.semantic_elements.PageHeaderElement

Bases: IrrelevantElement

The PageHeaderElement class represents a page header within a document. It is a subclass of the IrrelevantElement class and is used to identify and handle page headers in the document, such as current section titles and company names.

class sec_parser.semantic_elements.semantic_elements.EmptyElement

Bases: IrrelevantElement

The EmptyElement class represents an HTML element that does not contain any content. It is a subclass of the IrrelevantElement class and is used to identify and handle empty HTML tags in the document.

class sec_parser.semantic_elements.semantic_elements.IntroductorySectionElement

Bases: IrrelevantElement

The IntroductorySectionElement class represents elements that are part of the introductory sections of a document, such as the title page, disclaimers, or other preliminary information that precedes the main content. It is a subclass of the IrrelevantElement class because these introductory sections are typically not part of the core financial data to be extracted.
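Because PageNumberElement, PageHeaderElement, EmptyElement, and IntroductorySectionElement all derive from IrrelevantElement, a single `isinstance` check against the base class covers all of them. The sketch below shows this with bare stand-in classes that mirror only the documented inheritance; the `drop_irrelevant` helper is hypothetical and not part of the library.

```python
class AbstractSemanticElement:
    """Stand-in base class; the real one carries an HtmlTag."""


class IrrelevantElement(AbstractSemanticElement):
    pass


class PageNumberElement(IrrelevantElement):
    pass


class PageHeaderElement(IrrelevantElement):
    pass


class EmptyElement(IrrelevantElement):
    pass


class IntroductorySectionElement(IrrelevantElement):
    pass


class TextElement(AbstractSemanticElement):
    pass


def drop_irrelevant(elements):
    """Keep only elements that are not IrrelevantElement or a subclass of it."""
    return [e for e in elements if not isinstance(e, IrrelevantElement)]


elements = [
    PageHeaderElement(),
    TextElement(),
    EmptyElement(),
    PageNumberElement(),
    IntroductorySectionElement(),
    TextElement(),
]
kept = drop_irrelevant(elements)
```

The same check should carry over to the real classes, assuming the parsed elements are available as a flat list.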

class sec_parser.semantic_elements.semantic_elements.TextElement

Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The TextElement class represents a standard text paragraph within a document.

class sec_parser.semantic_elements.semantic_elements.SupplementaryText

Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The SupplementaryText class captures various types of supplementary text within a document, such as unit qualifiers, additional notes, and disclaimers.

For example:

- “(In millions, except number of shares which are reflected in thousands and per share amounts)”
- “See accompanying Notes to Condensed Consolidated Financial Statements.”
- “Disclaimer: This is not financial advice.”

class sec_parser.semantic_elements.semantic_elements.ImageElement

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

The ImageElement class represents a standard image within a document.