sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step
Attributes
MODULE_LOGGER_NAME
Exceptions
SecParserError | Base exception class for sec_parser.
Classes
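AbstractProcessingStep | AbstractProcessingStep class for transforming a list of elements.
ElementProcessingContext | The ElementProcessingContext class is designed to provide context information for elementwise processing steps.
CompositeSemanticElement | CompositeSemanticElement acts as a container for other semantic elements, especially for cases where a single HTML root tag wraps multiple elements.
ErrorWhileProcessingElement | The ErrorWhileProcessingElement class represents an element that could not be processed due to an error.
AbstractElementwiseProcessingStep | AbstractElementwiseProcessingStep class is used to iterate over all Semantic Elements with or without applying transformations.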
Module Contents
- exception sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.SecParserError
Bases: Exception
Base exception class for sec_parser. All custom exceptions in sec_parser inherit from this class.
- class sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.AbstractProcessingStep
Bases: abc.ABC
AbstractProcessingStep class for transforming a list of elements. Chaining multiple steps together allows for complex transformations while keeping the code modular.
Each instance of a step is designed to be used for a single transformation operation. This ensures that any internal state maintained during a transformation is isolated to the processing of a single document.
- process(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
Transform the list of semantic elements.
Note: The elements argument may be mutated in place for performance reasons, so callers should not rely on it being unchanged after the call.
- abstract _process(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
Implement the actual transformation logic in child classes.
This method is intended to be overridden by child classes to provide specific transformation logic.
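The relationship between process and _process can be sketched as a template method. The block below is a minimal, self-contained illustration and not sec_parser's implementation: Element, StepSketch, UppercaseStep, run_pipeline, and the single-use guard are hypothetical stand-ins chosen to mirror the description above.

```python
from abc import ABC, abstractmethod


class Element:
    """Hypothetical stand-in for AbstractSemanticElement."""

    def __init__(self, text: str) -> None:
        self.text = text


class StepSketch(ABC):
    """Hypothetical stand-in for AbstractProcessingStep."""

    def __init__(self) -> None:
        self._already_used = False

    def process(self, elements: list[Element]) -> list[Element]:
        # Assumption: one instance handles one document, so a second
        # call on the same instance is rejected.
        if self._already_used:
            raise RuntimeError("This step instance has already been used.")
        self._already_used = True
        return self._process(elements)

    @abstractmethod
    def _process(self, elements: list[Element]) -> list[Element]:
        """Child classes put their transformation logic here."""


class UppercaseStep(StepSketch):
    """Hypothetical concrete step used only for illustration."""

    def _process(self, elements: list[Element]) -> list[Element]:
        # Mutates the elements in place, which the note above warns may happen.
        for element in elements:
            element.text = element.text.upper()
        return elements


def run_pipeline(steps: list[StepSketch], elements: list[Element]) -> list[Element]:
    # Chain steps: the output of one step is the input of the next.
    for step in steps:
        elements = step.process(elements)
    return elements
```

Under these assumptions, a fresh set of step instances would be created for every document, so any state a step accumulates never leaks between documents.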
- class sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext
The ElementProcessingContext class is designed to provide context information for elementwise processing steps.
- iteration: int
- class sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.CompositeSemanticElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, inner_elements: tuple[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, ...] | None, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
CompositeSemanticElement acts as a container for other semantic elements, especially for cases where a single HTML root tag wraps multiple elements. This preserves structural integrity and enables features such as semantic segmentation visualization and debugging by comparison with the original document.
Why is this useful:
1. Some semantic elements, like XBRL tags (<ix>), may wrap multiple semantic elements. The container ensures that these relationships are not broken during parsing.
2. Enables the parser to fully reconstruct the original HTML document, which opens up possibilities for features like semantic segmentation visualization (e.g. recreate the original document but put semi-transparent colored boxes on top, based on semantic meaning), serialization of parsed documents into an augmented HTML, and debugging by comparing to the original document.
- property inner_elements: tuple[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, ...]
- classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, inner_elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement] | None = None) → CompositeSemanticElement
Convert the semantic element into another semantic element type.
- to_dict(*, include_previews: bool = False, include_contents: bool = False) → dict[str, Any]
- classmethod unwrap_elements(elements: collections.abc.Iterable[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], *, include_containers: bool | None = None) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
Recursively flatten a list of AbstractSemanticElement objects. For each CompositeSemanticElement encountered, its inner_elements are also recursively flattened. The ‘include_containers’ parameter controls whether the CompositeSemanticElement itself is included in the flattened list.
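The recursive flattening described above can be sketched as follows. This is an illustration only: Leaf, Composite, and unwrap are hypothetical stand-ins, and both the default of include_containers=False and the choice to emit a container before its children are assumptions, not behavior confirmed by this page.

```python
from __future__ import annotations

from collections.abc import Iterable


class Leaf:
    """Hypothetical stand-in for a plain semantic element."""

    def __init__(self, name: str) -> None:
        self.name = name


class Composite(Leaf):
    """Hypothetical stand-in for CompositeSemanticElement."""

    def __init__(self, name: str, inner_elements: tuple[Leaf, ...]) -> None:
        super().__init__(name)
        self.inner_elements = inner_elements


def unwrap(elements: Iterable[Leaf], *, include_containers: bool = False) -> list[Leaf]:
    flat: list[Leaf] = []
    for element in elements:
        if isinstance(element, Composite):
            # Assumption: a container, when kept, precedes its children.
            if include_containers:
                flat.append(element)
            # Recurse so that nested containers are flattened as well.
            flat.extend(
                unwrap(element.inner_elements, include_containers=include_containers)
            )
        else:
            flat.append(element)
    return flat
```

In this sketch, a container nested inside another container is flattened the same way as a top-level one, so the result never contains unexpanded children.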
- class sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ErrorWhileProcessingElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, error: Exception, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)
Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
The ErrorWhileProcessingElement class represents an element that could not be processed due to an error. This class is used to handle exceptions and errors during the parsing process.
- classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, error: Exception | None = None) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
Convert the semantic element into another semantic element type.
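One plausible way to use an error element is to swap it in for any element whose processing raised, so that a single failure does not abort the whole document. The sketch below shows that idea with hypothetical stand-ins (Element, ErrorElement, parse_number, process_safely); it omits log_origin and does not claim to reproduce how sec_parser itself catches errors.

```python
from typing import Callable


class Element:
    """Hypothetical stand-in for AbstractSemanticElement."""

    def __init__(self, text: str) -> None:
        self.text = text


class ErrorElement(Element):
    """Hypothetical stand-in for ErrorWhileProcessingElement."""

    def __init__(self, text: str, error: Exception) -> None:
        super().__init__(text)
        self.error = error

    @classmethod
    def create_from_element(cls, source: Element, *, error: Exception) -> "ErrorElement":
        # Keep the source content and attach the exception that occurred.
        return cls(source.text, error)


def parse_number(element: Element) -> Element:
    # Hypothetical per-element transformation that can fail.
    element.value = float(element.text)
    return element


def process_safely(
    elements: list[Element], transform: Callable[[Element], Element]
) -> list[Element]:
    results: list[Element] = []
    for element in elements:
        try:
            results.append(transform(element))
        except Exception as exc:
            # Replace the failing element instead of aborting the run.
            results.append(ErrorElement.create_from_element(element, error=exc))
    return results
```

With this approach the output list keeps the same length and order as the input, and callers can find failures by filtering for the error type.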
- sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.MODULE_LOGGER_NAME
- class sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.AbstractElementwiseProcessingStep(*, types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)
Bases: sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep
AbstractElementwiseProcessingStep class is used to iterate over all Semantic Elements with or without applying transformations.
- _NUM_ITERATIONS = 1
- abstract _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) → sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement
_process_element method is responsible for transforming a single semantic element into another.
It can also be utilized to simply iterate over all elements without applying any transformations.
- _process_recursively(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], *, _context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) → list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]
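How the pieces of an elementwise step might fit together can be sketched as below. Everything here is a hypothetical stand-in written for illustration (Element, TitleElement, Composite, Context, ElementwiseStepSketch, PromoteShoutingToTitle). The filtering rules, the descent into containers, and the use of _NUM_ITERATIONS to drive repeated passes are assumptions inferred from the signatures above, not the library's confirmed behavior.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


class Element:
    def __init__(self, text: str) -> None:
        self.text = text


class TitleElement(Element):
    pass


class Composite(Element):
    def __init__(self, text: str, inner_elements: list[Element]) -> None:
        super().__init__(text)
        self.inner_elements = tuple(inner_elements)


@dataclass
class Context:
    iteration: int  # index of the current pass over the document


class ElementwiseStepSketch(ABC):
    _NUM_ITERATIONS = 1

    def __init__(
        self,
        *,
        types_to_process: set[type[Element]] | None = None,
        types_to_exclude: set[type[Element]] | None = None,
    ) -> None:
        self._types_to_process = types_to_process or set()
        self._types_to_exclude = types_to_exclude or set()

    def process(self, elements: list[Element]) -> list[Element]:
        # Assumption: each pass receives a context carrying its index.
        for iteration in range(self._NUM_ITERATIONS):
            context = Context(iteration=iteration)
            elements = self._process_recursively(elements, _context=context)
        return elements

    def _should_process(self, element: Element) -> bool:
        # Assumption: an empty set means "no restriction".
        if self._types_to_process and not isinstance(element, tuple(self._types_to_process)):
            return False
        if self._types_to_exclude and isinstance(element, tuple(self._types_to_exclude)):
            return False
        return True

    def _process_recursively(self, elements: list[Element], *, _context: Context) -> list[Element]:
        result: list[Element] = []
        for element in elements:
            if isinstance(element, Composite):
                # Descend into the container and keep the container itself.
                inner = self._process_recursively(list(element.inner_elements), _context=_context)
                element.inner_elements = tuple(inner)
                result.append(element)
            elif self._should_process(element):
                result.append(self._process_element(element, _context))
            else:
                result.append(element)
        return result

    @abstractmethod
    def _process_element(self, element: Element, context: Context) -> Element:
        """Return the element unchanged or a replacement for it."""


class PromoteShoutingToTitle(ElementwiseStepSketch):
    def _process_element(self, element: Element, context: Context) -> Element:
        # Hypothetical rule: all-uppercase text is treated as a title.
        return TitleElement(element.text) if element.text.isupper() else element
```

A subclass only has to decide what happens to one element at a time, while the base class in this sketch owns traversal, filtering, and the number of passes.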