sec_parser.processing_steps.empty_element_classifier

Exceptions

InvalidIterationError

Raised when an invalid iteration value is encountered.

Classes

AbstractElementwiseProcessingStep

AbstractElementwiseProcessingStep class is used to iterate over all Semantic Elements with or without applying transformations.

ElementProcessingContext

The ElementProcessingContext class is designed to provide context information for elementwise processing steps.

EmptyElement

The EmptyElement class represents an HTML element that does not contain any content.

EmptyElementClassifier

EmptyElementClassifier class for converting elements into EmptyElement instances.

Module Contents

class sec_parser.processing_steps.empty_element_classifier.AbstractElementwiseProcessingStep(*, types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: sec_parser.processing_steps.abstract_classes.abstract_processing_step.AbstractProcessingStep

AbstractElementwiseProcessingStep class is used to iterate over all Semantic Elements with or without applying transformations.

_NUM_ITERATIONS = 1

abstract _process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

_process_element method is responsible for transforming a single semantic element into another.

It can also be utilized to simply iterate over all elements without applying any transformations.
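The elementwise pattern described above can be sketched without the library. This is a minimal illustration, not sec_parser's implementation: `Element`, `Context`, `ElementwiseStep`, `UppercaseStep`, `CountingStep`, and the public `process` method are hypothetical stand-ins, and the loop over `_NUM_ITERATIONS` is an assumption about how the iteration count and the context's `iteration` field relate.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for AbstractSemanticElement."""

    text: str


@dataclass
class Context:
    """Hypothetical stand-in for ElementProcessingContext."""

    iteration: int


class ElementwiseStep(ABC):
    """Hypothetical stand-in for AbstractElementwiseProcessingStep."""

    _NUM_ITERATIONS = 1

    @abstractmethod
    def _process_element(self, element: Element, context: Context) -> Element:
        """Return a replacement element, or the same element unchanged."""

    def process(self, elements: list[Element]) -> list[Element]:
        # Assumption: each iteration visits every element once, in order.
        for iteration in range(self._NUM_ITERATIONS):
            context = Context(iteration=iteration)
            elements = [self._process_element(e, context) for e in elements]
        return elements


class UppercaseStep(ElementwiseStep):
    """Transforms each element into a new one."""

    def _process_element(self, element: Element, context: Context) -> Element:
        return Element(text=element.text.upper())


class CountingStep(ElementwiseStep):
    """Only iterates: returns every element unchanged."""

    def __init__(self) -> None:
        self.seen = 0

    def _process_element(self, element: Element, context: Context) -> Element:
        self.seen += 1
        return element
```

`UppercaseStep` shows the transforming use of `_process_element`, while `CountingStep` shows the iterate-only use, where the method returns its input untouched.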

_process_recursively(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], *, _context: sec_parser.processing_steps.abstract_classes.processing_context.ElementProcessingContext) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]

_process(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) -> list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]

class sec_parser.processing_steps.empty_element_classifier.ElementProcessingContext

The ElementProcessingContext class is designed to provide context information for elementwise processing steps.

iteration: int

class sec_parser.processing_steps.empty_element_classifier.EmptyElement(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None)

Bases: IrrelevantElement

The EmptyElement class represents an HTML element that does not contain any content. It is a subclass of the IrrelevantElement class and is used to identify and handle empty HTML tags in the document.

exception sec_parser.processing_steps.empty_element_classifier.InvalidIterationError

Bases: ValueError

Raised when an invalid iteration value is encountered.
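As a rough illustration of where such an exception could arise, the sketch below dispatches on an iteration number and raises for any value it does not handle. The local `InvalidIterationError` class only mirrors the documented name and base class; `phase_for_iteration` and its two phases are hypothetical and are not taken from sec_parser.

```python
class InvalidIterationError(ValueError):
    """Local stand-in mirroring the documented exception (a ValueError)."""


def phase_for_iteration(iteration: int) -> str:
    """Hypothetical helper: map an iteration number to a processing phase."""
    if iteration == 0:
        return "collect"
    if iteration == 1:
        return "classify"
    # Any other value is outside what this (assumed) two-pass step supports.
    raise InvalidIterationError(f"Invalid iteration: {iteration}")


def safe_phase(iteration: int) -> str:
    """Because the exception derives from ValueError, a generic handler catches it."""
    try:
        return phase_for_iteration(iteration)
    except ValueError:
        return "unsupported"
```

Since the exception derives from ValueError, callers that already handle ValueError will also catch it.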

class sec_parser.processing_steps.empty_element_classifier.EmptyElementClassifier(*, types_to_process: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, types_to_exclude: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.AbstractElementwiseProcessingStep

EmptyElementClassifier class for converting elements into EmptyElement instances.

This step scans through a list of semantic elements and replaces each element that contains no content with an EmptyElement instance, which is a kind of IrrelevantElement.

_process_element(element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, _: sec_parser.processing_steps.abstract_classes.abstract_elementwise_processing_step.ElementProcessingContext) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

Transform a single semantic element into an EmptyElement if applicable.
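The classification step can be sketched as follows. This is a self-contained approximation under stated assumptions, not the library's code: the element classes are local stand-ins, `classify_empty` is a hypothetical function, "empty" is assumed to mean text that is blank after stripping whitespace, and `types_to_exclude` is assumed to leave matching elements untouched.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SemanticElement:
    """Local stand-in for AbstractSemanticElement."""

    text: str


class TextElement(SemanticElement):
    """Hypothetical ordinary element."""


class TableElement(SemanticElement):
    """Hypothetical element type used to show exclusion."""


class IrrelevantElement(SemanticElement):
    """Local stand-in for IrrelevantElement."""


class EmptyElement(IrrelevantElement):
    """Local stand-in for EmptyElement, a subclass of IrrelevantElement."""


def classify_empty(
    elements: list[SemanticElement],
    *,
    types_to_exclude: set[type[SemanticElement]] | None = None,
) -> list[SemanticElement]:
    """Replace content-free elements with EmptyElement (assumed semantics)."""
    excluded = tuple(types_to_exclude or ())
    result: list[SemanticElement] = []
    for element in elements:
        if isinstance(element, excluded):
            # Assumption: excluded types pass through unchanged.
            result.append(element)
        elif element.text.strip() == "":
            # Assumption: whitespace-only text counts as "no content".
            result.append(EmptyElement(text=element.text))
        else:
            result.append(element)
    return result
```

For example, calling `classify_empty` on a text element, a whitespace-only text element, and an excluded empty table element leaves the first and third unchanged and turns the second into an `EmptyElement`.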