sec_parser.semantic_tree.tree_builder

Classes

TitleElement

The TitleElement class represents the title of a paragraph or other content object.

TopSectionStartMarker

The TopSectionStartMarker class represents the beginning of a top-level section of a document.

AbstractNestingRule

AbstractNestingRule is a base class for defining rules for nesting semantic elements.

AlwaysNestAsParentRule

Concrete nesting rule derived from AbstractNestingRule (the description is inherited from the base class).

NestSameTypeDependingOnLevelRule

Concrete nesting rule derived from AbstractNestingRule (the description is inherited from the base class).

SemanticTree

TreeNode

The TreeNode class is a fundamental part of the semantic tree structure.

TreeBuilder

Builds a semantic tree from a list of semantic elements.

Module Contents

class sec_parser.semantic_tree.tree_builder.TitleElement

Bases: sec_parser.semantic_elements.mixins.dict_text_content_mixin.DictTextContentMixin, sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement

The TitleElement class represents the title of a paragraph or other content object. It serves as a semantic marker, providing context and structure to the document.

class sec_parser.semantic_tree.tree_builder.TopSectionStartMarker(html_tag: sec_parser.processing_engine.html_tag.HtmlTag, *, processing_log: sec_parser.processing_engine.processing_log.ProcessingLog | None = None, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin | None = None, level: int | None = None, section_type: sec_parser.semantic_elements.top_section_title_types.TopSectionType | None = None)

Bases: sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement

The TopSectionStartMarker class represents the beginning of a top-level section of a document. It is used to mark the start of sections such as “Part I, Item 1. Financial Statements” in SEC 10-Q reports.

classmethod create_from_element(source: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, log_origin: sec_parser.processing_engine.processing_log.LogItemOrigin, *, level: int | None = None, section_type: sec_parser.semantic_elements.top_section_title_types.TopSectionType | None = None) -> sec_parser.semantic_elements.abstract_semantic_element.AbstractLevelElement

Convert the semantic element into another semantic element type.

to_dict(*, include_previews: bool = False, include_contents: bool = False) -> dict[str, Any]

class sec_parser.semantic_tree.tree_builder.AbstractNestingRule(*, exclude_parents: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, exclude_children: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: abc.ABC

AbstractNestingRule is a base class for defining rules for nesting semantic elements. Each rule should ideally mention at most one or two types of semantic elements to reduce coupling and complexity.

In case of conflicts between rules, they should be resolved through parameters like exclude_parents and exclude_children.
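The structure described above (a public check that delegates to an abstract hook, with exclusion sets to settle conflicts) can be sketched as follows. This is a minimal, self-contained illustration rather than the library's source: Element, Title, Text, Table, NestingRule, and NestUnderTitleRule are hypothetical stand-ins, and the assumption that the public method consults the exclusion sets before calling _should_be_nested_under is inferred from the signatures listed here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Element:
    """Hypothetical stand-in for a semantic element."""


class Title(Element):
    """Hypothetical title element."""


class Text(Element):
    """Hypothetical text element."""


class Table(Element):
    """Hypothetical table element."""


class NestingRule(ABC):
    """Simplified stand-in for a nesting rule with exclusion sets."""

    def __init__(
        self,
        *,
        exclude_parents: set[type[Element]] | None = None,
        exclude_children: set[type[Element]] | None = None,
    ) -> None:
        self._exclude_parents = exclude_parents or set()
        self._exclude_children = exclude_children or set()

    def should_be_nested_under(self, parent: Element, child: Element) -> bool:
        # Assumed behavior: exclusions are checked first, then the subclass decides.
        if any(isinstance(parent, t) for t in self._exclude_parents):
            return False
        if any(isinstance(child, t) for t in self._exclude_children):
            return False
        return self._should_be_nested_under(parent, child)

    @abstractmethod
    def _should_be_nested_under(self, parent: Element, child: Element) -> bool:
        """Rule-specific decision, implemented by each subclass."""


class NestUnderTitleRule(NestingRule):
    """Hypothetical rule that mentions a single element type: Title."""

    def _should_be_nested_under(self, parent: Element, child: Element) -> bool:
        return isinstance(parent, Title)


# Tables are excluded as children, so this rule never nests them.
rule = NestUnderTitleRule(exclude_children={Table})
text_nests = rule.should_be_nested_under(Title(), Text())
table_nests = rule.should_be_nested_under(Title(), Table())
text_under_text = rule.should_be_nested_under(Text(), Text())
```

Keeping the rule itself to one element type and pushing exceptions into exclude_children keeps each rule small, which matches the coupling advice in the description.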

should_be_nested_under(parent: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, child: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> bool

abstract _should_be_nested_under(parent: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, child: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> bool

class sec_parser.semantic_tree.tree_builder.AlwaysNestAsParentRule(cls: type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], /, *, exclude_parents: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, exclude_children: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: AbstractNestingRule

Description inherited from AbstractNestingRule: AbstractNestingRule is a base class for defining rules for nesting semantic elements. Each rule should ideally mention at most one or two types of semantic elements to reduce coupling and complexity.

In case of conflicts between rules, they should be resolved through parameters like exclude_parents and exclude_children.

_should_be_nested_under(parent: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, child: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> bool

class sec_parser.semantic_tree.tree_builder.NestSameTypeDependingOnLevelRule(*, exclude_parents: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None, exclude_children: set[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]] | None = None)

Bases: AbstractNestingRule

Description inherited from AbstractNestingRule: AbstractNestingRule is a base class for defining rules for nesting semantic elements. Each rule should ideally mention at most one or two types of semantic elements to reduce coupling and complexity.

In case of conflicts between rules, they should be resolved through parameters like exclude_parents and exclude_children.

_should_be_nested_under(parent: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, child: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement) -> bool

class sec_parser.semantic_tree.tree_builder.SemanticTree(root_nodes: list[sec_parser.semantic_tree.tree_node.TreeNode])

__iter__() -> collections.abc.Iterator[sec_parser.semantic_tree.tree_node.TreeNode]

Iterate over the root nodes of the tree.

__len__() -> int

property nodes: collections.abc.Iterator[sec_parser.semantic_tree.tree_node.TreeNode]

Get all nodes in the semantic tree. This includes the root nodes and all their descendants.
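The difference between iterating the tree (root nodes only) and reading nodes (roots plus descendants) can be illustrated with a small stand-in. Node and Tree below are hypothetical simplifications, and the depth-first, pre-order visiting order is an assumption; the reference only states that nodes covers the root nodes and all their descendants.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node."""

    name: str
    children: list[Node] = field(default_factory=list)

    def get_descendants(self) -> Iterator[Node]:
        # Yield each child, then that child's own descendants (pre-order).
        for child in self.children:
            yield child
            yield from child.get_descendants()


class Tree:
    """Hypothetical stand-in for a tree that wraps a list of root nodes."""

    def __init__(self, root_nodes: list[Node]) -> None:
        self._root_nodes = root_nodes

    def __iter__(self) -> Iterator[Node]:
        # Iteration visits the root nodes only.
        return iter(self._root_nodes)

    @property
    def nodes(self) -> Iterator[Node]:
        # Every root node followed by all of its descendants.
        for root in self._root_nodes:
            yield root
            yield from root.get_descendants()


tree = Tree(
    [
        Node("Part I", [Node("Item 1", [Node("paragraph")])]),
        Node("Part II"),
    ]
)
root_names = [node.name for node in tree]
all_names = [node.name for node in tree.nodes]
```

In this sketch, root_names holds the two top-level entries, while all_names also includes "Item 1" and "paragraph".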

render(*, pretty: bool | None = True, ignored_types: tuple[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], ...] | None = None, char_display_limit: int | None = None, verbose: bool = False) -> str

Render the semantic tree as a human-readable string.

print(*, pretty: bool | None = True, ignored_types: tuple[type[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement], ...] | None = None, char_display_limit: int | None = None, verbose: bool = False, line_limit: int | None = None) -> None

Print the semantic tree as a human-readable string.

Syntactic sugar: a convenience wrapper that prints the output of render.

class sec_parser.semantic_tree.tree_builder.TreeNode(semantic_element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement, *, parent: TreeNode | None = None, children: collections.abc.Iterable[TreeNode] | None = None)

The TreeNode class is a fundamental part of the semantic tree structure. Each TreeNode represents a node in the tree. It holds a reference to a semantic element, maintains a list of its child nodes, and a reference to its parent node. This class provides methods for managing the tree structure, such as adding and removing child nodes. Importantly, these methods ensure logical consistency as children/parents are being changed. For example, if a parent is removed from a child, the child is automatically removed from the parent.
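The consistency guarantee described above can be sketched with a simplified node class. This is an illustration under stated assumptions, not the library's implementation: Node is a hypothetical stand-in, and the writable parent property is assumed for the sake of the example.

```python
from __future__ import annotations


class Node:
    """Hypothetical node that keeps parent and child links in sync."""

    def __init__(self, label: str, *, parent: Node | None = None) -> None:
        self.label = label
        self._parent: Node | None = None
        self._children: list[Node] = []
        if parent is not None:
            self.parent = parent

    @property
    def children(self) -> list[Node]:
        # Return a copy so callers cannot bypass add_child/remove_child.
        return list(self._children)

    @property
    def parent(self) -> Node | None:
        return self._parent

    @parent.setter
    def parent(self, new_parent: Node | None) -> None:
        old_parent = self._parent
        if old_parent is new_parent:
            return
        self._parent = new_parent
        # Detach from the old parent and attach to the new one.
        if old_parent is not None:
            old_parent.remove_child(self)
        if new_parent is not None:
            new_parent.add_child(self)

    def add_child(self, child: Node) -> None:
        if child not in self._children:
            self._children.append(child)
        if child.parent is not self:
            child.parent = self

    def remove_child(self, child: Node) -> None:
        if child in self._children:
            self._children.remove(child)
        if child.parent is self:
            child.parent = None

    def has_child(self, child: Node) -> bool:
        return child in self._children


section = Node("section")
paragraph = Node("paragraph")

section.add_child(paragraph)
linked = paragraph.parent is section  # adding a child also sets its parent

paragraph.parent = None
unlinked = not section.has_child(paragraph)  # clearing the parent also removes the child
```

Each mutator updates its own side first and then calls the other side only if that side is still out of date, which is what stops the two methods from calling each other forever.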

property semantic_element: sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement

property children: list[TreeNode]

property parent: TreeNode | None

add_child(child: TreeNode) -> None

add_children(children: collections.abc.Iterable[TreeNode]) -> None

remove_child(child: TreeNode) -> None

has_child(child: TreeNode) -> bool

get_descendants() -> collections.abc.Iterator[TreeNode]

__repr__() -> str

Return repr(self).

property text: str

The text property is a passthrough to the text property of the underlying semantic element.

get_source_code(*, pretty: bool = False) -> str

get_source_code is a passthrough to the method of the same name on the underlying semantic element.

class sec_parser.semantic_tree.tree_builder.TreeBuilder(get_rules: Callable[[], list[sec_parser.semantic_tree.nesting_rules.AbstractNestingRule]] | None = None)

Builds a semantic tree from a list of semantic elements.

Why Use a Tree Structure?

Using a tree data structure allows for easier and more robust filtering of sections. With a tree, you can select specific branches to filter, making it straightforward to identify section boundaries. This approach is more maintainable and robust compared to attempting the same operations on a flat list of elements.
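As a rough illustration of the point about section boundaries, the sketch below selects one section by picking a single branch. Node and branch_texts are hypothetical names used only for this example, and the section titles are made-up sample data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node holding a piece of text."""

    text: str
    children: list[Node] = field(default_factory=list)


def branch_texts(node: Node) -> list[str]:
    """Return the text of a node followed by the text of all its descendants."""
    texts = [node.text]
    for child in node.children:
        texts.extend(branch_texts(child))
    return texts


roots = [
    Node("Item 1. Financial Statements", [Node("Balance sheet"), Node("Notes")]),
    Node("Item 2. Management's Discussion", [Node("Overview")]),
]

# Selecting a section means picking one branch; there is no need to scan a
# flat list for the element where the next section begins.
item_2 = next(root for root in roots if root.text.startswith("Item 2"))
item_2_texts = branch_texts(item_2)
```

With a flat list, the same selection would require finding both the start of "Item 2" and the start of whatever follows it.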

Overview:

  1. Takes a list of semantic elements.

  2. Applies nesting rules to these elements.

Customization:

The nesting process is customizable through a list of rules. These rules determine how new elements should be nested under existing ones.

Advanced Customization:

You can supply your own set of rules by providing a callable to get_rules, which should return a list of AbstractNestingRule instances.
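One way to picture the process is a single pass over the elements with a stack of candidate parents. The sketch below is a simplified stand-in, not the library's code: Element, Node, the two rule functions, and build are hypothetical, rules are modeled as plain callables instead of AbstractNestingRule instances, and the stack-based parent search is an assumption suggested by the stack parameter of _find_parent_node.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Element:
    """Hypothetical flat element: a title (with a level) or a piece of text."""

    kind: str  # "title" or "text"
    text: str
    level: int = 0


@dataclass
class Node:
    element: Element
    children: list[Node] = field(default_factory=list)


# A rule answers: should `child` be nested under `parent`?
Rule = Callable[[Element, Element], bool]


def title_is_parent_of_text(parent: Element, child: Element) -> bool:
    return parent.kind == "title" and child.kind != "title"


def lower_level_title_is_parent(parent: Element, child: Element) -> bool:
    return parent.kind == child.kind == "title" and parent.level < child.level


def build(elements: list[Element], rules: list[Rule]) -> list[Node]:
    roots: list[Node] = []
    stack: list[Node] = []  # most recent candidate parents, innermost last
    for element in elements:
        node = Node(element)
        # Discard candidates until some rule accepts the top one as parent.
        while stack and not any(rule(stack[-1].element, element) for rule in rules):
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


elements = [
    Element("title", "Business", level=0),
    Element("text", "We sell things."),
    Element("title", "Products", level=1),
    Element("text", "Widgets."),
    Element("title", "Risk Factors", level=0),
    Element("text", "Risks."),
]
roots = build(elements, [title_is_parent_of_text, lower_level_title_is_parent])
root_titles = [root.element.text for root in roots]
business_children = [child.element.text for child in roots[0].children]
```

Swapping the list passed to build plays the role that a custom get_rules callable plays in the description above: the traversal stays the same and only the nesting decisions change.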

static get_default_rules() -> list[sec_parser.semantic_tree.nesting_rules.AbstractNestingRule]

build(elements: list[sec_parser.semantic_elements.abstract_semantic_element.AbstractSemanticElement]) -> sec_parser.semantic_tree.semantic_tree.SemanticTree

_find_parent_node(new_node: sec_parser.semantic_tree.tree_node.TreeNode, stack: list[sec_parser.semantic_tree.tree_node.TreeNode], rules: list[sec_parser.semantic_tree.nesting_rules.AbstractNestingRule]) -> sec_parser.semantic_tree.tree_node.TreeNode | None

_should_nest_under(child_node: sec_parser.semantic_tree.tree_node.TreeNode, parent_node: sec_parser.semantic_tree.tree_node.TreeNode, rules: list[sec_parser.semantic_tree.nesting_rules.AbstractNestingRule]) -> bool