sec_parser.utils.bs4_.count_text_matches_in_descendants

Functions

get_first_deepest_tag(→ bs4.Tag | None)

Given a BeautifulSoup tag, returns the first deepest tag within it.

is_unary_tree(→ bool)

is_unary_tree determines if a BeautifulSoup tag forms a unary tree.

count_text_matches_in_descendants(→ int)

Module Contents

sec_parser.utils.bs4_.count_text_matches_in_descendants.get_first_deepest_tag(tag: bs4.Tag) → bs4.Tag | None

Given a BeautifulSoup tag, returns the first deepest tag within it.

For example, given the HTML structure <div><p>Test</p><span>Another Test</span></div>, passing the ‘div’ tag to this function returns the ‘p’ tag, which is the first deepest tag within the ‘div’ tag.
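The behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: first_deepest_tag_sketch is a hypothetical stand-in that assumes "first deepest" means following the first child tag at each level, and that a tag without child tags is returned as-is (the real function may return None in some cases, as its signature allows).

```python
import bs4


def first_deepest_tag_sketch(tag: bs4.Tag) -> bs4.Tag:
    """Hypothetical stand-in: follow the first child tag at each level."""
    current = tag
    while True:
        # Keep only direct children that are tags; text nodes are ignored.
        child_tags = [c for c in current.children if isinstance(c, bs4.Tag)]
        if not child_tags:
            # Assumption: a tag with no child tags is its own deepest tag.
            return current
        current = child_tags[0]


soup = bs4.BeautifulSoup(
    "<div><p>Test</p><span>Another Test</span></div>", "html.parser"
)
div = soup.find("div")
result = first_deepest_tag_sketch(div)
print(result.name)
```

Under these assumptions the sketch selects the ‘p’ tag for the HTML structure used in the example above.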

sec_parser.utils.bs4_.count_text_matches_in_descendants.is_unary_tree(tag: bs4.Tag) → bool

is_unary_tree determines if a BeautifulSoup tag forms a unary tree. In a unary tree, each node has at most one child.

Unary trees can contain NavigableString leaves. However, if a non-leaf node contains a non-empty NavigableString, the tree is not considered unary.

Additionally, if the tag is a ‘table’, the function returns True regardless of its children, because in the context of this application ‘table’ tags are always considered unary.
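The three rules above can be sketched as follows. This is an illustrative re-implementation, not the library's code: is_unary_tree_sketch and the parse helper are hypothetical names, and the sketch assumes that whitespace-only strings count as empty and that the ‘table’ rule also applies to nested tables.

```python
import bs4


def is_unary_tree_sketch(tag: bs4.Tag) -> bool:
    """Hypothetical stand-in for the rules described above."""
    # Rule 3: 'table' tags are always treated as unary.
    if tag.name == "table":
        return True
    child_tags = [c for c in tag.children if isinstance(c, bs4.Tag)]
    # Assumption: whitespace-only strings do not count as text.
    has_text = any(
        isinstance(c, bs4.NavigableString) and c.strip() for c in tag.children
    )
    if not child_tags:
        # Leaf node: it may hold text (rule 2 allows NavigableString leaves).
        return True
    # Rule 1: at most one child tag. Rule 2: no text beside a child tag.
    if len(child_tags) > 1 or has_text:
        return False
    return is_unary_tree_sketch(child_tags[0])


def parse(html: str) -> bs4.Tag:
    """Return the first top-level tag of an HTML fragment."""
    return bs4.BeautifulSoup(html, "html.parser").contents[0]


chain = is_unary_tree_sketch(parse("<div><p><b>text</b></p></div>"))
siblings = is_unary_tree_sketch(parse("<div><p>a</p><span>b</span></div>"))
mixed = is_unary_tree_sketch(parse("<div>text<p>a</p></div>"))
table = is_unary_tree_sketch(parse("<table><tr><td>1</td><td>2</td></tr></table>"))
print(chain, siblings, mixed, table)
```

In this sketch a single chain of tags is unary, while two sibling tags or text placed next to a child tag make the tree non-unary; the table is accepted even though its row has two cells.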

sec_parser.utils.bs4_.count_text_matches_in_descendants.count_text_matches_in_descendants(bs4_tag: bs4.Tag, predicate: Callable[[str], bool], *, exclude_links: bool | None = None) → int
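This function is listed without a description, so the following is only a guess at what the signature suggests. The hypothetical count_text_matches_sketch assumes that the function counts descendant text nodes for which predicate returns True, and that exclude_links=True skips text located inside ‘a’ tags; the actual behaviour may differ.

```python
from typing import Callable, Optional

import bs4


def count_text_matches_sketch(
    bs4_tag: bs4.Tag,
    predicate: Callable[[str], bool],
    *,
    exclude_links: Optional[bool] = None,
) -> int:
    """Hypothetical stand-in: count descendant text nodes matching predicate."""
    # Assumption: None is treated the same as False.
    skip_links = bool(exclude_links)
    count = 0
    for node in bs4_tag.descendants:
        if not isinstance(node, bs4.NavigableString):
            continue
        # Assumption: "links" means text with an 'a' tag among its ancestors.
        if skip_links and node.find_parent("a") is not None:
            continue
        if predicate(str(node)):
            count += 1
    return count


html = '<div><p>Item 1</p><a href="#">Item 2</a><span>Other</span></div>'
div = bs4.BeautifulSoup(html, "html.parser").find("div")


def starts_with_item(text: str) -> bool:
    return text.startswith("Item")


all_matches = count_text_matches_sketch(div, starts_with_item)
without_links = count_text_matches_sketch(div, starts_with_item, exclude_links=True)
print(all_matches, without_links)
```

Under these assumptions both "Item" strings are counted by default, and the one inside the link is dropped when exclude_links is set.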