sec_parser.utils.bs4_.count_text_matches_in_descendants
Functions
get_first_deepest_tag(tag) | Given a BeautifulSoup tag, returns the first deepest tag within it.
is_unary_tree(tag) | is_unary_tree determines if a BeautifulSoup tag forms a unary tree.
Module Contents
- sec_parser.utils.bs4_.count_text_matches_in_descendants.get_first_deepest_tag(tag: bs4.Tag) -> bs4.Tag | None
Given a BeautifulSoup tag, returns the first deepest tag within it.
For example, given the HTML structure <div><p>Test</p><span>Another Test</span></div>, passing the ‘div’ tag to this function returns the ‘p’ tag, which is the first deepest tag within the ‘div’ tag.
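The example above can be reproduced with a small sketch. The helper `first_deepest_tag_sketch` is a hypothetical re-implementation written only from the description, not the library's own code. It assumes that "first deepest" means repeatedly following the first tag child, and that a tag with no tag children yields None.

```python
from __future__ import annotations

import bs4


def first_deepest_tag_sketch(tag: bs4.Tag) -> bs4.Tag | None:
    """Hypothetical sketch: follow the first Tag child until none remain."""
    # Skip NavigableString children; only nested tags count.
    first_child = next(
        (child for child in tag.children if isinstance(child, bs4.Tag)),
        None,
    )
    if first_child is None:
        return None  # assumption: no nested tag means there is nothing to return
    deeper = first_deepest_tag_sketch(first_child)
    return deeper if deeper is not None else first_child


soup = bs4.BeautifulSoup(
    "<div><p>Test</p><span>Another Test</span></div>", "html.parser"
)
div = soup.find("div")
result = first_deepest_tag_sketch(div)
print(result.name)  # p
```

The stdlib "html.parser" backend is used so that no extra parser needs to be installed.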
- sec_parser.utils.bs4_.count_text_matches_in_descendants.is_unary_tree(tag: bs4.Tag) -> bool
is_unary_tree determines if a BeautifulSoup tag forms a unary tree. In a unary tree, each node has at most one child.
Unary trees can contain NavigableString leaves. However, if a non-leaf node contains a non-empty NavigableString, the tree is not considered unary.
Additionally, if the tag is a ‘table’, the function returns True regardless of its children, because in the context of this application ‘table’ tags are always considered unary.
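The rules above can be sketched as follows. `is_unary_tree_sketch` and `first_tag` are hypothetical names, and the code is an illustration of the described rules rather than the library's implementation. It assumes that whitespace-only strings count as empty and that the ‘table’ rule applies at every depth.

```python
import bs4


def is_unary_tree_sketch(tag: bs4.Tag) -> bool:
    """Hypothetical sketch of the unary-tree rules described above."""
    if tag.name == "table":
        return True  # tables are treated as unary regardless of children
    tag_children = [c for c in tag.contents if isinstance(c, bs4.Tag)]
    if not tag_children:
        return True  # leaf tag: NavigableString leaves are allowed
    has_text = any(
        isinstance(c, bs4.NavigableString) and c.strip() for c in tag.contents
    )
    if len(tag_children) > 1 or has_text:
        return False  # branching, or non-empty text next to a child tag
    return is_unary_tree_sketch(tag_children[0])


def first_tag(html: str) -> bs4.Tag:
    """Parse an HTML fragment and return its first tag."""
    return bs4.BeautifulSoup(html, "html.parser").find(True)


chain = is_unary_tree_sketch(first_tag("<div><p><b>x</b></p></div>"))
branching = is_unary_tree_sketch(first_tag("<div><p>a</p><span>b</span></div>"))
mixed = is_unary_tree_sketch(first_tag("<div>text<p>a</p></div>"))
table = is_unary_tree_sketch(
    first_tag("<table><tr><td>1</td><td>2</td></tr></table>")
)
print(chain, branching, mixed, table)  # True False False True
```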
- sec_parser.utils.bs4_.count_text_matches_in_descendants.count_text_matches_in_descendants(bs4_tag: bs4.Tag, predicate: Callable[[str], bool], *, exclude_links: bool | None = None) -> int
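Only the signature is documented for this function, so the sketch below is a guess at plausible behavior inferred from the name and parameters, not a statement of what the library does. `count_text_matches_sketch` is a hypothetical helper that counts the non-empty text nodes under a tag for which the predicate returns True, and assumes `exclude_links` skips text inside ‘a’ tags.

```python
from __future__ import annotations

from typing import Callable

import bs4


def count_text_matches_sketch(
    bs4_tag: bs4.Tag,
    predicate: Callable[[str], bool],
    *,
    exclude_links: bool | None = None,
) -> int:
    """Hypothetical sketch: count descendant text nodes matching predicate."""
    count = 0
    for node in bs4_tag.descendants:
        if not isinstance(node, bs4.NavigableString):
            continue
        # Assumption: exclude_links ignores text nested inside <a> tags.
        if exclude_links and node.find_parent("a") is not None:
            continue
        text = node.strip()
        if text and predicate(text):
            count += 1
    return count


html = '<div><p>Item 1</p><a href="#">Item 2</a><span>other</span></div>'
root = bs4.BeautifulSoup(html, "html.parser").find("div")


def starts_with_item(text: str) -> bool:
    return text.startswith("Item")


with_links = count_text_matches_sketch(root, starts_with_item)
without_links = count_text_matches_sketch(
    root, starts_with_item, exclude_links=True
)
print(with_links, without_links)  # 2 1
```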