updates
This commit is contained in:
126
Backend/venv/lib/python3.12/site-packages/nltk/treetransforms.py
Normal file
126
Backend/venv/lib/python3.12/site-packages/nltk/treetransforms.py
Normal file
@@ -0,0 +1,126 @@
|
||||
# Natural Language Toolkit: Tree Transformations
|
||||
#
|
||||
# Copyright (C) 2005-2007 Oregon Graduate Institute
|
||||
# Author: Nathan Bodenstab <bodenstab@cslu.ogi.edu>
|
||||
# URL: <https://www.nltk.org/>
|
||||
# For license information, see LICENSE.TXT
|
||||
|
||||
r"""
|
||||
A collection of methods for tree (grammar) transformations used
|
||||
in parsing natural language.
|
||||
|
||||
Although many of these methods are technically grammar transformations
|
||||
(ie. Chomsky Norm Form), when working with treebanks it is much more
|
||||
natural to visualize these modifications in a tree structure. Hence,
|
||||
we will do all transformation directly to the tree itself.
|
||||
Transforming the tree directly also allows us to do parent annotation.
|
||||
A grammar can then be simply induced from the modified tree.
|
||||
|
||||
The following is a short tutorial on the available transformations.
|
||||
|
||||
1. Chomsky Normal Form (binarization)
|
||||
|
||||
It is well known that any grammar has a Chomsky Normal Form (CNF)
|
||||
equivalent grammar where CNF is defined by every production having
|
||||
either two non-terminals or one terminal on its right hand side.
|
||||
When we have hierarchically structured data (ie. a treebank), it is
|
||||
natural to view this in terms of productions where the root of every
|
||||
subtree is the head (left hand side) of the production and all of
|
||||
its children are the right hand side constituents. In order to
|
||||
convert a tree into CNF, we simply need to ensure that every subtree
|
||||
has either two subtrees as children (binarization), or one leaf node
|
||||
(non-terminal). In order to binarize a subtree with more than two
|
||||
children, we must introduce artificial nodes.
|
||||
|
||||
There are two popular methods to convert a tree into CNF: left
|
||||
factoring and right factoring. The following example demonstrates
|
||||
the difference between them. Example::
|
||||
|
||||
Original Right-Factored Left-Factored
|
||||
|
||||
A A A
|
||||
/ | \ / \ / \
|
||||
B C D ==> B A|<C-D> OR A|<B-C> D
|
||||
/ \ / \
|
||||
C D B C
|
||||
|
||||
2. Parent Annotation
|
||||
|
||||
In addition to binarizing the tree, there are two standard
|
||||
modifications to node labels we can do in the same traversal: parent
|
||||
annotation and Markov order-N smoothing (or sibling smoothing).
|
||||
|
||||
The purpose of parent annotation is to refine the probabilities of
|
||||
productions by adding a small amount of context. With this simple
|
||||
addition, a CYK (inside-outside, dynamic programming chart parse)
|
||||
can improve from 74% to 79% accuracy. A natural generalization from
|
||||
parent annotation is to grandparent annotation and beyond. The
|
||||
tradeoff becomes accuracy gain vs. computational complexity. We
|
||||
must also keep in mind data sparcity issues. Example::
|
||||
|
||||
Original Parent Annotation
|
||||
|
||||
A A^<?>
|
||||
/ | \ / \
|
||||
B C D ==> B^<A> A|<C-D>^<?> where ? is the
|
||||
/ \ parent of A
|
||||
C^<A> D^<A>
|
||||
|
||||
|
||||
3. Markov order-N smoothing
|
||||
|
||||
Markov smoothing combats data sparcity issues as well as decreasing
|
||||
computational requirements by limiting the number of children
|
||||
included in artificial nodes. In practice, most people use an order
|
||||
2 grammar. Example::
|
||||
|
||||
Original No Smoothing Markov order 1 Markov order 2 etc.
|
||||
|
||||
__A__ A A A
|
||||
/ /|\ \ / \ / \ / \
|
||||
B C D E F ==> B A|<C-D-E-F> ==> B A|<C> ==> B A|<C-D>
|
||||
/ \ / \ / \
|
||||
C ... C ... C ...
|
||||
|
||||
|
||||
|
||||
Annotation decisions can be thought about in the vertical direction
|
||||
(parent, grandparent, etc) and the horizontal direction (number of
|
||||
siblings to keep). Parameters to the following functions specify
|
||||
these values. For more information see:
|
||||
|
||||
Dan Klein and Chris Manning (2003) "Accurate Unlexicalized
|
||||
Parsing", ACL-03. https://www.aclweb.org/anthology/P03-1054
|
||||
|
||||
4. Unary Collapsing
|
||||
|
||||
Collapse unary productions (ie. subtrees with a single child) into a
|
||||
new non-terminal (Tree node). This is useful when working with
|
||||
algorithms that do not allow unary productions, yet you do not wish
|
||||
to lose the parent information. Example::
|
||||
|
||||
A
|
||||
|
|
||||
B ==> A+B
|
||||
/ \ / \
|
||||
C D C D
|
||||
|
||||
"""
|
||||
|
||||
from nltk.internals import deprecated
|
||||
from nltk.tree.transforms import chomsky_normal_form as cnf
|
||||
from nltk.tree.transforms import collapse_unary as cu
|
||||
from nltk.tree.transforms import un_chomsky_normal_form as ucnf
|
||||
|
||||
chomsky_normal_form = deprecated(
|
||||
"Import using `from nltk.tree import chomsky_normal_form` instead."
|
||||
)(cnf)
|
||||
un_chomsky_normal_form = deprecated(
|
||||
"Import using `from nltk.tree import un_chomsky_normal_form` instead."
|
||||
)(ucnf)
|
||||
collapse_unary = deprecated(
|
||||
"Import using `from nltk.tree import collapse_unary` instead."
|
||||
)(cu)
|
||||
|
||||
|
||||
__all__ = ["chomsky_normal_form", "un_chomsky_normal_form", "collapse_unary"]
|
||||
Reference in New Issue
Block a user