updates

2025-12-01 06:50:10 +02:00
parent 91f51bc6fe
commit 62c1fe5951
4682 changed files with 544807 additions and 31208 deletions
--- a/Backend/venv/lib/python3.12/site-packages/nltk/treetransforms.py
+++ b/Backend/venv/lib/python3.12/site-packages/nltk/treetransforms.py
@@ -0,0 +1,126 @@
+# Natural Language Toolkit: Tree Transformations
+#
+# Copyright (C) 2005-2007 Oregon Graduate Institute
+# Author: Nathan Bodenstab <bodenstab@cslu.ogi.edu>
+# URL: <https://www.nltk.org/>
+# For license information, see LICENSE.TXT
+
+r"""
+A collection of methods for tree (grammar) transformations used
+in parsing natural language.
+
+Although many of these methods are technically grammar transformations
+(ie. Chomsky Norm Form), when working with treebanks it is much more
+natural to visualize these modifications in a tree structure.  Hence,
+we will do all transformation directly to the tree itself.
+Transforming the tree directly also allows us to do parent annotation.
+A grammar can then be simply induced from the modified tree.
+
+The following is a short tutorial on the available transformations.
+
+ 1. Chomsky Normal Form (binarization)
+
+    It is well known that any grammar has a Chomsky Normal Form (CNF)
+    equivalent grammar where CNF is defined by every production having
+    either two non-terminals or one terminal on its right hand side.
+    When we have hierarchically structured data (ie. a treebank), it is
+    natural to view this in terms of productions where the root of every
+    subtree is the head (left hand side) of the production and all of
+    its children are the right hand side constituents.  In order to
+    convert a tree into CNF, we simply need to ensure that every subtree
+    has either two subtrees as children (binarization), or one leaf node
+    (non-terminal).  In order to binarize a subtree with more than two
+    children, we must introduce artificial nodes.
+
+    There are two popular methods to convert a tree into CNF: left
+    factoring and right factoring.  The following example demonstrates
+    the difference between them.  Example::
+
+     Original       Right-Factored     Left-Factored
+
+          A              A                      A
+        / | \          /   \                  /   \
+       B  C  D   ==>  B    A|<C-D>   OR   A|<B-C>  D
+                            /  \          /  \
+                           C    D        B    C
+
+ 2. Parent Annotation
+
+    In addition to binarizing the tree, there are two standard
+    modifications to node labels we can do in the same traversal: parent
+    annotation and Markov order-N smoothing (or sibling smoothing).
+
+    The purpose of parent annotation is to refine the probabilities of
+    productions by adding a small amount of context.  With this simple
+    addition, a CYK (inside-outside, dynamic programming chart parse)
+    can improve from 74% to 79% accuracy.  A natural generalization from
+    parent annotation is to grandparent annotation and beyond.  The
+    tradeoff becomes accuracy gain vs. computational complexity.  We
+    must also keep in mind data sparcity issues.  Example::
+
+     Original       Parent Annotation
+
+          A                A^<?>
+        / | \             /   \
+       B  C  D   ==>  B^<A>    A|<C-D>^<?>     where ? is the
+                                 /  \          parent of A
+                             C^<A>   D^<A>
+
+
+ 3. Markov order-N smoothing
+
+    Markov smoothing combats data sparcity issues as well as decreasing
+    computational requirements by limiting the number of children
+    included in artificial nodes.  In practice, most people use an order
+    2 grammar.  Example::
+
+      Original       No Smoothing       Markov order 1   Markov order 2   etc.
+
+       __A__            A                      A                A
+      / /|\ \         /   \                  /   \            /   \
+     B C D E F  ==>  B    A|<C-D-E-F>  ==>  B   A|<C>  ==>   B  A|<C-D>
+                            /   \               /   \            /   \
+                           C    ...            C    ...         C    ...
+
+
+
+    Annotation decisions can be thought about in the vertical direction
+    (parent, grandparent, etc) and the horizontal direction (number of
+    siblings to keep).  Parameters to the following functions specify
+    these values.  For more information see:
+
+    Dan Klein and Chris Manning (2003) "Accurate Unlexicalized
+    Parsing", ACL-03.  https://www.aclweb.org/anthology/P03-1054
+
+ 4. Unary Collapsing
+
+    Collapse unary productions (ie. subtrees with a single child) into a
+    new non-terminal (Tree node).  This is useful when working with
+    algorithms that do not allow unary productions, yet you do not wish
+    to lose the parent information.  Example::
+
+       A
+       |
+       B   ==>   A+B
+      / \        / \
+     C   D      C   D
+
+"""
+
+from nltk.internals import deprecated
+from nltk.tree.transforms import chomsky_normal_form as cnf
+from nltk.tree.transforms import collapse_unary as cu
+from nltk.tree.transforms import un_chomsky_normal_form as ucnf
+
+chomsky_normal_form = deprecated(
+    "Import using `from nltk.tree import chomsky_normal_form` instead."
+)(cnf)
+un_chomsky_normal_form = deprecated(
+    "Import using `from nltk.tree import un_chomsky_normal_form` instead."
+)(ucnf)
+collapse_unary = deprecated(
+    "Import using `from nltk.tree import collapse_unary` instead."
+)(cu)
+
+
+__all__ = ["chomsky_normal_form", "un_chomsky_normal_form", "collapse_unary"]