This commit is contained in:
Iliyan Angelov
2025-12-01 06:50:10 +02:00
parent 91f51bc6fe
commit 62c1fe5951
4682 changed files with 544807 additions and 31208 deletions


@@ -0,0 +1,205 @@
# Natural Language Toolkit: Chunkers
#
# Copyright (C) 2001-2025 NLTK Project
# Author: Steven Bird <stevenbird1@gmail.com>
# Edward Loper <edloper@gmail.com>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
#
"""
Classes and interfaces for identifying non-overlapping linguistic
groups (such as base noun phrases) in unrestricted text. This task is
called "chunk parsing" or "chunking", and the identified groups are
called "chunks". The chunked text is represented using a shallow
tree called a "chunk structure." A chunk structure is a tree
containing tokens and chunks, where each chunk is a subtree containing
only tokens. For example, the chunk structure for base noun phrase
chunks in the sentence "I saw the big dog on the hill" is::

    (SENTENCE:
        (NP: <I>)
        <saw>
        (NP: <the> <big> <dog>)
        <on>
        (NP: <the> <hill>))

To convert a chunk structure back to a list of tokens, simply use the
chunk structure's ``leaves()`` method.
This module defines ``ChunkParserI``, a standard interface for
chunking texts; and ``RegexpChunkParser``, a regular-expression based
implementation of that interface. It also defines ``ChunkScore``, a
utility class for scoring chunk parsers.
RegexpChunkParser
=================
``RegexpChunkParser`` is an implementation of the chunk parser interface
that uses regular-expressions over tags to chunk a text. Its
``parse()`` method first constructs a ``ChunkString``, which encodes a
particular chunking of the input text. Initially, nothing is
chunked. ``RegexpChunkParser.parse()`` then applies a sequence of
``RegexpChunkRule`` rules to the ``ChunkString``, each of which modifies
the chunking that it encodes. Finally, the ``ChunkString`` is
transformed back into a chunk structure, which is returned.
``RegexpChunkParser`` can only be used to chunk a single kind of phrase.
For example, you can use a ``RegexpChunkParser`` to chunk the noun
phrases in a text, or the verb phrases in a text; but you cannot
use it to simultaneously chunk both noun phrases and verb phrases in
the same text. (This is a limitation of ``RegexpChunkParser``, not of
chunk parsers in general.)
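
For example, the following minimal sketch (using names defined in
``nltk.chunk.regexp``; the rule itself is illustrative only) chunks
sequences of determiners, adjectives and nouns into NP chunks:

>>> from nltk.chunk.regexp import ChunkRule, RegexpChunkParser
>>> rule = ChunkRule("<DT>?<JJ>*<NN.*>+", "Chunk sequences of DT, JJ and NN")
>>> parser = RegexpChunkParser([rule], chunk_label="NP")
>>> print(parser.parse([("the", "DT"), ("big", "JJ"), ("dog", "NN"), ("barked", "VBD")]))
(S (NP the/DT big/JJ dog/NN) barked/VBD)
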
RegexpChunkRules
----------------
A ``RegexpChunkRule`` is a transformational rule that updates the
chunking of a text by modifying its ``ChunkString``. Each
``RegexpChunkRule`` defines the ``apply()`` method, which modifies
the chunking encoded by a ``ChunkString``. The
``RegexpChunkRule`` class itself can be used to implement any
transformational rule based on regular expressions. There are
also a number of subclasses, which can be used to implement
simpler types of rules:
- ``ChunkRule`` chunks anything that matches a given regular
expression.
- ``StripRule`` strips anything that matches a given regular
expression.
- ``UnChunkRule`` will un-chunk any chunk that matches a given
regular expression.
- ``MergeRule`` can be used to merge two contiguous chunks.
- ``SplitRule`` can be used to split a single chunk into two
smaller chunks.
- ``ExpandLeftRule`` will expand a chunk to incorporate new
unchunked material on the left.
- ``ExpandRightRule`` will expand a chunk to incorporate new
unchunked material on the right.
Tag Patterns
~~~~~~~~~~~~
A ``RegexpChunkRule`` uses a modified version of regular
expression patterns, called "tag patterns". Tag patterns are
used to match sequences of tags. Examples of tag patterns are::

    r'(<DT>|<JJ>|<NN>)+'
    r'<NN>+'
    r'<NN.*>'

The differences between regular expression patterns and tag
patterns are:
- In tag patterns, ``'<'`` and ``'>'`` act as parentheses; so
``'<NN>+'`` matches one or more repetitions of ``'<NN>'``, not
``'<NN'`` followed by one or more repetitions of ``'>'``.
- Whitespace in tag patterns is ignored. So
``'<DT> | <NN>'`` is equivalent to ``'<DT>|<NN>'``.
- In tag patterns, ``'.'`` is equivalent to ``'[^{}<>]'``; so
``'<NN.*>'`` matches any single tag starting with ``'NN'``.
The function ``tag_pattern2re_pattern`` can be used to transform
a tag pattern to an equivalent regular expression pattern.
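
In a grammar string passed to ``RegexpParser``, tag patterns appear inside
the braces that mark a chunk rule. A minimal sketch (the grammar below is
illustrative only):

>>> from nltk.chunk import RegexpParser
>>> cp = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
>>> print(cp.parse([("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD")]))
(S (NP the/DT little/JJ cat/NN) sat/VBD)

``tag_pattern2re_pattern`` can also be called directly; here its result is
simply bound to a name rather than echoed:

>>> from nltk.chunk.regexp import tag_pattern2re_pattern
>>> pattern = tag_pattern2re_pattern("<NN.*>+")
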
Efficiency
----------
Preliminary tests indicate that ``RegexpChunkParser`` can chunk at a
rate of about 300 tokens/second, with a moderately complex rule set.
There may be problems if ``RegexpChunkParser`` is used with more than
5,000 tokens at a time. In particular, evaluation of some regular
expressions may cause the Python regular expression engine to
exceed its maximum recursion depth. We have attempted to minimize
these problems, but it is impossible to avoid them completely. We
therefore recommend that you apply the chunk parser to a single
sentence at a time.
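
For example, a tagged corpus would typically be chunked one sentence at a
time (a sketch; ``tagged_sentences`` stands for any list of tagged
sentences and is not defined in this module):

>>> from nltk.chunk import RegexpParser
>>> cp = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
>>> chunked = [cp.parse(sent) for sent in tagged_sentences]  # doctest: +SKIP
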
Emacs Tip
---------
If you evaluate the following elisp expression in emacs, it will
colorize a ``ChunkString`` when you use an interactive python shell
with emacs or xemacs ("C-c !")::

    (let ()
      (defconst comint-mode-font-lock-keywords
        '(("<[^>]+>" 0 'font-lock-reference-face)
          ("[{}]" 0 'font-lock-function-name-face)))
      (add-hook 'comint-mode-hook (lambda () (turn-on-font-lock))))

You can evaluate this code by copying it to a temporary buffer,
placing the cursor after the last close parenthesis, and typing
"``C-x C-e``". You should evaluate it before running the interactive
session. The change will last until you close emacs.
Unresolved Issues
-----------------
If we use the ``re`` module for regular expressions, Python's
regular expression engine generates "maximum recursion depth
exceeded" errors when processing very large texts, even for
regular expressions that should not require any recursion. We
therefore use the ``pre`` module instead. But note that ``pre``
does not include Unicode support, so this module will not work
with unicode strings. Note also that ``pre`` regular expressions
are not quite as advanced as ``re`` ones (e.g., no leftward
zero-length assertions).
:type CHUNK_TAG_PATTERN: regexp
:var CHUNK_TAG_PATTERN: A regular expression to test whether a tag
pattern is valid.
"""
from nltk.chunk.api import ChunkParserI
from nltk.chunk.named_entity import Maxent_NE_Chunker
from nltk.chunk.regexp import RegexpChunkParser, RegexpParser
from nltk.chunk.util import (
ChunkScore,
accuracy,
conllstr2tree,
conlltags2tree,
ieerstr2tree,
tagstr2tree,
tree2conllstr,
tree2conlltags,
)
def ne_chunker(fmt="multiclass"):
"""
Load NLTK's currently recommended named entity chunker.
"""
return Maxent_NE_Chunker(fmt)
def ne_chunk(tagged_tokens, binary=False):
"""
Use NLTK's currently recommended named entity chunker to
chunk the given list of tagged tokens.
>>> from nltk.chunk import ne_chunk
>>> from nltk.corpus import treebank
>>> from pprint import pprint
>>> pprint(ne_chunk(treebank.tagged_sents()[2][8:14])) # doctest: +NORMALIZE_WHITESPACE
Tree('S', [('chairman', 'NN'), ('of', 'IN'), Tree('ORGANIZATION', [('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP')]), ('PLC', 'NNP')])
"""
if binary:
chunker = ne_chunker(fmt="binary")
else:
chunker = ne_chunker()
return chunker.parse(tagged_tokens)
def ne_chunk_sents(tagged_sentences, binary=False):
"""
Use NLTK's currently recommended named entity chunker to chunk the
given list of tagged sentences, each consisting of a list of tagged tokens.
"""
if binary:
chunker = ne_chunker(fmt="binary")
else:
chunker = ne_chunker()
return chunker.parse_sents(tagged_sentences)


@@ -0,0 +1,56 @@
# Natural Language Toolkit: Chunk parsing API
#
# Copyright (C) 2001-2025 NLTK Project
# Author: Edward Loper <edloper@gmail.com>
# Steven Bird <stevenbird1@gmail.com> (minor additions)
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
##//////////////////////////////////////////////////////
## Chunk Parser Interface
##//////////////////////////////////////////////////////
from nltk.chunk.util import ChunkScore
from nltk.internals import deprecated
from nltk.parse import ParserI
class ChunkParserI(ParserI):
"""
A processing interface for identifying non-overlapping groups in
unrestricted text. Typically, chunk parsers are used to find base
syntactic constituents, such as base noun phrases. Unlike
``ParserI``, ``ChunkParserI`` guarantees that the ``parse()`` method
will always generate a parse.
"""
def parse(self, tokens):
"""
Return the best chunk structure for the given tokens,
represented as a tree.
:param tokens: The list of (word, tag) tokens to be chunked.
:type tokens: list(tuple)
:rtype: Tree
"""
raise NotImplementedError()
@deprecated("Use accuracy(gold) instead.")
def evaluate(self, gold):
return self.accuracy(gold)
def accuracy(self, gold):
"""
Score the accuracy of the chunker against the gold standard.
Remove the chunking from the gold standard text, rechunk it using
the chunker, and return a ``ChunkScore`` object
reflecting the performance of this chunk parser.
:type gold: list(Tree)
:param gold: The list of chunked sentences to score the chunker on.
:rtype: ChunkScore
"""
chunkscore = ChunkScore()
for correct in gold:
chunkscore.score(correct, self.parse(correct.leaves()))
return chunkscore
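
# A minimal illustration of the interface (a hypothetical subclass, not part
# of the API): put every noun into its own NP chunk and leave all other
# tokens unchunked.  A real chunker would use a more informative strategy.
#
#     from nltk.tree import Tree
#
#     class EveryNounChunker(ChunkParserI):
#         def parse(self, tokens):
#             return Tree("S", [Tree("NP", [tok]) if tok[1].startswith("NN") else tok
#                               for tok in tokens])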


@@ -0,0 +1,407 @@
# Natural Language Toolkit: Named entity chunker
#
# Copyright (C) 2001-2025 NLTK Project
# Author: Edward Loper <edloper@gmail.com>
# Eric Kafe <kafe.eric@gmail.com> (tab-format models)
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
"""
Named entity chunker
"""
import os
import re
from xml.etree import ElementTree as ET
from nltk.tag import ClassifierBasedTagger, pos_tag
try:
from nltk.classify import MaxentClassifier
except ImportError:
pass
from nltk.chunk.api import ChunkParserI
from nltk.chunk.util import ChunkScore
from nltk.data import find
from nltk.tokenize import word_tokenize
from nltk.tree import Tree
class NEChunkParserTagger(ClassifierBasedTagger):
"""
The IOB tagger used by the chunk parser.
"""
def __init__(self, train=None, classifier=None):
ClassifierBasedTagger.__init__(
self,
train=train,
classifier_builder=self._classifier_builder,
classifier=classifier,
)
def _classifier_builder(self, train):
return MaxentClassifier.train(
# "megam" cannot be the default algorithm since it requires compiling with ocaml
train,
algorithm="iis",
gaussian_prior_sigma=1,
trace=2,
)
def _english_wordlist(self):
try:
wl = self._en_wordlist
except AttributeError:
from nltk.corpus import words
self._en_wordlist = set(words.words("en-basic"))
wl = self._en_wordlist
return wl
def _feature_detector(self, tokens, index, history):
word = tokens[index][0]
pos = simplify_pos(tokens[index][1])
if index == 0:
prevword = prevprevword = None
prevpos = prevprevpos = None
prevshape = prevtag = prevprevtag = None
elif index == 1:
prevword = tokens[index - 1][0].lower()
prevprevword = None
prevpos = simplify_pos(tokens[index - 1][1])
prevprevpos = None
prevtag = history[index - 1][0]
prevshape = prevprevtag = None
else:
prevword = tokens[index - 1][0].lower()
prevprevword = tokens[index - 2][0].lower()
prevpos = simplify_pos(tokens[index - 1][1])
prevprevpos = simplify_pos(tokens[index - 2][1])
prevtag = history[index - 1]
prevprevtag = history[index - 2]
prevshape = shape(prevword)
if index == len(tokens) - 1:
nextword = nextnextword = None
nextpos = nextnextpos = None
elif index == len(tokens) - 2:
nextword = tokens[index + 1][0].lower()
nextpos = tokens[index + 1][1].lower()
nextnextword = None
nextnextpos = None
else:
nextword = tokens[index + 1][0].lower()
nextpos = tokens[index + 1][1].lower()
nextnextword = tokens[index + 2][0].lower()
nextnextpos = tokens[index + 2][1].lower()
# 89.6
features = {
"bias": True,
"shape": shape(word),
"wordlen": len(word),
"prefix3": word[:3].lower(),
"suffix3": word[-3:].lower(),
"pos": pos,
"word": word,
"en-wordlist": (word in self._english_wordlist()),
"prevtag": prevtag,
"prevpos": prevpos,
"nextpos": nextpos,
"prevword": prevword,
"nextword": nextword,
"word+nextpos": f"{word.lower()}+{nextpos}",
"pos+prevtag": f"{pos}+{prevtag}",
"shape+prevtag": f"{prevshape}+{prevtag}",
}
return features
class NEChunkParser(ChunkParserI):
"""
Expected input: list of pos-tagged words
"""
def __init__(self, train):
self._train(train)
def parse(self, tokens):
"""
Each token should be a pos-tagged word
"""
tagged = self._tagger.tag(tokens)
tree = self._tagged_to_parse(tagged)
return tree
def _train(self, corpus):
# Convert to tagged sequence
corpus = [self._parse_to_tagged(s) for s in corpus]
self._tagger = NEChunkParserTagger(train=corpus)
def _tagged_to_parse(self, tagged_tokens):
"""
Convert a list of tagged tokens to a chunk-parse tree.
"""
sent = Tree("S", [])
for tok, tag in tagged_tokens:
if tag == "O":
sent.append(tok)
elif tag.startswith("B-"):
sent.append(Tree(tag[2:], [tok]))
elif tag.startswith("I-"):
if sent and isinstance(sent[-1], Tree) and sent[-1].label() == tag[2:]:
sent[-1].append(tok)
else:
sent.append(Tree(tag[2:], [tok]))
return sent
@staticmethod
def _parse_to_tagged(sent):
"""
Convert a chunk-parse tree to a list of tagged tokens.
"""
toks = []
for child in sent:
if isinstance(child, Tree):
if len(child) == 0:
print("Warning -- empty chunk in sentence")
continue
toks.append((child[0], f"B-{child.label()}"))
for tok in child[1:]:
toks.append((tok, f"I-{child.label()}"))
else:
toks.append((child, "O"))
return toks
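# Map a word to a coarse orthographic "shape" feature used by the tagger
# above: "number", "punct", "upcase" (titlecase), "downcase", "mixedcase",
# or "other".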
def shape(word):
if re.match(r"[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word, re.UNICODE):
return "number"
elif re.match(r"\W+$", word, re.UNICODE):
return "punct"
elif re.match(r"\w+$", word, re.UNICODE):
if word.istitle():
return "upcase"
elif word.islower():
return "downcase"
else:
return "mixedcase"
else:
return "other"
def simplify_pos(s):
if s.startswith("V"):
return "V"
else:
return s.split("-")[0]
def postag_tree(tree):
# Part-of-speech tagging.
words = tree.leaves()
tag_iter = (pos for (word, pos) in pos_tag(words))
newtree = Tree("S", [])
for child in tree:
if isinstance(child, Tree):
newtree.append(Tree(child.label(), []))
for subchild in child:
newtree[-1].append((subchild, next(tag_iter)))
else:
newtree.append((child, next(tag_iter)))
return newtree
def load_ace_data(roots, fmt="binary", skip_bnews=True):
for root in roots:
for root, dirs, files in os.walk(root):
if root.endswith("bnews") and skip_bnews:
continue
for f in files:
if f.endswith(".sgm"):
yield from load_ace_file(os.path.join(root, f), fmt)
def load_ace_file(textfile, fmt):
print(f" - {os.path.split(textfile)[1]}")
annfile = textfile + ".tmx.rdc.xml"
# Read the xml file, and get a list of entities
entities = []
with open(annfile) as infile:
xml = ET.parse(infile).getroot()
for entity in xml.findall("document/entity"):
typ = entity.find("entity_type").text
for mention in entity.findall("entity_mention"):
if mention.get("TYPE") != "NAME":
continue # only NEs
s = int(mention.find("head/charseq/start").text)
e = int(mention.find("head/charseq/end").text) + 1
entities.append((s, e, typ))
# Read the text file, and mark the entities.
with open(textfile) as infile:
text = infile.read()
# Strip XML tags, since they don't count towards the indices
text = re.sub("<(?!/?TEXT)[^>]+>", "", text)
# Blank out anything before/after <TEXT>
def subfunc(m):
return " " * (m.end() - m.start() - 6)
text = re.sub(r"[\s\S]*<TEXT>", subfunc, text)
text = re.sub(r"</TEXT>[\s\S]*", "", text)
# Simplify quotes
text = re.sub("``", ' "', text)
text = re.sub("''", '" ', text)
entity_types = {typ for (s, e, typ) in entities}
# Binary distinction (NE or not NE)
if fmt == "binary":
i = 0
toks = Tree("S", [])
for s, e, typ in sorted(entities):
if s < i:
s = i # Overlapping! Deal with this better?
if e <= s:
continue
toks.extend(word_tokenize(text[i:s]))
toks.append(Tree("NE", text[s:e].split()))
i = e
toks.extend(word_tokenize(text[i:]))
yield toks
# Multiclass distinction (NE type)
elif fmt == "multiclass":
i = 0
toks = Tree("S", [])
for s, e, typ in sorted(entities):
if s < i:
s = i # Overlapping! Deal with this better?
if e <= s:
continue
toks.extend(word_tokenize(text[i:s]))
toks.append(Tree(typ, text[s:e].split()))
i = e
toks.extend(word_tokenize(text[i:]))
yield toks
else:
raise ValueError("bad fmt value")
# This probably belongs in a more general-purpose location (as does
# the parse_to_tagged function).
def cmp_chunks(correct, guessed):
correct = NEChunkParser._parse_to_tagged(correct)
guessed = NEChunkParser._parse_to_tagged(guessed)
ellipsis = False
for (w, ct), (w, gt) in zip(correct, guessed):
if ct == gt == "O":
if not ellipsis:
print(f" {ct:15} {gt:15} {w}")
print(" {:15} {:15} {2}".format("...", "...", "..."))
ellipsis = True
else:
ellipsis = False
print(f" {ct:15} {gt:15} {w}")
# ======================================================================================
class Maxent_NE_Chunker(NEChunkParser):
"""
Expected input: list of pos-tagged words
"""
def __init__(self, fmt="multiclass"):
from nltk.data import find
self._fmt = fmt
self._tab_dir = find(f"chunkers/maxent_ne_chunker_tab/english_ace_{fmt}/")
self.load_params()
def load_params(self):
from nltk.classify.maxent import BinaryMaxentFeatureEncoding, load_maxent_params
wgt, mpg, lab, aon = load_maxent_params(self._tab_dir)
mc = MaxentClassifier(
BinaryMaxentFeatureEncoding(lab, mpg, alwayson_features=aon), wgt
)
self._tagger = NEChunkParserTagger(classifier=mc)
def save_params(self):
from nltk.classify.maxent import save_maxent_params
classif = self._tagger._classifier
ecg = classif._encoding
wgt = classif._weights
mpg = ecg._mapping
lab = ecg._labels
aon = ecg._alwayson
fmt = self._fmt
save_maxent_params(wgt, mpg, lab, aon, tab_dir=f"/tmp/english_ace_{fmt}/")
def build_model(fmt="multiclass"):
chunker = Maxent_NE_Chunker(fmt)
chunker.save_params()
return chunker
# ======================================================================================
"""
2024 update: pickles are not supported anymore.
Deprecated:
def build_model(fmt="binary"):
print("Loading training data...")
train_paths = [
find("corpora/ace_data/ace.dev"),
find("corpora/ace_data/ace.heldout"),
find("corpora/ace_data/bbn.dev"),
find("corpora/ace_data/muc.dev"),
]
train_trees = load_ace_data(train_paths, fmt)
train_data = [postag_tree(t) for t in train_trees]
print("Training...")
cp = NEChunkParser(train_data)
del train_data
print("Loading eval data...")
eval_paths = [find("corpora/ace_data/ace.eval")]
eval_trees = load_ace_data(eval_paths, fmt)
eval_data = [postag_tree(t) for t in eval_trees]
print("Evaluating...")
chunkscore = ChunkScore()
for i, correct in enumerate(eval_data):
guess = cp.parse(correct.leaves())
chunkscore.score(correct, guess)
if i < 3:
cmp_chunks(correct, guess)
print(chunkscore)
outfilename = f"/tmp/ne_chunker_{fmt}.pickle"
print(f"Saving chunker to {outfilename}...")
with open(outfilename, "wb") as outfile:
pickle.dump(cp, outfile, -1)
return cp
"""
if __name__ == "__main__":
# Make sure that the object has the right class name:
build_model("binary")
build_model("multiclass")

File diff suppressed because it is too large


@@ -0,0 +1,642 @@
# Natural Language Toolkit: Chunk format conversions
#
# Copyright (C) 2001-2025 NLTK Project
# Author: Edward Loper <edloper@gmail.com>
# Steven Bird <stevenbird1@gmail.com> (minor additions)
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
import re
from nltk.metrics import accuracy as _accuracy
from nltk.tag.mapping import map_tag
from nltk.tag.util import str2tuple
from nltk.tree import Tree
##//////////////////////////////////////////////////////
## EVALUATION
##//////////////////////////////////////////////////////
def accuracy(chunker, gold):
"""
Score the accuracy of the chunker against the gold standard.
Strip the chunk information from the gold standard and rechunk it using
the chunker, then compute the accuracy score.
:type chunker: ChunkParserI
:param chunker: The chunker being evaluated.
:type gold: list(Tree)
:param gold: The chunk structures to score the chunker on.
:rtype: float
"""
gold_tags = []
test_tags = []
for gold_tree in gold:
test_tree = chunker.parse(gold_tree.flatten())
gold_tags += tree2conlltags(gold_tree)
test_tags += tree2conlltags(test_tree)
# print 'GOLD:', gold_tags[:50]
# print 'TEST:', test_tags[:50]
return _accuracy(gold_tags, test_tags)
# Patched for increased performance by Yoav Goldberg <yoavg@cs.bgu.ac.il>, 2006-01-13
# -- statistics are evaluated only on demand, instead of at every sentence evaluation
#
# SB: use nltk.metrics for precision/recall scoring?
#
class ChunkScore:
"""
A utility class for scoring chunk parsers. ``ChunkScore`` can
evaluate a chunk parser's output, based on a number of statistics
(precision, recall, f-measure, missed chunks, incorrect chunks).
It can also combine the scores from the parsing of multiple texts;
this makes it significantly easier to evaluate a chunk parser that
operates one sentence at a time.
Texts are evaluated with the ``score`` method. The results of
evaluation can be accessed via a number of accessor methods, such
as ``precision`` and ``f_measure``. A typical use of the
``ChunkScore`` class is::
>>> chunkscore = ChunkScore() # doctest: +SKIP
>>> for correct in correct_sentences: # doctest: +SKIP
... guess = chunkparser.parse(correct.leaves()) # doctest: +SKIP
... chunkscore.score(correct, guess) # doctest: +SKIP
>>> print('F Measure:', chunkscore.f_measure()) # doctest: +SKIP
F Measure: 0.823
:ivar kwargs: Keyword arguments:
- max_tp_examples: The maximum number of actual examples of true
positives to record. This affects the ``correct`` member
function: ``correct`` will not return more than this number
of true positive examples. This does *not* affect any of
the numerical metrics (precision, recall, or f-measure)
- max_fp_examples: The maximum number of actual examples of false
positives to record. This affects the ``incorrect`` member
function and the ``guessed`` member function: ``incorrect``
will not return more than this number of examples, and
``guessed`` will not return more than this number of true
positive examples. This does *not* affect any of the
numerical metrics (precision, recall, or f-measure)
- max_fn_examples: The maximum number of actual examples of false
negatives to record. This affects the ``missed`` member
function and the ``correct`` member function: ``missed``
will not return more than this number of examples, and
``correct`` will not return more than this number of true
negative examples. This does *not* affect any of the
numerical metrics (precision, recall, or f-measure)
- chunk_label: A regular expression indicating which chunks
should be compared. Defaults to ``'.*'`` (i.e., all chunks).
:type _tp: list(Token)
:ivar _tp: List of true positives
:type _fp: list(Token)
:ivar _fp: List of false positives
:type _fn: list(Token)
:ivar _fn: List of false negatives
:type _tp_num: int
:ivar _tp_num: Number of true positives
:type _fp_num: int
:ivar _fp_num: Number of false positives
:type _fn_num: int
:ivar _fn_num: Number of false negatives.
"""
def __init__(self, **kwargs):
self._correct = set()
self._guessed = set()
self._tp = set()
self._fp = set()
self._fn = set()
self._max_tp = kwargs.get("max_tp_examples", 100)
self._max_fp = kwargs.get("max_fp_examples", 100)
self._max_fn = kwargs.get("max_fn_examples", 100)
self._chunk_label = kwargs.get("chunk_label", ".*")
self._tp_num = 0
self._fp_num = 0
self._fn_num = 0
self._count = 0
self._tags_correct = 0.0
self._tags_total = 0.0
self._measuresNeedUpdate = False
def _updateMeasures(self):
if self._measuresNeedUpdate:
self._tp = self._guessed & self._correct
self._fn = self._correct - self._guessed
self._fp = self._guessed - self._correct
self._tp_num = len(self._tp)
self._fp_num = len(self._fp)
self._fn_num = len(self._fn)
self._measuresNeedUpdate = False
def score(self, correct, guessed):
"""
Given a correctly chunked sentence, score another chunked
version of the same sentence.
:type correct: chunk structure
:param correct: The known-correct ("gold standard") chunked
sentence.
:type guessed: chunk structure
:param guessed: The chunked sentence to be scored.
"""
self._correct |= _chunksets(correct, self._count, self._chunk_label)
self._guessed |= _chunksets(guessed, self._count, self._chunk_label)
self._count += 1
self._measuresNeedUpdate = True
# Keep track of per-tag accuracy (if possible)
try:
correct_tags = tree2conlltags(correct)
guessed_tags = tree2conlltags(guessed)
except ValueError:
# This exception case is for nested chunk structures,
# where tree2conlltags will fail with a ValueError: "Tree
# is too deeply nested to be printed in CoNLL format."
correct_tags = guessed_tags = ()
self._tags_total += len(correct_tags)
self._tags_correct += sum(
1 for (t, g) in zip(guessed_tags, correct_tags) if t == g
)
def accuracy(self):
"""
Return the overall tag-based accuracy for all texts that have
been scored by this ``ChunkScore``, using the IOB (conll2000)
tag encoding.
:rtype: float
"""
if self._tags_total == 0:
return 1
return self._tags_correct / self._tags_total
def precision(self):
"""
Return the overall precision for all texts that have been
scored by this ``ChunkScore``.
:rtype: float
"""
self._updateMeasures()
div = self._tp_num + self._fp_num
if div == 0:
return 0
else:
return self._tp_num / div
def recall(self):
"""
Return the overall recall for all texts that have been
scored by this ``ChunkScore``.
:rtype: float
"""
self._updateMeasures()
div = self._tp_num + self._fn_num
if div == 0:
return 0
else:
return self._tp_num / div
def f_measure(self, alpha=0.5):
"""
Return the overall F measure for all texts that have been
scored by this ``ChunkScore``.
:param alpha: the relative weighting of precision and recall.
Larger alpha biases the score towards the precision value,
while smaller alpha biases the score towards the recall
value. ``alpha`` should have a value in the range [0,1].
:type alpha: float
:rtype: float
"""
self._updateMeasures()
p = self.precision()
r = self.recall()
if p == 0 or r == 0: # what if alpha is 0 or 1?
return 0
return 1 / (alpha / p + (1 - alpha) / r)
def missed(self):
"""
Return the chunks which were included in the
correct chunk structures, but not in the guessed chunk
structures, listed in input order.
:rtype: list of chunks
"""
self._updateMeasures()
chunks = list(self._fn)
return [c[1] for c in chunks] # discard position information
def incorrect(self):
"""
Return the chunks which were included in the guessed chunk structures,
but not in the correct chunk structures, listed in input order.
:rtype: list of chunks
"""
self._updateMeasures()
chunks = list(self._fp)
return [c[1] for c in chunks] # discard position information
def correct(self):
"""
Return the chunks which were included in the correct
chunk structures, listed in input order.
:rtype: list of chunks
"""
chunks = list(self._correct)
return [c[1] for c in chunks] # discard position information
def guessed(self):
"""
Return the chunks which were included in the guessed
chunk structures, listed in input order.
:rtype: list of chunks
"""
chunks = list(self._guessed)
return [c[1] for c in chunks] # discard position information
def __len__(self):
self._updateMeasures()
return self._tp_num + self._fn_num
def __repr__(self):
"""
Return a concise representation of this ``ChunkScore``.
:rtype: str
"""
return "<ChunkScoring of " + repr(len(self)) + " chunks>"
def __str__(self):
"""
Return a verbose representation of this ``ChunkScore``.
This representation includes the precision, recall, and
f-measure scores. For other information about the score,
use the accessor methods (e.g., ``missed()`` and ``incorrect()``).
:rtype: str
"""
return (
"ChunkParse score:\n"
+ f" IOB Accuracy: {self.accuracy() * 100:5.1f}%\n"
+ f" Precision: {self.precision() * 100:5.1f}%\n"
+ f" Recall: {self.recall() * 100:5.1f}%\n"
+ f" F-Measure: {self.f_measure() * 100:5.1f}%"
)
# extract chunks, and assign unique id, the absolute position of
# the first word of the chunk
def _chunksets(t, count, chunk_label):
pos = 0
chunks = []
for child in t:
if isinstance(child, Tree):
if re.match(chunk_label, child.label()):
chunks.append(((count, pos), child.freeze()))
pos += len(child.leaves())
else:
pos += 1
return set(chunks)
def tagstr2tree(
s, chunk_label="NP", root_label="S", sep="/", source_tagset=None, target_tagset=None
):
"""
Divide a string of bracketed tagged text into
chunks and unchunked tokens, and produce a Tree.
Chunks are marked by square brackets (``[...]``). Words are
delimited by whitespace, and each word should have the form
``text/tag``. Words that do not contain a slash are
assigned a ``tag`` of None.
:param s: The string to be converted
:type s: str
:param chunk_label: The label to use for chunk nodes
:type chunk_label: str
:param root_label: The label to use for the root of the tree
:type root_label: str
:rtype: Tree
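
For example, a short bracketed string converts as follows:

>>> from nltk.chunk.util import tagstr2tree
>>> print(tagstr2tree("[ the/DT dog/NN ] barked/VBD", chunk_label="NP"))
(S (NP the/DT dog/NN) barked/VBD)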
"""
WORD_OR_BRACKET = re.compile(r"\[|\]|[^\[\]\s]+")
stack = [Tree(root_label, [])]
for match in WORD_OR_BRACKET.finditer(s):
text = match.group()
if text[0] == "[":
if len(stack) != 1:
raise ValueError(f"Unexpected [ at char {match.start():d}")
chunk = Tree(chunk_label, [])
stack[-1].append(chunk)
stack.append(chunk)
elif text[0] == "]":
if len(stack) != 2:
raise ValueError(f"Unexpected ] at char {match.start():d}")
stack.pop()
else:
if sep is None:
stack[-1].append(text)
else:
word, tag = str2tuple(text, sep)
if source_tagset and target_tagset:
tag = map_tag(source_tagset, target_tagset, tag)
stack[-1].append((word, tag))
if len(stack) != 1:
raise ValueError(f"Expected ] at char {len(s):d}")
return stack[0]
### CONLL
_LINE_RE = re.compile(r"(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
def conllstr2tree(s, chunk_types=("NP", "PP", "VP"), root_label="S"):
"""
Return a chunk structure for a single sentence
encoded in the given CONLL 2000 style string.
This function converts a CoNLL IOB string into a tree.
It uses the specified chunk types
(defaults to NP, PP and VP), and creates a tree rooted at a node
labeled S (by default).
:param s: The CoNLL string to be converted.
:type s: str
:param chunk_types: The chunk types to be converted.
:type chunk_types: tuple
:param root_label: The node label to use for the root.
:type root_label: str
:rtype: Tree
"""
stack = [Tree(root_label, [])]
for lineno, line in enumerate(s.split("\n")):
if not line.strip():
continue
# Decode the line.
match = _LINE_RE.match(line)
if match is None:
raise ValueError(f"Error on line {lineno:d}")
(word, tag, state, chunk_type) = match.groups()
# If it's a chunk type we don't care about, treat it as O.
if chunk_types is not None and chunk_type not in chunk_types:
state = "O"
# For "Begin"/"Outside", finish any completed chunks -
# also do so for "Inside" which don't match the previous token.
mismatch_I = state == "I" and chunk_type != stack[-1].label()
if state in "BO" or mismatch_I:
if len(stack) == 2:
stack.pop()
# For "Begin", start a new chunk.
if state == "B" or mismatch_I:
chunk = Tree(chunk_type, [])
stack[-1].append(chunk)
stack.append(chunk)
# Add the new word token.
stack[-1].append((word, tag))
return stack[0]
def tree2conlltags(t):
"""
Return a list of 3-tuples containing ``(word, tag, IOB-tag)``.
Convert a tree to the CoNLL IOB tag format.
:param t: The tree to be converted.
:type t: Tree
:rtype: list(tuple)
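
For example, a small two-chunk tree converts as follows:

>>> from nltk.tree import Tree
>>> tree2conlltags(Tree("S", [Tree("NP", [("the", "DT"), ("dog", "NN")]), ("barked", "VBD")]))
[('the', 'DT', 'B-NP'), ('dog', 'NN', 'I-NP'), ('barked', 'VBD', 'O')]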
"""
tags = []
for child in t:
try:
category = child.label()
prefix = "B-"
for contents in child:
if isinstance(contents, Tree):
raise ValueError(
"Tree is too deeply nested to be printed in CoNLL format"
)
tags.append((contents[0], contents[1], prefix + category))
prefix = "I-"
except AttributeError:
tags.append((child[0], child[1], "O"))
return tags
def conlltags2tree(
sentence, chunk_types=("NP", "PP", "VP"), root_label="S", strict=False
):
"""
Convert the CoNLL IOB format to a tree.
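
For example, a small IOB-tagged sentence converts as follows:

>>> print(conlltags2tree([("the", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "O")]))
(S (NP the/DT dog/NN) barked/VBD)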
"""
tree = Tree(root_label, [])
for word, postag, chunktag in sentence:
if chunktag is None:
if strict:
raise ValueError("Bad conll tag sequence")
else:
# Treat as O
tree.append((word, postag))
elif chunktag.startswith("B-"):
tree.append(Tree(chunktag[2:], [(word, postag)]))
elif chunktag.startswith("I-"):
if (
len(tree) == 0
or not isinstance(tree[-1], Tree)
or tree[-1].label() != chunktag[2:]
):
if strict:
raise ValueError("Bad conll tag sequence")
else:
# Treat as B-*
tree.append(Tree(chunktag[2:], [(word, postag)]))
else:
tree[-1].append((word, postag))
elif chunktag == "O":
tree.append((word, postag))
else:
raise ValueError(f"Bad conll tag {chunktag!r}")
return tree
def tree2conllstr(t):
"""
Return a multiline string where each line contains a word, tag and IOB tag.
Convert a tree to the CoNLL IOB string format
:param t: The tree to be converted.
:type t: Tree
:rtype: str
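
For example, combined with ``conlltags2tree``:

>>> print(tree2conllstr(conlltags2tree([("the", "DT", "B-NP"), ("dog", "NN", "I-NP")])))
the DT B-NP
dog NN I-NP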
"""
lines = [" ".join(token) for token in tree2conlltags(t)]
return "\n".join(lines)
### IEER
_IEER_DOC_RE = re.compile(
r"<DOC>\s*"
r"(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?"
r"(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?"
r"(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?"
r"<BODY>\s*"
r"(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?"
r"<TEXT>(?P<text>.*?)</TEXT>\s*"
r"</BODY>\s*</DOC>\s*",
re.DOTALL,
)
_IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
def _ieer_read_text(s, root_label):
stack = [Tree(root_label, [])]
# s will be None if there is no headline in the text
# return the empty list in place of a Tree
if s is None:
return []
for piece_m in re.finditer(r"<[^>]+>|[^\s<]+", s):
piece = piece_m.group()
try:
if piece.startswith("<b_"):
m = _IEER_TYPE_RE.match(piece)
if m is None:
print("XXXX", piece)
chunk = Tree(m.group("type"), [])
stack[-1].append(chunk)
stack.append(chunk)
elif piece.startswith("<e_"):
stack.pop()
# elif piece.startswith('<'):
# print "ERROR:", piece
# raise ValueError # Unexpected HTML
else:
stack[-1].append(piece)
except (IndexError, ValueError) as e:
raise ValueError(
f"Bad IEER string (error at character {piece_m.start():d})"
) from e
if len(stack) != 1:
raise ValueError("Bad IEER string")
return stack[0]
def ieerstr2tree(
s,
chunk_types=[
"LOCATION",
"ORGANIZATION",
"PERSON",
"DURATION",
"DATE",
"CARDINAL",
"PERCENT",
"MONEY",
"MEASURE",
],
root_label="S",
):
"""
Return a chunk structure containing the chunked tagged text that is
encoded in the given IEER style string.
Convert a string of chunked tagged text in the IEER named
entity format into a chunk structure. Chunks are of several
types, LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL,
PERCENT, MONEY, and MEASURE.
:rtype: Tree
"""
# Try looking for a single document. If that doesn't work, then just
# treat everything as if it was within the <TEXT>...</TEXT>.
m = _IEER_DOC_RE.match(s)
if m:
return {
"text": _ieer_read_text(m.group("text"), root_label),
"docno": m.group("docno"),
"doctype": m.group("doctype"),
"date_time": m.group("date_time"),
#'headline': m.group('headline')
# we want to capture NEs in the headline too!
"headline": _ieer_read_text(m.group("headline"), root_label),
}
else:
return _ieer_read_text(s, root_label)
def demo():
s = "[ Pierre/NNP Vinken/NNP ] ,/, [ 61/CD years/NNS ] old/JJ ,/, will/MD join/VB [ the/DT board/NN ] ./."
import nltk
t = nltk.chunk.tagstr2tree(s, chunk_label="NP")
t.pprint()
print()
s = """
These DT B-NP
research NN I-NP
protocols NNS I-NP
offer VBP B-VP
to TO B-PP
the DT B-NP
patient NN I-NP
not RB O
only RB O
the DT B-NP
very RB I-NP
best JJS I-NP
therapy NN I-NP
which WDT B-NP
we PRP B-NP
have VBP B-VP
established VBN I-VP
today NN B-NP
but CC B-NP
also RB I-NP
the DT B-NP
hope NN I-NP
of IN B-PP
something NN B-NP
still RB B-ADJP
better JJR I-ADJP
. . O
"""
conll_tree = conllstr2tree(s, chunk_types=("NP", "PP"))
conll_tree.pprint()
# Demonstrate CoNLL output
print("CoNLL output:")
print(nltk.chunk.tree2conllstr(conll_tree))
print()
if __name__ == "__main__":
demo()