Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Tries with frequencies

This Python package has functions for creation and manipulation of Tries (Prefix trees) with frequencies.

The package provides Machine Learning (ML) functionalities, not "just" a Trie data structure.

This Python implementation closely follows the Mathematica implementation [AAp2].


Installation

From GitHub:

pip install -e git+https://github.com/antononcube/Python-packages.git#egg=TriesWithFrequencies-antononcube\&subdirectory=TriesWithFrequencies

From PyPI:

python3 -m pip install TriesWithFrequencies

Setup

from TriesWithFrequencies import *

Creation examples

In this section we show a few ways to create tries with frequencies.

Consider a trie (prefix tree) created over a list of words:

tr = trie_create_by_split( ["bar", "bark", "bars", "balm", "cert", "cell"] )
trie_form(tr)
TRIEROOT => 6.0
├─b => 4.0
│ └─a => 4.0
│   ├─r => 3.0
│   │ └─k => 1.0
│   │ └─s => 1.0
│   └─l => 1.0
│     └─m => 1.0
└─c => 2.0
  └─e => 2.0
    ├─r => 1.0
    │ └─t => 1.0
    └─l => 1.0
      └─l => 1.0

Here we convert the trie with frequencies above into a trie with probabilities:

ptr = trie_node_probabilities( tr )
trie_form(ptr)
TRIEROOT => 1.0
├─b => 0.6666666666666666
│ └─a => 1.0
│   ├─r => 0.75
│   │ ├─k => 0.3333333333333333
│   │ └─s => 0.3333333333333333
│   └─l => 0.25
│     └─m => 1.0
└─c => 0.3333333333333333
  └─e => 1.0
    ├─r => 0.5
    │ └─t => 1.0
    └─l => 0.5
      └─l => 1.0

Shrinking

Here we shrink the trie with probabilities above:

trie_form(trie_shrink(ptr))
TRIEROOT => 1.0
└─ba => 1.0
  └─r => 0.75
    └─k => 0.3333333333333333
    └─s => 0.3333333333333333
  └─lm => 1.0
└─ce => 1.0
  └─rt => 1.0
  └─ll => 1.0

Here we shrink the frequencies trie using a separator:

trie_form(trie_shrink(tr, sep="~"))
TRIEROOT => 6.0
└─b~a => 4.0
  └─r => 3.0
    └─k => 1.0
    └─s => 1.0
  └─l~m => 1.0
└─c~e => 2.0
  └─r~t => 1.0
  └─l~l => 1.0

Retrieval and sub-tries

Here we retrieve a sub-trie with a key:

trie_form(trie_sub_trie(tr, list("bar")))
r => 3.0
└─k => 1.0
└─s => 1.0

Classification

Create a trie:

words = [*(["bar"] * 6), *(["bark"] * 3), *(["bare"] * 2), *(["cam"] * 3), "came", *(["camelia"] * 4)]
tr = trie_create_by_split(words)
tr = trie_node_probabilities(tr)

Show node counts:

trie_node_counts(tr)
{'total': 13, 'internal': 10, 'leaves': 3}

Show the trie form:

trie_form(tr)
TRIEROOT => 1.0
├─b => 0.5789473684210527
│ └─a => 1.0
│   └─r => 1.0
│     ├─k => 0.2727272727272727
│     └─e => 0.18181818181818182
└─c => 0.42105263157894735
  └─a => 1.0
    └─m => 1.0
      └─e => 0.625
        └─l => 0.8
          └─i => 1.0
            └─a => 1.0

Classify with the letters of the word "cam":

trie_classify(tr, list("cam"), prop="Probabilities")
{'a': 0.5, 'm': 0.375, 'e': 0.12499999999999997}

References

Articles

[AA1] Anton Antonov, "Tries with frequencies for data mining", (2013), MathematicaForPrediction at WordPress.

[AA2] Anton Antonov, "Removal of sub-trees in tries", (2013), MathematicaForPrediction at WordPress.

[AA3] Anton Antonov, "Tries with frequencies in Java" (2017), MathematicaForPrediction at WordPress. GitHub Markdown.

[RAC1] Tib, "Day 10: My 10 commandments for Raku performances", (2020), Raku Advent Calendar.

[WK1] Wikipedia entry, Trie.

Packages

[AAp1] Anton Antonov, Tries with frequencies Mathematica Version 9.0 package, (2013), MathematicaForPrediction at GitHub.

[AAp2] Anton Antonov, Tries with frequencies Mathematica package, (2013-2018), MathematicaForPrediction at GitHub.

[AAp3] Anton Antonov, Tries with frequencies in Java, (2017), MathematicaForPrediction at GitHub.

[AAp4] Anton Antonov, Java tries with frequencies Mathematica package, (2017), MathematicaForPrediction at GitHub.

[AAp5] Anton Antonov, Java tries with frequencies Mathematica unit tests, (2017), MathematicaForPrediction at GitHub.

[AAp6] Anton Antonov, ML::TriesWithFrequencies Raku package, (2021), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, "Prefix Trees with Frequencies for Data Analysis and Machine Learning", (2017), Wolfram Technology Conference 2017, Wolfram channel at YouTube.