dhondta
diff --git a/README.md b/README.md
Lines changed: 31 additions & 12 deletions
diff --git a/codext/base/__init__.py b/codext/base/__init__.py
Lines changed: 0 additions & 1 deletion
diff --git a/codext/base/_base.py b/codext/base/_base.py
Lines changed: 51 additions & 29 deletions
diff --git a/codext/base/_base2n.py b/codext/base/_base2n.py
Lines changed: 9 additions & 16 deletions
@@ -211,14 +211,33 @@ o
 
 ## :page_with_curl: List of codecs
 
-#### BaseXX
-
-- [X] `baseN`: see [base encodings](https://python-codext.readthedocs.io/en/latest/enc/base.html) (incl [z]base32, 36, 45, 58, 62, 63, 64, [z]85, 91, 100, 122)
+#### [BaseXX](https://python-codext.readthedocs.io/en/latest/enc/base.html)
+
+- [X] `base1`: useless, but for the sake of completeness
+- [X] `base2`: simple conversion to binary (with a variant with a reversed alphabet)
+- [X] `base3`: conversion to ternary (with a variant with a reversed alphabet)
+- [X] `base4`: conversion to quaternary (with a variant with a reversed alphabet)
+- [X] `base8`: simple conversion to octal (with a variant with a reversed alphabet)
+- [X] `base10`: simple conversion to decimal
+- [X] `base16`: simple conversion to hexadecimal (with a variant holding an alphabet with digits and letters inverted)
+- [X] `base26`: conversion to alphabet letters
+- [X] `base32`: classical conversion according to the RFC4648 with all its variants ([zbase32](https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt), extended hexadecimal, [geohash](https://en.wikipedia.org/wiki/Geohash), [Crockford](https://www.crockford.com/base32.html))
+- [X] `base36`: [Base36](https://en.wikipedia.org/wiki/Base36) conversion to letters and digits (with a variant inverting both groups)
+- [X] `base45`: [Base45](https://datatracker.ietf.org/doc/html/draft-faltstrom-base45-04.txt) DRAFT algorithm (with a variant inverting letters and digits)
+- [X] `base58`: multiple versions of [Base58](https://en.bitcoinwiki.org/wiki/Base58) (bitcoin, flickr, ripple)
+- [X] `base62`: [Base62](https://en.wikipedia.org/wiki/Base62) conversion to lower- and uppercase letters and digits (with a variant with letters and digits inverted)
+- [X] `base63`: similar to `base62` with the "`_`" added
+- [X] `base64`: classical conversion according to RFC4648 with its variant URL (or *file*) (it also holds a variant with letters and digits inverted)
+- [X] `base67`: custom conversion using some more special characters (also with a variant with letters and digits inverted)
+- [X] `base85`: all variants of Base85 ([Ascii85](https://fr.wikipedia.org/wiki/Ascii85), [z85](https://rfc.zeromq.org/spec/32), [Adobe](https://dencode.com/string/ascii85), [(x)btoa](https://dencode.com/string/ascii85), [RFC1924](https://datatracker.ietf.org/doc/html/rfc1924), [XML](https://datatracker.ietf.org/doc/html/draft-kwiatkowski-base85-for-xml-00))
+- [X] `base91`: [Base91](http://base91.sourceforge.net) custom conversion
+- [X] `base100` (or *emoji*): [Base100](https://github.com/AdamNiederer/base100) custom conversion
+- [X] `base122`: [Base122](http://blog.kevinalbs.com/base122) custom conversion
 - [X] `base-genericN`: see [base encodings](https://python-codext.readthedocs.io/en/latest/enc/base.html) ; supports any possible base
 
 This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `base85` codec.
 
-#### Binary
+#### [Binary](https://python-codext.readthedocs.io/en/latest/enc/binary.html)
 
 - [X] `baudot`: supports CCITT-1, CCITT-2, EU/FR, ITA1, ITA2, MTK-2 (Python3 only), UK, ...
 - [X] `baudot-spaced`: variant of `baudot` ; groups of 5 bits are whitespace-separated
@@ -232,17 +251,17 @@ This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `ba
 - [X] `manchester-inverted`: variant of `manchester` ; XORes each bit of the input with `10`
 - [X] `rotateN`: rotates characters by the specified number of bits (*N* belongs to [1, 7] ; Python 3 only)
 
-#### Common
+#### [Common](https://python-codext.readthedocs.io/en/latest/enc/common.html)
 
 - [X] `a1z26`: keeps words whitespace-separated and uses a custom character separator
 - [X] `cases`: set of case-related encodings (including camel-, kebab-, lower-, pascal-, upper-, snake- and swap-case, slugify, capitalize, title)
-- [X] `dummy`: set of simple encodings (including replace, reverse, word-reverse, substitute and strip-spaces)
+- [X] `dummy`: set of simple encodings (including integer, replace, reverse, word-reverse, substitute and strip-spaces)
 - [X] `octal`: dummy octal conversion (converts to 3-digit groups)
 - [X] `octal-spaced`: variant of `octal` ; dummy octal conversion, handling whitespace separators
 - [X] `ordinal`: dummy character ordinals conversion (converts to 3-digit groups)
 - [X] `ordinal-spaced`: variant of `ordinal` ; dummy character ordinals conversion, handling whitespace separators
 
-#### Compression
+#### [Compression](https://python-codext.readthedocs.io/en/latest/enc/compressions.html)
 
 - [X] `gzip`: standard Gzip compression/decompression
 - [X] `lz77`: compresses the given data with the algorithm of Lempel and Ziv of 1977
@@ -253,7 +272,7 @@ This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `ba
 
 > :warning: Compression functions are of course definitely **NOT** encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
 
-#### Cryptography
+#### [Cryptography](https://python-codext.readthedocs.io/en/latest/enc/crypto.html)
 
 - [X] `affine`: aka Affine Cipher
 - [X] `atbash`: aka Atbash Cipher
@@ -268,7 +287,7 @@ This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `ba
 
 > :warning: Crypto functions are of course definitely **NOT** encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
 
-#### Hashing
+#### [Hashing](https://python-codext.readthedocs.io/en/latest/enc/hashing.html)
 
 - [X] `blake`: includes BLAKE2b and BLAKE2s (Python 3 only ; relies on `hashlib`)
 - [X] `checksums`: includes Adler32 and CRC32 (relies on `zlib`)
@@ -279,7 +298,7 @@ This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `ba
 
 > :warning: Hash functions are of course definitely **NOT** encoding functions ; they are implemented for convenience with the `.encode(...)` API from `codecs` and useful for chaining codecs.
 
-#### Languages
+#### [Languages](https://python-codext.readthedocs.io/en/latest/enc/languages.html)
 
 - [X] `braille`: well-known braille language (Python 3 only)
 - [X] `ipsum`: aka lorem ipsum
@@ -293,15 +312,15 @@ This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `ba
 - [X] `tap`: converts text to tap/knock code, commonly used by prisoners
 - [X] `tomtom`: similar to `morse`, using slashes and backslashes
 
-#### Others
+#### [Others](https://python-codext.readthedocs.io/en/latest/enc/others.html)
 
 - [X] `dna`: implements the 8 rules of DNA sequences (N belongs to [1,8])
 - [X] `html`: implements entities according to [this reference](https://dev.w3.org/html5/html-author/charref)
 - [X] `letter-indices`: encodes consonants and/or vowels with their corresponding indices
 - [X] `markdown`: unidirectional encoding from Markdown to HTML
 - [X] `url`: aka URL encoding
 
-#### Steganography
+#### [Steganography](https://python-codext.readthedocs.io/en/latest/enc/stegano.html)
 
 - [X] `hexagram`: uses Base64 and encodes the result to a charset of [I Ching hexagrams](https://en.wikipedia.org/wiki/Hexagram_%28I_Ching%29) (as implemented [here](https://github.com/qntm/hexagram-encode))
 - [X] `klopf`: aka Klopf code ; Polybius square with trivial alphabetical distribution
 
@@ -2,7 +2,6 @@
 from argparse import ArgumentParser, RawTextHelpFormatter
 from types import MethodType
 
-from .ascii85 import *
 from .base45 import *
 from .base85 import *
 from .base91 import *
 
@@ -10,9 +10,13 @@
 from types import FunctionType, MethodType
 
 from ..__common__ import *
+from ..__common__ import _set_exc
 from ..__info__ import __version__
 
 
+_set_exc("BaseError")
+_set_exc("BaseEncodeError")
+_set_exc("BaseDecodeError")
 """
 Curve fitting:
 
@@ -44,18 +48,7 @@
 [ 0.02827357  0.00510124 -0.99999984  0.01536941]
 """
 EXPANSION_FACTOR = lambda base: 0.02827357 / (base**0.00510124-0.99999984) + 0.01536941
-
-
-class BaseError(ValueError):
-    pass
-
-
-class BaseDecodeError(BaseError):
-    pass
-
-
-class BaseEncodeError(BaseError):
-    pass
+SIZE_LIMIT = 1024 * 1024 * 1024
 
 
 def _generate_charset(n):
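
As a quick sanity check on the fitted curve kept in this hunk, the sketch below re-evaluates the `EXPANSION_FACTOR` expression for two familiar bases and prints it next to the theoretical ratio `8 / log2(N)`. The function name `expansion_factor` and the choice of bases are illustrative assumptions, not part of the patch.

```python
from math import log


def expansion_factor(base):
    # Same fitted constants as EXPANSION_FACTOR in the hunk above.
    return 0.02827357 / (base ** 0.00510124 - 0.99999984) + 0.01536941


# Theoretical output/input size ratio for 8-bit input encoded in base N.
for base in (16, 64):
    theoretical = 8 / log(base, 2)
    print(base, round(expansion_factor(base), 4), round(theoretical, 4))
```

For these two bases the fitted value stays within about one percent of the theoretical ratio, which is consistent with it being used only as an estimate.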
@@ -95,22 +88,27 @@ def _get_charset(charset, p=""):
         except KeyError:
             pass
         # or handle [p]arameter as a pattern
-        default, n = None, None
+        default, n, best = None, None, None
         for pattern, cset in charset.items():
             n = len(cset)
-            if pattern == "":
+            if re.match(pattern, ""):
                 default = cset
                 continue
-            if re.match(pattern, p):
-                return cset
+            m = re.match(pattern, p)
+            if m:  # find the longest match from the patterns
+                s, e = m.span()
+                if e - s > len(best or ""):
+                    best = pattern
+        if best:
+            return charset[best]
         # special case: the given [p]arameter can be the charset itself if it has the right length
         p = re.sub(r"^[-_]+", "", p)
         if len(p) == n:
             return p
         # or simply rely on key ''
         if default is not None:
             return default
-    raise ValueError("Bad charset descriptor")
+    raise ValueError("Bad charset descriptor ('%s')" % p)
 
 
 # generic base en/decoding functions
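
The change to `_get_charset` replaces "return the first matching pattern" with "return the pattern with the longest match". A minimal standalone sketch of that selection rule follows; `pick_charset` and the sample patterns are hypothetical, and the special case where the parameter is itself a charset is left out.

```python
import re


def pick_charset(charsets, param):
    """Return the charset whose pattern matches the longest prefix of param."""
    default, best, best_len = None, None, 0
    for pattern, cset in charsets.items():
        if re.match(pattern, ""):
            # A pattern that matches the empty string acts as the default.
            default = cset
            continue
        m = re.match(pattern, param)
        if m and m.end() - m.start() > best_len:
            best, best_len = cset, m.end() - m.start()
    if best is not None:
        return best
    if default is not None:
        return default
    raise ValueError("Bad charset descriptor ('%s')" % param)


# Hypothetical descriptors: the second pattern is a prefix of the third.
charsets = {r"": "0123", r"[-_]inv": "3210", r"[-_]inv(erted)?-x": "abcd"}
print(pick_charset(charsets, "-inv-x"))
```

With first-match semantics, `"-inv-x"` would resolve to whichever of the two overlapping patterns is iterated first; with longest-match semantics it resolves to the more specific one regardless of order.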
@@ -123,6 +121,12 @@ def base_encode(input, charset, errors="strict", exc=BaseEncodeError):
     :param exc:     exception to be raised in case of error
     """
     i, n, r = input if isinstance(input, integer_types) else s2i(input), len(charset), ""
+    if n == 1:
+        if i > SIZE_LIMIT:
+            raise InputSizeLimitError("Input exceeded size limit")
+        return i * charset[0]
+    if n == 10:
+        return str(i) if charset == digits else "".join(charset[int(x)] for x in str(i))
     while i > 0:
         i, c = divmod(i, n)
         r = charset[c] + r
@@ -138,11 +142,15 @@ def base_decode(input, charset, errors="strict", exc=BaseDecodeError):
     :param exc:     exception to be raised in case of error
     """
     i, n, dec = 0, len(charset), lambda n: base_encode(n, [chr(x) for x in range(256)], errors, exc)
+    if n == 1:
+        return i2s(len(input))
+    if n == 10:
+        return i2s(int(input)) if charset == digits else "".join(str(charset.index(c)) for c in input)
     for k, c in enumerate(input):
         try:
             i = i * n + charset.index(c)
         except ValueError:
-            handle_error("base", errors, exc, decode=True)(c, k, dec(i))
+            handle_error("base", errors, exc, decode=True)(c, k, dec(i), "base%d" % n)
     return dec(i)
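
The two hunks above add shortcuts for a one-symbol charset (unary) and for the decimal charset before falling back to repeated division. The sketch below illustrates the same idea on plain integers; `int_to_base`, `base_to_int` and the small `SIZE_LIMIT` are assumptions made for the example, and the string/integer conversions (`s2i`, `i2s`) of the real module are not reproduced.

```python
from string import digits

SIZE_LIMIT = 1024  # deliberately small here; the patch uses 1024 * 1024 * 1024


def int_to_base(i, charset):
    n = len(charset)
    if n == 1:
        # Unary: the output length equals the value, so cap it.
        if i > SIZE_LIMIT:
            raise ValueError("Input exceeded size limit")
        return charset[0] * i
    if n == 10 and charset == digits:
        # Decimal with the standard alphabet: str() is enough.
        return str(i)
    r = ""
    while i > 0:
        # Generic case: peel off the least significant digit each round.
        i, c = divmod(i, n)
        r = charset[c] + r
    return r


def base_to_int(s, charset):
    n = len(charset)
    if n == 1:
        # Unary: the value is simply the number of symbols.
        return len(s)
    i = 0
    for c in s:
        i = i * n + charset.index(c)
    return i


print(int_to_base(5, "A"), int_to_base(255, "01"), base_to_int("ff", "0123456789abcdef"))
```

The unary shortcut is needed because the generic loop never terminates for `n == 1` (`divmod(i, 1)` returns `i` unchanged).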
 
 
@@ -162,15 +170,19 @@ def base(charset, pattern, pow2=False, encode_template=base_encode, decode_templ
         raise BaseError("Bad charset ; {} is not a power of 2".format(n))
 
     def encode(param="", *args):
-        a = _get_charset(charset, param)
+        a = _get_charset(charset, args[0] if len(args) > 0 and args[0] else param)
         def _encode(input, errors="strict"):
+            if len(input) == 0:
+                return "", 0
             return encode_template(input, a, errors), len(input)
         return _encode
 
     def decode(param="", *args):
-        a = _get_charset(charset, param)
+        a = _get_charset(charset, args[0] if len(args) > 0 and args[0] else param)
         sl, sc = "\n" not in a, "\n" not in a and not "\r" in a
         def _decode(input, errors="strict"):
+            if len(input) == 0:
+                return "", 0
             input = _stripl(input, sc, sl)
             return decode_template(input, a, errors), len(input)
         return _decode
@@ -205,10 +217,14 @@ def _decode(input, errors="strict"):
         expansion_factor=lambda f, n: (EXPANSION_FACTOR(int(n.split("-")[0][4:])), .05))
 
 
-def main(n, ref=None, alt=None, inv=True):
+def main(n, ref=None, alt=None, inv=True, swap=True):
     base = str(n) + ("-" + alt.lstrip("-") if alt else "")
     src = "The data are encoded as described for the base%(base)s alphabet in %(reference)s.\n" % \
           {'base': base, 'reference': "\n" + ref if len(ref) > 20 else ref} if ref else ""
+    text = "%(source)sWhen decoding, the input may contain newlines in addition to the bytes of the formal base" \
+           "%(base)s alphabet.  Use --ignore-garbage to attempt to recover from any other non-alphabet bytes in the" \
+           " encoded stream." % {'base': base, 'source': src}
+    text = "\n".join(x for x in wrap(text, 74))
     descr = """Usage: base%(base)s [OPTION]... [FILE]
 Base%(base)s encode or decode FILE, or standard input, to standard output.
 
@@ -217,20 +233,19 @@ def main(n, ref=None, alt=None, inv=True):
 Mandatory arguments to long options are mandatory for short options too.
   -d, --decode          decode data
   -i, --ignore-garbage  when decoding, ignore non-alphabet characters
-%(inv)s  -w, --wrap=COLS       wrap encoded lines after COLS character (default 76).
+%(inv)s%(swap)s  -w, --wrap=COLS       wrap encoded lines after COLS character (default 76).
                           Use 0 to disable line wrapping
 
       --help     display this help and exit
       --version  output version information and exit
 
-%(source)sWhen decoding, the input may contain newlines in addition to the bytes of
-the formal base%(base)s alphabet.  Use --ignore-garbage to attempt to recover
-from any other non-alphabet bytes in the encoded stream.
+%(text)s
 
 Report base%(base)s translation bugs to <https://github.com/dhondta/python-codext/issues/new>
 Full documentation at: <https://python-codext.readthedocs.io/en/latest/enc/base.html>
-""" % {'base': base, 'source': src,
-       'inv': ["", "  -I, --invert          invert charsets from the base alphabet (e.g. lower- and uppercase)\n"][inv]}
+""" % {'base': base, 'text': text,
+       'inv': ["", "  -I, --invert          invert charsets from the base alphabet (e.g. digits and letters)\n"][inv],
+       'swap': ["", "  -s, --swapcase        swap the case\n"][swap]}
 
     def _main():
         p = ArgumentParser(description=descr, formatter_class=RawTextHelpFormatter, add_help=False)
@@ -240,6 +255,8 @@ def _main():
         p.add_argument("-i", "--ignore-garbage", action="store_true")
         if inv:
             p.add_argument("-I", "--invert", action="store_true")
+        if swap:
+            p.add_argument("-s", "--swapcase", action="store_true")
         p.add_argument("-w", "--wrap", type=int, default=76)
         p.add_argument("--help", action="help")
         p.add_argument("--version", action="version")
@@ -249,14 +266,19 @@ def _main():
             args.wrap = 0
         args.invert = getattr(args, "invert", False)
         c, f = _input(args.file), [encode, decode][args.decode]
-        c = c.rstrip("\r\n") if isinstance(c, str) else c.rstrip(b"\r\n")
+        if swap and args.decode:
+            c = codecs.decode(c, "swapcase")
+        c = b(c).rstrip(b"\r\n")
         try:
             c = f(c, "base" + base + ["", "-inv"][getattr(args, "invert", False)],
                   ["strict", "ignore"][args.ignore_garbage])
         except Exception as err:
             print("%sbase%s: invalid input" % (getattr(err, "output", ""), base))
             return 1
-        for l in (wrap(ensure_str(c), args.wrap) if args.wrap > 0 else [ensure_str(c)]):
+        c = ensure_str(c)
+        if swap and not args.decode:
+            c = codecs.encode(c, "swapcase")
+        for l in (wrap(c, args.wrap) if args.wrap > 0 else [c]):
             print(l)
         return 0
     return _main
 
@@ -5,23 +5,16 @@
 from math import ceil, log
 
 from ..__common__ import *
-from ._base import base, _get_charset, BaseError
+from ..__common__ import _set_exc
+from ._base import base, _get_charset
 
 
 _bin = lambda x: bin(x if isinstance(x, int) else ord(x))
 
 
 # base en/decoding functions for N a power of 2
-class Base2NError(BaseError):
-    pass
-
-
-class Base2NDecodeError(BaseError):
-    pass
-
-
-class Base2NEncodeError(BaseError):
-    pass
+_set_exc("Base2NDecodeError")
+_set_exc("Base2NEncodeError")
 
 
 def base2n(charset, pattern=None, name=None, **kwargs):
@@ -35,13 +28,12 @@ def base2n(charset, pattern=None, name=None, **kwargs):
     base(charset, pattern, True, base2n_encode, base2n_decode, name, **kwargs)
 
 
-def base2n_encode(string, charset, errors="strict", exc=Base2NEncodeError):
+def base2n_encode(string, charset, errors="strict"):
     """ 8-bits characters to base-N encoding for N a power of 2.
     
     :param string:  string to be encoded
     :param charset: base-N characters set
     :param errors:  errors handling marker
-    :param exc:     exception to be raised in case of error
     """
     bs, r, n = "", "", len(charset)
     # find the number of bits for the given character set and the quantum
@@ -66,13 +58,12 @@ def base2n_encode(string, charset, errors="strict", exc=Base2NEncodeError):
     return r + int(l / nb_out - len(r)) * "="
 
 
-def base2n_decode(string, charset, errors="strict", exc=Base2NDecodeError):
+def base2n_decode(string, charset, errors="strict"):
     """ Base-N to 8-bits characters decoding for N a power of 2.
     
     :param string:  string to be decoded
     :param charset: base-N characters set
     :param errors:  errors handling marker
-    :param exc:     exception to be raised in case of error
     """
     bs, r, n = "", "", len(charset)
     # particular case: for hex, ensure the right case in the charset ; note that this way, if mixed cases are used, it
@@ -95,7 +86,9 @@ def base2n_decode(string, charset, errors="strict", exc=Base2NDecodeError):
                 bs += ("{:0>%d}" % nb_in).format(_bin(charset.index(c))[2:])
             except ValueError:
                 if errors == "strict":
-                    raise exc("'base' codec can't decode character '{}' in position {}".format(c, i))
+                    e = Base2NDecodeError("'base%d' codec can't decode character '%s' in position %d" % (n, c, i))
+                    e.__cause__ = e  # block exception chaining
+                    raise e
                 elif errors == "replace":
                     bs += "0" * nb_in
                 elif errors == "ignore":
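
The last hunk raises the decode error without the original `ValueError` attached to its traceback. In Python 3-only code the same effect is usually written with `raise ... from None`; the sketch below shows that form. `DecodeError` and `decode_symbol` are hypothetical stand-ins, and the patch's attribute assignment presumably exists to keep the module importable on Python 2, where the `from` syntax is not available.

```python
class DecodeError(ValueError):
    """Hypothetical stand-in for Base2NDecodeError."""


def decode_symbol(charset, c, pos):
    try:
        return charset.index(c)
    except ValueError:
        msg = "'base%d' codec can't decode character '%s' in position %d" % (len(charset), c, pos)
        # "from None" hides the inner ValueError from the reported traceback.
        raise DecodeError(msg) from None


try:
    decode_symbol("01", "2", 3)
except DecodeError as err:
    print(err)
```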