# -*- coding: utf-8; -*-
"""Lispy symbols and gensym for Python. Both are pickle-aware.

See:
    https://stackoverflow.com/questions/8846628/what-exactly-is-a-symbol-in-lisp-scheme
    https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node27.html
"""

__all__ = ["sym", "gensym", "Symbol"]

from weakref import WeakValueDictionary
import threading
import uuid

# Symbol registry. Used for tracking symbol object identities within the same process.
_symbols = WeakValueDictionary()
_symbols_update_lock = threading.Lock()

# Gensyms go into a separate registry, to make name conflicts with named symbols
# impossible, even if someone grabs one of the UUIDs and uses it as a name.
_gensyms = WeakValueDictionary()
_gensyms_update_lock = threading.Lock()


class Symbol:
    """Base class for lispy symbols.

    Meant only for `isinstance` checks that are intended to match both
    named symbols and gensyms.
    """
    pass


class sym(Symbol):
    """Interned symbol.

    In plain English: a lightweight, human-readable, process-wide unique marker
    that can be quickly compared to another such marker by object identity.

        name: any hashable, typically str.
            The human-readable name of the symbol. Maps to the object identity.

    Example::

        cat = sym("cat")
        assert cat is sym("cat")
        assert cat is not sym("dog")

    If you pass in the same `name` to the `sym` constructor, it gives you the
    same object instance each time (in the same process). Even unpickling an
    interned symbol that has the same name produces the same `sym` instance as
    any other `sym` with that name.

    This behaves like a Lisp symbol.

    (Technically, it's like a zen-minimalistic Scheme/Racket symbol, since
    Common Lisp stuffs all sorts of additional cruft in symbols. If you insist
    on emulating that, note a `sym` is just a Python object.)

    CAUTION: If you're familiar with JavaScript's `Symbol` and looking
    for that, see `gensym`.
    """
    def __new__(cls, name):  # This covers unpickling, too.
        # What we want to do:
        #   if name not in _symbols:
        #       _symbols[name] = super().__new__(cls)
        #   return _symbols[name]
        #
        # But because of weakrefs and thread-safety, we must:
        try:  # EAFP to eliminate TOCTTOU.
            return _symbols[name]
        except KeyError:
            # But we still need to be careful to avoid race conditions.
            with _symbols_update_lock:
                if name not in _symbols:
                    # We were the first thread to acquire the lock.
                    # Make a strong reference to keep the new instance alive until construction is done.
                    instance = _symbols[name] = super().__new__(cls)
                else:
                    # Some other thread acquired the lock before us, and created the instance.
                    instance = _symbols[name]
            return instance

    def __init__(self, name):
        self.name = name

    # Pickle support. The default `__setstate__` (writing to `self.__dict__`)
    # is fine, but we must pass args to `__new__` so it can find/allocate the
    # correct object instance.
    #
    # Note we don't `sys.intern` the name *strings*; if we did, we'd need a
    # custom `__setstate__` to redo that upon unpickling, since for `pickle`
    # a string is a string, whether the original was interned or not.
    def __getnewargs__(self):
        return (self.name,)

    def __str__(self):
        # `name` may be any hashable, but `__str__` must return a `str`.
        return str(self.name)

    def __repr__(self):
        return f'sym("{self.name}")'


# TODO: maybe get rid of the code duplication. But maybe no point.
class gsym(Symbol):
    """Uninterned symbol. These are generated by `gensym`.

        uid: UUID
            Maps to the object identity, so it must be unique.
            Use one of the functions from the `uuid` stdlib
            module to generate the UUID.

        label: str
            The human-readable label, shown in `str` and `repr`.
    """
    def __new__(cls, uid, label):
        try:
            return _gensyms[uid]
        except KeyError:
            with _gensyms_update_lock:
                if uid not in _gensyms:
                    instance = _gensyms[uid] = super().__new__(cls)
                else:
                    # Provided that one of the `uuid` functions creates the
                    # `uid`, this case should only be possible when unpickling
                    # the same pickled gsym in several threads simultaneously.
                    instance = _gensyms[uid]
            return instance

    def __init__(self, uid, label):
        self.uid = uid
        self.label = label

    def __getnewargs__(self):
        return (self.uid, self.label)

    def __str__(self):
        return f"gensym#{self.label}:{self.uid}"

    def __repr__(self):
        # Same argument order as the constructor: `gsym(uid, label)`.
        return f'gsym({repr(self.uid)}, "{self.label}")'


def gensym(label):
    """Create an uninterned symbol.

    The return value is the only time you'll see that symbol object; take good
    care of it!

    The label is a short human-readable description (like the `name` of a named
    symbol), but it has no relation to object identity. Object identity is
    tracked by a UUID, assigned when the gensym value is first created.

    Example::

        tabby = gensym("cat")
        scottishfold = gensym("cat")
        assert tabby is not scottishfold
        print(tabby)         # gensym#cat:81a44c53-fe2d-4e65-b2de-329076cc7755
        print(scottishfold)  # gensym#cat:94287f75-02b5-4138-9174-1e422e618d59

    Uninterned symbols are useful as unique nonce/sentinel values, like the
    pythonic idiom `nonce = object()`, but they come with a human-readable label.

    They also have a superpower: with the help of the UUID, they survive a
    pickle roundtrip with object identity intact. Unpickling the *same*
    gensym value multiple times in the same process will produce just one
    object instance. (If the original return value from gensym is still alive,
    it is the same object instance.)

    The UUID is generated with `uuid.uuid4`, which produces a random
    (version 4) UUID. Version 4 UUIDs carry no timestamp, so no time-field
    rollover applies; a collision between two gensyms is astronomically
    unlikely, though not strictly impossible. See:

        https://docs.python.org/3/library/uuid.html
        https://tools.ietf.org/html/rfc4122

    This is like Lisp's `gensym` and JavaScript's `Symbol`.

    If you're familiar with `mcpyrate`'s `gensym` or MacroPy's `gen_sym`, those
    mean something different. Their purpose is to create a lexical identifier
    that is not in use, whereas this `gensym` creates a symbol object for
    run-time use.
    """
    return gsym(uuid.uuid4(), label)
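

The interning pattern used by `sym` and `gsym` can be sketched in isolation. The following is a minimal, self-contained illustration, not part of this module: `Interned`, `_registry`, `_registry_lock`, and `rebuild_like_pickle` are hypothetical names, and `rebuild_like_pickle` only imitates by hand what unpickling does with `__getnewargs__` (call `__new__` with those arguments, then restore `__dict__`). It is meant to show why routing `__new__` through a registry keeps object identity intact across such a rebuild.

```python
from weakref import WeakValueDictionary
import threading

# Hypothetical stand-ins for the module-level registry and lock.
_registry = WeakValueDictionary()
_registry_lock = threading.Lock()


class Interned:
    """Illustrative stand-in for an interned symbol class."""

    def __new__(cls, name):
        try:  # fast path: the instance already exists
            return _registry[name]
        except KeyError:
            with _registry_lock:
                # Re-check under the lock; another thread may have won the race.
                instance = _registry.get(name)
                if instance is None:
                    instance = _registry[name] = super().__new__(cls)
            return instance

    def __init__(self, name):
        self.name = name

    def __getnewargs__(self):
        # Arguments that a rebuild passes to `__new__`.
        return (self.name,)


def rebuild_like_pickle(obj):
    """Imitate by hand how an object with `__getnewargs__` is rebuilt."""
    cls = type(obj)
    clone = cls.__new__(cls, *obj.__getnewargs__())  # looks up the registry
    clone.__dict__.update(obj.__dict__)              # restore instance state
    return clone


cat = Interned("cat")
assert Interned("cat") is cat
assert Interned("dog") is not cat
assert rebuild_like_pickle(cat) is cat
```

Because the registry holds only weak references, an entry disappears once no strong reference to its instance remains, so a later call with the same name would allocate a fresh object.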