Python

Python's strftime directives

Starters

Scientific Python Lectures - Tutorials on the scientific Python ecosystem: a quick introduction to central tools and techniques.

jpmorganchase/python-training

The Python Tutorial

An Effective Python Environment: Making Yourself at Home - Real Python

Scipy Lecture Notes

ipython-cookbook

Stop using utcnow and utcfromtimestamp - HN

Today I Scripted | Vincent D. Warmerdam

# /// script
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///

import requests
from rich.pretty import pprint

resp = requests.get("https://peps.python.org/api/peps.json")
data = resp.json()
pprint([(k, v["title"]) for k, v in data.items()][:10])

How to run Python in production

format:
	uv run autoflake --in-place -r --remove-all-unused-imports --remove-unused-variables .
	uv run autopep8 --recursive --in-place --select W292,W293,W391,E121,E122,E123,E126,E128,E129,E131,E202,E225,E226,E241,E301,E302,E303,E704,E731 .
	uv run ruff check --config pyproject.toml --fix .
	# Same line length as Black
	uv run isort --line-length 88 .

lint:
	uv run autoflake --check-diff -r --quiet \
		--remove-all-unused-imports --remove-unused-variables --remove-duplicate-keys .
	# W503 has been deprecated in favor of W504 - https://www.flake8rules.com/rules/W503.html
	uv run flake8 . --extend-exclude venv --count --show-source --statistics --max-line-length=88 --ignore=E501,W503
	# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
	uv run flake8 . --extend-exclude venv --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
	# Config file is specified for brevity
	uv run ruff check --config pyproject.toml .
	# Same line length as Black
	uv run isort --check --diff --line-length 88 .
	uv run pylint --rcfile=../../.pylintrc --output-format=colorized .

What Data Professionals Need to Know about uv

Advanced

Python Design Patterns

Courses

Google Python course

Learn Python the Hard Way

Textbooks

Think Python - Allen Downey

Effective Python: 59 Specific Ways to Write Better Python - Brett Slatkin

Automate the Boring Stuff with Python - Al Sweigart

Fluent Python - Luciano Ramalho

Useful tools

The hand-picked selection of the best Python libraries and tools of 2024

argparse builder - a simple graphical interface for quick creation of the argparse commandline switches for your scripts

Blog posts

How to Write Beautiful Python Code With PEP 8 from Real Python.

How to set up a perfect Python project - Brendan Maginnis - text

Python’s Innards: Introduction

Code Examples

krother/advanced_python - github

krother/software-engineering-python - github

Talks

Raymond Hettinger - Dataclasses: The code generator to end all code generators - PyCon 2018 - youtube

Raymond Hettinger: Numerical Marvels Inside Python - Keynote | PyData Tel Aviv 2022 - youtube

How to Write Python Code Others Like to Use - Anna Tisch - Kiwi Pycon X

Top to down, left to right (Surprise talk) - James Powell

Jonas Neubert - What is a PLC and how do I talk Python to it? - PyCon 2019

Modern Python Developer's Toolkit - Sebastian Witowski

Road to Python 3 - Lisa Guo, Hui Ding Keynote PyCon 2017

Memory Management in Python - The Basics - PyCon 2016 - Nina Zakharenko

Untitled12.ipynb | PyData Eindhoven 2019 - Vincent D. Warmerdam

High Performance Data Processing in Python || Donald Whyte - youtube

  • how python and numpy works at memory level

Losing your Loops Fast Numerical Computing with NumPy - Jake VanderPlas

Ned Batchelder - Big-O: How Code Slows as Data Grows - PyCon 2018 - youtube

Brandon Rhodes: All Your Ducks In A Row: Data Structures in the Std Lib and Beyond - PyCon 2014 - youtube

Greg Ward - How to Write Reusable Code - PyCon 2015 - video

How to make a good library API PyCon 2017 - video

Jack Diederich - HOWTO Write a Function - youtube

Jack Diederich - Stop Writing Classes - video

Brett Slatkin - Refactoring Python: Why and how to restructure your code - PyCon 2016 - youtube

Alex Gaynor: Fast Python, Slow Python - PyCon 2014 - video

Transforming Code into Beautiful, Idiomatic Python - Raymond Hettinger - youtube

PyCon 2010: The Mighty Dictionary - youtube

Hidden Treasures in the Standard Library - video

Raymond Hettinger - Python's Class Development Toolkit - youtube

Raymond Hettinger - Modern solvers: Problems well-defined are problems solved - PyCon 2019 - youtube

Jeff Reback - What is the Future of Pandas - youtube

"Python Oddities Explained" - Trey Hunner (PyCon AU 2019)

Python is not block scoped

  • list comprehensions have their own scope
  • for loops don't have their own scope
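A minimal sketch of the two bullets above (the names `i`, `comp_var` and `leaked` are made up for illustration):

```python
# A for loop has no scope of its own: the loop variable survives the loop.
for i in range(3):
    pass
leaked = i  # still defined here, holds the last value

# A list comprehension runs in its own scope: its variable does not leak.
squares = [comp_var * comp_var for comp_var in range(3)]
try:
    comp_var
    comprehension_leaked = True
except NameError:
    comprehension_leaked = False
```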

Two ways to change things in Py

  • change variable (assignment)
  • change object (mutation)

Can read globals always

  • can't assign to globals in local scope (unless the name is declared with the global statement)

Scope matters with assignment, not with mutation
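A small sketch of the difference, assuming module-level code (the function names are hypothetical):

```python
counter = 0
items = []

def mutate():
    # Mutation: the name is only read, so the global list is found and changed.
    items.append(1)

def assign_without_global():
    # Assignment makes `counter` local to this function, so the read fails.
    try:
        counter += 1
    except UnboundLocalError:
        return "UnboundLocalError"
    return "ok"

def assign_with_global():
    # Declaring the name global lets the assignment rebind the module-level name.
    global counter
    counter += 1

mutate()
result = assign_without_global()
assign_with_global()
```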

Lists don't contain objects, they contain references to memory locations

x = []
x.append(x)

Variables don't contain objects, they are names that point to objects
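The same idea as a runnable sketch (the names `a`, `b` and `self_ref` are only for illustration):

```python
x = []
x.append(x)            # the list now holds a reference to itself
self_ref = x[0] is x   # the element is the very same object

a = [1, 2]
b = a                  # a second name for the same list, not a copy
b.append(3)            # visible through both names
```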

+= on lists & tuples

  • will not mutate tuples (instead create new object)
  • will mutate lists (same object mutated)
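A quick check of both bullets, using a second name to see whether the original object changed:

```python
lst = [1, 2]
lst_alias = lst
lst += [3]         # extends the existing list in place

tup = (1, 2)
tup_alias = tup
tup += (3,)        # builds a new tuple and rebinds the name `tup`
```

`lst_alias` sees the new item, `tup_alias` still points at the old two-item tuple.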

Duck typing

Guido van Rossum: Python | MIT Artificial Intelligence (AI) Podcast

Python: DevOps for Electrical Engineers

The best explanation of Python decorators I’ve ever seen. (An archived answer from StackOverflow.)

Thinking Recursively in Python article

7 Habits to Improve The Performance of Python Programs - blog post

An A-Z of useful Python tricks - Medium

Elana Hashman - Teaching Python: The Hard Parts - PyCon 2016 - video

  • explain the inconsistent syntax - ie foo.len() versus len(foo)
  • start teaching testing early - teaching it later makes it sound not important
  • be aware of your own shortcomings
  • set reasonable expectations

Brandon Rhodes: All Your Ducks In A Row: Data Structures in the Std Lib and Beyond - PyCon 2014 - youtube

Talk ignores data compression (bus to ram) + memory hierarchy

Some data structures (linked lists, doubly-linked list, trees of all kinds) don't appear in Python because Python data structures don't contain data, they contain addresses

  • bit = 0 or 1
  • byte = 8 bits, can represent 0-255

Computer memory = array of bytes, named by integer addresses

RAM = parallel hardware that provides random access to different locations. The lookup runs in parallel across memory to find the wanted bytes, which then go onto the data bus

Address arithmetic (adding and subtracting) = handles the data structures at the machine level - records (sequence of fields in agreed order) and arrays

Records in Python = reference count (8 bytes) and then 8 bytes for address of the type (ie int, float etc)

String = variable length, six 8-byte fields and then the string data itself

Retrieve a record by adding record start address + field's offset

Unicode = up to 4 bytes per character

Record = addition, heterogeneous. Array = multiplication, homogeneous. Both give immediate access to the data you want

Python arrays

Array = everything in it is the same (same length). Given an array starting at address a whose items are each b bytes long, item i lives at a + b*i

struct can be built in python (C level structure). Can be useful for binary conversations with C libraries or I/O

Array useful for binary conversations with C or I/O
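A rough sketch of record and array layout using the stdlib struct and array modules. The field layout (an int followed by a double) is an arbitrary choice for illustration:

```python
import struct
from array import array

# A record: a 4-byte int followed by an 8-byte double, "<" means no padding.
record = struct.pack("<id", 7, 2.5)
count, value = struct.unpack("<id", record)

# Record = addition: the double starts at the size of the field before it.
(value_at_offset,) = struct.unpack_from("<d", record, struct.calcsize("<i"))

# Array = multiplication: every item has the same size,
# so item i sits at offset i * itemsize.
floats = array("d", [1.0, 2.0, 3.0])
raw = floats.tobytes()
i = 2
(third,) = struct.unpack_from("d", raw, i * floats.itemsize)
```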

Indexing is fast because of the use of arrays at the machine level

But accessing its items from Python requires repeated object building. To sum an array of 100 floats, Python needs to build >100 float objects

Numpy

Numpy to the rescue. Numpy supports math operations without building intermediate objects, they happen at the C level

Python can't use raw records or arrays because it is dynamic - the main data structures are general purpose

i.e. tuple of different types - array address math (a+bi) depends on every element being the same length (b)

Python tuple contains an array of addresses! Data structures without the data

Moving an item means only copying an address - there will only be one copy of an object, regardless of how many times it appears in an object
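A short sketch of "data structures without the data" (the names are made up):

```python
big = list(range(1000))

# Three slots in the tuple, but only one list object behind them.
t = (big, big, big)
same_object = all(item is big for item in t)

# "Moving" an item into another container copies only the reference.
moved = [t[0]]
```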

List = Python's most dangerous data structure

Lists are like tuples but they can grow. To grow, it might need to move - but Python objects can't change addresses - Python solves this by having an address for addresses

Python lists reserve extra room to avoid reallocation (which is expensive)

Thousand and million check - what is the cost of doing this operation a thousand times? And a million times?

Thousand -> million

  • O(n) = 3 zeros
  • O(n log n) = 4 zeros
  • O(n^2) = 6 zeros
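The arithmetic behind the check, as a sketch. It counts how many zeros the cost gains when n goes from a thousand to a million. The n log n figure comes out a little over 3, which the note above rounds up to 4:

```python
import math

small, large = 1_000, 1_000_000

# How many times more expensive the operation gets for each growth rate.
growth = {
    "O(n)": large / small,
    "O(n log n)": (large * math.log2(large)) / (small * math.log2(small)),
    "O(n^2)": large**2 / small**2,
}

# log10 of the growth factor = number of extra zeros on the cost.
zeros = {name: math.log10(factor) for name, factor in growth.items()}
```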

Append on a list asks for extra item slots (4, 8, 16, 25, 35, 46...) - this solves the problem. This is called amortization because it spreads the cost over time (like a mortgage). The cost is bumpy (i.e. you pay for one expensive append, but then the next n are free)
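One way to watch the over-allocation, assuming CPython (sys.getsizeof reports the allocated size there, and the exact growth pattern differs between versions):

```python
import sys

lst = []
sizes = []
for _ in range(64):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))

# The reported size only changes when the list reallocates,
# so there are far fewer distinct sizes than appends.
distinct_sizes = len(set(sizes))
```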

The list is a tradeoff - saving time by wasting space. Lists use on average 94% of their slots, and in exchange n appends take linear time overall

The problem is that some single item operations need to touch each element in the list. Appending is fast, inserting at the beginning is slow

Slicing - copy versus view

Normally a slice of size n costs n address copies (expensive)

np/pd slices are views (no data copying) -> fast
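Copy versus view using only the stdlib. memoryview stands in here for the NumPy/pandas view behaviour, which is an assumption made for illustration and not something from the talk:

```python
data = [1, 2, 3, 4]
copy = data[1:3]      # a list slice copies the references into a new list
copy[0] = 99          # the original list is unaffected

buf = bytearray(b"abcd")
view = memoryview(buf)[1:3]   # a memoryview slice is a view onto the same bytes
view[0] = ord("X")            # writes through to buf
```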

Dicts

Dicts - behind each dict is an array where keys are stored at indexes according to their hash value (an integer). Each slot must store both the key and value (hash, key address, value address)

Dict grows by doubling or quadrupling - amortizes resize cost. Resizes at 2/3 full to avoid collisions - only ever 1/3 - 2/3 full

Given a key, a dict can hash it and jump right to its slot
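A toy version of the idea, not CPython's actual implementation: `ToyDict` is a hypothetical class with a fixed size, no resizing and simple linear probing, so it must never be filled completely:

```python
class ToyDict:
    """Toy open-addressing hash table (illustration only)."""

    def __init__(self, size=8):
        # Each slot is None or a (hash, key, value) triple.
        self.slots = [None] * size

    def _find(self, key):
        h = hash(key)
        i = h % len(self.slots)       # jump straight to the slot for this hash
        # On a collision, step forward until the key or an empty slot is found.
        while self.slots[i] is not None and self.slots[i][1] != key:
            i = (i + 1) % len(self.slots)
        return h, i

    def __setitem__(self, key, value):
        h, i = self._find(key)
        self.slots[i] = (h, key, value)

    def __getitem__(self, key):
        _, i = self._find(key)
        if self.slots[i] is None:
            raise KeyError(key)
        return self.slots[i][2]


d = ToyDict()
d["a"] = 1
d["b"] = 2
d["a"] = 3   # overwrites the existing slot for "a"
```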

All dict operations are fast and therefore safe for beginners

Because a dict assigns keys to array indices by hash, it iterated in arbitrary order at the time of the talk. Since Python 3.7 dicts preserve insertion order

Set = dict with only keys (no values)

Classes are implemented with dictionaries - each attribute is a key in a hidden dictionary. Classes that pre-specify slots are implemented as a struct

bisect = to get all numbers greater than or less than a number (uses binary search)
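For example, on an already sorted list (the values are made up):

```python
import bisect

scores = [10, 20, 20, 30, 40]   # bisect requires sorted input
cut = 20

# Everything strictly below / strictly above the cut, found by binary search.
less_than = scores[:bisect.bisect_left(scores, cut)]
greater_than = scores[bisect.bisect_right(scores, cut):]
```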

Deque is used in the multiprocessing Queue

heapq allows fetching top t of n items in O(t+n) time

nlargest() and nsmallest()
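Both functions live in heapq. A minimal example:

```python
import heapq

values = [5, 1, 9, 3, 7, 2]

top3 = heapq.nlargest(3, values)      # largest first
bottom2 = heapq.nsmallest(2, values)  # smallest first
```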

Raymond Hettinger - Python's Class Development Toolkit - youtube

Document as you go

You are never as agile as you are before you start writing code

Inheriting from object - get extra capability

Only instance variables should be instance variables. Use class variable to share between all instances

__init__ is not a constructor.

Self = the instance, it is already made once your init is called

Takes an existing instance and populates it
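A sketch showing that the instance exists before __init__ runs. The `created_by_new` and `seen_in_init` attributes are hypothetical markers added only to make the order visible:

```python
class Circle:
    def __new__(cls, radius):
        # __new__ is what actually creates the instance.
        instance = super().__new__(cls)
        instance.created_by_new = True
        return instance

    def __init__(self, radius):
        # self already exists here; __init__ only populates it.
        self.seen_in_init = getattr(self, "created_by_new", False)
        self.radius = radius


c = Circle(2)
```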

YAGNI = you ain't gonna need it - don't add features before you know whether you need them

Start with minimum possible

Too many features is a bad thing

People who get good at one language carry its rules over into another language (i.e. private variables from Java into Python)

Converter and adapter function

Constructor war = want to construct the class using different inputs (ie from a radius or from a bounding box)

Answer = offer different methods

Use a class method as an alternative constructor

Problem here is that this method will always return a Circle (never a Tire) - need to think about subclass

Class Tire(Circle) - works by using cls (supports subclassing)
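A sketch of the alternative constructor, assuming a bounding box described by its diagonal (the method name `from_bbd` and the formula are illustrative):

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    @classmethod
    def from_bbd(cls, bbd):
        """Build a circle from a bounding box diagonal."""
        radius = bbd / 2.0 / math.sqrt(2.0)
        # Using cls (not Circle) means subclasses get instances of themselves.
        return cls(radius)


class Tire(Circle):
    pass


t = Tire.from_bbd(45)
```

Hard-coding `Circle(radius)` in the method would return a Circle even when called on Tire.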

Static method = to attach methods to classes (improves findability and ensure function used in the correct context)
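For instance, a helper that needs neither the instance nor the class can still live on the class (the `angle_to_grade` helper is an assumed example):

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    @staticmethod
    def angle_to_grade(angle):
        """Convert an angle in degrees to a percentage grade."""
        return math.tan(math.radians(angle)) * 100.0


# Callable without creating an instance.
grade = Circle.angle_to_grade(45)
```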

Put all the tools in the toolbox

All problems have easy to understand, simple and wrong answers

Keep a spare copy of a method by using a __ (double underscore) prefix. Class-local references

Use self.__method to make sure you refer to your own class's version (not your children's). Allows subclasses to override methods without breaking other methods in the parent class
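A sketch of a class-local reference. The 1.25 factor in Tire is an arbitrary illustrative number:

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # Class-local reference: always Circle's perimeter, even in subclasses.
        p = self.__perimeter()
        r = p / math.pi / 2.0
        return math.pi * r ** 2.0

    def perimeter(self):
        return 2.0 * math.pi * self.radius

    # Spare copy; name mangling stores it as _Circle__perimeter.
    __perimeter = perimeter


class Tire(Circle):
    def perimeter(self):
        # Overriding this must not change how area() is computed.
        return Circle.perimeter(self) * 1.25


t = Tire(1.0)
```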

@property - can be added later to attributes. Can’t do in a compiled language (but can in a dynamic language)
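A sketch of adding a property after the fact. Here it is assumed that the class switched to storing the diameter while callers keep using `radius` as a plain attribute:

```python
class Circle:
    def __init__(self, radius):
        self.diameter = radius * 2.0   # only the diameter is stored

    @property
    def radius(self):
        # Computed on access; callers still write c.radius
        return self.diameter / 2.0

    @radius.setter
    def radius(self, radius):
        self.diameter = radius * 2.0


c = Circle(3.0)
c.radius = 5.0   # goes through the setter
```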

Flyweight design pattern - deals with memory issues from many instances. Always save till last

__slots__

Suppresses the instance dict, saves memory

Slots aren't inherited - a subclass gets an instance dict again unless it defines its own __slots__
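A small check of both points (the class names are made up):

```python
class Plain:
    def __init__(self, x):
        self.x = x


class Slotted:
    __slots__ = ("x",)   # no per-instance __dict__ is created

    def __init__(self, x):
        self.x = x


class Child(Slotted):
    pass   # no __slots__ here, so instances get a __dict__ again


plain_has_dict = hasattr(Plain(1), "__dict__")
slotted_has_dict = hasattr(Slotted(1), "__dict__")
child_has_dict = hasattr(Child(1), "__dict__")
```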

Cost of cache miss as expensive as floating point divide (importance of memory)

Alex Gaynor: Fast Python, Slow Python - PyCon 2014 - video - review

Benchmarks are lies - impossible to reduce performance to a single number

Performance is about specialization

  • specialize the algorithm for the use case
  • specialize code to get it to run faster

Dynamically typed languages can't be optimized in the same way as static ones - but that doesn't mean there aren't ways to optimize them

  • slow versus hard to optimize

Allocations and copies make things slow

Dictionaries are not specialised. Classes are specialised.

Dicts

  • good for mapping things to other things
  • mapping of arbitrary keys to arbitrary values
  • not a replacement for classes

Object has a fixed set of properties. Object appears as a new thing to the interpreter (same as for named tuples)

  • use objects to represent things you have an understanding of

Myths - especially for other types of python (not cpython)

  • function calls are expensive
  • Only use built in data types
  • Don't write python like java/C

Memory Management in Python - The Basics - PyCon 2016 - Nina Zakharenko

Python has automatic garbage collection (C doesn't)

hex(id(o))

c style

  • fixed size bucket
  • same sized data
  • changing the value means that we change the data stored in that memory location

python

  • names = label for an object, objects can have lots of names
  • references = name or container that points at another object
  • objects
  • simple objects (numbers & strings) - store their own values
  • containers (dict, lists) & classes = store references to simple objs or other containers

reducing ref count

  • del statement = reduces ref count (doesn't delete obj)
  • changing the ref
  • var out of scope

global namespace = ref count never reaches 0!

sys.getsizeof(obj) # gives size in bytes

sys.getrefcount(object) # ref count
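A sketch of watching the count move, assuming CPython (the absolute numbers include the temporary reference made by the call itself, so only the differences are meaningful):

```python
import sys

data = [1, 2, 3]
size = sys.getsizeof(data)        # size of the list object in bytes

before = sys.getrefcount(data)
alias = data                      # a second name adds one reference
with_alias = sys.getrefcount(data)
del alias                         # del removes the name, not the object
after_del = sys.getrefcount(data)
```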

garbage collection = program automatically releasing memory

  • added space (storing ref count) & execution overhead (changing ref counts)

GIL = exists because we can't change ref counts concurrently

  • makes garbage collection fast & simple
  • but means only one thread can execute Python code at a time in a Python program

means we need to use multiprocessing

classes

print(obj.__dict__)

all objects have a type, a value and a ref count