Save Data in Python

Saving data is one of the most common tasks you will face in Python. Whether you are building a script that logs user actions, a web application that persists user sessions, or a data pipeline that outputs processed results, you need to choose how to store that data on disk. Python gives you several built-in options, and picking the right one depends on the kind of data you have and how you plan to use it later.
Text files work well for simple logs and configuration data. CSV files are the standard choice for tabular data that needs to be opened in spreadsheet tools. JSON handles structured data and integrates directly with web APIs. Pickle serializes arbitrary Python objects, including custom classes, directly to binary files. SQLite creates a lightweight relational database without needing a separate server process. Each approach has strengths and tradeoffs, and I will walk through all of them with working code so you know exactly when to use which.
TLDR
- Plain text: use open() with the with statement for simple string or log data
- CSV files: use the csv module for tabular data
- JSON: use json.dump() for structured data and API integration
- Binary serialization: use pickle for arbitrary Python objects
- Relational database: use sqlite3 for queryable, persistent structured data
- Use 'a' mode to append, 'w' mode to overwrite
- Use pathlib.Path for cross-platform file paths
- Always close files or use the with statement to avoid data loss
Saving to Text Files
The most basic way to save data in Python is writing plain text to a file. You need to open a file, write the content, and close it. The safest pattern uses the with statement, which guarantees the file gets closed even if your code throws an exception mid-write.
```python
# Writing plain text to a file
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Hello, world!\n")
    f.write("Line two of the file.\n")

# Reading it back
with open("output.txt", "r", encoding="utf-8") as f:
    content = f.read()
print(content)
```
The encoding="utf-8" argument matters on systems where the default encoding is not UTF-8, which is common on Windows. Without it, writing non-ASCII characters like accented letters or non-Latin scripts can silently corrupt your data or raise an exception on some environments.
Writing data line by line is more memory-efficient for large files because the entire file is not held in memory at once.
```python
# Writing a large file line by line
with open("log.txt", "w", encoding="utf-8") as f:
    for i in range(10000):
        f.write(f"Processing item {i}\n")
```
If you need to write non-string data, convert it to a string first using str() or f-strings.
```python
# Writing numbers and mixed types as text
data = [("Alice", 32, 72000), ("Bob", 45, 91000), ("Carol", 28, 58000)]
with open("employees.txt", "w", encoding="utf-8") as f:
    for name, age, salary in data:
        f.write(f"{name},{age},{salary}\n")

# Reading back and parsing
with open("employees.txt", "r", encoding="utf-8") as f:
    for line in f:
        name, age, salary = line.strip().split(",")
        print(f"{name} is {age} years old and earns ${salary}")
```
Saving to CSV
Text files with comma-separated values work for simple data, but the moment your data has multiple fields, special characters, or numeric values, the csv module is the better tool. It handles quoting, escaping, and line endings correctly across platforms, which plain string splitting does not.
```python
import csv

# Writing to a CSV file
data = [
    ["Name", "Age", "Department", "Salary"],
    ["Alice", 32, "Engineering", 72000],
    ["Bob", 45, "Marketing", 91000],
    ["Carol", 28, "Engineering", 58000],
    ["David", 38, "Sales", 68000],
]
with open("employees.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(data)
print("CSV file written successfully.")
```
The newline="" argument matters most on Windows. Without it, Excel and other spreadsheet tools may display every row double-spaced when they open your CSV. On Linux and macOS it is harmless, so include it whenever you write CSV files.
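The quoting and escaping that csv.writer performs is easy to verify without touching disk: io.StringIO stands in for a real file in this small sketch, and the example field values are made up for illustration.

```python
import csv
import io

# A field containing a comma and a quote would break naive string splitting
rows = [["Widget, deluxe", 'He said "hi"', 42]]

buf = io.StringIO()  # in-memory stand-in for a file
writer = csv.writer(buf)
writer.writerows(rows)
print(buf.getvalue())  # the csv module quotes and escapes the fields

# Reading it back recovers the original fields intact
buf.seek(0)
recovered = next(csv.reader(buf))
print(recovered)
```

Note that everything comes back as strings: the 42 is read as "42", so convert numeric columns yourself after reading.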
Reading the file back is straightforward with the csv.reader object.
```python
import csv

with open("employees.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    headers = next(reader)  # Read the header row
    print(f"Columns: {headers}")
    for row in reader:
        name, age, dept, salary = row
        print(f"{name} works in {dept} and is {age} years old.")
```
For more complex tabular data with named columns, use csv.DictWriter, or reach for pandas, which adds powerful data manipulation features on top of reading and writing CSV.
```python
# Using DictWriter for named columns
import csv

data = [
    {"Name": "Alice", "Age": 32, "Department": "Engineering", "Salary": 72000},
    {"Name": "Bob", "Age": 45, "Department": "Marketing", "Salary": 91000},
]
with open("employees_dict.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Name", "Age", "Department", "Salary"])
    writer.writeheader()
    writer.writerows(data)
print("Done.")
```
Saving to JSON
JSON is the standard format for structured data in web APIs and configuration files. Python’s json module handles conversion between Python objects and JSON text. The basic functions are json.dump() for writing directly to a file and json.dumps() for converting to a string first.
```python
import json

# Python dicts and lists map directly to JSON objects and arrays
config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "myapp"
    },
    "debug": False,
    "allowed_users": ["alice", "bob", "carol"]
}

# Write to a JSON file
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# Read it back
with open("config.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
print(f"Database host: {loaded['database']['host']}")
print(f"Allowed users: {loaded['allowed_users']}")
```
The indent=2 argument produces readable output with proper indentation. Without it, the entire JSON sits on one line, which is harder to debug but slightly more compact.
json.dumps() is useful when you need the JSON as a string, for example to send over a network or store in a database field.
```python
import json

data = {"message": "hello", "values": [1, 2, 3]}
json_string = json.dumps(data)
print(f"Type: {type(json_string)}, Value: {json_string}")
# Type: <class 'str'>, Value: {"message": "hello", "values": [1, 2, 3]}
```
If you already understand Python dictionaries and lists, JSON will feel immediately familiar because the structure maps one-to-one.
JSON has limitations. It cannot represent Python-specific types like datetime, bytes, or custom class instances directly. Dates should be converted to ISO 8601 strings before writing, and binary data should be encoded with base64.
```python
import json
from datetime import datetime

# Handling datetime objects by converting to ISO 8601 strings
records = [
    {"id": 1, "name": "Alice", "created": datetime.now().isoformat()},
    {"id": 2, "name": "Bob", "created": datetime(2025, 1, 15).isoformat()},
]
with open("records.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

# Reading back and parsing dates
with open("records.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
for record in loaded:
    record["created"] = datetime.fromisoformat(record["created"])
    print(f"{record['name']}: {record['created']}")
```
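Instead of converting every field by hand, json.dump() and json.dumps() accept a default= callback that the encoder invokes for any object it cannot serialize on its own. A minimal sketch (encode_extra is a name chosen for this example):

```python
import json
from datetime import datetime

record = {"id": 1, "created": datetime(2025, 1, 15, 9, 30)}

# Called only for objects json cannot serialize directly
def encode_extra(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Not JSON serializable: {type(obj).__name__}")

text = json.dumps(record, default=encode_extra)
print(text)
# {"id": 1, "created": "2025-01-15T09:30:00"}
```

Raising TypeError for unknown types keeps the encoder's normal error behavior for anything you have not explicitly handled.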
Saving to Binary with Pickle
JSON works for basic data types, but the moment you need to save a Python object that is not a dict, list, or primitive, JSON has no mechanism to represent it. That is where the pickle module comes in. Pickle serializes any Python object to a binary file and can reconstruct it exactly when loaded.
```python
import pickle

class Employee:
    def __init__(self, name, age, salary):
        self.name = name
        self.age = age
        self.salary = salary

    def __repr__(self):
        return f"Employee({self.name}, {self.age}, {self.salary})"

# Create a custom object
alice = Employee("Alice", 32, 72000)

# Pickle it to a file
with open("employee.pkl", "wb") as f:
    pickle.dump(alice, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back
with open("employee.pkl", "rb") as f:
    loaded_alice = pickle.load(f)
print(loaded_alice)
print(f"Salary: {loaded_alice.salary}")
print(isinstance(loaded_alice, Employee))
```
Pickle produces binary files, not human-readable text. Opening a .pkl file in a text editor shows gibberish. That is expected and fine.
The protocol parameter controls the pickle format version. Using HIGHEST_PROTOCOL produces the most compact and fastest serialization, but files written with a high protocol can only be read by Python 3 installations, not Python 2.
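The size difference between protocols is easy to measure with pickle.dumps(), which returns the serialized bytes without writing a file. A quick sketch comparing the oldest ASCII-based protocol with the highest one:

```python
import pickle

data = list(range(1000))

# Serialize the same object under two protocol versions
oldest = pickle.dumps(data, protocol=0)  # legacy ASCII-based format
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(f"Protocol 0: {len(oldest)} bytes")
print(f"Protocol {pickle.HIGHEST_PROTOCOL}: {len(newest)} bytes")

# Both round-trip to the same object
assert pickle.loads(newest) == data
```

The exact byte counts depend on your Python version, but the highest protocol is consistently the more compact of the two for payloads like this.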
You can pickle lists, dictionaries, and nested structures of custom objects as well.
```python
import pickle

class Order:
    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = items  # items is a list of product names

orders = [
    Order(1001, ["Widget A", "Widget B"]),
    Order(1002, ["Gadget X"]),
    Order(1003, ["Gizmo 1", "Gizmo 2", "Gizmo 3"]),
]
with open("orders.pkl", "wb") as f:
    pickle.dump(orders, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("orders.pkl", "rb") as f:
    loaded_orders = pickle.load(f)
for order in loaded_orders:
    print(f"Order {order.order_id}: {len(order.items)} items")
```
One critical security note: never unpickle data from untrusted sources. A malicious pickle file can execute arbitrary code when loaded. Only unpickle data you have written yourself or from sources you fully control.
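If you must load pickled data and want at least a guardrail, the pickle documentation describes subclassing pickle.Unpickler to restrict which globals can be resolved during loading. The sketch below follows that pattern; the whitelist and helper names are chosen for this example, and this narrows the attack surface rather than making untrusted input safe.

```python
import builtins
import io
import pickle

# Only these builtin names may be resolved while unpickling
SAFE_BUILTINS = {"list", "dict", "str", "int", "float", "tuple", "set"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"Blocked: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain builtin containers load fine...
print(restricted_loads(pickle.dumps({"ok": [1, 2, 3]})))

# ...but anything that references another class is rejected
import datetime
try:
    restricted_loads(pickle.dumps(datetime.datetime(2025, 1, 1)))
except pickle.UnpicklingError as e:
    print(e)
```

Even with this in place, the safest default stands: only unpickle data you wrote yourself.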
Saving to SQLite
SQLite is a self-contained relational database that stores data in a single .db or .sqlite file. It requires no server process and works out of the box with Python’s standard library. You get a real database with tables, columns, rows, and SQL queries.
Connect to a database with sqlite3.connect(), create a cursor from the connection, and execute SQL through the cursor. Call connection.commit() to finalize changes.
```python
import sqlite3

# Create a connection (the file is created if it does not exist)
conn = sqlite3.connect("app.db")
cursor = conn.cursor()

# Create a table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age INTEGER,
        department TEXT,
        salary REAL
    )
""")

# Insert rows
cursor.execute("INSERT INTO employees (name, age, department, salary) VALUES (?, ?, ?, ?)",
               ("Alice", 32, "Engineering", 72000))
cursor.execute("INSERT INTO employees (name, age, department, salary) VALUES (?, ?, ?, ?)",
               ("Bob", 45, "Marketing", 91000))
conn.commit()

# Query the data
cursor.execute("SELECT name, department, salary FROM employees WHERE department = ?",
               ("Engineering",))
for row in cursor.fetchall():
    print(f"{row[0]} in {row[1]} earns ${row[2]:.2f}")
conn.close()
```
The ? placeholders in the SQL prevent SQL injection attacks. Always use parameterized queries rather than string formatting when inserting user-provided data.
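For bulk inserts, cursor.executemany() applies one parameterized statement to a whole sequence of rows, which keeps the injection safety of placeholders without a Python-level loop. A small sketch using an in-memory database so it leaves no file behind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        salary REAL
    )
""")

# executemany runs the same statement once per parameter tuple
rows = [("Alice", 72000), ("Bob", 91000), ("Carol", 58000)]
cursor.executemany("INSERT INTO employees (name, salary) VALUES (?, ?)", rows)
conn.commit()

cursor.execute("SELECT COUNT(*) FROM employees")
count = cursor.fetchone()[0]
print(count)  # 3
conn.close()
```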
Reading data back uses the same connection pattern.
```python
import sqlite3

conn = sqlite3.connect("app.db")
cursor = conn.cursor()

# Get all employees with salary above a threshold
cursor.execute("SELECT name, salary FROM employees WHERE salary > ? ORDER BY salary DESC",
               (60000,))
results = cursor.fetchall()
print(f"Found {len(results)} employees:")
for name, salary in results:
    print(f"  {name}: ${salary:.2f}")
conn.close()
```
SQLite handles most operations correctly without explicit transactions, but wrapping multiple writes in an explicit transaction is faster because SQLite groups the writes into a single atomic operation.
```python
import sqlite3

conn = sqlite3.connect("app.db")
cursor = conn.cursor()
try:
    cursor.execute("BEGIN")
    cursor.execute("INSERT INTO employees (name, age, department, salary) VALUES (?, ?, ?, ?)",
                   ("Carol", 28, "HR", 62000))
    cursor.execute("INSERT INTO employees (name, age, department, salary) VALUES (?, ?, ?, ?)",
                   ("David", 41, "Finance", 85000))
    conn.commit()
    print("Transaction committed successfully.")
except Exception as e:
    conn.rollback()
    print(f"Error: {e}, transaction rolled back.")
finally:
    conn.close()
```
Append vs Write Mode
When you open a file, you choose what happens to existing content. Use 'w' to create a new file or truncate an existing one to zero bytes. Use 'a' to append to the end of an existing file without touching its current contents.
# Write mode: always starts fresh
with open("log.txt", "w") as f:
f.write("First run\n")
# File now contains "First run\n"
# Calling again with 'w' erases the previous content
with open("log.txt", "w") as f:
f.write("Second run\n")
# File now contains only "Second run\n"
# Append mode: adds to the end without removing existing content
with open("log.txt", "a") as f:
f.write("Third run\n")
with open("log.txt", "r") as f:
print(f.read())
# Output:
# Second run
# Third run
Append mode is the right choice for log files, activity feeds, and any data where new records should be added over time without erasing history. Write mode is correct when you are outputting a complete fresh result each time, for example writing a generated report or exporting processed data.
There is no built-in insert-in-the-middle mode for plain text files. If you need that, use a database or read the entire file, modify it in memory, and write it back.
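The read-modify-write approach looks like this in practice (notes.txt is a throwaway file used only for the example):

```python
from pathlib import Path

path = Path("notes.txt")
path.write_text("line 1\nline 3\n", encoding="utf-8")

# Read everything, modify the list in memory, then write the file back
lines = path.read_text(encoding="utf-8").splitlines()
lines.insert(1, "line 2")  # "insert in the middle"
path.write_text("\n".join(lines) + "\n", encoding="utf-8")

print(path.read_text(encoding="utf-8"))
# line 1
# line 2
# line 3
```

This is fine for small files; for large ones the whole file lands in memory, which is one more reason a database becomes attractive.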
File Paths and Cross-Platform Compatibility
Hardcoding file paths with forward slashes or backslashes breaks across operating systems. Python’s pathlib.Path handles this automatically and makes your code more readable.
```python
from pathlib import Path

# Build paths that work on any OS
data_dir = Path("output") / "data"
data_dir.mkdir(parents=True, exist_ok=True)  # Creates directories as needed

file_path = data_dir / "results.csv"
print(f"Writing to: {file_path}")

# Use the path object directly with open()
with open(file_path, "w", encoding="utf-8") as f:
    f.write("col1,col2,col3\n")
    f.write("a,b,c\n")

# List all files in the directory
for f in data_dir.iterdir():
    print(f"  {f.name}")
```
On Windows, Path("output/data/results.csv") renders with backslashes when converted to a string; on Linux and macOS the forward slashes pass through unchanged. Never use string concatenation to build file paths in production code.
```python
# Do this
from pathlib import Path
path = Path("output") / "reports" / "summary.csv"

# Never this
path = "output/" + "reports/" + "summary.csv"  # Fragile: separator handling is manual
path = "output\\reports\\summary.csv"          # Breaks on Linux and macOS
```
Common Pitfalls
Forgetting to Close Files
The biggest cause of data loss I see in practice is forgetting to close files after writing. If your program crashes before the buffer is flushed to disk, the file may be empty or partially written.
```python
# Bad: data may not be flushed if a crash happens before close()
f = open("data.txt", "w")
f.write("important data")
# If the program crashes here, the data can be lost
f.close()

# Good: the with statement guarantees the file is closed
with open("data.txt", "w") as f:
    f.write("important data")
# File is closed even if an exception is raised
```
The with statement is the correct pattern for all file operations in Python. Never use bare open() without it for write operations.
Encoding Issues
On systems configured with a legacy locale, opening files without specifying encoding can cause silent corruption or UnicodeEncodeError exceptions.
```python
# Always specify the encoding for text files
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("Unicode text: \u00e9 \u4e2d \u6587\n")

# This can fail on Windows systems that default to a legacy encoding
# with open("data.txt", "w") as f:  # Dangerous
#     f.write("Unicode text: \u00e9")
```
If you are processing files that use a different encoding, such as Windows-1252 or Shift-JIS, specify that encoding explicitly rather than relying on the system default.
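A minimal round trip makes the point, using cp1252 as the example legacy encoding and legacy.txt as a throwaway file name:

```python
# Write bytes as Windows-1252, then read them back by naming
# that encoding explicitly instead of trusting the system default
text = "café résumé"

with open("legacy.txt", "w", encoding="cp1252") as f:
    f.write(text)

with open("legacy.txt", "r", encoding="cp1252") as f:
    recovered = f.read()
print(recovered == text)  # True
```

If a file's encoding is uncertain and some corruption is acceptable, open() also accepts errors="replace" to substitute undecodable bytes instead of raising.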
Newline Characters
Different operating systems use different characters to represent line endings. Unix and modern macOS use \n, classic Mac OS used \r, and Windows traditionally uses \r\n. Python's text mode handles these conversions automatically, but mixing binary and text mode can cause problems.
```python
# For CSV files, always pass newline="" in text mode
import csv

with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Value"])
    writer.writerow(["Test", 100])
# Without newline="", rows may appear double-spaced on Windows
```
File Mode Confusion
Opening a file in 'r+' mode does not create it if it does not exist. If you try to open a non-existent file with 'r+', you get a FileNotFoundError. Use 'w+' if you need to create and write to a file that may not exist yet.
```python
from pathlib import Path

# Safe write: 'w' creates the file if it does not exist
with open("output.csv", "w", encoding="utf-8") as f:
    f.write("fresh content\n")

# Trying to read a file that does not exist raises an error
try:
    with open("does_not_exist.txt", "r") as f:  # FileNotFoundError raised by open()
        pass
except FileNotFoundError:
    print("File does not exist, handle this case.")

# Use exist_ok=True with pathlib to avoid errors when creating directories
Path("output_dir").mkdir(parents=True, exist_ok=True)
```
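The 'r+' versus 'w+' difference is easy to demonstrate directly (missing.txt is a throwaway name for this example):

```python
import os

# Start from a clean slate so the behavior is reproducible
if os.path.exists("missing.txt"):
    os.remove("missing.txt")

# 'r+' requires the file to exist already
try:
    open("missing.txt", "r+")
    opened = True
except FileNotFoundError:
    opened = False
print(opened)  # False

# 'w+' creates the file (truncating it if it existed) and allows reading back
with open("missing.txt", "w+", encoding="utf-8") as f:
    f.write("created\n")
    f.seek(0)  # rewind before reading what was just written
    content = f.read()
print(content)  # created
```

The seek(0) matters: after writing, the file position sits at the end, so reading without rewinding returns an empty string.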
Comparison: CSV vs JSON vs Pickle vs SQLite
Each storage format suits different use cases. Here is how to choose.
| Format | Human-readable | Python objects | Queryable | Best for |
|---|---|---|---|---|
| CSV | Yes | No (flat data only) | Limited (row-by-row) | Tabular data, spreadsheets, data exchange |
| JSON | Yes | Limited (built-in types only) | No | Config files, web APIs, nested data |
| Pickle | No | Yes (any Python object) | No | Serializing model objects, caches, intermediate state |
| SQLite | No | No (use explicit column types) | Yes (full SQL) | Applications needing queries, relationships, indexes |
For a data pipeline that outputs processed tabular data, CSV is usually the right choice because the output is portable and can be opened in Excel or loaded with pandas. For configuration files that a human may need to edit by hand, JSON is the right pick. For saving trained machine learning models or custom class instances, pickle is the standard choice. For anything that needs to support complex queries, filtering, or relationships between entities, SQLite is the right tool.
Frequently Asked Questions
How do I save data in Python?
Use open() with the with statement to write data to a file. Choose the format based on your data type: plain text for logs, CSV for tabular data, JSON for structured data, pickle for Python objects, and SQLite for relational data. Each module is part of Python’s standard library with no installation required.
What is the difference between write mode and append mode?
Write mode ('w') creates a new file or erases an existing file before writing. Append mode ('a') adds new content to the end of an existing file without changing what is already there. Use append mode for logs and write mode for fresh exports.
Can pickle handle any Python object?
Pickle can serialize most Python objects, including custom class instances, module-level functions, and nested data structures. It cannot serialize objects tied to external state, such as open file handles, sockets, or database connections, and attempting to do so raises an error.
When should I use SQLite instead of JSON or CSV?
SQLite makes sense when you need to query your data selectively, handle multiple related tables, or support concurrent access from different processes. If you only need to store and retrieve complete records with no querying, CSV or JSON is simpler.
How do I handle file paths on different operating systems?
Use pathlib.Path for all path operations. Path objects automatically convert to the correct separator for the operating system and provide clean methods for creating directories and checking file existence.
Why should I always use the with statement when writing files?
The with statement guarantees the file is closed even when your code throws an exception mid-write. Without it, unflushed buffers may be lost if the program crashes before closing the file explicitly.
How do I save data from a Python dictionary to a file?
Use the json module for human-readable output or the pickle module if you need to preserve the exact Python types. The json.dump() function writes a dictionary directly to a JSON file. The pickle.dump() function handles dictionaries and any other Python object.


