NovelTTS

AI Application for Converting Novel Text to Audiobooks

High-quality speech synthesis using Zhipu GLM-TTS with Bilibili audio voice cloning support

English | 中文 | Русский | 한국어 | 日本語

Development Log: Agent&Chat.md

✨ Features

📖 Novel Text Reading - Supports .txt and .md files, plus URL content extraction
🎯 Intelligent Text Segmentation - Automatically splits long text into TTS-suitable segments
🎙️ AI Speech Synthesis - High-quality voice generation based on Zhipu GLM-TTS
🎭 Voice Cloning - Extract reference audio from Bilibili videos for voice cloning via GLM-TTS-Clone
🎵 Audio Processing - Audio merging and format conversion using NAudio
🔄 Smart Retry - Retry mechanism for API call failures using Polly
📊 Progress Tracking - Real-time processing progress display
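
To make the text segmentation feature concrete, here is a minimal sketch in Python (not the project's C# `TextSegmenter`). It assumes a simple policy: split at sentence-ending punctuation, pack sentences into segments up to a character limit, and hard-wrap any sentence that exceeds the limit. The function name `segment_text` and the default limit of 200 characters are illustrative assumptions, not values taken from the project.

```python
import re


def segment_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into segments of at most max_chars, preferring sentence boundaries.

    Illustrative only: the real segmenter may use different rules and limits.
    """
    # Split after sentence-ending punctuation (Western and CJK), keeping the punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]

    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]

        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the open segment
        else:
            segments.append(current)  # close the open segment and start a new one
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Keeping sentences whole where possible avoids cutting a TTS request mid-phrase, which tends to produce unnatural pauses when the audio segments are merged.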

🏗️ Architecture

The project follows the Clean Architecture design pattern:

NovelTTSApp/
├── src/
│   ├── Core/                    # Core Layer - Domain Entities & Interfaces
│   │   ├── Entities/            # Domain Entities
│   │   │   ├── Novel.cs         # Novel Entity
│   │   │   ├── AudioSegment.cs  # Audio Segment Entity
│   │   │   └── VoiceReference.cs # Voice Reference Entity
│   │   └── Interfaces/          # Core Interfaces
│   │       ├── INovelReader.cs
│   │       ├── ITextSegmenter.cs
│   │       ├── ITtsService.cs
│   │       ├── IAudioProcessor.cs
│   │       ├── IBilibiliDownloader.cs
│   │       └── INovelProcessor.cs
│   │
│   ├── Infrastructure/          # Infrastructure Layer - Implementations
│   │   ├── Configuration/       # Configuration Classes
│   │   ├── Services/            # Service Implementations
│   │   │   ├── NovelReader.cs
│   │   │   ├── TextSegmenter.cs
│   │   │   ├── ZhipuTtsService.cs
│   │   │   ├── AudioProcessor.cs
│   │   │   └── BilibiliDownloader.cs
│   │   └── DependencyInjection.cs
│   │
│   └── App/                     # Application Layer - Main Program
│       ├── Services/
│       │   └── NovelProcessor.cs
│       ├── Program.cs
│       └── appsettings.json
│
└── NovelTTSApp.sln

🚀 Quick Start

Prerequisites

.NET 10.0 SDK or higher
Zhipu AI API Key (available from the Zhipu AI open platform)

Installation & Configuration

Clone Project

git clone https://github.com/your-repo/NovelTTSApp.git
cd NovelTTSApp

Configure API Key

Edit src/App/appsettings.json:

{
  "AI": {
    "Endpoint": "https://open.bigmodel.cn/api/paas/v4/",
    "ApiKey": "YOUR_API_KEY_HERE",
    "ModelId": "glm-4-voice"
  },
  "Paths": {
    "InputFolder": "./data/novels",
    "OutputFolder": "./data/output",
    "ReferenceAudioFolder": "./data/reference_audio",
    "TempFolder": "./data/temp"
  }
}
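
As a rough illustration of how a configuration file with this shape might be validated before use, here is a short Python sketch (the application itself reads the file through .NET configuration). The function name `load_settings` and the environment variable `NOVELTTS_API_KEY` are hypothetical; the project may not support an environment override at all.

```python
import json
import os
from pathlib import Path


def load_settings(path: str) -> dict:
    """Load the settings file, reject the placeholder API key, and create data folders."""
    settings = json.loads(Path(path).read_text(encoding="utf-8"))

    ai = settings["AI"]
    # Hypothetical override: NOVELTTS_API_KEY is an assumed variable name.
    ai["ApiKey"] = os.environ.get("NOVELTTS_API_KEY", ai["ApiKey"])
    if not ai["ApiKey"] or ai["ApiKey"] == "YOUR_API_KEY_HERE":
        raise ValueError("Replace YOUR_API_KEY_HERE with a real Zhipu AI API key")

    # Make sure every configured folder exists before processing starts.
    for folder in settings["Paths"].values():
        Path(folder).mkdir(parents=True, exist_ok=True)
    return settings
```

Failing early on the placeholder key gives a clearer error than a rejected API call halfway through a long novel.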

Build Project

dotnet build -c Release

Run Program

dotnet run --project src/App

📖 Usage

Command Line Arguments

NovelTTSApp [options]

Options:
    -i, --input <path>     Input novel file path (.txt or .md)
    -o, --output <path>    Output audio file path (.mp3)
    -c, --chapter <name>   Chapter filter keyword
    -v, --voice <url>      Bilibili video URL for voice cloning (optional)
    -h, --help             Show help information
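
For readers who want to script around the tool, the option set above can be mirrored with Python's `argparse` as follows. This is only a sketch of the same flags; it is not how the C# application parses its arguments, and the function name `build_parser` is an illustrative choice.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same options as the NovelTTSApp help text."""
    parser = argparse.ArgumentParser(prog="NovelTTSApp")
    parser.add_argument("-i", "--input", metavar="<path>",
                        help="Input novel file path (.txt or .md)")
    parser.add_argument("-o", "--output", metavar="<path>",
                        help="Output audio file path (.mp3)")
    parser.add_argument("-c", "--chapter", metavar="<name>",
                        help="Chapter filter keyword")
    parser.add_argument("-v", "--voice", metavar="<url>",
                        help="Bilibili video URL for voice cloning (optional)")
    # argparse adds -h/--help automatically.
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```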

Usage Examples

# Process all novels in default input folder
dotnet run --project src/App

# Process specific chapter
dotnet run --project src/App -- -c "Chapter 1"

# Use Bilibili video for voice cloning
dotnet run --project src/App -- -c "Chapter 1" -v https://www.bilibili.com/video/BV1xxxxxxxx

# Process single novel file
dotnet run --project src/App -- -i ./mynovel.txt -o ./mynovel.mp3

🎭 Voice Cloning

Voice cloning is implemented via the Zhipu GLM-TTS-Clone API. The complete workflow:

1. Download and extract reference audio from Bilibili video (10-second clip)
2. Upload audio to Zhipu API to get file_id (purpose: voice-clone-input)
3. Call voice/clone to create voice → obtain voice_id
4. Use voice_id to call GLM-TTS to generate cloned voice
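
The four steps above can be sketched as a small orchestration function. The Python below is a sketch under stated assumptions: `VoiceCloneBackend` and its method names are hypothetical placeholders for the download, upload, clone, and synthesis calls, and do not reflect the actual Zhipu API signatures or the project's C# interfaces. Only the 10-second clip length and the `voice-clone-input` purpose come from the workflow described above.

```python
from typing import Protocol


class VoiceCloneBackend(Protocol):
    """Hypothetical interface; method names are illustrative, not the Zhipu API."""

    def extract_reference_audio(self, video_url: str, seconds: int) -> bytes: ...
    def upload_file(self, audio: bytes, purpose: str) -> str: ...
    def create_voice(self, file_id: str) -> str: ...
    def synthesize(self, text: str, voice_id: str) -> bytes: ...


def clone_and_speak(backend: VoiceCloneBackend, video_url: str, text: str) -> bytes:
    """Run the four-step workflow and return the synthesized audio bytes."""
    # 1. Download the video audio and keep a 10-second reference clip.
    reference = backend.extract_reference_audio(video_url, seconds=10)
    # 2. Upload the clip to obtain a file_id.
    file_id = backend.upload_file(reference, purpose="voice-clone-input")
    # 3. Create the cloned voice from the uploaded file to obtain a voice_id.
    voice_id = backend.create_voice(file_id)
    # 4. Generate speech with the cloned voice.
    return backend.synthesize(text, voice_id)
```

Passing the backend in as a parameter keeps the workflow testable without network access, since a fake backend can stand in for the real services.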

📚 Reference: GLM-TTS-Clone

📁 Data Directory Structure

data/
├── novels/              # Novel text source files
│   └── BookName/
│       └── 01.Chapter1/
│           ├── 001.Prologue.txt
│           └── 002.Introduction.txt
├── output/              # Generated audiobook files
├── reference_audio/     # Reference audio from Bilibili
└── temp/                # Temporary audio segment files
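
Because chapter folders and files carry numeric prefixes, reading order matters: a plain string sort would place `10.` before `2.`. The following Python sketch shows one way to list a book's files in numeric order. It assumes the prefix convention shown in the tree above; the function names are illustrative, and the project's own `NovelReader` may order files differently.

```python
import re
from pathlib import Path


def numeric_prefix(name: str) -> int:
    """Return the leading number of a name such as '001.Prologue.txt', or 0 if none."""
    match = re.match(r"(\d+)", name)
    return int(match.group(1)) if match else 0


def list_chapter_files(novels_root: str, book: str) -> list[Path]:
    """List a book's .txt/.md files, ordered by the numeric prefix of each path part."""
    book_dir = Path(novels_root) / book
    files = [
        p for p in book_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in {".txt", ".md"}
    ]
    # Compare folder prefix first, then file prefix; the name itself breaks ties.
    return sorted(
        files,
        key=lambda p: [(numeric_prefix(part), part) for part in p.relative_to(book_dir).parts],
    )
```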

🔧 Core Dependencies

| Library | Version | Purpose |
| --- | --- | --- |
| Microsoft.Extensions.AI | Latest | .NET AI Unified Abstraction Layer |
| NAudio | 2.2.1 | Audio Processing (format conversion, merging) |
| HtmlAgilityPack | 1.11.59 | HTML Parsing (web novel extraction) |
| Serilog | 4.2.0 | Structured Logging |
| Polly | 8.0.0 | Resilience (retry mechanism) |
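
To illustrate what a retry mechanism for failed API calls typically does, here is a minimal Python sketch of retry with exponential backoff and a little jitter. It is not Polly's API and not the project's actual policy: the function name, the attempt count, and the delay values are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ... the base delay
            # Small random jitter so parallel callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the waiting can be replaced in tests. A production policy would usually retry only on transient errors such as timeouts or rate limits, not on every exception.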

📊 Business Flow

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Asset Prep    │────▶│  Text Process   │────▶│   AI Generate   │
│                 │     │                 │     │                 │
│ • Read novel    │     │ • Text cleaning │     │ • Call Zhipu API│
│ • Bilibili audio│     │ • Smart segment │     │ • Stream process│
│ • Voice clone   │     │ • Voice clone   │     │ • Voice generate│
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                                                         ▼
                                               ┌─────────────────┐
                                               │   Post Process  │
                                               │                 │
                                               │ • Merge segments│
                                               │ • Format convert│
                                               └─────────────────┘
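
The "Merge segments" step of post-processing can be illustrated with Python's standard `wave` module. This sketch assumes the temporary segments are uncompressed WAV files with identical channel count, sample width, and sample rate. The project itself uses NAudio and outputs MP3, so format conversion is out of scope here, and the function name is illustrative.

```python
import wave


def merge_wav_segments(segment_paths: list[str], output_path: str) -> None:
    """Concatenate WAV segments, in order, into a single WAV file."""
    if not segment_paths:
        raise ValueError("No segments to merge")

    # Use the first segment's parameters as the format of the merged file.
    with wave.open(segment_paths[0], "rb") as first:
        params = first.getparams()

    with wave.open(output_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is corrected on close
        for path in segment_paths:
            with wave.open(path, "rb") as segment:
                same_format = (
                    segment.getnchannels() == params.nchannels
                    and segment.getsampwidth() == params.sampwidth
                    and segment.getframerate() == params.framerate
                )
                if not same_format:
                    raise ValueError(f"Format mismatch in {path}")
                out.writeframes(segment.readframes(segment.getnframes()))
```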

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ using .NET 10 and Zhipu AI
