An open-source tool for visualizing and exploring etymological patterns using Retrieval-Augmented Generation (RAG) and AI. This project combines traditional etymology resources with modern NLP techniques to provide interactive visualizations of word origins and evolution.
This project is currently under active development. While core functionality is implemented, some features may be incomplete or subject to change. Contributions and feedback are welcome!
- Interactive Etymology Tree Visualization: Explore word evolution through time with an interactive tree visualization
- Morphological Analysis: Break down words into their constituent stems, roots, and affixes
- Multi-Source Research: Combines data from:
  - Local etymonline dataset
  - Wiktionary API
  - Web scraping (etymonline.com)
  - AI-powered analysis
- RAG Pipeline: Efficient retrieval and storage of etymology data using ChromaDB
- Semantic Search: Find similar words and patterns using embeddings
- Caching System: Local caching of research results for improved performance
- Modular Architecture: Clean separation of concerns between data sources, analysis, and visualization
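As a rough illustration of the semantic search feature, the sketch below ranks words by cosine similarity between embedding vectors. The three-dimensional vectors and the `most_similar` helper are invented for this example; in the project the embeddings would come from a sentence-transformer model and be stored in ChromaDB rather than in a hand-written dictionary.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)

def most_similar(query, embeddings, top_k=2):
    """Return the top_k (word, score) pairs closest to `query`, excluding itself."""
    query_vec = embeddings[query]
    scored = [
        (word, cosine_similarity(query_vec, vec))
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors, invented for illustration only (not real model output).
TOY_EMBEDDINGS = {
    "father": [0.9, 0.1, 0.0],
    "paternal": [0.8, 0.2, 0.1],
    "water": [0.0, 0.9, 0.3],
}

if __name__ == "__main__":
    print(most_similar("father", TOY_EMBEDDINGS, top_k=1))
```

A vector database performs the same kind of nearest-neighbour ranking, but with an index so that it does not have to compare the query against every stored vector.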
The system is built with a modular architecture consisting of:
- RAGPipeline: Central coordinator for data storage and retrieval
- EtymologyCache: Persistent caching using ChromaDB
- ResearchAgent: Orchestrates etymology research across multiple sources
- SimilarityAgent: Handles word and stem similarity computations
- TreeAgent: Constructs hierarchical visualizations of word evolution
- StemAgent: Performs morphological analysis of words
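One way a cache and a research step might cooperate is a read-through lookup: check the cache first, and only run research on a miss. The sketch below is an assumption about that interaction, not the project's code. `InMemoryCache` and `lookup` are hypothetical names, and a plain dictionary stands in for ChromaDB.

```python
from typing import Callable, Dict, Optional

class InMemoryCache:
    """Hypothetical stand-in for a persistent cache; a dict replaces ChromaDB."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def get(self, word: str) -> Optional[dict]:
        # Keys are lower-cased so "Word" and "word" share one entry.
        return self._store.get(word.lower())

    def put(self, word: str, result: dict) -> None:
        self._store[word.lower()] = result

def lookup(word: str, cache: InMemoryCache, research: Callable[[str], dict]) -> dict:
    """Return a cached result if present; otherwise research the word and cache it."""
    cached = cache.get(word)
    if cached is not None:
        return cached
    result = research(word)
    cache.put(word, result)
    return result
```

Passing the research step in as a function keeps the cache independent of any particular data source, which matches the separation of concerns described above.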
- Local etymonline dataset
- Wiktionary API integration
- Web scraping capabilities
- LLM-powered analysis
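Querying several sources can be sketched as trying each one in order and keeping the first usable answer. The ordering, the `first_available` helper, and the `(name, fetch)` tuple layout are assumptions made for this example; the project may merge results from all sources instead of stopping at the first hit.

```python
from typing import Callable, List, Optional, Tuple

# A source is a display name plus a function returning text or None.
Source = Tuple[str, Callable[[str], Optional[str]]]

def first_available(word: str, sources: List[Source]) -> Tuple[Optional[str], Optional[str]]:
    """Try each source in order; return (source_name, text) for the first hit."""
    for name, fetch in sources:
        try:
            text = fetch(word)
        except Exception:
            # A failing source (network or parse error) should not stop the search.
            continue
        if text:
            return name, text
    return None, None
```

Placing the least reliable source, such as web scraping, last in the list means it is only consulted when the others return nothing.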
- Streamlit-based GUI with:
  - Interactive etymology trees
  - Timeline views
  - Stem analysis breakdowns
  - Similarity networks
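A tree view needs the etymology data in hierarchical form. The sketch below shows one possible way to build a nested structure from flat `(word, parent)` pairs and render it as an indented outline. The record layout, the function names, and the placeholder words are all hypothetical, and the input is assumed to contain no cycles.

```python
from collections import defaultdict

def build_tree(records, root):
    """Build a nested dict {"word": ..., "children": [...]} from (word, parent) pairs."""
    children = defaultdict(list)
    for word, parent in records:
        if parent is not None:
            children[parent].append(word)

    def expand(word):
        # Recursion assumes the parent links form a tree (no cycles).
        return {"word": word, "children": [expand(c) for c in children[word]]}

    return expand(root)

def render(node, depth=0):
    """Return an indented text outline of the tree, one word per line."""
    lines = ["  " * depth + node["word"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines

if __name__ == "__main__":
    # Placeholder names, not real etymological data.
    records = [
        ("root_form", None),
        ("older_form", "root_form"),
        ("modern_form_a", "older_form"),
        ("modern_form_b", "older_form"),
    ]
    print("\n".join(render(build_tree(records, "root_form"))))
```

The same nested dictionary could be handed to a plotting library for an interactive view; the text outline is only the simplest possible renderer.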
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/Etymologistics.git
  cd Etymologistics
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables:
  ```bash
  OPENAI_API_KEY=your_key_here       # Optional, for enhanced analysis
  WIKTIONARY_API_KEY=your_key_here   # Optional
  ETYMONLINE_API_KEY=your_key_here   # Optional
  ```
- Run the application:
  ```bash
  streamlit run src/gui.py
  ```
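Because all three API keys are optional, code that reads them has to tolerate missing values. The sketch below shows one way to do that with the standard library. The `load_api_keys` helper is hypothetical, and how the application actually reads its configuration is not described here.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names taken from the setup step above.
OPTIONAL_KEYS = ("OPENAI_API_KEY", "WIKTIONARY_API_KEY", "ETYMONLINE_API_KEY")

def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Return each optional key's value, or None when it is unset or blank."""
    env = os.environ if env is None else env
    return {name: (env.get(name) or "").strip() or None for name in OPTIONAL_KEYS}

if __name__ == "__main__":
    for name, value in load_api_keys().items():
        print(f"{name}: {'set' if value else 'not set'}")
```

Returning `None` for unset keys lets callers disable the matching feature with a simple truthiness check instead of catching a `KeyError`.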
- Enter a word in the search box
- View the etymology tree visualization
- Switch between different views:
  - Tree View: Interactive visualization of word evolution
  - Timeline View: Chronological representation of changes
  - Stem Analysis: Morphological breakdown of the word
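To give a sense of what a morphological breakdown produces, the sketch below does naive longest-match affix stripping against two small, hand-picked lists. This is a deliberately simple stand-in and not how the project's stem analysis works; the affix lists, the `split_affixes` helper, and the minimum stem length are all assumptions.

```python
# Small illustrative affix lists; a real analyzer would need far more coverage.
PREFIXES = ("un", "re", "pre", "dis")
SUFFIXES = ("ness", "able", "ing", "ed", "ly")

def split_affixes(word, min_stem=3):
    """Peel at most one prefix and one suffix, keeping a stem of at least min_stem letters."""
    word = word.lower()
    prefix = suffix = ""
    # Try longer affixes first so "pre" wins over "re" where both could apply.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            prefix, word = p, word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            suffix, word = s, word[:-len(s)]
            break
    return {"prefix": prefix, "stem": word, "suffix": suffix}

if __name__ == "__main__":
    print(split_affixes("unkindness"))
```

The minimum stem length guards against over-stripping short words such as "red", which would otherwise lose its "re" or "ed". String matching of this kind still mislabels many words, which is one reason to combine it with dictionary or LLM-backed analysis.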
This project is open to contributions! Please feel free to:
- Report bugs
- Suggest features
- Submit pull requests
[Add your chosen license here]
- Etymology data from etymonline.com
- Wiktionary API
- Sentence transformers for embeddings
- ChromaDB for vector storage
- Streamlit for visualization
- Some features require API keys for full functionality
- Web scraping is used as a fallback and may be unreliable
- LLM analysis quality depends on the model and prompt engineering
- Research results may vary in completeness and accuracy
- Enhanced visualization options
- Additional etymology data sources
- Improved similarity search
- Better handling of compound words
- Extended language family support