Automated setup for Data Science, AI/ML, and Research workflows on Windows
The Data Science Kit is a robust, modular PowerShell installer that automates the setup of a comprehensive environment for data science, AI/ML development, and automated research on Windows. Whether you're setting up for basic document authoring, advanced machine learning, or specific projects like ResilienceScan, this installer provides tailored installation profiles to get you productive quickly.
- Windows 10/11 (64-bit recommended)
- Administrator privileges
- Internet connection
- Clone or download this repository:
git clone https://github.com/your-org/Data-ScienceKit.git
cd Data-ScienceKit/Scripts
- Set the execution policy (if needed). Either run the installer once with the policy bypassed:
PowerShell -ExecutionPolicy Bypass -File .\Main-Installer.ps1
or allow local scripts for your user account:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
or relax the policy for the current PowerShell session only:
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process -Force
- Run the installer:
.\Main-Installer.ps1
- Select your profile when prompted, or specify it directly:
.\Main-Installer.ps1 -Profile "RecilienceScan"
Choose the profile that best matches your needs:
Minimal setup for the ResilienceScan report automation pipeline
- Git, Python 3.11+, Quarto CLI, TinyTeX
- Perfect for automated report generation and email delivery
Basic tools for coding and version control
- Git, VS Code, Windows Terminal, Package Managers
- Ideal for general development work
Document authoring with Python/R basics
- Essential tools + Python, R, Quarto, basic packages
- Great for academic writing and basic analysis
Complete data science environment
- Full Python & R stack, RStudio, Jupyter, visualization tools
- Comprehensive setup for data analysis and research
Advanced AI/ML stack with deep learning frameworks
- DataScience + TensorFlow, PyTorch, Transformers, NLP tools
- Ready for machine learning and AI development
Big data processing with distributed computing
- AI_ML + Apache Spark, Hadoop, Java JDK, cloud tools
- Handles large-scale data processing
Complete installation with all available components
- Everything included: all tools, packages, fonts, utilities
- Maximum capability installation
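Several profiles above are described as supersets of another ("DataScience + TensorFlow, ...", "AI_ML + Apache Spark, ...", "Essential tools + Python, R, ..."). The following is a minimal Python sketch of how such cumulative profiles could be resolved into one flat component list. It is an illustration, not the installer's code: the keys "Authoring" and "BigData", all component names, and the `extends` field are hypothetical placeholders, only the "+" relationships stated in the list are modeled, and the real definitions live in Main-Installer.ps1.

```python
from typing import Dict, List, Optional

# Hypothetical profile table. "Authoring" and "BigData" are placeholder keys and
# every component name is illustrative; the real profiles are in Main-Installer.ps1.
PROFILES: Dict[str, dict] = {
    "Essential":   {"extends": None,          "components": ["git", "vscode", "windows-terminal"]},
    "Authoring":   {"extends": "Essential",   "components": ["python", "r", "quarto"]},
    "DataScience": {"extends": None,          "components": ["python", "r", "rstudio", "jupyter"]},
    "AI_ML":       {"extends": "DataScience", "components": ["tensorflow", "pytorch", "transformers"]},
    "BigData":     {"extends": "AI_ML",       "components": ["java-jdk", "spark", "hadoop"]},
}


def resolve_profile(name: str, profiles: Dict[str, dict] = PROFILES) -> List[str]:
    """Flatten a profile and its ancestors into one ordered, de-duplicated list."""
    chain: List[str] = []
    current: Optional[str] = name
    # Walk up the "extends" links, guarding against unknown names and cycles.
    while current is not None:
        if current not in profiles:
            raise KeyError(f"Unknown profile: {current!r}")
        if current in chain:
            raise ValueError(f"Profile inheritance cycle at {current!r}")
        chain.append(current)
        current = profiles[current]["extends"]
    components: List[str] = []
    for profile in reversed(chain):  # base profile first, most specific last
        for component in profiles[profile]["components"]:
            if component not in components:
                components.append(component)
    return components
```

For example, `resolve_profile("BigData")` yields the DataScience components, then the AI_ML additions, then Java, Spark, and Hadoop. Keeping each profile as a small delta over its base means a shared component only has to be listed once.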
The installer uses a modular architecture with 50+ specialized PowerShell modules in the modules/ directory. Each module handles a specific installation task and can run independently.
- Main-Installer.ps1 - Orchestrates the entire installation process
- modules/ - Individual installation modules for specific tools
- requirements/ - Package requirement files for Python/R
- assets/ - Custom fonts, configurations, and resources
- System Prerequisites - Admin verification, execution policy, internet check
- Package Managers - Chocolatey, Scoop, Winget availability
- Core Tools - Git, Python, R, development environments
- Specialized Packages - Data science, AI/ML, visualization libraries
- Publishing Stack - Quarto, LaTeX, documentation tools
- Verification - Environment testing and comprehensive reporting
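The phase order above, combined with the installer's policy of continuing when non-critical components fail, can be sketched in Python as follows. This is a hedged model of the control flow only: the `Step` type, its `critical` flag, and the rule that a critical failure stops the run are assumptions for illustration and are not taken from Main-Installer.ps1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    phase: str                   # e.g. "System Prerequisites", "Core Tools"
    name: str                    # hypothetical step name
    action: Callable[[], None]   # raises an exception on failure
    critical: bool = False       # assumption: a critical failure aborts the run


def run_phases(steps: List[Step]) -> List[dict]:
    """Run steps in order, record every outcome, and stop only on a critical failure."""
    results: List[dict] = []
    for step in steps:
        try:
            step.action()
            results.append({"phase": step.phase, "step": step.name, "status": "success"})
        except Exception as exc:
            results.append({"phase": step.phase, "step": step.name,
                            "status": "failed", "error": str(exc)})
            if step.critical:
                break  # e.g. no admin rights: later phases cannot succeed
    return results
```

With a critical admin check that passes, a failing optional package step, and a verification step, all three results are recorded and the run reaches verification; if the admin check itself failed, the run would stop after one result.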
- Package Managers: Chocolatey, plus Scoop and Winget detection
- Version Control: Git with automated installation
- Programming Languages: Python 3.11+ with automated setup
- Publishing: Quarto CLI, TinyTeX for PDF generation
- Development: VS Code (basic installation)
- System Tools: Windows Terminal, 7-Zip (modules exist)
- R Environment: R language, RStudio, comprehensive R packages
- Data Science Stack: pandas, numpy, scipy, scikit-learn, visualization libraries
- AI & Machine Learning: TensorFlow, PyTorch, Transformers, NLP tools
- Database Tools: SQLite tools, DBeaver, database connectors
- Big Data: Apache Spark, Hadoop, distributed computing
- Advanced Publishing: Custom fonts, advanced Quarto features
- Productivity Tools: FFmpeg, Draw.io, additional utilities
See the comprehensive planning document for the full intended scope.
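Several entries above carry a minimum version, most notably "Python 3.11+". A minimal Python sketch of checking such a requirement by running a tool's `--version` command and comparing the parsed version is shown below. The function names and `MIN_PYTHON` constant are hypothetical, and this is not the logic of Test-SystemRequirements.ps1; it simply takes the first `X.Y[.Z]` number it finds in the output.

```python
import re
import subprocess
from typing import Optional, Tuple

MIN_PYTHON = (3, 11)  # taken from the "Python 3.11+" requirement in this README


def parse_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Pull the first X.Y[.Z] version out of a tool's --version output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if not match:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def tool_version(command: str) -> Optional[Tuple[int, int, int]]:
    """Run `<command> --version`; return None if the tool is missing or unparsable."""
    try:
        proc = subprocess.run([command, "--version"], capture_output=True,
                              text=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr, so both streams are searched.
    return parse_version(proc.stdout + proc.stderr)


def meets_minimum(found: Optional[Tuple[int, int, int]],
                  minimum: Tuple[int, ...] = MIN_PYTHON) -> bool:
    """Compare only as many components as the minimum specifies."""
    return found is not None and found[:len(minimum)] >= minimum
```

For example, `meets_minimum(tool_version("python"))` is true only when a Python 3.11 or newer interpreter resolves on PATH; the same `tool_version` call works for `git` or `quarto`.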
Data-ScienceKit/
├── Scripts/
│   ├── Main-Installer.ps1          # Main installation orchestrator
│   ├── modules/                    # Individual installation modules
│   │   ├── Install-Git.ps1
│   │   ├── Install-PythonCore.ps1
│   │   ├── Install-QuartoCLI.ps1
│   │   ├── Test-SystemRequirements.ps1
│   │   └── ... (50+ modules)
│   ├── requirements/
│   │   ├── requirements.txt        # Python packages
│   │   └── r-packages.txt          # R packages
│   └── assets/                     # Fonts, configs, resources
├── README.md                       # This file
└── LICENSE
- Continues installation even if non-critical components fail
- Detailed logging and error reporting
- Comprehensive final installation report
- Real-time progress indicators
- Phase-by-phase installation status
- Clear success/failure feedback
- Skips already-installed components
- Validates existing installations
- Refreshes environment variables automatically
- Detailed installation logs
- JSON-formatted results for automation
- Environment verification reports
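Two of the behaviours above, skipping already-installed components and emitting JSON results for automation, are sketched below in Python. This is an assumption-laden illustration rather than the installer's implementation: detection here means "the command resolves on PATH" via `shutil.which`, the `force` parameter is only similar in spirit to the installer's `-ForceReinstall` switch, and the report fields (`generated`, `summary`, `results`) are a hypothetical schema, not the installer's actual log format.

```python
import json
import shutil
from datetime import datetime, timezone
from typing import Callable, List


def install_if_missing(command: str, install: Callable[[], None], force: bool = False) -> dict:
    """Skip installation when `command` already resolves on PATH (unless force=True)."""
    existing = shutil.which(command)
    if existing and not force:
        return {"component": command, "status": "skipped", "path": existing}
    try:
        install()
        # The path may still be None until PATH is refreshed in the current session.
        return {"component": command, "status": "installed", "path": shutil.which(command)}
    except Exception as exc:
        return {"component": command, "status": "failed", "error": str(exc)}


def write_report(results: List[dict], path: str) -> None:
    """Write a machine-readable summary (hypothetical schema) for automation."""
    report = {
        "generated": datetime.now(timezone.utc).isoformat(),
        "summary": {status: sum(r["status"] == status for r in results)
                    for status in ("installed", "skipped", "failed")},
        "results": results,
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```

A CI job or follow-up script could then load the report and fail the build when `summary["failed"]` is non-zero, without parsing human-oriented log text.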
PowerShell Execution Policy Error
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser -Force
"Module file not found" errors
- Ensure you're running from the Scripts/ directory
- Check that all required module files exist in modules/
Package manager not found (choco/scoop)
- The installer handles package manager installation automatically
- Restart PowerShell session if PATH refresh doesn't work
Python/R packages fail to install
- Check internet connectivity
- Verify Python/R are properly installed first
- Some packages may require manual installation
- Check the installation log file for detailed error information
- Run individual modules to isolate issues:
cd modules
.\Install-PythonCore.ps1
- Use the verification modules to test your environment:
.\Test-SystemRequirements.ps1
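When Python packages fail to install, it can help to see exactly which entries of requirements/requirements.txt are missing from the interpreter you are using. The following Python sketch does that with `importlib.metadata`. It is a hypothetical helper, not one of the kit's modules, and it assumes the file uses plain pip requirement syntax: comments, pip options such as `-r`, and URL lines are skipped, while version specifiers, extras, and environment markers are ignored rather than evaluated. It covers only the Python side; R packages would need a separate check.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import List

_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirement_names(text: str) -> List[str]:
    """Extract distribution names from pip-style requirement lines."""
    names: List[str] = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline comments
        # Skip blanks, full-line comments, pip options (-r, --index-url) and URLs.
        if not line or line.startswith(("#", "-")) or "://" in line:
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(1))  # the name stops at [, <, >, =, ; etc.
    return names


def missing_packages(requirements_file: str) -> List[str]:
    """Return the requirement names that are not installed in this interpreter."""
    text = Path(requirements_file).read_text(encoding="utf-8")
    missing: List[str] = []
    for name in parse_requirement_names(text):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Run from the Scripts/ directory with the Python 3.11+ interpreter the installer set up, `missing_packages("requirements/requirements.txt")` lists the packages to retry or install manually.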
You can modify the profile definitions in Main-Installer.ps1 to create custom installation combinations.
All modules can be run independently for testing or custom installations:
cd modules
.\Install-Git.ps1
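.\Install-PythonCore.ps1
From the Scripts/ directory, reinstall a profile's components even if they are already present:
.\Main-Installer.ps1 -ForceReinstall -Profile "DataScience"
Run a profile without the prerequisite checks:
.\Main-Installer.ps1 -SkipChecks -Profile "Essential"
Contributions are welcome! The modular architecture makes it easy to: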
- Add new installation modules
- Enhance existing modules
- Create new installation profiles
- Improve error handling and reporting
This project is licensed under the MIT License - see the LICENSE file for details.
Built for the Lectoraat Supply Chain Finance at Windesheim University of Applied Sciences, supporting data science education and research automation.
Need a specific tool setup? Create an issue or contribute a new module to help the community!