# Data Analysis with Python and PySpark

This is the companion repository for the _Data Analysis with Python and PySpark_ book (Manning, estimated publishing date: 2022). It contains the source code and, where pertinent, the data download scripts.

## Get the data

The complete data set for the book is around 1GB. To keep the repository small enough to clone comfortably, [I moved the data sources to Dropbox](https://www.dropbox.com/sh/ebwuv1y2rrwl6v8/AAAPEQ8F12RMKcmC8pjFUYiSa?dl=0). The book assumes the data is under `./data`.

## Mistakes or omissions

If you encounter mistakes in the book manuscript (including the printed source code), please use the Manning platform to provide feedback.