# The Predictive Lambda Pattern

This pattern uses a container image on AWS Lambda to deploy a custom Python ML model that predicts the nearest Chipotle restaurant for a given latitude/longitude.

Some Useful References:

| Author | Link |
| ------------- | ------------- |
| AWS Blog | [New for AWS Lambda – Container Image Support](https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/) |
| AWS What's New | [Lambda now supports container images](https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-now-supports-container-images-as-a-packaging-format/) |
| Yan Cui | [Package your Lambda function as a container image](https://lumigo.io/blog/package-your-lambda-function-as-a-container-image/) |
| Scikit-learn Docs | [User Guide](https://scikit-learn.org/stable/user_guide.html) |
| AWS ECR Gallery | [Python Lambda Image](https://gallery.ecr.aws/lambda/python) |
| Docker Docs | [CLI Reference](https://docs.docker.com/reference/) |

## What's Included In This Pattern?
This pattern uses scikit-learn to create a custom k-nearest-neighbours model that predicts the nearest Chipotle to a given latitude and longitude. The model is deployed inside a container image attached to an AWS Lambda Function.

### The Data
If you want to look at the data used for this model, see the [Jupyter notebook](model/training/Chipotle.ipynb); the raw data came from [Kaggle](https://www.kaggle.com/jeffreybraun/chipotle-locations).

### The ML Model
This is a very simple model, chosen to demonstrate the concept (I didn't even check the accuracy because it doesn't change the pattern). It uses [sklearn nearest neighbors](https://scikit-learn.org/stable/modules/neighbors.html) to predict the closest Chipotle location to a given lat/long.

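The core idea can be sketched in a few lines of scikit-learn. The coordinates below are made up for illustration; the real pattern trains on the full Kaggle dataset.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative (lat, long) pairs standing in for the real Chipotle dataset
locations = np.array([
    [39.1434, -77.2014],   # hypothetical store A
    [38.9047, -77.0164],   # hypothetical store B
    [40.7128, -74.0060],   # hypothetical store C
])

# Fit an unsupervised nearest-neighbour index over the coordinates
model = NearestNeighbors(n_neighbors=1).fit(locations)

# Find the stored location closest to a query lat/long
distances, indices = model.kneighbors([[39.153198, -77.066176]])
nearest = locations[indices[0][0]]
```

Note that this measures straight-line distance in degrees rather than road distance, which is fine for a demo of the deployment pattern.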
### Two Docker Containers
I use the Lambda image to train the ML model in one container, and a separate container for the deployed Lambda Function. This guarantees the model is pickled in the same environment it will be deployed to, while anything that is only needed for training stays out of the deployed function, keeping it as lightweight as possible. You also end up with a built container image containing the raw data, the training logic and the trained model; these images could be archived to keep a history of your model.

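A deployment Dockerfile for this kind of pattern might look like the sketch below. This is an assumption based on the AWS Lambda container image docs, not a copy of this repo's files; the Python version and file paths are illustrative.

```dockerfile
# Base image published by AWS for Python Lambda functions
FROM public.ecr.aws/lambda/python:3.8

# Install the shared runtime dependencies (e.g. scikit-learn)
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the handler and the pickled model into the task root
COPY deployment/app.py deployment/chipotle.pkl ${LAMBDA_TASK_ROOT}/

# Tell the Lambda runtime which handler function to invoke
CMD ["app.handler"]
```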
### A Lambda Function
The function is configured with a 15-second timeout and 4GB of RAM to comfortably run the model.

### An API Gateway HTTP API
Set up as a proxy integration, so all requests hit the Lambda Function.

## How Do I Test This Pattern?

Run `npm run deploy` from the base directory and the URL for the API Gateway will be output in the logs and in the CloudFormation console. Open that URL in a browser with `?lat=39.153198&long=-77.066176` appended to the end and you should get back a prediction.

## How Does It Work?

Most of the logic for this lives in the `model` folder. There are two Dockerfiles:
- `Dockerfile` - used by Lambda during the deploy
- `TrainingDockerfile` - used to spin up the container that trains the model

I have added the trained model to version control, but if you want to retrain it yourself, make sure Docker is running and run:

```bash
cd model
./trainmodel.sh
```

This uses the Lambda Python image to run `training/training.py` and then copies the `chipotle.pkl` file out of the container. The `requirements.txt` is shared between the training container and the deployed container.

The actual logic that runs when we hit our URL is in `model/deployment/app.py`; it unpickles the model, makes a prediction and returns the response as a string.
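That unpickle-and-predict flow can be sketched roughly as follows. This is not the repo's actual `app.py`: the stand-in model, the file path and the response wording are assumptions made so the example is self-contained and runnable.

```python
import pickle
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in for chipotle.pkl: in the real pattern the training container
# produces this file and the Dockerfile bakes it into the deployed image.
locations = np.array([[39.1434, -77.2014], [38.9047, -77.0164]])
with open("chipotle.pkl", "wb") as f:
    pickle.dump(NearestNeighbors(n_neighbors=1).fit(locations), f)

# Unpickle the model once, outside the handler, so warm invocations reuse it
with open("chipotle.pkl", "rb") as f:
    model = pickle.load(f)

def handler(event, context):
    """Hypothetical handler mirroring what model/deployment/app.py does."""
    params = event.get("queryStringParameters") or {}
    lat = float(params["lat"])
    long = float(params["long"])
    _, indices = model.kneighbors([[lat, long]])
    nearest = locations[indices[0][0]]
    # Return the prediction as a plain string in the proxy response body
    return {"statusCode": 200, "body": f"Nearest Chipotle: {nearest[0]}, {nearest[1]}"}

# Simulated API Gateway proxy event for a request ending in ?lat=...&long=...
response = handler({"queryStringParameters": {"lat": "39.153198", "long": "-77.066176"}}, None)
```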

## Useful CDK Commands

To manually create a virtualenv on MacOS and Linux:

```
$ python3 -m venv .venv
```

After the init process completes and the virtualenv is created, you can use the following
step to activate your virtualenv.

```
$ source .venv/bin/activate
```

If you are on a Windows platform, you would activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```

Once the virtualenv is activated, you can install the required dependencies.

```
$ pip install -r requirements.txt
```

At this point you can now synthesize the CloudFormation template for this code.

```
$ cdk synth
```

To add additional dependencies, for example other CDK libraries, just add
them to your `setup.py` file and rerun the `pip install -r requirements.txt`
command.

## Useful commands

 * `cdk ls` list all stacks in the app
 * `cdk synth` emits the synthesized CloudFormation template
 * `cdk deploy` deploy this stack to your default AWS account/region
 * `cdk diff` compare deployed stack with current state
 * `cdk docs` open CDK documentation

Enjoy!