diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..c4597ae --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +samconfig.toml +.aws-sam +.idea +.history +__pycache__ diff --git a/README.md b/README.md index 61f3b57..39f3b14 100644 --- a/README.md +++ b/README.md @@ -1,83 +1,260 @@ -# AWS Lambda Reference Architecture: Real-time File Processing +# Serverless Reference Architecture: Real-time File Processing -The Real-time File Processing reference architecture is a general-purpose, event-driven, parallel data processing architecture that utilizes [AWS Lambda](https://aws.amazon.com/lambda). This architecture is ideal for workloads that need more than one data derivative of an object. This simple architecture is described in this [diagram](https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/lambda-refarch-fileprocessing.pdf) and [blog post](https://aws.amazon.com/blogs/compute/fanout-s3-event-notifications-to-multiple-endpoints/). This sample applicaton demonstrates a Markdown conversion application where Lambda is used to convert Markdown files to HTML and plain text. +The Real-time File Processing reference architecture is a general-purpose, event-driven, parallel data processing architecture that uses [AWS Lambda](https://aws.amazon.com/lambda). This architecture is ideal for workloads that need more than one data derivative of an object. -## Running the Example +In this example application, we deliver notes from an interview in Markdown format to S3. S3 Events are used to trigger multiple processing flows - one to convert and persist Markdown files to HTML and another to detect and persist sentiment. -The provided [AWS CloudFormation template](https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/lambda_file_processing.template) can be used to launch a stack that demonstrates the Lambda file processing reference architecture. 
Detailed information about the this template can be found in the CloudFormation Template Details section below. +## Architectural Diagram -**Important:** Because the AWS CloudFormation stack name is used in the name of the S3 buckets, that stack name must only contain lowercase letters. Please use lowercase letters when typing the stack name. The provided CloudFormation template retreives its Lambda code from a bucket in the us-east-1 region. To launch this sample in another region, please modify the template and upload the Lambda code to a bucket in that region. +![Reference Architecture - Real-time File Processing](img/lambda-refarch-fileprocessing-simple.png) +## Application Components -Use the button below to launch the stack via the AWS Console. +### Event Trigger -[![Launch into Lambda ETL into North Virginia with CloudFormation](http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/images/cloudformation-launch-stack-button.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=lambda-file-processing&templateURL=https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/lambda_file_processing.template) +In this architecture, individual files are processed as they arrive. To achieve this, we utilize [AWS S3 Events](https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html) and [Amazon Simple Notification Service](https://docs.aws.amazon.com/sns/latest/dg/welcome.html). When an object is created in S3, an event is emitted to a SNS topic. We deliver our event to 2 separate [SQS Queues](https://aws.amazon.com/sqs/), representing 2 different workflows. Refer to [What is Amazon Simple Notification Service?](https://docs.aws.amazon.com/sns/latest/dg/welcome.html) for more information about eligible targets. -Alternatively, you can use the following command to launch the stack using the AWS CLI. 
This assumes you have already [installed the AWS CLI](http://docs.aws.amazon.com/cli/latest/userguide/installing.html). +### Conversion Workflow + +Our function will take Markdown files stored in our **InputBucket**, convert them to HTML, and store them in our **OutputBucket**. The **ConversionQueue** SQS queue captures the S3 Event JSON payload, allowing for more control of our **ConversionFunction** and better error handling. Refer to [Using AWS Lambda with Amazon SQS](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html) for more details. + +If our **ConversionFunction** cannot remove the messages from the **ConversionQueue**, they are sent to **ConversionDlq**, a dead-letter queue (DLQ), for inspection. A CloudWatch Alarm is configured to send a notification to an email address when there are any messages in the **ConversionDlq**. + +### Sentiment Analysis Workflow + +Our function will take Markdown files stored in our **InputBucket**, detect the overall sentiment for each file, and store the result in our **SentimentTable**. + +We are using [Amazon Comprehend](https://aws.amazon.com/comprehend/) to detect overall interview sentiment. Amazon Comprehend is a machine learning powered service that makes it easy to find insights and relationships in text. We use the Sentiment Analysis API to understand whether interview responses are positive or negative. + +The Sentiment workflow uses the same SQS-to-Lambda Function pattern as the Conversion workflow. + +If our **SentimentFunction** cannot remove the messages from the **SentimentQueue**, they are sent to **SentimentDlq**, a dead-letter queue (DLQ), for inspection. A CloudWatch Alarm is configured to send a notification to an email address when there are any messages in the **SentimentDlq**. + +## Building and Deploying the Application with the AWS Serverless Application Model (AWS SAM) + +This application is deployed using the [AWS Serverless Application Model (AWS SAM)](https://aws.amazon.com/serverless/sam/). 
AWS SAM is an open-source framework that enables you to build serverless applications on AWS. It provides you with a template specification to define your serverless application, and a command line interface (CLI) tool. + +### Pre-requisites + +* [AWS CLI version 2](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) + +* [AWS SAM CLI (0.41.0 or higher)](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) + +* [Docker](https://docs.docker.com/install/) + +### Clone the Repository + +#### Clone with SSH + +```bash +git clone git@github.com:aws-samples/lambda-refarch-fileprocessing.git +``` + +#### Clone with HTTPS + +```bash +git clone https://github.com/aws-samples/lambda-refarch-fileprocessing.git +``` + +### Build + +The AWS SAM CLI comes with abstractions for a number of Lambda runtimes to build your dependencies, and copies the source code into staging folders so that everything is ready to be packaged and deployed. The *sam build* command builds any dependencies that your application has, and copies your application source code to folders under *.aws-sam/build* to be zipped and uploaded to Lambda. ```bash -aws cloudformation create-stack \ - --stack-name lambda-file-processing \ - --template-url https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/lambda_file_processing.template \ - --capabilities CAPABILITY_IAM +sam build --use-container ``` -## Testing +**Note** -Once you have created the stack using the provided template, you can test the system by uploading a Markdown file to the InputBucket that was created in the stack. The README.md file in this repository can be used as an example file. Once the file has been uploaded, you can see the resulting HTML and plain text files in the output bucket of your stack. You can also view the CloudWatch logs for each of the functions in order to see the details of their execution. 
+Be sure to use v0.41.0 of the AWS SAM CLI or newer. Failure to use the proper version of the AWS SAM CLI will result in an `InvalidDocumentException` exception. The `EventInvokeConfig` property is not recognized in earlier versions of the AWS SAM CLI. To confirm your version of AWS SAM, run the command `sam --version`. + +### Deploy + +For the first deployment, please run the following command and save the generated configuration file *samconfig.toml*. Please use **lambda-file-refarch** for the stack name. + +```bash +sam deploy --guided +``` + +You will be prompted to enter data for *ConversionLogLevel* and *SentimentLogLevel*. The default value for each is *INFO* but you can also enter *DEBUG*. You will also be prompted for *AlarmRecipientEmailAddress*. + +Subsequent deployments can use the simplified `sam deploy`. The command will use the generated configuration file *samconfig.toml*. + +You will receive an email asking you to confirm subscription to the `lambda-file-refarch-AlarmTopic` SNS topic that will receive alerts should either the `ConversionDlq` SQS queue or `SentimentDlq` SQS queue receive messages. + +## Testing the Example + +After you have created the stack using the CloudFormation template, you can manually test the system by uploading a Markdown file to the InputBucket that was created in the stack. + +Alternatively, you can test it with the pipeline tests.sh script. However, the test script removes the resources it creates, so if you wish to explore the solution and see the output files +and DynamoDB tables, manually uploading is the better option. + +### Manually testing + + You can use any of the sample-xx.md files in the repository /**tests** directory as example files. After the files have been uploaded, you can see the resulting HTML file in the output bucket of your stack. You can also view the CloudWatch logs for each of the functions in order to see the details of their execution. 
You can use the following commands to copy a sample file from the provided S3 bucket into the input bucket of your stack. +```bash +INPUT_BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id InputBucket --query "StackResourceDetail.PhysicalResourceId" --output text) +aws s3 cp ./tests/sample-01.md s3://${INPUT_BUCKET}/sample-01.md +aws s3 cp ./tests/sample-02.md s3://${INPUT_BUCKET}/sample-02.md +``` + +Once the input files have been uploaded to the input bucket, a series of events is set in motion. + +1. The input Markdown files are converted and stored in a separate S3 bucket. ``` -BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-processing --logical-resource-id InputBucket --query "StackResourceDetail.PhysicalResourceId" --output text) -aws s3 cp s3://awslambda-reference-architectures/file-processing/example.md s3://$BUCKET/example.md +OUTPUT_BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id ConversionTargetBucket --query "StackResourceDetail.PhysicalResourceId" --output text) +aws s3 ls s3://${OUTPUT_BUCKET} ``` -After the file has been uploaded to the input bucket you can inspect the output bucket to see the rendered HTML and plain text output files created by the Lambda functions. +2. The input Markdown files are analyzed and their sentiment published to a DynamoDB table. +``` +DYNAMO_TABLE=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id SentimentTable --query "StackResourceDetail.PhysicalResourceId" --output text) +aws dynamodb scan --table-name ${DYNAMO_TABLE} --query "Items[*]" +``` You can also view the CloudWatch logs generated by the Lambda functions. -## Cleaning Up -To tear down the deployed resources you must complete the following steps: +### Using the test script -1. Delete all objects in the input and output buckets. -1. Delete the CloudFormation stack. -1. 
Delete the CloudWatch Log groups that contain the execution logs for the two processor functions. +The pipeline end-to-end test script can be executed manually. You will need to ensure you have adequate permissions to perform the test script actions: +* Describing stack resources +* Uploading and deleting files from the S3 input bucket +* Deleting files from the S3 output bucket +* Reading and deleting entries from the DynamoDB table +```bash +bash ./tests.sh lambda-file-refarch +``` -## CloudFormation Template Resources +While the script is executing you will see all the stages output to the command line. The samples are uploaded to the **InputBucket**; the script then waits for files to appear in the **OutputBucket** before checking that they have all been processed and that the matching HTML file exists in the **OutputBucket**. It will also check that the sentiment for each of the files has been recorded in the **SentimentTable**. Once complete, the script removes all the files created and the entries from the **SentimentTable**. -### Parameters -- *CodeBucket*: Name of the S3 bucket in the stack's region that contains the code for the two Lambda functions, ProcessorFunctionOne and ProcessorFunctionTwo. Defaults to the managed bucket 'awslambda-reference-architectures'. +### Extra credit testing -- *CodeKeyPrefix*: The key prefix for the Lambda function code relative to `CodeBucket`. Defaults to 'file-processing'. +Try uploading (or adding to ./tests if you are using the script) an oversized (>100MB) or invalid file type to the input bucket. +You can check in X-Ray to explore how you can trace these kinds of errors within the solution. 
+ +* Linux command + +```bash +fallocate -l 110M ./tests/sample-oversize.md +``` + +* Mac OS X command + +```bash +mkfile 110m ./tests/sample-oversize.md +``` + +![X-Ray Error Tracing - Real-time File Processing](img/lambda-refarch-fileprocessing-x-ray-error-trace.png) + + +## Viewing the CloudWatch dashboard + +A dashboard is created as a part of the stack creation process. Metrics are published for the conversion and sentiment analysis processes. In addition, the alarms and alarm states are published. + +![CloudWatch Dashboard - Real-time File Processing](img/lambda-refarch-fileprocessing-dashboard.png) + +## Cleaning Up the Example Resources + +To remove all resources created by this example, run the following command: + +```bash +bash cleanup.sh +``` + +### What Is Happening in the Script? + +Objects are cleared out from the `InputBucket` and `ConversionTargetBucket`. + +```bash +for bucket in InputBucket ConversionTargetBucket; do + echo "Clearing out ${bucket}..." + BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id ${bucket} --query "StackResourceDetail.PhysicalResourceId" --output text) + aws s3 rm s3://${BUCKET} --recursive + echo +done +``` + +The CloudFormation stack is deleted. + +```bash +aws cloudformation delete-stack \ +--stack-name lambda-file-refarch +``` + +The CloudWatch Logs Groups associated with the Lambda functions are deleted. + +```bash +for log_group in $(aws logs describe-log-groups --log-group-name-prefix '/aws/lambda/lambda-file-refarch-' --query "logGroups[*].logGroupName" --output text); do + echo "Removing log group ${log_group}..." 
+ aws logs delete-log-group --log-group-name ${log_group} + echo +done +``` + +## SAM Template Resources ### Resources -[The provided template](https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/lambda_file_processing.template) + +[The provided template](https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/packaged-template.yml) creates the following resources: -- *InputBucket*: An Amazon Simple Storage Service (Amazon S3) bucket that holds the raw Markdown files. Uploading a file to this bucket will trigger both processing functions. +- **InputBucket** - A S3 bucket that holds the raw Markdown files. Uploading a file to this bucket will trigger processing functions. + +- **NotificationTopic** - A SNS topic that receives S3 events from the **InputBucket**. + +- **NotificationTopicPolicy** - A SNS topic policy that allows the **InputBucket** to publish events to the **NotificationTopic**. + +- **NotificationQueuePolicy** - A SQS queue policy that allows the **NotificationTopic** to publish events to the **ConversionQueue** and **SentimentQueue**. + +- **ApplyS3NotificationLambdaFunction** - A Lambda function that adds a S3 bucket notification when objects are created in the **InputBucket**. The function is called by **ApplyInputBucketTrigger**. + +- **ApplyInputBucketTrigger** - A CloudFormation Custom Resource that invokes the **ApplyS3NotificationLambdaFunction** when a CloudFormation stack is created. + +- **ConversionSubscription** - A SNS subscription that allows the **ConversionQueue** to receive messages from **NotificationTopic**. + +- **ConversionQueue** - A SQS queue that is used to store events for conversion from Markdown to HTML. + +- **ConversionDlq** - A SQS queue that is used to capture messages that cannot be processed by the **ConversionFunction**. The *RedrivePolicy* on the **ConversionQueue** is used to manage how traffic makes it to this queue. 
+ +- **ConversionFunction** - A Lambda function that takes the input file, converts it to HTML, and stores the resulting file to **ConversionTargetBucket**. + +- **ConversionTargetBucket** - A S3 bucket that stores the converted HTML. + +- **SentimentSubscription** - A SNS subscription that allows the **SentimentQueue** to receive messages from **NotificationTopic**. + +- **SentimentQueue** - A SQS queue that is used to store events for sentiment analysis processing. + +- **SentimentDlq** - A SQS queue that is used to capture messages that cannot be processed by the **SentimentFunction**. The *RedrivePolicy* on the **SentimentQueue** is used to manage how traffic makes it to this queue. + +- **SentimentFunction** - A Lambda function that takes the input file, performs sentiment analysis, and stores the output to the **SentimentTable**. + +- **SentimentTable** - A DynamoDB table that stores the input file along with the sentiment. -- *OutputBucket*: An Amazon S3 bucket that is populated by the processor functions with the transformed files. +- **AlarmTopic** - A SNS topic that has an email as a subscriber. This topic is used to receive alarms from the **ConversionDlqAlarm**, **SentimentDlqAlarm**, **ConversionQueueAlarm**, **SentimentQueueAlarm**, **ConversionFunctionErrorRateAlarm**, **SentimentFunctionErrorRateAlarm**, **ConversionFunctionThrottleRateAlarm**, and **SentimentFunctionThrottleRateAlarm**. -- *InputNotificationTopic*: An Amazon Simple Notification Service (Amazon SNS) topic used to invoke multiple Lambda functions in response to each object creation notification. +- **ConversionDlqAlarm** - A CloudWatch Alarm that detects when there are any messages sent to the **ConversionDlq** within a 1 minute period and sends a notification to the **AlarmTopic**. -- *NotificationPolicy*: An Amazon SNS topic policy which permits `InputBucket` to call the `Publish` action on the topic. 
+- **SentimentDlqAlarm** - A CloudWatch Alarm that detects when there are any messages sent to the **SentimentDlq** within a 1 minute period and sends a notification to the **AlarmTopic**. -- *ProcessorFunctionOne*: An AWS Lambda function that converts Markdown files to HTML. The deployment package for this function must be located at s3://[CodeBucket]/[CodeKeyPrefix]/data-processor-1.zip. +- **ConversionQueueAlarm** - A CloudWatch Alarm that detects when there are 20 or more messages in the **ConversionQueue** within a 1 minute period and sends a notification to the **AlarmTopic**. -- *ProcessorFunctionTwo*: An AWS Lambda function that converts Markdown files to plain text. The deployment package for this function must be located at s3://[CodeBucket]/[CodeKeyPrefix]/data-processor-2.zip. +- **SentimentQueueAlarm** - A CloudWatch Alarm that detects when there are 20 or more messages in the **SentimentQueue** within a 1 minute period and sends a notification to the **AlarmTopic**. -- *LambdaExecutionRole*: An AWS Identity and Access Management (IAM) role used by the two Lambda functions. +- **ConversionFunctionErrorRateAlarm** - A CloudWatch Alarm that detects when there is an error rate of 5% over a 5 minute period for the **ConversionFunction** and sends a notification to the **AlarmTopic**. -- *RolePolicy*: An IAM policy associated with `LambdaExecutionRole` that allows the functions to get objects from `InputBucket`, put object to `OutputBucket` and log to Amazon CloudWatch. +- **SentimentFunctionErrorRateAlarm** - A CloudWatch Alarm that detects when there is an error rate of 5% over a 5 minute period for the **SentimentFunction** and sends a notification to the **AlarmTopic**. -- *LambdaInvokePermissionOne*: A policy that enables Amazon SNS to invoke ProcessorFunctionOne based on notifications from InputNotificationTopic. 
+- **ConversionFunctionThrottleRateAlarm** - A CloudWatch Alarm that detects when there is a throttle rate of 1% over a 5 minute period for the **ConversionFunction** and sends a notification to the **AlarmTopic**. -- *LambdaInvokePermissionTwo*: A policy that enables Amazon SNS to invoke ProcessorFunctionTwo based on notifications from InputNotificationTopic. +- **SentimentFunctionThrottleRateAlarm** - A CloudWatch Alarm that detects when there is a throttle rate of 1% over a 5 minute period for the **SentimentFunction** and sends a notification to the **AlarmTopic**. +- **ApplicationDashboard** - A CloudWatch Dashboard that displays Conversion Function Invocations, Conversion Function Error Rate, Conversion Function Throttle Rate, Conversion DLQ Length, Sentiment Function Invocations, Sentiment Function Error Rate, Sentiment Function Throttle Rate, and Sentiment DLQ Length. ## License diff --git a/WELL-ARCHITECTED.md b/WELL-ARCHITECTED.md new file mode 100644 index 0000000..94b6035 --- /dev/null +++ b/WELL-ARCHITECTED.md @@ -0,0 +1,212 @@ +## Operational Excellence + +#### OPS 1. How do you evaluate your Serverless application’s health? + +* [ ] Question does not apply to this workload + +* [x] **[Required]** Understand, analyze and alert on metrics provided out of the box +* [x] **[Best]** Use application, business, and operations metrics +* [x] **[Good]** Use distributed tracing and code is instrumented with additional context +* [ ] **[Good]** Use structured and centralized logging +* [ ] None of these + + +##### Notes + +>* The example uses structured logging output to CloudWatch. For our example we only deploy to a single account so we don't require the use of cross account centralised logging. +> +>* We have alarms configured with notifications should processing fail. +> +>* We do not have a defined KPI within the application. 
We could however use a metric such as number of records processed within a given time frame and alert if this is outside of the defined thresholds. + +--- + +#### OPS 2. How do you approach application lifecycle management? + +* [ ] Question does not apply to this workload + +* [x] **[Required]** Use infrastructure as code and stages isolated in separate environments +* [x] **[Good]** Prototype new features using temporary environments +* [ ] **[Good]** Use a rollout deployment mechanism +* [ ] **[Good]** Use configuration management +* [ ] **[Good]** Review the function runtime deprecation policy +* [ ] **[Best]** Use CI/CD including automated testing across separate accounts + +* [ ] None of these + + +##### Notes + +>* Our example utilizes infrastructure as code and includes a simple pipeline that will build and deploy within an individual account and to an individual environment. However, the nature of this example means it can be deployed multiple times with different configurations. You can, for example, deploy a staging pipeline that would watch a development branch and deploy any changes to the Staging application stack. You could also deploy a production pipeline stack that watches the master branch, so that merges there trigger a production release. +> +>* For this example a rollout mechanism would involve adopting a Blue / Green deployment strategy, with you controlling which input bucket a particular user hits. Alternatively, changes that affect only application business logic could be tested by having a notification invoke an alternate version of a Lambda function under specific conditions. + +--- + +## Security + +#### SEC 1: How do you control access to your Serverless API? 
+ +* [x] Question does not apply to this workload + + +* [ ] **[Required]** Use appropriate endpoint type and mechanisms to secure access to your API +* [ ] **[Good]** Use authentication and authorization mechanisms +* [ ] **[Best]** Scope access based on identity’s metadata + + +* [ ] None of these + + +##### Notes + +>This solution doesn't include an API frontend so the question doesn't apply. + +--- + +#### SEC 2: How do you manage your Serverless application’s security boundaries? + +* [ ] Question does not apply to this workload + + +* [x] **[Required]** Evaluate and define resource policies +* [x] **[Good]** Control network traffic at all layers +* [x] **[Best]** Smaller functions require fewer permissions +* [x] **[Required]** Use temporary credentials between resources and components + +* [ ] None of these + + +##### Notes + +> * We use IAM policies to ensure that resources can only be called by other resources that should be calling them. +> +> * Each application component assumes a role with only the permissions it requires in order to perform its function. That means either a specific action on multiple resources or any action on a particular resource. +> +> * This application does not use private networking. +> +> * We have individual functions for each different piece of business logic. + +--- + +#### SEC 3: How do you implement Application Security in your workload? + +* [ ] Question does not apply to this workload + + +* [x] **[Required]** Review security awareness documents frequently +* [x] **[Required]** Store secrets that are used in your code securely +* [ ] **[Good]** Implement runtime protection to help prevent against malicious code execution +* [ ] **[Best]** Automatically review workload’s code dependencies/libraries +* [x] **[Best]** Validate inbound events + + +* [ ] None of these + + +##### Notes + +> * This application doesn't have any stored secrets. 
The GitHub token is required by CodePipeline and is passed as a string to CloudFormation; it is not, however, visible within the CloudFormation console. This could be improved by manually creating a Secrets Manager entry for the token and replacing the CloudFormation parameter for the token with the Secrets Manager value by using Dynamic References. +>https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/dynamic-references.html +> +> * For reviewing dependencies and libraries we could integrate an automatic check into the pipeline. There are many tools and providers which can check code. Currently this is done manually with PEP8 and Bandit checks. +> +> * We only check for particular events and check to make sure the object is valid. + +--- + +## Reliability + +#### REL 1. How do you regulate inbound request rates? + +* [ ] Question does not apply to this workload + + +* [x] **[Required]** Use throttling to control inbound request rates +* [ ] **[Good]** Use, analyze and enforce API quotas +* [x] **[Best]** Use mechanisms to protect non-scalable resources + + +* [ ] None of these + + +##### Notes + +> * We are using SQS queues in front of our Lambda functions, which helps us throttle the rate at which our application processes requests. +> +> * We don't have APIs to set quotas for. +> +> * Our downstream resources are S3 and DynamoDB on-demand, which are more than capable of scaling to match our volumes. + +--- + +#### REL 2. How do you build resiliency into your Serverless application? + +* [ ] Question does not apply to this workload + + +* [x] **[Required]** Manage transaction, partial, and intermittent failures +* [x] **[Required]** Manage duplicate and unwanted events +* [ ] **[Good]** Orchestrate long-running transactions +* [x] **[Best]** Consider scaling patterns at burst rates + + +* [ ] None of these + +##### Notes + +> * We use SQS queues and DLQs to ensure any processing failure results in a notification. 
+> +> * The DynamoDB key and converted S3 object for each analysis are tied to the input object being analyzed. Pushing the same document will result in the same artifact. +> +> * Our example does not deal with duplicate files. Any duplicate will overwrite the previous one. This could be improved by inserting another layer of business logic that first checks the inbound file and renames it with a UUID; it could additionally check whether the file hash has already been processed. +> +> * The processing time of our transactions is fast and we can handle multiple files in a single invocation. Under heavy load of inbound files the SQS queue handles the work being distributed to Lambda, up to 1000 concurrent batches. + +--- + + +## Performance Efficiency + +#### PERF 1. How do you optimize your Serverless application’s performance? + +* [ ] Question does not apply to this workload + + +* [x] **[Required]** Measure, evaluate, and select optimum capacity units +* [x] **[Good]** Measure and optimize function startup time +* [ ] **[Good]** Take advantage of concurrency via async and stream-based function invocations +* [x] **[Good]** Optimize access patterns and apply caching where applicable +* [x] **[Best]** Integrate with managed services directly over functions when possible + +* [ ] None of these + + +##### Notes + +> * We have looked at how our function performs with different batch sizes and memory configurations to find what we believe is optimal for cost/performance. +> +> * For our example there is no real advantage to async. If concurrency were an issue it would be possible to chain the business logic, rather than perform it in parallel. +> +> * Data is pulled from S3 and cached locally for the execution; however, currently there is only a single task performed per invocation so there is no benefit. Caching outside of the function would offer no benefit over S3. +> +> * In our Sentiment function we use Amazon Comprehend, which is a managed service. 
+ +--- + +## Cost Optimization + +#### COST 1. How do you optimize your Serverless application’s costs? + +* [ ] Question does not apply to this workload + +* [x] **[Required]** Minimize external calls and function code initialization +* [x] **[Required]** Optimize logging output and its retention +* [x] **[Good]** Optimize function configuration to reduce cost +* [x] **[Best]** Use cost-aware usage patterns in code + +* [ ] None of these + +##### Notes + +>We have configurable logging levels and have benchmarked our function for optimal cost/performance. diff --git a/buildspec-test.yml b/buildspec-test.yml new file mode 100644 index 0000000..8c21fcd --- /dev/null +++ b/buildspec-test.yml @@ -0,0 +1,16 @@ +version: 0.2 + +phases: + install: + runtime-versions: + python: 3.7 + commands: + - pip install --upgrade awscli + build: + commands: + - chmod +x tests.sh + - ./tests.sh $OUTPUT_STACK_NAME + post_build: + commands: + - bash -c "if [ /"$CODEBUILD_BUILD_SUCCEEDING/" == /"0/" ]; then exit 1; fi" + - echo Test stage successfully completed on `date` diff --git a/buildspec.yml b/buildspec.yml new file mode 100644 index 0000000..208e225 --- /dev/null +++ b/buildspec.yml @@ -0,0 +1,19 @@ +version: 0.2 + +phases: + install: + runtime-versions: + python: 3.7 + commands: + - pip install --upgrade aws-sam-cli + build: + commands: + - sam build --use-container + post_build: + commands: + - sam package --output-template-file $SAM_OUTPUT_TEMPLATE --s3-bucket $ARTIFACT_BUCKET + - bash -c "if [ /"$CODEBUILD_BUILD_SUCCEEDING/" == /"0/" ]; then exit 1; fi" + - echo Build stage successfully completed on `date` +artifacts: + files: + - $SAM_OUTPUT_TEMPLATE diff --git a/cleanup.sh b/cleanup.sh new file mode 100644 index 0000000..bb392c8 --- /dev/null +++ b/cleanup.sh @@ -0,0 +1,20 @@ +#!/bin/bash + + +echo "Clearing out resources of lambda-file-refarch stack..." +echo +echo "Cleaning up S3 buckets..." 
&& for bucket in InputBucket ConversionTargetBucket; do + echo "Clearing out ${bucket}..." + BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id ${bucket} --query "StackResourceDetail.PhysicalResourceId" --output text) + aws s3 rm s3://${BUCKET} --recursive + echo +done + +echo "Deleting CloudFormation stack..." && aws cloudformation delete-stack \ +--stack-name lambda-file-refarch + +echo "Clearing out CloudWatch Log Groups..." && for log_group in $(aws logs describe-log-groups --log-group-name-prefix '/aws/lambda/lambda-file-refarch-' --query "logGroups[*].logGroupName" --output text); do + echo "Removing log group ${log_group}..." + aws logs delete-log-group --log-group-name ${log_group} + echo +done diff --git a/data-processor-1.js b/data-processor-1.js deleted file mode 100644 index 151c70c..0000000 --- a/data-processor-1.js +++ /dev/null @@ -1,75 +0,0 @@ -/* Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved. - -Licensed under the Apache License, Version 2.0 (the "License"). You may not use -this file except in compliance with the License. A copy of the License is -located at - -http://aws.amazon.com/apache2.0/ - -or in the "license" file accompanying this file. This file is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -implied. See the License for the specific language governing permissions and -limitations under the License. 
*/ - -var AWS = require('aws-sdk'); -var marked = require('marked'); -var async = require('async'); - -var s3 = new AWS.S3(); - -function getSNSMessageObject(msgString) { - var x = msgString.replace(/\\/g, ''); - var y = x.substring(1, x.length - 1); - var z = JSON.parse(y); - return z; -} - -exports.handler = function (event, context) { - console.log('event: ' + JSON.stringify(event)); - var snsMsgString = JSON.stringify(event.Records[0].Sns.Message); - var snsMsgObject = getSNSMessageObject(snsMsgString); - - var obj = { - 'bucket': snsMsgObject.Records[0].s3.bucket.name, - 'bucketOut': String(snsMsgObject.Records[0].s3.bucket.name + "-out"), - 'key': snsMsgObject.Records[0].s3.object.key, - }; - - async.waterfall([ - function download(next) { - // get Markdown object - s3.getObject( - { - Bucket: obj.bucket, - Key: obj.key - }, - next); - }, - function transform(response, next) { - // convert md -> html - var data = marked(String(response.Body)); - next(null, data); - }, - function upload(data, next) { - // change file extension - var newFileName = obj.key.split(".")[0] + ".html"; - console.log("Uploading data to: " + obj.bucketOut); - s3.putObject( - { - Bucket: obj.bucketOut, - Key: newFileName, - Body: data, - ContentType: "text/html" // set contentType as HTML - }, - next); - } - ], function (err) { - if (err) { - console.error(err); - } else { - console.log('Success'); - } - context.done(); - }); - -}; diff --git a/data-processor-2.js b/data-processor-2.js deleted file mode 100644 index 7de6bde..0000000 --- a/data-processor-2.js +++ /dev/null @@ -1,75 +0,0 @@ -/* Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved. - -Licensed under the Apache License, Version 2.0 (the "License"). You may not use -this file except in compliance with the License. A copy of the License is -located at - -http://aws.amazon.com/apache2.0/ - -or in the "license" file accompanying this file. 
This file is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -implied. See the License for the specific language governing permissions and -limitations under the License. */ - -var AWS = require('aws-sdk'); -var removeMD = require('remove-markdown'); -var async = require('async'); - -var s3 = new AWS.S3(); - -function getSNSMessageObject(msgString) { - var x = msgString.replace(/\\/g, ''); - var y = x.substring(1, x.length - 1); - var z = JSON.parse(y); - return z; -} - -exports.handler = function (event, context) { - console.log('event: ' + JSON.stringify(event)); - var snsMsgString = JSON.stringify(event.Records[0].Sns.Message); - var snsMsgObject = getSNSMessageObject(snsMsgString); - - var obj = { - 'bucket': snsMsgObject.Records[0].s3.bucket.name, - 'bucketOut': String(snsMsgObject.Records[0].s3.bucket.name + "-out"), - 'key': snsMsgObject.Records[0].s3.object.key - }; - - async.waterfall([ - function download(next) { - // get Markdown object - s3.getObject( - { - Bucket: obj.bucket, - Key: obj.key - }, - next); - }, - function transform(response, next) { - // strip out md and convert to plaintext - var data = removeMD(String(response.Body)); - next(null, data); - }, - function upload(data, next) { - // change file extension - var newFileName = obj.key.split(".")[0] + ".txt"; - console.log("Uploading data to: " + obj.bucketOut); - s3.putObject( - { - Bucket: obj.bucketOut, - Key: newFileName, - Body: data, - ContentType: "text/plain" // set contentType as plaintext - }, - next); - } - ], function (err) { - if (err) { - console.error(err); - } else { - console.log('Success'); - } - context.done(); - }); - -}; diff --git a/img/lambda-refarch-fileprocessing-dashboard.png b/img/lambda-refarch-fileprocessing-dashboard.png new file mode 100644 index 0000000..0f614c1 Binary files /dev/null and b/img/lambda-refarch-fileprocessing-dashboard.png differ diff --git a/img/lambda-refarch-fileprocessing-simple-pipeline.png 
b/img/lambda-refarch-fileprocessing-simple-pipeline.png new file mode 100644 index 0000000..38b1b76 Binary files /dev/null and b/img/lambda-refarch-fileprocessing-simple-pipeline.png differ diff --git a/img/lambda-refarch-fileprocessing-simple.png b/img/lambda-refarch-fileprocessing-simple.png new file mode 100644 index 0000000..dfe4c30 Binary files /dev/null and b/img/lambda-refarch-fileprocessing-simple.png differ diff --git a/img/lambda-refarch-fileprocessing-x-ray-error-trace.png b/img/lambda-refarch-fileprocessing-x-ray-error-trace.png new file mode 100644 index 0000000..f552c1f Binary files /dev/null and b/img/lambda-refarch-fileprocessing-x-ray-error-trace.png differ diff --git a/lambda_file_processing.template b/lambda_file_processing.template deleted file mode 100644 index 62f07a0..0000000 --- a/lambda_file_processing.template +++ /dev/null @@ -1,258 +0,0 @@ -{ - "AWSTemplateFormatVersion": "2010-09-09", - "Description": "CFN template to create architecture represented at https://aws.amazon.com/blogs/compute/fanout-s3-event-notifications-to-multiple-endpoints/.", - "Parameters": { - "CodeBucket": { - "Description": "S3 Bucket containing Lambda deployment packages and sub-stack templates", - "Type": "String", - "Default" : "awslambda-reference-architectures" - }, - "CodeKeyPrefix": { - "Description": "The key prefix for all deployment packages and sub-stack templates within CodeBucket", - "Type": "String", - "Default" : "file-processing" - } - }, - "Resources": { - "InputBucket": { - "Type": "AWS::S3::Bucket", - "Properties": { - "BucketName": {"Fn::Join" : ["-", [{"Ref" : "AWS::StackName"}, {"Ref" : "AWS::AccountId"}, "files"]]}, - "NotificationConfiguration": { - "TopicConfigurations": [ - { - "Event": "s3:ObjectCreated:*", - "Topic": { "Ref" : "InputNotificationTopic" } - } - ] - } - }, - "DependsOn": "NotificationPolicy" - }, - "OutputBucket": { - "Type": "AWS::S3::Bucket", - "Properties": { - "BucketName": {"Fn::Join" : ["-", [{"Ref" : "InputBucket"}, 
"out"]]} - } - }, - "InputNotificationTopic": { - "Type": "AWS::SNS::Topic", - "Properties": { - "Subscription": [ - { - "Endpoint": { - "Fn::GetAtt": [ - "ProcessorFunctionOne", - "Arn" - ] - }, - "Protocol": "lambda" - }, - { - "Endpoint": { - "Fn::GetAtt": [ - "ProcessorFunctionTwo", - "Arn" - ] - }, - "Protocol": "lambda" - } - ] - } - }, - "NotificationPolicy": { - "Type": "AWS::SNS::TopicPolicy", - "Properties": { - "PolicyDocument": { - "Id": "PushBucketNotificationPolicy", - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "AllowBucketToPushNotificationEffect", - "Effect": "Allow", - "Principal": { - "Service": "s3.amazonaws.com" - }, - "Action": "sns:Publish", - "Resource": { - "Ref": "InputNotificationTopic" - }, - "Condition": { - "ArnLike": { - "aws:SourceArn": { - "Fn::Join": [ - "", - [ - "arn:aws:s3:*:*:", - {"Fn::Join" : ["-", [{"Ref" : "AWS::StackName"}, {"Ref" : "AWS::AccountId"}, "files"]]} - ] - ] - } - } - } - } - ] - }, - "Topics": [ - { - "Ref": "InputNotificationTopic" - } - ] - } - }, - "ProcessorFunctionOne": { - "Type": "AWS::Lambda::Function", - "Properties": { - "Code": { - "S3Bucket": { "Ref": "CodeBucket" }, - "S3Key": {"Fn::Join" : ["/", [{"Ref": "CodeKeyPrefix"}, "data-processor-1.zip"]]} - }, - "Description": "Data Processor One", - "Handler": "data-processor-1.handler", - "Role": { - "Fn::GetAtt": [ - "LambdaExecutionRole", - "Arn" - ] - }, - "Runtime": "nodejs", - "MemorySize": 128, - "Timeout": 3 - } - }, - "ProcessorFunctionTwo": { - "Type": "AWS::Lambda::Function", - "Properties": { - "Code": { - "S3Bucket": { "Ref": "CodeBucket" }, - "S3Key": {"Fn::Join" : ["/", [{"Ref": "CodeKeyPrefix"}, "data-processor-2.zip"]]} - }, - "Description": "Data Processor Two", - "Handler": "data-processor-2.handler", - "Role": { - "Fn::GetAtt": [ - "LambdaExecutionRole", - "Arn" - ] - }, - "Runtime": "nodejs", - "MemorySize": 128, - "Timeout": 3 - } - }, - "LambdaExecutionRole": { - "Type": "AWS::IAM::Role", - "Properties": { - 
"AssumeRolePolicyDocument": { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": [ - "lambda.amazonaws.com" - ] - }, - "Action": [ - "sts:AssumeRole" - ] - } - ] - }, - "Path": "/" - } - }, - "RolePolicy": { - "Type": "AWS::IAM::Policy", - "Properties": { - "PolicyName": "root", - "PolicyDocument": { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Action": [ - "logs:CreateLogGroup", - "logs:CreateLogStream", - "logs:PutLogEvents" - ], - "Resource": "arn:aws:logs:*:*:*" - }, - { - "Effect": "Allow", - "Action": [ - "s3:GetObject" - ], - "Resource": { "Fn::Join": ["", ["arn:aws:s3:::", { "Ref" : "InputBucket" }, "/*"]]} - }, - { - "Effect": "Allow", - "Action": [ - "s3:PutObject" - ], - "Resource": { "Fn::Join": ["", ["arn:aws:s3:::", { "Ref" : "OutputBucket" }, "/*"]]} - } - ] - }, - "Roles": [ - { - "Ref": "LambdaExecutionRole" - } - ] - } - }, - - "LambdaInvokePermissionOne": { - "Type": "AWS::Lambda::Permission", - "Properties": { - "FunctionName" : { "Fn::GetAtt" : ["ProcessorFunctionOne", "Arn"] }, - "Action": "lambda:InvokeFunction", - "Principal": "sns.amazonaws.com", - "SourceArn" : { "Ref" : "InputNotificationTopic" } - } - }, - - - "LambdaInvokePermissionTwo": { - "Type": "AWS::Lambda::Permission", - "Properties": { - "FunctionName" : { "Fn::GetAtt" : ["ProcessorFunctionTwo", "Arn"] }, - "Action": "lambda:InvokeFunction", - "Principal": "sns.amazonaws.com", - "SourceArn" : { "Ref" : "InputNotificationTopic" } - } - } - }, - "Outputs": { - "Bucket": { - "Description": "Storage location for data which is to be processed by Lambda functions", - "Value": { - "Ref": "InputBucket" - } - }, - "BucketOut": { - "Description": "Storage location for data which is to be processed by Lambda functions", - "Value": { - "Ref": "OutputBucket" - } - }, - "Topic": { - "Description": "SNS topic to fanout S3 Event notifications to Lambda functions", - "Value": { - "Ref": "InputNotificationTopic" - } - }, - 
"ProcessorFxOne": { - "Description": "Lambda function receiving SNS messages of S3 events", - "Value": { - "Ref": "ProcessorFunctionOne" - } - }, - "ProcessorFxTwo": { - "Description": "Lambda function receiving SNS messages of S3 events", - "Value": { - "Ref": "ProcessorFunctionTwo" - } - } - } -} diff --git a/pipeline/README.md b/pipeline/README.md new file mode 100644 index 0000000..35ad006 --- /dev/null +++ b/pipeline/README.md @@ -0,0 +1,153 @@ +# Serverless Reference Architecture: Real-time File Processing Deployment Pipeline + +The Real-time File Processing reference pipeline architecture is an example of using basic CI/CD pipeline using the AWS fully managed continuous delivery service [CodePipeline](https://aws.amazon.com/codepipeline/) in order to deploy a Serverless application. Our pipeline consists of source, build and deployment stages. +We use exactly the same method as in the manual deployment however we utilise [CodeBuild](https://aws.amazon.com/codebuild/) to build and package our application and the native CodePipeline CloudFormation support to deploy our package. + +## CI/CD Pipeline Diagram + + +![Reference Architecture - Real-time File Processing CI/CD Pipeline](../img/lambda-refarch-fileprocessing-simple-pipeline.png) + + +## Pipeline Components + + +### CloudFormation Template + + +pipeline.yml is a CloudFormation template that will deploy all the required pipeline components. Once the stack has deployed the Pipeline will automatically execute and deploy the Serverless Application. See getting started for information on how to deploy the template. + + +#### Deployed Resources + + +* Pipeline S3 bucket, used to store pipeline artefacts that are passed between stages. 
+* CodePipeline +* CodeBuild Build and Test Projects +* Roles for CodePipeline, CodeBuild and the CloudFormation Deployment +* SNS Topic for Pipeline notifications +* CloudWatch Event for Pipeline Failures + + +### Source + + +For this application we are hosting our source code in GitHub. Other [Source Integrations](https://docs.aws.amazon.com/codepipeline/latest/userguide/integrations-action-type.html#integrations-source) are available; however, this template focuses on GitHub. Whenever an update is pushed to the GitHub branch being +monitored (default: master), our pipeline will begin executing. The source stage will connect to GitHub using the credentials provided and clone the branch into our pipeline artefact bucket for use in the other stages. + + +### Build + + +To run our SAM build and SAM package commands we are using [CodeBuild](https://aws.amazon.com/codebuild/), a fully managed continuous integration service. CodeBuild allows us to perform a sequence of commands that we define in the [buildspec.yml](https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html) +file that will execute inside the [build environment](https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref.html) we define using a Docker container. For this project we are using the Amazon Linux 2 version 1.0 container with Python 3.7. + +Within the buildspec.yml we are: + +* Updating SAM to the latest version +* Running SAM build as per the manual deployment +* Running SAM Package again as per the manual deployment steps +* Instructing CodeBuild to pass the output template back to the Pipeline for use in the deployment stage. + + + +### Deploy + + +To deploy our application stack we are not using SAM Deploy. CodePipeline doesn't support SAM natively, so instead we are opting to use the CodePipeline native support for CloudFormation to deploy the template that SAM creates.
The pipeline uses a role with appropriate permissions to deploy the template created by the SAM package step, which will create a stack containing the resources defined in our SAM Template. We are using [change sets](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-changesets.html) and [approval actions](https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-action-add.html) to demonstrate a manual approval workflow. + +You will need to approve the deployment before the pipeline execution actually deploys any resources. Once approved, additional resources will be deployed as per the main architecture documentation. + + +### Test + +The test stage will execute a bash script to perform an end-to-end test of the application. It uploads 24 sample files from the tests directory and checks for outputs and sentiment DB entries. + +If it cannot locate either the output files or the DB entries, the pipeline will fail. Once the tests complete successfully, the script removes the test resources. + + +## Getting started + + +To get started, use the template found in this repository under pipeline/pipeline.yaml. You will need to provide additional information to deploy the stack. + + * GitHubToken: GitHub OAuthToken with access to be able to clone the repository. You can find more information in the [GitHub Documentation](https://github.com/settings/tokens) + * AlarmRecipientEmailAddress: You will need to provide an email address that can be used for configuring notifications + +Optionally, if you are deploying from your own repository you will also need to provide: + + * GitHubRepoName: The name of the GitHub repository hosting your source code. By default it points to the aws-samples repo. + * GitHubRepoBranch: The GitHub repo branch CodePipeline should watch for changes on. This defaults to master, but any branch can be used. + * GitHubRepoOwner: the GitHub repository owner. e.g.
aws-samples + + + +### Deploying the template + + +You can deploy the template using either the [AWS Console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html) or the [AWS CLI](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-cli-creating-stack.html) + +**[TODO]** Insert quick link to create CFN stack + + + +##### Example CLI Deployment + + +> aws cloudformation deploy --template-file pipeline/pipeline.yaml --stack-name "lambda-file-refarch-pipeline" --capabilities "CAPABILITY_IAM" "CAPABILITY_NAMED_IAM" --parameter-overrides GitHubToken="**{replace with your GitHub Token}**" AlarmRecipientEmailAddress="**{replace with your admin email}**" + + + +### Deploying twice for a Development and Production example + + +You can deploy the pipeline twice to give two separate environments, allowing you to create a simple dev to production workflow. + +This will allow you to build your application in your development branch, and any changes will automatically be picked up and deployed by the pipeline. Once you have tested and are happy, the changes can be merged to master and they will be automatically built and deployed to production. + +Deploy the first stack using a stack name of "lambda-file-refarch-pipeline-dev". Update the **AppName** parameter to be environment specific, e.g. "lambda-file-refarch-dev", and make sure to update the branch to the development one.
+ +##### Example CLI Deployment for development pipeline + + +> aws cloudformation deploy --template-file pipeline/pipeline.yaml --stack-name "lambda-file-refarch-pipeline-dev" --capabilities "CAPABILITY_IAM" "CAPABILITY_NAMED_IAM" --parameter-overrides AppName="lambda-file-refarch-dev" GitHubToken="**{replace with your GitHub Token}**" AlarmRecipientEmailAddress="**{replace with your admin email}**" GitHubRepoBranch="develop" + + +Once that has deployed and the application stack has also successfully deployed, you can provision the production pipeline stack. + + +> aws cloudformation deploy --template-file pipeline/pipeline.yaml --stack-name "lambda-file-refarch-pipeline-prod" --capabilities "CAPABILITY_IAM" "CAPABILITY_NAMED_IAM" --parameter-overrides AppName="lambda-file-refarch-prod" GitHubToken="**{replace with your GitHub Token}**" AlarmRecipientEmailAddress="**{replace with your admin email}**" GitHubRepoBranch="master" + + +##### Approval Actions + +Any deployment will require the approval of a change set before it can proceed. An email will be sent to the admin email address, which will include a link to the approval request. You will need to ensure you have confirmed the subscription in order to receive the notification. + +Alternatively, you can approve the change set through the console or with the CLI; see [Approve or Reject an Approval Action in CodePipeline](https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-approve-or-reject.html) + +Using the CLI requires creating a JSON document and knowing the token for the last execution. See the documentation above for details on this. + +> aws codepipeline put-approval-result --cli-input-json file://approvalstage-approved.json + +## Clean-up + +In order to remove all resources created by this example you will first need to make sure the 3 S3 buckets are empty.
+ +* Pipeline artefact bucket +* Application input bucket +* Application conversion bucket + +Once that is complete you can remove both the Application Stack and the Pipeline Stack. +Note that the pipeline stack should not be removed until the application stack has been successfully deleted, as the application stack is deployed using a role present in the pipeline stack. This role is also used to delete the stack. + +Additionally, there will be some CodeBuild logs and Log Groups left over in CloudWatch; these can be deleted. + +Alternatively you can use the script /pipeline/cleanup.sh + +Things to note: + +* The script will remove only stacks deployed as described in the examples. + +* Both the application and the pipeline stacks will be removed. + +* jq needs to be installed in order to empty the pipeline bucket, as versioning is enabled. The command to delete versions and markers requires it. \ No newline at end of file diff --git a/pipeline/cleanup.sh b/pipeline/cleanup.sh new file mode 100644 index 0000000..05e2155 --- /dev/null +++ b/pipeline/cleanup.sh @@ -0,0 +1,72 @@ +#!/bin/bash + +command -v jq >/dev/null 2>&1 || { echo >&2 "jq is required but it's not installed. Aborting."; exit 1; } + +echo "Clearing out resources of lambda-file-refarch and Pipeline stacks..." +echo +echo "Cleaning up Application S3 buckets..." && for bucket in InputBucket ConversionTargetBucket; do + echo "Clearing out ${bucket}..." + BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch-app --logical-resource-id ${bucket} --query "StackResourceDetail.PhysicalResourceId" --output text) + aws s3 rm s3://${BUCKET} --recursive + echo +done + +echo "Cleaning up Pipeline S3 buckets..."
+BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch-pipeline --logical-resource-id "PipelineBucket" --query "StackResourceDetail.PhysicalResourceId" --output text) + +echo + +echo "Removing all versions from ${BUCKET}" + +VERSIONS=`aws s3api list-object-versions --bucket $BUCKET | jq '.Versions'` +MARKERS=`aws s3api list-object-versions --bucket $BUCKET | jq '.DeleteMarkers'` +let COUNT=`echo $VERSIONS | jq 'length'`-1 + +if [ $COUNT -gt -1 ]; then + echo "removing files from bucket" + for i in $(seq 0 $COUNT); do + KEY=`echo $VERSIONS | jq .[$i].Key | sed -e 's/\"//g'` + VERSIONID=`echo $VERSIONS | jq .[$i].VersionId | sed -e 's/\"//g'` + CMD="aws s3api delete-object --bucket $BUCKET --key $KEY --version-id $VERSIONID" + echo ${CMD} + $CMD + done +fi + +let COUNT=`echo $MARKERS |jq 'length'`-1 + +if [ $COUNT -gt -1 ]; then + echo "removing delete markers" + + for i in $(seq 0 $COUNT); do + KEY=`echo $MARKERS | jq .[$i].Key | sed -e 's/\"//g'` + VERSIONID=`echo $MARKERS | jq .[$i].VersionId | sed -e 's/\"//g'` + CMD="aws s3api delete-object --bucket $BUCKET --key $KEY --version-id $VERSIONID" + echo ${CMD} + $CMD + done +fi + +echo "Deleting lambda-file-refarch-app CloudFormation stack..." && aws cloudformation delete-stack \ + --stack-name lambda-file-refarch-app + +echo "Waiting for stack deletion..." && aws cloudformation wait stack-delete-complete \ + --stack-name lambda-file-refarch-app + +echo "Deleting lambda-file-refarch-pipeline CloudFormation stack..." && aws cloudformation delete-stack \ + --stack-name lambda-file-refarch-pipeline + +echo "Waiting for stack deletion..." && aws cloudformation wait stack-delete-complete \ + --stack-name lambda-file-refarch-pipeline + +echo "Clearing out Application CloudWatch Log Groups..." 
&& for log_group in $(aws logs describe-log-groups --log-group-name-prefix /aws/lambda/lambda-file-refarch-app- --query "logGroups[*].logGroupName" --output text); do + echo "Removing log group ${log_group}..." + aws logs delete-log-group --log-group-name ${log_group} + echo +done + +echo "Clearing out CodeBuild CloudWatch Log Groups..." && for log_group in $(aws logs describe-log-groups --log-group-name-prefix /aws/codebuild/lambda-file-refarch-app-build --query "logGroups[*].logGroupName" --output text); do + echo "Removing log group ${log_group}..." + aws logs delete-log-group --log-group-name ${log_group} + echo +done \ No newline at end of file diff --git a/pipeline/pipeline.yaml b/pipeline/pipeline.yaml new file mode 100644 index 0000000..d68709d --- /dev/null +++ b/pipeline/pipeline.yaml @@ -0,0 +1,474 @@ +AWSTemplateFormatVersion: "2010-09-09" +Description: "Template for full CI/CD serverless applications." +Parameters: + AppName: + Type: String + Default: lambda-file-refarch-app + Description: Name used for application deployment + SAMOutputFile: + Type: String + Default: packaged-template.yml + Description: The filename for the output SAM file from the buildspec file + CodeBuildImage: + Type: String + Default: "aws/codebuild/amazonlinux2-x86_64-standard:1.0" + Description: Image used for CodeBuild project. + GitHubRepoName: + Type: String + Default: "lambda-refarch-fileprocessing" + Description: The GitHub repo name + GitHubRepoBranch: + Type: String + Description: The GitHub repo branch code pipelines should watch for changes on + Default: master + GitHubRepoOwner: + Type: String + Default: "aws-samples" + Description: GitHub Repository Owner. + GitHubToken: + NoEcho: true + Type: String + Description: "Secret. OAuthToken with access to Repo. Long string of characters and digits. Go to https://github.com/settings/tokens" + AlarmRecipientEmailAddress: + Type: String + Description: Email address for any alerts. 
+Resources: + CodeBuildProject: + DependsOn: [PipelineBucket] + Description: AWS CodeBuild project + Type: AWS::CodeBuild::Project + Properties: + Artifacts: + Type: CODEPIPELINE + Description: !Sub "Building stage for ${AppName}." + Environment: + ComputeType: BUILD_GENERAL1_SMALL + PrivilegedMode: True + EnvironmentVariables: + - Name: ARTIFACT_BUCKET + Value: !Ref PipelineBucket + - Name: SAM_OUTPUT_TEMPLATE + Value: !Ref SAMOutputFile + Image: !Ref CodeBuildImage + Type: LINUX_CONTAINER + Name: !Sub "${AppName}-build" + ServiceRole: !GetAtt CodeBuildTrustRole.Arn + Source: + Type: CODEPIPELINE + Tags: + - Key: app-name + Value: !Ref AppName + TimeoutInMinutes: 5 + CodeBuildTestProject: + DependsOn: [PipelineBucket] + Description: AWS CodeBuild project + Type: AWS::CodeBuild::Project + Properties: + Artifacts: + Type: CODEPIPELINE + Description: !Sub "Testing stage for ${AppName}." + Environment: + ComputeType: BUILD_GENERAL1_SMALL + PrivilegedMode: True + EnvironmentVariables: + - Name: OUTPUT_STACK_NAME + Value: !Sub "${AppName}" + Image: !Ref CodeBuildImage + Type: LINUX_CONTAINER + Name: !Sub "${AppName}-test" + ServiceRole: !GetAtt CodeBuildTrustRole.Arn + Source: + Type: CODEPIPELINE + BuildSpec: "buildspec-test.yml" + Tags: + - Key: app-name + Value: !Ref AppName + TimeoutInMinutes: 5 + PipelineBucket: + Description: S3 bucket for AWS CodePipeline artifacts + Type: AWS::S3::Bucket + Properties: + BucketName: !Sub "pipeline-${AWS::AccountId}-${AWS::Region}-${AppName}" + VersioningConfiguration: + Status: Enabled + PipelineNotificationTopic: + Type: AWS::SNS::Topic + Properties: + Subscription: + - Protocol: email + Endpoint: !Ref AlarmRecipientEmailAddress + PipelineSNSTopicPolicy: + Type: AWS::SNS::TopicPolicy + Properties: + PolicyDocument: + Id: PipelineTopicPolicy + Version: '2012-10-17' + Statement: + - Sid: CwEventsPut + Effect: Allow + Principal: + Service: + - events.amazonaws.com + Action: sns:Publish + Resource: !Ref PipelineNotificationTopic + - 
Sid: PipelinePut + Effect: Allow + Principal: + Service: + - codepipeline.amazonaws.com + Action: sns:Publish + Resource: !Ref PipelineNotificationTopic + Topics: + - !Ref PipelineNotificationTopic + S3ArtifactBucketPolicy: + DependsOn: [PipelineBucket] + Description: S3 bucket policy for AWS CodePipeline access + Type: AWS::S3::BucketPolicy + Properties: + Bucket: !Ref PipelineBucket + PolicyDocument: + Version: "2012-10-17" + Id: SSEAndSSLPolicy + Statement: + - Sid: DenyInsecureConnections + Effect: Deny + Principal: "*" + Action: s3:* + Resource: !Sub "arn:aws:s3:::${PipelineBucket}/*" + Condition: + Bool: + aws:SecureTransport: false + ProjectPipeline: + DependsOn: [PipelineBucket, CodeBuildProject] + Description: AWS CodePipeline deployment pipeline for project + Type: AWS::CodePipeline::Pipeline + Properties: + Name: !Sub "${AppName}-pipeline" + RoleArn: !GetAtt CodePipelineTrustRole.Arn + Stages: + - Name: Source + Actions: + - Name: source + InputArtifacts: [] + ActionTypeId: + Version: "1" + Category: Source + Owner: ThirdParty + Provider: GitHub + OutputArtifacts: + - Name: !Sub "${AppName}-SourceArtifact" + Configuration: + Repo: !Ref GitHubRepoName + Branch: !Ref GitHubRepoBranch + OAuthToken: !Ref GitHubToken + Owner: !Ref GitHubRepoOwner + RunOrder: 1 + - Name: Build + Actions: + - Name: build-from-source + InputArtifacts: + - Name: !Sub "${AppName}-SourceArtifact" + ActionTypeId: + Category: Build + Owner: AWS + Version: "1" + Provider: CodeBuild + OutputArtifacts: + - Name: !Sub "${AppName}-BuildArtifact" + Configuration: + ProjectName: !Sub "${AppName}-build" + RunOrder: 1 + - Name: Deploy + Actions: + - Name: create-changeset + InputArtifacts: + - Name: !Sub "${AppName}-BuildArtifact" + ActionTypeId: + Category: Deploy + Owner: AWS + Version: "1" + Provider: CloudFormation + OutputArtifacts: [] + Configuration: + StackName: !Sub "${AppName}" + ActionMode: CHANGE_SET_REPLACE + RoleArn: !GetAtt CloudFormationTrustRole.Arn + ChangeSetName: 
pipeline-changeset + Capabilities: CAPABILITY_NAMED_IAM + TemplatePath: !Sub "${AppName}-BuildArtifact::${SAMOutputFile}" + ParameterOverrides: !Sub '{"AlarmRecipientEmailAddress": "${AlarmRecipientEmailAddress}"}' + RunOrder: 1 + - Name: approve-changeset + InputArtifacts: [] + ActionTypeId: + Category: Approval + Owner: AWS + Provider: Manual + Version: '1' + Configuration: + NotificationArn: !Ref PipelineNotificationTopic + RunOrder: 2 + - Name: execute-changeset + InputArtifacts: [] + ActionTypeId: + Category: Deploy + Owner: AWS + Version: "1" + Provider: CloudFormation + OutputArtifacts: [] + Configuration: + StackName: !Sub "${AppName}" + ActionMode: CHANGE_SET_EXECUTE + ChangeSetName: pipeline-changeset + RunOrder: 3 + - Name: Test + Actions: + - Name: end-to-end + InputArtifacts: + - Name: !Sub "${AppName}-SourceArtifact" + ActionTypeId: + Category: Build + Owner: AWS + Version: "1" + Provider: CodeBuild + Configuration: + ProjectName: !Sub "${AppName}-test" + RunOrder: 1 + ArtifactStore: + Type: S3 + Location: !Ref PipelineBucket + PipelineEventRule: + Type: "AWS::Events::Rule" + Properties: + Description: "Trigger notifications based on pipeline state change to Failure" + EventPattern: + source: + - "aws.codepipeline" + detail-type: + - CodePipeline Pipeline Execution State Change + detail: + state: + - "FAILED" + State: "ENABLED" + Targets: + - Arn: !Ref PipelineNotificationTopic + Id: "PipelineTopic" + InputTransformer: + InputTemplate: !Sub '"The ${AppName} Pipeline in account has at