In this blog, we'll walk through building a serverless log analytics pipeline using Amazon OpenSearch Serverless, AWS Lambda, and Amazon S3.
Log analytics is crucial for monitoring system health, detecting anomalies, and troubleshooting issues in modern applications. Traditional log processing solutions often require significant infrastructure and operational overhead. However, with OpenSearch Serverless, you can build a scalable, cost-effective log analytics pipeline without managing infrastructure.
Architecture Overview
Our serverless log analytics pipeline will use:
- AWS Lambda – Processes incoming logs and transforms data.
- Amazon S3 – Stores raw logs before ingestion.
- Amazon S3 Event Notifications – Trigger the Lambda function when new log objects arrive.
- OpenSearch Serverless – Stores and indexes log data for querying.
- OpenSearch Dashboards – For visualization and analysis.
Workflow Steps
- Application logs are stored in an S3 bucket.
- S3 event notifications trigger an AWS Lambda function.
- Lambda processes and transforms logs into OpenSearch-compatible JSON.
- The transformed logs are sent to OpenSearch Serverless.
- Users can query and visualize logs using OpenSearch Dashboards.
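The transformation in step 3 is the heart of the pipeline: raw log lines become the newline-delimited action/document pairs that the OpenSearch bulk API expects. A minimal sketch, assuming logs arrive as newline-delimited JSON (the index name "logs" is just a placeholder):

```python
import json

def to_bulk_payload(raw_log_data: str, index_name: str = "logs") -> str:
    """Convert newline-delimited JSON log lines into an OpenSearch bulk payload.

    Each log entry becomes an action line ({"index": ...}) followed by the
    document itself; the bulk body must end with a trailing newline.
    """
    lines = []
    for line in raw_log_data.strip().split("\n"):
        entry = json.loads(line)
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(entry))
    return "\n".join(lines) + "\n"

raw = '{"level": "info", "message": "started"}\n{"level": "error", "message": "boom"}'
print(to_bulk_payload(raw))
```

The same logic appears inside the Lambda function later in this post; pulling it into a helper like this makes it easy to unit-test without touching AWS.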
Step-by-Step Implementation
1. Set Up an OpenSearch Serverless Collection
Follow the steps in the AWS OpenSearch Serverless getting-started guide to create a collection.
You will need to create a Time series collection. We generally recommend enabling redundancy, but you can choose to disable it based on your availability requirements. Follow all of the relevant security steps, and make sure you enable access to both OpenSearch and OpenSearch Dashboards.
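As part of the security steps, the collection needs a network policy that opens both the OpenSearch endpoint and the Dashboards endpoint. A minimal sketch (the collection name my-app-logs is a placeholder, and you would typically restrict access via VPC endpoints rather than AllowFromPublic in production):

```json
[
  {
    "Rules": [
      { "ResourceType": "collection", "Resource": ["collection/my-app-logs"] },
      { "ResourceType": "dashboard", "Resource": ["collection/my-app-logs"] }
    ],
    "AllowFromPublic": true
  }
]
```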
2. Configure an S3 Bucket for Log Storage
- Create an S3 bucket to store logs (e.g., my-app-logs).
- Create a permissions policy and an execution role that allow your Lambda function to read from the bucket. See here for details: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#with-s3-example-create-policy.
- You also need to add a data access rule to the collection that allows this role (as well as anyone else who should have access) to access OpenSearch.
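A data access policy rule granting the Lambda role write access to the collection's indexes could be sketched as follows (the account ID, role name, and collection name are placeholders):

```json
[
  {
    "Rules": [
      {
        "ResourceType": "index",
        "Resource": ["index/my-app-logs/*"],
        "Permission": ["aoss:CreateIndex", "aoss:WriteDocument", "aoss:UpdateIndex"]
      }
    ],
    "Principal": ["arn:aws:iam::123456789012:role/my-lambda-execution-role"]
  }
]
```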
3. Deploy an AWS Lambda Function for Log Processing
Lambda will process raw logs and push them to OpenSearch.
You will need to add a layer which contains the boto3 and opensearch-py packages - See instructions here: https://docs.aws.amazon.com/lambda/latest/dg/python-layers.html#python-layer-creating.
Use the following requirements.txt file to create the layer, then upload it and attach it to the Lambda function.
opensearch-py==2.8.0
boto3==1.36.26
Don't forget to deploy the function after adding the layer as well.
Example Python Lambda Code for Log Processing
import json
import urllib.parse
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

def lambda_handler(event, context):
    host = 'your_host_name'  # serverless collection endpoint, without https://
    region = 'us-east-1'     # e.g. us-east-1
    service = 'aoss'
    credentials = boto3.Session().get_credentials()
    auth = AWSV4SignerAuth(credentials, region, service)

    # Create an OpenSearch client that signs requests with SigV4
    client = OpenSearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        pool_maxsize=20,
    )

    # Read the newly uploaded log object from S3 (keys arrive URL-encoded)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    s3_client = boto3.client('s3')
    response = s3_client.get_object(Bucket=bucket, Key=key)
    log_data = response['Body'].read().decode('utf-8')

    # Transform log data into newline-delimited JSON for the bulk API
    # (each action line is followed by its document; the body must end with a newline)
    log_entries = [json.loads(line) for line in log_data.strip().split('\n')]
    bulk_data = "".join(
        json.dumps({"index": {"_index": "logs"}}) + "\n" + json.dumps(entry) + "\n"
        for entry in log_entries
    )
    response = client.bulk(bulk_data)

    return {
        'statusCode': 200,
        'body': json.dumps(f'Indexed {len(log_entries)} log entries')
    }
Create the S3 trigger as explained here: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#with-s3-example-create-trigger.
Use a test event to test your function: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#with-s3-example-test-dummy-event.
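A minimal test event, trimmed down to the fields the handler actually reads from the standard S3 put-event shape (bucket name and key are placeholders), looks like:

```json
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "my-app-logs" },
        "object": { "key": "2024/01/01/app.log" }
      }
    }
  ]
}
```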
4. Query and Analyze Logs in OpenSearch Dashboards
- Navigate to OpenSearch Dashboards.
- Create an index pattern for your logs and use Discover to explore them (e.g., search for message:error).
- Build dashboards for real-time log monitoring.
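Beyond Discover's query bar, the same message:error filter can be expressed as an OpenSearch query DSL body for use in saved searches or API calls. A sketch, assuming your log entries have a message field and an @timestamp field:

```json
{
  "query": {
    "match": { "message": "error" }
  },
  "sort": [{ "@timestamp": { "order": "desc" } }]
}
```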
Enhancements and Optimizations
- Use data lifecycle policies for automatic log retention.
- Use Amazon Data Firehose (formerly Kinesis Data Firehose) to stream logs efficiently.
- Optimize queries using structured logging and indexing, or Pulse Query Analytics.
- Configure capacity limits to control the resources available to the collection and manage your scale.
- Monitor metrics in Amazon CloudWatch.
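As an example of the first item, an OpenSearch Serverless data lifecycle (retention) policy that keeps log indexes for 30 days could be sketched as follows (collection and index names are placeholders):

```json
{
  "Rules": [
    {
      "ResourceType": "index",
      "Resource": ["index/my-app-logs/*"],
      "MinIndexRetention": "30d"
    }
  ]
}
```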
Conclusion
By leveraging OpenSearch Serverless, AWS Lambda, and S3, you can build a scalable, serverless log analytics pipeline that is cost-efficient, highly available, and easy to manage. This architecture is ideal for monitoring applications, troubleshooting issues, and gaining real-time insights into system behavior.