In this blog, we'll walk through building a serverless log analytics pipeline using Amazon OpenSearch Serverless, AWS Lambda, and Amazon S3.
Log analytics is crucial for monitoring system health, detecting anomalies, and troubleshooting issues in modern applications. Traditional log processing solutions often require significant infrastructure and operational overhead. However, with OpenSearch Serverless, you can build a scalable, cost-effective log analytics pipeline without managing infrastructure.
Architecture Overview
Our serverless log analytics pipeline will use:
- AWS Lambda – Processes incoming logs and transforms data.
- Amazon S3 – Stores raw logs before ingestion.
- Amazon S3 Event Notifications – Trigger the Lambda function when new log objects arrive.
- OpenSearch Serverless – Stores and indexes log data for querying.
- OpenSearch Dashboards – For visualization and analysis.
Workflow Steps
- Application logs are stored in an S3 bucket.
- S3 event notifications trigger an AWS Lambda function.
- Lambda processes and transforms logs into OpenSearch-compatible JSON.
- The transformed logs are sent to OpenSearch Serverless.
- Users can query and visualize logs using OpenSearch Dashboards.
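The transformation in step 3 is the heart of the pipeline: raw log lines become the newline-delimited action/document pairs that the OpenSearch bulk API expects. A minimal sketch, assuming logs arrive as newline-delimited JSON (the index name "logs" is just a placeholder):

```python
import json

def to_bulk_payload(raw_log_data: str, index_name: str = "logs") -> str:
    """Convert newline-delimited JSON log lines into an OpenSearch bulk payload.

    Each log entry becomes an action line ({"index": ...}) followed by the
    document itself; the bulk body must end with a trailing newline.
    """
    lines = []
    for line in raw_log_data.strip().split("\n"):
        entry = json.loads(line)
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(entry))
    return "\n".join(lines) + "\n"

raw = '{"level": "info", "message": "started"}\n{"level": "error", "message": "boom"}'
print(to_bulk_payload(raw))
```

The same logic appears inside the Lambda function later in this post; pulling it into a helper like this makes it easy to unit-test without touching AWS.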
Step-by-Step Implementation
1. Set Up an OpenSearch Serverless Collection
Follow the steps in the AWS OpenSearch Serverless getting-started guide to create a collection.
You will need to create a Time series collection. We generally recommend enabling redundancy, but you can choose to disable it based on your availability requirements. Follow all of the relevant security steps, and make sure you enable access to both OpenSearch and OpenSearch Dashboards.
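As part of the security steps, the collection needs a network policy that opens both the OpenSearch endpoint and the Dashboards endpoint. A minimal sketch (the collection name my-app-logs is a placeholder, and you would typically restrict access via VPC endpoints rather than AllowFromPublic in production):

```json
[
  {
    "Rules": [
      { "ResourceType": "collection", "Resource": ["collection/my-app-logs"] },
      { "ResourceType": "dashboard", "Resource": ["collection/my-app-logs"] }
    ],
    "AllowFromPublic": true
  }
]
```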
2. Configure an S3 Bucket for Log Storage
- Create an S3 bucket to store logs (e.g., my-app-logs).
- Create a permissions policy and an execution role that allow your Lambda function to read from the bucket. See here for details: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#with-s3-example-create-policy.
- You also need to add a data access rule to the collection that allows this role (as well as anyone else who should have access) to access OpenSearch.
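A data access policy rule granting the Lambda role write access to the collection's indexes could be sketched as follows (the account ID, role name, and collection name are placeholders):

```json
[
  {
    "Rules": [
      {
        "ResourceType": "index",
        "Resource": ["index/my-app-logs/*"],
        "Permission": ["aoss:CreateIndex", "aoss:WriteDocument", "aoss:UpdateIndex"]
      }
    ],
    "Principal": ["arn:aws:iam::123456789012:role/my-lambda-execution-role"]
  }
]
```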
3. Deploy an AWS Lambda Function for Log Processing
Lambda will process raw logs and push them to OpenSearch.
You will need to add a layer which contains the boto3 and opensearch-py packages - See instructions here: https://docs.aws.amazon.com/lambda/latest/dg/python-layers.html#python-layer-creating.
Use the following requirements.txt file to create the layer, then upload it and attach it to the Lambda function.
opensearch-py==2.8.0
boto3==1.36.26
Don't forget to deploy the function after adding the layer as well.
Example Python Lambda Code for Log Processing
import json
import urllib.parse
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

def lambda_handler(event, context):
    host = 'your_host_name'  # serverless collection endpoint, without https://
    region = 'us-east-1'     # e.g. us-east-1
    service = 'aoss'
    credentials = boto3.Session().get_credentials()
    auth = AWSV4SignerAuth(credentials, region, service)

    # Create an OpenSearch client that signs requests with SigV4
    client = OpenSearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        pool_maxsize=20,
    )

    # Read the newly uploaded log object from S3 (keys arrive URL-encoded)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    s3_client = boto3.client('s3')
    response = s3_client.get_object(Bucket=bucket, Key=key)
    log_data = response['Body'].read().decode('utf-8')

    # Transform log data into newline-delimited JSON for the bulk API
    # (each action line is followed by its document; the body must end with a newline)
    log_entries = [json.loads(line) for line in log_data.strip().split('\n')]
    bulk_data = "".join(
        json.dumps({"index": {"_index": "logs"}}) + "\n" + json.dumps(entry) + "\n"
        for entry in log_entries
    )
    response = client.bulk(bulk_data)

    return {
        'statusCode': 200,
        'body': json.dumps(f'Indexed {len(log_entries)} log entries')
    }
Create the S3 trigger as explained here: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#with-s3-example-create-trigger.
Use a test event to test your function: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#with-s3-example-test-dummy-event.
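A minimal test event, trimmed down to the fields the handler actually reads from the standard S3 put-event shape (bucket name and key are placeholders), looks like:

```json
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "my-app-logs" },
        "object": { "key": "2024/01/01/app.log" }
      }
    }
  ]
}
```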
4. Query and Analyze Logs in OpenSearch Dashboards
- Navigate to OpenSearch Dashboards.
- Create an index pattern for your logs and use Discover to explore them (e.g., search for message:error).
- Build dashboards for real-time log monitoring.
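Beyond Discover's query bar, the same message:error filter can be expressed as an OpenSearch query DSL body for use in saved searches or API calls. A sketch, assuming your log entries have a message field and an @timestamp field:

```json
{
  "query": {
    "match": { "message": "error" }
  },
  "sort": [{ "@timestamp": { "order": "desc" } }]
}
```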
Enhancements and Optimizations
- Use data lifecycle policies for automatic log retention.
- Use Amazon Data Firehose (formerly Kinesis Data Firehose) to stream logs efficiently.
- Optimize queries using structured logging and indexing, or Pulse Query Analytics.
- Configure capacity limits to control the resources available to the collection and manage your scale.
- Monitor metrics in Amazon CloudWatch.
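As an example of the first item, an OpenSearch Serverless data lifecycle (retention) policy that keeps log indexes for 30 days could be sketched as follows (collection and index names are placeholders):

```json
{
  "Rules": [
    {
      "ResourceType": "index",
      "Resource": ["index/my-app-logs/*"],
      "MinIndexRetention": "30d"
    }
  ]
}
```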
Conclusion
By leveraging OpenSearch Serverless, AWS Lambda, and S3, you can build a scalable, serverless log analytics pipeline that is cost-efficient, highly available, and easy to manage. This architecture is ideal for monitoring applications, troubleshooting issues, and gaining real-time insights into system behavior.