CASE STUDY

Data Engine

SUMMARY

The Data Engine is an integration layer for data processing, built exclusively on serverless technologies provided by AWS. It receives incoming data, transforms it into a normalized data schema, and forwards the transformed data to the SAP cloud. It also provides an API that lets business and technical operations retrieve information from the system and maintain its data. This API is exposed as REST calls over HTTPS.
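The normalization step can be pictured as a mapping from each platform's record layout onto one shared schema. The sketch below is illustrative only; the platform names and field names (`orderId`, `interaction_id`, and so on) are assumptions, not the client's actual schema.

```python
from datetime import datetime, timezone

def normalize(platform, record):
    """Map a platform-specific record onto a shared, normalized schema.

    All field names here are hypothetical examples; the real schema
    is specific to the client's integration layer.
    """
    received_at = datetime.now(timezone.utc).isoformat()
    if platform == "ticketing":
        return {
            "source": platform,
            "document_id": record["orderId"],
            "event_id": record["eventId"],
            "received_at": received_at,
            "payload": record,
        }
    if platform == "marketing":
        return {
            "source": platform,
            "document_id": record["interaction_id"],
            "event_id": record.get("campaign"),
            "received_at": received_at,
            "payload": record,
        }
    raise ValueError(f"unknown platform: {platform}")
```

In the real system a transformation like this would typically run inside a Lambda function triggered by the incoming message.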

COMPANY PROFILE

The client conducts international marketing activities and uses a wide variety of platforms such as Google, Facebook, and Amazon to obtain customer-interaction data.

LOCATION

Austria

LOCATIONS

10 locations

EMPLOYEES

10,000 employees

INDUSTRY

production, sports, media, sales

AS-IS SITUATION

The customer organizes international events and relies on various platforms for event management and ticketing. The platforms' data is processed for internal purposes and must be deleted after defined retention periods for GDPR (DSGVO) compliance reasons.

SOLUTION

The task of creative-it was to build the State Keeper. As its name suggests, the purpose of this component is to know the state of each input document as it moves through the system until it reaches its final state. To track these state changes, every connector must report each state change of a document to the State Keeper.
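Conceptually, the State Keeper is an append-only log of state transitions per document. The following is a minimal in-memory sketch of that idea; the production component persists transitions in DynamoDB, and the state names used here are assumed examples.

```python
from datetime import datetime, timezone

class StateKeeper:
    """In-memory sketch of the State Keeper concept.

    Connectors call record() for every state change of a document;
    the latest entry is the document's current state. State names
    are illustrative, not the system's actual state model.
    """

    TERMINAL_STATES = {"DELIVERED", "FAILED"}

    def __init__(self):
        # document_id -> list of (state, timestamp) in arrival order
        self._history = {}

    def record(self, document_id, state):
        ts = datetime.now(timezone.utc).isoformat()
        self._history.setdefault(document_id, []).append((state, ts))

    def current_state(self, document_id):
        history = self._history.get(document_id)
        return history[-1][0] if history else None

    def is_final(self, document_id):
        return self.current_state(document_id) in self.TERMINAL_STATES
```

Keeping the full history, rather than only the latest state, is what makes the state changes available to business and technical operations later on.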

Fully serverless in AWS

  • DynamoDB

  • S3

  • Lambda

  • SQS (including some FIFO queues)

  • SES (for business operations emails)

  • API Gateway

  • Athena

  • Step Functions

All infrastructure is managed as Infrastructure as Code (IaC) with Terraform.

OUTCOME

Event data from the different platforms is aggregated, processed, and loaded into the target system. All state changes of the data are tracked and made available to business and technical operations. All personal data of event participants is anonymized at defined points in time.
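Anonymizing "at defined points in time" implies that every record carries a deadline derived from its ingestion time. A minimal sketch of that calculation follows; the 180-day retention period is an assumed example, not the client's actual policy.

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention period; the real value is defined by the
# client's GDPR policy, not by this sketch.
RETENTION = timedelta(days=180)

def anonymize_at(ingested_at):
    """Return the epoch timestamp at which personal data in a record
    must be anonymized, given its ingestion time."""
    return int((ingested_at + RETENTION).timestamp())
```

An epoch value like this could, for instance, be stored as a DynamoDB TTL attribute so that expiry is handled by the platform rather than by custom polling code.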


Benefits

  • Scales seamlessly under peak loads (millions of records per hour)

  • NoOps: there is no operations team monitoring the system or handling first- or second-level issues; everything that needs to be done is handled by the developers

  • There have been no outages or errors due to the unavailability of any of the services (1.5 years in production so far)

  • Costs are around EUR 1,000 per month for all three environments, which includes operations

  • The same solution on-premises would cost around EUR 15,000 per month (sizing for peak loads, dealing with outages, etc.)

  • Extremely fast time-to-market, as every infrastructure resource is created on demand by the developers; no waiting for other departments or teams to complete a simple or complex task

  • Developing the solution on-premises would have cost approximately three times as much in development

  • We can design the ideal architecture for each problem because we can use any of the large number of available AWS services

  • Transparency in the runtime costs of a feature: we can calculate the runtime cost of a new feature to the cent per request as part of the effort estimation; e.g., we can tell the business that handling a GDPR request costs less than EUR 1, no matter how many requests we receive

  • Rarely used features (such as handling GDPR removal requests) are designed so that they neither cause a high monthly fee nor increase costs with the amount of data stored; e.g., instead of keeping a copy in a database, we query the existing data in S3 buckets via Athena to find the records to be removed

  • Elasticity is so high that we have to throttle the outbound data transfer to avoid overloading the cloud-based SAP system (which had been believed to handle any load we could produce)

  • The average transition time stays constant, no matter how high the current load is

  • Using DynamoDB means there is no performance degradation as the amount of data grows (consistently below 10 ms when accessing data via the partition key)
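The outbound throttling mentioned in the benefits above is commonly implemented as a token bucket, which caps the sustained delivery rate towards the target system while allowing short bursts. The sketch below illustrates the technique under assumed rates; the actual throttle in the Data Engine is not described in detail here, and the injectable clock exists purely to make the sketch testable.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch.

    rate_per_sec is the sustained send rate towards the downstream
    system (e.g. the SAP cloud); capacity bounds the burst size.
    Both values here are illustrative, not production settings.
    """

    def __init__(self, rate_per_sec, capacity, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.now = now           # injectable clock for testing
        self.last = now()

    def try_send(self):
        """Consume one token if available; refill based on elapsed time."""
        current = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A sender loop would call `try_send()` before each outbound request and requeue (or briefly wait) when it returns `False`, keeping the downstream load bounded regardless of how elastic the upstream processing is.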