A production-grade data lakehouse designed for low cost and scalability.
This architecture ingests raw CloudFront edge logs, processes them with serverless Glue Spark ETL jobs, and stores them in
Apache Iceberg format for ACID-compliant analytics.
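
As a rough illustration of the first ETL step, the sketch below parses raw edge logs into records. It assumes the logs follow the CloudFront standard access log layout (tab-separated rows preceded by a `#Fields:` header, with `-` marking an empty value). The function name `parse_edge_log` and the shortened field list in `SAMPLE` are hypothetical and chosen for illustration only; the real Glue job is not shown in this document and may work differently.

```python
from typing import Dict, Iterator, List, Optional


def parse_edge_log(text: str) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per request row of a CloudFront-style access log."""
    fields: List[str] = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            # The header lists the column names, separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other comment headers
        values = line.split("\t")
        if len(values) != len(fields):
            continue  # skip malformed rows instead of failing the batch
        # "-" marks a missing value; map it to None for downstream typing.
        yield {k: (None if v == "-" else v) for k, v in zip(fields, values)}


# Shortened, illustrative sample (real logs carry many more fields).
SAMPLE = (
    "#Version: 1.0\n"
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs-uri-stem sc-status\n"
    "2024-01-15\t08:30:00\tFRA56-P1\t2048\t203.0.113.7\tGET\t/index.html\t200\n"
    "2024-01-15\t08:30:02\tFRA56-P1\t-\t203.0.113.9\tGET\t/missing\t404\n"
)

if __name__ == "__main__":
    for row in parse_edge_log(SAMPLE):
        print(row["date"], row["cs-uri-stem"], row["sc-status"])
```

Skipping malformed rows keeps a single bad line from failing a whole batch; a stricter job could instead route such rows to a quarantine location.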
Architecture Diagram
flowchart TB
subgraph Users
Browser[Browser]
end
subgraph GitHub
Repo[GitHub Repository]
Actions[GitHub Actions]
end
subgraph AWS_Global["AWS Global"]
WAF[AWS WAF<br/>Web ACL]
Route53[Route53<br/>DNS]
ACM[ACM Certificate<br/>us-east-1]
CloudFront[CloudFront Pro<br/>Primary Distribution]
CloudFrontDR[CloudFront Pro<br/>DR Distribution]
LambdaDR[Lambda<br/>DR Failover]
CWHealthAlarm[CloudWatch Alarm<br/>Health Check]
end
subgraph AWS_EU["AWS eu-central-1"]
S3[S3 Bucket<br/>allaboutdata.eu]
KMS[KMS<br/>Website CMK]
KMSLake[KMS<br/>Datalake CMK]
end
subgraph AWS_DR["AWS eu-central-2 (DR)"]
S3DR[S3 Bucket<br/>allaboutdata.eu-dr]
KMSDR[KMS<br/>DR Customer Managed Key]
end
subgraph DataLake["AWS Data Lake"]
S3Lake[S3 Data Lake Bucket]
GlueCatalog[Glue Data Catalog<br/>Bronze Table]
EventBridge[EventBridge<br/>Scheduler]
GlueJob[Glue Spark Job<br/>Bronze → Silver]
Iceberg[Iceberg Warehouse<br/>Silver Table]
GoldJob[Glue Spark Job<br/>Silver → Gold]
Gold[Gold Warehouse<br/>Parquet Aggregations]
Athena[Athena<br/>SQL Analytics]
LambdaDash[Lambda<br/>Dashboard Export]
GlueDQ[Glue Data Quality<br/>DQDL Rulesets]
end
subgraph Observability["AWS Observability"]
CWDashboard[CloudWatch<br/>Dashboard]
CWAlarms[CloudWatch<br/>Alarms]
SNS[SNS<br/>Email Alerts]
end
subgraph External
GoogleWorkspace[Google Workspace<br/>Email]
end
%% User Flow
Browser -->|HTTPS Request| Route53
Route53 -->|DNS Resolution| CloudFront
WAF -->|Protection| CloudFront
WAF -->|Protection| CloudFrontDR
CloudFront -->|Origin| S3
CloudFrontDR -->|Origin| S3DR
ACM -.->|TLS Certificate| CloudFront
ACM -.->|TLS Certificate| CloudFrontDR
%% DR Failover Flow
Route53 -.->|Health Check| CWHealthAlarm
CWHealthAlarm -->|SNS → ALARM/OK| LambdaDR
LambdaDR -.->|AssociateAlias| CloudFront
LambdaDR -.->|AssociateAlias| CloudFrontDR
%% CI/CD Flow
Repo -->|Push to main| Actions
Actions -->|S3 Sync| S3
Actions -->|Cache Invalidation| CloudFront
%% Email (Google Workspace)
Route53 -.->|MX Record| GoogleWorkspace
%% Data Lake Flow
CloudFront -->|Access Logs| S3Lake
S3Lake -->|TSV Logs| GlueCatalog
S3Lake -.->|S3 Events| EventBridge
EventBridge -->|6h Trigger| GlueJob
EventBridge -->|Daily Trigger| GoldJob
GlueCatalog -->|Incremental Read| GlueJob
GlueJob -->|Write Iceberg V2| Iceberg
Iceberg -->|Delta Read| GoldJob
GoldJob -->|Write Parquet 128MB| Gold
Iceberg -->|Query| Athena
Gold -->|Query| Athena
%% Dashboard Export Flow
GoldJob -.->|Job Complete| EventBridge
EventBridge -->|Trigger| LambdaDash
LambdaDash -->|Query Gold| Athena
LambdaDash -->|Write JSON| S3
%% Cross-Region Replication
S3 -->|S3 CRR| S3DR
%% Encryption
KMS -.->|SSE-KMS| S3
KMSDR -.->|SSE-KMS| S3DR
KMSLake -.->|SSE-KMS| S3Lake
%% Data Quality (Daily at 3 AM)
EventBridge -->|Daily 3 AM| GlueDQ
GlueDQ -->|Validate| Iceberg
GlueDQ -->|Validate| Gold
%% Observability Flow
GlueJob -.->|Metrics| CWDashboard
GoldJob -.->|Metrics| CWDashboard
LambdaDash -.->|Metrics| CWDashboard
CWAlarms -->|Alert| SNS
SNS -->|Email| GoogleWorkspace
%% Styling
classDef aws fill:#FF9900,stroke:#232F3E,color:#232F3E
classDef github fill:#24292E,stroke:#24292E,color:#fff
classDef user fill:#4285F4,stroke:#1a73e8,color:#fff
classDef external fill:#34A853,stroke:#1e8e3e,color:#fff
classDef datalake fill:#8C4FFF,stroke:#232F3E,color:#fff
classDef observability fill:#DD3522,stroke:#232F3E,color:#fff
classDef security fill:#1A8FE3,stroke:#232F3E,color:#fff
class Route53,ACM,CloudFront,CloudFrontDR,S3,S3DR aws
class WAF,KMS,KMSDR,KMSLake security
class Repo,Actions github
class Browser user
class GoogleWorkspace external
class S3Lake,GlueCatalog,EventBridge,GlueJob,Iceberg,GoldJob,Gold,Athena,LambdaDash,GlueDQ datalake
class CWDashboard,CWAlarms,CWHealthAlarm,SNS observability
class LambdaDR datalake
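
The DR failover path in the diagram (health-check alarm, SNS notification, Lambda, `AssociateAlias`) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the project's actual Lambda: the distribution IDs are placeholders, `SITE_ALIAS` is assumed to be the domain matching the bucket name, and the handler expects the standard SNS-wrapped CloudWatch alarm payload with a `NewStateValue` field. The optional `cloudfront` parameter is a hypothetical hook added so the logic can be exercised without AWS access.

```python
import json
from typing import Any, Dict, Optional

# Placeholder identifiers (assumptions, not taken from the project).
PRIMARY_DISTRIBUTION_ID = "EPRIMARYEXAMPLE"
DR_DISTRIBUTION_ID = "EDREXAMPLE"
SITE_ALIAS = "allaboutdata.eu"  # assumed to match the website domain


def target_distribution(alarm_state: str) -> Optional[str]:
    """Map a CloudWatch alarm state to the distribution that should own the alias."""
    if alarm_state == "ALARM":
        return DR_DISTRIBUTION_ID       # primary unhealthy: fail over
    if alarm_state == "OK":
        return PRIMARY_DISTRIBUTION_ID  # primary recovered: fail back
    return None  # e.g. INSUFFICIENT_DATA: leave the alias where it is


def handler(event: Dict[str, Any], context: Any, cloudfront: Any = None) -> Dict[str, Any]:
    """Lambda entry point for SNS-delivered CloudWatch alarm notifications."""
    # SNS wraps the alarm payload as a JSON string in Records[0].Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    target = target_distribution(message["NewStateValue"])
    if target is None:
        return {"moved": False}
    if cloudfront is None:
        import boto3  # imported lazily so the sketch can be tested offline
        cloudfront = boto3.client("cloudfront")
    # Move the alternate domain name to the chosen distribution.
    cloudfront.associate_alias(TargetDistributionId=target, Alias=SITE_ALIAS)
    return {"moved": True, "target": target}
```

Treating `ALARM` and `OK` symmetrically gives automatic failback once the primary recovers. A production version would likely also check which distribution currently holds the alias before calling `associate_alias`, and handle API errors.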