Cloud Project — AWS · Terraform Portfolio
A self-directed cloud engineering portfolio: four AWS services, provisioned end-to-end with Terraform — no console clicking. Each project is fully independent; any one can be deployed or destroyed on its own, in any order. The progression runs from an edge-hardened static site (P1) to an event-driven data pipeline (P2), to a scheduled automation/DR system (P3), culminating in the flagship AI chatbot (P4), which reuses the code patterns of the first three without depending on their deployed infrastructure.
Overview
| Project | One-line | Core stack | |
|---|---|---|---|
| P1 | Static Web | HTTPS-only static site, private origin behind WAF | S3 + CloudFront + WAF + CloudWatch + SNS |
| P2 | Serverless Pipeline | Event-driven multi-format file ingestion | S3×3 + Lambda×3 + SQS×2(+DLQ) + DynamoDB + API GW + SNS |
| P3 | Smart Vault | Scheduled EBS backup/restore with cross-region DR | EventBridge + Lambda×3 + EBS + S3×2 + API GW(REST) + SNS |
| P4 ★ | AI Chatbot | Gemini-powered customer-service chatbot | API GW + Lambda + DynamoDB + SSM + SNS + S3 + CloudFront |
Region footprint
- Primary:
ap-northeast-2(Seoul) — all compute and data. us-east-1: WAF Web ACL + ACM + CloudFront-scope alarms (CloudFront requires global scope).ap-southeast-1(Singapore): P3 cross-region DR replication target.
What this portfolio demonstrates: Terraform IaC discipline, least-privilege IAM design, event-driven and scheduled serverless patterns, secrets management (SSM SecureString), edge security (WAF / CloudFront OAC), multi-region & DR, observability (CloudWatch + SNS), and a documented engineering process (ADRs, change logs, error records, remote state).
Foundation & cost guardrails
Everything starts from a clean AWS account, not the root user. A dedicated IAM user holds the
Terraform credentials, and a budget alarm is wired up before the first apply so a runaway
resource can never quietly rack up cost.





Cross-cutting engineering practices
These patterns recur across all four projects and are the backbone of the portfolio.
Infrastructure as Code. Every project follows the same Terraform module layout:
projectN/
├── main.tf # all infrastructure
├── iam.tf # one least-privilege role per Lambda (P2/P3/P4)
├── variables.tf # project_name, suffix, alert_email, tags, ...
├── outputs.tf # endpoints + ready-to-run test commands
└── lambda/ # Python 3.12 handlers (P2/P3/P4)
Provider pinned to hashicorp/aws ~> 5.0, required_version >= 1.5.0. Bucket names are made globally
unique with a suffix variable.


Remote state backend (shared). State is centralized in S3 with native S3 lockfile locking
(use_lockfile = true, Terraform ≥ 1.10) — no DynamoDB lock table. Because a backend can’t be managed
by the config that uses it, a standalone bootstrap/ config (own local state) provisions the bucket
(versioned, AES256, public-access-block ×4, TLS-only deny policy). Each project namespaces its state
under a distinct key. Ordering rule: apply bootstrap/ before any project terraform init.
This is recorded in ADR-0002 and driven by ERR-001 (a local-state-not-shared-across-machines
incident).
Security posture (recurring).
- Least-privilege IAM per Lambda — every Lambda gets its own role scoped to exact ARNs and actions; no shared wildcard role.
- Private S3 + CloudFront OAC — origin buckets are never public; access only via the CloudFront
service principal scoped by
AWS:SourceArn(P1, P4). - Secrets in SSM SecureString — P4’s Gemini key is KMS-encrypted in Parameter Store, fetched once at cold start, never in Terraform state or the Lambda env tab (ADR-0001).
- Encryption at rest — AES256 SSE on all buckets, including the state bucket.
- TLS-only — the state bucket denies non-HTTPS access.
Observability (recurring). Every project ships CloudWatch alarms + a dashboard + an SNS email path: error-rate/5xx (P1), Lambda errors + DLQ depth (P2), missed-backup + duration (P3), Lambda errors + response latency (P4). 5-minute aggregation; alarms notify on threshold breach.
Cost discipline. Free-tier-first: DynamoDB PAY_PER_REQUEST, TTL auto-expiry to bound storage, S3
lifecycle expiry, ARM64 Lambda. The only genuinely paid items are P1’s WAF managed rules (~$5–6/mo) and
P3’s EBS snapshots ($0.05/GB/mo) + cross-region replication.
P1 — Security/Performance-Optimized Static Website
Stack: S3 + CloudFront + WAF + ACM (optional) + CloudWatch + SNS · Region: Seoul, with
WAF/ACM/alarms in us-east-1.
Globally distributed, HTTPS-only static hosting where the S3 origin is fully private — reachable only through CloudFront via OAC — fronted by WAF.
User
→ WAF (CommonRuleSet SQLi/XSS, AmazonIpReputationList, RateLimit 2000/5min/IP)
→ CloudFront (HTTP→HTTPS redirect, TTL 1h/24h, 404/403 → index.html for SPA)
→ S3 (private, public-access-block all true, OAC-only)
↓
CloudWatch (4xx>5%, 5xx>1%, WAF blocks>100/5min) → SNS → email
| Resource | Detail |
|---|---|
| S3 bucket | Private, versioning on, AES256 SSE |
| Bucket policy | s3:GetObject only to CloudFront service principal, scoped by AWS:SourceArn |
| WAF Web ACL | CLOUDFRONT scope: CommonRuleSet + IP reputation + 2000 req/5min/IP rate limit |
| CloudFront OAC | sigv4, signing_behavior = always |
| CloudFront dist | default root index.html, HTTPS redirect, SPA 404/403 rewrite |
| CloudWatch + SNS | error-rate/WAF alarms (Seoul + us-east-1 topics) + dashboard |
Security analysis: there is no public access path — OAC + the AWS:SourceArn-scoped bucket
policy is the only way in; direct S3 URLs return 403 by design. WAF rejects malicious traffic before it
reaches the CDN. ACM is left commented out (default CloudFront cert until a custom domain is added).
Cost: free-tier friendly; WAF managed rules (~$5–6/mo) are the only meaningful recurring cost (disable WAF for ~$0 pure-dev).
P2 — Multi-Format Data Processing Serverless Pipeline
Stack: S3×3 + Lambda×3 + SQS×2 (+2 DLQ) + DynamoDB + API Gateway (HTTP v2) + SNS + CloudWatch · Region: Seoul.
Event-driven ingestion: files land in S3 (or via API), a router classifies them by extension, structured data is parsed into DynamoDB, and unstructured data (PDF/image) goes through Textract. Failures isolate via DLQs and a quarantine bucket.
Upload (S3 ObjectCreated or POST /upload)
→ Router Lambda (by extension)
├─ csv/json → structured SQS → Parser Lambda → DynamoDB
├─ pdf/image → unstructured SQS → Extractor Lambda → S3 processed
└─ unknown → S3 quarantine (tagged with reason)
(each SQS: maxReceiveCount=3 → DLQ; errors → SNS email)
The Router is the only synchronous-on-event component; everything downstream is decoupled through SQS,
so a slow or failing parser can’t back-pressure the ingest. maxReceiveCount=3 then DLQ gives bounded
retries; the quarantine bucket captures inputs the router can’t classify, keeping the happy path clean.
Each of the three Lambdas has its own scoped role — the Router can enqueue but not write DynamoDB; the
Parser writes DynamoDB but doesn’t call Textract; and so on.
A structured CSV upload routed and parsed into DynamoDB:

A structured JSON upload handled the same way:

An unstructured PDF flowing through the Extractor / Textract path (CloudWatch logs):

An unsupported extension diverted to the quarantine bucket instead of crashing the pipeline:

Textract extraction in the console:

Cost: effectively $0–1/mo on free tier (DynamoDB pay-per-request, Lambda/SQS free tier).
P3 — Smart Vault (Intelligent Automated Backup)
Stack: EventBridge + Lambda×3 + EC2/EBS snapshots + S3×2 (cross-region) + API Gateway (REST v1) + SNS + CloudWatch · Region: Seoul (primary) + Singapore (DR).
Automated EBS backup/restore. EventBridge schedules snapshots of EC2 instances tagged backup:true; a
cleanup Lambda expires snapshots by their RetainUntil tag; cleanup logs replicate cross-region; and a
key-protected REST endpoint restores a snapshot to a new EBS volume.
EventBridge (hourly + daily 09:01 KST) → Backup Lambda
→ find EC2 tagged backup:true → create EBS snapshot (+ RetainUntil, ManagedBy tags)
→ SNS report email
EventBridge (daily 02:00 KST) → Cleanup Lambda
→ delete expired snapshots (DRY_RUN default true) → log to S3 archive (Seoul)
→ cross-region replication → S3 DR (Singapore, STANDARD_IA)
API Gateway POST /restore (API key) → Restore Lambda → new EBS volume from snapshot
CloudWatch alarms → SNS → email
Notable internals: the RetainUntil tag makes retention data-driven — Backup stamps an expiry,
Cleanup reads it; no separate retention DB. DRY_RUN defaults to true, so the first cleanup run
only lists targets (safe by default). DR is achieved with native S3 cross-region replication of the
cleanup-logs/* prefix to Singapore. Restore validates snapshot_id / volume_type before the EC2
call, and /restore is guarded by an API key. EBS snapshots are intentionally not Terraform-managed
— delete them manually (filter ManagedBy=smart-vault) after testing so cost stops.
The EC2 instance being protected — availability zone and security group:

Snapshot creation, verified in the console:


The SNS backup report email (subscription confirmed, then a success report):


A restore driven through the REST endpoint — Lambda log and the resulting success email:


Cost: EBS snapshots ($0.05/GB/mo) + cross-region replication ($0.02/GB) are the only paid items —
under $1 for a small test volume. Always terraform destroy after testing.
P4 — Customer-Service AI Chatbot ★ Flagship
Stack: API Gateway (HTTP v2) + Lambda + DynamoDB + SSM + SNS + S3 + CloudFront + CloudWatch · Region: Seoul.
A Google Gemini-API-based customer-service chatbot (Bedrock is a drop-in runtime toggle). It receives messages on a REST endpoint, keeps conversation history in DynamoDB (24h TTL), returns AI responses, and emails a human agent via SNS on escalation. The web UI is served from S3 + CloudFront.
Browser / curl
→ CloudFront (HTTPS) ─ OAC ─▶ S3 (web UI, private)
→ API Gateway HTTP v2 (POST /chat)
▼
Lambda chatbot (ARM64, 256MB, timeout 45s, Python 3.12)
┌──────────────────────────────────────────────┐
│ 1. get_history DynamoDB Query (last 10 turns)│
│ 2. build_prompt System Prompt + history + msg │
│ 3. call_ai Gemini API (HTTP 25s) | Bedrock│
│ 4. route ESCALATE / profanity / fallback│
│ 5. save_history DynamoDB BatchWriter (TTL 24h) │
└──────────────────────────────────────────────┘
│ │
DynamoDB SSM Parameter Store (Gemini key, cold-start fetch)
(sessions, TTL 24h)
▼ (on ESCALATE)
SNS → email
| Resource | Detail |
|---|---|
DynamoDB p4-chatbot-sessions | PK session_id (S), SK timestamp (S), PAY_PER_REQUEST, TTL 24h |
Lambda p4-chatbot-chatbot | ARM64, 256MB, timeout 45s, Python 3.12 |
| API Gateway HTTP v2 | POST /chat, CORS allow_origins=["*"] (restrict in prod) |
S3 p4-chatbot-ui-{suffix} | web UI, private, AES256 SSE |
| CloudFront + OAC | HTTPS-only, default_root_object=index.html, 404→index.html |
| SSM Parameter Store | /cloud-portfolio/gemini-api-key (SecureString, KMS); Terraform-managed placeholder with ignore_changes=[value], real key seeded once via CLI |
SNS p4-chatbot-alerts | email subscription (escalation + alarms) |
The _0/_1 conversation-ordering invariant. Each turn stores a user + assistant item at the
same ISO timestamp, disambiguated by a sort-key suffix:
2026-06-05T14:23:45.123456+00:00_0 → role "user"
2026-06-05T14:23:45.123456+00:00_1 → role "assistant"
DynamoDB sorts the SK ascending and '0' < '1', so the user item always precedes the assistant item.
get_history() pairs items as [user, assistant] — this ordering is a precondition of the pairing
loop. An earlier _user/_assistant suffix broke it ('a' < 'u' put the assistant first), so
multi-turn history mis-paired and fed the wrong context to the model. That was the repeated-response
bug, fixed and recorded in the change log.
Credentials & state. The Gemini key lives in SSM SecureString (ADR-0001) — never in Terraform
state or the Lambda env tab; it’s fetched once at cold start and cached for the container lifetime, with
IAM granting ssm:GetParameter on the single parameter ARN only. P4 was also the first project migrated
to the shared S3 remote-state backend (ADR-0002 / ERR-001).
Provisioning P4 with Terraform, then with Terragrunt (DRY remote-state config):






Cost: effectively $0/mo on free tier (Lambda / API GW / DynamoDB / SSM / S3+CloudFront / SNS all within limits; Gemini is free at ≤1,500 req/day). Switching to Bedrock ≈ $1–3/mo.
Pattern reuse
P4 reuses the code patterns of P1–P3 but creates every resource itself — no dependency on their deployed infrastructure. Each project deploys and destroys independently.
| Pattern source | Borrowed pattern | Where applied in P4 |
|---|---|---|
| P1 Static Web | S3 private + CloudFront OAC + HTTPS-only hosting | web UI |
| P2 Pipeline | DynamoDB PAY_PER_REQUEST + TTL; API GW → Lambda AWS_PROXY | p4-chatbot-sessions + POST /chat |
| P3 Smart Vault | SNS topic + email subscription notification | p4-chatbot-alerts escalation email |
Skills matrix
| Capability | P1 | P2 | P3 | P4 |
|---|---|---|---|---|
| Terraform IaC | ✓ | ✓ | ✓ | ✓ |
| Python 3.12 Lambda | — | ✓×3 | ✓×3 | ✓ |
| Compute | — | Lambda | Lambda + EC2/EBS | Lambda (ARM64) |
| Data store | — | DynamoDB | EBS snapshots + S3 | DynamoDB (TTL) |
| Messaging / queue | — | SQS + DLQ | — | — |
| Eventing / schedule | — | S3 events | EventBridge cron | — |
| API | — | API GW HTTP | API GW REST (key) | API GW HTTP |
| Edge / CDN | CloudFront + WAF | — | — | CloudFront + OAC |
| AI / ML | — | Textract | — | Gemini / Bedrock |
| Secrets | — | — | API key (tfvars/SSM) | SSM SecureString |
| Multi-region / DR | us-east-1 (WAF) | — | Singapore DR | — |
| Least-privilege IAM | bucket policy | per-Lambda | per-Lambda | per-Lambda |
| Observability | CW + SNS | CW + SNS | CW + SNS | CW + SNS |
Process skills: ADR-driven decisions, change-log discipline, error records with prevention, and a Terraform remote-state backend with locking.
Roadmap
Next up is a document-analysis engine — Textract for OCR/table extraction feeding embeddings into OpenSearch for semantic search. It’s deliberately the last step because SageMaker endpoints and OpenSearch bill hourly rather than per-request, so it’ll run as a single timed demo and be destroyed immediately after.