Jun 7, 2026

Cloud Project — AWS · Terraform Portfolio

SHIPPED AWS · Terraform · Lambda · DynamoDB · CloudFront · WAF · SQS · EventBridge · API Gateway · SSM · Python · IaC GitHub ↗

A self-directed cloud engineering portfolio: four AWS services, provisioned end-to-end with Terraform — no console clicking. Each project is fully independent; any one can be deployed or destroyed on its own, in any order. The progression runs from an edge-hardened static site (P1) to an event-driven data pipeline (P2), to a scheduled automation/DR system (P3), culminating in the flagship AI chatbot (P4), which reuses the code patterns of the first three without depending on their deployed infrastructure.

Overview

	Project	One-line	Core stack
P1	Static Web	HTTPS-only static site, private origin behind WAF	S3 + CloudFront + WAF + CloudWatch + SNS
P2	Serverless Pipeline	Event-driven multi-format file ingestion	S3×3 + Lambda×3 + SQS×2(+DLQ) + DynamoDB + API GW + SNS
P3	Smart Vault	Scheduled EBS backup/restore with cross-region DR	EventBridge + Lambda×3 + EBS + S3×2 + API GW(REST) + SNS
P4 ★	AI Chatbot	Gemini-powered customer-service chatbot	API GW + Lambda + DynamoDB + SSM + SNS + S3 + CloudFront

Region footprint

Primary: ap-northeast-2 (Seoul) — all compute and data.
us-east-1: WAF Web ACL + ACM + CloudFront-scope alarms (CloudFront requires global scope).
ap-southeast-1 (Singapore): P3 cross-region DR replication target.

What this portfolio demonstrates: Terraform IaC discipline, least-privilege IAM design, event-driven and scheduled serverless patterns, secrets management (SSM SecureString), edge security (WAF / CloudFront OAC), multi-region & DR, observability (CloudWatch + SNS), and a documented engineering process (ADRs, change logs, error records, remote state).

Foundation & cost guardrails

Everything starts from a clean AWS account, not the root user. A dedicated IAM user holds the Terraform credentials, and a budget alarm is wired up before the first apply so a runaway resource can never quietly rack up cost.

Creating a dedicated IAM user for Terraform instead of using the root account

Installing the AWS CLI and Terraform locally

Verifying the CLI is authenticated against the account

Setting a billing budget with a threshold alarm

Final review and budget creation

Cross-cutting engineering practices

These patterns recur across all four projects and are the backbone of the portfolio.

Infrastructure as Code. Every project follows the same Terraform module layout:

projectN/
├── main.tf        # all infrastructure
├── iam.tf         # one least-privilege role per Lambda (P2/P3/P4)
├── variables.tf   # project_name, suffix, alert_email, tags, ...
├── outputs.tf     # endpoints + ready-to-run test commands
└── lambda/        # Python 3.12 handlers (P2/P3/P4)

Provider pinned to hashicorp/aws ~> 5.0, required_version >= 1.5.0. Bucket names are made globally unique with a suffix variable.

Fixing a Terraform configuration issue surfaced during apply

terraform init — run once before starting a project

Remote state backend (shared). State is centralized in S3 with native S3 lockfile locking (use_lockfile = true, Terraform ≥ 1.10) — no DynamoDB lock table. Because a backend can’t be managed by the config that uses it, a standalone bootstrap/ config (own local state) provisions the bucket (versioned, AES256, public-access-block ×4, TLS-only deny policy). Each project namespaces its state under a distinct key. Ordering rule: apply bootstrap/ before any project terraform init. This is recorded in ADR-0002 and driven by ERR-001 (a local-state-not-shared-across-machines incident).

Security posture (recurring).

Least-privilege IAM per Lambda — every Lambda gets its own role scoped to exact ARNs and actions; no shared wildcard role.
Private S3 + CloudFront OAC — origin buckets are never public; access only via the CloudFront service principal scoped by AWS:SourceArn (P1, P4).
Secrets in SSM SecureString — P4’s Gemini key is KMS-encrypted in Parameter Store, fetched once at cold start, never in Terraform state or the Lambda env tab (ADR-0001).
Encryption at rest — AES256 SSE on all buckets, including the state bucket.
TLS-only — the state bucket denies non-HTTPS access.

Observability (recurring). Every project ships CloudWatch alarms + a dashboard + an SNS email path: error-rate/5xx (P1), Lambda errors + DLQ depth (P2), missed-backup + duration (P3), Lambda errors + response latency (P4). 5-minute aggregation; alarms notify on threshold breach.

Cost discipline. Free-tier-first: DynamoDB PAY_PER_REQUEST, TTL auto-expiry to bound storage, S3 lifecycle expiry, ARM64 Lambda. The only genuinely paid items are P1’s WAF managed rules (~$5–6/mo) and P3’s EBS snapshots ($0.05/GB/mo) + cross-region replication.

P1 — Security/Performance-Optimized Static Website

Stack: S3 + CloudFront + WAF + ACM (optional) + CloudWatch + SNS · Region: Seoul, with WAF/ACM/alarms in us-east-1.

Globally distributed, HTTPS-only static hosting where the S3 origin is fully private — reachable only through CloudFront via OAC — fronted by WAF.

User
  → WAF (CommonRuleSet SQLi/XSS, AmazonIpReputationList, RateLimit 2000/5min/IP)
  → CloudFront (HTTP→HTTPS redirect, TTL 1h/24h, 404/403 → index.html for SPA)
  → S3 (private, public-access-block all true, OAC-only)
        ↓
  CloudWatch (4xx>5%, 5xx>1%, WAF blocks>100/5min) → SNS → email

Resource	Detail
S3 bucket	Private, versioning on, AES256 SSE
Bucket policy	`s3:GetObject` only to CloudFront service principal, scoped by `AWS:SourceArn`
WAF Web ACL	`CLOUDFRONT` scope: CommonRuleSet + IP reputation + 2000 req/5min/IP rate limit
CloudFront OAC	sigv4, `signing_behavior = always`
CloudFront dist	default root `index.html`, HTTPS redirect, SPA 404/403 rewrite
CloudWatch + SNS	error-rate/WAF alarms (Seoul + us-east-1 topics) + dashboard

Security analysis: there is no public access path — OAC + the AWS:SourceArn-scoped bucket policy is the only way in; direct S3 URLs return 403 by design. WAF rejects malicious traffic before it reaches the CDN. ACM is left commented out (default CloudFront cert until a custom domain is added).

Cost: free-tier friendly; WAF managed rules (~$5–6/mo) are the only meaningful recurring cost (disable WAF for ~$0 pure-dev).

P2 — Multi-Format Data Processing Serverless Pipeline

Stack: S3×3 + Lambda×3 + SQS×2 (+2 DLQ) + DynamoDB + API Gateway (HTTP v2) + SNS + CloudWatch · Region: Seoul.

Event-driven ingestion: files land in S3 (or via API), a router classifies them by extension, structured data is parsed into DynamoDB, and unstructured data (PDF/image) goes through Textract. Failures isolate via DLQs and a quarantine bucket.

Upload (S3 ObjectCreated  or  POST /upload)
  → Router Lambda (by extension)
       ├─ csv/json  → structured SQS   → Parser Lambda    → DynamoDB
       ├─ pdf/image → unstructured SQS → Extractor Lambda → S3 processed
       └─ unknown   → S3 quarantine (tagged with reason)
  (each SQS: maxReceiveCount=3 → DLQ; errors → SNS email)

The Router is the only synchronous-on-event component; everything downstream is decoupled through SQS, so a slow or failing parser can’t back-pressure the ingest. maxReceiveCount=3 then DLQ gives bounded retries; the quarantine bucket captures inputs the router can’t classify, keeping the happy path clean. Each of the three Lambdas has its own scoped role — the Router can enqueue but not write DynamoDB; the Parser writes DynamoDB but doesn’t call Textract; and so on.

A structured CSV upload routed and parsed into DynamoDB:

CSV upload routed through the pipeline

A structured JSON upload handled the same way:

JSON upload processed into DynamoDB

An unstructured PDF flowing through the Extractor / Textract path (CloudWatch logs):

PDF processed via the Extractor Lambda

An unsupported extension diverted to the quarantine bucket instead of crashing the pipeline:

Unknown file type quarantined

Textract extraction in the console:

Textract extraction result

Cost: effectively $0–1/mo on free tier (DynamoDB pay-per-request, Lambda/SQS free tier).

P3 — Smart Vault (Intelligent Automated Backup)

Stack: EventBridge + Lambda×3 + EC2/EBS snapshots + S3×2 (cross-region) + API Gateway (REST v1) + SNS + CloudWatch · Region: Seoul (primary) + Singapore (DR).

Automated EBS backup/restore. EventBridge schedules snapshots of EC2 instances tagged backup:true; a cleanup Lambda expires snapshots by their RetainUntil tag; cleanup logs replicate cross-region; and a key-protected REST endpoint restores a snapshot to a new EBS volume.

EventBridge (hourly + daily 09:01 KST) → Backup Lambda
    → find EC2 tagged backup:true → create EBS snapshot (+ RetainUntil, ManagedBy tags)
    → SNS report email

EventBridge (daily 02:00 KST) → Cleanup Lambda
    → delete expired snapshots (DRY_RUN default true) → log to S3 archive (Seoul)
         → cross-region replication → S3 DR (Singapore, STANDARD_IA)

API Gateway POST /restore (API key) → Restore Lambda → new EBS volume from snapshot

CloudWatch alarms → SNS → email

Notable internals: the RetainUntil tag makes retention data-driven — Backup stamps an expiry, Cleanup reads it; no separate retention DB. DRY_RUN defaults to true, so the first cleanup run only lists targets (safe by default). DR is achieved with native S3 cross-region replication of the cleanup-logs/* prefix to Singapore. Restore validates snapshot_id / volume_type before the EC2 call, and /restore is guarded by an API key. EBS snapshots are intentionally not Terraform-managed — delete them manually (filter ManagedBy=smart-vault) after testing so cost stops.

The EC2 instance being protected — availability zone and security group:

EC2 security group for the protected instance

Snapshot creation, verified in the console:

Snapshot generation verification

Snapshot generation check

The SNS backup report email (subscription confirmed, then a success report):

SNS subscription confirmed

Backup success verification email

A restore driven through the REST endpoint — Lambda log and the resulting success email:

Restore Lambda log

Restore success email

Cost: EBS snapshots ($0.05/GB/mo) + cross-region replication ($0.02/GB) are the only paid items — under $1 for a small test volume. Always terraform destroy after testing.

P4 — Customer-Service AI Chatbot ★ Flagship

Stack: API Gateway (HTTP v2) + Lambda + DynamoDB + SSM + SNS + S3 + CloudFront + CloudWatch · Region: Seoul.

A Google Gemini-API-based customer-service chatbot (Bedrock is a drop-in runtime toggle). It receives messages on a REST endpoint, keeps conversation history in DynamoDB (24h TTL), returns AI responses, and emails a human agent via SNS on escalation. The web UI is served from S3 + CloudFront.

Browser / curl
  → CloudFront (HTTPS) ─ OAC ─▶ S3 (web UI, private)
  → API Gateway HTTP v2 (POST /chat)
       ▼
  Lambda chatbot (ARM64, 256MB, timeout 45s, Python 3.12)
  ┌──────────────────────────────────────────────┐
  │ 1. get_history   DynamoDB Query (last 10 turns)│
  │ 2. build_prompt  System Prompt + history + msg │
  │ 3. call_ai       Gemini API (HTTP 25s) | Bedrock│
  │ 4. route         ESCALATE / profanity / fallback│
  │ 5. save_history  DynamoDB BatchWriter (TTL 24h) │
  └──────────────────────────────────────────────┘
       │                    │
   DynamoDB             SSM Parameter Store (Gemini key, cold-start fetch)
   (sessions, TTL 24h)
       ▼ (on ESCALATE)
     SNS → email

Resource	Detail
DynamoDB `p4-chatbot-sessions`	PK `session_id` (S), SK `timestamp` (S), `PAY_PER_REQUEST`, TTL 24h
Lambda `p4-chatbot-chatbot`	ARM64, 256MB, timeout 45s, Python 3.12
API Gateway HTTP v2	`POST /chat`, CORS `allow_origins=["*"]` (restrict in prod)
S3 `p4-chatbot-ui-{suffix}`	web UI, private, AES256 SSE
CloudFront + OAC	HTTPS-only, `default_root_object=index.html`, 404→index.html
SSM Parameter Store	`/cloud-portfolio/gemini-api-key` (SecureString, KMS); Terraform-managed placeholder with `ignore_changes=[value]`, real key seeded once via CLI
SNS `p4-chatbot-alerts`	email subscription (escalation + alarms)

The _0/_1 conversation-ordering invariant. Each turn stores a user + assistant item at the same ISO timestamp, disambiguated by a sort-key suffix:

2026-06-05T14:23:45.123456+00:00_0  → role "user"
2026-06-05T14:23:45.123456+00:00_1  → role "assistant"

DynamoDB sorts the SK ascending and '0' < '1', so the user item always precedes the assistant item. get_history() pairs items as [user, assistant] — this ordering is a precondition of the pairing loop. An earlier _user/_assistant suffix broke it ('a' < 'u' put the assistant first), so multi-turn history mis-paired and fed the wrong context to the model. That was the repeated-response bug, fixed and recorded in the change log.

Credentials & state. The Gemini key lives in SSM SecureString (ADR-0001) — never in Terraform state or the Lambda env tab; it’s fetched once at cold start and cached for the container lifetime, with IAM granting ssm:GetParameter on the single parameter ARN only. P4 was also the first project migrated to the shared S3 remote-state backend (ADR-0002 / ERR-001).

Provisioning P4 with Terraform, then with Terragrunt (DRY remote-state config):

Terraform apply for P4 — part 1

Terraform apply for P4 — part 2

Terraform apply for P4 — outputs

Terragrunt run — part 1

Verifying Gruntwork's public key

Terragrunt apply complete

Cost: effectively $0/mo on free tier (Lambda / API GW / DynamoDB / SSM / S3+CloudFront / SNS all within limits; Gemini is free at ≤1,500 req/day). Switching to Bedrock ≈ $1–3/mo.

Pattern reuse

P4 reuses the code patterns of P1–P3 but creates every resource itself — no dependency on their deployed infrastructure. Each project deploys and destroys independently.

Pattern source	Borrowed pattern	Where applied in P4
P1 Static Web	S3 private + CloudFront OAC + HTTPS-only hosting	web UI
P2 Pipeline	DynamoDB `PAY_PER_REQUEST` + TTL; API GW → Lambda AWS_PROXY	`p4-chatbot-sessions` + `POST /chat`
P3 Smart Vault	SNS topic + email subscription notification	`p4-chatbot-alerts` escalation email

Skills matrix

Capability	P1	P2	P3	P4
Terraform IaC	✓	✓	✓	✓
Python 3.12 Lambda	—	✓×3	✓×3	✓
Compute	—	Lambda	Lambda + EC2/EBS	Lambda (ARM64)
Data store	—	DynamoDB	EBS snapshots + S3	DynamoDB (TTL)
Messaging / queue	—	SQS + DLQ	—	—
Eventing / schedule	—	S3 events	EventBridge cron	—
API	—	API GW HTTP	API GW REST (key)	API GW HTTP
Edge / CDN	CloudFront + WAF	—	—	CloudFront + OAC
AI / ML	—	Textract	—	Gemini / Bedrock
Secrets	—	—	API key (tfvars/SSM)	SSM SecureString
Multi-region / DR	us-east-1 (WAF)	—	Singapore DR	—
Least-privilege IAM	bucket policy	per-Lambda	per-Lambda	per-Lambda
Observability	CW + SNS	CW + SNS	CW + SNS	CW + SNS

Process skills: ADR-driven decisions, change-log discipline, error records with prevention, and a Terraform remote-state backend with locking.

Roadmap

Next up is a document-analysis engine — Textract for OCR/table extraction feeding embeddings into OpenSearch for semantic search. It’s deliberately the last step because SageMaker endpoints and OpenSearch bill hourly rather than per-request, so it’ll run as a single timed demo and be destroyed immediately after.