Serverless AI Model Execution on AWS Fargate Spot
Moved an AI/ML compute workload from an always-on server to an on-demand setup using ECS Fargate Spot, so compute runs only when a job is submitted.
A FastAPI API records each job in DynamoDB, launches a short-lived Fargate task, and tracks progress until completion. Results are written to S3, and the API returns a download link when ready.
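The submit path above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the cluster name, task definition, container name, subnet, and DynamoDB field names are all placeholder assumptions.

```python
import time
import uuid


def new_job_record(job_id: str, params: dict) -> dict:
    """Build the DynamoDB item that tracks one job (field names are illustrative)."""
    return {
        "job_id": job_id,       # partition key
        "status": "SUBMITTED",  # SUBMITTED -> RUNNING -> COMPLETED | FAILED
        "params": params,
        "created_at": int(time.time()),
    }


def submit_job(ddb_table, ecs_client, params: dict) -> str:
    """Record the job, then launch a one-off Fargate Spot task to process it."""
    job_id = str(uuid.uuid4())
    ddb_table.put_item(Item=new_job_record(job_id, params))
    ecs_client.run_task(
        cluster="ml-jobs",               # assumed cluster name
        taskDefinition="ml-model-task",  # assumed task definition family
        capacityProviderStrategy=[{"capacityProvider": "FARGATE_SPOT", "weight": 1}],
        networkConfiguration={"awsvpcConfiguration": {
            "subnets": ["subnet-placeholder"],
            "assignPublicIp": "ENABLED",
        }},
        # Pass the job ID into the container so it knows which job to run
        overrides={"containerOverrides": [{
            "name": "model",
            "environment": [{"name": "JOB_ID", "value": job_id}],
        }]},
    )
    return job_id
```

Writing the DynamoDB record before calling `run_task` means a job is never running without a tracking row, even if the launch fails.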
The container outputs a small completion payload to CloudWatch Logs, which the API reads after the task ends — keeping the system simple and loosely coupled.
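Reading the completion payload back out of the logs might look like the sketch below. The `JOB_RESULT` sentinel and the payload fields are assumptions; the real container may use a different marker or log format.

```python
import json
from typing import Optional

COMPLETION_MARKER = "JOB_RESULT"  # assumed sentinel the container prints before its JSON payload


def find_completion_payload(log_messages: list) -> Optional[dict]:
    """Scan task log lines (newest last) for the JSON completion signal."""
    for message in reversed(log_messages):
        if message.startswith(COMPLETION_MARKER):
            return json.loads(message[len(COMPLETION_MARKER):].strip())
    return None


def read_task_result(logs_client, log_group: str, log_stream: str) -> Optional[dict]:
    """Fetch the stopped task's log stream and extract its completion payload."""
    resp = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        startFromHead=True,
    )
    return find_completion_payload([e["message"] for e in resp["events"]])
```

Because the API only reads logs after the task exits, neither side needs a queue, callback URL, or shared socket, which is what keeps the coupling loose.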
This change reduced compute costs by roughly 70% compared to on-demand Fargate (and far more compared to an always-on server), while keeping the user experience intact.
What this covers
Always-On to On-Demand
Replaced an idle-heavy server with per-job Fargate Spot tasks that start, run, and terminate automatically.
Containerized ML Model
Packaged the model + dependencies into a single Docker image so it runs the same locally and in Fargate.
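A representative Dockerfile for this kind of setup is sketched below; the base image, file layout, and entrypoint script name are assumptions, not the project's actual build.

```dockerfile
# Illustrative Dockerfile -- base image, paths, and entrypoint are assumptions
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake model weights into the image so the task needs no startup download
COPY model/ ./model/
COPY run_job.py .

# The task reads JOB_ID from its environment (set via ECS container overrides)
ENTRYPOINT ["python", "run_job.py"]
```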
Async Job Orchestration
API tracks job states in DynamoDB and manages task lifecycle (start → poll → complete/fail) with timeouts.
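The poll step can be isolated into a small loop like the one below. It is written against a generic status callable rather than ECS directly (in practice the status would come from `ecs_client.describe_tasks`, whose tasks carry a `lastStatus` field); the timeout and interval values are illustrative.

```python
import time
from typing import Callable


def poll_until_stopped(
    get_status: Callable[[], str],
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    sleep=time.sleep,  # injectable for testing
) -> str:
    """Poll a task's status until it reaches a terminal state or the timeout elapses.

    In the real system get_status would wrap ecs_client.describe_tasks and
    return the task's lastStatus (e.g. PROVISIONING, RUNNING, STOPPED).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("STOPPED", "FAILED"):
            return status
        sleep(interval_s)
    return "TIMEOUT"
```

Returning an explicit `"TIMEOUT"` state lets the API mark the DynamoDB record failed instead of polling a stuck task forever.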
CloudWatch as Communication Channel
Task writes outputs to S3 and logs a small JSON completion signal; API reads it after the task finishes.
S3 Result Pipeline
Results stored in S3 and delivered via presigned URLs; avoids pushing large files through the API server.
Full-Stack Job Lifecycle
Frontend polls job status and shows progress + results; users can download or clean up completed jobs.
ECS Task Definition as Infrastructure
Task resources, logging, and networking are codified and versioned so runs are repeatable and auditable.
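A trimmed task-definition fragment in the shape ECS expects is shown below; the family name, sizes, image URI, and role ARN are placeholders, not the project's real values.

```json
{
  "family": "ml-model-task",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "2048",
  "memory": "8192",
  "executionRoleArn": "arn:aws:iam::<account>:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "model",
      "image": "<account>.dkr.ecr.us-east-1.amazonaws.com/ml-model:latest",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ml-model-task",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "job"
        }
      }
    }
  ]
}
```

Keeping this JSON in version control means every run is traceable to an exact CPU/memory/logging configuration.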