Serverless AI Model Execution on AWS Fargate Spot

Moved an AI/ML compute workload from an always-on server to an on-demand setup using ECS Fargate Spot, so compute runs only when a job is submitted.

A FastAPI API records each job in DynamoDB, launches a short-lived Fargate task, and tracks progress until completion. Results are written to S3, and the API returns a download link when ready.
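The submission step can be sketched as a small helper pair: one function builds the DynamoDB job record, another writes it and launches the task. This is a minimal sketch, not the project's actual code; the table name, cluster name, and record fields are illustrative assumptions, and the boto3 clients are assumed to be created elsewhere (e.g. inside the FastAPI app).

```python
import time
import uuid

# Illustrative names -- the real table/cluster/task definition names are assumptions.
JOBS_TABLE = "ml-jobs"
CLUSTER = "ml-cluster"
TASK_DEF = "ml-model-task"

def new_job_item(input_key):
    """Build the DynamoDB record for a freshly submitted job."""
    return {
        "job_id": str(uuid.uuid4()),
        "status": "PENDING",           # PENDING -> RUNNING -> COMPLETED / FAILED
        "input_key": input_key,        # S3 key of the uploaded input file
        "submitted_at": int(time.time()),
    }

def submit_job(jobs_table, ecs, input_key):
    """Record the job in DynamoDB, then launch a one-off Fargate task for it.

    `jobs_table` is a boto3 DynamoDB Table resource; `ecs` is a boto3 ECS client.
    """
    item = new_job_item(input_key)
    jobs_table.put_item(Item=item)
    ecs.run_task(
        cluster=CLUSTER,
        taskDefinition=TASK_DEF,
        launchType="FARGATE",  # Spot capacity is configured separately; see below
        overrides={
            "containerOverrides": [
                {"name": "model",
                 "environment": [{"name": "JOB_ID", "value": item["job_id"]}]}
            ]
        },
    )
    return item["job_id"]
```

The task reads `JOB_ID` from its environment, fetches its input from S3, and writes results back under a key derived from that ID.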

The container outputs a small completion payload to CloudWatch Logs, which the API reads after the task ends — keeping the system simple and loosely coupled.
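Reading that completion payload amounts to scanning the task's log stream for a single JSON line. A minimal sketch, assuming the container prints a line like `{"event": "job_complete", ...}` (the field names here are illustrative, not the project's actual schema):

```python
import json

def parse_completion(log_lines):
    """Find the JSON completion payload among a task's log lines.

    The container is assumed to print one line such as:
      {"event": "job_complete", "result_key": "results/abc.csv"}
    Scans from the end, since the signal is emitted last.
    """
    for line in reversed(log_lines):
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        if payload.get("event") == "job_complete":
            return payload
    return None

def fetch_completion(logs, log_group, log_stream):
    """Pull the task's log events and extract the completion payload.

    `logs` is a boto3 CloudWatch Logs client; the stream name follows the
    awslogs driver's `prefix/container/task-id` convention.
    """
    resp = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        startFromHead=False,   # read from the tail, where the signal lives
    )
    return parse_completion([e["message"] for e in resp["events"]])
```

Because the only contract between the API and the container is this one log line, either side can change independently, which is what keeps the coupling loose.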

This change reduced compute costs by roughly 70% compared to on-demand Fargate (and far more compared to an always-on server), while keeping the user experience intact.

AWS ECS Fargate Spot · AWS ECR · AWS S3 · AWS CloudWatch Logs · AWS DynamoDB · Docker · FastAPI · Vue.js · Python (RDKit, Pandas, Boto3)
View on GitHub

What this covers

Always-On to On-Demand

Replaced an idle-heavy server with per-job Fargate Spot tasks that start, run, and terminate automatically.
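Launching on Spot capacity comes down to one `run_task` parameter: a capacity provider strategy naming `FARGATE_SPOT`. A sketch of the request, with illustrative cluster, subnet, and container names (the real values are assumptions):

```python
CLUSTER = "ml-cluster"       # illustrative
TASK_DEF = "ml-model-task"   # illustrative

def run_task_request(job_id):
    """kwargs for ecs.run_task: Spot-only capacity, awsvpc networking."""
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEF,
        "count": 1,
        # All weight on FARGATE_SPOT: run on Spot or not at all.
        "capacityProviderStrategy": [
            {"capacityProvider": "FARGATE_SPOT", "weight": 1}
        ],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": ["subnet-xxxx"],        # placeholder subnet ID
                "assignPublicIp": "ENABLED",       # needed to pull from ECR without a NAT
            }
        },
        "overrides": {
            "containerOverrides": [
                {"name": "model",
                 "environment": [{"name": "JOB_ID", "value": job_id}]}
            ]
        },
    }

# Usage: ecs.run_task(**run_task_request(job_id))
```

Note that `capacityProviderStrategy` and `launchType` are mutually exclusive in the ECS API, so a Spot launch specifies only the strategy.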

Containerized ML Model

Packaged the model + dependencies into a single Docker image so it runs the same locally and in Fargate.

Async Job Orchestration

API tracks job states in DynamoDB and manages task lifecycle (start → poll → complete/fail) with timeouts.
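The start → poll → complete/fail loop can be sketched as below. This is a simplified sketch under assumed names; the real orchestration likely runs this in a background task and also updates the DynamoDB record on each transition.

```python
import time

def task_outcome(last_status, exit_code):
    """Map an ECS task state to a final job status; None means keep polling."""
    if last_status != "STOPPED":
        return None
    return "COMPLETED" if exit_code == 0 else "FAILED"

def poll_task(ecs, cluster, task_arn, timeout_s=900, interval_s=15):
    """Poll describe_tasks until the task stops or the timeout expires.

    `ecs` is a boto3 ECS client. A timeout marks the job FAILED so a lost
    or stuck task cannot leave the job record in RUNNING forever.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
        exit_code = task["containers"][0].get("exitCode")  # absent until stopped
        outcome = task_outcome(task["lastStatus"], exit_code)
        if outcome is not None:
            return outcome
        time.sleep(interval_s)
    return "FAILED"  # timed out
```

Mapping the terminal state through the container's exit code, rather than the task status alone, is what lets the API distinguish a clean completion from a crash or a Spot interruption.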

CloudWatch as Communication Channel

Task writes outputs to S3 and logs a small JSON completion signal; API reads it after the task finishes.

S3 Result Pipeline

Results stored in S3 and delivered via presigned URLs; avoids pushing large files through the API server.

Full-Stack Job Lifecycle

Frontend polls job status and shows progress + results; users can download or clean up completed jobs.

ECS Task Definition as Infrastructure

Task resources, logging, and networking are codified and versioned so runs are repeatable and auditable.
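A codified task definition is just a versioned request body for `register_task_definition`. A sketch with illustrative values (CPU/memory sizes, log group, and region are assumptions, not the project's actual settings):

```python
def task_definition(image_uri):
    """kwargs for ecs.register_task_definition.

    Resources, logging, and networking live in code, so every revision is
    diffable and every run uses the same pinned configuration.
    """
    return {
        "family": "ml-model-task",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",            # required for Fargate
        "cpu": "1024",                      # 1 vCPU (illustrative)
        "memory": "4096",                   # 4 GiB (illustrative)
        "containerDefinitions": [
            {
                "name": "model",
                "image": image_uri,         # ECR image URI
                "essential": True,
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "/ecs/ml-model",
                        "awslogs-region": "us-east-1",
                        "awslogs-stream-prefix": "job",
                    },
                },
            }
        ],
    }

# Usage: ecs.register_task_definition(**task_definition("1234.dkr.ecr.us-east-1.amazonaws.com/model:v3"))
```

Each registration creates a new immutable revision, so a misbehaving run can always be traced back to the exact CPU, memory, image, and logging settings it launched with.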