Day 3: Why I’m Building 4 Services Instead of One Big App
Day 3 done.
No code written yet—spent the hour planning HuntKit's architecture.
And honestly? I almost made a huge mistake.
The Temptation: Build It All At Once
My first instinct was to build one FastAPI app that does everything:
- Scrapes jobs
- Analyzes profiles
- Matches candidates
- Finds emails
Ship it all together, deploy, done.
Then I re-read Chapter 1.
What Chapter 1 Actually Taught Me
“Trying to scale everything together is how you end up with a monolith that can’t grow.”
The example that clicked: Web servers and databases scale differently. That’s why we separate them with load balancers and database replication.
For HuntKit, I realized each piece has wildly different needs:
Job Aggregator (scraping 1000+ sites):
- I/O intensive—constantly making HTTP requests
- Needs distributed crawlers across regions
- Rate limiting per domain to avoid bans (rough sketch after this list)
- Biggest scaling challenge
Profile Analyzer (parsing GitHub/LinkedIn):
- CPU intensive—processing repos, analyzing code
- Caching-heavy (same profiles checked repeatedly)
- Predictable load based on user signups
Matching Engine (ranking jobs by fit):
- Read-heavy (90% reads, 10% writes)
- Benefits massively from Redis caching
- Needs fast response times (<100ms)
Outreach Assistant (email finder + drafts):
- API-dependent (Hunter.io, OpenAI)
- Queuing required (async processing)
- Cost-sensitive (API calls = $$)
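Here's that rough sketch of per-domain rate limiting for the Job Aggregator. It's a minimal asyncio version built on big assumptions: aiohttp for the HTTP calls, a flat two-second delay, and everything held in memory, whereas a real distributed crawler would need shared state across workers.

```python
# Minimal sketch of per-domain politeness for the Job Aggregator.
# Assumptions: aiohttp for requests, a flat 2-second delay per domain,
# and in-memory state only (a real distributed crawler needs shared state).
import asyncio
import time
from urllib.parse import urlparse

import aiohttp

DOMAIN_DELAY = 2.0                 # seconds between hits to the same domain
_last_hit: dict[str, float] = {}   # domain -> time of the previous request
_locks: dict[str, asyncio.Lock] = {}

async def polite_get(session: aiohttp.ClientSession, url: str) -> str:
    domain = urlparse(url).netloc
    lock = _locks.setdefault(domain, asyncio.Lock())
    async with lock:  # serialize the "wait your turn" step per domain
        last = _last_hit.get(domain)
        if last is not None:
            wait = DOMAIN_DELAY - (time.monotonic() - last)
            if wait > 0:
                await asyncio.sleep(wait)
        _last_hit[domain] = time.monotonic()
    async with session.get(url) as resp:
        return await resp.text()

async def crawl(urls: list[str]) -> None:
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(polite_get(session, u) for u in urls))
        print(f"fetched {len(pages)} pages")

# asyncio.run(crawl(["https://example.com/careers", "https://example.com/jobs"]))
```

The exact numbers don't matter; the point is that this concern lives only in the aggregator, and the matcher never has to think about it.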
If I built one monolith, I’d have to scale EVERYTHING when job scraping hits limits—even though matching and outreach aren’t stressed.
The Microservices Decision
Breaking into 4 independent services means:
- Scale each based on actual need (crawler needs 10 instances, matcher needs 2)
- Deploy updates without breaking everything
- Test one piece at a time (ship faster, iterate publicly)
- Different tech choices per service (Scrapy for crawling, FastAPI for matching)
This is horizontal scaling applied at the service level: the same idea as adding web servers behind a load balancer, just one level up.
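To show how differently these services will look in code: the matching engine is read-heavy, so the pattern I'm picturing there is cache-aside, check Redis first and only recompute on a miss. A rough sketch, assuming redis-py, JSON-serialized results, and a placeholder compute_matches() that doesn't exist yet:

```python
# Sketch of a cache-aside read path for the matching engine.
# Assumptions (not final): redis-py client, JSON-serialized results,
# and a placeholder compute_matches() standing in for the real ranking.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def compute_matches(user_id: str) -> list[dict]:
    # Placeholder: the real ranking logic doesn't exist yet.
    return [{"job_id": "demo-123", "score": 0.87}]

def get_matches(user_id: str) -> list[dict]:
    cache_key = f"matches:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: fast path
    matches = compute_matches(user_id)            # cache miss: do the expensive ranking
    r.setex(cache_key, 300, json.dumps(matches))  # cache for 5 minutes
    return matches
```

The five-minute TTL is a guess; the real number depends on how often the job index changes.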
Why Start With The Hardest Part?
Service 1: Job Aggregator is my Day 8-20 focus because:
- It’s the riskiest unknown – I’ve never scraped at scale, I don’t know the legal boundaries, and anti-bot measures keep evolving
- It’s the foundation – without jobs, nothing else matters
- It’s where I’ll learn the most – I’ve done GitHub parsing, matching algorithms, LLM integration before. Distributed scraping? New territory.
De-risk the hard stuff early.
My Struggle: Database Decisions
Here’s where I’m stuck: Do I choose databases now or later?
Current thinking:
- Job listings = probably Postgres (relational, ACID for job data integrity; rough table sketch below)
- User profiles/cache = Redis (fast reads for matching)
- Analytics = maybe ClickHouse later for logs/metrics
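For what it's worth (and as the rough table sketch promised above), here's the job-listings shape I'd guess at today, written as a SQLAlchemy model. Every column here is an assumption until I see real scraped data:

```python
# Placeholder guess at a job_listings table, in SQLAlchemy 2.0 style.
# Every column is an assumption until real scraped data arrives (Day 10).
from datetime import datetime

from sqlalchemy import DateTime, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class JobListing(Base):
    __tablename__ = "job_listings"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(255))
    company: Mapped[str] = mapped_column(String(255))
    location: Mapped[str | None] = mapped_column(String(255), nullable=True)
    description: Mapped[str] = mapped_column(Text)
    source_url: Mapped[str] = mapped_column(String(2048), unique=True)  # dedupe key
    scraped_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
```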
But honestly? I don’t know the exact data shape yet. Choosing now feels like guessing.
Is it okay to say “TBD until Day 10 when I see actual scraped data”? Or does that look like I don’t understand databases matter?
Genuinely curious: How do you decide tech stack timing?
Tomorrow’s Plan (Day 4)
Not building yet. Exploring:
- Research Scrapy vs Playwright vs custom async crawlers
- Study 10 company career pages (structure, anti-bot measures)
- Test robots.txt compliance tooling (starting-point sketch after this list)
- Read up on the legal considerations (don’t want to accidentally break any laws)
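One early note on the robots.txt item (the starting-point sketch from that list): Python's standard library already handles the basic allow/deny check. A minimal sketch, with "HuntKitBot" as a placeholder user agent:

```python
# Minimal robots.txt check using only the standard library.
# "HuntKitBot" is a placeholder user agent, not a real registered one.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def can_fetch(url: str, user_agent: str = "HuntKitBot") -> bool:
    parsed = urlparse(url)
    parser = RobotFileParser()
    parser.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    parser.read()  # downloads and parses the site's robots.txt
    return parser.can_fetch(user_agent, url)

print(can_fetch("https://example.com/careers"))
```

robots.txt only covers part of the picture, though; it says nothing about a site's terms of service, which is exactly the legal gray area I still need to read up on.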
Day 5: Pick an approach based on findings
Day 6+: Start coding Service 1
I’m nervous about the legal gray area of scraping. Has anyone dealt with this before?
What I’m Learning
System design isn’t just “pick the right database.” It’s about:
- Understanding different scaling needs
- Building for iteration, not perfection
- De-risking unknowns early
- Separating concerns so failures don’t cascade
Chapter 1 made this click. Now applying it to something real.
Tech Stack (So Far):
- Backend: FastAPI (async = perfect for I/O-heavy scraping)
- Frontend: React for web (Flutter mobile later if this works)
- Services: 4 independent microservices
- Infrastructure: TBD per service needs
Progress: 3/100 days. Still planning, not rushing.
Drop your thoughts below—especially if you’ve built scrapers or microservices before. I’m learning as I go.