Day 3: Why I’m Building 4 Services Instead of One Big App
Day 3 done.
No code written yet—spent the hour planning HuntKit's architecture.
And honestly? I almost made a huge mistake.
The Temptation: Build It All At Once
My first instinct was to build one FastAPI app that does everything:
- Scrapes jobs
- Analyzes profiles
- Matches candidates
- Finds emails
Ship it all together, deploy, done.
Then I re-read Chapter 1.
What Chapter 1 Actually Taught Me
“Trying to scale everything together is how you end up with a monolith that can’t grow.”
The example that clicked: Web servers and databases scale differently. That’s why we separate them with load balancers and database replication.
For HuntKit, I realized each piece has wildly different needs:
Job Aggregator (scraping 1000+ sites):
- I/O intensive—constantly making HTTP requests
- Needs distributed crawlers across regions
- Rate limiting per domain to avoid bans (rough sketch after this list)
- Biggest scaling challenge
Profile Analyzer (parsing GitHub/LinkedIn):
- CPU intensive—processing repos, analyzing code
- Caching-heavy (same profiles checked repeatedly)
- Predictable load based on user signups
Matching Engine (ranking jobs by fit):
- Read-heavy (90% reads, 10% writes)
- Benefits massively from Redis caching
- Needs fast response times (<100ms)
Outreach Assistant (email finder + drafts):
- API-dependent (Hunter.io, OpenAI)
- Queuing required (async processing)
- Cost-sensitive (API calls = $$)
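Here's that rough sketch of per-domain rate limiting for the Job Aggregator. It's a minimal asyncio version built on big assumptions: aiohttp for the HTTP calls, a flat two-second delay, and everything held in memory, whereas a real distributed crawler would need shared state across workers.

```python
# Minimal sketch of per-domain politeness for the Job Aggregator.
# Assumptions: aiohttp for requests, a flat 2-second delay per domain,
# and in-memory state only (a real distributed crawler needs shared state).
import asyncio
import time
from urllib.parse import urlparse

import aiohttp

DOMAIN_DELAY = 2.0                 # seconds between hits to the same domain
_last_hit: dict[str, float] = {}   # domain -> time of the previous request
_locks: dict[str, asyncio.Lock] = {}

async def polite_get(session: aiohttp.ClientSession, url: str) -> str:
    domain = urlparse(url).netloc
    lock = _locks.setdefault(domain, asyncio.Lock())
    async with lock:  # serialize the "wait your turn" step per domain
        last = _last_hit.get(domain)
        if last is not None:
            wait = DOMAIN_DELAY - (time.monotonic() - last)
            if wait > 0:
                await asyncio.sleep(wait)
        _last_hit[domain] = time.monotonic()
    async with session.get(url) as resp:
        return await resp.text()

async def crawl(urls: list[str]) -> None:
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(polite_get(session, u) for u in urls))
        print(f"fetched {len(pages)} pages")

# asyncio.run(crawl(["https://example.com/careers", "https://example.com/jobs"]))
```

The exact numbers don't matter; the point is that this concern lives only in the aggregator, and the matcher never has to think about it.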
If I built one monolith, I’d have to scale EVERYTHING when job scraping hits limits—even though matching and outreach aren’t stressed.
The Microservices Decision
Breaking into 4 independent services means:
- Scale each based on actual need (crawler needs 10 instances, matcher needs 2)
- Deploy updates without breaking everything
- Test one piece at a time (ship faster, iterate publicly)
- Different tech choices per service (Scrapy for crawling, FastAPI for matching)
This is horizontal scaling applied at the service level: the same idea as adding web servers behind a load balancer, just one level up.
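To show how differently these services will look in code: the matching engine is read-heavy, so the pattern I'm picturing there is cache-aside, check Redis first and only recompute on a miss. A rough sketch, assuming redis-py, JSON-serialized results, and a placeholder compute_matches() that doesn't exist yet:

```python
# Sketch of a cache-aside read path for the matching engine.
# Assumptions (not final): redis-py client, JSON-serialized results,
# and a placeholder compute_matches() standing in for the real ranking.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def compute_matches(user_id: str) -> list[dict]:
    # Placeholder: the real ranking logic doesn't exist yet.
    return [{"job_id": "demo-123", "score": 0.87}]

def get_matches(user_id: str) -> list[dict]:
    cache_key = f"matches:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: fast path
    matches = compute_matches(user_id)            # cache miss: do the expensive ranking
    r.setex(cache_key, 300, json.dumps(matches))  # cache for 5 minutes
    return matches
```

The five-minute TTL is a guess; the real number depends on how often the job index changes.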
Why Start With The Hardest Part?
Service 1: Job Aggregator is my Day 8-20 focus because:
- It’s the riskiest unknown – I’ve never scraped at scale, I don’t know the legal boundaries, and anti-bot measures keep evolving
- It’s the foundation – without jobs, nothing else matters
- It’s where I’ll learn the most – I’ve done GitHub parsing, matching algorithms, LLM integration before. Distributed scraping? New territory.
De-risk the hard stuff early.
My Struggle: Database Decisions
Here’s where I’m stuck: Do I choose databases now or later?
Current thinking:
- Job listings = probably Postgres (relational, ACID for job data integrity; rough table sketch below)
- User profiles/cache = Redis (fast reads for matching)
- Analytics = maybe ClickHouse later for logs/metrics
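For what it's worth (and as the rough table sketch promised above), here's the job-listings shape I'd guess at today, written as a SQLAlchemy model. Every column here is an assumption until I see real scraped data:

```python
# Placeholder guess at a job_listings table, in SQLAlchemy 2.0 style.
# Every column is an assumption until real scraped data arrives (Day 10).
from datetime import datetime

from sqlalchemy import DateTime, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class JobListing(Base):
    __tablename__ = "job_listings"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(255))
    company: Mapped[str] = mapped_column(String(255))
    location: Mapped[str | None] = mapped_column(String(255), nullable=True)
    description: Mapped[str] = mapped_column(Text)
    source_url: Mapped[str] = mapped_column(String(2048), unique=True)  # dedupe key
    scraped_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
```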
But honestly? I don’t know the exact data shape yet. Choosing now feels like guessing.
Is it okay to say “TBD until Day 10 when I see actual scraped data”? Or does that look like I don’t understand databases matter?
Genuinely curious: How do you decide tech stack timing?
Tomorrow’s Plan (Day 4)
Not building yet. Exploring:
- Research Scrapy vs Playwright vs custom async crawlers
- Study 10 company career pages (structure, anti-bot measures)
- Test robots.txt compliance tooling (starting-point sketch after this list)
- Read up on the legal considerations (don’t want to accidentally break any laws)
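One early note on the robots.txt item (the starting-point sketch from that list): Python's standard library already handles the basic allow/deny check. A minimal sketch, with "HuntKitBot" as a placeholder user agent:

```python
# Minimal robots.txt check using only the standard library.
# "HuntKitBot" is a placeholder user agent, not a real registered one.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def can_fetch(url: str, user_agent: str = "HuntKitBot") -> bool:
    parsed = urlparse(url)
    parser = RobotFileParser()
    parser.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    parser.read()  # downloads and parses the site's robots.txt
    return parser.can_fetch(user_agent, url)

print(can_fetch("https://example.com/careers"))
```

robots.txt only covers part of the picture, though; it says nothing about a site's terms of service, which is exactly the legal gray area I still need to read up on.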
Day 5: Pick an approach based on findings
Day 6+: Start coding Service 1
I’m nervous about the legal gray area of scraping. Has anyone dealt with this before?
What I’m Learning
System design isn’t just “pick the right database.” It’s about:
- Understanding different scaling needs
- Building for iteration, not perfection
- De-risking unknowns early
- Separating concerns so failures don’t cascade
Chapter 1 made this click. Now applying it to something real.
Tech Stack (So Far):
- Backend: FastAPI (async = perfect for I/O-heavy scraping)
- Frontend: React for web (Flutter mobile later if this works)
- Services: 4 independent microservices
- Infrastructure: TBD per service needs
Progress: 3/100 days. Still planning, not rushing.
Drop your thoughts below—especially if you’ve built scrapers or microservices before. I’m learning as I go.