LinkedIn Sales API scraper

What's this?

A full-stack web app + Chrome extension for automating data scraping through the LinkedIn Sales API. See the live demo at mainq.io. Here are some of the most useful features implemented in the backend:

Connect multiple LinkedIn Sales accounts. The backend will distribute concurrent API calls between them and automatically switch when a Sales account gets rate-limited.
Stores downloaded Account (company) and Lead (person) data in PostgreSQL, so it can be easily exported as JSON or CSV.
Data deduplication.
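
The account-switching behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code from this repo: the `AccountPool` class, its method names, and the 15-minute cooldown are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class AllAccountsRateLimited(Exception):
    """Raised when every connected Sales account is cooling down."""


@dataclass
class AccountPool:
    """Hypothetical round-robin pool of Sales accounts (not the repo's class)."""

    account_ids: List[str]
    cooldown_seconds: float = 900.0  # assumed cooldown, not a LinkedIn value
    _blocked_until: Dict[str, float] = field(default_factory=dict)
    _next_index: int = 0

    def acquire(self, now: Optional[float] = None) -> str:
        """Return the next account that is not currently rate-limited."""
        now = time.monotonic() if now is None else now
        count = len(self.account_ids)
        for offset in range(count):
            index = (self._next_index + offset) % count
            account = self.account_ids[index]
            if self._blocked_until.get(account, 0.0) <= now:
                # Advance the cursor so calls are spread across accounts.
                self._next_index = (index + 1) % count
                return account
        raise AllAccountsRateLimited("no account available right now")

    def mark_rate_limited(self, account: str, now: Optional[float] = None) -> None:
        """Skip this account until its cooldown has elapsed."""
        now = time.monotonic() if now is None else now
        self._blocked_until[account] = now + self.cooldown_seconds
```

A caller would `acquire()` an account before each API call and call `mark_rate_limited()` when the response indicates a rate limit, so that subsequent calls go to the remaining accounts.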

About

I built this with the end goal of monetizing the solution as a subscription-based SaaS. That's not my goal anymore, so I'm open-sourcing it. The LinkedIn Sales API isn't public and there are no docs about it. There was a lot of reverse engineering in this project and I want to make it available to anybody interested.

Note that most of the code in this repo is focused on the web-app component of the task (controlling things like per-user concurrency limits, automatically switching between Sales accounts, and other "product"-oriented features). The LinkedIn Sales API code is mostly in linkedin.py.

Architecture

Looking back, it's quite overengineered. It runs via docker-compose:

                         ┌───────────────┐
                         │     NGINX     │
                         └───────────────┘
                               │   │
                       ┌───────┘   └───────┐
                       ↓                   ↓
              ┌─────────────────┐ ┌─────────────────┐
              │ API (replica 1) │ │ API (replica 2) │
              └─────────────────┘ └─────────────────┘
                       │                   │
                       └─────────┬─────────┘
   ┌──────────────┐              │
   │   postgres   │              ↓
   └──────────────┘      ┌───────────────┐      ┌──────────────┐
                         │ Orchestrator  │      │    redis     │
   ┌──────────────┐      └───────────────┘      └──────────────┘
   │ disk volume  │              │
   └──────────────┘              ↓
                            ┌────────┐
                            │  NATS  │
                            └────────┘
                                 │
                  ┌──────────────┴──────────────┐
                  ↓                             ↓
       ┌────────────────────┐        ┌────────────────────┐
       │ Worker (replica 1) │        │ Worker (replica 2) │
       └────────────────────┘        └────────────────────┘

NGINX acts as the reverse proxy.
The multiple API replicas handle things like authentication but their main job is to create and schedule user-requested scraping jobs.
The orchestrator feeds scraping jobs to workers according to the per-user job-concurrency limits. This lets users create jobs in bulk while the backend feeds them to workers at a controlled rate, which both avoids getting rate-limited by LinkedIn too quickly and protects this backend's workers from overload. Internally, the orchestrator keeps one queue per user, which is consumed asynchronously as jobs complete.
The workers run scraping, export, deduplication, etc., to avoid overloading the main API servers. They are handled by the TaskIQ framework.
NATS is the event broker, used to asynchronously feed jobs into the workers.
All data is stored in Postgres. LinkedIn Sales responses are stored in JSONB columns.
A shared docker volume is used to share large static CSV files between the workers and the API servers.
Redis is used to store TaskIQ results, which can be awaited from the API servers.
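
To make the orchestrator's role more concrete, here is a minimal sketch of a per-user queue with a concurrency limit. It is not the repo's implementation: the `Orchestrator` class, its method names, and the `dispatch` callback are hypothetical. In the real system, dispatching would presumably mean publishing a task to NATS through TaskIQ; here it is just a callable so the example stays self-contained.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class Orchestrator:
    """Hypothetical sketch: one FIFO queue per user plus a running-job cap."""

    def __init__(
        self,
        dispatch: Callable[[str, str], None],
        max_concurrent_per_user: int = 2,  # assumed limit for illustration
    ) -> None:
        self._dispatch = dispatch
        self._limit = max_concurrent_per_user
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)
        self._running: Dict[str, int] = defaultdict(int)

    def submit(self, user_id: str, job_id: str) -> None:
        """Queue a job; it is dispatched right away only if the user has capacity."""
        self._pending[user_id].append(job_id)
        self._drain(user_id)

    def job_completed(self, user_id: str) -> None:
        """Free one slot for the user and dispatch the next queued job, if any."""
        if self._running[user_id] <= 0:
            raise ValueError(f"no running jobs recorded for user {user_id!r}")
        self._running[user_id] -= 1
        self._drain(user_id)

    def _drain(self, user_id: str) -> None:
        # Dispatch queued jobs until the user's concurrency limit is reached.
        queue = self._pending[user_id]
        while queue and self._running[user_id] < self._limit:
            job_id = queue.popleft()
            self._running[user_id] += 1
            self._dispatch(user_id, job_id)
```

With a limit of 2, a user who submits three jobs in bulk gets the first two dispatched immediately, and the third only after one of them reports completion. Other users' queues are unaffected.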
