Scrape Glassdoor with Selenium: Guide & Use Cases

Glassdoor is one of the primary sources for employer reviews, salary disclosures, and job listings. HR teams scrape Glassdoor to benchmark compensation. Recruiters monitor it to track competitor hiring activity. Researchers pull it to analyze what candidates actually say about the companies they have worked for.
If you need to collect that data at scale, web scraping Glassdoor is the practical path. This guide walks through a working Python setup using Selenium, covers what data you can extract, and explains where DIY automation typically hits its limits.
Note: Glassdoor uses JavaScript rendering and anti-bot protection. The code below is a beginner-level starting point; for consistent, large-scale results you will also need proxy rotation and a stealth browser configuration. Always review Glassdoor’s Terms of Service before scraping.
Scrape Glassdoor Reviews Python Method: Know the Basics
About Glassdoor
Glassdoor is a jobs and employer review platform operating across 60+ countries. Unlike most job boards, it combines job listings with employee reviews, salary reports, interview experiences, and CEO approval ratings, all in one place.
Combining standard listing information with insight into how employers are perceived in the market makes Glassdoor data uniquely useful. For HR teams, recruitment agencies, and labor market researchers, that context is especially valuable.
What Data Can You Get by Web Scraping Glassdoor
- Job listings — job title, seniority level, employment type (full-time, contract, remote), and application link
- Job descriptions — full requirement lists, responsibilities, and required skills per listing
- Salary data — posted ranges, base pay, bonuses, and equity where disclosed
- Company details — employer name, industry, size, headquarters, and company profile URL
- Employee reviews — star ratings, pros/cons, role title, employment status, and review date
- CEO and leadership ratings — approval percentages and trend data over time
- Interview data — difficulty ratings, process descriptions, and outcome (offer/no offer)
- Posting metadata — date posted, job ID, and “Easy Apply” availability
Who benefits from job data scraping? Is it possible to collect information from 10+ job boards? Find the answers here —> Job Scraping: Benefits, Challenges & Case Study
How to Scrape Glassdoor with Selenium: Starting Point
Glassdoor loads its content dynamically through JavaScript, so a plain HTTP request returns incomplete HTML — the job cards and review sections render after the initial page load. Selenium drives a real browser that executes JavaScript before parsing, which is why it is the right tool for this target.
Let’s start with the setup.
```shell
pip install selenium pandas
```
Importing Selenium
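Starting from Selenium 4, the driver is initialized through a Service object. Make sure ChromeDriver is installed and matches your Chrome version. Since Selenium 4.6, the bundled Selenium Manager can also locate or download a matching driver automatically, in which case webdriver.Chrome() works without an explicit path.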
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Path to a ChromeDriver binary that matches your installed Chrome version
DRIVER_PATH = '/path/to/chromedriver'
service = Service(executable_path=DRIVER_PATH)
driver = webdriver.Chrome(service=service)
```
Navigating Glassdoor Job Search
Glassdoor structures job search through URL parameters. Pass your keyword and location directly in the URL; this is more reliable than simulating search-box input and makes it easier to iterate across pages.
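```python
from urllib.parse import quote_plus

keyword = "data analyst"
loc_id = 1132348  # locId this example uses for New York; the location is set here, not by name
start = 0  # pagination offset
url = (f"https://www.glassdoor.com/Job/jobs.htm?sc.keyword={quote_plus(keyword)}"
       f"&locT=C&locId={loc_id}&start={start}")
driver.get(url)
```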
The start parameter controls pagination — increment it by 30 to move to the next page of results.
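The pagination rule above can be wrapped in a small helper that produces one search URL per results page. This is a sketch that assumes the sc.keyword, locT, locId, and start parameters from the URL above and a step of 30 per page; the function name search_page_urls is hypothetical, and Glassdoor may change these parameters at any time.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.glassdoor.com/Job/jobs.htm"
PAGE_SIZE = 30  # offset step per results page, as described above


def search_page_urls(keyword, loc_id, pages, loc_type="C"):
    """Return one search URL per results page (hypothetical helper)."""
    urls = []
    for page in range(pages):
        params = {
            "sc.keyword": keyword,  # urlencode encodes spaces as '+'
            "locT": loc_type,
            "locId": loc_id,
            "start": page * PAGE_SIZE,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


for page_url in search_page_urls("data analyst", 1132348, pages=3):
    print(page_url)
```

In the scraper, each URL would be passed to driver.get() in turn, followed by the same wait-and-extract steps as for the first page.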
Waiting for Job Cards to Load
Glassdoor renders job cards asynchronously. Use WebDriverWait to confirm the listings are present before attempting to extract them.
```python
wait = WebDriverWait(driver, 10)  # wait up to 10 seconds for the listings
wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "li[data-test='jobListing']"))
)
job_cards = driver.find_elements(By.CSS_SELECTOR, "li[data-test='jobListing']")
print(f"Found {len(job_cards)} job listings")
```
Extracting Job Card Data
From each job card you can extract the position, company name, location, and salary range where posted:
```python
import pandas as pd
from selenium.common.exceptions import NoSuchElementException

titles, companies, locations, salaries = [], [], [], []

for card in job_cards:
    # Each field is optional on a card, so a missing element becomes ""
    try:
        title = card.find_element(By.CSS_SELECTOR, "[data-test='job-title']").text
    except NoSuchElementException:
        title = ""
    try:
        company = card.find_element(By.CSS_SELECTOR, "[data-test='employer-name']").text
    except NoSuchElementException:
        company = ""
    try:
        location = card.find_element(By.CSS_SELECTOR, "[data-test='emp-location']").text
    except NoSuchElementException:
        location = ""
    try:
        salary = card.find_element(By.CSS_SELECTOR, "[data-test='detailSalary']").text
    except NoSuchElementException:
        salary = ""  # many listings do not post a salary
    titles.append(title)
    companies.append(company)
    locations.append(location)
    salaries.append(salary)
```
Tip: Always inspect the current page structure in your browser’s DevTools before running the scraper. Glassdoor updates its HTML periodically. data-test attributes tend to be more stable than class names — prefer them when both are available.
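One way to act on this tip is to try the stable data-test selector first and fall back to alternatives only when it matches nothing. The sketch below relies on find_elements (plural), which returns an empty list instead of raising when nothing matches, so no try/except is needed. The helper name first_text and the fallback class selector in the usage comment are hypothetical; "css selector" is the string value of By.CSS_SELECTOR.

```python
def first_text(element, selectors, by="css selector"):
    """Return the stripped text of the first selector that matches, else "".

    element can be a Selenium WebDriver or WebElement; both provide
    find_elements(by, value), which returns [] when nothing matches.
    """
    for selector in selectors:
        matches = element.find_elements(by, selector)
        if matches:
            return matches[0].text.strip()
    return ""


# Hypothetical usage inside the job-card loop:
# title = first_text(card, ["[data-test='job-title']", "a.jobTitle"])
```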
Glassdoor Review Scraper Python
To collect employee reviews, navigate to the company’s Reviews page. Glassdoor renders reviews inside containers you can target by data-test attributes.
```python
import time
from selenium.common.exceptions import NoSuchElementException

company_url = "https://www.glassdoor.com/Reviews/your-company-reviews-EXAMPLE.htm"
driver.get(company_url)
time.sleep(2)

# Reuses the wait object created for the job search
wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "li[data-test='review']"))
)
review_cards = driver.find_elements(By.CSS_SELECTOR, "li[data-test='review']")

ratings, headlines, pros_list, cons_list, dates = [], [], [], [], []

for review in review_cards:
    try:
        # The star rating is exposed through aria-label; get_attribute returns None if absent
        rating = review.find_element(
            By.CSS_SELECTOR,
            "[data-test='cell-Stars-value']"
        ).get_attribute("aria-label") or ""
    except NoSuchElementException:
        rating = ""
    try:
        headline = review.find_element(By.CSS_SELECTOR, "[data-test='review-title']").text
    except NoSuchElementException:
        headline = ""
    try:
        pros = review.find_element(By.CSS_SELECTOR, "[data-test='pros']").text
    except NoSuchElementException:
        pros = ""
    try:
        cons = review.find_element(By.CSS_SELECTOR, "[data-test='cons']").text
    except NoSuchElementException:
        cons = ""
    try:
        # The <time> element's datetime attribute holds a machine-readable date
        date = review.find_element(
            By.CSS_SELECTOR,
            "time"
        ).get_attribute("datetime") or ""
    except NoSuchElementException:
        date = ""
    ratings.append(rating)
    headlines.append(headline)
    pros_list.append(pros)
    cons_list.append(cons)
    dates.append(date)
```
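The rating above is read from an aria-label attribute, so it arrives as a string rather than a number. The exact wording of that label is not documented and may change, so this sketch (a hypothetical parse_rating helper) assumes the rating is the first number in the label and returns None when there is no number at all.

```python
import re

_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_rating(label):
    """Extract the first numeric value from an aria-label string, or None."""
    if not label:
        return None
    match = _NUMBER.search(label)
    return float(match.group()) if match else None


# Example: convert the raw labels collected above into numbers
# numeric_ratings = [parse_rating(r) for r in ratings]
```

If the label turns out to put another number first (for example a "5" for the scale), the regex would need adjusting to the observed format.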
Saving Extracted Data
Combine job or review data into a DataFrame and export to CSV:
```python
df = pd.DataFrame({
    'Title': titles,
    'Company': companies,
    'Location': locations,
    'Salary': salaries
})
df.to_csv('glassdoor_jobs.csv', index=False)
print(df)
```
For review data:
```python
df_reviews = pd.DataFrame({
    'Rating': ratings,
    'Headline': headlines,
    'Pros': pros_list,
    'Cons': cons_list,
    'Date': dates
})
df_reviews.to_csv('glassdoor_reviews.csv', index=False)
print(df_reviews)
```
Anti-Bot Handling When You Scrape Glassdoor Reviews Python
Glassdoor uses active bot detection. A standard Selenium session sets the navigator.webdriver property to true, a well-known automation signal that detection systems such as Glassdoor’s can read from the browser fingerprint. Beyond a few dozen requests, expect the scraper to start hitting CAPTCHAs or IP blocks.
A few things that matter in practice:
- Proxy rotation. Residential or ISP proxies significantly reduce the chance of IP-level blocking. Datacenter proxies are detected at a much higher rate on platforms like Glassdoor.
- Request pacing. Fixed intervals between requests are a bot signal. Real users have variable timing. Adding randomized delays between page loads (not uniform time.sleep(2) calls, but jittered pauses) reduces detection triggers.
- Browser fingerprint patching. Libraries such as selenium-stealth and undetected-chromedriver (or playwright-stealth, if you work with Playwright) patch the browser properties that anti-bot systems check, such as navigator.webdriver and other Navigator and WebGL signals. They do not alter TLS fingerprints, which are set by the browser’s network stack.
- Glassdoor’s login wall. Much of the review data is available only after logging in. Automating a logged-in session adds new challenges: session management, cookie handling, and occasionally 2FA.
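The first two points can be combined into a simple request plan: rotate through a proxy pool in round-robin order and draw each pause from a random range instead of sleeping for a fixed interval. This is a minimal sketch; the proxy addresses are placeholders, the 3 to 8 second range is an arbitrary assumption, and plan_requests is a hypothetical name. Wiring each proxy into Chrome (for example through Chrome’s --proxy-server argument) is left out.

```python
import itertools
import random


def plan_requests(urls, proxies, min_delay=3.0, max_delay=8.0, rng=None):
    """Pair each URL with the next proxy and a randomized (jittered) delay.

    Returns a list of (url, proxy, delay_seconds) tuples; the caller is
    expected to sleep for delay_seconds before loading each URL.
    """
    rng = rng or random.Random()
    proxy_cycle = itertools.cycle(proxies)  # round-robin rotation
    return [
        (url, next(proxy_cycle), rng.uniform(min_delay, max_delay))
        for url in urls
    ]


# Placeholder proxies; replace with your provider's residential or ISP endpoints
PROXIES = ["http://proxy-a.example:8000", "http://proxy-b.example:8000"]
plan = plan_requests(["page-1", "page-2", "page-3"], PROXIES, rng=random.Random(42))
for url, proxy, delay in plan:
    print(f"{url} via {proxy} after {delay:.1f}s")
```

Passing a seeded random.Random makes a run reproducible for debugging; in production, leaving rng unset gives fresh timings on every run.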
Committed to building your own scraper? Learn more about the challenges you can face when scraping protected sites: How to Deal With the Most Common Challenges in Web Scraping
Mastering Glassdoor Scraper Selenium: DataOx’s Contribution
You now have a working structural approach to scraping Glassdoor. It covers only the basics: continuous, high-volume projects bring their own problems. Site structure updates break selectors, anti-bot measures evolve, and maintaining a scraper across multiple targets takes ongoing engineering work.
As scraping experts with 10+ years of experience, we handle anti-bot measures at the infrastructure level, manage IP rotation across residential and ISP proxy pools, and configure browser fingerprint patching to keep the scraping process running smoothly. Glassdoor does update its page structure, and when it does, we detect the break and fix it without interrupting delivery.
Our clients receive cleaned, structured datasets in any format the project requires: CSV, JSON, API, or a custom pipeline. We deliver data not only from Glassdoor but also from LinkedIn, other job boards, and custom websites in the recruitment space.
To find out how DataOx can help you scrape Glassdoor data according to your business goals, schedule a free consultation with our expert.

FAQ about Scraping Glassdoor with Selenium
Is web scraping Glassdoor legal?
Publicly visible data, such as job titles, salary ranges, review ratings, and company names, is generally treated differently from private information that requires authentication to access. Terms of Service still apply, and how you collect the data and for what purpose also matter. DataOx responsibly reviews the scope of every Glassdoor scraping project before starting, with zero legal incidents across 10+ years of operation.
Why use Selenium for web scraping Glassdoor?
Glassdoor renders its job cards and review content through JavaScript. A plain HTTP request returns incomplete HTML because the listings load after the initial page response. Selenium drives a real browser that executes JavaScript before parsing, so the content is fully rendered when the scraper reads it. For teams that need Glassdoor data at scale without building browser automation infrastructure, DataOx handles the full extraction pipeline and delivers structured results in any format the project requires, with options for data visualization and direct integration into your workflow.
Can a Glassdoor scraper collect data across multiple companies simultaneously?
Yes. The scraper loops over company profile URLs or search query combinations, builds each request dynamically, and aggregates the results into a single dataset. From a single IP, though, this works only at moderate volumes; at higher volumes the same IP will eventually trigger blocking. DataOx structures multi-company Glassdoor scraping projects with distributed requests and built-in deduplication.
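The loop-and-aggregate pattern can be sketched independently of the browser code. Here scrape_target stands for whatever function extracts record dicts from one company page or search URL (for example, a wrapper around the review loop earlier in this guide); it is a hypothetical parameter, as are the key fields used in the demo. Deduplicating on a key keeps records that appear on overlapping pages or retries from being counted twice.

```python
def scrape_many(urls, scrape_target, key_fields):
    """Run scrape_target on each URL and merge the results without duplicates."""
    seen = set()
    combined = []
    for url in urls:
        for record in scrape_target(url):
            key = tuple(record.get(field) for field in key_fields)
            if key in seen:
                continue  # duplicate from an overlapping page or retry
            seen.add(key)
            combined.append({**record, "source_url": url})
    return combined


def demo_scraper(url):
    # Stand-in for real extraction; returns the same review twice
    review = {"company": url, "date": "2024-05-01", "headline": "Great team"}
    return [review, review]


rows = scrape_many(["acme", "globex"], demo_scraper, ("company", "date", "headline"))
print(len(rows))  # 2
```

The right key depends on the data: a job ID works well for listings, while reviews may need a combination of company, date, and headline.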
How do I keep Glassdoor review data current without re-scraping everything?
Filter by review date on each run and collect only entries posted since the last scrape. Most review pages support sorting by date, which lets you stop paginating as soon as you reach reviews older than your previous run. DataOx sets up scheduled Glassdoor scraping pipelines that run on whatever interval the project requires: daily, weekly, monthly, real-time, or a custom schedule.
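A minimal sketch of that incremental approach, assuming reviews are visited newest first and each one carries the datetime value read from its time element, as in the review code earlier. It also assumes those values parse with datetime.fromisoformat (a trailing "Z" needs Python 3.11 or later) and that last_run uses the same timezone convention. pages stands for any iterable of per-page review lists, and collect_new_reviews is a hypothetical name.

```python
from datetime import datetime


def collect_new_reviews(pages, last_run):
    """Collect reviews newer than last_run from pages sorted newest first.

    Stops at the first review that is not newer than last_run, so when
    pages is a lazy generator, older pages are never requested.
    """
    cutoff = datetime.fromisoformat(last_run)
    fresh = []
    for page in pages:
        for review in page:
            if not review.get("date"):
                continue  # skip reviews whose date could not be read
            if datetime.fromisoformat(review["date"]) <= cutoff:
                return fresh  # everything from here on was already collected
            fresh.append(review)
    return fresh
```

Writing pages as a generator that loads the next results page only when asked means the early return also stops further page loads.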
What happens when the Glassdoor scraper hits a CAPTCHA?
The scraper stalls and stops collecting until the challenge is resolved. With standard Selenium, CAPTCHA triggers are common because the navigator.webdriver flag in the browser fingerprint signals automation to Glassdoor’s detection layer. DataOx handles CAPTCHA avoidance: our setups are configured around each platform’s specific protections to minimize detection triggers.
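For a DIY scraper, a common pattern is to detect a likely challenge page and back off before retrying, rather than reloading immediately. The sketch below uses exponential backoff with full jitter. The marker strings in looks_blocked are hypothetical placeholders (inspect a real challenge page to choose actual ones), and backing off only reduces pressure on the site; it does not solve a CAPTCHA.

```python
import random

# Hypothetical markers (lowercase); replace with strings seen on real challenge pages
BLOCK_MARKERS = ("captcha", "verify you are a human")


def looks_blocked(page_html, markers=BLOCK_MARKERS):
    """Heuristic check for a challenge page in the rendered HTML."""
    text = page_html.lower()
    return any(marker in text for marker in markers)


def backoff_delays(retries, base=5.0, cap=120.0, rng=None):
    """Full-jitter backoff: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** attempt)) for attempt in range(retries)]


# Sketch of use with the driver from earlier (not executed here):
# for delay in backoff_delays(4):
#     if not looks_blocked(driver.page_source):
#         break
#     time.sleep(delay)
#     driver.refresh()
```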