Scrape Glassdoor with Selenium: Guide & Use Cases

Glassdoor is one of the primary sources for employer reviews, salary disclosures, and job listings. HR teams scrape Glassdoor to benchmark compensation. Recruiters monitor it to track competitor hiring activity. Researchers pull it to analyze what candidates actually say about the companies they have worked for.

If you need to collect that data at scale, web scraping Glassdoor is the practical path. This guide walks through a working Python setup using Selenium, covers what data you can extract, and explains where DIY automation typically hits its limits.

Note: Glassdoor uses JavaScript rendering and anti-bot protection. The code below is a beginner-level starting point. For consistent, large-scale results you will need proxy rotation and a stealth browser configuration. Always review Glassdoor’s Terms of Service before scraping.

Scrape Glassdoor Reviews Python Method: Know the Basics

About Glassdoor

Glassdoor is a jobs and employer review platform operating across 60+ countries. Unlike most job boards, it combines job listings with employee reviews, salary reports, interview experiences, and CEO approval ratings, all in one place.

Pairing standard job data with insight into how employers are perceived in the market is what makes Glassdoor data useful. That context is of particular interest to HR teams, recruitment agencies, and labor market researchers.

What Data Can You Get by Web Scraping Glassdoor

  • Job listings — job title, seniority level, employment type (full-time, contract, remote), and application link
  • Job descriptions — full requirement lists, responsibilities, and required skills per listing
  • Salary data — posted ranges, base pay, bonuses, and equity where disclosed
  • Company details — employer name, industry, size, headquarters, and company profile URL
  • Employee reviews — star ratings, pros/cons, role title, employment status, and review date
  • CEO and leadership ratings — approval percentages and trend data over time
  • Interview data — difficulty ratings, process descriptions, and outcome (offer/no offer)
  • Posting metadata — date posted, job ID, and “Easy Apply” availability

Who benefits from job data scraping? Is it possible to collect information from 10+ job boards? Find the answers here —> Job Scraping: Benefits, Challenges & Case Study

How to Scrape Glassdoor with Selenium: Starting Point

Glassdoor loads its content dynamically through JavaScript, so a plain HTTP request returns incomplete HTML — the job cards and review sections render after the initial page load. Selenium drives a real browser that executes JavaScript before parsing, which is why it is the right tool for this target.

Let’s start with the setup.

pip install selenium pandas

Importing Selenium

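Starting with Selenium 4, the driver path is passed through a Service object. Make sure ChromeDriver is installed and matches your Chrome version. From Selenium 4.6 onward, Selenium Manager can also locate or download a matching driver automatically if you call webdriver.Chrome() without a path.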


from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

DRIVER_PATH = '/path/to/chromedriver'

service = Service(executable_path=DRIVER_PATH)
driver = webdriver.Chrome(service=service)

Navigating Glassdoor Job Search

Glassdoor structures job search through URL parameters. Pass your keyword and location directly in the URL. This is more reliable than simulating search box input and easier to iterate across pages.


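from urllib.parse import quote_plus

keyword = "data analyst"
location = "New York"  # label only; the URL identifies the location through locT and locId
loc_id = 1132348  # location ID used in this example; replace it with the ID for your target location
start = 0  # pagination offset

url = (
    "https://www.glassdoor.com/Job/jobs.htm"
    f"?sc.keyword={quote_plus(keyword)}&locT=C&locId={loc_id}&start={start}"
)

driver.get(url)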

The start parameter controls pagination — increment it by 30 to move to the next page of results.
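
As a rough sketch of that pagination step, the helper below builds one search URL per results page. The names build_search_urls, pages, and PAGE_SIZE are illustrative assumptions, not part of any Glassdoor API. The step of 30 and the query parameters are taken from the example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.glassdoor.com/Job/jobs.htm"
PAGE_SIZE = 30  # step for the start offset, as described above


def build_search_urls(keyword, loc_id, pages):
    """Return one search URL per results page (hypothetical helper)."""
    urls = []
    for page in range(pages):
        query = urlencode({
            "sc.keyword": keyword,  # urlencode turns spaces into '+'
            "locT": "C",
            "locId": loc_id,
            "start": page * PAGE_SIZE,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_search_urls("data analyst", 1132348, pages=3)
for url in urls:
    print(url)
```

Each URL can then be passed to driver.get() in turn, with the wait and extraction steps repeated per page.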

Waiting for Job Cards to Load

Glassdoor renders job cards asynchronously. Use WebDriverWait to confirm the listings are present before attempting to extract them.


wait = WebDriverWait(driver, 10)

# Block for up to 10 seconds until at least one job card is in the DOM
wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "li[data-test='jobListing']"))
)

job_cards = driver.find_elements(By.CSS_SELECTOR, "li[data-test='jobListing']")

print(f"Found {len(job_cards)} job listings")

Extracting Job Card Data

From each job card you can extract the position, company name, location, and salary range where posted:

import pandas as pd
from selenium.common.exceptions import NoSuchElementException

titles, companies, locations, salaries = [], [], [], []

for card in job_cards:
    # Any field can be missing on a card, so a failed lookup becomes an empty string
    try:
        title = card.find_element(By.CSS_SELECTOR, "[data-test='job-title']").text
    except NoSuchElementException:
        title = ""

    try:
        company = card.find_element(By.CSS_SELECTOR, "[data-test='employer-name']").text
    except NoSuchElementException:
        company = ""

    try:
        job_location = card.find_element(By.CSS_SELECTOR, "[data-test='emp-location']").text
    except NoSuchElementException:
        job_location = ""

    try:
        salary = card.find_element(By.CSS_SELECTOR, "[data-test='detailSalary']").text
    except NoSuchElementException:
        salary = ""

    # Append inside the loop so every list gets exactly one entry per card
    titles.append(title)
    companies.append(company)
    locations.append(job_location)
    salaries.append(salary)
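
The repeated try/except pattern above could be folded into a single helper. The sketch below is one possible shape, not the only one: safe_text is a hypothetical name, and FakeCard and FakeElement are stand-ins that exist only so the example runs without a browser. With Selenium, you would pass a real job card as parent.

```python
def safe_text(parent, selector, by="css selector"):
    """Return the text of the first matching child, or "" if the lookup fails.

    "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    """
    try:
        return parent.find_element(by, selector).text.strip()
    except Exception:  # broad on purpose: any failed lookup becomes ""
        return ""


# Stand-in objects so the sketch runs without a browser (not Selenium classes)
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeCard:
    def __init__(self, fields):
        self.fields = fields

    def find_element(self, by, selector):
        if selector not in self.fields:
            raise LookupError(selector)
        return FakeElement(self.fields[selector])


card = FakeCard({"[data-test='job-title']": " Data Analyst "})
title = safe_text(card, "[data-test='job-title']")
salary = safe_text(card, "[data-test='detailSalary']")
print(repr(title), repr(salary))
```

Inside the extraction loop, each field then becomes a single line such as title = safe_text(card, "[data-test='job-title']").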

Tip: Always inspect the current page structure in your browser’s DevTools before running the scraper. Glassdoor updates its HTML periodically. data-test attributes tend to be more stable than class names — prefer them when both are available.

Glassdoor Review Scraper Python

To collect employee reviews, navigate to the company’s Reviews page. Glassdoor renders reviews inside containers you can target by data-test attributes.

import time
from selenium.common.exceptions import NoSuchElementException

company_url = "https://www.glassdoor.com/Reviews/your-company-reviews-EXAMPLE.htm"

driver.get(company_url)

# Short pause before polling; the explicit wait below does the actual synchronization
time.sleep(2)

wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "li[data-test='review']"))
)

review_cards = driver.find_elements(By.CSS_SELECTOR, "li[data-test='review']")

ratings, headlines, pros_list, cons_list, dates = [], [], [], [], []

for review in review_cards:
    # The star rating is read from the element's aria-label attribute
    try:
        rating = review.find_element(
            By.CSS_SELECTOR,
            "[data-test='cell-Stars-value']"
        ).get_attribute("aria-label")
    except NoSuchElementException:
        rating = ""

    try:
        headline = review.find_element(
            By.CSS_SELECTOR,
            "[data-test='review-title']"
        ).text
    except NoSuchElementException:
        headline = ""

    try:
        pros = review.find_element(
            By.CSS_SELECTOR,
            "[data-test='pros']"
        ).text
    except NoSuchElementException:
        pros = ""

    try:
        cons = review.find_element(
            By.CSS_SELECTOR,
            "[data-test='cons']"
        ).text
    except NoSuchElementException:
        cons = ""

    # The review date comes from the datetime attribute of the <time> element
    try:
        date = review.find_element(
            By.CSS_SELECTOR,
            "time"
        ).get_attribute("datetime")
    except NoSuchElementException:
        date = ""

    # Append inside the loop so every list gets exactly one entry per review
    ratings.append(rating)
    headlines.append(headline)
    pros_list.append(pros)
    cons_list.append(cons)
    dates.append(date)

Saving Extracted Data

Combine job or review data into a DataFrame and export to CSV:


df = pd.DataFrame({
    'Title': titles,
    'Company': companies,
    'Location': locations,
    'Salary': salaries
})

df.to_csv('glassdoor_jobs.csv', index=False)
print(df)

For review data:

df_reviews = pd.DataFrame({
    'Rating': ratings,
    'Headline': headlines,
    'Pros': pros_list,
    'Cons': cons_list,
    'Date': dates
})

df_reviews.to_csv('glassdoor_reviews.csv', index=False)
print(df_reviews)
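
The Salary column in the jobs file is stored as raw text, so a cleaning step may be useful before analysis. The sketch below assumes salary strings that look like "$85K - $120K (Employer est.)" or "$52,000 - $60,500". That format is an assumption about what the page shows, and parse_salary_range is a hypothetical helper, so check it against your own scraped values.

```python
import re

# A dollar amount with optional thousands separators, decimals, and a K suffix
_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*([Kk])?")


def parse_salary_range(text):
    """Return (low, high) as floats, or (None, None) if no amount is found."""
    amounts = []
    for number, suffix in _AMOUNT.findall(text or ""):
        value = float(number.replace(",", ""))
        if suffix:  # "85K" means 85 thousand
            value *= 1000
        amounts.append(value)
    if not amounts:
        return (None, None)
    return (min(amounts), max(amounts))


low, high = parse_salary_range("$85K - $120K (Employer est.)")
print(low, high)
```

With pandas, the same function could be applied to the whole column, for example through df['Salary'].apply(parse_salary_range).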

Anti-Bot Handling When You Scrape Glassdoor Reviews Python

Glassdoor uses active bot detection. A standard Selenium session exposes the navigator.webdriver property in the browser fingerprint, which Glassdoor’s detection layer reads as a signal of automation. At any volume above a few dozen requests, the scraper is likely to hit CAPTCHAs or IP blocks.
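
One commonly used partial mitigation is to start Chrome with its automation hints switched off. The sketch below is an assumption about what may help, not a guaranteed bypass: build_driver is a hypothetical helper, and detection layers check many signals besides navigator.webdriver.

```python
def build_driver():
    """Start Chrome with common automation hints disabled (hypothetical helper)."""
    # Imported here so this module still loads where Selenium is not installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Ask Blink not to enable the feature that sets navigator.webdriver
    options.add_argument("--disable-blink-features=AutomationControlled")
    # Drop the "enable-automation" switch that ChromeDriver adds by default
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return webdriver.Chrome(options=options)
```

To see what page scripts would read, call driver.execute_script("return navigator.webdriver") on the returned driver.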

A few things that matter in practice:

  • Proxy rotation. Residential or ISP proxies significantly reduce the chance of IP-level blocking. Datacenter proxies are detected at a much higher rate on platforms like Glassdoor.
  • Request pacing. Fixed intervals between requests are a bot signal. Real users have variable timing. Adding randomized delays between page loads (not uniform time.sleep(2) calls, but jittered pauses) reduces detection triggers.
  • Browser fingerprint patching. Libraries like selenium-stealth and undetected-chromedriver (or playwright-stealth, if you switch to Playwright) patch browser properties that anti-bot systems check, such as Navigator, Canvas, and WebGL values. TLS fingerprints are produced below the JavaScript layer, so these patches do not change them.
  • Glassdoor’s login wall. Much of the review data is available only after account login. Automating a logged-in session adds new challenges: session management, cookie handling, and occasional 2FA prompts.
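
The request pacing point can be sketched with the standard library alone. The base and spread values below are arbitrary assumptions, not tuned thresholds, and jittered_delay and polite_pause are hypothetical helper names.

```python
import random
import time


def jittered_delay(base=4.0, spread=2.5):
    """Pick a random pause in seconds between base - spread and base + spread."""
    return random.uniform(base - spread, base + spread)


def polite_pause(base=4.0, spread=2.5):
    """Sleep for a jittered interval and return how long the pause was."""
    delay = jittered_delay(base, spread)
    time.sleep(delay)
    return delay


# Preview a few delays without actually sleeping
sample = [round(jittered_delay(), 2) for _ in range(5)]
print(sample)
```

Calling polite_pause() between driver.get() calls replaces a fixed time.sleep(2) with a pause that varies on every request.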

Committed to building your own scraper? Learn more about the challenges you can face when scraping protected sites —> How to Deal With the Most Common Challenges in Web Scraping

Mastering Glassdoor Scraper Selenium: DataOx’s Contribution

You now have a working structural approach for scraping Glassdoor. It covers only the basics, because continuous, high-volume projects raise further problems: site structure updates break selectors, anti-bot measures evolve, and maintaining a scraper across multiple targets takes ongoing engineering work.

As scraping specialists with 10+ years of experience, we handle anti-bot measures at the infrastructure level, manage IP rotation across residential and ISP proxy pools, and configure browser fingerprint patching to keep the scraping process running smoothly. When Glassdoor updates its page structure, as it periodically does, we detect the break and fix it without interrupting delivery.

Our clients receive cleaned, structured datasets in any format the project requires: CSV, JSON, API, or a custom pipeline. We deliver data not only from Glassdoor but also from LinkedIn, other job boards, and custom websites in the recruitment space.

To find out how DataOx can help you scrape Glassdoor data according to your business goals, schedule a free consultation with our expert.


FAQ about Scrape Glassdoor with Selenium

Is web scraping Glassdoor legal?

Publicly visible data, such as job titles, salary ranges, review ratings, and company names, is generally considered legally accessible, unlike private information that requires authentication to access. How you collect the data and what you use it for also matter, particularly with respect to Glassdoor’s Terms of Service. DataOx reviews the scope of every Glassdoor scraping project before starting and has had zero legal incidents across 10+ years of operation.

Why use Selenium for web scraping Glassdoor?

Glassdoor renders its job cards and review content through JavaScript. A plain HTTP request returns incomplete HTML because the listings load after the initial page response. Selenium drives a real browser that executes JavaScript before parsing, so the content is fully rendered when the scraper reads it. For teams that need Glassdoor data at scale without building browser automation infrastructure, DataOx handles the full extraction pipeline and delivers structured results in any format the project requires, with options for data visualization and integration directly into your workflow.

Can a Glassdoor scraper collect data across multiple companies simultaneously?

Yes, it comes down to a loop across company profile URLs or search query combinations. The scraper iterates through each target, builds the request dynamically, and aggregates results into a single dataset. Unfortunately, this works only at moderate volumes. At higher volumes, sending every request from the same IP will eventually trigger blocking. DataOx structures multi-company Glassdoor scraping projects with distributed requests and deduplication built in.
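
A minimal sketch of that loop, under stated assumptions: scrape_many and fake_scrape are hypothetical names, scrape_one stands for whatever function returns a list of row dictionaries for one URL, and the deduplication key (source, headline, date) is only one possible choice.

```python
def scrape_many(urls, scrape_one):
    """Run scrape_one(url) for each URL and merge the rows, skipping duplicates.

    scrape_one is a hypothetical callable that returns a list of dicts.
    """
    seen = set()
    combined = []
    for url in urls:
        for row in scrape_one(url):
            # Assumed identity of a review: source page, headline, and date
            key = (url, row.get("Headline"), row.get("Date"))
            if key in seen:
                continue
            seen.add(key)
            combined.append({"Source": url, **row})
    return combined


def fake_scrape(url):
    """Stand-in for a real scraping function; returns the same row twice."""
    row = {"Headline": "Good team", "Date": "2024-05-01"}
    return [row, dict(row)]


rows = scrape_many(["company-a", "company-b"], fake_scrape)
print(len(rows))
```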

How do I keep Glassdoor review data current without re-scraping everything?

Filter by review date on each run and collect only entries posted since the last scrape. Most review pages expose sorting by date, which lets you stop paginating once you reach reviews older than your last run. DataOx sets up scheduled Glassdoor scraping pipelines that run on whatever interval the project requires: daily, weekly, monthly, in real time, or on a custom schedule.
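
The date filter could look roughly like the sketch below. It assumes each review is a dictionary with a "Date" value in ISO 8601 form, sorted newest first, and it treats dates without a timezone as UTC. All of these are assumptions to verify against your own data, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_review_date(value):
    """Parse an ISO 8601 string into a UTC-aware datetime, or None on failure."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def collect_new(reviews, last_scrape):
    """Keep reviews newer than last_scrape; stop at the first older one."""
    fresh = []
    for review in reviews:
        posted = parse_review_date(review.get("Date", ""))
        if posted is None:
            continue  # skip rows with a missing or unreadable date
        if posted <= last_scrape:
            break  # newest-first order: everything after this is older
        fresh.append(review)
    return fresh


last_scrape = datetime(2024, 5, 1, tzinfo=timezone.utc)
reviews = [
    {"Date": "2024-05-03T09:00:00"},
    {"Date": "2024-05-02"},
    {"Date": "2024-04-30T12:00:00Z"},
    {"Date": "2024-04-29"},
]
fresh = collect_new(reviews, last_scrape)
print(len(fresh))
```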

What happens when the Glassdoor scraper hits a CAPTCHA?

The scraper stalls and stops collecting until the challenge is resolved. With standard Selenium, CAPTCHA triggers are common because the navigator.webdriver flag in the browser fingerprint signals automation to Glassdoor’s detection layer. DataOx handles CAPTCHA avoidance, and our setups are configured around the platform’s protection mechanisms to minimize detection triggers.
