In my web scraping journey with Selenium, I hit a roadblock when the website I was scraping suddenly started returning a "403 Forbidden" error. It turned out that my VPS's IP address had been banned.
To overcome this issue and ensure seamless web scraping, I explored using proxy servers with Selenium. After some trial and error, I discovered a reliable solution: Selenium Wire.
Selenium Wire is an extension of Selenium that offers additional features, including seamless proxy integration.
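Selenium Wire ships as a separate package from Selenium itself; assuming a pip-based environment, installing it is a one-liner:

```shell
pip install selenium-wire
```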
First, fetch a list of proxies from the Webshare API:

```python
import requests

def get_proxies(api_key, page=1, page_size=25):
    url = (
        "https://proxy.webshare.io/api/v2/proxy/list/"
        f"?mode=direct&page={page}&page_size={page_size}"
    )
    headers = {"Authorization": api_key}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        # Build full proxy URLs with the credentials embedded
        proxies = []
        for proxy in response.json().get('results', []):
            proxies.append(
                f"http://{proxy['username']}:{proxy['password']}"
                f"@{proxy['proxy_address']}:{proxy['port']}"
            )
        return proxies
    print(f"Error fetching proxies: {response.status_code}")
    return []
```
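Each entry in the returned list embeds the credentials directly in the URL. As a quick sanity check of that format (with made-up credentials and a documentation-range IP, since the real values come from your Webshare account):

```python
# Hypothetical record mimicking one entry of the API's 'results' array
sample = {
    'username': 'user123',          # made-up value
    'password': 'secret',           # made-up value
    'proxy_address': '192.0.2.10',  # documentation-range IP
    'port': 8080,
}

# Same f-string pattern as in get_proxies
proxy_url = (
    f"http://{sample['username']}:{sample['password']}"
    f"@{sample['proxy_address']}:{sample['port']}"
)
print(proxy_url)  # → http://user123:secret@192.0.2.10:8080
```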
To use these proxies with Selenium, follow these steps:

- Fetch a random proxy from the list obtained using the `get_proxies` function.
- Configure the Selenium WebDriver to route its traffic through that proxy with Selenium Wire.
Here’s a snippet of the code:
```python
from seleniumwire import webdriver
import random

def configure_selenium_with_proxy(proxy_list, chrome_options):
    # Pick a random proxy and route both HTTP and HTTPS traffic through it.
    # Selenium Wire expects a nested mapping under the 'proxy' key,
    # not a bare proxy URL string.
    proxy = random.choice(proxy_list)
    seleniumwire_options = {
        'proxy': {
            'http': proxy,
            'https': proxy,
        }
    }
    return webdriver.Chrome(options=chrome_options,
                            seleniumwire_options=seleniumwire_options)
```
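The nested shape of `seleniumwire_options` is easy to get wrong, but it can be checked offline without launching a browser. A minimal sketch, using a placeholder proxy URL:

```python
import random

def build_seleniumwire_options(proxy_list):
    # Selenium Wire takes the proxy under a nested 'proxy' mapping,
    # with separate 'http' and 'https' entries (both using the same server here)
    proxy = random.choice(proxy_list)
    return {'proxy': {'http': proxy, 'https': proxy}}

# Placeholder proxy URL; in practice this comes from get_proxies()
options = build_seleniumwire_options(["http://user:pw@192.0.2.10:8080"])
print(options['proxy']['https'])  # → http://user:pw@192.0.2.10:8080
```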
By implementing this approach with Selenium Wire, you can easily bypass IP bans and ensure uninterrupted web scraping.