# Chrome Scraping
## Instructions
How to scrape the web using a Chrome engine.
This video shows how to collect all characters from the Witcher books wiki using the Selenium package.
Script:
```python
import os
import time
import logging
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # ensure the GUI is off
chrome_options.add_argument("--no-sandbox")

# Silence the driver download logs
logging.getLogger('WDM').setLevel(logging.NOTSET)
os.environ['WDM_LOG'] = 'False'

# Create the service and driver
webdriver_service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# Go to the "Characters in the stories" category page
page_url = "https://witcher.fandom.com/wiki/Category:Characters_in_the_stories"
driver.get(page_url)

# # Click on "Accept cookies" if the banner appears
# time.sleep(3)
# driver.find_element(By.XPATH, '//div[text()="ACCEPT"]').click()

# Find the book category links on the page
book_categories = driver.find_elements(by=By.CLASS_NAME, value='category-page__member-link')
books = []
for category in book_categories:
    book_url = category.get_attribute('href')
    book_name = category.text
    books.append({'book_name': book_name, 'url': book_url})

# Visit each book page and collect its character links
character_list = []
for book in books:
    driver.get(book['url'])
    character_elems = driver.find_elements(by=By.CLASS_NAME, value='category-page__member-link')
    for elem in character_elems:
        character_list.append({'book': book['book_name'], 'character': elem.text})

driver.quit()
```
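Once the scrape finishes, `character_list` is a list of `{'book': …, 'character': …}` dicts, which is exactly the shape `pd.DataFrame` accepts. A minimal sketch of tallying characters per book, using hypothetical sample records in place of the live scrape:

```python
import pandas as pd

# Hypothetical sample of scraped records (stand-in for character_list)
character_list = [
    {'book': 'The Last Wish', 'character': 'Geralt of Rivia'},
    {'book': 'The Last Wish', 'character': 'Yennefer of Vengerberg'},
    {'book': 'Sword of Destiny', 'character': 'Geralt of Rivia'},
]

df = pd.DataFrame(character_list)

# Count how many character entries were collected per book
counts = df['book'].value_counts()
print(counts)
```

The resulting DataFrame is the starting point for the network analysis in the linked repo.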
## Overview
🔼Topic:: [[Web Scraping]]
Topic:: [[Network analysis]]
◀Origin:: [[Thu Vu data analytics]]
🔗Link:: [Source](https://www.youtube.com/watch?v=RuNolAh_4bU)
GitHub repo - https://github.com/thu-vu92/the_witcher_network