# Chrome Scraping

## Instructions

How to scrape the internet using a Chrome engine. This video shows how to collect all characters from the Witcher books wiki using the selenium package.

Script:

```python
import pandas as pd
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import matplotlib.pyplot as plt
import os
import logging

# Setup chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # Ensure GUI is off
chrome_options.add_argument("--no-sandbox")

# Silent download of drivers
logging.getLogger('WDM').setLevel(logging.NOTSET)
os.environ['WDM_LOG'] = 'False'

# Create service (webdriver_manager downloads a matching chromedriver)
webdriver_service = Service(ChromeDriverManager().install())

# Create driver
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

try:
    # Go to the characters-in-books category page
    page_url = "https://witcher.fandom.com/wiki/Category:Characters_in_the_stories"
    driver.get(page_url)

    # # Click on Accept cookies
    # time.sleep(3)
    # driver.find_element(By.XPATH, '//div[text()="ACCEPT"]').click()

    # Find books: each category member link is one book
    book_categories = driver.find_elements(by=By.CLASS_NAME, value='category-page__member-link')
    books = [
        {'book_name': category.text, "url": category.get_attribute('href')}
        for category in book_categories
    ]

    # Visit each book page and collect every character linked there
    character_list = []
    for book in books:
        driver.get(book['url'])
        character_elems = driver.find_elements(by=By.CLASS_NAME, value='category-page__member-link')
        for elem in character_elems:
            character_list.append({'book': book['book_name'], 'character': elem.text})
finally:
    # Always release the browser/driver processes, even if scraping fails
    driver.quit()
```

## Overview

🔼Topic:: [[Web Scrapping]]
Topic:: [[Network analysis]]
◀Origin:: [[Thu Vu data analytics]]
🔗Link:: [Source](https://www.youtube.com/watch?v=RuNolAh_4bU)

GitHub repo - 
https://github.com/thu-vu92/the_witcher_network