Web Crawling with Selenium

PROGRAMMING/Python 2021. 5. 11. 14:37

There are two main ways to crawl a web page (and you can also use both together):

Using the BeautifulSoup module (static crawling)
Using the Selenium module (dynamic crawling)

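Static crawling is generally easier, because you get the data by parsing the HTML DOM, which will feel familiar if you have done any web programming. However, if a page is built by JavaScript rather than delivered as plain HTML, BeautifulSoup alone struggles: it only sees the HTML the server sends, not the content that scripts add afterward in the browser. For those pages you can use dynamic crawling with the Selenium module, which drives a real browser. Let's look at how to use it.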
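To make the difference concrete, here is a minimal sketch that uses Python's built-in html.parser as a stand-in for BeautifulSoup (so it runs with no extra packages). The page and its JavaScript are hypothetical: the list item is created by a script, so a static parser that only reads the raw HTML finds nothing, whereas Selenium would see the item after Chrome runs the script.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty <ul>; the <li> item is
# only created when a browser runs the script.
PAGE = """
<html><body>
  <ul id="reviews"></ul>
  <script>
    const li = document.createElement("li");
    li.textContent = "Great burger";
    document.getElementById("reviews").appendChild(li);
  </script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Count <li> tags in raw HTML, which is all a static parser sees."""

    def __init__(self):
        super().__init__()
        self.li_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_count += 1


def count_static_items(html):
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.li_count


print(count_static_items(PAGE))  # 0: the script never runs, so no <li> exists
print(count_static_items("<ul><li>a</li><li>b</li></ul>"))  # 2
```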

Installing Selenium and Setting Up the Environment

1. Install the Selenium module.

pip install selenium

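2. Check your Chrome version (open chrome://version), then download the matching ChromeDriver (WebDriver for Chrome) from the official download page. Selenium controls Chrome through this separate chromedriver executable: it starts the driver process and sends it commands, and the driver in turn controls the browser. So chromedriver must be available (on your PATH or at a path you pass in) whenever the Python script runs.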
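ChromeDriver is versioned to match Chrome, and the driver's major version is expected to match the browser's. A minimal sketch of that check follows; the helper names are hypothetical and the version strings are made-up examples. In practice you would take the browser number from chrome://version and the driver number from the output of `chromedriver --version`.

```python
def major_version(version):
    """Return the major number of a dotted version such as '90.0.4430.93'."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(chrome_version, driver_version):
    """True when ChromeDriver and Chrome share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


print(driver_matches_browser("90.0.4430.93", "90.0.4430.24"))  # True
print(driver_matches_browser("91.0.4472.77", "90.0.4430.24"))  # False
```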

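The first time you use Selenium, you feel (only slightly exaggerating) like a hacker. Chrome opens and closes by itself, and you can automate actions such as typing in keywords. Even I was a little surprised. ＼（〇_ｏ）／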

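When Chrome is controlled through webdriver, it detects this and shows a notice bar saying "Chrome is being controlled by automated test software."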
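Before running the examples, it helps to know which Selenium version pip installed: the find_element_by_* helpers that were common when this post was written were removed in Selenium 4.3.0, so code using them raises AttributeError on newer versions. A minimal sketch of that check is below; both helper names are hypothetical, and only the 4.3.0 cutoff comes from Selenium's release history.

```python
import re
from importlib import metadata


def installed_selenium_version():
    """Return the installed Selenium version string, or None if it is missing."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


def has_legacy_finders(version):
    """True if find_element_by_* still exists (removed in Selenium 4.3.0)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (4, 3)


if __name__ == "__main__":
    version = installed_selenium_version()
    if version is None:
        print("selenium is not installed: run pip install selenium")
    else:
        print(version, "legacy finders available:", has_legacy_finders(version))
```

If the legacy finders are gone, the equivalent call is `find_element(By.NAME, ...)` with `from selenium.webdriver.common.by import By`.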

Selenium Examples

How to Get Shake Shack Takeout

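I was craving Shake Shack so badly that I made an example crawler that searches Naver for '쉑쉑버거 포장' (Shake Shack takeout). When you run it, Chrome opens Naver and runs the search by itself, bringing up takeout reviews for nearby Shake Shack locations as if by magic.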

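Maybe Selenium is a bit flaky: it worked fine at first, then CMD suddenly started printing a USB-related warning over and over (USB: usb_device_handle_win.cc ... Failed to read descriptor ...). It doesn't affect how the crawler works, but it bothered me, so I added some ChromeOptions code. The message comes from Chrome's own console logging on Windows, and excluding the enable-logging switch stops Chrome from printing it.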

This code assumes chromedriver is in the same folder as the source code.

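import time
from selenium import webdriver
from selenium.webdriver.common.by import By

# Hide Chrome's console logging (this suppresses the USB warnings)
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-logging"])
browser = webdriver.Chrome(options=options)

try:
    browser.get('https://www.naver.com/')
    time.sleep(2)  # crude wait for the page to load
    # Selenium 4.3+ removed find_element_by_*; use find_element(By..., ...)
    search_box = browser.find_element(By.NAME, 'query')
    search_box.send_keys('쉑쉑버거 포장')
    search_box.submit()
    time.sleep(2)  # leave time to see the results page
finally:
    browser.quit()  # always close Chrome and the driver process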
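The example waits with fixed time.sleep(2) calls, which are too long on a fast connection and too short on a slow one. Selenium's explicit waits poll until a condition holds or a timeout expires, for example `WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.NAME, "query")))` with `from selenium.webdriver.support.ui import WebDriverWait` and `from selenium.webdriver.support import expected_conditions as EC`. The sketch below reproduces only that polling idea with the standard library so it runs without a browser; wait_until is a hypothetical helper, not Selenium's API.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    A stdlib sketch of the polling idea behind explicit waits; it is not
    Selenium's own implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Usage with a fake condition that becomes true on the third check.
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(page_ready, timeout=2.0, poll_interval=0.01))  # True
```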

Logging In to Naver

What if logging in to Naver got so tedious that you wrote an automatic login program?

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Hide Chrome's console logging (suppresses the USB warnings)
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-logging"])

user = ""    # your Naver ID
passwd = ""  # your Naver password

if __name__ == "__main__":
    # Selenium 4.10+ removed executable_path; pass the driver path via Service
    service = Service("../webdriver/chromedriver.exe")
    browser = webdriver.Chrome(service=service, options=options)
    browser.implicitly_wait(3)  # wait up to 3 s whenever an element is looked up

    # enter the login page
    url_login = "https://nid.naver.com/nidlogin.login"
    browser.get(url_login)
    print("entering the login page...")

    # fill in the ID and password fields
    e = browser.find_element(By.ID, "id")
    e.clear()
    e.send_keys(user)
    time.sleep(0.5)
    f = browser.find_element(By.ID, "pw")
    f.clear()
    f.send_keys(passwd)
    time.sleep(0.5)

    # click the login button (selector as of the time of writing)
    form = browser.find_element(By.CSS_SELECTOR, "input.btn_global[type=submit]")
    form.click()
    print("clicking the login button...")
    time.sleep(5)

    # enter the shopping page
    browser.get("https://order.pay.naver.com/home?tabMenu=SHOPPING")

    # list the product names in the order history
    products = browser.find_elements(By.CSS_SELECTOR, ".p_info span")
    print(products)
    if len(products) > 0:
        for product in products:
            print("-", product.text)
    else:
        print("nothing to list.")

Unfortunately, as you'd expect from a leading IT company, the automated login is blocked. ㅠ
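The login example keeps the ID and password as empty strings in the source, and typing a real password there makes it easy to leak (for example, by pushing the file to GitHub). A minimal alternative, assuming you export the credentials as environment variables named NAVER_ID and NAVER_PW (hypothetical names chosen for this sketch), is to read them at runtime:

```python
import os


def load_credentials(env=None):
    """Return (user, passwd) read from environment variables.

    NAVER_ID and NAVER_PW are hypothetical variable names for this sketch.
    """
    env = os.environ if env is None else env
    user = env.get("NAVER_ID", "")
    passwd = env.get("NAVER_PW", "")
    if not user or not passwd:
        raise RuntimeError("set NAVER_ID and NAVER_PW before running the script")
    return user, passwd
```

In the login script you would then write `user, passwd = load_credentials()` in place of the two empty-string assignments. This does not get around the login block; it only keeps the password out of the source file.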

More Examples

Following a book (listed below), I wrote several BeautifulSoup and Selenium examples on GitHub:

https://github.com/yerimJu/crawling_examples

Reference book: 파이썬을 이용한 머신러닝, 딥러닝 실전 개발 입문 (Wikibooks, 2017)

Reference

Python - Crawling with Selenium: https://velog.io/@swhybein/Python-Selenium%EC%9C%BC%EB%A1%9C-%ED%81%AC%EB%A1%A4%EB%A7%81%ED%95%98%EA%B8%B0

Python Selenium: "A device attached to the system is not functioning. (0x1F)" (the USB warning): https://choihyuunmin.tistory.com/82

Installing Selenium on an Ubuntu server: https://oslinux.tistory.com/33

Troubleshooting

Python crawling error "Only the following pseudo-classes are implemented: nth-of-type." (selectors copied with Chrome DevTools use nth-child, which is not supported here; replacing nth-child with nth-of-type fixes it): https://mjdeeplearning.tistory.com/46

WebDriverException: unknown error: cannot find Chrome binary error with Selenium in Python for older versions of Google Chrome: https://stackoverflow.com/questions/50138615/webdriverexception-unknown-error-cannot-find-chrome-binary-error-with-selenium

Fixing the Chromedriver "DevToolsActivePort file doesn't exist" error: https://synkc.tistory.com/entry/Chromedriver-DevToolsActivePort-file-doesnt-exist-%EC%97%90%EB%9F%AC-%ED%95%B4%EA%B2%B0%EB%B2%95

Python (Selenium): waiting until the page loads (feat. WebDriverWait): https://june98.tistory.com/11

When a specific element suddenly can't be clicked in Selenium (Python): https://wkdtjsgur100.github.io/selenium-does-not-work-to-click/

Fixing "'cp949' codec can't encode character '\xa0'" (a Korean encoding error when crawling with bs4 and requests): https://hugssy.tistory.com/197

License: CC BY-NC-SA (Attribution, NonCommercial, ShareAlike)

Emily's Tistory
