python-mini-projects/projects/Scrape_quotes/quote_scraper.py at master · JavaGle/python-mini-projects

import csv

import requests
from bs4 import BeautifulSoup

# URL of the website to scrape
url = 'http://quotes.toscrape.com'
# Fetch the first page and parse it with html.parser
html = requests.get(url)
bs = BeautifulSoup(html.text, 'html.parser')
try:
    # Open the output file; the with block closes it even if scraping fails.
    # newline='' is what the csv module expects for files it writes to.
    with open('quote_list.csv', 'w', newline='', encoding='utf-8') as csv_file:
        fieldnames = ['quote', 'author', 'tags']
        dictwriter = csv.DictWriter(csv_file, fieldnames=fieldnames)
        # Write the header row
        dictwriter.writeheader()
        # Keep going until a page has no "next" button
        while True:
            # Loop through every quote on the current page
            for quote in bs.find_all('div', {'class': 'quote'}):
                # Extract the quote text, the author and the tags
                text = quote.find('span', {'class': 'text'}).text
                author = quote.find('small', {'class': 'author'}).text
                tags = [tag.text for tag in quote.find_all('a', {'class': 'tag'})]
                # Write the current quote, author and tags as one CSV row
                dictwriter.writerow({'quote': text, 'author': author, 'tags': tags})
            # Find the link to the next page
            next_link = bs.find('li', {'class': 'next'})
            if not next_link:
                break
            # Fetch and parse the next page
            html = requests.get(url + next_link.a.attrs['href'])
            bs = BeautifulSoup(html.text, 'html.parser')
except Exception as error:
    # Report whatever went wrong instead of failing silently
    print('Unknown Error!!!', error)
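
The script writes the tags column as a Python list, which the csv module stores as the list's string form, and it builds the next-page URL by plain string concatenation. The sketch below shows one possible way to handle both without touching the network. The helper names (`next_page_url`, `write_quotes`, `read_quotes`), the `|` tag separator and the sample row are assumptions made for illustration; none of them come from the original script.

```python
import csv
import io
from urllib.parse import urljoin

BASE_URL = 'http://quotes.toscrape.com'
TAG_SEPARATOR = '|'  # assumed separator; any character that never occurs in a tag works


def next_page_url(base_url, href):
    """Resolve the href of a "next" link against the base URL."""
    # urljoin copes with a base URL that has or lacks a trailing slash
    return urljoin(base_url, href)


def write_quotes(rows, out_file):
    """Write quote rows to out_file, storing the tags as one joined string."""
    writer = csv.DictWriter(out_file, fieldnames=['quote', 'author', 'tags'])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, 'tags': TAG_SEPARATOR.join(row['tags'])})


def read_quotes(in_file):
    """Read rows written by write_quotes and split the tags back into a list."""
    rows = []
    for row in csv.DictReader(in_file):
        row['tags'] = row['tags'].split(TAG_SEPARATOR) if row['tags'] else []
        rows.append(row)
    return rows


if __name__ == '__main__':
    # Made-up sample row standing in for one scraped quote
    sample = [{'quote': 'An example quote.',
               'author': 'Example Author',
               'tags': ['life', 'books']}]
    buffer = io.StringIO()  # in-memory stand-in for quote_list.csv
    write_quotes(sample, buffer)
    buffer.seek(0)
    print(read_quotes(buffer))
    print(next_page_url(BASE_URL, '/page/2/'))
```

Joining the tags keeps the CSV readable in a spreadsheet and lets a reader split the column back into a list, whereas the string form of a Python list would need extra parsing to recover.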
