Scraping quotes

The website https://quotes.toscrape.com hosts a collection of quotes and serves as a practice target for learning web scraping.

The following spider is stored in spider_quotes.py. It is an example from the Scrapy documentation.

import scrapy


class QuoteSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

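    def parse(self, response):
        # Each quote on the page sits in its own <div class="quote"> block.
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                # A quote can carry several tags, so collect all matches.
                'tags': quote.css("div.tags > a.tag::text").extract()
            }

        # Follow the "next" link, if any; the last page has none.
        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            # The href is relative, so resolve it against the current page URL.
            yield scrapy.Request(response.urljoin(next_page_url))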

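The crawler can be started with scrapy runspider spider_quotes.py -o quotes_data.csv, which exports the scraped items to quotes_data.csv. Because -o appends to an existing file, running the command more than once duplicates the rows.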
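
The pagination step in parse depends on response.urljoin, which resolves the extracted href against the URL of the current page; Scrapy documents it as a wrapper around urllib.parse.urljoin. The following is a minimal sketch of that resolution using only the standard library. The relative link '/page/2/' is an assumed example of what the li.next selector might return, not a value taken from an actual run.

```python
from urllib.parse import urljoin

# URL of the page being parsed (here, the spider's start URL).
base_url = "http://quotes.toscrape.com/"

# Hypothetical relative href, as the li.next selector might return it.
next_page_url = "/page/2/"

# Resolve the relative link against the base URL to get an absolute URL
# that could be passed to scrapy.Request.
absolute_url = urljoin(base_url, next_page_url)
print(absolute_url)
```

Resolving the link this way keeps the spider independent of whether the site emits relative or absolute hrefs, since urljoin leaves an already absolute URL unchanged.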

import pandas as pd
pd.read_csv('quotes_data.csv')
     text                                               author              tags
0    “The world as we have created it is a process ...  Albert Einstein     change,deep-thoughts,thinking,world
1    “It is our choices, Harry, that show what we t...  J.K. Rowling        abilities,choices
2    “There are only two ways to live your life. On...  Albert Einstein     inspirational,life,live,miracle,miracles
3    “The person, be it gentleman or lady, who has ...  Jane Austen         aliteracy,books,classic,humor
4    “Imperfection is beauty, madness is genius and...  Marilyn Monroe      be-yourself,inspirational
...  ...                                                ...                 ...
196  “You never really understand a person until yo...  Harper Lee          better-life-empathy
197  “You have to write the book that wants to be w...  Madeleine L'Engle   books,children,difficult,grown-ups,write,write...
198  “Never tell the truth to people who are not wo...  Mark Twain          truth
199  “A person's a person, no matter how small.”        Dr. Seuss           inspirational
200  “... a mind needs books as a sword needs a whe...  George R.R. Martin  books,mind

201 rows × 3 columns
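
In the CSV export, the list of tags yielded by the spider ends up as a single comma-joined string, as the tags column shows. The following sketch, using only the standard library, turns that string back into a list. The inline sample_csv is a stand-in for quotes_data.csv and holds two rows copied from the table above, with the first quote abbreviated exactly as pandas displays it; the variable names are illustrative.

```python
import csv
import io

# Stand-in for quotes_data.csv (assumed layout: text, author, tags).
sample_csv = """text,author,tags
"“It is our choices, Harry, that show what we t...",J.K. Rowling,"abilities,choices"
"“A person's a person, no matter how small.”",Dr. Seuss,inspirational
"""

rows = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Split the comma-joined tags into a list; an empty field gives [].
    row["tags"] = row["tags"].split(",") if row["tags"] else []
    rows.append(row)

for row in rows:
    print(row["author"], row["tags"])
```

With the DataFrame loaded by pd.read_csv, a similar result could likely be obtained with the pandas string method str.split(',') on the tags column.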