Django 1.7 + Scrapy

Today I am going to share how to use Scrapy and Django together to crawl a website and store the scraped data in a database through Django models.

First, let us build a Django application using the following commands.

pip install django==1.7
django-admin.py startproject example_project
cd example_project

Inside the example_project directory, we will create a Django app named app:

python manage.py startapp app

Then, we will update models.py inside the app directory like this:

from django.db import models

class ExampleDotCom(models.Model):
    title = models.CharField(max_length=255)
    description = models.CharField(max_length=255)

    def __str__(self):
        return self.title

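Now we shall update admin.py inside the app directory:

from django.contrib import admin
from app.models import ExampleDotCom

# Register the model so that it shows up in the admin site.
admin.site.register(ExampleDotCom)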

Update INSTALLED_APPS in settings.py like this:

INSTALLED_APPS += ('app',)

Now, we will run the following commands in the project directory:

python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser

The last command will prompt you to create a superuser for the application. Now we will run the following command:

python manage.py runserver

It will start the Django development server.

The Django part is complete for now. Let's start the Scrapy project.

In a separate directory, we will create a Scrapy project using the following commands:

pip install Scrapy==1.0.3
scrapy startproject example_bot

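To use the Django application from the Scrapy application, we shall update settings.py inside the example_bot project directory:

import os
import sys

import django

# Placeholder: absolute path of the directory that contains manage.py.
DJANGO_PROJECT_PATH = '/path/to/example_project'
DJANGO_SETTINGS_MODULE = 'example_project.settings'

# Make the Django project importable and point Django at its settings.
sys.path.insert(0, DJANGO_PROJECT_PATH)
os.environ['DJANGO_SETTINGS_MODULE'] = DJANGO_SETTINGS_MODULE
django.setup()  # required since Django 1.7 before models are imported

BOT_NAME = 'example_bot'

SPIDER_MODULES = ['example_bot.spiders']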
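
The two steps that matter in this file are putting the Django project on the import path and naming its settings module through an environment variable. Here is a minimal sketch of those steps as a reusable function. The name prepare_django_env and the example path are hypothetical. The django.setup() call that Django 1.7 expects before models are imported appears only in a comment, so the snippet runs without Django installed.

```python
import os
import sys


def prepare_django_env(project_path, settings_module):
    """Make a Django project importable from another process.

    Both arguments are supplied by the caller; the helper name is
    hypothetical and nothing here is specific to Scrapy.
    """
    project_path = os.path.abspath(project_path)
    # Put the Django project on the import path, but only once.
    if project_path not in sys.path:
        sys.path.insert(0, project_path)
    # Tell Django which settings module to load; an existing value wins.
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', settings_module)
    return project_path


if __name__ == '__main__':
    # Placeholder path: replace with the real location of example_project.
    prepare_django_env('/path/to/example_project', 'example_project.settings')
    # With Django 1.7 installed, the next step would be:
    #     import django; django.setup()
    print(sys.path[0], os.environ['DJANGO_SETTINGS_MODULE'])
```

Using setdefault means a DJANGO_SETTINGS_MODULE that is already exported in the shell is left alone, which may or may not be what you want.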

To connect with the Django model, we need to install DjangoItem like this:

pip install scrapy-djangoitem==1.0.0

Inside the example_bot directory, we will update the items.py file like this:

from scrapy_djangoitem import DjangoItem
from app.models import ExampleDotCom

class ExampleDotComItem(DjangoItem):
    django_model = ExampleDotCom

Now we will create a spider in a new file inside the spiders directory:

from scrapy.spiders import Spider  # BaseSpider is a deprecated alias of Spider
from example_bot.items import ExampleDotComItem

class ExampleSpider(Spider):
    name = "example"
    # Set these to the domain and start URL of the site to crawl.
    allowed_domains = [""]
    start_urls = ['']

    def parse(self, response):
        # Take the first match of each XPath; raises IndexError if nothing matches.
        title = response.xpath('//title/text()').extract()[0]
        description = response.xpath('//body/div/p/text()').extract()[0]
        return ExampleDotComItem(title=title, description=description)
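
One thing to watch in parse is that .extract()[0] raises IndexError when the XPath matches nothing, and that both model fields are capped at 255 characters. A small sketch of a defensive helper follows. first_text is a hypothetical name, not part of Scrapy, and it works on the plain list of strings that .extract() returns.

```python
def first_text(values, default='', max_length=255):
    """Return the first non-blank string in `values`, stripped and clipped.

    `values` is a list of strings, such as the result of
    response.xpath(...).extract(). `default` is returned when nothing
    usable is found. `max_length` mirrors the CharField limit in the model.
    """
    for value in values:
        text = value.strip()
        if text:
            return text[:max_length]
    return default


if __name__ == '__main__':
    print(first_text(['  Example Domain  ']))
    print(repr(first_text([])))
```

Inside the spider this could be used as title = first_text(response.xpath('//title/text()').extract()).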

Now we shall create a pipeline class like this inside pipelines.py:

class ExPipeline(object):
    def process_item(self, item, spider):
        # DjangoItem.save() stores the item as a row of the Django model.
        item.save()
        return item

Now we need to update settings.py of the Scrapy project with this:

ITEM_PIPELINES = {
    'example_bot.pipelines.ExPipeline': 1000,
}
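
The number next to the pipeline path controls ordering: Scrapy passes each item through the enabled pipelines from the lowest value to the highest, and values are conventionally kept in the 0-1000 range. The sketch below only illustrates that ordering with plain Python. CleanupPipeline is a hypothetical second pipeline added for the example, and the sorting is a stand-in for what Scrapy does, not its actual code.

```python
# Hypothetical setting with two pipelines; only ExPipeline exists in this post.
ITEM_PIPELINES = {
    'example_bot.pipelines.ExPipeline': 1000,
    'example_bot.pipelines.CleanupPipeline': 300,  # assumed, for illustration
}


def pipeline_order(pipelines):
    """Return pipeline paths sorted by their order value, lowest first."""
    ordered = sorted(pipelines.items(), key=lambda pair: pair[1])
    return [path for path, _order in ordered]


if __name__ == '__main__':
    for path in pipeline_order(ITEM_PIPELINES):
        print(path)
```

With a value of 1000, ExPipeline would run after any other pipeline added with a lower number, which suits a step that writes to the database.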

Project structure will be like this:

├── django1.7+scrapy
│   ├── example_bot
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── __init__.py
│   │       └── 
│   └── scrapy.cfg
└── example_project
    ├── app
    │   ├── __init__.py
    │   ├── admin.py
    │   ├── models.py
    │   └──
    └── example_project

Now we shall run the spider from inside the Scrapy project directory using the following command:

scrapy crawl example

Now let us return to the running Django application. If everything above was done correctly, an object of the ExampleDotCom class will have been created, and it will be visible at http://localhost:8000/admin/app/exampledotcom/ as in the screenshots below:

[Screenshot: All Objects]

[Screenshot: Single Object]
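
If you would rather check the result without the admin, the stored rows can also be read straight from the database. The sketch below assumes the project still uses Django's default SQLite database and the default table name app_exampledotcom (app label plus lowercased model name). The function name list_scraped_rows and the db.sqlite3 path are assumptions to adjust for your setup.

```python
import sqlite3


def list_scraped_rows(db_path, table='app_exampledotcom'):
    """Return (id, title, description) tuples stored by the pipeline.

    Assumes Django's default table naming for the ExampleDotCom model.
    The table name is interpolated into the SQL, so pass only trusted values.
    """
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(
            'SELECT id, title, description FROM "{}" ORDER BY id'.format(table)
        )
        return cursor.fetchall()
    finally:
        connection.close()


if __name__ == '__main__':
    # Assumed location: db.sqlite3 next to manage.py in example_project.
    try:
        for row in list_scraped_rows('db.sqlite3'):
            print(row)
    except sqlite3.OperationalError as error:
        # Raised when the table does not exist yet, e.g. before migrate.
        print('Could not read rows:', error)
```

Run it from the example_project directory after the crawl has finished; each stored item appears as one tuple.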

That's all. The Django 1.7 + Scrapy project is up and running.

Drawbacks: This has only been implemented using Django 1.7.


I got help and clues from this Stack Overflow link.
