Useful tools

Testing your spider

Thanks to the command line tools provided by Scrapy, we can easily test the spiders as we are developing them:

scrapy crawl WSP -a 'ftp_host=ftp.example.com' -a 'ftp_netrc=/path/to/netrc'

WSP is the name of the spider as defined in the name attribute of the spider.

As you see, you can also pass custom arguments to the spider via the -a flag. These will be directly mapped to the constructor of the spider.

If you want to change the directory where your JSON file will be stored, pass the settings variable JSON_OUTPUT_DIR to any scrapy crawl command:

scrapy crawl WSP -s 'JSON_OUTPUT_DIR=/tmp/' -a 'ftp_host=ftp.example.com' -a 'ftp_netrc=/path/to/netrc'

Writing extraction code with scrapy shell

In order to help you implement the extraction from the XML files, scrapy provides a shell simulating a response:

scrapy shell file:///path/to/sample.xml

You can then run xpath expressions in the shell:

response.selector.xpath(".//abstract").extract()
["...some abstract ..."]