Skip to content

ylrax/data_scrapping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 

Repository files navigation

Data_scrapping

Projects of data web scrapping

npm bundle size


Garfield comic strip scrapper

This python script gets comic strips from a webpage and saves them into the computer as images. There are no economical intentions on this project, all rights reserved to the owners of the content.

  • Requisites:

Python 3 code tested with the libraries (it may work with other versions):

beautifulsoup4==4.6.3
requests==2.20.1
urllib3==1.24.1

Library installation process:

pip install beautifulsoup4==4.6.3 requests==2.20.1 urllib3==1.24.1
  • Tests and checks

To verify the correct performance of the code, there is a small test to be checked. Sometimes the web updates the layout and the code stop working.

To launch the test:

python comic_strips/test/functionality_check.py

If no error are triggered, the the complete code should work, as now is explained.

Execution:

There are two versions of the code:

  1. New daily comic strip grabber:

With the file comic_strip_grabber.py daily Spanish comics strips are gathered.

Usage:

Configure the date parameters inside the file and execute it

python comic_strips/comic_strip_grabber.py

or give it parameters (start dates and end date) on the call with format '%Y/%m/%d':

python comic_strips/comic_strip_grabber.py "2017/01/01" "2017/01/02"
  1. First old english comic strips

With the file classic_emg_first_strips.py older English comics strips are gathered. The launch is the same as the previous one:

Configure the date parameters inside the file and execute it

python comic_strips/classic_emg_first_strips.py

or give it parameters (start dates and end date) on the call with format '%Y/%m/%d':

python comic_strips/classic_emg_first_strips.py "2017/01/01" "2017/01/02"

Output

Comic strips are stored inside the folder extracted_images as PNG images and format: "garf_YYYY_MM_DD.png"

Also all logs and information are stored into a scv file.


Manga chapter scrapper

(WIP) Currently there is a notebook under the folder online_manga that extracts full chapters from online viewers ands saves them as images locally.

About

Projects of data web scrapping

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors