BeautifulSoup is a Python library for working with data from HTML and XML files. It makes searching, navigating and parsing that data easy, and it takes less code.
Let's scrape and download all the One Piece episodes from the kissanime.to website.
Here is how we are going to do it with BeautifulSoup:
1. Get the source (HTML code) of the One Piece page.
2. Find the required episode links using BeautifulSoup. (Without BeautifulSoup we would use regular expressions, which work well, but BeautifulSoup needs less code and is easier to use.)
3. Use webkit and gtk to grab the video source URL, and finally download the episode using this URL.
Get the source code (HTML) from the URL: https://kissanime.to/Anime/One-Piece/
Let's name the file one_piece.html.
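One way to fetch and save the page is sketched below, using only the standard library. The helper name save_page and the browser-like User-Agent header are assumptions made for illustration. The site may refuse scripted requests, in which case saving the page from a browser into one_piece.html works just as well.

```python
import urllib.request

PAGE_URL = "https://kissanime.to/Anime/One-Piece/"


def save_page(url, path="one_piece.html"):
    """Download url and write the raw HTML bytes to path (hypothetical helper)."""
    # Assumption: a browser-like User-Agent, since some sites reject the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        html = response.read()
    with open(path, "wb") as handle:
        handle.write(html)
    return path
```

Calling save_page(PAGE_URL) would write one_piece.html into the current directory.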
Using BeautifulSoup to get the list of links:
BeautifulSoup's find_all method is the most commonly used and simplest way to search the whole file for the required data.
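A minimal sketch of this search follows. The function name episode_links and the sample markup are made up for illustration; the real input would be the contents of one_piece.html, and the actual hrefs on the site may look different.

```python
from bs4 import BeautifulSoup


def episode_links(html):
    """Return the href of every anchor tag whose link contains 'Episode'."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True keeps only the anchor tags that actually carry an href attribute.
    for anchor in soup.find_all("a", href=True):
        if "Episode" in anchor["href"]:
            links.append(anchor["href"])
    return links


# Stand-in markup (hypothetical); the real input is the saved one_piece.html.
sample = """
<a href="/Anime/One-Piece/Episode-001?id=1">Episode 1</a>
<a href="/Anime/One-Piece/Episode-002?id=2">Episode 2</a>
<a href="/AnimeList">Anime list</a>
<a name="top">no href here</a>
"""

for link in episode_links(sample):
    print(link)
```

To run it against the saved page, pass open("one_piece.html").read() instead of the sample string.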
This search prints all the episode URLs. As you can see, BeautifulSoup is very easy to use. It has much more to offer, such as the title attribute, which gives the title of the page, and the prettify method, which returns the HTML in a nicely indented format.
Here we used the find_all method, which collects in a list all the anchor tags that have an href attribute.
After getting all the anchor tags, we keep only those with 'Episode' in the href, as these are the only URLs we need.
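The title attribute and the prettify method mentioned earlier can be tried on a small document. The markup below is a made-up stand-in, not the real page.

```python
from bs4 import BeautifulSoup

# Stand-in markup (hypothetical), kept tiny so the output is easy to read.
html = "<html><head><title>One Piece</title></head><body><a href='/x'>x</a></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.title)         # the whole <title> tag
print(soup.title.string)  # just the text inside the tag
print(soup.prettify())    # the same markup, indented with one tag per line
```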
Using the gtk and webkit libraries, we are going to open a simple webkit browser and load the URL.
We use the webkit browser because the kissanime.to site only exposes the real video link when a real browser requests it. So we use webkit to request the page the way a real browser would, and once the site responds with the real video link we grab it and start downloading.
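A rough sketch of what download.py could look like is given below. It assumes the PyGObject bindings for Gtk 3 and WebKit2 rather than any particular older gtk/webkit modules, and it has not been checked against the site. The looks_like_video heuristic, the helper names and the episode.mp4 output name are all hypothetical. Whether the video request shows up through the resource-load-started signal may depend on the page and on the WebKit build.

```python
import sys
import urllib.request
from urllib.parse import urlparse


def looks_like_video(uri):
    """Hypothetical heuristic: .mp4 paths and 'videoplayback' URLs count as video."""
    path = urlparse(uri).path.lower()
    return path.endswith(".mp4") or "videoplayback" in path


def find_video_url(page_url):
    """Load page_url in a WebKit view and return the first video-like request URL."""
    # Imported here so looks_like_video stays usable without the GUI libraries.
    import gi
    gi.require_version("Gtk", "3.0")
    gi.require_version("WebKit2", "4.0")  # assumption: may be "4.1" on newer systems
    from gi.repository import Gtk, WebKit2

    found = []

    def on_resource(view, resource, request):
        # Called for each resource the page starts loading.
        uri = request.get_uri()
        if not found and looks_like_video(uri):
            found.append(uri)
            Gtk.main_quit()

    view = WebKit2.WebView()
    view.connect("resource-load-started", on_resource)
    window = Gtk.Window()
    window.connect("destroy", Gtk.main_quit)
    window.add(view)
    window.show_all()
    view.load_uri(page_url)
    Gtk.main()  # blocks until a video URL is seen or the window is closed
    return found[0] if found else None


def main(argv):
    video_url = find_video_url(argv[1])
    if video_url is None:
        print("no video URL seen")
        return 1
    # Assumption: a fixed output name; a real script would derive it from the episode.
    urllib.request.urlretrieve(video_url, "episode.mp4")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```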
Run download.py with one of the URLs we got by scraping as its argument, for example: python download.py <episode-url>
You can combine the scraping script and download.py to automate everything, instead of running download.py by hand for every episode URL.
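One possible way to combine them is a small driver that turns each scraped href into a full URL and runs download.py once per episode. The names download_commands and download_all are hypothetical, and the sketch assumes download.py sits in the current directory and that the hrefs are either absolute or relative to the site.

```python
import subprocess
import sys
from urllib.parse import urljoin

BASE_URL = "https://kissanime.to/Anime/One-Piece/"


def download_commands(hrefs, base_url=BASE_URL, script="download.py"):
    """Build one 'python download.py <url>' command per episode link."""
    # urljoin leaves absolute URLs alone and resolves relative ones against base_url.
    return [[sys.executable, script, urljoin(base_url, href)] for href in hrefs]


def download_all(hrefs):
    """Run download.py once per episode, one after another."""
    for command in download_commands(hrefs):
        subprocess.run(command, check=True)
```

Passing the list of scraped episode hrefs to download_all would then fetch the episodes one by one.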