Parsing With Nokogiri
I was reading an article on our blog about extracting all the links from a webpage with Python. Have a look; it's a well-written article. So I decided to write an article about extracting links and image links with Ruby using Nokogiri.
What's Nokogiri?
Nokogiri is a library that acts as an HTML/XML parser. In simple terms: if you want to extract a piece of information from a website to use in your program, what would you do? Suppose we want the information inside <div id="abc">. Either I copy the source of the website into a text file manually and then search through the whole document, or I use a library that helps me extract the information directly from the website. Nokogiri is one such library.
Using Nokogiri
Step 1. Install the 'nokogiri' gem by typing "gem install nokogiri".
Step 2. Include the library in your program by typing "require 'nokogiri'". Also include the 'open-uri' library by typing "require 'open-uri'", since we will be fetching the page from a website.
Step 3. Now we will open the page and, with the help of a CSS selector, look for <a> tags and pick out what's inside 'href'; that is the link. We will do the same for images, reading 'src' from <img> tags.
Have a look at the complete code (Explanation in comments):
Run it by typing "ruby nokogiri.rb" in your terminal.
Thank you!