Difference between revisions of "Web scraping"

From Publication Station
Line 10: Line 10:


We will use a browser extension called WebScraper.io. You can install the extension [https://addons.mozilla.org/en-US/firefox/addon/web-scraper/ for Firefox] or [https://addons.mozilla.org/en-US/firefox/addon/web-scraper/ for Chrome].
To learn how to use WebScraper.io you can watch [https://www.youtube.com/watch?v=n7fob_XVsbY&t=47s the intro video].


'''Step 2:'''
Line 22: Line 24:


'''Step 4:'''
You should now have an extra tab called "Web Scraper Dev". Open this tab.

Revision as of 08:55, 2 September 2022

Web scraping is a technique for extracting data, such as text and images, from websites. In this example we will scrape data from the Gutenberg website.

The purpose of web scraping is to transform web content into usable data for other programs or for analysis. In this case we transform the following website into CSV data, which can be opened in Microsoft Excel or Numbers.

Alice Wonderland Gutenberg.png
Alice Wonderland Scraped.png
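
To make the idea of turning a web page into CSV concrete, here is a minimal Python sketch of the same transformation done by hand. It is not how WebScraper.io works internally, and it is not part of the steps below. The sample HTML, the choice of <h2> headings as the data to collect, and the names HeadingCollector and headings_to_csv are all assumptions made for illustration; the real Gutenberg page is structured differently.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup, used so the example runs without a network
# connection. The real Gutenberg page does not look exactly like this.
SAMPLE_HTML = """
<html><body>
<h2 class="chapter">CHAPTER I. Down the Rabbit-Hole</h2>
<p>Alice was beginning to get very tired.</p>
<h2 class="chapter">CHAPTER II. The Pool of Tears</h2>
<p>Curiouser and curiouser!</p>
</body></html>
"""


class HeadingCollector(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False

    def handle_data(self, data):
        # Only keep text that appears inside an <h2> element.
        if self.in_heading:
            self.headings[-1] += data


def headings_to_csv(html):
    """Return CSV text with one numbered row per <h2> heading in html."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["number", "heading"])
    for number, heading in enumerate(parser.headings, start=1):
        writer.writerow([number, heading.strip()])
    return buffer.getvalue()


if __name__ == "__main__":
    print(headings_to_csv(SAMPLE_HTML))
```

The browser extension used in this tutorial does the equivalent work through a point-and-click interface, so no programming is needed to follow the steps.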

Step 1:

We will use a browser extension called WebScraper.io. You can install the extension for Firefox or for Chrome.

To learn how to use WebScraper.io you can watch the intro video.

Step 2:

After installing the extension you can navigate to Alice’s Adventures in Wonderland on the Gutenberg website.

Step 3:

Right-click anywhere on the page and click "Inspect". This will open the inspector, a tool commonly used for debugging websites.

Alice Wonderland Inspect.png

Step 4:

You should now have an extra tab called "Web Scraper Dev". Open this tab.