Web Scraping 101: Get Started with BeautifulSoup

Photo by Michael Dziedzic on Unsplash

Getting started with web scraping can be easy or difficult.

The trick is to understand, one at a time, each small step that makes up the process.

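First, you need to get familiar with the modules we will use. requests and bs4 (BeautifulSoup) are third-party packages, so install them first, for example with pip install requests beautifulsoup4. urllib.request ships with Python's standard library.

import requests
from bs4 import BeautifulSoup
import urllib.request  # standard library; not used in the short example below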

Photo by HalGatewood.com on Unsplash

Now we need to choose a target page for our scraper.

In this case, the goal is to get the titles of the most popular articles on Google News.

scraper.py

# Send a request
url = "https://news.google.com/topstories"
response = requests.get(url)

# Instantiate the parser
soup = BeautifulSoup(response.content, "html.parser")
print(soup.prettify())

Now we have a soup object that contains all the HTML code related to the target page.
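The request in scraper.py assumes the server answers quickly and successfully. As a minimal sketch of a more defensive version, the function below adds a timeout and a status check before parsing. The name fetch_soup and the User-Agent string are assumptions made up for this example, not part of the original script.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent string, used only for illustration.
HEADERS = {"User-Agent": "web-scraping-101-example/0.1"}


def fetch_soup(url, timeout=10):
    """Download a page and return it as a BeautifulSoup object."""
    # The timeout keeps the script from waiting forever on a slow server.
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

With this helper, the two steps of scraper.py could be written as soup = fetch_soup("https://news.google.com/topstories").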

The last thing to do is to print the titles of the most popular articles. Assuming the page puts each headline in an <h3> element, we can iterate over all the <h3> elements and print their text content.

titles = soup.find_all("h3")

# Print the current top news
for title in titles:
    print(title.text)
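To try the <h3> loop without hitting the network, the same idea can be run against a small hand-written page. This is only a sketch: SAMPLE_HTML and extract_titles are hypothetical names invented for the example, and a real page may structure its headlines differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h3><a href="/a">First headline</a></h3>
  <h3>  Second headline  </h3>
  <h3></h3>
  <p>Not a headline</p>
</body></html>
"""


def extract_titles(soup):
    """Return the non-empty text of every <h3> element, in page order."""
    titles = []
    for tag in soup.find_all("h3"):
        # strip=True removes whitespace around each text fragment.
        text = tag.get_text(strip=True)
        if text:  # skip empty headings
            titles.append(text)
    return titles


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for title in extract_titles(soup):
    print(title)
```

Using get_text(strip=True) instead of .text keeps stray whitespace out of the result, and skipping empty strings avoids printing blank lines for headings that contain no text.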

What to do now?

A useful next step is to save the titles to a file, and then change the code so that it runs and saves them automatically from time to time.
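Saving the titles could look something like the sketch below, which appends each batch to a text file under a timestamp line. The function name save_titles, the default file name titles.txt, and the file layout are all assumptions chosen for illustration.

```python
from datetime import datetime
from pathlib import Path


def save_titles(titles, path="titles.txt"):
    """Append the titles to a text file under a timestamp line."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Mode "a" appends, so earlier runs are kept in the file.
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(f"# {timestamp}\n")
        for title in titles:
            f.write(title + "\n")
        f.write("\n")  # blank line between runs
```

In scraper.py this could be called as save_titles([title.text for title in titles]). To repeat it from time to time, one option is to let an external scheduler such as cron run the script at a fixed interval.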

Thanks for reading.

Web Scraping 101: made with love by Antonio Scapellato

Creative Developer & Entrepreneur.