Web Scraping Using Python

The basic workflow has six steps:

  1. Find the URL that you want to scrape.
  2. Inspect the page.
  3. Find the data you want to extract.
  4. Write the code.
  5. Run the code and extract the data.
  6. Store the data in the required format.

The example in this article uses three Python libraries:

  1. Requests is an HTTP library for Python that aims to make HTTP requests simpler and more human-friendly. It lets you send HTTP/1.1 requests without manually adding query strings to your URLs or form-encoding your POST data.
  2. Pandas is a Python library for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
  3. BeautifulSoup is a Python library for pulling data out of HTML and XML files. It works with your favourite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
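
To make these descriptions concrete, here is a minimal sketch of one feature from each library. The `example.org` URL, the query parameters, and the inline HTML are placeholders invented for illustration, and nothing is sent over the network because the request is only prepared, not executed.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Requests: build the query string from a dict instead of by hand.
# The URL and parameter names are placeholders; prepare() constructs
# the request without sending it.
prepared = requests.Request(
    'GET',
    'https://example.org/search',
    params={'q': 'web scraping', 'page': 2},
).prepare()
print(prepared.url)

# BeautifulSoup: parse markup and search the tree with a CSS selector.
soup = BeautifulSoup(
    '<ul><li class="item">A</li><li class="item">B</li></ul>',
    'html.parser',
)
items = [li.text for li in soup.select('li.item')]
print(items)

# Pandas: hold the extracted values in a table.
df = pd.DataFrame({'Item': items})
print(df.shape)
```
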
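
Putting the steps together, the full script is:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Collect the text of each section heading here.
Text = []

# Download the page and parse it with Python's built-in HTML parser.
res = requests.get('https://en.wikipedia.org/wiki/Web_scraping')
soup = BeautifulSoup(res.text, 'html.parser')

# '.mw-headline' is a CSS class selector; the leading dot is required.
# Confirm the class name in your browser's inspector, since page markup can change.
for i in soup.select('.mw-headline'):
    print(i.text)
    Text.append(i.text)

# Store the headings in a CSV file.
df = pd.DataFrame({'Text': Text})
df.to_csv('Web_Scrap.csv', index=False, encoding='utf-8')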
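
One possible refinement, offered as a sketch rather than as part of the original script, is to separate downloading from extraction so that the selector logic can be checked without a network call. The function names `fetch_html` and `extract_text`, the 10-second timeout, and the sample markup are assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()  # fail early on 4xx/5xx responses
    return res.text


def extract_text(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, 'html.parser')
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


# Hypothetical sample markup, used so the sketch runs offline.
sample_html = """
<h2><span class="mw-headline">History</span></h2>
<h2><span class="mw-headline"> Techniques </span></h2>
<p>Not a heading.</p>
"""

headings = extract_text(sample_html, '.mw-headline')
df = pd.DataFrame({'Text': headings})
print(df)
```

To run this against the live page, pass `fetch_html('https://en.wikipedia.org/wiki/Web_scraping')` as the first argument to `extract_text` and write the resulting DataFrame out with `to_csv` as before.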
