Python tutorial: HTTP requests to import data from the web

Learn how to perform HTTP requests to import web data with Python: https://www.datacamp.com/courses/importing-data-in-python-part-2

Congrats on importing your first web data! To import files from the web, we used the urlretrieve function from urllib.request. Let's now unpack this a bit and, in the process, understand a few things about how the internet works:

URL stands for Uniform (or Universal) Resource Locator, and URLs are really just references to web resources. The vast majority of URLs are web addresses, but they can refer to a few other things, such as file transfer protocols (FTP) and database access. We'll focus here on URLs that are web addresses, that is, the locations of websites.

Such a URL consists of 2 parts:

A protocol identifier, http or https, and
A resource name such as datacamp.com
The combination of protocol identifier and resource name uniquely specifies the web address!

To explain URLs, I have introduced yet another acronym, HTTP, which stands for HyperText Transfer Protocol. Wikipedia provides a great description of HTTP:

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.
Note that HTTPS is a more secure form of HTTP. Each time you go to a website, you are actually sending an HTTP request to a server. This request is known as a GET request, by far the most common type of HTTP request.

We are actually performing a GET request when using the function urlretrieve. The ingenuity of urlretrieve lies in the fact that it not only makes a GET request but also saves the relevant data locally.
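A minimal sketch of this looks as follows; the URL here is a hypothetical example, and you'd swap in whichever file you actually want to fetch:

```python
from urllib.request import urlretrieve

# Hypothetical example URL -- point this at any file you want to download.
url = "https://www.example.com/data/winequality-red.csv"

# urlretrieve performs the GET request and writes the response body to disk.
local_path, headers = urlretrieve(url, "winequality-red.csv")
print(local_path)  # the filename the data was saved under
```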

In the following, you’ll learn how to make more GET requests to store web data in your environment. In particular, you’ll figure out how to get the HTML data from a webpage. HTML stands for Hypertext Markup Language and is the standard markup language for the web.

To extract the HTML from the Wikipedia home page, you:

Import the necessary functions;
Specify the URL;
Package the GET request using the function Request;
Send the request and catch the response using the function urlopen;
This returns an HTTPResponse object, which has an associated read method;
You then apply this read method to the response, which returns the HTML as a string that you store in the variable html;
You remember to be polite and close the response!
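Putting those steps together gives a sketch like the following (note that in Python 3, read returns bytes, so we decode to get a string):

```python
from urllib.request import Request, urlopen

# Specify the URL of the page whose HTML we want.
url = "https://www.wikipedia.org/"

# Package the GET request.
request = Request(url)

# Send the request and catch the response: an HTTPResponse object.
response = urlopen(request)

# Apply the read method; in Python 3 it returns bytes, so decode to a string.
html = response.read().decode("utf-8")

# Be polite and close the response!
response.close()

print(html[:80])
```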
Now we'll do the same thing, but this time using the requests package, which provides a wonderful API for making requests. According to the requests package website:

Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor.
and the following organizations claim to use requests internally:

Her Majesty’s Government, Amazon, Google, Twilio, NPR, Obama for America, Twitter, Sony, and Federal U.S. Institutions that prefer to be unnamed
Moreover,

Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it!
Let's now see requests at work:

Here you

Import the package requests;
Specify the URL;
Package the request, send the request and catch the response with a single function requests.get();
Access the text attribute of the response, which contains the HTML as a string.
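A minimal sketch of the same task with requests (text is an attribute of the Response object, not a method you call):

```python
import requests

# Specify the URL.
url = "https://www.wikipedia.org/"

# Package the request, send it, and catch the response in a single call.
r = requests.get(url)

# The text attribute holds the response body decoded as a string.
html = r.text

print(r.status_code)  # 200 means the GET request succeeded
print(html[:80])
```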
That’s enough out of me for the time being: let’s get you hacking away at pulling down some HTML from the web using GET requests!

GET coding!




Comment List

  • DataCamp
    November 26, 2020

    i am getting no module named requests.

  • DataCamp
    November 26, 2020

    can i fetch data in a kivy app?

  • DataCamp
    November 26, 2020

    Amazing way of presentation and confidence.

  • DataCamp
    November 26, 2020

    What function takes a domain name as a parameter and returns an ip address?

  • DataCamp
    November 26, 2020

    What if you have javascript in your page

  • DataCamp
    November 26, 2020

    can you please tell me how can i play mp4 video from url which is store in json file in python ?

  • DataCamp
    November 26, 2020

    sir plz help how to access .txt file on spyder

  • DataCamp
    November 26, 2020

    Wendy Shields

  • DataCamp
    November 26, 2020

    How to do the same thing when you have a more than one email ids saved in a csv file?

  • DataCamp
    November 26, 2020

    how to confirm that my url has opened after r.get('url') .?

  • DataCamp
    November 26, 2020

    Where the word hyper came from?

  • DataCamp
    November 26, 2020

    Bearded man inspiring me!

  • DataCamp
    November 26, 2020

    out of 2 codes 1 one executed without errors but webpage not opened.
    2nd code ssl connection error

  • DataCamp
    November 26, 2020

    hello everybody

  • DataCamp
    November 26, 2020

    This is really well explained. The slides are exactly enough.

  • DataCamp
    November 26, 2020

    This was an amazingly well-explained, crystal clear, explanation! Thank you so much for putting together this video!! I've been struggling to understand the process, and finally, your concise, step-by-step, well-articulated tutorial really made it easy to understand!! I'm now looking forward to learning more from DataCamp! You are a fantastic teacher!
