onlyforbopi
11/1/2017 - 6:51 AM

INTERNET - HTTP AND RESTful

  1. HTTP AND RESTFUL BASIC THEORY
  2. BASIC MODULES - REQUEST
  3. HTTPRESTFULCOMMANDS.py (COMMAND OVERVIEW)
  4. DOWNLOAD FILES WITH HTTP/RESTFUL USING PYTHON - pythonhttprestdl.py
  5. DOWNLOADING COMPARISON AND OVERVIEW OF METHODS- dlmethodstest.py
  6. SIMPLE HTTP HANDSHAKES AND URLS
  7. BASE64 AUTHENTICATION
# Python 2 - HTTP Basic auth with urllib2
import urllib2

username = 'user1'
password = '123456'

#This should be the base url you wanted to access.
baseurl = 'http://server_name.com'

#Create a password manager
manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, baseurl, username, password)

#Create an authentication handler using the password manager
auth = urllib2.HTTPBasicAuthHandler(manager)

#Create an opener that will replace the default urlopen method on further calls
opener = urllib2.build_opener(auth)
urllib2.install_opener(opener)

#Here you should access the full url you wanted to open
response = urllib2.urlopen(baseurl + "/file")
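
# The same flow in Python 3, where urllib2 became urllib.request
# (a minimal sketch; the server and credentials are placeholders, as above):
import urllib.request

username = 'user1'
password = '123456'
baseurl = 'http://server_name.com'

manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, baseurl, username, password)
auth = urllib.request.HTTPBasicAuthHandler(manager)
opener = urllib.request.build_opener(auth)
urllib.request.install_opener(opener)
response = urllib.request.urlopen(baseurl + "/file")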


#########################################################
#########################################################
# py3


If you can use the requests library, it's insanely easy. I'd highly recommend using it if possible:

import requests

url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))
import requests, base64

usrPass = "userid:password"
# b64encode expects bytes and returns bytes (Python 3), so encode first
# and decode the result back to str for the header value
b64Val = base64.b64encode(usrPass.encode()).decode()
# api_URL and payload must be defined beforehand
r = requests.post(api_URL,
                  headers={"Authorization": "Basic %s" % b64Val},
                  data=payload)
                
                
                
>>> import base64
>>> encoded = base64.b64encode(b'data to be encoded')
>>> encoded
b'ZGF0YSB0byBiZSBlbmNvZGVk'
>>> data = base64.b64decode(encoded)
>>> data
b'data to be encoded'




import base64
base64.b64encode(b'your name')  # b'eW91ciBuYW1l'
base64.b64encode('your name'.encode('ascii'))  # b'eW91ciBuYW1l'
You could use a library called requests.

import requests
r = requests.get("http://example.com/foo/bar")
This is quite easy. Then you can inspect the response like this:

>>> print(r.status_code)
>>> print(r.headers)
>>> print(r.content)


# Using httplib2 (external module)
import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")


# wget in python
# From python cookbook, 2nd edition, page 487
import sys, urllib

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size, c: total file size
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print

# Python 3 version
import sys, urllib.request

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size, c: total file size
    print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), end='')
    sys.stdout.flush()
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print (url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
print()

#!/usr/bin/python
# Python 2 (uses StringIO and urllib.URLopener)
import requests
from StringIO import StringIO
from PIL import Image
import profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')
    
    
#testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds
#testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds
#testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds
#testwget - 3489 function calls in 0.020 seconds
#################################################################################################
#################################################################################################
# Following are the most commonly used calls for downloading files in python:
# Which one to use depends on the Python version (2 vs 3) and on whether we use
# the standard library (urllib) or external modules (wget, requests)

# Using urllib
urllib.urlretrieve('url_to_file', file_name)

# Using urllib2
urllib2.urlopen('url_to_file')

# Using requests library
requests.get(url)

# Using wget library
wget.download('url', file_name)

#Note: urlopen and urlretrieve are found to perform relatively poorly when downloading large
#files (size > 500 MB). requests.get stores the whole file in memory until the download is complete.

#Note2: Most of these methods store the contents in a variable, and then we'll need to iterate
#over it and write it to a file, in text mode or binary mode. See below.
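
# For large files, a streaming sketch with requests avoids holding the whole
# response in memory: pass stream=True and write in chunks (url and file name
# are placeholders):
import requests

url = 'http://example.com/big_file.zip'
r = requests.get(url, stream=True)
with open('big_file.zip', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)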

#################################################################################################
#################################################################################################
# Using urllib2

# Basics
#In Python 2, use urllib2 which comes with the standard library.

# Ex 1.
# import module
import urllib2
# store url response in variable response
response = urllib2.urlopen('http://www.example.com/')
# call method .read() on response and store in html
html = response.read()


# Ex 2 - + Saving to File
#The wb in open('test.mp3','wb') opens a file (and erases any existing file) 
#in binary mode so you can save data with it instead of just text.

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
# Writing to file 
with open('test.mp3','wb') as output:
  output.write(mp3file.read())

# Ex 3 - Saving to file, Parsing Filename, Reading by buffer
import urllib2
import os

url = "http://download.thinkbroadband.com/10MB.zip"
file_name = url.split('/')[-1]                              # parse filename
u = urllib2.urlopen(url)                                    # open url, store in u
f = open(file_name, 'wb')                                   # open filename, for writing
meta = u.info()                                             # meta information, using info()
file_size = int(meta.getheaders("Content-Length")[0])       # get the size of the file
print "Downloading: %s Bytes: %s" % (file_name, file_size)
os.system('cls')                                            # clear screen (Windows-only)
file_size_dl = 0                                            # bytes downloaded so far
block_sz = 8192                                             # block size
while True:
    buffer = u.read(block_sz)                               # read the next block
    if not buffer:
        break

    file_size_dl += len(buffer)                             # update downloaded size
    f.write(buffer)                                         # write block into output file
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()
    
# Ex. 4 - with functions for download
import os
from urllib2 import urlopen, URLError, HTTPError


def dlfile(url):
    # Open the url
    try:
        f = urlopen(url)
        print "downloading " + url

        # Open our local file for writing
        with open(os.path.basename(url), "wb") as local_file:
            local_file.write(f.read())

    #handle errors
    except HTTPError, e:
        print "HTTP Error:", e.code, url
    except URLError, e:
        print "URL Error:", e.reason, url


def main():
    # Iterate over image ranges
    for index in range(150, 151):
        url = ("http://www.archive.org/download/"
               "Cory_Doctorow_Podcast_%d/"
               "Cory_Doctorow_Podcast_%d_64kb_mp3.zip" %
               (index, index))
        dlfile(url)

if __name__ == '__main__':
    main()


# Ex 5 - with progress bar
# (same code as Ex 3 above, minus the os.system('cls') call: urlopen plus a
# buffered read loop that prints a status line)


#################################################################################################
#################################################################################################
# Using urllib

# urllib has the method urlretrieve, which downloads the URL given as the first
# parameter to the pathname given as the second, i.e.:

# Python 2
import urllib
urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

#(for Python 3+ use 'import urllib.request' and urllib.request.urlretrieve)
import urllib.request
urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

# Ex Basic (Python 2 urllib.URLopener - deprecated in Python 3)
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

# Ex.1 (Python 3 + only standard lib)
# Read the html response
# urllib.request.urlopen
import urllib.request
response = urllib.request.urlopen('http://www.example.com/')
html = response.read()

# Retrieve / Download file
# urllib.request.urlretrieve
import urllib.request
urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')

# Ex. 2 (Python 2 + print)
import urllib
sock = urllib.urlopen("http://diveintopython.org/")
htmlSource = sock.read()                            
sock.close()                                        
print htmlSource 

# Ex. 3 (Python 3 + output to file)
#urlretrieve and requests.get look simple, but in reality they are not always enough.
#I have fetched data from a couple of sites, including text and images, and the two
#above probably solve most tasks. But for a more universal solution I suggest
#urlopen. As it is included in the Python 3 standard library, your code can run
#on any machine with Python 3 without pre-installing extra packages.
# Note:
#This answer provides a solution to HTTP 403 Forbidden errors when downloading
#a file over HTTP using Python. I have tried only the requests and urllib
#modules; other modules may provide something better, but this is the one
#I used to solve most of the problems.

import urllib.request

# url, headers, filename and buffer_size must be defined beforehand; a
# browser-like User-Agent in `headers` is typically what avoids the 403
url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)
len_content = url_connect.length    # content length reported by the server

#remember to open file in bytes mode
with open(filename, 'wb') as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer:
            break

        #f.write returns an integer: the size of the written data
        data_wrote = f.write(buffer)

#you could also wrap url_connect in a with statement instead
url_connect.close()

# Ex. 4 (Python 2 + two ways to do read())
# Note :
# urllib2 is more complete than urllib and should likely be the module 
# used if you want to do more complex things, but to make the answers 
#more complete, urllib is a simpler module if you want just the basics:
import urllib
response = urllib.urlopen('http://www.example.com/sound.mp3')
mp3 = response.read()
#This will work fine. Or, if you don't want to deal with the "response" object, you can call read() directly:

import urllib
mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read()


# Ex. 5 (Python 3 +  Using .decode method for binary)
import urllib.request
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary

# Ex. 6 (Python3  + urlretrieve()) - urlretrieve is considered deprecated
# The easiest way to download and save a file is to use the urllib.request.urlretrieve function:
import urllib.request
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)

# Ex. 7 (Python3 + urlretrieve + get filename)
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
#But keep in mind that urlretrieve is considered legacy and might become 
#deprecated (not sure why, though).

# Ex. 8 (Python 3 - Most correct way - using urlopen() + shutil.copyfileobj)
# This streams the response to disk in chunks, so it also works for large files.
import urllib.request
import shutil
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)


# Ex. 9 (Python 3 - Using .write())
# If this seems too complicated, you may want to go simpler and store the whole
# download in a bytes object and then write it to a file. But this works well
# only for small files, since the whole download is held in memory.
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)


# Ex .10 (Combined with unzipping)
#It is possible to extract .gz (and maybe other formats) compressed data on 
#the fly, but such an operation probably requires the HTTP server to support 
#random access to the file.

import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`


#################################################################################################
#################################################################################################
# Using requests

# Ex 1 - Using requests module to print length of file
import requests
 
url = "http://download.thinkbroadband.com/10MB.zip"
r = requests.get(url)
print len(r.content)
#10485760
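
# To check the size without downloading the body, one sketch is to read the
# Content-Length header off a HEAD request instead:
import requests
url = "http://download.thinkbroadband.com/10MB.zip"
r = requests.head(url)
print(r.headers.get('Content-Length'))  # size in bytes, as reported by the server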

# Ex 2 - Using method .text
# Note: Used in case of downloading text files
# response.text returns the output as a (decoded) string object;
# use it when you're downloading a text file, such as an HTML file, etc.
# (The example URL below is actually a PDF, i.e. binary data, so in practice
# you would use .content for it as in Ex 3; .text pairs with text mode 'w'.)

import requests
url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
response = requests.get(url)
with open('/tmp/metadata.pdf', 'w') as f:
    f.write(response.text)
    

# Ex 3 - Using method .content
# response.content returns the output as a bytes object; use it when
# you're downloading a binary file, such as a PDF file, audio file, image, etc.
# In case of downloading non-text files, response.content returns a byte string
import requests
url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
response = requests.get(url)
with open('/tmp/metadata.pdf', 'wb') as f:
    f.write(response.content)
    
# Ex 4 - Using stream=True + iter_content
#chunk_size is the chunk size you want to use. If you set it to 2000,
#requests will download the file 2000 bytes at a time: read a chunk, write it
#into the file, and repeat until finished.
#So this can save your RAM. But I'd prefer response.content instead in
#this case since the file is small. As you can see, the streaming approach is more complex.
import requests
url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
r = requests.get(url, stream=True)

chunk_size = 2000
with open('/tmp/metadata.pdf', 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)


# Ex 5 - Using requests module + implemented progressbar + tqdm
from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    for data in tqdm(response.iter_content()):
        handle.write(data)
        
#################################################################################################
#################################################################################################
# Using wget
import wget
wget.download('url')

#If you have wget installed, you can use parallel_sync.
#pip install parallel_sync

from parallel_sync import wget
urls = ['http://something.png', 'http://something.tar.gz', 'http://something.zip']
wget.download('/tmp', urls)
# or a single file:
wget.download('/tmp', urls[0], filenames='x.zip', extract=True)


##################################################################################
#The Structure of HTTP / RESTful API
#Following are points to remember while developing RESTful API:
#
#URL (Uniform Resource Locator)
#Message Type
#Headers
#Parameters
#Payload
#Authentication
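
#A minimal sketch tying these pieces together with one requests call (the URL,
#payload and credentials below are placeholders):

import json
import requests

url = 'http://api.example.com/v1/employees'            # URL
head = {'Content-type': 'application/json'}            # Headers
params = {'verbose': 'true'}                           # Parameters
payload = json.dumps({'name': 'Saravanan'})            # Payload
ret = requests.post(url,                               # Message type: POST
                    headers=head,
                    params=params,
                    data=payload,
                    auth=('username', 'password'))     # Authentication (Basic)
print(ret.status_code)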

#################################################################################
#1. URL
#The URL is the core of a RESTful API.  Generally a URL refers to a web page, but it can also refer to a service or a resource.
#For example: http://graph.facebook.com/v2.3/{photo-id}
#The above URL is a resource which holds the photo with id photo-id.  In the above syntax, {photo-id} must be replaced with the actual photo id.
#Python code snippet to store a URL in a Python object:

>>> url = 'http://graph.facebook.com/v2.3/123435'


#################################################################################
#2. Message Types

#HTTP supports 
#       GET
#       POST
#       PUT
#       DELETE message types.  

#There are a few more types as well.  Please take a look at reference [1] to understand them in detail.
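
#For instance, the requests library also exposes helpers for some of these other
#verbs (a sketch; url and payload are placeholders):

>>> ret = requests.patch(url, data=payload)   # partial update of a resource
>>> ret = requests.options(url)               # ask which methods the resource supports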

#########
#GET – to retrieve a resource.
#Eg. GET http://graph.facebook.com/v2.3/1234345 will retrieve the photograph stored at that location.

>>> import requests
>>> ret = requests.get(url)
>>> ret.status_code
200

########
#POST – to create a resource (or submit data for processing).
#POST http://graph.facebook.com/v2.3/123435 will create a new photo resource from the
#photograph supplied in the message payload.  POST is not idempotent, so repeating
#the request may create another resource.

>>> import requests
>>> ret = requests.post(url)
>>> ret.status_code
200

########
#PUT – to create or replace a resource at a known URL (idempotent).  PUT http://graph.facebook.com/v2.3/123435
#will store the photograph sent in the message payload at that location, replacing any existing one.

>>> import requests
>>> ret = requests.put(url)
>>> ret.status_code
201

########
#DELETE – to delete a resource – DELETE http://graph.facebook.com/v2.3/123435 will delete the 
#photograph present in that location.

>>> import requests
>>> ret = requests.delete(url)
>>> ret.status_code
200

#################################################################
#3. Headers
#The HTTP header generally contains information used to process the request and response.
#Headers are colon-separated key-value pairs, for example "Accept: text/plain".
#An HTTP request & response may have multiple headers.  Since they are key-value pairs,
#we can use Python's dictionary data type to store them.

#Single Header & Multiple headers:

>>> head = {"Content-type": "application/json"}
>>> head= {"Accept":"applicaiton/json",
        "Content-type": "application/json"}

#Make the API call with the above header:

>>> ret = requests.get(url,headers=head)
>>> ret.status_code
200

#In the above statement, "headers" is the name of the argument, so we are using the
#Python feature of passing named arguments to a function.


##############################################################################
#4 Parameters

#Sometimes we may want to pass values in the URL parameters.  For example, the URL
#http://www.abc.com/abc.php?name=Saravanan&designation=Technical%20Leader expects
#the user to send values for the keywords "name" and "designation".
#The code snippet below helps you accomplish this task.  The "params" argument is used
#to set the value for parameters.

>>> parameters = {'name':'Saravanan',
          'designation':'Technical Leader'}
>>> head = {'Content-Type':'application/json'}
>>> ret = requests.post(url,params=parameters,headers=head)
>>> ret.status_code
200

#################################################################################
#5 Payload

#The payload contains the data to be sent with the request.  Here we will see how to send
#a JSON object in the payload.

empObj = {'name':'Saravanan', 'title':'Architect','Org':'Cisco Systems'}

#We cannot send a Python dictionary directly as the payload.  In the above snippet
#we created empObj, which is a Python dictionary.  It must be converted into a JSON
#string before sending the request.

#The json library in Python helps here .

>>> import json
>>> emp = json.dumps(empObj)

#The json.dumps converts the dictionary object into a JSON object.
#The complete code snippet is below:

>>> import json
>>> import requests
>>>
>>> url='http://graph.facebook.com/v2.3/123123'
>>> head = {'Content-type':'application/json',
             'Accept':'application/json'}
>>> payload = {'name':'Saravanan',
               'Designation':'Architect',
               'Organization':'Cisco Systems'}

>>> payld = json.dumps(payload)
>>> ret = requests.post(url,headers=head,data=payld)
>>> ret.status_code
200
 
##########################################################################################
#6 Authorization

#The "requests" library supports various forms of authentication, including Basic,
#Digest Authentication, OAuth, and others.  The value for authentication can be passed
#using the "auth" parameter of the request methods.

>>> from requests.auth import HTTPBasicAuth
>>> url = 'http://www.hostmachine.com/sem/getInstances'
>>> requests.get(url, auth=HTTPBasicAuth('username','password'))
<Response [200]>

#The "auth" argument can also take a callable, so you can define your own custom
#authentication and pass it via "auth".
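
#A minimal sketch of custom authentication (TokenAuth and the X-Auth-Token header
#are hypothetical; requests.auth.AuthBase is the documented extension hook):

from requests.auth import AuthBase

class TokenAuth(AuthBase):
    def __init__(self, token):
        self.token = token
    def __call__(self, r):
        # requests calls this with the prepared request; attach our header
        r.headers['X-Auth-Token'] = self.token
        return r

# ret = requests.get(url, auth=TokenAuth('my-secret-token'))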

#Summary
#The code snippets above are samples meant to show the simplicity of Python and the
#requests library.  Take a look at the official Requests website to learn advanced
#concepts in RESTful API development.

#####################################################################################
#References
#[1] HTTP Wiki : http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
#[2] History of HTTP by W3 Org : http://www.w3.org/Protocols/History.html
#[3] Requests – http://docs.python-requests.org/en/latest/
#[4] Requests and Urllib2 Comparison : https://gist.github.com/kennethreitz/973705
#[5] Installation of Requests library : http://docs.python-requests.org/en/latest/user/install/#install
#[6] HTTP Headers – http://en.wikipedia.org/wiki/List_of_HTTP_header_fields
 
Requests Library
The Requests Python library is a simple and straightforward library for developing RESTful clients.
Python has a built-in library called urllib2; it is a bit complex and old-style compared to Requests.
After writing a couple of programs using urllib2, I am completely convinced by the statement below,
issued by the developers of Requests.  Also refer to Reference [4] for a comparison of code segments
written using urllib2 and the requests library.

"Python's standard urllib2 module provides most of the HTTP capabilities you need, but the API is
thoroughly broken. It was built for a different time — and a different web. It requires an enormous
amount of work (even method overrides) to perform the simplest of tasks."

Please refer to http://docs.python-requests.org/en/latest/user/install/#install to install the
requests library before proceeding.
BASIC THEORY

HTTP & RESTful APIs
HTTP is a request / response protocol and is similar to client-server model.  
In the internet world, generally the web browser sends the HTTP request and 
the web server responds with HTTP response.  Also it is not necessary that 
the client is always a browser.  The client can be any application which can 
send a HTTP request.
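
A raw sketch of that exchange, using only the standard library (example.com as
a placeholder host): one GET request and the start of its response.

import socket

s = socket.create_connection(('example.com', 80))
s.sendall(b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n')
print(s.recv(200))   # e.g. b'HTTP/1.1 200 OK\r\n...'
s.close()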

We have used many application-level communication protocols over the years:
RPC (Remote Procedure Call), Java RMI (Remote Method Invocation), XML-RPC,
SOAP/HTTP.  In this lineage, the RESTful API is the current application-level
client-server protocol.

RESTful APIs are heavily used on the internet (WWW) and in distributed systems,
and are recommended by Service-Oriented Architecture (SOA) for communication
between loosely coupled distributed components.

The RESTful API, as a usage pattern of HTTP, is the de facto standard for cloud
communications.

The two properties of RESTful APIs that make them suitable for modern internet
and cloud communication are that they are stateless and cache-less.  The protocol
does not enforce any state machine, meaning no particular order of protocol
messages is enforced, and it does not remember any information across requests
or responses.

Each and every request is unique and has no relation to the request that came
before it or the one that comes after.  To understand more about the HTTP
protocol, look at the references listed above.

Basic Python 3 Library = import requests