Log in website with python.

POST
GET
Comparison
Log in with urllib and urllib2

To log in a website, we need submit some parameters to it, and the website will make some responses.

There are two HTTP Request methods: GET and POST.

POST

Submits data to be processed to a specified resource

GET

Requests data from a specified resource

Comparison

	GET	POST
BACK button/Reload	Harmless	Data will be re-submitted (the browser should alert the user that the data are about to be re-submitted)
Bookmarked	Can be bookmarked	Cannot be bookmarked
Encoding type	application/x-www-form-urlencoded	application/x-www-form-urlencoded or multipart/form-data. Use multipart encoding for binary data
Cached	Can be cache	Not cached
History	Parameters remain in browser history	Parameters are not saved in browser history
Restrictions on data length	Yes, when sending data, the GET method adds the data to the URL; and the length of a URL is limited (maximum URL length is 2048 characters)	No restrictions
Restrictions on data type	Only ASCII characters allowed	No restrictions. Binary data is also allowed
Security	GET is less secure compared to POST because data sent is part of the URL. Never use GET when sending passwords or other sensitive information!	POST is a little safer than GET because the parameters are not stored in browser history or in web server logs
Visibility	Data is visible to everyone in the URL	Data is not displayed in the URL

Log in with urllib and urllib2

We need to prepare the parameter in the function of urlopen(url, data, timeout). It is necessary to prepare a dict with the keywords of ‘username’ and ‘password’ (we should check the source code of the website to check the keywords).

import urllib
import urllib2
values = {"username":"1016903103@qq.com","password":"XXXX"}
data = urllib.urlencode(values) 
url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
request = urllib2.Request(url,data)
response = urllib2.urlopen(request)
print response.read()  

To log in the website of p.nju.edu.cn

just modify the url to http://p.nju.edu.cn/portal_io/login is ok.

But it is unsafe to write the username and the password into the script directly. So I suggest to receive the input from the console. And then, the code with modification is shown below.

import sys
import urllib
import urllib2
import getpass

def readInUserName():
	# read in the username
	username = raw_input('username:')
	return username

def readInPassword():
	# read in the password without display
	password = getpass.getpass()
	return password

# deal the case that users set the $username and the $password by the command line
if len(sys.argv) < 3:
	username = readInUserName()
	password = readInPassword()
else:
	username = sys.argv[1]
	password = sys.argv[2]

values = {}
values['username'] = username
values['password'] = password

data = urllib.urlencode(values)

url = "http://p.nju.edu.cn/portal_io/login"

request = urllib2.Request(url,data)
reponse = urllib2.urlopen(request)

context = reponse.read()

msg = context.split(',')

# print the login state
for ele in  msg:
    if ele.split(':')[0] == '"reply_msg"':
        print ele.split(':')[1]
        break

Then, there is a shell script for log in

#! /bin/bash

usr=YOUR_USERNAME
pwd=YOUR_PASSWROD

python pLog.py $usr $pwd

ref http://wiki.jikexueyuan.com/project/python-crawler-guide/the-use-of-urllib-library.html

ref https://www.w3schools.com/tags/ref_httpmethods.asp

It's our wits that make us men.

Work hard! Play hard!

Log in website with python.

POST

GET

Comparison

Log in with urllib and urllib2

It's our wits that make us men.

Work hard! Play hard!

目录

POST

GET

Comparison

Log in with urllib and urllib2