Log in to a website with Python

Posted by hejw005

To log in to a website, we need to submit some parameters to it (typically a username and a password), and the website sends back a response.

The two most commonly used HTTP request methods are GET and POST.

POST

Submits data to be processed to a specified resource

GET

Requests data from a specified resource

Comparison

| | GET | POST |
| --- | --- | --- |
| BACK button/Reload | Harmless | Data will be re-submitted (the browser should alert the user that the data are about to be re-submitted) |
| Bookmarked | Can be bookmarked | Cannot be bookmarked |
| Encoding type | application/x-www-form-urlencoded | application/x-www-form-urlencoded or multipart/form-data. Use multipart encoding for binary data |
| Cached | Can be cached | Not cached |
| History | Parameters remain in browser history | Parameters are not saved in browser history |
| Restrictions on data length | Yes. GET adds the data to the URL, and URL length is limited in practice (a commonly quoted maximum is 2048 characters) | No restrictions |
| Restrictions on data type | Only ASCII characters allowed | No restrictions. Binary data is also allowed |
| Security | GET is less secure compared to POST because data sent is part of the URL. Never use GET when sending passwords or other sensitive information! | POST is a little safer than GET because the parameters are not stored in browser history or in web server logs |
| Visibility | Data is visible to everyone in the URL | Data is not displayed in the URL |
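
To make the difference concrete, here is a minimal sketch that builds one GET request and one POST request from the same parameters without sending anything. It is written for Python 3 (urllib.request and urllib.parse), and the endpoint and credentials are made-up placeholders, not a real login page.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, used only for illustration.
BASE_URL = "http://example.com/login"
params = {"username": "alice", "password": "s3cret"}

# Turn the dict into "key=value&key=value" form encoding.
encoded = urlencode(params)

# GET: the encoded parameters become part of the URL itself.
get_request = Request(BASE_URL + "?" + encoded)

# POST: the URL stays clean and the parameters travel in the request body.
post_request = Request(BASE_URL, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

With GET the password ends up in the URL, which is why it can leak into browser history and server logs. With POST it is carried in the body instead.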

Log in with urllib and urllib2

The login parameters are passed through the data argument of urlopen(url, data, timeout). We put them in a dict whose keys are the form field names, here ‘username’ and ‘password’ (check the HTML source of the login page to find the actual field names), and urllib.urlencode() turns that dict into the encoded string that is sent as the POST body.

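# Python 2 only: in Python 3, urllib2 was merged into urllib.request
import urllib
import urllib2

# form fields expected by the login page
values = {"username": "1016903103@qq.com", "password": "XXXX"}
# encode the dict as "username=...&password=..."
data = urllib.urlencode(values)
url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
# passing data makes this a POST request
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()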
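
The listing above runs on Python 2 only. As a rough sketch of the same idea on Python 3, where urllib2 no longer exists, the request can be built with urllib.request and urllib.parse. The helper names build_login_request and login and the default timeout of 10 seconds are my own choices for illustration, and the form field names are assumed to be the same as above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_login_request(url, username, password):
    """Return a POST request carrying the URL-encoded login form."""
    form = {"username": username, "password": password}
    # In Python 3 the request body must be bytes, not str.
    body = urlencode(form).encode("utf-8")
    return Request(url, data=body)


def login(url, username, password, timeout=10):
    """Send the login form and return the decoded response body."""
    request = build_login_request(url, username, password)
    # The timeout (in seconds) keeps the script from hanging on a dead server.
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call such as print(login(url, "user@example.com", "XXXX")) would then play the role of the last three lines of the Python 2 version.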

To log in to p.nju.edu.cn

Just change the url to http://p.nju.edu.cn/portal_io/login.

However, it is unsafe to hard-code the username and the password in the script, so I suggest reading them from the console instead. The modified code is shown below.

import sys
import urllib
import urllib2
import getpass

def readInUserName():
    # read in the username
    username = raw_input('username:')
    return username

def readInPassword():
    # read in the password without echoing it to the screen
    password = getpass.getpass()
    return password

# use the username and password from the command line if both are given,
# otherwise ask for them on the console
if len(sys.argv) < 3:
    username = readInUserName()
    password = readInPassword()
else:
    username = sys.argv[1]
    password = sys.argv[2]

values = {}
values['username'] = username
values['password'] = password

data = urllib.urlencode(values)

url = "http://p.nju.edu.cn/portal_io/login"

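request = urllib2.Request(url, data)
response = urllib2.urlopen(request)

context = response.read()

# the reply is a comma-separated list of "key":value pairs
msg = context.split(',')

# print the login state
for ele in msg:
    # split on the first ':' only, so a message containing ':' stays intact
    pair = ele.split(':', 1)
    if pair[0] == '"reply_msg"':
        print pair[1]
        break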
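
Splitting the reply on commas and colons works only as long as the message itself contains no comma, and it leaves the surrounding quotes in the output. If the portal's reply is a JSON object, which the "reply_msg" key suggests but which I have not verified, the json module gives a more robust way to read it. The sketch below uses Python 3 syntax; the helper name extract_reply_msg and the sample body are hypothetical.

```python
import json


def extract_reply_msg(body):
    """Return the "reply_msg" field of a JSON response body, or None."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("reply_msg")


# Hypothetical response body; the real portal may return different fields.
sample = '{"reply_code": 1, "reply_msg": "login success: welcome, user"}'
print(extract_reply_msg(sample))
```

Returning None for anything that is not a JSON object lets the caller decide how to report an unexpected reply instead of crashing on it.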

Finally, here is a shell script that runs the login:

#!/bin/bash

usr=YOUR_USERNAME
pwd=YOUR_PASSWORD

# quote the variables so that special characters in the password survive
python pLog.py "$usr" "$pwd"

ref http://wiki.jikexueyuan.com/project/python-crawler-guide/the-use-of-urllib-library.html

ref https://www.w3schools.com/tags/ref_httpmethods.asp