Python code for accessing web pages (accessing HTTPS with Python)

http://www.itjxue.com  2023-04-03 16:17  Source: unknown

I want to write a Python script that logs in to a web page and then performs a series of operations on it. How should this be implemented?

The concrete steps for writing such a Python script:

1. First, open Python and create a new .py file.

2. Next, `import os`; since file-system paths are involved, import the os module at the top.

3. Then double-click the HTML file to open it, and you can see the page you have written.

4. Finally, add `html.close()`; this line is needed to close the file handle, otherwise it will keep holding memory. With that, the simple page-handling script in Python is complete.
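The steps above only open and close a local file; the actual question, logging in and then operating on pages, is usually done by submitting the login form over HTTP and keeping the session cookie for later requests. A minimal standard-library sketch, assuming a hypothetical login URL and form field names (substitute the ones your target site really uses):

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical login endpoint and form field names -- replace with the real ones.
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Encode the login form as application/x-www-form-urlencoded bytes
data = urllib.parse.urlencode({'username': 'user', 'password': 'pass'}).encode('utf-8')
req = urllib.request.Request('http://example.com/login', data=data)
print(req.get_method())  # supplying data makes this a POST

# opener.open(req) would submit the form; the cookie jar then carries the
# login cookie into later opener.open('http://example.com/profile') calls.
```

The third-party requests library wraps this same flow more conveniently in `requests.Session()`.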

Accessing a website with multiple threads in Python

#python2
#coding=utf-8
import os, re, requests, sys, time, threading

reload(sys)
sys.setdefaultencoding('utf-8')

class Archives(object):

    def __init__(self, url):
        self.url = url

    def save_html(self, text):
        fn = '{}_{}'.format(int(time.time()), self.url.split('/')[-1])
        dirname = 'htmls'
        if not os.path.exists(dirname):
            os.mkdir(dirname)
        with open(os.path.join(dirname, fn), 'w') as f:
            f.write(text)

    def get_htmls(self):
        try:
            r = requests.get(self.url)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            print 'get html from', self.url
            self.save_html(r.text)
        except Exception, e:
            print 'fetch failed', e

    def main(self):
        # pass the method itself, not its call result, otherwise the fetch
        # runs in the main thread before the Thread is even created
        thread = threading.Thread(target=self.get_htmls)
        thread.start()
        thread.join()

if __name__ == '__main__':
    start = time.time()
    # default to urls.txt when no file name is given on the command line
    fn = sys.argv[1] if len(sys.argv) > 1 else 'urls.txt'
    with open(fn) as f:
        s = f.readlines()
    for url in set(s):
        a = Archives(url.strip())
        a.main()
    end = time.time()
    print end - start
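Note that main() above starts a single thread and immediately joins it, so the URLs are still fetched one after another. For real concurrency in Python 3, a thread pool is the usual approach; a sketch, where fetch() is a stand-in for get_htmls above:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # stand-in for Archives.get_htmls: real code would requests.get(url)
    # and save the body; here we just return the URL length
    return len(url)

urls = ['http://example.com/a', 'http://example.com/b']
with ThreadPoolExecutor(max_workers=4) as pool:
    # map runs fetch() on worker threads and preserves input order
    results = list(pool.map(fetch, urls))
print(results)
```

The `with` block waits for all workers to finish, replacing the manual start()/join() bookkeeping.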

Looking for Python code to fetch a web page

In Python 3.x, use the urllib.request module to fetch a page's source. The urllib.request.urlopen function opens the page and returns a stream; read() reads out the raw bytes; decode() then decodes those bytes using the page's encoding (which you can find in the page source, e.g. <meta http-equiv="content-type" content="text/html;charset=gbk" /> means gbk, as in the example below). The result is the page's source code.

For example, to fetch this page's code:

import urllib.request

# the target URL was lost from the original snippet; fill it in
html = urllib.request.urlopen('...').read().decode('gbk')  # decode using the page's declared encoding
print(html)

The documentation for the urllib.request.urlopen function follows:

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Open the URL url, which can be either a string or a Request object.

data must be a bytes object specifying additional data to be sent to the server, or None if no such data is needed. data may also be an iterable object and in that case Content-Length value must be specified in the headers. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided.

data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. It should be encoded to bytes before being used as the data parameter. The charset parameter in the Content-Type header may be used to specify the encoding. If the charset parameter is not sent with the Content-Type header, the server following the HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1 encoding. It is advisable to use the charset parameter with the encoding used in the Content-Type header with the Request.

The urllib.request module uses HTTP/1.1 and includes a Connection:close header in its HTTP requests.

The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS and FTP connections.

If context is specified, it must be a ssl.SSLContext instance describing the various SSL options. See HTTPSConnection for more details.

The optional cafile and capath parameters specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations().

The cadefault parameter is ignored.

For http and https urls, this function returns a http.client.HTTPResponse object which has the HTTPResponse Objects methods.

For ftp, file, and data urls and requests explicitly handled by legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object which can work as a context manager and has methods such as:

geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed

info() — return the meta-information of the page, such as headers, in the form of an email.message_from_string() instance (see Quick Reference to HTTP Headers)

getcode() – return the HTTP status code of the response.

Raises URLError on errors.

Note that None may be returned if no handler handles the request (though the default installed global OpenerDirector uses UnknownHandler to ensure this never happens).

In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is default installed and makes sure the requests are handled through the proxy.

The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen. Proxy handling, which was done by passing a dictionary parameter to urllib.urlopen, can be obtained by using ProxyHandler objects.

Changed in version 3.2: cafile and capath were added.

Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if ssl.HAS_SNI is true).

New in version 3.2: data can be an iterable object.

Changed in version 3.3: cadefault was added.

Changed in version 3.4.3: context was added.
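As the documentation above says, data must be bytes in application/x-www-form-urlencoded format, and supplying it turns the request into a POST. A small sketch against a placeholder URL:

```python
import urllib.parse
import urllib.request

# Encode a mapping as application/x-www-form-urlencoded, then to bytes
params = urllib.parse.urlencode({'q': 'python', 'page': '1'}).encode('utf-8')

# example.com is a placeholder host; providing data makes this a POST
req = urllib.request.Request('http://example.com/search', data=params)
print(req.get_method())   # POST
print(params)             # b'q=python&page=1'
# urllib.request.urlopen(req, timeout=10) would actually send it
```

Without the data argument, get_method() would return GET instead.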

How do I enter Python code on a web page?

If you want to type Python code into a web page and see the result of running it, you can open an online Python interpreter page (the link was lost from the original) and enter and run code directly in the browser.

How to view Python source code:

Press Windows+R, type notepad in the Run dialog, then drag a .py Python source file into the Notepad window to view it.

For a better experience, download Notepad++, which displays all kinds of code nicely; UltraEdit also works.

If you want to run a Python script, install Python itself. Python ships with an IDE (IDLE) that can view, edit, and debug Python code; after installing Python, you can right-click a .py file and choose Edit with IDLE to both view and debug the code.

How to access a web page you wrote yourself with Python

<html>
<body>
<form>
Available codes:
<select name="liscode">
<option value="01">123456</option>
<option value="02">123457</option>
<option value="03">123458</option>
<option value="04">123459</option>
<option value="05">123460</option>
<option value="06">123461</option>
</select>
<input type="submit" value="Get code"/>
</form>
</body>
</html>

All the liscode values are read from a txt file; when a user clicks to take one, that entry is deleted.

How can this be implemented in Python?

A py script or exe for the user to run would look roughly like this:

Python code

infile = open('codelist.txt', 'r')
codelist = infile.readlines()         # lines keep their trailing '\n'
used_code = codelist[0]
codelist.remove(codelist[0])          # remove the used code (delete the line)
infile.close()
# rewrite the file (there is no file operation that deletes a single line in place)
outfile = open('codelist.txt', 'w')
for code in codelist:
    outfile.write(code)               # lines already end with '\n'
outfile.close()
print(used_code)
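The snippet above can be wrapped in a helper that a form handler (a CGI script or web-framework view) calls each time a user clicks the submit button. A sketch; the function name take_code is my own:

```python
def take_code(path):
    """Pop the first code from the file and rewrite the remaining lines."""
    with open(path) as f:
        codelist = f.readlines()      # lines keep their trailing '\n'
    used_code = codelist.pop(0)       # the code handed to the user
    with open(path, 'w') as f:
        f.writelines(codelist)        # rewrite the file without that line
    return used_code.strip()

# demo: build a small code list, then take one code
with open('codelist.txt', 'w') as f:
    f.write('123456\n123457\n123458\n')
print(take_code('codelist.txt'))  # 123456
```

A real handler would also need locking if two users can click at the same time, since read-then-rewrite is not atomic.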

How do I read HTML code with requests in Python?

An example using Python 3's requests module to fetch a page's source and save it to a file:

import requests

html = requests.get("")  # the target URL was lost from the original; fill it in
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(html.text)

This is a basic file-save operation, but a few points deserve attention:

1. Install the requests package by typing pip install requests on the command line. Many people recommend requests, although the built-in urllib.request can also fetch page source.

2. Set open()'s encoding parameter to utf-8, otherwise the saved file will be garbled.

3. Printing the fetched content directly in cmd raises assorted encoding errors, so save it to a file to inspect it.

4. The with open form is the better style: it releases the file handle automatically when the block finishes.

Another example:

ff = open('testt.txt', 'w', encoding='utf-8')
with open('test.txt', encoding='utf-8') as f:
    for line in f:
        ff.write(line)
ff.close()

This demonstrates reading a txt file one line at a time and saving each line to another txt file.

Printing each line on the command line raises encoding errors for Chinese text, so each line is instead written to another file, which is enough to verify that reading works (note the encoding argument passed to open).

(Editor: IT教学网)
