I found some spare time to scrape Douban's Top 250. Everyone is welcome to share their Python scraping experience here.

I'm a beginner and can only manage simple projects like this one, emmm...

The code is very simple. It uses the requests and pyquery libraries.

import requests
from pyquery import PyQuery as pq

# Browser-like User-Agent sent with every request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36'}

# 10 pages of 25 movies each: start=0, 25, ..., 225
for start in range(0, 250, 25):
    url = 'https://movie.douban.com/top250?start={}'.format(start)
    html = pq(requests.get(url, headers=headers).text)
    for item in html('.item').items():
        num = item.find('.pic em').text()          # rank
        title = item.find('.title').eq(0).text()   # main title (first .title element), as plain text
        link = item.find('.pic a').attr('href')    # detail page URL
        star = item.find('.rating_num').text()     # rating
        print(num, title, star, link)

Impressive. I'm a complete beginner and can only scrape a bit of information with no-code Power BI :slightly_smiling_face:

Nice, but this only prints the results; they still need to be stored. You could try saving them to CSV or Excel.
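For the CSV route, here is a minimal sketch using only the standard-library csv module. The function name `save_movies`, the column names, and the output filename are my own assumptions, not part of the original script. It expects a list of (rank, title, rating, link) tuples.

```python
import csv

def save_movies(rows, path):
    """Write (rank, title, rating, link) tuples to a CSV file with a header row.

    Hypothetical helper: the name and the column names are illustrative only.
    """
    # utf-8-sig writes a BOM so that Excel detects UTF-8 and shows Chinese titles correctly;
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['rank', 'title', 'rating', 'link'])
        writer.writerows(rows)
```

To use it with the scraper, you could start with `rows = []` before the loop, replace the `print(...)` with `rows.append((num, title, star, link))`, and call `save_movies(rows, 'douban_top250.csv')` once at the end. The resulting file opens directly in Excel.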