Installing and Using Beautiful Soup 4.4.0, a Python 3 Web Scraping Library, with an Example

Development environment

Python 3.8
Ubuntu 20.04

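Installing the Beautiful Soup dependencies

First install pipenv to manage the dependencies. For detailed installation steps, see the article "Using pipenv for dependency management and project isolation in Python 3".

Then, in the project directory, run pipenv shell to enter the virtual environment and install the dependencies: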

pipenv install beautifulsoup4
pipenv install requests
pipenv install lxml
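
To confirm that the three packages landed in the virtual environment, a quick check along these lines can help. It is only a sketch: the helper name check_packages is made up for illustration, and the one thing it relies on is that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util

# Import names of the packages installed above.
# Note: the PyPI package "beautifulsoup4" is imported as "bs4".
PACKAGES = ["bs4", "requests", "lxml"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


result = check_packages(PACKAGES)
for name, found in result.items():
    print(name, "OK" if found else "MISSING")
```

If any line prints MISSING, the script is probably running outside the pipenv shell.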

A minimal scraper

It prints the site title of this site, www.sunzhongwei.com, and the titles of the articles listed on the home page.

#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup


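def run():
    # A timeout keeps the script from hanging if the site does not respond.
    r = requests.get('https://www.sunzhongwei.com', timeout=10)

    if r.status_code != 200:
        print("error: status code is", r.status_code)
        return

    # Pass the raw bytes so Beautiful Soup detects the encoding itself,
    # and name the parser explicitly (lxml was installed above).
    soup = BeautifulSoup(r.content, "lxml")
    encoding = soup.original_encoding
    print("original encoding is:", encoding)

    # select_one returns None when nothing matches.
    title = soup.select_one("title")
    if title is not None:
        print("title is", title.text)

    print("Names of articles:")
    articles = soup.select("h3.media-heading")
    for article in articles:
        print(article.text)


if __name__ == '__main__':
    print("Hello world!")
    run()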

I prefer select over find because it uses the same syntax as CSS selectors, so there is nothing extra to memorize.
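
To make the comparison concrete, here is a small offline sketch that runs both styles against the same markup. The HTML string is invented for illustration and only loosely modelled on the h3.media-heading elements used above. The function names are hypothetical too. It uses the built-in html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not copied from the real site.
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
  <h3 class="media-heading"><a href="/a">First article</a></h3>
  <h3 class="media-heading"><a href="/b">Second article</a></h3>
  <h3 class="other">Not an article</h3>
</body></html>
"""


def titles_with_select(soup):
    # CSS selector syntax: tag name followed by .class
    return [h.get_text(strip=True) for h in soup.select("h3.media-heading")]


def titles_with_find(soup):
    # The same query using find_all and its class_ keyword argument
    return [h.get_text(strip=True)
            for h in soup.find_all("h3", class_="media-heading")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(titles_with_select(soup))
print(titles_with_find(soup))
```

Both functions return the same two titles. The select version reads exactly like the selector one would write in a stylesheet, while find_all needs its own keyword arguments.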

Running it

pipenv shell
python3 test.py

References

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

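About the author 🌱

I am a developer from Yantai, Shandong. If you have a topic you would like to discuss or a software development need, feel free to add me on WeChat (zhongwei) to chat, or follow my personal WeChat official account “大象工具” for more ways to get in touch.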
