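14.2 Handling Image CAPTCHAs

Image CAPTCHAs are one of the most common anti-scraping measures. Handling them usually means recognizing the CAPTCHA text with either a third-party service or a machine learning model.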

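##### 1. **Using a Third-Party Service**

Many third-party services offer CAPTCHA recognition, such as [Dama2 (打码平台)](https://www.dama2.com/) and [Geetest (极验)](https://www.geetest.com/). These services typically require you to register an account and call their API.

**Example: recognizing a CAPTCHA with Dama2**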

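```python
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By


def recognize_captcha(image_path):
    """Send the CAPTCHA image to the recognition service and return the decoded text."""
    api_key = "your_api_key"
    api_secret = "your_api_secret"

    # Open the image in a context manager so the file handle is always closed.
    with open(image_path, "rb") as image_file:
        response = requests.post(
            "https://api.dama2.com/v2/rec",
            data={
                "user": api_key,
                "pwd": api_secret,
                "type": 1004,  # CAPTCHA type
            },
            files={"file": image_file},
            timeout=30,
        )
    response.raise_for_status()  # Fail early on HTTP errors
    result = response.json()
    return result["data"]["val"]  # Recognized text


driver = webdriver.Chrome()

# Save a screenshot of the CAPTCHA image element
driver.get("https://example.com/captcha")
captcha_image = driver.find_element(By.ID, "captcha_image")
captcha_image.screenshot("captcha.png")

# Recognize the CAPTCHA
captcha_text = recognize_captcha("captcha.png")
print("Recognized captcha:", captcha_text)

# Type the result into the CAPTCHA input field
captcha_input = driver.find_element(By.ID, "captcha_input")
captcha_input.send_keys(captcha_text)
```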
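Recognition services can fail transiently or return an empty answer, so it may help to wrap the call in a small retry loop. The following is a minimal sketch, not part of any provider's SDK: `recognize_with_retry` and its parameters are hypothetical names, and `recognize` stands for any callable that takes an image path and returns text, such as the `recognize_captcha` function above.

```python
import time


def recognize_with_retry(recognize, image_path, attempts=3, delay=1.0):
    """Call recognize(image_path) until it returns non-empty text.

    Hypothetical helper: `recognize` is any callable that takes an image
    path and returns the recognized text (or raises on failure).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            text = recognize(image_path)
            if text and text.strip():
                return text.strip()
            last_error = ValueError("empty recognition result")
        except Exception as exc:  # network errors, malformed responses, etc.
            last_error = exc
        if attempt < attempts:
            time.sleep(delay)  # brief pause before the next attempt
    raise RuntimeError(
        f"CAPTCHA recognition failed after {attempts} attempts"
    ) from last_error
```

A call such as `recognize_with_retry(recognize_captcha, "captcha.png")` would then replace the direct call. The last underlying error is chained to the final `RuntimeError`, which makes failures easier to diagnose.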

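##### 2. **Using a Machine Learning Model**

Simple CAPTCHAs can be recognized locally with a machine learning model, for example with an OCR tool such as [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) or [EasyOCR](https://github.com/JaidedAI/EasyOCR).

**Installing EasyOCR**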

```bash
pip install easyocr
```

**Example: recognizing a CAPTCHA with EasyOCR**

```python
import easyocr
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/captcha")

# Save a screenshot of the CAPTCHA image element
captcha_image = driver.find_element(By.ID, "captcha_image")
captcha_image.screenshot("captcha.png")

# Recognize the CAPTCHA; readtext returns a list of (bounding box, text, confidence)
reader = easyocr.Reader(["en"])
results = reader.readtext("captcha.png")
if not results:
    raise RuntimeError("EasyOCR did not detect any text in captcha.png")
captcha_text = results[0][1]
print("Recognized captcha:", captcha_text)

# Type the result into the CAPTCHA input field
captcha_input = driver.find_element(By.ID, "captcha_input")
captcha_input.send_keys(captcha_text)
```
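OCR output for CAPTCHAs often contains stray spaces or punctuation, and taking only the first detection can miss characters that were split across several boxes. The sketch below post-processes results in the `(bounding box, text, confidence)` shape that `readtext` returns. The function name `clean_captcha_text`, the expected length of 4, and the 0.3 confidence threshold are illustrative assumptions to adapt to the target site.

```python
import re


def clean_captcha_text(results, expected_length=4, min_confidence=0.3):
    """Join OCR detections into one candidate string, or return None.

    `results` is a list of (bounding_box, text, confidence) tuples.
    expected_length and min_confidence are assumed values for illustration.
    """
    # Keep only detections the OCR engine is reasonably sure about.
    confident = [r for r in results if r[2] >= min_confidence]
    # Order detections left to right by the x coordinate of the top-left corner.
    confident.sort(key=lambda r: r[0][0][0])
    # Concatenate the text and drop everything except ASCII letters and digits.
    text = re.sub(r"[^A-Za-z0-9]", "", "".join(r[1] for r in confident))
    return text if len(text) == expected_length else None
```

A `None` result signals that the CAPTCHA is better refreshed and recognized again than submitted with a likely wrong answer.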

------

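### Summary

This chapter covered handling dynamically loaded pages and recognizing image CAPTCHAs. Explicit waits, polling checks, and listening to network requests make dynamic page loading manageable. For image CAPTCHAs, you can use a third-party service or a machine learning model. With these techniques, you can handle dynamic content more efficiently and make your automated tasks more reliable.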