docs: clarify feapder callback patterns
ShellMonster committed May 13, 2026
commit bac65d30b737fc6fdcb35ac22cceb342bb64b0bf
1 change: 1 addition & 0 deletions Skills/feapder-crawler/SKILL.md
@@ -62,6 +62,7 @@ metadata:
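- When the user asks evaluative questions such as "should we move off feapder?" or "what about switching to requests?", provide only a risk/benefit assessment and a description of the alternatives; this does not authorize implementation. Write non-feapder code only after the user explicitly confirms "switch to a non-feapder implementation".
- Do not assume that `AirSpider` configuration behaves the same as the Redis-based distributed spiders; confirm the base class and startup arguments first.
- For `BatchSpider` / `TaskSpider`, always distinguish task dispatch via `start_monitor_task()` from worker crawling via `start()`.
- For multi-page, multi-source, or multi-step parsing, prefer public `parse_xxx` callback methods bound explicitly with `feapder.Request(..., callback=self.parse_xxx)`; do not cram every URL branch into the default `parse()` and then forward to private `_parse_xxx` methods. The default `parse()` suits only an entry point with no callback specified, or a very small single-page spider.
- When debugging storage problems, follow `yield Item` or `yield UpdateItem` through to `ITEM_PIPELINES`, then inspect the specific pipeline and its configuration.
- When debugging parsing problems, check `Request.callback`, `parser_name`, `download_midware`, `validate`, `exception_request`, and `failed_request` first; do not rush to modify the scheduler.
- For import/path problems, remember that feapder inserts the project root into `sys.path` when launched from `items/` or `spiders/`.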
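
The `sys.path` behavior in the last bullet can be pictured with a small standard-library sketch. It only imitates the described effect (dropping a trailing `items` or `spiders` directory from the working directory and putting the result on `sys.path`); the function names and the regular expression are illustrative assumptions, not feapder's actual code.

```python
import re
import sys


def project_root_from_cwd(cwd: str) -> str:
    """Drop a trailing items/ or spiders/ directory (hypothetical helper)."""
    return re.sub(r"[\\/](items|spiders)$", "", cwd)


def add_project_root(cwd: str) -> str:
    """Put the derived project root at the front of sys.path (assumed behavior)."""
    root = project_root_from_cwd(cwd)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Launching from <root>/spiders resolves imports against <root>.
root = add_project_root("/work/demo/spiders")
```

When an import fails, comparing the directory the process was launched from against `sys.path` in this way is usually quicker than rearranging packages.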
50 changes: 50 additions & 0 deletions Skills/feapder-crawler/references/request-response.md
@@ -38,6 +38,56 @@ def parse_list(self, request, response):
yield feapder.Request(href, callback=self.parse_detail, category=category)
```

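For multi-source or multi-step parsing, do not route every page into the default `parse()` and then dispatch by URL to private `_parse_xxx` methods. Prefer binding a callback explicitly on the `Request` for each chain: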

```python
import feapder

YYB_URL = "https://sj.qq.com/appdetail/{}"
WDJ_URL = "https://www.wandoujia.com/apps/{}"
MI_URL = "https://app.mi.com/details?id={}"


# The functions below are spider methods, shown without the enclosing class.
def start_requests(self):
    for pkg in self.seed_packages:
        yield feapder.Request(YYB_URL.format(pkg), callback=self.parse_yyb)


def parse_yyb(self, request, response):
    pkg = request.url.rsplit("/", 1)[-1]
    # Fan out each related package to all three sources, one callback per source.
    for related_pkg in self.extract_yyb_related(response):
        yield feapder.Request(
            YYB_URL.format(related_pkg),
            callback=self.parse_yyb,
        )
        yield feapder.Request(
            WDJ_URL.format(related_pkg),
            callback=self.parse_wandoujia,
        )
        yield feapder.Request(
            MI_URL.format(related_pkg),
            callback=self.parse_mi,
        )


def parse_wandoujia(self, request, response):
    pass


def parse_mi(self, request, response):
    pass
```

Anti-pattern:

```python
def parse(self, request, response):
    if "sj.qq.com" in response.url:
        yield from self._parse_yyb(request, response)
    elif "wandoujia.com" in response.url:
        yield from self._parse_wandoujia(request, response)
```

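This anti-pattern runs, but it hides feapder's callback chain, which makes it harder to debug `request.callback_name`, failure retries, and cross-parser callbacks, and harder to maintain later. Only a very small single-page spider should rely solely on the default `parse()`.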
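
The debugging difference can be illustrated without feapder at all. The sketch below uses a hypothetical `FakeRequest` stand-in (not `feapder.Request`) whose `callback_name` property is modeled loosely on the attribute mentioned above; the two spider classes and the demo URLs are likewise invented for illustration. It shows only that explicitly bound requests carry the name of their handler, while URL-dispatched requests all report the default entry point.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a crawler request; not feapder.Request."""

    url: str
    callback: Optional[Callable] = None

    @property
    def callback_name(self) -> str:
        # With no bound callback, assume the default entry point "parse".
        return self.callback.__name__ if self.callback else "parse"


class ExplicitSpider:
    """Each request names the method that will handle it."""

    def start_requests(self) -> Iterator[FakeRequest]:
        yield FakeRequest("https://sj.qq.com/appdetail/demo", callback=self.parse_yyb)
        yield FakeRequest("https://www.wandoujia.com/apps/demo", callback=self.parse_wandoujia)

    def parse_yyb(self, request, response):
        pass

    def parse_wandoujia(self, request, response):
        pass


class DispatchSpider:
    """Every request falls through to parse(), which would branch on the URL."""

    def start_requests(self) -> Iterator[FakeRequest]:
        yield FakeRequest("https://sj.qq.com/appdetail/demo")
        yield FakeRequest("https://www.wandoujia.com/apps/demo")

    def parse(self, request, response):
        pass


explicit_names = [r.callback_name for r in ExplicitSpider().start_requests()]
dispatch_names = [r.callback_name for r in DispatchSpider().start_requests()]
```

Under these assumptions `explicit_names` distinguishes the two chains, whereas `dispatch_names` is the same value for every request, so a log line or a failed-request record built from it says nothing about which branch was involved.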

## Parser hooks

`BaseParser` and all spider types support these common hooks: