Content Intelligence

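Turns scraped Xiaohongshu/Douyin data into actionable content insights: trend detection, viral-content attribution, pattern recognition, and topic suggestions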

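in   Output directories of xiaohongshu-downloader or douyin-downloader (manifest.json / download_manifest.jsonl)
out  Structured JSON insights + Markdown report + visual HTML report

fail Input directory missing or has no recognizable data   → exit 1 with "No data found"
fail ANTHROPIC_API_KEY not set                             → exit 1, prompting you to set the key or pass --skip-analysis
fail Claude returns non-JSON                               → falls back to ingest-only results (no analysis) without crashing
fail Input directories include unknown formats             → unrecognized directories are skipped; recognized ones are still processed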
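The non-JSON fallback in the third failure row can be sketched as a defensive parser. This is a hypothetical helper (`parse_analysis` and `analysis_or_fallback` are illustrative names, not functions from src/analyze.py). It assumes the reply should be one JSON object keyed like intelligence.json, and that retrying on the outermost `{...}` span is an acceptable second attempt.

```python
from __future__ import annotations

import json
from typing import Any

# Top-level keys documented for intelligence.json
ANALYSIS_KEYS = ("trends", "top_content", "patterns", "topic_suggestions")


def parse_analysis(raw: str) -> dict[str, Any] | None:
    """Hypothetical helper: return the parsed analysis, or None if the reply is not usable JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: the JSON may be wrapped in prose, so retry on the outermost {...} span
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            data = json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            return None
    if not isinstance(data, dict):
        return None
    # Missing sections become empty lists so the report stage never sees a KeyError
    return {key: data.get(key, []) for key in ANALYSIS_KEYS}


def analysis_or_fallback(raw: str) -> dict[str, Any]:
    """Degrade to an empty (ingest-only) analysis instead of raising."""
    parsed = parse_analysis(raw)
    return parsed if parsed is not None else {key: [] for key in ANALYSIS_KEYS}
```

Filling missing sections with empty lists is one way to keep reporting crash-free; the real tool may treat partial replies differently.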

Adapters: xiaohongshu-downloader (manifest.json), douyin-downloader (download_manifest.jsonl)

Example Output

ci ingest: data preview (no API key required)

$ ci ingest ./xhs-data ./douyin-data
Found 47 items from 2 directories

[xiaohongshu ] 5个让你效率翻倍的AI工具，第3个绝了                    | 👍12830 💬342 ⭐5621 score=24137
[douyin      ] 为什么90%的人用不好ChatGPT                           | 👍8450 💬891 ⭐0 score=15573
[xiaohongshu ] 打工人必备！这个免费AI帮我写完了年终总结               | 👍6200 💬156 ⭐2840 score=12348
...
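If an agent only needs the preview, the `ci ingest` lines above can be parsed with a regular expression. This sketch is written against the sample output only: the preview format is not documented as stable, and reading ⭐ as a save/collect count is an assumption.

```python
import re
from dataclasses import dataclass

# Matches preview lines like: [douyin      ] title | 👍8450 💬891 ⭐0 score=15573
PREVIEW_LINE = re.compile(
    r"^\[(?P<platform>[^\]]+?)\s*\]\s+(?P<title>.*?)\s*\|\s*"
    r"👍(?P<likes>\d+)\s+💬(?P<comments>\d+)\s+⭐\ufe0f?(?P<collects>\d+)\s+"
    r"score=(?P<score>\d+)\s*$"
)


@dataclass(frozen=True)
class PreviewRow:
    platform: str
    title: str
    likes: int
    comments: int
    collects: int  # assumption: ⭐ is the save/collect count
    score: int


def parse_preview(stdout: str) -> list[PreviewRow]:
    """Parse `ci ingest` stdout, skipping lines that are not item rows."""
    rows = []
    for line in stdout.splitlines():
        m = PREVIEW_LINE.match(line.strip())
        if m:
            rows.append(PreviewRow(
                platform=m["platform"],
                title=m["title"],
                likes=int(m["likes"]),
                comments=int(m["comments"]),
                collects=int(m["collects"]),
                score=int(m["score"]),
            ))
    return rows
```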

ci run: full analysis (requires an Anthropic API key)

$ ci run ./xhs-data ./douyin-data -o ./reports
📥 Ingesting 2 directories...
   Found 47 content items
   Platforms: xiaohongshu, douyin

📊 Top 5 by engagement:
   1. [xiaohongshu] 5个让你效率翻倍的AI工具，第3个绝了 (score: 24137)
   2. [douyin] 为什么90%的人用不好ChatGPT (score: 15573)
   ...

🧠 Running Claude analysis on 47 items...
   Trends: 8
   Top content: 12
   Patterns: 6
   Topic suggestions: 7

📝 Generating reports → ./reports
   ✅ json: reports/2026-03-29_intelligence.json
   ✅ md: reports/2026-03-29_report.md
   ✅ html: reports/2026-03-29_report.html

✨ Done! Open the HTML report:
   open reports/2026-03-29_report.html

intelligence.json structure (machine-readable)

{
  "trends": [{"topic": "AI tools", "heat_score": 85, "direction": "rising"}],
  "top_content": [{"title": "...", "why_it_works": "numbered list + pain-point hook"}],
  "patterns": [{"pattern": "counterintuitive title", "frequency": "appears in 60% of the top 30%"}],
  "topic_suggestions": [{"topic": "...", "angle": "...", "confidence": "high"}]
}
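A consumer of intelligence.json usually filters it rather than reading everything. The helpers below are a sketch that relies only on the keys and values shown above (`direction: "rising"`, `confidence: "high"`); the function names and the `min_heat` threshold are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_intelligence(path: str | Path) -> dict[str, Any]:
    """Read a {date}_intelligence.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rising_topics(intel: dict[str, Any], min_heat: int = 0) -> list[str]:
    """Topics whose direction is "rising", hottest first."""
    trends = [
        t for t in intel.get("trends", [])
        if t.get("direction") == "rising" and t.get("heat_score", 0) >= min_heat
    ]
    trends.sort(key=lambda t: t.get("heat_score", 0), reverse=True)
    return [t["topic"] for t in trends]


def confident_suggestions(intel: dict[str, Any], level: str = "high") -> list[dict[str, Any]]:
    """Topic suggestions at the requested confidence level."""
    return [s for s in intel.get("topic_suggestions", []) if s.get("confidence") == level]
```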

Architecture

┌──────────────────┐     ┌──────────────────┐
│  xiaohongshu-    │     │  douyin-         │
│  downloader      │     │  downloader      │
│  output/         │     │  downloads/      │
└────────┬─────────┘     └────────┬─────────┘
         │                        │
         └───────────┬────────────┘
                     ▼
            ┌─────────────────┐
            │  Ingest         │  Auto-detects platform format
            │  → ContentItem  │  Unified data model
            └────────┬────────┘
                     ▼
            ┌─────────────────┐
            │  Analyze        │  Claude API
            │  trends /       │  pattern recognition
            │  attribution /  │
            │  patterns /     │
            │  topics         │
            └────────┬────────┘
                     ▼
            ┌─────────────────┐
            │  Report         │  JSON + MD + HTML
            └─────────────────┘

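A three-stage pipeline: Ingest (normalizes multi-platform data) → Analyze (Claude identifies content patterns) → Report (multi-format output). Each stage maps to one module, and data moves between stages through the unified ContentItem schema.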
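The stage boundaries can be illustrated with a frozen dataclass and three plain callables. The ContentItem fields below are hypothetical (the real model lives in src/schema.py and is not reproduced here), and passing `analyze=None` to mimic `--skip-analysis` is an illustrative choice, not the project's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical fields; see src/schema.py for the real model."""
    id: str
    platform: str  # "xiaohongshu" or "douyin"
    title: str
    likes: int = 0
    comments: int = 0
    collects: int = 0
    engagement_score: float = 0.0
    transcript: str | None = None  # Douyin only


Ingest = Callable[[list[str]], list[ContentItem]]
Analyze = Callable[[list[ContentItem]], dict]
Report = Callable[[list[ContentItem], dict], list[str]]


def run_pipeline(dirs: list[str], ingest: Ingest, analyze: Analyze | None, report: Report) -> list[str]:
    """Ingest -> Analyze -> Report; returns the paths the report stage wrote."""
    items = ingest(dirs)
    if not items:
        raise SystemExit("No data found")  # string argument -> exit status 1
    # analyze=None stands in for --skip-analysis: reports get an empty analysis
    analysis = analyze(items) if analyze else {}
    return report(items, analysis)


def top_by_engagement(items: list[ContentItem], n: int = 5) -> list[ContentItem]:
    """The "Top N by engagement" preview, highest score first."""
    return sorted(items, key=lambda i: i.engagement_score, reverse=True)[:n]
```

Freezing the dataclass means a later stage cannot mutate items produced by Ingest; derived values would be created with `dataclasses.replace` instead.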

Quick Start

# 1. Install
git clone https://github.com/zinan92/content-intelligence.git
cd content-intelligence
pip install -e .

# 2. Preview scraped data (no API key required)
ci ingest ./path/to/xhs-output ./path/to/douyin-output

# 3. Full analysis (requires an Anthropic API key)
export ANTHROPIC_API_KEY=sk-ant-...
ci run ./xhs-output ./douyin-output -o ./reports

# 4. View the report (`open` is macOS; use xdg-open on Linux)
open reports/*_report.html

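Features

Feature	Description	CLI command
XHS data parsing	Parses manifest.json + note_*.json; accepts both `notes` and `items` fields	`ci ingest`
Douyin data parsing	Parses download_manifest.jsonl + *_metadata.json and loads transcripts automatically	`ci ingest`
Automatic platform detection	Tells Xiaohongshu from Douyin by file format; no manual flag needed	`ci ingest` / `ci run`
Trend detection	Topic heat ranking + rising/falling direction + cross-platform comparison	`ci run`
Viral-content attribution	Explains why top content performed well (title strategy, content structure, engagement profile)	`ci run`
Pattern recognition	Frequency of recurring title, content, and engagement patterns among viral posts	`ci run`
Topic suggestions	Data-driven topic ideas with an angle and a confidence rating	`ci run`
Multi-format reports	JSON (machine-readable) + Markdown + HTML reports	`ci run -o ./reports`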
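Automatic platform detection follows the rule stated in the capability contract: manifest.json means Xiaohongshu, download_manifest.jsonl means Douyin. The sketch below assumes the XHS manifest is a JSON object listing posts under either `notes` or `items`. Parsing note_*.json, *_metadata.json, and transcripts is omitted, and checking for the Douyin manifest first is an arbitrary tie-break; the function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def detect_platform(directory: str | Path) -> str | None:
    """Classify a downloader output directory by its manifest file."""
    d = Path(directory)
    if (d / "download_manifest.jsonl").is_file():
        return "douyin"
    if (d / "manifest.json").is_file():
        return "xiaohongshu"
    return None  # unknown format: the caller skips this directory


def load_records(directory: str | Path) -> list[dict[str, Any]]:
    """Return raw manifest records for a recognized directory, else []."""
    d = Path(directory)
    platform = detect_platform(d)
    if platform == "douyin":
        # JSONL: one JSON object per non-empty line
        with open(d / "download_manifest.jsonl", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if platform == "xiaohongshu":
        with open(d / "manifest.json", encoding="utf-8") as f:
            manifest = json.load(f)
        # Assumption: posts sit under either "notes" or "items"
        return manifest.get("notes") or manifest.get("items") or []
    return []
```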

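Tech Stack

Layer	Technology	Purpose
Language	Python 3.11+	Core runtime
CLI	Click	Command-line interface
AI	Anthropic Claude API (claude-sonnet-4-5)	Content analysis, pattern recognition
Templating	Jinja2	HTML report rendering
Data model	dataclass (frozen)	Immutable ContentItem + AnalysisResult
Testing	pytest	Unit tests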

Project Structure

content-intelligence/
├── src/
│   ├── cli.py          # CLI entry point (ci run / ci ingest)
│   ├── schema.py       # ContentItem + AnalysisResult data models
│   ├── ingest.py       # XHS + Douyin parsing (auto-detects platform)
│   ├── analyze.py      # Claude analysis engine (prompt + JSON parsing)
│   ├── report.py       # HTML/MD/JSON report generation
│   └── templates/      # Jinja2 HTML templates
├── tests/
│   ├── test_schema.py
│   ├── test_ingest.py
│   ├── test_analyze.py
│   └── fixtures/       # XHS/Douyin sample data
├── docs/
│   └── plan.md         # Implementation plan
├── pyproject.toml
└── README.md

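Configuration

Environment variable	Description	Required	Default
`ANTHROPIC_API_KEY`	Claude API key	Required for analysis in `ci run` (not with `--skip-analysis`)	--

The `ci ingest` preview mode does not need an API key.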

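CLI Reference

# Full pipeline
ci run <dirs...> [--output ./reports] [--api-key sk-ant-...] [--skip-analysis]

# Ingest preview only (does not call Claude)
ci ingest <dirs...>

Argument/option	Description	Required	Default
`dirs`	One or more downloader output directories	Yes	--
`--output / -o`	Report output directory	No	`./reports`
`--api-key`	Anthropic API key (can also be set via the environment variable)	Required for analysis	`$ANTHROPIC_API_KEY`
`--skip-analysis`	Skip Claude analysis; run ingest only and write empty reports	No	`false`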
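The interaction between `--api-key`, `ANTHROPIC_API_KEY`, and `--skip-analysis` can be summarized as a small resolution function. This is a hypothetical sketch (`resolve_api_key` is not taken from src/cli.py); it assumes the flag takes precedence over the environment variable and that a missing key exits with status 1, as the failure table states.

```python
from __future__ import annotations

import os
from typing import Mapping


def resolve_api_key(
    cli_value: str | None,
    skip_analysis: bool,
    env: Mapping[str, str] = os.environ,
) -> str | None:
    """Return the key to use, or None when Claude should not be called."""
    if skip_analysis:
        return None  # ingest + empty reports, no key needed
    # Assumption: an explicit --api-key wins over the environment variable
    key = cli_value or env.get("ANTHROPIC_API_KEY")
    if not key:
        # A string argument to SystemExit is printed to stderr with exit status 1
        raise SystemExit(
            "ANTHROPIC_API_KEY is not set. Set it, pass --api-key, or use --skip-analysis."
        )
    return key
```

With Click, the flag/environment fallback is commonly written as `@click.option("--api-key", envvar="ANTHROPIC_API_KEY")`; whether cli.py does exactly this is not stated in this README.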

For AI Agents

Capability Contract

name: content-intelligence
capability: turns scraped social-media data into structured content insights
version: 0.1.0

interface:
  cli: ci
  subcommands:
    run:
      description: full pipeline (ingest + Claude analysis + report generation)
      args: ["dirs..."]
      flags:
        --output: {type: string, default: "./reports"}
        --api-key: {type: string, env: ANTHROPIC_API_KEY}
        --skip-analysis: {type: boolean}
      exit_codes:
        0: success, reports generated
        1: no data found or API key missing
    ingest:
      description: parse and preview data only
      args: ["dirs..."]

input:
  format: directory paths (output of xiaohongshu-downloader or douyin-downloader)
  detection: automatic (manifest.json → XHS, download_manifest.jsonl → Douyin)

output:
  json: "{date}_intelligence.json"
  markdown: "{date}_report.md"
  html: "{date}_report.html"
  schema:
    trends: [{topic, heat_score, direction, evidence, platforms}]
    top_content: [{id, title, platform, engagement_score, why_it_works}]
    patterns: [{pattern, frequency, examples, applicable_to}]
    topic_suggestions: [{topic, angle, confidence, reasoning, reference_ids}]

dependencies:
  runtime: [click, anthropic, jinja2]
  upstream: [xiaohongshu-downloader, douyin-downloader]

install: pip install -e .
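Because the contract lists the fields of every output record, an agent can check a report before relying on it. This validator is a sketch: it treats every field named in `output.schema` as required, which may be stricter than the real tool (the abbreviated intelligence.json example earlier omits several of them). `schema_problems` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any

# Field lists copied from the output.schema section of the contract above
REQUIRED_FIELDS = {
    "trends": {"topic", "heat_score", "direction", "evidence", "platforms"},
    "top_content": {"id", "title", "platform", "engagement_score", "why_it_works"},
    "patterns": {"pattern", "frequency", "examples", "applicable_to"},
    "topic_suggestions": {"topic", "angle", "confidence", "reasoning", "reference_ids"},
}


def schema_problems(intel: dict[str, Any]) -> list[str]:
    """Return readable problems; an empty list means the data matches the contract."""
    problems = []
    for section, fields in REQUIRED_FIELDS.items():
        entries = intel.get(section)
        if not isinstance(entries, list):
            problems.append(f"{section}: missing or not a list")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"{section}[{i}]: not an object")
                continue
            missing = sorted(fields - set(entry))
            if missing:
                problems.append(f"{section}[{i}]: missing {', '.join(missing)}")
    return problems
```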

Agent Usage Example

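import glob
import json
import os
import subprocess

# Full analysis (requires an Anthropic API key)
result = subprocess.run(
    ["ci", "run", "./xhs-data", "./douyin-data", "-o", "./reports"],
    capture_output=True, text=True,
    env={**os.environ, "ANTHROPIC_API_KEY": "sk-ant-..."},
)
if result.returncode != 0:
    # Exit code 1: no data found or API key missing
    raise RuntimeError(result.stdout + result.stderr)

# Read the structured output. Files are named {date}_intelligence.json,
# so with YYYY-MM-DD dates the lexically last match is the most recent run.
latest = sorted(glob.glob("./reports/*_intelligence.json"))[-1]
with open(latest, encoding="utf-8") as f:
    intelligence = json.load(f)
trends = intelligence["trends"]                  # trending topics
suggestions = intelligence["topic_suggestions"]  # topic suggestions

# Ingest only (no API key required)
result = subprocess.run(
    ["ci", "ingest", "./xhs-data"],
    capture_output=True, text=True,
)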

License

MIT

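Related Projects

Project	Relationship	Link
xiaohongshu-downloader	Upstream data source (Xiaohongshu scraping)	zinan92/xiaohongshu-downloader
douyin-downloader	Upstream data source (Douyin scraping)	zinan92/douyin-downloader-1
videocut	Downstream consumer (content production)	zinan92/videocut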
