feat(code-chunk): full integration of YAML, TOML, JSON, and JSONL#28

Open
matperez wants to merge 1 commit into supermemoryai:main from matperez:feature/data-formats-yaml-toml-json-jsonl

@matperez

What

Adds full support for YAML, TOML, JSON, and JSONL in the code-chunk pipeline: language detection, tree-sitter parsing, entity extraction (top-level sections), and chunking with size limits.
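As a rough illustration of the language-detection step, here is a minimal sketch that maps a file path to one of the four new `Language` values by extension. `DataLanguage`, `EXTENSION_TO_LANGUAGE`, and `detectDataLanguage` are hypothetical names, and the extension table (including `.yml`) is an assumption; the PR does not list the exact extensions it registers.

```typescript
// Hypothetical: the four Language values added by this PR. The real
// Language union in code-chunk also includes the source-code languages.
type DataLanguage = "yaml" | "toml" | "json" | "jsonl";

// Assumed extension table; the exact extensions the PR registers may differ.
const EXTENSION_TO_LANGUAGE = new Map<string, DataLanguage>([
  [".yaml", "yaml"],
  [".yml", "yaml"],
  [".toml", "toml"],
  [".json", "json"],
  [".jsonl", "jsonl"],
]);

// Returns the data-format language for a path, or null for anything else.
function detectDataLanguage(filePath: string): DataLanguage | null {
  // Look only at the final path segment so dots in directory names are ignored.
  const base = filePath.split(/[\\/]/).pop() ?? filePath;
  const dot = base.lastIndexOf(".");
  if (dot === -1) return null;
  // Extensions are compared case-insensitively.
  return EXTENSION_TO_LANGUAGE.get(base.slice(dot).toLowerCase()) ?? null;
}
```

Anything that returns `null` here would simply fall through to the existing source-language detection.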

Why

Config and data files (YAML, TOML, JSON, JSONL) should be chunked the same way as code: each top-level section is an entity, and chunk boundaries and sizes are controlled by the same node-based algorithm. This lets config files and docs be indexed and retrieved just like source files.
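To picture "boundaries and size controlled by the same node-based algorithm", here is a greatly simplified sketch that greedily merges consecutive top-level sections into chunks under a size limit. `Section`, `Chunk`, and `packSections` are hypothetical names and sizes are byte offsets. An oversized section is simply emitted on its own here, whereas the real algorithm would presumably descend into that node's children.

```typescript
// A top-level section as a byte range in the source (hypothetical shape).
interface Section {
  name: string;
  start: number; // inclusive byte offset
  end: number; // exclusive byte offset
}

interface Chunk {
  start: number;
  end: number;
  sections: string[];
}

// Greedily extends the current chunk with the next section while the merged
// span (including any text between sections) stays within maxChunkSize.
// A section larger than maxChunkSize ends up alone in its own chunk.
function packSections(sections: Section[], maxChunkSize: number): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk | null = null;
  for (const s of sections) {
    if (current && s.end - current.start <= maxChunkSize) {
      current.end = s.end;
      current.sections.push(s.name);
    } else {
      if (current) chunks.push(current);
      current = { start: s.start, end: s.end, sections: [s.name] };
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```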

Notes

  • JSONL is not parsed into a single whole-file AST. Instead, lines are accumulated into a chunk until maxChunkSize is reached, and each line is treated as one entity, named after its first key or, failing that, "line N".
  • New Language values: yaml, toml, json, jsonl. New EntityType: section.
  • Dependencies: tree-sitter-json, @tree-sitter-grammars/tree-sitter-yaml, @tree-sitter-grammars/tree-sitter-toml.
  • Tests added/updated for parser, extract, and integration (stream) for all four formats.
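The JSONL branch described in the first note can be sketched roughly as follows. This is a minimal sketch, not the PR's code: `chunkJsonl`, `nameForLine`, and the entity shape are hypothetical, size is measured in characters, blank lines are skipped, and "name from first key" is read as the first key's name (the PR may derive it differently).

```typescript
// A JSONL record treated as one entity (hypothetical shape).
interface JsonlEntity {
  name: string;
  line: number; // 1-based line number in the file
}

interface JsonlChunk {
  text: string;
  entities: JsonlEntity[];
}

// Names a record after its first key; non-objects, empty objects and
// unparseable lines fall back to "line N".
function nameForLine(raw: string, lineNumber: number): string {
  try {
    const value: unknown = JSON.parse(raw);
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      const firstKey = Object.keys(value)[0];
      if (firstKey !== undefined) return firstKey;
    }
  } catch {
    // Malformed JSON: use the positional name below.
  }
  return `line ${lineNumber}`;
}

// Accumulates whole lines into a chunk until adding the next line would exceed
// maxChunkSize. Lines are never split, so one over-long line forms its own chunk.
function chunkJsonl(source: string, maxChunkSize: number): JsonlChunk[] {
  const chunks: JsonlChunk[] = [];
  let lines: string[] = [];
  let entities: JsonlEntity[] = [];
  let size = 0;

  const flush = (): void => {
    if (lines.length === 0) return;
    chunks.push({ text: lines.join("\n"), entities });
    lines = [];
    entities = [];
    size = 0;
  };

  source.split(/\r?\n/).forEach((raw, i) => {
    if (raw.trim() === "") return; // blank lines carry no record
    // Joining with "\n" costs one extra character once the chunk is non-empty.
    if (lines.length > 0 && size + 1 + raw.length > maxChunkSize) flush();
    size += (lines.length > 0 ? 1 : 0) + raw.length;
    lines.push(raw);
    entities.push({ name: nameForLine(raw, i + 1), line: i + 1 });
  });
  flush();
  return chunks;
}
```

Because no whole-file AST is built, a single malformed line only loses its key-based name instead of breaking the parse of the entire file.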

- Added languages yaml, toml, json, jsonl and the section entity type
- Parser: tree-sitter grammars for yaml/toml/json, plus file extensions
- Entity extraction: queries and a fallback for top-level sections
- extractName/extractSignature and imports for the new formats
- JSONL: a separate line-based chunking branch (no AST of the whole file)
- Tests: parser, extract, and integration for all four formats
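To make "top-level sections" concrete for YAML and TOML, here is a text-level sketch of what such a fallback might list. It is purely illustrative: `topLevelSectionNames` is a hypothetical helper, the PR's actual fallback presumably works on the tree-sitter tree, and this regex version only recognizes unquoted YAML keys at column 0 and TOML `[table]` / `[[array]]` headers (root-level TOML key/value pairs are ignored).

```typescript
// Hypothetical text-level fallback: lists top-level section names without a parse tree.
function topLevelSectionNames(source: string, language: "yaml" | "toml"): string[] {
  const names: string[] = [];
  // Unquoted YAML mapping key at column 0, followed by ":" and whitespace or end of line.
  const yamlKey = /^([A-Za-z_][\w.-]*)\s*:(?:\s|$)/;
  // TOML table header [name] or array-of-tables header [[name]], optional trailing comment.
  const tomlHeader = /^\[\[?\s*([^\[\]]+?)\s*\]\]?\s*(?:#.*)?$/;
  const pattern = language === "yaml" ? yamlKey : tomlHeader;
  for (const line of source.split(/\r?\n/)) {
    const name = pattern.exec(line)?.[1];
    if (name) names.push(name);
  }
  return names;
}
```

Indented YAML keys, comments, and `---` document markers never match, so only the outermost sections are reported.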
matperez changed the title from its Russian-language original to feat(code-chunk): full integration of YAML, TOML, JSON, and JSONL on Feb 22, 2026

Labels

None yet

Projects

None yet

1 participant