feat(code-chunk): full integration of YAML, TOML, JSON, and JSONL #28
Open
matperez wants to merge 1 commit into supermemoryai:main from
- Added languages: yaml, toml, json, jsonl, and a "section" entity type
- Parser: tree-sitter grammars for yaml/toml/json, plus file extensions
- Entity extraction: queries and a fallback for top-level sections
- extractName/extractSignature and imports for the new formats
- JSONL: a separate line-based chunking branch (no AST for the whole file)
- Tests: parser, extract, and integration tests for all four formats
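The JSONL branch skips whole-file parsing. A minimal sketch of what line-based chunking could look like, assuming a character budget per chunk and that every non-blank line is one complete record; `chunkJsonl` and `LineChunk` are hypothetical names, not the PR's actual API:

```typescript
// Hypothetical chunk shape; the real code-chunk types may differ.
interface LineChunk {
  text: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
}

// Packs whole JSONL lines (one record per line) into chunks of at most
// maxChars characters. No AST is built for the file and a record is never
// split; a single line longer than maxChars becomes its own oversized chunk.
function chunkJsonl(source: string, maxChars: number): LineChunk[] {
  const chunks: LineChunk[] = [];
  let buf: string[] = [];
  let bufLen = 0;
  let startLine = 0;
  let lastLine = 0;

  // Emit the buffered lines as one chunk and reset the buffer.
  const flush = (): void => {
    if (buf.length === 0) return;
    chunks.push({ text: buf.join("\n"), startLine, endLine: lastLine });
    buf = [];
    bufLen = 0;
  };

  const lines = source.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === "") continue; // blank lines carry no record
    // +1 accounts for the "\n" that would join this line to the buffer.
    if (buf.length > 0 && bufLen + 1 + line.length > maxChars) flush();
    if (buf.length === 0) startLine = i + 1;
    bufLen += (buf.length > 0 ? 1 : 0) + line.length;
    buf.push(line);
    lastLine = i + 1;
  }
  flush();
  return chunks;
}
```

With maxChars = 16, the input `{"a":1}`, `{"b":2}`, a blank line, `{"c":3}` yields two chunks: lines 1 to 2 and line 4. Packing whole lines keeps each record intact, at the cost of occasionally exceeding the budget for a single very long record.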
What
Adds full support for YAML, TOML, JSON, and JSONL in the code-chunk pipeline: language detection, tree-sitter parsing (YAML, TOML, and JSON; JSONL is chunked line by line), entity extraction (top-level sections), and chunking with size limits.
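Language detection for the new formats can be as simple as an extension lookup. The sketch below assumes the extensions .yaml, .yml, .toml, .json, and .jsonl (the PR description does not list them), and `detectDataLanguage` is a hypothetical helper:

```typescript
// Hypothetical language tags; the PR adds "yaml", "toml", "json" and "jsonl"
// alongside the existing code languages, which are omitted here.
type DataLanguage = "yaml" | "toml" | "json" | "jsonl";

// Assumed extension table (".yml" is the common short form of ".yaml").
const EXTENSION_TO_LANGUAGE = new Map<string, DataLanguage>([
  [".yaml", "yaml"],
  [".yml", "yaml"],
  [".toml", "toml"],
  [".json", "json"],
  [".jsonl", "jsonl"],
]);

// Returns the data language for a path, or undefined if the extension is unknown.
function detectDataLanguage(filePath: string): DataLanguage | undefined {
  // Look only at the last path segment so dots in directory names are ignored.
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  if (dot === -1) return undefined;
  return EXTENSION_TO_LANGUAGE.get(base.slice(dot).toLowerCase());
}
```

Keeping .jsonl distinct from .json matters because JSONL is routed to the line-based branch rather than to a tree-sitter grammar.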
Why
Config and data files (YAML, TOML, JSON, JSONL) should be chunked the same way as code: entities are top-level sections, and chunk boundaries and sizes are controlled by the same node-based algorithm. This lets config files and docs be indexed and retrieved the same way as source files.
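The commit message mentions a fallback for top-level sections alongside the tree-sitter queries. As a rough illustration only, a line-based fallback for TOML could treat each table header as a section boundary and fold dotted sub-tables into their top-level parent; `tomlTopLevelSections` and `SectionEntity` are hypothetical, and this is not the PR's implementation:

```typescript
// Hypothetical entity shape for a top-level "section".
interface SectionEntity {
  type: "section";
  name: string; // top-level table name, or "(root)" for keys before any table
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
}

// A table header such as [server], [server.http] or [[db]] at column 0,
// optionally followed by a comment. Simplifications: headers must start at
// column 0, and quoted keys containing dots are not handled.
const TABLE_HEADER = /^\[\[?\s*([^\[\]]+?)\s*\]\]?\s*(?:#.*)?$/;

function tomlTopLevelSections(source: string): SectionEntity[] {
  const sections: SectionEntity[] = [];
  let current: SectionEntity | undefined;
  const lines = source.split(/\r?\n/);

  for (let i = 0; i < lines.length; i++) {
    const lineNo = i + 1;
    const header = TABLE_HEADER.exec(lines[i]);
    if (header) {
      // [server.http] belongs to the top-level section "server".
      const top = header[1].split(".")[0].trim();
      if (current && current.name === top) {
        current.endLine = lineNo;
      } else {
        if (current) sections.push(current);
        current = { type: "section", name: top, startLine: lineNo, endLine: lineNo };
      }
      continue;
    }
    const trimmed = lines[i].trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // blank or comment
    // Key/value pairs before the first header form an implicit root section.
    if (!current) {
      current = { type: "section", name: "(root)", startLine: lineNo, endLine: lineNo };
    }
    current.endLine = lineNo;
  }
  if (current) sections.push(current);
  return sections;
}
```

Each returned line range can then be handed to the same size-limited chunking used for code.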
Notes