feat: add reactome_searcher by ChenZiHong-Gavin · Pull Request #176 · InternScience/GraphGen

ChenZiHong-Gavin · 2026-02-06T07:14:14Z

No description provided.

gemini-code-assist · 2026-02-06T07:14:34Z

Summary of Changes

Hello @ChenZiHong-Gavin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's bioinformatics capabilities by introducing a dedicated Reactome searcher. This new component allows users to seamlessly query the Reactome database for detailed information on biological pathways, reactions, and entities. It also supports advanced features like gene list enrichment analysis and provides a standardized output format, making it easier to integrate Reactome data into broader analytical workflows. The integration ensures that Reactome can be used as a data source within existing pipeline configurations.

Highlights

New Reactome Searcher: Introduced a new ReactomeSearcher class to interact with the Reactome database, enabling comprehensive querying of biological pathways and reactions.
Unified Search Interface: The ReactomeSearcher provides a unified search method that intelligently handles different query types, including Reactome stable IDs, keywords, and gene lists for enrichment analysis.
Gene Overrepresentation Analysis: Added functionality to perform overrepresentation analysis on gene lists, mapping genes to Reactome pathways and providing statistical enrichment results.
Integration and Examples: Integrated the new searcher into the existing search service and provided new example input and configuration files (search_reactome_demo.jsonl and search_reactome_config.yaml) to demonstrate its usage.
Robust API Interaction: Implemented robust API interaction with retry logic for GET and POST requests, handling network issues and timeouts gracefully.

Changelog

examples/input_examples/search_reactome_demo.jsonl
- Added a new example input file to demonstrate various Reactome query types, including Reactome IDs, keywords, gene symbols, and gene lists.
examples/search/search_reactome/search_reactome_config.yaml
- Added a new configuration file for setting up a Reactome search pipeline, defining read_step and search_step with reactome as the data source and configurable reactome_params.
graphgen/models/__init__.py
- Modified to import ReactomeSearcher and add it to the _module_lookup dictionary, making it accessible within the graphgen.models module.
graphgen/models/searcher/db/reactome_searcher.py
- Added a new Python module defining the ReactomeSearcher class, which encapsulates all logic for interacting with the Reactome Content and Analysis Services, including methods for ID lookup, keyword search, gene analysis, and result normalization.
graphgen/operators/search/search_service.py
- Modified the _init_searcher method to dynamically instantiate the ReactomeSearcher when the data_source is specified as reactome in the search configuration.

Activity

No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a ReactomeSearcher for querying the Reactome database, which is a valuable addition. The implementation is comprehensive, covering various features of the Reactome API. However, I've found a few issues in graphgen/models/searcher/db/reactome_searcher.py, including a logic bug that could lead to a crash and multi-line conditional expressions that are valid Python but hard to read. I've also pointed out some areas where the code can be made more robust and maintainable. Please review the detailed comments.

gemini-code-assist · 2026-02-06T07:16:21Z

graphgen/models/searcher/db/reactome_searcher.py

+            "description": data.get("summation", [{}])[0].get("text", "")
+            if isinstance(data.get("summation"), list)
+            else "",


This code has one critical issue and one readability concern:

Readability: The multi-line conditional expression is valid Python as written, because line breaks inside the braces of a dictionary literal are joined implicitly, so it does not raise a SyntaxError. Wrapping the expression in parentheses would still make its grouping easier to see.

IndexError: There is a logic bug. If data.get("summation") returns an empty list [], the condition isinstance([], list) is true, but data.get("summation", [{}]) also returns [], because the default applies only when the key is missing. Accessing [0] then raises an IndexError, which will crash the application.

I've suggested a fix that parenthesizes the expression and resolves the logic error using a more robust approach.

"description": (s[0].get("text", "") if (s := data.get("summation")) and isinstance(s, list) and s else ""),
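
The failure mode and the fix can be checked in isolation. The sketch below is a minimal reproduction, assuming data is a plain dict shaped like the snippet above. The helper names extract_description_unsafe and extract_description are hypothetical and are not part of the PR.

```python
from typing import Any


def extract_description_unsafe(data: dict[str, Any]) -> str:
    # Mirrors the original expression: the [{}] default is used only when
    # the "summation" key is missing, not when it maps to an empty list.
    return (
        data.get("summation", [{}])[0].get("text", "")
        if isinstance(data.get("summation"), list)
        else ""
    )


def extract_description(data: dict[str, Any]) -> str:
    # Look the key up once, then require a non-empty list whose first
    # element is a dict before indexing into it.
    summation = data.get("summation")
    if isinstance(summation, list) and summation and isinstance(summation[0], dict):
        return summation[0].get("text", "")
    return ""


if __name__ == "__main__":
    try:
        extract_description_unsafe({"summation": []})
    except IndexError:
        print("unsafe version raised IndexError on an empty list")
    print(repr(extract_description({"summation": []})))
```

The extra isinstance check on the first element goes slightly beyond the suggested one-liner; it is an assumption that malformed entries should yield an empty description rather than an error.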

gemini-code-assist · 2026-02-06T07:16:21Z

graphgen/models/searcher/db/reactome_searcher.py

+                    "reference_entities": [
+                        ref.get("dbId") for ref in data.get("referenceEntity", [])
+                    ]
+                    if isinstance(data.get("referenceEntity"), list)
+                    else [],


This multi-line conditional expression is valid Python inside a dictionary literal, so it is not a syntax error, but without parentheses its grouping is easy to misread. The logic also looks up data.get("referenceEntity") twice and is harder to follow than it needs to be, which impacts maintainability. I'm suggesting a more concise version that parenthesizes the expression and performs the lookup once.

"reference_entities": ([ref.get("dbId") for ref in s] if isinstance(s := data.get("referenceEntity"), list) else []),
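
If the one-liner feels dense, the same logic can be pulled into a small helper. This is only a sketch: the function name reference_entity_ids is hypothetical, and skipping non-dict items is an added assumption that the suggestion above does not make.

```python
from typing import Any


def reference_entity_ids(data: dict[str, Any]) -> list[Any]:
    # Single lookup; anything other than a list yields an empty result,
    # matching the behavior of the suggested expression.
    refs = data.get("referenceEntity")
    if not isinstance(refs, list):
        return []
    # Assumption: skip non-dict items so a malformed entry cannot raise
    # AttributeError when .get is called on it.
    return [ref.get("dbId") for ref in refs if isinstance(ref, dict)]
```

The dictionary entry would then read "reference_entities": reference_entity_ids(data), which keeps the literal free of conditional expressions.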

gemini-code-assist · 2026-02-06T07:16:21Z

graphgen/models/searcher/db/reactome_searcher.py

+    def __init__(
+        self,
+        species: str = "Homo sapiens",
+        timeout: int = 30,
+        max_retries: int = 3,


The max_retries parameter is defined in the __init__ method but it is not used by any of the tenacity.retry decorators in this class. Instead, the number of retries is hardcoded (e.g., stop_after_attempt(3) in _get and _post, and stop_after_attempt(5) in search). This is misleading and can lead to unexpected behavior. Please either use the max_retries parameter to configure the retry logic or remove it to avoid confusion.
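
One way to honor max_retries is to apply the retry budget at call time, where the instance attribute is available. The sketch below uses a plain standard-library loop rather than tenacity so that it stays self-contained; the class RetryingClient, the method call_with_retries, and the base_delay parameter are hypothetical names chosen for illustration, not code from the PR.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryingClient:
    """Hypothetical stand-in for the searcher; only the retry wiring is shown."""

    def __init__(self, max_retries: int = 3, base_delay: float = 0.0) -> None:
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self.max_retries = max_retries
        self.base_delay = base_delay

    def call_with_retries(self, func: Callable[[], T]) -> T:
        # Attempt the call up to self.max_retries times in total and
        # re-raise the last error once the budget is exhausted.
        for attempt in range(1, self.max_retries + 1):
            try:
                return func()
            except (ConnectionError, TimeoutError):
                if attempt == self.max_retries:
                    raise
                # Exponential backoff: base_delay, 2x, 4x, ...
                time.sleep(self.base_delay * 2 ** (attempt - 1))
        raise AssertionError("unreachable")
```

With tenacity, a comparable effect should be achievable by building a Retrying(stop=stop_after_attempt(self.max_retries)) object inside the method instead of using a class-level decorator, since decorator arguments are evaluated before any instance exists.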

gemini-code-assist · 2026-02-06T07:16:21Z

graphgen/models/searcher/db/reactome_searcher.py

+        elif "\n" in query or "," in query:
+            # Parse gene list
+            genes = [g.strip() for g in re.split(r"[\n,]", query) if g.strip()]
+            if len(genes) > 1 or (len(genes) == 1 and len(genes[0]) < 20):
+                # Likely a gene list
+                result = self.analyze_genes(
+                    genes, projection=projection, include_disease=include_disease
+                )


The heuristic used to distinguish a gene list from a keyword query is fragile. Splitting by comma or newline can misinterpret natural language queries that contain commas. For example, a query like "pathways related to TP53, a tumor suppressor" would be parsed as ['pathways related to TP53', 'a tumor suppressor']; because that yields more than one part, the len(genes) > 1 check passes and the query is sent to gene analysis instead of keyword search. Consider making this logic more robust, for instance by checking that every split part conforms to a typical gene symbol format before treating the query as a gene list.
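
A stricter check along these lines might look like the sketch below. The regular expression is an assumed approximation of a gene symbol (a letter followed by letters, digits, or hyphens, with no spaces, at most 15 characters); it is not an official nomenclature rule, and parse_gene_list is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed pattern: starts with a letter, then letters, digits, or hyphens,
# at most 15 characters in total, and no whitespace.
GENE_SYMBOL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,14}$")


def parse_gene_list(query: str) -> Optional[list[str]]:
    """Return the parsed symbols if every part looks like a gene symbol, else None."""
    parts = [p.strip() for p in re.split(r"[\n,]", query) if p.strip()]
    if parts and all(GENE_SYMBOL_RE.match(p) for p in parts):
        return parts
    return None


if __name__ == "__main__":
    print(parse_gene_list("TP53, BRCA1\nEGFR"))
    print(parse_gene_list("pathways related to TP53, a tumor suppressor"))
```

This rejects any part that contains a space, which covers the sentence example above. It would still accept comma-separated single-word keywords such as "apoptosis, autophagy", so an additional signal (for example, validating symbols against a known gene set) may be needed.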

feat: add reactome_searcher

166c1e6

github-actions bot added the core and examples labels on Feb 6, 2026

gemini-code-assist bot reviewed Feb 6, 2026


ChenZiHong-Gavin wants to merge 1 commit into main from feat/reactome_searcher
