Conversation
Summary of ChangesHello @ChenZiHong-Gavin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the system's bioinformatics capabilities by introducing a dedicated Reactome searcher. This new component allows users to seamlessly query the Reactome database for detailed information on biological pathways, reactions, and entities. It also supports advanced features like gene list enrichment analysis and provides a standardized output format, making it easier to integrate Reactome data into broader analytical workflows. The integration ensures that Reactome can be used as a data source within existing pipeline configurations. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a ReactomeSearcher for querying the Reactome database, which is a valuable addition. The implementation is comprehensive, covering various features of the Reactome API. However, I've found a few critical issues in graphgen/models/searcher/db/reactome_searcher.py, including syntax errors that will prevent the code from running, and a logic bug that could lead to a crash. I've also pointed out some areas where the code can be made more robust and maintainable. Please review the detailed comments.
| "description": data.get("summation", [{}])[0].get("text", "") | ||
| if isinstance(data.get("summation"), list) | ||
| else "", |
There was a problem hiding this comment.
This code has two critical issues:
- SyntaxError: A conditional expression used as a dictionary value must be enclosed in parentheses if it spans multiple lines. As written, this will cause a syntax error.
- IndexError: Even after fixing the syntax, there's a logic bug. If
data.get("summation")returns an empty list[], the conditionisinstance([], list)is true, butdata.get("summation", [{}])also returns[], which will cause anIndexErrorwhen[0]is accessed. This will crash the application.
I've suggested a fix that resolves both the syntax and logic errors using a more robust approach.
"description": (s[0].get("text", "") if (s := data.get("summation")) and isinstance(s, list) and s else ""),| "reference_entities": [ | ||
| ref.get("dbId") for ref in data.get("referenceEntity", []) | ||
| ] | ||
| if isinstance(data.get("referenceEntity"), list) | ||
| else [], |
There was a problem hiding this comment.
This block of code has a critical syntax error because a multi-line conditional expression for a dictionary value must be enclosed in parentheses. Additionally, the logic is overly complex and difficult to read, which impacts maintainability. I'm suggesting a more concise and readable version that fixes the syntax error and simplifies the logic.
"reference_entities": ([ref.get("dbId") for ref in s] if isinstance(s := data.get("referenceEntity"), list) else []),| def __init__( | ||
| self, | ||
| species: str = "Homo sapiens", | ||
| timeout: int = 30, | ||
| max_retries: int = 3, |
There was a problem hiding this comment.
The max_retries parameter is defined in the __init__ method but it is not used by any of the tenacity.retry decorators in this class. Instead, the number of retries is hardcoded (e.g., stop_after_attempt(3) in _get and _post, and stop_after_attempt(5) in search). This is misleading and can lead to unexpected behavior. Please either use the max_retries parameter to configure the retry logic or remove it to avoid confusion.
| elif "\n" in query or "," in query: | ||
| # Parse gene list | ||
| genes = [g.strip() for g in re.split(r"[\n,]", query) if g.strip()] | ||
| if len(genes) > 1 or (len(genes) == 1 and len(genes[0]) < 20): | ||
| # Likely a gene list | ||
| result = self.analyze_genes( | ||
| genes, projection=projection, include_disease=include_disease | ||
| ) |
There was a problem hiding this comment.
The heuristic used to distinguish a gene list from a keyword query is fragile. Splitting by comma or newline can misinterpret queries that are natural language sentences containing commas. For example, a query like "pathways related to TP53, a tumor suppressor" would be incorrectly parsed as a gene list ['pathways related to TP53', ' a tumor suppressor'], leading to an incorrect analysis. Consider making this logic more robust, for instance by checking if all split parts conform to a typical gene symbol format before deciding to treat it as a gene list.
No description provided.