Knowledge-guided Contextual Gene Set Analysis with Large Language Models
cGSA is a novel AI-driven framework that enhances gene set analysis (GSA) by incorporating context-aware pathway prioritization. This approach improves interpretability, reduces the need for extensive manual analysis, and enhances reproducibility.
Dependencies:
- python 3.11.0
- openai 0.28.0
- torch 1.13.0
- numpy 1.26.3
- pandas 2.1.4
- requests 2.31.0
- tiktoken 0.12.0
- tokenizers 0.19.1
- python-louvain 0.11
- networkx 2.4
- tqdm 4.28.1
- texttable 1.5.0
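Before installing, it can help to compare what is already in the environment with these pins. The following is a minimal sketch, not part of the cGSA repository: it builds one `pip install {package}=={version}` command per package (matching the one-by-one installation step) and reports packages whose installed version differs from its pin. The `PINS` dictionary simply mirrors the dependency list, and local build tags such as `+cu117` on torch are ignored in the comparison.

```python
"""Compare installed package versions with the pinned cGSA dependencies (sketch)."""
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List

# Pinned versions copied from the dependency list (PyPI distribution names).
PINS: Dict[str, str] = {
    "openai": "0.28.0",
    "torch": "1.13.0",
    "numpy": "1.26.3",
    "pandas": "2.1.4",
    "requests": "2.31.0",
    "tiktoken": "0.12.0",
    "tokenizers": "0.19.1",
    "python-louvain": "0.11",
    "networkx": "2.4",
    "tqdm": "4.28.1",
    "texttable": "1.5.0",
}


def pip_commands(pins: Dict[str, str]) -> List[List[str]]:
    """One `pip install pkg==ver` command per package, run with the current interpreter."""
    return [[sys.executable, "-m", "pip", "install", f"{pkg}=={ver}"] for pkg, ver in pins.items()]


def find_mismatches(pins: Dict[str, str], lookup: Callable[[str], str] = version) -> Dict[str, str]:
    """Map each package whose installed version differs from its pin to the version found."""
    mismatches: Dict[str, str] = {}
    for pkg, wanted in pins.items():
        try:
            found = lookup(pkg).split("+")[0]  # drop local tags such as "+cu117"
        except PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            mismatches[pkg] = found
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 11):
        print(f"warning: running Python {sys.version.split()[0]}, the list pins 3.11")
    todo = find_mismatches(PINS)
    for pkg, found in todo.items():
        print(f"{pkg}: expected {PINS[pkg]}, found {found}")
    # Print the install commands; each could also be run with subprocess.run(cmd, check=True).
    for cmd in pip_commands({pkg: PINS[pkg] for pkg in todo}):
        print(" ".join(cmd))
```

Running it inside the activated environment prints only the packages that still need attention, so it can be rerun after each installation.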
- demo.xlsx: three demo sets of differentially expressed genes (DEGs) for testing cGSA
Tip: The 102 DEG sets can be found in the Supplementary directory.
- Apply for an API key from the Azure OpenAI Service to enable access to LLMs such as GPT-4.
Azure OpenAI documentation: https://learn.microsoft.com/en-us/azure/ai-services/
- Create a virtual environment on your GPU machine with conda:
conda create -n {envname} python=3.11
- Activate the environment:
conda activate {envname}
- Install the required packages one by one, using the versions listed above:
pip install {package}=={version}
- Create a directory for cGSA in your own workspace.
- Download this repository directly to that directory, or clone it with:
git clone git@github.com:ncbi-nlp/cGSA.git
- Go to the cGSA directory:
cd {directory}
- Open evaluation.py, exploration.py, and confidance.py, and in each file replace openai.api_key with your own API key. Also set the other required parameters, openai.api_base and openai.api_version:
openai.api_key = "YOUR_OWN_OPENAI_KEY"
openai.api_base = "YOUR_OWN_OPENAI_BASE_SETTING"
openai.api_version = "YOUR_OWN_OPENAI_API_VERSION"
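Instead of pasting the key into each of the three files, the same values could be read from environment variables. The following is a minimal sketch for the legacy openai 0.28 module-level settings. The environment variable names are hypothetical, and setting `openai.api_type = "azure"` is an assumption based on how the 0.28 SDK addresses Azure endpoints, not something this README specifies.

```python
import os
from typing import Any, Dict, Mapping

# Hypothetical environment variable names; any names work as long as they are exported.
ENV_VARS: Dict[str, str] = {
    "api_key": "AZURE_OPENAI_KEY",
    "api_base": "AZURE_OPENAI_ENDPOINT",
    "api_version": "AZURE_OPENAI_API_VERSION",
}


def load_azure_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the three required settings, failing early if any is unset or empty."""
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {attr: env[name] for attr, name in ENV_VARS.items()}


def configure_openai(client: Any, settings: Mapping[str, str]) -> None:
    """Assign the settings as module-level attributes, as openai 0.28 expects."""
    client.api_type = "azure"  # assumption: needed for Azure endpoints with openai 0.28
    for attr, value in settings.items():
        setattr(client, attr, value)


if __name__ == "__main__":
    import openai  # expects openai==0.28.0

    configure_openai(openai, load_azure_settings())
    print("configured endpoint:", openai.api_base)
```

With the variables exported in the shell, the scripts would call `configure_openai(openai, load_azure_settings())` once at startup, which keeps the key out of version control.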
Run the following command in your virtual environment:
python PathDis.py
The results will be stored accordingly.
Tip: To evaluate your own gene sets, save them to the Data directory and update the directory path in PathDis.py.
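The input format that PathDis.py expects is defined by that script and by demo.xlsx, and this README does not spell it out. The sketch below only illustrates the preparation step under an assumed layout: one CSV file per gene set in a Data directory, with a single hypothetical `gene` column holding one gene symbol per row. Adjust the columns and file type to match demo.xlsx before pointing PathDis.py at the files.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def clean_genes(genes: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blanks, and remove duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for gene in genes:
        symbol = gene.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned


def save_gene_sets(gene_sets: Dict[str, Iterable[str]], data_dir: Path) -> List[Path]:
    """Write each gene set to <data_dir>/<name>.csv with a hypothetical 'gene' column."""
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, genes in gene_sets.items():
        path = data_dir / f"{name}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["gene"])
            writer.writerows([g] for g in clean_genes(genes))
        written.append(path)
    return written


if __name__ == "__main__":
    # Demo in a temporary folder; use the repository's Data directory for real runs.
    with tempfile.TemporaryDirectory() as tmp:
        paths = save_gene_sets({"my_set": ["TP53", " BRCA1", "TP53", ""]}, Path(tmp) / "Data")
        print(paths[0].read_text())
```

Cleaning the symbols before saving avoids duplicate or empty entries reaching the LLM prompts; whether PathDis.py performs its own cleaning is not stated here.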
This research was supported in part by the Intramural Research Program of the National Institutes of Health (NIH). The contributions of the NIH author(s) are considered Works of the United States Government. The findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services.
This tool shows the results of research conducted in the Computational Biology Branch, NLM. The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical or genomics professional. Individuals should not change their health behavior solely on the basis of information produced on this website. NIH does not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional. More information about NLM's disclaimer policy is available.