ncbi-nlp/cGSA

Project Overview

Title

Knowledge-guided Contextual Gene Set Analysis with Large Language Models

Abstract

cGSA is a novel AI-driven framework that enhances gene set analysis (GSA) by incorporating context-aware pathway prioritization. This approach improves interpretability, reduces the need for extensive manual analysis, and enhances reproducibility.

Requirements

python 3.11.0
openai 0.28.0
torch 1.13.0
numpy 1.26.3
pandas 2.1.4
requests 2.31.0
tiktoken 0.12.0
tokenizers 0.19.1
python-louvain 0.11
networkx 2.4
tqdm 4.28.1
texttable 1.5.0

Datasets

  • demo.xlsx: three demo DEGs (differentially expressed genes) for testing cGSA

Tip

The 102 DEGs can be found in the Supplementary directory.

Configuration

Installation

  1. Apply for an API key from the Azure OpenAI service to gain access to LLMs such as GPT-4.

    Azure AI services documentation: https://learn.microsoft.com/en-us/azure/ai-services/

  2. Create a virtual environment in a terminal on your GPU machine with the following conda command:

    conda create -n {envname} python=3.11
    
  3. Activate the environment by using the command:

    conda activate {envname}
    
  4. Install the required packages one by one with the command:

    pip install {package}=={version}
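
Installing packages one by one is easy to get wrong, so the following is a minimal sketch of how the per-package commands could be generated from a pinned list. The PINNED dictionary holds only a subset of the versions listed earlier in this README, and the helper name pip_commands is hypothetical, not part of cGSA. The script only prints the commands; it does not install anything.

```python
import sys

# A subset of the pinned versions listed in this README; extend as needed.
PINNED = {
    "openai": "0.28.0",
    "torch": "1.13.0",
    "numpy": "1.26.3",
    "pandas": "2.1.4",
    "requests": "2.31.0",
}


def pip_commands(pins):
    """Return one `python -m pip install name==version` argument list per package."""
    return [
        [sys.executable, "-m", "pip", "install", f"{name}=={version}"]
        for name, version in pins.items()
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in pip_commands(PINNED):
        print(" ".join(cmd))
```

Using sys.executable ties each command to the interpreter of the active environment, which helps avoid installing into a different Python by accident.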
    

Download

  1. Create a directory for cGSA in your own workspace.
  2. Download this repository directly to your directory, or clone it with:
    git clone git@github.com:ncbi-nlp/cGSA.git
    

Replace the OpenAI key

  1. Go to the directory you created for cGSA:
    cd {directory}
    
  2. Open evaluation.py, exploration.py, and confidance.py in turn, and in each file replace openai.api_key with your own API key. Also set the other required parameters, openai.api_base and openai.api_version:
    openai.api_key=YOUR_OWN_OPENAI_KEY
    openai.api_base=YOUR_OWN_OPENAI_BASE_SETTING
    openai.api_version=YOUR_OWN_OPENAI_API_VERSION
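
As an alternative to pasting the key into three files, the settings could be read from environment variables. The sketch below is an assumption about how one might do this, not how the cGSA scripts are written: the environment variable names and the helper apply_openai_settings are hypothetical. It uses a stand-in object so that it runs without the openai package; in the scripts you would pass the imported openai module instead.

```python
import os
from types import SimpleNamespace

# Hypothetical environment variable names; choose whatever suits your setup.
ENV_TO_ATTR = {
    "AZURE_OPENAI_KEY": "api_key",
    "AZURE_OPENAI_BASE": "api_base",
    "AZURE_OPENAI_API_VERSION": "api_version",
}


def apply_openai_settings(client, env=os.environ):
    """Copy the three settings from `env` onto `client`; fail if any is missing."""
    missing = [name for name in ENV_TO_ATTR if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    for name, attr in ENV_TO_ATTR.items():
        setattr(client, attr, env[name])
    return client


if __name__ == "__main__":
    # Placeholder values, mirroring the placeholders used in this README.
    demo_env = {
        "AZURE_OPENAI_KEY": "YOUR_OWN_OPENAI_KEY",
        "AZURE_OPENAI_BASE": "YOUR_OWN_OPENAI_BASE_SETTING",
        "AZURE_OPENAI_API_VERSION": "YOUR_OWN_OPENAI_API_VERSION",
    }
    # SimpleNamespace stands in for the `openai` module in this demonstration.
    client = apply_openai_settings(SimpleNamespace(), demo_env)
    print(client.api_base)
```

Keeping the key out of the source files also lowers the risk of committing it to version control.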
    

Execute

Running

Run the following command in your virtual environment:

    python PathDis.py

The results will be stored accordingly.

Tip

If you want to evaluate your own gene sets, save them to the Data directory and change the directory path in PathDis.py.
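
The tip above can be sketched as a small helper that copies a gene-set file into the Data directory and reports the path to set in PathDis.py. The function name stage_gene_sets and the file name my_gene_sets.xlsx are hypothetical, and the sketch makes no assumption about the spreadsheet layout that PathDis.py expects; compare your file against demo.xlsx for that.

```python
import shutil
import tempfile
from pathlib import Path


def stage_gene_sets(source, data_dir="Data"):
    """Copy a gene-set file into `data_dir` and return the path of the copy."""
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(f"gene-set file not found: {source}")
    target_dir = Path(data_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / source.name))


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing in the repository changes.
    with tempfile.TemporaryDirectory() as tmp:
        example = Path(tmp) / "my_gene_sets.xlsx"  # hypothetical file name
        example.write_bytes(b"")  # empty placeholder, not a real spreadsheet
        staged = stage_gene_sets(example, Path(tmp) / "Data")
        print(f"Point PathDis.py at: {staged}")
```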

Acknowledgements

This research was supported [in part] by the Intramural Research Program of the National Institutes of Health (NIH). The contributions of the NIH author(s) are considered Works of the United States Government. The findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services.

Disclaimer

This tool shows the results of research conducted in the Computational Biology Branch, NLM. The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical or genomics professional. Individuals should not change their health behavior solely on the basis of information produced on this website. NIH does not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional. More information about NLM's disclaimer policy is available.
