GraphRAG can consume significant LLM resources. Start with the tutorial dataset until you understand the system, and experiment with fast/inexpensive models before committing to large indexing jobs.
Get GraphRAG running in six steps: install, initialize, configure, add data, index, and query.

Prerequisites

  • Python 3.11-3.13
  • OpenAI API key or Azure OpenAI credentials
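
Before creating the virtual environment, it can help to confirm that the interpreter falls inside the supported range. The following is a minimal sketch based only on the 3.11-3.13 range stated above; the helper name `is_supported` is hypothetical and not part of GraphRAG.

```python
import sys

# Supported range taken from the prerequisites above: Python 3.11 through 3.13.
MIN_VERSION = (3, 11)
MAX_VERSION = (3, 13)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    return MIN_VERSION <= version <= MAX_VERSION


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    status = "supported" if is_supported(current) else "outside the 3.11-3.13 range"
    print(f"Python {current[0]}.{current[1]} is {status}")
```

Run it with the same interpreter you plan to use for `python -m venv .venv`.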

Step 1: Install GraphRAG

1. Create a project directory

mkdir graphrag_quickstart
cd graphrag_quickstart
python -m venv .venv

2. Activate your virtual environment

source .venv/bin/activate

3. Install the package

pip install graphrag

Step 2: Initialize your workspace

Initialize GraphRAG to create the necessary configuration files:
graphrag init
When prompted, specify your preferred models:
  • Chat model: gpt-4.1 (default) or gpt-4-turbo
  • Embedding model: text-embedding-3-large (default)
This creates:
  • settings.yaml - Main configuration file
  • .env - Environment variables (API keys)
  • input/ - Directory for source documents
  • prompts/ - Customizable extraction prompts

Step 3: Configure your API key

Edit .env and add your OpenAI API key:
GRAPHRAG_API_KEY=sk-...
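
A quick check that the key was actually saved can prevent a failed indexing run later. The sketch below parses simple KEY=VALUE lines with the standard library only. The helper names are hypothetical, and treating values that contain `<` or end in `...` as unfilled placeholders is an assumption about what a template might contain, not documented GraphRAG behavior.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_is_set(env: dict[str, str], name: str = "GRAPHRAG_API_KEY") -> bool:
    """True when the key exists and does not look like an unfilled placeholder."""
    value = env.get(name, "")
    # Assumption: "<...>" or a trailing "..." marks a placeholder value.
    return bool(value) and "<" not in value and not value.endswith("...")


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists() and api_key_is_set(read_env_file(env_path)):
        print("GRAPHRAG_API_KEY looks set")
    else:
        print("GRAPHRAG_API_KEY is missing or still a placeholder")
```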

Step 4: Add your data

Download the sample dataset (A Christmas Carol):
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./input/book.txt
Or add your own documents to the input/ directory. Supported formats:
  • .txt files
  • .csv files (with text column)
  • .json files (with text field)
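
To illustrate what the three formats above might look like on disk, here is a small sketch that writes one tiny example of each into an input directory. The file names and the helper `write_sample_inputs` are hypothetical. The `text` column and field name follow the list above; whether additional fields are needed, and which file type the pipeline reads, depends on the input settings in settings.yaml.

```python
import csv
import json
from pathlib import Path


def write_sample_inputs(input_dir: Path) -> list[Path]:
    """Write one .txt, one .csv, and one .json sample, each carrying the same text."""
    input_dir.mkdir(parents=True, exist_ok=True)
    text = "Marley was dead: to begin with."

    # Plain text: the whole file is the document body.
    txt_path = input_dir / "sample.txt"
    txt_path.write_text(text, encoding="utf-8")

    # CSV: one row per document, with the body in a "text" column.
    csv_path = input_dir / "sample.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["text"])
        writer.writeheader()
        writer.writerow({"text": text})

    # JSON: an object with the body in a "text" field.
    json_path = input_dir / "sample.json"
    json_path.write_text(json.dumps({"text": text}), encoding="utf-8")

    return [txt_path, csv_path, json_path]


if __name__ == "__main__":
    for path in write_sample_inputs(Path("input")):
        print(f"wrote {path}")
```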

Step 5: Run the indexing pipeline

Build your knowledge graph index:
graphrag index
This process:
  1. Chunks your documents into text units
  2. Extracts entities, relationships, and claims
  3. Builds a knowledge graph
  4. Detects hierarchical communities
  5. Generates community summaries
  6. Creates vector embeddings
Indexing typically takes 5-15 minutes for the sample dataset with GPT-4. Cost: approximately $0.50-$2.00, depending on the model and dataset size.
The output is saved to output/ as parquet files.
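
Once indexing finishes, you can confirm that it produced output by listing the parquet files. This sketch assumes the files sit directly under output/ and makes no assumption about table names, which can differ between releases. The helper `list_parquet_outputs` is hypothetical.

```python
from pathlib import Path


def list_parquet_outputs(output_dir: Path) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each .parquet file, sorted by name."""
    return sorted((p.name, p.stat().st_size) for p in output_dir.glob("*.parquet"))


if __name__ == "__main__":
    output_dir = Path("output")
    files = list_parquet_outputs(output_dir) if output_dir.is_dir() else []
    if not files:
        print("no parquet files found; check the indexing logs")
    for name, size in files:
        print(f"{name}: {size} bytes")
```

To inspect the contents of a table, a parquet reader such as pandas.read_parquet can load any of the listed files.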

Step 6: Query your knowledge graph

Ask questions about your data with the graphrag query command, choosing the search method that fits the question.

Understanding the search methods

Global search

Best for: Holistic dataset understanding
Uses all community summaries in a map-reduce process. Ideal for “what are the main themes” type questions.

Local search

Best for: Entity-specific questions
Retrieves entities and their neighbors. Ideal for “who is X” or “what does Y do” questions.

DRIFT search

Best for: Multi-level reasoning
Combines local entity retrieval with community context. Ideal for complex questions requiring both depth and breadth.
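
As a rough illustration of how the three methods map onto CLI invocations, the sketch below builds a `graphrag query` command line for each one. The `--method` and `--query` flags are an assumption based on the form used by GraphRAG 1.x and 2.x releases, and the argument style may differ in your installed version, so confirm with `graphrag query --help` before relying on it. The helper `build_query_command` and the example questions are hypothetical.

```python
import shlex

# The three methods described above.
SEARCH_METHODS = ("global", "local", "drift")


def build_query_command(method: str, question: str) -> list[str]:
    """Build an argv list for a graphrag query call using the given method."""
    if method not in SEARCH_METHODS:
        raise ValueError(f"unknown search method: {method!r}")
    # Assumption: --method/--query flags; verify with `graphrag query --help`.
    return ["graphrag", "query", "--method", method, "--query", question]


if __name__ == "__main__":
    examples = [
        ("global", "What are the main themes?"),
        ("local", "Who is Scrooge?"),
        ("drift", "How does Scrooge change over the story?"),
    ]
    for method, question in examples:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(build_query_command(method, question)))
```

Returning an argv list rather than a single string means the same value can be passed directly to subprocess.run without shell quoting concerns.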

Next steps

Configuration

Customize indexing, chunking, and model settings

Prompt tuning

Generate domain-specific prompts for better extraction

Query engine

Learn about all four search methods in depth

Python API

Use GraphRAG programmatically in your applications

Troubleshooting

If you hit rate limits:
  1. Reduce parallelization in settings.yaml:
    parallelization:
      concurrent_requests: 5  # Reduce from default 25
    
  2. Add delays between requests:
    concurrent_requests: 5
    rate_limit_per_minute: 300
    
To reduce indexing costs:
  1. Use gpt-3.5-turbo for initial testing
  2. Enable caching to avoid re-processing:
    cache:
      type: file
    
  3. Use the “fast” indexing method:
    graphrag index --method fast
    
For large datasets:
  1. Increase chunk size to reduce the total number of chunks
  2. Reduce max_cluster_size in settings
  3. Process documents in batches
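
How chunk size and overlap affect the number of chunks, and therefore the number of LLM calls, can be checked with a little arithmetic. The sketch below uses a generic sliding-window model, not GraphRAG's actual chunker, and the token counts are hypothetical. The helper `count_chunks` is not part of GraphRAG.

```python
def count_chunks(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover total_tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    # Each chunk after the first advances by (chunk_size - overlap) tokens.
    stride = chunk_size - overlap
    remaining = total_tokens - chunk_size
    return 1 + (remaining + stride - 1) // stride  # integer ceiling division


if __name__ == "__main__":
    total = 40_000  # hypothetical document length in tokens
    print(count_chunks(total, chunk_size=1200, overlap=100))  # 37
    print(count_chunks(total, chunk_size=1200, overlap=300))  # 45
    print(count_chunks(total, chunk_size=2400, overlap=100))  # 18
```

Under this model, a larger chunk size lowers the chunk count, while a larger overlap raises it because each window advances by fewer new tokens.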
Improve extraction with prompt tuning:
graphrag prompt-tune
This auto-generates prompts optimized for your domain and data.