GraphRAG ships with a default value for every configuration option. This reference lists those defaults: first the module-level constants, then the defaults for each configuration class.

Model defaults

Completion models

DEFAULT_COMPLETION_MODEL_ID
str
default:"default_completion_model"
Default identifier for completion model configurations.
DEFAULT_COMPLETION_MODEL
str
default:"gpt-4.1"
Default completion model name.
DEFAULT_COMPLETION_MODEL_AUTH_TYPE
AuthMethod
default:"ApiKey"
Default authentication method for completion models.
DEFAULT_MODEL_PROVIDER
str
default:"openai"
Default model provider.

Embedding models

DEFAULT_EMBEDDING_MODEL_ID
str
default:"default_embedding_model"
Default identifier for embedding model configurations.
DEFAULT_EMBEDDING_MODEL
str
default:"text-embedding-3-large"
Default embedding model name.
DEFAULT_EMBEDDING_MODEL_AUTH_TYPE
AuthMethod
default:"ApiKey"
Default authentication method for embedding models.

Encoding

ENCODING_MODEL
str
default:"o200k_base"
Default encoding model for tokenization.

Directory defaults

DEFAULT_INPUT_BASE_DIR
str
default:"input"
Default base directory for input files.
DEFAULT_OUTPUT_BASE_DIR
str
default:"output"
Default base directory for output files.
DEFAULT_CACHE_BASE_DIR
str
default:"cache"
Default base directory for cache storage.
DEFAULT_UPDATE_OUTPUT_BASE_DIR
str
default:"update_output"
Default base directory for incremental update output.

Entity types

DEFAULT_ENTITY_TYPES
list[str]
Default entity types to extract during graph construction.

Configuration class defaults

The following sections document default values for each configuration class.

BasicSearchDefaults

prompt
None
default:"None"
Basic search prompt template.
k
int
default:"10"
Number of results to return.
max_context_tokens
int
default:"12000"
Maximum context tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.

ChunkingDefaults

type
str
default:"tokens"
Chunking strategy type (from ChunkerType enum).
size
int
default:"1200"
Chunk size in tokens.
overlap
int
default:"100"
Overlap between chunks in tokens.
encoding_model
str
default:"o200k_base"
Encoding model for tokenization.
prepend_metadata
None
default:"None"
Metadata to prepend to chunks.
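
As a rough illustration of how size and overlap interact, the sketch below slides a window of size tokens forward by size - overlap tokens at each step. The function name chunk_tokens and the use of plain integers as stand-in tokens are assumptions made for this example; it is not GraphRAG's chunker, which tokenizes text with the configured encoding model first.

```python
def chunk_tokens(tokens, size=1200, overlap=100):
    """Split a token list into windows of `size` that share `overlap` tokens.

    Hypothetical helper for illustration only; not part of GraphRAG.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Stand-in "tokens": 3000 consecutive integers.
tokens = list(range(3000))
chunks = chunk_tokens(tokens)
chunk_lengths = [len(chunk) for chunk in chunks]
print(chunk_lengths)
```

With the defaults, each chunk starts 1100 tokens after the previous one, so the last 100 tokens of one chunk are repeated as the first 100 tokens of the next.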

ClusterGraphDefaults

max_cluster_size
int
default:"10"
Maximum size of clusters.
use_lcc
bool
default:"True"
Whether to use the largest connected component.
seed
int
default:"0xDEADBEEF"
Random seed for clustering (3735928559 in decimal).
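
The hexadecimal and decimal forms of the seed are the same integer. A quick check, using only the Python standard library:

```python
# The seed is written as a hexadecimal literal; Python parses it as a plain int.
seed = 0xDEADBEEF
decimal_seed = int("DEADBEEF", 16)  # same value parsed from a hex string

# The value fits in an unsigned 32-bit integer.
fits_in_uint32 = 0 <= seed < 2**32

print(seed, hex(seed), fits_in_uint32)
```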

CommunityReportDefaults

graph_prompt
None
default:"None"
Prompt for graph-based community reports.
text_prompt
None
default:"None"
Prompt for text-based community reports.
max_length
int
default:"2000"
Maximum report length in tokens.
max_input_length
int
default:"8000"
Maximum input length in tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"community_reporting"
Model instance name for caching.

DriftSearchDefaults

prompt
None
default:"None"
DRIFT search prompt.
reduce_prompt
None
default:"None"
Reduce step prompt.
data_max_tokens
int
default:"12000"
Maximum data tokens.
reduce_max_tokens
None
default:"None"
Maximum reduce tokens.
reduce_temperature
float
default:"0"
Temperature for reduce step.
reduce_max_completion_tokens
None
default:"None"
Maximum completion tokens for reduce.
concurrency
int
default:"32"
Concurrency level for DRIFT operations.
drift_k_followups
int
default:"20"
Number of followup queries.
primer_folds
int
default:"5"
Number of primer folds.
primer_llm_max_tokens
int
default:"12000"
Maximum tokens for primer LLM.
n_depth
int
default:"3"
Search depth.
local_search_text_unit_prop
float
default:"0.9"
Text unit proportion for local search component.
local_search_community_prop
float
default:"0.1"
Community proportion for local search component.
local_search_top_k_mapped_entities
int
default:"10"
Top k entities for local search.
local_search_top_k_relationships
int
default:"10"
Top k relationships for local search.
local_search_max_data_tokens
int
default:"12000"
Maximum data tokens for local search.
local_search_temperature
float
default:"0"
Temperature for local search.
local_search_top_p
float
default:"1"
Top p for local search.
local_search_n
int
default:"1"
Number of completions for local search.
local_search_llm_max_gen_tokens
int | None
default:"None"
Maximum generation tokens for local search.
local_search_llm_max_gen_completion_tokens
int | None
default:"None"
Maximum completion tokens for local search.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.

EmbedTextDefaults

embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.
model_instance_name
str
default:"text_embedding"
Model instance name for caching.
batch_size
int
default:"16"
Batch size for embedding operations.
batch_max_tokens
int
default:"8191"
Maximum tokens per batch.
names
list[str]
List of embeddings to generate (uses default_embeddings).
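
One way to picture batch_size and batch_max_tokens together is as two caps on a batch: a count cap and a token cap. The sketch below assumes both apply at once and that a batch is closed as soon as adding the next text would break either one. make_batches is a hypothetical helper, and the input is a list of precomputed token counts; GraphRAG's own batching logic may differ.

```python
def make_batches(token_counts, batch_size=16, batch_max_tokens=8191):
    """Group text indices into batches limited by item count and total tokens.

    Hypothetical helper for illustration only; not part of GraphRAG.
    A single text larger than batch_max_tokens is placed in a batch by itself.
    """
    batches = []
    current = []        # indices in the batch being built
    current_tokens = 0  # total tokens in the batch being built
    for index, count in enumerate(token_counts):
        too_many_items = len(current) >= batch_size
        too_many_tokens = current_tokens + count > batch_max_tokens
        if current and (too_many_items or too_many_tokens):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(index)
        current_tokens += count
    if current:
        batches.append(current)
    return batches


# Twenty texts of 500 tokens each: the count cap (16) closes the first batch.
by_count = make_batches([500] * 20)
# Three texts where the token cap (8191) closes the first batch.
by_tokens = make_batches([5000, 4000, 100])
print(by_count, by_tokens)
```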

ExtractClaimsDefaults

enabled
bool
default:"False"
Whether claim extraction is enabled.
prompt
None
default:"None"
Claim extraction prompt.
description
str
Description of claims to extract.
max_gleanings
int
default:"1"
Maximum number of gleaning iterations.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"extract_claims"
Model instance name for caching.

ExtractGraphDefaults

prompt
None
default:"None"
Graph extraction prompt.
entity_types
list[str]
Entity types to extract.
max_gleanings
int
default:"1"
Maximum number of gleaning iterations.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"extract_graph"
Model instance name for caching.

TextAnalyzerDefaults

Used for NLP-based graph extraction.
extractor_type
NounPhraseExtractorType
default:"RegexEnglish"
Noun phrase extractor type.
model_name
str
default:"en_core_web_md"
SpaCy model name.
max_word_length
int
default:"15"
Maximum word length to consider.
word_delimiter
str
default:"' '"
Delimiter between words.
include_named_entities
bool
default:"True"
Whether to include named entities.
exclude_nouns
list[str]
List of nouns to exclude (uses EN_STOP_WORDS).
exclude_entity_tags
list[str]
default:"[\"DATE\"]"
Entity tags to exclude.
exclude_pos_tags
list[str]
default:"[\"DET\", \"PRON\", \"INTJ\", \"X\"]"
Part-of-speech tags to exclude.
noun_phrase_tags
list[str]
default:"[\"PROPN\", \"NOUNS\"]"
Tags for noun phrases.
noun_phrase_grammars
dict[str, str]
Grammar rules for noun phrase combination. Default:
{
    "PROPN,PROPN": "PROPN",
    "NOUN,NOUN": "NOUNS",
    "NOUNS,NOUN": "NOUNS",
    "ADJ,ADJ": "ADJ",
    "ADJ,NOUN": "NOUNS"
}
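
To make the grammar table easier to read, the sketch below treats each key as a pair of adjacent tags and each value as the tag that replaces the pair, and applies the rules until none match. This reading of the table, and the helper name reduce_tags, are assumptions for illustration; the page does not describe how GraphRAG applies these rules internally.

```python
# Default grammar table as listed above.
NOUN_PHRASE_GRAMMARS = {
    "PROPN,PROPN": "PROPN",
    "NOUN,NOUN": "NOUNS",
    "NOUNS,NOUN": "NOUNS",
    "ADJ,ADJ": "ADJ",
    "ADJ,NOUN": "NOUNS",
}


def reduce_tags(tags, grammars=NOUN_PHRASE_GRAMMARS):
    """Repeatedly merge the first adjacent tag pair that matches a rule.

    Hypothetical helper for illustration only; not part of GraphRAG.
    """
    tags = list(tags)
    merged = True
    while merged:
        merged = False
        for i in range(len(tags) - 1):
            key = f"{tags[i]},{tags[i + 1]}"
            if key in grammars:
                tags[i:i + 2] = [grammars[key]]  # replace the pair with one tag
                merged = True
                break  # restart the scan from the left
    return tags


# ADJ,NOUN merges to NOUNS, then NOUNS,NOUN merges to NOUNS.
adjective_phrase = reduce_tags(["ADJ", "NOUN", "NOUN"])
# PROPN,PROPN merges to PROPN; PROPN,VERB has no rule, so reduction stops.
proper_name = reduce_tags(["PROPN", "PROPN", "VERB"])
print(adjective_phrase, proper_name)
```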

ExtractGraphNLPDefaults

normalize_edge_weights
bool
default:"True"
Whether to normalize edge weights.
text_analyzer
TextAnalyzerDefaults
Text analyzer configuration.
concurrent_requests
int
default:"25"
Number of concurrent requests.
async_mode
AsyncType
default:"Threaded"
Async mode to use.

GlobalSearchDefaults

map_prompt
None
default:"None"
Map step prompt.
reduce_prompt
None
default:"None"
Reduce step prompt.
knowledge_prompt
None
default:"None"
Knowledge generation prompt.
max_context_tokens
int
default:"12000"
Maximum context tokens.
data_max_tokens
int
default:"12000"
Maximum data tokens.
map_max_length
int
default:"1000"
Maximum map response length in words.
reduce_max_length
int
default:"2000"
Maximum reduce response length in words.
dynamic_search_threshold
int
default:"1"
Community rating threshold for inclusion.
dynamic_search_keep_parent
bool
default:"False"
Keep parent community if children are relevant.
dynamic_search_num_repeats
int
default:"1"
Number of times to rate each community.
dynamic_search_use_summary
bool
default:"False"
Use community summary instead of full context.
dynamic_search_max_level
int
default:"2"
Maximum community hierarchy level.
completion_model_id
str
default:"default_completion_model"
Completion model ID.

StorageDefaults

type
str
default:"file"
Storage type (from StorageType enum).
encoding
str | None
default:"None"
Text encoding.
base_dir
str | None
default:"None"
Base directory for file storage.
azure_connection_string
None
default:"None"
Azure connection string.
azure_container_name
None
default:"None"
Azure container name.
azure_account_url
None
default:"None"
Azure account URL.
azure_cosmosdb_account_url
None
default:"None"
Azure CosmosDB account URL.

InputDefaults

type
InputType
default:"Text"
Input type.
encoding
str | None
default:"None"
Text encoding.
file_pattern
None
default:"None"
File pattern for matching input files.
id_column
None
default:"None"
Column name for document IDs.
title_column
None
default:"None"
Column name for document titles.
text_column
None
default:"None"
Column name for document text.

InputStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"input"
Base directory for input storage.
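
"Extends StorageDefaults" can be pictured as a subclass that inherits every field and overrides only base_dir. The sketch below uses dataclasses with hypothetical class names (note the Sketch suffix) and shows only a subset of the fields; the real classes may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageDefaultsSketch:
    """Stand-in for StorageDefaults; only three of its fields are shown."""
    type: str = "file"
    encoding: Optional[str] = None
    base_dir: Optional[str] = None


@dataclass
class InputStorageDefaultsSketch(StorageDefaultsSketch):
    """Stand-in for InputStorageDefaults: same fields, different base_dir."""
    base_dir: Optional[str] = "input"


base = StorageDefaultsSketch()
input_storage = InputStorageDefaultsSketch()
print(base.base_dir, input_storage.base_dir, input_storage.type)
```

The same pattern would apply to the cache, output, and update output storage defaults, each overriding base_dir with its own directory name.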

CacheStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"cache"
Base directory for cache storage.

CacheDefaults

type
CacheType
default:"Json"
Cache type.
storage
CacheStorageDefaults
Cache storage configuration.

LocalSearchDefaults

prompt
None
default:"None"
Local search prompt.
text_unit_prop
float
default:"0.5"
Text unit proportion.
community_prop
float
default:"0.15"
Community proportion.
conversation_history_max_turns
int
default:"5"
Maximum conversation history turns.
top_k_entities
int
default:"10"
Top k entities to retrieve.
top_k_relationships
int
default:"10"
Top k relationships to retrieve.
max_context_tokens
int
default:"12000"
Maximum context tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.
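
The two proportions can be read as shares of max_context_tokens. The sketch below assumes, without confirmation from this page, that whatever share remains after text units and community reports is available for other context such as entity and relationship tables. split_context_budget is a hypothetical helper, not a GraphRAG function.

```python
def split_context_budget(max_context_tokens=12000,
                         text_unit_prop=0.5,
                         community_prop=0.15):
    """Split a token budget by the configured proportions.

    Hypothetical helper for illustration only; not part of GraphRAG.
    """
    if text_unit_prop < 0 or community_prop < 0:
        raise ValueError("proportions must be non-negative")
    if text_unit_prop + community_prop > 1 + 1e-9:
        raise ValueError("text_unit_prop + community_prop must not exceed 1")
    text_unit_tokens = round(max_context_tokens * text_unit_prop)
    community_tokens = round(max_context_tokens * community_prop)
    # Assumption: the remainder is left for everything else in the context.
    remaining_tokens = max_context_tokens - text_unit_tokens - community_tokens
    return {
        "text_units": text_unit_tokens,
        "communities": community_tokens,
        "remaining": remaining_tokens,
    }


# LocalSearchDefaults: 0.5 and 0.15 of 12000 tokens.
local_budget = split_context_budget()
# The DRIFT local search component uses 0.9 and 0.1 by default.
drift_budget = split_context_budget(12000, 0.9, 0.1)
print(local_budget, drift_budget)
```

Under this reading, the local search defaults leave 4200 tokens for other context, while the DRIFT local search defaults (0.9 and 0.1) leave none.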

OutputStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"output"
Base directory for output storage.

PruneGraphDefaults

min_node_freq
int
default:"2"
Minimum node frequency.
max_node_freq_std
None
default:"None"
Maximum node frequency standard deviation.
min_node_degree
int
default:"1"
Minimum node degree.
max_node_degree_std
None
default:"None"
Maximum node degree standard deviation.
min_edge_weight_pct
float
default:"40.0"
Minimum edge weight percentage.
remove_ego_nodes
bool
default:"True"
Whether to remove ego nodes.
lcc_only
bool
default:"False"
Keep only largest connected component.

ReportingDefaults

type
ReportingType
default:"file"
Reporting type.
base_dir
str
default:"logs"
Base directory for reporting.
connection_string
None
default:"None"
Connection string for blob reporting.
container_name
None
default:"None"
Container name for blob reporting.
storage_account_blob_url
None
default:"None"
Storage account blob URL.

SnapshotsDefaults

embeddings
bool
default:"False"
Whether to save embedding snapshots.
graphml
bool
default:"False"
Whether to save GraphML snapshots.
raw_graph
bool
default:"False"
Whether to save raw graph snapshots.

SummarizeDescriptionsDefaults

prompt
None
default:"None"
Summarization prompt.
max_length
int
default:"500"
Maximum summary length in tokens.
max_input_tokens
int
default:"4000"
Maximum input tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"summarize_descriptions"
Model instance name for caching.

UpdateOutputStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"update_output"
Base directory for update output storage.

VectorStoreDefaults

type
str
default:"lancedb"
Vector store type (from VectorStoreType enum).
db_uri
str
default:"output/lancedb"
Database URI for vector store.

GraphRagConfigDefaults

Root configuration defaults.
models
dict
default:"{}"
Legacy model configurations.
completion_models
dict
default:"{}"
Completion model configurations.
embedding_models
dict
default:"{}"
Embedding model configurations.
concurrent_requests
int
default:"25"
Default concurrent requests.
async_mode
AsyncType
default:"Threaded"
Default async mode.
reporting
ReportingDefaults
Reporting configuration defaults.
input_storage
InputStorageDefaults
Input storage configuration defaults.
output_storage
OutputStorageDefaults
Output storage configuration defaults.
update_output_storage
UpdateOutputStorageDefaults
Update output storage configuration defaults.
cache
CacheDefaults
Cache configuration defaults.
input
InputDefaults
Input configuration defaults.
embed_text
EmbedTextDefaults
Text embedding configuration defaults.
chunking
ChunkingDefaults
Chunking configuration defaults.
snapshots
SnapshotsDefaults
Snapshots configuration defaults.
extract_graph
ExtractGraphDefaults
Graph extraction configuration defaults.
extract_graph_nlp
ExtractGraphNLPDefaults
NLP graph extraction configuration defaults.
summarize_descriptions
SummarizeDescriptionsDefaults
Description summarization configuration defaults.
community_reports
CommunityReportDefaults
Community reports configuration defaults.
extract_claims
ExtractClaimsDefaults
Claims extraction configuration defaults.
prune_graph
PruneGraphDefaults
Graph pruning configuration defaults.
cluster_graph
ClusterGraphDefaults
Graph clustering configuration defaults.
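local_search
LocalSearchDefaults
Local search configuration defaults.
global_search
GlobalSearchDefaults
Global search configuration defaults.
drift_search
DriftSearchDefaults
DRIFT search configuration defaults.
basic_search
BasicSearchDefaults
Basic search configuration defaults.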
vector_store
VectorStoreDefaults
Vector store configuration defaults.
workflows
None
default:"None"
Workflows list.