The default pipeline produces a series of output tables that align with the GraphRAG knowledge model. By default, these tables are written as Parquet files to disk.
Embeddings are not stored as columns in these tables. They are written directly to your configured vector store for efficient downstream retrieval.
Shared fields
All tables have two identifier fields, one for global uniqueness and one for human readability:

| Field | Type | Description |
|---|---|---|
| id | str | Generated UUID, ensuring global uniqueness across all records |
| human_readable_id | int | Incremented short ID created per run. Used in generated summaries with citations for easy visual cross-reference |
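To make the two identifiers concrete, here is a minimal sketch of how rows could be given both a global UUID and a per-run short ID, plus a lookup that resolves a short ID (as it might appear in a citation) back to its row. The helpers `assign_ids` and `by_short_id` are hypothetical, and starting the short IDs at 0 is an assumption; this is not GraphRAG's implementation.

```python
import uuid


def assign_ids(rows: list[dict]) -> list[dict]:
    """Hypothetical helper: give each row a global UUID and a per-run short ID.

    Short IDs start at 0 here; the real starting value is an assumption.
    """
    return [
        {**row, "id": str(uuid.uuid4()), "human_readable_id": short_id}
        for short_id, row in enumerate(rows)
    ]


def by_short_id(rows: list[dict]) -> dict[int, dict]:
    """Index rows by human_readable_id so a cited short ID can be looked up."""
    return {row["human_readable_id"]: row for row in rows}


rows = assign_ids([{"title": "A"}, {"title": "B"}, {"title": "C"}])
print(by_short_id(rows)[1]["title"])  # B
```

Because the short IDs are regenerated on every run, they are only meaningful within one set of outputs; the UUID in `id` is the value to use for joins across tables.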
Communities
This table contains the final communities generated by the Leiden algorithm. Communities are strictly hierarchical: each community subdivides into child communities as cluster affinity narrows.

| Field | Type | Description |
|---|---|---|
| community | int | Leiden-generated cluster ID for the community. IDs increment with depth and are unique across all levels of the hierarchy. For this table, human_readable_id is a copy of the community ID |
| parent | int | Parent community ID |
| children | int[] | List of child community IDs |
| level | int | Depth of the community in the hierarchy |
| title | str | Friendly name of the community |
| entity_ids | str[] | List of entities that are members of the community |
| relationship_ids | str[] | List of relationships wholly within the community (source and target both in the community) |
| text_unit_ids | str[] | List of text units represented within the community |
| period | str | Date of ingest in ISO8601 format, used for incremental update merges |
| size | int | Size of the community (entity count), used for incremental update merges |
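The parent/children links and the "wholly within the community" rule for relationship_ids can be sketched with plain Python over rows loaded from these tables. This is an illustration under assumptions: `ancestors`, `descendants`, and `internal_relationships` are hypothetical helpers, communities are keyed by their community ID, and any parent ID that is not itself in the table is treated as "no parent" rather than assuming a specific root sentinel. Since relationships name their endpoints by entity title, member entity IDs are first mapped to titles through the entities table.

```python
def ancestors(communities: dict[int, dict], community: int) -> list[int]:
    """Walk parent links upward; stop at a parent ID absent from the table."""
    chain = []
    parent = communities[community]["parent"]
    while parent in communities:
        chain.append(parent)
        parent = communities[parent]["parent"]
    return chain


def descendants(communities: dict[int, dict], community: int) -> list[int]:
    """Collect all child, grandchild, ... community IDs (depth-first)."""
    found, stack = [], list(communities[community]["children"])
    while stack:
        child = stack.pop()
        found.append(child)
        stack.extend(communities[child]["children"])
    return found


def internal_relationships(
    member_ids: list[str], entities: list[dict], relationships: list[dict]
) -> list[str]:
    """IDs of relationships whose source and target are both community members."""
    titles = {e["id"]: e["title"] for e in entities}
    members = {titles[eid] for eid in member_ids}
    return [
        r["id"]
        for r in relationships
        if r["source"] in members and r["target"] in members
    ]
```

For example, with community 4 at level 2 under community 2, which is itself under community 0, `ancestors(communities, 4)` returns `[2, 0]`.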
Example communities.parquet
Community reports
This table contains the summarized reports for each community, generated by the LLM.

| Field | Type | Description |
|---|---|---|
| community | int | Short ID of the community this report applies to |
| parent | int | Parent community ID |
| children | int[] | List of child community IDs |
| level | int | Level of the community this report applies to |
| title | str | LLM-generated title for the report |
| summary | str | LLM-generated summary of the report |
| full_content | str | LLM-generated full report |
| rank | float | LLM-derived relevance ranking based on member entity salience |
| rating_explanation | str | LLM-derived explanation of the rank |
| findings | dict | LLM-derived list of the top 5-10 insights from the community. Each finding contains summary and explanation values |
| full_content_json | json | Full JSON output as returned by the LLM. Most fields are also extracted into columns, but the full JSON is sent to query summarization so that prompt tuning can add fields or content |
| period | str | Date of ingest in ISO8601 format, used for incremental update merges |
| size | int | Size of the community (entity count), used for incremental update merges |
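As an illustration of how the level, rank, and findings columns fit together, the sketch below picks the highest-ranked reports at one hierarchy level and renders one as Markdown text. The helpers `top_reports` and `render_report` are hypothetical. The sketch also assumes findings arrive as a sequence of mappings with `summary` and `explanation` keys, matching the description above; the exact in-memory type may differ depending on how the Parquet file is loaded.

```python
def top_reports(reports: list[dict], level: int, k: int) -> list[dict]:
    """Return the k highest-ranked reports at one hierarchy level."""
    at_level = [r for r in reports if r["level"] == level]
    return sorted(at_level, key=lambda r: r["rank"], reverse=True)[:k]


def render_report(report: dict) -> str:
    """Format title, summary, and findings as plain Markdown text."""
    lines = [f"# {report['title']}", "", report["summary"], ""]
    for finding in report["findings"]:
        lines.append(f"- {finding['summary']}: {finding['explanation']}")
    return "\n".join(lines)
```

Filtering by level first matters because lower levels hold broad, high-level communities while deeper levels hold narrower ones, so ranks are most comparable within a single level.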
Example community_reports.parquet
Covariates
This optional table is generated when claim extraction is enabled. Claims typically identify malicious behavior such as fraud, so they are not useful for all datasets.

| Field | Type | Description |
|---|---|---|
| covariate_type | str | Always “claim” with default covariates |
| type | str | Type (nature) of the claim |
| description | str | LLM-generated description of the behavior |
| subject_id | str | Name of the source entity (the one performing the claimed behavior) |
| object_id | str | Name of the target entity (the one the behavior is performed on) |
| status | str | LLM-derived assessment of correctness. One of: TRUE, FALSE, SUSPECTED |
| start_date | str | LLM-derived start of the claimed activity (ISO8601) |
| end_date | str | LLM-derived end of the claimed activity (ISO8601) |
| source_text | str | Short string of text containing the claimed behavior |
| text_unit_id | str | ID of the text unit the claim was extracted from |
Example covariates.parquet
Documents
This table contains the list of document content after import.

| Field | Type | Description |
|---|---|---|
| title | str | Filename, unless otherwise configured during CSV/JSON import |
| text | str | Full text of the document |
| text_unit_ids | str[] | List of text units (chunks) that were parsed from the document |
| metadata | dict | If specified during CSV/JSON import, this is a dict of metadata for the document |
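Documents and text units reference each other in both directions: documents.text_unit_ids lists the chunks, and each text unit carries a document_id. The sketch below uses that link to attach a source title to every chunk and to flag documents whose two directions disagree. `source_titles` and `inconsistent_documents` are hypothetical helpers working on rows as plain dicts.

```python
from collections import defaultdict


def source_titles(documents: list[dict], text_units: list[dict]) -> dict[str, str]:
    """Map each text unit ID to the title of the document it came from."""
    titles = {d["id"]: d["title"] for d in documents}
    return {tu["id"]: titles[tu["document_id"]] for tu in text_units}


def inconsistent_documents(documents: list[dict], text_units: list[dict]) -> list[str]:
    """Document IDs whose text_unit_ids disagree with text_units.document_id."""
    chunks = defaultdict(set)
    for tu in text_units:
        chunks[tu["document_id"]].add(tu["id"])
    return [d["id"] for d in documents if set(d["text_unit_ids"]) != chunks[d["id"]]]
```

The title lookup is handy when presenting a retrieved chunk, because the chunk itself only stores the document ID.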
Example documents.parquet
Entities
This table contains all entities found in the data by the LLM.

| Field | Type | Description |
|---|---|---|
| title | str | Name of the entity |
| type | str | Type of the entity. By default: “organization”, “person”, “geo”, or “event” (unless configured differently or auto-tuning is used) |
| description | str | Textual description of the entity. Since entities may be found in many text units, this is an LLM-derived summary of all descriptions |
| text_unit_ids | str[] | List of the text units containing the entity |
| frequency | int | Count of text units the entity was found within |
| degree | int | Node degree (connectedness) in the graph |
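Since frequency is defined as the count of text units an entity was found within, it can be cross-checked against the text units table by inverting text_units.entity_ids. The sketch below does exactly that. `entity_text_units` and `entity_frequency` are hypothetical helpers, and treating a missing entity_ids value as an empty list is an assumption about how chunks without entities are stored.

```python
from collections import defaultdict


def entity_text_units(text_units: list[dict]) -> dict[str, list[str]]:
    """Invert text_units.entity_ids into entity ID -> sorted text unit IDs."""
    index = defaultdict(set)
    for tu in text_units:
        # A chunk with no entities may carry None instead of an empty list.
        for entity_id in tu["entity_ids"] or []:
            index[entity_id].add(tu["id"])
    return {eid: sorted(ids) for eid, ids in index.items()}


def entity_frequency(text_units: list[dict]) -> dict[str, int]:
    """frequency = number of distinct text units an entity was found within."""
    return {eid: len(ids) for eid, ids in entity_text_units(text_units).items()}
```

Comparing these results with the stored text_unit_ids and frequency columns is a quick sanity check after a pipeline run.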
Example entities.parquet
Relationships
This table contains all entity-to-entity relationships found in the data by the LLM. This is also the edge list for the graph.

| Field | Type | Description |
|---|---|---|
| source | str | Name of the source entity |
| target | str | Name of the target entity |
| description | str | LLM-derived description of the relationship. Like entity descriptions, this is summarized from multiple instances |
| weight | float | Weight of the edge in the graph. Summed from an LLM-derived “strength” measure for each relationship instance |
| combined_degree | int | Sum of source and target node degrees |
| text_unit_ids | str[] | List of text units the relationship was found within |
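The weight and combined_degree columns follow from two simple aggregations, sketched below. `merge_instances` sums a per-instance `strength` into one weight per (source, target) pair, and `with_combined_degree` counts each edge row toward the degree of both endpoints. Both helpers and the instance dict shape are hypothetical. This sketch keeps (A, B) and (B, A) as separate keys; whether the real pipeline merges reversed pairs is not stated here.

```python
from collections import Counter, defaultdict


def merge_instances(instances: list[dict]) -> list[dict]:
    """Sum per-instance strength into one weighted edge per (source, target)."""
    weights = defaultdict(float)
    for inst in instances:
        weights[(inst["source"], inst["target"])] += inst["strength"]
    return [{"source": s, "target": t, "weight": w} for (s, t), w in weights.items()]


def with_combined_degree(relationships: list[dict]) -> list[dict]:
    """Add combined_degree = degree(source) + degree(target) to each edge."""
    degree = Counter()
    for rel in relationships:
        degree[rel["source"]] += 1
        degree[rel["target"]] += 1
    return [
        {**rel, "combined_degree": degree[rel["source"]] + degree[rel["target"]]}
        for rel in relationships
    ]
```

For a chain A-B, B-C, C-D, the inner nodes B and C each have degree 2, so the middle edge gets combined_degree 4 and the two outer edges get 3.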
Example relationships.parquet
Text units
This table contains all text chunks parsed from the input documents.

| Field | Type | Description |
|---|---|---|
| text | str | Raw full text of the chunk |
| n_tokens | int | Number of tokens in the chunk. Should normally match the chunk_size config parameter, except for the last chunk, which is often shorter |
| document_id | str | ID of the document the chunk came from |
| entity_ids | str[] | List of entities found in the text unit |
| relationship_ids | str[] | List of relationships found in the text unit |
| covariate_ids | str[] | Optional list of covariates found in the text unit |
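The n_tokens rule (every chunk is chunk_size tokens except a shorter final one) falls out of fixed-size chunking, as this minimal sketch shows. It is a simplification under stated assumptions: `make_text_units` is a hypothetical helper, "tokens" here are whitespace-separated words rather than the model tokenizer GraphRAG actually uses, and no chunk overlap is modeled.

```python
def make_text_units(document_id: str, text: str, chunk_size: int) -> list[dict]:
    """Split text into fixed-size chunks of whitespace tokens (no overlap)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()
    units = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        units.append(
            {"document_id": document_id, "text": " ".join(chunk), "n_tokens": len(chunk)}
        )
    return units


words = " ".join(f"w{i}" for i in range(10))
print([u["n_tokens"] for u in make_text_units("d1", words, 4)])  # [4, 4, 2]
```

With 10 tokens and a chunk size of 4, the first two chunks are full and the remainder of 2 tokens forms the short last chunk.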
Example text_units.parquet
Working with Parquet files
Storage locations
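By default, Parquet files are written to the output directory specified in your configuration file, settings.yaml. Supported storage options include the local filesystem, Azure Blob Storage, and custom storage. With local filesystem storage, the output directory contains files such as:

- output/entities.parquet
- output/relationships.parquet
- output/communities.parquet
- etc.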
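After a run with local filesystem storage, a quick check that every expected table was written can save debugging time later. The sketch below assumes the file names follow the `<table>.parquet` pattern listed above, with table names taken from this page. `missing_tables` and `REQUIRED_TABLES` are hypothetical, and covariates is deliberately left out because that table is only produced when claim extraction is enabled.

```python
from pathlib import Path

# Table names from this page; covariates is omitted because it is optional.
REQUIRED_TABLES = [
    "documents",
    "text_units",
    "entities",
    "relationships",
    "communities",
    "community_reports",
]


def missing_tables(output_dir: str) -> list[str]:
    """Return required table names with no matching <name>.parquet file."""
    base = Path(output_dir)
    return [name for name in REQUIRED_TABLES if not (base / f"{name}.parquet").is_file()]
```

To inspect a table, `pandas.read_parquet("output/entities.parquet")` loads it into a DataFrame (this requires pyarrow or fastparquet), and `.to_dict("records")` turns it into a list of row dicts.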
Next steps
- Custom graphs: learn how to bring your own existing graph data.
- Querying: use the output tables for GraphRAG queries.
- Configuration: configure storage providers and output settings.