Discussed in #1647
Originally posted by Teo2423 January 21, 2025
I am currently working on my undergraduate thesis in Data Science at the University of Buenos Aires. A significant portion of my thesis leverages Graph RAG, particularly focusing on the use of community reports.
To achieve this, I utilize the CLI tool to index the graph, after which I extract the reports and summaries from the create_final_community_reports.parquet file.
I have encountered an issue regarding the number of summaries per level, which seems counterintuitive, and I have been unable to locate any filters or logic in the source code that might explain this behavior.
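For reference, the per-level summary counts can be reproduced with something like the sketch below. It assumes the parquet file has a `level` column with one row per community report, and the `output/` path is only a placeholder for wherever the indexing run wrote its artifacts. The helper names are hypothetical, not part of the Graph RAG API.

```python
from collections import Counter


def count_reports_per_level(levels):
    """Count how many community reports exist at each hierarchy level."""
    counts = Counter(int(level) for level in levels)
    # Return a plain dict ordered by level for easy printing and comparison.
    return dict(sorted(counts.items()))


def load_report_levels(path="output/create_final_community_reports.parquet"):
    """Read only the `level` column of the reports file (assumed column name)."""
    # pandas needs a parquet engine such as pyarrow; it is imported here so
    # that the counting helper above works without any third-party packages.
    import pandas as pd

    reports = pd.read_parquet(path, columns=["level"])
    return reports["level"].tolist()


if __name__ == "__main__":
    for level, count in count_reports_per_level(load_report_levels()).items():
        print(f"Level {level}: {count} summaries")
```

Running it against the indexing output should print one line per level, which can then be compared with the node counts from the logs.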
The node distribution per level, derived from the CLI logs during the indexing process (also reflected in the indexing_logs), is as follows:
Node Distribution
Level 0: 3248 nodes
Level 1: 1738 nodes
Level 2: 1093 nodes
Level 3: 65 nodes
Summary Distribution
Level 0: 27 summaries
Level 1: 61 summaries
Level 2: 90 summaries
Level 3: 35 summaries
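To make the mismatch easier to see, the two distributions can be combined into an average number of nodes per summary. This is only a sketch under two assumptions: that each summary corresponds to one community, and that the node counts in the logs are totals for the whole level. The function name is hypothetical.

```python
# Figures copied from the two distributions above.
nodes_per_level = {0: 3248, 1: 1738, 2: 1093, 3: 65}
summaries_per_level = {0: 27, 1: 61, 2: 90, 3: 35}


def nodes_per_summary(nodes, summaries):
    """Average number of nodes covered by each summary, per level."""
    return {level: nodes[level] / summaries[level] for level in sorted(nodes)}


ratios = nodes_per_summary(nodes_per_level, summaries_per_level)
for level, ratio in ratios.items():
    print(f"Level {level}: {ratio:.1f} nodes per summary")
```

Under those assumptions the ratio falls steadily from Level 0 to Level 3, so the question may be less about missing summaries and more about how many nodes each summary is meant to cover at each level.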
I find it perplexing that the level with the largest number of nodes (Level 0) has the fewest summaries. This raises questions about whether there might be a filter or threshold preventing summaries from being generated for communities below a certain level of relevance, or if I might be looking for summaries in the wrong place.
Any insights or guidance on this matter would be greatly appreciated.
Thank you for your time and support.
Best regards,
Teo