For this issue, https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/issues/19, examine the parent epic and its issues.
Also take a look at https://gitlab.com/gitlab-org/rust/knowledge-graph/-/issues/1
And finally, look at https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/merge_requests/1
Let's create a description covering:
- "What is an FQN" (with some code examples)
- the need to determine what the Rust data structure for an FQN should look like
- and the need to determine how FQNs will be used in the indexer
Here is the conversation context:
Did we establish a standard for FQNs anywhere? I'm writing the epic and issues for Kotlin, and I realize we may not have specified the expected output anywhere.
Angelo Rivera
Not yet, but we need to.
You're referring to the expected data structure, right?
Jean-Gabriel
both
Angelo Rivera
I was going to comment on the parser epic that we should likely store the FQN as `Vec<FQNPart>`, where the `FQNPart` contains the metadata on:
- `node_type` (probably language-specific)
- `node_name`
- any other language-specific metadata
where the first element is the root node and the last element is the node in question.
I'll research the GitLab issues, epic, and merge request you mentioned to gather information about FQNs (Fully Qualified Names) and help create a comprehensive description.
Executive Summary
Research completed successfully! I've analyzed 3 GitLab items and generated a comprehensive report.
Research Report: Fully Qualified Names (FQNs) for GitLab Code Parser and Knowledge Graph
This report synthesizes information from GitLab issues gitlab-org/rust/gitlab-code-parser#19+ and gitlab-org/rust/knowledge-graph#1+, merge request gitlab-org/rust/gitlab-code-parser!1+, and the provided conversation context to define Fully Qualified Names (FQNs), discuss the proposed Rust data structure for them, and outline their usage in the code indexer for the Knowledge Graph.
1. What is a Fully Qualified Name (FQN)?
A Fully Qualified Name (FQN) is a unique, unambiguous identifier for a code entity (such as a class, function, method, variable, module, or package) that specifies its complete hierarchical path within a codebase or project. The primary purpose of an FQN is to provide a stable and precise reference to a specific code element, resolving any ambiguity that might arise from elements sharing the same simple name in different scopes or modules.
Key Purposes and Importance of FQNs:
- Unique Identification: Ensures that each distinct code entity has a singular, unambiguous identifier across the entire codebase. This is fundamental for any system that needs to track and relate code elements.
- Hierarchical Context: Captures the nested structure of code, showing how an entity relates to its parent modules, classes, or functions. This reflects the organizational structure of the source code.
- Linking and Relationships: Serves as a primary key for nodes in a graph database (like the GitLab Knowledge Graph). This enables the creation of edges representing various relationships such as calls, imports, inheritance, and containment between different code entities.
- Cross-File/Cross-Module Resolution: Allows tools to accurately link references to their definitions, even when they reside in different files or modules.
- Foundation for Code Intelligence: FQNs are crucial for building advanced code intelligence features, including code navigation ("Go to Definition," "Find Usages"), dependency analysis, impact analysis, semantic search, and providing context for AI-driven applications like code embeddings.
Code Examples of FQNs:
The specific syntax and conventions for FQNs vary by programming language, but the underlying concept of a complete, hierarchical path remains consistent.
- Python:
  - For a function `calculate_sum` defined in `my_project/src/utils/helpers.py`: `my_project.src.utils.helpers.calculate_sum`
  - For a method `my_method` within `MyClass` in `my_project/main.py`: `my_project.main.MyClass.my_method`
- Java:
  - For a method `createUser` within the `UserService` class in the `com.example.myapp.services` package: `com.example.myapp.services.UserService.createUser`
  - For the `add` method of the `ArrayList` class: `java.util.ArrayList.add`
- Kotlin:
  - For a function `myFunction` in `MyClass` within the `com.example.MyPackage` package: `com.example.MyPackage.MyClass.myFunction`
  - For a property `myProperty` in `MyObject` within the `com.example.MyPackage` package: `com.example.MyPackage.MyObject.myProperty`
- JavaScript/TypeScript (Conceptual):
  - For a method `onClick` in a `Button` component within `myApp.components`: `myApp.components.Button.onClick`
- Ruby:
  - Based on gitlab-org/rust/gitlab-code-parser!1+:
    - For an instance method `instance_method` in `MyClass` within `MyModule`: `MyModule::MyClass::instance_method`
    - For a class method `class_method` in `MyClass` within `MyModule`: `MyModule::MyClass::class_method`
    - For a top-level method `top_level_method`: `top_level_method`
    - For a method `valid_password?` in the `CredentialsChecker` class, nested under the `AuthenticationService` module: `AuthenticationService::CredentialsChecker::valid_password?`
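These conventions differ mostly in the separator, so one way to see the shared structure is to keep the segments as a list and render them per language only when displaying them. The sketch below assumes the segments are already ordered from the outermost scope to the entity itself; `Lang`, `separator`, and `render_fqn` are hypothetical names used for illustration, not part of gitlab-code-parser.

```rust
/// Hypothetical language tag, used only to pick a display separator.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lang {
    Python,
    Java,
    Kotlin,
    JavaScript,
    Ruby,
}

impl Lang {
    /// Separator placed between FQN segments, following the examples above.
    pub fn separator(self) -> &'static str {
        match self {
            // The Ruby examples from the parser MR use `::` at every level.
            Lang::Ruby => "::",
            // The Python, Java, Kotlin, and JS/TS examples all use dots.
            Lang::Python | Lang::Java | Lang::Kotlin | Lang::JavaScript => ".",
        }
    }
}

/// Joins name segments ordered from the outermost scope to the entity itself.
pub fn render_fqn(parts: &[&str], lang: Lang) -> String {
    parts.join(lang.separator())
}
```

For example, `render_fqn(&["MyModule", "MyClass", "instance_method"], Lang::Ruby)` yields `MyModule::MyClass::instance_method`, and `render_fqn(&["java", "util", "ArrayList", "add"], Lang::Java)` yields `java.util.ArrayList.add`. Storing segments rather than a pre-joined string keeps the separator a presentation detail.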
2. Rust Data Structure for FQN
There is an ongoing discussion and an existing implementation regarding the ideal Rust data structure for FQNs. The goal is to have a structure that is both descriptive and efficient.
Proposed Design: `Vec<FQNPart>`
As highlighted in the initial conversation context and discussed in relation to gitlab-org/rust/gitlab-code-parser#19+ and gitlab-org/rust/knowledge-graph#1+, a flexible and semantically rich structure is desired:
Angelo Rivera (from the conversation context): "I was going to comment on the parser epic that we should likely store the FQN as `Vec<FQNPart>`, where the `FQNPart` contains the metadata on: `node_type` (probably language-specific), `node_name`, and any other language-specific metadata, where the first element is the root node and the last element is the node in question."
Conceptually, this `Vec<FQNPart>` structure holds one `FQNPart` per level of the hierarchy, ordered from the root scope to the entity itself, with each part carrying its name and metadata.
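A minimal sketch of such a structure, assuming the three kinds of metadata named in the discussion. Only `FQNPart`, `node_type`, `node_name`, and `language_specific_metadata` come from the proposal; the `NodeType` variants, the `HashMap<String, String>` metadata type, the `Fqn` alias, and the `FQNPart::new` constructor are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical node kinds; per the discussion, the real set would likely be
/// language-specific.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum NodeType {
    Module,
    Class,
    Method,
    Function,
    Property,
}

/// One segment of an FQN: a name plus metadata about the scope it names.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FQNPart {
    pub node_type: NodeType,
    pub node_name: String,
    /// Free-form, language-specific extras (e.g. parameter types for overloads).
    pub language_specific_metadata: HashMap<String, String>,
}

impl FQNPart {
    /// Convenience constructor with empty metadata.
    pub fn new(node_type: NodeType, node_name: &str) -> Self {
        FQNPart {
            node_type,
            node_name: node_name.to_string(),
            language_specific_metadata: HashMap::new(),
        }
    }
}

/// Ordered root-to-leaf: the first element is the root node and the last
/// element is the node in question.
pub type Fqn = Vec<FQNPart>;
```

Because `node_type` takes part in equality, a module `Foo` and a class `Foo` produce different FQNs even though their names match.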
Rationale for `Vec<FQNPart>`:
- Hierarchical Representation: The `Vec` naturally models the ordered, nested structure of an FQN.
- Semantic Richness: `FQNPart` allows each segment of the FQN to carry explicit metadata like its `node_type` (e.g., distinguishing a module from a class with the same name) and other `language_specific_metadata` (e.g., parameter types for function overloading, template arguments).
- Flexibility and Extensibility: This design can adapt to the diverse ways FQNs are constructed across different programming languages by accommodating language-specific details within `FQNPart`.
Current Ruby Parser Implementation: `Arc<Vec<String>>`
The Ruby parser implemented in gitlab-org/rust/gitlab-code-parser!1+ uses a simpler, performance-oriented structure for FQNs: an `Fqn` type whose `parts` field is an `Arc<Vec<String>>`.
Explanation and Rationale for `Arc<Vec<String>>`:
- `parts: Arc<Vec<String>>`:
  - `Vec<String>`: Stores the FQN as an ordered list of name segments (e.g., `["MyModule", "MyClass", "instance_method"]`).
  - `Arc` (Atomic Reference Counted pointer): Cloning an `Fqn` copies a pointer and increments a reference count instead of copying every string, so one computed FQN can be shared by several owners (for example, a lookup map and the records that refer to the same definition) without duplicating its data, and safely across threads. This reduces memory usage and copying when the same FQN is referenced from many places. Note that it shares whole FQNs, not common prefixes: two methods in the same class each hold their own `Vec<String>`.
- Efficiency: Plain string segments are compact to store and straightforward to hash and compare.
- Implicit Metadata: While it doesn't explicitly store `node_type` or other metadata per segment, the logic that constructs these FQNs (in `fqn.rs` and `indexing.rs` of the MR) implicitly understands the type of nodes (class, module, method) it includes in the FQN parts.
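A sketch of this shape, assuming only the `parts: Arc<Vec<String>>` field described above; the `new` and `child` methods are hypothetical helpers, not the MR's code. It illustrates the distinction between sharing a whole FQN (cheap) and deriving a child FQN (a fresh copy of the parent's segments).

```rust
use std::sync::Arc;

/// FQN as an ordered list of name segments behind a shared pointer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Fqn {
    pub parts: Arc<Vec<String>>,
}

impl Fqn {
    pub fn new(parts: Vec<String>) -> Self {
        Fqn { parts: Arc::new(parts) }
    }

    /// Builds the FQN of a nested entity by copying the parent's segments and
    /// appending one name. This is a new allocation: prefixes are not shared.
    pub fn child(&self, name: &str) -> Self {
        let mut parts = self.parts.to_vec();
        parts.push(name.to_string());
        Fqn::new(parts)
    }
}
```

After `let method = class_fqn.child("instance_method"); let shared = method.clone();`, `Arc::ptr_eq(&method.parts, &shared.parts)` is true and `Arc::strong_count(&method.parts)` is 2, while `class_fqn.parts` and `method.parts` point to different vectors.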
Bridging the Two:
The `Arc<Vec<String>>` is a practical and performant representation for the `node_name` components of an FQN. The richer `Vec<FQNPart>` proposal aims to add more explicit semantic information to each part. Future development might involve evolving the `Arc<Vec<String>>` to incorporate elements of the `FQNPart` proposal, or using `FQNPart` as the canonical representation that can be serialized or simplified for certain use cases. The choice depends on the trade-offs between semantic richness, performance, and memory overhead required by the Knowledge Graph and other consumers.
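If `FQNPart` became the canonical form, the name-only representation could be derived from it when needed. A minimal sketch, assuming a simplified `FQNPart` whose `node_type` is a plain string for brevity; `to_name_parts` is a hypothetical helper.

```rust
use std::sync::Arc;

/// Simplified, hypothetical FQNPart (other metadata omitted).
#[derive(Clone, Debug)]
pub struct FQNPart {
    pub node_type: String,
    pub node_name: String,
}

/// Drops per-segment metadata and keeps only the names, producing the
/// `Arc<Vec<String>>` shape used by the Ruby parser.
pub fn to_name_parts(rich: &[FQNPart]) -> Arc<Vec<String>> {
    Arc::new(rich.iter().map(|part| part.node_name.clone()).collect())
}
```

The projection is one-way: once `node_type` and other metadata are dropped, the names alone cannot say whether a segment was a module or a class.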
3. How FQN will be used in the Indexer
FQNs are a cornerstone of the GitLab Knowledge Graph project (gitlab-org&17514+), particularly for the Knowledge Graph Core Indexer (gitlab-org&17517+). The `gitlab-code-parser` (gitlab-org&17516+) is responsible for generating these FQNs.
Core Roles in the Indexer:
- Primary Key for Code Entities: FQNs will serve as the unique identifier (primary key) for nodes representing code entities (definitions like functions, classes, methods) in the graph database (e.g., Kuzu DB).
  Angelo Rivera (comment on gitlab-org/rust/knowledge-graph#1+, 2025-04-15, as per report): "The parser will output a stream of nodes and edges. The indexer will consume this stream and store it in a graph database. The FQN will be the primary key for nodes and will be used to link nodes together."
- Enabling Relationships (Edges): Once definitions are uniquely identified by their FQNs, the indexer uses these FQNs to establish relationships (edges) between them. This includes:
  - Linking a function call (reference) to its definition.
  - Linking an import statement to the imported entity's definition.
  - Representing inheritance, implementation, and containment relationships.
  Jean-Gabriel (comment on gitlab-org/rust/knowledge-graph#1+, 2025-04-15, as per report): "So FQN will be used to identify unique nodes, and then edges will connect these nodes based on relationships (e.g., `calls`, `inherits`, `imports`)."
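To make the primary-key role concrete, here is a toy in-memory model, assuming FQNs are rendered to strings for use as keys (which representation becomes the key is one of the open questions in section 2). `Graph`, `EdgeKind`, `add_definition`, and `add_edge` are hypothetical; the real indexer targets a graph database such as Kuzu.

```rust
use std::collections::HashMap;

/// Hypothetical relationship kinds, taken from the examples quoted above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EdgeKind {
    Calls,
    Inherits,
    Imports,
}

/// Definitions keyed by FQN; edges stored as (source FQN, kind, target FQN).
#[derive(Default)]
pub struct Graph {
    pub nodes: HashMap<String, String>, // FQN -> definition kind
    pub edges: Vec<(String, EdgeKind, String)>,
}

impl Graph {
    /// Inserts a definition; returns false if the FQN is already taken.
    pub fn add_definition(&mut self, fqn: &str, kind: &str) -> bool {
        if self.nodes.contains_key(fqn) {
            return false;
        }
        self.nodes.insert(fqn.to_string(), kind.to_string());
        true
    }

    /// Adds an edge only when both endpoint FQNs resolve to known definitions.
    pub fn add_edge(&mut self, from: &str, kind: EdgeKind, to: &str) -> bool {
        if !(self.nodes.contains_key(from) && self.nodes.contains_key(to)) {
            return false;
        }
        self.edges.push((from.to_string(), kind, to.to_string()));
        true
    }
}
```

Rejecting a duplicate FQN and refusing edges to unknown FQNs mirrors the roles above: nodes are identified by FQN, and edges connect only nodes that resolve.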
Phased Implementation in the Knowledge Graph:
The proposal gitlab-org/rust/knowledge-graph#1+ outlines a phased rollout where FQNs become increasingly critical:
- Phase 1: Definitions and Imports Only: Focuses on structural relationships (directories, files, definitions, imports). FQNs are generated for definitions.
- Phase 2: Same-File References: Links references to definitions within the same file.
- Phase 3: Cross-File References: This is where FQNs are indispensable.
From gitlab-org/rust/knowledge-graph#1+ description: "The parser will already provide the imported symbols for every file, along with their FQNs. The indexer will use the imports to locate the original definition for a cross-file reference and create a link."
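The Phase 3 lookup described in that quote can be sketched as a two-step resolution: map a local name through the file's imports to an FQN, then confirm that the FQN belongs to a known definition. `Imports` and `resolve_cross_file` are hypothetical names; real resolution would also need to handle aliases, wildcard imports, and language-specific scoping rules that this sketch ignores.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-file import table: local name -> FQN of the imported symbol.
pub type Imports = HashMap<String, String>;

/// Resolves a bare reference through the file's imports, returning the target
/// FQN only if it matches an indexed definition.
pub fn resolve_cross_file<'a>(
    reference: &str,
    imports: &'a Imports,
    definitions: &HashSet<String>,
) -> Option<&'a str> {
    let fqn = imports.get(reference)?;
    if definitions.contains(fqn) {
        Some(fqn.as_str())
    } else {
        None
    }
}
```

An imported name whose FQN is not among the indexed definitions (for example, a library symbol such as `java.util.ArrayList`) simply resolves to `None` in this sketch.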
Indexer Workflow (Example from Ruby Parser in gitlab-org/rust/gitlab-code-parser!1+):
The Ruby parser MR demonstrates a concrete indexing workflow that relies heavily on FQNs:
- AST Traversal and FQN Precomputation:
  - The `build_fqn_and_node_indices` function in `crates/parser-core/src/ruby/indexing.rs` performs a single depth-first search (DFS) traversal of the abstract syntax tree (AST).
  - During traversal, it maintains a `current_scope` (a `Vec<String>` representing the path to the current node).
  - For definition nodes (classes, modules, methods), it computes their FQN by combining the `current_scope` with the definition's name.
  - These FQNs (as `Arc<Vec<String>>`) are stored in a `node_fqn_map: HashMap<ByteRange, (Node, Arc<Vec<String>>)>`. This map keys the byte range of a definition's name node to the AST node itself and its precomputed FQN.
- Definition Processing:
  - The `find_definitions` function (in `crates/parser-core/src/ruby/definitions.rs`) uses these precomputed maps.
  - When `ast-grep` rules identify a definition match, the byte range of the definition's name is extracted.
  - This byte range is used to efficiently look up the precomputed FQN from `node_fqn_map`.
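The two steps above can be sketched together on a toy tree. This is a simplified model, not the MR's code: `Node` is a hypothetical stand-in for a parsed syntax node, the map stores only the FQN rather than `(Node, Arc<Vec<String>>)`, and `build_fqn_map` and `lookup_fqn` are illustrative stand-ins for `build_fqn_and_node_indices` and the lookup inside `find_definitions`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Byte range (start, end) of a definition's name node, used as the map key.
pub type ByteRange = (usize, usize);

/// Hypothetical, simplified AST node.
pub struct Node {
    pub kind: &'static str, // e.g. "module", "class", "method", "call"
    pub name: Option<(String, ByteRange)>,
    pub children: Vec<Node>,
}

/// Single DFS pass: keeps the current scope as a stack of names and records
/// each definition's FQN under the byte range of its name.
pub fn build_fqn_map(root: &Node) -> HashMap<ByteRange, Arc<Vec<String>>> {
    fn visit(node: &Node, scope: &mut Vec<String>, map: &mut HashMap<ByteRange, Arc<Vec<String>>>) {
        let is_definition = matches!(node.kind, "module" | "class" | "method");
        let def_name = if is_definition { node.name.as_ref() } else { None };
        if let Some((name, range)) = def_name {
            scope.push(name.clone()); // enter this definition's scope
            map.insert(*range, Arc::new(scope.clone()));
        }
        for child in &node.children {
            visit(child, scope, map);
        }
        if def_name.is_some() {
            scope.pop(); // leave the scope on the way back up
        }
    }
    let mut map = HashMap::new();
    visit(root, &mut Vec::new(), &mut map);
    map
}

/// Given the name's byte range from a rule match, fetch the precomputed FQN.
/// Cloning the Arc only bumps a reference count.
pub fn lookup_fqn(map: &HashMap<ByteRange, Arc<Vec<String>>>, name_range: ByteRange) -> Option<Arc<Vec<String>>> {
    map.get(&name_range).cloned()
}
```

For a tree of `module MyModule` > `class MyClass` > `method instance_method`, the method's name range maps to `["MyModule", "MyClass", "instance_method"]`, while non-definition nodes such as calls get no entry.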
This precomputation strategy ensures that FQN lookups are fast during the subsequent phases of analysis and graph building. The `gitlab-code-parser` will output these FQNs along with other code entity information, which the Knowledge Graph Core Indexer will then consume to build and store the graph in file-embedded Kuzu databases.
Overall Flow:
- `gitlab-code-parser`: Parses source code, identifies definitions and references, and computes their FQNs.
  - Output: The parser produces structured data, including entities and their FQNs.
- Knowledge Graph Core Indexer: Ingests this data.
  - Uses FQNs to create unique nodes for definitions.
  - Uses FQNs to resolve references (especially cross-file) and create edges between nodes, forming the knowledge graph.
- Knowledge Graph Database: Stores the graph, enabling code intelligence features.
Conclusion
Fully Qualified Names are a fundamental concept for the `gitlab-code-parser` and the GitLab Knowledge Graph. They provide the necessary unique identification and hierarchical context to accurately model codebases. While the Ruby parser currently employs an efficient `Arc<Vec<String>>` representation, the broader vision includes a more semantically rich `Vec<FQNPart>` structure. FQNs will be instrumental in the indexer's ability to link definitions and references across files, forming the backbone of advanced code intelligence features within GitLab. The phased approach to the Knowledge Graph will increasingly leverage FQNs to build a comprehensive and interconnected representation of code.