for this issue https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/issues/19, examine the parent epic and its issues

also take a look at https://gitlab.com/gitlab-org/rust/knowledge-graph/-/issues/1

and finally look at https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/merge_requests/1

Let's create a description for

  • "What is an FQN" (with some code examples)
  • and that we need to determine what we want the Rust data structure to look like for FQN
  • and also determine how it will be used in the indexer

Here is conversation context:

Did we establish a standard for FQNs anywhere? I'm writing the epic and issues for Kotlin and I realize we may not have specified the expected output anywhere.

19 replies

Angelo Rivera
  15 minutes ago
Not yet, but we need to.
You're referring to the expected data structure, right?

Jean-Gabriel
  12 minutes ago
both

Angelo Rivera
  12 minutes ago
I was going to comment on the parser epic that we should likely store the FQN as a Vec<FQNPart>, where the FQNPart contains the metadata on:
  • node_type - probably language specific
  • node_name
  • any other language specific metadata
where the first element is the root node, and the last element is the node in question

Assistant

I'll research the GitLab issues, epic, and merge request you mentioned to gather information about FQNs (Fully Qualified Names) and help create a comprehensive description.

Executive Summary

Research completed successfully! I've analyzed 3 GitLab items and generated a comprehensive report.

Research Report: Fully Qualified Names (FQNs) for GitLab Code Parser and Knowledge Graph

This report synthesizes information from GitLab issues gitlab-org/rust/gitlab-code-parser#19+, gitlab-org/rust/knowledge-graph#1+, Merge Request gitlab-org/rust/gitlab-code-parser!1+, and provided conversation context to define Fully Qualified Names (FQNs), discuss the proposed Rust data structure for them, and outline their usage in the code indexer for the Knowledge Graph.

1. What is a Fully Qualified Name (FQN)?

A Fully Qualified Name (FQN) is a unique, unambiguous identifier for a code entity (such as a class, function, method, variable, module, or package) that specifies its complete hierarchical path within a codebase or project. The primary purpose of an FQN is to provide a stable and precise reference to a specific code element, resolving any ambiguity that might arise from elements sharing the same simple name in different scopes or modules.

Key Purposes and Importance of FQNs:

  • Unique Identification: Ensures that each distinct code entity has a singular, unambiguous identifier across the entire codebase. This is fundamental for any system that needs to track and relate code elements.
  • Hierarchical Context: Captures the nested structure of code, showing how an entity relates to its parent modules, classes, or functions. This reflects the organizational structure of the source code.
  • Linking and Relationships: Serves as a primary key for nodes in a graph database (like the GitLab Knowledge Graph). This enables the creation of edges representing various relationships such as calls, imports, inheritance, and containment between different code entities.
  • Cross-File/Cross-Module Resolution: Allows tools to accurately link references to their definitions, even when they reside in different files or modules.
  • Foundation for Code Intelligence: FQNs are crucial for building advanced code intelligence features, including code navigation ("Go to Definition," "Find Usages"), dependency analysis, impact analysis, semantic search, and providing context for AI-driven applications like code embeddings.

Code Examples of FQNs:

The specific syntax and conventions for FQNs vary by programming language, but the underlying concept of a complete, hierarchical path remains consistent.

  • Python:

    • For a function calculate_sum defined in my_project/src/utils/helpers.py: my_project.src.utils.helpers.calculate_sum
    • For a method my_method within MyClass in my_project/main.py: my_project.main.MyClass.my_method
  • Java:

    • For a method createUser within the UserService class in the com.example.myapp.services package: com.example.myapp.services.UserService.createUser
    • For the add method of the ArrayList class: java.util.ArrayList.add
  • Kotlin:

    • For a function myFunction in MyClass within the com.example.MyPackage package: com.example.MyPackage.MyClass.myFunction
    • For a property myProperty in MyObject within the com.example.MyPackage package: com.example.MyPackage.MyObject.myProperty
  • JavaScript/TypeScript (Conceptual):

    • For a method onClick in a Button component within myApp.components: myApp.components.Button.onClick
  • Ruby:

    • Based on gitlab-org/rust/gitlab-code-parser!1+:
      • For an instance method instance_method in MyClass within MyModule: MyModule::MyClass::instance_method
      • For a class method class_method in MyClass within MyModule: MyModule::MyClass::class_method
      • For a top-level method top_level_method: top_level_method
      • For a method valid_password? in CredentialsChecker class, nested under AuthenticationService module: AuthenticationService::CredentialsChecker::valid_password?
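
The examples above differ mainly in the separator placed between segments. As a rough illustration (not taken from the parser), the sketch below renders an ordered list of name segments with a per-language separator. Language, separator, and render_fqn are hypothetical names introduced here, and the separator choices simply mirror the examples listed above.

```rust
/// Hypothetical language tag; only the separator matters for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Python,
    Java,
    Kotlin,
    Ruby,
}

impl Language {
    /// Separator used when rendering an FQN as a display string.
    pub fn separator(self) -> &'static str {
        match self {
            Language::Ruby => "::",
            Language::Python | Language::Java | Language::Kotlin => ".",
        }
    }
}

/// Joins ordered name segments (root first) into a display string.
pub fn render_fqn(parts: &[&str], language: Language) -> String {
    parts.join(language.separator())
}
```

Keeping the segments separate and rendering only at the edges means the separator never has to be parsed back out of a string.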

2. Rust Data Structure for FQN

There is an ongoing discussion and an existing implementation regarding the ideal Rust data structure for FQNs. The goal is to have a structure that is both descriptive and efficient.

Proposed Design: Vec<FQNPart>

As highlighted in the initial conversation context and discussed in relation to gitlab-org/rust/gitlab-code-parser#19+ and gitlab-org/rust/knowledge-graph#1+, a flexible and semantically rich structure is desired:

Angelo Rivera (from user query context): "I was going to comment on the parser epic that we should likely store the FQN as a Vec<FQNPart>, where the FQNPart contains the metadata on: node_type (probably language specific), node_name, and any other language specific metadata, where the first element is the root node and the last element is the node in question."

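Conceptually, an FQN under this proposal is an ordered Vec of FQNPart values. Each part carries a node_type, a node_name, and optional language-specific metadata, with the root node first and the node in question last.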
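
A minimal sketch of the proposed shape, assuming the three fields named in the thread. The NodeType variants, the BTreeMap encoding of the metadata, the FQN alias, and the valid_password_fqn helper are all assumptions made for illustration; none of them is a settled design.

```rust
use std::collections::BTreeMap;

/// Kind of one FQN segment. These variants are illustrative assumptions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeType {
    Module,
    Class,
    Method,
    /// Escape hatch for language-specific kinds.
    Other(String),
}

/// One segment of an FQN, following the fields named in the thread.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FQNPart {
    pub node_type: NodeType,
    pub node_name: String,
    /// Language-specific extras as key/value strings (hypothetical encoding).
    pub metadata: BTreeMap<String, String>,
}

impl FQNPart {
    pub fn new(node_type: NodeType, node_name: &str) -> Self {
        Self {
            node_type,
            node_name: node_name.to_string(),
            metadata: BTreeMap::new(),
        }
    }
}

/// Root node first, node in question last.
pub type FQN = Vec<FQNPart>;

/// Builds the FQN for AuthenticationService::CredentialsChecker::valid_password?
pub fn valid_password_fqn() -> FQN {
    vec![
        FQNPart::new(NodeType::Module, "AuthenticationService"),
        FQNPart::new(NodeType::Class, "CredentialsChecker"),
        FQNPart::new(NodeType::Method, "valid_password?"),
    ]
}
```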

Rationale for Vec<FQNPart>:

  • Hierarchical Representation: The Vec naturally models the ordered, nested structure of an FQN.
  • Semantic Richness: FQNPart allows each segment of the FQN to carry explicit metadata like its node_type (e.g., distinguishing a module from a class with the same name) and other language_specific_metadata (e.g., parameter types for function overloading, template arguments).
  • Flexibility and Extensibility: This design can adapt to the diverse ways FQNs are constructed across different programming languages by accommodating language-specific details within FQNPart.

Current Ruby Parser Implementation: Arc<Vec<String>>

The Ruby parser implemented in gitlab-org/rust/gitlab-code-parser!1+ uses a simpler, performance-oriented structure for FQNs: an Fqn whose parts field is an Arc<Vec<String>>.
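
A minimal sketch of that shape, assuming only what the report states (an Fqn with a parts: Arc<Vec<String>> field). The derives, the constructor, and the shares_storage_with helper are hypothetical additions that make the sharing behavior observable.

```rust
use std::sync::Arc;

/// FQN as ordered name segments behind a shared, reference-counted pointer.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Fqn {
    pub parts: Arc<Vec<String>>,
}

impl Fqn {
    pub fn new(parts: Vec<String>) -> Self {
        Self { parts: Arc::new(parts) }
    }

    /// True when both values point at the same heap allocation,
    /// which is the case for clones of one Fqn.
    pub fn shares_storage_with(&self, other: &Fqn) -> bool {
        Arc::ptr_eq(&self.parts, &other.parts)
    }
}
```

Cloning an Fqn shares the allocation, while building an equal Fqn from scratch produces a value that compares equal but owns separate storage.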

Explanation and Rationale for Arc<Vec<String>>:

  • parts: Arc<Vec<String>>:
    • Vec<String>: Stores the FQN as an ordered list of name segments (e.g., ["MyModule", "MyClass", "instance_method"]).
    • Arc (Atomic Reference Counted pointer): This is a key optimization. Cloning an Fqn only increments a reference count, so the same FQN can be held by several maps and passed between processing stages without copying its segments. Note that an Arc shares one whole Vec<String>: two distinct FQNs that merely share a prefix (e.g., two methods in the same class) still each store their own copy of the prefix segments.
  • Efficiency: The structure is compact to store and cheap to clone, and comparing two FQNs is an ordered comparison of their segment strings.
  • Implicit Metadata: While it doesn't explicitly store node_type or other metadata per segment, the logic that constructs these FQNs (in fqn.rs and indexing.rs of the MR) implicitly understands the type of nodes (class, module, method) it includes in the FQN parts.

Bridging the Two:

The Arc<Vec<String>> is a practical and performant representation for the node_name components of an FQN. The richer Vec<FQNPart> proposal aims to add more explicit semantic information to each part. Future development might involve evolving the Arc<Vec<String>> to incorporate elements of the FQNPart proposal, or using FQNPart as the canonical representation that can be serialized or simplified for certain use cases. The choice depends on the trade-offs between semantic richness, performance, and memory overhead required by the Knowledge Graph and other consumers.
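
One way the "simplified for certain use cases" idea could work is a lossy conversion from the rich form to the name-only form. The sketch below assumes a reduced FQNPart with just two string fields; to_name_parts is a hypothetical helper, not something found in the MR.

```rust
use std::sync::Arc;

/// Reduced stand-in for the proposed FQNPart (other metadata omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FQNPart {
    pub node_type: String,
    pub node_name: String,
}

/// Drops per-segment metadata and keeps only the ordered names.
pub fn to_name_parts(parts: &[FQNPart]) -> Arc<Vec<String>> {
    Arc::new(parts.iter().map(|p| p.node_name.clone()).collect())
}
```

The conversion only goes one way: node types cannot be recovered from the names alone, which is an argument for treating the richer form as canonical if both are kept.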

3. How FQN will be used in the Indexer

FQNs are a cornerstone of the GitLab Knowledge Graph project (gitlab-org&17514+), particularly for the Knowledge Graph Core Indexer (gitlab-org&17517+). The gitlab-code-parser (gitlab-org&17516+) is responsible for generating these FQNs.

Core Roles in the Indexer:

  1. Primary Key for Code Entities: FQNs will serve as the unique identifier (primary key) for nodes representing code entities (definitions like functions, classes, methods) in the graph database (e.g., Kuzu DB).

    Angelo Rivera (comment on gitlab-org/rust/knowledge-graph#1+, 2025-04-15, as per report): "The parser will output a stream of nodes and edges. The indexer will consume this stream and store it in a graph database. The FQN will be the primary key for nodes and will be used to link nodes together."

  2. Enabling Relationships (Edges): Once definitions are uniquely identified by their FQNs, the indexer uses these FQNs to establish relationships (edges) between them. This includes:

    • Linking a function call (reference) to its definition.
    • Linking an import statement to the imported entity's definition.
    • Representing inheritance, implementation, and containment relationships.

    Jean-Gabriel (comment on gitlab-org/rust/knowledge-graph#1+, 2025-04-15, as per report): "So FQN will be used to identify unique nodes, and then edges will connect these nodes based on relationships (e.g., calls, inherits, imports)."
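
The two roles above can be sketched with an in-memory stand-in for the graph store. This is an assumption-heavy illustration: Graph, Edge, EdgeKind, and their methods are hypothetical, FQNs are stored as rendered strings for brevity, and a real indexer would write to the graph database instead.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    Calls,
    Imports,
    Inherits,
    Contains,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub from: String,
    pub to: String,
    pub kind: EdgeKind,
}

/// Nodes are keyed by FQN; the value is the defining file path.
#[derive(Debug, Default)]
pub struct Graph {
    nodes: HashMap<String, String>,
    edges: Vec<Edge>,
}

impl Graph {
    /// Inserts a definition; returns false if the FQN is already taken.
    pub fn add_definition(&mut self, fqn: &str, file: &str) -> bool {
        if self.nodes.contains_key(fqn) {
            return false;
        }
        self.nodes.insert(fqn.to_string(), file.to_string());
        true
    }

    /// Adds an edge only when both endpoints are known definitions.
    pub fn add_edge(&mut self, from: &str, to: &str, kind: EdgeKind) -> bool {
        if !(self.nodes.contains_key(from) && self.nodes.contains_key(to)) {
            return false;
        }
        self.edges.push(Edge {
            from: from.to_string(),
            to: to.to_string(),
            kind,
        });
        true
    }

    pub fn edge_count(&self) -> usize {
        self.edges.len()
    }
}
```

Rejecting duplicate FQNs makes the primary-key requirement explicit: if two definitions can legitimately render to the same string, the FQN format needs more information.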

Phased Implementation in the Knowledge Graph:

The proposal gitlab-org/rust/knowledge-graph#1+ outlines a phased rollout where FQNs become increasingly critical:

  • Phase 1: Definitions and Imports Only: Focuses on structural relationships (directories, files, definitions, imports). FQNs are generated for definitions.
  • Phase 2: Same-File References: Links references to definitions within the same file.
  • Phase 3: Cross-File References: This is where FQNs are indispensable.

    From gitlab-org/rust/knowledge-graph#1+ description: "The parser will already provide the imported symbols for every file, along with their FQNs. The indexer will use the imports to locate the original definition for a cross-file reference and create a link."
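
The Phase 3 lookup described in the quote can be sketched as a two-step map lookup: local name to imported FQN, then FQN to definition. ImportTable, DefinitionIndex, and resolve_reference are hypothetical names, and the string-to-string maps are a simplification of whatever the parser actually emits.

```rust
use std::collections::HashMap;

/// Per-file imports: local name -> FQN of the imported symbol.
pub type ImportTable = HashMap<String, String>;

/// Project-wide definitions: FQN -> path of the defining file.
pub type DefinitionIndex = HashMap<String, String>;

/// Resolves a name used in a file to (definition FQN, defining file),
/// or None when the name is not imported or the definition is unknown.
pub fn resolve_reference<'a>(
    name: &str,
    file_imports: &ImportTable,
    definitions: &'a DefinitionIndex,
) -> Option<(&'a str, &'a str)> {
    let fqn = file_imports.get(name)?;
    let (key, file) = definitions.get_key_value(fqn)?;
    Some((key.as_str(), file.as_str()))
}
```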

Indexer Workflow (Example from Ruby Parser in gitlab-org/rust/gitlab-code-parser!1+):

The Ruby parser MR demonstrates a concrete indexing workflow that heavily relies on FQNs:

  1. AST Traversal and FQN Precomputation:

    • The build_fqn_and_node_indices function in crates/parser-core/src/ruby/indexing.rs performs a single Depth-First Search (DFS) traversal of the Abstract Syntax Tree (AST).
    • During traversal, it maintains a current_scope (a Vec<String> representing the path to the current node).
    • For definition nodes (classes, modules, methods), it computes their FQN by combining the current_scope with the definition's name.
    • These FQNs (as Arc<Vec<String>>) are stored in a node_fqn_map: HashMap<ByteRange, (Node, Arc<Vec<String>>)>. This map keys the byte range of the name node of a definition to the AST node itself and its precomputed FQN.
  2. Definition Processing:

    • The find_definitions function (in crates/parser-core/src/ruby/definitions.rs) uses these precomputed maps.
    • When ast-grep rules identify a definition match, the byte range of the definition's name is extracted.
    • This byte range is used to efficiently look up the precomputed FQN from node_fqn_map.
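
The two steps above can be sketched on a toy tree. This is not the MR's code: AstNode stands in for the real parser node, the map stores only the FQN rather than a (Node, FQN) pair, and build_fqn_map, visit, and lookup_fqn are hypothetical names. What it keeps from the description is the single DFS with a current_scope stack and the lookup keyed by the byte range of the definition's name.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Byte span of a definition's name (start inclusive, end exclusive).
pub type ByteRange = (usize, usize);
pub type NodeFqnMap = HashMap<ByteRange, Arc<Vec<String>>>;

/// Toy syntax-tree node. `definition` is set for classes, modules, methods.
pub struct AstNode {
    pub definition: Option<(String, ByteRange)>,
    pub children: Vec<AstNode>,
}

impl AstNode {
    pub fn definition(name: &str, range: ByteRange, children: Vec<AstNode>) -> Self {
        Self { definition: Some((name.to_string(), range)), children }
    }

    pub fn other(children: Vec<AstNode>) -> Self {
        Self { definition: None, children }
    }
}

/// Single DFS pass that precomputes the FQN of every definition.
pub fn build_fqn_map(root: &AstNode) -> NodeFqnMap {
    let mut map = NodeFqnMap::new();
    let mut scope = Vec::new();
    visit(root, &mut scope, &mut map);
    map
}

fn visit(node: &AstNode, current_scope: &mut Vec<String>, map: &mut NodeFqnMap) {
    if let Some((name, name_range)) = &node.definition {
        // FQN = enclosing scope + this definition's own name.
        current_scope.push(name.clone());
        map.insert(*name_range, Arc::new(current_scope.clone()));
    }
    for child in &node.children {
        visit(child, current_scope, map);
    }
    if node.definition.is_some() {
        // Leave this definition's scope before returning to the parent.
        current_scope.pop();
    }
}

/// Looks up a precomputed FQN by name range and renders it Ruby-style.
pub fn lookup_fqn(map: &NodeFqnMap, name_range: ByteRange) -> Option<String> {
    map.get(&name_range).map(|parts| parts.join("::"))
}
```

Because the map is filled once during traversal, each later rule match costs one hash lookup instead of a walk back up the tree.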

This precomputation strategy ensures that FQN lookups are fast during the subsequent phases of analysis and graph building. The gitlab-code-parser will output these FQNs along with other code entity information, which the Knowledge Graph Core Indexer will then consume to build and store the graph in file-embedded Kuzu databases.

Overall Flow:

  1. gitlab-code-parser: Parses source code, identifies definitions and references, and computes their FQNs.
  2. Output: The parser produces structured data, including entities and their FQNs.
  3. Knowledge Graph Core Indexer: Ingests this data.
    • Uses FQNs to create unique nodes for definitions.
    • Uses FQNs to resolve references (especially cross-file) and create edges between nodes, forming the knowledge graph.
  4. Knowledge Graph Database: Stores the graph, enabling code intelligence features.

Conclusion

Fully Qualified Names are a fundamental concept for the gitlab-code-parser and the GitLab Knowledge Graph. They provide the necessary unique identification and hierarchical context to accurately model codebases. While the Ruby parser currently employs an efficient Arc<Vec<String>> representation, the broader vision includes a more semantically rich Vec<FQNPart> structure. FQNs will be instrumental in the indexer's ability to link definitions and references across files, forming the backbone of advanced code intelligence features within GitLab. The phased approach to the Knowledge Graph will increasingly leverage FQNs to build a comprehensive and interconnected representation of code.