go through this epic https://gitlab.com/groups/gitlab-org/-/epics/17514 and fully research the architecture according to the work items I'd like to make descriptions for. Make sure each work item is appropriately researched.

Keep the research scoped to just the epic and its children.

Task,Notes,Theme,Project,Crate/Packages in Project,Applies to Client | Server | Both,Issue,Priority,Owner,Status,Milestone
Ruby Parser,"We must implement this Language's parsing and data extraction (through AST).

The following needs to be extracted:
- Definitions
  - Modules
  - Classes
  - Methods
  - Constants
- References
- Imports
  - relative
  - module/dependency",Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,,In progress,
Typescript/JS Parser,"We must implement this Language's parsing and data extraction (through AST).

The following needs to be extracted:
- Definitions
  - Modules
  - Classes
  - Methods
  - Constants
- References
- Imports
  - relative
  - alias imports
  - module/dependency

Michael Usachenko",Language Support,GitLab Code Parser,Core Parser,Client & Server,https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/issues/2,P0,,Not started,
GoLang Parser,,Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,,Not started,
Python Parser,Jonathan Shobrook,Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,,Not started,
Kotlin Parser,,Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,,,
Unit Testing Framework,,Testing,GitLab Code Parser,Core Parser,Client & Server,,,,,
Integration Testing Framework,,Testing,GitLab Code Parser,Core Parser,Client & Server,,,,,
Benchmarking/Performance Framework,,Testing,GitLab Code Parser,Core Parser,Client & Server,,,,,
Unit Testing Framework,,Testing,Knowledge Graph Indexer,"Core Indexer crate, CLI crate, Language Server Bindings crate",Client & Server,,,,,
Integration Testing Framework,,Testing,Knowledge Graph Indexer,"CLI crate, Language Server Bindings crate, Core Indexer crate",Client & Server,,,,,
Benchmarking/Performance Framework,,Testing,Knowledge Graph Indexer,"Core Indexer crate, CLI crate",Client & Server,,,,,
Client Side Usage Collector Service,Will use the https://github.com/getsentry/sentry-rust package,Observability,Knowledge Graph Indexer,,,,,,,
Client Side Error Collector Service,Will use the https://github.com/getsentry/sentry-rust package,Observability,Knowledge Graph Indexer,,,,,,,
Server Side Usage Collector Service,TBD - Needs confirmation on how the server will collect and propagate metrics/usage from the indexer,Observability,Knowledge Graph Indexer,,,,,,,
Server Side Error Collector Service,TBD - Needs confirmation on how the server will collect and propagate errors from the indexer,Observability,Knowledge Graph Indexer,,,,,,,
Abstract Metric Service,"This service will take in a generic ""collector"" service/struct, which should be used throughout the codebase to collect metrics.

In this service, we should define ""collect"" and ""flush"" events, which will trigger the metric capturing (either sent to Sentry on the client, or to whatever mechanism the server uses to collect)
",Observability,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Abstract Error Service,"Roughly the same as the Abstract Metric Service, but tailored for collecting Errors",Observability,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Database Connection Service,"A wrapper service that takes in a Kuzu connection/database object and performs common operations (write, read, copy from, etc). ",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Database Tables / Schema Service,This service should define the various tables and relationships in a type-safe API that Writer (Bulk or Incremental) service can use,Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Analysis Service,"This service is responsible for leveraging the gitlab-code-parser. 

It will input a file path and file content and return the appropriate data depending on the language. 

This will return the definition, import, reference, and any other nodes along with metadata (like the fully qualified name). 

We should leverage generic types here (depending on some language Enum), which will return the type depending on the language so that consumers can perform language specific logic (like the relationship builder service)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Relationship Builder Service,"This layer will be responsible for resolving the following relationships

- Directory -> Directory
- Directory -> File
- File -> Definitions (Class, Method, Type, Constant etc)
- File -> References
- References -> Definitions

The final output should be the nodes that adhere to the Database Schema, to be inserted later.

This should go to the Writer Service",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Writer Service - General,"This layer should determine the strategy for the incoming write (which can be set manually via arguments, such as ""init""), and pass the write operation to the Bulk or Incremental Writer",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Writer Service - Bulk Writer,"This layer is a sub-layer of the general Writer Service. This should be responsible for initial imports into the Kuzu Database (leveraging COPY From) and handling large updates (large node deletion, updates, etc)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Writer Service - Incremental,"This layer is a sub-layer of the general Writer Service. This should be responsible for incremental updates for node changes (such as a single file change). This will leverage Cypher queries to mutate data (Merge, Set, etc)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client,,,,,
Workspace Management Service,"For the client side, we need a way for the user to point the knowledge graph indexer to multiple repositories (a workspace folder).  

This will include: 
- traversing the file system to discover repositories depending on the workspace URI
- any other local repository operations (in the future, obtaining git data)
- being able to associate the local project with the GitLab/GitHub remote",Indexer Specific Code,Knowledge Graph Indexer,"CLI crate, Language Server Bindings crate",Client,,,,,
Language Data Extraction Service,"This layer leverages the GitLab Code Parser to extract definitions and references.

This logic will encapsulate the following:
- accepting list of files
- kicking off worker(s)/threads to iterate over the files
- extracting the data via the AST service
- sending the extracted data to the ""matcher""

",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Query Service,"This service should contain all logic needed for querying data, including:

- Common Cypher Queries (get definitions for X file)
- Extracting any node data from a file (can use Analysis Service if needed)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
,- linux (UNIX) first,,,,,,,,,
Assistant
Executive Summary

Research completed successfully! I've analyzed 18 GitLab items and generated a comprehensive report.

GitLab Knowledge Graph: Architecture Synthesis

This report synthesizes research on the GitLab Knowledge Graph initiative, focusing on the architecture as defined within the epic gitlab-org&17514, Knowledge Graph First Iteration and its child items. The research aims to provide detailed architectural context for the specific work items listed in the original user request.

1. Vision and Goals

The GitLab Knowledge Graph project aims to revolutionize how developers interact with and understand code repositories. The core vision, as stated in gitlab-org&17514, is:

Vision Statement

Imagine opening any repository—on your laptop, in CI, or inside the Web IDE—typing one command, and instantly seeing how every file, class, and import fits together.

We aim for any developer to access the GitLab Knowledge Graph project, run the tool on any machine, point it to a repository, and generate a working Repository Knowledge Graph of the project. Then, the knowledge graph can be extended and deeply integrated with GitLab features for customers. Think Gource for Repository code and GitLab features, for AI use cases.

  • Open‑source & stand‑alone: works on macOS, Linux, Windows; no GitLab account required. By making this a standalone project and lowering the barrier to entry, we can encourage community contributions.
  • Deeply integrated: the very same artifact enriches the GitLab Monolith, Language Server, CLI, and future AI features—one graph, everywhere.

The primary problem addressed is the insufficiency of current AI data sources within GitLab for complex code understanding. The Knowledge Graph, particularly through a "Knowledge Graph Hybrid Embeddings approach" (mentioned in the related epic gitlab-org&16210, X-Ray Graph: Knowledge Graph-based Code Intelligence), is intended to provide a more scalable and comprehensive solution.

Goals from gitlab-org&16210:

  1. Build a hybrid engine capable of running/indexing both locally and on server.
  2. Maintain a central data source based on the main branch with diff indexing support.
  3. Enable server-side data availability for non-IDE use cases.
  4. Centralize storage of costly data (descriptions, crawled data, embeddings).
  5. Support ad-hoc creation for non-GitLab repos and offline scenarios.
  6. Implement hybrid Cypher and Vector Search capabilities.
  7. Enable multi-repository support for organization-wide insights.

A visual representation of an early Proof of Concept (POC) Knowledge Graph is mentioned in gitlab-org&16210:

![POC Knowledge Graph](/uploads/a3a5de54c9206be6f07285e65e77542e/image.png)

2. Overall Architecture

The Knowledge Graph architecture is a modular system built primarily in Rust, designed for performance, cross-platform compatibility, and integration across various GitLab features. It revolves around several core components:

  • GitLab Code Parser (One Parser - gitlab-org&17516): A unified static code analysis library responsible for parsing source code of various languages and extracting structured data (definitions, references, imports) using Abstract Syntax Trees (ASTs).
  • Gitalisk (Repository Service v2 - gitlab-org&17515): A reusable, open-source library for performing git actions and managing repository interactions across desktop operating systems. This is a foundational service for components needing to interact with local repositories.
  • Knowledge Graph Core Indexer (gitlab-org&17517): The central engine that consumes data from the Code Parser and Gitalisk, builds relationships between code entities, and writes this information into a graph database.
  • Knowledge Graph Server (gitlab-org&17518): Handles the server-side aspects, including storing graph data, managing graph nodes (leveraging existing Zoekt infrastructure), and serving queries.
  • Database: The chosen graph database is Kuzu DB, which is file-embedded. Each repository will have its own Kuzu DB instance.
  • Query Language: Cypher is the query language used to interact with Kuzu DB.

The system is designed for hybrid operation, supporting both client-side (local indexing and querying, e.g., in IDEs via Language Server) and server-side (centralized indexing, querying via GitLab Rails) use cases.

The relationship between these core Rust libraries is highlighted in gitlab-org/gitlab#536080, [Knowledge Graph] Publish Rust Crate:

The Knowledge Graph initiative requires sharing of core Rust libraries—specifically, the gitlab-org&17515, Gitalisk (Repository Service v2) and gitlab-org&17516, One Parser (gitlab-code-parser) projects—across multiple downstream consumers, most notably the gitlab-org&17517, Knowledge Graph Core Indexer Project crate.

3. Detailed Component Breakdown and Work Items

This section details the architecture relevant to each work item category provided by the user.

3.1. Language Parsers (GitLab Code Parser Project)

The GitLab Code Parser (gitlab-org&17516) aims to:

Establish a single, efficient, and reliable static code analysis library (gitlab-code-parser) built in Rust. This library will serve as the foundation for diverse code intelligence features across GitLab...

It uses tree-sitter as a core dependency for AST parsing. The goal for each language parser is to extract:

  • Definitions (Modules, Classes, Methods, Constants)
  • References
  • Imports (relative, module/dependency, language-specific like alias imports)
  • Fully Qualified Names (FQNs) for these entities.
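
As a rough illustration of what these extraction targets imply for the parser's output, here is a minimal Rust sketch; the type and field names are assumptions for illustration, not the actual gitlab-code-parser API:

```rust
// Hypothetical data model for what each language parser emits.
// All names here (ExtractedEntity, EntityKind, ...) are illustrative.

#[derive(Debug, Clone, PartialEq)]
enum EntityKind {
    Module,
    Class,
    Method,
    Constant,
    Reference,
    Import { relative: bool },
}

#[derive(Debug, Clone)]
struct ExtractedEntity {
    kind: EntityKind,
    /// Fully qualified name, e.g. "Foo::Bar#baz" for a Ruby method.
    fqn: String,
    /// Byte range of the corresponding tree-sitter node in the source file.
    start_byte: usize,
    end_byte: usize,
}

fn main() {
    let method = ExtractedEntity {
        kind: EntityKind::Method,
        fqn: "Foo::Bar#baz".to_string(),
        start_byte: 120,
        end_byte: 180,
    };
    println!("{} ({:?}) @ {}..{}", method.fqn, method.kind, method.start_byte, method.end_byte);
}
```

Downstream consumers (the indexer's Analysis Service and Relationship Builder) would operate on a stream of such records rather than on raw ASTs.
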

The following language parsers are planned:

  • Ruby Parser:

    • Description: Implement parsing and AST data extraction for Ruby.
    • Status: In progress (as per user's list).
  • Typescript/JS Parser:

    • Description: Implement parsing and AST data extraction for TypeScript/JavaScript.
    • Issue: gitlab-org/rust/gitlab-code-parser#2, TypeScript/Javascript
    • Details from gitlab-org/rust/gitlab-code-parser#2: This issue provides a detailed table for handling various import syntaxes in ES Modules and CommonJS, mapping them to AST nodes and capture names.
      ### Imports
      
      | Family / Purpose | Example syntax | Identifier-capture name(s) | Path-capture name | Primary AST node(s)* |
      |------------------|----------------|----------------------------|-------------------|-----------------------|
      | **ES-module – named** | `import { foo } from 'pkg'` | `imported-identifier` | `import-source` | `import_statement → named_imports → import_specifier` |
      | **ES-module – named + alias** | `import { foo as bar } from 'pkg'` | `imported-identifier`, `renamed-identifier` | `import-source` | same as above (`alias` child inspected) |
      | **ES-module – default** | `import baz from 'pkg'` | `default-import` | `import-source` | `import_statement → import_clause` |
      | **ES-module – namespace** | `import * as utils from 'pkg'` | `namespace-import` | `import-source` | `import_statement → namespace_import` |
      | **ES-module – side-effect** | `import 'pkg'` | _(none)_ | `import-source` | `import_statement` |
      | **CommonJS – simple require** | `const fizz = require('pkg')` | `require-name` | `require-source` | `variable_declarator → call_expression` |
      ... (table continues) ...
      
    • Comment from gitlab-org/rust/gitlab-code-parser#2 by Michael Usachenko (2024-05-09T15:53:32Z):

      @michaelangeloio Michael Usachenko commented: Created this issue to track the work for the TypeScript/Javascript parser. I've added a table to the description that outlines the different types of imports we need to handle.

  • GoLang Parser:

  • Python Parser:

    • Description: Implement parsing and AST data extraction for Python.
    • Epic: gitlab-org&18004, First iteration of the Python parser
    • Goal from gitlab-org&18004:

      We want to add support for Python to the One Parser project. This will enable knowledge graph construction for Python codebases... Given a Python file, the parser should extract:

      1. Definitions (functions, classes, and class methods)
      2. References (function/method calls, class instantiations)
      3. Imports

      It should also compute a fully qualified name (FQN) for each one.
    • Child Issues detailing Python parsing:
  • Kotlin Parser:

    • Description: Implement parsing and AST data extraction for Kotlin.

3.2. Testing Frameworks

Testing is crucial for both the GitLab Code Parser and Knowledge Graph Indexer.

  • GitLab Code Parser Testing:

  • Knowledge Graph Indexer Testing:

    • Unit Testing Framework: For Core Indexer crate, CLI crate, Language Server Bindings crate.
    • Integration Testing Framework: For CLI crate, Language Server Bindings crate, Core Indexer crate.
    • Benchmarking/Performance Framework: For Core Indexer crate, CLI crate.

While the work items list these frameworks, specific architectural details for them were not deeply elaborated in the researched epics/issues beyond acknowledging their necessity.

3.3. Observability (Knowledge Graph Indexer Project)

Observability services are planned for the Knowledge Graph Indexer.

  • Client Side Usage/Error Collector Service:

    • Notes: Will use the https://github.com/getsentry/sentry-rust package.
    • Description: Services for collecting usage and error data from client-side indexer components.
  • Server Side Usage/Error Collector Service:

    • Notes: TBD - Needs confirmation on how the server will collect and propagate metrics/usage/errors from the indexer.
    • Description: Services for server-side collection. Issue gitlab-org/gitlab#540844, Add monitoring and logging for graph DBs (child of gitlab-org&17767) details extending Zoekt monitoring for graph DBs, tracking metrics like disk/memory usage, DB connections, request counts, and performance timings.
  • Abstract Metric/Error Service:

    • Notes: These services will take in a generic "collector" service/struct, define "collect" and "flush" events, and be used throughout the codebase for collecting metrics/errors.
    • Crate: Core Indexer crate.
    • Applies to: Client & Server.
    • Description: Abstract interfaces to decouple metric/error collection logic from specific reporting mechanisms.
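
The "generic collector with collect and flush events" idea can be sketched as a Rust trait; everything below (trait, struct, and method signatures) is an illustrative assumption, not the real indexer API:

```rust
// Sketch of the Abstract Metric Service: a generic collector trait with
// `collect` and `flush` events, decoupling call sites from the backend.

use std::collections::HashMap;

trait MetricCollector {
    fn collect(&mut self, name: &str, value: u64);
    fn flush(&mut self) -> HashMap<String, u64>;
}

/// Simple in-memory implementation. A client build could swap in a
/// Sentry-backed collector, and a server build something else, without
/// changing any call sites.
#[derive(Default)]
struct InMemoryCollector {
    counters: HashMap<String, u64>,
}

impl MetricCollector for InMemoryCollector {
    fn collect(&mut self, name: &str, value: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += value;
    }

    fn flush(&mut self) -> HashMap<String, u64> {
        // Hand back the accumulated counters and reset the collector.
        std::mem::take(&mut self.counters)
    }
}

fn main() {
    let mut metrics = InMemoryCollector::default();
    metrics.collect("files_indexed", 10);
    metrics.collect("files_indexed", 5);
    println!("{:?}", metrics.flush());
}
```

The Abstract Error Service would follow the same shape, with error records instead of counters.
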

3.4. Indexer Specific Code (Knowledge Graph Indexer Project)

The Knowledge Graph Core Indexer (gitlab-org&17517) is structured into several Rust crates:

  • CLI crate -> will be able to run the entire indexing process in a standalone CLI application
  • core -> will contain base logic. This will include the following:
    • AST extraction (with preconfigured rules)
    • Node matching logic (aka linking definitions to references, directories to files, etc)
    • Any associated data structures for IMPORT scan and cypher queries
  • client indexer -> this will leverage the ignore WalkParallel and performance critical methods for fast scanning & building of the graph
  • database client (TBD) -> in the case the monolith indexer wants to use its own database client, we will keep the database client abstracted from the core crate

The initial workspace setup for the indexer is shown in MR gitlab-org/rust/knowledge-graph!3, chore: init workspace, which created cli and indexer sub-crates.

The following services are planned within the indexer:

  • Database Connection Service:

    • Notes: "A wrapper service that takes in a Kuzu connection/database object and performs common operations (write, read, copy from, etc)."
    • Crate: Core Indexer crate.
    • Description: Abstraction for Kuzu DB interactions. The server-side query endpoint (gitlab-org/gitlab-zoekt-indexer#81) also implies managing DB connections.
  • Database Tables / Schema Service:

    • Notes: "This service should define the various tables and relationships in a type-safe API that Writer (Bulk or Incremental) service can use."
    • Crate: Core Indexer crate.
    • Description: Defines the graph schema. An initial schema proposal is in gitlab-org/gitlab#508973, Phase 0: X-Ray Graph Initial Experiment, including nodes like Repository, File, Class, Function and relationships like CONTAINS, USES, IMPORTS.
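
A sketch of how such a schema service might centralize Kuzu DDL behind one API; the table names and columns are assumptions loosely based on the initial schema proposal, not the final schema:

```rust
// Illustrative schema definitions for the Database Tables / Schema
// Service. Kuzu accepts Cypher-style DDL; the exact tables and columns
// below are assumptions, not the project's final schema.

const NODE_TABLES: &[&str] = &[
    "CREATE NODE TABLE Directory(path STRING, PRIMARY KEY (path))",
    "CREATE NODE TABLE File(path STRING, language STRING, PRIMARY KEY (path))",
    "CREATE NODE TABLE Definition(fqn STRING, kind STRING, PRIMARY KEY (fqn))",
];

const REL_TABLES: &[&str] = &[
    "CREATE REL TABLE CONTAINS(FROM Directory TO File)",
    "CREATE REL TABLE DEFINES(FROM File TO Definition)",
];

/// The Writer services would run these once at database initialization,
/// so that schema knowledge lives in exactly one place.
fn schema_statements() -> Vec<&'static str> {
    NODE_TABLES.iter().chain(REL_TABLES.iter()).copied().collect()
}

fn main() {
    for stmt in schema_statements() {
        println!("{stmt};");
    }
}
```
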
  • Analysis Service:

    • Notes: "This service is responsible for leveraging the gitlab-code-parser. It will input a file path and file content and return the appropriate data depending on the language... along with metadata (like the fully qualified name)."
    • Crate: Core Indexer crate.
    • Description: Interface to gitlab-code-parser for extracting language-specific data.
  • Relationship Builder Service:

    • Notes: "This layer will be responsible for resolving the following relationships: Directory -> Directory, Directory -> File, File -> Definitions, File -> References, References -> Definitions. The final output should be the nodes that adhere to the Database Schema..."
    • Crate: Core Indexer crate.
    • Description: Processes extracted data to resolve and build relationships between code entities. Corresponds to "Node matching logic" in the Core Indexer epic.
  • Writer Service - General, Bulk Writer, Incremental Writer:

    • Notes: General writer determines strategy (bulk/incremental). Bulk writer uses COPY FROM for initial imports/large updates. Incremental writer uses Cypher (MERGE, SET) for single file changes (client-side).
    • Crate: Core Indexer crate.
    • Description: Services for writing data to Kuzu DB with different strategies.
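
The strategy selection described in the general Writer Service notes can be sketched as follows; the threshold and function names are illustrative assumptions:

```rust
// Sketch of the general Writer Service's dispatch: bulk (COPY FROM) for
// initial imports and large updates, incremental (Cypher MERGE/SET) for
// small change sets. All names and the cutoff are illustrative.

#[derive(Debug, PartialEq)]
enum WriteStrategy {
    Bulk,
    Incremental,
}

/// Pick a strategy from an explicit flag (e.g. an "init" argument on the
/// CLI) or from the size of the incoming change set.
fn choose_strategy(forced_init: bool, changed_files: usize) -> WriteStrategy {
    const BULK_THRESHOLD: usize = 500; // assumed cutoff, not from the source
    if forced_init || changed_files > BULK_THRESHOLD {
        WriteStrategy::Bulk
    } else {
        WriteStrategy::Incremental
    }
}

fn main() {
    println!("{:?}", choose_strategy(true, 0));
    println!("{:?}", choose_strategy(false, 1));
}
```
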
  • Workspace Management Service:

    • Notes: "For the client side, we need a way for the user to point the knowledge graph indexer to multiple repositories (a workspace folder)... traversing the file system... obtaining git data... associate the local project with the GitLab/GitHub remote."
    • Crate: CLI crate, Language Server Bindings crate.
    • Applies to: Client.
    • Description: Manages local workspaces, discovers repositories. Likely leverages Gitalisk (gitlab-org&17515) for Git operations. The Gitalisk epic vision is: "We want to create a reusable, interoperable, and open-source developer library for performing git actions across desktop operating systems workspaces."
  • Language Data Extraction Service:

    • Notes: "This layer leverages the GitLab Code Parser to extract definitions and references... accepting list of files - kicking off worker(s)/threads... extracting the data via the AST service... send message to "matcher"."
    • Crate: Core Indexer crate.
    • Description: Orchestrates parsing of multiple files using gitlab-code-parser. Used by gitlab-zoekt-indexer for the "create graph" task (gitlab-org/gitlab-zoekt-indexer#79).
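
The fan-out described in the notes (accept a file list, run workers, send results to the "matcher") can be sketched with standard-library threads and channels; the parsing step is stubbed out, and everything here is an illustrative assumption rather than the real service:

```rust
// Sketch of the Language Data Extraction Service: parse files across
// worker threads and forward extracted entities over a channel to the
// "matcher" (relationship builder).

use std::sync::mpsc;
use std::thread;

/// Stand-in for the real AST extraction via gitlab-code-parser.
fn extract_stub(path: &str) -> Vec<String> {
    vec![format!("definition-in:{path}")]
}

fn extract_all(files: Vec<String>, workers: usize) -> Vec<String> {
    let chunk_size = ((files.len() + workers.max(1) - 1) / workers.max(1)).max(1);
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for chunk in files.chunks(chunk_size) {
        let chunk = chunk.to_vec();
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for file in chunk {
                for entity in extract_stub(&file) {
                    tx.send(entity).unwrap(); // message to the "matcher"
                }
            }
        }));
    }
    drop(tx); // the receiver ends once all worker senders are dropped
    let mut out: Vec<String> = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    out.sort(); // deterministic order for the sketch
    out
}

fn main() {
    let files = vec!["a.rb".to_string(), "b.rb".to_string(), "c.rb".to_string()];
    println!("{:?}", extract_all(files, 2));
}
```

The production service would presumably use a work-stealing pool (e.g. the `ignore` crate's parallel walker mentioned in the Core Indexer epic) rather than fixed chunks.
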
  • Query Service:

    • Notes: "This service should contain all logic needed for querying data, including: Common Cypher Queries (get definitions for X file) - Extracting any node data from a file (can use Analysis Service if needed)."
    • Crate: Core Indexer crate.
    • Description: Encapsulates logic for querying the graph database using Cypher. Consumed by server-side query endpoint (gitlab-org/gitlab-zoekt-indexer#81).
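
As one example of a "common Cypher query" the Query Service would encapsulate, a get-definitions-for-file lookup might look like the following; the node labels and relationship name are assumptions loosely based on the initial schema proposal:

```rust
// Illustrative "common Cypher query" for the Query Service. The File and
// Definition labels and the CONTAINS relationship are assumptions, not
// the project's final schema.

fn definitions_for_file_query() -> String {
    // $path is a query parameter bound at execution time.
    "MATCH (f:File {path: $path})-[:CONTAINS]->(d:Definition) \
     RETURN d.fqn, d.kind"
        .to_string()
}

fn main() {
    println!("{}", definitions_for_file_query());
}
```

Keeping such queries behind one service means both the client (Language Server) and the server-side endpoint issue identical Cypher.
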

3.5. Server-Side Architecture (Knowledge Graph Server)

The Knowledge Graph Server (gitlab-org&17518) details the server-side deployment:

  • Store knowledge graphs for repositories in file-embedded Kuzu DBs...
  • Build a thin API service which runs on graph nodes...
  • To avoid building graph nodes infrastructure from scratch, we will use Zoekt nodes... extend the existing gitlab-zoekt-indexer service...
  • Create an abstraction layer on GitLab Rails side... to query graph databases using Cypher query.

Graph Creation/Update Flow (from gitlab-org&17765, Creation/update of knowledge graph for a repository): GitLab Rails schedules indexing on a Zoekt node. The gitlab-zoekt-indexer (gitlab-org/gitlab-zoekt-indexer#79) fetches files, calls the Rust knowledge graph library (Core Indexer + Code Parser) to parse and update the Kuzu DB. Replication to other nodes is then scheduled (gitlab-org/gitlab#540854, gitlab-org/gitlab-zoekt-indexer#80).

Mermaid diagram:
sequenceDiagram
    box Rails
    participant GR as GitLab Rails
    end
    box Graph node
    participant GN as Graph node (Zoekt)
    end
    box Gitalisk
    participant G as Gitalisk
    end
    box Parser
    participant P as gitlab-code-parser
    end
    box Graph DB
    participant K as Kuzu DB
    end

    GR->>GN: Schedule graph creation/update for repo X
    GN->>G: Fetch repository files for repo X (Potentially, or direct file access)
    G-->>GN: Repository files
    GN->>P: Parse repository files (via Indexer's Analysis/Language Data Extraction Service)
    P-->>GN: Parsed data (nodes, edges)
    GN->>K: Store parsed data (via Indexer's Writer Service)
    K-->>GN: Confirmation
    GN->>GR: Report graph updated for repo X
    GR->>GR: Schedule graph copy to other replicas (if any)

(Adapted from diagram in gitlab-org&17765)

Querying Flow (from gitlab-org&17766, Querying knowledge graph): Services in GitLab Rails query via a "Knowledge graph layer", which forwards Cypher queries to the zoekt-webservice on the appropriate Zoekt node, which then queries Kuzu DB.

Mermaid diagram:
flowchart TD
    subgraph GitLab Rails
    T1[Duo Chat Tool] -->|Cypher query for project X| KG_Layer(Knowledge graph layer)
    T2[Other service] -->|Cypher query for project Y| KG_Layer
    end
    KG_Layer -->|Cypher query for project X| GA1("zoekt-webservice on Zoekt Node 1")
    KG_Layer -->|Cypher query for project Y| GA2("zoekt-webservice on Zoekt Node 2")
    subgraph Zoekt Node 1
    GA1 --> K1[Kuzu DB X]
    GA1 --> K2["Kuzu DB A (replica)"]
    end
    subgraph Zoekt Node 2
    GA2 --> K21[Kuzu DB Y]
    GA2 --> K22["Kuzu DB C (replica)"]
    end

(Adapted from diagram in gitlab-org&17766)

Graph Node Management (gitlab-org&17767, Graph node management): Reuses existing Zoekt nodes and infrastructure. Rails models (KnowledgeGraphEnabledRepository, KnowledgeGraphReplica) will be added (gitlab-org/gitlab#540785) and Zoekt nodes can be designated as "zoekt only", "kuzu only", or "zoekt and kuzu" (gitlab-org/gitlab#540786).

3.6. Packaging and Distribution

  • Rust Crates: Core libraries (gitlab-code-parser, gitalisk, knowledge-graph-core-indexer) will be published as Rust crates (gitlab-org/gitlab#536080).
  • NPM Package: An NPM package is planned for client-side components, potentially including Language Server bindings. This will be managed in a separate frontend project (gitlab-org/gitlab#536079, [GItalisk] Publish NPM Package).
    • Comment from gitlab-org/gitlab#536079 by vyaklushin (2024-04-15):

      @michaelangeloio I think it's better to have a separate project for the NPM package, e.g. gitlab-org/frontend/gitlab-knowledge-graph-indexer. This way, we can manage the package independently from the main GitLab project.

    • Comment from gitlab-org/gitlab#536079 by michaelangeloio (2024-04-15):

      I created the new project: https://gitlab.com/gitlab-org/frontend/gitlab-knowledge-graph-indexer

3.7. Development Priorities

  • Linux (UNIX) first: Development will initially prioritize Linux environments.

4. Limitations and Gaps

While the overall architecture and many components are well-defined, specific implementation details for some of the testing frameworks and the precise mechanisms for server-side observability data propagation (beyond extending Zoekt monitoring) were less detailed in the researched items. The internal architecture of services like the "matcher" component (mentioned in Language Data Extraction Service notes) is also not fully elaborated.

5. Conclusion

The GitLab Knowledge Graph project is building a sophisticated, Rust-based system for code analysis and graph-based code intelligence. The architecture is modular, comprising a dedicated Code Parser for multi-language static analysis, a Core Indexer for building and managing the Kuzu graph database, and a Server component that integrates with existing Zoekt infrastructure for hosting and querying. Gitalisk provides foundational repository interaction capabilities. The system is designed for hybrid client/server operation, emphasizing performance, reusability, and deep integration with GitLab features. The detailed work items provide a clear roadmap for implementing the various services required for parsing, indexing, querying, testing, and observability.

awesome now give me detailed descriptions of new issues for each item in the CSV, make them extremely detailed.

Assistant

Okay, I will generate detailed issue descriptions for each item in your CSV based on the research report.

Here are the detailed issue descriptions:


Issue: Implement Ruby Parser for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration
Related Epic: &17516 - One Parser (gitlab-code-parser)

Project: GitLab Code Parser
Crate/Packages: Core Parser
Applies to: Client & Server
Priority: P0
Status: In progress

Description:

This issue is part of the effort to build a unified, efficient static code analysis library (gitlab-code-parser) in Rust, which is a foundational component for the GitLab Knowledge Graph. The goal is to add robust support for parsing Ruby code.

The core task is to implement the parsing logic for Ruby using tree-sitter (as established by the One Parser epic) and extract key structural information from the Abstract Syntax Tree (AST).

Specifically, the parser must be capable of extracting:

  • Definitions: Identify and extract definitions for:
    • Modules
    • Classes
    • Methods
    • Constants
  • References: Identify and extract references to code entities.
  • Imports: Identify and extract import statements, covering:
    • Relative imports
    • Module/dependency imports

The extracted data should include necessary metadata, such as Fully Qualified Names (FQNs), to be consumed by downstream components like the Knowledge Graph Indexer's Analysis Service and Relationship Builder Service.

This work is crucial for enabling the Knowledge Graph to understand and index Ruby codebases, contributing to features like code navigation, dependency analysis, and AI-assisted code understanding.


Issue: Implement Typescript/JS Parser for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration
Related Epic: &17516 - One Parser (gitlab-code-parser)
Related Issue: gitlab-org/rust/gitlab-code-parser#2

Project: GitLab Code Parser
Crate/Packages: Core Parser
Applies to: Client & Server
Priority: P0
Status: Not started
Owner: Michael Usachenko

Description:

This issue is part of the effort to build a unified, efficient static code analysis library (gitlab-code-parser) in Rust, a key component for the GitLab Knowledge Graph. The goal is to add comprehensive support for parsing TypeScript and JavaScript code.

The core task is to implement the parsing logic for TypeScript/JavaScript using tree-sitter (as established by the One Parser epic) and extract key structural information from the Abstract Syntax Tree (AST).

Specifically, the parser must be capable of extracting:

  • Definitions: Identify and extract definitions for:
    • Modules
    • Classes
    • Methods
    • Constants
  • References: Identify and extract references to code entities.
  • Imports: Identify and extract import statements, covering various syntaxes:
    • Relative imports
    • Alias imports (e.g., import { foo as bar })
    • Module/dependency imports (including named, default, namespace, and side-effect imports)
    • CommonJS require() calls

As detailed in gitlab-org/rust/gitlab-code-parser#2, special attention is needed to correctly handle the diverse import syntaxes in the JavaScript/TypeScript ecosystem and map them to appropriate AST nodes and capture names.

The extracted data should include necessary metadata, such as Fully Qualified Names (FQNs), to be consumed by downstream components like the Knowledge Graph Indexer's Analysis Service and Relationship Builder Service.

This work is crucial for enabling the Knowledge Graph to understand and index TypeScript/JavaScript codebases, contributing to features like code navigation, dependency analysis, and AI-assisted code understanding.


Issue: Implement GoLang Parser for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration
Related Epic: &17516 - One Parser (gitlab-code-parser)
Related Issue: gitlab-org/gitlab#536081 (POC for Go package)

Project: GitLab Code Parser
Crate/Packages: Core Parser
Applies to: Client & Server
Priority: P0
Status: Not started

Description:

This issue is part of the effort to build a unified, efficient static code analysis library (gitlab-code-parser) in Rust, a key component for the GitLab Knowledge Graph. The goal is to add support for parsing Go code.

The core task is to implement the parsing logic for Go using tree-sitter (as established by the One Parser epic) and extract key structural information from the Abstract Syntax Tree (AST).

Specifically, the parser must be capable of extracting:

  • Definitions: Identify and extract definitions (e.g., functions, types, constants, variables).
  • References: Identify and extract references to code entities.
  • Imports: Identify and extract import statements (package imports).

The extracted data should include necessary metadata, such as Fully Qualified Names (FQNs), to be consumed by downstream components like the Knowledge Graph Indexer's Analysis Service and Relationship Builder Service.

Adding Go support is essential for extending the Knowledge Graph's capabilities to Go, a language used heavily both within GitLab and across its users' codebases. The One Parser is also intended to be consumable from projects written in other languages, including Go, as explored in gitlab-org/gitlab#536081.


Issue: Implement Python Parser for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17516 - One Parser (gitlab-code-parser) Related Epic: &18004 - First iteration of the Python parser Related Issues: gitlab-org/rust/gitlab-code-parser#3, gitlab-org/rust/gitlab-code-parser#5, gitlab-org/rust/gitlab-code-parser#8, gitlab-org/rust/gitlab-code-parser#6

Project: GitLab Code Parser Crate/Packages: Core Parser Applies to: Client & Server Priority: P0 Status: Not started Owner: Jonathan Shobrook

Description:

This issue is part of the effort to build a unified, efficient static code analysis library (gitlab-code-parser) in Rust, a key component for the GitLab Knowledge Graph. The goal is to add comprehensive support for parsing Python code, as detailed in the related epic &18004.

The core task is to implement the parsing logic for Python using tree-sitter (as established by the One Parser epic) and extract key structural information from the Abstract Syntax Tree (AST). The implementation should leverage ast-grep for defining pattern-matching rules where appropriate.

Specifically, the parser must be capable of extracting:

  • Definitions: Identify and extract definitions for named functions, classes, and class methods (excluding anonymous functions). This is tracked in gitlab-org/rust/gitlab-code-parser#3.
  • References: Identify and extract references, specifically class instantiations and function calls. This is tracked in gitlab-org/rust/gitlab-code-parser#5.
  • Imports: Identify and extract import statements, covering various syntaxes (import X, from X import Y, relative imports, and handling special cases like star imports and conditional imports). The parser should return imported symbols and their Fully Qualified Names (FQNs). This is tracked in gitlab-org/rust/gitlab-code-parser#8.

The parser should also compute a Fully Qualified Name (FQN) for each extracted definition and reference. Resolving FQNs for references in Python is complex due to its dynamic nature; the aim is for "pretty good" resolution using techniques like scope tracking, as discussed in gitlab-org/rust/gitlab-code-parser#6. Full resolution of imported function locations will be handled by the Knowledge Graph Indexer.
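
As an illustration of the scope-tracking idea, a minimal std-only sketch (the enum and event list are hypothetical stand-ins for walking a tree-sitter AST):

```rust
// Hypothetical events seen while walking a Python module's AST in document order.
enum ScopeEvent<'a> {
    Enter(&'a str), // entering a class or function definition with this name
    Exit,           // leaving the innermost definition
}

/// Join the module path, enclosing scopes, and the definition name into an FQN.
fn fqn(module: &str, scopes: &[&str], name: &str) -> String {
    std::iter::once(module)
        .chain(scopes.iter().copied())
        .chain(std::iter::once(name))
        .collect::<Vec<_>>()
        .join(".")
}

/// Walk the events, maintaining a scope stack, and emit one FQN per definition.
fn collect_fqns(module: &str, events: &[ScopeEvent<'_>]) -> Vec<String> {
    let mut stack: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    for ev in events {
        match ev {
            ScopeEvent::Enter(name) => {
                out.push(fqn(module, &stack, name));
                stack.push(*name);
            }
            ScopeEvent::Exit => {
                stack.pop();
            }
        }
    }
    out
}
```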

The extracted data should be consumed by downstream components like the Knowledge Graph Indexer's Analysis Service and Relationship Builder Service.

Adding Python support is essential for enabling the Knowledge Graph to understand and index Python codebases, contributing to features like code navigation, dependency analysis, and AI-assisted code understanding.


Issue: Implement Kotlin Parser for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17516 - One Parser (gitlab-code-parser)

Project: GitLab Code Parser Crate/Packages: Core Parser Applies to: Client & Server Priority: P0 Status:

Description:

This issue is part of the effort to build a unified, efficient static code analysis library (gitlab-code-parser) in Rust, a key component for the GitLab Knowledge Graph. The goal is to add support for parsing Kotlin code.

The core task is to implement the parsing logic for Kotlin using tree-sitter (as established by the One Parser epic) and extract key structural information from the Abstract Syntax Tree (AST).

Specifically, the parser must be capable of extracting:

  • Definitions: Identify and extract definitions (e.g., classes, objects, interfaces, functions, properties).
  • References: Identify and extract references to code entities.
  • Imports: Identify and extract import statements.

The extracted data should include necessary metadata, such as Fully Qualified Names (FQNs), to be consumed by downstream components like the Knowledge Graph Indexer's Analysis Service and Relationship Builder Service.

Adding Kotlin support is essential for extending the Knowledge Graph's capabilities to codebases written in Kotlin.


Issue: Implement Unit Testing Framework for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17516 - One Parser (gitlab-code-parser) Related Issue: gitlab-org/gitlab#536077

Project: GitLab Code Parser Crate/Packages: Core Parser Applies to: Client & Server Priority: Status:

Description:

This issue is to establish a robust unit testing framework within the gitlab-code-parser project, specifically for the Core Parser crate.

The gitlab-code-parser is a critical component of the Knowledge Graph, responsible for accurately parsing various programming languages. Unit tests are essential to ensure the correctness and reliability of the parsing logic for each language and the core AST extraction mechanisms.

This framework should facilitate writing isolated tests for individual functions, modules, and components within the Core Parser crate. It should integrate with Rust's standard testing capabilities or leverage appropriate testing libraries.
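
For example, a table-driven test using the built-in harness might look like this (`detect_language` is a hypothetical helper; real tests would target the parser's actual extraction functions):

```rust
/// Hypothetical helper: map a file path to a language by extension.
fn detect_language(path: &str) -> Option<&'static str> {
    // Simplification: only looks at the text after the last '.'.
    match path.rsplit('.').next()? {
        "rb" => Some("ruby"),
        "ts" | "tsx" => Some("typescript"),
        "go" => Some("go"),
        "py" => Some("python"),
        "kt" => Some("kotlin"),
        _ => None,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn maps_extensions_to_languages() {
        let cases = [
            ("lib/a.rb", Some("ruby")),
            ("src/x.ts", Some("typescript")),
            ("README", None),
        ];
        for (path, expected) in cases {
            assert_eq!(detect_language(path), expected, "path: {path}");
        }
    }
}
```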

The initial setup of the gitlab-code-parser project (gitlab-org/gitlab#536077) included adding a unit test; this issue focuses on building out the comprehensive framework and test suite.

Implementing a strong unit testing framework is vital for maintaining code quality, preventing regressions, and enabling confident development as new languages and features are added to the parser.


Issue: Implement Integration Testing Framework for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17516 - One Parser (gitlab-code-parser)

Project: GitLab Code Parser Crate/Packages: Core Parser Applies to: Client & Server Priority: Status:

Description:

This issue is to establish an integration testing framework for the gitlab-code-parser project, specifically focusing on the Core Parser crate.

While unit tests verify individual components, integration tests are necessary to ensure that different parts of the parser work correctly together, particularly how the tree-sitter parsing interacts with the AST data extraction and the generation of structured output (definitions, references, imports).

This framework should allow testing the parser's behavior on realistic code snippets or files for supported languages, verifying that the extracted data matches expected outputs. It should cover scenarios involving complex code structures, various import types, and edge cases.

Implementing an integration testing framework is crucial for validating the end-to-end parsing process and ensuring the gitlab-code-parser reliably provides the data required by downstream Knowledge Graph components.


Issue: Implement Benchmarking/Performance Framework for GitLab Code Parser

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17516 - One Parser (gitlab-code-parser) Related Issue: gitlab-org/gitlab#536077

Project: GitLab Code Parser Crate/Packages: Core Parser Applies to: Client & Server Priority: Status:

Description:

This issue is to establish a benchmarking and performance testing framework for the gitlab-code-parser project, specifically for the Core Parser crate.

The gitlab-code-parser is designed to be efficient and performant, as it will process potentially large codebases. It's critical to measure and monitor its performance over time and identify bottlenecks.

This framework should allow for:

  • Benchmarking parsing speed for different languages and file sizes.
  • Measuring memory usage during parsing.
  • Tracking performance changes as the codebase evolves or new languages are added.
  • Running performance tests on representative code samples.
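
As a placeholder until a dedicated harness (e.g. criterion) is chosen, a minimal std-only timing helper could look like this (`parse_source` is a hypothetical stand-in for the real parser entry point):

```rust
use std::time::{Duration, Instant};

/// Stand-in workload: "parse" by counting lines.
pub fn parse_source(source: &str) -> usize {
    source.lines().count()
}

/// Run `f` `iters` times and return the mean wall-clock duration per call.
pub fn bench<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}
```

A call site would wrap the measured work in `std::hint::black_box` so the optimizer cannot elide it, e.g. `bench(100, || { std::hint::black_box(parse_source(&src)); })`.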

The initial setup of the gitlab-code-parser project (gitlab-org/gitlab#536077) mentioned the importance of benchmarking; this issue focuses on building out the dedicated framework.

Implementing a benchmarking framework is essential for ensuring the gitlab-code-parser meets performance requirements and remains a fast and efficient component of the Knowledge Graph.


Issue: Implement Unit Testing Framework for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate, CLI crate, Language Server Bindings crate Applies to: Client & Server Priority: Status:

Description:

This issue is to establish a robust unit testing framework within the knowledge-graph-indexer project, covering its core components: the Core Indexer crate, CLI crate, and Language Server Bindings crate.

The Knowledge Graph Indexer is the central engine responsible for consuming parsed data, building relationships, and writing to the graph database. Unit tests are essential to ensure the correctness and reliability of its various services (e.g., Database Connection, Schema, Relationship Builder, Writers, Query Service).

This framework should facilitate writing isolated tests for individual functions, modules, and components within these crates. It should integrate with Rust's standard testing capabilities or leverage appropriate testing libraries.

Implementing a strong unit testing framework is vital for maintaining code quality, preventing regressions, and enabling confident development across the different parts of the indexer.


Issue: Implement Integration Testing Framework for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: CLI crate, Language Server Bindings crate, Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to establish an integration testing framework for the knowledge-graph-indexer project, covering its key components: the CLI crate, Language Server Bindings crate, and Core Indexer crate.

While unit tests verify individual components, integration tests are necessary to ensure that the various services and crates within the indexer work correctly together and with external dependencies like the gitlab-code-parser and Kuzu DB.

This framework should allow testing scenarios such as:

  • End-to-end indexing of a small repository (reading files, parsing via Analysis Service, building relationships via Relationship Builder, writing to Kuzu via Writer Services).
  • Testing the interaction between the CLI or Language Server Bindings and the Core Indexer logic.
  • Verifying the correctness of data written to and queried from Kuzu DB.

Implementing an integration testing framework is crucial for validating the end-to-end indexing and querying processes and ensuring the Knowledge Graph Indexer functions correctly in realistic scenarios.


Issue: Implement Benchmarking/Performance Framework for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate, CLI crate Applies to: Client & Server Priority: Status:

Description:

This issue is to establish a benchmarking and performance testing framework for the knowledge-graph-indexer project, focusing on the Core Indexer crate and CLI crate.

The Knowledge Graph Indexer's performance is critical, especially for indexing large repositories and providing fast query responses. It's essential to measure and monitor its performance over time and identify bottlenecks.

This framework should allow for:

  • Benchmarking the end-to-end indexing process (initial bulk indexing and incremental updates).
  • Measuring the performance of key services like Relationship Builder and Writer Services.
  • Benchmarking query execution times for common Cypher queries via the Query Service.
  • Tracking performance changes as the codebase evolves or the graph schema changes.
  • Running performance tests on representative codebases and graph sizes.

Implementing a benchmarking framework is essential for ensuring the Knowledge Graph Indexer meets performance requirements for both client-side and server-side use cases.


Issue: Implement Client Side Usage Collector Service

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Applies to: Client Priority: Status:

Description:

This issue is to implement a service within the Knowledge Graph Indexer specifically for collecting usage data from client-side components (e.g., the CLI or Language Server).

This service will be responsible for capturing metrics related to how the indexer is used locally, such as indexing duration, features used, etc. This data is valuable for understanding user behavior, identifying popular features, and informing future development.

The service should leverage the sentry-rust crate (https://github.com/getsentry/sentry-rust) for reporting usage data. It should integrate with the Abstract Metric Service to provide a standardized interface for collecting and flushing data.

This work contributes to the overall observability of the Knowledge Graph Indexer in client-side environments.


Issue: Implement Client Side Error Collector Service

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Applies to: Client Priority: Status:

Description:

This issue is to implement a service within the Knowledge Graph Indexer specifically for collecting error data from client-side components (e.g., the CLI or Language Server).

This service will be responsible for capturing errors and exceptions that occur during client-side operations, such as parsing failures, indexing errors, or database issues. This data is crucial for debugging, identifying stability problems, and improving the reliability of the indexer.

The service should leverage the sentry-rust crate (https://github.com/getsentry/sentry-rust) for reporting error data. It should integrate with the Abstract Error Service to provide a standardized interface for collecting and flushing errors.

This work contributes to the overall observability and stability of the Knowledge Graph Indexer in client-side environments.


Issue: Implement Server Side Usage Collector Service

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project Related Epic: &17767 - Graph node management Related Issue: gitlab-org/gitlab#540844

Project: Knowledge Graph Indexer Crate/Packages: Applies to: Server Priority: Status:

Description:

This issue is to implement a service within the Knowledge Graph Indexer specifically for collecting usage data from server-side components running on Graph nodes (Zoekt nodes).

This service will be responsible for capturing metrics related to how the indexer is used on the server, such as indexing duration for server-scheduled jobs, query execution counts, etc. This data is valuable for monitoring the health and performance of the server-side indexing infrastructure.

The mechanism for collecting and propagating metrics from the indexer running on Zoekt nodes needs to be confirmed (TBD). Research indicates that this might involve extending existing Zoekt monitoring capabilities, as discussed in gitlab-org/gitlab#540844, which focuses on adding monitoring for graph DBs including metrics like disk/memory usage, DB connections, request counts, and performance timings.

This service should integrate with the Abstract Metric Service to provide a standardized interface for collecting and flushing data, which will then be handled by the server-side reporting mechanism.

This work contributes to the overall observability of the Knowledge Graph Indexer in server-side environments.


Issue: Implement Server Side Error Collector Service

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project Related Epic: &17767 - Graph node management Related Issue: gitlab-org/gitlab#540844

Project: Knowledge Graph Indexer Crate/Packages: Applies to: Server Priority: Status:

Description:

This issue is to implement a service within the Knowledge Graph Indexer specifically for collecting error data from server-side components running on Graph nodes (Zoekt nodes).

This service will be responsible for capturing errors and exceptions that occur during server-side operations, such as indexing job failures, database errors, or query execution problems. This data is crucial for debugging, identifying stability problems, and improving the reliability of the server-side indexing infrastructure.

The mechanism for collecting and propagating errors from the indexer running on Zoekt nodes needs to be confirmed (TBD). Research indicates that this might involve extending existing Zoekt monitoring capabilities, as discussed in gitlab-org/gitlab#540844, which focuses on adding monitoring for graph DBs.

This service should integrate with the Abstract Error Service to provide a standardized interface for collecting and flushing errors, which will then be handled by the server-side reporting mechanism.

This work contributes to the overall observability and stability of the Knowledge Graph Indexer in server-side environments.


Issue: Implement Abstract Metric Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement an abstract service within the Core Indexer crate of the Knowledge Graph Indexer for collecting metrics.

This service should define a generic interface (e.g., a trait or struct) that represents a "collector". This interface should include methods like collect (to record a specific metric event) and flush (to trigger the sending of collected metrics to a reporting backend).
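
A minimal sketch of such an interface, with an in-memory implementation (all names here are assumptions, not a settled API):

```rust
/// A metric event recorded somewhere in the indexer.
pub struct MetricEvent {
    pub name: &'static str,
    pub value: f64,
}

/// Abstract collector: components call `collect`; a concrete backend
/// (e.g. Sentry on the client) decides what `flush` actually does.
pub trait MetricCollector {
    fn collect(&mut self, event: MetricEvent);
    fn flush(&mut self);
}

/// In-memory implementation, useful in tests and as a no-op default.
#[derive(Default)]
pub struct InMemoryCollector {
    pub events: Vec<MetricEvent>,
    pub flushed: usize,
}

impl MetricCollector for InMemoryCollector {
    fn collect(&mut self, event: MetricEvent) {
        self.events.push(event);
    }
    fn flush(&mut self) {
        self.flushed += self.events.len();
        self.events.clear();
    }
}
```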

The purpose of this abstract service is to decouple the metric collection logic used throughout the indexer codebase from the specific reporting mechanism (e.g., Sentry on the client, or a TBD server-side system). Components within the indexer will interact with this abstract service, and concrete implementations will handle the actual sending of data based on whether the indexer is running client-side or server-side.

This service is fundamental for building a consistent and flexible observability framework within the Knowledge Graph Indexer.


Issue: Implement Abstract Error Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement an abstract service within the Core Indexer crate of the Knowledge Graph Indexer for collecting errors.

Similar to the Abstract Metric Service, this service should define a generic interface (e.g., a trait or struct) that represents an "error collector". This interface should include methods for capturing error details and potentially triggering reporting.

The purpose of this abstract service is to decouple the error collection logic used throughout the indexer codebase from the specific reporting mechanism (e.g., Sentry on the client, or a TBD server-side system). Components within the indexer will interact with this abstract service, and concrete implementations will handle the actual sending of error data.

This service is fundamental for building a consistent and flexible error reporting framework within the Knowledge Graph Indexer, crucial for debugging and improving stability.


Issue: Implement Database Connection Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement a Database Connection Service within the Core Indexer crate of the Knowledge Graph Indexer.

This service will act as a wrapper around the Kuzu DB connection and database objects. Its primary responsibility is to provide a standardized API for common database operations required by the indexer, such as:

  • Establishing and managing connections to the Kuzu database file.
  • Executing write operations.
  • Executing read operations (Cypher queries).
  • Handling bulk data loading operations (e.g., COPY FROM).

By encapsulating Kuzu interactions within this service, the rest of the indexer codebase remains decoupled from the specifics of the database library, making it easier to manage connections and perform operations consistently across different parts of the indexer (e.g., Writer Services, Query Service). This service will be used by both client-side and server-side indexer components.
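
One possible shape for the wrapper's surface, sketched with a recording stub instead of real Kuzu handles (trait and method names are assumptions; a concrete implementation would hold the Kuzu database/connection objects and translate their errors):

```rust
/// Assumed wrapper API over the Kuzu connection.
pub trait GraphDatabase {
    /// Execute a write statement (schema DDL or data mutation).
    fn execute(&mut self, cypher: &str) -> Result<(), String>;
    /// Run a read-only Cypher query, returning rows of string values (simplified).
    fn query(&mut self, cypher: &str) -> Result<Vec<Vec<String>>, String>;
    /// Bulk-load a node or relationship table via `COPY ... FROM`.
    fn copy_from(&mut self, table: &str, csv_path: &str) -> Result<(), String>;
}

/// Stub that records statements instead of talking to Kuzu; useful in tests.
#[derive(Default)]
pub struct RecordingDatabase {
    pub statements: Vec<String>,
}

impl GraphDatabase for RecordingDatabase {
    fn execute(&mut self, cypher: &str) -> Result<(), String> {
        self.statements.push(cypher.to_string());
        Ok(())
    }
    fn query(&mut self, _cypher: &str) -> Result<Vec<Vec<String>>, String> {
        Ok(Vec::new())
    }
    fn copy_from(&mut self, table: &str, csv_path: &str) -> Result<(), String> {
        self.execute(&format!("COPY {table} FROM '{csv_path}' (HEADER=true)"))
    }
}
```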


Issue: Implement Database Tables / Schema Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement a Database Tables / Schema Service within the Core Indexer crate of the Knowledge Graph Indexer.

This service is responsible for defining and managing the schema of the Kuzu graph database used by the Knowledge Graph. It should provide a type-safe API for representing the various node labels, relationship types, and their properties.

Key responsibilities include:

  • Defining the structure of nodes (e.g., Repository, File, Class, Function) and their properties.
  • Defining the structure of relationships (e.g., CONTAINS, USES, IMPORTS) and their properties.
  • Providing methods or structures that allow other services, particularly the Writer Services (Bulk and Incremental), to interact with the schema in a structured and type-safe manner when creating or updating nodes and relationships.

An initial schema proposal exists (gitlab-org/gitlab#508973) and should serve as a starting point. This service ensures consistency in how data is modeled and written to the graph database across both client and server implementations.
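
As a sketch only, and not the schema from gitlab-org/gitlab#508973, the Kuzu DDL this service manages might take roughly this shape (table and property names are assumptions):

```cypher
// Illustrative Kuzu DDL; the real schema comes from the proposal issue.
CREATE NODE TABLE Directory(path STRING, PRIMARY KEY (path));
CREATE NODE TABLE File(path STRING, language STRING, PRIMARY KEY (path));
CREATE NODE TABLE Definition(fqn STRING, kind STRING, PRIMARY KEY (fqn));
CREATE REL TABLE CONTAINS(FROM Directory TO File);
CREATE REL TABLE DEFINES(FROM File TO Definition);
```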


Issue: Implement Analysis Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project Related Epic: &17516 - One Parser (gitlab-code-parser)

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement the Analysis Service within the Core Indexer crate of the Knowledge Graph Indexer.

This service acts as the primary interface for the indexer to interact with the gitlab-code-parser. Its responsibility is to take a file's path and raw content, determine the language, and invoke the appropriate parser to extract structured code intelligence data.

Key functions of this service include:

  • Accepting a file path and its content as input.
  • Identifying the programming language of the file.
  • Invoking the gitlab-code-parser for the detected language.
  • Processing the output from the parser to return relevant data, such as definitions, imports, references, and associated metadata (like Fully Qualified Names).
  • Potentially exposing language-specific nuances to consumers via generic types or enums, although the goal is a relatively standardized output format for the Relationship Builder.

This service is crucial as it bridges the gap between raw source code and the structured data needed to build the knowledge graph, serving both client and server indexing processes.


Issue: Implement Relationship Builder Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement the Relationship Builder Service within the Core Indexer crate of the Knowledge Graph Indexer.

This service is a core piece of the indexing logic, responsible for taking the structured data extracted by the Analysis Service (from the gitlab-code-parser) and resolving the relationships between different code entities. This corresponds to the "Node matching logic" mentioned in the Core Indexer epic.

The service will process the extracted definitions, references, and imports and build the connections that form the graph. Key relationships to be resolved include:

  • Directory -> Directory (containment)
  • Directory -> File (containment)
  • File -> Definitions (containment of classes, methods, types, constants, etc.)
  • File -> References (containment)
  • References -> Definitions (linking a reference to its corresponding definition)

The output of this service should be a set of nodes and relationships that conform to the schema defined by the Database Tables / Schema Service, ready to be inserted into the Kuzu DB by the Writer Services. This service is essential for transforming parsed data into a connected graph structure, applicable to both client and server indexing.
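
The `References -> Definitions` step could be sketched as FQN-based matching (types are assumptions; real matching must also handle imports, scoping, and ambiguous or partial FQNs):

```rust
use std::collections::HashMap;

struct Definition { fqn: String, file: String }
struct Reference  { fqn: String, file: String }

/// A `References -> Definitions` edge, ready for the Writer Services.
#[derive(Debug, PartialEq)]
struct RefersTo { from_file: String, to_file: String, fqn: String }

/// Index definitions by FQN, then resolve each reference against the index.
fn link(defs: &[Definition], refs: &[Reference]) -> Vec<RefersTo> {
    let by_fqn: HashMap<&str, &Definition> =
        defs.iter().map(|d| (d.fqn.as_str(), d)).collect();
    refs.iter()
        .filter_map(|r| by_fqn.get(r.fqn.as_str()).map(|d| RefersTo {
            from_file: r.file.clone(),
            to_file: d.file.clone(),
            fqn: r.fqn.clone(),
        }))
        .collect()
}
```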


Issue: Implement General Writer Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement the General Writer Service within the Core Indexer crate of the Knowledge Graph Indexer.

This service acts as the main entry point for writing data (nodes and relationships) into the Kuzu graph database. Its primary responsibility is to determine the appropriate writing strategy based on the context of the indexing operation.

Key functions include:

  • Receiving data (nodes and relationships) from the Relationship Builder Service.
  • Determining whether a bulk write or an incremental write operation is required. This might be based on arguments passed to the indexer (e.g., an "init" flag for initial indexing) or internal logic detecting the scope of changes.
  • Delegating the actual write operation to either the Bulk Writer Service or the Incremental Writer Service.

This service provides a unified interface for data persistence and manages the different strategies needed for efficient graph population and updates, serving both client and server indexing processes.
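
The strategy decision might be sketched as follows (names and the threshold are assumptions, not settled behavior):

```rust
#[derive(Debug, PartialEq)]
pub enum WriteStrategy {
    Bulk,        // COPY FROM-based load, handled by the Bulk Writer Service
    Incremental, // targeted MERGE/SET/DELETE, handled by the Incremental Writer Service
}

pub struct WriteBatch {
    pub initial: bool, // set when the indexer runs with an "init" flag
    pub changed_files: usize,
    pub total_files: usize,
}

/// Use bulk writes for initial indexing or large change sets; otherwise
/// apply targeted incremental updates. The 10% threshold is illustrative.
pub fn choose_strategy(batch: &WriteBatch) -> WriteStrategy {
    if batch.initial || batch.changed_files * 10 > batch.total_files {
        WriteStrategy::Bulk
    } else {
        WriteStrategy::Incremental
    }
}
```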


Issue: Implement Bulk Writer Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server Priority: Status:

Description:

This issue is to implement the Bulk Writer Service within the Core Indexer crate of the Knowledge Graph Indexer. This service is a sub-layer of the General Writer Service.

This service is optimized for performing large-scale data writes to the Kuzu graph database. Its primary use cases are:

  • Initial Indexing: Populating the database for the first time from a full codebase scan. This should leverage Kuzu's efficient COPY FROM command for bulk data loading.
  • Large Updates: Handling scenarios where a significant portion of the graph needs to be updated or replaced, such as after a major refactoring or branch change. This might involve bulk deletion and re-insertion of nodes and relationships.

The service should interact with the Database Connection Service to execute these bulk operations efficiently. It must handle potential performance considerations and error handling for large data volumes. This service is crucial for the initial setup and major updates of the knowledge graph, applicable to both client and server indexing.
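
Kuzu's bulk load takes roughly this shape (table names and file names here are assumptions):

```cypher
// Illustrative bulk load from pre-generated CSV files.
COPY File FROM 'file_nodes.csv' (HEADER=true);
COPY CONTAINS FROM 'contains_edges.csv' (HEADER=true);
```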


Issue: Implement Incremental Writer Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project

Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client Priority: Status:

Description:

This issue is to implement the Incremental Writer Service within the Core Indexer crate of the Knowledge Graph Indexer. This service is a sub-layer of the General Writer Service.

This service is designed for handling small, targeted updates to the Kuzu graph database, primarily for client-side use cases like changes within a single file in an IDE.

Key functions include:

  • Receiving data (nodes and relationships) representing changes (additions, modifications, deletions) from the Relationship Builder Service.
  • Executing targeted updates using Cypher queries, specifically MERGE (for creating or matching nodes/relationships), SET (for updating properties), and DELETE (for removing nodes/relationships).
  • Interacting with the Database Connection Service to execute these Cypher queries.

This service is essential for keeping the client-side knowledge graph up-to-date efficiently as developers make small changes to their code, avoiding the need for full re-indexing. It applies primarily to the Client, which aligns with the use case of reacting to local file changes.
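
The targeted updates described above might look like the following illustrative Cypher (labels, properties, and parameters are assumptions, not the final schema):

```cypher
// Upsert the file node and its definitions after a re-parse.
MERGE (f:File {path: $path})
SET f.language = $language;

MERGE (d:Definition {fqn: $fqn})
SET d.kind = $kind;

// Remove definitions that no longer exist in the re-parsed file.
MATCH (f:File {path: $path})-[r:DEFINES]->(d:Definition)
WHERE NOT d.fqn IN $current_fqns
DELETE r, d;
```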


Issue: Implement Workspace Management Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration Related Epic: &17517 - Knowledge Graph Core Indexer Project Related Epic: &17515 - Gitalisk (Repository Service v2)

Project: Knowledge Graph Indexer Crate/Packages: CLI crate, Language Server Bindings crate Applies to: Client Priority: Status:

Description:

This issue is to implement the Workspace Management Service within the Knowledge Graph Indexer, specifically for the CLI crate and Language Server Bindings crate. This service is crucial for the client-side user experience.

The service is responsible for managing the context of one or more repositories that the user wants to index and query locally (a "workspace folder").

Key functions include:

  • Accepting a workspace URI or path from the user.
  • Traversing the local file system within the workspace to discover individual repositories.
  • Performing local repository operations, potentially leveraging the Gitalisk library (&17515), such as identifying the repository root, checking status, or (in the future) obtaining basic git data.
  • Providing a mechanism to associate a local repository with its corresponding remote on GitLab or GitHub.
  • Managing the state of the workspace and the discovered repositories for use by other client-side indexer components.

This service is fundamental for enabling the Knowledge Graph Indexer to operate effectively on a developer's local machine, supporting multi-repository workflows and integrating with IDEs via the Language Server.
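The repository-discovery step above can be sketched as a depth-limited walk that treats any directory containing a `.git` entry as a repository root. This is a sketch under stated assumptions: `discover_repositories` and its depth limit are illustrative, and the real service (possibly delegating to Gitalisk) would also need to handle symlinks, ignore rules, and nested repositories.

```rust
// Hypothetical sketch of workspace traversal: find repository roots under a
// workspace folder. A "repository" here is simply a directory containing a
// `.git` entry; this is an assumption, not the service's actual contract.

use std::fs;
use std::path::{Path, PathBuf};

fn discover_repositories(workspace: &Path, max_depth: usize) -> Vec<PathBuf> {
    let mut repos = Vec::new();
    walk(workspace, max_depth, &mut repos);
    repos.sort();
    repos
}

fn walk(dir: &Path, depth: usize, repos: &mut Vec<PathBuf>) {
    if dir.join(".git").exists() {
        // Found a repository root; stop descending (submodules and nested
        // repositories would need dedicated handling).
        repos.push(dir.to_path_buf());
        return;
    }
    if depth == 0 {
        return;
    }
    if let Ok(entries) = fs::read_dir(dir) {
        for entry in entries.flatten() {
            let path = entry.path();
            if path.is_dir() {
                walk(&path, depth - 1, repos);
            }
        }
    }
}
```

The depth limit keeps traversal cheap on large workspace folders; the discovered roots would then feed the remote-association and state-management functions listed above.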


Issue: Implement Language Data Extraction Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration
Related Epic: &17517 - Knowledge Graph Core Indexer Project
Related Epic: &17516 - One Parser (gitlab-code-parser)

Project: Knowledge Graph Indexer
Crate/Packages: Core Indexer crate
Applies to: Client & Server
Priority:
Status:

Description:

This issue is to implement the Language Data Extraction Service within the Core Indexer crate of the Knowledge Graph Indexer.

This service orchestrates the parsing of multiple files within a codebase using the gitlab-code-parser. It is a higher-level service than the Analysis Service, which focuses on single files.

Key functions include:

  • Accepting a list of file paths to be processed.
  • Spawning worker threads or other parallel processing mechanisms to iterate over the list of files efficiently.
  • For each file, utilizing the Analysis Service to extract data (definitions, references, imports) via the AST provided by the gitlab-code-parser.
  • Collecting the extracted data from all processed files.
  • Sending the extracted data to the "matcher" component (which is part of the Relationship Builder Service) for relationship resolution.

This service is used by the gitlab-zoekt-indexer for the "create graph" task and is essential for efficiently processing entire repositories during initial indexing or large updates on both client and server sides.
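The fan-out described above might look like the following, using standard-library threads and a channel to collect per-file results. The `analyze` closure stands in for the Analysis Service plus gitlab-code-parser, and the `(file, count)` result type is a placeholder, not the real extracted-data shape.

```rust
// Hypothetical sketch: chunk a file list across worker threads, run a
// per-file analysis closure on each, and collect the results (which would
// then be sent on to the matcher in the Relationship Builder Service).

use std::sync::mpsc;
use std::thread;

fn extract_all<F>(files: Vec<String>, workers: usize, analyze: F) -> Vec<(String, usize)>
where
    F: Fn(&str) -> usize + Send + Clone + 'static,
{
    let workers = workers.max(1);
    // Ceiling division so every file lands in some chunk.
    let chunk = ((files.len() + workers - 1) / workers).max(1);
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for batch in files.chunks(chunk) {
        let batch: Vec<String> = batch.to_vec();
        let tx = tx.clone();
        let analyze = analyze.clone();
        handles.push(thread::spawn(move || {
            for file in batch {
                let result = analyze(&file);
                tx.send((file, result)).unwrap();
            }
        }));
    }
    drop(tx); // close the original sender so the collector loop terminates
    let mut results: Vec<_> = rx.into_iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort();
    results
}
```

A real implementation would more likely use a work-stealing pool (e.g. rayon) than fixed chunks, but the shape is the same: parallel per-file analysis, one aggregated result set.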


Issue: Implement Query Service for Knowledge Graph Indexer

Epic: &17514 - Knowledge Graph First Iteration
Related Epic: &17517 - Knowledge Graph Core Indexer Project
Related Epic: &17766 - Querying knowledge graph

Project: Knowledge Graph Indexer
Crate/Packages: Core Indexer crate
Applies to: Client & Server
Priority:
Status:

Description:

This issue is to implement the Query Service within the Core Indexer crate of the Knowledge Graph Indexer.

This service encapsulates all the logic required for querying the Kuzu graph database using the Cypher query language. It provides an API for other components (both client-side and server-side) to retrieve information from the knowledge graph without needing to interact directly with the database connection or write raw Cypher.

Key functions include:

  • Providing methods for common query patterns (e.g., "get all definitions in file X", "find all references to method Y", "show imports for file Z").
  • Constructing and executing Cypher queries via the Database Connection Service.
  • Processing the results returned by Kuzu DB into a usable format.
  • Leveraging the Analysis Service when a query requires extracting data from specific file contents.

This service is consumed by the server-side zoekt-webservice query endpoint and client-side components needing to read from the graph. It is fundamental for enabling features that rely on querying the knowledge graph.
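One way to picture the API surface described above: typed query patterns that compile down to Cypher before being handed to the Database Connection Service. The labels (`File`, `Definition`), relationship types (`DEFINES`, `REFERS_TO`, `IMPORTS`), and property names below are assumptions for illustration, not the production schema.

```rust
// Hypothetical sketch of the Query Service's surface: callers pick a common
// query pattern; the service owns the Cypher. A real implementation would
// use parameterized queries, not string interpolation.

enum Query<'a> {
    DefinitionsInFile(&'a str),
    ReferencesTo(&'a str),
    ImportsOf(&'a str),
}

fn build_query(q: &Query) -> String {
    match q {
        Query::DefinitionsInFile(path) => format!(
            "MATCH (f:File {{path: '{path}'}})-[:DEFINES]->(d:Definition) RETURN d"
        ),
        Query::ReferencesTo(fqn) => format!(
            "MATCH (r:Reference)-[:REFERS_TO]->(d:Definition {{fqn: '{fqn}'}}) RETURN r"
        ),
        Query::ImportsOf(path) => format!(
            "MATCH (f:File {{path: '{path}'}})-[:IMPORTS]->(m) RETURN m"
        ),
    }
}
```

Encapsulating Cypher construction behind typed patterns is what lets both the zoekt-webservice endpoint and client-side consumers read from the graph without writing raw Cypher.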


Development Priority: Linux (UNIX) First

Epic: &17514 - Knowledge Graph First Iteration

Project: N/A (applies broadly)
Crate/Packages: N/A
Applies to: N/A
Priority:
Status:

Description:

This item reflects a key development priority for the Knowledge Graph initiative: focusing initial development efforts on Linux (UNIX) environments.

While the long-term vision is for the Knowledge Graph components (like the Code Parser and Indexer) to be cross-platform (macOS, Linux, Windows), the initial development and testing will prioritize Linux.

This means that features, build processes, and testing infrastructure should first be established and stabilized on Linux before full support for other operating systems is implemented. This approach helps streamline initial development and address potential platform-specific issues incrementally.

Developers working on any component of the Knowledge Graph should be aware of this priority and ensure their work is compatible with or initially targeting Linux environments.