go through this epic https://gitlab.com/groups/gitlab-org/-/epics/17514 and then give me new issue descriptions for each work item based off the architecture. Make sure each work item is appropriately researched. Make sure to limit to just that epic and sub epics.
Give me each description in GitLab Flavored markdown with clear separations.
Task,Notes,Theme,Project,Crate/Packages in Project,Applies to Client | Server | Both,Issue,Priority,Owner,Status,Milestone
Ruby Parser,"We must implement this Language's parsing and data extraction (through AST).
The following needs to be extracted:
- Definitions
- Modules
- Classes
- Methods
- Constants
- References
- Imports
- relative
- module/dependency",Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,,In progress,
Typescript/JS Parser,"We must implement this Language's parsing and data extraction (through AST).
The following needs to be extracted:
- Definitions
- Modules
- Classes
- Methods
- Constants
- References
- Imports
- relative
- alias imports
- module/dependency",Language Support,GitLab Code Parser,Core Parser,Client & Server,https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/issues/2,P0,Michael Usachenko,Not started,
GoLang Parser,,Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,,Not started,
Python Parser,,Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,Jonathan Shobrook,Not started,
Kotlin Parser,,Language Support,GitLab Code Parser,Core Parser,Client & Server,,P0,,,
Unit Testing Framework,,Testing,GitLab Code Parser,Core Parser,Client & Server,,,,,
Integration Testing Framework,,Testing,GitLab Code Parser,Core Parser,Client & Server,,,,,
Benchmarking/Performance Framework,,Testing,GitLab Code Parser,Core Parser,Client & Server,,,,,
Unit Testing Framework,,Testing,Knowledge Graph Indexer,"Core Indexer crate, CLI crate, Language Server Bindings crate",Client & Server,,,,,
Integration Testing Framework,,Testing,Knowledge Graph Indexer,"CLI crate, Language Server Bindings crate, Core Indexer crate",Client & Server,,,,,
Benchmarking/Performance Framework,,Testing,Knowledge Graph Indexer,"Core Indexer crate, CLI crate",Client & Server,,,,,
Client Side Usage Collector Service,Will use the https://github.com/getsentry/sentry-rust package,Observability,Knowledge Graph Indexer,,,,,,,
Client Side Error Collector Service,Will use the https://github.com/getsentry/sentry-rust package,Observability,Knowledge Graph Indexer,,,,,,,
Server Side Usage Collector Service,TBD - Needs confirmation on how the server will collect and propagate metrics/usage from the indexer,Observability,Knowledge Graph Indexer,,,,,,,
Server Side Error Collector Service,TBD - Needs confirmation on how the server will collect and propagate errors from the indexer,Observability,Knowledge Graph Indexer,,,,,,,
Abstract Metric Service,"This service will take in a generic ""collector"" service/struct, which should be used throughout the codebase to collect metrics.
In this service, we should define ""collect"" and ""flush"" events, which will trigger the metric capturing (either sent to Sentry on the client or Whatever Server will use to collect)
",Observability,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Abstract Error Service,"Roughly the same as the Abstract Metric Service, but tailored for collecting Errors",Observability,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Database Connection Service,"A wrapper service that takes in a Kuzu connection/database object and performs common operations (write, read, copy from, etc). ",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Database Tables / Schema Service,This service should define the various tables and relationships in a type-safe API that Writer (Bulk or Incremental) service can use,Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Analysis Service,"This service is responsible for leveraging the gitlab-code-parser.
It will take a file path and file content as input and return the appropriate data depending on the language.
This will return the definition, import, reference, and any other nodes along with metadata (like the fully qualified name).
We should leverage generic types here (depending on some language Enum), which will return the type depending on the language so that consumers can perform language specific logic (like the relationship builder service)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Relationship Builder Service,"This layer will be responsible for resolving the following relationships
- Directory -> Directory
- Directory -> File
- File -> Definitions (Class, Method, Type, Constant etc)
- File -> References
- References -> Definitions
The final output should be the nodes that adhere to the Database Schema, to be inserted later.
This should go to the Writer Service",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Writer Service - General,"This layer should determine the strategy for the incoming write (can manually set via arguments, such as ""init""), and pass the write operation to Bulk Writer or Incremental",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Writer Service - Bulk Writer,"This layer is a sub-layer of the general Writer Service. This should be responsible for initial imports into the Kuzu Database (leveraging COPY From) and handling large updates (large node deletion, updates, etc)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Writer Service - Incremental,"This layer is a sub-layer of the general Writer Service. This should be responsible for incremental updates for node changes (such as a single file change). This will leverage Cypher queries to mutate data (Merge, Set, etc)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client,,,,,
Workspace Management Service,"For the client side, we need a way for the user to point the knowledge graph indexer to multiple repositories (a workspace folder).
This will include:
- traversing the file system to discover repositories depending on the workspace URI
- any other local repository operations (in the future, obtaining git data)
- being able to associate the local project with the GitLab/GitHub remote",Indexer Specific Code,Knowledge Graph Indexer,"CLI crate, Language Server Bindings crate",Client,,,,,
Language Data Extraction Service,"This layer leverages the GitLab Code Parser to extract definitions and references.
This logic will encapsulate the following:
- accepting a list of files
- kicking off worker(s)/threads to iterate over the files
- extracting the data via the AST service
- sending the extracted data to the ""matcher""
",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
Query Service,"This service should contain all logic needed for querying data, including:
- Common Cypher Queries (get definitions for X file)
- Extracting any node data from a file (can use Analysis Service if needed)",Indexer Specific Code,Knowledge Graph Indexer,Core Indexer crate,Client & Server,,,,,
,- linux (UNIX) first,,,,,,,,,
# Research Complete

## Executive Summary

Research completed successfully: 7 GitLab items were analyzed to produce this report.
This report synthesizes research into GitLab Epic gitlab-org&17514 (Knowledge Graph First Iteration) and its related epics to provide new, detailed issue descriptions for the work items listed in your query. The aim is to ground these descriptions in the established architecture and vision for the Knowledge Graph initiative.
While initial broad research faced challenges in accessing all details for gitlab-org&17516 (One Parser) and gitlab-org&17517 (Knowledge Graph Core Indexer Project), the subsequent detailed reports provided for this synthesis do contain specific information and analysis derived from these and related epics. This final report leverages that successfully gathered information.
## Architectural Overview
The GitLab Knowledge Graph initiative aims to create a powerful code intelligence system. This system will be open-source, standalone, and deeply integrated into GitLab, enriching features for AI, code navigation, and more.
### Knowledge Graph Vision (gitlab-org&17514)
The overarching vision for the Knowledge Graph, as stated in Knowledge Graph First Iteration (gitlab-org&17514):
> **Vision Statement**
>
> Imagine opening any repository—on your laptop, in CI, or inside the Web IDE—typing one command, and instantly seeing how every file, class, and import fits together.
>
> We aim for any developer to access the GitLab Knowledge Graph project, run the tool on any machine, point it to a repository, and generate a working Repository Knowledge Graph of the project. Then, the knowledge graph can be extended and deeply integrated with GitLab features for customers. Think Gource for Repository code and GitLab features, for AI use cases.
>
> - Open‑source & stand‑alone: works on macOS, Linux, Windows; no GitLab account required. By making this a standalone project and lowering the barrier to entry, we can encourage community contributions.
> - Deeply integrated: the very same artifact enriches the GitLab Monolith, Language Server, CLI, and future AI features—one graph, everywhere.
### One Parser (gitlab-org&17516)
A foundational component is the "One Parser" project, detailed in One Parser (gitlab-code-parser) (gitlab-org&17516):
> **Vision Statement**
>
> Establish a single, efficient, and reliable static code analysis library (`gitlab-code-parser`) built in Rust. This library will serve as the foundation for diverse code intelligence features across GitLab, from server-side indexing (Knowledge Graph (gitlab-org&17514), Embeddings (gitlab-org&16910)) to client-side analysis (Language Server, Web IDE). Initially scoped to AI and Editor Features.
The problem it aims to solve is the growing need for advanced code understanding for features like the Knowledge Graph, Codebase Chat Context/Embeddings, and Language Server/IDE features.
### Knowledge Graph Core Indexer (gitlab-org&17517)
The Knowledge Graph Core Indexer Project (gitlab-org&17517) outlines the structure of the indexer itself, built as a set of Rust crates:
- CLI crate: A standalone command-line application for running the indexing process.
- core crate: Contains base logic, including AST extraction rules, node matching (linking definitions, references, files, directories), and data structures for import/query operations.
- client indexer crate: Focuses on fast scanning and graph building for client-side operations.
- database client (TBD): For abstracting database interaction.
### Knowledge Graph Server (gitlab-org&17518)
The server-side architecture, described in Knowledge Graph Server (gitlab-org&17518), proposes:
- Store knowledge graphs for repositories in file-embedded Kuzu DBs (each repository will have its own graph DB)
- Build a thin API service which runs on graph nodes and which takes care of serving incoming query requests (and also takes care of DB management tasks).
- To avoid building graph-node infrastructure from scratch, we will reuse Zoekt nodes to index and query graph databases as well, extending the existing gitlab-zoekt service instead of creating a separate new service.
- Create an abstraction layer on the GitLab Rails side which other services can use to query graph databases via Cypher queries.
### Other Key Components

- Gitalisk (gitlab-org&17515): A Rust library for cross-platform Git operations, crucial for workspace management and local repository interaction.
- Database: Kuzu DB is the proposed graph database, supporting Cypher queries and potentially vector search.
## Proposed Issue Descriptions
Below are the new issue descriptions for each work item, synthesized from the research reports and grounded in the architecture outlined above.
### Issue: Implement Ruby Parser for GitLab Code Parser

**Task:** Ruby Parser | **Theme:** Language Support | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server | **Priority:** P0 | **Status:** In progress
Description:
Implement the Ruby language parser within the `gitlab-code-parser` Rust library (One Parser (gitlab-code-parser) (gitlab-org&17516)). This involves leveraging a suitable AST parsing library (likely tree-sitter) and implementing the logic to traverse the AST and extract key code intelligence data points. This parser is a critical component for extracting code intelligence from Ruby projects, contributing to the `core` crate's AST extraction capabilities as outlined in the Knowledge Graph Core Indexer Project (gitlab-org&17517).
The parser must be capable of extracting the following information from Ruby source files:
- Definitions: Identify and extract definitions for:
- Modules
- Classes
- Methods
- Constants
- References: Identify and extract references to definitions.
- Imports: Identify and extract import statements, distinguishing between:
- Relative imports
- Module/dependency imports
The extracted data should be structured in a way that can be consumed by downstream services, such as the Knowledge Graph Indexer's Analysis Service, including necessary metadata like location (file path, line/column) and potentially the raw text of the extracted entity. This work is foundational for building the Knowledge Graph for Ruby projects.
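The per-file extraction result described above can be sketched as a set of Rust data shapes. This is a hedged illustration only: every name below (`Definition`, `Location`, `ParsedFile`, etc.) is invented for the sketch, not the actual `gitlab-code-parser` API.

```rust
// Illustrative data shapes for a per-file extraction result.
// All names are hypothetical, not the real gitlab-code-parser API.

#[derive(Debug, Clone, PartialEq)]
pub enum DefinitionKind {
    Module,
    Class,
    Method,
    Constant,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ImportKind {
    Relative,         // e.g. `require_relative "foo"`
    ModuleDependency, // e.g. `require "json"`
}

/// Source location metadata attached to every extracted entity.
#[derive(Debug, Clone, PartialEq)]
pub struct Location {
    pub file_path: String,
    pub start_line: usize,
    pub start_column: usize,
    pub end_line: usize,
    pub end_column: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Definition {
    pub kind: DefinitionKind,
    pub name: String,
    /// Fully qualified name, e.g. "Foo::Bar#baz".
    pub fqn: String,
    pub location: Location,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Reference {
    pub name: String,
    pub location: Location,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Import {
    pub kind: ImportKind,
    pub target: String,
    pub location: Location,
}

/// The bundle a downstream consumer (e.g. the indexer's Analysis Service)
/// would receive for one parsed file.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedFile {
    pub definitions: Vec<Definition>,
    pub references: Vec<Reference>,
    pub imports: Vec<Import>,
}
```

Whatever the final shape, keeping location and FQN metadata on every entity is what lets the Relationship Builder Service link references to definitions later.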
Acceptance Criteria:
- Ruby grammar integrated and functional within `gitlab-code-parser`.
- AST traversal logic implemented to identify and extract Definitions (Modules, Classes, Methods, Constants), References, and Imports (relative, module/dependency).
- Extracted data format is consistent and well-defined for consumption by indexer components.
- Comprehensive unit tests covering various Ruby syntax structures and extraction scenarios.
- Basic integration test demonstrating successful parsing and extraction for a sample Ruby file.
### Issue: Implement Typescript/JS Parser for GitLab Code Parser

**Task:** Typescript/JS Parser | **Theme:** Language Support | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server | **Issue:** gitlab-org/rust/gitlab-code-parser#2 | **Priority:** P0 | **Status:** Not started
Description:
Implement the TypeScript and JavaScript language parser within the `gitlab-code-parser` Rust library (One Parser (gitlab-code-parser) (gitlab-org&17516)). This involves leveraging a suitable AST parsing library (likely tree-sitter) and implementing the logic to traverse the AST and extract key code intelligence data points for both languages, considering their shared syntax and unique features. This parser will be integrated into the `core` crate as described in the Knowledge Graph Core Indexer Project (gitlab-org&17517) to provide AST extraction for these languages.
The parser should be capable of extracting the following information from TS/JS source files:
- Definitions: Identify and extract definitions for:
- Modules (including namespaces and objects acting as modules)
- Classes
- Methods (including functions and class methods)
- Constants (including variables declared with `const`)
- References: Identify and extract references to definitions.
- Imports: Identify and extract import statements, supporting various syntaxes including:
- Relative imports
- Alias imports (`import { foo as bar } from 'pkg'`)
- Module/dependency imports (ES Modules and CommonJS `require`)
The following table from gitlab-org/rust/gitlab-code-parser#2 provides guidance on handling different import types:
| Family / Purpose | Example syntax | Identifier-capture name(s) | Path-capture name | Primary AST node(s)* |
|---|---|---|---|---|
| ES-module – named | `import { foo } from 'pkg'` | `imported-identifier` | `import-source` | `import_statement → named_imports → import_specifier` |
| ES-module – named + alias | `import { foo as bar } from 'pkg'` | `imported-identifier`, `renamed-identifier` | `import-source` | same as above (alias child inspected) |
| ES-module – default | `import baz from 'pkg'` | `default-import` | `import-source` | `import_statement → import_clause` |
| ES-module – namespace | `import * as utils from 'pkg'` | `namespace-import` | `import-source` | `import_statement → namespace_import` |
| ES-module – side-effect | `import 'pkg'` | (none) | `import-source` | `import_statement` |
| CommonJS – simple require | `const fizz = require('pkg')` | `variable-name` | `require-source` | `variable_declarator → call_expression` |
| CommonJS – destructured require | `const { buzz } = require('pkg')` | `property-identifier` | `require-source` | `variable_declarator → object_pattern → property_identifier` |
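To make the import families concrete, here is a toy Rust classifier keyed on raw source strings. This is purely illustrative: the real parser would match the tree-sitter node kinds from the last column of the table rather than string patterns, and the enum variant names are invented for this sketch.

```rust
// Toy classifier for the import families in the table above.
// The real implementation walks tree-sitter AST nodes, not raw strings.

#[derive(Debug, PartialEq)]
pub enum ImportFamily {
    EsNamed,
    EsNamedAlias,
    EsDefault,
    EsNamespace,
    EsSideEffect,
    CjsRequire,
    CjsDestructuredRequire,
    Unknown,
}

pub fn classify_import(line: &str) -> ImportFamily {
    let line = line.trim();
    if line.starts_with("import ") {
        if line.contains('{') {
            // `import { foo } from 'pkg'` vs `import { foo as bar } from 'pkg'`
            if line.contains(" as ") {
                ImportFamily::EsNamedAlias
            } else {
                ImportFamily::EsNamed
            }
        } else if line.contains("* as ") {
            ImportFamily::EsNamespace // `import * as utils from 'pkg'`
        } else if line.contains(" from ") {
            ImportFamily::EsDefault // `import baz from 'pkg'`
        } else {
            ImportFamily::EsSideEffect // `import 'pkg'`
        }
    } else if line.contains("require(") {
        // `const { buzz } = require('pkg')` vs `const fizz = require('pkg')`
        if line.contains('{') {
            ImportFamily::CjsDestructuredRequire
        } else {
            ImportFamily::CjsRequire
        }
    } else {
        ImportFamily::Unknown
    }
}
```

The string heuristics break down on multi-line or commented code, which is exactly why the issue specifies AST-node matching.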
The extracted data should be structured in a way that is consumable by downstream services, such as the Knowledge Graph Indexer's Analysis Service, including necessary metadata like location (file path, line/column) and potentially the raw text of the extracted entity. This work is crucial for enabling Knowledge Graph functionality for frontend and Node.js projects.
Acceptance Criteria:
- Typescript/JavaScript grammar integrated and functional within `gitlab-code-parser`.
- AST traversal logic implemented to identify and extract Definitions (Modules, Classes, Methods, Constants), References, and Imports (relative, alias, module/dependency).
- Extracted data format is consistent and well-defined for consumption by indexer components.
- Comprehensive unit tests covering various TS/JS syntax structures and extraction scenarios, including common module systems (ESM, CommonJS).
- Basic integration test demonstrating successful parsing and extraction for a sample TS/JS file.
### Issue: Implement GoLang Parser for GitLab Code Parser

**Task:** GoLang Parser | **Theme:** Language Support | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server | **Priority:** P0 | **Status:** Not started
Description:
Implement the Go language parser within the `gitlab-code-parser` Rust library (One Parser (gitlab-code-parser) (gitlab-org&17516)). This involves leveraging a suitable AST parsing library (likely tree-sitter) and implementing the logic to traverse the AST and extract key code intelligence data points for Go. This parser will be a key part of the `core` crate's AST extraction capabilities as defined in the Knowledge Graph Core Indexer Project (gitlab-org&17517).
Based on the requirements for other supported languages and the structure of Go code, the parser should be capable of extracting the following information from Go source files:
- Definitions: Identify and extract definitions for:
- Packages
- Structs
- Functions (including methods)
- Constants
- Interfaces
- Types (aliases, etc.)
- Variables (global/package level)
- References: Identify and extract references to definitions.
- Imports: Identify and extract package import statements.
The extracted data should be structured in a way that can be consumed by downstream services, such as the Knowledge Graph Indexer's Analysis Service, including necessary metadata like location (file path, line/column) and potentially the raw text of the extracted entity. Consideration should be given to computing Fully Qualified Names (FQNs) for Go entities. This is a high-priority task to enable code intelligence for Go projects within the Knowledge Graph.
Acceptance Criteria:
- GoLang grammar integrated and functional within `gitlab-code-parser`.
- AST traversal logic implemented to identify and extract key Definitions, References, and Imports.
- Extracted data format is consistent and well-defined for consumption by indexer components.
- Comprehensive unit tests covering various GoLang syntax structures and extraction scenarios.
- Basic integration test demonstrating successful parsing and extraction for a sample GoLang file.
### Issue: Implement Python Parser for GitLab Code Parser

**Task:** Python Parser | **Theme:** Language Support | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server | **Priority:** P0 | **Owner:** Jonathan Shobrook | **Status:** Not started
Description:
Implement the Python language parser within the `gitlab-code-parser` Rust library (One Parser (gitlab-code-parser) (gitlab-org&17516)). This involves leveraging a suitable AST parsing library (likely tree-sitter) and implementing the logic to traverse the AST and extract key code intelligence data points for Python. This parser will contribute to the `core` crate's AST extraction functionality as outlined in the Knowledge Graph Core Indexer Project (gitlab-org&17517).
As defined in the related epic First iteration of the Python parser (gitlab-org&18004), the parser should extract:
- Definitions (functions, classes, and class methods)
- References (function/method calls, class instantiations)
- Imports
It should also compute a fully qualified name (FQN) for each one. These will be used to link nodes together in the Knowledge Graph Indexer.
The parser should be capable of extracting the following information from Python source files:
- Definitions: Identify and extract definitions for:
- Functions
- Classes
- Class methods
- Other potential definitions like variables/constants at module level.
- References: Identify and extract references, including function/method calls and class instantiations.
- Imports: Identify and extract import statements, including `import module`, `from module import name`, and relative imports.
Crucially, the parser must also compute a Fully Qualified Name (FQN) for each extracted definition. This FQN is essential for the Knowledge Graph Indexer to correctly link references to their definitions across files and modules.
The extracted data should be structured in a way that can be consumed by downstream services, such as the Knowledge Graph Indexer's Analysis Service, including necessary metadata like location (file path, line/column) and the computed FQN. This is a high-priority task for providing code intelligence for Python projects.
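One plausible FQN scheme joins the module path (derived from the file path) with the chain of enclosing scopes. The sketch below illustrates that idea only; the actual scheme used by `gitlab-code-parser` may differ, and the function names are invented for the example.

```rust
// Hedged sketch of one possible FQN scheme for Python definitions.
// Names and conventions here are illustrative, not the real implementation.

/// Derive a module path like "pkg.sub.mod" from "pkg/sub/mod.py".
/// Package `__init__.py` files map to the package itself.
pub fn module_path(file_path: &str) -> String {
    file_path
        .trim_end_matches(".py")
        .trim_end_matches("/__init__")
        .replace('/', ".")
}

/// Join the module path and the enclosing scope chain into an FQN,
/// e.g. ("pkg/mod.py", ["MyClass", "method"]) -> "pkg.mod.MyClass.method".
pub fn fully_qualified_name(file_path: &str, scopes: &[&str]) -> String {
    let mut parts = vec![module_path(file_path)];
    parts.extend(scopes.iter().map(|s| s.to_string()));
    parts.join(".")
}
```

A real implementation would also have to account for `sys.path` roots, namespace packages, and re-exports, which is why FQN computation is called out as its own requirement.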
Acceptance Criteria:
- Python grammar integrated and functional within `gitlab-code-parser`.
- AST traversal logic implemented to identify and extract key Definitions, References, and Imports.
- The parser correctly computes Fully Qualified Names (FQNs) for extracted definitions.
- Extracted data format is consistent and well-defined for consumption by indexer components.
- Comprehensive unit tests covering various Python syntax structures and extraction scenarios.
- Basic integration test demonstrating successful parsing and extraction for a sample Python file.
### Issue: Implement Kotlin Parser for GitLab Code Parser

**Task:** Kotlin Parser | **Theme:** Language Support | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server | **Priority:** P0
Description:
Implement the Kotlin language parser within the `gitlab-code-parser` Rust library (One Parser (gitlab-code-parser) (gitlab-org&17516)). This involves leveraging a suitable AST parsing library (likely tree-sitter) and implementing the logic to traverse the AST and extract key code intelligence data points for Kotlin. This parser will be integrated into the `core` crate's AST extraction capabilities as described in the Knowledge Graph Core Indexer Project (gitlab-org&17517).
Based on the requirements for other supported languages and the structure of Kotlin code, the parser should be capable of extracting the following information from Kotlin source files:
- Definitions: Identify and extract definitions for:
- Packages
- Classes
- Objects
- Interfaces
- Functions (including extension functions and methods)
- Properties (variables and constants)
- Enums
- Data classes, Sealed classes, etc.
- References: Identify and extract references to definitions.
- Imports: Identify and extract package and specific import statements (`import ...`).
The extracted data should be structured in a way that can be consumed by downstream services, such as the Knowledge Graph Indexer's Analysis Service, including necessary metadata like location (file path, line/column) and potentially the raw text of the extracted entity. Consideration should be given to computing Fully Qualified Names (FQNs) for Kotlin entities. This task is necessary to enable code intelligence for Kotlin projects.
Acceptance Criteria:
- Kotlin grammar integrated and functional within `gitlab-code-parser`.
- AST traversal logic implemented to identify and extract key Definitions, References, and Imports.
- Extracted data format is consistent and well-defined for consumption by indexer components.
- Comprehensive unit tests covering various Kotlin syntax structures and extraction scenarios.
- Basic integration test demonstrating successful parsing and extraction for a sample Kotlin file.
### Issue: Implement Unit Testing Framework for GitLab Code Parser

**Task:** Unit Testing Framework | **Theme:** Testing | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server
Description:
Establish and implement a comprehensive unit testing framework for the `gitlab-code-parser` project (One Parser (gitlab-code-parser) (gitlab-org&17516)), targeting the Core Parser crate. A robust unit testing framework is essential to ensure the correctness and reliability of the language parsers and core AST extraction logic. This framework will support development within the Core Parser crate, part of the architecture described in the Knowledge Graph Core Indexer Project (gitlab-org&17517).
Unit tests should cover individual functions, modules, and components of the parser in isolation, verifying that they produce the expected output for given inputs (e.g., parsing specific code snippets and verifying the generated AST nodes and extracted data). The framework should facilitate:
- Writing isolated unit tests for individual functions and modules within the parser.
- Mocking dependencies to test units in isolation.
- Running tests efficiently and reporting results.
Given that `gitlab-code-parser` is a foundational library used by both client and server components of the Knowledge Graph, ensuring its quality through thorough unit testing is critical.
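As a minimal sketch of what such a test could look like under plain `cargo test`: `extract_class_names` below is a naive, line-based stand-in helper invented for the example, not the real parser API; the point is the test structure, not the extraction logic.

```rust
// Illustrative only: `extract_class_names` stands in for a real parser entry
// point so the unit-test shape can be shown self-contained.

/// Naive line-based stand-in: collects `class Foo` names from Ruby-like source.
pub fn extract_class_names(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| {
            line.trim()
                .strip_prefix("class ")
                .map(|rest| rest.split_whitespace().next().unwrap_or("").to_string())
        })
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    // One isolated behavior per test, verifying extraction against a snippet.
    #[test]
    fn extracts_top_level_and_nested_classes() {
        let src = "class Foo\n  class Bar\n  end\nend\n";
        assert_eq!(extract_class_names(src), vec!["Foo", "Bar"]);
    }

    #[test]
    fn ignores_non_class_lines() {
        assert!(extract_class_names("def foo; end").is_empty());
    }
}
```

Real parser tests would assert on AST node kinds and extracted metadata rather than strings, but the `#[cfg(test)]` module layout and per-behavior test functions carry over directly.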
Acceptance Criteria:
- A clear structure for writing unit tests within the `gitlab-code-parser` project is defined.
- Tools and dependencies required for unit testing are configured.
- Example unit tests are implemented for at least one language parser to demonstrate usage.
- Unit tests can be run via a standard command (e.g., `cargo test`).
- Test results are easily interpretable.
### Issue: Implement Integration Testing Framework for GitLab Code Parser

**Task:** Integration Testing Framework | **Theme:** Testing | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server
Description:
Establish and implement an integration testing framework for the `gitlab-code-parser` project (One Parser (gitlab-code-parser) (gitlab-org&17516)). An integration testing framework is needed to verify that different components of the parser work correctly together and that the parser integrates properly with potential consumers or simulated external dependencies. This is crucial for the stability of the Knowledge Graph, particularly for the Core Parser crate as defined in the Knowledge Graph Core Indexer Project (gitlab-org&17517).
Integration tests could involve:
- Testing the flow from raw source code input through AST generation and data extraction.
- Simulating the interaction with the Knowledge Graph Indexer's Analysis Service or Language Data Extraction Service by providing file paths/content and verifying the output format and content.
- Testing the parser's behavior with various file sizes and complexities.
This framework will help identify issues that arise from the interaction between different parts of the parser or its expected usage context.
Acceptance Criteria:
- A clear structure for writing integration tests within the `gitlab-code-parser` project is defined.
- Tools and dependencies required for integration testing are configured.
- Example integration tests are implemented for at least one language parser using realistic code samples.
- Integration tests can be run via a standard command or CI pipeline.
- Test results are easily interpretable.
### Issue: Implement Benchmarking/Performance Framework for GitLab Code Parser

**Task:** Benchmarking/Performance Framework | **Theme:** Testing | **Project:** GitLab Code Parser | **Crate/Packages:** Core Parser | **Applies to:** Client & Server
Description:
Establish and implement a benchmarking and performance testing framework for the `gitlab-code-parser` project (One Parser (gitlab-code-parser) (gitlab-org&17516)), targeting the Core Parser crate. Given that the parser will process potentially large codebases as part of the indexing process on Zoekt nodes (Knowledge Graph Server (gitlab-org&17518)), performance is a critical concern.
The framework should allow for:
- Measuring the time taken to parse files of different sizes and languages.
- Identifying performance bottlenecks in the parsing and data extraction process.
- Tracking performance changes over time to prevent regressions.
- Benchmarking memory usage during parsing.
Establishing this framework early will be crucial for optimizing the parser's performance, which directly impacts the overall indexing speed of the Knowledge Graph.
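The core measurement idea can be sketched with the standard library alone; a dedicated framework such as Criterion.rs adds warm-up, statistical analysis, and regression reporting on top of this loop. `mean_nanos` is an invented name for the sketch.

```rust
use std::time::Instant;

/// Minimal std-only sketch of the measurement loop a benchmarking framework
/// automates: run the operation under test many times and report the mean
/// duration in nanoseconds. `op` stands in for a parse call; `iterations`
/// is assumed to be non-zero.
pub fn mean_nanos<F: FnMut()>(iterations: u32, mut op: F) -> u128 {
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    start.elapsed().as_nanos() / iterations as u128
}
```

A real benchmark would also guard the workload with `std::hint::black_box` so the optimizer cannot elide it, and track results across commits to catch regressions.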
Acceptance Criteria:
- A clear structure for writing benchmarks within the `gitlab-code-parser` project is defined.
- Tools and dependencies required for benchmarking are configured (e.g., Criterion.rs).
- Example benchmarks are implemented for core parsing and extraction operations for at least one language.
- Benchmarks can be run via a standard command.
- Performance metrics are collected and reported in a usable format.
- Ability to compare benchmark results over time is established (e.g., via CI integration).
### Issue: Implement Unit Testing Framework for Knowledge Graph Indexer

**Task:** Unit Testing Framework | **Theme:** Testing | **Project:** Knowledge Graph Indexer | **Crate/Packages:** Core Indexer crate, CLI crate, Language Server Bindings crate | **Applies to:** Client & Server
Description:
Establish and implement a comprehensive unit testing framework for the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This framework will support development across the Core Indexer, CLI, and Language Server Bindings crates. A robust unit testing framework is essential to ensure the correctness and reliability of the various services and components within the indexer.
Unit tests should cover individual functions, modules, and services in isolation, such as:

- Database Connection Service (mocking Kuzu interactions)
- Database Tables / Schema Service (verifying schema definitions)
- Analysis Service (mocking parser output)
- Relationship Builder Service (verifying relationship logic)
- Writer Service components (mocking database writes)
- Query Service (mocking database reads)
- Workspace Management Service (mocking file system operations)
- Observability services (mocking collector interactions)
This framework is vital for building a stable and maintainable indexer, which operates on both the client (Knowledge Graph First Iteration (gitlab-org&17514)) and server (on Zoekt nodes, Knowledge Graph Server (gitlab-org&17518)).
Acceptance Criteria:
- A unit testing framework is established and configured for the Knowledge Graph Indexer project, covering its main crates.
- Initial sets of unit tests are written for core logic in each relevant crate.
- Tests can be easily run locally and in CI.
- Documentation on how to write and run unit tests is available.
### Issue: Implement Integration Testing Framework for Knowledge Graph Indexer

**Task:** Integration Testing Framework | **Theme:** Testing | **Project:** Knowledge Graph Indexer | **Crate/Packages:** CLI crate, Language Server Bindings crate, Core Indexer crate | **Applies to:** Client & Server
Description:
Establish and implement an integration testing framework for the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This framework should cover the interactions between the different crates and services, including the core indexer logic, the CLI application, the Language Server Bindings, and the database layer. An integration testing framework is needed to verify that different components and services of the indexer work correctly together and interact properly with external dependencies.
Integration tests could involve:
- Testing the end-to-end indexing flow: reading files, parsing (mocked or real), building relationships, and writing to a Kuzu database.
- Testing the interaction between the `Analysis Service` and the `gitlab-code-parser`.
- Testing the `Writer Service` components with a real Kuzu instance.
- Testing the `Query Service` against a populated Kuzu database.
- Testing the `Workspace Management Service` with actual file system operations.
- Testing the CLI and Language Server interfaces.
- Simulating the server-side indexing process on Zoekt nodes (Knowledge Graph Server (gitlab-org&17518)).
This framework will help identify issues that arise from the interaction between different parts of the indexer and its dependencies, ensuring the system functions correctly as a whole.
Acceptance Criteria:
- An integration testing framework is established and configured for the Knowledge Graph Indexer project.
- Initial integration tests are written covering key workflows (e.g., indexing a small sample repository via the CLI).
- Tests can be easily run locally and integrated into CI.
- Documentation on how to write and run integration tests is available.
Issue: Implement Benchmarking/Performance Framework for Knowledge Graph Indexer
Task: Benchmarking/Performance Framework Theme: Testing Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate, CLI crate Applies to: Client & Server
Description:
Establish and implement a benchmarking and performance testing framework for the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)), targeting the Core Indexer crate and the CLI crate. Performance is a critical aspect of the Knowledge Graph Indexer, especially when processing large repositories on Zoekt nodes (Knowledge Graph Server (gitlab-org&17518)) or providing a responsive experience client-side (Knowledge Graph First Iteration (gitlab-org&17514)).
The framework should allow for:
- Measuring the time taken for the entire indexing process (from file input to database write).
- Benchmarking the performance of specific services like the `Relationship Builder` and `Writer Service` components.
- Measuring the performance of common queries executed by the `Query Service`.
- Tracking performance changes over time to prevent regressions.
- Benchmarking memory and disk usage.
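As a minimal sketch of the measurement principle using only the standard library (a real setup would more likely build on a dedicated harness such as the `criterion` crate, which adds warm-up runs, statistical analysis, and regression tracking), a timing helper could look like this; the `bench` name and shape are illustrative:

```rust
use std::time::{Duration, Instant};

// Minimal micro-benchmark helper: runs `f` for `iters` iterations and
// returns the total elapsed wall-clock time. A dedicated framework would
// add warm-up, outlier rejection, and per-iteration statistics.
fn bench<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}
```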
Establishing this framework early will be crucial for optimizing the indexer's performance and ensuring it scales effectively for large projects.
Acceptance Criteria:
- A benchmarking framework is established and configured for the Knowledge Graph Indexer project.
- Initial benchmarks are written for core indexing operations and the CLI execution.
- Benchmarks can be run locally and integrated into CI.
- Performance metrics can be collected and reported.
- Documentation on how to write and run benchmarks is available.
Issue: Implement Client Side Usage Collector Service
Task: Client Side Usage Collector Service Theme: Observability Project: Knowledge Graph Indexer Applies to: Client
Description:
Implement a service within the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) responsible for collecting usage data when the indexer is run on the client side (e.g., via the CLI or Language Server). This service should integrate with a telemetry backend to report how the tool is being used. This aligns with the overall goal of building a robust, standalone tool as per Knowledge Graph First Iteration (gitlab-org&17514).
This service should:
- Utilize the `sentry-rust` package for collecting and reporting usage data, as noted in the work item details.
- Capture relevant usage metrics from the client-side indexer operations (e.g., indexing start/end events, duration, number of files processed, features used).
- Integrate with the `Abstract Metric Service` to provide a concrete implementation for client-side usage collection.
- Ensure metrics are sent to the configured Sentry endpoint or other designated collection service when appropriate (e.g., on flush, on completion).
- Handle potential errors during collection gracefully.
This service is crucial for gaining visibility into how the Knowledge Graph Indexer is used on the client side and identifying areas for improvement. Data collection should respect user privacy settings.
Acceptance Criteria:
- A `ClientSideUsageCollectorService` is implemented using `sentry-rust`.
- The service integrates with the Abstract Metric Service interface.
- Key client-side usage metrics are identified and collected.
- Metrics are successfully sent to a test endpoint.
- Unit tests cover the service's logic.
Issue: Implement Client Side Error Collector Service
Task: Client Side Error Collector Service Theme: Observability Project: Knowledge Graph Indexer Applies to: Client
Description:
Implement a service within the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) responsible for collecting error data when the indexer encounters issues on the client side (e.g., parsing errors, database errors, file system issues). This service should report errors to a centralized error tracking system, contributing to the reliability of the standalone tool envisioned in Knowledge Graph First Iteration (gitlab-org&17514).
This service should:
- Utilize the `sentry-rust` package for capturing and reporting errors, as noted in the work item details.
- Log detailed information about errors, including context, stack traces, and metadata.
- Integrate with the `Abstract Error Service` to provide a concrete implementation for client-side error collection.
- Ensure errors are sent to the configured Sentry endpoint or other designated collection service.
- Handle potential errors during error reporting gracefully to avoid cascading failures.
This service is vital for identifying and addressing issues encountered by users of the client-side indexer.
Acceptance Criteria:
- A `ClientSideErrorCollectorService` is implemented using `sentry-rust`.
- The service integrates with the Abstract Error Service interface.
- Errors occurring client-side are captured with relevant details.
- Errors are successfully sent to a test endpoint.
- Unit tests cover the service's logic.
Issue: Implement Server Side Usage Collector Service
Task: Server Side Usage Collector Service Theme: Observability Project: Knowledge Graph Indexer Applies to: Server
Description:
Implement a service within the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) or related server components (Knowledge Graph Server (gitlab-org&17518)) responsible for collecting usage metrics from the server-side execution environment (e.g., on Zoekt nodes). This supports the goal of deep integration with GitLab features as per Knowledge Graph First Iteration (gitlab-org&17514).
This service will require prior investigation (TBD in notes) into the standard GitLab server-side metrics collection mechanisms. Once the mechanism is confirmed, the service should:
- Integrate with the `Abstract Metric Service` to provide a concrete implementation for server-side usage collection.
- Define and collect relevant server-side usage metrics related to indexing operations (e.g., indexing job duration, resource usage (CPU, memory, disk I/O), number of repositories indexed, number of queries served, query latency).
- Ensure metrics are propagated to the appropriate GitLab monitoring and analytics systems.
- Handle potential errors during collection gracefully.
This service is crucial for gaining visibility into the performance and usage of the Knowledge Graph Indexer on the server side and informing infrastructure scaling and optimization efforts.
Acceptance Criteria:
- Server-side metrics collection mechanism is confirmed and documented.
- A `ServerSideUsageCollectorService` is implemented, integrating with the Abstract Metric Service.
- Key server-side usage metrics are identified and collected.
- Metrics are successfully propagated to the designated server-side collection system.
- Unit tests cover the service's logic.
Issue: Implement Server Side Error Collector Service
Task: Server Side Error Collector Service Theme: Observability Project: Knowledge Graph Indexer Applies to: Server
Description:
Implement a service within the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) or related server components (Knowledge Graph Server (gitlab-org&17518)) responsible for collecting error information from the server-side execution environment (e.g., on Zoekt nodes). This supports the reliability of the integrated component as envisioned in Knowledge Graph First Iteration (gitlab-org&17514).
This service will require prior investigation (TBD in notes) into the standard GitLab server-side error collection mechanisms. Once the mechanism is confirmed, the service should:
- Integrate with the `Abstract Error Service` to provide a concrete implementation for server-side error collection.
- Capture relevant error details, including context, stack traces, system information, indexer version, and details about the indexing or querying operation being performed when the error occurred.
- Ensure errors are propagated to the appropriate GitLab error tracking systems (e.g., Sentry, if used server-side).
- Handle potential errors during error reporting gracefully.
This service is crucial for identifying and debugging issues encountered by the Knowledge Graph Indexer running on the server side.
Acceptance Criteria:
- Server-side error collection mechanism is confirmed and documented.
- A `ServerSideErrorCollectorService` is implemented, integrating with the Abstract Error Service.
- Errors occurring server-side are captured with relevant details.
- Errors are successfully propagated to the designated server-side error tracking system.
- Unit tests cover the service's logic.
Issue: Implement Abstract Metric Service for Knowledge Graph Indexer
Task: Abstract Metric Service Theme: Observability Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement an abstract service or trait within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) for collecting metrics. This service should define a standard interface that concrete metric collector implementations (like the Client Side Usage Collector or Server Side Usage Collector) will adhere to. This service is foundational for implementing both client-side and server-side usage and error collection.
The service should define a clear API, potentially including methods like `collect(metric_name, value, tags)` and `flush()`. This abstraction allows different concrete "collector" implementations to be plugged in depending on the environment (client or server). This service will provide a consistent API for reporting metrics throughout the indexer codebase, decoupling the metric collection logic from the specific reporting backend used on the client or server.
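One possible shape for this abstraction, sketched with illustrative names (`MetricService`, `InMemoryCollector`) rather than the final API:

```rust
use std::collections::HashMap;

// Sketch of the abstract interface; names and signatures are illustrative.
trait MetricService {
    fn collect(&mut self, metric_name: &str, value: f64, tags: HashMap<String, String>);
    // Returns the number of metrics reported on this flush.
    fn flush(&mut self) -> usize;
}

// In-memory collector, useful in tests; a Sentry- or Prometheus-backed
// implementation would be plugged in per environment (client or server).
#[derive(Default)]
struct InMemoryCollector {
    buffered: Vec<(String, f64, HashMap<String, String>)>,
}

impl MetricService for InMemoryCollector {
    fn collect(&mut self, metric_name: &str, value: f64, tags: HashMap<String, String>) {
        self.buffered.push((metric_name.to_string(), value, tags));
    }
    fn flush(&mut self) -> usize {
        let n = self.buffered.len();
        self.buffered.clear(); // a real backend would transmit here
        n
    }
}
```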
Acceptance Criteria:
- An `AbstractMetricService` (or similar) trait and/or struct is defined in the `core-indexer` crate.
- A mechanism for injecting or configuring the concrete "collector" implementation is established.
- Methods for collecting common metric types are implemented in the abstract service.
- `collect` and `flush` methods are implemented to manage the metric lifecycle.
- Example usage of the service is demonstrated within the `core-indexer` crate.
- The service design accommodates different backend implementations (e.g., Sentry, Prometheus).
Issue: Implement Abstract Error Service for Knowledge Graph Indexer
Task: Abstract Error Service Theme: Observability Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement an abstract service or trait within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) for collecting and reporting errors. This service should define a standard interface that concrete error collector implementations (like the Client Side Error Collector or Server Side Error Collector) will adhere to. This service is foundational for implementing both client-side and server-side error collection.
The service should define a clear API, potentially including methods like `capture_error(error, context)` and `flush()`. This abstraction allows different concrete "collector" implementations to be plugged in depending on the environment (client or server). This service will provide a consistent API for reporting errors throughout the indexer codebase, decoupling the error reporting logic from the specific backend used on the client or server.
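A sketch of the interface plus a panic-capture helper built on `std::panic::catch_unwind`; the names (`ErrorService`, `InMemoryErrorReporter`, `run_guarded`) are illustrative assumptions, and a real implementation would forward captured errors to `sentry-rust`:

```rust
// Sketch of the abstract error-reporting interface; names are illustrative.
trait ErrorService {
    fn capture_error(&mut self, error: &str, context: &str);
    fn flush(&mut self) -> usize;
}

#[derive(Default)]
struct InMemoryErrorReporter {
    captured: Vec<(String, String)>,
}

impl ErrorService for InMemoryErrorReporter {
    fn capture_error(&mut self, error: &str, context: &str) {
        self.captured.push((error.to_string(), context.to_string()));
    }
    fn flush(&mut self) -> usize {
        let n = self.captured.len();
        self.captured.clear(); // a sentry-rust backend would transmit here
        n
    }
}

// Panics can be funneled into the same interface with catch_unwind,
// so a single file failing to index does not abort the whole run.
fn run_guarded<F: FnOnce() + std::panic::UnwindSafe>(
    reporter: &mut dyn ErrorService,
    context: &str,
    f: F,
) {
    if std::panic::catch_unwind(f).is_err() {
        reporter.capture_error("panic during indexing step", context);
    }
}
```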
Acceptance Criteria:
- An `AbstractErrorService` (or similar) trait and/or struct is defined in the `core-indexer` crate.
- A mechanism for injecting or configuring the concrete "error reporter" implementation is established.
- Methods for capturing errors and panics are implemented in the abstract service.
- Example usage of the service is demonstrated within the `core-indexer` crate.
- The service design accommodates different backend implementations (e.g., Sentry).
Issue: Implement Database Connection Service for Knowledge Graph Indexer
Task: Database Connection Service Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement a wrapper service or module within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) to manage connections and common operations with the Kuzu database. According to the architecture (Knowledge Graph Server (gitlab-org&17518)), Kuzu DBs will be used to store the knowledge graph. This service will provide a clean, abstract API for interacting with a Kuzu database instance.
The service should encapsulate the Kuzu database connection object and provide methods for common database operations, such as:
- Executing Cypher queries for reading data.
- Executing Cypher queries for writing/mutating data (e.g., `MERGE`, `SET`, `DELETE`).
- Performing bulk data imports (leveraging Kuzu's `COPY FROM`).
- Managing transactions and connection lifecycle (opening, closing).
This service acts as a data access layer, abstracting the specifics of the Kuzu API from higher-level services like the `Writer Service` and `Query Service`. It is a critical piece of the indexer's data persistence layer, used by both client and server components.
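Transaction handling, one of the lifecycle concerns above, might be sketched like this; the `GraphConnection` trait and the literal `BEGIN TRANSACTION`/`COMMIT`/`ROLLBACK` statements are assumptions standing in for the real Kuzu API, which the concrete service would wrap:

```rust
// Hypothetical abstraction over a Kuzu connection; the real service would
// wrap the kuzu crate's connection type instead of a plain trait object.
trait GraphConnection {
    fn execute(&mut self, statement: &str) -> Result<(), String>;
}

// Runs a batch of Cypher statements inside a single transaction,
// rolling back if any statement fails.
fn run_in_transaction(
    conn: &mut dyn GraphConnection,
    statements: &[&str],
) -> Result<(), String> {
    conn.execute("BEGIN TRANSACTION")?;
    for stmt in statements {
        if let Err(e) = conn.execute(stmt) {
            let _ = conn.execute("ROLLBACK");
            return Err(e);
        }
    }
    conn.execute("COMMIT")
}

// Mock used in unit tests: records every statement it receives.
#[derive(Default)]
struct RecordingConnection {
    log: Vec<String>,
}

impl GraphConnection for RecordingConnection {
    fn execute(&mut self, statement: &str) -> Result<(), String> {
        self.log.push(statement.to_string());
        Ok(())
    }
}
```

The trait boundary is also what makes the unit-testing requirement ("mocking Kuzu interactions") straightforward.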
Acceptance Criteria:
- A `DatabaseConnectionService` is implemented, wrapping Kuzu connection logic.
- Methods for common read, write, and bulk import operations are implemented.
- The service handles connection management.
- Unit tests cover the service's interaction with a mock or in-memory Kuzu instance.
- Documentation on how to use the service is available.
Issue: Implement Database Tables / Schema Service for Knowledge Graph Indexer
Task: Database Tables / Schema Service Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement a service or module within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)) responsible for defining and managing the Knowledge Graph database schema. The Knowledge Graph will store various entities (files, directories, definitions, references, imports) and their relationships in a Kuzu graph database (Knowledge Graph Server (gitlab-org&17518)). This service is responsible for defining the structure of this graph.
The service should:
- Define the node tables (e.g., `File`, `Directory`, `Definition`, `Reference`, `Import`) and their properties (e.g., `File` might have `path`, `language`; `Class` might have `name`, `fully_qualified_name`).
- Define the relationship tables (e.g., `CONTAINS`, `DEFINES`, `REFERENCES`, `IMPORTS`) and their properties.
- Provide a type-safe API or data structures representing the schema elements that services like the `Relationship Builder Service` and `Writer Service` can use to construct and insert data that conforms to the schema.
- Potentially include logic for creating or migrating the database schema.
This service ensures consistency in how data is structured and written to the Kuzu database, supporting the `core` crate's logic for node matching and data structures.
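A type-safe schema description could generate the DDL roughly along these lines; the `NodeTable` struct, the property set, and the exact `CREATE NODE TABLE` syntax are illustrative and should be checked against Kuzu's documentation:

```rust
// Illustrative schema description; actual table names and properties are
// still to be finalized by this service.
struct NodeTable {
    name: &'static str,
    properties: Vec<(&'static str, &'static str)>, // (name, type)
    primary_key: &'static str,
}

impl NodeTable {
    // Emits Kuzu-style DDL, e.g. CREATE NODE TABLE File(path STRING, ...).
    fn to_ddl(&self) -> String {
        let props: Vec<String> = self
            .properties
            .iter()
            .map(|(n, t)| format!("{} {}", n, t))
            .collect();
        format!(
            "CREATE NODE TABLE {}({}, PRIMARY KEY ({}))",
            self.name,
            props.join(", "),
            self.primary_key
        )
    }
}

// Example schema entry for the File node table described above.
fn file_table() -> NodeTable {
    NodeTable {
        name: "File",
        properties: vec![("path", "STRING"), ("language", "STRING")],
        primary_key: "path",
    }
}
```

Keeping the schema as data rather than loose strings lets the Relationship Builder and Writer services validate their output against it.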
Acceptance Criteria:
- Database schema (node and relationship tables, properties) is defined within the service/module.
- A type-safe API or data structures representing the schema are implemented.
- The service can be used by other components to understand the schema.
- Unit tests cover the schema definition and API.
- Documentation on the schema and how to use the service is available.
Issue: Implement Analysis Service for Knowledge Graph Indexer
Task: Analysis Service Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement the Analysis Service within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service is responsible for integrating with the `gitlab-code-parser` (One Parser (gitlab-code-parser) (gitlab-org&17516)) to perform static analysis on source code files. According to the architecture (Knowledge Graph Server (gitlab-org&17518)), this service will run on Zoekt nodes as part of the indexing process.
The service should:
- Accept a file path and its content as input.
- Determine the language of the file (potentially using file extension or language detection).
- Invoke the appropriate language parser from `gitlab-code-parser`.
- Process the parser's output (AST) to extract structured data, including:
- Definitions (Classes, Methods, Functions, etc.)
- References
- Imports
- Associated metadata (e.g., fully qualified names, code locations).
- Return the extracted data using generic types or enums that allow downstream services (like the Relationship Builder) to handle language-specific nuances.
- Handle parsing errors gracefully.
This service is a crucial step in the indexing pipeline, transforming raw code into structured data ready for graph representation.
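The extracted data might be modeled with generic types along these lines; `DefinitionKind`, `Definition`, and `AnalysisOutput` are illustrative names, not the final API, and the real model will be driven by what `gitlab-code-parser` exposes per language:

```rust
// Illustrative output model for the Analysis Service.
#[derive(Debug, PartialEq)]
enum DefinitionKind {
    Module,
    Class,
    Method,
    Constant,
}

#[derive(Debug)]
struct Definition {
    kind: DefinitionKind,
    fully_qualified_name: String,
    line: u32,
}

#[derive(Debug, Default)]
struct AnalysisOutput {
    definitions: Vec<Definition>,
    references: Vec<String>, // referenced FQNs
    imports: Vec<String>,    // imported module paths
}

impl AnalysisOutput {
    // Example downstream-friendly accessor.
    fn definitions_of_kind(&self, kind: DefinitionKind) -> usize {
        self.definitions.iter().filter(|d| d.kind == kind).count()
    }
}
```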
Acceptance Criteria:
- `AnalysisService` struct/trait is defined in the `core-indexer` crate.
- Integration with `gitlab-code-parser` is established.
- The service can process file input and return extracted data (definitions, references, imports) for at least one language.
- Generic types or language enums are used for the output structure.
- Error handling for parsing failures is implemented.
- Unit tests cover the service's logic and integration with the parser (potentially using mock parser output).
Issue: Implement Relationship Builder Service for Knowledge Graph Indexer
Task: Relationship Builder Service Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement the Relationship Builder Service within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service is responsible for taking the structured data extracted by the Analysis Service and building the relationships between code entities and file/directory structures, conforming to the defined Database Schema. According to the architecture (Knowledge Graph Server (gitlab-org&17518)), this service will run on Zoekt nodes as part of the indexing process.
The service should:
- Process the output from the Analysis Service (extracted definitions, references, imports) and file/directory information.
- Identify and resolve the following relationships:
  - `Directory` -> `Directory` (e.g., parent/child)
  - `Directory` -> `File`
  - `File` -> `Definition` (e.g., a file defines a class or function)
  - `File` -> `Reference` (e.g., a file contains a reference)
  - `Reference` -> `Definition` (linking a reference to its corresponding definition, potentially using FQNs and import information)
  - `File` -> `Imports`
  - `Imports` -> `Definitions` (resolving imported symbols)
- Generate a set of nodes and relationships that adhere to the structure defined by the Database Schema Service.
- Prepare this data for consumption by the Writer Service.
This service is critical for transforming parsed code data into the graph structure stored in the database, contributing to the "Node matching logic" of the `core` crate.
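The `Reference` -> `Definition` resolution step could be sketched as an FQN lookup that consults per-file import aliases first; `resolve_reference` and its inputs are simplified assumptions, since real resolution will be language-aware:

```rust
use std::collections::HashMap;

// Sketch of Reference -> Definition resolution using fully qualified names
// and per-file import aliases.
fn resolve_reference(
    reference: &str,                    // name as written in the source
    imports: &HashMap<String, String>,  // alias -> imported FQN
    definitions: &HashMap<String, u64>, // FQN -> definition node id
) -> Option<u64> {
    // Prefer an import alias match (e.g. `import foo as f`, then `f.Bar`).
    if let Some((head, rest)) = reference.split_once('.') {
        if let Some(target) = imports.get(head) {
            return definitions.get(&format!("{}.{}", target, rest)).copied();
        }
    }
    // Fall back to treating the reference as an FQN itself.
    definitions.get(reference).copied()
}
```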
Acceptance Criteria:
- `RelationshipBuilderService` struct/trait is defined in the `core-indexer` crate.
- The service can process input data from the Analysis Service and file/directory structure.
- Logic for identifying and building the specified relationships is implemented.
- Relationship resolution logic handles cross-file/module links using FQNs and import data.
- The output data structure conforms to the Database Schema Service.
- The output is in a format consumable by the Writer Service.
- Unit and integration tests are implemented to verify relationship building logic.
Issue: Implement General Writer Service for Knowledge Graph Indexer
Task: Writer Service - General Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement the General Writer Service within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service acts as an orchestrator for writing graph data (nodes and relationships) into the Kuzu database, determining the appropriate writing strategy based on the operation type. According to the architecture (Knowledge Graph Server (gitlab-org&17518)), this service will run on Zoekt nodes as part of the indexing process.
The service should:
- Accept graph data (nodes and relationships) generated by the Relationship Builder Service.
- Determine the writing strategy (e.g., `Bulk`, `Incremental`, `Init`) based on configuration or input arguments.
- Delegate the actual write operation to the appropriate sub-service (Bulk Writer Service or Incremental Writer Service).
- Interact with the Database Connection Service to perform database operations.
- Handle potential errors during the writing process.
This service provides a unified entry point for all data write operations to the Knowledge Graph database.
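Strategy selection might look like the following sketch; the `WriteStrategy` variants come from the description above, while the 10% changed-files threshold is purely an illustrative assumption to be replaced by configuration or measurement:

```rust
// Illustrative strategy selection for the General Writer Service.
#[derive(Debug, PartialEq)]
enum WriteStrategy {
    Init,        // first index of a repository: bulk COPY FROM
    Bulk,        // large update: bulk delete + re-import
    Incremental, // small update: targeted MERGE/SET/DELETE queries
}

fn choose_strategy(database_exists: bool, changed_files: usize, total_files: usize) -> WriteStrategy {
    if !database_exists {
        WriteStrategy::Init
    } else if total_files > 0 && changed_files * 10 > total_files {
        // Assumed heuristic: past ~10% of files changed, incremental
        // updates stop paying off and a bulk rewrite is cheaper.
        WriteStrategy::Bulk
    } else {
        WriteStrategy::Incremental
    }
}
```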
Acceptance Criteria:
- `GeneralWriterService` struct/trait is defined in the `core-indexer` crate.
- Logic for determining the writing strategy is implemented.
- The service can delegate write operations to placeholder Bulk and Incremental writer functions/structs.
- Integration with the Database Connection Service is established.
- Unit tests cover the strategy determination and delegation logic.
Issue: Implement Bulk Writer Service for Knowledge Graph Indexer
Task: Writer Service - Bulk Writer Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement the Bulk Writer Service within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service is a sub-component of the General Writer Service and is responsible for efficiently writing large volumes of graph data into the Kuzu database. According to the architecture (Knowledge Graph Server (gitlab-org&17518)), this service will run on Zoekt nodes as part of the indexing process.
The service should:
- Accept graph data (nodes and relationships) in a format suitable for bulk import.
- Leverage Kuzu's `COPY FROM` functionality via the Database Connection Service for initial indexing.
- Implement logic for handling large updates, which may involve bulk deletion and re-insertion or other optimized strategies.
- Handle potential errors during bulk operations.
This service is critical for the performance of initial repository indexing and large-scale graph updates.
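CSV staging plus statement generation for the bulk path could be sketched as below; the quoting rules and the exact `COPY ... FROM` syntax are assumptions to verify against Kuzu's documentation:

```rust
// Escapes a single CSV field (assumed RFC 4180-style quoting).
fn csv_escape(field: &str) -> String {
    if field.contains(',') || field.contains('"') || field.contains('\n') {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

// Serializes node rows into CSV text for staging on disk.
fn to_csv_rows(rows: &[Vec<&str>]) -> String {
    rows.iter()
        .map(|row| row.iter().map(|f| csv_escape(f)).collect::<Vec<_>>().join(","))
        .collect::<Vec<_>>()
        .join("\n")
}

// Builds the bulk-import statement pointing Kuzu at the staged file.
fn copy_from_statement(table: &str, csv_path: &str) -> String {
    format!("COPY {} FROM '{}'", table, csv_path)
}
```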
Acceptance Criteria:
- `BulkWriterService` struct/trait is defined in the `core-indexer` crate.
- Logic for performing bulk imports using `COPY FROM` via the Database Connection Service is implemented.
- Basic handling for large updates is considered.
- Unit tests cover the bulk writing logic (potentially using mock database interactions).
Issue: Implement Incremental Writer Service for Knowledge Graph Indexer
Task: Writer Service - Incremental Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client
Description:
Implement the Incremental Writer Service within the `core` crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service is a sub-layer of the General Writer Service and is optimized for small, targeted updates to the graph. According to Knowledge Graph First Iteration (gitlab-org&17514), this service is particularly relevant for the client-side indexer.
The service should:
- Accept incremental graph data changes (nodes and relationships to be added, updated, or deleted).
- Leverage Cypher queries (`MERGE`, `SET`, `DELETE`) via the Database Connection Service to mutate the graph data.
- Optimize queries for efficiency when applying small changes.
- Handle potential errors during incremental operations.
This service is crucial for keeping the Knowledge Graph up-to-date with minimal overhead, particularly important for responsive client-side features.
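The incremental path boils down to emitting targeted Cypher. A sketch with hypothetical helpers follows; note that a production version should prefer parameterized queries over the string interpolation shown here:

```rust
// Illustrative upsert for a File node: MERGE matches-or-creates, SET updates.
fn upsert_file_query(path: &str, language: &str) -> String {
    format!(
        "MERGE (f:File {{path: '{}'}}) SET f.language = '{}'",
        path.replace('\'', "\\'"),
        language.replace('\'', "\\'")
    )
}

// Illustrative removal of a File node and all its relationships.
fn delete_file_query(path: &str) -> String {
    format!(
        "MATCH (f:File {{path: '{}'}}) DETACH DELETE f",
        path.replace('\'', "\\'")
    )
}
```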
Acceptance Criteria:
- `IncrementalWriterService` struct/trait is defined in the `core-indexer` crate.
- Logic for applying incremental changes using Cypher queries via the Database Connection Service is implemented.
- Basic query optimization for small changes is considered.
- Unit tests cover the incremental writing logic (potentially using mock database interactions).
Issue: Implement Workspace Management Service for Knowledge Graph Indexer
Task: Workspace Management Service Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: CLI crate, Language Server Bindings crate Applies to: Client
Description:
Implement a `WorkspaceManagementService` within the client-side crates (`cli`, `language-server-bindings`) of the Knowledge Graph Indexer (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service is crucial for the standalone, client-side use case of the Knowledge Graph Indexer described in Knowledge Graph First Iteration (gitlab-org&17514).
The service should be capable of:
- Accepting a workspace URI (e.g., a folder path).
- Traversing the filesystem within the workspace to discover individual Git repositories.
- Performing basic local repository operations, such as identifying the repository root. This could potentially leverage the Gitalisk library (gitlab-org&17515).
- (Future) Incorporating operations to obtain local git data for repositories (branch, commit, remotes).
- Providing functionality to associate a discovered local repository with its corresponding remote project on GitLab or GitHub.
- Managing the context of which repositories are part of the current workspace for indexing and querying purposes.
This service enables the client-side indexer to operate on multiple local repositories and understand their context.
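Repository discovery can be sketched with a plain filesystem walk that looks for `.git` entries; `discover_repos` is illustrative, and Gitalisk may replace it with richer repository handling:

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Discovers Git repositories by walking the workspace for `.git` entries.
fn discover_repos(root: &Path) -> Vec<PathBuf> {
    let mut repos = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        if dir.join(".git").exists() {
            repos.push(dir);
            continue; // do not descend into a repository's working tree
        }
        if let Ok(entries) = fs::read_dir(&dir) {
            for entry in entries.flatten() {
                let path = entry.path();
                if path.is_dir() {
                    stack.push(path);
                }
            }
        }
    }
    repos.sort();
    repos
}
```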
Acceptance Criteria:
- `WorkspaceManagementService` struct/trait is defined in the client-side crates.
- The service can discover Git repositories within a given directory path.
- Methods for basic local Git operations are implemented (potentially using Gitalisk).
- Functionality to link a local repo to a remote is designed (implementation may be a separate task).
- The service can maintain a list of discovered repositories within the active workspace.
- Unit tests cover filesystem traversal and repository identification logic.
Issue: Implement Language Data Extraction Service for Knowledge Graph Indexer
Task: Language Data Extraction Service Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement the Language Data Extraction Service within the `core` indexer crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service is responsible for orchestrating the process of reading source files, determining their language, and extracting relevant code intelligence data using the `gitlab-code-parser` (One Parser (gitlab-code-parser) (gitlab-org&17516)). According to the architecture (Knowledge Graph Server (gitlab-org&17518)), this service will run on Zoekt nodes as part of the server-side indexing process.
The service should:
- Accept a list of file paths to process.
- Implement efficient file system traversal and filtering (potentially leveraging the `client indexer` crate's focus on fast scanning, e.g., the `ignore` crate's `WalkParallel`).
- Utilize worker threads or an asynchronous processing model to iterate over files in parallel.
- For each file, invoke the Analysis Service to parse the code and extract definitions, references, imports, and metadata.
- Collect the extracted data from the workers.
- Send the extracted data to the "matcher" (which is likely the Relationship Builder Service) for graph construction.
This service manages the parallel processing of files to improve indexing performance.
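The worker-based iteration could be sketched with standard-library threads and a channel; the definition-counting stub stands in for the real Analysis Service call, and the chunking scheme is an illustrative assumption:

```rust
use std::sync::mpsc;
use std::thread;

// Splits file contents across worker threads; each worker runs a (stubbed)
// extraction step and sends its result back over a channel.
fn extract_in_parallel(files: Vec<String>, workers: usize) -> usize {
    let workers = workers.max(1);
    let chunk = ((files.len() + workers - 1) / workers).max(1);
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for batch in files.chunks(chunk) {
        let batch = batch.to_vec();
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for content in batch {
                // Stub for "extract definitions": count `def ` occurrences.
                let defs = content.matches("def ").count();
                tx.send(defs).unwrap();
            }
        }));
    }
    drop(tx); // close our sender so rx.iter() terminates
    let total: usize = rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}
```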
Acceptance Criteria:
- `LanguageDataExtractionService` struct/trait is defined in the `core-indexer` crate.
- The service can accept a list of files and iterate over them.
- Parallel processing using workers/threads is implemented.
- Integration with the Analysis Service is established to get extracted data.
- Extracted data is prepared and passed to the Relationship Builder Service.
- Error handling for file processing and extraction is implemented.
- Unit tests cover the file iteration, parallel processing, and service coordination logic.
Issue: Implement Query Service for Knowledge Graph Indexer
Task: Query Service Theme: Indexer Specific Code Project: Knowledge Graph Indexer Crate/Packages: Core Indexer crate Applies to: Client & Server
Description:
Implement the Query Service within the `core` indexer crate of the Knowledge Graph Indexer project (Knowledge Graph Core Indexer Project (gitlab-org&17517)). This service is responsible for providing an API to query the Knowledge Graph database and retrieve code intelligence information. According to the architecture (Knowledge Graph Server (gitlab-org&17518)), on the server side, queries will come via the GitLab Rails abstraction layer and the `zoekt-webservice`. On the client side (Knowledge Graph First Iteration (gitlab-org&17514)), queries will be executed directly against the local Kuzu DB.
The service should:
- Provide methods for executing common Cypher queries against the Kuzu database via the Database Connection Service (e.g., finding definitions within a specific file, finding references to a definition, traversing relationships).
- Potentially include functionality to extract data directly from a file's content using the Analysis Service if the query pertains to the structure within a single file rather than relationships in the graph.
- Return query results in a structured and easily consumable format.
- Handle potential errors during query execution.
This service is essential for enabling all features that consume data from the Knowledge Graph, serving both client-side (LSP features) and server-side (API endpoints) consumers.
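One possible shape for the read API, with a mockable reader trait; the `GraphReader` trait, the schema labels in the Cypher, and `DefinitionLocation` are assumptions, not the final design:

```rust
// Hypothetical read-side abstraction; the concrete implementation would
// run Cypher through the Database Connection Service.
trait GraphReader {
    fn query_rows(&self, cypher: &str) -> Vec<Vec<String>>;
}

#[derive(Debug, PartialEq)]
struct DefinitionLocation {
    file_path: String,
    line: u32,
}

struct QueryService<R: GraphReader> {
    reader: R,
}

impl<R: GraphReader> QueryService<R> {
    // Finds where a fully qualified name is defined; the Cypher is a sketch
    // against the assumed schema (File, Definition, DEFINES).
    fn find_definition(&self, fqn: &str) -> Vec<DefinitionLocation> {
        let cypher = format!(
            "MATCH (f:File)-[:DEFINES]->(d:Definition {{fqn: '{}'}}) RETURN f.path, d.line",
            fqn
        );
        self.reader
            .query_rows(&cypher)
            .into_iter()
            .filter_map(|row| {
                Some(DefinitionLocation {
                    file_path: row.first()?.clone(),
                    line: row.get(1)?.parse().ok()?,
                })
            })
            .collect()
    }
}
```

Typed results such as `DefinitionLocation` keep LSP and API consumers decoupled from raw query rows.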
Acceptance Criteria:
- `QueryService` struct/trait is defined in the `core-indexer` crate.
- Methods for executing common Cypher queries via the Database Connection Service are implemented.
- Integration with the Analysis Service for file-internal data extraction is considered/implemented if needed.
- Query results are returned in a defined structure.
- Error handling for query execution is implemented.
- Unit tests cover the query logic and interaction with the database (potentially using a mock database).
- Documentation on available queries and how to use the service is available.
General Note on Platform Prioritization
Task: Linux (UNIX) first
Description:
This is not a specific implementation task but a guiding principle for the development of the Knowledge Graph Indexer and related components (Knowledge Graph First Iteration (gitlab-org&17514), One Parser (gitlab-code-parser) (gitlab-org&17516), Knowledge Graph Core Indexer Project (gitlab-org&17517)).
All development, testing, and initial deployment efforts should prioritize Linux (UNIX-like) operating systems first. While the chosen technologies (Rust, `tree-sitter`, Kuzu) are generally cross-platform, ensuring full functionality and performance on Linux is the primary focus before expanding to other platforms (e.g., macOS, Windows).
This principle applies to:
- Setting up development environments.
- Running unit, integration, and performance tests.
- Configuring CI pipelines.
- Initial documentation and support.
Consideration for cross-platform compatibility should be kept in mind during design and implementation, but full support for other platforms will follow the successful implementation and stabilization on Linux.
This report provides detailed issue descriptions based on the synthesized research, aiming to align each work item with the overall Knowledge Graph architecture and vision. These descriptions should serve as a solid foundation for creating actionable development tasks.