researching this epic https://gitlab.com/groups/gitlab-org/-/epics/18028 and this MR https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/merge_requests/1, help me do the following:
help me come up with issues (title and detailed description) similar to typescript and python epics that will go under https://gitlab.com/groups/gitlab-org/-/epics/18028
help me come up with a detailed epic description for https://gitlab.com/groups/gitlab-org/-/epics/18028, and tailor the issue descriptions towards a Kotlin parser
make sure to break the issues down into logical chunks based off the research and make sure to explain the scope
keep the scope limited to https://gitlab.com/groups/gitlab-org/-/epics/18028 and below; do not search above it
I'll research the epic and merge request you provided to understand the structure and requirements for creating Kotlin parser issues. Let me gather the necessary information first.
Research Complete
Executive Summary
Research completed successfully! I've analyzed 4 GitLab items and generated a comprehensive report.
This report synthesizes detailed research on GitLab's "One Parser" project to provide recommendations for integrating Kotlin language support. It includes a proposed update to the epic description for gitlab-org&18028 (Ruby Parser First Iteration) to encompass Kotlin, and a set of detailed issues for developing the Kotlin parser, designed to be created under this epic. The recommendations are based on established patterns from the existing Ruby, Python, and JavaScript/TypeScript parser implementations within the gitlab-code-parser project.
1. Context: The "One Parser" Project and Existing Patterns
GitLab's "One Parser" project (gitlab-org&17516) aims to create a unified, Rust-based static code analysis library named gitlab-code-parser. This library is foundational for various code intelligence features, including the Knowledge Graph and AI-driven functionalities.
The core technical approach involves using language-specific tree-sitter grammars to generate Abstract Syntax Trees (ASTs) and ast-grep for defining pattern-matching rules to extract key code entities.
Analysis of existing language parser epics, such as:
- Python Parser (gitlab-org&18004)
- JavaScript/TypeScript Parser (gitlab-org&18005)
- Ruby Parser (gitlab-org&18028)
reveals a consistent 5-phase pattern for parser development:
- Parse Definitions: Extracting structural code elements (classes, functions, etc.).
- Resolve FQNs for Definitions: Computing Fully Qualified Names for these definitions.
- Parse References: Identifying usages and calls to defined entities.
- Resolve FQNs for References: Determining FQNs for these references.
- Parse Imports: Extracting dependency statements.
This structured approach ensures comprehensive data extraction for the Knowledge Graph Indexer.
2. Proposed Update to Epic gitlab-org&18028 for Kotlin Support
The epic gitlab-org&18028 is currently titled "Ruby Parser First Iteration". To incorporate Kotlin work as requested, it's recommended to broaden its scope. The title could be updated to, for example, "Ruby and Kotlin Parser First Iteration" or "Language Parsers First Iteration (Ruby, Kotlin)".
The following is a proposed description section to be added or integrated into the existing epic description of gitlab-org&18028, specifically detailing the Kotlin parser goals and scope:
Kotlin Parser Integration
Goal
Add initial support for Kotlin to the One Parser project, building upon the foundational work established by the Ruby, Python, and JavaScript/TypeScript parsers. This will enable knowledge graph construction for Kotlin codebases, as well as other static analysis applications, like chunking code for embeddings and providing context for AI features.
For a Kotlin file, the parser should yield:
- Definitions: Classes (including data class, sealed class, enum class, inner class, nested class), interfaces (including sealed interface), objects (including companion object), functions (top-level, member, extension, local, lambda expressions), properties (top-level, member, extension, val, var, lateinit, const), and type aliases.
- References: Function calls (including infix and operator overloading), property accesses, class/object instantiations, method references/callable references, and constructor calls.
- Imports/Dependencies: import statements (single, multiple, aliased, wildcard) and package declarations.
It should also compute a Fully Qualified Name (FQN) for each extracted entity. These FQNs will be used to link nodes together in the Knowledge Graph Indexer.
Why imports?
As highlighted in the Python Parser epic (gitlab-org&18004):
Given a single file, we can only resolve FQNs for references to functions defined in the same file. For references to functions imported from other files, the best our parser can do is trace the reference back to the imported symbol.
For Kotlin, this means that if a file imports myUtilityFunction from com.example.utils (for example, via import com.example.utils.myUtilityFunction) and calls it inside main(), the FQN our parser computes for the myUtilityFunction call within main() will initially be com.example.utils.myUtilityFunction. The Knowledge Graph Indexer will then use this FQN, combined with the import information, to link this reference to the actual definition of myUtilityFunction in com.example.utils. This is crucial for:
- File-to-file relationships: Understanding dependencies between source files.
- Cross-file FQN resolution: Accurately resolving references that span multiple files or modules.
- Third-party dependencies: Identifying and tracking external library usage.
Limitations (First Iteration for Kotlin)
Due to the complexity of Kotlin's features and the scope of this initial iteration, the following aspects will not be fully supported or will have known limitations:
- Dynamic/Reflection-based Calls: Calls resolved at runtime using reflection (e.g., Class.forName("...")) cannot be statically analyzed.
- Complex DSLs: While tree-sitter-kotlin can parse the syntax, resolving FQNs within highly custom Domain Specific Languages (DSLs) might be limited.
- Multi-platform Project Resolution: Resolving FQNs across different source sets (common, JVM, JS, Native) in multi-platform projects is out of scope for this iteration. The parser will operate on individual files.
- Annotation Processing: The semantic meaning derived from annotation processors will not be considered for FQN resolution or entity extraction.
- Type Inference for Complex Scenarios: tree-sitter produces a purely syntactic tree and performs no type inference. The parser may apply basic inference on top of it (for example, from explicit type annotations and constructor calls), but deep semantic type inference for highly generic code or complex lambda expressions will be limited for FQN resolution.
- Operator Overloading: Calls using overloaded operators will be treated as standard method calls, and their FQN resolution will follow the same rules.
- Delegated Properties: The underlying getter/setter logic for delegated properties will not be explicitly modeled as separate definitions/references in this iteration.
- Anonymous Object Expressions: While the object expression itself may be captured, full FQN resolution for members within anonymous object expressions might be limited.
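As a rough illustration of the operator overloading point above, the sketch below maps a few binary operator tokens to the function names Kotlin's operator conventions assign to them (plus, minus, times, div, rem, rangeTo), so that a + b can be recorded as a call to plus. The function names operator_function_name and operator_call_fqn are hypothetical and not part of gitlab-code-parser. The sketch also assumes the receiver type is already known and that the operator is a member function; an operator declared as an extension would need a different FQN.

```rust
/// Returns the Kotlin operator-function name for a binary operator token,
/// or `None` for tokens this sketch does not cover.
pub fn operator_function_name(op: &str) -> Option<&'static str> {
    match op {
        "+" => Some("plus"),
        "-" => Some("minus"),
        "*" => Some("times"),
        "/" => Some("div"),
        "%" => Some("rem"),
        ".." => Some("rangeTo"),
        _ => None,
    }
}

/// Builds the FQN of the member function behind `receiver op other`.
/// Assumption: the receiver type's FQN was resolved elsewhere and the
/// operator is declared as a member of that type.
pub fn operator_call_fqn(receiver_type_fqn: &str, op: &str) -> Option<String> {
    operator_function_name(op).map(|name| format!("{}.{}", receiver_type_fqn, name))
}
```

With a hypothetical receiver type com.example.Vec2, operator_call_fqn("com.example.Vec2", "+") yields com.example.Vec2.plus, which can then be handled like any other method call reference.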
Implementation Details
The Kotlin parser will be implemented in Rust within the gitlab-code-parser project. It will primarily leverage:
- tree-sitter-kotlin for generating the Abstract Syntax Tree (AST).
- ast-grep for defining pattern-matching rules to extract definitions, references, and imports.
- A common CodeAnalyzer framework (as seen in gitlab-org/rust/gitlab-code-parser!12), which will be utilized to ensure consistency across language parsers.
3. Proposed Kotlin Parser Issues (under gitlab-org&18028)
The following five issues are proposed for the Kotlin parser, structured according to the established 5-phase pattern. These issues should be created as children of the updated gitlab-org&18028 epic.
3.1. Issue 1: (Kotlin) Parse Definitions
- Title: (Kotlin) Parse Definitions
- Description:
Goal: Extract core code definitions from Kotlin files using tree-sitter-kotlin and ast-grep.
This issue focuses on identifying and extracting structural definitions within Kotlin code. These definitions are crucial for building the foundational nodes of the Knowledge Graph.
Scope: The parser should identify and extract the following types of definitions:
- class_declaration: Classes, including data class, sealed class, enum class, inner class, and nested class.
- interface_declaration: Interfaces, including sealed interface.
- object_declaration: Object declarations, including companion object and anonymous objects where feasible.
- function_declaration: Functions, including top-level functions, member functions, extension functions, local functions, and operator functions.
- property_declaration: Properties, including top-level properties, member properties, extension properties, val, var, lateinit var, and const val.
- type_alias: Type aliases.
- lambda_expression: Lambda expressions, especially when assigned to a variable or passed as a higher-order function argument, where they define a distinct code block.
Out of Scope (for this issue):
- Computing Fully Qualified Names (FQNs) for these definitions (handled in Issue 3.2).
- Resolving definitions across multiple files or modules.
- Parsing annotations or their semantic meaning beyond simple identification.
Implementation Details: Leverage the tree-sitter-kotlin grammar to build the Abstract Syntax Tree (AST). Define ast-grep rules (e.g., in crates/parser-core/src/kotlin/rules/definitions.yaml) to match the specified definition nodes. The output should be a list of raw matches (MatchInfo) or a structured representation of each definition, including its type, name, and source code location (start/end line/column).
Example Kotlin Code:
[Kotlin code block not captured]
Expected Definitions (simplified):
- MyClass (class)
- id (property, within MyClass primary constructor)
- memberFunction (function, within MyClass)
- myProperty (property, within MyClass)
- MyCompanion (object, within MyClass)
- TAG (property, within MyCompanion)
- staticMethod (function, within MyCompanion)
- MyInterface (interface)
- interfaceMethod (function, within MyInterface)
- MySingleton (object)
- doSomething (function, within MySingleton)
- myExtension (function, extension on String)
- Name (type alias)
- myLambda (property holding a lambda)
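One possible shape for the "structured representation of each definition" is sketched below. It is only an illustration: Definition, DefinitionKind, Position, and kind_for_node are hypothetical names, not the project's MatchInfo type, and the node kind strings are copied from the scope list in this issue without being verified against the tree-sitter-kotlin grammar.

```rust
/// Kinds of definition listed in the issue scope.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DefinitionKind {
    Class,
    Interface,
    Object,
    Function,
    Property,
    TypeAlias,
    Lambda,
}

/// A line/column position in the source file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: usize,
    pub column: usize,
}

/// One extracted definition: its kind, name, and source range.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Definition {
    pub kind: DefinitionKind,
    pub name: String,
    pub start: Position,
    pub end: Position,
}

/// Maps a syntax-node kind to a definition kind.
/// Assumption: these strings are the node kinds named in the issue text.
pub fn kind_for_node(node_kind: &str) -> Option<DefinitionKind> {
    match node_kind {
        "class_declaration" => Some(DefinitionKind::Class),
        "interface_declaration" => Some(DefinitionKind::Interface),
        "object_declaration" => Some(DefinitionKind::Object),
        "function_declaration" => Some(DefinitionKind::Function),
        "property_declaration" => Some(DefinitionKind::Property),
        "type_alias" => Some(DefinitionKind::TypeAlias),
        "lambda_expression" => Some(DefinitionKind::Lambda),
        _ => None,
    }
}

impl Definition {
    /// Builds a record from a matched node; returns `None` when the node
    /// kind is not one of the definition kinds above.
    pub fn from_match(node_kind: &str, name: &str, start: Position, end: Position) -> Option<Self> {
        let kind = kind_for_node(node_kind)?;
        Some(Definition { kind, name: name.to_string(), start, end })
    }
}
```

Keeping the record free of FQN data matches the split between this issue and Issue 3.2: the FQN can be attached in a later pass that walks the same nodes.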
3.2. Issue 2: (Kotlin) Resolve FQNs for Definitions
- Title: (Kotlin) Resolve FQNs for Definitions
- Description:
Goal: Compute Fully Qualified Names (FQNs) for all definitions identified in Kotlin files.
Assuming definitions have been captured (as per Issue 3.1), this issue focuses on traversing the AST to determine the complete, unique, hierarchical name for each definition. This is crucial for linking nodes in the Knowledge Graph.
Scope: The FQN resolution should correctly handle:
- Package-level definitions: Based on package declaration and file path.
- Nested classes/interfaces/objects: FQNs for inner class, nested class, companion object, and other nested declarations (e.g., com.example.MyClass.NestedClass, com.example.MyClass.Companion.myMethod).
- Member functions and properties: FQNs for members within classes, interfaces, and objects.
- Extension functions and properties: FQNs should reflect their declaration site and the receiver type (e.g., com.example.MyExtensions.String.myExtensionFunction).
- Lambda expressions: FQNs for lambdas should be derived from their context (e.g., com.example.MyClass.myFunction.<lambda_N>).
- Type aliases: FQNs for type aliases.
Implementation Details: This will involve a traversal of the AST (similar to build_fqn_and_node_indices in the Ruby parser MR gitlab-org/rust/gitlab-code-parser!1) to build a scope stack that tracks the current package, class, object, or function context. When a definition node is encountered, its FQN is constructed by combining the scope stack with its local name. The file path will be used to derive the initial package FQN segment if a package declaration is missing or incomplete.
Example Kotlin Code (from Issue 3.1):
[Kotlin code block not captured]
Expected FQNs for Definitions:
- com.example.app.MyClass
- com.example.app.MyClass.id
- com.example.app.MyClass.memberFunction
- com.example.app.MyClass.MyCompanion
- com.example.app.MyClass.MyCompanion.staticMethod
- com.example.app.String.myExtension (or a similar convention for extensions, e.g., com.example.app.<file_name>.String.myExtension)
- com.example.app.Name
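The scope-stack idea from the implementation details can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ruby parser's build_fqn_and_node_indices: ScopeStack and its methods are hypothetical names, the traversal that calls enter and exit is left out, and lambdas are numbered per file rather than per enclosing function.

```rust
/// Tracks the enclosing package/class/object/function names during a
/// depth-first walk of the syntax tree.
pub struct ScopeStack {
    segments: Vec<String>,
    package_depth: usize,
    lambda_count: usize,
}

impl ScopeStack {
    /// Starts the stack from the file's package name (may be empty).
    pub fn new(package: &str) -> Self {
        let segments: Vec<String> = package
            .split('.')
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect();
        let package_depth = segments.len();
        ScopeStack { segments, package_depth, lambda_count: 0 }
    }

    /// Called when the walk enters a class, object, or function body.
    pub fn enter(&mut self, name: &str) {
        self.segments.push(name.to_string());
    }

    /// Called when the walk leaves that body; never pops package segments.
    pub fn exit(&mut self) {
        if self.segments.len() > self.package_depth {
            self.segments.pop();
        }
    }

    /// Joins the current scope with a definition's local name.
    pub fn fqn_for(&self, local_name: &str) -> String {
        let mut parts: Vec<&str> = self.segments.iter().map(String::as_str).collect();
        parts.push(local_name);
        parts.join(".")
    }

    /// Produces a synthetic name such as `<lambda_1>` for an anonymous lambda.
    pub fn next_lambda_name(&mut self) -> String {
        self.lambda_count += 1;
        format!("<lambda_{}>", self.lambda_count)
    }
}
```

Starting from ScopeStack::new("com.example.app"), entering MyClass and then MyCompanion makes fqn_for("staticMethod") return com.example.app.MyClass.MyCompanion.staticMethod, matching the expected FQN listed above.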
3.3. Issue 3: (Kotlin) Parse References
- Title: (Kotlin) Parse References
- Description:
Goal: Identify and extract references within Kotlin code using ast-grep.
This issue focuses on recognizing instances where previously defined entities (functions, classes, properties, objects) are used or called. These references are essential for establishing relationships (edges) in the Knowledge Graph.
Scope: The parser should identify and extract the following types of references:
- Function Calls: Calls to top-level, member, extension, local, and operator functions (e.g., myFunction(), myObject.doSomething(), String.myExtension(), a + b).
- Property Accesses: Reads or writes to properties (e.g., myObject.myProperty, MyClass.staticProperty).
- Class/Object Instantiations: Constructor calls and references to object declarations (e.g., MyClass(), MyObject).
- Type References: Usage of class, interface, or object names as types (e.g., val x: MyClass, fun process(arg: MyInterface)).
- Method References/Callable References: References to functions or properties (e.g., ::myFunction, MyClass::myMember).
- References to enum entries.
Out of Scope (for this issue):
- Computing Fully Qualified Names (FQNs) for these references (handled in Issue 3.4).
- References within string literals or comments.
Implementation Details: Utilize ast-grep with appropriate pattern-matching rules (e.g., in crates/parser-core/src/kotlin/rules/references.yaml) to capture reference nodes. The output should include the matched text, its location, and any captured meta-variables that can aid in later FQN resolution.
Example Kotlin Code:
[Kotlin code block not captured]
Expected References (simplified, names only):
- MyClass (constructor call)
- memberFunction (function call)
- id (property access)
- Name (type usage)
- myExtension (function call)
- staticMethod (function call)
- process (function call)
3.4. Issue 4: (Kotlin) Resolve FQNs for References
- Title: (Kotlin) Resolve FQNs for References
- Description:
Goal: Compute Fully Qualified Names (FQNs) for all references identified in Kotlin files.
Assuming references have been parsed (as per Issue 3.3), this issue focuses on resolving the FQN of the entity being referenced. This is a complex task due to Kotlin's rich type system, scope rules (including imports and extension functions), and type inference. Perfect static resolution may not be possible for all cases in the first iteration; the aim is for "pretty good" resolution for common cases within a single file and based on imports.
Scope: The FQN resolution for references should aim to handle:
- Local variable/parameter references: Resolve to the FQN of their declaration (if applicable, or mark as local).
- Member references: Resolve to the FQN of the class/object member, considering this and super.
- Top-level function/property references: Resolve to their package-level FQN.
- Extension function/property calls: Resolve to the FQN of the extension's declaration.
- References within scope functions: (e.g., apply, with, run, let, also) where the receiver context changes.
- References to imported symbols: Trace the reference back to the imported symbol's FQN (e.g., com.example.utils.Helper). The Knowledge Graph Indexer will handle cross-file resolution.
- Basic aliased imports: If an import is aliased (e.g., import com.example.MyClass as MC), references to MC should resolve to com.example.MyClass.
Out of Scope (for this issue / Limitations for first iteration):
- Full cross-file FQN resolution (this is primarily for the Knowledge Graph Indexer, using data from "Parse Imports").
- Complex type inference involving generics, higher-order functions, or reflection.
- Resolution of references involving operator overloading beyond identifying the underlying function if clear.
- Resolution of references that require full project-level dependency analysis or build system awareness.
Implementation Details: This will require sophisticated AST traversal, maintaining a symbol table for the current file's scope (including imports), and context tracking. It may involve basic type inference for local variables. The node_fqn_map built during definition FQN resolution (Issue 3.2) will be crucial. For imported symbols, the FQN will be the FQN of the import itself.
Example Kotlin Code (from Issue 3.3):
[Kotlin code block not captured]
Expected FQNs for References (simplified):
- com.example.app.MyClass (constructor)
- com.example.app.MyClass.memberFunction
- com.example.app.MyClass.id
- com.example.app.Name (type usage)
- com.example.app.String.myExtension (assuming Name is String and myExtension is on String)
- com.example.app.MyClass.MyCompanion.staticMethod
- com.example.utils.Helper.process (if helper is of type com.example.utils.Helper)
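A per-file symbol table for simple-name lookups might look like the sketch below. FileSymbols and Resolution are hypothetical names, and the lookup order (locals, then explicit or aliased imports, then the file's own top-level definitions) is a deliberate simplification rather than Kotlin's full resolution rules; member scopes, star imports, and default imports are not modeled.

```rust
use std::collections::HashMap;

/// Outcome of resolving one simple name inside a single file.
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    /// A local variable or parameter; no FQN is produced.
    Local,
    /// Resolved to a definition in this file or traced back to an import.
    Fqn(String),
    /// Left for later passes or the Knowledge Graph Indexer.
    Unresolved,
}

/// Names visible at one point in a file (illustrative only).
pub struct FileSymbols {
    /// Local variables and parameters in the current scope.
    pub locals: Vec<String>,
    /// Visible name (simple name or alias) -> imported FQN.
    pub imports: HashMap<String, String>,
    /// Simple name -> FQN for top-level definitions in this file.
    pub top_level_definitions: HashMap<String, String>,
}

impl FileSymbols {
    /// Resolves a simple name using the simplified lookup order.
    pub fn resolve(&self, name: &str) -> Resolution {
        if self.locals.iter().any(|local| local == name) {
            return Resolution::Local;
        }
        if let Some(fqn) = self.imports.get(name) {
            return Resolution::Fqn(fqn.clone());
        }
        if let Some(fqn) = self.top_level_definitions.get(name) {
            return Resolution::Fqn(fqn.clone());
        }
        Resolution::Unresolved
    }
}
```

With import com.example.MyClass as MC recorded as the entry MC -> com.example.MyClass, resolving MC returns com.example.MyClass, which is the aliased-import behavior described in the scope list.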
3.5. Issue 5: (Kotlin) Parse Imports
- Title: (Kotlin) Parse Imports
- Description:
Goal: Extract all package declarations and import statements from Kotlin files, along with the imported symbols and their Fully Qualified Names (FQNs) or target paths.
Understanding these dependencies is critical for building a comprehensive Knowledge Graph, as it allows for linking definitions and references across different files and modules.
Scope: The parser should identify and extract information from:
- package declarations: Identify the package name of the current file (e.g., package com.example.app).
- Single imports: import com.example.MyClass
- Member imports: import com.example.MyClass.staticMember
- Alias imports: import com.example.MyClass as MyAlias
- Wildcard imports: import com.example.util.*
For each import, the parser should capture:
- The imported symbol's name (or alias if used).
- The fully qualified path of the imported symbol (e.g., com.example.MyClass, com.example.util).
- The type of import (e.g., single, wildcard, aliased).
Out of Scope (for this issue):
- Resolving the actual file path or module where the imported symbol is defined (this is a task for the Knowledge Graph Indexer).
- Handling dynamic imports or classpath resolution.
Implementation Details: Utilize tree-sitter-kotlin and ast-grep to identify package_header and import_list/import_header nodes. Extract the relevant text and structure from these nodes to form the FQN or path. The output should be structured to facilitate cross-file FQN resolution by the Knowledge Graph Indexer.
Example Kotlin Code:
[Kotlin code block not captured]
Expected Imports (simplified):
- Package: com.example.app
- Import: List -> kotlin.collections.List
- Import: SB (alias) -> kotlin.text.StringBuilder
- Import: * (wildcard) -> com.example.common
- Import: User -> com.example.data.User
- Import: createDefault -> com.example.data.User.Companion.createDefault
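To make the three captured fields concrete, the sketch below turns the text of one import statement into a record with a kind, a visible name, and a path. It works on plain text purely for illustration; the real implementation would read these parts from the matched syntax nodes. ImportRecord, ImportKind, and parse_import are hypothetical names, and backtick-quoted identifiers are not handled.

```rust
/// The import forms listed in the issue scope.
#[derive(Debug, PartialEq, Eq)]
pub enum ImportKind {
    Single,
    Aliased,
    Wildcard,
}

/// One parsed import statement.
#[derive(Debug, PartialEq, Eq)]
pub struct ImportRecord {
    pub kind: ImportKind,
    /// Name usable in the file: last path segment, alias, or "*".
    pub visible_name: String,
    /// Imported FQN, or the package path for a wildcard import.
    pub path: String,
}

/// Parses the text of a single import statement.
/// Returns `None` for anything that is not a well-formed import.
pub fn parse_import(header: &str) -> Option<ImportRecord> {
    let rest = header.trim().strip_prefix("import")?;
    // Require whitespace after the keyword so "importfoo" is rejected.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let mut tokens = rest.trim().trim_end_matches(';').split_whitespace();
    let target = tokens.next()?;
    match (tokens.next(), tokens.next(), tokens.next()) {
        // No alias: either a wildcard or a plain single/member import.
        (None, _, _) => {
            if let Some(package) = target.strip_suffix(".*") {
                Some(ImportRecord {
                    kind: ImportKind::Wildcard,
                    visible_name: "*".to_string(),
                    path: package.to_string(),
                })
            } else {
                let simple = target.rsplit('.').next().unwrap_or(target);
                Some(ImportRecord {
                    kind: ImportKind::Single,
                    visible_name: simple.to_string(),
                    path: target.to_string(),
                })
            }
        }
        // `import a.b.C as Alias`
        (Some("as"), Some(alias), None) => Some(ImportRecord {
            kind: ImportKind::Aliased,
            visible_name: alias.to_string(),
            path: target.to_string(),
        }),
        _ => None,
    }
}
```

Member imports such as import com.example.MyClass.staticMember come out as Single here, since telling a member apart from a class by text alone is not possible; that distinction is left to later resolution.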
4. Supporting Evidence from Research
The proposed structure and scope are directly informed by existing GitLab parser initiatives:
- The Ruby Parser First Iteration epic (gitlab-org&18028) outlines the core goals for a language parser:
"For a Ruby file, the parser should yield:
- Definitions: ...
- References: ...
- Imports/Dependencies: ...
It should also compute a Fully Qualified Name (FQN) for each extracted entity."
- The Ruby parser MR (gitlab-org/rust/gitlab-code-parser!1) details the technical workflow:
"1. Initialization: ... AstGrep object, and preloaded ast-grep rules. 2. Rule Execution: ... executes the loaded ast-grep rules ... 3. Indexing: ... build_fqn_and_node_indices ... constructs ... node_fqn_map ..."
This workflow is expected to be adapted for Kotlin.
- The Python Parser epic (gitlab-org&18004) and JavaScript/TypeScript Parser epic (gitlab-org&18005) clearly establish the 5-issue breakdown (Parse Definitions, Resolve FQNs for Definitions, Parse References, Resolve FQNs for References, Parse Imports), which provides a robust framework for tackling language parsing incrementally and comprehensively.
5. Conclusion
By adopting the proven 5-issue structure and aligning with the goals of the "One Parser" project, this plan provides a clear and logical roadmap for integrating Kotlin support. The detailed issue descriptions, tailored for Kotlin's specific features and with defined scopes, will guide the development process. This will enable GitLab to effectively analyze Kotlin codebases, enriching the Knowledge Graph and powering advanced code intelligence and AI-assisted features for Kotlin developers.