If you want to take the code, here is the link.
It was a cold winter evening in Dublin. I was making a clone of Space Invaders in Zig and I came across another memory bug. The cursor in my Neovim stood steel. I realized that I created a self referential structure in the init
method of the Game structure. I opened the fuzzy search and quickly typed "init"
, and found 4 symbols with the name I was looking for. Apparently, I could differ them by the preview, but I don't like using a preview. I realized that it would be much easier to find a document symbol by its path. Namely, it'd be easier to type two words "Game" and "init" to get the right method. When we use fuzzy search to find a file in a project, we see the entire file path instead of just a name. If we found multiple files with the same name, we can type the name of one of the folders from the path of the file which we need. Why wouldn't we use the same approach for symbols in a document?
The fuzzy search in Neovim is implemented by a couple of plugins. Personally, I prefer Telescope, mainly it's easy to extend it, thank the rich documentation. Also, Neovim supports Tree-sitter to get the Abstract Syntax Tree (AST) to build the symbol paths, the plugin Nvim Treesitter does the work. Therefore, the high level plan is this:
The Telescope source code is well documented. You type :help tele
and press <TAB>
and see all components of the plugin in the completion list. Using the information you can create a dead simple bookmark menu with 35 lines of code:
local conf = require "telescope.config".values local make_entry = require "telescope.make_entry" local pickers = require "telescope.pickers" local finders = require "telescope.finders" local function create_bookmark_menu() local current_filename = vim.api.nvim_buf_get_name(0) local first_line = 1 local last_line = vim.api.nvim_buf_line_count(0) local entries = { { lnum = first_line, col = 1, text = "begin", filename = current_filename, }, { lnum = last_line, col = 1, text = "end", filename = current_filename, } } pickers.new({}, { prompt_title = "dummy menu", finder = finders.new_table({ results = entries, entry_maker = make_entry.gen_from_buffer_lines({}), sorter = conf.generic_sorter({}), previewer = conf.grep_previewer({}), }), }):find() end
Out of the example you can see two things:
The data format we get out of Tree-sitter shapes the entries for Telescope. The data we can get has following pieces:
This structure looks almost like that from the example from Telescope API. The symbols coordinates are pairs of lnum
and col
. The name of the symbol can be put into the text of a menu entry. There is no field for the kind of a symbol in the example. Therefore, I we have to introduce it. Also, we don't need the field filename
, because we collect symbols from one file only. Given that, the structure looks like this:
entry = { lnum = 12, col = 1, text = "init", kind = "function" }
However, the purpose of a menu with entries of this structure is missed, because text
has only the symbol name. The purpose is having the entire path to the symbol. Also, Tree-sitter calls a function a "functiondeclaration". Therefore, the structure will be like this:
entry = { lnum = 12, col = 1, text = "App::init", kind = "function_declaration" }
To get the entries we traverse the AST of a file. The typical way is the visitor pattern. The function of the visitor requires the following data:
That's what they are look like in the function declaration
local function visit_node(node, expected_node_type, tree_path_array, result, bufnr)
Let's define what the function does.
tree_path_array
.tree_path_array
.tree_path_array
.Here is the code of the steps:
visit_node = function(node, expected_node_type, tree_path_array, result, bufnr) local node_name = get_name(node, expected_node_type, bufnr) if node_name == nil or node_name:len() == 0 then return end local node_type = node:type() table.insert(tree_path_array, node_name) local line, col = node:start() local node_path = table.concat(tree_path_array, path_delim) table.insert(result, { lnum = line + 1, col = col + 1, kind = node_type, text = node_path, }) visit_children(node, expected_node_type, tree_path_array, result, bufnr) table.remove(tree_path_array, table.maxn(tree_path_array)) end
Out of the code, you see that result
gets the data exactly in the format we defined. Also, you can see that we have node_type
and expected_node_type
. Those two fields seem carrying the same information. I'll disclose the purpose of expected_node_type
in the following sections.
The last piece in the algorithm is visiting the children. It has 2 steps:
In the code, it looks this way:
local function visit_children(parent_node, expected_parent_node_type, tree_path_array, result, bufnr) local body_node = (function() if expected_parent_node_type == expected_node_type_variable_declaration then return find_complex_child(parent_node:iter_children()) elseif expected_parent_node_type == expected_node_type_test_declaration then return find_child(parent_node, expected_node_type_block) elseif expected_parent_node_type == expected_node_type_container_field then for _, body in pairs(parent_node:field("type")) do return filter_complex_child(body) end end return nil end)() if body_node ~= nil then for child_node, _ in body_node:iter_children() do local child_expected_node_type = get_expected_node_type(child_node) if ( child_expected_node_type == expected_node_type_container_field or child_expected_node_type == expected_node_type_function_declaration or child_expected_node_type == expected_node_type_variable_declaration ) then visit_node(child_node, child_expected_node_type, tree_path_array, result, bufnr) end end end end
The function find_complex_child
finds a child which passes the function filter_complex_child
, which determines if the child is one of the following types:
Then we traverse the child nodes of the body according the rules from (2).
The type of the data in node:type()
is a string. The type of the data in expected_node_type
is an integer. We use expected_node_type
to get the name of the node. Consider the following Zig snippet:
const ObjectHandle = struct { index: u16, }; fn app() void {} test "Happy path" { }
We have 4 types of nodes here:
Each one of them requires different approach to get its name.
The AST of the variable declaration requires to iterate over its children to find identifier
:
(variable_declaration ; [0, 0] - [2, 2] (identifier) ; [0, 6] - [0, 18] (struct_declaration ; [0, 21] - [2, 1] ; the container field will be shown separately )) ; [1, 11] - [1, 14]
The AST of the function declaration has its name in the named child name
:
(function_declaration ; [3, 0] - [3, 16] name: (identifier) ; [3, 3] - [3, 6] (parameters) ; [3, 6] - [3, 8] type: (builtin_type) ; [3, 9] - [3, 13] body: (block)) ; [3, 14] - [3, 16]
The AST of the container field has its name at the same place:
(container_field ; [1, 4] - [1, 14] name: (identifier) ; [1, 4] - [1, 9] type: (builtin_type)))) ; [1, 11] - [1, 14]
The AST of the test declaration requires to iterate over its children to find string
:
(test_declaration ; [4, 0] - [4, 17] (string ; [4, 5] - [4, 13] (string_content)) ; [4, 6] - [4, 12] (block)) ; [4, 14] - [4, 17]
Given that, we have to compare the type of the visited node with the expected node types and get the name according the type:
local function get_name(of_node, expected_node_type, bufnr) if expected_node_type == expected_node_type_variable_declaration then local identifier = find_child(of_node, expected_node_type_identifier) if identifier == nil then return nil end return vim.treesitter.get_node_text(identifier, bufnr) elseif ( expected_node_type == expected_node_type_function_declaration or expected_node_type == expected_node_type_container_field ) then local _, function_name_node = next(of_node:field("name")) if function_name_node == nil then return nil end return vim.treesitter.get_node_text(function_name_node, bufnr) elseif expected_node_type == expected_node_type_test_declaration then local test_name = find_child(of_node, expected_node_type_string) if test_name ~= nil then return "test" .. vim.treesitter.get_node_text(test_name, bufnr) else return "test" end else return nil end end
It's time to answer the question. Why do I need this expected_node_type
? As I said the data type is an integer. It's faster to compare the numbers than strings. In other words, the algorithmic complexity of the string comparison is linear, while integer is constant (spoiler: it's not true for LuaJIT 2.1). It's not a problem to compare strings once. However, sometimes you inspect a source file of 10k - 20k lines, and the plugin, which you rely on, fails to do in a frame (16 ms or even 32 ms). It is a deal breaker for me. The main reason why I like Neovim (and Vim in the past) is that, that it's snappy and crispy (not only performance-wise, but the experience in general). I really hate when some plugin takes away this experience. Also, it's not that green to spend unnecessary CPU cycles.
My initial idea was using the integer identifier which I get from node:symbol()
. The method gives you an integer which you can use to identify the type. However, the identifiers change with a new version of Nvim Treesitter. That means, that I have to do a follow up change in the code. Apparently it's annoying.
In order to mitigate the inconvenience, I came up with a silly idea of keeping both ways of the symbol type identification in the code. Namely, the functions which utilize node:symbol()
work, but the functions which utilize node:type()
remain commented.
I remove the comment operator in the beginning of them, upgrade Nvim Treesitter, and my symbol menu works again. The flaw of the plan is that I have a few places where I get the type. It made me to create this expected_node_type
and tables to match it against the data from node:symbol()
and node:type()
. That allowed me to amend only one place.
The entire idea is in these 3 functions:
local function get_expeceted_node_type_from_node_symbol(node) local node_symbol = node:symbol() for _, value in pairs(expected_node_type_against_symbols_table) do if value[1] == node_symbol then return value[2] end end return expected_node_type_unknown end local function get_expeceted_node_type_from_node_type(node) return expected_node_type_against_types_table[node:type()] end local function get_expected_node_type(node) local expected_node_type = get_expeceted_node_type_from_node_symbol(node) -- local expected_node_type = get_expeceted_node_type_from_node_type(node) return expected_node_type end
As you can see they do a table lookup. I wanted to minimize them, that's why I pass expected_node_type
to the visiting function. However, the necessity of maintaining the code compelled me to check the actual code execution duration, which surprised me.
I did the benchmark of the code this way:
gettimeofday
is a pair of seconds and microseconds. With C I would've used the timersub function to subtract two instances of the result to get the duration. However, I failed to find the function without installing anything extra. Therefore, I had to implement the subtraction.Here is the code of the benchmark:
local diffs = {} function benchmark(callable) local before_s, before_ms = vim.uv.gettimeofday() callable() local after_s, after_ms = vim.uv.gettimeofday() local after = after_s .. "." .. after_ms if before_ms > after_ms then after_ms = after_ms + 1000000 after_s = after_s - 1 end local diff_s = after_s - before_s local diff_ms = after_ms - before_ms print("before: " .. before_s .. "." .. before_ms .. ", after: " .. after .. ", duration " .. diff_s .. "." .. diff_ms) table.insert(diffs, diff_s * 1000000 + diff_ms) end for i = 0, 100, 1 do benchmark(function() collect_document_symbols(vim.api.nvim_get_current_buf()) end) end table.sort(diffs) local avg = vim.iter(diffs):fold(0, function(acc, v) return acc + v end) / (#diffs) print("median: " .. (diffs[49] + diffs[50]) / 2 .. " avg: " .. avg)
The function collect_document_symbols
only collects the symbols with Treesitter API. Those are the symbols I'm talking about. The results of the benchmark are surprising:
node:symbol()
(integer):
node:type()
(string):
The difference is inside a statistical error range. Does the Lua compare string in a constant time? Yes it does! Here is my investigation.
As a result, I dropped node:symbol()
and used only node:type()
. However, I kept the expected_node_type
. Another useful property of integers is that I can make binary masks out of them. Those masks can be used to say if a type is one of a set of types with a single condition.
The last piece in the puzzle is filling the menu. In the example we used the entry maker from the menu for a buffer lines. Unfortunately, it doesn't work for us, because the format of our data is different. It expects the data with 2 fields: the line number, the line content. In our case, every entry has the line, the column, the kind, the text. There is Telescope's entry maker for Treesitter, though, but it creates its own text, which doesn't work for us, because we need our own text.
Also, in order to search among a certain kind of symbols, I customized the sorter of the menu. As a result the code to create our menu is:
local function ret(opts) if opts == nil then opts = {} end local bufnr = vim.api.nvim_get_current_buf() opts.filename = vim.api.nvim_buf_get_name(bufnr) local results = collect_document_symbols(bufnr) if vim.tbl_isempty(results) then return end pickers.new(opts, { prompt_title = "Zig symbols", finder = finders.new_table({ results = results, entry_maker = opts.entry_maker or entry_maker(opts), }), sorter = conf.prefilter_sorter({ tag = "kind", sorter = conf.generic_sorter(opts), }), previewer = conf.grep_previewer(opts), }):find() end
Out of the code note 2 things:
entry_maker
is our custom function which I'll describe later.sorter
has the field tag
with the value "kind"
. This value will be used by Telescope to the kind of a symbol in an entry of the table results
.An entry maker returns a function which is the actual entry maker. There are 2 important things to create an entry maker:
The formatting, which the displayer does, determines 3 things:
The listed items is implemented with the following code:
local displayer = entry_display.create({ separator = " │ ", items = { { width = 5 }, { width = 25 }, { remaining = true }, }, }) local make_display = function(entry) return displayer({ { entry.lnum, "TelescopeResultsLineNr" }, { entry.kind, "TelescopeResultsSpecialComment" }, { entry.text, "" }, }) end
The last piece in creating the entry maker is the mapping of the data, which we have mined, to the Telescope menu. As it navigates a user, it needs the following data:
The code to do this:
return function(entry) local entry_kind = entry.kind return make_entry.set_default_entry_mt({ ordinal = entry.text .. " " .. entry_kind, kind = entry_kind, display = make_display, filename = opts.filename, lnum = entry.lnum, text = entry.text, col = entry.col, }, opts) end
That's what the function entry_maker
returns.
Out of this little adventure, I found that it's fairly simple (not easy, but simple) to work with Tree-sitter to get information about the source code. It made me to wonder about custom code linters. For example, a team might have internal code style, they check if the style is maintained in every pull request manually. A custom linter would automate this process and it could be run even in CI or even Git hooks.
It was very interesting to discover the internals of comparing strings in Lua. I understand better the reason of Lua being popular in Game development where your budget is 16 ms, not mentioning screens with 120Hz and 240Hz refresh frequencies. I definitely learned the lesson of not assuming something based on my experience and checking the actual implementation.
I haven't created a plugin for this, because essentially it's just a configuration. There is no licence for the code, feel free to use however you want. Link to the code again.