Multi-language code indexer (written in C11) for semantic search, built on SQLite and tree-sitter.
Languages currently implemented: C, Go, Perl, PHP, Python, TypeScript
Database: Creates code-index.db in current working directory
Note: This tool is designed for indexing source code (functions, classes, variables, ...),
not prose or documentation files (.md, .txt, .log, ...).
While you can index any text file, the tool extracts code symbols and won't be useful for natural language text.
On an apt-based Linux (like Debian), install the following dependencies. MacOS users, see MACOS_SETUP.md. Windows users, see MSYS2_SETUP.md I have not yet tested on non-apt Linux, so any help here would be appreciated.
apt install libtree-sitter-dev libtree-sitter0.25 libsqlite3-devNote: You may need to change libtree-sitter0.25 to libtree-sitter0, or some other version, depending on your system.
Clone the repo
git clone https://github.com/ebcode/SourceMinder.git && cd SourceMinderClone the tree-sitter grammars (at least one):
git clone https://github.com/tree-sitter/tree-sitter-c.git
git clone https://github.com/tree-sitter/tree-sitter-go.git
git clone https://github.com/tree-sitter-perl/tree-sitter-perl.git
git clone https://github.com/tree-sitter/tree-sitter-php.git
git clone https://github.com/tree-sitter/tree-sitter-python.git
git clone https://github.com/tree-sitter/tree-sitter-typescript.gitSelect which languages you want to build (all disabled by default):
./configure --enable-all # All languages (recommended for testing)
./configure --enable-c --enable-typescript --enable-php # Specific languages
./configure --enable-all --disable-php # All but PHP
CC=clang ./configure --enable-c # Custom compiler, only Cmake # Build indexers and query tool
sudo make install # Install to /usr/local/binInstalled binaries: index-c, index-ts, index-php, index-go, index-python, index-perl, qi
Config files: /usr/local/share/sourceminder/<language>/config/
| Flag | Purpose | Example |
|---|---|---|
-i <type...> |
Include context types (OR) | qi user -i func var prop |
-x <type...> |
Exclude context types (OR) | qi user -x comment string |
--list-types |
Display full list of context types | qi --list-types |
-x noise |
Exclude comments & strings | qi user -x noise |
-f <pattern...> |
Filter by file paths (OR) | qi user -f .c .h or qi user -f src/* lib/* |
-p <pattern> |
Filter by parent symbol | qi count -p patterns finds patterns->count |
--parent-type <pattern...> |
Filter by parent's declared type | qi % -i prop --parent-type IndexerConfig finds fields of any IndexerConfig variable |
-t <pattern> |
Filter by type annotation | qi '*' -i arg -t 'int *' finds int pointer args |
-m <pattern> |
Filter by modifier | qi '*' -i func -m static finds static functions |
-s <pattern> |
Filter by scope | qi '*' -s public finds public members |
-e |
Expand full definitions | qi getUserById -i func -e |
-C <n> |
Show n context lines | qi user -C 3 |
-A <n> |
Show n lines after | qi user -A 5 |
-B <n> |
Show n lines before | qi user -B 5 |
--and |
Multi-pattern same-line | qi fprintf stderr --and |
--and <n> |
Multi-pattern within n lines | qi malloc free --and 10 |
--def |
Only definitions | qi getUserById --def |
--usage |
Only usages | qi User --usage |
--within <sym> |
Search within function/class | qi malloc --within handle_request |
--limit <n> |
Limit results | qi '*' --limit 20 |
--toc |
Table of contents | qi '*' -f file.c --toc |
--columns |
Custom columns | qi user --columns line symbol context parent |
-v |
Verbose (all columns) | qi user -v |
--full |
Full context names | qi user --full |
--raw |
Bare source lines only (no headers, line numbers, separators, or stats); use with -e or -B/-A |
qi new -f file.pl -e --raw |
IMPORTANT: Multi-Value Arguments (Non-Standard UNIX Pattern)
Unlike most UNIX tools, qi accepts multiple values for both the main pattern and subsequent flags:
# Multiple search patterns (OR logic)
qi user session token --limit 20 # Find user OR session OR token
# Multiple context types (OR logic)
qi user -i func var prop # Functions OR variables OR properties
# Multiple file patterns (OR logic)
qi symbol -f file1.c file2.c utils.h # In any of these files
qi malloc -f *.c *.h # All .c OR .h files
# Combine multiple values across arguments
qi malloc free -i func call -f memory.c allocator.c buffer.cComparison with standard UNIX tools:
- Standard:
grep -f pattern.txt file.c(one file per-f) - qi:
qi symbol -f file1.c file2.c file3.c(multiple files per-f)
Pattern wildcards:
*= any characters (e.g.,*Manager,get*,*user*).= single character (e.g.,.etUsermatchesgetUser,setUser)- No wildcards = exact match (case-insensitive)
- Note:
%and_also work (SQL LIKE syntax) but*and.are recommended
Escaping special characters:
\*= literal asterisk (e.g.,operator\*findsoperator*)\.= literal period (e.g.,file\.cfindsfile.c)\--flag= search for flag-like symbols (e.g.,\--helpfinds--help)
Examples:
qi 'get*' -i func # Wildcard: finds getUser, getData, etc.
qi 'get\*' -i func # Literal: finds "get*" symbol if it exists
qi '.etUser' # Wildcard: finds getUser, setUser (. = single char)
qi '\--help' # Literal: finds "--help" stringFor Developers:
- Browse a project's vocabulary (like a book index)
- Find symbols by context (classes, functions, variables, etc.)
- Track relationships with parent symbols (e.g.,
obj.method()) - Filter by access modifiers (public, private, protected)
- Query with SQL wildcards for partial matches
qi vs grep — see docs/QI_VS_GREP.md for a full comparison. In short: use qi as the default for code navigation (symbols, structure, precise filtering); use grep for literal text, regex, and non-code files.
For Claude Code:
- Faster workflow than grep + Read for code navigation
- Saves tokens with compact output
- Relative paths work directly with Read/Edit tools
- Use
qi -f file.c --tocbefore Read tool to understand structure - Use
-eto see full definitions without Read tool
Step 1: Index the SourceMinder codebase
index-c . --once --verboseStep 2: Query (single and multiple patterns)
qi -f shared/indexer-main.c --toc # Table-of-Contents output
qi handle_import_statement # Exact match
qi '*user*' # Contains "user"
qi '*' -i func --def --limit 10 # First 10 function definitions (quote single * to prevent shell expansion)
# Multiple patterns (OR logic)
qi malloc free calloc # Find any of these
qi user session token -i var --limit 20 # Variables named user/session/tokenStep 3: View code
qi getUserById -i func -e # Expand full function
qi getUserById -i func call -e -C 3 # Expand full function, and show context around call sitesStep 4: Background daemon
index-c . & # Watch for changesqi 'symbol' --limit-per-file 1 --limit 5Output:
Searching for: symbol
LINE | SYM | CTX
-----+--------+-----
./query-index.c:
67 | symbol | COM
./benchmark/struct-layout-optimized.c:
26 | symbol | PROP
./benchmark/struct-layout-original.c:
17 | symbol | PROP
./c/c_language.c:
23 | Symbol | COM
./go/go_language.c:
83 | Symbol | COM
Found 351 matches (showing first 5)
Result breakdown: ARG (162), COM (68), VAR (52), STR (37), PROP (32)
Tip: Use -i <context> to narrow results
- LINE: Line number
- SYM: The matched symbol (as it appears in the source)
- CTX: Where/how the symbol appears (context)
- Informational notes at the end
Understanding context types is key to effective filtering:
| Type | Short | Definition | Example |
|---|---|---|---|
class |
- | Class declarations | class UserManager { } |
interface |
iface |
Interface declarations | interface Storable { } |
function |
func |
Functions and methods | function getUserById() { } |
argument |
arg |
Function parameters | function foo(userId: string) |
variable |
var |
Variables and constants | const cursor = ... |
property |
prop |
Class properties | this.name = "Alice" |
type |
- | Type definitions/annotations | type UserID = string |
import |
imp |
Imported symbols | import { User } from ... |
export |
exp |
Exported symbols | export { User } |
call |
- | Function/method calls | getUserById(123) |
lambda |
- | Lambda/arrow functions | (x) => x * 2 |
enum |
- | Enum declarations | enum Status { } |
case |
- | Enum cases | Active = 1 |
namespace |
ns |
Namespace declarations | namespace Utils { } |
trait |
- | Trait declarations | trait Serializable { } (PHP) |
comment |
com |
Words from comments | // display user info |
string |
str |
Words from string literals | "user name" |
filename |
file |
Filename without extension | File named user.ts |
Usage:
qi user -i func var # Include Functions OR variables
qi user -x comment string # Exclude comments AND strings
qi user -x noise # Shorthand for -x comment string
qi user -i func var -x comment # Combine include and excludeFolder Mode (recommended for projects):
- Recursively indexes all matching files
- Respects ignore lists in
<language>/config/ignore_files.txt - Runs in daemon by default, watching for changes
- Use for: Active development on a codebase
index-c ./src ./libFile Mode (for specific files):
- Indexes only the specified files
- Ignores ignore lists
- Runs once and exits (no daemon)
- Use for: One-off indexing, testing, or specific file updates
index-c main.c utils.c helper.cStart daemon:
index-c ./src & # Background process
index-c ./src --silent & # SilentOne-time indexing:
index-c ./src --once # Index once and exitStop daemon:
ps aux | grep index-c # Find process ID
kill <PID> # Kill it
killall index-c index-ts # Or kill all indexersWhen to use:
- Daemon: Active development - auto-updates as you edit
- --once: CI/CD pipelines, one-time analysis, manual control
index-<language> <folders...> [options]Options:
--once- Run once and exit (no daemon)--silent- Silence output--quiet-init- Quiet initial indexing, noisy re-indexing on file change--verbose- Show preflight checks and progress--exclude-dir DIR [DIR...]- Exclude additional folders
Examples:
index-c ./src --verbose --once
index-c ./src ./lib --quiet-init &
index-c ../ --exclude-dir build dist node_modulesThe indexer:
- Scans files matching
<language>/config/file-extensions.txt - Skips folders in
<language>/config/ignore_files.txt - Extracts symbols via tree-sitter AST parsing
- Tracks parent symbols for member expressions (e.g.,
this.target.getBounds()) - Captures access modifiers (public, private, protected)
- Filters noise (stopwords, keywords, punctuation, short symbols, pure numbers)
- Stores relative paths from current working directory
Multiple language indexers can run in parallel without database locks:
index-ts ./src &
index-c ./lib &
index-php ./app &SQLite WAL mode is enabled automatically on first run.
qi <pattern...> [options] # One or more patterns (logical OR)Single pattern matching (wildcard syntax):
qi cursor # Exact: cursor
qi '*manager' # Ends with Manager
qi 'get*' # Starts with get
qi '*session*' # Contains session
qi _etuser # Matches: getUser, setUserMultiple pattern matching (OR logic - finds any match):
qi malloc free calloc # Find any of these functions
qi user session token --limit 30 # Find user OR session OR token
qi '*error' '*Exception' '*Fail*' # Any error-related symbolsqi '*manager' -i class iface # Only classes or interfaces
qi user -x noise # Exclude comments and strings
qi user -f .c .h # .c OR .h files
qi user -f src/ # All files in src/
qi user -f src/* lib/* # Multiple directoriesqi getUserById --def # Only definitions
qi User --usage # Only usagesFind lines where ALL patterns co-occur:
qi fprintf err_msg --and # Both on same line
qi malloc free --and 10 # Both within 10 lines of each otherqi getUserById -i func -e # Expand complete function definition
qi cursor -C 2 # 2 lines before and after
qi cursor -A 5 # 5 lines afterqi -f database.c --toc # All definitions in file
qi -f c_language.c --toc -i func # Only functionsOutput:
database.c:
IMPORTS: sqlite3.h, stdio.h, stdlib.h, string.h
FUNCTIONS:
init_database (15-45)
execute_query (47-89)
close_database (91-98)
TYPES:
DatabaseConfig (5-12)
qi user --limit 10 # First 10 matches
qi '*' --limit-per-file 3 --limit 10 # First 3 per file, up to 10 total# Discover → explore → expand
qi '*' -f query-index.c --toc # 1. Explore file structure
qi '*user*' -i func --def --limit 10 # 2. Find interesting functions
qi validateUser -i func -e # 3. See implementation
qi validateUser --usage -C 3 # 4. Find usage examplesqi also supports power-user filters for parent-symbol, type annotation, modifier, and scope-based queries, plus scoped search within a specific function or class. See docs/ADVANCED_USAGE.md for details.
Config files (file extensions, ignored folders, stopwords, keywords) live under <language>/config/ locally and /usr/local/share/sourceminder/<language>/config/ system-wide. See docs/CONFIGURATION.md for the full reference.
For index-size expectations, query-performance tips, and recommended workflows, see docs/PERFORMANCE.md.
Database-locked errors, missing symbols, build issues, and debugging tips are covered in docs/TROUBLESHOOTING.md.