MarkdownCommander

**Larz60+**

I have tons of Markdown documents that I have collected over time. Whenever I needed to find one that I had put together sometime in the past, the time consumed in actually finding the document often exceeded the value of its contents. I decided a tool was needed that would allow quick access to any document, and perhaps display the results as well. So I came up with the idea for a Markdown Commander. I laid out the general plan, and decided I would use AI to help with the coding. My AI of choice is XGrok, and the contribution I received from Grok was tremendous. This turned out to be such a useful tool that I thought I'd share the code.

So here are the steps needed to build it on your system. I built this on Linux Mint 21.3 and haven't tried it on any other OS, but I believe it should run with little or no modification.

Create a project directory and name it MarkdownCommander.
cd to that directory.
Create a virtual environment, using whatever tool you like. I use Python's venv like so: python -m venv venv
Start the virtual environment: . ./venv/bin/activate
Install wxPython using the following command: pip install -U -f https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04 wxPython
This command is for an Ubuntu 22.04 system using GTK 3, which is what Linux Mint 21.3 uses as its base. You may have to play with this for your OS. wxPython can be fussy.
Clean up apt: sudo apt autoclean
Update apt: sudo apt update
Install wxWidgets using: sudo apt install libwxgtk3.0-dev
Install mistune: pip install mistune
Install sentence_transformers: pip install sentence_transformers
Create a src directory, and cd to that directory:
mkdir src
cd src
Create an empty __init__.py file: touch __init__.py
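
Before going further, it can help to confirm that the virtual environment really has everything installed. The following is only a small sketch for that, not part of the program itself. The helper name missing_packages is made up for this example; the list uses the import names from the script (wx for wxPython, plus torch, which is normally pulled in as a dependency of sentence_transformers).

```python
from importlib.util import find_spec

# Import names (not pip names) used at the top of MarkdownCommander.py.
REQUIRED = ["wx", "mistune", "sentence_transformers", "torch"]


def missing_packages(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the virtual environment activated. It only checks that the packages can be found, not that they work with your GTK setup.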

Using your favorite text editor, add MarkdownCommander.py:

import wx
import wx.html2
import mistune
from pathlib import Path
from sentence_transformers import SentenceTransformer, util
import threading
import torch
from datetime import datetime


# ====================== SAFE TORCH LOAD ======================
torch.serialization.add_safe_globals([datetime])


class SemanticIndexer:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.embeddings = None
        self.file_paths = []
        self.file_names = []
        self.file_mtimes = []
        self.source_dir = None
        self.build_time = None
        self.last_update_time = None
        self.exclude_dirs = {'.git', 'node_modules', '__pycache__', 'venv', 'env', '.venv'}

    def _get_file_info(self, path: Path):
        try:
            text = path.read_text(encoding='utf-8', errors='ignore')[:12000]
            stat = path.stat()
            return text, stat.st_mtime
        except OSError:  # unreadable or vanished file: skip it
            return None, None

    def build_index(self, directory: str):
        directory = Path(directory)
        self.source_dir = directory
        texts = []
        self.file_paths.clear()
        self.file_names.clear()
        self.file_mtimes.clear()

        for p in directory.rglob("*.*"):
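            # Check only the path below the chosen folder, so a parent directory
            # that happens to be named e.g. "venv" does not exclude everything.
            if any(ex in p.relative_to(directory).parts for ex in self.exclude_dirs):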
                continue
            if p.suffix.lower() not in {'.md', '.txt', '.py', '.html', '.rst'}:
                continue
            text, mtime = self._get_file_info(p)
            if text:
                texts.append(text)
                self.file_paths.append(str(p))
                self.file_names.append(p.name)
                self.file_mtimes.append(mtime)

        print(f"Encoding {len(texts)} documents...")
        self.embeddings = self.model.encode(texts, convert_to_tensor=True, show_progress_bar=True)
        self.build_time = datetime.now()
        self.last_update_time = datetime.now()

    def search(self, query: str, top_k=10):
        if self.embeddings is None:
            return []
        query_emb = self.model.encode(query, convert_to_tensor=True)
        hits = util.semantic_search(query_emb, self.embeddings, top_k=top_k)[0]
        return [(self.file_paths[h['corpus_id']], 
                 self.file_names[h['corpus_id']], 
                 h['score']) for h in hits]

    def check_and_update(self):
        if not self.source_dir or self.embeddings is None:
            return False, 0

        current_files = {}
        for p in self.source_dir.rglob("*.*"):
            # Check only the path below the indexed folder (see build_index).
            if any(ex in p.relative_to(self.source_dir).parts for ex in self.exclude_dirs):
                continue
            if p.suffix.lower() not in {'.md', '.txt', '.py', '.html', '.rst'}:
                continue
            _, mtime = self._get_file_info(p)
            if mtime:
                current_files[str(p)] = (p, mtime)

        changed_count = 0
        to_keep = [i for i, path in enumerate(self.file_paths) if path in current_files]
        if len(to_keep) != len(self.file_paths):
            changed_count += len(self.file_paths) - len(to_keep)
            self.file_paths = [self.file_paths[i] for i in to_keep]
            self.file_names = [self.file_names[i] for i in to_keep]
            self.file_mtimes = [self.file_mtimes[i] for i in to_keep]
            self.embeddings = self.embeddings[to_keep]

        existing_set = set(self.file_paths)
        to_add_texts = []
        to_add_paths = []
        to_add_names = []
        to_add_mtimes = []

        for path_str, (p, mtime) in current_files.items():
            if path_str not in existing_set:
                text, _ = self._get_file_info(p)
                if text:
                    to_add_texts.append(text)
                    to_add_paths.append(path_str)
                    to_add_names.append(p.name)
                    to_add_mtimes.append(mtime)
                    changed_count += 1
            else:
                idx = self.file_paths.index(path_str)
                if abs(mtime - self.file_mtimes[idx]) > 2.0:
                    text, _ = self._get_file_info(p)
                    if text:
                        to_add_texts.append(text)
                        to_add_paths.append(path_str)
                        to_add_names.append(p.name)
                        to_add_mtimes.append(mtime)
                        self.file_paths.pop(idx)
                        self.file_names.pop(idx)
                        self.file_mtimes.pop(idx)
                        self.embeddings = torch.cat([self.embeddings[:idx], self.embeddings[idx+1:]])
                        changed_count += 1

        if to_add_texts:
            new_emb = self.model.encode(to_add_texts, convert_to_tensor=True, show_progress_bar=False)
            self.embeddings = torch.cat([self.embeddings, new_emb]) if len(self.embeddings) > 0 else new_emb
            self.file_paths.extend(to_add_paths)
            self.file_names.extend(to_add_names)
            self.file_mtimes.extend(to_add_mtimes)

        if changed_count > 0:
            self.last_update_time = datetime.now()

        return changed_count > 0, changed_count

    def save(self, save_dir: Path):
        save_dir.mkdir(parents=True, exist_ok=True)
        torch.save({
            'embeddings': self.embeddings,
            'file_paths': self.file_paths,
            'file_names': self.file_names,
            'file_mtimes': self.file_mtimes,
            'source_dir': str(self.source_dir) if self.source_dir else None,
            'build_time': self.build_time,
            'last_update_time': self.last_update_time,
        }, save_dir / "index.pt")

    def load(self, save_dir: Path):
        data = torch.load(save_dir / "index.pt", weights_only=True, map_location='cpu')
        self.embeddings = data['embeddings']
        self.file_paths = data['file_paths']
        self.file_names = data['file_names']
        self.file_mtimes = data.get('file_mtimes', [])
        src = data.get('source_dir')
        self.source_dir = Path(src) if src else None
        self.build_time = data.get('build_time')
        self.last_update_time = data.get('last_update_time')


# ====================== MAIN APP ======================
class SemanticSearchApp(wx.Frame):
    INDEX_DIR = Path.home() / "semantic_search_index"
    DARK_BG = wx.Colour(30, 30, 30)
    DARK_FG = wx.Colour(230, 230, 230)
    LIGHT_BG = wx.Colour(255, 255, 255)
    LIGHT_FG = wx.Colour(0, 0, 0)

    def __init__(self):
        super().__init__(None, title="Markdown Commander", size=(1280, 820))
        self.indexer = None
        self.current_results = []
        self._update_lock = threading.Lock()
        self.is_dark_mode = False
        
        self._build_ui()
        self._try_auto_load_index()
        self.Show()

    def _build_ui(self):
        splitter = wx.SplitterWindow(self, style=wx.SP_LIVE_UPDATE | wx.SP_3D)
        left = wx.Panel(splitter)
        sizer = wx.BoxSizer(wx.VERTICAL)

        # Query area
        query_box = wx.BoxSizer(wx.HORIZONTAL)
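        # Keep a reference and add the label to the sizer so it gets laid out
        query_label = wx.StaticText(left, label="Query:", size=(50, -1))
        query_box.Add(query_label, 0, wx.ALIGN_CENTER_VERTICAL | wx.RIGHT, 5)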
        self.query_ctrl = wx.TextCtrl(left, style=wx.TE_PROCESS_ENTER, size=(-1, 40))
        self.query_ctrl.Bind(wx.EVT_TEXT_ENTER, self.on_search)
        query_box.Add(self.query_ctrl, 1, wx.EXPAND | wx.RIGHT, 8)

        self.search_btn = wx.Button(left, label="Search", size=(100, 40))
        self.search_btn.Bind(wx.EVT_BUTTON, self.on_search)
        query_box.Add(self.search_btn, 0, wx.ALIGN_CENTER_VERTICAL)

        sizer.Add(query_box, 0, wx.EXPAND | wx.ALL, 10)

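        # Add the label to the sizer so it is positioned above the results list
        results_label = wx.StaticText(left, label="Results (double-click to open):")
        sizer.Add(results_label, 0, wx.LEFT | wx.RIGHT, 10)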
        self.results_list = wx.ListCtrl(left, style=wx.LC_REPORT | wx.LC_SINGLE_SEL | wx.LC_HRULES)
        self.results_list.InsertColumn(0, "Score", width=90)
        self.results_list.InsertColumn(1, "Document", width=580)
        self.results_list.Bind(wx.EVT_LIST_ITEM_ACTIVATED, self.on_result_click)

        sizer.Add(self.results_list, 1, wx.EXPAND | wx.ALL, 10)
        left.SetSizer(sizer)

        # ==================== RIGHT PANEL - HTML PREVIEW ====================
        right_panel = wx.Panel(splitter)
        right_sizer = wx.BoxSizer(wx.VERTICAL)
        
        toolbar = wx.BoxSizer(wx.HORIZONTAL)
        self.clear_btn = wx.Button(right_panel, label="Clear Preview")
        self.clear_btn.Bind(wx.EVT_BUTTON, self.on_clear_preview)
        toolbar.Add(self.clear_btn, 0, wx.ALL, 5)
        right_sizer.Add(toolbar, 0, wx.EXPAND | wx.LEFT | wx.RIGHT, 5)

        # self.preview = wx.html.HtmlWindow(right_panel, style=wx.SUNKEN_BORDER)
        self.preview = wx.html2.WebView.New(right_panel)
        # self.preview.SetStandardFonts(10)
        right_sizer.Add(self.preview, 1, wx.EXPAND | wx.ALL, 5)
        
        right_panel.SetSizer(right_sizer)

        splitter.SplitVertically(left, right_panel, 680)
        splitter.SetMinimumPaneSize(400)

        # Menu
        menu_bar = wx.MenuBar()
        file_menu = wx.Menu()
        file_menu.Append(101, "&Build/Rebuild Full Index...\tCtrl+B")
        file_menu.Append(102, "&Force Full Rebuild\tCtrl+R")
        file_menu.Append(103, "&Save Current Index\tCtrl+S")
        file_menu.AppendSeparator()
        file_menu.Append(105, "Toggle Dark/Light Mode\tCtrl+T")
        file_menu.AppendSeparator()
        file_menu.Append(104, "E&xit")
        menu_bar.Append(file_menu, "&File")
        self.SetMenuBar(menu_bar)

        self.Bind(wx.EVT_MENU, self.on_build_index, id=101)
        self.Bind(wx.EVT_MENU, self.on_force_rebuild, id=102)
        self.Bind(wx.EVT_MENU, self.on_save_index, id=103)
        self.Bind(wx.EVT_MENU, self.on_toggle_theme, id=105)
        self.Bind(wx.EVT_MENU, lambda e: self.Close(), id=104)

        self.SetStatusBar(wx.StatusBar(self))
        self.SetStatusText("Ready — Build an index to begin")
        self.apply_theme()

    def on_clear_preview(self, event):
        self.preview.SetPage("","")
        self.SetStatusText("Preview cleared")

    def apply_theme(self):
        bg = self.DARK_BG if self.is_dark_mode else self.LIGHT_BG
        fg = self.DARK_FG if self.is_dark_mode else self.LIGHT_FG

        self.SetBackgroundColour(bg)
        self.query_ctrl.SetBackgroundColour(bg)
        self.query_ctrl.SetForegroundColour(fg)
        self.results_list.SetBackgroundColour(bg)
        self.results_list.SetForegroundColour(fg)
        self.Refresh()
        self.Update()

    def on_toggle_theme(self, event):
        self.is_dark_mode = not self.is_dark_mode
        self.apply_theme()
        mode = "Dark" if self.is_dark_mode else "Light"
        self.SetStatusText(f"Switched to {mode} mode")

    # ==================== Remaining methods ====================
    def _try_auto_load_index(self):
        if not (self.INDEX_DIR / "index.pt").exists():
            return
        if wx.MessageBox("Saved index found. Load it now?", "Load Index", wx.YES_NO | wx.ICON_QUESTION) == wx.YES:
            try:
                self.indexer = SemanticIndexer()
                self.indexer.load(self.INDEX_DIR)
                self._index_ready()
            except Exception as e:
                wx.MessageBox(f"Load failed:\n{str(e)}", "Load Error", wx.OK | wx.ICON_ERROR)

    def on_build_index(self, event):
        dlg = wx.DirDialog(self, "Select folder with your documents")
        if dlg.ShowModal() == wx.ID_OK:
            path = dlg.GetPath()
            self.SetStatusText("Building full index...")
            threading.Thread(target=self._build_index_thread, args=(path,), daemon=True).start()
        dlg.Destroy()

    def _build_index_thread(self, folder):
        if self.indexer is None:
            self.indexer = SemanticIndexer()
        try:
            self.indexer.build_index(folder)
            wx.CallAfter(self._index_ready)
        except Exception as e:
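            # Build the message now: 'e' is unbound once the except block ends,
            # so a lambda that reads it later would raise NameError.
            wx.CallAfter(wx.MessageBox, str(e), "Error")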

    def on_force_rebuild(self, event):
        if not self.indexer or not self.indexer.source_dir:
            wx.MessageBox("Please build an index first.", "Info")
            return
        if wx.MessageBox("Clear current index and rebuild everything?", "Force Rebuild", wx.YES_NO | wx.ICON_WARNING) == wx.YES:
            threading.Thread(target=self._build_index_thread, args=(str(self.indexer.source_dir),), daemon=True).start()

    def _index_ready(self):
        count = len(self.indexer.file_paths)
        ts = self.indexer.last_update_time.strftime("%Y-%m-%d %H:%M") if self.indexer.last_update_time else ""
        self.SetStatusText(f"Ready — {count} documents • Last updated: {ts}")

    def _check_for_updates(self):
        if not self.indexer:
            return
        with self._update_lock:
            updated, changed = self.indexer.check_and_update()
            if updated:
                wx.CallAfter(self._index_ready)
                wx.CallAfter(lambda: self.SetStatusText(f"Auto-updated: {changed} new/changed file{'s' if changed != 1 else ''}"))

    def on_search(self, event):
        if not self.indexer or self.indexer.embeddings is None:
            wx.MessageBox("Please build an index first.", "Info")
            return

        threading.Thread(target=self._check_for_updates, daemon=True).start()

        query = self.query_ctrl.GetValue().strip()
        if not query:
            return

        self.SetStatusText("Searching...")
        self.results_list.DeleteAllItems()

        results = self.indexer.search(query, top_k=12)
        self.current_results = results

        for i, (_, name, score) in enumerate(results):
            self.results_list.InsertItem(i, f"{score:.4f}")
            self.results_list.SetItem(i, 1, name)

        self.SetStatusText(f"Found {len(results)} results")

    def on_save_index(self, event):
        if not self.indexer or self.indexer.embeddings is None:
            wx.MessageBox("Nothing to save.", "Info")
            return
        try:
            self.indexer.save(self.INDEX_DIR)
            wx.MessageBox(f"Index saved to:\n{self.INDEX_DIR}", "Success")
        except Exception as e:
            wx.MessageBox(str(e), "Save Failed")

    def on_result_click(self, event):
        idx = event.GetIndex()
        full_path, name, score = self.current_results[idx]
        fpath = Path(full_path)
        
        try:
            content = fpath.read_text(encoding="utf-8", errors="ignore")
            html_content = mistune.html(content)            

            full_html = f"""
            <html>
            <head>
                <style>
                    body {{ font-family: Arial, Helvetica, sans-serif; padding: 20px; line-height: 1.6; }}
                    pre {{ background: #f4f4f4; padding: 12px; border-radius: 4px; overflow: auto; }}
                    code {{ font-family: monospace; }}
                    h1, h2, h3 {{ color: #2c3e50; }}
                    img {{ max-width: 100%; height: auto; display: block; margin: 15px 0; }}
                </style>
            </head>
            <body>
                {html_content}
            </body>
            </html>
            """

            # Use the document's folder as base URL so local images load;
            # as_uri() percent-encodes spaces and other special characters.
            base_url = fpath.parent.resolve().as_uri() + "/"
            
            self.preview.SetPage(full_html, base_url)
            self.SetStatusText(f"Opened: {name}  (Score: {score:.4f})")
            
        except Exception as e:
            print(f"Preview error: {e}")  # for your console
            self.SetStatusText(f"Could not open file: {e}")
            self.preview.SetPage(f"<html><body><p>Error: {e}</p></body></html>", "")
        
if __name__ == "__main__":
    app = wx.App(False)
    frame = SemanticSearchApp()
    app.MainLoop()
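
A note on what the search step does: each document is turned into a vector (an embedding), the query is turned into a vector too, and the results are ordered by how similar the vectors are. As far as I know, util.semantic_search uses cosine similarity by default. Below is a minimal pure-Python sketch of that ranking idea. The three-number vectors and file names are invented toy data, and the functions cosine_similarity and top_k are written for this example only; the real program gets much longer vectors from the model and does the math with torch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k best (name, score) pairs, highest score first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy "embeddings": made-up numbers, only to show the ranking.
docs = {
    "venv_notes.md": [0.9, 0.1, 0.0],
    "wx_layout.md": [0.1, 0.8, 0.3],
    "recipes.md": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.2, 0.0], docs, k=2)
for name, score in results:
    print(f"{score:.4f}  {name}")
```

The score column in the results list of the program is this kind of similarity value, so higher means a closer match.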

Go back to the main directory: cd ..
Create a data directory: mkdir data
Create a markdown subdirectory in data: mkdir ./data/markdown
And an images directory below the markdown directory: mkdir ./data/markdown/images

Gather your markdown files, and load them all into the markdown directory, and any associated images into the images directory.
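
Since I have only tried this on Linux, here is a sketch of the same directory layout created from Python with pathlib, which should behave the same on other systems. The function name make_data_dirs is just something I picked for this example; the directory names are the ones used above.

```python
from pathlib import Path


def make_data_dirs(project_root):
    """Create data/markdown/images under project_root and return the images path."""
    images = Path(project_root) / "data" / "markdown" / "images"
    # parents=True creates data/ and data/markdown/ as needed;
    # exist_ok=True makes it safe to call again later.
    images.mkdir(parents=True, exist_ok=True)
    return images
```

Calling make_data_dirs(".") from the project directory gives the same result as the mkdir commands.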

sentence_transformers downloads its model (all-MiniLM-L6-v2 here) from Hugging Face the first time it is used, so an internet connection is needed for at least the first run.

That should be all that's required to get started.

From the project directory, run: python src/MarkdownCommander.py

Once you have built an index, you can save it (File, then Save Current Index) and reload it at the next start to speed things up. You should only need to rebuild when you add new documents.
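
For the curious, the check_and_update method in the code decides what needs re-encoding by comparing file modification times against the ones stored in the index, with a 2 second tolerance. Here is a stripped-down sketch of just that comparison, using plain dicts of path to mtime. The function name diff_index and the sample numbers are made up for illustration; the real method also re-encodes the files and updates the tensors.

```python
def diff_index(indexed, current, tolerance=2.0):
    """Compare two {path: mtime} mappings.

    Returns (added, removed, changed) as sorted lists of paths. A file counts
    as changed when its mtime differs by more than `tolerance` seconds.
    """
    added = sorted(p for p in current if p not in indexed)
    removed = sorted(p for p in indexed if p not in current)
    changed = sorted(
        p for p in current
        if p in indexed and abs(current[p] - indexed[p]) > tolerance
    )
    return added, removed, changed


# Invented example data: mtimes are seconds, as returned by Path.stat().st_mtime.
indexed = {"a.md": 100.0, "b.md": 200.0, "c.md": 300.0}
current = {"a.md": 100.5, "b.md": 250.0, "d.md": 400.0}
added, removed, changed = diff_index(indexed, current)
print("added:", added, "removed:", removed, "changed:", changed)
```

Only the added and changed files need new embeddings, which is why an update is much faster than a full rebuild.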

Here's a screenshot of the program, and the display of a markdown document that contains images:

Again, I greatly appreciate the tremendous help that I received from XGrok, which is my choice for Python assistance.

If you find that I missed anything, please let me know.

Edited May 17 -- Clarified install steps

noisefloor · May-17-2026, 05:24 PM

Hi,

if I understand correctly, the program does not parse any syntax of the Markdown files, right? If this is the case, the program should work just as well with another purely text-based format such as reST?

Regards, noisefloor

Nova · (This post was last modified: May-18-2026, 03:52 PM by buran.)

This is actually a really useful idea — semantic search for personal markdown archives can save a ton of time, especially for large note collections. The UI and markdown preview integration make it feel practical for real daily use as well.

I’ve been wanting to build more utility tools like this myself once my PC bottleneck stops bullying my workflow 😅 Really nice project overall, and the setup instructions are detailed enough to follow easily.

buran write May-18-2026, 03:52 PM:
Spam link removed

**Larz60+** · (This post was last modified: May-17-2026, 11:37 PM by Larz60+.)

noisefloor Wrote:if I understand correctly, the program does not parse any syntax of the Markdown files, right?

You are correct. There's no reason why this can't be modified for many other types of files.
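
In fact the suffix check in the posted code already lets .rst through, along with .md, .txt, .py and .html. To show how it could be opened up to other types, here is a rough sketch of the file selection as a standalone function with the suffixes passed in. The name iter_documents and its parameters are hypothetical, not something in the program. One small difference from the posted code is that the excluded directory names are only checked below the chosen folder.

```python
from pathlib import Path

# Same defaults as the sets used in SemanticIndexer.
DEFAULT_SUFFIXES = {".md", ".txt", ".py", ".html", ".rst"}
DEFAULT_EXCLUDES = {".git", "node_modules", "__pycache__", "venv", "env", ".venv"}


def iter_documents(root, suffixes=DEFAULT_SUFFIXES, exclude_dirs=DEFAULT_EXCLUDES):
    """Yield files under root whose suffix is wanted, skipping excluded directories."""
    root = Path(root)
    wanted = {s.lower() for s in suffixes}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Only look at the part of the path below root.
        if any(part in exclude_dirs for part in path.relative_to(root).parts):
            continue
        if path.suffix.lower() in wanted:
            yield path
```

For example, list(iter_documents("data/markdown", suffixes={".adoc", ".org"})) would pick up AsciiDoc and Org files instead. The preview pane would still run them through mistune as if they were Markdown, so only the search side carries over cleanly.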

noisefloor · May-18-2026, 01:18 PM

Hello,

Quote:semantic search for personal markdown archives can save a ton of time, especially for large note collections.

Well, if somebody uses a Linux distro with GNOME as the desktop environment, GNOME's Tracker should do the same thing: allow full-text search through documents which are indexed by Tracker. The same goes for Baloo on KDE Plasma or, more generally, any desktop search engine that integrates with your Linux system. Not sure what Windows and macOS offer, but I guess there's something available there, too.
I played around with Tracker a few years back for the German-language Ubuntu Wiki, but I never tried to search through a larger collection of documents with it. So I don't know how good (or bad) it is compared to the solution presented here.

Regards, noisefloor
