May-16-2026, 02:56 PM
(This post was last modified: May-17-2026, 11:39 PM by Larz60+. Edit Reason: Clear Page button didn't work, removed clutter)
I have tons of markdown documents that I have collected over time. Whenever I need to find one that I put together sometime in the past, the time spent actually finding the document often exceeds the value of its contents. I decided I needed a tool that would allow quick access to any document, and perhaps display the results as well. So I came up with the idea for a Markdown Commander. I laid out the general plan and decided I would use AI to help with the coding. My AI of choice is XGrok, and the contribution I received from Grok was tremendous. This turned out to be such a useful tool that I thought I'd share the code.
So here are the steps needed to build it on your system. I built this on Linux Mint 21.3 and haven't tried it on any other OS, but I believe it should run with little or no modification.
- Create a project directory and name it MarkdownCommander.
- cd to that directory.
- Create a virtual environment, using whatever tool you like. I use Python's venv, like so:
python -m venv venv
- Start the virtual environment:
. ./venv/bin/activate
- Install wxPython using the following command:
pip install -U -f https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04 wxPython
This command is for Ubuntu 22.04 with GTK3, which is what Linux Mint 21.3 uses as its base. You may have to adjust the URL for your OS; wxPython can be fussy.
- Clean up apt:
sudo apt autoclean
- Update apt:
sudo apt update
- Install wxWidgets using:
sudo apt install libwxgtk3.0-dev
- Install mistune:
pip install mistune
- Install sentence_transformers:
pip install sentence_transformers
- Create a src directory, and cd to that directory:
mkdir src
cd src
- Create an empty __init__.py file:
touch __init__.py
- Using your favorite text editor, add MarkdownCommander.py:
import wx
import wx.html2
import mistune
from pathlib import Path
from sentence_transformers import SentenceTransformer, util
import threading
import torch
from datetime import datetime

# ====================== SAFE TORCH LOAD ======================
# Lets torch.load(..., weights_only=True) restore the datetime timestamps
# stored in the saved index.
torch.serialization.add_safe_globals([datetime])


class SemanticIndexer:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.embeddings = None
        self.file_paths = []
        self.file_names = []
        self.file_mtimes = []
        self.source_dir = None
        self.build_time = None
        self.last_update_time = None
        self.exclude_dirs = {'.git', 'node_modules', '__pycache__', 'venv', 'env', '.venv'}

    def _get_file_info(self, path: Path):
        # Only the first 12000 characters of each file are embedded
        try:
            text = path.read_text(encoding='utf-8', errors='ignore')[:12000]
            stat = path.stat()
            return text, stat.st_mtime
        except OSError:
            return None, None

    def build_index(self, directory: str):
        directory = Path(directory)
        self.source_dir = directory
        texts = []
        self.file_paths.clear()
        self.file_names.clear()
        self.file_mtimes.clear()
        for p in directory.rglob("*.*"):
            if any(ex in p.parts for ex in self.exclude_dirs):
                continue
            if p.suffix.lower() not in {'.md', '.txt', '.py', '.html', '.rst'}:
                continue
            text, mtime = self._get_file_info(p)
            if text:
                texts.append(text)
                self.file_paths.append(str(p))
                self.file_names.append(p.name)
                self.file_mtimes.append(mtime)
        print(f"Encoding {len(texts)} documents...")
        self.embeddings = self.model.encode(texts, convert_to_tensor=True, show_progress_bar=True)
        self.build_time = datetime.now()
        self.last_update_time = datetime.now()

    def search(self, query: str, top_k=10):
        if self.embeddings is None:
            return []
        query_emb = self.model.encode(query, convert_to_tensor=True)
        hits = util.semantic_search(query_emb, self.embeddings, top_k=top_k)[0]
        return [(self.file_paths[h['corpus_id']], self.file_names[h['corpus_id']], h['score'])
                for h in hits]

    def check_and_update(self):
        """Drop deleted files and re-embed new or modified ones.

        Returns (anything_changed, number_of_changed_files).
        """
        if not self.source_dir or self.embeddings is None:
            return False, 0
        current_files = {}
        for p in self.source_dir.rglob("*.*"):
            if any(ex in p.parts for ex in self.exclude_dirs):
                continue
            if p.suffix.lower() not in {'.md', '.txt', '.py', '.html', '.rst'}:
                continue
            _, mtime = self._get_file_info(p)
            if mtime:
                current_files[str(p)] = (p, mtime)

        changed_count = 0

        # Remove entries for files that no longer exist
        to_keep = [i for i, path in enumerate(self.file_paths) if path in current_files]
        if len(to_keep) != len(self.file_paths):
            changed_count += len(self.file_paths) - len(to_keep)
            self.file_paths = [self.file_paths[i] for i in to_keep]
            self.file_names = [self.file_names[i] for i in to_keep]
            self.file_mtimes = [self.file_mtimes[i] for i in to_keep]
            self.embeddings = self.embeddings[to_keep]

        # Queue new files, and files whose mtime moved by more than 2 seconds
        existing_set = set(self.file_paths)
        to_add_texts = []
        to_add_paths = []
        to_add_names = []
        to_add_mtimes = []
        for path_str, (p, mtime) in current_files.items():
            if path_str not in existing_set:
                text, _ = self._get_file_info(p)
                if text:
                    to_add_texts.append(text)
                    to_add_paths.append(path_str)
                    to_add_names.append(p.name)
                    to_add_mtimes.append(mtime)
                    changed_count += 1
            else:
                idx = self.file_paths.index(path_str)
                if abs(mtime - self.file_mtimes[idx]) > 2.0:
                    text, _ = self._get_file_info(p)
                    if text:
                        to_add_texts.append(text)
                        to_add_paths.append(path_str)
                        to_add_names.append(p.name)
                        to_add_mtimes.append(mtime)
                        # Drop the stale entry; it is re-added with a fresh embedding below
                        self.file_paths.pop(idx)
                        self.file_names.pop(idx)
                        self.file_mtimes.pop(idx)
                        self.embeddings = torch.cat([self.embeddings[:idx], self.embeddings[idx + 1:]])
                        changed_count += 1

        if to_add_texts:
            new_emb = self.model.encode(to_add_texts, convert_to_tensor=True, show_progress_bar=False)
            if len(self.embeddings) > 0:
                # A loaded index lives on the CPU; keep both tensors on one device
                self.embeddings = torch.cat([self.embeddings, new_emb.to(self.embeddings.device)])
            else:
                self.embeddings = new_emb
            self.file_paths.extend(to_add_paths)
            self.file_names.extend(to_add_names)
            self.file_mtimes.extend(to_add_mtimes)

        if changed_count > 0:
            self.last_update_time = datetime.now()
        return changed_count > 0, changed_count

    def save(self, save_dir: Path):
        save_dir.mkdir(parents=True, exist_ok=True)
        torch.save({
            'embeddings': self.embeddings,
            'file_paths': self.file_paths,
            'file_names': self.file_names,
            'file_mtimes': self.file_mtimes,
            'source_dir': str(self.source_dir) if self.source_dir else None,
            'build_time': self.build_time,
            'last_update_time': self.last_update_time,
        }, save_dir / "index.pt")

    def load(self, save_dir: Path):
        data = torch.load(save_dir / "index.pt", weights_only=True, map_location='cpu')
        self.embeddings = data['embeddings']
        self.file_paths = data['file_paths']
        self.file_names = data['file_names']
        self.file_mtimes = data.get('file_mtimes', [])
        src = data.get('source_dir')
        self.source_dir = Path(src) if src else None
        self.build_time = data.get('build_time')
        self.last_update_time = data.get('last_update_time')


# ====================== MAIN APP ======================
class SemanticSearchApp(wx.Frame):
    INDEX_DIR = Path.home() / "semantic_search_index"
    DARK_BG = wx.Colour(30, 30, 30)
    DARK_FG = wx.Colour(230, 230, 230)
    LIGHT_BG = wx.Colour(255, 255, 255)
    LIGHT_FG = wx.Colour(0, 0, 0)

    def __init__(self):
        super().__init__(None, title="Markdown Commander", size=(1280, 820))
        self.indexer = None
        self.current_results = []
        self._update_lock = threading.Lock()
        self.is_dark_mode = False
        self._build_ui()
        self._try_auto_load_index()
        self.Show()

    def _build_ui(self):
        splitter = wx.SplitterWindow(self, style=wx.SP_LIVE_UPDATE | wx.SP_3D)
        left = wx.Panel(splitter)
        sizer = wx.BoxSizer(wx.VERTICAL)

        # Query area (labels must be added to a sizer, or they end up at (0, 0))
        query_box = wx.BoxSizer(wx.HORIZONTAL)
        query_label = wx.StaticText(left, label="Query:", size=(50, -1))
        query_box.Add(query_label, 0, wx.ALIGN_CENTER_VERTICAL | wx.RIGHT, 5)
        self.query_ctrl = wx.TextCtrl(left, style=wx.TE_PROCESS_ENTER, size=(-1, 40))
        self.query_ctrl.Bind(wx.EVT_TEXT_ENTER, self.on_search)
        query_box.Add(self.query_ctrl, 1, wx.EXPAND | wx.RIGHT, 8)
        self.search_btn = wx.Button(left, label="Search", size=(100, 40))
        self.search_btn.Bind(wx.EVT_BUTTON, self.on_search)
        query_box.Add(self.search_btn, 0, wx.ALIGN_CENTER_VERTICAL)
        sizer.Add(query_box, 0, wx.EXPAND | wx.ALL, 10)

        results_label = wx.StaticText(left, label="Results (double-click to open):")
        sizer.Add(results_label, 0, wx.LEFT | wx.RIGHT, 10)
        self.results_list = wx.ListCtrl(left, style=wx.LC_REPORT | wx.LC_SINGLE_SEL | wx.LC_HRULES)
        self.results_list.InsertColumn(0, "Score", width=90)
        self.results_list.InsertColumn(1, "Document", width=580)
        self.results_list.Bind(wx.EVT_LIST_ITEM_ACTIVATED, self.on_result_click)
        sizer.Add(self.results_list, 1, wx.EXPAND | wx.ALL, 10)
        left.SetSizer(sizer)

        # ==================== RIGHT PANEL - HTML PREVIEW ====================
        right_panel = wx.Panel(splitter)
        right_sizer = wx.BoxSizer(wx.VERTICAL)
        toolbar = wx.BoxSizer(wx.HORIZONTAL)
        self.clear_btn = wx.Button(right_panel, label="Clear Preview")
        self.clear_btn.Bind(wx.EVT_BUTTON, self.on_clear_preview)
        toolbar.Add(self.clear_btn, 0, wx.ALL, 5)
        right_sizer.Add(toolbar, 0, wx.EXPAND | wx.LEFT | wx.RIGHT, 5)
        self.preview = wx.html2.WebView.New(right_panel)
        right_sizer.Add(self.preview, 1, wx.EXPAND | wx.ALL, 5)
        right_panel.SetSizer(right_sizer)

        splitter.SplitVertically(left, right_panel, 680)
        splitter.SetMinimumPaneSize(400)

        # Menu
        menu_bar = wx.MenuBar()
        file_menu = wx.Menu()
        file_menu.Append(101, "&Build/Rebuild Full Index...\tCtrl+B")
        file_menu.Append(102, "&Force Full Rebuild\tCtrl+R")
        file_menu.Append(103, "&Save Current Index\tCtrl+S")
        file_menu.AppendSeparator()
        file_menu.Append(105, "Toggle Dark/Light Mode\tCtrl+T")
        file_menu.AppendSeparator()
        file_menu.Append(104, "E&xit")
        menu_bar.Append(file_menu, "&File")
        self.SetMenuBar(menu_bar)
        self.Bind(wx.EVT_MENU, self.on_build_index, id=101)
        self.Bind(wx.EVT_MENU, self.on_force_rebuild, id=102)
        self.Bind(wx.EVT_MENU, self.on_save_index, id=103)
        self.Bind(wx.EVT_MENU, self.on_toggle_theme, id=105)
        self.Bind(wx.EVT_MENU, lambda e: self.Close(), id=104)

        self.SetStatusBar(wx.StatusBar(self))
        self.SetStatusText("Ready - Build an index to begin")
        self.apply_theme()

    def on_clear_preview(self, event):
        self.preview.SetPage("", "")
        self.SetStatusText("Preview cleared")

    def apply_theme(self):
        bg = self.DARK_BG if self.is_dark_mode else self.LIGHT_BG
        fg = self.DARK_FG if self.is_dark_mode else self.LIGHT_FG
        self.SetBackgroundColour(bg)
        self.query_ctrl.SetBackgroundColour(bg)
        self.query_ctrl.SetForegroundColour(fg)
        self.results_list.SetBackgroundColour(bg)
        self.results_list.SetForegroundColour(fg)
        self.Refresh()
        self.Update()

    def on_toggle_theme(self, event):
        self.is_dark_mode = not self.is_dark_mode
        self.apply_theme()
        mode = "Dark" if self.is_dark_mode else "Light"
        self.SetStatusText(f"Switched to {mode} mode")

    # ==================== Remaining methods ====================
    def _try_auto_load_index(self):
        if not (self.INDEX_DIR / "index.pt").exists():
            return
        if wx.MessageBox("Saved index found. Load it now?", "Load Index",
                         wx.YES_NO | wx.ICON_QUESTION) == wx.YES:
            try:
                self.indexer = SemanticIndexer()
                self.indexer.load(self.INDEX_DIR)
                self._index_ready()
            except Exception as e:
                wx.MessageBox(f"Load failed:\n{str(e)}", "Load Error", wx.OK | wx.ICON_ERROR)

    def on_build_index(self, event):
        dlg = wx.DirDialog(self, "Select folder with your documents")
        if dlg.ShowModal() == wx.ID_OK:
            path = dlg.GetPath()
            self.SetStatusText("Building full index...")
            threading.Thread(target=self._build_index_thread, args=(path,), daemon=True).start()
        dlg.Destroy()

    def _build_index_thread(self, folder):
        try:
            # Created inside the try so a model-loading failure is reported too
            if self.indexer is None:
                self.indexer = SemanticIndexer()
            self.indexer.build_index(folder)
            wx.CallAfter(self._index_ready)
        except Exception as e:
            # Pass the message as an argument: Python deletes `e` when the except
            # block ends, so a lambda that reads it later would raise NameError.
            wx.CallAfter(wx.MessageBox, str(e), "Error")

    def on_force_rebuild(self, event):
        if not self.indexer or not self.indexer.source_dir:
            wx.MessageBox("Please build an index first.", "Info")
            return
        if wx.MessageBox("Clear current index and rebuild everything?", "Force Rebuild",
                         wx.YES_NO | wx.ICON_WARNING) == wx.YES:
            self.SetStatusText("Rebuilding full index...")
            threading.Thread(target=self._build_index_thread,
                             args=(str(self.indexer.source_dir),), daemon=True).start()

    def _index_ready(self):
        count = len(self.indexer.file_paths)
        ts = (self.indexer.last_update_time.strftime("%Y-%m-%d %H:%M")
              if self.indexer.last_update_time else "")
        self.SetStatusText(f"Ready - {count} documents • Last updated: {ts}")

    def _check_for_updates(self):
        if not self.indexer:
            return
        with self._update_lock:
            updated, changed = self.indexer.check_and_update()
        if updated:
            wx.CallAfter(self._index_ready)
            wx.CallAfter(self.SetStatusText,
                         f"Auto-updated: {changed} new/changed file{'s' if changed != 1 else ''}")

    def on_search(self, event):
        if not self.indexer or self.indexer.embeddings is None:
            wx.MessageBox("Please build an index first.", "Info")
            return
        # Pick up new, changed or deleted files in the background
        threading.Thread(target=self._check_for_updates, daemon=True).start()
        query = self.query_ctrl.GetValue().strip()
        if not query:
            return
        self.SetStatusText("Searching...")
        self.results_list.DeleteAllItems()
        results = self.indexer.search(query, top_k=12)
        self.current_results = results
        for i, (_, name, score) in enumerate(results):
            self.results_list.InsertItem(i, f"{score:.4f}")
            self.results_list.SetItem(i, 1, name)
        self.SetStatusText(f"Found {len(results)} results")

    def on_save_index(self, event):
        if not self.indexer or self.indexer.embeddings is None:
            wx.MessageBox("Nothing to save.", "Info")
            return
        try:
            self.indexer.save(self.INDEX_DIR)
            wx.MessageBox(f"Index saved to:\n{self.INDEX_DIR}", "Success")
        except Exception as e:
            wx.MessageBox(str(e), "Save Failed")

    def on_result_click(self, event):
        idx = event.GetIndex()
        full_path, name, score = self.current_results[idx]
        fpath = Path(full_path)
        try:
            content = fpath.read_text(encoding="utf-8", errors="ignore")
            html_content = mistune.html(content)
            full_html = f"""
<html>
<head>
<style>
body {{ font-family: Arial, Helvetica, sans-serif; padding: 20px; line-height: 1.6; }}
pre {{ background: #f4f4f4; padding: 12px; border-radius: 4px; overflow: auto; }}
code {{ font-family: monospace; }}
h1, h2, h3 {{ color: #2c3e50; }}
img {{ max-width: 100%; height: auto; display: block; margin: 15px 0; }}
</style>
</head>
<body>
{html_content}
</body>
</html>
"""
            # Use the file's folder as the base URL so relative image paths load
            # (as_uri() also percent-encodes spaces in the path)
            base_url = fpath.parent.resolve().as_uri() + "/"
            self.preview.SetPage(full_html, base_url)
            self.SetStatusText(f"Opened: {name} (Score: {score:.4f})")
        except Exception as e:
            print(f"Preview error: {e}")  # for your console
            self.SetStatusText(f"Could not open file: {e}")
            self.preview.SetPage(f"<html><body><p>Error: {e}</p></body></html>", "")


if __name__ == "__main__":
    app = wx.App(False)
    frame = SemanticSearchApp()
    app.MainLoop()
- Go back to the main directory:
cd ..
- Create a data directory:
mkdir data
- Create a markdown subdirectory in data:
mkdir ./data/markdown
- And an images directory below the markdown directory:
mkdir ./data/markdown/images
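If you'd rather script the folder setup, the mkdir and touch steps above can be sketched in Python with pathlib. This is only an illustration: make_layout is a hypothetical helper, not part of Markdown Commander, and it does not create the virtual environment, install packages, or write MarkdownCommander.py.

```python
from pathlib import Path


def make_layout(project_root: Path) -> list[Path]:
    """Create src/, data/markdown/ and data/markdown/images/ plus src/__init__.py.

    Hypothetical helper mirroring the shell steps; it does not create the
    venv, install anything, or write MarkdownCommander.py.
    """
    dirs = [
        project_root / "src",
        project_root / "data" / "markdown",
        project_root / "data" / "markdown" / "images",
    ]
    for d in dirs:
        # parents=True also creates data/; exist_ok=True makes re-runs harmless
        d.mkdir(parents=True, exist_ok=True)
    # Same effect as `touch __init__.py` inside src/
    (project_root / "src" / "__init__.py").touch()
    return dirs
```

For example, calling make_layout(Path("MarkdownCommander")) from the folder that should hold the project produces the same tree as the commands above, and running it a second time changes nothing.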
Gather your markdown files and copy them all into the markdown directory, and put any associated images into the images directory.
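Images display in the preview because on_result_click passes the markdown file's own folder as the base URL to WebView.SetPage, so a relative link such as images/diagram.png is resolved inside data/markdown/. That lookup is ordinary URL joining, which can be checked with the standard library. In this sketch resolve_image and the example paths are hypothetical, and the base URL is built with urllib.parse.quote rather than the exact expression in the program.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urljoin


def resolve_image(md_file: str, image_ref: str) -> str:
    """Resolve an image reference from md_file against that file's folder.

    Hypothetical helper; md_file is assumed to be an absolute POSIX path.
    """
    folder = PurePosixPath(md_file).parent
    # Trailing slash matters: without it urljoin would replace the last path part
    base_url = "file://" + quote(str(folder)) + "/"
    return urljoin(base_url, image_ref)


print(resolve_image("/home/me/MarkdownCommander/data/markdown/setup.md",
                    "images/diagram.png"))
```

The same rule means a link like ../shared/x.png would point outside the markdown folder, so keeping images in data/markdown/images and linking them as images/... keeps everything resolvable.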
sentence_transformers downloads its language model (all-MiniLM-L6-v2) from Hugging Face, so an internet connection is needed, at least the first time you run the program.
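The model turns each document (its first 12,000 characters) and your query into embedding vectors, and util.semantic_search ranks the documents by cosine similarity; that ranking is the Score column. Below is a dependency-free sketch of just the ranking step, using made-up 3-dimensional toy vectors (the real all-MiniLM-L6-v2 vectors have 384 dimensions). cosine and rank_by_cosine are hypothetical helpers, not the library's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vec, doc_vecs, names, top_k=10):
    """Return (name, score) pairs, best match first, keeping at most top_k."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in zip(names, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings", made up for illustration only
docs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
names = ["wxpython_notes.md", "gtk_setup.md", "recipes.md"]
for name, score in rank_by_cosine([0.8, 0.6, 0.0], docs, names, top_k=2):
    print(f"{score:.4f}  {name}")
```

Because cosine similarity compares direction rather than length, a short note and a long article about the same topic can score similarly, which is why a query finds documents by meaning rather than by exact words.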
That should be all that's required to get started.
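One thing worth knowing before the first build: the indexer is not limited to markdown. build_index walks the chosen folder recursively and picks up .md, .txt, .py, .html and .rst files, skipping folders such as .git, venv and __pycache__; image files are never embedded, only displayed. The same filter, pulled out into a hypothetical collect_documents helper (the two sets are copied from SemanticIndexer), is a quick way to see what will be indexed before a long build.

```python
from pathlib import Path

# Copied from SemanticIndexer in MarkdownCommander.py
INDEXED_SUFFIXES = {".md", ".txt", ".py", ".html", ".rst"}
EXCLUDE_DIRS = {".git", "node_modules", "__pycache__", "venv", "env", ".venv"}


def collect_documents(directory: Path) -> list[Path]:
    """Return the files build_index would consider, sorted.

    Hypothetical helper mirroring the filtering loop in build_index.
    """
    found = []
    for p in directory.rglob("*.*"):
        # Skip anything that has an excluded folder anywhere in its path
        if any(part in EXCLUDE_DIRS for part in p.parts):
            continue
        # Suffix check is case-insensitive, like the original
        if p.suffix.lower() not in INDEXED_SUFFIXES:
            continue
        if p.is_file():
            found.append(p)
    return sorted(found)
```

For example, print(len(collect_documents(Path("data/markdown")))) from the project directory shows roughly how many documents the "Encoding ... documents" step will process (empty files are skipped by the real indexer).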
From the project directory, run:
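python src/MarkdownCommander.py
Then use File > Build/Rebuild Full Index (Ctrl+B) and select your data/markdown directory. Once you have built an index and saved it (File > Save Current Index, Ctrl+S; it goes to semantic_search_index in your home directory), it can be reloaded at startup to speed up the process. You should only need to rebuild when you add new documents.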
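Between rebuilds, each search also starts a background check (SemanticIndexer.check_and_update) that compares the files on disk with the stored paths and modification times: deleted files are dropped, and new files, or files whose mtime moved by more than 2 seconds, are re-embedded. The bookkeeping half of that can be sketched without any embeddings. diff_snapshots is a hypothetical helper working on plain {path: mtime} dictionaries; the 2-second tolerance is copied from check_and_update.

```python
def diff_snapshots(old: dict[str, float], new: dict[str, float],
                   tolerance: float = 2.0) -> tuple[set[str], set[str], set[str]]:
    """Compare two {path: mtime} snapshots.

    Returns (removed, added, modified) path sets. Small mtime jitter up to
    `tolerance` seconds is ignored, as in check_and_update.
    """
    removed = set(old.keys() - new.keys())
    added = set(new.keys() - old.keys())
    modified = {p for p in old.keys() & new.keys()
                if abs(new[p] - old[p]) > tolerance}
    return removed, added, modified


# Made-up snapshots: c.md deleted, d.md new, b.md edited, a.md only jittered
old = {"a.md": 100.0, "b.md": 200.0, "c.md": 300.0}
new = {"a.md": 100.5, "b.md": 260.0, "d.md": 400.0}
removed, added, modified = diff_snapshots(old, new)
print(sorted(removed), sorted(added), sorted(modified))
```

Only the files in added and modified have to go through model.encode again, which is why this incremental update is much cheaper than a full rebuild.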
Here's a screenshot of the program, and the display of a markdown document that contains images:
Again, I greatly appreciate the tremendous help that I received from XGrok, which is my choice for Python assistance.
If you find that I missed anything, please let me know.
Edited May 17 -- Clarified install steps
