In a world dominated by keyboards, the simple act of getting thoughts onto the screen can feel like a chore. We juggle emails, code, documents, and messages, often wishing for a faster, more natural way to input text. While cloud-based dictation exists, it comes with strings attached — subscriptions, privacy worries, and the need for a constant internet connection.

What if you could reclaim your voice data and dictate freely, directly into any Windows application, without paying a cent or sending audio to the cloud?

Meet OmniDictate, a free, open-source desktop application designed to bring powerful, local, real-time AI dictation to your Windows PC.

➡️ Main Project Repository: https://github.com/gurjar1/OmniDictate ⬅️

Figure: The OmniDictate user interface (GUI screenshot).

The OmniDictate Advantage: Local, Free, and Integrated

OmniDictate was born from a desire for a dictation tool that respects user privacy and integrates seamlessly into existing workflows. Here’s what sets it apart:

1. Truly Local Processing = Total Privacy

Unlike services that process your speech on remote servers, OmniDictate runs 100% on your own computer.

  • Your Voice Stays Yours: No audio data is uploaded or stored externally. Your dictated content, conversations, and sensitive information remain confidential.
  • Works Offline: Once the AI model is downloaded, OmniDictate functions perfectly without an internet connection.
  • No Subscriptions, Ever: It’s genuinely free, leveraging open-source technology.

2. Open Source Transparency

OmniDictate’s source is published under a Creative Commons BY-NC 4.0 license, which keeps the code open to inspection while restricting commercial use.

  • Free for Non-Commercial Use: Use it without cost for personal projects, writing, accessibility needs, and more.
  • Inspectable Code: Anyone can view the source code on GitHub (https://github.com/gurjar1/OmniDictate), understand its workings, and contribute to its improvement.

3. Optimized AI Performance

OmniDictate utilizes state-of-the-art speech recognition powered by faster-whisper, an optimized implementation of OpenAI’s Whisper models.

  • High Accuracy: Benefits from Whisper’s robustness across different accents and environments.
  • Speed: faster-whisper runs on CTranslate2, which its maintainers report as up to 4 times faster than the reference openai/whisper implementation at the same accuracy, especially with GPU acceleration.
  • GPU Ready: Automatically uses your CUDA-enabled NVIDIA GPU if one is available and correctly configured, which keeps transcription latency low. Falls back to the CPU otherwise, though transcription is noticeably slower.
  • Model Choice: Select different Whisper model sizes (from tiny to large-v3) directly within the GUI to balance speed and accuracy based on your hardware.
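
As a rough illustration of the GPU-or-CPU fallback and model-size choice described above, here is a minimal sketch built on faster-whisper’s `WhisperModel`. The helper names (`count_cuda_devices`, `choose_runtime`, `load_model`), the list of sizes, and the `float16`/`int8` compute types are assumptions made for this example, not OmniDictate’s actual code.

```python
from typing import Tuple

# Illustrative subset of Whisper model sizes; the list in the GUI may differ.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large-v3")


def count_cuda_devices() -> int:
    """Return the number of CUDA devices CTranslate2 can see (0 if unavailable)."""
    try:
        import ctranslate2  # installed as a dependency of faster-whisper
        return ctranslate2.get_cuda_device_count()
    except Exception:
        # Missing package or broken CUDA setup: treat it as "no GPU".
        return 0


def choose_runtime(cuda_devices: int) -> Tuple[str, str]:
    """Pick (device, compute_type): GPU with float16 if present, else CPU with int8."""
    if cuda_devices > 0:
        return "cuda", "float16"
    return "cpu", "int8"


def load_model(model_size: str = "small"):
    """Load a Whisper model on the best available device."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size!r}")
    from faster_whisper import WhisperModel  # imported lazily; heavy dependency
    device, compute_type = choose_runtime(count_cuda_devices())
    return WhisperModel(model_size, device=device, compute_type=compute_type)
```

With a model loaded this way, `segments, info = model.transcribe("audio.wav")` yields segments whose `.text` attribute holds the recognized speech. Smaller sizes such as `tiny` trade accuracy for speed, which is the balance the GUI setting exposes.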

4. Type Directly Into Any Application

This is a core feature. OmniDictate doesn’t trap your text in its own window.

  • System-Wide Input: Using input-automation libraries (pynput for simulated keystrokes, pywinauto for Windows UI automation), it types directly into the currently focused application, whether that is your word processor, email client, code editor, web browser, or chat app (anything except OmniDictate itself).
  • Seamless Workflow: Dictate naturally without the interruption of copying and pasting.
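
To give a feel for how simulated typing can work, here is a small sketch that sends text one character at a time through pynput’s keyboard `Controller`. The function name `type_text`, the `delay` default, and the injectable `send_char` parameter are hypothetical choices for this example; OmniDictate’s own typing code may be structured differently.

```python
import time
from typing import Callable, Optional


def type_text(text: str,
              delay: float = 0.01,
              send_char: Optional[Callable[[str], None]] = None) -> int:
    """Send `text` one character at a time to the focused window.

    `send_char` is injectable so the function can be exercised without a
    real keyboard; by default it uses pynput's keyboard Controller.
    Returns the number of characters sent.
    """
    if send_char is None:
        from pynput.keyboard import Controller  # imported lazily
        send_char = Controller().type  # Controller.type accepts a string
    sent = 0
    for ch in text:
        send_char(ch)
        sent += 1
        if delay > 0:
            time.sleep(delay)  # pacing between keystrokes (assumed tunable)
    return sent
```

Calling `type_text("Hello from dictation")` while another window has focus would type the sentence there. The per-character delay plays the role of the “typing speed” setting mentioned later in this article.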

5. User-Friendly GUI and Controls

The PySide6 interface provides easy access and configuration:

  • Modern Dark Theme: Easy on the eyes.
  • Simple Controls: Clear Start/Stop buttons and status indicators.
  • VAD Toggle: Easily switch Voice Activity Detection on or off. When off, dictation only occurs via Push-to-Talk.
  • Push-to-Talk & Stop Hotkeys: Configurable global hotkeys (Defaults: Right Shift for PTT, Esc for Stop) allow control even when OmniDictate isn’t the active window.
  • Settings Management: Adjust VAD sensitivity, typing speed, filter words, newline commands, and more directly in the GUI. Settings are saved persistently.
  • Transcription Display: See a running log of your transcribed text within the app.
  • Copy Button: Quickly copy the entire transcription history to your clipboard.
  • Restore Defaults: Easily revert settings back to their original values.
  • Voice Commands: Includes support for “delete last n words”, “new line”, and spoken punctuation.
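
The voice commands in the last bullet can be pictured as a small classification step that runs on each transcribed phrase before anything is typed. The sketch below is an assumption about how such a step might look: the `interpret` function, the word lists, and the returned action names are all invented for illustration.

```python
import re
from typing import Tuple

# Spoken punctuation words and the symbols they map to (illustrative subset).
PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "full stop": ".",
    "question mark": "?",
    "exclamation mark": "!",
}

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

DELETE_RE = re.compile(r"^delete last (\w+) words?$")


def interpret(phrase: str) -> Tuple[str, object]:
    """Classify a phrase as ("delete", n), ("newline", None),
    ("punct", symbol), or ("text", phrase)."""
    # Transcripts often end with punctuation, so strip it before matching.
    cleaned = phrase.strip().lower().rstrip(".!?,")
    match = DELETE_RE.match(cleaned)
    if match:
        token = match.group(1)
        count = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
        if count:
            return "delete", count
    if cleaned == "new line":
        return "newline", None
    if cleaned in PUNCTUATION:
        return "punct", PUNCTUATION[cleaned]
    # Anything unrecognized is ordinary dictated text.
    return "text", phrase.strip()
```

A caller would then act on the result: send backspaces for `"delete"`, press Enter for `"newline"`, and type the symbol or text otherwise.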

How It Works: The Technology

OmniDictate combines several key Python libraries:

  • faster-whisper & ctranslate2: Provide the core, high-speed AI transcription capabilities.
  • sounddevice: Captures audio input from your microphone reliably.
  • pynput: Handles global keyboard monitoring for hotkeys and simulates precise, character-by-character typing output.
  • pywinauto: Enables interaction with Windows applications for commands like “delete last words”.
  • PySide6: Powers the modern graphical user interface.
  • numpy: Used for efficient handling of audio data.
  • torch: Included in the project’s dependencies for GPU (CUDA) support. Note that faster-whisper itself runs inference through CTranslate2 rather than PyTorch.
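
To make pynput’s hotkey role more concrete, here is a minimal sketch of a push-to-talk tracker wired to a global keyboard listener. The `PushToTalk` class and `start_listener` helper are hypothetical names for this example; only `keyboard.Listener` and its callback convention come from pynput.

```python
from typing import Callable, Optional


class PushToTalk:
    """Track whether a push-to-talk key is held, and react to a stop key."""

    def __init__(self, ptt_key, stop_key,
                 on_stop: Optional[Callable[[], None]] = None):
        self.ptt_key = ptt_key
        self.stop_key = stop_key
        self.on_stop = on_stop
        self.held = False

    def on_press(self, key):
        if key == self.ptt_key:
            self.held = True
        elif key == self.stop_key:
            if self.on_stop:
                self.on_stop()
            return False  # returning False stops a pynput Listener

    def on_release(self, key):
        if key == self.ptt_key:
            self.held = False


def start_listener(ptt: PushToTalk):
    """Attach the handlers to a global pynput keyboard listener."""
    from pynput import keyboard  # imported lazily
    listener = keyboard.Listener(on_press=ptt.on_press,
                                 on_release=ptt.on_release)
    listener.start()  # the listener runs in its own thread
    return listener
```

With pynput installed, `PushToTalk(Key.shift_r, Key.esc)` (using `from pynput.keyboard import Key`) would mirror the default hotkeys described in this article. Because the listener is global, `held` updates even when another application has focus.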

Getting Started with OmniDictate

You can download pre-built versions (recommended for most users) or build from source if you prefer.

  1. Prerequisites:

    • Windows 10 or 11 (64-bit recommended).
    • Microsoft Visual C++ Redistributable (VS 2015–2022, x64), available from Microsoft’s website. This is required.
    • (GPU Users Only) Ensure you have the correct NVIDIA Driver, CUDA Toolkit, and cuDNN installed before running OmniDictate in GPU mode. See the detailed “CUDA/cuDNN Setup” section in the project’s README for critical instructions.
  2. Download: Go to the OmniDictate Releases Page on GitHub.

  3. Choose an Option:

    • Installer (.exe): Installs to Program Files and creates shortcuts. Recommended for ease of use.
    • Portable ZIP (.zip): No installation needed. Extract the folder and run OmniDictate.exe from inside. Useful for portability or if you encounter installer issues. The ZIP download link is in the README and may point to Google Drive because of GitHub’s file size limits.
  4. Handle Security Warning: Since the application is unsigned (due to the cost of code signing certificates for free software), Windows Defender SmartScreen will likely show a warning (“Windows protected your PC”). Click “More info” and then “Run anyway” to proceed with the installer or the first launch of the portable .exe.

  5. Launch & Configure: Start OmniDictate via its shortcut or .exe. Explore the “Configuration” section in the GUI to adjust settings like model size, hotkeys, etc. Settings are saved automatically.

  6. Dictate!

(Optional) Using the Original Command Line Version

If you prefer a command-line interface or want to see the project’s origins, the original CLI version is available:

  • CLI Repository: https://github.com/gurjar1/OmniDictate-CLI
  • Setup involves cloning the repository, setting up a Python environment, and installing dependencies including the correct PyTorch version (see the CLI README for full details).

Basic Usage Guide (GUI)

  1. Launch OmniDictate.
  2. (Optional) Configure: Adjust settings in the “Configuration” section as needed.
  3. Start Dictation: Click the “Start Dictation” button. The status indicator will update.
  4. Dictate:
    • VAD Mode (Default): Simply speak when the status shows “Listening”. Transcription starts automatically when you speak and pauses when you stop.
    • Push-to-Talk (PTT): Hold down the configured PTT key (Default: Right Shift). Transcription occurs only while the key is held. This overrides VAD. You can turn VAD off via the GUI button if you only want PTT control.
    • Output: Text is typed into the currently active application window and also displayed within the “Transcription Output” area in OmniDictate’s GUI.
  5. Use Commands: Say things like “delete last two words”, “new line”, “comma”, etc., during dictation.
  6. Stop Dictation: Click the “Stop Dictation” button in the GUI or press the configured global Stop hotkey (Default: Escape).
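
The interplay between VAD mode and Push-to-Talk in step 4 can be sketched as a per-frame gate. This article does not say which VAD technique OmniDictate uses, so the example below swaps in a simple energy threshold with a short hangover; the `SpeechGate` class, the threshold, and the frame counts are all assumptions for illustration.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SpeechGate:
    """Decide, frame by frame, whether audio should go to transcription."""

    def __init__(self, threshold: float = 0.02, hangover_frames: int = 2,
                 vad_enabled: bool = True):
        self.threshold = threshold              # assumed sensitivity setting
        self.hangover_frames = hangover_frames  # quiet frames kept after speech
        self.vad_enabled = vad_enabled
        self._quiet = hangover_frames + 1       # start in the silent state

    def accept(self, frame: Sequence[float], ptt_held: bool = False) -> bool:
        if ptt_held:
            return True   # Push-to-Talk overrides VAD
        if not self.vad_enabled:
            return False  # VAD off: only Push-to-Talk records
        if rms(frame) >= self.threshold:
            self._quiet = 0
            return True
        # Keep a few quiet frames so short pauses do not cut words off.
        self._quiet += 1
        return self._quiet <= self.hangover_frames
```

Raising `threshold` makes the gate less sensitive to background noise, which is roughly what a “VAD sensitivity” setting controls in a design like this.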

Who Can Benefit from OmniDictate?

  • Writers & Content Creators: Speed up drafting emails, articles, and social media posts.
  • Developers: Dictate code comments, commit messages, or technical notes.
  • Students: Efficiently take notes during online lectures or transcribe research.
  • Anyone Seeking Productivity: Reduce typing time and effort.
  • Privacy-Focused Individuals: Keep voice data entirely local.
  • Users Needing Accessibility Options: Provides an alternative to keyboard input.

The Road Ahead

OmniDictate is actively developed. Future plans might include support for more languages in the GUI, more advanced command options, and further performance tuning. Because the source is public, community feedback and contributions via GitHub are highly valued.

Stop letting the keyboard slow you down. Give OmniDictate a try and rediscover the power of your voice for interacting with your computer — privately, freely, and efficiently.


Find the code, downloads, and full documentation on GitHub: ➡️ https://github.com/gurjar1/OmniDictate ⬅️