January 18, 2026

I Finally Built the Transcription Tool I’ve Wanted Since 2008

[Illustration: sound waves turning into binary data, then into text and media files]
Audio in. Text out. Publishing only when I say so, all on my own machine. Image created with the help of AI.

I've always wanted a transcription machine, because for years typing has been the bottleneck.

Not thinking.
Not clarity.
Not ideas.

Typing.

Back when the first iPhones came out, I had a simple wish:

let me talk, and let my words appear in my blog.

At the time, that was fantasy. Speech recognition existed, but only in research labs, big companies, or cloud services that didn’t really work well and definitely weren’t private. I moved on, kept typing, and learned to live with the speed limit.

Fast-forward to now.

Modern hardware.

Local machine learning.

Open models.

Enough computing power sitting on my desk to do what used to require a lab.

So I finally did it.

I built a fully local transcription, voice cloning, and publishing pipeline on my own laptop. No cloud inference. No subscriptions. No dashboards. No usage caps. No data leaving my machine.

My intellectual property never leaves my machine unless I explicitly choose it.

That constraint mattered more than the tech itself.


What I wanted (and what I refused)

I didn’t want:

  • another AI subscription

  • another web interface

  • another service asking me to “upgrade” my own brain

  • another place my raw thoughts were stored on someone else’s servers

I wanted:

  • text → audio

  • audio → text

  • both directions

  • locally

  • for free

  • automated, but only when I asked for it

This isn’t about replacing judgment.

It’s about removing friction.

Automation that empowers judgment instead of eroding it.


The tool I built

At a high level, the system now does two things:

  1. Transcription

    • I record audio

    • Drop it in a folder

    • Whisper runs locally on Apple Silicon using Metal

    • Clean, readable text appears

    • Optional publishing happens only if I explicitly speak my intent (see the transcription sketch just after this list)

  2. Voice synthesis

    • I provide my own voice reference

    • Text files dropped into a folder become .m4a files

    • The voice is mine

    • The processing is local

    • The output is mine to keep or discard (see the synthesis sketch a little further down)
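To make the transcription direction concrete, here is a minimal sketch of the folder contract. It assumes a whisper.cpp-style command-line binary (shown here as whisper-cli) built with Metal support, ffmpeg on the PATH, and a local GGML Whisper model; the folder layout, model path, and flags are illustrative rather than my exact setup.

    # Minimal sketch: poll an inbox folder, transcribe new audio locally, write .txt files.
    # Assumes a whisper.cpp-style CLI ("whisper-cli") with Metal and ffmpeg on the PATH;
    # folder names, model path, and flags are illustrative.
    import subprocess
    import time
    from pathlib import Path

    INBOX = Path.home() / "Transcribe" / "inbox"          # drop audio files here (assumed layout)
    OUTBOX = Path.home() / "Transcribe" / "text"          # finished .txt files land here
    MODEL = Path.home() / "models" / "ggml-base.en.bin"   # any local Whisper GGML model

    def transcribe(audio: Path) -> None:
        wav = OUTBOX / (audio.stem + ".16k.wav")
        # Whisper expects 16 kHz mono PCM, so convert whatever was dropped in.
        subprocess.run(["ffmpeg", "-y", "-i", str(audio), "-ar", "16000", "-ac", "1", str(wav)], check=True)
        # The CLI does the heavy lifting on the GPU via Metal; Python only moves files around.
        subprocess.run(["whisper-cli", "-m", str(MODEL), "-f", str(wav), "-otxt", "-of", str(OUTBOX / audio.stem)], check=True)
        wav.unlink(missing_ok=True)

    if __name__ == "__main__":
        OUTBOX.mkdir(parents=True, exist_ok=True)
        done: set[Path] = set()
        while True:  # plain polling loop: no daemons, no services, stop it whenever I like
            for audio in INBOX.glob("*.*"):
                if audio.suffix.lower() in {".m4a", ".wav", ".mp3"} and audio not in done:
                    transcribe(audio)
                    done.add(audio)
            time.sleep(5)

The point of the sketch is the contract, not the code: one folder in, one folder out, and a process I start and stop myself.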

No GPU calls inside Python ML stacks.

No fragile cloud dependencies.

No long-running services pretending to be “magic.”

Just files, folders, and clear contracts.
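The other direction, text to audio, follows the same folder contract. Here's a sketch under assumptions: the voice model itself isn't shown, so synthesize_to_wav() is a hypothetical stand-in for whichever local voice-cloning engine you point at your own reference clip, and ffmpeg handles the final .m4a packaging.

    # Sketch of the text-to-audio direction: .txt files in, .m4a files out, all local.
    # synthesize_to_wav() is a hypothetical placeholder for your local voice-cloning TTS;
    # folder names and the voice reference path are illustrative.
    import subprocess
    from pathlib import Path

    TEXT_IN = Path.home() / "Speak" / "inbox"                  # drop .txt files here (assumed layout)
    AUDIO_OUT = Path.home() / "Speak" / "audio"                # finished .m4a files land here
    VOICE_REF = Path.home() / "Speak" / "voice_reference.wav"  # a short clip of my own voice

    def synthesize_to_wav(text: str, voice_ref: Path, wav_out: Path) -> None:
        """Hypothetical hook: call whatever local voice model you actually run, writing a WAV."""
        raise NotImplementedError("wire this to your local TTS engine")

    def speak(text_file: Path) -> Path:
        wav = AUDIO_OUT / (text_file.stem + ".wav")
        m4a = AUDIO_OUT / (text_file.stem + ".m4a")
        synthesize_to_wav(text_file.read_text(encoding="utf-8"), VOICE_REF, wav)
        # Re-encode the raw WAV as AAC inside an .m4a container; ffmpeg is the only external tool.
        subprocess.run(["ffmpeg", "-y", "-i", str(wav), "-c:a", "aac", str(m4a)], check=True)
        wav.unlink(missing_ok=True)
        return m4a

    if __name__ == "__main__":
        AUDIO_OUT.mkdir(parents=True, exist_ok=True)
        for text_file in sorted(TEXT_IN.glob("*.txt")):
            print("wrote", speak(text_file))

Nothing runs unless I start it, and nothing leaves the machine.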


Why this is finally possible

In 2008, this idea simply wasn’t realistic.

Speech models weren’t good enough. Hardware wasn’t accessible. Tooling didn’t exist outside academic circles.

Today, it is.

Not because of one model or one framework, but because the ecosystem finally matured:

  • open speech models

  • commodity GPUs

  • local inference

  • better system-level tooling

This is the kind of problem that’s only solvable now.


What this unlocks for me

I can think out loud without restraint.

I can write at the speed of thought.

I can turn raw thinking into drafts without ceremony.

And I can do it knowing:

  • my data stays local

  • my voice is mine

  • my process is under my control

This isn’t a product (yet).

It’s a personal tool.

But it’s also a case study in how I approach problems:

constraints first, workflow second, technology last.

If you’re curious how it works in detail, I’ve written more about the architecture and tradeoffs here:

👉 My Local Transcription Pipeline

More soon.

July 31, 2024

Embracing AI in UX and UI Design: A Game-Changer for Articulating Visual and Functional Needs

As an experienced UX and UI designer, I found incorporating AI into my workflow a no-brainer, and it has transformed how I approach design and development. With AI, I can precisely frame my visual and functional requirements, allowing me to identify potential design and engineering issues before presenting them to developers, which is a far more efficient use of our limited time budgets. I'll use a tool like jsfiddle.net to test an early idea and explore basic feasibility. This proactive approach significantly improves the efficiency of my work.

AI tools enable me to create detailed prototypes and simulations, helping me visualize end products more accurately. This capability allows me to explore and refine ideas extensively, pushing the boundaries of creativity and functionality. When I collaborate with developers, I have a well-defined vision that minimizes misunderstandings and maximizes productivity. I can also share my sandbox, where I play with ideas.

AI-driven design tools streamline the iteration process. I can quickly test different design elements and functionalities, receiving immediate feedback that guides my decisions. This iterative approach ensures I address potential issues early, leading to a smoother development process and a more polished final product.

This AI-driven design transformation is helping me reach the right solutions far faster.

Real-World Application: Designing a Digital Split-Flap Display

One real-world example of how I use AI in my design process is creating a digital split-flap display similar to those seen in European train stations. I have a clear visual concept and understand the functional requirements. However, determining the most efficient coding approach is beyond my expertise, as my primary focus is design. Nevertheless, my programming background (I've been coding since I was 14) lets me bridge the gap between design and development effectively. I can ask AI to fulfill specific requirements, and it provides solutions for me to try. I continue to iterate as I find my way to a solution, but, as mentioned, far faster than before.
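To give a flavor of the kind of feasibility question I hand to AI, here is a toy sketch (mine, not the generated solution) of the core split-flap behavior: every character position flips through a fixed wheel of glyphs, one step per tick, until it lands on its target.

    # Toy sketch of split-flap behavior, not the real implementation: each position
    # advances one flap per tick through a fixed character wheel until it matches the target.
    FLAPS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

    def step(current: str, target: str) -> str:
        """Advance every character one flap toward its target; call once per animation tick."""
        out = []
        for cur, tgt in zip(current, target):
            out.append(cur if cur == tgt else FLAPS[(FLAPS.index(cur) + 1) % len(FLAPS)])
        return "".join(out)

    # Example: flip a blank board to a destination, printing each intermediate frame.
    board, destination = "      ", "ZURICH"
    while board != destination:
        board = step(board, destination)
        print(board)

The rendering and flip timing around logic like this are exactly the parts I iterate on with AI.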

Integrating Voice Synthesis for a Radio Noir Project

I've also installed an open-source voice model on my laptop and am venturing into a radio noir side project for fun. This model will synthesize my voice to read raw text into MP3 files, adding a unique and personal touch to the project. Using AI for voice synthesis on my laptop lets me avoid costs from online providers when I only want to dabble, learn, and see what I can do. Creatively, this approach allows for more dynamic and engaging storytelling, perfect for the immersive nature of radio noir.

Looking Ahead

I am excited to continue on these paths, eager to see where all this new learning takes me and where the AI and design industries are headed. Integrating AI in design and development opens up new possibilities.

Like so many times before, I am excited to be at the forefront of this evolution of technology.

---

#UXDesign #UIDesign #AIDesign #ArtificialIntelligence #TechInnovation #DesignThinking #Prototyping #DigitalDesign #VoiceTechnology #DesignProcess #FutureOfWork #Programming #CreativeTech #Innovation #DigitalTransformation #DesignLeadership #UserExperience #ProductDesign #TechTrends #AIInDesign

May 26, 2024

Redesigning My Website: A Journey Towards Improvement

I was interviewed by an AI tuned for UX design about my recent web design overhaul, and here's what it had to say.

Read more

© BERT : MCMXCV — MMXXVI
