

Uh, noise cancellation is hard. First of all, the audio pipeline currently can’t resample the microphone input, so mic and output need to be connected to separate I2S buses, or they won’t work simultaneously in the first place.
And then I had some luck with the microwakeword component. It often triggers correctly even with noise in the background. And I have an automation that mutes all media players and the TV when the wake word is triggered. That’s my “noise cancelling”.
I think more elaborate noise cancelling is going to require some dedicated hardware (or maybe some proprietary ESP-ADF functions) and a microphone array. But that’s probably as expensive as a Voice PE?!
I’m not in a good place with the voice assistant anyway. I don’t own a graphics card, so it’s slow. And Whisper never gets all the words right for me. So it’s down to the speech-to-phrase add-on, and that seems to be broken as of now. At least I get more connection errors than commands through. I think I’m going to do the Sendspin media player first, and then maybe add a microphone and voice assistant later.






Nice, thanks for the link! I wasn’t aware of that. Sadly, as with all shiny new things, it doesn’t fit all my requirements… I’d really like to speak to my house in my native language. But I figure English will do. I’m gonna try that.
Not sure if an ESP32-S3 is fast enough for more advanced DSP plus the rest of a voice assistant. At least I found some ESP32 libraries with noise reduction, echo cancellation… There is the ESP-ADF and a project called ESP32-SpeexDSP. But I haven’t tried them yet. The Rockchip / Luckfox development board looks nice as well. A Cortex-A7 and a few hundred megabytes of memory might come in handy. And whatever the NPU does. But I don’t have a clue what kind of software and libraries we have for embedded Linux or custom processing units.
Anyway. I think the production-grade stuff mostly uses multiple microphones and a combination of beamforming and echo cancellation. I’ve got 4 INMP441 microphones here. But I lack the software/libraries to tinker with that kind of signal processing.
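
To make the beamforming idea a bit more concrete, here is a rough sketch of the simplest variant, delay-and-sum, in plain Python. It is only an illustration under several assumptions: the microphones sit in a straight line with equal spacing, the speaker is far away, and delays are rounded to whole samples. The function names `steering_delays` and `delay_and_sum` are made up for this example and don’t come from ESPHome, ESP-ADF or SpeexDSP. Real products most likely use fractional delays and adaptive filters on top of this.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate, speed_of_sound=343.0):
    """Per-microphone arrival delay in whole samples for a far-field source.

    Assumes a linear, equally spaced array. angle_deg is measured from
    broadside (0 = straight in front of the array).
    """
    per_mic = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound * sample_rate
    raw = [round(k * per_mic) for k in range(num_mics)]
    earliest = min(raw)
    # Shift so the microphone that hears the sound first has delay 0.
    return [d - earliest for d in raw]


def delay_and_sum(channels, delays):
    """Undo each channel's delay, then average the aligned channels.

    channels: list of equal-length sample lists, one per microphone.
    delays:   non-negative integer delays in samples, one per microphone.
    """
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for samples, delay in zip(channels, delays):
            j = i + delay  # a channel that lags by `delay` holds sample i at index i + delay
            if j < n:
                acc += samples[j]
        out.append(acc / len(channels))
    return out


# Example: 4 microphones, 4 cm apart, 16 kHz, source 90 degrees off to one side.
delays = steering_delays(4, 0.04, 90, 16000)
```

Sound coming from the steered direction adds up in phase, while sound from other directions (the TV, for instance) gets partially averaged out. With only four microphones a few centimetres apart the effect is modest, which is probably why it is usually combined with echo cancellation.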