RFC on architecture for MIDI piano practice tool for blind/low vision/elderly users.
Posted: Sat Dec 05, 2020 7:54 pm
by danjcla
I'm working on a recording solution for my technophobic father for X-Mas, and just wanted to see if this seems like a reasonable high-level architecture to more experienced people:
(a) midish or arecordmidi (which would be better? Or something else?) continuously records the MIDI output of a digital piano to a compressing file system - I'm thinking BTRFS - say one file per day.
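A minimal sketch of step (a), assuming arecordmidi from alsa-utils; the ALSA port "20:0" and the recordings directory are placeholders for your actual setup (list real ports with `arecordmidi -l`):

```python
import datetime
import subprocess

def daily_filename(day=None, directory="/var/recordings"):
    """Build the one-file-per-day path, e.g. /var/recordings/2020-12-05.mid."""
    day = day or datetime.date.today()
    return f"{directory}/{day.isoformat()}.mid"

def record_today(port="20:0"):
    """Launch arecordmidi capturing the piano's ALSA port into today's file.
    The port is an assumption; substitute the one your piano actually shows."""
    return subprocess.Popen(["arecordmidi", "--port", port, daily_filename()])
```

A small supervisor (cron job or systemd timer) would restart this at midnight so each day gets its own file.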
(b) Configure
Rhasspy to be able to feed MIDI back to the device via midish, so it gets played, initially in the simplest way possible, e.g. by speaking things like "Play 2020 12 5 at 1 15 pm" and "stop". (My dad doesn't have or want internet access, so I need to avoid any Speech-To-Text that sends voice offsite for processing.)
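To make step (b) concrete, here is a sketch of mapping a recognized phrase to a file and start time. The regex grammar is hypothetical: Rhasspy would normally deliver a structured intent with slots already filled, so this stands in for that step.

```python
import re
from datetime import datetime

# Hypothetical grammar for the simplest voice command,
# e.g. "play 2020 12 5 at 1 15 pm".
PLAY_RE = re.compile(
    r"play (\d{4}) (\d{1,2}) (\d{1,2}) at (\d{1,2}) (\d{1,2}) (am|pm)"
)

def parse_play_command(text):
    """Return (midi_file, start_time) for a matching phrase, else None."""
    m = PLAY_RE.match(text.lower())
    if not m:
        return None
    year, month, day, hour, minute, ampm = m.groups()
    hour = int(hour) % 12 + (12 if ampm == "pm" else 0)
    when = datetime(int(year), int(month), int(day), hour, int(minute))
    return f"{when:%Y-%m-%d}.mid", when
```

The returned file name and timestamp would then be handed to midish (or aplaymidi) to start playback at that point in the day's recording.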
(c) Over time, improve the interface, for instance, to allow the user to give arbitrary names to start points. Perhaps also add the feature of being able to choose between sending the MIDI back to the device and playing via some nicer-in-some-ways instrument via linuxsampler and the appropriate .gig files.
Any thoughts?
Thanks,
-Danny
Re: RFC on architecture for MIDI piano practice tool for blind/low vision/elderly users.
Posted: Sun Dec 06, 2020 8:21 am
by Basslint
Hello Danny, and welcome!
I think it's a great idea. I like the fact that you are following the *NIX way and working in a modular client-server fashion.
One helpful feature would be an "ls" command to list the recordings on a given day. I know they are all contained in the same file, but without it you would have to remember the exact time you recorded something. Perhaps you could use text-to-speech to answer the question "What did I record on 2020 12 5?". Or you could open/close a file manually, by saying "start recording" and "stop recording".
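A sketch of that "ls" idea, under the assumption that recordings end up as one file per session named like `2020-12-05_13-15.mid` (a naming scheme invented here, not anything the tools impose):

```python
import glob
import os

def recordings_for(day, directory="/var/recordings"):
    """Return the session files recorded on one day, sorted by time."""
    return sorted(glob.glob(os.path.join(directory, f"{day}_*.mid")))

def spoken_listing(day, files):
    """Build a sentence for text-to-speech answering
    'What did I record on <day>?'."""
    if not files:
        return f"No recordings on {day}."
    # Strip "<day>_" prefix and ".mid" suffix, read "13-15" as "13 15".
    times = [os.path.basename(f)[len(day) + 1:-4].replace("-", " ")
             for f in files]
    return f"{len(times)} recording(s) on {day}: " + ", ".join(times) + "."
```

The resulting sentence could be piped to whatever TTS engine the system ends up using.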
Otherwise, you could trigger recordings on input: record to a temporary file continuously, and when a note is first played, copy from that point into a new file for that specific session. The session file closes automatically after some idle time (let's say if no note is played for 10 minutes).
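The idle-timeout logic above can be sketched as a small state machine; here it is run offline over a list of note-on timestamps, but the same grouping rule would drive the live open/close decisions:

```python
IDLE_SECONDS = 10 * 60  # close a session after ten minutes of silence

def split_sessions(note_times, idle=IDLE_SECONDS):
    """Group note-on timestamps (seconds, ascending) into sessions:
    a gap longer than `idle` starts a new session."""
    sessions = []
    for t in note_times:
        if sessions and t - sessions[-1][-1] <= idle:
            sessions[-1].append(t)  # still within the current session
        else:
            sessions.append([t])    # long silence: open a new session
    return sessions
```

For example, notes at 0, 30 and 60 seconds followed by one at 2000 seconds would split into two sessions, since the 1940-second gap exceeds the ten-minute idle limit.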
Re: RFC on architecture for MIDI piano practice tool for blind/low vision/elderly users.
Posted: Sun Dec 06, 2020 9:32 am
by folderol
Interesting project.
You will need to pay a lot of attention to the parser. Decoding typed info is hard enough (I know from experience!) so spoken words will have even more potential variations.
Re: RFC on architecture for MIDI piano practice tool for blind/low vision/elderly users.
Posted: Sun Dec 06, 2020 9:55 am
by jeanette_c
Hi Danny, I think you could lure a few people with a project like that. The way you proposed it is especially nice because it relies only on open source software, so you could in the end create a full install with all configurations.
If you don't want to go as far as that, I think Pianoteq includes the recording feature with an idle time and would, naturally, also include an in-the-box sound engine. I'm mostly mentioning it since I suppose you will have enough to do with the interface to your system, and you have less than three weeks now.
I have seen only one off-line Linux speech recognition system that works as a complete standalone. Maybe there are more such systems in connection with some desktop environment. I think this system, written in Java, was called Sirius.
Midish as a recording tool is quite useful, since it can be scripted or operated by piped commands. The direct shell is just a convenience interface, originally written to demonstrate how other UIs could be built on top of Midish. If you'd like to work with idle times to turn off and trim recordings, maybe an additional piece of software attached to the MIDI input, perhaps using RtMidi, could tell you when notes are being played on the input. Or you could try to parse the output of aseqdump for incoming events.
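The aseqdump route could look roughly like this; the exact column layout of aseqdump's output may differ between alsa-utils versions, so the regex is an assumption to be checked against real output:

```python
import re

# Matches lines such as:
#  20:0   Note on                 0, note 60, velocity 100
NOTE_ON = re.compile(r"Note on\s+(\d+), note (\d+), velocity (\d+)")

def is_note_activity(line):
    """Detect note-on events in aseqdump's text output, so an idle
    timer can be reset whenever the piano is actually being played."""
    m = NOTE_ON.search(line)
    # Some keyboards send note-on with velocity 0 instead of note-off.
    return m is not None and int(m.group(3)) > 0
```

In use, something like `aseqdump -p 20:0` would be piped into a loop reading stdin, calling this on every line.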
There are a few text-to-speech options on Linux. There is espeak, which is very synthetic but has a few voice parameters. I think Mary TTS has a better voice or two. Mbrola might work on your architecture; its voices aren't great by today's standards, but good enough for short feedback. Then there is Festival, which has a few OK voices. There is one commercial voice provider, Voxin. It sort of works with speech-dispatcher, the tool used to pass text to a speech engine from desktop environments, and it has a small command-line utility. There was an announcement on the Orca mailing list about a new alternative to speech-dispatcher that is tailored to work well with Voxin.
I hope some of that is helpful.
Best wishes, Jeanette