Motion Sonified

demo 1: standing start | demo 2: squat | demo 3: calisthenics

Motion Sonified is an independent, experimental research project of Michael Barkasi. It aims to explore motion sonification as a form of sensory augmentation. The hope is eventually to help athletes and performance artists improve bodily awareness and motion mechanics. What makes Motion Sonified unique is the attempt to sonify explosive, sub-second high-skill sport movements. This white paper sketches the project, with references.

Movement sonification

Movement (or motion) sonification is the transformation of body position and movement into sound. Information about position and movement is captured by wearable sensors, like the kind found in a smart watch or smart phone. A processor then uses the sensor readings to modulate qualities of a tone, like pitch or volume. There is a thriving community of researchers, artists, and engineers working on movement sonification. The range of applications is wide: everything from using sound to help cyclists smooth out their pedal stroke to enriching the experience of art or expressing artistic emotions. Rehabilitation and the treatment of motor diseases are another frequently researched area.
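To make the basic loop concrete, here is a minimal Arduino-style sketch of the general pattern. It is purely illustrative: the flex sensor, pin assignments, and 300-2000 Hz range are assumptions, not any particular system's configuration.

```cpp
// Minimal illustration of the sensor-to-tone loop (not the project's actual
// firmware). Assumes a flex sensor on analog pin A0 and a speaker on pin 9.
const int SENSOR_PIN  = A0;
const int SPEAKER_PIN = 9;

void setup() {
  pinMode(SPEAKER_PIN, OUTPUT);
}

void loop() {
  int reading = analogRead(SENSOR_PIN);            // 0-1023 position reading
  int pitchHz = map(reading, 0, 1023, 300, 2000);  // position -> pitch
  tone(SPEAKER_PIN, pitchHz);                      // update the audible tone
  delay(5);                                        // roughly a 5 ms update cycle
}
```

The prototypes described below use IMUs and a different tone-generation scheme, but the read-map-update cycle has this shape.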

The limits of proprioception

Why movement sonification? Our awareness of our body’s position and movement mechanics is surprisingly poor. We’re not very good at feeling the position of our limbs or making our bodies move as we intend. The sensory receptors in our muscles and joints register only impoverished information about position and movement. The signal they generate is slow and often suppressed by the brain, which largely tracks movement through its prediction of the body’s position. Here is a brief guide, and here is a longer one with references.

Sensory augmentation

Movement sonification aims to augment the impoverished signal from proprioception with rich auditory information. Instead of your awareness of position and movement coming through kinaesthesia, the information is conveyed through a tone. In this way, movement sonification is like other work on sensory substitution, e.g. the use of tactile interfaces to convey visual information to blind individuals. Just as visuohaptic systems convert camera data into pressure on the skin, audioproprioceptive systems (movement sonification) convert spatial sensor data into audible sound.

On some of the runs in this demo, I recorded the sound directly off the headphone wire. Here you can see one of those recordings, of sound generated on the first pedal stroke, which shows the kind of information captured by the unit and conveyed to the listener. Notice the time scale and resolution: This is successful sonification of an explosive (~0.5 sec) movement with high temporal resolution (5 ms). For more on how I generate the sound, see the audio analysis below.

Demo 1: The standing start

Here’s an example. I aim to sonify explosive sport movements, like the standing start in track cycling. In track cycling, riders often start from a dead stop and must accelerate the bike as quickly as possible. In this demo, the intention was for the rider’s hip thrust (distance from the handlebars) to control pitch, and back alignment (flat vs. rounded back) to control volume. The engineering challenge is substantial: Success requires capturing fine position changes at very high temporal resolution. The prototype used in this video samples position and updates the tone every 5 ms, using only a small microcontroller. For demonstration, the sound is played through a Bluetooth speaker, although in practice it’s intended to be heard through a wired earbud. (I’ve re-synced the video to eliminate the Bluetooth delay, which isn’t there through wired earbuds.) The hope is to give the athlete improved information about body position to guide their movement. Instead of focusing on their body, they focus on the tone and attempt to perform the movement while maximizing pitch (hip thrust) and keeping volume up (maintaining a flat back). Thomas Hums performs the start in the video. Full clip here.

Recording of a second standing start attempt from the demo. (Neither wire recording is from the start shown in the video.)
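As a rough sketch of the mapping just described (not the prototype's actual code), the two control signals reduce to a pitch and a volume. The normalized inputs and the frequency range below are assumptions for illustration.

```cpp
// Hedged sketch of the demo-1 mapping only: hip thrust drives pitch, back
// alignment drives volume. Extracting these two quantities from the IMU data is
// not shown, and the numeric ranges are illustrative assumptions.
#include <algorithm>
#include <cstdio>

struct ToneCommand {
  float pitchHz;  // rises as the hips thrust away from the handlebars
  float volume;   // falls as the back rounds (1.0 = flat back)
};

ToneCommand mapStandingStart(float hipThrustNorm, float backFlatness) {
  hipThrustNorm = std::clamp(hipThrustNorm, 0.0f, 1.0f);
  backFlatness  = std::clamp(backFlatness, 0.0f, 1.0f);
  return {300.0f + hipThrustNorm * (2000.0f - 300.0f), backFlatness};
}

int main() {
  ToneCommand cmd = mapStandingStart(0.8f, 0.9f);  // e.g., strong thrust, nearly flat back
  std::printf("pitch %.0f Hz, volume %.2f\n", cmd.pitchHz, cmd.volume);
}
```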

Sport and performance art

Complex and (especially) explosive sport movements, like the standing start in track cycling, are difficult to learn, develop, and maintain proficiency in. Lifters struggle to keep their backs flat as they squat. Even the best quarterbacks and pitchers must constantly work on their throwing mechanics. Novices need constant correction to hold yoga poses properly. Sport is filled with complex movements that are difficult to master: e.g., high jumps, golf swings, and somersaults. Similarly, performance arts like dance involve complex movements that demand precise timing and accurate bodily awareness, and they are just as difficult to master.

Demo 2: Squat

A key engineering challenge is to get the unit to learn the movement, so it provides a useful sonification. I do that through machine learning. In this demo, the unit learns the motion through an initial five-position motion capture phase. (Similar motion capture was used in the standing start demo.) Hip angle controls pitch, so that the pitch hits 2000 Hz at the bottom of the squat and drops to 300 Hz at the top. Sensor alignment controls volume: The further the alignment is from what’s recorded during the motion capture, the lower the volume. Thus, the user gains information about their squat depth from the pitch, and about their back/hip alignment from the volume. Decreases in volume mean that the back isn’t aligned properly with the hips, e.g. the hips might be rising faster than the back. For demonstration, I played the sound through a Bluetooth speaker. Unlike in the standing start demo, I haven’t corrected the (~0.5 sec) delay. When used as intended, through wired earbuds, the delay is only 5-10 ms.
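Here is a hedged sketch of just the volume rule described above; the pitch side follows the same kind of range mapping shown earlier. The 20-degree tolerance and the reduction of the five captured positions to a single reference value are assumptions for illustration, not the unit's calibrated behaviour.

```cpp
// Illustrative volume rule for the squat demo: the further the measured back/hip
// alignment drifts from the alignment recorded during motion capture, the quieter
// the tone. The tolerance below is an assumed figure, not a calibrated one.
#include <algorithm>
#include <cmath>
#include <cstdio>

float squatVolume(float alignmentDeg, float capturedDeg) {
  const float toleranceDeg = 20.0f;                      // assumed fall-off range
  float error = std::min(std::fabs(alignmentDeg - capturedDeg), toleranceDeg);
  return 1.0f - error / toleranceDeg;                    // 1.0 = aligned, 0.0 = far off
}

int main() {
  // e.g., hips rising faster than the back: 8 degrees off the captured alignment
  std::printf("volume %.2f\n", squatVolume(38.0f, 30.0f));
}
```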

Real time feedback

Much existing work on movement sonification focuses on sonifying rhythmic motion, giving subjects a rhythmic beat to help regulate motion, or using the body to control sound (especially for art). The aim of my project is different. I aim to build high-precision sonification units which give users useful, precise, real time feedback on body position and movement. To be useful as a tool for movements like a squat or high jump, latency should be as low as possible (e.g., 5-10 ms), with sampling and tone updates every 5 ms or so, and with the spatial resolution to capture extremely small changes in movement and convert them into audible changes in sound.
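For a sense of what that requirement looks like in firmware, here is a bare-bones Arduino-style timing skeleton that holds a 5 ms sample-and-update cycle. readSensors() and updateTone() are hypothetical placeholders; this is a sketch of the pattern, not my actual firmware.

```cpp
// Illustrative fixed-rate loop: sample the sensors and update the tone every 5 ms.
const unsigned long UPDATE_PERIOD_US = 5000;  // 5 ms between updates

unsigned long lastUpdateUs = 0;

void readSensors() { /* placeholder: sample both IMUs here */ }
void updateTone()  { /* placeholder: push the new pitch and volume to the tone generator */ }

void setup() {
  lastUpdateUs = micros();
}

void loop() {
  unsigned long nowUs = micros();
  if (nowUs - lastUpdateUs < UPDATE_PERIOD_US) return;  // not due yet
  lastUpdateUs = nowUs;
  readSensors();
  updateTone();
}
```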

Movement correction

The ultimate hope is to design sonifications which enable real time correction. For example, the auditory signal should not only allow a lifter to know their back wasn’t straight after their squat. It should be fast enough that it allows them to correct their form as they squat. While some movements, like a baseball pitch, are likely too fast to allow for real time correction, a fast enough auditory signal might still (even if subconsciously) provide useful feedback that improves technique on the next attempt. Further, even these ultrafast movements have slow preparatory phases (e.g., the “wind up” in a pitch) which do allow for real time correction. Sonification should, in theory, allow for these sorts of improvements, since an auditory signal should be able to provide more precise information faster than natural proprioception. A key challenge is teaching the user to ignore their own bodily feedback and instead move to control the tone. (The idea is not to focus on both your body and the tone, but only the tone.)

Demo 3: Calisthenics routine

There are many potential applications of this technology. Here I used it to sonify a simple calisthenics routine. Back angle controls pitch, leg angle controls volume. (This early demo doesn’t sonify error or alignment.) Notice how the volume goes down as my legs approach parallel with the ground, while changes in the orientation of my back change pitch. As with the squat demo video, there is a delay from the Bluetooth which isn’t there when used as intended with wired earbuds. More clips from this session are on YouTube.

The temporal resolution of audition

Why sound? Why not give athletes visual information? It’s easy to put them in front of a mirror. First, it’s not always possible for athletes to perform in front of mirrors. Second, many sport movements require attention to be focused on the task, not on a mirror. Audition is powerful because it’s much less demanding of attention than vision. Even when focused on some task, sound can be quickly and effortlessly processed in the background. Third, and most important, vision is slow with poor temporal resolution. It takes roughly 150ms from stimulus onset before visual information is processed into a usable form, and vision detects changes that unfold over tenths of seconds. In contrast, audition takes only about 30-50ms to process input and detects changes at a scale of milliseconds. Audition is much better suited to process the fast, temporally precise information involved in complex body movement.

Here’s a simple demonstration. I’m varying the gap between repetitions of a 5 ms tone by only 1-3 ms, with audible results. I also vary the pitch of that tone, again with audible results. So, the human auditory system needs only a few waveforms to pick up pitch, and is sensitive to temporal gaps of only a millisecond or two.
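A stimulus of roughly this kind can be generated in a few lines on a microcontroller. In the sketch below, the pin, the 1 kHz tone frequency, and the repetition counts are assumptions; it simply plays a repeating 5 ms tone and steps the silent gap between 1 and 3 ms.

```cpp
// Rough Arduino sketch of the stimulus: a repeating 5 ms tone with a silent gap
// that steps between 1 and 3 ms. All constants are assumptions for illustration.
const int SPEAKER_PIN = 9;

void setup() {}

void loop() {
  for (int gapMs = 1; gapMs <= 3; ++gapMs) {
    for (int i = 0; i < 200; ++i) {   // hold each gap setting for ~200 repetitions
      tone(SPEAKER_PIN, 1000);        // 1 kHz test tone
      delay(5);                       // tone lasts 5 ms
      noTone(SPEAKER_PIN);
      delay(gapMs);                   // silent gap: 1, 2, or 3 ms
    }
  }
}
```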

The technology

I sonify movement using commercially available sensors. I use microprocessors to read the data and convert it into sound. Roughly, the setup is something like Mozzi, but I use my own hardware and code. Because of the demand for ultra-low latency, high temporal resolution, and spatial precision, movement sonification is a challenging engineering task. Currently, my prototype uses 9-axis digital inertial measurement units (IMUs). I also experiment with analog resistive sensors, like flex sensors. The microprocessor reads two strategically placed sensors, then uses those readings to modulate a simple single-frequency tone it generates in the background. I use a special form of Pulse-Width Modulation (PWM) to generate the sound in a way that allows for near-instantaneous updates of pitch and volume (in less than 1/20 of a millisecond).
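As a rough illustration of one way such a scheme can work (a simplified sketch, not the prototype's actual PWM code), the carrier can be run well above hearing range and the duty cycle recomputed once per carrier period, so a new pitch or volume takes effect within a single period (about 16 µs at the 62.5 kHz carrier assumed below).

```cpp
// Simplified PWM tone idea: an ultrasonic carrier whose duty cycle is recomputed
// every period. Pitch sets how fast the tone phase advances; volume scales the
// duty-cycle swing around 50%. The 62.5 kHz carrier is an assumed value.
#include <cstdint>
#include <cstdio>

const float CARRIER_HZ = 62500.0f;  // assumed ultrasonic PWM carrier frequency

// Duty cycle for the next carrier period of a square-wave tone; intended to be
// called once per PWM period (e.g., from a timer interrupt).
uint8_t nextDuty(float pitchHz, float volume, float& phase) {
  phase += pitchHz / CARRIER_HZ;                 // advance the tone phase
  if (phase >= 1.0f) phase -= 1.0f;
  float sample = (phase < 0.5f) ? 1.0f : -1.0f;  // square wave in [-1, 1]
  return static_cast<uint8_t>((0.5f + 0.5f * sample * volume) * 255.0f);
}

int main() {
  float phase = 0.0f;
  for (int i = 0; i < 8; ++i)  // a few periods of a 1 kHz tone at half volume
    std::printf("%u ", (unsigned)nextDuty(1000.0f, 0.5f, phase));
  std::printf("\n");
}
```

On real hardware the duty cycle would be written to a timer's compare register rather than printed; the point is only that pitch and volume can change on the very next carrier period.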

Audio Analysis

For more demos, see my SoundCloud.

Here is a sample, recorded directly off the output wire, of the sounds my current prototype makes. To generate the sound using only software, with minimal processor demands and fast updates, I use a special variation of pulse-width modulation (PWM), loosely inspired by how PWM is used to control motors and dimmer lights, by my conversations with engineering friends, and by Connor Nishijima’s Volume library for Arduino. I made this recording as I moved the unit with my hands (not as the unit was worn during a sport movement). While the sounds are a bit shrill, the recording shows the robustness, precision, and speed of the auditory signal generated by my unit. As I move the sensors, volume and pitch change. You can see from the audio analyses that the unit updates pitch and volume every 5 ms (the intended update rate I currently use), and does so with extremely fine-grained control of volume and pitch.

Embedded simplicity

A key insight for doing fast, real-time movement sonification is that you need to exploit the natural way in which subjects are embodied and embedded in the environment. The body has limited degrees of freedom, complex target body variables correlate with simpler ones, and physical units are meaningless to the brain’s motor control system. My aim is to exploit the way in which subjects are embodied and embedded, plus features of the brain’s own neural processing, to do robust and fast sonification.

A multidisciplinary perspective

My project is unique insofar as it blends a diverse array of perspectives: psychophysics, sport engineering, athletic training, and phenomenology. I study the response properties of sensory neurons to stimuli, design wearable sensor units, program sonifications, and introspect what it’s like to listen to the auditory feedback as I move. As a trained philosopher and cognitive scientist, I bring a wide background ranging from cognitive neuroscience to classical phenomenology.

Perception as a skill

Newborn infants need to learn to use their eyes to see and their hands to touch. Blind individuals given visuohaptic sensory substitution devices need to learn how the tactile sensations correlate with visual stimuli. Similarly, effective use of audioproprioceptive sensory augmentation will presumably require learning how to use the feedback. Using your senses to perceive the world (or, in this case, your body) is always a skill to be learned. Consider how painters become better at discriminating colours as they train to paint, or how musicians learn to better discriminate notes. A key component of movement sonification is understanding how sonifications must be learned; as with any other sensory task, using the system will be a skill that needs developing. The key skill involved is likely to be sensory exploration, i.e. the skill of using intentional movement (motor commands) to see how sensory input changes.

Key research questions

  • What kinds of acoustic changes are most readily discerned by audition?
  • How fast and at what temporal resolution can audition discern those changes?
  • What kind of sensors work best for capturing position and motion data for sonification? (There are no perfect sensors; there is always a trade-off.)
  • How do you generate sounds that can be modulated in real time (or virtually real time) by sensor readings? Are there purely analog circuit solutions, or are digital microprocessors best?
  • What kind of neural responses and differences (e.g., as measured by EEG) are observed when using movement sonification?
  • Is there behavioural data showing that movement sonification improves bodily awareness, movement mechanics, or speeds movement learning?
  • What is it like for people using movement sonification? What is their subjective experience? What kinds of skills are needed for effective use, and how can people be taught them?