I’ve spent an awfully long time working out seemingly trivial technical points with this project so far, but now I feel reasonably fluent in Pure Data and openFrameworks, and have created a reasonably flexible development setup using oF, Pd and Logic Pro as a temporary sound source. Although the ultimate goal is to integrate everything into a single application, having the component parts separated out like this means that I can experiment, make changes and see the results quickly.
The use of the ‘vanilla’ Pd version means that it should be possible to integrate it into the oF project using the ‘ofxPd’ add-on. I’ve already done some experimentation with this, and simple Pd patches are easy enough to integrate. Eventually the sound generation will be handled inside Pd rather than in Logic – so the whole thing should work as a single app.
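For reference, the basic ofxPd wiring inside an oF app is fairly small. Here’s a minimal sketch, assuming the addon’s standard setup and the older float-buffer audio callback – the patch name “synth.pd” and the channel/buffer numbers are placeholders, not my actual project code:

```cpp
#include "ofMain.h"
#include "ofxPd.h"

class ofApp : public ofBaseApp {
public:
    ofxPd pd;

    void setup() {
        // 2 output channels, no inputs, 44.1 kHz, 8 Pd ticks per buffer
        ofSoundStreamSetup(2, 0, this, 44100, ofxPd::blockSize() * 8, 3);
        pd.init(2, 0, 44100, 8);
        pd.openPatch("synth.pd"); // placeholder patch name
        pd.start();
    }

    // hand the output buffer to Pd to fill
    void audioOut(float * output, int bufferSize, int nChannels) {
        pd.audioOut(output, bufferSize, nChannels);
    }
};
```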
The oF program loads an image and steps through it from left to right – taking a cue from how a book or musical score is read. Each pixel column is analysed, and a measure of the average values in that column is calculated. Clearly this doesn’t tell the whole story of the image – variations from top to bottom aren’t taken into account yet, for example – but it’s enough to go on for now.
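The per-column analysis might look something like this – a sketch using oF’s HSB colour accessors, with illustrative names rather than the project’s actual code:

```cpp
#include "ofMain.h"

// Per-column average of hue, saturation and brightness.
struct ColumnHSB { float hue, sat, bri; };

ColumnHSB columnAverageHSB(const ofPixels & pix, int x) {
    ColumnHSB avg = {0, 0, 0};
    int h = pix.getHeight();
    for (int y = 0; y < h; y++) {
        ofColor c = pix.getColor(x, y); // oF returns HSB components in 0–255
        avg.hue += c.getHue();
        avg.sat += c.getSaturation();
        avg.bri += c.getBrightness();
    }
    avg.hue /= h; avg.sat /= h; avg.bri /= h;
    return avg;
}
```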
So for each column I’ve got an average measure of hue, brightness and saturation, a kind of running average for each, and a total H/S/B value per column. The running average is used to work out how much the current column’s average changes at each step – this provides a kind of ‘complexity index’, a measure of the amount of variation across the image.
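The running-average and complexity idea could be sketched like this – a simple exponential smoothing and per-step delta; the smoothing factor is my own assumption:

```cpp
#include <cmath>

float runningAvg = 0.0f;
float complexity = 0.0f;
const float smoothing = 0.9f; // assumption: how slowly the averages adapt

void updateColumn(float columnAvg) {
    // how far this column's average deviates from the running average
    float delta = std::fabs(columnAvg - runningAvg);
    // accumulate the deviation as a rough measure of variation across the image
    complexity = smoothing * complexity + (1.0f - smoothing) * delta;
    // fold the new column into the running average
    runningAvg = smoothing * runningAvg + (1.0f - smoothing) * columnAvg;
}
```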
The image is read left to right at a rate of one column per 16th note, at approximately 100 BPM. For a reasonably wide image (say, 1024 pixels) that works out to 1024 ÷ 16 = 64 bars of 4/4, and at 100 BPM (256 beats at 0.6 seconds each) a total song length of around 2.5 minutes.
All this data is sent via MIDI (which turns out to still be the best way!) to Pd, which, at this stage, generates some simple melodies played through a basic pad sound created in Logic. The pad generator is governed by some simple rules (sketched in code after this list), such as:
More complexity = more variation, fewer spaces between notes
More brightness = greater likelihood of a major-key progression over a minor one
‘Red’ spectrum = warm, filtered sounds
‘Blue’ spectrum = cold, bright sounds
Brightness is also literally mapped to filter cutoff, meaning the brighter the image, the brighter the sound.
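Here’s a rough sketch of how the column data might go out over MIDI, assuming the ofxMidi addon – the CC numbers and mappings are my own placeholders (CC 74 is the conventional ‘brightness’/cutoff controller), not necessarily what the final patch will use:

```cpp
#include "ofMain.h"
#include "ofxMidi.h"

ofxMidiOut midiOut;

void setupMidi() {
    midiOut.openPort(0); // assumes Pd is listening on the first MIDI port
}

void sendColumn(float hue, float sat, float brightness) {
    int channel = 1;
    // brightness literally mapped to filter cutoff (CC 74 by convention)
    midiOut.sendControlChange(channel, 74, (int) ofMap(brightness, 0, 255, 0, 127));
    // hue and saturation on arbitrary CCs for the Pd patch to route
    midiOut.sendControlChange(channel, 75, (int) ofMap(hue, 0, 255, 0, 127));
    midiOut.sendControlChange(channel, 76, (int) ofMap(sat, 0, 255, 0, 127));
}
```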
Results so far
Feeding the program different images certainly results in short pieces of music that differ in tonal quality, note progression and ‘feel’. The music so far is pretty boring, but I think that can be addressed by introducing more interesting generative rules. As a ‘proof of concept’, I think what I’ve got so far is positive: darker images definitely produce darker sounds, different colours produce different results, and transitions between contrasting parts of an image produce corresponding changes in the audio.
Here are a couple of examples: