The Semantics of Music Libraries - Part 1

iTunes Caddy is a new application I am working on!

As many of you know, in the FEAC Institute class I teach, I always use your music library as the best example of metadata.  I explained that example in detail in my new book; the diagram I use is shown below:

Recently, I had to pull songs off my iPod and move them back to my iTunes library on a new computer.  Unfortunately, due to file naming differences between the programs, I now have many duplicate songs in my iTunes library.  I searched the internet for "dedupe" utilities, but none impressed me, so I decided to write one of my own.  The good part is that the iTunes library is an XML document, and I wanted to explore it more deeply.  I will not get into the details today, but I do not like the iTunes XML format because of its poor semantics.  Briefly, it uses an XML DTD to define a generic property list, so what should be element (tag) names are instead the "key" portion of the key-value pairs in the property list.  For example, a track's name is stored as a <key>Name</key> element followed by a <string> element, rather than in a <Name> element of its own.  This is very poor XML design: XML was not intended as a way to dump generic programming data structures into a persistent library file.  OK, back to my "dedupe utility"... here is what I have so far of what I am calling the "iTunes Caddy":
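
The key-value layout criticized above, and the kind of duplicate check a dedupe utility needs, can be seen in a few lines of code. What follows is only a minimal sketch, not the iTunes Caddy source: it assumes Python and its standard `plistlib` module, a hand-written three-track sample in the iTunes-style layout (all names and locations are made up), and a deliberately simple duplicate rule (same name, artist, and album, ignoring case) chosen purely for illustration.

```python
import plistlib
from collections import defaultdict

# Hand-written, heavily reduced sample in the iTunes-style property-list layout.
# Track names, artists, and file locations are invented for this illustration.
SAMPLE_LIBRARY = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Tracks</key>
    <dict>
        <key>101</key>
        <dict>
            <key>Track ID</key><integer>101</integer>
            <key>Name</key><string>Song A</string>
            <key>Artist</key><string>Example Artist</string>
            <key>Album</key><string>Example Album</string>
            <key>Location</key><string>file:///Music/Song%20A.mp3</string>
        </dict>
        <key>102</key>
        <dict>
            <key>Track ID</key><integer>102</integer>
            <key>Name</key><string>song a</string>
            <key>Artist</key><string>Example Artist</string>
            <key>Album</key><string>Example Album</string>
            <key>Location</key><string>file:///Music/Song%20A%201.mp3</string>
        </dict>
        <key>103</key>
        <dict>
            <key>Track ID</key><integer>103</integer>
            <key>Name</key><string>Song B</string>
            <key>Artist</key><string>Example Artist</string>
            <key>Album</key><string>Example Album</string>
            <key>Location</key><string>file:///Music/Song%20B.mp3</string>
        </dict>
    </dict>
</dict>
</plist>
"""


def load_tracks(xml_bytes):
    """Parse the property list and return the per-track dictionaries."""
    library = plistlib.loads(xml_bytes)
    return list(library.get("Tracks", {}).values())


def duplicate_key(track):
    """Assumed duplicate rule: same name, artist, and album, ignoring case."""
    return tuple(
        track.get(field, "").strip().casefold()
        for field in ("Name", "Artist", "Album")
    )


def find_duplicates(tracks):
    """Group track IDs by duplicate key and keep groups with more than one ID."""
    groups = defaultdict(list)
    for track in tracks:
        groups[duplicate_key(track)].append(track["Track ID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


duplicates = find_duplicates(load_tracks(SAMPLE_LIBRARY))
for key, ids in duplicates.items():
    print(key, ids)
```

Notice that the parser hands back plain dictionaries: "Name" and "Artist" are just strings used as keys, and nothing in the format itself says that a track has a name or an artist. A real utility would likely need a fuzzier rule (duration, track number, file size) than the exact match assumed here.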

While the first goal is to fix errors (like duplicates) in the iTunes library, I will also explore the semantics of music libraries.  As you can see, I also plan to create some visualizations of a music library.  I am up to the part of the application that parses the iTunes XML representation and turns it into an internal class representation.  As stated earlier, I am displeased with the weak semantics of the iTunes metadata, so here is a high-level view of my planned design for the class representation of a music library:
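
As a rough companion to that plan, the relationships spelled out in the walkthrough that follows can also be written as a handful of classes. This is only a sketch and not the iTunes Caddy design itself: the choice of Python dataclasses, the attribute names (`title`, `name`), and the `add_artist` helper are all assumptions made for illustration, since the attributes of each class are not given.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    title: str  # assumed attribute; the real class would carry many more


@dataclass
class Album:
    """An album is a collection of audio tracks."""
    title: str
    tracks: List[Track] = field(default_factory=list)


@dataclass
class Musician:
    name: str


@dataclass
class Artist:
    """An artist is a collection of one or more musicians and creates albums."""
    name: str
    musicians: List[Musician]
    albums: List[Album] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the "one or more musicians" rule from the walkthrough.
        if not self.musicians:
            raise ValueError("an artist needs at least one musician")


@dataclass
class MusicLibrary:
    """A library is a collection of tracks and a collection of artists."""
    artists: List[Artist] = field(default_factory=list)
    tracks: List[Track] = field(default_factory=list)

    def add_artist(self, artist):
        # Hypothetical helper: register the artist and every track on their albums.
        self.artists.append(artist)
        for album in artist.albums:
            self.tracks.extend(album.tracks)


# Usage with made-up data
artist = Artist("Example Artist", [Musician("Example Musician")])
artist.albums.append(Album("Example Album", [Track("Song A"), Track("Song B")]))

library = MusicLibrary()
library.add_artist(artist)
print(len(library.artists), "artist,", len(library.tracks), "tracks")
```

Keeping the artists as their own collection, instead of as a string field on each track, is what lets the library answer context questions such as "what else did this artist record?" without scanning every track.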

So, let's walk through the semantics of a music library.  A music library is a collection of audio tracks (called tracks) and a collection of artists.  While a collection of tracks alone may be technically sufficient, adding the collection of artists is better because people identify more with the context of the songs than with individual songs.  Discovery is therefore a key part of music and must be included in the model.  That, in turn, requires us to represent the semantics of a song more accurately.  A song does not come into existence on its own: it is created by an artist, and that artist is a collection of one or more musicians.  Artists create albums, which are collections of audio tracks.  While I have not shown the attributes in the above diagram, each class obviously has many attributes.

This is just the beginning of our exploration.  In the next few weeks I will explore the MusicBrainz knowledge base and look to integrate it into the application.  More soon ... Oh, yeah - this will be an open source project, but not until it is further along.  As always, comments are very welcome (via the contact form on the left).

Best wishes,
- Mike