[New]: Application Metadata

Updated: 09 October 2006

Project: Application Metadata
Purpose: This is a simple project to research the elements of good metadata design.  To do that, we need to work through a non-trivial example, and I chose application metadata: how would we design metadata to describe the programs running on our computer?  This matters most on a family computer or a shared work computer, where multiple people download programs from the Internet.  To prevent duplication and aid discovery, we need metadata that describes what is already installed.

Updates:

  • Update1 - 8/27/2006.  Simple File Scanner application to look at all the executables on a PC.
  • Update2 - 10/9/2006.  Changed attack on the problem to analyzing and categorizing applications on the start menu.

8/27/2006

Steps:

  1. Understand the problem.  We will begin this project by studying what we want to describe.  To do that, I created a very simple Java program that scans for files with a specific extension (such as .exe) and dumps the file metadata (name, directory, modified date, file size) to an XML file.  Here is a sample snippet from a scan of our family PC:

    <!-- Scan of .exe files on Family PC -->
    <files>
    <file name='aolconnfix.exe' dirName='c:\' modifiedDate = 'Mon Dec 05 18:35:02 EST 2005' length='10920' />
    <file name='AHD3.EXE' dirName='c:\applications\AHDW' modifiedDate = 'Mon Nov 07 23:00:00 EST 1994' length='240048' />
    <file name='SETUP.EXE' dirName='c:\applications\AHDW' modifiedDate = 'Wed Aug 31 00:00:00 EDT 1994' length='24624' />
    <file name='SETUP2.EXE' dirName='c:\applications\AHDW' modifiedDate = 'Mon Nov 07 23:00:00 EST 1994' length='38208' />
    <file name='_MSSETUP.EXE' dirName='c:\applications\AHDW' modifiedDate = 'Wed Aug 31 00:00:00 EDT 1994' length='9813' />
    <file name='dnsproxy.exe' dirName='c:\applications\kingate-1.6-pre2-win32\kingate-1.6-pre2-win32\bin' modifiedDate = 'Mon Dec 01 11:40:28 EST 2003' length='57344' />
    <file name='kingate.exe' dirName='c:\applications\kingate-1.6-pre2-win32\kingate-1.6-pre2-win32\bin' modifiedDate = 'Sun Mar 21 15:51:08 EST 2004' length='360448' />
    <file name='sdelete.exe' dirName='c:\applications\sdelete\Release' modifiedDate = 'Thu Oct 16 12:38:42 EDT 2003' length='61440' />
    <file name='sdel.exe' dirName='c:\applications\sdelete' modifiedDate = 'Thu Jul 08 11:05:56 EDT 1999' length='45056' />
    <file name='apache-tomcat-5.5.12.exe' dirName='c:\archive' modifiedDate = 'Sat Dec 24 15:57:11 EST 2005' length='5030608' />
    <file name='dxsetup.exe' dirName='c:\archive\DirectX9' modifiedDate = 'Sun Jun 01 17:47:20 EDT 2003' length='467456' />
    <file name='fear_server_en_103.exe' dirName='c:\archive' modifiedDate = 'Sat Apr 01 22:26:00 EST 2006' length='121607252' />
    <file name='install_flash_player.exe' dirName='c:\archive' modifiedDate = 'Wed Jul 19 18:58:46 EDT 2006' length='1355912' />
    <file name='jdk-1_5_0_06-nb-4_1-win-ml.exe' dirName='c:\archive' modifiedDate = 'Wed Dec 21 00:08:07 EST 2005' length='136795282' />
    <file name='KDiff3Setup_0.9.88.exe' dirName='c:\archive' modifiedDate = 'Sun Mar 05 10:41:11 EST 2006' length='2445486' />
    <file name='NCSetup-1.1a.exe' dirName='c:\archive' modifiedDate = 'Sun Mar 05 11:14:41 EST 2006' length='595466' />
    <file name='netbeans-5_0-windows.exe' dirName='c:\archive' modifiedDate = 'Mon Feb 20 07:14:41 EST 2006' length='60850738' />
    <file name='netinfotool_allos_gw.exe' dirName='c:\archive' modifiedDate = 'Sun Mar 12 16:33:00 EST 2006' length='950334' />
    <file name='sjsas_pe-8_2_2005Q2-nb-5_0-fcs-bin-win.exe' dirName='c:\archive' modifiedDate = 'Mon Feb 20 07:12:05 EST 2006' length='38329103' />
    <file name='WinMerge-2.4.6-Setup.exe' dirName='c:\archive' modifiedDate = 'Sun Mar 05 10:23:27 EST 2006' length='2578854' />
    <file name='ATAPI.EXE' dirName='c:\DELL' modifiedDate = 'Tue Sep 03 10:31:44 EDT 2002' length='28672' />
    <file name='DOSXPRES.EXE' dirName='c:\DELL' modifiedDate = 'Wed Jul 14 19:44:26 EDT 1999' length='13043' />
    <file name='EXPRESS.EXE' dirName='c:\DELL' modifiedDate = 'Wed Aug 25 16:17:24 EDT 1999' length='79024' />
    <file name='UWAKEOFF.EXE' dirName='c:\DELL' modifiedDate = 'Thu Jul 25 17:45:32 EDT 2002' length='28672' />
    <file name='UWAKEON.EXE' dirName='c:\DELL' modifiedDate = 'Thu Jul 25 17:46:44 EDT 2002' length='28672' />
    ...
    <file name='VIEW32.EXE' dirName='c:\WINDOWS' modifiedDate = 'Mon Aug 26 05:12:00 EDT 1996' length='93184' />
    <file name='wanmpsvc.exe' dirName='c:\WINDOWS' modifiedDate = 'Fri Jan 10 18:13:04 EST 2003' length='65536' />
    <file name='WINHELP.EXE' dirName='c:\WINDOWS' modifiedDate = 'Thu Aug 29 07:00:00 EDT 2002' length='256192' />
    <file name='winhlp32.exe' dirName='c:\WINDOWS' modifiedDate = 'Wed Aug 04 03:56:57 EDT 2004' length='283648' />
    <file name='LHA.EXE' dirName='c:\Writing\CPPOINTR' modifiedDate = 'Sat Jul 20 03:13:00 EDT 1991' length='34283' />
    <file name='test.exe' dirName='c:\Writing\interviews\Debug' modifiedDate = 'Sun Dec 15 06:00:28 EST 2002' length='155701' />
    </files>
    <scan numFiles='3319' />

    A few insights jump out just from examining the file metadata:
    - The parent directory name is key metadata because it often groups multiple related executables (for example, the four executables under c:\applications\AHDW in the sample above).
    - Is size a useful measure of an application's importance?  What do you think?  Run a scan of your own computer and examine the results.

  2. If you want to scan your computer using this simple Java program, here is the source code.  (Very quick and dirty.  I will keep improving it, and I am still debating whether to turn it into a real application so that it can add the metadata we design.)
  3. What lessons have we learned from examining our applications?
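
For readers who want to try the scan described in steps 1 and 2, here is a minimal sketch of such a scanner. It is not the original source code: the class name FileScanner and its methods (scan, toXml, groupByDirectory, escape) are hypothetical names chosen for illustration, and it assumes Java 8 or later. The XML it writes reuses the element and attribute names from the sample above, and groupByDirectory illustrates the observation that the parent directory tends to group related executables.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an extension-based file scanner; not the author's original program.
class FileScanner {

    /** One scanned file, holding the same four fields as the sample output. */
    static final class Entry {
        final String name;
        final String dirName;
        final String modifiedDate;
        final long length;

        Entry(String name, String dirName, String modifiedDate, long length) {
            this.name = name;
            this.dirName = dirName;
            this.modifiedDate = modifiedDate;
            this.length = length;
        }
    }

    /** Walks the tree under root and collects files whose name ends with extension (case-insensitive). */
    static List<Entry> scan(Path root, String extension) throws IOException {
        final String suffix = extension.toLowerCase(Locale.ROOT);
        final List<Entry> found = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                String name = file.getFileName().toString();
                if (attrs.isRegularFile() && name.toLowerCase(Locale.ROOT).endsWith(suffix)) {
                    String modified = new Date(attrs.lastModifiedTime().toMillis()).toString();
                    found.add(new Entry(name, String.valueOf(file.getParent()), modified, attrs.size()));
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                return FileVisitResult.CONTINUE; // skip anything we are not allowed to read
            }
        });
        return found;
    }

    /** Escapes the characters that are unsafe inside a single-quoted XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("'", "&apos;");
    }

    /** Renders the entries using the element and attribute names from the sample scan. */
    static String toXml(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("<files>\n");
        for (Entry e : entries) {
            sb.append("<file name='").append(escape(e.name))
              .append("' dirName='").append(escape(e.dirName))
              .append("' modifiedDate='").append(escape(e.modifiedDate))
              .append("' length='").append(e.length).append("' />\n");
        }
        sb.append("</files>\n<scan numFiles='").append(entries.size()).append("' />\n");
        return sb.toString();
    }

    /** Groups file names by parent directory, sorted by directory name. */
    static Map<String, List<String>> groupByDirectory(List<Entry> entries) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Entry e : entries) {
            groups.computeIfAbsent(e.dirName, k -> new ArrayList<>()).add(e.name);
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String extension = args.length > 1 ? args[1] : ".exe";
        System.out.print(toXml(scan(root, extension)));
    }
}
```

Saved as FileScanner.java, it can be run on Java 11 or later with something like `java FileScanner.java c:\ .exe`, redirecting standard output to a file. Unreadable files and directories are skipped silently, which keeps a whole-disk scan from aborting but also means the count can be lower than what is really on the disk.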

10/9/2006 - Took a different approach to this problem.  Instead of scanning all applications and weeding through hundreds of them, I decided to analyze and categorize just the applications on the Start menu of my laptop.  From this I created a taxonomy that could be used to discover an application by its function or utility.  See my blog entry on this subject.
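
To make the idea of discovering an application by its function more concrete, here is a small sketch of how such a taxonomy might be held in memory. Everything in it is an assumption for illustration: the class name AppTaxonomy, its methods, and the category paths are placeholders and are not the taxonomy described in the blog entry. The application names are taken from installers that appear in the sample scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a taxonomy stored as slash-separated category paths.
class AppTaxonomy {

    // Category path -> application names, both kept in sorted order.
    private final Map<String, Set<String>> appsByCategory = new TreeMap<>();

    /** Files an application under a category path such as "Utilities/File Comparison". */
    void add(String categoryPath, String appName) {
        appsByCategory.computeIfAbsent(categoryPath, k -> new TreeSet<>()).add(appName);
    }

    /** Returns every application filed under the given category or any of its sub-categories. */
    List<String> find(String categoryPath) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : appsByCategory.entrySet()) {
            String key = e.getKey();
            if (key.equals(categoryPath) || key.startsWith(categoryPath + "/")) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AppTaxonomy taxonomy = new AppTaxonomy();
        // Placeholder categories; only the application names come from the sample scan.
        taxonomy.add("Utilities/File Comparison", "WinMerge");
        taxonomy.add("Utilities/File Comparison", "KDiff3");
        taxonomy.add("Utilities/Secure Deletion", "SDelete");
        taxonomy.add("Development/IDE", "NetBeans");

        System.out.println(taxonomy.find("Utilities"));                 // prints [KDiff3, WinMerge, SDelete]
        System.out.println(taxonomy.find("Utilities/File Comparison")); // prints [KDiff3, WinMerge]
    }
}
```

Storing each category as a path lets a search on a broad function ("Utilities") return everything beneath it, while a narrower path returns only the close matches. A real taxonomy would probably also need to allow one application to appear under several categories, which this structure already permits by calling add more than once.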