Wednesday, April 16, 2014

MetaDirectory: Creating searchable meta directory structures from XML metadata

So what can you do with your movie XML metadata?

I've had this idea for some time, and when I started studying XSLT I needed something concrete to practice its features on.

The idea was that if I have files with some kind of XML metadata, I could automatically create multiple directory structures for navigating to those files.

For instance, if the metadata has a list of actors, I could go to the directory Actors, open any actor's directory there, and see all the movies that actor has made. I could go further into a Year subdirectory and select only the movies that actor made in a particular year.

And it should be highly configurable.
XSLT solves that problem. It isn't the easiest of languages, but it's very powerful for this kind of use.

This script creates directory structures with shortcuts to the original file at the end of the directory tree. The original file can be of any type as long as it has an accompanying XML metadata file. If your document is "My Document.doc", then the XML file should be named "My Document.doc.xml".
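This pairing rule is easy to automate. Here's a minimal Python sketch of it (not the actual MetaDirectory.py, just an illustration of the naming convention) that walks a source directory and yields each document together with its metadata sidecar:

```python
import os

def find_documents_with_metadata(source_dir):
    """Yield (document_path, metadata_path) pairs where the sidecar
    metadata file is named '<document name>.xml'."""
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        names = set(filenames)
        for name in filenames:
            if name.lower().endswith(".xml"):
                continue  # this is a sidecar itself, not a document
            if name + ".xml" in names:
                yield (os.path.join(dirpath, name),
                       os.path.join(dirpath, name + ".xml"))
```

Documents without a matching `.xml` file are simply skipped, which matches how the script ignores files that have no metadata.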

These examples use only movie files, and the XML schema is from ImdbPY. See my earlier blog post about fetching these XML files from Imdb.

The script only creates new directories; it never deletes them. If your metadata changes, the only way to properly update the directory structure is to build it from scratch. And you have to run the script after every metadata change, of course.

The script has four parameters:
  1. TargetDir  Directory where your metadata directory structure is written
  2. XSLT File  Name of the XSLT file used for formatting the directory structure
  3. SourceDir  Source directory for the document files
  4. XML Static file  A static XML file that is used together with each document's metadata file in the XSLT transformation

Here's the script and a single executable file (compiled with PyInstaller).


MetaDirectory.py
MetaDirectory.exe


As you can see, the script is quite straightforward. The magic happens in the XSLT file.

If you have several source directory trees, you simply run MetaDirectory several times with the same target directory. You can also create different XSLT files and static XML files for each source directory tree. You can, for instance, combine music, music videos and movies into the same target directory, so you can have a first-level directory like Favorite Artists/Elvis Presley that contains all the music and all the movies of Elvis Presley.



The XML Static file can be used for inserting static XML structures. For instance, say you have a directory where all the movies are awful. You can put this into your static file:
<own_movierating>Awful</own_movierating>
Another use case is actors. If you put all the actors of all the movies in one directory, there will be a lot of actors. You usually want to list only the lead actors or a top 100. So you can create in your static file a list of all the actors you find special, and in your meta directory create a directory that contains only those actors.

Here's an example of a Static XML file:
StaticData.XML

The original document file name will also be available in the XML schema as /data/DocumentFileName. You can use it if you want to name the shortcuts after the original file name.

And the most important file is the XSLT file.
MovieDB.xslt



This is only an example file. It is not meant to be a complete XSLT. It contains a couple of use cases you can copy and paste into your own XSLT file, so you don't have to know how XSLT works.

There are some examples of the ImdbPY XML schema at the start of the file that you can use. All elements are under the root element /data.

The XSLT file transforms the XML metadata into a list of all the possible directory paths. The list starts with an <items> tag.

<xsl:for-each select="/data/movie/genres/item">
    <item>
        <xsl:copy-of select="/data/movie/kind/text()"/>/[Genre]/<xsl:copy-of select="text()"/>/[Year]/<xsl:copy-of select="/data/movie/year/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>

    </item>
</xsl:for-each>
This first section selects all the genres of the movie from the XML file and loops over them. For each row it also gets the media type (kind) and the year the movie was made, and the last part is always the name of the document, in this case the name of the movie.
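If the XSLT is hard to follow, here's the same genre loop sketched in plain Python with xml.etree.ElementTree; the movie data is a made-up minimal example of the ImdbPY schema:

```python
import xml.etree.ElementTree as ET

# A hypothetical minimal metadata file, just enough for this example
xml_data = """
<data>
  <movie>
    <kind>movie</kind>
    <title>Jaws</title>
    <year>1975</year>
    <genres><item>Thriller</item><item>Horror</item></genres>
  </movie>
</data>
"""

root = ET.fromstring(xml_data)
kind = root.findtext("movie/kind")
title = root.findtext("movie/title")
year = root.findtext("movie/year")

# One directory path per genre, mirroring the XSLT for-each
# over /data/movie/genres/item
paths = ["%s/[Genre]/%s/[Year]/%s/%s" % (kind, g.text, year, title)
         for g in root.findall("movie/genres/item")]
# → ['movie/[Genre]/Thriller/[Year]/1975/Jaws',
#    'movie/[Genre]/Horror/[Year]/1975/Jaws']
```

Each generated path is one line in the `<items>` list that the script then turns into directories plus a shortcut.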

<xsl:for-each select="/data/movie/director/person">
    <item>

        <xsl:copy-of select="/data/movie/kind/text()"/>/[Director]/<xsl:copy-of select="name/text()"/>/[Year]/<xsl:copy-of select="/data/movie/year/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
    </item>
</xsl:for-each>

<xsl:for-each select="/data/movie/director/person">
    <item>
        <xsl:copy-of select="/data/movie/kind/text()"/>/[Director]/<xsl:copy-of select="name/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
    </item>

</xsl:for-each>

This is similar to the first one, but in this case there's one root directory [Director] where two XSLT queries insert files. The first one creates the directory structure
movie/Director/*/Year/*/
and the second one
movie/Director/*/
So when you open the directory for a particular director, it lists all the movies AND also has subdirectories where you can select the particular year a movie was made.

<xsl:for-each select="/data/movie/cast/person">
    <xsl:variable name="actor" select="name/text()"/>
    <xsl:for-each select="/data/movie/genres/item">
        <item>
            <xsl:copy-of select="/data/movie/kind/text()"/>/[Actor, All]/<xsl:copy-of select="$actor"/>/[Genre]/<xsl:copy-of select="text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
        </item>
    </xsl:for-each>
</xsl:for-each>


This one is a little more complex. It lists all the actors, and inside each actor's directory there are also directories for the different genres. This has to be done with two for-each loops, because one movie has many actors and many genres. So the result of this query is the cartesian product of the movie's actors and the movie's genres.

First we loop over the actors and put the name of each actor in a variable; otherwise we could not access the actor name in the inner loop. Then we loop over all the genres, and when we print the row, we use the variable $actor.
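In Python terms, the two nested for-each loops compute the product of the two lists. A sketch with made-up data:

```python
from itertools import product

# Hypothetical cast and genres for one movie
actors = ["Roy Scheider", "Robert Shaw"]
genres = ["Thriller", "Horror"]
kind, title = "movie", "Jaws"

# Every actor gets a subdirectory for every genre: 2 x 2 = 4 paths
paths = ["%s/[Actor, All]/%s/[Genre]/%s/%s" % (kind, actor, genre, title)
         for actor, genre in product(actors, genres)]
```

With a big cast and several genres this multiplies quickly, which is exactly why the next example filters the actor list first.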

<xsl:for-each select="/data/movie/cast/person">
    <xsl:variable name="actor" select="name/text()"/>
     <xsl:for-each select="/data/top_actors/actor[name=$actor]">
            <item>
                <xsl:copy-of select="/data/movie/kind/text()"/>/[Actor,Top 100]/<xsl:copy-of select="$actor"/>/[Year]/<xsl:copy-of select="/data/movie/year/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
            </item>
    </xsl:for-each>
</xsl:for-each>


This is an example of how static data can be used to filter out rows. First I loop over all the actors and save each actor's name in the variable actor. Then I loop over all the top_actors from my static data file and select only those where the name of the top_actor is the same as the variable actor.
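The equivalent filtering in Python is just an intersection of the movie's cast with the static top-actor list; a sketch with hypothetical names:

```python
# Cast from the movie's metadata file
cast = ["Roy Scheider", "Robert Shaw", "Unknown Extra"]

# Names from the static XML file's top_actors list
top_actors = {"Roy Scheider", "Robert Shaw"}

# Only cast members that also appear in the static list get a directory
featured = [actor for actor in cast if actor in top_actors]
```

The XSLT expresses the same thing with the predicate `/data/top_actors/actor[name=$actor]`: the inner loop produces rows only when a match exists.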

<item>
    <xsl:copy-of select="/data/movie/kind/text()"/>/<xsl:copy-of select="substring(/data/movie/title/text(),1,1)"/>/<xsl:copy-of select="/data/movie/title/text()"/>
</item>


This one does not loop over anything, because you only need for-each loops for items that occur many times per movie. It just takes the initial of the movie title using substring, so it creates directories A, B, C and so on, making it easier to search by movie title. It should also be fairly easy to remove leading A's or The's.
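Removing the leading articles could look like this in Python (a sketch; in practice the equivalent logic would go into the XSLT):

```python
def initial_directory(title):
    """First letter of the title, with a leading English article
    stripped so 'The Godfather' files under G rather than T."""
    for article in ("The ", "A ", "An "):
        if title.startswith(article):
            title = title[len(article):]
            break
    return title[:1].upper()
```

So `initial_directory("The Godfather")` gives `"G"` and plain titles keep their own first letter.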

The last part of each item is always the file name of the shortcut. It does not have to be the movie title; it only has to be unique. You could, for instance, combine the year and the title in the file name.

One way the script could be expanded is reading Windows file metadata, like that of Word and Excel files, directly from the files using the Windows API. Then it would not be necessary to create separate XML files for them.

This script uses XSLT 2.0, which has tons of capabilities. These examples only scratch the surface.

Edit:
I updated the script to work properly with Unicode characters. For instance, the shortcut creation had to be done via an alternate COM interface for it to work with Unicode paths.

Imdb2XML: Fetching Imdb movie metadata to XML files

I wrote a little script to easily fetch movie metadata from Imdb.

Actually, I didn't write the part that fetches the metadata; the guys who work on the ImdbPY package did.

The script takes the name of the movie file as its first parameter; the second, optional parameter is a list of keywords to remove from the movie file name before searching IMDb with it.
You can edit the search terms, and when you start the search it lists the movies found and lets you select the right one. It then fetches the IMDb metadata for that movie and saves it as "[wholemoviefilename].xml".
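The keyword-removal step can be sketched in Python like this (a hypothetical clean_search_title helper, not the actual script's code):

```python
import os
import re

def clean_search_title(filename, remove_words=""):
    """Turn a movie file name into a search string: drop the
    extension, replace dots and underscores with spaces, and strip
    the user-supplied keywords (e.g. release tags)."""
    title = os.path.splitext(os.path.basename(filename))[0]
    title = re.sub(r"[._]+", " ", title)
    for word in remove_words.split():
        title = re.sub(re.escape(word), "", title, flags=re.IGNORECASE)
    return " ".join(title.split())
```

For example, `clean_search_title("Jaws.1975.x264.mkv", "x264 1975")` reduces the file name to just `"Jaws"` before the IMDb search.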

Here's the script: 
You have to install Python 2.7 and the latest version of ImdbPY to use it.

(link removed)

The easiest way to use it is to create a batch file that calls:
python Imdb2XML.py %1 "strings to be removed"
and associate that with your movie files.

I tried to make a single EXE file, but there are some problems with Unicode characters that I have to solve before I can release that version.

But why do you need XML metadata files for your movies, might you ask...

Then you should read this...

Edit:
The problems with Unicode characters were not in PyInstaller but in the script. After I solved that, the EXE also works fine. In case you wonder about security: ImdbPY makes requests to the addresses www.imdb.com and akas.imdb.com.

Imdb2XML.exe
Imdb2XML.py

Sunday, April 6, 2014

A plea to use tuple versioning in a database properly

I've seen tuple versioning used in many databases, and in almost all of them the end time of a version is implemented in an error-prone way.

There are two ways to implement the end time:

Way 1. The end time of a version is the same as the start time of the next version. 

From wikipedia:
For example, if a person's job changes from Engineer to Manager, there would be two tuples in an Employee table, one with the value Engineer for job and the other with the value Manager for job. The end time for the Engineer tuple would be equal to the start time for the Manager tuple.
When you select a specific version you use a query like this:
sysdate >= start_time and sysdate < end_time

Way 2. The end time of a version is the last time the version was valid. 

If the resolution of time is one day, then the end time of a version would be, for instance, 30.9.2013 if the start time of the next version is 1.10.2013.

The query then looks almost the same as the other one: 
sysdate >= start_time and sysdate <= end_time


Do not use Way 2.  


The problem with Way 2 is that you have to know the resolution of time. If the start time of the next version is 1.10.2013, then the end time of the previous version could be 30.9.2013, or 30.9 23:59:00, or 30.9 23:59:59, or 30.9 23:59:59.999, depending on the resolution of time used. Remember that you normally hard-code this into your queries or program code everywhere you create a new version.

If you want to change the resolution of time later, you have to change your program AND your data. If you change the resolution from days to seconds, you have to update every row in your database and add that 23:59:59.

If you have connections to other databases that use different resolutions of time, you have to convert end times on the fly. If you use Way 1, you don't have to do anything.

When you are searching for the version preceding another row, you have to make the same resolution-dependent calculation. If you use Way 1, you can simply use the query earlier_version.end_time = next_version.start_time.


This is completely unnecessary. I can't find any decent argument for using Way 2 instead of Way 1. It's not a big problem, but like I said, it's an unnecessary one. If you simply use Way 1, none of the problems I mentioned will occur. The queries to find a specific version are nearly identical.
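A small Python sketch of Way 1, using Wikipedia's Engineer/Manager example: with half-open intervals the lookup never needs to know the resolution of time.

```python
import datetime

# Way 1: each version's end_time equals the next version's start_time;
# the currently open version has end_time None.
versions = [
    ("Engineer", datetime.date(2010, 1, 1), datetime.date(2013, 10, 1)),
    ("Manager",  datetime.date(2013, 10, 1), None),
]

def version_at(versions, when):
    """Return the value valid at 'when': start_time <= when < end_time."""
    for value, start, end in versions:
        if start <= when and (end is None or when < end):
            return value
    return None
```

Switching the underlying type from dates to timestamps would change nothing here, which is exactly the point: with Way 2 the comparison `when <= end` would only stay correct if every stored end value were rewritten for the new resolution.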

If you are using Oracle 11g, there's now also an effective way to index null values of the end times:

http://www.dba-oracle.com/oracle_tips_null_idx.htm

Monday, January 27, 2014

Windows 7 Task Scheduler: How to send email alerts from failed task executions


I try to automate as much as I can with the Windows 7 Task Scheduler, but I found out that Task Scheduler does not check the return code of the completed program at all. It always shows the task status as "Action completed", no matter what errors happened during the execution. And because it always shows as completed, it's not possible to automatically send an email when a task fails. Because it never fails.

There wasn't any decent answer even on Microsoft's TechNet, so I had to dig into XPath event filters to get it working.

Creating the XPath filter

First we create the appropriate XPath filter. We can test the filter in Event Viewer. Open Event Viewer and open the context menu of "Custom Views". Select "Create Custom View". A window opens; select its "XML" tab. Check "Edit query manually", accept the warning, and copy-paste this filter there:

<QueryList>
  <Query Id="0" Path="Microsoft-Windows-TaskScheduler/Operational">
    <Select Path="Microsoft-Windows-TaskScheduler/Operational">
        *[System[
                (EventID=201)
                ]] 
    and
    *[EventData[
            Data[@Name="ResultCode"]!=0
        ]]
    </Select>
  </Query>
</QueryList>


Press "OK" and the "Save Filter to Custom View" window opens. Give the custom view a name and press "OK". Now you have a view that should show all tasks that reported as completed but actually returned a code other than 0.

XPath syntax seems simple, so you can also try other search conditions. You may have to blacklist or whitelist the programs you want alerts from, because there are tasks that return codes other than 0 even when the execution was fine. For instance, the task action "Display a Message" returns 1 as a result code, so if you create a task that triggers on failed executions and shows a message as its action, you will end up with an infinite loop...

<QueryList>
  <Query Id="0" Path="Microsoft-Windows-TaskScheduler/Operational">
    <Select Path="Microsoft-Windows-TaskScheduler/Operational">
        *[System[
                (EventID=201)
                ]]
    and
    *[EventData[
            Data[@Name="ResultCode"]!=0 and 

            ( Data[@Name="TaskName"]!="\DONT_SHOW_THIS_TASK" and
            Data[@Name="TaskName"]!="\DONT_SHOW_THIS_TASK_TOO")
         ]]
    </Select>
  </Query>
</QueryList>


Using XPath filter as an event trigger

When you have a working filter, you can create a scheduled task for it. Create a new task and select "On an event" as the trigger. Select "Custom" instead of "Basic" and press "Edit Event Filter". Select the "XML" tab and copy-paste your filter there.

Sending mail as an event action

I don't have any easy answers for you about setting up the mail. The "Send an e-mail" action in Task Scheduler does not work with mail systems that require authentication.
I have a Cygwin environment with SSMTP configured for this type of mail, so I simply created a batch file that calls SSMTP to send me an email.
One problem is that the mail doesn't contain any information about what actually went wrong. You can only trigger static emails like "Something is wrong". With cron it's easier, because it sends you stderr as the message body.

Using Powershell scripts as Tasks

One thing to note: if you are running PowerShell scripts, you should always use -command instead of -file.
If you have the task:
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -noninteractive -nologo -file BadJob.ps1
it does not return a result code! This is an obvious bug in PowerShell.

You have to use: 
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -noninteractive -nologo -command .\BadJob.ps1


Conclusion

Creating email alerts is a lot harder than it should be. XPath syntax is not for average users, and without it there's no way to check for task failures. I had already installed Cygwin's cron to bypass the Windows Task Scheduler, but then decided to check out this XPath feature. Cron doesn't have a fancy GUI, but getting it to send mails with enough information about the failure is easier in the end, if you already have a working Cygwin environment and know how to use Unix.

Monday, January 20, 2014

Bezl: THE minimalistic iPhone case


Bezl is the most minimalistic iPhone case you could find.
Too bad it's only for iPhones :(





First, a little disclaimer: Bezl's designer Jos Cocquyt is a friend of mine. I remember when he told me his idea of this minimalistic phone case in one of the beach restaurants in Jericoacoara. I was immediately sold on the simplicity of the idea.

When you drop your phone onto a flat surface like a concrete floor, only four spots take the worst impact: the four corners. If you have bumpers like Bezl's, it's not even possible for your phone to touch the concrete floor. There's a video on Bezl's web site that shows a phone being dropped from the roof of a house without a single scratch. When the corners are protected, the only things that can break your phone are twisting if it drops from too high, or a floor that isn't flat.

One of the reasons Jos designed this was that with normal phone cases, when you spend a lot of time on surfing beaches, there's always sand and dust flying around. Some of that sand and dust will always end up inside the case, between the case and the phone, and it will stay there if you don't clean it out once in a while. With Bezl, if the phone's glass is scratch-resistant like the iPhone's, you don't need a full case at all.

What I also like about Bezl is the minimalistic feel. The first thing that came to my mind was high-end amplifiers and turntables. 



Sunday, January 5, 2014

Estimated cost of cracking your WPA2/AES password using cloud based cracking tools

Cloud-based cracking tools were introduced a couple of years ago for everyone to use (e.g. CloudCracker), and with easily rentable cloud servers (e.g. Amazon EC2) you can always write your own password cracker. With quick access to scalable server farms, password cracking is no longer a question of time spent cracking, but of money spent cracking. How much money does someone want to spend to crack your WPA2/AES password?

You should first read this article:
Tom's Hardware: Wi-Fi Security: Cracking WPA With CPUs, GPUs, And The Cloud

According to the article, in 2011 it cost $160 to crack a password of 6 ASCII characters. We still live in a world of Moore's law, so we can make crude estimates of how the price will fall in the next decades.




From this chart you can see that cracking a 9-character ASCII password with today's hardware costs about $30,000,000. If you are not designing stealth airplanes, I would not worry about it ;)

If someone is willing to pay $10,000, your 8-character password can be cracked in 2022.

This is meant to be a very simplified presentation of how cloud computing changes the "rules" of password cracking. It's no longer about how many hours cracking takes, when anyone can rent cracking power online. Even intelligence agencies with their own server farms have to think about the cost of cracking.

So relax. A password of 12 characters still costs $1000M to crack in 2036, no matter how many millions of servers they have in their farms. And remember, extra characters in your WPA2 password don't cost you anything.
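The chart's figures can be reproduced with a few lines of Python: start from the 2011 figure of $160 for 6 printable-ASCII characters (95 symbols), multiply by 95 per extra character, and halve the cost every 1.5 years. The 1.5-year halving period is my assumption to match the chart, not a figure from the article.

```python
def crack_cost(length, year, base_cost=160.0, base_length=6,
               base_year=2011, alphabet=95, halving_years=1.5):
    """Rough cost in dollars to brute-force a random ASCII password,
    extrapolated from $160 for 6 characters in 2011, assuming
    hardware cost halves every 1.5 years (Moore's law)."""
    keyspace_factor = alphabet ** (length - base_length)
    moore_factor = 2 ** ((year - base_year) / halving_years)
    return base_cost * keyspace_factor / moore_factor
```

Under these assumptions a 9-character password costs roughly $30M in 2014, an 8-character one drops to about $10,000 by 2022, and a 12-character one is still around $1000M in 2036, matching the numbers above.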

All this of course changes when the first password-cracking quantum computers are up and running, ten years from now...

Sunday, December 15, 2013

Sennheiser RS-180 headphones disassembled


The RS-180s are great, but they have one feature that I don't like: they tend to get too warm.
The temperature inside the ear cups is about 5 °C higher than outside. For me, that's too much.

So I opened the headphones to see if there were any improvements to be made. At first I thought it was the electronics warming the speakers, but there are electronics only in the right ear cup.

 Without the muffs.

Four simple screws to open and you get the speakers out. This is the right ear cup with the Kleer electronics.


The left ear cup is almost empty; only the speaker and the battery are inside.