…just another hippy blog

Software development, cold beer, photography and other high-caffeine, bikini powered, topics.

Amores de Kafka


No, Movida is not dead.

It’s just me being lazy and busy. Mainly lazy.

There is a new lame animated screenshot for the IMDb import wizard.

After a first attempt to use libxml’s HTML parser I switched back to plain pattern matching. The main problem is that text in HTML is often interleaved with child elements, so in the parsed tree it ends up split across several nodes and it’s quite hard (read p.i.t.a.) to reconstruct it.

Take this example from the IMDb search results:

<li><a href="URL">Amores de Kafka, Los</a> (1988)<br>aka <em>"The Loves of Kafka"</em> - USA</li>

Retrieving that “aka The Loves of Kafka - USA” ain’t that easy if you have a regular XML tree.
And you would end up making the whole parsing process slower.
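To give an idea of what plain pattern matching looks like on that snippet, here is a minimal sketch in Python. It is not Movida's actual code: the regular expression, the `parse_item` helper and the dictionary keys are all made up for illustration, and the pattern only assumes the markup looks like the example above (the real IMDb pages may well differ).

```python
import re

# Hypothetical pattern for one search-result item, modelled only on the
# example line above; it is an assumption, not IMDb's documented markup.
ITEM_RE = re.compile(
    r'<li><a href="(?P<url>[^"]*)">(?P<title>[^<]*)</a>\s*'
    r'\((?P<year>\d{4})\)'
    r'(?:<br>(?P<extra>.*?))?</li>',
    re.IGNORECASE | re.DOTALL,
)

# Matches any inline tag such as <em> or </em>.
TAG_RE = re.compile(r'<[^>]+>')


def parse_item(html):
    """Return a dict with url, title, year and notes, or None if no match."""
    match = ITEM_RE.search(html)
    if match is None:
        return None
    extra = match.group('extra') or ''
    # Strip inline tags and collapse whitespace to get the plain "aka" text.
    notes = ' '.join(TAG_RE.sub('', extra).split())
    return {
        'url': match.group('url'),
        'title': match.group('title').strip(),
        'year': match.group('year'),
        'notes': notes,
    }


sample = ('<li><a href="URL">Amores de Kafka, Los</a> (1988)<br>'
          'aka <em>"The Loves of Kafka"</em> - USA</li>')
item = parse_item(sample)
print(item['title'], item['year'], item['notes'])
```

The whole "aka" part comes out as one string because the pattern grabs everything between `<br>` and `</li>` and only then throws the tags away, with no tree walking involved. The price is that it breaks as soon as the markup changes.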

Now I’ll need to write some code to actually parse a movie page and find someone to provide me with better artwork for the wizard :P
