Beware of the Python on the Window (XP)

2006-09-02

Two years ago, I arrived in Barcelona to continue my Ph.D. studies. Two years, such a long time! Who could have imagined all the things that were going to happen? In those days I was looking for a programming language to “fast-prototype” my ideas. The final match was between Python and Ruby, and in the end I started to learn “the snake”. But due to a lack of research funds, I had to stop and start working at a private company, and I didn’t really get to use Python much.

Now I’m rediscovering this programming language, because I feel I’m using PHP too much (among other reasons). So I started using Python again, and as a first exercise I tried to solve a problem I had:
– I had 42000 files containing Go games (*.sgf).
– I wanted to rename those files from ugly numbers to something like year-player1Name-player2Name.sgf.
– Some files had the header information encoded in UTF-8, with players’ names written in Japanese characters.

I love Unicode’s UTF-8, with these lovely Japanese ideographs (and, of course, my favorite one is code “7881”). I’m using WinXP (yes, I’m a masochist), and I’ve read that it supports Unicode natively, so I thought it would be nice to have file names with Japanese characters. In the end the result was beautiful… but two problems appeared:
1- Most of the Go-game programs on Windows are not prepared to read UTF-8 files, and crash. So I had to make a version without ideographs 🙁
2- Windows throws an error after renaming 10000 files… maybe renaming 42000 files is too hard a job for it 🙁
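
The error message from the second problem is not recorded here, so its cause is unknown. One thing that can go wrong in a bulk rename like this is two games ending up with the same year and players: on Windows, os.rename raises an error when the destination already exists. Assuming that kind of collision, a minimal sketch of picking a free name could look like this (unique_name and taken are made-up names for illustration, not part of the original script):

```python
import os

def unique_name(name, taken):
    """Return name, or name with a numeric suffix if it is already in taken."""
    base, ext = os.path.splitext(name)
    candidate, counter = name, 1
    while candidate in taken:
        counter += 1
        candidate = '%s-%d%s' % (base, counter, ext)
    taken.add(candidate)    # remember the name so later calls avoid it
    return candidate

taken = set()
names = [unique_name('2004-A-B.sgf', taken) for _ in range(3)]
print(names)
```

The set only tracks names handed out during the current run, and it compares them case-sensitively, so it is a starting point rather than a full answer for a Windows file system.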

The code itself:

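import glob, os, re, sys

pats = [re.compile(r'DT\[(\d{2,4})'),   # year
        re.compile(r'PW\[([^\]]*)'),    # White player
        re.compile(r'PB\[([^\]]*)')]    # Black player

files = glob.glob('*.sgf')
for file in files:
    info = []    # where I'm going to save the header info
    try:
        with open(file, encoding='utf-8', errors='replace') as f:
            header = f.read()    # undecodable bytes become U+FFFD
        for pat in pats:
            match = pat.search(header)
            value = match.group(1) if match else 'unknown'
            try:
                value.encode('ascii')    # fails on any non-ASCII character
                info.append(value)
            except UnicodeEncodeError:
                info.append('kanjis')    # replace ideographs
        newName = '-'.join(info) + '.sgf'
        os.rename(file, newName)
    except OSError:
        print("Could not process", file, ":", sys.exc_info()[1])
    except Exception:
        print("Unexpected error:", sys.exc_info()[1])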
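
To see what the three patterns pull out of a file, here is a small sketch that applies them to a single header string. The header is made up for illustration, and build_name and its missing argument are hypothetical names, not part of the script:

```python
import re

# The three patterns from the script: year, White player, Black player.
PATTERNS = [re.compile(r'DT\[(\d{2,4})'),
            re.compile(r'PW\[([^\]]*)'),
            re.compile(r'PB\[([^\]]*)')]

def build_name(header, missing='unknown'):
    """Return 'year-white-black.sgf' built from an SGF header string."""
    fields = []
    for pattern in PATTERNS:
        match = pattern.search(header)
        # Fall back to a placeholder when the property is absent.
        fields.append(match.group(1) if match else missing)
    return '-'.join(fields) + '.sgf'

# Made-up header, for illustration only.
sample = '(;GM[1]FF[4]DT[2004-05-01]PW[Player One]PB[Player Two]KM[6.5]'
print(build_name(sample))    # prints 2004-Player One-Player Two.sgf
```

Note that the year pattern keeps only the leading digits of DT, so a full date such as 2004-05-01 is reduced to 2004.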


I know I can group exceptions, but I prefer to write them this way because it is easier to understand. By the way, exceptions are one of the strange things about Python… IMO they are too “hardware” for this kind of language.
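
For comparison, grouping means listing several exception types in one except clause. A minimal sketch in Python 3 syntax, where try_rename is a hypothetical helper and the choice of OSError and UnicodeError is only an assumption about which errors matter here:

```python
import os

def try_rename(src, dst):
    """Rename src to dst; return an error message, or None on success."""
    try:
        os.rename(src, dst)
    except (OSError, UnicodeError) as err:
        # One handler for both error types instead of two separate clauses.
        return '%s: %s' % (type(err).__name__, err)
    return None

# The source file does not exist, so this prints an error message.
print(try_rename('no-such-file.sgf', 'renamed.sgf'))
```

The grouped form is shorter, but separate clauses make it possible to print a different message for each kind of failure.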