Recently, a friend of mine was having trouble plotting some time series that were stochastic in nature and varied widely in their number of points. The dataset was huge (on the order of gigabytes), so an automated solution was needed.
Perl would probably have been a better choice, but I took the chance and, this morning, completed my first Python project in a couple of hours. It took a while to learn the syntax for dictionaries, lambda functions and some useful I/O classes, but the rest was really a piece of cake. My final implementation works as follows:
- a brief preprocessing pass, done with regular expressions and a dictionary, to determine the length of each time series;
- some filtering on the dictionary, in order to remove spurious results;
- reading the file again while skipping, in every time series, as many rows as are needed to make the time series uniform.
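The three steps above can be sketched roughly as follows. This is not the original script, only a minimal illustration under stated assumptions: the row format ("<series_id> <value>" on each line), the names count_rows, drop_spurious and make_uniform, and the min_rows threshold are all hypothetical, and "uniform" is taken to mean that every series is thinned to the length of the shortest one that survives the filter.

```python
import re
from collections import defaultdict

# Assumed row format (not from the original data): "<series_id> <value>".
ROW = re.compile(r"^(\S+)\s+(\S+)\s*$")


def count_rows(path):
    """First pass: count how many rows each series has."""
    counts = defaultdict(int)
    with open(path) as src:
        for line in src:
            match = ROW.match(line)
            if match:
                counts[match.group(1)] += 1
    return dict(counts)


def drop_spurious(counts, min_rows):
    """Filter the dictionary: discard series with fewer than min_rows rows."""
    return {key: n for key, n in counts.items() if n >= min_rows}


def make_uniform(path, out_path, min_rows=2):
    """Second pass: skip rows so every kept series ends up with the same length."""
    counts = drop_spurious(count_rows(path), min_rows)
    # Hypothetical choice: the shortest surviving series sets the common length.
    target = min(counts.values(), default=0)
    seen = defaultdict(int)  # rows read so far, per series
    kept = defaultdict(int)  # rows written so far, per series
    with open(path) as src, open(out_path, "w") as dst:
        for line in src:
            match = ROW.match(line)
            if not match or match.group(1) not in counts:
                continue  # malformed row, or a series that was filtered out
            key = match.group(1)
            # Map row index i of a series with n rows to output slot i * target // n
            # and write only the first row that lands in each slot.
            slot = seen[key] * target // counts[key]
            seen[key] += 1
            if slot == kept[key]:
                dst.write(line)
                kept[key] += 1
    return target
```

Both passes stream the file line by line, so only the per-series counters are held in memory, which matters for an input of several gigabytes. A call such as make_uniform("series.txt", "uniform.txt", min_rows=100) (file names made up here) would write the thinned series and return the common length.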
The final code is about 50 lines long (comments included); an equivalent C++ implementation would probably have been much longer and more complex. File I/O in C++, for sure, is much more verbose than Python's with open(...) as descriptor syntax.
I'm still very far from a serious knowledge of Python's potential, but I'm impressed: the learning curve is astonishingly gentle, and the clarity of the source code is just as striking. I'm pretty sure that my future projects that don't require low-level functionality (e.g., CUDA) or extreme performance will be Python-based.