I signed up for the Johns Hopkins Data Science course a year or so ago, did the first module (The Data Scientist’s Toolbox) and got a distinction. Since then I’ve procrastinated a little, but I’m now halfway through the R Programming module.
Programming… it’s second nature to me really. I was a software developer after all, so learning a new language is in many respects an exercise in just doing what I want to do and picking the language up as I go. Every program is, after all, built from Sequence, Selection and Iteration, and the constructs for these are very similar from one programming language to the next.
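To make that concrete, here is a minimal sketch of the three building blocks in Python. The function name `classify_scores` and the pass mark of 50 are made up purely for illustration; the same shape would look almost identical in R or C#.

```python
def classify_scores(scores):
    """Label each score as 'pass' or 'fail' and count the passes."""
    # Sequence: statements run one after another.
    pass_mark = 50  # hypothetical threshold, chosen only for illustration
    labels = []
    passes = 0

    # Iteration: repeat the same steps for every score.
    for score in scores:
        # Selection: choose a branch based on a condition.
        if score >= pass_mark:
            labels.append("pass")
            passes += 1
        else:
            labels.append("fail")

    return labels, passes


labels, passes = classify_scores([72, 45, 50, 38])
print(labels, passes)
```

Swap the syntax and the three constructs are still all there, which is really the point: the language is the easy part.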
Getting experience in R, or Python for that matter, is therefore just a case of picking something to do and getting on with it. What I’ve really learnt in my first few weeks of R Programming is how much power lies in the way data structures are built, in what you can do with them, and in the extension libraries. Everything else seems to be about learning the actual Data Science techniques and what to apply them to. There’s also the question of Python and what that entails. I took a deeper dive into Python in my lunch break today and discovered that, as far as the actual programming language is concerned, there’s not much to it. I read through the Python Tutorial and, as far as I can see, I’m in the same boat as with R: the real power is in the extension libraries and in knowing which techniques to apply to real problems. This explains why Data Science jobs always mention things like pandas, NumPy and SciPy.
So I really don’t see myself having TOO much trouble getting to grips with the R or Python languages; I just need to get the relevant IDE and libraries and get on with it. I do have one slight grievance: I’ve been working with C# on my own projects for a few years, developing systems in Visual Studio is something I’m familiar with, and I’m a little reluctant to drop it. Maybe I don’t need to? Maybe I can work with Python and R under a C# umbrella? Perhaps I should look at building some libraries myself. The obvious choice would be to start with a Genetic Algorithm library, since I worked heavily on these at college and I think I still have the code somewhere. I’ve read, and still own, the Goldberg book, and it just so happens I came across it a couple of days back. I could write the library in C# and then look at building examples of its use in Python to see where that gets me.
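As a rough idea of the shape such a library might take, here is a minimal Genetic Algorithm sketch in plain Python. It is not my college code and not a transcription of anything from the Goldberg book: it assumes bit-string chromosomes, tournament selection, single-point crossover and bit-flip mutation, and it solves the toy "OneMax" problem (maximise the number of 1 bits). All function names and parameter defaults are assumptions picked for illustration.

```python
import random


def one_max(bits):
    """Toy fitness function: the number of 1 bits in the chromosome."""
    return sum(bits)


def tournament(population, fitness, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Run a simple generational GA and return the best individual seen."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        next_gen = []
        for _ in range(pop_size):
            child = crossover(tournament(population, fitness, rng),
                              tournament(population, fitness, rng), rng)
            next_gen.append(mutate(child, rng, mutation_rate))
        population = next_gen
        # Keep track of the best individual across all generations.
        best = max(population + [best], key=fitness)

    return best


best = evolve(one_max)
print(best, one_max(best))
```

Because the fitness function is passed in as an argument, the same `evolve` loop could be pointed at a different problem without touching the selection, crossover or mutation code, which is roughly the separation I would want from a C# library too.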