September 13th Update


I decided this evening that I ought to post a progress update! It has been a couple of months since the last post and a lot has happened. Briefly, here is what I have figured out and solved:

  1. Raspberry Pi Cluster power supply issue. This involved purchasing a 600W PC power supply and some USB Power Distribution Boards (PDBs). The power supply has been hacked and connected to the USB PDBs, which in turn power 4 Raspberry Pis per PDB. The power supply has enough juice to run 12 Raspberry Pis simultaneously, plus the 2 Gb network switches required to connect the RPis to my local network router.
  2. Set up a MariaDB database on my local Network Attached Storage (NAS). This database will be used as needed for Raspberry Pi Cluster software projects.
  3. Signed up for, and started on 4th September, the Fundamentals Of Data Science course by the Southampton Data Science Academy.
  4. Signed up for, and started on 5th September, the Data Science With Python online course with Datacamp.
  5. Downloaded Anaconda and got started with Jupyter Notebooks. These will be two important packages/tools to be fluent with as I get into my Data Science career.
  6. Decided upon two initial, and interlinked, Data Science software projects to pursue. The first is to build a distributed Python analysis engine based upon my previous automated analysis tool, which I called “Describe”. The second is to build a distributed Genetic Algorithm processing cluster for function optimisation. Both projects will be written mainly in Python, with the Microsoft stack used where appropriate.
  7. Using skills gained from the Datacamp course, I’ve managed to connect to my local MariaDB database and to my remote MS-SQL databases that I rent through Fasthosts. This enables me to make use of my SQL and Data Analysis skills, particularly with Microsoft’s T-SQL. I will be able to build complex queries using SQL Server Management Studio and then make use of them in Python.
  8. Decided upon an initial Data Science showcase analysis that uses the knowledge of the UK Residential Property Market that I gained when working in Risk Management. This analysis makes use of the Python skills I’ve learnt doing the Datacamp course mentioned above. So far I have:

a) Built a URL data extraction program using Jupyter Notebooks / Python that loops through all UK Postcode Districts and downloads the Property sales transaction data from January 1st 1995 to the present. It gets one file per District and renames each file to the appropriate District Postcode.

b) Written a Python program that pulls the data for a single UK Postcode District into a pandas dataframe, enriches it with some extra columns, then creates a new dataframe that groups the Property sales transactions by Year/Month and averages the sales within each Year/Month so that the data can be plotted.

c) Downloaded UK Retail Price Index (RPI) and Average Earnings Index (AEI) data.

d) Purchased a UK Postcode Districts and Sectors SVG vector map, which will be used to present the Property sales and RPI/AEI data that I’ve downloaded in interesting ways! I’ve figured out how to edit the SVG file and how I will be able to plot data geographically on the map.
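
To give a flavour of step b), here is a minimal sketch of the Year/Month grouping and averaging in pandas. It is an illustration rather than the actual program: the column names `date` and `price`, the function name `monthly_average_prices` and the four sample rows are all made up for the example, and the real downloaded files will have their own layout.

```python
import pandas as pd


def monthly_average_prices(sales: pd.DataFrame) -> pd.DataFrame:
    """Group sales by Year/Month and average the sale price in each group.

    Assumes (hypothetically) a 'date' column and a 'price' column.
    """
    enriched = sales.copy()
    # Enrich with extra columns derived from the sale date.
    enriched["date"] = pd.to_datetime(enriched["date"])
    enriched["year"] = enriched["date"].dt.year
    enriched["month"] = enriched["date"].dt.month

    # One row per Year/Month with the mean price and the number of sales.
    return enriched.groupby(["year", "month"], as_index=False).agg(
        average_price=("price", "mean"),
        sales_count=("price", "count"),
    )


# Made-up sample rows standing in for one Postcode District's data.
sample = pd.DataFrame(
    {
        "date": ["1995-01-03", "1995-01-20", "1995-02-11", "1996-01-05"],
        "price": [50000, 70000, 65000, 80000],
    }
)

monthly = monthly_average_prices(sample)
print(monthly)
```

The resulting dataframe has one row per Year/Month, which is a convenient shape for plotting the average price over time.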

That’s about it, I think, although plenty of research, reading and thought has been going on behind all of the above.

The next 5 weeks will be all about completing the two Data Science courses that I’m taking, plus finishing the initial showcase.

Here’s to progress!

