I decided this evening that I ought to post a progress update! It has been a couple of months since the last post and lots of progress has been made. Briefly, here is what I have figured out and solved:
- Raspberry Pi Cluster power supply issue. This involved purchasing a 600W PC Power Supply and some USB Power Distribution Boards (PDBs). The power supply has been hacked and connected to the USB PDBs, which in turn power 4 Raspberry Pis per PDB. The power supply has enough juice to run 12 Raspberry Pis simultaneously plus the 2 Gb Network switches required to connect the RPis to my local network router.
- Set up a MariaDB Database on my local Network Attached Storage (NAS). This database will be used as needed for Raspberry Pi Cluster software projects.
- Signed up for, and started on 4th September, the Fundamentals Of Data Science course by the Southampton Data Science Academy.
- Signed up for, and started on 5th September, the Data Science With Python online course with Datacamp.
- Downloaded Anaconda and got started with Jupyter Notebooks. These will be two important packages/tools to be fluent with as I get into my Data Science career.
- Decided upon two initial, and interlinked, Data Science software projects to pursue. Firstly to build a distributed Python analysis engine based upon my previous automated analysis tool that I called “Describe”. Secondly to build a distributed Genetic Algorithm processing cluster for function optimisation. Both projects will be written mainly in Python but also using the Microsoft stack where appropriate.
- Using skills gained from the Datacamp course, I’ve managed to connect to my local MariaDB database and my remote MS-SQL Databases that I rent through Fasthosts. This enables me to make use of my SQL and Data Analysis skills, particularly with Microsoft’s T-SQL. I will be able to build complex queries using SQL Server Management Studio and then make use of them in Python.
- Decided upon an initial Data Science showcase analysis that uses the knowledge of the UK Residential Property Market that I gained when working in Risk Management. This analysis makes use of Python skills I’ve learnt doing the Datacamp course mentioned above. So far I have:
a) A URL data extraction program using Jupyter Notebooks / Python that loops through all UK Postcode Districts and downloads the Property sales transaction data from January 1st 1995 to present. It gets one file per District and renames each file after its District Postcode.
b) A Python program that pulls the data from a single UK Postcode District into a pandas dataframe, enriches it with some extra columns, then creates a new dataframe that groups the Property sales transactions by Year/Month and averages those sales per Year/Month so that the data can be plotted.
c) Downloaded UK Retail Price Index (RPI) and Average Earnings Index (AEI) data.
d) Purchased a UK Postcode Districts and Sectors SVG vector map which will be used to present the Property sales and RPI/AEI data that I’ve downloaded in interesting ways! I’ve figured out how to edit the SVG file, and how I will be able to plot data geographically on the map.
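To give a flavour of step b) above, here is a minimal sketch of the Year/Month grouping and averaging in pandas. It is not the actual program: the column names `date` and `price`, the function name `monthly_average` and the sample rows are all made up for illustration, and the real District files may be laid out quite differently.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for one Postcode District file.
# The column names "date" and "price" are assumptions, not the real schema.
SAMPLE_CSV = """date,price
1995-01-15,50000
1995-01-20,70000
1995-02-03,65000
1995-02-25,55000
1995-02-27,90000
"""


def monthly_average(sales: pd.DataFrame) -> pd.DataFrame:
    """Return the mean sale price per Year/Month, ready for plotting."""
    enriched = sales.copy()
    # Enrich with extra columns derived from the transaction date.
    enriched["date"] = pd.to_datetime(enriched["date"])
    enriched["year"] = enriched["date"].dt.year
    enriched["month"] = enriched["date"].dt.month
    # Group by Year/Month and average the prices within each group.
    return (
        enriched.groupby(["year", "month"], as_index=False)["price"]
        .mean()
        .rename(columns={"price": "avg_price"})
    )


sales = pd.read_csv(io.StringIO(SAMPLE_CSV))
monthly = monthly_average(sales)
print(monthly)
```

The result has one row per Year/Month, so it can be handed straight to a plotting call once the real data is loaded in place of the sample rows.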
That’s about it, I think, although plenty of research, reading and thought has been going on behind all of the above.
The next 5 weeks will be all about completing the two Data Science courses that I’m taking plus completing the initial showcase.
Here’s to progress!