Another update since it has been almost 3.5 weeks since the last one.
Work has been rather busy since we’re in UAT for a large software development within the London Insurance Market. I’m right in the middle of the action, and it’s happening just when I’m also doing the Southampton Data Science Academy Fundamentals Of Data Science course. So I’m working in detailed transaction/message data by day and studying Data Science by night. Here’s a summary of progress:
- Raspberry Pi Cluster – nothing done since last time.
- The Fundamentals Of Data Science course run by the Southampton Data Science Academy. I’m in week 4 of the 6 week course and I’ve finished two of the three assignments. I expect to pass both with at least 80% each. I know this because the assignments are automatically assessed (they are built in Jupyter Notebooks), except that Assignment 1 has a prose question which I may or may not get marks for, and in Assignment 2 I haven’t figured out the answer to the final question. I may still have a go at it, as you can re-submit an assignment as many times as you want and only the final submission is used for your result.
- The Data Science With Python online course with Datacamp. I’ve completed 6 of the 20 modules and I’m halfway through module 7. There are 13 more to go after that! Modules completed are:
- Intro To Python For Data Science
- Intermediate Python For Data Science
- Python Data Science Toolbox (Part 1)
- Python Data Science Toolbox (Part 2)
- Importing Data in Python (Part 1)
- Importing Data in Python (Part 2)
- Distributed Python “Describe” exploratory analysis engine – not started.
- Distributed Python Genetic Algorithm – not started.
- UK Property Market Analysis showcase DB. The database required for analysing the UK Property Market is being built as I type! I wrote a Python program in Jupyter Notebooks that takes stock of all the CSV files downloaded from Land Registry, loops through each of them, and inserts rows into the UKPropertySales table on my local MariaDB (which lives on my NAS). This means I will have a DB of every Property Sale in the UK from January 1995 until August 2017. The next step will be to write a Python program that analyses the Property Sales and produces a set of descriptive analyses for every Postcode Area/Sector in the UK.
- UK Property Market Analysis showcase. I’ve also been pulling apart the vector file to figure out how I can link any analysis by postcode to the vector shapes for each Postcode Area/District. I started by purchasing an app called BoxySVG, which allows access to the metadata for each shape, and began manually adding the Postcode Area/District label to the Title field in the metadata. This would allow me to link a shape directly to any Postcode Area/District. I then looked at how I could take the vector co-ordinates for each shape and for each Postcode Area/District label from the SVG file, work out automatically which label belongs to which shape, and add the label value into the metadata to save me a huge job. I sometimes start things manually because I believe that by getting your hands dirty in small details for a while you often discover the patterns you need to develop a better solution – because you know what you’re dealing with in a much more intimate way. After working on this for a while I switched to looking at the Property Sales data for a single Postcode Area/District. I chose one that I had lived in previously just for the extra interest value. I normalised the average Property Price by adjusting it according to a consumer price index. It was when I built a chart in Excel of the Property Sales for that area that I discovered that, on a monthly basis, a single Postcode Area/District has very few individual sales. This would cause a big problem when it came to plotting the data on the UK postcode map, so at this point I decided to move up from plotting at District level to plotting at Sector level, e.g. rather than plot at the level of “AL1 A” I will plot at “AL1”. Doing this gives a much better picture of average Property Sale prices. There are also far fewer Postcode Area/Sectors to deal with, which will make processing of the visual plots much quicker.
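For anyone curious what the CSV loading loop for the showcase DB might look like, here is a minimal sketch. It is not my actual notebook code: it uses Python's built-in sqlite3 as a stand-in for MariaDB so it runs anywhere, and it assumes the files have no header row and that the first four columns are a transaction id, price, sale date and postcode. The column names and the helper functions `create_sales_table` and `load_sales_csvs` are made up for illustration; only the UKPropertySales table name is real.

```python
import csv
import sqlite3
from pathlib import Path


def create_sales_table(conn):
    """Create the sales table if it does not exist (column names are illustrative)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS UKPropertySales (
            transaction_id TEXT PRIMARY KEY,
            price INTEGER,
            sale_date TEXT,
            postcode TEXT
        )
        """
    )


def load_sales_csvs(folder, conn):
    """Read every *.csv in `folder` and insert its rows.

    Returns the number of rows read. Rows whose transaction_id is already
    in the table are silently skipped by INSERT OR IGNORE.
    """
    total = 0
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            # Assumed layout: id, price, date, postcode, then other columns.
            rows = [
                (row[0], int(row[1]), row[2], row[3])
                for row in csv.reader(handle)
                if len(row) >= 4
            ]
        conn.executemany(
            "INSERT OR IGNORE INTO UKPropertySales VALUES (?, ?, ?, ?)", rows
        )
        conn.commit()  # commit once per file so a failure loses at most one file
        total += len(rows)
    return total
```

To try it, open a connection with `sqlite3.connect(":memory:")`, call `create_sales_table(conn)` and then `load_sales_csvs("some_folder", conn)`. Swapping in a MariaDB connection would mean changing the connection call and the placeholder style, but the loop itself would stay much the same.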
Something else I looked into was how I can turn the Postcode area shapes into 3D shapes rather than flat 2D shapes. Anyway, here is the chart that I produced, which is the foundation of the analysis that will be undertaken per Postcode Area/Sector. The blue line is the average Property Sale price and the red line is a 12-month rolling average. The index used to normalise the average sale price is labelled CDKO (“Long term indicator of prices of consumer goods and services (Jan 1974=100)”) and was downloaded from the data.gov.uk website. Normalising this way gives an indication of the relative price of Property across the years.
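The chart was built in Excel, but the same calculation can be sketched in a few lines of plain Python. This is an illustration under assumptions rather than the exact method behind the chart: it assumes sales arrive as (ISO date, price) pairs, that the index is a dictionary keyed by "YYYY-MM", and that normalising means rescaling each month's average into the prices of a chosen base month. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_average(sales):
    """Average sale price per month from (date 'YYYY-MM-DD', price) pairs."""
    by_month = defaultdict(list)
    for sale_date, price in sales:
        by_month[sale_date[:7]].append(price)  # 'YYYY-MM' is the month key
    return {month: mean(prices) for month, prices in sorted(by_month.items())}


def index_adjust(averages, index, base_month):
    """Rescale each monthly average into base_month prices using a price index."""
    base = index[base_month]
    return {month: avg * base / index[month] for month, avg in averages.items()}


def rolling_mean(series, window=12):
    """Trailing mean over the last `window` entries; None until enough entries exist.

    Note: this counts entries, not calendar months, so a month with no
    sales is skipped rather than treated as a gap.
    """
    months = list(series)
    values = [series[m] for m in months]
    result = {}
    for i, month in enumerate(months):
        if i + 1 >= window:
            result[month] = mean(values[i - window + 1 : i + 1])
        else:
            result[month] = None
    return result
```

Chaining the three gives the two lines on the chart: `index_adjust(monthly_average(sales), index, base_month)` for the normalised average and `rolling_mean(...)` of that result for the smoothed line.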
So…lots more progress but not enough time to write something more substantial than an update!