"Day Zero" refers to a situation of extreme water shortage, one that Cape Town came very close to facing last year. The city has avoided a complete water shortage so far, but its experience is a serious warning for the rest of the world. Many cities around the world experience water shortages throughout the year, causing massive disruption to the day-to-day lives of residents, to businesses, and to the economy. The ability to monitor cities approaching a ‘Day Zero’ event could be incredibly useful, not only for deploying aid effectively but also for taking preventative action early, so that local authorities are not forced to impose extreme water-usage restrictions at short notice.
To help raise awareness of this situation, we used machine learning to develop a model that predicts the water stress level of a country from a given set of attributes. We divided the raw stress values (nominally a scale from 0 to 100, although values above 100 occur) into 6 categories:
- Stress Value of 0-20: None
- Stress Value of 21-40: Low
- Stress Value of 41-60: Medium
- Stress Value of 61-80: Alert
- Stress Value of 81-100: High
- Stress Value of >100: Critical
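The categories above amount to a simple binning rule. A minimal sketch in Python follows; the function name `stress_category` is hypothetical, and the handling of fractional values between the listed integer boundaries (for example 20.5) is an assumption, since the list only gives whole-number ranges.

```python
def stress_category(value: float) -> str:
    """Map a raw water stress value to one of the six categories."""
    if value < 0:
        raise ValueError("stress value must be non-negative")
    # Upper bounds are inclusive. A fractional value such as 20.5 falls
    # into the next band up; this is an assumption, because the category
    # list only specifies integer boundaries.
    bands = [
        (20, "None"),
        (40, "Low"),
        (60, "Medium"),
        (80, "Alert"),
        (100, "High"),
    ]
    for upper, label in bands:
        if value <= upper:
            return label
    return "Critical"  # anything above 100


print(stress_category(35))   # Low
print(stress_category(120))  # Critical
```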
Initial Approach and Results
We started the project by selecting a set of relevant attributes to build up a large dataset. Our dataset consisted of 11 attributes, such as annual precipitation, total renewable water resources per capita, and desalination capacity. It contained 1968 examples covering 180 countries, for different years ranging from 1960 to 2014.
Our next step was to train a machine learning model on this dataset to predict the stress level of a country given all the other attributes (i.e. the target attribute was stress level). We ran a series of tests in Weka to find the best algorithm for building a model from our data. Since a fair number of attribute values were missing, we predicted that a decision tree or an instance-based learner would perform well on the data. It turned out that nearest neighbour (IBk) produced the best results (88.26% classification accuracy) when we generated a Weka model with 10-fold cross-validation. KStar was second (89.28%) and RandomForest was third (88.26%).
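To make the evaluation step concrete, here is a dependency-free sketch of 10-fold cross-validation with a 1-nearest-neighbour classifier. It is not Weka's IBk implementation: the function names, the rule that a missing value counts as the maximum difference of 1.0 (with attributes assumed to be scaled to [0, 1]), and the synthetic data are all assumptions made for illustration.

```python
import random


def distance(a, b):
    """Squared Euclidean distance between two attribute rows.

    A missing value (None) in either row counts as the maximum difference
    of 1.0. This is an assumption, with attributes scaled to [0, 1].
    """
    total = 0.0
    for x, y in zip(a, b):
        diff = 1.0 if x is None or y is None else x - y
        total += diff * diff
    return total


def predict_1nn(train, row):
    """Return the label of the training example closest to row."""
    nearest = min(train, key=lambda example: distance(example[0], row))
    return nearest[1]


def cross_validate(data, folds=10, seed=0):
    """Estimate classification accuracy with k-fold cross-validation."""
    data = data[:]
    random.Random(seed).shuffle(data)
    correct = 0
    for i in range(folds):
        test = data[i::folds]  # every folds-th example forms one fold
        train = [ex for j, ex in enumerate(data) if j % folds != i]
        correct += sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(data)


# Synthetic, made-up examples of the form (attributes, stress category).
rng = random.Random(1)
toy = []
for _ in range(200):
    x = [rng.random(), rng.random()]
    label = "High" if x[0] > 0.5 else "Low"
    if rng.random() < 0.1:
        x[1] = None  # simulate a missing attribute value
    toy.append((x, label))

accuracy = cross_validate(toy)
print(f"10-fold CV accuracy: {accuracy:.2%}")
```

Each example is used for testing exactly once and for training in the other nine folds, which is what makes the accuracy estimate less optimistic than scoring a model on the data it was trained on.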
After generating these models, we converted our entire dataset into a test set and used the models to generate a whole new set of stress values.
Once the new set of stress values was generated, we collected this data and used scikit-learn to perform linear regression on each country's stress data over the period from 1960 to 2014. Using this model, we plotted a graph of water stress vs. year for each country and used it to predict the year when stress would cross a critical level. The results of our project can be seen below. Select a country from the drop-down list to find out when the country could face 'Day Zero'. The world map below shows the set of countries that are predicted to run out of water completely by 2100.
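The trend extrapolation described above can be sketched as follows. The project used scikit-learn for the regression; this sketch instead fits the ordinary least-squares line by hand so that it runs without dependencies, which yields the same line for a single feature (year). The function names, the threshold of 100 for the critical level, and the stress series are assumptions made for illustration, not project data.

```python
def fit_line(years, values):
    """Ordinary least-squares fit of values = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing_year(years, values, threshold=100.0):
    """Year at which the fitted trend reaches threshold.

    Returns None when the trend is flat or falling, since the line then
    never reaches the threshold from below.
    """
    slope, intercept = fit_line(years, values)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Made-up series for one hypothetical country.
years = [1960, 1970, 1980, 1990, 2000, 2010]
stress = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
print(crossing_year(years, stress))  # 2080.0
```

A straight-line fit assumes stress keeps changing at its historical average rate, so the predicted year is best read as a rough indicator rather than a forecast.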
Countries that could possibly run out of water completely by 2100. These predictions could encourage citizens to take immediate steps to conserve water.
The project report, with more detailed results and a thorough analysis, can be found here.