Rainfall Analysis for Landslides on Apache Spark

Aim:

To analyze rainfall data for a given location using a scalable big data implementation, with the goal of predicting rainfall.

Synopsis:

In recent years, countries such as the United States of America, Japan, China, and Taiwan have suffered extreme and dangerous natural disasters driven by the impact of climate. Flooding is one of the main causes of damage in Asian countries such as India, Bangladesh, Sri Lanka, and China. These floods increase the risk of death by 75%. With advances in information technology, large-scale cloud storage and processing power are now readily accessible. Data mining technologies help provide decision makers with summarized reference information, even from very large amounts of data. Among the many data mining techniques, classification is one of the most widely used. Past studies have proposed many techniques that can be applied to classification, such as decision trees, neural networks, Bayesian classifiers, and support vector machines.

Proposed system:

To overcome the shortcomings of the existing system, we propose a machine-learning-based system that improves efficiency and accuracy. To handle the large volume of data, we use Hadoop to store and retrieve data through the Hadoop Distributed File System (HDFS). A random forest algorithm will be implemented to forecast rainfall.
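
As a rough illustration of the random forest idea named above, here is a minimal pure-Python sketch of a random forest regressor. Everything specific in it is an assumption made for illustration and not part of the proposed system: the feature names (humidity, temperature, pressure), the synthetic data generator, the hyperparameters, and all function names. A real deployment would read records from HDFS and would most likely rely on a library implementation rather than hand-written trees.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def squared_error(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def make_synthetic_rows(n, seed=0):
    """Generate made-up (features, rainfall_mm) rows; NOT real observations."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        humidity = rng.randint(30, 100)     # percent (hypothetical feature)
        temperature = rng.randint(15, 35)   # degrees Celsius (hypothetical feature)
        pressure = rng.randint(990, 1030)   # hPa, deliberately unrelated to the target
        signal = 0.8 * (humidity - 50) - 0.5 * (temperature - 25)
        rainfall = max(0.0, signal + rng.gauss(0, 2))
        rows.append(((humidity, temperature, pressure), rainfall))
    return rows


def best_split(rows, feature_ids):
    """Return (error, feature, threshold) with the lowest squared error, or None."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            err = squared_error(left) + squared_error(right)
            if best is None or err < best[0]:
                best = (err, f, threshold)
    return best


def build_tree(rows, max_depth, n_features, rng):
    """Grow one regression tree; each node considers a random subset of features."""
    targets = [y for _, y in rows]
    if max_depth == 0 or len(rows) < 4:
        return mean(targets)  # leaf: average rainfall of the rows that reach it
    feature_ids = rng.sample(range(len(rows[0][0])), n_features)
    split = best_split(rows, feature_ids)
    if split is None:
        return mean(targets)
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t,
            build_tree(left, max_depth - 1, n_features, rng),
            build_tree(right, max_depth - 1, n_features, rng))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(rows, n_trees=15, max_depth=3, n_features=2, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample, max_depth, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Forecast by averaging the predictions of all trees."""
    return mean([predict_tree(tree, x) for tree in forest])


if __name__ == "__main__":
    rows = make_synthetic_rows(120, seed=1)
    forest = train_forest(rows, seed=1)
    print("humid, cool day:", predict_forest(forest, (95, 22, 1005)))
    print("dry, hot day:   ", predict_forest(forest, (35, 30, 1005)))
```

The two ingredients that make this a random forest rather than a single decision tree are the bootstrap sample drawn per tree and the random feature subset tried at each node. Averaging many such trees tends to reduce the variance of the forecast compared with any one tree.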