Distributed Data Processing on the Cloud - lecture 9

Video production: Ahti Saar, 09.11.2018, Computer Science


In recent years, the volume of data that needs to be processed and analyzed has grown significantly. With the advent of cloud computing and the maturing of distributed systems, several new solutions for distributed data processing have emerged, such as MapReduce, in-memory alternatives such as Apache Spark, NoSQL databases, and frameworks based on the Bulk Synchronous Parallel model. This course aims to provide students with an overview of the cloud and of how large-scale data, on the order of a few terabytes or petabytes, can be processed on cloud resources using distributed data processing solutions and frameworks. The course introduces cloud computing, MapReduce, big data solutions such as Pig, Spark and Giraph, and NoSQL solutions such as Riak and MongoDB.