MapReduce and Hadoop: Learning Resources

MapReduce is a programming model and framework introduced by Google in 2004 to support distributed computing over large data sets on clusters of computers. Google describes it in a research paper:
 http://labs.google.com/papers/mapreduce.html
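
To make the idea concrete, here is a minimal single-machine sketch in Python of the map, shuffle and reduce steps, using word counting as the example. The function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions and are not part of Hadoop or of Google's implementation; a real framework runs these steps in parallel across a cluster and handles failures, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(counts["the"], counts["fox"])  # prints "3 2"
```

The map and reduce functions are the only parts a user would normally write; the grouping step in between is what the framework provides.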

Hadoop is an open-source implementation based on Google's MapReduce and Google File System (GFS) papers. It is a project of the Apache Software Foundation (http://hadoop.apache.org/). Several open-source projects use Hadoop for many different applications. The HBase database, for example, runs on top of Hadoop's file system, and Cassandra, a separate Apache database project, can integrate with Hadoop MapReduce. Websites such as Facebook, Yahoo, Amazon and Hulu currently use these software products to provide high scalability and availability.

There are several other implementations and applications of the MapReduce principles. Amazon and AppScale, for example, also provide APIs for MapReduce processing.

If you are interested in an open-source solution (based on Google App Engine) that you can deploy on Eucalyptus, you can also review AppScale:
 http://code.google.com/p/appscale/
 
Introduction

For a basic introduction, you can start with this presentation:
 http://ccsl.ime.usp.br/wiki/images/8/8a/Hadoop_seminar.pdf

Some lectures from a Google seminar can also serve as an introduction:
 http://sites.google.com/site/mriap2008/lectures

Apache's website also includes other presentations:
 http://wiki.apache.org/hadoop/HadoopPresentations

Tutorials

If you are interested in tutorials and training material, you can
start with the basic tutorials from Apache and Yahoo:
 http://hadoop.apache.org/common/docs/r0.15.2/mapred_tutorial.html
 http://developer.yahoo.com/hadoop/tutorial/index.html

You may also find training material from Cloudera useful. Cloudera
is a company offering a commercial distribution of Hadoop and support
for it. It provides a tutorial on its website:
 https://ccp.cloudera.com/display/SUPPORT/Hadoop+Tutorial
Cloudera also offers videos from its webinars (including one about
common problems that can be solved with Hadoop):
 http://www.cloudera.com/resources/Training/
 http://www.cloudera.com/resources/Recorded+Webinars/
 
Other lectures

If you are interested in lectures from specific courses, you can check
the listings referenced by Google Code University:
 http://code.google.com/intl/es-CO/edu/parallel/
 http://code.google.com/intl/es-CO/edu/submissions/mapreduce/listing.html
 http://code.google.com/intl/es-CO/edu/submissions/mapreduce-minilecture/listing.html

Additionally, there are some lectures from Berkeley, given as part of a
course on Structure and Interpretation of Computer Programs:
 http://www.youtube.com/watch?v=mVXpvsdeuKU
 http://www.youtube.com/watch?v=NjAKl5B0BKs

There are also some from a Stanford course on mining large data sets (using Hadoop):
 http://www.stanford.edu/class/cs246/cs246-11-mmds/handouts.html

Other Ideas

Finally, there are some other videos on the Internet:
 http://lanyrd.com/topics/hadoop/video/
 http://www.mapreduce.org/videos.php
 
