Wednesday, December 26, 2012

Episode 23: Hadoop


Download

News
Tool of the Show
Book of the Show


Hadoop

History
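  • Jeff Dean and Sanjay Ghemawat of Google wrote the MapReduce paper ("MapReduce: Simplified Data Processing on Large Clusters", 2004).
  • Hadoop was created by Doug Cutting and Mike Cafarella; Cutting continued its development while he was at Yahoo!
  • Originally intended to support Nutch, the web crawler built on Lucene (search engine inverted indexing).
  • Facebook announced that its Hadoop filesystem has grown to 100 petabytes.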
Features
  • HDFS: Hadoop Distributed Filesystem
  • HBase: A distributed, column-oriented database
  • ZooKeeper: Distributed coordination service
  • Crunch: Simplified API for creating MapReduce pipelines
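
For listeners who have not seen the MapReduce model that these components are built around, here is a minimal single-process sketch in plain Python. It builds a tiny inverted index, the kind of structure a search engine needs. It does not use Hadoop or Crunch; the function names and the sample documents are hypothetical and only illustrate the map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in the document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle step: group all values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce step: collapse the doc ids for one word into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce in sequence over an in-memory dict of documents."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample input, not taken from the episode.
docs = {"d1": "hadoop stores big data", "d2": "lucene indexes data"}
index = build_inverted_index(docs)
print(index["data"])
```

In a real cluster the map and reduce functions run on many machines and the shuffle moves data across the network; this sketch only shows the shape of the computation.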

Strengths
  • Scale-free (scales out by adding nodes)
  • Fault tolerant
  • Hardware can be added or removed while the cluster is running
Weaknesses
  • Long spin-up / spin-down time
    • Worker pools
  • Excessive serialization/deserialization
  • Excessive materialization of intermediate results
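
The serialization and materialization costs listed above can be pictured with a small sketch. It assumes a pipeline of two stages where each stage reads its input from disk, deserializes it, and writes serialized output back to disk before the next stage starts. This is plain Python with JSON files standing in for job output; the stage functions and file names are hypothetical, and it is an illustration of the pattern rather than how Hadoop is implemented.

```python
import json
import os
import tempfile


def run_stage(func, in_path, out_path):
    """Read records from disk, apply func, and write the results back to disk."""
    with open(in_path) as src:
        records = [json.loads(line) for line in src]  # deserialize every record
    with open(out_path, "w") as dst:
        for record in func(records):
            dst.write(json.dumps(record) + "\n")  # serialize every record again
    return os.path.getsize(out_path)


def double(records):
    """Hypothetical first stage: double each value."""
    return [{"n": r["n"] * 2} for r in records]


def keep_multiples_of_four(records):
    """Hypothetical second stage: filter the doubled values."""
    return [r for r in records if r["n"] % 4 == 0]


with tempfile.TemporaryDirectory() as workdir:
    paths = [os.path.join(workdir, f"stage{i}.jsonl") for i in range(3)]
    with open(paths[0], "w") as f:
        for n in range(10):
            f.write(json.dumps({"n": n}) + "\n")

    # Each stage materializes its full output before the next one can begin.
    bytes_written = [
        run_stage(double, paths[0], paths[1]),
        run_stage(keep_multiples_of_four, paths[1], paths[2]),
    ]
    with open(paths[2]) as f:
        final = [json.loads(line)["n"] for line in f]

print(final)
```

The intermediate file between the two stages exists only to hand data from one stage to the next, which is the overhead the weaknesses list refers to.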

Tools
  • Avro: A serialization framework
  • Pig & Hive: Querying and storing large datasets

Uses
  • Storing and manipulating big data
