News
- Fast Fourier Transform
- Ouya Update
- HTML5 Finalized
- Raspberry Pi Store
- Jason: Emacs http://www.gnu.org/software/emacs/
- Patrick: Chrome Browser Sync http://www.google.com/chrome
- Jason: Hadoop: The Definitive Guide
- Patrick: Anathem
Hadoop
History
- Jeff Dean & Sanjay Ghemawat of Google wrote the paper "MapReduce: Simplified Data Processing on Large Clusters" (2004).
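- Hadoop was created by Doug Cutting and Mike Cafarella; it became its own project after Cutting joined Yahoo! in 2006.
- Originally intended to support Nutch, a web search engine built on Lucene (inverted indexing).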
- Facebook announced that its Hadoop filesystem had grown to 100 petabytes.
- HDFS: Hadoop Distributed Filesystem
- HBase: A distributed, column-oriented database
- Zookeeper: Distributed coordination service
- Crunch: Simplified API for creating MapReduce pipelines.
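
To make the MapReduce idea and the inverted-index use case above concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, build_inverted_index) and the sample documents are assumptions made up for illustration. A real cluster would run the map and reduce steps on many machines and do the grouping over the network.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in the document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce step: collapse the doc ids for one word into a sorted, unique list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce in order over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample documents, for illustration only.
docs = {"d1": "hadoop stores big data", "d2": "lucene indexes data"}
index = build_inverted_index(docs)
```

Looking up index["data"] then gives the list of documents that contain the word, which is the lookup a search engine needs.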
Strengths
- Scales out by adding machines
- Fault Tolerant
- Can add/remove hardware while the cluster is running.
Weaknesses
- Long spin up / spin down time.
- Worker Pools
- Excessive serialization/deserialization
- Excessive materialization (intermediate results are written to disk between jobs)
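
The serialization and materialization costs listed above can be sketched with a toy two-stage pipeline in Python. This is an illustration under stated assumptions, not how Hadoop is implemented: JSON lines in a temporary directory stand in for Hadoop's serialization formats and HDFS, and run_stage, count_words, keep_frequent, and run_pipeline are hypothetical names. The point is that every stage reads its input from disk and writes its full output back to disk, even when the next stage consumes it immediately.

```python
import json
import os
import tempfile
from collections import Counter


def run_stage(stage_fn, input_path, output_path):
    """Read records from disk, apply stage_fn, and write the results to disk."""
    with open(input_path) as f:
        records = [json.loads(line) for line in f]  # deserialize every record
    results = stage_fn(records)
    with open(output_path, "w") as f:
        for rec in results:
            f.write(json.dumps(rec) + "\n")  # serialize and materialize


def count_words(records):
    """Stage 1: count how often each word appears across all records."""
    counts = Counter(word for rec in records for word in rec["text"].split())
    return [{"word": w, "count": c} for w, c in sorted(counts.items())]


def keep_frequent(records):
    """Stage 2: keep only words that appear at least twice."""
    return [rec for rec in records if rec["count"] >= 2]


def run_pipeline(lines):
    """Chain the two stages through files, as a chain of batch jobs would."""
    with tempfile.TemporaryDirectory() as tmp:
        names = ("input.jsonl", "stage1.jsonl", "stage2.jsonl")
        paths = [os.path.join(tmp, name) for name in names]
        with open(paths[0], "w") as f:
            for line in lines:
                f.write(json.dumps({"text": line}) + "\n")
        run_stage(count_words, paths[0], paths[1])
        run_stage(keep_frequent, paths[1], paths[2])
        with open(paths[2]) as f:
            return [json.loads(line) for line in f]


# Hypothetical input lines, for illustration only.
frequent = run_pipeline(["big data", "big cluster", "data data"])
```

An in-memory pipeline would pass the stage 1 result straight to stage 2; here it makes a full round trip through the filesystem first.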
Tools
- Avro: A serialization framework
- Pig & Hive: querying and storing large datasets
Uses
- Storing/Manipulating Big Data.