Networking For Big Data

Big Data Enabled by Networking

  • Big Data Enabled by Networking
  • Google File System
  • BigTable
  • MapReduce
  • MapReduce Example
  • MapReduce Optimization
  • Story of Hadoop
  • Hadoop
  • Networking Requirements for Big Data
  • Recent Developments in Networking
  • 1. Virtualizing Computation
  • 2. Virtualizing Storage
  • 3. Virtualizing Rack Storage Connectivity
  • Multi-Root IOV
  • 4. Virtualizing Data Center Storage
  • 5. Virtualizing Metro Storage
  • Virtualizing the Global Storage
  • Software Defined Networking
  • Network Function Virtualization (NFV)
  • Big Data for Networking
  1. Why, What, and How of Big Data: It’s all because of advances in networking
  2. Recent Developments in Networking and their role in Big Data (Virtualization, SDN, NFV)
  3. Networking needs Big Data

Google File System

GFS provides a familiar file system interface, though it does not implement a standard API such as POSIX. Files are organized hierarchically in directories and identified by pathnames. We support the usual operations to create, delete, open, close, read, and write files. Moreover, GFS has snapshot and record append operations. Snapshot creates a copy of a file or a directory tree at low cost. Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client’s append. It is useful for implementing multi-way merge results and producer-consumer queues that many clients can simultaneously append to without additional locking. We have found these types of files to be invaluable in building large distributed applications. Snapshot and record append are discussed further in Sections 3.4 and 3.3 of the GFS paper, respectively.
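The record append idea can be illustrated with a small in-memory toy. This is only a sketch under stated assumptions: `ToyAppendFile`, `record_append`, `producer`, and `run_demo` are hypothetical names, not part of any GFS API, and a single lock stands in for the file system serializing appends. Replication and failure handling are ignored. The point it shows is that the caller does not pick an offset; the system picks one and returns it, so many producers can append to one file without coordinating among themselves.

```python
import threading


class ToyAppendFile:
    """In-memory stand-in for a file that supports record append (hypothetical)."""

    def __init__(self):
        self._data = bytearray()
        # Models the file system serializing concurrent appends internally.
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append the whole record atomically; return the offset the system chose."""
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def producer(f, name, count, results):
    """One client appending `count` records; it never chooses an offset itself."""
    for i in range(count):
        record = f"{name}:{i};".encode()
        offset = f.record_append(record)
        results.append((offset, record))


def run_demo(num_producers=4, records_each=50):
    f = ToyAppendFile()
    results = []
    threads = [
        threading.Thread(target=producer, args=(f, f"p{n}", records_each, results))
        for n in range(num_producers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return f, results
```

Calling `run_demo()` returns the file and the list of `(offset, record)` pairs; reading each record back at its returned offset yields the record intact, whatever order the producers happened to run in.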

    1. Commodity computers serve as “Chunk Servers” and store multiple copies of data blocks. A master server keeps a map of all chunks of files and the locations of those chunks.
    2. All writes are propagated by the writing chunk server to the other chunk servers that have copies.
    3. The master server controls all read-write access.
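The three points above can be sketched as a toy model. This is a minimal illustration, not the GFS implementation: the class and function names (`ChunkServer`, `Master`, `write_chunk`, `read_chunks`), the round-robin placement, and the default replication factor of 3 are all assumptions made for the example. The master holds only the map from files to chunks and from chunks to servers; the chunk server that receives a write forwards it to the other holders.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkServer:
    """Hypothetical chunk server: stores chunk data keyed by chunk id."""
    name: str
    chunks: dict = field(default_factory=dict)

    def write(self, chunk_id, data, replicas=()):
        # Store locally, then propagate to the other servers holding copies.
        self.chunks[chunk_id] = data
        for peer in replicas:
            peer.write(chunk_id, data)  # peers store only; no further forwarding


class Master:
    """Hypothetical master: keeps metadata only, never chunk contents."""

    def __init__(self, servers, replication=3):
        self.servers = list(servers)
        self.replication = min(replication, len(self.servers))
        self.file_chunks = {}      # path -> list of chunk ids
        self.chunk_locations = {}  # chunk id -> list of ChunkServer
        self._next_id = 0

    def allocate_chunk(self, path):
        chunk_id = self._next_id
        self._next_id += 1
        # Round-robin placement is an assumption made for this sketch.
        start = chunk_id % len(self.servers)
        holders = [self.servers[(start + i) % len(self.servers)]
                   for i in range(self.replication)]
        self.file_chunks.setdefault(path, []).append(chunk_id)
        self.chunk_locations[chunk_id] = holders
        return chunk_id, holders

    def locate(self, path):
        return [(cid, self.chunk_locations[cid])
                for cid in self.file_chunks.get(path, [])]


def write_chunk(master, path, data):
    """Ask the master where the chunk goes, then write via the first holder."""
    chunk_id, holders = master.allocate_chunk(path)
    holders[0].write(chunk_id, data, replicas=holders[1:])
    return chunk_id


def read_chunks(master, path):
    """Look up chunk locations at the master, then read from one holder each."""
    return b"".join(holders[0].chunks[cid]
                    for cid, holders in master.locate(path))
```

For example, with four `ChunkServer` instances and `Master(servers, replication=3)`, each call to `write_chunk` leaves the chunk on three servers, and `read_chunks` reassembles the file from whichever holders the master reports.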
