Accelerating Download With Multiple Machines Using Peer-To-Peer Networks

Computers download files from each other.

A peer-to-peer network lets many machines download a file much faster than they could from a single server, which makes P2P a crucial concept for systems design and systems design interviews.

A peer-to-peer (P2P) network is a computer network where all the connected devices, called peers, share resources directly without needing a central server. In a P2P network, each device acts as both a client and a server, meaning it can request and provide resources simultaneously.

Think of it like a group of friends sitting in a circle, where everyone can share things directly. Instead of going through a central authority or relying on a single person to distribute resources, each friend can share their resources and access resources from others in the group.

For example, in a file-sharing P2P network, if you have a file that someone else wants, they can download it directly from your computer, and vice versa. This direct sharing of resources among peers allows for decentralized and distributed file sharing, communication, or any other type of collaboration within the network.

We often use P2P networks for applications like file sharing (e.g., BitTorrent), decentralized cryptocurrencies (e.g., Bitcoin), and communication platforms (e.g., Zoom). They offer advantages such as increased scalability, fault tolerance, and no single point of failure.

What Problem Do Peer-To-Peer Networks Solve?

If 10 machines each download a 100MB file from one source machine with an upload speed of 10 Mbps, and each downloading machine has a download speed of 100 Mbps with no latency, we can calculate the time it would take for all the machines to complete the download.

First, let's convert the file size and network speeds to the same units:

100MB = 800 Mb (1 byte = 8 bits)
10 Mbps = 10 Mb/s (megabits per second)
100 Mbps = 100 Mb/s (megabits per second)

A transfer can only run as fast as the slower end of the connection. Even a single machine downloading alone therefore receives the file at the source's upload speed of 10 Mb/s, not at its own 100 Mb/s:
Time = File Size / min(Upload Speed, Download Speed)
Time = 800 Mb / 10 Mb/s
Time = 80 seconds

With 10 machines, the source has to upload a full copy of the file to each one. Whether it serves them one after another or splits its 10 Mb/s among all of them at once, 10 copies must pass through the same uplink:

Total Time = (File Size * Number of Machines) / Upload Speed
Total Time = (800 Mb * 10) / 10 Mb/s
Total Time = 800 seconds

Therefore, it would take a total of 800 seconds (more than 13 minutes) for all 10 machines to download 100MB from the machine with an upload speed of 10 Mbps, assuming no latency or network bottlenecks. The source's uplink is the bottleneck, while most of each machine's 100 Mbps download capacity sits idle.

10 machines download 100 MB from one machine with 10 Mbps of upload.

Solution from Peer-to-Peer Network

The overall download time can be significantly reduced if we apply a peer-to-peer strategy, where each machine can simultaneously upload and download files to/from other machines.

In a peer-to-peer scenario, all machines share the workload by distributing the file among themselves. Let's consider the following approach:

The source machine divides the 100MB file into 10 chunks of 10MB each and sends a different chunk to each of the 10 machines.
Each machine forwards its chunk to the other 9 machines, starting as soon as the first bytes arrive rather than waiting for the whole chunk.
Each machine concurrently downloads the 9 chunks it is missing from the other machines.
Assuming there is no latency or network bottlenecks, and every machine, including the source, has an upload speed of 10 Mbps and a download speed of 100 Mbps, we can calculate the time required for the overall download.

The source still has to upload each chunk once, which adds up to one full copy of the file (10 chunks * 80 Mb = 800 Mb):

Source Upload Time = File Size / Upload Speed
Source Upload Time = 800 Mb / 10 Mbps
Source Upload Time = 80 seconds

Meanwhile, each machine uploads its 10MB (80 Mb) chunk to the 9 other machines, which is 720 Mb of upload per machine. Since all 10 machines upload simultaneously, each over its own link, this time does not grow with the number of machines:

Upload Time per Machine = (Chunk Size * 9) / Upload Speed
Upload Time per Machine = 720 Mb / 10 Mbps
Upload Time per Machine = 72 seconds

Each machine also needs to download the remaining 9 chunks (90MB = 720 Mb) from the other machines. Since each machine has a download speed of 100 Mbps, the download time for each machine would be:

Download Time per Machine = Download Size / Download Speed
Download Time per Machine = 720 Mb / 100 Mbps
Download Time per Machine = 7.2 seconds

Again, all machines download their chunks simultaneously. Therefore, the download time remains the same.

In this scenario, the source needs 80 seconds of upload, each machine needs at least 72 seconds of upload, and each machine needs 7.2 seconds of download. Because machines forward data while they are still receiving it, these tasks overlap, so the overall time is set by the longest one: the source's 80-second upload. At that pace, each machine receives 1 Mb/s from the source plus 1 Mb/s from each of the other 9 machines (10 Mb/s in total) while uploading 9 Mb/s, so no link exceeds its limit.

Therefore, with the peer-to-peer strategy, it would take approximately 80 seconds for all 10 machines to download the 100MB file, ten times faster than the 800 seconds needed when the source serves every machine itself, assuming no latency or network bottlenecks.
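Both scenarios can be checked with a short script. This is a minimal sketch, assuming a fluid model (data can be split arbitrarily finely and forwarded immediately), no latency, and identical downloading machines; the function names are illustrative. `client_server_time` reflects that the source must push one copy per machine, and `p2p_time` takes the largest of three lower bounds: the source must send at least one full copy, each machine must receive the whole file, and all copies together must fit through the combined upload capacity of the source and the peers.

```python
# Minimal sketch: distribution times under a fluid model with no latency.
# Speeds are in megabits per second; file sizes are in megabytes.


def client_server_time(file_mb: float, n: int, source_up: float, peer_down: float) -> float:
    """Seconds for n machines to each fetch the whole file from one source."""
    bits = file_mb * 8  # megabytes -> megabits
    # The source must push n full copies, and no machine can receive faster
    # than its own download link.
    return max(n * bits / source_up, bits / peer_down)


def p2p_time(file_mb: float, n: int, source_up: float, peer_up: float, peer_down: float) -> float:
    """Lower bound in seconds when the n machines also upload to each other."""
    bits = file_mb * 8
    return max(
        bits / source_up,                      # source must send at least one full copy
        bits / peer_down,                      # each machine must receive the whole file
        n * bits / (source_up + n * peer_up),  # n copies over the total upload capacity
    )


if __name__ == "__main__":
    print(f"client-server: {client_server_time(100, 10, 10, 100):.1f} s")
    print(f"peer-to-peer:  {p2p_time(100, 10, 10, 10, 100):.1f} s")
```

With the numbers from this example, the script prints 800.0 s and 80.0 s. The third term in `p2p_time` is about 72.7 s here, and it stays below 80 s no matter how many machines join, because each new machine brings its own upload capacity.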

Types of Peer-to-Peer Network

Peer-to-peer (P2P) networks are a type of distributed network architecture where participants in the network, called peers, can directly communicate and share resources without needing a centralized server. There are several types of P2P networks, including:

Pure P2P Network: In a pure P2P network, all peers have equal capabilities and can act as both clients and servers. Each peer can initiate and respond to requests for resources or services from other peers, and no central authority or server controls the network.

Hybrid P2P Network: A hybrid P2P network combines elements of both P2P and client-server architectures. It usually includes a central server or a set of super-peers that coordinate the network, handling tasks such as peer discovery, resource indexing, or maintaining network stability, while peers still communicate directly with each other.

Structured P2P Network: In a structured P2P network, peers organize themselves into a specific structure or overlay network. This structure enables efficient resource discovery and routing by maintaining consistent lookup tables or distributed hash tables (DHTs). Examples of structured P2P networks include Chord, CAN (Content-Addressable Network), and Pastry.
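To make the idea of a DHT concrete, here is a minimal Chord-style sketch of key placement. It assumes a small 2^16 identifier space and SHA-1 hashes reduced onto it (both illustrative choices), and the class and function names are hypothetical. Node names and keys are hashed onto the same ring, and each key belongs to its successor: the first node at or clockwise after the key's position. Real Chord nodes only know a few other nodes and use finger tables so that lookups take O(log n) hops; this sketch keeps the whole ring in one place to show the placement rule.

```python
import bisect
import hashlib

RING_BITS = 16  # illustrative identifier space of 2**16 positions (assumption)


def ring_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class Ring:
    """Hypothetical helper holding every node's ring position (Chord spreads this out)."""

    def __init__(self, nodes):
        # Sorted (position, node) pairs make successor lookup a binary search.
        self.entries = sorted((ring_id(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.entries]

    def successor(self, key: str) -> str:
        """Return the node responsible for key: first node at or after its position."""
        idx = bisect.bisect_left(self.positions, ring_id(key))
        # Past the highest position, wrap around to the first node on the ring.
        return self.entries[idx % len(self.entries)][1]


ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
for key in ["song.mp3", "movie.mkv", "notes.txt"]:
    print(key, "->", ring.successor(key))
```

A useful property of this placement rule is that when a peer joins, only the keys between its predecessor and itself move to it (from its successor), so the rest of the table stays untouched as peers come and go.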

Unstructured P2P Network: In an unstructured P2P network, there is no specific organization or structure among the participating peers. Peers connect randomly or through some form of ad hoc communication, and resource discovery relies on flooding or random-walk-based search algorithms. Examples of unstructured P2P networks include Gnutella and Freenet.
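Flooding can be sketched as a breadth-first search with a hop limit: a query spreads to every neighbor, each peer processes it only once, and forwarding stops after `ttl` hops (Gnutella messages carry a TTL field for this purpose). The graph, peer names, and function name below are hypothetical. The message counter shows why flooding gets expensive: every forward costs a message, including duplicates sent to peers that have already seen the query.

```python
from collections import deque


def flood_search(graph, start, has_file, ttl):
    """Return (peers holding the file that the query reached, messages sent).

    graph: dict mapping each peer to a list of neighbor peers (hypothetical topology).
    has_file: set of peers that store the requested file.
    ttl: maximum number of hops the query may travel.
    """
    seen = {start}
    hits = []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if peer in has_file:
            hits.append(peer)
        if remaining == 0:
            continue  # hop limit reached: do not forward any further
        for neighbor in graph[peer]:
            messages += 1  # each forward costs a message; duplicates are dropped on arrival
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C"],
    "F": ["D"],
}
has_file = {"E", "F"}
print(flood_search(graph, "A", has_file, ttl=2))  # (['E'], 7): F is 3 hops away
print(flood_search(graph, "A", has_file, ttl=3))  # (['E', 'F'], 11)
```

Raising the TTL finds more copies but multiplies the traffic, which is the core trade-off of unstructured search.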

Overlay P2P Network: An overlay P2P network is built on top of an existing network infrastructure, such as the Internet. It uses the underlying network’s communication capabilities to establish connections between peers. The overlay network provides an additional layer of abstraction and enables peers to discover and communicate with each other. Examples of overlay P2P networks include BitTorrent and Skype.

These are some common types of P2P networks, each with its own advantages and use cases. The right choice depends on scalability and efficiency needs, resource constraints, and the requirements of the specific application.

Technologies that Use Peer-to-peer

Software engineers have several technologies and frameworks at their disposal to implement peer-to-peer (P2P) networks. Here are a few commonly used ones:

Libp2p: Libp2p is a modular networking stack designed specifically for P2P applications. It provides a set of protocols, libraries, and tools that enable developers to build decentralized and distributed systems. Libp2p supports various transport protocols, peer discovery mechanisms, and secure communication channels.

Uber Kraken: Kraken is a P2P-powered Docker registry that focuses on scalability and availability. It is designed for Docker image management, replication, and distribution in a hybrid cloud environment. With pluggable backend support, Kraken can easily integrate into existing Docker registry setups as the distribution layer.

According to the project's README, Kraken has been in production at Uber since early 2018. In Uber's busiest cluster, Kraken distributes more than 1 million blobs per day, including 100k 1G+ blobs, and at its peak production load it distributes 20K 100MB-1G blobs in under 30 seconds.

Peer-to-peer nodes from Uber Kraken

Source image from: https://github.com/uber/kraken

WebRTC: WebRTC (Web Real-Time Communication) is a web standard that enables real-time communication between browsers and applications. It provides peer-to-peer capabilities for audio, video, and data streaming. WebRTC can be utilized to establish direct connections between peers in a P2P network.

ZeroMQ: ZeroMQ is a lightweight messaging library that facilitates high-performance asynchronous messaging between components or nodes in a distributed system. It supports various messaging patterns, including publish-subscribe, request-reply, and pipeline, which can be leveraged to implement P2P communication.

Kademlia: Kademlia is a distributed hash table (DHT) algorithm commonly used in P2P networks. It provides a decentralized key-value storage mechanism with lookups that complete in O(log n) steps, using the bitwise XOR of node IDs as its distance metric. By implementing the Kademlia protocol, software engineers can build scalable and fault-tolerant P2P networks.
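Kademlia's central idea is its distance metric: the distance between two IDs is their bitwise XOR, read as an integer. This minimal sketch assumes 8-bit IDs for readability (Kademlia itself uses 160-bit IDs), and the function names and node IDs are hypothetical. It finds the k known nodes closest to a key, which is the set a lookup would contact next, and computes which k-bucket (routing-table slot) a contact falls into.

```python
import heapq


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def k_closest(known_nodes, target: int, k: int):
    """Return the k known node IDs closest to target under the XOR metric."""
    return heapq.nsmallest(k, known_nodes, key=lambda node: xor_distance(node, target))


def bucket_index(own_id: int, other_id: int) -> int:
    """k-bucket for other_id: index of the highest bit where the two IDs differ.

    Returns -1 when the IDs are equal, since a node does not store itself.
    """
    return xor_distance(own_id, other_id).bit_length() - 1


known = [0b00010011, 0b00010110, 0b10100000, 0b01110001, 0b00011111]
key = 0b00010100
print(k_closest(known, key, 3))  # [22, 19, 31]
print(bucket_index(known[0], known[2]))  # 7: the IDs already differ in the top bit
```

Because XOR is symmetric, two nodes agree on how far apart they are, which lets every lookup step move strictly closer to the target.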

BitTorrent: BitTorrent is a widely used protocol for peer-to-peer file sharing. It enables efficient distribution and downloading of files across a network by dividing them into small pieces and allowing peers to exchange those pieces. Software engineers can leverage the BitTorrent protocol to implement P2P file-sharing systems.
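The piece mechanism can be sketched in a few lines: split the file into fixed-size pieces, publish a SHA-1 hash per piece (as BitTorrent v1 metadata does), and let a downloader accept pieces from any peer in any order, checking each one against its hash before reassembling the file. The tiny piece size and the function names are illustrative assumptions; real torrents use pieces of hundreds of kilobytes or more.

```python
import hashlib
import random

PIECE_SIZE = 4  # bytes per piece; tiny on purpose so the example is easy to follow


def make_pieces(data: bytes, size: int = PIECE_SIZE):
    """Split data into pieces and return (pieces, SHA-1 hash of each piece)."""
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    hashes = [hashlib.sha1(p).digest() for p in pieces]
    return pieces, hashes


def accept_piece(index: int, payload: bytes, hashes, received: dict) -> bool:
    """Store a piece from any peer only if it matches the published hash."""
    if hashlib.sha1(payload).digest() != hashes[index]:
        return False  # corrupted or malicious piece: discard and request it elsewhere
    received[index] = payload
    return True


def assemble(received: dict, count: int) -> bytes:
    """Rebuild the file once every piece has arrived, regardless of arrival order."""
    return b"".join(received[i] for i in range(count))


original = b"peer-to-peer file sharing!"
pieces, hashes = make_pieces(original)

received = {}
order = list(range(len(pieces)))
random.shuffle(order)  # pieces arrive from different peers in arbitrary order
for i in order:
    accept_piece(i, pieces[i], hashes, received)

print(accept_piece(0, b"evil", hashes, received))  # False: hash mismatch
print(assemble(received, len(pieces)) == original)  # True
```

Per-piece hashes are what make downloading from untrusted peers safe: a bad piece costs one re-download instead of the whole file.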

IPFS: IPFS (InterPlanetary File System) is a distributed file system that combines ideas from P2P networks and distributed hash tables. It provides a content-addressable storage model where files are identified by their cryptographic hashes. IPFS allows for decentralized and resilient file storage and retrieval.

Blockchain: Blockchain technology, popularized by cryptocurrencies like Bitcoin and Ethereum, can also be used to implement P2P networks. Blockchain provides a distributed and decentralized ledger for recording and validating transactions or other types of data. It can enable secure and transparent peer-to-peer interactions.
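The "validating" part of a blockchain ledger rests on hash links: each block stores the hash of the block before it, so altering a past entry breaks the link stored in the next block, and hiding the change would mean rewriting every block after it. The sketch below shows only that linking and the check. It omits consensus (such as Bitcoin's proof of work) and networking, and its block format and function names are a hypothetical simplification.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block (hypothetical format)."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Add a block that points at the hash of the current last block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis block has no parent
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})


def is_valid(chain: list) -> bool:
    """Every block after the first must reference the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


ledger = []
append_block(ledger, ["alice pays bob 5"])
append_block(ledger, ["bob pays carol 2"])
append_block(ledger, ["carol pays dave 1"])
print(is_valid(ledger))  # True

ledger[0]["transactions"] = ["alice pays bob 500"]  # tamper with history
print(is_valid(ledger))  # False: block 1 no longer matches block 0's hash
```

In a real P2P ledger, every peer runs a check like `is_valid` on the blocks it receives, which is how the network rejects tampered history without a central authority.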

These are just a few examples of technologies that software engineers can use to implement P2P networks. The choice of technology depends on the specific requirements and goals of the P2P application being developed.

Gossip Protocol

A gossip protocol is a type of peer-to-peer (P2P) communication protocol that allows information to spread across a network in an efficient and decentralized manner. It is inspired by the way rumors or gossip spread among individuals in a social network. In a gossip protocol, each peer periodically selects a random set of peers to exchange information with, propagating the information throughout the network.

Here’s how a gossip protocol typically works in a P2P network:

Initial Information Dissemination: When a new piece of information (e.g., an update, a message, or a resource) enters the network, a peer initiates the gossip process by sharing the information with a few randomly selected peers.

Peer Selection: Each peer, upon receiving new information, randomly selects a subset of peers from its list of known neighbors. The size of the subset can vary depending on the specific protocol.

Information Exchange: The peer exchanges information with each of the selected peers. This can involve sending all of its information or only a subset of it (for example, recent updates), depending on the protocol design.

Propagation and Replication: The process of information exchange continues iteratively. Each peer, upon receiving new information, becomes a source for propagating that information to its selected peers. This process leads to the rapid dissemination of information across the network.

Convergence: Over time, as the gossip protocol progresses, the information is disseminated to a large portion of the network, and peers converge toward having consistent information.
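The five steps above can be simulated in a few lines. This is a minimal push-style sketch, assuming every peer knows every other peer and that each informed peer pushes the update to `fanout` random peers per round; the function name and parameters are illustrative. It counts how many rounds pass before every peer has the update.

```python
import random


def gossip_rounds(num_peers: int, fanout: int, seed: int = 0) -> int:
    """Simulate push gossip; return the rounds until all peers are informed.

    Requires fanout < num_peers so each peer can pick distinct targets.
    """
    rng = random.Random(seed)
    informed = {0}  # peer 0 receives the new information first
    rounds = 0
    while len(informed) < num_peers:
        rounds += 1
        newly_informed = set()
        for peer in informed:
            # Pick `fanout` distinct targets other than `peer` itself by sampling
            # from num_peers - 1 slots and shifting indices at or above `peer`.
            for t in rng.sample(range(num_peers - 1), fanout):
                newly_informed.add(t if t < peer else t + 1)
        informed |= newly_informed  # everyone reached this round gossips next round
    return rounds


for n in (100, 1000, 10000):
    print(n, "peers:", gossip_rounds(n, fanout=3), "rounds")
```

Multiplying the network size by ten adds only a few rounds, because the number of informed peers can roughly multiply each round; this is the logarithmic growth behind the scalability and efficiency claims that follow.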

Gossip protocols offer several advantages in P2P networks:

Decentralization: Gossip protocols operate without relying on a central authority or server. Peers communicate directly with each other, enabling decentralized information dissemination.

Scalability: Gossip protocols are highly scalable since the dissemination process occurs in a distributed manner. The workload is distributed among peers, allowing the network to handle large-scale systems.

Fault Tolerance: Gossip protocols are resilient to failures or changes in the network. If a peer fails or leaves the network, the information can still propagate through other paths.

Efficiency: Each peer exchanges information with only a small subset of other peers per round, so per-peer communication cost stays bounded, and information typically reaches the whole network in a number of rounds that grows only logarithmically with the network's size.

Gossip protocols have been used in various applications, including distributed databases, distributed systems, content distribution networks, and peer-to-peer file-sharing networks. Examples of gossip protocols include the Epidemic protocol, Push-Sum protocol, and Plumtree protocol.
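The Push-Sum protocol is a good example of gossip computing an aggregate rather than spreading a single message. In this minimal synchronous sketch (the simulation setup and names are assumptions), every peer holds a pair (sum, weight) that starts at (its own value, 1). Each round, a peer keeps half of its pair and pushes the other half to a random peer, and every peer's ratio sum / weight converges toward the network-wide average.

```python
import random


def push_sum(values, rounds, seed=0):
    """Return each peer's estimate of the average after `rounds` of Push-Sum."""
    rng = random.Random(seed)
    n = len(values)
    sums = [float(v) for v in values]
    weights = [1.0] * n
    for _ in range(rounds):
        next_sums = [0.0] * n
        next_weights = [0.0] * n
        for i in range(n):
            half_s, half_w = sums[i] / 2, weights[i] / 2
            target = rng.randrange(n)  # random peer; picking itself is harmless
            # Keep one half locally and push the other half to the target.
            next_sums[i] += half_s
            next_weights[i] += half_w
            next_sums[target] += half_s
            next_weights[target] += half_w
        # Total sum and total weight are conserved, so the ratio drifts to the average.
        sums, weights = next_sums, next_weights
    return [s / w for s, w in zip(sums, weights)]


estimates = push_sum([10, 20, 30, 40], rounds=30)
print([round(e, 3) for e in estimates])  # every peer approaches the average, 25
```

No peer ever sees all the values, yet every peer ends up with the average, which is why this style of protocol suits large decentralized systems.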

Summary

A peer-to-peer network is a powerful way to deliver large amounts of data quickly to many machines. To remember the key points, let's recap what we learned:

  • A peer-to-peer (P2P) network is a decentralized network where participants, called peers, can directly connect and interact with each other without relying on a central authority.
  • Peers in a P2P network can act both as consumers and providers of resources, services, or information.
  • P2P networks facilitate direct communication and data exchange among peers, allowing them to share files, collaborate, or participate in decentralized systems.
  • Peers in a P2P network can discover and connect to each other using various mechanisms like peer discovery protocols or distributed hash tables (DHTs).
  • P2P networks can be structured in different ways, such as unstructured networks, where peers connect randomly, or structured networks, where peers connect based on a specific topology or protocol.
  • P2P networks often rely on protocols and algorithms designed for efficient resource sharing, supporting uses such as distributed file sharing, distributed computing, or decentralized cryptocurrency transactions.
  • P2P networks can offer advantages like increased resilience, scalability, and fault tolerance compared to centralized systems.
  • Examples of P2P networks include file-sharing systems like BitTorrent, communication systems like Skype, and blockchain networks like Bitcoin and Ethereum.
  • Building robust P2P networks requires addressing challenges like peer discovery, routing, data consistency, security, and incentivization mechanisms.
  • Please note that this is a simplified summary of the P2P network concept, and there are many variations and intricacies within different P2P architectures and implementations.
Written by
Rafael del Nero