Social networks play an important role in our daily life. Without physical connections, we are aware of the status of our friends. We are even able to connect with famous persons and learn their life, which was unimaginable before. One key way of connections in social networks is via feeds, where people publish status or information and others who subscribe or follow it would see. With more and more people are connecting virtually, there is a massive growth of feeds and their flow between people. A system to support such function in the global scale is the foundation of modern social networks.
In social networks, posts of a user form a feed list. A feed list can be followed in order to receive notifications of updates. A user can follow multiple feed lists. From a user’s perspective, a timeline displays the merged view of all followed feeds ordered by time. The following- and followed-relationship among users exhibits a social graph.
It’s common that one feed list is followed by multiple users. Therefore, reading operations are dominant. With a social network system with 100 millions of users, where on average each feed list has 100 followers, there can be 100 times of readings over writings. Suppose there are 1% of users publishing a post every hour, the QPS (query per second) is 300 (\(100M\times1\%/60/60\)) for writings and 30,000 for readings accordingly.
One particular challenge is the feeding from popular users who have tons of followers. For example, with 1 million followers, every new post from a popular user can generate 1 million readings. If we require such broadcast to complete in 1 second, the QPS of readings is then spiked to 1 million.
Push and pull
One challenge of social network applications is how to efficiently publish feedings to followers. There are two models to support feed publications. A push model proactively pushes each new feed to all followers based on the social graph. New feeds then become ready in each follower’s repository in the server. When a follower opens or refreshes the application, new feeds can be directly downloaded from the follower’s repository in the server.
A push model is enough in most cases, since majority of users have a handful of followers–most of us are not super stars. For popular users with extraordinarily large number of followers, a push model is likely to encounter QPS bottlenecks. One optimization is to broadcast new feeds only to online followers first. Pushes to offline users can be delayed as they won’t be aware of it anyway. However, a push model does not have further sense about user behaviors. For example, pushes to some online users can also be delayed if they are using the application while not refreshing it yet.
Alternatively, a pull model keeps new feeds in the source repositories without broadcast. When a follower opens or refreshes the application, a pulling of new feeds is triggered by collecting them from every resource repository based on the social graph. The pull model resolves the QPS bottlenecks of the push model for popular users. A pulling operation is taken only when a follower wants it. Instead, the status of publishers is opaque to followers. A pulling operation always queries all resources even if majority of them are idle. While a user is unlikely to follow too many publishers, when many of them are pulling, a QPS bottleneck can be reached.
Both push and pull models have their own pros and cons. Major limitations come from a lack of recognition in either the follower side (for the push model) or the publisher side (for the pull model). Let’s visit some specifics based on the assumptions below.
- Among a total of 100-million users, only 100 of them are popular (each has more than 1 million followers), while the rest have an average of 100 followers for each.
- Every hour there is one popular user posting a new feed.
- Every hour there are 1% of normal users posting new feeds.
- Every hour there are 10% of users refreshing the application to fetch new feeds.
- On average, every user follows 100 publishers, among which 10% are popular publishers.
|QPS (push)||QPS (pull)|
|Popular publishers||\(1,000,000\)||\( 27,778\)
|Normal publishers||\( 27,778\)
Apparently the best of both worlds can be achieved by combining the push and pull models. Specifically, we apply a push model to normal publishers that have a handful of followers and use a pull model for popular publishers.