Platform
For us, of course, the project's architecture is of particular interest: how the main components of the system interact, what in-house developments were required, and what tricks had to be used. But before moving on to it, it is worth getting familiar with the basics: the technologies and products used.
Debian Linux is used as the main operating system: a time-tested solution and one of the oldest and most stable modern distributions. To balance the load between application servers, the nginx HTTP server is used in reverse proxy mode. Its responsibilities include maintaining the connection with the user’s browser, passing requests to the servers that execute the PHP code, and making sure the result is delivered back to the browser. PHP code is executed by the mod_php module for Apache. There are quite a few alternatives, especially ones based on the FastCGI protocol, but VKontakte’s management took the more conservative path here and chose the most time-tested solution. No special systems for optimizing PHP performance are used (Facebook, for example, wrote its own PHP-to-C++ compiler called HipHop); the only external optimization is opcode caching with the publicly available XCache.
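As a rough illustration of what balancing load between application servers means, here is a minimal sketch in Python. The source does not say how nginx chooses a backend; the sketch assumes plain round-robin rotation, which is nginx's default for a group of upstream servers. The class name and the server addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backend addresses in rotation, one per incoming request."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Each call returns the next backend; after the last one
        # the rotation starts again from the first.
        return next(self._cycle)


# Hypothetical addresses of Apache + mod_php application servers.
balancer = RoundRobinBalancer(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
picks = [balancer.pick() for _ in range(4)]
print(picks)  # the fourth request goes back to the first server
```

A real reverse proxy also keeps the client connection open while the backend works, which this sketch leaves out.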
The situation with data storage looks rather vague: on the one hand, the company's own database management system, written in C and created by the “best minds” of Russia, is actively used; on the other hand, MySQL was often mentioned as the main storage. More on VKontakte's own database below. Speaking of data storage, one cannot fail to mention such an important aspect as caching frequently used information (keeping it in RAM for quick access). A very popular product in this area is used for this: memcached. If you haven't heard of it, this system performs very simple atomic operations, such as storing and retrieving arbitrary data by key. Its main features are lightning-fast access and the ability to easily combine the RAM of a large number of servers into a common pool for temporary storage of “hot” data.
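The idea of combining the RAM of many servers into one pool can be sketched as follows. This is not the memcached client itself: each Python dict stands in for one server's memory, and the server names are hypothetical. The sketch assumes the simplest placement rule, a hash of the key modulo the number of servers; production clients often use consistent hashing instead.

```python
import hashlib


class ShardedCache:
    """Toy stand-in for a pool of cache servers; each dict plays one server's RAM."""

    def __init__(self, server_names):
        self.server_names = list(server_names)
        self.servers = {name: {} for name in self.server_names}

    def _server_for(self, key):
        # Hash the key and map it onto one server, so every client agrees
        # on where a given key lives without any coordination.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.server_names)
        return self.server_names[index]

    def set(self, key, value):
        self.servers[self._server_for(key)][key] = value

    def get(self, key):
        # A miss returns None, as a cache lookup would.
        return self.servers[self._server_for(key)].get(key)


cache = ShardedCache(["cache-1", "cache-2", "cache-3"])
cache.set("user:42:status", "online")
print(cache.get("user:42:status"))  # online
print(cache.get("user:43:status"))  # None
```

Because the data is only cached, losing one server costs nothing but extra misses; the application falls back to the main storage.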
Side projects that are not key to VKontakte are often implemented either with rather exotic solutions or, conversely, with the simplest technologies. For example, the instant messaging service is implemented in node.js (you can read more about this technology in the article “Server-side JavaScript” in ][ 08/2010) using the XMPP protocol, aka Jabber (we will return to it later). Video conversion is done with the simplest and most effective tool, ffmpeg, whose libraries also power the very popular VLC video player.
Main technologies used
- Debian Linux is the main operating system
- nginx - load balancing
- PHP + XCache
- Apache + mod_php
- memcached
- MySQL
- Own DBMS in C, created by the “best minds” of Russia
- node.js - a layer for implementing the XMPP protocol, lives behind HAProxy (haproxy.1wt.eu)
- xfs - file system for storing images and delivering them to the user
- ffmpeg - video conversion
Architecture
The most noticeable difference from the architecture of many other large Internet projects is that VKontakte servers are multifunctional. That is, there is no clear division into database servers, file servers and so on: they are used in several roles simultaneously. Roles are redistributed semi-automatically, with the participation of system administrators. On the one hand, this makes more efficient use of system resources, which is good; on the other hand, it increases the likelihood of conflicts at the operating system level within a single server, which leads to stability problems. Yet despite the servers being used in several roles, less than 20% of the project’s computing power is usually in use.
Load is balanced between servers in several layers: balancing at the DNS level (the domain is served by 32 IP addresses) and request routing within the system, with different servers used for different types of requests. For example, generating news pages (now called a microblog) follows a clever scheme that uses the memcached protocol's ability to request data for a large number of keys in parallel. If the data is not in the cache, a similar request is sent to the data storage system, and the results are sorted and filtered, and the excess is discarded, at the PHP code level. This functionality works in a similar way at Facebook (the two companies recently exchanged experience), except that Facebook uses MySQL instead of a DBMS of its own.
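The news feed scheme described above can be sketched roughly like this. It is an illustration in Python rather than the project's PHP, and everything concrete in it is an assumption: the key format `news:<id>`, the post fields, and the use of plain dicts in place of the cache and the storage system.

```python
def build_feed(friend_ids, cache, storage, limit=3):
    """Collect friends' recent posts: one batched cache lookup, then storage for misses."""
    keys = [f"news:{fid}" for fid in friend_ids]

    # One batched lookup for all keys (a real client would send a multi-key get).
    found = {k: cache[k] for k in keys if k in cache}

    # Keys missing from the cache are requested from the storage layer
    # and written back so the next request hits the cache.
    for key in keys:
        if key not in found:
            posts = storage.get(key, [])
            cache[key] = posts
            found[key] = posts

    # Sorting, filtering and trimming happen in application code.
    merged = [post for key in keys for post in found[key]]
    visible = [p for p in merged if not p["hidden"]]
    visible.sort(key=lambda p: p["ts"], reverse=True)
    return visible[:limit]


storage = {
    "news:1": [{"ts": 10, "text": "a", "hidden": False}],
    "news:2": [{"ts": 30, "text": "b", "hidden": False},
               {"ts": 20, "text": "c", "hidden": True}],
    "news:3": [{"ts": 5, "text": "d", "hidden": False}],
}
cache = {"news:1": storage["news:1"]}
feed = build_feed([1, 2, 3], cache, storage, limit=2)
print([p["text"] for p in feed])  # ['b', 'a']
```

The storage layer only returns raw per-user data, so all presentation logic stays in one place, the application code.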
A large amount of software has been developed within VKontakte that meets the project's needs more precisely than the available open-source and commercial solutions. In addition to the aforementioned proprietary DBMS, there is a monitoring system with SMS notifications (Pavel himself helped design the interface), an automated code testing system, and statistics and log analyzers.
The project uses fairly powerful hardware; the server specifications were given roughly as follows:
- 8-core Intel processors (two per server, apparently);
- 64 GB of RAM;
- 8 hard drives;
- RAID is not used (replication and backup are carried out at the software level).
It is noteworthy that the servers are not brand-name machines but are assembled by a specialized Russian company. The project's equipment is currently located in 4 data centers in St. Petersburg and Moscow: the entire main database is in the St. Petersburg data center, and only audio and video are hosted in Moscow. There are plans to replicate the database to another data center in the Leningrad region and to use a content delivery network to speed up downloads of media content in the regions.
Many projects that deal with a large number of photos invent their own solutions for storing them and delivering them to users. This was the first question Pavel got from the audience: “How do you store images?” - “On disks!” One way or another, VKontakte representatives said that this whole pile of photos of all colors and sizes is simply stored on and served from the file system (they use xfs) of a large number of servers, with no additional frills. The only puzzling thing is that this approach did not work for other major projects - they probably didn’t know the magic word :).
No less magical is that in-house database written in C. This product probably received most of the audience's attention, yet almost no details about what it actually is were made public. It is known that the DBMS was developed by the “best minds” of Russia, winners of Olympiads and TopCoder competitions, and that it is used in the most heavily loaded VKontakte services:
- Private messages
- Messages on the walls
- Statuses
- Search
- Privacy
- Friends lists
Unlike MySQL, a non-relational data model is used, and most operations are performed in RAM. The access interface is an extended memcached protocol. Specially composed keys return the results of complex queries (most often specific to a particular service).
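No details of the key format were published, so the following is only a sketch of the general idea: a store that answers a memcached-style `get(key)`, where the key itself encodes a query. The key grammar (`friends:<id>`, `mutual:<id>:<id>`) and the class are hypothetical.

```python
class QueryStore:
    """Toy in-memory store answering get(key), where the key encodes a query."""

    def __init__(self):
        self.friends = {}  # user id -> set of friend ids

    def add_friendship(self, a, b):
        self.friends.setdefault(a, set()).add(b)
        self.friends.setdefault(b, set()).add(a)

    def get(self, key):
        # Hypothetical key grammar: "<query>:<arg>[:<arg>...]"
        name, *args = key.split(":")
        if name == "friends":
            (user,) = map(int, args)
            return sorted(self.friends.get(user, set()))
        if name == "mutual":
            a, b = map(int, args)
            return sorted(self.friends.get(a, set()) & self.friends.get(b, set()))
        return None  # unknown key: behaves like a cache miss


store = QueryStore()
store.add_friendship(1, 2)
store.add_friendship(1, 3)
store.add_friendship(2, 3)
store.add_friendship(2, 4)
print(store.get("friends:2"))   # [1, 3, 4]
print(store.get("mutual:1:2"))  # [3]
```

The appeal of such an interface is that existing memcached client libraries can talk to the store unchanged; only the key strings differ.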
The system was designed taking into account the possibility of clustering and automatic data replication. The developers would like to make this system a universal DBMS and publish it under the GPL, but so far it has not been possible due to the high degree of integration with other services.
Interesting facts about VKontakte
- The development process is close to the Agile methodology with weekly iterations (cycles), within which all stages of development take place: planning, requirements analysis, design, development and testing.
- The operating system kernel has been modified (to work with memory), and there is its own package base for Debian.
- Photos are uploaded to two hard drives on one server at the same time and then backed up to another server.
- There are many modifications to memcached, including ones for more stable and long-lived placement of objects in memory; there is even a version that protects data against loss.
- Photos are not deleted to minimize fragmentation.
- Decisions on the development of the project are made by Pavel Durov and Andrey Rogozov; responsibility for each service rests with them and with the developer who implemented it.
- Pavel Durov has been economizing on hosting since his first year at university :).
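The photo upload rule from the list above (two disks on one server, then a backup on another server) can be sketched as follows. Temporary directories stand in for the disks and the backup server, the function name is hypothetical, and the two local writes are done one after the other rather than truly simultaneously.

```python
import shutil
import tempfile
from pathlib import Path


def store_photo(name, data, local_disks, backup_dir):
    """Write a photo to two local disks, then copy it to a backup location."""
    primary, mirror = local_disks[:2]
    for disk in (primary, mirror):
        (Path(disk) / name).write_bytes(data)
    # The backup is taken from the primary copy after both local writes succeed.
    shutil.copy2(Path(primary) / name, Path(backup_dir) / name)
    return [Path(d) / name for d in (primary, mirror, backup_dir)]


with tempfile.TemporaryDirectory() as root:
    dirs = [Path(root) / d for d in ("disk0", "disk1", "backup-server")]
    for d in dirs:
        d.mkdir()
    payload = b"\xff\xd8fake-jpeg"
    copies = store_photo("cat.jpg", payload, dirs[:2], dirs[2])
    ok = all(p.read_bytes() == payload for p in copies)
    print(ok)  # True
```

Keeping redundancy at the application level is what lets the servers run without RAID.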
Format
VKontakte allows you to create several types of pages: a public page (public), a personal profile, a group and an event. Either a group or a public is suitable for business. In some cases a personal profile or an event will be useful, and we cover those below. For now, let's decide what to choose: a group or a public. We do not recommend creating both at the same time: they solve similar problems, so they will simply take subscribers away from each other.
Group
A group is better suited to communication among the members themselves: the “Discussions” block sits in the most prominent place, and users can write posts in the news feed themselves (unless the administrator prohibits it). This type of page works well for bringing together people with similar interests and as a platform for discussing common topics.
Pros
- Privacy settings - a group can be open, closed or private. Anyone can join an open group, a closed group admits members after the admin's approval, and a private group can be joined only by invitation.
- All members can publish posts, but such posts will not appear in their news feed.
- User Content - Members can create their own albums, upload videos, photos and music.
- You can invite friends.
Cons
- The wiki menu cannot be placed on the main page; it is always hidden behind a tab.
- The group is not shown prominently in members' profiles: it is visible only if you expand the additional profile information, and even then it gets lost in the long list of other groups.
Public
A public page, or public, is better suited if you do not plan to create a platform for general discussion and only want to broadcast information to subscribers. By analogy, a group is a hobby club and a public is a newspaper: in the first people discuss, in the second they only receive information. A public is therefore ideal for publishing news.
Pros of public
- You can place a menu with wiki markup directly on the main page, pinning it to the top.
- The public is displayed in subscribers' profiles in the “interesting pages” block. Their friends will see it there and may also become interested in the public.
- You can moderate posts from subscribers, which first appear in the “Suggested News” and only after the approval of the administrator are published in the feed.
Cons of public
- You cannot invite friends to join the community.
- There are no privacy settings - anyone on the Internet can view public posts.
What they have in common
Both the group and the public have a number of common features and functions:
- detailed statistics: visits, subscriptions, likes, reposts and comments;
- the ability to conduct live video broadcasts;
- the ability to respond in person to users on behalf of the community;
- availability of widgets and applications;
- function of creating “products”;
- transfer of payments.
Therefore, when choosing a page format, it is important to take into account the characteristics of the audience and business area. For example, if you are breeding puppies or kittens for sale, it would be more logical to create a group, since customers will probably want to share their impressions after the purchase, communicate with other pet owners, and post photos and videos.
If you have an advertising agency, it is more logical to create a public page, since it is unlikely that clients have anything to discuss - they have different areas of interest. It is only important for them to get the necessary information from you, look at past cases, and make sure of your professionalism.
Events and personal profile
Let us dwell separately on two less popular formats of VKontakte pages.
Event
This type is suitable if you are promoting a specific event: a concert, exhibition, conference, training, master class - anything that has a specific date. You can invite other users to the event and promote it using internal advertising.
Personal profile
A personal page will be useful in promoting a business if it is firmly tied to the personality of the owner. This is relevant for wedding hosts, photographers, freelancers, celebrities, business coaches, experts in their fields.
Subprojects
Audio and video services are secondary to the social network, and the project’s creators do not focus on them much. This is mainly because they rarely relate to the main purpose of a social network, communication, and they also create a large number of problems. Video traffic is the project's main expense, on top of the well-known problems with illegal content and claims from copyright holders. 1000-1500 servers are used to transcode video, and the video is also stored on them. When media files are deleted at the request of copyright holders, they are banned by hash, but this is ineffective, and there are plans to improve the mechanism. Presumably this means developing a more intelligent algorithm for recognizing audio and video content by tags, as implemented, for example, on YouTube, where an uploaded video that violates a license can be removed automatically within minutes of uploading.
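A short sketch shows why banning by hash is easy to get around. The source does not name the hash function; SHA-256 is assumed here, and the function names are hypothetical.

```python
import hashlib

banned_hashes = set()


def file_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ban(data: bytes) -> None:
    """Remember the hash of a file removed at a rights holder's request."""
    banned_hashes.add(file_hash(data))


def is_banned(data: bytes) -> bool:
    return file_hash(data) in banned_hashes


original = b"...bytes of a removed video..."
ban(original)
print(is_banned(original))            # True: an identical re-upload is caught
print(is_banned(original + b"\x00"))  # False: one extra byte changes the hash
```

Any re-encoding, trimming or metadata change produces a different file and therefore a different hash, which is why recognition based on the content itself is needed.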
As you know, some time ago it became possible to communicate on VKontakte via the Jabber protocol (aka XMPP). The protocol is completely open and there are many open-source implementations. For a number of reasons (including problems integrating with other VKontakte services), the team decided to write its own server within a month, which would act as a layer between VKontakte’s internal services and the XMPP protocol implementation. It is implemented in node.js: the choice was made because almost all of the project's developers know JavaScript, and because it offers a good set of tools for the task. The difficult part was working with large contact lists. Many users have hundreds or thousands of VKontakte friends, and statuses change very actively: people go online and offline more often than in other comparable services. In addition, close integration with VKontakte's internal private messaging system had to be implemented. As a result, the service has 60-80 thousand people online, and 150 thousand at peak. The HAProxy TCP/HTTP load balancer handles incoming connections and is used to distribute requests across servers and to deploy new versions.
When choosing a data storage system, the developers considered non-relational stores (in particular, MongoDB), but in the end decided to use the familiar MySQL. The service runs on 5 servers of different configurations, each of which runs node.js code (4 processes per server), and the three most powerful ones also run MySQL. An interesting feature is that friend groups in XMPP are not linked to friend groups on the site: this was done at the request of users who did not want friends looking over their shoulder to see which group they had been placed in.
An important subproject is also integration with external resources, which is far from easy to implement in a highly loaded service. Increasingly, on the pages of third-party projects you can see “Like” widgets, which allow you to quickly share an interesting post with your friends, as well as small “We are VKontakte” blocks with data about users within the linked group. The main steps taken in this direction, with some comments:
- Maximum cross-browser compatibility for widgets and IFrame applications based on the easyXDM and fastXDM libraries, which ensure interaction between a third-party resource and the VKontakte program interface. Thus, the problem of cross-domain interaction and the issue of working in all browsers was solved.
- Cross-posting statuses on Twitter, implemented using request queues.
- A “share with friends” button that supports openGraph tags and automatically selects a suitable illustration (by comparing the contents of the tag
- Ability to download videos through third-party video hosting sites (YouTube, RuTube, Vimeo, etc.).
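The queue-based cross-posting mentioned in the list above can be sketched as follows. The point of a queue is that the request that updates a status returns immediately, while a background worker talks to the slow external service. All names are hypothetical, and the external call is replaced by a placeholder that only records what would be sent.

```python
import queue
import threading

outbox = queue.Queue()
delivered = []


def send_to_external_service(user_id, text):
    # Placeholder for the HTTP call to the external API (hypothetical).
    delivered.append((user_id, text))


def worker():
    while True:
        job = outbox.get()
        if job is None:  # sentinel: stop the worker
            outbox.task_done()
            break
        send_to_external_service(*job)
        outbox.task_done()


def update_status(user_id, text):
    """Request path: enqueue the cross-post and return immediately."""
    outbox.put((user_id, text))


thread = threading.Thread(target=worker)
thread.start()
update_status(1, "hello")
update_status(2, "world")
outbox.put(None)  # no more jobs
outbox.join()     # wait until every queued job has been processed
thread.join()
print(delivered)  # [(1, 'hello'), (2, 'world')]
```

If the external service is slow or temporarily unavailable, jobs simply wait in the queue instead of slowing down page requests.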
Business goals and objectives that can be solved using VKontakte
There is only one goal here: to increase the company's profits. It can be achieved in different ways or by a combination of them. It is therefore important to make the goal specific and to formulate concrete tasks with measurable indicators. For example, “I want to sell 10 pairs of sneakers through a group every week” or “I want to increase website traffic from social networks by 10%”.
A VK page can solve at least seven business promotion problems.
1. Working with negativity. If people regularly leave reviews of your products or services on the Internet, it is better for them to do so in your group. This lets you deal with negative feedback, console or reassure a dissatisfied client, and, if the client is being completely unreasonable, show through your polite and sympathetic responses that such a review should not be trusted. It will also help you find out whether there really are weaknesses in your work: slow delivery, customers being deceived by an unscrupulous manager, and so on.
2. Presentation of goods. Thanks to the “Products” function, you can create a catalog of products or services with illustrations, prices, characteristics and an order button. This saves the time otherwise spent listing the assortment to a client in correspondence or by phone, and generally increases sales. Users add products to their favorites, share them with friends and repost them, expanding your customer base.
3. Increasing website traffic. With proper promotion, social networks become an additional source of traffic to the site. Write intriguing announcements of the news and articles on your website and publish them on your page along with a link. Add links to the price list, the order form and detailed descriptions of services, and traffic will grow.
4. Audience research. In a group, you can regularly run surveys to understand the preferences and wishes of your clients: which product interests them more, what time they prefer to visit, which competitors they go to besides you, how they feel about promotions, what they would suggest improving. This data will help the business develop faster.
5. Informing customers. Using social networks, you can instantly inform customers about important changes: closing for renovation, the arrival of a new product, the launch of a profitable promotion or referral program, changes in working hours, the opening of a new store, a move to another office.
6. Expanding the audience. If you publish high-quality, interesting content that users want to share on their profiles, it will attract new members to the group, some of whom will become your clients.
7. Increasing trust. Thanks to a group or page on VK, you can communicate with clients and show your expertise by answering questions in the comments or publishing useful materials on your topic. For example, if you own a flower shop, write an article with tips on arranging bouquets or choosing flowers for different occasions. If you have a car service center, write a review of the best online auto parts stores. Subscribers will be grateful.
Not a secret
The veil of secrecy over VKontakte's technical implementation has been lifted a little and many interesting details have been published, but much still remains a secret. Perhaps more detailed information will appear in the future about VKontakte’s own DBMS, which, as it turns out, is the key to solving the hardest problems of system scalability. However anyone feels about VKontakte, the service is very interesting from the point of view of building highly loaded systems. After all, 11 billion requests per day, the highest uptime and almost 100 million users count for a lot.