We just shipped a new version of the Twitter app with a brand new search experience that blends the most relevant content - Tweets, user accounts, images, news, related searches, and more - into a single stream of results. This is a major shift from how we have previously partitioned results by type (for instance, Tweet search vs. people search). We think this simplified experience makes it easier to find great content on Twitter using your mobile device.
A typical search scores items of the same type and picks the top-scoring results. In a blended search experience, this is not straightforward. The scores of different content types are computed by different services, and thus not directly comparable for blending. Another challenge is to decide which type of content to mix, as not all content types are always desirable to display. This post discusses our approach to solving these challenges.
When a user searches, different types of content are searched separately, returning a sequence of candidate results for each content type with a type-specific score for each. For certain content types that are displayed as a single group or gallery unit, such as users or images, we assign the maximum score of results as the representative score of this content type. The result sequences for some content types may be trimmed or discarded entirely at this point.
Once results of different content types are prepared, each type-specific score is converted into a universally compatible score, called a “uniscore”. Uniscores of different modules are used as a means to blend content types as in a merge-sort, except for the penalization of content type transition. This is to avoid over-diversification of content types in the blended result.
Fig. 1: Search ranker chose News1 followed by Tweet1 so far and is presented with three candidatesTweet2, User Group, and News2 to pick the content after Tweet1. News2 has the highest uniscore but search ranker picks Tweet2, instead of News2 as we penalize change in type between consecutive content by decreasing the score of News2 from 0.65 to 0.55, for instance.
Individual content is assigned a type-specific score, which is called a “raw” score, by its corresponding service. To facilitate blending and ranking content of different types as described above, raw scores are converted into uniscores using type-specific log-linear score conversion functions – where the chance of a converted score to take its value in [0, 1] is at least 95%, as estimated from observed dataset.
Content selection and boosting
Certain types of content may not have many relevant items to show for a particular input query, in which case we may choose not to include this type of content in search results. In other cases, for instance if query volume or matched item counts have an unusual spike (what we call a “burst”), we show this type and may also boost it to appear at a higher place in the results. To facilitate this, we represent trends in searches or matching result counts as a single number that is proportional to the level of “burstiness”.
For example, consider measuring “burstiness” for the number of images and news content matching the query “photos”. We first obtain three sequences of term frequencies, e.g. :
Fig. 2 : Three sequences of number of Tweets over eight 15 minute buckets from bucket 1 (2 hours ago) to 8 (most recent).
Tweet : counts of Tweets that match query “photos”.
Image : counts of Tweets that match query “photos” and contain image links.
News : counts of Tweets that match query “photos” and contain news links.
Query “photos” is shown not only to match Tweets with image links more than those with news links but also is increasing over time.
Our approach to compute the burstiness of image and news facets is an extension of original work by Jon Kleinberg on bursty structure detection, which is in essence matching current level of burst to one of a predefined set of bursty states, while minimizing too diverse a change in matched states for smooth estimation .
In our extension, burstiness of mixable content types including images, users, news, and tweets are computed simultaneously to reflect relative difference in bursty levels between different types and used the distance of observed rate from each state’s bursty level as state cost. This is because accurately estimating probability of occurrence is infeasible for real-time estimation due to expensive computational cost and possible introduction of zero intervals between probability states due to numerical approximation. Optimal state sequences for images and news are estimated as shown in Fig 3.
Fig. 3 : Normalized image and news counts are matched to one of n=5 states : 1 average, 2 above, and 2 below. Matched states curves show a more stable quantization of original sequence which has the effect of removal of small noisy peaks.
Finally, burstiness of each content type is computed as an exponential moving average of state IDs in the optimal state sequence. As shown in Fig. 3, jointly optimizing the sum of state cost and transition cost yields a smooth quantization of original sequence, which automatically filters out small noisy peaks in original counts. Also, this maps both trending (bursty) and steadily high sequences to a high burstiness value.
Burstiness computed this way is used to filter out content types with low or no bursts. It’s also used to boost the score of corresponding content types, as a feature for a multi-class classifier that predicts the most likely content type for a query, and in additional components of the ranking system.
 J. Kleinberg, Bursty and Hierarchical Structure in Streams, Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002. (PDF)
Posted by Youngin Shin
Did someone say … cookies?