Thursday, March 15, 2012


Popularity analysis for merging geotagged data

Tanel Tammet

The new site http://www.sightsmap.com integrates wikipedia, wikitravel, foursquare and panoramio on a popularity heatmap.

We do not use any "professional" datasets: all the information shown on the maps (except the underlying google maps, of course) is crowdsourced by panoramio, wikipedia, wikitravel and foursquare users.
The key technology in achieving the right merging and presentation of crowdsourced data is combined popularity analysis of locations and wikipedia articles.

First we created a visual popularity heatmap of the whole world, using both the number of photos and the number of distinct photographers in the panoramio database for each area, starting from a coarse-grained whole-world grid and descending through six successive layers of smaller and smaller grids with higher and higher resolutions. The data used comes from the panoramio public api. The colour of each pixel on the heatmap is calculated with a square-root-like function of the photo and photographer counts of the corresponding grid square, with slight modifications to the function for each grid layer.
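As a rough illustration of this kind of scaling - the weights and the per-layer adjustment below are purely illustrative constants, not the ones actually used:

import math

def heat_intensity(photo_count, photographer_count, layer_scale=1.0):
    """Map raw panoramio counts for one grid square to a 0..255 heat value.

    Illustrative only: the real per-layer adjustments differ, but the idea is
    a square-root-like compression so that a few extremely popular squares do
    not wash out the rest of the map."""
    # Weight distinct photographers higher than raw photo counts to reduce the
    # effect of a single user uploading many photos of one spot.
    raw = photo_count + 3 * photographer_count
    # Square-root-like compression, slightly adjusted per grid layer.
    value = math.sqrt(raw) * layer_scale
    return min(255, int(value))

print(heat_intensity(photo_count=400, photographer_count=120, layer_scale=8.0))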

The next step was to create a popularity index for geotagged wikipedia articles, excluding the ones with clearly non-geographical content, like people. The geotagged articles are obtained from the dbpedia project and the popularity data for wikipedia is obtained from the wikipedia log files: we used two full days of logfiles, one in summer and one in winter. The popularity rank of the articles is further modified by the type and additional properties of the article as given by dbpedia: for example, if an article concerns a world heritage site, it receives a strong additional bonus.
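As a sketch of what such a ranking computation could look like - the type names, bonus values and the logarithmic compression below are illustrative assumptions, not the tables actually used:

import math

# Hypothetical bonus multipliers keyed by dbpedia types; the real table and
# values are different and not reproduced here.
TYPE_BONUS = {
    "WorldHeritageSite": 3.0,
    "Museum": 1.5,
    "Building": 1.2,
}

def article_rank(summer_views, winter_views, dbpedia_types):
    """Combine two days of wikipedia page view counts with a type-based bonus."""
    views = summer_views + winter_views
    base = math.log1p(views)   # compress the heavy-tailed view counts
    bonus = max((TYPE_BONUS.get(t, 1.0) for t in dbpedia_types), default=1.0)
    return base * bonus

print(article_rank(12000, 9000, ["Building", "WorldHeritageSite"]))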

The combination of the two datasets - heatmap and wikipedia - uses an algorithm which looks for the most highly ranked wikipedia articles geotagged around the top heatmap spots for each subgrid on each layer. First we cluster the heatmap dots to avoid showing lots of markers very close to each other. Then we look for the most popular wikipedia articles near the hotspots: the higher-ranked a heatmap spot is, the larger the area to search. If nothing is found or the found article has a much lower popularity than the heatmap spot, we do not attach anything to the hotspot. Otherwise we connect a hotspot to the wikipedia article plus the corresponding wikitravel article, if available.

Knowing the highest-ranked wikipedia article for an area also helps with googling for more: the markers additionally give a direct search link for the title.

In addition to the six world-covering layers of successively smaller grid steps with higher resolutions, we created separate ultra-high-resolution heatmaps for the top 15000 hotspots, most of them cities. The resolution of these heatmaps depends on the popularity rank of the hotspot: the more photos, the higher the resolution, up to street level for the top 500.

The ultra-high-resolution heatmaps are then populated with combined wikipedia and foursquare markers for the top spots, using an algorithm which first tries to associate wikipedia articles and foursquare venues with the most popular places on the map, and finally merges the top wikipedia and foursquare entries into the mix, even if they are not located near a visually attractive spot. Unlike the panoramio and wikipedia data, which cover the whole world, the foursquare data is obtained via the public api only for the areas surrounding the hotspots.

Again, we exclude both geotagged wikipedia articles and foursquare locations with obviously non-geographic or non-sightseeing types. We add bonuses to articles and locations based on the suitability of their type: for example, castles, churches and public squares get different bonuses.

The end result is a large set of overlay png tiles for each resolution along with their associated json datafiles containing wikipedia/wikitravel/foursquare places, ranks and types.

When showing the map in the browser, we calculate the necessary tiles and json datafiles in javascript each time the user viewport changes, either by panning or zooming. The new files are loaded, the visible tiles are presented and the top location data objects are shown as markers, color-coded by their relative popularity in the visible area. The default view of the inside-city markers is geared towards sightseeing. If the user chooses to see places to eat, drink or sleep instead, we simply show the foursquare locations of the corresponding types, ranked purely by the total number of visitors to the foursquare location. Since all the map layers and datafiles are precomputed, the whole application runs in the browser with nothing done on the server side except serving static files.
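For illustration, here is the kind of viewport-to-tile arithmetic involved, written as a Python sketch rather than the javascript running on the site, and assuming standard web-mercator tile indexing:

import math

def latlng_to_tile(lat, lng, zoom):
    """Convert WGS84 coordinates to web-mercator tile indices at a zoom level."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tiles_for_viewport(north, south, west, east, zoom):
    """Enumerate the tile indices covering a viewport; each (zoom, x, y)
    would name one precomputed overlay png and its json datafile."""
    x0, y0 = latlng_to_tile(north, west, zoom)
    x1, y1 = latlng_to_tile(south, east, zoom)
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

print(tiles_for_viewport(59.45, 59.42, 24.72, 24.78, zoom=13))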





Saturday, July 23, 2011

How Sightsplanner searches

In this article we will examine more of the inner workings of Sightsplanner - it's time to see how to construct a recommendation that consists of multiple items, each individually rated according to how well it matches the user's profile. There is a lot of ground to cover, so we will be glossing over some details.

We will begin by explaining how the recommendation is modeled as an optimization problem and look at the method of solving it - iterative search. Starting with a simple greedy search we will add features one by one, arriving at a more evolved meta-heuristic called GRASP. Also, an example of the search algorithm at work will be given.

The optimization problem

Each recommended item (a tourist attraction) has a predicted value, or utility, associated with it. Items are combined into candidate solutions, each of which in turn has an aggregate score; we discussed approaches to computing it in the earlier articles "Rating the tours" and "The balancing act". The aggregate score is what we are ultimately seeking to maximize. Each item also has an associated cost: the static cost of visit time, but also the dynamic cost of traveling to the location and possibly waiting until the attraction opens. Finally, there are constraints - the main one being the total time available for the trip, but items can also have opening and closing times which form time windows.

The most general description that applies in our case would be that we are attempting to solve a constraint optimization problem. We look at combinations that satisfy given constraints, while attempting to minimize or maximize some cost function - in our case, the aggregate score of the solution. However, this is probably too general to be useful - instead, we should look at more specialized classes of problems which also have more specialized, and therefore better-suited, algorithms available.

The Traveling Salesman Problem (TSP) is a well-known one, although in its classical form of shortest-path optimization it does not quite apply to tourist recommender systems - rather, we have something more closely related to TSP with profits, also called the orienteering problem - we're looking for the shortest path (so we can best use our time) with the highest profit. Adding the constraint of time windows, we get an even more specific class of problems, namely the orienteering problem with time windows (OPTW).

Due to the presence of balance requirements that conflict with maximizing the individual item score (again, see here), the most naturally fitting model is the bi-objective orienteering problem (metric 1 is item score, metric 2 is category balance) - however Sightsplanner currently avoids the approach of multi-criteria optimization in order to present a single solution to the user.

The problem classification may appear as a strictly academic exercise, but it is quite relevant because each class of problems also has a range of solution methods being developed and refined - we can pick one of the established methods as a starting point, instead of re-inventing the wheel. However, in this article, we will do just that - re-invent the search algorithm in order to illustrate the logic behind its design.

Simplistic approaches

It should be common knowledge, but nevertheless worth mentioning, that the most fundamental approach to solving optimization problems is trying all the possible combinations and comparing the results to see which one is best. Another bit of common knowledge is that this exhaustive search approach is usually too time-consuming to be practical. All practical approaches stem from examining only a part of this solution space - we try some combinations and ignore the (vast) majority of others, the tricky part being how to decide which combinations are worth looking at.

A simple search method, which is very fast and may give reasonable results, goes as follows. Start by examining all the items we can use in constructing our recommendation and pick the one that has the highest rating (and fits into the solution at the current step). Add this item to the solution, then examine the remaining items, seeking the one that has the next highest rating. Keep repeating this procedure until our solution is filled - this is called pure greedy search, the greedy part being that our heuristic for guiding the search is to always pick the item with the highest individual ranking, hoping that this will result in a good overall score. This search only requires one iteration, since repeated applications would produce identical results - hence the speed.
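A minimal sketch of such a pure greedy construction, ignoring travel times and time windows for simplicity (the item structure and the numbers are just an example):

from dataclasses import dataclass

@dataclass
class Item:
    name: str
    score: float      # individual match score
    visit_time: int   # minutes

def pure_greedy(items, total_time):
    """Pure greedy construction: always take the highest-scoring item that
    still fits into the remaining time. One pass, no backtracking."""
    solution, used = [], 0
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        if used + item.visit_time <= total_time:
            solution.append(item)
            used += item.visit_time
    return solution

items = [Item("steeple", 1.0, 30), Item("church", 0.8, 60),
         Item("restaurant", 0.7, 100), Item("shopping center", 0.3, 60)]
print([i.name for i in pure_greedy(items, total_time=180)])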

The greedy search on steroids

With some modifications, the humble greedy search can be turned into a fairly sophisticated, yet simple to understand and easily tunable algorithm. First let's turn our attention to the most glaring weakness - only being able to examine one solution candidate. Each recommended item incurs a direct cost (in case of trip planning, the time spent visiting the attraction), which itself is easy enough to handle even by the pure greedy search - just modify the item ranking by cost, which gives an expected cost-benefit ratio. However, each item also incurs an indirect cost - by visiting one attraction towards south, we may cause the cost of items towards the north to increase because of the increased travel time. So it's certainly possible that instead of going towards the south to visit the lone highest-ranking item there, you might want to head north and visit multiple items there that also have a high rank, just not the highest. There is really no way around this problem other than iterative search, meaning that we need to examine many alternative solutions.

The first modification we add is randomness. Instead of picking the highest-ranking item at each step, we pick a random item, but preferably near the top of the ranking list. Now we can repeat our cycle multiple times and get a different sequence of items each time. Note that we are not losing anything compared to the original algorithm - we may run our first iteration so that we're picking the top-ranking item on each step and introduce randomization at the subsequent iterations, so we're guaranteed to get the same solution as the pure greedy search - plus a number of alternative solutions.

If we are willing to spend more time at each iteration, we can refine our search further. Until now it has been implied that we append items to the solution step-by-step, meaning that we start from location s, then add item A, then B and so on. This is computationally fastest, but not necessarily the best - as long as the transposition s, A, B → s, B, A is possible, it might be worth checking if it reduces the cost, as we may expect the value to be the same*. So, at each step of the iteration, we instead examine every possible slot in the solution we can insert the item into, and go with the one that incurs the lowest overall cost. By doing so we have introduced local search in our algorithm. As this includes the solutions that would have been produced by adding items sequentially, we are again not missing anything, compared to the previous version of the algorithm.

Before introducing the final modification, let's return briefly to the discussion in "The Balancing Act". In short, there are multiple categories of items and for the sake of diversity, we would like many categories to be represented in the final solution. At the same time, categories are not equal and some may consistently have higher-ranking items than others. To address that, we described a formula to calculate a score for the recommendation - the formula rewards including items from all the categories present in user profile, regardless of their individual ranking. Until this point, our search algorithm has been ignorant of this policy.

Observing that the category-based scoring model makes the utility of a candidate item dependent on the items already present in the solution, we are now ready to correct the search algorithm accordingly. Originally we ordered items by individual ranking, picking the top item. Then we modified this to pick the top item and random items near the top, but the ordering was still done by individual ranking. However, if at each step we calculate the utility for each item that is dependent on the solution under construction, we may use this value to order the candidates instead**. Here's the pseudo-code that includes all the improvements discussed so far:
BestSolution := empty
START ITERATION i
    initialize the Solution
    DO
        compute utility value and best place in Solution for each item
        IF i = 1 THEN
            NextItem := item with highest utility value
        ELSE
            NextItem := random item with a high utility value
        END IF
        insert NextItem in Solution at the best place
    WHILE Solution does not exceed total time available
    IF Solution > BestSolution THEN
        BestSolution := Solution
    END IF
END ITERATION

* - glossing over the details here - in some cases the value won't be the same, for example (pool, restaurant) is preferable to (restaurant, pool).
** - this is not the only motivation for re-ranking the items between search steps, but it's the most pertinent to Sightsplanner.
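To make the control flow concrete, the pseudo-code above can also be rendered as a small self-contained Python sketch. It is not the Sightsplanner implementation: travel times, time windows and the search for the best insertion place are left out, the item structure and the utility function are toy stand-ins, and the final scoring is a plain sum.

import random
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    category: str
    score: float
    visit_time: int   # minutes

def utility(item, solution):
    # Toy adaptive heuristic value: damp the score of a category that is
    # already represented in the partial solution, to encourage diversity.
    repeats = sum(1 for s in solution if s.category == item.category)
    return item.score / (1 + repeats)

def grasp(items, total_time, iterations=50, rcl_size=3, seed=0):
    rng = random.Random(seed)
    best, best_score = [], 0.0
    for i in range(iterations):
        solution, used, pool = [], 0, list(items)
        while True:
            feasible = [x for x in pool if used + x.visit_time <= total_time]
            if not feasible:
                break
            feasible.sort(key=lambda x: utility(x, solution), reverse=True)
            # First iteration behaves like pure greedy, later iterations pick
            # randomly from the few best candidates.
            nxt = feasible[0] if i == 0 else rng.choice(feasible[:rcl_size])
            solution.append(nxt)
            used += nxt.visit_time
            pool.remove(nxt)
        score = sum(x.score for x in solution)   # stand-in for the real scoring model
        if score > best_score:
            best, best_score = solution, score
    return best, best_score

items = [Item("castle", "architecture", 0.9, 60),
         Item("old town", "architecture", 0.8, 90),
         Item("art museum", "museums", 0.7, 90),
         Item("harbour walk", "outdoor", 0.5, 60)]
print(grasp(items, total_time=180))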

GRASP

The algorithm we have constructed is an instance of Greedy Randomized Adaptive Search Procedure (Full article from Metaheuristics Handbook available here). The other parts of the name are probably self-explanatory at this point - the word adaptive refers to re-calculating the utility value (referred to as heuristic value in GRASP terminology) at each step of the search.

Classical GRASP is somewhat different in that the local search is performed once at each iteration - in Sightsplanner it's integrated into heuristic value calculation. This approach was pioneered by another recommender (journal article) and seems to work well in practice.

So far we haven't really explained how the items are picked randomly while still favoring the high-ranking ones. If we take the queue of items sorted by their utility value and draw a line at either the top 30% of the items, or at the items that have at least 70% of the utility value of the top item in the queue, we are saying that α=0.3. The next item is chosen randomly, based on a uniform distribution, from the items above that line. As noted in the Handbook of Metaheuristics, if α=0, then we have a pure greedy search, while α=1 corresponds to random construction. This is a powerful feature of GRASP that allows us to control the greediness/randomness, even automatically. The list of items that the random selection is made from is called the restricted candidate list (RCL).
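A short sketch of the value-based cut described above, using the "at least 70% of the top utility for α=0.3" rule; the candidate names and numbers are illustrative:

import random

def restricted_candidate_list(candidates, alpha):
    # Value-based RCL: keep candidates whose utility is at least (1 - alpha)
    # times the best utility. alpha = 0 keeps only the top candidate (pure
    # greedy), alpha = 1 keeps everything (purely random construction).
    best = max(u for _, u in candidates)
    threshold = (1 - alpha) * best
    return [(name, u) for name, u in candidates if u >= threshold]

candidates = [("castle", 0.9), ("museum", 0.7), ("park", 0.4), ("mall", 0.2)]
rcl = restricted_candidate_list(candidates, alpha=0.3)
print(rcl)                 # [('castle', 0.9), ('museum', 0.7)]
print(random.choice(rcl))  # the next item: a uniform pick from the RCL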

Sightsplanner's search at work

The following picture shows the first step of a full iteration in Sightsplanner's GRASP search. For illustrative purposes, we've taken up the task of recommending colored balls - the colors may be thought of as a metaphor for different categories of items. We begin by computing the heuristic values; these are then used to order the items to produce the restricted candidate list. Not all items necessarily make it to the candidate list at each step - a real-world example would be an attraction that is too far from the starting location to make visiting it viable initially, but if our trip takes us closer to the attraction it becomes viable. Here the blue ball rated at 0.8 is unusable initially.



When the second step begins, the heuristic values of red balls are shifted (while their individual item ratings remain intact) - this is because we'd like our recommended set to be as diverse as possible. The heuristic value calculation is capable of predicting that including too many red balls would be counter-productive.



The third step is similar, except that green balls are the only ones that are not represented in the solution yet, so everything else gets penalized. The size of RCL has shrunk - as in the actual Sightsplanner program, the size is limited by value and only two balls are above the alpha threshold. Here we also see that the insertion point of a new item may be between items already present in the solution.



If our initial goal was to find three balls, then our iteration is complete. We may compare this solution to earlier ones and re-initialize the state for the next iteration. The current solution can either be cleared completely, or some items may be left in place to seed the next solution - this makes the iterations faster, but more prone to getting stuck in local maxima. For instance, if the red ball rated 0.9 is left in the solution, a promising alternative where the red ball (1.0) is present is unlikely to be found, at least during the next iteration.

Summary

We've looked at a meta-heuristic search method called GRASP. The algorithm can be derived by taking a primitive greedy search and adding features, until we have enough domain-specific knowledge included to produce recommendations under the constraints of time windows, total time and the scoring model that emphasizes category balance. The algorithm is not yet widely used in trip planning recommender systems - we are only aware of Citytrip Planner, where the approach was originally tried.

Sunday, June 5, 2011

The Balancing Act

In http://sightsplanner.blogspot.com/2011/05/rating-tours.html we discussed the method of rating the output of heuristic search so that the tours with individually highest-ranked items would be preferred. Having a reliable rating method is the basis of guiding the search and selecting the best from the candidate solutions.

One method of differentiating the tourist attractions in Sightsplanner database is their popularity property - a number that describes the relative activity of tourist visits to the attraction. The most active site currently has 2500 daily visits, while some obscure sites may go as low as 1. A typical museum might have 100 visits. The popularity property is aggregated into the final item score, so that with two otherwise identical items, the one with a higher popularity property will receive a higher score.

Although the ability to differentiate between similar-looking (to the machine) churches or museums is highly useful for enhancing the user experience, it also introduces side effects. In a historically attractive town like Tallinn, historic sites like medieval architecture are far more popular among tourists than sports and outdoor activities - in other words, items in some categories will tend to have a higher score on average than items in other categories.

From statistics taken from the Sightsplanner database of 835 tourist attractions, dated 01.03.2011, the average values of the popularity property were the following (click on the image to see the graph in its full glory):



The categories represent the seven sliders from Sightsplanner homepage. Even though these averages don't provide us with the exact statistical distribution, it would not be unusual if a large number of architecture-related items had higher popularity than the best sports-related item in the database.

So imagine we pushed both the "Architecture & City" and "Sports & Outdoor" sliders to the maximum, leaving the remaining ones at the minimum value, then asked Sightsplanner to give us a recommendation. Both categories of items should match our preferences with a similar level of confidence, but after adjusting for popularity, the top-ranked items would be mainly, or even exclusively, architecture-related. Applying the methods described in the "Rating the tours" article, the recommender would then accordingly attempt to produce tours that consist of architecture objects. Does this meet the user's expectations?

To answer that question, we will take a step back and have a look at the input of the preferences from the recommender homepage.

1. Interpretation of sliders

Events: 0
Architecture & City: 1.0
Museums & Arts: 0.5
Eating out: 0

In this sample of sliders, 0 marks the leftmost position, indicating that there is no interest in that category. The rightmost position translates to the value 1 and the positions in between to fractional values.

In order for the recommender software to produce output that matches the user's preferences, we must, as the first step, unambiguously define what this configuration of sliders means. There are two plausible interpretations:

Interpretation 1: I would like to spend 66% of my time seeing architecture and 33% of my time visiting museums
Interpretation 2: I like architecture twice as much as museums (and hate the rest). Thus architecture sites on the average have a match score of S, museums have S/2 and the rest have 0.

If we could pick one of these as the optimization criterion and ignore the other, we would have a straightforward way of searching for the maximal solution. As we saw, introducing the generally useful notion of popularity as an item score component produced the effect where one category could occupy all of the top spots in the item rankings - so using the first interpretation would help to ensure that the categories receive fair representation. The choice of time as the method of determining shares is arbitrary, but seems better suited than counting the items. On the other hand, the second interpretation remains one of the main design concepts of Sightsplanner and cannot be discarded either - in other words, we need to accommodate both interpretations.

The goals of the recommendation have then become:
- I would like each attraction in my tour to be as high-rated as possible
- I would like each category that I've expressed interest in to receive a fair share of the tour

2. Tackling the categories

The individual item ratings as an optimization criterion are already addressed by our tour ranking method. The redefinition of goals has upgraded our problem classification into the field of multiple-criteria optimization - we should now also observe whether all the categories present in the user profile are present in the candidate solutions and favor those where the proportions of categories by time are similar to what was indicated in the profile. We shall call this new criterion the balance criterion, as it represents the balance between categories.

There are two general methods of doing multiple-criteria recommendation - one is aggregating all the criteria into a single value and optimizing for that. The other is to treat the criteria separately, using a specially-designed search algorithm that produces a set of Pareto-optimal solutions. This set is then presented to the user.

In the current concept of Sightsplanner the recommender is expected to return a single optimal solution. For that purpose, computing the Pareto set does not appear to add much value, as automatically picking a single solution from the set would still require computing a single aggregate score for each of the set members.

We now know that we need to arrive at a single rating value that describes both the average individual item score and the balance of categories. The formula we start with is:

m = sum(m_i) / (n + t_u / t_min)

(m_i is the individual item score, n is the number of items and t_u is the unused time. Also the notation of t_avg has been dropped in favor of t_min, the minimal visit time of a recommended item).

Observe that the score formula already contains the notion of "unused" time. This is a construct we can exploit further. Consider our target time shares of 33% museums and 66% architecture from the sliders example - if the items in the candidate solution indeed make up these proportions, we can call the time spent on this tour "well-utilized". If, on the other hand, the items consisted of 80% architecture and 20% museums, we would have an extra 14% of architecture that was not needed. In this case 86% of the time was well utilized.

This fits rather well into our score formula. Instead of looking at the average score of all the items, we only consider those items that contribute to filling the expected category time shares. The rest of the items are discarded, so that the search is no longer rewarded for piling items from a single top-ranking category. The time that was left over from filling the category shares is added to the otherwise unused time.

The new, Utilized Time Model formula:

m = sum(m*_i) / (n* + (t - t*) / t_min)

Where m*_i denotes the individual item score of a contributing item, n* is the number of contributing items and t* is the "well utilized" time. When there are more items in a category than required to fill it, the items are prioritized by their individual score - the higher-scoring ones are included first. The items are treated as discrete, so in case there is an item without which a category would remain unfilled, but adding it "overflows" the category, we allow the item to contribute to the score if more than half of its visit time is useful.
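A simplified sketch of this scoring - the data structures are hypothetical, travel time is ignored and the handling of partially useful visits is only approximated:

from dataclasses import dataclass

@dataclass
class Visit:
    category: str
    score: float       # individual item score m_i
    visit_time: int    # minutes

def utilized_time_score(visits, quotas, total_time, t_min):
    # m = sum(m*_i) / (n* + (t - t*) / t_min): only visits that help fill a
    # category's time quota contribute; within a category the higher-scoring
    # visits are counted first, and a visit that overflows its quota still
    # counts if more than half of its time is useful.
    contributing, utilized = [], 0
    for category, quota in quotas.items():
        remaining = quota
        in_cat = sorted((v for v in visits if v.category == category),
                        key=lambda v: v.score, reverse=True)
        for v in in_cat:
            if remaining <= 0:
                break
            useful = min(v.visit_time, remaining)
            if useful > v.visit_time / 2:
                contributing.append(v)
                utilized += useful
                remaining -= useful
            # otherwise the visit is discarded and its time counts as unused
    unused_slots = (total_time - utilized) // t_min   # discrete imaginary items
    return sum(v.score for v in contributing) / (len(contributing) + unused_slots)

quotas = {"architecture": 113, "museums": 57}   # minutes
visits = [Visit("architecture", 0.9, 60), Visit("architecture", 0.8, 60),
          Visit("museums", 0.7, 60)]
# t_min = 60: the minimal visit time among the recommended items
print(round(utilized_time_score(visits, quotas, total_time=180, t_min=60), 2))  # 0.8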

3. Avoiding overdoing it

Sightsplanner computes the expected time shares of categories in minutes, so that a 100% share would be equal to the duration of the entire tour. This approach was chosen, because it makes the GRASP search inside the planner easier to guide - until the entire tour is filled with items, some categories will remain unfilled and the heuristic can prioritize the items appropriately while inserting them one by one.

If, on the other hand, we looked at the proportions of the items currently in the half-finished solution, we might think that after inserting a single museum into the solution, we have 100% museums, meaning that we have grossly overfilled our 33% expectation. This can confuse the heuristic and produce anomalies (an item that is actually needed may get a weak heuristic score because it incorrectly appears to be overfilling a category).

When using time quotas, meeting the balance criteria exactly is likely to be too demanding. Even if the planner gets the item proportions right, travel time between different attractions makes it impossible to fill the quotas given in minutes. This can be compensated for by reducing the quotas by some predetermined amount - in other words, relaxing the balance criteria.

More importantly, though, it is likely that the recommended items won't fit neatly, leaving categories under- and overfilled. Let's look at a different configuration of sliders:

Events: 0.2
Architecture & City: 1.0
Museums & Arts: 0.2
Eating out: 0.2

A value of 0.2 corresponds to a slider moved 1/5th from its leftmost, "uninterested" position. Let's assume that the duration of the tour is long enough so that our Utilized Time Model formula properly rewards the planner for including items from "small" categories. The shares are so small, however, that they consistently get overfilled even by a single item. Together, these extra minutes assigned to events, museums and eating out take a large portion of time away from the architecture category. It again comes down to interpreting user expectations, but it is not unreasonable to assume that for a tour of short duration, items of low priority should be excluded, instead of being included at the expense of a high-priority category.

To address this, we take a slightly different approach to relaxing the balance criteria. The quotas will be lowered more if the sliders are towards the "uninterested" position and less if they are towards the "interested" position. This creates some buffer time to travel between recommended attractions, as mentioned earlier, but mostly at the expense of categories that the user is less interested in.

We will omit the actual time quota table from Sightsplanner for brevity and just look at how our sliders example would be affected. A slider position of 0.2 is assigned a ratio of 0.5 and the slider with a position of 1.0 has a ratio of 1.0. Assuming that the tour length is 3 hours, we get the following time quotas:

Events: 180 minutes * (0.2/1.6) * 0.5 = 11 minutes
Architecture & City: 180 minutes * (1.0/1.6) * 1.0 = 113 minutes
Museums & Arts: 11 minutes
Eating out: 11 minutes

So everything except the architecture will be fairly insignificant and the planner won't be rewarded for recommending items in these categories, unless they have very short visit times (following the rule that at least half of the item's time should contribute to something for its score to count, an item needs to be under 22 minutes in this case).
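With the ratio table abbreviated to just the two values used above (0.5 for a 0.2 slider, 1.0 for a full slider), the quota calculation can be sketched as follows; rounding is half-up to match the figures in the text:

def time_quotas(sliders, tour_minutes, ratio):
    # Split the tour time between categories proportionally to the slider
    # values, then shrink each share by a ratio that depends on the level of
    # interest (weaker interest, bigger reduction).
    total = sum(sliders.values())
    return {cat: int(tour_minutes * (v / total) * ratio(v) + 0.5)
            for cat, v in sliders.items() if v > 0}

# Abbreviated stand-in for the actual ratio table: 0.5 for a weakly interested
# slider, 1.0 for a fully interested one.
ratio = lambda v: 1.0 if v >= 1.0 else 0.5

sliders = {"Events": 0.2, "Architecture & City": 1.0,
           "Museums & Arts": 0.2, "Eating out": 0.2}
print(time_quotas(sliders, tour_minutes=180, ratio=ratio))
# {'Events': 11, 'Architecture & City': 113, 'Museums & Arts': 11, 'Eating out': 11}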

4. Other considerations

Our scoring model allows one more simple method of refining the calculation - so far we've exploited the utilized and unused time constructs, the other major component is the individual item scores that contribute towards the average. These could be modified to encourage or discourage specific behavior of the planner. Let's look at an application for modifying the individual item scores.

Links are the parts of the tour that fall between two waypoints (recommended items). Sometimes there is a remotely located, but very interesting attraction that would contribute a significant amount to the overall score. It is possible that the high score outweighs the disadvantage of having to spend a large amount of time getting there, simply because all the alternative attractions are so much worse. In such a case, a recommendation where the user has to, for example, walk 60 minutes from one attraction to another becomes possible.

However, this would not happen if the remote attraction's score were lowered precisely because of its remoteness. This is fairly simple to implement: if the link leading to an item is longer than some specified amount of time, the item score is lowered. The planner is no longer rewarded for including it and might have to look closer. Note that the penalty is specific to that particular link, so the item might still be a healthy choice if we later end up somewhere closer to it.

Specifically, the operator of the recommender can set soft and hard tolerance limits (such as 30 min, 60 min) for the link length - we start lowering the item score once the link reaches the soft limit, and it reaches the minimum at the hard limit. The item score is then:

m*_i = \epsilon + (t_hard - l_i) (m_i - \epsilon) / (t_hard - t_soft)

\epsilon is some small score greater than 0 that makes this item more useful than not visiting anything at all; l_i denotes link length and m_i is the unmodified score.
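A direct transcription of this penalty into code, with the 30/60 minute limits from the example above and an illustrative value for \epsilon:

def link_penalized_score(m_i, link_minutes, t_soft=30, t_hard=60, epsilon=0.05):
    # Lower the item score linearly once the link leading to it exceeds the
    # soft limit; at the hard limit only a small epsilon remains.
    if link_minutes <= t_soft:
        return m_i
    if link_minutes >= t_hard:
        return epsilon
    return epsilon + (t_hard - link_minutes) * (m_i - epsilon) / (t_hard - t_soft)

print(link_penalized_score(0.8, 45))   # halfway between the limits -> 0.425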

Monday, May 9, 2011

Rating the tours

Sightsplanner is built around the premise that we have an idea what the user is interested in and are able to rank individual items - more specifically, tourist attractions - from our database according to that. Once we obtain such rankings, we have learned what the best-matching attractions are and putting the tour together should be a cakewalk.

Let's give it a try. Say we have 3 hours and our interest-matching calculation has provided these items from Tallinn city center:
(1) Oleviste church steeple (1.0), visit time 30 min
(2) Niguliste church (0.8), visit time 60 min
(3) Olde Hansa restaurant (0.7), visit time 100 min
(4) Viru shopping center (0.3), visit time 60 min

The numbers (like 1.0) are match scores, meaning that the Oleviste steeple appears to be best matched to the user's interests and besides churches, the user also has interest in dining and shopping.

It is evident that not all of these items may be visited, as the total time would exceed 4 hours, while we only have 3 at our disposal. So we need to take some items and discard the others, making sure we get the most out of our available time - a combinatorial optimization problem.

There are many techniques for solving this class of problems, but they usually have one thing in common - we need to compare one candidate solution to another to find out which one is the best (at least, of those we have examined so far), so a numerical value is assigned to each - a metric or criterion.

1. The obvious approach

So what happens if we just add the individual item ratings together? Indeed this looks like a natural way of giving an overall rating to the tour. If we visit better attractions, the sum of ratings must increase and also, this should favor the solutions where the items are packed into the available time as efficiently as possible.

m = sum(m_i)

Where m_i is an item included in the candidate solution.

Looking at our example, the combination of (1), (2) and (4) appears to have the highest m = 1.0 + 0.8 + 0.3 = 2.1 - every time we try to include (3), only one other item fits which gives us a maximum m = 0.7 + 1.0 = 1.7 for solutions including Olde Hansa.

Incidentally, if we use this scoring method, we have modeled our tour-building task as equivalent to what is called the "0-1 knapsack" problem in the literature. The knapsack in the name refers to the anecdotal description of the problem where a thief has a limited size knapsack and tries to fill it with most valuable items possible, while each item also has an amount of space it takes up. The "0-1" part comes from the more formal mathematical description and means that each item may either be taken or not taken (but you're not allowed to cut them up).
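For a toy instance of this size the exhaustive approach is easy to write down. The sketch below - not part of Sightsplanner - enumerates every subset of the four example items, keeps those that fit into the 3 hours (ignoring travel time) and scores them with the plain sum:

from itertools import combinations

# (name, score, visit time in minutes) from the example above
items = [("Oleviste steeple", 1.0, 30), ("Niguliste church", 0.8, 60),
         ("Olde Hansa", 0.7, 100), ("Viru center", 0.3, 60)]
budget = 180  # 3 hours

best = max((subset for r in range(len(items) + 1)
            for subset in combinations(items, r)
            if sum(t for _, _, t in subset) <= budget),
           key=lambda subset: sum(s for _, s, _ in subset))
print([name for name, _, _ in best], sum(s for _, s, _ in best))
# ['Oleviste steeple', 'Niguliste church', 'Viru center'] 2.1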

At this point we may congratulate ourselves on a job well done and start programming our elegant little scoring model into our recommender. The user, on the other hand, may be left wondering why the program sent him/her shopping while dining had a higher priority. It turns out that this approach has a natural bias towards items with short visit times, as it is possible to fit more of those into the available time - and this bias is neither intuitive nor acceptable to the user, leading to an unsatisfactory user experience.

In our first example, the situation was somewhat ambiguous, as we had two possible categories of items to choose from and given the limited time, we picked the less time-consuming alternative, while the discarded item wasn't even the highest-scoring one. However, this bias can degrade the quality of recommended tours much more severely. Let's look at three museums:

(A) 100 min, score 1.0
(B) 30 min, score 0.6
(C) 30 min, score 0.6

If we have 100 minutes to use, including (A) would limit us to a maximum score of 1.0. Picking (B) and (C) instead, we already reach a score of 1.2 and have 40 minutes left to fill with more items! So, we have a high-scoring item (A) that the user will never see in recommended tours. In reality, however, people do visit that attraction; furthermore, it is so interesting that on average they spend a lot more time there - so our recommender must have failed somehow.

Here one might argue that it's possible to differentiate scores more, so that weaker items are discarded anyway. However this approach is somewhat artificial and does not remove bias for shorter visit time among "equal" items. Also, we would like the tour-building phase of the recommendation to work somewhat independently of the calculation of individual item rankings.

2. Compensating for time creates a different kind of bias

Another intuitive approach to rating the tours might be saying, "I would like to spend each minute of the tour at as high-scoring attractions as possible". From the user experience point of view, this statement looks promising. So, let's see how we can achieve this. If we spend 30 minutes at an item with a score of 1.0 and 60 minutes at an item with a score of 0.4, we could say that on the average, we spent each minute at a 0.6-rated item. So, calculating the overall rating for the candidate solution is again quite simple:

m = sum(m_i * t_i) / t

Where t_i is the visit time of the attraction and t is the total time. Within the calculation of one recommendation, t can actually be dropped, as we always have the same amount of time available for the tour.

Revisiting our museum example, after compensating for the visit times like this, we get:

(A) 1.0 * 100 min = 100
(B,C) 0.6 * 30 min = 18
So, we have remedied the earlier flaw and one good item is now superior to 3..4 mediocre ones when calculating the total score. When using this new formula to rate the solutions to our first example, we get that our earlier winner (1), (2) and (4) scores m = 1.0*30 + 0.8*60 + 0.3*60 = 96. Instead, picking (2) and (3) gives m = 0.8*60 + 0.7*100 = 118 - a new best solution. This suggests that there is no more bias against long visit times; while we have lost the individually highest-scoring item, considering the very limited combinations available in the example this does not necessarily indicate a flaw of the scoring model - if there were a twin item (1a) with the same properties as (1), those two could be inserted to replace (2) and the overall score would increase.

Yet there is a problem with this calculation method also and to understand why, we need to include the search algorithm that actually builds the candidate solutions. In a real-world application, having just 4 matching items is unlikely and would even defeat the purpose of automated recommendation - more likely we would get 40 or 400 matches to the particular user's profile. When that happens, we no longer have computing resources to try out all the possible combinations and need to rely on approximations of what might be the best solution.

Enter the heuristic search. If we look at a state-of-the-art meta-heuristic like GRASP (Sightsplanner also uses a derivative version of it), it has some characteristics of a simple greedy search in that it tends to pick higher-scoring items first - the idea, of course, being that this guides us to a best solution faster.

In the above museum example this would be quite fine, as (A) is likely to be picked first and as long as the search runs, we could try combinations that include (A) and something else. However, the scores may be distributed the other way:

(D) 0.5 * 100min = 50
(E,F) 1.0 * 30min = 30

Now, the 100 min item (D) looks like a more attractive first choice for the search algorithm. Of course, given enough time, the search could eventually find a different combination that includes (E) and (F) yielding a higher total score, but only exhaustive search could guarantee this. In practice we have to aim for a search technique that converges towards a "good-ish" solution fast.

3. Average score looks nice

So, getting rid of the dependence on item visit time has become desirable. Let's revisit the statement of the goal of the search for the best solution - perhaps if we reword it as "I would like each attraction in my tour to be as high-rated as possible", we are not distorting the meaning from the user's perspective. Mathematically, we get quite a difference:

m = avg(m_i)

Our first and second formula could also be expressed as an average score, however this one is explicitly the average item score. Again applying that to our candidate solutions, we find that the winner by the first formula, the solution consisting of items (1), (2) and (4), gets m = (1.0 + 0.8 + 0.3) / 3 = 0.7 and the winner by the second formula scores m = (0.7 + 0.8) / 2 = 0.75. Checking our museum examples, obviously item (A) always yields a higher average than (B) and (C), while from the search aspect (E) and (F), when using their individual item rankings, remain preferable to item (D).

The above looks very nice, but there is a catch - so far our model has been too perfect and we have ignored the real-world aspects of having to walk (or drive) from one attraction to another, attractions having opening and closing times, and even the time available to perform the search expiring before something sensible has been produced. So, in a real-world application it is possible that tours containing a lot of "waste" time, such as picking inefficient routes between attractions, are produced. Should a candidate solution containing just item (1) appear somehow, it would win by virtue of its unbeatable m = 1.0, even though 2.5 hours of the tour would be completely wasted.

Search algorithms can be guided to always use the available time, but from the aspect of just being able to compare all the conceivable candidate solutions, the problem does not go away. On the other hand, we can attack the issue from the scoring point of view in a fairly straightforward manner.

Let's introduce variable t_u that denotes the total time not spent at an attraction. We might now calculate some sort of a penalty to the overall score that increases when t_u increases and is 0 when the t_u is 0 (or low enough). This would put us on a slippery ground of having two criteria - "spend as much time at attractions as possible" and "have the highest possible average". The implications of this are beyond the scope of this article, but considering we get two sometimes diverging scores for each candidate solution, combining them together may require some sort of weights (of emphasis, relative importance), which leaves us with an administrative task of tuning these weights to best match our database of items and expectations for recommender output.

Instead, let's pretend that we did spend the unused time at some attractions - since they were completely useless, they contribute 0 individual item score each. Our formula then becomes:

m = sum(m_i) / (n + t_u / t_avg)

Where n is the number of items in the candidate solution and t_avg is some amount of time we have assigned to visiting these imaginary items which could also be derived from the available item data. We consider the imaginary attractions as discrete items, so the remainder of t_u/t_avg is ignored.

Note that we haven't quite avoided using a parameter to aggregate the evaluations of two potentially conflicting criteria as stated earlier. The difference now is that we can predict the behavior of the recommender somewhat - if there is something, anything with a score higher than 0 to insert into that unused time, the recommender will try to do so. The extreme cases where one high-scoring item alone beats a number of very low-scoring ones remain - without collecting statistics on recommendations it is too early to state whether this is an issue that still needs addressing.

For the sake of completeness, here are the scores for the candidate solutions we have found so far, using t_avg = 45 min and ignoring the remainder of t_u/t_avg as stated above: items (1), (2) and (4): m = (1.0 + 0.8 + 0.3) / (3 + floor(30/45)) = 0.7; items (2) and (3): m = (0.7 + 0.8) / (2 + floor(20/45)) = 0.75; a solution consisting of just item (1): m = 1.0 / (1 + floor(150/45)) = 0.25.
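The same three numbers can be reproduced with a few lines of code; the integer division makes the "discrete imaginary items" rule explicit (a sketch, not the actual implementation):

def unused_time_score(scores, visit_times, total_time=180, t_avg=45):
    # m = sum(m_i) / (n + floor(t_u / t_avg)): unused time is filled with
    # imaginary zero-score visits of length t_avg; the remainder is ignored.
    t_u = total_time - sum(visit_times)
    return sum(scores) / (len(scores) + t_u // t_avg)

print(unused_time_score([1.0, 0.8, 0.3], [30, 60, 60]))  # 0.7
print(unused_time_score([0.7, 0.8], [100, 60]))          # 0.75
print(unused_time_score([1.0], [30]))                    # 0.25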

Thursday, March 24, 2011

About sightsplanner

Sightsplanner is a personal recommender for your trip to a foreign city. It was launched for Tallinn in spring 2011.

Based on your time of the visit and your preferences about different types of sights and events, the system will suggest which places to visit and will build a realistic itinerary just for your visit.

Why use sightsplanner?

Obviously, most people planning a trip start their investigations by looking for materials on the web. It is easy to find a large amount of information about the city and its most popular tourism objects, but it takes a very long time to select and organize all the interesting objects. Places and events which are really interesting for you, but not for the general public, are easily missed.

Once you find the sights you'd like to see, you need to build a rough itinerary. This is - obviously - pretty hard. We need to find out opening times and schedule the visits accordingly. It may turn out that some of the objects do not fit into the time schedule, in which case we will need to select different objects. All this takes a lot of effort.

Sightsplanner takes care of finding the objects based on your personal preferences, visit time and method of travel, together with the corresponding information about the sights: their types, opening times, locations, typical visit times and popularity. It will also build an itinerary for your visit. Last but not least, sightsplanner knows more objects, and more about each object, than any other site.

Using sightsplanner

Your plan and suggestions are composed using your preferences, visit time, method of travel and the corresponding information about sights: their types, opening times, location, typical visit time and popularity of the sight.
Set your preferences by moving sliders left or right. Moving the slider left will give fewer suggestions in that category, moving right will give you more. You can set more detailed preferences by opening the subcategory list: just click on the category name and the detailed list will open.

After setting your preferences, click the "Look" button: this will build your personalized visit itinerary. If you just want to see the most popular objects of the selected category mix, click the link "Popular objects" below the button. At the bottom of the page there are several options for defining your visit - start time and date, duration and means of transport. The question marks after the sliders give a few hints about what might be included in that category. The four pre-set options at the top of the page simply move several sliders at once to a predefined position. 

How does it work?

The suggestion engine finds interesting objects for the given user, based on the user preferences, using sophisticated probabilistic matching algorithms and a wealth of knowledge about the objects. For example, sightsplanner knows the rough popularity of the object, it knows the different categories (along with fuzzy degrees) the object is in, it knows the average visit time and opening times. Each knowledge item is associated with a degree of confidence.

In order to find the interesting objects, each query recalculates the score of "interestingness" for absolutely all the objects in the database, for exactly the user-given set of preferences and limitations of the given query. Simply put, the objects that the user probably likes have a higher score and vice versa.

The objects and events with their calculated "interestingness" are then organized into a timetable based on their locations and times, using several optimal trip search algorithms in parallel.

Conceptually, we use extended semantic web technology plus probabilistic rules with a confidence measure. Data is kept as extended RDF sextuplets - object, property, value, confidence, source, timestamp - in both an ordinary relational database and an extremely fast main memory database for match calculation and planning. We derive additional information for objects using a rule system including full predicate logic augmented with a confidence measure.
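As an illustration of the sextuplet layout - the field names and example values below are chosen for readability and are not taken from the actual schema:

from dataclasses import dataclass

@dataclass(frozen=True)
class Sextuplet:
    """Extended RDF statement: object, property, value, plus provenance."""
    obj: str
    prop: str
    value: object
    confidence: float   # 0..1 degree of confidence in the statement
    source: str         # where the fact was scraped or derived from
    timestamp: str      # when it was recorded

fact = Sextuplet("Niguliste_church", "category", "museum",
                 confidence=0.8, source="scraper:example", timestamp="2011-03-01")
print(fact)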

The system database is built by merging, scraping and converting information from a wide number of different sources: several structured databases, several semistructured thematic websites plus human research and data augmentation/correction.

Contacts and links

Sightsplanner was built by a team of developers and researchers from three companies - Mindstone, Apprise and ELIKO - and the Tallinn University of Technology.

For further information, contact Tanel Tammet: tanel.tammet@gmail.com

Some links you may want to check: