Freebase – Database for world knowledge/data
Through my CTO’s blog, I came across a company called Metaweb and their product – Freebase – a database for the world’s data/knowledge. At the outset, it seems similar to Wikipedia, and in fact Freebase has used Wikipedia as one of its sources to seed its initial set of data. So, what is different about it compared with Wikipedia, Google Base etc.?
- Much like a content management system, they combine structured and free form data, maintain the association between them and more importantly leverage this to allow search based on both kinds of data. This kind of search can be more targeted.
- Content types/classes can be defined and instances of these types become the meta-data/structured information. For example, facets of a person/place/thing are stored as structured data.
- This way of organizing data lets one perform complex queries on structured data and combine them with text search on unstructured data, à la a content management system.
- Freebase also provides APIs that allow applications to tap into this vast amount of data. Looks like this is one place where Metaweb plans to monetize.
I did a search on “Gundappa” – my favorite cricket player – and it did bring up a list of available matches, and with “Gundappa Viswanath” I found the following tags/content types – “Person, Cricket Player, Cricket Bowler, Pro Athlete”. These types allow one to associate meta-data with Gundappa such as his birth date, type of player etc. Of course, there is also a description/article (unstructured information) on him.
As my CTO has mentioned in his blog, I do think that this kind of “information base” has its own value within an enterprise.
I am amazed at the way the next wave/generation of web applications – based on social tagging, mashing up and information gathering/publishing – is coming up. The true benefit is when such information can be mined in ways that allow us to find hidden meaning/purpose within it and then use such meaning to come up with appropriate conclusions. From an
Another interesting technology to look at is Amazon’s Mechanical Turk – a way to bring human intelligence to bear on a problem. Amazon does have a diverse portfolio and I believe they are one of the “cool” technology companies around, except they do not tout their own coolness!!
Hi Ganesh,
I had trouble understanding some of your points. So I did a search for Gundappa vishwanath on Wikipedia, Google and Freebase. Wikipedia directly took me to the Gundappa page, Google listed all articles connected to him, and the first page on Freebase was again Gundappa.
How are content types/classes different from the tags or categories of the blogging world? Maybe you could give an example of the kinds of complex queries that would work on Freebase but not on Google.
Excellent article, Ganesh. I agree with you about Amazon. Their web services are very mature and scalable. They are one of the visionary and creative companies. (Maybe it has to do with all the great coffee they get in Seattle :-) )
Archana,
Thanks for asking the question. It made me look at Freebase’s architecture more deeply. At the outset, I thought it was another implementation of a Content Management system. However, looks like it is not.
Traditional Content Management (CM) systems such as FileNet and Documentum let you define a Content Type, which provides a structured way to describe an item; a Document/Article (unstructured content) can then be associated with the item. You can index the unstructured content and provide a way to query structured data (via the database) and unstructured data (via indexed words) together. For example, assuming “Person” was a content type, “Gundappa” would be an instance of that content type. Article(s) associated with Gundappa would be the unstructured content. One can now search for all articles associated with Gundappa that contain the word “Century”. Of course, a CM system would do more, but this simplified picture should suffice for the sake of this discussion.
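To make the combined query concrete, here is a minimal Python sketch of that simplified picture. It assumes a tiny in-memory store; `Item`, `search` and the sample article text are made up for illustration and are not how FileNet or Documentum actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """An instance of a content type plus its attached articles (hypothetical model)."""
    content_type: str                              # e.g. "Person"
    attributes: dict                               # structured meta-data
    articles: list = field(default_factory=list)   # unstructured text


def search(items, content_type, attribute_filter, word):
    """Return (item name, article) pairs that match the structured filter
    and whose article text contains the given word."""
    word = word.lower()
    hits = []
    for item in items:
        # Structured part: content type and attribute values must match.
        if item.content_type != content_type:
            continue
        if any(item.attributes.get(k) != v for k, v in attribute_filter.items()):
            continue
        # Unstructured part: naive whole-word match on whitespace-split text.
        for article in item.articles:
            if word in article.lower().split():
                hits.append((item.attributes["name"], article))
    return hits


# Placeholder data, not real articles.
items = [
    Item("Person", {"name": "Gundappa"},
         ["An article that mentions a century", "An article about something else"]),
    Item("Person", {"name": "Someone Else"},
         ["Another article with the word century"]),
]

results = search(items, "Person", {"name": "Gundappa"}, "Century")
```

A real CM system would use a database for the structured part and a full-text index for the articles; the loop above only shows how the two conditions combine.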
Freebase also has the concept of a Content Type. But they also have another concept called topic. Topic is the article/document. One would create a topic and then associate that with instance(s) of content type(s). It also looks like content type is inheritable. Topics are also associated with one another. In the case of Gundappa, he would be associated with content types such as “person”, “athlete”, “cricket player” etc. On the topic side, he would be associated with the “game of cricket”, “country of India” etc. The “game of cricket” association comes from the fact that he is a “cricket player”. He is associated with the topic of “India” based on the fact that he is a “person”, “person” has an attribute called “country”, and “India” is a “country”. So, with Freebase, you start building a graph of such relationships between topics, leveraging the content types. Their USP (and how it is different from Wikipedia) comes from two things –
In a CM system, you store an item and describe it by associating meta-data/attributes with it using a content type. Looks like with Freebase, you create a topic/document and then associate a content type with it, which is a way to describe the document/article. However, you also build relationships between topics, which a CM system does not provide (at least AFAIK).
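A rough Python sketch of how I picture this topic graph, under my own assumptions: `CONTENT_TYPES`, `Topic` and `link` are hypothetical names, not Freebase’s API, and type inheritance is left out. Each content type lists its attributes and the type of topic each attribute is expected to point at.

```python
# Hypothetical schema: content type -> {attribute name: expected type of the linked topic}
CONTENT_TYPES = {
    "person": {"country": "country"},
    "cricket player": {"sport": "sport"},
    "country": {},
    "sport": {},
}


class Topic:
    """A topic (article/document) described by one or more content types."""

    def __init__(self, name, types):
        self.name = name
        self.types = set(types)
        self.links = {}  # attribute name -> linked Topic

    def link(self, prop, target):
        """Link to another topic, but only if one of this topic's content types
        defines the attribute and the target has the expected content type."""
        for type_name in self.types:
            expected = CONTENT_TYPES[type_name].get(prop)
            if expected is not None and expected in target.types:
                self.links[prop] = target
                return
        raise ValueError(f"{self.name!r} has no type that allows {prop!r} -> {target.name!r}")


gundappa = Topic("Gundappa", ["person", "cricket player"])
india = Topic("India", ["country"])
cricket = Topic("Game of cricket", ["sport"])

gundappa.link("country", india)   # allowed because he is a "person"
gundappa.link("sport", cricket)   # allowed because he is a "cricket player"

related = sorted(t.name for t in gundappa.links.values())
```

The point of the sketch is that the edges between topics are typed: the content types decide which relationships are possible, which is what a plain item-plus-attributes store lacks.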
Though not much is known about Freebase’s architecture, the following links might help a bit –
http://www.freebase.com/view/%239202a8c04000641f800000000544e148
http://www.freebase.com/view/%239202a8c04000641f800000000544e143
http://weblog.infoworld.com/stratdev/archives/2007/03/freebase_the_se.html
Now, to answer your other question – how are tags different from content types? Tags are a way to classify an item. A Content Type provides the mechanism to define/describe an item. Content Types have attributes/properties that help define an item. The term item is used in a broad manner and could include things such as person/place/thing etc. Hope this helps.
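The difference can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `tags`, `CricketPlayer` and `players_from` are hypothetical, and the attribute set is a made-up subset.

```python
from dataclasses import dataclass
from typing import Optional

# Tags: free-form labels attached to an item, with no schema behind them.
tags = {
    "Gundappa Viswanath": {"cricket", "india"},
    "Majjiga Pulusu": {"recipe"},
}


def with_tag(tag):
    """The only question tags can answer: which items carry this label?"""
    return sorted(name for name, labels in tags.items() if tag in labels)


# Content type: a schema of named attributes that describes the item.
@dataclass
class CricketPlayer:
    name: str
    country: str
    birth_date: Optional[str] = None  # left unset here on purpose


players = [CricketPlayer("Gundappa Viswanath", "India")]


def players_from(country):
    """Typed attributes allow queries on a specific property of the item."""
    return [p.name for p in players if p.country == country]
```

With tags, "india" is just a label; with a content type, "India" is the value of a known attribute, so a query can say what role it plays.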
Please note that this is what I have gleaned from my knowledge of the Content Management domain and a quick search for “Freebase” on the web. Hope I did not confuse you more 🙂
Ganesh
Ever since Ganesh wrote this post, I have been reading up on Freebase and trying to make sense of it. I can’t say I have understood much yet.
But your question about tagging is easier to answer, I think.
In the Content Management/KM world, you have this popular beast called Taxonomy (please recall from your biology class Linnaeus’s binomial nomenclature, one of the first examples of taxonomies in the world). The idea is, as Ganesh says, that you have a set of metadata and you classify a document according to it. For example, when you upload a document into Channel One, it captures a set of attributes to classify the document, which are later used for searching. These attributes are usually in the form of pull-down menus so you can classify.
Now in the web world, Joshua Schachter, the inventor of Del.icio.us, one of the earliest social bookmarking systems (now part of Yahoo), came up with the idea of just using a tag. For instance, when you write a post on WordPress, you use a set of keywords that classify the post. The difference is you can use any number of keywords or short phrases without using a formal taxonomy. This came to be known as folksonomy. This tagging concept now pervades every web 2.0 site on the internet – YouTube, Slashdot, Flickr, … In Channel One also we have essentially copied the same idea.
So the Content Type that Ganesh refers to would be a category in a taxonomy. Hope this helps.
Ganesh,
Excellent post. I have been reading up on Freebase and I can’t say I have understood much. If you look at clusty.com, it offers clustered search, so if you searched for amazon, it would list a cluster on the river as well as one on the e-commerce company. In the links you referred to, there are references to the semantic web (RDF) etc. It is not clear to me how the information in Freebase gets built up over time.
Thank you very much for the detailed response Ganesh. Now I get the picture.
Sukumar,
Information is built up over time by you and me, just like Wikipedia. References and cross-references amongst topics would have to be made by you and me. Since most of the content out there does not have meta-data (via RDF etc.), we have to specify the meta-data via content types and then perform the association between topics.
At least that is how I think it works.
Ganesh
Thanks Ganesh. Makes sense. In that case I need to understand how it differs from Google Base. -Sukumar
Google Base is very interesting. I submitted my favorite recipe, Majjiga Pulusu, from my blog. Let me see if it becomes searchable from Google Base within 1 hour.
From the FAQ
How’s Freebase different from Wikipedia? From Google Base?
Wikipedia and Freebase both appeal to people who love to use and organize information. The difference lies in the way they store information. Wikipedia arranges information in the form of articles. Freebase lists facts and statistics. Freebase’s list form is good not only for people who like to glance at facts, but also for people who want to use the data to build other web sites and software. Information in article form can’t be reused in the same way (though, obviously, articles are awesome for other purposes).
In addition, the topics covered by Freebase include subjects that are too obscure for Wikipedia, which strives for notability appropriate to an encyclopedia.
Google Base is a whole ‘nother ball of wax. The data in Freebase is all shared and collectively editable, with a single instance of each topic (as described in Question 1). Google Base, on the other hand, lets you help other people find your data, but it doesn’t provide a community editing tool nor does it attempt to reconcile data sets. It’s a different animal.
Vamsi,
Thanks for the example and the comparison between Google Base and Freebase. You saved me some research time. -Sukumar