This chapter proposes exercises to manipulate and query real-world RDFS ontologies, and especially YAGO. YAGO was developed at the Max Planck Institute in Saarbrücken in Germany. At the time of this writing, it is the largest ontology of human quality that is freely available. It contains millions of entities such as scientists, and millions of facts about these entities such as where a particular scientist was born. YAGO also includes knowledge about the classes and relationships that compose it (e.g., a hierarchy of classes and relationships).
Go to the YAGO Web site, http://mpii.de/yago, click on the “Demo” tab and start the textual browser. This browser allows navigating through the YAGO ontology.
Then, to install YAGO on your machine, make sure that you have Java installed and around 5 GB of free disk space. Proceed as follows:
You are all set: YAGO is installed on your machine and is ready for querying!
YAGO is expressed in RDFS. In RDFS, the facts and the ontological statements are written as triples. These triples can be queried using SPARQL, the standard query language for RDFS facts. SPARQL was introduced in Chapter 8. The query engine of YAGO uses the Jena framework (http://openjena.org/). Jena is an open-source project that has grown out of work with the HP Labs Semantic Web Program. Jena ships with YAGO, so we only need to download and install YAGO.
To query YAGO, open a terminal window, navigate to the folder where the converters live and run the SPARQL script (called yago2sparql.bat on Windows and yago2sparql.sh on Unix). You will be invited to type SPARQL queries for YAGO. For ease of notation, the following namespaces are already defined:
In addition, the default namespace, referred to as simply “:”, is already set to the namespace of YAGO, http://www.mpii.de/yago/resource/. We can ask for simple YAGO facts through SPARQL queries of the form
Here, ?V is a variable name, as indicated by the question mark. The SELECT clause may contain several variables separated by whitespace. A and B are entities (with proper namespace prefix) and R is a relation (also with proper namespace prefix). Each of these components may also be a variable. The WHERE clause may contain multiple triples, separated by dots. Try out the following:
This query lists all classes that Elvis Presley is an instance of. (Be sure to type all characters in the query exactly as written here.) Note that the results show the full URI of the entities, not the equivalent short form with namespace prefixes.
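To make the query form described above more concrete, here is a minimal Python sketch of how a single triple pattern is matched against a set of triples. It is not how Jena evaluates SPARQL. The facts in `TRIPLES` are illustrative placeholders rather than actual YAGO data, the `rdf` prefix is assumed to be among the predefined namespaces, and the helper names `expand`, `match`, and `select` are hypothetical.

```python
# Toy illustration of "SELECT ?V WHERE { A R B }" over an in-memory triple set.
# The facts below are placeholders for illustration, NOT real YAGO content.

YAGO = "http://www.mpii.de/yago/resource/"            # default namespace from the text
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"   # standard RDF namespace

# Assumption: ":" is the default prefix and "rdf" is predefined.
PREFIXES = {"": YAGO, "rdf": RDF}


def expand(term):
    """Expand a prefixed name (':Elvis_Presley') to a full URI.

    Variables ('?c') are returned unchanged. Only prefixed names are
    supported in this sketch, not full URIs.
    """
    if term.startswith("?"):
        return term
    prefix, _, local = term.partition(":")
    return PREFIXES[prefix] + local


TRIPLES = {
    (expand(":Elvis_Presley"), expand("rdf:type"), expand(":singer")),
    (expand(":Elvis_Presley"), expand("rdf:type"), expand(":actor")),
    (expand(":Elvis_Presley"), expand(":bornIn"), expand(":Tupelo")),
}


def match(pattern, triples):
    """Yield one variable binding (a dict) per triple matching the pattern."""
    pattern = tuple(expand(t) for t in pattern)
    for triple in triples:
        binding = {}
        for p, value in zip(pattern, triple):
            if p.startswith("?"):
                # A variable used twice must be bound to the same value.
                if binding.setdefault(p, value) != value:
                    break
            elif p != value:
                break
        else:
            yield binding


def select(variables, pattern, triples=TRIPLES):
    """Project the bindings onto the selected variables, sorted for stability."""
    return sorted(tuple(b[v] for v in variables) for b in match(pattern, triples))


rows = select(["?c"], (":Elvis_Presley", "rdf:type", "?c"))
for row in rows:
    print(row[0])
```

As in the SPARQL interface, the answers are full URIs rather than prefixed names, because the prefixes are expanded before matching.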
In YAGO, the ontological statements expressing the subclass and subproperty relations as well as the range and domain restrictions of properties are stored as RDFS triples. However, the semantics of these statements is not taken into account. In particular, the facts that follow from the RDFS entailment rules (see Chapter 7) are not derived. To derive these facts, one can use the saturation algorithm given in Chapter 8. It is possible, using the converters, to generate a Jena store of YAGO that includes (some of) these derived facts. This requires, however, downloading the YAGO2 ontology in its default format, and the conversion process can be a lengthy one.
Enter a blank line to quit the SPARQL interface.
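To illustrate what deriving the entailed facts means, the following Python sketch applies a handful of RDFS entailment rules (subclass and subproperty transitivity, type propagation, domain and range typing) until no new triple appears. It is a naive fixpoint loop written for readability, not necessarily the algorithm of Chapter 8, and it is far too slow for an ontology the size of YAGO. The triples are made-up placeholders and the function name `saturate` is hypothetical.

```python
# Naive RDFS saturation over a small set of (subject, predicate, object) triples.
# Illustrative only: the facts are placeholders, not actual YAGO statements.

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def saturate(triples):
    """Return the input triples plus everything derivable by the rules below."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                # Transitivity of subClassOf and subPropertyOf.
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                if p == SUBPROP and p2 == SUBPROP and o == s2:
                    new.add((s, SUBPROP, o2))
                # An instance of a class is an instance of its superclasses.
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # A fact stated with a property also holds for its superproperties.
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # Domain and range statements type the subject and the object.
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= facts:          # fixpoint reached: nothing new was derived
            return facts
        facts |= new


base = {
    (":Elvis_Presley", TYPE, ":singer"),
    (":singer", SUBCLASS, ":person"),
    (":person", SUBCLASS, ":entity"),
    (":bornIn", DOMAIN, ":person"),
    (":bornIn", RANGE, ":city"),
    (":Elvis_Presley", ":bornIn", ":Tupelo"),
}

closure = saturate(base)
for triple in sorted(closure - base):
    print(triple)
```

On this toy input, the derived triples state that Elvis is a person and an entity, that singers are entities, and that Tupelo is a city. None of them is stored explicitly, which is why a query over unsaturated data would miss them.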
An ontology refers to an entity through a URI. YAGO, for example, refers to the entity of Elvis Presley as http://mpii.de/yago/resource/Elvis_Presley. Another ontology may use a different identifier to refer to that entity. The DBpedia ontology, for instance, refers to Elvis as http://dbpedia.org/resource/Elvis_Presley. In general, these URIs (i.e., identifiers) do not have to be URLs (i.e., locators). In other words, they do not have to refer to Web pages. In principle, when a URI is entered in a browser, one might simply get an error message. However, some ontologies implement the “Cool URI” protocol of the W3C. This means that each URI in the ontology is actually understood by a Web server that is configured to respond to a request for this URI. (In other words, each such URI is also a URL.) This allows a machine to retrieve fragments of the ontology from the server. Let us try this out:
This accesses the URI as a URL, just like a Web browser. If you look into elvis.html, you will see the Wikipedia page of Elvis.
The file elvis.rdfs should now contain everything YAGO knows about Elvis Presley. The file format is RDF, encoded in XML.
By following the URIs in the results, a machine can navigate the entire ontology.
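The two kinds of access described above (one returning an HTML page, one returning RDF) can be sketched in Python with the standard library. The sketch assumes that the server selects the representation through HTTP content negotiation on the `Accept` header, as servers following the W3C Cool URIs recommendations typically do, and that `application/rdf+xml` is the media type it expects for RDF in XML. The server may of course no longer answer at this address. The helper names `build_request` and `fetch` are hypothetical.

```python
import urllib.request

# URI of Elvis Presley in YAGO, as quoted in the text.
ENTITY_URI = "http://mpii.de/yago/resource/Elvis_Presley"


def build_request(uri, accept=None):
    """Build an HTTP GET request, optionally asking for a specific media type."""
    headers = {"Accept": accept} if accept else {}
    return urllib.request.Request(uri, headers=headers)


def fetch(uri, accept=None, timeout=10):
    """Dereference the URI; return the Content-Type header and the raw body.

    Needs network access, and fails if the server is unreachable.
    """
    request = build_request(uri, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()


# A plain request, like the one a Web browser would send.
html_request = build_request(ENTITY_URI)

# Same URI, but asking for RDF/XML (assumed media type for the RDF view).
rdf_request = build_request(ENTITY_URI, accept="application/rdf+xml")

print(html_request.full_url, html_request.get_header("Accept"))
print(rdf_request.full_url, rdf_request.get_header("Accept"))

# To actually retrieve the data (not executed here because it needs the network):
#   content_type, body = fetch(ENTITY_URI, accept="application/rdf+xml")
```

The point of the design is that the URI stays the same in both cases. Only the requested media type differs, so the identifier used inside the ontology is also the address a machine dereferences.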
As we have seen, different ontologies can use different URIs to refer to the same entity. The Linked Data Project, found at http://linkeddata.org/, tries to establish links between such synonymous URIs. Such a link takes the form of an RDFS statement. The predicate is sameAs of the OWL namespace:
These links allow jumping from one ontology to another. If both ontologies implement the Cool URI protocol, a machine can gather information about one entity from multiple servers. Let us try this out: go to the Web site of the Sig.ma Semantic Web search engine, http://sig.ma/. This engine gathers information from different ontologies about a given entity. It uses sameAs links, Cool URIs, and RDFa annotations hidden in HTML pages. This yields a lot of data, but the data can also be very noisy. Ask Sig.ma for
You can also try out keywords (such as “United States”). See how Sig.ma gathers data from multiple ontologies. The Linked Data Project was pioneered by the DBpedia ontology, which is therefore a hub in this Web of data.
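The idea of gathering information about one entity across ontologies through sameAs links can be sketched in a few lines of Python. The sketch groups synonymous URIs with a small union-find structure and then pools the facts stated about any URI of the group. It is not a description of how Sig.ma works. The two Elvis URIs are those quoted earlier; the remaining facts, the property names, and the function names are made-up placeholders.

```python
# Pooling facts about one entity across ontologies by following owl:sameAs links.
# Only the two Elvis URIs come from the text; the facts are illustrative.

SAME_AS = "owl:sameAs"
YAGO_ELVIS = "http://mpii.de/yago/resource/Elvis_Presley"
DBPEDIA_ELVIS = "http://dbpedia.org/resource/Elvis_Presley"

TRIPLES = [
    (YAGO_ELVIS, SAME_AS, DBPEDIA_ELVIS),
    (YAGO_ELVIS, "yago:bornIn", "yago:Tupelo"),                 # placeholder fact
    (DBPEDIA_ELVIS, "dbpedia:occupation", "dbpedia:Singer"),    # placeholder fact
]


def find(parent, x):
    """Return the representative of x's group (union-find with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_by_same_as(triples):
    """Group subjects connected by sameAs and pool their (predicate, object) pairs."""
    parent = {}
    for s, p, o in triples:
        if p == SAME_AS:
            # Union the two groups: sameAs is treated as an equivalence.
            parent[find(parent, s)] = find(parent, o)
    merged = {}
    for s, p, o in triples:
        if p != SAME_AS:
            merged.setdefault(find(parent, s), set()).add((p, o))
    return merged, parent


def facts_about(uri, triples):
    """All (predicate, object) pairs known about uri or any of its synonyms."""
    merged, parent = merge_by_same_as(triples)
    return sorted(merged.get(find(parent, uri), set()))


for predicate, obj in facts_about(DBPEDIA_ELVIS, TRIPLES):
    print(predicate, obj)
```

Asking about either URI returns the facts of both sources. For simplicity, the sketch only resolves synonyms in subject position; a fuller version would also normalize the objects of the pooled facts.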