Knowledge Graph Practice Notes 2, Bonus Chapter - SPARQL by Example

Author: Steven Date: Mar 23, 2019 Updated On: May 5, 2022
Categories: KG

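This post is a bonus supplement on SPARQL. It contains my study notes on the Cambridge Semantics tutorial [1].

SPARQL 1.0 Syntax

SPARQL 1.0 is a read-only query language; it has no update capability or other advanced features.

Graph Pattern

SPARQL can be viewed as matching graph patterns against the data. The simplest pattern is a single triple: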

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas>
}

This returns every subject whose type is the DBpedia class WikicatCitiesInTexas.
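To make the idea of matching a triple pattern concrete, here is a minimal Python sketch (not how a real SPARQL engine is implemented). The triples, the shortened `ex:`/`yago:` names, the population figure, and the helper `match_pattern` are all made up for illustration.

```python
# Toy data: (subject, predicate, object) triples. The names are shortened,
# hypothetical stand-ins and the number is invented; this is not DBpedia data.
TRIPLES = [
    ("ex:Austin", "rdf:type", "yago:WikicatCitiesInTexas"),
    ("ex:Houston", "rdf:type", "yago:WikicatCitiesInTexas"),
    ("ex:Boston", "rdf:type", "yago:CitiesInMassachusetts"),
    ("ex:Austin", "dbo:populationTotal", 950000),
]


def match_pattern(pattern, triples):
    """Return one binding dict per triple that matches the pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    solutions = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # A variable repeated within the pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No term mismatched, so this triple is a solution.
            solutions.append(binding)
    return solutions


cities = match_pattern(("?city", "rdf:type", "yago:WikicatCitiesInTexas"), TRIPLES)
print(cities)
```

Each solution is a set of variable bindings, which is what a SELECT query returns as one result row.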

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> .
?city dbp:populationTotal ?popTotal .
}

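This query adds a second constraint. Making the pattern more specific reduces the number of results, because cities without a dbp:populationTotal value no longer match.

Queries can also use Turtle's abbreviated syntax: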

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal .
}

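Note that the first statement ends with a semicolon rather than a period, so the next predicate-object pair shares the same subject ?city.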

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal .
OPTIONAL {?city dbp:populationMetro ?popMetro . }
}

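When a match on a property should not be mandatory, use an OPTIONAL clause.

Unlike SQL, RDF has no explicit NULL, so use OPTIONAL for information that may be missing; otherwise the pattern filters out too many results.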
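The effect of OPTIONAL resembles a left outer join. The Python sketch below contrasts a mandatory pattern with an optional one; the city names, the population figures, and both helper functions are hypothetical, and `None` only stands in for an unbound variable.

```python
# Made-up toy values. None never appears in RDF itself; here it only marks
# a variable that was left unbound.
POP_TOTAL = {"ex:Austin": 950000, "ex:Houston": 2300000, "ex:Waco": 138000}
POP_METRO = {"ex:Austin": 2200000, "ex:Houston": 7100000}


def required_only(totals, metros):
    """Both patterns are mandatory: cities lacking a metro value disappear."""
    return [(city, totals[city], metros[city]) for city in totals if city in metros]


def with_optional(totals, metros):
    """The metro pattern is OPTIONAL: every city stays, ?popMetro may be unbound."""
    return [(city, totals[city], metros.get(city)) for city in totals]


print(required_only(POP_TOTAL, POP_METRO))
print(with_optional(POP_TOTAL, POP_METRO))
```

In the second result the city without a metro population is still present, with its metro value left unbound.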

Solution Modifiers

Next, a group of solution modifiers: ORDER BY, LIMIT, and OFFSET.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal .
OPTIONAL {?city dbp:populationMetro ?popMetro . }
}
ORDER BY desc(?popTotal)

Note: to choose the sort direction for a variable, wrap it in asc() or desc(); ascending is the default.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal .
OPTIONAL {?city dbp:populationMetro ?popMetro. }
}
ORDER BY desc(?popTotal)
LIMIT 10
OFFSET 5
# At most 10 results will be returned, starting with the 6th result (OFFSET 5 skips the first 5).

LIMIT and OFFSET can be used together.
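As a rough analogy, the three modifiers behave like sorting a list and then slicing it. The following Python sketch uses a hypothetical helper `apply_modifiers` and generated rows; it only illustrates the order in which the modifiers take effect.

```python
def apply_modifiers(rows, key, descending=False, limit=None, offset=0):
    """Apply ORDER BY, then OFFSET, then LIMIT to a list of result rows."""
    ordered = sorted(rows, key=key, reverse=descending)  # ORDER BY
    skipped = ordered[offset:]                           # OFFSET drops leading rows
    return skipped if limit is None else skipped[:limit]  # LIMIT caps the count


# Twenty made-up rows; rank 1 has the largest popTotal.
rows = [{"rank": n, "popTotal": 1000 - n} for n in range(1, 21)]

page = apply_modifiers(
    rows, key=lambda r: r["popTotal"], descending=True, limit=10, offset=5
)
print([r["rank"] for r in page])
```

With OFFSET 5 and LIMIT 10 the page covers the 6th through 15th rows of the sorted results.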

Remove Results

The FILTER clause removes solutions that fail a condition. The following kinds of operators are allowed:

  • Logical: &&, ||, !
  • Mathematical: +, -, *, /
  • Comparison: =, !=, <, >, <=, >=
  • SPARQL tests: isURI, isBlank, isLiteral, bound
  • SPARQL accessors: str, lang, datatype
  • Other: sameTerm, langMatches, regex
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal .
OPTIONAL {?city dbp:populationMetro ?popMetro . }
FILTER (?popTotal > 50000)
}
ORDER BY desc(?popTotal)

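FILTER is usually written after the graph pattern it constrains, although it applies to the whole group in which it appears regardless of position.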

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
rdfs:label ?name
OPTIONAL {?city dbp:populationMetro ?popMetro . }
FILTER (?popTotal > 50000)
}
ORDER BY desc(?popTotal)

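Compared with the previous query, this one adds one more triple pattern, rdfs:label ?name. Note that once this pattern is added, a city that has labels in several languages is returned once per label, so each label produces its own row; a FILTER on the label makes the selection stricter again.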

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
rdfs:label ?name
OPTIONAL {?city dbp:populationMetro ?popMetro. }
FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
}
ORDER BY desc(?popTotal)

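This filter combines two conditions with a logical AND. It can also be written as FILTER (?popTotal > 50000 && lang(?name) = "en"), although the two forms are not strictly equivalent: langMatches is case-insensitive and also accepts subtags such as en-US, while the equality test matches only the exact tag en.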
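The difference between langMatches and a plain equality test on the language tag can be sketched in Python. The function `lang_matches` below is a hypothetical re-implementation of basic language-range matching written for illustration, not code taken from any SPARQL engine.

```python
def lang_matches(tag, language_range):
    """Basic language-range matching, case-insensitive, in the style of langMatches."""
    tag = tag.lower()
    language_range = language_range.lower()
    if language_range == "*":
        return tag != ""  # "*" matches any non-empty language tag
    # Either the whole tag matches, or the range is a prefix followed by "-".
    return tag == language_range or tag.startswith(language_range + "-")


for tag in ["en", "en-US", "fr", ""]:
    print(repr(tag), lang_matches(tag, "EN"), tag == "en")
```

A label tagged en-US passes the range test but fails the equality test, which is why the two filters can return different rows.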

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
rdfs:label ?name
OPTIONAL {?city dbp:populationMetro ?popMetro. }
FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
FILTER(!bound(?popMetro))
}
ORDER BY desc(?popTotal)

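In this query, !bound(?popMetro) keeps only the solutions in which ?popMetro was left unbound, that is, cities for which the OPTIONAL pattern found no metro population.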

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
rdfs:label ?name
OPTIONAL {?city dbp:populationMetro ?popMetro. }
FILTER (?popTotal > 50000)
FILTER (langMatches(lang(?name), "EN"))
FILTER(!bound(?popMetro))
}
ORDER BY desc(?popTotal)

Multiple FILTER conditions can be written side by side; a solution must satisfy all of them.

UNION

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
{
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
rdfs:label ?name
OPTIONAL {?city dbp:populationMetro ?popMetro. }
FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
}
UNION
{
?city rdf:type <http://dbpedia.org/class/yago/CitiesInCalifornia>;
dbp:populationTotal ?popTotal ;
rdfs:label ?name
OPTIONAL {?city dbp:populationMetro ?popMetro. }
FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
}
}
ORDER BY desc(?popTotal)

The following is a shorthand for the query above:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?city dbp:populationTotal ?popTotal ;
rdfs:label ?name
OPTIONAL {?city dbp:populationMetro ?popMetro. }
FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
{ ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> . }
UNION
{ ?city rdf:type <http://dbpedia.org/class/yago/CitiesInCalifornia>. }
}
ORDER BY desc(?popTotal)

Named Graphs and the GRAPH Clause

The GRAPH clause matches a pattern inside named graphs:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
GRAPH ?g {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> .
}
}

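Here the pattern is matched within each named graph in the dataset, and ?g is bound to the IRI of the graph in which the match was found.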
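One way to picture named graphs is as a fourth component attached to every triple. The Python sketch below models this with quads; the graph names `ex:graphA`/`ex:graphB`, the data, and the helper `graph_match` are hypothetical.

```python
# Each quad is (graph, subject, predicate, object); all names are made up.
QUADS = [
    ("ex:graphA", "ex:Austin", "rdf:type", "yago:WikicatCitiesInTexas"),
    ("ex:graphB", "ex:Houston", "rdf:type", "yago:WikicatCitiesInTexas"),
    ("ex:graphB", "ex:Boston", "rdf:type", "yago:CitiesInMassachusetts"),
]


def graph_match(predicate, obj, quads):
    """Mimic GRAPH ?g { ?city <predicate> <obj> }: bind ?g and ?city together."""
    return [
        {"?g": graph, "?city": subject}
        for graph, subject, pred, o in quads
        if pred == predicate and o == obj
    ]


print(graph_match("rdf:type", "yago:WikicatCitiesInTexas", QUADS))
```

Every solution reports both the matching city and the graph that contains the matching triple.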

ASK

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK WHERE {
<http://dbpedia.org/resource/Austin,_Texas>
rdf:type
<http://dbpedia.org/class/yago/WikicatCitiesInTexas> .
}

The WHERE clause contains a graph pattern; ASK returns true if the pattern has at least one match and false otherwise.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/ontology/>
ASK WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
dbp:populationMetro ?popMetro.
FILTER (?popTotal > 600000 && ?popMetro < 1800000)
}

This query has a more complex WHERE clause.

DESCRIBE

DESCRIBE <http://dbpedia.org/resource/Austin,_Texas>

This returns a subgraph that describes the resource.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/ontology/>
DESCRIBE ?city WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
dbp:populationMetro ?popMetro.
FILTER (?popTotal > 600000 && ?popMetro < 1800000)
}

More complex conditions can be added.

CONSTRUCT

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
CONSTRUCT {
?city rdf:type <http://myvocabulary.com/LargeMetroCitiesInTexas> ;
<http://myvocabulary.com/cityName> ?name ;
<http://myvocabulary.com/totalPopulation> ?popTotal ;
<http://myvocabulary.com/metroPopulation> ?popMetro .
} WHERE {
?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
dbp:populationTotal ?popTotal ;
rdfs:label ?name ;
dbp:populationMetro ?popMetro .
FILTER (?popTotal > 500000 && langMatches(lang(?name), "EN"))
}

CONSTRUCT builds a new graph from the query solutions. New vocabulary terms can be introduced directly in the CONSTRUCT template.
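Conceptually, CONSTRUCT fills a triple template once per solution of the WHERE clause. The Python sketch below illustrates this under simplifying assumptions; the `my:` names, the sample solutions, and the helper `construct` are all hypothetical.

```python
def construct(template, solutions):
    """Instantiate every template triple once per solution and collect the results."""
    graph = set()
    for solution in solutions:
        for triple in template:
            filled = tuple(solution.get(term, term) for term in triple)
            if any(isinstance(t, str) and t.startswith("?") for t in filled):
                continue  # a triple with an unbound variable is left out
            graph.add(filled)
    return graph


# Hypothetical template and solutions; the figure is invented.
TEMPLATE = [
    ("?city", "rdf:type", "my:LargeMetroCitiesInTexas"),
    ("?city", "my:totalPopulation", "?popTotal"),
]
SOLUTIONS = [
    {"?city": "ex:Austin", "?popTotal": 950000},
    {"?city": "ex:Houston"},  # ?popTotal unbound in this solution
]

new_graph = construct(TEMPLATE, SOLUTIONS)
print(sorted(new_graph, key=str))
```

The output is a set of triples, so identical triples produced by different solutions collapse into one.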

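For SPARQL endpoints, SELECT and ASK queries return XML (application/sparql-results+xml) as the standard query result format. A JSON serialization also exists; in the SPARQL 1.0 era it was only a W3C Working Group Note, and it became a standard format (application/sparql-results+json) with SPARQL 1.1.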

Both DESCRIBE and CONSTRUCT return RDF graphs directly, and not in the standard SPARQL query results format. For example, DBpedia will return results in N3.
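As an illustration of the XML result format, the Python sketch below parses a small hand-written SELECT result with the standard library. The sample document and its values (including the placeholder number 123) are made up, and `parse_select_results` is a hypothetical helper that ignores datatypes and language tags.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response; the values are placeholders, not real data.
SAMPLE_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="city"/><variable name="popTotal"/></head>
  <results>
    <result>
      <binding name="city"><uri>http://dbpedia.org/resource/Austin,_Texas</uri></binding>
      <binding name="popTotal"><literal>123</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_select_results(xml_text):
    """Turn a SELECT result document into a list of {variable: text value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child element
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


print(parse_select_results(SAMPLE_XML))
```

Variables that are unbound in a solution simply have no binding element, so they are absent from that row's dict.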

SPARQL 1.1 Syntax

SPARQL 1.1 adds:

  1. Aggregates: ability to group results and calculate aggregate values (e.g. count, min, max, avg, sum, …).
  2. Projected expressions: ability for query results to contain values derived from constants, function calls, or other expressions in the SELECT list.
  3. Sub-queries: allows a query to be embedded within another.
  4. Negation: includes two negation operators: NOT EXISTS and MINUS
  5. Update: an update language for RDF

It also provides the following advanced features:

  1. Property paths: query arbitrary length paths of a graph via a regular-expression-like syntax
  2. Query Federation: ability to split a single query and send parts of it to different SPARQL endpoints and then combining the results from each one
  3. Service Description: a vocabulary and discovery mechanism that describes the capabilities of a SPARQL endpoint.
  4. Entailment Regimes: defines conditions under which SPARQL queries can be used for inference under RDF, RDF Schema, OWL, or RIF entailment.

For more details, see [2].

Aggregate

# What are the top interests of LiveJournal users interested in Harry Potter?
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?interest (COUNT(*) AS ?count) WHERE
{
?p foaf:interest <http://www.livejournal.com/interests.bml?int=harry+potter> .
?p foaf:interest ?interest
}
GROUP BY ?interest ORDER BY DESC(COUNT(*)) LIMIT 10

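In the query above, the items in the SELECT list are separated by whitespace rather than commas, the aggregate expression is wrapped in parentheses together with its AS alias, and COUNT is used together with GROUP BY. Similar aggregate functions include MIN, MAX, SUM, and AVG.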
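The grouping and counting performed by this query can be mimicked in plain Python. The sketch below uses made-up (person, interest) pairs and a hypothetical helper `top_co_interests`; it mirrors the two triple patterns, the GROUP BY, and the LIMIT.

```python
from collections import Counter

# Hypothetical (person, interest) pairs standing in for foaf:interest triples.
INTERESTS = [
    ("p1", "harry potter"), ("p1", "reading"),
    ("p2", "harry potter"), ("p2", "reading"), ("p2", "movies"),
    ("p3", "movies"),
]


def top_co_interests(pairs, target, limit=10):
    """Count the interests of everyone who lists `target`, most frequent first."""
    # First pattern: people interested in the target.
    fans = {person for person, interest in pairs if interest == target}
    # Second pattern plus GROUP BY ?interest with COUNT(*).
    counts = Counter(interest for person, interest in pairs if person in fans)
    # ORDER BY DESC(count) LIMIT n.
    return counts.most_common(limit)


print(top_co_interests(INTERESTS, "harry potter"))
```

As in the SPARQL query, the target interest itself is counted too, because the second pattern also matches it.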

Subquery

PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
FROM <http://www.w3.org/People/Berners-Lee/card>
WHERE {
{
SELECT DISTINCT ?person ?name WHERE {
?person foaf:name ?name
} ORDER BY ?name LIMIT 10 OFFSET 10
}
OPTIONAL { ?person foaf:mbox ?email }
}

Other Non-Standardized Features

The following features are not standardized in SPARQL 1.1:

  • Full-text search. How is keyword/key-phrase search integrated with SPARQL queries?
  • Parameters. How can initial bindings be supplied to a SPARQL endpoint along with the query itself?
  • Querying “all” named graphs. Is there a standard way to ask that a SPARQL query be run against all the graphs that a SPARQL endpoint knows about?
  • SPARQL in XML and RDF. Several toolsets make use of XML- or RDF-based serializations of SPARQL queries.

For example, for text search, OpenLink's Virtuoso provides an extension function, bif:contains.

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?lbl ?est
WHERE {
?country rdfs:label ?lbl .
FILTER(bif:contains(?lbl, "Republic")) .
?country a type:Country108544813 ;
prop:establishedDate ?est .
FILTER(?est < "1920-01-01"^^xsd:date) .
}

Differences Between SQL and SPARQL

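  • SQL data is organized by a schema defined through DDL, whereas the RDF data that SPARQL queries is defined as statements in the form of triples.
  • Compared with relational data, RDF is a post-Web language, which makes it easy to bring in and merge data from third-party publishers.
  • SQL handles missing (NULL) values with LEFT OUTER JOIN, while SPARQL uses OPTIONAL to cope with statements that may be absent.
  • SPARQL can run queries over the Web, whereas SQL needs a corresponding database.
  • Retrieving the knowledge about a given entity is easier in SPARQL; in SQL it may require querying several related tables.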
  • SPARQL supports named graphs and federated queries across data sources.
  • RDF/RDFS/OWL make extending the schema flexible and convenient.
  • A Direct Mapping of Relational Data to RDF and R2RML: RDB to RDF Mapping Language are two W3C specifications for mapping relational data to RDF.

For more examples, see [4].

Further Reading