This post is a side note on SPARQL: my study notes on the Cambridge Semantics tutorial [1].

## SPARQL 1.0 Syntax

SPARQL 1.0 is a read-only query language; it has no update or other advanced features.

### Graph Pattern

A SPARQL query can be seen as matching a graph pattern against the data. The simplest pattern is a single triple:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas>
}
```

This returns every subject whose type is the DBpedia class WikicatCitiesInTexas.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> .
  ?city dbp:populationTotal ?popTotal .
}
```

This query adds a second triple pattern. A more specific pattern means fewer results: only cities that also have a total population are returned.

Queries can also use Turtle's abbreviated syntax:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal .
}
```

Note that the first statement ends with a semicolon rather than a period, so the next predicate applies to the same subject.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
}
```

When matching a property should be optional, use an OPTIONAL clause. Unlike SQL, RDF has no explicit NULL, so OPTIONAL is how you handle information that may be missing without filtering out too many results.

### Solution Modifiers

Next, a group of solution modifiers: ORDER BY, LIMIT, and OFFSET.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
}
ORDER BY desc(?popTotal)
```

Use `desc` or `asc` to sort a variable in descending or ascending order; ascending is the default.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
}
ORDER BY desc(?popTotal)
LIMIT 10
OFFSET 5
# At most 10 results are returned; the first 5 are skipped, so output starts with the 6th.
```

LIMIT and OFFSET can be combined.

### Remove Results

The FILTER clause removes solutions. The following kinds of operations are allowed:

- Logical: &&, ||, !
- Mathematical: +, -, *, /
- Comparison: =, !=, <, >, <=, >=
- SPARQL tests: isURI, isBlank, isLiteral, bound
- SPARQL accessors: str, lang, datatype
- Other: sameTerm, langMatches, regex

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
  FILTER (?popTotal > 50000)
}
ORDER BY desc(?popTotal)
```

A FILTER is usually written after the graph pattern it constrains.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal ;
        rdfs:label ?name .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
  FILTER (?popTotal > 50000)
}
ORDER BY desc(?popTotal)
```

Compared with the previous query, this one adds one more pattern, `rdfs:label ?name`. Once it is added, a city with several labels (for example, one per language) produces one row per label. A FILTER on the label then makes the selection stricter.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal ;
        rdfs:label ?name .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
  FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
}
ORDER BY desc(?popTotal)
```

This FILTER combines two conditions with a logical AND. It can be written almost equivalently as `FILTER (?popTotal > 50000 && lang(?name) = "en")`, with one difference: `langMatches` is case-insensitive and also accepts subtags such as `en-US`, whereas the equality test matches only the exact tag `en`.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal ;
        rdfs:label ?name .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
  FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
  FILTER (!bound(?popMetro))
}
ORDER BY desc(?popTotal)
```

Here `bound()` tests whether the variable `?popMetro` was matched by the OPTIONAL clause. With the negation, only cities that have no metro population are kept.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal ;
        rdfs:label ?name .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
  FILTER (?popTotal > 50000)
  FILTER (langMatches(lang(?name), "EN"))
  FILTER (!bound(?popMetro))
}
ORDER BY desc(?popTotal)
```

Several FILTER clauses can be written side by side; together they act as a conjunction.

### UNION

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  {
    ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
          dbp:populationTotal ?popTotal ;
          rdfs:label ?name .
    OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
  }
  UNION
  {
    ?city rdf:type <http://dbpedia.org/class/yago/CitiesInCalifornia> ;
          dbp:populationTotal ?popTotal ;
          rdfs:label ?name .
    OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
  }
}
ORDER BY desc(?popTotal)
```

The query above can be shortened by applying UNION only to the parts that differ:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?city dbp:populationTotal ?popTotal ;
        rdfs:label ?name .
  OPTIONAL { ?city dbp:populationMetro ?popMetro . }
  FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
  { ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> . }
  UNION
  { ?city rdf:type <http://dbpedia.org/class/yago/CitiesInCalifornia> . }
}
ORDER BY desc(?popTotal)
```

### Named Graphs and the GRAPH Clause

The GRAPH clause matches a pattern inside named graphs:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  GRAPH ?g {
    ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> .
  }
}
```

The pattern is evaluated against each named graph in the dataset, and `?g` is bound to the IRI of the graph in which the match was found. The clause does not create a graph.

### ASK

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK WHERE {
  <http://dbpedia.org/resource/Austin,_Texas>
    rdf:type
    <http://dbpedia.org/class/yago/WikicatCitiesInTexas> .
}
```

The WHERE clause holds a graph pattern; ASK returns true or false depending on whether the pattern has any match.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/ontology/>
ASK WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal ;
        dbp:populationMetro ?popMetro .
  FILTER (?popTotal > 600000 && ?popMetro < 1800000)
}
```

This query has a more complex WHERE clause.

### DESCRIBE

```sparql
DESCRIBE <http://dbpedia.org/resource/Austin,_Texas>
```

This returns a subgraph describing the entity.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/ontology/>
DESCRIBE ?city WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal ;
        dbp:populationMetro ?popMetro .
  FILTER (?popTotal > 600000 && ?popMetro < 1800000)
}
```

More complex conditions can be added with a WHERE clause.

### CONSTRUCT

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://dbpedia.org/ontology/>
CONSTRUCT {
  ?city rdf:type <http://myvocabulary.com/LargeMetroCitiesInTexas> ;
        <http://myvocabulary.com/cityName> ?name ;
        <http://myvocabulary.com/totalPopulation> ?popTotal ;
        <http://myvocabulary.com/metroPopulation> ?popMetro .
} WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal ;
        rdfs:label ?name ;
        dbp:populationMetro ?popMetro .
  FILTER (?popTotal > 500000 && langMatches(lang(?name), "EN"))
}
```

CONSTRUCT builds a new graph from the matches, and it may use a new vocabulary: the new terms are simply written in the CONSTRUCT template.

For SPARQL endpoints, SELECT and ASK queries return XML (application/sparql-results+xml) as the standard query result format. There is also a JSON syntax, which was not standardized under SPARQL 1.0 and became a W3C Recommendation with SPARQL 1.1. Both DESCRIBE and CONSTRUCT return RDF graphs directly rather than the SPARQL query results format. For example, DBpedia returns such results in N3.

## SPARQL 1.1 Syntax

SPARQL 1.1 adds:

- Aggregates: ability to group results and calculate aggregate values (e.g. count, min, max, avg, sum, …).
- Projected expressions: ability for query results to contain values derived from constants, function calls, or other expressions in the SELECT list.
- Sub-queries: allows a query to be embedded within another.
- Negation: includes two negation operators, NOT EXISTS and MINUS.
- Update: an update language for RDF.

It also has the following advanced features:

- Property paths: query arbitrary-length paths of a graph via a regular-expression-like syntax.
- Query Federation: ability to split a single query, send parts of it to different SPARQL endpoints, and combine the results from each one.
- Service Description: a vocabulary and discovery mechanism that describes the capabilities of a SPARQL endpoint.
- Entailment Regimes: defines conditions under which SPARQL queries can be used for inference under RDF, RDF Schema, OWL, or RIF entailment.
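As a small illustration of the negation feature listed above, the SPARQL 1.0 idiom of OPTIONAL plus `!bound(?popMetro)` can be expressed more directly in SPARQL 1.1. The sketch below reuses the class and property IRIs from the earlier examples and assumes the endpoint supports SPARQL 1.1; it has not been run against DBpedia, so treat it as a sketch rather than a verified query.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/ontology/>

# Cities in Texas that have a total population but no metro population.
SELECT ?city ?popTotal WHERE {
  ?city rdf:type <http://dbpedia.org/class/yago/WikicatCitiesInTexas> ;
        dbp:populationTotal ?popTotal .
  # SPARQL 1.1 negation: keep a solution only if the inner pattern has no match.
  FILTER NOT EXISTS { ?city dbp:populationMetro ?popMetro }
}
ORDER BY DESC(?popTotal)
```

Replacing the FILTER line with `MINUS { ?city dbp:populationMetro ?popMetro }` should give the same result here. The two operators can differ in edge cases, for instance when the negated pattern shares no variables with the rest of the query.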
See [2] for the full details.

### Aggregate

```sparql
# What are the top interests of LiveJournal users interested in Harry Potter?
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?interest (COUNT(*) AS ?count) WHERE {
  ?p foaf:interest <http://www.livejournal.com/interests.bml?int=harry+potter> .
  ?p foaf:interest ?interest
}
GROUP BY ?interest
ORDER BY DESC(COUNT(*))
LIMIT 10
```

Items in the SELECT list are not separated by commas. In standard SPARQL 1.1 an aggregate with an alias must be wrapped in parentheses, as in `(COUNT(*) AS ?count)`. Here COUNT is combined with GROUP BY to count per interest. Similar functions include MIN, MAX, SUM, and AVG.

### Subquery

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
FROM <http://www.w3.org/People/Berners-Lee/card>
WHERE {
  {
    SELECT DISTINCT ?person ?name WHERE {
      ?person foaf:name ?name
    }
    ORDER BY ?name
    LIMIT 10
    OFFSET 10
  }
  OPTIONAL { ?person foaf:mbox ?email }
}
```

The inner SELECT picks ten people ordered by name, skipping the first ten; the outer query then looks up each person's mailbox if one exists.

### Other Non-Standardized Features

The following features are not standardized by SPARQL 1.1:

- Full-text search. How is keyword/key-phrase search integrated with SPARQL queries?
- Parameters. How can initial bindings be supplied to a SPARQL endpoint along with the query itself?
- Querying "all" named graphs. Is there a standard way to ask that a SPARQL query be run against all the graphs that a SPARQL endpoint knows about?
- SPARQL in XML and RDF. Several toolsets make use of XML- or RDF-based serializations of SPARQL queries.

For text search, for example, OpenLink's Virtuoso provides an extension function, `bif:contains`:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?lbl ?est
WHERE {
  ?country rdfs:label ?lbl .
  FILTER(bif:contains(?lbl, "Republic")) .
  ?country a type:Country108544813 ;
           prop:establishedDate ?est .
  FILTER(?est < "1920-01-01"^^xsd:date) .
}
```

## Differences Between SQL and SPARQL

- SQL data is organized by a schema defined through DDL, whereas the RDF data that SPARQL queries is expressed as triple statements.
- Compared with relational data, RDF is a post-Web language, which makes it easy to bring in and merge data from third-party publishers.
- SQL uses LEFT OUTER JOIN to cope with NULL values, whereas SPARQL uses OPTIONAL to handle statements that have no match.
- SPARQL can query data over the Web, whereas SQL needs a corresponding database.
- Retrieving what is known about a given entity is easier in SPARQL; in SQL it may require querying several related tables.
- SPARQL can query across named graphs and, with SPARQL 1.1 query federation, across multiple endpoints (federated queries).
- RDF/RDFS/OWL make extending the schema flexible and convenient.
- "A Direct Mapping of Relational Data to RDF" and "R2RML: RDB to RDF Mapping Language" are two W3C specifications for mapping relational data to RDF, bridging SQL and SPARQL.

See [4] for more examples.

## Further Reading

1. SPARQL Nuts & Bolts.
2. https://www.w3.org/TR/sparql11-query/
3. SPARQL by Example.
4. SPARQL vs. SQL.