Knowledge Graph Practice Notes 4 - Hands-on with Jena

Author: Steven Date: Mar 29, 2019 Updated On: May 5, 2022
Categories: KG
1k words in total, 5 minutes required.

This is the fourth post in the knowledge graph practice notes: hands-on work with Jena to generate the subgraph we need.

Reading RDF files

Jena 2.10 introduced RIOT (RDF I/O technology).

Class              Comment
RDFDataMgr         Main set of functions to read and load models and datasets
StreamRDF          Handles the output of the parser
RDFParser          Configures a parser in detail
StreamManager      Handles the opening of input streams
RDFLanguages       Registered languages
RDFParserRegistry  Registered parser factories

RIOT can handle many input syntaxes, including TURTLE (TTL), NTRIPLES, N3 and RDFJSON.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

Model model = ModelFactory.createDefaultModel();
model.read("data.ttl");
// If the syntax does not match the file extension, declare the language explicitly.
model.read("data.foo", "TURTLE");
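
The comment above says a language can be declared when the file extension does not reveal the syntax. As a rough picture of what extension-based guessing amounts to, here is a minimal sketch in plain Java. SyntaxGuessSketch and guessSyntax are hypothetical names, not part of Jena, and the table covers only a handful of common extensions with illustrative labels; Jena's own detection is richer than this.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, NOT part of Jena: a simplified stand-in for
// extension-based syntax detection.
final class SyntaxGuessSketch {
    // A few file extensions conventionally used for RDF syntaxes.
    // The labels on the right are illustrative (assumption).
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "ttl", "TURTLE",
            "nt", "N-TRIPLES",
            "n3", "N3",
            "rj", "RDF/JSON",
            "trig", "TRIG",
            "rdf", "RDF/XML");

    // Returns the guessed syntax, or empty when the extension is missing or unknown.
    static Optional<String> guessSyntax(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(guessSyntax("data.ttl"));
        System.out.println(guessSyntax("data.foo"));
    }
}
```

An empty result for "data.foo" corresponds to the case where the language has to be passed explicitly, as in model.read("data.foo", "TURTLE").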

RDFDataMgr can be used as follows:

import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

// Create a model and read into it from file
// "data.ttl" assumed to be Turtle.
Model model = RDFDataMgr.loadModel("data.ttl");

// Create a dataset and read into it from file
// "data.trig" assumed to be TriG.
Dataset dataset = RDFDataMgr.loadDataset("data.trig");

// Read into an existing Model
RDFDataMgr.read(model, "data2.ttl");

RDFParser can be used as follows; it lets you configure things such as error handling and the input language:

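// noWhere is the StreamRDF destination that receives the parser output.
// It is assumed here to be a sink that discards everything.
StreamRDF noWhere = StreamRDFLib.sinkNull();

// The parsers will do the necessary character set conversion.
try (InputStream in = new FileInputStream("data.some.unusual.extension")) {
    RDFParser.create()
        .source(in)
        .lang(RDFLanguages.TRIG)
        .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
        .base("http://example/base")
        .parse(noWhere);
}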

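Writing query results out as TTL

We modify the query processing from the previous post so that the results are written straight out as TTL files. The goal is to let other programs load those TTL files with Jena and build the model they need.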

// model, dataset, queryString, output and count are defined earlier (see the previous post).
Query query = QueryFactory.create(queryString);
System.out.println("Start execute query");
List<QuerySolution> solutions;
try (QueryExecution qExec = QueryExecutionFactory.create(query, model)) {
    // Materialise the results so they stay usable after qExec is closed.
    solutions = ResultSetFormatter.toList(qExec.execSelect());
}
for (QuerySolution solution : solutions) {
    RDFNode objectNode = solution.get("?object");
    // getURI() returns null for blank nodes, so this assumes ?object is always a URI resource.
    String uri = objectNode.asResource().getURI();
    // Drop the 35-character prefix "http://yago-knowledge.org/resource/"
    // and replace "/" so that the rest can be used as a file name.
    String localName = uri.substring(35).replace("/", "u0001");
    if (localName.isEmpty()) {
        count++;
        localName = "BlankNode_" + count;
    }

    // First lookup: all statements whose subject is objectNode.
    Model subgraph = model.listStatements(objectNode.asResource(), null, (RDFNode) null).toModel();
    // Second lookup: all statements whose object is objectNode, added into the same model.
    subgraph.add(model.listStatements(null, null, objectNode).toModel());

    // Write the subgraph to a file; try-with-resources closes the writer.
    File newFile = new File(output + localName + ".ttl");
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(newFile))) {
        subgraph.write(bw, "TURTLE");
    }
}
System.out.println("Finish writing to file");
System.out.println("Final count:" + solutions.size());
dataset.close();
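
The file-name step in the listing above relies on uri.substring(35), which only works when every URI starts with the 35-character YAGO namespace. A slightly more defensive version might look like the sketch below. FileNameSketch and toFileName are hypothetical names introduced for illustration, and falling back to a BlankNode_ name for URIs outside the namespace is an assumption, not something the original code does.

```java
// Hypothetical helper for illustration; not part of Jena or of the original code.
final class FileNameSketch {
    // Namespace of the YAGO resources used in this post (35 characters).
    static final String YAGO_PREFIX = "http://yago-knowledge.org/resource/";

    // Derives a file name from a resource URI.
    // Falls back to "BlankNode_<fallbackIndex>" when the URI is null,
    // lies outside the namespace, or has an empty local part (assumption).
    static String toFileName(String uri, int fallbackIndex) {
        String local = "";
        if (uri != null && uri.startsWith(YAGO_PREFIX)) {
            // Same replacement string as in the listing above.
            local = uri.substring(YAGO_PREFIX.length()).replace("/", "u0001");
        }
        return local.isEmpty() ? "BlankNode_" + fallbackIndex : local;
    }

    public static void main(String[] args) {
        System.out.println(toFileName(YAGO_PREFIX + "2000_UEFA_Cup_Final_riots", 1));
        System.out.println(toFileName(null, 2));
    }
}
```

Checking the prefix with startsWith avoids both the magic number and the exception that substring would throw on a shorter URI.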

With these steps the data is turned into standard TTL quite easily. Below is an example describing the relations of the entity 2000_UEFA_Cup_Final_riots, including both left relations and right relations.

<http://yago-knowledge.org/resource/Arsenal_firm>
<http://yago-knowledge.org/resource/linksTo>
<http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots> .

<http://yago-knowledge.org/resource/2013-14_Arsenal_F.C._season>
<http://yago-knowledge.org/resource/linksTo>
<http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots> .

<http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots>
a <http://www.w3.org/2002/07/owl#Thing> , <http://yago-knowledge.org/resource/wordnet_action_100037396> , <http://yago-knowledge.org/resource/wikicat_Events> , <http://yago-knowledge.org/resource/wordnet_aggression_100964569> , <http://yago-knowledge.org/resource/wikicat_Sports_riots> , <http://yago-knowledge.org/resource/wordnet_conflict_100958896> , <http://yago-knowledge.org/resource/wordnet_riot_101170502> , <http://yago-knowledge.org/resource/wikicat_2000_events> , <http://yago-knowledge.org/resource/wordnet_act_100030358> , <http://yago-knowledge.org/resource/yagoPermanentlyLocatedEntity> , <http://yago-knowledge.org/resource/wordnet_psychological_feature_100023100> , <http://yago-knowledge.org/resource/wikicat_2000_riots> , <http://yago-knowledge.org/resource/wikicat_2000s_events> , <http://yago-knowledge.org/resource/wordnet_violence_100965404> , <http://yago-knowledge.org/resource/wikicat_May_events> , <http://yago-knowledge.org/resource/wordnet_group_action_101080366> , <http://yago-knowledge.org/resource/wordnet_abstraction_100002137> , <http://yago-knowledge.org/resource/wikicat_Riots> , <http://yago-knowledge.org/resource/wordnet_event_100029378> , <http://yago-knowledge.org/resource/wikicat_Riots_and_civil_disorder_in_Denmark> , <http://yago-knowledge.org/resource/wikicat_May_2000_events> ;
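
The example contains statements in which the entity is the object (the two linksTo statements) and statements in which it is the subject (the type list). To make that selection rule explicit without Jena, here is a minimal plain-Java sketch. NeighbourhoodSketch, Triple and around are hypothetical names, and triples are reduced to three strings for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two listStatements lookups; not Jena code.
final class NeighbourhoodSketch {
    // A triple reduced to three strings (illustrative simplification).
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Keeps every triple that has the entity as its subject or as its object.
    // A triple matching on both sides is still added only once.
    static List<Triple> around(List<Triple> graph, String entity) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(entity) || t.object.equals(entity)) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Calling around(graph, "2000_UEFA_Cup_Final_riots") on a list of such triples would keep both the incoming linksTo statements and the outgoing type statements, which is the shape of the TTL shown above.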

Reading the TTL files back into Jena

Once we have these files, we can call the model's read method to load the TTL files again, as follows:

RIOT.init(); // initialising once, outside the loop, is enough
File file = new File(inputDictionary);
File[] fs = file.listFiles();
model = ModelFactory.createDefaultModel(); // creates an in-memory Jena Model
if (fs != null) { // listFiles() returns null if the path is not a readable directory
    for (File f : fs) {
        if (f.isDirectory()) {
            continue;
        }
        System.out.println(f.getAbsolutePath());
        // try-with-resources closes the stream even if parsing fails
        try (InputStream in = new FileInputStream(f)) {
            model.read(in, null, "TURTLE");
            // Number of statements in the model after reading this file
            System.out.println(model.size());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

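Note that reading is incremental: reading several files does not overwrite what is already in the model; each read adds its statements to the existing ones.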
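
The incremental behaviour can be pictured as a set union. The sketch below is plain Java rather than Jena: each "file" is a list of statement lines and the "model" is a set, so a statement that appears in two files is stored once. IncrementalReadSketch and readInto are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a set of strings stands in for a Jena Model.
final class IncrementalReadSketch {
    // Adds every non-blank line of one "file" to the shared statement set
    // and returns the size of the set afterwards.
    static int readInto(Set<String> statements, List<String> fileLines) {
        for (String line : fileLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements.size();
    }

    public static void main(String[] args) {
        Set<String> model = new LinkedHashSet<>();
        List<String> fileA = Arrays.asList("<a> <linksTo> <b> .", "<a> <linksTo> <c> .");
        List<String> fileB = Arrays.asList("<a> <linksTo> <c> .", "<d> <linksTo> <a> .");
        System.out.println(readInto(model, fileA)); // prints 2
        System.out.println(readInto(model, fileB)); // prints 3: one statement was already present
    }
}
```

This matters for the per-entity files written earlier: a statement linking two exported entities can appear in both of their files, and it would presumably still count once after both are read.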

With this model in hand, we can run all kinds of queries again to get the results we want.

Further reading