Knowledge Graph Practice Notes 4 - Hands-on Jena | YOLO - A Blog You Only Look Once


Knowledge graph practice notes, part 4: hands-on Jena, generating the subgraphs we need.

Reading RDF files

Jena 2.10 introduced RIOT (RDF I/O technology).

Class	Description
RDFDataMgr	Main set of functions to read and load models and datasets
StreamRDF	Handles the output of a parser
RDFParser	Configures a parser in detail
StreamManager	Handles opening of input streams
RDFLanguages	Registry of languages
RDFParserRegistry	Registry of parser factories

RIOT handles many input syntaxes, including Turtle (TTL), N-Triples, N3, and RDF/JSON.

Model model = ModelFactory.createDefaultModel() ;
model.read("data.ttl") ;
// If the syntax does not match the file extension, declare the language explicitly
model.read("data.foo", "TURTLE") ;

Using RDFDataMgr looks like this:

// Create a model and read into it from file
// "data.ttl" assumed to be Turtle.
Model model = RDFDataMgr.loadModel("data.ttl") ;

// Create a dataset and read into it from file
// "data.trig" assumed to be TriG.
Dataset dataset = RDFDataMgr.loadDataset("data.trig") ;

// Read into an existing Model
RDFDataMgr.read(model, "data2.ttl") ;

Using RDFParser looks like this; it lets you configure details such as error handling and the input language:

// The parsers will do the necessary character set conversion.
// noWhere is a StreamRDF destination defined elsewhere (for example StreamRDFLib.sinkNull()).
try (InputStream in = new FileInputStream("data.some.unusual.extension")) {
    RDFParser.create()
        .source(in)
        .lang(RDFLanguages.TRIG)
        .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
        .base("http://example/base")
        .parse(noWhere);
}

Writing query results as TTL

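We adapt the query from the previous post so that its results are written directly to TTL files. The goal is for other programs to read those TTL files with Jena and build the model they need.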

// queryString, model, dataset, output (path prefix) and count (int) are defined elsewhere
Query query = QueryFactory.create(queryString);
QueryExecution qExec = QueryExecutionFactory.create(query, model);
//QueryExecution qExec = QueryExecutionFactory.create(queryString, model);
System.out.println("Start executing query");
//Iterator<Triple> rs = qExec.execConstructTriples();
ResultSet results = qExec.execSelect();
List<QuerySolution> solutions = ResultSetFormatter.toList(results);
for (QuerySolution solution : solutions) {
    RDFNode objectNode = solution.get("?object");
    String uri = objectNode.asResource().getURI();
    // Drop the 35-character prefix "http://yago-knowledge.org/resource/" and replace slashes for the file name
    String localName = uri.substring(35).replace("/", "u0001");
    if (localName.isEmpty()) {
        count++;
        localName = "BlankNode_" + count;
    }

    // First lookup: all statements whose subject is objectNode
    Model subgraph = model.listStatements(objectNode.asResource(), null, (RDFNode) null).toModel();
    // Second lookup: all statements whose object is objectNode, added to the same model
    subgraph.add(model.listStatements(null, null, objectNode).toModel());
    // Write the subgraph to its own file; try-with-resources closes the writer
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(new File(output + localName + ".ttl")))) {
        subgraph.write(bw, "TURTLE");
    }
}
qExec.close();
System.out.println("Finish writing to file");
System.out.println("Final count:" + solutions.size());
dataset.close();
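The file name in the loop above comes from a hard-coded `substring(35)`, which throws for URIs shorter than 35 characters and for blank nodes (where `getURI()` returns null). The following is a minimal sketch, using only the Java standard library, of the same derivation with an explicit prefix check. The class name `LocalNames`, the method `localName`, and the constant `YAGO_PREFIX` are hypothetical names introduced for illustration; the prefix value is taken from the example data in this post.

```java
public class LocalNames {
    /** Namespace prefix seen in the example data; it is 35 characters long. */
    public static final String YAGO_PREFIX = "http://yago-knowledge.org/resource/";

    /**
     * Derives a file-name-friendly local name from a URI.
     * Returns null when uri is null, does not start with prefix,
     * or has nothing after the prefix, so the caller can fall back
     * to a generated name such as "BlankNode_" + count.
     */
    public static String localName(String uri, String prefix, String slashReplacement) {
        if (uri == null || !uri.startsWith(prefix)) {
            return null;
        }
        String rest = uri.substring(prefix.length());
        if (rest.isEmpty()) {
            return null;
        }
        // Slashes would be read as directory separators in a file path
        return rest.replace("/", slashReplacement);
    }

    public static void main(String[] args) {
        String r = "u0001"; // replacement string as used in the post
        System.out.println(localName(YAGO_PREFIX + "2000_UEFA_Cup_Final_riots", YAGO_PREFIX, r));
        System.out.println(localName(YAGO_PREFIX + "A/B", YAGO_PREFIX, r));
        System.out.println(localName("http://example.org/x", YAGO_PREFIX, r));
    }
}
```

In the loop, a null result would take the same branch that the empty-string check takes now.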

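With these steps the data is written out as standard TTL with very little effort. The example below describes the relations of the entity 2000_UEFA_Cup_Final_riots, covering both the statements in which it is the object and those in which it is the subject (the left and right relations).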

<http://yago-knowledge.org/resource/Arsenal_firm>
      <http://yago-knowledge.org/resource/linksTo>
              <http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots> .

<http://yago-knowledge.org/resource/2013-14_Arsenal_F.C._season>
      <http://yago-knowledge.org/resource/linksTo>
              <http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots> .

<http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots>
      a       <http://www.w3.org/2002/07/owl#Thing> , <http://yago-knowledge.org/resource/wordnet_action_100037396> , <http://yago-knowledge.org/resource/wikicat_Events> , <http://yago-knowledge.org/resource/wordnet_aggression_100964569> , <http://yago-knowledge.org/resource/wikicat_Sports_riots> , <http://yago-knowledge.org/resource/wordnet_conflict_100958896> , <http://yago-knowledge.org/resource/wordnet_riot_101170502> , <http://yago-knowledge.org/resource/wikicat_2000_events> , <http://yago-knowledge.org/resource/wordnet_act_100030358> , <http://yago-knowledge.org/resource/yagoPermanentlyLocatedEntity> , <http://yago-knowledge.org/resource/wordnet_psychological_feature_100023100> , <http://yago-knowledge.org/resource/wikicat_2000_riots> , <http://yago-knowledge.org/resource/wikicat_2000s_events> , <http://yago-knowledge.org/resource/wordnet_violence_100965404> , <http://yago-knowledge.org/resource/wikicat_May_events> , <http://yago-knowledge.org/resource/wordnet_group_action_101080366> , <http://yago-knowledge.org/resource/wordnet_abstraction_100002137> , <http://yago-knowledge.org/resource/wikicat_Riots> , <http://yago-knowledge.org/resource/wordnet_event_100029378> , <http://yago-knowledge.org/resource/wikicat_Riots_and_civil_disorder_in_Denmark> , <http://yago-knowledge.org/resource/wikicat_May_2000_events> ;

Reading the TTL files back into Jena

Once these files exist, we can call the model's read method to load the TTL files back in, as follows:

File file = new File(inputDictionary);
File[] fs = file.listFiles();
model = ModelFactory.createDefaultModel(); // creates an in-memory Jena Model
RIOT.init(); // one initialization is enough, so it sits outside the loop
for (File f : fs) {
    if (!f.isDirectory()) {
        // try-with-resources closes the stream after each file
        try (InputStream in = new FileInputStream(f)) {
            System.out.println(f.getAbsolutePath());
            model.read(in, null, "TURTLE");
            // running total of statements loaded so far
            System.out.println(model.size());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

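Note that reading is incremental: each file's statements are added to the model, so reading several files accumulates their contents rather than overwriting what was loaded earlier.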
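The loop above passes every non-directory file to the Turtle parser, so a stray file in another format in the same directory would likely fail to parse. Below is a minimal sketch, using only `java.nio` from the standard library, of collecting just the `.ttl` files before reading them. The class name `TtlFileLister` and its methods are hypothetical and not part of Jena; the `demo` method builds a temporary directory purely to show the filtering.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TtlFileLister {
    /** Returns the regular files directly inside dir whose names end in ".ttl", sorted by path. */
    public static List<Path> listTtlFiles(Path dir) {
        List<Path> result = new ArrayList<>();
        // The glob is matched against each entry's file name
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.ttl")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip directories that happen to end in .ttl
                    result.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result);
        return result;
    }

    /** Creates a temporary directory with mixed content and returns the names that pass the filter. */
    public static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("ttl-demo");
            Files.createFile(dir.resolve("b.ttl"));
            Files.createFile(dir.resolve("a.ttl"));
            Files.createFile(dir.resolve("notes.txt"));
            Files.createDirectory(dir.resolve("sub.ttl"));
            List<String> names = new ArrayList<>();
            for (Path p : listTtlFiles(dir)) {
                names.add(p.getFileName().toString());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each returned path could then be opened and handed to `model.read(in, null, "TURTLE")` exactly as in the loop above.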

With this model in hand, we can run any kind of query again to get the results we want.

Further reading

1. https://jena.apache.org/documentation/io/rdf-input.html
