Knowledge Graph Practice Notes, Part 4: Hands-on Jena, Generating the Required Subgraph

Reading RDF Files

Jena 2.10 introduced RIOT (RDF I/O technology). Its main classes are:

| Class | Comment |
| --- | --- |
| RDFDataMgr | Main set of functions to read and load models and datasets |
| StreamRDF | Handles the output of a parser |
| RDFParser | Detailed setup of a parser |
| StreamManager | Handles the opening of input streams |
| RDFLanguages | Language registry |
| RDFParserRegistry | Parser factory registry |

RIOT can handle a range of input formats, including TURTLE (TTL), NTRIPLES, N3, and RDFJSON.

```java
Model model = ModelFactory.createDefaultModel();
model.read("data.ttl");
// If the syntax does not match the file extension, a language can be declared
model.read("data.foo", "TURTLE");
```

Using RDFDataMgr looks like this:

```java
// Create a model and read into it from file
// "data.ttl" assumed to be Turtle.
Model model = RDFDataMgr.loadModel("data.ttl");

// Create a dataset and read into it from file
// "data.trig" assumed to be TriG.
Dataset dataset = RDFDataMgr.loadDataset("data.trig");

// Read into an existing Model
RDFDataMgr.read(model, "data2.ttl");
```

Using RDFParser looks like this. It lets you configure details such as error handling and the input language:

```java
// The parsers will do the necessary character set conversion.
// noWhere is a StreamRDF destination that is assumed to be defined elsewhere.
try (InputStream in = new FileInputStream("data.some.unusual.extension")) {
    RDFParser.create()
        .source(in)
        .lang(RDFLanguages.TRIG)
        .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
        .base("http://example/base")
        .parse(noWhere);
}
```

Writing Query Results to TTL

Here we change the query code from the previous post so that the results are written directly to TTL files. The goal is to let Jena in other programs read those TTL files directly and build the model it needs.

```java
// model, dataset, output (the target directory path) and count (an int counter)
// are assumed to be defined earlier, as in the previous post.
Query query = QueryFactory.create(queryString);
QueryExecution qExec = QueryExecutionFactory.create(query, model);
//QueryExecution qExec = QueryExecutionFactory.create(queryString, model);
System.out.println("Start execute query");
//Iterator<Triple> rs = qExec.execConstructTriples();
ResultSet results = qExec.execSelect();
List<QuerySolution> solutions = ResultSetFormatter.toList(results);
for (QuerySolution solution : solutions) {
    RDFNode objectNode = solution.get("object");
    String uri = objectNode.asResource().getURI();
    // Drop the 35-character prefix "http://yago-knowledge.org/resource/"
    String localName = uri.substring(35).replace("/", "u0001");
    if (localName.isEmpty()) {
        count++;
        localName = "BlankNode_" + count;
    }
    File newFile = new File(output + localName + ".ttl");
    // try-with-resources closes the writer even if writing fails
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(newFile))) {
        // First lookup: all statements whose subject is objectNode
        Model subgraph = model.listStatements(objectNode.asResource(), null, (RDFNode) null).toModel();
        // Second lookup: all statements whose object is objectNode, added to the same model
        subgraph.add(model.listStatements(null, null, objectNode).toModel());
        // Write the merged subgraph to the file
        subgraph.write(bw, "TURTLE");
    }
}
qExec.close();
System.out.println("Finish writing to file");
System.out.println("Final count:" + solutions.size());
dataset.close();
```

With these steps the data is written out as standard TTL with very little effort. Below is an example describing the relations of the entity 2000_UEFA_Cup_Final_riots, including both its left relations and its right relations, that is, the statements in which it appears as object or as subject.

```turtle
<http://yago-knowledge.org/resource/Arsenal_firm>
    <http://yago-knowledge.org/resource/linksTo> <http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots> .

<http://yago-knowledge.org/resource/2013-14_Arsenal_F.C._season>
    <http://yago-knowledge.org/resource/linksTo> <http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots> .

<http://yago-knowledge.org/resource/2000_UEFA_Cup_Final_riots>
    a <http://www.w3.org/2002/07/owl#Thing> ,
      <http://yago-knowledge.org/resource/wordnet_action_100037396> ,
      <http://yago-knowledge.org/resource/wikicat_Events> ,
      <http://yago-knowledge.org/resource/wordnet_aggression_100964569> ,
      <http://yago-knowledge.org/resource/wikicat_Sports_riots> ,
      <http://yago-knowledge.org/resource/wordnet_conflict_100958896> ,
      <http://yago-knowledge.org/resource/wordnet_riot_101170502> ,
      <http://yago-knowledge.org/resource/wikicat_2000_events> ,
      <http://yago-knowledge.org/resource/wordnet_act_100030358> ,
      <http://yago-knowledge.org/resource/yagoPermanentlyLocatedEntity> ,
      <http://yago-knowledge.org/resource/wordnet_psychological_feature_100023100> ,
      <http://yago-knowledge.org/resource/wikicat_2000_riots> ,
      <http://yago-knowledge.org/resource/wikicat_2000s_events> ,
      <http://yago-knowledge.org/resource/wordnet_violence_100965404> ,
      <http://yago-knowledge.org/resource/wikicat_May_events> ,
      <http://yago-knowledge.org/resource/wordnet_group_action_101080366> ,
      <http://yago-knowledge.org/resource/wordnet_abstraction_100002137> ,
      <http://yago-knowledge.org/resource/wikicat_Riots> ,
      <http://yago-knowledge.org/resource/wordnet_event_100029378> ,
      <http://yago-knowledge.org/resource/wikicat_Riots_and_civil_disorder_in_Denmark> ,
      <http://yago-knowledge.org/resource/wikicat_May_2000_events> ;
```

Reading the TTL Files Back into Jena

Once these files exist, we can call the model's read method to load the TTL files again:

```java
File file = new File(inputDictionary);
File[] fs = file.listFiles();
model = ModelFactory.createDefaultModel(); // creates an in-memory Jena Model
RIOT.init(); // initializing once before the loop is enough
for (File f : fs) {
    if (!f.isDirectory()) {
        // try-with-resources closes each stream after its file has been read
        try (InputStream in = new FileInputStream(f.getAbsolutePath())) {
            System.out.println(f.getAbsolutePath());
            model.read(in, null, "TURTLE");
            // Number of statements accumulated in the model so far
            System.out.println(model.size());
            // System.out.println("\n---- Turtle ----");
            // model.write(System.out, "TURTLE");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Note that reading is incremental: reading several files does not overwrite what was read before. Each read adds its statements to the same model.

With this model in hand, we can again run any kind of query against it to get the results we want.

Further Reading

1. https://jena.apache.org/documentation/io/rdf-input.html
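
A side note on the file-naming step in the export loop: `uri.substring(35)` works only because the YAGO namespace `http://yago-knowledge.org/resource/` happens to be 35 characters long. The following is a minimal sketch, not the author's code, of the same step with the prefix spelled out. The class name `TtlFileNames` and the method `fileNameFor` are hypothetical, and treating a URI outside the YAGO namespace like an empty name is an assumption made here for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: derives the output file name for one resource URI.
final class TtlFileNames {
    // Namespace of the YAGO resources shown in the post (35 characters).
    static final String YAGO_NS = "http://yago-knowledge.org/resource/";
    // Same replacement string for "/" that the export loop uses.
    static final String SLASH_REPLACEMENT = "u0001";

    // Counter for names that cannot be derived from the URI.
    private final AtomicInteger fallbackCount = new AtomicInteger();

    String fileNameFor(String uri) {
        String local = "";
        // Assumption: only URIs inside the YAGO namespace yield a usable local name.
        if (uri != null && uri.startsWith(YAGO_NS)) {
            local = uri.substring(YAGO_NS.length()).replace("/", SLASH_REPLACEMENT);
        }
        if (local.isEmpty()) {
            // Fall back to a numbered placeholder, as the export loop does.
            local = "BlankNode_" + fallbackCount.incrementAndGet();
        }
        return local + ".ttl";
    }

    public static void main(String[] args) {
        TtlFileNames names = new TtlFileNames();
        System.out.println(names.fileNameFor(YAGO_NS + "2000_UEFA_Cup_Final_riots"));
    }
}
```

Checking the prefix with `startsWith` instead of cutting at a fixed offset means a null URI (for example from a blank node) or a URI from another namespace gets a placeholder name rather than an exception or a mangled name.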