
Journal articles on the topic 'Triplestore'



Consult the top 50 journal articles for your research on the topic 'Triplestore.'


You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Jovanovik, Milos, Timo Homburg, and Mirko Spasić. "A GeoSPARQL Compliance Benchmark." ISPRS International Journal of Geo-Information 10, no. 7 (2021): 487. http://dx.doi.org/10.3390/ijgi10070487.

Abstract:
GeoSPARQL is an important standard for the geospatial linked data community, given that it defines a vocabulary for representing geospatial data in RDF, defines an extension to SPARQL for processing geospatial data, and provides support for both qualitative and quantitative spatial reasoning. However, what the community is missing is a comprehensive and objective way to measure the extent of GeoSPARQL support in GeoSPARQL-enabled RDF triplestores. To fill this gap, we developed the GeoSPARQL compliance benchmark. We propose a series of tests that check for the compliance of RDF triplestores with the GeoSPARQL standard, in order to test how many of the requirements outlined in the standard a tested system supports. This topic is of concern because the support of GeoSPARQL varies greatly between different triplestore implementations, and the extent of support is of great importance for different users. In order to showcase the benchmark and its applicability, we present a comparison of the benchmark results of several triplestores, providing an insight into their current GeoSPARQL support and the overall GeoSPARQL support in the geospatial linked data domain.
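As a rough illustration of the kind of probe such a compliance benchmark issues, the sketch below sends one GeoSPARQL filter query to an endpoint and treats a failure as a missing capability. The endpoint URL and the WKT literal are placeholder assumptions, not taken from the paper.

```python
# Minimal GeoSPARQL capability probe (illustrative only; not the benchmark itself).
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:7200/repositories/geo"  # assumed endpoint URL

QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?feature WHERE {
  ?feature geo:hasGeometry/geo:asWKT ?wkt .
  FILTER(geof:sfContains(
    "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"^^geo:wktLiteral, ?wkt))
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
try:
    rows = sparql.query().convert()["results"]["bindings"]
    print(f"geof:sfContains supported, {len(rows)} matches")
except Exception as exc:  # unsupported function, syntax error, HTTP error, ...
    print("geof:sfContains appears unsupported:", exc)
```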
2

Sagi, Tomer, Matteo Lissandrini, Torben Bach Pedersen, and Katja Hose. "A design space for RDF data representations." VLDB Journal 31, no. 2 (2022): 347–73. http://dx.doi.org/10.1007/s00778-021-00725-x.

Abstract:
RDF triplestores’ ability to store and query knowledge bases augmented with semantic annotations has attracted the attention of both research and industry. A multitude of systems offer varying data representation and indexing schemes. However, as recently shown for designing data structures, many design choices are biased by outdated considerations and may not result in the most efficient data representation for a given query workload. To overcome this limitation, we identify a novel three-dimensional design space. Within this design space, we map the trade-offs between different RDF data representations employed as part of an RDF triplestore and identify unexplored solutions. We complement the review with an empirical evaluation of ten standard SPARQL benchmarks to examine the prevalence of these access patterns in synthetic and real query workloads. We find some access patterns to be both prevalent in the workloads and under-supported by existing triplestores. This shows the capabilities of our model to be used by RDF store designers to reason about different design choices and allow a (possibly artificially intelligent) designer to evaluate the fit between a given system design and a query workload.
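To make the notion of "data representation and indexing schemes" concrete, here is a toy illustration (not the paper's model) of the classic permutation-index design point that such a design space has to cover: materialising several triple permutations so that any triple-pattern access path has a matching index.

```python
# Toy triple store with SPO/POS/OSP permutation indexes (illustration only).
from collections import defaultdict

class TinyTripleStore:
    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Answer a triple pattern by picking the index whose prefix is bound."""
        if s is not None:
            return [(s, pp, oo) for pp, objs in self.spo[s].items()
                    for oo in objs if p in (None, pp) and o in (None, oo)]
        if p is not None:
            return [(ss, p, oo) for oo, subs in self.pos[p].items()
                    for ss in subs if o in (None, oo)]
        if o is not None:
            return [(ss, pp, o) for ss, preds in self.osp[o].items()
                    for pp in preds]
        return [(ss, pp, oo) for ss in self.spo for pp in self.spo[ss]
                for oo in self.spo[ss][pp]]

store = TinyTripleStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
print(store.match(p="foaf:knows"))  # -> [('ex:alice', 'foaf:knows', 'ex:bob')]
```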
3

Yousfi, Houssameddine, Amin Mesmoudi, Allel Hadjali, Houcine Matallah, and Seif-Eddine Benkabou. "SRDF_QDAG: An efficient end-to-end RDF data management when graph exploration meets spatial processing." Computer Science and Information Systems, no. 00 (2023): 46. http://dx.doi.org/10.2298/csis230225046y.

Abstract:
The popularity of RDF has led to the creation of several datasets (e.g., Yago, DBPedia) with different natures (graph, temporal, spatial). Different extensions have also been proposed for the SPARQL language to provide appropriate processing. The best known is GeoSPARQL, which allows the integration of a set of spatial operators. In this paper, we propose new strategies to support such operators within a particular TripleStore, named RDF QDAG, which relies on graph fragmentation and exploration and guarantees a good compromise between scalability and performance. Our proposal covers the different TripleStore components (storage, evaluation, optimization). We evaluated our proposal using spatial queries with real RDF data, and we also compared performance with the latest version of a popular commercial TripleStore. The first results demonstrate the relevance of our proposal and show how an average performance gain of 28% can be achieved by choosing the right evaluation strategies. Based on these results, we propose extending the RDF QDAG optimizer to dynamically select the evaluation strategy to use depending on the query. We also show that our proposal yields the best strategy for most queries.
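A heavily simplified sketch of the general idea of per-query strategy selection discussed above; the strategy names and selectivity estimates are invented for illustration and are not the RDF QDAG optimizer.

```python
# Rough sketch of per-query evaluation-strategy selection (illustrative only):
# if the spatial predicate is estimated to be very selective, evaluate it first
# and feed the survivors to graph exploration; otherwise explore the graph
# first and apply the spatial filter to the candidates.
def choose_strategy(spatial_selectivity: float, graph_selectivity: float) -> str:
    if spatial_selectivity < graph_selectivity:
        return "spatial-first"   # spatial index scan, then graph exploration
    return "graph-first"         # graph exploration, then spatial FILTER

print(choose_strategy(0.01, 0.30))  # -> spatial-first
print(choose_strategy(0.50, 0.05))  # -> graph-first
```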
4

Greventis, K., F. Psarommatis, A. Reina, et al. "A hybrid framework for industrial data storage and exploitation." 52nd CIRP Conference on Manufacturing Systems (CMS), Ljubljana, Slovenia, June 12–14, 2019. Procedia CIRP 81 (2019): 892–97. https://doi.org/10.1016/j.procir.2019.03.221.

Abstract:
In this paper, a hybrid framework is illustrated, with a software and hardware integration strategy, for an industrial platform that exploits features from a Relational Database (RDB) and a Triplestore using the blackboard architectural pattern, ensuring efficient and accurate communication concerning data transfer among software applications and devices. Specifically, the “Raw Data Handler” manages unstructured data from IoT devices that are kept in an Apache Cassandra instance, while the “Production Data Handler” acts on structured data, persisted in a MySQL database. Filtered data is transformed into knowledge and persisted into the Triplestore database (DB) and can be retrieved by expert systems at any time. The proposed framework will be tested and validated within the Z-Fact0r project.
5

Hajjamy, Oussama El, Hajar Khallouki, Larbi Alaoui, and Mohamed Bahaj. "Semantic integration of traditional and heterogeneous data sources (UML, XML and RDB) in OWL2 triplestore." International Journal of Data Analysis Techniques and Strategies 13, no. 1/2 (2021): 36. http://dx.doi.org/10.1504/ijdats.2021.114667.

6

Alaoui, Larbi, Mohamed Bahaj, Oussama El Hajjamy, and Hajar Khallouki. "Semantic integration of traditional and heterogeneous data sources (UML, XML and RDB) in OWL2 triplestore." International Journal of Data Analysis Techniques and Strategies 13, no. 1/2 (2021): 36. http://dx.doi.org/10.1504/ijdats.2021.10037314.

7

Giannios, Giorgos, Lampros Mpaltadoros, Vasilis Alepopoulos, et al. "A Semantic Framework to Detect Problems in Activities of Daily Living Monitored through Smart Home Sensors." Sensors 24, no. 4 (2024): 1107. http://dx.doi.org/10.3390/s24041107.

Abstract:
Activities of daily living (ADLs) are fundamental routine tasks that the majority of physically and mentally healthy people can independently execute. In this paper, we present a semantic framework for detecting problems in ADL execution, monitored through smart home sensors. In the context of this work, we conducted a pilot study, gathering raw data from various sensors and devices installed in a smart home environment. The proposed framework combines multiple Semantic Web technologies (i.e., ontology, RDF, triplestore) to handle and transform these raw data into meaningful representations, forming a knowledge graph. Subsequently, SPARQL queries are used to define and construct explicit rules to detect problematic behaviors in ADL execution, a procedure that leads to generating new implicit knowledge. Finally, all available results are visualized in a clinician dashboard. The proposed framework can monitor the deterioration of ADL performance for people across the dementia spectrum by offering a comprehensive way for clinicians to describe problematic behaviors in the everyday life of an individual.
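A minimal sketch of the general pattern described above, assuming a made-up ex: vocabulary and threshold rather than the authors' ontology: a SPARQL CONSTRUCT rule run with rdflib that turns raw sensor observations into an explicit "problematic behaviour" assertion, which is then added back to the knowledge graph as implicit knowledge made explicit.

```python
# Illustrative SPARQL rule over a toy smart-home graph (not the paper's ontology).
from rdflib import Graph

data = """
@prefix ex: <http://example.org/adl#> .
ex:event42 a ex:CookingActivity ;
    ex:stoveOnMinutes 75 ;
    ex:performedBy ex:person1 .
"""

rule = """
PREFIX ex: <http://example.org/adl#>
CONSTRUCT { ?e a ex:ProblematicBehaviour ; ex:problemType ex:StoveLeftOn . }
WHERE {
  ?e a ex:CookingActivity ;
     ex:stoveOnMinutes ?m .
  FILTER(?m > 45)               # invented threshold
}
"""

g = Graph()
g.parse(data=data, format="turtle")
for triple in g.query(rule):      # CONSTRUCT results iterate as triples
    g.add(triple)                 # add the inferred knowledge back to the graph

print(g.serialize(format="turtle"))
```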
8

Tran, Ba-Huy, Nathalie Aussenac-Gilles, Catherine Comparot, and Cassia Trojahn. "Semantic Integration of Raster Data for Earth Observation: An RDF Dataset of Territorial Unit Versions with their Land Cover." ISPRS International Journal of Geo-Information 9, no. 9 (2020): 503. http://dx.doi.org/10.3390/ijgi9090503.

Abstract:
Semantic technologies are at the core of Earth Observation (EO) data integration, by providing an infrastructure based on RDF representation and ontologies. Because many EO data come in raster files, this paper addresses the integration of data calculated from rasters as a way of qualifying geographic units through their spatio-temporal features. We propose (i) a modular ontology that contributes to the semantic and homogeneous description of spatio-temporal data to qualify predefined areas; (ii) a Semantic Extraction, Transformation, and Load (ETL) process, allowing us to extract data from rasters and to link them to the corresponding spatio-temporal units and features; and (iii) a resulting dataset that is published as an RDF triplestore, exposed through a SPARQL endpoint, and exploited by a semantic interface. We illustrate the integration process with raster files providing the land cover of a specific French winery geographic area, its administrative units, and their land registers over different periods. The results have been evaluated with regards to three use-cases exploiting these EO data: integration of time series observations; EO process guidance; and data cross-comparison.
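A minimal sketch of the ETL idea under stated assumptions (a toy numpy array standing in for the raster, and an invented ex: vocabulary rather than the authors' modular ontology): summarise the raster cells covering a territorial unit and publish the result as RDF.

```python
# Toy raster-to-RDF ETL step (illustrative only).
import numpy as np
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import XSD

EX = Namespace("http://example.org/eo#")
raster = np.array([[1, 1, 2], [2, 3, 3], [3, 3, 3]])    # land-cover codes

unit = URIRef("http://example.org/unit/FR-33")           # a territorial unit
window = raster[0:2, 0:2]                                # cells covering the unit
dominant = int(np.bincount(window.ravel()).argmax())     # dominant land-cover code

g = Graph()
obs = URIRef("http://example.org/obs/FR-33-2020")
g.add((obs, RDF.type, EX.LandCoverObservation))
g.add((obs, EX.aboutUnit, unit))
g.add((obs, EX.dominantClass, Literal(dominant, datatype=XSD.integer)))
print(g.serialize(format="turtle"))
```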
9

Blacher, Mark, Julien Klaus, Christoph Staudt, Sören Laue, Viktor Leis, and Joachim Giesen. "Efficient and Portable Einstein Summation in SQL." Proceedings of the ACM on Management of Data 1, no. 2 (2023): 1–19. http://dx.doi.org/10.1145/3589266.

Abstract:
Computational problems ranging from artificial intelligence to physics require efficient computations of large tensor expressions. These tensor expressions can often be represented in Einstein notation. To evaluate tensor expressions in Einstein notation, that is, for the actual Einstein summation, usually external libraries are used. Surprisingly, Einstein summation operations on tensors fit well with fundamental SQL constructs. We show that by applying only four mapping rules and a simple decomposition scheme using common table expressions, large tensor expressions in Einstein notation can be translated to portable and efficient SQL code. The ability to execute large Einstein summation queries opens up new possibilities to process data within SQL. We demonstrate the power of Einstein summation queries on four use cases, namely querying triplestore data, solving Boolean satisfiability problems, performing inference in graphical models, and simulating quantum circuits. The performance of Einstein summation queries, however, depends on the query engine implemented in the database system. Therefore, supporting efficient Einstein summation computations in database systems presents new research challenges for the design and implementation of query engines.
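The mapping the abstract describes can be seen on the simplest Einstein expression, matrix multiplication "ij,jk->ik": the shared index becomes a join condition, the output indices become the GROUP BY, and the summation index disappears into SUM(). The sketch below uses sqlite3 and coordinate-format tables purely for illustration; it is not the authors' code.

```python
# Einstein summation "ij,jk->ik" as a join-and-aggregate over COO tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (i INT, j INT, val REAL);
CREATE TABLE B (j INT, k INT, val REAL);
INSERT INTO A VALUES (0,0,1.0),(0,1,2.0),(1,0,3.0),(1,1,4.0);
INSERT INTO B VALUES (0,0,5.0),(0,1,6.0),(1,0,7.0),(1,1,8.0);
""")

rows = con.execute("""
SELECT A.i, B.k, SUM(A.val * B.val) AS val
FROM A JOIN B ON A.j = B.j        -- shared index j -> join condition
GROUP BY A.i, B.k                 -- output indices i, k -> grouping
ORDER BY A.i, B.k
""").fetchall()

print(rows)   # [(0, 0, 19.0), (0, 1, 22.0), (1, 0, 43.0), (1, 1, 50.0)]
```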
10

Moreira, Dilvan de Abreu, and Davi Machado da Rocha. "Gazetteer literário de Machado de Assis." Encontros Bibli: revista eletrônica de biblioteconomia e ciência da informação 30 (March 17, 2025): 1–32. https://doi.org/10.5007/1518-2924.2025.e101283.

Abstract:
Objective: This study aims to develop a semantic web application that maps geographic locations in the works of Machado de Assis, storing them in a triplestore. By integrating the data made available by the MachadodeAssis.net encyclopedia with geographic coordinates from Geonames.org and Google Maps, the project seeks to offer a reading experience through interactive maps that support the mentions of places made by the writer over the course of the nineteenth century. Method: The Python library BeautifulSoup is used to query and collect data from the encyclopedia, structuring it according to schema.org parameters. The collected citations are submitted to the gpt3.5-instruct and gpt4-turbo models to obtain the current names of the locations and their classification according to the Geonames.org ontology. SPARQL queries are issued to the dados.literaturabrasileira.ufsc.br portal to obtain unique identifiers for each book. Results: The application offers an integration of maps, citations and full texts, in line with Linked Data standards. Conclusions: The intersection of technology, literature and geolocation can offer engaging reading experiences, providing fertile ground for the development of the so-called digital humanities.
11

Angioni, Simone, Angelo Salatino, Francesco Osborne, Diego Reforgiato Recupero, and Enrico Motta. "AIDA: A knowledge graph about research dynamics in academia and industry." Quantitative Science Studies 2, no. 4 (2021): 1356–98. http://dx.doi.org/10.1162/qss_a_00162.

Abstract:
Academia and industry share a complex, multifaceted, and symbiotic relationship. Analyzing the knowledge flow between them, understanding which directions have the biggest potential, and discovering the best strategies to harmonize their efforts is a critical task for several stakeholders. Research publications and patents are an ideal medium to analyze this space, but current data sets of scholarly data cannot be used for such a purpose because they lack a high-quality characterization of the relevant research topics and industrial sectors. In this paper, we introduce the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21 million publications and 8 million patents according to the research topics drawn from the Computer Science Ontology. 5.1 million publications and 5.6 million patents are further characterized according to the type of the author’s affiliations and 66 industrial sectors from the proposed Industrial Sectors Ontology (INDUSO). AIDA was generated by an automatic pipeline that integrates data from Microsoft Academic Graph, Dimensions, DBpedia, the Computer Science Ontology, and the Global Research Identifier Database. It is publicly available under CC BY 4.0 and can be downloaded as a dump or queried via a triplestore. We evaluated the different parts of the generation pipeline on a manually crafted gold standard yielding competitive results.
12

Günther, Taras, Matthias Filter, and Fernanda Dórea. "Making Linked Data accessible for One Health Surveillance with the "One Health Linked Data Toolbox"." ARPHA Conference Abstracts 4 (May 28, 2021): e68821. https://doi.org/10.3897/aca.4.e68821.

Abstract:
In times of emerging diseases, data sharing and data integration are of particular relevance for One Health Surveillance (OHS) and decision support. Furthermore, there is an increasing demand to provide governmental data in compliance with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. Semantic web technologies are key facilitators for providing data interoperability, as they allow explicit annotation of data with their meaning, enabling reuse without loss of the data collection context. Among these, we highlight ontologies as a tool for modeling knowledge in a field, which simplify the interpretation and mapping of datasets in a computer readable medium; and the Resource Description Format (RDF), which allows data to be shared among human and computer agents following this knowledge model. Despite their potential for enabling cross-sectoral interoperability and data linkage, the use and application of these technologies is often hindered by their complexity and the lack of easy-to-use software applications. To overcome these challenges, the OHEJP Project ORION developed the Health Surveillance Ontology (HSO). This knowledge model forms a foundation for semantic interoperability in the domain of One Health Surveillance. It provides a solution to add data from the target sectors (public health, animal health and food safety) in compliance with the FAIR principles of findability, accessibility, interoperability, and reusability, supporting interdisciplinary data exchange and usage. To provide use cases and facilitate the accessibility to HSO, we developed the One Health Linked Data Toolbox (OHLDT), which consists of three new and custom-developed web applications with specific functionalities. The first web application allows users to convert surveillance data available in Excel files online into HSO-RDF and vice versa. The web application demonstrates that data provided in well-established data formats can be automatically translated into the linked data format HSO-RDF. The second application is a demonstrator of the usage of HSO-RDF in an HSO triplestore database. In the user interface of this application, the user can select HSO concepts based on which to search and filter among surveillance datasets stored in an HSO triplestore database. The service then provides automatically generated dashboards based on the context of the data. The third web application demonstrates the use of data interoperability in the OHS context by using HSO-RDF to annotate meta-data, and in this way link datasets across sectors. The web application provides a dashboard to compare public data on zoonosis surveillance provided by EFSA and ECDC. The first solution enables linked data production, while the second and third provide examples of linked data consumption, and their value in enabling data interoperability across sectors. All described solutions are based on the open-source software KNIME and are deployed as a web service via a KNIME Server hosted at the German Federal Institute for Risk Assessment. The semantic web extension of KNIME, which is based on the Apache Jena Framework, allowed rapid and easy development within the project. The underlying open source KNIME workflows are freely available and can be easily customized by interested end users. With our applications, we demonstrate that the use of linked data has great potential to strengthen the use of FAIR data in OHS and interdisciplinary data exchange.
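A hedged sketch of what the first toolbox function amounts to, with invented hso: terms rather than the actual Health Surveillance Ontology IRIs; in practice the rows would come from an uploaded Excel file (e.g. via pandas.read_excel) instead of a literal list.

```python
# Illustrative spreadsheet-rows-to-RDF conversion (not the OHLDT code).
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import XSD

HSO = Namespace("http://example.org/hso#")   # placeholder namespace

rows = [
    {"sample_id": "S-001", "pathogen": "Salmonella", "positives": 3, "date": "2021-03-01"},
    {"sample_id": "S-002", "pathogen": "Campylobacter", "positives": 0, "date": "2021-03-02"},
]

g = Graph()
g.bind("hso", HSO)
for row in rows:
    s = URIRef(f"http://example.org/sample/{row['sample_id']}")
    g.add((s, RDF.type, HSO.SurveillanceResult))
    g.add((s, HSO.pathogen, Literal(row["pathogen"])))
    g.add((s, HSO.positiveCount, Literal(row["positives"], datatype=XSD.integer)))
    g.add((s, HSO.samplingDate, Literal(row["date"], datatype=XSD.date)))

print(g.serialize(format="turtle"))
```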
13

Giallonardo, Ester, Francesco Poggi, Davide Rossi, and Eugenio Zimeo. "Semantics-Driven Programming of Self-Adaptive Reactive Systems." International Journal of Software Engineering and Knowledge Engineering 30, no. 06 (2020): 805–34. http://dx.doi.org/10.1142/s0218194020400082.

Abstract:
In recent years, new classes of highly dynamic, complex systems are gaining momentum. These classes include, but are not limited to IoT, smart cities, cyber-physical systems and sensor networks. These systems are characterized by the need to express behaviors driven by external and/or internal changes, i.e. they are reactive and context-aware. A desirable design feature of these systems is the ability of adapting their behavior to environment changes. In this paper, we propose an approach to support adaptive, reactive systems based on semantic runtime representations of their context, enabling the selection of equivalent behaviors, i.e. behaviors that have the same effect on the environment. The context representation and the related knowledge are managed by an engine designed according to a reference architecture and programmable through a declarative definition of sensors and actuators. The knowledge base of sensors and actuators (hosted by an RDF triplestore) is bound to the real world by grounding semantic elements to physical devices via REST APIs. The proposed architecture along with the defined ontology tries to address the main problems of dynamically re-configurable systems by exploiting a declarative, queryable approach to enable runtime reconfiguration with the help of (a) semantics to support discovery in heterogeneous environment, (b) composition logic to define alternative behaviors for variation points, (c) bi-causal connection life-cycle to avoid dangling links with the external environment. The proposal is validated in a case study aimed at designing an edge node for smart buildings dedicated to cultural heritage preservation.
14

Piirainen, Esko, Eija-Leena Laiho, Tea von Bonsdorff, and Tapani Lahti. "Managing Taxon Data in FinBIF." Biodiversity Information Science and Standards 3 (June 26, 2019): e37422. https://doi.org/10.3897/biss.3.37422.

Abstract:
The Finnish Biodiversity Information Facility, FinBIF (https://species.fi), has developed its own taxon database. This allows FinBIF taxon specialists to maintain their own, expert-validated view of Finnish species. The database covers national needs and can be rapidly expanded by our own development team. Furthermore, in the database each taxon is given a globally unique persistent URI identifier (https://www.w3.org/TR/uri-clarification), which refers to the taxon concept, not just to the name. The identifier doesn't change if the taxon concept doesn't change. We aim to ensure compatibility with checklists from other countries by linking taxon concepts as Linked Data (https://www.w3.org/wiki/LinkedData) — a work started as a part of the Nordic e-Infrastructure Collaboration (NeIC) DeepDive project (https://neic.no/deepdive). The database is used as a basis for observation/specimen searches, e-Learning and identification tools, and it is browsable by users of the FinBIF portal. The data is accessible to everyone under CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0) in machine readable formats. The taxon specialists maintain the taxon data using a web application. Currently, there are 60 specialists. All changes made to the data go live every night. The nightly update interval allows the specialists a grace period to make their changes. Allowing the taxon specialists to modify the taxonomy database themselves leads to some challenges. To maintain the integrity of critical data, such as lists of protected species, we have had to limit what the specialists can do. Changes to critical data is carried out by an administrator. The database has special features for linking observations to the taxonomy. These include hidden species aggregates and tools to override how a certain name used in observations is linked to the taxonomy. Misapplied names remain an unresolved problem. The most precise way to record an observation is to use a taxon concept: Most observations are still recorded using plain names, but it is possible for the observer to pick a concept. Also, when data is published in FinBIF from other information systems, the data providers can link their observations to the concepts using the identifiers of concepts. The ability to use taxon concepts as basis of observations means we have to maintain the concepts over time — a task that may become arduous in the future (Fig. 1). As it stands now, the FinBIF taxon data model — including adjacent classes such as publication, person, image, and endangerment assessments — consists of 260 properties. If the data model were stored in a normalized relational database, there would be approximately 56 tables, which could be difficult to maintain. Keeping track of a complete history of data is difficult in relational databases. Alternatively, we could use document storage to store taxon data. However, there are some difficulties associated with document storages: (1) much work is required to implement a system that does small atomic update operations; (2) batch updates modifying multiple documents usually require writing a script; and (3) they are not ideal for doing searches. We use a document storage for observation data, however, because they are well suited for storing large quantities of complex records. In FinBIF, we have decided to use a triplestore for all small datasets, such as taxon data. More specifically, the data is stored according to the RDF specification (https://www.w3.org/RDF). 
An RDF Schema defines the allowed properties for each class. Our triplestore implementation is an Oracle relational database with two tables (resource and statement), which gives us the ability to do SQL queries and updates. Doing small atomic updates is easy as only a small subset of the triplets can be updated instead of the entire data entity. Maintaining a complete record of history comes without much effort, as it can be done on an individual triplet level. For performance-critical queries, the taxon data is loaded into an Elasticsearch (https://www.elastic.co) search engine.
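A rough sqlite3 sketch of the two-table (resource/statement) layout described above; the real FinBIF schema on Oracle is certainly richer, so the column names and URIs here are guesses for illustration only.

```python
# Toy "triplestore on two relational tables" layout (illustrative only).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE resource (
    id   INTEGER PRIMARY KEY,
    uri  TEXT UNIQUE
);
CREATE TABLE statement (
    subject_id   INTEGER REFERENCES resource(id),
    predicate_id INTEGER REFERENCES resource(id),
    object_text  TEXT,      -- literal value or the URI of the object
    valid_from   TEXT,      -- history can be tracked per triplet
    valid_to     TEXT
);
""")

def resource_id(uri):
    con.execute("INSERT OR IGNORE INTO resource (uri) VALUES (?)", (uri,))
    return con.execute("SELECT id FROM resource WHERE uri = ?", (uri,)).fetchone()[0]

s = resource_id("http://example.org/taxon/1")
p = resource_id("http://example.org/scientificName")
con.execute("INSERT INTO statement VALUES (?,?,?,?,?)",
            (s, p, "Parus major", "2019-01-01", None))

print(con.execute("""
SELECT rs.uri, rp.uri, st.object_text
FROM statement st
JOIN resource rs ON rs.id = st.subject_id
JOIN resource rp ON rp.id = st.predicate_id
""").fetchall())
```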
15

Hachey, B., C. Grover, and R. Tobin. "Datasets for generic relation extraction." Natural Language Engineering 18, no. 1 (2011): 21–59. http://dx.doi.org/10.1017/s1351324911000106.

Abstract:
A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g. PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format such as a relational database or RDF triplestore that can be more effectively used for querying and automated reasoning. A number of resources have been developed for training and evaluating automatic systems for relation extraction in different domains. However, comparative evaluation is impeded by the fact that these corpora use different markup formats and notions of what constitutes a relation. We describe the preparation of corpora for comparative evaluation of relation extraction across domains based on the publicly available ACE 2004, ACE 2005 and BioInfer data sets. We present a common document type using token standoff and including detailed linguistic markup, while maintaining all information in the original annotation. The subsequent reannotation process normalises the two data sets so that they comply with a notion of relation that is intuitive, simple and informed by the semantic web. For the ACE data, we describe an automatic process that converts many relations involving nested, nominal entity mentions to relations involving non-nested, named or pronominal entity mentions. For example, the first entity is mapped from ‘one’ to ‘Amidu Berry’ in the membership relation described in ‘Amidu Berry, one half of PBS’. Moreover, we describe a comparably reannotated version of the BioInfer corpus that flattens nested relations, maps part-whole to part-part relations and maps n-ary to binary relations. Finally, we summarise experiments that compare approaches to generic relation extraction, a knowledge discovery task that uses minimally supervised techniques to achieve maximally portable extractors. These experiments illustrate the utility of the corpora.
16

Schilling, S., and C. Clemen. "Practical Examples on BIM-GIS Integration Based on Semantic Web Triplestores." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVI-5/W1-2022 (February 3, 2022): 211–16. http://dx.doi.org/10.5194/isprs-archives-xlvi-5-w1-2022-211-2022.

Abstract:
The integration of geodata and building models is one of the current challenges in the AECOO (architecture, engineering, construction, owner, operation) domain. Data from Building Information Models (BIM) and Geographical Information Systems (GIS) can't be simply mapped 1:1 to each other because of their different domains. One possible approach is to convert all data into a domain-independent format and link them together in a semantic database. To demonstrate how this data integration can be done in a federated database architecture, we utilize concepts of the semantic web, ontologies and the Resource Description Framework (RDF). It turns out, however, that traditional object-relational approaches provide more efficient access methods on geometrical representations than triplestores. Therefore, we developed a hybrid approach with files, geodatabases and triplestores. This work-in-progress paper (extended abstract) demonstrates our intermediate research results by practical examples and identifies opportunities and limitations of the hybrid approach.
17

Gmür, Reto, Donat Agosti, and Guido Sautter. "Synospecies, a Linked Data Application to Explore Taxonomic Names." Biodiversity Information Science and Standards 6 (August 23, 2022): e93707. https://doi.org/10.3897/biss.6.93707.

Abstract:
Synospecies is a linked data application to explore changes in taxonomic names (Gmür and Agosti 2021). The underlying source of truth for the establishment of taxa, the assignment and re-assignment of names, are taxonomic treatments. Taxonomic treatments are sections of publications documenting the features or distribution of taxa in ways adhering to highly formalized conventions, and published in scientific journals, which shape our understanding of global biodiversity (Catapano 2010). Plazi, a not-for-profit organization dedicated to liberating knowledge, extracts the relevant information from these treatments and makes it publicly available in digital form. Depending on the original form of a publication, a treatment undergoes several steps during its processing. All these steps affect the available digital artifacts extracted from the treatment's original publication. The treatments are digitalized, the text is annotated with a specialized editor, and cross-referenced and enhanced with other sources (Agosti and Sautter 2018). After these steps, the annotated text is transformed to the different structured data formats used by other digital biodiversity platforms (e.g., Global Biodiversity Information Facility: Plazi.org taxonomic treatment database using Darwin Core Archive), generic linked data tools (e.g., lod view; RDF2h Browser) and other consuming applications (e.g., Ocellus via Zenodeo using XML; openBioDiv using XML; HMW using XML; Biotic interaction browser using TaxPub XML; opendata.swiss using RDF). While these transformations have been taking place for a long time now, Plazi is now experimenting with making this process more transparent: with the Plazi Actionable Accessible Archive (PAAA) architecture, both addition and modification of the digitalized treatments trigger an extensible set of workflows that are immediately executed on the GitHub platform. Not only is the exact definition and code of every workflow publicly accessible, but the results, errors and execution time of every single workflow is accessible as well. This offers an unprecedented degree of transparency and flexibility in the data processing that we have prototypically implemented for the creation of the RDF data used by Synospecies. As with the W3C GRDDL recommendation (https://www.w3.org/TR/grddl/), XSLT is used to transform XML to RDF/XML, a concrete syntax from the early days of RDF that is still supported by most RDF tools, allowing the data to be read as RDF. The XSLT document used is part of the bundled gg2rdf GitHub action (https://github.com/plazi/gg2rdf), together with the other transformation steps required to generate a transformation result in both human- and machine-readable RDF Turtle format. On the GitHub Actions page of the treatments-xml repository (https://github.com/plazi/treatments-xml/actions) one can see that every commit to this repository triggers a workflow run that takes approximately 12 minutes to execute. After that the transformation results are available in the treatments-rdf repository (https://github.com/plazi/treatments-rdf/). The commit of RDF data to the treatments-rdf repository triggers a webhook that loads the newly added data to the Plazi triplestore, making it virtually immediately available in Synospecies.
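A toy version of the XSLT step described above (not the real gg2rdf stylesheet): a minimal stylesheet run with lxml that rewrites a drastically simplified treatment XML element into RDF/XML. The element and property names are invented for illustration.

```python
# Minimal XML-to-RDF/XML transform in the spirit of GRDDL/gg2rdf (illustrative only).
from lxml import etree

TREATMENT_XML = b"""
<treatment id="T1">
  <taxonName>Aphaenogaster sardoa</taxonName>
</treatment>
"""

XSLT = b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/treatment#">
  <xsl:template match="/treatment">
    <rdf:RDF>
      <rdf:Description rdf:about="http://example.org/treatment/{@id}">
        <ex:taxonName><xsl:value-of select="taxonName"/></ex:taxonName>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>
"""

transform = etree.XSLT(etree.XML(XSLT))
rdf_xml = transform(etree.XML(TREATMENT_XML))
print(str(rdf_xml))
```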
18

Heikkinen, Mikko, Ville-Matti Riihikoski, Anniina Kuusijärvi, Dare Talvitie, Tapani Lahti, and Leif Schulman. "Kotka - A national multi-purpose collection management system." Biodiversity Information Science and Standards 3 (June 18, 2019): e37179. https://doi.org/10.3897/biss.3.37179.

Abstract:
Many natural history museums share a common problem: a multitude of legacy collection management systems (CMS) and the difficulty of finding a new system to replace them. Kotka is a CMS created by the Finnish Museum of Natural History (Luomus) to solve this problem. Its development started in late 2011 and was put into operational use in 2012. Kotka was first built to replace dozens of in-house systems previously used at Luomus, but eventually grew into a national system, which is now used by 10 institutions in Finland. Kotka currently holds c. 1.7 million specimens from zoological, botanical, paleontological, microbial and botanic garden collections, as well as data from genomic resource collections. Kotka is designed to fit the needs of different types of collections and can be further adapted when new needs arise. Kotka differs in many ways from traditional CMS's. It applies simple and pragmatic approaches. This has helped it to grow into a widely used system despite limited development resources – on average less than one full-time equivalent developer (FTE). The aim of Kotka is to improve collection management efficiency by providing practical tools. It emphasizes the quantity of digitized specimens over completeness of the data. It also harmonizes collection management practices by bringing all types of collections under one system. Kotka stores data mostly in a denormalized free text format using a triplestore and a simple hierarchical data model (Fig. 1). This allows greater flexibility of use and faster development compared to a normalized relational database. New data fields and structures can easily be added as needs arise. Kotka does some data validation, but quality control is seen as a continuous process and is mostly done after the data has been recorded into the system. The data model is loosely based on the ABCD (Access to Biological Collection Data) standard, but has been adapted to support practical needs. Kotka is a web application and data can be entered, edited, searched and exported through a browser-based user interface. However, most users prefer to enter new data in customizable MS-Excel templates, which support the hierarchical data model, and upload these to Kotka. Batch updates can also be done using Excel. Kotka stores all revisions of the data to avoid any data loss due to technical or human error. Kotka also supports designing and printing specimen labels, annotations by external users, as well as handling accessions, loan transactions, and the Nagoya protocol. Taxonomy management is done using a separate system provided by the Finnish Biodiversity Information Facility (FinBIF). This decoupling also allows entering specimen data before the taxonomy is updated, which speeds up specimen digitization. Every specimen is given a persistent unique HTTP-URI identifier (CETAF stable identifiers). Specimen data is accessible through the FinBIF portal at species.fi, and will later be shared to GBIF according to agreements with data holders. Kotka is continuously developed and adapted to new requirements in close collaboration with curators and technical collection staff, using agile software development methods. It is available as open source, but is tightly integrated with other FinBIF infrastructure, and currently only offered as an online service (Software as a Service) hosted by FinBIF.
19

Lee, Sangkeun, Sreenivas R. Sukumar, Seokyong Hong, and Seung-Hwan Lim. "Enabling graph mining in RDF triplestores using SPARQL for holistic in-situ graph analysis." Expert Systems with Applications 48 (April 2016): 9–25. http://dx.doi.org/10.1016/j.eswa.2015.11.010.

20

Stork, Lise, Andreas Weber, and Katherine Wolstencroft. "The Semantic Field Book Annotator." Biodiversity Information Science and Standards 3 (June 19, 2019): e37223. https://doi.org/10.3897/biss.3.37223.

Abstract:
Biodiversity research expeditions to the globe's most biodiverse areas have been conducted for several hundred years. Natural history museums contain a wealth of historical materials from such expeditions, but they are stored in a fragmented way. As a consequence links between the various resources, e.g., specimens, illustrations and field notes, are often lost and are not easily re-established. Natural history museums have started to use persistent identifiers for physical collection objects, such as specimens, as well as associated information resources, such as web pages and multimedia. As a result, these resources can more easily be linked, using Linked Open Data (LOD), to information sources on the web. Specimens can be linked to taxonomic backbones of data providers, e.g., the Encyclopedia Of Life (EOL), the Global Biodiversity Information Facility (GBIF), or publications with Digital Object Identifiers (DOI). For the content of biodiversity expedition archives (e.g., field notes), no such formalisations exist. However, linking the specimens to specific handwritten notes taken in the field can increase their scientific value. Specimens are generally accompanied by a label containing the location of the site where the specimen was collected, the collector's name and the classification. Field notes often augment the basic metadata found with specimens with important details concerning, for instance, an organism's habitat and morphology. Therefore, inter-collection interoperability of multimodal resources is just as important as intra-collection interoperability of unimodal resources. The linking of field notes and illustrations to specimens entails a number of challenges: historical handwritten content is generally difficult to read and interpret, especially due to changing taxonomic systems, nomenclature and collection practices. It is vital that: the content is structured in a similar way as the specimens, so that links can more easily be re-established either manually or in an automated way; for consolidation, the content is enriched with outgoing links to semantic resources, such as Geonames or Virtual International Authority File (VIAF); and this process is a transparent one: how links are established, why and by whom, should be stored to encourage scholarly discussions and to promote the attribution of efforts. In order to address some of these issues, we have built a tool, the Semantic Field Book Annotator (SFB-A), that allows for the direct annotation of digitised (scanned) pages of field books and illustrations with Linked Open Data (LOD). The tool guides the user through the annotation process, so that semantic links are automatically generated in a formalised way. These annotations and links are subsequently stored in an RDF triplestore. As the use of the Darwin Core standard is considered best practice among collection managers for the digitisation of their specimens, our tool is equipped with an ontology based on Darwin Core terms, the NHC-Ontology, which extends the Darwin Semantic Web (DSW) ontology. The tool can annotate any image, be it an image of a specimen with a textual label, an illustration with a textual label or a handwritten species description. Interoperability of annotations between the various resources within a collection is therefore ensured. Terms in the ontology are structured using the OWL web ontology language.
This allows for more complex tasks such as OWL reasoning and semantic queries, and facilitates the creation of a richer knowledge base that is more amenable to research.
21

Devezas, José. "Graph-based entity-oriented search." ACM SIGIR Forum 55, no. 1 (2021): 1–2. http://dx.doi.org/10.1145/3476415.3476430.

Abstract:
Entity-oriented search has revolutionized search engines. In the era of Google Knowledge Graph and Microsoft Satori, users demand an effortless process of search. Whether they express an information need through a keyword query, expecting documents and entities, or through a clicked entity, expecting related entities, there is an inherent need for the combination of corpora and knowledge bases to obtain an answer. Such integration frequently relies on independent signals extracted from inverted indexes, and from quad indexes indirectly accessed through queries to a triplestore. However, relying on two separate representation models inhibits the effective cross-referencing of information, discarding otherwise available relations that could lead to a better ranking. Moreover, different retrieval tasks often demand separate implementations, although the problem is, at its core, the same. With the goal of harnessing all available information to optimize retrieval, we explore joint representation models of documents and entities, while taking a step towards the definition of a more general retrieval approach. Specifically, we propose that graphs should be used to incorporate explicit and implicit information derived from the relations between text found in corpora and entities found in knowledge bases. We also take advantage of this framework to elaborate a general model for entity-oriented search, proposing a universal ranking function for the tasks of ad hoc document retrieval (leveraging entities), ad hoc entity retrieval, and entity list completion. At a conceptual stage, we begin by proposing the graph-of-entity, based on the relations between combinations of term and entity nodes. We introduce the entity weight as the corresponding ranking function, relying on the idea of seed nodes for representing the query, either directly through term nodes, or based on the expansion to adjacent entity nodes. The score is computed based on a series of geodesic distances to the remaining nodes, providing a ranking for the documents (or entities) in the graph. In order to improve on the low scalability of the graph-of-entity, we then redesigned this model in a way that reduced the number of edges in relation to the number of nodes, by relying on the hypergraph data structure. The resulting model, which we called hypergraph-of-entity, is the main contribution of this thesis. The obtained reduction was achieved by replacing binary edges with n -ary relations based on sets of nodes and entities (undirected document hyperedges), sets of entities (undirected hyperedges, either based on cooccurrence or a grouping by semantic subject), and pairs of a set of terms and a set of one entity (directed hyperedges, mapping text to an object). We introduce the random walk score as the corresponding ranking function, relying on the same idea of seed nodes, similar to the entity weight in the graph-of-entity. Scoring based on this function is highly reliant on the structure of the hypergraph, which we call representation-driven retrieval. As such, we explore several extensions of the hypergraph-of-entity, including relations of synonymy, or contextual similarity, as well as different weighting functions per node and hyperedge type. We also propose TF-bins as a discretization for representing term frequency in the hypergraph-of-entity. 
For the random walk score, we propose and explore several parameters, including length and repeats, with or without seed node expansion, direction, or weights, and with or without a certain degree of node and/or hyperedge fatigue, a concept that we also propose. For evaluation, we took advantage of TREC 2017 OpenSearch track, which relied on an online evaluation process based on the Living Labs API, and we also participated in TREC 2018 Common Core track, which was based on the newly introduced TREC Washington Post Corpus. Our main experiments were supported on the INEX 2009 Wikipedia collection, which proved to be a fundamental test collection for assessing retrieval effectiveness across multiple tasks. At first, our experiments solely focused on ad hoc document retrieval, ensuring that the model performed adequately for a classical task. We then expanded the work to cover all three entity-oriented search tasks. Results supported the viability of a general retrieval model, opening novel challenges in information retrieval, and proposing a new path towards generality in this area.
22

Yamada, Issaku, Matthew P. Campbell, Nathan Edwards, et al. "The glycoconjugate ontology (GlycoCoO) for standardizing the annotation of glycoconjugate data and its application." Glycobiology 31, no. 7 (2021): 741–50. http://dx.doi.org/10.1093/glycob/cwab013.

Abstract:
Recent years have seen great advances in the development of glycoproteomics protocols and methods resulting in a sustainable increase in the reporting of proteins, their attached glycans and glycosylation sites. However, only very few of these reports find their way into databases or data repositories. One of the major reasons is the absence of a digital standard to represent glycoproteins and the challenging annotations with glycans. Depending on the experimental method, such a standard must be able to represent glycans as complete structures or as compositions, store not just single glycans but also represent glycoforms on a specific glycosylation site, deal with partially missing site information if no site mapping was performed, and store abundances or ratios of glycans within a glycoform of a specific site. To support the above, we have developed the GlycoConjugate Ontology (GlycoCoO) as a standard semantic framework to describe and represent glycoproteomics data. GlycoCoO can be used to represent glycoproteomics data in triplestores and can serve as a basis for data exchange formats. The ontology, database providers and supporting documentation are available online (https://github.com/glycoinfo/GlycoCoO).
23

Peroni, Silvio, and David Shotton. "OpenCitations, an infrastructure organization for open scholarship." Quantitative Science Studies 1, no. 1 (2020): 428–44. http://dx.doi.org/10.1162/qss_a_00023.

Abstract:
OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open citation data as Linked Open Data using Semantic Web technologies, thereby providing a disruptive alternative to traditional proprietary citation indexes. Open citation data are valuable for bibliometric analysis, increasing the reproducibility of large-scale analyses by enabling publication of the source data. Following brief introductions to the development and benefits of open scholarship and to Semantic Web technologies, this paper describes OpenCitations and its data sets, tools, services, and activities. These include the OpenCitations Data Model; the SPAR (Semantic Publishing and Referencing) Ontologies; OpenCitations’ open software of generic applicability for searching, browsing, and providing REST APIs over resource description framework (RDF) triplestores; Open Citation Identifiers (OCIs) and the OpenCitations OCI Resolution Service; the OpenCitations Corpus (OCC), a database of open downloadable bibliographic and citation data made available in RDF under a Creative Commons public domain dedication; and the OpenCitations Indexes of open citation data, of which the first and largest is COCI, the OpenCitations Index of Crossref Open DOI-to-DOI Citations, which currently contains over 624 million bibliographic citations and is receiving considerable usage by the scholarly community.
24

Baskauf, Steven J. "Having Your Cake and Eating It Too: JSON-LD as an RDF serialization format." Biodiversity Information Science and Standards 5 (September 13, 2021): e74266. https://doi.org/10.3897/biss.5.74266.

Abstract:
One impediment to the uptake of linked data technology is developers' unfamiliarity with typical Resource Description Framework (RDF) serializations like Turtle and RDF/XML. JSON for Linking Data (JSON-LD) is designed to bypass this problem by expressing linked data in the well-known Javascript Object Notation (JSON) format that is popular with developers. JSON-LD is now Google's preferred format for exposing Schema.org structured data in web pages for search optimization, leading to its widespread use by web developers. Another successful use of JSON-LD is by the International Image Interoperability Framework (IIIF), which limits its use to a narrow design pattern, which is readily consumed by a variety of applications. This presentation will show how a similar design pattern has been used in Audubon Core and with Biodiversity Information Standards (TDWG) controlled vocabularies to serialize data in a manner that is both easily consumed by conventional applications, but which also can be seamlessly loaded as RDF into triplestores or other linked data applications. The presentation will also suggest how JSON-LD might be used in other contexts within TDWG vocabularies, including with the Darwin Core Resource Relationship terms.
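A small sketch of the point being made: the same JSON-LD document that a plain JSON consumer reads as an ordinary object can be loaded as RDF triples. This assumes rdflib 6+, which bundles JSON-LD support; the context and IRIs are illustrative, not drawn from Audubon Core or TDWG vocabularies.

```python
# One document, two consumers: plain JSON vs. RDF (illustrative only).
import json
from rdflib import Graph

doc = """
{
  "@context": {"name": "http://schema.org/name",
               "image": {"@id": "http://schema.org/image", "@type": "@id"}},
  "@id": "http://example.org/specimen/123",
  "name": "Quercus alba voucher",
  "image": "http://example.org/media/123.jpg"
}
"""

plain = json.loads(doc)               # a conventional consumer just sees JSON
print(plain["name"])

g = Graph()
g.parse(data=doc, format="json-ld")   # a linked-data consumer sees triples
for s, p, o in g:
    print(s, p, o)
```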
25

NuringHati, Melati. "Analisis Teknik Permainan Violin Concertino In D 1st Movement Karya Hans M. Millies dalam Style Mozart." Repertoar Journal 5, no. 1 (2024): 1–12. http://dx.doi.org/10.26740/rj.v5n1.p1-12.

Abstract:
Romantic music has complex characteristics, so this study aims to describe violin playing techniques, including bowing technique and dynamic markings, in this composition, particularly music of the Romantic era. The research uses a descriptive qualitative method with an approach based on literature study and documentation. The results and discussion of this study concern the musical form and violin playing techniques in the composition Concertino in D, 1st movement, by Hans M. Millies. The subject of this study is the playing techniques found in the score of the Concertino in D, 1st movement, so the research instruments used are the researcher and the printed score. Data collection includes literature study and musicology. Data reduction, data presentation, and drawing conclusions are the techniques used to analyse the data. The results of this study identify several playing techniques, namely legato, staccato, acciaccatura, tenuto, trill, double stops, triple stops, and quadruple stops. The musical form of this composition is a Concertino with A-B-C-D as the song pattern.
26

Honti, Gergely, and János Abonyi. "Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases." Mathematics 9, no. 4 (2021): 450. http://dx.doi.org/10.3390/math9040450.

Abstract:
Triplestores or resource description framework (RDF) stores are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method utilises frequent itemset mining (FIM) of the subjects, predicates and objects of the RDF data, and automatically extracts informative subsets of the database for the analysis. The results are used to form layers in an analysable multidimensional network. The methodology enables a consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the science of sustainability and climate change are structured using the Microsoft Academic Knowledge Graph. In the case study, the FIM forms networks of disciplines to reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases which can be simultaneously analysed as layers of a multilayer network.
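A stripped-down illustration of the FIM step (not the paper's pipeline): treat the predicate-object pairs attached to each subject as a transaction and count which pairs co-occur frequently; itemsets that pass the support threshold are the kind of thing promoted to layers of the network. The triples and the threshold are invented.

```python
# Frequent pair counting over toy RDF-style triples (illustrative only).
from collections import Counter, defaultdict
from itertools import combinations

triples = [
    ("paperA", "field", "Meteorology"), ("paperA", "field", "AtmosphericScience"),
    ("paperB", "field", "Meteorology"), ("paperB", "field", "AtmosphericScience"),
    ("paperC", "field", "Oceanography"), ("paperC", "field", "Geomorphology"),
]

transactions = defaultdict(set)          # one "transaction" per subject
for s, p, o in triples:
    transactions[s].add((p, o))

pair_counts = Counter()
for items in transactions.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

min_support = 2
print([pair for pair, c in pair_counts.items() if c >= min_support])
```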
27

Schweizer, Tobias, and Benjamin Geer. "Gravsearch: Transforming SPARQL to query humanities data." Semantic Web, September 23, 2020, 1–22. http://dx.doi.org/10.3233/sw-200386.

Abstract:
RDF triplestores have become an appealing option for storing and publishing humanities data, but available technologies for querying this data have drawbacks that make them unsuitable for many applications. Gravsearch (Virtual Graph Search), a SPARQL transformer developed as part of a web-based API, is designed to support complex searches that are desirable in humanities research, while avoiding these disadvantages. It does this by introducing server software that mediates between the client and the triplestore, transforming an input SPARQL query into one or more queries executed by the triplestore. This design suggests a practical way to go beyond some limitations of the ways that RDF data has generally been made available.
28

O'Neill, Tristan, Trina Myers, and Jarrod Trevathan. "A Decision Matrix for the Evaluation of Triplestores for Use in a Virtual Research Environment." October 6, 2013. https://doi.org/10.5281/zenodo.1088782.

Abstract:
The Tropical Data Hub (TDH) is a virtual research environment that provides researchers with an e-research infrastructure to congregate significant tropical data sets for data reuse, integration, searching, and correlation. However, researchers often require data and metadata synthesis across disciplines for cross-domain analyses and knowledge discovery. A triplestore offers a semantic layer to achieve a more intelligent method of search to support the synthesis requirements by automating latent linkages in the data and metadata. Presently, the benchmarks to aid the decision of which triplestore is best suited for use in an application environment like the TDH are limited to performance. This paper describes a new evaluation tool developed to analyze both features and performance. The tool comprises a weighted decision matrix to evaluate the interoperability, functionality, performance, and support availability of a range of integrated and native triplestores to rank them according to requirements of the TDH.
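The core of a weighted decision matrix fits in a few lines; the criteria weights and scores below are invented placeholders, not the paper's measurements, and the candidate names are hypothetical.

```python
# Weighted decision matrix: weighted sum of per-criterion scores (illustrative only).
weights = {"interoperability": 0.3, "functionality": 0.3,
           "performance": 0.25, "support": 0.15}

scores = {   # 0-10 per criterion, purely illustrative
    "TriplestoreA": {"interoperability": 8, "functionality": 7, "performance": 6, "support": 9},
    "TriplestoreB": {"interoperability": 6, "functionality": 9, "performance": 8, "support": 5},
}

ranking = sorted(
    ((sum(weights[c] * s[c] for c in weights), name) for name, s in scores.items()),
    reverse=True)
for total, name in ranking:
    print(f"{name}: {total:.2f}")
```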
29

O'Neill, Tristan, Trina Myers, and Jarrod Trevathan. "An Evaluation Model for Semantic Enablement of Virtual Research Environments." December 20, 2012. https://doi.org/10.5281/zenodo.1054873.

Abstract:
The Tropical Data Hub (TDH) is a virtual research environment that provides researchers with an e-research infrastructure to congregate significant tropical data sets for data reuse, integration, searching, and correlation. However, researchers often require data and metadata synthesis across disciplines for cross-domain analyses and knowledge discovery. A triplestore offers a semantic layer to achieve a more intelligent method of search to support the synthesis requirements by automating latent linkages in the data and metadata. Presently, the benchmarks to aid the decision of which triplestore is best suited for use in an application environment like the TDH are limited to performance. This paper describes a new evaluation tool developed to analyze both features and performance. The tool comprises a weighted decision matrix to evaluate the interoperability, functionality, performance, and support availability of a range of integrated and native triplestores to rank them according to requirements of the TDH.
APA, Harvard, Vancouver, ISO, and other styles
30

Fichtner, Mark, Robert Nasarek, and Tom Wiesing. "WissKI." Proceedings of the Conference on Research Data Infrastructure 1 (September 7, 2023). http://dx.doi.org/10.52825/cordi.v1i.353.

Full text
Abstract:
WissKI is a free and open-source virtual research environment based on the free and open-source content management system Drupal. It offers everything that the content management system provides while using a triplestore for authoritative data storage. Thus, changes can be made in the triplestore and are directly reflected by the system. Furthermore, WissKI provides all the features necessary for a full Linked Open Data Semantic Web platform.
APA, Harvard, Vancouver, ISO, and other styles
31

Daquino, Marilena, Ivan Heibi, Silvio Peroni, and David Shotton. "Creating RESTful APIs over SPARQL endpoints using RAMOSE." Semantic Web, September 14, 2021, 1–19. http://dx.doi.org/10.3233/sw-210439.

Full text
Abstract:
Semantic Web technologies are widely used for storing RDF data and making them available on the Web through SPARQL endpoints, queryable using the SPARQL query language. While the use of SPARQL endpoints is strongly supported by Semantic Web experts, it hinders broader use of RDF data by common Web users, engineers and developers unfamiliar with Semantic Web technologies, who normally rely on Web RESTful APIs for querying Web-available data and creating applications over them. To solve this problem, we have developed RAMOSE, a generic tool developed in Python to create REST APIs over SPARQL endpoints. Through the creation of source-specific textual configuration files, RAMOSE enables the querying of SPARQL endpoints via simple Web RESTful API calls that return either JSON or CSV-formatted data, thus hiding all the intrinsic complexities of SPARQL and RDF from common Web users. We provide evidence that the use of RAMOSE to provide REST API access to RDF data within OpenCitations triplestores is beneficial in terms of the number of queries made by external users of such RDF data using the RAMOSE API, compared with the direct access via the SPARQL endpoint. Our findings show the importance for suppliers of RDF data of having an alternative API access service, which enables its use by those with no (or little) experience in Semantic Web technologies and the SPARQL query language. RAMOSE can be used both to query any SPARQL endpoint and to query any other Web API, and thus it represents an easy generic technical solution for service providers who wish to create an API service to access Linked Data stored as RDF in a triplestore.
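As a rough illustration of the general idea, hiding a SPARQL endpoint behind a simple parameterised call that returns plain JSON, the following Python sketch fills a query template and queries a hypothetical endpoint. It is not RAMOSE or its configuration mechanism; the endpoint URL, query template and parameter name are assumptions.

```python
# Sketch of exposing a parameterised SPARQL query as a simple function call,
# in the spirit of a REST API layer over an endpoint. Endpoint and query
# template are illustrative assumptions, not RAMOSE configuration.
import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical SPARQL endpoint

QUERY_TEMPLATE = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?work ?title WHERE {{
  ?work dcterms:title ?title .
  FILTER(CONTAINS(LCASE(STR(?title)), LCASE("{keyword}")))
}}
LIMIT 25
"""

def api_search(keyword: str) -> list:
    """Behaves like a REST call: takes a plain string, returns plain JSON rows."""
    query = QUERY_TEMPLATE.format(keyword=keyword.replace('"', ""))
    r = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    r.raise_for_status()
    bindings = r.json()["results"]["bindings"]
    return [{k: v["value"] for k, v in row.items()} for row in bindings]

if __name__ == "__main__":
    for row in api_search("citation"):
        print(row)
```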
APA, Harvard, Vancouver, ISO, and other styles
32

Daga, Enrico, Albert Meroño-Peñuela, and Enrico Motta. "Sequential linked data: The state of affairs." Semantic Web, July 28, 2021, 1–36. http://dx.doi.org/10.3233/sw-210436.

Full text
Abstract:
Sequences are among the most important data structures in computer science. In the Semantic Web, however, little attention has been given to Sequential Linked Data. In previous work, we have discussed the data models that Knowledge Graphs commonly use for representing sequences and showed how these models have an impact on query performance and that this impact is invariant to triplestore implementations. However, the specific list operations that the management of Sequential Linked Data requires beyond the simple retrieval of an entire list or a range of its elements – e.g. to add or remove elements from a list –, and their impact in the various list data models, remain unclear. Covering this knowledge gap would be a significant step towards the realization of a Semantic Web list Application Programming Interface (API) that standardizes list manipulation and generalizes beyond specific data models. In order to address these challenges towards the realization of such an API, we build on our previous work in understanding the effects of various sequential data models for Knowledge Graphs, extending our benchmark and proposing a set of read-write Semantic Web list operations in SPARQL, with insert, update and delete support. To do so, we identify five classic list-based computer science sequential data structures (linked list, double linked list, stack, queue, and array), from which we derive nine atomic read-write operations for Semantic Web lists. We propose a SPARQL implementation of these operations with five typical RDF data models and compare their performance by executing them against six increasing dataset sizes and four different triplestores. In light of our results, we discuss the feasibility of our devised API and reflect on the state of affairs of Sequential Linked Data.
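To make the kind of list manipulation discussed above concrete, here is a small rdflib sketch that builds an RDF Collection (the rdf:first/rdf:rest model) and appends an element to it. This is only one of the data models the authors benchmark, and the namespace and item names are invented for illustration.

```python
# Sketch: manipulating an rdf:first/rdf:rest list (one of several possible
# RDF sequence models) with rdflib's Collection helper. Names are invented.
from rdflib import Graph, Namespace, BNode
from rdflib.collection import Collection

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)

# Build a two-element list and attach it to a subject.
head = BNode()
playlist = Collection(g, head, [EX.track1, EX.track2])
g.add((EX.myPlaylist, EX.hasTracks, head))

# Append: with the linked-list model this means rewriting the final rdf:rest.
playlist.append(EX.track3)

print(g.serialize(format="turtle"))
```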
APA, Harvard, Vancouver, ISO, and other styles
33

Daga, Enrico, Albert Meroño-Peñuela, and Enrico Motta. "Sequential Linked Data: the State of Affairs." Semantic Web – Interoperability, Usability, Applicability, an IOS Press Journal, to appear (June 11, 2021). https://doi.org/10.5281/zenodo.4927864.

Full text
Abstract:
Sequences are among the most important data structures in computer science. In the Semantic Web, however, little attention has been given to Sequential Linked Data. In previous work, we have discussed the data models that Knowledge Graphs commonly use for representing sequences and showed how these models have an impact on query performance and that this impact is invariant to triplestore implementations. However, the specific list operations that the management of Sequential Linked Data requires beyond the simple retrieval of an entire list or a range of its elements – e.g. to add or remove elements from a list –, and their impact in the various list data models, remain unclear. Covering this knowledge gap would be a significant step towards the realization of a Semantic Web list Application Programming Interface (API) that standardizes list manipulation and generalizes beyond specific data models. In order to address these challenges towards the realization of such an API, we build on our previous work in understanding the effects of various sequential data models for Knowledge Graphs, extending our benchmark and proposing a set of read-write Semantic Web list operations in SPARQL, with insert, update and delete support. To do so, we identify five classic list-based computer science sequential data structures (linked list, double linked list, stack, queue, and array), from which we derive nine atomic read-write operations for Semantic Web lists. We propose a SPARQL implementation of these operations with five typical RDF data models and compare their performance by executing them against six increasing dataset sizes and four different triplestores. In light of our results, we discuss the feasibility of our devised API and reflect on the state of affairs of Sequential Linked Data.
APA, Harvard, Vancouver, ISO, and other styles
34

Louarn, Marine, Fabrice Chatonnet, Xavier Garnier, et al. "Improving reusability along the data life cycle: a regulatory circuits case study." Journal of Biomedical Semantics 13, no. 1 (2022). http://dx.doi.org/10.1186/s13326-022-00266-4.

Full text
Abstract:
Background: In life sciences, there has been a long-standing effort of standardization and integration of reference datasets and databases. Despite these efforts, the data of many studies are provided in specific, non-standard formats. This hampers the capacity to reuse the study data in other pipelines, to reuse the pipeline results in other studies, and to enrich the data with additional information. The Regulatory Circuits project is one of the largest efforts for integrating human cell genomics data to predict tissue-specific transcription factor-gene interaction networks. In spite of its success, it exhibits the usual shortcomings limiting its update, its reuse (as a whole or partially), and its extension with new data samples. To address these limitations, the resource had previously been integrated in an RDF triplestore so that TF-gene interaction networks could be generated with two SPARQL queries. However, this triplestore did not store the computed networks and did not integrate metadata about tissues and samples, therefore limiting the reuse of this dataset. In particular, it does not allow reusing only a portion of Regulatory Circuits if a study focuses on a subset of the tissues, nor combining the samples described in the datasets with samples from other studies. Overall, these limitations advocate for the design of a complete, flexible and reusable representation of the Regulatory Circuits dataset based on Semantic Web technologies. Results: We provide a modular RDF representation of Regulatory Circuits, called Linked Extended Regulatory Circuits (LERC). It consists of (i) descriptions of biological and experimental context mapped to the reference databases, (ii) annotations about TF-gene interactions at the sample level for 808 samples, (iii) annotations about TF-gene interactions at the tissue level for 394 tissues, and (iv) metadata connecting the knowledge graphs cited above. LERC is based on a modular organisation into 1,205 RDF named graphs for representing the biological data, the sample-specific and the tissue-specific networks, and the corresponding metadata. In total it contains 3,910,794,050 triples and is available as a SPARQL endpoint. Conclusion: The flexible and modular architecture of LERC supports biologically relevant SPARQL queries. It allows an easy and fast querying of the resources related to the initial Regulatory Circuits datasets and facilitates its reuse in other studies. Associated website: https://regulatorycircuits-lod.genouest.org
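Because LERC is organised into named graphs, a typical consumer query scopes its pattern with a GRAPH clause. The sketch below shows the general shape of such a query issued from Python with SPARQLWrapper; the endpoint URL, graph IRI and predicates are placeholders, not the actual LERC vocabulary.

```python
# Sketch of querying one named graph out of a modular RDF dataset.
# Endpoint, graph IRI and predicates are placeholders, not LERC's actual terms.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/lerc/sparql"  # placeholder endpoint URL

query = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?tf ?gene WHERE {
  GRAPH <http://example.org/graphs/tissue-42> {   # one tissue-specific graph
    ?interaction ex:transcriptionFactor ?tf ;
                 ex:targetGene ?gene .
  }
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["tf"]["value"], "->", row["gene"]["value"])
```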
APA, Harvard, Vancouver, ISO, and other styles
35

Delmas, M., O. Filangi, N. Paulhe, et al. "FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases." Bioinformatics, September 3, 2021. http://dx.doi.org/10.1093/bioinformatics/btab627.

Full text
Abstract:
Motivation: Metabolomics studies aim at reporting a metabolic signature (list of metabolites) related to a particular experimental condition. These signatures are instrumental in the identification of biomarkers or classification of individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. Results: The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support results interpretation and further inquiries. Availability: A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM knowledge graph, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
36

Günther, Taras, Matthias Filter, and Fernanda Dórea. "Making Linked Data accessible for One Health Surveillance with the "One Health Linked Data Toolbox"." ARPHA Conference Abstracts 4 (May 28, 2021). http://dx.doi.org/10.3897/aca.4.e68821.

Full text
Abstract:
In times of emerging diseases, data sharing and data integration are of particular relevance for One Health Surveillance (OHS) and decision support. Furthermore, there is an increasing demand to provide governmental data in compliance with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. Semantic web technologies are key facilitators for providing data interoperability, as they allow explicit annotation of data with their meaning, enabling reuse without loss of the data collection context. Among these, we highlight ontologies as a tool for modeling knowledge in a field, which simplify the interpretation and mapping of datasets in a computer-readable medium, and the Resource Description Framework (RDF), which allows data to be shared among human and computer agents following this knowledge model. Despite their potential for enabling cross-sectoral interoperability and data linkage, the use and application of these technologies is often hindered by their complexity and the lack of easy-to-use software applications. To overcome these challenges, the OHEJP Project ORION developed the Health Surveillance Ontology (HSO). This knowledge model forms a foundation for semantic interoperability in the domain of One Health Surveillance. It provides a solution to add data from the target sectors (public health, animal health and food safety) in compliance with the FAIR principles of findability, accessibility, interoperability, and reusability, supporting interdisciplinary data exchange and usage. To provide use cases and facilitate access to HSO, we developed the One Health Linked Data Toolbox (OHLDT), which consists of three new, custom-developed web applications with specific functionalities. The first web application allows users to convert surveillance data available in Excel files online into HSO-RDF and vice versa. This application demonstrates that data provided in well-established formats can be automatically translated into the linked data format HSO-RDF. The second application demonstrates the usage of HSO-RDF in an HSO triplestore database. In its user interface, the user can select HSO concepts based on which to search and filter among surveillance datasets stored in an HSO triplestore database. The service then provides automatically generated dashboards based on the context of the data. The third web application demonstrates data interoperability in the OHS context by using HSO-RDF to annotate metadata and in this way link datasets across sectors. It provides a dashboard to compare public data on zoonosis surveillance provided by EFSA and ECDC. The first solution enables linked data production, while the second and third provide examples of linked data consumption and of their value in enabling data interoperability across sectors. All described solutions are based on the open-source software KNIME and are deployed as web services via a KNIME Server hosted at the German Federal Institute for Risk Assessment. The semantic web extension of KNIME, which is based on the Apache Jena Framework, allowed rapid and easy development within the project. The underlying open-source KNIME workflows are freely available and can be easily customized by interested end users. With our applications, we demonstrate that the use of linked data has great potential to strengthen the use of FAIR data in OHS and interdisciplinary data exchange.
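The core step of the first web application, turning well-known tabular records into RDF, can be illustrated with a short rdflib sketch. The toolbox itself is implemented as KNIME workflows and uses the HSO vocabulary; the rows, namespace and property names below are invented stand-ins that only show the general tabular-to-RDF conversion.

```python
# Sketch of converting tabular surveillance rows into RDF with rdflib.
# The real toolbox uses KNIME workflows and the HSO vocabulary; the namespace,
# property names and rows below are illustrative stand-ins.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/hso-demo#")  # placeholder, not HSO itself

rows = [  # pretend these rows came out of an Excel sheet
    {"id": "s1", "pathogen": "Salmonella", "sector": "food", "count": 12},
    {"id": "s2", "pathogen": "Campylobacter", "sector": "animal", "count": 7},
]

g = Graph()
g.bind("ex", EX)

for row in rows:
    subject = EX[row["id"]]
    g.add((subject, RDF.type, EX.SurveillanceRecord))
    g.add((subject, EX.pathogen, Literal(row["pathogen"])))
    g.add((subject, EX.sector, Literal(row["sector"])))
    g.add((subject, EX.caseCount, Literal(row["count"], datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```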
APA, Harvard, Vancouver, ISO, and other styles
37

Del Nostro, Pierluigi, Gerhard Goldbeck, Andrea Pozzi, and Daniele Toti. "Modeling experts, knowledge providers and expertise in Materials Modeling: MAEO as an application ontology of EMMO’s ecosystem." Applied Ontology, August 1, 2023, 1–20. http://dx.doi.org/10.3233/ao-230024.

Full text
Abstract:
This work presents the MarketPlace Agent and Expert Ontology (MAEO), an ontology for modeling experts, expertise, and more broadly, knowledge providers and knowledge seekers for the subject areas of Materials Modeling. MAEO had its inception within the “MarketPlace” European project, whose purpose is to bring about a single entry point for gathering scientific and industrial stakeholders in Materials Modeling. As such, this project aimed to build an online platform where experts and knowledge providers can be searched, found and brought into contact with users, or knowledge seekers, and with one another. MAEO was developed in order to fulfill the requirements of this online platform and thus support it, but is also part of a wider ecosystem of Materials Modeling-related ontologies, at whose core lies the Elementary Multiperspective Material Ontology (EMMO). MAEO is thus an EMMO-compliant application ontology, and has been loosely aligned with a number of existing ontologies, including Friend-Of-A-Friend (FOAF) and five recently-developed EMMO-based domain ontologies for the classification of materials, models, manufacturing processes, characterization methods and software products related to Materials Modeling. Here, a detailed description of the axiomatization of MAEO and its interconnected ontologies is provided, along with results coming from its deployment and experimentation in a StarDog triplestore. Availability. The axiomatization of the ontology is stored in a GitHub repository available at: https://github.com/emmo-repo/MAEO-Ontology, and is published at the following URL: http://emmo.info/emmo/application/maeo/experts.
APA, Harvard, Vancouver, ISO, and other styles
38

Piirainen, Esko, Eija-Leena Laiho, Tea von Bonsdorff, and Tapani Lahti. "Managing Taxon Data in FinBIF." Biodiversity Information Science and Standards 3 (June 26, 2019). http://dx.doi.org/10.3897/biss.3.37422.

Full text
Abstract:
The Finnish Biodiversity Information Facility, FinBIF (https://species.fi), has developed its own taxon database. This allows FinBIF taxon specialists to maintain their own, expert-validated view of Finnish species. The database covers national needs and can be rapidly expanded by our own development team. Furthermore, in the database each taxon is given a globally unique persistent URI identifier (https://www.w3.org/TR/uri-clarification), which refers to the taxon concept, not just to the name. The identifier doesn’t change if the taxon concept doesn’t change. We aim to ensure compatibility with checklists from other countries by linking taxon concepts as Linked Data (https://www.w3.org/wiki/LinkedData) — a work started as a part of the Nordic e-Infrastructure Collaboration (NeIC) DeepDive project (https://neic.no/deepdive). The database is used as a basis for observation/specimen searches, e-Learning and identification tools, and it is browsable by users of the FinBIF portal. The data is accessible to everyone under CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0) in machine readable formats. The taxon specialists maintain the taxon data using a web application. Currently, there are 60 specialists. All changes made to the data go live every night. The nightly update interval allows the specialists a grace period to make their changes. Allowing the taxon specialists to modify the taxonomy database themselves leads to some challenges. To maintain the integrity of critical data, such as lists of protected species, we have had to limit what the specialists can do. Changes to critical data is carried out by an administrator. The database has special features for linking observations to the taxonomy. These include hidden species aggregates and tools to override how a certain name used in observations is linked to the taxonomy. Misapplied names remain an unresolved problem. The most precise way to record an observation is to use a taxon concept: Most observations are still recorded using plain names, but it is possible for the observer to pick a concept. Also, when data is published in FinBIF from other information systems, the data providers can link their observations to the concepts using the identifiers of concepts. The ability to use taxon concepts as basis of observations means we have to maintain the concepts over time — a task that may become arduous in the future (Fig. 1). As it stands now, the FinBIF taxon data model — including adjacent classes such as publication, person, image, and endangerment assessments — consists of 260 properties. If the data model were stored in a normalized relational database, there would be approximately 56 tables, which could be difficult to maintain. Keeping track of a complete history of data is difficult in relational databases. Alternatively, we could use document storage to store taxon data. However, there are some difficulties associated with document storages: (1) much work is required to implement a system that does small atomic update operations; (2) batch updates modifying multiple documents usually require writing a script; and (3) they are not ideal for doing searches. We use a document storage for observation data, however, because they are well suited for storing large quantities of complex records. In FinBIF, we have decided to use a triplestore for all small datasets, such as taxon data. More specifically, the data is stored according to the RDF specification (https://www.w3.org/RDF). 
An RDF Schema defines the allowed properties for each class. Our triplestore implementation is an Oracle relational database with two tables (resource and statement), which gives us the ability to do SQL queries and updates. Doing small atomic updates is easy as only a small subset of the triplets can be updated instead of the entire data entity. Maintaining a complete record of history comes without much effort, as it can be done on an individual triplet level. For performance-critical queries, the taxon data is loaded into an Elasticsearch (https://www.elastic.co) search engine.
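A two-table resource/statement layout of the kind described above can be sketched in a few SQL statements. The sketch below uses SQLite and invented column names, so it only illustrates the general idea of storing triples, with per-triple history, in a relational database; it is not FinBIF's actual Oracle schema.

```python
# Sketch of a minimal relational triplestore: one table for resources (URIs)
# and one for statements (triples), with per-triple history columns.
# Column names are assumptions for illustration, not FinBIF's Oracle schema.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
CREATE TABLE resource (
    id  INTEGER PRIMARY KEY,
    uri TEXT UNIQUE NOT NULL
);
CREATE TABLE statement (
    subject_id   INTEGER NOT NULL REFERENCES resource(id),
    predicate_id INTEGER NOT NULL REFERENCES resource(id),
    object_text  TEXT NOT NULL,
    created_at   TEXT DEFAULT CURRENT_TIMESTAMP,
    deleted_at   TEXT           -- NULL means the triple is current; old rows stay
);
""")

def rid(uri: str) -> int:
    """Return the id of a resource, creating it on first use."""
    cur.execute("INSERT OR IGNORE INTO resource (uri) VALUES (?)", (uri,))
    cur.execute("SELECT id FROM resource WHERE uri = ?", (uri,))
    return cur.fetchone()[0]

# Store one triple: <taxon/T1> <hasScientificName> "Lynx lynx"
cur.execute(
    "INSERT INTO statement (subject_id, predicate_id, object_text) VALUES (?, ?, ?)",
    (rid("http://example.org/taxon/T1"),
     rid("http://example.org/hasScientificName"),
     "Lynx lynx"),
)
con.commit()

cur.execute("""
    SELECT s.uri, p.uri, st.object_text
    FROM statement st
    JOIN resource s ON s.id = st.subject_id
    JOIN resource p ON p.id = st.predicate_id
    WHERE st.deleted_at IS NULL
""")
print(cur.fetchall())
```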
APA, Harvard, Vancouver, ISO, and other styles
39

Rousset, Marie-Christine, and Federico Ulliana. "Extracting Bounded-Level Modules from Deductive RDF Triplestores." Proceedings of the AAAI Conference on Artificial Intelligence 29, no. 1 (2015). http://dx.doi.org/10.1609/aaai.v29i1.9176.

Full text
Abstract:
We present a novel semantics for extracting bounded-level modules from RDF ontologies and databases augmented with safe inference rules, a la Datalog. Dealing with a recursive rule language poses challenging issues for defining the module semantics, and also makes module extraction algorithmically unsolvable in some cases. Our results include a set of module extraction algorithms compliant with the novel semantics. Experimental results show that the resulting framework is effective in extracting expressive modules from RDF datasets with formal guarantees, whilst controlling their succinctness.
APA, Harvard, Vancouver, ISO, and other styles
40

Alaoui, Khadija, and Mohamed Bahaj. "Evaluation Criteria for RDF Triplestores with an Application to Allegrograph." International Journal of Advanced Computer Science and Applications 11, no. 6 (2020). http://dx.doi.org/10.14569/ijacsa.2020.0110653.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Gmür, Reto, Donat Agosti, and Guido Sautter. "Synospecies, a Linked Data Application to Explore Taxonomic Names." Biodiversity Information Science and Standards 6 (August 23, 2022). http://dx.doi.org/10.3897/biss.6.93707.

Full text
Abstract:
Synospecies is a linked data application to explore changes in taxonomic names (Gmür and Agosti 2021). The underlying source of truth for the establishment of taxa, and for the assignment and re-assignment of names, are taxonomic treatments. Taxonomic treatments are sections of publications documenting the features or distribution of taxa in ways adhering to highly formalized conventions, and published in scientific journals, which shape our understanding of global biodiversity (Catapano 2010). Plazi, a not-for-profit organization dedicated to liberating knowledge, extracts the relevant information from these treatments and makes it publicly available in digital form. Depending on the original form of a publication, a treatment undergoes several steps during its processing. All these steps affect the available digital artifacts extracted from the treatment's original publication. The treatments are digitalized, the text is annotated with a specialized editor, and cross-referenced and enhanced with other sources (Agosti and Sautter 2018). After these steps, the annotated text is transformed into the different structured data formats used by other digital biodiversity platforms (e.g., Global Biodiversity Information Facility: Plazi.org taxonomic treatment database using Darwin Core Archive), generic linked data tools (e.g., lod view; RDF2h Browser) and other consuming applications (e.g., Ocellus via Zenodeo using XML; openBioDiv using XML; HMW using XML; Biotic interaction browser using TaxPub XML; opendata.swiss using RDF). While these transformations have been taking place for a long time now, Plazi is now experimenting with making this process more transparent: with the Plazi Actionable Accessible Archive (PAAA) architecture, both addition and modification of the digitalized treatments trigger an extensible set of workflows that are immediately executed on the GitHub platform. Not only are the exact definition and code of every workflow publicly accessible, but the results, errors and execution time of every single workflow are accessible as well. This offers an unprecedented degree of transparency and flexibility in the data processing that we have prototypically implemented for the creation of the RDF data used by Synospecies. As with the W3C GRDDL recommendation (https://www.w3.org/TR/grddl/), XSLT is used to transform XML to RDF/XML, a concrete syntax from the early days of RDF still supported by most RDF tools, allowing the data to be read as RDF. The XSLT document used is part of the bundled gg2rdf GitHub action (https://github.com/plazi/gg2rdf), together with the other transformation steps required to generate a transformation result in the human- and machine-readable RDF Turtle format. On the GitHub Actions page of the treatments-xml repository (https://github.com/plazi/treatments-xml/actions) one can see that every commit to this repository triggers a workflow run that takes approximately 12 minutes to execute. After that, the transformation results are available in the treatments-rdf repository (https://github.com/plazi/treatments-rdf/). The commit of RDF data to the treatments-rdf repository triggers a webhook that loads the newly added data into the Plazi triplestore, making it virtually immediately available in Synospecies.
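The final conversion step mentioned above, producing human- and machine-readable Turtle from RDF/XML, can be reproduced generically with rdflib as sketched below. This is not the gg2rdf action itself (which first runs an XSLT transformation); the input snippet is a made-up example.

```python
# Sketch of the last step of such a pipeline: re-serialising RDF/XML as Turtle.
# The gg2rdf GitHub action performs an XSLT step before this; the input here
# is a made-up RDF/XML snippet, not real treatment data.
from rdflib import Graph

rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/treatment/123">
    <dc:title>Example treatment</dc:title>
  </rdf:Description>
</rdf:RDF>
"""

g = Graph()
g.parse(data=rdf_xml, format="xml")      # read the RDF/XML produced upstream
print(g.serialize(format="turtle"))      # write human-readable Turtle
```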
APA, Harvard, Vancouver, ISO, and other styles
42

Heikkinen, Mikko, Ville-Matti Riihikoski, Anniina Kuusijärvi, Dare Talvitie, Tapani Lahti, and Leif Schulman. "Kotka - A national multi-purpose collection management system." Biodiversity Information Science and Standards 3 (June 18, 2019). http://dx.doi.org/10.3897/biss.3.37179.

Full text
Abstract:
Many natural history museums share a common problem: a multitude of legacy collection management systems (CMS) and the difficulty of finding a new system to replace them. Kotka is a CMS created by the Finnish Museum of Natural History (Luomus) to solve this problem. Its development started in late 2011 and it was put into operational use in 2012. Kotka was first built to replace dozens of in-house systems previously used at Luomus, but eventually grew into a national system, which is now used by 10 institutions in Finland. Kotka currently holds c. 1.7 million specimens from zoological, botanical, paleontological, microbial and botanic garden collections, as well as data from genomic resource collections. Kotka is designed to fit the needs of different types of collections and can be further adapted when new needs arise. Kotka differs in many ways from traditional CMSs. It applies simple and pragmatic approaches. This has helped it to grow into a widely used system despite limited development resources – on average less than one full-time equivalent developer (FTE). The aim of Kotka is to improve collection management efficiency by providing practical tools. It emphasizes the quantity of digitized specimens over completeness of the data. It also harmonizes collection management practices by bringing all types of collections under one system. Kotka stores data mostly in a denormalized free-text format using a triplestore and a simple hierarchical data model (Fig. 1). This allows greater flexibility of use and faster development compared to a normalized relational database. New data fields and structures can easily be added as needs arise. Kotka does some data validation, but quality control is seen as a continuous process and is mostly done after the data has been recorded into the system. The data model is loosely based on the ABCD (Access to Biological Collection Data) standard, but has been adapted to support practical needs. Kotka is a web application and data can be entered, edited, searched and exported through a browser-based user interface. However, most users prefer to enter new data in customizable MS-Excel templates, which support the hierarchical data model, and upload these to Kotka. Batch updates can also be done using Excel. Kotka stores all revisions of the data to avoid any data loss due to technical or human error. Kotka also supports designing and printing specimen labels, annotations by external users, as well as handling accessions, loan transactions, and the Nagoya Protocol. Taxonomy management is done using a separate system provided by the Finnish Biodiversity Information Facility (FinBIF). This decoupling also allows entering specimen data before the taxonomy is updated, which speeds up specimen digitization. Every specimen is given a persistent unique HTTP-URI identifier (CETAF stable identifiers). Specimen data is accessible through the FinBIF portal at species.fi, and will later be shared with GBIF according to agreements with data holders. Kotka is continuously developed and adapted to new requirements in close collaboration with curators and technical collection staff, using agile software development methods. It is available as open source, but is tightly integrated with other FinBIF infrastructure, and currently only offered as an online service (Software as a Service) hosted by FinBIF.
APA, Harvard, Vancouver, ISO, and other styles
43

Stork, Lise, Andreas Weber, and Katherine Wolstencroft. "The Semantic Field Book Annotator." Biodiversity Information Science and Standards 3 (June 19, 2019). http://dx.doi.org/10.3897/biss.3.37223.

Full text
Abstract:
Biodiversity research expeditions to the globe's most biodiverse areas have been conducted for several hundred years. Natural history museums contain a wealth of historical materials from such expeditions, but they are stored in a fragmented way. As a consequence, links between the various resources, e.g., specimens, illustrations and field notes, are often lost and are not easily re-established. Natural history museums have started to use persistent identifiers for physical collection objects, such as specimens, as well as associated information resources, such as web pages and multimedia. As a result, these resources can more easily be linked, using Linked Open Data (LOD), to information sources on the web. Specimens can be linked to taxonomic backbones of data providers, e.g., the Encyclopedia Of Life (EOL), the Global Biodiversity Information Facility (GBIF), or publications with Digital Object Identifiers (DOI). For the content of biodiversity expedition archives (e.g., field notes), no such formalisations exist. However, linking the specimens to specific handwritten notes taken in the field can increase their scientific value. Specimens are generally accompanied by a label containing the location of the site where the specimen was collected, the collector's name and the classification. Field notes often augment the basic metadata found with specimens with important details concerning, for instance, an organism's habitat and morphology. Therefore, inter-collection interoperability of multimodal resources is just as important as intra-collection interoperability of unimodal resources. The linking of field notes and illustrations to specimens entails a number of challenges: historical handwritten content is generally difficult to read and interpret, especially due to changing taxonomic systems, nomenclature and collection practices. It is vital that: (1) the content is structured in a similar way to the specimens, so that links can more easily be re-established either manually or in an automated way; (2) for consolidation, the content is enriched with outgoing links to semantic resources, such as Geonames or the Virtual International Authority File (VIAF); and (3) this process is a transparent one: how links are established, why and by whom, should be stored to encourage scholarly discussions and to promote the attribution of efforts. In order to address some of these issues, we have built a tool, the Semantic Field Book Annotator (SFB-A), that allows for the direct annotation of digitised (scanned) pages of field books and illustrations with Linked Open Data (LOD). The tool guides the user through the annotation process, so that semantic links are automatically generated in a formalised way. These annotations and links are subsequently stored in an RDF triplestore. As the use of the Darwin Core standard is considered best practice among collection managers for the digitisation of their specimens, our tool is equipped with an ontology based on Darwin Core terms, the NHC-Ontology, which extends the Darwin Semantic Web (DSW) ontology.
The tool can annotate any image, be it an image of a specimen with a textual label, an illustration with a textual label or a handwritten species description. Interoperability of annotations between the various resources within a collection is therefore ensured. Terms in the ontology are structured using the OWL Web Ontology Language. This allows for more complex tasks such as OWL reasoning and semantic queries, and facilitates the creation of a richer knowledge base that is more amenable to research.
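To give a flavour of the kind of formalised annotation described above, the sketch below creates a small set of Darwin Core-based statements about an annotated page region with rdflib. The Darwin Core namespace is real, but the annotation class, region identifier and values are invented for illustration and are not taken from the NHC-Ontology.

```python
# Sketch of Darwin Core-flavoured annotation triples created with rdflib.
# The dwc: namespace is the real Darwin Core terms namespace; the annotation
# class, region IRI and values are invented placeholders (not NHC-Ontology).
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

DWC = Namespace("http://rs.tdwg.org/dwc/terms/")
EX = Namespace("http://example.org/fieldbook/")

g = Graph()
g.bind("dwc", DWC)
g.bind("ex", EX)

region = EX["page12/region3"]                     # an annotated area on a scan
g.add((region, RDF.type, EX.SpecimenAnnotation))  # invented class
g.add((region, DWC.scientificName, Literal("Nepenthes rafflesiana")))
g.add((region, DWC.recordedBy, Literal("P. W. Korthals")))
g.add((region, DWC.locality, Literal("Borneo")))

print(g.serialize(format="turtle"))
```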
APA, Harvard, Vancouver, ISO, and other styles
44

Salgueiro, Mariana D. A., Veronica Dos Santos, André L. C. Rêgo, et al. "Searching for Researchers: an Ontology-based NoSQL Database System Approach and Practical Implementation." Journal of Information and Data Management 13, no. 5 (2022). http://dx.doi.org/10.5753/jidm.2022.2601.

Full text
Abstract:
This work presents the design and implementation of two web-based search systems, Busc@NIMA and Quem@PUC. Both systems allow the identification of research and development projects, as well as existing competencies in laboratories and departments involving professors and researchers at PUC-Rio University. Our applications are based on a list of search-related terms that are matched against a dataset composed of PUC-Rio's Lattes CVs, offered courses, information from administrative systems, and specific keywords that are input by the professors/researchers themselves. To integrate all the needed data, we consider multiple database and search technologies, such as XML, RDF, triplestores, and relational databases. Search results include the professor's name, academic papers, teaching activities, contact links, keywords, and the laboratories of those involved with the subject represented by the set of keywords input. We describe the main features that show how our systems work.
APA, Harvard, Vancouver, ISO, and other styles
45

Mussa, Omar, Omer Rana, Benoit Goossens, Pablo Orozco Ter Wengel, and Charith Perera. "ForestQB: Enhancing Linked Data Exploration through Graphical and Conversational UIs Integration." ACM Journal on Computing and Sustainable Societies, June 29, 2024. http://dx.doi.org/10.1145/3675759.

Full text
Abstract:
This paper introduces the Forest Query Builder (ForestQB), an innovative toolkit designed to enhance the exploration and application of observational Linked Data (LD) within the field of wildlife research and conservation. Addressing the challenges faced by non-experts in navigating Resource Description Framework (RDF) triplestores and executing SPARQL queries, ForestQB employs a novel integrated approach. This approach combines a graphical user interface (GUI) with a conversational user interface (CUI), thereby greatly simplifying the process of query formulation and making observational LD accessible to users without expertise in RDF or SPARQL. Developed through insights derived from a comprehensive ethnographic study involving wildlife researchers, ForestQB is specifically designed to improve the accessibility of SPARQL endpoints and facilitate the exploration of observational LD in wildlife research contexts. To evaluate the effectiveness of our approach, we conducted a user experiment. The results of this evaluation affirm that ForestQB is not only efficient and user-friendly but also plays a crucial role in eliminating barriers for users, facilitating the effective use of observational LD in wildlife conservation and extending its benefits to wider domains. (GitHub Link)
APA, Harvard, Vancouver, ISO, and other styles
46

Touré, Vasundra, Philip Krauss, Kristin Gnodtke, et al. "FAIRification of health-related data using semantic web technologies in the Swiss Personalized Health Network." Scientific Data 10, no. 1 (2023). http://dx.doi.org/10.1038/s41597-023-02028-y.

Full text
Abstract:
The Swiss Personalized Health Network (SPHN) is a government-funded initiative developing federated infrastructures for a responsible and efficient secondary use of health data for research purposes in compliance with the FAIR principles (Findable, Accessible, Interoperable and Reusable). We built a common standard infrastructure with a fit-for-purpose strategy to bring together health-related data and ease the work of both data providers to supply data in a standard manner and researchers by enhancing the quality of the collected data. As a result, the SPHN Resource Description Framework (RDF) schema was implemented together with a data ecosystem that encompasses data integration, validation tools, analysis helpers, training and documentation for representing health metadata and data in a consistent manner and reaching nationwide data interoperability goals. Data providers can now efficiently deliver several types of health data in a standardised and interoperable way while a high degree of flexibility is granted for the various demands of individual research projects. Researchers in Switzerland have access to FAIR health data for further use in RDF triplestores.
APA, Harvard, Vancouver, ISO, and other styles
47

Baskauf, Steven J. "Having Your Cake and Eating It Too: JSON-LD as an RDF serialization format." Biodiversity Information Science and Standards 5 (September 13, 2021). http://dx.doi.org/10.3897/biss.5.74266.

Full text
Abstract:
One impediment to the uptake of linked data technology is developers’ unfamiliarity with typical Resource Description Framework (RDF) serializations like Turtle and RDF/XML. JSON for Linking Data (JSON-LD) is designed to bypass this problem by expressing linked data in the well-known JavaScript Object Notation (JSON) format that is popular with developers. JSON-LD is now Google’s preferred format for exposing Schema.org structured data in web pages for search optimization, leading to its widespread use by web developers. Another successful use of JSON-LD is by the International Image Interoperability Framework (IIIF), which limits its use to a narrow design pattern, which is readily consumed by a variety of applications. This presentation will show how a similar design pattern has been used in Audubon Core and with Biodiversity Information Standards (TDWG) controlled vocabularies to serialize data in a manner that is both easily consumed by conventional applications, but which also can be seamlessly loaded as RDF into triplestores or other linked data applications. The presentation will also suggest how JSON-LD might be used in other contexts within TDWG vocabularies, including with the Darwin Core Resource Relationship terms.
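The dual-use property described here, plain JSON for developers and RDF for triplestores, can be demonstrated in a few lines: the same document is read once as ordinary JSON and once as RDF. The sketch assumes rdflib 6 or later, where JSON-LD support is built in; the Schema.org-based record is invented.

```python
# The same JSON-LD document consumed two ways: as plain JSON and as RDF.
# Assumes rdflib >= 6 (built-in JSON-LD parser); the record itself is invented.
import json
from rdflib import Graph

doc = """{
  "@context": {"name": "http://schema.org/name",
               "creator": "http://schema.org/creator"},
  "@id": "http://example.org/image/42",
  "name": "Ogham stone, County Kerry",
  "creator": "A. Photographer"
}"""

# 1. A conventional developer just treats it as JSON.
as_json = json.loads(doc)
print(as_json["name"])

# 2. A linked-data application loads exactly the same bytes as RDF.
g = Graph()
g.parse(data=doc, format="json-ld")
print(g.serialize(format="turtle"))
```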
APA, Harvard, Vancouver, ISO, and other styles
48

Costa, Lázaro, Nuno Freitas, and João Rocha da Silva. "An evaluation of Graph Databases and Object-Graph Mappers in CIDOC CRM-compliant digital archives." Journal on Computing and Cultural Heritage, February 18, 2022. http://dx.doi.org/10.1145/3485847.

Full text
Abstract:
The Portuguese General Directorate for Book, Archives and Libraries (DGLAB) has selected CIDOC CRM as the base for its next-generation digital archive management software. Given the ontology foundations of the CRM, a graph database or a triplestore were seen as the best candidates to represent a CRM-based data model for the new software. We thus decided to compare several of these databases, based on their maturity, features, performance in standard tasks and, most importantly, the Object-Graph Mappers (OGM) available to interact with each database in an Object-Oriented way. Our conclusions are drawn not only from a systematic review of related works but also from an experimental scenario. For our experiment, we designed a simple CRM-compliant graph to test the ability of each OGM/database combination to tackle the so-called “Diamond problem” in Object-Oriented Programming (OOP), to ensure that property instances follow domain and range constraints. Our results show that (1) ontological consistency enforcement in graph databases and triplestores is much harder to achieve than in a relational database, making them more suited to an analytical rather than a transactional role, (2) Object-Graph Mappers are still rather immature solutions, and (3) neomodel, an OGM for the Neo4j graph database, is the most mature solution in the study as it satisfies all requirements, although it is also the least performing.
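The "Diamond problem" used as the test case is, at the language level, plain multiple inheritance. The minimal Python sketch below shows only the shape of the hierarchy involved (the class names are invented, CRM-like placeholders, and no OGM is used), i.e. the pattern an Object-Graph Mapper has to map onto graph labels.

```python
# The shape of the "Diamond problem" test case: D inherits from both B and C,
# which share the ancestor A. Class names are invented, CRM-like placeholders;
# an OGM has to map such a hierarchy onto node labels in the graph database.
class CRMEntity:                      # common ancestor (A)
    pass

class PhysicalThing(CRMEntity):       # branch B
    pass

class ConceptualObject(CRMEntity):    # branch C
    pass

class InformationCarrier(PhysicalThing, ConceptualObject):  # the diamond (D)
    pass

# Python resolves the diamond through its method resolution order:
print([cls.__name__ for cls in InformationCarrier.__mro__])
# ['InformationCarrier', 'PhysicalThing', 'ConceptualObject', 'CRMEntity', 'object']
```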
APA, Harvard, Vancouver, ISO, and other styles
49

Lawrence, Peter. "Knowledge Graph Primer: Creating a digital twin graph model." May 2, 2022. https://doi.org/10.5281/zenodo.6518057.

Full text
Abstract:
A series of tutorials designed to introduce the use of knowledge graphs in an industrial environment (process and manufacturing industries, construction, etc.) to technical users with some database experience but not necessarily experience with graphs. Throughout the tutorials, there is an example 'thread' of building models of process plants and equipment with the objective of creating a fit-for-purpose [1] 'digital twin' model. This thread culminates in a 'recipe' for building fit-for-purpose digital twin models, particularly in the manufacturing and equipment domains. Each tutorial is illustrated by the de
Session 1: Introduction to graphs. Graph databases, how they fit in the database landscape, their advantages, and disadvantages: what a knowledge graph database is; why it is different; what its advantages and disadvantages are; how to build a graph database; triplestores; development tools.
Session 2: Simple modeling with graphs. The tools required to build a fit-for-purpose model: RDF, RDFS. Using RDF, RDFS, and OWL (maybe) to describe graph models; creating 1-D models with graphs; querying graphs with SPARQL and with OData.
Session 3: Creating a realistic plant model: a digital twin. Tackling the real requirements of a fit-for-purpose digital-twin model: the issues to be solved in applying knowledge graphs to an industrial application; limitations of 1D modeling (simple subject-property-object); useful concepts from Basic Formal Ontology; extension to 2D modeling (reified statements, so we can have other data such as units of measure, accuracy, version and so on); extension to 3D modeling (so we can capture the fact that knowledge is always changing over time); avoiding complex taxonomies with shapes.
[1] Fit-for-purpose: Although focused on the manufacturing and equipment domains, we want to build a model which the participants can relate to and easily transfer to their own problems. This model should be able to answer the detailed questions that they know will be required in practice: change management, moving equipment, lifecycle of equipment, missing values, measurement values, units of measure and so on.
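The '2D modeling' step listed under Session 3, reifying a statement so that units of measure, accuracy or versions can be attached to it, can be sketched with standard RDF reification in rdflib, as below. The plant, property and unit names are invented placeholders; a dedicated units vocabulary such as QUDT could equally be used.

```python
# Sketch of "2D modeling": reifying a measurement statement so that extra
# facts (here a unit of measure and an accuracy) can be attached to the
# statement itself. Plant, property and unit identifiers are placeholders.
from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/plant#")

g = Graph()
g.bind("ex", EX)

# The base (1D) statement: pump P-101 has a flow rate of 42.5.
g.add((EX["P-101"], EX.flowRate, Literal(42.5, datatype=XSD.double)))

# Reify it so we can say more about the statement itself.
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX["P-101"]))
g.add((stmt, RDF.predicate, EX.flowRate))
g.add((stmt, RDF.object, Literal(42.5, datatype=XSD.double)))
g.add((stmt, EX.unitOfMeasure, Literal("m3/h")))   # the "second dimension"
g.add((stmt, EX.accuracy, Literal(0.1, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```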
APA, Harvard, Vancouver, ISO, and other styles
50

Thiery, Florian. "Irische ᚑᚌᚆᚐᚋ Steine im Wikimedia Universum." February 11, 2021. https://doi.org/10.5281/zenodo.4568093.

Full text
Abstract:
As part of the Fellow-Programm Freies Wissen, I would like to create a semantically described, Linked Open Data (LOD) collection of Irish Ogham stones that is freely accessible to all in the spirit of Knowledge Equity. This collection is built on existing published research and can thus serve as another important research tool in the field of early medieval inscriptions. On private journeys through Ireland, especially in the western part of the green isle, in the counties of Kerry and Cork, I have come across a mysterious script and stones as its original inscription carriers in various places. After initial research, these turned out to be Ogham stones with an early medieval Ogham script, one of Ireland's most remarkable national treasures. Ogham stones were placed in Ireland and the western part of Britain between the 4th and 9th century. The inscriptions carved on the stones show in particular kinship or tribal relationships and so may have served as gravestones or area demarcations. They are an important source for historians, but also for linguists and archaeologists. In order to bring this rich treasure of a small, manageable corpus of inscriptions and stones as free knowledge to a large research community, the idea of the Ogi-Ogham Project was born. This idea was taken up with friends in a spare-time working group, the Research Squirrel Engineers. This has already resulted in the first modelling and publications of stones after Macálister in Wikidata. The semantic modelling will be done in two ways. On the one hand, the data (stones, sites, words, people, etc.) will be stored in Wikidata in order to place the data in the Linked Data Cloud and to offer the community the opportunity to participate in free knowledge in the field of Ogham inscriptions. This can be done, for example, by adding images of Ogham stones to Wikimedia Commons and by extending and translating the explanatory Wikipedia pages. On the other hand, the data will be stored in a separate Ogham ontology, made available via a SPARQL endpoint and linked to the stones available in Wikidata. This enables a deeper semantic modelling of the Ogham stones and their inscriptions and can thus contribute to an open and free scientific discourse. In addition, the Ogham stones will be made searchable on Wikidata and other triplestores in a community-friendly web platform. Filter options for specific topics, such as words used, material or persons, as well as for geographically definable areas, should be possible. Furthermore, integration into free GIS software is planned so that scientists can carry out further analyses in their own software environment.
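Once the stones are modelled in Wikidata, they can be retrieved from its public SPARQL endpoint; the sketch below shows the general shape of such a query with SPARQLWrapper. The endpoint and the properties P31 (instance of) and P625 (coordinate location) are real Wikidata terms, but the class QID is a placeholder, since the project's actual items and classes are not given in this abstract.

```python
# Sketch of querying the Wikidata SPARQL endpoint for items of a given class
# together with their coordinates. P31/P625 are real Wikidata properties;
# wd:Q000000 is a placeholder for whatever class the Ogham items use.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.agent = "ogham-demo/0.1 (example contact)"  # Wikidata asks for a UA string

sparql.setQuery("""
SELECT ?stone ?stoneLabel ?coords WHERE {
  ?stone wdt:P31 wd:Q000000 .        # placeholder class QID
  OPTIONAL { ?stone wdt:P625 ?coords . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["stoneLabel"]["value"], row.get("coords", {}).get("value", ""))
```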
APA, Harvard, Vancouver, ISO, and other styles