Contents

Journal articles
Dissertations / Theses
Books
Book chapters
Conference papers

Academic literature on the topic 'Oracle RDBMS'

Author: Grafiati

Published: 4 June 2021

Last updated: 1 February 2022

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Oracle RDBMS.'

Journal articles on the topic "Oracle RDBMS"

1

Jeffry, Jeffry. "Analisis Kinerja Web Server pada SIM Manajemen Diklat Poltekpel Sorong Menggunakan RDBMS MySQL dan MariaDB." Journal of System and Computer Engineering (JSCE) 1, no. 1 (July 23, 2020): 12–20. http://dx.doi.org/10.47650/jsce.v1i1.77.

Abstract:

Information technology and data are growing rapidly in the current era of big data. The database management system has become a key component for controlling the flow of data. This study compares the performance of a web server that uses two different open source RDBMSs, MySQL and MariaDB. Testing was carried out on Oracle Virtual Machine VirtualBox using ApacheBench to measure web server performance for the training management information system (SIM Manajemen Diklat) of Poltekpel Sorong. The experimental results show that the web server using MySQL tends to perform fairly stably when the number of concurrent web access requests does not exceed 300: at 100, 200, and 300 requests the measured values were 7.764/ms, 16.386/ms, and 30.025/ms respectively. However, above 300 concurrent web access requests, MariaDB showed better performance. This is shown at 400 and 500 concurrent requests to the web server, where the response time was faster than with MySQL, which recorded 51.877/ms and 54.702/ms respectively, whereas MariaDB at 100, 200, 300, 400, and 500 concurrent requests recorded 14.213/ms, 25.642/ms, 40.831/ms, 48.021/ms, and 51.630/ms respectively.

2

Murti, Darlis Heru, Yudhi Purwananto, and M. Rifqi Febrianto. "QUERY BUILDER PADA RDBMS ORACLE MENGGUNAKAN XML DAN ACTIVEX BERBASIS WEB." JUTI: Jurnal Ilmiah Teknologi Informasi 3, no. 1 (January 1, 2004): 1. http://dx.doi.org/10.12962/j24068535.v3i1.a124.

3

Possenti, Luigi, Lara Savini, Annamaria Conte, Nicola D'Alterio, Maria Luisa Danzetta, Alessio Di Lorenzo, Maria Nardoia, Paolo Migliaccio, Susanna Tora, and Paolo Dalla Villa. "A New Information System for the Management of Non-Epidemic Veterinary Emergencies." Animals 10, no. 6 (June 5, 2020): 983. http://dx.doi.org/10.3390/ani10060983.

Abstract:

The Italian National Veterinary Services, public health professionals, and policy makers are asked to participate at different levels in the decision-making process for the management of non-epidemic emergencies. A decision support system offering the different administrative and operational emergency management levels with a spatial and decisional tool to be used in the case of natural disasters is still missing at the national level. Within this context, the Italian General Directorate for Animal Health of the Ministry of Health funded a research project for the implementation of a new Veterinary Information System for Non-Epidemic Emergencies (SIVENE), an innovative real-time decision support tool for emergency response in a disaster management scenario. SIVENE was developed according to a multi-layer architecture with four integrated components: the database layer, which was implemented by an RDBMS Oracle 11g; the ReST service layer, which was created using J2EE, Spring, and MyBatis technologies; the web application (business framework and user interface), which was developed in Angular4 framework using TypeScript language; and the web Geographic Information Systems (GIS), which was realized through the implementation of a geodatabase in Oracle RDBMS 11g. This system allows us to build up and dynamically create a set of dedicated checklists to be used in the field when gathering the information needed for the management of non-epidemic emergencies; employ the application on mobile devices, such as tablets and smartphones; and use the web GIS to manage and visualize data of veterinary interest and territorial maps of risk and damage.

4

Basu, Aniruddha. "Post Boost Track Processing Using Conventional DBMS Software." Defence Science Journal 66, no. 2 (March 23, 2016): 130. http://dx.doi.org/10.14429/dsj.66.9244.

Abstract:

The design of a traditional air defence command control system is very challenging and has relied on basic methodologies. Traditional design is associated with unstructured and uncorrelated data and requires huge numbers of lines of code using a hard disk drive (HDD) in the system. Hence an attempt was made at a better, simplified database management system (DBMS) software data access methodology, which processed the incoming airborne data messages in an RDBMS database to achieve full automation in real time. The transaction is accomplished through the SQL pass-through method from the host decision-making system into the database. An algorithm for track identification during midcourse track separation was undertaken for prototype development on the DBMS data access methodology. In this methodology, an Oracle C++ call interface embedded query call was used from the host interface system. The purpose of this development was to compare online process timing between HDD and SSD using a commercial database, and to evaluate the performance of dynamic processing in the RDBMS database for identification of the target vehicle and booster after separation. The experimental results show improved performance of the proposed methodology, on which futuristic command control systems can rely.

5

Thomas Mason, Robert. "Changing Paradigms of Technical Skills for Data Engineers." Issues in Informing Science and Information Technology 15 (2018): 035–42. http://dx.doi.org/10.28945/4033.

Abstract:

Aim/Purpose: This paper investigates the changing paradigms for technical skills that are needed by Data Engineers in 2018. Background: A decade ago, data engineers needed technical skills for Relational Database Management Systems (RDBMS), such as Oracle and Microsoft SQL Server. With the advent of Hadoop and NoSQL Databases in recent years, Data Engineers require new skills to support the large distributed datastores (Big Data) that currently exist. Job demand for Data Scientists and Data Engineers has increased over the last five years. Methodology: This research methodology leveraged the Pig programming language that used MapReduce software located on the Amazon Web Services (AWS) Cloud. Data was collected from 100 Indeed.com job advertisements during July of 2017 and then was uploaded to the AWS Cloud. Using MapReduce, phrases/words were counted and then sorted. The sorted phrase / word counts were then leveraged to create the list of the 20 top skills needed by a Data Engineer based on the job advertisements. This list was compared to the 20 top skills for a Data Engineer presented by Stitch that surveyed 6,500 Data Engineers in 2016. Contribution: This paper presents a list of the 20 top technical skills required by a Data Engineer.

6

Cahyanugraha, Ervin Adhi, R. Rizal Isnanto, and Ike Pertiwi Windasari. "Desain dan Implementasi Sistem Online Gudang Pada PT. PLN (Persero) Distribusi Regional Jawa Tengah dan D.I Yogyakarta." Jurnal Teknologi dan Sistem Komputer 3, no. 1 (January 30, 2015): 154–60. http://dx.doi.org/10.14710/jtsiskom.3.1.2015.154-160.

Abstract:

An information system that can provide complete, accurate information in an integrated way, with the ability to reach all people in the system, is very important. These criteria are not fulfilled by the warehouse management system at PT. PLN (Persero), Distribution of Central Java and Special Region of Yogyakarta. Some districts are not reached by the information system technology; some of them still use Microsoft Office Excel to manage the goods in the warehouse. As a result, communication reaching the rayon level is poor, rayon and area offices find it hard to exchange information, and transactions between rayon and area are slow. This hampers the procurement of goods and has a direct impact on unfinished projects. Therefore, an online warehouse system can address the flaws of the old system. The Warehouse Online System is an ASP.Net-based web application. The development process uses Microsoft Visual Studio as the development tool. An information system is not separable from the database that stores its data; the Warehouse Online System uses Oracle as the Relational Database Management System (RDBMS). The Warehouse Online System conforms to the Standard Operational Procedure (SOP) of PT. PLN (Persero). Through the Warehouse Online System, the distributors from the area or district can be integrated to communicate, to carry out transactions on the material in the warehouse, and to obtain user access based on their tasks and authority. Based on Black Box testing, the functions of the Warehouse Online System behave as expected. Maintenance is still needed to improve the system in the future.

7

Anderson, Richard, Gopalan Arun, and Richard Frank. "Oracle Rdb's record caching model." ACM SIGMOD Record 27, no. 2 (June 1998): 526–27. http://dx.doi.org/10.1145/276305.276365.

8

Savini, L., C. Ippoliti, I. Di Lorenzo, and Anna Amaria Conte. "Base de données sur Internet et système d'information géographique pour appuyer Medreonet." Revue d’élevage et de médecine vétérinaire des pays tropicaux 62, no. 2-4 (February 1, 2009): 162. http://dx.doi.org/10.19182/remvt.10070.

Abstract:

The main objective of Medreonet is to share and exchange data, expertise, experiences and information on bluetongue (BT), African horse sickness (AHS) and epizootic haemorrhagic disease (EHD). In this context the web-based database and geographical information system (GIS) application is the most suitable tool to provide a friendly environment that is easy to use by the different actors involved in the project. The web-based database and GIS application has been developed using ESRI software (release 9.0) (ArcIMS, ArcGIS desktop, ArcSDE), Java and Active Server Pages (ASP). Users can access the public web-GIS through a generic Internet browser and the information required (maps and data) are published by ArcIMS using web server technology. ArcSDE and an Oracle relational database management system (release 8i) (RDBMS) are used to store and manage spatial and alphanumerical data. The authorized users can input new information and data on their geographical area of competence directly online, using ASP and a web interface. The accuracy of the data entered into the information system (e.g. missing values, duplicates, incorrect data format, etc.) is guaranteed by automatic check procedures that operate during the updating of the centralized database. The database was designed to store all the epidemiological data deemed relevant by the experts and all the scientific results, when available, produced during the project. In particular, the data collected cover three main sets of information which are displayed and spread through an interactive, dynamic mapping system: – outbreak distribution, i.e. the geographical distribution of the disease by year and serotype at the regional level in the European Union (EU) and Mediterranean countries for BT, AHS, and EHD, respectively; – serological surveillance results, i.e. geographical distribution of the true and apparent prevalence of infection based on the analyses of BT serological surveillance data; – entomological distribution, i.e. geographical distribution of nine vector species by year and month, number of catch sites, number of catches, vector and maximum number of midges at the regional level in the EU and Mediterranean countries. Medreonet database and GIS application fulfill all the requirements stipulated in the project; moreover the system is fully scalable and may adapt to future demands.

9

Kusumo, Dana Sulistiyo, Moch Arief Bijaksana, and Dhinta Darmantoro. "DATA MINING DENGAN ALGORITMA APRIORI PADA RDBMS ORACLE." TEKTRIKA - Jurnal Penelitian dan Pengembangan Telekomunikasi, Kendali, Komputer, Elektrik, dan Elektronika 8, no. 1 (September 20, 2016). http://dx.doi.org/10.25124/tektrika.v8i1.215.

Abstract:

Data mining is the process of analyzing data with software to discover patterns and rules in data sets. Data mining can analyze large volumes of data to discover knowledge that supports decision making. This study discusses association rules as one of the data mining functions, implemented using the Apriori algorithm. It also analyzes two techniques for support counting during candidate generation in the Apriori algorithm, namely K-way and 2 Group-By, on three sample datasets with transaction id and item attributes. The study shows that support counting during candidate generation is the bottleneck of the Apriori algorithm, so improvements to the algorithm focus on candidate generation and on the effectiveness of the algorithm. The study was carried out on Oracle RDBMS, using the TKPROF tool to measure query performance based on I/O operations during support counting in candidate generation. The results show that the K-way support counting method performs better than Two Group-By. Keywords: data mining, association rule, Apriori algorithm, candidate generation, K-way, 2 Group-By
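
As a rough illustration of the support counting step that the abstract above identifies as the bottleneck, the following Python sketch generates candidate itemsets and counts their support in memory. It is not the paper's method: the K-way and 2 Group-By techniques are SQL formulations run on Oracle, and the function names and sample transactions here are assumptions made purely for illustration.

```python
from itertools import combinations


def count_support(transactions, candidates):
    """Return, for each candidate itemset, the number of transactions containing it."""
    counts = {candidate: 0 for candidate in candidates}
    for items in transactions.values():
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def generate_candidates(frequent, k):
    """Build k-item candidates whose (k-1)-item subsets are all frequent."""
    items = sorted(set().union(*frequent)) if frequent else []
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    }


def frequent_only(counts, min_support):
    """Keep only the itemsets that meet the minimum support threshold."""
    return {itemset for itemset, n in counts.items() if n >= min_support}


# Hypothetical sample data: transaction id -> set of items.
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "butter", "milk"},
    3: {"butter", "milk"},
}

singles = {frozenset([item]) for items in transactions.values() for item in items}
frequent_1 = frequent_only(count_support(transactions, singles), min_support=2)
pair_counts = count_support(transactions, generate_candidates(frequent_1, 2))
frequent_2 = frequent_only(pair_counts, min_support=2)
```

The nested loop in `count_support` scans every transaction for every candidate, which suggests why this step dominates the cost and why the study compares alternative ways of expressing it as database queries.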

Dissertations / Theses on the topic "Oracle RDBMS"

1

Arvidsson, Andreas, and Jörgen Bygdemark. "JÄMFÖRELSE MELLAN ORACLE RDBMS, ORACLE NOSQL OCH MONGODB." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163179.

Abstract:

Databases are present everywhere in our modern society and the amount of data that has to be stored is constantly increasing, which means that it’s now more important than ever to be able to handle massive data sets effectively. NoSQL databases were developed to solve this problem by efficiently storing large amounts of data and enabling fast access to that data. Since NoSQL databases only became popular within the last ten years, they haven’t been as well researched as relational databases. An in-depth evaluation is carried out on six distinct features, where one part is comparative performance tests. The other features are: scalability, consistency, availability, durability and reliability. MongoDB and Oracle NoSQL are the NoSQL databases used, and together with Oracle RDBMS as the relational database they make up the basis for a comparative study of the above-mentioned features. The results showed that there are big differences between how data is handled in NoSQL compared to relational databases that will affect the choice of database, e.g. that NoSQL tends to prioritize that clients can reach the database over non-contradictory data, and lowers the demands on transaction management to increase performance and storage capacity. Furthermore, the performance tests showed that both NoSQL databases performed better than the relational database regardless of the data set size. MongoDB was clearly the fastest on read operations, while Oracle NoSQL performed write operations the fastest most of the time. Both NoSQL databases are impacted less by a growing data set than the relational database for both read and write operations.

2

Tsai, Chongren, and 蔡崇仁. "A Replication-based Application Model for Oracle 7 RDBMS and Lotus Notes Document-based Server." Thesis, 1997. http://ndltd.ncl.edu.tw/handle/99554665681047175919.

Abstract:

Master's thesis
National Taiwan Institute of Technology
Graduate Institute of Engineering Technology
Academic year 85
Distributed applications are very widely used in client/server environments. The replication technique is a key issue in making data sharable among different database servers. This research discusses some replication issues covering the relational database model, the document-based database model, and homogeneous and heterogeneous database servers. Their differences in data transfer behavior are explored to show the critical points for developing distributed database applications. To show our model's availability, we built a simulation environment for demonstrating two-way replication between Oracle 7 and Lotus Notes 4.5. In this model, NotesPump Server acts as a coordinator of data replication. We built a Sales Order Management System on Oracle 7 and a Sales Decision Support System on Lotus Notes. After that, Pump performs its scheduled replication task between those two systems. On the Oracle side, summary data about product sales and employees is pulled out and transferred to the Notes side by Pump Server. Managers on the Notes client side use these data to do some value-added activities, and make Pump carry the updated data back to Oracle's sales database.

3

Mogotlane, Kgotatso Desmond. "Semantic knowledge extraction from relational databases." Thesis, 2014. http://hdl.handle.net/10352/337.

Abstract:

M. Tech. (Information Technology, Department of Information and Communications Technology, Faculty of Applied and Computer Sciences), Vaal University of Technology
One of the main research topics in the Semantic Web is the semantic extraction of knowledge stored in relational databases through ontologies. This is because ontologies are core components of the Semantic Web. Therefore, several tools, algorithms and frameworks are being developed to enable the automatic conversion of relational databases into ontologies. Ontologies produced with these tools, algorithms and frameworks need to be valid and competent for them to be useful in Semantic Web applications within the target knowledge domains. However, the main challenge is that many existing automatic ontology construction tools, algorithms, and frameworks fail to address the issue of ontology verification and ontology competency evaluation. This study investigates possible solutions to these challenges. The study began with a literature review in the Semantic Web field. The review led to the conceptualisation of a framework for semantic knowledge extraction to deal with the abovementioned challenges. The proposed framework had to be evaluated in a real-life knowledge domain. Therefore, a knowledge domain was chosen as a case study. The data was collected and the business rules of the domain analysed to develop a relational data model. The data model was further implemented into a test relational database using Oracle RDBMS. Thereafter, Protégé plugins were applied to automatically construct ontologies from the relational database. The resulting ontologies are further validated to match their structures against existing conceptual database-to-ontology mapping principles. The matching results show the performance and accuracy of Protégé plugins in automatically converting relational databases into ontologies. Finally, the study evaluated the resulting ontologies against the requirements of the knowledge domain.
The requirements of the domain are modelled with competency questions (CQs) and mapped to the ontology through the design, execution and analysis of SPARQL queries against users’ views of CQ answers. Experiments show that, although users have different views of the answers to CQs, the execution of the SPARQL translations of CQs against the ontology does produce output instances that satisfy users’ expectations. This indicates that an ontology generated by Protégé plugins from a relational database embodies the domain and semantic features needed to be useful in Semantic Web applications.

Books on the topic "Oracle RDBMS"

1

Corporation, Oracle. Oracle RDBMS release notes: Version 5.1.. Belmont, Calif: Oracle Corporation, 1986.

2

Corporation, Oracle. Oracle RDBMS database administrator's guide: Version 6.0. Belmont, Calif: Oracle Corporation, 1989.

3

Corporation, Oracle. Oracle RDBMS performance tuning guide: Version 6.0. Belmont, Calif: Oracle Corporation, 1990.

4

Corporation, Oracle. Oracle RDBMS database administrator's guide: Version 6.0. Belmont, Calif: Oracle Corporation, 1990.

5

Corporation, Oracle. Oracle RDBMS utilities user's guide: Version 6.0. Belmont, Calif: Oracle Corporation, 1988.

6

Corporation, Oracle. ORACLE RDBMS database administrator's guide: Version 6.0. Belmont, Calif: Oracle Corp., 1989.

7

Corporation, Oracle. Oracle RDBMS error messages and codes manual: Version 6.0. Belmont, Calif: Oracle Corporation, 1990.

Book chapters on the topic "Oracle RDBMS"

1

Shaw, Steve, and Martin Bach. "RDBMS Installation and Configuration." In Pro Oracle Database 11g RAC on Linux, 505–57. Berkeley, CA: Apress, 2010. http://dx.doi.org/10.1007/978-1-4302-2959-9_10.

2

Alapati, Sam R. "Installing the Oracle9i RDBMS." In Expert Oracle9i Database Administration, 107–37. Berkeley, CA: Apress, 2003. http://dx.doi.org/10.1007/978-1-4302-0773-3_4.

3

Wycislik, Lukasz. "Storage Efficiency of LOB Structures for Free RDBMSs on Example of PostgreSQL and Oracle Platforms." In Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation, 212–23. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-58274-0_18.

4

Saygili, Okcan Yasin. "Relational Database Management System (RDBMS)." In The Introduction to Private Cloud using Oracle Exadata and Oracle Database, 5–8. CRC Press, 2020. http://dx.doi.org/10.1201/9780429020902-2.

5

Naik, Shefali Trushit. "Accessing Data From Multiple Heterogeneous Distributed Database Systems." In Applying Integration Techniques and Methods in Distributed Systems and Technologies, 192–219. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-8295-3.ch008.

Abstract:

This chapter describes the method to retrieve data from multiple heterogeneous distributed relational database management systems such as MySQL, PostgreSQL, MS SQL Server, MS Access, etc. into Oracle RDBMS using Oracle's Heterogeneous Gateway Services. The complete process starting from downloading and installation of required software, creation of data source names using open database connectivity, modification of system parameter files, checking connections, creation of synonyms for tables of remote databases into oracle, creation of database links and accessing data from non-oracle databases using database links is explained in great detail. Apart from this, data manipulation in remote databases from Oracle and execution of PL/SQL procedures to manipulate data residing on remote databases is discussed with examples. Troubleshooting common errors during this process is also discussed.

6

Dweib, Ibrahim, and Joan Lu. "State of the Art Technology." In Advances in Data Mining and Database Management, 201–18. IGI Global, 2013. http://dx.doi.org/10.4018/978-1-4666-1975-3.ch015.

Abstract:

This chapter presents the state of the art approaches for storing and retrieving the XML documents from relational databases. Approaches are classified into schema-based mapping and schemaless-based mapping. It also discusses the solutions which are included in Database Management Systems such as SQL Server, Oracle, and DB2. The discussion addresses the issues of: rebuilding XML from RDBMS approaches, comparison of mapping approaches, and their advantages and disadvantages. The chapter concludes with the issues addressed.

Conference papers on the topic "Oracle RDBMS"

1

Rumpler, Béatrice, Mario Polo, and Benjamin Razafimandimby. "Tools and Methods for Performance Evaluation of RDBMS Applications." In ASME 1994 International Computers in Engineering Conference and Exhibition and the ASME 1994 8th Annual Database Symposium collocated with the ASME 1994 Design Technical Conferences. American Society of Mechanical Engineers, 1994. http://dx.doi.org/10.1115/edm1994-0505.

Abstract:

The goal of our research is to predict the performance of applications using ORACLE RDBMS and then to propose tools to optimize performance. The performance we are interested in is the performance as perceived by users, so we specifically study the response time of user transactions. Our method is based on measurement, and the first step was to measure performance on existing applications using ORACLE RDBMS. We have developed several software tools: a user simulator, an application generator, a workload application generator, and measurement tools to measure user transaction response time and system activity during a transaction execution. The second step consisted of data collection and analysis of the measurements. The data analysis, based on statistical methods, made it possible to extract the most influential factors and to understand how they can enhance application performance (Rumpler and Polo, 1993). We are now able to present most of these results. The last step will consist of building the rules of an expert system for configuration and tuning assistance of ORACLE RDBMS applications. We also analyse the impact of operating system (UNIX) parameters on performance, and this information will extend the capabilities of our expert system. The present paper describes this research in detail, including the tools developed, the methods used, and the results.

2

Anderson, Richard, Gopalan Arun, and Richard Frank. "Oracle Rdb's record caching model." In the 1998 ACM SIGMOD international conference. New York, New York, USA: ACM Press, 1998. http://dx.doi.org/10.1145/276304.276365.

3

"Changing Paradigms of Technical Skills for Data Engineers." In InSITE 2018: Informing Science + IT Education Conferences: La Verne California. Informing Science Institute, 2018. http://dx.doi.org/10.28945/4001.

Abstract:

Aim/Purpose: [This Proceedings paper was revised and published in the 2018 issue of the journal Issues in Informing Science and Information Technology, Volume 15] This paper investigates the new technical skills that are needed for Data Engineering. Past research is compared to new research which creates a list of the 20 top technical skills required by a Data Engineer. The growing availability of Data Engineering jobs is discussed. The research methodology describes the gathering of sample data and then the use of Pig and MapReduce on AWS (Amazon Web Services) to count occurrences of Data Engineering technical skills from 100 Indeed.com job advertisements in July, 2017. Background: A decade ago, Data Engineering relied heavily on the technology of Relational Database Management Systems (RDBMS). For example, Grisham, P., Krasner, H., and Perry D. (2006) described an Empirical Software Engineering Lab (ESEL) that introduced Relational Database concepts to students with hands-on learning that they called “Data Engineering Education with Real-World Projects.” However, as seismic improvements occurred for the processing of large distributed datasets, big data analytics has moved into the forefront of the IT industry. As a result, the definition for Data Engineering has broadened and evolved to include newer technology that supports the distributed processing of very large amounts of data (e.g. Hadoop Ecosystem and NoSQL Databases). This paper examines the technical skills that are needed to work as a Data Engineer in today’s rapidly changing technical environment. Research is presented that reviews 100 job postings for Data Engineers from Indeed (2017) during the month of July, 2017 and then ranks the technical skills in order of importance. The results are compared to earlier research by Stitch (2016) that ranked the top technical skills for Data Engineers in 2016 using LinkedIn to survey 6,500 people that identified themselves as Data Engineers.
Methodology: A sample of 100 Data Engineering job postings was collected and analyzed from Indeed during July, 2017. The job postings were pasted into a text file and then related words were grouped together to make phrases. For example, the word “data” was put into context with other related words to form phrases such as “Big Data”, “Data Architecture” and “Data Engineering”. A text editor was used for this task and the find/replace functionality of the text editor proved to be very useful for this project. After making phrases, the large text file was uploaded to the Amazon cloud (AWS) and a Pig batch job using MapReduce was leveraged to count the occurrence of phrases and words within the text file. The resulting phrases/words with occurrence counts were downloaded to a Personal Computer (PC) and then loaded into an Excel spreadsheet. Using a spreadsheet enabled the phrases/words to be sorted by occurrence count and then facilitated the filtering out of irrelevant words. Another task to prepare the data involved combining phrases or words that were synonymous. For example, the occurrence count for the acronym ELT and the occurrence count for the acronym ETL were added together to make an overall ELT/ETL occurrence count. ETL is a Data Warehousing acronym for Extracting, Transforming and Loading data. This task required knowledge of the subject area. Also, some words were counted in lower case and then the same word was also counted in mixed or upper case, thus producing two or three occurrence counts for the same word. These different counts were added together to make an overall occurrence count for the word (e.g. word occurrence counts for Python and python were added together). Finally, the Indeed occurrence counts were sorted to allow for the identification of a list of the top 20 technical skills needed by a Data Engineer. Contribution: Provides new information about the Technical Skills needed by Data Engineers.
Findings: Twelve of the 20 Stitch (2016) report phrases/words that are highlighted in bold above matched the technical skills mentioned in the Indeed research. I considered C, C++ and Java a match to the broader category of Programming in the Indeed data. Although the ranked order of the two lists did not match, the top five ranked technical skills for both lists are similar. The reader of this paper might consider the skills of SQL, Python, Hadoop/HDFS to be very important technical skills for a Data Engineer. Although the programming language R is very popular with Data Scientists, it did not make the top 20 skills for Data Engineering; it was in the overall list from Indeed. The R programming language is oriented towards analytical processing (e.g. used by Data Scientists), whereas the Python language is a scripting and object-oriented language that facilitates the creation of Data Pipelines (e.g. used by Data Engineers). Because the data was collected one year apart and from very different data sources, the timing of the data collection and the different data sources could account for some of the differences in the ranked lists. It is worth noting that the Indeed research ranked list introduced the technical skills of Design Skills, Spark, AWS (Amazon Web Services), Data Modeling, Kafka, Scala, Cloud Computing, Data Pipelines, APIs and AWS Redshift Data Warehousing to the top 20 ranked technical skills list. The Stitch (2016) report skills that did not have matches in the Indeed (2017) sample data were Linux, Databases, MySQL, Business Intelligence, Oracle, Microsoft SQL Server, Data Analysis and Unix. Although many of these Stitch top 20 technical skills were on the Indeed list, they did not make the top 20 ranked technical skills. Recommendations for Practitioners: Some of the skills needed for Database Technologies are transferable to Data Engineering.
Recommendation for Researchers: None. Impact on Society: There is not much peer-reviewed literature on the subject of Data Engineering; this paper will add new information to the subject area. Future Research: I'm developing a Specialization in Data Engineering for the MS in Data Science degree at our university.
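
The counting and merging steps described in the methodology above can be sketched in a few lines of Python. This is only an illustrative stand-in, assuming a plain in-memory text: the paper itself used a Pig batch job with MapReduce on AWS followed by manual work in a spreadsheet. The function name `count_phrases`, the synonym table, and the sample text are hypothetical.

```python
import re
from collections import Counter


def count_phrases(text, phrases, synonyms=None):
    """Count case-insensitive phrase occurrences, merging synonyms into one key."""
    synonyms = synonyms or {}
    lowered = text.lower()  # merges counts such as "Python" and "python"
    counts = Counter()
    for phrase in phrases:
        needle = phrase.lower()
        pattern = r"\b" + re.escape(needle) + r"\b"
        # Synonymous phrases (e.g. ELT and ETL) are added to a shared key.
        counts[synonyms.get(needle, needle)] += len(re.findall(pattern, lowered))
    return counts


# Hypothetical sample standing in for the pasted job postings.
sample_text = "Python and python scripts. ETL or ELT pipelines. Big Data, big data."
skill_counts = count_phrases(
    sample_text,
    phrases=["python", "etl", "elt", "big data"],
    synonyms={"elt": "etl"},
)

# Rank by occurrence count and keep at most the top 20 entries.
top_skills = skill_counts.most_common(20)
```

Lowercasing the whole text before matching handles the mixed-case duplicates the abstract mentions, while the synonym table plays the role of the manual step that required subject knowledge.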
