Wednesday, April 10, 2019

Data Persistence


Information System

At the most basic level, an information system (IS) is a set of components that work together to manage data processing and storage. Its role is to support the key aspects of running an organization, such as communication, record-keeping, decision making, data analysis and more. Companies use this information to improve their business operations, make strategic decisions and gain a competitive edge.


Information systems typically include a combination of software, hardware and telecommunication networks. For example, an organization may use customer relationship management systems to gain a better understanding of its target audience, acquire new customers and retain existing clients. This technology allows companies to gather and analyse sales activity data, define the exact target group of a marketing campaign and measure customer satisfaction.

The Benefits of Information Systems


Modern technology can significantly boost your company's performance and productivity. Information systems are no exception. Organizations worldwide rely on them to research and develop new ways to generate revenue, engage customers and streamline time-consuming tasks.

With an information system, businesses can save time and money while making smarter decisions. A company's internal departments, such as marketing and sales, can communicate better and share information more easily.
Since this technology is automated and uses complex algorithms, it reduces human error. Furthermore, employees can focus on the core aspects of a business rather than spending hours collecting data, filling out paperwork and doing manual analysis.

Thanks to modern information systems, team members can access massive amounts of data from one platform. For example, they can gather and process information from different sources, such as vendors, customers, warehouses and sales agents, with a few mouse clicks.

Uses and Applications

There are different types of information systems and each has a different role. Business intelligence (BI) systems, for instance, can turn data into valuable insights.

This kind of technology allows for faster, more accurate reporting, better business decisions and more efficient resource allocation. Another major benefit is data visualization, which enables analysts to interpret large amounts of information, predict future events and find patterns in historical data.

Data

Data can be defined as a representation of facts, concepts, or instructions in a formalized manner that is suitable for communication, interpretation, or processing by humans or electronic machines.
Data is represented with the help of characters such as letters (A-Z, a-z), digits (0-9) or special characters (+, -, /, *, <, >, = etc.).


Data Processing Cycle


Data processing is the restructuring or reordering of data by people or machines to increase its usefulness and add value for a particular purpose. Data processing consists of three basic steps: input, processing, and output. These three steps constitute the data processing cycle.











  • Input − In this step, the input data is prepared in some convenient form for processing. The form depends on the processing machine. For example, when electronic computers are used, the input data can be recorded on any one of several types of input media, such as magnetic disks and tapes.

  • Processing − In this step, the input data is changed to produce data in a more useful form. For example, pay-checks can be calculated from the time cards, or a summary of sales for the month can be calculated from the sales orders.

  • Output − At this stage, the result of the preceding processing step is collected. The particular form of the output data depends on the use of the data. For example, output data may be pay-checks for employees.
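
The three steps can be sketched in a few lines of Java. This is only an illustration of the cycle using the pay-check example; the class name PayrollCycle, the employee names and the hourly rate are assumptions made up for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the input -> processing -> output cycle.
class PayrollCycle {

    // Processing: turn hours worked (the input) into pay amounts.
    static Map<String, Double> process(Map<String, Double> hoursByEmployee, double hourlyRate) {
        Map<String, Double> payByEmployee = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : hoursByEmployee.entrySet()) {
            payByEmployee.put(entry.getKey(), entry.getValue() * hourlyRate);
        }
        return payByEmployee;
    }

    public static void main(String[] args) {
        // Input: time-card data prepared in a form the program can read.
        Map<String, Double> timeCards = new LinkedHashMap<>();
        timeCards.put("Alice", 40.0);
        timeCards.put("Bob", 32.5);

        // Processing: calculate the pay-checks from the time cards.
        Map<String, Double> payChecks = process(timeCards, 20.0);

        // Output: the results are collected and presented.
        for (Map.Entry<String, Double> entry : payChecks.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```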


Database

A database is a collection of information that is organized so that it can be easily accessed, managed and updated.
Data is organized into rows, columns and tables, and it is indexed to make it easier to find relevant information. Data gets updated, expanded and deleted as new information is added. Databases process workloads to create and update themselves, querying the data they contain and running applications against it.

Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogues and inventories, and customer profiles.








Database server


Database server is the term used to refer to the back-end system of a database application using client/server architecture. The back-end, sometimes called database server, performs tasks such as data analysis, storage, data manipulation, archiving, and other non-user specific tasks.

Some of the well-known DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, SAP and DB2. In short, a database is a place to store and retrieve organised data.



Database management system


A database management system (DBMS) is system software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data.


A DBMS makes it possible for end users to create, read, update and delete data in a database. The DBMS essentially serves as an interface between the database and end users or application programs, ensuring that data is consistently organized and remains easily accessible.





File vs Database


The difference between a file and a database is that a data file is a collection of related records stored on a storage medium such as a hard disk or optical disc, while a database is a collection of data organized in a manner that allows access, retrieval, and use of that data. Data is a collection of unprocessed items, which can include text, numbers, images, audio, and video.

File

Database File Example
A data file is a collection of related records stored on a storage medium such as a hard disk or optical disc. A Student file at a school might consist of thousands of individual student records. Each student record in the file contains the same fields. Each field, however, contains different data. The image shows a small sample Student file that contains four student records, each with eleven fields. A database includes a group of related data files.

Database

A database is a collection of data organized in a manner that allows access, retrieval, and use of that data. Data is a collection of unprocessed items, which can include text, numbers, images, audio, and video. Information is processed data; that is, it is organized, meaningful, and useful.
Computers process data in a database into information. A database at a school, for example, contains data about students, e.g., student data, class data, etc. A computer at the school processes new student data and then sends advising appointment and ID card information to the printers.

Arrangements of data

A table is an arrangement of data in rows and columns, or possibly in a more complex structure. Tables are widely used in communication, research, and data analysis. ... Further, tables differ significantly in variety, structure, flexibility, notation, representation and use.



Types of Database Management Systems



Database Management Systems

A database is a collection of data or records. Database management systems are designed to work with data. A database management system (DBMS) is a software system that uses a standard method to store and organize data. The data can be added, updated, deleted, or traversed using various standard algorithms and queries. 


There are several types of database management systems. Here is a list of seven common database management systems:
  1. Hierarchical databases
  2. Network databases
  3. Relational databases
  4. Object-oriented databases
  5. Graph databases
  6. ER model databases
  7. Document databases

Hierarchical Databases




In the hierarchical database management system (hierarchical DBMS) model, data is stored as nodes in a parent-child relationship. Besides the actual data, records in a hierarchical database also contain information about their parent/child relationships.
In a hierarchical database model, data is organized into a tree-like structure. The data is stored as a collection of fields, where each field contains only one value. Records are linked to each other in a parent-child relationship: each child record has only one parent, while a parent can have multiple children.
To retrieve a field’s data, we need to traverse the tree until the record is found.
The hierarchical database structure was developed by IBM in the 1960s. While the hierarchical structure is simple, it is inflexible because of the one-to-many parent-child relationship. Hierarchical databases are widely used to build high-performance, high-availability applications, usually in the banking and telecommunications industries.
The IBM Information Management System (IMS) and the Windows Registry are two popular examples of hierarchical databases.

Advantage
A hierarchical database can be accessed and updated rapidly because the structure is tree-like and the relationships between records are defined in advance. This feature is a two-edged sword.

Disadvantage
The drawback of this type of database structure is that each child in the tree may have only one parent, and relationships or linkages between children are not permitted, even if they make sense from a logical standpoint. Hierarchical databases are also rigid in their design: adding a new field or record requires that the entire database be redefined.
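
The parent-child structure and the tree traversal described in this section can be sketched in Java as follows. This is a minimal in-memory illustration, not how IMS or any real hierarchical DBMS is implemented; the class HierarchyNode and the sample names are assumptions made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type: exactly one parent, any number of children.
class HierarchyNode {
    final String name;
    final HierarchyNode parent;                       // null for the root
    final List<HierarchyNode> children = new ArrayList<>();

    HierarchyNode(String name, HierarchyNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);                // link the child under its single parent
        }
    }

    // Depth-first traversal from this node down through its children.
    HierarchyNode find(String target) {
        if (name.equals(target)) {
            return this;
        }
        for (HierarchyNode child : children) {
            HierarchyNode hit = child.find(target);
            if (hit != null) {
                return hit;
            }
        }
        return null;                                  // not found in this subtree
    }

    // Path from the root to this node, built by following parent links.
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }

    public static void main(String[] args) {
        HierarchyNode root = new HierarchyNode("Company", null);
        HierarchyNode sales = new HierarchyNode("Sales", root);
        new HierarchyNode("Alice", sales);
        new HierarchyNode("Engineering", root);
        System.out.println(root.find("Alice").path());
    }
}
```

Because every node stores a single parent reference, a record such as "Alice" cannot belong to two departments at once, which is the inflexibility described above.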

Network Databases



Network database management systems (network DBMSs) use a network structure to create relationships between entities. Network databases are mainly used on large digital computers. They resemble hierarchical databases, but unlike a hierarchical database, where a node can have only one parent, a network node can have relationships with multiple entities. A network database looks more like a cobweb or an interconnected network of records.
In network databases, children are called members and parents are called owners. The difference is that each child, or member, can have more than one parent.

Apart from this, the network data model is similar to the hierarchical data model. Data in a network database is organized in many-to-many relationships.
The network database structure was invented by Charles Bachman. Some of the popular network databases are Integrated Data Store (IDS), IDMS (Integrated Database Management System), Raima Database Manager, TurboIMAGE, and Univac DMS-1100.

Relational Databases



In relational database management systems (RDBMSs), the relationship between data is relational and data is stored in tabular form, in columns and rows. Each column of a table represents an attribute and each row represents a record. Each field in a table holds a single data value.
Structured Query Language (SQL) is the language used to query an RDBMS, including inserting, updating, deleting, and searching records.
Relational databases work on the principle that each table has a key field that uniquely identifies each row, and that these key fields can be used to connect one table of data to another.
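
The idea of connecting tables through key fields can be illustrated without a real database. The sketch below keeps two hypothetical tables in memory (STUDENT keyed by ID, and ENROLLMENT rows that carry a student ID) and connects them the way a SQL inner join would; all class, table and column names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyJoinSketch {

    // One row of a hypothetical ENROLLMENT table.
    static class Enrollment {
        final int studentId;   // key field pointing at STUDENT.ID
        final String course;

        Enrollment(int studentId, String course) {
            this.studentId = studentId;
            this.course = course;
        }
    }

    // Connects the two tables through the key field, like a SQL inner join.
    static List<String> join(Map<Integer, String> studentsById, List<Enrollment> enrollments) {
        List<String> rows = new ArrayList<>();
        for (Enrollment e : enrollments) {
            String name = studentsById.get(e.studentId);
            if (name != null) {            // skip rows whose key matches no student
                rows.add(name + " -> " + e.course);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // STUDENT table: the map key plays the role of the unique key field.
        Map<Integer, String> students = new LinkedHashMap<>();
        students.put(111, "Asha");
        students.put(112, "Ben");

        List<Enrollment> enrollments = new ArrayList<>();
        enrollments.add(new Enrollment(111, "Databases"));
        enrollments.add(new Enrollment(112, "Networks"));
        enrollments.add(new Enrollment(111, "Algorithms"));

        for (String row : join(students, enrollments)) {
            System.out.println(row);
        }
    }
}
```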

Relational databases are the most popular and widely used databases. Some of the popular RDBMSs are Oracle, SQL Server, MySQL, SQLite, and IBM DB2.
Relational databases are popular for two major reasons:
  1. Relational databases can be used with little or no training.
  2. Database entries can be modified without respecifying the entire body.
Properties of Relational Tables
Relational tables have the following properties:
  • Values are atomic.
  • Each row is unique.
  • Column values are of the same kind.
  • The sequence of columns is insignificant.
  • The sequence of rows is insignificant.
  • Each column has a unique name.

Object-Oriented Model



The object-oriented model brings the functionality of object-oriented programming to the database. It involves more than the storage of programming language objects. Object DBMSs extend the semantics of C++ and Java to provide full-featured database programming capability while retaining native language compatibility. In other words, they add database functionality to object programming languages. This approach unifies application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modelling, and code bases are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort.
The object-oriented database paradigm is the combination of object-oriented programming language systems and persistent systems. The power of object-oriented databases comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs.


Object-oriented databases use small, reusable chunks of software called objects. The objects themselves are stored in the object-oriented database. Each object consists of two elements:
  1. A piece of data (e.g., sound, video, text, or graphics).
  2. Instructions, or software programs called methods, for what to do with the data.
Object-oriented database management systems (OODBMSs) were created in the early 1980s. Some OODBMSs were designed to work with OOP languages such as Delphi, Ruby, C++, Java, and Python. Some popular OODBMSs are TORNADO, Gemstone, ObjectStore, GBase, VBase, InterSystems Cache, Versant Object Database, ODABA, ZODB, Poet, JADE, and Informix.
Disadvantages of Object-oriented databases
Object-oriented databases have these disadvantages:
  1. Object-oriented databases are more expensive to develop.
  2. Most organizations are unwilling to abandon their existing databases and convert to object-oriented ones.
Benefits of Object-oriented databases
The benefits of object-oriented databases are compelling. The ability to mix and match reusable objects provides incredible multimedia capability.

Graph Databases



Graph databases are NoSQL databases that use a graph structure for semantic queries. The data is stored in the form of nodes, edges, and properties. In a graph database, a node represents an entity or instance such as a customer, a person, or a car. A node is equivalent to a record in a relational database system. An edge represents a relationship that connects nodes. Properties are additional information added to the nodes.
Neo4j, Azure Cosmos DB, SAP HANA, Sparksee, Oracle Spatial and Graph, OrientDB, ArangoDB, and MarkLogic are some of the popular graph databases. Graph database structure is also supported by some RDBMSs, including Oracle and SQL Server 2017 and later versions.
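
To make the node, edge and property terms concrete, here is a minimal in-memory sketch in Java. It does not use the API of Neo4j or any other product listed above; GraphSketch, Node, Edge and the "OWNS" relationship type are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory property graph; all names here are hypothetical.
class GraphSketch {

    // A node is an entity; properties hold extra information about it.
    static class Node {
        final String label;
        final Map<String, String> properties = new HashMap<>();

        Node(String label) {
            this.label = label;
        }
    }

    // An edge is a typed relationship that connects two nodes.
    static class Edge {
        final Node from;
        final String type;
        final Node to;

        Edge(Node from, String type, Node to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void connect(Node from, String type, Node to) {
        edges.add(new Edge(from, type, to));
    }

    // Follows every edge of the given type that starts at 'from'.
    List<Node> related(Node from, String type) {
        List<Node> result = new ArrayList<>();
        for (Edge edge : edges) {
            if (edge.from == from && edge.type.equals(type)) {
                result.add(edge.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node alice = new Node("Person");
        alice.properties.put("name", "Alice");
        Node car = new Node("Car");
        car.properties.put("model", "Hatchback");

        GraphSketch graph = new GraphSketch();
        graph.connect(alice, "OWNS", car);

        for (Node owned : graph.related(alice, "OWNS")) {
            System.out.println(owned.label + " " + owned.properties);
        }
    }
}
```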

ER Model Databases 






An ER model is typically implemented as a database. In a simple relational database implementation, each row of a table represents one instance of an entity type, and each field in a table represents an attribute type. In a relational database a relationship between entities is implemented by storing the primary key of one entity as a pointer or "foreign key" in the table of another entity.
The entity-relationship model was developed by Peter Chen in 1976.

Document Databases 



Document databases (document DBs) are also NoSQL databases; they store data in the form of documents. Each document represents the data, its relationships to other data elements, and the attributes of the data. Document databases store data in key-value form.
Document DBs have become popular recently due to their document storage and NoSQL properties. NoSQL data storage provides a faster mechanism to store and search documents.
Popular NoSQL databases are Hadoop/HBase, Cassandra, Hypertable, MapR, Hortonworks, Cloudera, Amazon SimpleDB, Apache Flink, IBM Informix, Elastic, MongoDB, and Azure DocumentDB.
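
As a rough picture of what "a document in key-value form" means, the sketch below builds one document as nested Java maps and reads a nested value by a dotted path. It is only an in-memory illustration and not the API of MongoDB or any other product; the field names and the helper get are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class DocumentSketch {

    // Builds one hypothetical "customer" document as nested key-value pairs.
    static Map<String, Object> customerDocument() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Colombo");
        address.put("country", "Sri Lanka");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "c-1001");
        doc.put("name", "Asha");
        doc.put("address", address);                       // embedded document
        doc.put("orders", Arrays.asList("o-1", "o-2"));    // ids of related documents
        return doc;
    }

    // Reads a nested value using a dotted path such as "address.city".
    static Object get(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;                               // path goes deeper than the data
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = customerDocument();
        System.out.println(get(doc, "address.city"));
    }
}
```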

Difference between Big Data and Data Warehouse:





1.   A data warehouse is an architecture for storing data, that is, a data repository, whereas big data is a technology for handling huge volumes of data and preparing the repository.
2.   A data warehouse accepts data from any kind of DBMS, whereas big data accepts all kinds of data, including transactional data, social media data, machine data, or any DBMS data.
3.   A data warehouse only handles structured data (relational or not), but big data can handle structured, unstructured, and semi-structured data.
4.   Big data normally uses a distributed file system to load huge amounts of data in a distributed way, but a data warehouse has no such concept.
5.   From a business point of view, because big data holds a lot of data, analytics on it can be very fruitful, and the results are more meaningful, which helps the organization make sound decisions. A data warehouse mainly helps with analytics on informed information.
6.   A data warehouse is built on a relational database, so storing and fetching data is done with ordinary SQL queries. Big data does not follow a strict database structure, so we need to use Hive or Spark SQL to view the data, for example with Hive-specific queries.
7.   100% of the data loaded into a data warehouse is used for analytics reports. Of the data loaded by Hadoop, however, at most 0.5% has been used in analytics reports so far. The rest is loaded into the system but remains unused.
8.   A data warehouse cannot handle humongous (totally unstructured) data. Big data technology (Apache Hadoop) is the only option for handling such data.
9.   In a data warehouse, fetch time grows with data volume: fetching takes little time for low volumes and a long time for huge volumes, just as in a DBMS. Big data takes a short time to fetch huge volumes (it is designed specifically for that), but takes a long time if we try to load or fetch small amounts of data in HDFS using MapReduce.

A data warehouse is mainly an architecture, not a technology. It extracts data from a variety of SQL-based data sources (mainly relational databases) and helps generate analytic reports.









DATA WAREHOUSE:

In terms of definition, a data repository that is used for analytic reports and is generated by a single process is nothing but a data warehouse.

BIG DATA:

Big Data is mainly a technology, which stands on volume, velocity, and variety of the data. Volumes define the amount of data coming from different sources, velocity refers to the speed


Components of a Database Application System

The database management system can be divided into five major components:
1.   Hardware
2.   Software
3.   Data
4.   Procedures
5.   Database Access Language

Let's have a simple diagram to see how they all fit together to form a database management system.

Hardware

When we say hardware, we mean the computer, hard disks, I/O channels for data, and any other physical component involved before any data is successfully stored in memory.
When we run Oracle or MySQL on a personal computer, the computer's hard disk, the keyboard we use to type in commands, and the computer's RAM and ROM all become part of the DBMS hardware.

Software

This is the main component, as it is the program that controls everything. The DBMS software is more like a wrapper around the physical database, which provides us with an easy-to-use interface to store, access and update data.
The DBMS software is capable of understanding the Database Access Language and interpreting it into actual database commands to execute on the DB.

Data

Data is the resource for which the DBMS was designed. The motive behind the creation of the DBMS was to store and utilise data.
A typical database holds both the data saved by the user and metadata.
Metadata is data about the data. This is information stored by the DBMS to better understand the data stored in it.
For example, when I store my name in a database, the DBMS also records when the name was stored, what its size is, and whether it is stored as related data to some other data or is independent. All this information is metadata.

 Procedures

Procedures refer to general instructions for using a database management system. This includes procedures to set up and install a DBMS, to log in and log out of the DBMS software, to manage databases, to take backups, to generate reports, etc.

 Database Access Language

Database Access Language is a simple language designed for writing commands to access, insert, update and delete data stored in any database.
A user can write commands in the Database Access Language and submit them to the DBMS, which then translates and executes them.
Users can create new databases and tables, insert data, fetch stored data, update data and delete data using the access language.

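SQL Statements : Use this for general-purpose access to your database. Useful when you are using static SQL statements at runtime. The Statement interface cannot accept parameters.

// con is an open java.sql.Connection; name and id are variables defined elsewhere
Statement stmt = con.createStatement();

// Values are concatenated into the SQL text because Statement has no parameters
stmt.executeUpdate("update STUDENT set NAME = '" + name +
        "' where ID = " + id);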

Prepared statements : Use this when you plan to use the SQL statements many times. The PreparedStatement interface accepts input parameters at runtime.

PreparedStatement pstmt = con.prepareStatement(
        "update STUDENT set NAME = ? where ID = ?");
pstmt.setString(1, "MyName");   // first ?  -> NAME
pstmt.setInt(2, 111);           // second ? -> ID

pstmt.executeUpdate();
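
One practical reason to prefer parameters over string concatenation can be shown without a database at all. The sketch below only builds the SQL text the way a Statement-based update would (in a quoted variant) and prints it; the class ConcatenationSketch and the method buildUpdate are hypothetical helpers, not part of JDBC.

```java
class ConcatenationSketch {

    // Builds the SQL text by concatenation, as code using Statement has to.
    static String buildUpdate(String name, int id) {
        return "update STUDENT set NAME = '" + name + "' where ID = " + id;
    }

    public static void main(String[] args) {
        // An ordinary value produces a well-formed statement.
        System.out.println(buildUpdate("MyName", 111));

        // A value containing a quote ends the string literal early,
        // so the resulting SQL text is malformed (or worse, altered).
        System.out.println(buildUpdate("O'Brien", 111));
    }
}
```

With a PreparedStatement the value is passed through setString instead of being pasted into the SQL text, so a quote in the name stays part of the data.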

Callable statements : Use this when you want to access database stored procedures. The CallableStatement interface can also accept runtime input parameters.

CallableStatement cstmt = con.prepareCall(
        "{call anyProcedure(?, ?, ?)}");

// Bind each ? with the matching cstmt.setXxx(...) call before executing
cstmt.execute();


Should I Or Should I Not Use ORM ?

  • If you’re going to use ORM(Object Relational Mapping), you should make your model objects as simple as possible. Be more vigilant about simplicity to make sure your model objects really are just Plain ol’ Data. Otherwise you may end up wrestling with your ORM to make sure the persistence works like you expect it to, and it’s not looking for methods and properties that aren’t actually there.
  • If you’re not going to use ORM, you should probably define DAOs or persistence and query methods to avoid coupling the model layer with the persistence layer. Otherwise you end up with SQL in your model objects and a forced dependency on your project.
  • If you know your data access patterns are generally going to be simple (like basic object retrieval) but you don’t know all of them up front, you should think about using an ORM. While ORMs can make building complex queries confusing to build and difficult to debug, an ORM can save you huge amounts of time if your queries are generally pretty simple.
  • If you know your data access pattern is going to be complex or you plan to use a lot of database-specific features, you may not want to use an ORM. While many ORMs (like Hibernate) let you access the underlying data source connection pretty easily , if you know you’re going to have to throw around a lot of custom SQL, you may not get a lot of value out of ORM to begin with because you’re constantly going to have to break out of it.
  • If it absolutely, positively has to go fast, you may not want to use ORM. The only way to be absolutely sure all your queries consistently go fast is to plan your database structure carefully, manage your data access pattern with extreme prejudice, commit to one data store, and write your own queries optimized against that data store.
Having mentioned some of the scenarios that can help you decide whether or not to go with ORM, let me also point out a few pros and cons of ORM in general.
PROS
  • Facilitates implementing domain model pattern.
  • Huge reduction in code.
  • Takes care of vendor specific code by itself.
  • Cache Management — Entities are cached in memory thereby reducing load on the DB.
CONS
  • Increased startup time due to metadata preparation (not good for desktop applications).
  • Steep learning curve.
  • Relatively hard to fine-tune and debug generated SQL.
  • Not suitable for applications without a clean domain object model.

Whether or not you should use ORM isn’t about other people’s values, or even your own. It’s about choosing the right technique for your application based on its technical requirements. Use ORM or don’t based not on personal values but on what your app needs more: control over data access, or less code to maintain.
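
For the non-ORM route, the DAO idea mentioned in the list above might look roughly like this. It is a minimal sketch with an in-memory implementation standing in for real JDBC code; Student, StudentDao and InMemoryStudentDao are hypothetical names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class DaoDemo {
    public static void main(String[] args) {
        StudentDao dao = new InMemoryStudentDao();
        dao.save(new Student(111, "MyName"));
        System.out.println(dao.findById(111).map(s -> s.name).orElse("not found"));
    }
}

// Plain model object: no SQL and no persistence code inside it.
class Student {
    final int id;
    final String name;

    Student(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// The rest of the application depends only on this interface.
interface StudentDao {
    void save(Student student);

    Optional<Student> findById(int id);
}

// In-memory stand-in; a JDBC-backed class could implement the same interface.
class InMemoryStudentDao implements StudentDao {
    private final Map<Integer, Student> rows = new HashMap<>();

    @Override
    public void save(Student student) {
        rows.put(student.id, student);
    }

    @Override
    public Optional<Student> findById(int id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```

Keeping the queries behind StudentDao means the model class never sees SQL, and the storage technology can be swapped by writing another implementation of the interface.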



POJO Vs Java Beans

POJO classes
POJO stands for Plain Old Java Object. It is an ordinary Java object, not bound by any special restriction other than those forced by the Java Language Specification and not requiring any class path. POJOs are used to increase the readability and reusability of a program. POJOs have gained wide acceptance because they are easy to write and understand. They were popularized by EJB 3.0 from Sun Microsystems.
A POJO should not:
  1. Extend prespecified classes, Ex: public class GFG extends javax.servlet.http.HttpServlet { … } is not a POJO class.
  2. Implement prespecified interfaces, Ex: public class Bar implements javax.ejb.EntityBean { … } is not a POJO class.
  3. Contain prespecified annotations, Ex: @javax.persistence.Entity public class Baz { … } is not a POJO class.
POJOs basically define an entity. For example, if your program needs an Employee class, you can create a POJO as follows:
// Employee POJO class to represent entity Employee
public class Employee
{
    // default field
    String name;

    // public field
    public String id;

    // private salary
    private double salary;

    // arg-constructor to initialize fields
    public Employee(String name, String id,
                    double salary)
    {
        this.name = name;
        this.id = id;
        this.salary = salary;
    }

    // getter method for name
    public String getName()
    {
        return name;
    }

    // getter method for id
    public String getId()
    {
        return id;
    }

    // getter method for salary
    public Double getSalary()
    {
        return salary;
    }
}
The above example is a well defined example of POJO class. As you can see, there is no restriction on access-modifier of fields. They can be private, default, protected or public. It is also not necessary to include any constructor in it.




A POJO is an object that encapsulates business logic. The following image shows a working example of a POJO class. Controllers interact with your business logic, which in turn interacts with the POJO to access the database. In this example a database entity is represented by a POJO. This POJO has the same members as the database entity.





Java Beans
Beans are a special type of POJO. There are some restrictions a POJO must meet to be a bean.
  1. All JavaBeans are POJOs but not all POJOs are JavaBeans.
  2. They should be serializable, i.e. they should implement the Serializable interface. Some POJOs that don’t implement Serializable are still called beans, because Serializable is a marker interface and therefore not much of a burden.
  3. Fields should be private. This provides complete control over the fields.
  4. Fields should have getters or setters or both.
  5. A bean should have a no-arg constructor.
  6. Fields are accessed only by the constructor or by getters and setters.
Getters and setters have special names that depend on the field name. For example, if the field name is someProperty, then its getter preferably will be:
// String stands in for the field's actual type
public String getSomeProperty()
{
   return someProperty;
}

and the setter will be
public void setSomeProperty(String someProperty)
{
   this.someProperty = someProperty;
}
The visibility of getters and setters is generally public. Getters and setters provide complete control over fields. For example, consider the property below:
Integer age;
If you set visibility of age to public, then any object can use this. Suppose you want that age can’t be 0. In that case you can’t have control. Any object can set it 0. But by using setter method, you have control. You can have a condition in your setter method. Similarly, for getter method if you want that if your age is 0 then it should return null, you can achieve this by using getter method as in following example:
// Java program to illustrate JavaBeans
class Bean
{
    // private field property
    private Integer property;

    Bean()
    {
        // No-arg constructor
    }

    // setter method for property
    public void setProperty(Integer property)
    {
        // if property is missing or 0, ignore it
        if (property == null || property == 0)
            return;
        this.property = property;
    }

    // getter method for property; returns Integer so that null is allowed
    public Integer getProperty()
    {
        // if property is unset or 0 return null
        if (property == null || property == 0)
            return null;
        return property;
    }
}

// Class to test above bean
public class GFG
{
    public static void main(String[] args)
    {
        Bean bean = new Bean();

        bean.setProperty(0);
        System.out.println("After setting to 0: " +
                           bean.getProperty());

        bean.setProperty(5);
        System.out.println("After setting to valid" +
                           " value: " + bean.getProperty());
    }
}
Output:-
After setting to 0: null
After setting to valid value: 5
POJO vs Java Bean
POJO | Java Bean
It doesn’t have special restrictions other than those forced by the Java language. | It is a special POJO which has some restrictions.
It doesn’t provide much control on members. | It provides complete control on members.
It can implement the Serializable interface. | It should implement the Serializable interface.
Fields can be accessed by their names. | Fields are accessed only by getters and setters.
Fields can have any visibility. | Fields have only private visibility.
There can be a no-arg constructor. | It must have a no-arg constructor.
It is used when you don’t want to restrict your members and want to give the user complete access to your entity. | It is used when you want to provide the user with your entity, but only some part of it.

POJO classes and Beans both are used to define java objects to increase their readability and reusability. POJOs don’t have other restrictions while beans are special POJOs with some restrictions.


Java Persistence API (JPA)

•An API/specification for ORM

Uses :

•POJO classes
•XML based mapping file (represent the DB)

•A provider (implementation of JPA)

JPA Architecture




JPA implementations

•Hibernate
•JDO
•EclipseLink 
•ObjectDB

ORM tools 

Java


PHP

  • CakePHP, ORM and framework for PHP 5, open source (scalars, arrays, objects); based on database introspection, no class extending
  • CodeIgniter, framework that includes an ActiveRecord implementation
  • Doctrine, open source ORM for PHP 5.2.3, 5.3.X. Free software (MIT)
  • FuelPHP, ORM and framework for PHP 5.3, released under the MIT license. Based on the ActiveRecord pattern.
  • Laravel, framework that contains an ORM called "Eloquent" an ActiveRecord implementation.
  • Maghead, a database framework designed for PHP7 includes ORM, Sharding, DBAL, SQL Builder tools etc. free software, released under MIT license.
  • Propel, ORM and query-toolkit for PHP 5, inspired by Apache Torque, free software, MIT
  • Qcodo, ORM and framework for PHP 5, open source
  • QCubed, A community driven fork of Qcodo
  • Rocks, open source ORM for PHP 5.1 plus, free for non-commercial use, GPL
  • Redbean, ORM layer for PHP 5, creates and maintains tables on the fly, open source, BSD
  • Skipper, visualization tool and a code/schema generator for PHP ORM frameworks, commercial
  • Torpor, open source ORM for PHP 5.1 plus, free software, MIT, database and OS agnostic
  • Yii, ORM and framework for PHP 5, released under the BSD license. Based on the ActiveRecord pattern.
  • Zend Framework, framework that includes a table data gateway and row data gateway implementations.

.Net




NoSQL (Not Only SQL database)

NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for "not only SQL," is an alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully designed before the database is built. NoSQL databases are especially useful for working with large sets of distributed data.












Benefits of NoSQL

•When compared to relational databases, NoSQL databases are often more scalable and can provide better performance for some workloads, and their data model addresses several issues that the relational model is not designed to address, such as:

•Large volumes of rapidly changing structured, semi-structured, and unstructured data

NoSQL DB servers

•MongoDB
•Cassandra
•Redis
•Amazon DynamoDB
•HBase

Hadoop


The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Hadoop core concepts

• Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
• Hadoop YARN: A framework for job scheduling and cluster resource management.
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
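
The programming model behind MapReduce can be pictured with a word count. The sketch below runs on a single machine with plain Java streams and does not use the Hadoop API; it only mirrors the map step (emit one word per occurrence) and the reduce step (sum per word). The class WordCountSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // "map" step: split every line into individual words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "shuffle" and "reduce" steps: group equal words and sum their counts
                .collect(Collectors.groupingBy(
                        word -> word,
                        TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data", "big clusters");
        System.out.println(wordCount(lines));
    }
}
```

In Hadoop the same two steps would run in parallel on many machines, with the framework moving the grouped words between them.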



Information Retrieval(IR)

Information retrieval is about finding documents relevant to an information need, which are stored and indexed. 

This is done by posing a query to a search engine which matches the terms used as search keys to the terms used to store the documents in the index.
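
The matching of query terms against an index can be sketched with a tiny inverted index in Java. This is a simplified illustration, not how any particular search engine works; the class InvertedIndexSketch, its methods and the sample documents are assumptions made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {

    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Indexing: store the document id under every term of its text.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Retrieval: ids of the documents that contain every query term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> postings = index.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);      // keep only documents matching all terms
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch ir = new InvertedIndexSketch();
        ir.add(1, "Relational databases store tables");
        ir.add(2, "Graph databases store nodes");
        System.out.println(ir.search("graph databases"));
    }
}
```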


Information Retrieval Tools and their Utilization

 Libraries have been in existence since the beginning of writing and have served as a repository of the intellectual wealth of the society. As such, libraries have always been concerned with storing and retrieving information in the media it is created on. As the quantities of information grew exponentially, libraries were forced to make maximum use of information retrieval tools to facilitate the storage and retrieval process. 

These major tools are Catalogues, Classification Schemes, Indexes, Abstracts, Bibliographies. Other Information Retrieval tools in the library include the following: Encyclopedia, Directories, Dictionaries, Almanacs, Handbooks, Atlases, Periodicals, and Concordances among others. 


