Glossary of Terms Related to Big Data: Alphabet E to N

Alphabet – E

Exabytes (EB):

A single Exabyte is equal to 1,000 Petabytes in decimal notation, or roughly the data stored on 2 million personal computers. 5 Exabytes may be equal to the total number of words ever spoken by human beings so far. In other words, an Exabyte is the number of bytes equal to 1 followed by 18 zeroes.

According to the IBM Dictionary of Computing, hard disk space or data storage space in bytes, in decimal notation, is described as follows:

1 Bit = Binary Digit; 8 Bits = 1 Byte; 1000 Bytes = 1 Kilobyte; 1000 Kilobytes = 1 Megabyte;
1000 Megabytes = 1 Gigabyte; 1000 Gigabytes = 1 Terabyte; 1000 Terabytes = 1 Petabyte;
1000 Petabytes = 1 Exabyte; 1000 Exabytes = 1 Zettabyte; and so on.
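
As a rough illustration of the decimal units listed above, the following Python sketch expresses a raw byte count in the largest unit that fits. The function name `format_bytes` and the unit abbreviations are assumptions made for this example only.

```python
# Decimal (power-of-1000) storage units, smallest to largest.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def format_bytes(num_bytes: int) -> str:
    """Express a byte count in the largest decimal unit that fits."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value drops below 1000 or the last unit is reached.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(format_bytes(10**18))  # 1 followed by 18 zeroes
```

Here `format_bytes(10**18)` gives `1 EB`, matching the definition of an Exabyte as 1 followed by 18 zeroes.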

Alphabet – F

Fuzzy Logic:

Fuzzy logic is an approach to computing meant to mimic the human brain by working on “the concept of partial truth” rather than on the usual “completely true or completely false” (1 or 0) Boolean logic on which the modern computer is based.
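
The idea of partial truth can be sketched in Python by letting a statement take any value between 0 and 1 instead of only 0 or 1. The temperature thresholds below are arbitrary assumptions, and the min/max/complement operators are one common choice for combining fuzzy values, not the only one.

```python
def warm_membership(temp_c: float) -> float:
    """Degree (0.0 to 1.0) to which a temperature counts as 'warm'.

    Assumed shape: 0 at or below 15 C, rising linearly to 1 at 25 C and above.
    """
    if temp_c <= 15:
        return 0.0
    if temp_c >= 25:
        return 1.0
    return (temp_c - 15) / 10


def fuzzy_and(a: float, b: float) -> float:
    """Fuzzy AND: a combined statement is only as true as its weakest part."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Fuzzy OR: a combined statement is as true as its strongest part."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Fuzzy NOT: the complement of a degree of truth."""
    return 1.0 - a


print(warm_membership(20))  # halfway between "not warm" and "warm"
```

With these assumptions, 20 C is "warm" to degree 0.5, a value that plain Boolean logic cannot express.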

Alphabet – G

Gamification:

Gamification in the context of big data means using game elements, such as competing with others and scoring points under set rules of play, to motivate users and to gather and analyze data. Gamification techniques aim to tap into people's natural desires for socializing, learning, mastery, competition, achievement, status and self-expression. Companies apply gaming principles to increase interest in a product or service, or simply to deepen their customers’ relationship with their brand.

Geopbytes:

A Geopbyte is approximately equal to 1,000 Brontobytes. In other words, a Geopbyte is the number of bytes equal to 1 followed by 30 zeroes. Brontobyte and Geopbyte are informal names rather than official SI units, and it is not known why the IT world coined the term. It is unlikely that anyone will build a hard drive of 1 Geopbyte capacity. According to the IBM Dictionary of Computing, hard disk space or data storage space in bytes, in decimal notation, is described as follows:
1 Bit = Binary Digit; 8 Bits = 1 Byte; 1000 Bytes = 1 Kilobyte; 1000 Kilobytes = 1 Megabyte; 1000 Megabytes = 1 Gigabyte; 1000 Gigabytes = 1 Terabyte; 1000 Terabytes = 1 Petabyte; 1000 Petabytes = 1 Exabyte; 1000 Exabytes = 1 Zettabyte; 1000 Zettabytes = 1 Yottabyte; 1000 Yottabytes = 1 Brontobyte; 1000 Brontobytes = 1 Geopbyte;

GPU-Accelerated Databases:

A GPU database may be a relational or non-relational database that uses a graphics processing unit (GPU) to perform database operations. GPU databases are a recent and fast-growing development in the database world. They are well suited to processing many types of streaming data or very large amounts of data, and they focus primarily on analytics. In other respects a GPU-accelerated DBMS is virtually identical to a “traditional” DBMS: it includes functionality for integrity control, parsing SQL queries, and performing logical optimizations on queries. Online analytical and transactional processing can both be greatly accelerated by GPUs.

Graph Analytics:

Graph analytics technology helps organizations discover the cause, effect, and influence of events on business outcomes, and can be used for solving problems. Graph analytics can organize and visualize “many-to-many” relationships. With graph analytics it is possible to ask not only about a person's direct friends but also about the friends of those friends.
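
The friends-of-friends question above can be sketched in Python as a breadth-first search over a small graph. The names, the `FRIENDS` data and the helper `within_hops` are all hypothetical and exist only for this illustration.

```python
from collections import deque

# Hypothetical friendship graph: each person maps to their direct friends.
FRIENDS = {
    "Ana": {"Ben", "Cara"},
    "Ben": {"Ana", "Dev"},
    "Cara": {"Ana", "Dev", "Eli"},
    "Dev": {"Ben", "Cara"},
    "Eli": {"Cara"},
}


def within_hops(graph, start, max_hops):
    """Return {person: hops} for everyone reachable within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting person is not their own friend
    return distance


print(within_hops(FRIENDS, "Ana", 2))
```

For "Ana" with a limit of two hops, Ben and Cara come back at distance 1 (direct friends) and Dev and Eli at distance 2 (friends of friends).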

Graph Databases:

A graph database is a database that uses graph structures with nodes, edges and properties to represent and store data and to answer semantic queries. Graph databases store the relationships between records directly, so they can efficiently manage highly connected data and complex queries. With graph databases it is possible to explore millions of connections per second.

Alphabet – H

Hadoop:

Hadoop is an open-source, Java-based software framework for storing and processing massive amounts of data and running applications in distributed computing environments. It provides very large storage capacity for any type of data and enough processing power to handle a virtually unlimited number of concurrent tasks. It is a project of the Apache Software Foundation.

Hadoop User Experience (Hue):

Hadoop User Experience (Hue) is a web-based, open-source analytics workbench for browsing, querying and visualizing data in Apache Hadoop. It is a suite of applications that interact with Hadoop components, and it has an open SDK for creating new applications. Applications forming a part of Hue are: FileBrowser, Beeswax, Impala App, Oozie App, Pig App, HBase Browser, Table Browser, Search App, Job Browser and Job Designer.

HANA (High-Performance Analytic Appliance):

HANA is an in-memory database and application development platform from SAP for processing massive amounts of data and delivering real-time transactions and analytics. HANA stores data in memory in a columnar format and supports industry standards such as SQL and MDX (Multidimensional Expressions).

HBase:

HBase is an open-source, non-relational, distributed, column-oriented database written in Java. It is part of the Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System). It brings functionality similar to Google’s Bigtable to Hadoop and provides a fault-tolerant way of storing large amounts of sparse data.

Alphabet – I

Ingestion:

Ingestion in the context of big data is the process of importing streaming data from several different sources for immediate use or for storage in a database. In an automated data ingestion process, data preparation is an integral part of the automation, so that the data can easily be analyzed on the fly or later by business intelligence (BI) and business analytics (BA) programs.

Alphabet – L

Load Balancing:

Load balancing is the activity of distributing workload across multiple computers or servers so that the system delivers the best possible throughput in a given time and all users are served faster. Load balancers are deployed to increase the number of concurrent users an application can support and to improve its reliability. Load balancing can be implemented in software, in hardware, or in a combination of both.
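
One simple way to distribute workload in software is round-robin, where requests are handed to the servers in turn. The Python sketch below shows only that one strategy; the class name `RoundRobinBalancer` and the server labels are hypothetical, and real load balancers usually also consider server health and current load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)  # endless rotation over the servers

    def route(self, request):
        """Return (server, request) for the server chosen for this request."""
        server = next(self._servers)
        return server, request


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
for request_id in range(5):
    print(balancer.route(request_id))
```

Five requests sent to three servers are assigned to server-1, server-2, server-3, server-1 and server-2, so no single machine receives all the work.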

Alphabet – M

MapReduce:

MapReduce is a programming model for processing and generating big data sets with a parallel, distributed algorithm on a cluster. A MapReduce program consists of a map method, which filters and sorts data, and a reduce method, which performs a summary operation (such as counting) on that data and returns the output.
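
The map and reduce steps described above can be sketched with the classic word-count example. This Python version runs in a single process purely to show the shape of the model; the function names are assumptions for this example, and a real framework such as Hadoop would run the phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Summarize all values for one key; here, by adding up the counts."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big cluster"]))
```

For the two sample documents the result is `{'big': 2, 'data': 1, 'cluster': 1}`.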

Mashup:

A mashup, in the context of data, is an integration of two or more data sets in a single graphical interface, for example combining real estate listings with demographic or geographic data. To the end users, data analysts, or business managers scanning dashboards and reports, the background activities remain invisible; they simply enjoy the benefits of greater access to the rich information associated with the data.
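
The real estate example can be sketched in Python as a join of two small data sets on a shared key. All addresses, prices and demographic figures below are made up for illustration, as are the names `LISTINGS`, `DEMOGRAPHICS` and `mash_up`; the graphical interface that would display the result is left out.

```python
# Hypothetical data set 1: real estate listings.
LISTINGS = [
    {"address": "12 Oak St", "zip": "10001", "price": 450000},
    {"address": "7 Elm Ave", "zip": "10002", "price": 380000},
]

# Hypothetical data set 2: demographic data keyed by zip code.
DEMOGRAPHICS = {
    "10001": {"median_age": 36},
    "10002": {"median_age": 41},
}


def mash_up(listings, demographics):
    """Attach the demographic record for each listing's zip code, if any."""
    combined = []
    for listing in listings:
        extra = demographics.get(listing["zip"], {})
        combined.append({**listing, **extra})
    return combined


for row in mash_up(LISTINGS, DEMOGRAPHICS):
    print(row)
```

Each combined row carries both the listing fields and the demographic fields, which is what a dashboard would then present as a single view.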

Metadata:

Metadata is data that provides context or additional information about other data. Metadata can describe how the data is stored in the database, its correctness, its time and date, how it was compiled and processed, and so on. For example, the metadata of a document can include its title, subject, author, revisions, file size and date of creation.
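
As a small illustration, the operating system already keeps some metadata about every file. The Python sketch below reads a few of those fields with the standard `os.stat` call; the helper name `file_metadata` and the choice of fields are assumptions for this example.

```python
import os
from datetime import datetime, timezone


def file_metadata(path):
    """Collect a few pieces of metadata about a file, not its contents."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Time of last modification, converted to a UTC datetime.
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
```

Calling `file_metadata` on a report file would return its name, size and last modification time without reading a single line of the report itself.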

MongoDB:

MongoDB is a free, source-available, cross-platform document-oriented database program, in contrast to the traditional table-based relational database structure. It makes integrating structured and unstructured data in various applications easier and faster. MongoDB can also be used as a file system for storing files, with load balancing and data replication across multiple machines.

Multi-Dimensional Databases (MDB):

A multidimensional database (MDB), or multidimensional database management system (MDDBMS), is a database optimized for OLAP (online analytical processing) applications so that data can be processed rapidly.

MultiValue Databases:

A MultiValue database is a type of NoSQL, multidimensional database that resembles a conventional RDBMS in storing data in tables. It differs from an RDBMS in that it can assign multiple values to a single attribute of a record. MultiValue databases can also handle HTML and XML strings directly.

Munging:

Munging, in the context of data, refers to cleaning up a messy data set. It covers all the activities performed on raw data to make it cleaner and easier to feed into an analytical algorithm.
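
A few typical munging steps (trimming whitespace, normalizing case, converting types, dropping empty and duplicate rows) can be sketched in Python as follows. The sample rows and the helper name `munge` are made up for this illustration, and real clean-up rules depend on the data set.

```python
# Hypothetical raw records with stray spaces, mixed case and bad values.
RAW_ROWS = [
    {"name": "  Alice ", "age": "34"},
    {"name": "BOB", "age": "n/a"},
    {"name": "", "age": "29"},
    {"name": "alice", "age": "34"},
]


def munge(rows):
    """Return cleaned rows: tidy names, numeric ages, no blanks or duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["name"].strip().title()
        if not name:
            continue  # drop rows with no usable name
        try:
            age = int(row["age"])
        except ValueError:
            age = None  # keep the row but mark the age as missing
        key = (name, age)
        if key in seen:
            continue  # drop exact duplicates after normalization
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


print(munge(RAW_ROWS))
```

The four messy rows reduce to two clean ones: Alice with age 34, and Bob with a missing age.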

Alphabet – N

Natural Language Processing (NLP):

Natural language processing covers software algorithms, within the field of artificial intelligence, that are designed to let computers understand everyday human speech and text in natural languages and to process large amounts of natural language data effectively.

Neural Network:

A neural network is a biologically inspired programming system whose design is based on the operation of neurons in the human brain. Neural networks are also known as artificial neural networks, and deep learning is a set of techniques for learning in neural networks. These techniques are used commercially to solve complex problems in pattern recognition and signal processing. Examples of commercial applications include speech-to-text transcription, handwriting recognition for check processing, facial recognition and weather prediction.
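
The basic building block can be sketched in Python as a single artificial neuron that weighs its inputs, adds a bias and passes the sum through an activation function. This is a minimal sketch with no learning step; the sigmoid activation and the tiny two-layer `forward` helper are assumptions chosen for illustration, and any weights passed in are arbitrary rather than trained.

```python
import math


def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def forward(inputs, hidden_layer, output_neuron):
    """Run inputs through one hidden layer and a single output neuron.

    hidden_layer is a list of (weights, bias) pairs, one per hidden neuron;
    output_neuron is a single (weights, bias) pair.
    """
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron(hidden, out_weights, out_bias)


print(neuron([0.0, 0.0], [1.0, 1.0], 0.0))  # 0.5
```

Training a network means adjusting the weights and biases until outputs like this match known examples, which is the part that deep learning techniques address.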

Normal Distribution:

Normal distribution is a continuous probability distribution that describes how the values of a large number of random variables are spread. Its curve is also known as the bell curve or Gaussian distribution curve: the data values are distributed symmetrically, with the majority of results situated around the mean.
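
The symmetry and the clustering around the mean can be checked numerically. The Python sketch below evaluates the standard bell-curve density formula and uses `NormalDist` from the standard library to measure how much of the distribution lies near the mean; the helper names `normal_pdf` and `share_within` are assumptions for this example.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean=0.0, std_dev=1.0):
    """Height of the bell curve at x for the given mean and standard deviation."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))


def share_within(k, mean=0.0, std_dev=1.0):
    """Fraction of values expected within k standard deviations of the mean."""
    dist = NormalDist(mean, std_dev)
    return dist.cdf(mean + k * std_dev) - dist.cdf(mean - k * std_dev)


print(normal_pdf(-1.0), normal_pdf(1.0))  # equal heights on both sides of the mean
print(round(share_within(1), 4))          # 0.6827
```

The curve has the same height at equal distances on either side of the mean, and about 68% of values fall within one standard deviation of it.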