muhammad-s of ElektraSoft
10/9/2017 - 9:20 PM

Big Data (BD0101EN)

This course is designed to introduce you to Big Data and Big Data terminology.


3 Major Sources of Big Data:
  People-generated data
  Machine-generated data
  Business-generated data
  
Big Data Forms:
  Structured: (Sources: relational databases and spreadsheets)
    Structured data is organized, labelled, and follows a strict model.
  Unstructured:
    Unstructured data is said to make up about 80% of the data in the world. It is usually in text form, has no predefined model, and is not organized in any particular way.
  Semi-structured: (Sources: XML, JSON)
    Semi-structured data is a combination of the two. It is similar to structured data in that it may have an organized structure, but it lacks a strictly defined model.
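
A rough sketch of the three forms in Python, using only the standard library. The sample values, field names, and the regular expression are made up for illustration; the point is only how much structure each form gives you to work with.

```python
import csv
import io
import json
import re

# Structured: every row follows the same fixed columns (a strict model).
csv_text = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: organized by keys, but records may differ in shape.
json_text = '[{"id": 1, "name": "Ada", "tags": ["math"]}, {"id": 2, "name": "Linus"}]'
records = json.loads(json_text)

# Unstructured: free text with no predefined model, so we have to search it.
note = "Ada moved to London last year and Linus still lives in Helsinki."
cities_mentioned = re.findall(r"London|Helsinki", note)

print(rows[0]["city"])         # field access by column name
print(records[1].get("tags"))  # None: the second record has no "tags" key
print(cities_mentioned)        # whatever the pattern happens to find
```

With the structured rows, a column can be relied on to exist. With the JSON records, a key has to be checked first. With the free text, anything extracted depends on the search pattern chosen.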

More data has been created in the past two years than in the entire previous history of humankind.

By 2020, about 1.7 megabytes of new information will be created every second for every human being in the world.

By 2020, the data we create and copy will reach around 35 zettabytes, up from only 7.9 zettabytes today.


Units, smallest to largest:
bit < byte < KB < MB < GB < TB < petabyte (PB) < exabyte (EB) < zettabyte (ZB)
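
To get a feel for the scale of these units, here is a small sketch that converts between them and reworks two of the figures quoted above. It assumes decimal (SI) units, where each step is a factor of 1000; binary units (factor 1024) are also common. The helper names `to_bytes` and `humanize` are my own.

```python
# Assumption: decimal (SI) units, each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

# 1.7 MB per second per person, accumulated over one day (86,400 seconds).
per_person_per_day = to_bytes(1.7, "MB") * 86_400
print(humanize(per_person_per_day))  # 146.9 GB

# Growth factor implied by going from 7.9 ZB to 35 ZB.
growth = to_bytes(35, "ZB") / to_bytes(7.9, "ZB")
print(round(growth, 1))  # 4.4
```

So 1.7 MB per second works out to roughly 147 GB per person per day, and the 7.9 ZB to 35 ZB prediction is a bit more than a fourfold increase.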

It is predicted that by 2020, 10% of the world's data will be produced by machines, and most of the world's data will be produced in emerging markets.

Companies like Amazon, Netflix and Spotify use algorithms based on big data to make specific recommendations based on customer preferences and historical behavior.

Personal assistants like Siri on Apple devices use big data to devise answers to the infinite number of questions end users may ask.

Google Now makes recommendations based on the big data on a user's device.

The 5 V's of Big Data:
1. Velocity (Speed of Data)
  Velocity is the idea that data is being generated extremely fast, in a process that never stops.
  Attributes include near-real-time or real-time streaming, and local and cloud-based technologies that can process information very quickly.
  Drivers include competitive advantage and precomputed information.

2. Volume (Scale of Data)
  Volume is the amount of data generated, for example exabytes, zettabytes, or yottabytes.
  Drivers of volume are the increase in data sources, higher-resolution sensors, and scalable infrastructure.

3. Variety (Diversity of Data)
  Variety is the idea that data comes from different sources: machines, people, and processes, both internal and external to organizations.
  Attributes include the degree of structure and complexity. Drivers include mobile technologies, social media, wearable technologies, geo technologies, video, and many more.

4. Veracity (Accuracy and Certainty of Data)
  Veracity is the quality and origin of data.
  Attributes include consistency, completeness, integrity, and ambiguity.
  Drivers include cost and the need for traceability.
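
A minimal sketch of what checking a few veracity attributes could look like in practice. The records, the field names, and the plausible temperature range are all hypothetical; real data quality checks depend on the data set.

```python
# Hypothetical sensor readings; field names and values are illustrative only.
records = [
    {"id": 1, "source": "sensor-a", "temp_c": 21.5},
    {"id": 2, "source": None,       "temp_c": 22.0},   # origin unknown
    {"id": 3, "source": "sensor-b", "temp_c": None},   # missing measurement
    {"id": 4, "source": "sensor-a", "temp_c": 480.0},  # implausible value
]

REQUIRED = ("id", "source", "temp_c")

def completeness(rows):
    """Fraction of required fields that are actually filled in."""
    filled = sum(row.get(f) is not None for row in rows for f in REQUIRED)
    return filled / (len(rows) * len(REQUIRED))

def inconsistent(rows, low=-50.0, high=60.0):
    """IDs of rows whose temperature is outside an assumed plausible range."""
    return [r["id"] for r in rows
            if r["temp_c"] is not None and not low <= r["temp_c"] <= high]

def untraceable(rows):
    """IDs of rows with no recorded origin."""
    return [r["id"] for r in rows if r["source"] is None]

print(round(completeness(records), 2))  # 0.83 (10 of 12 fields filled)
print(inconsistent(records))            # [4]
print(untraceable(records))             # [2]
```

Completeness and consistency map onto the attributes listed above, and the `source` field is one simple way to support traceability.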

5. Value
  Value refers to our ability and need to turn data into value. Value isn't just profit. It may be medical or social benefits, or customer, employee, or personal satisfaction.
  The main reason people invest time in understanding Big Data is to derive value from it.

Big Data is a collection of data from traditional and digital sources inside and outside a company that represents a source of ongoing discovery and analysis.