Data that resides in fixed fields within a file or record is called structured data. This includes data contained in relational databases and spreadsheets.
In the context of big data, the term structured data generally refers to data that has a defined length and format. Examples of structured data include numbers, dates, and groups of words and numbers called strings. Most experts agree that this kind of data accounts for about 20 percent of the data that is out there. Structured data is the data you’re probably used to dealing with. It’s usually stored in a database.
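As a rough illustration of "defined length and format", the sketch below parses one fixed-width record into typed fields. The layout (a 6-character customer ID, a 10-character ISO date, and an 8-character amount) and the names `Sale` and `parse_sale` are assumptions made up for this example, not something taken from a particular system.

```python
from dataclasses import dataclass
from datetime import date

RECORD_LENGTH = 24  # assumed layout: 6 (ID) + 10 (date) + 8 (amount)


@dataclass
class Sale:
    customer_id: str   # string field
    sold_on: date      # date field
    amount: float      # numeric field


def parse_sale(record: str) -> Sale:
    """Slice one fixed-width record into typed fields (hypothetical layout)."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(record)}")
    return Sale(
        customer_id=record[0:6].strip(),
        sold_on=date.fromisoformat(record[6:16]),
        amount=float(record[16:24]),
    )


sale = parse_sale("C001232024-03-15  149.99")
print(sale)
```

Because every field sits at a known position and has a known type, a record that does not fit the layout can be rejected immediately, which is much harder to do with free-form text.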
Figure: Structured data
Sources of structured big data
Although this might seem like business as usual, in reality, structured data is taking on a new importance and role in the world of big data. The modernization of technology has led to newer sources of structured data being generated, often in real time and in huge volumes. The sources of data are categorized into two types:
Computer- or machine-generated: Machine-generated data normally refers to data that is created by a machine without human interaction.
Human-generated: This is data that humans, in interaction with computers, supply.
Some experts argue that a third type exists that is a hybrid between machine and human. Here, though, we focus on the first two types.
Machine-generated structured data can include the following:
Sensor data: Examples include radio frequency ID (RFID) tags, medical devices, and Global Positioning System (GPS) data. Corporations are interested in this type of data for supply chain management and inventory control.
Web log data: When servers, applications, networks, and so on operate, they capture all kinds of data about their activity. This can amount to huge volumes of data that can be useful, for example, to deal with service-level agreements or to predict security breaches.
Point-of-sale data: When the cashier scans the bar code of a product that you are purchasing, all the data associated with that product is generated.
Financial data: Many financial systems are now programmatic; they operate based on predefined rules that automate processes. Stock-trading data is a good example of this. It contains structured data such as the company symbol and dollar value. Some of this data is machine generated, and some is human generated.
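To make the web log case more concrete, here is a minimal sketch that turns one server log line into named fields. It assumes the line follows the Common Log Format; the sample line, the `CLF` pattern, and the `parse_line` helper are illustrative and not tied to any specific server.

```python
import re

# Pattern for the Common Log Format (assumed input format for this sketch).
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample line using a documentation-only IP address.
line = '192.0.2.10 - alice [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_line(line)
print(record["host"], record["path"], record["status"], record["size"])
```

Once each line is reduced to the same set of fields, the records can be loaded into a table and queried like any other structured data.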
Examples of structured human-generated data might include the following:
Input data: This is any piece of data that a human might input into a computer, such as name, age, income, non-free-form survey responses, and so on. This data can be useful to understand basic customer behavior.
Click-stream data: Data is generated every time you click a link on a website. This data can be analyzed to determine customer behavior and buying patterns.
Gaming-related data: Every move you make in a game can be recorded. This can be useful in understanding how end users move through a gaming portfolio.
When this data is combined with that of millions of other users submitting the same kind of information, the volume is astronomical. Additionally, much of this data has a real-time component that can be useful for understanding patterns that have the potential of predicting outcomes.
The bottom line is that this kind of information can be powerful and can be utilized for many purposes.
The role of relational databases in big data
Data persistence refers to how a database retains versions of itself when modified. The great granddaddy of persistent data stores is the relational database management system. In its infancy, the computing industry used what are now considered primitive techniques for data persistence.
The relational model was invented by Edgar Codd, an IBM scientist, in the 1970s and was used by IBM, Oracle, Microsoft, and others. It is still in wide usage today and plays an important role in the evolution of big data. Understanding the relational database is important because other types of databases are used with big data.
In a relational model, the data is stored in tables. The database contains a schema, that is, a structural representation of what is in the database. For example, in a relational database, the schema defines the tables, the fields in the tables, and the relationships between the two.
Within each table, the data is organized in columns, one for each specific attribute, and in rows, one for each record. In this example, the first table stores product information; the second stores demographic information. Each has various attributes. Each table can be updated with new data, and existing data can be read, modified, and deleted. This is often accomplished in a relational model using Structured Query Language (SQL).
Figure: Structured data
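The schema and the create, read, update, and delete operations described above can be sketched with Python's built-in sqlite3 module. The table names `demographic` and `product`, the column types, and the sample values are assumptions chosen for illustration; the original tables may be laid out differently.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema: two tables, their fields, and how they relate (assumed layout).
conn.executescript("""
CREATE TABLE demographic (
    CustomerID INTEGER PRIMARY KEY,
    State      TEXT NOT NULL,
    Gender     TEXT NOT NULL
);
CREATE TABLE product (
    CustomerID INTEGER NOT NULL REFERENCES demographic(CustomerID),
    Product    TEXT NOT NULL
);
""")

# Create: add new rows.
conn.execute("INSERT INTO demographic VALUES (?, ?, ?)", (1, "CA", "F"))
conn.execute("INSERT INTO product VALUES (?, ?)", (1, "Widget"))

# Read: fetch attributes of one customer.
row = conn.execute(
    "SELECT State, Gender FROM demographic WHERE CustomerID = ?", (1,)
).fetchone()

# Update: change an existing value.
conn.execute("UPDATE demographic SET State = ? WHERE CustomerID = ?", ("NY", 1))
updated = conn.execute(
    "SELECT State FROM demographic WHERE CustomerID = ?", (1,)
).fetchone()[0]

# Delete: remove a row.
conn.execute("DELETE FROM product WHERE CustomerID = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]

print(row, updated, remaining)
```

The `?` placeholders let the database driver handle quoting of values, which is generally safer than building SQL strings by hand.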
Another aspect of the relational model using SQL is that tables can be queried using a common key. The common key in the tables is CustomerID.
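As a minimal sketch of querying through a common key, the example below joins two small tables on CustomerID using Python's built-in sqlite3 module. The table names `demographic` and `product`, the column types, and all sample rows are hypothetical and exist only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables that share the CustomerID key (assumed layout).
conn.executescript("""
CREATE TABLE demographic (CustomerID INTEGER PRIMARY KEY, State TEXT, Gender TEXT);
CREATE TABLE product (CustomerID INTEGER, Product TEXT);
""")
conn.executemany(
    "INSERT INTO demographic VALUES (?, ?, ?)",
    [(1, "CA", "F"), (2, "TX", "M"), (3, "NY", "F")],
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "Widget"), (2, "Gadget"), (3, "Widget")],
)

# Join the tables on the common key and keep only one product.
rows = conn.execute(
    """
    SELECT d.CustomerID, d.State, d.Gender, p.Product
    FROM demographic AS d
    JOIN product AS p ON p.CustomerID = d.CustomerID
    WHERE p.Product = ?
    ORDER BY d.CustomerID
    """,
    ("Widget",),
).fetchall()

for customer_id, state, gender, product in rows:
    print(customer_id, state, gender, product)
```

The explicit `JOIN ... ON` clause states which column ties the tables together; without such a condition, every row of one table would be paired with every row of the other.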
You can submit a query, for example, to determine the gender of customers who purchased a specific product. It might look something like this:
Select CustomerID, State, Gender, Product from "demographic table", "product t