Top Two Concerns of Big Data Hadoop Implementation

John Plato Jose


According to IBM, we create 2.5 quintillion bytes of data every day. This data originates from all spheres of activity: to name just a few, it comes from sensors, social media sites, digital pictures, web logs and transaction records of online purchases.

In general, data can be classified into three categories. Data that can be stored in databases is called Structured Data. For example, transaction records of online purchases can be stored in databases, so they are Structured Data. Data that can only partially be stored in databases is called Semi-Structured Data. For example, the data in XML records can be partially stored in databases, so it is Semi-Structured Data.

Data that does not fit into these two categories is called Unstructured Data. For example, data from social media sites and web logs cannot readily be stored, analysed and processed in databases, so it is categorised as Unstructured Data. Unstructured Data is often referred to as Big Data.

According to NASSCOM, Structured Data accounts for 10% of the total data that exists on the Internet today. Semi-Structured Data accounts for another 10%, and the remaining 80% is Unstructured Data. In general, organizations analyse Structured and Semi-Structured Data using traditional data analytics tools. No sophisticated tools were available to analyse Unstructured Data until Google developed the MapReduce framework. Later, Apache developed a framework called "Hadoop", which analyses all this data and reveals information that can help businesses make better decisions.
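To give a rough feel for the MapReduce idea mentioned above, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample log lines are assumptions made up for illustration. It only shows the three steps of the model: map each record to key/value pairs, group the pairs by key, and reduce each group to a result.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (token, 1) pair for every token in one log line."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical web-log lines, invented for this example.
sample_logs = [
    "GET /home",
    "GET /cart",
    "POST /cart",
]
counts = word_count(sample_logs)
print(counts)
```

In a real Hadoop cluster the map and reduce steps run in parallel on many machines and the framework handles the grouping step; this sketch runs everything in one process purely to show the flow of data.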

Hadoop has already proved its importance in several areas. For example, according to NASSCOM, many organizations have started using Big Data analytics. The National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA) and several pharmaceutical and energy companies have started using Big Data analytics extensively, in some cases to predict customer behaviour.

According to recent research from Nemertes Group, organizations perceive value in Big Data analytics and are planning to make better use of it. The New York Times is using Big Data tools for text analysis, and the Walt Disney Company uses them to correlate and understand customer behaviour in all of its stores and theme parks. Indian IT companies such as TCS, Wipro, Infosys and other key players have also started to tap the immense potential that Big Data offers.

This clearly shows that Big Data is an emerging area and that many companies have started to explore new opportunities. The use of Big Data is proving worthwhile, but at the same time concerns about privacy and data protection have also risen.

The concern about Big Data analytics is valid from the viewpoint of privacy. Let me give a very simple example. I am sure that most of us use social media such as Facebook, Twitter and other social forums, and most of us watch videos on YouTube. Imagine these websites using Big Data analytics tools to track your activity on the Internet and to analyse your data, your search behaviour and the content you have watched on social media. Through Big Data, your activity on social media can be clearly identified. This is a blatant violation of your privacy. Further, imagine the organization sharing the results of the analysis with a few marketing agencies; this in turn creates more privacy issues.

Now let us discuss things from the data protection perspective. Big Data is usually stored in a cloud environment. This means the data is distributed over the network and stored somewhere around the globe. Let me give an example. Say you reside in the UK and access a social media website; your data, including your profile, may be stored in a country in Asia or elsewhere. If the social media website decides to sell some of its data, including yours, to a marketing agency, the agency will be in a position to gain complete access to your profile, including your phone number.

If the marketing agency tracks the geo-location of the phone number, it will be in a position to record your complete movements: when you leave your house to visit a friend, when you leave for work, and even your visits to your lover. Armed with this data, advertisers can exploit your daily routine to their advantage, locating you and promoting their ventures wherever you are. This clearly shows that data protection is another major concern with Big Data analytics.

Several lawmakers and regulators around the globe have voiced their concern about Big Data analytics. Organizations such as Consumer Watchdog have also raised concerns about privacy and data protection in connection with Big Data analytics. According to a report from Gartner, "Forty one percent of consumers say they would be concerned about privacy if they were to use mobile location services so that they can receive more targeted offers through advertising or loyalty programs".

Big Data is a great tool, and it can open up more avenues and great opportunities for organizations. The extraordinary benefits of Big Data should not be hampered by concerns over privacy and data protection. The good news is that many organizations are already well aware of this issue. Some organizations have started to share the intent of their data collection with customers. Some have updated the privacy policy on their websites to explain their data collection strategy.

In addition, the Cloud Security Alliance (CSA), a consortium of technology companies and public sector agencies, has launched the Big Data Working Group, which is working to find suitable solutions to data-centric security and privacy problems. Hopefully, these two major issues will be addressed, the benefits of Big Data analysis will be put to great use, and the immense potential it offers will be harnessed in the coming days. Let's hope for the best.

John Jose is the Managing Director of Perpetro Technologies Private Limited, which offers services to organizations including software development, testing, IT training (including Big Data) and corporate training. Visit http://www.perpetrotech.com/ for additional information.

Article Source: http://EzineArticles.com/expert/John_Plato_Jose/1733829
http://EzineArticles.com/?Top-Two-Concerns-of-Big-Data-Hadoop-Implementation&id=8206862





