Tag Archives: datawarehouse

The Rise of Analytics, Hadoop and The Data Lake – Analytics

In my current role, I am being so lucky getting insights from different customers around the rise of their analytics story, and the baby steps companies out there are taking in order to get faster insights on their customer base before some competitor does in order to retain the customer base and possibly grow it, along with the baby steps some giant mistakes are occurring returning the company a thousand step backwards, especially in a field where FINALLY application and storage meet together outside performance requirement and IOPS discussions but talking about the governance of the data, storage , access, protection and most importantly Automation.

In the last 2 decades of data storage and with more and more application being developed in organizations, data ended up in silos of storage, some are still being tier one served from a tier one SAN storage other are shared folders and web logs being stored in a slower more economic and denser tier which could be a NAS storage.

The last big step in the database world was data warehousing, where many of databases where living isolated from each other, not allowing for a collaborative way of achieving analytics efficiently in an organization, over the last decade most of the CIOs task list was about the Data Warehousing Strategy.

Analytics on databases is a way of creating complex (sometime not) queries or searches from a data source(s), others may call it data mining well all , in my opinion, There are three different stages for analytics.

Stage 1 started with single database analytics, Stage 2 started a decade ago with analytics performed on multiple data sources and databases mostly using data warehousing where the data was mostly structured, Stage 3 that is discovering or just about to implement it which is analytics based on the same old structured data that we use to query in addition to the unstructured data.

Handling structured data is so mature now that most people understands that I need a relational database to handle my customer orders records for example (Although NOSQL databases are a valid replacement for this market), unstructured data came in with its own challenges, I can write a an 3000 word article on each of them but shortly for this one its data ingestion and flow where tools like Apache storm and Pivotal springXD excels at, software storage layer where a file system like Hadoop is the preferred way to go depends on the case you may use MongoDB, Cassandra…etc a friend of mine actually forwarded me a great article on when to use whathttp://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Unstructured data varies from machine generated data (e.g. server logs) web logs (cookies, customers movement on the web, tracking..etc) social media like twitter, facebook, mobile phones beacons (Bluetooth and Wifi) and any other information that relates to your business, customers or product.

These days organizations are taking analytics really seriously, I can see these job ads everywhere for data engineers and data scientists (yes there is a major different), I have met tons of these data scientists here and no not all of them are R experts, some of them only know couple of visualization tools like Tableau but they are able to get great insights from the data, some of them doesn’t even know basic shell commands on unix, and some of them are completely Microsoft windows only users….

Retail, City Councils, Airports, Insurance, Finance, Banking, Government, Defense, and even small startups have started doing something around analytics, infact some of them talk about innovation and Gartner Pace Layered approach (Open Data Lake), in Airports we talk about flight delays prediction, Bluetooth beacons, city councils does parking sensors on the streets for sending the inspector when time is expired for a certain occupied bay and traffic light management using beacons, Retail needs better insights and not the overnight ones from EDW,also their massively increasing foot print of data, insurance collects data from mobile apps in order and correlate customers to their information in order to retain their footprint, health insurance are giving away fit bits to offer better discounts for active healthy customers, cant talk about government and defense.

in the next episodes I will be  connecting more dots together in technology prespecitves, hadoop, in-memory databases, mpp databases,reporting tools,lambda architecture,gartner stuff, …etc stay tuned…