Apache Zeppelin Walk Through with Hortonworks


Introduction

Apache Zeppelin (incubating at the time of writing this post) is one of my favourite tools, and I try to present it to anyone interested in analytics. It is 100% open source, with a talented international team behind it at NFLabs in Korea (soon moving to San Francisco). It is built around the concept of interpreters, which allows almost any language or data-processing backend to be plugged into Apache Zeppelin.

It is very similar to IPython/Jupyter, except that the UI is arguably more appealing and the range of supported interpreters is richer. At the time of writing this blog, Zeppelin supported:

With this rich set of interpreters, onboarding platforms like Apache Hadoop, or Data Lake concepts, becomes much easier. The data sits consolidated in one place, and different organizational units with different skill sets need to access it and carry out their day-to-day work on it: data discovery, queries, data modelling, data streaming and, finally, data science using Apache Spark.

Apache Zeppelin Overview

With the notebook-style editor and the ability to save notebooks on the fly, you can end up with some really cool notebooks, whether you are a data engineer, a data scientist or a BI specialist.

Zeppelin Notebook Example

Dataset showing the Health Expenditure of the Australian Government over time by state.

Zeppelin also comes with basic, clean visualizations built in, and it lets you control what goes into your chart by dragging and dropping fields into the visualization, as below:

Zeppelin Drag and Drop
The sum of government budget healthcare expenditure in Australia by State

Also, when you are done with your awesome notebook story, you can easily turn it into a report and either print it or send it out.

Car Accidents Fatalities in Melbourne

Car accident fatalities related to drink driving, showing the most fatal days on the streets and the most fatal accident types during alcohol times

Playing with Zeppelin

If you have never played with Zeppelin before, visit this link for a quick way to get started with the latest Hortonworks tutorial. We are including Zeppelin in HDP as a technical preview, and official support may follow. Check it out here, and try out the different interpreters and how they interact with Hadoop.

Zeppelin Hub

I was recently given access to the beta version of Hub. Hub is meant to make life easier in organizations when it comes to sharing notebooks between departments or people.

Let's assume an organization has Marketing, BI and Data Science practices. The three departments overlap in the datasets they use, so there is no longer any need for each of them to work in isolation. They can share their experience, brag about their notebooks, and work together on the same notebook when it is complicated or needs different skills.

Zeppelin Hub UI

Let's have a deeper look at Hub…

Hub Instances

An instance is backed by a Zeppelin installation somewhere (a server, a laptop, a Hadoop cluster, etc.). Every time you create a new instance, a new token is generated. Add this token to your local Zeppelin installation in the file conf/zeppelin-env.sh (here /incubator_zeppelin/conf/zeppelin-env.sh), e.g.:

export ZEPPELINHUB_API_TOKEN="f41d1a2b-98f8-XXXX-2575b9b189"

Once the token is added, you will be able to see your notebooks online whenever you connect to Hub (http://zeppelin.hub.com).
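If you register more than one instance, the token step can be scripted. The sketch below is a minimal, hypothetical helper: `set_hub_token` is my own name for it and is not part of Zeppelin or Hub. It assumes `ZEPPELIN_HOME` points at your local Zeppelin directory and that you pass the token shown on the Hub instance page. It keeps any other settings in zeppelin-env.sh and replaces an earlier `ZEPPELINHUB_API_TOKEN` line instead of appending a duplicate.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Zeppelin or Zeppelin Hub): store a Hub
# instance token in conf/zeppelin-env.sh of a local Zeppelin installation.
# Assumes ZEPPELIN_HOME points at that installation, e.g. /incubator_zeppelin.
set_hub_token() {
  local token="${1:-}"

  if [ -z "${ZEPPELIN_HOME:-}" ]; then
    echo "ZEPPELIN_HOME is not set" >&2
    return 1
  fi
  if [ -z "$token" ]; then
    echo "usage: set_hub_token <hub-instance-token>" >&2
    return 1
  fi

  local env_file="$ZEPPELIN_HOME/conf/zeppelin-env.sh"
  mkdir -p "$ZEPPELIN_HOME/conf" || return 1
  touch "$env_file" || return 1

  # Keep every other setting, but drop an earlier token line so that
  # re-running the helper does not leave duplicate exports behind.
  # (grep -v exits 1 when nothing is left to print; that is fine here.)
  grep -v '^export ZEPPELINHUB_API_TOKEN=' "$env_file" > "$env_file.tmp"
  mv "$env_file.tmp" "$env_file" || return 1

  # Same line format as the export shown above.
  echo "export ZEPPELINHUB_API_TOKEN=\"$token\"" >> "$env_file"
}
```

For example, run `export ZEPPELIN_HOME=/incubator_zeppelin` followed by `set_hub_token "<your-instance-token>"`. Zeppelin's startup scripts read conf/zeppelin-env.sh, so I assume you need to restart Zeppelin afterwards (for example with `bin/zeppelin-daemon.sh restart`) before the instance shows up in Hub.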

Hub Spaces

Once an instance is added, you will be able to see all the notebooks for each instance. Since every space is either a department or a category of notebooks that needs to be shared with certain people, you can simply drag and drop notebooks into a space to share them with everyone in that space.

Adding a Notebook to a Space

Showing a Notebook inside Zeppelin Hub

Very cool!

Since it is still in beta, there is plenty of work left to do, such as executing notebooks directly from Hub, resizing and formatting, and some other minor issues. I am sure the all-star team at NFLabs (@nflabs) will make it happen very soon, as they always have.

If you are interested in playing with the beta, you can request access on the Apache Zeppelin website here.

Hortonworks and Apache Zeppelin

Hortonworks is heavily adopting Apache Zeppelin, as shown by its contributions to the product and to Apache Ambari. Ali Bajwa, one of the rockstars at Hortonworks, created an Apache Zeppelin view for Ambari. It adds authentication to Zeppelin and gives users a single pane of glass for uploading datasets (through the HDFS view in Apache Ambari Views) and for other operational needs.

Apache Ambari with Zeppelin View Integration

Apache Zeppelin Notebook editor from Apache Ambari

If you also want to integrate Zeppelin in Ambari with Apache Spark, just follow the steps at this link.

Helium

Project Helium is a revolutionary change in Zeppelin: it allows you to integrate almost any standard HTML, CSS and JavaScript as a visualization or a view inside Zeppelin.

A Helium application consists of a view, an algorithm and access to resources. You can find more information on Helium here.
