What is Azure Databricks?

Azure Databricks was announced on November 15, 2017 at Microsoft Connect. Azure Databricks is a fast, easy, and collaborative analytics platform based on Apache Spark. It allows data scientists, data engineers, and business users to review code, build models, and analyze data within a single platform. At the highest level, it brings data processing and machine learning together in one framework.

Databricks & Apache Spark

Databricks is a company founded in 2013 by the creators of Apache Spark. It provides easy access to cloud-based processing with Apache Spark. Databricks develops a web interface for automated cluster management and IPython-style notebooks for code development, and it hosts Spark Summit, the largest conference for Spark. Databricks grew from a team of seven at the AMPLab at the University of California, Berkeley.

Spark began in 2009 as a research project at UC Berkeley's AMPLab. Matei Zaharia's PhD dissertation, An Architecture for Fast and General Data Processing on Large Clusters, presents the problems with specialized systems and explains why a unified solution was needed. Apache Spark was developed as a distributed computing framework, written in Scala, as an alternative to Google's MapReduce. Spark was donated to the Apache Software Foundation as an Incubator project in June 2013 and graduated to a top-level project in February 2014. Since then Apache Spark has become one of the most active and widely used open source projects for data analytics. Hundreds of people contribute to the project each year, and it continues to add new libraries.

How to get started

Databricks is available as a Community Edition, as a 14-day free trial on AWS, and on Microsoft Azure. The Community Edition is easy to get started with: it provides a single 6 GB cluster with no worker nodes and lets you share your work publicly. This is a great way to learn and to develop your solutions. The 14-day free trial spins up a cluster that can scale to any size and adds access to a job scheduler, advanced security, and integration with BI tools for visualization.

Try Databricks at this link: https://databricks.com/try-databricks

Microsoft Azure Databricks also includes a 14-day free trial. In addition to the features listed for Databricks on AWS and the Community Edition, the Azure solution allows for integration with SQL Data Warehouse.

Microsoft Azure Databricks: https://azure.microsoft.com/en-us/services/databricks/




