Databricks Launches Second MOOC: Scalable Machine Learning
We have been working in collaboration with professors at UC Berkeley and UCLA to produce two freely available Massive Open Online Courses (MOOCs). The first MOOC was released earlier this month and has...
MyFitnessPal Delivers New Feature, Speeds up Pipeline, and Boosts Team...
To learn more about how Databricks helped MyFitnessPal with analytics, check out an earlier article in the Wall Street Journal (log-in required) or download the case study. We are excited to announce that...
Guest blog: PMML Support in Spark MLlib
This is a guest blog from our friend Vincenzo Selvaggio. The recently released Apache Spark 1.4 introduces PMML support to MLlib for linear models and k-means clustering. This achievement is the result...
New Visualizations for Understanding Spark Streaming Applications
Earlier, we presented new visualizations introduced in Spark 1.4.0 to understand the behavior of Spark applications. Continuing the theme, this blog highlights new visualizations introduced...
Announcing SparkHub: A Community Site for Apache Spark
Today, we are happy to announce SparkHub (http://sparkhub.databricks.com), a service for the Apache Spark™ community to easily find the most relevant Spark resources on the web. SparkHub contains the...
Introducing R Notebooks in Databricks
Spark 1.4 was released on June 11 and one of the exciting new features was SparkR. I am happy to announce that we now support R notebooks and SparkR in Databricks, our hosted Spark service. Databricks...
Introducing Window Functions in Spark SQL
In this blog post, we introduce the new window function feature that was added in Spark 1.4. Window functions allow users of Spark SQL to calculate results such as the rank of a given row or a moving...
Joint Blog Post: Bringing ORC Support into Apache Spark
This is a joint blog post with our partner Hortonworks. Zhan Zhang is a member of technical staff at Hortonworks, where he collaborated with the Databricks team on this new feature. In version 1.2.0,...
Be Heard with the Spark Survey
At Databricks, we are constantly working to improve Apache Spark. To help us and the Spark community, we would love to hear from you to help set Spark’s future direction. A recent example of the...
Yesware Deploys Production Data Pipeline in Record Time with Databricks
We are happy to announce that Yesware chose Databricks to build its production data pipeline, completing the project in record time: just under three weeks. Press release:...
Using 3rd Party Libraries in Databricks: Spark Packages and Maven Libraries
In an earlier post, we described how you can easily integrate your favorite IDE with Databricks to speed up your application development. In this post, we will show you how to import 3rd party...
New Features in Machine Learning Pipelines in Spark 1.4
Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark 1.4, significantly extends the ML...
Diving into Spark Streaming’s Execution Model
With so many distributed stream processing engines available, people often ask us about the unique benefits of Spark Streaming. From early on, Apache Spark has provided a unified engine that natively...
Guest blog: SequoiaDB Connector for Apache Spark
This is a guest blog from Tao Wang at SequoiaDB. He is the co-founder and CTO of SequoiaDB, leading its long-term technology vision, and is responsible for the leadership of advanced technology...
Helping the Democratization of Big Data
When we started Databricks, we thought that extracting insights from big data was insanely difficult for no good reason. You almost needed an advanced degree to be able to get any meaningful work done....
Announcing the Databricks Academic Partners Program
Databricks was born from academic research and today we are giving back to the academic community with the Databricks Academic Partners program. This program will provide academic instructors and...
From Pandas to Apache Spark’s DataFrame
This is a cross-post from the blog of Olivier Girardot. Olivier is a software engineer and the co-founder of Lateral Thoughts, where he works on Machine Learning, Big Data, and DevOps solutions. With...
Spark 1.5 Preview Now Available in Databricks
We are excited to announce that starting today, Apache Spark 1.5.0 is available as a preview in Databricks. Our users can now choose to provision clusters with Spark 1.5 or previous Spark versions...
Spark Summit Europe Full Agenda Available Online
This October, join the Apache Spark community in Amsterdam at the Beurs Van Berlage for the very first Spark Summit in Europe! We are happy to announce that the full agenda is now finalized; you can...
Announcing Spark 1.5
The inaugural Spark Summit Europe will be held in Amsterdam this October. Check out the full agenda and get your ticket before it sells out! Today we are happy to announce the availability of Apache...