Sharethrough Selects Databricks to Discover Hidden Patterns in Ad Serving...
We’re really excited to announce that Sharethrough has selected Databricks Cloud to discover hidden patterns in customer behavior data. Press release:...
Announcing Spark 1.3!
Today I’m excited to announce the general availability of Spark 1.3! Spark 1.3 introduces the widely anticipated DataFrame API, an evolution of Spark’s RDD abstraction designed to make crunching large...
Spark’ing an Anti Money Laundering Revolution
This is a guest blog from one of our partners, Tresata. Tresata and Databricks announced a real-time, Spark- and Hadoop-powered Anti-Money Laundering solution earlier today. Tresata’s predictive...
Databricks Launches “Jobs” Feature for Production Workloads
Databricks Cloud now includes a new feature called Jobs, which supports running production pipelines made up of standalone Spark applications. Jobs includes a scheduler that enables data...
PanTera Big Data Visualization Leverages the Power of Databricks Cloud
This is a guest blog from one of our partners, UnCharted (formerly known as Oculus Info, Inc.). About PanTera™: PanTera was created with the fundamental guiding principles that visualization,...
Using MongoDB with Spark
This is a guest blog from Matt Kalan, a Senior Solution Architect at MongoDB. Introduction: The broad spectrum of data management technologies available today makes it difficult for users to discern...
What’s new for Spark SQL in Spark 1.3
The Spark 1.3 release represents a major milestone for Spark SQL. In addition to several major features, we are very excited to announce that the project has officially graduated from Alpha, after...
Topic modeling with LDA: MLlib meets GraphX
Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or used for featurization and dimensionality reduction...
Improvements to Kafka integration of Spark Streaming
Apache Kafka is rapidly becoming one of the most popular open-source stream ingestion platforms, and we see the same trend among Spark Streaming users. Hence, in Spark 1.3, we have focused...
Spark Turns Five Years Old!
Today, we’re celebrating an important milestone for the Spark project: it’s now been five years since Spark was first open-sourced. When we first decided to release our research code at UC Berkeley,...
Spark 2.0: Rearchitecting Spark for Mobile Platforms
Yesterday, to celebrate Spark’s fifth birthday, we looked back at the history of the project. Today, we are happy to announce the next major chapter of Spark development: an architectural overhaul...
Learning how to write Spark applications in Databricks Cloud with the...
We built Databricks Cloud on top of the Apache Spark framework to make data analysis simple. In the same spirit, we want to make adopting and using Databricks Cloud simple. Developers and data...
Timeful Chooses Databricks Cloud to Enable Intelligent Time Management
We are thrilled to announce that Timeful chose Databricks Cloud to enable intelligent time management with data analytics. Press release:...
A Look Back at Spark Summit East
We are delighted about the success of the first Spark Summit East, held in New York City on March 18th. The summit was attended by a sold-out crowd of over 900 people from more than 300 organizations....
Deep Dive into Spark SQL’s Catalyst Optimizer
Spark SQL is one of the newest and most technically involved components of Spark. It powers both SQL queries and the new DataFrame API. At the core of Spark SQL is the Catalyst optimizer, which...
Running Spark GraphX algorithms on Library of Congress subject heading SKOS
This is a guest post from Bob DuCharme. The original article appeared at: http://www.snee.com/bobdc.blog/2015/04/running-spark-graphx-algorithm.html. Well, one algorithm, but a very cool one. Last month, in...
Celtra Scales Big Data Analysis Projects Six-Fold with Databricks Cloud
We are thrilled to announce that Celtra selected Databricks Cloud to scale its big data analysis projects, increasing the amount of ad-hoc analysis it performs six-fold. Press release:...
The Easiest Way to Run Spark Jobs
Recently, Databricks added a new feature, Jobs, to our cloud service. You can find a detailed overview of this feature here. This feature allows one to programmatically run Spark jobs on Amazon’s EC2...
New MLlib Algorithms in Spark 1.3: FP-Growth and Power Iteration Clustering
This is a guest blog post from Huawei’s global big data team. Huawei, a Fortune Global 500 private company, put together a global team in 2013 to work on Spark community projects and contribute...
Analyzing Apache Access Logs with Databricks Cloud
Databricks Cloud provides a powerful platform to process, analyze, and visualize big and small data in one place. In this blog, we will illustrate how to analyze access logs of an Apache HTTP web...
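The access-log excerpt only introduces the task, so here is a minimal, single-machine sketch of the first step such an analysis typically needs: parsing each log line into structured fields and computing a few simple aggregates. It assumes the logs use the Apache Common Log Format (`host ident authuser [timestamp] "request" status bytes`). The regex, the `AccessLogRecord` type, and the `parse_line`/`summarize` helpers are hypothetical names chosen for illustration, not code from the post or from Databricks Cloud, and the sketch uses plain Python rather than Spark.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed input format: Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
# The pattern is not anchored at the end, so extra trailing fields
# (as in the Combined Log Format) are ignored.
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\d+|-)'
)


class AccessLogRecord(NamedTuple):
    """Hypothetical record type holding the fields of one parsed log line."""
    ip: str
    client_id: str
    user_id: str
    timestamp: str
    method: str
    endpoint: str
    protocol: str
    status: int
    content_size: int


def parse_line(line: str) -> Optional[AccessLogRecord]:
    """Parse one log line; return None if it does not match the assumed format."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    size = m.group(9)
    return AccessLogRecord(
        ip=m.group(1),
        client_id=m.group(2),
        user_id=m.group(3),
        timestamp=m.group(4),
        method=m.group(5),
        endpoint=m.group(6),
        protocol=m.group(7),
        status=int(m.group(8)),
        # The server logs "-" when no body was sent; count that as 0 bytes.
        content_size=0 if size == "-" else int(size),
    )


def summarize(lines: Iterable[str]) -> Tuple[Counter, List[Tuple[str, int]], float]:
    """Count status codes, find the top 3 endpoints, and average the response size."""
    # Drop lines that do not parse instead of failing the whole run.
    records = [r for r in map(parse_line, lines) if r is not None]
    status_counts = Counter(r.status for r in records)
    top_endpoints = Counter(r.endpoint for r in records).most_common(3)
    avg_size = sum(r.content_size for r in records) / len(records) if records else 0.0
    return status_counts, top_endpoints, avg_size
```

In a Spark job, a function like `parse_line` could plausibly be applied to an RDD of log lines with `map`, followed by a `filter` that drops the `None` results; the in-memory `Counter` aggregations in `summarize` would then become distributed counts.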