Improving Threat Detection in a Big Data World
High-profile cybersecurity breaches dominated headlines in 2017. In the first half of the year, over 1.9B records were stolen. That’s more than 7,000 records breached every minute. And the fallout from...
Spark Summit is Becoming the Spark + AI Summit
We’re excited to announce that Spark Summit is expanding its coverage in 2018 to include in-depth content on artificial intelligence. We are also renaming the conference Spark + AI Summit. AI has...
The Architecture of the Next CERN Accelerator Logging Service
This is a community guest blog from Jakub Wozniak, a software engineer and project technical lead at CERN physics laboratory, further expounding and complementing his keynote at Spark Summit EU in...
Overstock Marketing + Databricks = Data Science at Scale
This is a guest post from Chris Robison, Head of Marketing Data Science at Overstock.com. At Overstock.com we’ve never had a problem with a lack of data. At 19 years old, we have one of the most...
Unifying People Processes and Platform – The Movie
Today we released our Databricks Unified Analytics Platform video. This short video illustrates to analytics leaders how Databricks can unify their analytics efforts onto one platform. This unification...
Databricks and Apache Spark 2017 Year in Review
At Databricks we welcome the dawn of the New Year 2018 by reflecting on what we achieved collectively as a company and community in 2017. In this blog, we elaborate on the three themes: unification,...
Databricks Cache Boosts Apache Spark Performance
We are excited to announce the general availability of Databricks Cache, a Databricks Runtime feature as part of the Unified Analytics Platform that can improve the scan speed of your Apache Spark...
Meltdown and Spectre’s Performance Impact on Big Data Workloads in the Cloud
Last week, the details of two industry-wide security vulnerabilities, known as Meltdown and Spectre, were released. These exploits enable cross-VM and cross-process attacks by allowing untrusted...
Meltdown and Spectre: Exploits and Mitigation Strategies
In an earlier blog post, we analyzed the performance impact of Meltdown and Spectre on big data workloads in the cloud. In this blog post, we explain these exploits, their mitigation strategies and...
Matei Zaharia’s 5 predictions about AI in 2018
Over the past few years, the demand for artificial intelligence (AI) and machine learning capabilities has surged with innovations in natural language processing, task automation, and predictions. From...
Accelerate Innovation with Microsoft Azure Databricks
It’s hard to believe that we are already three weeks into 2018. If you’re still struggling to get valuable insights from your data, now is the perfect time to try something new! We recently announced...
Introducing Apache Spark 2.3
Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want to thank the Apache Spark community for all their valuable...
Apache Spark 2.3 with Native Kubernetes Support
This is a community blog from Anirudh Ramanathan and Palak Bhatia, software engineer and product manager respectively at Google, working in the Kubernetes team. They are part of the group of companies...
Announcing Machine Learning Model Export in Databricks
In recent years, machine learning has become ubiquitous in industry and production environments. Both academic and industry institutions had previously focused on training and producing models, but the...
Introducing Stream-Stream Joins in Apache Spark 2.3
Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner join and some types of outer joins) between a streaming and a static DataFrame/Dataset. With the release of...
Selected Sessions to Watch for at Spark + AI Summit 2018
Early last month, we announced our agenda for Spark + AI Summit 2018, with over 180 selected talks with 11 tracks and training courses. For this summit, we have added four new tracks to expand its...
Introducing Low-latency Continuous Processing Mode in Structured Streaming in...
Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons. First, it made developer’s experience...
Azure Databricks, industry-leading analytics platform powered by Apache Spark™
The confluence of cloud, data, and AI is driving unprecedented change. The ability to utilize data and turn it into breakthrough insights is foundational to innovation today. Our goal is to empower...
Introducing Click: The Command Line Interactive Controller for Kubernetes
Click is an open-source tool that lets you quickly and easily run commands against Kubernetes resources, without copy/pasting all the time, and that easily integrates into your existing command line...
Introducing Data Brick™: The Building Block of DataBricks’ Unified Analytics...
As a digital society built around data and devices, we have reached a pivotal juncture where data and Artificial Intelligence must be accessible to everyone. Riding this trend, many homes now contain...