Allow Simple Cluster Creation with Full Admin Control Using Cluster Policies
What is a Databricks cluster policy? A Databricks cluster policy is a template that restricts the way users interact with cluster configuration. Today, any user with cluster creation permissions is...
The data community raised a total of $101,626 to help organizations fight...
Our commitment to diversity and inclusion is inherent in our company values at Databricks, but recent events and protests around the world have reminded us that there’s much more we can do to bring...
A data-driven approach to Environmental, Social and Governance
The future of finance goes hand in hand with social responsibility, environmental stewardship and corporate ethics. In order to stay competitive, Financial Services Institutions (FSI) are increasingly...
Azure Databricks Now Available in Azure Government (Public Preview)
We are excited to announce that Azure Databricks is now in Microsoft’s Azure Government region, enabling new data and AI use cases for federal agencies, state and local governments, public...
How to Extract Market Drivers at Scale Using Alternative Data
Watch the on-demand webinar Alternative Data Analytics with Python for a demonstration of the solution discussed in this blog and/or download the following notebooks to try it yourself. Stock Analysis...
Spark + AI Summit Reflections
Developers attending a conference have high expectations: what knowledge gaps they’ll fill; what innovative ideas or inspirational thoughts they’ll take away; who to contact for technical questions,...
Analyzing Customer Attrition in Subscription Models
Download the notebooks to demo the solution covered below The subscription model is experiencing a renaissance. Gone are the days of the penny music CD clubs, replaced by an ever-increasing assortment...
Bucket Brigade — Securing Public S3 Buckets
Are your Amazon S3 buckets secure? Do you know which ones are public or private? Do you even know which ones are supposed to be? Data breaches are expensive. Facebook notoriously exposed 540 million...
Optimizing User Defined Functions with Apache Spark™ and R in the Real World:...
Introduction In part 1 we talked about how Baseball Operations for the Minnesota Twins wanted to run up to 20k simulations on 15 million historical pitches – 300 billion total simulations – to more...
A Comprehensive Look at Dates and Timestamps in Apache Spark™ 3.0
Apache Spark is a very popular tool for processing structured and unstructured data. When it comes to processing structured data, it supports many basic data types, like integer, long, double, string,...
Modern Industrial IoT Analytics on Azure – Part 1
This post and the three-part series about Industrial IoT analytics were jointly authored by Databricks and members of the Microsoft Cloud Solution Architecture team. We would like to thank Databricks...
Data with impact: A look back at the first Hackathon for Social Good
As global citizens, more and more businesses are investing in corporate social responsibility (CSR) programs to help solve the issues of systemic social injustice and economic inequity highlighted by...
On Demand Virtual Workshop: Predicting Churn to Improve Customer Retention
The proliferation of subscription models has increased across industries: from direct-to-consumer brands for shaving supplies and prepared meals to streaming media services, at-home fitness, auto...
Modern Industrial IoT Analytics on Azure – Part 2
Introduction In part 1 of the series on Modern Industrial Internet of Things (IoT) Analytics on Azure, we walked through the big data use case and the goals for modern IIoT analytics, shared a...
Data Teams Unite! Spark + AI Summit Recap
It’s been a few weeks since Spark + AI Summit 2020 and we can still feel the amazing energy from this global virtual event. Judging from the positive feedback across social media posts, press coverage...
Interoperability between Koalas and Apache Spark
Koalas is an open source project which provides a drop-in replacement for pandas, enabling efficient scaling out to hundreds of worker nodes for everyday data science and machine learning. After over...
How to accelerate your ETL pipelines from 18 hours to as fast as 5 minutes...
Azure Databricks enables organizations to migrate on-premises ETL pipelines to the cloud to dramatically accelerate performance and increase reliability. If you are using SQL Server Integration...
Flagging at-risk subscribers for direct-to-consumer media services
“The biggest problem for streaming services is not so much getting new members, it’s holding them. It’s the churn factor.” Tom Rogers, Executive Chairman at WinView, Inc and former NBC Cable President...
Modern Industrial IoT Analytics on Azure – Part 3
In part 2 of this three-part series on Azure data analytics for modern industrial internet of things (IIoT) applications, we ingested real-time IIoT data from field devices into Azure and performed...
Top 5 Reasons to Convert Your Cloud Data Lake to a Delta Lake
If you examine the agenda for any of the Spark Summits in the past five years, you will notice that there is no shortage of talks on how best to architect a data lake in the cloud using Apache Spark™...