Announcing a New Redash Connector for Databricks
We’re happy to introduce a new, open source connector with Redash, a cloud-based SQL analytics service, to make it easy to query data lakes with Databricks. Traditionally, data analyst teams face...
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
This is a joint engineering effort between the Databricks Apache Spark engineering team — Wenchen Fan, Herman van Hovell and MaryAnn Xue — and the Intel engineering team — Ke Jia, Haifeng Chen and...
Vectorized R I/O in Upcoming Apache Spark 3.0
R is one of the most popular computer languages in data science, specifically dedicated to statistical analysis with a number of extensions, such as RStudio addins and other R packages, for data...
Monitor Your Databricks Workspace with Audit Logs
Cloud computing has fundamentally changed how companies operate – users are no longer subject to the restrictions of on-premises hardware deployments such as physical limits of resources and onerous...
Customer Lifetime Value Part 1: Estimating Customer Lifetimes
Download the Customer Lifetimes Part 1 notebook to demo the solution covered below. The biggest challenge every marketer faces is how to best spend money to profitably grow their brand. We want to...
How the Minnesota Twins Scaled Pitch Scenario Analysis to Measure Player...
Statistical Analysis in the Game of Baseball. A single pitch in Major League Baseball (MLB) generates tens of megabytes of data, from pitch movement to ball rotation to hitter behavior to the movement...
Automate continuous integration and continuous delivery on Databricks using...
Contents: Overview; Why do we need yet another deployment framework?; Simplifying CI/CD on Databricks via reusable templates; Development lifecycle using Databricks Deployments; How to create and deploy a...
Modernizing Risk Management Part 2: Aggregations, Backtesting at Scale and...
Understanding and mitigating risk is at the forefront of any financial services institution. However, as previously discussed in the first blog of this two-part series, banks today are still struggling...
Accelerating developers by ditching the data center
Guest blog by R Tyler Croy, Director of Platform Engineering at Scribd. People don’t tend to get excited about the data platform. It is often regarded much like road infrastructure: nobody thinks much...
Data Teams Unite! Countdown to Spark + AI Summit
Spark + AI Summit 2020 is now virtual and free! June 22-26 is just around the corner and the excitement is building! More sessions. More speakers. 4x More training. And more of the world’s data...
Media and Entertainment Sessions You Don’t Want to Miss at Spark + AI Summit...
For years, the Spark + AI Summit has been the premier meeting place for organizations looking to build artificial intelligence (AI) applications at scale with leading technologies such as Apache...
Financial Services Sessions You Don’t Want to Miss at Spark + AI Summit 2020
Radical transformation is the theme of 2020, with customers demanding personalized products, improved protection against fraud, and digital experiences that match every small shift in behavior. Banks,...
A Guide to the MLflow Talk at Spark + AI Summit 2020
It’s been 2 years since we originally launched MLflow, an open source platform for the full machine learning lifecycle, and we are thrilled and humbled by the adoption and impact it has gained in the...
Enterprise Cloud Service Public Preview on AWS
At Databricks, we have had the opportunity to collaborate with companies that have transformed the way people live. Some of our customers have developed life saving drugs, delivered industry-first user...
Accelerating Somatic Variant Calling with the Databricks TNSeq Pipeline
Genetic analyses are a critical tool in revolutionizing how we treat cancer. By understanding the mutations present in tumor cells, researchers can gain clues that lead to drug targets and eventually...
Simplify Data Conversion from Apache Spark to TensorFlow and PyTorch
Petastorm is a popular open-source library from Uber that enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. We are excited to...
Healthcare and Life Sciences Sessions You Don’t Want to Miss at Spark + AI...
The healthcare industry is in a rapid state of change. The COVID-19 pandemic has shined a light on how critical it is for healthcare payers, providers, pharmaceutical companies and government agencies...
Retail and Consumer Goods Sessions You Don’t Want to Miss at Spark + AI...
The current economic environment is having a significant impact on the Retail and Consumer Goods sector. Rapid changes in how consumers shop are forcing companies to rethink their sales, marketing, and...
On-Demand Virtual Session: Customer Lifetime Value
Before you can provide personalized services and offers to your customers, you need to know who they are. In this virtual workshop, retail and media experts will demonstrate how to build advanced...
Simplify Python environment management on Databricks Runtime for Machine...
Today we announce the release of %pip and %conda notebook magic commands to significantly simplify Python environment management in Databricks Runtime for Machine Learning. With the new magic commands,...