Personalizing the Customer Experience with Recommendations
Go directly to the Recommendation notebooks referenced throughout this post. Retail made a giant leap forward in the adoption of e-commerce in 2020. E-commerce as a percentage of total retail saw...
Natively Query Your Delta Lake With Scala, Java, and Python
Today, we’re happy to announce that you can natively query your Delta Lake with Scala and Java (via the Delta Standalone Reader) and Python (via the Delta Rust API). Delta Lake is an open-source...
How to Manage Python Dependencies in PySpark
Controlling the environment of an application is often challenging in a distributed computing environment – it is difficult to ensure all nodes have the desired environment to execute, it may be tricky...
Bayesian Modeling of the Temporal Dynamics of COVID-19 Using PyMC3
In this post, we look at how to use PyMC3 to infer the disease parameters for COVID-19. PyMC3 is a popular probabilistic programming framework that is used for Bayesian modeling. Two popular methods to...
Lakehouse Architecture Realized: Enabling Data Teams With Faster, Cheaper and...
Databricks was founded under the vision of using data to solve the world’s toughest problems. We started by building upon our open source roots in Apache Spark™ and creating a thriving collection of...
Over 200K Enrolled in Databricks Certification and Training
More than 200,000 individuals have participated in Databricks certification and training over the past two years, including thousands of partners. In the past year alone, over 75,000 individuals have...
Learn How Disney+ Built Their Streaming Data Analytics Platform With...
Martin Zapletal, Software Engineering Director at Disney+, is presenting at re:Invent 2020 with the session How Disney+ uses fast data ubiquity to improve the customer experience (must be registered to...
Over 200K Enrolled in Databricks Certification and Training
More than 200,000 individuals have participated in Databricks certification and training over the past four years, including thousands of partners. In the past year alone, over 75,000 individuals have...
Data Access Governance and 3 Signs You Need it
This is a guest authored post by Heather Devane, content marketing manager, Immuta. Cloud data analytics is only as powerful as the ability to access that data for use. Yet, the data stewards...
Leveling the Playing Field: HorovodRunner for Distributed Deep Learning Training
This is a guest post authored by Sr. Staff Data Scientist/User Experience Researcher Jing Pan and Senior Data Scientist Wendao Liu of leading health insurance marketplace eHealth. None generates...
Combining Rules-based and AI Models to Combat Financial Fraud
The financial service industry (FSI) is rushing towards transformational change, delivering transactional features and facilitating payments through new digital channels to remain competitive....
How to Save up to 50% on Azure ETL While Improving Data Quality
The challenges of data quality: One of the most common issues our customers face is maintaining high data quality standards, especially as they rapidly increase the volume of data they process, analyze...
Data Exfiltration Protection With Databricks on AWS
In this blog, you will learn a series of steps you can take to harden your Databricks deployment from a network security standpoint, reducing the risk of data exfiltration happening in your...
Ray & MLflow: Taking Distributed Machine Learning Applications to Production
This is a guest blog from software engineers Amog Kamsetty and Archit Kulkarni of Anyscale and contributors to Ray.io. In this blog post, we’re announcing two new integrations with Ray and MLflow: Ray...
Secure Cluster Connectivity Is Generally Available on Azure Databricks
This is a collaborative post co-authored by Principal Product Manager Premal Shah, Microsoft, and Principal Enterprise Readiness Manager Abhinav Garg, Databricks. We’re excited to announce the general...
How Lakehouses Solve Common Issues With Data Warehouses
Editor’s note: This is the first in a series of posts largely based on the CIDR paper Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics, with permission...
Automatically Evolve Your Nested Column Schema, Stream From a Delta Table...
We recently announced the release of Delta Lake 0.8.0, which introduces schema evolution and performance improvements in merge and operational metrics in table history. The key features in this release...
Accelerating ML Experimentation in MLflow
This fall, I interned with the ML team, which is responsible for building the tools and services that make it easy to do machine learning on Databricks. During my internship, I implemented several...
Amplify Insights into Your Industry With Geospatial Analytics
Data science is becoming commonplace and most companies are leveraging analytics and business intelligence to help make data-driven business decisions. But are you supercharging your analytics and...
Azure Databricks Achieves DoD Impact Level 5 (IL5) on Microsoft Azure Government
We are excited to announce that Azure Databricks has received a Provisional Authorization (PA) by the Defense Information Systems Agency (DISA) at Impact Level 5 (IL5), as published in the Department...