Announcing Hackathon for Social Good
Data Teams Unite! We’re excited to announce our first-ever virtual and global hackathon, where you’ll form data teams to help tackle climate change, the COVID-19 pandemic or issues unique to your local...
How a Fresh Approach to Safety Stock Analysis Can Optimize Inventory
Refer to the accompanying notebook for more details. A manufacturer is working on an order for a customer only to find that the delivery of a critical part is delayed by a supplier. A retailer...
Glow 0.3.0 Introduces New Large-Scale Genomic Analysis Features
In October of last year, Databricks and the Regeneron Genetics Center® partnered together to introduce Project Glow, an open-source analysis tool aimed at empowering genetics researchers to work on...
New study: Databricks delivers nearly $29 million in economic benefits and...
A new commissioned study by Forrester Consulting on behalf of Databricks finds that Databricks customers experience revenue acceleration, improved data team productivity and infrastructure savings...
Evolving the Databricks brand
Some brands start out as, well, brands. A lot of work goes into the concept and painting the picture before the business is ever launched. Databricks is different. It always has been and always will...
Faster SQL Queries on Delta Lake with Dynamic File Pruning
There are two time-honored optimization techniques for making queries run faster in data systems: process data at a faster rate or simply process less data by skipping non-relevant data. This blog post...
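To make the "process less data" idea concrete, here is a minimal Python sketch of min/max file skipping. It assumes each data file carries per-column minimum and maximum statistics; the `FileStats` class and `prune_files` function are hypothetical names for illustration, not Delta Lake APIs. A file whose value range cannot contain any requested key is skipped without being read. Roughly speaking, dynamic file pruning obtains those keys at query time from the filtered side of a join; here they are simply passed in.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FileStats:
    """Hypothetical per-file min/max statistics for a single column."""
    path: str
    min_value: int
    max_value: int


def prune_files(files: Iterable[FileStats], keys: Iterable[int]) -> List[str]:
    """Return only the files whose [min, max] range could contain one of the keys."""
    key_set = set(keys)
    kept = []
    for f in files:
        # Keep the file if at least one key falls inside its value range;
        # otherwise it cannot hold matching rows and is never read.
        if any(f.min_value <= k <= f.max_value for k in key_set):
            kept.append(f.path)
    return kept


files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# Keys that might come from the filtered side of a join.
print(prune_files(files, [150, 175]))  # ['part-1.parquet']
```

The statistics check is cheap compared with reading a file, which is why skipping at this level pays off when most files cannot match.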
Intern Tips for a Virtual Databricks Internship
Photo: Winter 2020 Interns at our Spark Social S’mores Event.
At Databricks, we host interns year-round, and we love sharing their experiences working on impactful projects that help data teams solve the...
Azure Databricks Security Best Practices
Azure Databricks is a Unified Data Analytics Platform that is a part of the Microsoft Azure Cloud. Built upon the foundations of Delta Lake, MLflow, Koalas and Apache Spark™, Azure Databricks is a...
How to build a Quality of Service (QoS) analytics solution for streaming...
The Importance of Quality to Streaming Video Services; Databricks QoS Solution Overview; Video QoS Solution Architecture; Making Your Data Ready for Analytics; Creating the Dashboard / Virtual Network...
Databricks Launches Global University Alliance Program
We are excited to announce the Databricks University Alliance, a global program to help students get hands-on experience using Databricks for both in-person learning and in virtual classrooms....
Now on Databricks: A Technical Preview of Databricks Runtime 7 Including a...
Introducing Databricks Runtime 7.0 Beta. We’re excited to announce that the Apache Spark 3.0.0-preview2 release is available on Databricks as part of our new Databricks Runtime 7.0 Beta. The...
Fighting Cyber Threats in the Public Sector with Scalable Analytics and AI
Watch our on-demand webinar Real-time Threat Detection, Analytics and AI in the Public Sector to learn more and see a live demo. In 2019, there were 7,098 data breaches exposing over 15.1 billion...
A Convolutional Neural Network Implementation For Car Classification
Convolutional Neural Networks (CNNs) are state-of-the-art neural network architectures that are primarily used for computer vision tasks. CNNs can be applied to a number of different tasks, such as image...
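The core operation inside a convolutional layer is a 2D convolution: a small kernel slides over the image and produces a weighted sum at each position. Below is a minimal pure-Python sketch with no padding and a stride of 1. Like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped). The `conv2d` function and the edge-detecting kernel are illustrative choices, not code from the article.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding), stride-1 2D cross-correlation of image with kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih - kh + 1, iw - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j).
            out[i][j] = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(kh)
                for n in range(kw)
            )
    return out


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A horizontal-gradient kernel that responds to vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is large only where the window straddles the edge. A CNN learns many such kernels from data instead of hand-picking them.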
Shrink Training Time and Cost Using NVIDIA GPU-Accelerated XGBoost and Apache...
Guest Blog by Niranjan Nataraja and Karthikeyan Rajendran of Nvidia. Niranjan Nataraja is a lead data scientist at Nvidia and specializes in building big data pipelines for data science tasks and...
Schema Evolution in Merge Operations and Operational Metrics in Delta Lake
Try this notebook to reproduce the steps outlined below. We recently announced the release of Delta Lake 0.6.0, which introduces schema evolution and performance improvements in merge and operational...
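As a rough illustration of what schema evolution in a merge means, here is a pure-Python sketch of an upsert over lists of dicts. Rows are matched on a key, matched rows are updated, unmatched source rows are inserted, and any column present in the source but missing from the target is added to the target schema, with existing rows filled with `None`. This is not the Delta Lake API: in Delta Lake the merge runs on Spark, and in 0.6.0 this behavior is enabled through the `spark.databricks.delta.schema.autoMerge.enabled` setting. The function `merge_with_schema_evolution` and its arguments are hypothetical, and the sketch assumes the key is unique in both inputs.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def merge_with_schema_evolution(
    target: List[Row], target_columns: List[str], source: List[Row], key: str
) -> Tuple[List[Row], List[str]]:
    """Hypothetical upsert of source into target on `key`, evolving the schema.

    Assumes `key` is unique within target and within source.
    """
    # Evolve the schema: append source columns the target does not have yet.
    columns = list(target_columns)
    for row in source:
        for col in row:
            if col not in columns:
                columns.append(col)

    # Re-shape existing target rows to the evolved schema (missing values -> None).
    merged = {row[key]: {c: row.get(c) for c in columns} for row in target}

    for row in source:
        if row[key] in merged:
            # WHEN MATCHED: update every column supplied by the source row.
            merged[row[key]].update(row)
        else:
            # WHEN NOT MATCHED: insert, filling absent columns with None.
            merged[row[key]] = {c: row.get(c) for c in columns}
    return list(merged.values()), columns


target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
source = [
    {"id": 2, "name": "B", "country": "US"},
    {"id": 3, "name": "c", "country": "FR"},
]
rows, cols = merge_with_schema_evolution(target, ["id", "name"], source, "id")
print(cols)  # ['id', 'name', 'country']
```

Without schema evolution, the `country` column in the source would either be rejected or dropped; with it, the target schema grows to absorb the new column in the same operation.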
Manage and Scale Machine Learning Models for IoT Devices
A common data science internet of things (IoT) use case involves training machine learning models on real-time data coming from an army of IoT sensors. Some use cases demand that each connected device...
New Pandas UDFs and Python Type Hints in the Upcoming Release of Apache Spark...
Pandas user-defined functions (UDFs) are one of the most significant enhancements in Apache Spark for data science. They bring many benefits, such as enabling users to use Pandas APIs and improving...
MLOps takes center stage at Spark + AI Summit
As companies ramp up machine learning, the growth in the number of models they have under development begins to impact their set of tools, processes and infrastructure. Machine learning involves data...
Modernizing Risk Management Part 1: Streaming data-ingestion, rapid model...
Managing risk in financial services, especially in the banking sector, has increased in complexity over the past several years. First, new frameworks (such as FRTB) are being introduced...
Automating away engineering on-call workflows at Databricks
A Summer of Self-healing. This summer, I interned with the Cloud Infrastructure team. The team is responsible for building scalable infrastructure to support Databricks’s multi-cloud product, while using...