AML Solutions at Scale Using Databricks Lakehouse Platform
Anti-Money Laundering (AML) compliance has undoubtedly been one of the top agenda items for regulators providing oversight of financial institutions across the globe. As AML evolved and became more...
Unlocking The Power of Health Data With a Modern Data Lakehouse
A single patient produces approximately 80 megabytes of medical data every year. Multiply that across thousands of patients over their lifetime, and you’re looking at petabytes of patient data that...
How Databricks’ Data Team Built a Lakehouse Across 3 Clouds and 50+ Regions
The internal logging infrastructure at Databricks has evolved over the years and we have learned a few lessons along the way about how to maintain a highly available log pipeline across multiple clouds...
The Three Things CXO’s Prioritize in Their Data and AI Strategy
Leveraging data (internal and external) and customer analytics to innovate and create competitive advantages is more powerful than it has ever been. This popular practice is fueled by the growing...
Top Considerations When Migrating Off of Hadoop
Apache Hadoop was created more than 15 years ago as an open source, distributed storage and compute platform designed for large data sets and large-scale batch processing. Early on, it was cheaper than...
Improving Patient Insights With Textual ETL in the Lakehouse Paradigm
This is a collaborative post from Databricks and Forest Rim Technology. We thank Bill Inmon, Founder and CEO, and Mary Levins, Chief Data Officer, of Forest Rim for their contributions. The amount...
Monitoring ML Models With Model Assertions
This is a collaborative post from Databricks and the Stanford University Computer Science Department. We thank Daniel Kang, Deepti Raghavan and Peter Bailis of Stanford University for their...
The Delta Between ML Today and Efficient ML Tomorrow
Delta Lake and MLflow both come up frequently in conversation but often as two entirely separate products. This blog will focus on the synergies between Delta Lake and MLflow for machine learning use...
Getting Started With Ingestion into Delta Lake
Ingesting data can be hard and complex since you either need to use an always-running streaming platform like Kafka or you need to be able to keep track of which files haven’t been ingested yet. In...
Augment Your SIEM for Cybersecurity at Cloud Scale
Over the last decade, security information and event management tools (SIEMs) have become a standard in enterprise security operations. SIEMs have always had their detractors. But the explosion of cloud...
Databricks Lecture Series at UC Berkeley School of Information
This is a collaborative post from Databricks and UC Berkeley. We thank Tia Foss, Director of Philanthropy, UC Berkeley School of Information, for her contributions. Databricks began in the computer...
An Experimentation Pipeline for Extracting Topics From Text Data Using PySpark
This post is part of a series of posts on topic modeling. Topic modeling is the process of extracting topics from a set of text documents. This is useful for understanding or summarizing large...
How We Built Databricks on Google Kubernetes Engine (GKE)
Our release of Databricks on Google Cloud Platform (GCP) was a major milestone toward a unified data, analytics and AI platform that is truly multi-cloud. Databricks on GCP, a jointly-developed service...
5 Key Steps to Successfully Migrate From Hadoop to the Lakehouse Architecture
The decision to migrate from Hadoop to a modern cloud-based architecture like the lakehouse architecture is a business decision, not a technology decision. In a previous blog, we dug into the reasons...
Introducing Support for gp3, Amazon’s New General Purpose SSD Volume
Databricks clusters on AWS now support gp3 volumes, the latest generation of Amazon Elastic Block Store (EBS) general purpose SSDs. gp3 volumes offer consistent performance, cost savings and the...
How We Achieved High-bandwidth Connectivity With BI Tools
Business Intelligence (BI) tools such as Tableau and Microsoft Power BI are notoriously slow at extracting large query results from traditional data warehouses because they typically fetch the data in...
Announcing the Databricks Beacons Program
With roots in academia and open source, we know much of Databricks’ success is due to the community: the data scientists, data engineers, developers, data architects, data analysts, open-source...
How Building Apache Zeppelin Led Me to Databricks
Today, I am excited to announce that I have officially joined Databricks as an Engineer on the Data Science team. This move comes after over a year of founding and running Staroid, a cloud-based...
Getting to Know Databricks India
India is a vast country with extreme variations. A one-size-fits-all workplace does not do it justice. Although continued urbanization, transportation, and infrastructure are the foundation for...
Mastering the Next Level: Leveraging Data and AI in the Gaming Sector
How do you take 10k events per second from 30M users to create a better gamer experience? How can a small data team build more automated workflows to grow impact across all business units, from finance...