A Guide to Women In Unified Analytics Events at Spark+AI Summit Europe
Spark + AI Summit is Europe’s largest data and machine learning conference, and the big news in 2019 is how many women are driving some of the greatest advances in big data, machine learning, and data...
Engineering population scale Genome-Wide Association Studies with Apache...
Try this notebook series in Databricks The advent of genome-wide association studies (GWAS) in the late 2000s enabled scientists to begin to understand the causes of complex diseases such as diabetes...
Democratizing Financial Time Series Analysis with Databricks
Try this notebook in Databricks Introduction The role of data scientists, data engineers, and analysts at financial institutions includes (but is not limited to) protecting hundreds of billions of...
How Informatica Data Engineering Goes Hadoop-less with Databricks
Back in May, we announced our partnership with Informatica to build out a rich set of integrations between our two platforms. It’s been exciting work for the team because of what we can do for joint...
Delta Lake Now Hosted by the Linux Foundation to Become the Open Standard for...
At today’s Spark + AI Summit Europe in Amsterdam, we announced that Delta Lake is becoming a Linux Foundation project. Together with the community, the project aims to establish an open standard for...
Managed MLflow Now Available on Databricks Community Edition
In February 2016, we introduced Databricks Community Edition, a free edition for big data developers to learn and get started quickly with Apache Spark. Since then our commitment to foster a community...
Introducing the MLflow Model Registry
At today’s Spark + AI Summit in Amsterdam, we announced the availability of the MLflow Model Registry, a new component in the MLflow open source ML platform. Since we introduced MLflow at Spark+AI...
Introducing Glow: an open-source toolkit for large-scale genomic analysis
The key to solving some of today’s most challenging medical problems lies in the analysis of genomics data. Understanding the impact of the minor changes in an individual’s genome on their overall...
Scaling Financial Time Series Analysis Beyond PCs and Pandas: On-Demand...
On Oct 9th, 2019, we hosted a live webinar, “Scaling Financial Time Series Analysis Beyond PCs and Pandas,” with Junta Nakai, Industry Leader Financial Services at Databricks, and Ricardo Portilla,...
Spark + AI in Amsterdam: European Summit Recap, Keynote Videos, & Announcements
Spark + AI Summit Europe 2019 came to Amsterdam this past week! Over 2,300 data scientists, data engineers, and global business leaders from 63 different countries descended upon the RAI Amsterdam...
Simplify Data Lake Access with Azure AD Credential Passthrough
Azure Databricks brings together the best of Apache Spark, Delta Lake, and the Azure cloud. The close partnership provides integrations with Azure services, including Azure’s cloud-based role-based...
Scaling Hyperopt to Tune Machine Learning Models in Python
Try the Hyperopt notebook to reproduce the steps outlined below and watch our on-demand webinar to learn more. Hyperopt is one of the most popular open-source libraries for tuning Machine Learning...
Why we are investing 100 million euros in our European Development Center
A few days ago, we announced an investment of 100 million euros in our European Development Center in Amsterdam. I want to take a moment to describe why this is a pivotal moment for Databricks and why...
Scalable near real-time S3 access logging analytics with Apache Spark™ and...
The original blog is from Viacheslav Inozemtsev, Senior Data Engineer at Zalando, reproduced with permission. Introduction Many organizations use AWS S3 as their main storage infrastructure for their...
Solving the Challenge of Big Data Cloud Migration with WANdisco, Databricks...
Migrating from Hadoop on-premises to the cloud has been a common theme in recent Databricks blog posts and conference sessions. They’ve identified key considerations, highlighted partnerships and...
New Microsoft Azure Data Warehouse Service and Azure Databricks Combine...
In the last two years since it first became available, thousands of companies have adopted Azure Databricks, making it one of the fastest growing data and AI services on Microsoft Azure. Customers now...
Celebrating Growth at Databricks and 1,000 Employees!
This November, Databricks hired our 1,000th full-time employee! Founded in Berkeley in 2013, our six co-founders created Databricks to help data teams solve the world’s toughest problems – and since...
Using AutoML Toolkit’s FamilyRunner Pipeline APIs to Simplify and Automate...
Try this Loan Risk with AutoML Pipeline API Notebook in Databricks Introduction In the post Using AutoML Toolkit to Automate Loan Default Predictions, we had shown how the Databricks Labs’ AutoML...
Automate and Fast-track Data Lake and Cloud ETL with Databricks and StreamSets
Data lake ingestion is a critical component of a modern data infrastructure. But enterprises often run into challenges when they have to use this data for analytics and machine learning workloads....
Use Databricks Pools to Speed up your Data Pipelines and Scale Clusters Quickly
Data Engineering teams deploy short, automated jobs on Databricks. They expect their clusters to start quickly, execute the job, and terminate. Data Analytics teams run large auto-scaling, interactive...