Big Graph Analytics with LynxKite & Spark
This is a guest blog from one of our partners: Lynx Analytics. About Lynx Analytics: Lynx Analytics is a data analytics consultancy firm with a focus on graph analytics and proprietary big graph...
Recent performance improvements in Apache Spark: SQL, Python, DataFrames, and...
In this post, we look back and cover recent performance efforts in Spark. In a follow-up blog post next week, we will look forward and share with you our thoughts on the future evolution of Spark’s...
Project Tungsten: Bringing Spark Closer to Bare Metal
In a previous blog post, we looked back and surveyed performance improvements made to Spark in the past year. In this post, we look forward and share with you the next chapter, which we are calling...
Spark Summit 2015 in San Francisco is just around the corner!
We’re proud to announce that the new Spark Summit website is live! This includes the full list of community talks along with the first set of keynotes. With over 260 submissions this year, the...
NTT DATA: Operating Spark clusters at thousands-core scale and use cases for...
This is a guest blog from one of our partners: NTT DATA Corporation. About NTT DATA Corporation: NTT DATA Corporation is a Japanese IT solution provider and the global IT services arm of NTT (Nippon...
Tuning Java Garbage Collection for Spark Applications
This is a guest post from our friends in the SSG STO Big Data Technology group at Intel. Join us at the Spark Summit to hear from Intel and other companies deploying Spark in production. Use the code...
Databricks Launches MOOC: Data Science on Spark
For the past several months, we have been working in collaboration with professors from the University of California Berkeley and University of California Los Angeles to produce two freely available...
Statistical and Mathematical Functions with DataFrames in Spark
Join us at Spark Summit to hear more about new functionalities of Apache Spark. Use the code Databricks20 to receive a 20% discount! We introduced DataFrames in Spark 1.3 to make Apache Spark much...
Join us for Engineer Office Hours at the Spark Summit
This post is about the upcoming Spark Summit in San Francisco. Tickets are selling fast, so register today to join us! Use the code Databricks20 to receive a 20% discount! We’re happy to announce that...
Simplify Machine Learning on Spark with Databricks
Join us at Spark Summit to hear more about new functionalities of Apache Spark. Use the code Databricks20 to receive a 20% discount! Spark MLlib is a core component of Apache Spark that allows data...
Making Databricks Cloud better for developers: IDE Integration
We have been working hard at Databricks to make our product more user-friendly for developers. Recently, we have added two new features that will allow developers to easily use external libraries – both...
Announcing SparkR: R on Spark
Join us at the Spark Summit to learn more about SparkR. Use the code Databricks20 to receive a 20% discount! I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an...
Huawei Embraces Open-Source Apache Spark
This is a guest blog from one of our partners: Huawei. Join us at the Spark Summit to hear from Intel and other companies deploying Spark in production. Use the code Databricks20 to receive a 20%...
Announcing Apache Spark 1.4
Join us at the Spark Summit to learn more about Spark 1.4. Use the code Databricks20 to receive a 20% discount! Today I’m excited to announce the general availability of Spark 1.4! Spark 1.4...
Databricks and IBM Collaborate to Enhance Apache Spark Machine Learning
At today’s Spark Summit, Databricks and IBM announced a joint effort to contribute key machine learning capabilities to the Apache Spark Project. Over the course of the next few months, Databricks and...
Databricks is now Generally Available
We are excited to announce today, at Spark Summit 2015, the general availability of Databricks, a hosted data platform from the team that created Apache Spark. With Databricks, you can...
Zen and the Art of Spark Maintenance with Cassandra
This is a guest post from our friends at DataStax. Apache Cassandra™ is a fully distributed, highly scalable database that allows users to create online applications that are always-on and can process...
Guest blog: How Customers Win with Spark on Hadoop
This is a guest post from our friends at MapR. This blog summarizes my conversations over the last few months with users who have deployed Spark in production on the MapR Distribution including...
A Look Back at Spark Summit 2015
We are delighted about the success of Spark Summit 2015 in San Francisco on June 15th and 16th, with three different sold-out Spark Training sessions on June 17th. This is the largest Spark Summit...
Understanding your Spark application through visualization
"The greatest value of a picture is when it forces us to notice what we never expected to see." (John Tukey) In the past, the Spark UI has been instrumental in helping users debug their applications. In...