What is Azure Databricks?
- Azure Databricks is the result of a partnership between Microsoft and Databricks, the company founded by the original creators of Apache Spark.
- The service provides a cloud-based environment in which data scientists, data engineers, and business analysts can analyze data quickly and interactively, build models, and deploy workflows using Apache Spark.
- It is designed specifically for big data processing, and data scientists can take advantage of Spark's built-in APIs for languages such as SQL, Python, and Scala.
Why Azure Databricks?
- Connect to a wide range of data sources, both on premises and in the cloud (for example, Azure SQL Data Warehouse (ADW) and Azure Data Lake).
- Integrate with Git to get version control for your notebooks.
- Share notebooks and collaborate with peers in multiple languages (R, Python, SQL, and Scala) using the libraries of your choice. Real-time coauthoring, commenting, and automated versioning simplify collaboration while keeping you in control of your work.
- Quickly discover new insights with built-in interactive visualizations or with libraries such as matplotlib or ggplot. Export results and notebooks in HTML or IPython notebook (.ipynb) format, or build and share dashboards that stay up to date.
- Schedule notebooks to run machine learning and data pipelines automatically at scale, and create multi-stage pipelines using notebook workflows. Set up alerts and quickly access audit logs for easy monitoring and troubleshooting.