Data Sources

Unify your entire machine learning workflow with all your favorite data science tools, languages, frameworks, and compute. Connect any data source, algorithm, package, and programming language.

Flexible

Use any language, AI framework, and compute environment. Integrate and version any kind of data for reuse in any project, experiment, or notebook.

Interactive

Use any development environment, such as JupyterLab or RStudio, with pre-installed dependencies and version control.

Unified

One unified environment to manage, build, and deploy your machine learning models with all your favorite data science tools.

Apache Kafka is an open-source stream-processing software platform that aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Apache Cassandra is a free and open-source, distributed, wide-column NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

MySQL is an open-source relational database management system. Its name is a combination of “My”, the name of co-founder Michael Widenius’s daughter, and “SQL”, the abbreviation for Structured Query Language.

Amazon Simple Storage Service (Amazon S3) is a service offered by AWS that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.

Amazon Redshift is a cloud-based big data warehouse solution offered by Amazon. The platform provides a storage system that lets companies store petabytes of data in easy-to-access “clusters” that can be queried in parallel.

PostgreSQL is a powerful, open-source object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads.

Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google’s cloud with advanced security and sharing capabilities.

BigQuery is a fully managed data warehouse, exposed as a RESTful web service, that enables scalable, cost-effective, and fast analysis of big data and works in conjunction with Google Cloud Storage. It is a serverless Software as a Service (SaaS) offering that can be used alongside MapReduce.

Azure Blob storage is Microsoft’s object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn’t adhere to a particular data model or definition, such as text or binary data.