Unify your entire machine learning workflow with your favorite data science tools, languages, frameworks, and compute. Connect any data source, algorithm, package, or programming language.
Scikit-learn, also known as sklearn, is a free Python machine learning library featuring various classification, regression, and clustering algorithms, including support vector machines, random forests, and gradient boosting.
Amazon Simple Storage Service (Amazon S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
MySQL is an open-source relational database management system. Its name is a combination of “My”, the name of co-founder Michael Widenius’s daughter, and “SQL”, the abbreviation for Structured Query Language.
Amazon Redshift is a cloud-based big data warehouse solution offered by Amazon. The platform provides a storage system that lets companies store petabytes of data in easy-to-access “clusters” that can be queried in parallel.
BigQuery is a fully managed data warehouse, exposed as a RESTful web service, that enables scalable, cost-effective, and fast analysis of big data in conjunction with Google Cloud Storage. It is a serverless Software as a Service (SaaS) that may be used as a complement to MapReduce.
Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google’s cloud with advanced security and sharing capabilities.
Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads.
Azure Blob storage is Microsoft’s object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn’t adhere to a particular data model or definition, such as text or binary data.
Apache Kafka is an open-source stream-processing software platform that aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud and provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings.
MinIO is a high-performance, S3-compatible object store. It is built for large-scale AI/ML, data lake, and database workloads. It is software-defined and runs on any cloud or on-premises infrastructure.
NVIDIA DGX Station is a workstation that specializes in GPU acceleration for deep learning applications. It is optimized for accelerated data loading, data manipulation, and model training, delivering faster insights through its high performance and large GPU memory footprint.
An on-premises server is a physical, on-site server that a company must manage and maintain itself. While sometimes more costly in the short term, an on-premises deployment is often considered more secure and reliable.
Apache Spark is an open-source, general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Spark includes a machine learning library called MLlib.
Kubernetes (K8s) is an open-source system for automating the deployment, scaling, and management of containerized applications. It groups the containers that make up an application into logical units for easy management and discovery. It works with a range of container tools, including Docker.
Red Hat® OpenShift® is an enterprise-ready Kubernetes container platform with full-stack automated operations to manage hybrid cloud and multicloud deployments. Red Hat OpenShift is optimized to improve developer productivity and promote innovation.
VMware Enterprise PKS is a Kubernetes-based container solution with advanced networking, a private container registry, and life cycle management. Enterprise PKS simplifies the deployment and operation of Kubernetes clusters so you can run and manage containers at scale on private and public clouds.
Amazon Web Services offers a broad set of global cloud-based products including compute, storage, databases, analytics, networking, mobile, developer tools, management tools, IoT, security, and enterprise applications: on-demand, available in seconds, with pay-as-you-go pricing.
Google Cloud Platform, offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and YouTube.
Microsoft Azure is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed data centers.
K-Nearest Neighbors (KNN) is a classic algorithm for classification and regression. The algorithm takes a natural number K and a training set as input. At test time, for each sample, the algorithm finds the K elements of the training set closest to that sample and classifies it by a “plurality vote” of their labels (for regression, it averages their values instead).
A Recurrent Neural Network (RNN) is a class of neural networks in which connections between nodes form a directed graph along a temporal sequence. Because these connections can form cycles, the network carries an “internal memory” (a hidden state) from one step to the next, which lets it process sequences of inputs. This makes RNNs applicable to tasks such as handwriting recognition and speech recognition.
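A minimal pure-Python sketch of KNN classification follows. It assumes Euclidean distance and a training set given as (features, label) pairs; the function name `knn_classify` and the toy data are illustrative, not taken from any library.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Classify `query` by a plurality vote among its k nearest training samples.

    `train` is a list of (features, label) pairs, where features are
    equal-length sequences of numbers. Euclidean distance is an assumption;
    other metrics work the same way.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Plurality vote: the most common label among the k neighbors wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


training_set = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"),
    ((7.0, 7.5), "blue"),
    ((6.5, 5.5), "blue"),
]

print(knn_classify(training_set, (1.2, 1.4), k=3))  # red
print(knn_classify(training_set, (6.8, 6.1), k=3))  # blue
```

Because `Counter.most_common` lists equal counts in the order they were first seen, a tied vote here goes to the tied label whose member is closest to the query; that is one arbitrary tie-break among several reasonable ones.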
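To show how the recurrent connection acts as memory, here is a sketch of the forward pass of an Elman-style RNN with a single hidden unit and scalar inputs. The weights `w_x`, `w_h`, and `b` are hand-picked assumptions; a real RNN uses weight matrices learned by backpropagation through time.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Forward pass of a one-unit Elman RNN over a sequence of scalar inputs.

    At each step: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    The w_h * h_{t-1} term is the recurrent (cyclic) connection that carries
    information from earlier inputs forward, i.e. the network's memory.
    Returns the hidden state after every time step.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# An input of 1.0 at the first step, followed by zeros.
with_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.9, b=0.0)
without_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.0, b=0.0)

print(with_memory[-1] > 0.0)      # True: the first input still influences the last state
print(without_memory[-1] == 0.0)  # True: with w_h = 0, nothing is remembered
```

With `w_h = 0.9`, an input seen at the first step still affects the final hidden state; setting `w_h = 0` removes the recurrent edge, and the same input is forgotten after one step.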
Support Vector Machine (SVM) is a supervised learning model. The SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall.
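The sketch below trains a linear soft-margin SVM in pure Python by full-batch subgradient descent on the regularized hinge loss. This optimizer is a swapped-in, easy-to-read choice (libraries such as scikit-learn rely on dedicated solvers like libsvm and liblinear), and the hyperparameters `lam`, `lr`, and `epochs` are illustrative assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) for a linear soft-margin SVM by subgradient descent.

    Minimizes  lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b)))
    with labels in {+1, -1}. The hinge term pushes each example to the
    correct side with margin >= 1; the lam term keeps ||w|| small, which
    keeps the gap between the classes (2 / ||w||) wide.
    """
    n, dim = len(points), len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the gap or misclassified: hinge is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([svm_predict(w, b, p) for p in points])  # [1, 1, 1, -1, -1, -1]
print(svm_predict(w, b, (1.0, 1.0)), svm_predict(w, b, (-1.0, -1.0)))  # 1 -1
```

The learned `w` is the normal of the separating hyperplane, and a new point is assigned to a category by the sign of `w · x + b`, i.e. by the side of the gap on which it falls.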
A Convolutional Neural Network (CNN) is a class of deep neural networks, most commonly applied in the field of Computer Vision. CNNs take in an input matrix (such as an image), assign importance (learnable weights and biases) to various aspects or objects in the image, and learn to differentiate one from another.
Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the mode of the individual trees' predicted classes (for classification) or their mean prediction (for regression).
Logistic Regression is an ML algorithm used for classification problems; it is a predictive analysis algorithm based on the concept of probability. It feeds a weighted sum of the input features into the sigmoid function, σ(z) = 1 / (1 + e^(-z)), whose output always lies strictly between 0 (certainly negative) and 1 (certainly positive) and is read as the probability of the positive class. A threshold, commonly 0.5, turns that probability into a class label.
K-Means is a clustering algorithm. It partitions the data into K groups (clusters) so that each example belongs to the cluster with the nearest centroid, aiming to minimize the total squared distance between examples and their cluster's centroid (the mean of its members). The standard procedure alternates between assigning each example to its nearest centroid and recomputing each centroid as the mean of its assigned examples.
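A convolutional layer slides small grids of learnable weights (kernels) over the input and sums the products at each position. The sketch below shows one such step in pure Python: a valid-mode 2D convolution followed by a ReLU. The 2x2 vertical-edge kernel is hand-set for illustration (an assumption); in a trained CNN those values are learned.

```python
def conv2d(image, kernel, bias=0.0):
    """Valid (no padding, stride 1) 2D convolution as used in CNN layers.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel slides over the image without being flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel, plus bias.
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Element-wise ReLU non-linearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Responds to dark-to-bright vertical edges (hand-set, not learned).
kernel = [[-1.0, 1.0],
          [-1.0, 1.0]]
image = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]

print(relu(conv2d(image, kernel)))  # [[0.0, 2.0], [0.0, 2.0]]
```

The feature map is large exactly where the edge sits, which is the sense in which the kernel's weights "assign importance" to one aspect of the image.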
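The following is a deliberately simplified random forest in pure Python: each tree is a one-split decision stump trained on a bootstrap sample using one randomly chosen feature, and the forest outputs the mode of the stumps' votes. Real random forests grow deeper trees and sample features at every split, so treat this as a sketch of bagging plus voting rather than a faithful implementation; `n_trees` and `seed` are illustrative.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-split decision tree (a 'stump') on a single feature.

    Tries every midpoint between consecutive distinct values, keeps the
    threshold with the fewest misclassifications, and lets each side
    predict its majority label.
    """
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted(set(row[feature] for row in rows))
    best = (len(labels) + 1, None, majority, majority)  # (errors, threshold, left, right)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None:  # no split was possible: constant prediction
        return left_label
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) of the training set.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random feature selection: this stump only sees one random feature.
        feature = rng.randrange(n_features)
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict_forest(forest, row):
    # Classification output: the mode of the individual trees' predictions.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


X = [(1, 1), (2, 2), (3, 1), (7, 8), (8, 7), (9, 9)]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))
```

A single stump trained on an unlucky bootstrap sample can be wrong, but averaging many such weak, decorrelated trees by vote is what makes the ensemble robust.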
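A minimal sketch of logistic regression on one-dimensional inputs, fitted by plain gradient descent on the mean cross-entropy (log) loss; the learning rate, epoch count, and 0.5 threshold are assumptions. The sigmoid is written in a numerically stable form so large negative inputs do not overflow `math.exp`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function.

    Mathematically the output is strictly between 0 and 1; in floating
    point it can round to 0.0 or 1.0 for very large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(xs, ys, lr=0.5, epochs=500):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean cross-entropy: (p - y) * x for w, (p - y) for b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_class(w, b, x, threshold=0.5):
    """Turn the predicted probability into a class label."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print(predict_class(w, b, 2.0), predict_class(w, b, -2.0))  # 1 0
```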
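Lloyd's algorithm, the usual way of running K-Means, can be sketched in pure Python as below. Seeding the centroids with the first `k` points is an assumption made to keep the example deterministic; library implementations typically use k-means++ initialization and several restarts, because Lloyd's algorithm only finds a local optimum.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(p) for p in points[:k]]  # naive init (assumption)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments will not change again
            break
        centroids = new_centroids
    return labels, centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```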