Data Science: A to Z

Kate Strachnyi
5 min read · Jan 4, 2023

Photo by Brett Jordan on Unsplash

A

  • A/B Testing: A statistical way of comparing two versions of a product or feature in order to determine which performs better.
  • Accuracy: The percentage of correct predictions made by a model.
  • Artificial Intelligence (AI): The study and design of intelligent agents, which are systems that can reason, learn, and act autonomously.
  • Association Rule Learning: A method for discovering relationships between variables in large datasets.
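
To make the A/B testing entry concrete, here is a minimal sketch of one common way to compare two versions: a two-proportion z-test on conversion rates. The function name and the visitor counts are made up for illustration, and a real experiment may call for a different test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 120 of 1000 visitors convert on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the two versions is unlikely to be due to chance alone.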

B

  • Batch Learning: A type of machine learning where the model is trained on the entire dataset at once, rather than incrementally.
  • Bias: A systematic error or deviation from the true value in a statistical analysis.
  • Big Data: Very large datasets that may be too large or complex for traditional data processing techniques.
  • Boosting: A type of ensemble method that trains weak models one after another, each focusing on the errors of the previous ones, and combines their predictions into a strong model.

C

  • Classification: A type of machine learning task that involves predicting a class label for a given input.
  • Clustering: A type of unsupervised learning that involves grouping data into clusters based on similarity.
  • Correlation: A statistical measure of the strength and direction of the relationship between two variables.
  • Cross-Validation: A method for evaluating the performance of a model by repeatedly splitting the data into training and holdout subsets, so that each subset serves as the holdout set in turn.
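
As an illustration of the correlation entry, the sketch below computes the Pearson correlation coefficient, one widely used correlation measure, in plain Python. The function name and the sample values are assumptions chosen for the example.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours_studied = [1, 2, 3, 4]   # made-up data
exam_score = [2, 4, 6, 8]      # rises in lockstep with hours_studied
print(pearson_correlation(hours_studied, exam_score))
```

Values near 1 mean the variables rise together, values near -1 mean one falls as the other rises, and values near 0 mean little linear relationship.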

D

  • Decision Tree: A type of machine learning model that uses a tree-like structure to make predictions based on feature values.
  • Deep Learning: A subfield of machine learning that involves training artificial neural networks on large datasets.
  • Dimensionality Reduction: The process of reducing the number of features in a dataset by selecting the most important ones or combining correlated features.
  • DATAcated: Dedicated to data — see Kate Strachnyi :)

E

  • Ensemble Method: A machine learning technique that combines the predictions of multiple models to create a more accurate prediction.
  • Evaluation Metric: A measure used to evaluate the performance of a model.

F

  • Feature: A characteristic or attribute of a data point that can be used to predict a target variable.
  • Feature Engineering: The process of creating and transforming features from raw data so they are more useful to a machine learning model.
  • Feature Selection: The process of selecting a subset of relevant features to use in a model.

G

  • Gradient Descent: An optimization algorithm used to find the values of parameters that minimize a loss function.
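
Gradient descent is easier to picture with a tiny example. The sketch below minimizes a simple one-parameter function, f(w) = (w - 3)^2, by repeatedly stepping against the gradient. The function name, starting point, and step size are assumptions for illustration only.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly move the parameter a small step against the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Loss: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_min)
```

With a learning rate that is too large the updates can overshoot and diverge, while one that is too small makes progress very slow.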

H

  • Hyperparameter: A parameter that is set before training rather than learned from the data, and that controls how a machine learning model learns or how complex it can be.

I

  • Imbalanced Classes: A classification problem where one class significantly outnumbers the other classes.
  • Imputation: The process of replacing missing or invalid data with a reasonable estimate.
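
One simple form of imputation is to replace each missing value with the mean of the observed values. The sketch below assumes missing entries are marked with None; the function name is hypothetical, and other strategies (median, model-based) may suit real data better.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [30.0, None, 40.0, 50.0]  # made-up data with one missing value
print(impute_mean(ages))
```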

K

  • K-Fold Cross-Validation: A type of cross-validation where the data is divided into K folds, and the model is trained and evaluated K times, with a different fold used as the holdout set each time.
  • K-Means Clustering: A type of clustering algorithm that divides the data into K clusters by assigning each point to the nearest cluster center (centroid).
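
The K-fold procedure can be sketched as a generator that hands back, for each fold, the indices to train on and the indices to hold out. This is a simplified illustration without shuffling; the function name is an assumption, not a library API.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, holdout_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        holdout = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, holdout
        start += size

for train, holdout in k_fold_indices(n_samples=10, k=3):
    print("train:", train, "holdout:", holdout)
```

Every sample lands in the holdout set exactly once, so the K evaluation scores together cover the whole dataset.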

L

  • Label: The target variable in a supervised learning problem.
  • Learning Rate: A hyperparameter that controls the step size at which the optimization algorithm makes updates to the model parameters.
  • Loss Function: A measure of how far a model’s predictions are from the ground truth labels; training aims to make it as small as possible.
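
Mean squared error is one common loss function for regression problems. The sketch below is a plain-Python illustration with made-up numbers; the function name is an assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

labels = [3.0, 5.0, 7.0]
good_predictions = [3.0, 5.0, 7.0]
poor_predictions = [1.0, 5.0, 10.0]
print(mean_squared_error(labels, good_predictions))
print(mean_squared_error(labels, poor_predictions))
```

The closer the predictions are to the labels, the lower the loss.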

M

  • Machine Learning: A subfield of artificial intelligence that involves training models to make predictions or decisions based on data.
  • Model: A mathematical representation of a process or system that can be used to make predictions or decisions.

N

  • Neural Network: A type of machine learning model inspired by the structure and function of the brain, consisting of layers of interconnected nodes.

O

  • Overfitting: A phenomenon that occurs when a model is excessively complex and has learned patterns in the training data that do not generalize to unseen data.

P

  • Precision: The proportion of a model’s positive predictions that are actually correct (true positives divided by all predicted positives).
  • Predictor Variable: A feature used to predict a target variable in a machine learning model.
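
Precision and accuracy can tell different stories, especially with imbalanced classes. The sketch below computes both on a small made-up set of labels; the function names and data are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Share of positive predictions that are truly positive."""
    true_positives = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_positives = sum(p == positive for p in y_pred)
    # With no positive predictions, precision is undefined; return 0.0 by convention here.
    return true_positives / predicted_positives if predicted_positives else 0.0

# Made-up imbalanced data: eight negatives, two positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print("accuracy:", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
```

Here most predictions are right overall, yet only half of the positive predictions are correct.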

R

  • Regression: A type of machine learning task that involves predicting a continuous target variable.
  • Regularization: A technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function.
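
To show what "adding a penalty term to the loss function" can look like, here is a minimal sketch of L2 regularization on top of mean squared error. The function name, the penalty strength lam, and the numbers are assumptions for illustration.

```python
def l2_regularized_loss(y_true, y_pred, weights, lam=0.1):
    """Mean squared error plus an L2 penalty on the model weights."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # The penalty grows with the squared size of the weights.
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty

# Same predictions, different weight sizes: larger weights cost more.
print(l2_regularized_loss([1.0, 2.0], [1.0, 2.0], weights=[0.5, 0.5]))
print(l2_regularized_loss([1.0, 2.0], [1.0, 2.0], weights=[3.0, 4.0]))
```

Because large weights increase the loss, the optimizer is nudged toward simpler models that tend to generalize better.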

S

  • Sample: A subset of a population used to represent the entire population in statistical analysis.
  • Supervised Learning: A type of machine learning where the model is trained on labeled data and learns to predict the label for new, unseen data.
  • Support Vector Machine (SVM): A type of machine learning model that classifies data by finding the hyperplane that separates the classes with the widest margin.

T

  • Test Set: A subset of a dataset held out from training and used to evaluate how well a machine learning model performs on unseen data.
  • Training Set: A subset of a dataset used to train a machine learning model.
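
A training set and a test set are often created by shuffling the data and cutting it in two. The sketch below is one simple way to do that in plain Python; the function name, the 20% test fraction, and the seed are assumptions for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(10))
print("train:", train)
print("test:", test)
```

Keeping the test rows out of training is what makes the evaluation an honest estimate of performance on new data.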

U

  • Unsupervised Learning: A type of machine learning where the model is not given any labeled training data and must discover patterns and relationships in the data on its own.

V

  • Validation Set: A subset of a dataset used to tune the hyperparameters of a machine learning model.
  • Variable: A characteristic or attribute that can vary or change within a dataset.

W

  • Weight: A model parameter that determines how strongly an input feature influences the output of the model.
  • Weight Initialization: The process of setting the initial values for the weights of a neural network model.
  • Weight Regularization: A technique used to prevent overfitting in neural network models by adding a penalty term to the loss function based on the magnitude of the weights.
  • Weight Sharing: A technique used in neural networks to reduce the number of parameters by reusing the same weights across multiple connections in the network, as convolutional layers do.
  • Word Embedding: A representation of words as vectors in a high-dimensional space, where semantically similar words are close together in the space.
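
"Close together" for word embeddings is often measured with cosine similarity. The sketch below uses tiny 3-dimensional vectors invented for illustration; real embeddings are learned from text and have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, made up for this example only.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["banana"]))
```

In this toy setup the related words score much higher than the unrelated pair.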

X

  • XGBoost: An open-source library that implements gradient-boosted decision trees and is widely used for structured (tabular) data.

Y

  • YOLO (You Only Look Once): A real-time object detection algorithm that uses a convolutional neural network to predict the bounding boxes and class labels for objects in an image.

Z

  • Zero-Shot Learning: A type of machine learning where the model is able to classify objects from classes it never saw during training, based on its knowledge of other classes and how they relate to the new ones.
