Intro to Machine Learning

Contents: The Law of Intelligence, Supervised vs Unsupervised, Basic Categories of ML, Standard Process of ML

The Law of Intelligence

Artificial intelligence (AI) has become one of the most influential technologies of our time, powering applications from search engines to self-driving cars. Before diving into its technical details, it's worth stepping back and asking: what is intelligence itself, and what does it mean to replicate it artificially?

Consider flight as an analogy: the laws of motion and aerodynamics govern both natural and human-made flight. We accept without hesitation that birds can fly, and we trust airplanes to carry us safely across continents. This shared trust comes from our understanding of the same physical principles that explain both. Similarly, if we could uncover the fundamental laws of intelligence, we might someday build machines that "think" with the same confidence we have in machines that fly.

Even though creating a truly intelligent system — one that rivals the flexibility and generality of the human mind — remains an open scientific challenge, we do have guiding principles. Modern approaches to AI are built on frameworks like Bayesian decision theory and information processing. These form the theoretical foundation for machine learning (ML), which is a subfield of artificial intelligence that focuses on developing algorithms that enable computers to learn from data and improve their performance on specific tasks over time.
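For concreteness, the central prescription of Bayesian decision theory can be stated in one line (a standard textbook formulation, included here only for reference): choose the action that minimizes expected loss under the posterior,

\[
\hat{a}(x) = \arg\min_{a} \sum_{y} L(a, y)\, p(y \mid x),
\]

where \(L(a, y)\) is the loss incurred by taking action \(a\) when the true state is \(y\), and \(p(y \mid x)\) is the posterior probability of \(y\) given the observation \(x\).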

A widely cited formal definition of machine learning comes from computer scientist Tom M. Mitchell:

A computer program is said to learn from experience \(E\) with respect to some class of tasks \(T\) and performance measure \(P\), if its performance at tasks in \(T\), as measured by \(P\), improves with experience \(E\). (T. Mitchell, Machine Learning, McGraw Hill, 1997)
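To make the definition concrete, here is a minimal sketch (assuming scikit-learn is installed) in which the task \(T\) is digit classification, the experience \(E\) is a growing set of labeled training examples, and the performance measure \(P\) is accuracy on held-out data:

```python
# T = classifying handwritten digits, E = labeled training examples, P = test accuracy.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Performance P should improve as experience E (the number of training examples) grows.
for n in (50, 200, 1000):
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {n} examples -> accuracy {acc:.2f}")
```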

One branch of machine learning, called deep learning, uses large neural networks to perform complex tasks such as image recognition, speech recognition, and natural language processing.

One of the most impactful applications of deep learning today is the development of Large Language Models (LLMs). These models, such as GPT-4.5 by OpenAI, Gemini 2.5 Pro by Google DeepMind, Claude 3.7 Sonnet by Anthropic, and Llama 3 by Meta, are built using deep neural networks — specifically the transformer architecture — and are trained on massive text datasets. LLMs have demonstrated remarkable capabilities in language understanding, text generation, translation, and even reasoning. They represent the cutting edge of deep learning research and are a driving force behind the current AI revolution.
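As a small illustration of what interacting with a pretrained transformer looks like (a sketch assuming the Hugging Face transformers package; the small open model gpt2 stands in here for the much larger proprietary models named above, which are accessed through vendor APIs instead):

```python
# Minimal text generation with a small open pretrained transformer.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Machine learning is", max_new_tokens=20)[0]["generated_text"])
```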

Despite their impressive performance, current large language models still have fundamental limitations. While they can generate fluent text and mimic reasoning patterns, they lack true understanding, contextual grounding, and self-awareness. These models rely on vast amounts of data and computational power, and they still fall short of the efficiency, adaptability, and generalization abilities seen in human cognition. This invites a deeper question: what makes natural intelligence so effective — and how might we capture some of that power in artificial systems?

In nature, intelligent organisms often rely on heuristics — simple, fast approximations — rather than fully rational or optimal solutions. The human brain, for example, can recognize faces and objects almost instantly, sometimes even producing optical illusions due to its shortcuts. This observation suggests that the future of AI may benefit not only from mathematics and computer science, but also from insights in neuroscience, cognitive science, and psychology.

To build truly intelligent machines, we may need to understand not just how to compute optimally, but how to approximate intelligently.

Supervised vs Unsupervised

Machine learning encompasses various approaches, primarily distinguished by the presence or absence of labeled data:

  - Supervised Learning: the model is trained on labeled examples, i.e., input-output pairs, and learns to map new inputs to outputs (e.g., predicting house prices or classifying emails as spam).
  - Unsupervised Learning: the model receives only inputs, without labels, and must discover structure on its own (e.g., clustering customers or reducing the dimensionality of data).
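A minimal sketch of the contrast (assuming scikit-learn): the classifier is given the labels, while the clustering algorithm sees only the inputs.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier   # supervised: uses labels
from sklearn.cluster import KMeans                # unsupervised: ignores labels

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)                       # learns from (X, y) pairs
clusters = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)    # groups X alone

print("supervised prediction:", clf.predict(X[:1]))
print("unsupervised cluster assignment:", clusters[0])
```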

Additionally, modern machine learning has introduced hybrid approaches:

  - Semi-Supervised Learning: combines a small amount of labeled data with a large amount of unlabeled data during training.
  - Self-Supervised Learning: generates supervisory signals from the data itself (for example, predicting masked or next words in a sentence), which is how large language models are pretrained.
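As a rough illustration of the semi-supervised setting (an assumed example using scikit-learn's SelfTrainingClassifier, not a method prescribed by this text), most labels are hidden and the model bootstraps from the few that remain:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) < 0.7] = -1   # hide ~70% of the labels (-1 marks "unlabeled")

model = SelfTrainingClassifier(SVC(probability=True)).fit(X, y_partial)
print("labels used:", (y_partial != -1).sum(), "of", len(y))
```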

Basic Categories of Machine Learning

Machine learning tasks are broadly categorized based on the nature of the prediction or pattern recognition involved:

  - Classification: predict a discrete label (e.g., spam vs. not spam).
  - Regression: predict a continuous value (e.g., a house price).
  - Clustering: group similar examples without labels (e.g., customer segmentation).
  - Dimensionality Reduction: compress high-dimensional data into fewer informative features (e.g., for visualization).
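For orientation, here is one representative estimator per category (a sketch assuming scikit-learn and synthetic data; the specific models are illustrative choices, not the only options):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y_cont = X[:, 0] * 2.0 + rng.normal(size=100)   # continuous target -> regression
y_class = (y_cont > 0).astype(int)              # discrete target -> classification

LinearRegression().fit(X, y_cont)               # regression
LogisticRegression().fit(X, y_class)            # classification
KMeans(n_clusters=2, n_init=10).fit(X)          # clustering (no labels used)
PCA(n_components=2).fit(X)                      # dimensionality reduction (no labels used)
```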

Standard Process of ML

The machine learning process involves several key steps to develop effective models; a minimal code sketch of the core steps follows the list below:

  1. Problem Definition: Clearly articulate the problem and determine whether machine learning is an appropriate solution.
  2. Data Collection: Gather relevant data from various sources, ensuring quality and representativeness.
  3. Data Preprocessing: Clean and prepare the data by handling missing values, encoding categorical variables, and normalizing features.
  4. Data Splitting: Divide the dataset into training, validation, and test sets to evaluate model performance effectively.
  5. Model and Optimization Procedure Selection: Choose suitable algorithms based on the problem type (e.g., regression, classification) and data characteristics.
  6. Training: Feed the training data into the model, allowing it to learn patterns and relationships.
  7. Evaluation: Assess the model's performance using the validation set and appropriate metrics (e.g., accuracy, precision, recall).
  8. Hyperparameter Tuning: Optimize model hyperparameters to enhance performance, often using techniques like grid search or random search.
  9. Testing: Evaluate the final model on the test set to estimate its performance on unseen data.
  10. Deployment: Integrate the model into a production environment where it can make real-time predictions.
  11. Monitoring and Maintenance: Continuously monitor the model's performance and update it as necessary to accommodate new data or changing conditions.
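A minimal end-to-end sketch of the middle steps (splitting, preprocessing, training, evaluation, tuning, and testing), assuming scikit-learn and a built-in dataset; problem definition, deployment, and monitoring are omitted because they are not purely code steps:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data splitting: hold out a test set for the final, unbiased performance estimate.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Preprocessing + model selection: feature scaling followed by a linear classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

# Training + evaluation + hyperparameter tuning: cross-validation on the training set
# plays the role of the validation split here.
grid = GridSearchCV(pipeline, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)

# Testing: estimate performance on unseen data.
print("best C:", grid.best_params_)
print("test accuracy:", accuracy_score(y_test, grid.predict(X_test)))
```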