Intro to Machine Learning

Contents: The Law of Intelligence, Supervised vs Unsupervised, Basic Categories of ML, Standard Process of ML

The Law of Intelligence

Artificial intelligence (AI) has become one of the most influential technologies of our time, powering applications from search engines to self-driving cars. Before diving into its technical details, it's worth stepping back and asking: what is intelligence itself, and what does it mean to replicate it artificially?

Consider flight as an analogy: the laws of motion and aerodynamics govern both natural and human-made flight. We accept without hesitation that birds can fly, and we trust airplanes to carry us safely across continents. This shared trust comes from our understanding of the same physical principles that explain both. Similarly, if we could uncover the fundamental laws of intelligence, we might someday build machines that "think" with the same confidence we have in machines that fly.

Even though creating a truly intelligent system — one that rivals the flexibility and generality of the human mind — remains an open scientific challenge, we do have guiding principles. Modern approaches to AI are built on frameworks like Bayesian decision theory and information processing. These form the theoretical foundation for machine learning (ML), which is a subfield of artificial intelligence that focuses on developing algorithms that enable computers to learn from data and improve their performance on specific tasks over time.
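For concreteness, the central prescription of Bayesian decision theory can be stated in one line (a standard textbook formulation, included here only for reference): choose the action that minimizes expected loss under the posterior,

\[
\hat{a}(x) = \arg\min_{a} \sum_{y} L(a, y)\, p(y \mid x),
\]

where \(L(a, y)\) is the loss incurred by taking action \(a\) when the true state is \(y\), and \(p(y \mid x)\) is the posterior probability of \(y\) given the observation \(x\).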

A widely cited formal definition of machine learning comes from computer scientist Tom M. Mitchell:

A computer program is said to learn from experience \(E\) with respect to some class of tasks \(T\) and performance measure \(P\), if its performance at tasks in \(T\), as measured by \(P\), improves with experience \(E\). (T. Mitchell, Machine Learning, McGraw Hill, 1997)
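To make the definition concrete, here is a minimal sketch (assuming scikit-learn is installed) in which the task \(T\) is digit classification, the experience \(E\) is a growing set of labeled training examples, and the performance measure \(P\) is accuracy on held-out data:

```python
# T = classifying handwritten digits, E = labeled training examples, P = test accuracy.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Performance P should improve as experience E (the number of training examples) grows.
for n in (50, 200, 1000):
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {n} examples -> accuracy {acc:.2f}")
```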

One branch of machine learning, called deep learning, uses large neural networks to perform complex tasks such as image recognition, speech recognition, and natural language processing.

One of the most impactful applications of deep learning today is the development of Large Language Models (LLMs). These models, such as GPT-4.5 by OpenAI, Gemini 2.5 Pro by Google DeepMind, Claude 3.7 Sonnet by Anthropic, and Llama 3 by Meta, are built using deep neural networks — specifically the transformer architecture — and are trained on massive text datasets. LLMs have demonstrated remarkable capabilities in language understanding, text generation, translation, and even reasoning. They represent the cutting edge of deep learning research and are a driving force behind the current AI revolution.
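As a small illustration of what interacting with a pretrained transformer looks like (a sketch assuming the Hugging Face transformers package; the small open model gpt2 stands in here for the much larger proprietary models named above, which are accessed through vendor APIs instead):

```python
# Minimal text generation with a small open pretrained transformer.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Machine learning is", max_new_tokens=20)[0]["generated_text"])
```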

Despite their impressive performance, current large language models still have fundamental limitations. While they can generate fluent text and mimic reasoning patterns, they lack true understanding, contextual grounding, and self-awareness. These models rely on vast amounts of data and computational power, and they still fall short of the efficiency, adaptability, and generalization abilities seen in human cognition. This invites a deeper question: what makes natural intelligence so effective — and how might we capture some of that power in artificial systems?

In nature, intelligent organisms often rely on heuristics — simple, fast approximations — rather than fully rational or optimal solutions. The human brain, for example, can recognize faces and objects almost instantly, sometimes even producing optical illusions due to its shortcuts. This observation suggests that the future of AI may benefit not only from mathematics and computer science, but also from insights in neuroscience, cognitive science, and psychology.

To build truly intelligent machines, we may need to understand not just how to compute optimally, but how to approximate intelligently.

Supervised vs Unsupervised

Machine learning encompasses various approaches, primarily distinguished by the presence or absence of labeled data:

  - Supervised Learning: the model is trained on labeled examples, i.e., input-output pairs, and learns to map new inputs to outputs (e.g., predicting house prices or classifying emails as spam).
  - Unsupervised Learning: the model receives only inputs, without labels, and must discover structure on its own (e.g., clustering customers or reducing the dimensionality of data).
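A minimal sketch of the contrast (assuming scikit-learn): the classifier is given the labels, while the clustering algorithm sees only the inputs.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier   # supervised: uses labels
from sklearn.cluster import KMeans                # unsupervised: ignores labels

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)                       # learns from (X, y) pairs
clusters = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)    # groups X alone

print("supervised prediction:", clf.predict(X[:1]))
print("unsupervised cluster assignment:", clusters[0])
```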

Additionally, modern machine learning has introduced hybrid approaches:

  - Semi-Supervised Learning: combines a small amount of labeled data with a large amount of unlabeled data during training.
  - Self-Supervised Learning: generates supervisory signals from the data itself (for example, predicting masked or next words in a sentence), which is how large language models are pretrained.
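As a rough illustration of the semi-supervised setting (an assumed example using scikit-learn's SelfTrainingClassifier, not a method prescribed by this text), most labels are hidden and the model bootstraps from the few that remain:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) < 0.7] = -1   # hide ~70% of the labels (-1 marks "unlabeled")

model = SelfTrainingClassifier(SVC(probability=True)).fit(X, y_partial)
print("labels used:", (y_partial != -1).sum(), "of", len(y))
```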

Basic Categories of Machine Learning

Machine learning tasks are broadly categorized based on the nature of the prediction or pattern recognition involved:

  - Classification: predict a discrete label (e.g., spam vs. not spam).
  - Regression: predict a continuous value (e.g., a house price).
  - Clustering: group similar examples without labels (e.g., customer segmentation).
  - Dimensionality Reduction: compress high-dimensional data into fewer informative features (e.g., for visualization).
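For orientation, here is one representative estimator per category (a sketch assuming scikit-learn and synthetic data; the specific models are illustrative choices, not the only options):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y_cont = X[:, 0] * 2.0 + rng.normal(size=100)   # continuous target -> regression
y_class = (y_cont > 0).astype(int)              # discrete target -> classification

LinearRegression().fit(X, y_cont)               # regression
LogisticRegression().fit(X, y_class)            # classification
KMeans(n_clusters=2, n_init=10).fit(X)          # clustering (no labels used)
PCA(n_components=2).fit(X)                      # dimensionality reduction (no labels used)
```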

Standard Process of ML

The machine learning process involves several key steps to develop effective models; a minimal code sketch of the core steps follows the list below:

  1. Problem Definition: Clearly articulate the problem and determine whether machine learning is an appropriate solution.
  2. Data Collection: Gather relevant data from various sources, ensuring quality and representativeness.
  3. Data Preprocessing: Clean and prepare the data by handling missing values, encoding categorical variables, and normalizing features.
  4. Data Splitting: Divide the dataset into training, validation, and test sets to evaluate model performance effectively.
  5. Model and Optimization Procedure Selection: Choose suitable algorithms based on the problem type (e.g., regression, classification) and data characteristics.
  6. Training: Feed the training data into the model, allowing it to learn patterns and relationships.
  7. Evaluation: Assess the model's performance using the validation set and appropriate metrics (e.g., accuracy, precision, recall).
  8. Hyperparameter Tuning: Optimize model hyperparameters to enhance performance, often using techniques like grid search or random search.
  9. Testing: Evaluate the final model on the test set to estimate its performance on unseen data.
  10. Deployment: Integrate the model into a production environment where it can make real-time predictions.
  11. Monitoring and Maintenance: Continuously monitor the model's performance and update it as necessary to accommodate new data or changing conditions.
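A minimal end-to-end sketch of the middle steps (splitting, preprocessing, training, evaluation, tuning, and testing), assuming scikit-learn and a built-in dataset; problem definition, deployment, and monitoring are omitted because they are not purely code steps:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data splitting: hold out a test set for the final, unbiased performance estimate.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Preprocessing + model selection: feature scaling followed by a linear classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

# Training + evaluation + hyperparameter tuning: cross-validation on the training set
# plays the role of the validation split here.
grid = GridSearchCV(pipeline, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)

# Testing: estimate performance on unseen data.
print("best C:", grid.best_params_)
print("test accuracy:", accuracy_score(y_test, grid.predict(X_test)))
```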