<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Pranjal's Blog]]></title><description><![CDATA[Pranjal's Blog]]></description><link>https://blog.pvcodes.in</link><generator>RSS for Node</generator><lastBuildDate>Mon, 27 Apr 2026 15:32:22 GMT</lastBuildDate><atom:link href="https://blog.pvcodes.in/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Predicting Bitcoin Prices with Linear Regression: A Beginner-Friendly Guide]]></title><description><![CDATA[Linear Regression is one of the most fundamental concepts in machine learning and an excellent starting point for beginners. In this blog, we’ll unpack what linear regression is, how it works, and then build a practical model to predict Bitcoin closi...]]></description><link>https://blog.pvcodes.in/bitcoin-price-prediction-linear-regression</link><guid isPermaLink="true">https://blog.pvcodes.in/bitcoin-price-prediction-linear-regression</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[linearregression]]></category><category><![CDATA[AI]]></category><category><![CDATA[Bitcoin]]></category><dc:creator><![CDATA[Pranjal Verma]]></dc:creator><pubDate>Sun, 28 Sep 2025 15:40:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759064375749/6c141d9f-0cff-49fc-aa83-9d87aa7dc89f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Linear Regression is one of the most fundamental concepts in machine learning and an excellent starting point for beginners. In this blog, we’ll unpack what linear regression is, how it works, and then build a practical model to predict Bitcoin closing prices using <strong>historical BTC price data</strong>.</p>
<p><em>Note: The Python notebook is attached at the end of the post; feel free to try it yourself.</em></p>
<hr />
<h2 id="heading-what-is-linear-regression">What is Linear Regression?</h2>
<p>Linear regression is a simple way to understand how one thing affects another. Imagine trying to predict a student’s exam score based on how many hours they study. Linear regression fits a straight line through data points, showing the relationship between study hours (independent variable) and exam scores (dependent variable). This helps predict what the score might be for a given number of study hours by using a simple formula like \(y =  w\cdot x + b\), where \(y\) is the score, \(x\) is the hours studied, \(w\) is how much the score changes with each hour, and \(b\) is the starting point when no hours are studied.</p>
<p>This technique is easy to use and understand, making it popular for many fields like business, healthcare, and tech. It helps you figure out how different factors are connected and predict future outcomes based on past data. Whether you’re using one factor or many, linear regression is a valuable tool to turn data into useful insights quickly and clearly.</p>
<p>More formally, Linear regression models the relationship between one dependent variable \(y\) and one or more independent variables \(x\). It “fits” a straight line (or a hyperplane for multiple features) that best predicts \(y\) from \(x\).</p>
<p>Linear regression models are categorized according to the number of input features (\(x\)) they use.</p>
<ol>
<li><strong>Simple Linear Regression (one feature):</strong></li>
</ol>
<p>$$y = wx + b$$</p><p>where,</p>
<ul>
<li><p>\(x\): feature (independent variable)</p>
</li>
<li><p>\(y\): target (dependent variable)</p>
</li>
<li><p>\(w\): slope parameter</p>
</li>
<li><p>\(b\): intercept</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759061232232/511b07c2-954c-4fca-8a2d-40a81ce4e994.jpeg" alt="Simple Regression Model" class="image--center mx-auto" /></p>
<ol start="2">
<li><p><strong>Multiple Linear Regression (many features):</strong></p>
<p> Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables to understand how multiple factors together influence the outcome. It fits a linear equation that predicts the result based on the combined effects of all input features.</p>
</li>
</ol>
<p>$$y = \vec w \cdot \vec x + b$$</p><p>where,</p>
<ul>
<li><p>\(\vec x\): vector of input features \([x_1, x_2, …, x_n]\)</p>
</li>
<li><p>\(\vec w\): vector of learned weights \([w_1, w_2, …, w_n]\)</p>
</li>
<li><p>\(b\): bias (intercept)</p>
</li>
<li><p>\(y\): predicted output</p>
</li>
<li><p>\(\vec w \cdot \vec x\): the <a target="_blank" href="https://en.wikipedia.org/wiki/Dot_product">dot product</a> of \(\vec w\) and \(\vec x \)</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759061337877/2bcb260c-4a36-469a-a487-ba313b319b14.png" alt="Multiple Regression Model" class="image--center mx-auto" /></p>
<p>The plot above is a 3D representation of a multiple linear regression model, with <code>wt</code> and <code>year</code> as the features and <code>mpg</code> as the target. The image was sourced from <a target="_blank" href="https://book.stat420.org/multiple-linear-regression.html">stat420.org</a>.</p>
<p>The objective is to find \(\vec w\) and \(b\) that minimize the difference between the predicted values \(\hat y\) and the actual values \(y\), <strong>so that for any feature vector</strong> (\(\vec x\)) <strong>we can predict the target</strong> (\(y\)).</p>
<hr />
<h2 id="heading-measuring-fit-mean-squared-error-mse">Measuring Fit: Mean Squared Error (MSE)</h2>
<p>Mean Squared Error (MSE) is a common way to measure how well a linear regression model fits the data. It calculates the average of the squared differences between the actual values and the values predicted by the model, with lower MSE values indicating better prediction accuracy.</p>
<p>We evaluate how good the predictions are using <strong>Mean Squared Error (MSE):</strong></p>
<p>$$J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)^2$$</p><p>where,</p>
<ul>
<li><p>\(m\) is the number of examples.</p>
</li>
<li><p>\(\hat y\) is the predicted value from the model, \(\hat y=\vec w \cdot \vec x + b\)</p>
</li>
</ul>
<p>Minimizing this gives us the “best-fit” line.</p>
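<p>As a small sketch with made-up numbers, here is the same cost in NumPy (using the \(\frac{1}{2m}\) convention above):</p>
<pre><code class="lang-python">import numpy as np

y = np.array([3.0, 5.0, 7.0])       # actual values (made up)
y_hat = np.array([2.5, 5.5, 6.0])   # predicted values (made up)

m = len(y)
cost = np.sum((y_hat - y) ** 2) / (2 * m)  # J(w, b)
print(cost)  # 0.25
</code></pre>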
<hr />
<h2 id="heading-how-does-linear-regression-work">How Does Linear Regression Work?</h2>
<p>Linear regression works by finding the best-fitting straight line through a set of data points that relate an independent variable (input) to a dependent variable (output). The line is defined by an equation that predicts the output from the input, and the model adjusts it to minimize the difference between the actual and predicted values, most often with the method of least squares. The goal is a simple equation that represents the relationship accurately enough to make predictions on new data.</p>
<p>The most common way to fit parameters is with the <strong>Least Squares Method</strong>, solved via <strong>Gradient Descent</strong>.</p>
<h3 id="heading-gradient-descent-in-a-nutshell">Gradient Descent in a Nutshell</h3>
<p>Gradient descent is an iterative algorithm that updates parameters step by step in the direction that decreases the cost function the fastest.</p>
<p>The updates are:</p>
<p>$$w = w - \alpha \cdot \frac{\partial J}{\partial w}, \quad b = b - \alpha \cdot \frac{\partial J}{\partial b}$$</p><ul>
<li><p>\(\alpha\): learning rate (step size)</p>
</li>
<li><p>\(\frac{\partial J}{\partial w}, \frac{\partial J}{\partial b}\): gradients</p>
</li>
</ul>
<p><strong>The loop continues until the error stabilizes (<em>convergence</em>).</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759061359714/d7a9e5ad-aaa9-4c66-aa1f-41773b21de47.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-vectorization-matters">Why Vectorization Matters?</h2>
<p>In machine learning, datasets can have millions of rows. Loops in Python quickly become inefficient. <strong>Vectorization</strong> leverages NumPy’s optimized operations to handle entire vectors or matrices in a single step, making code faster and cleaner.</p>
<ul>
<li><p><strong>Without vectorization (slow loop):</strong></p>
<pre><code class="lang-python">  y_hat = []  
      <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(m):
          pred = <span class="hljs-number">0</span>
          <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> range(n):  
               pred += w[j] * X[i, j]  
               y_hat.append(pred + b)
</code></pre>
</li>
<li><p><strong>With vectorization (fast &amp; concise):</strong></p>
<pre><code class="lang-python">  y_hat = np.dot(X, w) + b
</code></pre>
</li>
</ul>
<p><em>Note: Both produce the same result, but vectorization is significantly faster.</em></p>
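<p>If you want to see the gap yourself, here is a rough timing sketch (the sizes are arbitrary and exact speedups vary by machine):</p>
<pre><code class="lang-python">import time
import numpy as np

m, n = 100_000, 10
X = np.random.rand(m, n)
w = np.random.rand(n)
b = 0.5

start = time.perf_counter()
y_loop = [sum(w[j] * X[i, j] for j in range(n)) + b for i in range(m)]
print(f"loop:       {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
y_vec = np.dot(X, w) + b
print(f"vectorized: {time.perf_counter() - start:.3f}s")

print(np.allclose(y_loop, y_vec))  # True: same result either way
</code></pre>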
<h2 id="heading-case-study-predicting-bitcoin-closing-prices">Case Study: Predicting Bitcoin Closing Prices</h2>
<p>We’ll now apply linear regression to Bitcoin’s historical price data (2014–2024). Explore the dataset for yourself: <a target="_blank" href="https://www.kaggle.com/datasets/gallo33henrique/bitcoin-btc-usd-stock-dataset">Kaggle BTC-USD Stock Data</a>.</p>
<p>The dataset includes the following features:</p>
<ul>
<li><p><strong>Open</strong> – price at the start of the day</p>
</li>
<li><p><strong>High / Low</strong> – daily maximum and minimum prices</p>
</li>
<li><p><strong>Close</strong> – target (end-of-day price)</p>
</li>
<li><p><strong>Adj Close</strong> – adjusted close; excluded to avoid leakage</p>
</li>
<li><p><strong>Volume</strong> – trading volume</p>
</li>
</ul>
<h3 id="heading-step-0-basic-setup-for-getting-started">Step 0 - Basic setup for getting started</h3>
<ul>
<li><p>Use <a target="_blank" href="https://colab.research.google.com/">Google Colab</a> or setup the Jupyter Notebook locally (<a target="_blank" href="https://jupyter.org/install">guide for local setup</a>).</p>
</li>
<li><p>We are using <code>Pandas</code> and <code>Numpy</code> libraries.</p>
</li>
<li><p>Let’s import these libraries</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
</code></pre>
<h3 id="heading-step-1-load-amp-inspect-the-data">Step 1 — Load &amp; Inspect the Data</h3>
<pre><code class="lang-python">df = pd.read_csv(<span class="hljs-string">'BTC-USD_stock_data.csv'</span>)  
df[<span class="hljs-string">'Date'</span>] = pd.to_datetime(df[<span class="hljs-string">'Date'</span>])  
df.set_index(<span class="hljs-string">'Date'</span>, inplace=<span class="hljs-literal">True</span>)  
df.head()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759053771161/115ffd52-1cb8-4c35-a9e7-7ee5b6163038.png" alt /></p>
<h4 id="heading-checks-before-modeling">Checks Before Modeling</h4>
<ul>
<li><p>Ensure numeric dtypes.</p>
</li>
<li><p>Handle missing values (<code>df.isna().sum()</code>).</p>
</li>
<li><p>Remove duplicates.</p>
</li>
</ul>
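<p>In code, these checks look something like this (a sketch, assuming the <code>df</code> loaded above):</p>
<pre><code class="lang-python">print(df.dtypes)             # ensure numeric dtypes
print(df.isna().sum())       # missing values per column
print(df.duplicated().sum()) # duplicate rows

# If anything turned up, we could clean it with, e.g.:
# df = df.dropna().drop_duplicates()
</code></pre>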
<p>Since our dataset is already cleaned, we do not have to worry about data sanity here.</p>
<hr />
<h3 id="heading-step-2-feature-selection-amp-normalization">Step 2 — Feature Selection &amp; Normalization</h3>
<p>We’ll use: <code>Open</code>, <code>High</code>, <code>Low</code>, <code>Volume</code>.<br />Exclude <code>Adj Close</code> (directly derived from <code>Close</code>).</p>
<h4 id="heading-why-normalize">Why Normalize?</h4>
<p>BTC prices and volume differ enormously in scale. Without scaling, training is inefficient. We apply <strong>z-score normalization</strong>:</p>
<p>$$z = \frac{x - \bar x}{\sigma}$$</p><p>where,</p>
<ul>
<li><p>\(\bar x\): the mean of \(x\), and</p>
</li>
<li><p>\(\sigma\): the <a target="_blank" href="https://en.wikipedia.org/wiki/Standard_deviation">standard deviation</a> of \(x\).</p>
</li>
</ul>
<pre><code class="lang-python">df = (df - df.mean()) / df.std(ddof=<span class="hljs-number">0</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759054243352/16203a1a-3fee-471f-bfac-91b43bbde05d.png" alt class="image--center mx-auto" /></p>
<p>For our model, the target (\(y\)) is the <em>next day’s</em> closing price, derived from the <code>Close</code> column. Let’s create it:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Create target variable (next day's closing price)</span>
df[<span class="hljs-string">'Target'</span>] = df[<span class="hljs-string">'Close'</span>].shift(<span class="hljs-number">-1</span>)
df[<span class="hljs-string">'Target'</span>] = df[<span class="hljs-string">'Target'</span>].fillna(df[<span class="hljs-string">'Target'</span>].mean())
</code></pre>
<h3 id="heading-step-3-visualizing-trends">Step 3 — Visualizing Trends</h3>
<p>Plot 30-day rolling mean of closing prices:</p>
<pre><code class="lang-python">rolling = df[<span class="hljs-string">'Target'</span>].rolling(window=<span class="hljs-number">30</span>).mean()
plt.figure(figsize=(<span class="hljs-number">12</span>,<span class="hljs-number">4</span>))  
plt.plot(df.index, df[<span class="hljs-string">'Target'</span>], alpha=<span class="hljs-number">0.3</span>, label=<span class="hljs-string">'Close'</span>)  
plt.plot(df.index, rolling, color=<span class="hljs-string">'red'</span>, label=<span class="hljs-string">'30-day Rolling Mean'</span>)  
plt.title(<span class="hljs-string">'Close Price with Rolling Mean'</span>)  
plt.xlabel(<span class="hljs-string">'Date'</span>)  
plt.ylabel(<span class="hljs-string">'Close Price (Standardized)'</span>)  
plt.legend()  
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759054291029/7ec98290-86f8-482f-b8b8-f707c42965f6.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-step-3-extract-feature-and-target">Step 3 - Extract Feature and Target</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Separate features and target</span>
features = df.loc[:, df.columns != <span class="hljs-string">'Target'</span>]
target = df[<span class="hljs-string">'Target'</span>]
print(<span class="hljs-string">f'Feature matrix shape: <span class="hljs-subst">{features.shape}</span>'</span>)
print(<span class="hljs-string">f'Target vector shape: <span class="hljs-subst">{target.shape}</span>'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759054255992/f019a91d-6cfc-412f-8bb1-4732ece013e6.png" alt /></p>
<h2 id="heading-step-4-building-the-model">Step 4 — Building the Model</h2>
<h3 id="heading-1-prediction-function">1. Prediction Function</h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">model_fn</span>(<span class="hljs-params">x, w, b</span>):</span>
    <span class="hljs-string">"""
    Linear regression prediction function.

    Args:
        x: Feature matrix (m, n) or a single example (n,)
        w: Weight vector (n,)
        b: Bias scalar

    Returns:
        y_hat: Predicted value(s), shape (m,) or scalar
    """</span>
    y_hat = np.dot(x, w) + b
    <span class="hljs-keyword">return</span> y_hat
</code></pre>
<p>For example:</p>
<p>\(\vec x = [-1.127270, -1.12714, -1.126278, -1.203614]\)</p>
<p>\(\vec w = [0.18435932, 0.19623844, 0.19707516, 0.20951568]\) (random coefficients)</p>
<p>\(b = 0.000397\)</p>
<p>then,</p>
<p>\(\hat y = x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4 + b\)</p>
<p>\(\hat y= (-1.127270 \cdot 0.18435932) + (-1.12714 \cdot 0.19623844) + (-1.126278 \cdot 0.19707516) + (-1.203614 \cdot 0.20951568) + 0.000397\)</p>
<p>Equivalently, we can take the <strong><em>dot product</em></strong> of \(\vec x\) and \(\vec w\), i.e., \(\hat y= \vec w \cdot \vec x + b\), with</p>
<p>$$\vec{x} = \begin{pmatrix} -1.127270 &amp; -1.12714 &amp; -1.126278 &amp; -1.203614 \end{pmatrix}, \quad \vec{w} = \begin{pmatrix} 0.18435932 \\ 0.19623844 \\ 0.19707516 \\ 0.20951568 \end{pmatrix}$$</p>
<h3 id="heading-2-cost-function">2. Cost function</h3>
<p>Its purpose is to measure how bad the predictions are, using <strong>Mean Squared Error</strong>:</p>
<p>$$J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)^2$$</p><pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">cost_fn</span>(<span class="hljs-params">X, Y, w, b</span>):</span>
    <span class="hljs-string">"""
    Calculate Mean Squared Error cost.

    Args:
        X: Feature matrix
        Y: Target vector
        w: Weight vector
        b: Bias scalar

    Returns:
        cost: Mean squared error
    """</span>
    m = len(X)
    y_hat = model_fn(X, w, b)
    cost = np.sum((y_hat - Y) ** <span class="hljs-number">2</span>) / (<span class="hljs-number">2</span> * m)
    <span class="hljs-keyword">return</span> cost
</code></pre>
<h3 id="heading-3-gradient-computation">3. Gradient Computation</h3>
<p>If \(\hat y= \vec w \cdot \vec x +b\), and cost \(J = \frac {1}{2m} \sum (\hat y - y)^2\). then,</p>
<ul>
<li><p>Residual vector: \(r = \hat y - y = X \vec w + b - y\) (shape: \((m,)\))</p>
</li>
<li><p>Gradient w.r.t. weights \(\vec w\)</p>
<p>  $$\frac{\partial J}{\partial w} = \frac{1}{m} X^\top r$$</p>
</li>
<li><p>Gradient w.r.t bias \(b\)</p>
<p>  $$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m r_i$$</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compute_gradient</span>(<span class="hljs-params">X, Y, w, b</span>):</span>
    <span class="hljs-string">"""
    Compute gradients for weights and bias.

    Args:
        X: Feature matrix
        Y: Target vector
        w: Weight vector
        b: Bias scalar

    Returns:
        dl_dw: Gradient with respect to weights
        dl_db: Gradient with respect to bias
    """</span>
    m = len(X)
    y_hat = model_fn(X, w, b)

    <span class="hljs-comment"># Compute gradients</span>
    dl_dw = np.dot(X.T, (y_hat - Y)) / m
    dl_db = np.sum(y_hat - Y) / m

    <span class="hljs-keyword">return</span> dl_dw, dl_db
</code></pre>
<h3 id="heading-4-gradient-descent-algorithm">4. Gradient Descent Algorithm</h3>
<p><strong>Gradient descent core idea</strong></p>
<ul>
<li><p>Repeatedly move <code>w</code> and <code>b</code> in the direction that reduces the cost:</p>
<p>  $$w \leftarrow w - \alpha \cdot dl_{dw}, \quad b \leftarrow b - \alpha \cdot dl_{db}$$</p>
</li>
<li><p>\(\alpha\) (alpha), the learning rate, controls the step size:</p>
<ul>
<li><p>Too large → divergence (cost blows up).</p>
</li>
<li><p>Too small → very slow convergence.</p>
</li>
</ul>
</li>
<li><p><strong>Practical training notes</strong></p>
<ul>
<li><p><strong>Initialization</strong>: zeros are fine for linear regression.</p>
</li>
<li><p><strong>Iterations</strong>: monitor cost decrease; don’t blindly run 10k — stop when cost plateaus.</p>
</li>
<li><p><strong>Monitoring</strong>: store <code>cost_history</code> and plot it. Also look at the magnitude of gradients — if gradients are near zero early, learning rate may be too small; if gradients explode, rate is too large.</p>
</li>
<li><p><strong>Early stopping</strong>: stop if validation cost increases (overfitting) or if cost changes less than a small epsilon for many steps.</p>
</li>
</ul>
</li>
</ul>
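<p>Putting the pieces together, here is a minimal sketch of the <code>gradient_descent</code> function used in the next step. It relies on the <code>model_fn</code>, <code>cost_fn</code>, and <code>compute_gradient</code> helpers defined above, and its signature matches the training call below:</p>
<pre><code class="lang-python">def gradient_descent(X, Y, w, b, alpha, iterations):
    """
    Run batch gradient descent.

    Args:
        X: Feature matrix (m, n)
        Y: Target vector (m,)
        w: Initial weight vector (n,)
        b: Initial bias scalar
        alpha: Learning rate
        iterations: Number of update steps

    Returns:
        w, b: Learned parameters
        cost_history: Cost recorded after each update
    """
    cost_history = []
    for _ in range(iterations):
        dl_dw, dl_db = compute_gradient(X, Y, w, b)
        w = w - alpha * dl_dw  # step opposite the gradient
        b = b - alpha * dl_db
        cost_history.append(cost_fn(X, Y, w, b))
    return w, b, cost_history
</code></pre>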
<h2 id="heading-step4-start-the-model-training">Step4: Start the model training</h2>
<pre><code class="lang-python"><span class="hljs-comment"># Convert to numpy arrays for efficient computation</span>
X_train = features.to_numpy()
y_train = target.to_numpy()
print(X_train[<span class="hljs-number">0</span>].shape)
<span class="hljs-comment"># Initialize parameters</span>
w_init = np.zeros_like(X_train[<span class="hljs-number">0</span>])
b_init = <span class="hljs-number">0.0</span>

<span class="hljs-comment"># Hyperparameters</span>
alpha = <span class="hljs-number">0.003</span>      <span class="hljs-comment"># Learning rate</span>
iterations = <span class="hljs-number">10000</span>

<span class="hljs-comment"># Train the model</span>
print(<span class="hljs-string">"Training Linear Regression Model..."</span>)
w_final, b_final, cost_history = gradient_descent(
    X_train, y_train, w_init, b_init, alpha, iterations
)

print(<span class="hljs-string">f"\nFinal parameters:"</span>)
print(<span class="hljs-string">f"Weights: <span class="hljs-subst">{w_final}</span>"</span>)
print(<span class="hljs-string">f"Bias: <span class="hljs-subst">{b_final:<span class="hljs-number">.6</span>f}</span>"</span>)
print(<span class="hljs-string">f"Final cost: <span class="hljs-subst">{cost_history[<span class="hljs-number">-1</span>]:<span class="hljs-number">.6</span>f}</span>"</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759060969201/57574cf0-c165-4b87-ae2b-54d0195a8259.png" alt /></p>
<h3 id="heading-learning-curve-subplots-first-30-100-1000-iterations">Learning curve subplots (first 30 / 100 / 1000 iterations)</h3>
<ul>
<li><p><strong>First 30 iterations:</strong> you may see a tiny initial change — this is the model finding an initial descent direction.</p>
</li>
<li><p><strong>First 100 iterations:</strong> if the curve bends downward noticeably, gradient descent is making meaningful progress; verify cost is decreasing smoothly.</p>
</li>
<li><p><strong>First 1000 iterations:</strong> if this has flattened, the model is converging. If it’s still noisy or increasing, reduce the learning rate. <strong>Tip:</strong> plot <code>cost_history</code> on a log scale if values span orders of magnitude — that often makes convergence behavior easier to read.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759060979196/ad956d8d-cf51-4d76-b853-eb30391ed68b.png" alt /></p>
</li>
</ul>
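<p>A sketch for producing these subplots from the <code>cost_history</code> returned by training:</p>
<pre><code class="lang-python">fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, n in zip(axes, [30, 100, 1000]):
    ax.plot(cost_history[:n])
    ax.set_title(f'First {n} iterations')
    ax.set_xlabel('Iteration')
    ax.set_ylabel('Cost (MSE)')
plt.tight_layout()
plt.show()
</code></pre>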
<h3 id="heading-prediction-vs-actual-two-panel-explanation">Prediction vs Actual — two-panel explanation</h3>
<p>The plot below is a scatter plot that compares the actual values against the values predicted by the linear regression model. Points clustering closely along the diagonal red dashed line indicate strong agreement between the predicted and actual values.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759065137576/fafbdd8c-34ce-411a-bed8-3f2f0b03cd03.png" alt class="image--center mx-auto" /></p>
<p>The second panel presents a time series plot showing how the predicted and actual values evolve over time. By plotting standardized prices on the same timeline, it demonstrates how well the model tracks real-world changes, making it easier to observe periods of strong prediction and potential deviations. Together, these visualizations provide a comprehensive view of the model’s performance both in terms of point-by-point accuracy and temporal consistency.</p>
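<p>A sketch of how both panels can be drawn (assuming the <code>w_final</code>, <code>b_final</code>, and training arrays from above):</p>
<pre><code class="lang-python">y_pred = model_fn(X_train, w_final, b_final)

# Panel 1: predicted vs actual scatter
plt.figure(figsize=(6, 6))
plt.scatter(y_train, y_pred, alpha=0.3)
lims = [min(y_train.min(), y_pred.min()), max(y_train.max(), y_pred.max())]
plt.plot(lims, lims, 'r--', label='Perfect prediction')
plt.xlabel('Actual (standardized)')
plt.ylabel('Predicted (standardized)')
plt.legend()
plt.show()

# Panel 2: predicted vs actual over time
plt.figure(figsize=(12, 4))
plt.plot(df.index, y_train, alpha=0.5, label='Actual')
plt.plot(df.index, y_pred, alpha=0.7, label='Predicted')
plt.xlabel('Date')
plt.ylabel('Close Price (Standardized)')
plt.legend()
plt.show()
</code></pre>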
<hr />
<h3 id="heading-link-to-the-notebook">Link to the notebook:</h3>
<ul>
<li><p>Jupyter notebook - <a target="_blank" href="https://colab.research.google.com/github/pvcodes/ml/blob/main/btc_price_prediction.ipynb">pvcodes/bitcoin-price-prediction</a></p>
</li>
<li><p>HTML Version - <a target="_blank" href="https://github.com/pvcodes/ml/blob/main/btc_price_prediction.ipynb/">github/pvcodes/bitcoin-price-prediction</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[What is ML? Supervised vs Unsupervised Learning]]></title><description><![CDATA[Why Does Machine Learning Matter in Today’s Landscape?
Machine learning powers many of the apps and services we rely on daily—from Uber’s dynamic pricing to Netflix’s personalized recommendations. Whether it’s Gmail’s spam filters, fraud detection in...]]></description><link>https://blog.pvcodes.in/what-is-ml-supervised-vs-unsupervised-learning</link><guid isPermaLink="true">https://blog.pvcodes.in/what-is-ml-supervised-vs-unsupervised-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Computer Science]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Pranjal Verma]]></dc:creator><pubDate>Fri, 19 Sep 2025 13:08:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758287502495/a2b2172c-ea40-465f-a92f-004de3fc92dd.avif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-why-does-machine-learning-matter-in-todays-landscape">Why Does Machine Learning Matter in Today’s Landscape?</h3>
<p>Machine learning powers many of the apps and services we rely on daily—from Uber’s <strong>dynamic pricing</strong> to Netflix’s <strong>personalized recommendations</strong>. Whether it’s Gmail’s <strong>spam filters</strong>, fraud detection in banking, or voice assistants like <strong>Siri and Alexa</strong>, ML quietly works behind the scenes to deliver smarter, faster, and more personalized experiences.</p>
<p>With the recent <strong>AI boom</strong>, terms like Artificial Intelligence (AI) and Machine Learning (ML) have become mainstream. But ML isn’t new—it has been evolving for decades, shaping how computers learn from data and improve over time.</p>
<p>At its core, machine learning is about teaching computers to <strong>learn from experience</strong> rather than relying solely on explicit programming. These techniques span everything from simple statistics to complex neural networks. For those interested, <a target="_blank" href="https://blog.bccresearch.com/brief-history-of-machine-learning?utm_source=chatgpt.com"><em>How Did We Get Here? A Brief History of Machine Learning</em></a> offers a historical perspective, while <a target="_blank" href="https://machinelearningbook.com"><strong>Fundamentals of Machine Learning for Predictive Data Analytics</strong></a> is a great textbook resource.</p>
<p>In practice, ML usually involves two steps:</p>
<ol>
<li><p><strong>Training</strong> a model with algorithms and example data, so it learns the relationship between inputs and outputs.</p>
</li>
<li><p><strong>Deploying</strong> the model into applications to make predictions or decisions in real-time and at scale.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758286443429/f019e952-c730-4e3e-99fc-a10bae27b3ce.png" alt="Train a model" class="image--center mx-auto" /></p>
<p>This naturally leads us to two core branches of ML: <strong>Supervised</strong> and <strong>Unsupervised</strong> learning. But before we dive into them, let’s clarify some common terminologies.</p>
<h3 id="heading-terminologies-in-machine-learning">Terminologies in Machine Learning</h3>
<ol>
<li><p><strong>Dataset</strong> – A collection of data used to train and test models (e.g., patient records, emails, or images).</p>
</li>
<li><p><strong>Feature (Input Variable)</strong> – An attribute or characteristic used by the model (e.g., a person’s age, height, or income).</p>
</li>
<li><p><strong>Target (Label/Output Variable)</strong> – The value we want the model to predict (e.g., “spam” or “not spam,” house price).</p>
</li>
<li><p><strong>Model</strong> – The mathematical representation learned from data that maps inputs (features) to outputs (targets).</p>
</li>
<li><p><strong>Training</strong> – The process of feeding a dataset to the algorithm so it can learn patterns.</p>
</li>
<li><p><strong>Testing</strong> – Evaluating the model’s performance on unseen data to check how well it generalizes.</p>
</li>
</ol>
<hr />
<h3 id="heading-supervised-and-unsupervised-learning">Supervised and Unsupervised Learning</h3>
<p>When we talk about machine learning, most problems fall into two broad categories: <strong>supervised</strong> and <strong>unsupervised</strong> learning.</p>
<p><strong>Supervised Learning</strong> - Think of this like a teacher guiding a student. You provide the algorithm with <strong>input data</strong> and the <strong>correct answers (labels)</strong>, and the model learns to map one to the other. For example, if you feed in house features (size, location, number of rooms) along with actual house prices, the model will eventually learn how to predict the price of a new house.<br />Some common techniques here include <em>Linear Regression</em>, <em>Logistic Regression</em>, <em>Decision Trees</em>, and <em>Neural Networks</em>.</p>
<p><strong>Unsupervised Learning</strong> - Now imagine exploring a new city without a tour guide. You don’t have labels telling you what’s what — instead, you look for <strong>patterns and groupings</strong> on your own. That’s what unsupervised learning does: it takes unlabeled data and finds hidden structures within it. For example, given a pile of customer purchase histories, the algorithm might group customers with similar buying habits together — even if no one told it what those groups should look like.<br />Popular approaches include <em>Clustering</em>, <em>Dimensionality Reduction</em>, and <em>Association Rule Learning</em>.</p>
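<p>As a tiny illustration with toy data (assuming <code>scikit-learn</code> is installed; the numbers are hypothetical), the same inputs can feed both styles of learning:</p>
<pre><code class="lang-python">import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.array([[1], [2], [3], [8], [9], [10]])

# Supervised: labels are provided, so the model learns inputs -> outputs
y = np.array([2, 4, 6, 16, 18, 20])          # labels (y = 2x here)
reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))                    # [10.]

# Unsupervised: no labels; the model finds groupings on its own
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                            # two clusters: {1,2,3} vs {8,9,10}
</code></pre>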
<hr />
<h3 id="heading-when-to-use-supervised-vs-unsupervised-learning">When to Use: Supervised vs. Unsupervised Learning</h3>
<ul>
<li><p><strong>Supervised Learning</strong> – Best for problems with <strong>known outcomes</strong> and labeled data.<br />  Examples: spam email classification, image recognition, stock price prediction.</p>
</li>
<li><p><strong>Unsupervised Learning</strong> – Best when the data is <strong>unlabeled</strong> and the goal is to explore patterns, group similar instances, or detect anomalies.<br />  Examples: organizing large data archives, building recommendation systems, customer segmentation.</p>
</li>
</ul>
<hr />
<h3 id="heading-summary-of-differences-supervised-vs-unsupervised-learning">Summary of Differences: Supervised vs. Unsupervised Learning</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Aspect</th><th>Supervised Learning</th><th>Unsupervised Learning</th></tr>
</thead>
<tbody>
<tr>
<td><strong>What is it?</strong></td><td>Train the model with input data <strong>paired with labeled outputs</strong>.</td><td>Train the model to <strong>discover hidden patterns</strong> in unlabeled data.</td></tr>
<tr>
<td><strong>Techniques</strong></td><td>Logistic Regression, Linear Regression, Decision Tree, Neural Networks.</td><td>Clustering, Association Rule Learning, Probability Density, Dimensionality Reduction.</td></tr>
<tr>
<td><strong>Goal</strong></td><td>Predict an output based on known inputs.</td><td>Identify relationships or patterns between input data points.</td></tr>
<tr>
<td><strong>Approach</strong></td><td>Minimize the error between predicted outputs and true labels.</td><td>Find patterns, similarities, or anomalies within the data.</td></tr>
</tbody>
</table>
</div><h3 id="heading-conclusion">Conclusion</h3>
<p>Machine learning is reshaping how we live and work—powering smarter apps, real-time decisions, and innovations across industries. With emerging trends like automated ML, explainable AI, and even quantum integration, its future impact will be even greater.</p>
<p>For professionals and businesses, understanding ML is no longer optional—it’s essential for staying relevant in a data-driven world.</p>
<p><em>Ever wondered how machines predict house prices or stock trends?</em> In the next blog, we’ll break it down with <strong>Linear Regression</strong>—one of the simplest yet most powerful ML techniques.</p>
<p>Sign up for the <a target="_blank" href="https://blog.pvcodes.in/newsletter">newsletter</a> to ensure you don't miss out on the next blog.</p>
]]></content:encoded></item></channel></rss>