
You just built a machine learning model. You split your data 80/20, trained on the 80%, tested on the 20%, and got 85% accuracy. Sounds good, right?
Not so fast.
What if that particular 20% you held back was easy to predict?
What if it was unusually hard?
What if you just got lucky?
You have no way to know because you only tested once.
This is where cross-validation comes in. It's one of the most important techniques in machine learning, yet most beginners don't understand why it matters or how to use it properly.
Let me show you why a single train-test split gives you an unreliable picture of your model, and how cross-validation gives you performance estimates you can actually trust.
Imagine you have 1,000 customer records and you want to predict which customers will buy your product. You split the data: 800 for training, 200 for testing.
Here's what can go wrong.
Your test set might accidentally contain mostly high-value customers who are easy to predict. You get 90% accuracy and think your model is amazing. Then you deploy it, and it fails miserably on real customers.
Or the opposite happens. Your test set gets all the weird edge cases. You get 60% accuracy, think your model is terrible, and give up. But actually, your model would work fine on typical customers.
The fundamental problem is that you're making a huge decision (is this model good or bad?) based on one random split of your data. That's risky.
With small datasets, it gets worse. If you only have 200 samples and you hold back 20%, you're training on just 160 examples. That's probably not enough. But your test set of 40 samples is too small to give you a reliable accuracy estimate either.
You're stuck. You need more training data, but you also need a decent test set. Simple splitting doesn't work.
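You can see the problem for yourself with a quick experiment. Here's a minimal sketch using scikit-learn; the synthetic dataset is just a stand-in for the 1,000 customer records, and logistic regression is a placeholder for whatever model you're using. The same model, scored on five different random 80/20 splits, will usually land on noticeably different accuracy numbers:

```python
# Same model, same data, different random 80/20 splits -- different scores.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for the 1,000 customer records.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(f"split {seed}: test accuracy = {model.score(X_test, y_test):.3f}")
```

The spread you see across those five runs is exactly the uncertainty a single split hides from you.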
The core idea of cross-validation is simple. Instead of splitting your data once, you split it multiple times, train a model on each split, and average the results.
Let's walk through the most common approach: 5-fold cross-validation.
Take your 1,000 customer records and divide them into 5 equal groups of 200. These groups are called "folds." Now you train 5 different models:
Model 1: Train on folds 2, 3, 4, and 5 (800 records). Test on fold 1 (200 records).
Model 2: Train on folds 1, 3, 4, and 5 (800 records). Test on fold 2 (200 records).
Model 3: Train on folds 1, 2, 4, and 5 (800 records). Test on fold 3 (200 records).
Model 4: Train on folds 1, 2, 3, and 5 (800 records). Test on fold 4 (200 records).
Model 5: Train on folds 1, 2, 3, and 4 (800 records). Test on fold 5 (200 records).
Each model gets tested on a different 20% of your data. Then you average all 5 test scores to get your final accuracy estimate.
Why is this better? Because every single data point gets used for testing exactly once. You get a much more reliable estimate of how your model will perform on new data. You're not dependent on getting lucky with one random split.
If one fold happens to be unusually easy or hard, it gets averaged out by the other four folds. You get a realistic view of your model's true performance.
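In scikit-learn, this whole procedure is a few lines. Here's a minimal sketch; the synthetic dataset and the logistic regression model are placeholders for your own customer data and estimator:

```python
# A minimal 5-fold cross-validation sketch with scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in for the 1,000 customer records.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into 5 folds and trains/tests the model 5 times.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Per-fold accuracy:", np.round(scores, 3))
print("Mean accuracy:    ", round(scores.mean(), 3))
```

cross_val_score handles the splitting, training, and scoring for you; if you ever need the train/test indices of each fold explicitly, KFold gives you those.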
Here's the insight that makes cross-validation powerful.
With a single 80/20 split, your accuracy estimate has high variance. Run the split again with a different random seed, and you might get a very different number. That variance represents uncertainty about your model's true performance.
With 5-fold cross-validation, you're essentially running 5 different experiments and averaging them. Basic statistics tells us that averaging multiple measurements gives a more reliable estimate than any single measurement, so the variance of your accuracy estimate drops. (The fold scores aren't fully independent, since their training sets overlap, so averaging doesn't cancel all of the noise, but it cancels a lot of it.)
Think of it like taking someone's temperature. One reading might be off. But if you take five readings and average them, you'll get closer to the true value.
The same logic applies to model evaluation. Five test sets give you a better estimate than one test set.
Sometimes you have so little data that even 5-fold cross-validation isn't enough. Maybe you only have 50 samples. Holding back 20% means testing on just 10 examples. That's not enough to reliably estimate performance.
This is where Leave-One-Out Cross-Validation (LOOCV) comes in.
The idea is extreme but effective. If you have 50 samples, you train 50 different models. Each model trains on 49 samples and tests on 1 sample. Then you average all 50 test results.
Here's what it looks like:
Model 1: Train on samples 2-50, test on sample 1.
Model 2: Train on samples 1, 3-50, test on sample 2.
Model 3: Train on samples 1-2, 4-50, test on sample 3.
...and so on, for all 50 samples.
Each data point gets its turn as the test set. You use almost all your data for training each time (49 out of 50), which helps when data is scarce. And every single sample gets evaluated, giving you the most thorough assessment possible.
The downside? Computation time. Training 50 models instead of 5 (or 1) takes longer. But when you have limited data and need reliable estimates, it's worth the wait.
LOOCV is most useful when you have fewer than 100-200 samples. Beyond that, the computational cost outweighs the benefits, and regular k-fold cross-validation works fine.
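Here's what LOOCV looks like in scikit-learn. Again, the tiny synthetic dataset and the logistic regression are stand-ins for your own 50-sample problem:

```python
# A minimal leave-one-out cross-validation sketch with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Stand-in for a real 50-sample dataset.
X, y = make_classification(n_samples=50, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# LeaveOneOut() creates 50 splits: train on 49 samples, test on 1.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())

# Each score is 0 or 1 (a single test sample), so the mean is the accuracy.
print("LOOCV accuracy:", round(scores.mean(), 3))
```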
Let's see how cross-validation helps in a real scenario.
You're trying to decide between a random forest and a neural network for your problem. You have 2,000 samples.
Without cross-validation:
Random forest: 83% accuracy on test set
Neural network: 86% accuracy on test set
Decision: Use the neural network
With 5-fold cross-validation:
Random forest: 82%, 84%, 83%, 85%, 81% (average: 83%)
Neural network: 91%, 78%, 88%, 79%, 92% (average: 85.6%)
Now you see something important. The neural network has higher variance. Sometimes it's great (91%, 92%), sometimes it's mediocre (78%, 79%). The random forest is more consistent.
Which model should you choose? It depends. If consistency matters for your application, the random forest might be better despite the slightly lower average. If you just want the highest average performance, take the neural network.
Without cross-validation, you wouldn't have seen this variance. You'd have made your decision based on incomplete information.
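If you want to run this kind of comparison yourself, here's a sketch with scikit-learn. The random forest and the small neural network (an MLPClassifier) are stand-ins for whatever two models you're weighing, and the synthetic data won't reproduce the exact numbers above, but the mean and standard deviation are exactly the signals to look at:

```python
# Comparing two models with 5-fold cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in for the 2,000-sample problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)

models = {
    "random forest": RandomForestClassifier(random_state=1),
    "neural network": MLPClassifier(max_iter=1000, random_state=1),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    # The mean tells you average performance; the standard deviation
    # tells you how consistent the model is across folds.
    print(f"{name}: folds={np.round(scores, 2)} "
          f"mean={scores.mean():.3f} std={scores.std():.3f}")
```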
In k-fold cross-validation, k is the number of folds. Common choices are 5 or 10. How do you pick?
5-fold is good when:
You have moderate amounts of data (1,000-10,000 samples)
Training is computationally expensive
You want results quickly
10-fold is better when:
You have more data (10,000+ samples)
You want more precise estimates
Training time isn't a concern
Leave-one-out is the right choice when:
You have very little data (under 200 samples)
Training is fast
You need maximum precision
There's always a tradeoff. More folds mean more reliable estimates but longer computation time. Fewer folds mean faster results but more variance in your estimates.
For most practical applications, 5-fold is a good default. It's fast enough and reliable enough for real work.
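The nice part is that switching between these options is trivial in scikit-learn: you only change the cv argument. A quick sketch (the model and data are placeholders):

```python
# 5-fold, 10-fold, and leave-one-out differ only in the cv argument.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

for label, cv in [("5-fold", 5), ("10-fold", 10), ("LOOCV", LeaveOneOut())]:
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{label}: mean accuracy = {scores.mean():.3f} ({len(scores)} fits)")
```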
Never normalize, scale, or transform your whole dataset before splitting it into folds. That causes data leakage: statistics computed from the test fold sneak into the preprocessing your model is trained with, so your scores look better than they should.
Always do this: Split first, then fit your preprocessing on the training folds only, then apply it to the test fold.
Cross-validation is for evaluating models, not building them. After you use cross-validation to pick the best approach, train one final model on ALL your data. That's the model you deploy.
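Here's a sketch that handles both points with scikit-learn: a Pipeline so the scaler is fit only on the training folds of each split, and a final fit on all the data once you've settled on the approach. The scaler and model are placeholders for your own preprocessing and estimator:

```python
# Leakage-free preprocessing: the scaler is fit inside each training fold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

# The pipeline re-fits the scaler on the training folds of every split,
# so no test-fold statistics leak into training.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipeline, X, y, cv=5)
print("CV accuracy:", round(scores.mean(), 3))

# Once you've picked this approach, train the deployable model on ALL data.
pipeline.fit(X, y)
```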
If your data has a time component (stock prices, customer behavior over time), random splitting breaks the temporal order. Use time-based splitting instead, where your test set always comes after your training set in time.
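scikit-learn's TimeSeriesSplit does this for you: every test fold comes strictly after its training fold. A tiny sketch with stand-in data:

```python
# Time-ordered splitting: test indices always come after training indices.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for 12 time-ordered observations (e.g. monthly records).
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    print("train:", train_idx, "-> test:", test_idx)
```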
If you're predicting rare events (fraud, disease), make sure each fold has a similar proportion of positive cases. Most libraries have a stratified option that handles this automatically.
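In scikit-learn that option is StratifiedKFold, and it's already what cross_val_score uses by default for classification problems. A small sketch with stand-in labels that are about 10% positive:

```python
# Stratified folds keep the rare-event rate roughly constant per fold.
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced stand-in labels: ~10% positives.
y = np.array([1] * 10 + [0] * 90)
X = np.zeros((100, 1))  # features don't matter for the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    print("positives in test fold:", y[test_idx].sum(), "of", len(test_idx))
```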
Cross-validation isn't always necessary. If you have massive amounts of data (millions of samples), a simple train-test split is fine. The law of large numbers means your single test set will be representative.
For example, if you're training on 10 million images and testing on 2 million, that test set is large enough to give you a reliable accuracy estimate. Cross-validation would just waste computational resources.
Save cross-validation for when you actually need it: moderate to small datasets where you need reliable performance estimates.
Cross-validation is about reducing uncertainty. A single train-test split leaves you guessing. Cross-validation gives you confidence.
When you have limited data, it lets you use all your samples for both training and testing (just not simultaneously). When you're comparing models, it shows you not just average performance but also variance and consistency.
Yes, it takes more computation time. Yes, it's more complex to implement. But the insights you gain are worth it. You'll make better decisions about which models to use, catch overfitting earlier, and deploy models that actually work in production.
Next time you're about to do a simple 80/20 split, ask yourself: do I really know how this model will perform? Or am I just getting lucky with one random split?
Cross-validation gives you the real answer.