60 Day Scikit-learn

Getting Started with scikit-learn

What is scikit-learn?

Scikit-learn (sklearn) is a free Python library that helps you analyze data and make predictions. It’s like a toolbox for "machine learning," which means teaching a computer to learn patterns from data and make decisions.

In this tutorial, we’ll:

- Install scikit-learn.
- Load some simple data.
- Use a machine learning model to predict something.
- Explain each step in plain language!

Step 1: Set Up Your Tools

Before we start, you need Python and scikit-learn installed on your computer.

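Install Python: If you don’t have Python, download it from python.org. Use a recent Python 3 release. Version 3.8 works only with older scikit-learn versions; newer scikit-learn releases need a newer Python, so check the scikit-learn installation guide if the install fails.
Install scikit-learn: Open your computer’s command line (Terminal on Mac or Command Prompt on Windows) and type:
```
pip install scikit-learn
```
Press Enter, and pip will download and install scikit-learn along with the libraries it depends on, including NumPy.
Pick a code editor: Use something simple like Jupyter Notebook or VS Code. For this tutorial, I’ll assume you’re using a basic Python file (e.g., mycode.py).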
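
To confirm the install worked, a quick check like the one below can help. This is only a sketch: it assumes the pip command above finished without errors, and the version numbers it prints will depend on your machine.

```python
# Quick install check: both imports should succeed without errors
import sklearn
import numpy as np

# Each library stores its version as a string in __version__
print("scikit-learn version:", sklearn.__version__)
print("NumPy version:", np.__version__)
```

If either import fails, the library is probably not installed for the Python you are running.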

Step 2: Write Your First Python Code

Open your code editor and create a new file. Let’s start by bringing in the tools we need.

```
# Import the tools we’ll use from scikit-learn
from sklearn.linear_model import LogisticRegression  # This is our prediction tool
import numpy as np  # Helps us work with numbers easily
```

What’s happening here?
- LogisticRegression is a classification tool: despite "regression" in its name, it predicts "yes or no" answers (like pass or fail).
- numpy (np) helps us store and handle lists of numbers as arrays.

Step 3: Create Some Simple Data

Imagine we have data about students: how many hours they studied and whether they passed (1 = pass, 0 = fail). Let’s make up some numbers.

```
# Hours studied (our "input" data)
hours_studied = np.array([[1], [2], [3], [4], [5]])

# Pass (1) or Fail (0) (our "output" data)
pass_or_fail = np.array([0, 0, 1, 1, 1])
```

What’s this?
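- hours_studied is a small table (a 2D array) with one row per student and one column for hours: 1 hour, 2 hours, etc. scikit-learn expects input data in this rows-and-columns shape, which is why each number sits in its own brackets.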
- pass_or_fail tells us the result: 0 means fail, 1 means pass.
- For example, a student who studied 1 hour failed (0), but one who studied 5 hours passed (1).
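
To make the shapes concrete, here is a small sketch that prints them. The names flat_hours and as_column are made up for this illustration; they are not part of the tutorial's code.

```python
import numpy as np

hours_studied = np.array([[1], [2], [3], [4], [5]])
pass_or_fail = np.array([0, 0, 1, 1, 1])

# Input data: 5 rows (students), 1 column (hours studied)
print(hours_studied.shape)  # (5, 1)

# Output data: one label per student
print(pass_or_fail.shape)  # (5,)

# If you start from a flat list, reshape(-1, 1) turns it into a single column
flat_hours = [1, 2, 3, 4, 5]  # illustrative name
as_column = np.array(flat_hours).reshape(-1, 1)  # illustrative name
print(np.array_equal(as_column, hours_studied))  # True
```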

Step 4: Build a Prediction Model

Now, let’s teach the computer to predict if a student passes based on study hours.

```
# Create the prediction tool (model)
model = LogisticRegression()

# Teach the model using our data
model.fit(hours_studied, pass_or_fail)
```

What’s happening?
- model = LogisticRegression() is like hiring a tutor to learn from your data.
- model.fit() is like the tutor studying the hours and pass/fail results to find a pattern.
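
If you are curious what the "pattern" looks like, the sketch below peeks at the numbers the model learned. The attributes coef_ and intercept_ come from scikit-learn; the variable names slope and boundary are just labels chosen here, and the exact values printed depend on your scikit-learn version and its default settings.

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

hours_studied = np.array([[1], [2], [3], [4], [5]])
pass_or_fail = np.array([0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(hours_studied, pass_or_fail)

# One learned weight for our single input (hours), plus an intercept
slope = model.coef_[0][0]
intercept = model.intercept_[0]

# The model is exactly 50/50 where slope * hours + intercept equals 0
boundary = -intercept / slope

print("Weight for hours studied:", slope)
print("Intercept:", intercept)
print("Hours where pass and fail are equally likely:", boundary)
```

A positive weight means more hours push the prediction toward "pass". Students above the boundary are predicted to pass, and students below it are predicted to fail.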

Step 5: Make Predictions

Let’s ask the model: "If a student studies 3.5 hours, will they pass?"

```
# Test with 3.5 hours
new_hours = np.array([[3.5]])

# Predict: 0 (fail) or 1 (pass)
prediction = model.predict(new_hours)

# Show the result
print("Prediction for 3.5 hours:", "Pass" if prediction[0] == 1 else "Fail")
```

What’s this?
- new_hours is the new data we’re testing (3.5 hours).
- model.predict() guesses if the student passes or fails.
- The print line shows the answer in plain words.

When you run this, it might say something like:

```
Prediction for 3.5 hours: Pass
```
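
A plain "Pass" or "Fail" hides how sure the model is. As a rough sketch, predict_proba (a standard method on scikit-learn classifiers) returns a probability for each class instead. The exact probabilities are not shown here because they depend on your scikit-learn version and settings.

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

hours_studied = np.array([[1], [2], [3], [4], [5]])
pass_or_fail = np.array([0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(hours_studied, pass_or_fail)

new_hours = np.array([[3.5]])

# One row per student, one column per class, in the order given by model.classes_
probabilities = model.predict_proba(new_hours)

print("Class order:", model.classes_)  # [0 1], i.e. fail first, then pass
print("Chance of failing:", probabilities[0][0])
print("Chance of passing:", probabilities[0][1])
```

The two probabilities in each row add up to 1, and predict() picks whichever class has the larger one.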

Step 6: Check How Good the Model Is

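Let’s see how well our model learned by running it on the original data. This is only a quick sanity check: the model has already seen these examples, so a real test would use data it was not trained on.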

```
# Check predictions for all our original hours
predictions = model.predict(hours_studied)

# Show the results
print("Hours Studied:", hours_studied.flatten())  # flatten() turns the column into a simple 1-D array
print("Actual Results:", pass_or_fail)
print("Predicted Results:", predictions)
```

Output might look like:

```
Hours Studied: [1 2 3 4 5]
Actual Results: [0 0 1 1 1]
Predicted Results: [0 0 1 1 1]
```

What’s this?
- It shows what the model guessed for 1, 2, 3, 4, and 5 hours.
- Compare “Predicted” to “Actual” to see if it’s right!
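
Comparing by eye works for five students, but a single number is handier. The sketch below uses accuracy_score from sklearn.metrics to compute the fraction of correct guesses. Keep in mind this is accuracy on the same data the model learned from, so it says little about how the model would do on new students.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

hours_studied = np.array([[1], [2], [3], [4], [5]])
pass_or_fail = np.array([0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(hours_studied, pass_or_fail)
predictions = model.predict(hours_studied)

# Fraction of students where the prediction matches the actual result
accuracy = accuracy_score(pass_or_fail, predictions)
print("Accuracy on the original data:", accuracy)

# model.score() is a shortcut that predicts and computes accuracy in one call
shortcut_accuracy = model.score(hours_studied, pass_or_fail)
print("Same number via model.score():", shortcut_accuracy)
```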

Full Code Example

Here’s everything put together. Copy this into a file (e.g., learn_sklearn.py) and run it with Python:

```
# Import tools
from sklearn.linear_model import LogisticRegression
import numpy as np

# Our data
hours_studied = np.array([[1], [2], [3], [4], [5]])
pass_or_fail = np.array([0, 0, 1, 1, 1])

# Create and teach the model
model = LogisticRegression()
model.fit(hours_studied, pass_or_fail)

# Predict for 3.5 hours
new_hours = np.array([[3.5]])
prediction = model.predict(new_hours)
print("Prediction for 3.5 hours:", "Pass" if prediction[0] == 1 else "Fail")

# Check all predictions
predictions = model.predict(hours_studied)
print("Hours Studied:", hours_studied.flatten())
print("Actual Results:", pass_or_fail)
print("Predicted Results:", predictions)
```

What Did We Learn?

Data: We used hours studied to predict passing or failing.
Model: LogisticRegression learned the pattern (more hours = more likely to pass).
Prediction: We guessed results for new data and checked how good our guesses were.

Try It Yourself!

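Change the hours_studied and pass_or_fail numbers to something else (e.g., [[1], [3], [5], [7]] and [0, 1, 1, 1]). Keep each hours value in its own brackets, and make sure both arrays have the same number of entries.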
Predict for a different number of hours, like 6 or 2.5.
See what happens!
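
As a starting point, here is one way the experiment could look, using the example numbers suggested above. The results list and the loop are just one possible layout, and the outputs are left for you to discover by running it.

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# New made-up data: note the double brackets around the hours
hours_studied = np.array([[1], [3], [5], [7]])
pass_or_fail = np.array([0, 1, 1, 1])

model = LogisticRegression()
model.fit(hours_studied, pass_or_fail)

# Try a few different study times
results = []
for hours in [2.5, 6]:
    prediction = model.predict(np.array([[hours]]))[0]
    results.append(prediction)
    print(f"Prediction for {hours} hours:", "Pass" if prediction == 1 else "Fail")
```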
