In this guide, I’ll walk you through a complete data analysis project using Python to analyze helicopter prison escapes throughout history. You’ll learn how to collect data from Wikipedia, clean it, perform basic analysis, and visualize your findings using Python’s fundamental tools.
This guided project is designed for learners who are studying Python fundamentals like lists, loops, and conditional logic. By working with a real-world dataset on an unusual topic, you’ll gain practical experience applying these concepts while practicing the core data analysis workflow: data collection, cleaning, analysis, and visualization.
What You’ll Learn
By the end of this project, you’ll know how to:
- Import and manipulate data in Python
- Clean and prepare raw data for analysis
- Create frequency tables using different Python approaches
- Visualize data with simple charts
- Compare the efficiency of different data structures (lists vs. dictionaries vs. DataFrames)
Setting Up Your Environment
1. Set Up Your Workspace
We’ll work with a .ipynb file, which can be rendered in the following tools:
- Jupyter Notebook (local installation required)
- Google Colab (browser-based, no installation needed)
2. Download the Resource File
We’ll be using a helper.py file that includes functions to work with web data, visualize it, and more, so be sure to download the helper.py file from the lesson if you want to work locally.

For this project, you’ll need a Python environment with a few basic libraries. If you’re working on the Dataquest platform, everything is already set up for you. If you’re working locally, you’ll need:
- Python 3.x
- Jupyter Pocket book or JupyterLab
- Matplotlib for visualization
Let’s begin by importing all of the functions in the helper.py file. In a real-world project, these functions would handle tasks that are beyond the scope of our current analysis but are necessary for the project to run smoothly.
from helper import *
Getting the Data
One of the cool aspects of this project is that we’re pulling our data directly from Wikipedia. Rather than working with a static CSV file, we’ll be scraping a live Wikipedia page that lists helicopter prison escapes throughout history.
url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"
data = data_from_url(url)
The data_from_url function handles the web scraping for us, extracting the information from the Wikipedia table and converting it into a format we can work with.
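To make the idea of turning an HTML table into a list of rows more concrete, here is a minimal sketch that uses only the standard library. It is not the actual data_from_url implementation, which lives in helper.py and is not shown in this guide; the TableRows class and the sample_html fragment are hypothetical names made up for illustration, and a real scraper would also need to download the page and deal with footnotes and nested markup.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical HTML fragment standing in for the downloaded page
sample_html = """
<table>
  <tr><th>Date</th><th>Prison name</th><th>Country</th></tr>
  <tr><td>August 19, 1971</td><td>Santa Martha Acatitla</td><td>Mexico</td></tr>
  <tr><td>October 31, 1973</td><td>Mountjoy Jail</td><td>Ireland</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
header, *rows = parser.rows  # first row holds the column names
print(header)
print(rows)
```

The result is a list of lists, the same shape as the data we work with in the rest of this project.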
Instructor Insight: Working with live data adds an interesting dimension to this project. Unlike static datasets, web-sourced data may change between when you develop your analysis and when someone else runs it. This is a common challenge in real-world data science because data isn’t static and can be updated at any time.
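One way to guard against a changing source is to save a snapshot of the scraped rows and reload it later. The sketch below is only an illustration of that idea, not part of helper.py: the function names save_snapshot and load_snapshot and the file name are assumptions, and the single sample row is shortened from the real data.

```python
import json
import os
import tempfile


def save_snapshot(rows, path):
    """Write the list of rows to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


def load_snapshot(path):
    """Read the rows back from the JSON file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# One shortened sample row; in the project this would be the scraped list
rows = [["August 19, 1971", "Santa Martha Acatitla", "Mexico", "Yes"]]

# A temporary folder keeps the example self-contained
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "escapes_snapshot.json")
    save_snapshot(rows, path)
    restored = load_snapshot(path)

print(restored == rows)
```

Analyzing the saved file instead of the live page makes results reproducible, at the cost of missing later updates to the source.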
Let’s take a look at what our data looks like:
print(len(data))  # Check how many entries we have
print(type(data))  # Check the data structure type
50
<class 'list'>
Our data is stored as a list with 50 entries. Now let’s examine the first few entries to understand the structure:
for row in data[:3]:
    print(row)
['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', "Joel David Kaplan was a New York businessman who had been arrested for murder in 1962 in Mexico City and was incarcerated at the Santa Martha Acatitla prison in the Iztapalapa borough of Mexico City. Joel's sister, Judy Kaplan, arranged the means for to help Kaplan escape, and on August 19, 1971, a helicopter landed in the prison yard. The guards mistakenly thought this was an official visit. In two minutes, Kaplan and his cellmate Carlos Antonio Contreras, a Venezuelan counterfeiter, were able to board the craft and were piloted away, before any shots were fired.[9] Both men were flown to Texas and then different planes flew Kaplan to California and Castro to Guatemala.[3] The Mexican government never initiated extradition proceedings against Kaplan.[9] The escape is told in a book, The 10-Second Jailbreak: The Helicopter Escape of Joel David Kaplan.[4] It also inspired the 1975 action movie Breakout, which starred Charles Bronson and Robert Duvall.[9]"]
['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon", 'On October 31, 1973 an IRA member hijacked a helicopter and forced the pilot to land in the exercise yard of Dublin\'s Mountjoy Jail\'s D Wing at 3:40\xa0p.m., October 31, 1973. Three members of the IRA were able to escape: JB O\'Hagan, Seamus Twomey and Kevin Mallon. Another prisoner who also was in the prison was quoted as saying, "One shamefaced screw apologised to the governor and said he thought it was the new Minister for Defence (Paddy Donegan) arriving. I told him it was our Minister of Defence leaving." The Mountjoy helicopter escape became Republican lore and was immortalized by "The Helicopter Song", which contains the lines "It\'s up like a bird and over the city. There\'s three men a\'missing I heard the warder say".[1]']
['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', "43-year-old Barbara Ann Oswald hijacked a Saint Louis-based charter helicopter and forced the pilot to land in the yard at USP Marion. While landing the aircraft, the pilot, Allen Barklage, who was a Vietnam War veteran, struggled with Oswald and managed to wrestle the gun away from her. Barklage then shot and killed Oswald, thwarting the escape.[10] A couple of months later Oswald's daughter hijacked TWA Flight 541 in an effort to free Trapnell."]
Each entry contains information about a helicopter prison escape attempt, including the date, prison name, country, whether it succeeded, the escapee(s), and a detailed description. The descriptions are very long, and we don’t need them for our analysis, so let’s clean our data by removing them.
Cleaning the Data
Data cleaning is a crucial step in any analysis. For this project, we’ll start by removing the lengthy descriptions from our data, which are in the last column. We’ll remove them by slicing each row of data:
index = 0
for row in data:
    data[index] = row[:-1]  # Keep everything except the last element
    index += 1
Now let’s confirm our cleaning worked:
print(data[:3])
[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
Much better! Now each entry is more manageable, containing just the key information we need.
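The same trimming step can also be written as a list comprehension. This is just an alternative sketch, not the approach used in the lesson; the sample list below contains two shortened rows, and the placeholder string stands in for the long description.

```python
# Two shortened rows; the last item is a placeholder for the long description
sample = [
    ["August 19, 1971", "Santa Martha Acatitla", "Mexico", "Yes",
     "Joel David Kaplan", "long description ..."],
    ["May 24, 1978", "United States Penitentiary, Marion", "United States", "No",
     "Garrett Brock Trapnell", "long description ..."],
]

# Build a new list where every row keeps everything except its last element
trimmed = [row[:-1] for row in sample]

print(len(sample[0]), len(trimmed[0]))
```

Unlike the index-based loop, the comprehension leaves the original list untouched and returns a new one, which can be handy if you want to keep the raw rows around.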
Next, we need to standardize the dates. Currently, they’re in a format like “August 19, 1971,” but for our analysis, we just need the year:
for row in data:
    date = fetch_year(row[0])  # fetch_year comes from helper.py
    row[0] = date
Instructor Insight: When working with dates in data analysis, it’s common to extract just the components you need. Here, we’re only interested in yearly trends, so we extract just the year. If we were looking for seasonal patterns, we would keep the month instead.
Let’s check our progress:
print(data[:3])
[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
Great! The dates have been converted to just the years, which will make our analysis easier.
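The fetch_year function comes from helper.py, and its implementation isn’t shown here. As a rough idea of how a year could be pulled out of a date string, here is a sketch using a regular expression. The name extract_year is hypothetical, and the real helper may work differently.

```python
import re


def extract_year(date_string):
    """Return the first four-digit number in date_string as an int (hypothetical helper)."""
    match = re.search(r"\d{4}", date_string)
    if match is None:
        raise ValueError(f"no four-digit year found in {date_string!r}")
    return int(match.group())


print(extract_year("August 19, 1971"))  # 1971
```

Converting the result to an integer matters for the next step, because it lets us compare years numerically and build a range between the earliest and latest one.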
Analyzing Prison Escapes by Year
Now that our data is clean, let’s analyze which years had the most helicopter prison escape attempts.
First, let’s find the range of years in our dataset:
min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]
print(min_year)
print(max_year)
1971
2020
Now, let’s create a list of all years from the minimum to the maximum. This will ensure we account for years where no escapes occurred:
years = []
for year in range(min_year, max_year + 1):
    years.append(year)
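If you’re comfortable with built-in functions, the same list of years can be produced in one line by passing the range to list(). The sketch below hardcodes the minimum and maximum years printed earlier so that it runs on its own.

```python
min_year, max_year = 1971, 2020  # the values printed above

# range() stops before its second argument, so add 1 to include max_year
years = list(range(min_year, max_year + 1))

print(years[0], years[-1], len(years))
```

Both versions give the same result; the explicit loop is useful practice, while list(range(...)) is what you’ll usually see in existing code.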
Next, we’ll create a frequency table to count escape attempts per year:
attempts_per_year = []
for year in years:
    attempts_per_year.append([year, 0])
Now, we’ll loop through our data and increment the count for each year:
for row in data:
    for year_attempt in attempts_per_year:
        year = year_attempt[0]
        if row[0] == year:
            year_attempt[1] += 1
print(attempts_per_year)
[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]
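The nested loop checks every year in the table for every row. Because the table holds consecutive years, the position of a year can be computed directly as year - min_year, which avoids the inner loop. Here is a sketch of that idea; the function name count_per_year is made up for illustration, and the sample uses the years from the first three rows of our data.

```python
def count_per_year(year_values, min_year, max_year):
    """Build [[year, count], ...] for every year from min_year to max_year."""
    counts = [[year, 0] for year in range(min_year, max_year + 1)]
    for year in year_values:
        # Consecutive years mean the index is just the offset from min_year
        counts[year - min_year][1] += 1
    return counts


sample_years = [1971, 1973, 1978]  # years of the first three rows
table = count_per_year(sample_years, 1971, 1978)
print(table)
```

With only a few dozen rows the difference in speed is negligible, but the single-loop version scales better as the dataset grows.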
Let’s visualize our findings with a simple bar chart:
%matplotlib inline
barplot(attempts_per_year)
Instructor Insight: From this visualization, we can see that helicopter prison escapes aren’t particularly common in any given year, with most years having 0-1 attempts. However, there are interesting spikes in the mid-1980s, early 2000s, and around 2007-2009, where multiple attempts occurred in the same year. This could be due to copycat attempts inspired by successful escapes, or perhaps changes in prison security measures over time.
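The barplot function is provided by helper.py, so its internals aren’t covered here. If you want a quick look at a frequency table without any plotting library, a text-based chart is one option. The sketch below is a stand-in for illustration only, not the helper’s implementation; text_barplot is a hypothetical name, and the sample values are taken from the frequency table above.

```python
def text_barplot(pairs):
    """Return one line per [label, count] pair, with '#' marks as the bar."""
    lines = []
    for label, count in pairs:
        lines.append(f"{label} | {'#' * count} {count}")
    return "\n".join(lines)


sample = [[1985, 2], [1986, 3], [1987, 1], [1988, 1]]  # from the table above
print(text_barplot(sample))
```

A chart like this is crude, but it can be enough to spot the years where several attempts happened.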
Analyzing Escapes by Country
Now let’s look at which countries have had the most helicopter prison escapes.
First, we need to get a list of all countries in our dataset:
countries = []
for row in data:
    country = row[2]
    if country not in countries:
        countries.append(country)
Now let’s count the frequency of escapes in each country using our basic Python skills:
countries_frequency = []
for country in countries:
    countries_frequency.append([country, 0])  # Start every country at zero
for country_attempt in countries_frequency:
    country = country_attempt[0]
    for row in data:
        if row[2] == country:
            country_attempt[1] += 1
This approach works, but it’s not the most efficient. In real-world data analysis, we often have multiple ways to solve the same problem. Let’s look at some alternative approaches:
Using Dictionaries (More Efficient)
countries = {}
for row in data:
    country = row[2]
    if country not in countries:
        countries[country] = 1  # First time we see this country
    else:
        countries[country] += 1
Instructor Insight: Dictionaries are often more efficient than lists for tasks like counting frequencies because they use hash tables, allowing for O(1) lookup time on average compared to O(n) for lists. This performance difference becomes more significant with larger datasets. During a technical interview I once panicked because I couldn’t remember the pandas method for frequency counts, but knowing how to do it with basic Python would have been sufficient!
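The standard library also offers shortcuts for dictionary-based counting. The sketch below shows two of them: dict.get with a default value, and collections.Counter. The list of countries is an illustrative sample made up for this example, not the real dataset.

```python
from collections import Counter

# Illustrative sample only, not the real dataset
sample_countries = ["Mexico", "Ireland", "United States", "France", "France"]

# Option 1: dict.get returns 0 when the country hasn't been seen yet
manual = {}
for country in sample_countries:
    manual[country] = manual.get(country, 0) + 1

# Option 2: Counter does the same bookkeeping for us
counts = Counter(sample_countries)

print(manual["France"])
print(counts.most_common(1))
```

Counter is a dictionary subclass, so everything you know about dictionaries still applies, and most_common() gives you the entries sorted from most to least frequent.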
Using Pandas (Most Efficient)
For those familiar with pandas, we can accomplish this task even more efficiently using the pandas method (value_counts()) I struggled to remember:
import pandas as pd
df = pd.DataFrame(data, columns=["Date", "Prison name", "Country", "Succeeded", "Escapee"])
countries_frequency = df["Country"].value_counts()
print_pretty_table(countries_frequency)
Country | Number of Occurrences |
---|---|
France | 15 |
United States | 8 |
Canada | 4 |
Belgium | 4 |
Greece | 4 |
United Kingdom | 2 |
Australia | 2 |
Brazil | 2 |
Netherlands | 1 |
Italy | 1 |
Mexico | 1 |
Chile | 1 |
Russia | 1 |
Ireland | 1 |
Puerto Rico | 1 |
From our analysis, we can see that France has the highest number of helicopter prison escapes with 15 attempts, followed by the United States with 8.
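The table above is sorted from most to least frequent. If you built the counts with a plain dictionary instead of pandas, you can get the same ordering with sorted(). The sketch below uses four of the counts from the table, deliberately listed out of order.

```python
# Four counts taken from the table above, deliberately unordered
frequency = {"Canada": 4, "France": 15, "Mexico": 1, "United States": 8}

# Sort the (country, count) pairs by count, largest first
ranked = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for country, count in ranked:
    print(f"{country}: {count}")
```

The key function tells sorted() to compare the counts rather than the country names, and reverse=True puts the largest count first.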
Review
In this project, we’ve analyzed helicopter prison escapes using basic Python skills. We learned how to:
- Import data from a web source
- Clean and process data for analysis
- Create frequency tables using different Python approaches
- Visualize our findings
We discovered that helicopter prison escapes peaked in certain years (1986, 2001, 2007, and 2009) and are most common in France and the United States.
Next Steps
If you want to extend this project, here are some ideas:
- Analyze the success rate of helicopter escapes by country or decade
- Investigate whether multiple escapees affect success rates
- Look for patterns in repeat escapees
- Explore correlations between prison security levels and escape attempts
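As a starting point for the first idea, here is a sketch that computes the success rate per decade. It assumes rows in the cleaned format used in this project (year, prison, country, "Yes" or "No", escapees); the function name success_rate_by_decade is hypothetical, and the sample contains only the three rows shown earlier, so the result isn’t a finding about the full dataset.

```python
def success_rate_by_decade(rows):
    """Map each decade (e.g. 1970) to the share of attempts that succeeded."""
    totals = {}  # decade -> (attempts, successes)
    for year, _prison, _country, succeeded, _escapees in rows:
        decade = year // 10 * 10
        attempts, successes = totals.get(decade, (0, 0))
        totals[decade] = (attempts + 1, successes + (succeeded == "Yes"))
    return {
        decade: successes / attempts
        for decade, (attempts, successes) in totals.items()
    }


# The three cleaned rows shown earlier in this guide
sample = [
    [1971, "Santa Martha Acatitla", "Mexico", "Yes",
     "Joel David Kaplan Carlos Antonio Contreras Castro"],
    [1973, "Mountjoy Jail", "Ireland", "Yes",
     "JB O'Hagan Seamus TwomeyKevin Mallon"],
    [1978, "United States Penitentiary, Marion", "United States", "No",
     "Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson"],
]

rates = success_rate_by_decade(sample)
print(rates)
```

Swapping the decade calculation for the country column would give you success rates by country instead.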
If you’re ready to take your Python skills to the next level, check out Dataquest’s Python Basics for Data Analysis skill path or our Data Analyst in Python career path.
Remember, the key to mastering data analysis is practice. Try to apply these techniques to datasets that interest you!
Happy coding!