Analyzing the NYC Parking Violations Dataset

An Introduction to Cloud Data Warehousing

New York City, known for its bustling streets and limited parking spaces, generates a wealth of parking violation data. In this article, we will walk you through the NYC parking violations dataset's schema, demonstrate how to load it into a data warehouse like Firebolt, and provide sample SQL queries for analysis.

Understanding the NYC Parking Violations Schema

Before delving into the analysis, let's first understand the schema of the NYC parking violations dataset. Here's a tabular representation of the dataset's schema:

Column Name Data Type Description
summons Integer Unique identifier for each parking violation record
plateid Text License plate identifier
registration Text Vehicle registration identifier
plate Text License plate information
issue_date Date Date when the violation was issued
violation_code Integer Code indicating the type of parking violation
vehicle_body Text Description of the vehicle's body type
vehicle_make Text Make or manufacturer of the vehicle
issuing_agency Text Agency responsible for issuing the violation
street_code1 Integer Code representing the primary street where the violation occurred
street_code2 street_code2 Code representing a secondary street related to the violation
street_code3 Integer Code representing another secondary street related to the violation
vehicle_expiration Text Expiration date of the vehicle registration
violation_location Integer Code representing the location of the violation
violation_precinct Integer Precinct where the violation occurred
issuer_precinct Integer Precinct of the issuing officer
issuer_code Integer Code representing the issuing officer
issuer_command Text Command of the issuing officer
issuer_squad Text Squad of the issuing officer
violation_time Text Time of day when the violation occurred
time_first_observed Text Time when the violation was first observed
violation_county Text County where the violation occurred
violation_infront_opposite Text Indicator of whether the violation occurred in front of or opposite
house_integer Text House number or identifier on the street where the violation occurred
street_name Text Name of the street where the violation occurred
intersecting_street Text Name of the intersecting street (if applicable)
date_first_observed Text Date when the violation was first observed
law_section Integer Section of the law or code related to the violation
sub_division Text Sub-division or additional information related to the violation
violation_legal_code Text Legal code related to the violation
days_parking_in_effect Text Days of the week when parking regulations are in effect
from_hours_in_effect Text Starting time when parking regulations are in effect
to_hours_in_effect Text Ending time when parking regulations are in effect
vehicle_color Text Color of the vehicle
unregistered_vehicle Text Indicator of whether the vehicle is unregistered
vehicle_year Text Year of manufacture of the vehicle
meter_integer Text Meter number (if applicable)
feet_from_curb Integer Distance (in feet) from the curb where the violation occurred
violation_post_code Text Postal code related to the violation
violation_description Text Description of the parking violation
no_standing_or_stopping_violation Text Indicator of whether it's a "no standing" or "no stopping" violation
hydrant_violation Text Indicator of whether it's a hydrant violation
double_parking_violation Text Indicator of whether it's a double parking violation

The dataset contains various columns providing information about each parking violation, including vehicle details, violation codes, and issuance information.

Loading Data into Firebolt

To begin analyzing the NYC parking violations dataset, you'll need to load it into a data warehouse like Firebolt. Data can be loaded into Firebolt using two options. First option is to use the “COPY FROM” statement to load data. 

Option 1: Using “Copy FROM”

Below statement, shows the use of “COPY FROM” to infer the schema and load data into Firebolt.

1COPY INTO nyc_parkingviolations FROM 's3://firebolt-sample-datasets-public-us-east-1/nyc_sample_datasets/nycparking/parquet/'
2WITH PATTERN="*.parquet" AUTO_CREATE=TRUE 
3 TYPE=PARQUET;

Option 2: Using External tables

1. Create an External Table

An external table defines the schema and location of the data source. In Firebolt, you can create an external table using SQL DDL (Data Definition Language) statements, specifying the dataset's structure and the source location. For example:

1CREATE EXTERNAL TABLE ex_nyc_parkingviolations (
2  summons Integer,
3  plateid Text,
4  registration Text,
5  plate Text,
6  issue_date TEXT,
7  violation_code Integer,
8  vehicle_body Text,
9  vehicle_make Text,
10  issuing_agency Text,
11  street_code1 Integer,
12  street_code2 Integer,
13  street_code3 Integer,
14  vehicle_expiration Text,
15  violation_location Integer,
16  violation_precinct Integer,
17  issuer_precinct Integer,
18  issuer_code Integer,
19  issuer_command Text,
20  issuer_squad Text,
21  violation_time Text,
22  time_first_observed Text,
23  violation_county Text,
24  violation_infront_opposit Text,
25  house_integer Text,
26  street_name Text,
27  intersecting_street Text,
28  date_first_observed Integer,
29  law_section Integer,
30  sub_division Text,
31  violation_legal_code Text,
32  days_parking_in_effect Text,
33  from_hours_in_effect Text,
34  to_hours_in_effect Text,
35  vehicle_color Text,
36  unregistered_vehicle Integer,
37  vehicle_year Text,
38  meter_integer Text,
39  feet_from_curb Integer,
40  violation_post_code Text,
41  violation_description Text,
42  no_standing_or_stopping_violation Text,
43  hydrant_violation Text,
44  double_parking_violation Text
45)
46URL = 's3://firebolt-sample-datasets-public-us-east-1/nyc_sample_datasets/nycparking/parquet/'
47OBJECT_PATTERN = '*.parquet
48TYPE = (PARQUET);

In this step, you define the external table's structure and specify the location of the Parquet file in an S3 bucket.

2. Create an Internal Table

An internal table is where the data is stored and optimized for querying in Firebolt. You can create an internal table based on the external table structure. Here's an example:

1CREATE TABLE nyc_parkingviolations (
2  summons Integer,
3  plateid Text,
4  registration Text,
5  plate Text,
6  issue_date Date,
7  violation_code Integer,
8  vehicle_body Text,
9  vehicle_make Text,
10  issuing_agency Text,
11  street_code1 Integer,
12  street_code2 Integer,
13  street_code3 Integer,
14  vehicle_expiration Integer,
15  violation_location Integer,
16  violation_precinct Integer,
17  issuer_precinct Integer,
18  issuer_code Integer,
19  issuer_command Text,
20  issuer_squad Text,
21  violation_time Text,
22  time_first_observed Text,
23  violation_county Text,
24  violation_infront_opposit Text,
25  house_integer Text,
26  street_name Text,
27  intersecting_street Text,
28  date_first_observed Integer,
29  law_section Integer,
30  sub_division Text,
31  violation_legal_code Text,
32  days_parking_in_effect Text,
33  from_hours_in_effect Text,
34  to_hours_in_effect Text,
35  vehicle_color Text,
36  unregistered_vehicle Integer,
37  vehicle_year Text,
38  meter_integer Text,
39  feet_from_curb Integer,
40  violation_post_code Text,
41  violation_description Text,
42  no_standing_or_stopping_violation Text,
43  hydrant_violation Text,
44  double_parking_violation Text
45);

This SQL statement creates an internal table and copies the data from the external table into it.

3. Data Ingestion

Once you have created the internal table, you can insert new data or update existing data as needed. For example, you can periodically load updated parking violations data into the internal table.

1INSERT INTO nyc_parkingviolations 
2SELECT 
3 summons ,
4  plateid ,
5  registration ,
6  plate ,
7  TO_TIMESTAMP(issue_date,'YYYY-MM-DD') ,
8  violation_code ,
9  vehicle_body ,
10  vehicle_make ,
11  issuing_agency ,
12  street_code1 ,
13  street_code2 ,
14  street_code3 ,
15  vehicle_expiration ,
16  violation_location ,
17  violation_precinct ,
18  issuer_precinct ,
19  issuer_code ,
20  issuer_command ,
21  issuer_squad ,
22  violation_time ,
23  time_first_observed ,
24  violation_county ,
25  violation_infront_opposit ,
26  house_integer ,
27  street_name ,
28  intersecting_street ,
29  date_first_observed ,
30  law_section ,
31  sub_division ,
32  violation_legal_code ,
33  days_parking_in_effect ,
34  from_hours_in_effect ,
35  to_hours_in_effect ,
36  vehicle_color ,
37  unregistered_vehicle ,
38  vehicle_year ,
39  meter_integer ,
40  feet_from_curb ,
41  violation_post_code ,
42  violation_description ,
43  no_standing_or_stopping_violation ,
44  hydrant_violation ,
45  double_parking_violation 
46FROM ex_nyc_parkingviolations;

Sample SQL Queries for Analysis

Now that you have loaded the NYC parking violations dataset into Firebolt, you can perform various analyses using SQL queries. Here are some sample queries with descriptions:

Query 1: Count of Violations by Violation Code

1SELECT violation_code, violation_description, COUNT(*) AS violation_count
2FROM nyc_parkingviolations
3GROUP BY violation_code, violation_description
4ORDER BY violation_count DESC;

This query counts the number of violations for each violation code, helping you identify the most common parking violations.

Query 2: Average Violations per Day

1SELECT issue_date, COUNT(*) AS daily_violations
2FROM nyc_parkingviolations
3GROUP BY issue_date
4ORDER BY issue_date;

This query calculates the average number of violations issued per day, allowing you to spot trends over time.

Query 3: Top Vehicle Makes with the Most Violations

1SELECT vehicle_make, COUNT(*) AS violation_count
2FROM nyc_parkingviolations
3GROUP BY vehicle_make
4ORDER BY violation_count DESC
5LIMIT 10;

This query identifies the top vehicle makes associated with the most violations, which can be useful for targeting enforcement efforts.

Query 4: Violations by Issuing Agency

1SELECT issuing_agency, COUNT(*) AS violation_count
2FROM nyc_parkingviolations
3GROUP BY issuing_agency
4ORDER BY violation_count DESC;

This query categorizes violations by issuing agency, helping you understand which agencies are responsible for the most violations.

Feel free to explore the dataset further… who knows what insights you might draw. Could it be the color of the car that draws parking tickets … You never know!

Send me as pdf