New York City, known for its bustling streets and limited parking spaces, generates a wealth of parking violation data. In this article, we will walk you through the NYC parking violations dataset's schema, demonstrate how to load it into a data warehouse like Firebolt, and provide sample SQL queries for analysis.
Before delving into the analysis, let's first understand the schema of the NYC parking violations dataset. Here's a tabular representation of the dataset's schema:
Column Name | Data Type | Description |
---|---|---|
summons | Integer | Unique identifier for each parking violation record |
plateid | Text | License plate identifier |
registration | Text | Vehicle registration identifier |
plate | Text | License plate information |
issue_date | Date | Date when the violation was issued |
violation_code | Integer | Code indicating the type of parking violation |
vehicle_body | Text | Description of the vehicle's body type |
vehicle_make | Text | Make or manufacturer of the vehicle |
issuing_agency | Text | Agency responsible for issuing the violation |
street_code1 | Integer | Code representing the primary street where the violation occurred |
street_code2 | street_code2 | Code representing a secondary street related to the violation |
street_code3 | Integer | Code representing another secondary street related to the violation |
vehicle_expiration | Text | Expiration date of the vehicle registration |
violation_location | Integer | Code representing the location of the violation |
violation_precinct | Integer | Precinct where the violation occurred |
issuer_precinct | Integer | Precinct of the issuing officer |
issuer_code | Integer | Code representing the issuing officer |
issuer_command | Text | Command of the issuing officer |
issuer_squad | Text | Squad of the issuing officer |
violation_time | Text | Time of day when the violation occurred |
time_first_observed | Text | Time when the violation was first observed |
violation_county | Text | County where the violation occurred |
violation_infront_opposite | Text | Indicator of whether the violation occurred in front of or opposite |
house_integer | Text | House number or identifier on the street where the violation occurred |
street_name | Text | Name of the street where the violation occurred |
intersecting_street | Text | Name of the intersecting street (if applicable) |
date_first_observed | Text | Date when the violation was first observed |
law_section | Integer | Section of the law or code related to the violation |
sub_division | Text | Sub-division or additional information related to the violation |
violation_legal_code | Text | Legal code related to the violation |
days_parking_in_effect | Text | Days of the week when parking regulations are in effect |
from_hours_in_effect | Text | Starting time when parking regulations are in effect |
to_hours_in_effect | Text | Ending time when parking regulations are in effect |
vehicle_color | Text | Color of the vehicle |
unregistered_vehicle | Text | Indicator of whether the vehicle is unregistered |
vehicle_year | Text | Year of manufacture of the vehicle |
meter_integer | Text | Meter number (if applicable) |
feet_from_curb | Integer | Distance (in feet) from the curb where the violation occurred |
violation_post_code | Text | Postal code related to the violation |
violation_description | Text | Description of the parking violation |
no_standing_or_stopping_violation | Text | Indicator of whether it's a "no standing" or "no stopping" violation |
hydrant_violation | Text | Indicator of whether it's a hydrant violation |
double_parking_violation | Text | Indicator of whether it's a double parking violation |
The dataset contains various columns providing information about each parking violation, including vehicle details, violation codes, and issuance information.
To begin analyzing the NYC parking violations dataset, you'll need to load it into a data warehouse like Firebolt. Data can be loaded into Firebolt using two options. First option is to use the “COPY FROM” statement to load data.
Below statement, shows the use of “COPY FROM” to infer the schema and load data into Firebolt.
1COPY INTO nyc_parkingviolations FROM 's3://firebolt-sample-datasets-public-us-east-1/nyc_sample_datasets/nycparking/parquet/'
2WITH PATTERN="*.parquet" AUTO_CREATE=TRUE
3 TYPE=PARQUET;
An external table defines the schema and location of the data source. In Firebolt, you can create an external table using SQL DDL (Data Definition Language) statements, specifying the dataset's structure and the source location. For example:
1CREATE EXTERNAL TABLE ex_nyc_parkingviolations (
2 summons Integer,
3 plateid Text,
4 registration Text,
5 plate Text,
6 issue_date TEXT,
7 violation_code Integer,
8 vehicle_body Text,
9 vehicle_make Text,
10 issuing_agency Text,
11 street_code1 Integer,
12 street_code2 Integer,
13 street_code3 Integer,
14 vehicle_expiration Text,
15 violation_location Integer,
16 violation_precinct Integer,
17 issuer_precinct Integer,
18 issuer_code Integer,
19 issuer_command Text,
20 issuer_squad Text,
21 violation_time Text,
22 time_first_observed Text,
23 violation_county Text,
24 violation_infront_opposit Text,
25 house_integer Text,
26 street_name Text,
27 intersecting_street Text,
28 date_first_observed Integer,
29 law_section Integer,
30 sub_division Text,
31 violation_legal_code Text,
32 days_parking_in_effect Text,
33 from_hours_in_effect Text,
34 to_hours_in_effect Text,
35 vehicle_color Text,
36 unregistered_vehicle Integer,
37 vehicle_year Text,
38 meter_integer Text,
39 feet_from_curb Integer,
40 violation_post_code Text,
41 violation_description Text,
42 no_standing_or_stopping_violation Text,
43 hydrant_violation Text,
44 double_parking_violation Text
45)
46URL = 's3://firebolt-sample-datasets-public-us-east-1/nyc_sample_datasets/nycparking/parquet/'
47OBJECT_PATTERN = '*.parquet
48TYPE = (PARQUET);
In this step, you define the external table's structure and specify the location of the Parquet file in an S3 bucket.
An internal table is where the data is stored and optimized for querying in Firebolt. You can create an internal table based on the external table structure. Here's an example:
1CREATE TABLE nyc_parkingviolations (
2 summons Integer,
3 plateid Text,
4 registration Text,
5 plate Text,
6 issue_date Date,
7 violation_code Integer,
8 vehicle_body Text,
9 vehicle_make Text,
10 issuing_agency Text,
11 street_code1 Integer,
12 street_code2 Integer,
13 street_code3 Integer,
14 vehicle_expiration Integer,
15 violation_location Integer,
16 violation_precinct Integer,
17 issuer_precinct Integer,
18 issuer_code Integer,
19 issuer_command Text,
20 issuer_squad Text,
21 violation_time Text,
22 time_first_observed Text,
23 violation_county Text,
24 violation_infront_opposit Text,
25 house_integer Text,
26 street_name Text,
27 intersecting_street Text,
28 date_first_observed Integer,
29 law_section Integer,
30 sub_division Text,
31 violation_legal_code Text,
32 days_parking_in_effect Text,
33 from_hours_in_effect Text,
34 to_hours_in_effect Text,
35 vehicle_color Text,
36 unregistered_vehicle Integer,
37 vehicle_year Text,
38 meter_integer Text,
39 feet_from_curb Integer,
40 violation_post_code Text,
41 violation_description Text,
42 no_standing_or_stopping_violation Text,
43 hydrant_violation Text,
44 double_parking_violation Text
45);
This SQL statement creates an internal table and copies the data from the external table into it.
Once you have created the internal table, you can insert new data or update existing data as needed. For example, you can periodically load updated parking violations data into the internal table.
1INSERT INTO nyc_parkingviolations
2SELECT
3 summons ,
4 plateid ,
5 registration ,
6 plate ,
7 TO_TIMESTAMP(issue_date,'YYYY-MM-DD') ,
8 violation_code ,
9 vehicle_body ,
10 vehicle_make ,
11 issuing_agency ,
12 street_code1 ,
13 street_code2 ,
14 street_code3 ,
15 vehicle_expiration ,
16 violation_location ,
17 violation_precinct ,
18 issuer_precinct ,
19 issuer_code ,
20 issuer_command ,
21 issuer_squad ,
22 violation_time ,
23 time_first_observed ,
24 violation_county ,
25 violation_infront_opposit ,
26 house_integer ,
27 street_name ,
28 intersecting_street ,
29 date_first_observed ,
30 law_section ,
31 sub_division ,
32 violation_legal_code ,
33 days_parking_in_effect ,
34 from_hours_in_effect ,
35 to_hours_in_effect ,
36 vehicle_color ,
37 unregistered_vehicle ,
38 vehicle_year ,
39 meter_integer ,
40 feet_from_curb ,
41 violation_post_code ,
42 violation_description ,
43 no_standing_or_stopping_violation ,
44 hydrant_violation ,
45 double_parking_violation
46FROM ex_nyc_parkingviolations;
Now that you have loaded the NYC parking violations dataset into Firebolt, you can perform various analyses using SQL queries. Here are some sample queries with descriptions:
Query 1: Count of Violations by Violation Code
1SELECT violation_code, violation_description, COUNT(*) AS violation_count
2FROM nyc_parkingviolations
3GROUP BY violation_code, violation_description
4ORDER BY violation_count DESC;
This query counts the number of violations for each violation code, helping you identify the most common parking violations.
Query 2: Average Violations per Day
1SELECT issue_date, COUNT(*) AS daily_violations
2FROM nyc_parkingviolations
3GROUP BY issue_date
4ORDER BY issue_date;
This query calculates the average number of violations issued per day, allowing you to spot trends over time.
Query 3: Top Vehicle Makes with the Most Violations
1SELECT vehicle_make, COUNT(*) AS violation_count
2FROM nyc_parkingviolations
3GROUP BY vehicle_make
4ORDER BY violation_count DESC
5LIMIT 10;
This query identifies the top vehicle makes associated with the most violations, which can be useful for targeting enforcement efforts.
Query 4: Violations by Issuing Agency
1SELECT issuing_agency, COUNT(*) AS violation_count
2FROM nyc_parkingviolations
3GROUP BY issuing_agency
4ORDER BY violation_count DESC;
This query categorizes violations by issuing agency, helping you understand which agencies are responsible for the most violations.
Feel free to explore the dataset further… who knows what insights you might draw. Could it be the color of the car that draws parking tickets … You never know!