
1. CHAPTER 9: Inspecting and Modifying Data
1.1. Importing data on meat, poultry, and egg producers
1.2. Interviewing the data set
1.2.1. Purpose
1.2.1.1. Discover details
1.2.1.2. Determine what data contains
1.2.1.3. Understand what questions can be answered
1.2.1.4. Assess suitability for analysis
1.2.2. Techniques
1.2.2.1. Aggregate queries
1.2.2.2. GROUP BY
1.2.2.3. HAVING clause
1.2.2.4. COUNT()
1.2.2.5. Filtering
1.2.3. Example Analysis
1.2.3.1. Dataset: meat_poultry_egg_inspect
1.2.3.2. Question: Are companies listed multiple times at same address?
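A minimal interview query for the question above, using the aggregate techniques listed (column names follow the book's meat_poultry_egg_inspect table):

```sql
-- Which companies appear more than once at the same address?
SELECT company, street, city, st, count(*) AS address_count
FROM meat_poultry_egg_inspect
GROUP BY company, street, city, st
HAVING count(*) > 1
ORDER BY company, street, city, st;
```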
1.2.4. Best Practices
1.2.4.1. Never make assumptions
1.2.4.2. Validate data before drawing conclusions
1.2.4.3. Investigate anomalies
1.2.5. Checking for Missing Values
1.2.5.1. Purpose
1.2.5.1.1. Verify data completeness
1.2.5.1.2. Identify missing values
1.2.5.1.3. Understand data quality issues
1.2.5.2. SQL Techniques
1.2.5.2.1. COUNT() function
1.2.5.2.2. GROUP BY clause
1.2.5.2.3. ORDER BY sorting
1.2.5.2.4. WHERE with IS NULL
1.2.5.2.5. Handling NULLs in sorting (NULLS FIRST/LAST)
1.2.5.3. Data Cleanup
1.2.5.3.1. Identifying missing values
1.2.5.3.2. Tracking data cleansing tasks
1.2.5.3.3. Source verification procedures
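A sketch of the missing-value checks above, assuming est_number is the table's key column:

```sql
-- Row counts per state code; sorting NULLs first exposes the gaps.
SELECT st, count(*) AS st_count
FROM meat_poultry_egg_inspect
GROUP BY st
ORDER BY st NULLS FIRST;

-- Inspect the rows whose st value is missing.
SELECT est_number, company, city, st
FROM meat_poultry_egg_inspect
WHERE st IS NULL;
```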
1.2.6. Checking for Inconsistent Data Values
1.2.6.1. Problem Description
1.2.6.1.1. Inconsistent data hampers analysis
1.2.6.1.2. Multiple spellings lead to inaccurate aggregation
1.2.6.1.3. Variations in names create data quality issues
1.2.6.2. SQL Techniques
1.2.6.2.1. GROUP BY clause
1.2.6.2.2. COUNT() function
1.2.6.2.3. ORDER BY to sort results for easy scanning
1.2.6.2.4. Visual inspection of the deduplicated values
1.2.6.3. Solution Strategy
1.2.6.3.1. Standardize company names
1.2.6.3.2. Create list of items to fix
1.2.6.3.3. Ensure proper grouping for future aggregations
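A sketch of the duplicate-spelling scan described above:

```sql
-- One row per distinct company spelling; sorted output makes variants easy to spot.
SELECT company, count(*) AS company_count
FROM meat_poultry_egg_inspect
GROUP BY company
ORDER BY company ASC;
```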
1.3. Modifying tables, columns, and data
1.3.1. Modifying Database Elements
1.3.1.1. SQL Commands
1.3.1.2. Use Cases
1.3.1.3. Modify Process
1.3.2. When to Toss Your Data
1.3.2.1. Warning Signs
1.3.2.2. Steps to Evaluate
1.3.2.3. Making the Call
1.3.3. Modifying Tables with ALTER TABLE
1.3.3.1. Add Column
1.3.3.2. Drop Column
1.3.3.3. Change Data Type
1.3.3.4. Add NOT NULL
1.3.3.5. Remove NOT NULL
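Minimal ALTER TABLE forms for each operation above (the notes and zip column names are illustrative):

```sql
ALTER TABLE meat_poultry_egg_inspect ADD COLUMN notes text;                      -- add a column
ALTER TABLE meat_poultry_egg_inspect DROP COLUMN notes;                          -- drop a column
ALTER TABLE meat_poultry_egg_inspect ALTER COLUMN zip SET DATA TYPE varchar(5);  -- change a data type
ALTER TABLE meat_poultry_egg_inspect ALTER COLUMN company SET NOT NULL;          -- add NOT NULL
ALTER TABLE meat_poultry_egg_inspect ALTER COLUMN company DROP NOT NULL;         -- remove NOT NULL
```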
1.4. Modifying Values with UPDATE
1.4.1. Basic Syntax
1.4.2. Multiple Columns
1.4.3. Targeted Updates
1.4.4. Cross-Table Updates
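Sketches of the UPDATE forms above, with generic table and column names:

```sql
-- Basic: set a column for every row.
UPDATE my_table SET col_1 = 'value';

-- Multiple columns, targeted by a WHERE clause.
UPDATE my_table
SET col_1 = 'value', col_2 = 0
WHERE id = 100;

-- Cross-table: pull the new value from another table.
UPDATE my_table
SET col_1 = other.col_1
FROM other_table AS other
WHERE my_table.id = other.id;
```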
1.5. Creating Backup Tables
1.5.1. Purpose
1.5.1.1. Reference for future use
1.5.1.2. Protection against accidental data loss
1.5.1.3. Safety measure before making table modifications
1.5.2. Syntax
1.5.3. Verification
1.5.3.1. Compare row counts
1.5.3.2. Use single query to check both tables
1.5.4. Limitations
1.5.4.1. Indexes not copied
1.5.4.2. Need to create new indexes on backup tables
1.5.4.3. Consider performance impacts if querying backup table
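A sketch of the backup-and-verify routine:

```sql
-- Copy the rows (indexes are not copied) into a backup table.
CREATE TABLE meat_poultry_egg_inspect_backup AS
SELECT * FROM meat_poultry_egg_inspect;

-- Verify both row counts in a single query.
SELECT (SELECT count(*) FROM meat_poultry_egg_inspect) AS original,
       (SELECT count(*) FROM meat_poultry_egg_inspect_backup) AS backup;
```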
1.6. Restoring Missing Column Values
1.6.1. Creating a Column Copy
1.6.1.1. SQL Operations
1.6.1.1.1. Add Column
1.6.1.1.2. Copy data
1.6.1.2. Verification process
1.6.1.2.1. Query both columns
1.6.1.2.2. Verify results
1.6.1.3. Benefits
1.6.1.3.1. Extra caution beyond table backup
1.6.1.3.2. Maintains original data in same table
1.6.1.3.3. Allow safe updates to original column
1.6.1.3.4. Easy recovery if updates cause problems
1.6.1.3.5. Patient, cautious approach
1.6.1.4. Usage context
1.6.1.4.1. Used with table that already has a full backup
1.6.1.4.2. Specifically column containing state codes
1.6.1.4.3. Preparation for updating missing data
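A sketch of the in-table column copy for the st state-code column:

```sql
-- Add a copy of st inside the same table before editing the original.
ALTER TABLE meat_poultry_egg_inspect ADD COLUMN st_copy varchar(2);

UPDATE meat_poultry_egg_inspect
SET st_copy = st;

-- Spot-check that the two columns match.
SELECT st, st_copy FROM meat_poultry_egg_inspect LIMIT 5;
```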
1.6.2. Updating Rows Where Values Are Missing
1.6.2.1. Problem identification
1.6.2.1.1. Missing state codes in 'st' column
1.6.2.1.2. Three rows affected
1.6.2.1.3. Need complete state data for counts
1.6.2.2. Research process
1.6.2.2.1. Online search for missing information
1.6.2.2.2. Find state for each establishment
1.6.2.3. SQL solution
1.6.2.3.1. UPDATE with WHERE clause
1.6.2.4. Verification
1.6.2.4.1. Rerun query to find NULL values
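A sketch of the fix; the est_number value is illustrative of one researched establishment:

```sql
UPDATE meat_poultry_egg_inspect
SET st = 'MN'
WHERE est_number = 'V18677A';

-- Verify: rerunning the NULL check should return fewer rows.
SELECT est_number, company, st
FROM meat_poultry_egg_inspect
WHERE st IS NULL;
```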
1.6.3. Restoring Original Values
1.6.3.1. Why restore
1.6.3.1.1. Incorrect values provided
1.6.3.1.2. Wrong rows updated
1.6.3.1.3. Need to undo changes
1.6.3.1.4. Data safety measure
1.6.3.2. Backup options
1.6.3.2.1. In-table backup column
1.6.3.2.2. Full table backup
1.6.3.3. Restoration methods
1.6.3.4. Implementation details
1.6.3.4.1. Both methods use UPDATE statements
1.6.3.4.2. Method 1 is simpler
1.6.3.4.3. Method 2 joins tables using primary key
1.6.4. Updating Values for Consistency
1.6.4.1. Problem identification
1.6.4.1.1. Inconsistent company name spellings
1.6.4.1.2. Multiple variations for same company
1.6.4.2. Data protection strategy
1.6.4.2.1. Create new column for standardized values
1.6.4.2.2. Preserve original data in original column
1.6.4.2.3. Work with copy to avoid data loss
1.6.4.3. Standardization process
1.6.4.3.1. Use UPDATE with WHERE clause
1.6.4.3.2. Use LIKE with wildcard pattern matching
1.6.4.3.3. Set all matches to standardized name format
1.6.4.4. Verification
1.6.4.4.1. SELECT query to compare original and standardized values
1.6.4.4.2. Confirm all variations now standardized
1.6.4.4.3. Keep original column for reference
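A sketch of the standardization pass, assuming a company_standard copy column and an illustrative company name:

```sql
-- Collapse every spelling variant that begins with 'Armour'.
UPDATE meat_poultry_egg_inspect
SET company_standard = 'Armour-Eckrich Meats'
WHERE company LIKE 'Armour%';

-- Compare original and standardized values.
SELECT company, company_standard
FROM meat_poultry_egg_inspect
WHERE company LIKE 'Armour%';
```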
1.6.5. Repairing ZIP Codes Using Concatenation
1.6.5.1. Problem identification
1.6.5.1.1. Missing leading zeros in ZIP codes
1.6.5.2. Data safety approach
1.6.5.2.1. Create backup column
1.6.5.2.2. Copy all existing ZIP values
1.6.5.2.3. Work with original and backup
1.6.5.3. Concatenation technique
1.6.5.3.1. Double-pipe operator (||)
1.6.5.3.2. Combines strings
1.6.5.4. Implementation steps
1.6.5.4.1. Backup ZIP data
1.6.5.4.2. Fix codes missing leading zeros
1.6.5.4.3. Verification
1.6.5.4.4. Additional applications
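A sketch of the repair; the state lists mirror the book's example of ZIP codes that lost leading zeros:

```sql
-- Back up the column first.
ALTER TABLE meat_poultry_egg_inspect ADD COLUMN zip_copy varchar(5);
UPDATE meat_poultry_egg_inspect SET zip_copy = zip;

-- Three-character codes need '00' restored.
UPDATE meat_poultry_egg_inspect
SET zip = '00' || zip
WHERE st IN ('PR', 'VI') AND length(zip) = 3;

-- Four-character codes need a single '0'.
UPDATE meat_poultry_egg_inspect
SET zip = '0' || zip
WHERE st IN ('CT', 'MA', 'ME', 'NH', 'NJ', 'RI', 'VT') AND length(zip) = 4;
```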
1.6.6. Updating Values Across Tables
1.6.6.1. Concept
1.6.6.1.1. Use data from one table to update another
1.6.6.1.2. Leverages table relationships
1.6.6.1.3. Requires matching keys between tables
1.6.6.2. Table structure
1.6.6.2.1. Main table
1.6.6.2.2. Region table
1.6.6.3. Update technique
1.6.6.3.1. UPDATE with table alias
1.6.6.3.2. WHERE EXISTS with subquery
1.6.6.3.3. Join table via matching columns
1.6.6.3.4. Filter by region value
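A sketch of the cross-table update, assuming a state_regions lookup table and an inspection_date column added beforehand:

```sql
UPDATE meat_poultry_egg_inspect inspect
SET inspection_date = '2019-12-01'
WHERE EXISTS (
    SELECT state_regions.region
    FROM state_regions
    WHERE inspect.st = state_regions.st
      AND state_regions.region = 'New England'
);
```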
1.7. Deleting unnecessary data
1.7.1. Deleting Rows from a Table
1.7.1.1. DELETE FROM
1.7.1.1.1. Removes all rows from a table
1.7.1.1.2. Syntax: DELETE FROM table_name
1.7.1.1.3. No WHERE clause needed
1.7.1.2. DELETE WITH WHERE
1.7.1.2.1. Removes specific rows
1.7.1.2.2. Requires WHERE clause with filtering expression
1.7.1.2.3. Syntax: DELETE FROM table_name WHERE expression
1.7.1.2.4. Can use comparison operators (=, <>, IN)
1.7.1.2.5. Can filter with multiple conditions using AND or OR
1.7.1.3. Alternative methods
1.7.1.3.1. DROP TABLE followed by CREATE TABLE
1.7.1.3.2. Faster for tables with large number of rows
1.7.1.3.3. Recreates empty table structure
1.7.2. Deleting a Column from a Table
1.7.2.1. Syntax
1.7.2.1.1. ALTER TABLE table_name DROP COLUMN column_name
1.7.2.1.2. Requires table name and column name
1.7.2.1.3. Table must exist in database
1.7.2.1.4. Column must exist in table
1.7.2.2. Use case example
1.7.2.2.1. Removing backup columns after data fixes
1.7.2.3. Effects
1.7.2.3.1. Removes column completely from table
1.7.2.3.2. Deletes all data in that column
1.7.2.3.3. Cannot be undone without backup
1.7.2.3.4. Table structure is modified
1.7.3. Deleting a Table from a Database
1.7.3.1. Syntax
1.7.3.1.1. DROP TABLE table_name
1.7.3.1.2. Simple ANSI SQL statement
1.7.3.1.3. Only requires table name
1.7.3.2. Use cases
1.7.3.2.1. Remove backup tables
1.7.3.2.2. Delete working tables that are no longer needed
1.7.3.3. Effects
1.7.3.3.1. Completely removes table from database
1.7.3.3.2. Deletes all data in the table
1.7.3.3.3. Cannot be undone without backup
1.7.3.3.4. Removes table structure and schema
1.8. Using transaction blocks to save or revert changes
1.8.1. Core concept
1.8.1.1. Group of SQL statements treated as a unit
1.8.1.2. Change not final until committed
1.8.1.3. Can be canceled before committing
1.8.2. Transaction workflow
1.8.2.1. Start transaction
1.8.2.2. Execute SQL statements
1.8.2.3. Review results
1.8.2.4. Decide to commit or rollback
1.8.3. Keywords
1.8.3.1. START TRANSACTION
1.8.3.2. COMMIT
1.8.3.3. ROLLBACK
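A sketch of the workflow; the company value is illustrative:

```sql
START TRANSACTION;

UPDATE meat_poultry_egg_inspect
SET company = 'AGRO Merchants Oakland LLC'
WHERE company = 'AGRO Merchants Oakland, LLC';

-- Review the result; so far the change is visible only in this session.
SELECT company FROM meat_poultry_egg_inspect WHERE company LIKE 'AGRO%';

ROLLBACK;  -- discard the change, or COMMIT; to make it permanent
```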
1.9. Improving performance when updating large tables
1.9.1. Problem with row updates
1.9.1.1. PostgreSQL creates new row versions for each update
1.9.1.2. Old row versions not deleted immediately
1.9.1.3. Results in table bloat
1.9.1.4. Increases disk usage
1.9.1.5. Time consuming for large tables
1.9.2. Copy-and-swap technique
1.9.2.1. Create a copy of the table with the new column already populated
1.9.2.2. Swap the table names so the new copy becomes the main table
1.9.2.3. Original becomes a backup
1.9.3. Benefits
1.9.3.1. Avoids table bloat
1.9.3.2. Reduces disk space usage
1.9.3.3. Fast for tables with hundreds of thousands or millions of rows
1.9.3.4. Eliminates need for immediate VACUUM cleanup
1.9.3.5. Clean solution with less overhead
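A sketch of copy-and-swap, adding an illustrative inspection_date column during the copy:

```sql
-- Build the new version with the column already populated.
CREATE TABLE meat_poultry_egg_inspect_new AS
SELECT *, '2019-12-01'::date AS inspection_date
FROM meat_poultry_egg_inspect;

-- Swap names; the original stays behind as a backup.
ALTER TABLE meat_poultry_egg_inspect RENAME TO meat_poultry_egg_inspect_old;
ALTER TABLE meat_poultry_egg_inspect_new RENAME TO meat_poultry_egg_inspect;
```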
2. CHAPTER 10: Statistical Functions in SQL
2.1. Creating a census stats table
2.1.1. Data structure
2.1.1.1. Table name: acs_2011_2015_stats
2.1.2. Data source
2.1.2.1. US Census Bureau
2.1.2.2. American Community Survey
2.1.2.3. 5-year estimates
2.1.3. SQL creation code
2.1.3.1. CREATE TABLE statement
2.1.3.2. Column definitions
2.1.3.3. Constraints setup
2.1.3.4. CHECK constraint
2.1.4. Data import process
2.1.5. Measuring Correlation with corr(Y, X)
2.1.5.1. Pearson correlation coefficient
2.1.5.1.1. Measures strength of linear relationship
2.1.5.1.2. Values range from -1 to +1
2.1.5.1.3. Denoted as r
2.1.5.1.4. Shows how change in one variable relates to change in another
2.1.5.2. Interpretation of values
2.1.5.2.1. r = 0 No relationship
2.1.5.2.2. r = +/-0.1 to +/-0.29 weak relationship
2.1.5.2.3. r = +/-0.3 to +/-0.59 Moderate relationship
2.1.5.2.4. r = +/-0.6 to +/-0.99 Strong to nearly perfect relationship
2.1.5.2.5. r = +/-1 Perfect relationship
2.1.5.3. SQL function implementation
2.1.5.3.1. Function: corr(Y, X)
2.1.5.3.2. Binary aggregate function
2.1.5.3.3. Y = dependent variable
2.1.5.3.4. X = independent variable
2.1.5.3.5. Order of parameters doesn't matter for results
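A sketch using the ACS table; the income and education column names follow the book's schema:

```sql
-- r between median household income (Y) and percent with a bachelor's or higher (X),
-- cast to numeric and rounded for readability.
SELECT round(
           corr(median_hh_income, pct_bachelors_higher)::numeric, 2
       ) AS bachelors_income_r
FROM acs_2011_2015_stats;
```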
2.1.6. Checking Additional Correlations
2.1.6.1. SQL techniques
2.1.6.1.1. Using corr (Y,X) function
2.1.6.1.2. Rounding results using round()
2.1.6.1.3. Type casting to numeric with ::
2.1.6.1.4. Single query for multiple correlations
2.1.6.2. Correlation analysis
2.1.6.2.1. Multiple correlation tests
2.1.6.2.2. Comparing variable pairs
2.1.6.2.3. Education, income, and commute times
2.1.6.3. Variable pairs & results
2.1.6.3.1. 0.68 Strong
2.1.6.3.2. 0.05 Very weak
2.1.6.3.3. -0.14 Weak, inverse
2.1.6.4. Key findings & caveats
2.1.7. Predicting Values with Regression Analysis
2.1.7.1. Linear regression concept
2.1.7.1.1. Finding the best linear equation
2.1.7.1.2. Creating a best fit straight line
2.1.7.1.3. Least squares regression line
2.1.7.1.4. Describes relationship between variables
2.1.7.2. Regression line formula
2.1.7.2.1. Y = bX + a
2.1.7.3. SQL implementation
2.1.7.3.1. regr_slope(Y, X)
2.1.7.3.2. regr_intercept(Y, X)
2.1.7.3.3. round() function for readability
2.1.7.3.4. ::numeric for type conversion
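A sketch computing the slope and intercept of the least squares line Y = bX + a:

```sql
SELECT round(regr_slope(median_hh_income, pct_bachelors_higher)::numeric, 2) AS slope,
       round(regr_intercept(median_hh_income, pct_bachelors_higher)::numeric, 2) AS y_intercept
FROM acs_2011_2015_stats;
```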
2.1.8. Finding the Effect of an Independent Variable with r-squared
2.1.8.1. R-squared concept
2.1.8.1.1. Coefficient of determination
2.1.8.1.2. Measures percentage of variation explained by the independent variable
2.1.8.1.3. Values range from 0 to 1
2.1.8.1.4. Example
2.1.8.2. Calculation methods
2.1.8.2.1. Square the correlation coefficient (r² = r × r)
2.1.8.2.2. Value always between 0 and 1
2.1.8.2.3. Higher values mean stronger explanation power
2.1.8.3. SQL implementation
2.1.8.4. Limitations & caveats
2.1.8.4.1. Correlation doesn't prove causality
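PostgreSQL also computes r-squared directly; a sketch with the same columns:

```sql
SELECT round(
           regr_r2(median_hh_income, pct_bachelors_higher)::numeric, 3
       ) AS r_squared
FROM acs_2011_2015_stats;
```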
2.2. Creating rankings with SQL
2.2.1. Ranking with rank() and dense_rank()
2.2.1.1. Function concepts
2.2.1.1.1. Window functions
2.2.1.1.2. Calculate across sets of rows
2.2.1.1.3. Return results for each row
2.2.1.1.4. Unlike aggregate functions, don't group rows
2.2.1.1.5. Part of standard ANSI SQL
2.2.1.2. Syntax & implementation
2.2.1.2.1. Function syntax: rank() OVER (ORDER BY column [DESC/ASC])
2.2.1.2.2. dense_rank() OVER (ORDER BY column [DESC/ASC])
2.2.1.2.3. Both use the OVER clause to define the window of rows
2.2.1.2.4. ORDER BY specifies sorting direction
2.2.1.3. Differences in handling ties
2.2.1.3.1. rank(): Creates gaps in rankings after ties
2.2.1.3.2. dense_rank(): No gaps in rankings after ties
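A side-by-side sketch; widget_companies and widget_output are illustrative names:

```sql
-- rank() skips a number after each tie; dense_rank() does not.
SELECT company,
       widget_output,
       rank() OVER (ORDER BY widget_output DESC),
       dense_rank() OVER (ORDER BY widget_output DESC)
FROM widget_companies;
```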
2.2.2. Ranking Within Subgroups with PARTITION BY
2.2.2.1. Concept overview
2.2.2.1.1. Window functions with grouping
2.2.2.1.2. Rank rows within specific groups
2.2.2.1.3. Create separate rankings for each group
2.2.2.1.4. Different from overall ranking
2.2.2.2. Syntax & Implementation
2.2.2.2.1. OVER clause extended with PARTITION BY
2.2.2.2.2. Syntax: rank() OVER (PARTITION BY column ORDER BY column [DESC/ASC])
2.2.2.2.3. Group rows by values in the partition column
2.2.2.3. Use cases & applications
2.2.2.3.1. Rank employees by salary within departments
2.2.2.3.2. Rank movies by earnings within each genre
2.2.2.3.3. Find top vehicle for each auto manufacturer
2.2.2.3.4. Identify wettest month for each year
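A sketch of per-group ranking; table and column names are illustrative:

```sql
-- A separate ranking of stores by unit sales within each product category.
SELECT category,
       store,
       unit_sales,
       rank() OVER (PARTITION BY category ORDER BY unit_sales DESC)
FROM store_sales;
```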
2.3. Calculating rates for meaningful comparisons
2.3.1. Why raw numbers can be misleading
2.3.1.1. Missing context
2.3.1.2. Population differences
2.3.1.3. Raw comparison is not meaningful
2.3.2. Rate calculation method
2.3.2.1. Formula: (count/population * 1000)
2.3.2.2. Result: Rate per 1000 people
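A sketch of the rate formula applied to crime data; the table and columns are illustrative:

```sql
-- Property crimes per 1,000 residents in larger cities.
SELECT city,
       property_crime,
       population,
       round(property_crime::numeric / population * 1000, 1) AS pc_per_1000
FROM fbi_crime_data_2015
WHERE population >= 500000
ORDER BY pc_per_1000 DESC;
```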
2.3.3. Limitations & considerations
2.3.3.1. FBI discourages creating rankings
2.3.3.2. Misleading perceptions can harm communities
2.3.3.3. Variations due to many factors
2.3.3.4. Incomplete reporting by agencies
2.3.3.5. Despite limitations, still useful for comparison
3. CHAPTER 11: Working with Dates and Times
3.1. Working with time zones
3.1.1. Finding Your Time Zone Setting
3.1.1.1. Default time zone setting
3.1.1.2. Listing time zones
3.1.1.3. Filtering time zones
3.1.1.4. Time zone considerations
3.1.2. Setting the Time Zone
3.1.2.1. Configuration
3.1.2.1.1. Server: postgresql.conf (permanent)
3.1.2.1.2. Client: SET timezone TO 'timezone' (session-only)
3.1.2.2. Data Handling
3.1.2.2.1. Stored as UTC internally
3.1.2.2.2. Display varies by session timezone
3.1.2.2.3. Original data remains unchanged
3.1.2.3. Conversion
3.1.2.3.1. Change session: SET timezone TO 'US/Eastern'
3.1.2.3.2. Per query: SELECT time AT TIME ZONE 'Asia/Seoul'
3.1.2.3.3. Example: 4am Pacific = 7am Eastern = 9pm Seoul
3.1.2.4. Benefits
3.1.2.4.1. Accurate calculations across regions
3.1.2.4.2. Consistent timestamp handling
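Sketches of the commands above:

```sql
SHOW timezone;                       -- current session setting

SELECT * FROM pg_timezone_names
WHERE name LIKE 'US/%';              -- browse available zone names

SET timezone TO 'US/Eastern';        -- session-only change

-- Per-query conversion of a stored timestamp.
SELECT '2023-01-01 04:00-08'::timestamptz AT TIME ZONE 'Asia/Seoul';
```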
3.2. Data types and functions for dates and times
3.2.1. Manipulating dates and times
3.2.1.1. Extracting the Components of a timestamp Value
3.2.1.1.1. date_part() function
3.2.1.1.2. Calculated components
3.2.1.1.3. Alternative methods
3.2.1.2. Creating Datetime Values from timestamp Components
3.2.1.2.1. Common use cases
3.2.1.2.2. PostgreSQL functions
3.2.1.2.3. Input types
3.2.1.2.4. Time zone consideration
3.2.1.3. Retrieving the Current Date and Time
3.2.1.3.1. Standard SQL functions
3.2.1.3.2. PostgreSQL functions
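Sketches of the extraction, construction, and current-time functions:

```sql
SELECT date_part('year', '2022-12-01 18:37:12-05'::timestamptz) AS year,  -- extract a component
       make_date(2022, 2, 22) AS date_from_parts,                         -- build a date from numbers
       now() AS current_timestamptz,                                      -- current date and time
       localtimestamp AS without_time_zone;
```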
3.3. Calculations with dates and times
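A sketch of datetime arithmetic with the interval type:

```sql
SELECT '2023-01-31'::date - '2023-01-01'::date AS days_elapsed,           -- integer days
       '2023-01-01 09:00'::timestamp + '8 hours'::interval AS shift_end;  -- new timestamp
```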
4. CHAPTER 12: Advanced Query Techniques
4.1. Using subqueries
4.1.1. Filtering with Subqueries in a WHERE Clause
4.1.1.1. Purpose
4.1.1.1.1. Generate filter values dynamically
4.1.1.1.2. Avoid hardcoding comparison values
4.1.1.1.3. Combine multiple queries into one
4.1.1.1.4. Filter by calculated values
4.1.1.2. Implementation
4.1.1.2.1. Enclosed in parentheses after comparison operators
4.1.1.2.2. Example: WHERE column >= (SELECT ...)
4.1.1.2.3. Acts as a value generator for comparison
4.1.1.2.4. Can return single value or list of values
4.1.1.3. Practical Uses
4.1.1.3.1. Population analysis: WHERE population >= (SELECT percentile_cont(.9)...)
4.1.1.3.2. Table optimization: Delete rows based on subquery criteria
4.1.1.3.3. Working with subsets of large datasets
4.1.1.3.4. Dynamic filtering based on dataset characteristics
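A sketch of the percentile filter described above, using the census table:

```sql
-- Counties at or above the 90th percentile of population.
SELECT geo_name, state_us_abbreviation, p0010001
FROM us_counties_2010
WHERE p0010001 >= (
          SELECT percentile_cont(.9) WITHIN GROUP (ORDER BY p0010001)
          FROM us_counties_2010
      )
ORDER BY p0010001 DESC;
```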
4.1.2. Creating Derived Tables with Subqueries
4.1.2.1. Definition
4.1.2.1.1. Subquery in FROM clause
4.1.2.1.2. Returns rows and columns as a table
4.1.2.1.3. Must be named with alias
4.1.2.2. Purpose
4.1.2.2.1. Perform multi-step operations in one query
4.1.2.2.2. Enable operations impossible in single query
4.1.2.2.3. Create intermediate results for further analysis
4.1.2.2.4. Process data before final selection
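A sketch of a derived table that computes the average and median first, so the outer query can compare them:

```sql
SELECT round(calcs.average, 0) AS average,
       calcs.median,
       round(calcs.average, 0) - calcs.median AS median_average_diff
FROM (
    SELECT avg(p0010001) AS average,
           percentile_cont(.5)
               WITHIN GROUP (ORDER BY p0010001)::numeric(10,1) AS median
    FROM us_counties_2010
) AS calcs;
```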
4.1.3. Joining Derived Tables
4.1.3.1. Definition
4.1.3.1.1. Multiple subqueries in FROM clause
4.1.3.1.2. Each subquery produces a derived table
4.1.3.1.3. Tables connected via JOIN statements
4.1.3.2. Purpose
4.1.3.2.1. Perform complex multi-step data processing
4.1.3.2.2. Preprocess data from different sources
4.1.3.2.3. Create intermediate aggregations
4.1.3.2.4. Calculate rates from preprocessed data
4.1.3.3. Structure
4.1.3.3.1. Multiple derived tables: (SELECT...) AS alias
4.1.3.3.2. Join condition: ON table1.column = table2.column
4.1.3.3.3. Final calculations in main query SELECT
4.1.4. Generating Columns with Subqueries
4.1.4.1. Definition
4.1.4.1.1. Subquery placed in SELECT clause column list
4.1.4.1.2. Returns single values for each row
4.1.4.1.3. Creates calculated columns dynamically
4.1.4.2. Implementation
4.1.4.2.1. Format: SELECT column1, (SELECT...) AS new_column
4.1.4.2.2. Typically uses aggregate functions in subquery
4.1.4.2.3. Can be used in calculations with other columns
4.1.4.2.4. Can appear multiple times in same query
4.1.4.3. Use Cases
4.1.4.3.1. Add reference values to each row (e.g., median)
4.1.4.3.2. Calculate differences from aggregates
4.1.4.3.3. Create comparison metrics
4.1.4.3.4. Generate derived statistics
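A sketch of a scalar subquery in the column list, measuring each row's distance from the overall median:

```sql
SELECT geo_name,
       p0010001,
       p0010001 - (SELECT percentile_cont(.5) WITHIN GROUP (ORDER BY p0010001)
                   FROM us_counties_2010) AS diff_from_median
FROM us_counties_2010;
```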
4.2. Common table expressions
4.2.1. Definition
4.2.1.1. Temporary named result sets
4.2.1.2. Created with WITH clause
4.2.1.3. Also called "WITH clause"
4.2.1.4. Recent addition to SQL standard
4.2.2. Syntax
4.2.2.1. WITH table_name (columns) AS (subquery)
4.2.2.2. Column definitions are optional
4.2.2.3. No data type definitions needed
4.2.2.4. Multiple CTEs can be defined at once
4.2.3. Use Cases
4.2.3.1. Simplifying complex queries
4.2.3.2. Replacing repetitive subqueries
4.2.3.3. Joining preprocessed data sets
4.2.3.4. Creating multiple temporary tables
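A sketch of a CTE that names an intermediate result once and reuses it:

```sql
WITH large_counties (geo_name, st, p0010001) AS (
    SELECT geo_name, state_us_abbreviation, p0010001
    FROM us_counties_2010
    WHERE p0010001 >= 100000
)
SELECT st, count(*)
FROM large_counties
GROUP BY st
ORDER BY count(*) DESC;
```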
4.3. Cross tabulations
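In PostgreSQL, cross tabulations come from the crosstab() function in the tablefunc extension; a sketch with an illustrative survey table:

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- One row per office, one column per flavor.
SELECT *
FROM crosstab(
    'SELECT office, flavor, count(*)
     FROM ice_cream_survey
     GROUP BY office, flavor
     ORDER BY office',
    'SELECT flavor FROM ice_cream_survey GROUP BY flavor ORDER BY flavor'
) AS (office varchar(20),
      chocolate bigint,
      strawberry bigint,
      vanilla bigint);
```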
4.4. Reclassifying values with CASE
4.5. Using CASE in a common table expression
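A sketch combining the two sections above: CASE reclassifies inside a CTE, and the main query counts the groups (table, column, and thresholds are illustrative):

```sql
WITH temps_collapsed (station_name, max_temperature_group) AS (
    SELECT station_name,
           CASE WHEN max_temp >= 90 THEN 'Hot'
                WHEN max_temp BETWEEN 70 AND 89 THEN 'Warm'
                ELSE 'Cold or Mild'
           END
    FROM temperature_readings
)
SELECT station_name, max_temperature_group, count(*)
FROM temps_collapsed
GROUP BY station_name, max_temperature_group
ORDER BY station_name;
```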
5. CHAPTER 13: Mining Text to Find Meaningful Data
5.1. Formatting text using string Functions
5.1.1. Case Formatting
5.1.1.1. upper
5.1.1.1.1. Capitalizes all alphabetical characters
5.1.1.1.2. Example: upper('Neal7') → 'NEAL7'
5.1.1.2. lower
5.1.1.2.1. Converts all alphabetical characters to lowercase
5.1.1.2.2. Example: lower('Randy') → 'randy'
5.1.1.3. initcap
5.1.1.3.1. Capitalizes first letter of each word
5.1.1.3.2. Example: initcap('at the end of the day') → 'At The End Of The Day'
5.1.2. Character Information
5.1.2.1. char_length(string)
5.1.2.1.1. Counts total characters in string
5.1.2.1.2. Includes spaces in count
5.1.2.1.3. Example: char_length(' Pat ') → 5
5.1.2.2. length(string)
5.1.2.2.1. Alternative to char_length()
5.1.2.2.2. Non-ANSI SQL function
5.1.2.2.3. Has variant for binary strings
5.1.2.3. position(substring in string)
5.1.2.3.1. Finds location of substring in string
5.1.2.3.2. Returns index position (starting at 1)
5.1.2.3.3. Example: position(', ' in 'Tan, Bella') → 4
5.1.3. Removing Characters
5.1.3.1. trim()
5.1.3.1.1. Removes specified characters from strings
5.1.3.1.2. Default: removes spaces if no character specified
5.1.3.1.3. Example: trim('s' from 'socks') → 'ock'
5.1.3.2. Position Options
5.1.3.2.1. Leading (front of string)
5.1.3.2.2. Trailing (end of string)
5.1.3.2.3. Both (default, removes from both ends)
5.1.3.2.4. Example: trim(trailing 's' from 'socks') → 'sock'
5.1.3.3. PostgreSQL Variants
5.1.3.3.1. ltrim(string, characters)
5.1.3.3.2. rtrim(string, characters)
5.1.4. Extracting and Replacing Characters
5.1.4.1. left(string, number)
5.1.4.1.1. Returns specified number of characters from left side
5.1.4.1.2. Example: left('703-555-1212', 3) → '703'
5.1.4.2. right(string, number)
5.1.4.2.1. Returns specified number of characters from right side
5.1.4.2.2. Example: right('703-555-1212', 8) → '555-1212'
5.1.4.3. replace(string, from, to)
5.1.4.3.1. Substitutes characters in a string
5.1.4.3.2. Replaces all instances of 'from' with 'to'
5.1.4.3.3. Example: replace('bat', 'b', 'c') → 'cat'
5.2. Matching text patterns with regular expressions
5.2.1. Regular Expression Notation
5.2.1.1. Basic Elements
5.2.1.1.1. Literals: Match exact characters (e.g., "Al" matches in "Alicia")
5.2.1.1.2. Wildcard (.): Matches any single character except newline
5.2.1.1.3. Character Classes []: Match any character within brackets
5.2.1.2. Special Character Shortcuts
5.2.1.2.1. \w: Any word character [A-Za-z0-9_]
5.2.1.2.2. \d: Any digit
5.2.1.2.3. \s: Any whitespace
5.2.1.2.4. \t: Tab character
5.2.1.2.5. \n: Newline character
5.2.1.2.6. \r: Carriage return character
5.2.1.3. Position Anchors
5.2.1.3.1. ^: Match at start of string
5.2.1.3.2. $: Match at end of string
5.2.1.4. Quantifiers
5.2.1.4.1. ?: Preceding match zero or one time
5.2.1.4.2. *: Preceding match zero or more times
5.2.1.4.3. +: Preceding match one or more times
5.2.1.4.4. {m}: Exactly m times
5.2.1.4.5. {m,n}: Between m and n times
5.2.1.5. Grouping & Alternation
5.2.1.5.1. (pattern): Capture group - reports matched portion
5.2.1.5.2. (?:pattern): Non-capture group - groups without reporting
5.2.1.5.3. a|b: Alternation - match either a or b
5.2.1.6. Examples
5.2.1.6.1. .+: Any character one or more times
5.2.1.6.2. \d{4}: Four digits (e.g., "2019")
5.2.1.6.3. May \d, \d{4}: "May" followed by digit, comma, and four digits
5.2.1.6.4. ^\w+: One or more word characters at string start
5.2.1.6.5. \w+.$: One or more word characters plus any character at string end
5.2.2. Turning Text to Data with Regular Expression Functions
5.2.2.1. Creating a Table for Crime Reports
5.2.2.2. Matching Crime Report Date Patterns
5.2.2.2.1. Purpose
5.2.2.2.2. Using regexp_match() Function
5.2.2.2.3. Date Pattern Structure
5.2.2.2.4. Regular Expression Components
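A sketch of the date match, assuming a crime_id key column alongside original_text:

```sql
SELECT crime_id,
       regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}') AS date_1
FROM crime_reports;
```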
5.2.3. Matching Additional Crime Report Elements
5.2.3.1. First Hour Pattern
5.2.3.1.1. Pattern: /\d{2}\n(\d{4})
5.2.3.1.2. Components
5.2.3.1.3. Example: 4/16/17-4/17/17 2100-0900 hrs
5.2.3.2. Second Hour Pattern
5.2.3.2.1. Pattern: /\d{2}\n\d{4}-(\d{4})
5.2.3.2.2. Components
5.2.3.2.3. Example: 4/16/17-4/17/17 2100-0900 hrs
5.2.3.3. Street Address Pattern
5.2.3.3.1. Pattern: hrs.\n(\d+ .+(?:Sq.|Plz.|Dr.|Ter.|Rd.))
5.2.3.3.2. Components
5.2.3.3.3. Example: 21800 block Newlin Mill Rd
5.2.3.4. City Pattern
5.2.3.4.1. Pattern: (?:Sq.|Plz.|Dr.|Ter.|Rd.)\n(\w+ \w+|\w+)\n
5.2.3.4.2. Components
5.2.3.4.3. Example: Sterling, Middleburg
5.2.3.5. Crime Type Pattern
5.2.3.5.1. Pattern: \n(?:\w+ \w+|\w+)\n(.*):
5.2.3.5.2. Components
5.2.3.5.3. Example: Larceny, Destruction of Property
5.2.3.6. Description Pattern
5.2.3.6.1. Pattern: :\s(.+)(?:C0|SO)
5.2.3.6.2. Components
5.2.3.6.3. Example: A license plate was reported stolen from a vehicle
5.2.3.7. Case Number Pattern
5.2.3.7.1. Pattern: (?:C0|SO)[0-9]+
5.2.3.7.2. Components
5.2.3.7.3. Example: C0170006614, SO170006250
5.2.4. Extracting Text from the regexp_match() Result
5.2.4.1. Understanding Array Results
5.2.4.1.1. regexp_match() returns array of text values
5.2.4.1.2. Array indicator: text[] data type in column header
5.2.4.1.3. Visual indicator: curly brackets {} around results
5.2.4.1.4. Example: {C0170006614} instead of C0170006614
5.2.4.2. Extraction Method
5.2.4.2.1. Use array notation to access elements
5.2.4.2.2. Format: (regexp_match(text, pattern))[array_index]
5.2.4.2.3. Array index starts at 1, not 0
5.2.4.2.4. Syntax requires parentheses around regexp_match()
5.2.4.2.5. Example: (regexp_match(original_text, '(?:C0|SO)[0-9]+'))[1]
5.2.4.3. Extraction Process
5.2.4.3.1. Wrap regexp_match() function in parentheses
5.2.4.3.2. Add [1] to extract first element of array
5.2.4.4. Purpose
5.2.4.4.1. Prepare data for database updates
5.2.4.4.2. Match column data types (text vs. array)
5.2.5. Updating the crime_reports Table with Extracted Data
5.2.5.1. Process Overview
5.2.5.1.1. Extract elements from original text
5.2.5.1.2. Format data for timestamp compatibility
5.2.5.1.3. Use UPDATE statements to populate columns
5.2.5.1.4. Convert text patterns to structured data
5.2.5.2. Date and Time Extraction
5.2.5.2.1. Extract date with regexp_match(): \d{1,2}/\d{1,2}/\d{2}
5.2.5.2.2. Extract time with regexp_match(): /\d{2}\n(\d{4})
5.2.5.2.3. Access first element from array results with [1]
5.2.5.2.4. Example: (regexp_match(original_text, '\d{1,2}/\d{1,2}/\d{2}'))[1]
5.2.5.3. Timestamp Creation
5.2.5.3.1. Concatenate elements with double-pipe (||) operator
5.2.5.3.2. Format: date || ' ' || time || ' US/Eastern'
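A sketch of the concatenation-and-cast step, assuming date_1 is a timestamptz column:

```sql
UPDATE crime_reports
SET date_1 = (
    (regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}'))[1]
    || ' '
    || (regexp_match(original_text, '\/\d{2}\n(\d{4})'))[1]
    || ' US/Eastern'
)::timestamptz;
```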
5.2.6. Using CASE to Handle Special Instances
5.2.6.1. UPDATE Statement Structure
5.2.6.1.1. Single statement updates all columns
5.2.6.1.2. More efficient than multiple statements
5.2.6.1.3. Handles inconsistent data patterns
5.2.6.1.4. Combines regex extraction with conditional logic
5.2.6.2. CASE Statement Logic
5.2.6.2.1. Tests multiple conditions sequentially
5.2.6.2.2. Format: CASE WHEN condition THEN result ELSE default END
5.2.6.2.3. Returns appropriate value based on conditions
5.2.7. Using Regular Expressions with WHERE
5.2.7.1. Operators
5.2.7.1.1. ~ : Case-sensitive regex match
5.2.7.1.2. ~* : Case-insensitive regex match
5.2.7.1.3. !~ : Negated case-sensitive match
5.2.7.1.4. !~* : Negated case-insensitive match
5.2.7.2. Advantages
5.2.7.2.1. More complex pattern matching than LIKE/ILIKE
5.2.7.2.2. Flexible pattern definitions
5.2.7.2.3. Supports alternation (either/or matching)
5.2.7.2.4. Can combine with other SQL conditions
5.2.7.3. Syntax
5.2.7.3.1. WHERE column_name ~* 'pattern'
5.2.7.3.2. WHERE column_name !~ 'pattern'
5.2.7.3.3. Can use AND/OR with other conditions
5.2.7.3.4. Patterns use standard regex notation
5.2.8. Additional Regular Expression Functions
5.2.8.1. regexp_replace(string, pattern, replacement)
5.2.8.2. regexp_split_to_table(string, pattern)
5.2.8.3. regexp_split_to_array(string, pattern)
5.2.8.4. Array Processing Functions
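Quick sketches of the three functions:

```sql
SELECT regexp_replace('05/12/2018', '\d{4}', '2023');       -- replaces the year
SELECT regexp_split_to_table('Four,score,and,seven', ',');  -- one row per piece
SELECT regexp_split_to_array('Phil Mike Tony Steve', ' ');  -- one text[] array
```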
5.3. Full text search in PostgreSQL
5.3.1. Text Search Data Types
5.3.2. Creating a Table for Full Text Search
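A minimal sketch of the search types in action: to_tsvector() produces lexemes, to_tsquery() expresses the search, and @@ tests for a match:

```sql
SELECT to_tsvector('I am walking across the sitting room to sit with you')
       @@ to_tsquery('walking & sitting');  -- true: both lexemes are present
```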
6. CHAPTER 14: Analyzing Spatial Data with PostGIS
6.1. Installing PostGIS and creating a spatial database
6.2. The building blocks of spatial data
6.3. Two-dimensional geometries
6.3.1. Simple Feature Standard
6.3.1.1. Created by ISO and OGC
6.3.1.2. Defines 2D/3D shapes ("geometries")
6.3.1.3. Implemented by PostGIS
6.3.2. Basic Geometry Types
6.3.2.1. Point
6.3.2.1.1. Single location (X,Y coordinates)
6.3.2.1.2. Represented as dot on maps
6.3.2.1.3. Example: longitude/latitude
6.3.2.2. LineString
6.3.2.2.1. 2+ connected points
6.3.2.2.2. Straight line segments
6.3.2.2.3. Examples: roads, trails, streams
6.3.2.3. Polygon
6.3.2.3.1. 2D shape with 3+ sides
6.3.2.3.2. Closed area with boundaries
6.3.2.3.3. Can contain "holes" (interior polygons)
6.3.2.3.4. Examples: nations, buildings, lakes
6.3.2.4. MultiPoint
6.3.2.4.1. Collection of Points
6.3.2.4.2. Example: all locations of a retail chain
6.3.2.5. MultiLineString
6.3.2.5.1. Collection of LineStrings
6.3.2.5.2. Example: road with disconnected segments
6.3.2.6. MultiPolygon
6.3.2.6.1. Collection of Polygons
6.3.2.6.2. Example: parcels divided by roads
6.4. PostGIS data types
6.4.1. Creating spatial objects with PostGIS functions
6.4.1.1. ST_GeomFromText()
6.4.1.1.1. Primary geometry constructor
6.4.1.1.2. Syntax: ST_GeomFromText(WKT, SRID)
6.4.1.2. ST_GeogFromText()
6.4.1.2.1. Creates geography data type
6.4.1.3. Point Functions
6.4.1.3.1. ST_PointFromText()
6.4.1.3.2. ST_MakePoint()
6.4.1.3.3. ST_SetSRID()
6.4.1.4. LineString Functions
6.4.1.4.1. ST_LineFromText()
6.4.1.4.2. ST_MakeLine()
6.4.1.5. Polygon Functions
6.4.1.5.1. ST_PolygonFromText()
6.4.1.5.2. ST_MakePolygon()
6.4.1.5.3. ST_MPolyFromText()
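Sketches of the constructors; SRID 4326 identifies the WGS 84 coordinate system:

```sql
SELECT ST_GeomFromText('POINT(-74.9 42.7)', 4326);                  -- from well-known text
SELECT ST_GeomFromText('LINESTRING(-74.9 42.7, -75.1 42.7)', 4326);
SELECT ST_SetSRID(ST_MakePoint(-74.9, 42.7), 4326);                 -- build, then assign SRID
```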
6.4.2. Analyzing farmers' markets data
6.4.2.1. Finding nearby markets for weekend visits
6.4.2.2. Distance-based analysis and planning
6.4.2.3. Similar to common map application features
6.5. Working with census shapefiles
6.5.1. Shapefile Components
6.5.1.1. .shp - Main geometry storage file
6.5.1.2. .shx - Geometry index file
6.5.1.3. .dbf - Attribute database (dBASE format)
6.5.1.4. .xml - Optional metadata
6.5.1.5. .prj - Optional projection/coordinate system
6.5.2. Loading Methods
6.5.2.1. GUI Approach
6.5.2.2. Connection Settings
6.5.2.3. Import Parameters
6.6. Performing spatial joins
6.6.1. Key Concept
6.6.1.1. Joins based on spatial relationships
6.6.1.2. Links tables through geometry columns
6.6.1.3. Uses spatial predicates instead of equality
6.6.2. Spatial Join Functions
6.6.2.1. ST_Intersects()
6.6.2.2. ST_Intersection()
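A sketch of a spatial join on ST_Intersects(); table and column names are illustrative:

```sql
-- Pair every waterway with each road whose geometry crosses it.
SELECT water.waterway, road.road_name
FROM waterways water
JOIN roads road
    ON ST_Intersects(water.geom, road.geom);
```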
7. CHAPTER 15: Saving Time with Views, Functions, and Triggers
7.1. Using views to simplify queries
7.1.1. Definition
7.1.1.1. Stored queries → virtual tables
7.1.1.2. CREATE OR REPLACE VIEW name AS SELECT...
7.1.2. Structure
7.1.2.1. Simple views → filter columns/rows
7.1.2.1.1. Example: nevada_counties_pop_2010
7.1.2.1.2. SELECT 4/16 columns + WHERE filter
7.1.2.2. Complex views → calculations + joins
7.1.2.2.1. Example: county_pop_change_2010_2000
7.1.2.2.2. JOIN census tables + percent calculation
7.1.3. Management
7.1.3.1. CREATE → new view
7.1.3.2. REPLACE → update (constraints: same columns/types/order)
7.1.3.3. DROP → delete
7.1.3.4. Location: Schemas → public → Views
7.1.4. Usage
7.1.4.1. Query like tables: SELECT * FROM view_name
7.1.4.2. Filter: WHERE conditions
7.1.4.3. Column selection: specific vs all (*)
7.1.4.4. Performance: saves repeated complex queries
7.1.5. Benefits
7.1.5.1. Simplifies complexity
7.1.5.2. Standardizes calculations
7.1.5.3. Enables consistent reporting
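A sketch of the simple view named above, built on the census table:

```sql
CREATE OR REPLACE VIEW nevada_counties_pop_2010 AS
    SELECT geo_name, state_fips, county_fips, p0010001 AS pop_2010
    FROM us_counties_2010
    WHERE state_us_abbreviation = 'NV';

-- Query it like any table.
SELECT * FROM nevada_counties_pop_2010 LIMIT 5;
```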
7.2. Inserting, Updating, and Deleting Data Using a View
7.2.1. Creating a View of Employees
7.2.2. Inserting Rows Using the employees_tax_dept View
7.2.2.1. INSERT INTO view_name = INSERT INTO underlying_table
7.2.2.2. Changes in view → reflected in base table
7.2.2.3. View query reruns → shows updated results
7.2.3. Updating Rows Using the employees_tax_dept View
7.2.3.1. UPDATE view_name = UPDATE base_table
7.2.3.2. WHERE conditions apply normally
7.2.3.3. Changes affect underlying data
7.2.4. Deleting Rows Using the employees_tax_dept View
7.2.4.1. DELETE FROM view_name = DELETE FROM base_table
7.2.4.2. Removes data from underlying table
7.2.4.3. Same restrictions as inserts/updates apply
7.3. Programming your own functions
7.3.1. Creating the percent_change() Function
7.3.1.1. Purpose: Reuse code, simplify calculations
7.3.1.2. Syntax: CREATE OR REPLACE FUNCTION name(args)
7.3.2. Using the percent_change() Function
7.3.2.1. Identical results to formula
7.3.2.2. Cleaner code
7.3.2.3. Less error-prone
7.3.2.4. Consistent calculations
7.3.2.5. Simpler maintenance
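The book's percent_change() function, roughly as listed:

```sql
CREATE OR REPLACE FUNCTION
percent_change(new_value numeric,
               old_value numeric,
               decimal_places integer DEFAULT 1)
RETURNS numeric AS
'SELECT round(((new_value - old_value) / old_value) * 100, decimal_places);'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;

SELECT percent_change(110, 108, 2);  -- returns 1.85
```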
7.3.3. Updating Data with a Function
7.3.3.1. Automatic personal days calculation
7.3.3.2. Applies consistent business rules
7.3.3.3. Bulk data update
7.3.4. Using the Python Language in a Function
7.3.4.1. Leverage Python's capabilities
7.3.4.2. Access to Python libraries
7.3.4.3. Complex data processing
7.3.4.4. Text manipulation
7.3.4.5. Beyond SQL/PL/pgSQL limitations
7.4. Automating database actions with triggers
7.4.1. Logging Grade Updates to a Table
7.4.2. Creating Tables to Track Grades and Updates
7.4.3. Creating the Function and Trigger
7.4.3.1. Automated audit trail
7.4.3.2. Change history preservation
7.4.3.3. No application code changes needed
7.4.3.4. Consistent enforcement
7.4.3.5. Data integrity protection
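A sketch of the pattern: a trigger function logs each grade change, and the trigger fires it on every UPDATE (the grades/grades_history schema is illustrative):

```sql
CREATE OR REPLACE FUNCTION record_if_grade_changed()
RETURNS trigger AS
$$
BEGIN
    IF NEW.grade <> OLD.grade THEN
        INSERT INTO grades_history (student_id, course_id, change_time, old_grade, new_grade)
        VALUES (OLD.student_id, OLD.course_id, now(), OLD.grade, NEW.grade);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER grades_update
AFTER UPDATE ON grades
FOR EACH ROW
EXECUTE PROCEDURE record_if_grade_changed();
```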
7.4.4. Automatically Classifying Temperatures
7.4.4.1. Automatic classification
7.4.4.2. No manual intervention needed
7.4.4.3. Real-time data processing
7.4.4.4. Consistent categorization
7.4.4.5. Ready for reporting/forecasting
8. CHAPTER 1: Creating Your First Database and Table
8.1. Creating a database
8.1.1. CREATE DATABASE analysis;
8.2. Creating a table
8.2.1. The CREATE TABLE Statement
8.3. Inserting rows into a table
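An abridged sketch of the chapter's teachers example, covering both statements:

```sql
CREATE TABLE teachers (
    id bigserial,              -- auto-incrementing surrogate key
    first_name varchar(25),
    last_name varchar(50),
    school varchar(50),
    hire_date date,
    salary numeric
);

INSERT INTO teachers (first_name, last_name, school, hire_date, salary)
VALUES ('Janet', 'Smith', 'F.D. Roosevelt HS', '2011-10-30', 36200),
       ('Lee', 'Reynolds', 'F.D. Roosevelt HS', '1993-05-22', 65000);
```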
8.4. Formatting SQL for readability
8.4.1. Importance
8.4.1.1. No special formatting required to run
8.4.1.2. Enhances readability
8.4.1.3. Facilitates collaboration
8.4.2. Conventions
8.4.2.1. Keywords
8.4.2.1.1. Use UPPERCASE for SQL keywords (e.g., SELECT)
8.4.2.1.2. Optional: Uppercase data types (e.g., TEXT, INTEGER)
8.4.2.1.3. Book practice: Lowercase data types for distinction
8.4.2.2. Object Names
8.4.2.2.1. Avoid camelCase
8.4.2.2.2. Use lowercase_and_underscores (e.g., table_name, column_name)
8.4.2.2.3. More details in Chapter 7
8.4.2.3. Indentation
8.4.2.3.1. Indent clauses and code blocks
8.4.2.3.2. Use either
8.4.2.3.3. Tabs vs Spaces
9. CHAPTER 2: Beginning Data Exploration with SELECT
9.1. Basic SELECT syntax
9.1.1. Querying all columns
9.1.1.1. SELECT * FROM my_table;
9.1.2. Querying a Subset of Columns
9.1.2.1. SELECT some_column, another_column, amazing_column FROM table_name;
9.1.3. Using DISTINCT to Find Unique Values
9.2. Sorting data with ORDER BY
9.2.1. Purpose
9.2.1.1. Arranges data for better readability and pattern identification
9.2.1.2. Does not alter the original table, only query results
9.2.2. Syntax
9.2.2.1. ORDER BY column_name [ASC | DESC]
9.2.3. Default Behavior
9.2.3.1. Ascending order (ASC)
9.2.4. Descending Order
9.2.4.1. Use `DESC` for descending order
9.2.5. Example
9.2.6. Sorting Multiple Columns
9.2.6.1. Use comma-separated column names with sort order
9.2.6.2. Example
9.2.7. Best practices
9.2.7.1. Limit the number of columns in `ORDER BY` for clarity
9.2.7.2. Focus results on answering specific questions
9.2.7.3. Run multiple queries for complex sorting needs
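A sketch of a two-column sort on the teachers table from Chapter 1:

```sql
-- Schools A to Z; within each school, highest salary first.
SELECT last_name, school, salary
FROM teachers
ORDER BY school ASC, salary DESC;
```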
9.3. Filtering rows with WHERE
9.3.1. Purpose
9.3.1.1. Limit rows returned based on criteria
9.3.1.2. Use WHERE clause with operators to filter data
9.3.1.3. Exclude rows if needed
9.3.2. Syntax
9.3.3. Basic Example
9.3.4. Comparison Operators
9.3.4.1. Equal to (`=`): `WHERE school = 'Baker Middle'`
9.3.4.2. Not equal to (`<>` or `!=`): `WHERE school != 'Baker Middle'`
9.3.4.3. Greater than (`>`): `WHERE salary > 20000`
9.3.4.4. Less than (`<`): `WHERE salary < 60500`
9.3.4.5. Greater than or equal to (`>=`): `WHERE salary >= 20000`
9.3.4.6. Less than or equal to (`<=`): `WHERE salary <= 60500`
9.3.4.7. Between range (`BETWEEN`): `WHERE salary BETWEEN 20000 AND 40000`
9.3.4.8. Match set of values (`IN`): `WHERE last_name IN ('Bush', 'Roush')`
9.3.4.9. Pattern matching (case-sensitive `LIKE`, case-insensitive `ILIKE`)
9.3.4.10. Negation (`NOT`): `WHERE first_name NOT ILIKE 'sam%'`
9.3.5. Wildcard Characters
9.3.5.1. `%`: Matches zero or more characters
9.3.5.2. `_`: Matches exactly one character
9.3.6. Combining Operators
9.3.6.1. Example
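A sketch of combined filters on the teachers table; parentheses control how AND and OR group:

```sql
SELECT first_name, last_name, school, salary
FROM teachers
WHERE school = 'Myers Middle School'
  AND (salary < 38500 OR salary > 40000);
```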
9.3.7. Best practices
9.3.7.1. Use `ILIKE` for case-insensitive searches
9.3.7.2. Combine filters with `AND`/`OR` for precision
9.3.7.3. Use `ORDER BY` to rank results
9.3.7.4. Optimize performance with indexes for large datasets
10. CHAPTER 3: Understanding Data Types
10.1. Characters
10.1.1. CHAR(n)
10.1.1.1. Fixed-length column
10.1.1.2. Specified by n (e.g., char(20))
10.1.1.3. Pads with spaces if fewer characters inserted
10.1.1.4. Always stores exactly n characters
10.1.1.5. Alternative syntax: character(n)
10.1.1.6. Legacy type, used infrequently
10.1.2. VARCHAR(n)
10.1.2.1. Variable-length column
10.1.2.2. Maximum length specified by n
10.1.2.3. Stores only actual characters inserted
10.1.2.4. No padding with spaces
10.1.2.5. Space-efficient storage
10.1.2.6. Alternative syntax: character varying(n)
10.1.3. TEXT
10.1.3.1. Variable-length column
10.1.3.2. Unlimited length (up to ~1GB)
10.1.3.3. Not part of SQL standard
10.1.4. Performance Considerations
10.1.4.1. No substantial performance difference among types
10.1.4.2. VARCHAR and TEXT offer space savings
10.1.4.3. CHAR wastes space with padding
10.1.5. Usage Recommendations
10.1.5.1. Use VARCHAR for most cases
10.1.5.2. Choose n value to handle outliers
10.1.5.3. Use CHAR for truly fixed-length data (e.g., state codes)
10.1.5.4. Use TEXT when length is unpredictable
10.2. Number
10.2.1. Integer Types
10.2.1.1. Whole numbers (positive/negative)
10.2.1.2. Used for: counting, IDs, quantities
10.2.1.3. Types by Size
10.2.1.3.1. smallint
10.2.1.3.2. integer
10.2.1.3.3. bigint
10.2.1.4. Auto-Incrementing Types
10.2.1.4.1. Used for: primary keys, IDs
10.2.1.4.2. Auto-increment with each new row
10.2.2. Decimal Numbers
10.2.2.1. Fixed-Point Types
10.2.2.1.1. numeric(precision, scale) or decimal(precision, scale)
10.2.2.2. Floating-Point Types
10.2.2.2.1. real
10.2.2.2.2. double precision
10.2.3. Considerations
10.2.3.1. Floating-Point Issues
10.2.3.1.1. Mathematical inaccuracies
10.2.3.1.2. Example: 0.7 × 10,000,000 ≠ 7,000,000
10.2.3.1.3. Uses less storage than fixed-point
10.2.3.2. Storage Efficiency
10.2.3.2.1. Smaller types use less space
10.2.3.2.2. Important for large databases
10.2.4. Selection Guidelines
10.2.4.1. Use integers when possible
10.2.4.2. Use numeric/decimal for exact calculations (money)
10.3. Dates and Times
10.3.1. Overview
10.3.1.1. Essential for tracking when events occur
10.3.1.2. Server-based time awareness
10.3.1.3. Calendar-aware (leap years, time zones)
10.3.1.4. Key for data storytelling
10.3.2. Core Data Types
10.3.2.1. TIMESTAMP
10.3.2.1.1. Date and time together
10.3.2.1.2. Variant: timestamp with time zone (timestamptz)
10.3.2.1.3. Most useful with time zone included
10.3.2.2. DATE
10.3.2.2.1. Date only (no time)
10.3.2.3. TIME
10.3.2.3.1. Time only (no date)
10.3.2.3.2. Can include time zone
10.3.2.4. INTERVAL
10.3.2.4.1. Represents duration of time
10.3.2.4.2. Format: quantity unit (12 days, 8 hours)
10.3.2.4.3. Units from microsecond to millennium
10.3.3. Time Zone Handling
10.3.3.1. Specification Methods
10.3.3.1.1. Named time zones (Australia/Melbourne)
10.3.3.1.2. Abbreviations (EST)
10.3.3.1.3. UTC offset (-8)
10.3.3.1.4. Time zone database (tz database)
10.3.3.2. Display Considerations
10.3.3.2.1. Values displayed relative to user's time zone
10.3.3.2.2. UTC offset shown (e.g., -05)
10.3.4. Input Formats
10.3.4.1. ISO format (YYYY-MM-DD HH:MM:SS)
10.3.4.2. Other supported formats (MM/DD/YYYY)
10.3.5. Calculations
10.3.5.1. Math operations with interval type
10.3.5.2. Example: timestamp - interval
10.3.5.3. Creates new timestamp values
10.3.5.4. Simplifies date/time calculations
10.4. Miscellaneous Types
10.4.1. Boolean Type
10.4.1.1. Stores true or false values
10.4.1.2. Used for flags and conditions
10.4.2. Geometric Types
10.4.2.1. Two-dimensional objects
10.4.2.2. Points
10.4.2.3. Lines
10.4.2.4. Circles
10.4.2.5. Other geometric shapes
10.4.3. Network Address Types
10.4.3.1. IP addresses
10.4.3.2. MAC addresses
10.4.3.3. Network identifiers
10.4.4. UUID Type
10.4.4.1. Universally Unique Identifier
10.4.4.2. Often used as unique key values in tables
10.4.4.3. Alternative to serial/sequence IDs
10.4.5. Structured Data Types
10.4.5.1. XML data type
10.4.5.2. JSON data type
10.5. Transforming Values from One Type to Another with CAST
10.5.1. Purpose
10.5.1.1. Transform values from one data type to another
10.5.1.2. Needed for
10.5.1.2.1. Combining numbers with text
10.5.1.2.2. Treating text dates as date types
10.5.1.2.3. Sorting based on different type rules
10.5.1.2.4. Performing calculations on converted values
10.5.2. Syntax
10.5.2.1. CAST(expression AS target_type)
10.5.2.2. Example
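Quick sketches of common casts:

```sql
SELECT CAST(1776 AS varchar(10));   -- number → text
SELECT CAST('2023-01-01' AS date);  -- text → date
SELECT CAST(3.7 AS integer);        -- numeric → integer, rounds to 4
```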
10.5.3. Valid Conversions
10.5.3.1. Numbers to text
10.5.3.1.1. Always works
10.5.3.1.2. Integer → varchar
10.5.3.1.3. Numeric → varchar
10.5.3.2. Timestamps to text
10.5.3.2.1. Can extract just date portion
10.5.3.2.2. CAST(timestamp AS varchar(10))
10.5.3.3. Numbers to other number types
10.5.3.3.1. Numeric → integer (with rounding)
10.5.3.3.2. Integer → numeric
10.5.4. Limitations
10.5.4.1. Target type must accommodate original value
10.5.4.2. Cannot convert
10.5.4.2.1. Text with letters to numbers
10.5.4.2.2. Invalid date strings to dates
10.6. CAST shortcut notation
10.6.1. Double Colon (::) Shortcut
10.6.1.1. Example
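The same cast with the PostgreSQL shortcut:

```sql
SELECT '2023-01-01'::date;  -- equivalent to CAST('2023-01-01' AS date)
```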
10.6.2. Best practices
10.6.2.1. Choose based on readability
10.6.2.2. Consider future code maintainers
10.6.2.3. Be consistent within a project
10.7. Try it yourself
10.7.1. 1. Your company delivers fruit and vegetables to local grocery stores, and you need to track the mileage driven by each driver each day to a tenth of a mile. Assuming no driver would ever travel more than 999 miles in a day, what would be an appropriate data type for the mileage column in your table? Why?
10.7.2. 2. In the table listing each driver in your company, what are appropriate data types for the drivers’ first and last names? Why is it a good idea to separate first and last names into two columns rather than having one larger name column?
10.7.3. 3. Assume you have a text column that includes strings formatted as dates. One of the strings is written as '4//2017'. What will happen when you try to convert that string to the timestamp data type?
11. CHAPTER 4: Importing and Exporting Data
11.1. Working with delimited text files
11.1.1. Basics
11.1.1.1. Universal data exchange format
11.1.1.2. Bridge between different software applications
11.1.1.3. Plain text format with structured data
11.1.1.4. Alternative to proprietary formats
11.1.2. Structure
11.1.2.1. Rows
11.1.2.1.1. represent table records
11.1.2.2. Columns
11.1.2.2.1. Separated by delimiter characters
11.1.2.3. Common delimiters
11.1.2.3.1. Comma (most common)
11.1.2.3.2. Pipe (|)
11.1.2.3.3. Tab
11.1.2.3.4. Semicolon (;)
11.1.2.3.5. Ampersand (&)
11.1.3. CSV Files
11.1.3.1. CSV = Comma-Separated Values
11.1.3.2. Most common delimited file format
11.1.3.3. "CSV" and "comma-delimited" are interchangeable terms
11.1.3.4. Widely supported by
11.1.3.4.1. Databases
11.1.3.4.2. Spreadsheets
11.1.3.4.3. Programming languages
11.1.4. Delimiter Challenges
11.1.4.1. Problem
11.1.4.1.1. Values containing delimiter characters
11.1.4.1.2. Example: address with comma
11.1.4.2. Solution
11.1.4.2.1. Text Qualifiers
11.1.4.2.2. Most common: double quotes (")
11.2. Using COPY to import data
11.2.1. Basic Syntax
11.2.1.1. COPY table_name FROM 'file_path' WITH (options);
11.2.1.2. Table must already exist in database
11.2.1.3. Command means "Copy data TO my table FROM this file"
11.2.2. File Path Specification
11.2.2.1. Windows Format
11.2.2.1.1. Drive letter + colon + backslash + directories
11.2.2.1.2. Example: `'C:\Users\Anthony\Desktop\my_file.csv'`
11.2.2.2. macOS/Linux Format
11.2.2.2.1. Forward slash + directories from root
11.2.2.2.2. Example: `'/Users/anthony/Desktop/my_file.csv'`
11.2.2.3. Always enclose path in single quotes
11.2.3. WITH Options
11.2.3.1. Enclosed in parentheses
11.2.3.2. Multiple options separated by commas
11.2.3.3. Example: `WITH (FORMAT CSV, HEADER);`
11.2.4. Common COPY Options
11.2.4.1. FORMAT
11.2.4.1.1. Specifies file type
11.2.4.1.2. Options
11.2.4.2. HEADER
11.2.4.2.1. `HEADER` or `HEADER ON`
11.2.4.2.2. Indicates source file has column names in first row
11.2.4.2.3. On import: Skips first row during import
11.2.4.2.4. On export: Includes column names as first row
11.2.4.3. DELIMITER
11.2.4.3.1. `DELIMITER 'character'`
11.2.4.3.2. Specifies the column separator character
11.2.4.3.3. Default for CSV is comma
11.2.4.3.4. Example for pipe-delimited: `DELIMITER '|'`
11.2.4.3.5. Must be a single character (not carriage return)
11.2.4.4. QUOTE
11.2.4.4.1. `QUOTE 'character'`
11.2.4.4.2. Specifies text qualifier character
11.2.4.4.3. Default is double quote (")
11.2.4.4.4. Used to wrap values containing delimiter characters
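A sketch of a typical import; the path is illustrative:

```sql
COPY us_counties_2010
FROM 'C:\YourDirectory\us_counties_2010.csv'
WITH (FORMAT CSV, HEADER);
```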
11.3. Importing census data describing counties
11.3.1. SELECT queries to check data
11.3.2. ORDER BY with LIMIT to view specific data subsets
11.3.3. Checking large values (Alaska land areas)
11.3.4. Cross-checking the imported data against the source SQL in the GitLab repository
11.4. Importing a subset of columns with COPY
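A sketch of a column-subset import; unlisted columns are left NULL (names follow the book's supervisor_salaries example):

```sql
COPY supervisor_salaries (town, supervisor, salary)
FROM 'C:\YourDirectory\supervisor_salaries.csv'
WITH (FORMAT CSV, HEADER);
```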
12. CHAPTER 5: Basic Math and Stats with SQL
12.1. Math operators
12.1.1. Basic Operators (ANSI SQL Standard)
12.1.1.1. Addition (+)
12.1.1.1.1. Used with: numbers, dates, intervals
12.1.1.1.2. Example: SELECT 2 + 2; → 4
12.1.1.2. Subtraction (-)
12.1.1.2.1. Example: SELECT 9 - 1; → 8
12.1.1.2.2. Used with: numbers, dates, intervals
12.1.1.3. Multiplication (*)
12.1.1.3.1. Example: SELECT 3 * 4; → 12
12.1.1.3.2. Used with: numbers
12.1.1.4. Division (/)
12.1.1.4.1. Example: SELECT 11 / 6; → 1 (integer division)
12.1.1.4.2. Returns quotient only (no remainder)
12.1.1.4.3. Integer division drops decimal portion
12.1.2. PostgreSQL-Specific Operators
12.1.2.1. Modulo (%)
12.1.2.1.1. Returns only the remainder
12.1.2.1.2. Example: SELECT 11 % 6; → 5
12.1.2.1.3. Useful for
12.1.2.2. Exponentiation (^)
12.1.2.2.1. Raises base to power of exponent
12.1.2.2.2. Example: SELECT 3 ^ 4; → 81 (3 to the 4th power)
12.1.2.3. Square Root (|/)
12.1.2.3.1. Prefix operator
12.1.2.3.2. Example: SELECT |/ 10; → 3.16227766...
12.1.2.4. Cube Root (||/)
12.1.2.4.1. Prefix operator
12.1.2.4.2. Example: SELECT ||/ 10; → 2.15443469...
12.1.2.5. Factorial (!)
12.1.2.5.1. Suffix operator
12.1.2.5.2. Example: SELECT 4 !; → 24 (4×3×2×1)
12.1.2.5.3. Used for permutation calculations
12.1.3. Practical Usage
12.1.3.1. Simple calculations with SELECT
12.1.3.2. Column computations in tables
12.1.3.3. Conditional logic with mathematical tests
12.1.3.4. Math in WHERE clauses for filtering
12.2. Doing math across census table columns
12.2.1. Working with Census Data
12.2.1.1. US Census 2010 data (us_counties_2010)
12.2.1.2. 3,143 counties
12.2.1.3. Population breakdowns by race
12.2.1.4. Column naming conventions
12.2.1.4.1. Census codes (e.g., p0010006)
12.2.1.4.2. Using AS for readable column aliases
12.2.2. Basic Column Math
12.2.2.1. Adding Columns
12.2.2.1.1. Example: `p0010003 + p0010004 AS "Total White and Black"`
12.2.2.1.2. Calculations performed row-by-row
12.2.2.1.3. Always use aliases for calculated columns
12.2.2.2. Verifying Data Integrity
12.2.2.2.1. Checking that race categories sum to total population
12.2.2.2.2. Formula: `p0010003 + p0010004 + p0010005 + p0010006 + p0010007 + p0010008 + p0010009`
12.2.2.2.3. Creating a "Difference" column to highlight discrepancies
12.2.2.2.4. Ordering by difference to spot problems
12.2.3. Calculating Percentages
12.2.3.1. Formula
12.2.3.1.1. (part / whole) * 100
12.2.3.2. Example
12.2.3.2.1. `(CAST(p0010006 AS numeric(8,1)) / p0010001) * 100 AS "pct_asian"`
12.2.3.3. Data Type Considerations
12.2.3.3.1. Need to CAST integers to numeric for decimal division
12.2.3.3.2. Integer division truncates decimals (returns 0 for small percentages)
12.3. Aggregate functions for averages and sums
12.4. Finding the median
12.4.1. Finding the Median with Percentile Functions
12.4.2. Median and Percentiles with Census Data
12.4.3. Finding Other Quantiles with Percentile Functions
12.4.4. Creating a median() Function
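Sketches of the percentile functions on the census table:

```sql
SELECT percentile_cont(.5) WITHIN GROUP (ORDER BY p0010001) AS median_cont,  -- interpolated
       percentile_disc(.5) WITHIN GROUP (ORDER BY p0010001) AS median_disc,  -- actual value
       percentile_cont(array[.25, .5, .75])
           WITHIN GROUP (ORDER BY p0010001) AS quartiles
FROM us_counties_2010;
```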
12.5. Finding the mode
12.5.1. Example
12.5.1.1. SELECT mode() WITHIN GROUP (ORDER BY p0010001) FROM us_counties_2010;
13. CHAPTER 6: Joining Tables in a Relational Database
13.1. Linking tables using JOIN
13.1.1. Purpose
13.1.1.1. Connect multiple tables in a single query
13.1.1.2. Combine related data from different tables
13.1.1.3. Create relationships between datasets
13.1.1.4. Enable complex data analysis across tables
13.1.2. JOIN Syntax
13.2. Relating tables with key columns
13.2.1. Database Scenario
13.2.1.1. Public agency payroll analysis
13.2.1.2. Data received as separate tables instead of one spreadsheet
13.2.1.3. Two key tables: employees and departments
13.2.1.4. Need to connect employee data to their departments
13.2.2. Table Structure
13.2.2.1. Departments Table
13.2.2.1.1. dept_id (Primary Key)
13.2.2.1.2. dept (department name)
13.2.2.1.3. city (location)
13.2.2.2. Employees Table
13.2.2.2.1. emp_id (Primary Key)
13.2.2.2.2. first_name, last_name
13.2.2.2.3. salary
13.2.2.2.4. dept_id (Foreign Key → departments.dept_id)
13.2.3. Key Columns
13.2.3.1. Primary Key
13.2.3.1.1. Uniquely identifies each row in a table
13.2.3.1.2. Requirements
13.2.3.2. Foreign Key
13.2.3.2.1. References primary key in another table
13.2.3.2.2. Creates relationships between tables
13.2.3.2.3. Requirements
13.2.4. Constraints
13.2.4.1. PRIMARY KEY constraint
13.2.4.2. FOREIGN KEY constraint
13.2.4.3. UNIQUE constraint
13.2.5. Advantages of Relational Structure
13.2.5.1. Eliminates data redundancy
13.2.5.2. Simplifies data management
13.2.5.3. Maintains data integrity
13.3. Querying multiple tables using JOIN
13.3.1. JOIN Concept
13.3.1.1. Connects rows across tables based on matching column values
13.3.1.2. Combines data from multiple tables into a single result set
13.3.1.3. Returns columns from both tables where values match
13.3.1.4. Can filter joined results using WHERE clause
13.3.2. JOIN Syntax
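The basic form, using the employees and departments tables above:

```sql
SELECT *
FROM employees
JOIN departments
    ON employees.dept_id = departments.dept_id;
```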
13.4. JOIN types
13.4.1. JOIN (INNER JOIN)
13.4.1.1. Definition
13.4.1.1.1. Returns only rows where values match in both tables
13.4.1.2. Syntax
13.4.1.3. Alternative syntax
13.4.1.3.1. INNER JOIN
13.4.1.4. Visual concept
13.4.1.4.1. Intersection of two tables
13.4.1.5. Returns
13.4.1.5.1. Only rows with matching values in the joined columns
13.4.1.6. Key characteristic
13.4.1.6.1. Excludes non-matching rows from both tables
13.4.2. LEFT JOIN
13.4.2.1. Definition
13.4.2.1.1. Returns all rows from left table plus matching rows from right
13.4.2.2. Syntax
13.4.2.3. Visual concept
13.4.2.3.1. Complete left table with matching right table data
13.4.2.4. Returns
13.4.2.4.1. All rows from left table
13.4.2.4.2. Matching rows from right table
13.4.2.4.3. NULL values for right table when no match exists
13.4.2.5. Key characteristic
13.4.2.5.1. Preserves all records from the table before JOIN keyword
13.4.3. RIGHT JOIN
13.4.3.1. Definition
13.4.3.1.1. Returns all rows from right table plus matching rows from left
13.4.3.2. Syntax
13.4.3.3. Visual concept
13.4.3.3.1. Complete right table with matching left table data
13.4.3.4. Returns
13.4.3.4.1. All rows from right table
13.4.3.4.2. Matching rows from left table
13.4.3.4.3. NULL values for left table when no match exists
13.4.3.5. Key characteristic
13.4.3.5.1. Preserves all records from the table after JOIN keyword
13.4.4. FULL OUTER JOIN
13.4.4.1. Definition
13.4.4.1.1. Returns all rows from both tables
13.4.4.2. Syntax
13.4.4.3. Visual concept
13.4.4.3.1. Union of two tables with matches connected
13.4.4.4. Returns
13.4.4.4.1. All rows from left table
13.4.4.4.2. All rows from right table
13.4.4.4.3. NULL values where no matches exist
13.4.4.5. Key characteristic
13.4.4.5.1. Preserves all records from both tables
13.4.5. CROSS JOIN
13.4.5.1. Definition
13.4.5.1.1. Returns every possible combination of rows
13.4.5.2. Syntax
13.4.5.3. Visual concept
13.4.5.3.1. Cartesian product of two tables
13.4.5.4. Returns
13.4.5.4.1. Each row from left table combined with every row from right table
13.4.5.5. Key characteristic
13.4.5.5.1. Result size equals (rows in table1) × (rows in table2)
13.5. Using NULL to find rows with missing values
13.5.1. NULL Concept
13.5.1.1. Special database value ≠ 0 or ""
13.5.1.2. Represents missing/unknown data
13.5.1.3. Cross-type compatible
13.5.2. NULL in JOINs
13.5.2.1. LEFT JOIN → NULLs in right table columns
13.5.2.2. RIGHT JOIN → NULLs in left table columns
13.5.2.3. FULL JOIN → NULLs in both tables' columns
13.5.3. Finding Missing Data
13.5.3.1. WHERE column IS NULL
13.5.3.2. WHERE column IS NOT NULL
13.5.3.3. Cannot use: =, !=, <, >
13.5.4. Query Pattern
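A sketch of the pattern, assuming the book's paired schools_left/schools_right tables:

```sql
-- Left rows with no counterpart on the right.
SELECT lt.id, lt.left_school, rt.right_school
FROM schools_left AS lt
LEFT JOIN schools_right AS rt
    ON lt.id = rt.id
WHERE rt.id IS NULL;
```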
13.6. Three types of table relationships
13.6.1. One-to-One (1:1)
13.6.1.1. Definition
13.6.1.1.1. One record in Table A matches exactly one record in Table B
13.6.1.2. Key Structure
13.6.1.2.1. Primary keys match uniquely across tables
13.6.1.2.2. No duplicate keys in either table
13.6.1.3. Examples
13.6.1.3.1. Person ↔ Passport
13.6.1.3.2. State ↔ Capital
13.6.1.3.3. Country ↔ President
13.6.2. One-to-Many (1:N)
13.6.2.1. Definition
13.6.2.1.1. One record in Table A matches multiple records in Table B
13.6.2.2. Key Structure
13.6.2.2.1. Primary key in parent table
13.6.2.2.2. Foreign key in child table (can repeat)
13.6.2.3. Examples
13.6.2.3.1. Manufacturer ↔ Car Models
13.6.2.3.2. Author ↔ Books
13.6.2.3.3. Department ↔ Employees
13.6.3. Many-to-Many (M:N)
13.6.3.1. Definition
13.6.3.1.1. Multiple records in Table A match multiple records in Table B
13.6.3.2. Key Structure
13.6.3.2.1. Requires junction/bridge table
13.6.3.2.2. Two one-to-many relationships
13.6.3.3. Examples
13.6.3.3.1. Students ↔ Courses
13.6.3.3.2. Players ↔ Positions
13.6.3.3.3. Books ↔ Categories
13.7. Selecting specific columns in a join
13.7.1. Why Specify Columns
13.7.1.1. Targeted data retrieval
13.7.1.2. Protected from schema changes
13.7.1.3. Better performance
13.7.1.4. Clear output structure
13.7.2. The Ambiguity Issue
13.7.2.1. Same column name → error
13.7.3. Solution Path
13.7.3.1. Table Qualification
13.7.3.2. Column Aliasing
13.7.4. Query Pattern
13.8. Simplifying join syntax with table aliases
13.8.1. Purpose
13.8.1.1. Reduce code verbosity
13.8.1.2. Improve readability
13.8.1.3. Simplify column references
13.8.1.4. Make long queries manageable
13.8.2. Syntax
13.8.2.1. Declared in FROM clause
13.8.2.2. Pattern: `table_name AS alias`
13.8.2.3. AS keyword is optional
13.8.2.4. Examples
13.8.2.4.1. `schools_left AS lt`
13.8.2.4.2. `employees AS e`
13.8.2.4.3. `customer_accounts AS ca`
13.9. Joining multiple tables
13.9.1. Core Concept
13.9.1.1. Join unlimited tables in single query
13.9.1.2. Each join connects via matching column values
13.9.1.3. Chain joins sequentially in query
13.9.1.4. Some database systems may have limits
13.9.2. Join Chain Pattern
13.9.2.1. TableA → TableB → TableC → TableD...
13.9.2.2. Each connection requires JOIN clause
13.9.2.3. Example
13.9.3. Syntax Structure
13.10. Performing math on joined table columns
13.10.1. Core Concept
13.10.1.1. Apply math operations across joined tables
13.10.1.2. Reference columns using table aliases
13.10.1.3. Calculate changes between related datasets
13.10.1.4. Formula pattern: `table1.column OP table2.column`
13.10.2. Implementation
13.10.2.1. Table Setup
13.10.2.1.1. CREATE matched tables (e.g., census_2010, census_2000)
13.10.2.1.2. JOIN on unique identifiers (primary/foreign keys)
13.10.2.1.3. Reference columns with table aliases
13.10.2.2. Calculation Types
13.10.2.2.1. Simple: `t1.value - t2.value` (raw difference)
13.10.2.2.2. Complex: `(t1.value - t2.value)/t2.value * 100` (percent change)
13.10.2.2.3. Apply functions: round(), cast(), avg(), etc
13.10.3. Example Query Pattern
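A sketch of a percent-change calculation across two joined census tables (table names are illustrative of matched 2010/2000 data):

```sql
SELECT c2010.geo_name,
       c2010.p0010001 AS pop_2010,
       c2000.p0010001 AS pop_2000,
       round((c2010.p0010001 - c2000.p0010001)::numeric
             / c2000.p0010001 * 100, 1) AS pct_change
FROM us_counties_2010 AS c2010
JOIN us_counties_2000 AS c2000
    ON c2010.state_fips = c2000.state_fips
   AND c2010.county_fips = c2000.county_fips;
```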
14. CHAPTER 7: Table Design That Works for You
14.1. Naming tables, columns, and other identifiers
14.1.1. Naming Styles
14.1.1.1. Snake Case
14.1.1.1.1. `berry_smoothie`
14.1.1.1.2. Words separated by underscores
14.1.1.1.3. All lowercase
14.1.1.1.4. PostgreSQL documentation standard
14.1.1.2. Camel Case
14.1.1.2.1. `berrySmoothie`
14.1.1.2.2. First word lowercase
14.1.1.2.3. Subsequent words capitalized
14.1.1.3. Pascal Case
14.1.1.3.1. `BerrySmoothie`
14.1.1.3.2. All words capitalized
14.1.1.3.3. Microsoft SQL Server recommendation
14.1.2. Quoting Identifiers
14.1.2.1. Purpose
14.1.2.1.1. Enable mixed case: `"Customers"`
14.1.2.1.2. Allow spaces: `"trees planted"`
14.1.2.1.3. Permit reserved keywords: `"SELECT"`
14.1.2.2. Consequences
14.1.2.2.1. Must use quotes in all references
14.1.2.2.2. Case sensitivity enforced
14.1.2.2.3. Increased error potential
14.1.2.2.4. Example: `customers` ≠ `"Customers"`
14.1.3. Best Practices
14.1.3.1. Use Snake Case
14.1.3.1.1. `video_on_demand` > `videoondemand`
14.1.3.1.2. Most readable option
14.1.3.1.3. Avoids quoting requirements
14.1.3.2. Naming Guidelines
14.1.3.2.1. Descriptive over cryptic: `arrival_time` > `arv_tm`
14.1.3.2.2. Plural for tables: `teachers`, `vehicles`
14.1.3.2.3. Watch length limitations (PostgreSQL: 63 chars)
14.1.3.2.4. Date-stamp copies: `table_name_YYYY_MM_DD`
14.1.3.3. Consistency
14.1.3.3.1. Follow organizational style guide
14.1.3.3.2. Apply consistently across database
14.1.3.3.3. Avoid mixed conventions
14.2. Controlling column values with constraints
14.2.1. Primary Keys: Natural vs. Surrogate
14.2.1.1. Natural Keys
14.2.1.1.1. Core Concept
14.2.1.1.2. Requirements
14.2.1.1.3. Implementation Methods
14.2.1.1.4. Examples
14.2.1.1.5. Limitations
14.2.1.2. Surrogate Keys
14.2.1.2.1. Core Concept
14.2.1.2.2. Characteristics
14.2.1.2.3. Implementation Types
14.2.1.2.4. Advantages
14.2.1.3. Primary Key Syntax
14.2.1.3.1. Inline constraint
14.2.1.3.2. Table-level constraint
14.2.1.4. Creating a Composite Primary Key
14.2.1.4.1. Example
14.2.1.5. Creating an Auto-Incrementing Surrogate Key
14.2.1.5.1. serial
14.2.1.5.2. bigserial
14.2.1.5.3. smallserial
14.2.1.5.4. Example
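Sketches of the three key styles above:

```sql
-- Inline (column) constraint on a natural key.
CREATE TABLE natural_key_example (
    license_id varchar(10) CONSTRAINT license_key PRIMARY KEY,
    first_name varchar(50)
);

-- Composite key declared at the table level.
CREATE TABLE natural_key_composite_example (
    student_id varchar(10),
    school_day date,
    CONSTRAINT student_key PRIMARY KEY (student_id, school_day)
);

-- Auto-incrementing surrogate key.
CREATE TABLE surrogate_key_example (
    order_number bigserial PRIMARY KEY,
    product_name varchar(50)
);
```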
14.2.2. Foreign Keys
14.2.2.1. Core Concept
14.2.2.1.1. Column(s) in a table referencing primary key of another table
14.2.2.1.2. Ensures data consistency and referential integrity
14.2.2.1.3. Prevents orphaned rows in related tables
14.2.2.2. Key Features
14.2.2.2.1. Values must exist in the referenced table
14.2.2.2.2. Enforces relationships across tables
14.2.2.2.3. Rejects invalid data inserts or updates
14.2.2.3. Example
14.2.3. Automatically Deleting Related Records with CASCADE
14.2.3.1. Core Concept
14.2.3.1.1. Automatically deletes related records in child tables when a parent record is deleted.
14.2.3.1.2. Maintains referential integrity
14.2.3.1.3. Prevents orphaned rows in related tables
14.2.3.2. Syntax
14.2.3.2.1. Add `ON DELETE CASCADE` to foreign key definition
14.2.3.2.2. Example
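A self-contained sketch of a foreign key with cascading deletes:

```sql
CREATE TABLE licenses (
    license_id varchar(10) PRIMARY KEY,
    first_name varchar(50)
);

CREATE TABLE registrations (
    registration_id varchar(10) PRIMARY KEY,
    registration_date date,
    -- Value must exist in licenses; deleting a license removes its registrations.
    license_id varchar(10) REFERENCES licenses (license_id) ON DELETE CASCADE
);
```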
14.2.4. The CHECK Constraint
14.2.4.1. Core Concept
14.2.4.1.1. Ensures column values meet logical criteria.
14.2.4.1.2. Prevents invalid or nonsensical data
14.2.4.1.3. Rejects data that fails the logical test
14.2.4.2. Syntax
14.2.4.2.1. Column Constraint
14.2.4.2.2. Table Constraint
14.2.4.3. Use Cases
14.2.4.3.1. Restrict range of values
14.2.4.3.2. Limit to valid options
14.2.4.3.3. Ensure relationships between columns
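A sketch showing a column-level and a table-level CHECK:

```sql
CREATE TABLE check_constraint_example (
    user_id bigserial PRIMARY KEY,
    user_role varchar(50) CHECK (user_role IN ('Admin', 'Staff')),  -- valid options only
    salary integer,
    CONSTRAINT positive_salary CHECK (salary >= 0)                  -- value range
);
```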
14.2.5. The UNIQUE Constraint
14.2.5.1. Core Concept
14.2.5.1.1. Ensures column values are unique across all rows.
14.2.5.1.2. Allows multiple `NULL` values (unlike primary keys)
14.2.5.1.3. Prevents duplicate entries in specified columns
14.2.5.2. Syntax
14.2.5.2.1. Define in `CREATE TABLE` statement
14.2.6. The NOT NULL Constraint
14.2.6.1. Core Concept
14.2.6.1.1. Ensures a column cannot contain `NULL` values
14.2.6.1.2. Guarantees a value is present in every row for the specified column
14.2.6.1.3. Prevents empty or unknown data
14.2.6.2. Syntax
14.2.6.2.1. Declared after the column definition
14.2.7. Removing Constraints or Adding Them Later
14.2.7.1. Core Concept
14.2.7.1.1. Constraints can be added or removed after table creation
14.2.7.1.2. Use `ALTER TABLE` for modifying constraints
14.2.7.1.3. Ensures flexibility in database schema management
14.2.7.2. Syntax
14.3. Speeding up queries with indexes
14.3.1. B-Tree: PostgreSQL’s Default Index
14.3.1.1. Core Concept
14.3.1.1.1. Default index type in PostgreSQL.
14.3.1.1.2. Automatically created for:
14.3.1.1.3. Supports equality and range operations.
14.3.1.2. Characteristics
14.3.1.2.1. Data Storage
14.3.1.2.2. Structure
14.3.1.3. Supported Operations
14.3.1.3.1. Equality: `=`
14.3.1.3.2. Range: `<`, `<=`, `>=`, `>`, `BETWEEN`
14.3.1.4. Other PostgreSQL Index Types
14.3.1.4.1. Generalized Inverted Index (GIN)
14.3.1.4.2. Generalized Search Tree (GiST)
14.3.1.5. Advantages
14.3.1.5.1. Speeds up search and retrieval operations
14.3.1.5.2. Handles large datasets efficiently
14.3.1.5.3. Supports a wide range of operations
14.3.2. Benchmarking Query Performance with EXPLAIN
14.3.2.1. Core Concept
14.3.2.1.1. EXPLAIN Command
14.3.2.1.2. EXPLAIN with ANALYZE
14.3.2.2. Purpose
14.3.2.2.1. Measure performance improvements before and after adding an index
14.3.2.2.2. Understand database behavior for a query
14.3.2.3. Output
14.3.2.3.1. Query Plan Details
14.3.2.4. Use Cases
14.3.2.4.1. Benchmarking queries
14.3.2.4.2. Debugging slow queries
14.3.2.4.3. Optimizing database performance
14.3.2.5. Recording Some Control Execution Times
14.3.2.5.1. Useful for analyzing and optimizing query performance
14.3.2.6. Adding an Index
14.3.2.6.1. Adding an Index
14.3.2.6.2. Removing an Index
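A sketch of the benchmark-index-benchmark cycle; new_york_addresses follows the book's example table:

```sql
EXPLAIN ANALYZE SELECT * FROM new_york_addresses
WHERE street = 'BROADWAY';               -- record a control time

CREATE INDEX street_idx ON new_york_addresses (street);

-- Rerun the EXPLAIN ANALYZE query; drop the index if the gain
-- doesn't justify the storage and maintenance cost.
DROP INDEX street_idx;
```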
14.3.3. Considerations When Using Indexes
14.3.3.1. Core Concept
14.3.3.1.1. Indexes improve query performance but come with storage and maintenance costs.
14.3.3.1.2. Not every column needs an index.
14.3.3.2. Key Considerations
14.3.3.2.1. Database Documentation
14.3.3.2.2. Table Joins
14.3.3.2.3. Query Performance
14.3.3.2.4. Testing and Optimization
15. CHAPTER 8: Extracting Information by Grouping and Summarizing
15.1. Creating the library survey tables
15.1.1. Creating the 2014 Library Data Table
15.1.2. Creating the 2009 Library Data Table
15.2. Exploring the library data using aggregate functions
15.2.1. Counting Rows and Values Using count()
15.2.1.1. Core Concept
15.2.1.1.1. `count()` is an aggregate function part of the ANSI SQL standard
15.2.1.1.2. Provides an easy way to count rows and values in a table
15.2.1.1.3. Using `count(*)`
15.2.2. Counting Values Present in a Column
15.2.2.1. Core Concept
15.2.2.1.1. `count(column_name)` counts the number of rows in a specific column that contain **non-NULL values**
15.2.2.1.2. Different from `count(*)`, which counts all rows regardless of `NULL` values
15.2.2.2. Recommendations
15.2.2.2.1. Validate Data
15.2.2.2.2. Expert Consultation
15.2.2.2.3. Broader Analysis
15.2.3. Counting Distinct Values in a Column
15.2.3.1. Core Concept
15.2.3.1.1. Used after `SELECT` to return unique values in a column or combinations of columns
15.2.3.1.2. Can be combined with `count()` to count distinct values
15.2.4. Finding Maximum and Minimum Values Using max() and min()
15.2.4.1. Purpose
15.2.4.1.1. Identify the largest and smallest values in a column
15.2.4.1.2. Gain an understanding of the scope of values
15.2.4.1.3. Detect potential data issues
15.2.4.2. Functions
15.2.4.2.1. `max(column_name)`: Returns the maximum value
15.2.4.2.2. `min(column_name)`: Returns the minimum value
15.2.5. Aggregating Data Using GROUP BY
15.2.5.1. Purpose
15.2.5.1.1. Group results based on one or more column values
15.2.5.1.2. Perform aggregate operations (e.g., `SUM()`, `COUNT()`) for each group
15.2.5.2. Behavior
15.2.5.2.1. Eliminates duplicate values, similar to `DISTINCT`
15.2.5.3. Using GROUP BY on Multiple Columns with count()
15.2.5.3.1. Purpose
15.2.5.3.2. Behavior
15.2.5.4. Revisiting sum() to Examine Library Visits
15.2.5.4.1. Purpose
15.2.5.4.2. Key Techniques
15.2.5.5. Grouping Visit Sums by State
15.2.5.6. Filtering an Aggregate Query Using HAVING
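A sketch tying the chapter together; pls_fy2014_pupld14a is the book's 2014 survey table, with stabr (state) and visits columns:

```sql
-- Sum visits per state, keeping only states above 50 million.
SELECT stabr, sum(visits) AS visits
FROM pls_fy2014_pupld14a
WHERE visits >= 0           -- negative codes mark missing data
GROUP BY stabr
HAVING sum(visits) > 50000000
ORDER BY sum(visits) DESC;
```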