›INDEX
Last Updated:

Introduction

Database Systems VS File Systems

Problems with file systems:

  • Longer development times.
  • Longer time to get answers about data.
  • Complex system administration.
  • Security issues which are harder to manage.
  • Extensive programming.
  • Structural dependence.

Problems with Database systems:

  • Increased costs.
  • Management complexity.
  • Maintaining currency.
  • Vendor dependence.
  • Frequent upgrade/replacement cycles.

Basic Terminology

  • Data: Raw facts. Data has little meaning unless they have been organized into some logical arrangement.
  • Field: A character or group of characters that has a specific meaning. A field is used to define and store data.
  • Record: A logically connected set of one or more fields that describes a person, place, or thing.
  • File: A collection or related records.
Structural Dependence
  • Structural dependence: Access to file is dependent on its structure. All file system programs need to be modified to conform to a new file structure.
  • Structural independence: File structure can be changed without affecting the application's ability to access the data.
Data Dependence
  • Data dependence: Data access changes when data storage characteristics change.
  • Data independence: Data storage characteristics is changed without affecting the program's ability to access the data.
Data Redundancy

Unnecessarily storing same data at different places. Islands of information is when data is scattered in locations. This increases the probability of having different version of the same data.

Problems

  • Poor Data security.
  • Data inconsistency.
  • Increased likelihood of data-entry errors when complex entries are made in different files.
  • Data anomaly: Develops when not all the required changes in the redundant data are made successfully.
Data Anomaly
  • Update Anomalies
  • Insertion Anomalies
  • Deletion Anomalies

Types of Databases

By Users

  • Single-user database: Supports one user at a time.
    • Desktop database: Runs on a personal computer.
  • Multi-user database: Supports multiple users at the same time.
    • Workgroup database: Supports a small number of users or a specific department.
    • Enterprise database: Supports many users across many departments.

By Location

  • Centralized database: Data is located at a single site.
  • Distributed database: Data is distributed across different sites.
  • Cloud database: Created and maintained using cloud data services that provide defined performance measures for the database.

By Purpose

  • General-purpose database: Contains a wide variety of data used in multiple disciplines.
  • Discipline-specific databases: Contains data focused on specific subject areas.

By Time Sensitivity and Use

  • Operational database: Designed to support a company's day-to-day operations. (Also called Online transaction processing database (OLTP)).
  • Analytical database: Stores historical data and business metrics used exclusively for tactical or strategic decision-making.
    • Data warehouse: Stores data in a format optimized for decision support.
    • Online analytical processing (OLAP): Tools for retrieving processing, and modeling data from the data warehouse.
  • Business intelligence: Captures and process business data to generate information that support decision making.

By Structure

  • Unstructured data: It exists in their original state.
  • Structured data: It results from formatting.
  • Semi-structured data: Processed to some extent.
  • Extensible Markup Languages (XML): Represents data elements in textual format.

ACID VS BASE Transactions

ACID

ACID: Atomicity, Consistency, Isolation, Durability.

  • Atomicity: Each transaction is all or nothing (Transaction can consist of multiple commands)
  • Consistency: Any transaction brings the database from one valid state to another.
  • Isolation: Concurrent execution of statements would be if those statements were executed sequentially.
  • Durability: Once a transaction has been committed, it will remain so, even in the event of power loss, crashes or breakdowns.

The ACID model ensures data is safe and consistently stored. Normally employed by Relational databases.

BASE

BASE: Basic, Availability, Soft state, Eventual consistency

  • Basic Availability: System guarantees availability as regards the CAP (Consistency, Availability, Partition tolerance)
  • Soft state: System state could change over time, even without input, due to Eventual consistency.
  • Eventual consistency: System will eventually become consistent when it stops receiving input.

The BASE model prioritize availability over consistency. Normally employed by NoSQL databases.

Data Modeling

Data modeling is a process used to define and analyze data requirements needed to support the business process within the scope of corresponding information systems in organizations.

Iterative and progressive process of creating a specific data model for a determined problem domain.

Data models are simple representations of complex real world data structures.

Types

Types of Data Model

Conceptual Data Model

  • Identifies the highest-level relationship between entities.
  • Facilitates communication, and integration between systems, processes, and organizations.
  • Often expressed using an entity-relationship model (ERM).

Conceptual Data Model

Logical Data Model

  • Expands on the CDM (conceptual data model) with more details.
  • Describes the data in as much detail as possible, without regards to its physical implementation.
  • Contains attributes in addition to entities and relationship.
  • Normalization done at this level.

Logical Data Model

Physical Data Model

  • Models how actual database will be build.
  • Shows the complete table structure, keys, and relationships between tables.
  • Dependent on the database management systems used.

Physical Data Model

Basic Building Blocks

  • Entity: Unique and distinct object used to collect and store data.
  • Attribute: Characteristic of an entity.
  • Relationship: Describes an association among entities.
    • One-to-many (1:M)
    • Many-to-many (M:N or M:M)
    • One-to-one (1:1)
  • Constraint: set of rules to ensure data integrity.

Types of Relationships

Types of relationships

Terminology

  • Key attribute: Attribute that is a part of a key.
  • Entity integrity: Condition in which each row in the table has its own unique identity.
  • Referential integrity: Every reference to an entity instance by another entity instance is valid.

Integrity Rules

  • Entity Integrity
    • All PK entries are unique, and no part of a PK may be null.
    • Each row will have a unique identity, and foreign key (FK) values can properly reference PK values.
  • Referential Integrity
    • FK may or may not be part of its table's PK.
    • FK may be null, if it's not a part of the PK.
    • Every non-null FK value must reference an existing PK value in the table to which it's related.

SQL

SQL is a non-procedural language (you command what is to be done, not how it is to be done.) SQL commands are broadly into DDL and DML commands. DDL commands are used to configure the structure of the database and tables. DML commands are used to modify data within those tables and structures.

DDL (Data Definition Language)

Data Definition Language is used to work with structure in SQL. DDL includes commands to create database objects such as tables, indexes, and views; define constrains for data objects; define access rights to the database objects.

Create Database/Schema

CREATE SCHEMA schema_name AUTHORIZATION creator;

The keyword DATABASE can be used instead of SCHEMA.
Optionally, can use [IF NOT EXISTS] to prevent error message if database already exists.

CREATE DATABASE IF NOT EXISTS my_first_database;

Create Table Structures

CREATE TABLE tablename (
    column1     datatype    [constraint],
    column2     datatype    [constraint],
    PRIMARY KEY (column1 [, column2]),
    FOREIGN KEY (column1 [, column2])
        REFERENCES tablename2 (column1[, column2]),
    CONSTRAINT  constraint1
);

Example:

CREATE TABLE `country` (
    `Code`        char(3)       NOT NULL    DEFAULT "",
    `Name`        varchar(52)   NOT NULL    DEFAULT "",
    `Continent`   varchar(52)   NOT NULL    DEFAULT "",
    `Population`  int                       DEFAULT "Africa",
    `LifeExp`     float(3,1)                DEFAULT NULL,
    `Capital`     varchar(52)               DEFAULT NULL,
    PRIMARY KEY (`Code`)
);
SQL Constraints
  • NOT NULL: Ensures that column does not accept nulls.
  • UNIQUE: Ensures that all values in column are unique.
  • DEFAULT: Assigns value to attribute when a new row is added to table.
  • CHECK: Validates data when attribute value is entered.

Examples:

CONSTRAINT CUSTOMER_NAME_UNIQ UNIQUE(CUS_LNAME, CUS_FNAME);

CONSTRAINT INVOICE_CHECK CHECK(INV_DATE > TO_DATE('01-JAN-2016', 'DD-MON-YYYY'));
SQL Data Types
  • NUMBER(L, D) or NUMERIC(L, D): L is the number of character and D is the number of digits after the decimal place.
  • INTEGER or INT: Used to store integers.
  • SMALLINT: Like INT but limited to integer values up to six digits.
  • DECIMAL(L, D): Like NUMBER specification, but the storage length is a minimum specification. Greater lengths are acceptable.
  • CHAR(L): Fixed-length character data for up to 255 characters.
  • VARCHAR(L) or VARCHAR2(L): Variable-length character data.
  • DATE: Stores dates in the Julian date format.
  • DATETIME: Used for values that contain both date and time parts.
  • TIMESTAMP: Used to record the date and time of an event.

Alter Table Structure

We can use the ALTER TABLE command to make changes to the table structure:

  • ADD: Adds a column.
  • MODIFY: Changes column characteristics.
  • DROP: Delete a column.

Also used to:

  • Add table constraints.
  • Remove table constraints.

Adding and Dropping Columns

ALTER TABLE tablename ADD (column datatype);

ALTER TABLE tablename MODIFY column datatype;

ALTER TABLE tablename DROP COLUMN column;

ALTER TABLE tablename RENAME column TO column2;

ALTER TABLE tablename CHANGE column to column2 datatype;

Renaming and Dropping Table from the Database

RENAME TABLE tablename TO newtablename;

DROP TABLE tablename;

Can drop a table only if it is not the one side of any relationship. RDBMS generates a foreign key integrity violation error message if you try to drop a referenced table.

Command or Option Description
CREATE SCHEMA AUTHORIZATION Creates a database schema
CREATE TABLE Creates a new table in the user's database schema
NOT NULL Ensures that a column will not have null values
UNIQUE Ensures that a column will not have duplicate values
PRIMARY KEY Defines a primary key for a table
FOREIGN KEY Defines a foreign key for a table
DEFAULT Defines a default value for a column (when no value is given)
CHECK Validates data in an attribute
CREATE INDEX Creates an index for a table
CREATE VIEW Creates a dynamic subset of rows and columns from one or more tables
ALTER TABLE Modifies a table's definition (adds, modifies, or deletes attributes or constraints)
CREATE TABLE AS Creates a new table based on a query in the user’s database schema
DROP TABLE Permanently deletes a table (and its data)
DROP INDEX Permanently deletes an index
DROP VIEW Permanently deletes a view

DML (Data Manipulation Language)

DML is used to query and manage the data inside the database. They include commands to insert, update, and delete data within the database table. It also includes commands to query the database and return useful information.

Adding Data to a Table

INSERT INTO tablename VALUES (value1, value2, ..., valueN);

INSERT INTO tablename (column1, ..., columnN) VALUES (value1, ..., valueN);

The second statement can be used to add a subset of all the columns.

Updating Table Data

UPDATE tablename SET assignment_list [WHERE where condition];

Example:

UPDATE country SET LifeExp = 100;
UPDATE country SET LifeExp = 79.4 WHERE Code="CAN";

Deleting Table Rows

DELETE FROM tablename [WHERE condition];

Select Statements

Here is the most basic version of the select statement:

SELECT columnlist FROM tablename;

Example:

SELECT * FROM country;
SELECT Code, Name FROM country;

WHERE Conditional

We can use the WHERE conditional along with comparison operators and logical operators to filter our select statement.

WHERE column1 = column2 AND colum3 >= value1;

-- Example
WHERE first = last AND code >= 200;

Multiple Tables

There are many ways in SQL to get data from many tables and the act of connecting 2 tables is called a "JOIN".

Cross Join

A cross-join returns the Cartesian product of the two tables.

-- Old style
SELECT * FROM T1, T2;

-- New style
SELECT * FROM T1 CROSS JOIN T2;
Inner Join

Inner Join

Natural Join

Returns only the rows with matching values in the matching columns; the matching columns must have the same names and similar data types.

SELECT * FROM T1 NATURAL JOIN T2;

Natural joins aren't recommended because:

  • DBMS 'guesses' columns to join based on name and type.
  • Harder to maintain and document.
  • JOINs cannot be done if column names are different.
  • Always an Equi-Join i.e. based on equality.
Join Using

Returns only the rows with the matching values in the columns indicated in the USING clause.

SELECT * FROM T1 FROM T2 USING (C1 [, C2]);
Join On

Returns only the rows that meet the join condition indicated in the ON clause.

SELECT * FROM T1 JOIN T2 ON T1.C1 = T2.C2;
Outer Join

Outer Join

Left Join

Return rows with matching values and includes all rows from the left table (T1) with unmatched values. The best way to think of this is: we have a table T1, and we're trying to append the data from T2 onto T1.

SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.c1 = T2.c1;

SELECT * FROM T1 LEFT JOIN T2 ON T1.c1 = T2.c1;
Right Join

Return rows with matching values and includes all rows from the right table (T2) with unmatched values. The best way to think of this is: we have a table T2, and we're trying to append the data from T1 onto T2.

SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.c1 = T2.c1;

SELECT * FROM T1 RIGHT JOIN T2 ON T1.c1 = T2.c1;
Full Join

Return rows with matching values and includes all rows from both tables (T1 and T2) with unmatched values.

SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.c1 = T2.c1;

Transactions

START TRANSACTION;

COMMIT [work];

These 2 commands are used when committing changes to the database.

Restoring Table Contents

The ROLLBACK command can be used to restore the previous state of the database.

ROLLBACK [work];

COMMIT and ROLLBACK only work with the data manipulation commands that add, modify or delete data rows.

Operators and Functions

Comparison Operators

Symbol Meaning
= Equal To
< Less than
> greater than
<= Less than or equal
>= Greater than or equal
<> or != Not Equal

Logical Operators

  • OR: logical or operation on 2 comparisons.
  • AND: logical and operation on 2 comparisons.
  • NOT: negate the result of a comparison.

Arithmetic Operators

  • +: Add.
  • -: Subtract.
  • *: Multiply.
  • /: Divide.
  • ^: Power of.

Numeric Functions

Find all numeric functions at this link: Numeric Functions.

String Functions

Find all string functions at this link: String Functions

Date Time Functions

Find all date time functions at this link: Date Time Functions

Subqueries

Subqueries are used when it is necessary to process data based on other processed data.

Example: "Provide a list of vendors who do not provide any products."

select V_CODE, V_NAME from VENDOR where V_CODE not in
    (select V_CODE from PRODUCTS);

A subquery is a query (select statement) inside another query. The first query is often known as the outer query and the subquery is known as the inner query.

Inner queries are executed first. And the entire SQL statement is referred to as a nested query.

Returns a list of values

When a subquery returns a list of values (usually one column) we can use keywords like: IN, ANY, ALL to ask questions about this list.

Examples:

SELECT P_CODE, P_PRICE FROM PRODUCT
    WHERE P_PRICE IN
    (SELECT P_PRICE FROM PRODUCT WHERE V_CODE=23119);

SELECT P_CODE, P_PRICE FROM PRODUCT
    WHERE P_PRICE > ANY
    (SELECT P_PRICE FROM PRODUCT WHERE V_CODE=23119);

SELECT P_CODE, P_PRICE FROM PRODUCT
    WHERE P_PRICE > ALL
    (SELECT P_PRICE FROM PRODUCT WHERE V_CODE=23119);

Correlated Subquery

Correlated subqueries executes once for each row in the outer query. This makes this extremely slow.

Example:

SELECT INV_NUMBER, P_CODE, LINE_UNITS FROM LINE L1
    WHERE L1.LINE_UNITS > (
        SELECT AVG(LINE_UNITS)
        FROM LINE L2
        WHERE L1.P_CODE=L2.P_CODE
    );

In the above example, we can see that L1.P_CODE is used from the outer query inside the inner query. This makes this query a correlated subquery.

Control Flow and Conditional Expressions

CASE Expressions

A CASE expression is used when we need to return different results based on a condition or the result of a value or expression.

There are 2 formats of the CASE statement:

CASE value
    WHEN compare_value THEN result
    [WHEN compare_value THEN result] ...
    [ELSE result]
END
CASE
    WHEN [condition] THEN result
    [WHEN [condition] THEN result ...]
    [ELSE result]
END

If function

Evaluates an expression to TRUE or FALSE and retuns different expressions based on the result of this evaluation.

IF (expr1, expr2, expr3)

-- If expr1 is true (not 0 or null),
-- return expr2, otherwise return expr3

Example of useful case:

SELECT
    COUNT(IF(P_DISCOUNT>0, 1, NULL)) as 'Discounted',
    COUNT(IF(P_DISCOUNT=0, 1, NULL)) as 'Not discounted'
FROM PRODUCT;

Views

A virtual table is the output of a relational operator such as a SELECT is another relation (or table).

A View is a virtual table based on a SELECT query. The tables on which the query is based are called base tables.

We can create views in mysql using the CREATE VIEW command.

CREATE VIEW viewname AS SELECT query

Create view is classified as a DDL command since it stores the sub-query specification in the data dictionary.

Example: "Create a view to generate the product listing with the quantity on hand of products with price over 50".

CREATE VIEW price_gt_50 AS
    SELECT p_descript, p_qoh, p_price
    FROM product
    WHERE p_price > 50;


-- View data using the created view
select * from price_gt_50;

Since views are stored SELECT statements posing as a table, the actual data is only stored in the original tables.

Updatable views can be updated, not all views are updatable. An updatable view is a view that does not contain any of the following:

  • Aggregate function or GROUP BY expression.
  • SET operators like UNION, INTERSECT or MINUS.
  • Views without all NOT NULL columns from the base tables.

We can see which views are updatable in mysql using the following command:

SELECT TABLE_NAME, IS_UPDATABLE
    FROM INFORMATION_SCHEMA.VIEWS
    WHERE TABLE_SCHEMA = 'schema name';

Indexes

In computer science, a term index is a data structure to facilitate fast lookup of terms and clauses in a logic program, deductive database.

These are used to quickly find rows with specific column values.

  • Index key: Index's reference point that leads to data location identified by the key.
  • Unique index: Index key can have only one pointer value associated with it.

Each index is associated with only one table. One table can have several indexes. Index is automatically created on the primary key column.

Indexes result in faster searches but slower updates. This is because every time a value in the table (associated with an index) is updated then all the corresponding indexes must be recomputed as well.

We can use the CREATE INDEX command to create an index. The UNIQUE qualifies prevents a value that has been used before. Composite indexes prevent data duplication.

To delete an index, use the DROP INDEX command.

Syntax:

CREATE [UNIQUE] INDEX indexname
    ON tablename (col1 [, col2]);

Examples:

-- Creates index on column P_CODE
CREATE UNIQUE INDEX P_CODEX
ON PRODUCT (P_CODE);

-- Creates index in desc. order
CREATE INDEX PROD_PRICEX
ON PRODUCT (P_PRICE DESC);

Procedural SQL

  • Standard SQL lacks procedural functionality, it does not support conditional (IF, ELSE) statements or loops (WHILE, FOR).

  • SQL-99 standard defines the use of persistent stored modules (PSM). A block of code containing standard SQL statements and procedural extensions. The code is executed at the DBMS sever.

  • Support for PSMs is left up to the vendors. PMSs existed long before they were standardized. Different vendors implement a similar procedure is different ways.

Stored Procedures

A stored procedure is a collection of procedural and SQL statements. These statements are stored in the database just like tables, views and triggers.

These procedures can be used to encapsulate and represent business transactions.

Advantages of Stored procedures

  • Reduced network traffic (no transmission of individual SQL statements over the network).

  • Increased performance (All transactions are executed locally on the server).

  • Encapsulation and security (Column names of the table are hidden, and execution privileges to users can be granted.)

  • Change in table structure will not affect code.

Example of a stored procedure without input and output:

DELIMITER //

CREATE PROCEDURE allMovies()
BEGIN
    select title, description from film;
END //

DELIMITER ;

Note, in the above example we reset the delimiter so that we can use ; for statements within the stored procedure. Once we're done with our stored procedure, we reset our delimiter to the semicolon.

We can view all stored procedures by running:

select routine_name, routine_type,definer,created,security_type,SQL_Data_Access from information_schema.routines where routine_type='PROCEDURE' and routine_schema='name_of_schema';

Delete a procedure:

DROP PROCEDURE <procedure-name>;

Parameterized Stored Procedure

When we create a stored procedure, the parameters must be specified within the parenthesis. The syntax is as follows:

(IN | OUT | INOUT) (Parameter Name [datatype(length)])
Example of IN params
DELIMITER //
CREATE PROCEDURE moviesByRating(IN rating varchar(50))
BEGIN
    select title, description from film where rating=rating;
END //
DELIMITER ;
Example of OUT params
DELIMITER //
CREATE PROCEDURE countPG13(OUT Total_Movies int)
BEGIN
    select count(title) INTO Total_Movies from film where rating='PG-13';
END //
DELIMITER ;

Example of using such a procedure:

CALL countPG13(@totalPG13);
select @totalPG13 as Movies;

Keys

  • Super key: Any combination of columns that uniquely identify the entire row
  • Candidate key: all minimal super keys
    • Primary key
    • Alternate keys

Enjoy the notes on this website? Consider supporting me in this adventure in you preferred way: Support me.