SQL: All You Need to Know
The History of SQL
The IBM Santa Teresa Lab published their design for a Relational Database Management System (RDBMS) named System R, authored by Edgar Frank Codd in 1970. The database schema comprised multiple related tables consisting of rows and columns. The tables were initially manipulated and queried using a form of relational algebra. Donald Chamberlin and Raymond Boyce developed the first procedural language designed for the RDBMS, called SEQUEL. It was later shortened to SQL, which is short for Structured Query Language, due to a trademark dispute. The SQL language has subsets for changing and creating database objects such as tables and views, including the Data Definition Language (DDL). The primary role of SQL is to query tables, delete, update, and insert rows of data. In 1986 SQL first became an ANSI and ISO/IEC standard which multiple vendors have implemented. Actian, to this day, is an active member of the standards body to keep its relational database technology, such as the Avalanche Cloud Data Platform, aligned to SQL standards.
IBM’s first commercial RDBMS was SQL/DS, which ran on mainframe computers. IBM later released a version of their DB2 RDBMS that ran on mainframes and Unix-based systems. Oracle created the first commercial RDBMS that ran on multiple operating systems such as VM, MVS, VMS, Windows, Linux and Solaris. Richard Stonebreaker’s team at Berkeley labs in California resulted in the eventual creation of Ingres, Informix and Sybase databases which were first used on minicomputers and Unix based such as Sun Solaris. Ingres was the first RDBMS to support distributed queries. The first clustered RDBMS ran on DEC VMS. Microsoft SQL Server was developed from a code base licensed from Sybase.
Before the RDBMS technology became commonplace, databases used access methods to retrieve stored records using indexes, linked lists of records or linked hierarchies of records. As relational databases have become popular, non-relational databases have been provided with SQL access to make them easy to query for people with SQL skills. Examples are IDMS-R, a network database, and Actian Vector, a columnar data store.
RDBMS technology is increasingly moving to cloud platforms, where most database market growth is concentrated.
What is SQL?
SQL makes it easy to extract data from a database using a standard set of operators to store, manipulate, and retrieve data from databases. For example, the SELECT * FROM CUSTOMERS statement will fetch all the rows and columns from the CUSTOMERS table. Predicates such as the WHERE clause can be used to specify subsets of data. If the table contains a STATE column, the following SQL statement will retrieve just the customers in California:
SELECT * FROM CUSTOMERS WHERE STATE = 'CA'
Multiple tables can be related to each other by designing them with key columns in common. A database design process is used to map relationships between database objects, allowing data from multiple tables to be queried using a single SQL statement. For example, you might have STATE being a common column in the CUSTOMERS table and the ORDERS table, allowing you to retrieve orders related to specific states or group results by state in your report.
When designing a database schema, it makes sense to organize the tables for the most flexibility for queries. Normalization is the term used to describe the process of optimizing how many tables you have and how they are related, minimizing the duplication of data. Two or more tables or views can be referenced in a single SQL SELECT statement by performing a table join. The tables must be related to each other using a common key column.
The following are the types of joins where left and right are the order of the tables named in the SELECT statement:
- INNER JOIN: This join returns those records which have matching values in both the tables.
- FULL JOIN: This join returns all those records which either have a match in the left or the right table.
- LEFT JOIN: This type of join returns records from the left table, along with those records which satisfy the condition from the right table.
- RIGHT JOIN: This kind of join returns records from the right table and those that satisfy the condition from the left table.
To understand JOIN clauses, you need to be familiar with keys and foreign-key and relationships between tables. For example, a customer table might have a primary key of Customer-ID. An Orders table containing a Customer-ID column will be known as a foreign key in that table.
Every table you want to use in relations should have a primary key constraint; this can either be a single
CREATE TABLE customer (Customer-ID int NOT NULL PRIMARY KEY, Customer-Name char (60),... CREATE TABLE Orders (Customer-ID int NOT NULL PRIMARY KEY,...Customer-ID int FOREIGN KEY REFERENCES Customer(Customer-ID));
SELECT CustomerName, OrderID FROM customers INNER JOIN Orders ON customer.Customer-ID = Orders.Customer-ID WHERE CustomerName ="Jim Clarks";
The query below retrieves all orders placed by Jim Clarks:
SELECT CustomerName, OrderID FROM Customers LEFT JOIN Orders ON Customer.Customer-ID = Orders.Customer-ID ORDER BY CustomerName;
Popular BI tools make it easier to join multiple tables by writing the required SQL for you graphically by dragging and dropping tables and fields on a canvas.
What is SQL Used For?
The most common use of SQL is to fetch or retrieve data from databases. Data can be read from a single table or view, multiple tables, in a single database or across multiple distributed databases.
Other than the SELECT statement that is introduced in the above section, below is a subset of SQL statements and what they are used for:
- CREATE DATABASE – creates a database instance to store objects such as tables and indexes
- CREATE TABLE – is used to create a table of data
- INSERT – is used to insert new rows into a table
- UPDATE – is used to make changes to data contained in a database table
- DELETE – is used to remove rows of data
- GRANT – is used to give users the permission to view or modify database objects such as tables
- REVOKE – is used to remove permissions from users or groups of users
- ALTER – allows the user to add or remove columns from an object to update permissions
- DROP TABLE – deletes a table
- CREATE INDEX – will create an INDEX to allow for efficient access to data using a key-value
- TRUNCATE TABLE – allows the user to empty the contents of a table
The SELECT statement can contain aggregation functions, including:
- COUNT() – returns the total number of rows that satisfy the criteria in the statement
- MIN() – returns the smallest value in the result set
- MAX() – returns the largest value in the result set
- AVG() – returns the average of the values in the result set
- GROUP BY – will group the results by a given column value
- ORDER BY – will sort the results by the specified column
Comments in SQL
Comments can be single-line or multi-line. A single-line comment begins with a double hyphen, as in the example below:
-- This is my comment
A multi-line comment is enclosed in a /* and */ pair as in the example below:
/* This is multiple line comment */
There are mainly three set operations: UNION, INTERSECT and EXCEPT.
- UNION – This operator combines the result-set of two or more SELECT statements.
SELECT Columns FROM Table1 UNION SELECT Columns FROM Table2;
- INTERSECT – This clause combines two SELECT statements and returns the intersection of the data sets of both the SELECT statements.
SELECT Column1 , Column2 .... FROM TableName WHERE Condition INTERSECT SELECT Column1 , Column2 .... FROM TableName WHERE Condition
- EXCEPT – This operator returns those rows that are returned by the first SELECT operation and are not returned by the second SELECT operation.
SELECT ColumnName FROM TableName EXCEPT SELECT ColumnName FROM TableName;
What about procedural code?
SQL is primarily a declarative language. Sometimes it is helpful to use procedural code in an RDBMS. Examples of procedures executed in the database might be to perform a complex calculation or have a stored procedure before or after an operation such as an INSERT or DELETE. Stored procedures are usually proprietary using an interpreted language or compiled 3 GL as a User Defined Function (UDF).
How is SQL accessed?
RDBMS vendors usually provide a command-line interface to their database that can be used in a command shell or within a command script. Actian provides a Call Level Interface (CLI). This is generally useful for Database Administrators and Developers who are setting up a database or doing maintenance tasks such as taking backups or creating database copies for testing.
The ODBC interface allows existing applications such as Microsoft Excel to use embedded SQL to fetch data from SQL Server and other databases.
Java applications use the JDBC driver to allow them to access databases using SQL.
Application developers use Application Programming Interfaces (API) to embed SQL statements into their applications which can be written in C, Python or COBOL, for example. Database rows can be fetched one at a time or in batches or arrays.
The Avalanche Cloud Data Platform provides a web-based query editor to make constructing and executing SQL statements easy.
Business Intelligence tools such as Looker, Tableau, Qlik Sense and Microsoft Power BI provide visual tools to write SQL statements to create dashboards that can access multiple vendors’ database technologies. All these BI solutions can be used with the Avalanche Cloud Data Platform.
What is The Difference Between SQL and MySQL?
SQL is a standards-supported query language designed to make it easy to extract data from a database using a standard set of operators to store, manipulate, and retrieve data from databases. The difference between SQL and MySQL is that MySQL is not a database query language but an open-source database product.
What is SQL Server?
Released in 1989, SQL Server is Microsoft’s own relational database management system (RDBMS). It was designed as a fully-featured database to compete against Oracle Database (DB) and MySQL.
SQL Server supports ANSI SQL, the standard SQL language and contains T-SQL, its own SQL implementation. SQL Server Management Studio (SSMS) is SQL Server’s main interface tool. Microsoft SQL Server is sometimes referred to as MSSQL.
Microsoft SQL Server is offered with different feature sets to meet the RDBMS needs of small businesses to large enterprises.
Is SQL Easy to Learn?
Like most skills, basic SQL can be learned in weeks, but mastery can take months. The key is to understand how the database is designed and what indexes exist so you know what queries make sense. It is also important to appreciate the limitations of a database design.
Transactional versus Decision Support oriented databases
A transactional database design that is optimized for order entry may have very few indexes to ensure high transaction rates can be accommodated by the available compute resources. Running complex SQL queries directly against these operational databases may slow down transactions and impact customers and revenue. If complex decision-making is required, a data warehouse is better suited. A data warehouse is organized to support decision-making, so will have more indexes and may have much of the data pre-aggregated into views to speed queries. The data warehouse is populated from periodic snapshots of the operational databases or is updated in real-time as records change on the operational system.
Where Can I Learn SQL?
There are plenty of resources and SQL programming courses that can be found online.
- W3 Schools is a great place to learn and practice SQL. The URL is https://www.w3schools.com/sql/default.asp
- Codecadmy provides a certificated SQL class at https://www.codecademy.com/learn/learn-sql
- Actian provides an SQL reference at https://docs.actian.com/avalanche/index.html#page/SQLLanguage/SQL_Statements.htm