The advantage of SQL, besides being one of the basic languages of data science, is that it is easy to read and understand. In fact, it is a language whose syntax is quite similar to that of English.
Another advantage of SQL is that it is a flexible and versatile language that offers many possibilities and ways of visualizing data to meet the specific demands of each user. SQL also helps us to ensure data integrity by making sure that the data is valid, consistent and error-free.
Example of a SQL command
One of the most basic functions we can perform in SQL is to join two independent tables using the ‘Left Join’ statement, which integrates all the records of the left table with the records of the right table:
SELECT Customers.CustomerName, Orders.OrderID
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName
As we can see in this example, the SQL language is very similar to English. ‘SELECT’ means select, ‘FROM’ is from and ‘ORDER BY’ means sort by. Basically, it is a programming language that uses the same syntax as English, so learning it is not complicated.
SQL: Basic commands and terms
Of all, ‘SELECT’ is the most basic SQL command and the one that anyone who wants to get started in this computer language should learn first.
In SQL, ‘SELECT’ is used to describe the set of data we want to get from a database. That is, it is the main query and the one we will probably use the most for everything. To make the query, we must specify which tables contain the data, the relationships between the data, the fields or calculations applied to the data, the criteria that the data must meet to be included in the selection and how we want to sort the data.
Beyond ‘SELECT’, some of the other most basic SQL commands are:
|SQL Command||What does it do?|
|SELECT||It shows us a list of those fields that contain data of interest according to our query.|
|FROM||It shows us the tables that contain the fields of the ‘SELECT’ clause.|
|WHERE||It specifies the field criteria the data must meet to be included in the selection.|
|ORDER BY||It specifies how to sort the data.|
|GROUP BY||In commands that include aggregated functions, ‘Group by’ shows us the fields that are not summarized in the SELECT clause.|
|HAVING||In commands that include aggregated functions, it specifies the conditions that apply to the fields summarized in the ‘SELECT’ clause.|
10 SQL tricks that will make your job easier
A continuación listamos los que, según nuestra experiencia, son los 10 mejores consejos para trabajar con este lenguaje de programación.
Here are what, in our experience, are the 10 best tips for working with this programming language.
1. Use descriptive names for data tables
When naming our data tables and table columns, it is advisable to use names as simple, easy to understand and descriptive as possible. This will make our work much easier when performing queries and executing our commands.
It is more advisable to use descriptive names than to use short names that are not understood.
2. Use uppercase and lowercase letters to distinguish between elements
Although it is not mandatory because SQL does not distinguish between uppercase and lowercase letters, most data analysts use lowercase for tables and columns and uppercase for the rest.
This distinction between lowercase and uppercase makes queries easier to read and understand.
3. Use the SQL execution order
It is essential to follow the SQL execution order when writing our queries and commands. The established order of SQL provides better performance and avoids problems.
SQL execution order is:
4. GROUP BY
6. ORDER BY
4. Do not use too many tables
The organization of databases in tables facilitates the understanding of the data and their relationships. However, it is not advisable to use a large number of one- or two-column tables, since this data can probably be integrated into other tables.
Having fewer tables will simplify the work and avoid overloading the database.
5. Do not use tables with too many columns
Try to find a middle ground. Normalizing data is efficient, but in the right measure. However, creating extremely long tables with lots and lots of columns is not recommended either.
It is preferable that our tables do not reach 20 columns in order not to overload the database and to facilitate data retrieval and analysis.
6. Maintain consistency between primary and secondary keys
In SQL, relationships between tables are created through keys. Keys are fields entered into the relational table and there are many ways to name them. However, it is important to be consistent in naming the keys. Otherwise, it will be more complex to create the relationships.
It is advisable to name the secondary key after the primary key. For example, if our primary key is labeled “id”, we can call the secondary key «tablename_id».
7. Be careful with dots
Do not use dots in table, column and field names. In SQL dots are used to separate tables from fields. Continuing with the previous example, when you want to refer to another element, use the underscore: «tablename_id».
8. Avoid legacy tables and columns
The data maintenance process and the data validation and verification process will be easier and faster if we get rid of the legacy columns and tables.
9. Store dates as datetime
The inclusion of dates is one of the most complex SQL clauses. Therefore, good SQL practices advise to store days, months and years in the format of the ‘datetime’ field. It is easiest to store a single time zone to avoid complications when analyzing time series.
10. Remember de data source
The data contained in a relational database comes from an external source. In this sense, it is important not to lose sight of the source of origin and to label the views accordingly, referencing the origin of the data.
Want to get more out of your data?