Join Operation in SQL

A join operation in SQL is a technique used to combine data from two or more tables based on a related column between them. It allows users to retrieve data from multiple tables in a single query. There are different types of join operations, such as inner join, left join, right join, and full outer join.

Written by Perlego with AI-assistance

8 Key excerpts on "Join Operation in SQL"

  • Learn SQL Database Programming
    eBook - ePub

    Query and manipulate databases from popular relational database servers using SQL

    Querying Multiple Tables

    In this chapter, you will learn how to query multiple tables. You will learn how to use SQL joins to join two or more tables together, including INNER and OUTER (LEFT, RIGHT, and FULL) joins, and advanced joins (cross, natural, and self joins). You will learn about set theory and how to combine queries using UNION and UNION ALL, and how to get the differences and intersections of different queries. Lastly, you will learn how to optimize queries when they contain multiple tables. In this chapter, we will cover the following topics:
    • Understanding joins
    • Using INNER JOIN
    • Using OUTER JOIN
    • Using advanced joins
    • Understanding set theory
    • Using indexes with your queries

    Technical requirements

    You can refer to the code files of this chapter at the following GitHub link: https://github.com/PacktPublishing/learn-sql-database-programming/tree/master/chapter-7

    Understanding joins

    Before we begin a discussion on the types of joins, let's go over what a join is and why you would use one. A join refers to when you connect two or more tables in a query. Joining tables in a query requires you to join them on a related column that is in each table you want to join together. There are a couple of different types of joins, including the following ones:
    • Inner join: This type of join returns only matching records from each joined table.
    • Outer join: This type of join comes in a few variants, including the following:
        • Left outer join: This type of join includes all rows from the left table and any matching rows between the left and right tables.
        • Right outer join: This type of join includes all rows from the right table and any matching rows between the right and left tables.
        • Full outer join: This type of join includes all rows from both the left and right tables. This type of join is not available in MySQL.
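    The four join types above can be compared on one small data set. The sketch below is a minimal illustration using Python's built-in sqlite3 module; the customers and orders tables and their rows are hypothetical. Because SQLite only gained native RIGHT and FULL OUTER JOIN in version 3.39.0, the sketch emulates them: a right join as a left join with the tables swapped, and a full outer join as a left join plus the right-side rows that matched nothing. The same UNION ALL pattern is a common workaround in MySQL, which has no FULL OUTER JOIN.

```python
import sqlite3

# Hypothetical sample data: customers 2 and 3 have no orders,
# and order 12 refers to a customer (4) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 'pipe'), (11, 1, 'valve'), (12, 4, 'tap');
""")


def run(sql):
    """Return the result rows of a query as a list of tuples."""
    return conn.execute(sql).fetchall()


# Inner join: only customer/order pairs whose keys match.
INNER = """SELECT c.name, o.item FROM customers AS c
           INNER JOIN orders AS o ON c.id = o.customer_id"""

# Left outer join: every customer; order columns are NULL (None) when unmatched.
LEFT = """SELECT c.name, o.item FROM customers AS c
          LEFT JOIN orders AS o ON c.id = o.customer_id"""

# Right outer join, written as a left join with the tables swapped:
# every order; customer columns are NULL when unmatched.
RIGHT = """SELECT c.name, o.item FROM orders AS o
           LEFT JOIN customers AS c ON c.id = o.customer_id"""

# Full outer join: the left join plus the orders that matched no customer.
FULL = LEFT + """
    UNION ALL
    SELECT c.name, o.item FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    WHERE c.id IS NULL"""

for label, sql in [("inner", INNER), ("left", LEFT), ("right", RIGHT), ("full", FULL)]:
    print(label, run(sql))
```

    The row counts (2, 4, 3, and 5) show which unmatched rows each outer join keeps that the inner join drops.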
  • Professional Microsoft SQL Server 2008 Integration Services
    • Brian Knight, Erik Veerman, Grant Dickinson, Douglas Hinson, Darren Herbold(Authors)
    • 2011(Publication Date)
    • Wrox
      (Publisher)
    Merge Join typically uses less memory than the Lookup Component, because it only keeps in memory the few rows required to join the two streams. However, it does not support short-circuit execution, in that both pipelines need to stream their entire contents before the component considers its work done. For example, if the first input has five rows and the second input has one million rows, and the first five rows happen to join successfully right away, the component will still stream the other 999,995 rows from the second input even though they cannot possibly be joined anymore. This makes sense in left-join scenarios; however, the architectural reasons for this being the case in inner-join scenarios are beyond the scope of this chapter.
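    The low memory use described above follows from how a merge join works: because both inputs arrive sorted on the join key, the join only needs to hold the run of rows that share the current key. The Python function below is a minimal sketch of a sort-merge inner join under that assumption, not the SSIS component's implementation; the (key, payload) tuple layout is hypothetical. Unlike the SSIS behavior described above, this sketch stops as soon as either input is exhausted.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, payload) tuples, each already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # no remaining right row can match this left row
        elif lkey > rkey:
            j += 1  # no remaining left row can match this right row
        else:
            # Find the end of the run of right rows sharing this key;
            # only this run is needed for the pairing below.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    out.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
# [(2, 'b', 'x'), (2, 'b', 'y'), (2, 'c', 'x'), (2, 'c', 'y'), (4, 'd', 'w')]
```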
    Contrasting to the Relational Join
    Though the methods and syntax you employ in the relational and SSIS worlds may differ, joining multiple row sets together using congruent keys achieves the same desired result. In the relational database world the equivalent of a lookup is accomplished by joining two or more tables together using declarative syntax that executes in a set-based manner. The operation remains close to the data at all times; there is typically no need to move the data out-of-process with respect to the database engine (except when joining across databases, though this is usually a non-optimal operation). When joining tables within the same database, the engine can take advantage of multiple different internal algorithms, knowledge of table statistics, cardinality, temporary storage, cost-based plans, and the benefit of many years of ongoing research and code optimization. Operations can still complete in a resource-constrained environment because the platform has many intrinsic functions and operators that simplify multi-step operations, such as implicit parallelism, paging, sorting, and hashing.
    In a cost-based optimization database system, the end-user experience is typically transparent; the declarative SQL syntax (calculus) abstracts the underlying relational machinations (algebra) such that the user may not in fact know how the problem was solved by the engine (which is what query plans reveal). In other words, the engine has the ability to transform a problem statement as defined by the user into an internal form that can be optimized into one of many solution sets, transparently. The end-user experience is usually synchronous and non-blocking; results are materialized in a streaming manner with the engine effecting the highest degree of parallelism possible.
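    One of the internal algorithms such an engine can choose, and one that relies on the hashing mentioned above, is the hash join: build an in-memory hash table on one input, then probe it with each row of the other. The sketch below is a simplified, single-threaded Python illustration, not SQL Server's implementation; it assumes the build side fits in memory and ignores spilling to disk.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Inner-join two lists of (key, payload) tuples; neither needs to be sorted."""
    # Build phase: group the build-side payloads by join key.
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    # Probe phase: look up each probe row's key and emit one row per match.
    out = []
    for key, payload in probe:
        for build_payload in table.get(key, []):
            out.append((key, build_payload, payload))
    return out


print(hash_join([(1, "a"), (2, "b"), (2, "c")], [(2, "x"), (3, "y"), (1, "z")]))
# [(2, 'b', 'x'), (2, 'c', 'x'), (1, 'a', 'z')]
```

    An optimizer typically builds the hash table on the smaller input. Choosing between this, a merge join, or a nested-loop join based on statistics and cardinality is the kind of cost-based decision the passage describes.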
  • SQL Pocket Primer
    eBook - ePub
    If you are not convinced of the preceding statement, consider this scenario: you have great performance in your SQL statements, but you aren’t sure if all the data is correct. If you have mission-critical data that requires 100% data integrity, then data integrity has a higher priority than optimal performance.
    Fortunately, performance issues can sometimes be addressed by performing the appropriate denormalization of relevant tables. Note that this will involve rewriting the SQL statements that perform a JOIN of the normalized tables so that the new SQL statements query the single denormalized database table.
    Types of SQL JOIN Statements
    The JOIN keyword enables you to define various types of SQL statements that have slightly different semantics:
    • INNER JOIN
    • LEFT OUTER JOIN
    • RIGHT OUTER JOIN
    • CROSS JOIN
    • SELF-JOIN
    Let’s suppose that table A has some (but not all) corresponding rows in table B, and that table B has some (but not all) corresponding rows in table A. Moreover, let’s also assume that a JOIN statement specifies table A first and then table B.
    An INNER JOIN returns only the rows from table A that have matching rows in table B, combined with those matching rows.
    A LEFT JOIN returns all rows from left-side table A, combined with either the matching rows from right-side table B or NULLs if there is no matching row in table B.
    A RIGHT JOIN returns all rows from right-side table B, combined with either the matching rows from left-side table A or NULLs if there is no matching row in table A.
    A CROSS JOIN is a Cartesian or “full” product of rows from left-side table A and right-side table B: every row of A is paired with every row of B.
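    Because a CROSS JOIN has no join condition, its result always has |A| × |B| rows. The sketch below, using Python's sqlite3 module and two hypothetical single-column tables a and b, checks that the database's CROSS JOIN yields exactly the pairs produced by itertools.product.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES ('p'), ('q');
""")

# No ON clause: every row of a is paired with every row of b.
rows = conn.execute("SELECT a.x, b.y FROM a CROSS JOIN b").fetchall()

# The same pairs computed outside the database, for comparison.
a_vals = [r[0] for r in conn.execute("SELECT x FROM a")]
b_vals = [r[0] for r in conn.execute("SELECT y FROM b")]
expected = sorted(itertools.product(a_vals, b_vals))

print(len(rows))  # 6, that is, 3 rows in a times 2 rows in b
print(sorted(rows) == expected)  # True
```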
    A SELF JOIN joins a table to itself. A common use case involves an employees table that contains a manager attribute for each employee. Given an employee in this table, find the value in the manager attribute for that employee, and then search the employees table a second time using that manager attribute.
    This sequence of steps can be repeated until it reaches the top-most employee, who does not have a manager (such as the CEO). Given an employee, the preceding sequence produces the management hierarchy from that employee up to the top-most employee in the company (as defined in the table).
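    The self join and the repeated lookup described above can be sketched with Python's sqlite3 module. The employees table and its rows are hypothetical. A single self join pairs each employee with their manager; to repeat that step until the top-most employee is reached, the sketch uses a recursive common table expression (WITH RECURSIVE), a standard SQL feature the excerpt does not mention, rather than issuing one query per level.

```python
import sqlite3

# Hypothetical employees table; manager_id refers back to id in the same table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Dana', NULL),  -- top-most employee, e.g. the CEO
        (2, 'Eli', 1),
        (3, 'Fay', 2),
        (4, 'Gus', 2);
""")

# One self-join step: the same table appears twice, under aliases e and m.
# LEFT JOIN keeps the top-most employee, whose manager is NULL.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employees AS e
    LEFT JOIN employees AS m ON e.manager_id = m.id
""").fetchall()


def management_chain(name):
    """Repeat the self-join step until an employee with no manager is reached."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, manager_id, depth) AS (
            SELECT id, name, manager_id, 0 FROM employees WHERE name = ?
            UNION ALL
            SELECT m.id, m.name, m.manager_id, chain.depth + 1
            FROM employees AS m JOIN chain ON m.id = chain.manager_id
        )
        SELECT name FROM chain ORDER BY depth
    """, (name,)).fetchall()
    return [r[0] for r in rows]


print(dict(pairs))
print(management_chain("Fay"))  # ['Fay', 'Eli', 'Dana']
```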
  • Beginning Microsoft SQL Server 2012 Programming
    • Paul Atkinson, Robert Vieira(Authors)
    • 2012(Publication Date)
    • Wrox
      (Publisher)
    UNION operator, which allows you to combine the results of two queries into one.

    COMBINING TABLE DATA WITH JOINS

    When you’re operating in a normalized environment, you’ll frequently run into situations in which not all of the information that you want is in one table. In other cases, all the information you want returned is in one table, but the information you want to place conditions on is in another table. This is where the JOIN clause comes in.
    A JOIN does just what it sounds like: it joins the information from two tables together into one result set. You can think of a result set as being a virtual table. It has both columns and rows, and the columns have data types. Indeed, in Chapter 7, I’ll show you how to treat a result set as if it were a table and use it for other queries.
    How exactly does a JOIN put the information from two tables into a single result set? Well, that depends on how you tell it to put the data together. That’s why there are four kinds of JOINs. The thing that all JOINs have in common is that they match one record up with one or more other records to make a record that is a superset created by the combined columns of both records.
    For example, take a look at a record from a table called Films:
        FILMID  FILMNAME      YEARMADE
        1       My Fair Lady  1964
    Now follow that up with a record from a table called Actors:
        FILMID  FIRSTNAME  LASTNAME
        1       Rex        Harrison
    With a JOIN, you can create one record from these two records found in totally separate tables:
    This JOIN (at least apparently) joins records in a one-to-one relationship. One Films record joins to exactly one Actors record.
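    The two records above can be loaded into an in-memory database and joined on FILMID. This is a minimal sketch using Python's sqlite3 module rather than SQL Server, and the column types are assumptions. The single result row carries the columns of both source records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Films (FilmID INTEGER, FilmName TEXT, YearMade INTEGER);
    CREATE TABLE Actors (FilmID INTEGER, FirstName TEXT, LastName TEXT);
    INSERT INTO Films VALUES (1, 'My Fair Lady', 1964);
    INSERT INTO Actors VALUES (1, 'Rex', 'Harrison');
""")

# Match the Films record to the Actors record that shares its FilmID.
combined = conn.execute("""
    SELECT f.FilmID, f.FilmName, f.YearMade, a.FirstName, a.LastName
    FROM Films AS f
    INNER JOIN Actors AS a ON f.FilmID = a.FilmID
""").fetchall()

print(combined)  # [(1, 'My Fair Lady', 1964, 'Rex', 'Harrison')]
```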
    Let’s expand things just a bit and see if you can see what’s happening. I’ve added another record to the Actors
  • Introductory Relational Database Design for Business, with Microsoft Access
    • Jonathan Eckstein, Bonnie R. Schultz(Authors)
    • 2017(Publication Date)
    • Wiley
      (Publisher)
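    State field value does not match the condition. When comparing text fields to literal character strings such as “CA”, you should enclose the literal character strings in double quotes, which Microsoft Access accepts for text literals. Otherwise, SQL will try to interpret the character string as an attribute name. Note that standard SQL, and most other database systems, use single quotes for string literals and reserve double quotes for identifiers.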

    Inner Joins

    So far, this chapter has considered only queries drawn from a single table. We now discuss how SQL can express queries based on data from multiple tables. The most common technique for basing queries on multiple tables is called an inner join. An inner join consists of all combinations of rows selected from two tables that meet some matching condition, formally called a join predicate. One standard syntax for an inner join, of which we have already seen examples earlier in this book, is:
    First_Table INNER JOIN Second_Table ON Condition
    Formally, this syntax specifies that the query should form a table consisting of all combinations of a record from First_Table with a record from Second_Table for which Condition evaluates to “true.” Most frequently, Condition specifies that a foreign key in one table should match a primary key in the other.
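    The formal definition above can be checked directly: enumerate every combination of records, keep those for which the condition is true, and compare the result with what the database returns for INNER JOIN ... ON. The sketch below uses Python's sqlite3 module; the table contents and the equality predicate are hypothetical. A real engine does not usually materialize every combination first, since the definition describes the result rather than the algorithm.

```python
import sqlite3

first_table = [(1, "a"), (2, "b"), (3, "c")]
second_table = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]


def condition(f, s):
    """Hypothetical join predicate: the first columns are equal."""
    return f[0] == s[0]


# The formal definition: every combination of a First_Table record with a
# Second_Table record, kept only when the condition evaluates to true.
by_definition = [f + s for f in first_table for s in second_table if condition(f, s)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE First_Table (k INTEGER, v TEXT)")
conn.execute("CREATE TABLE Second_Table (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO First_Table VALUES (?, ?)", first_table)
conn.executemany("INSERT INTO Second_Table VALUES (?, ?)", second_table)

by_sql = conn.execute("""
    SELECT First_Table.k, First_Table.v, Second_Table.k, Second_Table.v
    FROM First_Table INNER JOIN Second_Table ON First_Table.k = Second_Table.k
""").fetchall()

print(sorted(by_sql) == sorted(by_definition))  # True
```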
    Here is an example of an inner join based on the plumbing store database:
    SELECT FirstName, LastName, OrderDate FROM CUSTOMER INNER JOIN ORDERS ON CUSTOMER.CustomerID = ORDERS.CustomerID;
    The INNER JOIN expression is now the data_source following the FROM keyword, where before we used a single table. This construction means that the data to be displayed is taken from the temporary table resulting from the inner join operation. The particular inner join expression, namely,
    CUSTOMER INNER JOIN ORDERS ON CUSTOMER.CustomerID = ORDERS.CustomerID
    specifies that the query should be based on all combinations of records from the CUSTOMER and ORDERS tables that have matching CustomerID fields. This kind of primary key to foreign key matching condition is by far the most common kind of inner join. CUSTOMER.CustomerID refers to the CustomerID field from the CUSTOMER table, while ORDERS.CustomerID refers to the CustomerID field from the ORDERS table. The use of “.” here is called qualification. It is required to eliminate ambiguity whenever several underlying tables have fields of the same name, as is the case for the CustomerID in this example: if we were to just write CustomerID, SQL would not be able to tell whether we were referring to the CustomerID field in the CUSTOMER table or the CustomerID field in the ORDERS table. To resolve this ambiguity, we preface an attribute name with a table name and “.”: for example, CUSTOMER.CustomerID means the CustomerID
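    The query above, and the need for qualification, can be tried against a small in-memory database. This sketch uses Python's sqlite3 module; the table definitions and rows are hypothetical stand-ins for the plumbing store database. Customer 2 has no orders, so the inner join omits that customer entirely, and the unqualified version of the query is rejected.

```python
import sqlite3

# Hypothetical cut-down versions of the plumbing store tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE ORDERS (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO CUSTOMER VALUES (1, 'Pat', 'Lee'), (2, 'Sam', 'Roe');
    INSERT INTO ORDERS VALUES (100, 1, '2024-01-05'), (101, 1, '2024-02-10');
""")

# The query from the text, with qualified CustomerID references.
rows = conn.execute("""
    SELECT FirstName, LastName, OrderDate
    FROM CUSTOMER INNER JOIN ORDERS ON CUSTOMER.CustomerID = ORDERS.CustomerID
""").fetchall()
print(rows)

# Selecting CustomerID unqualified is ambiguous, because both tables have it.
try:
    conn.execute("""
        SELECT CustomerID, OrderDate
        FROM CUSTOMER INNER JOIN ORDERS ON CUSTOMER.CustomerID = ORDERS.CustomerID
    """)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)
print(ambiguous_error)  # SQLite reports an "ambiguous column name" error
```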
  • Professional Microsoft SQL Server 2014 Integration Services
    • Brian Knight, Devin Knight, Jessica M. Moss, Mike Davis, Chris Rock(Authors)
    • 2014(Publication Date)
    • Wrox
      (Publisher)
    Though the methods and syntax you employ in the relational and SSIS worlds may differ, joining multiple row sets together using congruent keys achieves the same desired result. In the relational database world, the equivalent of a Lookup is accomplished by joining two or more tables together using declarative syntax that executes in a set-based manner. The operation remains close to the data at all times; there is typically no need to move the data out-of-process with respect to the database engine as long as the databases are on the same SQL Server instance (except when joining across databases, though this is usually a nonoptimal operation). When joining tables within the same database, the engine can take advantage of multiple different internal algorithms, knowledge of table statistics, cardinality, temporary storage, cost-based plans, and the benefit of many years of ongoing research and code optimization. Operations can still complete in a resource-constrained environment because the platform has many intrinsic functions and operators that simplify multi-step operations, such as implicit parallelism, paging, sorting, and hashing.
    In a cost-based optimization database system, the end-user experience is typically transparent; the declarative SQL syntax abstracts the underlying relational machinations such that the user may not in fact know how the problem was solved by the engine. In other words, the engine is capable of transforming a problem statement as defined by the user into an internal form that can be optimized into one of many solution sets — transparently. The end-user experience is usually synchronous and nonblocking; results are materialized in a streaming manner, with the engine effecting the highest degree of parallelism possible.
    The operation is atomic in that once a join is specified, the operation either completes or fails in total — there are no substeps that can succeed or fail in a way the user would experience independently. Furthermore, it is not possible to receive two result sets from the query at the same time — for instance, if you specified a left join, then you could not direct the matches to go one direction and the nonmatches somewhere else.
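    Because a single relational join returns one result set, separating matched from unmatched rows usually means running two statements. The following is a minimal sketch with Python's sqlite3 module; the incoming and product tables are hypothetical. The second statement uses a left join filtered on a NULL key, a common way to express an anti-join.

```python
import sqlite3

# Hypothetical incoming rows and a reference table they are looked up against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incoming (row_id INTEGER, product_code TEXT);
    CREATE TABLE product (product_code TEXT PRIMARY KEY, product_name TEXT);
    INSERT INTO incoming VALUES (1, 'A1'), (2, 'B2'), (3, 'ZZ');
    INSERT INTO product VALUES ('A1', 'Anchor'), ('B2', 'Bolt');
""")

# Statement 1: the matches.
matches = conn.execute("""
    SELECT i.row_id, p.product_name
    FROM incoming AS i INNER JOIN product AS p ON i.product_code = p.product_code
""").fetchall()

# Statement 2: the non-matches, i.e. left-join rows that found no partner.
non_matches = conn.execute("""
    SELECT i.row_id, i.product_code
    FROM incoming AS i LEFT JOIN product AS p ON i.product_code = p.product_code
    WHERE p.product_code IS NULL
""").fetchall()

print(matches)      # rows 1 and 2 with their product names
print(non_matches)  # [(3, 'ZZ')]
```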
    Advanced algorithms allow efficient caching of multiple joins using the same tables — for instance, round-robin read-ahead enables separate T-SQL statements (using the same base tables) to utilize the same caches.
    The following relational query joins two tables from the AdventureWorksDW database together. Notice how you join only two tables at a time, using declarative syntax, with particular attention being paid to specification of the join columns:
  • Systems Analysis and Synthesis
    eBook - ePub

    Bridging Computer Science and Information Technology

    SQL to describe several important aspects of databases at a conceptual level. We have done so because it introduces us to many useful concepts, such as view, join, transaction, deadlock, etc., at a reasonably abstract level. These concepts are useful to any database, however it is implemented.
    Of particular interest are the join algorithms used in query optimisation, and the methods used to ensure the atomicity of transactions. In later chapters we shall see how these basic ideas can be adapted to different technologies and infrastructures.

    6.8. Further Reading

    Chapter 8 of ‘Foundations of Computer Science: C Edition’ by Alfred V. Aho and Jeffrey D. Ullman (1994, ISBN: 0-7167-8284-7) gives an excellent, SQL-free introduction to query optimisation. The 2002 textbook ‘Databases and Transaction Processing: An Application-Oriented Approach’ by Philip M. Lewis, Arthur Bernstein, and Michael Kifer (ISBN: 0-201-70872-8) is a good source of material about database technology and includes plenty of material about the SQL language and its implementation.
    SQL (originally called SEQUEL) was initially developed by IBM researchers in the 1970s. The language is now the subject of ANSI and ISO standards.
    PHP, which we have cited as a widely used host language for SQL, was created by Rasmus Lerdorf in 1995. PHP was originally an acronym for ‘Personal Home Page’. It is free software.
    PL/SQL, used in the example of embedded SQL, is a product of the Oracle Corporation.

    6.9. Exercises

    1. On page 192 we discussed a query that could produce a list of timetable clashes . It was suggested that only a very clever optimiser would spot that a cross join was being made between two identical sets of data. Suppose that the DBMS’s query optimiser proves inadequate and the resulting query takes too long. It is decided to write an embedded SQL procedure to create the cross join from a single instance of the data. This would involve fetching a list of enrolments for one candidate at a time, storing them in an array — ten locations would be more than enough — and inserting all the required pairs into a new table. Write a query that will produce the required lists of enrolments.
    2. Candidate C is enrolling in subjects S and T
  • Relational Database Design and Implementation
    You can extend the same table join technique you have just read about to find as many rows in a table as you need. In the FROM clause, create one copy of the table, each with its own correlation name, for every row the query needs to match, and join those copies together. In the WHERE clause, use a predicate that includes one restrict for each copy of the table. For example, to retrieve data that have four specified rows in a table, you need four copies of the table, three joins, and four expressions in the restrict predicate. The general format of such a query is
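    As a sketch of this technique, the query below finds students enrolled in all three of three specified courses, so it uses three copies of the table, two joins, and three expressions in the restrict predicate. The enrollment table, its rows, and the course codes are hypothetical, and the query runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment (student TEXT, course TEXT);
    INSERT INTO enrollment VALUES
        ('amy', 'ART'), ('amy', 'BIO'), ('amy', 'CHEM'),
        ('ben', 'ART'), ('ben', 'BIO'),
        ('cal', 'CHEM');
""")

# Three copies of the same table (correlation names e1, e2, e3), two joins,
# and one restrict per copy in the WHERE clause.
students = conn.execute("""
    SELECT e1.student
    FROM enrollment AS e1
    JOIN enrollment AS e2 ON e1.student = e2.student
    JOIN enrollment AS e3 ON e1.student = e3.student
    WHERE e1.course = 'ART' AND e2.course = 'BIO' AND e3.course = 'CHEM'
""").fetchall()

print(students)  # [('amy',)]
```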

    Outer Joins

    As you read in Chapter 6, an outer join is a join that includes rows in a result table even though there may not be a match between rows in the two tables being joined. Whenever the DBMS can’t match rows, it places nulls in the columns for which no data exist. The result may therefore not be a legal relation because it may not have a primary key. However, because a query’s result table is a virtual table that is never stored in the database, having no primary keys doesn’t present a data integrity problem.
    To perform an outer join using the SQL-92 syntax, you indicate the type of join in the FROM clause. For example, to perform a left outer join between the customer and sale tables, you could type
    The result appears in Figure 17.4. Notice that five rows appear to be empty in the sale_id and sale_date columns. These five customers haven’t made any purchases. Therefore, the columns in question are actually null. However, most DBMSs have no visible indicator for null; it looks as if the values are blank. It is the responsibility of the person viewing the result table to realize that the empty spaces represent nulls rather than blanks.
    Figure 17.4   The result of an outer join.
    The SQL-92 outer join syntax for joins has the same options as the inner join syntax:
    If you use the syntax in the preceding example, the DBMS will automatically perform the outer join on all matching columns between the two tables.
    If you want to specify the columns over which the outer join will be performed, and the columns have the same names in both tables, add a USING clause:
    If the columns over which you want to perform the outer join do not have the same name, then append an ON clause that contains the join condition:
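    The three forms of the SQL-92 left outer join described above can be sketched side by side. The example uses Python's sqlite3 module; the customer and sale tables and their columns (such as customer_numb) are hypothetical, not the book's schema. Because customer_numb is the only column name the two tables share, all three queries return the same rows, and a customer with no sales comes back with NULLs, which the sqlite3 module shows as None rather than blanks.

```python
import sqlite3

# Hypothetical tables; customer_numb is their only shared column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_numb INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, customer_numb INTEGER, sale_date TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO sale VALUES (10, 1, '2024-01-02');
""")

queries = {
    # Joins on every column name the two tables have in common.
    "natural": """SELECT first_name, sale_id, sale_date
                  FROM customer NATURAL LEFT OUTER JOIN sale""",
    # Names the shared column explicitly.
    "using": """SELECT first_name, sale_id, sale_date
                FROM customer LEFT OUTER JOIN sale USING (customer_numb)""",
    # Spells out the condition; also works when the column names differ.
    "on": """SELECT first_name, sale_id, sale_date
             FROM customer LEFT OUTER JOIN sale
             ON customer.customer_numb = sale.customer_numb""",
}

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
print(results["on"])  # [('Ann', 10, '2024-01-02'), ('Bob', None, None)]
```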
    Note: The SQL standard also includes an operation known as the UNION JOIN. It performs a FULL OUTER JOIN on two tables and then throws out the rows that match, placing all those that don’t match in the result table. The UNION JOIN hasn’t been widely implemented.
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.