SQL HAVING

SQL HAVING is a clause used in SQL queries to filter data based on aggregate functions. It is used in conjunction with the GROUP BY clause to filter the results of a query based on a condition that applies to the group as a whole, rather than to individual rows. The HAVING clause is similar to the WHERE clause, but it operates on groups rather than individual rows.

Written by Perlego with AI-assistance

7 Key excerpts on "SQL HAVING"

  • Hands-On Data Science with SQL Server 2017
    Perform end-to-end data analysis to gain efficient data insight

    • Marek Chmel, Vladimír Mužný (Authors)
    • 2018 (Publication Date)
    • Packt Publishing (Publisher)
    Let's explore results from the previous section. Here, we had a result from the following query:
    SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
    FROM #src
    GROUP BY CategoryName, SubcategoryName
    The result from the preceding query has 37 rows containing RecordCount values from 1 to 43. Let's say that subcategories with fewer than five products are not significant for us. We then have two options for filtering out subcategories with a small number of products. The first of these options is shown in the following query:
    ;WITH cte AS (
        SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
        FROM #src
        GROUP BY CategoryName, SubcategoryName
    )
    SELECT * FROM cte WHERE RecordCount >= 5
    This query uses a CTE to calculate the aggregations; the CTE's result is then filtered by the WHERE clause of the outer query. The preceding query works, but using the HAVING clause makes the task more readable. Let's look at the following query:
    SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
    FROM #src
    GROUP BY CategoryName, SubcategoryName
    HAVING COUNT(*) >= 5
    The preceding query does not use a CTE, but it produces exactly the same result. From the preceding sample query we also see that the HAVING clause, if it's present, always follows the GROUP BY clause.
    Now we have two clauses that are useful for data filtering, so let's explain the difference between them. The WHERE clause filters data before it's aggregated; it affects the number of records coming into the aggregation. The HAVING clause filters data after it's aggregated. You may wonder whether we can add an aggregate function as part of a predicate in the WHERE clause. We cannot, because the result of the aggregate function is not yet known when the WHERE clause filters rows. Can we add a non-aggregated column to the HAVING clause? Yes, we can, but it reduces the readability of our queries. Let's explore this in the following example. In this example, we don't want to aggregate data for the Clothing category; the following queries will do this for us:
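The excerpt's own queries for this example are not included here. As a minimal sketch of the two ideas discussed above, the following Python script uses the standard-library sqlite3 module with an ordinary table named src standing in for the #src temporary table. The table contents are invented for illustration, and SQLite is an assumption (the book targets SQL Server). It first checks that the CTE-plus-WHERE query and the HAVING query return the same rows, and then that a predicate on a non-aggregated grouping column gives the same result in WHERE and in HAVING.

```python
import sqlite3

# Hypothetical stand-in for the excerpt's #src temp table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (CategoryName TEXT, SubcategoryName TEXT)")
rows = (
    [("Bikes", "Road Bikes")] * 6
    + [("Bikes", "Touring Bikes")] * 2
    + [("Clothing", "Jerseys")] * 5
    + [("Clothing", "Caps")] * 1
)
conn.executemany("INSERT INTO src VALUES (?, ?)", rows)

# Option 1: aggregate in a CTE, then filter the aggregated rows with WHERE.
cte_query = """
WITH cte AS (
    SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
    FROM src
    GROUP BY CategoryName, SubcategoryName
)
SELECT * FROM cte WHERE RecordCount >= 5
ORDER BY CategoryName, SubcategoryName
"""

# Option 2: filter the groups directly with HAVING.
having_query = """
SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
FROM src
GROUP BY CategoryName, SubcategoryName
HAVING COUNT(*) >= 5
ORDER BY CategoryName, SubcategoryName
"""

cte_result = conn.execute(cte_query).fetchall()
having_result = conn.execute(having_query).fetchall()

# A non-aggregated predicate can go in WHERE (applied before grouping) or in
# HAVING (applied after grouping, on a grouping column).
where_filter = conn.execute("""
    SELECT CategoryName, COUNT(*) FROM src
    WHERE CategoryName <> 'Clothing'
    GROUP BY CategoryName
""").fetchall()
having_filter = conn.execute("""
    SELECT CategoryName, COUNT(*) FROM src
    GROUP BY CategoryName
    HAVING CategoryName <> 'Clothing'
""").fetchall()

print(cte_result == having_result)
print(where_filter == having_filter)
```

Both comparisons hold for this sample data. The WHERE form states the intent more directly, since rows for the excluded category never reach the aggregation step.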
  • SQL Pocket Primer
    city name:
    SELECT city, COUNT(city) FROM weather GROUP BY city ORDER BY city;
    +------+-------------+
    | city | count(city) |
    +------+-------------+
    |      |           2 |
    | chi  |           1 |
    | se   |           1 |
    | sf   |           7 |
    +------+-------------+
    4 rows in set (0.003 sec)
    The HAVING clause enables you to specify an additional filter condition for the result set. For example, the following SQL statement extends the previous SQL statement by restricting the result set to cities whose count is greater than 2 in the weather table:
    SELECT city, COUNT(city) FROM weather GROUP BY city HAVING count(*) > 2 ORDER BY city;
    +------+-------------+
    | city | count(city) |
    +------+-------------+
    | sf   |           7 |
    +------+-------------+
    1 row in set (0.003 sec)
    A HAVING clause and a WHERE clause both filter the data in a result set, but there is a difference. HAVING applies only to groups of data, whereas the WHERE clause applies to individual rows.
    The HAVING clause is logically evaluated before the SELECT list, which means that in standard SQL you cannot refer to aliases of aggregated columns in the HAVING clause. Some systems, such as MySQL, accept such aliases as an extension.
    Displaying Duplicate Attribute Values
    In a previous section, you learned how to delete duplicate rows, where two rows are considered duplicates if they have the same attribute value. The following SQL statement uses the HAVING
  • Database Management Systems
    without the HAVING clause, the extracted table will return all rows without any selection taking place on the extracted table.
    SELECT SI.SALE_ID, COUNT(*) AS TOTAL_OF_ITEMS, SUM(SI.SLITEM_UNITS) AS TOTAL_QUANTITY
    FROM SALE_ITEM SI
    GROUP BY SI.SALE_ID;
    In the HAVING clause, all conditional and logical operators apply, for example:
    How many items does each sale have? List only the sales that have more than three sale items and a total quantity sold of more than seven.
    SELECT SI.SALE_ID, COUNT(*) AS TOTAL_OF_ITEMS, SUM(SI.SLITEM_UNITS) AS TOTAL_QUANTITY
    FROM SALE_ITEM SI
    GROUP BY SI.SALE_ID
    HAVING COUNT(*) > 3 AND SUM(SI.SLITEM_UNITS) > 7;
    Since aggregate functions return numbers, the conditional and logical operators may be applied to numbers too.
    SQL commands with a HAVING clause may also include a WHERE clause. The WHERE clause limits the rows extracted from the source table, and the HAVING clause limits the rows extracted from the aggregated table.
    How many items does each sale have for 2016? List only the sales with more than three sale items.
    SELECT SI.SALE_ID, COUNT(*) AS TOTAL_OF_ITEMS, SUM(SI.SLITEM_UNITS) AS TOTAL_QUANTITY
    FROM SALE_ITEM SI JOIN SALE SL ON SI.SALE_ID = SL.SALE_ID
    WHERE SL.SALE_DATE BETWEEN '01-01-2016' AND '12-31-2016'
    GROUP BY SI.SALE_ID
    HAVING COUNT(*) > 3;
    The WHERE clause limits the rows extracted from the joined source table: because of the condition WHERE SL.SALE_DATE BETWEEN '01-01-2016' AND '12-31-2016', only the 2016 sales are grouped. The HAVING clause then keeps only those 2016 sales that have more than three sale items.
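The interplay of WHERE and HAVING in this query can be tried out with a small script. This is only a sketch under stated assumptions: it uses Python's standard-library sqlite3 module rather than the excerpt's database system, the sample rows are invented, and dates are stored as ISO-8601 text (YYYY-MM-DD) so that BETWEEN compares them correctly as strings in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: sales 1 and 2 are from 2016, sale 3 is from 2015.
# Sales 1 and 3 have four items each; sale 2 has two items.
conn.executescript("""
CREATE TABLE SALE (SALE_ID INTEGER PRIMARY KEY, SALE_DATE TEXT);
CREATE TABLE SALE_ITEM (SALE_ID INTEGER, SLITEM_UNITS INTEGER);
INSERT INTO SALE VALUES (1, '2016-03-15'), (2, '2016-07-01'), (3, '2015-11-20');
INSERT INTO SALE_ITEM VALUES
    (1, 2), (1, 1), (1, 3), (1, 4),
    (2, 5), (2, 5),
    (3, 1), (3, 1), (3, 1), (3, 1);
""")

select_part = """
SELECT SI.SALE_ID, COUNT(*) AS TOTAL_OF_ITEMS, SUM(SI.SLITEM_UNITS) AS TOTAL_QUANTITY
FROM SALE_ITEM SI JOIN SALE SL ON SI.SALE_ID = SL.SALE_ID
"""

# HAVING only: every sale is grouped, then groups with more than 3 items are kept.
having_only = conn.execute(
    select_part + "GROUP BY SI.SALE_ID HAVING COUNT(*) > 3 ORDER BY SI.SALE_ID"
).fetchall()

# WHERE + HAVING: rows outside 2016 are dropped before grouping,
# then the remaining groups are filtered by item count.
where_and_having = conn.execute(
    select_part
    + "WHERE SL.SALE_DATE BETWEEN '2016-01-01' AND '2016-12-31' "
    + "GROUP BY SI.SALE_ID HAVING COUNT(*) > 3 ORDER BY SI.SALE_ID"
).fetchall()

print(having_only)
print(where_and_having)
```

With this data, the HAVING-only query returns sales 1 and 3, while adding the WHERE condition removes the 2015 sale before grouping and leaves only sale 1.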
  • SQL for Data Analytics
    Harness the power of SQL to extract insights from data, 3rd Edition

    • Benjamin Johnston, Jun Shan, Matt Goldwasser, Upom Malik (Authors)
    • 2022 (Publication Date)
    • Packt Publishing (Publisher)
    0.5 is that the median is the 50th percentile, which is 0.5 as a fraction. This gives you the following result:
    Figure 4.22: Result of an ordered set aggregate function
    With ordered set aggregate functions, you now have the tools for calculating virtually any aggregate statistic of interest for a dataset. In the next section, you will learn how to use aggregates to deal with data quality.

    Aggregate Functions with the HAVING Clause

    You learned about the WHERE clause in this chapter when you worked on SELECT statements, which select only certain rows meeting the condition from the original table for later queries. You also learned how to use aggregate functions with the WHERE clause in the previous section. Bear in mind that the WHERE clause will always be applied to the original dataset. This behavior is defined by the SQL SELECT statement syntax, regardless of whether there is a GROUP BY clause or not. Meanwhile, GROUP BY is a two-step process. In the first step, SQL selects rows from the original table or table set to form aggregate groups. In the second step, SQL calculates the aggregate function results. When you apply a WHERE clause, its conditions are applied to the original table or table set, which means it will always be applied in the first step. Sometimes, you are only interested in certain rows in the aggregate function result with certain characteristics, and only want to keep them in the query output and remove the rest. This can only happen after the aggregation has been completed and you get the results, thus it is part of the second step of GROUP BY
  • Beginning Microsoft SQL Server 2012 Programming
    • Paul Atkinson, Robert Vieira (Authors)
    • 2012 (Publication Date)
    • Wrox (Publisher)
    Now that you’ve seen how to operate with groups, let’s move on to one of the concepts that a lot of people have problems with. Of course, after reading the next section, you’ll think it’s a snap.

    Placing Conditions on Groups with the HAVING Clause

    Up to now, all of the conditions have been against specific rows. If a given column in a row doesn’t have a specific value or isn’t within a range of values, the entire row is left out. All of this happens before the groupings are really even thought about.
    What if you want to place conditions on what the groups themselves look like? In other words, what if you want every row to be added to a group, but then you want to say that only after the groups are fully accumulated are you ready to apply the condition. Well, that’s where the HAVING clause comes in.
    The HAVING clause is normally used only when there is also a GROUP BY in your query. Whereas the WHERE clause is applied to each row before it even has a chance to become part of a group, the HAVING clause is applied to the aggregated value for that group.
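To make the row-versus-group distinction concrete, here is a small sketch using Python's standard-library sqlite3 module. The Employee2 table below is a hypothetical, simplified stand-in with invented rows (the book's table lives in the HumanResources schema on SQL Server), and the reports helper is introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee2 (EmployeeID INTEGER PRIMARY KEY, ManagerID INTEGER)"
)
# Invented rows: employee 1 has no manager; managers 1, 2 and 3 have 2, 3 and 1 reports.
conn.executemany(
    "INSERT INTO Employee2 VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 3)],
)

def reports(extra_clauses):
    """Run the report-count query with the given trailing clauses (illustrative helper)."""
    sql = (
        "SELECT ManagerID AS Manager, COUNT(*) AS Reports FROM Employee2 "
        + extra_clauses
    )
    return conn.execute(sql).fetchall()

# No filter: every row lands in a group.
all_groups = reports("GROUP BY ManagerID ORDER BY ManagerID")

# WHERE removes individual rows first, so the counts themselves change.
where_rows = reports("WHERE EmployeeID > 4 GROUP BY ManagerID ORDER BY ManagerID")

# HAVING removes whole groups afterwards; surviving groups keep their full counts.
having_rows = reports("GROUP BY ManagerID HAVING COUNT(*) > 1 ORDER BY ManagerID")

print(all_groups)
print(where_rows)
print(having_rows)
```

In this sample, manager 2 is reported with 2 employees under the WHERE filter but with all 3 under the HAVING filter, because HAVING sees the group only after it has been fully accumulated.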
    Let’s start off with a slight modification to the GROUP BY query you used at the end of the previous section, the one that tells you the number of employees assigned to each manager’s EmployeeID:
    SELECT ManagerID AS Manager, COUNT(*) AS Reports FROM HumanResources.Employee2 GROUP BY ManagerID;
    Code snippet Chap03.sql
    In the next chapter, you’ll learn how to put names on the EmployeeIDs that are in the Manager column. For now though, just note that there appear to be three different managers in the company. Apparently, everyone reports to these three people, except for one person who doesn’t have a manager assigned; that is probably the company president (you could write a query to verify that, but instead just trust in the assumption for now).
    This query doesn’t have a WHERE clause, so the GROUP BY was operating on every row in the table and every row is included in a grouping. To test what would happen to your COUNTs, try adding a WHERE
  • Introductory Relational Database Design for Business, with Microsoft Access
    • Jonathan Eckstein, Bonnie R. Schultz (Authors)
    • 2017 (Publication Date)
    • Wiley (Publisher)
    Avg somewhere in your query, typically in the SELECT clause. Without any aggregation functions, GROUP BY will either cause an error message, have no effect, or operate in the same way as SELECT DISTINCT, which is simpler to use.

    HAVING

    Suppose we are interested in performing the same query, but we want to see only customers who have spent a total of at least $7,000. This restriction cannot be imposed by a WHERE clause, because WHERE restrictions are always applied before grouping and aggregation occur. Here, we need to instead apply a criterion to the result of the Sum aggregation function, which can only be known after the grouping and aggregation steps. To apply criteria after grouping and aggregation, SQL provides an additional clause called HAVING. To implement this query, we add the clause
    HAVING Sum(UnitPrice*Quantity) >= 7000
    to the query, resulting in:
    SELECT FirstName, LastName, Sum(UnitPrice*Quantity) AS Revenue
    FROM CUSTOMER, ORDERS, ORDERDETAIL, PRODUCT
    WHERE CUSTOMER.CustomerID = Orders.CustomerID
      AND Orders.OrderID = ORDERDETAIL.OrderID
      AND ORDERDETAIL.ProductID = PRODUCT.ProductID
    GROUP BY CUSTOMER.CustomerID, FirstName, LastName
    HAVING Sum(UnitPrice*Quantity) >= 7000;
    This clause specifies that, after grouping and aggregation, we retain only output rows for which the sum of the unit price multiplied by quantity is at least 7,000. Now, only “big spenders” appear in the output, as shown in Table 10.9.
    Table 10.9
    Output of query identifying high-spending customers.
    First Name   Last Name    Revenue
    Benjamin     Masterson    $7,293.60
    Mary         Milgrom      $23,783.75
    Geoffrey     Hammer       $8,670.27
    Ashley       Flannery     $8,246.50
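The claim that this restriction cannot be expressed in a WHERE clause can be checked with a short sketch. It uses Python's standard-library sqlite3 module instead of Microsoft Access, and a cut-down version of the four tables with invented customers and prices (none of the rows come from the book's database). The joins are written with explicit JOIN syntax, which is a stylistic substitution for the comma-separated FROM list in the excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the excerpt's tables with invented rows.
conn.executescript("""
CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE PRODUCT (ProductID INTEGER PRIMARY KEY, UnitPrice REAL);
CREATE TABLE ORDERS (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE ORDERDETAIL (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO CUSTOMER VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO PRODUCT VALUES (10, 1000.0), (11, 250.0);
INSERT INTO ORDERS VALUES (100, 1), (101, 1), (102, 2);
INSERT INTO ORDERDETAIL VALUES (100, 10, 5), (101, 11, 10), (102, 10, 3);
""")

base = """
SELECT FirstName, LastName, SUM(UnitPrice * Quantity) AS Revenue
FROM CUSTOMER
JOIN ORDERS ON CUSTOMER.CustomerID = ORDERS.CustomerID
JOIN ORDERDETAIL ON ORDERS.OrderID = ORDERDETAIL.OrderID
JOIN PRODUCT ON ORDERDETAIL.ProductID = PRODUCT.ProductID
"""
group_by = "GROUP BY CUSTOMER.CustomerID, FirstName, LastName "

# Attempt 1: aggregate condition in WHERE. SQLite rejects this, because WHERE
# is evaluated per row, before any group total exists.
try:
    conn.execute(base + "WHERE SUM(UnitPrice * Quantity) >= 7000 " + group_by)
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# Attempt 2: the same condition in HAVING, applied after grouping.
big_spenders = conn.execute(
    base + group_by + "HAVING SUM(UnitPrice * Quantity) >= 7000"
).fetchall()

print(where_rejected)
print(big_spenders)
```

In this sample, the first customer's orders total 7,500 and the second customer's total 3,000, so only the first passes the HAVING condition.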
    HAVING and WHERE perform similar functions, but WHERE filters records before aggregation and HAVING filters records after
  • Data Science Fundamentals Pocket Primer
    GROUP BY in a SQL statement to display the number of purchase orders that were created on a daily basis:
    SELECT purchase_date, COUNT(*) FROM purchase_orders GROUP BY purchase_date;

    Select Statements with a HAVING Clause

    The following SQL statement extends the previous one with a HAVING clause, so that it displays the number of purchase orders created per day, but only for those days on which at least 4 purchase orders were created:
    SELECT purchase_date, COUNT(*) FROM purchase_orders GROUP BY purchase_date HAVING COUNT(purchase_date) > 3;

    WORKING WITH INDEXES IN SQL

    SQL enables you to define one or more indexes for a table, which can greatly reduce the amount of time that is required to select a single row or a subset of rows from a table.
    A SQL index on a table consists of one or more attributes in a table. SQL updates in a table that has one or more indexes require more time than updates without indexes on that table, because both the table and the index (or indexes) must be updated. Therefore, it’s better to create indexes only on table columns that are frequently searched.
    Here are two examples of creating indexes on the customers table:
    CREATE INDEX idx_cust_lname ON customers (lname);
    CREATE INDEX idx_cust_lname_fname ON customers (lname,fname);

    WHAT ARE KEYS IN AN RDBMS?

    A key
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.