GROUP BY SQL

GROUP BY is a SQL clause that groups rows sharing the same values in one or more columns. It is typically used with aggregate functions such as SUM, COUNT, and AVG to perform calculations on each group. The output contains one row per group, showing the grouping values and the calculated results.

Written by Perlego with AI-assistance

11 Key excerpts on "GROUP BY SQL"

  • SQL for Data Analytics
    Harness the power of SQL to extract insights from data, 3rd Edition

    • Benjamin Johnston, Jun Shan, Matt Goldwasser, Upom Malik (Authors)
    • 2022 (Publication Date)
    • Packt Publishing (Publisher)
    GROUP BY clause.
    Note
    To access the source code for this specific section, please refer to https://packt.link/OU9zr.

    Aggregate Functions with the GROUP BY Clause

    So far, you have used aggregate functions to calculate statistics for an entire column. However, most times you are interested in not only the aggregate values for a whole table but also the values for smaller groups in the table. To illustrate this, refer back to the customers table. You know that the total number of customers is 50,000. However, you might want to know how many customers there are in each state. But how can you calculate this?
    You could determine how many states there are with the following query:
        SELECT DISTINCT state
        FROM customers;
    You will see 50 distinct states, Washington D.C., and NULL returned as a result of the preceding query, totaling 52 rows. Once you have the list of states, you could then run the following query for each state:
        SELECT COUNT(*)
        FROM customers
        WHERE state = '{state}';
    Although you can do this, it is incredibly tedious and can take a long time if there are many states. The GROUP BY clause provides a much more efficient solution.
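    As a minimal sketch of that more efficient solution, the per-state counts can be produced by a single query. The example uses Python's built-in sqlite3 module as a throwaway harness; the six sample rows and the customer_id column are assumptions made for illustration and are not the book's 50,000-row customers table.

```python
import sqlite3

# Hypothetical miniature stand-in for the customers table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), ("CA",), ("NY",), ("TX",), ("TX",), (None,)],
)

# One pass over the table replaces one COUNT(*) query per state.
state_counts = conn.execute(
    """
    SELECT state, COUNT(*) AS num_customers
    FROM customers
    GROUP BY state
    ORDER BY state
    """
).fetchall()

for state, num_customers in state_counts:
    print(state, num_customers)
```

    Rows whose state is NULL form a group of their own, which matches the NULL entry returned by the SELECT DISTINCT query.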

    The GROUP BY Clause

    GROUP BY is a clause that divides the rows of a dataset into multiple groups based on some sort of key that is specified in the clause. An aggregate function is then applied to all the rows within a single group to produce a single number for that group. The GROUP BY key and the aggregate value for the group are then displayed in the SQL output. The following diagram illustrates this general process:
    Figure 4.11: General GROUP BY computational model
    In the preceding diagram, you can see that the dataset has multiple groups (Group 1, Group 2, …, Group N). Here, the aggregate function is applied to all the rows in Group 1 and generates the result Aggregate 1. Then, the aggregate function is applied to all the rows in Group 2 and generates the result Aggregate 2
  • Data Wrangling with SQL
    A hands-on guide to manipulating, wrangling, and engineering data using SQL

    • Raghav Kandarpa, Shivangi Saxena (Authors)
    • 2023 (Publication Date)
    • Packt Publishing (Publisher)
    each category.
    The result set displays the total revenue, maximum revenue, and minimum revenue for each category. In this case, the X category has a total revenue of 250, with a maximum revenue of 150 and a minimum revenue of 100. The Y category has a total revenue of 750, with a maximum revenue of 300 and a minimum revenue of 200.
    In summary, the GROUP BY clause in SQL allows us to group rows in a result set based on one or more columns and then use aggregate functions to perform calculations on the grouped data. This is a powerful tool for analyzing and summarizing large sets of data.

    Case scenario

    An interesting use case scenario for the GROUP BY clause in SQL could be analyzing survey data for a market research company. The company may have a table of survey responses, with columns for the respondent’s age, gender, income, and overall satisfaction rating.
    To understand the demographics of the respondents and their satisfaction levels, the market research company might use GROUP BY to group the responses by age and gender and then use aggregate functions such as COUNT() and AVG() to calculate the total number of respondents for each group and their average satisfaction rating.
    respondent_id  age  gender  income  satisfaction_rating
    1              25   Male    50000   4
    2              30   Female  55000   5
    3              35   Male    60000   3
    4              40   Female  65000   4
    5              45   Male    70000   4
    6              50   Female  75000   5
    Figure 8.6 – survey_responses table
    For example, the following SQL query would group responses by age and gender and calculate the total number of respondents and average satisfaction rating for each group:
        SELECT age,
               gender,
               COUNT(*) AS num_respondents,
               AVG(satisfaction_rating) AS avg_satisfaction
        FROM survey_responses
        GROUP BY age, gender
        ORDER BY avg_satisfaction DESC;
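    The query above can be tried against the six sample rows with a short script. This is a sketch that assumes SQLite (through Python's sqlite3 module) and guessed column types; the second query, which groups by gender alone, is an added illustration and is not part of the excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE survey_responses (
        respondent_id INTEGER PRIMARY KEY,
        age INTEGER,
        gender TEXT,
        income INTEGER,
        satisfaction_rating INTEGER
    )
    """
)
# The six rows shown in the survey_responses table.
conn.executemany(
    "INSERT INTO survey_responses VALUES (?, ?, ?, ?, ?)",
    [
        (1, 25, "Male", 50000, 4),
        (2, 30, "Female", 55000, 5),
        (3, 35, "Male", 60000, 3),
        (4, 40, "Female", 65000, 4),
        (5, 45, "Male", 70000, 4),
        (6, 50, "Female", 75000, 5),
    ],
)

# The excerpt's query: one group per (age, gender) combination.
survey_groups = conn.execute(
    """
    SELECT age, gender,
           COUNT(*) AS num_respondents,
           AVG(satisfaction_rating) AS avg_satisfaction
    FROM survey_responses
    GROUP BY age, gender
    ORDER BY avg_satisfaction DESC
    """
).fetchall()

# Added illustration: a coarser grouping key yields larger groups.
by_gender = conn.execute(
    """
    SELECT gender, COUNT(*), AVG(satisfaction_rating)
    FROM survey_responses
    GROUP BY gender
    ORDER BY gender
    """
).fetchall()

print(survey_groups)
print(by_gender)
```

    Because every (age, gender) pair in the sample is unique, each group in the first result holds exactly one respondent; grouping by gender alone produces two groups of three.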
  • OCA: Oracle Database 11g Administrator Certified Associate Study Guide
    • Biju Thomas (Author)
    • 2011 (Publication Date)
    • Sybex (Publisher)
    job_id. The SQL shows the number of employees in each job within each department:
        SELECT department_id, job_id, COUNT(*)
        FROM employees
        GROUP BY department_id, job_id
        ORDER BY 1, 2;

        DEPARTMENT_ID JOB_ID       COUNT(*)
        ------------- ---------- ----------
                   10 AD_ASST             1
                   20 MK_MAN              1
                   20 MK_REP              1
                   30 PU_CLERK            5
                   30 PU_MAN              1
                   40 HR_REP              1
                   50 SH_CLERK           20
                   50 ST_CLERK           20
                   50 ST_MAN              5
                   60 IT_PROG             5
                   70 PR_REP              1
                   80 SA_MAN              5
                   80 SA_REP             29
                   90 AD_PRES             1
                   90 AD_VP               2
                  100 FI_ACCOUNT          5
                  100 FI_MGR              1
                  110 AC_ACCOUNT          1
                  110 AC_MGR              1
                      SA_REP              1
    The GROUP BY clause groups data, but Oracle does not guarantee the order of the result set by the grouping order. To order the data in any specific order, you must use the ORDER BY clause.
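    A small sketch of pairing GROUP BY with an explicit ORDER BY follows. It runs on SQLite through Python's sqlite3 module and not on Oracle, and the six employee rows are hypothetical. One visible difference: SQLite sorts NULL first in ascending order, whereas the Oracle output above lists the row with no department last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, job_id TEXT)")
# Hypothetical rows, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        (20, "MK_REP"),
        (10, "AD_ASST"),
        (20, "MK_MAN"),
        (None, "SA_REP"),
        (30, "PU_CLERK"),
        (30, "PU_CLERK"),
    ],
)

# GROUP BY forms the groups; ORDER BY is what fixes the row order of the result.
job_counts = conn.execute(
    """
    SELECT department_id, job_id, COUNT(*)
    FROM employees
    GROUP BY department_id, job_id
    ORDER BY department_id, job_id
    """
).fetchall()

for row in job_counts:
    print(row)
```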
    Group-Function Overview
    Tables 3-1 and 3-2
  • Introductory Relational Database Design for Business, with Microsoft Access
    • Jonathan Eckstein, Bonnie R. Schultz (Authors)
    • 2017 (Publication Date)
    • Wiley (Publisher)
    CustomerID attribute is already present in ORDERS. Dispensing with the CUSTOMER table, we obtain exactly the same output through the simpler query:
        SELECT CustomerID, Sum(UnitPrice*Quantity) AS Revenue
        FROM ORDERS, ORDERDETAIL, PRODUCT
        WHERE ORDERS.OrderID = ORDERDETAIL.OrderID
          AND ORDERDETAIL.ProductID = PRODUCT.ProductID
        GROUP BY CustomerID;
    But if we wish to display customer name information, we must use the CUSTOMER table. Returning to the version of the query that displays customer names, an alternative to using a seemingly redundant GROUP BY clause is to apply some unnecessary aggregation operation in the SELECT clause. For example, we could write:
        SELECT Min(FirstName), Min(LastName), Sum(UnitPrice*Quantity) AS Revenue
        FROM CUSTOMER, ORDERS, ORDERDETAIL, PRODUCT
        WHERE CUSTOMER.CustomerID = ORDERS.CustomerID
          AND ORDERS.OrderID = ORDERDETAIL.OrderID
          AND ORDERDETAIL.ProductID = PRODUCT.ProductID
        GROUP BY CUSTOMER.CustomerID;
    The Min operation, when applied to text data, selects the alphabetically first value within each group. But since FirstName and LastName take the same value throughout each group, the output for each group contains the only possible applicable first name and last name. This query is probably more confusing to a human than the original one, however. If we were to take this approach, we would also need to use more AS modifiers to make the column names in the output more readable.
    We do not give an example here, but it is possible to GROUP BY not only the values of simple attributes but also the values of general expressions (computed fields).
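    As a hypothetical illustration of grouping by a computed field, the sketch below totals revenue per order year. It uses SQLite through Python's sqlite3 module and not Microsoft Access; the OrderDate and Amount columns, the sample rows, and the use of substr to extract the year are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified ORDERS table; OrderDate and Amount are assumed columns.
conn.execute("CREATE TABLE ORDERS (OrderID INTEGER PRIMARY KEY, OrderDate TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO ORDERS (OrderDate, Amount) VALUES (?, ?)",
    [("2016-03-01", 10.0), ("2016-07-15", 20.0), ("2017-01-09", 5.0)],
)

# The grouping key is an expression (the first four characters of the date),
# not a stored attribute.
revenue_by_year = conn.execute(
    """
    SELECT substr(OrderDate, 1, 4) AS OrderYear, SUM(Amount) AS Revenue
    FROM ORDERS
    GROUP BY substr(OrderDate, 1, 4)
    ORDER BY OrderYear
    """
).fetchall()

print(revenue_by_year)
```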
    When you use GROUP BY, you should have at least one aggregation function such as Sum or Avg
  • Pocket Primer
    GROUP BY in a SQL statement to display the number of purchase orders that were created on a daily basis:
        SELECT purchase_date, COUNT(*)
        FROM purchase_orders
        GROUP BY purchase_date;

    Select Statements with a HAVING Clause

    The following SQL statement illustrates how to combine GROUP BY with a HAVING clause to display the number of purchase orders created each day, restricted to days on which at least 4 purchase orders were created:
        SELECT purchase_date, COUNT(*)
        FROM purchase_orders
        GROUP BY purchase_date
        HAVING COUNT(purchase_date) > 3;
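    A runnable sketch of the HAVING filter follows, assuming SQLite through Python's sqlite3 module. The po_id column, the dates, and the row counts are invented for the example, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_orders (po_id INTEGER PRIMARY KEY, purchase_date TEXT)")
# Hypothetical data: 4 orders on Jan 1, 2 on Jan 2, 5 on Jan 3.
sample_dates = ["2023-01-01"] * 4 + ["2023-01-02"] * 2 + ["2023-01-03"] * 5
conn.executemany(
    "INSERT INTO purchase_orders (purchase_date) VALUES (?)",
    [(d,) for d in sample_dates],
)

# HAVING is part of the same statement as GROUP BY and filters whole groups.
busy_days = conn.execute(
    """
    SELECT purchase_date, COUNT(*)
    FROM purchase_orders
    GROUP BY purchase_date
    HAVING COUNT(purchase_date) > 3
    ORDER BY purchase_date
    """
).fetchall()

print(busy_days)
```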

    WORKING WITH INDEXES IN SQL

    SQL enables you to define one or more indexes for a table, which can greatly reduce the amount of time that is required to select a single row or a subset of rows from a table.
    A SQL index on a table consists of one or more attributes in a table. Updates to a table that has one or more indexes require more time than updates to a table without indexes, because both the table and the index (or indexes) must be updated. Therefore, it’s better to create indexes only on columns that are frequently searched.
    Here are two examples of creating indexes on the customers table:
        CREATE INDEX idx_cust_lname ON customers (lname);
        CREATE INDEX idx_cust_lname_fname ON customers (lname, fname);
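    The two CREATE INDEX statements can be tried out as follows. This is a sketch that assumes SQLite through Python's sqlite3 module; the cust_id column and the column types are assumptions, and the query-plan text printed at the end is SQLite-specific and differs between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customers table; only lname and fname come from the statements above.
conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, lname TEXT, fname TEXT)")
conn.execute("CREATE INDEX idx_cust_lname ON customers (lname)")
conn.execute("CREATE INDEX idx_cust_lname_fname ON customers (lname, fname)")

# List the indexes SQLite recorded for the table.
index_names = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'customers'"
    )
)
print(index_names)

# Ask the planner how it would run a lookup by last name; the last column
# of each row is a human-readable description of the chosen strategy.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE lname = ?", ("Smith",)
):
    print(row[-1])
```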

    WHAT ARE KEYS IN AN RDBMS?

    A key
  • Hands-On Data Science with SQL Server 2017
    Perform end-to-end data analysis to gain efficient data insight

    • Marek Chmel, Vladimír Mužný (Authors)
    • 2018 (Publication Date)
    • Packt Publishing (Publisher)
    we need to the GROUP BY clause separated by commas. Let's execute the following query:
        SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
        FROM #src
        GROUP BY CategoryName, SubcategoryName
    The result of the preceding query shows groups formed of category names, and then of subcategory names. Using the GROUP BY clause is relatively straightforward, but what if we want to add subtotals into our result? For this purpose, we have an additional syntax helper, called grouping sets.
    Grouping sets are useful when we want to calculate not only raw groups, but also higher groups. In the previous example, we combined CategoryName and SubcategoryName. If you look carefully at this, you can see that the numbers in the RecordCount column are the same whether we have a CategoryName column added into our query or not. This is because subcategories are fully nested into categories without overlaps. The following sample query will not only calculate record counts for every group built from a combination of categories and subcategories, but it will also compute the record count for categories only, and for the whole dataset:
        SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
        FROM #src
        GROUP BY GROUPING SETS
        (
            (CategoryName, SubcategoryName),
            (CategoryName),
            ()
        )
    The GROUP BY clause in the preceding query has changed. The GROUPING SETS keyword introduces parentheses containing sets of grouping criteria, each also enclosed in parentheses. Remember that both levels of parentheses are mandatory. In our example we have three grouping sets:
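    The three grouping sets written in the query, (CategoryName, SubcategoryName), (CategoryName), and (), can be understood as three separate GROUP BY queries whose results are stacked. The sketch below emulates that with UNION ALL on SQLite through Python's sqlite3 module, because SQLite does not support GROUPING SETS; it is not the SQL Server syntax shown above. The table name src and the four sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the #src temporary table.
conn.execute("CREATE TABLE src (CategoryName TEXT, SubcategoryName TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [("Bikes", "Road"), ("Bikes", "Road"), ("Bikes", "Mountain"), ("Clothing", "Caps")],
)

# One SELECT per grouping set; NULL marks a column that was rolled up.
grouping_set_rows = conn.execute(
    """
    SELECT CategoryName, SubcategoryName, COUNT(*) AS RecordCount
    FROM src
    GROUP BY CategoryName, SubcategoryName
    UNION ALL
    SELECT CategoryName, NULL, COUNT(*)
    FROM src
    GROUP BY CategoryName
    UNION ALL
    SELECT NULL, NULL, COUNT(*)
    FROM src
    """
).fetchall()

for row in grouping_set_rows:
    print(row)
```

    The subtotal rows carry NULL in SubcategoryName, and the grand-total row carries NULL in both columns.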
  • Mastering PostgreSQL 9.6
    might be right by doing guesswork.
    Other database engines can accept aggregate functions without an OVER or even a GROUP BY clause. However, from a logical point of view this is wrong and, on top of that, a violation of SQL.

    Partitioning data

    So far the same result can also easily be achieved using a subselect. However, if you want more than just the overall average, subselects will turn your queries into nightmares. Suppose you don't just want the overall average but the average of the country you are dealing with. A PARTITION BY clause is what you need:
        test=# SELECT country, year, production, consumption,
                      avg(production) OVER (PARTITION BY country)
               FROM t_oil;
         country | year | production | consumption |    avg
        ---------+------+------------+-------------+-----------
         Canada  | 1965 |        920 |        1108 | 2123.2173
         Canada  | 2010 |       3332 |        2316 | 2123.2173
         Canada  | 2009 |       3202 |        2190 | 2123.2173
         ...
         Iran    | 1966 |       2132 |         148 | 3631.6956
         Iran    | 2010 |       4352 |        1874 | 3631.6956
         Iran    | 2009 |       4249 |        2012 | 3631.6956
         ...
    The point here is that each row is assigned the average of its country. The OVER clause defines the window we are looking at. In this case the window is the country the row belongs to. In other words, the query returns each row alongside a value computed over all rows of the same country.
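    To make the contrast with GROUP BY concrete, the sketch below runs both forms on the six rows visible in the output above. It assumes SQLite 3.25 or later (through Python's sqlite3 module) in place of PostgreSQL, and because the real t_oil table holds many more years, the averages computed here differ from the ones printed in the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_oil (country TEXT, year INTEGER, production INTEGER, consumption INTEGER)"
)
# Only the six rows visible in the excerpt's output.
conn.executemany(
    "INSERT INTO t_oil VALUES (?, ?, ?, ?)",
    [
        ("Canada", 1965, 920, 1108),
        ("Canada", 2010, 3332, 2316),
        ("Canada", 2009, 3202, 2190),
        ("Iran", 1966, 2132, 148),
        ("Iran", 2010, 4352, 1874),
        ("Iran", 2009, 4249, 2012),
    ],
)

# Window function: every input row is kept and annotated with its country's average.
windowed = conn.execute(
    """
    SELECT country, year, production,
           avg(production) OVER (PARTITION BY country) AS country_avg
    FROM t_oil
    ORDER BY country, year
    """
).fetchall()

# GROUP BY: the rows of each country collapse into a single row.
grouped = conn.execute(
    "SELECT country, avg(production) FROM t_oil GROUP BY country ORDER BY country"
).fetchall()

print(len(windowed), "rows from the window query")
print(len(grouped), "rows from the GROUP BY query")
```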
    Note that the year column is not sorted. The query does not contain an explicit sort order so it might happen that data is returned in random order. Remember, SQL does not promise sorted output unless you explicitly state what you want.
    Basically, a PARTITION BY clause takes any expression. Usually most people will use a column to partition the data. Here is an example:
  • Mastering PostgreSQL 13
    Build, administer, and maintain database applications efficiently with PostgreSQL 13, 4th Edition

    Handling Advanced SQL
    In Chapter 3, Making Use of Indexes, you learned about indexing, as well as about PostgreSQL's ability to run custom indexing code to speed up queries. In this chapter, you will learn about advanced SQL. Most of the people who read this book will have some experience of using SQL. However, experience has shown that the advanced features outlined in this book are not widely known, and therefore it makes sense to cover them in this context to help people to achieve their goals faster and more efficiently. There has been a long discussion about whether the database is just a simple data store or whether the business logic should be in the database. Maybe this chapter will shed some light and show how capable a modern relational database really is. SQL is not what it used to be back when SQL-92 was around. Over the years, the language has grown and become more and more powerful.
    This chapter is about modern SQL and its features. A variety of different and sophisticated SQL features are covered and presented in detail. We will cover the following topics in this chapter:
    • Introducing grouping sets
    • Using ordered sets
    • Understanding hypothetical aggregates
    • Utilizing windowing functions and analytics
    • Writing your own aggregates
    By the end of this chapter, you will understand and be able to use advanced SQL.

    Introducing grouping sets

    Every advanced user of SQL should be familiar with the GROUP BY and HAVING clauses. But are they also aware of CUBE, ROLLUP, and
  • OCA: Oracle Database 12c Administrator Certified Associate Study Guide
    • Biju Thomas (Author)
    • 2014 (Publication Date)
    • Sybex (Publisher)
    As you can see in the result, the space used by each schema in each tablespace is shown as well as the total space used in each tablespace and the total space used by each schema. The total space used in the database (including all tablespaces) is also shown in the very first line.
    Three functions (GROUPING, GROUP_ID, and GROUPING_ID) can come in very handy when you’re using the ROLLUP and CUBE modifiers of the GROUP BY clause.
    In the examples you have seen using the ROLLUP and CUBE modifiers, there was no way of telling which row was a subtotal and which row was a grand total. You can use the GROUPING function to overcome this problem. Review the following SQL example:
        SELECT gender, marital_status, count(*) num_rec,
               GROUPING (gender) g_grp, GROUPING (marital_status) ms_grp
        FROM oe.customers
        GROUP BY CUBE(marital_status, gender);

        G MARITAL_STATUS          NUM_REC      G_GRP     MS_GRP
        - -------------------- ---------- ---------- ----------
                                      319          1          1
        F                             110          0          1
        M                             209          0          1
          single                      139          1          0
        F single                       47          0          0
        M single                       92          0          0
          married                     180          1          0
        F married                      63          0          0
        M married                     117          0          0
    The G_GRP column has a 1 for NULL values generated by the CUBE or ROLLUP modifier for GENDER column. Similarly, the MS_GRP column has a 1 when NULL values are generated in the MARITAL_STATUS column. By using a DECODE function on the result of the GROUPING
  • Mastering PostgreSQL 10
    This actually makes sense because the average has to be defined precisely. The database engine cannot just guess at any value. Other database engines can accept aggregate functions without an OVER or even a GROUP BY clause. However, from a logical point of view this is wrong, and on top of that, a violation of SQL.

    Partitioning data

    So far, the same result can also easily be achieved using a subselect. However, if you want more than just the overall average, subselects will turn your queries into nightmares. Suppose you don't just want the overall average but the average of the country you are dealing with. A PARTITION BY clause is what you need:
        test=# SELECT country, year, production, consumption,
                      avg(production) OVER (PARTITION BY country)
               FROM t_oil;
         country | year | production | consumption |    avg
        ---------+------+------------+-------------+-----------
         Canada  | 1965 |        920 |        1108 | 2123.2173
         Canada  | 2010 |       3332 |        2316 | 2123.2173
         Canada  | 2009 |       3202 |        2190 | 2123.2173
         ...
         Iran    | 1966 |       2132 |         148 | 3631.6956
         Iran    | 2010 |       4352 |        1874 | 3631.6956
         Iran    | 2009 |       4249 |        2012 | 3631.6956
         ...
    The point here is that each row is assigned the average of its country. The OVER clause defines the window we are looking at. In this case, the window is the country the row belongs to. In other words, the query returns each row alongside a value computed over all rows of the same country.
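    For comparison, the per-country average can also be written with a correlated subselect, the style the passage warns becomes unwieldy as requirements grow. The sketch below checks that both forms agree. It assumes SQLite 3.25 or later through Python's sqlite3 module, a reduced t_oil table with made-up aliases (outer_t, inner_t), and only the six rows visible in the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_oil (country TEXT, year INTEGER, production INTEGER)")
conn.executemany(
    "INSERT INTO t_oil VALUES (?, ?, ?)",
    [
        ("Canada", 1965, 920),
        ("Canada", 2010, 3332),
        ("Canada", 2009, 3202),
        ("Iran", 1966, 2132),
        ("Iran", 2010, 4352),
        ("Iran", 2009, 4249),
    ],
)

# Window-function form: the partition is declared once in the OVER clause.
window_rows = conn.execute(
    """
    SELECT country, year,
           avg(production) OVER (PARTITION BY country) AS country_avg
    FROM t_oil
    ORDER BY country, year
    """
).fetchall()

# Correlated-subselect form: the inner query is re-evaluated per outer row.
subselect_rows = conn.execute(
    """
    SELECT country, year,
           (SELECT avg(production)
            FROM t_oil AS inner_t
            WHERE inner_t.country = outer_t.country) AS country_avg
    FROM t_oil AS outer_t
    ORDER BY country, year
    """
).fetchall()


def rounded(rows):
    """Round the average so tiny floating-point differences do not matter."""
    return [(country, year, round(avg, 6)) for country, year, avg in rows]


print(rounded(window_rows) == rounded(subselect_rows))
```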
    The year column is not sorted. The query does not contain an explicit sort order so it might be that data is returned in a random order. Remember, SQL does not promise sorted output unless you explicitly state what you want.
    Basically, a PARTITION BY clause takes any expression. Usually, most people will use a column to partition the data. Here is an example:
  • Beginning Microsoft SQL Server 2012 Programming
    • Paul Atkinson, Robert Vieira (Authors)
    • 2012 (Publication Date)
    • Wrox (Publisher)
    GROUP BY clause, these aggregators perform exactly as they did before, except that they return a count for each grouping rather than the full table. You can use this to get your number of reports:
        SELECT ManagerID, COUNT(*)
        FROM HumanResources.Employee2
        GROUP BY ManagerID;
    Code snippet Chap03.sql
    Notice that you are grouping only by the ManagerID — the COUNT() function is an aggregator and, therefore, does not have to be included in the GROUP BY clause.
        ManagerID
        ----------- -----------
        NULL        1
        1           3
        4           3
        5           4

        (4 row(s) affected)
    Your results tell you that the manager with ManagerID 1 has three people reporting to him or her, that three people report to the manager with ManagerID 4, and that four people report to ManagerID 5. You are also able to tell that one Employee record had a NULL value in the ManagerID field. This employee apparently doesn’t report to anyone (hmmm, president of the company I suspect?).
    It’s probably worth noting that you, technically speaking, could use a GROUP BY clause without any kind of aggregator, but this wouldn’t make sense. Why not? Well, SQL Server is going to wind up doing work on all the rows in order to group them, but functionally speaking you would get the same result with a DISTINCT option (which you’ll look at shortly), and it would operate much faster.
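    Both observations can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module instead of SQL Server, a simplified Employee2 table with a hypothetical EmployeeID column, and eleven made-up rows chosen to reproduce the counts shown above. It verifies only that GROUP BY without an aggregate returns the same rows as DISTINCT; it does not measure the speed difference the text mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee2 (EmployeeID INTEGER PRIMARY KEY, ManagerID INTEGER)")
# One employee without a manager, then 3, 3 and 4 reports for managers 1, 4 and 5.
manager_ids = [None] + [1] * 3 + [4] * 3 + [5] * 4
conn.executemany(
    "INSERT INTO Employee2 (ManagerID) VALUES (?)", [(m,) for m in manager_ids]
)

# Number of reports per manager; ORDER BY added so the output order is fixed.
reports = conn.execute(
    "SELECT ManagerID, COUNT(*) FROM Employee2 GROUP BY ManagerID ORDER BY ManagerID"
).fetchall()

# GROUP BY with no aggregate versus DISTINCT: same set of rows.
grouped_only = conn.execute("SELECT ManagerID FROM Employee2 GROUP BY ManagerID").fetchall()
distinct_only = conn.execute("SELECT DISTINCT ManagerID FROM Employee2").fetchall()

print(reports)
print(set(grouped_only) == set(distinct_only))
```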
    Now that you’ve seen how to operate with groups, let’s move on to one of the concepts that a lot of people have problems with. Of course, after reading the next section, you’ll think it’s a snap.

    Placing Conditions on Groups with the HAVING Clause

    Up to now, all of the conditions have been against specific rows. If a given column in a row doesn’t have a specific value or isn’t within a range of values, the entire row is left out. All of this happens before the groupings are really even thought about.
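    The ordering described here, row conditions first and grouping afterwards, can be seen in a small sketch. It assumes SQLite through Python's sqlite3 module and a hypothetical orders table with made-up customer and amount columns; none of these names come from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 5), ("a", 50), ("a", 60), ("b", 70), ("b", 3), ("c", 80)],
)

# WHERE discards small orders before any group exists;
# HAVING then keeps only groups that still contain at least two rows.
where_then_having = conn.execute(
    """
    SELECT customer, COUNT(*)
    FROM orders
    WHERE amount > 10
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY customer
    """
).fetchall()

# Without the WHERE clause, the same HAVING condition sees the unfiltered groups.
having_only = conn.execute(
    """
    SELECT customer, COUNT(*)
    FROM orders
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY customer
    """
).fetchall()

print(where_then_having)
print(having_only)
```

    Customer b drops out of the first result because one of its two rows was removed by WHERE before the groups were counted.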
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.