Some Useful aggregate and mathematical functions in MYSQL for data science

Data Science is in too much Hype these days. Every organization is now a day looking for data scientists. The Data science role includes the collection, retrieval, analyzing, and representation of data in pictorial format, including statistical information to the users.

The need for data scientists has been rise due to an increase in the use of the internet. Every layman use data these days. Which causes an increase in data in zillions and trillions.

So, here are some most useful and important MySQL Aggregate functions.

  • Aggregate functions
    • Count()
    • Sum()
    • Avg()
    • Min()
    • Max()
    • Group_concat()
  • Mathematical functions
    • Absolute
    • Ceil and Floor
    • Truncate or Round
    • Modulo
    • Rand
    • Sqrt
    • Power

MYSQL

Introducing the Dataset or What is Dataset?

We will be using the illustration of buying Toys online from the site and it is filled with 10 rows. Below MySQL table records the toys purchased by the client with their respective details also.

We will be taking the above example to show the use of different functions in our article.

Aggregate functions

COUNT()

This is the most important function in MySql. It returns the total number of records/rows in the table. We can use the count() function to get the total number of records/rows in our table. We can achieve it by using the below query.

The total rows in our table are 10. The above query shows the same.

Now, Suppose if we have to count the total number of cities in the dataset. We just have to replace the * with the column name in the count function and we will get the total number of cities in the dataset.

The total number of cities in our table is 10. The above query shows the desired output.

You must have seen, there is one problem in the above query. i.e The city’s names are duplicate and that is counted as a separate row in the count() function.

To get the unique city count, we will be using the DISTINCT() clause with count() function. Below is the way to achieve the desired results.

The above query shows the count of distinct cities in the table.

SUM()

When we declare any column as Numeric, Then it comes to mind we can perform calculations on it. In our table column, order_amount is numeric and this can be used in analyzing sales amount.

SUM() function is used to calculate the total of a column related to sales. We might need a total amount of sales per city, month, or year. So, we can use this function to analyze the sales amount.

Below is the way to calculate the ‘Total Sales‘ from the column ‘order_amount‘ of the table.

What if we have to calculate the ‘Total Sales’ by Client city?

We can add the GROUP BY clause to achieve the same. See the Below way to get the desired results.

Similarly, if we have to calculate the ‘Total Sales’ by Toy Name, to analyse which toy is more popular among customers ?

We can get the results by using the GROUP BY clause just by changing the column name. See the Below way to get the desired results. We can see by ‘Total sales’ column that the Balls has been ordered more.

Similarly, if we have to calculate the quantity of Toys sold, to analyse which toy has to be produced more ?

We can get the results by using the GROUP BY clause just by changing the column name in above example.

We have used ORDER BY clause to sort the quantity in descending format. By this we can easily see product sold with their quantity. See the below way to get the desired results.

AVG()

AVG() is the mean of MySql.

Average is also one most important function to be used in analysis of data. The mean is the average of the numbers. It is easy to calculate: add up all the numbers, then divide by how many numbers there are. In other words it is the sum divided by the count. Same thing can be done with the help of AVG() function in mysql.

Below is the way to find the avg() of order amount in our table.

What if, we have to get the average order amount value by city?

Below is the way to find it in our table.

We have got the avg() order amount by cities in the above example.

MAX() and MIN()

Aggregate value analysis isn’t complete without computing the minimum and maximum values. These can be simply computed using the MIN() and MAX() functions.

GROUP_CONCAT()

Suppose, We have to get the DISTINCT city count with their names seperated by comma in a single row. Then we will be adding the GROUP_CONCAT() to get the cities name and count() function to know their count. Here is the way to use it.

We have got the distinct city count with their names in a single row.  To get the unique city count, we will be using the DISTINCT() clause with count() function. Above is the way to achieve the desired results.

Mathematical functions

Most of the time you would have to deal with numbers in the SQL table for data analysis. To deal with these numbers, you need mathematical functions. These might have a trivial definition but when it comes to the analysis, they are the most prolifically used functions.

Absolute

abs() is the most useful and common mathematical function. It calculates the absolute value of a numeric value that you pass as an argument.

Now, let’s see the way how it is useful. To understand the use of this functions, Let’s first find out the deviation of the amount for every record from the average amount from our table.

Now, after calculations, we can see some negative values in the output. These can be easily converted to positives using the abs() function as shown below:

Ceil and Floor

If there are numeric values in your dataset, then some times need comes to truncate the decimal part from the number. This can be implemented using both the Ceil() and Floor() functions.

We can simply convert them to either the next higher integer using Ceil().

or the previous lower integer using Floor().

In our table, the order_amount column has lots of decimal values. We can convert them to integers by applying the same.

Truncate or Round

In some cases, the need arose to truncate the numeric values up to 1 or 2 or more than that decimal places. The truncate() function helps us to achieve this by making this process more simple. All you have to do is pass the decimal number as the first argument and the number of places you want to truncate it to as the second argument.

In the above example, We have truncated the values up to 1 and 2 decimal places.

Similarly, we can use the round() function, which helps in rounding off the values. In the below example we are rounding off the values up to 2 decimal places.

Modulo

The modulo function is a powerful and important function to use in the analysis. Modulo returns the remainder left when the second number divides the first number. It is used by calling the function mod(x,y) where the result is the remainder left when x is divided by y.

We can find the odd or even records from the MySql table using it. For example, I can use the modulo function to find those records which had an odd number of quantities.

Or I could find even quantities if I negate the above result by using the not keyword.

RAND()

Sometimes, we need to sort the records randomly or we have to select the records randomly. This can be achieved by the rand() function. It helps in generating a random floating-point number between 0 (inclusive) and 1 (exclusive). This function generates different values every time.

The above screen show how the rand() function is generating different values every time.

Rand() can be used to sort the rows randomly. See the below illustrations:

Randomly selecting any record, this is the best function to use for this purpose. For example, it can be used to select the winners of any contest randomly. Let’s have a look at the below screen. Every time a new row is selected as the winner.

SQRT

This function returns the square root of a positive number only. It will help in statistical analyzing of data in presenting data in pictorial format.

Power

The Power function helps in getting the square root of the value passed as an argument. We can easily find the value of square root, cube root, or nth root of the arguments. The below screen will show the use.

In our dataset/table we can easily get the power value of order_amount. As you can see on the below screen. I have used the round function also to round off the values up to 2 decimal places.

Endnotes

To summarize, I have covered some of the most common used Aggregate functions to be used quite a lot in day-to-day data analysis tasks. I will be adding soon some more articles for the reference.

Hope this article helps you bring out more from your dataset. And if you have any favorite MySql function that you find useful or use quite often, do comment below and share your experience!


See all MySQL String functions MySQL 8 String Functions.


See all PHP string functions PHP String Functions.


You may also like...