In Python, we can use the pandas.DataFrame.describe() method to get a statistical summary of the data in a DataFrame. In this article, we will use pandas to read a CSV file and then print the statistical summary of the dataset using pandas.DataFrame.describe().
We can use the following Python code for that purpose:
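import pandas
# Load the dataset from a CSV file in the current working directory.
data = pandas.read_csv("iris.csv")
# Summarize every column, numeric and categorical alike.
print(data.describe(include="all"))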
Here, we first import the pandas module. Then, we read the CSV file named “iris.csv” into a DataFrame. The CSV file has five columns: the sepal length, sepal width, petal length, petal width, and species of flowers. The first four columns contain floating-point values and the last column contains strings.
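We want to print the statistical summary of all the columns, including the categorical column. So, we pass the argument include="all" to the pandas.DataFrame.describe() method. Without this argument, describe() summarizes only the numeric columns of a DataFrame that has both numeric and non-numeric columns.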
The output of the above program will be:
        sepal_length  sepal_width  petal_length  petal_width species
count     150.000000   150.000000    150.000000   150.000000     150
unique           NaN          NaN           NaN          NaN       3
top              NaN          NaN           NaN          NaN  setosa
freq             NaN          NaN           NaN          NaN      50
mean        5.843333     3.057333      3.758000     1.199333     NaN
std         0.828066     0.435866      1.765298     0.762238     NaN
min         4.300000     2.000000      1.000000     0.100000     NaN
25%         5.100000     2.800000      1.600000     0.300000     NaN
50%         5.800000     3.000000      4.350000     1.300000     NaN
75%         6.400000     3.300000      5.100000     1.800000     NaN
max         7.900000     4.400000      6.900000     2.500000     NaN
Here, count specifies the number of non-null values in each column, which equals the number of rows when no values are missing. unique specifies the number of unique values in a categorical column. It contains NaN …
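To make the roles of count, unique, top, and freq easier to see, here is a minimal sketch on a tiny DataFrame built in memory. The data is a hypothetical stand-in for iris.csv (it is not taken from the real file), and one sepal_length value is deliberately left missing so that count differs from the number of rows.

```python
import pandas as pd

# Hypothetical toy data standing in for iris.csv; one sepal_length is missing.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, None, 6.3],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

# Default call: only the numeric column is summarized.
numeric_summary = df.describe()

# include="all": numeric and categorical columns are summarized together.
full_summary = df.describe(include="all")

# count skips missing values, so it can be smaller than the number of rows.
print("rows:", len(df))
print("count:", numeric_summary.loc["count", "sepal_length"])

# unique, top, and freq are filled in only for the categorical column.
print("unique:", full_summary.loc["unique", "species"])
print("top:", full_summary.loc["top", "species"])
print("freq:", full_summary.loc["freq", "species"])
```

In this sketch the numeric summary has no species column at all, while the full summary has both columns and fills the rows that do not apply to a column with NaN, as in the iris output above.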







































