Pandas is a Python library for data manipulation and analysis. It offers data structures for working with numerical tables and time series. Wes McKinney began developing pandas in 2008, and it became open source in 2009.
The name pandas is derived from the term “panel data,” which refers to data sets that contain observations over multiple time periods for the same individuals. The name is also a play on “Python data analysis.”
Pandas is built on top of the NumPy library and provides two main data structures, Series and DataFrame. A pandas Series is a one-dimensional data structure. It is similar to a one-dimensional NumPy array, but the main difference is that it is labeled: each value has an index label attached to it. A pandas DataFrame is a two-dimensional, labeled data structure that holds tabular data.
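As a rough illustration of the difference between a NumPy array, a Series, and a DataFrame, here is a minimal sketch. The values, index labels, and column names are made up for the example and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# A plain NumPy array is indexed by position only.
arr = np.array([10, 20, 30])

# A Series holds the same values but attaches a label to each one.
# The labels "a", "b", "c" are arbitrary, chosen for illustration.
s = pd.Series(arr, index=["a", "b", "c"])
value_by_label = s["b"]  # look up a value by its label

# A DataFrame is a two-dimensional table; each column is itself a Series.
# The column names and contents here are hypothetical.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 27]})
age_column = df["age"]
n_rows, n_cols = df.shape
```

Looking up `s["b"]` uses the label rather than the position, which is the main thing a Series adds over a one-dimensional NumPy array.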
In machine learning, we often use a DataFrame to hold tabular data. The columns contain the features of the dataset and the output (the value to be predicted), and the rows contain the observations.
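The layout described above might look like the following sketch. The dataset is a tiny, invented one (house sizes and prices), and the names `X` and `y` are only a common convention for features and output, not something pandas requires.

```python
import pandas as pd

# Hypothetical toy dataset: each row is one observation (a house).
data = pd.DataFrame({
    "area_sqm": [50, 80, 120],          # feature
    "rooms": [2, 3, 4],                 # feature
    "price": [150000, 240000, 360000],  # output column
})

# Select the feature columns as a DataFrame and the output as a Series.
X = data[["area_sqm", "rooms"]]
y = data["price"]

# Rows are observations, columns are features.
n_observations, n_features = X.shape
```

Selecting with a list of column names returns a DataFrame, while selecting a single column name returns a Series, which is why `X` stays two-dimensional and `y` is one-dimensional.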
This article gave a brief introduction to the pandas library. In the next few articles, we will explore pandas further and learn how to create Series and DataFrame objects and how to operate on them.





