What is Pandas
-
It is an open-source Python library used for data manipulation and
analysis. It can read data such as CSV files and compute operations like
the mean, minimum, and maximum. It also parses many other file formats and
can convert a data table into a NumPy array.
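A minimal sketch of that workflow, assuming a small CSV held in memory. The column names `a` and `b` are made up for illustration, and `io.StringIO` only stands in for a file on disk (with a real file you would pass its path to `pd.read_csv`):

```python
import io

import pandas as pd

# Hypothetical CSV contents; io.StringIO stands in for a file on disk.
csv_text = "a,b\n1,4\n2,5\n3,6\n"
df = pd.read_csv(io.StringIO(csv_text))

# Column-wise summary statistics.
print(df.mean())  # mean of each column
print(df.min())   # minimum of each column
print(df.max())   # maximum of each column

# Convert the table into a 2-D NumPy array (rows x columns).
arr = df.to_numpy()
print(arr.shape)  # (3, 2)
```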
Data structures in Pandas
-
Pandas provides two main classes for handling data: Series and
DataFrame.
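The two classes are closely related: selecting a single column of a DataFrame gives back a Series. A small sketch, with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4]})
col = df['c1']  # selecting one column returns a Series

print(type(df).__name__)   # DataFrame
print(type(col).__name__)  # Series
print(col.name)            # c1 (the Series keeps the column label as its name)
```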
1. Series
-
A one-dimensional labeled array holding data of any type, such as
integers, strings, or Python objects.
import pandas as pd

s = pd.Series()                      # Empty series
print(s, '\n')                       # Series([], dtype: object)  (float64 before pandas 2.0)
s = pd.Series(5)                     # Single value 5
print(s, '\n')                       # 0    5
                                     # dtype: int64
s = pd.Series([1, 2, 3])             # From the list [1, 2, 3]
print(s, '\n')                       # 0    1
                                     # 1    2
                                     # 2    3
                                     # dtype: int64
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])   # Custom index labels
print(s, '\n')                       # a    1
                                     # b    2
                                     # c    3
                                     # dtype: int64
# Setting the index using a dictionary (the keys become the labels)
s = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(s, '\n')                       # a    1
                                     # b    2
                                     # c    3
                                     # dtype: int64
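A short sketch of what "labeled" and "any type" mean in practice: values can be reached by label or by position, arithmetic lines values up by label rather than by position, and mixed values fall back to the generic `object` dtype. The data here is made up:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Access by label and by integer position.
print(s['b'])     # 2
print(s.iloc[1])  # 2

# Arithmetic aligns on labels; 'b' has no partner in t, so its result is NaN.
t = pd.Series({'c': 10, 'a': 100})
result = s + t
print(result)     # a -> 101.0, b -> NaN, c -> 13.0 (dtype float64)

# Mixed types are stored with the generic object dtype.
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)  # object
```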
2. DataFrames
-
A two-dimensional labeled data structure that holds data like a 2-D
array or a table with rows and columns (e.g., the contents of a CSV file).
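Every DataFrame carries row labels (the index) and column labels, and each column keeps its own dtype. A short sketch for inspecting them, using made-up data:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [3.5, 4.5]}, index=['r1', 'r2'])

print(df.shape)          # (2, 2) -> (rows, columns)
print(list(df.index))    # ['r1', 'r2'] -> row labels
print(list(df.columns))  # ['c1', 'c2'] -> column labels
print(df.dtypes)         # one dtype per column: c1 int64, c2 float64
```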
df = pd.DataFrame()
print(df)    # Empty DataFrame
             # Columns: []
             # Index: []
df = pd.DataFrame([5, 6])   # A flat list becomes a single column labeled 0
print(df)    #    0
             # 0  5
             # 1  6
df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4]},   # Dictionary keys become column labels
                  index=['r1', 'r2'])
print(df)    #     c1  c2
             # r1   1   3
             # r2   2   4
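With labels on both axes, a DataFrame can be indexed by column name, by row label (`loc`), or by row position (`iloc`). A brief sketch reusing the same small table:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4]}, index=['r1', 'r2'])

print(df['c2'].tolist())      # [3, 4] -> one column, as a Series
print(df.loc['r1'].tolist())  # [1, 3] -> one row, selected by label
print(df.iloc[1].tolist())    # [2, 4] -> one row, selected by position
print(df.loc['r2', 'c1'])     # 2      -> a single cell by row and column label
```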
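# Append additional rows
df = pd.DataFrame([[5, 6], [1.2, 3]])
print(df)    #      0  1
             # 0  5.0  6
             # 1  1.2  3
r = pd.Series([0, 0], name='r3')
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
# r.to_frame().T turns the Series into a one-row DataFrame labeled 'r3'.
df = pd.concat([df, r.to_frame().T])
print(df)    #       0  1
             # 0   5.0  6
             # 1   1.2  3
             # r3  0.0  0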
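Each `pd.concat` call copies the data, so when rows arrive one at a time a common pattern is to collect them first and build the DataFrame once, then use a single `concat` for any later batch. A sketch under that assumption; the column names `x` and `y` and the records are hypothetical:

```python
import pandas as pd

# Hypothetical incoming records, one dict per row.
records = [{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'b'}]

# Build the DataFrame once from the collected rows instead of growing it row by row.
df = pd.DataFrame(records)

# Add several more rows in one concat; ignore_index renumbers the row labels 0..n-1.
extra = pd.DataFrame([{'x': 3, 'y': 'c'}, {'x': 4, 'y': 'd'}])
df = pd.concat([df, extra], ignore_index=True)

print(df.shape)          # (4, 2)
print(df['x'].tolist())  # [1, 2, 3, 4]
print(list(df.index))    # [0, 1, 2, 3]
```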
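# Dropping rows and columns
df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4], 'c3': [5, 6]}, index=['r1', 'r2'])
print(df)    #     c1  c2  c3
             # r1   1   3   5
             # r2   2   4   6
df = df.drop(labels='r1')    # Drop a row by its index label (axis=0 is the default)
print(df)    #     c1  c2  c3
             # r2   2   4   6
df = df.drop(columns='c3')   # Drop a column by name
print(df)    #     c1  c2
             # r2   2   4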
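`drop` also accepts several labels at once, an `axis` argument, separate `index=` / `columns=` keywords, and `errors='ignore'` for labels that may not exist. It returns a new DataFrame unless `inplace=True` is passed. A short sketch on the same table:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4], 'c3': [5, 6]}, index=['r1', 'r2'])

# Several labels at once; axis=1 targets columns instead of rows.
print(df.drop(labels=['c1', 'c3'], axis=1).columns.tolist())  # ['c2']

# Keyword form: index= for rows and columns= for columns, both in one call.
print(df.drop(index='r2', columns='c2'))  # keeps row r1 and columns c1, c3

# errors='ignore' skips labels that do not exist instead of raising KeyError.
print(df.drop(columns='missing', errors='ignore').shape)  # (2, 3)

# drop returned new DataFrames above; the original is unchanged.
print(df.shape)  # (2, 3)
```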