Part 2 – Hyperspectral Cube – Getting familiar with data exploration.

Loading the data and checking its shape.

The Pavia University dataset includes the samples as well as the ground truth.

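from scipy.io import loadmat
import matplotlib.pyplot as plt
import pandas as pd

def load_data():
    '''
    X - input: 3D cube (rows, columns, bands)
    y - output: 2D ground-truth map (rows, columns)
    '''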
    X = loadmat('data/PaviaU.mat')['paviaU']
    y = loadmat('data/PaviaU_gt.mat')['paviaU_gt']
    print("X shape: ", X.shape)
    print("y shape: ", y.shape)
    return X, y

X, y = load_data()
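
Before going further, it can help to confirm that the cube and the ground truth line up spatially. The snippet below is only a sketch: the helper name check_cube_and_labels is my own (it does not come from any library), and it runs on a tiny synthetic array rather than the real Pavia files.

```python
import numpy as np

def check_cube_and_labels(X, y):
    """Return (rows, cols, bands) after verifying X is 3D and y matches its spatial size."""
    if X.ndim != 3:
        raise ValueError(f"expected a 3D cube, got {X.ndim} dimensions")
    if y.shape != X.shape[:2]:
        raise ValueError(f"label map {y.shape} does not match cube {X.shape[:2]}")
    return X.shape

# Tiny synthetic stand-in for the real data (4 x 5 pixels, 3 bands)
demo_X = np.zeros((4, 5, 3))
demo_y = np.zeros((4, 5), dtype=int)
rows, cols, bands = check_cube_and_labels(demo_X, demo_y)
print(rows, cols, bands)  # 4 5 3
```

With the real data the same call would be check_cube_and_labels(X, y).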

To analyse HSI data I had to remember that it differs from the usual cat, dog or human image classification problem. HSI is high-dimensional, and it took me a little while to wrap my head around this.

The valuable information in HSI is the spectral content of each pixel, which can be used as a unique “fingerprint” to identify materials and/or quality.

To understand it better, and to expand on the initial visualisation of the spectral bands in Part 1, think of it this way: each pixel is a vector whose length is the number of bands in the HSI.
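
To make the "pixel as a vector" idea concrete, here is a minimal sketch on a made-up cube (2 x 3 pixels, 4 bands). The array and its values are purely illustrative and stand in for the real data.

```python
import numpy as np

rows, cols, bands = 2, 3, 4
# Synthetic cube: every value is unique so positions are easy to follow
cube = np.arange(rows * cols * bands).reshape(rows, cols, bands)

# The spectrum of the pixel at row 1, column 2: one value per band
spectrum = cube[1, 2, :]
print(spectrum.shape)  # (4,)
print(spectrum)        # [20 21 22 23]
```

On the Pavia University cube the equivalent would be X[row, col, :], a vector of 103 values.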

To extract this data we need to reshape the cube so that each pixel becomes one row vector.

This can be done with NumPy's reshape() method by passing it two parameters:

‘-1’ (unknown: lets the method work out this dimension from the size of the array)

and

‘X.shape[2]’ (the size of the axis at index 2, i.e. the number of bands: 103 in this case)
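
A small sketch of what these two parameters do, again on a synthetic cube rather than the real data. It also shows which row of the reshaped array each pixel ends up in, assuming NumPy's default row-major ordering for reshape().

```python
import numpy as np

rows, cols, bands = 2, 3, 4
cube = np.arange(rows * cols * bands).reshape(rows, cols, bands)

# -1 lets NumPy infer the number of rows: rows * cols = 6
pixels = cube.reshape(-1, cube.shape[2])
print(pixels.shape)  # (6, 4)

# Row r * cols + c of the 2D array is the pixel at (r, c) in the cube
r, c = 1, 2
same = np.array_equal(pixels[r * cols + c], cube[r, c])
print(same)  # True
```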

def get_pixels(X, y):
    # extract pixels/reshape:
    # each pixel is a vector of length of the number of bands of HSI
    
    # examine the data and shape of X sample
    print("X0")
    print(X[0])
    print("X0 shape")
    print(X[0].shape, "\n")


    # reshape
    q = X.reshape(-1, X.shape[2])

    # examine the data and shape after reshape
    print("Q0")
    print(q[0])
    print("Q0 shape")
    print(q[0].shape, "\n")
    
    # X shape
    print("X-shape ", X.shape)
    # q shape
    print("q-shape ", q.shape)


    # '''
    # q shape:  (207400, 103)
    # 610x340
    # '''

As we can see, reshape() turned the cube into a 2D array with one row per pixel: 610 x 340 = 207400 rows, each a vector of 103 band values.

This is the part where I still need to do more research to gain a deeper understanding of the spectral dimension, as it’s still a little unclear to me. But since I now have a vector, I could not just leave it without plotting it on a graph 🙂.

We can use matplotlib pyplot to do this.

    # plot the q[0] vector
    plt.plot(q[0])
    plt.show()

    # plot the q[80] vector
    plt.plot(q[80])
    plt.show()
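
If you are wondering which pixel of the image a row such as q[80] actually is, NumPy can convert between the flat row index and the (row, column) position. This is a sketch that assumes the 610 x 340 spatial size reported above and the default row-major reshape.

```python
import numpy as np

image_shape = (610, 340)  # spatial size of the Pavia University scene

# Flat row index in the reshaped array -> (row, column) in the image
row, col = np.unravel_index(80, image_shape)
print(int(row), int(col))  # 0 80

# And back again: (row, column) -> flat row index
flat = np.ravel_multi_index((1, 0), image_shape)
print(int(flat))  # 340
```

So q[0] and q[80] are both pixels from the first row of the image, 80 columns apart.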

Making the data frame using the pandas library.

Now that I have the vectors, it’s time to make a data frame and add a column for the classes, taken from the flattened ground truth.

    # data frame
    # loading our 'q' as 'data' 
    df = pd.DataFrame(data = q)

    # dataframe with classes
    # add the class column: the ground truth flattened in the same order as q
    df = pd.concat([df, pd.DataFrame(data = y.ravel())], axis = 1)

    # column names
    df.columns = [f'band {i}' for i in range(1, X.shape[2] + 1)] + ['class']
    # display the data-frame head (5 rows)
    print(df.head())

    # save to csv file
    df.to_csv('data_set.csv')
    return df
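
Once the class column is in place, a quick sanity check is to count how many pixels each class has. The sketch below uses made-up labels on a tiny cube. It also assumes that label 0 marks unlabelled pixels, which is my understanding of the Pavia University ground truth, but it is worth confirming with np.unique(y) on the real data.

```python
import numpy as np
import pandas as pd

rows, cols, bands = 2, 3, 4
cube = np.arange(rows * cols * bands).reshape(rows, cols, bands)
labels = np.array([[0, 1, 1],
                   [2, 0, 1]])  # made-up labels; 0 is treated as "unlabelled"

df_demo = pd.DataFrame(cube.reshape(-1, bands),
                       columns=[f'band {i}' for i in range(1, bands + 1)])
df_demo['class'] = labels.ravel()

# How many pixels fall into each class
counts = df_demo['class'].value_counts().sort_index()
print(counts)

# Keep only the labelled pixels (assumption: class 0 means "no label")
labelled = df_demo[df_demo['class'] != 0]
print(labelled.shape)  # (4, 5)
```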

The full code:

# getting the pixels and saving to csv file
def get_pixels(X, y):
    # extract pixels/reshape:
    # each pixel is a vector of length of the number of bands of HSI
    print('X0 shape')
    print(X[0].shape, '\n')

    q = X.reshape(-1, X.shape[2])
    print('Q0 shape')
    print(q[0].shape, '\n')

    # X shape
    print('X-shape ', X.shape)
    # q shape
    print('q-shape ', q.shape, '\n')

    # plot vector at index q[0] 
    plt.plot(q[0])
    plt.show()

    # q shape: (207400, 103)
    # 207400 = 610 x 340 pixels

    # data frame
    df = pd.DataFrame(data = q)

    # dataframe with class column
    df = pd.concat([df, pd.DataFrame(data = y.ravel())], axis = 1)

    # column names
    df.columns = [f'band {i}' for i in range(1, X.shape[2] + 1)] + ['class']

    # store to csv file
    df.to_csv('data_set.csv')
    return df


df = get_pixels(X, y)
print(df.head())

In Part 3 of my exploration of HSI I will do some more data exploration and visualization. Also, since HSI data is high-dimensional, I will touch on some more complex subjects such as dimensionality reduction using PCA (I’m still doing some research on that to understand it more). Stay tuned.
