Skip to content

The response pdf is neatly organized in a table format in PDF, contains 15 distinct questions and corresponding 106 student answers. Performed sentiment analysis on pdf by extracting the raw data from pdf and convert it into data frame for easy analysis. Then perform column wise and row wise sentiment analysis and shown the result along with graph.

Notifications You must be signed in to change notification settings

Ujjwal-Jaiswal-UJ/Sentiment-Analysis-on-PDF-file-data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

Sentiment Analysis on Response PDF file

Response PDF

  • The Response PDF file comprises 106 student feedback responses, all centered around the fascinating subject of Mathematics. These responses are neatly organized in a table format in PDF, featuring 15 distinct questions and corresponding answers.
  • I have delve into techniques for PDF data extraction, gaining practical skills. Sentiment analysis will allow to gauge student sentiments, providing valuable context.

Project Objective

  • The mission is to uncover insights from PDF files and perform sentiment analysis on the student feedback.
  • Extracting data from the PDF and transforming it into a structured dataframe will be our key challenge.
  • Through this project, I will bridge the gap between raw PDF content and meaningful data insights.

Deployment

Reading PDF and Data Cleaning

  • Utilized the PdfReader from the PyPDF2 library to read the Response.pdf file.
  • Extracted the data as text into a NumPy array.
  • Split the text based on question marks (‘?’) to isolate individual questions.
  • Segregated the 15 questions into a list called “ques” and collected student answers into a separate list called “feedbacks”.
  • Removed timestamps and dates from the feedbacks.
  • Developed a function to format feedback elements into new list after every 14 entries.
  • Identified unique student responses and organized them into a nested list representing all student feedback.

Data Frame Conversion

  • Rename nested list as “df_row” for feedbacks and renamed the ques list to “col_df”.
  • Converted both lists into Pandas DataFrames.
  • Concatenated the df_row and col_df DataFrames to create a single table resembling the PDF structure.
  • Shifted the first row to serve as the header, resulting in 15 distinct responses across 106 student rows.

Column-wise Sentiment Analysis

  • Counted the student responses for each distinctive option within a question column.
  • Assigned sentiment scores (positive, zero, negative) to these options.
  • Calculated the total sentiment score by multiplying the count with the corresponding sentiment score and dividing by the total count (106 students).
  • Computed the average sentiment score for each question.
  • Plotted pie graphs to visualize student responses for each question.
  • This provided an overview of sentiment distribution across options.

Row-wise Sentiment Analysis

  • Developed a function for row-wise sentiment analysis using TextBlob.
  • The result was divided by 15 to obtain the final sentiment value for each student.
  • If the value was ≤ 0.12, the student was considered confident with a strong foundation in mathematics; otherwise, they were deemed under-confident with a weaker base.

Sentiment Analysis on Response PDF Presentation

About

The response pdf is neatly organized in a table format in PDF, contains 15 distinct questions and corresponding 106 student answers. Performed sentiment analysis on pdf by extracting the raw data from pdf and convert it into data frame for easy analysis. Then perform column wise and row wise sentiment analysis and shown the result along with graph.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published