Tuesday, December 16, 2025

DATA SCIENCE – QUESTION & ANSWER FORMAT (M.Tech Exam Oriented)

 

UNIT – 1 : INTRODUCTION TO DATA SCIENCE

Q1. Define Data Science.

Answer: Data Science is an interdisciplinary field that uses scientific methods, statistics, algorithms, and computing techniques to extract meaningful insights and knowledge from structured and unstructured data for decision making.


Q2. What are the basic terminologies of Data Science?

Answer:

  • Data: Raw facts and figures

  • Dataset: Collection of related data

  • Feature: Individual measurable property of data

  • Label: Output variable in supervised learning

  • Model: Mathematical representation of a real-world process

  • Training & Testing Data: Data used to build and evaluate models


Q3. Explain the types of data.

Answer:

  1. Structured Data – Organized in rows and columns (e.g., databases)

  2. Unstructured Data – Text, images, videos

  3. Semi-structured Data – JSON, XML

  4. Qualitative Data – Categorical data

  5. Quantitative Data – Numerical data (Discrete & Continuous)


Q4. Explain the five steps of Data Science.

Answer:

  1. Problem Definition

  2. Data Collection

  3. Data Cleaning & Preparation

  4. Data Analysis & Modeling

  5. Visualization & Decision Making


Q5. Explain NumPy ndarray.

Answer: The NumPy ndarray is a multidimensional array object that stores elements of the same data type and enables fast numerical computation using vectorization.

Features:

  • Fixed size

  • Homogeneous elements

  • Supports broadcasting
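A minimal sketch of these features, assuming NumPy is installed; the array values are arbitrary illustrations.

```python
import numpy as np

# A 2-D ndarray: 2 rows x 3 columns, every element shares one dtype
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)   # (2, 3)
print(a.ndim)    # 2
print(a.dtype)   # the platform's default integer type, e.g. int64

# Homogeneous elements: mixing ints and floats upcasts everything to float64
b = np.array([1, 2.5, 3])
print(b.dtype)   # float64

# Broadcasting: the 1-D row [10, 20, 30] is stretched across both rows of a
c = a + np.array([10, 20, 30])
print(c.tolist())  # [[11, 22, 33], [14, 25, 36]]
```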


Q6. What are Universal Functions (ufuncs)?

Answer: Universal functions are vectorized functions in NumPy that perform element-wise operations on arrays efficiently.

Examples: add(), subtract(), multiply(), sqrt()
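A short sketch of the listed ufuncs, assuming NumPy is installed; each call works element by element without a Python loop.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([1.0, 2.0, 3.0])

print(np.add(x, y).tolist())       # [2.0, 6.0, 12.0]  (same as x + y)
print(np.subtract(x, y).tolist())  # [0.0, 2.0, 6.0]
print(np.multiply(x, y).tolist())  # [1.0, 8.0, 27.0]
print(np.sqrt(x).tolist())         # [1.0, 2.0, 3.0]

# ufuncs are objects of type np.ufunc
print(isinstance(np.sqrt, np.ufunc))  # True
```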


Q7. Explain Array-Oriented Programming.

Answer: Array-oriented programming avoids explicit loops and uses vectorized operations on entire arrays, improving performance and readability.
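To illustrate the contrast, here is a small sketch (assuming NumPy) that computes the same sum of squares once with an explicit loop and once with whole-array operations.

```python
import numpy as np

values = np.arange(1, 6)  # [1, 2, 3, 4, 5]

# Loop-based style: one Python-level operation per element
total_loop = 0
for v in values:
    total_loop += v * v

# Array-oriented style: square the whole array, then sum it, with no explicit loop
total_vec = np.sum(values ** 2)

print(total_loop, total_vec)  # 55 55
```

On large arrays the vectorized version is typically much faster, because the loop runs in compiled code instead of the Python interpreter.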


Q8. Explain File I/O with NumPy.

Answer: NumPy provides functions like save(), load(), savetxt(), and loadtxt() to store and retrieve array data from files.
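A round-trip sketch of these four functions, assuming NumPy; the file names data.npy and data.txt are hypothetical and are written to a temporary directory.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Binary .npy format: preserves dtype and shape exactly
    npy_path = os.path.join(tmp, "data.npy")
    np.save(npy_path, arr)
    loaded_bin = np.load(npy_path)

    # Plain-text format: human-readable, written here as comma-separated integers
    txt_path = os.path.join(tmp, "data.txt")
    np.savetxt(txt_path, arr, fmt="%d", delimiter=",")
    loaded_txt = np.loadtxt(txt_path, delimiter=",", dtype=int)

print(np.array_equal(arr, loaded_bin))  # True
print(np.array_equal(arr, loaded_txt))  # True
```

save()/load() are the better choice for exact, fast storage; savetxt()/loadtxt() are useful when the file must be readable by people or other tools.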


Q9. Explain Linear Algebra operations in NumPy.

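Answer: NumPy supports matrix multiplication (np.dot() or the @ operator) and transpose (.T) on ordinary arrays, and provides inverse (inv()), determinant (det()), and eigenvalues/eigenvectors (eig()) through the numpy.linalg module.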
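A sketch of each operation on a small 2 x 2 matrix, assuming NumPy; the matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])

product = A @ B                      # matrix multiplication (same as np.dot(A, B))
transposed = A.T                     # transpose
inverse = np.linalg.inv(A)           # matrix inverse
det = np.linalg.det(A)               # determinant: 2*3 - 1*1 = 5
eigvals, eigvecs = np.linalg.eig(A)  # eigenvalues and eigenvectors

# A multiplied by its inverse gives (approximately) the identity matrix
print(np.allclose(A @ inverse, np.eye(2)))  # True
print(round(det, 6))                        # 5.0
```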


Q10. Explain pseudorandom number generation.

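Answer: NumPy generates pseudorandom numbers using deterministic algorithms: the sequence only looks random, and the same seed value always reproduces the same sequence. In current NumPy, np.random.default_rng(seed) creates a Generator object for this purpose.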
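A sketch showing reproducibility, assuming NumPy's Generator API; the seed 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same seed produce identical sequences
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

a = rng1.normal(loc=0.0, scale=1.0, size=3)  # 3 draws from N(0, 1)
b = rng2.normal(loc=0.0, scale=1.0, size=3)
print(np.array_equal(a, b))  # True: same seed, same "random" numbers

# Integers in [0, 10): the upper bound is exclusive by default
digits = rng1.integers(0, 10, size=5)
```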


UNIT – 2 : DATA EXPLORATION WITH PANDAS

Q11. What is Data Exploration?

Answer: Data exploration is the process of understanding data characteristics using summary statistics and visualizations before modeling.


Q12. Explain Pandas data structures.

Answer:

  1. Series – One-dimensional labeled array

  2. DataFrame – Two-dimensional labeled table

  3. Index – Immutable array for labeling
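A minimal sketch of the three structures, assuming pandas is installed; the names and marks are made-up sample data.

```python
import pandas as pd

# Series: 1-D values with an index of labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])  # 20

# DataFrame: 2-D table; each column is itself a Series
df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [85, 92]})
print(df["marks"].mean())  # 88.5

# Index: immutable labels, so item assignment raises TypeError
idx = df.columns
try:
    idx[0] = "student"
except TypeError:
    print("Index objects are immutable")
```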


Q13. Explain descriptive statistics in Pandas.

Answer: Pandas provides functions like mean(), median(), mode(), std(), min(), max() to summarize data.
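A sketch of these functions on a small Series of made-up marks, assuming pandas; describe() (a standard pandas method) reports several of them at once.

```python
import pandas as pd

marks = pd.Series([70, 85, 85, 90, 60])

print(marks.mean())              # 78.0
print(marks.median())            # 85.0
print(marks.mode()[0])           # 85 (mode() returns a Series because there can be ties)
print(marks.min(), marks.max())  # 60 90
print(marks.std())               # sample standard deviation (ddof=1 by default)
print(marks.describe())          # count, mean, std, min, quartiles, max in one call
```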


Q14. What are correlation and covariance?

Answer:

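  • Correlation measures the strength and direction of the linear relationship between two variables; the Pearson coefficient is unit-free and lies between -1 and +1.

  • Covariance indicates how two variables change together: positive if they tend to increase together, negative if one increases as the other decreases. Its magnitude depends on the units of the variables.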
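A sketch comparing the two on hypothetical study-hours and score data, assuming pandas; corr() computes the Pearson coefficient by default.

```python
import pandas as pd

df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 70, 78],
})

# Covariance: positive means both tend to rise together; magnitude depends on units
cov_hs = df["hours"].cov(df["score"])

# Pearson correlation: unit-free, always between -1 and +1
corr_hs = df["hours"].corr(df["score"])

print(cov_hs, corr_hs)

# Pairwise matrices for every numeric column
print(df.cov())
print(df.corr())
```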


Q15. Explain unique values and value counts.

Answer:

  • unique() returns the distinct values, in order of first appearance

  • value_counts() returns the frequency (count) of each distinct value, sorted in descending order
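A quick sketch on a made-up Series of city names, assuming pandas.

```python
import pandas as pd

city = pd.Series(["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi"])

print(city.unique())        # ['Pune' 'Delhi' 'Mumbai'] in order of first appearance
print(city.nunique())       # 3 distinct values
print(city.value_counts())  # Pune 3, Delhi 2, Mumbai 1 (sorted by count, descending)
```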


Q16. Explain data loading and storage methods in Pandas.

Answer: Pandas supports reading and writing data using CSV, Excel, JSON, SQL databases, and web APIs.
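A round-trip sketch for CSV, JSON, and SQL, assuming pandas and Python's built-in sqlite3; in-memory text buffers and an in-memory database stand in for real files, and the table name cities is hypothetical.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "city": ["Pune", "Delhi"]})

# CSV: to_csv() with no path returns the CSV text; read_csv() parses it back
csv_text = df.to_csv(index=False)
df_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: one object per row
json_text = df.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text), orient="records")

# SQL: write to an in-memory SQLite database, then query it
conn = sqlite3.connect(":memory:")
df.to_sql("cities", conn, index=False)
df_sql = pd.read_sql("SELECT * FROM cities", conn)
conn.close()

# Excel follows the same pattern with to_excel()/read_excel(), but needs an engine such as openpyxl
```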


UNIT – 3 : DATA CLEANING, PREPARATION AND WRANGLING

Q17. What is data cleaning?

Answer: Data cleaning is the process of detecting and correcting inaccurate, incomplete, or inconsistent data.


Q18. Explain methods for handling missing data.

Answer:

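  • dropna() – removes rows or columns that contain missing values

  • fillna() – replaces missing values with a constant or a statistic such as the mean

  • Forward/Backward filling – ffill() / bfill() propagate the last / next valid value into the gaps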
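A sketch of each method on a Series with two gaps, assuming pandas and NumPy (np.nan marks a missing value).

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

print(s.isna().sum())               # 2 missing values
print(s.dropna().tolist())          # [1.0, 3.0, 5.0]
print(s.fillna(0).tolist())         # [1.0, 0.0, 3.0, 0.0, 5.0]
print(s.fillna(s.mean()).tolist())  # gaps filled with the mean of the non-missing values, 3.0
print(s.ffill().tolist())           # [1.0, 1.0, 3.0, 3.0, 5.0]
print(s.bfill().tolist())           # [1.0, 3.0, 3.0, 5.0, 5.0]
```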


Q19. Explain data transformation.

Answer: Data transformation converts data into a suitable format using normalization, scaling, encoding, and aggregation.
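A sketch of the four transformations with plain pandas, assuming made-up income and grade columns; min-max normalization and z-score scaling are written out by hand here rather than with a separate library.

```python
import pandas as pd

df = pd.DataFrame({
    "income": [20000, 50000, 80000],
    "grade": ["A", "B", "A"],
})
inc = df["income"]

# Normalization (min-max): rescale values to the range [0, 1]
df["income_minmax"] = (inc - inc.min()) / (inc.max() - inc.min())

# Scaling (z-score standardization): mean 0, sample standard deviation 1
df["income_z"] = (inc - inc.mean()) / inc.std()

# Encoding: one-hot columns for a categorical variable
encoded = pd.get_dummies(df["grade"], prefix="grade")

# Aggregation: mean income per grade
summary = df.groupby("grade")["income"].mean()
```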


Q20. Explain string manipulation in Pandas.

Answer: Pandas provides vectorized string methods through the .str accessor for operations like lower(), split(), and replace().
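A sketch of chained .str methods for cleaning messy names, assuming pandas; the names are made-up examples.

```python
import pandas as pd

names = pd.Series(["  Alice Smith", "BOB jones ", "carol  lee"])

# Trim surrounding spaces, then lower-case every element
clean = names.str.strip().str.lower()
print(clean.tolist())  # ['alice smith', 'bob jones', 'carol  lee']

# split() with no pattern splits on runs of whitespace; .str[0] takes the first piece
first = clean.str.split().str[0]
print(first.tolist())  # ['alice', 'bob', 'carol']

# Element-wise replace and substring test
underscored = clean.str.replace(" ", "_")
has_o = clean.str.contains("o")
print(has_o.tolist())  # [False, True, True]
```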


Q21. What is data wrangling?

Answer: Data wrangling is the process of cleaning, structuring, and enriching raw data into a usable format.


Q22. Explain combining and merging datasets.

Answer: Pandas supports merge() to combine datasets on key columns (SQL-style joins), join() to combine them on their indexes, and concat() to stack them along rows or columns.
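A sketch of the three functions on hypothetical student and marks tables, assuming pandas; sid is the shared key column.

```python
import pandas as pd

students = pd.DataFrame({"sid": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
marks = pd.DataFrame({"sid": [1, 2, 4], "marks": [85, 92, 70]})

# merge: SQL-style join on a key column
inner = pd.merge(students, marks, on="sid", how="inner")  # only sids present in both (1, 2)
left = pd.merge(students, marks, on="sid", how="left")    # all students; NaN where no marks

# join: combines on the index
joined = students.set_index("sid").join(marks.set_index("sid"), how="outer")

# concat: stacks frames along an axis (rows here) without matching keys
more = pd.DataFrame({"sid": [5], "name": ["John"]})
stacked = pd.concat([students, more], ignore_index=True)
```

The how parameter ("inner", "left", "right", "outer") controls which keys survive, as in SQL joins.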


Q23. Explain reshaping and pivoting.

Answer: Reshaping changes the structure of data using pivot(), stack(), unstack(), and melt().
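A sketch moving a small made-up marks table between long and wide form, assuming pandas.

```python
import pandas as pd

long = pd.DataFrame({
    "student": ["Asha", "Asha", "Ravi", "Ravi"],
    "subject": ["Math", "Physics", "Math", "Physics"],
    "marks":   [85, 78, 92, 88],
})

# pivot: long -> wide (one row per student, one column per subject)
wide = long.pivot(index="student", columns="subject", values="marks")

# melt: wide -> long again
back = wide.reset_index().melt(id_vars="student", var_name="subject", value_name="marks")

# stack / unstack: move column labels into the row index and back
stacked = wide.stack()         # Series indexed by (student, subject)
unstacked = stacked.unstack()  # back to the wide table
```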


UNIT – 4 : DATA VISUALIZATION & GROUP OPERATIONS

Q24. Explain Matplotlib API.

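Answer: Matplotlib is a Python library used for creating static, animated, and interactive plots. It offers two interfaces: the state-based pyplot interface (plt.plot(), plt.show()) and the object-oriented API, in which a Figure contains one or more Axes and plotting methods are called on each Axes.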
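A minimal sketch of the object-oriented API, assuming Matplotlib is installed; the non-interactive Agg backend is used so it runs without a display, and the plot is saved to memory rather than to a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create a Figure with one Axes, then call plotting methods on the Axes
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line plot")
ax.legend()

# Save to an in-memory PNG instead of a file on disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)

# The equivalent pyplot style would be: plt.plot(x, y); plt.xlabel("x"); plt.show()
```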


Q25. Explain plotting with Pandas and Seaborn.

Answer: Pandas plotting methods (e.g., DataFrame.plot()) use Matplotlib internally, while Seaborn, also built on Matplotlib, provides high-level statistical visualizations.


Q26. Explain GroupBy mechanics.

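Answer: GroupBy follows the split–apply–combine strategy: it splits data into groups based on one or more keys, applies a function (such as an aggregation) to each group independently, and combines the results into a new object.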
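A sketch of split–apply–combine on a made-up salary table, assuming pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "dept":   ["CS", "EE", "CS", "EE", "ME"],
    "salary": [50, 40, 70, 60, 45],
})

# Split by dept -> apply mean and count to each group -> combine into one table
by_dept = df.groupby("dept")["salary"].agg(["mean", "count"])
print(by_dept)  # CS: mean 60.0, count 2; EE: 50.0, 2; ME: 45.0, 1
```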


Q27. What are pivot tables and cross-tabulation?

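Answer: Pivot tables (pivot_table()) summarize data by aggregating a value across row and column keys, while cross-tabulation (crosstab()) computes frequency tables that count how often each combination of categories occurs.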
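A sketch contrasting the two on hypothetical sales records, assuming pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "sales":   [100, 150, 200, 50, 80],
})

# Pivot table: total sales for each (region, product) pair
pt = pd.pivot_table(df, values="sales", index="region", columns="product",
                    aggfunc="sum", fill_value=0)

# Cross-tabulation: how many records fall in each (region, product) pair
ct = pd.crosstab(df["region"], df["product"])
```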


UNIT – 5 : STATISTICAL THINKING & TIME SERIES ANALYSIS

Q28. Explain statistical distributions.

Answer: Distributions describe how values are spread and can be visualized using histograms.
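A histogram is built by counting values in equal-width bins. This sketch, assuming NumPy, computes those counts with np.histogram on simulated (not real) normally distributed data; plt.hist() would draw the same bins.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)  # simulated sample

# Split the range of the data into 10 equal-width bins and count values per bin
counts, edges = np.histogram(data, bins=10)

print(counts.sum())  # 1000: every value falls in exactly one bin
print(len(edges))    # 11 bin edges for 10 bins
```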


Q29. What are outliers?

Answer: Outliers are extreme values that deviate significantly from other observations.
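One common way to flag outliers (a choice made for this sketch, not the only method) is the 1.5 x IQR rule: values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. Assuming NumPy, with made-up values:

```python
import numpy as np

values = np.array([12, 14, 15, 15, 16, 17, 18, 19, 20, 95])

# Interquartile range: spread of the middle 50% of the data
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())  # [95]
```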


Q30. Explain PMF and CDF.

Answer:

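  • PMF (Probability Mass Function) gives the probability that a discrete variable takes each value, P(X = x)

  • CDF (Cumulative Distribution Function) gives the cumulative probability up to a value, P(X ≤ x)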
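A sketch for a fair six-sided die, assuming NumPy: the PMF gives 1/6 to each face, and the CDF is the running total of the PMF.

```python
import numpy as np

faces = np.arange(1, 7)   # X takes values 1..6
pmf = np.full(6, 1 / 6)   # P(X = x) for each face
cdf = np.cumsum(pmf)      # P(X <= x): running total of the PMF

for x, p, c in zip(faces, pmf, cdf):
    print(x, round(p, 3), round(c, 3))

# The PMF sums to 1, and the CDF rises step by step to 1 (P(X <= 3) is about 0.5)
```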


Q31. Explain percentile-based statistics.

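Answer: Percentiles divide sorted data into 100 equal parts; the p-th percentile is the value below which about p% of observations fall. They help compare the relative standing of a value (e.g., the 50th percentile is the median).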
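A sketch with made-up exam scores, assuming NumPy (np.percentile interpolates linearly by default). The percentile rank below uses one common definition, the share of scores at or below a given score.

```python
import numpy as np

scores = np.array([35, 42, 50, 58, 61, 67, 72, 80, 88, 95])

p25, p50, p75 = np.percentile(scores, [25, 50, 75])
print(p25, p50, p75)  # 52.0 64.0 78.0 (p50 is the median)

# Percentile rank of one score: percentage of scores at or below it
score = 80
rank = 100 * np.count_nonzero(scores <= score) / len(scores)
print(rank)  # 80.0
```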


Q32. Explain Time Series Analysis.

Answer: Time series analysis studies data points indexed in time order to identify patterns and trends.
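A sketch of two basic tools for spotting a trend, assuming pandas and hypothetical daily values: a rolling mean smooths short-term noise, and diff() shows period-to-period change.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=7, freq="D")
ts = pd.Series([10, 12, 11, 13, 15, 14, 16], index=idx)

# 3-day rolling mean: the first two values are NaN until the window fills
trend = ts.rolling(window=3).mean()

# Day-over-day change
change = ts.diff()
```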


Q33. Explain date and time handling in Pandas.

Answer: Pandas provides datetime objects, date ranges, frequency conversion, and time zone handling for time-based data.
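A sketch of each tool named above, assuming pandas with time-zone data available; the dates, the hourly values, and the Asia/Kolkata zone are illustrative choices.

```python
import pandas as pd

# Datetime objects: parse a string into a Timestamp
stamp = pd.to_datetime("2024-03-15 14:30")
print(stamp.year, stamp.month, stamp.day_name())  # 2024 3 Friday

# Date range: 48 hourly timestamps, with one value per hour
idx = pd.date_range("2024-01-01", periods=48, freq=pd.offsets.Hour())
s = pd.Series(range(48), index=idx)

# Frequency conversion: downsample hourly data to daily totals
daily = s.resample("D").sum()

# Time zones: mark naive times as UTC, then convert to India Standard Time (UTC+5:30)
s_ist = s.tz_localize("UTC").tz_convert("Asia/Kolkata")
```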



⭐ MOST IMPORTANT QUESTIONS (HIGH PROBABILITY)

🔥 UNIT–1 (Very Important)

  1. Define Data Science and explain the five steps of Data Science ⭐⭐⭐

  2. Explain NumPy ndarray and its features ⭐⭐

  3. Explain Universal Functions and Array-Oriented Programming ⭐⭐

  4. Explain Linear Algebra operations in NumPy ⭐⭐


🔥 UNIT–2 (Very Important)

  1. Explain Pandas Data Structures with examples ⭐⭐⭐

  2. Explain Descriptive Statistics, Correlation and Covariance ⭐⭐

  3. Explain Data loading and storage methods in Pandas ⭐⭐


🔥 UNIT–3 (Very Important)

  1. Explain Data Cleaning and handling missing data ⭐⭐⭐

  2. Explain Data Wrangling – combining and merging datasets ⭐⭐

  3. Explain Reshaping and Pivoting operations ⭐⭐


🔥 UNIT–4 (Very Important)

  1. Explain Matplotlib architecture and API ⭐⭐⭐

  2. Explain GroupBy mechanics and Split–Apply–Combine strategy ⭐⭐⭐

  3. Explain Pivot tables and Cross-tabulation ⭐⭐


🔥 UNIT–5 (Very Important)

  1. Explain Statistical Distributions and Outliers ⭐⭐⭐

  2. Explain PMF and CDF with examples ⭐⭐⭐

  3. Explain Time Series Analysis and Pandas time tools ⭐⭐⭐


✏️ IMPORTANT DIAGRAMS & FLOWCHARTS (WHAT TO DRAW IN EXAM)

📌 1. Data Science Life Cycle (UNIT–1) ⭐⭐⭐

Draw a flowchart:

Problem Definition → Data Collection → Data Cleaning → Data Analysis → Visualization & Decision Making

👉 Label each step clearly.


📌 2. NumPy Array Structure (UNIT–1) ⭐⭐

Draw:

  • 2D matrix

  • Show rows, columns, shape

👉 Mention: homogeneous data, fast computation.


📌 3. Pandas Data Structures (UNIT–2) ⭐⭐⭐

Draw block diagram:

Series → DataFrame → Index

👉 Show Series as single column, DataFrame as table.


📌 4. Data Cleaning Process (UNIT–3) ⭐⭐⭐

Flowchart:

Raw Data → Missing Value Handling → Transformation → Clean Data

📌 5. Data Wrangling Operations (UNIT–3) ⭐⭐

Diagram:

Merge / Join / Concat → Reshape → Final Dataset

📌 6. Split–Apply–Combine Strategy (UNIT–4) ⭐⭐⭐

Very Important Diagram:

Data → Split (GroupBy) → Apply (Aggregation) → Combine (Result)

📌 7. Histogram & Distribution (UNIT–5) ⭐⭐⭐

Draw:

  • X-axis: Values

  • Y-axis: Frequency

👉 Mention: shape, spread, outliers.


📌 8. PMF vs CDF Graph (UNIT–5) ⭐⭐⭐

Draw two graphs:

  • PMF: discrete bars

  • CDF: increasing curve


📌 9. Time Series Plot (UNIT–5) ⭐⭐⭐

Draw:

  • X-axis: Time

  • Y-axis: Value

  • Mark trend/seasonality


🏆 FINAL EXAM WRITING STRATEGY

✔ Start the answer with a definition

✔ Add a diagram/flowchart wherever possible

✔ Use keywords from the syllabus

✔ End with applications / advantages

👉 This is a complete scoring package for your Data Science exam.
