TEJASWI : DATA SCIENCE – QUESTION & ANSWER FORMAT (M.Tech Exam Oriented)

UNIT – 1 : INTRODUCTION TO DATA SCIENCE

Q1. Define Data Science.

Answer: Data Science is an interdisciplinary field that uses scientific methods, statistics, algorithms, and computing techniques to extract meaningful insights and knowledge from structured and unstructured data for decision making.

Q2. What are the basic terminologies of Data Science?

Answer:

Data: Raw facts and figures
Dataset: Collection of related data
Feature: Individual measurable property of data
Label: Output variable in supervised learning
Model: Mathematical representation of a real-world process
Training & Testing Data: Data used to build and evaluate models

Q3. Explain the types of data.

Answer:

Structured Data – Organized in rows and columns (e.g., databases)
Unstructured Data – Text, images, videos
Semi-structured Data – JSON, XML
Qualitative Data – Categorical data
Quantitative Data – Numerical data (Discrete & Continuous)

Q4. Explain the five steps of Data Science.

Answer:

Problem Definition
Data Collection
Data Cleaning & Preparation
Data Analysis & Modeling
Visualization & Decision Making

Q5. Explain NumPy ndarray.

Answer: The NumPy ndarray is a multidimensional array object that stores elements of the same data type and enables fast numerical computation using vectorization.

Features:

Fixed size at creation (changing the size creates a new array)
Homogeneous elements
Supports broadcasting
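
Example: the features above can be seen in a minimal sketch like the following. The array values and variable names (a, row, b, mixed) are illustrative assumptions, not part of the syllabus.

```python
import numpy as np

# A 2-D ndarray: every element shares one dtype (homogeneous)
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape, a.ndim, a.dtype)  # shape is (2, 3) with 2 dimensions

# Broadcasting: the 1-D row is stretched across both rows of `a`
row = np.array([10, 20, 30])
b = a + row
print(b)

# Mixing ints and floats still gives one dtype (values are upcast to float)
mixed = np.array([1, 2.5])
print(mixed.dtype)
```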

Q6. What are Universal Functions (ufuncs)?

Answer: Universal functions are vectorized functions in NumPy that perform element-wise operations on arrays efficiently.

Examples: add(), subtract(), multiply(), sqrt()
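
Example: a short sketch of the ufuncs listed above applied element-wise. The input arrays x and y are made-up values for illustration.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([1.0, 2.0, 3.0])

# Each ufunc works element by element, with no explicit Python loop
total = np.add(x, y)        # same as x + y
diff = np.subtract(x, y)    # same as x - y
prod = np.multiply(x, y)    # same as x * y
roots = np.sqrt(x)          # square root of every element

print(total, diff, prod, roots)
```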

Q7. Explain Array-Oriented Programming.

Answer: Array-oriented programming avoids explicit loops and uses vectorized operations on entire arrays, improving performance and readability.
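
Example: one way to contrast the two styles is to square the same numbers with a loop and with a single array expression. This is only a sketch; the variable names are assumptions.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Loop style: one element at a time
squares_loop = []
for v in values:
    squares_loop.append(int(v) * int(v))

# Array-oriented style: one expression over the whole array
squares_vec = values * values

# Conditional logic can also be written without a loop
labels = np.where(values % 2 == 0, "even", "odd")

print(squares_loop, squares_vec, labels)
```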

Q8. Explain File I/O with NumPy.

Answer: NumPy provides save() and load() for the binary .npy format, and savetxt() and loadtxt() for plain-text files, to store and retrieve array data.
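
Example: a minimal round-trip sketch using these four functions. The file names (arr.npy, arr.txt) and the temporary directory are assumptions chosen so the example cleans up after itself.

```python
import os
import tempfile
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Binary format: keeps dtype and shape exactly
    npy_path = os.path.join(tmp, "arr.npy")
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)

    # Text format: human-readable, here comma-separated
    txt_path = os.path.join(tmp, "arr.txt")
    np.savetxt(txt_path, arr, delimiter=",")
    from_txt = np.loadtxt(txt_path, delimiter=",")

print(from_npy)
print(from_txt)
```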

Q9. Explain Linear Algebra operations in NumPy.

Answer: NumPy supports matrix multiplication (the @ operator or dot()) and transpose (.T) directly on arrays, while the numpy.linalg module provides inverse (inv()), determinant (det()), and eigenvalues (eig()).
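
Example: a small sketch of these operations on 2x2 matrices. The matrices A and B are made-up values picked so the results are easy to check by hand.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 3.0]])
B = np.array([[1.0, 2.0], [3.0, 4.0]])

product = A @ B                      # matrix multiplication
transposed = B.T                     # transpose
inverse = np.linalg.inv(A)           # inverse of A
det = np.linalg.det(A)               # determinant of A
eigenvalues, eigenvectors = np.linalg.eig(A)

print(product)
print(transposed)
print(inverse, det, eigenvalues)
```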

Q10. Explain pseudorandom number generation.

Answer: NumPy generates pseudorandom numbers using deterministic algorithms: the sequence looks random but is fully determined by a seed value, so the same seed reproduces the same sequence.
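
Example: a sketch of seed-controlled reproducibility using numpy.random.default_rng(). The seed value 42 is an arbitrary assumption.

```python
import numpy as np

# Two generators created with the same seed
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

a = rng1.random(3)   # three floats in [0, 1)
b = rng2.random(3)   # same seed, so the same three floats

print(a)
print(np.array_equal(a, b))
```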

UNIT – 2 : DATA EXPLORATION WITH PANDAS

Q11. What is Data Exploration?

Answer: Data exploration is the process of understanding data characteristics using summary statistics and visualizations before modeling.

Q12. Explain Pandas data structures.

Answer:

Series – One-dimensional labeled array
DataFrame – Two-dimensional labeled table
Index – Immutable array for labeling
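
Example: the three structures can be sketched as below. The names and marks are invented sample data.

```python
import pandas as pd

# Series: one-dimensional values with labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: a table whose columns are Series sharing one index
df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [82, 75]})

# Index: labels cannot be changed in place
try:
    df.index[0] = 99
except TypeError as err:
    index_error = type(err).__name__

print(s["b"], df.shape, index_error)
```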

Q13. Explain descriptive statistics in Pandas.

Answer: Pandas provides methods like mean(), median(), mode(), std(), min(), and max() to summarize data; describe() reports several of these at once.
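
Example: a minimal sketch of these summary functions on a small Series. The marks are made-up values; note that std() in Pandas is the sample standard deviation by default.

```python
import pandas as pd

marks = pd.Series([70, 80, 80, 90, 100])

mean = marks.mean()       # arithmetic average
median = marks.median()   # middle value
mode = marks.mode()       # most frequent value(s), returned as a Series
std = marks.std()         # sample standard deviation (ddof=1)
low, high = marks.min(), marks.max()

print(mean, median, mode.tolist(), std, low, high)
print(marks.describe())   # several statistics at once
```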

Q14. What are correlation and covariance?

Answer:

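Correlation measures the strength and direction of the linear relationship between two variables on a scale from -1 to +1.
Covariance indicates how two variables change together; its sign gives the direction, but its magnitude depends on the units of the variables.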
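
Example: a sketch computing both measures in Pandas for two perfectly related series. The data are invented so the numbers can be checked by hand.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])
y = pd.Series([2, 4, 6, 8])   # y is exactly 2 * x

corr = x.corr(y)   # Pearson correlation: 1.0 for a perfect increasing line
cov = x.cov(y)     # sample covariance: 10 / 3 for this data

print(corr, cov)
```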

Q15. Explain unique values and value counts.

Answer:

unique() returns distinct values
value_counts() returns frequency of values
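
Example: both functions on a small Series of grades (sample values are assumptions).

```python
import pandas as pd

grades = pd.Series(["a", "b", "a", "c", "a"])

distinct = grades.unique()       # distinct values in order of first appearance
counts = grades.value_counts()   # frequency of each value, most frequent first

print(list(distinct))
print(counts)
```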

Q16. Explain data loading and storage methods in Pandas.

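Answer: Pandas supports reading and writing data in CSV, Excel, JSON, and SQL database formats through reader functions such as read_csv(), read_excel(), read_json(), and read_sql() and matching writer methods such as to_csv(); data from web APIs is usually fetched first and then loaded into a DataFrame.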
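
Example: a round-trip sketch for CSV and JSON. To keep it self-contained it writes to in-memory buffers instead of real files, which is an assumption of this example; with files you would pass a path instead.

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "city": ["Pune", "Delhi"]})

# CSV: write to a text buffer, then read it back
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
from_csv = pd.read_csv(csv_buffer)

# JSON: one record per row
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```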

UNIT – 3 : DATA CLEANING, PREPARATION AND WRANGLING

Q17. What is data cleaning?

Answer: Data cleaning is the process of detecting and correcting inaccurate, incomplete, or inconsistent data.

Q18. Explain methods for handling missing data.

Answer:

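dropna() – Removes rows or columns that contain missing values
fillna() – Replaces missing values with a given value
ffill() / bfill() – Forward/Backward filling from the previous or next valid value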
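
Example: the methods above applied to one small Series with gaps. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

dropped = s.dropna()    # keeps only the non-missing values
filled = s.fillna(0)    # replaces every NaN with 0
forward = s.ffill()     # copies the previous valid value forward
backward = s.bfill()    # copies the next valid value backward (last NaN stays)

print(dropped.tolist(), filled.tolist(), forward.tolist(), backward.tolist())
```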

Q19. Explain data transformation.

Answer: Data transformation converts data into a suitable format for analysis using normalization, scaling, encoding, and aggregation.
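
Example: a sketch of three common transformations. Min-max scaling, z-score standardization, and one-hot encoding are chosen here as typical instances; the column names and values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"x": [0.0, 5.0, 10.0], "color": ["r", "g", "r"]})

# Min-max normalization: rescale x to the range [0, 1]
df["x_minmax"] = (df["x"] - df["x"].min()) / (df["x"].max() - df["x"].min())

# Standardization (z-score): zero mean, unit sample standard deviation
df["x_z"] = (df["x"] - df["x"].mean()) / df["x"].std()

# Encoding: one indicator column per category
encoded = pd.get_dummies(df["color"])

print(df)
print(encoded)
```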

Q20. Explain string manipulation in Pandas.

Answer: Pandas provides vectorized string methods through the .str accessor for operations like lower(), split(), and replace().
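
Example: the three operations named above on a small Series of strings (sample text is an assumption).

```python
import pandas as pd

titles = pd.Series(["Data Science", "NumPy Array"])

lowered = titles.str.lower()                            # lowercase every string
parts = titles.str.split(" ")                           # list of words per string
underscored = titles.str.replace(" ", "_", regex=False) # literal replacement

print(lowered.tolist())
print(parts.tolist())
print(underscored.tolist())
```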

Q21. What is data wrangling?

Answer: Data wrangling is the process of cleaning, structuring, and enriching raw data into a usable format.

Q22. Explain combining and merging datasets.

Answer: Pandas supports merge(), join(), and concat() to combine datasets using keys or indexes.
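
Example: a sketch of the three functions on two tiny tables. The tables and column names (id, name, score) are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# merge(): combine on a key column; inner keeps ids present in both tables
inner = pd.merge(left, right, on="id", how="inner")

# join(): combine on the index; left keeps every row of the left table
joined = left.set_index("id").join(right.set_index("id"), how="left")

# concat(): stack tables one after another
stacked = pd.concat([left, left], ignore_index=True)

print(inner)
print(joined)
print(stacked)
```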

Q23. Explain reshaping and pivoting.

Answer: Reshaping changes the structure of data using pivot(), stack(), unstack(), and melt().
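
Example: a sketch that moves one small table between long and wide form with the four functions above. The column names (date, item, value) are assumptions.

```python
import pandas as pd

long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "item": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

# pivot(): long -> wide, one column per item
wide = long_df.pivot(index="date", columns="item", values="value")

# stack(): move the column labels into an inner index level
stacked = wide.stack()

# unstack(): move that inner index level back to columns
unstacked = stacked.unstack()

# melt(): wide -> long again
melted = wide.reset_index().melt(id_vars="date", var_name="item", value_name="value")

print(wide)
print(stacked)
print(melted)
```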

UNIT – 4 : DATA VISUALIZATION & GROUP OPERATIONS

Q24. Explain Matplotlib API.

Answer: Matplotlib is a Python library used for creating static, animated, and interactive plots.
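
Example: a minimal sketch of the object-oriented Matplotlib API (Figure and Axes). The data, labels, and the choice of the non-interactive "Agg" backend are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# A Figure is the whole canvas; an Axes is one plot area inside it
fig, ax = plt.subplots()

ax.plot([1, 2, 3], [1, 4, 9], marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Simple line plot")
ax.legend()
```

In an interactive session, plt.show() would display the figure, and fig.savefig() would write it to a file.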

Q25. Explain plotting with Pandas and Seaborn.

Answer: Pandas plotting methods such as plot() use Matplotlib internally by default, while Seaborn is built on top of Matplotlib and provides high-level statistical visualizations.

Q26. Explain GroupBy mechanics.

Answer: GroupBy splits data into groups, applies functions, and combines results.
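
Example: a sketch of split, apply, and combine on a small salary table. The departments and salaries are made-up values.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["A", "B", "A", "B"],
    "salary": [10, 20, 30, 40],
})

# Split by dept, apply mean to each group, combine into one Series
mean_salary = df.groupby("dept")["salary"].mean()

# Several aggregations at once give one column per function
summary = df.groupby("dept")["salary"].agg(["mean", "max"])

print(mean_salary)
print(summary)
```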

Q27. What are pivot tables and cross-tabulation?

Answer: Pivot tables summarize data using aggregation, while cross-tabulation computes frequency tables.
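
Example: a sketch contrasting the two on the same data. The columns (region, product, sales) are assumptions; fill_value=0 is used so empty cells show 0 instead of NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["p", "q", "p", "p"],
    "sales": [1, 2, 3, 4],
})

# Pivot table: aggregate sales (sum) for each region/product pair
pt = pd.pivot_table(df, index="region", columns="product",
                    values="sales", aggfunc="sum", fill_value=0)

# Cross-tabulation: count how often each region/product pair occurs
ct = pd.crosstab(df["region"], df["product"])

print(pt)
print(ct)
```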

UNIT – 5 : STATISTICAL THINKING & TIME SERIES ANALYSIS

Q28. Explain statistical distributions.

Answer: Distributions describe how values are spread and can be visualized using histograms.

Q29. What are outliers?

Answer: Outliers are extreme values that deviate significantly from other observations.
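
Example: one common convention for flagging outliers is the 1.5 x IQR rule. It is shown here only as an illustration; the notes above do not prescribe a specific method, and the data are invented.

```python
import pandas as pd

data = pd.Series([10, 12, 11, 13, 12, 100])

q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1

# Values outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are flagged
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(lower, upper, outliers.tolist())
```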

Q30. Explain PMF and CDF.

Answer:

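PMF (Probability Mass Function) gives the probability of each value of a discrete variable: P(X = x)
CDF (Cumulative Distribution Function) gives the cumulative probability up to a value: P(X ≤ x)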
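
Example: an empirical PMF and CDF can be sketched from observed data as below. The sample values are assumptions.

```python
import pandas as pd

values = pd.Series([1, 2, 2, 3, 3, 3])

# PMF: relative frequency of each distinct value
pmf = values.value_counts(normalize=True).sort_index()

# CDF: running total of the PMF
cdf = pmf.cumsum()

print(pmf)   # 1 -> 1/6, 2 -> 2/6, 3 -> 3/6
print(cdf)   # 1 -> 1/6, 2 -> 3/6, 3 -> 1.0
```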

Q31. Explain percentile-based statistics.

Answer: Percentiles divide sorted data into 100 equal parts; the p-th percentile is the value below which about p% of the observations fall, which helps compare relative standing.
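
Example: a sketch with np.percentile() on the numbers 1 to 100 (an assumed data set). NumPy interpolates linearly between neighbouring values by default.

```python
import numpy as np

data = np.arange(1, 101)   # 1, 2, ..., 100

p25, p50, p90 = np.percentile(data, [25, 50, 90])

print(p25, p50, p90)   # 25.75 50.5 90.1
```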

Q32. Explain Time Series Analysis.

Answer: Time series analysis studies data points indexed in time order to identify patterns and trends.
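
Example: a moving average is one simple way to expose a trend. The sketch below assumes five daily readings and a 3-day window; both are illustrative choices.

```python
import pandas as pd

idx = pd.date_range("2025-01-01", periods=5, freq="D")
ts = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0], index=idx)

# 3-day moving average smooths short-term noise; the first two values are NaN
trend = ts.rolling(window=3).mean()

print(trend)
```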

Q33. Explain date and time handling in Pandas.

Answer: Pandas provides Timestamp and DatetimeIndex objects, date ranges (date_range()), frequency conversion (resample(), asfreq()), and time zone handling (tz_localize(), tz_convert()) for time-based data.
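
Example: a short sketch of a date range, frequency conversion, and time zone conversion. The dates, values, and the Asia/Kolkata zone are assumptions for illustration.

```python
import pandas as pd

# Date range: four consecutive days used as the index
idx = pd.date_range("2025-01-01", periods=4, freq="D")
ts = pd.Series([1, 2, 3, 4], index=idx)

# Frequency conversion: daily values summed into 2-day bins
two_day = ts.resample("2D").sum()

# Time zones: mark the index as UTC, then convert to Indian time
localized = ts.tz_localize("UTC")
converted = localized.tz_convert("Asia/Kolkata")

print(two_day)
print(converted.index[0])   # midnight UTC is 05:30 in Asia/Kolkata
```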

⭐ MOST IMPORTANT QUESTIONS (HIGH PROBABILITY)

🔥 UNIT–1 (Very Important)

Define Data Science and explain the five steps of Data Science ⭐⭐⭐
Explain NumPy ndarray and its features ⭐⭐
Explain Universal Functions and Array-Oriented Programming ⭐⭐
Explain Linear Algebra operations in NumPy ⭐⭐

🔥 UNIT–2 (Very Important)

Explain Pandas Data Structures with examples ⭐⭐⭐
Explain Descriptive Statistics, Correlation and Covariance ⭐⭐
Explain Data loading and storage methods in Pandas ⭐⭐

🔥 UNIT–3 (Very Important)

Explain Data Cleaning and handling missing data ⭐⭐⭐
Explain Data Wrangling – combining and merging datasets ⭐⭐
Explain Reshaping and Pivoting operations ⭐⭐

🔥 UNIT–4 (Very Important)

Explain Matplotlib architecture and API ⭐⭐⭐
Explain GroupBy mechanics and Split–Apply–Combine strategy ⭐⭐⭐
Explain Pivot tables and Cross-tabulation ⭐⭐

🔥 UNIT–5 (Very Important)

Explain Statistical Distributions and Outliers ⭐⭐⭐
Explain PMF and CDF with examples ⭐⭐⭐
Explain Time Series Analysis and Pandas time tools ⭐⭐⭐

✏️ IMPORTANT DIAGRAMS & FLOWCHARTS (WHAT TO DRAW IN EXAM)

📌 1. Data Science Life Cycle (UNIT–1) ⭐⭐⭐

Draw a flowchart:

Problem Definition → Data Collection → Data Cleaning & Preparation → Data Analysis & Modeling → Visualization & Decision Making

👉 Label each step clearly.

📌 2. NumPy Array Structure (UNIT–1) ⭐⭐

Draw:

2D matrix
Show rows, columns, shape

👉 Mention: homogeneous data, fast computation.

📌 3. Pandas Data Structures (UNIT–2) ⭐⭐⭐

Draw block diagram:

Series → DataFrame → Index

👉 Show Series as single column, DataFrame as table.

📌 4. Data Cleaning Process (UNIT–3) ⭐⭐⭐

Flowchart:

Raw Data → Missing Value Handling → Transformation → Clean Data

📌 5. Data Wrangling Operations (UNIT–3) ⭐⭐

Diagram:

Merge / Join / Concat → Reshape → Final Dataset

📌 6. Split–Apply–Combine Strategy (UNIT–4) ⭐⭐⭐

Very Important Diagram:

Data → Split (GroupBy) → Apply (Aggregation) → Combine (Result)

📌 7. Histogram & Distribution (UNIT–5) ⭐⭐⭐

Draw:

X-axis: Values
Y-axis: Frequency

👉 Mention: shape, spread, outliers.

📌 8. PMF vs CDF Graph (UNIT–5) ⭐⭐⭐

Draw two graphs:

PMF: discrete bars
CDF: increasing curve

📌 9. Time Series Plot (UNIT–5) ⭐⭐⭐

Draw:

X-axis: Time
Y-axis: Value
Mark trend/seasonality

🏆 FINAL EXAM WRITING STRATEGY

✔ Start answer with definition
✔ Add diagram/flowchart wherever possible
✔ Use keywords from syllabus
✔ End with applications / advantages

👉 This is a complete scoring package for your Data Science exam.

Tuesday, December 16, 2025

