unit-1
Define basic data science terminology (e.g., data mining, data quality, similarity measures).
Explain types of data and their characteristics (structured/unstructured, quality aspects).
Describe the five steps of the data science process (collection, cleaning, analysis, modeling, deployment).
Explain universal functions (ufuncs) in NumPy with examples.
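Below is a minimal sketch of NumPy ufuncs, assuming only that NumPy is installed. The arrays are made-up illustration values. Unary ufuncs act element-wise on one array and binary ufuncs combine two arrays element-wise. Every ufunc also carries methods such as reduce and accumulate.

```python
import numpy as np

arr = np.array([1.0, 4.0, 9.0, 16.0])
other = np.array([2.0, 3.0, 10.0, 5.0])

# Unary ufunc: applied element-wise to a single array
roots = np.sqrt(arr)                 # [1., 2., 3., 4.]

# Binary ufuncs: combine two arrays element-wise, no Python loop
sums = np.add(arr, other)            # same as arr + other
largest = np.maximum(arr, other)     # element-wise maximum

# ufunc methods: reduce collapses an axis, accumulate keeps running results
total = np.add.reduce(arr)           # same as arr.sum()
running = np.add.accumulate(arr)     # same as np.cumsum(arr)

print(roots, sums, largest, total, running)
```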
High-Weightage Long Questions (8-10 Marks)
From analyzed past papers and question banks:
Discuss the five steps of data science life cycle with a real-world example.
Explain NumPy ndarray creation, attributes, and indexing/slicing; provide code snippets.
Describe array-oriented programming using NumPy: vectorized computations vs. loops, with performance comparison.
Illustrate file I/O with NumPy arrays (np.save, np.load) and linear algebra functions (dot product, eigenvalues).
Explain pseudorandom number generation in NumPy (np.random module) and its applications in data simulation.
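The long questions above can be practiced with one compact script. It covers ndarray creation and attributes, indexing and slicing, a loop versus vectorized timing, np.save/np.load, a dot product, eigenvalues, and seeded random numbers.

This is a sketch under stated assumptions:
- It uses NumPy 1.17 or later, which provides the np.random.default_rng Generator API. The older np.random.seed/np.random.rand functions also exist.
- The file name "array.npy" is hypothetical.
- The measured timings depend on the machine.

```python
import os
import tempfile
import time

import numpy as np

# --- Creation and attributes ---
a = np.arange(12).reshape(3, 4)        # 3x4 array holding 0..11
print(a.shape, a.ndim, a.size, a.dtype)
zeros = np.zeros((2, 3))
ones = np.ones(3, dtype=np.float32)

# --- Indexing and slicing ---
row = a[1]              # second row: [4, 5, 6, 7]
elem = a[2, 3]          # single element: 11
sub = a[:2, 1:3]        # rows 0-1, columns 1-2 (a view, not a copy)
evens = a[a % 2 == 0]   # boolean indexing returns a 1-D copy

# --- Vectorized computation vs. a Python loop ---
x = np.random.default_rng(0).random(100_000)
start = time.perf_counter()
loop_result = [v * 2.0 + 1.0 for v in x]
loop_time = time.perf_counter() - start

start = time.perf_counter()
vec_result = x * 2.0 + 1.0
vec_time = time.perf_counter() - start
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")

# --- File I/O in NumPy's binary .npy format ---
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array.npy")   # hypothetical file name
    np.save(path, a)
    loaded = np.load(path)

# --- Linear algebra ---
m = np.array([[2.0, 0.0], [0.0, 3.0]])
v = np.array([1.0, 1.0])
dot = m @ v                                  # same as np.dot(m, v)
eigenvalues, eigenvectors = np.linalg.eig(m)

# --- Pseudorandom numbers for simulation ---
rng = np.random.default_rng(seed=42)         # seeded, so draws are reproducible
normal_sample = rng.normal(loc=0.0, scale=1.0, size=5)
dice = rng.integers(1, 7, size=10)           # simulated die rolls in 1..6
```

On most machines the vectorized line runs far faster than the list comprehension. The multiply and add execute in compiled code over the whole array, instead of once per element in the Python interpreter.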
unit-2
Data Exploration with Pandas focuses on data loading, manipulation, and summary stats using Pandas structures, a key Unit II topic in JNTUK M.Tech Data Science syllabi at St. Mary's. Past papers from JNTUK affiliates emphasize practical functions, DataFrame operations, and file handling, with repeats in R19/R20 exams.
Repeated Short Answer Questions (2-5 Marks)
Common from analyzed JNTUK data science papers:
Define Pandas Series, DataFrame, and Index objects with examples.
Explain essential functionality: head(), tail(), describe() for data exploration.
Differentiate correlation vs. covariance; compute unique values and value counts.
List steps in the data exploration process (profiling, cleaning, visualization).
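Below is a minimal sketch of the Unit II short-answer topics, assuming pandas is installed. The column names and values are made up for illustration.

The sketch shows:
- the three core objects: Series, DataFrame, and Index;
- the quick-look methods head(), tail(), and describe();
- covariance next to correlation;
- unique() and value_counts().

```python
import pandas as pd

# Series: a 1-D array with row labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: a table of columns that share one row index
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B"],
    "temp": [31.0, 34.0, 30.0, 32.0, 35.0],
    "humidity": [78.0, 60.0, 80.0, 75.0, 58.0],
})

# Index: immutable labels; df.columns is itself an Index
idx = pd.Index(["x", "y", "z"])
print(df.columns)

# Quick exploration
print(df.head(3))      # first 3 rows
print(df.tail(2))      # last 2 rows
print(df.describe())   # count, mean, std, min, quartiles, max of numeric columns

# Covariance is in the variables' units; correlation rescales it to [-1, 1]
cov = df["temp"].cov(df["humidity"])
corr = df["temp"].corr(df["humidity"])

# Unique values (in order of first appearance) and their frequencies
print(df["city"].unique())
print(df["city"].value_counts())
```

Correlation is covariance divided by the product of the two standard deviations. That is why it is unit-free and easier to compare across variable pairs.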
High-Weightage Long Questions (8-10 Marks)
Frequently asked in previous years:
Describe Pandas data structures in detail; create a DataFrame from CSV and perform descriptive statistics (mean, std, correlation matrix).
Explain data loading methods: read_csv(), read_excel(); handle text/binary formats with examples (e.g., skiprows, delimiters).
Illustrate summarizing data: value_counts(), membership (isin()), pivot tables; compare groupby() vs. pivot_table().
Discuss interacting with web APIs (requests library) and databases (SQLAlchemy with Pandas); write code for reading JSON/XML data.
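The sketch below exercises several of the long questions above without touching a real file, database, or web API. The CSV and JSON text are made-up strings wrapped in io.StringIO.

It shows three things:
- reading a delimited file with the sep and skiprows options;
- membership filtering with isin() and frequencies with value_counts();
- the same aggregation done two ways, with groupby() (long result) and pivot_table() (wide grid).

A JSON string stands in for the response body a web API might return; no HTTP call is made. XML reading via pd.read_xml is not shown.

```python
import io

import pandas as pd

# Made-up semicolon-delimited export with two comment lines above the header
raw = """# exported by a hypothetical billing system
# units: INR
region;product;sales
North;Pen;120
South;Pen;80
North;Book;200
South;Book;150
North;Pen;60
"""

# sep sets the delimiter; skiprows skips the first two lines
df = pd.read_csv(io.StringIO(raw), sep=";", skiprows=2)

# Membership: keep rows whose product appears in the given list
books = df[df["product"].isin(["Book"])]

# Frequency of each region
counts = df["region"].value_counts()

# groupby(): result indexed by the group keys (long format)
by_group = df.groupby(["region", "product"])["sales"].sum()

# pivot_table(): same aggregation laid out as a region x product grid
pivot = pd.pivot_table(df, values="sales", index="region",
                       columns="product", aggfunc="sum")

# JSON records, e.g. a web API response body, loaded into a DataFrame
payload = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
from_json = pd.read_json(io.StringIO(payload))
```

Both by_group and pivot hold the same numbers; only the shape differs. pivot_table is usually the more readable choice when the result is meant to be read as a table.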
Exam Tips
Practice coding demos such as merging DataFrames (pd.merge(), pd.concat()) and handling duplicates/missing values, as they appear in 70% of papers. Use the JNTUK model question papers (QPs) from firstranker.com to align your Unit II preparation with the St. Mary's syllabus.
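Below is a short practice demo of the operations named in this tip, assuming pandas is installed. The student and marks tables are hypothetical.

The demo runs in this order:
1. drop duplicate rows;
2. join two tables on a key with pd.merge();
3. stack tables with pd.concat();
4. count, fill, and drop missing values.

```python
import pandas as pd

students = pd.DataFrame({"roll": [1, 2, 3, 3],
                         "name": ["Asha", "Ravi", "Kiran", "Kiran"]})
marks = pd.DataFrame({"roll": [1, 2, 4], "score": [85.0, None, 70.0]})

# Remove exact duplicate rows (roll 3 appears twice)
students = students.drop_duplicates()

# merge(): SQL-style join on a key column; how can be inner, left, right, outer
inner = pd.merge(students, marks, on="roll", how="inner")   # rolls present in both
outer = pd.merge(students, marks, on="roll", how="outer")   # union of rolls

# concat(): stack DataFrames along rows (axis=0 by default)
more_students = pd.DataFrame({"roll": [5], "name": ["Meena"]})
all_students = pd.concat([students, more_students], ignore_index=True)

# Missing values: count per column, then fill or drop
print(outer.isna().sum())
filled = outer.fillna({"score": outer["score"].mean(), "name": "unknown"})
dropped = outer.dropna()
```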
unit-3
Data Cleaning, Preparation, and Data Wrangling covers handling missing values, transformations, string operations, and reshaping data with Pandas, typically Unit III in JNTUK M.Tech Data Science syllabi at St. Mary's. Previous JNTUK papers stress practical Pandas methods like dropna(), apply(), regex, merge(), and pivot(), with high repetition in R19/R20 exams across affiliates.
Repeated Short Answer Questions (2-5 Marks)
Frequently tested basics:
Explain handling missing data in Pandas (dropna(), fillna(), interpolate()).
Describe string object methods (.str accessor) and vectorized functions (str.contains(), str.replace()).
Differentiate join() vs. merge() in data wrangling; list the types of joins (inner, outer, left, right).
How are regular expressions used in Pandas? Give the syntax for common patterns (e.g., \d+).
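Below is a minimal sketch of the Unit III short-answer topics, assuming pandas and NumPy are installed. The e-mail addresses and codes are invented sample strings.

The sketch covers:
- missing data with dropna(), fillna(), and interpolate();
- vectorized .str methods;
- regular expressions passed to .str.extract() and .str.contains();
- merge(), which matches on columns, next to join(), which matches on the index.

```python
import numpy as np
import pandas as pd

# --- Missing data ---
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])
print(s.dropna())        # drop the NaN entries
print(s.fillna(0))       # replace NaN with a constant
print(s.interpolate())   # linear interpolation: 1, 2, 3, 4, 5

# --- Vectorized string methods through the .str accessor ---
emails = pd.Series(["asha@college.edu", "ravi@mail.com", "kiran@college.edu"])
is_college = emails.str.contains("college")            # boolean Series
masked = emails.str.replace("@", " at ", regex=False)  # literal replacement
user = emails.str.split("@").str[0]                    # part before "@"

# --- Regular expressions ---
codes = pd.Series(["R19-CS-101", "R20-DS-205", "no code"])
digits = codes.str.extract(r"(\d+)$")    # capture trailing digits; NaN if no match
has_number = codes.str.contains(r"\d+", regex=True)

# --- merge (on columns) vs. join (on the index) ---
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})
merged = pd.merge(left, right, on="key", how="left")
joined = left.set_index("key").join(right.set_index("key"), how="inner")
```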
High-Weightage Long Questions (8-10 Marks)
Common from question banks and past papers:
Discuss the data wrangling process: handling missing data, transformation, and reshaping; provide code for detecting/filling NaNs in a DataFrame.
Explain string manipulation with examples: regex for extraction, vectorized .str methods vs. loops for performance.
Illustrate combining/merging datasets (pd.concat(), pd.merge()); compare with hierarchical indexing (MultiIndex) and reshaping (stack(), pivot()).
Describe a full data preparation pipeline: cleaning (duplicates, outliers), transformation (normalization, encoding), and pivoting for analysis with code snippets.
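The pipeline questions above can be sketched on a tiny made-up salary table. The department names and values are illustrative, and 60.0 is a planted outlier.

The sketch makes a few choices that are assumptions, not the only valid options:
- outliers are removed with the 1.5 x IQR rule;
- numbers are rescaled with min-max normalization;
- categories are one-hot encoded with pd.get_dummies().

Reshaping uses pivot() to go from long to wide, and stack()/unstack() to move between a wide table and a MultiIndex Series.

```python
import pandas as pd

raw = pd.DataFrame({
    "dept": ["CSE", "ECE", "CSE", "CSE", "ECE", "ECE"],
    "year": [2021, 2021, 2022, 2022, 2022, 2022],
    "salary": [5.0, 4.5, 6.0, 6.0, 4.8, 60.0],   # 60.0 is a planted outlier
})

# 1. Cleaning: drop duplicate rows, then drop outliers with the IQR rule
df = raw.drop_duplicates()
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask].copy()

# 2. Transformation: min-max normalization to [0, 1] and one-hot encoding
smin, smax = df["salary"].min(), df["salary"].max()
df["salary_norm"] = (df["salary"] - smin) / (smax - smin)
encoded = pd.get_dummies(df, columns=["dept"])

# 3. Reshaping: long -> wide with pivot(), then stack/unstack via a MultiIndex
wide = df.pivot(index="year", columns="dept", values="salary")
stacked = wide.stack()      # wide -> long Series indexed by (year, dept)
back = stacked.unstack()    # long -> wide again
```

The outlier has to be removed before pivot(). pivot() raises an error when an (index, column) pair occurs twice, and ECE/2022 would otherwise appear twice.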
Exam Tips
Focus on code examples (e.g., df.groupby().apply() for transformations, pd.pivot_table() for reshaping) as they score high; practice with CSV datasets. Align with St. Mary's R20 syllabus by reviewing SIETK/JNTUK banks for identical patterns.
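Below is a short sketch of the two patterns this tip names, assuming pandas is installed and using a hypothetical marks table. groupby().apply() runs an arbitrary function per group. pd.pivot_table() reshapes the data into a grid.

The sketch also includes groupby().transform(). It is a common alternative when the result must keep the original row shape, for example centring each value on its group mean.

```python
import pandas as pd

df = pd.DataFrame({
    "branch": ["CSE", "CSE", "ECE", "ECE", "ECE"],
    "sem": [1, 2, 1, 2, 2],
    "marks": [70.0, 80.0, 60.0, 90.0, 75.0],
})

# apply(): arbitrary per-group function, here the range of marks in each branch
spread = df.groupby("branch")["marks"].apply(lambda s: s.max() - s.min())

# transform(): same-shaped output, here each mark minus its branch mean
df["marks_centered"] = df["marks"] - df.groupby("branch")["marks"].transform("mean")

# pivot_table(): branch x sem grid holding the mean marks
table = pd.pivot_table(df, values="marks", index="branch",
                       columns="sem", aggfunc="mean")
```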
unit-4
Data Visualization with Matplotlib and Group Operations covers plotting APIs, integration with Pandas/Seaborn, and GroupBy mechanics, forming Unit IV in JNTUK M.Tech Data Science syllabi at St. Mary's. Past papers highlight Matplotlib customization, aggregation functions, and pivot tables, with consistent repeats in R19/R20 exams mirroring patterns from similar JNTUK affiliates.
Repeated Short Answer Questions (2-5 Marks)
Frequently appearing basics:
Give a primer on the Matplotlib API: key functions such as plt.plot(), plt.show(), and plt.subplots().
Differentiate plotting with Pandas (.plot()) vs. pure Matplotlib; list Seaborn advantages.
Describe GroupBy mechanics: split-apply-combine process with syntax.
How do pivot tables differ from cross-tabulations? Give a pd.crosstab() example.
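Below is a minimal split-apply-combine sketch on a made-up score table, assuming pandas is installed. The data is split by gender and several aggregations are applied with agg(), including the named-aggregation form. The per-group results are then combined into one table.

The second half contrasts the two reshaping tools:
- pd.crosstab() counts co-occurrences of two categorical columns;
- pd.pivot_table() aggregates a value column, so with aggfunc="count" it reproduces the same counts.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "grade": ["A", "B", "A", "A", "B", "B"],
    "score": [91, 78, 88, 85, 74, 70],
})

# Split by gender, apply several aggregations, combine into one table
stats = df.groupby("gender")["score"].agg(["mean", "sum", "count"])

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("gender").agg(avg_score=("score", "mean"),
                                   top=("score", "max"))

# crosstab: frequency table of two categorical columns
counts = pd.crosstab(df["gender"], df["grade"])

# pivot_table: aggregates a value column; "count" reproduces the crosstab here
same_counts = pd.pivot_table(df, values="score", index="gender",
                             columns="grade", aggfunc="count")
```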
High-Weightage Long Questions (8-10 Marks)
Top repeats from question banks:
Discuss Matplotlib components (figures, axes, ticks); explain customization of labels, legends, annotations with code examples (plt.xlabel(), plt.legend(), plt.annotate()).
Illustrate data aggregation: apply(), agg() functions; compute group statistics (mean, sum) on a sample DataFrame.
Explain plotting types (line, bar, scatter, pie) and Seaborn integration; provide code for subplots and saving plots (plt.savefig()).
Describe the full GroupBy workflow: hierarchical indexing, merging datasets, reshaping with pivot_table(); compare with stack()/unstack().
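The Matplotlib questions above can be sketched with the object-oriented API, assuming Matplotlib and NumPy are installed. The sketch uses the non-interactive "Agg" backend, so it writes an image file instead of opening a window. The output file name "plots.png" is hypothetical.

Each ax.set_xlabel(), ax.legend(), and ax.annotate() call has a pyplot counterpart: plt.xlabel(), plt.legend(), and plt.annotate(). Those act on the current Axes. Seaborn is not used here.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")            # render to files only, no GUI window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# A Figure contains Axes; subplots() creates a 2x2 grid of Axes at once
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Line plot with axis labels, a legend, and an annotation arrow
ax = axes[0, 0]
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Line")
ax.legend()
ax.annotate("peak", xy=(np.pi / 2, 1.0), xytext=(2.5, 0.6),
            arrowprops=dict(arrowstyle="->"))

# Bar, scatter, and pie plots on the remaining Axes
axes[0, 1].bar(["A", "B", "C"], [5, 3, 7])
axes[0, 1].set_title("Bar")
rng = np.random.default_rng(1)
axes[1, 0].scatter(rng.random(30), rng.random(30))
axes[1, 0].set_title("Scatter")
axes[1, 1].pie([40, 35, 25], labels=["X", "Y", "Z"], autopct="%1.0f%%")
axes[1, 1].set_title("Pie")

fig.tight_layout()               # reduce overlap between subplots

# Save the whole figure as a PNG, then release its memory
out_path = os.path.join(tempfile.mkdtemp(), "plots.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
```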
Exam Tips
Emphasize code snippets for plots (e.g., sns.heatmap(corr_matrix), df.groupby('category').plot()) and aggregation demos, as they dominate 8-mark questions. Practice with JNTUK model papers; St. Mary's follows identical R20 patterns.
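Below is a sketch of the two plot snippets this tip mentions, assuming Matplotlib, NumPy, and pandas are installed. The data is randomly generated, and "eda.png" is a hypothetical output name.

As a swapped-in technique, the correlation heatmap is drawn with Matplotlib's imshow() instead of sns.heatmap(), so the sketch does not depend on Seaborn. With Seaborn available, sns.heatmap(corr_matrix, annot=True) produces a similar chart.

The group plot aggregates first with groupby().mean() and then uses the pandas .plot() wrapper. Calling df.groupby('category').plot() directly draws one plot per group instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category": rng.choice(["A", "B", "C"], size=60),
    "x": rng.normal(size=60),
})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=60)  # correlated with x by construction
df["z"] = rng.normal(size=60)                           # independent noise

corr_matrix = df[["x", "y", "z"]].corr()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Correlation heatmap with plain Matplotlib (stand-in for sns.heatmap)
im = ax1.imshow(corr_matrix.values, vmin=-1, vmax=1, cmap="coolwarm")
ax1.set_xticks(range(len(corr_matrix.columns)))
ax1.set_xticklabels(corr_matrix.columns)
ax1.set_yticks(range(len(corr_matrix.index)))
ax1.set_yticklabels(corr_matrix.index)
fig.colorbar(im, ax=ax1)
ax1.set_title("Correlation matrix")

# Aggregate per category, then plot with the pandas .plot() wrapper
df.groupby("category")["y"].mean().plot(kind="bar", ax=ax2,
                                        title="Mean y per category")

out_path = os.path.join(tempfile.mkdtemp(), "eda.png")
fig.savefig(out_path)
plt.close(fig)
```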
unit-5
Statistical Thinking and Time Series Analysis covers distributions, probability functions, outlier detection, and Pandas datetime tools, matching Unit V in JNTUK M.Tech Data Science syllabi (R20) at St. Mary's. JNTUK previous papers consistently test visualization of stats, PMF/CDF computations, and time series manipulations, drawing from shared question patterns across affiliates.
Repeated Short Answer Questions (2-5 Marks)
Common from Unit V exams:
Define probability mass function (PMF) and cumulative distribution function (CDF); explain percentile-based statistics.
Describe methods for detecting and handling outliers in distributions (IQR, z-score).
Explain summarizing distributions: variance, histograms, and reporting results with examples.
List Pandas tools for time series: pd.to_datetime(), date ranges, frequencies (resample()).
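Below is a minimal sketch of the distribution topics above, assuming NumPy and pandas are installed. Every data point is simulated: 1,000 die rolls, plus a normal sample with one planted value of 120.

The sketch computes:
- the PMF as normalized value counts, with the CDF as their running sum, so CDF(x) = P(X <= x);
- the median, the 90th percentile, the variance, and histogram counts;
- outliers flagged by the 1.5 x IQR rule and by |z| > 3.

The cut-offs 1.5 and 3 are common conventions, not fixed rules.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
rolls = pd.Series(rng.integers(1, 7, size=1000))   # simulated die rolls

# PMF: relative frequency of each value; CDF: cumulative sum of the PMF
pmf = rolls.value_counts(normalize=True).sort_index()
cdf = pmf.cumsum()

# Percentile-based statistics
median = rolls.quantile(0.5)
p90 = np.percentile(rolls, 90)

# Spread and shape: sample variance (ddof=1) and histogram counts per bin
variance = rolls.var()
hist_counts, bin_edges = np.histogram(rolls, bins=6, range=(0.5, 6.5))

# Outlier detection on continuous data with one planted extreme value
data = pd.Series(np.append(rng.normal(50, 5, size=200), [120.0]))
q1, q3 = data.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

z = (data - data.mean()) / data.std()
z_outliers = data[z.abs() > 3]
cleaned = data[z.abs() <= 3]                # data with z-score outliers removed
```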
High-Weightage Long Questions (8-10 Marks)
Frequently repeated high-scorers:
Discuss statistical thinking process: plotting histograms/CDFs, variance computation, comparing percentile ranks; provide code for random number generation and distribution analysis.
Explain time series basics: date/time data types, shifting data (shift()), time zones (tz_convert()); illustrate period arithmetic with freq='D'.
Illustrate the full workflow: outlier detection/removal, distribution summarization (boxplots, quantiles), and PMF/CDF representation using Matplotlib/Seaborn.
Describe time series handling: generating date ranges (pd.date_range()), resampling frequencies, and period indexing; compare asfreq() with resample().
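Below is a minimal sketch of the time series questions above on a made-up 10-day daily series, assuming pandas and NumPy are installed. It uses the "Asia/Kolkata" zone as an example target.

It walks through these steps:
1. parse dates with pd.to_datetime() and build a daily index with pd.date_range();
2. lag values with shift();
3. localize the index to UTC and convert it with tz_convert();
4. do Period arithmetic with freq='D';
5. contrast resample() with asfreq().

```python
import numpy as np
import pandas as pd

# Parse strings into datetimes
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"])

# Regular daily date range as the index of a series
idx = pd.date_range(start="2024-01-01", periods=10, freq="D")
ts = pd.Series(np.arange(10, dtype=float), index=idx)

# shift(): move values one step later in time (first value becomes NaN)
lagged = ts.shift(1)
change = ts - lagged            # day-over-day difference

# Time zones: attach UTC to the naive index, then convert to another zone
utc = ts.tz_localize("UTC")
ist = utc.tz_convert("Asia/Kolkata")

# Period arithmetic: adding an integer moves by whole periods of the frequency
p = pd.Period("2024-01-30", freq="D")
p_plus = p + 3                  # three days later, crossing into February
monthly = pd.period_range("2024-01", periods=3, freq="M")

# resample(): bins rows into the new frequency and aggregates each bin
weekly_sum = ts.resample("W").sum()

# asfreq(): only picks the values at the new frequency, no aggregation
every_3_days = ts.asfreq("3D")
```

With weekly bins ending on Sunday, resample("W").sum() adds up every day in each week. asfreq("3D") just keeps every third day's value and discards the rest.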