### Data Science Master's Program Highlights: PESIT South Campus

### Course Outline:

- **Nine core courses (lecture-based):** 27 credits (16 × 3 = 48 lectures per semester)
- **One Directed Independent Study component:** 3 credits
- **Master's thesis:** 6 credits
- **Total:** 36 credits
- **Span:** four 16-week semesters. Classes will be held after work hours on weekdays and on weekends.
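The credit figures above combine as shown below. This is a sketch under one assumption not stated explicitly in the outline: that "16 × 3 = 48 lectures per semester" means three lectures a week for each 3-credit core course over a 16-week semester.

$$
\begin{aligned}
\text{Core coursework} &= 9 \text{ courses} \times 3 \text{ credits} = 27 \text{ credits}\\
\text{Total} &= 27 + 3 + 6 = 36 \text{ credits}\\
\text{Lectures per core course (assumed)} &= 16 \text{ weeks} \times 3 \text{ lectures/week} = 48 \text{ lectures}
\end{aligned}
$$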

**Degree requirements:** A grade of “C” or better in all coursework, and a “P” (pass) in the thesis, awarded upon successful completion of the scholarly work as approved by the thesis supervisor and the examination committee.

**Evaluation of coursework:**

- 50%: assignments and mini projects
- 50%: end-of-term test (open book)

**Expectations:** Candidates should acquire a solid theoretical understanding of the mathematical and statistical interpretation of data, and become proficient in handling, manipulating, and analyzing massive data sets and in drawing inferences from them. Additionally, they are expected to:

- Approach business problems data-analytically, think them through carefully, and make better-informed decisions, including automated decisions.
- Understand the fundamental principles of data science, including using data to estimate an unknown quantity of interest, computing and using data similarity, fitting models to data, supervised and unsupervised modeling, overfitting and how to avoid it, model evaluation and analytics, visualization, predictive modeling, causal inference, the data mining process, and problem decomposition.
- Be able to apply the most important data science methods using open-source tools.

**Careers upon completion of the program (a selected few)**

- Data and decision science positions
- Investment banking
- Policy research
- Insurance analyst
- Climate modeling and forecasting
- Sociology
- Psychology
- Politics and economics
- Urban progress
- Business analytics
- High-performance computing
- System performance and modeling

### Coursework Information

**Basic courses**

- Introduction to Data Science
- Introduction to Linear Algebra and Matrix Computation
- Probability Models
- Data Mining
- Practical Issues

**Advanced courses**

**Machine Learning and Computational Statistics:** Machine learning, pattern recognition, statistical modeling, mixture models, clustering, neural computation, principal component analysis (PCA), and independent component analysis (ICA).

**Distributed Computing Paradigms:** Data management using MapReduce, Hadoop, SQL/NoSQL, relational algebra, Amazon EC2, and CloudSim.

**Decision Theory and Models:** Decision processes; sequential and non-sequential models; the phases of practical decisions and of decision theory; deciding and valuing; relations and numbers; comparative value terms; completeness and transitivity; preferences in decision-making; numerical representation; utilities in decision-making; outcomes and states of nature; decision matrices; expected utility (EU); objective and subjective utility; appraisal of EU; probability estimates; Bayesianism; variations of expected utility; process utilities and regret theory.

**Graphs and Estimation:** Graph representations; graph-based algorithms such as breadth-first search and depth-first search; graph covering problems (e.g., vertex cover and minimum dominating set); advanced graph-based algorithms such as max-flow algorithms; graph databases; random graphs.

**The World of Data Science: Case Study Modeling**

Internet-scale data analytics, hypothesis formation, data collection, methods of analysis and visualization, storage strategies for analysis, and constructing representative samples. Script-based programming techniques to automate data collection from a variety of third-party sources, such as application programming interfaces (APIs). Methods to store raw data, merge disparate data sets, clean inconsistent entries, and construct derivative data sets.

**Directed Independent Study:** candidates are free to choose one of the following domains:

- Causal Inference: Statistical Methods for Program Evaluation and Policy Research
- Missing Data
- Statistical Analysis of Networks
- Applied Spatial Statistics
- Complete and Sufficient Statistics
- Natural Language Processing (including Statistical Natural Language Processing)
- Social Networks
- Bioinformatics
- Biostatistics
- Forecasting Time Series Data
- Game Theory
- Applied Stochastic Processes for Financial Models
- Mathematics of Investment
- Sampling Techniques
- Scientific Computing
- Neuroscience
- Astrostatistics
- Formal Modeling in Political Science
- Game Theory and Politics

**Master's Thesis: Minimum Expectations**

Candidates are expected to conduct a detailed study of a problem of their choice, review the relevant literature (white papers and technical papers), and write an original or expository report. Plagiarism is considered an offense; a candidate found guilty of plagiarism may not be awarded the degree.

Prepared by: Snehanshu Saha and Anand Narasimhamurthy

Faculty involved:

- Surbhi Agrawal
- Archana Mathur
- Kakoli Bora
- Arun Kumar
- Sai Prasanna
- Swathi Gambhire