This course provides an overview of Data Science, covering a broad selection of key challenges in and methodologies for working with big data. Topics to be covered include data collection, integration, management, modeling, analysis, visualization, prediction and informed decision making, as well as data security and data privacy. This introductory course is integrative across the core disciplines of Data Science, including databases, data warehousing, statistics, data mining, data visualization, high performance computing, cloud computing, and business intelligence. Professional skills, such as communication, presentation, and storytelling with data, will be fostered. Students will acquire a working knowledge of data science through hands-on projects and case studies in a variety of business, engineering, social sciences, or life sciences domains. Issues of ethics, leadership, and teamwork are highlighted. Prerequisites: None beyond meeting the Data Science admission criteria.
This course surveys the statistical methods most useful in data science applications. Topics covered include predictive modeling methods, including multiple linear regression and time series; data dimension reduction; discrimination and classification methods; clustering methods; and committee methods. Students will implement these methods using statistical software. Prerequisites: Statistics at the level of MA 2611 and MA 2612 and linear algebra at the level of MA 2071.
Emerging applications in science and engineering disciplines generate and collect data at unprecedented speed, scale, and complexity that need to be managed and analyzed efficiently. This course introduces the emerging techniques and infrastructures developed for big data management including parallel and distributed database systems, map-reduce infrastructures, scalable platforms for complex data types, stream processing systems, and cloud-based computing. Query processing, optimization, access methods, storage layouts, and energy management techniques developed on these infrastructures will be covered. Students are expected to engage in hands-on projects using one or more of these technologies. Prerequisites: A beginning course in databases at the level of CS 4432 or equivalent knowledge, and programming experience.
Innovation and discovery are no longer hindered by the ability to collect data, but by the ability to summarize, analyze, and discover knowledge from the collected data in a scalable fashion. This course covers computational techniques and algorithms for analyzing and mining patterns in large-scale datasets. Techniques studied address data analysis issues related to data volume (scalable and distributed analysis), data velocity (high-speed data streams), data variety (complex, heterogeneous, or unstructured data), and data veracity (data uncertainty). Techniques include mining and machine learning techniques for complex data types, and scale-up and scale-out strategies that leverage big data infrastructures. Real-world applications using these techniques, for instance social media analysis and scientific data mining, are selectively discussed. Students are expected to engage in hands-on projects using one or more of these technologies. Prerequisites: A beginning course in databases and a beginning course in data mining, or equivalent knowledge, and programming experience.
An offering of this course will cover a topic of current interest in detail. This serves as a flexible vehicle to provide a one-time offering of topics of current interest as well as to offer new topics before they are made into a permanent course. Prerequisites: will vary with topic.
This course will allow a student to study a chosen topic in Data Science under the guidance of a faculty member affiliated with the Data Science program. The student must produce a written report.
A directed research study, conducted under the guidance of a faculty member affiliated with the Data Science Program, investigates challenges and techniques central to data science, and aims to develop novel approaches and techniques for solving these challenges. The student must produce a written report.
This 3-credit graduate qualifying project, typically done in teams, is to be carried out in cooperation with a sponsor or industrial partner. It must be overseen by a faculty member affiliated with the Data Science Program. This offering integrates theory and practice of Data Science, and should include the utilization of tools and techniques acquired in the Data Science Program. In addition to a written report, this project must be delivered in a formal presentation to faculty of the Data Science Program and sponsors. Professional development skills, such as communication, teamwork, leadership, and collaboration, along with storytelling, will be practiced. Prerequisite: DS 501, completion of at least 24 credits of the DS degree, or consent of the instructor.
A thesis in Data Science consists of a research and development project worth (a minimum of) graduate credit hours advised by a faculty member affiliated with the Data Science Program. A thesis proposal must be approved by the DS Program Review Board and the student's advisor, before the student can register for more than three thesis credits. The student must satisfactorily complete a written thesis document, and present the results to the DS faculty in a public presentation.