DS 2500

Intermediate Programming with Data

Intermediate Python programming, APIs, exploratory data analysis, statistical modeling, machine learning foundations, and collaborative software development.

Python, Data Science, Pandas, APIs, Machine Learning

About

Programming with data, not just for data.

DS 2500 focused on intermediate Python programming for data science: object-oriented design, working with APIs, cleaning messy real-world datasets, building visualizations, and machine learning fundamentals, all practiced within a collaborative software engineering workflow.

The semester culminated in a team research project analyzing consumer banking complaints using live CFPB data. We collected roughly 35,000 records through the CFPB API, explored complaint patterns across major institutions, and evaluated whether regulatory enforcement actions appear to change harmful consumer outcomes. It was my first complete end-to-end data science pipeline.
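The collection step can be illustrated with a minimal sketch of offset-based pagination. This is not the project's actual code: `collect_records`, `fetch_page`, and the offset/size interface are hypothetical names chosen for illustration and are not the CFPB API's real parameters. The fake data source stands in for an HTTP call so the example runs offline.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def collect_records(
    fetch_page: Callable[[int, int], List[Record]],
    page_size: int = 100,
    max_records: int = 35_000,
) -> List[Record]:
    """Collect records page by page until the source runs dry or the cap is hit.

    fetch_page(offset, size) is a hypothetical callable returning up to `size`
    records starting at `offset`; in a real pipeline it would wrap an HTTP
    request to the API.
    """
    records: List[Record] = []
    offset = 0
    while len(records) < max_records:
        # Never request more than is still needed to reach the cap.
        size = min(page_size, max_records - len(records))
        page = fetch_page(offset, size)
        if not page:
            break  # an empty page means nothing is left to fetch
        records.extend(page)
        offset += len(page)
        if len(page) < size:
            break  # a short page is the last page
    return records


# Stand-in data source (made-up records) so the sketch runs without network access.
_FAKE_COMPLAINTS = [{"complaint_id": i, "company": "Example Bank"} for i in range(250)]


def fake_fetch_page(offset: int, size: int) -> List[Record]:
    return _FAKE_COMPLAINTS[offset : offset + size]


all_records = collect_records(fake_fetch_page, page_size=100)
capped_records = collect_records(fake_fetch_page, page_size=100, max_records=120)
print(len(all_records), len(capped_records))
```

Passing the page fetcher in as a function keeps the loop logic separate from the network code, which makes the stopping conditions easy to test on their own.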

Skills

Core competencies from this course.

Python
Pandas
Matplotlib
JSON APIs
Data Cleaning
Feature Engineering
Exploratory Data Analysis
Statistical Analysis
Machine Learning Foundations
Git Collaboration
Object-Oriented Programming
Research Communication

Featured Project

Banking fraud & corporate misconduct.

Capstone Research Project

Banking Fraud & Corporate Misconduct

Do Regulatory Penalties Change Bank Behavior?

Using roughly 35,000 CFPB consumer complaints collected through the CFPB API, our team investigated whether major enforcement actions against large banks appear to reduce harmful consumer complaint patterns. The project combined API-based data collection, exploratory analysis, visualization, and statistical comparisons across multiple financial institutions.
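One simple form such a statistical comparison could take is a before/after comparison of average monthly complaint counts around an enforcement date. The sketch below is an illustration under stated assumptions, not the team's actual method: the function names, the sample dates, and the action date are all made up, and it uses only the standard library (the same step could be written with Pandas). A difference in means on its own does not show that the enforcement action caused the change.

```python
from collections import Counter
from datetime import date
from statistics import mean
from typing import Dict, Iterable, Tuple

Month = Tuple[int, int]  # (year, month)


def monthly_counts(dates: Iterable[date]) -> Dict[Month, int]:
    """Count complaints per (year, month).

    Months with no complaints do not appear in the result.
    """
    return dict(Counter((d.year, d.month) for d in dates))


def before_after_means(dates: Iterable[date], action_date: date) -> Tuple[float, float]:
    """Return mean monthly complaint counts before and after the action month."""
    counts = monthly_counts(dates)
    action_month = (action_date.year, action_date.month)
    # The month containing the action is excluded because it straddles both periods.
    before = [n for month, n in counts.items() if month < action_month]
    after = [n for month, n in counts.items() if month > action_month]
    if not before or not after:
        raise ValueError("need at least one month of data on each side of the action")
    return mean(before), mean(after)


# Made-up complaint dates for illustration only.
sample_dates = (
    [date(2023, 1, d) for d in range(1, 5)]    # 4 complaints in January
    + [date(2023, 2, d) for d in range(1, 7)]  # 6 complaints in February
    + [date(2023, 3, d) for d in range(1, 6)]  # 5 complaints in March (action month)
    + [date(2023, 4, d) for d in range(1, 4)]  # 3 complaints in April
    + [date(2023, 5, 1)]                       # 1 complaint in May
)

before_mean, after_mean = before_after_means(sample_dates, date(2023, 3, 15))
print(before_mean, after_mean)
```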

Project Preview

Visuals from the capstone analysis.

Downloads

Reports, slides, and course files.

Learning Outcomes

What this course taught me to do.

1. API Development

Querying external data sources programmatically and handling paginated JSON responses.

2. JSON Processing

Parsing, filtering, and restructuring nested API payloads into analysis-ready tables.

3. Data Visualization

Communicating patterns clearly with Matplotlib charts tuned for technical audiences.

4. Exploratory Analysis

Investigating distributions, outliers, and relationships before formal modeling.

5. Object-Oriented Programming

Designing reusable Python classes that organize data pipelines and project logic.

6. Python Programming

Writing intermediate-level scripts with functions, modules, and disciplined structure.

7. Machine Learning Concepts

Applying foundational ML ideas to real datasets with appropriate skepticism and validation.

8. Team Software Development

Dividing responsibilities, integrating contributions, and shipping a shared codebase.

9. Professional Research

Framing questions, citing sources, and defending conclusions with evidence.

10. Git Collaboration

Branching, merging, and reviewing teammate changes on a shared repository.

11. Statistical Thinking

Comparing groups, interpreting variation, and avoiding overclaiming from noisy data.

12. Technical Presentation

Distilling complex analysis into slides that non-specialists can follow.
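As a small illustration of the JSON Processing and Object-Oriented Programming outcomes, the sketch below flattens a nested payload into flat rows and wraps them in a reusable class. Everything here is hypothetical: the `flatten` helper, the `ComplaintTable` class, and the payload fields are invented for the example and do not reflect the course's code or the CFPB response schema.

```python
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


class ComplaintTable:
    """Hypothetical container that turns a raw JSON payload into filterable rows."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    @classmethod
    def from_json(cls, payload: str) -> "ComplaintTable":
        """Parse a JSON array of objects and flatten each one into a row."""
        return cls([flatten(item) for item in json.loads(payload)])

    def where(self, column: str, value: Any) -> "ComplaintTable":
        """Return a new table holding only the rows where column equals value."""
        return ComplaintTable([row for row in self.rows if row.get(column) == value])

    def __len__(self) -> int:
        return len(self.rows)


# Made-up payload for illustration only.
payload = json.dumps([
    {"id": 1, "company": {"name": "Example Bank"},
     "issue": {"product": "Mortgage", "timely": True}},
    {"id": 2, "company": {"name": "Sample Credit Union"},
     "issue": {"product": "Credit card", "timely": False}},
    {"id": 3, "company": {"name": "Example Bank"},
     "issue": {"product": "Credit card", "timely": True}},
])

table = ComplaintTable.from_json(payload)
example_bank = table.where("company.name", "Example Bank")
print(len(table), len(example_bank))
print(table.rows[0])
```

Because `where` returns a new table instead of modifying the existing one, filters can be chained without losing the original rows.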

Reflection

My first end-to-end data science workflow.

DS 2500 significantly strengthened my programming abilities beyond introductory Python. I learned to work with real-world APIs, clean messy datasets that never arrive in tidy CSV form, and build reproducible analysis pipelines my teammates could trust. Creating professional visualizations and communicating technical findings through written reports and presentations became as important as writing the code itself.

The CFPB banking project was my first complete end-to-end data science workflow, from API collection through exploratory analysis to a defended conclusion. Collaborating with Git, dividing analytical responsibilities, and integrating our work into a single narrative taught me how data science actually happens in teams. That foundation directly prepared me for future machine learning and analytics coursework, where rigor, reproducibility, and clear communication are non-negotiable.