Object Oriented Programming

From TaDa Wiki

Before getting into any substance, a few caveats:

  • Explaining object-oriented programming (OOP) is a little like explaining how to ride a bike -- it can be helpful, but don't be surprised if you don't feel like an expert after reading this. OOP is about how you organize your programs, and the only way to really get a handle on it is through practice and seeing lots of examples!
  • OOP is primarily about how people design software, not how people use software. In most cases, as a user, whether something is OOP or not is basically irrelevant. Not always, but often. Still, you hear about it often enough that I thought I'd put up a page here so you know what it's about (and can decide for yourself whether you care).


The core idea of OOP is that code should be modular, meaning all the pieces of a program should be organized into small, self-contained units. In OOP, these units are called objects.

Each object can be thought of as having two main components:

  1. Some data stored inside the object;
  2. A set of functions to manipulate that data.

In a data-analysis setting, each data structure -- like a matrix, vector, or dataframe -- is usually an object. A matrix includes the data in that matrix, as well as a number of functions for manipulating that data. But the principle is much more general -- for example, if you were designing a program to play music, you could create a "song" object that contains all the code necessary to play or modify a song, as well as the data for the song itself.
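To make the "data plus functions" idea concrete, here is a minimal sketch in Python of the "song" object described above. Everything in it is hypothetical and chosen only for illustration: the class name Song, the attributes title and notes, and the functions transpose and describe do not come from any real music library, and the notes are just made-up numbers.

```python
class Song:
    """A hypothetical 'song' object: some data plus functions that act on it."""

    def __init__(self, title, notes):
        # 1. The data stored inside the object.
        self.title = title
        self.notes = list(notes)

    # 2. The functions that manipulate that data.
    def transpose(self, steps):
        """Shift every note up (or down) by the given number of steps."""
        self.notes = [note + steps for note in self.notes]

    def describe(self):
        """Return a short text summary of the song."""
        return f"{self.title}: {len(self.notes)} notes"


song = Song("Scale", [60, 62, 64])
song.transpose(2)
print(song.notes)       # [62, 64, 66]
print(song.describe())  # Scale: 3 notes
```

Notice that the data (title, notes) and the code that works on it travel together: anyone who has a Song also has everything needed to modify or summarize it.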

Before you get too worried about what an object is, though, remember that it's just an abstraction created by computer scientists to help organize code. It's an organizational tool, not a fundamental part of how your computer works.

Why OOP?

There are a few reasons OOP is popular, but probably the biggest are that (a) it makes software development easier, and (b) it helps prevent program bugs.

Software Development

OOP was not created with data scientists in mind. To understand the value of OOP, it's helpful to imagine a team of programmers trying to develop a large piece of software.

Since there's no way for everyone on a team to collectively write each line of code, large pieces of software are broken down into smaller pieces. For example, a team of software engineers trying to design a big media-player like iTunes might have one team develop software to play music files; another team might develop the software to play video files, etc. But how do they make sure that when they combine these different pieces of software, they all work together?

This is the value of OOP -- if each team writes their code so that it is as self-contained as possible (meaning that, for example, the video-player object includes all the code it needs to play video), the chances that everything will work as predicted when it is brought together increase.

Moreover, imagine that a few years down the road, this same company wanted to update their software with a new music player! Because everything is self-contained, they can just replace the music-player component of their code without worrying that those changes will ruin the video-player they already built.
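The media-player story can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the class names (OldMusicPlayer, NewMusicPlayer, VideoPlayer, MediaApp) and the play function are invented for this example, and the "players" just return a message instead of playing anything.

```python
class OldMusicPlayer:
    """The music team's original, self-contained component."""

    def play(self, filename):
        return f"Playing {filename} with the old music player"


class NewMusicPlayer:
    """A later replacement that offers the same play() function."""

    def play(self, filename):
        return f"Playing {filename} with the new music player"


class VideoPlayer:
    """The video team's component, written independently."""

    def play(self, filename):
        return f"Playing {filename} with the video player"


class MediaApp:
    """Combines the pieces; it only relies on each piece having play()."""

    def __init__(self, music_player, video_player):
        self.music_player = music_player
        self.video_player = video_player

    def play_music(self, filename):
        return self.music_player.play(filename)

    def play_video(self, filename):
        return self.video_player.play(filename)


app = MediaApp(OldMusicPlayer(), VideoPlayer())
print(app.play_music("song.mp3"))

# Years later: swap in the new music player; the video code is untouched.
upgraded = MediaApp(NewMusicPlayer(), VideoPlayer())
print(upgraded.play_music("song.mp3"))
print(upgraded.play_video("clip.mp4"))
```

The swap works because MediaApp never looks inside the music player; it only calls play. As long as the replacement keeps that one function, the rest of the program does not need to change.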


The other advantage of OOP is that it allows programmers to limit the ways in which users can interact with the data inside an object, making programs more stable. In the OOP paradigm, the only way to modify the data inside an object is by using one of the functions built into that object. This allows programmers to control what users can do. For example, if a programmer were writing the code for a vector, they could set it up so that if you try to do something crazy, like add 100 items to a vector that only has space for 10 items, it will print an error message saying "That's not allowed!" rather than allowing the computer to try to execute the command, causing your program to crash.
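Here is a rough Python sketch of the vector example. The class BoundedVector and its add function are hypothetical names made up for this page, not part of Python or any library. One caveat: Python does not strictly prevent users from reaching into an object; the leading underscore in _items is only a convention that says "internal, please use the provided functions instead."

```python
class BoundedVector:
    """A hypothetical vector with a fixed amount of space."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = []  # internal data; users are meant to go through add()

    def add(self, item):
        """Add an item, refusing if the vector is already full."""
        if len(self._items) >= self._capacity:
            raise ValueError("That's not allowed! The vector is full.")
        self._items.append(item)

    def __len__(self):
        return len(self._items)


vec = BoundedVector(capacity=10)
for i in range(10):
    vec.add(i)  # the first 10 items fit

try:
    vec.add(10)  # an 11th item does not fit
    message = None
except ValueError as err:
    message = str(err)
    print(message)  # That's not allowed! The vector is full.
```

Because every change goes through add, the check for available space runs every time, and the user gets a clear error message instead of a corrupted vector.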