Hello and welcome to Relational Database Design. This is a really interesting topic and it's a topic that I think you can learn pretty quickly. But it's sort of like the artistry of what databases are. And so what I'm going to be teaching you is kind of techniques. And I'm going to assume that at some level you're going to not think that you're a great database designer and you're going to get better the more you do it. And it also really helps to do it with other people. I know that when I learned to do database design, I did it with a group of people and I was in a conference room and we had a whiteboard and we were drawing pictures. And people who knew more than me were drawing pictures. And I was able to get it by drawing pictures. So I'm going to simulate that in this lecture. But at some point, for you to get really good, you're going to have to work with people who know more than you, who can mentor you, and then it comes to you pretty fast. So the basic idea of database design is you're going to end up drawing a picture of the shape of data. And so you're going to end up with something that looks like this picture. This is a picture from a database for software that I write, that I use for learning technology that I build for autograders for classes, etc. And this happens to be a database that's thinking about students, it's thinking about courses, it's thinking about memberships in those courses, it's thinking about roles, it's thinking about different courses. And so eventually I have to draw this picture. And so we're going to try to get you to draw pictures, and they have lines between them. And each of these boxes on the picture is a table and the lines are what are called foreign keys. We'll cover all this in a bit. But you end up drawing a picture of your data and then you translate that picture into a set of database commands and shapes to create it. And if you go into a large project, Sakai is another project that I am part of, and this is just a small part of it. You will often notice that the project has like a data model on the wall in a giant five-feet-by-three-feet diagram. And that's because this becomes kind of like the core essence, the soul of an application is the data model, the way you store the data. And so it's very, very important. So the idea is in building a data model, is that you can't always just read through a file in sequence. You want to be able to sort of bounce around in that file. And in a way, it's a form of compression through connection, if that makes any sense. And so if you have an email address, csev@umich.edu, and you're going to have all kinds of places in the data of this application where something is owned by csev@umich.edu, the basic rule is, and this is a form of compression, is you don't put that string in your database more than once. You put that string in once and then give it a key, like a number. And so if you were using this on Coursera, for example, you might find that when I logged into Coursera for the very first time, I was assigned a user number, like 116,421. And instead of putting my email address throughout the Coursera data model, they just put that number. And so we need this number and if I author a discussion forum post, we use the number to indicate that I was the author of that. If you comment it, we use your number, we don't use your email address. If all goes well, we should have one table and one field and one copy of your email address in a database. And so one way to think about this, and this is the way that I first learned how to do this, is you take an application and you have to build a data model from a mock-up of the user interface to the application. And so you can kind of imagine you're at a startup. I was building an educational system for Ford Motor Company when I first learned to do data modeling many years ago. And so we knew what the designers thought they needed in terms of this is the user interface. And then we have to come up with a database. Now you might think, well, I'll just make this a CSV file, it doesn't look like it's all that much data. The problem is that if you just make it a CSV file for 100 entries, it might be fast and for 300 entries it might be fast. And then for 3 million entries, it's not fast. And then for 300 million entries, it's not fast. And the interesting thing about databases is you can have 300 million entries and it still can be fast. And the basic flaw in a CSV file, and I don't know if you've ever tried, you probably haven't. But a lot of people have tried to like make a spreadsheet of all of your music. And it's easy to type this stuff in but you end up with what I'll call vertical replication. So when you're typing these things in, you type in a track and then you type in the artist and the album and the genre. And then you type in another track and then you copy and paste these pieces. And you copy paste, copy paste, copy paste, copy paste. And that seems like it's okay, but if you start thinking about this, it's kind of frustrating. You still need to know that these are part of an album and part of an artist and part of a genre, you still need to know that. But then let's just say you want to go in and edit it. And let's just say that as you were putting this in, you had a typo. And you have many albums by Black Sabbath. But you made a typo and you kept copying and pasting, and now you've got to go fix it, fix it everywhere. In this simple one, it seems simple enough. But if we're talking millions and millions of records, then it's impossible. And that's why ultimately we want to come up with a data model where the word Black Sabbath ends up in one table one time, and then there's some number, 42 for example. And then everywhere we want to mark something as belonging to Black Sabbath instead of, there we go, instead of putting in Black Sabbath, we put in 42, 42, 42, 42, 42, dot, dot, dot, dot, dot. And now if I made a typo, and so this somewhere else in some other table is Black Sabbath and 42 is the number, and I made a typo, I just fix it here, and somehow, magically, all this changes. So you could go get a whole PhD in database. What I just described is the single most important concept in building a data model. And I'll say this over and over. Vertical replication of string data is not what we want to do. Now, the thing we don't want to do is go back to our user interface designers and say, you know what? Do you realize how bad vertical replication of data is? Because I took a class and my professor said I shouldn't do vertical replication of data. So you need to make a new user interface. And that's wrong. The interface should be what it is. If this is the interface, and now we can see all this, because this is pretty convenient, as a user I kind of like this, right? I can sort these things up and down and up and down, and I do want to see this, and I want to be able to search for Black Sabbath, I want to search for all these words, and I want to see all this vertical replication in the user interface. I just don't want to have it in the database for efficiency purposes. So we accept the user interface and its needs as okay. We don't fight that. But what we have to do then is make a data model and then reconstruct from that data model the user interface that they want. So the idea is that you find sort of each string, number, each piece of data, and then you decide which table you're going to put it in. And sometimes you have to build new tables and then you build these tables and then you draw lines in between the tables. And you end up with those pictures that I talked about before. So we'll stop here, but then when we come back, we're going to go through and build such a table.