One of the most common uses of machine learning is building a classification model. A classification model assigns data examples to different categories. The simplest and most common type is what we call binary classification, where there are only two possible classes: something either is or is not, it's either A or B, a 0 or a 1. In the medical field, we might use this to classify a patient as either having a particular condition or not having it. Or, in an equipment maintenance setting, either the equipment is about to fail or it is not. Because there are only two possibilities, we call it binary classification.

Logistic regression is probably the simplest classification algorithm, and it is well suited to this kind of binary classification. With logistic regression, the algorithm outputs a classification probability that ranges between 0 and 1, and we then assign each data example to the appropriate class based on that probability.

When you hear the term logistic regression, though, you might think: don't we use linear regression for estimating continuous values? Yes, that's true, we use linear regression for that task. So why is logistic regression treated as a classification algorithm rather than a regression algorithm like linear regression? Logistic regression is similar in that it takes continuous variables as inputs and maps the relationship of one variable to another, but its output is a value in the range 0 to 1, and because that output is then used to classify examples, we use it for classification tasks rather than regression tasks.

The function that logistic regression produces is called a logistic function. The logistic function is a type of sigmoid function, one that creates an S-shaped curve. On the slide you'll see an example comparing the customer satisfaction score from a customer survey to whether that customer is likely to return. The classification we're looking for is: will this customer return or not? As you can see, as the satisfaction score increases, the probability that the customer will return gets higher. In our case, every data example has been classified as either a 0 or a 1. At some point, of course, we have to determine exactly where the decision boundary will sit between returning and not returning, and we'll look at that in more detail shortly. But notice that the logistic function also lets us deal with any outliers that might occur in the data set, such as an extremely high satisfaction score, along with the whole range of values in between.

The formula being represented here, the standard logistic function, is a sigmoid function: sigma(x) = 1 / (1 + e^(-x)). The sigma indicates that we apply the function to an estimated value x, which can range anywhere from negative infinity to positive infinity. We use that value as the negative exponent of e, the base of the natural logarithm, which is approximately 2.71828, an irrational number whose digits continue on without repeating.
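To make the formula concrete, here is a minimal sketch of the standard logistic function in Python. The language choice, the NumPy dependency, and the `sigmoid` name are assumptions for illustration only:

```python
import numpy as np

def sigmoid(x):
    """Standard logistic function: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Any real-valued input is squashed into the open interval (0, 1):
# large negative x -> close to 0, large positive x -> close to 1,
# and x = 0 lands exactly on 0.5 (the midpoint of the S-curve).
for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"sigma({x:+.0f}) = {sigmoid(x):.4f}")
```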
What this function does is generate the characteristic S-shaped curve we saw before: the curve always starts very close to 0 on the y-axis, always ends very close to 1, and its midpoint always passes through a y-value of 0.5 where x equals 0. The idea is that the model learns the parameters necessary to perform the classification.

If we look at our diagram now, you can see that we've defined a decision boundary that determines whether an example is placed into the positive class or the negative class. If our decision boundary is 0.5, we can extrapolate that boundary to the curve: any satisfaction score above that point is categorized as yes, we have a returning customer, and any satisfaction score below it is categorized as no, we do not have a returning customer.

The decision boundary itself can be expressed with these formulas, where p-hat is the predicted probability and y-hat is the predicted classification: if p-hat >= 0.5, then y-hat = 1; if p-hat < 0.5, then y-hat = 0. In other words, by applying our sigmoid function, we get the likelihood of an example belonging to class 1 or class 0.

Of course, the decision boundary here was selected arbitrarily, and we could tune the model to be more conservative about the positive class. If we want more assurance that anything categorized as a 1, a returning customer, really will return, we might set the decision boundary higher, requiring a probability of at least 0.7 before predicting a returning customer. That would, of course, raise the satisfaction score necessary to make that determination.

The real question is: if a data point lands in the region around that crossing point, at what point do we decide whether it belongs to the positive class or the negative class? That decision comes down to the decision boundary we, as data scientists, have selected.
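As a rough illustration of how this plays out in code, here is a sketch using scikit-learn's LogisticRegression. The satisfaction scores and return labels are invented for illustration, and the 0.5 and 0.7 thresholds mirror the discussion above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: satisfaction scores (1-10) and whether each customer returned.
scores = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
returned = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(scores, returned)

# p_hat is the predicted probability of the positive class (customer returns).
new_score = np.array([[6.5]])
p_hat = model.predict_proba(new_score)[0, 1]

# Default decision boundary: y_hat = 1 if p_hat >= 0.5, else y_hat = 0.
y_hat_default = int(p_hat >= 0.5)

# A stricter boundary of 0.7 only predicts "returning" when the model is more
# confident, which effectively raises the satisfaction score needed for a yes.
y_hat_strict = int(p_hat >= 0.7)

print(f"p_hat = {p_hat:.3f}, y_hat @0.5 = {y_hat_default}, y_hat @0.7 = {y_hat_strict}")
```

Note that calling model.predict would apply the default 0.5 boundary for us; applying a threshold to the predict_proba output by hand is what lets us move the decision boundary to a value like 0.7.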