Hello. As you are well aware, data analytics is layered and sophisticated, and data analytics and data science are converging. Let's look at the major types of modeling that help us understand datasets. Modeling is the process of constructing an analysis to understand relationships in a dataset. For example, building a revenue forecast means modeling revenue growth using historical data, customer expectations, and other factors. In addition to having a strong understanding of financial modeling, you need to understand the key modeling approaches used in complex or advanced data analytics and data science projects: what these models are, how they might be used, and how they are evaluated. Even if you're not building the models yourself but are working with data scientists who build complex models for you to use and interpret, this context will allow you to ask relevant questions and to be better informed about the underlying model's performance. One of the most often cited modeling approaches is machine learning. Machine learning models can be supervised, semi-supervised, or unsupervised, and they are generally used to predict a particular outcome: whether a customer will default, whether a text message is spam, which product would perform best given a certain set of attributes ranging from color to size, or which customer is most likely to leave, or churn. Supervised machine learning models are trained on data labeled with the correct answers so the model can learn from those labels. For example, suppose a 10-year dataset from a telecom company contains customer churn information. Each customer account is labeled with age, gender, location, income level, phone usage, phone type, the number of times the customer interacted with a company representative, and whether the customer churned.
You can establish a machine learning model by identifying the factors that contribute most meaningfully to churn, and then use this model to predict whether a new customer is likely to churn in the future. An unsupervised machine learning model does not have labeled data to learn from; it must identify patterns on its own. With a semi-supervised machine learning model, some of the data is labeled but not all of it. In complex predictive analytics problems, machine learning models are commonly used. In any kind of modeling, you want to understand whether the model is overfitting or underfitting. Overfitting means that the model performs well on the training data, which it knows well, but does not generalize to new datasets. Underfitting means that the model doesn't perform well even on the training data; generally, underfitting requires assessing whether the model is the right type of model to characterize the phenomenon. In any modeling that involves predicting an outcome, each outcome generated by the model falls into one of four categories: a true positive (the customer churned, and our model predicted that this customer would churn), a false positive (the customer never churned, but our model predicted that this customer would churn), a false negative (the customer churned, but our model predicted that this customer would not churn), and a true negative (the customer did not churn, and our model predicted that this customer would not churn). Data scientists evaluate model performance by looking at recall, precision, accuracy, and additional measures derived from recall and precision. Precision is the number of true positives divided by the sum of true positives and false positives: TP / (TP + FP). Recall is true positives divided by the sum of true positives and false negatives: TP / (TP + FN). Accuracy is true positives plus true negatives divided by the total number of outcomes generated: (TP + TN) / (TP + TN + FP + FN). In most modeling exercises, you must balance precision and recall.
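These three definitions can be sketched in a few lines of Python. The counts below are invented illustrative values for a hypothetical churn model, not real data.

```python
def precision(tp, fp):
    """Of everything the model flagged as positive, how much was truly positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all actual positives, how many did the model catch?"""
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Share of all outcomes the model classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical churn-model counts, for illustration only.
tp, fp, fn, tn = 80, 20, 40, 860

print(f"precision = {precision(tp, fp):.2f}")         # 80 / 100  = 0.80
print(f"recall    = {recall(tp, fn):.2f}")            # 80 / 120  = 0.67
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.2f}")  # 940 / 1000 = 0.94
```

Notice the tension: this hypothetical model is precise (80% of its churn flags are correct) but its recall is lower (it misses a third of the customers who actually churn), which is exactly the balance the definitions above describe.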
You want to make sure that the model is correctly identifying all the relevant outcomes, such as customers that churn, by looking at recall. At the same time, you want to make sure the model has sufficient precision, so that each time it generates a positive outcome, it is telling you a true positive. Given a data analytics problem and the different types of available models, you can also evaluate which model may be the most appropriate. Models are important, but if the problem framing and scope are not set appropriately, the model's outcome will not provide a useful solution. Now let's turn to the main types of data analytics that you'll apply, depending on the problem you're trying to solve. As we discussed earlier, data analytics projects can generally be categorized into four types: one, descriptive data analytics projects; two, diagnostic data analytics projects; three, predictive data analytics projects; and four, prescriptive data analytics projects. Descriptive data analytics projects focus on delivering data insights that describe a situation or scenario. For example, showing sales trends using historical data is descriptive analytics; showing employee composition using anonymized and aggregated datasets from the human resources department is also descriptive analytics. Descriptive analytics is grounded in descriptive statistics. Diagnostic data analytics projects focus on using data to help identify root causes in a situation. Using data analytics to identify revenue fluctuation among customers can help diagnose volatility in revenue growth; this would be a diagnostic data analytics project. Identifying the drivers of anomalies in expenses by analyzing historical expense data and the nature of those expenses is another example of a diagnostic project. Diagnostic analytics can also be referred to as root cause analysis.
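As a minimal sketch of the revenue-fluctuation example, you could rank customers by the volatility of their revenue to decide where to focus a root-cause review. The customer names and quarterly figures below are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical quarterly revenue per customer (invented figures).
quarterly_revenue = {
    "acme":    [100, 102, 98, 101],
    "globex":  [80, 140, 30, 120],
    "initech": [50, 52, 49, 51],
}

# Coefficient of variation (stdev / mean) as a simple volatility measure.
volatility = {
    customer: stdev(values) / mean(values)
    for customer, values in quarterly_revenue.items()
}

# Rank customers from most to least volatile to prioritize the diagnosis.
for customer, cv in sorted(volatility.items(), key=lambda kv: -kv[1]):
    print(f"{customer}: {cv:.2f}")
```

Here "globex" would surface at the top of the list, telling you which customer's swings are driving the volatility and where to dig for a root cause.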
Predictive data analytics uses existing data to find patterns that can help us understand future outcomes. Understanding which customers might leave, or churn, is predictive analytics. Identifying a potentially fraudulent transaction out of millions of transactions is also a predictive analytics problem. Predictive analytics problems are often solved with machine learning models, as discussed earlier. Predictive modeling approaches include simple linear regression, multiple regression, decision trees, neural networks, and clustering. Let's talk more about these predictive models. Simple linear regression is a supervised predictive model used to predict an outcome variable as a function of an explanatory variable; generally speaking, the outcome variable and the explanatory variable have a linear relationship. An example would be using tenure to predict salary level: salary is the outcome variable and tenure is the explanatory variable. A simple linear regression could express this relationship and explore how significant tenure is as a single factor for explaining salary level. Multiple regression is a supervised machine learning model that can express the relationship between multiple independent, or explanatory, variables and a single outcome, or dependent, variable. Expanding on the previous example, you might want to predict salary using tenure and other factors such as the location of employment and the economic environment. You can use multiple regression to understand which factor or factors are most significant in affecting salary. A decision tree is another predictive model. As the name implies, a decision tree uses a tree structure to map several paths and the possible outcomes associated with each path. Decision trees are intuitive and transparent, making them easier to understand. A neural network is a series of supervised or unsupervised algorithms that can detect patterns in data.
This resembles how the brain processes information by transmitting and interpreting signals through billions of neurons. The final predictive model is clustering. Clustering is an unsupervised predictive model that works with unlabeled data; it is used to group similar inputs and identify a representative value for each group. Prescriptive data analytics projects, which are closely related to descriptive and diagnostic data analytics projects, focus on using data insights to inform actions that can course-correct an underlying problem or create value for the organization. A prescriptive analytics project could project revenue growth by customer, answering the questions: how much is a customer likely to spend, and how much revenue will that generate for the company? The projected revenue growth of a customer could be based on various factors that influence the customer's purchasing decisions. With the results of the data analysis, you can make recommendations for maintaining revenue growth. Another example of a prescriptive analytics project is using data to identify business groups with a higher likelihood of expense reimbursement anomalies and recommending best practices to minimize expense reimbursement noncompliance. Advanced data analytics often describes analytics projects that involve working with large amounts of complex data, using various data analytics tools, from statistical tools to machine learning tools, to derive insights. An advanced analytics project could be descriptive, diagnostic, predictive, or prescriptive. Advanced analytics projects are often cross-functional and rely on framing the right problem to set the scope of the analytics work. You now have a solid understanding of modeling and the main types of data analytics projects: descriptive, diagnostic, predictive, prescriptive, and advanced analytics.
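To make the simple linear regression example from earlier concrete, the closed-form least-squares fit of salary on tenure can be written in plain Python. The tenure and salary figures are invented for illustration; salaries are in thousands.

```python
from statistics import mean

# Hypothetical data: years of tenure and salary in thousands (invented figures).
tenure = [1, 2, 3, 4, 5]
salary = [52, 58, 61, 67, 72]

# Closed-form least-squares estimates for: salary = intercept + slope * tenure.
x_bar, y_bar = mean(tenure), mean(salary)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(tenure, salary)) / \
        sum((x - x_bar) ** 2 for x in tenure)
intercept = y_bar - slope * x_bar

# Use the fitted line to predict salary for an employee with 6 years of tenure.
predicted = intercept + slope * 6
print(f"salary ~ {intercept:.1f} + {slope:.1f} * tenure")
print(f"predicted salary at 6 years: {predicted:.1f}")
```

On this toy data the fit is salary ~ 47.3 + 4.9 * tenure, so each additional year of tenure is associated with roughly 4.9 more (in thousands) of salary; the same machinery, extended to several explanatory variables, is what multiple regression does.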
With the various modeling approaches and methods at our disposal, it's still essential to understand the underlying data analytics concepts and be able to frame the problem and scope. This will put you in a stronger position to identify the right model to tackle the problem at hand, evaluate the model, and ask relevant questions. Next, we'll talk about tools that can aid data management.