So now, let's have a look at how linear regression is done using Spark ML. You first create a new notebook in Watson Studio and give it a name, "linear regression", and we select the Spark runtime. This is important. Now we click on "Create notebook". So now you obtain the file from the object store. You change this to Parquet, since we want to have the file in Parquet format and not in CSV format. We can get rid of the generated boilerplate; it's actually far simpler. It's just spark.read.parquet. So let's show the data after we have read it. This should be straightforward and fast. Okay, here you see that everything works as it should. So let's rename the variable, because I prefer to work with df instead of df_data_1. Then we create a temporary query view in order to write SQL statements against the DataFrame, and we call spark.sql. So we create a multi-line statement. Since this data is better suited for a classification example, but we want to illustrate linear regression, let's feature-engineer one additional column, and a very good regression target in an accelerometer context is energy. So we create a SQL statement which computes the energy for us. We say select the square root of the sum of x squared plus the sum of y squared plus the sum of z squared, and we call this new column "label", since later, during training, it's easier if it's called label rather than energy. Of course, we also need the class from which we pick this example, so we group by the class. In other words, we are computing the overall energy per movement pattern. One thing that has to be mentioned is that this is not the actual physical energy, because we don't know the mass of the device. We would have to incorporate the mass in order to compute the energy correctly, but for now this is fine. The other thing is that we are not normalizing this data, even though you might have more examples for one class than for another.
But here we see that, per class, we have now computed a continuous variable, and that's data we can use for regression. Before we can do that, though, we have to join the original data with the new DataFrame. So let's create a new DataFrame out of this SQL statement, which we call df_energy, and we register it as an additional temporary query view in order to issue SQL statements against this new DataFrame. So now we actually join the two together. We call the result df_join, which is nothing other than spark.sql applied to, again, a multi-line SQL statement. We write select star from df inner join df_energy on df.class equals df_energy.class. If that's not familiar to you, don't worry, we will show later how it looks, so it will be really clear to you. So df_join.show shows us the contents of the joined data. Here, per class, we have attached the overall energy as the label. That's something we can now predict using the vibration sensor data. So now we import the linear regression model and create an instance of it. It has three parameters: the maximum number of iterations, the regularization parameter, and the elastic net parameter. We will come to these later; let's take the defaults for now. Now we create the pipeline. So we import Pipeline from pyspark.ml, and we create a new Pipeline instance whose stages are the VectorAssembler, the Normalizer, and the linear regression model. So only three stages in this case. Now we create a model by executing the pipeline: pipeline.fit on the DataFrame. Then the predictions are model.transform of the DataFrame. Now let's obtain the fitted linear regression model from the stages; it's the third stage. On its summary we have the R2 measure, which is nine percent. That is not the best, but for an illustrative example it is sufficient.