Today we are going to look at GeoPandas. The first part we will cover is exploring, selecting, and manipulating GeoPandas objects. What is GeoPandas? GeoPandas extends the Pandas library by adding a spatial geometry data type and enabling spatial operations on that type of data using Shapely. GeoPandas also depends on Fiona, which we saw earlier, for file access, and it uses descartes and matplotlib for plotting. How do you read data into a GeoDataFrame? We already saw the DataFrame, and we already saw how to open a file in Pandas, so let's put the two together. Here we have import os, then from geopandas import GeoSeries, GeoDataFrame, and read_file, and import geopandas as gpd. From matplotlib we import pyplot. Our input file in this case is again the New York boroughs data, the same data we used with Fiona. All we have to do is assign boroDF, the borough DataFrame: boroDF = gpd.read_file(input_file). Here's what you see. Remember that when we did this in Fiona, you had to go through the records and build a table yourself in order to view the data. With GeoPandas, because the data goes into a data frame, it is much simpler to view. An index from 0 to 4 is created automatically, and then you see the attributes: BoroCode, BoroName, Shape_Length, and Shape_Area, followed by the geometry column, which holds the MultiPolygons. It's the same dataset, but now we see everything together in a single data frame. That's the power of GeoPandas: it integrates Fiona, Shapely, and of course Pandas. Let's explore this data frame. First, the index: it is a range from 0 up to, but not including, 5, with a step of 1, so the labels are 0, 1, 2, 3, and 4. The columns are BoroCode, BoroName, Shape_Length, Shape_Area, and geometry, just as we saw. If you ask for the values, you will again see all the values.
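To try these exploration steps without the borough file, here is a minimal sketch that builds a small GeoDataFrame in memory instead of calling gpd.read_file(input_file). It assumes geopandas and shapely are installed. The attribute numbers and the unit-square geometries are made-up placeholders standing in for the real borough attributes and MultiPolygons, and the row order (Staten Island first) is chosen only to mirror the lecture's description.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# In the lecture the frame is read from disk with gpd.read_file(input_file).
# Here a tiny stand-in is built in memory instead. All values below are
# made-up placeholders, not real New York borough data.
boroDF = gpd.GeoDataFrame(
    {
        "BoroCode": [5, 1, 2, 3, 4],
        "BoroName": ["Staten Island", "Manhattan", "Bronx", "Brooklyn", "Queens"],
        "Shape_Length": [3.3, 3.6, 4.6, 7.3, 9.0],
        "Shape_Area": [16.0, 6.0, 11.0, 19.0, 30.0],
    },
    # Unit squares stand in for the real MultiPolygon geometries.
    geometry=[box(i, 0, i + 1, 1) for i in range(5)],
)

print(boroDF)          # attributes plus a 'geometry' column, indexed 0-4
print(boroDF.index)    # RangeIndex(start=0, stop=5, step=1)
print(boroDF.columns)  # BoroCode, BoroName, Shape_Length, Shape_Area, geometry
print(boroDF.values)   # cell values as a 2-D array; the index is not included
```

Because GeoDataFrame is a subclass of pandas DataFrame, index, columns, and values behave exactly as they do for an ordinary DataFrame.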
All of this is contained within the data frame, and when you simply evaluated boroDF, it displayed the entire table, which combines the pieces we just looked at individually: index, columns, and values. There are other things you can look at as well. You can check the shape, meaning how many rows and how many columns you have. In this case there are five rows and five columns, so the shape is (5, 5). You can also count the rows and the columns separately. One way to count the columns is to count the entries in the first row of values; another is to count the column labels directly, and the two numbers match up. You can also pull out a single row of values as an array, for example the row for Manhattan: its BoroCode of 1, the name Manhattan, the shape length, and the shape area. You can think of the values of this data frame as a two-dimensional array: just as a Series can be considered a one-dimensional array, a DataFrame can be thought of as a simple two-dimensional array, and you can access it directly with indexes. Indexes start at 0, so [0][0] is the first row and first column, and [1][1] is, counting from one, the second row and second column. That's Manhattan. If you go back and look at the table, that's this cell. Note that the index is not counted as a data element; only the values are. So column 0 is BoroCode and column 1 is BoroName, and row 1, column 1 gives Manhattan, which is exactly what we get here. Here are some ways you can summarize the DataFrame. You can call describe on it, and it will give you summary statistics: the count, mean, standard deviation, minimum, quartiles (the 50% quartile being the median), and maximum. For some columns these make sense, and for others, such as BoroCode, they may not.
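The shape, counting, positional-access, and describe steps can be sketched as follows. To keep it runnable with pandas alone, this assumes a plain DataFrame with the same four attribute columns and made-up placeholder values, leaving out the geometry column. Since GeoDataFrame subclasses DataFrame, the same calls apply to the real boroDF, where the shape would be (5, 5) instead of (5, 4).

```python
import pandas as pd

# Placeholder attribute table (made-up values, no geometry column).
boroDF = pd.DataFrame(
    {
        "BoroCode": [5, 1, 2, 3, 4],
        "BoroName": ["Staten Island", "Manhattan", "Bronx", "Brooklyn", "Queens"],
        "Shape_Length": [3.3, 3.6, 4.6, 7.3, 9.0],
        "Shape_Area": [16.0, 6.0, 11.0, 19.0, 30.0],
    }
)

print(boroDF.shape)           # (rows, columns) -> (5, 4) here
print(len(boroDF.index))      # number of rows
print(len(boroDF.values[0]))  # number of cells in the first row ...
print(len(boroDF.columns))    # ... matches the number of column labels

row = boroDF.values[1]        # whole second row: Manhattan's values
print(row)
print(boroDF.values[1][1])    # second row, second column -> Manhattan
print(boroDF.iloc[1, 1])      # same cell via pandas' positional indexer

# Summary statistics for the numeric columns only; BoroName is skipped.
stats = boroDF.describe()
print(stats)
```

The iloc line is included as the pandas-native way to address a cell by position; values[1][1] goes through the underlying array instead.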
But it gives you statistics only for the columns that hold numbers, not for strings, of course. You can also transpose the data frame, which some data may require. It's just like transposing a matrix: the columns become rows and the rows become columns. You can sort the DataFrame; for example, you can say I want to sort it by BoroCode. Initially, Staten Island was at index 0, but after sorting by BoroCode, Manhattan comes first, and you print all the values. Here we sort in place, with inplace=True and ascending=True, meaning the value of boroDF itself will change. If you set inplace to False, sorting returns a new DataFrame, which you can store in a variable, and the original boroDF is not changed. ascending=True sorts in ascending order; ascending=False sorts in descending order. Now, say you want a subset of this data. How would you do it? In this case, boroDF[:2] gives you the first two rows. Remember that we sorted the data previously, so the order is Manhattan, Bronx, Brooklyn, Queens, and Staten Island; requesting the first two gives you Manhattan and Bronx, and that's what you get. You can also say, "I want to sort by area in descending order, and then give me the first two elements." Here you are doing two things at once: first you sort, and then you take the first two rows of the result. In this case you get Queens and Brooklyn, which are the two biggest boroughs. Next, you can select columns, and then rows and columns together. Let's say we want only the area and BoroName. You write boroDF followed by brackets containing a list of column names: select the column named Shape_Area and the column named BoroName.
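Here is a minimal sketch of the transpose, sort, slice, and column-selection steps, again on a pandas-only placeholder table (made-up values, no geometry column) so it runs without the borough file. The toy areas are chosen only so that their ranking matches the lecture's result, with Queens and then Brooklyn largest.

```python
import pandas as pd

# Placeholder attribute table (made-up values, no geometry column).
boroDF = pd.DataFrame(
    {
        "BoroCode": [5, 1, 2, 3, 4],
        "BoroName": ["Staten Island", "Manhattan", "Bronx", "Brooklyn", "Queens"],
        "Shape_Length": [3.3, 3.6, 4.6, 7.3, 9.0],
        "Shape_Area": [16.0, 6.0, 11.0, 19.0, 30.0],
    }
)

# Transpose: columns become rows and rows become columns.
transposed = boroDF.T
print(transposed.shape)  # (4, 5)

# Sort by BoroCode in place: boroDF itself changes and the call returns None.
boroDF.sort_values(by="BoroCode", ascending=True, inplace=True)
print(boroDF["BoroName"].tolist())

# [:2] takes the first two rows of the current order (same rows as boroDF.iloc[:2]).
first_two = boroDF[:2]
print(first_two)

# Without inplace, sort_values returns a new frame and boroDF is untouched;
# chaining [:2] keeps the two rows with the largest Shape_Area.
largest_two = boroDF.sort_values(by="Shape_Area", ascending=False)[:2]
print(largest_two["BoroName"].tolist())

# A list of column names inside the brackets selects those columns.
area_and_name = boroDF[["Shape_Area", "BoroName"]]
print(area_and_name)
```

Note that sorting keeps the original index labels attached to each row; only the order changes.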
You get both of those columns. Now suppose I want these two columns, but only the first two rows. You select the subset of two columns, and then add another pair of brackets with [:2], which keeps just the first two rows. You're combining the two: selecting rows and selecting columns. You can also do it the other way around: first take the first two rows, and then select three columns, in this case Shape_Area, BoroName, and Shape_Length. The order doesn't matter; you can select rows and columns in either order. You can also add a column to this DataFrame. A DataFrame is not a static thing; it's a dynamic object, so you can add columns. Say you want to add the borough population: I came up with some numbers for the population of each borough, and then I write boroDF["Population"] = bpop, using the new list I created. Now if I look at boroDF, I see a new column: in addition to BoroCode, BoroName, Shape_Length, Shape_Area, and geometry, there is a new column showing the population. You might ask whether this is the only way to add new values. Suppose I don't know what order the rows are in, or the values are produced dynamically. Can we still assign the values to the correct rows? Let's take another example, this time using a pandas Series. We create somewhat more realistic population numbers and key each one by borough name: Manhattan has this much, the Bronx has this much, Brooklyn has this much. Now, if you write boroDF["Population"] = population, as we did before, that will not give you what you want, because the Series labels are not associated with the current index. When you assign a Series, pandas aligns it with the DataFrame's index, and since the Series is labeled by borough name rather than by the current index, nothing matches and you get missing values. The index, if you remember, is 0, 1, 2, 3, 4; those are the labels on the left.
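Combining row and column selection, and the two ways of adding a population column, can be sketched like this. It uses the same pandas-only placeholder table in its original row order (Staten Island first), and the population numbers are hypothetical placeholders, not census figures. The last step shows what happens when a Series keyed by borough name is assigned while the index is still 0-4: pandas aligns on index labels, finds no matches, and fills the column with NaN.

```python
import pandas as pd

# Placeholder attribute table (made-up values, no geometry column).
boroDF = pd.DataFrame(
    {
        "BoroCode": [5, 1, 2, 3, 4],
        "BoroName": ["Staten Island", "Manhattan", "Bronx", "Brooklyn", "Queens"],
        "Shape_Length": [3.3, 3.6, 4.6, 7.3, 9.0],
        "Shape_Area": [16.0, 6.0, 11.0, 19.0, 30.0],
    }
)

# Columns first, then the first two rows ...
cols_then_rows = boroDF[["Shape_Area", "BoroName"]][:2]
# ... or rows first, then three columns.
rows_then_cols = boroDF[:2][["Shape_Area", "BoroName", "Shape_Length"]]
print(cols_then_rows)
print(rows_then_cols)

# A plain list is assigned purely by position: the first number goes to
# whatever row is first right now (hypothetical placeholder numbers).
bpop = [100, 200, 300, 400, 500]
boroDF["Population"] = bpop
positional = boroDF.set_index("BoroName")["Population"].to_dict()
print(positional)  # Staten Island gets 100 only because it is the first row

# A Series keyed by name is aligned on the index, not by position.
population = pd.Series(
    {"Manhattan": 10, "Bronx": 20, "Brooklyn": 30, "Queens": 40, "Staten Island": 50}
)
boroDF["Population"] = population
print(boroDF["Population"].isna().all())  # labels 0-4 never match the names
```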
Now, if we want to make BoroName the index, we can do that: we reassign boroDF to a new frame with the same name, setting the index to BoroName. Now if you look at boroDF, you will see that the old 0-4 index has been dropped and the BoroName column now serves as the index. Now you can add the population column again, assigning it the population Series. What happens this time? pandas automatically matches the index labels and associates the correct values. If you don't do this, values end up assigned based purely on their order in the list. In the earlier example, the values were assigned by position, so the first one went to Queens and the last one to Manhattan, simply because of the row order at that point. In this case, we state explicitly that Manhattan has this value, so Manhattan is assigned that value regardless of where it appears in the Series. What have we seen so far? We have seen how to read data using GeoPandas; how to explore the data, including its index, columns, and rows; how to do simple manipulation of the data; how to subset rows and columns; how to index into a DataFrame; and how to add new columns to it. Thank you.
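Making BoroName the index before assigning the Series can be sketched as follows, again with a pandas-only placeholder table and hypothetical population numbers. The Series is deliberately listed in a different order from the rows to show that values are matched by label, not by position.

```python
import pandas as pd

# Placeholder attribute table (made-up values, no geometry column).
boroDF = pd.DataFrame(
    {
        "BoroCode": [5, 1, 2, 3, 4],
        "BoroName": ["Staten Island", "Manhattan", "Bronx", "Brooklyn", "Queens"],
        "Shape_Length": [3.3, 3.6, 4.6, 7.3, 9.0],
        "Shape_Area": [16.0, 6.0, 11.0, 19.0, 30.0],
    }
)

# Hypothetical placeholder populations, listed in a different order from the rows.
population = pd.Series(
    {"Queens": 40, "Brooklyn": 30, "Manhattan": 10, "Staten Island": 50, "Bronx": 20}
)

# Rebind the same name to a frame indexed by BoroName; the old 0-4 index is dropped.
boroDF = boroDF.set_index("BoroName")
print(boroDF.index.tolist())

# Now the Series labels match the index, so pandas pairs values by name.
boroDF["Population"] = population
print(boroDF.loc["Manhattan", "Population"])  # Manhattan's value, though it is third in the Series
```

Rebinding boroDF to the result of set_index mirrors the lecture's "new variable with the same name" approach; set_index also accepts inplace=True if you prefer to modify the frame directly.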