In this session I will introduce something called Views on BSgenomes. We discussed this genome and this DNAString in the matching session, the session on matching functionality from Biostrings. It is the yeast genome, and we will run matchPattern on chromosome one of the yeast genome. Now, what we get out of matchPattern here, and we saw that in an earlier session, is something called a Views object. We can see that it refers to a specific DNAString object and it has a start and an end. It looks a little bit like a DNAStringSet, but it also looks like an IRanges. And it actually turns out that underneath the hood, a Views object is represented as an IRanges. We can get the ranges out of it by writing ranges, which gives us a single IRanges. We can check that these particular coordinates actually get us the right nucleotides, so let's see here, 57932 to 57939, and that gives us the sequence we are matching for, so that fits.

Even though it is a view and it's represented underneath the hood as an IRanges, we can run functions on it as if it were a DNAStringSet. In other words, we can do something like alphabetFrequency on the view, and we get back the output of alphabetFrequency on the DNAStringSet. Now, views are very powerful because they are a very efficient and fast way of representing what are basically short sequences, or short parts of a bigger object. All we need to store is coordinates. We don't need to store the actual sequence. So a view consists of a set of coordinates plus an object they link into. That means we can do things such as shift the view by 10 bases. Now we get another sequence, which is the hit but shifted by ten bases. And this computation wouldn't really be possible if all we had was the eight bases of the hit.

Let's see this in practice. We get back a GRanges by using the vmatchPattern function on the entire genome.
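The single-chromosome steps described above (matching, pulling out the ranges, checking the sequence, running alphabetFrequency, shifting) can be sketched roughly as follows. This is a minimal sketch, not the code from the session: the toy sequence `chrom` and the 8-base `pattern` are hypothetical stand-ins so that the example runs without downloading the yeast genome. In the session the subject is chromosome one of the yeast genome.

```r
library(Biostrings)

# Hypothetical toy "chromosome" and 8-base pattern (assumptions, not from the session)
chrom   <- DNAString("TTACGTACGTAAGGCCTTACGTACGTCCGGATTAGGCA")
pattern <- DNAString("ACGTACGT")

# matchPattern returns a Views object: coordinates plus a link to the subject
vi <- matchPattern(pattern, chrom)

# The coordinates underneath the view, as an IRanges
hits <- ranges(vi)

# Check that the first set of coordinates really gives back the pattern
firstHit <- subseq(chrom, start = start(vi)[1], end = end(vi)[1])

# Functions written for sequences also work on the view
freq <- alphabetFrequency(vi, baseOnly = TRUE)

# Shifting only moves the coordinates; the bases are looked up in the subject
shifted <- shift(vi, 10)
as.character(shifted)
```

Because the view keeps a reference to `chrom`, the shifted view can show bases that were never part of the hit itself. That would not be possible if only the eight matched bases had been stored.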
And we get back a GRanges, and we can instantiate a Views object with the Views function. We say we want the views on the Scerevisiae genome, and the views are given by this GRanges here. Now we actually get the DNA sequence associated with these coordinates. That's very powerful, and it allows us to easily represent many, many subsequences, such as promoters.

So let's take that into overdrive and compute the GC content of promoters in the yeast genome. In order to get the promoters, let's load up AnnotationHub, which you should be familiar with by now. Let's do a query on our AnnotationHub for this specific version of the genome. And let's say RefSeq. Oh, let's just say genes. Oh, I have to create my AnnotationHub first. It's downloading the AnnotationHub. Let's query it. And we get back two objects, SGD Genes and Ensembl Genes. SGD stands for Saccharomyces Genome Database; it's the official database. And the other one is from Ensembl. Let's just take the first one. Let's use double brackets for downloading the genes.

And now we want to get the promoters. Well, there's a nice convenience function for getting promoters in Bioconductor, which is just called promoters. Remember, by default, and you can see that by examining the arguments, it gives you a 2.2 kb interval: 2,000 bases upstream and 200 bases downstream of the transcription start site. Now, we got a lot of warning messages. Why is that? Well, it turns out that some of these genes are right at the boundary of the genome. Let's look at element number two here. That one sits around base 500 on chromosome one, and it extends to base -1665. GRanges complains when it gets coordinates that fall outside the chromosome, like these negative ones. We get rid of this by taking the promoters and trimming them; basically, we cut off anything outside the sequence length of the genome.
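The promoters and trim steps can be illustrated on a small hand-made GRanges. This is only a sketch under stated assumptions: the two genes and the chromosome length of 10,000 bases are hypothetical, chosen so that the first promoter runs off the start of the chromosome the way the one in the session does. The real session uses genes downloaded from AnnotationHub.

```r
library(GenomicRanges)

# Two hypothetical genes on a hypothetical 10 kb chromosome (assumptions)
genes <- GRanges(
  seqnames   = "chrI",
  ranges     = IRanges(start = c(335, 5000), end = c(649, 6000)),
  strand     = c("+", "-"),
  seqlengths = c(chrI = 10000L)
)

# Defaults: 2000 bases upstream and 200 bases downstream of the
# transcription start site. Expect an out-of-bound warning here,
# because the first promoter starts at a negative coordinate.
prom <- promoters(genes)
start(prom)

# trim() cuts off everything outside the sequence lengths
promTrimmed <- trim(prom)
width(promTrimmed)
```

After trimming, the first promoter is shorter than 2.2 kb, which is the price of keeping every range inside the chromosome.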
And so now we can see that we have some promoters that are less than 2.2 kb long. Now, these are the promoters, and we are going to instantiate a view of these promoters. That's very simple: we say Views, Scerevisiae, and then the promoters. Look how fast that was. Now we have the coordinates, but we also have the nucleotides in the different promoters. What remains is to get something like the GC content. For that we basically run letterFrequency on the views. We want the C and G characters, and we want as.prob equals TRUE. This is the GC content of all promoters in the genome, computed in no time.

Let's do a density plot of these values. Okay, here we have a density plot with gcProm on the x axis, and you may remember from an earlier session that the GC content of the genome is roughly 38%. Or was it 39? I could recompute it, but let's just say 38. And we can see the promoters don't really seem to have dramatically different GC content than the rest of the genome. That may be surprising, but remember that the yeast genome is a small genome. It's filled up with coding sequences and regulatory regions, unlike the human genome.

So this was a small example of how powerful views can be. We will see in a later session that we can instantiate views on other big objects, and that this is a very, very efficient way of doing some computations.
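The GC-content step can be sketched on a toy sequence as well. Again, this is an illustration under assumptions, not the session's code: `chrom` and the three region coordinates are hypothetical and stand in for the yeast genome and its trimmed promoters.

```r
library(Biostrings)

# Hypothetical 30-base "chromosome" made of three 10-base blocks (assumption)
chrom <- DNAString(paste0("AAAAAAAAAA", "GGGGCCCCAA", "ACGTACGTAC"))

# Views on three hypothetical regions: only coordinates are stored
regions <- Views(chrom, start = c(1, 11, 21), end = c(10, 20, 30))

# Fraction of G or C per region; as.prob = TRUE returns proportions, not counts
gcContent <- letterFrequency(regions, "GC", as.prob = TRUE)
gcContent

# With many regions, plot(density(gcContent)) would show the distribution
```

The result is a one-column matrix with one row per view, which is why it can go straight into a density plot once there are enough regions.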