When you submit a comprehensive genome analysis job in PATRIC and you submitted reads, you get an assembly report and you also get a genome report. We talked about a lot of the files that come with the comprehensive genome analysis. In this particular video we're going to look in detail at the kind of data that you get back when you've submitted reads for assembly.
So to find our job, we can go to the jobs output page. Click on that, find the job that we want, highlight that, and click the View button. This gives us all the information about our particular job, including its name and where it's located within the workspace, along with how long it ran and the parameters that were selected when the job was submitted. In an earlier video I talked about the full genome report, and then in a separate one I talked about each of these other files that are associated with this analysis.
Notice these two checkered flags, one for assembly and one for annotation. The comprehensive genome analysis job is giving you a separate assembly and a separate annotation. That way you don't have to do an assembly and then do an annotation when it's done; it's more of a streamlined service. So if we want to look at the details from the assembly job, let's click on this row with the checkered flag. A checkered flag indicates a successfully run part of the comprehensive genome analysis job, and this is telling me that I had a successful assembly. So I double-click on that, and that reopens the page and tells me details about the assembly: once again, the job ID and how long the assembly took to run.
Now notice that there's an assembly report, the FASTA file for the contigs, and these are just the commands that were used. We'll click on the details folder in a second, but the main thing I want you to see is this assembly report. So I highlight that and I click on View. And this is just telling me information I might want to know about the assembly.
You can also see all of this in the full genome report that we described in another video. But this tells me what the inputs were: there were three separate files, one of them was PacBio and two of them were Illumina. The PacBio was single read, and the Illumina were paired-end reads. The assembler used was Canu, and it ran 1.3 hours. This was the command that it ran, and this was the ending contigs FASTA file.
There are direct links to the different QUAST reports, but that information is summarized here, which gives you some information about the number of contigs when you look at the various sizes. So the PacBio was the large one and it was greater than 50,000 base pairs, and the two little paired-end files were the smaller ones. And it gives you the length of those things. It tells you the size, the total number of contigs, the largest contig, total length, GC, and N50 and N75, L50, L75, which are common statistics that are reported for genome quality. It also tells you how many unresolved nucleotides they could find per 100 kilobase pairs, and that was zero.
This is just telling you some of the parameters that were run and how many changes were made. With the Pilon iterations they did 406 changes in the first round, and then it looks like 5 and 2. In the Racon they didn't do any changes, so that's interesting. That's interesting information to see. In those iterations, they go back through the assembly and try to resolve places where there seem to be mismatches between the reads. So this is telling you how many times it happened the first time, the second time, and the third time. It tells what the settings were, which was a minimum contig length of 300, and the size of 5.
And look at this, this is the Bandage plot. I just always think these are cute; they're probably not very useful. You probably can't use it in a publication, but it's very cute.
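If you want to see where numbers like N50 and L50 come from, here is a minimal sketch in Python. It assumes the usual definitions: sort the contigs from largest to smallest and add up their lengths until you reach half of the total assembly length (or 75% for N75 and L75). The length of the contig that gets you there is the N value, and the number of contigs it took is the L value. The function name `nx_lx` and the contig lengths are made up for illustration; they are not from the job shown in the video.

```python
def nx_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    fraction=0.5 gives (N50, L50); fraction=0.75 gives (N75, L75).
    Hypothetical helper for illustration, not part of any assembly tool.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    # Largest contigs first
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running_total = 0
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= threshold:
            # 'length' is Nx, 'count' is Lx
            return length, count


# Made-up contig lengths, in base pairs
lengths = [50_000, 30_000, 10_000, 5_000, 5_000]
n50, l50 = nx_lx(lengths, 0.5)
n75, l75 = nx_lx(lengths, 0.75)
print("N50:", n50, "L50:", l50)
print("N75:", n75, "L75:", l75)
```

With these made-up lengths, the single largest contig already covers half of the assembly, so L50 is 1. A larger N50 together with a smaller L50 generally means a less fragmented assembly.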
So let's go back to the assembly job; I click on the breadcrumb here. The FASTA file is just a text file. If you wanted to take it and annotate the assembly at a different service, you could just take that and go with it. But then there's this details folder, so let's click on that. This gives you all the information: the Canu report, the standard errors, the QUAST reports. All that information is here for you to look at and dig into in depth, but most of it is summarized in that full genome report, so I encourage you to look at that. In the next video I'll talk about the annotation job and how you can see the results for that. Thanks for joining.
>> Everyone, I have some good news. Since I recorded the original video, we've started including total coverage of the genome in our assembly reports. We used to just provide it for each contig, but now we do the whole thing, so let me show you how to find it. Let's just pick a job; this is from one of those hybrid assemblies. I'll click the view icon, then I'll go into the assembly job, and then I want to click on the report here, so you can see where this would be. In the assembly report now, in this part under the assembly, it'll tell you how long it took and other information. But this tells you the long read coverage; this was a hybrid assembly, so this is for the PacBio reads. And this will tell you the short read coverage. So these would be two things you could put in your paper, if you publish on it.
One other thing I wanted you to note is that, if you scroll down under post-assembly transformations, it'll tell you at which point it made changes based on doing these iterations to try to better resolve what the nucleotide was called at a particular position. And so it's showing that the first Pilon iteration is where it started making all the changes, and then after that it dropped off.
If I had known this from the beginning and wanted to save time, which of course you don't know ahead of time, I would have chosen to just do one Pilon iteration with the Racon to see what it would result in. So that's where coverage is.
Okay, it's time for your fourth task of the comprehensive genome analysis instructions. This one is going to be a little bit complicated, but don't worry, I'm going to step you through it. What I want you to do is open the assembly report for each of those 27 hybrid assembly jobs that you submitted. We'll record the job ID, capture the time that the assembly took, and capture the long and short read coverage. And then we're going to drill down into the details folder, download the TSV file, and capture the QUAST statistics from that.
So, I'm going to show you how to do it. Let's go here to my job, and we're going to be putting that into this file that we captured earlier. I'm just going to step you through one of these. Here is a job, the first one I want to look at, so I click on that and I click View. This tells me the name of the job, which, as you can see, I didn't have in my spreadsheet, so I'll capture that name here. It was Unicycler 00, and I'll paste that there. And then here's the job ID. This is just so you have it, so we're all keeping good records. I'm going to paste that job ID there.
Here is the assembly, so I click on that. And then we have the assembly_assembly_report.html. First we need to open that one, so I'll show you that, and within that there are a few statistics that I want you to record. One is that this job took 24.6 hours on mine. You may not care what these things are, but it'll give you an idea of how long a job will take. And then I want you to capture the long read coverage, because people are often asking for coverage statistics for their genomes. And you paste that there.
And then the short read coverage. Because this was a hybrid assembly, it's got both long and short read coverage statistics. So those are things you would put directly into your manuscript.
All right, so we have that part, but now we want this stuff from the QUAST report, which is right here. It's going to be difficult for me to parse this out of this HTML report. So rather than cutting and pasting, and cutting and pasting, and cutting and pasting a gazillion times, let's go back here to assembly and click on the details. Where it says this TSV file, go ahead and download that. Then I ask to see it in Finder, and I open that with Excel. Here's the Excel file. Because I've already put the spreadsheet in this order for you, I just copy it here, and then I go over to the spreadsheet, and you can see that this spreadsheet goes on, and on, and on, and on. So I click the first row, and then I transpose it. And you can see it's showing me the number of contigs, how big they are, all the information.
We're going to do the same thing later with the annotation statistics, but this is what I want you to do for each of those 27 jobs. Then you'll start to see clear differences appearing between the two strategies. One of the things that I thought was interesting was the largest contig; there's information like that for you to examine. Good luck with that. I know it'll take a while, but really, as you do it you'll get a greater understanding of the different strategies, how they work, and what they do.
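If you would rather not do the copy and transpose step by hand 27 times, the same thing can be sketched in Python. This is only an illustration under some assumptions: it assumes the downloaded TSV has one statistic per line, with the name in the first column and the value in the second, which is why it has to be transposed to fit a spreadsheet row. The function name `tsv_column_to_row` and the sample values are made up; check the layout of your own file before relying on it.

```python
import csv
import io


def tsv_column_to_row(tsv_text):
    """Turn a two-column 'name<TAB>value' table into (header, row) lists.

    Hypothetical helper: assumes one statistic per line, name first, value second.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip blank or malformed lines that do not have both columns
    pairs = [(fields[0], fields[1]) for fields in reader if len(fields) >= 2]
    header = [name for name, _ in pairs]
    row = [value for _, value in pairs]
    return header, row


# Made-up sample standing in for the downloaded TSV file
sample = (
    "Assembly\tcontigs\n"
    "# contigs\t3\n"
    "Largest contig\t50000\n"
    "N50\t50000\n"
)

header, row = tsv_column_to_row(sample)
print("\t".join(header))  # statistic names, now across one row
print("\t".join(row))     # matching values, ready to paste as one spreadsheet row
```

To use it on a real file, you would read the file's text with `open(path).read()` and pass that in, then append each job's `row` under a single shared `header`.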