Hello and welcome. Let's recap some of the key points that we've covered in this course before you take the practice exam.

You explored the process of building a modern data warehouse. The process includes data ingestion and preparation, making the data ready for consumption by analytical tools, and providing access to the data in a shaped format so that it can easily be consumed by data visualization tools. As a data engineer, you should know the best practices and considerations to follow when working with Azure Data Lake Storage, such as how your data is structured, your file sizes, and the implementation of a copying process.

You designed a star schema and learned how to classify your model tables as either dimension or fact. Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, contain a very large number of rows and continue to grow over time.

You explored data loading techniques and how to optimize query performance on Azure Synapse Analytics SQL pools for use in analytical workloads. Loading data is essential because of the need to query or analyze the data to gain insights from it. One of the main design goals in loading data is to manage or minimize the impact on analytical workloads while loading the data with the highest throughput possible. Azure Synapse Analytics has a rich set of tools and methods available to load data into SQL pools. You can load data from relational or non-relational data stores, structured or semi-structured, from on-premises systems or other clouds, in batches or streams. When loading data into Azure Synapse Analytics on a scheduled basis, it's important to reduce the time taken to perform the data load and to minimize the resources needed, so that you maintain good performance cost-effectively. You learned that singleton or smaller transaction batches should be grouped into larger batches to optimize the Synapse SQL pool's processing capabilities.

You also learned that Azure Synapse Analytics allows you to create, control, and manage resource availability when workloads are competing. This allows you to manage the relative importance of each workload when it is waiting for available resources.

Azure Synapse Analytics is a high-performing massively parallel processing, or MPP, engine that is built with loading and querying large datasets in mind. You saw that selecting the correct table distribution can have an impact on your data load and query performance. Here you worked with three table distributions: round-robin distribution, hash distribution, and replicated tables.

Next, you discovered that a well-designed indexing strategy can reduce disk input/output operations and consume fewer system resources, thereby improving query performance, especially when a query uses filtering, scans, and joins. You also explored the dedicated SQL pool's indexing options: the clustered columnstore index, the clustered index, and the nonclustered index.

You learned how to improve query performance with materialized views. Materialized views result in increased performance because the data within the view can be fetched without having to resolve the underlying query against the base tables. You can also filter a materialized view and join it to other queries in the same way you would a table.

You learned that using read committed snapshot isolation ensures data consistency. If you experience delays in the completion of queries, the read committed snapshot isolation level can be employed to alleviate this. The short T-SQL sketches that follow illustrate several of these ideas.
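To illustrate the star schema, the table distributions, and the indexing options recapped above, here is a minimal T-SQL sketch. The table and column names are hypothetical, not from the course.

-- Small dimension table: replicated to every compute node (a common choice
-- for small lookup tables), with a clustered index on its key.
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL,
    Category    NVARCHAR(50)  NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (ProductKey)
);

-- Large, growing fact table: hash-distributed on a join key to spread rows
-- evenly across the distributions, with a clustered columnstore index for
-- analytical scan performance.
CREATE TABLE dbo.FactSales
(
    SalesKey    BIGINT         NOT NULL,
    ProductKey  INT            NOT NULL,
    Region      NVARCHAR(50)   NOT NULL,
    OrderDate   DATE           NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX
);

A staging table used only for loading would typically choose DISTRIBUTION = ROUND_ROBIN instead, since it loads quickly and needs no particular row placement.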
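The loading guidance above, favoring large set-based batches over singleton transactions, can be sketched with the COPY statement and CREATE TABLE AS SELECT, or CTAS. The storage path and object names are placeholders.

-- Bulk-load files from the data lake in one high-throughput batch.
-- Assumes a staging table dbo.StageSales with matching columns already exists.
COPY INTO dbo.StageSales
FROM 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet'
WITH
(
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);

-- CTAS transforms the staged rows into a distributed table in a single
-- set-based operation, rather than many singleton INSERTs.
CREATE TABLE dbo.FactSales_Load
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT SalesKey, ProductKey, Region, OrderDate, SalesAmount
FROM dbo.StageSales;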
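The workload management capability can be sketched with a workload group and classifier; the names, percentages, and the loader_user login are illustrative assumptions.

-- Reserve 30% of pool resources for data loads, capped at 60%, so loads and
-- analyst queries cannot starve one another.
CREATE WORKLOAD GROUP wgDataLoads
WITH
(
    MIN_PERCENTAGE_RESOURCE            = 30,
    CAP_PERCENTAGE_RESOURCE            = 60,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10
);

-- Route requests from the hypothetical loading account into that group and
-- mark them as more important when waiting for resources.
CREATE WORKLOAD CLASSIFIER wcNightlyLoader
WITH
(
    WORKLOAD_GROUP = 'wgDataLoads',
    MEMBERNAME     = 'loader_user',
    IMPORTANCE     = HIGH
);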
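A materialized view like the following hypothetical one pre-computes an aggregation, so repeated dashboard queries can read the stored result instead of re-resolving the query against the base table.

CREATE MATERIALIZED VIEW dbo.mvSalesByProduct
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT ProductKey,
       COUNT_BIG(*)     AS SalesCount,
       SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey;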
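Read committed snapshot isolation is switched on at the database level. A minimal sketch, assuming a dedicated SQL pool named MySqlPool (the statement is typically run from the master database while the pool has no other open sessions):

-- Readers see a transactionally consistent snapshot instead of blocking
-- behind concurrent writers.
ALTER DATABASE MySqlPool
SET READ_COMMITTED_SNAPSHOT ON;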
You also learned how to optimize queries with result set caching. Azure Synapse SQL automatically caches query results in the user database for repetitive use. You also noted that result set caching allows subsequent query executions to get results directly from the persistent cache, so recomputation is not needed.

You saw that Azure Synapse Analytics enables you to create either SQL pools or Spark pools within the workspace. These can be seamlessly mixed and matched based on your requirements. The integrated platform experience allows you to switch between Apache Spark and SQL-based data engineering tasks according to the expertise you have in house. As a result, an Apache Spark-oriented data engineer can easily communicate and work with a SQL-based data engineer on the same platform.

Next, you worked with windowing functions, which can be used to perform calculations against a range of data, and which can also be used to programmatically define a data deduplication technique or to paginate results. You saw that you can use approximate execution to reduce latency when executing queries against large datasets. You discovered that Azure Synapse SQL pools support placing complex data processing logic into stored procedures. Stored procedures are a great way of encapsulating one or more SQL statements or a reference to a Microsoft .NET Framework common language runtime, or CLR, method.

You learned that you can pause and resume compute resources on demand to reduce costs. When performing the batch movement of data to populate a data warehouse, it is typical for the data engineer to understand the schedule on which the data loads take place. In these circumstances, you may be able to predict the periods of downtime in the data loading and querying process and take advantage of the pause operation to minimize your costs.

You discovered that Azure Advisor recommendations are based on telemetry data that is generated by Azure Synapse Analytics. The telemetry data captured by Azure Synapse Analytics includes data skew and replicated table information, column statistics data, tempdb utilization data, and adaptive cache data.

You learned that a columnstore index scans a table by scanning column segments of individual row groups; maximizing the number of rows in each row group enhances query performance. You also learned that materialized views in the Azure Synapse SQL pool provide a low-maintenance, efficient method to retrieve and view data from complex queries.

Next, you saw that fully logged operations use the transaction log to keep track of every row change, whereas minimally logged operations keep track of extent allocations and metadata changes only. Minimally logged operations can therefore improve performance and provide more efficiency than fully logged operations.

You learned how to configure authentication and explored the range of network security steps that you should consider to secure Azure Synapse Analytics. These include authenticating individual users, service-to-service authentication, and managed identities. You learned that you can use the managed identity capabilities to authenticate to any service that supports Azure Active Directory authentication.

You developed an understanding of Azure Key Vault and shared access signatures. As a best practice, you shouldn't share storage account keys, and you can use Azure Key Vault to manage and secure the keys. For untrusted clients, use a shared access signature. Before we return to these security topics, the sketches below illustrate some of the query optimizations recapped above.
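Result set caching is a database-level and session-level switch. A sketch, again assuming a hypothetical pool named MySqlPool:

-- Enable the persistent result set cache for the whole pool
-- (run from the master database).
ALTER DATABASE MySqlPool
SET RESULT_SET_CACHING ON;

-- Individual sessions can opt out, or back in, as needed.
SET RESULT_SET_CACHING OFF;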
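Here is a minimal sketch of the two windowing-function patterns mentioned above, deduplication and pagination, reusing the hypothetical tables from the earlier sketches.

-- Deduplication: keep only the most recent row per key.
WITH Ranked AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SalesKey
                              ORDER BY OrderDate DESC) AS rn
    FROM dbo.StageSales
)
SELECT *
FROM Ranked
WHERE rn = 1;

-- Pagination: rows 101-150 form "page 3" when each page holds 50 rows.
WITH Numbered AS
(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY SalesKey) AS rn
    FROM dbo.FactSales
)
SELECT *
FROM Numbered
WHERE rn BETWEEN 101 AND 150;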
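Approximate execution can be sketched in a single statement: APPROX_COUNT_DISTINCT returns a distinct count with a small expected error in exchange for lower latency and memory use on very large tables.

SELECT APPROX_COUNT_DISTINCT(ProductKey) AS ApproxDistinctProducts
FROM dbo.FactSales;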
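And a stored procedure can encapsulate the processing logic. This hypothetical procedure reloads one day of the fact table in a repeatable way.

CREATE PROCEDURE dbo.usp_LoadDailySales
    @LoadDate DATE
AS
BEGIN
    -- Remove any rows already loaded for the day, then insert the staged set.
    DELETE FROM dbo.FactSales
    WHERE OrderDate = @LoadDate;

    INSERT INTO dbo.FactSales (SalesKey, ProductKey, Region, OrderDate, SalesAmount)
    SELECT SalesKey, ProductKey, Region, OrderDate, SalesAmount
    FROM dbo.StageSales
    WHERE OrderDate = @LoadDate;
END;

-- Example invocation:
EXEC dbo.usp_LoadDailySales '2023-06-01';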
Returning to shared access signatures: a shared access signature is a string that contains a security token that can be attached to a Uniform Resource Identifier, or URI.

You explored how you can manage authorization through column-level and row-level security within Azure Synapse Analytics. You saw that column-level security allows you to restrict column access in order to protect sensitive data. Row-level security, or RLS, can help you to use group membership or execution context in order to control access not just to the columns in a database table but to the actual rows.

You learned how to implement encryption and should recall that transparent data encryption, or TDE, is an encryption mechanism that helps protect Azure Synapse Analytics against the threat of malicious offline activity. TDE encrypts the entire database storage using a symmetric key called the database encryption key, or DEK.

Well done! You have completed the course recap. Now it is time to take the course practice exam.
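Finally, before you start the practice exam, here is one last minimal sketch as a quick reference for the security features recapped above: column-level security, row-level security, and TDE. The user, function, and pool names are hypothetical.

-- Column-level security: a grant that exposes only non-sensitive columns.
GRANT SELECT ON dbo.FactSales (ProductKey, OrderDate, SalesAmount) TO analyst_user;

-- Row-level security: an inline predicate function plus a security policy
-- that filters fact rows to the current user's region.
CREATE FUNCTION dbo.fn_RegionPredicate (@Region AS NVARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    WHERE @Region = USER_NAME();

CREATE SECURITY POLICY RegionFilterPolicy
ADD FILTER PREDICATE dbo.fn_RegionPredicate(Region)
ON dbo.FactSales;

-- Transparent data encryption: encrypts the entire database storage with a
-- database encryption key (run from the master database against the pool).
ALTER DATABASE MySqlPool
SET ENCRYPTION ON;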