Snowflake Interview Questions and Answers
Note: This is not a video-based course.
We have put together a detailed, real-world list of interview questions (300+) that will help you answer questions of all levels asked in an interview. These questions and answers are helpful to both freshers and experienced professionals.
This guide is a consolidated effort of data management consultants and more than 4 Snowflake architects. Every question in this guide was asked in a real interview at least once over the past 3 years; the questions were either asked of the contributors or asked by them of other interviewees. Every question is accompanied by a detailed answer (technical and functional) with examples or illustrations.
You can use these interview questions as additional study material to prepare for the SnowPro Core Certification.
Course Curriculum
- Explain the Snowflake architecture.
- What is the Snowflake Data Warehouse?
- How does Snowflake work?
- What are the three layers of the Snowflake architecture?
- Is Snowflake an MPP database?
- Explain the different table types available in Snowflake.
- Which Snowflake edition should you use if you want to enable Time Travel for up to 90 days?
- What are micro-partitions?
- Can you create transient views in Snowflake?
- Explain the differences and similarities between transient and temporary tables.
- By default, clustering keys are created for every table; how can you disable this option?
- What is the default type of table created in Snowflake?
- How many servers are present in an X-Large warehouse?
- Since Snowflake uses one of the cloud providers (such as AWS or Azure) as part of its architecture, why can't the AWS database Amazon Redshift be used instead of the Snowflake warehouse?
- What view types can be created in Snowflake but not in traditional databases?
- Is Snowflake a data lake?
- What are the key benefits you have noticed after migrating to Snowflake from a traditional on-premise database?
- When you execute a query, how does Snowflake retrieve the data compared to traditional databases?
- Explain the difference between external stages and internal named stages.
- Explain the difference between user stages and table stages.
- You are working in an investment bank, want to explore Snowflake, and have decided to create a free Snowflake account. Your manager has instructed you to use the Virtual Private Snowflake edition for the free trial, as this edition provides dedicated servers for your company. What do you suggest?
- Which constraints are enforced in Snowflake?
- What is unique about Snowflake compared to other warehouses?
- You have created a virtual warehouse of size 2X-Large, extracted data from different data sources, and applied address-validation standardization using third-party tools before loading the data into the warehouse. As part of a compliance requirement, you need to compare the source address with the address loaded in the target after standardization and load the records that don't match into a temporary table. The temporary table created in the Prod environment has to be cloned to the Test environment so the development team can review the data and then provide the results to the business team. What is your recommendation?
- You have recently created your Snowflake account and built a few jobs that extract data from SAP HANA. During one of the product-release tests, there were failures due to which some of the virtual warehouses (2X-Large) are not available. What do you recommend?
- You have observed that a stored procedure executed daily at 7 AM as part of your batch process is consuming resources, the CPU I/O is showing 90%, and the other jobs being executed are impacted by it. How can you quickly resolve the issue with the stored procedure?
- You have migrated from Teradata to Snowflake. One of the main issues faced in the old system was with the MPI data: the source system sends the MPI data daily at 01:00 AM, and the ETL process takes around 5-6 hours, loading the data into the target tables between 06:00 AM and 07:00 AM. After the ETL process completes, no other process modifies the data in these tables until the business users check and confirm the ledger transactions; after the users confirm, the indicators in the target table are updated and the data is loaded into the downstream tables. The issue the business users faced while accessing the data in Teradata was that each user had to wait 1-2 hours to get the required General Ledger stats, and sometimes the CPU I/O usage was high when multiple queries were executed. Business users want query results immediately, as these are static SQL statements they run daily. How can these issues be fixed in the newly migrated Snowflake database?
- You have migrated from Teradata to Snowflake. In the old system (Teradata), a few ETL batch loads were scheduled using the Teradata TPump load utility; TPump uses row-hash locks, meaning users can run queries while it is updating the Teradata warehouse. In the new system (Snowflake), you should use the same approach: users should be able to access the data without any issues while loads are being executed on the Snowflake warehouses. What is your recommendation?
- You are using the Snowflake connector in Informatica Cloud (a data integration tool) to process data as per your batch requirements. You have extracted data from different data sets and loaded it into stage tables; from the stage tables, the data is loaded into your warehouse. The data in the stage tables is always truncated and reloaded for every load. In Snowflake, you can define the stage table type as...
- Some queries are executing on a warehouse and you have issued an ALTER WAREHOUSE statement to resize it. How will this affect the queries that are already executing?
- It's a best practice to disable fail-safe for temporary tables; these tables exist only for the duration of a session and are not queryable by any other user, so disabling fail-safe would reduce fail-safe storage for temporary tables. What do you recommend?
- Your company has recently procured the Snowflake Standard edition. The initial plan was to migrate applications one by one and then upgrade to the Enterprise edition, but all the applications are dependent on each other, so you migrated them all at the same time to the Standard edition. As queries are submitted to the warehouse, Snowflake queues most of them due to insufficient resources, which causes issues for users accessing different applications. How can you resolve this issue at the earliest?
- A new business analyst has joined your project. As part of the onboarding process, you sent him some queries to generate reports; a query took around 5 minutes to execute, but the same query, when executed by other business analysts, returned results immediately. What could be the issue?
- You are working in an insurance company and have planned a major weekend deployment, which includes extracting historical data from PowerExchange and loading it into one of the Snowflake database tables. The load took around 20 hours to complete, and the data was validated for release to the users on Monday morning so they could complete the review as part of the compliance process for the newly launched MedSupp policies. One of the incremental ETL jobs you deployed as part of this release has the Truncate Target Table option enabled, and the data loaded from PowerExchange was deleted when the job executed on Monday. What is the best approach to recover the accidentally deleted historical data at the earliest?
- You have created a warehouse using the command create or replace warehouse OriginalWH initially_suspended=true; What will be the size of the warehouse? (A minimal SQL sketch follows the curriculum.)
- You are changing the scaling policy for a warehouse from Standard to Economy. You want to ensure that SQL statements from the application can be queued for only 180 seconds, and that any query running for more than 360 seconds is canceled by the system. Which parameters should you configure for this requirement?
- You are executing some queries on a Medium warehouse, and the queries are taking longer than expected, so you are planning to resize the warehouse to X-Large. Can you resize the warehouse while the queries are still executing?
- You have created an external table (E_Prev_MPI) into which you loaded all the historical data from an MPI source system. You need to join E_Prev_MPI with one of the tables in the warehouse (W_Curr_MPI), which holds the current snapshot of data, and for any matching records update the E_Prev_MPI.Matched column to 'Yes'. There are a lot of performance issues while performing the update, so you created a partition on the E_Prev_MPI table. Is this the best approach?
- You are working in a medical services company. As per the guidelines of the legal team, any objects containing PII data must not be visible to those who do not require access to that data. What is the best approach for this requirement?
- You are working in a major telecom company and collect transactional data from different switches, which generate a huge CDR (call detail record) volume of 1 TB every day. All the CDR records are loaded into a summary table (which resides in a Snowflake warehouse of size 4X-Large), and different reports are generated based on the daily revenue from calls in each region. The queries that generate the daily reports, which are based on transaction date and region, run for a long time. What is the best approach to optimize the queries so the reports are generated faster?
- Your organization has planned to procure Snowflake and has decided to migrate applications in phases. The first phase includes some non-critical applications whose requests can be queued for up to 24 hours until they are processed. You want to keep costs low in the first phase, with room to scale up later. Select the best warehouse for this requirement.
- Tableau reports are configured to query data through joins across multiple Snowflake tables of 120 GB each, and the reports are accessed by senior management to make critical decisions. Some of the users accessing the reports have faced severe performance issues loading the dashboards and reports. What can be done so users get fast response times when accessing the dashboards and reports?
- You are trying to debug a production issue due to which some reports are showing incorrect numbers. The data loaded into a summary table for a particular policy (AMT00877TR5) is incorrect; the table from which the data is fetched is SCD Type 1, and by the time you checked, the tables had already been updated with the latest data. You are not sure about the root cause of the issue. How can you check the data as it was before the update?
- You have created some reports that access a huge volume of data. The reports are configured to perform range and equality searches, and business users generate them on the first business day of every month. The reports are generated from the data in Snowflake tables, and the report-generation process is very slow. You have been instructed not to add any more storage costs due to project constraints. What can be done to improve the performance of the reports?
- A query is executed from the client, the query result exists in the result cache, and the underlying data has not changed. The query results are returned from...
- A manufacturing company decided to implement data sharing and share data about the progress of orders directly with the consumers of their products. However, a customer must only be able to see the orders they have placed. How can this be achieved?
- You have scheduled a job to re-cluster a table over the weekend, but the DBA team will suspend all the virtual warehouses that weekend. What error will you get when your job is triggered?
- You have an ETL process currently executing on Oracle, installed on a single cluster; the process starts at 7 PM and takes around 10 hours to complete. You are now migrating to Snowflake and must provide a clear estimate of how long the job will take in the new Snowflake Prod environment; your manager is concerned about the credits ($$) that Snowflake will charge. How will you provide the stats for this?
- The batch load completes at 3 AM; you kept a buffer of 2 hours and scheduled the clone jobs to copy data from Prod to the Test environment every Monday morning at 5 AM. Due to an issue, the batch process ran until 7 AM, and you did not hold the clone jobs. When the clone jobs execute, how will the data be cloned from Prod to Test while the data load is still in progress?
- You have created a network policy, but the policy is not enforced in Snowflake. What could be the issue?
- You have recently joined a company that uses a Teradata database. Some of the architects proposed migrating from Teradata to Snowflake, but the customer is not clear on the benefits of the move, as both Teradata and Snowflake are massively parallel processing (MPP) systems. What is your recommendation to the customer on using Snowflake over the Teradata database?
- Most customers who use Snowflake talk about zero-copy cloning equating to zero-cost development. Do you agree with this?
- Why does Snowflake have the option to create primary and foreign key constraints when these cannot be enforced?
- You have created a standard multi-cluster warehouse with a maximum of 10 clusters and a minimum of 3. Say the warehouse is using 8 clusters and users have executed several queries that are all cached, so users see the cached results quickly; now the warehouse scales down from 8 clusters to 4. Will the cache files be reused when the users execute the same queries?
- You are performing some data loads using Snowpipe. The load is taking longer than expected, so you stopped the existing load and increased the size of the virtual warehouse to X-Large. When you restart the load, does it resume from the point where it last stopped?
- In Snowflake, you can use the SAMPLE clause to limit the number of records returned from a table. What are some use cases where you have used the SAMPLE clause apart from fetching sample records from a table?
- You have a list of around 150 tables in Snowflake, and you need to get the names of the corresponding table stages and send them to the downstream team. How can you get the table stage names? (A SQL sketch follows the curriculum.)
- In Snowflake, you can limit the records fetched with the SAMPLE clause. Which sampling method will impact query performance, and why?
- You are loading data into a table from which healthcare reports are generated. The load doesn't have any issues and completes in less than 2 minutes, but users accessing the reports face severe performance issues. The reports are filtered by two specific columns, so you suggested using clustering keys or search optimization, but the Snowflake administrator is not in favor of creating clustering keys, search optimization, or materialized views. How can you solve this issue?
- How does Snowflake charge for data storage?
- What is the difference between replication and cloning?
- In Secure Data Sharing, who pays for the compute and storage resources?
- You are a data provider sharing your data with non-Snowflake users. How can you control data usage by the data consumers?
- You have just procured Snowflake and created 4 warehouses. How much will Snowflake charge for this?
- You are using Snowflake every day throughout the year for 3 hours, loading data into a virtual warehouse of size X-Small. You are now trying to estimate the yearly credits of your Snowflake usage; based on the ETL schedule, your ETL analyst specified that Snowflake was used for 260 days. How many credits will Snowflake charge for this usage? (A worked estimate follows the curriculum.)
- Does Snowflake charge any credits when you perform re-clustering on a table?
- You have used a Snowflake trial account and decided to proceed with migrating your data to the Snowflake Enterprise edition. You have clear stats on the required storage. Which purchase plan do you recommend?
- How did you optimize the incurred costs in Snowflake?
- Explain how On-Demand vs. Pre-Purchased Capacity will impact your project budget.
- You need to provide high-level estimates of the cost of using Snowflake with different cloud providers in different regions. Which utility can you use to get these stats?
- How will choosing the incorrect storage type impact your budget?
- You have selected AWS as your cloud provider. Where can you check how much AWS is charging your account for Snowflake usage?
- How can you check the consumed credits in your account?
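Worked Sketches
For the warehouse-creation question above, a minimal sketch you can try on any account (the warehouse name OriginalWH comes from the question itself). Because the command specifies no WAREHOUSE_SIZE, Snowflake applies its default, which for SQL-created warehouses is X-Small:

    -- No WAREHOUSE_SIZE is given, so the SQL default (XSMALL) applies.
    create or replace warehouse OriginalWH initially_suspended=true;
    -- The "size" column of the output confirms the default.
    show warehouses like 'OriginalWH';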
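For the table-stage question, one possible approach: every table has an implicit table stage referenced as @% followed by the table name, so the stage references can be generated from the metadata rather than listed by hand. A sketch, assuming the tables live in a schema named PUBLIC (adjust to your own):

    -- Generate the implicit table-stage reference (@%<table>) for each table.
    select '@%' || table_name as table_stage_name
    from information_schema.tables
    where table_schema = 'PUBLIC'
      and table_type = 'BASE TABLE';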
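For the credit-estimation question, a worked estimate, assuming Snowflake's documented rate of 1 credit per hour for an X-Small warehouse and billing only for the 260 days of actual usage:

    -- 1 credit/hour x 3 hours/day x 260 days of usage:
    select 1 * 3 * 260 as estimated_yearly_credits;  -- 780 credits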