Data Warehousing Interview Questions and Answers with Explanation

Q:

What is data cleaning? How can we do that?

Answer

Data cleaning, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. It checks data accuracy, consistency, and integration. Data cleaning can be applied to a single set of records or to multiple sets of data that need to be merged.

Data cleaning is performed by reading all records in a set and verifying their accuracy. Typos and spelling errors are corrected. Mislabeled data is relabeled and filed correctly. Incomplete or missing entries are completed. Unrecoverable records are purged so that they do not take up space or slow down operations.
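The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `CITY_FIXES` correction table, and the `clean_records` helper are hypothetical names, not part of any particular tool.

```python
# Hypothetical correction table for known typos (an assumption for this sketch).
CITY_FIXES = {"Nwe York": "New York", "Londn": "London"}


def clean_records(records, default_country="Unknown"):
    """Return cleaned copies of the records and drop the unrecoverable ones."""
    cleaned = []
    for rec in records:
        # Purge records with no usable key: they cannot be repaired.
        if not rec.get("id"):
            continue
        row = dict(rec)
        # Rectify typos and stray whitespace in the city name.
        city = (row.get("city") or "").strip()
        row["city"] = CITY_FIXES.get(city, city)
        # Complete a missing entry with a default value.
        if not row.get("country"):
            row["country"] = default_country
        cleaned.append(row)
    return cleaned


raw = [
    {"id": 1, "city": " Nwe York ", "country": "US"},
    {"id": 2, "city": "Londn", "country": None},
    {"id": None, "city": "Paris", "country": "FR"},
]
cleaned_rows = clean_records(raw)
```

In practice the correction rules and the definition of "unrecoverable" depend on the data set; here a record without an `id` is treated as unrecoverable purely for illustration.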


Subject: Data Warehousing


Q:

What is Data Mart?

Answer

A data mart is a data repository that serves a particular community of knowledge workers. Its data can come from enterprise resources or from a data warehouse.


Q:

Describe Physical Data Integration.

Answer

- Physical Data Integration (PDI) means creating a new system that replicates data from the source systems.

- This is done so that the data can be managed independently of the original systems.

- A data warehouse is an example of Physical Data Integration.

- The benefits of PDI include data version management and the combination of data from various sources, such as mainframes, flat files, and databases.

- A separate system is needed to handle the vast data volumes.
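As a rough sketch of the idea, the Python below replicates rows from two sources into a separate store. The sources (an in-memory flat file and an in-memory SQLite database), the table layout, and the `source` column are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Two hypothetical sources: a flat file and an operational database.
flat_file = io.StringIO("customer_id,name\n1,Ann\n2,Bob\n")
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
source_db.execute("INSERT INTO customers VALUES (3, 'Cara')")

# The separate, integrated store that holds its own copy of the data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT, source TEXT)"
)

# Replicate rows from the flat file.
for row in csv.DictReader(flat_file):
    warehouse.execute(
        "INSERT INTO customers VALUES (?, ?, ?)",
        (int(row["customer_id"]), row["name"], "flat_file"),
    )

# Replicate rows from the source database.
for customer_id, name in source_db.execute(
    "SELECT customer_id, name FROM customers"
):
    warehouse.execute(
        "INSERT INTO customers VALUES (?, ?, ?)",
        (customer_id, name, "source_db"),
    )

integrated = warehouse.execute(
    "SELECT customer_id, name, source FROM customers ORDER BY customer_id"
).fetchall()
```

Once the copy exists, the warehouse can be queried and versioned without touching either source system.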


Q:

Explain the use of lookup tables and Aggregate tables.

Answer

A lookup table is used when updating the data warehouse. The lookup is placed on the target (a fact table or another warehouse table) and is based on the target's primary key. Depending on the lookup condition, the update lets through only new records or updated records.

Aggregate tables contain summarized data and are often implemented as materialized views. For example, to generate sales reports on a weekly, monthly, or yearly basis instead of a daily basis, the daily values are aggregated into weekly values, weekly values into monthly values, and monthly values into yearly values. Aggregate functions are used to perform this process.
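Both ideas can be illustrated with plain Python data structures. This is a minimal sketch, assuming the target table is a dictionary keyed by primary key; `load_with_lookup` and `aggregate_by_month` are hypothetical helper names, not functions of any ETL tool.

```python
from collections import defaultdict
from datetime import date


def load_with_lookup(target, incoming):
    """Insert new rows and update changed ones, keyed by primary key."""
    inserted, updated = 0, 0
    for key, value in incoming.items():
        if key not in target:        # lookup miss: a new record
            target[key] = value
            inserted += 1
        elif target[key] != value:   # lookup hit with changed data
            target[key] = value
            updated += 1
        # lookup hit with identical data: nothing to do
    return inserted, updated


def aggregate_by_month(daily_sales):
    """Sum daily amounts into (year, month) totals."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        totals[(day.year, day.month)] += amount
    return dict(totals)


target_table = {101: "Widget", 102: "Gadget"}
counts = load_with_lookup(
    target_table, {101: "Widget", 102: "Gadget v2", 103: "Gizmo"}
)

monthly = aggregate_by_month([
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 1), 25.0),
])
```

A real warehouse would usually do the same work in SQL, with a merge or upsert statement for the lookup and `GROUP BY` with `SUM` for the aggregate table.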


Q:

Describe the foreign key columns in fact table and dimension table.

Answer

The primary keys of entity tables are the foreign keys of dimension tables.

The primary keys of dimension tables are the foreign keys of fact tables.
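The relationship between dimension keys and fact keys can be sketched with Python's built-in `sqlite3` module. The table and column names (`dim_product`, `dim_date`, `fact_sales`) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
""")

conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
conn.execute("INSERT INTO dim_date VALUES (20240105, '2024-01-05')")
# Valid fact row: both foreign keys match a dimension primary key.
conn.execute("INSERT INTO fact_sales VALUES (1, 20240105, 99.5)")

# A fact row that points at a missing dimension row is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (999, 20240105, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Each foreign key column in `fact_sales` refers to the primary key of one dimension table, which is what allows the fact rows to be joined to their descriptive attributes.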


Q:

What is Cascade and Drill Through? What is the difference between them?

Answer

Cascade:

- A cascade takes its values from other prompts, so the choices in one prompt depend on the values selected in another.

- The result is a single report.

- It is used when a criterion has to be applied to the report.

Drill Through:

- Drill through is used to navigate from summary information to detailed information.

- Drill through involves a parent report and a child report.

- The data of another report can be viewed based on the details currently selected.
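The distinction can be loosely illustrated outside any reporting tool. The sketch below is not the API of a specific product; the `sales` rows and the three helper functions are hypothetical and only mimic the behaviour described above.

```python
sales = [
    {"region": "East", "product": "Widget", "amount": 100.0},
    {"region": "East", "product": "Gizmo", "amount": 40.0},
    {"region": "West", "product": "Widget", "amount": 70.0},
]


def cascaded_products(rows, region):
    """Cascade: the product prompt offers only values valid for the chosen region."""
    return sorted({row["product"] for row in rows if row["region"] == region})


def summary_report(rows):
    """Parent report: total amount per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def detail_report(rows, region):
    """Child report: the detail rows behind one line of the summary."""
    return [row for row in rows if row["region"] == region]


product_choices = cascaded_products(sales, "East")
parent = summary_report(sales)
child = detail_report(sales, "East")
```

The cascade narrows the choices that feed a single report, while the drill through moves from one report (`parent`) to a second, more detailed one (`child`).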


Q:

What are the different models used in cluster analysis?

Answer

Many algorithms can be used for cluster analysis, and they differ in what they treat as a cluster. The different types of cluster models are as follows:

- Connectivity models: these models build clusters based on distance connectivity between data points. Hierarchical clustering is an example.

- Centroid models: these models represent each cluster by a single mean vector. The k-means algorithm is an example.

- Distribution models: these models describe clusters using statistical distributions, for example the multivariate normal distribution.

- Density models: these models define clusters as densely connected regions in the data space.

- Group models: these models do not provide a refined model for the output and give only the grouping information.
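To make the centroid model concrete, here is a minimal k-means sketch in pure Python. It assumes 2-D points and caller-supplied starting centroids, which keeps the run deterministic; production code would normally use a library implementation and a better initialization.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for centroid, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(centroid)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids, final_clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Each cluster ends up summarized by one mean vector, which is exactly what distinguishes centroid models from the connectivity, distribution, and density models listed above.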


Q:

Explain the difference between star and snowflake schemas.

Answer

Star schema: A highly denormalized technique. A star schema has one fact table associated with numerous dimension tables, so the layout resembles a star.

Snowflake schema: A star schema to which normalization principles have been applied. Each dimension table can be associated with sub-dimension tables.

Differences:

- In a star schema, a dimension table has no parent table, whereas in a snowflake schema a dimension table has one or more parent tables.

- In a star schema, the dimension table itself contains the hierarchies of the dimension, whereas in a snowflake schema the hierarchies are split into different tables. Data can be drilled down from the topmost hierarchy to the lowermost hierarchy.
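The difference shows up directly in the queries. The following sketch uses Python's built-in `sqlite3` module with hypothetical table names; the same product dimension is modeled once in star form and once in snowflake form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the product dimension stores its category hierarchy inline.
CREATE TABLE star_dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
);
-- Snowflake: the category level moves to its own parent table.
CREATE TABLE snow_dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE snow_dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT,
    category_id INTEGER REFERENCES snow_dim_category(category_id)
);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

INSERT INTO star_dim_product VALUES (1, 'Widget', 'Tools'), (2, 'Gizmo', 'Toys');
INSERT INTO snow_dim_category VALUES (10, 'Tools'), (20, 'Toys');
INSERT INTO snow_dim_product VALUES (1, 'Widget', 10), (2, 'Gizmo', 20);
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 30.0);
""")

# Star schema: one join from the fact table to the dimension.
star_totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN star_dim_product p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake schema: an extra join to reach the parent (category) table.
snowflake_totals = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN snow_dim_product p ON p.product_id = f.product_id
    JOIN snow_dim_category c ON c.category_id = p.category_id
    GROUP BY c.category ORDER BY c.category
""").fetchall()
```

Both queries return the same totals; the snowflake version stores each category name once at the cost of an additional join.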
