Data Warehousing Questions

Q:

What is the difference between view and materialized view?

Answer

View:


- A view provides a tailored representation of the data in its underlying table(s).


- It is a logical structure only and does not occupy storage space for the data.


- Changes made through the view are reflected in the corresponding base tables.


 


Materialized view


- A materialized view stores precalculated data.


- It occupies physical storage space.


- Changes made to its stored data are not reflected in the corresponding base tables.
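
The storage difference can be illustrated with a minimal sketch using Python's built-in sqlite3 module. SQLite has no materialized views, so the table `mv_totals` below is only a stand-in built with CREATE TABLE ... AS SELECT; the table names, column names, and the `refresh_snapshot` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('west', 20);

    -- A view stores only the query text, not the result rows.
    CREATE VIEW v_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- Stand-in for a materialized view: the result rows are stored physically.
    CREATE TABLE mv_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")


def totals(source):
    """Read region totals from the view or from the stored snapshot."""
    return dict(conn.execute(f"SELECT region, total FROM {source}").fetchall())


def refresh_snapshot():
    """Hypothetical refresh step: recompute the stored rows from the base table."""
    conn.execute("DELETE FROM mv_totals")
    conn.execute(
        "INSERT INTO mv_totals "
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )


# Change the base table after both objects were created.
conn.execute("INSERT INTO sales VALUES ('east', 5)")

view_now = totals("v_totals")        # recomputed from the base table on each read
snapshot_stale = totals("mv_totals")  # still holds the precalculated rows

refresh_snapshot()
snapshot_fresh = totals("mv_totals")
```

In this sketch the view picks up the new row immediately, while the stored copy keeps its old totals until it is refreshed.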

Q:

What are the different models used in cluster analysis?

Answer

Many algorithms can be used for cluster analysis, and they differ in how they define a cluster. The main types of cluster model are as follows:


- Connectivity models: these models build clusters from the distance connectivity between data points. Hierarchical clustering is an example.


- Centroid models: these models represent each cluster by a single mean vector. The k-means algorithm is an example.


- Distribution models: these models describe clusters using statistical distributions, for example the multivariate normal distribution.


- Density models: these models define clusters as densely connected regions of the data space.


- Group models: these models do not provide a refined model for the results and only give the grouping information.
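
As an illustration of a centroid model, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data in plain Python. The function name `k_means_1d`, the sample points, and the starting centroids are assumptions chosen for the example; a real analysis would normally use a library implementation.

```python
def k_means_1d(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations on 1-D data.

    Returns the final centroids and the clusters from the last assignment step.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```

Each cluster ends up summarized by a single mean value, which is what distinguishes a centroid model from the connectivity and density models in the list.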

Q:

Explain the various caches available in Data Integrator.

Answer

  •  NO_CACHE – Values are not cached.

  •  PRE_LOAD_CACHE – The result column and the compare column are preloaded into memory before the lookup is executed.

  •  PRE_LOAD_CACHE is used when the table fits entirely in memory.

  •  DEMAND_LOAD_CACHE – The result column and the compare column are loaded into memory as the function executes.

  •  DEMAND_LOAD_CACHE is suitable when looking up highly repetitive values in a small subset of the data.
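
The trade-off between the three options can be sketched in plain Python. This is not the Data Integrator API: the `Lookup` class, the `SOURCE_TABLE` data, and the read counter are hypothetical and only model when the source table is read under each strategy.

```python
# Hypothetical lookup table: compare column value -> result column value.
SOURCE_TABLE = {"A1": "Austria", "B2": "Belgium", "C3": "Canada"}


class Lookup:
    """Toy model of a lookup function with a configurable cache strategy."""

    def __init__(self, table, strategy):
        self.table = table
        self.strategy = strategy
        self.source_reads = 0  # how often the source table was read
        self.cache = {}
        if strategy == "PRE_LOAD_CACHE":
            # Load every row into memory before the first lookup.
            for key in table:
                self.cache[key] = self._read(key)

    def _read(self, key):
        self.source_reads += 1
        return self.table.get(key)

    def get(self, key):
        if self.strategy == "NO_CACHE":
            return self._read(key)  # always go back to the source
        if key not in self.cache:
            # DEMAND_LOAD_CACHE fills the cache on first use of a key.
            self.cache[key] = self._read(key)
        return self.cache[key]


# A highly repetitive workload that touches only a small subset of the keys.
keys = ["A1", "A1", "A1", "B2"]
reads = {}
for strategy in ("NO_CACHE", "PRE_LOAD_CACHE", "DEMAND_LOAD_CACHE"):
    lookup = Lookup(SOURCE_TABLE, strategy)
    results = [lookup.get(k) for k in keys]
    reads[strategy] = lookup.source_reads
```

With this workload the demand-loaded cache reads the source least often, while preloading pays for every row up front, which only makes sense when the whole table fits in memory.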

Q:

What factors must be addressed when integrating data?

Answer

The following factors must be addressed when integrating data:


- The optimal subset of the available data.


- Estimated noise and distortion levels caused by sensing and processing conditions at the time of data collection.


- Accuracy, spatial resolution, and spectral resolution of the data.


- Data formats and storage and retrieval mechanisms.


- Computational efficiency of integrating the data sets to reach the goals.

Q:

What is Dimensional Modeling?

Answer

Dimensional modeling (DM) is often used in data warehousing. In simple terms, it is a logical, consistent design technique for building a data warehouse. DM organizes the warehouse design around facts and dimensions. Star and snowflake schemas are examples of dimensional models.

Q:

What are fact tables and dimension tables?

Answer

Fact tables store business facts (measures) together with foreign keys that reference candidate keys in the dimension tables. Fact tables usually provide additive values, which act as independent variables by which dimensional attributes are analyzed.


Dimension tables store the attributes used to constrain and group data in data warehouse queries.
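
A minimal sketch of one fact table and one dimension table, using Python's built-in sqlite3 module. The table names, column names, and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes used to filter and group.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    );

    -- Fact table: additive measures plus a foreign key to the dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games'), (3, 'books');
    INSERT INTO fact_sales VALUES
        (1, 2, 20.0), (2, 1, 50.0), (3, 4, 40.0), (1, 1, 10.0);
""")

# The dimension attribute constrains (WHERE) and groups (GROUP BY) the query;
# the fact table supplies the additive measure that is summed.
revenue_by_category = conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    WHERE d.category IN ('books', 'games')
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```

Here `revenue` can be summed across any grouping of products, which is what makes it an additive fact.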


 

Q:

How do we measure progress in Data Integration?

Answer

Look for the existence of the following items:


- Generic Data Models


- An Enterprise Data Platform


- Identified Data Sources


- Selection of an MDM Product


- Implementation of a Customer Master Index or appropriate alternative

Q:

Define "correlated subquery".

Answer

In a SQL database, a correlated subquery is a subquery that depends on the outer query for a value it uses. Because of that dependency it cannot be run once on its own: conceptually, the subquery is evaluated once for each row processed by the outer query.
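
A small example, run through Python's built-in sqlite3 module, may make the dependency concrete. The `employees` table and its rows are assumptions made for the example. The inner query references `e.dept`, a column of the outer query's current row, so its result differs from row to row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'eng', 120), ('Bob', 'eng', 100),
        ('Cat', 'ops', 80),  ('Dan', 'ops', 90);
""")

# Employees who earn more than the average salary of their own department.
# The inner SELECT is correlated: it uses e.dept from the outer row.
above_dept_average = [
    row[0]
    for row in conn.execute("""
        SELECT e.name
        FROM employees AS e
        WHERE e.salary > (
            SELECT AVG(i.salary)
            FROM employees AS i
            WHERE i.dept = e.dept
        )
        ORDER BY e.name
    """)
]
```

The averages are 110 for `eng` and 85 for `ops`, so only Ann and Dan qualify.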
