CS614 Assignment 1 Solution Spring 2021
Question 1: Suppose that you are the data analyst on
the project team building a data warehouse for an insurance company. List at
least three data sources from which you will bring the data into your data
warehouse?
Solution:
Operational Database:
Operational databases keep track of, monitor, and store real-time business
data. For example, an operational database may track warehouse stock
quantities: as consumers order products from an online store, it records how
many goods have been sold and when the business needs to reorder stock.
An operational database stores information about an organization’s activities,
such as customer relationship management transactions or financial operations.
It is used to manage the company’s day-to-day activities and transactions. It
may also support analytic processing by delivering real-time dashboards or by
integrating analytics into organizational processes.
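The stock-tracking example above can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database; the table `stock`, its columns, and the helper names `record_sale` and `items_to_reorder` are hypothetical and not part of the assignment.

```python
import sqlite3

def record_sale(conn, name, units):
    """Decrease the stock of one product when an order is placed."""
    conn.execute(
        "UPDATE stock SET quantity = quantity - ? WHERE name = ?", (units, name)
    )

def items_to_reorder(conn):
    """Return product names whose stock is at or below the reorder level."""
    rows = conn.execute(
        "SELECT name FROM stock WHERE quantity <= reorder_level ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Hypothetical sample data: (name, quantity, reorder_level)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (name TEXT PRIMARY KEY, quantity INTEGER, reorder_level INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [("keyboard", 12, 5), ("mouse", 30, 10)],
)

record_sale(conn, "keyboard", 8)   # keyboard stock drops from 12 to 4
print(items_to_reorder(conn))      # keyboard is now below its reorder level
```

A data warehouse would typically extract from tables like this one on a schedule rather than query them for every report.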
Archive datastore:
Almost all archive data stores are represented in relational format. Mapping
source data into the archive is simple if the source data is relational. Some
source databases, on the other hand, are not relational and take some work to
convert.
The main objective is to manage the archive datastore in a way that ensures
its long-term viability.
Benefits of Data Archiving:
- Reduced cost
- Better backup and restore performance
- Prevention of data loss
- Increased security
- Regulatory compliance
Semi-structured:
Semi-structured data lies between structured and unstructured data. PACS
(picture archiving and communication systems) are the best-known example in
healthcare: a database stores information about the stored images (which is
structured), but the individual image files are unstructured data. PACS are
typically built on top of a SQL database such as Oracle, and the structured
portion of the system is small in comparison to the unstructured image data.
Semi-structured data is a hybrid of structured and unstructured data. It
follows a consistent set of rules, and its format is meant to save space and
make the data clearer. Semi-structured formats include CSV, XML, and JSON.
NoSQL databases are commonly used to store semi-structured data.
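To illustrate how semi-structured data might be brought into a warehouse, here is a minimal sketch that flattens JSON records into uniform rows. The field names (`claim_id`, `amount`, `adjuster`) and the sample records are hypothetical, chosen only to fit the insurance scenario; the point is that some records may omit fields that others carry.

```python
import json

def flatten_claims(raw_json):
    """Turn a JSON array of claim records into rows with a fixed set of columns.

    Fields missing from a record become None instead of raising an error.
    """
    records = json.loads(raw_json)
    rows = []
    for rec in records:
        rows.append({
            "claim_id": rec["claim_id"],                      # assumed always present
            "amount": rec.get("amount"),                      # optional field
            "adjuster": rec.get("adjuster", {}).get("name"),  # optional nested field
        })
    return rows

# Hypothetical input: the second record has no "adjuster" object.
raw = (
    '[{"claim_id": 1, "amount": 250.0, "adjuster": {"name": "A. Khan"}},'
    ' {"claim_id": 2, "amount": 90.5}]'
)
rows = flatten_claims(raw)
for row in rows:
    print(row)
```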
Question 2: Data warehouse systems often have
complex issues due to many business requirements. Technical complexity
issues arise from three areas: sourcing issues, transformation issues, and
target issues.
Write at least two examples of each (Not more than
one line for each).
Transformation issues:
Transformation requires various tests, each of which is time-consuming when
applied to larger data sets and can reduce accuracy. A lack of experience and
carelessness during transformation can cause issues.
Data warehouses also have limitations: the authenticity of data is sometimes
falsified, and in certain instances authenticating the data is not feasible.
Target Issue:
The final stage is Load. Loading data that has not been cleaned into the
target system results in errors. Loading irrelevant data also causes errors in
the target schema.
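One way to reduce such load errors is to separate clean rows from rejected rows before loading. The sketch below assumes a simple rule (a row is clean only if every required field is present and non-empty); the function name `split_clean_and_rejected` and the sample fields are hypothetical.

```python
def split_clean_and_rejected(rows, required_fields):
    """Separate rows that are safe to load from rows that need cleaning first.

    A row is rejected if any required field is missing, None, or an empty string.
    Each rejected entry records which fields caused the rejection.
    """
    clean, rejected = [], []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            rejected.append((row, missing))
        else:
            clean.append(row)
    return clean, rejected

# Hypothetical staged rows waiting to be loaded into the target table.
staged = [
    {"policy_id": "P-1", "premium": 1200},
    {"policy_id": "", "premium": 800},      # empty key
    {"policy_id": "P-3", "premium": None},  # missing measure
]
clean, rejected = split_clean_and_rejected(staged, ["policy_id", "premium"])
print(len(clean), len(rejected))
```

Only the `clean` rows would then be passed to the load step, while `rejected` rows go back for cleaning.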
Sourcing Issue:
Database path is incorrect
Creating bottlenecks due to insufficient CPU or memory resources
Saving data in Urdu or Farsi in the database
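The last sourcing issue is usually a character-encoding problem. As a rough sketch, the check below tests whether a piece of text survives a round trip through a given encoding; the helper name `can_store` and the sample string are assumptions for illustration only.

```python
def can_store(text, encoding):
    """Return True if text can be encoded with the given encoding and read back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False

# Sample Urdu text written with Unicode escapes (Arabic-script letters).
urdu_sample = "\u0628\u06cc\u0645\u06c1"

print(can_store(urdu_sample, "utf-8"))    # True: UTF-8 covers Arabic-script letters
print(can_store(urdu_sample, "latin-1"))  # False: Latin-1 has no such characters
```

A source column declared with a single-byte Western encoding would fail the same way, which is why Urdu or Farsi data needs a Unicode-capable encoding end to end.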