Playlist

Data Warehouse

by Dr. Holger Aust

My Notes
  • Required.
Save Cancel
    Learning Material 2
    • PDF
      Slides Digitalization for Companies.pdf
    • PDF
      Download Lecture Overview
    Report mistake
    Transcript

    00:07 A data warehouse is an important part of modern companies.

    00:10 In the world we live in now, where data is everywhere, it is too important to not be able to analyze data consistently and measure results.

    00:18 Companies gain an edge over their competitors by making decisions based on data. A data warehouse is a central system for storing and analyzing data.

    00:28 Think of it as a box where all the information flows.

    00:31 But what does that mean now? The main job of a data warehouse is to make it possible for company data to be analyzed.

    00:40 That means that each department can make its own reports and evaluations to help it make good decisions.

    00:47 Integration and separation are the two main ideas that guide the data warehouse.

    00:53 Integration means that all of the different kinds of data from the different source systems are moved to the data warehouse, which is a central place to store data.

    01:03 For example, this would be the data from the patient management system, the clinic information system, and the ERP system in a hospital.

    01:12 All of this information is put together so that reports can be made to make sure patients are safe or to keep costs down.

    01:20 On the other hand, an online retailer can improve his service by storing transaction data from the ordering system, customer data from CRM, and dispatch data from the logistics system in a data warehouse.

    01:34 So, all of this information can be sent to the main memory.

    01:38 Even though it takes work to set up a data warehouse, it is worth it because a central place to store data has many benefits.

    01:47 First of all, it allows analysis to be done based on a single point of truth, or the source of truth. Why does that matter so much? If things were the other way around, each department would have its own truth.

    02:02 All of the data would be handled a little bit differently.

    02:04 For example, the number of patients would be different.

    02:08 These signs show, for example, that the salary would be different.

    02:12 There would have to be chaos.

    02:14 That's why it's so important that key figures and reports are based on the same data. The second principle is separation, which means that operational systems and analysis should be kept separate.

    02:29 When the ordering system is down, for example, it is the worst nightmare for a store owner. They are losing sales and only getting customers who are unhappy.

    02:38 So that should not happen.

    02:42 If the systems weren't kept separate, a careless mistake or an inexperienced worker could make the question too hard.

    02:48 Some complex analyses also require a complex query.

    02:52 So, these should not happen on the operational system, but instead in a separate place called the data warehouse.

    02:59 The only time the operational systems are put under stress is when data is being gathered. So only at night, when not much is going on.

    03:07 The flow of data is pretty easy to understand.

    03:11 The data, which are shown as dots, are taken from the different source systems and saved to the central data store through an integration layer.

    03:19 The integration layer is there to make sure that all the data is in sync and to change the structures. Then, the data is spread out in places called "data marts." The data marts are just copies of the sections that were made for each department.

    03:34 For instance, the Personal Data Mart has data about employees and their salaries.

    03:40 A customer Service Data Mart has information about customers and maybe information about transactions. This encapsulation gives the same more performance and keeps unauthorized people from getting to it.

    03:55 Let's go over what a data warehouse does once more: First of all, data acquisition is the process of connecting different source systems and keeping them up to date.

    04:06 Second, data storage.

    04:09 It is putting the data away for a long time, including on backup systems that are properly linked. Third, how many data marts there are.

    04:18 These are always filled.

    04:20 And fourth, looking at the data.

    04:23 Simple descriptive analyses and reports can also be made from the data warehouse itself.

    04:29 There are three steps you need to take to move data from one database system to another. Extracting, changing, and loading.

    04:37 The name for this is the "ETL process." During the extraction, the data is pulled from the data sources.

    04:45 This can be done on a regular basis, since the source can make extracts on a regular basis. Or it could be event-driven, which means that the source makes deductions when certain things happen, such as when a hundred changes are made to the database.

    05:00 Or it can happen when someone asks for it too.

    05:03 The data warehouse will then ask the source to make a copy.

    05:08 During the transformation, the data are changed and brought together.

    05:14 There is a difference between syntactic transformations, which are formal changes like how a date is written, and semantic transformations, which are changes in meaning. One example of this would be getting rid of entries that are already there. The last step, "load," is just putting the information in the target database. Do you want your company to have a data warehouse? Then you have to overcome a few obstacles.

    05:44 This can be roughly split into three parts: "System," "Competence," and "Parameters." So the first question is which system you need.

    05:54 After a system has been put in place, it takes a lot of work to change it.

    05:58 The second point is: How can a company build up its skills? Do you need to hire experts to do this? Or could external consultants do it?? During the implementation, this is the usual way things are done.

    06:10 On the other hand, it's not very common to outsource everything.

    06:14 And third, how the parameters are set.

    06:17 What information does which department need? Which information shouldn't be shared? Which update frequencies make sense from a content point of view and are also possible from a technical point of view? ...et cetera. The more these questions can be answered ahead of time, the better and easier it will be to set up a data warehouse.

    06:39 Let's talk about what a data warehouse can't do.

    06:42 And the most important part of that is data protection, because everyone has the right to informal self-determination.

    06:50 So, he can choose which personal information is shared with whom and when.

    06:57 Data avoidance or data economy is the main idea behind this.

    07:02 For a data warehouse, this means that the personal data must be made anonymous or made up to look like it belongs to someone else.

    07:10 With anonymization, it is no longer possible to find out who someone is.

    07:14 With pseudonymization, instead of saving personal information in the charts, only an ID number is kept.

    07:21 And there is only one protected chart that shows how each person is linked to their ID number. In conclusion, it can be said that a data warehouse is needed to create key figures for the whole company that are based on a single point of truth.

    07:36 Using these key numbers, you can measure the growth of each department in an objective way. Even though it takes a lot of work to set up a data warehouse at first, it is necessary because it is what makes data analytics possible.


    About the Lecture

    The lecture Data Warehouse by Dr. Holger Aust is from the course Data Data Data (EN).


    Included Quiz Questions

    1. Integration
    2. Separation
    3. Transformation
    4. Data protection

    Author of lecture Data Warehouse

    Dr. Holger Aust

    Dr. Holger Aust


    Customer reviews

    (1)
    5,0 of 5 stars
    5 Stars
    5
    4 Stars
    0
    3 Stars
    0
    2 Stars
    0
    1  Star
    0