00:07
A data warehouse is an important part of
modern companies.
00:10
In the world we live in now, where data is
everywhere, it is too important to not be
able to analyze data consistently and
measure results.
00:18
Companies gain an edge over their
competitors by making decisions based on
data. A data warehouse is a central system
for storing and analyzing data.
00:28
Think of it as a box where all the
information flows.
00:31
But what does that mean now?
The main job of a data warehouse is to make
it possible for company data to be analyzed.
00:40
That means that each department can make its
own reports and evaluations to help it make
good decisions.
00:47
Integration and separation are the two main
ideas that guide the data warehouse.
00:53
Integration means that all of the different
kinds of data from the different source
systems are moved to the data warehouse,
which is a central place to store data.
01:03
For example, this would be the data from the
patient management system, the clinic
information system, and the ERP system in a
hospital.
01:12
All of this information is put together so
that reports can be made to make sure
patients are safe or to keep costs down.
01:20
On the other hand, an online retailer can
improve his service by storing transaction
data from the ordering system, customer data
from CRM, and dispatch data from the
logistics system in a data warehouse.
01:34
So, all of this information can be sent to
the main memory.
01:38
Even though it takes work to set up a data
warehouse, it is worth it because a central
place to store data has many benefits.
01:47
First of all, it allows analysis to be done
based on a single point of truth, or the
source of truth. Why does that matter so
much?
If things were the other way around, each
department would have its own truth.
02:02
All of the data would be handled a little
bit differently.
02:04
For example, the number of patients would be
different.
02:08
These signs show, for example, that the
salary would be different.
02:12
There would have to be chaos.
02:14
That's why it's so important that key
figures and reports are based on the same
data. The second principle is separation,
which means that operational systems and
analysis should be kept separate.
02:29
When the ordering system is down, for
example, it is the worst nightmare for a
store owner. They are losing sales and only
getting customers who are unhappy.
02:38
So that should not happen.
02:42
If the systems weren't kept separate, a
careless mistake or an inexperienced worker
could make the question too hard.
02:48
Some complex analyses also require a complex
query.
02:52
So, these should not happen on the
operational system, but instead in a separate
place called the data warehouse.
02:59
The only time the operational systems are
put under stress is when data is being
gathered. So only at night, when not much is
going on.
03:07
The flow of data is pretty easy to
understand.
03:11
The data, which are shown as dots, are taken
from the different source systems and saved
to the central data store through an
integration layer.
03:19
The integration layer is there to make sure
that all the data is in sync and to change
the structures. Then, the data is spread out
in places called "data marts." The data marts
are just copies of the sections that were
made for each department.
03:34
For instance, the Personal Data Mart has
data about employees and their salaries.
03:40
A customer Service Data Mart has information
about customers and maybe information about
transactions. This encapsulation gives the
same more performance and keeps unauthorized
people from getting to it.
03:55
Let's go over what a data warehouse does
once more: First of all, data acquisition is
the process of connecting different source
systems and keeping them up to date.
04:06
Second, data storage.
04:09
It is putting the data away for a long time,
including on backup systems that are properly
linked. Third, how many data marts there
are.
04:18
These are always filled.
04:20
And fourth, looking at the data.
04:23
Simple descriptive analyses and reports can
also be made from the data warehouse itself.
04:29
There are three steps you need to take to
move data from one database system to
another. Extracting, changing, and loading.
04:37
The name for this is the "ETL process."
During the extraction, the data is pulled
from the data sources.
04:45
This can be done on a regular basis, since
the source can make extracts on a regular
basis. Or it could be event-driven, which
means that the source makes deductions when
certain things happen, such as when a
hundred changes are made to the database.
05:00
Or it can happen when someone asks for it
too.
05:03
The data warehouse will then ask the source
to make a copy.
05:08
During the transformation, the data are
changed and brought together.
05:14
There is a difference between syntactic
transformations, which are formal changes
like how a date is written, and semantic
transformations, which are changes in
meaning. One example of this would be
getting rid of entries that are already
there. The last step, "load," is just
putting the information in the target
database. Do you want your company to have a
data warehouse?
Then you have to overcome a few obstacles.
05:44
This can be roughly split into three parts:
"System," "Competence," and "Parameters." So
the first question is which system you need.
05:54
After a system has been put in place, it
takes a lot of work to change it.
05:58
The second point is: How can a company build
up its skills?
Do you need to hire experts to do this?
Or could external consultants do it??
During the implementation, this is the usual
way things are done.
06:10
On the other hand, it's not very common to
outsource everything.
06:14
And third, how the parameters are set.
06:17
What information does which department need?
Which information shouldn't be shared?
Which update frequencies make sense from a
content point of view and are also possible
from a technical point of view?
...et cetera. The more these questions can
be answered ahead of time, the better and
easier it will be to set up a data
warehouse.
06:39
Let's talk about what a data warehouse can't
do.
06:42
And the most important part of that is data
protection, because everyone has the right to
informal self-determination.
06:50
So, he can choose which personal information
is shared with whom and when.
06:57
Data avoidance or data economy is the main
idea behind this.
07:02
For a data warehouse, this means that the
personal data must be made anonymous or made
up to look like it belongs to someone else.
07:10
With anonymization, it is no longer possible
to find out who someone is.
07:14
With pseudonymization, instead of saving
personal information in the charts, only an
ID number is kept.
07:21
And there is only one protected chart that
shows how each person is linked to their ID
number. In conclusion, it can be said that a
data warehouse is needed to create key
figures for the whole company that are based
on a single point of truth.
07:36
Using these key numbers, you can measure the
growth of each department in an objective
way. Even though it takes a lot of work to
set up a data warehouse at first, it is
necessary because it is what makes data
analytics possible.