With data warehousing, two security issues need to be addressed, so security needs to be implemented at two levels. The first is protecting the data warehouse against employees. Users within the organization should be able to access only the information that they have the security clearance for. The second is protecting the data warehouse against those external to the organization who may try to access the organization's data.
Protect Your Information from Users
As is the case with any legacy application, a data warehouse consists of a set of tables. These tables could be fact, dimension, helper, or any other type of tables present in the warehouse. An end user application is used for analyzing and exploring the data present in the warehouse. This could be either an OLAP tool or an internally developed application. Users should be allowed to access the data warehouse only via the end user application.
Security for the end user application is important because it serves as the interface between the data warehouse and the end users. To an extent, the security requirements depend on the requirements for the data warehouse. Security for the data warehouse and the end user application can be broadly classified into six areas: user security, row-level security, report security, data security, data backup, and virus protection.
User Security
This level of security governs what end users can do with the data in the warehouse. Users should not be allowed to add, delete, or modify any of the data that is present in the data warehouse. They should be allowed only to query the data warehouse and view data that they have access to. Only those administering the data warehouse should have access to the population scripts used for refreshing the data.
In order to implement user security, administrators should provide users who are going to access the data warehouse with a user ID and password. If there are multiple users with the same authority level, then a user class can be created and the user IDs linked to the user class. For each user ID, the corresponding authority level needs to be set. It should be ensured that users who access the data warehouse do so only via their user ID. When a user logs in, the system identifies the access rights for that user ID, thereby allowing the user to view data that the user has access to.
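The link between user IDs, user classes, and authority levels can be sketched as follows. This is a minimal illustration, not a real authentication system: the class names, user IDs, and action names are all hypothetical, and password checking is left out entirely.

```python
# Hypothetical user classes, each with the set of actions it may perform.
USER_CLASSES = {
    "analyst": {"query"},
    "warehouse_admin": {"query", "run_population_scripts"},
}

# Hypothetical user IDs, each linked to exactly one user class.
USER_TO_CLASS = {
    "jsmith": "analyst",
    "mlee": "analyst",
    "dba01": "warehouse_admin",
}


def is_allowed(user_id: str, action: str) -> bool:
    """Return True if the user's class grants the requested action."""
    user_class = USER_TO_CLASS.get(user_id)
    if user_class is None:
        return False  # unknown user IDs get no access at all
    return action in USER_CLASSES.get(user_class, set())
```

Because authority is attached to the class rather than to each user ID, changing what analysts may do means editing one entry instead of every analyst's record.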
Row-Level Security
This pertains to the data that can be viewed by the end users. For example, employees in a particular department should be allowed to view only data that pertains to their department. They should not be allowed to view data that is related to the other departments. Row-level security can be managed at the data warehouse level as well as at the end user application level.
At the data warehouse level, one method that can be used for implementing row-level security is to create views on the data warehouse tables. Another approach could be to partition the data and store it in the data warehouse. A slight deviation from this would be to create departmental datamarts. For example, data pertaining to the human resource department is stored in the HR datamart. Similarly, information pertaining to the finance department is stored in the financial datamart. The approach used for implementing row-level security could vary based on the requirements for the data warehouse.
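The view-based approach might look like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a warehouse; the table, column, and view names are assumptions made for illustration. SQLite has no user permissions, so in a real warehouse the administrator would additionally grant department users access to the view only, not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_dim (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        department  TEXT
    );
    INSERT INTO employee_dim VALUES
        (1, 'Asha', 'HR'),
        (2, 'Ben',  'Finance'),
        (3, 'Cara', 'HR');

    -- The view exposes only HR rows; HR users would query the view
    -- instead of the underlying dimension table.
    CREATE VIEW hr_employee_v AS
        SELECT employee_id, name, department
        FROM employee_dim
        WHERE department = 'HR';
""")

# Querying through the view returns HR employees only.
hr_rows = conn.execute(
    "SELECT name FROM hr_employee_v ORDER BY employee_id"
).fetchall()
```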
Row-level security can also be handled in the end user application. One commonly used technique is data filtering at the report level. For example, assume that the head of marketing wants to see details about employees in the marketing department. There is no point in presenting the manager with a report that contains details about all the employees in the organization. In this case, a filter can be applied on the department field, thereby allowing each department manager to view data pertaining to their department.
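A report-level filter of this kind could be sketched as below. The manager IDs, the department mapping, and the row layout are hypothetical; a real end user application would apply the filter in its own query or reporting layer.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical mapping from a manager's user ID to the department they head.
MANAGER_DEPARTMENT = {
    "mkt_head": "Marketing",
    "hr_head": "HR",
}


def employee_report(rows: List[Row], manager_id: str) -> List[Row]:
    """Return only the employee rows for the manager's own department."""
    department = MANAGER_DEPARTMENT.get(manager_id)
    if department is None:
        return []  # users with no department mapping see nothing
    return [row for row in rows if row["department"] == department]


employees = [
    {"name": "Dev", "department": "Marketing"},
    {"name": "Eli", "department": "HR"},
    {"name": "Fay", "department": "Marketing"},
]
marketing_view = employee_report(employees, "mkt_head")
```

The same report definition serves every department manager; only the filter value changes with the logged-in user.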
Report Security
In a data warehouse environment, users can execute two types of reports: ad hoc reports and predefined reports.
Ad hoc reports are developed on the fly. Users who are exploring the data in the warehouse usually come up with such reports. Those who generate these reports can either save them for future use or delete them. A potential problem here is that, when exploring the data, the user may come across information that the user should not have access to. To prevent this, a user should first log in to the end user reporting application using a user ID and password. Based on the authority that is associated with that user ID, the user can run ad hoc reports on the data warehouse and view only the data that the user has access to.
Predefined reports can also be run on a data warehouse. When a data warehouse is developed, a set of predefined reports is usually provided along with it. Generally, these reports are executed on a periodic basis. An example of this could be the monthly employee payroll report. The list of users who will be allowed to access each report should be determined, and the security for each report set up, before the reports are released to the end users. Users should be allowed to execute only the reports that they have access to.
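One way to picture a per-report access list is the sketch below. The report name, user IDs, and the placeholder report function are all hypothetical; the point is only that the access list exists before the report is released and is checked on every run.

```python
from typing import Callable, Dict, Set


def monthly_payroll() -> str:
    """Placeholder standing in for the real payroll report query."""
    return "payroll totals"


# Hypothetical registry of predefined reports.
REPORTS: Dict[str, Callable[[], str]] = {
    "monthly_payroll": monthly_payroll,
}

# Hypothetical access list, decided before the report is released.
REPORT_ACCESS: Dict[str, Set[str]] = {
    "monthly_payroll": {"payroll_mgr", "cfo"},
}


def run_report(report_name: str, user_id: str) -> str:
    """Run a predefined report only if the user is on its access list."""
    if user_id not in REPORT_ACCESS.get(report_name, set()):
        raise PermissionError(f"{user_id} may not run {report_name}")
    return REPORTS[report_name]()
```

A report with no entry in the access list cannot be run by anyone, so forgetting to configure security fails closed rather than open.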
Data Security
Most data warehouses contain a lot of sensitive and personal information. For example, they could contain personal details about an employee, such as the employee's contact information, date of birth, SSN, etc. To safeguard such information, data encryption can be used. (Data masking, which hides all or part of a value when it is displayed, is a related but distinct technique.) All sensitive information can be encrypted and stored in the data warehouse. When this data is accessed by authorized users, it can be decrypted and revealed. If an intruder manages to break into the data warehouse, the intruder will see only the encrypted data. At the same time, it is unnecessary and time-consuming to encrypt all the data that is stored in the data warehouse; encryption should be reserved for the sensitive fields.
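As a small illustration of hiding a sensitive field from users who are not cleared to see it, the sketch below masks an SSN for display. Note the assumptions: this is display masking, not encryption, and the function names and the `authorized` flag are hypothetical. Encrypting the stored value would rely on a vetted cryptographic library, which is outside the scope of this sketch.

```python
def mask_ssn(ssn: str) -> str:
    """Hide all but the last four digits of a 9-digit SSN."""
    digits = [ch for ch in ssn if ch.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + "".join(digits[-4:])


def display_ssn(ssn: str, authorized: bool) -> str:
    """Show the full value to authorized users and a masked one to others."""
    return ssn if authorized else mask_ssn(ssn)
```

Only the sensitive column passes through this step; the rest of the row is displayed unchanged, in line with the point that protecting every field is unnecessary.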
Data Backup
Every organization that has a data warehouse should have a disaster recovery plan. An important part of this plan is data backup, which should be performed daily or at least weekly. Data can be backed up to a separate server, a tape, or even a CD. The volume of data in the data warehouse determines the backup medium. The backup of the data should be stored at a separate physical location if possible. Should anything happen, the backup can be used for restoring the data warehouse.
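The backup-and-restore idea can be demonstrated at toy scale with Python's built-in sqlite3 module, whose `Connection.backup` method copies one database into another. SQLite here is only a stand-in, and the table name is hypothetical; a production warehouse would use its own vendor's backup tooling and a scheduler.

```python
import sqlite3

# Stand-in "warehouse" holding one hypothetical fact table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales_fact (sale_id INTEGER, amount REAL)")
source.execute("INSERT INTO sales_fact VALUES (1, 99.5)")
source.commit()

# Copy the whole database into a second connection. In practice the
# target would be a file on a separate server or storage medium.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Verify that the backup can actually be read back.
restored = backup.execute(
    "SELECT sale_id, amount FROM sales_fact"
).fetchall()
```

Reading the data back from the copy is the important step: a backup that has never been test-restored offers little assurance when a disaster occurs.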
Virus Protection
The data warehouse also needs to be protected against virus attacks. An infection could corrupt or destroy data and make the warehouse unavailable. Therefore, good antivirus software needs to be installed and used to protect the data stored in the data warehouse. The software should always be kept up to date. It is also advisable to carry out periodic virus scans on the data warehouse.
Protect Your Information from Hackers
After you've protected the data warehouse from employees within an organization, the next step is to protect the data warehouse from those who may try to access the data warehouse from outside the organization. The threat of unauthorized access to the data warehouse from outside is just as high as the threat from within. One commonly used method for protecting the data warehouse from external access is to use a firewall.
To ensure that the security of the data warehouse is always maintained at a high level, security experts must perform periodic security audits on the data warehouse. During the audit, the security of the data warehouse environment and end user applications needs to be reviewed. Any security lapses found should be documented, and solutions for bridging these gaps should be provided. On completion of the audit, a report should be prepared and presented to management for their review and approval before implementing the required security procedures.
Never Enough
As we have seen, security plays an important role in a data warehouse environment. An organization's data is extremely valuable to both employees and external sources, and the loss of this information could prove to be an organization's downfall. Every possible attempt should be made to secure the data warehouse and the data stored in it.
There is no such thing as too much security. As the saying goes, it's better to be safe than sorry.
Warren Sequeira is a senior technical architect with Hexaware Technologies Ltd. He can be reached via email at
MC Press Online