Data Archival
"Archiving" -- Misunderstood and Poorly Executed
"Archiving involves indexing content such that it can be retrieved easily at a later time using a keyword search. Anything else is just backup, and ineffectual backup at that".
As networked storage becomes ubiquitous, managing where data is stored
and ensuring that it can be moved around within the storage environment
becomes increasingly important. As this process moves up the agenda for
IT departments, so does the need to understand the differences between
the available techniques and their limitations.
Data migration, backup, disaster recovery and
archiving may all seem to merge into the single discipline of data
movement but each has a different role to play. What differentiates each
of these is the business drivers behind them. For instance, data
migration is about the bulk movement of data from one data storage
resource to another to achieve a particular outcome. This tends to be a
data movement based on a ‘one-off’, project-based requirement, rather
than a regular feature of day-to-day Data Centre activity.
Disaster recovery, on the other hand, is deployed as a form of risk
mitigation, in effect insurance against a catastrophic event depriving
an organisation of access to its data. Where disaster recovery is
concerned, data will be in an almost constant state of change as remote
facilities are constantly updated.
By contrast, backup is about restoring lost, deleted or corrupted data to
a known good state. In the real world, most cases involve restoring
individual files rather than complete system volumes. Backup does not
try to keep up with a constantly changing set of data but relies on a
‘snapshot’ of a point in time, the age of which can vary from hours
(using virtualised disk techniques) to days or weeks (using more
conventional tape backup).
Finally, archiving is about the long-term retention of data which rarely,
if ever, changes; any changes which do occur tend to be deletions of
data no longer required. The characteristic which defines archiving is
therefore the unchanging nature of the data.
Information Lifecycle Management (ILM)
Information Lifecycle Management (ILM) is a concept being widely
promoted in the industry today. Many industry leaders consider this to
be little more than a name change for a technique known as Hierarchical
Storage Management (HSM) which has been widely used in the mainframe
world for many years. Despite employing many similar techniques, ILM is
not exactly the same as HSM: it has some unique characteristics that
allow it to be applied in an open system environment. ILM must deliver
in an environment where the data is typically unstructured and
scalability of the system is a must, whereas HSM was developed to
operate in an environment where highly structured data was managed by a
closed system with limited potential for future scalability.
Data storage virtualisation techniques from companies such as Bridgehead
Software and StoreAge enable different disk technologies and disk
vendors to be consolidated into one logical pool of centralised storage
capacity. Their advent has been one of the major elements underpinning
the ILM approach, and offers users new alternatives for their archiving
strategies based on a tiered storage model.
This ability to mix and match data storage based on criteria such as
performance and price means that archiving can become a multistage
process. Data which has not been accessed in the recent past is taken
from high performance (and high cost) disk systems and migrated to lower
performance (lower cost) disk systems, in the process releasing
expensive, high performance disk capacity for high value system use. If
the data remains undisturbed for a further period then there can be a
further migration to, for example, tape, where it can remain without
consuming expensive disk storage space.
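The multistage process described above can be sketched as a simple
age-based rule. This is a minimal illustration only: the tier names, the
30 and 180 day thresholds and the function name choose_tier are
assumptions made for the example, not values taken from any ILM product.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; real values would come from the
# organisation's own migration policy.
TIER_RULES = [
    (timedelta(days=30), "high_performance_disk"),
    (timedelta(days=180), "low_cost_disk"),
]
FINAL_TIER = "tape"


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return the tier a file belongs on, given when it was last accessed."""
    idle = now - last_accessed
    for max_idle, tier in TIER_RULES:
        if idle < max_idle:
            return tier
    # Undisturbed for longer than every disk threshold: migrate to tape.
    return FINAL_TIER
```

A scheduled job could call choose_tier for each file and migrate any
file whose current tier differs from the result.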
Of course there is no universal panacea, and with ILM one of the
challenges is tracking the data as it is passed further down the data
storage hierarchy. In an ILM environment, this is achieved by leaving a
‘stub file’ behind whenever a file is moved from a high performance
(high cost) disk. This ‘stub file’ automatically redirects any access to
the migrated data to the lower performing disk system. The process is
transparent to the user. When the file in question is further migrated
to tape, another stub file is written, this time on the low performance
disk, giving yet another redirect for any data access.
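The chain of redirects can be illustrated with a small in-memory model.
This is a sketch under stated assumptions: the Stub class, the
dictionary layout of the tiers and the read_file function are all
hypothetical, and a real ILM product would perform the redirect inside
the file system or SAN rather than in application code.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Stub:
    """Placeholder left behind when a file is migrated to a lower tier."""
    target_tier: str


# Each tier maps a file path to either its content or a stub
# (hypothetical layout, for illustration only).
Tier = Dict[str, Union[bytes, Stub]]


def read_file(tiers: Dict[str, Tier], start_tier: str, path: str) -> bytes:
    """Follow stubs from the tier the user addressed until the data is found."""
    tier_name = start_tier
    visited = set()
    while True:
        if tier_name in visited:
            raise RuntimeError(f"stub loop detected for {path}")
        visited.add(tier_name)
        entry = tiers[tier_name][path]
        if isinstance(entry, Stub):
            tier_name = entry.target_tier  # redirect to the next tier down
        else:
            return entry


# A file migrated twice: fast disk -> slow disk -> tape.
tiers = {
    "fast_disk": {"/reports/q1.doc": Stub("slow_disk")},
    "slow_disk": {"/reports/q1.doc": Stub("tape")},
    "tape": {"/reports/q1.doc": b"quarterly figures"},
}
data = read_file(tiers, "fast_disk", "/reports/q1.doc")
```

The caller only ever addresses the original location, which is what
makes the migration transparent.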
ILM is inherently an application, typically embedded in a SAN. ILM will
undoubtedly become a significant option to be considered as part of an
organisation’s data archiving strategy. However, technology is only a
tool which provides part of the solution. Users will still have to
define and create the policies governing which data should be migrated,
when it should be migrated, how it should be migrated and the criteria
on which these decisions are based. Business and technology decision
making will have to go hand in hand if a solution based on ILM is to
remain viable in the long term.
Regulation & Compliance
One of the reasons so many IT departments are reviewing their archiving
strategy is the deluge of legal, regulatory and compliance requirements
affecting digitally stored data. There are two distinct aspects to
archiving in a regulatory environment. The first is all about policies
for retaining data and, equally important, deleting data. The second is
matching the technology available to the specific requirement.
Compliance can be viewed as a three dimensional model based on the
three axes of Regulation, Industry Sector and Geography. However, issues
such as the retention of data tend to be a common thread, and this
relates directly to an archiving strategy.
Where archiving policies are concerned, the decision of how long to
retain data in an archive depends upon the applicable regulatory
requirements, which set a minimum level, and the organisation’s internal
policies, which may extend this period. However, some legal requirements
also mean that data must be deleted when it is no longer required for
the purpose for which it was gathered (Data Protection Legislation), so
equally important is a data deletion policy that ensures only data which
is legally acceptable is kept in an archive. Additionally, there are
issues around whether archived data should be capable of being changed
and, if so, tracking who has done this. These challenges are met by an
array of technology approaches, most based around Write Once Read Many
(WORM) technology.
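The interplay between retention and deletion can be sketched as two
small rules. This is a simplified illustration, not legal guidance: the
function names, the use of whole years and the still_needed flag are
assumptions made for the example, and real regulations differ by
sector and geography.

```python
from datetime import date
from typing import Optional


def retention_years(regulatory_minimum: int,
                    internal_policy: Optional[int] = None) -> int:
    """Regulation sets the floor; internal policy may only extend it."""
    if internal_policy is None:
        return regulatory_minimum
    return max(regulatory_minimum, internal_policy)


def must_delete(archived_on: date, today: date, years: int,
                still_needed: bool) -> bool:
    """True once the retention period has passed and the data no longer
    serves the purpose for which it was gathered."""
    try:
        expiry = archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # 29 February in a non-leap expiry year: fall back to 28 February.
        expiry = archived_on.replace(year=archived_on.year + years, day=28)
    return today >= expiry and not still_needed
```

Keeping both rules in one place makes it harder for a retention policy
to be applied without its matching deletion policy.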
Write Once Read Many (times) - (WORM)
Today almost all digital data is stored on disk, tape or both. These
technologies are designed to allow data to be added, changed or deleted
as required by the user. Whether or not these data changes are
journalled is a local operational decision, but in most cases no
records are kept. The new regulatory regimes now require that any
changes to affected archived data are either impossible or are recorded.
This has given rise to three distinct approaches.
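Before looking at the individual approaches, the underlying requirement
(changes are either impossible or recorded) can be illustrated in
software. This is a minimal in-memory sketch: the WormStore class and
its journal format are assumptions made for the example and do not
represent any of the products discussed here, which enforce the rule in
the media or the storage system itself.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


class WormStore:
    """In-memory sketch: records can be written once and read many times."""

    def __init__(self) -> None:
        self._records: Dict[str, bytes] = {}
        # Each journal entry: (timestamp, user, action, key).
        self.journal: List[Tuple[str, str, str, str]] = []

    def _log(self, user: str, action: str, key: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.journal.append((stamp, user, action, key))

    def write(self, user: str, key: str, data: bytes) -> None:
        if key in self._records:
            # Changing existing data is impossible; the attempt is recorded.
            self._log(user, "rejected_overwrite", key)
            raise PermissionError(f"{key} is already written and cannot be changed")
        self._records[key] = data
        self._log(user, "write", key)

    def read(self, user: str, key: str) -> bytes:
        self._log(user, "read", key)
        return self._records[key]
```

A second write to the same key raises PermissionError and leaves both
the stored data and a journal entry of the attempt intact.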
Optical Disk Archiving
Most people are familiar with optical technology through entertainment
CDs and DVDs. Data Centre professionals may also be familiar with
Magneto Optical (MO) technology, where the data is written magnetically
but read by a laser. While all these can be true WORM (the data cannot
be altered once it is written), the data capacity of these technologies
is small compared to the amounts of digital data being accumulated
electronically, resulting in many pieces of media being required to
archive even modest amounts of data by today’s standards.
One manufacturer, Plasmon, has pioneered a new optical technology called
UDO (Ultra Density Optical) which currently offers triple the capacity
of a magneto-optical drive and is road mapped to double and then double
again within the next few years. Many users have taken a ‘wait and see’
approach to this new technology but, with the media being second sourced
by Mitsubishi (Verbatim) and HP having adopted the technology as a
standard offering, shipments are now in the multi-petabytes and rising
sharply. UDO may be the way forward for capacity but all optical devices
have one more issue to resolve: how do you delete just one file on an
optical cartridge without having to copy and re-write all the data
(minus the file that is to be deleted)? Plasmon now have a solution for
this as well: file shredding. Optical data storage offers an effective,
leading edge alternative for users wanting to make their archive systems
fully compliant.
Secure Archiving
Data archives are a prime target for both covert and malevolent
attempts at unauthorised data access. Securing the data archive not only
makes commercial sense but also contributes towards meeting compliance
requirements.
Securing data archives can take two approaches. The first of these is to
encrypt the data - thereby making it useless even if it is subject to
unauthorised access.
The second approach is not to allow unauthorised access in the first
place. This implies layers of hardened access controls which cannot be
altered by just one person. All access to the secured data and all
changes must be authorised by two or more nominated individuals, and all
attempts to access data (whether successful or not) are journalled.
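The authorisation rule described above can be sketched as follows. This
is an illustration only: the names in NOMINATED, the request_access
function and the decision not to count the requester as one of their
own approvers are assumptions made for the example, not features of any
particular product.

```python
from typing import List, Set, Tuple

NOMINATED = {"alice", "bob", "carol"}  # hypothetical nominated individuals
REQUIRED_APPROVALS = 2

# Each journal entry: (requester, resource, granted).
access_journal: List[Tuple[str, str, bool]] = []


def request_access(requester: str, resource: str, approvers: Set[str]) -> bool:
    """Grant access only if enough distinct nominated individuals,
    other than the requester, have approved the request."""
    valid = (approvers & NOMINATED) - {requester}
    granted = len(valid) >= REQUIRED_APPROVALS
    # Journal every attempt, whether successful or not.
    access_journal.append((requester, resource, granted))
    return granted
```

Because refused attempts are journalled alongside successful ones, the
journal can later show who tried to reach the archive as well as who
succeeded.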
Once again there are a number of products which can be deployed. These
range from the high end Decru devices, designed to deliver encryption
and access controls to the enterprise level SAN, to the ‘Paranoia’ based
product from DIS, which delivers cost effective, ‘in the box’ tape
solutions for the mid to low end market. The DIS offering is targeted
specifically at encryption for archiving and backup solutions which are
designed to ‘plug in and go’, addressing the problem of the limited IT
skill sets typically available to the SMB. Decru offers a wide range of
options including file and block level encryption with access controls
(for both tape and disk) for the enterprise and sub-enterprise market,
where IT skill sets can effectively deploy the technology.
