Rick Dales, VP Product Management

July 17, 2008

To Stub or Not to Stub (Part 1 of 2)

Posted by Rick Dales, VP Product Management

In the world of email archiving, there is an ongoing argument about the value of stubbing, a process designed to help manage the storage in Exchange by replacing messages or attachments on an email server with a link to a copy of the file in an archive. I thought I’d weigh in on this topic, first by explaining the concept and looking at the pros and cons, and then (in a second post), providing a list of four best practices that businesses should follow if they’re relying on stubbing in their organization.

With the growth of email volume outpacing the reduction in total cost of storage ownership, it comes as no surprise that IT is struggling to manage Exchange storage. The real frustration for most Exchange administrators is that the vast majority of their storage is occupied with content that people almost never read. For performance and reliability reasons, Exchange is usually implemented on the most expensive of storage platforms, making this usage pattern extremely expensive. Furthermore, because Exchange is a transactional system, every piece of data is open to modification. This means that every piece of data needs to be backed up on a regular basis.

Introducing Stubbing – How it works
All of these factors have led IT to investigate archiving as a means to address their storage challenges. The idea is simple – focus the Exchange server on the delivery and management of current mail, and push the older mail to another repository that can be managed on less expensive infrastructure. That repository can then use archival storage management processes that allow for incremental backup of only newly added information, rather than the entire set.

Moving the data to another location (the archive) benefits IT; however, training users to change their behavior and look for this information in a new application (often with unique user interfaces and workflows) is often too cumbersome for broad adoption. To address these concerns, archiving vendors introduced features known as stubbing or shortcutting. This involves replacing the messages or attachments in users’ mailboxes with a pointer to the copy in the archive. From an end-user’s perspective, the email data is still accessible from Outlook, and yet they run into their mailbox quota less often.

Stubbing Drawbacks
Stubbing isn’t without its drawbacks, however. To understand the impact on storage, you need a solid understanding of Exchange’s single instance storage model. When a message is delivered to multiple recipients within the same mailbox database (storage group), the message body and attachments are only stored once, and the message entry in each mailbox simply references the single copy of this data.

When a user modifies a message in their mailbox, Exchange creates a unique copy of the content and points the message in the user’s mailbox to that copy. As Exchange doesn’t provide any way to access the single-instance store of content, stubbing processes behave like end-user edits, modifying messages on a mailbox-by-mailbox basis. If a message was sent to multiple recipients on the same mailbox database, but you only stub content for some of them, you actually increase, rather than decrease, storage by implementing stubbing. Furthermore, even though stubs may be small (typically <2K), as the stubbing process works through each mailbox, it creates separate items in the single-instance store.
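
To make the arithmetic concrete, here is a minimal sketch of the effect described above. It assumes a deliberately simplified model: one shared copy of the message content per mailbox database, a 2 KB stub (the upper end of the "<2K" figure), and no per-mailbox header overhead. The function name and the 5 MB / 10-recipient figures are illustrative only and do not reflect how Exchange accounts for storage internally.

```python
def stored_bytes(body_bytes, recipients, stubbed, stub_bytes=2048):
    """Bytes held for one message under a simplified single-instance model."""
    if not 0 <= stubbed <= recipients:
        raise ValueError("stubbed must be between 0 and recipients")
    # The shared copy stays as long as at least one mailbox still references it.
    shared = body_bytes if stubbed < recipients else 0
    # Each stubbed mailbox gets its own private stub item.
    return shared + stubbed * stub_bytes


body = 5 * 1024 * 1024                # hypothetical 5 MB message
before = stored_bytes(body, 10, 0)    # nobody stubbed: one shared copy
partial = stored_bytes(body, 10, 4)   # 4 of 10 mailboxes stubbed
full = stored_bytes(body, 10, 10)     # every mailbox stubbed

print(before, partial, full)
```

In this model, partial stubbing costs slightly more than doing nothing, because the shared copy survives and the stubs are added on top; the saving only appears once every mailbox holding the message has been stubbed.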

Since many elements of Exchange and data management processes are impacted by the number of entries in the tables, not just their total size, the unwinding of single-instance storage in Exchange can be problematic. As it happens, however, Microsoft Office has a habit of updating attachment metadata when a user views the item, which in most environments means that single-instance storage is pretty much non-existent within Exchange. The more of these changes that are made in Exchange between backups, the longer an incremental backup of the mail system will take.

Microsoft’s answer to the storage management problem is to change Exchange 2007 to support dramatically larger mailboxes and to change the way backup processes work so that managing these larger mailbox databases becomes more practical. While most firms that I’ve talked to plan to increase mailbox sizes with their conversion to Exchange 2007, few are creating the 1GB mailboxes that Microsoft touts.

Conclusion
Clearly, stubbing is not the straightforward Exchange storage management solution that some vendors would have you believe. That having been said, when implemented properly, it can be a valuable tool to manage the growth of Exchange storage with minimal impact on end-user behavior. In my next post, I’ll talk about four best practices to make the most of stubbing in your organization.

May 20, 2008

Can one email archiving approach meet all your needs? (Part 4 of 4)

Posted by Rick Dales, VP Product Management

In my last three posts, I introduced the idea that there are multiple approaches to archiving and took a deeper look at the two most widely-used methods – mailbox archiving and journaled archiving.  I conclude this series of posts by addressing the question that often comes up:  Can one email archiving approach both solve your mailbox storage management challenges and meet your legal discovery and compliance requirements?

As I mentioned in my first post, companies may have many goals when they decide to implement an email archive, but some goals may end up being in conflict with others.   For example, the IT group may implement an archive for mailbox storage management purposes and let users control which messages are archived and which ones are deleted.   However, by doing this, they defeat the organization’s retention policy and make the archive a meaningless place to manage preservation orders for a litigation hold. 

Most of the in-house archiving software products implement both mailbox archiving and journal archiving and allow customers to enable both approaches as a way to deal with the limitations of each.  Not only is this not a particularly practical solution, it also results in duplicate storage of content (despite what they might tell you about single instance storage).

At Fortiva, we use journal archiving because we wanted to ensure that we could address the litigation readiness and compliance requirements.  However, as I mentioned in my previous posts, using journaling as a source of information that you plan to expose to end-users requires additional work (that most archives don’t attempt to do).  We do the extra work to understand routing of messages and assignment to end-user mailboxes so that one copy of the message can be used for both end-user access as well as discovery purposes. 

Fortiva offers capabilities such as stubbing, a process similar to mailbox archiving where a periodic scan of mailboxes is performed.  Unlike implementing mailbox archiving on top of journaling, we scan mailboxes and then use our powerful real-time search engine to find the item that already exists in the archive to determine what the stub (or shortcut) in the mailbox should point to.  Doing so allows us to leverage the single copy of the data that is already in the archive via journaling.

It must be noted that Fortiva’s solution is built around a retention policy engine that assigns retention when messages are archived.  This means that neither users nor IT can simply say “I don’t need this anymore” and delete items at will.  As such, while Fortiva provides the added value of addressing storage management challenges, our on-demand archive is most suited for those that have a need for consistent retention as a core business requirement. 

While most modern archiving solutions offer some capabilities to address legal discovery and storage management challenges, each will have limitations in one area or the other – partially because the “optimal” business rules for each problem are in conflict. Thus, knowing what your primary goal is will help you decide which email archiving approach is best suited for your organization.

May 16, 2008

Approach 2: Journaled Archiving (Part 3 of 4)

Posted by Rick Dales, VP Product Management

In my last two posts, I talked about the fact that there are multiple approaches to archiving, each with its pros and cons. I also took a closer look at one of those approaches – mailbox archiving.  In this post, I will dive more deeply into another widely-used approach – journaled archiving – including how it works and what problems it is best suited to address.

Journaled archiving relies on a feature in the mail system that captures a copy of every message in transport (as it is sent/received) and puts a copy in another mailbox.  This copy of the message is stored as an attachment to a message known as a journal report, which contains additional information about the actual recipients of the original message.  The archiving system then uses this “journal mailbox” as a source of messages to be captured (and typically deletes the content once it has been captured).  Some outsourced solutions rely on the customer configuring journaling to deliver to a remote SMTP address.

Strengths

  • Complete capture of email messages
    The journaling process places a copy of every message that is sent/received into a separate mailbox at the same time that a user receives it in their mailbox.  A user choosing to delete the message in their own mailbox has no bearing on whether the message gets archived. 
  • A single, complete picture of each message
    As the journaling process includes BCC information and expansion of distribution lists, the archiving system can provide a full picture of the original message.  While multiple Exchange servers can increase the complexity on this front (because multiple journal reports may be created), the data exists to allow an archiving system to collapse the data into a single message containing all information about the actual recipients.
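
The collapsing step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any vendor's implementation: the dictionary fields (`message_id`, `recipients`) are hypothetical stand-ins for data parsed out of real journal reports, and keying on the message ID is an assumption about how duplicates would be recognized.

```python
from collections import defaultdict


def collapse_journal_reports(reports):
    """Merge reports that share a message ID into one recipient list per message."""
    merged = defaultdict(set)
    for report in reports:
        # Lower-case addresses so the same recipient is not counted twice.
        merged[report["message_id"]].update(r.lower() for r in report["recipients"])
    return {mid: sorted(rcpts) for mid, rcpts in merged.items()}


# Two servers each produced a report for the same message (hypothetical data).
reports = [
    {"message_id": "<abc@example.com>",
     "recipients": ["ann@example.com", "bob@example.com"]},
    {"message_id": "<abc@example.com>",
     "recipients": ["Bob@example.com", "cara@example.com"]},
]
collapsed = collapse_journal_reports(reports)
print(collapsed)
```

The result is a single entry per message whose recipient list is the union of what each server reported.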

Weaknesses

  • Providing end-user access to their own mail is difficult
    To provide end-users with access to the messages that they sent or received, an archiving system has to determine which mailboxes a message was actually delivered to.  The address information on journal reports is insufficient to achieve this, as forwarding and routing rules must be factored into the equation.   While it is possible to do this (and Fortiva does), most other journal mail systems do not, resulting in journaled messages being available only to IT or legal staff who have rights to see all mail.
  • No direct ability to modify/stub messages
    There is no connection between a journal report in the journaling mailbox and the messages that live in users’ mailboxes.  Replacing message content in users’ mailboxes with a pointer to the message captured using journaling requires the archiving system to use complex lookup routines based upon content similarity.  Fortiva uses this approach, but most firms do not.
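
As a rough illustration of the lookup problem in the second weakness, the sketch below matches a mailbox item to an archived copy using a fingerprint of normalized fields. This is a much simpler technique than true content-similarity matching (it only tolerates case and whitespace differences), and it is not Fortiva's actual method. All field names and the `archive_id` values are hypothetical.

```python
import hashlib
import re


def fingerprint(sender, subject, body):
    """Hash of normalized fields; case and whitespace differences are ignored."""
    parts = [re.sub(r"\s+", " ", p).strip().lower() for p in (sender, subject, body)]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def build_index(archived):
    """Map each archived message's fingerprint to its archive identifier."""
    return {fingerprint(m["sender"], m["subject"], m["body"]): m["archive_id"]
            for m in archived}


def find_stub_target(index, item):
    """Return the archive ID a stub should point to, or None if no match exists."""
    return index.get(fingerprint(item["sender"], item["subject"], item["body"]))


archive = [{"archive_id": "A-1", "sender": "ann@example.com",
            "subject": "Q3 plan", "body": "Draft attached.\n"}]
index = build_index(archive)

mailbox_item = {"sender": "Ann@Example.com", "subject": "Q3  plan",
                "body": "Draft attached."}
target = find_stub_target(index, mailbox_item)
missing = find_stub_target(index, {"sender": "x@example.com",
                                   "subject": "Other", "body": "Hi"})
print(target, missing)
```

A production system would have to cope with far messier differences between the journaled copy and the mailbox copy, which is why the text above calls these routines complex.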

Appropriate Uses of Journaled Archiving

Best suited for: Legal and Regulatory Compliance
Journaled archiving is the Microsoft-recommended approach for capturing data for legal discovery and compliance requirements.  It allows for the complete capture of all messages in a single, unified view.

Not usually well-suited for: Email Storage Management*
Unless the archiving vendor specifically implements other processes to clean up user mailboxes, journaled archiving approaches won’t address storage management challenges. Some journaled archiving solutions, including Fortiva, have implemented attachment stubbing (replacing attachments with a link to the file in the archive) to address this.

Not usually well-suited for: End-user Access*
Unless the archiving vendor specifically implements techniques to determine which users actually received mail, users will either not be able to access their own mail, or will be granted access to a subset of the messages that they actually received. Some solutions, such as Fortiva, have developed a way to overcome this, allowing end-users to fully access all their archived mail.  Because journaled archiving isn’t working against the users’ mailboxes, it can’t record which folder each user chooses to file the messages into.

* NOTE - As a point of reference (and self-disclosure), Fortiva uses journaled archiving. It overcomes some of the noted limitations with additional address resolution techniques and the use of a periodic scan of users’ mailboxes to allow for the stubbing of older attachments.

May 06, 2008

Approach 1: Mailbox Archiving (Part 2 of 4)

Posted by Rick Dales, VP Product Management

In a previous post, I introduced the idea that there are multiple approaches to archiving.  In this post, I will dive more deeply into one of the two most common approaches, known as mailbox archiving, including how it works and what problems it is best suited to address.

Mailbox archiving is the process of periodically connecting to a user’s mailbox and looking for content that matches some criteria (an archiving policy) and adding it to the archive.  While a mailbox archiving process might run on a nightly basis, typically the archiving policies are set to only store messages that are older than a certain age (typically 30-90 days).
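
An age-based archiving policy of the kind just described might look like the following minimal sketch. The item structure, the `received` field, and the function name are assumptions made for illustration; a real product would read these values from the mail system and would usually support more criteria than age alone.

```python
from datetime import datetime, timedelta


def select_for_archiving(items, now, min_age_days=30):
    """Return the items old enough to be archived under an age-based policy."""
    cutoff = now - timedelta(days=min_age_days)
    return [item for item in items if item["received"] <= cutoff]


# Hypothetical mailbox contents at the time of a nightly run.
now = datetime(2008, 5, 6)
mailbox = [
    {"id": "old", "received": datetime(2008, 1, 15)},
    {"id": "recent", "received": datetime(2008, 4, 30)},
]
selected_ids = [item["id"] for item in select_for_archiving(mailbox, now)]
print(selected_ids)
```

Note that the job can run every night while still leaving recent mail untouched: the schedule and the age threshold are independent settings.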

Strengths

  • Visibility to all content and state information in the mailbox
    By connecting directly to the user’s mailbox, the archiving system can see (and choose to capture) any type of content, including calendar events, that wouldn’t be sent to another user.  Similarly, they can capture which folder the user has put the item into.
  • Ability to modify messages in the mailbox
    With direct access to the user’s mailbox, the original message can be modified (flagged), deleted or replaced with a pointer to the copy in the archive.
  • Easy to provide end-user access
    As the archive knows which mailbox it found a message in, it can easily provide the appropriate security controls to provide users with access to the messages in their mailbox without granting access to other messages.

Weaknesses

  • Incomplete set of messages are captured
    Similar to backups, any periodic snapshot activity cannot record things that arrived and were subsequently deleted between capture cycles.  Given that users read and then delete over 50% of messages on the day they receive them, periodic capture will miss the majority of mail – even if the archiving policy is set to capture messages immediately. 
  • Incomplete picture of each message’s recipients
    When a user receives a message they have no visibility to the set of recipients that were BCC’d.  In addition, if the message was sent to a distribution list, the actual set of recipients isn’t stored with the message.  In the period between message receipt and capture, the membership of the distribution list can change materially (or the distribution list can be deleted from the mail system entirely).
  • Duplicate message removal is very difficult
    While digital signatures can be used to find and remove duplication of message bodies and attachments to optimize the storage within the archive, removing duplication of the messages themselves is difficult because the set of recipients may be different and the metadata about when a message was received will vary from mailbox to mailbox.  When performing legal discovery across a set of users, duplicate copies of messages from different users’ mailboxes dramatically increase the costs of reviewing messages to be produced for opposing counsel.
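
The first weakness above (messages that arrive and are deleted between capture cycles) can be shown with a small simulation. It assumes a simplified timeline measured in hours and a nightly scan; the message names and times are invented for illustration.

```python
def captured_by_snapshots(messages, snapshot_times):
    """Return IDs of messages that exist in the mailbox at some snapshot time."""
    captured = set()
    for msg_id, arrived, deleted in messages:
        for t in snapshot_times:
            # A scan sees the message only if it has arrived and is not yet deleted.
            if arrived <= t and (deleted is None or t < deleted):
                captured.add(msg_id)
                break
    return captured


# (id, arrival hour, deletion hour or None if never deleted); hypothetical data.
messages = [
    ("kept", 1.0, None),
    ("read-and-deleted", 2.0, 5.0),
    ("deleted-next-day", 3.0, 30.0),
]
snapshots = [0.0, 24.0, 48.0]   # one scan every 24 hours
captured = captured_by_snapshots(messages, snapshots)
print(sorted(captured))
```

The message that was read and deleted within the same day never appears in any scan, no matter how the archiving policy itself is configured.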

Appropriate Uses of Mailbox Archiving

Best suited for: Mailbox Storage Management
Mailbox archiving is appropriate for active mailbox storage management. A significant advantage is that mailbox archiving systems can “stub” or “shortcut” messages so that users don’t need to change their behavior to access historical mail. It is important to note, however, that without an active process that removes content from users’ mailboxes, an archive only aids in storage management if combined with tight mailbox quotas, which require users to spend hours each month on manual cleanup tasks.

Not appropriate for: Legal Discovery or Regulatory Compliance
Since mailbox archiving does not ensure the archiving of all messages, nor does it provide a complete view of all message traffic, it is not suitable to address legal discovery or regulatory compliance requirements.


April 22, 2008

Understanding the Different Approaches to Archiving Email (Part 1 of 4)

Posted by Rick Dales, VP Product Marketing

Discussing email archiving can be challenging, because the phrase “email archiving” is interpreted in very different ways, based upon the set of problems users are trying to address. Similarly, dedicated email archiving systems are not alike, and may offer different approaches to archiving.

Before selecting an email archive, it’s important to first understand the fundamental differences between these different approaches. Each one has pros and cons, depending on your archiving goals. These goals typically include (in no particular order):

  1. Providing a central, searchable, deduplicated repository of email data to use for the enforcement of litigation hold orders and the execution of legal discovery requests
  2. Provision of a systematic review process to monitor content sent/received by regulated employees (generally this is only in the financial services space)
  3. Providing easy access for users to their historical mail for productivity purposes, without keeping all of the mail on the production mail system
  4. Maintaining access to historical information when employees leave the organization

As I will explain over my next few posts, each of the current archiving approaches has limitations when trying to address all of these challenges. As a result, the selection of an email archiving system must first consider the best archiving approach to achieve your goals. Given that these goals can be in conflict, it will be equally important to prioritize your objectives and decide which items you are willing to make compromises on.

In my next few blog posts, I will provide a high-level overview of the main archiving approaches, outlining the pros and cons of each, from my perspective. (full disclosure: Fortiva uses a journaled archiving approach)

March 28, 2008

Reducing the Risks, Costs and Time of e-Discovery: EDRM and Email Archiving

Posted by Rick Dales, VP Product Management

This week I participated in a live webcast at Fortiva that discussed how an email archive can reduce the cost, risk and time involved with e-discovery. I thought it might be helpful to share some of the ideas from that event here on the blog.

A lot of people use the Electronic Discovery Reference Model (EDRM) to explain the steps involved in the e-discovery process. Since having an email archive significantly changes how a company deals with the first four phases of the model, I’m going to focus on those areas (as they pertain to email). First, I’ll explain what each step involves, and then provide a general sense of the time and costs involved (with and without an email archive).

Ultimately, as the diagram below shows, having an email archive in place can dramatically reduce the costs involved with e-discovery, not to mention the risks.

  1. Information Management – This is the first step in the model, and it refers to the organizing of information and the application of consistent retention policies. 

    Today, most firms have poorly defined retention policies and little, if any, way to enforce them across the organization. As a result, most companies dedicate very little – if any – budget to this step. The problem is, with no enforced policies, businesses are at risk of spoliation from destruction of data that should have been retained during a lawsuit. They also may be exposed to excess legal risk by maintaining data beyond the retention policy.

    By adding an email archive which captures a copy of every email message and applies consistent retention policies, a company can avoid these risks. Proactively capturing email into a central, searchable repository also increases visibility, and allows legal counsel to conduct early case assessment. It also makes future steps in the EDRM process faster and less expensive.  However, adding an archive requires a company to dedicate a moderate budget amount upfront.

  2. Identification – This step basically involves determining where – and in what format – email exists.

    Without an archive, this may require IT to find and catalog backup tapes, PST files (on the corporate network, individual laptops and desktops, and portable storage devices), and email servers. The hard costs are generally fairly low, but the time and effort required by IT can add up significantly.

    By adding an email archive, this step can essentially be eliminated, especially if a company takes steps to eliminate the use of PST files. Since all email is stored in the archive, it is a single source from which all e-discovery requests can be met.

  3. Preservation & Collection – Preservation means ensuring that email is protected against destruction or alteration (generally after a litigation hold), while collection refers to the gathering of email from the various sources catalogued in the identification phase. These two steps can sometimes overlap.

    Without an archive, enforcing a litigation hold manually (by asking individuals to retain information) is a hit-or-miss situation. Even if you can ensure that all relevant information is preserved (i.e., by storing complete backup tapes), you will almost always end up retaining more data than necessary – potentially exposing the business to additional risk.

    With an archive in place, the preservation & collection process is radically reshaped. With all data stored in a central location, there’s no need for collection at all. The archive also allows you to easily enforce a preservation/hold order for only the data required, without risking additional data deletion/spoliation of evidence (in Fortiva’s case, a litigation hold can literally be enforced with a click of a button).

  4. Processing, Review & Analysis – This phase involves the preparation of data, as well as the review and analysis of that data. This is where the most dramatic time and cost-saving benefits can be achieved with an email archive.

    Without an archive, this generally involves restoring backup tapes and removing duplicates, which can be an extraordinarily expensive process (the average cost to restore a single backup tape is $2,500 and some businesses may have hundreds of tapes to restore). This is also when the initial culling process takes place - eliminating unnecessary documents in order to reduce the amount of data that needs to be manually reviewed. Since processing work is typically done by third parties with limited culling capabilities, the resulting dataset that needs to be reviewed is generally very large.

    With an archive in place, all the data is stored in a deduplicated fashion, and it can be searched and reviewed at any time.  This allows businesses to conduct early case assessment before meeting with opposing counsel (or even before a formal case is filed).  A powerful search feature also makes the culling process more effective, ultimately reducing the amount of data that needs to be reviewed and analyzed.

  5. Presentation – Since theoretically, the same dataset will be produced following the first four stages, an archive has no material impact on production and presentation costs.

As you can see in the diagram above, an archive involves moderate incremental costs in the information management phase (regardless of whether or not you’re involved in litigation); however, it dramatically reduces the total cost of the e-discovery process when a request comes up. Ultimately, even if you only have one case that requires e-discovery over the course of three years, it still makes economic sense to implement an archive (based on Fortiva’s pricing). This is true without taking into consideration the additional risks (and potential costs) that come with not having an archive in place.

March 13, 2008

Litigation Hold Loopholes – Preventing End-User Deletion

Posted by Rick Dales, VP Product Management

Last week, an interesting post appeared on StorageSoup, a SearchStorage.com blog that provides commentary on the storage industry. The post, titled FRCP looking like a PITW (Pain in the Wallet), identifies some of the potential loopholes a company can face trying to enforce a litigation hold. It also questions whether technology exists to address these loopholes without forcing an organization to literally keep every email indefinitely.

The quick answer to that question is yes (in fact that’s exactly what Fortiva’s on-demand email archive offers), but I thought it would be worthwhile to address some of the challenges mentioned in the blog entry in a bit more depth. Considering that the post was written by Tory Skyers, a Senior Systems Engineer who has hands-on experience dealing with multiple litigation holds and who regularly writes on storage issues, the confusion around how to best enforce a litigation hold is obviously hitting even the most seasoned IT professionals.

Here’s a quick rundown of Skyers’ main concerns, followed by my thoughts and recommendations:

  1. Some trials last a loooooooong time, and the costs of storing the data requested for litigation hold on WORM are very significant. Despite this, the potential risks and costs of not having the data available can be so high that businesses can’t afford not to store relevant data once a litigation hold comes into effect.

    1. As Skyers mentions, some cases can last five years or more and the cost of storing this data starts adding up quickly. The whole process can also be time-consuming for IT, and there are no guarantees that data won’t be corrupted. So not only is this approach expensive, it’s risky too. Having said that, the risks of not storing the data can be even higher. The key is to find a more cost-effective, reliable way to store the data (i.e., an email archive).
  2. There’s a “Safe Harbor” clause in the FRCP that absolves companies of responsibility if the company has — and strictly follows — a deletion and retention policy. This protects the company from falling afoul of the regulation, but does my act (as an end user) of deleting an email fall under the “Safe Harbor” clause?

    1. The quick answer is no. The “Safe Harbor” clause protects organizations from being penalized for deleting relevant information before a litigation hold comes into effect, assuming the data was deleted according to a stated deletion and retention policy. If an end user is allowed to delete an email (accidentally or intentionally) that is covered by a litigation hold, or that has not yet reached the corporate retention period, it can be considered spoliation of data.

      Spoliation is the withholding, hiding, or destruction of evidence relevant to a legal proceeding and is a criminal act in the United States. It can result in fines and/or incarceration for the parties who engaged in the spoliation. It can also lead to a negative inference ruling that can ultimately lead to a guilty verdict.

      To avoid this, companies should have technology in place to ensure that email data cannot be deleted by an end-user until both of the following criteria are met: a) it has reached its retention period and b) it is not covered by a litigation hold.
  3. I’ve seen some precedent that leads me to believe that simply having and following a policy is not enough… So as it relates to e-discovery, if a company allows [me] to delete my own emails, are [they] implicitly approving of me disobeying retention and deletion policy?

    1. In a way, yes. The key to meeting the FRCP guidelines is having and enforcing a policy. If you believe your end-users can be relied on to accurately enforce your policy (and not make any errors), then it is sufficient to simply have a policy and rely on your employees. Otherwise, you better have some technology in place that enforces your policy (including litigation holds) and prevents human error.

      In fact, a case in point is the recent Intel vs AMD lawsuit. Intel executives were informed of the litigation hold retention requirement, but many of them deleted email anyway. Regardless of whether the email deletion was intentional (or whether it was simply human error), the company was guilty of spoliation.
  4. It seems like I would have to have CDP in place and store every email entering and leaving every mailbox forever to be really covered against every contingency.

    1. Fortunately, it’s not that bad. Once an email reaches the lifecycle outlined in the corporate retention policy, it can (and should) be deleted (assuming it’s not covered by a litigation hold). There is absolutely no need to keep everything forever (in fact that would raise a company’s risk profile significantly).

      The question is, how should you store your email? Skyers accurately points out that relying on a backup process may be insufficient, since any data that is sent or received, and deleted in between backup periods may not be retained. Beyond that, it is virtually impossible to apply a consistent retention policy against data on backups, since a single tape necessarily contains emails crossing a wide span of time. Backup tapes also have a high rate of corruption/failure, making them unreliable.

      To keep all the data that enters your corporate email system for as long as necessary (and no longer), you really need an email archive like Fortiva, which captures every email that is sent or received, and keeps multiple copies in unalterable format on spinning disk until they meet the retention policy.
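
The deletion rule from the second answer above (a message may be removed only once it has reached its retention period and is not covered by a litigation hold) reduces to a simple check. The sketch below is illustrative only: the function name, the day-based retention period, and the single hold flag are assumptions, and a real policy engine would track holds and retention categories in far more detail.

```python
from datetime import date


def may_delete(archived_on, retention_days, on_hold, today):
    """Return True only if retention has elapsed AND no litigation hold applies."""
    retention_met = (today - archived_on).days >= retention_days
    return retention_met and not on_hold


today = date(2008, 3, 13)
# Past retention, no hold: eligible for deletion.
expired = may_delete(date(2005, 1, 1), 365, on_hold=False, today=today)
# Past retention but under a hold: must be kept.
held = may_delete(date(2005, 1, 1), 365, on_hold=True, today=today)
# No hold, but retention period not yet reached: must be kept.
too_new = may_delete(date(2008, 1, 1), 365, on_hold=False, today=today)
print(expired, held, too_new)
```

The point of enforcing this in software is that both conditions are evaluated every time, rather than relying on each end user to remember them.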

So all this leads to one conclusion – an email archive is really the most foolproof way to avoid the many possible loopholes when dealing with the FRCP requirements for email retention, litigation holds and e-discovery. At the risk of being self-promotional, here’s a run-down of how Fortiva meets all the requirements and addresses the concerns raised by Skyers:

  • Cost-effective storage: Fortiva’s SmartStore archive stores a redundant copy of every email sent and received according to the customer’s retention policy in a centralized location. It requires virtually no effort on the part of IT, and it starts at just $1.10 per user, per month for a 1,000-user company. It also offers storage management features that allow a company to significantly reduce the burden on the Exchange email server.
  • Litigation hold: Fortiva allows legal or IT to enforce a litigation hold against relevant email indefinitely with a click of a button in a web-browser interface.
  • Policy enforcement: Fortiva allows you to develop granular policies (including different retention policies for different departments, individuals, and types of data), and automatically enforces those policies.
  • Redundant storage: Fortiva stores multiple copies of every email in unalterable format on spinning disk, and keeps an additional copy in a secondary location. The system also provides continuous data validation across all archived data.

It’s important to note that not all email archives offer the same functionality. There is a whole class of email archives that were designed primarily to address email storage management issues, and those typically allow end-user deletion/deletion outside the retention policy (introducing many of the problems highlighted above). But that is a topic in itself. In my next post, I’ll explain the different types of email archive, and the situations that each type is best suited for.

February 19, 2008

SaaS Pushing Prices Down in Email Archiving

Posted by Rick Dales, VP Product Management

Over the past month, three SaaS vendors (including Fortiva) in the email archiving space have made announcements about new or existing products that are being offered at price points that have never been seen before. Fortiva was the first in this trend, announcing SmartStore - an entry-level email archive that allows customers to meet all their compliance and e-discovery requirements for the same price as email storage (pricing starts at $1.10 per user, per month for 1000 users). This was followed by Google announcing new, highly competitive pricing for their Postini archiving product, followed by another SaaS archiving competitor who announced a similarly-priced SaaS archiving option.

While some might speculate that this pricing trend is a result of “the Google effect” (downward price pressure from a large organization), that’s certainly not the case for Fortiva. Our product announcement was planned well in advance of the Google release, in response to a market need for an entry-level archive that meets legal demands and FRCP regulations. 

The reason we’re able to introduce this lower-cost archiving option is twofold. First, as a SaaS provider, we can offer a “pay as you go” model that allows customers to only pay for the services and features they require (unlike in-house solutions). Second, our SaaS architecture (shared resources and greater buying power for infrastructure) allows us to continually lower our cost-to-serve model, and therefore be very competitive in our pricing. I suspect these factors are having a similar impact on the other SaaS vendors in our space.

It’s exciting to see email archiving vendors delivering on the promises that SaaS can offer – including greater reliability and performance, faster implementation, fewer hassles and minimal IT management, and now, significantly lower TCO. This momentum is great for the industry and more importantly, great for customers.

January 28, 2008

Archiving Your Email for the Same Price as Email Storage

Posted by Rick Dales, VP Product Management

First of all, let me start by saying that this is one of the rare times on this blog that we’re posting specifically about a Fortiva product…my apologies in advance for the self-promotion, but we felt this was worthy of a post.

We believe that every company should be diligent in storing their email in a way that complies with the FRCP requirements and allows you to meet e-discovery requests. Most companies have avoided this until now, largely because it’s too expensive. At Fortiva, we don’t think it should be. After all, if the data is being stored anyway, why should it cost so much more to do things like apply a policy or enforce a litigation hold against that data?

Which is why today, we announced the Fortiva SmartStore™ archive, a Software-as-a-Service product that allows you to centrally archive all your email, enforce policies and litigation holds, perform enterprise-wide search and easily conduct early case assessment, all for the same or less than the cost of storing and managing the data on enterprise storage in-house.

What this means for your average company is that there is now a way to protect against a potentially crippling e-discovery request, without adding significantly to your costs or to the demands on IT.

Until now, adding an active archive has been prohibitively expensive for companies that don’t face litigation on a regular basis. This wasn’t such a problem until the Federal Rules of Civil Procedure (FRCP) were amended in December, 2006, clearly identifying email as discoverable. Since then, there have been over 100 cases that have been impacted by the FRCP e-discovery rules in some way. Ultimately, no company is exempt, and cost is not considered a valid argument for not producing email requested during discovery.

Essentially, SmartStore acts as inexpensive insurance against potential litigation. You can rest assured that your email is being stored in accordance with the FRCP, and that you can quickly search and retrieve that email if a lawsuit comes up.

If you’re wondering how we can do this, it’s fairly simple. First, as a SaaS solution, our customers benefit from lower costs resulting from the shared infrastructure of Fortiva's multi-tenant architecture.

Secondly, we’ve significantly reduced the costs associated with search. As we’ve mentioned in previous posts, providing real-time search across all of a company’s email can be extremely expensive. Since SmartStore returns search results in a few hours instead of seconds, the archive costs are considerably lowered.

We know that companies that face infrequent lawsuits don’t require real-time search on-demand; in fact, a search response time of under 3 hours is fairly common for an email archive, and can easily meet the needs of many organizations. With SmartStore in place, customers can meet their basic requirements and upgrade at any time to real-time search if they face intensive e-discovery requests.

If you want to learn more about Fortiva SmartStore, you can visit our website at www.fortiva.com/smartstore.

January 25, 2008

The e-Discovery Search Quandary – Justifying the Cost of Infrequent Searches (Part 5 in a Series on Search)

Posted by Rick Dales, VP Product Management

In our previous posts, both Chris and I discussed the significant investment in infrastructure that is necessary to provide fast, reliable search of corporate email. Even just a few years ago, this wasn’t a big issue for most businesses because they simply weren’t conducting searches across the entire email repository. However, in our increasingly litigious society, the growing costs that come from e-discovery are forcing more and more businesses to address the notion of "litigation readiness" – which inherently requires the ability to search email to isolate materials relevant to a given case.

For companies that live under the cloud of a perpetual cycle of lawsuits, a variety of new technologies and processes have emerged to help people manage, collect, review and produce information for litigation.  Unfortunately, these approaches are often very expensive and can't be justified by the majority of businesses that only periodically face litigation hold and/or e-discovery activities -  a point that was reinforced by a recent survey that showed 1 in 5 businesses have settled a case to avoid the cost of searching through and retrieving email. 

For a company with a relatively long standard retention period (something that is becoming the norm), legal must be able to mine through a constantly-growing set of emails. This is particularly problematic because the cost to provide relatively quick searches doesn't grow linearly with the data growth, but instead, in most systems it grows exponentially. As difficult as it often is to justify the costs of "preventative" technologies (such as email archiving for litigation readiness), a system with rapidly increasing costs is even harder to justify.

Software-as-a-Service (SaaS) is a perfect model for addressing these types of challenges. Here’s why. When an e-discovery request comes in, most companies need powerful e-discovery capabilities with very little advanced notice; however, the rest of the time, they’re unlikely to need that search capability. Instead of building a system in-house that is underpowered when it's needed and wasteful the rest of the time, SaaS allows firms to readily access a pool of resources on-demand to meet their needs.

By spreading the cost of a large infrastructure over many customers, each of whom is unlikely to need the system at the same point in time, users get maximum capabilities at a far more justifiable, predictable cost. To scale without bounds, SaaS companies like Fortiva are forced to build infrastructures whose cost does not grow exponentially (or it would be less and less profitable to take on new business). The benefit of this technology investment is passed along to the customer base, so that costs per unit of data stored or processed go down over time.

Just like buying insurance, litigation readiness is about reducing risk and preventing significant, unexpected (and unplanned) costs. There is the cost of enforcing a litigation hold, the cost of e-discovery activities, and the cost of increased litigation risk from not having (or not having access to) critical data, not to mention the cost of negative judgments. So it’s not surprising that litigation readiness – much like insurance again – can be a challenging thing to justify, especially when lawsuits aren't part of your firm's daily life. SaaS solutions can prove to be the best way to balance these needs.

Click here to read Part 6 in the Series on Search

January 04, 2008

Searching an Email Archive: Real-World Examples (Part 2 in a Series on Search)

Posted by Rick Dales, VP Product Management

In my previous post, I talked about the significant challenges of enterprise-wide search, and how those challenges directly translate to an email archive (in fact, they’re arguably greater for an email archive).

Today, most organizations archive email for legal discovery purposes. While they may have other goals, including compliance and storage management, searching through the entire repository is a fundamental requirement for any archive. The problem is that firms almost always underestimate the growth of the data and the infrastructure required to support the searching of that data (and the sales teams at most email archiving vendors have little or no reason to change that).

To further this point, I wanted to share some real world experiences from companies we recently talked to that have an in-house email archiving solution in place. The first is an international bank that was archiving for a division of about 10,000 users.  Within two years they had amassed several terabytes of information. At that point, every time their legal or compliance department requested data from the archive, it was taking the IT department in excess of 24 hours to run a search.  With an expectation of next day delivery of information, this left no room for error.

This is far from the worst example we’ve seen. Another firm took over 25 days to complete a single search. And these experiences are not uncommon.  Making it even worse, we frequently hear that IT staff must stay up all night monitoring these long-running activities, because turnaround times don't allow for processes that fail overnight to be restarted the next day. 

Almost without exception, the companies we talk to say that their email volume is growing faster than expected.  The end result is that any new investments in the archive go toward growing the data intake processing capacity, not the search or access capability. Companies simply don’t have the budget, staff or time to keep up with search optimization. Which takes me back to my first post of this series, where I explained how a few years’ worth of corporate information can quickly accumulate to the size of all public information on the web, making it unreasonable for a company to even try to achieve short windows for search in-house (it would require hundreds or thousands of dedicated servers).

The big challenge is that for most organizations archiving data for litigation readiness, the data remains largely untouched until a legal issue arises. At that point, critical (and time-sensitive) searches are required. Yet maintaining the infrastructure in-house to conduct those searches on an infrequent basis (even a couple times a week) makes no sense. Leveraging a shared (SaaS) infrastructure for search, on the other hand, is an ideal way to cost-effectively conduct time-sensitive searches on a periodic basis.

As the archiving industry begins to mature, and more companies have experience managing an archive for more than a year or so, this problem will continue to come to light, and the benefits of multi-tenancy for archiving will be better understood. In the meantime, if you’re considering an email archive, take the time to ask the vendors you’re evaluating if they track search performance. Furthermore, ask to speak to customer references that have been archiving email for a significant period of time (and that have storage requirements comparable to your own), and ask them about search times. You might be surprised at the answer.

Click here to read Part 3 in the Series on Search

December 19, 2007

Why is enterprise search so elusive? (Part 1 in a Series on Search)

Posted by Rick Dales, VP Product Management

Time and time again, we get calls from people who are looking for a new email archiving vendor because they are frustrated with the search performance of their current archiving solution. And it’s not surprising that they’re frustrated. With Google searching the whole internet in real-time, it seems logical that searching data across a single company would be a fairly easy thing to do.

As far back as the early 1980s (when desktop document tools entered the workplace) people have talked about the importance of enterprise search as a key enabler for knowledge workers. Twenty-five years later, an abundance of "enterprise search" products exist, yet very few firms have implemented company-wide searching of business information. Which leaves the obvious question - why not?

In fact, "enterprise search" is really a misnomer for these products which invariably are focused on narrow areas of information such as intranets or specific application data. The truth is that providing real-time searching of data across the enterprise (especially when you include the vast amount of email) involves significant challenges.

As the only company in our industry that offers a search time guarantee, Fortiva has a first-hand understanding of these challenges. Here is a quick breakdown of why searching through enterprise data is harder than it sounds:

Finding and quickly indexing distributed information is challenging
Business information comes in many forms, structured and unstructured, on an ever-changing set of machines on the corporate network. Finding the machines that contain this information and scanning each file on each machine is both costly and challenging, particularly when dealing with laptops that come and go from the network. Once documents have been found, extracting their textual content in a meaningful form is also difficult and expensive. Given that people most frequently look for recent information, if the indexing process doesn't work very quickly, users will likely find that the search engine is useless.

For day to day use, search needs to be fast -- which can be extraordinarily expensive and require an enormous amount of infrastructure
Web search engines such as Google, MSN and Yahoo have set user expectations for search.  If you have to wait more than a few seconds to get search results, you give up and move onto the next search engine. These sites get performance by distributing search activities across hundreds (or thousands) of servers for each search request, then aggregating the results.  Since the web is a single corpus of information that everyone shares common access to, and the search engines can profit from searches through advertising dollars, building out this large scale infrastructure is cost-effective.
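To make the "distribute, then aggregate" idea concrete, here is a minimal sketch of a scatter-gather search in Python. Everything in it is an assumption made for illustration: the three in-memory `SHARDS` stand in for separate search servers, and scoring by simple term count is a placeholder for real relevance ranking. It is not how any particular search engine is built.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards"; each one stands in for a search server.
SHARDS = [
    {"doc1": "quarterly budget review", "doc2": "lunch plans"},
    {"doc3": "budget forecast for legal", "doc4": "holiday schedule"},
    {"doc5": "budget budget budget", "doc6": "contract draft"},
]

def search_shard(shard, term):
    """Score each document in one shard by how often the term appears."""
    hits = []
    for doc_id, text in shard.items():
        score = text.split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return hits

def scatter_gather(term, top_n=3):
    """Send the query to every shard in parallel, then merge the best hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term), SHARDS))
    all_hits = [hit for part in partials for hit in part]
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(all_hits, key=lambda h: (-h[0], h[1]))[:top_n]

print(scatter_gather("budget"))
```

Each shard only ever scans its own slice of the data, which is why adding servers shortens response time, and also why the approach gets expensive as the data grows.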

What a lot of people don’t realize is that a few years’ worth of corporate information can quickly accumulate to the size of all public information on the web.  Yet to support the same search performance, implementing hundreds (or thousands) of servers in a single organization can never be justified.

Information access tools need to understand changing security models
Most documents within an organization are designed for consumption by a very limited audience.  Confidentiality of many types of information, including financial, business development and HR content is critical within an organization.  To provide a unified search infrastructure, each user must only be able to see search results for documents that they should have access to.  Determining these security relationships is challenging, however, because most firms assume that documents that live on a user's machine or their personal space on the network should only be accessible to that user.  Making this assumption, however, dramatically reduces the knowledge management value of enterprise wide search.
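As a rough illustration of what "only see results you should have access to" means in practice, the sketch below filters search hits against a per-document reader list. The `INDEX` structure, the `readers` field and the user names are all hypothetical; a real system would derive permissions from the directory service or file system rather than store them inline.

```python
# Hypothetical index entries: each document records which users may read it.
INDEX = [
    {"id": "hr-001", "text": "salary review for Q3", "readers": {"alice"}},
    {"id": "fin-007", "text": "Q3 revenue forecast", "readers": {"alice", "bob"}},
    {"id": "gen-042", "text": "Q3 office party", "readers": {"alice", "bob", "carol"}},
]

def search(user, term):
    """Return ids of matching documents that this user is allowed to read."""
    term = term.lower()
    return [
        doc["id"]
        for doc in INDEX
        if term in doc["text"].lower() and user in doc["readers"]
    ]

print(search("bob", "Q3"))
```

The filter itself is trivial. The hard part, as noted above, is deciding what belongs in each document's reader list in the first place.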

Ultimately, when you consider these challenges, it’s not surprising that so many people are frustrated with the search performance provided by in-house email archiving solutions. An email archive has to deal with a massive amount of searchable information, often many times the size of the "active" information set found throughout the corporate network.  Providing high-performance search across this data requires distributed search technology similar to that used by the web search engines and a large infrastructure. The truth is that for most companies, keeping up with the infrastructure requirements to support real-time search of the ever-growing volume of email data is – and will continue to be – cost-prohibitive.

And that’s where Fortiva’s software-as-a-service model starts to make real sense. By sharing the costs of the search infrastructure among different customers, Fortiva is able to guarantee search performance on an ongoing basis. Chris Tebo, our CTO, and I will get into more detail about how we do this in a future post in this search series.

Click here to read Part 2 in the Series on Search

November 02, 2007

Email Archiving and Risk Reduction: 7 Things to Know (Part 2 of 2)

Posted by Rick Dales, VP Product Management

In my last post, I commented on a recent Computerworld article entitled "Seven things to know about reducing risk with an e-mail archive", in which four Wikibon.org community members discuss the subject of email archiving and risk. As I noted earlier, the article points out seven “rules” for reducing risk with an email archive, all of which are valid to some extent; however, there are some overly simplified recommendations in the piece that warrant clarification. In this post, I will comment on the four remaining suggestions (having covered the first three in my last post).

4. Design for secure transfer from one medium to another.

The article makes a good point that to cost-effectively keep data for a 10-year period, you will need to plan to move that data to updated storage media at some point. This is a challenge that many people don’t consider when they implement archiving in-house – it’s also a compelling benefit of a SaaS solution like Fortiva (since that’s our problem, not yours).

Because content in an email archive accumulates over time, data management becomes a challenge both in terms of scalability and long-term accessibility of information. While it is critical that your archive not be tied to a specific storage medium, most people forget the practical considerations of moving very large blocks of information from one device to another.

As a service provider, we have unique scalability challenges, one of which is the need to constantly re-balance our storage infrastructure to ensure balanced load and maximum data utilization. As a result, we employ best practices that allow for the secure transfer of data from one medium to another, including:

- storing data in small, manageable units, so that movement from server to server (and from one medium to another) is not only possible, but practical

- providing automatic fail over to alternate active copies of each piece of information to allow for uninterrupted access while data management and migration tasks are happening

- constantly scanning each piece of information in the archive to ensure that corruption and tampering have not taken place

When implementing an email archive, you should ask how the solution compares in each of these three areas.
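For readers who want a picture of what "constantly scanning" for corruption or tampering might look like, here is a minimal sketch using SHA-256 digests from Python's standard `hashlib` module. The `archive` dictionary and the `store` and `scan` helpers are assumptions made for this example, not a description of Fortiva's implementation.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Compute a SHA-256 digest to serve as the item's fingerprint."""
    return hashlib.sha256(data).hexdigest()

def store(archive: dict, item_id: str, data: bytes) -> None:
    """Save the content together with the fingerprint taken at ingest time."""
    archive[item_id] = {"data": data, "digest": fingerprint(data)}

def scan(archive: dict) -> list:
    """Return ids of items whose content no longer matches its fingerprint."""
    return sorted(
        item_id
        for item_id, item in archive.items()
        if fingerprint(item["data"]) != item["digest"]
    )

archive = {}
store(archive, "m1", b"hello")
store(archive, "m2", b"world")
archive["m2"]["data"] = b"w0rld"  # simulate corruption of one stored item
print(scan(archive))
```

In this sketch the digest sits right next to the data, so it would catch accidental corruption but not a deliberate change by someone able to rewrite both. Keeping the fingerprints somewhere the storage administrators cannot modify would be one way to address that.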

5. Build to support derivative uses of the data.

While pundits have extolled the holy grail of knowledge management for years, email data may prove to be the centerpiece of an enterprise knowledge base. Email contains sufficient metadata (dates, involved parties, etc) to provide some basic context to information flow. Virtually all documents created in an organization flow through email as well. That having been said, email archives typically involve proprietary storage techniques to allow for storage optimizations such as compression and single-instance storage as well as effective retention management.

While Bert suggests storing information in industry standard formats to prepare for yet to be determined applications, it is impossible to predict what the "right", "industry standard" format will be. Instead, the system should easily provide ways to access and export the data into standard formats for analysis.

6. Avoid overly complex solutions and vendor lock-in.

Regardless of how simple or complex the solution is, having clear objectives is critical to a successful archive implementation. The reality is that once you have an archiving solution in place, it is difficult (but not impossible) to switch, so getting the basics right is critical. The winner in the feature race will get you nowhere if your archive can't:

- ingest new data as fast as your organization creates it

- provide consistent search performance, regardless of how large the archive gets

- ensure consistent availability

While Fortiva provides advanced functionality, such as message stubbing, we always ensure that customers have a meaningful policy and are successfully archiving new mail before we deploy added capabilities. I think the key point here is that effective risk reduction and a rich feature set do not have to be mutually exclusive. Getting added value from your email archive (beyond risk reduction) is a goal that should not be overlooked.

7. IT will manage e-mail archives as applications, with the storage group providing the actual storage as a service to the application team.

One of the most interesting insights we've gained in the process of converting customers from in-house archiving solutions to our SaaS platform is that storage groups often don't understand that archival storage is a whole different beast. If the application and storage allocation are managed appropriately, archival data can be organized into blocks that become read-only, and no longer require ongoing backups. For large archives, treating the entire dataset as though it were a transactional database can dramatically increase costs and impact system availability.
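The read-only block idea can be sketched in a few lines of Python. The `Block` class, its `sealed` and `backed_up` flags and the quarter-based block names are hypothetical, chosen only to show why a backup job that understands the archive's structure has far less work to do than one that treats everything as live data.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    sealed: bool = False      # True once the block is read-only
    backed_up: bool = False   # True once a backup of the block exists

def blocks_to_back_up(blocks):
    """Skip sealed blocks that already have a backup; they can no longer change."""
    return [b.name for b in blocks if not (b.sealed and b.backed_up)]

blocks = [
    Block("2006-Q4", sealed=True, backed_up=True),
    Block("2007-Q1", sealed=True),
    Block("2007-Q2"),
]
print(blocks_to_back_up(blocks))
```

A storage team that cannot see the sealed flag has no safe option except to back up every block every time.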

As a result, contrary to Bert's suggestion, archive applications and storage management need to be done in tandem to get optimal efficiency. Given that most IT organizations are not organized in this way, outsourcing to firms that have this joint expertise allows for a more reliable, yet less expensive overall solution.

October 30, 2007

Email Archiving and Risk Reduction: 7 Things to Know (Part 1 of 2)

Posted by Rick Dales, VP Product Management

In a recent Computerworld article entitled "Seven things to know about reducing risk with an e-mail archive", Bert Latamore shares with us the opinions of four Wikibon.org community members: Josh Krischer, David Floyer, Peter Burris and David Vellante on the subject of email archiving and risk. While they make some valid points, and the seven “risk-reducers” are worth considering before you purchase an archive, there are also some overly simplified recommendations in the piece. 

As an email archiving vendor that has converted customers from virtually every solution out there, we've learned a lot about where people's email archiving implementations go wrong (as well as what works).  I thought it would be worthwhile to review the seven items in light of feedback from our customers and how Fortiva approaches the relevant challenges. In this post, I’ll cover the first three points, and I’ll address the remaining four in a follow-up post.

1. Focus on the issue of risk when selecting the technology for the base archive.
This is a valid point – while some people look at an email archive solely as a way to move data off of their overburdened mail system, if an email archive is ever going to be used to address legal discovery or compliance requirements (and it’s almost a given that it will at some point), these considerations must be first and foremost in the selection and implementation of an archiving solution. 

Since the article doesn’t go into specifics on what an archiving solution should have in order to effectively reduce risk, I thought I’d do that here. For discovery and compliance purposes, you must ensure complete, authentic capture of data and consistently apply retention policies.  As a result your solution must have:

  • a robust policy engine that can systematically classify messages for retention (not just treat everything as a keep forever black hole)
  • a way to isolate specific messages to apply a litigation hold to (while still being able to dispose of other items that have met their retention period)
  • a guarantee that all disposed data is unrecoverable as soon as its disposition has been authorized
  • digital fingerprints on all content to prove that it hasn't been tampered with
  • a data capture process that can never lose data due to system or network failure

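To show how a retention policy and a litigation hold interact, here is a minimal Python sketch. The department names, the retention periods in `RETENTION_DAYS` and the message fields are all hypothetical values picked for the example; the point is only that a hold overrides an expired retention period while other expired items can still be disposed of.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, keyed by department.
RETENTION_DAYS = {"finance": 7 * 365, "hr": 3 * 365}
DEFAULT_RETENTION_DAYS = 365

def can_dispose(message, holds, today):
    """A message is disposable only if retention has expired and no hold applies."""
    if message["id"] in holds:
        return False
    days = RETENTION_DAYS.get(message["department"], DEFAULT_RETENTION_DAYS)
    return today > message["received"] + timedelta(days=days)

today = date(2008, 1, 1)
sales_msg = {"id": "m1", "department": "sales", "received": date(2006, 1, 1)}
finance_msg = {"id": "m2", "department": "finance", "received": date(2006, 1, 1)}

print(can_dispose(sales_msg, holds=set(), today=today))    # retention expired
print(can_dispose(finance_msg, holds=set(), today=today))  # still within retention
print(can_dispose(sales_msg, holds={"m1"}, today=today))   # held for litigation
```

Checking the hold first keeps the rule easy to audit: nothing under hold is ever disposed of, whatever its retention period says.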
2. Good procedures are more important than access speed.
Again, this is true, but the article misses a few salient points (and lacks specifics on what to look for).

An inconsistent archive creates, rather than mitigates, risk. While the exact rules for the retention of information are up for debate, the consistent application of a company's stated policies is not. If you are going to rely on an archive to reduce risk you need to:

  • Ensure all changes to the system (and the policies applied to the data) are fully audited so that you can explain why any message was retained for the period it was
  • Have multiple (redundant) copies of all data, including indexes, to ensure that equipment failure does not result in data loss
  • Have offsite copies of the data to protect this asset in case of disaster
  • Ensure that no unauthorized user can access the data (this includes database or storage administrators)

While sound data management procedures are critical, the argument that 48-hour turnaround on legal discovery activities is “fine” misses a key point about risk:  Instant access to mine through data allows legal counsel to assess the risk involved, determine how likely they are to win the case, and make educated decisions about how to proceed.  With the ability to perform "pre-discovery" investigations, firms can choose to settle or fight a case based upon real insight into the information they have on hand. So while quick/real-time access to data may not be as necessary as enforcing consistent procedures, it certainly is a very valuable feature to have, and one that should not be ignored as a risk reduction tool.

3. Do not archive e-mails from before the archive was created
This is an interesting perspective, and one that certainly has its merits. It’s true that importing historical data to an archive can be an expensive, labor-intensive process and it may not make sense for every company. However, while Floyer makes a valid point that imported historical email will rarely, if ever, represent a complete record, he misses a key way that importing old email to an archive can reduce risk.

Being able to ensure that you don't keep things longer than your stated policy allows should not be underestimated as a risk reduction tool (as Floyer covered in point #2) – and this is something that an archive can address, assuming you follow the right procedures. If you archive historical email at the start of an archiving project with the hope of reducing risk, it is critical that you delete the imported email data stored in various places on user laptops, desktops and file servers (i.e., in PST files). That way, you can enforce a consistent retention policy on all your email data, dramatically reducing risk.

On the other hand, if you allow users to keep their PST files, while ingesting a copy of them into the archive, the only potential risk reduction you gain is from easy search access for "pre-discovery" investigation. Another thing to remember – it is important to try to import your data right at the beginning of the archive implementation and record when these activities happened, so that you can explain, if called upon, that data prior to a certain date is not a complete set. 

As for the cost of importing the data, each vendor's solution is different.   As a service provider, we charge based upon the amount of data imported.  Importing data from a legacy archive almost always pays for itself, because the legacy system would otherwise need to be maintained in parallel.  When it comes to "unmanaged" data, such as PST files, the initial cost may seem harder to justify.  It's worth understanding, however, that unless you prevent users from adding to their local storage of email data, your risk profile doesn't change with the addition of the archive.

In my next post, I'll address the remaining four recommendations from the article.


