Storage

May 20, 2008

Can one email archiving approach meet all your needs? (Part 4 of 4)

Posted by Rick Dales, VP Product Management

In my last three posts, I introduced the idea that there are multiple approaches to archiving and took a deeper look at the two most widely used methods – mailbox archiving and journaled archiving.  I conclude this series by addressing a question that often comes up: can one email archiving approach solve your mailbox storage management challenges and also meet your legal discovery and compliance requirements?

As I mentioned in my first post, companies may have many goals when they decide to implement an email archive, but some goals may end up being in conflict with others.   For example, the IT group may implement an archive for mailbox storage management purposes and let users control which messages are archived and which ones are deleted.   However, by doing this, they defeat the organization’s retention policy and make the archive a meaningless place to manage preservation orders for a litigation hold. 

Most in-house archiving software products implement both mailbox archiving and journal archiving, and allow customers to enable both approaches as a way to deal with the limitations of each.  Not only is this not a particularly practical solution, it also results in duplicate storage of content (despite what they might tell you about single instance storage).

At Fortiva, we use journal archiving because we wanted to ensure that we could address litigation readiness and compliance requirements.  However, as I mentioned in my previous posts, using journaling as a source of information that you plan to expose to end-users requires additional work (that most archives don’t attempt to do).  We do the extra work to understand the routing of messages and their assignment to end-user mailboxes so that one copy of the message can be used for both end-user access and discovery.

Fortiva offers capabilities such as stubbing, a process similar to mailbox archiving where a periodic scan of mailboxes is performed.  Unlike implementing mailbox archiving on top of journaling, we scan mailboxes and then use our powerful real-time search engine to find the item that already exists in the archive to determine what the stub (or shortcut) in the mailbox should point to.  Doing so allows us to leverage the single copy of the data that is already in the archive via journaling.

It must be noted that Fortiva’s solution is built around a retention policy engine that assigns retention when messages are archived.  This means that neither users nor IT can simply say “I don’t need this anymore” and delete items at will.  As such, while Fortiva provides the added value of addressing storage management challenges, our on-demand archive is most suited for those that have a need for consistent retention as a core business requirement. 

While most modern archiving solutions offer some capabilities to address legal discovery and storage management challenges, each will have limitations in one area or the other – partially because the “optimal” business rules for each problem are in conflict. Thus, knowing what your primary goal is will help you decide which email archiving approach is best suited for your organization.

May 16, 2008

Approach 2: Journaled Archiving (Part 3 of 4)

Posted by Rick Dales, VP Product Management

In my last two posts, I talked about the fact that there are multiple approaches to archiving, each with its pros and cons. I also took a closer look at one of those approaches – mailbox archiving.  In this post, I will dive more deeply into another widely-used approach – journaled archiving – including how it works and what problems it is best suited to address.

Journaled archiving relies on a feature in the mail system that captures a copy of every message in transport (as it is sent/received) and puts a copy in another mailbox.  This copy of the message is stored as an attachment to a message known as a journal report, which contains additional information about the actual recipients of the original message.  The archiving system then uses this “journal mailbox” as a source of messages to be captured (and typically deletes the content once it has been captured).  Some outsourced solutions rely on the customer configuring journaling to deliver to a remote SMTP address.

Strengths

  • Complete capture of email messages
    The journaling process places a copy of every message that is sent/received into a separate mailbox at the same time that a user receives it in their mailbox.  A user choosing to delete the message in their own mailbox has no bearing on whether the message gets archived. 
  • A single, complete picture of each message
    As the journaling process includes BCC information and expansion of distribution lists, the archiving system can provide a full picture of the original message.  While multiple Exchange servers can increase the complexity on this front (because multiple journal reports may be created), the data exists to allow an archiving system to collapse the data into a single message containing all information about the actual recipients.
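
As a rough illustration of collapsing multiple journal reports into one record, here is a minimal Python sketch. The `JournalReport` class and its fields are hypothetical simplifications (not the actual Exchange journal report format), and the sketch assumes each report has already been parsed and carries the ID of the original message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalReport:
    """Hypothetical, already-parsed journal report (not the real Exchange format)."""
    message_id: str        # ID of the original message
    sender: str
    recipients: frozenset  # actual recipients, including BCC and expanded list members


def collapse_reports(reports):
    """Merge reports for the same message into one record with the union of recipients."""
    merged = {}
    for report in reports:
        entry = merged.setdefault(
            report.message_id, {"sender": report.sender, "recipients": set()}
        )
        entry["recipients"].update(report.recipients)
    return merged


# Two servers each journal the same message but see different recipients.
reports = [
    JournalReport("<msg-1@example.com>", "alice@example.com", frozenset({"bob@example.com"})),
    JournalReport("<msg-1@example.com>", "alice@example.com", frozenset({"carol@example.com"})),
]
collapsed = collapse_reports(reports)
print(sorted(collapsed["<msg-1@example.com>"]["recipients"]))
```

Keying on the message ID is an assumption made for the sketch; a real system would also need to handle reports that arrive late or out of order.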

Weaknesses

  • Providing end-user access to their own mail is difficult
    To provide end-users with access to the messages that they sent or received, an archiving system has to determine which mailboxes a message was actually delivered to.  The address information on journal reports is insufficient to achieve this, as forwarding and routing rules must be factored into the equation.  While it is possible to do this (and Fortiva does), most other journaling-based archives do not, resulting in journaled messages being available only to IT or legal staff who have rights to see all mail.
  • No direct ability to modify/stub messages
    There is no connection between a journal report in the journaling mailbox and the messages that live in users’ mailboxes.  Replacing message content in users’ mailboxes with a pointer to the message captured using journaling requires the archiving system to use complex lookup routines based upon content similarity.  Fortiva uses this approach, but most vendors do not.

Appropriate Uses of Journaled Archiving

Best suited for: Legal and Regulatory Compliance
Journaled archiving is the Microsoft-recommended approach for capturing data for legal discovery and compliance requirements.  It allows for the complete capture of all messages in a single, unified view.

Not usually well-suited for: Email Storage Management*
Unless the archiving vendor specifically implements other processes to clean up user mailboxes, journaled archiving approaches won’t address storage management challenges. Some journaled archiving solutions, including Fortiva, have implemented attachment stubbing (replacing attachments with a link to the file in the archive) to address this.

Not usually well-suited for: End-user Access*
Unless the archiving vendor specifically implements techniques to determine which users actually received mail, users will either not be able to access their own mail, or will be granted access to only a subset of the messages that they actually received. Some solutions, such as Fortiva, have developed a way to overcome this, allowing end-users to fully access all their archived mail.  Because journaled archiving isn’t working against the user’s mailbox, it can’t record which folder each user chooses to file the messages into.

* NOTE - As a point of reference (and self-disclosure), Fortiva uses journaled archiving. It overcomes some of the noted limitations with additional address resolution techniques and the use of a periodic scan of users’ mailboxes to allow for the stubbing of older attachments.

May 08, 2008

Wondering how you can afford to go green? Get your numbers straight.

Posted by Justin Wiebe, Fortiva Operations

Given my posting last month on green computing, I found the following statistic based on a new survey by McKinsey and Co. quite interesting.  Apparently, the world’s data centers are projected to surpass the airline industry as a greenhouse gas polluter by 2020.

In my previous post, I wrote about efforts taken by Fortiva to reduce our overall infrastructure power consumption. This has had the dual benefit of reducing both our impact on the environment and our cost of doing business. Since then, I have been thinking more about the challenges of justifying green computing from a dollars and cents perspective.

Until recently, I have rarely been able to create a positive Return On Investment (ROI) for new hardware purchases, especially those related to green computing. It turns out that all along I was missing something – that dollar amount that pushes the cost of existing systems over the top and reduces the payback to less than 3 years (sound familiar?). And what is that key? Power – more importantly, the cost of the power used over its lifetime by the piece of hardware you want to replace. As noted by Mark Monroe, Director of Sustainable Computing at Sun, rarely are power costs included in the IT budget.

Here at Fortiva, we try to roll all of our data center costs into one number. By adding up all monthly co-location costs, power costs, cooling costs and miscellaneous data center costs (but not bandwidth costs) and then dividing this number by the usable power, we obtain the monthly $/VA cost (or approximately $/W). By then determining the amount of power (VA or W) used by each server, we can calculate the cost per month to keep the server up and running. A sample of the calculation may look something like this:

Co-location Cost ($/VA per month) * Power Used by Server (VA) * Server Life (months) = Cost to Run Server

With the cost of power increasing almost everywhere, the Cost To Run Server is approaching the Cost To Buy Server. At Fortiva, the cost of hosting our dual-CPU servers for three years is approximately 75% of the total cost to purchase the server. If you stretch the life of the server out to five years, it actually costs more to host the server than to buy it.
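
The calculation above can be sketched in a few lines of Python. All dollar amounts, the usable power figure, and the server's power draw are made-up numbers for illustration (they are not Fortiva's actual costs), and the function names are my own.

```python
def colocation_cost_per_va(monthly_costs, usable_power_va):
    """Total monthly data center costs divided by usable power, in $/VA per month."""
    if usable_power_va <= 0:
        raise ValueError("usable power must be positive")
    return sum(monthly_costs.values()) / usable_power_va


def cost_to_run_server(cost_per_va_month, server_power_va, life_months):
    """Co-location cost * power used by server * server life."""
    return cost_per_va_month * server_power_va * life_months


# Hypothetical monthly costs (bandwidth deliberately excluded).
monthly_costs = {
    "colocation": 20000.0,
    "power": 12000.0,
    "cooling": 6000.0,
    "misc": 2000.0,
}
rate = colocation_cost_per_va(monthly_costs, usable_power_va=100000)

three_year = cost_to_run_server(rate, server_power_va=300, life_months=36)
five_year = cost_to_run_server(rate, server_power_va=300, life_months=60)
print(f"rate: ${rate:.2f}/VA/month, 3 years: ${three_year:.0f}, 5 years: ${five_year:.0f}")
```

Comparing the three-year and five-year figures with the purchase price of the server shows how quickly hosting costs catch up with the cost of the hardware.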

So what should you do now? Try the following:

1 – Determine Co-Location Costs:

  • Check your contracts. If you outsource your co-location facilities, you may be able to calculate the $/VA cost from your contracts.
  • Talk to your finance department. If you manage your own co-location facilities, see if you can find out the costs of power, maintenance, security, on-call personnel, etc. Remember, your Co-Location Cost is the total of the costs to run the data center, divided by the usable amount of power.

2 – Determine Power Used by Server:

  • Get yourself a good power meter and find out how much power your servers actually consume when idle and when under load. The numbers provided by the manufacturers tend not to reflect how you use the servers on a daily basis.

3 – Get out the spreadsheet and start crunching the numbers.

Try calculating the ROI on consolidating some of your existing servers onto virtual machines, or replacing some of your older machines with more energy-efficient models. You’ll probably be surprised by the results.
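
As a starting point for that spreadsheet, here is a minimal payback-period sketch in Python. It assumes the only saving from a replacement server is its lower power draw, which understates the benefit of consolidation; the purchase price, power figures, and $/VA rate are hypothetical.

```python
def payback_months(purchase_cost, old_power_va, new_power_va, cost_per_va_month):
    """Months until power savings cover the purchase cost, or None if there are no savings."""
    monthly_savings = (old_power_va - new_power_va) * cost_per_va_month
    if monthly_savings <= 0:
        return None
    return purchase_cost / monthly_savings


# Hypothetical: a $3,000 server drawing 250 VA replaces one drawing 400 VA,
# at a co-location cost of $0.50 per VA per month.
months = payback_months(3000.0, old_power_va=400, new_power_va=250, cost_per_va_month=0.5)
print(f"payback in about {months:.0f} months")
```

If one new machine replaces several old ones, pass the combined power draw of the retired servers as `old_power_va`.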

Hopefully these numbers can provide you with a better understanding of the true cost of your IT infrastructure. Who knows, they may also help you reduce your overall power consumption, justify some new hardware, or even help you justify outsourcing to someone who already has.

May 06, 2008

Approach 1: Mailbox Archiving (Part 2 of 4)

Posted by Rick Dales, VP Product Management

In a previous post, I introduced the idea that there are multiple approaches to archiving.  In this post, I will dive more deeply into one of the two most common approaches, known as mailbox archiving, including how it works and what problems it is best suited to address.

Mailbox archiving is the process of periodically connecting to a user’s mailbox, looking for content that matches some criteria (an archiving policy), and adding it to the archive.  While a mailbox archiving process might run on a nightly basis, the archiving policies are usually set to store only messages that are older than a certain age (typically 30-90 days).
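
A minimal sketch of such an age-based archiving policy, in Python. The message dictionaries and the `select_for_archiving` function are hypothetical; a real product would read the mailbox through the mail server's API rather than from an in-memory list.

```python
from datetime import datetime, timedelta


def select_for_archiving(messages, now, min_age_days=30):
    """Return the messages that are at least min_age_days old."""
    cutoff = now - timedelta(days=min_age_days)
    return [m for m in messages if m["received"] <= cutoff]


now = datetime(2008, 5, 6)
mailbox = [
    {"subject": "Q1 report", "received": datetime(2008, 2, 1)},
    {"subject": "Lunch?", "received": datetime(2008, 5, 5)},
]
to_archive = select_for_archiving(mailbox, now, min_age_days=30)
print([m["subject"] for m in to_archive])
```

Note that the scan can only consider what is in the mailbox at the moment it runs; anything the user has already deleted never reaches the policy check.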

Strengths

  • Visibility to all content and state information in the mailbox
    By connecting directly to the user’s mailbox, the archiving system can see (and choose to capture) any type of content, including calendar events, that wouldn’t be sent to another user.  Similarly, it can capture which folder the user has put the item into.
  • Ability to modify messages in the mailbox
    With direct access to the user’s mailbox, the original message can be modified (flagged), deleted or replaced with a pointer to the copy in the archive.
  • Easy to provide end-user access
    As the archive knows which mailbox it found a message in, it can easily provide the appropriate security controls to provide users with access to the messages in their mailbox without granting access to other messages.

Weaknesses

  • An incomplete set of messages is captured
    Similar to backups, any periodic snapshot activity cannot record things that arrived and were subsequently deleted between capture cycles.  Given that users read and then delete over 50% of messages on the day they receive them, periodic capture will miss the majority of mail – even if the archiving policy is set to capture messages immediately.
  • Incomplete picture of each message’s recipients
    When a user receives a message, they have no visibility into the set of recipients that were BCC’d.  In addition, if the message was sent to a distribution list, the actual set of recipients isn’t stored with the message.  In the period between message receipt and capture, the membership of the distribution list can change materially (or the distribution list can be deleted from the mail system entirely).
  • Duplicate message removal is very difficult
    While digital signatures can be used to find and remove duplication of message bodies and attachments to optimize the storage within the archive, removing duplication of the messages themselves is difficult because the set of recipients may be different and the metadata about when a message was received will vary from mailbox to mailbox.  When performing legal discovery across a set of users, duplicate copies of messages from different users’ mailboxes dramatically increase the costs of reviewing messages to be produced for opposing counsel.
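
To illustrate the last point, here is a minimal Python sketch in which the “digital signature” is taken to be a SHA-256 content hash (my assumption; the post does not say which technique products use). The `SingleInstanceStore` class and the record fields are hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Stores each distinct body or attachment once, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)  # keep only the first copy
        return digest

    def __len__(self):
        return len(self._blobs)


store = SingleInstanceStore()
body = b"Please review the attached contract."

# The same message captured from two mailboxes at slightly different times.
record_a = {"mailbox": "alice", "received": "2008-05-01T09:00", "body": store.put(body)}
record_b = {"mailbox": "bob", "received": "2008-05-01T09:02", "body": store.put(body)}

print(len(store), record_a["body"] == record_b["body"], record_a == record_b)
```

The body is stored once, but the two message records still differ in their per-mailbox metadata, so a reviewer sees two messages unless further de-duplication logic is applied.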

Appropriate Uses of Mailbox Archiving

Best suited for: Mailbox Storage Management
Mailbox archiving is appropriate for active mailbox storage management. A significant advantage is that mailbox archiving systems can “stub” or “shortcut” messages so that users don’t need to change their behavior to access historical mail. It is important to note, however, that without an active process that removes content from users’ mailboxes, an archive only aids in storage management if combined with tight mailbox quotas – requiring users to spend hours each month on manual cleanup tasks.

Not appropriate for: Legal Discovery or Regulatory Compliance
Since mailbox archiving does not ensure the archiving of all messages, nor does it provide a complete view of all message traffic, it is not suitable to address legal discovery or regulatory compliance requirements.

Click here to read Part 1 of Different Approaches to Archiving Email

April 15, 2008

Note to Vendors: Please Help Us Be Green

Posted by Justin Wiebe, Fortiva Operations

I recently returned from a trip to Europe where I visited data centers in several countries. Almost everyone talked about the environment, carbon neutral computing and the possibility of governments starting to tax businesses based on their computing carbon footprint. This was a refreshing contrast to my experiences in North America where there is lots of press about ‘Green Computing’, but not too much action.

Here at Fortiva, we have been putting a lot of emphasis on minimizing our footprint. Not just because it is good for the environment, but also because the economics make sense. Over the past few years, we have managed to reduce our power consumption in a number of different ways:

  • Using AMD processors and lower-power SATA drives has allowed us to reduce our power usage per GB stored from 0.2 W to 0.05 W
  • Virtualization in our Development and QA environments has reduced the number of servers by approximately one-fifth. 
  • Investing in remote management solutions reduces the number of visits we make to our data centers.
  • And, as I opened with, we are looking into our vendors’ ‘greenness’.

Looking at this list of changes, it seems like we have made a lot of progress over the past few years. Unfortunately, it still feels like we have a long way to go. I have come up with a wish list of changes I would like to see our vendors make to help us achieve our goals:

  • Ship less junk with each server:  For every server we receive, we probably throw away a third of the total weight shipped. Packaging, cable management kits, mounting brackets for non-standard racks, and documentation that no one reads goes directly into the dumpster. Some of our more enlightened data centers encourage recycling, but no one seems to take the time to sort the mess. Add the extra fuel used to ship the servers as a result of this excess weight, and eliminating these extras would save us all money.
  • Act like a global company: One of our server suppliers recently announced that they will no longer ship a server purchased in Canada directly to a US address. What this means for us is that when we order a server for one of our US data centers, the manufacturer ships it from the US factory to our Canadian office, where we turn around and ship it back.
  • Offer us older hardware:  Newer isn’t always greener – or even necessary. In many cases, we can use older, lower-power CPUs to power our storage servers. We just can’t get them.

I am sure there are lots of things I have left off this list that may make more sense for your company. Think about them, and the next time your vendor’s rep asks if there is any way they can help you, you’ll have something to share with them.

March 13, 2008

Litigation Hold Loopholes – Preventing End-User Deletion

Posted by Rick Dales, VP Product Management

Last week, an interesting post appeared on StorageSoup, a SearchStorage.com blog that provides commentary on the storage industry. The post, titled FRCP looking like a PITW (Pain in the Wallet), identifies some of the potential loopholes a company can face trying to enforce a litigation hold. It also questions whether technology exists to address these loopholes without forcing an organization to literally keep every email indefinitely.

The quick answer to that question is yes (in fact that’s exactly what Fortiva’s on-demand email archive offers), but I thought it would be worthwhile to address some of the challenges mentioned in the blog entry in a bit more depth. Considering that the post was written by Tory Skyers, a Senior Systems Engineer who has hands-on experience dealing with multiple litigation holds and who regularly writes on storage issues, the confusion around how to best enforce a litigation hold is obviously hitting even the most seasoned IT professionals.

Here’s a quick rundown of Skyers’ main concerns, followed by my thoughts and recommendations:

  1. Some trials last a loooooooong time, and the costs of storing the data requested for litigation hold on WORM are very significant. Despite this, the potential risks and costs of not having the data available can be so high that businesses can’t afford not to store relevant data once a litigation hold comes into effect.

    1. As Skyers mentions, some cases can last five years or more and the cost of storing this data starts adding up quickly. The whole process can also be time-consuming for IT, and there are no guarantees that data won’t be corrupted. So not only is this approach expensive, it’s risky too. Having said that, the risks of not storing the data can be even higher. The key is to find a more cost-effective, reliable way to store the data (i.e., an email archive).
  2. There’s a “Safe Harbor” clause in the FRCP that absolves companies of responsibility if the company has — and strictly follows — a deletion and retention policy. This protects the company from falling afoul of the regulation, but does my act (as an end user) of deleting an email fall under the “Safe Harbor” clause?

    1. The quick answer is no. The “Safe Harbor” clause protects organizations from being penalized for deleting relevant information before a litigation hold comes into effect, assuming the data was deleted according to a stated deletion and retention policy. If an end user is allowed to delete an email (accidentally or intentionally) that is covered by a litigation hold, or that has not yet reached the corporate retention period, it can be considered spoliation of data.

      Spoliation is the withholding, hiding, or destruction of evidence relevant to a legal proceeding and can be a criminal act in the United States. It can result in fines and/or incarceration for the parties who engaged in the spoliation. It can also lead to a negative inference ruling that can ultimately lead to an unfavorable verdict.

      To avoid this, companies should have technology in place to ensure that email data cannot be deleted by an end-user until both of the following criteria are met: a) it has reached its retention period and b) it is not covered by a litigation hold.
  3. I’ve seen some precedent that leads me to believe that simply having and following a policy is not enough… So as it relates to e-discovery, if a company allows [me] to delete my own emails, are [they] implicitly approving of me disobeying retention and deletion policy?

    1. In a way, yes. The key to meeting the FRCP guidelines is having and enforcing a policy. If you believe your end-users can be relied on to accurately enforce your policy (and not make any errors), then it is sufficient to simply have a policy and rely on your employees. Otherwise, you better have some technology in place that enforces your policy (including litigation holds) and prevents human error.

      In fact, a case in point is the recent Intel vs AMD lawsuit. Intel executives were informed of the litigation hold retention requirement, but many of them deleted email anyway. Regardless of whether the email deletion was intentional (or whether it was simply human error), the company was guilty of spoliation.
  4. It seems like I would have to have CDP in place and store every email entering and leaving every mailbox forever to be really covered against every contingency.

    1. Fortunately, it’s not that bad. Once an email reaches the lifecycle outlined in the corporate retention policy, it can (and should) be deleted (assuming it’s not covered by a litigation hold). There is absolutely no need to keep everything forever (in fact that would raise a company’s risk profile significantly).

      The question is, how should you store your email? Skyers accurately points out that relying on a backup process may be insufficient, since any data that is sent or received and then deleted in between backup periods may not be retained. Beyond that, it is virtually impossible to apply a consistent retention policy against data on backups, since a single tape necessarily contains emails crossing a wide span of time. Backup tapes also have a high rate of corruption/failure, making them unreliable.

      To keep all the data that enters your corporate email system for as long as necessary (and no longer), you really need an email archive like Fortiva, which captures every email that is sent or received, and keeps multiple copies in unalterable format on spinning disk until they meet the retention policy.
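
The two-part deletion rule from point 2 above (the retention period has been reached and no litigation hold applies) can be sketched in Python as follows. The function and parameter names are hypothetical and do not describe how any particular archive implements its policy engine.

```python
from datetime import date, timedelta


def may_delete(received, retention_days, on_litigation_hold, today):
    """An email may be deleted only if its retention period has elapsed AND it is not on hold."""
    retention_reached = today >= received + timedelta(days=retention_days)
    return retention_reached and not on_litigation_hold


today = date(2008, 3, 13)
old_email = date(2005, 1, 10)    # older than a hypothetical 2-year retention period
new_email = date(2008, 1, 10)    # still within the retention period

print(may_delete(old_email, 730, on_litigation_hold=False, today=today))
print(may_delete(old_email, 730, on_litigation_hold=True, today=today))
print(may_delete(new_email, 730, on_litigation_hold=False, today=today))
```

Only the first case is eligible for deletion: the second is blocked by the hold and the third by the retention period.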

So all this leads to one conclusion – an email archive is really the most foolproof way to avoid the many possible loopholes when dealing with the FRCP requirements for email retention, litigation holds and e-discovery. At the risk of being self-promotional, here’s a run-down of how Fortiva meets all the requirements and addresses the concerns raised by Skyers:

  • Cost-effective storage: Fortiva’s SmartStore archive stores a redundant copy of every email sent and received according to the customer’s retention policy in a centralized location. It requires virtually no effort on the part of IT, and it starts at just $1.10 per user, per month for a 1,000-user company. It also offers storage management features that allow a company to significantly reduce the burden on the Exchange email server.
  • Litigation hold: Fortiva allows legal or IT to enforce a litigation hold against relevant email indefinitely with a click of a button in a web-browser interface.
  • Policy enforcement: Fortiva allows you to develop granular policies (including different retention policies for different departments, individuals, and types of data), and automatically enforces those policies.
  • Redundant storage: Fortiva stores multiple copies of every email in unalterable format on spinning disk, and keeps an additional copy in a secondary location. The system also provides continuous data validation across all archived data.

It’s important to note that not all email archives offer the same functionality. There is a whole class of email archives that were designed primarily to address email storage management issues, and those typically allow end-user deletion/deletion outside the retention policy (introducing many of the problems highlighted above). But that is a topic in itself. In my next post, I’ll explain the different types of email archives, and the situations that each type is best suited for.

March 06, 2008

Is Tape Going the Way of the Dodo?

Posted by Jeremy Hope, VP Operations

I recently got an email from a vendor that I felt I had to comment on, and since it refers to something I have recently been blogging on – storage and backups – I thought I’d dump my thoughts into the blog.  The email I’m referring to was from a vendor inviting me to read a White Paper titled The Risk of a Disk-Only Backup Strategy: the Case for Disk and Tape, extolling the benefits of Tape technology for backups rather than relying only on disk to disk backup solutions.

The synopsis of the report is that disk drives have a high failure rate (that is, a low MTBF, or Mean Time Between Failures) in their later years (jeez, technology gets less healthy as it gets older – go figure) and that if you drop disk drives they may break (huh? – what a breakthrough!).  This is their total justification for why you need tape in your environment rather than relying on disk to disk backup alone.

Ok, so I concede these two points might be true (I’m not going to try drop-kicking any of my disk drives to prove them wrong) but let’s look at the big picture here. The White Paper fails to mention the MTBF of tape drives and physical tapes themselves (how many times have you tried to retrieve data from a tape only to find out it is corrupt?), or the fact that if you drop them they break too (both tape drive and tape).  Never mind the headaches you have to go through when you try to restore that 4-year-old tape that was created with a drive you no longer have (it was dropped a while back) and the latest technology drive won’t read it.

The White Paper also fails to mention readily available technologies and solutions (RAID 6, distributed/cluster file systems, grid computing, multiple redundant copies, etc.) that can be used to improve disk to disk backups.  When these technologies are utilized (if you don’t plan to keep up with technology – get out of the IT business) these simple issues can easily be overcome. In fact there are numerous ways that a disk to disk backup solution can be advantageous and even better than tape for data intensive uses such as archives.

At Fortiva we use current RAID technology, together with a grid computing storage infrastructure, to provide multiple redundant copies of data across both Primary and Secondary data centers.  At least 3 copies of data exist at any one time and replication is used to keep the copies current.  This includes copies of disk to disk backups for the various systems.

If data needs to be moved, this is done via gigabit network or portable disk drives (that now provide over a Terabyte in capacity) and the new instances of data (and its redundant copies) are verified before the original is deleted.  This accounts for any possible service or data outages within a primary data center caused by any one set of data as well as providing for Disaster Recovery.  Having the backups running on spinning disk also allows for online verification of the data (when is the last time you loaded all of your tapes from tape storage to verify their integrity?).  This is done at a very affordable price without spending a cent on tape infrastructure and all of its complexities.
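
The “verify before delete” step described above can be illustrated with a short Python sketch. It assumes verification means comparing SHA-256 checksums and that three verified copies are required; both are my assumptions for illustration, not a description of Fortiva's actual process.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest used to compare a copy against the original."""
    return hashlib.sha256(data).hexdigest()


def safe_to_delete_original(original: bytes, copies, min_copies=3) -> bool:
    """True only if at least min_copies of the new copies match the original exactly."""
    expected = checksum(original)
    verified = sum(1 for copy in copies if checksum(copy) == expected)
    return verified >= min_copies


original = b"archived message block"
good_copies = [original, bytes(original), original[:]]
one_corrupt = [original, original, b"archived message blocc"]

print(safe_to_delete_original(original, good_copies))
print(safe_to_delete_original(original, one_corrupt))
```

Because the copies live on spinning disk, a check like this can run continuously rather than only at restore time.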

In our environment tape isn’t just becoming extinct like the Dodo, it’s already gone and buried.

February 26, 2008

How We Keep Email Archiving Costs Low

Posted by Jeremy Hope, VP Operations

As Rick's blog entry from January 28 noted, Fortiva recently introduced an entry-level archiving solution (SmartStore) that is extremely price-competitive. To help people better understand how this is possible, I thought I’d explain the unique storage challenges that email archiving presents, and how we at Fortiva deal with those challenges in a way that allows us to keep costs low.

The majority of companies implement high performance, highly redundant, high priced storage for their transaction-based applications and slower performing, less redundant, lower cost storage for larger amounts of data within file based applications.  The challenge with archived data is that it requires storage with both characteristics, crossing the boundaries between the storage tiers implemented in most IT environments.

Archive data necessitates storage with high throughput, not only to be able to write the large amounts of data within a reasonable time, but also to allow for the searching of the data.  High redundancy within the archive data storage environment is expected since in most cases only one copy of the data will exist (making tape copies of hundreds of TBs of data is impractical).   Meanwhile the same characteristic, the sheer quantity of data, begs for less expensive storage to stay economical.

This leaves many IT Managers puzzled about how to provide an archive solution at a practical price with reasonable performance.  One solution is the use of a Software-as-a-Service (SaaS) solution like Fortiva, where you let the provider worry about the storage environment.  Still, many may wonder how providers such as Fortiva can provide lower cost per TB solutions (such as our recently announced SmartStore solution) without losing money due to the storage costs alone.

For Fortiva, the solution lies in a grid computing infrastructure that utilizes a large number of 1U or 2U servers with locally attached RAID disk arrays.  This hardware provides for a fast, highly redundant and scalable storage infrastructure.  This storage environment mixed with the Fortiva “secret sauce” – a proprietary Distributed File System at the application layer that tracks where data is within the grid of distributed servers – allows Fortiva to provide multiple redundant copies of data at an extremely low cost.  Another advantage of the solution is the combined computing power of the grid: every CPU in it can be used to provide search and other application functionality.

The fact that Fortiva uses a grid environment for all clients distributed throughout a data center provides the economies of scale that no large enterprises can afford to implement themselves – a fact that is reflected in the low pricing Fortiva offers.

December 10, 2007

The Truly Centralized Email Repository: Is it even possible?

Posted by Stephen Prokai, Fortiva Professional Services

Regardless of what the most pressing drivers are behind an email archive project, having a centralized repository for all corporate email is perhaps the most appealing end goal for almost everyone we speak with.  But is it even possible to corral all of that data?

Many archiving solutions start with a “from today” implementation which means you have to take additional steps to get yesterday’s data into the repository.  To make this task easier there is usually some way to import that data, but old email can have lots of homes.  The obvious ones are Exchange mailboxes, back-up tapes, shared network drives, laptops, desktops and even a legacy email archive. 

You need to make sure that you pay attention and check all of the possible places that email can hide on your network.  Buried PSTs are the trickiest to locate, especially on laptops that are not always connected to your network.  Even the best PST finder utility can’t locate something that it can’t see.   Existing PSTs aren’t the only trouble, though.
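
As a rough illustration of what a PST finder does, here is a minimal Python sketch that walks a directory tree and reports every `.pst` file with its size. The function name `find_pst_files` and the sample file names are made up for this example, and a real utility would do far more (scanning many machines, handling locked files, and so on). Like any such tool, it only finds files on storage it can actually reach.

```python
import tempfile
from pathlib import Path


def find_pst_files(root):
    """Yield (path, size_in_bytes) for every .pst file under root."""
    for path in Path(root).rglob("*"):
        # Compare the suffix case-insensitively so .PST is also found.
        if path.is_file() and path.suffix.lower() == ".pst":
            yield path, path.stat().st_size


# Demonstration against a throwaway directory standing in for a network share.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "laptop").mkdir()
    (Path(root) / "laptop" / "Archive2006.PST").write_bytes(b"x" * 10)
    (Path(root) / "notes.txt").write_text("not mail")
    found = [(p.name, size) for p, size in find_pst_files(root)]

print(found)  # prints: [('Archive2006.PST', 10)]
```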

Unless you deactivate the users’ ability to create PSTs, you can bet that there will be more within minutes of the initial deletion.  This quickly becomes a user training/re-education discussion though.  If you are going to remove the ability for users to create PSTs you had better have an archive that can be deployed quickly and more importantly one that is fast and easy to use.

Dealing with legacy email from an archiving system that’s on its way out can also be a major part of the project.  You’ll need a way to first get the old data out, convert it into a format that your new archive can handle, and then import it.  One key thing to note here is that you may not need to actually include all of that old data in the archive.

Corporate email retention policies usually define at what point an email can be disposed of.  It is likely that an effort to gather all disparately stored email messages will end up with a load of data that has already satisfied the defined retention policy.  Steps should be taken to make sure you do not import anything you don’t need.  Not doing so can lead to higher risk exposure as well as higher costs, both for the initial import project and on an ongoing basis.
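
The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message records, the `sent_on` field, and the seven-year retention period are invented for the example, and a real policy would likely vary by message type and would need to respect any legal holds.

```python
from datetime import date, timedelta


def within_retention(sent_on, retention_days, today):
    """True if a message sent on `sent_on` must still be kept on `today`."""
    return today - sent_on <= timedelta(days=retention_days)


def select_for_import(messages, retention_days, today):
    """Keep only messages the retention policy still requires."""
    return [
        m for m in messages
        if within_retention(m["sent_on"], retention_days, today)
    ]


legacy = [
    {"id": "a", "sent_on": date(2000, 3, 1)},    # already past retention
    {"id": "b", "sent_on": date(2006, 11, 15)},  # still within retention
]
# Assumed policy: keep mail for seven years (approximated as 7 * 365 days).
to_import = select_for_import(legacy, retention_days=7 * 365, today=date(2007, 12, 10))
print([m["id"] for m in to_import])  # prints: ['b']
```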

Achieving a truly centralized repository for your email is possible, but it really requires a concerted effort and some serious planning.  Making sure you gather everything you know is on your network, and even the email you don’t know is out there, is critical.  The company retention policy needs to be clearly defined and enforced, and the users need to know what it is.  The risk factors and storage cost considerations cannot be ignored.

November 29, 2007

Green Computing and Virtualization – Basics First

Posted by Jeremy Hope, VP Operations

Sitting at the Gartner Data Center conference, I hear discussion of green computing and better utilization of servers to reduce power and cooling requirements within Data Centers everywhere.  In all of these sessions, about two slides into the presentation the discussion turns to and focuses on Virtualization Technology to achieve this goal.

What strikes me as strange is that not one of the presentations I have attended so far has discussed the basic step needed to start reducing power consumption, reducing cooling needs, and implementing Virtualization Technology: upgrading the old server infrastructure.  Sure, everyone is trying to sell their biggest, baddest 32- or 64-way CPU box that can be sliced up into a myriad of VMs, but what about the basics: taking the old power-hungry, heat-spewing Pentium III/Xeon boxes that have been purchased over the past years and replacing them with current (x-core or equivalent) technology?

At Fortiva, we changed our standard Linux server configuration from a Pentium III based system to a Dual Core CPU based solution, at the same time upgrading to a more current motherboard, power supplies and interfaces.  Doing so has not only reduced the cost of each server by approximately $400 (while providing a 150% lift in processing power – perfect for VM implementation)  but these new machines also use approximately 60% less power.  That’s a huge savings and increase in processing power, without implementing a bit of VM code.

Maybe it’s time for the hardware vendors to provide buyback incentives for replacing older technology that is power hungry with newer equipment that manages power better and would provide a much better basis for most Virtualization technology.

The government in Ontario (where I live) just recently started offering tax discounts for anyone replacing an energy pig appliance with one that is Energy Star compliant, even offering to come pick up that old beer fridge out of your basement to help us all save energy.  Maybe the hardware vendors and Data Center hosting providers should get together and offer a similar plan to remove those old Pentium III machines from the Data Center.   Then maybe they can provide adequate power and cooling for all those new Virtual Machines without a major retrofit.

November 27, 2007

The Incredible Cost of PST Files

Posted by Alan Armstrong, VP Business Development

Here’s the scenario: You send a message with a 1MB attachment to ten colleagues in your company. When you do that, your email server keeps just one copy, so your message only takes up 1MB of storage. Pretty efficient, right? This idea is called Single Instance Storage (SIS), and it allows us all to share such large files without driving up storage demands more quickly than necessary.
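
For the curious, the idea behind single instance storage can be sketched in Python as follows. This is a generic illustration, not a description of how Exchange or any particular mail server implements it; the class name `SingleInstanceStore` and the use of a SHA-256 content hash as the lookup key are assumptions for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy mail store that keeps one copy of each distinct attachment."""

    def __init__(self):
        self.blobs = {}       # content hash -> attachment bytes, stored once
        self.mailboxes = {}   # recipient -> list of content hashes

    def deliver(self, recipients, attachment):
        digest = hashlib.sha256(attachment).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, attachment)
        for recipient in recipients:
            # Each mailbox holds a small reference, not another copy.
            self.mailboxes.setdefault(recipient, []).append(digest)

    def bytes_stored(self):
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
attachment = b"\0" * 1_000_000  # stand-in for the 1MB attachment
store.deliver([f"colleague{i}" for i in range(10)], attachment)
print(store.bytes_stored(), len(store.mailboxes))  # prints: 1000000 10
```

Ten mailboxes reference the attachment, but only one copy of its bytes is kept.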

Now, if your colleagues are anything like mine, 9 out of 10 of them will never delete the message or the 1MB attachment that you send. (And if they are anything like my colleagues, only 4 out of 5 of them will even open it, so thank goodness that they aren’t eating up all those bits for an individual copy. :))

As time passes, however, you and all of your colleagues will likely hit a storage limit and need to delete things from your email stores. Many people do this according to date or size, so when they are notified about an impending “Quota overrun”, many of them will create local copies (PSTs) of their large and old emails so that they can continue to reference the old email when needed without it taking up space on the email server.

At this point, things get really bad. Remember that when you sent this email, and until this point, your 1MB attachment was only consuming 1MB of storage. When 9 out of the 10 recipients, and you as the sender, each move the message to a local PST file, the file is copied to each person’s local or network file storage. This is not a big deal for local storage, because local disk is plentiful and cheap. The problem, however, arises during company-wide backups.

A single backup of a 1MB file is unlikely to ever cause a problem; however, in the scenario above, the IT department is now going to consume 10MB of bandwidth and backup storage, as well as a few extra seconds of backup time, for a single file of 1MB. This happens because the backup process is not set up to recognize multiple copies of the same file.

If that were not bad enough, each time you modify, or even open, the PST file, it needs to be backed up again. Thus that 10MB file might be backed up daily for 2 weeks, or however long the IT team retains backup tapes. At that point, your 1MB file could potentially be using as much as 100MB of storage space (see Figure 1).
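
The arithmetic behind that figure can be spelled out in a short Python sketch. The function name is made up, and the count of 10 retained backups is an assumption (one per working day over two weeks) chosen because it reproduces the 100MB in the text; 14 calendar-day backups would give 140MB instead.

```python
def backup_footprint_mb(attachment_mb, pst_copies, retained_backups):
    """Backup storage consumed by one attachment copied into several PSTs."""
    return attachment_mb * pst_copies * retained_backups


# 9 recipients plus the sender each hold a PST containing the 1MB attachment.
per_backup_run = backup_footprint_mb(1, pst_copies=10, retained_backups=1)
# Assumed: 10 daily backups are retained (working days over two weeks).
two_weeks = backup_footprint_mb(1, pst_copies=10, retained_backups=10)
print(per_backup_run, two_weeks)  # prints: 10 100
```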

Some teams and some products can get around this problem of whole-cloth PST backup, but nonetheless you get the idea.

Or did you? Here it is, put in another way:

  • Local copies of corporate data cause problems. They create duplication of data, consume many times more storage, consume backup space, bandwidth, and precious backup window times.
  • Perhaps worse than any of those things, they take the data out of a centralized place where it can be managed as a corporate asset. Locally stored data cannot easily be disposed of as it should, and could put the company in a difficult situation in the case of a lawsuit.
  • When IT imposes email quotas, the problem moves and multiplies.

What’s the solution to these problems? Well, very briefly, you should consider moving the Exchange / email data to a secondary tier of storage: an archive. With an archive in place, IT can restrict or eliminate the use of PST files, which can have a huge impact on corporate storage resources. In fact, in our example above, it would take our 100MB of storage space down to just 2MB (see Figure 2).

We’ve had customers tell us that as much as 80 percent of their network storage was being taken up by PST files - getting rid of this storage hog, especially by moving to an on-demand email archive, can have a huge impact on IT - both in terms of infrastructure and the time required to manage and back up that amount of data. Getting rid of PSTs can also significantly reduce a business’s potential exposure to legal risks by eliminating “rogue” PSTs that don't get deleted according to any policy, and that can easily end up in the wrong place (like a disgruntled former employee's home desktop).

It’s like a game of whack-a-mole; you hit one problem but it just comes up somewhere else. But in this case, the problem multiplies a hundredfold. It seems like a paradox, but this is a case where you lose money by trying to save money: IT imposes quotas on end-user mailboxes in order to reduce the cost and burden of storing email on Exchange. But when IT whacks that mole, 10 other moles pop up in PST files across the company, and then multiply again in backup routines.

Whack that mole for good. Kill the little rodent. He’s been chewing up your storage for too long. The only real way to do it is to retain a single, central copy of the file. Might I recommend an archive?

November 06, 2007

Let SaaS Providers Worry about Dispersed Data Center Costs

Posted by Jeremy Hope, VP Operations

A Gartner research note on US Colocation Costs (“Colocation Prices in U.S. Internet Data Centers Continue to Increase Rapidly” ID Number: G00151222) concludes that “colocation contracts should expect to see at least a 20% price increase, and pricing may be as much as triple that of three years ago”. The article goes on to discuss many of the other issues that we currently see in the colocation industry: lack of available space, inability of existing sites to support power needs, increase in power cost, lack of customer service, etc.

We see this more and more even outside the US, with space being at a premium in London, UK and other European centers, and even in Canada where the advantage of a weak Canadian dollar has disappeared overnight (at this moment a Canadian dollar is worth $1.04 US). We are also starting to see increases of 25% or more for power within the same colocation facilities.

How are individual companies supposed to safely and successfully make their way through this disastrous world while still affording to pay these increasingly high prices? Through my 20+ years in Operations I have experienced firsthand what can happen to Data Centers: I’ve lived through floods, ice storms, fires, Interruptible Power Supplies (I do not believe in Un-interruptible Power Supplies; that’s another blog), union strikes and chemical leaks, to name a few. Every time, I have wished that I had more Data Centers to spread my mission-critical applications across in order to reduce the effect of a specific disaster. With costs on the rise, it is hard enough for the typical company to afford one data center, let alone multiple Service Continuity sites.

This is where SaaS has a major operational advantage. If you can sign up 5 different SaaS vendors for your 5 mission-critical applications, you have sharply reduced the chance of a single outage affecting more than one critical system, and done so in an extremely cost-effective manner. Even more so if you are dealing with secondary applications such as Archiving. Why waste your valuable Data Center space and money on a system or data that may only be accessed once a month?
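
A back-of-the-envelope Python sketch shows why spreading applications across vendors limits how much can fail at once. The 1% outage probability and the assumption that vendor sites fail independently of one another are illustrative guesses, not measured figures, and the function name is made up.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n independent sites are down at once."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.01  # assumed chance that any one site is down at a given moment

# One data center: an outage takes all five applications down together.
single_site = p
# Five vendors on separate sites: two or more applications down together.
spread_out = prob_at_least(2, 5, p)

print(spread_out < single_site)  # prints: True
```

Under these assumptions, the chance of losing two or more applications together falls from 1 in 100 to roughly 1 in 1,000. The trade-off is that the chance of at least one application being down at any given moment goes up, so the gain is in limiting how much fails at the same time.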

Let the SaaS vendors, who can take advantage of economies of scale to negotiate lower costs, worry about the costs associated with the physical data centers, power, and storage. Spread your spending over multiple vendors and therefore sites, and get your eggs out of the one basket.

June 20, 2007

A new model for SaaS behind the firewall?

Posted by Chris Tebo, CTO

In his posting of June 15, titled, "Can the appliance put SaaS on-premise?", Phil Wainewright makes the point that,

Delivering software as an appliance brings many of the same benefits as delivering software as a service. In fact, in recent weeks and months I’ve spoken to some people who’ve talked as if the two models were interchangeable. I wouldn’t go that far, but I would say that they’re different facets of the same trend towards making software easier to install and use, and I would also add, perhaps controversially, that if you believe in using the Web to deliver software functionality, then like it or not you’re probably going to end up delivering software appliances within your range of offerings.

Wainewright comes to the conclusion that both software as an appliance and software as a service have their place, and they shouldn’t be seen as competing with one another. While I agree with Wainewright on the points he makes in both that and a follow-up posting, I think it’s important to consider the possibilities offered by a third option, one that combines software as an appliance with software as a service. This is the model we use at Fortiva, and it’s one that I believe will continue to gain traction with vendors that want to provide the convenience of SaaS with a level of integration and data security that can only be achieved with an on-premise component.

In his posting, Wainewright makes the point that,

“The appliance model provides many of the benefits of SaaS without forcing customers to store and access their data outside of the firewall.”

This is true – and very useful for applications that involve small amounts of data. However, many SaaS solutions tackle challenges that involve large amounts of data by offering a large, centralized infrastructure. Since IT departments can face considerable challenges managing and maintaining a large data set, these customers get significant benefits from SaaS solutions that address the management of both the software and the data. In fact, a key value proposition for SaaS often involves not having to worry about procuring and managing large amounts of storage, which in turn allows the customer to avoid having to address the full suite of data management tools. So in these cases, the appliance model alone is not an option.

While SaaS allows you to benefit from “worry-free,” fully scalable storage on demand, it also has its issues.  The SaaS model can lead to isolated solutions that suffer from administrative challenges and a logical disconnect from the way other corporate information is managed and used. It also presents obvious security challenges. Overcoming these limitations requires an integration point within the corporation.  To do this without losing the ease of setup and maintenance that a SaaS solution offers, some vendors (Fortiva included) have started to introduce in-house appliances (software as an appliance) that act as a gateway to their centralized services (software as a service).

In Fortiva’s case (a SaaS email archiving solution), we ship a “plug-and-play” style appliance that integrates directly with the customer’s Microsoft Exchange and Active Directory. The appliance also encrypts all the data before sending it over a secure transmission to Fortiva’s data centers. I’ve explained in my last two posts how this works, and how the combined SaaS/appliance approach allows us to provide rich functionality (including advanced search) to data that remains encrypted at all times outside the firewall.

So maybe it’s not SaaS OR software as an appliance (SaaA?) that companies should be considering…maybe it’s the two together.

April 20, 2007

Amazon S3 and EC2 - Game changers for SaaS development?

Posted by Chris Tebo, CTO

With announcements from Microsoft, EMC, Symantec, and a number of other vendors in the last week alone, Software as a Service (SaaS) is clearly hitting its stride. But one of the more interesting announcements I heard this week came from Amazon.com’s founder and CEO.

Speaking at the O'Reilly Web 2.0 Summit in San Francisco Monday, Jeff Bezos told the audience that demand for Amazon.com Inc.'s SaaS storage solution has grown so much that the company is being forced to scale back the beta program for its Elastic Compute Cloud (EC2) program, and that it is having to add data centers and disk space to deal with the demand for storage that they're seeing against their Simple Storage Service (S3). I’ve been watching Amazon.com's S3 and EC2 services since they launched last year, and this announcement confirms what I suspected.

These two services from Amazon are game changers for SaaS providers.  They provide storage and compute capacity on demand.  Need 5 servers for the next day to process some data?  EC2 can do that for about $0.10/CPU-hour.  Need to store 100GB of data for the next 3 months?  S3 can do that for about $0.15/GB-month.

These services open the door to getting SaaS solutions up and running with no up-front costs for provisioning servers.  Game changer #1!  While there have been some fairly high profile solutions that are using Amazon's services as key elements of their infrastructure, I do believe it is going to take some time for Amazon to commit to service levels that are required to build a business.

But there's still another game changer for SaaS solutions in all of this.  One of the real challenges for any SaaS provider is to test their solution at scale, i.e., in environments that mirror their production configuration and production load.   The challenge here is that production environments for SaaS providers often include hundreds or even thousands of servers.  Purchasing and then managing a test environment at that scale is not something that most development organizations are equipped to do.  While some might be able to swallow the capital costs of the environment,  the management costs will push the organization to look for other approaches to testing.

With Amazon's EC2 and S3 services,  the SaaS solution provider can bring up test environments as they need them at a small fraction of what it would cost to do so internally.  Need 50 servers for the next 12 hours to do some load and scale testing?  That'll cost about $60 with EC2.  If you want to execute those tests every week, you're now spending $240 per month to do so.  You couldn't manage 50 servers internally for anywhere near that price, so why would you even try to?
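
For anyone who wants to check the numbers, here is a small Python sketch of the pricing arithmetic using the rates quoted in this post. The function names are made up, and the sketch assumes one CPU per server and four weekly test runs per month; actual billing rules may differ.

```python
from decimal import Decimal

EC2_PER_CPU_HOUR = Decimal("0.10")   # rate quoted in this post
S3_PER_GB_MONTH = Decimal("0.15")    # rate quoted in this post


def ec2_cost(servers, hours):
    """Compute cost in dollars, assuming one CPU per server."""
    return EC2_PER_CPU_HOUR * servers * hours


def s3_cost(gigabytes, months):
    """Storage cost in dollars for holding data a number of months."""
    return S3_PER_GB_MONTH * gigabytes * months


one_test_run = ec2_cost(servers=50, hours=12)
monthly_tests = one_test_run * 4            # assumes four weekly runs per month
five_servers_one_day = ec2_cost(servers=5, hours=24)
storage_three_months = s3_cost(gigabytes=100, months=3)

print(one_test_run, monthly_tests, five_servers_one_day, storage_three_months)
# prints: 60.00 240.00 12.00 45.00
```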

Amazon’s services – and others like them that will inevitably pop up – have the potential to significantly impact the growth of Software as a Service. By providing the computing and storage infrastructure, Amazon’s services can enable businesses to build and offer SaaS solutions without having to deal with the challenges of managing bandwidth or buying servers. With a plug-and-play back-end, the time involved in getting a SaaS solution up and running can be cut back by a massive amount: 70 percent, according to Jeff Bezos, Amazon’s founder and CEO.

I can't wait to see how all of this plays out.

And by the way... I'm not just blogging about these solutions.  I'm looking at how we at Fortiva can employ Amazon's web-services to address real challenges we face in providing an email archiving service for our customers.  Our first step in that direction will be focused on product testing.   I'll let you all know how it goes!


