Are your pictures safe?

December 28, 2012

Is It Safe?

Today's F-Stop Cafe post is all about the safekeeping of your memories. In this age of digital everything, your memories are stored as bits and bytes on various kinds of devices. The rise of digital media, and the ease with which we can produce and store massive amounts of content, present a unique challenge to the long-term storage and maintenance of our memories.

If you are already familiar with how your images are stored on your camera and computer and want to jump straight to how I keep my data safe, skip down to the How I Do It section below. Otherwise, strap in; it's about to get a little bumpy.

What is it?

Digital data is stored on your media using a binary code in which 0s and 1s represent the smallest unit of your data, a bit. Depending on the format of the image, those bits are grouped in various ways to describe your image to the computer. For example, a typical image uses anywhere from 16 to 24 bits or more to describe every dot, or "pixel," in the image. Today cameras are rated in megapixels, which, as you might guess, means the approximate number of pixels in an image, where "mega" means million. If we keep the numbers small and talk about a 5 megapixel image, it has approximately 5 million pixels. So if we assume the low end of 16 bits per pixel, we get the following:

5,000,000 X 16 = 80,000,000 or 80 million bits! 
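The same arithmetic as a small Python function. It assumes, as the text does, that the pixel data is uncompressed and every pixel uses the same number of bits; real JPEG files are compressed and will take up much less space on disk. The function name is just for illustration.

```python
def raw_image_bits(megapixels: float, bits_per_pixel: int) -> int:
    """Bits needed for the uncompressed pixel data alone (no headers or metadata)."""
    pixels = int(megapixels * 1_000_000)  # "mega" = million
    return pixels * bits_per_pixel


if __name__ == "__main__":
    bits = raw_image_bits(5, 16)
    print(f"{bits:,} bits")        # 80,000,000 bits
    print(f"{bits // 8:,} bytes")  # 10,000,000 bytes (8 bits per byte)
```

Plugging in a 12 megapixel sensor at 24 bits per pixel gives 288 million bits, which shows how quickly the numbers grow.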

Of course, your picture stores more than just the image data: there is also information that tells a computer what kind of image it is, along with a bunch of other details. However, the lion's share of the data is your picture. Also, if you're storing raw files, depending on the raw format they may store as much as 14 bits per channel or more! That's 14 x 3 = 42 bits per pixel, plus various other tidbits of information about each pixel. For example, one of my raw files runs around 30 megabytes. Crank out a reasonably complex PSD and suddenly I am looking at individual files of several hundred megabytes. Yikes! What to do, what to do?

How's it stored?

Let's talk for a bit longer about the geeky stuff: how the data is actually stored. Hang with me, as there is a reason for this. Most consumer cameras today use those little SD cards. They are basically memory chips that don't lose their data when the power is removed. This is your digital "film." Every storage device uses a file system to organize its data, and cards use common, standardized file systems so you can interchange cards and cameras without having to ask "will they work?"

Before we go any further, I do want to stress one thing. The media you put in your camera is the first potential point of failure. DO NOT SKIMP ON YOUR MEMORY CARDS! Buy only the best from reputable manufacturers like SanDisk, Lexar, Panasonic, etc., and make sure they are not fakes. If you use inferior products, it is only a matter of time before you get burned, as I have in the past.

The actual data is stored in blocks of bits, which are in turn grouped into clusters of blocks, and herein lies the devil in the details. A cluster is the smallest unit the file system uses to "reference" your data. If you think of your home, the post office knows about your mailbox (your cluster), into which they can stuff your mail (your blocks of data) up to a certain point. If you get more mail than your mailbox will hold, you have to buy another mailbox. It works the same way on the file system. Larger capacity cards use larger cluster sizes for the data (bigger mailboxes). In many cases the big cards, 16 GB and larger, use 32k clusters. That 32k is actually 32 x 1,024 = 32,768 bytes, which we will round to 32,000 bytes to keep the math simple; one byte contains 8 bits. Are you confused yet? It's OK, you don't need to remember all this, because your card will tell your camera and your computer all they need to know to access your stuff. However, understanding this bucket of bits and bytes will illustrate my main point in a few more sentences.
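A minimal sketch of the mailbox idea: a file always occupies a whole number of clusters, so the count is rounded up, and the unused tail of the last cluster is simply wasted ("slack"). It assumes a 32k cluster is 32 x 1,024 = 32,768 bytes, which is how file systems count cluster sizes; the function names and the 100,000-byte example file are hypothetical.

```python
CLUSTER_BYTES = 32 * 1024  # a "32k" cluster: 32,768 bytes (assumed cluster size)


def clusters_needed(file_bytes: int, cluster_bytes: int = CLUSTER_BYTES) -> int:
    """Number of clusters (mailboxes) a file of the given size occupies."""
    # Integer ceiling division: a partly filled cluster still counts as a whole one.
    return (file_bytes + cluster_bytes - 1) // cluster_bytes


def slack_bytes(file_bytes: int, cluster_bytes: int = CLUSTER_BYTES) -> int:
    """Unused space left over in the file's last cluster."""
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes


if __name__ == "__main__":
    size = 100_000  # a hypothetical 100,000-byte file
    print(clusters_needed(size), slack_bytes(size))  # 4 31072
```

Three full clusters hold only 98,304 bytes, so the file spills into a fourth "mailbox" that is mostly empty.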

If we put it all together now, this is what the math looks like. Let's assume a 32k cluster size and the 5 megapixel, 16 bit JPEG image from our previous example.

Each cluster can hold roughly 32,000 bytes of data, or 256,000 bits.

From our previous example, a 5 megapixel, 16 bit image needs 80 MILLION bits just to hold the image data.

So... 80,000,000 / 256,000 = 312.5, which rounds up to 313 clusters per image (a file can't use half a cluster), plus a few more once the file's other information is included.

And all of that is just for a cell phone image! Modern cameras are 12 to 36 megapixels, so by the same math the count jumps from about 313 clusters to somewhere between 750 and 2,250.
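The cluster calculation can be repeated for several sensor sizes. This sketch keeps the text's simplifying assumptions (16 bits per pixel, uncompressed pixel data, nothing else in the file) and, for comparison, runs both the rounded 32,000-byte cluster and the exact 32,768-byte one. The function name is hypothetical.

```python
def clusters_for_image(megapixels: float, bits_per_pixel: int = 16,
                       cluster_bytes: int = 32_000) -> int:
    """Clusters needed for uncompressed pixel data, rounded up to whole clusters."""
    image_bits = int(megapixels * 1_000_000) * bits_per_pixel
    cluster_bits = cluster_bytes * 8  # one byte = 8 bits
    return -(-image_bits // cluster_bits)  # ceiling division on integers


if __name__ == "__main__":
    for mp in (5, 12, 36):
        rounded = clusters_for_image(mp)                      # 32,000-byte clusters
        exact = clusters_for_image(mp, cluster_bytes=32_768)  # true 32k clusters
        print(f"{mp:>2} MP: {rounded:,} (32,000 B) vs {exact:,} (32,768 B)")
```

With the exact cluster size the counts come out slightly lower (306, 733 and 2,198 clusters), but the conclusion is the same: every photo is spread across hundreds to thousands of separate pieces.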

Bits, bytes, buckets of bits and bytes. What's the point?

All the techno gibberish above is there to stress one very important fact: your data is fragile. We all know about failed hard drives, system crashes and the like; those are the big hairy monsters in the room you can't ignore. Most of us know someone who has suffered through one, and the shock that sets in when they realize their data is gone. And of course we have all accidentally deleted a file and not realized it until after emptying the recycle bin. However, the more insidious, less obvious danger is file corruption. A corrupted file can go unnoticed for a long time, usually revealing itself only when you need it and find that it's dead or can't render correctly. What makes this even more dangerous is that corruption can propagate: the file copies just fine, but what gets copied is bad data. This means that unless your backup method keeps some level of versioning, you can (and I have) copy bad data over good until no good version exists anymore.
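To make the versioning point concrete, here is a minimal sketch of a backup step that never overwrites: each run stores a new timestamped copy next to the earlier ones, so a corrupted file copied today cannot replace yesterday's good copy. The folder layout and the `backup_with_versions` and `list_versions` helpers are hypothetical illustrations, not how any particular backup product works.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def backup_with_versions(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir/<file name>/` under a new timestamped name.

    Older versions are never overwritten or deleted, so a bad copy made today
    still leaves every earlier (good) version in place.
    """
    version_dir = backup_dir / source.name
    version_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    counter = 0
    # The zero-padded counter keeps names unique and sorted oldest to newest,
    # even when two backups happen within the same second.
    target = version_dir / f"{stamp}-{counter:03d}{source.suffix}"
    while target.exists():
        counter += 1
        target = version_dir / f"{stamp}-{counter:03d}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target


def list_versions(source_name: str, backup_dir: Path) -> list[Path]:
    """All stored versions of a file, oldest first."""
    return sorted((backup_dir / source_name).iterdir())
```

If the working file goes bad and gets backed up again, the newest version is corrupt, but the older ones are still sitting there to restore from. The trade-off is disk space, which is why services that keep versions usually let you prune old ones.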

How does this happen? Going back to the example above, modern cameras need hundreds or even thousands of clusters to store each file, and the bigger the file, the more clusters it needs. Hard drives can and do have a cluster die from time to time without skipping a beat. Modern drives have ways to detect that a cluster is going bad, read its data, and move it somewhere else. However, this is not always 100% successful. All it takes is one of those clusters not being 100% accurate, and your image is potentially trashed. There are plenty of other ways data can get mangled during normal, everyday activity. The bottom line is that digital data is fragile, far more fragile than film negatives or actual prints. Proper redundant, versioned backup processes are critical to ensure your photos last for generations.
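One common way to catch this kind of silent damage is to record a checksum (a fingerprint of the file's bytes) when the file is first saved and compare it later or after each copy. This sketch uses SHA-256 from Python's standard `hashlib` and shows that a single flipped bit anywhere in the data produces a completely different fingerprint. The helper names and the stand-in data are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the data, as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with a single bit inverted (simulated corruption)."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


if __name__ == "__main__":
    original = bytes(range(256)) * 1000      # 256,000 bytes standing in for image data
    damaged = flip_bit(original, 1_000_000)  # one bad bit out of about 2 million
    print(fingerprint(original) == fingerprint(damaged))  # False: corruption detected
```

A checksum only tells you *that* a file changed, not how to fix it; you still need a good copy somewhere to restore from.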

One last word on hard drive failures. Google once did a study on hard drive life expectancy and found some interesting data. The takeaway was that drives do fail, and most of the failures fell in the 3 to 5 year range. Interestingly, consumer drives typically carry one to three year warranties, while enterprise drives carry five years or more. They also found that failures of drives in the same array were strongly correlated, which calls into question relying on RAID, even the much-praised RAID 5. More on this in a bit.

What to do?

Now that the nitty-gritty of data storage dangers is out of the way, let's chat about how to protect your data. A little later I will discuss how I store and protect my personal and client data, but for now let's talk about the basics. To give your photos a chance to survive, there are several levels of storage safety you can apply. They are presented here in my personal order of importance; your budget and needs will obviously dictate how far you take it.

1. Multiple Copies Same Disk - At a minimum, you should maintain a backup set of your images. Even if you have a single hard drive, at least make copies in two different locations: one set to look at and play with, and one set to recover from if you hose the first. If your images are only on your phone or your camera, you are at risk. Your memory stick will eventually fail. At this level you copy from your card to both locations before you begin to play with your files. NOTE: Always copy from the source files if possible. Each time the files are written there is a potential for corruption. Sometimes copying a copy is unavoidable, but care should be taken when doing so.

2. Multiple Copies Different Media - Same as 1 above, except now your backup copy is on a separate piece of media. This could be another hard drive (internal or external), a CD or DVD, or even another memory card like the one in your camera.

3. Multiple Copies With Offsite - Same as 2, but now with a third copy in an off-site location. This could be as simple as an external hard drive in your bank safe deposit box or at a relative's house. Or you could use one of the many cloud services as an off-site backup.
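All three levels boil down to copying the same files from the card to more than one place. Here is a minimal sketch of that step that follows the advice to copy from the source every time: each destination is written straight from the card, never from another copy, and each copy is checked against the source's SHA-256 checksum. The `copy_to_all` and `sha256_of` names and any folder paths are hypothetical.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file, read in 1 MB chunks so large raw/PSD files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_all(card_dir: Path, destinations: list[Path]) -> list[Path]:
    """Copy every file on the card to each destination, always from the card itself.

    Returns the copies whose checksum did not match the source (empty = all good).
    Note: the read-back may be served from the OS cache rather than the disk,
    so this catches copy errors rather than every kind of media fault.
    """
    bad_copies = []
    for source in sorted(p for p in card_dir.rglob("*") if p.is_file()):
        expected = sha256_of(source)
        relative = source.relative_to(card_dir)
        for dest_root in destinations:
            target = dest_root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # source is the card, never another copy
            if sha256_of(target) != expected:
                bad_copies.append(target)
    return bad_copies
```

Usage might look like `copy_to_all(Path("E:/DCIM"), [Path("D:/Photos"), Path("F:/Backup")])`, where the drive letters are made up; for level 3, one of the destinations would be the drive that goes off-site.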

Of course, all this sounds really simple, but if you're not organized and don't have a "system," it can quickly turn into a mess of not knowing what is where and whether it's backed up. Keeping the files in sync can also get difficult. If you have deleted a bunch of files from your working set, why keep them in your backup eating space if you know you want them gone? There are lots of tools to get this job done, and I will discuss my favorites in the How I Do It section. At a basic level, though, Lightroom can handle the initial copy to multiple locations seamlessly.

How I Do It.

First, let's get the hardware and software details out of the way. Storage-wise, my main editing workstation has 4.5 terabytes of working disk space plus a 1.5 terabyte external drive. My Windows Home Server has about 7 terabytes of storage at the moment. Your needs may vary significantly from mine, but the concepts remain the same. One thing to note: all of my editing software is non-destructive. This means it does not alter the original files when I make changes in either Lightroom or Photoshop. That is critical in my model; if you're using a destructive editor, you may need to take that into consideration. My data management workflow includes the following software:

  • Adobe Lightroom - Not much needs to be said here. Lightroom manages my catalogs of images and maintains the metadata associated with them: keywords and all the non-destructive edits done in Lightroom.
  • GoodSync - GoodSync is a fantastic utility I use to manage the physical files themselves. I use GoodSync because it does a very thorough job of validating that the destination files exactly match the source files. This is a big step in ensuring I don't propagate corrupted files.
  • CrashPlan - CrashPlan is the service I decided to use because it lets me manage multiple backup sets for different types of files, and it backs up anything I specify. However, the really important feature is that it versions every file it backs up and never deletes anything while your account is active.
  • Windows Home Server 2011 - Used to manage the storage on my server and to run unattended backups of my systems while I sleep, though that part is not covered here since it's not specific to photo storage.

Basically my data management workflow looks like this.

While the image above illustrates the entire process, it can be broken down into a few simple steps.

1. Import

When I finish a shoot the first thing I do is download the images off my camera to my primary machine into a temporary offloading directory on the primary drive. This kicks off a couple of processes that move data.

Sync the primary drive with the secondary drive, and the secondary drive with the external drive if it's on. This copies new files to the secondary drive and deletes any files I removed in previous editing sessions.

The external drive is only turned on for imports and is normally turned off as a cold standby backup.
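The import chain can be sketched as two mirror operations in a fixed order, with the external drive skipped when it is switched off. This is a hypothetical illustration of the idea, not GoodSync's actual behavior or configuration: the `mirror` helper decides what to copy by comparing only size and modification time, "switched on" is approximated as "its folder exists," and any drive paths you pass in are made up.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """True if `dst` is missing or differs from `src` in size or modification time."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime != d.st_mtime


def mirror(source: Path, target: Path) -> None:
    """Make `target` match `source`: copy new/changed files, delete files no longer in source."""
    target.mkdir(parents=True, exist_ok=True)
    wanted = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    for rel in wanted:
        src, dst = source / rel, target / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 keeps the timestamp for the next comparison
    # Delete files culled in earlier sessions (empty folders are left behind).
    for p in list(target.rglob("*")):
        if p.is_file() and p.relative_to(target) not in wanted:
            p.unlink()


def run_import_chain(primary: Path, secondary: Path, external: Path | None) -> None:
    """Primary -> secondary, then secondary -> external if the external drive is on."""
    mirror(primary, secondary)
    if external is not None and external.exists():  # assumption: missing path = drive off
        mirror(secondary, external)
```

Chaining secondary to external (rather than primary to external) matches the order described above; the trade-off is that anything wrong on the secondary drive is passed along, which is why a verifying sync tool matters.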

 

2. Background processes

2.A Server Copy

The second part of the process involves an automatic process that runs at least once daily (I can force it to run more often after a particularly heavy shooting or editing day). This process uses GoodSync to sync changes from the secondary drive to my home server's storage drives. Most of the time this is a non-destructive, one-way copy without deletes: it simply copies new or updated files but does not delete anything. I have an identical manual process that includes deletes, for annual storage cleanup.

If you remember, the secondary drive is not cleaned up until the next import. So in most cases, all imported files get copied over to the server before my post-import culling process.
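The difference between the daily run and the annual cleanup run comes down to one switch: whether files missing from the source are deleted from the destination. A minimal sketch under that assumption; `one_way_copy` and its `delete_extras` flag are hypothetical names, not GoodSync settings, and "updated" is assumed to mean "has a newer modification time."

```python
from __future__ import annotations

import shutil
from pathlib import Path


def one_way_copy(source: Path, dest: Path, delete_extras: bool = False) -> list[Path]:
    """Copy new or updated files from `source` to `dest`.

    delete_extras=False (the daily run): nothing in `dest` is ever removed,
    so files culled from the working set stay safe on the server.
    delete_extras=True (the annual cleanup): `dest` is trimmed to match `source`.
    Returns the destination files that were copied.
    """
    copied = []
    wanted = set()
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        wanted.add(rel)
        dst = dest / rel
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(dst)
    if delete_extras:
        for p in list(dest.rglob("*")):
            if p.is_file() and p.relative_to(dest) not in wanted:
                p.unlink()
    return copied
```

Keeping deletes out of the daily run means a mistaken cull can still be recovered from the server until the next annual cleanup.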

2.B Cloud Copy

In addition to the automatic server copy performed by GoodSync, my cloud storage client is always watching for file system changes. When it sees new or updated files arriving on the server, it begins copying those same files up to the cloud for automatic off-site storage. I use the CrashPlan storage service, but the destination could just as easily be an external drive at a relative's or friend's house. That option is one of the features I really love about CrashPlan, and it is free of charge.
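A real backup client may rely on operating-system change notifications, but the basic idea of "watching for new or updated files" can be sketched with a simple polling snapshot: record each file's size and modification time, then compare against the previous snapshot. This is an illustrative stand-in, not how CrashPlan itself is implemented; `take_snapshot` and `changed_files` are hypothetical names.

```python
from pathlib import Path

Snapshot = dict[Path, tuple[int, float]]  # relative path -> (size, mtime)


def take_snapshot(root: Path) -> Snapshot:
    """Record size and modification time for every file under `root`."""
    snap = {}
    for p in root.rglob("*"):
        if p.is_file():
            st = p.stat()
            snap[p.relative_to(root)] = (st.st_size, st.st_mtime)
    return snap


def changed_files(before: Snapshot, after: Snapshot) -> list[Path]:
    """Files that are new, or whose size or timestamp changed, since `before`."""
    return sorted(path for path, info in after.items() if before.get(path) != info)
```

A watcher built this way would take a snapshot, sleep for a while, take another, and upload whatever `changed_files` reports; the polling interval trades responsiveness against the cost of rescanning a large photo library.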

 

Can I Sleep At Night?

It seems like a lot of work, and for what? Short of a catastrophic failure in my camera, I have made every effort to ensure that not only my client photos are safe, but my personal files as well: my children's birth videos and pictures, birthdays, and all the other moments captured so far. It lets me sleep knowing that the data is safe for now. And while it seems like a lot of "processes" to manage, it really is not a big deal, and most of them can be set to autopilot. I just prefer to control some legs of the operation myself. So, in a nutshell, what does all this get me?

1. On import I have 3 copies of all files, one set in a working state, one set in a hot standby state and one set in a cold standby state offline and safe.

2. With each import previous activities are replicated across the three physical locations (three physical hard drives).

3. On a continuing basis my files and changes are replicated to my server (two physical computers, 4+ physical hard drives).

4. On a continuing basis my files and changes are replicated to the CrashPlan storage service (two physical computers, 4+ physical hard drives, and geographically dispersed off-site cloud storage).

5. Thanks to CrashPlan, I can get to any version of files that have had destructive changes applied (PSD files, rendered JPEGs, etc.) at any time.

6. Almost all of this activity can be set to happen automagically.

My Epiphany

I recently had a bit of an epiphany when I watched Lost Memories, a short film by a French filmmaker. It really stirred something inside me. While it's a science fiction story, I know from my background in IT that the potential for loss is real. It made me think about my memories of my family and our lives. It is what caused me to shore up my backup processes, but more importantly it made me think about "what if?" While not probable, it is possible for a solar flare or electromagnetic storm to be strong enough to take out electronic storage in all but specially hardened equipment.

So now I am embarking on my next level of protection. I will be going back through my entire collection of more than 10 years of personal photos, close to 50,000 images, to handpick the best for our generational books: real, archival-quality, handmade books of our images to ensure they will never be lost. When that project is done, it will close the loop from digital to physical and will truly give me peace of mind that my family will always have our memories.

What about other options?

I have a RAID box on my network, and that's where my files are safe.

RAID has often been hailed as a valid backup option, and that idea is false and dangerous. Enterprises use RAID systems either to increase transfer speed or to provide fault tolerance, and they always pair them with a separate backup solution to protect the content on those systems. RAID faithfully applies a deletion or a corrupted write across the whole array the moment it happens, so it gives you no way back to an earlier good version.
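RAID 5's fault tolerance comes from a parity block: the XOR of the data blocks in each stripe. This toy sketch (a few bytes standing in for whole disks; real arrays also rotate parity across the disks) shows both sides of the argument. Parity can rebuild a failed disk, but when bad data is written, parity is simply recomputed to match it, so the array protects the corrupted version just as faithfully as the good one.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the surviving blocks plus parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    disks = [b"PHOTO-A1", b"PHOTO-B2", b"PHOTO-C3"]  # one stripe across three data disks
    parity = xor_blocks(disks)

    # Disk 2 dies: parity brings its contents back.
    print(rebuild([disks[0], disks[2]], parity))  # b'PHOTO-B2'

    # A corrupted write reaches the array: parity is recomputed for the new data,
    # so a "rebuild" now faithfully restores the corrupted block.
    disks[1] = b"GARBAGE!"
    parity = xor_blocks(disks)
    print(rebuild([disks[0], disks[2]], parity))  # b'GARBAGE!'
```

Parity only remembers the current state of the data, never an earlier one, which is exactly the gap a versioned backup fills.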

All my pictures are stored on Twitter, Facebook, Instagram etc. 

First, these services compress and distort the images, so what they hold are not originals and, to me, are compromised. Second, as the recent Instagram terms-of-service fiasco showed, once your files are uploaded, these services are free to use them as they see fit. Lastly, they have no obligation to make sure you get all your pictures back if they decide to shut down or to close your account.

What about Apple's Photo Stream?

While I have not played with this a whole lot, it does have promise, in that it replicates files to many places automatically. However, I haven't used it enough to know what it does with deletes and other activities.

I have all my stuff on DVD or CD, that's good right?

Yes and no. CD and DVD media are relatively small and very slow. Not to mention that the claims of hundreds of years of data safety are a bit of a stretch. Without proper care and handling, the data on a CD or DVD can begin to deteriorate pretty quickly. They can be good secondary media but should not hold your only copies of your files.

Bottom Line

The bottom line in all this is to remember that your data is fragile and can be destroyed in literally one keystroke. Please at least make sure to get the images off your camera and onto your hard drive, preferably as two copies on two hard drives, so you reduce your risk of loss. More than that is up to your budget and your needs. Lastly, there are always prints: haul your stick to the local shop and get them printed, or if you want archival quality, find a professional photographer like myself to help you build and produce archival-quality books to be handed down for generations.

