Sunday, February 10, 2019

A GEEKY QUESTION FOR THE TECHIES IN THE AUDIENCE


When I delete something from my computer, my understanding is that the byte in the first place on that part of the hard drive is changed [to E5?], indicating that the location is available for storage.  Until something else overwrites it, the original material remains.  Now, suppose I write emails, then delete them and empty the trash.  Is the text of those emails at that point actually erased from my email program server?  [I assume it is not stored on my own hard drive.]  Suppose the recipient of the email does the same.  Is there somewhere in the cloud where that data remains?

Am I even asking this question correctly?

6 comments:

  1. "Is the text of those emails at that point actually erased from my email program server?" Short answer: not automatically. Longer treatment: https://csilcs.co.uk/interesting-article-blancco/

  2. You're right that in the first instance deleting a file doesn't remove it from the hard drive, but instead marks that bit of space as being available. But in general this flag isn't directly attached to the data on your drive. In the file systems I'm familiar with, there is something like an index which tracks which bits of the physical drive are allocated to which file. This is where the flags are set which indicate whether a given bit of space is free or occupied, and this index is what is changed when the file is deleted. So, deleting a file on one of those systems doesn't change anything about the data itself, but changes something in the index on an entirely different part of the drive.

    The deleted data only disappears once it has been overwritten. Sometimes you get utilities which will do something called 'secure deletion' or 'scrubbing' or the like, where it will repeatedly overwrite the space the old file was in with random gibberish so that the traces of the old file are thoroughly removed.
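
    The difference between the two kinds of deletion can be sketched in a few lines of Python. This is only a toy model: the `ToyDisk` class, its block size and the file names are all made up for illustration, and no real file system is this simple.

```python
import os

BLOCK_SIZE = 16  # bytes per block in this toy model (an arbitrary choice)


class ToyDisk:
    """A toy disk: raw blocks plus a separate index of which file owns which blocks."""

    def __init__(self, n_blocks):
        self.blocks = bytearray(n_blocks * BLOCK_SIZE)  # the raw storage
        self.index = {}                                 # file name -> list of block numbers
        self.free = list(range(n_blocks))               # block numbers available for reuse

    def write(self, name, data):
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        if needed > len(self.free):
            raise OSError("disk full")
        used = [self.free.pop(0) for _ in range(needed)]
        for i, b in enumerate(used):
            chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            start = b * BLOCK_SIZE
            self.blocks[start:start + len(chunk)] = chunk
        self.index[name] = used

    def delete(self, name):
        # Ordinary delete: only the index changes; the data blocks are untouched.
        self.free.extend(self.index.pop(name))

    def scrub(self, name, passes=3):
        # "Secure" delete: overwrite the blocks with random bytes, then free them.
        for b in self.index[name]:
            start = b * BLOCK_SIZE
            for _ in range(passes):
                self.blocks[start:start + BLOCK_SIZE] = os.urandom(BLOCK_SIZE)
        self.delete(name)


disk = ToyDisk(n_blocks=8)
disk.write("a.txt", b"my secret email")
disk.write("b.txt", b"another message")
disk.delete("a.txt")   # ordinary delete
disk.scrub("b.txt")    # overwrite, then delete

print("a.txt" in disk.index)                      # False: the index no longer lists it
print(b"my secret email" in bytes(disk.blocks))   # True: the bytes are still on the "disk"
print(b"another message" in bytes(disk.blocks))   # False: overwritten with random bytes
```

    The ordinary delete touches only `index` and `free`, which is why the old text can still be found by anyone who reads the raw blocks.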

    On an email server, this kind of scrubbing can happen organically if the server is used a lot, since servers do a lot more reading, writing, and overwriting than personal computers do. If the server is busy, and it doesn't have oodles of extra space, then I wouldn't be surprised if the space that a deleted email used to be in was already scrubbed just by usual reading and writing.

    Increasingly, very many servers (and especially ones for commercial services) aren't physical machines under someone's desk or on a rack somewhere, but are run off the cloud, meaning that they are one of the things run off of enormous complexes of computers and storage space, where you rent some computing time and storage from this complex (Amazon is an industry leader in this kind of thing). If the server runs off of one of those cloud services, where there isn't even a physical computer dedicated to it, the organic scrubbing of old file spaces would be routine and very quick, since the physical machines that make up the massive complex are shared across very many entirely different software services: bits of the same email server may across time be run on hundreds of different physical machines, and those physical machines may run thousands of entirely different software services. There's no prospect of tracking down the physical file space devoted to a particular old email under such a system.

    To help you get your head around it, cloud computing is like Just In Time warehousing for computer processes, but with turnover that's faster by orders of magnitude than clunky old crates of physical items. We live in a crazy old world these days.

  3. Marinus is exactly right. To put it simply, computer files are managed by your operating system (Mac or Windows). When you ask your operating system to create a new file, it keeps track of information (or "meta-data") related to that file. However, emails are probably not "files" exactly. In a program like Outlook, I believe all your emails are located in a single file (managed by the operating system) but individual emails are inside that file and managed by the email program. So when a single email is deleted, the way that happens depends on the email program, not the operating system.
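
    As a rough illustration of one way an email program could do this, here is a Python sketch of a single "mailbox file" holding several messages. The record layout (one status byte, a 4-byte length, then the body) and the function names are invented for the example; this is not how Outlook or any particular program actually stores mail.

```python
import io

# Hypothetical record layout, invented for this sketch:
#   1 status byte (b"A" = active, b"D" = deleted)
#   4-byte big-endian body length
#   the message body


def append_message(store, body):
    """Add a message at the end of the mailbox file and return its offset."""
    store.seek(0, io.SEEK_END)
    offset = store.tell()
    store.write(b"A" + len(body).to_bytes(4, "big") + body)
    return offset


def delete_message(store, offset):
    """'Delete' by flipping the status byte; the body stays in the file."""
    store.seek(offset)
    store.write(b"D")


def list_messages(store):
    """Return the bodies of all messages not marked as deleted."""
    store.seek(0)
    visible = []
    while True:
        header = store.read(5)
        if len(header) < 5:
            break
        body = store.read(int.from_bytes(header[1:], "big"))
        if header[:1] == b"A":
            visible.append(body)
    return visible


store = io.BytesIO()  # stands in for the single mailbox file on disk
first = append_message(store, b"Dear Bob, ...")
append_message(store, b"Lunch on Friday?")
delete_message(store, first)

print(list_messages(store))               # [b'Lunch on Friday?']
print(b"Dear Bob" in store.getvalue())    # True: the text is still inside the file
```

    As far as the operating system is concerned nothing was deleted at all: the mailbox file is the same size and still contains the message. A program built this way would only really drop the text when it compacts or rewrites the file.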

  4. So to really answer your question ... it depends. My hunch is that for most email programs (including webmail services like Gmail), most likely yes. Even if the answer is no, if the email has been hanging around for more than 24 hours, it is likely included in one of the nightly backups for at least a little while after deletion.

  5. Second Marinus. Pretty much covered it there.

    On your home PC, the actual total erasure of deleted files could take months or might never actually happen if you don't use it really frequently. As noted above, there's a fundamental difference between 'marked for deletion' and 'actually deleted'.

    And how it physically happens:

    On 'old style' mechanical hard drives (that make the characteristic clicking and spinning sounds) the 0 and 1 values for bits of data are stored as differences in magnetisation on the platter. When a bit is reset from 1 to 0, the spot doesn't revert to exactly the state it left the factory in, and holds a faint memory of previously written data. If you had a device that could read the signal very sensitively, for a block of data on a traditional mechanical hard drive it might see something like 0.01, 0.02, 0.00, 1.02 and so on, rather than clean 0s and 1s. With the kind of data recovery devices and software that government agencies have, they can in principle make a reasonable guess at whether each bit was 1 or 0 on the previous write, and perhaps the write before that. I believe the NSA use at least 7 passes to combat this memory effect - that means that if they were going to re-use a hard drive they would first write all 0s to it seven times over. The more passes you do, the fainter the trace of the old data gets.

    'Modern' Solid State Drives (SSDs) don't really have this memory problem, but also can't guarantee anything will ever be overwritten.

    If you deleted some files from a traditional hard drive, the next files you write will almost certainly fill up what you just deleted. This is because those drives are circular platters so they can access the data at the inside of the platter much faster than data around the outside edge (it's physically much less distance to cover). If they need to store something, they will always prioritise putting it nearer the centre than filling up fresh space on the edge.

    SSDs have parallel storage spaces more like rows of safety deposit boxes in a bank that they can access equally quickly, except if the box was previously full it has to be emptied first, then it can be filled with new information. So an SSD acts opposite to a mechanical hard drive - it takes longer to fill up previously occupied space, so it will always prioritise fresh new storage space. What this means is that you could delete something and it will stay on the SSD in perpetuity, or at least until you fill up all the rest of the drive and then the drive has to begrudgingly use that 'marked for deletion' space. This problem was discovered very quickly, and what now happens is that your operating system issues a command called TRIM every so often when you're not using the drive intensively, which tells the drive which 'marked for deletion' blocks it can empty out, ready to be written to again at full speed. In my experience, TRIM means data from SSDs is much, much less recoverable than from hard drives, but because it's only concerned with maintaining maximum operating speed, not data security, it might not run, or it might only do a partial TRIM, and it wouldn't tell anyone. At least with mechanical hard drives the really paranoid can set their software to overwrite with 35 passes and know that their data is gone.
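
    Here is a deliberately simplified Python sketch of that behaviour. The `ToySSD` class is hypothetical and ignores almost everything a real drive does (wear levelling, pages versus erase blocks, the controller's own bookkeeping); it only shows the "prefer empty boxes" rule and what a TRIM-like clean-up changes.

```python
class ToySSD:
    """Toy model: a row of blocks, where None means 'erased and ready to write'."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks
        self.stale = set()  # blocks marked for deletion but not yet erased

    def write(self, data):
        # Prefer a block that is already erased.
        for i, content in enumerate(self.blocks):
            if content is None:
                self.blocks[i] = data
                return i
        # Only when no erased block is left, reuse a stale one (erase, then write).
        if not self.stale:
            raise OSError("drive full")
        i = min(self.stale)
        self.stale.discard(i)
        self.blocks[i] = data
        return i

    def delete(self, i):
        # Deleting only marks the block; the old data is still in it.
        self.stale.add(i)

    def trim(self):
        # TRIM-like clean-up: erase every block that was marked for deletion.
        for i in self.stale:
            self.blocks[i] = None
        self.stale.clear()


ssd = ToySSD(n_blocks=3)
old = ssd.write("old email")
ssd.delete(old)
new = ssd.write("new file")

print(old, new)                      # 0 1: the new data went to a fresh block
print("old email" in ssd.blocks)     # True: the deleted data is still on the drive
ssd.trim()
print("old email" in ssd.blocks)     # False: only now has it actually gone
```

    If `trim()` never gets called in this model, the deleted email survives until every other block has been used, which is the situation described above.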

  6. And on the cloud

    Here you just have to have faith that whoever is storing your stuff has set their system up properly. If they have, total deletion should happen pretty quickly, because storage space is at a premium and they can't afford to keep endless archives of your deleted mail. Once you empty the trash, that free space will get used and reused in short order for other people's mail or files or whatever.

    In addition to the 'physical' problems above, two potential problems here are replication and backups. Big services like Google replicate your data across data centres so that you can access it quickly from anywhere in the world, and so that if there was a total outage at one, full service can (in theory) still continue. If they replicated your mail to a server but didn't configure it correctly, so that pressing 'delete' fails to delete the replicated copy as well, then you could end up with an archive of emails stacking up somewhere. The same could happen with backups they take but then don’t automatically delete.
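
    The replication problem can be shown with a small Python sketch. Everything in it is hypothetical (the class names, the `propagate_deletes` switch, the data centre names), and it says nothing about how Google or anyone else actually runs their replication; it just shows how a copy can outlive the 'delete' button.

```python
class MailStore:
    """One copy of the mailbox, e.g. in one data centre."""

    def __init__(self, name):
        self.name = name
        self.messages = {}


class ReplicatedMailbox:
    """Writes go to every replica; deletes go everywhere only if configured to."""

    def __init__(self, replicas, propagate_deletes=True):
        self.replicas = replicas
        self.propagate_deletes = propagate_deletes  # hypothetical config switch

    def save(self, msg_id, text):
        for replica in self.replicas:
            replica.messages[msg_id] = text

    def delete(self, msg_id):
        # A misconfigured service only deletes from the first (primary) copy.
        targets = self.replicas if self.propagate_deletes else self.replicas[:1]
        for replica in targets:
            replica.messages.pop(msg_id, None)


primary = MailStore("dublin")
replica = MailStore("oregon")
mailbox = ReplicatedMailbox([primary, replica], propagate_deletes=False)

mailbox.save(1, "Dear Bob, ...")
mailbox.delete(1)

print(1 in primary.messages)   # False: gone from the copy you can see
print(1 in replica.messages)   # True: still sitting in the other data centre
```

    With `propagate_deletes=True` both copies are removed, which is what a correctly configured service would be expected to do.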

    Economics hopefully makes this unlikely, and at least in Europe there are now GDPR laws that very harshly punish the misuse of data.
