I received this today, and it prompted me to write a post, because right now everyone is getting websites made for their shopfronts as quickly as possible. I won’t reveal who they were: despite their awful security beforehand, I liked that they were proactive in admitting their mistake and notifying everyone who needed to know.
What stuck out to me is that they mentioned their additional security measures “includes obfuscating all credit card information and hashing all passwords”, which to me is something anyone, anywhere, should already be doing. If you don’t, read this whole post.
Basically, the company went through and built their website. Their web developer wasn’t security conscious (read: not very good), but he made it work. It worked happily, all was well and everyone was having a gay old time.
Then someone else discovered that he hadn’t locked the doors of the website, and upon entering found the credit card numbers, expiry dates, and security codes just lying in a pile in the middle of the room. And they probably had a field day, sneaking in every once in a while, grabbing the new numbers and sneaking out again.
Those credit card numbers could have been yours.
What they were doing
There are two things to note that they did wrong. The first is more abstract; the second is specific.
The first one is “they made a website that wasn’t perfect, and someone was able to hack it”. If your website is perfect and you’re better than all of the top tier software engineers in the world, you can stop reading here.
The second one was “they stored all of their data as if their website was perfect”, which took the form of keeping it in the database as “Plain Text”. That is literally what it sounds like: human readable. This blog is plain text.
The second one is probably the bigger issue, because there are a lot of smart people out there so making a website that can’t be hacked is extremely difficult.
But even if they did hack it, the second one, which is very easy to implement, would have minimised the damage.
Additionally, it means that the website owners themselves have granted themselves the unfettered ability to read your personal information, mostly because they were lazy.
Think of the person you least trust. They probably have a website people buy stuff from. Now picture them being able to read the email addresses, passwords, credit card numbers and everything else belonging to all the people that use their website.
Hashing, Encrypting and Plain Text
These are probably the three main terms you’ll hear when talking about data security as a website owner, and they’re not too difficult to explain, to spot or to implement. But only ONE of them should be used for passwords (unless you have a really good reason).
Each of them is a method of storing data that may later be recalled and, in the case of passwords, recalled regularly. If it can be recalled, then someone else could read it if they got access to your database, and it’s your job to make it unreadable for them.
The explanations below will be relative to passwords.
I’ll start with hashing, because it’s what you should be using when it comes to passwords.
Hashing is taking the user’s password as it comes in and scrambling it in a consistent but irreversible manner.
For example, a basic hash could assign a number to every letter in the alphabet where A = 1, B = 2 and so on, then add the numbers together for every letter in the password, and finally truncate it to one character. So the password “Daniel” becomes 4+1+14+9+5+12 = 45. Then it gets truncated to 4.
The truncation means many different passwords share the same outcome, which makes it irreversible while still being consistent. You couldn’t take 4 and derive the password “Daniel” back out of it. It also makes it easy to plan your database, because someone can’t supply a 50,000 character password and get back a 6 character number when you only made space for one character.
Now that you have the hash, and it’s consistent, you need simply save the number 4 to your database. The next time they try to log in, their password will be hashed in the same way, it will spit out the number 4, and you then compare the new hash to the saved one. If they match, then they’re (probably) the same input. You needn’t save their actual password, but can still check it.
A single digit numeric only hash would be a pretty poor one, as you’d have a 1 in 10 chance of two different passwords giving the same hash, which is why modern hashing algorithms are a lot more complicated.
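To make the toy hash above concrete, here is a minimal sketch in Python. The function name toy_hash is made up for illustration, and it assumes the password only contains the letters A to Z. It is deliberately terrible and exists only to show “consistent but irreversible”.

```python
def toy_hash(password: str) -> str:
    """Toy hash from the text: A=1 ... Z=26, add the letters up, keep the first digit."""
    total = sum(ord(ch) - ord("a") + 1 for ch in password.lower() if "a" <= ch <= "z")
    return str(total)[0]


print(toy_hash("Daniel"))  # 4 + 1 + 14 + 9 + 5 + 12 = 45, truncated to "4"
print(toy_hash("d"))       # also "4": a different password with the same hash
```

Both inputs give 4, so there is no way to work backwards from the stored 4 to “Daniel”, and it also shows how often a one digit hash collides.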
So why is irreversibility important? And, subsequently, why would we use this for passwords over encryption?
For both questions the first reason is, “People are lazy” and the second reason is, “People are untrustworthy.”
People are lazy?
Yep, that’s exactly it. People are lazy. Remembering a hundred different passwords is hard, and most people don’t understand the benefits of a password manager, so many people recycle a few. I was certainly guilty of that in the past. I could use my password “Daniel” for my email, for Facebook, and also my bank account.
What this means is that if a hashed password was reversible, and someone hacked my website full of passwords due to lazy coding (more laziness), they would get a list of recycled usernames. If they could also derive a list of recycled passwords, they could then break into my bank, my Facebook and my email account (easy to find because my email address would be in the same table).
What about the untrustworthiness reason?
Most discussion about data breaches and stolen data tends to revolve around hackers. But there’s actually a much easier path to getting this information without finding a vulnerability in someone’s system.
Work for them.
A disgruntled IT employee in most jobs could easily get access to a database, skim it without anyone noticing, and on top of that have a much easier time tracking down the decryption keys for anything stored reversibly. This is where the untrustworthiness lies. Hell, even a sole proprietor who writes their own site could skim the data, credit card information and the like, and use it however they desire. And there’s no reason why an employee should ever have access to see someone’s real password.
These two reasons, laziness and untrustworthiness, are why hashing, and not encryption, is how you should be storing passwords. There’s no reason why a website owner, their employees, or a hacker should ever need to be able to derive your actual password from the value in the database for logging into their site or application. Just compare the hash.
Encrypting is the term used for an algorithm that scrambles an input in a reversible manner. It is used for storing data where the original value needs to be recalled, but also needs to be protected from prying eyes.
This makes it great for things like credit card numbers, API keys and secrets and other personal information you don’t want hackers to be able to access if they steal your database.
But terrible for passwords.
Encrypting is usually performed by an extremely complicated cypher with an encryption key layer. To simplify it, a simple cypher would be similar to the toy hash in that every letter is assigned a number, so A = 01, B = 02, C = 03 and so on, then the output is concatenated (written one after the other). So “Daniel” would be encrypted using this awfully insecure method to 040114090512. You could then flip the cypher and reverse it back to Daniel: 04 = D, 01 = A, 14 = N, 09 = I, 05 = E and 12 = L.
An encryption key can then be added to personalise it and “prevent” people from being able to reverse it using just knowledge of the cypher. For example, if the encryption key was numeric, and it was added to each letter’s number, a key of 3 would mean that A = 04, B = 05, C = 06. So now “Daniel” gets encrypted to “070417120815”.
If someone were to guess that my encryption key was 2, and they tried to reverse it, they would get “Ebojfm” instead of “Daniel”, as they would subtract 2 from 04 and get B instead of A, for example.
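The toy cypher above is small enough to sketch in Python. The names toy_encrypt and toy_decrypt are made up for illustration, and the sketch assumes letters only and a small numeric key so every letter still fits in two digits. It is not real encryption, just the reversible idea.

```python
def toy_encrypt(text: str, key: int) -> str:
    """A=01 ... Z=26, add the key to each letter, write the numbers one after the other."""
    return "".join(f"{ord(ch) - ord('a') + 1 + key:02d}" for ch in text.lower())


def toy_decrypt(digits: str, key: int) -> str:
    """Split into two digit pairs, subtract the key, turn each number back into a letter."""
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "".join(chr(int(pair) - key + ord("a") - 1) for pair in pairs).capitalize()


secret = toy_encrypt("Daniel", 3)
print(secret)                  # 070417120815
print(toy_decrypt(secret, 3))  # Daniel (right key)
print(toy_decrypt(secret, 2))  # Ebojfm (wrong key)
```

Anyone holding the key gets the original back, which is exactly the property you want for a credit card number and exactly the property you don’t want for a password.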
Why is it terrible for passwords?
Because there’s no reason why a website owner needs to know the actual content of your password.
Once again, people are lazy and untrustworthy. Most people will recycle passwords, and a disgruntled employee, with the ability to reverse a password could easily then jump on Gmail and just start mashing in email addresses from the database alongside the decrypted passwords supplied and likely get a bunch of hits.
When you type your password in, the website owner doesn’t need to decrypt your saved password and see if it matches. They just need to hash what you typed in and see if the hash matches, since the result is consistent for the algorithm used, and they never get a chance to know the actual contents of your password.
A good web developer will never store someone’s actual password, or ever be able to recall it UNLESS there’s a good reason where they would need to use it.
So when would you use encryption?
Encryption is used when the website needs to protect the data from prying eyes, but also recall it for use later.
Saving credit card numbers is a good use of encryption: encrypt the number when it’s not in use, and decrypt it when the customer wants to buy something without having to retype their credit card details. Even then, everyday websites shouldn’t be storing credit card numbers anymore, as most secure gateways have a system that will store the card for you and return a token that can bring it back later.
Alternatively, when the service is a middle man for another logged in service.
For example, you sign up to a service where that website then logs into another website and performs an action for you. They can’t submit the hash to that other website, so in this case they SHOULD encrypt the saved password with a strong encryption method, in case someone ever finds a bug in their code, gets a shot at an SQL injection and steals their database.
Those are just examples, though. Encryption is good for any time data should get further protection in a database but also needs to be recalled later. You could, in theory, encrypt ALL private information, but that would add a lot of overhead to your website performance and be overkill on some information.
Plain Text is as it describes: taking the raw data and storing it as itself. For example, if my password is “Daniel”, and you went into the database and looked at my account, under the column Password it would say “Daniel”.
Passwords should NEVER be stored as Plain Text. But they often are, especially on older websites or homebrew.
The website at the start of this article used Plain Text passwords (and credit card information, so a double no-no, don’t store credit card information, and if you do, encrypt it at the bare minimum).
When their site was hacked and their database stolen, the hacker found themselves a handy reference guide of personal information, email addresses, oft-recycled passwords and valid credit card data. Shopping spree time.
Why would a website store passwords in plain text?
Two reasons, they’re lazy and they’re ignorant.
They’re lazy because it’s easy to hash or encrypt passwords.
In PHP, for example, a single function will hash a password and the rest is the same (unless you have a good web developer using a top class hashing algorithm, in which case they need a second function to verify it, but that function replaces the code the dev would have written for the comparison).
Hashing a password is as simple as this in PHP:
$hash = password_hash($password, PASSWORD_DEFAULT);
And comparing is as easy as this:
$valid = password_verify($password, $hash);
Someone who can’t be bothered doing that can’t be described as anything other than lazy.
They’re ignorant, on the other hand, because they haven’t bothered to learn why data security is so important, or to learn the extremely easy bare basics of protecting data, such as those two functions above.
Someone who saves passwords in plain text can’t be described as anything other than ignorant.
Someone who saves passwords in plain text can’t be described as anything other than lazy and ignorant.
Final note on Plain Text Passwords
There is one other reason why a website would save passwords in plain text (or at least, retrievable text) but it’s not a good reason. It is, however, a great method for an end user to tell how poorly a website has been coded.
Emailing the user their password.
If a website emails you your password, whether at the start, or because you’ve pressed the forgot password button, they have your password saved and not a hash. If you see this, and you’ve used that password elsewhere, IMMEDIATELY CHANGE IT. They’re going to be hacked one day, and that password will be exposed for the internet to see.
After changing it, report it to these guys (who haven’t updated since March 2021 at the time of writing, so it’s possible they’ve stopped). They’re a website that names and shames websites that store passwords in a retrievable format, and there are some pretty big names on there: newworld.co.nz only stopped saving them in plain text in March 2021, and Atlassian didn’t change until December 2020. Crazy.
Never email people their password. If they forget it, generate a new password and force them to change it from that.
I have a website but don’t code it myself, how can I check this?
If you have a website that has been built for you, talk to your web developer.
If they’re unsure, assume they’re doing it wrong.
Someone who has a good command of the data on a website would be able to say without hesitation whether passwords are hashed or not. They wrote it, after all.
Unless: You’re using a CMS (Content Management System) such as WordPress. If your website is simply a skinned WordPress site then your developer may not know, but your passwords will be hashed.
But better safe than sorry. If your web developer doesn’t know, I would recommend you seek independent advice to verify. They don’t necessarily need to look into the code; they can simply check the table in the database.
I’m a developer and didn’t realise, where can I learn?
I can suggest, but I can’t exactly give you a single location as the same rules apply for every programming language but are implemented in different ways, and people learn in different ways.
For example, I’m self-taught, and everything I learn is via the old Googles. Whereupon I will read screeds of information from different sources and compile concepts and best practices based on what makes sense to me.
For a strict set of resources, start with OWASP and Google for more understanding on concepts you’re struggling with or that are more relevant to your specific programming language.
OWASP is a non-profit organisation dedicated to good software security practices (good code, not antivirus).
What algorithm should I use? Is SHA-256 okay?
No, SHA-256 is not okay. SHA-256 WAS okay a decade or more ago, but there are much better solutions out there.
Why is SHA-256 not okay now?
If you’re reading this you’ve probably at least heard of the blockchain (Bitcoin is a cryptocurrency built on one). Many blockchains are “proof of work” systems, where dedicated devices race through enormous numbers of hashes looking for an input whose hash meets a target, and Bitcoin uses SHA-256 as its algorithm. As a result, modern hardware is absolutely geared towards computing SHA-256 hashes as quickly as possible.
This makes SHA-256 particularly weak for passwords in this modern age. SHA-256 hasn’t been broken, so it’s not reversible, but modern hardware can make literally hundreds of millions of guesses at it per second.
Because of this, modern hashing algorithms are designed to slow the number of attempts that can be made, rather than try to make them more complicated (although they do the latter as well) by repeating hashes, increasing memory requirements and other means.
To simplify things, if a device can only make 10 attempts per second, and it takes a billion attempts to work out a password, it would take about 3 years running at full bore to work out one password. The same password hashed with SHA-256 could be worked out in 10 seconds at 100,000,000 hashes per second.
It’s also for this reason that the best defence against brute force attacks on a website is simply limiting the number of failures. If someone gets locked out for 5 minutes after their password fails 5 times, then a password can only be guessed about once per minute on average.
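The arithmetic behind those two figures is easy to check. A quick sketch in Python, using the billion attempts and the two guessing rates from the example (the function and constant names are just for illustration):

```python
ATTEMPTS_NEEDED = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def seconds_to_crack(attempts: int, attempts_per_second: int) -> float:
    """Worst case time to try every guess at a fixed guessing rate."""
    return attempts / attempts_per_second


slow = seconds_to_crack(ATTEMPTS_NEEDED, 10)           # deliberately slow hash
fast = seconds_to_crack(ATTEMPTS_NEEDED, 100_000_000)  # fast hash on fast hardware

print(f"{slow / SECONDS_PER_YEAR:.1f} years")  # 3.2 years
print(f"{fast:.0f} seconds")                   # 10 seconds
```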
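As a rough illustration of that lockout rule, here is a minimal in-memory sketch in Python. The class name LoginThrottle and its methods are hypothetical, and a real site would keep these counters in its database or cache rather than in a dictionary.

```python
import time

MAX_FAILURES = 5          # failed attempts allowed before locking
LOCKOUT_SECONDS = 5 * 60  # how long the lock lasts


class LoginThrottle:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = {}      # username -> consecutive failed attempts
        self._locked_until = {}  # username -> time at which the lock expires

    def is_locked(self, username: str) -> bool:
        return self._clock() < self._locked_until.get(username, float("-inf"))

    def record_failure(self, username: str) -> None:
        count = self._failures.get(username, 0) + 1
        if count >= MAX_FAILURES:
            # Fifth failure in a row: lock the account and start counting again.
            self._locked_until[username] = self._clock() + LOCKOUT_SECONDS
            count = 0
        self._failures[username] = count

    def record_success(self, username: str) -> None:
        self._failures.pop(username, None)
        self._locked_until.pop(username, None)
```

The login code would check is_locked before even looking at the password, then call record_failure or record_success depending on the result.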
So what do I use?
Follow the recommendations of OWASP, but right now the new hotness is Argon2id. As well as being deliberately slow to process, it mixes a random salt into every hash, which means that hashing the same value twice will result in two different hashes. A secondary verify function is then able to marry up a password with a stored hash and determine whether one would have produced the other.
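To show the “same password, two different hashes” behaviour without installing anything, here is a sketch in Python. Big assumption: it uses PBKDF2 from the standard library as a stand-in, NOT Argon2id, and the iteration count, storage format and function names are made up for illustration. For a real site, use a maintained Argon2id library and follow current OWASP settings.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor only, not a recommendation


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)  # fresh random salt for every hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Store the settings and salt alongside the digest so verify can repeat the work.
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


first = hash_password("Daniel")
second = hash_password("Daniel")
print(first != second)                    # True: different salts, different hashes
print(verify_password("Daniel", first))   # True
print(verify_password("Daniel", second))  # True
print(verify_password("daniel", first))   # False
```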
My site saves passwords in plain text, when should I fix it?
Right now. As soon as possible.
Now now now.
Don’t wait for anything, hackers aren’t waiting for your next financial cycle, or your website to be upgraded. If you know you have a problem it should be your biggest priority.
It’s not even that hard to fix. Add another column to your password table called “hashedPassword”, hash all the plain text passwords into it, and delete the plain text values once it’s done. Then update your code to point at this new column and hash/verify against it.
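The fix described above can be sketched as a one-off script. This Python version runs against an in-memory SQLite database; the users table, its columns other than “hashedPassword”, and the hash_password helper are all assumptions for illustration, and PBKDF2 from the standard library stands in for whatever algorithm you actually choose.

```python
import hashlib
import secrets
import sqlite3

ITERATIONS = 100_000  # illustrative work factor only


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def migrate_plain_text(conn: sqlite3.Connection) -> None:
    conn.execute("ALTER TABLE users ADD COLUMN hashedPassword TEXT")
    rows = conn.execute(
        "SELECT id, password FROM users WHERE password IS NOT NULL"
    ).fetchall()
    for user_id, plain in rows:
        # Write the hash and wipe the plain text value in the same statement.
        conn.execute(
            "UPDATE users SET hashedPassword = ?, password = NULL WHERE id = ?",
            (hash_password(plain), user_id),
        )
    conn.commit()


# Demo with a throwaway database holding one plain text password.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password TEXT)")
conn.execute("INSERT INTO users (email, password) VALUES ('daniel@example.com', 'Daniel')")
migrate_plain_text(conn)
row = conn.execute("SELECT password, hashedPassword FROM users").fetchone()
print(row[0])  # None: the plain text is gone
```

Once every row is migrated and the code reads only the new column, the old plain text column can be dropped entirely.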
Finally: My site does hash passwords, but in SHA-256, should I fix it?
Absolutely. It’s not as high a priority as plain text passwords, but it should still have some importance.
It’s not that difficult, either. Again, add another column to the table called “newPassword” or something.
Then, in your login function, fetch both columns and check the password against whichever one is populated: SHA-256 for the old column, the new method (Argon2id) for the new one.
If the “newPassword” column is empty for that user, compare the old SHA-256 column, and if it passes, update the “newPassword” column with the new hash, and clear out the SHA-256 record.
If it’s not empty, verify the “newPassword” record.
Set a date by which everything must be rolled over, say, 6 months from now. On that date, clear out the remaining SHA-256 hashes and email the stragglers to say that the next time they log in they’ll need to use the forgot password function. You might even recover a few customers who weren’t going to return.
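The login side of that rollover can be sketched like this in Python. Assumptions, loudly: the user record is a plain dictionary with “password” (the old unsalted SHA-256 hex value) and “newPassword” keys, the helper names are invented, and PBKDF2 from the standard library stands in for Argon2id so the sketch runs without extra packages.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor only


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def login(user: dict, password: str) -> bool:
    if user.get("newPassword"):
        # Already rolled over: only the new hash is checked.
        return verify_password(password, user["newPassword"])
    old_hash = user.get("password")
    if old_hash and hmac.compare_digest(sha256_hex(password), old_hash):
        # Old hash matched: upgrade while the real password is in hand.
        user["newPassword"] = hash_password(password)
        user["password"] = None
        return True
    return False


user = {"password": sha256_hex("Daniel"), "newPassword": None}
print(login(user, "Daniel"))     # True, and the record is upgraded
print(user["password"] is None)  # True
print(login(user, "Daniel"))     # True, now checked against the new hash
```

The upgrade has to happen at login because that is the only moment the site ever sees the real password.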