[[!toc]]

## Storing a key

* Input password and name from user. The name is a combination of their own
  name and a more obscure name (such as the name of their high-school
  sweetheart).
* Get the keyid of the key. (This can be any public value unique to the
  private key, eg a gpg keyid. It's only used to allow storing multiple
  keys under a given name. If a gpg public key is not on the keyservers,
  or the key is not a gpg key, "" can be used for the keyid.)
* Generate N by argon2(name, salt=keyid), tuned to take 10 minutes.
* Generate N1-N3 by sha256 of N+1,2,3.
* Generate decryption puzzle P, a byte chosen at random.
* Generate K by argon2(password, salt=name+P), tuned to take 0.195 minutes.
* AES encrypt (data + checksum) with K as the key.
* Shamir the encrypted key with N=2, M=3, yielding S1-S3.
* Servers reject attempts to store an object under a name that is
  already in use.
* Servers do not allow enumerating all objects stored,
  and require a proof of work to handle any request.
* Upload S1-S3 to separate servers under the names N1-N3.
  If any of the uploads is rejected because the name is already in use,
  ask the user to enter a different name or password.

So, storing a key takes 10 minutes.

## Recovering a key

* Input password and name from user.
* Calculate N and N1-N3.
* Request N1-N3 from servers until two objects are available.
* Shamir recombine the objects.
* Guess a value for P.
* Generate K by argon2(password, salt=name+P).
* AES decrypt.
* Repeat with a new P until the checksum verifies.

This takes 10 minutes to calculate N, plus on average 128 guesses of P.
Total recovery time varies from 10 minutes to 60 minutes, with an
average of 35 minutes.
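The store/recover flow above can be sketched as follows. This is an illustrative model, not keysafe's implementation: sha256 stands in for the tuned argon2 hashes, a sha256-based keystream stands in for AES, and the Shamir sharding of the ciphertext is omitted. All function names are hypothetical.

```python
import hashlib, os

def kdf(secret, salt):
    # Stand-in for argon2(secret, salt) -- NOT tuned to take minutes.
    return hashlib.sha256(salt + b"|" + secret).digest()

def stream_xor(key, data):
    # Toy keystream standing in for AES; do not use for real encryption.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def store(password, name, secret):
    p = os.urandom(1)                 # decryption puzzle P: one random byte
    k = kdf(password, name + p)       # K = argon2(password, salt=name+P)
    checksum = hashlib.sha256(secret).digest()
    return stream_xor(k, secret + checksum)   # encrypt data + checksum

def recover(password, name, ciphertext):
    for p in range(256):              # on average 128 guesses of P
        k = kdf(password, name + bytes([p]))
        plain = stream_xor(k, ciphertext)
        data, checksum = plain[:-32], plain[-32:]
        if hashlib.sha256(data).digest() == checksum:
            return data               # checksum verified: P and password right
    raise ValueError("password or name is wrong")
```

Note how the puzzle byte P never leaves the client: recovery simply tries all 256 values, which is what multiplies the attacker's per-guess cost by 256.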
## Difficulty of brute forcing a single encrypted key

* Assume we know the name and keyid, or have otherwise found a way to
  determine the shards of a key. Download and recombine the shards.
* Guess a password.
* For each possible value of P, AES decrypt with
  K = argon2(password, salt=name+P), and check if the checksum verifies.
  This takes 0.195 minutes * 256 = 50 minutes total.
* Repeat for the next password.

So, for a password with N bits of entropy, the number of CPU-years of work
to crack it is: `2^(N-1)*50/60/24/365`

* Strong password (50 bits of entropy): 53553077761 CPU-years
* Weak password (30 bits of entropy): 51072 CPU-years
* Super-weak password (19 bits of entropy): 25 CPU-years

So, if an attacker is able to find the right shards for a secret key, it's
feasible for them to crack super-weak and weak passwords, assuming the
secret key is worth the cost of doing so. Stronger passwords quickly
become infeasible to crack.

## Attack methods

An attacker who wants to target a particular person can guess the name they
used, derive N1-N3, download two of S1-S3, and start brute forcing the
password soon after the object is stored. This is the most likely attack
method, so any user who could potentially be targeted like this should
choose a strong password. A name that attackers are unlikely to guess
prevents this attack, which is why keysafe prompts for not only the
user's name, but also a more obscure name. Each name guess that the
attacker makes takes 10 minutes of CPU time to generate N, as well
as whatever proof of work the servers require.

The sharding prevents a single malicious server from blindly
guessing weak passwords across its entire collection of objects.
It takes two servers colluding to try to recombine their shards.

If recombining two shards yielded data which could be checked to see if
it's valid, then it would become fairly inexpensive to try all combinations
of shards, and obtain all the encrypted keys for further cracking.
So it's
important that there not be any checksum or header in addition to the AES
encrypted data. (AES encrypted data cannot be distinguished from random
garbage except by block size.) Get that right, and with N keysafe users, an
attacker would need to try `2^(N-1)` combinations of shards on average to
find one, and would need to brute force the password of each combination.
With only 20 keysafe users, assuming all users have super-weak passwords,
this attack takes 13107200 years of CPU work `(2^19*25)` to crack one. With
50 users, this jumps to quadrillions of years of CPU work.

Colluding servers can try to correlate related objects based on access
patterns and recombine pairs of those, and then brute force the password.
The correlation needs to be very fine-grained for this to work.
If 15 users' objects are all bucketed together by the correlation,
then the attacker has 16384 combinations to try on average before
finding a correct combination. Multiply by 25 years of CPU work for
cracking only super-weak passwords.

Colluding servers who want to target a particular person
can guess their N1-N3, check if those objects exist on the server, and
begin brute-forcing the password. This is not much cheaper than the same
attack performed by a third-party attacker, except that the attacker
doesn't need to wait for an object to download from the server to check if
it exists.

A state-level entity may try to subpoena the entire contents of keysafe
servers. Once 2 servers are compromised, the state-level entity can try the
same attacks that colluding servers can use (but probably cannot use
attacks involving correlation, because the server operators should not be
retaining the data needed for correlation). Of course, they probably have
many more resources to throw at the problem. But with only 50 keysafe
users, recombining shards and trying super-weak passwords would be
prohibitively expensive, as detailed above.
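The combination-attack estimates above work out as follows (a quick check of the text's arithmetic, assuming the 25 CPU-year super-weak crack cost from the previous section):

```python
# 2^(users-1) shard combinations to try on average, each requiring a
# full password crack before the attacker knows if it was a real key.
def combination_attack_years(users, crack_years=25):
    return 2 ** (users - 1) * crack_years

print(combination_attack_years(20))        # 13107200 years (2^19 * 25)
print(combination_attack_years(50) > 1e15) # True: quadrillions of years
```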
So, a state-level entity
will probably only find it feasible to target particular people.
Since such an attack can be performed by anyone, as discussed above,
there seems to be no actual incentive for a state-level entity to
subpoena data.

A state-level entity's best bet at getting lots of keys is probably to use
its resources to compromise keysafe servers and modify them to log data.
Then a correlation attack can be done as discussed above.

A different kind of attack: legal or extralegal action to get a
particular person's key removed from storage servers to try to
deny them access to it. Or, to take down storage servers entirely.

### Malicious data attack

Two servers could collude to serve up malicious data to try to exploit the
user's system.

For example, if the user is using their gpg key to encrypt emails,
and they restore a different gpg key, they might encrypt with it, and
then what they said could be decrypted by the attacker.

To perform this attack, the attacker first has to manage to crack the
user's password. Then they can replace the objects with malicious
versions, encrypted with the same password.

So, this is not too useful for gpg key replacement, since the attacker
must already know the secret key. However, perhaps they could exploit bugs
in gpg to compromise the user's system.

## Server list

There's a server list shipped with the client, giving each server's tor
onion address and the organization providing it.

Three of the servers in the list are recommended servers.
Shards are stored on these unless overridden by other configuration.

When recovering a key, the client tries the recommended servers first. But,
if it fails to recover the key using those, it goes on to try the other
servers on the list. This way we don't need to remember which servers the
shards were stored on, and can change the recommended servers when
necessary.

See [[servers]] for more on the server list.
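The recovery-time fallback described above can be sketched as follows. The server records, `fetch` callback, and function name are illustrative, not keysafe's actual interfaces.

```python
# Try recommended servers first, then the rest of the list, stopping
# once enough shards have been found to recombine.
def fetch_shards(names, servers, fetch, needed=2):
    recommended = [s for s in servers if s.get("recommended")]
    others = [s for s in servers if not s.get("recommended")]
    found = {}
    for server in recommended + others:
        for name in names:
            if name not in found:
                shard = fetch(server, name)
                if shard is not None:
                    found[name] = shard
        if len(found) >= needed:   # 2 of 3 shards suffice to recombine
            break
    return found
```

Because every server on the list is eventually tried, the client needs no record of where a key's shards were stored.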
## Servers

Servers run exclusively as tor hidden services. This prevents them from
finding out the IP addresses of clients, as well as providing transport
level encryption.

Servers should avoid keeping any logs, and should sanitize
the timestamps of any files stored on them. (Eg, set back to the epoch
and store on a filesystem mounted with noatime.)

Only small objects are accepted for storage. This prevents keysafe
servers from being used as a general purpose data storage system, keeping
them useful only for storing keys.

However, `gpg --export-secret-key` can be hundreds of KB in size
when a key has a lot of signatures etc. While this size can be cut
down to 4 KB using `paperkey` to only include the secret key, it's
desirable to store the public and secret key together. This way,
a user does not have to publish the key to keyservers, which makes some
methods of attacking their key much less likely to be tried.

So, the object size should be at least a few tens of KB. 64 KB seems
reasonable. If keysafe needs to store a larger key, it can chunk it up.

Objects should be padded to a fixed size before they are encrypted, to
prevent attackers from correlating and targeting particular objects based
on size.

## Client-server proof of work

The proof of work prevents servers being flooded with requests.
This is particularly important to prevent misuse of keysafe servers
to store large data on them. It's also somewhat useful to prevent attackers
guessing the name someone used to store a key; but the cost of generating
N from a name makes the server's proof of work only a secondary line of
defense against such an attack. Finally, PoW is useful to protect against
DOS attacks.
Assuming that the client communicates with the server over http:

    PUT /keysafe/objects/N
    GET /keysafe/objects/N

The server's response can be either the result of the request,
or a proof of work requirement, which specifies the difficulty D
(the number of 0's needed), a random salt RS, and the number of argon2
iterations. The client provides the proof of work in a query parameter,
which is a string S such that argon2(N, S+RS) starts with the given
number of 0's.

(The server should only be able to control the number of iterations,
not other argon2 parameters, to avoid forcing the client to use too much
memory. Normally, the server will keep iterations small, probably 1,
since it does need to calculate the argon2 hash once itself.)

The server can use a [token bucket](https://en.wikipedia.org/wiki/Token_bucket)
to throttle requests to a given rate. In fact, there can be a whole
series of token buckets B0,B1,.., for proofs of work of increasing
difficulty.

A request without a proof of work is checked in B0. If that bucket is empty,
the server responds with a proof of work requirement D=1, and
the client makes a new request whose proof of work allows access to B1.
If the server is under load, B1 might also be empty, and so the client
will have to try again with D=2 to access B2, and so on.

If there are 4 buckets, and each bucket refills at the rate of a
token every minute, then the maximum allowed throughput is 4 requests
per minute. If calculating the proof of work takes 2^D seconds on average,
then it will take on average 16 seconds of work to access B4.

The server can generate a different RS for each request, and can
insert them into a bloom filter to keep track of the ones it has given
out. Bloom filter false positives are not a problem, because they are
quite rare, and so it is not efficient for a client to make up an RS in
hope that there will be a false positive.
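The escalating bucket scheme above can be sketched as follows. Bucket sizes, refill rates, and function names are illustrative; a real server would also verify the argon2 proof of work before consulting the buckets.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def take(self):
        # Refill according to elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def admit(buckets, difficulty):
    # A request whose proof of work has difficulty D may draw a token
    # from any of B0..BD. If all are empty, tell the client the next
    # difficulty to try, or None if no higher bucket exists.
    for d in range(min(difficulty, len(buckets) - 1) + 1):
        if buckets[d].take():
            return True, None
    nxt = difficulty + 1
    return (False, nxt) if nxt < len(buckets) else (False, None)
```

Each refusal carries the next difficulty, so a client under server load naturally walks up the bucket series exactly as described above.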
To guard against DOS attacks that reuse proofs of work, the server can
maintain a second bloom filter, putting each RS into it once it's used, and
rejecting attempts that reuse an RS. Since the bloom filter will
(with a low probability) yield a false positive, the server should reject
such an attempt by generating a new RS' and requesting a new proof of work
from the client.

## Avoiding correlations

As noted above, the more objects that are stored, the more secure
keysafe becomes, because it becomes harder to correlate objects
that belong to a user.

Ways the server could draw correlations include:

* Time that the data was stored.
  (Avoid by waiting some period of time between uploading the
  various objects, probably on the order of days. If every keysafe client
  uploads its objects at midnight GMT, the server will find it hard to
  correlate them.)
* IP address used to store the data.
  (Using tor avoids this.)
* When the data was retrieved.
  (Avoid by waiting some period of time between retrieving shards.
  Although this makes key recovery take longer, which could be
  frustrating.)
* A user ID associated with the data.
  (Avoid by allowing anyone to store data, and don't associate data with
  any user ID.)
* Same sized objects may be related shares.
  (Avoid by making all stored objects a standard size.)

## Detecting corrupt data

    ori> if a single server is compromised, it can return bogus data when you request its fragment of the shared secret
    ori> if you only have three servers, you can't know which two to trust, short of just trying
    ori> you end up with three possible ways to reconstruct the secrets, A-B, A-C, B-C. only one is legit.

This could also happen due to error, not compromise. Or even due to
asking the wrong server for an object and getting back a dummy response.

To guard against this, include a sha256sum of the secret key in the
data that is sharded. This way the validity of the restored key can be
verified.
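A minimal sketch of that checksum wrapper, with illustrative function names: the checksum travels inside the sharded data, so a bad shard combination simply fails to verify.

```python
import hashlib

def wrap(secret_key):
    # Prefix the key with its sha256 before sharding.
    return hashlib.sha256(secret_key).digest() + secret_key

def unwrap(restored):
    # After recombining shards, check that the restored bytes are valid.
    checksum, key = restored[:32], restored[32:]
    if hashlib.sha256(key).digest() != checksum:
        raise ValueError("shard combination did not reproduce the key")
    return key
```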
Note that this may make it marginally easier for brute force attacks, since
they can use that checksum to detect when the attack is successful. Only
marginally easier, because it's not hard to fingerprint a gpg secret key.
Even the minimised raw key output by paperkey can be recognised by a parser.

## Versioning

The algorithms and parameters chosen by keysafe could turn out to be
wrong, or need adjusting to keep up with technology. While we'd like to
avoid this by getting the design right, it's important to have a plan in
case it becomes necessary.

The simplest approach would be to include a version number with each shard.
But this causes a problem: when switching to a new version, the early
adopters of that version are a small group, and so it becomes easier to
correlate their shards.

The version number could instead be included in the data that is sharded.
This avoids the above problem, but leads to an even worse problem:
an attacker can look for the version number after combining some random
shards, and if they see it, they know they picked shards that are actually
related. As described in "Attack methods", if an attacker can do that,
it becomes easy for two colluding servers to find the right combinations of
all shards.

A safe alternative to embedding a version number anywhere in the data is
to adjust the parameters of the argon2 hash used to generate the shard
names. Maintain a list of versions and their shard name argon2
parameters (as well as other parameters etc). During key recovery,
keysafe can try each version in turn, use argon2 to generate the
shard names, and see if those shards can be downloaded. Once it finds
shards, it knows the version that created them, and the other parameters
of how the data is encoded.
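That version-discovery loop can be sketched as follows. The parameter table, server callback, and function names are illustrative, and sha256 with a per-version salt stands in for argon2 with per-version memory parameters.

```python
import hashlib

# Hypothetical per-version shard-name parameters.
VERSIONS = [
    {"version": 1, "salt": b"v1-params"},
    {"version": 2, "salt": b"v2-params"},
]

def shard_names(n, params):
    # Derive the three shard names from N using this version's parameters.
    base = hashlib.sha256(params["salt"] + n).digest()
    return [hashlib.sha256(base + bytes([i])).hexdigest() for i in (1, 2, 3)]

def find_version(n, server):
    # server(name) returns the stored object, or None if there is none.
    # Try each version's shard names until enough shards are found.
    for params in VERSIONS:
        shards = [server(name) for name in shard_names(n, params)]
        if sum(s is not None for s in shards) >= 2:   # enough to recombine
            return params["version"], shards
    return None, []
```

Finding downloadable shards under a version's names is what identifies the version, so no version number is ever stored with the data.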
The only downside to this method is that each new version added to the list
makes recovering data from all older versions take 10 minutes longer, as
the argon2 hash has to be run an additional time.

Note that the argon2 hash parameters varied between versions should not be
merely the number of rounds, as that would allow an attacker to hash for
each version's number of rounds in one pass. Instead, vary the hash
memory parameter.

## Ideas

Some ideas for improvements to keysafe.

### Assisted password-based key derivation

An idea from [Nigori](http://www.links.org/files/nigori/nigori-protocol-01.html#anchor15):

If the user has an account at some server, the server can contribute part
of the data used to generate the key encryption key K. So not only is the
user's keysafe password needed for key recovery, but the user also has to
authenticate to the server. As well as adding entropy to K, the server
can do rate limiting to prevent password guessing.

Risks include:

* The server becomes a target to be compromised.
* If the server goes down, the user loses the ability to recover their gpg
  key from keysafe.

### Entangled objects

An idea from Anthony Towns:

Two or more objects stored on a keysafe server can be entangled.
When one of them is requested, the server should delete the others.

This could be used in several ways:

* Keysafe could upload random objects entangled with a key's object,
  and keep local records of the names of the random objects. Then if the
  user wants to stop storing a key on keysafe servers, keysafe can request
  the entangled objects to (hopefully) delete the real objects from the
  server.
* Keysafe could pick or prompt for some tripwire names that an attacker
  might try if they were looking for a user's data. Then an attacker
  risks deleting the data stored on the server. Although this also has
  DOS potential.
* The user could pick a real name and a fake name, and keysafe uploads
  objects for both. If the user is being forced to give up their keysafe
  name and password, they could provide the fake name, and if it were
  used, their data would get deleted from the keysafe servers.