I was interested in Apple’s approach where they would look at checksums of the images to see if they matched checksums of known CSAM. It’s trivial to defeat by changing even a single pixel, but it’s the only acceptable way to implement this scanning. Any other method is an overreach and a huge invasion of privacy.
Maybe, depending on the algorithm used. Some are designed to produce the same output given similar inputs.
It’s also easy to abuse systems like that in order to get someone falsely flagged, by generating a file with the same checksum as known CSAM.
It’s also easy for someone in power (or with the right access) to add checksums of anything they don’t like, such as documents associated with opposing political or religious views.
In other words, still invasive and dangerous.
More thoughts here: https://www.eff.org/deeplinks/2019/11/why-adding-client-side-scanning-breaks-end-end-encryption
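To make the "change a single pixel" point concrete for cryptographic hashes specifically, here is a minimal sketch using Python's hashlib. The 64-byte buffer is a made-up stand-in for raw pixel data, not a real image format, and nothing here is meant to reflect how Apple's system actually works.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical 8x8 grayscale "image": 64 raw intensity bytes.
pixels = bytes(range(64))

# Same data with one pixel nudged by a single intensity level.
tweaked = bytearray(pixels)
tweaked[0] ^= 1
tweaked = bytes(tweaked)

d1 = hashlib.sha256(pixels).digest()
d2 = hashlib.sha256(tweaked).digest()

print(d1.hex())
print(d2.hex())
print(differing_bits(d1, d2), "of 256 digest bits differ")
```

With a cryptographic hash, roughly half the digest bits are expected to flip, so the tweaked data no longer matches any stored checksum. Whether that carries over to a given scanning system depends on the algorithm it uses, as noted above.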
Checksums wouldn’t work well for their purposes if they could easily be made to match any desired checksum. It’s one-way math.
One-way math doesn’t preclude finding a collision.
(And just to be clear, checksum in the context of this conversation is a generic term that includes cryptographic hashes and perceptual hashes.)
Also, since we’re talking about a list of checksums, an attacker wouldn’t even have to find a collision with a specific one to get someone in trouble. This makes an attack far easier. See also: the birthday problem.
Checksums, on the other hand, are designed to minimize the probability of collisions between similar inputs, without regard for collisions between very different inputs.[8] Instances where bad actors attempt to create or find hash collisions are known as collision attacks.[9]
https://en.m.wikipedia.org/wiki/Hash_collision#:~:text=Checksums%2C on the other hand,are known as collision attacks.
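As a rough illustration of the "list of checksums" point above, the sketch below estimates how the chance of hitting any entry grows with the size of the list. It assumes an idealized hash whose outputs are uniformly random, which real perceptual hashes are not, and the 32-bit size and attempt counts are arbitrary toy numbers chosen so the effect is visible. Strictly speaking this models a multi-target search rather than the classic birthday bound, but the intuition is similar: more targets means fewer attempts needed.

```python
import math

def p_hit(bits: int, targets: int, attempts: int) -> float:
    """Probability that at least one of `attempts` random inputs hashes to
    any of `targets` distinct values, for an ideal uniform `bits`-bit hash."""
    p_single = targets / 2**bits
    # 1 - (1 - p)^n, computed in a way that stays accurate for tiny p.
    return -math.expm1(attempts * math.log1p(-p_single))

attempts = 2**20
for targets in (1, 1_000, 100_000):
    print(f"{targets:>7} targets: {p_hit(32, targets, attempts):.6f}")
```

For small probabilities the result is close to attempts * targets / 2**bits, so a list a thousand times longer makes a blind hit roughly a thousand times more likely for the same effort.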
Even this method is overreach: who controls the database?
A journalist has a scoop on a US violation of civil rights? Well, not if it matters to the CIA, who slipped the PDF that was his evidence into the hash pool and had his phone silently rat him out as the one reporting.
This hands ungodly power to those running that database. It’s blind, and it “only flags the bad things”. We all agree CSAM is bad, but if I were in that position I could easily ruin someone inconvenient to me just by ensuring some of his personal, unique photos get into the hash list. It’s a one-way process, so everyone would just believe definitively that this radical MLK guy is a horrible pedo because we got some images off his phone in a diner.
It’s not as easy to defeat as just changing a pixel…
CSAM detection often uses existing image matching technology such as PhotoDNA by Microsoft. Similarly, both Facebook and Google have image matching algorithms and software that are used for CSAM detection.
These are all hash-based image matching tools used for broad feature sets such as reverse image search in Bing, and they are not defeated by simply changing a pixel, or even by redrawing parts of the image.
You’re not just throwing an MD5 or a SHA at an image’s binary. It’s much more nuanced and complex than that, otherwise hash-based image matching would be essentially useless for anything of consequence.
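To show the general idea of a hash that survives small edits, here is a toy "average hash" in Python. This is only an illustration of the perceptual hashing concept: it is not PhotoDNA or any vendor's algorithm, which are far more involved, and the 8x8 grids below are made-up data standing in for a downscaled grayscale image.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is
    brighter than the mean of the whole grid."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical 8x8 grayscale gradient (values 0..252).
image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]

# Copy with a single pixel changed by one intensity level.
edited = [row[:] for row in image]
edited[0][0] += 1

# A very different picture: the same gradient inverted.
inverted = [[252 - p for p in row] for row in image]

print("one pixel changed:", hamming(average_hash(image), average_hash(edited)))
print("inverted image:   ", hamming(average_hash(image), average_hash(inverted)))
```

Matching is then done by Hamming distance against a threshold rather than exact equality. That is what makes single-pixel edits ineffective, and it is also why near-collisions become a concern.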