• conciselyverbose@kbin.social
    1 year ago

    They have a fixed-size output, yes. That output is almost always far smaller than the inputs they accept. The fact that they also accept shorter inputs only increases the number of possible inputs, because those come in addition to all the full-length messages. The point is that the input space is a fuckton of orders of magnitude larger than the output space, so by the pigeonhole principle you’re literally guaranteed that collisions exist.
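
    To make the counting argument concrete, here is a minimal sketch. It assumes a toy hash, `tiny_hash`, which is just SHA-256 truncated to 2 bytes (my own illustrative helper, not a real library function), so the output space is small enough to exhaust. Feed it one more distinct input than there are possible outputs and a collision is unavoidable.

```python
import hashlib

def tiny_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Hypothetical toy hash: SHA-256 truncated to n_bytes."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Return two distinct inputs with the same truncated hash."""
    seen = {}
    # There are only 256**n_bytes possible outputs, so 256**n_bytes + 1
    # distinct inputs cannot all map to different outputs.
    for i in range(256 ** n_bytes + 1):
        msg = str(i).encode()
        h = tiny_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    raise AssertionError("unreachable: the pigeonhole principle was violated")

a, b, h = find_collision()
print(a, b, h.hex())
```

    In practice the loop stops long before it runs out of outputs, because of the birthday effect. A full-size hash has the same property, the output space is just too large to search this way.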

    Half your points are specific to cryptographic hashes, which aren’t the only kind of hash or the only useful kind, but since that’s what you’re talking about, fine.

    1. Collisions existing is normal. All you can do is make sure that finding a collision isn’t easier than finding the actual input (for a password application), and that finding a collision with a modified input is hard (for a checksum). The collisions still exist. In some applications of hashing, e.g. semantic hashing, collisions for similar inputs are desirable.

    2. Yes, this is the point of a hash, but it’s not hard to do.

    3. Again, same thing. Deterministic code isn’t that hard to do.

    4. Preventing predictability is the only point of a cryptographic hash (besides, for password hashing, being deliberately heavy to slow down brute force). If there are no systematic flaws that make the distribution of outputs distinguishable from random, your cryptographic hash is doing its job.
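
    Points 2 to 4 can be poked at with a rough sketch using Python's standard `hashlib`. The helper names `sha256_bits` and `differing_bits` are my own, and a single sample like this is only an illustration, not a statistical test of randomness. It checks that hashing the same input twice gives the same digest, and counts how many of the 256 output bits change when one input bit is flipped.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Number of digest bits that differ between two inputs."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

msg = b"hello world"
# Flip the lowest bit of the first byte to get a nearly identical input.
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]

# Determinism: same input, same output, every time.
same_twice = hashlib.sha256(msg).digest() == hashlib.sha256(msg).digest()

# Unpredictability: a one-bit change should flip roughly half the output bits.
diff = differing_bits(msg, flipped)
print(same_twice, diff)
```

    If the output looks random, `diff` should land somewhere near 128 out of 256. A hash that consistently came out far from that on near-identical inputs would be showing exactly the kind of systematic flaw described above.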