• 0 Posts
  • 183 Comments
Joined 4 months ago
Cake day: March 23rd, 2025



  • If you have a hash collision in a cryptography context, you have a broken system. E.g. MD5 became useless for validating files, because anyone can create collisions without a ton of effort, and thus comparing an MD5 sum doesn’t tell you whether you have an unmodified file or not.

    In a hash map, collisions are part of the system. Sure, you’d like to avoid collisions if possible, but if you can’t, you’ll just have two values in the same bucket, no big issue.

    In fact, having a more complex hashing algorithm that would guarantee that there are no collisions will likely hurt your performance more because calculating the hash will take so long.
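
    A minimal Java sketch of the "two values in the same bucket" point. It relies on the well-known fact that the strings "Aa" and "BB" have the same `String.hashCode()`; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Two distinct keys with the same hash code coexist in a HashMap.
class CollisionDemo {
    // "Aa" and "BB" are a well-known pair with identical String.hashCode() values.
    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        // Same hash code, so both keys land in the same bucket.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true
        Map<String, Integer> map = buildMap();
        // The map falls back to equals() inside the bucket, so both lookups still work.
        System.out.println(map.get("Aa") + " " + map.get("BB")); // prints 1 2
    }
}
```

    The collision costs an extra `equals()` comparison on lookup, nothing more.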





  • Who pissed in your coffee?

    Sure you can write some script to interpret the data, but then you need to write an extra script that you need to run any time you step through the code, or whenever you want to look at the data when it’s stored or transferred.

    But I guess you have never worked on an actually big project, so how would you know?

    I guess you aren’t entirely wrong here. If nobody other than you ever uses your program and nobody other than you ever looks at the code, readability really doesn’t matter and thus you can micro-optimize everything into illegibility. But don’t extrapolate from your hobby coding to actual projects.




  • I see what you are saying. But if you aren’t using a cryptographic hash function then collisions don’t matter in your use case anyway, otherwise you’d be using a cryptographic hash function.

    For example, you’d use a non-cryptographic hash function for a hashmap. While collisions aren’t exactly desirable in that use case, they also aren’t bad, and in fact the whole process is designed with them in mind. And it doesn’t matter at all that the distribution might not be perfect.

    So when we are talking about a context where collisions matter, there’s no question whether you should use a cryptographic hash or not.


  • This is about cryptographic hashing functions (to be fair, could have spelled that out in my prior comment, but in general when someone talks about anything security relevant in conjunction with hashing, they always mean cryptographic hashing functions).

    MD5 no longer counts as a secure cryptographic hashing function, for exactly these reasons.

    Also, the example you gave in your original comment wasn’t actually about distribution but about symbol space.

    By multiplying by four (and I guess you implicitly meant that the bit length of the hash stays the same, thus dropping two bits of information) you are reducing the number of possible hashes by a factor of four, because now 3/4 of all potential hashes can’t happen any more. So sure, if your 64-bit hash is actually only a 62-bit hash that just includes two constant 0 bits, then of course you have to calculate the collision chance for 62 bits and not 64 bits.
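
    To put rough numbers on the 62-bit vs. 64-bit case, here is a small Java sketch using the standard birthday-bound approximation. It assumes uniformly distributed hashes; the item count of one million is an arbitrary, hypothetical choice, and the class and method names are made up.

```java
// Birthday-bound approximation: p is about 1 - exp(-n(n-1) / (2 * 2^bits)),
// assuming hash values are uniformly distributed.
class BirthdayBound {
    static double collisionProbability(double items, int bits) {
        double space = Math.pow(2.0, bits); // number of possible hash values
        double exponent = -items * (items - 1) / (2.0 * space);
        return -Math.expm1(exponent); // equals 1 - e^exponent, accurate for tiny exponents
    }

    public static void main(String[] args) {
        double n = 1_000_000; // hypothetical number of hashed items
        double p64 = collisionProbability(n, 64);
        double p62 = collisionProbability(n, 62);
        System.out.printf("64 bits: %.3e%n", p64);
        System.out.printf("62 bits: %.3e%n", p62);
        System.out.printf("ratio:   %.3f%n", p62 / p64);
    }
}
```

    As long as the probability is small, dropping two bits makes a collision roughly four times as likely.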

    But if all hashes are still possible, and only the distribution isn’t perfectly even (as is the case with MD5), then the average chance for collisions doesn’t change at all. You have some hashes where collisions are more likely, but they are perfectly balanced with hashes where collisions are less likely.



  • Symbol names can be outdated as well, but what’s worse is they can be flat-out wrong.

    Real-life example that I had at my last job:

    var isNotX = isX()
    
    // somewhere else in the code:
    
    var isX = isX()
    
    fun isX() {
      // Code returns isNotX
    }
    

    That part of the code had a bug and it wasn’t clear whether the function should return X or not X (the function was much more complex but returned a boolean).

    A comment could have given context and/or be used as parity check for which implementation would have been correct.

    This way I had to step through the whole flow just to figure out what it’s doing and what it’s supposed to do.


  • Harry Potter, especially in the first few books, is really not hard fiction at all. Rowling’s worldbuilding is only there to make for a nice, somewhat magical backdrop for a children’s story. Close to none of the in-universe rules she sets up really work if you look at them hard enough.

    It starts with Wingardium Leviosa (and many other spells) blatantly breaking the laws of thermodynamics, thus allowing for infinite energy generation and thus infinite matter generation, but this continues not only throughout the magic system but also throughout every other system she sets up. Because most of it is nothing but a whimsical caricature of real things.

    The money system is a caricature of the old British pre-decimal £sd money system.

    Quidditch is a caricature of football (thousands of ways to perform a foul), rugby (brutal tackling and violence on the pitch) and cricket (a game can last for months) rolled into one.

    The house system and house cup are only slightly embellished versions of what exists in real-life British boarding schools.

    Just a few examples. The books are specifically not written in a rational-logical way. Attacking that is so easy that it’s just boring. It’s like proving that reindeer noses don’t glow bright or that gingerbread lacks the static properties to be used to build life-sized houses for witches.



  • Not only are generally experienced developers really valuable, but so are developers experienced with the project they work on.

    It takes a long time to actually understand everything in a large project, and if you do, you save a ton of time because you just know a lot of context already. No need to research or figure things out, you just know.

    That’s why the constant reorgs in larger corporations are incredibly hurtful to performance. If you want performance, let people stick to the few projects they know instead of switching stuff around all the time.


  • I know it was just an example, and examples are always contrived. It was just a good example to show how complexity can increase a lot if you use classes/types as glorified comments.

    This total denial of the existence of comments, which is a common attitude right now, can easily do more harm than good, when you add code not for functionality but instead just to have more space for symbol names to put text into.

    The project I worked on in my last job suffered a lot from this. It would lead to some really weird patterns, like e.g. this:

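    // CommandType is assumed here; the original enum isn't shown.
    enum CommandType { CREATE }

    class BaseCommand {
        protected CommandType commandType;
    }

    class CreateCommand extends BaseCommand {
        public CreateCommand() {
            this.commandType = CommandType.CREATE;
        }
    }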
    

    We had dozens and dozens of classes that didn’t contain any actual code, but were just flavour for “readability”. That in turn just created garbage lines of code, and all that trash negated any potential benefits to readability. And since the code was spread out over so many classes, even simple changes required touching tons of files.

    I had, for example, a very simple 1 Story Point task to do. The whole task was “rename a variable”, nothing more than that. I had to change 40 files. Not an exaggeration, it really was 40 files.

    Code is for code, comments and external documentation are for documentation. Don’t abuse one for the other.
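
    A rough Java sketch of what that could look like instead, assuming the command type is the only thing the subclasses differ in. `Command`, `CommandType`, `UPDATE` and `DELETE` are hypothetical names for illustration, not the code from that project.

```java
// Hypothetical alternative: one class plus an enum instead of one empty subclass per command type.
enum CommandType { CREATE, UPDATE, DELETE } // UPDATE and DELETE are made-up examples

final class Command {
    /** The kind of operation this command represents. */
    private final CommandType commandType;

    Command(CommandType commandType) {
        this.commandType = commandType;
    }

    CommandType getCommandType() {
        return commandType;
    }
}

class CommandDemo {
    public static void main(String[] args) {
        Command create = new Command(CommandType.CREATE);
        System.out.println(create.getCommandType()); // prints CREATE
    }
}
```

    The documentation lives in the comment, and adding a new command type means touching one enum instead of adding a file.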


  • You can write comments, but you can’t make your colleagues read them. They don’t necessarily have to visit the originating file to read the docs.

    When do you need documentation? When you are down in the code or when you are sitting on the toilet browsing Confluence? If your goal is to make people read the documentation, then the documentation needs to be immediately there where you need it, not in some external thing like Confluence.

    The same goes if your goal is to make people update the documentation. That’s much more likely to happen if the documentation is in a comment in the code than when you first have to go hunting for the correct page in that steaming pile of mess that is Confluence.

    Just be clear and explicit. It’s not gaming; you don’t have to care about losing a couple extra frames to type out a few extra characters. Most IDEs have sufficient autocompletes so it’s literally not even a problem in many cases.

    You still only have so much screen real estate, and having huge names means that your lines get very long at times, which makes everything really hard to read.


  • Tbh, creating new code just to shorten variable names is pretty bad practice. Don’t do that.

    Each line of code needs to be maintained, each line of code can contain bugs and reusing such a class in locations it wasn’t actually made for can cause more harm than good.

    And if you are adding external information (e.g. via a class) why not just add that information as inline documentation (aka comments) instead? Most IDEs can handle that so that if you hover over the variable/function name it will display the documentation. And pretty much all IDEs allow you to navigate to declaration with just one click, and there you find the necessary information.

    Your example only gets worse if you keep nesting these things. For example, if I have:

    int sleepIntervalSeconds = 0;
    

    Then I immediately know:

    • It’s an int (not a double)
    • It’s an interval used for sleeping
    • It’s in seconds

    (Putting all that in a comment instead of the variable name is almost as visible via the IDE.)
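
    A short Java sketch of both variants side by side. `SleepConfig`, `interval` and `sleepIntervalMillis` are hypothetical names, and hover behaviour depends on the IDE.

```java
import java.util.concurrent.TimeUnit;

class SleepConfig {
    // Variant 1: the name itself carries purpose and unit.
    int sleepIntervalSeconds = 0;

    /** Variant 2: how long to sleep between runs, in seconds. Many IDEs show this text on hover. */
    int interval = 0;

    // Converting at the point of use keeps the unit explicit.
    long sleepIntervalMillis() {
        return TimeUnit.SECONDS.toMillis(sleepIntervalSeconds);
    }
}

class SleepConfigDemo {
    public static void main(String[] args) {
        SleepConfig config = new SleepConfig();
        config.sleepIntervalSeconds = 2;
        System.out.println(config.sleepIntervalMillis()); // prints 2000
    }
}
```

    Either way the type, purpose and unit are one hover or one click away, without an extra class.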

    Instead consider your proposal, which would read like this:

    Intervals var.sleep = 0;
    

    I used var as the variable name since you abstracted the information “sleep”, “interval” and “seconds” into other definitions.

    So now I still know it’s an interval used for sleeping, but both the real variable type and the information that it’s in seconds is gone. I have to navigate to the Intervals class/type to figure that out.

    IRL this often gets even worse, because Intervals probably doesn’t even contain the fields directly, but instead inherits from a Time class, which then inherits from some other class, and then you might get to the actual definition of the field.

    This is only amplified by using Mixins and other obfuscation goodness.

    If you have two options, and one option creates extra code, extra classes and extra code paths without reducing the complexity of the code significantly, then that’s the worse option.