MobileVet 4 days ago

I wish this discussed the timing arbitration of each move. Based on the packet information (if it is correct and complete), the timing is done entirely on the clients. However, they show the time in seconds, which can't be right, so I am curious how accurate this packet schema is (or whether those are float values).

Regardless, one thing I find maddening about chess.com is the time architecture of the game. I haven't seen the underlying code, but it feels like the SERVER is tracking the time. This completely neglects transport time and latency, meaning that 1s to move isn't really a second. Playing on the mobile client is an exercise in frustration if you are playing timed games and down to the wire. Even when you aren't, your clock will jump on normal moves, and it is most obvious during the opening.

This could also be due to generally poor network code. The number of errors I get during puzzles is also frustrating. Do they really not retry a send automatically?? <breath>

Chess.com has the brand and the names... but dang, the tech feels SO rough to me.

  • bluecalm 3 days ago

    Chess.com software might be the worst public-facing software ever assembled. During their most popular weekly tournament (by number of spectators), Titled Tuesday, where a significant % of the world elite regularly competes, they send links in a public chat to a third-party site every 4 rounds. The reason is that there is a few-minute break and they failed to implement a clock on their side, so they need a third-party service for that.

    This is one of the many, many things but imo it's the most telling. They can't even add a clock counting down the 6 minutes to their web client.

    • luisgvv 3 days ago

      I can't believe this, but it makes sense now lol. I think I heard a streamer say it was for kicking out the cheaters.

      • bluecalm 3 days ago

        Another thing is that you need to click around the chat area just before the break ends (and you need to monitor when the break ends on that 3rd party site) so the chess.com server won't throw you out of the tournament for inactivity :)

        >>I think I heard a streamer say it was for kicking out the cheaters

        I can't see how it can possibly help. Maybe he meant something else?

  • pshc 4 days ago

    > it feels like the SERVER is tracking the time

    TBH this is what I expected for all online chess. How else to reconcile the two players' differing clocks and also prevent client-side cheating?

    • MobileVet 4 days ago

      I guess my naive frustration comes from crazy FPS games tracking things so precisely, and yet somehow Chess.com can't handle a turn-based game?! Honestly.

      I do recognize that FPS games utilize predictive algorithms and planning to estimate future player positions, but still, turn-based networking with 100ms accuracy should be a solved problem.

      • ajuc 3 days ago

        Bullet chess is almost an RTS ;) You need starcraft-like micro ;)

      • pshc 4 days ago

        Yeah, honestly I agree. It would be nice if they switched to WebRTC or UDP.

        • palata 3 days ago

          So you need to send a few bytes of information every few seconds, but you want to spam the network with UDP packets containing those few bytes?

          • sokoloff 3 days ago

            Sure; why not? If I can stream videos on that same network for entertainment, or play another online, multiplayer game, why not use UDP if it gives a better user experience? It's not like UDP transport of those bytes is significantly worse network-wise and certainly lighter weight than those other alternatives.

          • lomase 3 days ago

            Every single online multiplayer game that cares about latency reimplements a subset of HTTP over UDP.

            • gs17 3 days ago

              >reimplements a subset of HTTP over UDP.

              TCP, although I like to imagine FPS games where shooting someone sends "DELETE /players/n00b HTTP/1.1" to the server.

              • lomase 2 days ago

                Yes TCP!!!!

            • WJW 3 days ago

                Huh, that's not what I would expect at all. If you are building custom network protocols anyway, why deal with all the overhead that even a subset of HTTP brings? You might as well make an entirely new protocol at that point.

              • lomase 2 days ago

                  It's TCP. Why does everybody do it, you ask?

    • nightowl_games 4 days ago

      Netcode dev here. Predicting the clock is a trivially solved problem. The client and server know the latency between each other, the server can offset the timestamp on the input from the client to compensate for this difference, and the client can offset its rendering of the clock data from the server. The same techniques used in regular online gaming would apply here. The only X factor is the impact of the client lying about its latency to the server; perhaps that could have an impact, not sure.
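
      A minimal sketch of that kind of offsetting (hypothetical names, not anyone's actual netcode): the client keeps a decaying average of ping round trips and shifts the running side's clock by the estimated one-way latency before rendering.

        // Hypothetical latency-compensated clock rendering.
        interface ClockUpdate {
          white: number; // remaining time in ms, as reported by the server
          black: number;
          running: 'white' | 'black';
        }

        class PredictedClock {
          private rttMs = 0; // decaying average of ping round-trip times

          onPong(pingSentAt: number): void {
            const sample = performance.now() - pingSentAt;
            this.rttMs = this.rttMs === 0 ? sample : 0.9 * this.rttMs + 0.1 * sample;
          }

          // Shift the running side's clock by the estimated one-way latency,
          // so the rendered time roughly matches the server's view.
          render(u: ClockUpdate): ClockUpdate {
            const adjusted = { ...u };
            adjusted[u.running] = Math.max(0, adjusted[u.running] - this.rttMs / 2);
            return adjusted;
          }
        }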

      • crashbunny 4 days ago

        > The only X factor here is the impact of the client lieing about its latency to the server, perhaps that could have an impact, not sure.

        On lichess it does have an impact. Lichess has a thing they call lag compensation, where the server can add time to a player's clock after the server receives their move.

        The goal is to make it fair for someone with high lag playing someone with low lag.

        I don't know the exact cheating method used. I'll have a guess, though. What if someone spent a few seconds looking at the board before making their move, and then added (edit: oops, subtracted) a few seconds to their clock in their response packet? The server would see that the client made their move instantly based on the time in the response packet, but it took a few seconds for the server to receive the packet, i.e. lag. So it might add time to compensate for the perceived lag.

        Lag compensation cheating is a frequent topic on the lichess forums.
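
        A rough sketch of that style of compensation (hypothetical, not lichess's actual algorithm): the server treats any arrival delay beyond the client's claimed thinking time as lag and refunds it, capped per move, which is exactly what an under-reporting client could exploit.

          // Hypothetical server-side lag compensation, capped per move.
          function settleMove(
            sentAtMs: number,       // server clock when it sent the position
            receivedAtMs: number,   // server clock when the move arrived
            claimedThinkMs: number, // thinking time reported by the client
            remainingMs: number,
            maxCompMs = 1000
          ): number {
            const observedMs = receivedAtMs - sentAtMs;
            // Whatever the client didn't claim as thinking time looks like lag.
            const perceivedLagMs = Math.max(0, observedMs - claimedThinkMs);
            const refundMs = Math.min(perceivedLagMs, maxCompMs);
            // A client that under-reports its thinking time inflates the refund.
            return remainingMs - observedMs + refundMs;
          }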

    • palata 3 days ago

      > How else to reconcile the two players' differing clocks and also prevent client-side cheating?

      Is there a point in preventing cheating, really? I can just make a bot...

    • MichaelZuo 4 days ago

      It hasn’t been done client side in any pvp game I’ve heard of.

      • stevage 4 days ago

        I'm pretty sure freechess.org did.

        • MichaelZuo 4 days ago

          How is it being done client side?

          • compiler-guy 3 days ago

            Freechess shipped a binary called “timeseal” that did the calculations for you and encrypted the communications. It was not foolproof, not by a long shot, but it also didn’t completely suck.

            You can read about what became timeseal here. https://eprint.iacr.org/2004/203.pdf

          • stevage 4 days ago

            Well it's a long time since I played there. But it had custom chess clients, which I assume just recorded how much time your move actually took and sent that with the move.

            Yes, it's easy to cheat with this, but it's very easy to cheat with chess anyway.

            • 4star3star 3 days ago

              This makes the most sense. Start a timer when the UI actually hands control to the player whose move it is, stop the timer when they've completed their move, and simply subtract that from their remaining time. The interval gets sent to the server and relayed to the other player to update their opponent's clock accurately.
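
              A tiny sketch of that client-side timing (hypothetical names, not freechess/timeseal code):

                // Start when the UI hands control to the player, stop on the move,
                // subtract the interval locally, and send it along with the move.
                class MoveTimer {
                  private startedAt: number | null = null;

                  onTurnStart(): void {
                    this.startedAt = performance.now();
                  }

                  onMoveMade(remainingMs: number): { remainingMs: number; elapsedMs: number } {
                    const elapsedMs =
                      this.startedAt === null ? 0 : performance.now() - this.startedAt;
                    this.startedAt = null;
                    return { remainingMs: Math.max(0, remainingMs - elapsedMs), elapsedMs };
                  }
                }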

    • bongodongobob 4 days ago

      Track the two clients' pings? What client-side cheating prevention would you need to do in chess? Afaik you can't cheat by clipping through walls or jumping around on the map.

      • connicpu 4 days ago

        The client-side cheating would be lying about when you received the packet in order to give yourself more time to think. Even if you only shifted it by 200ms per move, that could add up to a lot over the course of a long game.

        • kaoD 4 days ago

          To give additional context: bullet chess can go down to 1 minute per player. Lying about a few milliseconds per move there is huge.

      • HDThoreaun 4 days ago

        Cheat by giving yourself more time

  • pengowray 4 days ago

    > they show the time in seconds which can't be right

    Seems right.

    If you export/download games from lichess, they use the .pgn (Portable Game Notation) format, which is a standard plain-text format circa 1993, used by pretty much everyone for describing a chess game.

    Lichess follows the specification to the letter, and as it technically only allows one-second accuracy, lichess only records moves with one-second accuracy. It seems insane, but that's how they do it.

    Chess.com also exports PGN files, but they add a decimal place, allowing subsecond accuracy. No one has a problem with this. There is no software which cannot handle this. But Lichess refuses to "break" the spec.

    lichess PGN export example:

    > 1. d3 { [%eval -0.15] [%clk 0:01:00] } 1... g6 { [%eval 0.04] [%clk 0:01:00] }

    Chess.com PGN export example:

    > 1. d4 {[%clk 0:02:58.6]} 1... b6 {[%clk 0:02:59.2]}
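
    As a quick illustration that the extra decimal is harmless to consumers, a hypothetical parser (not from either site's code) handles both forms the same way:

      // Parse a [%clk H:MM:SS] or [%clk H:MM:SS.d] annotation into milliseconds.
      function parseClk(tag: string): number | null {
        const m = tag.match(/\[%clk\s+(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\]/);
        if (!m) return null;
        const [, h, min, sec] = m;
        return (Number(h) * 3600 + Number(min) * 60 + Number(sec)) * 1000;
      }

      parseClk('[%clk 0:01:00]');   // 60000
      parseClk('[%clk 0:02:58.6]'); // 178600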

    • kibwen 4 days ago

      > lichess only record moves with one-second accuracy

      According to this blog post, this doesn't appear to be the case since at least 2017:

      https://lichess.org/@/lichess/blog/a-better-game-clock-histo...

      "Move times are now stored and displayed with a precision of one tenth of a second. The precision even goes up to one hundredth of a second, for positions where the player had less than 10 seconds left on their clock."

      • pengowray 4 days ago

        Interesting. Thanks for the correction and link. I'll note, though, that the .pgn downloads still only show one-second precision, as do the game PGNs in lichess's "open database" archive.

  • Scene_Cast2 4 days ago

    What I'd love is for my pre-moves to be sent to the server immediately so I don't time out when I pre-moved.

    • fbernier 4 days ago

      What's interesting about this is that chess.com allows you to stack as many pre-moves as you like but each costs 0.1s, whereas on lichess you can only have one pre-move, which is technically free but maybe not in practice because of delay.

    • y-curious 4 days ago

      The worst part is they call it an intentional choice. "First off, premoves take 0.1 seconds. That is what has been preferred and agreed upon by most professional players we have consulted on the topic. They prefer .1 to .0 for premove. This is also what other chess servers do."[1]

      It's super annoying and the reason I only play blitz+ on chesscom.

      [1]https://www.chess.com/forum/view/help-support/mate-in-one-qu...

      • CamelCaseName 3 days ago

        Well, I'm not sure about "most professional players" but I strongly prefer .0 to .1

    • KolmogorovComp 3 days ago

      That would introduce other issues, I think. Since premoves are cancellable/changeable, what happens if you changed one at the very last moment but, due to delay, the change did not reach the server in time?

    • ycombinete 3 days ago

      This is how it works on Lichess

  • bongodongobob 4 days ago

    I can't play bullet on chess.com for this reason. Lost way too many games on "time" even though I had a second or two on the clock. Incredibly frustrating.

  • mkagenius 4 days ago

    Vladimir Kramnik agrees with your observations about chesscom.

    • tkahlrt 4 days ago

      Yes, he had timing problems in an online tournament on chess.com (against a Mexican GM in the same room) where his computer did not have all Windows updates and/or the timezone was wrong.

      chess.com confirmed the issue.

    • chongli 4 days ago

      I'm surprised to see anyone bring him up here!

      • sourcepluck 4 days ago

        You're surprised that Kramnik is mentioned when the discussion topic is related to chess? I don't understand why. He's well-known in chess (and in chess memeland).

        • chongli 3 days ago

          Kramnik is a former world champion who has taken a torch to his own reputation by accusing tons of people of cheating without evidence. He’s been banned as a regular columnist on chess.com after using his column as a platform to attack people. He has next to no credibility on any chess issues these days.

          • rjatran 3 days ago

            A ban just means that a group of bureaucrats decided to take public action against a specific person and not against others.

            Carlsen, Nakamura and chess.com itself have participated in the Niemann witch hunt. No evidence of over-the-board cheating has ever been provided.

            None of the accusers is banned.

            Kramnik did the same as they did (by indeed going too far).

            • watwut 3 days ago

              I think there is a difference between "accusing tons of people" and "using his column as a platform to attack people" vs "accusing that one person in one instance".

              It is actually perfectly fine to not ban people doing something wrong one or two times while banning people doing that exact thing for the fifth time.

              • krlamn 3 days ago

                No, the difference is between a well connected mob going after a single person (Niemann) vs. a single person going after several.

                The former is always excused, presumably because people's ancient group instincts kick in. The latter is a single heretic that must be destroyed.

                The group does much more damage than the individual. Everyone already thought that Kramnik's pseudo-mathematical evidence was garbage, so why the ban? The answer is that he mentioned well connected people like Nakamura, so the rogue nail needed to be hammered in.

                Groups are never punished, and 95% of Internet commenters always excuse the group.

                • chongli 3 days ago

                  No, Kramnik deserved a ban because he was going after not-well-connected, young, up-and-coming players who have no other recourse to defend themselves. Nakamura made it very clear that he was not going to tolerate this because it jeopardizes the future of the game.

                  Niemann was already an admitted cheater who had been previously sanctioned for his activities. Carlsen’s accusations against him may have been unfounded for the specific game in question but Niemann’s reputation was already blemished. Niemann’s lawsuit was inexcusable though, as it was essentially a SLAPP [1] designed to hinder everyone’s efforts to get cheating out of chess. Thankfully the lawsuit was unsuccessful and Niemann continues to play chess.

                  [1] https://en.wikipedia.org/wiki/Strategic_lawsuit_against_publ...

                • progbits 3 days ago

                  Three new accounts just so you can keep agreeing with yourself, huh?

    • nih 4 days ago

      Interesting

galkk 4 days ago

So essentially lichess chose the StackOverflow approach: (rather) beefy servers instead of "treating them like cattle".

Interesting that they accumulate and periodically store game state. Unfortunately it is not very clear where they store ongoing game state: in Redis or on the server itself. Also, the cost breakdown doesn't have a server for Redis, only for the DB.

BTW, their GitHub has a better architectural picture than the overly simplified one in the article: https://raw.githubusercontent.com/lichess-org/lila/master/pu.... Unfortunately, I'm afraid, drawing something like that during an interview may not land a job at FAANG =(

Note that their cost per game is fairly low: $0.00027, or 3,671 games per dollar.

Their cost breakdown, for those who are curious: https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...

p.s. I'm not saying that Lichess's approach is the best or FAANG's is the worst. Remember, lichess had a 10-hour outage precisely because of the architecture chosen (single-datacenter dependency). https://lichess.org/@/Lichess/blog/post-mortem-of-our-longes... . And outages like that are exactly the reason why multi-datacenter and multi-region architectures are drilled into FAANG engineers.

My point is that there are cases where this approach is legit, but the typical interview is laser-focused on different things and most probably won't appreciate the "old style" approach to the problem. I'm sure that if Thibault ever decides to land at a FAANG, he will do neither whiteboard coding nor system design.

  • winrid 3 days ago

    The downtime here is mostly OVH's fault. They're not known for fast support on hardware failures; that's why they're cheap. If they had this architecture on AWS EC2 and could just spin up a new AMI, they'd only have a few minutes of downtime, and the same simple architecture.

  • juujian 4 days ago

    I remember Meta having a few outages of their own. And outlook as well. So I'm not sure what to think now. But sure, on paper FAANG is redundant and hence better.

    • xmprt 4 days ago

      In my experience, issues scale exponentially with scale. So handling 10x the traffic might mean 100x the potential issues. Redundancy helps with that, so when something inevitably fails, the architecture is able to automatically recover and the end user doesn't see any degradation. So what works for lichess wouldn't work for Meta.

  • benediktwerner 4 days ago

    Redis runs on the main server, where lila runs, as indicated in the diagram you linked. And moves are buffered in lila. Redis is only used for pub-sub.

  • jeanlucas 4 days ago

    Roughly 3,600 games per dollar? I have over 30k games... Time to pay up.

  • epolanski 4 days ago

    > Unfortunately, I'm afraid, drawing something like that during interview may not land a job at faang =(

    Yet another reason to be skeptical of the quality of hiring at FAANG, if anything.

    • immibis 4 days ago

      Why feel anything about it at all? You work at FAANG: be glad for the money or quit if there isn't any. You don't work at FAANG: bad hiring makes it easier for you to get hired and make money.

      • epolanski 4 days ago

        You haven't considered the third option: couldn't care less about working at these companies for various reasons (personal, financial, geography, CV, or whatever).

        My criticism was mostly of the very poor metrics these companies have introduced for hiring, albeit I can understand that, given the gigantic number of applications they get, a mechanism for removing false positives is acceptable even if it produces false negatives.

        And even more so that it has spread to companies that do not have their problems and can't afford false negatives.

      • simplify 4 days ago

        This is a limited, self-centered way of thinking (not selfish, just "self" in the neutral sense of the word).

        Looking at second-order effects, many companies look up to FAANG for "best practices", which often means blindly copying their hiring practices. Without anyone feeling or voicing healthy skepticism, the software hiring world becomes a worse place overall.

perihelions 4 days ago

- "While these moves could be calculated client-side, providing them server-side ensures consistency - especially for complex or esoteric chess variants - and optimizes performance on clients with limited processing capabilities or energy restrictions."

Just a wild guess: it might be intended to lower the implementation barrier for new open-source software clients on new platforms, and/or preempt them from introducing subtle logic bugs that only show up much later.

The rules of chess are a bit tedious to implement, and you can easily get tired and code an edge-case bug that's almost invisible. Lichess itself did this—it once had a logic error that affected a very tiny number (exactly 7) of games,

https://github.com/lichess-org/database/issues/23 ("Before 2015: Some games with illegal moves were recorded")

(I apologize I couldn't find the specific patch that fixed this)

  • xmprt 4 days ago

    For those curious about the illegal move, it seems like it's allowing queenside castling through the kingside rook (or vice versa), e.g. if this is the first rank, R _ _ R K _ _ _, then you could make the move O-O-O and end up with _ _ _ R K _ _ _

    Naturally, it's not possible to view this move anymore, but this game (https://lichess.org/XDQeUk6j#48) has everything up until the last legal move right before the illegal castling happened.

    • ARandumGuy 4 days ago

      I can see why that only appeared in 7 games. It's pretty rare to see a rook in between a king and another rook that are otherwise legally able to castle. Even rarer for someone to get into that position and actually try to castle.

      Also that linked game is pretty entertaining. It's not a good game, but it can be fun watching lower ranked players make moves that you'd never see in higher level games. Like, who plays Bb5+ against the Scandinavian? Amazing stuff.

    • complexworld 4 days ago

      Wouldn't the bug with queen side castling end up with _ _ K R _ _ _ _?

  • ARandumGuy 4 days ago

    Another wild guess: Lichess could be pre-calculating and caching the legal moves for the most common chess positions. While pre-calculating every possible legal move for every position would be impossible, you could pre-calculate the most common openings and endgames, which could cover a lot of real-world positions. This cache could easily be larger than practical for the client, but a server could hold onto it no problem. This could save on net processing time, compared to the client determining all legal moves for every position.
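
    Purely speculative, but such a cache could be as simple as a FEN-keyed map in front of the move generator (hypothetical names; the reply below argues it would likely be counterproductive):

      // Speculative sketch: legal-move lists keyed by FEN, falling back
      // to the move generator on a cache miss.
      const legalMoveCache = new Map<string, string[]>();

      function legalMoves(fen: string, generate: (fen: string) => string[]): string[] {
        const hit = legalMoveCache.get(fen);
        if (hit) return hit;
        const moves = generate(fen); // e.g. computed server-side
        legalMoveCache.set(fen, moves);
        return moves;
      }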

    • Sesse__ 3 days ago

      Given that a good chess move generator will work in way less than a microsecond (TBH, probably even less than taking a DRAM lookup for a large hash table), and most chess positions have never been seen before, having a cache sounds counterproductive.

  • epcoa 4 days ago

    > and/or preempt them from implementing subtle logic bugs that only show up much later.

    Validating a submitted move is distinct from listing valid moves. I assumed the server would need to validate regardless of providing a list to the client.

    • perihelions 4 days ago

      It's still duplicated work, and clients are likely to get it wrong and create more work for devs on both sides.

  • benediktwerner 4 days ago

    From what I remember, one of the main reasons was also to avoid bloating the JS on the game page. That page is kept especially slim to maximize performance and load times for low-powered devices.

    • ngcc_hk 3 days ago

      Great!

      A bit of a surprising consideration … is that even common in these days of overly fancy websites?

hyperhopper 4 days ago

I wish the article explained how it dealt with message loss from the at-most-once Redis pub/sub channel.

  • benatkin 4 days ago

    Indeed, it does deal with the message loss. I was momentarily confused because in my many thousands of bullet chess games on Lichess I haven't had much if any message loss that can be attributed to Lichess's servers (but plenty when my Internet connection is down or unstable).

    I will have to take a look, because whatever it's doing, it works very well!

    • crabmusket 3 days ago

      The at-most-once delivery could be an issue if lichess's backend services (lila or lila-ws) crash. Presumably this is a rare enough occurrence that message loss is more of a theoretical concern.

  • MathMonkeyMan 4 days ago

    I have no idea, but the in-house pub/sub tech at a previous job used [PGM][1] together with some hand-written brokers and a client library. The overall delivery guarantee is at-most-once, but in over ten years and across tens of thousands of machines in multiple datacenters, they never saw a single dropped message. Not sure how they measured that, but I was told the measurements were accurate.

    Well, except for that one major outage where everything shit the bed due to some misconfiguration of IP multicast in the datacenters, or so I was told.

    So, maybe if your mission isn't life critical, you can just wrongfully assume exactly-once delivery.

    [1]: https://en.wikipedia.org/wiki/Pragmatic_General_Multicast

  • DylanSp 4 days ago

    I was hoping for that too, that's the kind of interesting architectural question I wanted this article to answer.

d4rti 4 days ago

I suspect the “l” parameter is for observed latency, as the client displays the latency it observes to the server.

  • lxgr 4 days ago

    Lichess also compensates for latency to some extent.

    To do that, the server needs some measure of “how long does the client think the player actually took to make a move”, to later subtract latency not attributable to actual thinking from the clock.

zxilly 4 days ago

I wonder why this protocol needs an ack. A WebSocket wrapped in TLS should be perfectly capable of guaranteeing the integrity of the message.

  • parl_match 4 days ago

    That just means that the message hit the TLS terminator. It doesn't mean that the backend logic received the state change.

  • andai 4 days ago

    You can verify this with ten lines of code and clumsy (a tool for simulating packet loss).

    I tried this and not all the messages I sent arrived.

    • enneff 4 days ago

      What do you mean? If you open a WebSocket connection, it should behave like a normal TCP connection. All sent data is guaranteed to be delivered complete and in order, unless the connection fails.

      • mananaysiempre 4 days ago

        Unless the connection fails, at which point you have no idea when it failed. You know that the other side received all stream offsets within [initial, X] with X ≥ last received ACK, but other than that you have no idea what X is. Even getting the last received ACK value out of whatever API or upper-level protocol you’re using could be nontrivial, because people rarely bother.

      • andai 3 days ago

        I think I had it set up to auto reconnect. So I suppose the packets sent between "failure occurs" and "socket disconnected" were lost.

        At any rate, my conclusion was disappointment: if I actually want reliability, I need to implement my own ACKs anyway, meaning I'm paying a pretty high overhead for no benefit.

        At least now there's UDP in browser with WebTransport. I haven't tried it yet, but I hear it's a lot more pleasant than the previous option WebRTC, which was so convoluted (for the "I just want a UDP socket" usecase) that very few people used it.
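
        For what it's worth, the application-level ACKs don't need much code. A minimal sketch (hypothetical; the lichess client does something similar with its `ackable` registry): each outgoing message carries an id and is resent on a timer until the server echoes it back.

          // Hypothetical ack/retry layer over a plain WebSocket.
          class AckingSocket {
            private pending = new Map<number, ReturnType<typeof setInterval>>();
            private nextId = 1;

            constructor(private ws: WebSocket, private retryMs = 1500) {
              ws.addEventListener('message', ev => {
                const data = JSON.parse(ev.data);
                if (data.t === 'ack') this.onAck(data.id); // assumed server reply shape
              });
            }

            send(payload: object): void {
              const id = this.nextId++;
              const msg = JSON.stringify({ ...payload, id });
              this.ws.send(msg);
              // Resend until acked; real code would also cap retries.
              this.pending.set(id, setInterval(() => this.ws.send(msg), this.retryMs));
            }

            private onAck(id: number): void {
              const timer = this.pending.get(id);
              if (timer === undefined) return; // already acked or unknown id
              clearInterval(timer);
              this.pending.delete(id);
            }
          }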

  • augusto-moura 4 days ago

    Maybe authorization, illegal moves? Don't know the full protocol to know how they handle edge cases. They might just return a NACK

  • enneff 4 days ago

    So that the client knows the message has been delivered and handled by the server, which can make the UI indicate the state of the connection.

burgerquizz 4 days ago

How would you protect your WebSocket server? I am building a game, but when I put the domain behind Cloudflare (free plan), I get added latency (3x slower) on the player events.

Saw CF has some paid solution, but I was wondering about a free one.

  • NathanFlurry 4 days ago

    I've been managing game servers that get attacked on a daily basis for almost a decade. I've tried Cloudflare a few times (on their business plan) and seen poor results every time.

    Cloudflare has a lower latency product called Argo Smart Routing [1]. When we tried Argo in 2020, we still saw 10+ ms increased latency across the board, which is unacceptable for competitive multiplayer games. That said, Discord voice still (or used to) uses Argo for voice, so there are certainly less latency-sensitive games where it would work well.

    The other issue with sockets over Cloudflare (circa 2020 on the business plan) is that they get terminated liberally, with the assumption that you have a reconnection mechanism in place. I'd imagine this is acceptable for traditional WebSocket use cases, but not for games.

    Services like OVH & Vultr also advertise "DDoS protection for games," but I've found these to be pretty useless in practice. We can only measure traffic that reaches our game servers, so I have no way of knowing if they're actually helping at all.

    Your best bet is getting familiar with iptables and fine-tuning rules to match your game's traffic patterns. Thankfully, LLMs are pretty good at generating these rules for you nowadays if you're not already familiar with these tools. Make sure to set up something like node-exporter to be able to monitor attacks and understand where things go wrong. There have been a few other posts on HN in the past that go into more depth about game server DoS mitigation [2] [3].

    I built something in the same vein for my startup (Apache 2.0 OSS, steal our code!) [4] that runs a series of load balancers in front of game servers in order to act like a mini-Cloudflare. In addition to the basics I already listed, we also have logic under the hood that (a) dynamically routes traffic to load balancers and (b) autoscales hardware based on traffic in order to absorb attacks. We're rolling out a dynamic bot attack & mitigation mechanism soon to handle more complex patterns.

    [1] https://www.cloudflare.com/application-services/products/arg...

    [2] https://news.ycombinator.com/item?id=35771466

    [3] https://news.ycombinator.com/item?id=28675094

    [4] https://github.com/rivet-gg/rivet

immibis 4 days ago

As I understand, the separation between Lila and Lila-ws is primarily for fault isolation rather than independent scaling. Maybe independent scaling becomes useful if websocket overhead exceeds what one machine can handle.

jackcviers3 4 days ago

And scalachess is written in Scala, to piggyback off a post earlier this month that claimed the language is dead. The project is very successful and has been around and maintained for years.

  • valenterry 4 days ago

    If all the Rust people knew how nice Scala 3 is as a language... they would be surprised.

    What still isn't great is the ecosystem and the build-tooling compared to Rust (part of it because of the JVM). But just language-wise, it basically has all the goodies of Rust and much more. Ofc. it's easier for Scala to have that because it does not have to balance against zero-overhead abstraction like Rust does.

    Still, Scala was hyped at some point (and I find it wasn't justified). But now the language is actually one of the best, if not the best, of the very-high-level languages that are used in production and not just academia. It's kind of sad to see that it does not receive more traction, but it does not have the marketing budget of, say, golang.

    • ackfoobar 3 days ago

      I think the incompatibilities burned a lot of the good will. I'm very fluent in Scala 2, but I will avoid Scala if I can, mostly to stay away from purely functional programmers.

      > all the goodies of Rust

      Does it prevent me from using a non-thread-safe object in multiple threads? Or storing a given object which is no longer valid after the call ends?

      Does it have a unified error handling culture? In Scala some prefer exceptions (with or without `using CanThrow`), some prefer the `Either` (`Result`) type.

      Does it have named destructuring?

      • valenterry 3 days ago

        Yeah, that's true. Scala 2 allowed a lot of weird things and sometimes even nudged people into the direction of overengineering and writing cryptic code. I'm not surprised a lot of people were burned.

        Basically, you needed a good and experienced developer from the start of a project for it to be a nice code base.

        > I'm very fluent in Scala 2, but I will avoid Scala if I can, mostly to stay away from purely functional programmers.

        There is the whole [Li Haoyi](http://www.lihaoyi.com/) ecosystem in Scala that is much more python-like, but nicely designed, statically typed and using immutable datastructures by default. I think it's the best you can get nowadays if you want to have immutable datastructures on the JVM. Any other option I've ever tried was way worse.

        If you are fine with Java's stdlib then I guess Kotlin is the better choice.

        > Does it prevent me from using a non-thread-safe object in multiple threads?

        I would answer the question with yes, but maybe in a different way than you might expect. Scala prevents problems/bugs from using a non-thread-safe object in multiple threads by simply having immutability by default. Rust cannot do that (due to performance) so it has to have another way (the borrow checker). I would argue that the Scala way is better if you don't need the performance / memory-efficiency of rust and can live with garbage collection. That reduces the domains that you can use Scala for, but in exchange the code will be simpler compared to Rust code, so in those domains Scala will have the advantage but it's a minor one.

        > Or storing a given object which is no longer valid after the call ends?

        To this one I would say "in practice yes". Rust is better here, but when using e.g. [ZIO Scope](https://zio.dev/reference/resource/scope/) the problem doesn't really exist. You can technically still do something like that, but you would basically have to do it intentionally. Rust has the advantage here, but it's a minor one.

        > Does it have a unified error handling culture?

        No, Scala has no unified culture. Maybe the situation is better than in Rust, but then Rust has its own problems. [Just a few days ago I found a comment about a problem caused by a hardcoded panic that caused issues](https://github.com/orgs/meilisearch/discussions/532#discussi...).

        > Does it have named destructuring?

        Unless we are talking about two different things, yes it does. I would even argue that Scala is more powerful here, because it also supports local imports and (with Scala 3) exports. So not only can you extract fields of an object into a variable, you can also generally bring them into scope and alias them at the same time, but you can do the reverse as well: [you can export them as well](https://docs.scala-lang.org/scala3/reference/other-new-featu...).

        • ackfoobar 2 days ago

          > whole [Li Haoyi](http://www.lihaoyi.com/) ecosystem in Scala

          He has very good taste. I wish more Scala people were like him.

          > simply having immutability by default

          Not everything is a pure data structure. I called a gRPC streaming callback with multiple threads in Scala (got garbled result in the receiver). You can say this is the fault of using the Java API, but the more Scala solution (fs2) involves serializing the access under the hood which is not cheap.

          Recalling that my contention is with "all the goodies of Rust".

          >> named destructuring

          > Unless we are talking about two different things, yes it does.

          I guess we are. I complained about this quite some time ago: https://news.ycombinator.com/item?id=31399737

          • valenterry 2 days ago

            > Not everything is a pure data structure. I called a gRPC streaming callback with multiple threads in Scala (got garbled result in the receiver). You can say this is the fault of using the Java API, but the more Scala solution (fs2) involves serializing the access under the hood which is not cheap.

            Well yes, that's what I'm saying: in Scala you sometimes have to sacrifice performance. Though I don't think that serialization is generally required just because you use e.g. fs2 or ZIO for streaming.

            > I guess we are. I complained about this quite some time ago: https://news.ycombinator.com/item?id=31399737

            You are making a fair point. I'm also not happy with some of the decisions about pattern-matching.

            You can resolve some of those issues by just using `import` locally, though. So for example, if you have a case class X with many fields and you want to access many of them, you don't need to extract them all in pattern matching (and deal with _ for the stuff you don't need) but you can rather do `val myX = X(...); import myX._` and then just use the fields.

            That is basically equivalent to doing `const {a, b, c} = myX` in typescript. You don't need to do `case X(a, b, c, _, _, ...)` in Scala here.

            Not saying that this solves all issues, but I think the ergonomics of Scala in terms of pattern matching, destructuring and scoping/importing are generally/overall not worse than in Rust, at least not significantly so - that's why I think it's fair to say that Scala has the same "goodies" in this area. I did not mean to say that Scala is as good as or superior to Rust in all language features.

    • kriiuuu 3 days ago

      https://bleep.build is a very promising tool for building Scala projects. I like it more than I like cargo

      • valenterry 3 days ago

        Maybe, but the thing is, if you are a new Scala dev, you will 1.) be confused by the number of build tools. Sbt is still kind of the standard, but there is now also Mill and now Bleep (first time I've even heard of it!). And some people will tell you to just use Maven or even Gradle. Well...

        And 2.) most people will go with sbt; and while it has improved a lot, it is still comparatively slow, has some annoying bugs, and so on.

        Compare that to Rust - I don't think those problems exist there.

        • kriiuuu a day ago

          Hopefully scala-cli being the default runner might help in the future.

ruereed 3 days ago

what actually happens when i make a move is someone takes my piece

huins 4 days ago

> - l: Probably some length?

I don't understand why the author didn't just look this up in the source code. Lichess is open source and we can see exactly what this field is here, it's the average lag:

https://github.com/lichess-org/lila/blob/45b5f0cfbbf6c045ad7...

  send = (t: string, d: any, o: any = {}, noRetry = false): void => {
    const msg: Partial<MsgOut> = { t };
    if (d !== undefined) {
      if (o.withLag) d.l = Math.round(this.averageLag);
      if (o.millis >= 0) d.s = Math.round(o.millis * 0.1).toString(36);
      msg.d = d;
    }
    if (o.ackable) {
      msg.d = msg.d || {}; // can't ack message without data
      this.ackable.register(t, msg.d); // adds d.a, the ack ID we expect to get back
    }

    const message = JSON.stringify(msg);
    ...
Which is calculated from how long the server takes to respond to ping messages that the client sends:

  private schedulePing = (delay: number): void => {
    clearTimeout(this.pingSchedule);
    this.pingSchedule = setTimeout(this.pingNow, delay);
  };

  private pingNow = (): void => {
    clearTimeout(this.pingSchedule);
    clearTimeout(this.connectSchedule);
    const pingData =
      this.options.isAuth && this.pongCount % 10 == 2
        ? JSON.stringify({
            t: 'p',
            l: Math.round(0.1 * this.averageLag),
          })
        : 'null';
    try {
      this.ws!.send(pingData);
      this.lastPingTime = performance.now();
    } catch (e) {
      this.debug(e, true);
    }
    this.scheduleConnect();
  };

  private computePingDelay = (): number => this.options.pingDelay + (this.options.idle ? 1000 : 0);

  private pong = (): void => {
    clearTimeout(this.connectSchedule);
    this.schedulePing(this.computePingDelay());
    const currentLag = Math.min(performance.now() - this.lastPingTime, 10000);
    this.pongCount++;

    // Average first 4 pings, then switch to decaying average.
    const mix = this.pongCount > 4 ? 0.1 : 1 / this.pongCount;
    this.averageLag += mix * (currentLag - this.averageLag);

    pubsub.emit('socket.lag', this.averageLag);
    this.updateStats(currentLag);
  };

  • stevage 4 days ago

    To be fair, the author already put tons of work into this post. Don't begrudge them for not doing even more.

    • huins 3 days ago

      I don't begrudge the author, I'm just surprised given the otherwise high quality of the analysis, including him looking at other parts of the source code.

evrydayhustling 4 days ago

It seems shocking to me that the server enumerates and transmits all legal next-moves. I get that there could be chess variants with server side information, but the article also says it might be good for constrained clients. Is it really cheaper to read moves off a serialized interface than to compute them client side??

  • jdthedisciple 3 days ago

    pretty sure computing moves is in NP so probably yep

    • evrydayhustling 3 days ago

      Nope, finite number of pieces and finite number of viable moves to check on each. Not sure what you're thinking of, but the entire concept of complexity class only applies if there is some axis of scaling (n-size chess board?).

      • jdthedisciple 2 days ago

        I think you might be misunderstanding:

        Yes the instance of chess is finite but the problem of computing moves is inherently in NP.

        The key is that just because a problem is in NP, it doesn't mean that it's difficult to solve instances with small parameters.

        See the famous coloring, SAT, or any other equal NP problem...

        • evrydayhustling 2 days ago

          When we talk about what class a problem belongs in, we have to define the problem with respect to some scaling axis. For example, coloring with K=3 colors is NP-complete with respect to N = # nodes in the graph, but not with fixed N and scaling K. But I think it would actually be an interesting and non-trivial exercise to define a variant of chess with a scaling axis such that computing a list of valid moves for one player is NP-complete. Just scaling board size won't do it. Any suggestions?

          • jdthedisciple 2 days ago

            Sorry I think I was talking about a different thing. With depth as scaling axis it should be NP-complete, but not with depth=1 which is what was being talked about. My bad.

bobmcnamara 4 days ago

nit: FEN only encodes board state, not game state

Edit: also includes move count but not repetition.

shironandon 4 days ago

what happens to those websocket connections when the API is updated or redeployed?

  • paxys 4 days ago

    It's pretty easy to build auto-reconnect capability into the client. The server will drop all its connections and go out of rotation, and the client will start a new connection and land on the new server. If the switch happens fast enough, the user shouldn't even notice.
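
    A minimal sketch of such a reconnect loop (hypothetical; a real client would also need to resync game state after reconnecting):

      // Reconnect with exponential backoff, capped at 10s, so a redeploy
      // only causes a brief gap before the client finds the new server.
      function connectWithRetry(url: string, onMessage: (ev: MessageEvent) => void): void {
        let attempt = 0;
        const open = () => {
          const ws = new WebSocket(url);
          ws.addEventListener('open', () => { attempt = 0; });
          ws.addEventListener('message', onMessage);
          ws.addEventListener('close', () => {
            const delayMs = Math.min(500 * 2 ** attempt++, 10_000);
            setTimeout(open, delayMs);
          });
        };
        open();
      }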

  • conover 2 days ago

    Along with the reconnect solution already mentioned, you can also decouple your Websocket and business logic layers using something like Pushpin: https://pushpin.org/. This allows you to deploy your business logic layer without disconnecting/reconnecting clients.

  • zazaulola 4 days ago

    It is to be expected that LLM will make a decision on its own if it suspects any changes to the API. In any case, there is no time to fix the code during the game.

sam0x17 4 days ago

20 years later I still think "female lich" whenever I see the word lichess, even though I know it's li chess.

  • krisoft 3 days ago

    One day, if I find the time for the pun, I really want to sculpt a chess set where the black pieces are all undead necromancer wizards and the white pieces are all Asian fruits with rough skin. That way we can have a game of lychees vs liches on lichess.

  • AlienRobot 4 days ago

    When you promote a pawn to queen that's actually the lichess.

  • Suppafly 4 days ago

    makes me think of the Asian fruit.

  • Keyframe 4 days ago

    there are more of us then!

blastro 4 days ago

lichess is one of the best sites on the internet. very happy to contribute my $5/mo

  • hilux 4 days ago

    Hello, fellow Patron!

    Even though nowadays I hardly have time to play, I'm still happy to support such a delightfully honorable and usable(!) open-source project.

    • dankwizard 4 days ago

      People love mentioning that they donate to LiChess.

      It's a weird trend. Altruism truly does not exist

      (I donated btw) (Probably more than you) (But who's counting)

      • hilux 4 days ago

        You must be fun at parties.

  • trod123 4 days ago

    If you consider this to be true, you would seem to have a rather low standard.

    There are many aspects in which they are not the best.

    • dibyadarshan 4 days ago

      Like?

      Ad-free, compute intensive, non-CRUD, massively scaled, complex cheat moderation, infinite puzzles/analysis, educational (studies/tactics/openings explorer), etc. All this for free. I'm curious what's the best website in your opinion

      • trod123 3 days ago

        I could elaborate, but rather, let me ask you this instead since its more relevant.

        What is the point of responding with any legitimate criticism when any potentially negative sentiment however mild, upfront, expressing disagreement, gets downvoted to the point where the mechanics of the website squelches the person and silences them (by purposeful intent).

        Can you ever have any legitimate intelligent conversation after a participant has been harmed and effectively silenced in this way?

        When you cannot speak freely, there can be no intelligent communication that objectively raises the bar. The opposite occurs: anything provided, even seemingly rational conversation, falls apart after such a threat or act of violence, and all conversation then falls into the gutter as a result of the added coercive cost imposed. You may contend that it's not violence, but it meets the WHO definition, which properly accounts for psychological torture and coercion (of which this is a common form).

        It should go without saying, but you cannot have any intelligent conversation when those who embrace totalitarian methods prevent you from speaking (and yes these meet the criteria).

        At the point this happens, regardless of valid criticism or pointing out errors in methodology, it all dies on the vine; the message is clear: you will be punished for disagreeing. That destructive behavior inevitably leads to ruin.

        This is fairly basic stuff: in order to think and be intelligent, one must be able to risk being offensive. In order to learn something new, one must risk being offended.

        When neither is possible, because you or someone else muzzles any conversation expressing disagreement or corrosively adds cost, even under such modest terms as here, the fallout is silent yet devastating.

        It might not seem like much, but the light goes out of the world as those with intelligence withdraw their support, and the natural consequences which were held at bay by these people, albeit slow moving, become inevitable.

        Best of luck to you. There is only the possibility of harm by continuing any discussion under these circumstances.

        I'd suggest remembering this when you start wondering, "where have all the intelligent and competent people gone?".

        Silence doesn't indicate agreement. It is indicative of the best and brightest no longer contributing to the same systems that seek to destroy or enslave them.

        • palata 3 days ago

          It seems like your negative sentiment above has been downvoted a lot, and I understand your frustration. Your comment was indeed not offensive.

          But I believe it was just that: a negative sentiment. Not exactly a "constructive, intelligent criticism". And when you go there, the reality is that people will vote to reflect their own opinion. If you say "This project is so amazing!" and get a ton of upvotes, it does not mean that your comment is super useful; just that many people agree. Similarly, if you say "Naah, it sucks" and get a ton of downvotes, it means that many people disagree. Not that they want to silence you.

          Now try an actual constructive criticism: you may get downvotes (that's how it is because people are emotional beings), but probably upvotes as well if you bring interesting insights.

          > There is only the possibility of harm by continuing any discussion under these circumstances.

          That's fair. I think one mistake there is that you should have started with a constructive criticism rather than an admittedly polite "naaaah, I think it sucks".

          • trod123 3 days ago

            There is no point; even my previous response was significantly downvoted, and that was quite constructive, which contradicts your entire statement.

            You might suggest such, but this has the effect of just baiting me for a response so it can be marked down more where you are engaging me for the effect to further punish.

            You see, these people don't do this because of their opinion; they do it because it causes psychological harm. It's a totalitarian tactic that is not unknown. Silencing was used somewhat heavily during Hitler's rise to power.

            Forcing the only conversation to first agree before moving forward, at any point, causes you to fight your own psychology to remain consistent and the process warps you subtly. Most aren't self-reflective enough to notice but the effect is the same regardless.

            Robert Cialdini wrote quite a lot about the various levers of influence that are often used for mental compulsion/coercion, and Joost Meerloo and Robert Lifton both cover these structures and techniques in detail in the context of WW2 torture and what followed. These behavioral structures run parallel with those of the Nazis and other totalitarian regimes.

            This is what is happening, and when mild conversation causes this type of behavior, that is the time you should be most concerned, because it's arbitrary, causes mass delusion, and continues until destruction, albeit slow, overtakes that group.

            As far as I'm concerned, the people doing this can ride their train right to their own demise for all I care. They are true evil, and they'll be doing the world a favor when that happens. The rules of society will no longer protect them once they destroy society.

            We were taught from childhood not to be violent. This is violence; there is no excuse for bullying, and people are no better than animals if they can't reason and be civil. What one does in small things, they will do in large things that matter.

            If they want to be violent for a mild comment like that, they won't get anything from me, and I'll reciprocate in the only way I can right now, withdrawing and not providing anything of value.

            I'll pray I never meet them in person because if you or anyone else tries to harm me, I'll be exacting an equal or greater cost in self-defense.

            This destructive behavior is despicable on so many levels, and you say it's not so bad, but you don't realize just how bad it gets; this behavior, promoting menticide and robotization, is what led to the gas chambers in Germany during WW2.

            When no one questions rationally, or can express disagreement, evil flourishes. You can't ever argue with evil, you have to kill it, as we had to do during WW2 (at great cost).

            Read the notes from the Wannsee Conference, or if you can't be bothered, rent Conspiracy (2001). The history is well documented by experts who studied these things to prevent it from ever happening again, and yet it seems no one has learned their lessons, since they repeat it yet again.

            They are emotional beings (/s), can you imagine that being a valid defense of Nazism during WW2? For the deaths of all those Jews in the camps? If it's unjustifiable at the extremes, it is unjustifiable anywhere.

            These are the same things; the only difference is perspective and the fact that you don't have perfect information upfront at the bottom level. You only ever find out afterwards, and it's a goose-step death march ever forward, and people don't realize this is how it works. One step at a time, pivoting, with no questions.

            This is only the beginning, and when you can't stop it early, then its too late to do anything later to prevent the massive destruction that these people inevitably bring on themselves and everyone else.

            This is why it is so damn important to protect and maintain freedom of speech in a civil atmosphere. There can be no rational support for this behavior, not ever.

            Please stop making excuses for the truly evil.

            • palata 3 days ago

              Well, moderation is very hard. I guess in an ideal world people would be able to upvote/downvote based solely on the quality of the comment, and flag for moderation when they genuinely think a comment is unacceptable. In such a world, your polite "nah it sucks" would still be downvoted (because it's neither insightful nor pleasant for Lichess supporters, let's be honest) but it would not disappear; it would just appear at the bottom of the list and you wouldn't know how many people disagreed with you. But that's not how it is, so your comment looks like it got moderated when actually I genuinely believe it just got downvoted by many people.

              Now I would not compare this to Nazism or call it "true evil". Try to pick someone randomly in the street, go talk to them and politely explain why you find them unattractive (e.g. "I just would like to say that anyone finding you attractive would have pretty low standards"). Would you call it Nazism if they asked you to leave them alone?

              • trod123 3 days ago

                When you silence people, you isolate them.

                Isolation does weird things to the mind, one of which is distorting reflected appraisal; situations where reflected appraisal is denied in communication entirely are even more impactful. It induces an involuntary hypnotic state which varies in intensity with exposure.

                The tortured take on the mannerisms of the torturer, and when denied communication, or given only distorted reflected appraisal, for long enough, their entire being unravels and they have a psychotic break or dissociate completely. The psychotic break is a semi-lucid state in which planning is still possible. To make an apt comparison, an active shooter might fall into this latter category. There have been years of investigations into what causes these people to do what they did, and the powers that be can only say with certainty that these people were bullied; no clear cause can be found.

                Coercion is a dangerous thing.

                You can corroborate what I've said with the literature in the books I mentioned, or with isolation studies where the studies had to be cancelled early for the safety of the study participants. There is a lot of research out there.

                Destroying someone's identity, their personality, what makes them them, is true evil, and these structures are how you do it, which is why it's so important to recognize the problem; few do.

                Your comparison is apples to oranges. It is not asking someone an opinion, it's silencing them entirely by force, where they are disadvantaged when they don't answer. There is a very big distinction between the two.

                • palata 2 days ago

                  > When you silence people, you isolate them.

                  I do understand, and it is unfortunate that moderation gets mixed up with score. If you have a better solution, feel free to explain it! Because not moderating at all is a problem of its own.

                  • trod123 2 days ago

                    It wasn't always like that. The actual solution is something that all forums have done for decades starting in the 90s with BBS/moderated usenet groups.

                    You only allow moderators to do the actual moderating. You don't allow upvotes or downvotes because they are used following a sybil attack structure (many sock-puppet accounts to one person) to silence or amplify messages.

                    You have a flag button (which they already have in place for HN), to report content that should be flagged for violating rules.

                    Those that report spurious (good) content get warned/punished for abusing the flag features. They may have the feature silently revoked (shadowbanned) for abuses, or banned for other activity suggesting the accounts are sockpuppets (i.e. going directly to an article when normal viewing requires first loading the index, then following a link with the associated metadata, including the referrer, to get to the page to make a report).

                    It really is that simple.

                    This moderates, and it limits bad actors; it's still done in most online forums that are still around, because it works.

ilrwbwrkhv 4 days ago

Beautiful architecture. Startups and companies like Netflix should learn from this instead of cargo culting microservices.

  • enneff 4 days ago

    And what exactly do you think lila, lila-ws, and redis are if not microservices (or as they should be called, “services”)? Lichess could easily be implemented as a single monolithic process but it is not.

    • immibis 4 days ago

      They are services, but not micro. lila-ws spun off of Lila for a good reason (fault isolation) and not because "let's make everything a service". And they don't follow any standard microservice pattern - a reverse proxy isn't a microservice.

  • ajkjk 4 days ago

    What? Do you have some reason to think Netflix's architecture is deficient?

    • paxys 4 days ago

      Because the top 5 comments on HN always say so, so it must be true.

    • ilrwbwrkhv 4 days ago

      Overly complicated with microservices. Can be made 10x simpler.

      • LinuxAmbulance 4 days ago

        Sometimes simplicity is not the best goal.

        Redundancy, scalability, decoupling, resilience, best possible handling of errors, cost optimization, etc. may be more important at the scale Netflix operates at.

        • lcnPylGDnU4H9OF 4 days ago

          > Redundancy, scalability, decoupling, resilience, best possible handling of errors, cost optimization, etc. may be more important at the scale Netflix operates at.

          So much so that they built a tool to intentionally make things difficult (read: it arbitrarily stops production system processes/containers/etc.) and to help inform what decisions to make in favor of fault tolerance.

          > Exposing engineers to failures more frequently incentivizes them to build resilient services.

          https://github.com/Netflix/chaosmonkey

          https://en.wikipedia.org/wiki/Chaos_engineering

          • renewiltord 4 days ago

            Embarrassing. I built 99% of Netflix functionality locally with VLC and a subdirectory of mkv files.

            • trashburger 4 days ago

              Good for you. Now please aim 10,000 requests a second at your file server.

              • renewiltord 4 days ago

                Because I don't use microservices, I don't need 10,000 requests a second to play a video file.

                • achierius 4 days ago

                  I think the point was 10,000 files on 10,000 different hosts, per second.

                  • renewiltord 4 days ago

                    Well, if they’re only watching one second of video that’s easy. The files could be super small too.

                    Okay, I can’t keep this up. I was parodying the position not being serious.

        • ilrwbwrkhv 4 days ago

          For Netflix's level of complexity. Pornhub has more traffic and serves more customers than Netflix with monolithic PHP and some services.

          • kredd 3 days ago

            They involve completely different viewing patterns and levels of complexity. It's such a reductionist take.

      • ajkjk 4 days ago

        You know something about their internal architecture and why it was built that way and the tradeoffs involved, I guess?