_0ffh 9 hours ago

I am sorry, I don't quite understand where the "abuse" comes in. Attaching function pointers to structs? I've done that in production twenty years ago (and others have probably done it in the 70s) and I see nothing wrong with it. C gives us function pointers for a reason, and it's not to not use them... to me, at least, they're actually one of the premier features of C! (GCC's goto *ptr on the other hand, now there's some potential for getting a little bit wild...)
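
For readers who haven't seen the pattern, a minimal sketch of "function pointers attached to a struct" might look like the following. The names (counter, counter_new and so on) are made up for illustration and are not taken from the article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "object": data plus function pointers acting as methods. */
struct counter {
    int value;
    void (*add)(struct counter *self, int n);
    int (*get)(const struct counter *self);
};

void counter_add(struct counter *self, int n) {
    assert(self != NULL);
    self->value += n;
}

int counter_get(const struct counter *self) {
    assert(self != NULL);
    return self->value;
}

/* "Constructor": wires the function pointers to their implementations. */
struct counter counter_new(int start) {
    struct counter c = { .value = start, .add = counter_add, .get = counter_get };
    return c;
}
```

Calling code would then write something like `struct counter c = counter_new(1); c.add(&c, 41);`, after which `c.get(&c)` returns 42. Note that `self` has to be passed by hand, since C has no implicit `this`.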

  • uecker 9 hours ago

    I assume it is because nowadays there are so many people running around claiming you need C++ or Rust or other more complicated languages to build abstractions, that people do not understand anymore that you can do this in C just fine.

    • userbinator an hour ago

      The earliest C++ compilers compiled to C, and I believe that's still possible to do even for the latest versions of the C++ standard.

    • usrnm 8 hours ago

      I don't think I've ever seen a large C codebase that wasn't using structs with methods. It isn't by any measure a secret or lost art.

      • uecker 6 hours ago

        And yet, here we are reading about it in the news.

  • alberth 7 hours ago

    People sometimes use structs when a class has only public members.

    And I believe the STL is also implemented that way as well.

  • fsckboy 6 hours ago

    >I am sorry, I don't quite understand where the "abuse" comes in. Attaching function pointers to structs?

    i had the same reaction, then i realized

    (and yes, i know the following opinion is a minority oldschool boomer opinion, no need to downvote me to hell to make sure i know)

    JSON itself is the abuse of C/unix, like XML before it.

boricj 13 hours ago

I'm working on a C++ library at work that lets you define a data model (a tree of objects, arrays and values) and run visitors through it. The data model is both effectively a JSON schema in source code form and a binding to the underlying data/application through callbacks and whatnots.

I can make it ingest or emit whatever data format I want by writing a deserializer or serializer visitor for it (either directly or through a library), I can perform various pipeline-like transformations by chaining visitors together. The core is header-only, templated, constexpr and suitable for usage on resource-constrained systems.

I know that there are equivalents in managed or interpreted languages (like Java's Jackson library), but I haven't managed to find anything quite like it elsewhere on the Internet for compiled, unmanaged languages. Maybe I haven't looked hard enough or I just don't know what to search for, which is too bad because it handily beats writing serialization/deserialization code by hand.

  • nly 13 hours ago

    Here's something like that I wrote in C++ in 2017 for JSON using Boost.Fusion + simple function overloading of a single 'from_json' function for handling different types. It works for nested objects, it's runtime type checked, and all numeric conversions are checked for loss while being forgiving.

    https://gist.github.com/nlyan/045fbe075b4e51d83be0cf4513fecd...

    The DEFINE_JSON macro is a tiny wrapper around BOOST_FUSION_DEFINE_STRUCT so the whole parsing routine effectively gets unrolled at compile time.

    https://www.boost.org/doc/libs/1_87_0/libs/fusion/doc/html/f...

    The code predates broad availability of std::optional (so uses boost::optional), [[unlikely]] and the existence of Boost.JSON, so if i were using this technique today that's what I'd use, but at the time I used the taocpp JSON library (which is still actively maintained 8 years later)

    https://github.com/taocpp/json

    Here's an article talking about the technique from a CppCon 2014 talk - "Implementing Wire Protocols with Boost Fusion --Thomas Rodgers"

    https://isocpp.org/blog/2015/01/cppcon-2014-implementing-wir...

    • boricj 8 hours ago

      It's fairly different than the approach inside the CppCon 2014 talk because the data model itself is not actually strongly tied to the underlying types of the data.

      That lack of strong integration does result in a bit more boilerplate code, but it allows more flexibility. In principle, the data model and the actual data can be quite different as long as you can bridge the two together with the boilerplate code. It's more like projecting a JSON schema over a bunch of various getters and setters and running visitors through them, with some helpers to automatically grok STL-like containers.

  • danhau 12 hours ago

    What slightly annoys me about JSON is that the order of object properties is defined to be irrelevant.

    { "hello": 123, "world": "foo" }

    is the same as

    { "world": "foo", "hello": 123 }

    If these two were semantically different, writing a deserializer would be easier and more efficient, since you could simply expect the next tokens to represent the currently visited class member, or error.

    Otherwise you need to construct an object tree and look up its properties by name / hash.

    • nly 12 hours ago

      Apache Avro's C++ deserializer for the JSON serialization used to expect exactly that: keys to be in schema order.

      Trust me you don't want this. Usually the reason you want to use JSON in the first place is you want to support third party data access.

      Even if you constrained the field order you'd still have to deal with absent fields.

    • inbx0 8 hours ago

      Does the JSON spec actually say that those objects should be "equal", or does it just leave that detail to implementations?

      In JavaScript at least, those two are not exactly "the same", in the sense that you can observe the difference if you want to. If you parse those JSON strings and then iterate the keys (e.g. with Object.keys), the ordering will be different.

      • reichstein 6 hours ago

        The JSON spec only defines the JSON text format. It doesn't say what the text means. There are obvious interpretations, but every program that reads or writes JSON can decide what it does with it.

        On the other hand, the thing that makes JSON actually useful is the interoperability: JSON written by one program, on one platform, can be read by another program on another platform. Those programs have to agree on a protocol, what the JSON text must satisfy and what it means. It's usually not considered valuable to require object properties to be in a specific order, so they don't. But they could.

    • jasonthorsness 10 hours ago

      Notably this is one of the main differences between JSON and the MongoDB BSON format. Though some client libraries just treat it as JSON and don’t preserve the order.

  • lmm 10 hours ago

    > I know that there are equivalents in managed or interpreted languages (like Java's Jackson library), but I haven't managed to find anything quite like it elsewhere on the Internet for compiled, unmanaged languages.

    Like Rust's Frunk? Or like what Zig or D let you do with "compile time reflection"?

    • boricj 9 hours ago

      It's definitely different than Frunk, the library is not a general-purpose functional toolkit. One could certainly implement it with Zig's compile-time reflection with ease (don't know much about D). Actually, it's superficially similar to refl-cpp's serialization example [1], but with far less templating magic underneath due to the restricted scope.

      [1] https://github.com/veselink1/refl-cpp/blob/master/examples/e...

hyperhello 21 hours ago

Great site, but the JSON looks like it's embedded in C as values but is really just a macro that expands it into a string and then parses it. So you can't put any local variables into the JSON.

In my opinion, C really needs some officially supported reflection for the names of enums and such things. People have reinvented the wheel so many times and it never quite gets there.

  • DeathArrow 14 hours ago

    >In my opinion, C really needs some officially supported reflection for the names of enums and such things.

    C is good as it is, and someone has the chance to hold all its features in his memory. If you need or want to complicate things, there's C++ or Rust.

    • pjmlp 14 hours ago

      I very much doubt that as any pub quiz on C will validate.

      People think they know C; in reality they have never read the ISO C document and aren't aware of the differences between the ISO C standard library and POSIX, of how each compiler handles the ill formed no diagnostic required, implementation defined and UB parts of the standard, or, to top it off, of compiler specific extensions.

      • virgilp 14 hours ago

        You do not need to know how the compiler handles the UB parts of the standard; in fact you should NEVER rely on it, as it can change without warning. That's effectively "incorrect code", by definition.

        • pjmlp 12 hours ago

          And yet that is what many folks happen to do, because "performance all the things!".

          • uecker 8 hours ago

            They will also do this in C++ or Rust. The only difference is that in Rust they will pretend it is ok because they wrapped it in "unsafe", so if they mess up it is nobody's fault because it cannot possibly be a problem with Rust or the Rust programmer, so it was unavoidable fate.

            • pjmlp 8 hours ago

              As in any language that has adopted the unsafe concept since ESPOL/NEWP did it in 1961; at least Rust has the advantage that we can find those code blocks without the help of a static analysis tool.

              Nowadays C++ has inherited the same mentality as many performance minded C developers, which is a bit sad, given that during the 1990's it felt we had a better security first mentality, especially with the vendor specific frameworks that were shipped alongside the compilers.

              Regarding static analysis tooling it is kind of sad that many developers still think they know better than the language authors themselves, as per Dennis's own words,

              > Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions.

              -- http://cm.bell-labs.co/who/dmr/chist.html

              And to come back to the original point, the worst part regardless of the language is that most tricks are done due to cargo cult and hearsay, without reaching to a profiler a single time.

            • chlorion 5 hours ago

              What are you even talking about?

              I've never seen anyone claim that doing UB in Rust is okay because it's wrapped in an unsafe block?

              If anything I've seen the exact opposite of this: cases of UB in libraries are almost always considered a bug, vs in C or C++ where "it's the user's fault for doing the thing wrong".

              I notice you spreading an awful lot of bs about rust lately, not sure what the deal is but its pretty childish and lame, not to mention objectively wrong.

      • uecker 8 hours ago

        Some don't even know that there is no such thing as "ill formed no diagnostic required" in C.

  • xnacly 11 hours ago

    Hi, author here, I just added the JSON macro to omit the quotation escaping around object keys and strings:

    without JSON macro:

    char* j = "{ \"key\": [\"value1\", \"value2\"]}";

    with JSON macro:

    char *j = JSON({ "key": ["value1", "value2"]});

    But yes of course, due to the stringify no C values can be embedded - i wouldnt even know how one would solve this with a macro, but maybe someone has an idea and can comment on that.
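
    To make the limitation concrete, here is a sketch under the assumption that the macro is a plain variadic stringify (the article's actual definition may differ). example_json is a hypothetical helper added only to show the result.

    ```c
    #include <assert.h>
    #include <string.h>

    /* Assumed definition: turn everything between the parentheses into a
     * string literal. The variadic form is needed because commas inside []
     * or {} still separate macro arguments; only () protects them. */
    #define JSON(...) #__VA_ARGS__

    /* The expansion is an ordinary string literal, fixed at preprocessing time. */
    static_assert(sizeof(JSON({})) == 3, "JSON({}) expands to the literal \"{}\"");

    /* Returns the JSON text without any hand-written escaping. */
    const char *example_json(void) {
        return JSON({ "key": ["value1", "value2"]});
    }
    ```

    Since `#` works on preprocessing tokens, something like JSON({ "n": x }) just yields the text { "n": x }; the value of x is never looked at, which is why locals cannot be embedded this way.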

  • immibis 19 hours ago

    That would be a C++ feature. C is very "batteries not included". In particular, it won't generate entire functions or data blocks that you didn't write or include from a library.

teo_zero 15 hours ago

Analyzing the json_value struct, I don't get why values, object_keys and length are fields outside the union. I expected something like:

  struct json_value {
    enum json_type type;
    union {
      bool boolean;
      char *string;
      double number;
      struct {
        char **keys;
        struct json_value *values;
        size_t length;
      } object;
    } value;
  };

Of course I would also use anonymous structs and unions to simplify

  json->value.object.length

to

  json->length
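
A sketch of that anonymous variant, assuming a C11 compiler. The enum constants and the json_object_length helper are hypothetical; only the struct fields come from the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag values; the article's enum may be named differently. */
enum json_type { JSON_NULL, JSON_BOOLEAN, JSON_STRING, JSON_NUMBER, JSON_OBJECT };

/* C11 anonymous union and struct: their members are accessed as if they
 * were direct members of json_value. */
struct json_value {
    enum json_type type;
    union {
        bool boolean;
        char *string;
        double number;
        struct {
            char **keys;
            struct json_value *values;
            size_t length;
        };
    };
};

/* Number of key/value pairs; only the object variant carries a length. */
size_t json_object_length(const struct json_value *json) {
    assert(json != NULL);
    return json->type == JSON_OBJECT ? json->length : 0;
}
```

The type tag still has to be checked before touching a member, since the anonymous union only shortens the access path and does nothing to make a wrong access safe.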

uecker 15 hours ago

I haven't looked at this in detail, but I do not see an "abuse" here. This is just regular C code.

  • flohofwoe 12 hours ago

    It looks a lot like the kind of 'object oriented C' that was all the rage in the 90s though ;)

    • curt15 10 hours ago

      GObject still works pretty much this way.

pjmlp a day ago

Also known as how to manually do the work C++ compilers do for you.

  • tempodox 14 hours ago

    Manually doing the work others have automated still helps you understand.

    • pjmlp 12 hours ago

      There is a difference between understanding and coding a full application like it's the 1990s.

flohofwoe 13 hours ago

Would be good to get some words about performance. Looking at the code there are a lot of granular memory allocations, which IME with other JSON libraries which work like this is the number one performance killer and can add up to seconds when parsing JSON files with hundreds of thousands of nodes.

Always going through a 'virtual method table' also can't be great for performance (not so much because of the indirection, but because it acts as an optimization barrier for the compiler).

Performance doesn't matter much of course when the code is only ever used to load tiny JSON files.

  • xnacly 11 hours ago

    Hi, author here, definitely, allocating elements while parsing is slow, especially calling realloc for each new element encountered, a growing array backed by an arena would be my go to if i had to overengineer this one.
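
    For the growing-array half of that idea, a minimal sketch with capacity doubling, so realloc runs O(log n) times instead of once per element. It uses plain realloc rather than an arena, and the double_vec names are hypothetical, not from the article.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical growable array of doubles; zero-initialize before use. */
    struct double_vec {
        double *items;
        size_t len;
        size_t cap;
    };

    /* Appends x, doubling the capacity when the array is full.
     * Returns 0 on success, -1 if the allocation fails (the vector is unchanged). */
    int double_vec_push(struct double_vec *v, double x) {
        assert(v != NULL);
        if (v->len == v->cap) {
            size_t new_cap = v->cap ? v->cap * 2 : 8;
            double *p = realloc(v->items, new_cap * sizeof *p);
            if (p == NULL) {
                return -1;
            }
            v->items = p;
            v->cap = new_cap;
        }
        v->items[v->len++] = x;
        return 0;
    }

    /* Releases the storage and resets the vector to its empty state. */
    void double_vec_free(struct double_vec *v) {
        assert(v != NULL);
        free(v->items);
        v->items = NULL;
        v->len = 0;
        v->cap = 0;
    }
    ```

    Pushing 100 elements this way triggers five allocations (capacities 8, 16, 32, 64, 128) rather than 100.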

anonymousiam 19 hours ago

I've used this trick (unions within structs) for decades. You can parse very quickly, but your code will not necessarily be portable. It's a perfect solution if your application will only be run on the hardware you're developing on.

  • atiedebee 14 hours ago

    Why are unions within structs not portable?

    • cantrecallmypwd 11 hours ago

      Undefined padding and alignment.

      • atiedebee 9 hours ago

        That doesn't really matter for sum types as used in this blog, does it? As long as you're not serialising the struct or accessing it through pointers of different types it ought to work anywhere.

      • Keyframe 9 hours ago

        you can do explicit padding and you can force alignment; latter might be compiler specific until C11 at least. You can always check for the struct layout as an assert.. it's doable.
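
        A sketch of what that could look like in C11 (static_assert, alignas, offsetof). The wire_header struct is hypothetical, and the expected numbers assume 8-byte double and uint64_t, which holds on mainstream platforms but isn't guaranteed by the standard.

        ```c
        #include <assert.h>    /* static_assert (C11) */
        #include <stdalign.h>  /* alignas, alignof (C11) */
        #include <stddef.h>    /* offsetof */
        #include <stdint.h>

        /* Hypothetical struct with explicit padding and forced alignment. */
        struct wire_header {
            alignas(8) uint32_t tag;
            uint8_t kind;
            uint8_t pad[3];        /* explicit padding instead of compiler-chosen padding */
            union {
                uint64_t u64;
                double f64;
            } payload;
        };

        /* Compilation fails if a platform lays the struct out differently. */
        static_assert(offsetof(struct wire_header, kind) == 4, "kind offset");
        static_assert(offsetof(struct wire_header, payload) == 8, "payload offset");
        static_assert(sizeof(struct wire_header) == 16, "total size");
        static_assert(alignof(struct wire_header) == 8, "alignment");
        ```

        This pins down layout only; byte order and the floating-point representation are separate portability concerns if the bytes ever leave the process.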

  • cantrecallmypwd 11 hours ago

    Temporary, in-memory representation of an AST generally doesn't need to be serialized or passed outside the process in this use case.

McUsr 2 hours ago

I look forward to the next blog post implementing the actual parser.

userbinator 14 hours ago

> Once we hit the end we allocate a temporary string, copy the chars containing the number from the input string and terminate the string with \0. strtod is used to convert this string to a double.

No need to allocate, just use strtod on the original string.
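
A sketch of that suggestion: strtod reports where it stopped through its second argument, so the parser can advance its cursor without copying anything. parse_number is a hypothetical helper, not the article's function. It assumes the input is NUL-terminated, and note that strtod accepts more than the JSON grammar (hex floats, inf, nan, leading whitespace) and depends on the locale's decimal point.

```c
#include <assert.h>
#include <stdlib.h>

/* Parses a number starting at *cursor, directly from the input buffer.
 * On success stores the value in *out, advances *cursor past the number
 * and returns 0. Returns -1 (cursor untouched) if no number starts there. */
int parse_number(const char **cursor, double *out) {
    assert(cursor != NULL && *cursor != NULL && out != NULL);
    char *end;
    double value = strtod(*cursor, &end);
    if (end == *cursor) {
        return -1; /* strtod consumed nothing */
    }
    *out = value;
    *cursor = end;
    return 0;
}
```

A stricter parser would first scan the characters against the JSON number grammar and only then hand the same pointer to strtod.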

DeathArrow 14 hours ago

>Instead of using by itself functions: attach functions to a struct and use these as methods

He could evolve this further: use SOLID and design patterns.

If the goal is to bastardize C experience, why stop at objects?

pwdisswordfishz 12 hours ago

> CFLAGS := -std=c23

CFLAGS is for optional flags that don't change semantics or whether (correct) code compiles at all…

  • MathMonkeyMan 12 hours ago

    CFLAGS are flags for the C compiler. It could totally change the interpretation of the semantics. C89 != C23.

osmsucks 13 hours ago

I'm not really seeing what's noteworthy about this article.

tempodox 14 hours ago

But how do you know whether a JSON string is a pointer to static data (like in the example with the `main` function) or was dynamically allocated? The same goes for arrays and objects.

  • threeducks 13 hours ago

    You could tag dynamically allocated objects with some magic bytes. To then check whether a string is dynamically allocated, you can simply compare the first few bytes against the magic bytes. This is a probabilistic method, but if you use enough magic bytes, it becomes basically impossible to go wrong.

    See e.g. https://github.com/99991/dynamic/blob/a423a04061ee44bad0720f... for an example (incidentally also a C JSON parser, but with automatic garbage collection).

  • xnacly 11 hours ago

    Hi, author here, i use the 'type' field of the json_value struct, strings, object keys, object values and array members are allocated, i use this info for freeing the memory

    • UncleEntity 9 hours ago

      So, umm, abusing the C switch statement to implement polymorphism?

      I kid, I kid...

      I am curious about how much this really saves over just using C++ classes and something like the curiously recurring template pattern while turning off runtime type checking.

jeffrallen 6 hours ago

Please don't write new code in nonsafe languages.

ingen0s 10 hours ago

this is still the best read of the year for me

flykespice a day ago

Meta, but I really appreciate how minimal, static, yet stylish the website design is, especially the code snippets.

  • bt1a 21 hours ago

    looks great on mobile, too. it's exceptional at organizing the content's contrast ratios and spacing, my only nitpick is slightly too much color

  • xnacly 11 hours ago

    Thanks, i tried to go for a minimalist cyberpunk inspired vibe - thus far i like it :)

    • oguz-ismail 9 hours ago

      What's with the lowercase I's though?

      • johnisgood 8 hours ago

        The first letter in a couple of sentences is also lowercase, so probably typos.

        In some places he uses "I" (correctly), in others he does not, so it is inconsistent, and most likely just typos.

      • xnacly 8 hours ago

        I am not a native english speaker so I often forget to uppercase the 'I's

jbirer 13 hours ago

This is giving me GObject and Gdk flashbacks. If anyone wants to traumatize themselves, compile GTK2 and work with the aforementioned.

That being said, it's still a very clever way to implement this.

  • robinsonb5 11 hours ago

    I'm seeing reasonably clean use of function pointers without the usual tangle of opaque macros. This kind of pattern easily turns into a mess if you get too clever with it, but I quite like what I'm seeing here.

    (I suspect autotools will have traumatised anyone long before they manage to get GTK2 to compile on a modern system!)