Show HN: Duper – The Format That's Super

duper.dev.br

26 points by epiceric 21 hours ago

An MIT-licensed human-friendly extension of JSON with quality-of-life improvements (comments, trailing commas, unquoted keys), extra types (tuples, bytes, raw strings), and semantic identifiers (think type annotations).

Built in Rust, with bindings for Python and WebAssembly, as well as syntax highlighting in VSCode. I made it for those like me who hand-edit JSONs and want a breath of fresh air.

It's at a good enough point that I felt like sharing it, but there's still plenty I wanna work on! Namely, I want to add (real) Node support, make a proper LSP with auto-formatting, and get it out there before I start thinking about stabilization.

jitl 19 hours ago

I think a neat route would be to use this as an authoring plugin in VS Code, like prettier: write Duper (or JSON5, or whatever), and then downlevel it to regular json automatically when pressing cmd-s. You wouldn't get to keep your comments (or they could be transformed to { "//": "comment text" }).

Outside of that, it's tough to compete with JSON in the "human readable unschematized serialization format" market, especially targeting JavaScript:

Use in the browser requires some degree of bundle size increase, since the parser code needs to be loaded before your format can be used. WebAssembly libraries are usually quite large compared to a pure-JS implementation. According to [bundlejs](https://bundlejs.com/?q=%40duper-js%2Fwasm&treeshake=%5B*%5D), @duper-js/wasm weighs in at about 488 kB uncompressed, 159 kB gzip.

Use in any JavaScript runtime means you're competing against the runtime's native `JSON.parse` and `JSON.stringify`. In v8, these are very quick and have runtime-level tricks to go faster; for example, see [v8's recent post on making JSON.stringify 2x faster](https://v8.dev/blog/json-stringify) when serializing plain objects with no funny business: no .toJSON methods, replacer, or indent formatting.

Besides those points, my major complaint about JSON is how expensive it is to encode binary data for transmission. In JSON I usually use base64; with your format it's transformed to escape characters that are less efficient than base64, right? \xNN is base16 with 2 extra bytes wasted on the \ and x, and \uNNNN is also base16 with 2 extra bytes. Is there a way you can fit binary into the format with no expensive encode/decode step?
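To put rough numbers on that overhead, here is a minimal sketch that counts output characters per encoding. It assumes the worst case, where every byte is written as a \xNN escape; an encoder that emits printable ASCII bytes as-is would do better on text-like data. The function names are made up for illustration and are not part of Duper.

```typescript
// Output length, in characters, for encoding `byteCount` raw bytes.
// Assumption: every byte is escaped as \xNN (worst case for binary data).

function base64Length(byteCount: number): number {
  // Base64 emits 4 characters per group of 3 input bytes, padded up.
  return Math.ceil(byteCount / 3) * 4;
}

function plainHexLength(byteCount: number): number {
  // Bare base16 is 2 hex digits per byte.
  return byteCount * 2;
}

function hexEscapeLength(byteCount: number): number {
  // "\xNN" is 4 characters per byte: backslash, x, and 2 hex digits.
  return byteCount * 4;
}

const byteCount = 3000;
console.log(base64Length(byteCount));    // 4000
console.log(plainHexLength(byteCount));  // 6000
console.log(hexEscapeLength(byteCount)); // 12000
```

Under that assumption, fully escaped binary is three times the size of its base64 form.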

So, for me this seems suitable as a config file format: there you get good benefit from comments, identifiers, easier string authoring. Not sure I need the binary raw string thingy in config files that much, but I guess it doesn't hurt.

  • notpushkin 15 hours ago

    > I think a neat route would be to use this as an authoring plugin in VS Code, like prettier: write Duper (or JSON5, or whatever),

    This actually somewhat works right now. If you pass this JSON5 example through Prettier:

        {
          // comments
          unquoted: 'and you can quote me on that',
          singleQuotes: 'I can use "double quotes" here',
          lineBreaks: "Look, Mom! \
        No \\n's!",
          hexadecimal: 0xdecaf,
          leadingDecimalPoint: .8675309, andTrailing: 8675309.,
          positiveSign: +1,
          trailingComma: 'in objects', andIn: ['arrays',],
          "backwardsCompatible": "with JSON",
        }
    
    You’ll get:

        {
          // comments
          "unquoted": "and you can quote me on that",
          "singleQuotes": "I can use \"double quotes\" here",
          "lineBreaks": "Look, Mom! \
        No \\n's!",
          "hexadecimal": 0xdecaf,
          "leadingDecimalPoint": 0.8675309,
          "andTrailing": 8675309,
          "positiveSign": +1,
          "trailingComma": "in objects",
          "andIn": ["arrays"],
          "backwardsCompatible": "with JSON"
        }
    
    Which is still invalid JSON... but it does fix unquoted keys, floats, trailing comma, and single → double quote strings with correct escaping. So if you have “format on save” enabled in your editor, it might just work!
  • epiceric 12 hours ago

    Duper certainly won't outperform the native JSON implementation (and it likely never will), though I do think benchmarks would be a great addition. Bundle size and binary representation are definitely things I'll keep in mind!

    The config file transpilation to JSON idea is quite interesting. It's pretty similar to how I'm already defining the TextMate grammar used by the website's syntax highlighter, so I'll certainly try to incorporate that into the tooling.

    • jitl 5 hours ago

      It may be worth it to pipe Duper into your WASM/native code, and get back plain JSON out, which you then hand off to the runtime's `JSON.parse` with a post-processing step to support any special features needed. Something like this:

          // idea of implementing public duper.parse function to lean on
          // runtime's JSON.parse
          //
          // downlevel to json, eg binary strings become base64 normal json strings
          const { jsonString, enhancements } = duper.duperToJSON(data)
          // let the runtime go fast when decoding
          const rawObject = JSON.parse(jsonString)
          // `enhance` knows the paths to all the binary base64 strings
          // and replaces them with Uint8Arrays
          const decoded = duper.enhance(rawObject, enhancements)
      
      Here enhancements is something very easy / low cost to construct over the FFI bridge, like

          type Path = Array<string | number>
          type TransformFn = (value: unknown) => unknown
          type Transform = TransformFn | Enhancements
          type Enhancements = Array<[path: Path, transform: Transform]>
      
      Not sure if this would end up faster, it may allocate more, but it's probably better than unoptimized object/array construction from WASM/native -> runtime. You could also try with a `reviver` argument to JSON.parse, but I always find the lack of a full path to the key somewhat clunky.
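A minimal sketch of what the `enhance` step described above could look like, using the same type definitions. Everything here is an assumption for illustration: `enhance`, `applyAt`, and `base64ToBytes` are hypothetical helpers, not Duper APIs, and nested `Enhancements` are interpreted as paths relative to the node they are attached to.

```typescript
type Path = Array<string | number>;
type TransformFn = (value: unknown) => unknown;
type Transform = TransformFn | Enhancements;
type Enhancements = Array<[path: Path, transform: Transform]>;

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical transform: decode a base64 string into a Uint8Array.
// No validation is done; non-string values are returned unchanged.
function base64ToBytes(value: unknown): unknown {
  if (typeof value !== "string") return value;
  const clean = value.replace(/=+$/, "");
  const out = new Uint8Array(Math.floor((clean.length * 6) / 8));
  let buffer = 0;
  let bits = 0;
  let index = 0;
  for (let i = 0; i < clean.length; i++) {
    buffer = (buffer << 6) | B64.indexOf(clean.charAt(i));
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[index++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}

// Walk `path` from `node` and replace the value found there.
// Mutates in place, which is fine for a fresh JSON.parse result.
function applyAt(node: unknown, path: Path, transform: Transform): unknown {
  if (path.length === 0) {
    return typeof transform === "function"
      ? transform(node)
      : enhance(node, transform); // nested paths are relative to this node
  }
  if (node === null || typeof node !== "object") return node; // path does not exist
  const container = node as Record<string | number, unknown>;
  const key = path[0];
  container[key] = applyAt(container[key], path.slice(1), transform);
  return node;
}

function enhance(root: unknown, enhancements: Enhancements): unknown {
  for (const [path, transform] of enhancements) {
    root = applyAt(root, path, transform);
  }
  return root;
}

// Usage: the string at blobs[0].data becomes a Uint8Array holding bytes 1, 2, 3.
const rawObject = JSON.parse('{"name":"logo","blobs":[{"data":"AQID"}]}');
const pathsToBinary: Enhancements = [[["blobs", 0, "data"], base64ToBytes]];
const decoded = enhance(rawObject, pathsToBinary);
console.log(decoded);
```

Unlike a `reviver`, each transform here is addressed by its full path, so the post-processing cost scales with the number of enhanced values rather than with the size of the document.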
aappleby 16 hours ago

Where the ** is the grammar specification? Prose is nice, but with a BNF I could plug this into my parsing expression grammar library right quick and give it a rundown.

  • epiceric 13 hours ago

    Good point. I'll see about making one.

anilgulecha 16 hours ago

The object notation format that's going to win is the one that's going to maximally support LLM output. I've come across BAML before, but it's not widely used for some reason.

Today JSON is winning, but for more complex structures, there are still syntax issues in output. XML does reasonably well (given the deep React JSX/HTML in the training corpus), so perhaps that will make a comeback.

Are there benchmarks on this? I think the SOTA models are fine; they can work with most formats. The interesting question is which output format works best with models that reach 90% of SOTA performance and cost 90% less. This is where the winner will be found.

TLDR: probably JSON or XML will remain the config format for a while.

ACAVJW4H 19 hours ago

Nice work, this actually looks great. Of course, it’s only a matter of time before someone drops the XKCD about standards proliferation, so I’ll save them the trouble. Pre-emptive XKCD #927 deployed.

anonzzzies 14 hours ago

Why no date and time?

  • epiceric 12 hours ago

    My reasoning is that they are normally transmitted as strings in JSON, and you could use an identifier like DateTime("2025-11-02T02:33:00Z") if you need to be explicit.

    Making them part of the language would increase the complexity of parsers - how would you validate that a date is actually valid? It's doable (YAML and TOML do it, after all) but requires extra steps.

    • epiceric 7 hours ago

      Although given the feedback I've received, date/time might get included into the format.

      • jitl 5 hours ago

        note that a DateTime w/ a UTC offset is significantly different from a DateTime w/ a TimeZone (+ optional Calendar), aka ZonedDateTime. ZonedDateTime(July 26, 2035 10:15:32pm in Istanbul) may not necessarily always be at today's value of Instant(July 26, 2035 10:15:32pm in Istanbul). If you are going to support date/time, you should not use the word "DateTime", "Date", "Time" in a way that is ambiguous (is it a ZonedDateTime, or an Instant?), or forget to include support for ZonedDateTime.

        The MDN page on JavaScript's Temporal library gives a good overview of the difference between the two. Today's practice of encoding Instants as ISO 8601 strings in UTC (Z suffix) or at a UTC offset is okay for ephemeral data-in-motion that will be used right now, but is not a good practice for persisted data, since time zones, DST rules, etc. change all the time. Temporal is the JS-specific API, but these concepts apply to all handling of date/time/etc. data in computer systems.

        That said, v8 plans to use [temporal_rs][] as their Temporal backend.

        Temporal: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

        temporal_rs: https://crates.io/crates/temporal_rs

        You can encode extended ZonedDateTime information to string following this RFC [Date and Time on the Internet: Timestamps with Additional Information](https://www.rfc-editor.org/rfc/rfc9557.txt)
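As a rough illustration of the RFC 9557 string form, the sketch below splits a timestamp such as `2035-07-26T22:15:32+03:00[Europe/Istanbul]` into its local date-time, UTC offset, and bracketed annotations. It covers only a simplified subset of the syntax and does no calendar or zone validation; `ZonedTimestamp` and `parseZoned` are hypothetical names, not part of Temporal, temporal_rs, or Duper.

```typescript
interface ZonedTimestamp {
  dateTime: string;  // local date and time, e.g. "2035-07-26T22:15:32"
  offset: string;    // "Z" or a numeric offset such as "+03:00"
  timeZone?: string; // IANA time zone name from a [Zone/Name] suffix
  calendar?: string; // calendar from a [u-ca=...] suffix
}

// Simplified subset: date-time with seconds, an offset, then zero or more
// bracketed suffixes. A leading "!" (critical flag) in a suffix is skipped.
const TIMESTAMP =
  /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?)(Z|[+-]\d{2}:\d{2})((?:\[[^\]]+\])*)$/;

function parseZoned(text: string): ZonedTimestamp | null {
  const match = TIMESTAMP.exec(text);
  if (!match) return null;
  const result: ZonedTimestamp = { dateTime: match[1], offset: match[2] };
  const suffix = /\[!?([^\]]+)\]/g;
  let tag: RegExpExecArray | null;
  while ((tag = suffix.exec(match[3])) !== null) {
    const body = tag[1];
    if (body.indexOf("u-ca=") === 0) {
      result.calendar = body.slice(5);
    } else if (body.indexOf("=") === -1) {
      result.timeZone = body; // a suffix without "=" is the time zone
    }
  }
  return result;
}

const zoned = parseZoned("2035-07-26T22:15:32+03:00[Europe/Istanbul][u-ca=gregory]");
console.log(zoned?.timeZone); // Europe/Istanbul
const instant = parseZoned("2025-11-02T02:33:00Z");
console.log(instant?.timeZone); // undefined
```

The first string keeps the zone name, so a consumer can recompute the instant if Istanbul's rules change before 2035; the second carries only a fixed point in time.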