Write Code That Is Human-First

This is the second post in my series: Coding Without the Jargon, where I show how I clean up and gradually improve real-world code.

Rule #2: write your code so a smart 12-year-old with no programming experience can follow the general idea.

You’ll be so glad when you need to revisit that code in a few years’ time.

Advanced Language Constructs

CPU cycles are cheap; human attention is expensive. In my book, trading CPU cycles for code that is easier to read and understand is a good trade.

Many languages and frameworks provide clever ways to express complicated concepts in just a few characters. If you live and breathe that environment, you can express a complex operation with very little code. That gives you a short-term ‘win’.

Now picture the situation where someone who is not as familiar with the advanced language constructs needs to understand and maybe adjust the code.

That someone might be someone new on the team.

Or it might be you, six months or a year later. By now, you’ve switched to a new framework, and you might have forgotten the more intricate details of the previous framework. Frameworks and language features come into and go out of fashion at a rapid pace.

Or I might be showing my code to someone who knows how to code, but does not know ‘my’ language.

Or maybe I am on holiday and the project manager, who does not speak the language fluently, wants to check how some process works.

It is to my benefit if they can still ‘grok’ the gist of what the code does, rather than get mired in weird and wonderful constructs.

The way I tackle this is to try and avoid advanced language or framework constructs if I can.

If the benefit of the construct is mostly in conciseness of expression, rather than an extreme performance increase, I’ll avoid using it.

If I do use something fancy, I will document it clearly in the README so future-me doesn’t have to reverse-engineer it.
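To make that trade-off concrete, here is a small sketch in Python. The order data and both function names are made up for this illustration; the two functions compute the same totals, one in a dense single expression and one spelled out step by step.

```python
# Hypothetical sample data: (customer name, order total).
orders = [("Ann", 120.0), ("Bob", 35.5), ("Ann", 60.0), ("Cy", 15.0)]


# Dense version: short, but the reader has to unpack it all in one go.
def totals_dense(orders):
    return {n: sum(t for m, t in orders if m == n) for n in {n for n, _ in orders}}


# Plain version: more lines, but each line is one small step.
def totals_plain(orders):
    total_per_customer = {}
    for customer_name, order_total in orders:
        if customer_name not in total_per_customer:
            total_per_customer[customer_name] = 0.0
        total_per_customer[customer_name] += order_total
    return total_per_customer
```

Someone who has never seen a comprehension can still follow the second version, and at this scale the extra CPU cycles do not matter.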

Code Like A Graphic Designer

Just like in graphic design and publishing, white space is a crucial tool for guiding the reader’s eye. Code readability can be increased tremendously with thoughtful use of empty space.

When laying out a flyer or article, designers don’t cram every inch with content; they use spacing to make structure visually intuitive.

I apply the same principle in my code: I use newlines, indentation, and vertical layout to reveal the structure of complex expressions.

It’s not about saving lines, it’s about making logic visible.

Small Steps

When coding I see my code as a whole sequence of ‘thoughts’.

I have no formal definition of what a ‘thought’ is but as I read through code, my brain will step through these thoughts.

If the thoughts in the sequence are simple, the code is easy to follow.

If a thought forces the reader to mentally solve a Rubik’s Cube, the code is too complex.

I try to spread out thoughts and make sure there is room to ‘breathe’. The mind of the reader needs to digest what it has seen and get ready for the next bit.

By breaking up the code into small ‘thoughts’ and introducing them gradually, rather than clumping it all up in a dense pile of clever code, I try to make the meaning clear.

I only worry about performance after all is said and done: premature optimization is the root of much evil. While writing code, my only concern is making it readable. In my experience, choosing the right algorithm does far more for performance than writing dense, clever code.

I will often introduce ‘one-time-use’ functions to make thoughts clearer, by putting a name to them.

For example, grabbing a file name extension from a file name or path can easily be written out inline, but I’ll still wrap that expression in a function, say getFileNameExtension(theFilePath), to make the code easier to grok.
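A minimal sketch of what such a one-time-use function could look like, here in Python. The name mirrors the one above; treating a leading-dot name like .gitignore as having no extension is my assumption, not a rule.

```python
def get_file_name_extension(the_file_path):
    # Look at the last path component only, so a dot in a folder name
    # (e.g. "my.folder/readme") is not mistaken for an extension.
    file_name = the_file_path.replace("\\", "/").split("/")[-1]

    position_of_last_dot = file_name.rfind(".")

    # No dot at all, or only a leading dot (e.g. ".gitignore"):
    # treat the name as having no extension.
    if position_of_last_dot <= 0:
        return ""

    return file_name[position_of_last_dot + 1:]
```

At the call site, get_file_name_extension(the_file_path) reads as a single thought, and the fiddly edge cases live in one place.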

Sometimes I work with the concept of ‘stories’: I don’t adhere to the idea that functions should always be short or have a prescribed maximum length.

Some of my functions might span multiple pages. These I see as ‘story functions’: they tell a story, and artificially breaking them up might disrupt the story and make them hard to follow. Stories tend to go like ‘this and then that, and then that, and then that…’, i.e. a nearly straight sequence of calls into other functions.

My code will be mostly short functions (one thought per function), and might also have a few long functions (one story per function).

Next

If you’re interested in automating part of a Creative Cloud-based workflow, please reach out to [email protected] . We create custom scripts, plugins and plug-ins, large and small, to speed up and take the dread out of repetitive tasks.

If you find this post to be helpful, make sure to give me a positive reaction on LinkedIn! I don’t use any other social media platforms. My LinkedIn account is here:

https://www.linkedin.com/in/kristiaan

Leave The Code A Little Cleaner Than You Found It

This is the first post in my series: Coding Without the Jargon, where I show how I clean up and gradually improve real-world code.

Rule #1: every time you’re working on a body of code, try to improve the existing code a little bit.

Gradual improvements can be more effective and less disruptive than going ‘full refactor’.

Things I will often do:

Add Functions or Methods With Meaningful Names Instead of Comments

An example: there might be repeated code like (PHP sample):

$timestamp = round(microtime(true) * 1000);

You might be inclined to add a comment:

// Get current time since epoch in milliseconds
$timestamp = round(microtime(true) * 1000);

But I would not do that.

Instead, I prefer using a function like currentTimeSinceEpochInMilliseconds().

That way, the code can become ‘self-commenting’.

function currentTimeSinceEpochInMilliseconds() {
    return round(microtime(true) * 1000);
}
...
$timestamp = currentTimeSinceEpochInMilliseconds();

Before I’d dive in, I’d make sure a similar function does not already exist in the code.

Using a global find with regular expressions over the whole project source code, I’d search for clues like timestamp, microtime, epoch, current.

Remove Useless Comments

Some comments can make the code harder to read and harder to maintain. They clutter the code and can be misleading.

Read the code, look for such comments and remove them.

One type of less-than-useful comments is ‘stating the obvious’.

// Here we increment the customerCount by 1
customerCount++;

This comment adds nothing of value, and it might become a lie in the future – say, someone comes in and makes the increment a variable but does not notice the comment:

// Here we increment the customerCount by 1
customerCount += customerBatchSize;

Because the person making the change never noticed the comment, it was neither removed nor adjusted, and now it is a lie.

Space Out Complex Expressions

Spend some time reformatting complicated expressions, merely adding white space.

The computer does not care, but the human reader greatly cares. In most programming languages, you can spread complex expressions over multiple lines, and use spacing to draw the human reader’s attention to subexpressions and scopes.

Simple example

            while (! isTimedOut(tryToFinishBefore) && (fIsRunningServicesTimeslice || ! exitEventLoopWhenTrue())) {
...
            }

can become

            while (
                ! isTimedOut(tryToFinishBefore)
            &&
                (
                    fIsRunningServicesTimeslice
                ||
                    ! exitEventLoopWhenTrue()
                )
            ) {
...
            }

Add Comments

I like to avoid comments by making the code self-documenting where I can, but that is not to say comments are not useful.

While you’re debugging or adding to the code, you might need some time figuring out some existing complicated code.

At some point in the future, you, or someone else, might end up back in the same area, wondering what it all means. Why not save that person some time, and record your findings?

I try to initial and date my comments: when reading comments, knowing who made them and when they made them is often very useful info.

// KC 20250321 This code attempts to calculate a unique ID by 
// hashing the data, but it needs a bit of work. It does not 
// handle the edge case where the string is empty - TODO
...

By marking up my comments, I can do global searches for // KC and hit all my comments and review them. Or during the next refactor I can search for TODO and find all the spots that need work.
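A global find in an editor is all this needs, but the same review can be scripted. The sketch below is in Python and assumes comments shaped like the sample above (initials, then an eight-digit date); the regular expression and both function names are mine, so adjust them to your own habits.

```python
import re

# Matches comments shaped like "// KC 20250321 some text".
# The initials-plus-date layout is an assumption based on the sample above.
TAGGED_COMMENT = re.compile(r"//\s*([A-Z]{2,3})\s+(\d{8})\s+(.*)")


def find_tagged_comments(source_text, initials):
    # Returns (line number, date, first line of the comment) per hit.
    found = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        match = TAGGED_COMMENT.search(line)
        if match and match.group(1) == initials:
            found.append((line_number, match.group(2), match.group(3).strip()))
    return found


def find_todo_lines(source_text):
    # Returns the line numbers of all lines that mention TODO.
    todo_line_numbers = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        if "TODO" in line:
            todo_line_numbers.append(line_number)
    return todo_line_numbers
```

Because the date is part of the tag, the results can also be sorted to find the oldest comments, which are the ones most likely to have gone stale.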

Remove Unused Code

Depending on the project and programming environment at hand, it might be easy to identify unused code.

Commented-out code is a prime candidate for removal, of course. It needlessly clutters the code and hinders a global ‘find’ over the whole project because it will generate false positives.

But there might also be functions or named constants that are never called or referenced.

An example: at times I find myself writing additional functions ‘just in case I might need them’, because they’re pretty much ‘for free’ and come with very little extra effort. And occasionally they do come in handy: I often have a hunch of what I might need. But if I don’t need them after all, I’ll eventually strip them away.

There are environments where functions are not necessarily referenced by name, in which case it might be harder to determine use of a function, but in many environments, a simple global search can be enough to determine whether a function is unused.
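As a rough sketch of that ‘simple global search’, here is a Python version. It assumes the source files have already been read into a dictionary of file name to text; the function names are hypothetical. It counts whole-word matches only, and it will still be fooled by names mentioned in comments or strings, or by functions that are called through a computed name.

```python
import re


def count_references(function_name, source_files):
    # source_files maps a file name to the full text of that file.
    # \b keeps "getId" from matching inside "getIdentifier".
    pattern = re.compile(r"\b" + re.escape(function_name) + r"\b")
    reference_count = 0
    for file_name, source_text in source_files.items():
        reference_count += len(pattern.findall(source_text))
    return reference_count


def looks_unused(function_name, source_files):
    # A single hit is just the definition itself.
    return count_references(function_name, source_files) <= 1
```

A ‘looks unused’ verdict is a hint, not proof: I would still read the hits before deleting anything.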

Remove it, and there’s less code to maintain. If need be, you can always bring it back from your source code control system.

Add Tests

You can always make the code more and more self-testing – and you don’t need an existing framework to do this.

All you need to do is add code that tries out various functions and compares actual results with expected results. Any kind of sensible test will help – it does not matter if the tests are not (yet) comprehensive.

Writing tests will help you think about edge cases: you’ll start wondering things like ‘what if this string is empty?’, ‘what if the search string is embedded twice?’,…

These tests can be triggered in all kinds of ways – one-shot triggers, only in debug versions, only when certain .env variables are set. You can pick whatever method works best for you.
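Here is a minimal sketch of the idea in Python, without any test framework. Everything in it is made up for illustration: the sanity_check helper, the count_occurrences function under test, and the RUN_SANITY_CHECKS environment variable that acts as the trigger.

```python
import os

failed_checks = []


def sanity_check(condition, description):
    # Record the failure instead of stopping, so one run reports them all.
    if not condition:
        failed_checks.append(description)


def count_occurrences(text, search_string):
    # Hypothetical function under test.
    # Edge case: an empty search string counts as 'not found'.
    if search_string == "":
        return 0
    return text.count(search_string)


def run_sanity_checks():
    sanity_check(count_occurrences("", "a") == 0, "empty text")
    sanity_check(count_occurrences("abc", "") == 0, "empty search string")
    sanity_check(count_occurrences("abcabc", "bc") == 2, "embedded twice")
    return len(failed_checks) == 0


# Only run the checks when explicitly asked to.
# RUN_SANITY_CHECKS is a made-up variable name.
if os.environ.get("RUN_SANITY_CHECKS") == "1":
    run_sanity_checks()
```

The checks cost nothing when the variable is not set, so they can stay in the code permanently.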

These tests act as a canary in the coal mine: some time in the future someone might change something. These tests might be the one thing that saves your bacon as they suddenly start failing.

And finally: such tests are very useful as sample code. Want to understand how a function works? Look at the tests.


    SANITY_CHECK(Utils::intToHexString(0,0) == "", FUNCTION_BREAK);
    SANITY_CHECK(Utils::intToHexString(1,4) == "0001", FUNCTION_BREAK);
    SANITY_CHECK(Utils::intToHexString(10,6) == "00000a", FUNCTION_BREAK);
    SANITY_CHECK(Utils::intToHexString(0x12345678,8) == "12345678", FUNCTION_BREAK);
    SANITY_CHECK(Utils::intToHexString(0x9ABCDEF0,8) == "9abcdef0", FUNCTION_BREAK);

    SANITY_CHECK(Utils::isWildCardMatch("", ""), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("a", ""), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("", "a"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("", "?"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("", "*"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("a", "a"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("a", "a*"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("a", "*a"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("a", "*a*"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("a", "*a**"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("ab", "*a**b"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("ab", "a*b"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("acb", "a*b"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("acb", "a*c"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("a", "?"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("a", "a?"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("a", "?a"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("ba", "?a"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("a", "?a?"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("ba", "?a?"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("ab", "?a?"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("bca", "?a?"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("abc", "?a?"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("aac", "?a?"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("ax", "*a?*"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("dsdsaap", "*a?*"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("dsdsaadsdhsja", "*a?*"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("dsdsdsdhsja", "*a?*"), FUNCTION_BREAK);
    SANITY_CHECK(Utils::isWildCardMatch("baxss", "*a?*"), FUNCTION_BREAK);
    SANITY_CHECK(! Utils::isWildCardMatch("baxss", "a?*"), FUNCTION_BREAK);
...

Coding Without the Jargon

As part of my day-job I review, debug and fix a lot of code written by other people.

At the same time, I am tracking various influential people; there are a lot of opinions and ‘best ways’ to do things. Patterns, Continuous Development, Dependencies, Injection, Observers and Subscribers… lots of words with ‘deep’ meanings.

Make no mistake: those patterns and principles are very useful, but their meaning is often obscured by jargon and big words.

In practice, I see very little of those ‘best ways’ in the real-life code I am working on.

This is code, written by real people with real problems, and often there seems to be no time for the ‘big word’ principles.

Given an existing code base, it often seems impossible to go from ‘here’ to ‘there’ so we all just stay ‘here’.

Pragmatism

I’ve been coding for over four decades, in all kinds of languages, and have gone through many phases.

In the last 10 to 15 years my coding style has kind of ‘gelled’ into a number of ‘common sense’ principles which I use independent of the language I am coding in – be it JavaScript, C++, PHP, Python, bash, PowerShell, Xojo, COBOL…

More often than not, my common sense principles are built on various patterns and principles, but they are more easily explained in plain English.

My principles also allow me to do ‘gradual improvement’: I can very gradually massage existing code to adhere more and more to my principles, and make it more robust, stable and easier to debug.

What I intend to do is start writing a number of blog posts, each one elaborating on one of the principles I use in day-to-day coding and debugging.

The Principles

  • Always leave the code a little cleaner than you found it. Any tiny improvement counts.
  • Write code that is human-first. It has to be readable for humans that might not know the language well. Worry about optimization later.
  • Don’t sweat the details. If it’s not your own code you’re working on, try to fit in and accept existing code conventions.
  • Any coding convention is a good coding convention. There is no best coding convention, but you have to have one.
  • Make sure you can write to some logging infrastructure at different levels of intensity, and the logs are easy to obtain.
  • Never fail silently. Always report anything unusual in the logs.
  • Exceptions being thrown should be truly exceptional. Try to avoid throw-ing in the day-to-day regular flow of your code.
  • Be helpful and forthcoming to the human reader of your code. Don’t show off with interesting tricks. Don’t embed booby-traps for unsuspecting human readers.
  • Don’t use magic literal numbers and strings. All constants need to have meaningful names.
  • Simple, crude and dependable beats elegant, sophisticated and unreadable.
  • The most important character in your code is the space.
  • The algorithm you select is more important than exactly how you express it in code.
  • Make sure every variable name is carefully chosen to help make the code easier to understand.
  • Comments can become booby-traps. Nearly all comments older than a few weeks are either irrelevant or a lie.
  • If you can remove a comment and instead rename stuff and restructure the code, do it.
  • Commit often and use meaningful commit comments.
  • Don’t comment out code. Delete it. Use your source code control system to retrieve deleted code you need to resurrect.
  • Use condition ladders and write paranoid, defensive code.
  • Try to only have a single return in each method.
  • Bottleneck entry and exit of functions and methods.
  • Use globals judiciously.
  • Don’t use long parameter lists.
  • Be clear and be consistent on how you pass data in and out of functions and methods.
  • Context objects help simplify function signatures.
  • Abstraction helps, but don’t go overboard.
  • Try very hard to avoid copy-paste coding.
  • Break up most long functions.

Stay Tuned

By adhering to and applying these principles, I have been able to successfully tame big, messy, troublesome code bases without massive rewrites.

I’ll start writing a blog post for each of the above principles over the course of the coming weeks…

Conditional Code Without Preprocessor

There’s normally a performance price to pay for adding diagnostic code to programs written in languages that, unlike C/C++, don’t have macros and conditional compilation.

For example in JavaScript, consider something like:

LOG_DEBUG("Current Structure = " + structure.toString());

If this were C++, the LOG_DEBUG could ‘pooff!’ away in a release version simply by redefining the LOG_DEBUG macro as ‘nothing’.

But in languages like JavaScript, even though LOG_DEBUG might be stubbed out into a ‘do nothing’ function call, you’re still executing the toString() method and paying an overhead for calling and returning from LOG_DEBUG.

I have a little trick. To make it work, you need some coding discipline, but the trick allows me to enable/disable code in a script or program without actually removing the code.

The trick is to make sure the conditional bit of code fits on a single line.

Then prefix the code with a specially crafted comment, for example: /*EXTRA_LOGGING//*/

/*EXTRA_LOGGING//*/ LOG_DEBUG("Current Structure = " + structure.toString());

You can use different strings to identify different ‘types’ of conditional code – XX1, DEBUG, EXTRA_LOGGING,…

Then to disable all the EXTRA_LOGGING you simply do a global find-and-replace:

/*EXTRA_LOGGING//*/ -becomes-> /*EXTRA_LOGGING*///

and this comments out all the EXTRA_LOGGING lines.

Do the reverse find-and-replace, and they’re enabled again. It’s crude and a little fiddly, and you’d better not ‘break’ the lines, but it’s better than nothing.

Of course, you need to make sure the pattern does not occur in the source code except for these kinds of lines.
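The find-and-replace can be done in any editor. If you would rather script it, for example as a build step, a sketch in Python could look like this. The function names are my own; the two marker strings are exactly the ones shown above.

```python
def enabled_marker(tag):
    # e.g. /*EXTRA_LOGGING//*/ : a complete comment, so the code after it runs.
    return "/*" + tag + "//*/"


def disabled_marker(tag):
    # e.g. /*EXTRA_LOGGING*/// : a comment followed by //, which
    # comments out the rest of the line.
    return "/*" + tag + "*///"


def disable_conditional_code(source_text, tag):
    return source_text.replace(enabled_marker(tag), disabled_marker(tag))


def enable_conditional_code(source_text, tag):
    return source_text.replace(disabled_marker(tag), enabled_marker(tag))
```

Only lines carrying the given tag are touched, so EXTRA_LOGGING lines can be switched off while DEBUG lines stay active.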

Creative Cloud Desktop sez: You Don’t Have Access To Manage Apps

I’ve created two scripts (Mac/Windows) that might fix the issue of getting a message “You don’t have access to manage apps” in the Creative Cloud Desktop App.

You can find them here:

https://github.com/zwettemaan/FixAdobeAppsPanel/blob/main/README.md

What’s going on?

There are several valid reasons why the Creative Cloud Desktop app might block your access to your Adobe apps.

You might be part of a team and the admin has revoked your access.

Or you might be logged in to an incorrect Adobe account. Or something…

But each time I have personally encountered this, there was no clear, valid reason. One day it worked; the next day, I no longer had access.

If you search online, you’ll likely bump into this community forum post:

https://community.adobe.com/t5/download-install-discussions/quot-you-don-t-have-access-to-manage-apps-quot/td-p/14583430

which suggests a fix: modify a file called serviceconfig.xml and restart the computer. It works.

However, the process is fiddly, and I got tired of going through the motions. I decided to automate the process.

I wrote two scripts, one for Mac and one for Windows, which automate the process outlined in the community forum post.

I used the Claude chatbot to generate an initial version, then refined it manually to fix some issues.

Scripts On GitHub

Caveat: these scripts come with no warranty, express or implied. They work for me, but use them at your own risk.

I recommend inspecting the scripts first to ensure they will work for you. They’re simple and easy to follow.

Future changes to the location or format of serviceconfig.xml may break these scripts.

To run the scripts you need a user account that has administrative privileges.

If your computer is controlled by the IT department, you’re probably out of luck – you need to ask them to help you out.

Go here:

https://github.com/zwettemaan/FixAdobeAppsPanel/blob/main/README.md

IPv6-Related Connection Failures

(Added Note: the issue affects more than command line tools like npm. My Creative Cloud App on Windows started misbehaving and experiencing connection errors, and disabling IPv6 on the adapter level in Windows fixed it).

I ran into a little head scratcher today and I am documenting it here for future reference.

I wanted to pull and initialize a Node.js project in a small local virtual machine, and I tried to run

npm install

and nothing worked. I distinctly remember this used to work, but not this time around.

The symptoms were similar to a networking issue. But the network was up and running: I could ping and connect to various servers.

I eventually figured it out: the root cause is that IPv6 is not yet universally supported.

I live in New Zealand, and cannot connect to external sites using IPv6 – my ISP does not support it.

I do have IPv6 support on the office network, but that’s only for connections within my local network.

When performing a DNS lookup for registry.npmjs.org, I got a bunch of addresses – some IPv6, some IPv4.

What was happening was that the virtual machine was trying to connect to one of the IPv6 addresses for registry.npmjs.org, and it failed.
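To see this mix of addresses for yourself, a small Python sketch along the following lines should do. It uses the standard socket and ipaddress modules; the two function names are my own.

```python
import ipaddress
import socket


def look_up_addresses(host_name, port=443):
    # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
    # the address string is the first element of sockaddr.
    address_infos = socket.getaddrinfo(host_name, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for address_info in address_infos:
        address = address_info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses


def split_by_ip_version(addresses):
    # Sort a list of address strings into an IPv4 pile and an IPv6 pile.
    ipv4_addresses = []
    ipv6_addresses = []
    for address in addresses:
        if ipaddress.ip_address(address).version == 6:
            ipv6_addresses.append(address)
        else:
            ipv4_addresses.append(address)
    return ipv4_addresses, ipv6_addresses
```

Calling split_by_ip_version(look_up_addresses("registry.npmjs.org")) on the affected machine shows which addresses a client can pick from. If the IPv6 pile is not empty and IPv6 traffic cannot leave your network, you may be looking at the same problem.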

The workaround I used was to ‘turn off’ IPv6 in my virtual machine, forcing npm to connect to one of the IPv4 alternatives.

Exactly how you turn off IPv6 depends on the Linux distro and version. In my case, the virtual machine is running a recent Ubuntu, and I added the following lines to /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

and then

sudo sysctl -p

More IPv6 Clashes

I think I had similar issues in the past when trying to connect to GitLab.

And I just bumped into the same issue using the Creative Cloud App on Windows. It was claiming network issues, despite the network being up and running. Disabling IPv6 on the adapter level in Windows fixed that.

I hope publishing this gotcha here will maybe save someone else some time.

Diving Into Data Science – Step 2: a Virtual GPU?

This is step 2 in a narrative log of my discovery mission, mainly to serve as notes for my future self.

https://github.com/zwettemaan/DabblaScience

For the next step, I had to wrestle a bit with Claude 3.7. It handled my requirements for the previous step beautifully, but that was only the first step.

For the next step, my dialogue with Claude reverted to an all too familiar pattern, where it came up with things that almost worked, but with odd shortcomings, oversights and bugs. I needed multiple do-overs to eventually produce a usable result. In the end, I might have been faster starting from scratch and writing it without assistance.

One issue Claude had trouble with was the concept of ~ in file and folder paths: it got confused with the ‘when and where’ of resolving ~ to the home directory.

Depending on the context for various scripts, ~ resolves in the host machine (my Mac), and sometimes ~ resolves in the VM – and Claude did not seem to have a good understanding of how that works.
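The distinction is easier to see in code. In this Python sketch, the first function expands ~ for whoever runs the code, on whichever machine it runs. The second takes the home directory as an explicit argument, so the caller decides whose ~ is meant. Both function names are mine, and /home/ubuntu in the usage note is my assumption about the VM’s home directory.

```python
import os
import posixpath


def resolve_on_this_machine(path):
    # Expands ~ using the home directory of the user running this code,
    # on the machine where this code runs.
    return os.path.expanduser(path)


def resolve_for_home(path, home_directory):
    # Expands ~ against an explicitly named home directory,
    # for example the home directory inside the VM.
    if path == "~":
        return home_directory
    if path.startswith("~/"):
        return posixpath.join(home_directory, path[2:])
    return path
```

For example, resolve_for_home("~/ds-setup", "/home/ubuntu") gives /home/ubuntu/ds-setup no matter where the script runs, whereas resolve_on_this_machine("~/ds-setup") on the Mac points into the Mac user’s home folder.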

Virtual GPU, Sort Of

One of the issues I considered is that the virtual machine I created on my Mac is much more limited than the host Mac itself.

The Mac has an M2 Max CPU and 64GB of RAM and can fairly comfortably run 32B models.

The temporary, throwaway Ubuntu VM created in step 1 does not have the same breathing space, and I was wondering whether it would be possible to make a hybrid solution, where I use the Mac as a ‘virtual GPU’.

After some wrestling with Claude 3.7, I managed to shepherd it to generate a working setup.

I’ve re-structured the GitHub repo with one subfolder per step: the Step 1 subfolder is Step1_20250228, and the subfolder for this step is Step2_20250301.

Running The Notebook

All you need to do is install the preliminaries (homebrew and multipass) on a powerful Mac.

Then run the Step2_20250301/setup.sh and everything else should happen automatically.

Run start_environment.sh and the Jupyter Notebook should open in your browser.

The password you need to enter is datascience (inspect the setup.sh script for more details).

After you’re finished experimenting, run the ds-setup/reset_environment.sh script. Then delete the contents of Step2_20250301 (apart from setup.sh) and it’ll all be goneski, clean slate.

I also asked Claude to
– add some more structure to the helper scripts
– make the Jupyter Notebook password handling less confusing
– pre-download a small model for the sake of testing
– make sure all generated and downloaded material is collated in the project folder.

Changing Tack

It all seems to work, but it’s not really what I am after.

Apart from the highest-level user-interaction code, the whole model is offloaded into a service running on the host Mac, and remains a black box, whereas I want to be able to start peeling apart the layers of the onion and gain more understanding as I dig in.

I think the better approach will be to abandon my requirement to use a Virtual Machine. Instead of using a VM, I’ll need to pick one of the ‘virtualization’ methods available for Python to keep a clear delineation between the experimental environment and my work-environment. venv seems to handle that well enough, so I’ll switch from using VMs to using venv for the next steps.
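For reference, creating such an environment takes very little. The sketch below uses the venv module from the Python standard library; the function name is hypothetical, and it is only an outline of the general idea, not the setup the next steps will actually use.

```python
import os
import venv


def create_experiment_environment(environment_folder):
    # with_pip=False keeps this sketch fast and offline; a real setup
    # would most likely want with_pip=True so packages can be installed.
    venv.create(environment_folder, with_pip=False)

    # Every venv has a pyvenv.cfg file at its top level.
    return os.path.join(environment_folder, "pyvenv.cfg")
```

Starting over with a clean slate then comes down to deleting the environment folder and creating it again.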

Diving Into Data Science – Step 1: Lines In The Sand.

Over recent months, I’ve been ‘dabbling’. Trying various LLMs, experimenting with AI-assisted development,…

I’ve gained a ‘from 10,000 feet up’ understanding of how it all works, but it’s clear that I am missing a lot of background info.

I’ve now decided to try a more systematic approach – rather than flitting around and trying this, then that, I want to start with ‘stepping stones’.

I’ll be posting a series of blog posts ‘as I go’ about each of my stepping stones. This is mostly for my own benefit, so I can come back to my notes when I pick this up some months into the future.

This first post is about setting up my ‘data science workstation’ using Python and multipass.

I’ll be sharing relevant materials in this repo:

https://github.com/zwettemaan/DabblaScience

Going Back to Square One

One of my quirks is that I like to be able to ‘start over’.

E.g. I want to be able to venture into some new realm, try out stuff, take wrong turns, go up the wrong creeks… and then have the luxury of erasing the whole sad affair and starting over with a clean slate.

Rinse, repeat until I get the hang of it.

Virtual Machines

That’s why I love virtual machines and snapshots: I do a lot of my learning and experimenting with VMs, and VMs make it easy to just ‘poooffff!’ it away when things go off the rails. And off the rails I go, often!

Docker containers are somewhat similar, but I find them more cumbersome for the purpose of learning because they have a different approach than VMs. I prefer VMs when it comes to trying stuff out.

Preliminary

If you’re not on a Mac like me, you could ask Claude or another capable LLM/chatbot to generate similar scripts for you; multipass runs on many platforms.

I installed homebrew and then used homebrew to install multipass.

https://brew.sh

then:

brew install multipass

More info for other platforms:

https://canonical.com/multipass/install

Using Claude to Generate a VM

I generated a few scripts using Claude 3.7. It did a good job generating useful scripts:

macos-inventory-script.sh is a script that gives me an inventory of relevant stuff I’ve installed on my Mac. Over time I’ve used all kinds of stuff (fink, MacPorts, homebrew, npm, nvm, pip,…) and as time goes by, my Mac accumulates more and more gunk.

This script gives me an indication of what I installed, so it’s easier to see and remove stuff I don’t need any more.

setup_python_ds_vm.sh is a script that installs an Ubuntu VM, then installs python into it, and creates a few commands.

The setup script will ask you to pick a password. This is a password you choose, which will be used to protect the Jupyter Notebook UI.

After running this script, you can use one of three commands:

  ./start_jupyter.sh - Start Jupyter server on the VM
  ./open_jupyter.sh - Open Jupyter in your Mac's browser
  ./reset_environment.sh - Reset the environment (delete VM)
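
The reset step is what makes the whole ‘square one’ idea work. As a rough illustration, and not the contents of the generated scripts, creating and throwing away a VM with multipass can be sketched like this. The VM name and the function names are made up for the example, and by default the sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: VM_NAME and the function names below are hypothetical,
# they are not taken from the generated scripts.
VM_NAME="${VM_NAME:-python-ds-sketch}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

create_vm() {
    # Launch a fresh Ubuntu VM under the chosen name
    run multipass launch --name "$VM_NAME"
}

reset_vm() {
    # 'delete' only marks the VM as deleted;
    # 'purge' permanently removes all deleted instances
    run multipass delete "$VM_NAME"
    run multipass purge
}

create_vm
reset_vm
```

Run it with DRY_RUN=0 to execute the commands for real. Keep in mind that multipass purge permanently removes every deleted instance, not just this one.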

Here’s the result:

And now we can get started!

More stuff later…

Next

If you’re interested in automating part of a Creative Cloud-based workflow, please reach out to [email protected] . We create custom scripts, large and small, that can speed up and take the dread out of repetitive tasks.

If you find this post to be helpful, make sure to give me a positive reaction on LinkedIn! I don’t use any other social media platforms. My LinkedIn account is here:

https://www.linkedin.com/in/kristiaan

Running an LLM Locally

I’ve been running DeepSeek-r1 locally on my MacBook Pro, and it works really well.

It seems to work best with short, pointed conversations. That is probably because my Mac only has 64GB RAM, which limits the size of the context window: with longer conversations, the LLM starts to ‘forget’ what we’re talking about as the dialogue meanders on.

I’ve now installed Open Web UI and that works a treat – it’s a bit easier than running from a command line window:

https://docs.openwebui.com/

Mucking Around with Context Length

I tried a bunch of things, but the results are inconclusive. /show info reports a context length of 131072, yet when asked, the model itself claims 12k tokens – it might have a fixed limit of 12k tokens, or it might thoughtlessly give a canned answer when asked about its context length:

ollama run deepseek-r1:32b
>>> /show info
  Model
    architecture        qwen2     
    parameters          32.8B     
    context length      131072    
    embedding length    5120      
    quantization        Q4_K_M    
  Parameters
    stop    "<|begin▁of▁sentence|>"    
    stop    "<|end▁of▁sentence|>"      
    stop    "<|User|>"                 
    stop    "<|Assistant|>"            
  License
    MIT License                    
    Copyright (c) 2023 DeepSeek    
>>> what is your context length
<think>
</think>
I can generate responses up to 12k tokens. The best results come from 
careful thought!

Even when I set a small context window, the answer is always 12k:

ollama run deepseek-r1:14b 
>>> /set parameter num_ctx 2048
Set parameter 'num_ctx' to '2048'
>>> what is your context length?
<think>

</think>

I can generate responses up to 12k tokens. The best results come from 
careful thought!

>>> 

It looks like this is a ‘canned’ answer – the think block is empty in both runs, so there is no ‘thinking’ involved.
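
A model does not necessarily know its own runtime settings, so asking it is probably a dead end. For what it’s worth, num_ctx can also be passed per request through Ollama’s HTTP API, in the options field. Here is a minimal sketch, assuming a default local install listening on localhost:11434. build_payload and ask_ollama are hypothetical helper names, and no JSON escaping is done, so keep quotes and backslashes out of the prompt.

```shell
#!/bin/sh
# Sketch only: build_payload and ask_ollama are made-up helper names.
# No JSON escaping is done: the model name and prompt must not
# contain double quotes or backslashes.

build_payload() {
    # $1 = model name, $2 = prompt, $3 = num_ctx
    printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_ctx":%s}}' \
        "$1" "$2" "$3"
}

ask_ollama() {
    # POST the payload to a locally running Ollama server
    # (assumes the default address localhost:11434)
    build_payload "$1" "$2" "$3" |
        curl -s http://localhost:11434/api/generate -d @-
}

# Show the request body that would be sent
build_payload "deepseek-r1:14b" "Hello" 2048
echo
```

Calling ask_ollama "deepseek-r1:14b" "Hello" 2048 sends the request. The JSON reply should include a prompt_eval_count field, the number of prompt tokens that were processed, which may be a more trustworthy yardstick than the model’s own claims.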

JSONC, State Machines, LLM

Recently, I was working on a project where I needed to handle JSON-encoded data.

JSON is cool and all that, but I dislike that I cannot add comments to it.

When I am using JSON to manually craft sample config files for various projects, I really, really want to add comments that explain the file’s contents.

JSONC (JSON with Comments) is a derivative file format that fulfills this need.

JSONC is not a formal standard. Visual Studio Code uses JSONC as the file format for its config files, which gives the file format some credibility and heft.

Side-note: JSONC is not the only game in town. To make config more human-friendly I often prefer my ‘enhanced INI’ file format which is more forgiving than JSON – it won’t break on a missing } – see
https://coppieters.nz/?p=921

The code base I was working on was using PHP, and I wanted to be able to handle JSONC.

There were any number of ways one could go about that. I don’t really like pulling in external code and adding dependencies for simple things like that, so I decided to approach it with a one-two punch.

I’d set up a simple ‘comment stripper’ function to remove the comments from a JSONC string, and then handle the remaining JSON data with the standard PHP JSON-handling functions, e.g.

        $dataJSON = JSONHelper::stripCommentsFromJSONC($dataJSONC);
        $data = json_decode($dataJSON, true);

For reference, the final, human-written PHP code is further down this blog post.

Trying ChatGPT o1

Lazy as I am, I tried to get ChatGPT o1 to whip up a quick implementation of said comment stripper.

Well, that turned out to be a time sink.

First, it came up with some naive implementation using regular expressions, and it was clear as day that would not work very well.

I’ve not even tried to run the code it provided, but I could see that things like

{
"a":"b//bladibla",
"b":"this is not a comment/*bladibla"
}

would immediately have sent it ‘off the rails’: it would mistake comment-like constructs inside quoted strings for actual comments.
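
To make the failure concrete, here is a minimal shell sketch of that kind of naive approach. It is not the code the chatbot produced: strip_comments_naive is a hypothetical stand-in that uses sed to delete anything that looks like a comment, one line at a time.

```shell
#!/bin/sh
# Hypothetical stand-in for a regex-based comment stripper.
# It deletes anything that looks like a comment, with no notion
# of whether that text sits inside a quoted string.
strip_comments_naive() {
    sed -e 's#/\*.*\*/##g' -e 's#//.*$##'
}

# A real block comment on a single line: removed as hoped.
# prints: "x": 1 , "y": 2
printf '%s\n' '"x": 1 /* note */, "y": 2' | strip_comments_naive

# A // inside a string value: the value is cut in half.
# prints: "a":"b
printf '%s\n' '"a":"b//bladibla",' | strip_comments_naive
```

The second result is no longer valid JSON: the closing quote and the comma are gone along with the ‘comment’. A regular expression applied like this has no idea it is inside a string, which is exactly what a state machine keeps track of.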

I tried again, imploring the chatbot to please use a single-pass parser using a state machine.

It fell far short again: rather than use a single state variable, it encoded its state in a bunch of booleans and ints. What a mess.

I politely tried again, telling the chatbot to use a single integer variable for encoding its state.

No cigar. It complied, sort of, but started manipulating the character position in the string to ‘look ahead’ and ‘look behind’, and made a major muddle of things.

At that point I gave up: it had already cost me more time fighting with the AI than it would cost me to properly write up the function myself.

It’s an indication that no good comes from teaming up inexperienced developers with hare-brained AIs.

Sadly enough, the code produced by the LLM will work most of the time and only breaks down on non-obvious edge cases. That’s where the real danger lurks. The code will have ‘sometimes/sometimes’ behavior, and debugging that kind of code will be a nightmare.

I predict that for a lot of such code, the only remedy will be a substantial rewrite; there won’t be any quick fixes for reams of flawed code full of beginners’ mistakes.


PHP: Stripping Comments From JSONC

In case anyone else needs it, and for what it’s worth, as-is, warts and all: here’s the code I came up with. Feel free to use it – I’ve added an MIT license.

As you can see, the code makes a single pass through the string, accessing each character just once, and it tracks its state in a single integer variable $state.

It will also sensibly process some ‘illegal’ forms of JSON (e.g. using single quotes instead of double quotes), but it will not attempt to correct them. All it does is strip comments from the string.

Garbage in, garbage out: send in malformed JSONC, get back malformed JSON.

Let’s hope the LLM scrapers pick up this blog post, and maybe the next generation of AI might have a better grasp of state machines!

<?php

/*
JSONHelper.php
Copyright 2025, Kris Coppieters [email protected]
MIT License info at end of file
*/


namespace App\Helpers;

const PARSE_IDLE                  = 0;
const PARSE_SEEN_DQUOTE           = 1;
const PARSE_SEEN_SQUOTE           = 2;
const PARSE_SEEN_DQUOTE_BSLASH    = 3;
const PARSE_SEEN_SQUOTE_BSLASH    = 4;
const PARSE_SEEN_SLASH            = 5;
const PARSE_IN_LINE_COMMENT       = 6;
const PARSE_IN_BLOCK_COMMENT      = 7;
const PARSE_IN_BLOCK_COMMENT_STAR = 8;

class JSONHelper
{
    // This will also allow 'invalid' JSON that uses single quotes
    // as well as double quotes
    // The goal is not to flag bad JSON - it's merely to strip comments.

    public static function stripCommentsFromJSONC($jsonC) {

        $retVal = $jsonC;

        if (! empty($jsonC)) {

            $state = PARSE_IDLE;
            $json = '';
            $length = strlen($jsonC);

            for ($charIdx = 0; $charIdx < $length; $charIdx++) {

                $char = $jsonC[$charIdx];

                switch ($state) {
                    case PARSE_IDLE: {
                        if ($char == '"') {
                            $json .= $char;
                            $state = PARSE_SEEN_DQUOTE;
                        } 
                        elseif ($char == '\'') {
                            $json .= $char;
                            $state = PARSE_SEEN_SQUOTE;
                        }
                        elseif ($char == '/') {
                            $state = PARSE_SEEN_SLASH;
                        }
                        else {
                            $json .= $char;
                        }
                        break;
                    }
                    case PARSE_SEEN_DQUOTE: {
                        if ($char == '"') {
                            $json .= $char;
                            $state = PARSE_IDLE;
                        } 
                        elseif ($char == '\\') {
                            $state = PARSE_SEEN_DQUOTE_BSLASH;
                        }
                        else {
                            $json .= $char;
                        }
                        break;
                    }
                    case PARSE_SEEN_DQUOTE_BSLASH: {
                        $json .= "\\" . $char;
                        $state = PARSE_SEEN_DQUOTE;
                        break;
                    }
                    case PARSE_SEEN_SQUOTE: {
                        if ($char == '\'') {
                            $json .= $char;
                            $state = PARSE_IDLE;
                        } 
                        elseif ($char == '\\') {
                            $state = PARSE_SEEN_SQUOTE_BSLASH;
                        }
                        else {
                            $json .= $char;
                        }
                        break;
                    }
                    case PARSE_SEEN_SQUOTE_BSLASH: {
                        $json .= "\\" . $char;
                        $state = PARSE_SEEN_SQUOTE;
                        break;
                    }
                    case PARSE_SEEN_SLASH: {
                        if ($char == '/') {
                            $state = PARSE_IN_LINE_COMMENT;
                        } 
                        elseif ($char == '*') {
                            $state = PARSE_IN_BLOCK_COMMENT;
                        }
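                        else {
                            // Lone slash, not a comment: emit it and
                            // leave the PARSE_SEEN_SLASH state.
                            $json .= '/' . $char;
                            if ($char == '"') {
                                $state = PARSE_SEEN_DQUOTE;
                            }
                            elseif ($char == '\'') {
                                $state = PARSE_SEEN_SQUOTE;
                            }
                            else {
                                $state = PARSE_IDLE;
                            }
                        }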
                        break;
                    }
                    case PARSE_IN_LINE_COMMENT: {
                        if ($char == "\012" || $char == "\015") {
                            $state = PARSE_IDLE;
                        } 
                        break;
                    }
                    case PARSE_IN_BLOCK_COMMENT: {
                        if ($char == '*') {
                            $state = PARSE_IN_BLOCK_COMMENT_STAR;
                        } 
                        break;
                    }
                    case PARSE_IN_BLOCK_COMMENT_STAR: {
                        if ($char == '/') {
                            $state = PARSE_IDLE;
                        }
                        elseif ($char != '*') {
                            $state = PARSE_IN_BLOCK_COMMENT;
                        }
                        // Stay in PARSE_IN_BLOCK_COMMENT_STAR 
                        // state if char == '*'
                        break;
                    }
                }
            }

            if ($state == PARSE_SEEN_SLASH) {
                // Edge case where the last character is a slash. 
                // The slash was consumed but not yet output.
                // The returned data will definitely not be JSON, 
                // but we'll output it anyway.
                $json .= "/";
            }
            elseif (
                $state == PARSE_SEEN_SQUOTE_BSLASH 
              || 
                $state == PARSE_SEEN_DQUOTE_BSLASH
            ) {
                // Edge case where the file terminates mid-string 
                // with a pending backslash. The data will
                // definitely not be JSON, but we'll output it 
                // anyway.
                $json .= "\\";
            }

            $retVal = $json;
        }

        return $retVal;
    }

}

/*
Permission is hereby granted, free of charge, 
to any person obtaining a copy of this software
and associated documentation files (the “Software”), 
to deal in the Software without restriction, 
including without limitation the rights to use, 
copy, modify, merge, publish, distribute, sublicense, 
and/or sell copies of the Software, and to permit 
persons to whom the Software is furnished to do so, 
subject to the following conditions:

The above copyright notice and this permission notice
shall be included in all copies or substantial portions
of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY 
OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT 
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 
SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY 
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
*/

?>