Generative LLM-backed AI is controversial and amazing; no debate about that. It hinders, it helps. It frustrates, it amuses, and it is here to stay. Knowing when, and when not, to lean on an LLM is probably a skill worth acquiring. This post is a reflection on a piece of work where I was both helped and hindered by an LLM. The adventure is set against the backdrop of writing a Jinja2 macro, DBT model and config for finding natural keys. It’s a long story, quite a journey really, so feel free to skip to the summary and conclusion first before investing your time in reading the whole post.
Uniqueness and finding natural keys
Finding the natural key is a critical step in any attempt to understand data. Most data sets come with some kind of surrogate key, such as an incrementing integer ID, a hash or some other artificial unique identifier. It can at times be fine to simply use that identifier as a key in your data modelling for OLAP/ML/AI, but it is not generally a good idea. When we use a surrogate key produced by a source system to define uniqueness in our system, we tightly couple our system to that source system. Tight coupling is problematic because a change in the source system producing the data could change how the key is calculated. We end up accepting an understanding of uniqueness that is not grounded in reality but artificially generated, supplied by a system we usually have little to no insight into or influence over.
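To make this concrete: a natural key is a set of real-world columns whose combined values identify each row uniquely. A minimal Python sketch of testing a candidate key (hypothetical data and column names, purely illustrative):

```python
def is_candidate_key(rows, columns):
    """Return True if the given columns uniquely identify every row."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # duplicate combination: not a candidate key
        seen.add(key)
    return True

# Hypothetical sample data
patients = [
    {"ssn": "111-11-1111", "birthdate": "1990-01-01"},
    {"ssn": "222-22-2222", "birthdate": "1990-01-01"},
]
```

Here `ssn` alone is a candidate key, while `birthdate` alone is not, since two patients share a birthdate.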
Act 1: where the LLM worked - well, sort of
Just before the April Easter holidays I found out about Aider - a CLI-based generative/agentic AI tool that can function with many LLM backends.
It’s not Cursor or Replit but an open-source, CLI-based project, and it needs a lot of hand-holding.
In return for that hand-holding you get to be fully in control: you choose exactly which files to provide as context, and what happens when.
It’s not exactly vibe coding, unless of course you simply go on a roll and accept all changes blindly.
There is an `/ask` mode for finding out the plan, a `/code` mode for actioning the plan, and an `/architect` mode I do not understand yet.
Playing around with it I achieved some success, but judged it to be too green, requiring too much effort to precisely instruct.
During the April Easter holidays I heard one of the ThoughtWorks Ninjas, Lilly Ryan, talk about her success using Aider. It was a minor comment, a little side quest, but enough to motivate me, so I gave it another chance.
What I already had
- A completeness model/macro pair controlled by config alone that could count blank or null values for all columns of a model.
- A specialisation of this model/macro pair that enabled it to operate on a subset of columns.
- A uniqueness model/macro pair controlled by config alone that could count the number of rows involved in duplication.
What I wanted
- The uniqueness macro/model pair to operate on a subset of columns, similar to the completeness one.
- The uniqueness macro/model pair to expand to all combinations of the subset of columns, explained below.
In more detail…
I already had a method consisting of macro, model and config for null/blank checking, as part of a larger data quality project:

- A model that would dynamically interpret the `vars:` global element in a `dbt_project.yml` file to figure out which models, and optionally which columns, to operate on: the “config”.
- This model would then make DBT aware of dependencies by emitting a SQL comment `-- Depends on {{ ref('model_name') }}` for each of the models specified in the `vars:` element, constructed through calls to the `ref()` function.
- It would then call the main macro with the respective parts of the `vars:` element.
- The main macro would then construct SQL to perform the null/blank check, which is basically a count of rows where the value is NULL, or the empty string in the case of a text column type.
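Outside of Jinja, the config-interpretation step can be sketched as follows; this is a hypothetical Python rendering of the logic, not the actual DBT model code:

```python
# Hypothetical mirror of the vars: structure from dbt_project.yml
config = {
    "adq_models": [
        {"name": "patients", "completeness": ["non_blank_score"]},
    ]
}

def parse_adq_models(cfg):
    """Yield (model name, model-level checks, per-column overrides) per entry."""
    for model in cfg.get("adq_models", []):
        yield model["name"], model.get("completeness", []), model.get("columns", [])
```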
This works great. Given the input in `dbt_project.yml`:
```yaml
vars:
  adq_models:
    - name: patients
      completeness:
        - non_blank_score
```
the model and macro combination would know how to construct SQL performing a `non_blank_score` query on each column of the `patients` model; this “check” falls into the “completeness” category of checks.
The result of this is a very easy method for obtaining a count of null or blank column values, for each column of a model.
Generally, this type of check is something you have to write for each column individually, but now you can do it by simply providing a couple of lines of YML and have the counts for all columns.
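For illustration, the kind of per-column SQL such a macro might emit could be built like this (a hypothetical sketch; the real macro assembles its SQL from the config and covers every column):

```python
def non_blank_score_sql(model, column):
    # Count rows where the column is NULL or the empty string,
    # per the post's description of the completeness check.
    return (
        f"select count(*) as non_blank_score "
        f"from {model} "
        f"where {column} is null or {column} = ''"
    )
```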
Extending this pattern further: suppose we have a very wide patients model with hundreds of columns, and we do not want to calculate `non_blank_score` for all of them.
We are only interested in a small subset; in this case one column only, `drivers`, representing the driver’s license number.
```yaml
vars:
  adq_models:
    - name: patients
      completeness:
        - non_blank_score
      columns:
        - name: "drivers"
          completeness:
            - non_blank_score
```
The columns specified override the all-columns behaviour, performing the `non_blank_score` check for the `drivers` column only.
Lastly, I had an additional model/macro pair that would perform a type of uniqueness check on all columns.
This is a method for calculating how many rows are part of any kind of duplication.
For the 5 rows `[(1), (1), (2), (2), (3)]` the check returns 4, since 4 of the 5 rows are involved in duplication.
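The same count is easy to express in Python; a small illustrative re-implementation (not the macro’s generated SQL):

```python
from collections import Counter

def duplication_count(values):
    """Count how many rows are involved in any kind of duplication."""
    counts = Counter(values)
    # A row participates in duplication when its value occurs more than once.
    return sum(c for c in counts.values() if c > 1)
```

For `[1, 1, 2, 2, 3]` this returns 4, matching the example above.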
What I wanted
I wanted to enable this uniqueness check on a per-column basis for a subset of columns, similar to the `non_blank_score` check.
```yaml
vars:
  adq_models:
    - name: patients
      columns:
        - name: "birthdate"
          uniqueness:
            - uniqueness_score
        - name: "deathdate"
          uniqueness:
            - uniqueness_score
        - name: "ssn"
          uniqueness:
            - uniqueness_score
        - name: "passport"
          uniqueness:
            - uniqueness_score
        - name: "drivers"
          completeness:
            - non_blank_score
          uniqueness:
            - uniqueness_score
```
After connecting Aider to the OpenAI API, I added the relevant files to the context using the `/add file` command.
Then I entered `/ask` mode and told it the whole story: that I wanted it to morph my existing uniqueness model/macro pair into a per-column mode similar to the `non_blank_score` check.
I asked Aider to modify the model/macro to be able to do uniqueness on a per-column basis.
A minute or so later it had a plan, which I scanned; it looked decent enough, so I entered `/code` mode and told it to go ahead.
Act 2: First run…
Using DBT’s selector syntax I isolated the model that would cause the new code to be executed.
Unsurprisingly, execution failed the first time due to SQL errors, but the Jinja was not broken.
Interestingly, it seemed the LLM had introduced a subtle SQL bug. It was calling the SQL aggregate function `sum()` in a context where summation was not required, since only one row was returned.
Because other columns were also referenced in the `SELECT`, Postgres complained that those columns needed to appear in a `GROUP BY` clause.
This was easy to fix, although tedious; but since the LLM had introduced the bug, I did not trust it to solve it cleanly, so I made the fix manually.
Success! With the fix applied there were no more errors, and I confirmed that the uniqueness check could now function on a per-column basis.
Considering what had just happened, I was impressed. The LLM was able to do metaprogramming: write code that would write code.
Yes, the LLM had my code to work with, but it had to apply patterns from one set of files to another without introducing problems.
Yes, there was a limit to this; it is conceivable that, given additional prompting, it would have been able to solve the `sum()` bug, but I estimated that the probability of it producing more unwanted code was high enough to warrant a manual intervention.
Act 3: Extending the solution
In addition to the per-column behaviour, I also wanted it to perform the uniqueness check on all combinations of the selected columns.
For example, given columns `a, b, c` I wanted to perform uniqueness checking on `(a), (b), (c), (a, b), (b, c), (a, c), (a, b, c)`.
Aider is quite cool in that it commits code before changes are made, so rollback is straightforward. Nonetheless, I created a new branch for this work, just to be safe.
I went ahead and asked for this added behaviour. The LLM responded with altered code which seemed decent enough, so I ran it…
This time we were not so lucky: the macro could not even run, because the LLM had farmed out the combinations part to a Python module that is not available in DBT’s Jinja environment.
At this point I realised that the critical part of the problem was the combinations; I needed a macro to produce all combinations given an input list of columns.
Act 4: Focussing on the small but hard part
Leaving the CLI and Aider behind, I moved over to ChatGPT, since this was a problem I could frame precisely.
I said:
Given a list, write code to produce all combinations of the items in the list; produce a new list of lists where the elements of the outermost list are all combinations of the original list. This needs to work in Jinja, and ideally a single expression, but a function is also acceptable.
ChatGPT said:
In Jinja, producing all combinations (i.e., power set) of a list is a bit tricky since Jinja is not a full programming language. However, with a custom macro or a well-written function in a supported Jinja environment (e.g., in DBT, Ansible, or Flask), you can still get it done.
🔧 Solution using a Jinja macro (recommended for clarity)
This builds the power set (all combinations of all lengths including the empty set):
```jinja
{% macro powerset(seq) %}
  {% set result = [[]] %}
  {% for item in seq %}
    {% set result = result + result | map('union', [item]) | list %}
  {% endfor %}
  {{ result }}
{% endmacro %}
```
I thought:
This macro did not even look correct, but nonetheless I gave it a try. It would not execute, with DBT complaining that the `union` filter was not available.
ChatGPT had also confidently asserted what the results would be should this macro run, but alas… it was very wrong.
I said:
No filter named ‘union’.
ChatGPT said:
You’re right — Jinja does not have a built-in union filter. That error occurs because union is not available unless you define it yourself (which you typically can’t in DBT, Ansible, etc. without custom extensions). But we can still achieve all non-empty combinations using Jinja-native syntax with a macro. We simulate the logic manually by appending new combinations to the list using basic iteration.
✅ Jinja Macro for All Non-Empty Combinations (No union filter)
```jinja
{% macro powerset_nonempty(seq) %}
  {% set result = [[]] %}
  {% for item in seq %}
    {% set new_result = [] %}
    {% for r in result %}
      {% set new_result = new_result + [r + [item]] %}
    {% endfor %}
    {% set result = result + new_result %}
  {% endfor %}
  {{ result | reject('equalto', []) | list }}
{% endmacro %}
```
I thought:
Now at least the macro executed, but it produced no output at all. Taking a closer look, I could not imagine this macro ever working, and I started to doubt that the problem is solvable with iteration alone; perhaps it in fact needs recursion.
In addition, a more basic bug in the macro would be obvious to anyone who has ever tried to do something inside a Jinja for loop.
Loops introduce a new scope, so assignment becomes instantiation.
In other words, what happens in the loop stays in the loop!
To avoid this, we can use the `.append()` method on the previously defined list instead of assignment with `=`.
Interestingly, a while back I was able to make ChatGPT understand this, but it seems it has forgotten.
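The scoping pitfall is easy to reproduce with plain Jinja2 from Python (this assumes the `jinja2` package with its `do` extension enabled; DBT’s Jinja supports `{% do %}` out of the box):

```python
from jinja2 import Environment

env = Environment(extensions=["jinja2.ext.do"])

# Reassigning inside {% for %} creates a loop-local variable: the outer
# xs is untouched, so the rendered length is 0.
lost = env.from_string(
    "{% set xs = [] %}"
    "{% for i in [1, 2, 3] %}{% set xs = xs + [i] %}{% endfor %}"
    "{{ xs | length }}"
).render()

# Mutating the existing list with .append() survives the loop: length 3.
kept = env.from_string(
    "{% set xs = [] %}"
    "{% for i in [1, 2, 3] %}{% do xs.append(i) %}{% endfor %}"
    "{{ xs | length }}"
).render()
```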
I said:
Write Jinja to filter the power set such that it only contains elements of two or more items.
ChatGPT said:
To filter the power set so that it only includes combinations of two or more items (i.e., excludes both the empty list [] and any list of length 1), you can extend the Jinja macro like this:
```jinja
{% macro powerset_min2(seq) %}
  {% set result = [[]] %}
  {% for item in seq %}
    {% set new_result = [] %}
    {% for r in result %}
      {% set new_result = new_result + [r + [item]] %}
    {% endfor %}
    {% set result = result + new_result %}
  {% endfor %}
  {{ result | selectattr('length', '>=', 2) | list }}
{% endmacro %}
```
I said:
The macro powerset_nonempty does not work: it returns an empty list.
ChatGPT said:
Thanks for pointing that out — you’re absolutely right. The issue is that Jinja’s set statements cannot modify existing lists in-place, and when we write:
{% set result = result + new_result %}
…it doesn’t behave as expected unless we re-assign everything carefully, especially in nested loops.
Let me give you a fully working and tested Jinja2 macro that generates all combinations of a list with length ≥ 2, without relying on unavailable filters like union.
✅ Working Jinja Macro: Power Set with Combinations of 2 or More
```jinja
{% macro powerset_min2(seq) %}
  {% set result = [[]] %}
  {% for item in seq %}
    {% set temp = [] %}
    {% for subset in result %}
      {% set temp = temp + [subset + [item]] %}
    {% endfor %}
    {% set result = result + temp %}
  {% endfor %}
  {% set result = result | selectattr('length', 'ge', 2) | list %}
  {{ result }}
{% endmacro %}
```
I thought:
Well, it still hadn’t figured out the bit about not using assignment in loops… Needless to say, this did not work either, and we were not even on recursion yet; still iterating. Note that I have simplified this account: yes, there were more iterations, let’s just say more than I would comfortably like to admit. I really tried hard to help ChatGPT win. Another prompt I threw in was: “Rewrite the original power set macro specifically for DBT constraints and usage.” Like everything that went before it, it too failed. Time for the big gun: recursion!
I said:
Write a recursive Jinja macro to produce the power set from a given set.
ChatGPT said:
Jinja2 doesn’t support true recursion out of the box — especially in DBT, which disables some advanced features like call blocks (used for recursion in standard Jinja). However, we can still simulate recursion using a helper macro that behaves recursively by calling itself with smaller slices of the list. Here’s a DBT-compatible recursive macro to generate the power set of a list (including the empty list):
✅ Recursive Power Set Macro for DBT
```jinja
{% macro powerset_recursive(seq) %}
  {% if seq | length == 0 %}
    {{ return([[]]) }}
  {% else %}
    {% set head = [seq[0]] %}
    {% set tail = seq[1:] %}
    {% set tail_powerset = powerset_recursive(tail) %}
    {% set combined = [] %}
    {% for subset in tail_powerset %}
      {% set combined = combined + [subset] + [head + subset] %}
    {% endfor %}
    {{ return(combined) }}
  {% endif %}
{% endmacro %}
```
I thought:
A very helpful output from ChatGPT, right at the beginning, was identifying this as power set expansion. I did test the new and improved macro, but it failed as well. At this point I felt things had gone too far, so I turned to Google and researched a bit, you know, in the old-fashioned way.
I found many examples of power set expansion in Python through recursion, and the solutions were elegant; so I thought, let’s see if ChatGPT also knows them. I started a new chat, since I feared all the context from the past would cloud things a bit.
I said:
Implement in Python using lists a recursive function to produce the power set. Explain the code.
ChatGPT said:
Here’s a clean and fully recursive implementation of the power set in Python using lists, followed by a clear explanation:
✅ Python Code:
```python
def power_set(lst):
    if not lst:
        return [[]]  # Base case: power set of empty list is a list containing the empty list
    first = lst[0]
    rest = lst[1:]
    # Recursively compute the power set of the rest
    without_first = power_set(rest)
    # For each subset, add another subset that includes the first element
    with_first = [[first] + subset for subset in without_first]
    return without_first + with_first
```
I thought:
Well, this looks quite nice, so I gave it a try, and it was the first thing that worked! But we want to do this within DBT and Jinja macros. So I, the human, stepped in and started translating and rewriting this Python code into a Jinja macro.
Finale: A working recursive powerset expansion in Jinja that works in DBT!
I would not say it was simple, nor was it quick. Rewriting the recursive Python code into a Jinja macro took a bit of time and a lot of testing. It also required me to draw on my experiential knowledge of Jinja macros, and probably on more faculties, such as debugging, that I am not directly aware of.
```jinja
{% macro powerset_recursive(seq) %}
  {% if seq | length == 0 %}
    {{ return([[]]) }}
  {% endif %}
  {% set head = seq[0] %}
  {% set tail = seq[1:] %}
  {% set tail_powerset = powerset_recursive(tail) %}
  {% set combined = [] %}
  {% for item in tail_powerset %}
    {% set subset = item | list %}
    {% do subset.append(head) %}
    {% do combined.append(subset | list) %}
  {% endfor %}
  {{ return(combined | list + tail_powerset | list) }}
{% endmacro %}
```
Note of caution: power set expansion of N elements results in a set with 2 ^ N elements, so it grows exponentially.
Summary
- At the start of this year, OpenAI’s models made basic mistakes when writing Jinja, like changing `{% .. %}` into `{{ .. }}`, but now they write much better code.
- The first step, the slog work of transferring the learnings from one macro/model pair into another, it handled expertly. It saved time and was a huge help.
- Where things started going wrong was with the trickier part of the problem: the combinations.
- The LLM providing the correct term (power set expansion), although a small thing, helped a lot with further investigations.
- The LLM was unable to make the leap from an iterative to a recursive solution without prompt direction. Had I been unaware of recursion, who knows how long we would have been stuck.
- Although not really a niche thing, DBT, Jinja and macros are niche enough for a strong commercial model like 4o to struggle with.
- ChatGPT was not able to draw on the history of chats where the Jinja looping issue had been dealt with; although it had some notion that caution was needed, it didn’t really know how to solve the problem.
- Python remains the language it codes really well in, so a reasonable strategy is to let it write Python first and then translate into something else.
Conclusion
Make sure that you can use agentic/generative AI to assist with your work, but also make sure that you can identify the small but complex parts that the AIs struggle with. I suspect that without a fair number of these small, tricky parts in your work, your job is at risk of AI replacement.
I hope this article can be of use to you, helping you navigate these strange times of generative-AI-assisted programming a little better. Don’t be discouraged from investing in learning new programming languages and frameworks just because some of the new tools can build entire simple systems. Now, more than ever, we are going to need people with deep technical skills to do the really hard parts and to step in where LLMs fail to tread.
But don’t just take my word for it: read this article by Alberto Fortin describing his experiences with all of this on a much grander scale.
P.S.
I also want to give credit where credit is due; kind of. As part of its first response, ChatGPT did inform me of a simple Python-based solution using the itertools module. I copy it below.
```python
from itertools import chain, combinations

def powerset(seq):
    return list(chain.from_iterable(combinations(seq, r) for r in range(len(seq) + 1)))
```
Let’s see if it works:
```python
>>> l = ['a', 'b', 'c']
>>> l
['a', 'b', 'c']
>>> powerset(l)
[(), ('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
>>> 2**len(l) == len(powerset(l))
True
```
P.P.S.
A quick Google revealed that no, defining custom Jinja filters in DBT is not possible, but DBT now has support for a couple of very useful Python standard library modules:
> Although this still isn’t possible (and comes up from time to time), we have made good progress on knocking off the sorts of things that make them particularly necessary. dbt now includes native support for: the `re` regex module (in v0.19.0), which enabled `slugify` in dbt-utils; `zip`, `set` and `itertools` in v1.2.0.
- Creating a custom jinja filter - Help - dbt Community Forum
- dbt Jinja functions | dbt Developer Hub
- About modules variable | dbt Developer Hub
Thank you for reading.