henriblancke / dbt-formatter Goto Github PK
View Code? Open in Web Editor NEWFormatting for dbt jinja-flavored sql
License: MIT License
Formatting for dbt jinja-flavored sql
License: MIT License
I typed:
SELECT
updated_at,
asin,
data->'title' AS product_title
FROM
product
And it formatted to:
SELECT
updated_at,
ASIN,
DATA -> 'title' AS product_title
FROM
product
Why is asin
and data
capitalized?
The formatter can introduce an error when formatting the end of a case statement:
Before formatting:
when GTTV_28 between 2500 and 4000 then 'Trending'
when GTTV_28 < 2500 then 'Low'
else null end csm_status
After formatting:
WHEN GTTV_28 BETWEEN 2500
AND 4000 THEN 'Trending'
WHEN GTTV_28 < 2500 THEN 'Low'
ELSE NULLEND csm_status
NULL END
should remain as separate words.
This does not happen on all case statements; in this case it may have been related to use of between X and Y
in one of the WHEN
clauses
Snowflake complains when arrows are expanded with a space between =
and >
When else is NULL
it will append END
without space to NULL
Example:
CASE
WHEN a > 1 THEN b
WHEN b > 1 THEN a
ELSE NULL
END AS col
It would be nice if you could configure the plugin to use tabs vs spaces.
Weird bug where every time I save (and formatter runs), the indentation keeps increasing:
Screencast: https://streamable.com/kbfgqq
I think JSONB and type casting is being mis-formated:
From:
(data ->> 'newPriceIsMAP')::boolean as new_price_is_map,
To:
(
data ->> 'newPriceIsMAP'
) :: boolean as new_price_is_map,
Seems like the first option is more readable.
Support .editorconfig
configuration for SQL files rather than requiring custom config. This will make it easier to centralise formatting configuration.
prettier supports this out-of-the-box.
Currently expressions with a single word are exempt from becoming an inline block. Ideally we would be able to let the user decide at what number of words they wan't the parenthesis expression to become an inline block.
Desired:
LEFT JOIN {{ ref('mytable') }} AS tb
ON {{ var }} = tb.col
Actual:
LEFT JOIN {{ ref('mytable') }} AS tb
ON {{ var }}
= tb.col
Hello maintainers of this package ๐ thanks for your hard work on this! I have been using this for a little while now at work. I've noticed a few bugs in the formatter such that it produces invalid SQL. I'm wondering if you're open to PRs to address bugs?
Hello! We're keen to use this to start formatting our dbt codebase; it looks like exactly what we want to have a standardised SQL output that takes into account jinja code blocks, but have come across a couple of issues. I'm happy to look at fixing this but want to validate the approach first.
The issue is to do with the way the tokeniser treats [
and ]
as string delimiters, which is fine for SQL Server SQL, but breaks when we start using arrays and dictionaries in our jinja code (which we currently do). The errors can be reproduced as follows:
it('works with arrays in the jinja statement', () => {
const formatted = formatter(`{{ some_macro(arg_1 = 'foo', arg_2 = ['[string with square brackets]'], arg_3 = 'bar') }}`);
expect(formatted).toBe(`{{ some_macro(
arg_1 = 'foo',
arg_2 = ['[string with square brackets]'],
arg_3 = 'bar'
) }}`);
})
it('does not treat square brackets in jinja statements as strings', () => {
const formatted = formatter(`{{ variable['key'] }}`);
expect(formatted).toBe(`{{ variable['key'] }}`);
})
When I run these tests, I get the following output:
โ jinja flavored queries โบ works with arrays in the jinja statement
expect(received).toBe(expected) // Object.is equality
- Expected - 3
+ Received + 2
{{ some_macro(
arg_1 = 'foo',
- arg_2 = ['[string with square brackets]'],
- arg_3 = 'bar'
- ) }}
+ arg_2 = ['[string with square brackets] '], arg_3 = ' bar ') }}
+
โ jinja flavored queries โบ does not treat square brackets in jinja statements as strings
expect(received).toBe(expected) // Object.is equality
- Expected - 1
+ Received + 1
- {{ variable['key'] }}
+ {{ variable ['key'] }}
โต
The problem in the first test is that it tokenises ['[string with square brackets]']
into ['[string with square brackets]
and '], arg_3 = '" }
. The incorrect match on the trailing bracket ultimately results in tokenising the entirety of the remainder of the statment into a single token, and we end up with our entire file on one line.
The problem in the second test is that due to the treatment of [ ... ]
as a string, it separates a dictionary access in the jinja code variable['key']
into two tokens variable
and ['key']
, which are then separated by whitespace because they're just two different strings.
I think the root of this problem is due to the fact that the tokeniser doesn't distinguish between SQL and code in the jinja templates. Obviously we still need to treat [...]
as a string in the SQL but treating it like a string in python doesn't make sense. Consequently I think what we need to do is to parse and format jinja separately to the SQL:
SELECT * FROM {{ ref("table") }}
would get tokenised into something like:
{'type': 'reserved', 'value': 'SELECT'}
{'type': 'string', 'value': '*'}
{'type': 'reserved', 'value': 'FROM'}
{'type': 'jinja', 'value': '{{ ref("table") }}'}
jinja
token type.Does the problem and the proposed solution make sense? Like I say, I'm happy to work on a fix for this but given it's a fairly substantial change I thought I'd make sure I'm not missing something first ๐
A nice to have is a binary so this package can easily be ran on files using a runner such as pre-commit or https://github.com/okonet/lint-staged.
e.g.
npx dbt-formatter ./path-to-some/file.sql --spaces 2 --indent 2 --sql redshift
Join with alias formats as:
INNER JOIN user_mins um
ON um.user_id = C.user_id
Join using jinja and an alias adds an extra newline before the alias:
INNER JOIN {{ ref("dim_user") }}
dim_user
ON dim_user.id = C.user_id
We would love to start using this in our DBT projects. We are currently formatting SQL queries manually, and we are using a justified AS blocks for column aliases - which adds a lot in terms of readability. Has anyone looked into what this would require?
For example:
SELECT
CAST(1 as int) AS col_a
, CAST('text_value' as varchar(255)) AS col_b
...instead of:
SELECT
CAST(1 as int) AS col_a
, CAST('text_value' as varchar(255)) AS col_b
I'm happy to help contribute code if someone is able to point me in the right direction.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.