It should be a war crime for programs in 2022 to use non-UNIX/non-GNU style command line options. Add it to the Rome Statute's Article 7 list of crimes against humanity. Full blown tribunal at The Hague presided over by the international criminal court. Punishable by having to use Visual Basic 3.0 for all programming for the rest of their life.
> sqlite3 :memory: -cmd '.mode csv' ...
CSV mode can also be set with a switch:
sqlite3 :memory: -csv
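Putting those flags together, a complete sketch (file name, table name, and columns invented for illustration):

```shell
# Create a toy CSV, then import and query it in one sqlite3 invocation.
# -csv sets CSV mode for both .import and the output; the first CSV row
# becomes the column names of the freshly created table t.
printf 'name,n\na,1\nb,2\n' > data.csv
sqlite3 :memory: -csv -cmd '.import data.csv t' 'SELECT name, n*2 FROM t;'
```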
Btw, am I alone in thinking that DataFrame abstractions in OOP languages (like Pandas in Python) are oftentimes simply inferior to relational algebra? I'm not sure that many Data Scientists are aware of the expressive power of SQL.
This is far more useful for SQL users than chaining several sed/awk/sort commands in a pipe (although a bit against *nix principles).
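For contrast, the pipe-chain version of a simple grouped average might look like this (a rough sketch; it assumes no quoted commas in the CSV):

```shell
# GROUP BY column 1, AVG(column 2) — roughly SELECT c1, AVG(c2) ... GROUP BY c1
printf 'a,1\na,3\nb,2\n' |
  awk -F, '{sum[$1] += $2; cnt[$1]++} END {for (k in sum) print k, sum[k]/cnt[k]}'
```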
How smart is SQLite at detecting column types from CSV data?
I once wrote a Python script to load csv files into SQLite. It had a whole hierarchy of rules to determine the data type of each column.
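For what it's worth, sqlite3's plain .import does no inference at all: every column of a table created by .import is TEXT. A quick check (throwaway file and table names invented):

```shell
printf 'a,b\n1,2.5\n' > check.csv
# typeof() reports the storage class of each imported value
sqlite3 :memory: -cmd '.mode csv' -cmd '.import check.csv t' \
  'SELECT typeof(a), typeof(b) FROM t;'
```

Both come back as text; getting real types means CASTing in the query or creating a typed table before importing.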
SQLite's virtual table API (https://www.sqlite.org/vtab.html) makes it possible to access other data structures through the query engine. You don't need to know much if anything about how the database engine executes queries, you only need to implement the callbacks it needs to do its job. A few years ago I wrote an extension to let me search through serialized Protobufs which were stored as blobs in a regular database.
I had to do something very similar for analysing CVE information recently, but I don't remember having to use the :memory: option. I suspect it defaults to that if no .db file is specified.
Slightly tangentially, when doing aggregated queries, SQLite has a very useful group_concat(..., ',') function that will concatenate the expression in the first arg for each row in the group, separated by the separator in the 2nd arg.
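A minimal group_concat sketch (table and data invented):

```shell
sqlite3 :memory: "
  CREATE TABLE t(grp TEXT, name TEXT);
  INSERT INTO t VALUES ('a','x'),('a','y'),('b','z');
  -- one comma-separated list of names per group
  SELECT grp, group_concat(name, ',') FROM t GROUP BY grp;
"
```

Note that the order of items within each list is unspecified unless you feed group_concat from an ordered subquery.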
In many situations SQLite is a suitable alternative to jq for simple tabular JSON.
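For example, a JSON array of objects can be flattened with json_each() plus the shell's readfile() function — a sketch, with the file name and fields invented:

```shell
printf '[{"name":"a","n":1},{"name":"b","n":2}]' > data.json
# json_each() turns the array into rows; json_extract() pulls out fields
sqlite3 :memory: \
  "SELECT json_extract(value, '$.name'), json_extract(value, '$.n')
   FROM json_each(readfile('data.json'));"
```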
This comes at the most opportune time, as I am cranking through selectstarsql, sqlbolt, schemaverse, and pgexercises to bone up on my SQL skills for my upcoming data engineering interviews. SQL IS the secret superpower that devs don't know they possess.
I've been doing this. But I hate it. CSVs need to die. They're terrible data formats. But here we are. And SQLite makes things amazing.
I'm genuinely curious about the use case of this. Whenever I have CSVs I either import them into R/Python if they're small, or into some DBMS first if they're large. Can somebody tell me what niche this is filling?
I used to daydream about adding an ALLOW DOING IT LIVE option to Cassandra's cqlsh client. In the event that your WHERE clause was incompatible with your table's key, it would just wholesale dump the table in question into a SQLite database while indexing the appropriate columns, run the query in question, actually return the result, and then properly clean up.
I've become a fan of using sqlite-utils to work with CSV or JSON files.
It's a two step process though. One to create and insert into a DB and a second to select from and return.Reply
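The two steps, sketched (assumes sqlite-utils is installed; file, database, and table names invented):

```shell
printf 'name,n\na,1\n' > data.csv
sqlite-utils insert data.db rows data.csv --csv   # step 1: load the CSV into a table
sqlite-utils data.db 'SELECT name, n FROM rows'   # step 2: query; prints JSON
```

With --csv everything is imported as text; recent versions also have a memory subcommand that collapses this into a single step.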
Miller is also a great tool for computations and transformations on CSV/TSV/JSON/SQLite files...
Here is the way to pretty-print the same result with mlr:
mlr --icsv --opprint --barred stats1 -a count,mean -f total_amount -g passenger_count then sort -f passenger_count taxi.csv
Can somebody post the equivalent in Windows cmd?
MODE is one of:
  ascii    Columns/rows delimited by 0x1F and 0x1E
One of my all-time favorite (and somehow still-obscure / relatively unknown) tools is called `lnav` ^1. It's a mini-ETL powertool with embedded SQLite, perfect for wrangling log files or other semi-structured data (a few millions of rows are no problem), it's intuitive and flexible...
I am a data scientist. I have used a lot of tools/libraries to interact with data. SQLite is my favorite. It is hard to beat the syntax/grammar.
Also, when I use SQLite I do not output using column mode. I pipe to `tv` (tidy-viewer) to get a pretty output.
transparency: I am the dev of this utility
Using DuckDB:
DuckDB will automatically infer you are reading a CSV file from the extension, then automatically infer column names from the header, together with various CSV properties (data types, delimiter, quote type, etc). You don't even need to quote the table name as long as the file is in your current directory and the file name contains no special characters.
duckdb -c "SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi.csv GROUP BY ALL"
DuckDB uses the SQLite shell, so all of the commands that are mentioned in the article with SQLite will also work for DuckDB.
Disclaimer: Developer of DuckDB
Using ClickHouse, you can also process local files in one line with the clickhouse-local command-line tool. And it will look a lot easier:
clickhouse local -q "SELECT passenger_count, COUNT(*), AVG(total_amount) FROM file(taxi.csv, 'CSVWithNames') GROUP BY passenger_count"
And ClickHouse supports a lot of different file formats both for import and export (you can see all of them here https://clickhouse.com/docs/en/interfaces/formats/).
There is an example of using clickhouse-local with the taxi dataset mentioned in the post: https://colab.research.google.com/drive/1tiOUCjTnwUIFRxovpRX...
I'm looking through this guy's website for 'today I learned' posts, and at first I'm impressed by how many of them there are. But then I start thinking: when you're trying to solve a problem you search through a lot of material. None of his posts are attributed. He's getting all his information from somewhere and then he goes and posts these articles, just ripping off other sources.
I can understand when it's based on your original work, but this website reads more like basic questions posted on Stack Overflow. E.g. 'how to connect to a website with IPv6.' Tell me he didn't just Google that and post the result. 0/10
Since many people are sharing one-liners with various tools...
It also infers everything automatically and typechecks your query for errors. You can use it with csv, json, parquet but also Postgres, MySQL, etc. All in a single query!
octosql 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi.csv GROUP BY passenger_count'
Disclaimer: author of OctoSQL
The .import command used for actually loading the CSV is kinda picky about your CSVs being well-formatted. I don't think it supports embedded newlines at all.
You can also prefix the filename in a SQLite .import command with |, which runs the rest as a shell command whose output (hopefully text) gets imported. With that, some import scripts can live entirely in one SQL file.
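A sketch of the pipe form, with a throwaway printf standing in for a real preprocessing command:

```shell
sqlite3 :memory: <<'EOF'
.mode csv
.import '| printf "a,b\n1,2\n"' t
SELECT a + b FROM t;
EOF
```

Because the leading | makes .import read a command's output, a cleanup pipeline (iconv, a de-quoting awk, etc.) can sit right inside the import line.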