Hacker News Re-Imagined

Ask HN: What Happened to Big Data?

There was a buzzword, "Big Data", which regularly popped up in tech news. I haven’t seen the term used much lately. What advances are being made on it?

I imagine with various privacy scandals it fell out of favour since your data should be /your/ data only.

And many have talked about data being the ‘new oil’ when really it should be reframed as radioactive waste.

What happened to using this term to hype up your brand: ‘We use Big Data to infer information about how to improve and go forward’?

Was it just a hyped up buzzword?

  • 66 points
  • 9 days ago

  • @night-rider
  • Created a post


@troymc 9 days

Replying to @night-rider 🎙

You wrote, "And many have talked about data being the ‘new oil’ when really it should be reframed as radioactive waste."

It's true that privacy regulations have made personally identifiable information (PII) into something that is challenging to store, like radioactive waste.

But most of the world's big data is not PII. For example, the huge amount of data being produced by modern telescopes and particle physics labs is about things like stars and subatomic particles, not people.

The world has fewer than 8e9 people, but there are around 1e11 stars in our galaxy, and there are more than 1e11 galaxies in the observable universe.
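
A quick check of that arithmetic, as a minimal sketch: the three constants are the round figures quoted above, and treating every galaxy as holding about as many stars as ours is a simplifying assumption of the sketch, not a claim from the comment.

```python
# Round order-of-magnitude figures quoted in the comment above.
PEOPLE_ON_EARTH = 8e9        # upper bound on world population
STARS_PER_GALAXY = 1e11      # rough count for our own galaxy
GALAXIES_OBSERVABLE = 1e11   # lower bound for the observable universe

# Assumption (not from the comment): every galaxy has about as many stars as ours.
stars_observable = STARS_PER_GALAXY * GALAXIES_OBSERVABLE

stars_per_person_in_galaxy = STARS_PER_GALAXY / PEOPLE_ON_EARTH
stars_per_person_in_universe = stars_observable / PEOPLE_ON_EARTH

print(f"stars per person (our galaxy): {stars_per_person_in_galaxy:.1f}")
print(f"stars per person (observable universe): {stars_per_person_in_universe:.2e}")
```

Under those assumptions there are about 12.5 stars per person in our galaxy alone, and on the order of 1e12 per person across the observable universe.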


@michaelbuckbee 9 days

Replying to @night-rider 🎙

Two things:

1. It morphed into ML, as the dirty secret of most ML projects is that they're predominantly about data. Put another way, you can't derive a model from nothing.

2. You mentioned privacy scandals, but things like CCPA + GDPR legitimately did make larger corporations pause and ask "Do we actually need this information?" where prior to that everyone was a hoarder "just in case".


@tacheiordache 8 days

Replying to @night-rider 🎙

The goalpost moved a bit. We now talk about privacy since there is at least something we could potentially accomplish. Fighting big data is impossible; corps are collecting more and more of it and we fall for it. We are giving it away by using free products that appear to be backed by advertising.


@d--b 9 days

Replying to @night-rider 🎙

Now people throw everything in “data lakes”. It’s already so complicated to handle the ingestion that they don’t even want to try to do anything with that much data.


@indymike 9 days

Replying to @night-rider 🎙

> Was it just a hyped up buzzword?

Yes.

I can't tell you how many meetings I've been in where someone was pitching a big data idea and the meeting ended when we all realized that if it fits on a $50 thumb drive it isn't big data.
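
The thumb-drive test is easy to put into numbers. A minimal sketch follows; the 256 GB capacity, the row count, and the bytes-per-row figure are all placeholder assumptions chosen for illustration, and `is_big_data` is a hypothetical helper name.

```python
# Assumption: rough capacity of a ~$50 thumb drive. Adjust to current prices.
THUMB_DRIVE_BYTES = 256 * 10**9


def is_big_data(dataset_bytes: int, capacity_bytes: int = THUMB_DRIVE_BYTES) -> bool:
    """Thumb-drive test: data that fits on the drive is not 'big'."""
    return dataset_bytes > capacity_bytes


# Hypothetical pitch: 50 million rows at roughly 200 bytes per row.
rows, bytes_per_row = 50_000_000, 200
size = rows * bytes_per_row  # 10 GB in total

verdict = "big data" if is_big_data(size) else "just data"
print(f"{size / 10**9:.0f} GB -> {verdict}")
```

With these made-up numbers the dataset is about 10 GB, well under the assumed drive capacity.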


@jasode 9 days

Replying to @night-rider 🎙

To echo what others have said, the practice of big data has become so normalized that the language term "big data" -- as a new thing to call attention to itself -- is not needed as much as before.

A similar language history happened to terms like "dynamic web" or "interactive web". In the late 1990s, when JavaScript started to be heavily used, we called attention to that new trend as the "dynamic web". Today, the "interactive web" phrase has mostly gone away. But that doesn't mean that JavaScript-enabled web pages were a fad. On the contrary, we're using more JavaScript than ever before. We just take it as a given that the web is interactive.

Examples of rise & fall of "interactive web" in language use that peaked around 2004:

https://books.google.com/ngrams/graph?content=dynamic+web&ye...

https://books.google.com/ngrams/graph?content=interactive+we...


@mechanical_bear 9 days

Replying to @night-rider 🎙

See “Serverless”, “cloud native”, “zero trust” etc

Hype.


@c2h5oh 9 days

Replying to @night-rider 🎙

- hyped buzzword

- catch-all excuse to record everything forever without having an idea how to use that data

- actually hard problems


@speedgoose 9 days

Replying to @night-rider 🎙

A new one is born: extreme data.

For example: https://www.horizon-europe.gouv.fr/extreme-data-mining-aggre...

But most people work on small to medium data.


@nojito 9 days

Replying to @night-rider 🎙

Big data is just data now.

We nonchalantly spin up massive 1TB+ RAM clusters to process our data without really admiring how much data it actually is.


@dr-neptune 9 days

Replying to @night-rider 🎙

These days I hear a lot of "AIML".

It doesn't make much sense to me, as I've never seen anyone use anything you'd find in an AI book that you wouldn't also find in a machine learning book.

For big data, I think that the terminology waned but data engineers internalized the desire to scale everything they make to handle big data. So data engineering teams are still using things like Spark (or Databricks) even if their datasets aren't big enough to need that.


@jgrahamc 9 days

Replying to @night-rider 🎙

Never really went away, never really arrived: https://www.youtube.com/watch?v=pcBJfkE5UwU


@Konohamaru 9 days

Replying to @night-rider 🎙

Here's a rule of thumb: anything with the name "big" before it is bad.

Big oil. Big bad. Big lie. Big brother. Big apple. Etc...


@taubek 9 days

Replying to @night-rider 🎙

I think that "big data" has simply become "data".


@kleinsch 9 days

Replying to @night-rider 🎙

Everyone who has data is still doing it, the buzzword just went out of fashion. Now it’s data science, analytics, ML eng. What truly ended is “big data” meaning “we’ll come take your logs and magically transform your business.”


@speedylight 9 days

Replying to @night-rider 🎙

If the comments here are anything to go by, “Big Data” has simply become “data”.


@civilized 9 days

Replying to @night-rider 🎙

Big Data was about the giddy excitement of being able to run some fancy predictive model on a large amount of data and get some sort of incredible benefit. Now most organizations have tried it. The few that actually benefit now take it for granted. The rest have moved on, although they still have a team of data engineers babysitting a legacy Hadoop cluster.


@rg111 9 days

Replying to @night-rider 🎙

It became ubiquitous.

Now it is in many places. Enterprises use it each moment.

A laptop hard disk is now capable of holding databases with tens of millions of rows.

Traditional "Data Science" and modern Deep Learning rely entirely on it. Millions of datapoints are used to create models every day.

A sensor on a human wrist collects and stores thousands of data points each day.

So do refrigerators, cars, and your washing machine, with the ubiquity of IoT.

Giant tech cos use billions of rows each day to show users products, or sell their attention as products.

Big Data became ubiquitous. And it became so common that nobody calls it that anymore.

Tools like BigQuery, Dask, and even Pandas and SQL can handle hundreds of thousands to hundreds of millions of rows (or other structures) with normal, regular programming, commands, etc.


@aeyes 9 days

Replying to @night-rider 🎙

Most companies probably realized that they don't have Big Data problems because they only have a limited amount of data which you actually can process in an acceptable amount of time on a single Postgres instance. Distributed data processing has a huge upfront tax and you really only want to be doing it if the data set is enormous.

I guess it is similar to other technologies which most companies or developers would really never need due to their limited scale like distributed databases, NoSQL or microservices: It is interesting technology and engineers would like to get their hands on it because that's what the big boys play with, even if they don't really need it. In the meantime the industry hypes it because the technology is difficult so they know that they can make money doing consulting.

I'm not saying that it is not useful technology, I work at a company where we had the need to go from Postgres to "Big Data" tooling. But for tons of businesses it just doesn't make any sense. And even in our case one of the questions I have most frequently is: What business decision are you taking based on processing this enormous amount of data? Can we not take the same decision based on less data?


@monkeybutton 9 days

Replying to @night-rider 🎙

If you need big data, the thing you are looking for is small and the effect size of what you are optimizing will also be small.
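
One way to make that link concrete is Lehr's rule of thumb for comparing two groups, which puts the required sample size at roughly 16 / d^2 per group for about 80% power at a 5% two-sided significance level, where d is the standardized effect size. This is a textbook approximation brought in for illustration, not something the comment cites, and `samples_per_group` is a hypothetical helper name.

```python
def samples_per_group(effect_size: float) -> int:
    """Lehr's rule of thumb: n is roughly 16 / d^2 per group.

    Assumes a two-group comparison, ~80% power, 5% two-sided significance.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    return max(1, round(16 / effect_size ** 2))


# The smaller the effect you are chasing, the faster the data requirement grows.
for d in (0.5, 0.1, 0.01, 0.001):
    print(f"d = {d}: about {samples_per_group(d):,} samples per group")
```

Because the requirement scales with 1 / d^2, shrinking the effect by a factor of ten multiplies the data needed by about a hundred, which is the sense in which needing big data implies a small effect.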


@marcinzm 9 days

Replying to @night-rider 🎙

It's simply become the norm. Companies store and analyze lots of data all the time. It's no longer special but simply table stakes. Look at the valuations of Snowflake and Databricks.


@ironchef 9 days

Replying to @night-rider 🎙

It's just considered "data" these days. We just look at the Vs of the data and adjust based on those. High velocity? Do X. High volume? Accommodate Y. High variety? ... The other side of things is the underlying data quality often had tons of issues, so there's been a lot of focus on the data observability (which isn't sexy at all).

Still tons of folks out there using Hadoop (ew), Snowflake, etc. New technologies coming out include things like Trino, Apache Iceberg, etc. So it's there ... just no one cares about the moniker .. just getting things done.


@koliber 9 days

Replying to @night-rider 🎙

The buzzword had many definitions, and with time, people realized that what they were dealing with was not BIG data, but just data.

People tried to define big data in terms of the size of the data set. The best definition of big data I heard is "a data storage and/or processing system that cannot handle the amount of data in one physical machine and needs distributed storage and/or processing".

That's a lot of data. Most people and companies are not dealing with big data.

Kind of like everything being "blockchain" at one point. Eventually people realized that the word has a specific meaning that does not apply to many things.


@danielmarkbruce 8 days

Replying to @night-rider 🎙

It might sound trite, but we got "big disk", "big memory", "big cpu" and "big gpu" instead.

It's crazy how much you can do with one machine these days. Hence you often just have "data". And then Snowflake/BigQuery/Redshift if it literally can't fit on a machine (which is rare).


@zcw100 9 days

Replying to @night-rider 🎙

Data may be the new oil, which I don't agree with, but it looked about like this https://en.wikipedia.org/wiki/Petroleum_industry_in_Ohio#/me...

You can call data the new oil when someone invades a country to secure a data center.

I don't think anything fell out of favor and things are a long way from data being, "your data only" although you have been given some rights in that regard.

Nothing happened to it. Big data always represented pushing the boundaries of what could be done when dealing with large amounts of data. After a while the technology matured to the point where working with large datasets just became something you did. There was a lot of hype to it and many organizations unnecessarily went along for the ride. It's also a balance between current technology and the economics of compute, storage, networking, etc. As the balance changes, what and how you do things also changes.


@photochemsyn 9 days

Replying to @night-rider 🎙

A cursory search of DDG News & Google News reveals "big data" is still a widely used buzzword in headlines. I don't think it went anywhere.


@gregw2 9 days

Replying to @night-rider 🎙

If you used big data tech (Hadoop/YARN/Spark) when you didn't actually have big data (PB), it was slower than columnar databases, so the shine wore off.


@Froedlich 8 days

Replying to @night-rider 🎙

The Big Data Problem in a nutshell: the more data you have, the easier it is to draw wrong conclusions from it.

Otherwise known as, "you tend to find what you're looking for", as hidden biases in the query will ignore data that doesn't support that.
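
One reading of this is the multiple-comparisons effect: the more unrelated columns you can search through, the stronger the best correlation you will find purely by chance. A minimal simulation sketch follows; everything in it is random noise, and the function names, the row count, and the seed are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_columns, n_rows=50, seed=0):
    """Largest |r| between a random target and n_columns of unrelated noise."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_columns):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(column, target)))
    return best


# No column has any real relationship to the target, yet the "best" one
# found can only look stronger as more columns are searched.
for k in (5, 50, 500):
    print(f"{k} columns -> strongest |r| = {strongest_chance_correlation(k):.2f}")
```

Because the seed is fixed, each larger run searches a superset of the smaller run's columns, so the strongest spurious correlation never decreases as columns are added.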


@benjaminwootton 9 days

Replying to @night-rider 🎙

In addition to the skeptical comments, I think infrastructure and best practice also caught up such that what used to be big data is not so big anymore.

Storing data on S3 or using BigQuery removes a lot of the challenges, as opposed to doing this stuff in the data centre. You then also have services such as EMR, Databricks and Snowflake to acquire the tooling and platforms as IaaS/SaaS. The actual work then moves up the stack.

Businesses are doing more with data than ever before and the volumes are growing. I just think the challenge moved on from managing large datasets as a result of new tooling, infrastructure and practices.


@PaulHoule 9 days

Replying to @night-rider 🎙

Basically.


@ravenstine 9 days

Replying to @night-rider 🎙

Under most circumstances, there was no "big data" in the first place, and many businesses discovered that you can't magically derive insight, create value, build features, solve problems, etc. by performing aggregation and data science on most forms of data, especially if that data is not "big." No one can even define what "big" means. How many "rows" of data do you need for it to be "big"? How many "columns" per record? How many relationships both implicit and explicit? How fast does that data change or grow? How structured or unstructured? It's entirely a value judgment that many engineers and managers had no business determining as being "big."

The true "big data" became so ubiquitous and accessible that there became no reason for anyone to care about it outside the bubble of Silicon Valley. It's just data, and really was all along.


@singularity2001 9 days

Replying to @night-rider 🎙

It was consumed by AI


@h2odragon 9 days

Replying to @night-rider 🎙

It was just a hyped up buzzword, and new ones have been substituted now. "machine learning" had a broader appeal, many places didn't have that much data; the ones that did have "big" data largely didn't find much signal in the noise even with AI/ML tools.


@lbriner 9 days

Replying to @night-rider 🎙

Like a lot of fads in IT, Big Data sounded like "if you have a lot of data you can monetize it" so companies threw 7+ figures at the technology and then realised that you can have too much data to know what to do with and couldn't really monetize it (obviously some did/still do). Even at a simple level, working at a data collection company, it is very clear that lots of people want to collect as much as possible only then to do precisely nothing useful with the results.

Then Machine Learning comes along and these same people think that means you can just feed the beast with your big data and it will be clever enough to tell you what you want to know. Then the same companies will realise that they still don't have the skills to work out what to tell the ML algorithm to do.


@pluc 9 days

Replying to @night-rider 🎙

Big Data has always been a marketing paradigm. We've always had lots of data; we just didn't process it for business intelligence before.

"The advances in computing have made it easier to accomplish tasks that were completely unnecessary before"


@jhoelzel 9 days

Replying to @night-rider 🎙

Well, with the introduction of Kubernetes as a platform and other cloud solutions, most "big data" just became "data".

It's amazing to see that nowadays the persistent volume claim used for logging is, on average, much bigger than the average dedicated machine was about 10 years ago.


@sys_64738 9 days

Replying to @night-rider 🎙

It's parked next to the information superhighway.


@bravetraveler 9 days

Replying to @night-rider 🎙

We still have teams doing work that would probably qualify, but it's definitely not the craze anymore.


@sphix0r 9 days

Replying to @night-rider 🎙

Compliance and web3 arrived.

A lot of companies didn't even need to go the Hadoop route. CSVs, Jupyter notebooks and SQL databases are very powerful tools for most companies.
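
As a rough illustration of how far a plain SQL database goes, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `events` table, its columns, and the synthetic rows are all made up for the example.

```python
import random
import sqlite3

rng = random.Random(42)  # fixed seed so the synthetic data is repeatable

# In-memory SQLite database: no server, no cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")

# Hypothetical data: 200,000 events spread over 1,000 users.
rows = [(rng.randrange(1000), rng.random() * 100) for _ in range(200_000)]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# A typical "analytics" question: who are the top three users by total spend?
top = conn.execute(
    "SELECT user_id, COUNT(*) AS n, SUM(amount) AS total "
    "FROM events GROUP BY user_id ORDER BY total DESC LIMIT 3"
).fetchall()

for user_id, n, total in top:
    print(f"user {user_id}: {n} events, total {total:.2f}")
```

The same query shape works unchanged against a CSV loaded into a table or against a server database such as Postgres, which is much of the appeal.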

