There was a buzzword, “Big Data”, which regularly popped up in tech news. I haven’t seen the term used much lately. What advances are being made on it?
I imagine with various privacy scandals it fell out of favour since your data should be /your/ data only.
And many have talked about data being the ‘new oil’ when really it should be reframed as radioactive waste.
What happened to using this term to hype up your brand: ‘We use Big Data to infer information about how to improve and go forward’?
Was it just a hyped up buzzword?
You wrote, "And many have talked about data being the ‘new oil’ when really it should be reframed as radioactive waste."
It's true that privacy regulations have made personally identifiable information (PII) into something that is challenging to store, like radioactive waste.
But most of the world's big data is not PII. For example, the huge amount of data being produced by modern telescopes and particle physics labs is about things like stars and subatomic particles, not people.
The world has fewer than 8e9 people, but there are around 1e11 stars in our galaxy, and more than 1e11 galaxies in the observable universe.
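The comparison above can be made concrete with a back-of-envelope calculation (the star and galaxy counts are the rough figures quoted in the comment, not precise measurements):

```python
# Back-of-envelope: astronomical datasets dwarf anything tied to people.
people = 8e9             # world population, slightly under 8e9
stars_per_galaxy = 1e11  # rough count for the Milky Way
galaxies = 1e11          # lower bound for the observable universe

stars_total = stars_per_galaxy * galaxies  # ~1e22 stars
print(f"{stars_total / people:.0e} stars per person")  # → 1e+12 stars per person
```

So even a modest per-star record gives a dataset roughly a trillion times larger than one record per human.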
1. It morphed into ML, as the dirty secret of most ML projects is that they're predominantly about data. Put another way, you can't derive a model from nothing.
2. You mentioned privacy scandals, but things like CCPA + GDPR legitimately did make larger corporations pause and ask "Do we actually need this information?", where prior to that everyone was a hoarder, "just in case".
The goalpost moved a bit. We now talk about privacy since there is at least something we could potentially accomplish. Fighting big data is impossible; corps are collecting more and more of it and we fall for it. We are giving it away by using free products that appear to be backed by advertising.
Now people throw everything in “data lakes”. It’s already so complicated to handle the ingestion that they don’t even want to try to do anything with that much data.
> Was it just a hyped up buzzword?
I can't tell you how many meetings I've been in where someone was pitching a big data idea and the meeting ended when we all realized that if it fits on a $50 thumb drive it isn't big data.
To echo what others have said, the practice of big data has become so normalized that the term "big data" -- as a new thing calling attention to itself -- isn't needed as much as before.
Compare the rise & fall of "interactive web" in language use, which peaked around 2004.
See “Serverless”, “cloud native”, “zero trust” etc
- hyped buzzword
- catch-all excuse to record everything forever without having an idea how to use that data
- actually hard problems
A new one is born: extreme data.
But most people work on small to medium data.
Big data is just data now.
We nonchalantly spin up massive 1 TB+ RAM clusters to process our data without really appreciating how much data it actually is.
These days I hear a lot of "AIML"
It doesn't make much sense to me, as I've never seen anyone use anything from an AI book that you wouldn't also find in a machine learning book.
For big data, I think the terminology waned but data engineers internalized the desire to scale everything they make to handle big data. So data engineering teams are still using things like Spark (or Databricks) even if their datasets aren't big enough to need that.
Here's a rule of thumb: anything with the name "big" before it is bad.
Big oil. Big bad. Big lie. Big brother. Big apple. Etc...
I think that "big data" has simply become "data".
Everyone who has data is still doing it, the buzzword just went out of fashion. Now it’s data science, analytics, ML eng. What truly ended is “big data” meaning “we’ll come take your logs and magically transform your business.”
If the comments here are anything to go by, “Big Data” has simply become “data”.
Big Data was about the giddy excitement of being able to run some fancy predictive model on a large amount of data and get some sort of incredible benefit. Now most organizations have tried it. The few that actually benefit now take it for granted. The rest have moved on, although they still have a team of data engineers babysitting a legacy Hadoop cluster.
It became ubiquitous.
Now it is everywhere; enterprises use it constantly.
A laptop hard disk is now capable of holding databases with tens of millions of rows.
Traditional "Data Science" and modern Deep Learning rely entirely on it. Millions of data points are used to create models every day.
A sensor on a human wrist collects and stores thousands of data points each day.
So do refrigerators, cars, and washing machines, with the ubiquity of IoT.
Giant tech cos use billions of rows each day to show users products, or sell their attention as products.
Big Data became so common that nobody calls it that anymore.
Tools like BigQuery, Dask, and even Pandas and SQL can handle hundreds of thousands to hundreds of millions of rows (or other structures) with ordinary programming tools and commands.
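A minimal sketch of that point, using only stdlib SQL (`sqlite3`): the table name, columns, and row count below are made up for illustration, but the pattern scales to tens of millions of rows on a laptop.

```python
import sqlite3

# An in-memory database standing in for a "big-ish" single-machine dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")

# A million rows inserts in seconds on a laptop; tens of millions also fit.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    ((i % 1000, float(i % 7)) for i in range(1_000_000)),
)

# Ordinary SQL, no cluster required.
(total,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
print(total)  # → 1000000
```

Nothing distributed is involved; this is the "normal programming" the comment describes.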
Most companies probably realized that they don't have Big Data problems because they only have a limited amount of data which you actually can process in an acceptable amount of time on a single Postgres instance. Distributed data processing has a huge upfront tax and you really only want to be doing it if the data set is enormous.
I guess it is similar to other technologies which most companies or developers would really never need due to their limited scale like distributed databases, NoSQL or microservices: It is interesting technology and engineers would like to get their hands on it because that's what the big boys play with, even if they don't really need it. In the meantime the industry hypes it because the technology is difficult so they know that they can make money doing consulting.
I'm not saying that it is not useful technology, I work at a company where we had the need to go from Postgres to "Big Data" tooling. But for tons of businesses it just doesn't make any sense. And even in our case, one of the questions I ask most frequently is: what business decision are you taking based on processing this enormous amount of data? Can we not take the same decision based on less data?
If you need big data, the thing you are looking for is small and the effect size of what you are optimizing will also be small.
It's simply become the norm. Companies store and analyze lots of data all the time. It's no longer special but simply table stakes. Look at the valuations of Snowflake and Databricks.
It's just considered "data" these days. We just look at the Vs of the data and adjust based on those. High velocity? Do X. High volume? Accommodate Y. High variety? ... The other side of things is that the underlying data quality often had tons of issues, so there's been a lot of focus on data observability (which isn't sexy at all).
Still tons of folks out there using Hadoop (ew), Snowflake, etc. New technologies coming out include things like Trino, Apache Iceberg, etc. So it's there; no one cares about the moniker anymore, they're just getting things done.
The buzzword had many definitions, and with time people realized that what they were dealing with was not BIG data, but just data.
People tried to define big data in terms of the size of the data set. The best definition of big data I've heard is "data whose storage and/or processing cannot be handled by one physical machine and needs distributed storage and/or processing".
That's a lot of data. Most people and companies are not dealing with big data.
Kind of like everything being "blockchain" at one point. Eventually people realized that the word has a specific meaning that does not apply to many things.
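A rough sanity check in the spirit of that definition: does the raw data even fit on one machine? The function name, disk size, and row counts below are made-up illustrative numbers, not a real threshold.

```python
def fits_on_one_machine(rows: int, bytes_per_row: int,
                        disk_bytes: int = 4 * 10**12) -> bool:
    """True if the raw data fits on a single ~4 TB disk."""
    return rows * bytes_per_row <= disk_bytes

# 100 million 1 KB rows is only ~100 GB: not big data by this definition.
print(fits_on_one_machine(rows=10**8, bytes_per_row=1_000))   # → True
# 10 billion 1 KB rows (~10 TB) would push past a single disk.
print(fits_on_one_machine(rows=10**10, bytes_per_row=1_000))  # → False
```

By this yardstick, most companies' datasets land comfortably in the first case.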
It might sound trite, but we got "big disk", "big memory", "big cpu" and "big gpu" instead.
It's crazy how much you can do with one machine these days. Hence you often just have "data", and then Snowflake/BigQuery/Redshift if it literally can't fit on one machine (which is rare).
Data may be the new oil, which I don't agree with, but it looked about like this: https://en.wikipedia.org/wiki/Petroleum_industry_in_Ohio#/me...
You can call data the new oil when someone invades a country to secure a data center.
I don't think anything fell out of favor, and things are a long way from data being "your data only", although you have been given some rights in that regard.
Nothing happened to it. Big data always represented pushing the boundaries of what could be done when dealing with large amounts of data. After a while the technology matured to the point where working with large datasets just became something you did. There was a lot of hype to it and many organizations unnecessarily went along for the ride. It's also a balance between current technology and the economics of compute, storage, networking, etc. As the balance changes, what you do and how you do it also changes.
A cursory search of DDG News and Google News reveals "big data" is still a widely used buzzword in headlines. I don't think it went anywhere.
If you used big data tech (Hadoop/YARN/Spark) when you didn't actually have big data (petabytes), it was slower than columnar databases, so the shine wore off.
The Big Data Problem in a nutshell: the more data you have, the easier it is to draw wrong conclusions from it.
Otherwise known as "you tend to find what you're looking for": hidden biases in the query will ignore data that doesn't support it.
In addition to the skeptical comments, I think infrastructure and best practice also caught up such that what used to be big data is not so big anymore.
Storing data on S3 or using BigQuery removes a lot of the challenges, as opposed to doing this stuff in the data centre. You then also have services such as EMR, Databricks and Snowflake to acquire the tooling and platforms as IaaS/SaaS. The actual work then moves up the stack.
Businesses are doing more with data than ever before and the volumes are growing. I just think the challenge moved on from managing large datasets as a result of new tooling, infrastructure and practices.
Under most circumstances, there was no "big data" in the first place, and many businesses discovered that you can't magically derive insight, create value, build features, solve problems, etc. by performing aggregation and data science on most forms of data, especially if that data is not "big." No one can even define what "big" means. How many "rows" of data do you need for it to be "big"? How many "columns" per record? How many relationships both implicit and explicit? How fast does that data change or grow? How structured or unstructured? It's entirely a value judgment that many engineers and managers had no business determining as being "big."
The true "big data" became so ubiquitous and accessible that there became no reason for anyone to care about it outside the bubble of Silicon Valley. It's just data, and really was all along.
It was consumed by AI.
It was just a hyped-up buzzword, and new ones have been substituted now. "Machine learning" had broader appeal; many places didn't have that much data, and the ones that did have "big" data largely didn't find much signal in the noise even with AI/ML tools.
Like a lot of fads in IT, Big Data sounded like "if you have a lot of data you can monetize it" so companies threw 7+ figures at the technology and then realised that you can have too much data to know what to do with and couldn't really monetize it (obviously some did/still do). Even at a simple level, working at a data collection company, it is very clear that lots of people want to collect as much as possible only then to do precisely nothing useful with the results.
Then machine learning came along, and these same people thought that meant you could just feed the beast with your big data and it would be clever enough to tell you what you want to know; then the same companies realised that they still didn't have the skills to work out what to tell the ML algorithm to do.
Big Data has always been a marketing paradigm. We've always had lots of data we just didn't process it for business intelligence before.
"The advances in computing have made it easier to accomplish tasks that were completely unnecessary before"Reply
Well, with the introduction of Kubernetes as a platform and other cloud solutions, most "big data" just became "data".
It's amazing to see that nowadays the persistent volume claim used for logging is, on average, much bigger than the average dedicated machine was about 10 years ago.
It's parked next to the information superhighway.
We still have teams doing work that would probably qualify, but it's definitely not the craze anymore.
Compliance and web3 arrived.
A lot of companies didn't even need to go the Hadoop route. CSVs, Jupyter notebooks and SQL databases are very powerful tools for most companies.
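A hedged sketch of that last point: for many companies, stdlib CSV handling is all the "data platform" they need. The file contents and column names below are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a small sales.csv export; in practice you'd open a real file.
raw = "region,amount\nnorth,10.0\nsouth,5.5\nnorth,2.5\n"

# Group-by-and-sum with nothing but the standard library.
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["region"]] += float(row["amount"])

print(dict(totals))  # → {'north': 12.5, 'south': 5.5}
```

The same aggregation is a one-line `GROUP BY` in any SQL database; neither requires Hadoop.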