Hey, the founder of Deepnote here. Happy to chat about this.
It took 30 years, but the literate programming paradigm is now very commonly referenced when working with notebooks, even though that's not what Knuth was originally going for.
Notebooks are a direct evolution of REPLs, and at the time of writing (Knuth published his paper on literate programming in 1984), REPLs were already well known (they had been around since the 1970s). Yet there is not a single mention of them in the original paper, nor any prediction of what literate-programming-capable notebooks might look like. At the time, these were two different concepts, each serving its own need (documentation vs. exploratory programming).
We tried Logtail in early beta and the experience was mind-bending.
We used both ELK and Datadog before, and over time I somehow stopped questioning why the logs were so slow. I thought we were simply hitting the theoretical limits of how fast searching logs can be, and that unless there was a significant leap in hardware, this was what we had to deal with. Then we tried Logtail, and now we are migrating everything we can.
There's a video that shows what you look like to a UV camera when you put sunscreen on. Also glasses with UV filters: https://www.youtube.com/watch?v=o9BqrSAHbTc
Looks really good, but how is this different from PyPI? Is it just the design or are there more features planned?
When we look at data scientists, there's a wide spectrum in their backgrounds, skill sets, workflows, etc. When we look at the users of data science notebooks specifically, many don't want to deal with the hassle of running their own infrastructure (possibly even more than those who do). I agree with you that the ability to use your own servers is important for lots of use cases, but it's not a hard requirement for getting the product out to the public and asking for feedback. So we don't have on-prem or GPUs at the moment, but it's something we are working on.
Hey, just saw that, great work!
Not right now. This is something we'd like to do, but connecting to other clusters/architectures would bring a lot of additional complexity into the product. As a startup, we need to balance a lot of things and while we're in beta the development speed has the highest priority.
Got it. I can't speak to Gradient's roadmap, but as of right now they are using Jupyter as a notebook and focusing on the infrastructure around it. We are innovating on the notebook itself.
A huge part of it is simply the UX. There's a wide range in the kind of work a data scientist does: some train models that go into production, others analyze datasets and build reports. It's probably best to try both products with your workload and see which works better.
Paperspace is doing a great job providing infrastructure for data science workloads and mlops. The target users are data scientists/engineers. The ability to share with non-technical users is quite limited.
We built Deepnote so that the work you do as a data scientist can be shared with both engineers and non-technical folks. We're not really an mlops platform. We make a really good notebook that integrates with other platforms.
At the moment we have GitHub integration, so you can easily commit changes like you're used to. We also have project history (so you can see all the actions that led to the current state of the project and review what happened while you were away).
But I'd like to improve on this experience. There are many ways to do it (great job, btw), but we want to explore what a versioning system native to notebooks would look like. We're still iterating on that.
We are using the same kernels as Jupyter, so features like debugging work out of the box. However, we don't have an interface for visual debugging yet.
Quick summary:
- real-time collaboration
- integrations (databases, S3 buckets, environment variables)
- persistent (and much, much faster) filesystem
- hardware doesn't shut off
- many more features like the variable explorer or automatic visualizations
- much nicer interface, so you can share with non-technical people
- paid plan, so you can build your data science team around it
- no GPU/TPU machines yet, but that's coming
Re Dockerfiles: Right now, we need to rebuild your Docker images from time to time (e.g. when we make changes to the kernel). That means that if you create a Docker image with `RUN pip install numpy` and we need to rebuild it in a year, you might get a different version, which might break things. The correct solution here is to encourage users to always pin versions: `RUN pip install numpy==1.19.3`. We already do this in Python cells (when you run `!pip install numpy` we query PyPI and suggest the latest version to you), but we haven't added it to Dockerfiles yet. So to set expectations, we have this notice in the docs.
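The version-suggestion step could be sketched like this. This is a hypothetical helper, not Deepnote's actual code; the PyPI JSON API endpoint (`/pypi/<package>/json`) is real, and `fetch` is injectable so the lookup can be tested without the network:

```python
import json
import urllib.request


def latest_version(package, fetch=None):
    """Return the newest release of `package` via the PyPI JSON API.

    `fetch` maps a URL to a response body; the default performs a real
    HTTP request. (Illustrative sketch only.)
    """
    url = f"https://pypi.org/pypi/{package}/json"
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                return resp.read().decode()
    data = json.loads(fetch(url))
    return data["info"]["version"]


def pin(requirement, fetch=None):
    """Rewrite an unpinned requirement ('numpy') as a pinned one
    ('numpy==<latest>'); leave explicit pins untouched."""
    if "==" in requirement:
        return requirement
    return f"{requirement}=={latest_version(requirement, fetch)}"
```

Pinning the suggestion at install time is what makes a later image rebuild reproduce the same environment.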
Regarding other issues: We currently record every execution in project history. That means even if you run cells out of order, you can still get a list of commands that shows how you got to the current state.
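A minimal sketch of that idea, assuming an append-only log of executed cell sources (names are illustrative, not Deepnote's internals):

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionLog:
    """Append-only record of cell executions, in the order they actually ran."""
    entries: list = field(default_factory=list)

    def record(self, cell_id: str, source: str) -> None:
        self.entries.append((cell_id, source))

    def replay_script(self) -> str:
        # Concatenating sources in run order yields a linear script that
        # reproduces the current state, even if cells ran out of order.
        return "\n".join(src for _, src in self.entries)
```

Even if you run cell 2 before cell 1, replaying the log in recorded order reconstructs exactly the state the notebook is in.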
The next step for us is to start subtly notifying users when they do something that could become an issue later down the road (for example, executing cells out of order). We already built this but decided not to ship it yet because it needed more love. The second thing we're working on is interactive/reactive execution. This is very, very cool and takes the notebook experience to the next level (at least for me), but it needs much more testing.
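The core of reactive execution is re-running downstream cells when an upstream cell changes. A toy sketch (all names hypothetical; it assumes cells are defined upstream-first, and real dependency tracking would be inferred rather than declared):

```python
class ReactiveNotebook:
    """Toy reactive notebook: changing a cell re-runs its dependents."""

    def __init__(self):
        self.cells = {}   # cell_id -> source code
        self.deps = {}    # cell_id -> set of upstream cell_ids
        self.env = {}     # shared execution namespace

    def set_cell(self, cell_id, source, depends_on=()):
        self.cells[cell_id] = source
        self.deps[cell_id] = set(depends_on)
        self._run(cell_id)
        # Re-run every (transitive) downstream cell in definition order
        for cid in self._downstream(cell_id):
            self._run(cid)

    def _downstream(self, cell_id):
        # Assumes insertion order matches dependency order (upstream-first)
        out = []
        for cid, ups in self.deps.items():
            if cid == cell_id or cid in out:
                continue
            if cell_id in ups or any(d in out for d in ups):
                out.append(cid)
        return out

    def _run(self, cell_id):
        exec(self.cells[cell_id], self.env)
```

Editing cell `a` below automatically refreshes cell `b`, which depends on it; that propagation is what removes the out-of-order-execution class of bugs.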
Reproducibility vs flexibility (in the sense of letting the user do whatever they want if they know what they're doing) is a difficult problem. In the end, it's going to be a combination of friendly nudges and much better experience if users are following the "reproducible" path. However, we never want to limit users in what they are able to do.
I spent a lot of time thinking about this and would be happy to chat about what you're thinking. Feel free to email me at email@example.com.
We're building a pretty hard product. We had a nice working demo a year ago, but it takes a lot of work to make a platform like this stable. Real-time collaboration is pretty difficult by itself (especially when you're not syncing just text), but we also had to build a computing platform where users can run arbitrary code. That exposes us to everything from a large attack surface to a huge number of quite inventive crypto miners. So we kept building in a private beta until we were confident enough to launch publicly.
Interestingly, ever since we started almost two years ago we've been pretty laser-focused, and there have been minimal changes to the overall vision. But we also knew what we were getting into and that it'd take time.
Thank you! Actually I'm working on that right now.
Well, we are not a Jupyter hosting service. There's definitely a lot of work being put into embedding Jupyter into data science platforms (mostly putting Jupyter into an iframe). But at the end of the day there are limitations to that approach, so some things won't work that well.
Ship early, ship often. GPUs are coming.
Thanks! The difference between Deepnote and MyBinder is that we keep the pool of Docker images as small as possible, which means they are always in cache. You can still write your own Dockerfile, but it's layered on our base image. MyBinder has a lot of work to do on each launch (pulling the image, sometimes building it, etc.), which we thankfully mostly avoid.
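In practice, a user Dockerfile layered on a cached base might look like this (the base image tag is hypothetical, used purely for illustration):

```dockerfile
# Layered on the platform's cached base image; "deepnote/base" is a
# hypothetical tag, not a documented image name.
FROM deepnote/base:latest

# Pinned versions keep the environment identical across rebuilds
RUN pip install numpy==1.19.3 pandas==1.1.4
```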
Regarding the lock-in, it's in our best interest to remain fully compatible. So yes, there'll always be a way to export your project and run it in plain Jupyter. The hope is that the more advanced features (comments, output visualizations, different cell types) will appear in Jupyter over time as well, but it's also up to Jupyter whether they want those features.
Thanks! Honest answer: it was faster to implement. Regular sign up via email should be coming soon.