• 9 months ago
The results are fantastic, but I can't see how the excerpts relate to the search term.
For example, a search term for 'Scotichronicon' returns some fascinating results, but the search term itself doesn't appear in the title or excerpts of most of the results.
This makes it harder to judge how relevant they are.
ReplyThis is brilliant - I can actually surf the web for fun again. This engine's actually a nice complement to another mainstream engine, as the regular one is good for searches during the working day, whilst "marginalia" (?) is great for recreational reading, and actual learning.
ReplyWow, that's awesome. Great work!
For a simple test, I searched "fall of the roman empire". In your search engine, I got wikipedia, followed by academic talks, chapters of books, and long-form blogs. All extremely useful resources.
When I search on google, I get wikipedia, followed by a listicle "8 Reasons Why Rome Fell", then the imdb page for a movie by the same name, and then two Amazon book links, which are totally useless.
ReplySearched for my initials - got back a bunch of raw binary results (mp4, pdf, img, txz, etc.) which was disconcerting. Although it did find one reference to actual-me which is better than Google manages on the first 4 pages...
ReplyFast and doesn't crash when on the front page of Hacker News!
ReplyHave you given any thought on what you will do if you get a DMCA take down request or a request from a person asking you to remove them from search results?
ReplyI don't really care about a website's design, as long as it gets out of the way of me reading it.
ReplyThis is amazing!
ReplyAwesome work! I had similar idea in mind but I'm glad to see someone else was able to pull it off.
ReplyI had an idea for an alternative search engine a few years ago that's a bit simpler to implement than this one. First we extract all the external links from Wikipedia dumps. Then we ingest only those sites into the index. The entire database comprises sites that have already been screened by Wikipedians. Gaming Wikipedia is generally more difficult than gaming Google or Bing. In theory, SERPs would be of a less commercial and more substantive nature than those we get from Google and Bing.
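A minimal sketch of the first step described above, assuming the article wikitext has already been read out of a dump. The function name `external_domains` and the sample text are hypothetical, and a regular expression is only a rough stand-in for a real wikitext parser:

```python
import re
from urllib.parse import urlsplit

# Matches http(s) URLs as they appear in wikitext, e.g. [https://example.org Label].
# Assumption: stopping at whitespace, brackets, pipes and quotes is good enough here.
URL_RE = re.compile(r'https?://[^\s\]|<>"}]+')

def external_domains(wikitext):
    """Return the set of non-Wikipedia hostnames linked from one article."""
    domains = set()
    for url in URL_RE.findall(wikitext):
        host = urlsplit(url).hostname
        if host and not host.endswith("wikipedia.org"):
            domains.add(host)
    return domains

sample = "See [https://example.org/paper A paper] and <ref>http://blog.example.net/post</ref>."
print(sorted(external_domains(sample)))
```

The union of these sets over all articles would be the allow-list of sites to crawl; how to weight or filter that list is left open.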
ReplyLove this! Is there any way someone could help contribute to this?
ReplyThis has been needed in my life for a while. I am growing really apathetic about the internet lately, but I realize that is because my entry point is always a google search.
I miss finding blog posts and scholarly articles in long form. I hate the SEO sites with unreadable UI because the information in them is often a lot lower quality as well.
ReplyI tried two searches in Norwegian ("norsk ordbok" [norwegian dictionary] and "stortinget" [the parliament]), and they both returned many extreme or "alternative" websites. It was especially striking that the neo-nazi group Vigrid's website was the top hit for both searches. Maybe these sites just have less modern web design?
ReplySearching for your own name will turn up some interesting results! I got some early 90s webpages that just contain obituaries or marriage records. I never knew cities maintained these records online!
ReplyDoes this also penalize pages with tons of ads and three paragraphs of text? Or anything from Medium?
ReplyPretty cool. I am not sure yet how useful, but cool it is.
However, it seems that it currently does not support non-Latin alphabets, which I understand in an early version. Still, its handling of such "exception cases" could be improved:
when I search for a Russian word, say "Аквариум", I get <<Search "Аквариум" needs to be a word>>, which is rather rude...
Replyneed a duck duck go `bang!` for this
Replyhow about a search engine that bans all pinterest content.
I hate pinterest with a passion. I may need to get a "his" laptop separate from my wife, since she needs that darn pinterest extension for pinning photos.
ReplyI searched for "giraffe evolution" (without quotes) and received the following links on the first page:
- Evolutionist scientists say the theory is unscientific and worthless
- Seven Mysteries of Evolution
- OTHER EVIDENCE AGAINST EVOLUTION
- Evolution Falsified
Not a single result about the evolution of giraffes...
Replywow super cool!
ReplyNice! Typing my name in, gets my own site back as 3 of the top 5 results. I suddenly feel important ;)
ReplyI've also found that brave search gets much better results than google for some programming related topics, simply by not being targeted by blogspam SEO as much. It's refreshing to not have to click through 3 auto generated "articles" but to either a) get the documentation straight away or b) find actually human written blog entries.
ReplyThis came back in a result set and I am thoroughly pleased.
http://www.aliens-everything-you-want-to-know.com/
This is the Information Superhighway in all her glory.
ReplyI did a quick check using the name of the Scottish village I am originally from (or as I should say "far am fae") and this produced a much more interesting set of links for me than that produced by Google
ReplyAbsolute textgasm.
Wonder how 'text-only first' prioritization is being implemented, algorithmically speaking?
Reply(Old man yells at cloud.)
ReplyDamn, that's an interesting search engine. This is great for searching simple terms and finding a bunch of blog articles about the term.
ReplyGreat for a text-focused site- however, the results are a bit confusing. Would help if there were more details on the criteria used for a site to be included in the index.
Suggestion - Use system fonts (the site downloads almost 300k of fonts)
ReplyLove it! Searched for Shigeru Miyamoto and this was the third result: https://www.glitterberri.com/developer-interviews/miyamoto-h...
ReplyAdd an OpenSearch file: https://developer.mozilla.org/en-US/docs/Web/OpenSearch
Maybe something like:
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
xmlns:moz="http://www.mozilla.org/2006/browser/search/">
<ShortName>Marginalia</ShortName>
<Description>Marginalia Search - a search engine that favors text-heavy sites and punishes modern web design</Description>
<InputEncoding>UTF-8</InputEncoding>
<Image width="16" height="16" type="image/x-icon">https://search.marginalia.nu/favicon.ico</Image>
<Url type="text/html" template="https://search.marginalia.nu/search?query={searchTerms}" />
<moz:SearchForm>https://search.marginalia.nu/</moz:SearchForm>
</OpenSearchDescription>
and then add: <link rel="search"
type="application/opensearchdescription+xml"
title="Marginalia"
href="/opensearch.xml" />
to your page's <head>.
ReplyI'm all for nostalgia but IMO the information should be the most important thing. Presentation is a close second though, and I can kind of get behind this project when I see what the web is like without an ad-blocker.
ReplyI've got some results where the same site has more than 70% of the links. It was a very on topic and high quality site, but still, all the results shouldn't point to the same place.
I think some grouping by site (and capping to only the few most relevant links there) would improve the engine.
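The capping suggested above could be sketched roughly as follows. This assumes the engine already has a relevance-ordered list of result URLs; `cap_per_site` and its `max_per_site` parameter are hypothetical names, not anything from the engine itself:

```python
from collections import defaultdict
from urllib.parse import urlsplit

def cap_per_site(ranked_urls, max_per_site=2):
    """Keep the ranking order, but allow at most max_per_site results per hostname."""
    seen = defaultdict(int)  # hostname -> number of results kept so far
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).hostname or url  # fall back to the raw string if unparsable
        if seen[host] < max_per_site:
            seen[host] += 1
            kept.append(url)
    return kept

results = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",
    "http://b.example/1",
]
print(cap_per_site(results))
```

Because the input is already sorted by relevance, the links that survive for each site are its most relevant ones.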
ReplyGotta say, sometimes the results really are nice. I searched for "Land Cruiser 70." The first result is a simple, short blog post about a couple who traveled across Europe and Asia in their Troop Carrier (http://www.destoop.com/trip/1%20PREPARATION/2%20Vehicle%20sp...).
The first results on Google are an Australian site for buying an LC70 (news-flash, I can't buy one in the USA). There is also a MotorTrend article about the LC70...also irrelevant since it's only sold in Australia.
ReplySearch for playwright waitForSelector and you land on a pretty useless page. I'm all in for text websites, but something like the playwright.dev documentation is top notch, with fuzzy search being a key thing.
ReplyThis is great. The results for my search were like a suggested reading list.
ReplyI wish we could configure Google's algorithm to our needs, and blacklist websites.
ReplySuper! How far along would you say you are in indexing the blogosphere? I tried the engine a few times, but I mostly get academic papers and I know most (good) blogs are in fact text-heavy.
ReplyA big fan of your work! Just wanted to let you know that iOS devices provide quotes as “ rather than " - you may need to support the character “ or at least let people know that iOS is not supported etc… right now I get a generic character error.
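One way to handle the typographic quotes described above is to map them to their ASCII equivalents before the query is parsed. This is a sketch under that assumption; `normalize_quotes` is a hypothetical helper, not part of the engine:

```python
# Map typographic quotes (as produced by "smart punctuation" keyboards) to ASCII.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

def normalize_quotes(query):
    """Return the query with curly quotes replaced by straight ones."""
    return query.translate(SMART_QUOTES)

print(normalize_quotes("\u201ccat food\u201d"))
```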
ReplyOne nitpick that kind of bothered me - on a large desktop monitor, the results page was like 70% whitespace margins with the results squished in the middle like a portrait cellphone. Hopefully it's easy to fix, I like to research at home and this website could help a lot!
ReplySo is this a filter on top of Google or is it search from scratch? Would love to understand more of the implementation.
ReplyThis would be awesome if the search actually worked. Typed in 'runescape' and expected a few websites left over from the early 2000s. But I got nothing, just a lot of hits to other keywords.
ReplyWikipedia links point to <https://encyclopedia.marginalia.nu/> instead, which to my eyes is less readable. The justified text, done with CSS, instead of the LaTeX algorithm, looks wild. The font used for quotations is even worse (very thin).
Wikipedia is perfectly usable without JavaScript and it's one of the nicest sites out there typography-wise, so I'd reconsider this redirection.
ReplyI wish niche search engines had an option to group results by domain name. There are a few major sites that dominate Google search results with low effort content. As long as Google stands as the largest search engine, it's unlikely that these major sites will want to rearchitect themselves into different domain names.
ReplyI tried a few searches.
<<javascript pipe syntax>>: none of the search results appeared to have anything to do with Javascript pipe syntax. (Which doesn't exist yet, but it's under discussion.) Google gives a bunch of highly-relevant results.
<<hans reichenbach relativity>>: first result is a list of books about relativity, one of which is Reichenbach's "Philosophy of space and time"; good, but there's no real information there. Second is about Reichenbach but nothing to do with relativity or even, really, philosophy of science. Third is about philosophy of science and mentions some of Reichenbach's work but not related to relativity. Fourth mentions Reichenbach's "Philosophy of space and time" as part of a list of books relevant to a seminar on "time and eternity". None of this is bad, but it's not great either. Google gives a couple of online philosophy encyclopaedia entries, then a journal article on "Hans Reichenbach's relativity of geometry", then the Wikipedia article on Reichenbach ... much more informative.
<<luna lovegood actress>>: I thought this would be an easy one. It was easy for Google, which gave me her name in large friendly letters at the top, then her IMDB entry, and a bunch of other relevant things. Literally nothing in the Marginalia results was relevant to the query.
I guess maybe popular culture is just too monetizable, so no one is going to write about it on the sites that Marginalia crawls? Let's try some slightly less popular culture.
<<wilde "a handbag">>: First result is kinda-relevant but weird: it's about a musical adaptation of The Importance of Being Earnest. It doesn't mention that famous line from the play, but one of the numbers in the musical has the words "a handbag" in the title. Second result is a review of a CD of musicals, including the same work. Third is a bunch of short reviews of theatrical items from the Buxton Festival Fringe, one of which is a three-man adaptation of TIOBE. Next four are 100% irrelevant. Next is a list of names of plays. Last one is actually relevant; it's an article about "Lady Bracknell through the decades". Google puts that one first (after, sigh, a bunch of YouTube videos which look as if they might actually be relevant).
I really like the idea of this, and many of the things it turns up look like they might be interesting, but it isn't doing very well at producing results that are actually relevant to the thing being searched for.
ReplyWow. I tested it on recipes which Google has destroyed and this was the first result, a simple clear recipe:
http://demont.myds.me/leerecipes/mainmeals/mainmeals1/chicke...
compared to Google's endless drivel of "This chicken stir fry recipe will become a staple in your home. It’s so quick to make and you can use whatever vegetables you have on hand. It tastes wonderful regardless of how you alter the ingredients. ... " JUST SHUT UP AND GIVE ME THE RECIPE!
https://natashaskitchen.com/chicken-stir-fry-recipe/
ReplyToo bad it rejects non-Latin words, as if the definition of "text" is a sequence of alphabetical letters originated from Latin.
I thought that we've reached the time to embrace all cultures in the world, but this retrogressive engine proves that most modern tech designers are myopic about other civilizations in the globe.
ReplyReally impressed with the results I’m seeing so far. In all searches I have done so far, the results are truly lightweight, and haven’t had to click through any modals, subscription pop-ups or any other junk thus far! Will be using more in the days to come.
ReplyThis is incredible. I just got goosebumps as I stumbled upon https://solitaryroad.com after searching for "linear algebra homomorphism". It reminds me of the magical feelings of the early Internet. Keep up the great work!
ReplyDoes it filter out ad-heavy copy-paste/autogenerated fake sites? Tired of seeing those on the first few pages of Google. Bing gets more and more usable, but far from perfect.
ReplyI like the idea. However, results take too much space vertically, so it's slow and cumbersome to scan through them.
I think it would benefit from using a responsive layout, allowing the text to expand to a wide 1000+ px, and making the font smaller, so the excerpt can fit in one or two lines below the links.
Google has problems but their search results layout is easy to scan.
Otherwise I genuinely wish I would use it, because the Google search's "self referential reality bubble" is really annoying.
ReplyI think I want a BBS. Text mode, fixed width font, keyboard-driven menus, no (or very little) bitmapped graphics. I've been thinking about the UIs for a lot of sites that I use to "do things" on the web. E.g. search for flights. Do I need any of that "beautiful" web design with pretty forms and fonts, bevelled edges, drop shadows, drop-down menus, hovers? Hell, do I even need a map? Heck no, I need three text entry fields and output a bulleted list, maybe table of results. Just give me the raw data and do as little presentation as possible, thanks.
I really think I want an internet console, not an animated magazine.
ReplyThis is really good; I'll actually use it!
ReplyThis is a really cool idea. I tried a few technical queries I did on DDG today and didn't get amazing results - hence the warning in the About page about this engine giving you things you didn't know you were looking for, rather than specific facts. But the examples others have posted sound promising and refreshing. I would love to read about the algorithms behind this and how modern web design gets detected in order to punish it...
ReplyI like the design.
ReplyAre there any technical infos about the search engine? Found some information here https://memex.marginalia.nu/projects/edge/about.gmi
Author said they threw it together on consumer hardware. How big is the index? (TB used or entries) how is it realised?
I'm pretty much interested in this since I myself am crawling some pages for my own "search index".
Oh and thx for making and posting. Added it as a keyword to firefox
Edit: Just realized that my question is a bit shallow. What I'm particularly interested in is the storage before the indexing. I'm trying to store the raw html so that I can reindex everything with better algorithms, but I'm hitting many limits. It takes a few minutes to get the size of a site directory (every site has its own dir), and I'm at a point where I can't reasonably manage the scrape-versioning over git. I cycled through a few filesystems only to find that the metadata management kind of sucks for most of them. It's rather interesting how we store such files and I'm thinking about storing a few sites in a simple sqlite format for easy access and search. I'm thinking about a few low overhead solutions like Facebook's Haystack project (implemented open source in seaweedfs) or something similar... Hopefully this gives some context to the question of storage and sites that are indexed
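The "simple sqlite format" mentioned above might look something like the sketch below. The schema, the zlib compression and the helper names (`open_store`, `save_page`, `latest_page`) are all assumptions made for illustration; this is not how Marginalia or the commenter's crawler actually stores pages:

```python
import sqlite3
import zlib

def open_store(path=":memory:"):
    """Open (or create) a single-file store of raw page snapshots."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT NOT NULL,"
        " fetched_at TEXT NOT NULL,"  # ISO 8601 timestamp, so text order is time order
        " body BLOB NOT NULL,"        # zlib-compressed UTF-8 HTML
        " PRIMARY KEY (url, fetched_at))"
    )
    return db

def save_page(db, url, fetched_at, html):
    """Store one snapshot; earlier snapshots of the same URL are kept."""
    db.execute(
        "INSERT OR REPLACE INTO pages (url, fetched_at, body) VALUES (?, ?, ?)",
        (url, fetched_at, zlib.compress(html.encode("utf-8"))),
    )
    db.commit()

def latest_page(db, url):
    """Return the most recent raw HTML for a URL, or None if it was never stored."""
    row = db.execute(
        "SELECT body FROM pages WHERE url = ? ORDER BY fetched_at DESC LIMIT 1",
        (url,),
    ).fetchone()
    return zlib.decompress(row[0]).decode("utf-8") if row else None

db = open_store()
save_page(db, "http://example.org/", "2021-09-01T10:00:00", "<p>old</p>")
save_page(db, "http://example.org/", "2021-09-08T10:00:00", "<p>new</p>")
print(latest_page(db, "http://example.org/"))
```

Keeping every snapshot in one table replaces the per-site directories and the git-based versioning with a single file, and reindexing becomes a plain `SELECT` over the table.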
ReplyTested with the first person to settle in Iceland: https://search.marginalia.nu/search?query=Ingolf+Arnarson
and it worked surprisingly well.
Does anyone else have good examples?
ReplyCan we submit text-heavy sites for possible inclusion? Assuming they pass your filters.
Replylol this is great, reminds me of the old school search engines we would use in school back in the day before Google haha.
ReplyCongratulations. Truly impressive search results. I tried two one-word searches. The results were interesting, useful, and would have been impossible (well, really really hard) to find on standard search engines. Plus, no garbage, ads, recommendations, etc etc. As another commenter suggested, it is what World Wide Web search results were like, twenty years ago!
ReplyThis is a fascinating tool, I estimated that the corpus of the factual web was between 1 and 10 TB when I last played around with BigQuery using domain names which had low amounts of click bait. Seeing these search results I suspect my estimate was off by a couple orders of magnitude.
Although a search for "Fractional Reserve Banking" shows that some further ranking improvements can be made to exclude unrelated results, and potentially penalize old conspiracy sites.
https://search.marginalia.nu/search?query=fractional+reserve...
ReplyIs it fair to assume that text-heavy sites that are inactive (but still online) don't have SSL?
If so, would you ever tweak the parameters to surface sites that aren't served with "HTTPS"?
ReplyYou should monetise this with amazon affiliate links that are relevant to each search. And then use that money to keep this project going. Google is fantastic, but it has become something different from what it was, the company and the product. It is so refreshing to see a modern tool that encourages exploration of the actual world wide web.
ReplyLet a thousand search engines bloom.
btw, interesting how many http (as opposed to https) sites show up...
ReplyThis kicks ass!!
ReplyI love it. Even though it didn't give me the results I was looking for. I searched "new york fishing license", and it didn't give me any links to the actual new york fishing license websites. But it did give me a ton of really cute little websites related to lakes and fishing in New York. This one has amazing information about fishing all over Western New York: http://www.huntfishnyoutdoors.com/fishing.php
ReplyWow, I'm floored by the quality of the search results.
For example, I'm a huge fan of obscure Brian Eno stories and interviews and articles.
Using the default blend for "Brian Eno", I found http://www.moredarkthanshark.org/eno_interviews.html which is truly a labor of love and the most comprehensive list of articles (with links to them, too) I've ever seen.
Not in a hundred years would I have found this using Google.
Thank you for building this!
ReplyThis is really cool! So retro!
Here is the second result when you search for “cat food”. It takes you to some old dude's entire family tree with full history and biographies… it even uses sub domains and everything! Crazy!
ReplyThere's probably a more suitable term than "modern" that we should generally be using, since "modern" consistently has a positive connotation.
ReplyI like it.
Coincidentally, the other day I was daydreaming about a search engine that favors sites that are updated less frequently. The thought being, the kinds of labors of love that characterized the 1990s Web that I still sometimes miss are still out there, it's just harder to find them amidst the flood of SEO dreck. So perhaps they could be made discoverable again with the help of a contrarian search engine that specifically looks for the kinds of things that Google and Bing don't like to see.
ReplyCool!
There do seem to be some text encoding issues though. For example: https://search.marginalia.nu/search?query=tim+visee
ReplyFantastic project! Found very interesting links for a lot of compiler related keywords. A similar service I found useful, though different in its approach to cutting through the e-commerce and SEO-optimized websites, is Million Short [0]
[0] millionshort.com
ReplyDesigned for serendipity indeed. Tried a few searches, results are quite fun, but none of them relevant.
ReplyThis is really refreshing work, and we can all benefit from other search engines focused on improving the field. I tried a bunch of searches and some of them were quite wonderful, others were a little dry on results. But overall I enjoyed going through it. Here are some critiques if you don't mind:
I did search for "Daria Bilodid" and the results were a bit troublesome. First the Wikipedia result did not work: https://en.wikipedia.org/wiki/Daria_Bilodid vs https://encyclopedia.marginalia.nu/wiki/Daria_Bilodid
Secondly the results matched a few judoinside.com results which is ok, including sites to her competitors, but seemed to miss the judoinside website for her: https://www.judoinside.com/judoka/92660/Daria_Bilodid.
The design is hard on my eyes. I have an average-size screen and it's using less than half of the width. The line-height is enormous and seems to break up the flow, making it uncomfortable for me to read. The spacing around each result is the same as between titles and paragraph items, which again was unpleasant to read.
Reply"corporate speak" bs detector and filter on google search engine would be nice.
ReplyAn interesting concept and awesome work!
I searched for high pressure air (HPA) regulator trying to find a description of how one works. I didn't find that, but did find some interesting links on how they're used in scuba, and one guy's homemade gas laser.
ReplyIs it possible to also make a site that favors a diverse set of information sources? For instance a lot of searches turn up results from Pinterest or Wikipedia or Amazon or whatever else. I wonder if there's room for a search engine that is all about favoring a greater diversity of smaller sources, for those who are less interested in staying within walled gardens.
ReplyI just looked up my last name and found a World class heavyweight weightlifter named Josef Grafl born in 1872 who has an awesome portrait of him on Wikipedia. Never before have I read about that man.
I love this.
ReplyBased on a few searches it seems to favor sites with very long passages of text. Search for a name and you get pages with massive lists of names. It quite simply isn’t very good at everyday searches. But it does bring up the point, shouldn’t I be able to tell my search engine I want results like this? It should be a feature of google I can turn on and off. It should be one of many ways to impact relevance.
ReplySeems like this is still very very hard! I searched for "hart protocol" hoping to find this: http://www.romilly.co.uk/
ReplyNot to overemphasize meta commentary, but damn 3200 points, 650 comments in 2 days -- this is one of the highest rated posts I can remember. Seems HN readers are very interested in alternatives to the current search hegemony and the kind of low-quality junk articles that litter it.
ReplyI searched "c strtok" and got one result saying '"strtok" could be spelled "stroke", "stork", "sarto", "strop"'.
Cool concept though!
ReplyIs there any way to add this as a favored search engine in the browser?
I currently use google as it's set as the default search when I type in the address bar but would love to switch and move google/ddg to an added character like "<search terms> @g"
ReplyIt's excellent, I looked up some physics topics and got some excellent results - real meaty stuff full of text, equations and applicable diagrams, etc.
I've not only bookmarked it but also I've an icon linked to it on the taskbar. Will watch its progress with interest.
ReplyBeware, I got the impression straight away that some sites were censored from the results for no good reason.
For example, if you search "jehovahs witnesses", all pages from jw.org are missing.
Exactly the same thing happened when I searched "mormons" - the official website is missing and it only brings up sects/hate/conspiracies against mormons.
ReplyThis is soooo good. I'm finally finding sites I haven't heard of with good content.
I didn't realize how much I missed this stuff.
The popular web has become so bad nowadays.
ReplyI searched for "Starlink satellites" and found this Y2K-style Canadian UFO blog [1] explaining it isn't aliens. I might just waste my weekend with this search engine.
[1] https://www.ufobc.ca/Reports/stringoflights.html
ReplyJust searching for 'dogs' gave me more interesting results than I've seen from google in years
ReplyThe website itself seems generated with some kind of kick-ass generator from template files (.gmi?)
I feel like I'm stuck with Wordpress.com because it brings me some traffic (whereas something hand-rolled on nsfspeech or digital ocean or whatever would literally be off the edge of the web), but the structure of that is so cool.
ReplyOne use case to always test: "online wishlist" or "make a wishlist". If you start seeing tools like https://www.DreamList.com or others, you are on the right path. If you start seeing random web pages linking to individual wish lists, then people are likely not able to find tools on your search engine.
Reply> Don't be afraid to scroll down in the search results
I never knew it was fear that was preventing me from scrolling
ReplyKudos for taking on this project, and I like the idea! I think it'll be a big project to take it to the next level, but would love to have a search engine that's more useful.
Some reactions:
- The font is really big and the columns really narrow, so I get 3 - 4 entries per page, something like 8 words per line, and huge spacings between lines, which makes it a frustrating experience. I've been using the recommendations in https://practicaltypography.com/, which recommends 60 - 90 characters in a line I think, and line spacing of 120% - 140% (I like 125%). The line lengths here might technically fall within the lower bound, but it's really short, and for search results I'm going to try scanning the text to see if there's something relevant, so I think going on the long side is better here. At least make the width somewhat variable so that I can shrink the rather large font and fit more on the line.
- The results are eclectic, but I'm not sure it's usable at the moment. "scala append list" did not get me much that's helpful, while Google will usually at least put up some click-farming tutorial that although minimal effort does tend to answer the question. Both "mapo doufu recipe" and "ma po do fu recipe" had very few recipes, although the latter did have one. Unfortunately, recipe websites are some of the worst, with about 10 pages of description, ads, pictures, what-have-you until the recipe at the very bottom. "collection unmitigated pedantry" did return the acoup.blog entry at the top, though.
Good luck on the project!
ReplyMy pet peeve with search results is simply that there are ancient technical results that in many cases are irrelevant. If I am searching for a Windows error message, I don't want some old forum post from 2001, especially if it didn't have any answers!
What would be cool would be for people who host old stuff to "archive" it at some point so it doesn't appear in normal results, only if you tick "include archives".
Replymodern design = low information density?
ReplyI like the concept, but it did not work on any of the search phrases I entered consisting of the full title of a computer science article or book.
It also does not work for subjects. For example, if you search "discrete math" it links to academic webpages, but most of them do not have any notes posted. It is just a plain text website with the syllabus of a class.
ReplyObligatory https://wiby.me/ plug. If you're looking for a decent minimalist website search engine, this fits the bill pretty well.
ReplyCongrats for the effort, I really like the idea and it works wonderfully for some searches.
However, I searched "infiniband" and the results are far away from what I would expect or like to see. Most of the results that appear first are completely unrelated to the topic.
ReplyI tried a few queries and got extremely irrelevant results
ReplyA little bit harsh "punishes". It's a cool search engine.
ReplyVery cool. A person can really appreciate simple web design looking at something like Luke Smith's recipe page.
So how on earth do you take an idea like this and scale it for both broad web coverage and high traffic? For that matter, just how much 'useful' text is there on the net?
ReplyCool idea but it needs to be able to handle special characters. Right now searching "Hello World c#" returns no results because the search term can't handle #. I also can't just delete the # because then I would be stuck writing C...
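A tokenizer that keeps terms like "c#" intact could be sketched as below. How the engine actually parses queries is not described in this thread, so the pattern and the `tokenize` helper are purely illustrative assumptions:

```python
import re

# A token is a run of word characters optionally followed by '#' or '+' signs,
# so "c#", "f#" and "c++" survive as single terms instead of being rejected.
TOKEN_RE = re.compile(r"\w+[#+]*")

def tokenize(query):
    """Split a query into lowercase search terms."""
    return [token.lower() for token in TOKEN_RE.findall(query)]

print(tokenize("Hello World c#"))
```

In Python 3, `\w` matches Unicode letters as well, so the same pattern accepts non-Latin words such as "Аквариум".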
ReplyThis is awesome! We should definitely move in this direction.
ReplyFascinating. I studied an "obscure" group of insects. My go-to search term to test an engine is their family name as it is a rarely used word and I know most (all?) of the major data sources that have accumulated data on it. When Wolfram Alpha added species names, I checked with the name, boring, Duck Duck, boring, Google (well we know Google isn't for search anymore, it's absolutely horrible) boring, Bing, boring... you get the idea.
This was a little different, extremely few results, but a couple of them really made me grin, and all(?) made me curious or raise an eyebrow or reflect on who/what might have been the source of the link, or remember some obscure connection from grad-school. So, if anything a crawled list of results worthy of ponder, thanks for this!
ReplyThis is great, I like the results. Couple of things I noticed:
- Search results are often very old, from the early 2000s (I guess because back then more websites were text oriented). Are you taking into account the age of the page when showing results? It would be great to see more up-to-date results at the top
- I noticed a few results which directed me to websites with security risks, Firefox didn't even let me open them. Is it possible to filter these out from the results?
ReplyNo cyrillic or hiragana support :-(
ReplyWhat we need now is a search engine that weeds out sites that have been SEO optimized for keyword density.
I’m tired of searching for “generic keyword” and getting a page with an extremely low signal to noise ratio written like this:
“Many people search for generic keyword. That is why you can find all about generic keyword here. In fact we specialize in generic keyword and slight alterations of generic keyword.”
It’s like Google stopped caring that people were gaming it.
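As a rough illustration of the kind of signal such an engine could use, the sketch below measures what fraction of a page's words belong to repetitions of the query phrase. The `keyword_density` helper is hypothetical and any cutoff would need tuning; it is not a description of how Google or Marginalia rank pages:

```python
import re

def keyword_density(text, phrase):
    """Fraction of the words in text that belong to occurrences of phrase."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase occurs as a run of consecutive words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

stuffed = (
    "Many people search for generic keyword. That is why you can find all about "
    "generic keyword here. In fact we specialize in generic keyword and slight "
    "alterations of generic keyword."
)
print(round(keyword_density(stuffed, "generic keyword"), 2))
```

A page where more than a fifth of the words are the search phrase itself, like the example quoted above, is a strong candidate for demotion.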
Reply- Semantic HTML; not everything is a div; correct use of markup.
- Search results are not overrun with commercial, SEO stuffing, "content" farms.
I don't know what to say. This is such a refreshing sight. Well done.
ReplyI don't think this is a good idea: when I'm searching on the web I want to get results with a high relevance to my search query. I don't care a lot about the presentation.
ReplyYes please! More of this!
Reply"Search results Search "alt.sysadmin.recovery" needs to be a word Those were all the results,"
No comment.
ReplyCongratulations, great work!
ReplyI like the concept of a search engine that does not try to figure out what I should learn based on what I search..I know what I search for
ReplyGreat idea, awful UI
ReplyVery nice. Start a trend :)
ReplyIt says it punishes modern web design but it has my most irritating feature of modern web design: a narrow strip of text on an otherwise blank page.
ReplyYeah so this is my project. It's very much a work in progress, but occasionally I think it works remarkably well for something I cobbled together alone out of consumer hardware and home-made code :-)
Reply> New: You can now look up dictionary definitions for words. If you for example don't know what the definition of is is, you can inquire thus: define:is.
Oh man, I love subtle jabs and tongue in cheek writing like this. Very Robin Williams-esque.
Reply> This search engine isn't particularly well equipped to answering queries posed like questions, instead try to imagine some text that might appear in the website you are looking for, and search for that.
Heh, I guess I'm getting old but I remember when this was the only way to search the web
ReplyI love this! I've been searching random words with no aim in particular and keep finding lots of interesting tiny personal webpages. It feels like the old web
ReplyWow this is immediately useful
If you figure out some sort of funding model (maybe even just Patreon) I could totally see this as a viable side project
Already discovered this recipe site: https://based.cooking/
I love how adding recipes is through pull requests: https://github.com/LukeSmithxyz/based.cooking/pulls
ReplyLove it. You should provide a link to Patreon / whatever so people can support you financially. Hosting is probably not cheap for you. Given the love here on HN I suspect you'd do well.
ReplyAll of my searches are turning up unrelated results ("college life after the pandemic", "post-pandemic teaching in higher education", "football news NFL" etc.)
NFL one had 'some' decently related results, but the websites were all strangely disreputable.
ReplyGreat work and congrats!
ReplyNot sure how to contact you in case of possible bugs/problems, just going to drop a comment here. Was trying a bunch of "define:" queries and noticed something small;
This works https://search.marginalia.nu/search?query=define%3Ahallucina... This doesn't, or at least it's empty https://search.marginalia.nu/search?query=define%3Ahallucina...
ReplyThis is a fantastic search engine. It delivers on its promise of "serendipity". I found pages featuring my name that I'm not sure I've ever seen before, after many years of searching myself to test out search engines.
Perhaps more importantly, it delivers the most correct result when searching for my username: the first result is not any of my social media accounts, or even my own blog, but the text of the obscure science fiction story that I took my username from! Well done.
I've immediately added this as a search keyword in Firefox, and I'll be using it more in the future.
Could meta search engines like DuckDuckGo include this as a source? Should they?
ReplyHow does this have 2.5K upvotes when every single HN related project needs JS and a quad core CPU (for the browser to open a blank page) to view a paragraph of text?
Replyfrom About page:
> If you search for "Plato", you might for example end up at the Canterbury Tales. Go looking for the Canterbury Tales, and you may stumble upon Neil Gaiman's blog.
I know it is just a suggestion, but had to try searching both, with no luck in getting the expected unexpected.
ReplySee also: https://wiby.me/
ReplyAll search engines favor more text and less graphics.
ReplyOh, this is brilliant! I think I'll make this my "first stop" search engine.
ReplyI adore this. Unfortunately, searching for my own name - with or without quotes - doesn't actually find my site.
It does find a handful of references to me from over twenty years ago, though, which I thought was fascinating.
ReplyIf the website is targeted towards an international audience then it's nice to have the first page link to content in English. All four links on the main page https://www.marginalia.nu/ lead to non-English content, which is not useful.
Disclaimer: I am not a native English speaker. English is my second language.
ReplySaving this forever. Thank you for making it.
ReplyAs a quick test, I searched for the name of one of my favorite game series: "Baldur's Gate" (on its own, no qualifiers, properly spelled - I would usually spell it "baldurs gate" on Google, but I decided to give this one the best chance). I search for info around video games a lot, so that's quite representative of a good chunk of my web searches, and I pretty much know the top sites Google would give me for that query (on its own, without any further qualifiers).
The results were all either barely relevant, outdated (sites that covered the game back in the 90s/2000s before it was re-released), at best tangentially relevant or complete garbage noise. Some of the most highly relevant pages (such as the Steam store listing, the fandom wiki, the publisher/developer's forums for the re-releases, the Baldur's Gate 3 website and the subreddit) were not included at all. Those are all fairly text heavy by any reasonable standard, so I assume they were "punished" because they use JS? Would make sense that nearly all of them are way out of date.
Then I searched specifically for "Baldur's Gate Wiki" but still out of luck - some results, but nothing vaguely Wiki-like.
Finally I searched for "Baldur's Gate Fandom Wiki". This is basically "search engine easy mode", by giving essentially the name of the site I am looking for. I got ZERO results. At this point I gave up and decided that this thing is useless.
Look, I'm all for unearthing good long-form content (in fact I would say that much of the content around this specific game would qualify), and I do get as annoyed at modern SPAs as the next grumpy neckbeard.
I think considering both of those in a search engine is not a bad idea in and of itself. But I have to wonder what's the point of a search engine that weights some arbitrary aspect of web design higher than the relevancy of the subject matter (to the point of not returning any results at all)? In fact, considering that generally speaking more recent websites tend to include more scripting, you are intentionally skewing the results towards (very) old content, which is probably doing the user a disservice.
ReplyThis is great!
I tried with "covid tyranny", and got some very interesting results I'd never get with any of the other search engines!
ReplyAfter a few experiments, I'd compare this to panning for gold in a river others ignore because the yield isn't good. You will find nuggets you would have otherwise missed, but you'll work for it. This engine is likely best used to supplement other options.
What set Google apart from its competitors was its use of eigenvalue weights. I don't sense a robust weighting system in use here.
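For readers unfamiliar with the "eigenvalue weights" mentioned above: this presumably refers to PageRank, where a page's score is its entry in the principal eigenvector of the link matrix. Below is a minimal power-iteration sketch, assuming a toy link graph; the graph, the damping factor of 0.85 and the function name are illustrative assumptions and say nothing about how Marginalia actually ranks pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outgoing links: spread its rank everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "d" links out but nobody links to it.
toy_web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Pages that receive links from well-ranked pages float to the top, regardless of how often a keyword appears on them.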
ReplyLove it! I can punish my employees by setting this as a default search engine on their work laptops.
ReplyIvermectin (marginalia): https://search.marginalia.nu/search?query=ivermectin+
Ivermectin (Google): https://www.google.com/search?q=ivermectin
The difference in the overall _thrust_ of the results is remarkable.
Very interesting! Thanks for building it.
ReplyGreat results for "sauna". Lots of Web 1.0 pages discussing building plans and displaying pictures of individually built, traditional, unique, old saunas on some property.
The Google results are all blogspam or sales pages for cheap shipped saunas. Lots of "IR" results. Phony health benefit pages. Stock photos solely of beautiful new hotel gyms.
I've noticed this problem with Google results for quite some time. Sadly, the new content being created of the top variety is mostly being done within private Facebook groups that can't be easily searched, linked, or archived.
ReplyFantastic idea and it works quite well for short phrases that I tried.
As expected I am getting a lot of early 2000s sites which is something that I miss on regular Google.
Hilariously searching for "array data structure" got me one of the top results this little tiny page: http://infolab.stanford.edu/~backrub/google.html
ReplyThis is stunning. I searched "winemaking" because it's my latest obsession, and turned up dozens of links to high-quality pages I'd never seen despite spending an hour a day for three months cruising Google on the topic.
Please do announce it here if you ever decide to solicit help or contributors. My stab at this problem was to have a search index of only ad-free pages, on the hypothesis it would turn up self-hosted blogs, university personal pages, that sort of thing. But the results were too thin, your approach is much better.
Replyhmm, I dream of recipes search engine that punishes recipes pages with too much text. lol
ReplyLooking for an arm assembly instruction, instead I get this strange website as the result http://mailstar.net/coronavirus.html
Is that accidental or is this website promoted because it's text heavy and will surface for any search without many results?
Replythis is sweet
ReplyHi, it'd be nice if you could add an OpenSearch description document for your site.
https://developer.mozilla.org/en-US/docs/Web/OpenSearch
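For anyone wondering what such a document involves, here is a rough sketch that generates one with Python's standard library. The element names and namespace come from the OpenSearch 1.1 format, and the query URL pattern is the one visible in links elsewhere in this thread; the short name, description text and the `build_description` helper are assumptions for illustration, not something the site is known to ship.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

def build_description(short_name: str, description: str, template: str) -> str:
    """Return an OpenSearch description document as an XML string."""
    # Serialize the OpenSearch namespace as the default (unprefixed) namespace.
    ET.register_namespace("", OPENSEARCH_NS)
    root = ET.Element(f"{{{OPENSEARCH_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}Description").text = description
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}InputEncoding").text = "UTF-8"
    # {searchTerms} is the placeholder the browser replaces with the query.
    ET.SubElement(
        root,
        f"{{{OPENSEARCH_NS}}}Url",
        {"type": "text/html", "template": template},
    )
    return ET.tostring(root, encoding="unicode")

# Name and description below are placeholders, not the site's real metadata.
doc = build_description(
    "Marginalia",
    "Search engine that favors text-heavy sites",
    "https://search.marginalia.nu/search?query={searchTerms}",
)
print(doc)
```

The site would then advertise the file from its pages with a `<link rel="search" type="application/opensearchdescription+xml" ...>` tag, at which point browsers that support OpenSearch can offer to add the engine.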
ReplyI like the idea but could use some tweaking. I keep getting conservative christian websites for some reason. And foreign language sites
ReplyIf you like wacky search engines, there's also Million Short: https://millionshort.com where you can search and remove the top 100/1K/10K/100K/1M results.
ReplyQuoted from the linked site:
> Convenience functions have been added, and the search engine can now perform simple calculations and unit conversions. Try 1 pint in cubic centimeters, or 50+sqrt(pi). This functionality is still under development, be patient if it doesn't work.
Why would you make any ever so small effort to implement calculations? I don't get it.
If your search engine enabled me to find more useful search results to my queries than google or yacy or whatever, I wouldn't care one tiny bit about being able to do calculations with it.
Why not focus on the search functionality?
ReplyThis is absolutely great. Seeing this for the first time somehow reminds me of when I learnt about Google, many years ago. The sales pitch for Google back then was "They have all the Linux related docs and infos indexed". The promise of this engine seems even greater. A search engine specialized in text, look 'ma! I hope this grows and gets the user attention it deserves. Google has become so ad-infested in the last 3 years, it's time something replaces it.
ReplyInteresting approach.
I always search myself on new search engines to compare the results. Most engines return my personal blog/website, books/stories I've written, news stories, my github projects/contributions, social links, etc.
This search engine surfaces just three obscure IRC logs that contain my nick in join/part messages (nothing said from me!) from 2009. And nothing else.
There's probably some things this approach is really good at but I'm not sure what they'd be for me off hand. Always cool to see new approaches to search, though.
Replyfuck you hacker news
ReplyI've read most of the comments here and people are evaluating the search results: all good information.
I'm looking at "punishes modern web design"... This thing IS modern web design. I think it's called "marginalia" in reference to the huge margins they chose!
I'm using a browser on a Linux desktop and side-by-side, HN's page design is old-fashioned tasteful making pretty good use of space, and marginalia has a font that's more than twice the 2D pointsize and is so spread out with whitespace that the "Tips" on the home page are off the bottom of my window.
ReplyAs everything in life flows in cycle, I predict the search engine that will de-throne Google will be like Google when it started - a simple variation of page rank.
No smarts, no bubble, no signals decided by over fitting to a biased engineer preference.
ReplyWow. Love this.
Searched for “Ramanujan”, one of my heroes.
Found this gem- https://math.ucr.edu/home/baez/ramanujan/
Ramanujan’s “easiest” formula.
Awesome!!
ReplyQuestion: how do we benchmark search engines? Are there any groups attempting to provide (open) solutions in this space?
(It seems to me that if you want to build a good search engine, this is the question you need to address first.)
Replythank you!
ReplyI have an interest in logic and CS curricula, and I like geneses in general (in the last few days I've read Russell's introduction to mathematical philosophy and an ACM report on the CS curriculum). I searched for "cs curriculum" and this is the first link: https://www.cs.rice.edu/~vardi/sigcse/ Feels so good to receive good answers so easily. Thanks.
ReplyThank you so much. This is wonderful
ReplyI'm developing a text-heavy site and philosophically I'm trying to view documents as just that... documents [1].
But I don't get good results for "rug pull".
ReplyWouldn't this just skew towards really old sites?
The third search result for "dog" is this page on how to remove AOL Instant Messenger, published in 2002.
https://sillydog.org/netscape/kb/removeaim.html
No one wants to see newsletter signup popovers, but "modern web design" includes good performance and relevant content. (The search engine itself takes about 2 seconds to first contentful paint, not great.)
ReplyWhere does the data come from? Do you index the whole web yourself? I see it totally impossible for a personal project. I'm very curious about that.
ReplyWonderful work (':
Replyexcept it doesn't actually return that many results
ReplyAmazing! How do I make this my search engine on browser? Not home page.
Replycurious how do you afford the infrastructure? I found that the hardest part of running a search engine.
ReplyI tested this with "Caribbean Vacation" and wow what a difference. Everything on Google is "TOP X LIST" and "BEST XYZ" which are just the worst when trying to find real interesting information about experiences you can have on vacation somewhere. I had used those as starting points then searched for long-form blogs of real experiences people have had. This surfaced those kinds of things immediately. I love it.
ReplyI don't even want to imagine how google and other search engines crawl websites that make heavy usage of react or other ajax stuff. I don't want to be that guy.
I wonder if some browser engineers are trying to have some ideas on how to find a solution on this. Personally, I would just make a browser that breaks backward compatibility, remove old features, etc. I guess browsers would be much lighter, fast and simple if some hard choices were made.
Mozilla already decided to break some websites with the strict cookie policy. I wish they would do the same for everything else that sucks on the modern web.
I honestly don't think I have much respect for "web developers". In a way I want mobile apps to kill the modern web, just to prove a point.
ReplyRelated question - suppose I want to create a meta search engine for myself, and I want it to be as fast as possible. What are the things I should be optimizing for?
ReplyHow do you submit a site? I searched for "A search engine that favors text-heavy sites and punishes modern web design" but nothing was found.
ReplyOk this is great if all I want to do is read text, but often times that is very much not all I want to do. The web is much more than text and images these days. I can appreciate this as long as it’s branded as a search engine for blogs and articles specifically, as opposed to being touted as a drop-in replacement for the modern search engine.
ReplyVery cool idea! Room for lots of improvement, keep working on it, I like the direction this is going.
ReplyWow, if this catches on, my original content will actually matter![1] I've always had a love-hate relationship with modern web design principles because my design choices have all the excitement and polish of what we get on HN.
I'm sure I'm not the only one, either. Content-rich sites need more love.
ReplyIt works. Nice job.
ReplyNot sure what to do with this.
https://search.marginalia.nu/search?query=gan+charger
aside from nexperia none of this looks even remotely relevant.
ReplyThere is an open standard way for an engine like this to provide a mechanism for your standards-aware browser to add the site as an alternative search with a click.
That way I would not have to remember or bookmark just use my search bar as normal and choose which engine for this query or set it as default.
https://developer.mozilla.org/en-US/docs/Web/OpenSearch
Reply"Don't be afraid to scroll down in the search results, unlike in many other search engines, depending on what you are looking for, you may find the best results in the middle of the listing."
This is a very polite way of saying "this engine isn't very good"
Overall impressed with the project but I thought the word play there was funny
ReplyThis engine is fantastic for recipes
ReplyLet's make the internet a great place for knowledge again. I really loved the engine and tried it with different terms, and I'm very happy. Good job
ReplyInstead of looking for something specific, I decided to try a category of some sort to see what came up. Thinking about Jeopardy categories, I tried "potent potables" and found a lot of random pages that may or may not have made sense given that category but that I had a lot of fun reading. Definitely a win for me.
ReplyI would like a search that punishes 'modern' SPAs that load 87 MB of the author's pet JS projects to display simple text. Basically every modern SPA.
Replyblessings upon you sir for making this
Replyeffort is good, but needs some work, no results here :
https://search.marginalia.nu/search?query=rxjava+2+api+docs
https://www.google.com/search?q=rxjava+2+api+docs&oq=rxjava+...
ReplyI liked this one... I searched for 'George Harrison' and among the first results there was a page with interesting comments about Harrison's solo career; someone reminiscing about the time they got to talk to him about guitars for half an hour at a bar at the airport; a transcript of an interview he gave on TV... Whereas on GOOGLE: an intrusive 'People also ask' which I was not interested in; thumbnails for videos on YouTube that I was not looking for; previews of garbage clickbaity news articles; and then finally for the search items: a bunch of websites for lyrics; his Instagram (!) and fb pages; his imdb page; some more news articles I was not looking for...
Granted, google's web results above are perhaps what people are looking for 75% of the time, but how limiting and boring.
I'm also a sucker for the simplistic text-centric, information-laden pages from the pre-facebook era.
For 'global warming', however - since Marginalia excludes modern web-design pages - the results are of dubious relevance and interest, since they are, well, 'old'.
I see myself using this engine a lot.
ReplyThis is wonderful and stupendous.
I’ve often thought that Google could be turned back into a good search engine by simply eliminating the crap and letting the useful sites float to the top of the results.
marginalia.nu seems to like my sites, so it must be good!
Some results are prefixed with ! or an arrow dingbat. What does that mean?
Replysearching for covid gives a bunch of bogus crap of fake news
ReplyA common use case, how to do random thing in programming:
I searched python make a bar chart and it returned a live coding video with an AI generated text transcript and two articles which mentioned a different kind of bar.
I then narrowed it down to just python bar chart, and got a blog post about scripting with a bar chart in it, this http://www.nitcentral.com/voyager4/hellyear.htm with monty python, bars, and charts from 1996 and among some other things I found this https://python-course.eu/naive_bayes_classifier_introduction..., which had an example of a python bar chart even though the title of the page made me think it wasn't what I wanted.
So for what I imagine to be a difficult search because of all the different meanings of the words, I found my result on the second query pretty quickly, and found some cool unrelated stuff too.
I like mostly that I get what I type in, and not exactly what I want, but what I want is there too.
ReplyI use webcrawler.com, and IMO it's better than any other search engine for finding exactly what I'm looking for. Not what's "trending", or "popular", or what the sheeple are searching for. It finds the exact matching keywords that I'm looking for. No inference or other bullshit -- just the matches.
Such a relief to not wade through oceans of worthless crap any more.
ReplyThis is the most amazing thing I have seen on here in at least a year!
It's... no... it can't be... a search engine that finds actual information instead of 5 megabyte blobs of tracking code and SEO crap!
ReplyI predict it will return a disproportionate amount of sites by schizophrenic conspiracists.
ReplyYou can add this to Firefox as search engine option by right clicking on the URL and selecting "Add Marginalia". From there, setting it as your default search engine is done from the "Settings" panel as with other predefined search engines.
I'm experimenting with using it as my default ...
ReplyThis is a search engine indexing the internet on a mariadb database hosted on consumer hardware maintained by a single person as a hobby and it does not suffer from HN hug of death
ReplyI read from other comments that you're writing your crawler bot yourself. Instead of crawling "from scratch", have you considered using an existing DB like Commoncrawl? Or is there something else that you index not present in Commoncrawl?
ReplyAs a sufferer of Tinnitus, and having spent near 100 hours researching it, I found a few sites I had never seen offering great data and tools. Thank you
ReplyToo bad the search index is currently restricted to ASCII-only (or at least Cyrillic and Latin-2 characters were rejected as "contains characters that are not currently supported").
I love the idea definitely, and I've long toyed around with building a similar thing that starts crawling off my own bookmarks (a personal small-deep-web if you wish).
I also love the "Small Web" name: this is the first I hear of it, and it's what I've long complained about — the web today hides all of the cool gems search engines of old would have given you!
I am also a bit split on the "www" prefix restriction (iiuc, domains which do not have "www" subdomain too are dropped from the index because many of them are spammy): it might for sure be a useful heuristic, but I've advocated for dropping "www" back in late 90s and early 2000s already (one reason being that for eg. Serbian, "w" is not in the alphabet, so you can't reasonably quote it as Serbian is otherwise a phonetic-language).
ReplyGave it a go with two different queries. The first I chose was “amazon vendor services” didn’t get a single result about the topic.
The second query was a nation+city (in the nation). Got a lot of results that were in no way related to either.
It seems to be biased towards IT topics (based uniquely on the two queries).
ReplyVery interesting because of the interesting results from random websites. It's a great discovery tool.
Now hoping for search engine that favors text-heavy sites and punishes paywalls
ReplySearch engines always like websites with more text and less graphics.
ReplyOh, I dream of a day where there are multiple useful search engines, specialized for different purposes.
You're doing God's work here. Thanks and good luck.
ReplyI'd like a chrome extension that marks links that target text-heavy vs "modern" so I know beforehand what to expect - paywall, ads, popups, clickbaits, etc.
ReplyThis is really cool, it filters out all fluff.
It's not always taking me to totally relevant sites but the results contain my favourite type of content.
Full of writing and pure html - usually the hallmark of someone who knows what they are doing, wants to communicate but doesn't want to waste their time.
ReplySearched for “chocolate chip cookie recipe”
First result had a recipe I could see both recipe and directions in a single page, no ads, no scrolling, no fake seo anecdotes about kids and grandmas.
(Pls make the search query box fit small mobile devices)
Great project idea!