Writing an alternative to Pocket App


#1

Pocket, an offline read-it-later app, has been very popular and has recently made its way into Firefox as a default application.
So @anandmoorthy and I have decided to start writing an alternative to it.

We did experiment with Wallabag, but it was too buggy, and given my lack of PHP knowledge (Wallabag is written in PHP), I have decided to use the following tech stack:

  1. Base language: Python
  2. Web scraping: Scrapy, Beautiful Soup (do we need NLP?)
  3. API: Flask or Django (prefer Flask)
  4. Database: MongoDB or Postgres (Mongo makes sense to me)
  5. Client apps: Kivy (kivy.org) for Android, iOS, desktop, etc.
  6. Firefox and Chrome add-ons: JavaScript

Will share the repo url soon.
We are looking for help with code contributions, with designing the app's architecture (we are wondering whether it makes sense to make it decentralized), and, obviously, with documentation.

It would also be greatly helpful if somebody could explain how to actually go about writing code for an app like this.


#2

At my previous workplace, Pocket was our benchmark for testing the quality of the product we were building. It was not a Pocket alternative, but the core components were similar, i.e. we wanted to parse HTML and extract the sensible/required information from any given link. We were able to make some progress.

I have written components such as a resolver and a web crawler, and I can contribute that knowledge to this development. We learned that simple HTML parsing is not going to help us. Though the web has set its own HTML standards through the W3C, the probability of a site following those standards is very low, which makes it hard to parse pages and look for content; moreover, not all pages are structured the same way. Writing rules for parsing one specific site like Wikipedia is easy, but structures differ from site to site and there are millions of sites out there.

Now that I have defined the core issue to be solved: there is a tool called Portia, which is what we should experiment with first, and we can proceed from there. Visual bots are the key here. We need to look into machine learning; NLP is of less importance here, but we cannot ignore it completely.

I can contribute to the architecture design and to the code.

Phase 1 would be to focus on the prototype for the core module. Once we have a prototype, we can start tuning it and build other modules around it.


#3

@prashere Thanks for the reply and sharing your thoughts.
Just had a look at Portia. Unless I have overlooked something, it is more of a scraping tool, but what it does have is a lot of generic spider definitions.
Could we just use Scrapy and take a few spider definitions from Portia? Most of the definitions already exist, so there is no point rewriting them.

Experimenting with Portia in more depth now.


#4

Actually, what are the tasks involved?
We need to save an article in either or both of two forms: web page view and article view (a.k.a. Reader's view). Right?
Isn't saving the web page view just like saving a web page for offline reading?
Maybe for the article view (a.k.a. Reader's view) we can try to figure out how Firefox manages to do it?
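Saving the web page view for offline reading can start as simply as storing the fetched HTML on disk. Below is a minimal Python sketch, assuming pages are keyed by the SHA-256 hash of their URL under a local directory; `save_page`, `load_page`, `page_path`, and the `offline_pages` directory are hypothetical names, and fetching the page (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical default location for saved pages
DEFAULT_STORE = Path("offline_pages")


def page_path(url: str, store_dir: Path) -> Path:
    """Map a URL to a file name using the SHA-256 hash of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return store_dir / f"{digest}.html"


def save_page(url: str, html: str, store_dir: Path = DEFAULT_STORE) -> Path:
    """Write the raw HTML of a page to disk and return the file path."""
    store_dir.mkdir(parents=True, exist_ok=True)
    path = page_path(url, store_dir)
    path.write_text(html, encoding="utf-8")
    return path


def load_page(url: str, store_dir: Path = DEFAULT_STORE) -> Optional[str]:
    """Return the saved HTML for a URL, or None if it was never saved."""
    path = page_path(url, store_dir)
    if not path.exists():
        return None
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Usage: save into a throwaway directory, then read the page back offline
    with tempfile.TemporaryDirectory() as tmp:
        save_page("https://example.com/post", "<html><body>Hi</body></html>", Path(tmp))
        print(load_page("https://example.com/post", Path(tmp)))
```

This only stores the HTML itself; images and stylesheets referenced by the page would still need to be downloaded separately for a fully offline copy.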

EDIT:
For starters, Firefox Reader View - StackOverflow


#5

Yes. Though the web page view and the article view are important, what really matters in such a tool is identifying the actual content of the page.
What does a web page typically include? A menu, some side links, images, animations, the title of the page, and the actual content. We are only interested in the title and the content of the page. That is where, as @prashere mentioned, we need some ML and visual bots.
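Before any ML or visual bots, a crude non-ML baseline can already separate the title and body from menus and side links. The sketch below is a swapped-in text-density heuristic, not what Portia or Readability actually do: using only Python's standard `html.parser`, it keeps the `<title>` text, ignores text inside typical boilerplate tags, and returns the paragraphs of whichever container holds the most paragraph text. The `SKIP_TAGS` and `CONTAINER_TAGS` sets and the `ContentExtractor`/`extract_article` names are assumptions for illustration.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is almost never article content
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}
# Assumption: the article body is wrapped in one of these tags
CONTAINER_TAGS = {"article", "main", "section", "div", "body"}


class ContentExtractor(HTMLParser):
    """Collect the <title> and group <p> text by its innermost container."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.paragraphs = {}      # container id -> list of paragraph strings
        self._containers = []     # stack of open container ids
        self._next_id = 0
        self._skip_depth = 0      # > 0 while inside a SKIP_TAGS element
        self._in_title = False
        self._para = None         # text chunks of the <p> being read, if any

    def _flush_para(self):
        # Store the current paragraph under the innermost open container
        if self._para is not None:
            text = " ".join("".join(self._para).split())
            if text:
                key = self._containers[-1] if self._containers else -1
                self.paragraphs.setdefault(key, []).append(text)
            self._para = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in CONTAINER_TAGS:
            self._flush_para()
            self._containers.append(self._next_id)
            self._next_id += 1
        elif tag == "p" and self._skip_depth == 0:
            self._flush_para()    # handles a previous <p> left unclosed
            self._para = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in CONTAINER_TAGS:
            self._flush_para()
            if self._containers:
                self._containers.pop()  # assumes reasonably well-nested markup
        elif tag == "p":
            self._flush_para()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._para is not None and self._skip_depth == 0:
            self._para.append(data)

    def close(self):
        super().close()
        self._flush_para()


def extract_article(html):
    """Return (title, main_text): the container with the most paragraph text wins."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    if not parser.paragraphs:
        return parser.title.strip(), ""
    best = max(parser.paragraphs.values(), key=lambda ps: sum(len(p) for p in ps))
    return parser.title.strip(), "\n\n".join(best)
```

A heuristic like this breaks easily on the non-standard markup described earlier in the thread (for example, content not wrapped in `<p>` tags), which is exactly why smarter scoring or learned models would be needed on top of it.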

I have been exploring Portia and trying to understand how the rules have been written for various websites.


#6

I use Firefox Reader View very often. Firefox for Android also has the same feature, called “Reading List”, which has the ability to replace the Pocket App. A few months back, Mozilla officially integrated Pocket into the browser, and I mistook it for the Reader View / Reading List feature.

Thanks for the Stack Overflow link, @arjunmayilvaganan. Now that we know Firefox uses Readability.js, we can start using the Firefox Reading List on Android phones, which can save pages for offline reading. I also think we can use the Readability.js library to write standalone smartphone applications like Pocket.


#7

Yes, we’ll first start with a standalone Pocket-like app and then integrate Readability inside Firefox for Android, if that’s what you mean.


Besides, I don’t know why Mozilla would include Pocket by default in Firefox, as it could be a privacy concern. I started a thread about this about 3 months ago: Firefox’s inclusion of Pocket - FSFTN Discussion Forum


#8

No. What I meant is that Readability is already integrated into Firefox for Android as well as Firefox for desktop. Some people might not install Firefox on their smartphones; a standalone Pocket-like app built using Readability would replace Pocket without the need to install a web browser.


#9

Sorry, I was confused by the word Readability, as there is also another Pocket-like app called Readability, besides the ReadabilityJS framework.


Correct me if I’m wrong. Readability can only help extract the article (or the required content) out of the page, but the need for a free alternative to Pocket still exists, as offline reading is also required, right?


#10

They are both the same. Readability is a JavaScript framework available under the Apache License. Mozilla has forked the ReadabilityJS repo so that they can continue developing it with their own goals and features in mind, without relying on the mainstream Readability repository.

If at some point the team or company (arc90.com) behind Readability.js decides to change the license and make it proprietary like Pocket, neither Mozilla nor any of us will suffer, because it has already been forked.

Readability.com, as a website and smartphone app, is offered as a Pocket-like service by the same developers. With Readability, it seems the core (i.e. the JS framework) is open, but the clients (website, smartphone apps) are closed.

Right now, what we can do is use the core of Readability and build free software clients around it (before that, we have to check whether such clients already exist).


#11

Ok, now I get it.
Thanks for the explanation.


#12

Perhaps Wallabag fits the bill as a free software replacement for the Pocket App?


#13

@ramaseshan and I tried Wallabag last year, and it sucked. I think they have a new release; I will try it and get back here.


#14

Folks, check Pilgrim


#15

Looks like this is no longer required.

https://blog.mozilla.org/blog/2017/02/27/mozilla-acquires-pocket/

Mozilla has now acquired Pocket, so Pocket should be made free software pretty soon.

The only thing I am still trying to figure out: Pocket makes content suggestions based on what the user reads, so I need to see how Mozilla handles that.
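How Pocket actually computes its suggestions isn't described in this thread. For a free alternative, one textbook starting point is content-based recommendation: build a word-count profile from everything the user has read and rank unread articles by cosine similarity to it. The sketch below is that swapped-in technique, not Pocket's method; `tokenize`, `cosine`, and `suggest` are hypothetical names, and `candidates` is assumed to map article titles to their extracted text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a real system would also drop stop words
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest(read_articles, candidates, top_n=3):
    """Rank candidate articles by similarity to everything the user has read."""
    profile = Counter()
    for text in read_articles:
        profile.update(tokenize(text))
    scored = [(cosine(profile, Counter(tokenize(text))), title)
              for title, text in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop candidates with no word overlap at all
    return [title for score, title in scored[:top_n] if score > 0]
```

Because everything runs over the user's own saved articles, a sketch like this could run entirely on the user's device or self-hosted server, which may matter given the privacy concerns raised earlier in the thread.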


#16

I have settled on a self-hosted Wallabag instance at https://read.purambokku.me, and I have also set aside space for 4 other people on my instance.