mirror of
https://github.com/openzim/zimit.git
synced 2025-12-31 04:23:15 +00:00
reset master branch for 2020 codebase
parent
d178431e20
commit
15cf636ff3
17 changed files with 13 additions and 8305 deletions
3
Dockerfile
Normal file
|
|
@ -0,0 +1,3 @@
|
||||||
|
FROM debian:buster-slim
|
||||||
|
|
||||||
|
CMD ["/bin/bash"]
|
||||||
10
README.md
Normal file
|
|
@ -0,0 +1,10 @@
|
||||||
|
zimit
|
||||||
|
===
|
||||||
|
|
||||||
|
Create ZIM files out of HTTP websites
|
||||||
|
|
||||||
|
# Previous version
|
||||||
|
|
||||||
|
A first version of a generic HTTP scraper was created in 2016 during the [Wikimania Esino Lario Hackathon](https://wikimania2016.wikimedia.org/wiki/Programme/Kiwix-dedicated_Hackathon).
|
||||||
|
|
||||||
|
That version is now considered outdated and [archived in `2016` branch](https://github.com/openzim/zimit/tree/2016).
|
||||||
246
README.rst
|
|
@ -1,246 +0,0 @@
|
||||||
#####################################
|
|
||||||
Create ZIM files out of HTTP websites
|
|
||||||
#####################################
|
|
||||||
|
|
||||||
This project provides an API and a user interface to convert any
|
|
||||||
website into a Zim file.
|
|
||||||
|
|
||||||
Exposed API
|
|
||||||
###########
|
|
||||||
|
|
||||||
All APIs speak JSON over HTTP. As such, all parameters should be sent as
|
|
||||||
stringified JSON and the Content-Type should be set to "application/json".
|
|
||||||
|
|
||||||
POST /website-zim
|
|
||||||
=================
|
|
||||||
|
|
||||||
By posting to this endpoint, you are asking the system to start a new download
|
|
||||||
of a website and a conversion into a Zim format.
|
|
||||||
|
|
||||||
Required parameters
|
|
||||||
-------------------
|
|
||||||
|
|
||||||
- **url**: URL of the website to be crawled
|
|
||||||
- **title**: Title that will be used in the created Zim file
|
|
||||||
- **email**: Email address that will get notified when the creation of the file is over
|
|
||||||
|
|
||||||
Optional parameters
|
|
||||||
-------------------
|
|
||||||
|
|
||||||
- **language**: An `ISO 639-3 <https://en.wikipedia.org/wiki/ISO_639-3>`_ code
|
|
||||||
representing the language
|
|
||||||
- **welcome**: the page that will be first shown in the Zim file
|
|
||||||
- **description**: The description that will be embedded in the Zim file
|
|
||||||
- **author**: The author of the content
|
|
||||||
|
|
||||||
Return values
|
|
||||||
-------------
|
|
||||||
|
|
||||||
- **job_id**: The job id is returned in JSON format. It can be used to know the
|
|
||||||
status of the process.
|
|
||||||
|
|
||||||
Status codes
|
|
||||||
------------
|
|
||||||
|
|
||||||
- `400 Bad Request` will be returned in case you are not respecting the
|
|
||||||
expected inputs. In case of error, have a look at the body of the response:
|
|
||||||
it contains information about what is missing.
|
|
||||||
- `201 Created` will be returned if the process started.
|
|
||||||
|
|
||||||
Example
|
|
||||||
-------
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
$ http POST http://0.0.0.0:6543/website-zim url="https://refugeeinfo.eu/" title="Refugee Info" email="alexis@notmyidea.org"
|
|
||||||
HTTP/1.1 201 Created
|
|
||||||
|
|
||||||
{
|
|
||||||
"job_id": "5012abe3-bee2-4dd7-be87-39a88d76035d"
|
|
||||||
}
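For reference, the same request can be built from Python. This is a minimal sketch using only the Python 3 standard library; the helper name `build_zim_request`, the base URL and the email address are illustrative assumptions, while the endpoint path and parameter names follow the `POST /website-zim` description above.

```python
import json
import urllib.request


def build_zim_request(base_url, url, title, email, **optional):
    """Build (but do not send) a POST request for the /website-zim endpoint.

    `optional` may carry language, welcome, description or author.
    """
    payload = {"url": url, "title": title, "email": email}
    payload.update(optional)
    return urllib.request.Request(
        base_url.rstrip("/") + "/website-zim",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_zim_request(
    "http://localhost:6543",  # assumed address of a local zimit server
    url="https://refugeeinfo.eu/",
    title="Refugee Info",
    email="someone@example.org",
    language="eng",
)
# Sending it requires a running server:
# with urllib.request.urlopen(request) as response:
#     job_id = json.load(response)["job_id"]
```

Keeping request construction separate from sending makes the payload easy to inspect before any job is queued.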
|
|
||||||
|
|
||||||
|
|
||||||
GET /status/{jobid}
|
|
||||||
===================
|
|
||||||
|
|
||||||
Retrieves the status of a job and displays the associated logs.
|
|
||||||
|
|
||||||
Return values
|
|
||||||
-------------
|
|
||||||
|
|
||||||
- **status**: The status of the job. It is one of 'queued', 'finished',
|
|
||||||
'failed', 'started' and 'deferred'.
|
|
||||||
- **log**: The logs of the job.
|
|
||||||
|
|
||||||
Status codes
|
|
||||||
------------
|
|
||||||
|
|
||||||
- `404 Not Found` will be returned in case the requested job does not exist.
|
|
||||||
- `200 OK` will be returned in any other case.
|
|
||||||
|
|
||||||
Example
|
|
||||||
-------
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
http GET http://0.0.0.0:6543/status/5012abe3-bee2-4dd7-be87-39a88d76035d
|
|
||||||
HTTP/1.1 200 OK
|
|
||||||
|
|
||||||
{
|
|
||||||
"log": "<snip>",
|
|
||||||
"status": "finished"
|
|
||||||
}
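A client will typically poll this endpoint until the job is done. The sketch below shows one way to do that; `wait_for_job` and its parameters are hypothetical names, and the HTTP call is replaced by a stand-in callable so the example runs without a server.

```python
import time


def wait_for_job(fetch_status, job_id, interval=5.0, max_polls=360, sleep=time.sleep):
    """Poll until the job reaches 'finished' or 'failed'.

    `fetch_status` is any callable taking a job id and returning the decoded
    JSON body of GET /status/{jobid}, i.e. a dict with 'status' and 'log'.
    """
    for _ in range(max_polls):
        body = fetch_status(job_id)
        if body["status"] in ("finished", "failed"):
            return body
        sleep(interval)
    raise TimeoutError("job %s still not done after %d polls" % (job_id, max_polls))


# Stand-in for the HTTP call: three canned responses instead of a real server.
_responses = iter([
    {"status": "queued", "log": None},
    {"status": "started", "log": "..."},
    {"status": "finished", "log": "..."},
])
result = wait_for_job(lambda job_id: next(_responses), "5012abe3", sleep=lambda s: None)
print(result["status"])
```

In real use, `fetch_status` would issue the `GET /status/{jobid}` request and decode the JSON body.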
|
|
||||||
|
|
||||||
|
|
||||||
Okay, so how do I install it on my server?
|
|
||||||
##########################################
|
|
||||||
|
|
||||||
Currently, the best way to install it is by retrieving the sources from GitHub
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
$ git clone https://github.com/almet/zimit.git
|
|
||||||
$ cd zimit
|
|
||||||
|
|
||||||
Create a virtual environment and install the project in it::
|
|
||||||
|
|
||||||
$ virtualenv venv
|
|
||||||
$ venv/bin/pip install -e .
|
|
||||||
|
|
||||||
Then, run it how you want, for instance with pserve::
|
|
||||||
|
|
||||||
$ venv/bin/pserve zimit.ini
|
|
||||||
|
|
||||||
|
|
||||||
In a separate process, you also need to run the worker::
|
|
||||||
|
|
||||||
$ venv/bin/rqworker
|
|
||||||
|
|
||||||
|
|
||||||
And you're ready to go. To test it::
|
|
||||||
|
|
||||||
$ http POST http://0.0.0.0:6543/website-zim url="https://refugeeinfo.eu/" title="Refugee Info" email="alexis@notmyidea.org"
|
|
||||||
|
|
||||||
|
|
||||||
Debian dependencies
|
|
||||||
####################
|
|
||||||
|
|
||||||
Installing the dependencies
|
|
||||||
===========================
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
sudo apt-get install httrack libzim-dev libmagic-dev liblzma-dev libz-dev build-essential libtool libgumbo-dev redis-server automake pkg-config
|
|
||||||
|
|
||||||
Installing zimwriterfs
|
|
||||||
======================
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
git clone https://github.com/wikimedia/openzim.git
|
|
||||||
cd openzim/zimwriterfs
|
|
||||||
./autogen.sh
|
|
||||||
./configure
|
|
||||||
make
|
|
||||||
|
|
||||||
Then update the path to the zimwriterfs executable in zimit.ini
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
$ rqworker & pserve zimit.ini
|
|
||||||
|
|
||||||
How to deploy?
|
|
||||||
##############
|
|
||||||
|
|
||||||
There are multiple ways to deploy such service, so I'll describe how I do it
|
|
||||||
with my own best-practices.
|
|
||||||
|
|
||||||
First of all, get all the dependencies and the code. I like to have everything
|
|
||||||
available in /home/www, so let's consider this will be the case here::
|
|
||||||
|
|
||||||
$ mkdir /home/www/zimit.notmyidea.org
|
|
||||||
$ cd /home/www/zimit.notmyidea.org
|
|
||||||
$ git clone https://github.com/almet/zimit.git
|
|
||||||
|
|
||||||
Then, you can change the configuration file, by creating a new one::
|
|
||||||
|
|
||||||
$ cd zimit
|
|
||||||
$ cp zimit.ini local.ini
|
|
||||||
|
|
||||||
From there, you need to update the configuration to point to the correct
|
|
||||||
binaries and locations.
|
|
||||||
|
|
||||||
Nginx configuration
|
|
||||||
===================
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
# the upstream component nginx needs to connect to
|
|
||||||
upstream zimit_upstream {
|
|
||||||
server unix:///tmp/zimit.sock;
|
|
||||||
}
|
|
||||||
|
|
||||||
# configuration of the server
|
|
||||||
server {
|
|
||||||
listen 80;
|
|
||||||
listen [::]:80;
|
|
||||||
server_name zimit.ideascube.org;
|
|
||||||
charset utf-8;
|
|
||||||
|
|
||||||
client_max_body_size 200M;
|
|
||||||
|
|
||||||
location /zims {
|
|
||||||
alias /home/ideascube/zimit.ideascube.org/zims/;
|
|
||||||
autoindex on;
|
|
||||||
}
|
|
||||||
|
|
||||||
# Finally, send all non-media requests to the Pyramid server.
|
|
||||||
location / {
|
|
||||||
uwsgi_pass zimit_upstream;
|
|
||||||
include /var/ideascube/uwsgi_params;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
UWSGI configuration
|
|
||||||
===================
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
[uwsgi]
|
|
||||||
uid = ideascube
|
|
||||||
gid = ideascube
|
|
||||||
chdir = /home/ideascube/zimit.ideascube.org/zimit/
|
|
||||||
ini = /home/ideascube/zimit.ideascube.org/zimit/local.ini
|
|
||||||
# the virtualenv (full path)
|
|
||||||
home = /home/ideascube/zimit.ideascube.org/venv/
|
|
||||||
|
|
||||||
# process-related settings
|
|
||||||
# master
|
|
||||||
master = true
|
|
||||||
# maximum number of worker processes
|
|
||||||
processes = 4
|
|
||||||
# the socket (use the full path to be safe)
|
|
||||||
socket = /tmp/zimit.sock
|
|
||||||
# ... with appropriate permissions - may be needed
|
|
||||||
chmod-socket = 666
|
|
||||||
# stats = /tmp/ideascube.stats.sock
|
|
||||||
# clear environment on exit
|
|
||||||
vacuum = true
|
|
||||||
plugins = python
|
|
||||||
|
|
||||||
|
|
||||||
supervisord configuration
|
|
||||||
=========================
|
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
[program:zimit-worker]
|
|
||||||
command=/home/ideascube/zimit.ideascube.org/venv/bin/rqworker
|
|
||||||
directory=/home/ideascube/zimit.ideascube.org/zimit/
|
|
||||||
user=www-data
|
|
||||||
autostart=true
|
|
||||||
autorestart=true
|
|
||||||
redirect_stderr=true
|
|
||||||
|
|
||||||
That's it!
|
|
||||||
24
app.wsgi
|
|
@ -1,24 +0,0 @@
|
||||||
try:
|
|
||||||
import ConfigParser as configparser
|
|
||||||
except ImportError:
|
|
||||||
import configparser
|
|
||||||
import logging.config
|
|
||||||
import os
|
|
||||||
|
|
||||||
from zimit import main
|
|
||||||
|
|
||||||
here = os.path.dirname(__file__)
|
|
||||||
|
|
||||||
ini_path = os.environ.get('ZIMIT_INI')
|
|
||||||
if ini_path is None:
|
|
||||||
ini_path = os.path.join(here, 'local.ini')
|
|
||||||
|
|
||||||
# Set up logging
|
|
||||||
logging.config.fileConfig(ini_path)
|
|
||||||
|
|
||||||
# Parse config and create WSGI app
|
|
||||||
config = configparser.ConfigParser()
|
|
||||||
config.read(ini_path)
|
|
||||||
|
|
||||||
application = main(config.items('DEFAULT'), **dict(config.items('app:main'
|
|
||||||
)))
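The settings lookup done by `app.wsgi` can be tried in isolation. The following is a sketch for Python 3 only, with made-up paths in the sample ini; `load_app_settings` is a hypothetical helper, not part of zimit.

```python
import configparser

# Sample values are placeholders, not the project's real paths.
SAMPLE_INI = """
[app:main]
use = egg:zimit
zimit.zimwriterfs_bin = /usr/local/bin/zimwriterfs
zimit.httrack_bin = /usr/bin/httrack
zimit.output_location = /srv/zims
zimit.output_url = http://localhost/zims/
"""


def load_app_settings(ini_text):
    """Return the [app:main] section as a plain dict."""
    # interpolation=None keeps literal '%' characters (as used in logging
    # format strings) from being treated as interpolation markers.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return dict(parser.items("app:main"))


settings = load_app_settings(SAMPLE_INI)
print(settings["zimit.zimwriterfs_bin"])
```

The resulting dict has the same dotted keys that `load_from_settings` in `zimit/creator.py` looks up.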
|
|
||||||
|
|
@ -1 +0,0 @@
|
||||||
.alertify-logs>*{padding:12px 24px;color:#fff;box-shadow:0 2px 5px 0 rgba(0,0,0,.2);border-radius:1px}.alertify-logs>*,.alertify-logs>.default{background:rgba(0,0,0,.8)}.alertify-logs>.error{background:rgba(244,67,54,.8)}.alertify-logs>.success{background:rgba(76,175,80,.9)}.alertify{position:fixed;background-color:rgba(0,0,0,.3);left:0;right:0;top:0;bottom:0;width:100%;height:100%;z-index:1}.alertify.hide{opacity:0;pointer-events:none}.alertify,.alertify.show{box-sizing:border-box;transition:all .33s cubic-bezier(.25,.8,.25,1)}.alertify,.alertify *{box-sizing:border-box}.alertify .dialog{padding:12px}.alertify .alert,.alertify .dialog{width:100%;margin:0 auto;position:relative;top:50%;transform:translateY(-50%)}.alertify .alert>*,.alertify .dialog>*{width:400px;max-width:95%;margin:0 auto;text-align:center;padding:12px;background:#fff;box-shadow:0 2px 4px -1px rgba(0,0,0,.14),0 4px 5px 0 rgba(0,0,0,.098),0 1px 10px 0 rgba(0,0,0,.084)}.alertify .alert .msg,.alertify .dialog .msg{padding:12px;margin-bottom:12px;margin:0;text-align:left}.alertify .alert input:not(.form-control),.alertify .dialog input:not(.form-control){margin-bottom:15px;width:100%;font-size:100%;padding:12px}.alertify .alert input:not(.form-control):focus,.alertify .dialog input:not(.form-control):focus{outline-offset:-2px}.alertify .alert nav,.alertify .dialog nav{text-align:right}.alertify .alert nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button),.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button){background:transparent;box-sizing:border-box;color:rgba(0,0,0,.87);position:relative;outline:0;border:0;display:inline-block;-ms-flex-align:center;-ms-grid-row-align:center;align-items:center;padding:0 6px;margin:6px 8px;line-height:36px;min-height:36px;white-space:nowrap;min-width:88px;text-align:center;text-transform:uppercase;font-size:14px;text-decoration:none;cursor:pointer;border:1px solid transparent;border-radius:2px}.alertify .alert nav 
button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):active,.alertify .alert nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):hover,.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):active,.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):hover{background-color:rgba(0,0,0,.05)}.alertify .alert nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):focus,.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):focus{border:1px solid rgba(0,0,0,.1)}.alertify .alert nav button.btn,.alertify .dialog nav button.btn{margin:6px 4px}.alertify-logs{position:fixed;z-index:1}.alertify-logs.bottom,.alertify-logs:not(.top){bottom:16px}.alertify-logs.left,.alertify-logs:not(.right){left:16px}.alertify-logs.left>*,.alertify-logs:not(.right)>*{float:left;transform:translateZ(0);height:auto}.alertify-logs.left>.show,.alertify-logs:not(.right)>.show{left:0}.alertify-logs.left>*,.alertify-logs.left>.hide,.alertify-logs:not(.right)>*,.alertify-logs:not(.right)>.hide{left:-110%}.alertify-logs.right{right:16px}.alertify-logs.right>*{float:right;transform:translateZ(0)}.alertify-logs.right>.show{right:0;opacity:1}.alertify-logs.right>*,.alertify-logs.right>.hide{right:-110%;opacity:0}.alertify-logs.top{top:0}.alertify-logs>*{box-sizing:border-box;transition:all .4s cubic-bezier(.25,.8,.25,1);position:relative;clear:both;backface-visibility:hidden;perspective:1000;max-height:0;margin:0;padding:0;overflow:hidden;opacity:0;pointer-events:none}.alertify-logs>.show{margin-top:12px;opacity:1;max-height:1000px;padding:12px;pointer-events:auto}
|
|
||||||
File diff suppressed because one or more lines are too long
7523
app/assets/bootstrap.css
vendored
File diff suppressed because it is too large
Load diff
|
|
@ -1,84 +0,0 @@
|
||||||
<!DOCTYPE html>
|
|
||||||
|
|
||||||
<head>
|
|
||||||
</head>
|
|
||||||
<link rel="stylesheet" href="./assets/bootstrap.css">
|
|
||||||
<link rel="stylesheet" href="./assets/alertify.css">
|
|
||||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
||||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
|
||||||
<meta http-equiv="content-type" content="text/html; charset=utf-8">
|
|
||||||
<title>Zimit — Create a zim archive out of a website URL</title>
|
|
||||||
|
|
||||||
<meta charset="utf-8" />
|
|
||||||
<body>
|
|
||||||
<div class="navbar navbar-default navbar-static-top">
|
|
||||||
<div class="container">
|
|
||||||
<div class="navbar-header">
|
|
||||||
<a class="navbar-brand" href="#">Zim it!</a>
|
|
||||||
</div>
|
|
||||||
<div class="navbar-collapse collapse">
|
|
||||||
<ul class="nav navbar-nav navbar-right">
|
|
||||||
<li><a href="http://www.openzim.org/wiki/Mission">Our values</a></li>
|
|
||||||
</ul>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
<div class="container">
|
|
||||||
<form action="#" id="zimcreator" onSubmit="submitForm()">
|
|
||||||
<div class="form-group field field-object">
|
|
||||||
<fieldset>
|
|
||||||
<div class="form-group field field-string">
|
|
||||||
<label class="control-label" for="url">Website URL</label>
|
|
||||||
<input id="url" label="Website URL" placeholder="https://google.com" class="form-control" type="url">
|
|
||||||
</div>
|
|
||||||
<div class="form-group field field-string">
|
|
||||||
<label class="control-label" for="title">Zim Title</label>
|
|
||||||
<input id="title" label="Zim Title" placeholder="A great website" class="form-control" type="text">
|
|
||||||
</div>
|
|
||||||
<div class="form-group field field-string">
|
|
||||||
<label class="control-label" for="email">Enter an email to be notified when this is finished</label>
|
|
||||||
<input id="email" label="Email" placeholder="john@doe.com" class="form-control" type="email">
|
|
||||||
</div>
|
|
||||||
</fieldset>
|
|
||||||
</div>
|
|
||||||
<p>
|
|
||||||
<button type="submit" class="btn btn-info">Create the Zim file!</button>
|
|
||||||
</p>
|
|
||||||
</form>
|
|
||||||
<p>
|
|
||||||
This is a <a href="http://www.openzim.org/wiki/OpenZIM">Zim</a> creator. Enter the <em>url</em> of the website you want to turn into a zim file and a <em>title</em>, then click on <em>Create the Zim file!</em>
|
|
||||||
</p>
|
|
||||||
<p>Enjoy!</p>
|
|
||||||
</div>
|
|
||||||
<script src="./assets/alertify.js"></script>
|
|
||||||
<script type="text/javascript">
|
|
||||||
|
|
||||||
function getField(field) {
|
|
||||||
return document.forms['zimcreator'].elements[field].value;
|
|
||||||
}
|
|
||||||
|
|
||||||
function submitForm() {
|
|
||||||
var content = {
|
|
||||||
url: getField('url'),
|
|
||||||
title: getField('title'),
|
|
||||||
email: getField('email'),
|
|
||||||
}
|
|
||||||
fetch("/website-zim", {
|
|
||||||
method: "POST",
|
|
||||||
body: JSON.stringify(content),
|
|
||||||
headers: {'Content-Type': 'application/json'}
|
|
||||||
}).then(function (result) {
|
|
||||||
if (result.status >= 400) {
|
|
||||||
alertify.error("The server wasn't able to start the job, please check your inputs.");
|
|
||||||
} else {
|
|
||||||
alertify.success("The job has been submitted! You'll receive an email when it's finished.");
|
|
||||||
}
|
|
||||||
})
|
|
||||||
.catch(function (error) {
|
|
||||||
alertify.error("Sorry, we weren't able to join the server. This is usually due to connectivity issues.");
|
|
||||||
});
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
</script>
|
|
||||||
|
|
||||||
</body>
|
|
||||||
BIN
favicon.ico
Binary file not shown.
|
Before Width: | Height: | Size: 9.1 KiB |
33
setup.py
|
|
@ -1,33 +0,0 @@
|
||||||
import os
|
|
||||||
from setuptools import setup, find_packages
|
|
||||||
|
|
||||||
here = os.path.abspath(os.path.dirname(__file__))
|
|
||||||
|
|
||||||
with open(os.path.join(here, 'README.rst')) as f:
|
|
||||||
README = f.read()
|
|
||||||
|
|
||||||
|
|
||||||
setup(name='zimit',
|
|
||||||
version='0.1',
|
|
||||||
description='zimit',
|
|
||||||
long_description=README,
|
|
||||||
classifiers=[
|
|
||||||
"Programming Language :: Python",
|
|
||||||
"Framework :: Pylons",
|
|
||||||
"Topic :: Internet :: WWW/HTTP",
|
|
||||||
"Topic :: Internet :: WWW/HTTP :: WSGI :: Application"
|
|
||||||
],
|
|
||||||
keywords="web services",
|
|
||||||
author='',
|
|
||||||
author_email='',
|
|
||||||
url='',
|
|
||||||
packages=find_packages(),
|
|
||||||
include_package_data=True,
|
|
||||||
zip_safe=False,
|
|
||||||
install_requires=['cornice', 'waitress', 'rq', 'colander',
|
|
||||||
'python-slugify', 'pyramid_mailer'],
|
|
||||||
entry_points="""\
|
|
||||||
[paste.app_factory]
|
|
||||||
main=zimit:main
|
|
||||||
""",
|
|
||||||
paster_plugins=['pyramid'])
|
|
||||||
62
zimit.ini
|
|
@ -1,62 +0,0 @@
|
||||||
[app:main]
|
|
||||||
use = egg:zimit
|
|
||||||
|
|
||||||
zimit.zimwriterfs_bin = /home/alexis/dev/openzim/zimwriterfs/zimwriterfs
|
|
||||||
zimit.httrack_bin = /usr/bin/httrack
|
|
||||||
zimit.output_location = /home/alexis/dev/zimit/zims
|
|
||||||
zimit.output_url = http://zimit.notmyidea.org/zims
|
|
||||||
|
|
||||||
mail.host = localhost
|
|
||||||
mail.port = 2525
|
|
||||||
mail.default_sender = zimit@notmyidea.org
|
|
||||||
|
|
||||||
pyramid.includes =
|
|
||||||
pyramid_mailer
|
|
||||||
|
|
||||||
[server:main]
|
|
||||||
use = egg:waitress#main
|
|
||||||
host = 0.0.0.0
|
|
||||||
port = 6543
|
|
||||||
|
|
||||||
# Begin logging configuration
|
|
||||||
|
|
||||||
[uwsgi]
|
|
||||||
wsgi-file = app.wsgi
|
|
||||||
http-socket = :8000
|
|
||||||
enable-threads = true
|
|
||||||
master = true
|
|
||||||
processes = 1
|
|
||||||
virtualenv = .
|
|
||||||
module = zimit
|
|
||||||
lazy = true
|
|
||||||
lazy-apps = true
|
|
||||||
|
|
||||||
|
|
||||||
[loggers]
|
|
||||||
keys = root, gplayproxy
|
|
||||||
|
|
||||||
[handlers]
|
|
||||||
keys = console
|
|
||||||
|
|
||||||
[formatters]
|
|
||||||
keys = generic
|
|
||||||
|
|
||||||
[logger_root]
|
|
||||||
level = INFO
|
|
||||||
handlers = console
|
|
||||||
|
|
||||||
[logger_gplayproxy]
|
|
||||||
level = DEBUG
|
|
||||||
handlers =
|
|
||||||
qualname = gplayproxy
|
|
||||||
|
|
||||||
[handler_console]
|
|
||||||
class = StreamHandler
|
|
||||||
args = (sys.stderr,)
|
|
||||||
level = NOTSET
|
|
||||||
formatter = generic
|
|
||||||
|
|
||||||
[formatter_generic]
|
|
||||||
format = %(asctime)s %(levelname)-5.5s [%(name)s][%(threadName)s] %(message)s
|
|
||||||
|
|
||||||
# End logging configuration
|
|
||||||
|
|
@ -1,25 +0,0 @@
|
||||||
from pyramid.config import Configurator
|
|
||||||
from pyramid.events import NewRequest
|
|
||||||
from pyramid.static import static_view
|
|
||||||
|
|
||||||
from redis import Redis
|
|
||||||
from rq import Queue
|
|
||||||
|
|
||||||
|
|
||||||
def main(global_config, **settings):
|
|
||||||
config = Configurator(settings=settings)
|
|
||||||
config.registry.queue = Queue(connection=Redis())
|
|
||||||
|
|
||||||
def attach_objects_to_request(event):
|
|
||||||
event.request.queue = config.registry.queue
|
|
||||||
|
|
||||||
config.add_subscriber(attach_objects_to_request, NewRequest)
|
|
||||||
|
|
||||||
config.include("cornice")
|
|
||||||
config.include('pyramid_mailer')
|
|
||||||
config.scan("zimit.views")
|
|
||||||
|
|
||||||
static = static_view('../app', use_subpath=True, index='index.html')
|
|
||||||
config.add_route('catchall_static', '/app/*subpath')
|
|
||||||
config.add_view(static, route_name="catchall_static")
|
|
||||||
return config.make_wsgi_app()
|
|
||||||
146
zimit/creator.py
|
|
@ -1,146 +0,0 @@
|
||||||
import os
|
|
||||||
import os.path
|
|
||||||
import shutil
|
|
||||||
import tempfile
|
|
||||||
import urlparse
|
|
||||||
|
|
||||||
from slugify import slugify
|
|
||||||
|
|
||||||
from zimit import utils
|
|
||||||
|
|
||||||
HTTRACK_BIN = "/usr/bin/httrack"
|
|
||||||
DEFAULT_AUTHOR = "ZimIt"
|
|
||||||
|
|
||||||
|
|
||||||
class ZimCreator(object):
|
|
||||||
"""A synchronous zim creator, using HTTrack to spider websites and
|
|
||||||
zimwriterfs to create the zim files.
|
|
||||||
|
|
||||||
Please note that every operation blocks the interpreter. As such, it
|
|
||||||
is recommended to run this operation in a worker if invoked from a website
|
|
||||||
view / controller.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, zimwriterfs_bin, output_location,
|
|
||||||
author=DEFAULT_AUTHOR, httrack_bin=HTTRACK_BIN,
|
|
||||||
log_file=None, max_download_speed=25000):
|
|
||||||
self.output_location = output_location
|
|
||||||
self.author = author
|
|
||||||
self.zimwriterfs_bin = zimwriterfs_bin
|
|
||||||
self.httrack_bin = httrack_bin
|
|
||||||
self.log_file = log_file
|
|
||||||
self.max_download_speed = max_download_speed
|
|
||||||
|
|
||||||
utils.ensure_paths_exists(
|
|
||||||
self.zimwriterfs_bin,
|
|
||||||
self.httrack_bin,
|
|
||||||
self.output_location)
|
|
||||||
|
|
||||||
def _spawn(self, cmd):
|
|
||||||
return utils.spawn(cmd, self.log_file)
|
|
||||||
|
|
||||||
def download_website(self, url, destination_path):
|
|
||||||
"""Download the website using HTTrack and wait for the results to
|
|
||||||
be available before returning.
|
|
||||||
|
|
||||||
:param url:
|
|
||||||
The entry URL of the website to retrieve.
|
|
||||||
|
|
||||||
:param destination_path:
|
|
||||||
The absolute location of a folder where the files will be written.
|
|
||||||
"""
|
|
||||||
options = {
|
|
||||||
"path": destination_path,
|
|
||||||
"max-rate": self.max_download_speed,
|
|
||||||
"keep-alive": None,
|
|
||||||
"robots": 0,
|
|
||||||
"near": None,
|
|
||||||
}
|
|
||||||
|
|
||||||
self._spawn(utils.get_command(self.httrack_bin, url, **options))
|
|
||||||
|
|
||||||
def prepare_website_folder(self, url, input_location):
|
|
||||||
"""Prepare the website files to make them ready to be embedded in a zim
|
|
||||||
file.
|
|
||||||
|
|
||||||
:returns:
|
|
||||||
the absolute location of the website folder, ready to be embedded.
|
|
||||||
"""
|
|
||||||
netloc = urlparse.urlparse(url).netloc.replace(":", "_")
|
|
||||||
website_folder = os.path.join(input_location, netloc)
|
|
||||||
if not os.path.isdir(website_folder):
|
|
||||||
message = "Unable to find the website folder! %s" % website_folder
|
|
||||||
raise Exception(message)
|
|
||||||
shutil.copy('./favicon.ico', website_folder)
|
|
||||||
return website_folder
|
|
||||||
|
|
||||||
def create_zim(self, input_location, output_name, zim_options):
|
|
||||||
"""Create a zim file out of an existing folder on disk.
|
|
||||||
|
|
||||||
:param input_location:
|
|
||||||
The absolute location of the files to be bundled in the zim file.
|
|
||||||
:param output_name:
|
|
||||||
The name to use to create the zim file.
|
|
||||||
:param zim_options:
|
|
||||||
Options to pass to the zim creator.
|
|
||||||
"""
|
|
||||||
|
|
||||||
zim_options.update({
|
|
||||||
'bin': self.zimwriterfs_bin,
|
|
||||||
'location': input_location,
|
|
||||||
'output': os.path.join(self.output_location, output_name),
|
|
||||||
'icon': 'favicon.ico',
|
|
||||||
'publisher': self.author,
|
|
||||||
})
|
|
||||||
|
|
||||||
# Spawn zimwriterfs with the correct options.
|
|
||||||
options = (
|
|
||||||
'{bin} -w "{welcome}" -l "{language}" -t "{title}"'
|
|
||||||
' -d "{description}" -f {icon} -c "{author}"'
|
|
||||||
' -p "{publisher}" {location} {output}'
|
|
||||||
).format(**zim_options)
|
|
||||||
self._spawn(options)
|
|
||||||
return output_name
|
|
||||||
|
|
||||||
def create_zim_from_website(self, url, zim_options):
|
|
||||||
"""Create a zim file from a website. It might take some time.
|
|
||||||
|
|
||||||
The name of the generated zim file is a slugified version of its URL.
|
|
||||||
|
|
||||||
:param url:
|
|
||||||
the URL of the website to download.
|
|
||||||
|
|
||||||
:param zim_options:
|
|
||||||
A dictionary of options to use when generating the Zim file. They
|
|
||||||
are title, language, welcome and description.
|
|
||||||
|
|
||||||
:returns:
|
|
||||||
the name of the generated zim_file (relative to the output_folder)
|
|
||||||
"""
|
|
||||||
temporary_location = tempfile.mkdtemp("zimit")
|
|
||||||
self.download_website(url, temporary_location)
|
|
||||||
website_folder = self.prepare_website_folder(url, temporary_location)
|
|
||||||
output_name = "{slug}.zim".format(slug=slugify(url))
|
|
||||||
zim_file = self.create_zim(website_folder, output_name, zim_options)
|
|
||||||
return zim_file
|
|
||||||
|
|
||||||
|
|
||||||
def load_from_settings(settings, log_file=None):
|
|
||||||
"""Load the ZimCreator object from the given pyramid settings, converting
|
|
||||||
them to actual parameters.
|
|
||||||
|
|
||||||
This is a convenience function for people wanting to create a ZimCreator
|
|
||||||
out of a ini file compatible with the pyramid framework.
|
|
||||||
|
|
||||||
:param settings: the dictionary of settings.
|
|
||||||
"""
|
|
||||||
if 'zimit.zimwriterfs_bin' not in settings:
|
|
||||||
raise ValueError('Please define zimit.zimwriterfs_bin config.')
|
|
||||||
|
|
||||||
return ZimCreator(
|
|
||||||
zimwriterfs_bin=settings['zimit.zimwriterfs_bin'],
|
|
||||||
httrack_bin=settings.get('zimit.httrack_bin'),
|
|
||||||
output_location=settings.get('zimit.output_location'),
|
|
||||||
author=settings.get('zimit.default_author'),
|
|
||||||
log_file=log_file
|
|
||||||
)
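`create_zim` assembles the zimwriterfs command by string formatting, and the string is later split with `shlex`, so a title or description containing a double quote would probably be split incorrectly. One possible alternative, shown here only as a sketch, is to build the argument list directly. `build_zimwriterfs_argv` is a hypothetical helper; the flags are the ones used in the format string above, and the sample values are invented.

```python
def build_zimwriterfs_argv(zim_options):
    """Return the zimwriterfs command as a list of arguments.

    Expects the same keys that create_zim() formats into its command string.
    """
    return [
        zim_options["bin"],
        "-w", zim_options["welcome"],
        "-l", zim_options["language"],
        "-t", zim_options["title"],
        "-d", zim_options["description"],
        "-f", zim_options["icon"],
        "-c", zim_options["author"],
        "-p", zim_options["publisher"],
        zim_options["location"],
        zim_options["output"],
    ]


argv = build_zimwriterfs_argv({
    "bin": "/usr/local/bin/zimwriterfs",  # placeholder path
    "welcome": "index.html",
    "language": "eng",
    "title": 'My "quoted" title',
    "description": "-",
    "icon": "favicon.ico",
    "author": "Someone",
    "publisher": "ZimIt",
    "location": "/tmp/site/example.org",
    "output": "/srv/zims/example-org.zim",
})
```

A list like this can be handed to `subprocess.Popen` as-is, with no quoting step in between.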
|
|
||||||
|
|
@ -1,42 +0,0 @@
|
||||||
from pyramid_mailer.message import Attachment, Message
|
|
||||||
from pyramid_mailer import Mailer
|
|
||||||
|
|
||||||
|
|
||||||
def send_zim_url(settings, email, zim_url):
|
|
||||||
"""Send an email with a link to one zim file.
|
|
||||||
|
|
||||||
:param settings:
|
|
||||||
A pyramid settings object, used by pyramid_mailer.
|
|
||||||
:param email:
|
|
||||||
The email of the recipient.
|
|
||||||
:param zim_url:
|
|
||||||
The URL of the zim file.
|
|
||||||
"""
|
|
||||||
mailer = Mailer.from_settings(settings)
|
|
||||||
msg = ZimReadyMessage(email, zim_url)
|
|
||||||
mailer.send_immediately(msg)
|
|
||||||
|
|
||||||
|
|
||||||
class ZimReadyMessage(Message):
|
|
||||||
def __init__(self, email, zim_link):
|
|
||||||
subject = "[ZimIt!] Your zimfile is ready!"
|
|
||||||
|
|
||||||
bdata = """
|
|
||||||
Hi,
|
|
||||||
|
|
||||||
You have asked for the creation of a zim file, and it is now ready!
|
|
||||||
|
|
||||||
You can access it at the following URL:
|
|
||||||
|
|
||||||
{zim_link}
|
|
||||||
|
|
||||||
Cheers,
|
|
||||||
ZimIt.
|
|
||||||
""".format(zim_link=zim_link)
|
|
||||||
hdata = bdata
|
|
||||||
|
|
||||||
body = Attachment(data=bdata, transfer_encoding="quoted-printable")
|
|
||||||
html = Attachment(data=hdata, transfer_encoding="quoted-printable")
|
|
||||||
|
|
||||||
super(ZimReadyMessage, self).__init__(
|
|
||||||
subject=subject, body=body, html=html, recipients=[email])
|
|
||||||
|
|
@ -1,35 +0,0 @@
|
||||||
import os
|
|
||||||
import shlex
|
|
||||||
import subprocess
|
|
||||||
|
|
||||||
|
|
||||||
def spawn(cmd, logfile=None):
|
|
||||||
"""Quick shortcut to spawn a command on the filesystem"""
|
|
||||||
if logfile is not None:
|
|
||||||
with open(logfile, "a+") as f:
|
|
||||||
prepared_cmd = shlex.split("stdbuf -o0 %s" % cmd)
|
|
||||||
process = subprocess.Popen(prepared_cmd, stdout=f)
|
|
||||||
else:
|
|
||||||
prepared_cmd = shlex.split(cmd)
|
|
||||||
process = subprocess.Popen(prepared_cmd)
|
|
||||||
process.wait()
|
|
||||||
return process
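For comparison, here is a rough Python 3 equivalent of `spawn` that takes an argument list and uses `subprocess.run`. It is a sketch rather than a drop-in replacement: it omits the `stdbuf -o0` wrapper used above, and it also sends stderr to the log file, which the original does not do.

```python
import subprocess
import sys
import tempfile


def spawn(argv, logfile=None):
    """Run a command to completion, optionally appending its output to a log file."""
    if logfile is None:
        return subprocess.run(argv, check=False)
    with open(logfile, "a") as log:
        # Send stderr to the same file so error messages are not lost.
        return subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT, check=False)


# Demonstration with a harmless command: the current Python interpreter.
with tempfile.NamedTemporaryFile(suffix=".log", delete=False) as tmp:
    log_path = tmp.name
completed = spawn([sys.executable, "-c", "print('crawl done')"], logfile=log_path)
with open(log_path) as f:
    logged = f.read()
```

`completed.returncode` then tells the caller whether the command succeeded, which the worker could use to avoid sending a "ready" email after a failed crawl.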
|
|
||||||
|
|
||||||
|
|
||||||
def ensure_paths_exists(*paths):
|
|
||||||
for path in paths:
|
|
||||||
if not os.path.exists(path):
|
|
||||||
msg = '%s does not exist.' % path
|
|
||||||
raise OSError(msg)
|
|
||||||
|
|
||||||
|
|
||||||
def get_command(cmd, *params, **options):
|
|
||||||
prepared_options = []
|
|
||||||
for key, value in options.items():
|
|
||||||
if value is None:
|
|
||||||
opt = "--%s" % key
|
|
||||||
else:
|
|
||||||
opt = "--%s=%s" % (key, value)
|
|
||||||
prepared_options.append(opt)
|
|
||||||
|
|
||||||
return " ".join((cmd, " ".join(params), " ".join(prepared_options)))
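To make the behaviour of `get_command` concrete, the snippet below repeats its logic and feeds it options shaped like the ones `download_website` passes. The URL and destination path are placeholders. The option order shown assumes Python 3.7 or later, where keyword arguments keep their insertion order; on Python 2 the order is arbitrary.

```python
def get_command(cmd, *params, **options):
    """Same logic as above: a value of None yields a bare --flag."""
    prepared_options = []
    for key, value in options.items():
        if value is None:
            opt = "--%s" % key
        else:
            opt = "--%s=%s" % (key, value)
        prepared_options.append(opt)
    return " ".join((cmd, " ".join(params), " ".join(prepared_options)))


options = {
    "path": "/tmp/site",  # placeholder destination
    "max-rate": 25000,
    "keep-alive": None,
    "robots": 0,
    "near": None,
}
command = get_command("/usr/bin/httrack", "https://example.org/", **options)
print(command)
# /usr/bin/httrack https://example.org/ --path=/tmp/site --max-rate=25000 --keep-alive --robots=0 --near
```

Note that values are interpolated without quoting, so a path containing spaces would be split into several arguments once the string goes through `shlex.split`.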
|
|
||||||
|
|
@ -1,63 +0,0 @@
|
||||||
import os
|
|
||||||
|
|
||||||
from cornice import Service
|
|
||||||
from colander import MappingSchema, SchemaNode, String
|
|
||||||
from pyramid.httpexceptions import HTTPTemporaryRedirect, HTTPNotFound
|
|
||||||
|
|
||||||
from zimit.worker import create_zim
|
|
||||||
|
|
||||||
website = Service(name='website', path='/website-zim')
|
|
||||||
home = Service(name='home', path='/')
|
|
||||||
status = Service(name='status', path='/status/{id}')
|
|
||||||
|
|
||||||
|
|
||||||
@home.get()
|
|
||||||
def redirect_to_app(request):
|
|
||||||
raise HTTPTemporaryRedirect("/app/index.html")
|
|
||||||
|
|
||||||
|
|
||||||
class WebSiteSchema(MappingSchema):
|
|
||||||
url = SchemaNode(String(), location="body", type='str')
|
|
||||||
title = SchemaNode(String(), location="body", type='str')
|
|
||||||
email = SchemaNode(String(), location="body", type='str')
|
|
||||||
description = SchemaNode(String(), default="-",
|
|
||||||
location="body", type='str')
|
|
||||||
author = SchemaNode(String(), default=None,
|
|
||||||
location="body", type='str')
|
|
||||||
welcome = SchemaNode(String(), default="index.html",
|
|
||||||
location="body", type='str')
|
|
||||||
language = SchemaNode(String(), default="eng",
|
|
||||||
location="body", type='str')
|
|
||||||
|
|
||||||
|
|
||||||
@website.post(schema=WebSiteSchema)
|
|
||||||
def crawl_new_website(request):
|
|
||||||
job = request.queue.enqueue(
|
|
||||||
create_zim,
|
|
||||||
request.registry.settings,
|
|
||||||
request.validated,
|
|
||||||
timeout=1800)
|
|
||||||
request.response.status_code = 201
|
|
||||||
return {
|
|
||||||
'job_id': job.id
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@status.get()
|
|
||||||
def display_status(request):
|
|
||||||
job = request.queue.fetch_job(request.matchdict["id"])
|
|
||||||
if job is None:
|
|
||||||
raise HTTPNotFound()
|
|
||||||
|
|
||||||
log_dir = request.registry.settings.get('zimit.logdir', '/tmp')
|
|
||||||
log_file = os.path.join(log_dir, "%s.log" % job.id)
|
|
||||||
|
|
||||||
log_content = None
|
|
||||||
if os.path.exists(log_file):
|
|
||||||
with open(log_file) as f:
|
|
||||||
log_content = f.read()
|
|
||||||
|
|
||||||
return {
|
|
||||||
"status": job.status,
|
|
||||||
"log": log_content
|
|
||||||
}
|
|
||||||
|
|
@ -1,20 +0,0 @@
|
||||||
import os
|
|
||||||
import urlparse
|
|
||||||
|
|
||||||
from rq import get_current_job
|
|
||||||
|
|
||||||
from zimit.mailer import send_zim_url
|
|
||||||
from zimit.creator import load_from_settings
|
|
||||||
|
|
||||||
|
|
||||||
def create_zim(settings, options):
|
|
||||||
"""Call the zim creator and the mailer when it is finished.
|
|
||||||
"""
|
|
||||||
job = get_current_job()
|
|
||||||
log_dir = settings.get('zimit.logdir', '/tmp')
|
|
||||||
log_file = os.path.join(log_dir, "%s.log" % job.id)
|
|
||||||
zim_creator = load_from_settings(settings, log_file)
|
|
||||||
zim_file = zim_creator.create_zim_from_website(options['url'], options)
|
|
||||||
output_url = settings.get('zimit.output_url')
|
|
||||||
zim_url = urlparse.urljoin(output_url, zim_file)
|
|
||||||
send_zim_url(settings, options['email'], zim_url)
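One detail worth checking here is how `urljoin` treats a base URL without a trailing slash, such as the `zimit.output_url` value in `zimit.ini`. The sketch below uses the Python 3 import and an invented file name; it suggests the last path segment of the base is replaced rather than extended.

```python
from urllib.parse import urljoin  # `urlparse.urljoin` on Python 2

zim_file = "refugeeinfo-eu.zim"  # hypothetical output name

without_slash = urljoin("http://zimit.notmyidea.org/zims", zim_file)
with_slash = urljoin("http://zimit.notmyidea.org/zims/", zim_file)

print(without_slash)  # http://zimit.notmyidea.org/refugeeinfo-eu.zim
print(with_slash)     # http://zimit.notmyidea.org/zims/refugeeinfo-eu.zim
```

If that reading is right, either the configured URL should end with `/` or the worker should append one before joining.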
|
|
||||||