mirror of
https://github.com/openzim/zimit.git
synced 2025-12-31 04:23:15 +00:00
reset master branch for 2020 codebase
parent
d178431e20
commit
15cf636ff3
17 changed files with 13 additions and 8305 deletions
3
Dockerfile
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
FROM debian:buster-slim
|
||||
|
||||
CMD ["/bin/bash"]
|
||||
10
README.md
Normal file
|
|
@ -0,0 +1,10 @@
|
|||
zimit
|
||||
===
|
||||
|
||||
Create ZIM files out of HTTP websites
|
||||
|
||||
# Previous version
|
||||
|
||||
A first version of a generic HTTP scraper was created in 2016 during the [Wikimania Esino Lario Hackathon](https://wikimania2016.wikimedia.org/wiki/Programme/Kiwix-dedicated_Hackathon).
|
||||
|
||||
That version is now considered outdated and [archived in `2016` branch](https://github.com/openzim/zimit/tree/2016).
|
||||
246
README.rst
|
|
@ -1,246 +0,0 @@
|
|||
#####################################
|
||||
Create ZIM files out of HTTP websites
|
||||
#####################################
|
||||
|
||||
This project provides an API and a user interface in order to convert any
|
||||
website into a Zim file.
|
||||
|
||||
Exposed API
|
||||
###########
|
||||
|
||||
All endpoints exchange JSON over HTTP. As such, all parameters should be sent as
|
||||
stringified JSON and the Content-Type should be set to "application/json".
|
||||
|
||||
POST /website-zim
|
||||
=================
|
||||
|
||||
By posting to this endpoint, you are asking the system to start a new download
|
||||
of a website and a conversion into a Zim format.
|
||||
|
||||
Required parameters
|
||||
-------------------
|
||||
|
||||
- **url**: URL of the website to be crawled
|
||||
- **title**: Title that will be used in the created Zim file
|
||||
- **email**: Email address that will get notified when the creation of the file is over
|
||||
|
||||
Optional parameters
|
||||
-------------------
|
||||
|
||||
- **language**: An `ISO 639-3 <https://en.wikipedia.org/wiki/ISO_639-3>`_ code
|
||||
representing the language
|
||||
- **welcome**: the page that will be first shown in the Zim file
|
||||
- **description**: The description that will be embedded in the Zim file
|
||||
- **author**: The author of the content
|
||||
|
||||
Return values
|
||||
-------------
|
||||
|
||||
- **job_id**: The job id is returned in JSON format. It can be used to know the
|
||||
status of the process.
|
||||
|
||||
Status codes
|
||||
------------
|
||||
|
||||
- `400 Bad Request` will be returned in case you are not respecting the
|
||||
expected inputs. In case of error, have a look at the body of the response:
|
||||
it contains information about what is missing.
|
||||
- `201 Created` will be returned if the process started.
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
::
|
||||
|
||||
$ http POST http://0.0.0.0:6543/website-zim url="https://refugeeinfo.eu/" title="Refugee Info" email="alexis@notmyidea.org"
|
||||
HTTP/1.1 201 Created
|
||||
|
||||
{
|
||||
"job_id": "5012abe3-bee2-4dd7-be87-39a88d76035d"
|
||||
}
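For readers without HTTPie, the same request can be sketched with Python's standard library. This is an illustrative sketch rather than part of the removed code: `build_zim_request` is a hypothetical helper, the base URL assumes the `port = 6543` setting from `zimit.ini`, and the email address is a placeholder.

```python
import json
import urllib.request


def build_zim_request(base_url, url, title, email):
    """Build (but do not send) a POST request for the /website-zim endpoint."""
    payload = {"url": url, "title": title, "email": email}
    return urllib.request.Request(
        base_url.rstrip("/") + "/website-zim",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_zim_request(
    "http://localhost:6543",   # assumed local server address
    "https://refugeeinfo.eu/",
    "Refugee Info",
    "user@example.org",        # placeholder recipient
)

# To actually submit the job against a running server:
# with urllib.request.urlopen(request) as response:
#     job = json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect before any network traffic happens.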
|
||||
|
||||
|
||||
GET /status/{jobid}
|
||||
===================
|
||||
|
||||
Retrieves the status of a job and returns the associated logs.
|
||||
|
||||
Return values
|
||||
-------------
|
||||
|
||||
- **status**: The status of the job: one of 'queued', 'finished',
|
||||
'failed', 'started' and 'deferred'.
|
||||
- **log**: The logs of the job.
|
||||
|
||||
Status codes
|
||||
------------
|
||||
|
||||
- `404 Not Found` will be returned in case the requested job does not exist.
|
||||
- `200 OK` will be returned in any other case.
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
::
|
||||
|
||||
http GET http://0.0.0.0:6543/status/5012abe3-bee2-4dd7-be87-39a88d76035d
|
||||
HTTP/1.1 200 OK
|
||||
|
||||
{
|
||||
"log": "<snip>",
|
||||
"status": "finished"
|
||||
}
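A client usually calls the status endpoint repeatedly until the job ends. The loop below is a minimal sketch of that idea. `wait_for_job` and its parameters are hypothetical names, and treating only 'finished' and 'failed' as terminal states is an assumption about how the job statuses are meant to be used.

```python
import time

# Assumption: these two statuses mean the job will not change any more.
TERMINAL_STATUSES = {"finished", "failed"}


def wait_for_job(fetch_status, job_id, interval=5.0, max_attempts=60,
                 sleep=time.sleep):
    """Poll until the job reaches a terminal status or attempts run out.

    fetch_status is any callable that takes a job id and returns the decoded
    JSON body of GET /status/{jobid} as a dict.
    """
    for _ in range(max_attempts):
        body = fetch_status(job_id)
        if body["status"] in TERMINAL_STATUSES:
            return body
        sleep(interval)
    raise TimeoutError("job %s did not finish in time" % job_id)
```

Passing the fetch function in as an argument keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.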
|
||||
|
||||
|
||||
Okay, so how do I install it on my server?
|
||||
##########################################
|
||||
|
||||
Currently, the best way to install it is to retrieve the sources from GitHub:
|
||||
|
||||
::
|
||||
|
||||
$ git clone https://github.com/almet/zimit.git
|
||||
$ cd zimit
|
||||
|
||||
Create a virtual environment and install the project in it::
|
||||
|
||||
$ virtualenv venv
|
||||
$ venv/bin/pip install -e .
|
||||
|
||||
Then, run it how you want, for instance with pserve::
|
||||
|
||||
$ venv/bin/pserve zimit.ini
|
||||
|
||||
|
||||
In a separate process, you also need to run the worker::
|
||||
|
||||
$ venv/bin/rqworker
|
||||
|
||||
|
||||
And you're ready to go. To test it::
|
||||
|
||||
$ http POST http://0.0.0.0:6543/website-zim url="https://refugeeinfo.eu/" title="Refugee Info" email="alexis@notmyidea.org"
|
||||
|
||||
|
||||
Debian dependencies
|
||||
####################
|
||||
|
||||
Installing the dependencies
|
||||
===========================
|
||||
|
||||
::
|
||||
|
||||
sudo apt-get install httrack libzim-dev libmagic-dev liblzma-dev libz-dev build-essential libtool libgumbo-dev redis-server automake pkg-config
|
||||
|
||||
Installing zimwriterfs
|
||||
======================
|
||||
|
||||
::
|
||||
|
||||
git clone https://github.com/wikimedia/openzim.git
|
||||
cd openzim/zimwriterfs
|
||||
./autogen.sh
|
||||
./configure
|
||||
make
|
||||
|
||||
Then update the path to the zimwriterfs executable in zimit.ini and start the worker and the server
|
||||
|
||||
::
|
||||
|
||||
$ rqworker & pserve zimit.ini
|
||||
|
||||
How to deploy?
|
||||
##############
|
||||
|
||||
There are multiple ways to deploy such a service, so I'll describe how I do it
|
||||
with my own best practices.
|
||||
|
||||
First of all, get all the dependencies and the code. I like to have everything
|
||||
available in /home/www, so let's consider this will be the case here::
|
||||
|
||||
$ mkdir /home/www/zimit.notmyidea.org
|
||||
$ cd /home/www/zimit.notmyidea.org
|
||||
$ git clone https://github.com/almet/zimit.git
|
||||
|
||||
Then, you can change the configuration file, by creating a new one::
|
||||
|
||||
$ cd zimit
|
||||
$ cp zimit.ini local.ini
|
||||
|
||||
From there, you need to update the configuration to point to the correct
|
||||
binaries and locations.
|
||||
|
||||
Nginx configuration
|
||||
===================
|
||||
|
||||
::
|
||||
|
||||
# the upstream component nginx needs to connect to
|
||||
upstream zimit_upstream {
|
||||
server unix:///tmp/zimit.sock;
|
||||
}
|
||||
|
||||
# configuration of the server
|
||||
server {
|
||||
listen 80;
|
||||
listen [::]:80;
|
||||
server_name zimit.ideascube.org;
|
||||
charset utf-8;
|
||||
|
||||
client_max_body_size 200M;
|
||||
|
||||
location /zims {
|
||||
alias /home/ideascube/zimit.ideascube.org/zims/;
|
||||
autoindex on;
|
||||
}
|
||||
|
||||
# Finally, send all non-media requests to the Pyramid server.
|
||||
location / {
|
||||
uwsgi_pass zimit_upstream;
|
||||
include /var/ideascube/uwsgi_params;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
UWSGI configuration
|
||||
===================
|
||||
|
||||
::
|
||||
|
||||
[uwsgi]
|
||||
uid = ideascube
|
||||
gid = ideascube
|
||||
chdir = /home/ideascube/zimit.ideascube.org/zimit/
|
||||
ini = /home/ideascube/zimit.ideascube.org/zimit/local.ini
|
||||
# the virtualenv (full path)
|
||||
home = /home/ideascube/zimit.ideascube.org/venv/
|
||||
|
||||
# process-related settings
|
||||
# master
|
||||
master = true
|
||||
# maximum number of worker processes
|
||||
processes = 4
|
||||
# the socket (use the full path to be safe)
|
||||
socket = /tmp/zimit.sock
|
||||
# ... with appropriate permissions - may be needed
|
||||
chmod-socket = 666
|
||||
# stats = /tmp/ideascube.stats.sock
|
||||
# clear environment on exit
|
||||
vacuum = true
|
||||
plugins = python
|
||||
|
||||
|
||||
supervisord configuration
|
||||
=========================
|
||||
|
||||
::
|
||||
|
||||
[program:zimit-worker]
|
||||
command=/home/ideascube/zimit.ideascube.org/venv/bin/rqworker
|
||||
directory=/home/ideascube/zimit.ideascube.org/zimit/
|
||||
user=www-data
|
||||
autostart=true
|
||||
autorestart=true
|
||||
redirect_stderr=true
|
||||
|
||||
That's it!
|
||||
24
app.wsgi
|
|
@ -1,24 +0,0 @@
|
|||
try:
    import ConfigParser as configparser
except ImportError:
    import configparser
import logging.config
import os

from zimit import main

here = os.path.dirname(__file__)

ini_path = os.environ.get('ZIMIT_INI')
if ini_path is None:
    ini_path = os.path.join(here, 'local.ini')

# Set up logging
logging.config.fileConfig(ini_path)

# Parse config and create WSGI app
config = configparser.ConfigParser()
config.read(ini_path)

application = main(config.items('DEFAULT'),
                   **dict(config.items('app:main')))
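The configuration lookup in this file can be tried in isolation. The sketch below uses Python 3's `configparser` only. `resolve_ini_path` and `load_app_settings` are hypothetical helper names, and unlike the original check the first one also treats an empty `ZIMIT_INI` value as unset.

```python
import configparser
import os


def resolve_ini_path(here, environ=os.environ):
    """Return $ZIMIT_INI when set and non-empty, else local.ini under `here`."""
    return environ.get("ZIMIT_INI") or os.path.join(here, "local.ini")


def load_app_settings(ini_text):
    """Parse ini text; return (global defaults, [app:main] settings) as dicts."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser.items("DEFAULT")), dict(parser.items("app:main"))


defaults, settings = load_app_settings(
    "[app:main]\n"
    "use = egg:zimit\n"
    "zimit.httrack_bin = /usr/bin/httrack\n"
)
```

The two dictionaries correspond to the positional `global_config` argument and the keyword settings that `main` receives.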
|
||||
|
|
@ -1 +0,0 @@
|
|||
.alertify-logs>*{padding:12px 24px;color:#fff;box-shadow:0 2px 5px 0 rgba(0,0,0,.2);border-radius:1px}.alertify-logs>*,.alertify-logs>.default{background:rgba(0,0,0,.8)}.alertify-logs>.error{background:rgba(244,67,54,.8)}.alertify-logs>.success{background:rgba(76,175,80,.9)}.alertify{position:fixed;background-color:rgba(0,0,0,.3);left:0;right:0;top:0;bottom:0;width:100%;height:100%;z-index:1}.alertify.hide{opacity:0;pointer-events:none}.alertify,.alertify.show{box-sizing:border-box;transition:all .33s cubic-bezier(.25,.8,.25,1)}.alertify,.alertify *{box-sizing:border-box}.alertify .dialog{padding:12px}.alertify .alert,.alertify .dialog{width:100%;margin:0 auto;position:relative;top:50%;transform:translateY(-50%)}.alertify .alert>*,.alertify .dialog>*{width:400px;max-width:95%;margin:0 auto;text-align:center;padding:12px;background:#fff;box-shadow:0 2px 4px -1px rgba(0,0,0,.14),0 4px 5px 0 rgba(0,0,0,.098),0 1px 10px 0 rgba(0,0,0,.084)}.alertify .alert .msg,.alertify .dialog .msg{padding:12px;margin-bottom:12px;margin:0;text-align:left}.alertify .alert input:not(.form-control),.alertify .dialog input:not(.form-control){margin-bottom:15px;width:100%;font-size:100%;padding:12px}.alertify .alert input:not(.form-control):focus,.alertify .dialog input:not(.form-control):focus{outline-offset:-2px}.alertify .alert nav,.alertify .dialog nav{text-align:right}.alertify .alert nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button),.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button){background:transparent;box-sizing:border-box;color:rgba(0,0,0,.87);position:relative;outline:0;border:0;display:inline-block;-ms-flex-align:center;-ms-grid-row-align:center;align-items:center;padding:0 6px;margin:6px 8px;line-height:36px;min-height:36px;white-space:nowrap;min-width:88px;text-align:center;text-transform:uppercase;font-size:14px;text-decoration:none;cursor:pointer;border:1px solid transparent;border-radius:2px}.alertify .alert nav 
button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):active,.alertify .alert nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):hover,.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):active,.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):hover{background-color:rgba(0,0,0,.05)}.alertify .alert nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):focus,.alertify .dialog nav button:not(.btn):not(.pure-button):not(.md-button):not(.mdl-button):focus{border:1px solid rgba(0,0,0,.1)}.alertify .alert nav button.btn,.alertify .dialog nav button.btn{margin:6px 4px}.alertify-logs{position:fixed;z-index:1}.alertify-logs.bottom,.alertify-logs:not(.top){bottom:16px}.alertify-logs.left,.alertify-logs:not(.right){left:16px}.alertify-logs.left>*,.alertify-logs:not(.right)>*{float:left;transform:translateZ(0);height:auto}.alertify-logs.left>.show,.alertify-logs:not(.right)>.show{left:0}.alertify-logs.left>*,.alertify-logs.left>.hide,.alertify-logs:not(.right)>*,.alertify-logs:not(.right)>.hide{left:-110%}.alertify-logs.right{right:16px}.alertify-logs.right>*{float:right;transform:translateZ(0)}.alertify-logs.right>.show{right:0;opacity:1}.alertify-logs.right>*,.alertify-logs.right>.hide{right:-110%;opacity:0}.alertify-logs.top{top:0}.alertify-logs>*{box-sizing:border-box;transition:all .4s cubic-bezier(.25,.8,.25,1);position:relative;clear:both;backface-visibility:hidden;perspective:1000;max-height:0;margin:0;padding:0;overflow:hidden;opacity:0;pointer-events:none}.alertify-logs>.show{margin-top:12px;opacity:1;max-height:1000px;padding:12px;pointer-events:auto}
|
||||
File diff suppressed because one or more lines are too long
7523
app/assets/bootstrap.css
vendored
File diff suppressed because it is too large
|
|
@ -1,84 +0,0 @@
|
|||
<!DOCTYPE html>
|
||||
|
||||
<head>
|
||||
</head>
|
||||
<link rel="stylesheet" href="./assets/bootstrap.css">
|
||||
<link rel="stylesheet" href="./assets/alertify.css">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta http-equiv="content-type" content="text/html; charset=utf-8">
|
||||
<title>Zimit — Create a zim archive out of a website URL</title>
|
||||
|
||||
<meta charset="utf-8" />
|
||||
<body>
|
||||
<div class="navbar navbar-default navbar-static-top">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<a class="navbar-brand" href="#">Zim it!</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav navbar-right">
|
||||
<li><a href="http://www.openzim.org/wiki/Mission">Our values</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="container">
|
||||
<form action="#" id="zimcreator" onSubmit="return submitForm()">
|
||||
<div class="form-group field field-object">
|
||||
<fieldset>
|
||||
<div class="form-group field field-string">
|
||||
<label class="control-label" for="url">Website URL</label>
|
||||
<input id="url" label="Website URL" placeholder="https://google.com" class="form-control" type="url">
|
||||
</div>
|
||||
<div class="form-group field field-string">
|
||||
<label class="control-label" for="title">Zim Title</label>
|
||||
<input id="title" label="Zim Title" placeholder="A great website" class="form-control" type="text">
|
||||
</div>
|
||||
<div class="form-group field field-string">
|
||||
<label class="control-label" for="email">Enter an email to be notified when this is finished</label>
|
||||
<input id="email" label="Email" placeholder="john@doe.com" class="form-control" type="email">
|
||||
</div>
|
||||
</fieldset>
|
||||
</div>
|
||||
<p>
|
||||
<button type="submit" class="btn btn-info">Create the Zim file!</button>
|
||||
</p>
|
||||
</form>
|
||||
<p>
|
||||
This is a <a href="http://www.openzim.org/wiki/OpenZIM">Zim</a> creator. Enter the <em>url</em> of the website you want to turn into a zim file, a <em>title</em> and an <em>email</em>, then click on <em>Create the Zim file!</em>
|
||||
</p>
|
||||
<p>Enjoy!</p>
|
||||
</div>
|
||||
<script src="./assets/alertify.js"></script>
|
||||
<script type="text/javascript">
|
||||
|
||||
function getField(field) {
|
||||
return document.forms['zimcreator'].elements[field].value;
|
||||
}
|
||||
|
||||
function submitForm() {
|
||||
var content = {
|
||||
url: getField('url'),
|
||||
title: getField('title'),
|
||||
email: getField('email'),
|
||||
}
|
||||
fetch("/website-zim", {
|
||||
method: "POST",
|
||||
body: JSON.stringify(content),
|
||||
headers: {'Content-Type': 'application/json'}
|
||||
}).then(function (result) {
|
||||
if (result.status >= 400) {
|
||||
alertify.error("The server wasn't able to start the job, please check your inputs.");
|
||||
} else {
|
||||
alertify.success("The job has been submitted! You'll receive an email when it's finished.");
|
||||
}
|
||||
})
|
||||
.catch(function (error) {
|
||||
alertify.error("Sorry, we weren't able to reach the server. This is usually due to connectivity issues.");
|
||||
});
|
||||
return false;
|
||||
}
|
||||
</script>
|
||||
|
||||
</body>
|
||||
BIN
favicon.ico
Binary file not shown.
|
Before Width: | Height: | Size: 9.1 KiB |
33
setup.py
|
|
@ -1,33 +0,0 @@
|
|||
import os
from setuptools import setup, find_packages

here = os.path.abspath(os.path.dirname(__file__))

with open(os.path.join(here, 'README.rst')) as f:
    README = f.read()


setup(name='zimit',
      version='0.1',  # setuptools expects a string, not a float
      description='zimit',
      long_description=README,
      classifiers=[
          "Programming Language :: Python",
          "Framework :: Pylons",
          "Topic :: Internet :: WWW/HTTP",
          "Topic :: Internet :: WWW/HTTP :: WSGI :: Application"
      ],
      keywords="web services",
      author='',
      author_email='',
      url='',
      packages=find_packages(),
      include_package_data=True,
      zip_safe=False,
      install_requires=['cornice', 'waitress', 'rq', 'colander',
                        'python-slugify', 'pyramid_mailer'],
      entry_points="""\
      [paste.app_factory]
      main=zimit:main
      """,
      paster_plugins=['pyramid'])
|
||||
62
zimit.ini
|
|
@ -1,62 +0,0 @@
|
|||
[app:main]
|
||||
use = egg:zimit
|
||||
|
||||
zimit.zimwriterfs_bin = /home/alexis/dev/openzim/zimwriterfs/zimwriterfs
|
||||
zimit.httrack_bin = /usr/bin/httrack
|
||||
zimit.output_location = /home/alexis/dev/zimit/zims
|
||||
zimit.output_url = http://zimit.notmyidea.org/zims
|
||||
|
||||
mail.host = localhost
|
||||
mail.port = 2525
|
||||
mail.default_sender = zimit@notmyidea.org
|
||||
|
||||
pyramid.includes =
    pyramid_mailer
|
||||
|
||||
[server:main]
|
||||
use = egg:waitress#main
|
||||
host = 0.0.0.0
|
||||
port = 6543
|
||||
|
||||
# Begin logging configuration
|
||||
|
||||
[uwsgi]
|
||||
wsgi-file = app.wsgi
|
||||
http-socket = :8000
|
||||
enable-threads = true
|
||||
master = true
|
||||
processes = 1
|
||||
virtualenv = .
|
||||
module = zimit
|
||||
lazy = true
|
||||
lazy-apps = true
|
||||
|
||||
|
||||
[loggers]
|
||||
keys = root, gplayproxy
|
||||
|
||||
[handlers]
|
||||
keys = console
|
||||
|
||||
[formatters]
|
||||
keys = generic
|
||||
|
||||
[logger_root]
|
||||
level = INFO
|
||||
handlers = console
|
||||
|
||||
[logger_gplayproxy]
|
||||
level = DEBUG
|
||||
handlers =
|
||||
qualname = gplayproxy
|
||||
|
||||
[handler_console]
|
||||
class = StreamHandler
|
||||
args = (sys.stderr,)
|
||||
level = NOTSET
|
||||
formatter = generic
|
||||
|
||||
[formatter_generic]
|
||||
format = %(asctime)s %(levelname)-5.5s [%(name)s][%(threadName)s] %(message)s
|
||||
|
||||
# End logging configuration
|
||||
|
|
@ -1,25 +0,0 @@
|
|||
from pyramid.config import Configurator
from pyramid.events import NewRequest
from pyramid.static import static_view

from redis import Redis
from rq import Queue


def main(global_config, **settings):
    config = Configurator(settings=settings)
    config.registry.queue = Queue(connection=Redis())

    def attach_objects_to_request(event):
        event.request.queue = config.registry.queue

    config.add_subscriber(attach_objects_to_request, NewRequest)

    config.include("cornice")
    config.include('pyramid_mailer')
    config.scan("zimit.views")

    static = static_view('../app', use_subpath=True, index='index.html')
    config.add_route('catchall_static', '/app/*subpath')
    config.add_view(static, route_name="catchall_static")
    return config.make_wsgi_app()
|
||||
146
zimit/creator.py
|
|
@ -1,146 +0,0 @@
|
|||
import os
import os.path
import shutil
import tempfile
try:
    import urlparse  # Python 2
except ImportError:
    from urllib import parse as urlparse  # Python 3

from slugify import slugify

from zimit import utils

HTTRACK_BIN = "/usr/bin/httrack"
DEFAULT_AUTHOR = "ZimIt"


class ZimCreator(object):
    """A synchronous zim creator, using HTTrack to spider websites and
    zimwriterfs to create the zim files.

    Please note that every operation blocks the interpreter. As such, it
    is recommended to run this operation in a worker if invoked from a website
    view / controller.
    """

    def __init__(self, zimwriterfs_bin, output_location,
                 author=DEFAULT_AUTHOR, httrack_bin=HTTRACK_BIN,
                 log_file=None, max_download_speed=25000):
        self.output_location = output_location
        self.author = author
        self.zimwriterfs_bin = zimwriterfs_bin
        self.httrack_bin = httrack_bin
        self.log_file = log_file
        self.max_download_speed = max_download_speed

        utils.ensure_paths_exists(
            self.zimwriterfs_bin,
            self.httrack_bin,
            self.output_location)

    def _spawn(self, cmd):
        return utils.spawn(cmd, self.log_file)

    def download_website(self, url, destination_path):
        """Download the website using HTTrack and wait for the results to
        be available before returning.

        :param url:
            The entry URL of the website to retrieve.

        :param destination_path:
            The absolute location of a folder where the files will be written.
        """
        options = {
            "path": destination_path,
            "max-rate": self.max_download_speed,
            "keep-alive": None,
            "robots": 0,
            "near": None,
        }

        self._spawn(utils.get_command(self.httrack_bin, url, **options))

    def prepare_website_folder(self, url, input_location):
        """Prepare the website files to make them ready to be embedded in a zim
        file.

        :returns:
            the absolute location of the website folder, ready to be embedded.
        """
        netloc = urlparse.urlparse(url).netloc.replace(":", "_")
        website_folder = os.path.join(input_location, netloc)
        if not os.path.isdir(website_folder):
            message = "Unable to find the website folder! %s" % website_folder
            raise Exception(message)
        shutil.copy('./favicon.ico', website_folder)
        return website_folder
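The folder lookup above expects the mirror to sit in a directory named after the URL's host, with any `:` replaced by `_`. A minimal Python 3 sketch of that path computation follows. `website_folder_for` is a hypothetical name and the example paths are made up.

```python
import os.path
from urllib.parse import urlparse


def website_folder_for(url, input_location):
    """Compute the expected mirror folder the same way the method above does."""
    netloc = urlparse(url).netloc.replace(":", "_")
    return os.path.join(input_location, netloc)


plain_host = website_folder_for("https://refugeeinfo.eu/", "/tmp/zimit")
host_with_port = website_folder_for("http://localhost:8000/docs", "/tmp/zimit")
```

A host with an explicit port such as `localhost:8000` therefore maps to a folder named `localhost_8000`.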
|
||||
|
||||
    def create_zim(self, input_location, output_name, zim_options):
        """Create a zim file out of an existing folder on disk.

        :param input_location:
            The absolute location of the files to be bundled in the zim file.
        :param output_name:
            The name to use to create the zim file.
        :param zim_options:
            Options to pass to the zim creator.
        """

        zim_options.update({
            'bin': self.zimwriterfs_bin,
            'location': input_location,
            'output': os.path.join(self.output_location, output_name),
            'icon': 'favicon.ico',
            'publisher': self.author,
        })

        # Spawn zimwriterfs with the correct options.
        options = (
            '{bin} -w "{welcome}" -l "{language}" -t "{title}"'
            ' -d "{description}" -f {icon} -c "{author}"'
            ' -p "{publisher}" {location} {output}'
        ).format(**zim_options)
        self._spawn(options)
        return output_name
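Because the command is assembled as one string and later split with `shlex.split`, a title or description that contains a double quote can end up changed or split into several arguments. One possible alternative, sketched here as an assumption and not as the project's method, is to build the argument list directly. `build_zimwriterfs_argv` is a hypothetical helper, it uses only the flags that appear in the template above, and the example paths are invented.

```python
def build_zimwriterfs_argv(zim_options):
    """Return the zimwriterfs command as a list, one element per argument."""
    return [
        zim_options["bin"],
        "-w", zim_options["welcome"],
        "-l", zim_options["language"],
        "-t", zim_options["title"],
        "-d", zim_options["description"],
        "-f", zim_options["icon"],
        # str() mirrors the string template, which renders None as "None"
        "-c", str(zim_options["author"]),
        "-p", zim_options["publisher"],
        zim_options["location"],
        zim_options["output"],
    ]


argv = build_zimwriterfs_argv({
    "bin": "/usr/local/bin/zimwriterfs",       # hypothetical path
    "welcome": "index.html",
    "language": "eng",
    "title": 'The "best" site',
    "description": "-",
    "icon": "favicon.ico",
    "author": None,
    "publisher": "ZimIt",
    "location": "/tmp/zimit/example.org",      # hypothetical path
    "output": "/srv/zims/example-org.zim",     # hypothetical path
})
```

A list like this can be handed to `subprocess.Popen` as is, so no quoting or re-splitting step is involved.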
|
||||
|
||||
    def create_zim_from_website(self, url, zim_options):
        """Create a zim file from a website. It might take some time.

        The name of the generated zim file is a slugified version of its URL.

        :param url:
            the URL of the website to download.

        :param zim_options:
            A dictionary of options to use when generating the Zim file. They
            are title, language, welcome and description.

        :returns:
            the name of the generated zim_file (relative to the output_folder)
        """
        temporary_location = tempfile.mkdtemp("zimit")
        self.download_website(url, temporary_location)
        website_folder = self.prepare_website_folder(url, temporary_location)
        output_name = "{slug}.zim".format(slug=slugify(url))
        zim_file = self.create_zim(website_folder, output_name, zim_options)
        return zim_file


def load_from_settings(settings, log_file=None):
    """Load the ZimCreator object from the given pyramid settings, converting
    them to actual parameters.

    This is a convenience function for people wanting to create a ZimCreator
    out of a ini file compatible with the pyramid framework.

    :param settings: the dictionary of settings.
    """
    if 'zimit.zimwriterfs_bin' not in settings:
        raise ValueError('Please define zimit.zimwriterfs_bin config.')

    # Fall back to the module defaults so that a missing setting does not
    # override them with None.
    return ZimCreator(
        zimwriterfs_bin=settings['zimit.zimwriterfs_bin'],
        httrack_bin=settings.get('zimit.httrack_bin', HTTRACK_BIN),
        output_location=settings.get('zimit.output_location'),
        author=settings.get('zimit.default_author', DEFAULT_AUTHOR),
        log_file=log_file
    )
|
||||
|
|
@ -1,42 +0,0 @@
|
|||
from pyramid_mailer.message import Attachment, Message
from pyramid_mailer import Mailer


def send_zim_url(settings, email, zim_url):
    """Send an email with a link to one zim file.

    :param settings:
        A pyramid settings object, used by pyramid_mailer.
    :param email:
        The email of the recipient.
    :param zim_url:
        The URL of the zim file.
    """
    mailer = Mailer.from_settings(settings)
    msg = ZimReadyMessage(email, zim_url)
    mailer.send_immediately(msg)


class ZimReadyMessage(Message):
    def __init__(self, email, zim_link):
        subject = "[ZimIt!] Your zimfile is ready!"

        bdata = """
Hi,

You have asked for the creation of a zim file, and it is now ready!

You can access it at the following URL:

{zim_link}

Cheers,
ZimIt.
""".format(zim_link=zim_link)
        # The HTML part reuses the plain-text body unchanged.
        hdata = bdata

        body = Attachment(data=bdata, transfer_encoding="quoted-printable")
        html = Attachment(data=hdata, transfer_encoding="quoted-printable")

        super(ZimReadyMessage, self).__init__(
            subject=subject, body=body, html=html, recipients=[email])
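The message above carries a plain-text part and an HTML part with identical content. For comparison, here is a sketch of the same two-part layout using Python's standard `email` package in place of `pyramid_mailer`. `build_zim_ready_message` is a hypothetical name, the wording is shortened, and the addresses are placeholders.

```python
import html
from email.message import EmailMessage


def build_zim_ready_message(sender, recipient, zim_link):
    """Build a multipart/alternative message with plain-text and HTML parts."""
    message = EmailMessage()
    message["Subject"] = "[ZimIt!] Your zimfile is ready!"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        "Hi,\n\nYour zim file is ready. You can access it at:\n\n"
        "%s\n\nCheers,\nZimIt.\n" % zim_link
    )
    # Escape the link before embedding it in markup.
    safe_link = html.escape(zim_link, quote=True)
    message.add_alternative(
        '<p>Hi,</p><p>Your zim file is ready: '
        '<a href="%s">%s</a></p><p>Cheers,<br>ZimIt.</p>' % (safe_link, safe_link),
        subtype="html",
    )
    return message


message = build_zim_ready_message(
    "zimit@example.org",                  # placeholder sender
    "user@example.org",                   # placeholder recipient
    "http://example.org/zims/site.zim",   # placeholder link
)
```

Giving the HTML part real markup lets mail clients render a clickable link, while the plain-text part stays readable on its own.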
|
||||
|
|
@ -1,35 +0,0 @@
|
|||
import os
import shlex
import subprocess


def spawn(cmd, logfile=None):
    """Quick shortcut to spawn a command on the filesystem"""
    if logfile is not None:
        with open(logfile, "a+") as f:
            prepared_cmd = shlex.split("stdbuf -o0 %s" % cmd)
            process = subprocess.Popen(prepared_cmd, stdout=f)
    else:
        prepared_cmd = shlex.split(cmd)
        process = subprocess.Popen(prepared_cmd)
    process.wait()
    return process
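The helper above only redirects standard output to the log file. A variant that takes an argument list and also captures standard error could look like the sketch below. `run_logged` is a hypothetical name, and the `stdbuf -o0` prefix used above to disable output buffering is left out here.

```python
import subprocess
import sys


def run_logged(argv, logfile=None):
    """Run argv to completion and return its exit code.

    When logfile is given, stdout and stderr are both appended to it.
    """
    if logfile is None:
        return subprocess.run(argv).returncode
    with open(logfile, "a") as log:
        return subprocess.run(
            argv, stdout=log, stderr=subprocess.STDOUT
        ).returncode


# Example: run a tiny Python command without a log file.
exit_code = run_logged([sys.executable, "-c", "print('hello')"])
```

Returning the exit code lets the caller decide whether a failed crawl or conversion should abort the job.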
|
||||
|
||||
|
||||
def ensure_paths_exists(*paths):
    for path in paths:
        if not os.path.exists(path):
            msg = '%s does not exist.' % path
            raise OSError(msg)


def get_command(cmd, *params, **options):
    prepared_options = []
    for key, value in options.items():
        if value is None:
            opt = "--%s" % key
        else:
            opt = "--%s=%s" % (key, value)
        prepared_options.append(opt)

    return " ".join((cmd, " ".join(params), " ".join(prepared_options)))
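To make the option mapping concrete: a `None` value becomes a bare `--key` flag and any other value becomes `--key=value`. The sketch below is a list-based variant of the same mapping, written for Python 3.8 or later because it uses `shlex.join`. `get_command_argv` is a hypothetical name and is not part of the removed module.

```python
import shlex


def get_command_argv(cmd, *params, **options):
    """List-based variant of get_command: one argv element per argument."""
    argv = [cmd, *params]
    for key, value in options.items():
        if value is None:
            argv.append("--%s" % key)            # flag without a value
        else:
            argv.append("--%s=%s" % (key, value))
    return argv


argv = get_command_argv(
    "/usr/bin/httrack",
    "https://refugeeinfo.eu/",
    path="/tmp/zimit",
    robots=0,
    near=None,
)
# shlex.join quotes each element for display or logging.
printable = shlex.join(argv)
```

Option names that contain a hyphen, such as `max-rate`, cannot be written as keyword arguments and have to be passed with `**{"max-rate": 25000}`, which is how the creator module calls the original helper.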
|
||||
|
|
@ -1,63 +0,0 @@
|
|||
import os

from cornice import Service
from colander import MappingSchema, SchemaNode, String
from pyramid.httpexceptions import HTTPTemporaryRedirect, HTTPNotFound

from zimit.worker import create_zim

website = Service(name='website', path='/website-zim')
home = Service(name='home', path='/')
status = Service(name='status', path='/status/{id}')


@home.get()
def redirect_to_app(request):
    raise HTTPTemporaryRedirect("/app/index.html")


class WebSiteSchema(MappingSchema):
    url = SchemaNode(String(), location="body", type='str')
    title = SchemaNode(String(), location="body", type='str')
    email = SchemaNode(String(), location="body", type='str')
    description = SchemaNode(String(), default="-",
                             location="body", type='str')
    author = SchemaNode(String(), default=None,
                        location="body", type='str')
    welcome = SchemaNode(String(), default="index.html",
                         location="body", type='str')
    language = SchemaNode(String(), default="eng",
                          location="body", type='str')
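The schema has three fields without a default and four with one. The standard-library sketch below mimics that split, assuming the `default=` values are applied whenever a field is absent from the request body. How colander and cornice actually treat `default=` depends on their versions, so this is an illustration and not a description of their behaviour. `validate_payload` is a hypothetical name.

```python
REQUIRED_FIELDS = ("url", "title", "email")
OPTIONAL_DEFAULTS = {
    "description": "-",
    "author": None,
    "welcome": "index.html",
    "language": "eng",
}


def validate_payload(payload):
    """Return the payload with defaults filled in, or raise ValueError."""
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError("missing required fields: %s" % ", ".join(missing))
    validated = dict(OPTIONAL_DEFAULTS)
    for key, value in payload.items():
        # Unknown keys are dropped in this sketch.
        if key in REQUIRED_FIELDS or key in OPTIONAL_DEFAULTS:
            validated[key] = value
    return validated


validated = validate_payload({
    "url": "https://refugeeinfo.eu/",
    "title": "Refugee Info",
    "email": "user@example.org",   # placeholder address
    "language": "fra",
})
```

The resulting dictionary has the same seven keys that the worker later passes on as zim options.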
|
||||
|
||||
|
||||
@website.post(schema=WebSiteSchema)
def crawl_new_website(request):
    job = request.queue.enqueue(
        create_zim,
        request.registry.settings,
        request.validated,
        timeout=1800)
    request.response.status_code = 201
    return {
        'job_id': job.id
    }


@status.get()
def display_status(request):
    job = request.queue.fetch_job(request.matchdict["id"])
    if job is None:
        raise HTTPNotFound()

    log_dir = request.registry.settings.get('zimit.logdir', '/tmp')
    log_file = os.path.join(log_dir, "%s.log" % job.id)

    log_content = None
    if os.path.exists(log_file):
        with open(log_file) as f:
            log_content = f.read()

    return {
        "status": job.status,
        "log": log_content
    }
|
||||
|
|
@ -1,20 +0,0 @@
|
|||
import os
try:
    import urlparse  # Python 2
except ImportError:
    from urllib import parse as urlparse  # Python 3

from rq import get_current_job

from zimit.mailer import send_zim_url
from zimit.creator import load_from_settings


def create_zim(settings, options):
    """Call the zim creator and the mailer when it is finished.
    """
    job = get_current_job()
    log_dir = settings.get('zimit.logdir', '/tmp')
    log_file = os.path.join(log_dir, "%s.log" % job.id)
    zim_creator = load_from_settings(settings, log_file)
    zim_file = zim_creator.create_zim_from_website(options['url'], options)
    output_url = settings.get('zimit.output_url')
    zim_url = urlparse.urljoin(output_url, zim_file)
    send_zim_url(settings, options['email'], zim_url)
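One detail of `urljoin` matters for the emailed link: a relative name replaces the last path segment of the base URL unless the base ends with a slash. With an output URL such as the `http://zimit.notmyidea.org/zims` value in `zimit.ini`, the `zims` segment is therefore dropped. The sketch below shows this with Python 3's `urllib.parse`. The file name is a made-up example and `zim_public_url` is a hypothetical helper.

```python
from urllib.parse import urljoin

# Base without a trailing slash: "zims" is replaced by the file name.
without_slash = urljoin("http://zimit.notmyidea.org/zims", "refugeeinfo-eu.zim")

# Base with a trailing slash: the file name is appended under /zims/.
with_slash = urljoin("http://zimit.notmyidea.org/zims/", "refugeeinfo-eu.zim")


def zim_public_url(output_url, zim_file):
    """Join output_url and zim_file, keeping the last segment of output_url."""
    return urljoin(output_url.rstrip("/") + "/", zim_file)
```

Normalizing the trailing slash before joining gives the same link whichever way the setting is written.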
|
||||