diff --git a/Dockerfile b/Dockerfile
index abf0229d..c859c771 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,47 +1,56 @@
-FROM debian:stretch
+FROM node:8-slim
 LABEL maintainer="Nick Sweeting "
 
 RUN apt-get update \
-    && apt-get install -qy git wget curl youtube-dl gnupg2 libgconf-2-4 python3 python3-pip \
+    && apt-get install -yq --no-install-recommends \
+        git wget curl youtube-dl gnupg2 libgconf-2-4 python3 python3-pip \
     && rm -rf /var/lib/apt/lists/*
 
 # Install latest chrome package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
-RUN apt-get update && apt-get install -y curl --no-install-recommends \
+RUN apt-get update && apt-get install -y wget --no-install-recommends \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
-    && apt-get install -y chromium fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst ttf-freefont \
+    && apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst ttf-freefont \
       --no-install-recommends \
     && rm -rf /var/lib/apt/lists/* \
-    && rm -rf /src/*.deb \
-    && ln -s /usr/bin/chromium /usr/bin/chromium-browser
+    && rm -rf /src/*.deb
 
-# It might be a good idea to use dumb-init to help prevent zombie chrome processes.
-# ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
-# RUN chmod +x /usr/local/bin/dumb-init
+# It's a good idea to use dumb-init to help prevent zombie chrome processes.
+ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
+RUN chmod +x /usr/local/bin/dumb-init
 
-RUN git clone https://github.com/pirate/ArchiveBox /home/chromeuser/app \
-    && pip3 install -r /home/chromeuser/app/archivebox/requirements.txt \
-    && ln -s /home/chromeuser/app/bin/archivebox /usr/bin/archive
+# Install puppeteer so it's available in the container.
+RUN npm i puppeteer
 
-# Add user so we area strong, independent chrome that don't need --no-sandbox.
-RUN groupadd -r chromeuser && useradd -r -g chromeuser -G audio,video chromeuser \
+# Add user so we don't need --no-sandbox.
+RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
+    && mkdir -p /home/pptruser/Downloads \
+    && chown -R pptruser:pptruser /home/pptruser \
+    && chown -R pptruser:pptruser /node_modules
+
+# Install the ArchiveBox repository and pip requirements
+RUN git clone https://github.com/pirate/ArchiveBox /home/pptruser/app \
     && mkdir -p /data \
-    && ln -s /data /home/chromeuser/app/archivebox/output \
-    && chown -R chromeuser:chromeuser /home/chromeuser/app/archivebox/output \
-    && chown -R chromeuser:chromeuser /home/chromeuser
+    && chown -R pptruser:pptruser /data \
+    && ln -s /data /home/pptruser/app/archivebox/output \
+    && ln -s /home/pptruser/app/bin/archivebox /bin/archive \
+    && chown -R pptruser:pptruser /home/pptruser/app/archivebox
+    # && pip3 install -r /home/pptruser/app/archivebox/requirements.txt
 
 VOLUME /data
 
-ENV LANG=en_US.UTF-8 \
+ENV LANG=C.UTF-8 \
     LANGUAGE=en_US:en \
-    LC_ALL=en_US.UTF-8 \
+    LC_ALL=C.UTF-8 \
     PYTHONIOENCODING=UTF-8 \
     CHROME_SANDBOX=False \
+    CHROME_BINARY=google-chrome-unstable \
     OUTPUT_DIR=/data
 
 # Run everything from here on out as non-privileged user
-USER chromeuser
-WORKDIR /home/chromeuser/app
+USER pptruser
+WORKDIR /home/pptruser/app
 
-CMD ["/usr/bin/archive"]
+ENTRYPOINT ["dumb-init", "--"]
+CMD ["/bin/archive"]
diff --git a/archivebox/config.py b/archivebox/config.py
index 74cb0b62..0111ebcd 100644
--- a/archivebox/config.py
+++ b/archivebox/config.py
@@ -63,6 +63,7 @@ if not CHROME_BINARY:
         'google-chrome-beta',
         'google-chrome-canary',
         'google-chrome-dev',
+        'google-chrome-unstable',
     )
     for name in common_chrome_executable_names:
         full_path_exists = shutil.which(name)
diff --git a/docker-compose.yml b/docker-compose.yml
index 25cdceac..0331d507 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -6,15 +6,11 @@ services:
         stdin_open: true
         tty: true
         environment:
-            - FETCH_SCREENSHOT=False
-            - FETCH_PDF=False
-            - FETCH_DOM=False
-            - FETCH_MEDIA=False
             - USE_COLOR=False
             - SHOW_PROGRESS=False
         volumes:
             - ./data:/data
-        command: bash -c 'echo "https://example.com" | /usr/bin/archive; tail -f /dev/null'
+        command: bash -c 'echo "https://example.com" | /bin/archive; tail -f /dev/null'
 
     nginx:
         image: 'nginx'