BiRD - Birkbeck Research Data

    4chan /pol board as a temporary evolution of live threads and posts.

    Cite as: Prifti, Ylli (2021): 4chan /pol board as a temporary evolution of live threads and posts. Birkbeck College, University of London. doi: https://doi.org/10.18743/DATA.00154

    Description

    We monitored the live /pol board on 4chan and scraped each thread multiple times while it remained on the board. We also scraped the archive status of each thread, as exposed by the 4chan API and the 4pleb (internet archiving service) API, but that archive-status data is not included in this dataset.
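
    Repeated scrapes of the same thread are what let the data show how a thread evolves while it is live. The sketch below is a minimal illustration, not the scraper actually used. It assumes each scrape fetches the thread from the public read-only 4chan JSON API (`https://a.4cdn.org/pol/thread/<no>.json`, which returns `{"posts": [...]}` with a `no` field per post) and compares two scrapes to find the posts that appeared or disappeared between them. The `ThreadSnapshot` record and both helper functions are hypothetical and do not describe the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadSnapshot:
    """One scrape of a single thread (hypothetical record layout)."""
    thread_no: int       # OP post number, which identifies the thread
    scraped_at: int      # UNIX time (seconds) at which the scrape ran
    post_nos: frozenset  # post numbers present in the thread at that moment


def snapshot_from_thread_json(thread_no, scraped_at, thread_json):
    """Build a snapshot from a parsed 4chan thread JSON document ({"posts": [...]})."""
    return ThreadSnapshot(
        thread_no=thread_no,
        scraped_at=scraped_at,
        post_nos=frozenset(post["no"] for post in thread_json["posts"]),
    )


def diff_snapshots(earlier, later):
    """Return (added, removed) post numbers between two scrapes of the same thread."""
    if earlier.thread_no != later.thread_no:
        raise ValueError("snapshots belong to different threads")
    added = sorted(later.post_nos - earlier.post_nos)
    removed = sorted(earlier.post_nos - later.post_nos)
    return added, removed
```

    For example, if a first scrape of thread 100 sees posts 100 and 101, and a later scrape sees 100, 102 and 103, `diff_snapshots` reports `([102, 103], [101])`: two new replies, and one post deleted in between.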

    This dataset contains a three-month extract of the data scraped from 4chan between 1 April 2021 and 1 July 2021.
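
    When working with the extract, records can be checked against the collection period. This is a sketch under stated assumptions. It treats the period as the half-open UTC interval [1 April 2021, 1 July 2021), because the record does not say whether the end date is inclusive or which time zone applies. It also reads the `time` field (UNIX seconds) that posts carry in the 4chan API, and whether the dataset stores timestamps under that name is itself an assumption.

```python
from datetime import datetime, timezone

# Assumption: half-open window in UTC. The record only gives the two dates.
PERIOD_START = datetime(2021, 4, 1, tzinfo=timezone.utc)
PERIOD_END = datetime(2021, 7, 1, tzinfo=timezone.utc)


def in_collection_period(unix_time):
    """True if a UNIX timestamp (seconds) falls in [PERIOD_START, PERIOD_END)."""
    moment = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return PERIOD_START <= moment < PERIOD_END


def filter_posts(posts):
    """Keep posts whose assumed "time" field falls inside the collection window."""
    return [post for post in posts if in_collection_period(post["time"])]
```

    The window boundaries correspond to the UNIX timestamps 1617235200 (1 April 2021, 00:00 UTC) and 1625097600 (1 July 2021, 00:00 UTC). Under the half-open assumption, the first is included and the second is excluded.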

    Collection Method

    We collected this information with a distributed scraping system, using a combination of 14 nodes, 3 clusters and 100 running instances per minute to scrape the data at high frequency, because content on 4chan is ephemeral.
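
    The record does not describe how work was divided across the nodes and clusters. The sketch below is a single-machine stand-in and not the system actually used. It crawls breadth-first (the board, then its pages, then their threads) and fetches every URL on the current level in parallel with a thread pool. `fetch_children` is a hypothetical callback. In a real scraper it would download a page and return the links found on it.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_breadth_first(root, fetch_children, max_workers=8):
    """Visit every URL reachable from root, one depth level at a time.

    fetch_children(url) -> list of child URLs (hypothetical callback).
    All URLs on the current level are fetched in parallel before the
    crawl moves on to the next level.
    """
    visited = {root}
    order = [root]
    frontier = [root]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            # pool.map returns results in input order, so the visit order is deterministic.
            children_per_url = list(pool.map(fetch_children, frontier))
            next_frontier = []
            for children in children_per_url:
                for child in children:
                    if child not in visited:  # skip links already seen on an earlier page
                        visited.add(child)
                        order.append(child)
                        next_frontier.append(child)
            frontier = next_frontier
    return order
```

    For a toy graph `{"board": ["page1", "page2"], "page1": ["t1", "t2"], "page2": ["t2", "t3"]}` in which threads have no children, the visit order is `board, page1, page2, t1, t2, t3`. Every page is reached before any thread, and `t2` is visited only once even though two pages link to it.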

    The scraping system was configured to scrape breadth-first, and MongoDB was used as document storage, configured as a replica set of one primary, three secondaries and one arbiter.
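
    The replica-set layout described above can be expressed as the configuration document that MongoDB's `replSetInitiate` command accepts. This is a minimal sketch. The helper and host names are hypothetical, and only the member counts come from this record. Giving the first data-bearing member a higher priority makes it the preferred primary, but MongoDB still holds an election to choose the primary.

```python
def replica_set_config(set_name, data_hosts, arbiter_host):
    """Build a replSetInitiate config: data-bearing members plus one arbiter.

    The first data host gets priority 2 (preferred primary), and the rest
    get the default priority 1. The arbiter votes but stores no data.
    """
    members = []
    for i, host in enumerate(data_hosts):
        members.append({"_id": i, "host": host, "priority": 2 if i == 0 else 1})
    members.append({"_id": len(data_hosts), "host": arbiter_host, "arbiterOnly": True})
    return {"_id": set_name, "members": members}


# Hypothetical host names: one intended primary, three secondaries, one arbiter.
CONFIG = replica_set_config(
    "rs0",
    ["mongo-0:27017", "mongo-1:27017", "mongo-2:27017", "mongo-3:27017"],
    "mongo-arbiter:27017",
)
```

    With PyMongo connected directly to one of the data-bearing hosts, the document could be sent with `client.admin.command("replSetInitiate", CONFIG)`. Four data-bearing members plus the arbiter give five votes, so a majority of three is needed to elect a primary.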

    Data Objects

    Offline / Analogue Data Records

    There are no offline / analogue datasets associated with this record.

    External Data Records

    There are no external datasets associated with this record.

    Digital Data Downloads

    To download any items from this dataset, you must agree to abide by the licence attached to each item. If you use any item you download, you must also cite it in any publications or outputs of your own.

    If you have any questions or would like additional information, please contact us at researchdata@bbk.ac.uk.

    Metadata

    Dataset Title:

    4chan /pol board as a temporary evolution of live threads and posts.

    Creators:

    Prifti, Ylli

    School/Department:

    Birkbeck Schools and Research Centres > School of Business, Economics & Informatics > Computer Science and Information Systems

    Keywords:

    4chan, live board, threads, posts

    Collection period:

    From: 1 April 2021
    To: 1 July 2021

    Temporal coverage:

    From: 1 April 2021
    To: 1 July 2021

    Statement on legal, ethical, and access issues:

    This is a collection of publicly available data that is anonymous at source and would otherwise be ephemeral.
