cdb90's Introduction

CDB90 Battle Dataset

A tidier, open-data version of CDB90 (CAA Database of Battles, Version 1990). This used to be called CDB13, which I thought was an amusing name, but it seemed to make people think I had added to the CDB90 data.

The description of CDB90 from Helmbold (1993):

A database of over 600 battles that were fought between 1600AD and 1973AD. Descriptive data include battle name, date, and location; the strengths and losses on each side; identification of the victor; temporal duration of the battle; and selected environmental and tactical environment descriptors (such as type of fortifications, type of tactical scheme, weather conditions, width of front, etc.)

The original documentation for the CDB90 dataset is in src-data/M000121/README.TXT.

The directory src-data/M000121 contains the original data from CDB90 as obtained from the NTIS. The original datafiles are the WKS files. However, since this is an archaic format, these were converted to csv files using LibreOffice (v. 4.4.3.2; BuildId: 88805f81e9fe61362df02b9941de8e38a9b5fd16; Locale: en_).

The directory data contains the revised data. No major substantive changes were made to the data, nor were new battles added. A brief summary of the revisions:

  • Minor changes to several values so that logical relationships hold. These include all the edits made in the revision CDB91, as described in Helmbold (1995), DTIC report ADA298124, p. 2-4. The remainder fix a few other logical inconsistencies in the original data: e.g., total tanks must equal light battle tanks plus main battle tanks; tanks are 0, not missing, prior to WWI.
  • Reformatting the data from one of the messier spreadsheets that I have ever seen into tidy data. This means that there are separate tables for battles, combatants, activity periods, etc.
  • Additional variables linking the battles to other data sources, such as the Correlates of War.
  • Different categorizations of wars, both from the original HERO dataset and my own categorization which is mostly consistent with the COW wars after 1815, and Wikipedia before 1815.
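
The logical relationships mentioned in the first item can be checked mechanically. The following is a minimal sketch in Python, not the code used in this repository; the column names `tanks`, `tanks_light`, and `tanks_main` and the sample rows are hypothetical.

```python
def find_tank_inconsistencies(rows):
    """Return the rows whose tank total differs from the sum of its parts.

    Column names (tanks, tanks_light, tanks_main) are hypothetical.
    Rows with any missing (None) value are skipped rather than flagged.
    """
    bad = []
    for row in rows:
        total, light, main = row["tanks"], row["tanks_light"], row["tanks_main"]
        if None in (total, light, main):
            continue
        if total != light + main:
            bad.append(row)
    return bad


# Made-up sample rows, for illustration only.
sample = [
    {"id": 1, "tanks": 100, "tanks_light": 40, "tanks_main": 60},
    {"id": 2, "tanks": 90, "tanks_light": 40, "tanks_main": 60},
    {"id": 3, "tanks": None, "tanks_light": 0, "tanks_main": 0},
]
print([row["id"] for row in find_tank_inconsistencies(sample)])
```

Only the second sample row is flagged; the third is skipped because its total is missing.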

This dataset follows the Data Package standard.

Usage

To build the data CSV files and the documentation, and to create a SQLite database with the data, run:

$ ./build.sh
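
Once the build has produced the SQLite database, its contents can be inspected from Python's standard library. This is only a sketch: the database file name is not stated here, so an in-memory database with a hypothetical `battles` table stands in for the real build output.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


# An in-memory database stands in for the built file here; with the real
# build output you would call sqlite3.connect() on its path instead.
# The table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE battles (id INTEGER PRIMARY KEY, name TEXT)")
print(list_tables(conn))
```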

References

  • CAA Study Report CAA-SR-84-6, "Analysis of Factors That Have Influenced Outcomes of Battles and Wars: A Data Base of Battles and Engagements," September 1984: AD-B086-797L, AD-B087-718L, AD-B087-719L, AD-B087-720L, AD-B087-721L, AD-B087-722L
  • HERO Report Number 129, "Combat History Analysis Study Effort (CHASE) Data Enhancement Study (CDES)," 31 January 1986, AD-A175-712, AD-A175-713, AD-A175-714, AD-A175-715, AD-A175-716
  • "Data Base Error Correction (DBEC)," 23 January 1987. It was prepared for CAA under Purchase Order Number MDA903-86-M-8560 and is available from DTIC under accession number AD-A176-750.
  • FW Management Associates, Inc. Report "Independent Review/Reassessment of Anomalous Data (IR/RAD)," 22 June 1987, in four volumes. It was prepared for CAA under Contract Number MDA903-86-C-0396. It is available from DTIC under the following accession numbers: AD-???-??? (Volume I), AD-195-726 (Volume II), AD-???-??? (Volume III), and AD-???-??? (Volume IV)
  • Data Memory Systems Incorporated report, "New Engagement Data for the Breakpoints Data Base," prepared for the US Army Concepts Analysis Agency under Contract No. MDA903-87-C-0807, 30 September 1988.
  • Robert Helmbold. 1993. "Personnel Attrition Rates in Historical Land Combat Operations: A Catalog of Attrition and Casualty Data Bases on Diskettes Usable with Personal Computers." DTIC ADA279069.
  • Robert Helmbold. 1995. "Personnel Attrition Rates in Historical Land Combat Operations. Some Empirical Relations among Force Sizes, Battle Durations" DTIC AD-A268 787.

Licenses

  • Code is BSD-3 unless otherwise noted.
  • Data is odc-by.
  • The original CDB90 data in src-data/M000121 is Public Domain.

cdb90's People

Contributors

jrnold

cdb90's Issues

Negative values missing?

Hello,

I've noticed in the description that some variables are supposed to have negative values (e.g., wina, cea, surpa) but they do not.
After looking at the R Code, I see this is being done by the function "tomissing_neg." Is this deliberate?

I spot-checked some of the wina values and saw that the ones that are missing should indeed be -1 (defender victory). Am I misunderstanding the reformulation?
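
For illustration, the effect described here can be reproduced with a small Python sketch. This is not the repository's R code: `to_missing_neg` is a hypothetical stand-in, and it assumes the R helper replaces every negative value with a missing value.

```python
def to_missing_neg(values):
    """Replace every negative value with None (missing).

    Hypothetical stand-in for the R helper named in this issue; the real
    function's behavior is assumed, not confirmed.
    """
    return [None if v is not None and v < 0 else v for v in values]


# wina codes as described in this issue: -1 means defender victory.
# The sample values themselves are made up.
wina = [1, -1, 0, -1]
print(to_missing_neg(wina))  # the -1 codes are lost
```

Under that assumption, a column in which -1 is a legitimate code loses every defender victory.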

Date Data

Could be screwing something up here, but the majority of values in Year, Month, Day, Hour are all 9s. From my exposure to the set in the HERO work, I don't think that was originally the case?

Explic

The file src-data/CDB90/CDB90.tsv was made by concatenating the csv files in data/M000121 into a single file. But I also made edits directly to the file out of laziness. These included the changes in Helmbold 1995 version CDB91, described on pp. 2-4 to 2-5.

To address this:

  • Convert and concatenate all WKS files into a csv file
  • Use csvdiff or a similar tool to find differences in the data, and create a patch that can be applied programmatically to the original data.
  • Create a script to generate the data from the original files. Probably convert WKS to CSV manually and start the script from there.
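
The diff-and-patch step above could look roughly like the following. This is a sketch using only Python's standard library rather than csvdiff; the key column `id` and the sample data are hypothetical, and only cell-level changes in rows present in both tables are handled.

```python
import csv
import io


def read_table(text, key):
    """Parse CSV text into a dict mapping each row's key value to the row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def make_patch(original, edited):
    """List (key, column, old, new) for every cell that differs.

    Only rows present in both tables are compared.
    """
    patch = []
    for k, old_row in original.items():
        new_row = edited.get(k)
        if new_row is None:
            continue
        for col, old in old_row.items():
            if new_row.get(col) != old:
                patch.append((k, col, old, new_row.get(col)))
    return patch


def apply_patch(table, patch):
    """Apply a patch in place, checking that each old value still matches."""
    for k, col, old, new in patch:
        if table[k][col] != old:
            raise ValueError(f"unexpected value for row {k}, column {col}")
        table[k][col] = new
    return table


# Made-up data; the key column "id" and the values are hypothetical.
original = read_table("id,tanks\n1,\n2,50\n", key="id")
edited = read_table("id,tanks\n1,0\n2,50\n", key="id")
patch = make_patch(original, edited)
print(patch)
print(apply_patch(original, patch) == edited)
```

Storing the old value alongside the new one lets the patch fail loudly if the converted source files ever change underneath it.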
