Parsing info from .txt files

  • @takkaria
    That is what I was thinking of too: using the existing (in V anyway) parse code. Once you have a parser, it's easy to generate any output format.

  • Basically, pulling out chunks of existing angband code and using them in other contexts is what I perceive to be wizardry, and there are so many useful things I would want to do if I could do that. As I said before, if you can take code we already have and get it to spit out SQL or JSON then you are a wizard and have helped to save angband.

  • I'd still be interested in pure-JS implementation of info-file parsers.

    I don't think it's too much work (irrespective of async), we've all already hacked similar scripts before.

    The hard(er) part is not writing a parser, but agreeing on a common format, as in, what do we parse INTO?

    Because having things like:

    "type": "R",
    "serial": 9,
    "X-line": [25,25,12]

    seems pointless, surely we want something like:

     "name": "White centipede",
     "common_attribute1": "something",

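    To make the contrast above concrete, here is a minimal sketch of turning raw prefixed lines into a record keyed by meaningful names. The `FIELD_NAMES` table and the `X` line layout are assumptions made up for illustration (the attribute names are placeholders, as in the post), not an agreed format.

```python
# Hypothetical mapping from a line prefix to the names of its fields.
# The "X" entry uses placeholder names; a real table would be per variant.
FIELD_NAMES = {
    "N": ["serial", "name"],
    "X": ["attribute1", "attribute2", "attribute3"],
}


def to_named_record(lines):
    """Turn raw 'P:a:b:c' lines into one dict keyed by field name."""
    record = {}
    for line in lines:
        prefix, *values = line.split(":")
        names = FIELD_NAMES.get(prefix)
        if names is None:
            continue  # unknown line type: skipped in this sketch
        for name, value in zip(names, values):
            # Store plain digit strings as integers, everything else as text.
            record[name] = int(value) if value.isdigit() else value
    return record


monster = to_named_record(["N:9:White centipede", "X:25:25:12"])
```

    Here `monster` ends up with `name` and `serial` keys rather than a bare `"X-line"` array, which is the shape the post argues for.
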
  • I totally forgot I already wrote a script for this. It spits out html tables but it could be hacked to spit out something more useful to us (SQL) instead

    bandit is short for angband editor, which this was originally conceived of as facilitating: giving people nice html forms to edit datafiles with.

    it looks like past me was thinking ahead too, with the definitions of variables describing what we are parsing and then functions which can be re-used for other definitions.

  • I don't know how to put the output data into Mediawiki. Anyone who could point me in the right direction would be greatly appreciated.

  • I've written a Perl script that parses monster.txt and monster_base.txt. I don't think I have Wiki database access yet to add it to the new wiki. I created a really basic page on the wiki here:
    My current idea is to generate a new page for each monster and fill that "statblock" table with the data. Then people can edit all around the statblock - tips and tricks, Tolkien history, whatever - and when the data changes I can replace the statblock and leave everything else untouched. I think we'll want a little drop-down menu to display statblocks from different versions but I haven't gotten that far. It would be handy, I think, to have identically named monsters from similar *bands share a page as well, but that's undecided.
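
    A rough sketch of the "replace the statblock and leave everything else untouched" idea, assuming the statblock is wrapped in marker comments. The `START` and `END` markers are hypothetical; the real wiki template may delimit the statblock some other way.

```python
import re

# Hypothetical marker comments around the generated statblock.
START = "<!-- STATBLOCK START -->"
END = "<!-- STATBLOCK END -->"


def replace_statblock(page_text, new_statblock):
    """Swap the text between the markers, leaving the rest of the page alone."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    replacement = START + "\n" + new_statblock + "\n" + END
    # A function is used as the replacement so that backslashes in the
    # new statblock are not interpreted as regex escapes.
    new_text, count = pattern.subn(lambda m: replacement, page_text, count=1)
    if count == 0:
        raise ValueError("page has no statblock markers")
    return new_text


page = "Intro text\n" + START + "\nold stats\n" + END + "\nTips and tricks"
updated = replace_statblock(page, "new stats")
```

    Anything people wrote before or after the markers survives the update, and a page without markers raises an error instead of being silently overwritten.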

    If anyone has ideas how to liven up that template, that'd be awesome. For example, if we're legally allowed we could put Shockbolt's icon up there. But I don't know off-hand how to take the big image of all the tiles and turn it into one image for each tile and name it appropriately. But I know that professionals in image manipulation could do it in an hour or less. And we'll need to display what *band(s) this monster is found in. Also, not all info is equally important, so we'd probably want Depth (for example) displayed prominently.

  • I sent you the access credentials to the wiki db on discord

    I'll have a look at the db later, but the first step is to make sure we have an extensible system for parsing datafiles from any arbitrary variant. Generating the pages will be the second step and will involve extending mediawiki with our own PHP, which I will begin work on.

  • Step 1: define an extensible template structure which can be given to the parser

    class variant {
    	string name;
    	string version;
    	fileDescriptor[] files;
    }
    class fileDescriptor {
    	string subject;
    	// dataType defaults to 0, meaning 'list'. Redundant for parsing r_info, a_info etc, but non-list parseable edit files exist and we can extend to them later if we leave ourselves a variable loose
    	int dataType;
    	entryDescriptor entry;
    }
    //assume ':' separates fields; this one is for lists
    class listDescriptor extends entryDescriptor {
    	//e.g. entryBegin = "^N" or "^name"
    	string entryBegin;
    	//e.g. lineBegin += "^N" or "^F" or "^name" or "^flags"
    	string[] lineBegin;
    	//how many entries to read from each kind of line;
    	//negative numbers can be used for special meanings, like
    	//arbitrary numbers of values on a single line.
    	//Array length must match lineBegin. Entries can still have
    	//arbitrarily many e.g. "F:" lines; the matching array lengths
    	//are for telling the parser how to handle a line beginning that
    	//it has matched.
    	int[] lineEntries;
    	//probably an optional variable if entryBegin is defined
    	string entryTerminate;
    }

    I have apparently reverted to a Java-like syntax for pseudocoding this definition although I'd like to actually write them in JSON or XML.

    This is incomplete! We need to define a system for labelling the extracted values, e.g. which of the values we parsed from the middle of a line was AC. At the very least that means a list of the expected fields which our variant defines.
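
    As one possible way to cover the labelling gap, here is a sketch of a JSON template that names every value on each line type, plus a small parser driven by it. Everything in it is an assumption for illustration: the key names, the `*` prefix for "arbitrarily many values" (standing in for the negative-number idea above), the `|` flag separator, and the made-up sample data.

```python
import json

# Hypothetical template: each line prefix maps to labels for its values.
# A label starting with "*" collects any number of values into a list.
TEMPLATE_JSON = """
{
  "name": "ExampleBand",
  "version": "1.0",
  "files": [
    {
      "subject": "monsters",
      "entryBegin": "N",
      "lines": {
        "N": ["serial", "name"],
        "I": ["speed", "hit_points", "armour_class"],
        "F": ["*flags"]
      }
    }
  ]
}
"""

# Made-up data in the old N:/I:/F: style, not taken from any real file.
SAMPLE = """
# comment lines and blank lines are ignored
N:1:Example monster
I:110:5d8:12
F:EVIL | DEMON
F:UNIQUE
"""


def parse_entries(text, descriptor):
    """Split text into entries and label each value using the descriptor."""
    entries, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        prefix, _, rest = line.partition(":")
        labels = descriptor["lines"].get(prefix)
        if labels is None:
            continue  # line type not described by the template
        if prefix == descriptor["entryBegin"]:
            current = {}
            entries.append(current)
        if current is None:
            continue  # data before the first entry header
        if labels[0].startswith("*"):
            # Variable-length line: values accumulate across repeated lines.
            values = [v.strip() for v in rest.split("|") if v.strip()]
            current.setdefault(labels[0][1:], []).extend(values)
        else:
            # maxsplit lets the last labelled field contain ':' itself.
            for label, value in zip(labels, rest.split(":", len(labels) - 1)):
                current[label] = value
    return entries


template = json.loads(TEMPLATE_JSON)
monsters = parse_entries(SAMPLE, template["files"][0])
```

    Because the template is plain JSON, a parser written in any language could load the same file, and a new variant would only need a new template rather than new code.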

    Step 2 is defining the database.

    Table prefixes. Examples: Angband_341_monsters, Composband_712_artifacts

    Some variants will need tables that others don't. Column headings may be inconsistent between variants but should match their equivalents when possible; there's no reason to e.g. force angband to have a corpse weight field.
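
    The prefix convention could be generated rather than typed by hand. This sketch assumes the version is written without dots (so 3.4.1 becomes 341, as the examples suggest) and that anything other than letters, digits and underscores should be dropped; `table_name` is a hypothetical helper.

```python
import re


def table_name(variant, version, subject):
    """Build a prefixed table name such as 'Angband_341_monsters'."""

    def clean(part):
        # Keep only letters, digits and underscores so the result is a
        # safe, unquoted SQL identifier.
        return re.sub(r"[^A-Za-z0-9_]", "", part)

    return "_".join([clean(variant), clean(version), clean(subject)])


monsters_table = table_name("Angband", "3.4.1", "monsters")
artifacts_table = table_name("Composband", "7.1.2", "artifacts")
```

    One thing to watch with this scheme: dropping the dots makes versions like 4.2.10 and 4.21.0 collide, so a separator may be worth keeping if that ever matters.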

    Can we agree on this, and maybe someone with more formal experience write a more formal specification for a parser template?

  • @Gwarl What are your goals apart from putting this information into the wiki? We don't need "formal specifications" to take name/value pairs from a text file and put them into an HTML table.

  • We do to make any sense of them. Not everything is a name:value pair. How do we know which lines are name/value pairs and which aren't, and how do we know which names they have? We need a solution for all variants, not just angband 4.2.0. If we have to write new code for every version and variant we will remain small and incomplete.

  • I don't mean to suggest that every bit of data in every file we're going to read is in the format "name:value". The line:
    I:2:7d4:0 (Sil)
    Is three name/value pairs once it's been read properly, though, and that's how I picture it looking in the wiki:
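
    As a rough illustration of reading that line as three name/value pairs: the field names below (`speed`, `health`, `light_radius`) are assumptions for the sake of the example, not taken from the post, and `parse_line` is a hypothetical helper.

```python
def parse_line(line, field_names):
    """Pair each value on a 'P:a:b:c' line with a name, ignoring the prefix."""
    prefix, *values = line.split(":")
    if len(values) != len(field_names):
        raise ValueError(
            f"{prefix} line has {len(values)} values, expected {len(field_names)}"
        )
    return dict(zip(field_names, values))


# Field names here are assumed, not confirmed by the thread.
pairs = parse_line("I:2:7d4:0", ["speed", "health", "light_radius"])
```

    The values stay as strings here because something like `7d4` is a dice expression rather than a number; converting types would be a separate step.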

  • I'm just saying we should establish a convention for writing templates for these structures so the parser knows what to look for.

  • I'm sure you're right about that, but I want to do it from the bottom-up, so to speak. And I don't want the data structure of this template any more complex than it needs to be.

  • Can you post an example of the initial txt file that we want to parse, maybe? It sounds easy enough to parse all of r_info.txt straight away (the trickiest part would be understanding what each and every flag stands for), but I understand that you do not want to parse the entire file.

  • Alright, and in the end you want to get something like this? Not as a single list, but as a wiki entry for each monster?

  • For the minute I just want everything in tables in the MySQL DB. Once we have that we can present the information any way we like and have SQL selects run searches against them automatically, e.g.

    List level 60-80 unique D's from every variant, show every monster marked DEMON in vanilla, etc.
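
    A sketch of what those searches might look like once the data is in tables. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, a single `monsters` table with a `variant` column instead of per-variant tables, and made-up rows; the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monsters ("
    "variant TEXT, name TEXT, glyph TEXT, "
    "level INTEGER, is_unique INTEGER, flags TEXT)"
)

# Made-up rows for illustration only.
rows = [
    ("ExampleBand", "Made-up ancient dragon", "D", 70, 1, "DRAGON"),
    ("ExampleBand", "Made-up young dragon", "d", 30, 0, "DRAGON"),
    ("OtherBand", "Made-up demon lord", "U", 75, 1, "DEMON"),
    ("OtherBand", "Made-up great wyrm", "D", 65, 1, "DRAGON"),
]
conn.executemany("INSERT INTO monsters VALUES (?, ?, ?, ?, ?, ?)", rows)

# "List level 60-80 unique D's from every variant"
unique_dragons = conn.execute(
    "SELECT variant, name FROM monsters "
    "WHERE glyph = 'D' AND is_unique = 1 AND level BETWEEN 60 AND 80 "
    "ORDER BY variant, name"
).fetchall()

# "Show every monster marked DEMON" in one variant
demons = conn.execute(
    "SELECT name FROM monsters WHERE variant = ? AND flags LIKE ?",
    ("OtherBand", "%DEMON%"),
).fetchall()
```

    With per-variant tables the first query would need a `UNION` across them. Also note that MySQL's default collations compare strings case-insensitively, so telling `D` from `d` there would likely need a binary comparison or a case-sensitive collation on the glyph column.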

  • oh, having this in SQL DB gonna be pain.. but ok, first to parse initial files.

  • Alright, made some code to start with.

    My main idea is to define all the structures and be able to fill them from .txt files directly; information that is found only in the source code can be taken from external files or hardcoded for now. Just as an example, I looked through the r_info.txt file and made some structures. This is not 100% done, of course, just my rage code 🙂

    I don't see much of a problem parsing the files in /lib/edit; I just see a few different ways to avoid hardcoding a lot of stuff. One way of making it more universal would be the ability to read our own txt-like files that hold the info found only in source code files. In the example above I hardcoded a few spell types and colors.

    If we are able to read .txt files and fill the defined structures from them, we can output to whatever format we desire. I did that in Go since I just like it very much 🙂

  • Ok, very good, but I think there's some confusion: we now have my prototype for a parser in node.js, DavidMedley's prototype in Perl, and yours in Go. You (specifically you) can see the Perl scripts in /home/angwiki/

    As long as there's some sort of plugin-able template which describes a variant that any of our parser scripts can read then it doesn't matter which one we use, really.
