Parsing info from .txt files



  • Let's have a thread for this.

    The plan is to add lots and lots of tables full of info populated with values read from the .txt files where they're defined. This isn't going to be an exhaustive list of info and will be a different process for different variants.

    That is part 1.

    Part 2 is extending our wiki with PHP which gathers this info from the database where we've put it. We do not want to parse an entire r_info.txt every time we serve a page.

    There are lots of details to hash out in part 2. Given the plurality of variants we will probably want a single page for a green icky thing rather than having green icky thing pages for every version and variant.

    I think monster lists are a good place to start, even though V complicates this with base types: the page will need to display e.g. the resistances and vulnerabilities a monster inherits from its base.

    I propose that most of us focus on getting data online for V as a proof of concept. We can then post our angband -> SQL statement parsers for rodent or quirk to tweak for their variants, although I don't think either of them is likely to like SQL very much.
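
    The base-type inheritance could be resolved at display time roughly as follows. This is only a sketch: the object shapes, the field names (base, flags) and the placeholder flag names are assumptions for illustration, not a real schema or real file contents.

    ```javascript
    // Hypothetical parsed data; names and flags are placeholders only.
    const exampleBases = {
      "icky thing": { name: "icky thing", flags: ["RES_A", "VULN_B"] },
    };
    const exampleMonster = {
      name: "green icky thing",
      base: "icky thing",
      flags: ["OWN_C"],
    };

    // Split a monster's flags into those inherited from its base and its own,
    // so the page can show where each resistance or vulnerability comes from.
    function resolveFlags(monster, bases) {
      const base = bases[monster.base];
      const inherited = base ? [...base.flags] : [];
      const own = monster.flags.filter((flag) => !inherited.includes(flag));
      return { name: monster.name, inherited, own, all: [...inherited, ...own] };
    }

    const resolvedMonster = resolveFlags(exampleMonster, exampleBases);
    console.log(resolvedMonster);
    ```

    Keeping inherited and own flags separate means the page template can decide how to present them, rather than the parser deciding up front.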


  • The parser needs to be extensible. We can start writing JSON templates that look like

    {
        "variant":"angband",
        "version":"4.2.0",
        "files": {
            "monsterdata":"monster.txt",
            "monsterbasedata":"monster_base.txt"
            ...
        }
    }
    

    and functions like

    function parsemonsterlist(file, parserscheme){
        return monstersobj;
    }
    

    with associated parserscheme objects (or be really OO and have these functions as properties of an object associated with the game), and

    function monsterstosql(monstersobj){
        return sql_create_inject_statements;
    }
    

    'monster' vs 'object' could be parametrised too, of course, but the point is to make templates for each version/variant that can be passed into generalised functions for scraping the data from each new variant and version.

    We don't have any restrictions on the choice of language for this step; I just wrote my pseudocode in C-like JS because it's what comes naturally.
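
    As a rough illustration, parsemonsterlist could be filled in along these lines. This is a minimal sketch and makes several assumptions: it takes the file's text rather than a file handle, it expects one "key:value" pair per line with "#" comments, the parserscheme fields (entryBegin, separator) are hypothetical, and the sample lines are invented rather than copied from a real monster.txt.

    ```javascript
    // Group "key:value" lines into entries; a new entry starts at entryBegin.
    function parseMonsterList(text, parserScheme) {
      const monsters = [];
      let current = null;
      for (const rawLine of text.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (line === "" || line.startsWith("#")) continue; // blank or comment
        const sep = line.indexOf(parserScheme.separator);
        if (sep === -1) continue; // not a key:value line
        const key = line.slice(0, sep);
        const value = line.slice(sep + 1);
        if (key === parserScheme.entryBegin) {
          current = { [key]: value };
          monsters.push(current);
        } else if (current) {
          // Keys may repeat (e.g. several flags lines), so collect into arrays.
          if (!current[key]) current[key] = [];
          current[key].push(value);
        }
      }
      return monsters;
    }

    const sampleMonsterText = [
      "# invented sample, not copied from a real monster.txt",
      "name:green icky thing",
      "base:icky thing",
      "flags:FLAG_A",
      "flags:FLAG_B",
      "",
      "name:white icky thing",
      "base:icky thing",
    ].join("\n");

    const monstersObj = parseMonsterList(sampleMonsterText, {
      entryBegin: "name",
      separator: ":",
    });
    console.log(JSON.stringify(monstersObj, null, 2));
    ```

    Everything except the entry-starting key ends up as an array of raw strings; splitting those values further is left to whatever the scheme describes.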


  • Okay, the monsterbasedata/monsterdata properties in files could be objects of their own containing the parser schemes, rather than strings.
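
    A template shaped that way might look like the sketch below, with a small driver that walks the files. The property names path and scheme, the scheme contents, and the scrapeVariant helper are all assumptions made for illustration; the file reader and parser are passed in, so stubs stand in for them here.

    ```javascript
    // Hypothetical template: each file entry carries its own parser scheme.
    const variantTemplate = {
      variant: "angband",
      version: "4.2.0",
      files: {
        monsterdata: {
          path: "monster.txt",
          scheme: { entryBegin: "name", separator: ":" },
        },
        monsterbasedata: {
          path: "monster_base.txt",
          scheme: { entryBegin: "name", separator: ":" },
        },
      },
    };

    // Read and parse every file the template lists, keyed by subject.
    function scrapeVariant(template, readFile, parseList) {
      const result = {};
      for (const [subject, file] of Object.entries(template.files)) {
        result[subject] = parseList(readFile(file.path), file.scheme);
      }
      return result;
    }

    // Stubs so the sketch runs without touching the filesystem.
    const readFileStub = (path) => "contents of " + path;
    const parseListStub = (text, scheme) => ({ text, entryBegin: scheme.entryBegin });

    const scraped = scrapeVariant(variantTemplate, readFileStub, parseListStub);
    console.log(scraped);
    ```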


  • Have you thought about using the output of the spoiler generation code instead of writing your own per-version, per-variant parser?


  • Insert the outputs as static blocks of HTML? That is much less one-size-fits-all and much less flexible about formatting the output. We can host those too, but it isn't quite the same.


  • If someone hacked the spoiler generation code to spit out JSON or SQL statements, however, that would be the act of a wizard.


  • @takkaria
    That is what I was thinking of too: using the existing (in V anyway) parse code. Once you have a parser, it's easy to generate any output format.


  • Basically, pulling out chunks of existing angband code and using them in other contexts is what I perceive to be wizardry, and there are so many useful things I would want to do if I could do that. As I said before, if you can take code we already have and get it to spit out SQL or JSON then you are a wizard and have helped to save angband.


  • I'd still be interested in a pure-JS implementation of info-file parsers.

    I don't think it's too much work (irrespective of async); we've all hacked similar scripts before.

    The hard(er) part is not writing a parser, but agreeing on a common format, as in, what do we parse INTO?

    Because having things like:

    {
    "type": "R",
    "serial": 9,
    "X-line": [25,25,12]
    }
    

    seems pointless, surely we want something like:

    {
     "name": "White centipede",
     "common_attribute1": "something"
    }
    

  • I totally forgot I already wrote a script for this. It spits out html tables but it could be hacked to spit out something more useful to us (SQL) instead

    https://github.com/OwenGHB/angband-webclient/blob/master/bandit.js

    bandit is short for angband editor, which is what this was originally conceived to facilitate: giving people nice HTML forms to edit datafiles with.

    it looks like past me was thinking ahead too, with the definitions of variables describing what we are parsing and then functions which can be re-used for other definitions.


  • I don't know how to put the output data into Mediawiki. Anyone who could point me in the right direction would be greatly appreciated.


  • I've written a Perl script that parses monster.txt and monster_base.txt. I don't think I have Wiki database access yet to add it to the new wiki. I created a really basic page on the wiki here:
    https://wiki.angband.live/index.php/Creature_template
    My current idea is to generate a new page for each monster and fill that "statblock" table with the data. Then people can edit all around the statblock - tips and tricks, Tolkien history, whatever - and when the data changes I can replace the statblock and leave everything else untouched. I think we'll want a little drop-down menu to display statblocks from different versions but I haven't gotten that far. It would be handy, I think, to have identically named monsters from similar *bands share a page as well, but that's undecided.

    If anyone has ideas how to liven up that template, that'd be awesome. For example, if we're legally allowed we could put Shockbolt's icon up there. But I don't know off-hand how to take the big image of all the tiles and turn it into one image for each tile and name it appropriately. But I know that professionals in image manipulation could do it in an hour or less. And we'll need to display what *band(s) this monster is found in. Also, not all info is equally important, so we'd probably want Depth (for example) displayed prominently.
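
    One possible way to replace the statblock while leaving the rest of the page untouched is to wrap it in marker comments and swap only what sits between them. The sketch below assumes a pair of made-up HTML comment markers in the page text; the marker strings and the replaceStatblock helper are hypothetical, and this only manipulates strings, it does not talk to the wiki.

    ```javascript
    // Hypothetical markers delimiting the generated part of a page.
    const STATBLOCK_START = "<!-- statblock:start -->";
    const STATBLOCK_END = "<!-- statblock:end -->";

    // Swap the text between the markers; everything outside them is kept.
    function replaceStatblock(pageText, newStatblock) {
      const startIdx = pageText.indexOf(STATBLOCK_START);
      const endIdx = pageText.indexOf(STATBLOCK_END);
      if (startIdx === -1 || endIdx === -1 || endIdx < startIdx) {
        // No existing block: put a fresh one at the top of the page.
        return [STATBLOCK_START, newStatblock, STATBLOCK_END, pageText].join("\n");
      }
      return (
        pageText.slice(0, startIdx + STATBLOCK_START.length) +
        "\n" + newStatblock + "\n" +
        pageText.slice(endIdx)
      );
    }

    const oldPage = [
      STATBLOCK_START,
      "old generated table",
      STATBLOCK_END,
      "Tips and tricks written by hand.",
    ].join("\n");

    const updatedPage = replaceStatblock(oldPage, "new generated table");
    console.log(updatedPage);
    ```

    The same markers could also delimit one block per version if several statblocks end up sharing a page.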


  • I sent you the access credentials to the wiki db on discord

    I'll have a look at the db later, but the first step is to make sure we have an extensible system for parsing datafiles from any arbitrary variant. Generating the pages will be the second step and will involve extending MediaWiki with our own PHP, which I will begin work on.


  • Step 1: define an extensible template structure which can be given to the parser

    class variant {
    	string name;
    	string version;
    	fileDescriptor[] files;
    }
    class fileDescriptor {
    	string subject;
    	// dataType defaults to 0, meaning 'list'. Redundant for parsing r_info, a_info etc., but parseable edit files that are not lists exist, and leaving this variable free lets us extend to them later
    	int dataType;
    	entryDescriptor entry;
    }
    //assume ':' separates fields, this one is for lists
    class listDescriptor extends entryDescriptor {
    	//e.g. entryBegin = "^N" or "^name"
    	string entryBegin;
    	//e.g. lineBegin += "^N" or "^F" or "^name" or "^flags"
    	string[] lineBegin;
    	//describes how many values to read from a line;
    	//negative numbers can be used for special meanings, such as
    	//an arbitrary number of values on a single line.
    	//Array length must match lineBegin. Entries can still have
    	//arbitrarily many e.g. "F:" lines; the matching array lengths
    	//only tell the parser how to handle a line beginning that
    	//it has matched.
    	int[] lineEntries;
    	//probably an optional variable if entryBegin is defined
    	string entryTerminate;
    }
    
    

    I have apparently reverted to a Java-like syntax for pseudocoding this definition although I'd like to actually write them in JSON or XML.

    This is incomplete! We still need a system for labelling the extracted values, e.g. knowing which of the values parsed from the middle of a line was AC. At the very least, each variant's template needs a list of the fields it expects.
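
    To make the labelling problem concrete, here is one possible sketch of a list descriptor written as a plain JS object, plus a parser driven by it. The fieldNames array is an assumed extension (it is not part of the definition above), the "|" separator for variable-length lines is an assumption, and the sample lines and field names are illustrative only.

    ```javascript
    // Parallel arrays as in the definition, plus assumed per-line field names.
    const listDescriptor = {
      entryBegin: "N",
      lineBegin: ["N", "I", "F"],
      lineEntries: [2, 3, -1], // -1: any number of values, gathered into a list
      fieldNames: [["serial", "name"], ["speed", "health", "light"], ["flags"]],
    };

    function parseEntries(text, d) {
      const entries = [];
      let current = null;
      for (const raw of text.split(/\r?\n/)) {
        const line = raw.trim();
        if (line === "" || line.startsWith("#")) continue;
        const parts = line.split(":");
        const idx = d.lineBegin.indexOf(parts[0]);
        if (idx === -1) continue; // line beginning the descriptor doesn't know
        if (parts[0] === d.entryBegin) {
          current = {};
          entries.push(current);
        }
        if (!current) continue;
        const values = parts.slice(1);
        const names = d.fieldNames[idx];
        const count = d.lineEntries[idx];
        if (count < 0) {
          // Variable-length line: split on "|" and accumulate across lines.
          if (!current[names[0]]) current[names[0]] = [];
          for (const v of values.join(":").split("|")) {
            if (v.trim() !== "") current[names[0]].push(v.trim());
          }
        } else {
          for (let i = 0; i < count; i++) {
            // The last field absorbs any extra ":" so free text survives.
            current[names[i]] = i === count - 1 ? values.slice(i).join(":") : values[i];
          }
        }
      }
      return entries;
    }

    const sampleListText = [
      "N:9:White centipede",
      "I:2:7d4:0",
      "F:FLAG_A | FLAG_B",
      "F:FLAG_C",
    ].join("\n");

    const labelledEntries = parseEntries(sampleListText, listDescriptor);
    console.log(JSON.stringify(labelledEntries, null, 2));
    ```

    All values stay strings here; converting "7d4" or numeric fields into typed values would be a further step the template would have to describe.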

    Step 2 is defining the database.

    Table prefixes. Examples: Angband_341_monsters, Composband_712_artifacts

    Some variants will need tables that others don't. Column headings may differ between variants but should match their equivalents where possible; there is no reason to e.g. force angband to have a corpse weight field.
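
    The table prefix convention and the SQL generation step could be sketched like this. Both helpers are hypothetical, the column names in the example row are made up, and the insert is built with "?" placeholders on the assumption that whichever database library is used supports parameterised queries.

    ```javascript
    // Build a name like Angband_341_monsters from variant, version and subject.
    function tableName(variant, version, subject) {
      const clean = (s) => String(s).replace(/[^A-Za-z0-9]/g, "");
      return [clean(variant), clean(version), clean(subject)].join("_");
    }

    // Build a parameterised INSERT for one parsed row.
    // Column names are taken from the row's keys and are NOT validated here.
    function buildInsert(table, row) {
      const columns = Object.keys(row);
      const placeholders = columns.map(() => "?").join(", ");
      return {
        sql: "INSERT INTO " + table + " (" + columns.join(", ") + ") VALUES (" + placeholders + ")",
        values: columns.map((c) => row[c]),
      };
    }

    const monstersTable = tableName("Angband", "3.4.1", "monsters");
    const insertStatement = buildInsert(monstersTable, { name: "White centipede", depth: 1 });
    console.log(monstersTable);
    console.log(insertStatement.sql, insertStatement.values);
    ```

    Because each variant gets its own tables, a variant-specific column such as corpse weight simply appears as an extra key in that variant's rows.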

    Can we agree on this, and could someone with more formal experience maybe write a more formal specification for a parser template?


  • @Gwarl What are your goals apart from putting this information into the wiki? We don't need "formal specifications" to take name/value pairs from a text file and put them into an HTML table.


  • We do to make any sense of them. Not everything is a name:value pair. How do we know which lines are name/value pairs and which aren't, and how do we know which names they have? We need a solution for all variants, not just angband 4.2.0. If we have to write new code for every version and variant we will remain small and incomplete.


  • I don't mean to suggest that every bit of data in every file we're going to read is in the format "name:value". The line:
    I:2:7d4:0 (Sil)
    Is three name/value pairs once it's been read properly, though, and that's how I picture it looking in the wiki:
    Speed->2
    Health->7d4
    Light->0


  • I'm just saying we should establish a convention for writing templates for these structures so the parser knows what to look for.


  • I'm sure you're right about that, but I want to do it from the bottom-up, so to speak. And I don't want the data structure of this template any more complex than it needs to be.


  • Can you post an example of the initial txt file that we want to parse, maybe? It sounds easy enough even to parse all of r_info.txt straight away (the trickiest part would be understanding what each and every flag stands for), but I understand that you do not want to parse the entire file.
