Parsing info from .txt files

  • Step 1: define an extensible template structure which can be given to the parser

    class variant {
    	string name;
    	string version;
    	fileDescriptor[] files;
    }

    class fileDescriptor {
    	string subject;
    	// dataType defaults to 0, meaning 'list'. Redundant for parsing r_info, a_info etc.,
    	// but non-list parseable edit files exist and we can extend to them later if we
    	// leave ourselves a variable loose.
    	int dataType;
    	entryDescriptor entry;
    }

    // base type for entry layouts; only the list layout below is defined so far
    class entryDescriptor {
    }

    // assume ':' separates fields; this one is for lists
    class listDescriptor extends entryDescriptor {
    	// e.g. entryBegin = "^N" or "^name"
    	string entryBegin;
    	// e.g. lineBegin = ["^N", "^F"] or ["^name", "^flags"]
    	string[] lineBegin;
    	// How many values to read from a line. Negative numbers can be used
    	// for special meanings, such as an arbitrary number of values on a
    	// single line. Array length must match lineBegin. Entries can still
    	// have arbitrarily many e.g. "F:" lines; the matching array lengths
    	// only tell the parser how to handle a line beginning that it has
    	// matched.
    	int[] lineEntries;
    	// probably optional if entryBegin is defined
    	string entryTerminate;
    }

    I have apparently reverted to a Java-like syntax for pseudocoding this definition, although I'd like to actually write it in JSON or XML.

    This is incomplete! We need to define a system for labelling the extracted values, e.g. recording that the value we parsed from the middle of a line was AC. At the very least we need a list of the expected fields which each variant defines.
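One possible way to handle the labelling, sketched in Python with the template written as JSON. Everything here is an assumption for illustration: the `lines` key (which folds `lineBegin` and `lineEntries` into one mapping from line prefix to field names), the field names themselves, the version string, and the sample line.

```python
import json

# Hypothetical template; every key and field name is an assumption.
# "lines" maps a line prefix to the labels for the values that follow it,
# so the number of labels stands in for lineEntries.
TEMPLATE_JSON = """
{
  "name": "Angband",
  "version": "3.4.1",
  "files": [
    {
      "subject": "monsters",
      "dataType": 0,
      "entry": {
        "entryBegin": "N",
        "lines": {
          "N": ["index", "name"],
          "D": ["description"]
        }
      }
    }
  ]
}
"""


def label_line(entry_template, line):
    """Return {label: value} for a known line prefix, or None if unknown."""
    prefix, _, rest = line.partition(":")
    names = entry_template["lines"].get(prefix)
    if names is None:
        return None
    # Split at most len(names) - 1 times so the last field may contain ':'.
    values = rest.split(":", len(names) - 1)
    return dict(zip(names, values))


template = json.loads(TEMPLATE_JSON)
entry_template = template["files"][0]["entry"]
labelled = label_line(entry_template, "N:1:Filthy street urchin")
print(labelled)
```

Lines whose prefix is not in the template come back as `None`, so a parser can log or skip them instead of guessing.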

    Step 2 is defining the database.

    Table prefixes. Examples: Angband_341_monsters, Composband_712_artifacts

    Some variants will need tables that others don't. Column headings may be inconsistent between variants but should match their equivalents where possible; there is no reason to, e.g., force angband to have a corpse weight field.
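A rough sketch of the naming scheme and the per-variant columns. It uses Python's built-in `sqlite3` purely as a stand-in for MySQL so it runs anywhere. The version strings "3.4.1" and "7.1.2" are guessed from the example prefixes, the column names are made up, and giving Composband the `corpse_weight` column is only an assumption for the sake of the example.

```python
import re
import sqlite3

IDENT = re.compile(r"\w+")


def table_name(variant, version, subject):
    """Build e.g. Angband_341_monsters from ('Angband', '3.4.1', 'monsters')."""
    digits = re.sub(r"\D", "", version)  # drop the dots from the version
    return f"{variant}_{digits}_{subject}"


def create_table(conn, name, columns):
    """Create a table of TEXT columns; identifiers are checked before use."""
    for ident in (name, *columns):
        if not IDENT.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    cols = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({cols})')


def column_names(conn, name):
    """List a table's column names (SQLite-specific PRAGMA)."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{name}")')]


conn = sqlite3.connect(":memory:")
shared = ["name", "level"]  # columns every variant is assumed to share
angband = table_name("Angband", "3.4.1", "monsters")
compos = table_name("Composband", "7.1.2", "monsters")
create_table(conn, angband, shared)
create_table(conn, compos, shared + ["corpse_weight"])  # variant-only column
print(angband, column_names(conn, angband))
print(compos, column_names(conn, compos))
```

Shared columns keep the same names across variants, and anything variant-specific is simply appended for that variant's table.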

    Can we agree on this, and could someone with more formal experience write a more formal specification for a parser template?

  • @Gwarl What are your goals apart from putting this information into the wiki? We don't need "formal specifications" to take name/value pairs from a text file and put them into an HTML table.

  • We do, to make any sense of them. Not everything is a name:value pair. How do we know which lines are name/value pairs and which aren't, and how do we know which names they have? We need a solution for all variants, not just angband 4.2.0. If we have to write new code for every version and variant we will remain small and incomplete.

  • I don't mean to suggest that every bit of data in every file we're going to read is in the format "name:value". The line:
    I:2:7d4:0 (Sil)
    is three name/value pairs once it's been read properly, though, and that's how I picture it looking in the wiki.
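As a minimal sketch of reading that line "properly" in Python: the three field names below are assumptions (they should be checked against the comments in the variant's own edit file), and turning "7d4" into a (dice, sides) pair is just one possible representation.

```python
import re

# Assumed labels for the three values on a Sil "I:" line.
I_FIELDS = ("speed", "health", "light_radius")


def parse_value(text):
    """Convert '7d4' to (7, 4), plain integers to int, and leave the rest as text."""
    dice = re.fullmatch(r"(\d+)d(\d+)", text)
    if dice:
        return (int(dice.group(1)), int(dice.group(2)))
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    return text


def parse_line(line, prefix, names):
    """Split 'PREFIX:v1:v2:...' and pair each value with its label."""
    head, *values = line.strip().split(":")
    if head != prefix or len(values) != len(names):
        raise ValueError(f"line does not match template: {line!r}")
    return {name: parse_value(value) for name, value in zip(names, values)}


pairs = parse_line("I:2:7d4:0", "I", I_FIELDS)
print(pairs)  # {'speed': 2, 'health': (7, 4), 'light_radius': 0}
```

The function raises on a mismatch rather than guessing, so a line with a different number of values shows up immediately as a template problem.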

  • I'm just saying we should establish a convention for writing templates for these structures so the parser knows what to look for.

  • I'm sure you're right about that, but I want to do it from the bottom-up, so to speak. And I don't want the data structure of this template any more complex than it needs to be.

  • Can you post an example of the initial txt file that we want to parse, maybe? It sounds easy enough even to parse all of r_info.txt straight away (the trickiest part would be understanding what each and every flag stands for), but I understand that you do not want to parse the entire file.

  • Alright, and in the end you want to get something like this? Not as a single list, but as wiki entries for each monster?

  • For the minute I just want everything in tables in the MySQL DB. Once we have that we can present the information any way we like and run SQL selects against the tables automatically, e.g.

    List level 60-80 unique D's from every variant, show every monster marked DEMON in vanilla, etc.
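The first of those searches might look something like the sketch below. It again uses Python's `sqlite3` as a stand-in for MySQL, and the schema (columns `name`, `level`, `glyph`, and `flags` stored as a '|'-separated string) and every row are placeholders invented for the example, not real game data.

```python
import re
import sqlite3


def unique_dragons(conn, tables, lo=60, hi=80):
    """Return (table, name, level) for unique 'D' monsters in [lo, hi]."""
    parts = []
    for t in tables:
        if not re.fullmatch(r"\w+", t):
            raise ValueError(f"unsafe table name: {t!r}")
        parts.append(
            f"SELECT '{t}' AS source, name, level FROM \"{t}\" "
            "WHERE glyph = 'D' AND level BETWEEN ? AND ? "
            # Wrap flags in '|' so UNIQUE only matches as a whole flag.
            "AND '|' || flags || '|' LIKE '%|UNIQUE|%'"
        )
    sql = " UNION ALL ".join(parts) + " ORDER BY source, level"
    return conn.execute(sql, [lo, hi] * len(tables)).fetchall()


conn = sqlite3.connect(":memory:")
tables = ["Angband_341_monsters", "Composband_712_monsters"]
for t in tables:
    conn.execute(f'CREATE TABLE "{t}" (name TEXT, level INTEGER, glyph TEXT, flags TEXT)')

# Placeholder rows only.
conn.executemany('INSERT INTO "Angband_341_monsters" VALUES (?, ?, ?, ?)', [
    ("Placeholder wyrm", 70, "D", "UNIQUE|DRAGON"),
    ("Placeholder drake", 65, "D", "DRAGON"),
    ("Placeholder hatchling", 20, "d", "UNIQUE|DRAGON"),
])
conn.executemany('INSERT INTO "Composband_712_monsters" VALUES (?, ?, ?, ?)', [
    ("Placeholder elder", 80, "D", "DRAGON|UNIQUE"),
    ("Placeholder ancient", 90, "D", "UNIQUE"),
])

rows = unique_dragons(conn, tables)
print(rows)
```

Two things would need attention when moving this to MySQL: `||` is string concatenation in SQLite but is treated as logical OR by MySQL unless the `PIPES_AS_CONCAT` mode is set (so `CONCAT()` is the safer choice there), and MySQL's default collations compare case-insensitively, so 'D' and 'd' would need a binary or case-sensitive comparison to stay distinct.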

  • Oh, having this in an SQL DB is going to be a pain... but OK, first we parse the initial files.

  • Alright, made some code to start with.

    My main idea is to define all the structures and be able to fill them from the .txt files directly. Information that is found only in the source code can be taken from external files or hardcoded for now. Just as an example, I looked through the r_info.txt file and made some structures. This is all not 100% done, of course, just my rage code 🙂

    I don't see much of a problem in parsing the files in /lib/edit; I just see some different ways to avoid hardcoding a lot of stuff. One way of making it more universal would be the ability to read our own txt-like files holding the info that is found only in the source code. In the example above I hardcoded some of the spell types and colors.

    If we are able to read the .txt files and fill the defined structures from them, we can output the data in whatever format we desire. I did that in Go since I just like it very much 🙂

  • OK, very good, but I think there's some confusion: we now have my prototype for a parser in node.js, DavidMedley's prototype in Perl, and yours in Go. You (specifically you) can see the Perl scripts in /home/angwiki/

    As long as there's some sort of plugin-able template which describes a variant that any of our parser scripts can read then it doesn't matter which one we use, really.
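To illustrate what "template-driven" could mean in practice, here is a small Python sketch of the one step every parser would share: grouping lines into entries using nothing but the template's `entryBegin` value. Skipping blank lines and '#' comments is an assumption about the edit files, and the sample text is placeholder data, not taken from any variant.

```python
def split_entries(lines, entry_begin):
    """Group lines into entries; each entry starts at an entry_begin line."""
    entries = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and '#' comments carry no data
        if line.startswith(entry_begin + ":"):
            current = [line]
            entries.append(current)
        elif current is not None:
            current.append(line)
        # Lines before the first entry (file headers etc.) are ignored.
    return entries


SAMPLE = """\
# placeholder file, not real game data
V:0.0.0

N:1:First thing
F:FLAG_A | FLAG_B
F:FLAG_C

N:2:Second thing
F:FLAG_D
"""

entries = split_entries(SAMPLE.splitlines(), "N")
for entry in entries:
    print(entry)
```

Because the only variant-specific input is the `entry_begin` string, the same logic could be ported to the node.js, Perl, or Go scripts and fed from one shared template file.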

  • Oh, no problem then, please continue with yours and David's, no need to reinvent the wheel here 🙂 I did not notice that you have some working scripts 🙂

  • Yeah, it looks like what you've done here is to do for (frog?)composband what I did for whichever nightly version of angband I wrote my JS script for. The problem comes from all the angbands having (sometimes) very similar but not quite identical formats. DM has been working on it, and I am looking for the time to do it. I have gotten a new PC and have been very distracted by the pretty 3D tank battles it lets me fight, and have only just been catching up on making new job applications.

    I have chosen Eclipse as my IDE and have imported some angband-related projects, which I also need to catch up on as I learn my way around Eclipse. I have also ditched Windows, and Linux is so mind-bogglingly simple that it's difficult to conceptualise my workflow with all the same clunky slow-moving parts I had on Windows.
