08 April 2014, 12:14 | #1 |
Users Awaiting Email Confirmation
Join Date: Mar 2014
Location: my town
Posts: 12
|
Parsing an HTML file in Arexx
I want to extract information from <td> elements, e.g.:
<td> A </td> <td> B </td> ... How do I achieve that in ARexx? Thanks in advance. |
08 April 2014, 13:02 | #2 |
Registered User
Join Date: Jan 2002
Location: Germany
Posts: 6,985
|
How much do you know about these HTML files? For example, is each <td> value </td> always on a separate line? Are <td> and </td> always written in lower case?
Writing a generic HTML parser can be very difficult. But if you know that the files you read are always built the same way, it gets much easier. |
08 April 2014, 13:55 | #3 | |
Users Awaiting Email Confirmation
Join Date: Mar 2014
Location: my town
Posts: 12
|
Quote:
The <td> elements (on separate lines) are within a <table class="tableclass">; the "class" attribute "tableclass" is a unique identifier. |
|
08 April 2014, 15:03 | #4 |
Registered User
Join Date: Jan 2002
Location: Germany
Posts: 6,985
|
That wasn't what I asked.
Let's look at an example. I saved the source code of this page as board.htm. The ARexx program board.rexx extracts the posting time and user name from the table of posts. It does this in a very simple way, which means it only works on HTML files that look exactly like this one. Code:
3> rx board.rexx
Today, 12:14 Gundam
Today, 13:02 thomas
Today, 13:55 Gundam
3> |
08 April 2014, 15:26 | #5 | |
Users Awaiting Email Confirmation
Join Date: Mar 2014
Location: my town
Posts: 12
|
Quote:
That's OK! I don't need a generic parser; I just want to extract the contents of the <td> elements that are in the <table> with THAT specific identifier. The value is on a separate line, like this: <td> value </td>. The <td> tags don't have any "id" attribute, so they can't be easily identified; that's why I asked for help. |
|
08 April 2014, 17:23 | #6 |
Registered User
Join Date: Oct 2009
Location: Germany
Posts: 3,303
|
That's easy. Use Thomas's script as a template and adapt or trim it to your needs. You just have to call readln() and check whether the line is <td>. If it is, keep reading lines until </td> is reached.
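A minimal sketch of that readln() loop, not Thomas's board.rexx. It assumes that <td>, the value and </td> each sit on their own line, that the <table ...> tag fits on one line and spells its class as class="tableclass" with double quotes, and that the page was saved as page.html (a hypothetical file name). Tag matching is done on an upper-cased copy of each line, so <TD> and <td> are treated alike.

```rexx
/* tdvalues.rexx: print the contents of <td> cells inside one table. */
/* Sketch only; file name and page layout are assumptions, see above. */

infile = 'page.html'                 /* hypothetical file name */

if ~open('in', infile, 'R') then do
    say 'Cannot open' infile
    exit 10
end

intable = 0                          /* 1 while inside the wanted table */

do while ~eof('in')
    line = strip(readln('in'))       /* strip() removes blanks, not tabs */
    test = upper(line)               /* case-insensitive tag matching */

    if pos('<TABLE', test) > 0 & pos('CLASS="TABLECLASS"', test) > 0 then intable = 1
    else if pos('</TABLE>', test) > 0 then intable = 0
    else if intable & test = '<TD>' then do
        cell = ''
        /* collect every line up to the closing tag */
        do while ~eof('in')
            line = strip(readln('in'))
            if upper(line) = '</TD>' then leave
            cell = strip(cell line)
        end
        say cell
    end
end

call close('in')
```

Run it from the shell with rx tdvalues.rexx. If the real page puts the tag and the value on the same line, the test for a line that is exactly <td> never matches and the cell has to be cut out of the line instead.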
|
08 April 2014, 18:20 | #7 |
Users Awaiting Email Confirmation
Join Date: Mar 2014
Location: my town
Posts: 12
|
|
09 April 2014, 00:16 | #8 | |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
Quote:
Code:
<TABLE>
<TR><TH>Some data</TH></TR>
<TR><TD>More data</TD></TR>
<TR><TD>Even more data</TD></TR>
<!-- and the rest of the table follows... --->
</TABLE>
<TH> is the table (column) header and </TH> marks its end. The above example is a one-column table; it is entirely possible to have multiple entries of each tag on the same row. |
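When several cells share one line, as in the rows above, comparing the whole line against <td> no longer works. A rough sketch of one way to handle that case: search an upper-cased copy of the line with pos() and cut each cell out of the original with substr(). The sample line is made up for illustration, and the sketch assumes plain <td> tags without attributes and cells that open and close on the same line.

```rexx
/* tdsameline.rexx: pull every <td>...</td> cell out of a single line. */
/* Sketch only; the sample line below is invented for illustration.    */

rest = '<TR><TD>More data</TD><td> Even more data </td></TR>'

do forever
    up = upper(rest)                 /* search copy, original keeps its case */
    p1 = pos('<TD>', up)
    if p1 = 0 then leave             /* no further cell on this line */
    p2 = pos('</TD>', up, p1 + 4)
    if p2 = 0 then leave             /* cell is not closed on this line */

    /* the text between the end of <TD> and the start of </TD> */
    say strip(substr(rest, p1 + 4, p2 - p1 - 4))

    rest = substr(rest, p2 + 5)      /* continue after </TD> */
end
```

Inside a readln() loop, the same body can be applied to each line read from the file instead of the fixed sample string.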
|