HTML parsing in C# :: 4programmers.net

0

Included namespaces

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;
using System.Threading.Tasks;
using System.Windows.Forms;
using HtmlAgilityPack;

Code

label1.Text = "1 USD";
List<string> kursy = new List<string>();

WebClient web = new WebClient();
string html = web.DownloadString("http://kursy-walut.mybank.pl/");
MatchCollection m1 = Regex.Matches(html, @"<td id=\"GBPPLN_NBP\">\\s*(.+?)\\s*</td>", RegexOptions.Singleline);
foreach (Match m in m1)
{
    if (m.Groups[1].Value != "")
    {
        string kurs = m.Groups[1].Value;
        kursy.Add(kurs);
    }
}
listBox1.DataSource = kursy;

Does anyone know why this line

 MatchCollection m1 = Regex.Matches(html, @"<td id=\"GBPPLN_NBP\">\\s*(.+?)\\s*</td>", RegexOptions.Singleline);

produces these errors:
Syntax error, ';' expected
The name 'GBPPLN_NBP' does not exist in the current context
Cannot resolve symbol 'GBPPLN_NBP'

1

I see you are using HtmlAgilityPack, so my guess is that you don't need a regex at all.
Try this:
Open the page in Chrome, right-click the element you want to parse and choose "Inspect". In the panel on the right, right-click the element that got highlighted and choose Copy -> Copy XPath.

Then do this in your code:

     var html = @"http://kursy-walut.mybank.pl/";

     HtmlWeb web = new HtmlWeb();
     var htmlDoc = web.Load(html);

     string tresc = htmlDoc.DocumentNode.SelectSingleNode(" /* paste what you copied from Chrome here */").InnerText;

You can also do this without copying the XPath (which sometimes fails) by using LINQ. Read up on HtmlAgilityPack; it can do a lot.
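The LINQ route mentioned above could look roughly like the sketch below. It assumes the target cell carries the id GBPPLN_NBP (taken from the question); the class name, the FindCellText helper and the inline sample markup are made up for illustration, and in the real program the document would come from HtmlWeb.Load instead of LoadHtml.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class LinqLookupSketch
{
    // Hypothetical helper: returns the trimmed text of the first <td>
    // with the given id, or null when no such cell exists.
    public static string FindCellText(HtmlDocument doc, string id)
    {
        HtmlNode cell = doc.DocumentNode
            .Descendants("td")
            .FirstOrDefault(n => n.GetAttributeValue("id", "") == id);

        // FirstOrDefault yields null when nothing matches,
        // so check before touching InnerText.
        return cell != null ? cell.InnerText.Trim() : null;
    }

    static void Main()
    {
        // Assumed sample markup standing in for the downloaded page.
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml("<table><tr><td id=\"GBPPLN_NBP\"> 4,7856 </td></tr></table>");

        string tresc = FindCellText(htmlDoc, "GBPPLN_NBP");
        Console.WriteLine(tresc ?? "(cell not found)");
    }
}
```

Compared with a pasted XPath, this depends only on the tag name and the id, so it does not break when the surrounding table layout changes.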

0

This is what gets copied:

//*[@id="article"]/table/tbody/tr/td/center/table[1]/tbody/tr[2]/td[3]

After pasting it where you said:

htmlDoc.DocumentNode.SelectSingleNode("//*[@id="article"]/table/tbody/tr/td/center/table[1]/tbody/tr[2]/td[3]").InnerText;

the result is, of course:

The name 'article' does not exist in the current context

0

You probably copied the wrong XPath. Also, remember to put a \ before every double quote inside the string.

 string tresc = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"GBPPLN_NBP\"]").InnerText;

result: 4,7856
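The same escaping rule also explains the compiler errors on the Regex.Matches line from the first post: in a verbatim string (one starting with @) a backslash does not escape anything, so \" ends the literal and GBPPLN_NBP is read as an identifier. Inside a verbatim string a quote is written as "" and \s needs only one backslash. A minimal sketch of both spellings follows; the class name, the ExtractRate helper and the sample markup are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapingSketch
{
    // Regular literal: \" is a quote, and each regex backslash is doubled.
    public const string RegularPattern = "<td id=\"GBPPLN_NBP\">\\s*(.+?)\\s*</td>";

    // Verbatim literal: "" is a quote, and backslashes are taken literally.
    public const string VerbatimPattern = @"<td id=""GBPPLN_NBP"">\s*(.+?)\s*</td>";

    // Hypothetical helper: returns the captured cell text, or null if no match.
    public static string ExtractRate(string html)
    {
        Match m = Regex.Match(html, VerbatimPattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Assumed sample markup standing in for the downloaded page.
        string html = "<tr><td id=\"GBPPLN_NBP\">\n 4,7856 \n</td></tr>";

        // Both literals spell exactly the same pattern text.
        Console.WriteLine(RegularPattern == VerbatimPattern);
        Console.WriteLine(ExtractRate(html) ?? "(no match)");
    }
}
```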

0

It worked!
Thanks a lot
