C#: extracting text from a web page between lines


Hello,

I need help extracting the phone numbers and the email address from the page source below:

<table border="0" cellpadding="0" cellspacing="0">
    
    
		<tr valign="top" class="communication_tel">
			<td class="t_label">Phone:</td><td> </td><td class="t_value">
				
					+49 9522-7097540 
				
			</td></tr>
	
    
		<tr valign="top" class="communication_fax">
			<td class="t_label">Fax:</td><td> </td><td class="t_value">
				
					+49 9522-7097534 
				
			</td></tr>
	
    
    
		<tr valign="top" class="communication_email">
			<td class="t_label">e-mail:</td><td> </td><td class="t_value">
				
					<a href="mailto:[email protected]">[email protected]</a>
					
				
			</td></tr>
	
    
		<tr valign="top" class="communication_internet">
			<td>Website:</td><td> </td><td class="t_value">
				
					<a target="_blank" href="http://www.1a-abrasives.com">www.1a-abrasives.com</a>
					
				
			</td></tr>
	


	</table>

As an example, I wanted to extract the Phone value. I wrote a program for this, but the output is empty:

static void Main(string[] args)
        {
            try
            {

                WebClient myWebClient = new WebClient();

                string page = myWebClient.DownloadString("http://exhibitors.grindtec.de/en/exhibitors-products/exhibitors/exhibitors-details/ID/751702/action/detail/controller/Exhibitors/");

                string name1 = "<td class=\"t_label\">Phone:</td><td> </td><td class=\"t_value\">(.*)</td></tr>";


                Regex rgx = new Regex(name1);
                MatchCollection matches = rgx.Matches(page);



                foreach (Match match in matches)
                {
                    var b = Regex.Match(match.Value, "(?<=>)(.*)(<=?)");
                    Console.WriteLine(b.Value);
                }
            }

            catch (Exception ex)
            {
                Console.WriteLine("Error: " + ex.Message);
            }
            Console.ReadLine();
        }

Can anyone tell me what I am doing wrong?


Learn to use the debugger. The problem is in your first regex: the matches collection is empty.
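
One likely reason the collection is empty can be reproduced in isolation. The sketch below assumes the downloaded page is shaped like the snippet in the question, with the phone number on its own line inside the cell. The `Html` string and the `Count` helper are made up for illustration. In .NET, `.` does not match `\n` unless `RegexOptions.Singleline` is set, so `(.*)` cannot cross the line breaks around the value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Hypothetical stand-in for the downloaded page, shaped like the snippet in the question.
    public const string Html =
        "<td class=\"t_label\">Phone:</td><td> </td><td class=\"t_value\">\n" +
        "\t\t\t\t\n" +
        "\t\t\t\t\t+49 9522-7097540 \n" +
        "\t\t\t\t\n" +
        "\t\t\t</td></tr>";

    // The pattern from the question, unchanged.
    public const string Pattern =
        "<td class=\"t_label\">Phone:</td><td> </td><td class=\"t_value\">(.*)</td></tr>";

    // Counts how many times the pattern matches under the given options.
    public static int Count(RegexOptions options)
    {
        return Regex.Matches(Html, Pattern, options).Count;
    }

    public static void Main()
    {
        Console.WriteLine(Count(RegexOptions.None));       // 0: '.' stops at the first newline
        Console.WriteLine(Count(RegexOptions.Singleline)); // 1: '.' may now match newlines
    }
}
```

If the real page uses `&nbsp;` in the middle cell rather than a plain space, the literal `<td> </td>` in the pattern would also fail to match. That part is an assumption about the page, not something the snippet in the question shows.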


I managed to work out the solution myself: I changed the code a little and it worked. For posterity:


using System;
using System.Net;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        const string mysite = "http://exhibitors.grindtec.de/en/exhibitors-products/exhibitors/exhibitors-details/ID/751702/action/detail/controller/Exhibitors/";

        try
        {
            WebClient myWebClient = new WebClient();
            string a = "Phone:";

            string page = myWebClient.DownloadString(mysite);

            // Phone: the label cell, a &nbsp; spacer cell, then the value cell.
            // \s*(.*?)\s* captures the value lazily and leaves the surrounding whitespace out.
            string name1 = @"<td class=""t_label"">" + a + @"</td><td>&nbsp;</td><td class=""t_value"">\s*(.*?)\s*</td>";
            // Email: not used yet (TODO).
            string name2 = "<a href=\"mailto:(.*?)\">";

            // Singleline lets '.' match newlines, so the capture can span several lines.
            Regex rgx = new Regex(name1, RegexOptions.Singleline);
            MatchCollection matches = rgx.Matches(page);

            foreach (Match match in matches)
            {
                // Group 1 holds the captured value.
                string cont = match.Groups[1].Value;
                Console.WriteLine(cont);
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
        Console.ReadLine();
    }
}
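
The email pattern left as a todo above can be applied the same way. This is a minimal sketch, assuming the page contains a `mailto:` link shaped like the one in the question. The `ExtractEmails` helper and the sample address `info@example.com` are hypothetical, since the real address is obscured in the posted source.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailDemo
{
    // Returns every address found in <a href="mailto:..."> links.
    public static List<string> ExtractEmails(string html)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(html, "<a href=\"mailto:(.*?)\">"))
        {
            // Group 1 is the text between "mailto:" and the closing quote.
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample input for illustration only.
        string html =
            "<td class=\"t_value\">\n" +
            "<a href=\"mailto:info@example.com\">info@example.com</a>\n" +
            "</td>";

        foreach (string email in ExtractEmails(html))
        {
            Console.WriteLine(email); // info@example.com
        }
    }
}
```

The pattern only matches when `href` is the first attribute of the link, as it is in the snippet from the question. For anything beyond a one-off scrape, an HTML parser tends to be more robust than regular expressions.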
