""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://www.robotstxt.org/norobots-rfc.txt
"""

import collections
import urllib.parse
import urllib.request

__all__ = ["RobotFileParser"]

RequestRate = collections.namedtuple("RequestRate", "requests seconds")


class RobotFileParser:
    """ This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.sitemaps = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urllib.parse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400 and err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # the default entry is considered last
            if self.default_entry is None:
                # the first default entry wins
                self.default_entry = entry
        else:
            self.entries.append(entry)

    def parse(self, lines):
        """Parse the input lines from a robots.txt file.

        We allow that a user-agent: line is not preceded by
        one or more blank lines.
        """
        # states:
        #   0: start state
        #   1: saw user-agent line
        #   2: saw an allow or disallow line
        state = 0
        entry = Entry()

        self.modified()
        for line in lines:
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove optional comment and strip line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.parse.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
                elif line[0] == "crawl-delay":
                    if state != 0:
                        # verify the value is numeric before int() so a
                        # malformed robots.txt cannot crash the parser
                        if line[1].strip().isdigit():
                            entry.delay = int(line[1])
                        state = 2
                elif line[0] == "request-rate":
                    if state != 0:
                        numbers = line[1].split('/')
                        # check if all values are sane
                        if (len(numbers) == 2 and numbers[0].strip().isdigit()
                                and numbers[1].strip().isdigit()):
                            entry.req_rate = RequestRate(int(numbers[0]),
                                                         int(numbers[1]))
                        state = 2
                elif line[0] == "sitemap":
                    # The sitemap directive is independent of the user-agent
                    # line, so it does not change the state of the parser.
                    self.sitemaps.append(line[1])
        if state == 2:
            self._add_entry(entry)

    def can_fetch(self, useragent, url):
        """using the parsed robots.txt decide if useragent can fetch url"""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        # Until the robots.txt file has been read or found not to exist,
        # we must assume that no url is allowable; this prevents false
        # positives when can_fetch() is called before read().
        if not self.last_checked:
            return False
        # search for given user agent matches; the first match counts
        parsed_url = urllib.parse.urlparse(urllib.parse.unquote(url))
        url = urllib.parse.urlunparse(('', '', parsed_url.path,
            parsed_url.params, parsed_url.query, parsed_url.fragment))
        url = urllib.parse.quote(url)
        if not url:
            url = "/"
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found ==> access granted
        return True

    def crawl_delay(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.delay
        if self.default_entry:
            return self.default_entry.delay
        return None

    def request_rate(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.req_rate
        if self.default_entry:
            return self.default_entry.req_rate
        return None

    def site_maps(self):
        if not self.sitemaps:
            return None
        return self.sitemaps

    def __str__(self):
        entries = self.entries
        if self.default_entry is not None:
            entries = entries + [self.default_entry]
        return '\n\n'.join(map(str, entries))


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path."""
    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty value means allow all
            allowance = True
        path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
        self.path = urllib.parse.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return ("Allow" if self.allowance else "Disallow") + ": " + self.path


class Entry:
    """An entry has one or more user-agents and zero or more rulelines"""
    def __init__(self):
        self.useragents = []
        self.rulelines = []
        self.delay = None
        self.req_rate = None

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.append(f"User-agent: {agent}")
        if self.delay is not None:
            ret.append(f"Crawl-delay: {self.delay}")
        if self.req_rate is not None:
            rate = self.req_rate
            ret.append(f"Request-rate: {rate.requests}/{rate.seconds}")
        ret.extend(map(str, self.rulelines))
        return '\n'.join(ret)

    def applies_to(self, useragent):
        """check if this entry applies to the specified agent"""
        # split the name token and make it lower case
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # we have the catch-all agent
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True
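A short usage sketch of the `RobotFileParser` class this module provides. The bot name, paths, and rules below are made up for illustration; the rules are fed in directly through `parse()` rather than fetched over the network with `read()`, so the example runs offline. The snippet imports the installed stdlib copy of this module (`urllib.robotparser`).

```python
# Minimal illustration (hypothetical bot and URLs): drive RobotFileParser
# with an in-memory robots.txt instead of fetching one via read().
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() also records a fetch timestamp, so crawl_delay() works afterwards.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /",
    "Crawl-delay: 2",
])

print(rp.can_fetch("MyBot/1.0", "https://example.com/public/page"))   # True
print(rp.can_fetch("MyBot/1.0", "https://example.com/private/page"))  # False
print(rp.crawl_delay("MyBot/1.0"))                                    # 2
```

Note that the `User-agent: *` group becomes the parser's `default_entry`, consulted only after no named entry matches `"mybot"` (the agent string is lowercased and truncated at the first `/` before matching).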